GRID::Panoptes
UPDATE:
I've taken all the knowledge I've gained from writing GRID::Panoptes and am starting
over building a distributed reactive automation framework. This new framework
is built from a set of loosely coupled components and will have all the capabilities of
GRID::Panoptes and more.
Follow the project here:
http://github.com/wu/panoptes/tree/master
Overview
Many thanks to the folks who built the tools that made GRID::Panoptes
possible, including:
Images
Manager launching agents
The manager runs POE. It forks off a child process for each host
(POE::Wheel::Run). Each child process connects to the remote host
over ssh using GRID::Machine to deploy and start up the agent.
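The one-child-per-host pattern described above can be sketched roughly as follows. This is a minimal Python illustration, not the GRID::Panoptes code (which uses POE::Wheel::Run and GRID::Machine in perl). The names build_agent_command and launch_agents are hypothetical, and the child command is a local stand-in that only prints the host name, so the sketch runs without any ssh access.

```python
import subprocess
import sys


def build_agent_command(host):
    """Return the argv a child would run for `host`.

    Stand-in only: the real manager connects to the host over ssh and
    deploys the agent. Here the child just prints the host name.
    """
    code = "import sys; print('agent started on ' + sys.argv[1])"
    return [sys.executable, "-c", code, host]


def launch_agents(hosts):
    """Start one child process per host, then collect each child's result."""
    # Start every child first so they all run concurrently.
    children = {
        host: subprocess.Popen(
            build_agent_command(host),
            stdout=subprocess.PIPE,
            text=True,
        )
        for host in hosts
    }
    # Then wait for each child and record its exit code and output.
    results = {}
    for host, child in children.items():
        output, _ = child.communicate()
        results[host] = (child.returncode, output.strip())
    return results


if __name__ == "__main__":
    for host, (code, output) in launch_agents(["web01", "db01"]).items():
        print(host, code, output)
```

Starting all children before waiting on any of them is what keeps a slow host from delaying the others. An event loop such as POE does the same job without blocking on each child in turn.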
GRID::Panoptes is built on a number of CPAN modules, so there are
quite a few perl modules required on the machine running the manager.
To run an agent on a machine, you simply need to be running sshd on
that server, have perl installed, and have the POE perl library
installed.
Manager and agent communications
The manager and agents are running POE. They communicate using IKC
(inter-kernel communication for POE). By default, the manager does
not listen on any public ports--all communication is tunneled over
ssh.
When connecting via GRID::Machine, an ssh tunnel is set up that
listens on a (localhost) port on the remote server and forwards the
connection back to a (localhost) port on the manager server.
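With the plain OpenSSH client, a tunnel of this shape corresponds to a remote forward (the -R option). The Python sketch below only builds the equivalent command line. It does not show how GRID::Machine sets the tunnel up internally, and the function name, host name, and port numbers are hypothetical.

```python
def reverse_tunnel_argv(remote_host, remote_port, manager_port):
    """Build an ssh command line for a remote (reverse) port forward.

    sshd on `remote_host` listens on localhost:remote_port and forwards
    each connection back to localhost:manager_port on the machine that
    ran ssh (the manager).
    """
    forward = f"localhost:{remote_port}:localhost:{manager_port}"
    return ["ssh", "-R", forward, remote_host]


if __name__ == "__main__":
    # Hypothetical ports: the agent connects to 9001 on its own loopback
    # interface and reaches the manager's listener on 9000.
    print(" ".join(reverse_tunnel_argv("agent01", 9001, 9000)))
```

Because both ends bind to localhost, the agent talks to a local port and nothing is exposed on a public interface of either machine.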
Nagios
The agents can submit passive checks directly to the nagios server
using the standard NSCA protocol. This bypasses the manager,
eliminating it as a single point of failure or bottleneck and
allowing for a massively scalable nagios implementation.
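To make the contents of a passive check concrete, the Python sketch below formats one service result as the tab-separated line that the stock send_nsca client reads on standard input: host, service, return code, plugin output. This shows the fields only. It is not the NSCA wire protocol that the agent speaks, and the function name and sample values are hypothetical.

```python
# Standard nagios plugin return codes.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_check_line(host, service, state, output):
    """Format one passive service check as a send_nsca input line."""
    return "\t".join([host, service, str(STATES[state]), output]) + "\n"


if __name__ == "__main__":
    line = passive_check_line("web01", "HTTP", "CRITICAL", "Connection refused")
    print(repr(line))
```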
The agent can also monitor the logs of one or more nagios instances
running on each server, and perform event correlation. When doing
this, it will watch for services with the same (or similar) name
running on more than one host. Any time multiple problems are noticed
for such a service, the event correlator will submit a passive check
for an event correlator service. By setting up the nagios
dependencies properly, an error on the event correlator service will
prevent paging from each of the instances. This means you get a
single page instead of potentially getting dozens or hundreds.
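The correlation rule can be sketched as follows. This is a minimal Python illustration under several assumptions, not the GRID::Panoptes implementation: it reads SERVICE ALERT lines in the usual nagios.log layout, matches service names exactly (the "similar name" matching mentioned above is not attempted), and treats two hosts as "multiple". The names correlate and threshold are hypothetical.

```python
import re
from collections import defaultdict

# [timestamp] SERVICE ALERT: host;service;state;state_type;attempt;output
ALERT = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([^;]+);([^;]+);(\d+);(.*)$"
)


def correlate(log_lines, threshold=2):
    """Return {service: [hosts]} for services failing on >= threshold hosts."""
    latest = {}  # (host, service) -> most recent state seen in the log
    for line in log_lines:
        match = ALERT.match(line.strip())
        if not match:
            continue  # not a service alert
        _, host, service, state, _, _, _ = match.groups()
        latest[(host, service)] = state

    problem_hosts = defaultdict(set)
    for (host, service), state in latest.items():
        if state != "OK":
            problem_hosts[service].add(host)

    return {
        service: sorted(hosts)
        for service, hosts in problem_hosts.items()
        if len(hosts) >= threshold
    }


if __name__ == "__main__":
    sample = [
        "[1235711280] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused",
        "[1235711285] SERVICE ALERT: web02;HTTP;CRITICAL;HARD;3;Connection refused",
        "[1235711290] SERVICE ALERT: db01;MySQL;WARNING;SOFT;1;Slow queries",
    ]
    print(correlate(sample))
```

Each service returned here is one for which a single passive check would be submitted to the event correlator service, with the per-host services depending on it.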
Database
Every message exchanged between the manager and the agents is stored
in a database. An interface to the database is under development...
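As a rough idea of what logging every message might look like, here is a minimal Python sketch using SQLite. Everything in it is an assumption made for illustration: the page does not say which database GRID::Panoptes uses or what its schema is, and the table, column, and function names are hypothetical.

```python
import sqlite3
import time


def open_message_log(path=":memory:"):
    """Open (or create) a message log with a hypothetical schema."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " received_at REAL NOT NULL,"
        " sender TEXT NOT NULL,"
        " recipient TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return db


def record_message(db, sender, recipient, body, received_at=None):
    """Store one manager/agent message."""
    timestamp = time.time() if received_at is None else received_at
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO messages (received_at, sender, recipient, body)"
            " VALUES (?, ?, ?, ?)",
            (timestamp, sender, recipient, body),
        )


def messages_from(db, sender):
    """Return (recipient, body) pairs sent by `sender`, oldest first."""
    rows = db.execute(
        "SELECT recipient, body FROM messages WHERE sender = ? ORDER BY id",
        (sender,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = open_message_log()
    record_message(db, "manager", "web01", "run check_disk")
    record_message(db, "web01", "manager", "check_disk OK")
    print(messages_from(db, "manager"))
```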
Thu Feb 26, 2009 11:28 PM