GRID::Panoptes


UPDATE:

I've taken all the knowledge I've gained from writing GRID::Panoptes and am starting over, building a distributed reactive automation framework. This new framework is built from a set of loosely coupled components and will have all the capabilities of GRID::Panoptes and more.

Follow the project here: http://github.com/wu/panoptes/tree/master

Overview



Many thanks to the folks who built the tools that made GRID::Panoptes possible, including:

Images

Manager launching agents

The manager runs POE and forks off a child process for each host using POE::Wheel::Run. Each child process connects to the remote host over ssh using GRID::Machine to deploy and start the agent. GRID::Panoptes is built on a number of CPAN modules, so quite a few Perl modules are required on the machine running the manager. Running an agent requires much less: the server only needs to be running sshd, have Perl installed, and have the POE Perl library installed.
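The one-child-per-host pattern can be sketched outside of POE. The following is a minimal illustration in Python, not GRID::Panoptes code: it assumes the agent already exists on the remote host as a script (the name `agent.pl` is hypothetical), whereas the real manager deploys the agent through GRID::Machine. The `spawn` parameter is also an assumption, added so the launcher can be exercised without opening real ssh connections.

```python
import subprocess


def agent_command(host, agent_path="agent.pl", user=None):
    """Build the ssh argv that would start an agent on one host.

    agent_path is a hypothetical script name used only for illustration.
    """
    target = f"{user}@{host}" if user else host
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which matters when nobody is watching the child process.
    return ["ssh", "-o", "BatchMode=yes", target, "perl", agent_path]


def launch_agents(hosts, spawn=subprocess.Popen):
    """Start one child process per host and return them keyed by host."""
    return {host: spawn(agent_command(host)) for host in hosts}
```

Calling `launch_agents(["web1", "web2"])` would start two ssh child processes, one per host. Passing a different `spawn` callable makes it possible to inspect the commands without running them.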

Manager and agent communications

The manager and agents both run POE and communicate using IKC (inter-kernel communication for POE). By default, the manager does not listen on any public ports; all communication is tunneled over ssh. When connecting via GRID::Machine, an ssh tunnel is set up that listens on a localhost port on the remote server and forwards connections back to a localhost port on the manager server.
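The tunnel described here has the shape of an ssh remote port forward. As a rough sketch, assuming the plain OpenSSH client is used rather than GRID::Machine, the equivalent command line could be built like this in Python. The function name and the port numbers in the usage note are hypothetical.

```python
def reverse_tunnel_command(host, remote_port, manager_port):
    """Build an ssh argv for a remote (reverse) port forward.

    The remote host listens on 127.0.0.1:remote_port and forwards each
    connection back to 127.0.0.1:manager_port on the machine running ssh.
    """
    spec = f"127.0.0.1:{remote_port}:127.0.0.1:{manager_port}"
    # -N: do not run a remote command, only forward ports.
    # -R: remote forward, in the form bind_address:port:host:hostport.
    return ["ssh", "-N", "-R", spec, host]
```

For example, `reverse_tunnel_command("web1", 9001, 9000)` yields `ssh -N -R 127.0.0.1:9001:127.0.0.1:9000 web1`. An agent on `web1` would then reach the manager by connecting to its own localhost port 9001, and nothing on the manager has to listen on a public interface.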

Nagios

The agents can submit passive checks directly to the Nagios server using the standard NSCA protocol. This bypasses the manager, so the manager is neither a single point of failure nor a bottleneck, which allows for a massively scalable Nagios implementation. The agent can also monitor the logs of one or more Nagios instances running on each server and perform event correlation. When doing this, it watches for services with the same (or a similar) name running on more than one host. Any time multiple problems are noticed for such a service, the event correlator submits a passive check for an event correlator service. With the Nagios dependencies set up properly, an error on the event correlator service prevents paging from each of the individual instances, so you get a single page instead of potentially dozens or hundreds.
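The correlation step can be illustrated with a small Python sketch. It is not the GRID::Panoptes implementation: the "same or similar name" rule is reduced here to a case-insensitive match after trimming whitespace, and the threshold of two hosts, the function names, and the correlator host and service names in the usage note are all assumptions. The result line uses the tab-separated host, service, return code, and plugin output layout that the `send_nsca` client reads on standard input.

```python
from collections import defaultdict

OK = 0        # Nagios return code for OK
CRITICAL = 2  # Nagios return code for CRITICAL


def correlate(problems, threshold=2):
    """Find services that are failing on at least `threshold` hosts.

    problems: iterable of (host, service) pairs currently in a problem state.
    Returns {normalized service name: sorted list of affected hosts}.
    """
    hosts_by_service = defaultdict(set)
    for host, service in problems:
        # Simplified similarity rule (an assumption): ignore case and
        # surrounding whitespace when comparing service names.
        hosts_by_service[service.strip().lower()].add(host)
    return {
        service: sorted(hosts)
        for service, hosts in hosts_by_service.items()
        if len(hosts) >= threshold
    }


def passive_check_line(host, service, code, output):
    """Format one passive service check result as send_nsca input."""
    return f"{host}\t{service}\t{code}\t{output}\n"
```

Given problems for `HTTP` on `web1` and `web2` plus `MySQL` on `db1`, `correlate` reports only `http`, and a single CRITICAL result could then be submitted for a hypothetical correlator service with `passive_check_line("correlator", "http-correlator", CRITICAL, "http failing on 2 hosts")`.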

Database

Every message exchanged between the manager and the agents is stored in a database. An interface to the database is under development...
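The page does not say which database or schema is used, so the following Python sketch is only an illustration of logging every message as it passes through. SQLite, the `messages` table, and its column names are all assumptions made for the example.

```python
import sqlite3
import time


def open_message_log(path=":memory:"):
    """Open (or create) a message log; the schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " sent_at REAL NOT NULL,"
        " sender TEXT NOT NULL,"
        " recipient TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return db


def log_message(db, sender, recipient, body, sent_at=None):
    """Record one manager/agent message; commits on success."""
    timestamp = time.time() if sent_at is None else sent_at
    with db:  # the connection context manager wraps this in a transaction
        db.execute(
            "INSERT INTO messages (sent_at, sender, recipient, body)"
            " VALUES (?, ?, ?, ?)",
            (timestamp, sender, recipient, body),
        )
```

With a log like this, any later interface can be a plain query, for example `db.execute("SELECT sender, body FROM messages ORDER BY sent_at")`.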



Thu Feb 26, 2009 11:28 PM