Nagios is a monitoring solution for hosts and network services, although it will monitor almost anything you can think of. It is extensible inasmuch as it uses plugins (of which many are already available) to determine whether a host or a service is up or not. When invoked, a Nagios plugin returns a status code (OK, WARNING, CRITICAL) as well as a short description to the Nagios monitoring manager.
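To make the plugin contract a bit more concrete, here is a minimal sketch in Python. The exit codes 0 to 3 are the standard Nagios ones (UNKNOWN is the fourth status, used when the plugin itself cannot decide); everything else, including the "DISK" label, the argument order and the function names, is made up for illustration and is not one of the plugins we actually run.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value onto a status, assuming higher is worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run(argv):
    """Hypothetical command line: plugin <percent_used> <warn> <crit>.

    Returns the exit code and the one-line description Nagios displays.
    """
    try:
        value, warn, crit = (float(a) for a in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments: the plugin cannot judge.
        return UNKNOWN, "DISK UNKNOWN - usage: <percent_used> <warn> <crit>"
    status = evaluate(value, warn, crit)
    return status, f"DISK {LABELS[status]} - {value}% used"
```

A real plugin would print the description on standard output and then pass the code to sys.exit(), since Nagios reads the status from the process exit code.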
For our remote mail servers, I wanted to ensure that we could use our corporate Nagios installation to do the monitoring of the distant hosts. Because the machines sit behind firewalls which we don't control, our only option is to connect to them via HTTPS, which is what I'll be using.
Our central Nagios installation uses NRPE to invoke a check_xmlrpc plugin on a host in our DMZ, giving it the name of the target machine and the service to check for. That plugin then contacts an XML-RPC service via HTTPS on the distant host to perform the desired check and return the usual Nagios-like status such as OK, WARNING or CRITICAL to the plugin on our side of the world. And so the status goes back to the Nagios manager for further processing. The service on the distant host is a simple script built around XML-RPC for PHP.
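The client side of that chain could look roughly like the following Python sketch. It is not the actual check_xmlrpc plugin: the endpoint path, the remote method name "check" and the shape of the reply (a struct with "code" and "text") are all assumptions on my part. Only xmlrpc.client from the standard library is used.

```python
import xmlrpc.client

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def make_proxy(host):
    """Build an HTTPS XML-RPC proxy; the URL path is a hypothetical example."""
    return xmlrpc.client.ServerProxy(f"https://{host}/monitor/xmlrpc.php")


def check_remote(proxy, service):
    """Ask the distant host to run one check and relay its Nagios status.

    Assumes the remote side exposes a method called check(service) that
    returns a struct such as {"code": 0, "text": "LDAP OK - ..."}.
    """
    try:
        reply = proxy.check(service)
        code = int(reply["code"])
        text = str(reply["text"])
    except (xmlrpc.client.Error, OSError, KeyError, TypeError, ValueError) as exc:
        # Transport problems and malformed replies: we cannot judge the service.
        return UNKNOWN, f"UNKNOWN - {service}: {exc}"
    if code not in (OK, WARNING, CRITICAL):
        return UNKNOWN, f"UNKNOWN - {service}: unexpected status code {code}"
    return code, text
```

Calling check_remote(make_proxy("mail.example.com"), "ldap") would perform the round trip (the host name is a placeholder). Reporting an unreachable host as UNKNOWN rather than CRITICAL is a choice made for this sketch; the opposite is just as defensible.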
As soon as the first mini-server goes live, we'll be monitoring disk capacities, the health of the LDAP server and of course the uptime of the system (I want to know if someone has power-cycled the system).
This nice girl (or a grumpy old colleague of hers) will be flying our CD-ROMs out to Hong Kong this afternoon. I hope she handles the wares gently.
Because our clients are so unhappy about the bad IMAP connections, we've created mini mail servers which will be placed directly in the LANs on site, thereby effectively hiding any bandwidth and latency issues from the client (I say hide because the MUA will be connecting to a local server in the local network; that in turn is fed from Europe, but the delays that appear there will not affect the users).
A self-installing CD-ROM has been cut. It carries a CentOS 4.2 Linux distribution as well as a preconfigured Exim mail server, an OpenLDAP directory server which is fed directly from the source with delta-syncrepl, and a Dovecot server acting as POP3 server. There are also a number of bits and pieces, including of course OpenSSH so that we can maintain the boxes, and a mod_ssl-enabled Apache web server which provides a number of web services with which I'll be monitoring the systems.
The self-installing CD-ROM uses kickstart to launch the installation, and most of the hard work is done in the postinstall scripts (setting up networking, SSL certificates, loading the LDAP server with a base LDIF, etc.)
I've tested the crap out of the system, so it ought to work rather nicely (famous last words), and we'll be cutting the CD-ROMs tomorrow.
Ever since I first laid hands on OpenLDAP, I've been accustomed to synchronizing the master server with a number of slaves utilizing slurpd. This has always worked very well for us and, being a push replication, it has the advantage that it can easily be monitored for errors in the replication. The reject files created by slurpd if updates couldn't be performed on the slaves can easily be monitored and the replication log file itself can be monitored for "growth", which indicates that one or more slave servers have gone South.
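Both checks are easy to script. The following Python sketch is only an illustration of the idea, not what we run: the file locations are passed in by the caller, the ".rej" suffix is how I'd expect the reject files to be named, and a plain size threshold stands in for "growth" (a real check might compare the size between two runs instead).

```python
import glob
import os

# Nagios plugin exit codes used here.
OK, WARNING, CRITICAL = 0, 1, 2


def inspect(replog_path, reject_dir):
    """Return the replication log size and the names of non-empty reject files."""
    size = os.path.getsize(replog_path) if os.path.exists(replog_path) else 0
    rejects = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(reject_dir, "*.rej"))
        if os.path.getsize(path) > 0
    )
    return size, rejects


def evaluate_replication(replog_bytes, reject_files, warn_bytes, crit_bytes):
    """Turn the two observations into a Nagios status and description."""
    if reject_files:
        names = ", ".join(reject_files)
        return CRITICAL, f"REPL CRITICAL - rejected updates in: {names}"
    if replog_bytes >= crit_bytes:
        return CRITICAL, f"REPL CRITICAL - replication log at {replog_bytes} bytes"
    if replog_bytes >= warn_bytes:
        return WARNING, f"REPL WARNING - replication log at {replog_bytes} bytes"
    return OK, f"REPL OK - replication log at {replog_bytes} bytes"
```

The thresholds are left to the caller because a sensible size depends entirely on how busy the directory is.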
For the dozen (!) mini mail servers I'm installing very far away from Europe, I've opted for OpenLDAP's LDAP sync replication, in which the slaves "pull" updates from the master. Let me begin by saying that I'm bloody impressed by its performance.
I'm implementing delta-syncrepl in "refreshAndPersist" mode. The slave (called a consumer) connects to the master (the provider) and pulls the LDAP modifications the latter has recorded in its access log. The "Persist" bit means that the consumer keeps the connection open to the provider, receiving updates almost instantaneously when they occur on the provider. So far so good.
What is a bit difficult in this scenario is to monitor that the updates are being "consumed" (i.e. that the slave is pulling the updates). In order to monitor this, I'm periodically updating a counter on the master and using a custom-built Nagios plugin to check the value of the counter on the slaves. Time will tell if this is the best implementation, but I think it should be OK.
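The comparison at the heart of such a plugin is simple. Here is a minimal Python sketch of it, assuming the counter is an integer that only ever increases and that both values have already been read from the provider and the consumer (the LDAP lookups are left out). The function name and the lag thresholds are invented for the example.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_counter(provider_value, consumer_value, warn_lag=2, crit_lag=5):
    """Compare the counter on the consumer with the one on the provider.

    The lag is the number of counter updates the consumer has not yet
    received; the default thresholds are arbitrary example values.
    """
    if provider_value is None or consumer_value is None:
        return UNKNOWN, "SYNC UNKNOWN - counter could not be read"
    lag = provider_value - consumer_value
    if lag < 0:
        # A consumer ahead of its provider means something else is wrong.
        return UNKNOWN, f"SYNC UNKNOWN - consumer is ahead of provider by {-lag}"
    if lag >= crit_lag:
        return CRITICAL, f"SYNC CRITICAL - consumer is {lag} updates behind"
    if lag >= warn_lag:
        return WARNING, f"SYNC WARNING - consumer is {lag} updates behind"
    return OK, f"SYNC OK - consumer is {lag} updates behind"
```

Tolerating a lag of one by default allows for the plugin running just as the counter is being bumped on the provider.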
I've probably said this a hundred times before, but I gladly repeat myself: the OpenLDAP team has created a fabulous bit of software. Thank you!
It is cold in the house. When I finally noticed at about 8:00 PM I called our friendly plumber who immediately drove out to see us (it helps a great deal when you pay your bills on time; people tend to be more willing to help).
After taking the cowling off the heating and checking all sorts of things, the guy asks me if we have enough fuel in the tanks.
Oops
Note to self: first thing tomorrow morning, order 6000 litres of oil.
I was checking up on results of Google Blogsearch, and I couldn't find my second to last posting. I then found it:

How did that happen? Did Ping-O-Matic set that date for me?