Jeff Roberts
RHCE #804006066322833

Vim-Fu

using Cacti to monitor a large scale infrastructure in Amazon’s EC2

Like a lot of projects, it started small and easily managed, but quickly grew into something else, something that took thought and effort to control and shape. Knowing how your servers and applications are running can help you fine-tune them, or even discover deficiencies in your app or architecture. Knowing what happened in the last moments before a crash can help you diagnose hardware or application problems and possibly avoid them in the future. At OpenX we chose Cacti to do this. Getting Cacti installed and talking to 20 machines is one thing; getting it to monitor several hundred, and automating the addition of new machines so that you can take advantage of EC2's lateral scaling, is what this post is all about.

Installing Cacti is trivial; it requires only a LAMP stack to run. I chose the cacti-0.8.7d-1 RPM available from RPMforge, which has the improved LDAP support that previous versions sorely lacked.

[Figure: Memcached Stats Displayed in Cacti]

With the RPM installed, the remaining setup steps are:

  • start up MySQL
  • create the Cacti db
  • add an account for the Cacti software to use
  • add a second account to use for administration
  • start up Apache
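The database steps in that list can be scripted. Below is a minimal sketch, assuming a database named cacti and MySQL account names and passwords that you choose (all hypothetical values). It only prints the SQL, so you can review it before piping it into mysql. Cacti's schema still has to be loaded afterwards from the cacti.sql file the package should include; its location varies, so `rpm -ql cacti | grep cacti.sql` is a quick way to find it.

```shell
#!/bin/sh
# Sketch: print the SQL that creates the Cacti database and two MySQL accounts,
# one for the Cacti software and one for administration.
# All names and passwords are arguments (hypothetical values).

sql_quote() {
  # Double any single quotes so the value is safe inside '...'.
  printf '%s' "$1" | sed "s/'/''/g"
}

cacti_db_sql() {
  db=$1 app_user=$2 app_pass=$3 admin_user=$4 admin_pass=$5
  printf 'CREATE DATABASE IF NOT EXISTS `%s`;\n' "$db"
  # Account the Cacti software connects with.
  printf "CREATE USER '%s'@'localhost' IDENTIFIED BY '%s';\n" \
    "$(sql_quote "$app_user")" "$(sql_quote "$app_pass")"
  printf "GRANT ALL PRIVILEGES ON \`%s\`.* TO '%s'@'localhost';\n" \
    "$db" "$(sql_quote "$app_user")"
  # Separate administration account that may also grant privileges.
  printf "CREATE USER '%s'@'localhost' IDENTIFIED BY '%s';\n" \
    "$(sql_quote "$admin_user")" "$(sql_quote "$admin_pass")"
  printf "GRANT ALL PRIVILEGES ON \`%s\`.* TO '%s'@'localhost' WITH GRANT OPTION;\n" \
    "$db" "$(sql_quote "$admin_user")"
}
```

Usage: `cacti_db_sql cacti cactiuser 'app-secret' cactiadmin 'admin-secret' | mysql -u root -p`, then `mysql -u root -p cacti < /path/to/cacti.sql`.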

All done, right? Hardly. Now comes the task of getting Cacti to talk to your hosts, which takes about 20 mouse clicks per host, plus installing the net-snmp package on each machine and customizing its /etc/snmp/snmpd.conf file. That doesn't scale. Luckily, Cacti has an API that we can use to add new machines, add graphs for a wide variety of measurable events, and organize those graphs into graph trees. Combine the API with a change management package like Slack, and you have the tools to set up monitoring on hundreds of machines in no time.

Before I get into the API, I should talk a little about the other major components we use to scale the procedure described above. At OpenX, we have written an API around Amazon's provided tool set (see Grig Gheorghiu's blog post “Experiences deploying a large-scale infrastructure in Amazon EC2” at agiletesting.blogspot.com for more about this). Our deployment command stores data about our EC2 hosts in a centralized database, so that scripts can programmatically pull things like host name and IP address and use them to automate tasks such as adding SNMP monitoring for a host. We also use Slack, a simple change management tool developed by Google. Slack assigns hosts to roles; each role contains the files you want on its hosts, plus scripts to run before and/or after those files are copied over. Together, our API, MySQL and Slack let us automate the deployment and customization of all our EC2 hosts. Setting up MySQL to store data about your hosts is beyond the scope of this post, as is installing Slack; I'll assume you have both installed and working.

Install the net-snmp package on one box and pull its /etc/snmp/snmpd.conf file. Once modified, this is the file we'll put into Slack for distribution to all the hosts we want to monitor. Find this section:

##       sec.name  source          community
com2sec local     localhost         public

and add:

com2sec mynetwork cacti_server_ip     my_community

Note: In EC2, if the Cacti server and the monitored hosts are in different regions (US and EU, for example), you must use the Cacti server's external IP address.
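If you would rather script this edit than make it by hand, the sketch below inserts the line right after the existing com2sec local entry and does nothing when an identical mynetwork line is already present. It assumes the file's group and access lines further down already map the mynetwork security name to a read-only view; check that, because without that mapping snmpd will not answer Cacti. The function name and the values in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: add "com2sec mynetwork <cacti_ip> <community>" to an snmpd.conf,
# right after the "com2sec local" entry, unless an identical line exists.

add_cacti_com2sec() {
  conf=$1 cacti_ip=$2 community=$3
  line="com2sec mynetwork $cacti_ip $community"
  # Already configured: compare fields so spacing differences don't matter.
  if awk -v ip="$cacti_ip" -v c="$community" '
       $1 == "com2sec" && $2 == "mynetwork" && $3 == ip && $4 == c { found = 1 }
       END { exit !found }' "$conf"; then
    return 0
  fi
  # Print every line, adding ours after the first "com2sec local" entry;
  # if that entry is missing, append the line at the end of the file.
  awk -v add="$line" '
    { print }
    !done && $1 == "com2sec" && $2 == "local" { print add; done = 1 }
    END { if (!done) print add }
  ' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}
```

Usage (hypothetical path and values): `add_cacti_com2sec path/to/cacti-role/files/etc/snmp/snmpd.conf 203.0.113.20 my_community`.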

On your Slack server, create a Slack role called ‘cacti’. Put your modified snmpd.conf in the role's files section under etc/snmp/, and in the scripts/postinstall file put something similar to:

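#!/bin/bash
# Slack postinstall script for the 'cacti' role
yum install -y net-snmp     # provides snmpd
chkconfig snmpd on          # start snmpd at boot
service snmpd restart       # pick up the new /etc/snmp/snmpd.conf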

This installs the required package on the host to be monitored, starts the service, and enables it at boot. Remember to make this file executable, or Slack will not run it. On the host, run Slack by hand or from a cron job every 10 minutes or so, and it will install and configure snmpd on its next run. You can use a lock file to keep Slack from repeating the install every time it runs.
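One way to use a lock file is sketched below: wrap the install commands in a hypothetical run_once helper that skips them once a marker file exists. The marker path is an assumption; any location that survives reboots works.

```shell
#!/bin/sh
# Sketch of a lock-file guard for the Slack postinstall script.
# LOCK_FILE is a hypothetical path; override it if you prefer another location.
LOCK_FILE="${LOCK_FILE:-/var/lib/slack-cacti-snmpd.done}"

run_once() {
  # A previous Slack run already finished the work: do nothing.
  if [ -e "$LOCK_FILE" ]; then
    echo "already configured"
    return 0
  fi
  # Run the command; create the lock only on success so a failed install is retried.
  "$@" || return 1
  touch "$LOCK_FILE"
}
```

In the postinstall script this would look like `run_once sh -c 'yum install -y net-snmp && chkconfig snmpd on && service snmpd restart'`. Because the guard also skips the restart, delete the lock file whenever you push a changed snmpd.conf so the next Slack run restarts snmpd.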

That's it for the host side. Because every host in the cacti role gets snmpd installed and configured automatically, adding hosts scales laterally with no extra work. The second part of the automation happens on the Cacti server, through Cacti API calls that insert the new host into the Cacti database, add the desired graphs, and finally add the host and its graphs to a graph tree so that you can view them in an organized way.

Before you can do anything on the server side, you need to open up a port so that Cacti can contact your machines. SNMP uses UDP port 161, which is where snmpd listens by default, so in EC2 use the ec2-authorize command to allow UDP traffic on port 161 from the Cacti server into your hosts' security group.

ec2-authorize -C ec2_cert -K ec2_private_key security_group_name -P udp -p 161 -s ip_address_of_cacti_server/32

Once this is done, we can start using the Cacti API to interact with our hosts.

The API commands live in cacti_home/cli/ and consist of 13 or so PHP scripts, the most pertinent being add_device.php, add_graphs.php and add_tree.php. I ran them all by hand first to learn how they worked before automating them:

ADD DEVICE

php -q /mnt/var/www/cacti/cli/add_device.php --list-host-templates

This first command lists the available host templates on the Cacti server. It should return something like:

Valid Host Templates: (id, name)

1       Generic SNMP-enabled Host
3       ucd/net SNMP Host
4       Karlnet Wireless Bridge
5       Cisco Router
6       Netware 4/5 Server
7       Windows 2000/XP Host
8       Local Linux Machine
9       Memcached Server

Note the number of the template you want. For this post I'll use #3, which talks to the net-snmp package installed above. If your snmpd.conf uses a community other than public, also pass --community=my_community to add_device.php.
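Hard-coding 3 works on this server, but template ids can differ between Cacti installs. Below is a sketch of looking the id up by name instead, assuming the listing format shown above (an id, whitespace, then the name); template_id is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: read "add_device.php --list-host-templates" output on stdin and
# print the id whose template name matches $1 exactly (exit 1 if none does).

template_id() {
  awk -v want="$1" '
    /^[0-9]+[ \t]/ {
      id = $1
      name = $0
      sub(/^[0-9]+[ \t]+/, "", name)   # drop the id column
      sub(/[ \t\r]+$/, "", name)       # drop trailing whitespace
      if (name == want) { print id; found = 1; exit }
    }
    END { exit !found }'
}
```

Usage: `TEMPLATE_ID=$(php -q /mnt/var/www/cacti/cli/add_device.php --list-host-templates | template_id 'ucd/net SNMP Host')`.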

php -q /mnt/var/www/cacti/cli/add_device.php --description="host.description" --ip="hostname or IP" --template=3

This should return:

Adding s (s) as "ucd/net SNMP Host" using SNMP v1 with community "public"
Success - new device-id: (5)
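An automated run can't eyeball the id, so it has to be captured from this output. A minimal sketch, assuming the success line keeps the `new device-id: (N)` format shown above; device_id is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: extract N from a "Success - new device-id: (N)" line on stdin.
# Prints nothing and exits non-zero if no such line is present.

device_id() {
  sed -n 's/.*new device-id: (\([0-9][0-9]*\)).*/\1/p' | head -n 1 | grep .
}
```

Usage: `HOST_ID=$(php -q /mnt/var/www/cacti/cli/add_device.php --description="host.description" --ip="hostname or IP" --template=3 | device_id) || echo "add_device.php failed" >&2`.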

Note the device-id and plug it into the add_graphs.php commands below. The following commands add a “standard” set of graphs to be monitored:

INTERFACE GRAPHS

php -q /mnt/var/www/cacti/cli/add_graphs.php --host-id=5 --graph-type=ds --graph-template-id=2 --snmp-query-id=1 --snmp-query-type-id=13 --snmp-field=ifOperStatus --snmp-value=Up

CPU

php -q /mnt/var/www/cacti/cli/add_graphs.php --host-id=5 --graph-type=cg --graph-template-id=4

MOUNTED VOLUMES

Use add_graphs.php with --list-snmp-values (together with --host-id=5 and --snmp-field=dskDevice) to see all mounted partitions. Then issue the following command for each returned partition that you want to monitor. I am using /dev/sda2 as an example.

php -q /mnt/var/www/cacti/cli/add_graphs.php --host-id=5 --snmp-query-id=2 --snmp-query-type-id=6 --snmp-field=dskDevice --graph-template-id=3 --graph-type=ds --snmp-value=/dev/sda2
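Issuing the command once per partition is easy to loop. A sketch, using the query and template ids from this section (2, 6 and 3, which are specific to this Cacti install) and a hypothetical add_disk_graphs helper:

```shell
#!/bin/sh
# Sketch: add one disk-space graph per partition, one add_graphs.php call each.
CACTI_CLI="${CACTI_CLI:-/mnt/var/www/cacti/cli}"

add_disk_graphs() {
  host_id=$1; shift
  for dev in "$@"; do
    php -q "$CACTI_CLI/add_graphs.php" --host-id="$host_id" \
      --snmp-query-id=2 --snmp-query-type-id=6 --snmp-field=dskDevice \
      --graph-template-id=3 --graph-type=ds --snmp-value="$dev" || return 1
  done
}
```

Usage: `add_disk_graphs 5 /dev/sda1 /dev/sda2`.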

LOAD

php -q /mnt/var/www/cacti/cli/add_graphs.php --host-id=5 --graph-type=cg --graph-template-id=11

MEMORY

php -q /mnt/var/www/cacti/cli/add_graphs.php --host-id=5 --graph-type=cg --graph-template-id=13
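Once the device id is known, the interface, CPU, load and memory commands above can run as one function. A sketch with the ids from this post; they depend on your Cacti install, so compare them with `add_graphs.php --list-graph-templates` first. The per-partition disk graphs are left out because their values differ per host. add_standard_graphs is a hypothetical name.

```shell
#!/bin/sh
# Sketch: add this post's "standard" graphs for one Cacti host id.
CACTI_CLI="${CACTI_CLI:-/mnt/var/www/cacti/cli}"

add_standard_graphs() {
  host_id=$1
  # Traffic graphs for every interface whose ifOperStatus is Up.
  php -q "$CACTI_CLI/add_graphs.php" --host-id="$host_id" --graph-type=ds \
    --graph-template-id=2 --snmp-query-id=1 --snmp-query-type-id=13 \
    --snmp-field=ifOperStatus --snmp-value=Up || return 1
  # CPU (4), load (11) and memory (13) graph templates.
  for tmpl in 4 11 13; do
    php -q "$CACTI_CLI/add_graphs.php" --host-id="$host_id" \
      --graph-type=cg --graph-template-id="$tmpl" || return 1
  done
}
```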

ADD DEVICE TO DEFAULT TREE

Use this command to list the available trees:

php -q /mnt/var/www/cacti/cli/add_tree.php --list-trees

And this command to add the host to the chosen graph tree. Tree 1 is the default graph tree.

php -q /mnt/var/www/cacti/cli/add_tree.php --type=node --node-type=host --tree-id=1 --host-id=5

[Figure: CPU Usage Displayed in Cacti]

If you look at the default graph tree in Cacti now, you should see your host listed, with all of the above graphs present and data trickling in. Now that it works by hand, let's tie it all together and automate adding each machine to Cacti.

Automating the above steps is a simple matter. Our deploy script handles the EC2 side: creating the new host instance and putting its data into the database. A separate script, which I called “add2cacti”, executes the API calls on the Cacti server remotely via ssh, using variables populated from the data stored in the database.

The following outline should be a good starting point no matter what language you write your script in; I chose Perl.

  • determine the proper EC2 credentials to use and populate $cert and $priv_key
  • run the ec2-authorize command so that the Cacti server can reach the new host's SNMP port
  • add the host to Cacti and collect the host_id
  • add the graphs for the new host_id
  • based on the hostname, choose a graph tree to put the host in
  • add the host_id to the proper graph tree
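The author's add2cacti is Perl and isn't shown here; below is a shell sketch of the same outline, not that script. Hypothetical pieces: CACTI_HOST, CACTI_IP, SECURITY_GROUP, the hostname-to-tree mapping, and all helper names. It handles the credentials step by relying on EC2_CERT and EC2_PRIVATE_KEY, the environment variables the EC2 API tools read, and it adds only the CPU, load and memory graphs. EC2 rules apply per security group, so the port really only needs opening once per group; the sketch reports a failed ec2-authorize (for example a duplicate rule) but carries on.

```shell
#!/bin/sh
# Shell sketch of an add2cacti-style script (hypothetical names and defaults).
CACTI_HOST="${CACTI_HOST:-cacti.example.com}"   # ssh target
CACTI_IP="${CACTI_IP:-192.0.2.10}"              # Cacti server's IP
CACTI_CLI="${CACTI_CLI:-/mnt/var/www/cacti/cli}"
SECURITY_GROUP="${SECURITY_GROUP:-default}"

# Quote one argument for the remote shell: wrap in '...', escape embedded quotes.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Run a Cacti CLI script on the Cacti server, e.g. cacti_cli add_tree.php --list-trees
cacti_cli() {
  script=$1; shift
  remote="php -q $(shell_quote "$CACTI_CLI/$script")"
  for arg in "$@"; do
    remote="$remote $(shell_quote "$arg")"
  done
  ssh "$CACTI_HOST" "$remote"
}

# Let the Cacti server poll the security group on UDP 161 (non-fatal on failure).
open_snmp_port() {
  ec2-authorize "$SECURITY_GROUP" -P udp -p 161 -s "$CACTI_IP/32" ||
    echo "ec2-authorize failed (the rule may already exist)" >&2
}

# Pick a graph tree from the hostname (hypothetical tree ids).
tree_for_host() {
  case "$1" in
    web*)      echo 2 ;;
    memcache*) echo 3 ;;
    db*)       echo 4 ;;
    *)         echo 1 ;;   # Default Tree
  esac
}

add2cacti() {
  name=$1 ip=$2
  open_snmp_port
  out=$(cacti_cli add_device.php --description="$name" --ip="$ip" --template=3) || return 1
  host_id=$(printf '%s\n' "$out" | sed -n 's/.*new device-id: (\([0-9][0-9]*\)).*/\1/p')
  if [ -z "$host_id" ]; then
    echo "add_device.php failed for $name: $out" >&2
    return 1
  fi
  # CPU, load and memory graphs (template ids from this post).
  for tmpl in 4 11 13; do
    cacti_cli add_graphs.php --host-id="$host_id" --graph-type=cg \
      --graph-template-id="$tmpl" || return 1
  done
  cacti_cli add_tree.php --type=node --node-type=host \
    --tree-id="$(tree_for_host "$name")" --host-id="$host_id"
}
```

A batch of hosts is then a plain loop: `for h in web01:10.0.0.11 web02:10.0.0.12; do add2cacti "${h%%:*}" "${h#*:}"; done`.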

Once the host instance has launched and Slack has run at least once, I run the add2cacti script to add the host and complete the process. I typically add several machines at once using a simple ‘for’ loop, which makes adding 10 machines or 100 machines equally trivial. The add2cacti script can also be called from any other process, such as the deploy script, so that every new machine is added to Cacti automatically.

3 comments to using Cacti to monitor a large scale infrastructure in Amazon’s EC2

  • Aaron

    Do you have an automated way to remove hosts from Cacti? I haven't found an API script to do this; I was wondering if you knew of a way.

    Good write up.

    Aaron

    • jroberts

      Aaron, Thanks for the comment. To answer your question, no I don’t have a good way to remove machines from Cacti, I can’t believe they left that out of the API. I imagine you can remove the machine from mysql, but I have not tested it yet. I’m going to need to do this very soon and I’ll, of course, post my findings here.

  • Hey Jeff — great to see that you’re posting sysadmin-related articles. Great post on Cacti, will bookmark it ;-)

    Grig
