Monitoring Multiple Clusters using Ganglia

First of all, let me roughly introduce what Ganglia is. Ganglia is "a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids" (taken from the http://ganglia.sourceforge.net homepage). So basically it is for people who want to have an idea of what their bunch of supercomputers is doing as a whole (it monitors individual nodes too).

I have been using Ganglia to monitor clusters, but I had it installed on the master node of every cluster, so to check how, let’s say, ClusterX was doing I had to open that master node’s Ganglia web interface and browse through it, which is not very practical. So I decided to move the Ganglia web interface onto a single master node and show each cluster’s stats separately, but within a single web interface.

I poked around for an easy way of doing that, but could not find any clear documentation (I know, my search skills suck). I was also told on IRC #ganglia that I needed a separate gmond process on the Ganglia master node for every single cluster I wanted to monitor separately, which kind of made sense, so I obviously tried that. So guess what? It didn’t work (maybe because I had not read the official Ganglia documentation, which is not very detailed anyway IMHO), and I got totally confused about the way Ganglia works and gathers data from various nodes.

I thought I would share with you how I got what I originally wanted to achieve.

The Ganglia monitoring suite consists of three main parts: gmond, gmetad and the web interface, usually called ganglia-web. Long story short:

  • gmond – a daemon which needs to run on every single node that is to be monitored. It gathers monitoring statistics and sends them to, as well as receives them from, the other nodes on the same multicast or unicast channel.
  • gmetad – a collector daemon which needs to run on the actual Ganglia master node; normally it is installed together with the web interface.
  • ganglia-web – this component explains itself: it is a bunch of PHP scripts.

I will not explain how to install Ganglia or how to set up a web server to serve the web UI etc. – I believe it is very simple. Instead I will try to explain a slightly more complex setup. Let’s start with an imaginary setup of one’s network of clusters.

Let’s say we have three clusters (ClusterOne, ClusterTwo and ClusterThree) in the same broadcast domain (same network). Instead of having three separate Ganglia web interfaces and gmetad collector daemons, we can have one of each on the node0.c1 node, which can then collect stats from three different multicast (in our case) channels, one per cluster.

So which components are needed on which server:

  • ganglia-gmond is needed on every single node
  • ganglia-gmetad and ganglia-web are needed on node0.c1 only (let’s say we want to dedicate node0.c1 as the Ganglia web interface and stats collector)

And here are the relevant snippets of the configuration files:

  • /etc/gmond.conf identical on ClusterOne nodes (node0, node1, node2, node3) – I will specify the part which is the most important:
# /etc/gmond.conf - on ClusterOne
cluster {
  name = "ClusterOne"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8661
  ttl = 1
}

udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8661
  bind = 239.2.11.71
}

tcp_accept_channel {
  port = 8661
}
  • /etc/gmond.conf identical on ClusterTwo nodes (node0, node1, node2, node3):
# /etc/gmond.conf - on ClusterTwo
cluster {
  name = "ClusterTwo"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8662
  ttl = 1
}

udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8662
  bind = 239.2.11.71
}

tcp_accept_channel {
  port = 8662
}
  • /etc/gmond.conf identical on ClusterThree nodes (node0, node1, node2, node3):
# /etc/gmond.conf - on ClusterThree
cluster {
  name = "ClusterThree"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8663
  ttl = 1
}

udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8663
  bind = 239.2.11.71
}

tcp_accept_channel {
  port = 8663
}
  • /etc/gmetad.conf – only exists on node0.c1 (again the most important part below):
# /etc/gmetad.conf on node0.c1
data_source "ClusterOne" 30 node0.c1:8661 node1.c1:8661
data_source "ClusterTwo" 30 node0.c2:8662 node1.c2:8662
data_source "ClusterThree" 30 node0.c3:8663 node1.c3:8663
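The three gmond.conf snippets above differ only in the cluster name and the port, so they could be generated from one template instead of being edited by hand. Here is a minimal Python sketch of that idea; GMOND_TEMPLATE, CLUSTERS and render_gmond_conf are names made up for this illustration and are not part of Ganglia, and the template contains only the sections shown in this post, not a complete gmond.conf.

```python
# Hypothetical helper (not part of Ganglia): renders the gmond.conf sections
# shown in this post for a given cluster name and port.
GMOND_TEMPLATE = """\
# /etc/gmond.conf - on {name}
cluster {{
  name = "{name}"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}}

udp_send_channel {{
  mcast_join = {group}
  port = {port}
  ttl = 1
}}

udp_recv_channel {{
  mcast_join = {group}
  port = {port}
  bind = {group}
}}

tcp_accept_channel {{
  port = {port}
}}
"""

# Cluster names and ports as used in this post; every cluster needs its own port.
CLUSTERS = {"ClusterOne": 8661, "ClusterTwo": 8662, "ClusterThree": 8663}


def render_gmond_conf(name, port, group="239.2.11.71"):
    """Return the gmond.conf sections for one cluster as a string."""
    return GMOND_TEMPLATE.format(name=name, port=port, group=group)


def check_unique_ports(clusters):
    """Raise ValueError if two clusters would share the same port."""
    ports = list(clusters.values())
    if len(ports) != len(set(ports)):
        raise ValueError("each cluster must use its own port")


if __name__ == "__main__":
    check_unique_ports(CLUSTERS)
    for cluster_name, cluster_port in CLUSTERS.items():
        print(render_gmond_conf(cluster_name, cluster_port))
```

The port uniqueness check is there because a shared port on the same multicast address would put the nodes of two clusters on the same channel.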

Notice that I did not list all the nodes of each cluster as data sources above (imagine if you had a thousand nodes per cluster :-) ), because it is not necessary. Think of the clusters as three different pools, each with its own virtual boundaries. The gmetad daemon polls the configured data sources of each cluster (every 30 seconds here). If one of the listed nodes dies, the other one can still provide stats for the whole cluster to gmetad, because the gmond daemons of a cluster exchange stats with each other over their configured UDP channel.
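To make the failover idea more concrete, here is a rough Python sketch of "ask the listed nodes in order and take the first one that answers". It is only an illustration of the behaviour described above, not gmetad’s actual code. It relies on the fact that gmond sends an XML dump of the cluster state to any client that connects to its tcp_accept_channel port and then closes the connection. The function names fetch_gmond_xml and poll_data_source are made up for this example.

```python
import socket


def fetch_gmond_xml(host, port, timeout=5.0):
    """Connect to a gmond tcp_accept_channel and read the XML dump it sends."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection once the dump is sent
                break
            chunks.append(data)
    return b"".join(chunks)


def poll_data_source(addresses, fetch=fetch_gmond_xml):
    """Try each (host, port) in order; return (address, xml) of the first that answers.

    Raises ConnectionError if none of the listed nodes can be reached.
    """
    errors = []
    for host, port in addresses:
        try:
            return (host, port), fetch(host, port)
        except OSError as exc:  # refused, unreachable, timed out, ...
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("no data source reachable: " + "; ".join(errors))
```

With the gmetad.conf above, poll_data_source([("node0.c1", 8661), ("node1.c1", 8661)]) would return the dump from node1.c1 whenever node0.c1 is down, and that dump still describes every node of ClusterOne.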

Now all you have to do is configure your web server on node0.c1, start gmetad (the default location for RRDs is /var/lib/ganglia/rrds) and start the gmond services on all the clusters. You should then have a working monitoring system for your three clusters on a single node.
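If nodes show up under the wrong cluster, it can help to look at what a gmond port actually reports before blaming the web interface. Below is a small Python sketch, assuming you have already captured the XML that gmond sends to a client connecting to its tcp_accept_channel port. SAMPLE is a hand-written, heavily trimmed stand-in for such a dump (real dumps carry many more attributes and METRIC elements), and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed stand-in for a gmond XML dump (illustration only).
SAMPLE = b"""<GANGLIA_XML SOURCE="gmond">
<CLUSTER NAME="ClusterOne" OWNER="unspecified" LATLONG="unspecified" URL="unspecified">
<HOST NAME="node1.c1" IP="192.0.2.11"/>
<HOST NAME="node0.c1" IP="192.0.2.10"/>
</CLUSTER>
</GANGLIA_XML>"""


def summarize_gmond_xml(xml_bytes):
    """Return {cluster name: sorted list of host names} from a gmond XML dump."""
    root = ET.fromstring(xml_bytes)
    return {
        cluster.get("NAME"): sorted(host.get("NAME") for host in cluster.iter("HOST"))
        for cluster in root.iter("CLUSTER")
    }


def unexpected_clusters(summary, expected_name):
    """Return the cluster names in the summary that differ from the expected one."""
    return sorted(name for name in summary if name != expected_name)


if __name__ == "__main__":
    summary = summarize_gmond_xml(SAMPLE)
    print(summary)
    print("unexpected:", unexpected_clusters(summary, "ClusterOne"))
```

If the dump from port 8662 lists hosts that belong to ClusterOne, those hosts are most likely still sending on the wrong channel or still carry the wrong cluster name in their gmond.conf.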

If I forgot to mention something or you found some mistakes or lies, please post them in the comments; otherwise I hope this will be useful to some people.

7 thoughts on “Monitoring Multiple Clusters using Ganglia”

  1. Pingback: Monitoramento de Servidores e Clusters com Ganglia 3.1.7 – Instalação « Blog de Carlos Eduardo Mota

  2. Matt Pederson

    So I tried doing what you suggested but all of the nodes show up under 1 cluster. Even though I have some of the servers in gmond.conf with a different cluster they are not showing up on that one

    1. Vaidas Jablonskis Post author

      Are you sure you have specified a different receive/send channel for each cluster? It would help if you could pastebin your config files, so I could have a look. Thanks

  3. Leopoldo

    Hi Vaidas.
    Is the IP number 239.2.11.71 a multicast address?
    My collector node has the IP address 192.168.2.10; do I need to replace that 239… with it?
    Thanks a lot.

    1. Vaidas Jablonskis Post author

      Hi Leopoldo,

      Nope, you do not need to change any existing IPs on your nodes. Ganglia will just use 239.2.11.71 as a multicast channel address where the other nodes will be transmitting and receiving.

  4. bui van quy

    Thanks for your post.
    But I have two hosts in different physical networks. I want to add both of them to one cluster managed by Ganglia, but gmetad always detects only one host: when I stop gmond on host A, gmetad detects host B, and when I stop gmond on host B, gmetad detects host A.

    I don’t know what is happening.
