Intro
Recently I had the joy of building a diskless Linux cluster for parallel regexing, so I decided to document how I accomplished it using free and open source software. Documentation on building a diskless cluster with current Linux distributions is either scarce or badly out of date.
This howto explains how to use a single compressed root file system image, loaded as a ramdisk, for the slave nodes.
My setup/configuration
Since the cluster is going to be used for regexing useful data out of millions of small files, the workload is very CPU and RAM intensive. I am going to use the following hardware:
- 10 x slave nodes: AMD CPUs - 24 cores, 32GiB of RAM
- 1 x master node: AMD CPUs - 8 cores, 8GiB of RAM, RAID5 array of 8 SATA II disks
- 1 x network switch
Software/packages required:
- CentOS Linux 5.6
- nfs client/server
- dhcpd
- tftp-server
- syslinux
- xinetd
- chroot
Additional software may be installed:
- ntp
- pssh (parallel SSH)
Networking and hostnames:
- Subnet - 10.0.0.0/24
- Gateway - 10.0.0.254
- Master node hostname - m0.example.com (10.0.0.1)
- Slave nodes hostnames - s{0..9}.example.com (10.0.0.10-10.0.0.19)
Prerequisites
I assume you have all the servers connected to the network switch, plus a router doing routing or NAT so that at least the master node can download the software needed (unless you have a local yum repo as I do). I also assume you have some CentOS/RHEL administration skills and general common sense. :-)
Master node installation and config
The first step in building your diskless cluster is to install an OS on the master node; in this case I am using CentOS 5.6. The master node's base installation is going to be used as the slave nodes' root filesystem image, so initially try to keep it as minimal as possible (obviously install the packages required for the purpose of your cluster).
The things the master node is going to provide to the slave nodes are:
- dhcp server
- pxe/tftp boot server
- read-only /usr NFS export
- NTP time server
Install the ntp package now, so both the master node and the slave nodes will have it:
yum install ntp
Build a root file system image
Once the OS is installed and the most recent updates are applied, we can start building a root file system image for the slave nodes. The process is pretty straightforward: create a file of 512MiB in size, create a file system on the loop file, mount it, copy and create required files/directories, edit configuration files, chroot into the new environment, create users, enable/disable services etc. Here is a basic script which can be used to do some of the mentioned steps:
#!/bin/bash
# zooz <[email protected]>
# a script to create a basic compressed rootfs for diskless nodes
# set variables
# size in megabytes
rootfs_size="512"
# set mount point for the rootfs
mount_point="rootfs-loop"
# create a rootfs file
dd if=/dev/zero of=rootfs bs=1k count=$(($rootfs_size * 1024))
# create an ext3 file system
mkfs.ext3 -m0 -F -L root rootfs
# create a mount point
mkdir -p $mount_point
# mount the newly created file system
mount -t ext3 -o loop rootfs $mount_point
# cd into it and create required directory structure
cd $mount_point && mkdir -p bin boot dev etc home lib64 \
mnt proc root sbin sys usr/{bin,lib,lib64} var/{lib,log,run,tmp} \
var/lib/nfs tmp var/run/netreport var/lock/subsys
# copy required files into created directories
cp -ap /etc .
cp -ap /dev .
cp -ap /bin .
cp -ap /sbin .
cp -ap /lib .
cp -ap /lib64 .
cp -ap /var/lib/nfs var/lib
cp -ap /usr/bin/id usr/bin
cp -ap /root/.bashrc root/
cp -ap /root/.bash_profile root/
cp -ap /root/.bash_logout root/
# set required permissions
chown root:lock var/lock
# cd out of the mount point
cd ..
The above script creates an ext3 rootfs file with the directory structure and populates it with the binaries and libraries the system needs to boot. You should now have the rootfs mounted on /root/rootfs-loop. Next, bind mount /usr and chroot into the environment:
# bind mount /usr
mount -o bind /usr rootfs-loop/usr
# chroot to the new environment
chroot rootfs-loop /bin/bash
Now you are in your new (node) environment. The following steps are necessary for the nodes to function properly:
- /etc/fstab - the contents of this file should look like below:
/dev/ram0 / ext3 defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
m0.example.com:/usr /usr nfs ro 0 0
- /etc/hosts:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.0.0.1 m0.example.com
10.0.0.10 s0.example.com
10.0.0.11 s1.example.com
10.0.0.12 s2.example.com
10.0.0.13 s3.example.com
10.0.0.14 s4.example.com
10.0.0.15 s5.example.com
10.0.0.16 s6.example.com
10.0.0.17 s7.example.com
10.0.0.18 s8.example.com
10.0.0.19 s9.example.com
- /etc/sysconfig/network - leave HOSTNAME unset, the slave nodes will get their hostnames from DHCP:
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=
- /etc/sysconfig/network-scripts/ifcfg-eth0 - make sure HWADDR is unset: this one image is shared by every node, so a hard-coded MAC would break eth0 on all nodes but one :-)
DEVICE=eth0
BOOTPROTO=dhcp
HWADDR=
ONBOOT=yes
It is always a good idea to have time synced up with the master node or some external NTP source, so we will enable NTP service and point it to sync up with the master node.
chkconfig ntpd on
Edit the /etc/ntp.conf file and set the server option to m0.example.com (configuring NTPD in depth is out of the scope of this post).
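As a minimal sketch, the relevant line in the chroot's /etc/ntp.conf would be:
# sync the slave nodes against the master node
server m0.example.com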
Some additional things you may consider doing: disable all unnecessary services, create users, add public ssh keys (to make node management a lot easier), configure remote syslog etc.
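A quick sketch of a few of these, still inside the chroot (the service names, the worker user and the key path are only examples):
# disable services the nodes will not need
chkconfig cups off
chkconfig bluetooth off
# create an unprivileged user for running jobs
useradd worker
# add the master's public key for passwordless root ssh
mkdir -p /root/.ssh && chmod 700 /root/.ssh
cat /tmp/master_id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys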
The next step is to exit the chrooted environment, unmount the image and compress it:
exit
umount rootfs-loop/usr
umount rootfs-loop
gzip -c rootfs | dd of=rootfs.gz
Now you have your compressed root file system image, rootfs.gz.
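Before touching real hardware you can optionally smoke-test the image under KVM. This is only a sketch: it assumes the qemu-kvm binary is installed and on your PATH, and since /usr comes over NFS the guest will not boot all the way, but it is enough to watch the kernel find and mount the ramdisk:
# boot the compressed rootfs on a ramdisk inside a throwaway VM
qemu-kvm -m 1024 -kernel /boot/vmlinuz-$(uname -r) -initrd rootfs.gz \
    -append "root=/dev/ram ramdisk_size=524288 rw"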
Master node configuration
Right. Let's configure the main services on the master node which are vital for the slave nodes. I assume you have the network interface configured with a static IP and the host name set correctly to m0.example.com (or whatever naming you decided to use).
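For completeness, a static master-side /etc/sysconfig/network-scripts/ifcfg-eth0 might look roughly like this (adjust the device name and addresses to your network):
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.0.0.1
NETMASK=255.255.255.0
GATEWAY=10.0.0.254
ONBOOT=yes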
Now install the necessary software/packages:
yum install xinetd dhcp syslinux tftp-server
DHCP server configuration - /etc/dhcpd.conf
Here is the very basic DHCP daemon config file:
ddns-update-style interim;
ignore client-updates;
subnet 10.0.0.0 netmask 255.255.255.0 {
    # supposedly your router has the 10.0.0.254 address
    option routers 10.0.0.254;
    option subnet-mask 255.255.255.0;
    # address of the tftpboot server
    next-server 10.0.0.1;
    filename "pxelinux.0";
    default-lease-time 432000;
    max-lease-time 432000;
}
# fixed IP configuration for the s0.example.com node
host s0.example.com {
    fixed-address 10.0.0.10;
    hardware ethernet AA:BB:CC:DD:EE:FF;
    option host-name "s0.example.com";
}
If you have more slave nodes, creating a host configuration for every single one can be painful, so I wrote a simple bash script to ease it up a bit. What you need is a file, let's say host_ip_mac.txt, which contains:
s0.example.com 10.0.0.10 AA:BB:CC:DD:EE:FF
s1.example.com 10.0.0.11 AA:BB:CC:DD:EE:00
s2.example.com 10.0.0.12 AA:BB:CC:DD:EE:11
s3.example.com 10.0.0.13 AA:BB:CC:DD:EE:22
And then the below script, say named dhcpd-conf-gen (make it executable of course):
#!/bin/bash
# reads 'hostname ip mac' lines from stdin and prints
# a dhcpd host config block for each node
while read -r hostname ip mac
do
    echo "host $hostname {"
    echo -e "\tfixed-address $ip;"
    echo -e "\thardware ethernet $mac;"
    echo -e "\toption host-name \"$hostname\";"
    echo "}"
    echo
done
Run it and it will spit out a config snippet for every node you listed in the host_ip_mac.txt file; then just paste them into dhcpd.conf:
cat host_ip_mac.txt | ./dhcpd-conf-gen
host s0.example.com {
fixed-address 10.0.0.10;
hardware ethernet AA:BB:CC:DD:EE:FF;
option host-name "s0.example.com";
}
...
Start the service and make sure it is set to start on boot:
service dhcpd start && chkconfig dhcpd on
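If dhcpd refuses to start, it is handy to syntax-check the config first; dhcpd's -t flag only parses the configuration without serving any leases:
dhcpd -t -cf /etc/dhcpd.conf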
tftp boot server configuration - /etc/xinetd.d/tftp
service tftp
{
    socket_type     = dgram
    protocol        = udp
    wait            = yes
    user            = root
    server          = /usr/sbin/in.tftpd
    server_args     = -s /tftpboot -v
    disable         = no
    per_source      = 11
    cps             = 100 2
    flags           = IPv4
}
Start the service and make sure it is set to start on boot:
service xinetd start && chkconfig xinetd on
PXE boot configuration
The PXE boot loader and its configuration file, as well as the linux kernel and the rootfs.gz image, have to be copied under the /tftpboot directory:
# create directories required for pxe bootloader
mkdir -p /tftpboot/{linux,pxelinux.cfg}
# copy pxe boot loader (comes with syslinux package)
cp /usr/lib/syslinux/pxelinux.0 /tftpboot/
# copy the linux kernel so the pxe bootloader can hand it to the nodes
# (renamed to plain 'vmlinuz' to match the pxelinux config below)
cp /boot/vmlinuz-$(uname -r) /tftpboot/linux/vmlinuz
# copy linux root filesystem image
cp /root/rootfs.gz /tftpboot/linux
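At this point you can sanity-check the tftp server from the master node itself, assuming the tftp client is installed (yum install tftp):
# fetch the boot loader back over tftp into the current directory
tftp 10.0.0.1 -c get pxelinux.0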
Create a PXE bootloader config file - /tftpboot/pxelinux.cfg/0A0000
# default is label 'linux'
# boots a linux kernel and mounts rootfs.gz as a root file system on a 512MiB ramdisk
default linux
label linux
kernel linux/vmlinuz
append initrd=linux/rootfs.gz root=/dev/ram ramdisk_size=524288 rw ip=dhcp
The above config looks similar to the one we used to have in the happy LILO days, remember? The append line passes the kernel parameters: rootfs.gz is loaded as the root file system on /dev/ram0, a 512MiB ramdisk, mounted read-write (there is no point in mounting it ro only to remount it rw).
0A0000 is the 10.0.0 network prefix converted into HEX (each octet becomes two hex digits), which means the above config is valid for all nodes with 10.0.0.x IPs. Details on how pxelinux finds its configuration files can be found in the syslinux documentation.
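Rather than doing the hex conversion by hand, note that the syslinux package also ships a small gethostip utility; trimming characters off the end of its output widens the match:
gethostip -x 10.0.0.10
0A00000A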
NFS server configuration
NFS server configuration on Linux and most Unix-like systems is very simple - in our case you will need the /etc/exports below:
/usr 10.0.0.0/24(ro,no_root_squash)
Start the NFS service and make sure it is set to start on boot:
service nfs start && chkconfig nfs on
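A quick way to confirm the export is visible (showmount comes with nfs-utils):
showmount -e 10.0.0.1
Export list for 10.0.0.1:
/usr 10.0.0.0/24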
Powering on slave nodes
Now that we've got everything (I believe) in place, we can power on the slave nodes. So fingers crossed: if you applied a bit of your own brain while following this howto, you should have a fully working cluster for high performance tasks (what tasks? I will leave that to your imagination :-) ).
Cluster Management
If you've forgotten something or want to add more features or change the config files for your slave nodes, scroll back up and follow the instructions on how to mount the rootfs image and chroot into the environment, then unmount it and compress it over to /tftpboot/linux/rootfs.gz.
Also, if you screw something up on the slave nodes, you can always power cycle them and they will be as fresh as new. :-)
Slave node management and control can be done using an awesome tool written in Python called pssh.
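As a taste of what pssh gives you, here is a sketch that runs a command on all nodes in parallel; slave_hosts.txt is just an example file listing the slave hostnames one per line:
# -h takes the host list, -l the remote user, -i prints output inline
pssh -h slave_hosts.txt -l root -i uptime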
Final words
This was my second public blog post, so please post your comments with suggestions, questions and fixes, and I will try to answer them all.
Enjoy! I hope it will be useful for some people, and if it isn't, then I will definitely benefit from this post myself some time in the future.