By Marcel van der Veer
April 2018

Published in Tech Tips


As described in an earlier post, at home I operate a modest Beowulf-type cluster for embarrassingly parallel simulation runs in batch mode. With the experience from that build, I took on building a higher-performance cluster using Debian.

Even when using older machines in a cluster, combined performance can still be acceptable, depending on the application. A small cluster of older machines is like a freight truck: not the fastest option at low load, but it maintains its speed at high load.

Mesoscopic simulation of a phospholipid micelle in water.

When the number of nodes in the cluster is small, an alternative to segmenting a molecular dynamics simulation is to run the actual simulation on one node and the computationally intensive post-processing on one or two others. In this way the cluster is still dedicated to solving a single problem. This scheme also avoids inter-segment communication, which can become a bottleneck.

Drones start in text mode

The best machine in a Beowulf cluster should be the master node, with a nice video card and display. Drones, however, can be started in text mode to save resources. Different ways to start Linux in text mode are described on the net. Contrary to some of those posts, on modern systemd-based installations meddling with the GRUB configuration is no longer necessary. It suffices to give the command:

# systemctl set-default multi-user.target
# reboot

If at any time one wishes to revert to the graphical environment, issue

# systemctl set-default graphical.target

Connecting drones with the internet

My cluster is small enough not to need DHCP. The master node is connected to the world through a wireless connection to a router that connects to the internet. Not all drones have a wireless connection, but it is not hard to share the master's wireless connection with the drones. I use ufw as a front-end to iptables; the procedure to make the internet available cluster-wide is documented here and there in the forums, with a few small catches. First we set up ufw

# apt install ufw

and then activate the firewall. Take care when doing this in an SSH session with a drone, because a fresh firewall without rules denies incoming connections by default and can lock you out of the drone! So we issue

# ufw enable && ufw allow ssh

and we remain connected. Next we set DEFAULT_FORWARD_POLICY with

# vi /etc/default/ufw
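The value to set is not shown above. The usual recipe for sharing a connection through ufw changes the forward policy from DROP to ACCEPT, and the sketch below assumes that is what is meant here. It makes the edit with sed on a scratch copy whose two sample lines are made up for illustration, so nothing on the system changes until UFW_DEFAULTS points at /etc/default/ufw and the commands are run as root.

```shell
# Scratch copy for illustration; the two sample lines are made up.
# For real use, set UFW_DEFAULTS=/etc/default/ufw and run as root.
UFW_DEFAULTS=$(mktemp)
printf '%s\n' 'DEFAULT_INPUT_POLICY="DROP"' 'DEFAULT_FORWARD_POLICY="DROP"' > "$UFW_DEFAULTS"

# Assumption: ACCEPT is the intended value (the stock Debian setting is DROP).
sed -i 's/^DEFAULT_FORWARD_POLICY=.*/DEFAULT_FORWARD_POLICY="ACCEPT"/' "$UFW_DEFAULTS"

# Show the result; only the forward policy line should have changed.
grep '^DEFAULT_FORWARD_POLICY' "$UFW_DEFAULTS"
```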

Then we uncomment net/ipv4/ip_forward=1 to allow forwarding packets, with

# vi /etc/ufw/sysctl.conf
# Uncomment this to allow this host to route packets between interfaces
net/ipv4/ip_forward=1

Finally we set up NAT before the *filter rules. Suppose the physical wireless interface is wan on the master node. Then we edit as follows

# vi /etc/ufw/before.rules
# Add rules for NAT table -- MvdV
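The rules themselves are not shown above. As a sketch only, a typical masquerading block for this kind of setup can be printed with the snippet below and pasted into /etc/ufw/before.rules above the *filter line. The subnet 10.2.2.0/24 is an assumption derived from the 10.2.2.N addresses mentioned in the text, and NAT_BLOCK, LAN_SUBNET and WAN_IF are hypothetical names used for illustration.

```shell
# Hypothetical values: adjust to the actual LAN and uplink interface.
LAN_SUBNET="10.2.2.0/24"   # assumed from the 10.2.2.N addresses in the text
WAN_IF="wan"               # wireless interface name used in the text

# Build the NAT block; it belongs above the *filter line in before.rules.
NAT_BLOCK=$(cat <<EOF
*nat
:POSTROUTING ACCEPT [0:0]
# Masquerade traffic from the cluster LAN that leaves through the uplink
-A POSTROUTING -s ${LAN_SUBNET} -o ${WAN_IF} -j MASQUERADE
# Each table section must end with COMMIT
COMMIT
EOF
)
printf '%s\n' "$NAT_BLOCK"
```

Masquerading rewrites the source address of outgoing packets to that of the master's uplink interface, so the router only ever sees the master node.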

Now every machine with IP address 10.2.2.N can enjoy an internet connection after we restart ufw

# service ufw restart

The master node interface is […]. Now on the client side, say on a drone with interface eth, we enter a section in /etc/network/interfaces:

# vi /etc/network/interfaces
auto eth
iface eth inet static
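The remaining lines of the section depend on the local addressing. As a sketch under assumed values, a complete static stanza could be generated as follows; every address here is hypothetical, chosen to follow the 10.2.2.N scheme, and the nameserver can be any resolver reachable from the drone.

```shell
# Hypothetical addressing, following the 10.2.2.N scheme in the text.
DRONE_IF="eth"            # interface name used in the text
DRONE_ADDR="10.2.2.2"     # assumed address of this drone
MASTER_ADDR="10.2.2.1"    # assumed LAN address of the master node
NAMESERVER="1.1.1.1"      # assumption: any reachable resolver will do

# Build the stanza for /etc/network/interfaces.
STANZA=$(cat <<EOF
auto ${DRONE_IF}
iface ${DRONE_IF} inet static
    address ${DRONE_ADDR}
    netmask 255.255.255.0
    gateway ${MASTER_ADDR}
    dns-nameservers ${NAMESERVER}
EOF
)
printf '%s\n' "$STANZA"
```

The gateway line points at the master node, which forwards the packets, and the dns-nameservers line names the resolver.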

In the forums, the last two lines are often forgotten (or taken for granted perhaps). Without a gateway you cannot connect outside the LAN, since your packets are not forwarded by the master node. Without a nameserver you will have an internet connection that works with IP addresses but not with host names, which is not very useful. Actually, the dns-nameservers line only takes effect if a package that manages /etc/resolv.conf is already installed on the drone, for instance

# apt install resolvconf

Hence connecting drones to the internet can be a chicken-and-egg problem: you need internet to install stuff to connect to the internet. I just used a mobile phone as AP with USB tethering during the setup.

Message Passing Interface

Various sources will tell you that there is no software package that defines a Beowulf cluster. But a Beowulf is dedicated to a single task, so there must be communication between nodes, which is generally handled by a message passing interface, for which there are excellent solutions available. Even so, I did not install an independent message passing interface. The cluster is dedicated to one application and I designed that application to handle inter-process communication itself. The program detects nodes that are online on the LAN, rates their performance, schedules subtasks based on that rating and synchronises them. This also facilitates sustainability: for less demanding jobs, nodes can be left in energy-saving mode, which contributes to solving a problem in an energy-efficient way.
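The application itself is not shown here, so the following is only a sketch of the rating-based scheduling idea and not the actual program. Node names, ratings and the number of subtasks are hypothetical; each node receives a share of the subtasks proportional to its rating, and the last node takes whatever integer rounding leaves over.

```shell
# Hypothetical ratings (higher is faster) for nodes found online.
NODES="master:50 drone1:30 drone2:20"
TASKS=100

# Count the nodes and sum their ratings.
set -- $NODES
count=$#
total=0
for entry in $NODES; do
    total=$((total + ${entry#*:}))
done

# Hand out subtasks in proportion to each rating.
assigned=0
i=0
SCHEDULE=""
for entry in $NODES; do
    i=$((i + 1))
    name=${entry%%:*}
    rating=${entry#*:}
    if [ "$i" -eq "$count" ]; then
        share=$((TASKS - assigned))          # remainder goes to the last node
    else
        share=$((TASKS * rating / total))    # integer share, rounded down
    fi
    assigned=$((assigned + share))
    SCHEDULE="${SCHEDULE:+$SCHEDULE }$name=$share"
done
echo "$SCHEDULE"
```

A node left in energy-saving mode would simply be absent from NODES, and its share is spread over the remaining nodes.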

Updated on 18-11-2019
