|
As described in an earlier post, at home I operate a modest Beowulf-type cluster for embarrassingly parallel simulation runs in batch mode. With the experience from that build, I took on building a higher-performance cluster using Debian.
Even when a cluster uses older machines, the combined performance can still be acceptable, depending on the application. A small cluster of older machines is like a freight truck: not the fastest option at low load, but it maintains speed at high load.
When the number of nodes in the cluster is small, an alternative to segmenting a molecular dynamics simulation is to run the actual simulation on one node and the computationally intensive post-processing on one or two others. In this way the cluster is still dedicated to solving a single problem. This scheme also avoids inter-segment communication, which can become a bottleneck.
The best machine in a Beowulf cluster should be the master node, with a nice video card and display. Drones, however, can be started in text mode to save resources. Different ways to start Linux in text mode are described on the net. Contrary to some of those posts, on modern installations meddling with the GRUB configuration is no longer necessary. It suffices to give the commands:
# systemctl set-default multi-user.target
# reboot
If at any time one wishes to revert to the graphical environment, issue
# systemctl set-default graphical.target
My cluster is small enough not to need DHCP. The master node is connected to the world through a wireless connection to a router that connects to the internet. Not all drones have a wireless connection, but it is not hard to share the master's wireless connection with the drones. I use ufw as a front-end to iptables, and the procedure to make the internet available cluster-wide is documented here and there in the forums, with a few small catches. First we set up ufw
# apt install ufw
and then activate the firewall. Take care when doing this in an ssh session with a drone, because a fresh firewall without rules denies incoming connections by default and can lock you out of the drone! So we issue
# ufw allow ssh && ufw enable
and we remain connected. First we set DEFAULT_FORWARD_POLICY to ACCEPT with
# vi /etc/default/ufw
...
DEFAULT_FORWARD_POLICY="ACCEPT"
Then we uncomment net/ipv4/ip_forward=1 to allow forwarding of packets, with
# vi /etc/ufw/sysctl.conf
...
# Uncomment this to allow this host to route packets between interfaces
net/ipv4/ip_forward=1
#net/ipv6/conf/default/forwarding=1
#net/ipv6/conf/all/forwarding=1
Finally we set up NAT before the *filter rules. Suppose the physical wireless interface on the master node is wan. Then we edit as follows
# vi /etc/ufw/before.rules
...
# Add rules for NAT table -- MvdV
*nat
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -s 10.2.2.0/24 -o wan -j MASQUERADE
COMMIT
Now every machine with an IP address 10.2.2.N can enjoy an internet connection after we restart ufw:
# service ufw restart
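To make concrete which machines the `-s 10.2.2.0/24` match in the MASQUERADE rule covers, here is a small sketch using Python's standard `ipaddress` module. The helper name `is_masqueraded` and the sample addresses outside 10.2.2.0/24 are mine, chosen only for illustration; the script does not touch the firewall, it only does the address arithmetic.

```python
import ipaddress

# Source range copied from the MASQUERADE rule in /etc/ufw/before.rules.
NAT_SOURCE = ipaddress.ip_network("10.2.2.0/24")


def is_masqueraded(address: str) -> bool:
    """Return True if a packet from this source address matches -s 10.2.2.0/24."""
    return ipaddress.ip_address(address) in NAT_SOURCE


if __name__ == "__main__":
    # The last two addresses are hypothetical examples outside the cluster LAN.
    for candidate in ("10.2.2.1", "10.2.2.2", "10.2.3.2", "192.168.1.10"):
        print(candidate, is_masqueraded(candidate))
    # A /24 corresponds to the netmask used on the drones.
    print("netmask:", NAT_SOURCE.netmask)
    # Subtract the network and broadcast addresses.
    print("usable host addresses:", NAT_SOURCE.num_addresses - 2)
```

A drone configured with an address outside this range would still reach the master, but its packets would not be rewritten on their way out through wan.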
The master node interface is 10.2.2.1. Now on the client side, say on drone 10.2.2.2 with interface eth, we enter a section in /etc/network/interfaces:
# vi /etc/network/interfaces
...
auto eth
iface eth inet static
    address 10.2.2.2
    netmask 255.255.255.0
    network 10.2.2.0
    gateway 10.2.2.1
    dns-nameserver 8.8.8.8
In the forums, the last two lines are often forgotten (or perhaps taken for granted). Without gateway you cannot connect outside the LAN, since your packets are not forwarded by the master node, and without a nameserver you will have an internet connection that works with IP addresses but not with host names, which is not very useful. Actually, you need to have already installed a package on the drone that manages the DNS settings, for instance
# apt install resolvconf
Hence connecting drones to the internet can be a chicken-and-egg problem: you need internet to install stuff to connect to the internet. I just used a mobile phone as AP with USB tethering during the setup.
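The two failure modes above (no gateway versus no nameserver) can be told apart from a drone with a few lines of Python. This is only a sketch under assumptions: the function names are mine, 8.8.8.8 on TCP port 53 is used as a reachable address because it already appears in the configuration, and deb.debian.org is an arbitrary host name picked for the lookup.

```python
import socket


def can_reach_ip(host: str = "8.8.8.8", port: int = 53, timeout: float = 3.0) -> bool:
    """Try a TCP connection to a numeric address; needs routing but no DNS."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(name: str = "deb.debian.org") -> bool:
    """Try to resolve a host name; needs a working nameserver setting."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:
        return False


def diagnose(ip_ok: bool, dns_ok: bool) -> str:
    """Map the two test results to the most likely configuration problem."""
    if not ip_ok:
        return "no route outside the LAN: check gateway, and forwarding/NAT on the master"
    if not dns_ok:
        return "routing works but names do not resolve: check dns-nameserver and resolvconf"
    return "internet connection works by address and by name"
```

On a drone, `print(diagnose(can_reach_ip(), can_resolve()))` then points at the line of the interfaces section that most likely needs attention.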
Various sources will tell you that there is no software package that defines a Beowulf cluster. But a Beowulf is dedicated to a single task, so there must be communication between nodes, which is generally handled by a message passing interface, for which excellent solutions are available. Even so, I did not install an independent message passing interface. The cluster is dedicated to one application and I designed that application to handle inter-process communication itself. The program detects nodes that are online on the LAN, rates their performance, schedules subtasks based on that rating and synchronises them. This also facilitates sustainability: for less demanding jobs, nodes can be left in energy-saving mode, which contributes to solving a problem in an energy-efficient way.
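As an illustration only of what rating-based scheduling can look like (this is not the actual program; the function name, node addresses and ratings below are hypothetical), subtasks can be divided in proportion to each node's rating with a largest-remainder rule, so that every subtask is assigned exactly once:

```python
from typing import Dict


def allocate(subtasks: int, ratings: Dict[str, float]) -> Dict[str, int]:
    """Split a number of subtasks over nodes in proportion to their ratings."""
    total = sum(ratings.values())
    if subtasks < 0 or total <= 0:
        raise ValueError("need a non-negative subtask count and positive ratings")
    # Ideal (fractional) share per node.
    exact = {node: subtasks * rating / total for node, rating in ratings.items()}
    # Start from the rounded-down shares.
    shares = {node: int(quota) for node, quota in exact.items()}
    # Hand the remaining subtasks to the nodes with the largest fractional parts.
    leftover = subtasks - sum(shares.values())
    by_remainder = sorted(ratings, key=lambda node: exact[node] - shares[node], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical ratings for three nodes found online; higher means faster.
    online = {"10.2.2.1": 4.0, "10.2.2.2": 2.0, "10.2.2.3": 1.0}
    print(allocate(10, online))
```

A node left in energy-saving mode is simply absent from the ratings, so a less demanding job is spread over fewer machines.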
© 2002-2024 J.M. van der Veer (jmvdveer@xs4all.nl)
|