Ubuntu for a Beowulf cluster

At home I operate a modest Beowulf-type cluster for embarrassingly parallel simulation runs in batch mode. A master node controls several slave nodes. The nodes need no more than a server installation, but if a slave happens to have a keyboard, mouse and monitor, a small GUI is nice to have. This GUI should be lightweight; otherwise a desktop installation makes more sense. This page demonstrates how to set up a slave node with a GUI on Ubuntu Server, including a browser, document reader, image viewer and sound. In the sections below, when a file is being edited, lines without a prompt are text you may enter at an appropriate point in that file.

Installation of a slave node
Setting up sound and GUI
Applications particular for my cluster
PyMOL set up

Installation of a slave node

This description pertains to Ubuntu 14.04 LTS. Partition your disk the way you see fit; for a slave I just partition into three: /, swap and /data. I give it one user account, from now on conveniently called marcel. When prompted for software to install during installation I select only OpenSSH.

After installation, reboot into the new system. First of all I wish to work from the superuser account (though others will advise against this practice, since you will wield the power to irrevocably scratch data, so be warned: if in doubt, use the sudo command instead). I assign a password to root and then become superuser.

$ sudo passwd root
$ su

Next the network is set up. The master node connects to the internet, and the slave nodes connect to the master node only. My cluster is small enough to assign static IPs to the slaves. The file in /etc/udev/rules.d only needs reviewing if a disk moves from one machine to another after installation, in which case the network card MAC address recorded there must be changed; proceed with care.

# vi /etc/network/interfaces
auto p2p1
iface p2p1 inet static
address 192.168.178.[xxx]
netmask [...]
dns-nameservers  [...]
# vi /etc/udev/rules.d/70-persistent-net.rules
# vi /etc/hosts
# ifup p2p1
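
For completeness, each node's /etc/hosts can name the other nodes, so scripts can refer to slave1 instead of an IP address. The node names below are examples only; fill in your own addresses:

```
127.0.0.1          localhost
192.168.178.[xxx]  master
192.168.178.[xxx]  slave1
192.168.178.[xxx]  slave2
```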

On my Beowulf, all slave nodes share a volume on the master node. Applications are mostly stored locally to free network capacity. Since data traffic is not performance-critical I do not need the newest and fastest hardware. SAMBA is used to share the volume instead of NFS because I also access data on the cluster from Microsoft Windows. Note that mounting the share is done by rc.local at the end of boot, when the network is up.

# mkdir /share
# apt-get install cifs-utils
# vi /etc/rc.local
mount -t cifs //[server ip address]/share\
 -o guest,uid=1000,iocharset=utf8 /share
# /etc/rc.local
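
Mounting at boot can race with the network coming up. A small retry wrapper in rc.local helps; this is a sketch, the function name and retry count are my own choice:

```shell
#!/bin/sh
# retry: run a command up to a given number of times, pausing
# one second between attempts; returns 0 on the first success.
retry() {
  tries=$1; shift
  n=0
  until "$@"; do
    n=$((n + 1))
    [ "$n" -ge "$tries" ] && return 1
    sleep 1
  done
}

# In rc.local one would then write, for example:
# retry 5 mount -t cifs //[server ip address]/share \
#   -o guest,uid=1000,iocharset=utf8 /share
```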

Note that only cifs-utils is needed, since the slaves are SAMBA clients. The master is a SAMBA server, simply set up by

root@master# apt-get install samba

and the share is set up by adding a description at the end of the configuration file

root@master# vi /etc/samba/smb.conf
[share]
   comment = Batch Spool Space
   path = /share
   read only = no
root@master# reboot

The master node executes jobs on the slave nodes by means of ssh. This requires password-less login, so on the master node a key is generated for user marcel by

marcel@master$ ssh-keygen -t rsa

This key is copied to each slave using

marcel@master$ ssh-copy-id marcel@slave

In principle the slave can now execute jobs given to it by the master node. The master sends a script named job to the slave for execution by

marcel@master$ ssh marcel@slave job
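
With several slaves, the master can hand out a batch of job scripts round-robin. The sketch below only prints the commands it would run; the slave names, job names and the pick_slave helper are my own illustration. Drop the echo to really dispatch, and append & plus a final wait to run the jobs concurrently:

```shell
#!/bin/sh
# pick_slave <index> <slave> [<slave> ...]
# Selects a slave from the list, rotating by index modulo list length.
pick_slave() {
  i=$1; shift
  shift $((i % $#))
  echo "$1"
}

n=0
for job in job_a job_b job_c job_d; do
  slave=$(pick_slave $n slave1 slave2 slave3)
  echo ssh "marcel@$slave" "$job"
  n=$((n + 1))
done
```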

Setting up sound and GUI

A GUI with a browser is much more fun with sound. Hence we install ALSA and PulseAudio, start the daemon and make sure sound goes into the correct sink.

# apt-get install alsa alsa-tools\
  alsa-utils alsa-oss
# apt-get install libasound2 libasound2-plugins
# apt-get install pulseaudio pulseaudio-utils
# usermod -aG audio,pulse,pulse-access marcel
# usermod -aG audio,pulse,pulse-access root
# pulseaudio -D
# aplay -l
[Note card# and device#]
# vi /etc/asound.conf
defaults.pcm.card card#
defaults.pcm.device device#
defaults.ctl.card card#
# reboot

Log in again as root after this reboot.

# alsamixer
# apt-get install moc mplayer

Install the X Window System, and only the fonts you want to use. The window manager will be the basic but rock-solid TWM. Ensure LC_ALL=POSIX is set in your locale file; otherwise TWM uses too large a line spacing in title bars and menus.

# apt-get install xorg xterm twm
# apt-get install xfonts-traditional\
  xfonts-scalable xfonts-encodings\
  xfonts-unifont xfonts-utils
# apt-get install firefox gimp\
  evince eog imagemagick
# vi /etc/default/locale
# vi .twmrc
[Refer to the TWM manual]
Button3 = : root : f.menu "main"
Menu "main"
{
"Terminal" f.exec  "xterm -fn 9x15 &"
"Firefox"  f.exec  "firefox &"
"GIMP"     f.exec  "gimp &"
"Reader"   f.exec  "evince &"
"Viewer"   f.exec  "eog &"
}

Now the GUI can be started manually:

# startx
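
If startx does not bring up TWM by itself, a minimal ~/.xinitrc selects the session explicitly. This is a sketch; the xterm font matches the menu entry above:

```
xterm -fn 9x15 &
exec twm
```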

Basically you are done now. Enjoy Ubuntu Server with GUI!

Applications particular for my cluster

I run LaTeX in batch and want the full package, so in console mode or from an xterm I proceed as follows:

# apt-get install texlive-full

Next, development tools are installed for C and Pascal, along with some utilities and libraries.

# apt-get install make automake autoconf
# apt-get install gcc clang fp-compiler gdb
# apt-get install libgsl0-dev fftw3-dev
# apt-get install libncurses5-dev
# apt-get install plotutils libplot-dev

On my cluster Algol 68 and Fortran 77 are prominent languages, and are built from the shared volume. Same goes for my Fortran 77 label renumber program rf77.

# cd /share/a68g-2.8 && ./configure &&\
  make && make install
# cd /share/f77 && make && make install
# cd /share/rf77 && make && make install

Note that there is no MPI or Condor here. Jobs are supposed to be embarrassingly parallel and to run in batch. An Algol 68 script schedules the jobs in the batch queue using straightforward semaphores, a simple yet effective solution for my needs. As you are aware, no two Beowulfs are the same.
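
The semaphore idea can be illustrated in shell, though this is my own sketch and not the author's Algol 68 script. Each job must acquire one of a fixed number of slots before it runs; mkdir creates the slot directory atomically, so no two jobs can grab the same slot:

```shell
#!/bin/sh
# Semaphore-style batch scheduling sketch: at most MAX jobs run
# at once, guarded by slot directories created atomically by mkdir.
MAX=2
SEM=/tmp/batchsem.$$

run_job() {  # wait for a free slot, run the job, release the slot
  while :; do
    for s in $(seq 1 "$MAX"); do
      if mkdir "$SEM.$s" 2>/dev/null; then
        "$@"
        rmdir "$SEM.$s"
        return 0
      fi
    done
    sleep 1    # all slots busy; try again shortly
  done
}

for j in 1 2 3 4; do
  run_job sh -c "echo job $j done" &
done
wait
```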

PyMOL set up

PyMOL can be set up straightforwardly:

# apt-get install pymol

A common pitfall follows from PyMOL being picky about the video driver: Ubuntu falls back to the generic llvmpipe software renderer if not everything is set up just right for your graphics hardware. PyMOL cannot handle llvmpipe, resulting in a crash when it wants to draw your atoms, for instance after loading a coordinate file. I once had a slave on which PyMOL ran well as root but crashed from a normal user account, which points at insufficient or incomplete permissions to access the video hardware. Indeed, the following actions resolved the issue:

# usermod -aG video marcel
# usermod -aG video root
# chmod ugo+rw /dev/dri/*
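
Note that the chmod does not survive a reboot, since udev recreates the device nodes; the group memberships do persist. If the wider permissions turn out to be necessary, a udev rule makes them stick. This is a sketch; the file name and mode are my own choice:

```
# /etc/udev/rules.d/99-dri.rules
SUBSYSTEM=="drm", KERNEL=="card[0-9]*", MODE="0666"
```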