Our research group is in the process of upgrading our main login node from a two-core Core2 Duo (3 GHz) to an eight-core Xeon machine (2.5 GHz); as one of the group’s system administrators, I am faced with the exciting task of porting all of the group’s software from the old machine to the new one. Along with the hardware upgrade, we are also upgrading from the EOL’d Ubuntu Gutsy Gibbon to the latest Ubuntu Long-Term Support release, Lucid Lynx.
Basically none of the actual scientific software we use (things like Q-Chem, CHARMM, GAMESS, TURBOMOLE and the like) is packaged for Debian/Ubuntu, so there is a great maze of hacked software deployment on the old machine. And, as with most amateur sysadmin projects, it is horrendously under-documented. Luckily, some of the actual binaries are statically-linked, and can thus be copied over as-is and will continue to work. Unfortunately, much of the work our group does involves new method development, and writing new code into the software requires recompiling it.
We have traditionally compiled our production software using the Intel compiler collection in preference to the GNU offerings, as one expects the processor manufacturer’s compiler to produce the most efficient code at high optimization levels. This, in turn, leads to an amusing cascade of circumstances, as all the pieces don’t quite fit together …
In addition to the Intel compilers, our applications also link against the fftw (“Fastest Fourier Transform in the West”) libraries and the LAM/MPI implementation of the MPI (Message-Passing Interface) parallelization API. But! Instead of calling into fftw from Fortran, as the package seems to assume will be the case, the existing code base calls fftw from C++ and does not append a trailing underscore to the symbols it uses, as is commonly done when interlinking C++ (well, C) and Fortran object files. This means that the fftw package available from Ubuntu is unsuitable for us, as the symbols in the libraries it distributes have trailing underscores.
However, I can still leverage the existing Debian packaging: apt-get source fftw2 gives me the source code and packaging files for the several related packages that we will need. Within that directory, I can modify the debian/rules Makefile to use gfortran -fno-underscoring instead of ordinary gfortran, and get libraries that will actually link against the existing codebase. (I ended up using dpkg --extract instead of just installing the resulting packages, so as to keep the unmodified Ubuntu packages in their normal place; the modified packages live in a subdirectory in /opt.)
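As a rough sketch of that edit, the helper below appends the flag wherever a rules file names gfortran. The function name add_no_underscoring, and the assumption that the compiler appears literally as "gfortran" in debian/rules, are mine and not taken from the actual fftw2 packaging, so the file should be checked by hand before relying on this.

```shell
# Hypothetical helper: add -fno-underscoring after each literal "gfortran"
# in a rules file. Assumes the compiler is named literally in that file.
add_no_underscoring() {
    rules=$1
    # Do nothing if the flag is already present, so the edit is safe to repeat.
    if grep -q -e '-fno-underscoring' "$rules"; then
        return 0
    fi
    # Write through a temporary file, then copy the contents back with cat
    # so that the executable bit on debian/rules is preserved.
    sed 's/gfortran/gfortran -fno-underscoring/g' "$rules" > "$rules.new" &&
        cat "$rules.new" > "$rules" &&
        rm -f "$rules.new"
}
```

Inside the tree unpacked by apt-get source fftw2, this would be called as add_no_underscoring debian/rules before rebuilding the packages.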
With that out of the way, I could compile and link the codebase for single-threaded use. This is good, but our calculations that will be running for a month or two really want to benefit from a parallelization speedup, so on to MPI!
The recommended way to use MPI is to use the distributed set of compiler wrappers (e.g. mpic++) instead of the standard compiler; this takes care of linking in the appropriate MPI libraries as needed. Unfortunately, it turns out that these compiler wrappers hardcode at their compile-time which backend compiler to use. In the case of the Ubuntu packaged versions, this means gcc. It turns out that the particular set of compiler arguments that our build system normally passes to icc (the Intel C compiler) do not error out when passed to gcc, so the compilation process would proceed just fine until the first time that it attempted to link individual object files into an archive. This failed because the linker could not find any of the object files it was supposed to be looking for. The root cause of this is quite hilarious — the Intel compiler takes -openmp, enabling Intel’s parallelization technology. The GNU compiler interprets this as -o penmp, so that each object file produced is named “penmp” (overwriting the previous file). The linker, of course, can’t find filename.o, since it doesn’t exist!
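The -openmp mix-up can be reproduced without any compiler at all. The sketch below uses the shell's getopts builtin as a stand-in: gcc has its own option parser, so this is only an analogy, but it shows the same rule at work, namely that an option taking an argument ("-o") swallows the rest of the word. The function name parse_output_name is made up for the illustration.

```shell
# Illustration only: gcc does not use getopts, but it treats "-o" the same
# way, taking the remainder of the word as the output file name.
parse_output_name() (
    OPTIND=1
    out=
    # "c" is a plain flag; "o:" means -o takes an argument.
    while getopts ":co:" opt "$@"; do
        case $opt in
            o) out=$OPTARG ;;
        esac
    done
    printf '%s\n' "$out"
)

parse_output_name -c -openmp foo.c   # prints: penmp
```

Every compile step therefore writes to the same file, penmp, which is why no filename.o ever appears.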
Again, we can leverage the Ubuntu packaging, running apt-get source to get the lam4-dev source, and change the rules file to use icc and friends. Of course, there are a few wrenches in the works … the Intel compilers want some environment preparation before being executed, in the form of a shell script that is sourced in. The debuild utility, used to build the package files, sanitizes the build environment by default, so any setup done in the shell invoking debuild is lost. A solution is to create wrapper scripts that do the setup before calling the actual Intel binaries; I chose to put these in /usr/local/bin, so as to not interfere with the system namespace. However, debuild doesn’t include that directory in the PATH, so I ended up putting symlinks in /usr/bin anyway.
Even this is insufficient, though, as the rules file calls into dh_shlibdeps, which checks the dependency information for the various shared library targets that the package provides. The code compiled with the Intel compilers links against Intel libraries (provided with the compiler suite), but these libraries are not packaged and have no versioning information available to the Debian utilities. In this case, dh_shlibdeps errors out, which causes the entire build process to fail. A quick hack around this is to just ignore the lam libraries when doing the dependency checks, by passing -Xlam to dh_shlibdeps. This allows the lam4-dev package to build with (and backend to) the Intel compilers; I then extracted these packages into /opt and expected things to Just Work.
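The wrapper scripts mentioned above are only a few lines each. Here is a minimal sketch of a generator for them; make_wrapper and its three arguments are hypothetical names, and the paths to the Intel setup script and the real compiler binary depend on the installation, so none are filled in here. Real setup scripts may also expect arguments of their own, which this sketch ignores.

```shell
# make_wrapper WRAPPER ENV_SCRIPT REAL_COMPILER
# Writes a small sh script at WRAPPER that sources ENV_SCRIPT and then
# replaces itself with REAL_COMPILER, passing all arguments through.
make_wrapper() {
    cat > "$1" <<EOF
#!/bin/sh
. "$2"
exec "$3" "\$@"
EOF
    chmod +x "$1"
}
```

Because the environment is set up inside the wrapper itself, it survives whatever sanitizing the caller has done before invoking the compiler.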
Alas, it proved not to be the case. All the application code compiled, but at the final link stage (some 45 minutes in), the link failed with undefined symbol references, complaining about the lam libraries installed in /usr/lib. That is, the ones installed by the standard lam4-dev package, not my custom-built version. I haven’t had much of a chance to debug the root cause of this, but hope that it can be solved by passing a non-standard --prefix argument to configure in the rules file (so that the runtime code will use a more appropriate library search path). We’ll see what actually happens.
Archive for the ‘System Administration’ Category
packaging
Monday, July 19th, 2010
A battle of stale file handles
Monday, February 22nd, 2010
As a computational chemist, I frequently require a lot of computing power, and so do my co-workers. To handle this demand, our research group has a cluster of servers that are managed by some queuing software, so that we can just submit jobs “to the queue” and they will be run on the next available node. While the scheduling algorithms for the queuing system could end up as a post by themselves, here I’ll be talking about a lower-level component, something which is integral to the cluster itself: NFS. The Network File System is an incredibly useful tool, allowing a filesystem on a single, central fileserver to be exported to any number of clients, each of which sees the same set of files as if they were local to that machine. However, NFS is not without its own (large) share of bugs.
In particular, our setup has lately been running, more and more frequently, into one of the most frustrating errors: the stale file handle. It’s a perfectly reasonable error in concept: when multiple clients are editing files in the same directory, client B needs to see the changes made by client A, so a file handle that was formerly okay becomes out-of-date. Or “stale”, so to speak.
The most reliable way that I have found to trigger the stale file handle is to submit a large number (say, 100) of very small jobs to our cluster, where the point is to run them in parallel rather than to do any heavy computation. Frequently, a large portion of these jobs will terminate almost immediately, producing an empty output file. This is quite perplexing at first, but I can use the queue’s monitoring software to determine which compute node(s) are causing these empty output files. Having done so, I can log into one of them interactively, and try to run the calculation there. Bingo: a stale NFS file handle error, a quick termination, and an empty output file.
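Before asking the queue which nodes were involved, it helps to know which jobs actually came back empty. A small sketch for that first step follows; the function name list_empty_outputs and the "*.out" naming pattern are assumptions for illustration, not something our queuing system dictates.

```shell
# List zero-length output files under a directory (default: current one).
# The "*.out" pattern is an assumed naming convention for job output.
list_empty_outputs() {
    find "${1:-.}" -type f -name '*.out' -size 0
}
```

The resulting job names can then be looked up in the queue's own accounting or monitoring tools to find the nodes they ran on.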
Since the actual job that I submit is usually just a wrapper script around a larger compiled executable, I could track down the “missing” file to be that large executable. However, if I try to access that file myself, say, by using cat(1) to print it to /dev/null, then the file access works fine.
My present surmise is that since there are only 8 nfsd server threads on the NFS server, and I frequently have up to 30 jobs running simultaneously (and trying to put output files in the same directory), the clients are getting starved for fileserver access and giving up. This doesn’t quite make sense, as the stale file handle error is only supposed to occur when that file itself has changed, and this large executable is only changed infrequently, but I don’t have any better thoughts at the moment.
Luckily, the workaround is quite simple: remember how this was called from a shell script? Just adding the line
cat /path-to-NFS-executable > /dev/null
at the top of the shell script brings the file into the cache and makes the errors go away. But I still wonder what’s really going on …
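For scripts that launch more than one NFS-hosted program, the same trick can be wrapped in a small function. This is only a sketch: the name warm_and_run is made up, and bailing out when the read fails is my own choice rather than part of the original workaround.

```shell
# Read the executable once so the NFS client fetches it, then run it
# with the remaining arguments.
warm_and_run() {
    exe=$1
    shift
    # If even a plain read fails, give up instead of launching the job.
    cat "$exe" > /dev/null || return 1
    "$exe" "$@"
}
```

A job script would then call warm_and_run /path-to-NFS-executable followed by the program's usual arguments.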