Last time, I gave a quick overview of conical intersections and density functional theory (DFT), and promised more about Configuration-Interaction methods (Wikipedia) and the new method developed in our research group, Constrained DFT-Configuration Interaction. But first, a bit about electronic correlation.
The standard way to write an electronic wavefunction is as a Slater determinant: I have N (identical) electrons that I need to put into N one-particle orbitals, while maintaining the antisymmetry required because electrons are fermions. The natural way to do this turns out to be to write an NxN matrix, with the electron index increasing from column to column and the orbital index increasing from row to row. The determinant of this matrix is the simplest N-electron wavefunction built from that set of one-electron orbitals that has the required antisymmetry. However, it describes a single, fixed electronic configuration, one that ties each electron down to a particular orbital. Unfortunately, this is not always sufficient to describe a system, as electrons (with their wavelike nature) can be quite tricksy. The most notable failures occur when the system has several degenerate (or nearly degenerate) electronic configurations, that is, different arrangements of the electrons with the same (or nearly the same) energy. In this case, the actual wavefunction is a linear combination of the Slater determinants corresponding to those configurations. The classic example is a bond being broken. Dihydrogen at large internuclear separation should have one electron on each nucleus, but in a single Slater “determinant” (no actual determinant being needed, since there is only one electron of each spin), we have to pick one nucleus to hold the spin-up electron and the other to hold the spin-down one. Neither assignment is preferred over the other, so the true wavefunction must be an equal-weight linear combination of the two. This need for multiple determinants is the result of “static correlation”, where several electronic configurations must be accounted for. There is also “dynamical correlation”, which reflects the tendency of electrons to avoid each other because of their mutual Coulomb repulsion.
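A minimal sketch of the construction just described, assuming toy one-dimensional particle-in-a-box orbitals in place of real molecular orbitals, treating all electrons as having the same spin, and omitting the 1/sqrt(N!) normalization. It checks the two properties that make the determinant useful: swapping two electrons flips the sign of the wavefunction, and putting two electrons at the same point makes it vanish.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]  # work on a copy
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # (numerically) singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def orbital(k, x, length=1.0):
    """Hypothetical 1D orbital: particle-in-a-box state k = 1, 2, ..."""
    return math.sqrt(2.0 / length) * math.sin(k * math.pi * x / length)


def slater(positions):
    """Unnormalized Slater determinant for len(positions) same-spin electrons.

    Row i holds orbital i + 1, column j holds electron j.
    """
    n = len(positions)
    return det([[orbital(i + 1, positions[j]) for j in range(n)] for i in range(n)])


psi = slater([0.2, 0.5, 0.7])
swapped = slater([0.5, 0.2, 0.7])     # exchange electrons 1 and 2
coincident = slater([0.2, 0.2, 0.7])  # electrons 1 and 2 at the same point
print(psi, swapped, coincident)
```

Both properties come straight from the determinant: exchanging two electrons swaps two columns, and two electrons at the same point make two columns identical.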
Dynamical correlation is sometimes described as the electrons moving around in time so as to avoid each other, though that picture is not exactly right. If this all seems a bit confusing, don’t worry too much: there isn’t a precise way to distinguish “static” and “dynamic” correlation, and in fact, the correlation energy is usually defined as the difference between the Hartree-Fock energy (that is, the best single-determinant energy) and the exact energy (in a given basis set). That’s not a very useful description, but we make do. In any case, the takeaway point is that multiple determinants or electronic configurations are needed to accurately treat static correlation.
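To make the dihydrogen example and the definition of the correlation energy concrete, here they are in standard textbook notation (the notation is mine, not from the post): φ_A and φ_B are atomic orbitals on the two nuclei, α and β are the spin-up and spin-down functions, and |⋯| denotes a normalized Slater determinant. “Exact” means full CI in the chosen basis set, so E_corr is never positive.

```latex
\Psi_{\mathrm{H_2}}(R \to \infty) \approx \frac{1}{\sqrt{2}}
  \Big( \lvert \phi_A\alpha \;\; \phi_B\beta \rvert
      - \lvert \phi_A\beta \;\; \phi_B\alpha \rvert \Big),
\qquad
E_{\mathrm{corr}} = E_{\mathrm{exact}} - E_{\mathrm{HF}} \le 0
```

Each determinant on its own puts the spin-up electron on a definite nucleus; only the combination has the correct singlet spin symmetry.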
DFT has been a very popular method because it can (usually) describe dynamical correlation at the same cost as (or even less than!) Hartree-Fock, which does not treat any dynamical correlation. The main idea of the CDFT-CI method is to add static correlation to a DFT method. It does so using Configuration Interaction, a method that starts with a (usually large) set of electronic configurations/determinants and produces improved representations of the ground (and excited) electronic states as linear combinations of those configurations. (I will call these configurations “basis states”, not to be confused with the one-particle basis functions that are used to build the one-particle orbitals that make up each determinant.) Conceptually, the method is fairly simple: build a representation of the system’s Hamiltonian in this basis of states, then diagonalize that Hamiltonian to obtain its eigenstates. To do this, in addition to the energies of our basis states (the diagonal elements), all we need are the off-diagonal elements of the Hamiltonian, which we can think of as electronic couplings between the different states/determinants. For somewhat technical reasons, these couplings are easy to obtain for determinants that correspond to Hartree-Fock wavefunctions, but there is less theoretical backing for using Kohn-Sham wavefunctions (which are what DFT provides) to compute them. One of the main features of CDFT-CI is a somewhat clever way to compute these couplings.
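Conceptually, the CI step is a small dense eigenvalue problem. The sketch below assumes the basis states are orthonormal (real constrained-DFT states generally are not, so an overlap matrix would also have to be dealt with) and uses made-up Hamiltonian entries. It finds the eigenvalues with cyclic Jacobi rotations, a standard method for small symmetric matrices chosen here only because it needs no libraries, not because CDFT-CI uses it.

```python
import math


def jacobi_eigenvalues(h, tol=1e-20, max_sweeps=100):
    """Eigenvalues (ascending) of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(h)
    a = [list(map(float, row)) for row in h]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# Hypothetical 3-state CI Hamiltonian in hartree: the diagonal holds the
# energies of the basis states, the off-diagonal entries their couplings.
h = [[-1.00, 0.05, 0.00],
     [0.05, -0.95, 0.02],
     [0.00, 0.02, -0.80]]
print(jacobi_eigenvalues(h))  # ground-state energy first, then excited states
```

The lowest eigenvalue plays the role of the ground-state energy and the others of the excited states; the eigenvectors (not computed here) would give the weights of each basis state.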
Another feature of CDFT-CI is that the basis used for the configuration interaction (which can easily reach tens of thousands of states in standard CI methods) is quite small, involving only three or four states for the systems we treated in our paper. To generate these states, instead of just taking the standard DFT ground state (which would not make a very interesting basis set if taken several times over), we apply a constraint that forces charge or spin to localize on parts of the molecule. These calculations are still ground-state DFT calculations (just of a slightly perturbed system), and as such retain the accuracy and dynamical-correlation benefits of DFT. If we choose these perturbations correctly, the constrained states capture much of the character of the lowest few excited states. The eigenvectors of the CI Hamiltonian built from a few such constrained states are then linear combinations of them, in which the ground state is separated out and each excited state in turn is amplified.
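For reference, a constraint of this kind is usually imposed with a Lagrange multiplier. Below is a sketch of the standard constrained-DFT functional (the general form from the CDFT literature, not spelled out in this post), where E[ρ] is the ordinary Kohn-Sham energy functional, w(r) is a weight function picking out the constrained region of the molecule, N_c is the target number of electrons in that region, and V is the Lagrange multiplier. The constrained state is a stationary point of W, and stationarity with respect to V enforces the constraint. A spin constraint has the same form with ρ replaced by the spin density ρ_α − ρ_β.

```latex
W[\rho, V] = E[\rho] + V \left( \int w(\mathbf{r})\, \rho(\mathbf{r})\, d\mathbf{r} - N_c \right),
\qquad
\frac{\partial W}{\partial V} = 0 \;\Longleftrightarrow\; \int w(\mathbf{r})\, \rho(\mathbf{r})\, d\mathbf{r} = N_c
```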
And that’s CDFT-CI: take a few of these tweaked DFT calculations, compute couplings between them, and diagonalize the CI Hamiltonian; out come energies that correspond to the ground and lowest few excited states. There’s dynamical correlation from DFT, and static correlation from the configuration-interaction. All in all, it’s enough to make a pretty picture of a conical intersection, which you can see in the paper I mentioned at the start of last week’s post.
Configurations
August 23rd, 2010
Published
August 16th, 2010
At the moment, the article that my advisor and I wrote is on the front page of The Journal of Chemical Physics. In the article, “Conical intersections using constrained density functional theory–configuration interaction”, we discuss how we have used a computational method previously developed in the research group to study conical intersections (CIs). I mentioned these in a previous post, but didn’t do much other than link to Wikipedia to describe them. Essentially, one can think of the problem of electronic structure (the branch of computational chemistry/physics in which I work) as describing a many-dimensional hypersurface that maps the electronic energy as a function of the positions of the atoms that make up the molecule. (There are complications, of course, but we’ll ignore them for now.) There are actually infinitely many possible values of the electronic energy for any given nuclear position, corresponding to electronic excited states; each such excited state (as well as the non-excited ground state) traces out its own manifold in this high-dimensional space. Usually, these surfaces are well separated, and the distance between them corresponds to the energy of a visible or near-ultraviolet photon. This means that when we shine light (usually a laser, when we think about it, but any light will do) with about that much energy per photon onto our molecule, we can start to excite electrons from the ground state to an excited state. The energies do need to match fairly closely, as the coupling between states depends inversely on the difference between the energy gap and the photon energy. Anyway, once we have prepared an excited state, there are a few things that can happen. The simplest is that it just spontaneously relaxes back to the ground state, emitting another photon.
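To put numbers on “a visible or near-ultraviolet photon”: a photon’s energy is E = hc/λ, and with hc ≈ 1239.84 eV·nm the conversion is a one-liner. The wavelengths below are just illustrative points spanning red light (700 nm) to the near-UV (300 nm).

```python
HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV*nm


def photon_energy_ev(wavelength_nm):
    """Energy of a single photon, E = h*c / wavelength, in electron-volts."""
    return HC_EV_NM / wavelength_nm


for nm in (700, 550, 400, 300):
    print(f"{nm} nm -> {photon_energy_ev(nm):.2f} eV")
```

So the gaps between surfaces that such light can bridge are a few eV, roughly 1.8 to 4.1 eV across this range.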
If the molecule doesn’t simply relax by emitting a photon right away, other relaxation pathways are available: the nuclei can move along the particular potential energy surface (that is, the hypersurface corresponding to this excited state) to a lower-energy configuration, and emit a photon there. (This is fluorescence, and it is responsible for the glow you see under a blacklight: the ultraviolet photons from the lamp are not visible to human eyes, but the lower-energy photons coming from fluorophores after they relax are visible.) There are some other loss pathways back to the ground state that are not easily described at this level of theory, but the final way for an excitation to relax is to find a place where two of these energy hypersurfaces actually cross each other. In these configurations, the molecule can readily transition from the “excited” state to the ground state. These configurations are the conical intersections; many workers, notably David Yarkony, have found that they are actually quite ubiquitous in this many-dimensional configuration space.
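A minimal sketch of why such a crossing is called conical, assuming the textbook two-state linear coupling model rather than anything specific to our paper: two hypothetical nuclear displacement coordinates x and y, an energy difference that grows linearly with x, and a coupling that grows linearly with y. The two surfaces are the eigenvalues of a 2x2 matrix, and they touch only at x = y = 0.

```python
import math


def two_state_surfaces(x, y, kappa=1.0, lam=0.5):
    """Lower and upper adiabatic energies of an assumed 2x2 model Hamiltonian

        [[kappa * x,  lam * y  ],
         [lam * y,   -kappa * x]]

    whose eigenvalues are -/+ sqrt((kappa*x)**2 + (lam*y)**2).
    """
    half_gap = math.hypot(kappa * x, lam * y)
    return -half_gap, half_gap


def gap(x, y):
    lower, upper = two_state_surfaces(x, y)
    return upper - lower


# The gap vanishes only at the origin and grows linearly in every direction,
# so plotting both surfaces over (x, y) gives two cones meeting at a point.
for point in [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.0, 0.2)]:
    print(point, gap(*point))
```

With only one coordinate and a constant nonzero coupling, the surfaces would never touch (an avoided crossing); it takes two independent coordinates to bring both the energy difference and the coupling to zero at once.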
Unfortunately, most of the ways to compute the electronic structure (that is, these hypersurfaces) that can actually find these conical intersections between surfaces are quite computationally expensive, scaling as N^6 or N^7 with the number of electrons (or worse). These are usually “wavefunction methods” (Wikipedia has something of a list), methods that explicitly try to describe the (3N-dimensional) configuration of all the electrons at once. An alternative class of methods, the density functional theory (DFT) class (Wikipedia), reduces the scope of the problem to just a 3-dimensional function, the electronic density (which happens to be something that can be observed experimentally, by methods like X-ray diffraction). All electrons are identical, so we can think of this electronic density as an average of where all the wave-like electrons happen to be. The gain in computational cost comes at the expense of some approximations: though DFT is “in principle” exact (as wavefunction methods can be), this exactness relies on an “exchange-correlation functional” (a mathematical object that takes as input a density, as a function of position, and produces as output an energy). This functional is not known, and is unlikely ever to be known in a general form, so we must use approximations to it. DFT methods are comparatively recent (relative to wavefunction methods), and do not really have fully reliable ways to compute electronic excited states. Time-dependent DFT (which is kind of neat in its own right, and not exactly what it sounds like, but I shouldn’t digress) is the leading competitor, but it has previously been shown to be quite lousy at describing conical intersections (this is what the Levine reference from our paper is about). Our method, “Constrained DFT-configuration interaction” (CDFT-CI), builds on top of ground-state DFT to form a technique that is capable of describing CIs, which is the main result of our paper.
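The reduction from 3N dimensions to 3 can be written out explicitly. In the usual definition (spin coordinates suppressed, Ψ normalized), the density integrates the N-electron wavefunction over all but one electron’s coordinates. For comparison, the second relation shows what N^6 or N^7 scaling means in practice: doubling the number of electrons multiplies the cost t by 64 or 128.

```latex
\rho(\mathbf{r}) = N \int \lvert \Psi(\mathbf{r}, \mathbf{r}_2, \ldots, \mathbf{r}_N) \rvert^2 \, d\mathbf{r}_2 \cdots d\mathbf{r}_N,
\qquad
\frac{t(2N)}{t(N)} = 2^6 = 64 \quad \text{or} \quad 2^7 = 128
```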
More about configuration-interaction methods in general, and CDFT-CI in particular, will appear in a later post.
Supplies and demands
August 9th, 2010
I’ve been picking up my bicycle repair hobby again recently (more so than just what I’m forced to do when my bike breaks down), and find myself needing to purchase replacement parts for various components; to some extent they can be considered “consumables”. Chains “stretch” (i.e. the pins wear down); rubber grips break down; bearings wear out. My standard response to this has been just to pay a visit to the local bike shop and use their offerings, but sometimes this is not entirely satisfactory. For example, when a flange on my rear hub broke, I essentially needed to get a replacement wheel. But to match my 30-year-old Dawes frame, I needed a 27″ rim with 126mm dropout spacing, threaded to take a freewheel (instead of the more modern cassette of rear sprockets). I was actually slightly surprised that they had anything in stock that met those requirements, but I ended up waiting a while to actually buy it, since I was also slightly tempted to buy parts and assemble the new wheel myself. Unfortunately, the offerings of the internet were not so great, either, and I didn’t find a way to assemble something I would be happy with (for a reasonable price) among the usual suspects of Nashbar, Harris Cyclery, and the like. I may still investigate building my own wheel (and hopefully take advantage of sales), but not while I’m on a deadline to return a borrowed spare wheel.
The components I’m currently considering are the ball bearings used in the four main parts of the bicycle that need to rotate freely. The local bike shop will only sell them to me in increments of 25, which is not a terribly useful number (some applications take nine, ten, or eleven balls per side). The leftovers are not really worth saving, since mixing different batches of bearings is a bad idea: all the batches are supposed to have the same diameter and tolerance, but the deviations within a batch will be much smaller than those between batches. If the balls are (even slightly) different sizes, the load will concentrate on the larger ones, increasing the failure rate. There are loads of places on the internet that will sell me ball bearings (even Amazon!), but most of the bike-specific ones don’t list the tolerance, which seems really silly. I am rather inclined to put high-quality bearings into my bicycle, especially as the components themselves are quite inexpensive. So I came to realize that the supply giant McMaster-Carr could help me out, as highly spherical steel balls are pretty much a commodity. And so they are, with 100 balls of a very high grade (grade 25, i.e. a sphericity tolerance of 0.000025″) running just a few dollars. This gets me thinking: what else for my bicycle (or even just around the home) might I want to get from McMaster-Carr that I hadn’t previously thought of? They seem to be pretty good at documenting their inventory (unlike these random bike shops), so as long as I have dimensions for what I need, I can be confident that it will work.
I look forward to reading comments about what fun or useful things people might get from the giant supply warehouses.
a parallel resolution
August 1st, 2010
The past couple of posts followed my adventures in compiling (and running) a parallel program on an updated system. I have since found a resolution to my problems, though it was not what I expected.
Backing up a little bit, I was experiencing what might nominally have been a bug at the fourth order of interaction: I had my source tree, a (vastly) updated operating system, updated Intel compilers, the fftw libraries, and the MPI compiler/libraries to deal with. Complicated systems like this can easily lead to ridiculously subtle and hard-to-diagnose issues (of course, they can also have extremely simple bugs, too), so I was not really expecting to just type my issues into Google and find an answer. However, I did have a couple of useful diagnostic tools at my disposal: I still had the old libraries and compilers from the machine we’re replacing. After struggling with my issues in an ab initio fashion for a while, I made use of these tools, copying the fftw libraries and LAM toolchain over to the new machine. I was then able to isolate my bug to the LAM system: using the old LAM toolchain, I could compile my program with the new Intel compilers and a new fftw library on the new machine, and it ran successfully. (Lest you think that this was an easy, obvious thing to do, I will point out that it did not go entirely smoothly. The mpic++ compiler wrapper has logic to check that there are LAM libraries where it expects them to be, but apparently it does not always force those libraries to be the ones linked into the final binary. As such, the initial link stage failed, as it tried to pull in the LAM libraries from /usr/lib, which are from version 7.1.2 and known to be broken for static compilation. However, I could call mpic++ -showme to print out what it was going to do for the link step, and replace -llam and friends with absolute paths to the libraries I wanted. It was this resulting binary that ran successfully.) If I repeated that procedure exactly, changing only which LAM compiler and libraries I was using, the resulting executable failed with a runtime error.
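The manual fix in the parenthetical above (take the link command that mpic++ -showme prints and swap -llam and friends for absolute paths) is easy to script. A sketch in Python, where the sample command line, the set of library names, and the /opt directory are all hypothetical placeholders rather than the actual LAM link line:

```python
import shlex


def absolutize_libs(link_line, libs, lib_dir):
    """Replace each -lNAME with lib_dir/libNAME.a when NAME is in libs.

    Any other argument (including -l flags for unrelated libraries) is kept.
    """
    out = []
    for arg in shlex.split(link_line):
        if arg.startswith("-l") and arg[2:] in libs:
            out.append(f"{lib_dir}/lib{arg[2:]}.a")  # static archive path
        else:
            out.append(arg)
    return " ".join(shlex.quote(arg) for arg in out)


# Hypothetical output of "mpic++ -showme ..." for a final link step.
showme = "icpc -O2 main.o -o prog -L/usr/lib -llammpio -llammpi++ -lmpi -llam -lutil"
print(absolutize_libs(showme, {"lammpio", "lammpi++", "mpi", "lam"}, "/opt/lam-test/lib"))
```

Naming the static archives by absolute path sidesteps the linker’s -L search entirely, which is the point: the wrapper’s own logic kept finding the broken copies in /usr/lib.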
By using a modular debugging technique to track down in which component(s) my bug existed, I greatly reduced the complexity of the problem, and knew which external resources to reach out to for help.
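The component-isolation idea in the last two paragraphs can be written as a tiny search: start from a configuration that works, swap in one new component at a time, and record which swaps break it. In this sketch the works callback stands in for “rebuild and run a test job”, and the component names and the fake callback are hypothetical.

```python
def find_breaking_components(known_good, candidate, works):
    """Swap components one at a time from known_good to candidate.

    known_good, candidate: dicts mapping component name -> version label.
    works: callable taking a configuration dict and returning True if that
        build runs correctly (in real life: rebuild and run a test job).
    Returns the names of components whose swap alone breaks the build.
    """
    if not works(known_good):
        raise ValueError("the baseline configuration already fails")
    culprits = []
    for name in known_good:
        trial = dict(known_good)
        trial[name] = candidate[name]
        if not works(trial):
            culprits.append(name)
    return culprits


# Hypothetical components and a fake test standing in for "build and run".
known_good = {"compiler": "icpc (old)", "fftw": "old", "mpi": "LAM (old)"}
candidate = {"compiler": "icpc (new)", "fftw": "new", "mpi": "LAM (new)"}


def works(config):
    return config["mpi"] == "LAM (old)"


print(find_breaking_components(known_good, candidate, works))
```

Swapping one component at a time only finds bugs caused by a single component; a bug that needs two new pieces together would show up only when testing pairs.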
I sent mail to the LAM user list describing my issues, expecting a few rounds of back-and-forth testing small programs and applying patches, but instead received a reply to the effect of “LAM is dead; the developers have moved on”. The respondent suggested OpenMPI as an alternative, and in less than a day’s work I was able to compile and install OpenMPI using the Intel compilers as its backend, recompile the MPI fftw libraries against that OpenMPI, and compile my software. It runs just fine.
So, now I can move on to doing more interesting things with my time. But I’ll never know what the actual bug was … and I won’t really get my time back.
linking
July 26th, 2010
As I mentioned last time, I’ve been working to upgrade the software and development environment on our research group’s new login node.
One of the main troubles was that, when going from a serial build to the MPI-parallelized build (using the MPI compiler wrapper), linkage errors popped up about the symbols fftw_version, fftw_malloc, and fftw_free. In addition to being defined in the fftw library, these symbols are also defined in an Intel MKL library. For some reason the change of compiler caused complaints about the duplicate symbols; I don’t see how the MPI additions to the fftw libraries could have affected this, as they do not involve the relevant symbols (as verified by objdump(1)). After playing around with the configure options for both fftw and lam (the MPI library), I eventually broke down and compiled a version of the fftw libraries that did not define those three symbols. This turned out to be rather more exciting than it ought to have been … fftw likes to compile and run some test programs as part of its build. When you go and remove symbols from its libraries, these programs no longer compile, which causes the build to fail. It doesn’t help that the build system is quite autogoo-ified, making it hard to manually follow dependencies and see which make targets (if any) would build just the libraries without testing them. I ended up running a full build of the clean package (using debuild to get the Debian packaging additions), keeping a log of the build in a file. Then I went in and removed the definitions of these three symbols, replacing them with ‘extern’ declarations. After removing the compiled object files of interest (and the final libraries), I failed to find ‘make’ invocations that would build just the libraries I was interested in, so I ran ‘make malloc.o’ and such by hand, and then copied and pasted the ar invocation from the log of the normal build to create the static libraries I wanted.
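Checking which symbols two static libraries both define is easy to automate. The post used objdump(1); nm(1) prints a simpler one-symbol-per-line listing (address, type letter, name) that is easier to parse. A sketch that treats the uppercase types T, D, B and R as global definitions and ignores undefined (U) and other entries; the sample listings are hypothetical, not taken from the real fftw or MKL libraries:

```python
# nm type letters for global definitions in text, data, bss and read-only data.
STRONG_GLOBAL_TYPES = {"T", "D", "B", "R"}


def strong_definitions(nm_output):
    """Names of globally defined symbols in one nm(1) listing."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols look like "<address> <type> <name>"; undefined ones
        # ("U name") and archive member headers ("malloc.o:") are shorter.
        if len(parts) == 3 and parts[1] in STRONG_GLOBAL_TYPES:
            names.add(parts[2])
    return names


def duplicate_definitions(nm_a, nm_b):
    """Symbols that both listings define, i.e. candidates for link clashes."""
    return sorted(strong_definitions(nm_a) & strong_definitions(nm_b))


fftw_nm = """
malloc.o:
0000000000000000 T fftw_malloc
0000000000000040 T fftw_free
                 U malloc

config.o:
0000000000000000 D fftw_version
0000000000000000 T fftw_create_plan
"""

mkl_nm = """
0000000000000000 T fftw_malloc
0000000000000020 T fftw_free
0000000000000000 R fftw_version
0000000000000000 T hypothetical_mkl_routine
"""

print(duplicate_definitions(fftw_nm, mkl_nm))
```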
I could then manually copy the modified fftw libraries into a deployed tree (I just copied an existing one for the other libraries I wasn’t modifying), and build the software that has been giving me such trouble. The link step went uneventfully, and then I went on to actually test this code.
Initial signs were promising, as the test jobs for the code I’ve already written worked okay, but jobs that explicitly tested the fft routines produced all NaNs as the result. (I am assuming that this stems from the MKL fftw_malloc’s failure to sufficiently align the memory it returns for use of the pipelined assembly instructions that fftw uses internally, but have not checked thoroughly.) For bonus points, if I pointed a serial compilation of my software at those libraries, the build failed, claiming that these symbols were not implemented!
This actually led me to note that the FFTW_LIBS were listed after the INTEL_LIBS on the link line, so it made sense that the fftw library could not pick up fftw_malloc from the Intel libraries (recall that when resolving symbols needed by one library, the linker does not go back and search libraries that appeared earlier on the link line). Moving FFTW_LIBS before INTEL_LIBS allowed that static build to finish (though it still produced broken code). More interestingly, keeping FFTW_LIBS before INTEL_LIBS allowed the parallel compilation to link while using the normal (well, except for the underscores) fftw libraries. The most plausible explanation I have is that mpic++ and icpc differ in their treatment of -Wl,--start-group and -Wl,--end-group, as INTEL_LIBS requires repeated searching to resolve inter-library dependencies, and the original complaint was about multiple implementations within the grouped objects. But that’s kind of a stretch; I’d really like to know what’s actually going on.
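The link-order behaviour described above can be modelled in a few lines. This is a deliberately simplified sketch of how GNU ld treats static archives (it ignores shared libraries, weak symbols and duplicate-definition errors): an archive member is pulled in only if it defines a symbol that is undefined when that archive is scanned, and an archive is not revisited later unless it sits inside a --start-group/--end-group pair. The archive contents are hypothetical, loosely mirroring the modified fftw library that needs fftw_malloc from the Intel libraries.

```python
def _scan_archive(members, key, undefined, defined, pulled):
    """One pass over one archive; returns True if any member was pulled in."""
    progress = False
    for index, (defines, needs) in enumerate(members):
        if (key, index) not in pulled and defines & undefined:
            pulled.add((key, index))
            defined |= defines
            undefined -= defines
            undefined |= needs - defined
            progress = True
    return progress


def link(undefined, archives, grouped=False):
    """Return the symbols left unresolved after scanning the archives in order.

    undefined: symbols referenced by the object files listed before the archives.
    archives: list of (name, members); each member is (defines, needs) as sets.
    """
    undefined, defined, pulled = set(undefined), set(), set()
    if grouped:
        # --start-group ... --end-group: rescan every archive until nothing new is pulled.
        while any([_scan_archive(members, key, undefined, defined, pulled)
                   for key, (_name, members) in enumerate(archives)]):
            pass
    else:
        # Plain link line: rescan each archive until it stops contributing,
        # then move on and never look at it again.
        for key, (_name, members) in enumerate(archives):
            while _scan_archive(members, key, undefined, defined, pulled):
                pass
    return undefined


# Hypothetical archives: (name, [(symbols defined, symbols needed), ...]).
fftw = ("fftw (modified)", [({"fftw_create_plan"}, {"fftw_malloc"})])
intel = ("intel", [({"fftw_malloc", "fftw_free"}, set())])
needed = {"fftw_create_plan"}  # undefined references from the program's objects

print(link(needed, [intel, fftw]))                # INTEL_LIBS before FFTW_LIBS
print(link(needed, [fftw, intel]))                # FFTW_LIBS before INTEL_LIBS
print(link(needed, [intel, fftw], grouped=True))  # both inside one group
```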
Now that the linking is mostly settled, I have a new problem to deal with: the lam runtime environment for the parallel calculations is bailing on me after only one part of multi-part jobs. The displayed error is “bufferd (getroute): invalid node”. Source-diving to the line printing the error is not too hard, but trying to work back to where the error originates is not so easy. (The code itself is moderately interesting, doing things such as having functions that return function pointers, which results in a type signature of void(*(bufferd()))().) Playing around with my PATH and the hardcoded calls to mpirun results in several ominous WARNINGS about a version mismatch between the runtime and the libraries that the binary was compiled with. I do have some reasons for having multiple versions of the lam code installed on this machine, but I think I’ll save those for another post (hopefully one where I have solved all my problems and gotten back to the more interesting business of actually doing chemistry!).
packaging
July 19th, 2010
Our research group is in the process of upgrading our main login node from a two-core Core2 Duo (3 GHz) to an eight-core Xeon machine (2.5 GHz); as one of the group’s system administrators, I am faced with the exciting task of porting all of the group’s software from the old machine to the new one. Along with the hardware upgrade, we are also upgrading from the EOL’d Ubuntu Gutsy Gibbon to the latest Ubuntu Long-Term Support release, Lucid Lynx.
Basically none of the actual scientific software we use (things like Q-Chem, CHARMM, GAMESS, TURBOMOLE and the like) is packaged for Debian/Ubuntu, so there is a great maze of hacked software deployment on the old machine. And, as with most amateur sysadmin projects, it is horrendously under-documented. Luckily, some of the actual binaries are statically-linked, and can thus be copied over as-is and will continue to work. Unfortunately, much of the work our group does involves new method development, and writing new code into the software requires recompiling it.
We have traditionally compiled our production software using the Intel compiler collection in preference to the GNU offerings, as one expects the processor manufacturer’s compiler to produce the most efficient code at high optimization levels. This, in turn, leads to an amusing cascade of circumstances, as not all the pieces quite fit together ….
In addition to the Intel compilers, our applications also link against the fftw (“Fastest Fourier Transform in the West”) libraries and the LAM/MPI implementation of the MPI (Message-Passing Interface) parallelization API. But! Instead of calling into fftw from Fortran, as fftw seems to assume will be the case, the existing code base calls fftw from C++, and does not append a trailing underscore to the symbols it uses, as is commonly done when interlinking C++ (well, C) and Fortran object files. This means that the fftw package available from Ubuntu is unsuitable for us, as the libraries it distributes have trailing underscores.
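For readers who haven’t run into the underscore convention: by default gfortran lowercases the name of an external Fortran procedure and appends one trailing underscore to form the linker symbol, while -fno-underscoring suppresses the underscore. A simplified sketch (it ignores module procedures, BIND(C) and -fsecond-underscore; the procedure name is hypothetical) of why a C++ caller that looks for the bare name cannot link against the underscored symbols:

```python
def gfortran_symbol(name, no_underscoring=False):
    """Linker symbol for an external Fortran procedure (simplified model).

    gfortran lowercases the name and, unless -fno-underscoring is given,
    appends a single trailing underscore.
    """
    symbol = name.lower()
    return symbol if no_underscoring else symbol + "_"


# A C++ declaration inside extern "C" { ... } refers to the bare C name.
wanted_by_cpp = "example_routine"  # hypothetical procedure name
print(gfortran_symbol("EXAMPLE_ROUTINE") == wanted_by_cpp)                        # default build
print(gfortran_symbol("EXAMPLE_ROUTINE", no_underscoring=True) == wanted_by_cpp)  # -fno-underscoring
```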
However, I can still leverage the existing Debian packaging: apt-get source fftw2 gives me the source code and packaging files for the several related packages that we will need. Within that directory, I can modify the debian/rules Makefile to use gfortran -fno-underscoring instead of ordinary gfortran, and get libraries that will actually link against the existing codebase. (I ended up using dpkg --extract instead of just installing the resulting packages, so as to keep the unmodified Ubuntu packages in their normal place; the modified packages live in a subdirectory in /opt.)
With that out of the way, I could compile and link the codebase for single-threaded use. This is good, but our calculations that will be running for a month or two really want to benefit from a parallelization speedup, so on to MPI!
The recommended way to use MPI is through the distributed set of compiler wrappers (e.g. mpic++) instead of the standard compiler; this takes care of linking in the appropriate MPI libraries as needed. Unfortunately, it turns out that these compiler wrappers hardcode, when they themselves are built, which backend compiler to use. In the case of the Ubuntu packaged versions, this means gcc. It turns out that the particular set of compiler arguments that our build system normally passes to icc (the Intel C compiler) do not cause errors when passed to gcc, so the compilation process would proceed just fine until the first time it attempted to link individual object files into an archive. This failed because the linker could not find any of the object files it was supposed to be looking for. The root cause of this is quite hilarious: the Intel compiler takes -openmp to enable OpenMP parallelization. The GNU compiler interprets this as -o penmp, so that each object file produced is named “penmp” (overwriting the previous file). The linker, of course, can’t find filename.o, since it doesn’t exist!
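The -openmp mix-up comes from gcc accepting an option’s argument glued directly onto the option letter, so -openmp reads as -o followed by penmp. A sketch that models only that part of the driver’s option parsing (the source file names are hypothetical):

```python
def output_file(args, default):
    """Output name a gcc-style driver would use, modelling only the -o option.

    Like gcc, accept both "-o NAME" and "-oNAME" (argument attached to -o).
    """
    out = default
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-o":
            if i + 1 < len(args):
                out = args[i + 1]
            i += 2
            continue
        if arg.startswith("-o"):
            out = arg[2:]  # "-openmp" -> output file "penmp"
        i += 1
    return out


# Compiling two sources with the flags intended for icc:
for source in ("alpha.c", "beta.c"):
    args = ["-c", "-O2", "-openmp", source]
    print(source, "->", output_file(args, default=source[:-2] + ".o"))
```

Every compilation writes the same file, penmp, so the later archive step that looks for alpha.o and beta.o finds neither.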
Again, we can leverage the Ubuntu packaging, running apt-get source to get the lam4-dev source, and change the rules file to use icc and friends. Of course, there are a few wrenches in the works … the Intel compilers want some environment preparation before being executed, in the form of a shell script that is sourced in. The debuild utility, used to build the package files, sanitizes the build environment by default, so any setup done in the shell invoking debuild is lost. A solution is to create wrapper scripts that do the setup before calling the actual Intel binaries; I chose to put these in /usr/local/bin, so as not to interfere with the system namespace. However, debuild doesn’t include that directory in the PATH, so I ended up putting symlinks in /usr/bin anyway. Even this is insufficient, though, as the rules file calls dh_shlibdeps, which checks the dependency information for the various shared library targets that the package provides. The code compiled with the Intel compilers links against Intel libraries (provided with the compiler suite), but these libraries are not packaged, and have no versioning information available to the Debian utilities. In this case, dh_shlibdeps errors out, which causes the entire build process to fail. A quick hack around this is to just ignore the lam libraries when doing the dependency checks, by passing -Xlam to dh_shlibdeps. This allows the lam4-dev package to build with (and use as its backend) the Intel compilers; I then extracted these packages into /opt and expected things to Just Work.
Alas, it proved not to be the case. All the application code compiled, but at the final link stage (some 45 minutes in), the link failed with undefined symbol references, complaining about the lam libraries installed in /usr/lib. That is, the ones installed by the standard lam4-dev package, not my custom-built version. I haven’t had much of a chance to debug the root cause of this, but hope that it can be solved by passing a non-standard --prefix argument to configure in the rules file (so that the runtime code will use a more appropriate library search path). We’ll see what actually happens.
On Garlic
July 11th, 2010
I tend to be a fan of spicy and flavorful food, and one of the flavorings that I especially like is garlic. A couple weeks ago at dinner, I mentioned that it really seems that garlic as an ingredient should be measured in heads, not cloves. (E.g., I have this recipe for black bean stew that calls for three heads of garlic (ca. 30 cloves).) My friend Karl replied with the catchphrase “I am intrigued by your proposition and wish to subscribe to your newsletter”; feel free to think of this as issue one.
Garlic Rosemary Chicken with risotto
1 quart chicken stock
1.5 pounds frozen chicken pieces
1 head garlic (peeled)
1 tsp rosemary
1 tsp basil
4 tbsp butter
1 7/8 cups rice
Combine the chicken stock, garlic, and herbs in a saucepan and add the chicken pieces. Bring to a simmer and poach the chicken, partially covered, for about 45 minutes (adjusting for the size of the pieces as necessary). If there’s a lot of scum, skim it off while the chicken poaches. Remove the chicken and the garlic cloves, then add 2 tbsp of butter and the rice to the stock. (The stock and rice should be in a 2:1 ratio, but some of the stock will have evaporated by this point.) The rice will need frequent stirring, especially towards the end of its cooking. Melt the remaining 2 tbsp of butter in a skillet, and fry the chicken pieces and garlic cloves until golden brown. The chicken should not end up too crispy, so it may be necessary to remove it a bit before the garlic, which should end up a beautiful golden brown. The garlic will be quite soft from the poaching, so handle it carefully. Be sure to stir the rice while this is happening; you don’t want it to burn!
When the rice is done (technically, I wouldn’t really call it a risotto, but it is a decent approximation), it will be like a thick glossy sauce; at this point, declare the garlic (and chicken) done, and enjoy the feast!

I am kind of tempted to do this recipe again with two units of garlic instead of one, as it disappears very quickly at the end. Those three lonely cloves of garlic aren’t enough to make it through all the rice ….
integrity
July 5th, 2010
As we saw last time, OpenAFS on FreeBSD (amd64 architecture) suffered from some serious corruption issues, being susceptible to page faults in kernel mode on small-valued (but largely non-NULL) addresses. It turns out that the corruption stems from the OpenAFS kernel module (libafs.ko) being compiled with different arguments to the gcc compiler than the main FreeBSD kernel. (The libafs.ko build procedure has been largely unchanged since the days of FreeBSD 4.X, being updated only when things break; the FreeBSD kernel build system has received more love.) In particular, the main kernel build passed -mno-red-zone, whereas until very recently the libafs.ko build did not. This argument disables the “Red Zone” feature of the x86_64 ABI, in which the 128-byte region just below the stack pointer may be used for temporary data without adjusting the stack pointer. In effect, libafs was calling into the kernel, and the kernel was stomping all over its data! Of course, the kernel is not at fault here; libafs needed to be more careful about where it was storing things. But it is an amusing way to get corruption: I don’t trust the libafs code, and mostly trust the main kernel, but it was the kernel that smashed my stack. (The culprit is unlikely to be an actual function call into the main kernel, as gcc should synchronize the stack pointer before function calls, but rather an interrupt handler that triggered at an unfortunate moment.)
In my excitement at having found the bug I had been tracking for the past several weeks, I went and deleted all (fifteen or so) saved kernel coredumps that I had from the issue, so I can’t actually show the object code that caused the crash I quoted in last week’s post (it seems that I was using different compiler flags at some point, as the object code that is currently produced for afs_vop_close does not put the address of afs_global_mtx on the stack). Anyway, I can at least give an idea of what the differences look like:
--- osi_vnodeops.S-bad
+++ osi_vnodeops.S-good
@@ -1,22 +1,22 @@
0000000000001560
- 1560: 48 89 5c 24 e0 mov %rbx,0xffffffffffffffe0(%rsp)
- 1565: 48 89 6c 24 e8 mov %rbp,0xffffffffffffffe8(%rsp)
- 156a: 48 8d 15 00 00 00 00 lea 0(%rip),%rdx # 1571
- 156d: R_X86_64_PC32 .LC9+0xfffffffffffffffc
- 1571: 4c 89 64 24 f0 mov %r12,0xfffffffffffffff0(%rsp)
- 1576: 4c 89 6c 24 f8 mov %r13,0xfffffffffffffff8(%rsp)
- 157b: 48 83 ec 28 sub $0x28,%rsp
- 157f: 48 8b 1d 00 00 00 00 mov 0(%rip),%rbx # 1586
- 1582: R_X86_64_GOTPCREL afs_global_mtx+0xfffffffffffffffc
- 1586: 48 8b 47 08 mov 0x8(%rdi),%rax
- 158a: 31 f6 xor %esi,%esi
- 158c: 48 89 fd mov %rdi,%rbp
- 158f: b9 87 02 00 00 mov $0x287,%ecx
+ 1560: 48 83 ec 28 sub $0x28,%rsp
+ 1564: 48 8d 15 00 00 00 00 lea 0(%rip),%rdx # 156b
+ 1567: R_X86_64_PC32 .LC9+0xfffffffffffffffc
+ 156b: 31 f6 xor %esi,%esi
+ 156d: 48 89 5c 24 08 mov %rbx,0x8(%rsp)
+ 1572: 48 8b 1d 00 00 00 00 mov 0(%rip),%rbx # 1579
+ 1575: R_X86_64_GOTPCREL afs_global_mtx+0xfffffffffffffffc
+ 1579: b9 87 02 00 00 mov $0x287,%ecx
+ 157e: 48 89 6c 24 10 mov %rbp,0x10(%rsp)
+ 1583: 4c 89 64 24 18 mov %r12,0x18(%rsp)
+ 1588: 48 89 fd mov %rdi,%rbp
+ 158b: 4c 89 6c 24 20 mov %r13,0x20(%rsp)
+ 1590: 48 8b 47 08 mov 0x8(%rdi),%rax
1594: 48 89 df mov %rbx,%rdi
1597: 4c 8b 60 18 mov 0x18(%rax),%r12
159b: e8 00 00 00 00 callq 15a0
159c: R_X86_64_PLT32 _mtx_assert+0xfffffffffffffffc
15a0: 48 8d 15 00 00 00 00 lea 0(%rip),%rdx # 15a7
15a3: R_X86_64_PC32 .LC9+0xfffffffffffffffc
15a7: 31 f6 xor %esi,%esi
15a9: b9 87 02 00 00 mov $0x287,%ecx
The differences for this particular function (which is rather short) are limited to its beginning, where register values are saved on the stack. The “bad” version immediately starts storing values from registers onto the stack: mov %rbx,0xffffffffffffffe0(%rsp) stores the value in register %rbx to the memory location at %rsp (the stack pointer) plus 0xffffffffffffffe0, which as a signed 64-bit displacement is -32. Only after storing several values below the stack pointer does the code decrement it, with sub $0x28,%rsp. The “good” version decrements the stack pointer first (the stack grows down, we recall) and only then stores to it, at positive offsets. This difference is key: without a red zone, the kernel is free to write to anything below the stack pointer while, say, fielding a scheduler interrupt. If the “bad” afs_vop_close was interrupted in its first few instructions, it could end up using whatever bogus values the kernel left on the stack instead of what it put there; this can easily lead to page faults when accessing things in very low portions of the address space.
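The ordering argument can be made concrete with a toy model. This is a simulation, not kernel code: the stack is a plain array of 8-byte slots, sp is a slot index standing in for %rsp, and a hypothetical interrupt() overwrites every slot below sp, as a kernel that does not honour a red zone is entitled to do. The two prologues mirror the shape of the listing above (four 8-byte register saves and a 0x28-byte frame); the register values, starting slot and garbage value are all made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 64               /* toy stack: 64 eight-byte slots */
#define START_SP    32               /* arbitrary starting slot for %rsp */
#define FRAME_SLOTS 5                /* sub $0x28,%rsp moves 5 slots */
#define NSAVED      4                /* %rbx, %rbp, %r12, %r13 */
#define GARBAGE     UINT64_C(0x20)   /* arbitrary value the "interrupt" writes */

struct toy_stack {
    uint64_t slot[STACK_SLOTS];      /* slot[i] models bytes 8*i..8*i+7 */
    size_t sp;                       /* slot index that %rsp points at */
};

/* Toy interrupt handler with no red zone: it may overwrite every slot
 * below the stack pointer, so here it does. */
static void interrupt(struct toy_stack *s)
{
    for (size_t i = 0; i < s->sp; i++)
        s->slot[i] = GARBAGE;
}

/* Run one prologue, deliver an interrupt right after the register
 * saves, and report whether all saved values survived. */
static bool saves_survive(bool decrement_first)
{
    const uint64_t regs[NSAVED] = { 0x1111, 0x2222, 0x3333, 0x4444 };
    struct toy_stack s = { .sp = START_SP };

    assert(START_SP >= FRAME_SLOTS && START_SP < STACK_SLOTS);

    if (decrement_first) {
        /* "good": sub $0x28,%rsp, then store at 0x8..0x20(%rsp) */
        s.sp -= FRAME_SLOTS;
        for (size_t i = 0; i < NSAVED; i++)
            s.slot[s.sp + 1 + i] = regs[i];
        interrupt(&s);
    } else {
        /* "bad": store at -0x20..-0x8(%rsp), then sub $0x28,%rsp */
        for (size_t i = 0; i < NSAVED; i++)
            s.slot[s.sp - NSAVED + i] = regs[i];
        interrupt(&s);               /* arrives before the sub */
        s.sp -= FRAME_SLOTS;
    }

    /* Both orders leave the saves at 0x8..0x20(%rsp) in the final frame. */
    for (size_t i = 0; i < NSAVED; i++)
        if (s.slot[s.sp + 1 + i] != regs[i])
            return false;
    return true;
}
```

In this model saves_survive(true) returns true and saves_survive(false) returns false: in the “bad” order the four saved registers are still below the stack pointer when the interrupt arrives, so they are replaced by garbage that the function would later reload.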
With this bug squashed, I’ve been free to fix a couple of other minor issues, and my AFS client is now stable enough that I can copy half a gigabyte of data into AFS without trouble. Things are looking good for the future.
corruption?
June 27th, 2010
This is a short post this week; here is the riddle that has been puzzling me recently:
(kgdb) frame 10
#10 0xffffff8000ac878b in afs_vop_close (ap=0xffffff8078120970) at /usr/ports/net/openafs-devel/work/openafs/src/afs/FBSD/osi_vnodeops.c:653
653 AFS_GUNLOCK();
(kgdb) p &afs_global_mtx
$1 = (struct mtx *) 0xffffff8000be6100
(kgdb) down
#9 0xffffffff8059ba00 in _mtx_assert (m=0x20, what=20,
file=0xffffff8000ad839c "/usr/ports/net/openafs-devel/work/openafs/src/afs/FBSD/osi_vnodeops.c", line=653)
at /usr/src/sys/kern/kern_mutex.c:701
701 switch (what) {
(kgdb) p m
$2 = (struct mtx *) 0x20
Note that AFS_GUNLOCK() is defined thusly:
#define AFS_GUNLOCK() \
do { \
mtx_assert(&afs_global_mtx, (MA_OWNED|MA_NOTRECURSED)); \
mtx_unlock(&afs_global_mtx); \
} while (0)
The address passed to mtx_assert (m=0x20) is garbage, even though the symbol involved, afs_global_mtx, is still correctly located at 0xffffff8000be6100.
What memory corruption could cause this behavior? The answer will probably be found in a disassembly; as such, it is time for me to learn x86(_64) assembly language.
Metal in the air
June 20th, 2010
I was at dinner the other day, and sitting across from me was a Mechanical Engineering grad student; we got to talking about his research, which studies combustion. In particular, he mentioned that one of the things they study is carbon monoxide (necessitating CO detectors all over the lab!). He had an interesting story: when they introduced CO into the combustion system, their viewing windows quickly filmed over with an orange coating. This could be traced back to the source of the CO gas, a high-pressure steel canister.
CO is a very good ligand for metals and metal ions. This is in fact why it’s so dangerous: it binds very tightly to the iron in the haemoglobin in your blood, and doesn’t fall off. With no room left for oxygen to bind and be distributed through the body, our cells are quickly starved of the oxygen that their metabolism depends on. Anyway, CO binds tightly to a lot of metals, in particular to the iron in the steel canister. When five CO molecules have attached to a single iron atom, its electronic shell is filled, and the complex is free to separate from the bulk metal. The “iron carbonyl” so obtained, Fe(CO)5, can actually be prepared as a bulk compound, a dense liquid. However, there’s nothing to bind the complexes to each other save for the weak van der Waals interaction, so the liquid is quite volatile. In my dinner companion’s experiment, enough iron carbonyl was forming and staying in the vapor phase that it entered the combustion chamber with the stream of CO gas. Once there, the carbon monoxide was fully oxidized to carbon dioxide, and the iron was converted to rust, which formed the orange coating on the viewports.
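As a worked example of that combustion step, one plausible overall equation, assuming complete oxidation and that the iron ends up as anhydrous iron(III) oxide, Fe2O3 (the main constituent of rust), is 4 Fe(CO)5 + 13 O2 → 2 Fe2O3 + 20 CO2. The coefficients are my own illustration rather than anything measured in the experiment; the short sketch below just checks that the atoms balance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { EL_FE, EL_C, EL_O, NELEM };  /* elements tracked: Fe, C, O */

/* One species in the equation: its coefficient and atoms per formula unit. */
struct term {
    int coeff;
    int atoms[NELEM];
};

/* Total atoms of each element across one side of the equation. */
static void tally(const struct term *t, size_t n, int out[NELEM])
{
    assert(n == 0 || t != NULL);
    for (int e = 0; e < NELEM; e++)
        out[e] = 0;
    for (size_t i = 0; i < n; i++)
        for (int e = 0; e < NELEM; e++)
            out[e] += t[i].coeff * t[i].atoms[e];
}

/* Assumed overall reaction: 4 Fe(CO)5 + 13 O2 -> 2 Fe2O3 + 20 CO2 */
static bool iron_carbonyl_combustion_balanced(void)
{
    static const struct term lhs[] = {
        { 4,  { 1, 5, 5 } },   /* Fe(CO)5 */
        { 13, { 0, 0, 2 } },   /* O2 */
    };
    static const struct term rhs[] = {
        { 2,  { 2, 0, 3 } },   /* Fe2O3 */
        { 20, { 0, 1, 2 } },   /* CO2 */
    };
    int l[NELEM], r[NELEM];

    tally(lhs, sizeof lhs / sizeof lhs[0], l);
    tally(rhs, sizeof rhs / sizeof rhs[0], r);
    for (int e = 0; e < NELEM; e++)
        if (l[e] != r[e])
            return false;
    return true;
}
```

Both sides come to 4 Fe, 20 C and 46 O atoms, so the equation balances.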
This sort of carbonyl chemistry is not unique to iron; many metals form such complexes. Nickel carbonyl requires only four CO molecules to saturate, whereas chromium and others form a more standard six-coordinate geometry. Many metal carbonyls form colored crystals (transition metals are a great way to get colored compounds), and some are liquids; nickel carbonyl boils at 43 degrees C!
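The “filled shell” behind these ligand counts is the textbook 18-electron rule: a neutral metal atom contributes its group number of valence electrons, each CO donates two, and a mononuclear M(CO)n is saturated when group + 2n = 18. That gives five for iron (group 8), four for nickel (group 10) and six for chromium (group 6). The sketch below solves for n; it is a rule of thumb, not a guarantee that a given compound exists, and it has no answer for odd-electron metals such as manganese and cobalt, which instead form metal-metal bonded dimers, Mn2(CO)10 and Co2(CO)8.

```c
#include <assert.h>

/* 18-electron rule sketch for neutral, mononuclear binary carbonyls
 * M(CO)n: the metal's group number gives its valence electrons, and each
 * CO donates two, so n = (18 - group) / 2. Returns -1 if the group is
 * outside the d block (3..12) or leaves an odd number of electrons to
 * fill. A returned count is a prediction only, not proof that such a
 * compound exists. */
static int co_ligands_for_18_electrons(int group)
{
    assert(group >= 1 && group <= 18);   /* periodic-table groups 1..18 */
    if (group < 3 || group > 12)
        return -1;
    int missing = 18 - group;            /* electrons still needed */
    if (missing % 2 != 0)
        return -1;                       /* CO donates electrons in pairs */
    return missing / 2;
}
```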
Looking at these structures, one might be concerned that thermal decomposition would release dangerous carbon monoxide. However, it turns out that these (generally highly toxic) compounds are more dangerous than that: they are great ways to get unoxidized metal atoms into living tissue, as the aggregate complex is electrostatically bland and can diffuse readily through many materials. Once in the body, the metals themselves are quite active, both catalytically and in terms of disrupting biological structures such as DNA. The Wikipedia page for nickel carbonyl claims that it is immediately fatal to humans at concentrations near 30 ppm.
All in all, a fun and fascinating branch of chemistry, bringing unexpected behavior from relatively commonplace materials. Just be careful and use proper conditions and technique.