As I mentioned last time, I’ve been working to upgrade the software and development environment on our research group’s new login node.
One of the main troubles was that, when going from a serial-execution build of the code to the MPI-parallelized compiler, linkage errors popped up about the symbols fftw_version, fftw_malloc, and fftw_free. In addition to being defined in the fftw library, these symbols are also defined in an Intel MKL library. For some reason the change of compiler caused complaints about the duplicate symbols; I don’t see how the MPI additions to the fftw libraries could have affected this, as they do not involve the relevant symbols (as verified by objdump(1)). After playing around with the configure options for both fftw and lam (the MPI library), I eventually broke down and compiled a version of the fftw libraries that did not define those three symbols. This actually turned out to be rather more exciting than it ought to have been … it turns out that fftw likes to compile and run some test programs as part of its build. When you go and remove symbols from its libraries, these programs don’t like to compile, which causes the build to fail. It doesn’t help that the build system is quite autogoo-ified, making it hard to manually follow dependencies and see which make targets (if any) would just build the libraries and not test them. I ended up running a full build of the clean package (using debuild to get the Debian packaging additions), keeping a log of the build in a file. Then, I went in and removed the definitions of these three symbols, replacing those statements with ‘extern’ declarations of them. After removing the compiled binary files of interest (and the final libraries), I failed to find ‘make’ invocations that would build just the libraries I was really interested in, so I ran ‘make malloc.o’ and such by hand, and then copied and pasted the ar invocation from the log of the normal build to create the static libraries I wanted.
I could then manually copy these libraries into a deployed tree (I just copied an existing one for the other libraries I wasn’t modifying), and build the software that has been giving me such trouble. The link step went uneventfully, and then I went to go and actually test this code.
Initial signs were promising, as the test jobs for the code I’ve already written worked okay, but jobs that explicitly tested the fft routines produced all NaNs as the result. (I am assuming that this stems from the MKL fftw_malloc’s failure to sufficiently align the memory it returns for use of the pipelined assembly instructions that fftw uses internally, but have not checked thoroughly.) For bonus points, if I pointed a serial compilation of my software at those libraries, the build failed, claiming that these symbols were not implemented!
This actually led me to note that the FFTW_LIBS were listed after the INTEL_LIBS on the link line, and thus the fftw library not picking up fftw_malloc from the Intel libraries made sense (since we recall that earlier libraries on the link line are not searched for symbols when resolving a particular object). Moving FFTW_LIBS before INTEL_LIBS allowed that static build to finish (though it still produced broken code). More interestingly, leaving FFTW_LIBS before INTEL_LIBS allowed the parallel compilation to link while using the normal (well, except for the underscores) fftw libraries. The most plausible explanation I have is that mpic++ and icpc differ in their treatment of -Wl,--start-group and -Wl,--end-group, as INTEL_LIBS requires repeated searching to resolve inter-library dependencies, and the original complaint was for multiple implementations within the grouped objects. But that’s kind of a stretch; I’d really like to know what’s actually going on.
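The link-order rule in the parenthetical above can be made concrete with a toy model. The C sketch below is not how ld is implemented; it only mimics the traditional behaviour in which each static archive is scanned once, at its position on the link line, for symbols that are still undefined at that point. Everything except the symbol name fftw_malloc is made up for illustration: the archives toy_fftw and toy_intel, the symbol toy_fft, and the function toy_link are assumptions, and each toy archive holds just one object file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16
#define MAX_PER_OBJECT 4

/* Toy archive holding a single object file (a deliberate simplification). */
struct toy_archive {
    const char *defines[MAX_PER_OBJECT]; /* NULL-terminated list */
    const char *needs[MAX_PER_OBJECT];   /* NULL-terminated list */
};

struct sym_set {
    const char *names[MAX_SYMS];
    size_t count;
};

static int set_find(const struct sym_set *s, const char *name)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return (int)i;
    return -1;
}

static void set_add(struct sym_set *s, const char *name)
{
    if (set_find(s, name) >= 0)
        return;
    assert(s->count < MAX_SYMS); /* toy model: fixed capacity */
    s->names[s->count++] = name;
}

static void set_remove(struct sym_set *s, const char *name)
{
    int i = set_find(s, name);
    if (i >= 0)
        s->names[i] = s->names[--s->count];
}

/* Walk the link line once, left to right; return the number of
 * symbols that are still undefined at the end. */
size_t toy_link(const char *program_need,
                const struct toy_archive *const link_line[], size_t n)
{
    struct sym_set undefined = { {0}, 0 };
    struct sym_set defined = { {0}, 0 };

    set_add(&undefined, program_need);
    for (size_t a = 0; a < n; a++) {
        const struct toy_archive *ar = link_line[a];
        int wanted = 0;

        for (size_t i = 0; i < MAX_PER_OBJECT && ar->defines[i]; i++)
            if (set_find(&undefined, ar->defines[i]) >= 0)
                wanted = 1;
        if (!wanted)
            continue; /* nothing pending here: skipped and never revisited */

        for (size_t i = 0; i < MAX_PER_OBJECT && ar->defines[i]; i++) {
            set_add(&defined, ar->defines[i]);
            set_remove(&undefined, ar->defines[i]);
        }
        for (size_t i = 0; i < MAX_PER_OBJECT && ar->needs[i]; i++)
            if (set_find(&defined, ar->needs[i]) < 0)
                set_add(&undefined, ar->needs[i]);
    }
    return undefined.count;
}

/* Hypothetical archives: the fftw stand-in needs fftw_malloc,
 * the Intel stand-in provides it. */
static const struct toy_archive toy_fftw  = { { "toy_fft" }, { "fftw_malloc" } };
static const struct toy_archive toy_intel = { { "fftw_malloc" }, { NULL } };

size_t unresolved_intel_first(void)
{
    const struct toy_archive *const line[] = { &toy_intel, &toy_fftw };
    return toy_link("toy_fft", line, 2);
}

size_t unresolved_fftw_first(void)
{
    const struct toy_archive *const line[] = { &toy_fftw, &toy_intel };
    return toy_link("toy_fft", line, 2);
}
```

In this model, unresolved_intel_first() leaves fftw_malloc unresolved, because the Intel stand-in was scanned before anything asked for that symbol, while unresolved_fftw_first() resolves everything. A linker group option exists precisely to lift this single-pass restriction for the archives inside the group.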
Now that the linking is mostly settled, I have a new problem to deal with: the lam runtime environment for the parallel calculations is bailing on me after only one part of multi-part jobs. The displayed error is “bufferd (getroute): invalid node”. Source-diving the line printing the error is not too hard, but trying to back out where the error originates is not so easy. (The code itself is moderately interesting, doing things such as having functions that return function pointers, which results in a type signature of void(*(bufferd()))().) Playing around with my PATH and the hardcoded calls to mpirun results in several ominous WARNINGS about version mismatch between the runtime and the libraries that the binary was compiled with. I do have some reasons for having multiple versions of the lam code installed on this machine, but I think I’ll save those for another post (hopefully one where I have solved all my problems and gotten back to the more interesting business of actually doing chemistry!).
linking
July 26th, 2010
packaging
July 19th, 2010
Our research group is in the process of upgrading our main login node from a two-core Core2 Duo (3 GHz) to an eight-core Xeon machine (2.5 GHz); as one of the group’s system administrators, I am faced with the exciting task of porting all of the group’s software from the old machine to the new one. Along with the hardware upgrade, we are also upgrading from the EOL’d Ubuntu Gutsy Gibbon to the latest Ubuntu Long-Term Support release, Lucid Lynx.
Basically none of the actual scientific software we use (things like Q-Chem, CHARMM, GAMESS, TURBOMOLE and the like) is packaged for Debian/Ubuntu, so there is a great maze of hacked software deployment on the old machine. And, as with most amateur sysadmin projects, it is horrendously under-documented. Luckily, some of the actual binaries are statically-linked, and can thus be copied over as-is and will continue to work. Unfortunately, much of the work our group does involves new method development, and writing new code into the software requires recompiling it.
We have traditionally compiled our production software using the Intel compiler collection in preference to the GNU offerings, as one expects the processor manufacturer’s compiler to produce the most efficient code at high optimization levels. This, in turn, leads to an amusing cascade of circumstances as all the pieces don’t quite fit together ….
In addition to the Intel compilers, our applications also link against the fftw (“Fastest Fourier Transform in the West”) libraries, and the lammpi implementation of the MPI (Message-Passing Interface) parallelization API. But! Instead of calling into fftw from Fortran, as it seems to assume will be the case, the existing code base calls fftw from C++, and does not append a trailing underscore to the symbols it uses, as is commonly done when interlinking C++ (well, C) and Fortran object files. This means that the fftw package available from Ubuntu is unsuitable for us, as the libraries it distributes have trailing underscores.
However, I can still leverage the existing Debian packaging: apt-get source fftw2 gives me the source code and packaging files for the several related packages that we will need. Within that directory, I can modify the debian/rules Makefile to use gfortran -fno-underscoring instead of ordinary gfortran, and get libraries that will actually link against the existing codebase. (I ended up using dpkg --extract instead of just installing the resulting packages, so as to keep the unmodified Ubuntu packages in their normal place; the modified packages live in a subdirectory in /opt.)
With that out of the way, I could compile and link the codebase for single-threaded use. This is good, but our calculations that will be running for a month or two really want to benefit from a parallelization speedup, so on to MPI!
The recommended way to use MPI is to use the distributed set of compiler wrappers (e.g. mpic++) instead of the standard compiler; this takes care of linking in the appropriate MPI libraries as needed. Unfortunately, it turns out that these compiler wrappers hardcode at their compile-time which backend compiler to use. In the case of the Ubuntu packaged versions, this means gcc. It turns out that the particular set of compiler arguments that our build system normally passes to icc (the Intel C compiler) do not error out when passed to gcc, so the compilation process would proceed just fine until the first time that it attempted to link individual object files into an archive. This failed because the linker could not find any of the object files it was supposed to be looking for. The root cause of this is quite hilarious — the Intel compiler takes -openmp, enabling Intel’s parallelization technology. The GNU compiler interprets this as -o penmp, so that each object file produced is named “penmp” (overwriting the previous file). The linker, of course, can’t find filename.o, since it doesn’t exist!
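The -openmp mishap follows from the usual convention that a short option may have its argument attached directly, so that -ofoo means the same as -o foo. The C sketch below is a toy model of that convention only, not gcc's actual option parser; the function name attached_output_name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of the short-option convention: "-o" may be followed
 * directly by its argument, so "-ofoo" is read as "-o foo".
 * Returns the attached output name, or NULL if arg is not of that form.
 */
const char *attached_output_name(const char *arg)
{
    assert(arg != NULL);
    if (strncmp(arg, "-o", 2) == 0 && arg[2] != '\0')
        return arg + 2;
    return NULL;
}
```

Fed the Intel-style flag -openmp, this returns the string penmp, which is exactly the output file name that every compilation ended up sharing. The GNU spelling of the equivalent flag, -fopenmp, does not start with -o and so avoids the collision.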
Again, we can leverage the Ubuntu packaging, running apt-get source to get the lam4-dev source, and change the rules file to use icc and friends. Of course, there are a few wrenches in the works … the Intel compilers want some environment preparation before being executed, in the form of a shell script that is sourced in. The debuild utility, used to build the package files, sanitizes the build environment by default, so any setup done in the shell invoking debuild is lost. A solution is to create wrapper scripts that do the setup before calling the actual Intel binaries; I chose to put these in /usr/local/bin, so as to not interfere with the system namespace. However, debuild doesn’t include that in the PATH, so I ended up putting symlinks in /usr/bin anyway. Even this is insufficient, though, as the rules file calls into dh_shlibdeps, which checks the dependency information for the various shared library targets that the package provides. The code compiled with the Intel compilers links into Intel libraries (provided with the compiler suite), but these libraries are not packaged, and have no versioning information available to the Debian utilities. In this case, dh_shlibdeps errors out, which causes the entire build process to fail. A quick hack around this is to just ignore the lam libraries when doing the dependency checks, by passing -Xlam to dh_shlibdeps. This allows the lam4-dev package to build with (and backend to) the Intel compilers; I then extracted these packages into /opt and expected things to Just Work.
Alas, it proved not to be the case. All the application code compiled, but at the final link stage (some 45 minutes in), the link failed with undefined symbol references, complaining about the lam libraries installed in /usr/lib. That is, the ones installed by the standard lam4-dev package, not my custom-built version. I haven’t had much of a chance to debug the root cause of this, but hope that it can be solved by passing a non-standard --prefix argument to configure in the rules file (so that the runtime code will use a more appropriate library search path). We’ll see what actually happens.
On Garlic
July 11th, 2010
I tend to be a fan of spicy and flavorful food, and one of the flavorings that I especially like is garlic. A couple weeks ago at dinner, I mentioned that it really seems that garlic as an ingredient should be measured in heads, not cloves. (E.g., I have this recipe for black bean stew that calls for three heads of garlic (ca. 30 cloves).) My friend Karl replied with the catchphrase “I am intrigued by your proposition and wish to subscribe to your newsletter”; feel free to think of this as issue one.
Garlic Rosemary Chicken with risotto
1 quart chicken stock
1.5 pounds frozen chicken pieces
1 head garlic (peeled)
1 tsp rosemary
1 tsp basil
4 tbsp butter
1 7/8 cups rice
Combine the chicken stock, garlic, and herbs in a saucepan and add the chicken pieces. Bring to a simmer, and poach the chicken, partially covered, for about 45 minutes (adjusting for their size as necessary). If there’s a lot of scum, skim it off while the chicken poaches. Remove the chicken and the cloves of garlic, and add 2 tbsp of butter to the stock, and the rice. (The stock and rice should be in a 2:1 ratio, but some of the stock will have evaporated by this point.) The rice will need frequent stirring, especially towards the end of its cooking. Melt the remaining 2 tbsp of butter in a skillet, and fry the chicken pieces and garlic cloves until golden brown. The chicken should not end up too crispy, so it may be necessary to remove it a bit before the garlic, which should end up a beautiful golden brown. The garlic will be quite soft from the poaching, so handle it carefully. Be sure to stir the rice while this is happening, you don’t want it to burn!
When the rice is done (technically, I wouldn’t really call it a risotto, but it is a decent approximation), it will be like a thick glossy sauce; at this point, declare the garlic (and chicken) done, and enjoy the feast!

I am kind of tempted to do this recipe again with two units of garlic instead of one, as it disappears very quickly at the end. Those three lonely cloves of garlic aren’t enough to make it through all the rice ….
integrity
July 5th, 2010
As we saw last time, OpenAFS on FreeBSD (amd64 architecture) suffered from some serious corruption issues, being susceptible to page faults in kernel mode on small-valued (but largely non-NULL) addresses. It turns out that the corruption stems from the OpenAFS kernel module (libafs.ko) being compiled with different arguments to the gcc compiler than the main FreeBSD kernel. (The libafs.ko build procedure has been largely unchanged since the days of FreeBSD 4.X, being updated only when things break; the FreeBSD kernel build system has received more love.) In particular, the main kernel build passed -mno-red-zone, whereas until very recently the libafs.ko build did not. This argument has the effect of disabling the “Red Zone” feature of the x86_64 ABI, in which an extra 128-byte region just below the stack pointer is available for use without adjusting the location of the stack pointer. In effect, libafs was calling into the kernel, and the kernel was stomping all over its data! Of course, the kernel is not at fault here; libafs needed to be more careful about where it was storing things. Still, it is an amusing way to get corruption: I don’t trust the libafs code, and mostly trust the main kernel, but it was the kernel that smashed my stack. (The culprit is unlikely to be an actual function call into the main kernel, as gcc should synchronize the stack pointer before function calls, but rather an interrupt handler that triggered at an unfortunate moment.)
In my excitement of having found the bug I had been tracking for the past several weeks, I went and deleted all (fifteen or so) saved kernel coredumps that I had from the issue, so I can’t actually show the object code that caused the crash I quoted in last week’s post (it seems that I was using different compiler flags at some point, as the object code that is currently produced for afs_vop_close does not put the address of afs_global_mtx on the stack). Anyway, I can at least give an idea of what the differences look like:
--- osi_vnodeops.S-bad
+++ osi_vnodeops.S-good
@@ -1,22 +1,22 @@
0000000000001560
- 1560: 48 89 5c 24 e0 mov %rbx,0xffffffffffffffe0(%rsp)
- 1565: 48 89 6c 24 e8 mov %rbp,0xffffffffffffffe8(%rsp)
- 156a: 48 8d 15 00 00 00 00 lea 0(%rip),%rdx # 1571
- 156d: R_X86_64_PC32 .LC9+0xfffffffffffffffc
- 1571: 4c 89 64 24 f0 mov %r12,0xfffffffffffffff0(%rsp)
- 1576: 4c 89 6c 24 f8 mov %r13,0xfffffffffffffff8(%rsp)
- 157b: 48 83 ec 28 sub $0x28,%rsp
- 157f: 48 8b 1d 00 00 00 00 mov 0(%rip),%rbx # 1586
- 1582: R_X86_64_GOTPCREL afs_global_mtx+0xfffffffffffffffc
- 1586: 48 8b 47 08 mov 0x8(%rdi),%rax
- 158a: 31 f6 xor %esi,%esi
- 158c: 48 89 fd mov %rdi,%rbp
- 158f: b9 87 02 00 00 mov $0x287,%ecx
+ 1560: 48 83 ec 28 sub $0x28,%rsp
+ 1564: 48 8d 15 00 00 00 00 lea 0(%rip),%rdx # 156b
+ 1567: R_X86_64_PC32 .LC9+0xfffffffffffffffc
+ 156b: 31 f6 xor %esi,%esi
+ 156d: 48 89 5c 24 08 mov %rbx,0x8(%rsp)
+ 1572: 48 8b 1d 00 00 00 00 mov 0(%rip),%rbx # 1579
+ 1575: R_X86_64_GOTPCREL afs_global_mtx+0xfffffffffffffffc
+ 1579: b9 87 02 00 00 mov $0x287,%ecx
+ 157e: 48 89 6c 24 10 mov %rbp,0x10(%rsp)
+ 1583: 4c 89 64 24 18 mov %r12,0x18(%rsp)
+ 1588: 48 89 fd mov %rdi,%rbp
+ 158b: 4c 89 6c 24 20 mov %r13,0x20(%rsp)
+ 1590: 48 8b 47 08 mov 0x8(%rdi),%rax
1594: 48 89 df mov %rbx,%rdi
1597: 4c 8b 60 18 mov 0x18(%rax),%r12
159b: e8 00 00 00 00 callq 15a0
159c: R_X86_64_PLT32 _mtx_assert+0xfffffffffffffffc
15a0: 48 8d 15 00 00 00 00 lea 0(%rip),%rdx # 15a7
15a3: R_X86_64_PC32 .LC9+0xfffffffffffffffc
15a7: 31 f6 xor %esi,%esi
15a9: b9 87 02 00 00 mov $0x287,%ecx
The differences for this particular function (which is rather short) are limited to the beginning of the function, where register values are saved onto the stack. The “bad” version just starts storing values from registers down onto the stack: mov %rbx,0xffffffffffffffe0(%rsp) stores the value from register %rbx into the memory location at 0xffffffffffffffe0 (that is, -32) plus the value stored in %rsp, the stack pointer. Only after storing some values below the stack pointer does the code decrement it. The “good” version is good and decrements the stack pointer (the stack grows down, we recall) before storing anything to it. This difference is key, as without a “red zone”, the kernel is free to write to anything below the stack pointer while fielding a scheduler interrupt, say. If the “bad” afs_vop_close was interrupted in the first few instructions, it could end up using whatever bogus values the kernel left on the stack, instead of what it put there; this can easily lead to page faults trying to access things in very low portions of address space.
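The arithmetic behind those odd-looking displacements is easy to check. The C sketch below reads a displacement that the disassembler printed as unsigned 64-bit hex as the signed offset it really is, and tests whether a store at that offset from %rsp would land in the 128-byte red zone. The helper names signed_displacement and in_red_zone are made up for illustration.

```c
#include <stdint.h>

#define RED_ZONE_BYTES 128 /* red zone size quoted for the x86_64 ABI */

/* Interpret a displacement printed as unsigned hex as a signed offset. */
int64_t signed_displacement(uint64_t raw)
{
    if (raw <= (uint64_t)INT64_MAX)
        return (int64_t)raw;
    /* Two's complement by hand, avoiding an implementation-defined cast. */
    return -(int64_t)(~raw) - 1;
}

/* True if a store at %rsp + disp lands below the stack pointer
 * but within the red zone. */
int in_red_zone(int64_t disp)
{
    return disp < 0 && disp >= -RED_ZONE_BYTES;
}
```

The four stores in the “bad” listing use displacements of -32, -24, -16 and -8, all inside the red zone, so they are legal only if nothing else writes below the stack pointer. The “good” listing uses 0x8 through 0x20, which are at or above the already-decremented stack pointer.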
With this bug squashed, I've been free to fix a couple other minor issues, and now my AFS client is sufficiently stable that I can copy half a gigabyte of data into AFS without trouble --- things are looking good for the future.
corruption?
June 27th, 2010
A short post this week: I share the riddle that has been puzzling me recently:
(kgdb) frame 10
#10 0xffffff8000ac878b in afs_vop_close (ap=0xffffff8078120970) at /usr/ports/net/openafs-devel/work/openafs/src/afs/FBSD/osi_vnodeops.c:653
653 AFS_GUNLOCK();
(kgdb) p &afs_global_mtx
$1 = (struct mtx *) 0xffffff8000be6100
(kgdb) down
#9 0xffffffff8059ba00 in _mtx_assert (m=0x20, what=20,
file=0xffffff8000ad839c "/usr/ports/net/openafs-devel/work/openafs/src/afs/FBSD/osi_vnodeops.c", line=653)
at /usr/src/sys/kern/kern_mutex.c:701
701 switch (what) {
(kgdb) p m
$2 = (struct mtx *) 0x20
Noting that AFS_GUNLOCK() is defined thusly:
#define AFS_GUNLOCK() \
    do { \
        mtx_assert(&afs_global_mtx, (MA_OWNED|MA_NOTRECURSED)); \
        mtx_unlock(&afs_global_mtx); \
    } while (0)
The address passed to mtx_assert is garbage, even though the symbol involved is still correctly located.
What memory corruption could cause this behavior? The answer will probably be found in a disassembly; as such, it is time for me to learn x86(_64) assembly language.
Metal in the air
June 20th, 2010
I was at dinner the other day, and sitting across from me was a Mechanical Engineering grad student; we got to talking about his research, which studies combustion. In particular, he mentioned that one of the things they study is carbon monoxide (necessitating CO detectors all over the lab!). He had an interesting story of how, when they introduced CO into the combustion system, their viewing windows filmed up with an orange coating very quickly. This could be traced back to the source of the CO gas, in a high-pressure steel canister.
CO is a very good ligand to metals and metal ions — this is in fact why it’s so dangerous, as it binds very tightly to the iron in the haemoglobin in your blood, and doesn’t fall off. With no room for oxygen to bind and get distributed through the body, our cells quickly run out of metabolic pathways. Anyway, CO binds tightly to a lot of metals; in particular, to the iron in the steel canister. When five CO molecules have attached to a single iron atom, the electronic shell is filled, and the complex is free to separate from the bulk metal. The “iron carbonyl” so obtained can actually be generated as a bulk compound, a dense liquid. However, there’s nothing to bind the complexes to each other save for the weak Van der Waals interaction, so the liquid is quite volatile. In my dinner companion’s experiment, enough iron carbonyl was forming and staying in the vapor phase that it entered the combustion chamber with the stream of CO gas. Once there, the carbon monoxide was fully oxidized to carbon dioxide, and the iron converted to rust, which was the orange coating on the viewports.
This sort of carbonyl chemistry is not unique to iron; many metals will form such complexes: nickel carbonyl requires only four CO molecules to saturate, whereas chromium and others need to form a more standard six-coordinate geometry. Many metal carbonyls form colored crystals (transition metals are a great way to get colored compounds), and some are liquids; nickel carbonyl boils at 43 degrees C!
Looking at these structures, one might be concerned that thermal decomposition would release dangerous carbon monoxide. However, it turns out that these (generally highly toxic) compounds are more dangerous than that: they are great ways to get unoxidized metal atoms into living tissue, as the aggregate complex is electrostatically bland and can diffuse readily through many materials. Once in the body, the metals themselves are quite active, both catalytically and in terms of disrupting biological structures such as DNA. The Wikipedia page for nickel carbonyl claims that it is immediately fatal to humans at concentrations near 30 ppm.
All in all, a fun and fascinating branch of chemistry, bringing unexpected behavior from relatively commonplace materials. Just be careful and use proper conditions and technique.
… and Chemical Physics
June 14th, 2010
As I mentioned last time, I submitted an article to the Journal of Chemical Physics. We (my advisor and I) submitted it as a Communication (as opposed to a full article), and it seems that this has made a big difference in the review process. Instead of the expected 4-6 week response time, we heard back from the reviewers in just under 2 weeks. The peer review is anonymous, so I know them only as “Reviewer #1” and “Reviewer #2”; their comments were pasted together in an email from the journal’s submission-management software.
At this point, we have made changes to the paper to address the reviewers’ (minor) concerns, and after I tweak a couple of words, I will submit the revised copy. Whereas the original submission had just a cover letter and a PDF of the paper, this time I will upload a new cover letter (“Dear Editors, we have made minor changes to satisfy the reviewers’ requests”), letters addressing the reviewers’ comments in detail, the LaTeX source to the document, and the corresponding image files. Interestingly, the requested image format is TIFF, which is not a format that my pdflatex(1) can handle, so I had to convert to PNG to get a preview! Their document-processing system does not appear to have support for separate bibliography files, so I will need to “convert my bibliography to a .bbl file and include it as the bibliography section in the .tex document”. I’m looking forward to seeing the final paper in JCP!
Annual Reviews …
June 7th, 2010
… no, this is not me looking back at the past year. Rather, I just last week got in the mail a copy of volume 61 of Annual Reviews of Physical Chemistry. Several members of my research group are coauthors on a paper contained therein, “The Diabatic Picture of Electron Transfer, Reaction Barriers, and Molecular Dynamics”. My main contribution was implementing the computer code that was used to compute diabatic couplings for several of the systems discussed in the paper. These couplings were a necessity for the CDFT-CI method (Constrained Density Functional Theory-Configuration Interaction) that I alluded to in a previous post, which is a lot of what I’m working on these days. The work mentioned in that post has paid off, and I have some pretty pictures of conical intersections (CIs) calculated using CDFT-CI; we’ve assembled them into a paper that I submitted recently to the Journal of Chemical Physics. This is the first time I’ve been first author on an academic paper, so I got introduced to the submission process. For JCP, at least, submissions are made through a web portal that is run on a separate web domain (http://jcp.peerx-press.org/); separations like this always seem a bit weird to me.
The writing process itself seems somewhat disjoint from the actual submission and anything that might get published. It turns out that JCP uses an XML tool to typeset their documents for publication, but only accepts submissions in LaTeX and MS Word formats. After you upload your document, their system goes and processes it, converting to PDF along the way. Then you can download a proof copy of the PDF and make sure that everything converted acceptably.
This conversion step seemed a bit silly as I submitted the first manuscript as a PDF (which is allowed for the peer review phase; the final submission is restricted per the above), so it was “converting” from PDF to PDF.
The PDF that I submitted was generated from LaTeX source, of course, but I did grab a copy of the JCP-mandated revtex documentclass in order to compile the preprint that I submitted. I had been using plain “article” when I was writing, so a few things needed to be tweaked (mostly in the preamble) to get the document to compile with revtex4-1.
But now my paper is submitted; I picked the authors (my advisor and me) from their author database, clicked a few buttons, and it was off. We got an email that it was received, and should get another in a few days that it has been assigned an editor, but then we don’t hear anything for 4-6 weeks, when we hear what the reviewers have to say.
For now, I move on to another project; more on that later.
Wurst than bratwurst
May 24th, 2010
I was at Trader Joe’s this evening, picking up the usual supplies (orange juice, sandwich fixings, and the like), and was feeling a bit hungry (always a poor plan to go to the grocery store hungry …), so I took a peek at some of the prepared food and meat section. To make a long story short, I came home with a pack of Hofbrau brats, something I had thought about trying for a while.
Now, the proper way to cook bratwurst is over charcoal, but I wasn’t really up for hauling everything out at 10 p.m. (It doesn’t help that I don’t have charcoal handy, either.) So, I elected for the next best thing, boiling them and then browning them on the George Foreman. I wasn’t really up for sacrificing any of the beer on-hand to the brats, so I went and spiced the water up with a few bay leaves, half a teaspoon of mustard seeds, and four peppercorns. Unfortunately, my pans are a bit on the large side; my stew pot (the only container really large enough for this task) holds 12 quarts. This is nice when I’m making stew and just keep throwing in vegetables, but it also means that I ended up with three quarts of water for five sausages, a fact which probably led to my downfall.
The boiling liquid smelled quite tasty, but when I pulled the sausages out to go on the grill, they looked kind of shriveled, and even when they’d gained some color and caramelization, they still ended up quite bland. I was all ready with my waffle-as-a-bun (since I had leftover waffles from breakfast), but the first bite was lackluster, and I might even say that the light flavor of the waffle dominated the sausage. Most pitiful. I can only presume that the large excess of water and relative lack of solute caused most of the flavor to leave the brats and enter the solution.
There was a part 2 of the experiment, though — I had some Johnsonville brats in the freezer from when someone was moving out last year and gave them to me in the hallway, so I boiled those as the second round in the water. I haven’t tried them yet, though; we’ll see what happens.
DON’T PANIC…
May 17th, 2010
… and carry a towel.
Now, if only we could get our operating systems to follow that advice. Well, the first part at least.
We’ve been making good progress on the front of making OpenAFS work on recent FreeBSD versions — in fact, if you limit yourself to a single processor machine and don’t do anything terribly stressful on the system, it works as you expect. However, it’s far from perfect, as evidenced by the system crashes that appear under heavy load. Historically, when the operating system’s kernel encounters a “cannot happen” case, it calls a routine called “panic()”, which prints a message to the console and stops (or reboots) the system.
In the present case, a bug in the OpenAFS code caused it to try to access memory at address 0xe0, but there wasn’t anything there:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xe0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff8059bbe9
stack pointer = 0x28:0xffffff803cf549f0
frame pointer = 0x28:0xffffff803cf54a20
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1963 (fs)
That address 0xe0 is quite suspicious, as it is a very small number; it is probably an offset into some data structure that happens to start at the NULL address that is used to indicate “there isn’t actually anything here”.
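To see how a NULL base pointer turns into a small nonzero fault address, it helps to add up field offsets. The C sketch below uses toy structures whose names and sizes are assumptions, chosen so that on a typical 64-bit (LP64) machine the arithmetic reproduces the numbers in this post: a lock embedded 0xc8 bytes into an object (the m=0xc8 that shows up in the backtrace below) and a field 0x18 bytes into that lock. They are not copied from the FreeBSD headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy lock header: one pointer, two ints, one pointer (24 bytes on LP64). */
struct toy_lock_object {
    const char *lo_name;
    unsigned int lo_flags;
    unsigned int lo_data;
    void *lo_witness;
};

/* Toy mutex: the lock word follows the header. */
struct toy_mtx {
    struct toy_lock_object lock_object;
    volatile uintptr_t mtx_lock;
};

/* Toy vnode: the padding stands in for whatever precedes the lock. */
struct toy_vnode {
    char earlier_fields[0xc8];
    struct toy_mtx v_interlock;
};

/* Address that gets touched if the vnode pointer is NULL (address 0)
 * and the code reads the lock word of the embedded mutex. */
size_t fault_address_for_null_vnode(void)
{
    return offsetof(struct toy_vnode, v_interlock)
         + offsetof(struct toy_mtx, mtx_lock);
}
```

With this layout the sum is 0xc8 + 0x18 = 0xe0, the fault address reported by the trap. Whether the real structures break down in exactly this way is a guess, but it shows why small fault addresses usually point at a NULL structure pointer rather than at random corruption.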
Using the powerful tool of a kernel debugger, we can probe more closely into what is happening, by looking at a stack trace of what functions were called to get here:
(kgdb) bt
#0 doadump () at pcpu.h:223
#1 0xffffffff801e17bc in db_fncall (dummy1=Variable "dummy1" is not available.
)
at /usr/src/sys/ddb/db_command.c:548
#2 0xffffffff801e1af1 in db_command (last_cmdp=0xffffffff80c53d60, cmd_table=Variable "cmd_table" is not available.
)
at /usr/src/sys/ddb/db_command.c:445
#3 0xffffffff801e1d40 in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:498
#4 0xffffffff801e3d19 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5 0xffffffff805de135 in kdb_trap (type=12, code=0, tf=0xffffff803cf54940)
at /usr/src/sys/kern/subr_kdb.c:535
#6 0xffffffff808a655d in trap_fatal (frame=0xffffff803cf54940, eva=Variable "eva" is not available.
)
at /usr/src/sys/amd64/amd64/trap.c:773
#7 0xffffffff808a68bc in trap_pfault (frame=0xffffff803cf54940, usermode=0)
at /usr/src/sys/amd64/amd64/trap.c:694
#8 0xffffffff808a70ef in trap (frame=0xffffff803cf54940)
at /usr/src/sys/amd64/amd64/trap.c:451
#9 0xffffffff8088bd33 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:223
#10 0xffffffff8059bbe9 in _mtx_lock_flags (m=0xc8, opts=0,
file=0xffffffff809c37dc "/usr/src/sys/kern/vfs_subr.c", line=2138)
at /usr/src/sys/kern/kern_mutex.c:194
#11 0xffffffff8063e2d1 in vref (vp=0x0) at /usr/src/sys/kern/vfs_subr.c:2138
#12 0xffffff8000ac71a0 in afs_syscall_pioctl () from /boot/modules/libafs.ko
#13 0xffffff8000a7d1dc in afs3_syscall () from /boot/modules/libafs.ko
#14 0xffffffff808a69d2 in syscall (frame=0xffffff803cf54c80)
at /usr/src/sys/amd64/amd64/trap.c:946
#15 0xffffffff8088c011 in Xfast_syscall ()
at /usr/src/sys/amd64/amd64/exception.S:374
#16 0x00000008006f20fc in ?? ()
Previous frame inner to this frame (corrupt stack?)
The first several frames are not very interesting; just the mechanics of what the debugger did to get an image of the kernel memory onto the hard disk (a “core dump”). Frame 9 must be where the bad page fault started, and frame 10 shows the unlucky routine that happened to be running when things fell apart. Ignoring frame 11 for now, frames 12 and 13 seem to be the AFS code that has the bug that caused the crash. Without going into too much detail, afs3_syscall() is a very general routine that basically everything in the kernel for OpenAFS goes through. afs_syscall_pioctl() is also a fairly general routine, but is for a smaller class of tasks than afs3_syscall. With any luck, we can figure out where in afs_syscall_pioctl the call to vref() in question occurs, and back out what’s going wrong.
As it turns out, there is no call to vref in afs_syscall_pioctl:
[kaduk@hysteresis /usr/devel/openafs/git/openafs/src/afs]$ grep vref afs_pioctl.c
[kaduk@hysteresis /usr/devel/openafs/git/openafs/src/afs]$
That’s because it’s obscured behind a preprocessor macro (not uncommon at all for virtual-filesystem code); a bit of investigation reveals that the macro VN_HOLD() calls vref():
[kaduk@hysteresis /usr/devel/openafs/git/openafs/src/afs]$ grep VN_HOLD FBSD/*.h
FBSD/osi_machdep.h:#define VN_HOLD(vp) VREF(vp)
[kaduk@hysteresis /usr/devel/openafs/git/openafs/src/afs]$ grep VREF /usr/include/sys/*
/usr/include/sys/vnode.h:#define VREF(vp) vref(vp)
Yes, that’s two layers of indirection — one from OpenAFS, and one from FreeBSD. Isn’t it fun?
Anyway, now we can see where this call to “vref” happens. As it turns out, there are only two occurrences of “VN_HOLD” within afs_syscall_pioctl(), and one of them is only compiled for Sun machines. Thus, we’re left looking at this block of code:
#ifdef AFS_LINUX22_ENV
    code = gop_lookupname_user(path, AFS_UIOUSER, follow, &dp);
    if (!code)
        vp = (struct vnode *)dp->d_inode;
#else
    code = gop_lookupname_user(path, AFS_UIOUSER, follow, &vp);
#if defined(AFS_FBSD80_ENV) /* XXX check on 7x */
    VN_HOLD(vp);
#endif /* AFS_FBSD80_ENV */
#endif /* AFS_LINUX22_ENV */
#endif /* AFS_AIX41_ENV */
    AFS_GLOCK();
    if (code) {
        vp = NULL;
#if defined(KERNEL_HAVE_UERROR)
        setuerror(code);
#endif
        goto rescred;
The VN_HOLD call is in a FreeBSD-specific portion (and thus poorly tested). We call it with the ‘vp’ argument, but it turns out that the gop_lookupname_user() function can return with vp set to NULL. I’ll spare you the internals of vref(), but suffice it to say that it assumes the vp argument points to an actual object, not a NULL placeholder, and things blow up.
We can easily avoid the panic by just checking if vp is NULL, and only calling VN_HOLD when it is not.
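The shape of that guard can be sketched in userland C. Everything below is a stand-in: the one-field struct vnode, the counting vref(), and the helper hold_if_present() are assumptions for illustration, with the two macro layers copied from the grep output above. It is not the actual OpenAFS patch.

```c
#include <stddef.h>

/* Userland stand-in; the real struct vnode lives in the kernel. */
struct vnode {
    int usecount;
};

/* Stand-in for the kernel's vref(): dereferences vp, so NULL would crash. */
static void vref(struct vnode *vp)
{
    vp->usecount++;
}

/* The two layers of indirection, as found by grep. */
#define VREF(vp)    vref(vp)
#define VN_HOLD(vp) VREF(vp)

/* Take a reference only when the lookup actually produced a vnode.
 * Returns 1 if a reference was taken, 0 otherwise. */
int hold_if_present(struct vnode *vp)
{
    if (vp != NULL) {
        VN_HOLD(vp);
        return 1;
    }
    return 0;
}
```

Passing NULL to hold_if_present() simply returns 0 instead of dereferencing the pointer, while a real object has its count bumped as before.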
And now we’ve fixed a kernel bug. Not so hard, really …