Our research group is in the process of upgrading our main login node from a two-core Core2 Duo (3 GHz) to an eight-core Xeon machine (2.5 GHz); as one of the group’s system administrators, I am faced with the exciting task of porting all of the group’s software from the old machine to the new one. Along with the hardware upgrade, we are also upgrading from the EOL’d Ubuntu Gutsy Gibbon to the latest Ubuntu Long-Term Support release, Lucid Lynx.
Basically none of the actual scientific software we use (things like Q-Chem, CHARMM, GAMESS, TURBOMOLE and the like) is packaged for Debian/Ubuntu, so there is a great maze of hacked software deployment on the old machine. And, as with most amateur sysadmin projects, it is horrendously under-documented. Luckily, some of the actual binaries are statically-linked, and can thus be copied over as-is and will continue to work. Unfortunately, much of the work our group does involves new method development, and writing new code into the software requires recompiling it.
We have traditionally compiled our production software using the Intel compiler collection in preference to the GNU offerings, as one expects the processor manufacturer’s compiler to produce the most efficient code at high optimization levels. This, in turn leads to an amusing cascade of circumstances as all the pieces don’t quite fit together ….
In addition to the Intel compilers, our applications also link against the fftw (“Fastest Fourier Transform in the West”) libraries, and the lammpi implementation of the MPI (Message-Passing Interface) parallelization API. But! Instead of calling into fftw from Fortran, as it seems to assume will be the case, the existing code base calls fftw from C++, and does not append a trailing underscore to the symbols it uses, as is commonly done when interlinking C++ (well, C) and Fortran object files. This means that the fftw package available from Ubuntu is unsuitable for us, as the libraries it distributes have trailing underscores.
However, I can still leverage the existing Debian packaging: apt-get source fftw2 gives me the source code and packaging files for the several related packages that we will need. Within that directory, I can modify the debian/rules Makefile to use gfortran -no-underscoring instead of ordinary gfortran, and get libraries that will actually link against the existing codebase. (I ended up using dpkg --extract instead of just installing the resulting packages, so as to keep the unmodified Ubuntu packages in their normal place; the modified packages live in a subdirectory in /opt.)
With that out of the way, I could compile and link the codebase for single-threaded use. This is good, but our calculations that will be running for a month or two really want to benefit from a parallelization speedup, so on to MPI!
The recommended way to use MPI is to use the distributed set of compiler wrappers (e.g. mpic++) instead of the standard compiler; this takes care of linking in the appropriate MPI libraries as needed. Unfortunately, it turns out that these compiler wrappers hardcode at their compile-time which backend compiler to use. In the case of the Ubuntu packaged versions, this means gcc. It turns out that the particular set of compiler arguments that our build system normally passes to icc (the Intel C compiler) do not error out when passed to gcc, so the compilation process would proceed just fine until the first time that it attempted to link individual object files into an archive. This failed because the linker could not find any of the object files it was supposed to be looking for. The root cause of this is quite hilarious — the Intel compiler takes -openmp, enabling Intel’s parallelization technology. The GNU compiler interprets this as -o penmp, so that each object file produced is named “penmp” (overwriting the previous file). The linker, of course, can’t find filename.o, since it doesn’t exist!
Again, we can leverage the Ubuntu packaging, running apt-get source to get the lam4-dev source, and change the rules file to use icc and friends. Of course, there are a few wrenches in the works … the intel compilers want some environment preparation before being executed, in the form of a shell script that is sourced in. The debuild utility, used to build the package files, sanitizes the build environment by default, so any setup done in the shell invoking debuild is lost. A solution is to create wrapper scripts that do the setup before calling the actual Intel binaries; I chose to put these in /usr/local/bin, so as to not interfere with the system namespace. However, debuild doesn’t include that in the PATH, so I ended up putting symlinks in /usr/bin anyway. Even this is insufficient, though, as the rules file calls into dh_shlibdeps which checks the dependency information for the various shared library targets that the package provides. The code compiled with the Intel compilers links into Intel libraries (provided with the compiler suite), but these libraries are not packaged, and have no versioning information available to the Debian utilities. In this case, dh_shlibdeps errors out, which causes the entire build process to fail. A quick hack around this is to just ignore the lam libraries when doing the dependency checks, by passing -Xlam to dh_shlibdeps. This allows the lam4-dev package to build with (and backend to) the Intel compilers; I then extracted these packages into /opt and expected things to Just Work.
Alas, it proved not to be the case. All the application code compiled, but at the final link stage (some 45 minutes in), the link failed with undefined symbol references, complaining about the lam libraries installed in /usr/lib. That is, the ones installed by the standard lam4-dev package, not my custom-built version. I haven’t had much of a chance to debug the root cause of this, but hope that it can be solved by passing a non-standard --prefix argument to configure in the rules file (so that the runtime code will use a more appropriate library search path). We’ll see what actually happens.
Archive for the ‘computational chemistry’ Category
packaging
Monday, July 19th, 2010… and Chemical Physics
Monday, June 14th, 2010As I mentioned last time, I submitted an article to the Journal of Chemical Physics. We (my advisor and I) submitted it as a Communication (as opposed to a full article), and it seems that this has made a big difference on the review process. Instead of the expected 4-6 week response time, we heard back from the reviewers in just under 2 weeks. The peer review is anonymous, so I know them only as “Reviewer #1″ and “Reviewer #2″; their comments were pasted together in an email from the journal’s submission-management software.
At this point, we have made changes to the paper to address the reviewers’ (minor) concerns, and after I tweak a couple of words, I will submit the revised copy. Whereas the original submission had just a cover letter and a PDF of the paper, this time I will upload a new cover letter (“Dear Editors, we have made minor changes to satisfy the reviewers’ requests”), letters addressing the reviewers’ comments in detail, the LaTeX source to the document, and the corresponding image files. Interestingly, they requested image format is TIFF, which is not a format that my pdflatex(1) can handle, so I had to convert to PNG to get a preview! Their document-processing system does not appear to have support for separate bibliography files, so I will need to “convert my bibliography to a .bbl file and include it as the bibliography section in the .tex document”. I’m looking forward to seeing the final paper in JCP!
Annual Reviews …
Monday, June 7th, 2010… no, this is not me looking back at the past year. Rather, I just last week got in the mail a copy of volume 61 of Annual Reviews of Physical Chemistry. Several members of my research group are coauthors on a paper contained therein, “The Diabatic Picture of Electron Transfer, Reaction Barriers, and Molecular Dynamics”. My main contribution was implementing the computer code that was used to compute diabatic couplings for several of the systems discussed in the paper. These couplings were a necessity for the CDFT-CI method (Constrained Density Functional Theory-Configuration Interaction) that I alluded to in a previous post, which is a lot of what I’m working on these days. The work mentioned in that post has paid off, and I have some pretty pictures of conical intersections (CIs) calculated using CDFT-CI; we’ve assembled them into a paper that I submitted recently to the Journal of Chemical Physics. This is the first time I’ve been first author on an academic paper, so I got introduced to the submission process. For JCP, at least, submissions are made through a web portal that is run on a separate web domain (http://jcp.peerx-press.org/); separations like this always seem a bit weird to me.
The writing process itself seems somewhat disjoint from the actual submission and anything that might get published. It turns out that JCP uses an XML tool to typeset their documents for publication, but only accept submissions in LaTeX and MS Word formats. After you upload your document, their system goes and processes it, converting to PDF along the way. Then you can download a proof copy of the PDF and make sure that everything converted acceptably.
This conversion step seemed a bit silly as I submitted the first manuscript as a PDF (which is allowed for the peer review phase; the final submission is restricted per the above), so it was “converting” from PDF to PDF.
The PDF that I submitted was generated from LaTeX source, of course, but I did grab a copy of the JCP-mandated revtex documentclass in order to compile the preprint that I submitted. I had been using plain “article” when I was writing, so a few things needed to be tweaked (mostly in the preamble) to get the document to compile with revtex4-1.
But now my paper is submitted; I picked the authors (my advisor and me) from their author database, clicked a few buttons, and it was off. We got an email that it was received, and should get another in a few days that it has been assigned an editor, but then we don’t hear anything for 4-6 weeks, when we hear what the reviewers have to say.
For now, I move on to another project; more on that later.
Who will verify the verifiers?
Sunday, May 2nd, 2010Following up on a topic which I have mentioned previously, I find myself at a point in my research where I can actually implement the perturbative expressions needed for this research project.
I thought I had mentioned this in a previous post, but it fails to come up in a search. The expressions for the first-order perturbative corrections I want are algebraic in nature, but applying Thouless’ theorem by hand to get a useable expression is quite messy. Instead, it is possible to use a diagrammatic perturbation theory (a method originally introduced by Feynman for a different problem) to make the expressions quite concise, using the Einstein summation convention. This takes the derivation of working expressions down from about five pages of algebra to one, which makes it more reasonable to trust.
As mentioned previously, the perturbative expressions involve terms roughly like T_{ij}^{ab}C_{ia}C_{jc}C_{kb}w_{kc}; they are all formed as products (i.e. sums) of underlying terms. There are only three types of these underlying terms (which correspond to matrix elements of one- and two-electron operators), T, C, and w. Now, when writing an implementation of these expression from scratch, I want to have some confidence that the numbers I produce are correct. Part of this comes from using the easily-verified diagrammatic expressions for their conciseness, but I also want to bring in elements of the Computer Science technique of unit testing. Thus, I want to verify that each of the T, C, and w terms are correct before I start putting them together. The T elements are well-known; just the standard two-electron integrals (of the coulomb repulsion operator) — in fact, they are the main component of a common energy correction (mp2) for quantum chemical calculations that is implemented in almost every computational chemistry software package. So checking those is easy, just write a few 10-line functions and compute the mp2 energy, which is easily verified against those existing implementations. But the C and w terms are ones that we came up with just for this project; there aren’t existing implementations to compare against. Thus far, the best I’ve been able to do is just a bit of spot-checking for C (which are just wavefunction overlaps) with some known values. Computing the w is pretty simple, but I haven’t come up with a good way to check it, yet. Presumably I would want to find some other one-electron operator with known matrix elements; it might not be too much work to wrangle the one-electron hamiltonian into a useful form.
But even more than just the component matrix elements themselves, I want to verify the final result. So, though I have gone to all this work to get efficient expressions that scale as N^4 or better, I will find myself writing out loops with six (or more!) indices (which are simpler and easier to audit). But hey, polynomial time is still much better than exponential time …. Anyway, I can get some numbers that I have confidence in for a few small systems, and then start digging into the optimization in earnest once I have a few observables to verify my updated code against.
unobservables
Monday, April 26th, 2010In theoretical chemistry, some amount of interest is placed in the distinction between adiabatic and diabatic electronic states. Adiabatic states are well-defined as eigenstates of the electronic Hamiltonian, and are what we usually calcluate. However, sometimes they can be inconvenient to use, especially when we consider a large range of geometries. For example, if I pull apart the two atoms in a molecule of lithium fluoride, at short separations, the bond is of ionic character, but at large separations, the state behaves covalently. The concept of diabatic electronic states is introduced to eliminate this inconsistency — in LiF, we would consider two diabatic states, one of which is ionic in character and the other covalent. These potential energy surfaces for these states will cross at some point, which is where the nature of the adiabatic surface changes.
Unfortunately, there is no rigorous definition of diabatic states — the only possible candidate is not achievable in practice, so we frequently find ourselves considering approximately diabatic states.
We can convert a set of diabatic states into a set of adiabatic states by calculating the derivative couplings between them (e.g.
In our research group, we make extensive use of constrained density functional theory, which uses constraints on the molecular charge and/or spin in particular regions of space to find approximately diabatic electronic states. Now, if I wanted to compute a transition dipole moment (which is a physical observable) between two such states, I would be out of luck. Two general constrained states will come from systems with different Hamiltonians (due to the constraint potential), and as such will be non-orthogonal. Since observables are only defined for a set of orthogonal states, it does not make any sense to directly compute a transition dipole moment between two (different) constrained states. However, there is a simple way to make a general collection of constrained states into a collection of orthogonal constrained states — construct an overlap matrix, and use it to orthogonalize the states in question. For our purposes, symmetric orthogonalization makes the most sense, so we end up with coefficient matrices C’=S^{-1/2}C . Now these are some states we can get some transition dipoles from. And we probably will, in the future.
Linear Algebra, where art thou?
Monday, April 19th, 2010In a previous post, I mentioned a theorem that would speed up computing the overlap between unrelated molecular wavefunctions. I’ve been putting more work into this project, off-and-on, and come up with a bunch of expressions that I’ll want to evaluate; things that look like T_{ij}^{ab}C_{ia}C_{jc}C_{kb}w_{kc}, where T_{ij}^{ab} is the two-electron integral involving orbitals i, j, a, and b; w_{bc} is the matrix element of a single-electron operator (just a multiplicative potential in real-space); and C_{fg} are these overlaps between wavefunctions. Naively applying the Einstein summation convention, this would look to scale as N^6, but we can pre-sum over i and a (a N^4 step), and are then left with sums over j, k, b, and c, which is also only N^4.
But where do these terms come from? w_{kc} is in practice just a vector-matrix-vector product c_k.W.c_c; computing each term is N^2 time, and there are only N^2 of them, which is N^4 work total. So far so good. The two-electron integral T_{ij}^{ab} is just an integral over two variables r1 and r2 of 1/(r1-r2) times the orbitals i, j, a, and b (in real space), where i and a are functions of r1, and j and b are functions of r2. (Incidentally, the code to compute these integrals is the hard part of any atomic-orbital-based computational chemistry code; I am building off of someone else’s base.) Since the basis functions are all gaussians, these integrals can be evaluated in constant time … but only in the atomic orbital basis. We are working in the molecular orbital basis, and the best transformation of these integrals into the molecular orbital basis (each molecular orbital being a weighted sum of the atomic orbitals) requires N^5 time. However, this code is the bottleneck for many things, and is already heavily optimized, so I’ll go ahead and worry about other things.
The C_{ia} are nominally overlaps between two determinantal wavefunctions, with each determinant being M-factorial terms (M being the number of electrons); computing the overlap between a pair of terms is constant time (just lookup the one-electron orbital overlaps), but there are still (M!)^2 terms! M is frequently an order of magnitude less than N, but this is still quite a bit of work. It turns out, though (and I’m not planning to do the derivation here), that if we write things out in terms of permutations, then this collapses down into the determinant of a single M-by-M matrix, the condensation of C_{phi}.S.C_{psi}, where C_{phi} is an M-by-N matrix of orbital coefficients, S is the overlap matrix in the atomic orbital basis, and C_{psi} is the N-by-M matrix of orbital coefficients for psi. The matrix-matrix-matrix product is a (large-constant-factor) N^3 operation, as is the determinant, but there are N^2 of these terms to compute, so the C_{ia} as a whole contribute an N^5 factor. Unfortunately, this is all in code that I am writing, so it’s not terribly optimized.
Now, this whole setup really makes one think that it might be further optimized — S is always the same, and C_{psi} is always the same. Furthermore, the C_{phi} are constructed from a common base, with a single column being changed from that common base each time. (Which column that is can be different, though.) If M was equal to N, then we could use the associativity of the determinant to precompute det(S) and det(C_{psi}), and all of the appropriate minors from C_{phi}, and then do a O(N) expansion over minors for each C_{ia} term. However, the determinant is not associative for non-square matrices, so this speedup doesn’t work. Any linear algebra masters reading are requested to comment on whether such a speedup may or may not be possible in the general case.
For now, I’ll take some relish in the fact that this is not the (only) scaling bottleneck ….
Oh, the documentation
Monday, March 29th, 2010As part of my Ph.D. research (in computational chemistry), I am showing that a method we have developed can accurately locate Conical Intersections. Of course, I need something to compare my results to, and a well-known method for computing electronic energies that can locate CIs is the Complete Active Space (CAS, or CASSCF) method. It’s essentially a giant optimization problem, which is broken down into a matrix diagonalization. Our electronic structure calculation gives us one-electron orbitals, which are combined into a full wavefunction for the system. In simple calculations, this is just a single Slater determinant, but more accurate results (for total molecular energy and other properties) are obtained if more (frequently many more) determinants are included in the representation of the wavefunction. Constructing wavefunctions from multiple determinants by hand can be challenging, though — it’s a bit of work, and it’s hard to know which ones are most relevant. CAS is an algorithm for choosing determinants to include in a wavefunction.
First, we order our one-electron orbitals by energy, and then we pick some number of orbitals that are occupied in the standard Slater determinant (the highest-energy ones), and then some more orbitals from the low-energy unoccupied orbitals. These orbitals are the active space, and we will be effectively moving electrons around within that active space. We then proceed to construct every Slater determinant that includes all of the occupied orbitals we didn’t select and some combination of orbitals from the active space. (Yes, the number of determinants scales exponentially with the size of the active space. Nobody said this was going to be cheap.) Now, we have an expression for a many-determinantal wavefunction, with a whole bunch of free parameters. The total energy of the system is a variational quantity, meaning that the exact wavefunction will give the lowest energy of any possible wavefunction. Thus, our problem of finding the best wavefunction reduces to optimizing our parameters to get the lowest energy.
The parameters that we have to work with in this optimization are not quite obvious — we clearly have coefficients for each determinant in the sum, but we can also vary the nature of the one-particle orbitals. (These one-particle orbitals, called molecular orbitals, are defined as linear combinations of an atomic orbital basis set, which is where this set of parameters comes from.) Now, the very early work on this method was focusing on it for its variational properties — trying to get a good description of the ground state. Now, it turns out that CASSCF methods can also do a decent job describing electronic excited states as well — it’s the cheapest reliable way to locate conical intersections, for one. (Conical intersections are a standard “hard case” for electronic excited states.) However, if we want a good excited state wavefunction as well as a good ground state wavefunction, this changes our optimization problem a bit. No longer can we just blindly optimize for the ground-state energy, we end up optimizing a weighted average of the ground-state energy and the energy of the first-few excited states (that is, the lowest eigenvalues of our interaction matrix). These relative weights, of course, are a configurable parameter, so the scientist can specify what is desired; since CAS was originally developed for ground states, the default is to just optimize for the ground state.
The software I’m using to run these CAS calculations is Gaussian ‘03 (www.gaussian.com). There’s a newer version out, but this part of the code hasn’t really changed, and the new documentation also applies to the old version. Anyway, if you go look in their user’s manual, it tells you that if you use the StateAverage keyword for the CAS calculation, then you can specify the relative weights of however many states you’re interested in (that’s NRoot=j) on a line later in the file. Now, Gaussian is not known for having a particularly user-friendly file format for the input, but with a bit more digging, it’s pretty clear which line of the file this will be, and they even go so far as to say that the weights will be read in in the %10.8f format. (They’re not kidding about that — if you put things in as a %6.4f format, it reads in “unknown value”s.) However, there is one key point that it does not mention: the weights must sum to one! Sure, this is fairly natural; the math of the optimization doesn’t work unless the coefficients sum to unity. But on the other hand, the optimization doesn’t make sense unless the coefficients sum to unity, so maybe they could do some renormalization for the user, especially in the common case where they just want all the weights to be the same. Or maybe print a warning, to allow for informed foot-shooting. (Speaking of foot-shooting, if you specify StateAverage but don’t specify any weights, they are read in as zero. All of them. Yes, this makes for a weird optimization. No, there is no warning message printed.)
Anyway, this was a frustrating way to spend a bunch of time this week — with weights that summed to two, my energy optimization for water (which should be about -76 hartrees of energy) was finding a solution at about -74 hartrees. (Note that one hartree is about 630 kcal/mol of energy, which is roughly the energy released by the combustion of butane, i.e. quite a bit.) Dropping the sum of the weights down to one made things converge much more nicely. Now, if only this was in the documentation ….
A Theorem of Thouless
Monday, February 15th, 2010The result I describe below was presented in Nuclear Physics 21 (1960), 225-232.
First, let me give a bit of brief background to what is going on, here. If you’ve heard of quantum mechanics, you’ve probably heard of the Schrodinger equation, which defines quantum states (and their energies) as eigenfunctions (and eigenvalues) of the Hamiltonian operator. Now, in actual practice, we can’t numerically solve for the 3N-dimensional wavefunction (N is the number of particles involved, in this case electrons), so we introduce some sort of basis set of continuous functions, and attempt to solve for the best approximation to the true wavefunction that is contained within the Hilbert space spanned by these basis functions.
Now, these basis functions are generally real-valued functions (though in general they can be complex-valued), but more importantly, they only represent a single particle orbital, holding only a single electron. Since electrons are fermions, the sign of a two-electron (or larger) wavefunction must change when the indices of any pair of electrons is swapped. (To write an N-electron wavefunction, we pretend that we can tell the electrons apart, giving them labels, and then write a product of single-particle orbitals that are assigned different electron indices.) This means that just using a simple product such as a(1)*b(2) won’t work — if we swap the indices, to a(2)*b(1), the sign stays positive!
Instead, we need to antisymmetrize the wavefunction, a mathematical operation that can be represented by taking the determinant of the single-particle orbitals as occupied by different electron labels. In the above case, the wavefunction becomes a(1)*b(2)-a(2)*b(1).
In the course of doing a quantum-mechanical computation, we generally solve the schrodinger equation in a self-consistent fashion, starting with a guess wavefunction and generating better and better approximations to the exact wavefunction, which are then fed back into the procedure to compute the Hamiltonian as a matrix (which depends on the orbitals from the guess wavefunction). Once the wavefunction (and energy, and other properties) stops changing between steps, the solution is said to be self-consistent, and therefore and eigenstate has been located. (The particular eigenstate need not actually be the lowest-energy one (i.e. the ground state), but that is determined by the particular convergence techniques used, etc..) At each step, we end up with a number of single-particl orbitals (which we recall are just linear combinations of the basis functions), in fact, since the orbitals are required to be orthogonal to each other, and normalized, we end up with as many orbitals as the number of basis functions! (In general, the number of basis functions is substantially larger than the number of electrons in the system, very much so in the case of very precise calculations.) Of these, the N with the lowest energy are selected to be part of the reference wavefunction for the next step, and passed into the antisymmetrization process.
Now, there’s nothing to prevent us from picking some different set of N orbitals and building a determinental wavefunction from that set; it just wouldn’t be a good
approximation to the ground state of the system, and would
probably give weird results if we tried to do a self-consistent calculation starting from it (though there are some times when a variation on this can be useful). However, if we have already reached self-consistency, and thus have a zeroth-order wavefunction (from the lowest-energy N orbitals), then we can think of these other (more general) determinants as being excitations from that reference. Generally, these determinants are classified as single excitations, double excitations, etc., up to N-tuple excitations, which is as high as it goes. (You can’t excite an (N+1)-th particle if there are only N particles.) Many types of advanced theories for computing quantum-mechanical energies and other quantities start from the zeroth-order wavefunction as a reference, and describe corrections to it in terms of different orders of excitations from that ground state. (Higher orders of excitation have higher total energies, and thus can be expected to contribute less to the actual behavior of the whole system.)
Now, this theorem of Thouless’ has something to say about determinantal wavefunctions. In particular, if I have one wavefunction phi-0 that I want to use as a reference for something or another, and I have some other wavefunction phi, that is built out of completely different single-particle orbitals (but using the same basis set), then Thouless tells me that I can write phi in terms of phi-0 in a very simple way, as the exponential of an operator A applied to phi-0.
This operator A involves only single-particle excitations, that is, it is a linear combination of operators that move an electron from orbital i to orbital a.
This result is rather fascinating, since really, phi and phi-0 have nothing to do with each other — if I wanted to write phi in terms of the single-particle orbitals that comprise phi-0, I would have to write phi as a linear combination of determinants; in general, this would have to include all possible determinants from those orbitals (i.e. up to and including N-tuply excited states). But Thouless tells me I only need to worry about single excitations! (The coefficients for the linear combination of single-particle excitation operators are just determined by the overlap of phi with those singly-excited states.) I’m still trying to wrap my head around just what this really means, but in some sense, it seems that these different states phi and phi-0 live in the same Hilbert space, and any point in that space can be transformed to any other point by a rotation (of some set of coordinates which I do not specify). The set of allowed rotations is restricted in some fashion such that I can specify a direction and distance to rotate, and that will give my any possible state I want.
I ran into this theorem because in the course of my research, I need to compute interactions between wavefunctions from completely different systems, and the computations are much more efficiently performed in the basis of single-particle orbitals from psi-0 than in the original set of basis functions that we started with (in my case, gaussian-times-spherical-harmonic functions centered on atomic nuclei, and thus known as the AO (atomic orbital) basis). This is because the single-particle orbitals (commonly known as MOs (molecular orbitals) are orthogonal, whereas the AOs are not; the non-orthogonality of the AO basis makes many computations substantially more expensive to perform. I will be able to bring the scaling of my computation down from N^7 to N^5 (possibly even N^4) by using this trick, which is quite handy when the number of basis functions (N) starts creeping into the thousands.