<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Comity of errors</title>
	<atom:link href="http://kaduk.org/bjk/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://kaduk.org/bjk/blog</link>
	<description>the meat, the meta, and the morbid</description>
	<lastBuildDate>Mon, 06 Dec 2010 06:44:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>submitted</title>
		<link>http://kaduk.org/bjk/blog/2010/12/06/submitted/</link>
		<comments>http://kaduk.org/bjk/blog/2010/12/06/submitted/#comments</comments>
		<pubDate>Mon, 06 Dec 2010 06:44:39 +0000</pubDate>
		<dc:creator>bjk</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kaduk.org/bjk/blog/?p=157</guid>
		<description><![CDATA[Last time, I mentioned more about FreeBSD packaging; I&#8217;ve been gradually working on the OpenAFS port, and it is now far enough along that I have submitted it for inclusion in the Ports Collection (pending testing, review, etc.).  Follow the PR for updates on issues with the packaging that get reported and fixed.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://kaduk.org/bjk/blog/2010/11/21/anatomy-of-a-freebsd-port-part-5/">Last time</a>, I mentioned more about FreeBSD packaging; I&#8217;ve been gradually working on the OpenAFS port, and it is now far enough along that I have submitted it for inclusion in the Ports Collection (pending testing, review, etc.).  Follow <a href="http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/152467">the PR</a> for updates on issues with the packaging that get reported and fixed.</p>
]]></content:encoded>
			<wfw:commentRss>http://kaduk.org/bjk/blog/2010/12/06/submitted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Anatomy of a FreeBSD port (part 5)</title>
		<link>http://kaduk.org/bjk/blog/2010/11/21/anatomy-of-a-freebsd-port-part-5/</link>
		<comments>http://kaduk.org/bjk/blog/2010/11/21/anatomy-of-a-freebsd-port-part-5/#comments</comments>
		<pubDate>Mon, 22 Nov 2010 03:59:42 +0000</pubDate>
		<dc:creator>bjk</dc:creator>
				<category><![CDATA[FreeBSD]]></category>

		<guid isPermaLink="false">http://kaduk.org/bjk/blog/?p=154</guid>
		<description><![CDATA[It&#8217;s been quite some time since I last posted about FreeBSD packaging; today I&#8217;m coming back to it to talk a bit about other things that can go in the files/ directory.  I just recently got packaging for OpenAFS in good enough shape to submit to the FreeBSD Ports Collection (the PR is here); [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been quite some time since I last <a href="http://kaduk.org/bjk/blog/2010/04/11/anatomy-of-a-freebsd-port-part-4/">posted</a> about FreeBSD packaging; today I&#8217;m coming back to it to talk a bit about other things that can go in the <code>files/</code> directory.  I just recently got packaging for OpenAFS in good enough shape to submit to the FreeBSD Ports Collection (the PR is <a href="http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/152467">here</a>); there&#8217;s a little bit of cleverness in the <code>Makefile</code> that I&#8217;ll skip for now, in favor of the rc scripts.<br />
Long, long ago, in the early days of Unix (read: before my time), there was a single shell script <code>/etc/rc</code> (the <a href="http://www.catb.org/jargon/">Jargon file</a> claims it to stand for &#8220;runcom&#8221;) that would be run during system startup, executing commands to set up the local environment, start daemons, etc.  Eventually it grew so large that it was split into multiple files, and later a large infrastructure was created so that each service would have a script in <code>/etc/rc.d/</code> and the administrator had mechanisms for controlling which scripts would be run when.  Many of these controls are placed in <code>/etc/rc.conf</code>, and the rc scripts for software from the Ports Collection go in <code>/usr/local/etc/rc.d</code> to keep them separate from the base system.<br />
Instead of just being a shell script that is sourced at startup, modern usage involves invocations such as:</p>
<pre>/usr/local/etc/rc.d/afsd onestart
/usr/local/etc/rc.d/afsd forcestop
/usr/local/etc/rc.d/afsd start
</pre>
<p>with multiple variations on the &#8220;start&#8221; and &#8220;stop&#8221; commands.  In order to be <code>start</code>ed, the appropriate <code>rc.conf</code> variable must be set to enable that service; <code>onestart</code> is a way to start it manually regardless.  In order for this to work, each rc script has to define several shell functions that hook into the (massive) <code>rc.subr</code> (that&#8217;s &#8220;subroutine&#8221;) infrastructure.  Here&#8217;s what I ended up with in <code>files/afsd.in</code>:</p>
<pre>#!/bin/sh
#
# we require afsserver for the (rare, untested) case when a client
# and server are running on the same machine -- the client must not
# start until the server is running.
#
# PROVIDE: afsd
# REQUIRE: afsserver named
</pre>
<p>These keywords are used to order all the rc scripts on system startup (and shutdown) &#8212; dependencies are declared explicitly.</p>
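<p>To see what those keywords look like to a tool, here is a minimal sketch that pulls the names out of a script header with <code>sed</code>.  The <code>rc_keyword</code> helper is hypothetical and only illustrates the comment format; on a real system the ordering itself is computed by <code>rcorder(8)</code>.</p>

```shell
#!/bin/sh
# Print the names listed after an ordering keyword (PROVIDE, REQUIRE, ...)
# in an rc script.  rc_keyword is a hypothetical helper for illustration.
rc_keyword()
{
        # $1 = keyword, $2 = path to the rc script
        sed -n "s/^# $1: *//p" "$2"
}

# Build a throwaway copy of the header shown above.
script=$(mktemp)
printf '%s\n' '#!/bin/sh' '# PROVIDE: afsd' '# REQUIRE: afsserver named' > "$script"

echo "provides: $(rc_keyword PROVIDE "$script")"
echo "requires: $(rc_keyword REQUIRE "$script")"

rm -f "$script"
```

<p>The keywords live in comments, so the shell ignores them when the script runs; only the ordering tool reads them.</p>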
<pre>
. /etc/rc.subr

name="afsd"
rcvar="afsd_enable"
start_cmd="afsd_start"
start_precmd="afsd_prestart"
stop_cmd="afsd_stop"
command="%%PREFIX%%/sbin/${name}"
kmod="libafs"
vicedir="%%PREFIX%%/etc/openafs"
</pre>
<p>The file has a <code>.in</code> suffix because variable substitution is applied to it.  Here, <code>%%PREFIX%%</code> gets expanded to the prefix that the port is being built with; this is usually <code>/usr/local</code> but can be something else.</p>
<pre>

load_rc_config "$name"
eval "${rcvar}=\${${rcvar}:-'NO'}"
</pre>
<p>This is us checking if we&#8217;re listed in <code>rc.conf</code>; default to disabled if not mentioned.</p>
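<p>The <code>eval</code> line is dense, so here is a small standalone sketch of the same idiom.  The <code>default_to_no</code> wrapper is a hypothetical name introduced only for this example; the <code>eval</code> inside it is the same construct as in the script.</p>

```shell
#!/bin/sh
# Give the variable whose NAME is passed in $1 the value "NO",
# unless it already has a non-empty value.
default_to_no()
{
        # For $1=afsd_enable, eval sees: afsd_enable=${afsd_enable:-'NO'}
        eval "$1=\${$1:-'NO'}"
}

unset afsd_enable
default_to_no afsd_enable
echo "not mentioned in rc.conf: afsd_enable=${afsd_enable}"

afsd_enable="YES"       # the effect a line in rc.conf would have
default_to_no afsd_enable
echo "enabled in rc.conf:       afsd_enable=${afsd_enable}"
```

<p>The backslash keeps the outer shell from expanding the second <code>$</code>, so the parameter expansion only happens inside <code>eval</code>, after the variable name has been filled in.</p>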
<pre>

afsd_prestart()
</pre>
<p>This is one of the functions that hooks into <code>rc.subr</code> &#8212; AFS requires several configuration files and a kernel module to be in place before it can start, so we check that they&#8217;re all there and give a useful error if not.  This is quite helpful for users who are not familiar with how to start <code>afsd</code> manually.</p>
<pre>
{
        # not going very far without a kernel module
        if ! kldstat -qm afs; then
                echo "Loading AFS kernel module..."
                if ! kldload $kmod; then
                        echo "Failed to enable kernel support. Aborting."
                        return 1;
                fi
        fi
        # now we have a kernel module; check for conffiles
        for file in cacheinfo ThisCell CellServDB; do
                if [ ! -f ${vicedir}/${file} ]; then
                        echo "${vicedir}/${file} does not exist.  Not starting AFS client."
                        return 1
                fi
        done
        # need a mountpoint and a cache dir (well, if we have a disk cache)
        for dir in $(awk -F: '{print $1, $2}' ${vicedir}/cacheinfo); do
                if [ ! -d ${dir} ]; then
                        echo "${dir} does not exist. Not starting AFS client."
                        return 2
                fi
        done
}

afsd_start()
{
        # you probably don't want to change these
        afsd_default_args="-memcache -dynroot -fakestat-all -afsdb"
        # either set explicit extra args or just a size; default medium
        afsd_args=${afsd_args:-'MEDIUM'}
        case ${afsd_args} in
        LARGE)
                afsd_args="-stat 2800 -dcache 2400 -daemons 5 -volumes 128"
                ;;
        MEDIUM)
                afsd_args="-stat 2000 -dcache 800 -daemons 3 -volumes 70"
                ;;
        SMALL)
                afsd_args="-stat 300 -dcache 100 -daemons 2 -volumes 50"
                ;;
        esac
        ${command} ${afsd_default_args} ${afsd_args}
}
</pre>
<p>The actual start function.  We check to see if we&#8217;ve been given extra arguments, using sane defaults if not.  There are also a few things that you basically always want (for example, non-memcache is currently broken), which are listed in a local variable.</p>
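<p>As a sketch of how the size keywords behave, the selection logic can be pulled out into a function and run on its own.  The <code>expand_afsd_args</code> name is hypothetical; the flag values are copied from the script above.</p>

```shell
#!/bin/sh
# Expand a size keyword into explicit afsd flags.  Anything that is not
# a known keyword is assumed to be explicit flags and passed through.
expand_afsd_args()
{
        case ${1:-MEDIUM} in
        LARGE)  printf '%s\n' "-stat 2800 -dcache 2400 -daemons 5 -volumes 128" ;;
        MEDIUM) printf '%s\n' "-stat 2000 -dcache 800 -daemons 3 -volumes 70" ;;
        SMALL)  printf '%s\n' "-stat 300 -dcache 100 -daemons 2 -volumes 50" ;;
        *)      printf '%s\n' "$1" ;;
        esac
}

expand_afsd_args ""                        # nothing set: falls back to MEDIUM
expand_afsd_args "SMALL"
expand_afsd_args "-stat 500 -daemons 4"    # explicit flags are left alone
```

<p>Given the variable names in the script, the matching <code>rc.conf</code> entries would presumably be <code>afsd_enable="YES"</code> plus an optional <code>afsd_args</code> holding either a keyword or a full flag string.</p>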
<pre>

afsd_stop()
{
        afsdir=$(awk -F: '{print $1}' ${vicedir}/cacheinfo)
        umount ${afsdir}
        _return=$?
        [ "${_return}" -ne 0 ] &#038;&#038; [ -n "${rc_force}" ] &#038;&#038; umount -f ${afsdir}
        kldunload ${kmod}
}
</pre>
<p>Stopping does not actually involve touching <code>afsd</code> at all &#8212; those processes will happily ignore whatever you throw at them.  We must check that AFS is mounted (as someone might be erroneously running <code>onestop</code>), and then just run the <code>umount</code> command to stop things.  We also check for whether force is being used, passing that on to <code>umount</code> if needed.</p>
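<p>Both the prestart and stop functions lean on <code>awk</code> to read <code>cacheinfo</code>, which holds a single colon-separated line of mount point, cache directory, and cache size.  A minimal sketch of that parsing follows; the sample line and the <code>cacheinfo_field</code> helper are assumptions made up for the example.</p>

```shell
#!/bin/sh
# Print one colon-separated field of a cacheinfo line.
# cacheinfo_field is a hypothetical helper for illustration.
cacheinfo_field()
{
        # $1 = field number, $2 = a line in mountpoint:cachedir:size form
        printf '%s\n' "$2" | awk -F: -v n="$1" '{print $n}'
}

# Sample contents; the real file lives in ${vicedir}/cacheinfo.
line="/afs:/usr/vice/cache:150000"

echo "mount point: $(cacheinfo_field 1 "$line")"
echo "cache dir:   $(cacheinfo_field 2 "$line")"
```

<p>The stop function only needs the first field, since that is the path handed to <code>umount</code>.</p>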
<pre>

run_rc_command "$1"
</pre>
<p>This last line is very important!  It looks very mundane, but it is how we actually interface with the rc system; <code>rc.subr</code> defines this function, which does all the necessary variable-name munging and calls the appropriate function(s) that we have defined.</p>
<p>At install time for the port, the substituted variables are replaced, and the script is installed into <code>${PREFIX}/etc/rc.d</code> and added to the package list to be removed at deinstall time.  All in all, we wrap a standard interface around the complicated <code>afsd</code> semantics.</p>
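<p>The ports framework performs the <code>%%PREFIX%%</code> substitution itself, but its effect is easy to imitate.  The sketch below uses plain <code>sed</code> and a hypothetical <code>substitute_prefix</code> helper; it is not how the framework is implemented, just what the result looks like.</p>

```shell
#!/bin/sh
# Replace every %%PREFIX%% marker on stdin with the given prefix.
# substitute_prefix is a hypothetical stand-in for the ports machinery.
substitute_prefix()
{
        # $1 = prefix to substitute, e.g. /usr/local
        sed -e "s|%%PREFIX%%|$1|g"
}

# Two lines taken from files/afsd.in, run through the substitution.
printf '%s\n' \
        'command="%%PREFIX%%/sbin/${name}"' \
        'vicedir="%%PREFIX%%/etc/openafs"' |
        substitute_prefix /usr/local
```

<p>Note that <code>${name}</code> is left untouched: it is an ordinary shell variable that is expanded only when the installed script runs.</p>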
]]></content:encoded>
			<wfw:commentRss>http://kaduk.org/bjk/blog/2010/11/21/anatomy-of-a-freebsd-port-part-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>and bad locking</title>
		<link>http://kaduk.org/bjk/blog/2010/11/15/and-bad-locking/</link>
		<comments>http://kaduk.org/bjk/blog/2010/11/15/and-bad-locking/#comments</comments>
		<pubDate>Mon, 15 Nov 2010 09:27:50 +0000</pubDate>
		<dc:creator>bjk</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kaduk.org/bjk/blog/?p=151</guid>
		<description><![CDATA[Last time, I discussed some locking issues in OpenAFS and mentioned that fixing them uncovered a race condition elsewhere.
OpenAFS has a somewhat complicated locking strategy, but there are parts of it that rely on the afs_global_mtx, or the GLOCK for short.  The GLOCK should not be held across sleeps, as this could cause the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://kaduk.org/bjk/blog/2010/11/07/more-locking/">Last time</a>, I discussed some locking issues in OpenAFS and mentioned that fixing them uncovered a race condition elsewhere.<br />
OpenAFS has a somewhat complicated locking strategy, but parts of it rely on the <code>afs_global_mtx</code>, or the GLOCK for short.  The GLOCK should not be held across sleeps, as this could cause the client to hang.  It is, however, needed to synchronize some operations that must sleep.  So the sleep routines are built on <code>mtx_sleep</code>, which drops the mutex before the actual sleep and reacquires it afterwards.  Other threads may have acquired the GLOCK in the intervening time, so any checks that were made before the sleep must be made again (or the programmer must otherwise ensure that their values could not have changed).  This was problematic in the <code>afs_root</code> function, where the GLOCK is used to serialize access to a global variable, <code>afs_globalVp</code>, which points to the vcache entry for the root AFS vnode.  The relevant code is:</p>
<pre>    if (afs_globalVp &#038;&#038; (afs_globalVp->f.states &#038; CStatd)) {
        tvp = afs_globalVp;
        error = 0;
    } else {
tryagain:
        if (afs_globalVp) {
            afs_PutVCache(afs_globalVp);
            /* vrele() needed here or not? */
            afs_globalVp = NULL;
        }
</pre>
<p>The <code>afs_PutVCache</code> function sleeps, dropping the GLOCK in the process.  So it is possible for some other thread to have come into this block of code at the same time, and <strong>also</strong> try to call <code>afs_PutVCache</code> on <code>afs_globalVp</code>.  When this happens, the core <code>vputx</code> routine used to implement <code>afs_PutVCache</code> sees that the reference count on the global vnode entry is negative, which violates an invariant of the VFS layer.  (This means a kernel panic under my debugging options.)<br />
The fix, then, is to make sure that all changes to <code>afs_globalVp</code> involve conditions and actions checked while the GLOCK is held and no sleeps have been made.  If the thread sleeps, conditions must be re-checked.<br />
For this block of code, the change is easy &#8212; set <code>afs_globalVp</code> to NULL <strong>before</strong> calling <code>afs_PutVCache</code> (storing the value in an intermediate variable), so that other threads will see that the removal has been queued, even if it has not actually taken place yet.  However, this is not the only bug of this form in the <code>afs_root</code> function &#8212; a reminder that great care must be taken with locking strategies, and that sleeps can come in surprising places.</p>
]]></content:encoded>
			<wfw:commentRss>http://kaduk.org/bjk/blog/2010/11/15/and-bad-locking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>more locking</title>
		<link>http://kaduk.org/bjk/blog/2010/11/07/more-locking/</link>
		<comments>http://kaduk.org/bjk/blog/2010/11/07/more-locking/#comments</comments>
		<pubDate>Mon, 08 Nov 2010 04:51:59 +0000</pubDate>
		<dc:creator>bjk</dc:creator>
				<category><![CDATA[FreeBSD]]></category>

		<guid isPermaLink="false">http://kaduk.org/bjk/blog/?p=148</guid>
		<description><![CDATA[Last time, I started off with some locking fixups for the OpenAFS client on FreeBSD, but left off in the middle.  We had fixed the cosmetic errors coming from afs_FlushVCache, but not the underlying problem.  We recall that the cosmetic issues arose due to incorrect locking around accesses to the v_usecount field of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://kaduk.org/bjk/blog/2010/11/01/locking/">Last time</a>, I started off with some locking fixups for the OpenAFS client on FreeBSD, but left off in the middle.  We had fixed the cosmetic errors coming from <code>afs_FlushVCache</code>, but not the underlying problem.  We recall that the cosmetic issues arose due to incorrect locking around accesses to the <code>v_usecount</code> field of the vnode; it turns out that there are only a small number of places where this happens.  Most of them are uninteresting, just a quick check and not much else.  But in <code>osi_VM_FlushVCache</code> (which is FreeBSD-specific code), we check the use count once, and then again later on in the function, and then we call <code>vgone()</code>.  <code>vgone()</code> is a heavyweight function call, marking a vnode as being free for reuse.  As such, it requires some pretty heavy locking around calls to it &#8212; in particular, it requires an exclusive lock on the vnode.  (<code>vgone</code> also acquires the vnode interlock internally.)  However, this is not quite sufficient, as once <code>vgone</code> places the vnode on the free list, it is susceptible to being destroyed (the FreeBSD VFS layer runs a cleaner periodically).  But we still have the vnode locked!  We need to unlock it after <code>vgone</code> returns, and need some mechanism to ensure that it doesn&#8217;t go away in the meantime.  This is done by placing a &#8220;hold&#8221; on the vnode, or increasing its &#8220;hold count&#8221;.  This is a closely related idea to the use count (and in fact the usual way to increment the use count also increments the hold count), but needs to be tracked separately for implementation details like this.  So, then, we put a hold on the vnode before sending it away (keeping the interlock held for efficiency), then do the unlock, and then drop the hold.  
This procedure is sufficiently internal to the VFS layer so as to not be documented in a man page; I learned it because a FreeBSD VFS expert directed me to the <code>vlrureclaim</code> function in <code>sys/kern/vfs_subr.c</code>.  It is not quite the same, as it is iterating over a list of vnodes, but it does go and free up vnodes that are not currently being used, so the checks and locks it takes are a good example for my use case.<br />
The extra bonus of going and implementing a proper set of instructions around <code>vgone</code> is that it allowed the removal of some duplicated work!  In addition to <code>osi_VM_FlushVCache</code>, the <code>osi_TryEvictVCache</code> function was doing some checks and then calling <code>vgone</code>.  Well, more properly, it was calling <code>vgonel</code>, which is not supposed to be an exported symbol but just happened to work due to an implementation detail of the kernel linker!  It turns out that the checks it was doing are exactly the same ones done in <code>osi_VM_FlushVCache</code>, so the latter can be implemented in terms of the former, removing a goodly chunk of code.  (It actually wants to be implemented in terms of <code>afs_FlushVCache</code>, which does some additional bookkeeping on the number of vcaches in use, but I didn&#8217;t realize this until after the code was committed; I ran into it while tracking down another issue.)  This change allowed my (multi-threaded) testing to go far enough to expose a rare race condition elsewhere in the codebase, which we&#8217;ll cover next time.</p>
]]></content:encoded>
			<wfw:commentRss>http://kaduk.org/bjk/blog/2010/11/07/more-locking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>locking</title>
		<link>http://kaduk.org/bjk/blog/2010/11/01/locking/</link>
		<comments>http://kaduk.org/bjk/blog/2010/11/01/locking/#comments</comments>
		<pubDate>Mon, 01 Nov 2010 05:11:07 +0000</pubDate>
		<dc:creator>bjk</dc:creator>
				<category><![CDATA[FreeBSD]]></category>

		<guid isPermaLink="false">http://kaduk.org/bjk/blog/?p=143</guid>
		<description><![CDATA[Things have been kind of locked up, here, for the past few weeks.  But I have finally gotten around to getting some interesting work done, on the OpenAFS front.  During the long run of a &#8220;buildworld&#8221; in AFS, I would eventually get a warning on the console that the afs_vop_reclaim() function (i.e. the [...]]]></description>
			<content:encoded><![CDATA[<p>Things have been kind of locked up, here, for the past few weeks.  But I have finally gotten around to getting some interesting work done, on the OpenAFS front.  During the long run of a &#8220;buildworld&#8221; in AFS, I would eventually get a warning on the console that the <code>afs_vop_reclaim()</code> function (i.e. the actual function that gets used when the <code>VOP_RECLAIM()</code> operation is performed on a vnode of type &#8220;afs&#8221;) had hit an error condition where the routine that should have removed all AFS content from that vnode failed:</p>
<pre>
afs_vop_reclaim: afs_FlushVCache failed code 16
</pre>
<p>Code 16 is <code>EBUSY</code>, which is actually returned from several places in that function.  Placing print statements before all of them, and triggering the bug again, reveals that the reference count of the vnode (as determined by AFS&#8217;s <code>VREFCOUNT()</code> macro) was too large, implying that someone else was probably trying to use that vnode.  For extra fun, later on the <code>buildworld</code> step would fail, usually claiming that it could not find a particular header file.  Using the <code>fs getfid</code> command (from a different computer), it was clear that the file existed, and the <code>fid</code> used to identify that file is the same as the one that we failed to flush properly.  Clearly, this bug was leaving a corrupt vnode floating around, and this corruption was later crashing the build.<br />
Now, what does <code>VREFCOUNT</code> actually do? The relevant block of code is</p>
<pre>
#if defined(AFS_XBSD_ENV) || defined(AFS_DARWIN_ENV)
#define vrefCount   v->v_usecount
[...]
#elif defined(AFS_XBSD_ENV) || defined(AFS_DARWIN_ENV)
#define VREFCOUNT(v)          ((v)->vrefCount)
#define VREFCOUNT_GT(v, y)    (AFSTOV(v)->v_usecount > (y))
</pre>
<p>(which is rather ugly); both <code>VREFCOUNT</code> and <code>VREFCOUNT_GT</code> use the <code>v_usecount</code> field of the vnode associated with the given vcache.  Now, FreeBSD has a locking strategy for vnode elements (and for the vnodes themselves, but that&#8217;s getting ahead of ourselves), which is laid out in the <code>sys/vnode.h</code> system header.</p>
<pre>
/*
 * Reading or writing any of these items requires holding the appropriate lock.
 *
 * Lock reference:
 *      c - namecache mutex
 *      f - freelist mutex
 *      G - Giant
 *      i - interlock
 *      m - mntvnodes mutex
 *      p - pollinfo lock
 *      s - spechash mutex
 *      S - syncer mutex
 *      u - Only a reference to the vnode is needed to read.
 *      v - vnode lock
[...]
        /*
         * Locking
         */
        struct  lock v_lock;                    /* u (if fs don't have one) */
        struct  mtx v_interlock;                /* lock for "i" things */
        struct  lock *v_vnlock;                 /* u pointer to vnode lock */
        int     v_holdcnt;                      /* i prevents recycling. */
        int     v_usecount;                     /* i ref count of users */
        u_long  v_iflag;                        /* i vnode flags (see below) */
        u_long  v_vflag;                        /* v vnode flags */
        int     v_writecount;                   /* v ref count of writers */
</pre>
<p>As we can see, the locking strategy requires that accesses to <code>v_usecount</code> hold the vnode interlock; OpenAFS was failing to do so.  Conveniently, there is a wrapper function <code>vrefcnt()</code> that takes the interlock, reads the use count into a local variable, and then returns the local variable.  Changing the <code>VREFCOUNT</code> macros to use this function did eliminate the console warnings about <code>afs_FlushVCache</code> returning <code>EBUSY</code> &#8230; but it did not fix the buildworld.  Still, we get compilation errors stemming from files that are mysteriously &#8220;missing&#8221;.  The fix involves more locking, but the story is a bit more involved than I have space left, here; it&#8217;s on tap for next time.</p>
]]></content:encoded>
			<wfw:commentRss>http://kaduk.org/bjk/blog/2010/11/01/locking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Marmalade</title>
		<link>http://kaduk.org/bjk/blog/2010/10/04/marmelade/</link>
		<comments>http://kaduk.org/bjk/blog/2010/10/04/marmelade/#comments</comments>
		<pubDate>Mon, 04 Oct 2010 05:45:33 +0000</pubDate>
		<dc:creator>bjk</dc:creator>
				<category><![CDATA[Food]]></category>

		<guid isPermaLink="false">http://kaduk.org/bjk/blog/?p=141</guid>
		<description><![CDATA[A month or so back, I went to a friend&#8217;s &#8220;house-cooling&#8221;, that is, the party right before they moved across the country and wanted to give away all the stuff they weren&#8217;t moving with them.  At this point, there basically wasn&#8217;t any furniture in the house, so we were standing and/or sitting on the [...]]]></description>
			<content:encoded><![CDATA[<p>A month or so back, I went to a friend&#8217;s &#8220;house-cooling&#8221;, that is, the party right before they moved across the country and wanted to give away all the stuff they weren&#8217;t moving with them.  At this point, there basically wasn&#8217;t any furniture in the house, so we were standing and/or sitting on the floor.  Various people had brought snacks and mixers to go with the alcohol collection (which was also up for grabs!  Sadly, his dad got dibs on the scotch whiskey), which were quite delicious.  Among the things I came home with were a different class of foodstuffs, though &#8212; marmalades.  He had quite a collection of them, sometimes picking up very interesting things while traveling.  Among the delicacies I acquired were both lemon and lime marmalade, and a clementine marmalade that, instead of having strips of zest, had whole slices of the fruit!  Of course, since I got the collection used, as it were, many of the jars were almost gone, so I only got a little bit of use from them.  They were nonetheless good enough that I have switched from putting maple syrup on my daily waffle breakfast to using jams, jellies, and marmalades.  I have a relatively standard orange marmalade that I picked up from Trader Joe&#8217;s, but kind of wonder whether I could find something more interesting in the local area.</p>
]]></content:encoded>
			<wfw:commentRss>http://kaduk.org/bjk/blog/2010/10/04/marmelade/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Such a clang</title>
		<link>http://kaduk.org/bjk/blog/2010/09/27/such-a-clang/</link>
		<comments>http://kaduk.org/bjk/blog/2010/09/27/such-a-clang/#comments</comments>
		<pubDate>Mon, 27 Sep 2010 05:31:16 +0000</pubDate>
		<dc:creator>bjk</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kaduk.org/bjk/blog/?p=138</guid>
		<description><![CDATA[Recent versions of FreeBSD are shipping with a clang binary &#8212; the C compiler using the LLVM compiler backend.  Very recent versions of FreeBSD even can be compiled with it and run normally.  Clang is an exciting development, since it has lots of nice static analysis and very clear warnings (well, at least [...]]]></description>
			<content:encoded><![CDATA[<p>Recent versions of FreeBSD are shipping with a <code>clang</code> binary &#8212; the C compiler using the LLVM compiler backend.  Very recent versions of FreeBSD even can be compiled with it and run normally.  Clang is an exciting development, since it has lots of nice static analysis and very clear warnings (well, at least as compared to gcc) and is pretty easily extensible.<br />
Of course, having acquired a clang binary that can compile the FreeBSD kernel, my first instinct was to throw the OpenAFS source at it, to see what sorts of new and exciting warnings it gives about the codebase.  After a bit of a detour figuring out how to suppress the color (!) in the output (which doesn&#8217;t work very well with logging it to file), I did get a <code>libafs.ko</code> module built with clang, and went to try it out.  Very quickly, I got an &#8220;unexpected FPU use in kernel mode&#8221; page fault, and perhaps not too surprisingly, attempting to get a crash dump caused the system to hang.<br />
Unfortunately, my test system is remote, so my clang-y crash has taken the wind out of my pipes.</p>
]]></content:encoded>
			<wfw:commentRss>http://kaduk.org/bjk/blog/2010/09/27/such-a-clang/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>archive or archive</title>
		<link>http://kaduk.org/bjk/blog/2010/09/19/archive-or-archive/</link>
		<comments>http://kaduk.org/bjk/blog/2010/09/19/archive-or-archive/#comments</comments>
		<pubDate>Mon, 20 Sep 2010 03:37:22 +0000</pubDate>
		<dc:creator>bjk</dc:creator>
				<category><![CDATA[FreeBSD]]></category>

		<guid isPermaLink="false">http://kaduk.org/bjk/blog/?p=134</guid>
		<description><![CDATA[Another AFS-inspired post, though here it is just barely involved.  Previously, I mentioned that I could recompile the entire operating system with a recent version of OpenAFS on FreeBSD.  However, my first attempt to do so failed, with a rather curious error:
building static egacy library
ar: fatal: Numeric user ID too large
*** Error code [...]]]></description>
			<content:encoded><![CDATA[<p>Another AFS-inspired post, though here it is just barely involved.  <a href="http://kaduk.org/bjk/blog/2010/08/30/a-world-of-success/">Previously</a>, I mentioned that I could recompile the entire operating system with a recent version of OpenAFS on FreeBSD.  However, my first attempt to do so failed, with a rather curious error:<br />
<code>building static egacy library<br />
ar: fatal: Numeric user ID too large<br />
*** Error code 70</code><br />
The <code>ar(1)</code> utility is used to generate static libraries, and many such libraries are generated during the buildworld process.  However, the <code>ar</code> file format stores the uid and gid of the object files that comprise the archive, and there is a fixed-width field for storing the uid (six columns).  This usually works just fine, since Unix user IDs are capped at 2^16 or so, which is only five columns.  However, in AFS, these uids must be globally unique, and can be quite large &#8212; in particular, the protection database entry for <code>daemon.freebuild</code> (the Kerberos principal I was using for testing) is 33554737, which decidedly does not fit into six columns!</p>
<p>I got around to doing some research, and none of the other systems I tested made a failure to represent the uid a fatal error:  Linux simply truncated it to 335547, Solaris capped it at 600001, and OS X took the remainder modulo some power of 2 (between 8 and 25), leaving it as 217.  FreeBSD&#8217;s libarchive infrastructure makes this a fairly easy patch to write, and I&#8217;m currently testing a patch that makes this condition non-fatal for submission to upstream.</p>
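<p>The width problem is easy to reproduce outside of <code>ar</code>.  The sketch below assumes only that the uid is stored as decimal text in a six-column field, as described above; the <code>uid_fits_ar_field</code> helper is hypothetical, and the truncation branch imitates the keep-the-first-six-digits behavior I saw on Linux rather than any particular implementation.</p>

```shell
#!/bin/sh
# Succeed if the decimal uid in $1 fits in a six-column text field.
# uid_fits_ar_field is a hypothetical helper for illustration.
uid_fits_ar_field()
{
        [ "${#1}" -le 6 ]
}

for uid in 65535 33554737; do
        if uid_fits_ar_field "$uid"; then
                echo "$uid fits in six columns"
        else
                # Keep only the first six characters of the number.
                echo "$uid does not fit; first six digits: $(printf '%.6s' "$uid")"
        fi
done
```

<p>A traditional 16-bit uid passes, while the AFS-assigned one needs eight columns and has to be rejected, truncated, or otherwise mangled.</p>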
]]></content:encoded>
			<wfw:commentRss>http://kaduk.org/bjk/blog/2010/09/19/archive-or-archive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>lockers</title>
		<link>http://kaduk.org/bjk/blog/2010/09/13/lockers/</link>
		<comments>http://kaduk.org/bjk/blog/2010/09/13/lockers/#comments</comments>
		<pubDate>Mon, 13 Sep 2010 06:26:05 +0000</pubDate>
		<dc:creator>bjk</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kaduk.org/bjk/blog/?p=130</guid>
		<description><![CDATA[MIT makes heavy use of the AFS network filesystem in its Athena computing environment.  (This leads to my interest in OpenAFS support for FreeBSD, of course.)  One nifty feature of this setup is the concept of a &#8220;locker&#8221;, which corresponds to a particular bucket of AFS storage.  (Actually, there can be lockers [...]]]></description>
			<content:encoded><![CDATA[<p>MIT makes heavy use of the AFS network filesystem in its Athena computing environment.  (This leads to my interest in OpenAFS support for FreeBSD, of course.)  One nifty feature of this setup is the concept of a &#8220;locker&#8221;, which corresponds to a particular bucket of AFS storage.  (Actually, there can be lockers backed with other filesystems, but those are quite rare these days.)  AFS divides storage into <em>volumes</em>, which have a particular quota and are mounted at a particular point in the global <code>/afs</code> namespace.<br />
What a locker does is give an AFS volume an entry in a <code>/mit</code> namespace, collapsing a very broad tree into a nice flat area.  I only <code>attach</code> the lockers that I&#8217;m interested in, so <code>/mit</code> doesn&#8217;t get very full.<br />
For example, my home directory is <code>/afs/athena.mit.edu/user/k/a/kaduk</code> (quite a mouthful!), but is available at <code>/mit/kaduk</code> when I log in.<br />
In addition to user lockers, there are also organization lockers (for student groups and the like), project lockers (for (usually software) projects), and more.  Project lockers are useful in that software installed in them is available on any Athena machine, without being installed locally on each machine&#8217;s hard drive.  This is something of a feat when different Athena machines are based on different operating systems (or even different versions of the same OS) or have different word lengths.  A key AFS feature here is its use of <code>sysname</code>s: each machine has a list of names that it recognizes as identifying compatible software builds.  For example, on most current Athena cluster machines, the primary sysname is <code>amd64_ubuntu1004</code>, as these are 64-bit Ubuntu Lucid Lynx machines.  But they could probably also run code for <code>amd64_ubuntu910</code> machines, and maybe even <code>i386_ubuntu1004</code> machines.  A maintainer of locker software can deploy multiple copies of the same software, compiled for different systems, and make <code>bin</code> a symlink to <code>arch/@sys/bin</code> so that the appropriate version is selected automatically.<br />
I have recently become a maintainer of locker software in earnest, installing the <code>tmux</code> utility in the <code>bsd</code> locker (after a suggestion in the SIPB office).  Of course, I only have it installed for one sysname at the moment, so there&#8217;s more work to be done &#8230;</p>
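<p>The <code>@sys</code> symlink layout described above can be sketched in a scratch directory.  The <code>make_locker_layout</code> helper and the choice of sysnames are assumptions for illustration; outside of AFS the link simply dangles, since it is the AFS client that replaces <code>@sys</code> with the machine&#8217;s sysname when it resolves the path.</p>

```shell
#!/bin/sh
# Lay out per-sysname bin directories plus the @sys symlink.
# make_locker_layout is a hypothetical helper for illustration.
make_locker_layout()
{
        # $1 = directory standing in for the locker root
        mkdir -p "$1/arch/amd64_ubuntu1004/bin" "$1/arch/i386_ubuntu1004/bin"
        # Quoted so the shell stores the literal string @sys in the link.
        ln -s 'arch/@sys/bin' "$1/bin"
}

locker=$(mktemp -d)
make_locker_layout "$locker"
echo "bin -> $(readlink "$locker/bin")"
rm -rf "$locker"
```

<p>Supporting another platform then means adding one more directory under <code>arch/</code>; the symlink itself never changes.</p>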
]]></content:encoded>
			<wfw:commentRss>http://kaduk.org/bjk/blog/2010/09/13/lockers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>a world of success</title>
		<link>http://kaduk.org/bjk/blog/2010/08/30/a-world-of-success/</link>
		<comments>http://kaduk.org/bjk/blog/2010/08/30/a-world-of-success/#comments</comments>
		<pubDate>Mon, 30 Aug 2010 05:09:39 +0000</pubDate>
		<dc:creator>bjk</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://kaduk.org/bjk/blog/?p=125</guid>
		<description><![CDATA[Having finally made some advances on the OpenAFS front, I had achieved a state that was able to copy, read, and write a large data set without error, hang, or crash.  However, I was unable to run executables from AFS, which presented a serious obstacle to passing the lazy man&#8217;s filesystem stress test: &#8216;make [...]]]></description>
			<content:encoded><![CDATA[<p>Having finally made some advances on the <a href="http://kaduk.org/bjk/blog/2010/07/05/integrity/">OpenAFS front</a>, I had achieved a state that was able to copy, read, and write a large data set without error, hang, or crash.  However, I was unable to run executables from AFS, which presented a serious obstacle to passing the lazy man&#8217;s filesystem stress test: &#8216;make buildworld&#8217;.  This target recompiles from scratch an entire build toolchain, and uses that (updated) toolchain to rebuild the entire operating system from scratch.  As such, it can put a fair bit of load on a filesystem (and a CPU, for that matter).<br />
Asking on the freebsd-fs@FreeBSD.org mailing list, a simple suggestion was made that would account for the displayed symptoms.  (This involved two different mechanisms for tracking what is effectively a file&#8217;s size, and only one of them being updated.)  After applying that fix, and a workaround for some locking issues, I now have an OpenAFS installation that can survive the buildworld informal stress test.<br />
To be fair, it&#8217;s not perfect &#8212; attempting a parallel make with simultaneous compilation processes still causes a deadlock, but it&#8217;s a big milestone, and cause for some celebration.</p>
]]></content:encoded>
			<wfw:commentRss>http://kaduk.org/bjk/blog/2010/08/30/a-world-of-success/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

