
Wednesday, May 28, 2008

More NFS...

I am able to reproduce the issue at will now using both a Linux NFS
server and the NAS system. We can pretty safely rule out the NAS as the
culprit (though let's keep the ticket open, in case they've seen this
before).

As I've already explained, Linux caches file names to avoid hitting
the disk on every lookup (the dentry cache). When there is a cache hit
(either negative or positive), the kernel calls a 'validate' function
to ensure that the cached data is still good.
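
If you want to watch that cache in action, /proc/sys/fs/dentry-state gives
a rough view (the first two fields are the total and unused dentry counts).
Stat'ing a pile of nonexistent names on a scratch path makes the negative
entries show up as a jump in the count:

    [root@cluster-util ~]# cat /proc/sys/fs/dentry-state
    [root@cluster-util ~]# for i in $(seq 1 1000); do
    >     stat /tmp/no_such_file_$i > /dev/null 2>&1
    > done
    [root@cluster-util ~]# cat /proc/sys/fs/dentry-state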

On local file systems, that function is simply a no-operation as all
updates need to go through the cache. Coherency is guaranteed.

On remote file systems (NFS, GFS, etc.) that call does "something" to
validate the contents of the cache. If the backing data has changed,
the entry is invalidated and the entire lookup process starts over.

If there is a cache entry, the NFS validate function compares the cached
mtime on that entry's parent with the current mtime as reported by the
NFS server. If they differ, the entry is invalidated.
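
Both sides of that comparison are easy to eyeball with the hosts from the
test below: ask a second client for the parent directory's current mtime
and compare it against the client holding the cached entry (GNU stat's
-c %y prints just the mtime). If cluster-www1a already reports the new
mtime yet the lookup still fails, the negative entry survived a comparison
that should have invalidated it:

    [root@cluster-util ~]# ssh root@cluster-sshftp1a "stat -c %y /home/cluster1/data/f/f/nfsuser01/"
    [root@cluster-util ~]# ssh root@cluster-www1a "stat -c %y /home/cluster1/data/f/f/nfsuser01/"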

That process appears not to work correctly when dealing with negative
entries. It seems to work fine for positive entries: file *changes* show
up, it's just new files that sometimes don't.

So, after spending some time tracing how this whole process actually
works, I was able to reproduce the problem at will (there's a condensed
script after the walkthrough):

  1. Stat a file that we know is nonexistent. This populates the dentry
    cache with a negative entry.

    [root@cluster-util ~]# ssh root@cluster-www1a "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
    stat: cannot stat `/home/cluster1/data/f/f/nfsuser01/test_nofile': No such file or directory


  2. Create that file on a different server. This also updates the mtime
    on the parent directory, so the NFS validation code on the dentry hit
    ought to catch it.

    [root@cluster-util ~]# ssh root@cluster-sshftp1a "touch /home/cluster1/data/f/f/nfsuser01/test_nofile"


  3. Try and stat the file again. Still broken.

    [root@cluster-util ~]# ssh root@cluster-www1a "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
    stat: cannot stat `/home/cluster1/data/f/f/nfsuser01/test_nofile': No such file or directory


  4. Read the parent directory. This doesn't actually *repopulate* the
    cache as I first thought; it invalidates the parent directory itself
    (which is what should have happened correctly the first time through).
    It doesn't invalidate it "the same way", though.

    [root@cluster-util ~]# ssh root@cluster-www1a "ls /home/cluster1/data/f/f/nfsuser01/ | wc -l"
    16


  5. And now it's present, as evidenced by the stat command below.

    [root@cluster-util ~]# ssh root@cluster-www1a "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
      File: `/home/cluster1/data/f/f/nfsuser01/test_nofile'
      Size: 0               Blocks: 0          IO Block: 4096   regular empty file
    Device: 15h/21d         Inode: 4046108346  Links: 1
    Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2008-05-28 10:07:28.963000000 -0400
    Modify: 2008-05-28 10:07:28.963000000 -0400
    Change: 2008-05-28 10:07:28.963000000 -0400
    [root@cluster-util ~]#

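Here's the condensed version of the above, as one script run from
cluster-util (the file name, test_nofile2, is made up so the test starts
from a clean miss):

    #!/bin/bash
    # Walk the stale negative dentry bug end-to-end across the two clients.
    F=/home/cluster1/data/f/f/nfsuser01/test_nofile2

    ssh root@cluster-www1a    "stat $F"   # 1. miss: populates a negative entry
    ssh root@cluster-sshftp1a "touch $F"  # 2. create elsewhere; bumps parent mtime
    ssh root@cluster-www1a    "stat $F"   # 3. bug: still "No such file or directory"
    ssh root@cluster-www1a    "ls ${F%/*} > /dev/null"   # 4. readdir on the parent
    ssh root@cluster-www1a    "stat $F"   # 5. now it shows up
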

Next step is going to be to load up a debug kernel or stick a SystemTap script in there to attempt to determine why that negative entry isn't invalidated as it should be.
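
If it comes to SystemTap, something like this would be a starting point. It
assumes NFS is loaded as a module, that kernel debuginfo is installed, and
that this kernel's revalidate hook is nfs_lookup_revalidate (the
d_revalidate implementation in mainline fs/nfs/dir.c); all worth
double-checking against the running kernel first:

    stap -e 'probe module("nfs").function("nfs_lookup_revalidate") {
        printf("revalidate: %s\n", kernel_string($dentry->d_name->name))
    }'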
