server or the NAS system. We can pretty safely rule out the NAS as the
culprit (though let's keep the ticket open, in case they've seen this
before).
As I've already explained, Linux caches file names to avoid hitting the
disk on every lookup (the dentry cache). When there is a cache hit
(either negative or positive), the kernel calls a 'validate' function to
ensure that the cached data is still good.
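For what it's worth, you can get a rough view of that cache from
userspace. A quick sketch (both are standard /proc files, though the
exact fields and the slab name vary a bit between kernel versions):

# Totals for the dentry cache: allocated entries, unused entries, age limit, ...
cat /proc/sys/fs/dentry-state
# The dentry slab itself ("dentry_cache" on older kernels, "dentry" on newer).
grep dentry /proc/slabinfo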
On local file systems, that function is simply a no-operation, since all
updates have to go through the cache anyway; coherency is guaranteed.
On remote file systems (NFS, GFS, etc.), that call does "something" to
validate the contents of the cache. If the backing data has changed,
that entry is invalidated and the entire lookup process starts over.
If there is a cache entry, the NFS validate function takes the cached
mtime of that entry's parent directory and compares it with the current
mtime as reported by the NFS server. If they differ, the entry is
invalidated.
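That mtime bump is easy to see from the shell. A quick sketch, using a
made-up path (the directory and file names below are just placeholders):

# Note the parent directory's mtime, create a file in it, and check again.
stat -c '%y' /some/nfs/mount/dir
touch /some/nfs/mount/dir/newfile
stat -c '%y' /some/nfs/mount/dir    # mtime has moved forward

The client-side validate code relies on exactly that change to notice
that the directory's contents differ from what it has cached.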
That process appears not to work correctly when dealing with negative
entries. It seems to work fine for positive entries (file *changes* show
up; it's just new files that don't in some cases).
So, after spending some time tracing how this whole process actually
works, I was able to reproduce the problem at will:
- Stat a file that we know is non-existent. This populates the dentry
cache with a negative entry.
[root@cluster-util ~]# ssh root@cluster-www1a
"stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
stat: cannot stat `/home/cluster1/data/f/f/nfsuser01/test_nofile': No such file or directory
- Create that file on a different server. This will also update the
mtime on that parent directory, so the NFS validation code on the dentry
hit ought to catch that.
[root@cluster-util ~]# ssh root@cluster-sshftp1a
"touch /home/cluster1/data/f/f/nfsuser01/test_nofile" - Try and stat the file again. Still broken.
[root@cluster-util ~]# ssh root@cluster-www1a
"stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
stat: cannot stat `/home/cluster1/data/f/f/nfsuser01/test_nofile': No such file or directory
- Read the parent directory. This doesn't actually *repopulate* the
cache as I first thought. It actually invalidates the parent directory
itself (which is what should have happened correctly the first time
through). It doesn't invalidate it "the same way", though.
[root@cluster-util ~]# ssh root@cluster-www1a
"ls /home/cluster1/data/f/f/nfsuser01/ | wc -l"
16
- And now it's present, as evidenced by the stat command below.
[root@cluster-util ~]# ssh root@cluster-www1a
"stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
File: `/home/cluster1/data/f/f/nfsuser01/test_nofile'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 15h/21d Inode: 4046108346 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2008-05-28 10:07:28.963000000 -0400
Modify: 2008-05-28 10:07:28.963000000 -0400
Change: 2008-05-28 10:07:28.963000000 -0400
[root@cluster-util ~]#
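For reference, the whole sequence above boils down to a small script.
This is just a sketch of the steps I ran by hand; the hostnames and the
path are specific to this cluster:

#!/bin/bash
# Reproduce the stale negative dentry: stat a missing file on one NFS
# client, create it from another client, then stat again on the first.
FILE=/home/cluster1/data/f/f/nfsuser01/test_nofile
DIR=$(dirname "$FILE")

ssh root@cluster-www1a    "stat $FILE"       # populates the negative entry
ssh root@cluster-sshftp1a "touch $FILE"      # bumps the parent dir's mtime
ssh root@cluster-www1a    "stat $FILE"       # still ENOENT -- the bug
ssh root@cluster-www1a    "ls $DIR | wc -l"  # readdir invalidates the parent
ssh root@cluster-www1a    "stat $FILE"       # now the file shows up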
The next step is going to be to load up a debug kernel, or stick a SystemTap script in there, to try to determine why that negative entry isn't invalidated as it should be.
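Something along these lines is probably where I'll start. Treat it as a
sketch: it assumes nfs_lookup_revalidate() is the revalidate entry point
on this kernel, that kernel debuginfo is installed, and that the dentry
argument is visible to the probe, so the details may need adjusting:

# Watch NFS dentry revalidation and its result (0 means "invalidate").
stap -e '
probe kernel.function("nfs_lookup_revalidate") {
    printf("revalidate: %s\n", kernel_string($dentry->d_name->name))
}
probe kernel.function("nfs_lookup_revalidate").return {
    printf("  -> returned %d\n", $return)
}
'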