I've moved my blog to jmcneil.net. This is no longer being updated!

Friday, May 23, 2008

Linux Caching Issues

Recently, we started seeing issues where a file would exist on a couple of the NFS clients, but not the others. The front-end Apache instance would return different results depending on which cluster nodes our load balancers would direct us to. In some cases, we'd recieve 404 errors. In other cases, we'd get the actual content. In a third scnenario, we would get the default Red Hat Index page (as the NFS client couldn't access the *real* DirectoryIndex page).

At first, I thought it was an Apache problem as we run some custom modules which handle URL translation.

However, another issue popped up with a PHP script. That PHP script was attempting to include another as a function library. In this case, that include was failing with an ENOENT on some systems, but not on others. Clearly this wasn't an Apache problem.

I've yet to be able to reproduce the problem, but I've a few existing instances to test with.

Given a situation where we're getting a 404 half of the time, I ran the following test against our NFS clients:


# for i in www1a www1b www1c www1d ; do echo $i; ssh cluster-$i-mgmt "stat /home/cluster1/data/s/c/user/html/index.htm"; done
www1a
stat: cannot stat `/home/cluster1/data/s/c/user/html/index.htm': No such file or directory
www1b
File: `/home/cluster1/data/s/c/user/html/index.htm'
Size: 18838 Blocks: 40 IO Block: 4096 regular file
Device: 15h/21d Inode: 2733856169 Links: 1
Access: (0755/-rwxr-xr-x) Uid: (15953/ user) Gid: (15953/ user)
Access: 2008-05-23 09:55:03.029000000 -0400
Modify: 2008-05-23 09:55:03.029000000 -0400
Change: 2008-05-23 09:55:03.029000000 -0400
www1c
File: `/home/cluster1/data/s/c/user/html/index.htm'
Size: 18838 Blocks: 40 IO Block: 4096 regular file
Device: 15h/21d Inode: 2733856169 Links: 1
Access: (0755/-rwxr-xr-x) Uid: (15953/ user) Gid: (15953/ user)
Access: 2008-05-23 09:55:03.029000000 -0400
Modify: 2008-05-23 09:55:03.029000000 -0400
Change: 2008-05-23 09:55:03.029000000 -0400
www1d
File: `/home/cluster1/data/s/c/user/html/index.htm'
Size: 18838 Blocks: 40 IO Block: 4096 regular file
Device: 15h/21d Inode: 2733856169 Links: 1
Access: (0755/-rwxr-xr-x) Uid: (15953/ user) Gid: (15953/ user)
Access: 2008-05-23 09:55:03.029000000 -0400
Modify: 2008-05-23 09:55:03.029000000 -0400
Change: 2008-05-23 09:55:03.029000000 -0400

When logging in to 'www1a', it's just not possible to read the file directly. I can't cat it, stat it, or ls it. However, once I step into the containing directory (html, in this case) and run an 'ls', the file is now availble. Looks as though the readdir() triggered by my ls command updates the cache.

So, my assumption at this point is that it's a directory cache problem. For some reason, we're getting negative entries (NULL inode structure pointers). I've no idea why.

The problem came up again this morning. As it turns out, Linux 2.6.16 allows users to dump various caches in order to free the memory being used. I tried it out on one of the systems experiencing the problem:


[root@cluster-www1c vm]# ls
block_dump drop_caches max_map_count
overcommit_ratio swappiness
dirty_background_ratio hugetlb_shm_group min_free_kbytes
pagecache swap_token_timeout
dirty_expire_centisecs laptop_mode nr_hugepages
page-cluster vdso_enabled
dirty_ratio legacy_va_layout nr_pdflush_threads
panic_on_oom vfs_cache_pressure
dirty_writeback_centisecs lowmem_reserve_ratio overcommit_memory
percpu_pagelist_fraction
[root@cluster-www1c vm]# stat /home/cluster1/data/s/c/user/html/index.htm
stat: cannot stat `/home/cluster1/data/s/c/user/html/index.htm': No such file or directory
[root@cluster-www1c vm]# sync
[root@cluster-www1c vm]# sync
[root@cluster-www1c vm]# echo '2' > drop_caches
[root@cluster-www1c vm]# !st
stat /home/cluster1/data/s/c/user/html/index.htm
File: `/home/cluster1/data/s/c/user/html/index.htm'
Size: 18838 Blocks: 40 IO Block: 4096 regular file
Device: 15h/21d Inode: 2733856169 Links: 1
Access: (0755/-rwxr-xr-x) Uid: (15953/ user) Gid: (15953/ user)
Access: 2008-05-23 09:55:03.029000000 -0400
Modify: 2008-05-23 09:55:03.029000000 -0400
Change: 2008-05-23 09:55:03.029000000 -0400
[root@cluster-www1c vm]#


So, we've got a bit of a workaround, at least for the long weekend. I'll probably wind up setting up a cron job to dump the cache every half hour or so in order to avoid phone calls.

The problem is clearly related to the dcache. I've no idea what is causing it, however. It could be our bind mount system. The fancy NAS unit may also be returning invalid responses causing Linux to do the Right Thing at the Wrong Time.

No comments: