The code has been in existence for about 7 years now in various forms. I'm not the original author, though I've probably replaced about half of it as our needs have changed.
At the core, there is a shared memory cache that Apache children attach to. As data is pulled out of LDAP, it's jammed into that cache. I'm also storing negative entries in order to protect against DoS situations. Data is expired after a configurable time frame. The expiration is handled by a separate daemon.
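To make the design concrete, here is a minimal sketch of what an expiry pass over such a cache might look like. This is not the actual code: the struct layout, the field names and the expire_entries function are all hypothetical, and the real cache lives in shared memory rather than a plain array.

```c
#include <stddef.h>
#include <time.h>

#define DOMAIN_MAX 128  /* assumed field size, not taken from the real code */

/* Hypothetical cache slot; the real layout is not shown in this post. */
struct cache_entry {
    char   domain[DOMAIN_MAX];
    int    in_use;      /* slot currently holds an entry */
    int    negative;    /* 1 = LDAP had no record for this domain */
    time_t expires_at;  /* absolute time after which the entry is stale */
};

/* Free every in-use slot whose expiry time has passed.
 * Returns the number of slots freed. */
size_t expire_entries(struct cache_entry *slots, size_t n, time_t now)
{
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (slots[i].in_use && slots[i].expires_at <= now) {
            slots[i].in_use = 0;
            freed++;
        }
    }
    return freed;
}
```

Negative and positive entries expire the same way here; the negative flag only tells the lookup path to answer "not found" without going back to LDAP.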
So, about a week ago, we started having issues with nodes locking up. The expiry daemon was sitting at around 100% CPU and Apache would not answer requests. An strace on the expiry process showed no system calls.
These are the fun ones. Probably stuck in a while loop due to a buffer overrun or some such problem.
Well, I stuck some debug code into the expiration process and I see the following:
expire: big long string with lots of spaces in it.. more than 128 bytes long ending in arealdomain.com???????
The question marks are the terminal's way of saying "I don't know how to render that!" That uncovered a bug: domains over 127 chars were not NUL-terminated when added to the negative cache.
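For anyone who hasn't been bitten by this class of bug, here is a sketch of how it typically happens. I'm assuming a 128-byte field and a strncpy-style copy purely for illustration; the struct and both function names are made up and are not the real code.

```c
#include <string.h>

#define DOMAIN_MAX 128  /* assumed field size: 127 chars plus the terminator */

/* Hypothetical negative-cache entry. */
struct neg_entry {
    char domain[DOMAIN_MAX];
};

/* Buggy pattern: when src has DOMAIN_MAX or more characters, strncpy()
 * fills the whole field and writes no terminating NUL. */
void store_domain_buggy(struct neg_entry *e, const char *src)
{
    strncpy(e->domain, src, DOMAIN_MAX);
}

/* Safer pattern: refuse names that do not fit, terminator included.
 * Returns 0 on success, -1 if the name is too long. */
int store_domain_checked(struct neg_entry *e, const char *src)
{
    size_t len = strlen(src);
    if (len >= DOMAIN_MAX)
        return -1;
    memcpy(e->domain, src, len + 1);  /* len + 1 copies the NUL as well */
    return 0;
}
```

With the buggy version, anything that later treats the field as a C string runs off the end of it into whatever memory follows, which would explain the junk after the domain in the debug output. Rejecting over-long names outright is one option; truncating and forcing a terminator into the last byte is another.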
In digging further, I checked my access logs. It turns out it was GoogleBot sending that big long string of junk as a 'Host:' header. Each time GoogleBot would hit a specific site on my platform, it would pass that in. It's amazing to me that we've not had this problem before and that GoogleBot was the first agent to trigger it...
Of course, it could always be a fraudulent user agent as I forgot to check the IP ownership before I ditched the logs...