I've moved my blog to jmcneil.net. This is no longer being updated!

Friday, December 19, 2008

Into Pylons I go...

So I've never really considered myself a web developer. Rather, I believe I'm a systems/administration/platforms guy that can slap a text-only HTML page up if I bump into a problem that a command line client can't solve. I build the classic "engineer" web site.

A few weeks back I started a project which requires a lot of heavy web UI work. Hopefully, it's able to pay the bills at some point.

Aside from having to learn CSS, HTML, JavaScript, and enough graphics to get by, I've also had to dive into the world of web frameworks. I looked at what I consider the "big three" up front - Django, TurboGears, and RoR.

Then I found Pylons. More importantly, I found http://www.pylonsbook.com. This has got to be one of the most complete end-to-end web development books I've seen thus far. Sure, it covers templating and routes and whatnot, but it also dives into things such as S3, and non-RDBMS based data models. It seems to answer a lot of edge case questions that other resources have brushed over. A big thanks to Mr. Gardner and the community for putting this one together. I'll be picking up a hard copy.

I think I might be a Pylons guy now.

Friday, November 14, 2008

Building My Egg Repository

I'm not certain whether this is the proper approach or not; I had a bit
of a tough time finding documentation detailing how to do this.

Our account provisioning system is composed of a twistd daemon and
a series of eggs which plug in to it. Each egg is responsible
for exporting a series of methods which can be exposed via
XMLRPC or SOAP. It works pretty well in that adding a new service
is as simple as adding a new egg.
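As a rough illustration, an exported method in one of these plug-in modules looks something like the sketch below. The import path and the method itself are invented for illustration; '@exposed' is our internal decorator that wires up the XMLRPC plumbing.

from hostapi.core.plugin import exposed  # hypothetical import path

@exposed
def create_mailbox(account, address):
    """
    Creates a mailbox for the given account.

    Args: account name, mailbox address
    Returns: True on success
    """
    # the real provisioning work would happen here
    return True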

I had been manually installing all of the egg files via direct
URLs each time I updated or changed the software. I had also been
manually installing the required versions directly onto my
buildbot system. Managing it all was getting difficult as
the number of packages grew steadily.

Over the past few days, I've deployed my own egg repository. I've
updated my buildbot rules to copy over all newly built eggs into it
as tests complete.

Deploying the repository itself was quite easy. I simply
created a directory below my Apache document root for each
package I want to serve, for example:
mkdir /var/www/html/eggs/hostapi.core
mkdir /var/www/html/eggs/hostapi.platform
mkdir /var/www/html/eggs/hostapi.mysql
The next step was to update the Apache configuration to allow
for directory listings (the ExecCGI option comes into play a bit later):
Options Indexes ExecCGI
Lastly, I've updated my buildbot master.cfg script for each
component. After a build completes, we simply copy all of
the eggs in the dist/ directory into each project's eggs
location.
from buildbot.steps.shell import ShellCommand

f1.addStep(ShellCommand(
    command="cp -vf dist/*.egg /var/www/html/eggs/hostapi.core"))
Now I can install all of my packages directly via easy_install, which
eliminates a lot of the deployment burden:
[root@virtapi01 ~]# easy_install -Ui http://buildslave01/eggs hostapi.core
Searching for hostapi.core
Reading http://buildslave01/eggs/hostapi.core/
Reading http://buildslave01/eggs/hostapi.core/?C=S;O=A
Reading http://buildslave01/eggs/hostapi.core/?C=D;O=A
Reading http://buildslave01/eggs/hostapi.core/?C=M;O=A
Reading http://buildslave01/eggs/hostapi.core/?C=N;O=D
Best match: hostapi.core 1.0-r75
Processing hostapi.core-1.0_r75-py2.4.egg
hostapi.core 1.0-r75 is already the active version in easy-install.pth

Using /usr/lib/python2.4/site-packages/hostapi.core-1.0_r75-py2.4.egg
Reading http://buildslave01/eggs
Processing dependencies for hostapi.core
Finished processing dependencies for hostapi.core
[root@virtapi01 ~]#
All seemed well and good, so I updated the setup.py scripts for each
package to refer to a deployment URL rather than rely on locally
installed packages.
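The change amounted to adding a dependency_links entry pointing at the repository. A minimal sketch; the name, version, and requirements here are placeholders rather than our real metadata:

from setuptools import setup, find_packages

setup(
    name='hostapi.platform',
    version='1.0',
    packages=find_packages(),
    install_requires=['hostapi.core'],
    dependency_links=['http://buildslave01/eggs/'],
)

Time for a test build: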
running test
Checking .pth file support in .
/usr/bin/python -E -c pass
Searching for hostapi.core
Reading http://buildslave01/eggs
Reading http://pypi.python.org/simple/hostapi.core/
Couldn't find index page for 'hostapi.core' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://pypi.python.org/simple/
No local packages or download links found for hostapi.core
error: Could not find suitable distribution for Requirement.parse('hostapi.core')
program finished with exit code 1
elapsedTime=2.605963
Ak! Wha? Not only did it not find my eggs, it sent requests to PyPI that
aren't going to do any good. In order to fix the problem, I created
the following little index.cgi script, which I've placed within my eggs
directory on the web server.
#!/usr/bin/python

import os
import cgitb
cgitb.enable()

print 'Content-type: text/html'
print

for dir in os.listdir('.'):
    try:
        for egg in os.listdir(dir):
            if egg.endswith('.egg'):
                print '<a href="%s/%s">%s</a>' % (dir, egg, egg)
            elif egg.endswith('svn'):
                # *.svn files contain ready-made anchor tags
                print open(os.path.join(dir, egg)).read()
    except OSError:
        # skip plain files (such as this script) in the eggs directory
        pass
Now everything seems to be happy. My buildbot CI builds work
correctly as they can find their dependency links, my easy_install
command lines work correctly as I've built a 'fake' PyPI, and I work
a little more correctly as I'm not copying egg files around any longer!

In order to reduce the load on PyPI, I'm also locally hosting copies
of BeautifulSoup, zope.*, and some SQL libraries. It helps to
ensure I don't get an unexpected update, too.

Is there a better way to do any of this? This seems to work pretty
well. I'm currently keeping two of these repositories. The first
one gets new eggs with each CI build, while the second only
contains eggs which have cleared QA and are production-ready.
Thursday, November 13, 2008

"phrase from nearest book" meme

via http://agiletesting.blogspot.com/2008/11/phrase-from-nearest-book-meme.html

  • Grab the nearest book.
  • Open it to page 56.
  • Find the fifth sentence.
  • Post the text of the sentence in your journal along with these instructions.
  • Don’t dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.
Here’s mine, from The Art of Agile Development by James Shore & Shane Warden:

"Setup a projector so the whole team navigates while one person drives."

Somewhat anticlimactic, but still solid advice.

Tuesday, November 11, 2008

A subprocess.Popen Gotcha

In answering a recent question, I noticed a bit of a gotcha when using
subprocess.Popen. It's important to understand how the 'args' parameter
is interpreted when setting the shell argument to True.

When the shell parameter is set to True, the subprocess module essentially
does the following (a string value for 'args' is first wrapped in a
single-element list):


exec_list = ['/bin/sh', '-c'] + args
os.execvp(exec_list[0], exec_list)


The 'args' parameter to the Popen initializer can be either a sequence or
a simple string. In most cases, args ought to be a string value here,
such that a shell is executed and in turn fires off the command with
its arguments:


subprocess.Popen('/bin/ls /var/log', shell=True)


Will result in:


/bin/sh -c '/bin/ls /var/log'


It is, however, still perfectly valid to use an exec-style sequence for
'args', though the command that actually runs changes:


subprocess.Popen(('/bin/ls', '/var/log'), shell=True)


Will result in:


/bin/sh -c '/bin/ls' '/var/log'


The end result is quite different. In the first example, '/var/log' is
passed as an argument to '/bin/ls.' In the second example, both '/bin/ls'
and '/var/log' are passed as arguments to '/bin/sh.'

Since there is no immediate error generated, the caller simply winds up with
unexpected results. The same problem doesn't exist in reverse. If a string
with multiple arguments is passed when the default shell=False is used, an
immediate error is raised:


>>> subprocess.Popen('/bin/ls /var/log', shell=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/subprocess.py", line 595, in __init__
    errread, errwrite)
  File "/usr/local/lib/python2.6/subprocess.py", line 1106, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory


My assumption is that this is because we're actually trying to execute
'/bin/ls /var/log' as one filename (including a space).
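If you're starting from a command string but want to avoid the shell entirely, the standard library will split it into an exec-style list for you. A quick sketch of both safe patterns:

import shlex
import subprocess

# shell=True: hand the entire command line over as a single string
subprocess.Popen('/bin/ls /var/log', shell=True).wait()

# shell=False (the default): pass an exec-style list; shlex.split()
# builds one from a command string, honoring shell quoting rules
args = shlex.split('/bin/ls /var/log')   # ['/bin/ls', '/var/log']
subprocess.Popen(args).wait()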

Something to keep in my "what the..?" pile.

Thursday, October 23, 2008

Guppy

Over the course of the last two months or so, I've been consolidating
a small collection of proprietary XML-RPC daemons down into a series of
egg-based plugins for twistd. So far, I'm very happy with the result.

Yesterday, I started doing some initial load testing to watch for file
descriptor and memory leaks. I noticed that the 'twistd' process started
at about 19 MB of memory and slowly grew if I kept a steady stream
of method calls going.

I had some initial trouble trying to debug the leak, but I'm actually
pretty happy with what I came up with.

I came across Guppy (http://guppy-pe.sourceforge.net/), which is a very
nice little tool that lets one dump Python heap statistics. The issue I had,
though, is that my application is a daemon. I need to trigger dumps as
I increase load.

My solution is as follows. I added the following to one of the modules that contains a subset of my XMLRPC methods.


try:
    from guppy import hpy

    @exposed
    def dumpheap():
        """
        Dumps Python heap information to a temporary file.

        Args: None
        Returns: string filename
        """
        filename = '/tmp/heap.%d.%5f' % (os.getpid(), time.time())
        context = hpy()
        context.heap().dump(filename)
        return filename

except ImportError, e:
    pass


The '@exposed' decorator is an internal function which wires up XMLRPC
plumbing for dumpheap. So now, I can trigger a heap statistics dump
by simply calling an XML-RPC method. Easy!
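From the client side, a dump is just another method call. Something along these lines, though the URL and port here are made up for illustration:

import xmlrpclib

server = xmlrpclib.ServerProxy('http://localhost:8080/')
# returns the path of the freshly written heap dump
print server.dumpheap()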

Thursday, October 16, 2008

Thanks, GoogleBot!

There are very few things I'll write in C these days. I just don't have a reason to use it. If I can write it in Python, I usually will unless I have a good reason not to. One of the exceptions is an Apache module that we use to dynamically manage virtual host data (as opposed to flat files). I pull all configurations out of LDAP. I'm able to get some sick scale using a dynamic approach and I never have to restart my server.

The code has been in existence for about 7 years now in various forms. I'm not the original author, though I've probably replaced about half of it as our needs change.

At the core, there is a shared memory cache that Apache children attach to. As data is pulled out of LDAP, it's jammed into that cache. I'm also storing negative entries in order to protect against DoS situations. Data is expired after a configurable time frame. The expiration is handled by a separate daemon.

So, about a week ago, we started having issues with nodes locking up. The expiry daemon was sitting at around 100% CPU and Apache would not answer requests. An strace on the expiry process showed no system calls.

These are the fun ones. Probably stuck in a while loop due to a buffer overrun or some such problem.

Well, I stuck some debug code into the expiration process and saw the following:


expire: domain.com
expire: domain2.com
expire: big long string with lots of spaces in it.. more than 128 bytes long ending in arealdomain.com???????


The question marks are the terminal's stand-in for "I don't know how to render that!" That output uncovered a bug: domains over 127 chars were not NULL terminated when added to the negative cache.

In digging further, I checked my access logs. It turns out it was GoogleBot sending that big long string of junk as a 'Host:' header. Each time GoogleBot would hit a specific site on my platform, it would pass that in. It's amazing to me that we've not had this problem before and that GoogleBot was the first agent to trigger it...

Of course, it could always be a fraudulent user agent as I forgot to check the IP ownership before I ditched the logs...

Tuesday, August 5, 2008

Setuptools Egg Repository

I've been in the process of moving all of our account management code away from a home brewed XMLRPC system and on to a Twisted instance. At first, we'll probably defer most of our calls in threads, but over time, I hope to make everything a bit more asynchronous in nature.

It uses a plug-in architecture. We've one main RPM we deploy (bdist_rpm) that grafts itself into the twisted/plugins directory. That component is then responsible for loading our actual account provisioning code.

I've been bundling each plug-in as an egg file. In order to make deployment easier, I wanted to run my own egg repository.

The simplest method is to dump all of our eggs into a directory and enable directory indexing within Apache. This gives us an automatically generated index of egg files that 'easy_install -f URL' can read from.

In our configuration, buildbot does the publishing. I have setup.cfg set up such that it tags each egg with the SVN revision number. After the build has passed testing, buildbot copies the new egg into an HTTP-accessible directory.
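The revision tagging itself is stock setuptools; an egg_info section along these lines produces version strings of the form '1.0-r75':

[egg_info]
tag_svn_revision = 1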

The problem I ran into is that I wanted to make all of our SVN code available to developers, but still rely on the easy directory index approach.

My solution?
[root@buildbot-slave01 eggs]# cat lister.cgi


#!/usr/bin/python

import os
import cgitb
cgitb.enable()

print 'Content-type: text/html'
print

for egg in os.listdir('.'):
    if egg.endswith('.egg'):
        print '<a href="%s">%s</a>' % (egg, egg)
    elif egg.endswith('svn'):
        # *.svn files contain ready-made anchor tags (see below)
        print open(egg).read()

[root@buildbot-slave01 eggs]#


So now, I've included little text files named 'package.svn' that simply contain setuptools '#egg=package-dev' style links:


[root@buildbot-slave01 eggs]# cat package.svn
<a href="http://buildbot:password@svn/package/trunk#egg=package-dev">
package trunk</a>


This results in an index that includes our buildbot-tagged packages, which we'll use by default, and our SVN links, which are available for developers to follow.

Installing is as easy as:


easy_install -f http://buildbot/eggs/lister.cgi package


In turn, I've set up the unit tests to require certain eggs that exist in my repository. The buildbot update method is 'clobber', so each automated build results in the latest egg file being installed in the local build directory.

Very cool!

Tuesday, July 8, 2008

HTTP Error Codes

This is probably next to ancient, but I got a good chuckle out of it.

http://www.flickr.com/photos/apelad/sets/72157594388426362/

Sunday, June 29, 2008

Instance Specific Methods

Given an instance of a Python class, it's quite simple to attach a function to that object; one just assigns a callable to an attribute.

Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(object):
...     pass
...
>>> def f():
...     pass
...
>>> x = X()
>>> x.f
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'X' object has no attribute 'f'
>>> x.f = f
>>> x.f
<function f at 0x...>
>>>

The downside here is that when referencing the newly bound attribute, it's not treated as a bound method but rather as a standard function object. There are two simple ways to actually create a new bound method that is specific to an instance.

First, one can easily just call the __get__ method of the function object when binding to the instance. Python functions are non-overriding descriptors. When the __get__ method is called, a method object is returned. Various attributes of the new method object handle the object 'plumbing', so to speak: im_self, im_class, im_func. That's really another topic. Here's how it works:

Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(object): pass
...
>>> def f(self): pass
...
>>> x = X()
>>> x.f
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'X' object has no attribute 'f'
>>> x.f = f.__get__(x, X)
>>> x.f
<bound method X.f of <__main__.X object at 0x...>>
>>>

There we go. The instance is passed as the first argument to the function's __get__ method, as it is really the object that any standard descriptor would be handed.

The next way to make this all happen is via the types module. There is a 'MethodType' object that we can use to wrap a function:

Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import types
>>> class X(object): pass
...
>>> def f(self): pass
...
>>> x = X()
>>> x.f
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'X' object has no attribute 'f'
>>> x.f = types.MethodType(f, x, X)
>>> x.f
<bound method X.f of <__main__.X object at 0x...>>
>>> x.f()
>>>

It's my assumption that MethodType is simply calling f.__get__ internally, though I'm slightly too lazy to go look.
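A quick check at least confirms the two routes are equivalent in behavior: both produce a method bound to the same instance, wrapping the same function. A small sketch:

import types

class X(object):
    pass

def f(self):
    return self

x = X()
m1 = f.__get__(x, X)
m2 = types.MethodType(f, x, X)

# both are bound to the same instance and wrap the same function
assert m1.im_self is m2.im_self is x
assert m1.im_func is m2.im_func is f
assert m1() is m2() is x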

Friday, June 13, 2008

NFS Problems

All of my NFS problems have just gone away. We recently updated our kernels to the latest RHES release available. Looks like something hidden in there fixed it for us. Wonderful.

I do realize that it's normal to see some delay, both due to attribute cache semantics and the one-second granularity of mtime. The problem I had here was quite different in that the dentry cache *never* expired negative entries.

Wednesday, May 28, 2008

More NFS...

I am able to reproduce the issue at will now using both a Linux NFS
server or the NAS system. We can pretty safely rule out the NAS as the
culprit (though let's keep the ticket open, in case they've seen this
before).

As I've already explained, Linux caches file names as to avoid hitting
the disk on every look up (dentry cache). When there is a cache hit
(either negative or positive) the kernel will call a 'validate'
function, to ensure that the cached data is still good.

On local file systems, that function is simply a no-operation as all
updates need to go through the cache. Coherency is guaranteed.

On remote file systems (NFS, GFS, etc...) that call does "something" to
validate the contents of the cache. If the backing data has changed,
that entry is invalidated and the entire look up process starts over.

If there is a cache entry, the NFS validate function checks the cached
mtime on that entry's parent and compares it with current mtime as
reported by the NFS server. If there is a difference, the entry is
invalidated.

That process appears to not work correctly when dealing with negative
entries. It seems to work fine for positive entries (file *changes* show
up, just not new files in some cases).

So, after spending some time tracing how this whole process actually
works, I was able to reproduce the problem at will:

  1. Stat a file that we know is non-existent. This populates the dentry
    cache with a negative entry.

    [root@cluster-util ~]# ssh root@cluster-www1a
    "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
    stat: cannot stat `/home/cluster1/data/f/f/nfsuser01/test_nofile': No
    such file or directory


  2. Create that file on a different server. This also updates the
    mtime on the parent directory, so the NFS validation code on the dentry
    hit ought to catch it.

    [root@cluster-util ~]# ssh root@cluster-sshftp1a
    "touch /home/cluster1/data/f/f/nfsuser01/test_nofile"


  3. Try and stat the file again. Still broken.

    [root@cluster-util ~]# ssh root@cluster-www1a
    "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
    stat: cannot stat `/home/cluster1/data/f/f/nfsuser01/test_nofile': No
    such file or directory


  4. Read the parent directory. This doesn't actually *repopulate* the
    cache as I first thought; it invalidates the parent directory itself
    (which is what should have happened correctly the first time through).
    It doesn't invalidate it "the same way", though.

    [root@cluster-util ~]# ssh root@cluster-www1a
    "ls /home/cluster1/data/f/f/nfsuser01/ | wc -l"
    16


  5. And now it's present, as evidenced by the stat command below.

    [root@cluster-util ~]# ssh root@cluster-www1a
    "stat /home/cluster1/data/f/f/nfsuser01/test_nofile"
    File: `/home/cluster1/data/f/f/nfsuser01/test_nofile'
    Size: 0 Blocks: 0 IO Block: 4096 regular empty file
    Device: 15h/21d Inode: 4046108346 Links: 1
    Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2008-05-28 10:07:28.963000000 -0400
    Modify: 2008-05-28 10:07:28.963000000 -0400
    Change: 2008-05-28 10:07:28.963000000 -0400
    [root@cluster-util ~]#


Next step is going to be to load up a debug kernel or stick a SystemTap script in there to attempt to determine why that negative entry isn't invalidated as it should be.

Friday, May 23, 2008

Linux Caching Issues

Recently, we started seeing issues where a file would exist on a couple of the NFS clients, but not the others. The front-end Apache instance would return different results depending on which cluster nodes our load balancers directed us to. In some cases, we'd receive 404 errors. In other cases, we'd get the actual content. In a third scenario, we would get the default Red Hat index page (as the NFS client couldn't access the *real* DirectoryIndex page).

At first, I thought it was an Apache problem as we run some custom modules which handle URL translation.

However, another issue popped up with a PHP script. That PHP script was attempting to include another as a function library. In this case, that include was failing with an ENOENT on some systems, but not on others. Clearly this wasn't an Apache problem.

I've yet to be able to reproduce the problem, but I've a few existing instances to test with.

Given a situation where we're getting a 404 half of the time, I ran the following test against our NFS clients:


# for i in www1a www1b www1c www1d ; do echo $i; ssh cluster-$i-mgmt "stat /home/cluster1/data/s/c/user/html/index.htm"; done
www1a
stat: cannot stat `/home/cluster1/data/s/c/user/html/index.htm': No such file or directory
www1b
File: `/home/cluster1/data/s/c/user/html/index.htm'
Size: 18838 Blocks: 40 IO Block: 4096 regular file
Device: 15h/21d Inode: 2733856169 Links: 1
Access: (0755/-rwxr-xr-x) Uid: (15953/ user) Gid: (15953/ user)
Access: 2008-05-23 09:55:03.029000000 -0400
Modify: 2008-05-23 09:55:03.029000000 -0400
Change: 2008-05-23 09:55:03.029000000 -0400
www1c
File: `/home/cluster1/data/s/c/user/html/index.htm'
Size: 18838 Blocks: 40 IO Block: 4096 regular file
Device: 15h/21d Inode: 2733856169 Links: 1
Access: (0755/-rwxr-xr-x) Uid: (15953/ user) Gid: (15953/ user)
Access: 2008-05-23 09:55:03.029000000 -0400
Modify: 2008-05-23 09:55:03.029000000 -0400
Change: 2008-05-23 09:55:03.029000000 -0400
www1d
File: `/home/cluster1/data/s/c/user/html/index.htm'
Size: 18838 Blocks: 40 IO Block: 4096 regular file
Device: 15h/21d Inode: 2733856169 Links: 1
Access: (0755/-rwxr-xr-x) Uid: (15953/ user) Gid: (15953/ user)
Access: 2008-05-23 09:55:03.029000000 -0400
Modify: 2008-05-23 09:55:03.029000000 -0400
Change: 2008-05-23 09:55:03.029000000 -0400

When logging in to 'www1a', it's just not possible to read the file directly. I can't cat it, stat it, or ls it. However, once I step into the containing directory (html, in this case) and run an 'ls', the file becomes available. Looks as though the readdir() triggered by my ls command updates the cache.

So, my assumption at this point is that it's a directory cache problem. For some reason, we're getting negative entries (NULL inode structure pointers). I've no idea why.

The problem came up again this morning. As it turns out, Linux 2.6.16 allows users to dump various caches in order to free the memory being used. I tried it out on one of the systems experiencing the problem:


[root@cluster-www1c vm]# ls
block_dump                 drop_caches           max_map_count       overcommit_ratio          swappiness
dirty_background_ratio     hugetlb_shm_group     min_free_kbytes     pagecache                 swap_token_timeout
dirty_expire_centisecs     laptop_mode           nr_hugepages        page-cluster              vdso_enabled
dirty_ratio                legacy_va_layout      nr_pdflush_threads  panic_on_oom              vfs_cache_pressure
dirty_writeback_centisecs  lowmem_reserve_ratio  overcommit_memory   percpu_pagelist_fraction
[root@cluster-www1c vm]# stat /home/cluster1/data/s/c/user/html/index.htm
stat: cannot stat `/home/cluster1/data/s/c/user/html/index.htm': No such file or directory
[root@cluster-www1c vm]# sync
[root@cluster-www1c vm]# sync
[root@cluster-www1c vm]# echo '2' > drop_caches
[root@cluster-www1c vm]# !st
stat /home/cluster1/data/s/c/user/html/index.htm
File: `/home/cluster1/data/s/c/user/html/index.htm'
Size: 18838 Blocks: 40 IO Block: 4096 regular file
Device: 15h/21d Inode: 2733856169 Links: 1
Access: (0755/-rwxr-xr-x) Uid: (15953/ user) Gid: (15953/ user)
Access: 2008-05-23 09:55:03.029000000 -0400
Modify: 2008-05-23 09:55:03.029000000 -0400
Change: 2008-05-23 09:55:03.029000000 -0400
[root@cluster-www1c vm]#


So, we've got a bit of a workaround, at least for the long weekend. I'll probably wind up setting up a cron job to dump the cache every half hour or so in order to avoid phone calls.

The problem is clearly related to the dcache. I've no idea what is causing it, however. It could be our bind mount system. The fancy NAS unit may also be returning invalid responses causing Linux to do the Right Thing at the Wrong Time.

Thursday, April 24, 2008

Getting to Know Zope 3: Round 1

While I've done quite a bit of Python over the past few years, I've not really used any of the Zope based technologies. With the exception of zope.interface and a Plone proof of concept, I'm new to most everything Zope.

I've elected to go with pure play Zope as opposed to Grok. I like what I see from the Grok project, but I like to understand all of the internals. I get the impression that Grok attempts to shield that from me.

One of the things that I've been looking forward to in my quest to learn Zope 3 has been the egg-based install process. Over the past few months, I've been contemplating moving our RPM based system over to Python eggs. We can dynamically download and install them via HTTP when our application server  starts up... or something all cutting edge and automated like that.

Getting it Up And Running

It took me a bit to actually figure out *how* to download an egg-based Zope installation. The links off of the main web site still point to the monolithic tarball distribution. It took a bit of creative clicking to find the Zope 3 site, which is Wiki based. Even then, the majority of the articles on that site are geared towards earlier releases of Zope 3.

Then I found this: http://wiki.zope.org/zope3/Zope340c1. The release documentation for Zope 3.4, release candidate 1. Not exactly the official style I was looking for, but it points me in the  right direction regardless.

Note that I opted to install the latest version of Python 2.4. My first attempt was with 2.5.1, but that ended with a  mysterious "no module named script.command." I've done all of this with 2.4.5.

1. First, we'll install the 'zopeproject' utility via setuptools. There are a few dependencies that are also added when this runs: PasteScript, PasteDeploy, and Paste.
[root@marvin jj]# /usr/local/bin/easy_install zopeproject
Searching for zopeproject
Reading http://pypi.python.org/simple/zopeproject/
Best match: zopeproject 0.4.1
Downloading http://pypi.python.org/packages/2.4/z/zopeproject/zopeproject-0.4.1-py2.4.egg#md5=3c0c590752c637ee047cc555b2e8f5c1
Processing zopeproject-0.4.1-py2.4.egg
creating /usr/local/lib/python2.4/site-packages/zopeproject-0.4.1-py2.4.egg
Extracting zopeproject-0.4.1-py2.4.egg to /usr/local/lib/python2.4/site-packages
Adding zopeproject 0.4.1 to easy-install.pth file
Installing zopeproject script to /usr/local/bin

... Paste, PasteScript, and PasteDeploy ...
Finished processing dependencies for zopeproject
Easy enough. As this all relies on zc.buildout, I should be able to drop root permissions now and install directly into my home directory.

2. Next, run the newly installed zopeproject utility in order to create a Zope install. It asks a few questions specific to the install and then goes about its business downloading all of the eggs that do not already exist on my system. As this is a new install, there are exactly zero pre-existing. This takes a while. Note that it does allow one to pass a shared egg directory. This is a plus in that multiple instances can share the same install base.

[jeff@marvin code]$ /usr/local/bin/zopeproject test_project
Enter user (Name of an initial administrator user): jeff
Enter passwd (Password for the initial administrator user): password
Enter eggs_dir (Location where zc.buildout will look for and place packages) ['/home/jeff/buildout-eggs']:
Creating directory ./test_project
Downloading zc.buildout...
Invoking zc.buildout...
zip_safe flag not set; analyzing archive contents...
warning: no files found matching '0.1.0-changes.txt'
no previously-included directories found matching 'docs-in-progress'
"optparse" module already present; ignoring extras/optparse.py.
"textwrap" module already present; ignoring extras/textwrap.py.
zip_safe flag not set; analyzing archive contents...
docutils.writers.html4css1.__init__: module references __file__
docutils.writers.pep_html.__init__: module references __file__
docutils.writers.s5_html.__init__: module references __file__
docutils.writers.newlatex2e.__init__: module references __file__
docutils.parsers.rst.directives.misc: module references __file__
[jeff@marvin code]$

At this point, there's not a lot of knowledge of paster or buildout required. I intend to learn them, but for now, it's rather easy to get started. Also of note, there's a holy hell of a lot less warning output using 2.4 as opposed to 2.5!

3. Lastly, if this works as expected, it should be possible to simply step into the newly created instance directory and start up the server.

[jeff@marvin test_project]$ bin/test_project-ctl fg
bin/paster serve deploy.ini
------
2008-04-24T21:14:38 WARNING root Developer mode is enabled: this is a security risk and should NOT be enabled on production servers. Developer mode can be turned off in etc/zope.conf
/home/jeff/buildout-eggs/zope.configuration-3.4.0-py2.4.egg/zope/configuration/config.py:197: DeprecationWarning: ZopeSecurityPolicy is deprecated. It has moved to zope.securitypolicy.zopepolicy This reference will be removed somedays
obj = getattr(mod, oname)
Starting server in PID 16295.
------
2008-04-24T21:14:41 INFO paste.httpserver.ThreadPool Cannot use kill_thread_limit as ctypes/killthread is not available
serving on http://127.0.0.1:8080

Perfect! We've an instance of Zope running and the ZMI is accessible. When time permits, I'll step through the instance directory and get a good understanding of what was actually created. It's pretty straightforward now that we've got an instance up.

Monday, April 21, 2008

Dipping into 3.0

A couple of days ago I downloaded the latest alpha release of Python 3.0. It's still pretty early, but I decided it made a lot of sense to start going through the new features in order to get a feel for what type of porting work we'll have to do.


Using the What's New page as a guide, I compared some of the behavior in the new 3.0 branch with the current 2.5 release. A lot of the changes appear to be "decruftification" updates. For instance, moving to a print() built in removes some of the awkward looking "print >>sys.stderr," code that never feels quite right.


The first feature that really caught my eye is the function annotation addition. I like the fact that the Python developers chose to go with a generic system as opposed to a static typing alternative. I especially like the fact that annotations can be any valid Python expression.


We store all of our Linux user attributes in LDAP. This includes POSIX data as well as some domain specific stuff that makes sense. The first thought I had after reading up on function annotations was streamlining access to an LDAP system. It seems that one may be able to do something like:



#!/home/jeff/py3k/bin/python

LDAP_CONNECTION = "localhost"

class LDAPProxy:
    def __init__(self, connection):
        self._connection = connection

    def do_ldap_lookup(self, dn, attr):
        """LDAP Query Mock"""
        print("query ldap server on {0}:{1} for {2}".format(
            self._connection, dn, attr))
        # For example, something that can go string or int...
        return 0

    def __call__(self, attribute, kind):
        """Return a tuple of functions for use as a property"""
        def get_value(inner_self, attr=attribute) -> kind:
            result = self.do_ldap_lookup(inner_self.dn, attr)
            return get_value.__annotations__['return'](result)

        def set_value(inner_self, value):
            pass

        return (get_value, set_value)

class User:
    m = LDAPProxy(LDAP_CONNECTION)

    def __init__(self, dn):
        self.dn = dn

    uid = property(*m("uidNumber", int))
    home = property(*m("homeDirectory", str))

u = User("cn=user,ou=accounts,dc=domain,dc=com")
uid = u.uid
home = u.home
print("{0} is ({1})".format(uid, type(uid)))
print("{0} is ({1})".format(home, type(home)))


Ok, so perhaps that's not the best example, as it looks very possible
without the annotation. Still, it's a pretty cool new feature, especially
when dealing with XMLRPC & SOAP. It ought to make introspection and
WSDL generation much easier.
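The introspection idea in miniature: an RPC layer can read a function's __annotations__ mapping to learn everything it needs to describe the call. The function below is hypothetical, of course.

def create_user(name: str, uid: int) -> bool:
    """Hypothetical remote API method."""
    return True

# everything a WSDL generator would need to describe the signature:
print(create_user.__annotations__)
# {'name': <class 'str'>, 'uid': <class 'int'>, 'return': <class 'bool'>}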

Friday, April 18, 2008

Choosing a framework with API support in mind?

For the last couple of years or so, I've maintained a system for managing and manipulating various web hosting accounts. The code consists of a few Python modules and some supporting C extensions where necessary.

Each module contains a series of XMLRPC calls that are made available to remote clients. Using these XMLRPC calls, it's possible to "do stuff" to a standard account. Accounts can be created, deleted, removed, and so on. It's a fairly powerful API in that it's possible to tickle just about every attribute associated with a user's account.

The application relies on a few thousand lines of boilerplate 'infrastructural code.' By infrastructural code, I generally mean things like logging, daemonization, startup and shutdown, and wrappers around Python's standard SimpleXMLRPCServer modules. They're actually subclassed in order to add support for some nifty little features, but that's not all that important.

We run multiple instances for each core module, each on a different port. For example, web might run on 8181, SQL on 8282, and ecommerce on 8383. The whole thing is fronted by Apache which provides URL mapping, authentication, and SSL capabilities.

I've really been hoping to get rid of all of that custom code. It's a headache. I don't want to manage PID files, session management, or SQL connection pooling manually any longer. Our group can be much more productive if we can simply focus on our API development rather than bugs in our frameworks.

I've spent a great deal of time researching the various Python application servers out there. I think I've narrowed it down to two: Zope 3 or Django.

With the exception of Zope 3, the problem I have with these frameworks in general is that they're clearly intended for web *site* development and not necessarily web-aware *API* development.

From a holistic point of view, I like Django. A lot. It's clean, pragmatic, and gaining a lot of popularity. I like the template system (I've seen a lot of horrible PHP). The documentation is outstanding. It took me almost no time at all to setup a basic site, grind out some extremely ugly templates, and wire everything up to a little MySQL database. That's about where it ends, though. We need to support XMLRPC calls. We don't need a UI. We don't even need a database for the most part.

Zope 3 on the other hand, looks to be exactly what we need. It's a very extensive framework. It's built on eggs, so we only need to install what we need. We can simply create XMLRPC/SOAP views and tie directly into our existing Python modules. Sounds great.

The documentation is god awful. I want to use it. I know it's very powerful. The Zope crew has just made the barrier to entry very difficult to overcome. It's not difficult to grasp once I find some type of authoritative documentation. It's finding that documentation that is the problem.

I'm going to learn it. It's clearly one of the most powerful utilities in the Pythonic tool chest.

All of that said, what is the preferred framework for API-driven development? Am I barking up the wrong tree with Zope and Django? Does it make sense to leave it as is? I don't need templates, URL routes, AJAX, or browser compatibility hacks. We're developing a large scale remote API, not a web site.

Lastly, I did look into using Twisted, but I'm not sure our current code base fits the async model very well. It would simply be an XMLRPC controller (for each module) firing off our current methods using threaded deferreds.
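For the record, that shape is simple enough to sketch with twisted.web's XML-RPC support. The method and port below are invented for illustration (8181 being where our web module listens today):

from twisted.internet import reactor, threads
from twisted.web import server, xmlrpc

def create_account(username):
    # the existing synchronous provisioning code would run here
    return 'created %s' % username

class WebApi(xmlrpc.XMLRPC):
    def xmlrpc_create_account(self, username):
        # run the blocking call on the reactor's thread pool; returning
        # the Deferred lets twisted.web answer when it fires
        return threads.deferToThread(create_account, username)

reactor.listenTCP(8181, server.Site(WebApi()))
reactor.run()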


Post #1!

Well, this is it. I've done it. I've finally entered the technical blogosphere.  More specifically, the software-engineering-using-Python-in-an-Agile-setting blogosphere.   I'm preparing to step back into the world of Internet services after a brief hiatus.  Lots of stuff to do, technologies to implement, and problems to fix.  Sounds like a perfect time to start a new blog. 

So, who am I and why do I matter?  I'm not certain I can answer that second part, but I'd be more than happy to share some of my qualifications: Experience, and some experience. 

I've spent about ten years in the web hosting industry.  I've watched my company start, grow, IPO, acquire, and eventually get acquired.  I've done everything from purchase servers and operate disk arrays to develop large-scale systems using the Python programming language.  Always Linux (except for that time when I was younger and did some of that FreeBSD stuff...).

I have a very deep interest in Python specifically.  Unless dictated by standard or convention,  all of the work I've done for the past 3-4 years has been done in Python. Provisioning systems, account utilities, test suites, web UI's (which look terrible -- never let a programmer create a UI). All Python.  I sometimes say that I actually think in Python.

That's it for the obligatory introductory post. More to come.