I've moved my blog to jmcneil.net. This is no longer being updated!

Tuesday, August 4, 2009

Regular Expressions


For a guy who's never been much of a regular expression wizard, this is still a good document. I've spent many hours trying to decipher strings of Perl-compatible printer noise, so I've learned to use them sparingly. It's my "pay it forward" for the next guy who sits in this not-so-comfy office chair.

Surely someone would have come up with a more natural-language way of processing strings by now? Instead of '^\d', one could write 'match if str starts with digit.'
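To make the comparison concrete, here is the '^\d' check written two ways in Python: once as a regular expression and once with a string method that reads closer to "starts with digit". This is a minimal sketch and the function names are made up for illustration. It uses str.isdecimal() rather than isdigit() because in Python 3 '\d' matches Unicode decimal digits (category Nd), which is exactly what isdecimal() tests; isdigit() also accepts characters such as superscript digits.

```python
import re

# "^" anchors at the start; "\d" is any Unicode decimal digit for str patterns.
# (re.match only tries position 0 anyway, so "^" is redundant but explicit.)
STARTS_WITH_DIGIT = re.compile(r'^\d')


def starts_with_digit_re(s: str) -> bool:
    """Regex spelling of 'match if str starts with digit'."""
    return STARTS_WITH_DIGIT.match(s) is not None


def starts_with_digit_plain(s: str) -> bool:
    """String-method spelling of the same check."""
    # s[:1] is '' for an empty string and ''.isdecimal() is False,
    # so this never raises IndexError the way s[0] would.
    return s[:1].isdecimal()


for sample in ['42 apples', 'apples: 42', '']:
    print(repr(sample), starts_with_digit_re(sample), starts_with_digit_plain(sample))
```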


Dougal said...

A more verbose syntax would be a NIGHTMARE, as regular expressions can get kinda long anyway... they would be huge!

If you take the time to learn regular expressions, or use them more regularly, they kinda make sense and I don't actually mind them.

In fact, dare I say it, I kinda like 'em.

Brandon Corfman said...

The pyparsing module isn't bad for readability.

Jeff McNeil said...

It's not that they don't make sense, it's that every time I come across one that's 50 chars long, I wind up spending ages taking it apart and figuring out exactly what it does. I've written quite a few of them; it's the reading part that I dislike!

Yeah, I wouldn't so much want 100 lines of text parsing directives littering my code. Might be nice to have a means to compile something like that down into a regular expression, though. Just a thought.
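One way to approximate that in plain Python is to build the pattern from small, named pieces and join them into an ordinary regex. The helpers below (DIGIT, LETTER, one_or_more, literal, starts_with) are hypothetical names invented for this sketch, not part of the re module or any library; the end result is just a normal compiled pattern.

```python
import re

# Hypothetical building blocks; each is (or returns) a fragment of regex source.
DIGIT = r'\d'
LETTER = r'[A-Za-z]'


def one_or_more(fragment: str) -> str:
    # The non-capturing group makes "+" apply to the whole fragment.
    return f'(?:{fragment})+'


def literal(text: str) -> str:
    # re.escape makes characters such as '.' or '-' match literally.
    return re.escape(text)


def starts_with(*parts: str) -> str:
    # Anchor at the beginning, then require the parts in order.
    return '^' + ''.join(parts)


# Reads roughly as: "starts with letters, then a dash, then digits".
ticket = re.compile(starts_with(one_or_more(LETTER), literal('-'), one_or_more(DIGIT)))

print(ticket.pattern)
print(bool(ticket.match('ABC-123')), bool(ticket.match('123-ABC')))
```

The generated source is a bit noisier than the hand-written '^[A-Za-z]+-\d+', which is the usual price of building patterns programmatically.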

I'll pull PyParsing down. I haven't dealt with it before; it might be a nice alternative.

Jarrod said...

I have to say pyparsing is really cool.

A wise person once said, "If you have a problem and solve it with regular expressions, now you have two problems." (Three if Perl is involved :-) )

moreati said...

Regular expressions have their place. I certainly find them more readable than the equivalent string manipulation code.

One thing that helps me when they get overly complex is the re.VERBOSE flag. It allows one to break up and comment a regular expression inline.
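As an example of that flag, here is a date pattern laid out with re.VERBOSE: unescaped whitespace in the pattern is ignored, and everything after an unescaped '#' on a line is treated as a comment. This is only a sketch; it checks the YYYY-MM-DD shape and does not reject impossible dates such as 2009-13-45.

```python
import re

# Same as the one-liner: r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$'
ISO_DATE = re.compile(r"""
    ^
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
    $
""", re.VERBOSE)

m = ISO_DATE.match('2009-08-04')
print(m.group('year'), m.group('month'), m.group('day'))  # 2009 08 04
```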


Marius Gedminas said...

Attempts to make specialized languages more user-friendly by making them mimic English often fail (remember COBOL?). You still have a specialized language that you have to learn; similarity may make it easier to remember some things, but it also leads you to make assumptions that are valid in English but do not work in this language.

The biggest problem with regexps is that there are so many different dialects: instead of learning one now you have to learn many.

Paddy3118 said...

Try Kodos http://kodos.sourceforge.net/

It's great for regexp training.

- Paddy.

i said...

A more natural-language way:

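try:
    int(s[0])  # raises ValueError if the first character isn't a digit
    print(s, 'starts with an integer')
except (IndexError, ValueError):  # IndexError covers an empty string
    print(s, 'does not start with an integer')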

You can also get a long way with:
    if 'b' in 'abc': print('True')
    if 'abc'.startswith('a'): print('True')
    if 'abc'.endswith('c'): print('True')
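For comparison, here are the same three checks written with the re module. The string methods are generally the clearer choice for fixed text: the regex versions need re.escape() so the text is taken literally, and '\Z' rather than '$' for "ends with", because '$' also matches just before a trailing newline. The function names are made up for this sketch.

```python
import re


def contains(s, sub):
    return sub in s


def contains_re(s, sub):
    # re.escape so characters like '.' are matched literally
    return re.search(re.escape(sub), s) is not None


def starts(s, prefix):
    return s.startswith(prefix)


def starts_re(s, prefix):
    # re.match only attempts a match at the beginning of the string
    return re.match(re.escape(prefix), s) is not None


def ends(s, suffix):
    return s.endswith(suffix)


def ends_re(s, suffix):
    # \Z is the true end of the string; 'c$' would also match 'abc\n'
    return re.search(re.escape(suffix) + r'\Z', s) is not None


for s in ['abc', 'abc\n', 'xbz']:
    print(repr(s), contains(s, 'b'), starts(s, 'a'), ends(s, 'c'))
```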

Jerome said...

I'm just getting the hang of them myself. Had to re-learn the little I knew beforehand.

The biggest headache for me is remembering exactly how \s differs from one language to another... or those dang quotes/double-quotes.
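Python alone shows both headaches in a few lines. In Python 3, '\s' in a str pattern matches any Unicode whitespace, the non-breaking space included, while the re.ASCII flag narrows it to [ \t\n\r\f\v]; other regex engines draw that line in their own places, so it's worth checking each one's documentation. Raw strings (the r prefix) are the usual fix for the quoting problem, since they hand backslashes to the regex engine untouched. A small sketch:

```python
import re

NBSP = '\u00a0'  # NO-BREAK SPACE, common in text copied from web pages

# Default for str patterns: \s means Unicode whitespace, so NBSP matches.
unicode_hit = re.match(r'\s', NBSP) is not None
# With re.ASCII, \s is limited to [ \t\n\r\f\v], so NBSP no longer matches.
ascii_hit = re.match(r'\s', NBSP, re.ASCII) is not None
print(unicode_hit, ascii_hit)  # True False

# Quoting: a raw string and a doubled backslash give the regex engine
# the same two characters, a backslash followed by 's'.
assert r'\s' == '\\s'
assert len(r'\s') == 2
```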

Anyway... still a noob, but growing.