Monday, February 9, 2009

Obfuscated Perl

For my own amusement, I like writing Perl as if it were a functional language, with lots of map()s, grep()s and join()s. Here's a fun example, with comments. It reads a document and prints, lowercased and one per line, every capitalized word that is not found in the system dictionary.

I was running this over the HTML versions of the E-books for the Kencyr Wiki, looking for proper names I hadn't yet defined in the Wiki.


#!/usr/bin/perl -w

# We just use a hash for its side-effect of quick keyed lookup
my %dictionary = map {chomp; tr/A-Z/a-z/; $_, 1} `cat /usr/share/dict/words`;

# Here, a hash is used for its side-effect of key uniqueness. The list returned by the first
# map() contains duplicates, but they are "flattened" when assigned to a hash.
my %allcapwords =
  (map {$_ => 1}                            # Make hashy
    (grep {not defined $dictionary{$_}}     # keep only the words missing from the dictionary
      (map {tr/A-Z/a-z/; s/\W+$//; $_}      # Post-massage; lowercase; remove trailing garbage
        (grep {/^[A-Z][a-z]/}               # Only capital-then-lowercase words
          # pre-massage; remove HTML and then split into words.
          # [^<>]* stops at the first ">", so text following a tag is kept;
          # the "$" alternative only catches a tag cut off by the end of the line.
          (map {chomp; s/<[^<>]*(>|$)//g; s/&\w+;/ /g; split /\s+/; } <>)))));

# And output it all. We only ever were interested in the keys; the values were always 1
print((join "\n", keys %allcapwords) . "\n");
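
For comparison, here is a rough, un-obfuscated sketch of the same pipeline, written as ordinary loops instead of nested map()s and grep()s. It is only an illustration under a few assumptions: the sub name unknown_capitalized_words is made up for this example, it takes the lines and the dictionary as references instead of reading <> and /usr/share/dict/words, and it sorts its output, which the script above does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the sorted,
# lowercased capitalized words from @$lines that are not keys of %$dict.
sub unknown_capitalized_words {
    my ($lines, $dict) = @_;
    my %seen;                                   # hash keys give us uniqueness for free
    for my $line (@$lines) {
        my $text = $line;                       # work on a copy of the caller's line
        $text =~ s/<[^<>]*(?:>|$)//g;           # drop HTML tags, even one cut off at end of line
        $text =~ s/&\w+;/ /g;                   # turn entities such as &amp; into spaces
        for my $word (split ' ', $text) {
            next unless $word =~ /^[A-Z][a-z]/; # capital letter, then a lowercase letter
            (my $key = lc $word) =~ s/\W+$//;   # lowercase; trim trailing punctuation
            $seen{$key} = 1 unless exists $dict->{$key};
        }
    }
    return sort keys %seen;
}

# Tiny in-memory demonstration; the dictionary and the input lines are sample data.
my %dict  = map { $_ => 1 } qw(the hello world);
my @lines = ("<p>Hello Jame, said Torisen.</p>\n", "The World &amp; Jame\n");
print "$_\n" for unknown_capitalized_words(\@lines, \%dict);
```

With this sample data the sketch prints jame and torisen, one per line: "Hello", "The" and "World" are in the toy dictionary, "said" is not capitalized, and the duplicate "Jame" collapses into a single hash key.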
