brad's life - Naming twins in Python & Perl [entries|archive|friends|userinfo]
Brad Fitzpatrick

[ website | ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Naming twins in Python & Perl [Jan. 6th, 2008|02:16 pm]
[Tags|, , , ]

Last night at Beau's party, one of Beau's guests mentioned he's expecting twins shortly, which is why is wife wasn't at the party.

I drunkenly suggested he name his kids two names that were anagrams of each other. I then wandered off downstairs to find such suitable names.

Because I'm supposed to be be working in Python these days, not Perl, I gathered what little Python knowledge I had and wrote out:

by_anagram = {}

names_file = open("dist.male.first")
for line in names_file:
    # lines in file look like:
    # WILLIAM        2.451 14.812      5
    # we want just the first field.
    name = (line.split())[0]
    letters = [letter for letter in name]
    sorted_name = "".join(letters)
    if not sorted_name in by_anagram:
        by_anagram[sorted_name] = []

for sorted_name in by_anagram:
    if len(by_anagram[sorted_name]) < 2:

    print by_anagram[sorted_name]
Not so happy with it, but it worked, and I printed out the results and brought them up to the guy:
['TROY', 'TORY']
['CLAY', 'LACY']
['DEAN', 'DANE']
['COLE', 'CLEO']
['LEON', 'NOEL', 'OLEN']
['NEAL', 'LANE']
['BERT', 'BRET']
['OLIN', 'LINO']
['DION', 'DINO']
['DANA', 'ADAN']
Just now, I was wondering the equivalent in Perl, and wrote:

use strict;
open (my $fh, "dist.male.first") or die;
my %by_anagram;
while (<$fh>) {
    my $name = $_;
    my $sorted_name = join('', sort split //, $name);
    push @{$by_anagram{$sorted_name}}, $name;

foreach my $sn (grep { @{$by_anagram{$_}} > 1 } keys %by_anagram) {
    print "@{$by_anagram{$sn}}\n";
In particular, I like about Python that errors-are-exceptions is the norm. But I like that regexps are built into Perl. (also hate Python's general hating on functional programming, unrelated to this post) I'm sure my Python could be way shorter, too. Anybody want to post either their short Python version, or their more-idiomatic Python version?

Also interesting: (fastest of a 3 consecutive runs each)
sammy$ time ./ > /dev/null

real	0m0.026s
user	0m0.024s
sys	0m0.004s

sammy$ time ./ > /dev/null

real	0m0.043s
user	0m0.036s
sys	0m0.008s

[User Picture]From: ciphergoth
2008-01-06 10:35 pm (UTC)

by_anagram = {}

for line in open("dist.male.first"):
    name = (line.split())[0]
    sorted_name = "".join(sorted([letter for letter in name]))
    by_anagram.setdefault(sorted_name, []).append(name)
for names in by_anagram.itervalues():
    if len(names) > 1:
       print names
(Reply) (Thread)
From: evan
2008-01-06 11:03 pm (UTC)
sorted(name) also works.
(Reply) (Parent) (Thread)
[User Picture]From: ciphergoth
2008-01-07 01:32 am (UTC)
"".join(sorted(name)) I guess, since "{sorted("foo"): 3}" throws a "list objects are unhashable" exception. Thanks!
(Reply) (Parent) (Thread)
[User Picture]From: oedipamaas49
2008-01-07 09:53 pm (UTC)
you can make your list of names, then do:
names.sorted(key = sorted)
to sort them by the result of sorting the letters of each name. Then go through the list and find names with the same letters as the next or previous one in the list.
(Reply) (Parent) (Thread)
[User Picture]From: dblume
2008-01-06 11:41 pm (UTC)
Thanks! I didn't know about dict.setdefault. Is there a downside to using it vs. calling dict.has_key or checking if the key is in dict like Brad does above?
(Reply) (Parent) (Thread)
[User Picture]From: ciphergoth
2008-01-07 01:42 am (UTC)
No, I think setdefault is both cleaner and faster. We can do still better using 2.5-specific features like "defaultdict":
from __future__ import with_statement
from collections import defaultdict

by_anagram = defaultdict(list)

with open("dist.male.first") as f:
    for line in f:
        name = line.split()[0]
        sorted_name = "".join(sorted(name))
We also do the right thing about closing the file using "with" here, which obviously doesn't matter for this program but is nice in general.

Unrelatedly, I note that in the Perl code the test for more than one member is effectively done in its own loop; Python can do that too, but I don't think it's the right thing here:
for names in [n for n in by_anagram.itervalues() if len(n) > 1]:
       print " ".join(names)

Edited at 2008-01-07 01:44 am (UTC)
(Reply) (Parent) (Thread)
[User Picture]From: ciphergoth
2008-01-07 02:05 am (UTC)
Must go to bed, but here's another reason wby defaultdict is neat:

from __future__ import with_statement
from collections import defaultdict

by_anagram = defaultdict(list)

with open("dist.male.first") as file:
    for line in file:
        name = line.split()[0]

with open("dist.female.first") as file:
    for line in file:
        name = line.split()[0]
        for boyname in by_anagram[name]:
            if boyname != name:
                print name, boyname
(Reply) (Parent) (Thread)
[User Picture]From: dblume
2008-01-07 03:38 am (UTC)
Did you mean,

for boyname in by_anagram["".join(sorted(name))]:

(Reply) (Parent) (Thread)
[User Picture]From: ciphergoth
2008-01-07 08:51 am (UTC)
D'oh! Thanks - that gives lots more boy/girl pairs!
(Reply) (Parent) (Thread)
From: (Anonymous)
2008-01-07 02:40 pm (UTC)
hm... wouldn't it be faster when using tuples instead of a string for the dict key?
instead of
(Reply) (Parent) (Thread)
[User Picture]From: ciphergoth
2008-01-15 05:22 pm (UTC)
This doesn't need to be fast. "".join(sorted(name)) would probably be easier if we had to debug this, but tuple is shorter and perhaps clearer, so I think it's a coin toss...
(Reply) (Parent) (Thread)
[User Picture]From: brad
2008-01-07 12:24 am (UTC)
Did not know about sorted or setdefault, thanks!
(Reply) (Parent) (Thread)