?

Log in

No account? Create an account
Naming twins in Python & Perl - brad's life [entries|archive|friends|userinfo]
Brad Fitzpatrick

[ website | bradfitz.com ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Naming twins in Python & Perl [Jan. 6th, 2008|02:16 pm]
Brad Fitzpatrick
[Tags|, , , ]

Last night at Beau's party, one of Beau's guests mentioned he's expecting twins shortly, which is why is wife wasn't at the party.

I drunkenly suggested he name his kids two names that were anagrams of each other. I then wandered off downstairs to find such suitable names.

Because I'm supposed to be be working in Python these days, not Perl, I gathered what little Python knowledge I had and wrote out:
#!/usr/bin/python

by_anagram = {}

names_file = open("dist.male.first")
for line in names_file:
    # lines in file look like:
    # WILLIAM        2.451 14.812      5
    # we want just the first field.
    name = (line.split())[0]
    letters = [letter for letter in name]
    letters.sort()
    sorted_name = "".join(letters)
    if not sorted_name in by_anagram:
        by_anagram[sorted_name] = []
    by_anagram[sorted_name].append(name)

for sorted_name in by_anagram:
    if len(by_anagram[sorted_name]) < 2:
        continue

    print by_anagram[sorted_name]
Not so happy with it, but it worked, and I printed out the results and brought them up to the guy:
['TROY', 'TORY']
['CLAY', 'LACY']
['JEFFREY', 'JEFFERY']
['ANGEL', 'GALEN']
['FOREST', 'FOSTER']
['ANDRE', 'DAREN', 'ARDEN']
['LEROY', 'ELROY']
['LEONARD', 'RENALDO', 'LEANDRO']
['ARIEL', 'ARLIE']
['BRENDAN', 'BRANDEN']
['WARREN', 'WARNER']
['DEAN', 'DANE']
['CHRISTOPER', 'CRISTOPHER']
['COLE', 'CLEO']
['CARLO', 'CAROL']
['ELMER', 'MERLE']
['REUBEN', 'RUEBEN']
['JOSEPH', 'JOESPH', 'JOSPEH']
['COREY', 'ROYCE']
['JASON', 'JONAS']
['RAMON', 'ROMAN']
['JAMIE', 'JAIME']
['CARMELO', 'MARCELO']
['BYRON', 'BRYON']
['LEON', 'NOEL', 'OLEN']
['NEAL', 'LANE']
['MICHAEL', 'MICHEAL', 'MICHALE']
['KEITH', 'KIETH']
['BERT', 'BRET']
['BRIAN', 'BRAIN']
['OLIN', 'LINO']
['DION', 'DINO']
['DANA', 'ADAN']
['RONALD', 'ROLAND', 'ARNOLD']
['ISRAEL', 'ISREAL']
['DARNELL', 'RANDELL']
['ANTOINE', 'ANTIONE']
['ORLANDO', 'ROLANDO', 'ARNOLDO']
Just now, I was wondering the equivalent in Perl, and wrote:
#!/usr/bin/perl

use strict;
open (my $fh, "dist.male.first") or die;
my %by_anagram;
while (<$fh>) {
    chomp;
    s/\s.*//;
    my $name = $_;
    my $sorted_name = join('', sort split //, $name);
    push @{$by_anagram{$sorted_name}}, $name;
}

foreach my $sn (grep { @{$by_anagram{$_}} > 1 } keys %by_anagram) {
    print "@{$by_anagram{$sn}}\n";
}
In particular, I like about Python that errors-are-exceptions is the norm. But I like that regexps are built into Perl. (also hate Python's general hating on functional programming, unrelated to this post) I'm sure my Python could be way shorter, too. Anybody want to post either their short Python version, or their more-idiomatic Python version?

Also interesting: (fastest of a 3 consecutive runs each)
sammy$ time ./ananames.pl > /dev/null

real	0m0.026s
user	0m0.024s
sys	0m0.004s

sammy$ time ./ananames.py > /dev/null

real	0m0.043s
user	0m0.036s
sys	0m0.008s
LinkReply

Comments:
Page 1 of 3
<<[1] [2] [3] >>
[User Picture]From: avva
2008-01-06 10:32 pm (UTC)
Rueben and Reuben is the winning pair.
(Reply) (Thread)
[User Picture]From: herbie
2008-01-06 10:47 pm (UTC)
Think, though, of the pressures put on a Brian/Brain pair.
(Reply) (Parent) (Thread) (Expand)
[User Picture]From: ciphergoth
2008-01-06 10:35 pm (UTC)
#!/usr/bin/python

by_anagram = {}

for line in open("dist.male.first"):
    name = (line.split())[0]
    sorted_name = "".join(sorted([letter for letter in name]))
    by_anagram.setdefault(sorted_name, []).append(name)
    
for names in by_anagram.itervalues():
    if len(names) > 1:
       print names
(Reply) (Thread)
From: evan
2008-01-06 11:03 pm (UTC)
sorted(name) also works.
(Reply) (Parent) (Thread) (Expand)
(no subject) - (Anonymous) Expand
[User Picture]From: jarodrussell
2008-01-06 10:38 pm (UTC)
That's an elegantly simple way of sorting and comparing the letters in a name. Beautiful code.
(Reply) (Thread)
[User Picture]From: hellarad
2008-01-06 10:58 pm (UTC)
1] why didn't you do this for meeee?
2] same names spelled differently so do not count!!!
(Reply) (Thread)
[User Picture]From: dossy
2008-01-06 11:23 pm (UTC)
I'm assuming you're using this dist.male.first?

As long as you don't mind a little golfing, here's the Tcl version:

set f [open dist.male.first r]
while {[gets $f line] >= 0} {
    set name [lindex [split $line " "] 0]
    set sorted [lsort [split $name ""]]
    lappend by_anagram($sorted) $name
}

foreach sorted [array names by_anagram] {
    if {[llength $by_anagram($sorted)] > 1} {
        puts $by_anagram($sorted)
    }
}
close $f


I'd like to know how fast or slow it is on your machine, to get an apples-to-apples benchmark comparison (assuming you have Tcl/tclsh installed). Include the version of tclsh you used, if you do.
(Reply) (Thread)
[User Picture]From: pne
2008-01-07 07:17 am (UTC)

golf

Couldn't you turn set name [lindex [split $line " "] 0] into set name [lindex $line 0]?

I thought that strings and lists in Tcl were interconvertible, and lindex'ing a string essentially split it on whitespace.
(Reply) (Parent) (Thread) (Expand)
(Deleted comment)
From: ext_26836
2008-01-07 12:47 am (UTC)

Ruby 1.9

File.open('dist.male.first').
  map {|line| line.split.first}.
  group_by {|name| name.chars.sort}.
  each {|key,names| p names if names.length>1}
(Reply) (Parent) (Thread)
From: elvenforever
2008-01-07 12:18 am (UTC)
Heh. This year was my 10th wedding anniversary. I make jewelry. So, to celebrate, I calculated our wedding date in binary and made it into a necklace. Geek love. :)
(Reply) (Thread)
[User Picture]From: jope
2008-01-07 12:27 am (UTC)
Where were you when the best Hasbro could come up with was Tomax and Xamot?
(Reply) (Thread)
[User Picture]From: ciphergoth
2008-01-07 01:56 am (UTC)
Going a bit more mad for brevity with list comprehensions:
#!/usr/bin/python

by_anagram = {}

for name in [l.split()[0] for l in open("dist.male.first")]:
        by_anagram.setdefault("".join(sorted(name)), []).append(name)

print "\n".join([" ".join(n) for n in by_anagram.itervalues() if len(n) > 1])
(Reply) (Thread)
[User Picture]From: krow
2008-01-07 02:46 am (UTC)
Ask Guido someday about how Python's Threads are implemented...

In some fairness though, Python's internals are much easier to follow then the MACRO hell of Perl's.
(Reply) (Thread)
From: ex_n1ckname629
2008-01-07 03:02 am (UTC)
oki doki

$ time ./ana.py > /dev/null

real 0m0.028s
user 0m0.016s
sys 0m0.008s
$ time ./ana.pl > /dev/null

real 0m0.019s
user 0m0.020s
sys 0m0.000s
$ time ./ananames.bin > /dev/null

real 0m0.009s
user 0m0.008s
sys 0m0.004s
(Reply) (Thread)
[User Picture]From: mechanyx
2008-01-07 03:31 am (UTC)
Can I ask why you are "supposed" to be writing in Python if your Perl is so strong?
(Reply) (Thread)
[User Picture]From: brad
2008-01-07 04:16 am (UTC)
Google has a shitload of code and infrastructure. Google also has a shitload of engineers that altogether know a shitload of languages. If every engineer were allowed to write in his/her favorite pet language of the week, the necessary explosion of substandard bindings for each library * each language would be unmaintainable. Given that Perl/Python/Ruby are all effectively the same, it makes sense to standardize on one. Python has the right mix of learnability, readability, industry/community support, etc. I don't object to having to write in Python... it makes a ton of sense.
(Reply) (Parent) (Thread) (Expand)
From: evan
2008-01-07 04:05 am (UTC)

i can has golf

import Control.Arrow
import List
import qualified Data.Map as M

main = do
  dat <- readFile "dist.male.first"
  let pairs = map ((sort &&& (:[])) . head . words) $ lines dat
  mapM_ print $ filter ((>1) . length) $ M.elems $ M.fromListWith (++) pairs
Hooray for large standard libraries.
(Reply) (Thread)
[User Picture]From: mobley
2008-01-07 04:23 am (UTC)
if i was him, i would name my children
orlando and rolando
and be done with it.

also, if i had a twin named branden
while growing up
one of us would've been
driven to kill by now.
(Reply) (Thread)
[User Picture]From: nothings
2008-01-07 04:29 am (UTC)
For what little it's worth:

#define STB_DEFINE
#include "stb.h"     // http://nothings.org/stb.h

stb_sdict *names;

int main(int argc, char **argv)
{
   int i,j;
   char **lines, *y;
   names = stb_sdict_new(1);
   lines = stb_stringfile("/sean/writing/tools/male.txt", NULL);
   for (; *lines; ++lines) {
      char **z = stb_tokens(*lines, " ", NULL), *p = strdup(z[0]), **s;
      qsort(z[0], strlen(z[0]), 1, stb_charcmp);
      s = stb_sdict_get(names, z[0]);
      stb_arr_push(s, p);
      stb_sdict_set(names, z[0], s);
   }
   stb_sdict_for(names, i, y, lines) {
      if (stb_arr_len(lines) > 1) {
         printf("[ ");
         for (j=0; j < stb_arr_len(lines); ++j) {
            printf("'%s' ", lines[j]);
         }
         printf("]\n");
      }
   }
   return 0;
}


Of course that leaks all memory, and I don't have perl or python installed to do a performance comparison anyway.
(Reply) (Thread)
Page 1 of 3
<<[1] [2] [3] >>