| Comments: |
From: avva 2008-01-06 10:32 pm (UTC)
Rueben and Reuben is the winning pair.
Think, though, of the pressures put on a Brian/Brain pair.
#!/usr/bin/python
by_anagram = {}
for line in open("dist.male.first"):
    name = (line.split())[0]
    sorted_name = "".join(sorted([letter for letter in name]))
    by_anagram.setdefault(sorted_name, []).append(name)
for names in by_anagram.itervalues():
    if len(names) > 1:
        print names
From: evan 2008-01-06 11:03 pm (UTC)
sorted(name) also works.
That's an elegantly simple way of sorting and comparing the letters in a name. Beautiful code.
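For anyone skimming the thread, here is a minimal Python 3 sketch of the idea being praised: two names are anagrams of each other exactly when their sorted letters are equal. The helper name `anagram_key` is made up for illustration and is not part of the original script.

```python
def anagram_key(name):
    """Return the letters of `name` in sorted order, joined into one string."""
    # sorted() accepts any iterable, so the string can be passed directly;
    # no intermediate list comprehension is needed.
    return "".join(sorted(name))


if __name__ == "__main__":
    # Name pairs mentioned in the thread; each pair shares a key.
    print(anagram_key("BRIAN") == anagram_key("BRAIN"))
    print(anagram_key("REUBEN") == anagram_key("RUEBEN"))
```

Joining the sorted letters back into a string gives a hashable value, which is what lets it serve as a dictionary key.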
1] why didn't you do this for meeee? 2] same names spelled differently so do not count!!!
I'm assuming you're using this dist.male.first? As long as you don't mind a little golfing, here's the Tcl version:
set f [open dist.male.first r]
while {[gets $f line] >= 0} {
    set name [lindex [split $line " "] 0]
    set sorted [lsort [split $name ""]]
    lappend by_anagram($sorted) $name
}
foreach sorted [array names by_anagram] {
    if {[llength $by_anagram($sorted)] > 1} {
        puts $by_anagram($sorted)
    }
}
close $f
I'd like to know how fast or slow it is on your machine, to get an apples-to-apples benchmark comparison (assuming you have Tcl/tclsh installed). Include the version of tclsh you used, if you do.
From: pne 2008-01-07 07:17 am (UTC)
golf
Couldn't you turn set name [lindex [split $line " "] 0] into set name [lindex $line 0]?
I thought that strings and lists in Tcl were interconvertible, and lindex'ing a string essentially split it on whitespace.
(Deleted comment)
File.open('dist.male.first').
map {|line| line.split.first}.
group_by {|name| name.chars.sort}.
each {|key,names| p names if names.length>1}
Heh. This year was my 10th wedding anniversary. I make jewelry. So, to celebrate, I calculated our wedding date in binary and made it into a necklace. Geek love. :)
From: jope 2008-01-07 12:27 am (UTC)
Where were you when the best Hasbro could come up with was Tomax and Xamot?
Going a bit more mad for brevity with list comprehensions:
#!/usr/bin/python
by_anagram = {}
for name in [l.split()[0] for l in open("dist.male.first")]:
    by_anagram.setdefault("".join(sorted(name)), []).append(name)
print "\n".join([" ".join(n) for n in by_anagram.itervalues() if len(n) > 1])
From: krow 2008-01-07 02:46 am (UTC)
Ask Guido someday about how Python's Threads are implemented...
In some fairness though, Python's internals are much easier to follow than the MACRO hell of Perl's.
oki doki
$ time ./ana.py > /dev/null
real 0m0.028s
user 0m0.016s
sys 0m0.008s
$ time ./ana.pl > /dev/null
real 0m0.019s
user 0m0.020s
sys 0m0.000s
$ time ./ananames.bin > /dev/null
real 0m0.009s
user 0m0.008s
sys 0m0.004s
Can I ask why you are "supposed" to be writing in Python if your Perl is so strong?
From: brad 2008-01-07 04:16 am (UTC)
Google has a shitload of code and infrastructure. Google also has a shitload of engineers that altogether know a shitload of languages. If every engineer were allowed to write in his/her favorite pet language of the week, the necessary explosion of substandard bindings for each library * each language would be unmaintainable. Given that Perl/Python/Ruby are all effectively the same, it makes sense to standardize on one. Python has the right mix of learnability, readability, industry/community support, etc. I don't object to having to write in Python... it makes a ton of sense.
From: evan 2008-01-07 04:05 am (UTC)
i can has golf
import Control.Arrow
import Data.List
import qualified Data.Map as M
main = do
  dat <- readFile "dist.male.first"
  let pairs = map ((sort &&& (:[])) . head . words) $ lines dat
  mapM_ print $ filter ((>1) . length) $ M.elems $ M.fromListWith (++) pairs
Hooray for large standard libraries.
if i was him, i would name my children orlando and rolando and be done with it.
also, if i had a twin named branden while growing up one of us would've been driven to kill by now.
For what little it's worth:
#define STB_DEFINE
#include "stb.h" // http://nothings.org/stb.h
stb_sdict *names;
int main(int argc, char **argv)
{
    int i,j;
    char **lines, *y;
    names = stb_sdict_new(1);
    lines = stb_stringfile("/sean/writing/tools/male.txt", NULL);
    for (; *lines; ++lines) {
        char **z = stb_tokens(*lines, " ", NULL), *p = strdup(z[0]), **s;
        qsort(z[0], strlen(z[0]), 1, stb_charcmp);
        s = stb_sdict_get(names, z[0]);
        stb_arr_push(s, p);
        stb_sdict_set(names, z[0], s);
    }
    stb_sdict_for(names, i, y, lines) {
        if (stb_arr_len(lines) > 1) {
            printf("[ ");
            for (j=0; j < stb_arr_len(lines); ++j) {
                printf("'%s' ", lines[j]);
            }
            printf("]\n");
        }
    }
    return 0;
}
Of course that leaks all memory, and I don't have perl or python installed to do a performance comparison anyway.
Hey, no fair. No one’s showing the Perl version any love even though it has a lot of stuff you can get rid of:
#!/usr/bin/perl
use strict;
sub sort_chars { join '', sort split //, shift }
my %by_anagram;
@ARGV = "dist.male.first"; # better yet: pass it on the command line
while (<>) {
    s/\s.*//s;
    push @{ $by_anagram{ sort_chars($_) } }, $_;
}
for ( values %by_anagram ) {
    print "@$_\n" if @$_ > 1;
}
(Untested.)
Although I’d write it slightly longer in order to de-uglify the push line, by factoring out the addressing into an extra temporary:
my $bucket = $by_anagram{ sort_chars($_) } ||= []; # oh hai, iz mah...
push @$bucket, $_;
Much nicer.
From: (Anonymous) 2008-01-08 05:13 am (UTC)
A little less opaque with most of his code concepts intact ...
#!/usr/bin/perl
open (my $fh, "dist.male.first") or die;
my (%by_anagram,$name,$first,$rest,$sorted_name);
while (($name=<$fh>)=~ s/\s.*\n//m) {
    push @{$by_anagram{join('', sort split //, $name) }},$name;
}
foreach my $sn ( keys %by_anagram) {
    printf "%s\n",join(", ",@{$by_anagram{$sn}}) if (@{$by_anagram{$sn}} > 1);
}
Removed as much of the extraneous code as possible, though I didn't alter his character sort (join, sort, split).
landman@lightning:~$ time ./ana2.pl > m
real 0m0.036s
user 0m0.028s
sys 0m0.004s
From: (Anonymous) 2008-01-07 06:06 am (UTC)
In situations like these, I'd prefer line.strip() to (line.split())[0]. The latter doesn't need the extra parentheses, anyway (and I don't know the layout of the file, if there is more than one thing on every line my replacement doesn't work).
It doesn't - see the comment in the Perl code, there are other fields in the file. I think this is the cleanest way.
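A small Python 3 sketch of why the two calls differ once a line carries more than one field. The sample line below is an assumed layout (a name followed by numeric columns), made up for illustration and not copied from the file, and `first_field` is a hypothetical helper name.

```python
# Assumed line layout: a name followed by other whitespace-separated fields.
sample = "JAMES          3.318  3.318      1\n"


def first_field(line):
    """Return the first whitespace-separated field of `line`."""
    # split() with no argument splits on runs of whitespace and
    # ignores leading and trailing whitespace, including the newline.
    return line.split()[0]


if __name__ == "__main__":
    # strip() only trims the ends, so the extra columns remain.
    print(repr(sample.strip()))
    # split()[0] isolates the name regardless of what follows it.
    print(repr(first_field(sample)))
```

With a one-field-per-line file the two would agree, which is presumably why strip() looked attractive at first.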
From: mart 2008-01-07 10:19 am (UTC)
I thought it might be fun to do it in C#, just for kicks:
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
public class AnagramNames {
    public static void Main(string[] args) {
        var names = from name in File.ReadAllLines("dist.male.first") select name.Split(' ')[0];
        var pairs = from pair in (
            from name in names group name by SortCharsInString(name)
        ) where pair.Count() > 1 select pair;
        foreach (var pair in pairs) {
            Console.WriteLine(String.Join(",", pair.ToArray()));
        }
    }
    public static string SortCharsInString(string s) {
        char[] arr = s.ToArray();
        Array.Sort(arr);
        return new string(arr);
    }
}
It could be even shorter and more memory-efficient if I could find a way in the standard library to:
- Get an IEnumerable<string> for "lines in a file" without reading the whole file into an array (it's easy to write such a thing, so I'm sure it's in the standard library somewhere...)
- Sort the characters in a string in a functional way, rather than in-place
No timings, since I had to write this in Windows; Mono doesn't have stable LINQ support yet.
From: mtbg 2008-01-07 05:42 pm (UTC)
(it's easy to write such a thing, so I'm sure it's in the standard library somewhere...)
So, by deduction, it must be really really hard to write a Pair class in Java...
I have very little to say about the Python version that ciphergoth hasn't said. I probably would've implemented it as the list comprehension version. One comment about your version: letters = [letter for letter in name] is better spelled letters = list(name).
I'll go a bit off-topic, sorry. I'm writing a paper for a Russian university called "livejournal.com as unofficial mass media", and I'd like to know your opinion on this. What do you think of the fact that livejournal.com is becoming, or has already become, the fourth estate? And can you predict LJ's future? Which do you prefer: newspapers and TV news, or LJ? Thank you. Danil.
From: (Anonymous) 2008-01-07 04:00 pm (UTC)
;; Emacs Lisp version :D
(require 'cl)
(with-current-buffer (find-file-noselect "input")
  (let ((hash (make-hash-table :test #'equal)))
    (while (and (not (eobp))
                (re-search-forward "^\\w*" nil t))
      (setf str (match-string 0))
      (sort* str #'<)
      (push (match-string 0) (gethash str hash)))
    (maphash (lambda (k v)
               (when (cdr v) (print v))) hash)))
From: (Anonymous) 2008-01-07 05:19 pm (UTC)
A php version
I know there will be haters but here is a php version.
#!/usr/local/php5/bin/php
[...] $line) {
    $parts = explode(" ", $line);
    $letters = str_split($parts[0]);
    sort($letters);
    $letters = implode('', $letters);
    if (!array_key_exists($letters, $anagrams)) {
        $anagrams[$letters] = array();
    }
    $anagrams[$letters][] = $parts[0];
}
foreach ($anagrams as $key => $value) {
    if (count($value) < 2) {
        continue;
    }
    echo implode(", ", $value)."\n";
}
From: (Anonymous) 2008-01-10 04:12 pm (UTC)
Re: A php version
Well, it seems you are a hater of indenting. Please stop the hate!
by_anagram = {}
for line in file("dist.male.first"):
    by_anagram[sorted(line)]=by_anagram.get(sorted(line),[])+[line,]
for line in by_anagram.values():
    print line
Test this: it will throw a TypeError, because sorted(line) returns a list and lists cannot be used as dict keys. In addition, you need to print only the entries with more than one name in the list. "setdefault" is easier. Also, compare my "list comprehensions" version above.
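To illustrate both points, a minimal Python 3 sketch (not the code from either comment; `group_anagrams` is a hypothetical name). It first triggers the error on purpose, then groups with setdefault using a hashable key and keeps only groups of two or more names.

```python
def group_anagrams(names):
    """Group names by sorted letters; keep only groups with more than one name."""
    by_anagram = {}
    for name in names:
        # sorted() returns a list, which is unhashable, so join it into a
        # string (a tuple would also work) before using it as a dict key.
        key = "".join(sorted(name))
        by_anagram.setdefault(key, []).append(name)
    return [group for group in by_anagram.values() if len(group) > 1]


if __name__ == "__main__":
    try:
        {}[sorted("BRIAN")] = []  # a list as a dict key
    except TypeError as err:
        print("TypeError:", err)
    print(group_anagrams(["BRIAN", "BRAIN", "EVAN"]))
```

The sketch takes bare names; real input lines would still need the name extracted first, for example with line.split()[0].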
From: crw 2008-01-07 07:21 pm (UTC)
Alice / Celia, sadly missing (from Jeff Noon's "Automated Alice")
Nice post, it's fun to see different approaches. Here's mine in Common Lisp:
(let ((hash (make-hash-table :test #'equal)))
  (with-open-file (stream "names")
    (loop for line = (read-line stream nil nil)
          for key = (sort (copy-seq line) #'char<)
          until (null line)
          do (push line (gethash key hash nil))))
  (loop for v being the hash-value of hash
        do (when (cdr v) (format t "~{~A~^, ~}~%" v))))
This problem is one of many that involve clustering values based on their signatures. If you factor out the signature-computing method as a parameter, what's left is a handy, reusable function that solves a large family of related problems:
ClusterBy: a handy little function for the toolbox (http://blog.moertel.com/articles/2007/09/01/clusterby-a-handy-little-function-for-the-toolbox).
Cheers, Tom
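As a rough Python 3 illustration of that factoring (a sketch of the general idea only, not a port of the linked article's code; the name `cluster_by` and its argument order are assumptions), the signature function becomes a parameter and the grouping logic stays fixed:

```python
from collections import defaultdict


def cluster_by(signature, items):
    """Group `items` into lists whose members share the same signature."""
    clusters = defaultdict(list)
    for item in items:
        # The signature must be hashable, since it is used as a dict key.
        clusters[signature(item)].append(item)
    return list(clusters.values())


if __name__ == "__main__":
    names = ["BRIAN", "BRAIN", "REUBEN", "RUEBEN", "EVAN"]
    # Anagram signature: the sorted letters, as a hashable tuple.
    for group in cluster_by(lambda n: tuple(sorted(n)), names):
        if len(group) > 1:
            print(group)
    # Same function, different signature: cluster strings by length.
    print(cluster_by(len, ["a", "bb", "c", "dd"]))
```

Swapping the lambda for any other hashable-valued function reuses the same grouping code for a different problem.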