August 14th, 2007

belize

The CIA and Buffy

If you haven't seen this linked just about everywhere, check it out:

http://wikiscanner.virgil.gr/

... 35 million Wikipedia edits, all indexed by IP address, so you can see which organizations are making which edits.

For instance, it's good to know that somebody within the CIA is editing the article about the musical episode of Buffy, "Once More, with Feeling". Their contribution?
*Buffy sings, "I've got a theory, it doesn't matter" but then quickly adds "as long as we're together," revealing both the depression and the (revealed in this episode) cause.
belize

rejecting residential zombies with qpsmtpd

Qpsmtpd continues to be fun. For the past week or so, danga.com's mail has been running with qpsmtpd in front, instead of Postfix, and qpsmtpd now handles:

-- normal logging to syslog
-- logging all connections in a structured/indexed way to MySQL, many columns per connection
-- SMTP AUTH, letting me/friends use danga.com as our outgoing mail server
-- check_earlytalker (reject clients who speak before spoken to, as a lot of spammers do)
-- counting bad commands (reject clients who HTTP POST to port 25 via an open web proxy)
-- DNS RBL checks
-- rejecting non-members from posting to GNU Mailman lists
-- not letting people send mail as @danga.com, @bradfitz.com, etc. without SMTP AUTH
-- rejecting mails to non-existent users (no bounces later, 5xx immediately)
-- spam checks via SpamAssassin
-- virus (and phishing) checks via ClamAV's clamd
-- queuing to Postfix (which then delivers to local users via .forward/.procmailrc/Maildir, whatever)
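
To make the "early talker" idea from the list concrete, here's a rough Python sketch. It is not qpsmtpd's actual plugin code; the function names, the 554 reply text, the greeting hostname, and the one-second wait are all assumptions made up for illustration. The idea is just to pause before sending the SMTP greeting and see whether the client has already started talking.

```python
import select
import socket


def is_early_talker(conn: socket.socket, wait_seconds: float = 1.0) -> bool:
    """Return True if the client has data waiting before we greeted it.

    Note: a client that has already disconnected also shows up as
    "readable" here, which is fine for this purpose.
    """
    readable, _, _ = select.select([conn], [], [], wait_seconds)
    return bool(readable)


def greet(conn: socket.socket, wait_seconds: float = 1.0) -> bool:
    """Send either a rejection or the normal greeting (hypothetical replies)."""
    if is_early_talker(conn, wait_seconds):
        conn.sendall(b"554 SMTP protocol violation: spoke before greeting\r\n")
        return False
    conn.sendall(b"220 mail.example.org ESMTP\r\n")
    return True
```

A well-behaved client waits for the 220 line, so it only pays the short delay. A spam bot that blasts its whole script without reading replies gets caught.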

My Postfix config, meanwhile, has shrunk to barely anything.

But after analyzing my logs in MySQL, I still wasn't happy. I saw that almost all my spam (~98%) came from residential DHCP/DSL/Cable connections, with stupid-looking, easily-recognizable hostnames, like:
+---------------------------------------------------+
| remotehost                                        |
+---------------------------------------------------+
| pool-71-187-1-147.nwrknj.fios.verizon.net         | 
| APointe-a-Pitre-103-1-66-27.w80-8.abo.wanadoo.fr  | 
| 80-219-140-217.dclient.hispeed.ch                 | 
| dsl-189-133-79-229.prod-infinitum.com.mx          | 
| c-71-196-65-212.hsd1.fl.comcast.net               | 
| dsl-189-174-204-42.prod-infinitum.com.mx          | 
| 150.218.111.218.klj03-home.tm.net.my              | 
| 201-25-211-207.fnsce702.dsl.brasiltelecom.net.br  | 
| d83-186-127-150.cust.tele2.be                     | 
| 24-159-245-214.dhcp.mdsn.wi.charter.com           | 
| aolclient-68-202-241-155.aol.cfl.res.rr.com       | 
| c-71-61-26-51.hsd1.pa.comcast.net                 | 
| i05v-87-90-179-109.d4.club-internet.fr            | 
| 85.137.89.143.dyn.user.ono.com                    | 
| c-76-100-225-155.hsd1.md.comcast.net              | 
| dslb-088-074-000-186.pools.arcor-ip.net           | 
| pD955ED51.dip.t-dialin.net                        | 
| 183.42.33.65.cfl.res.rr.com                       | 
| 82-42-29-1.cable.ubr02.knor.blueyonder.co.uk      | 
| client-200.121.44.138.speedy.net.pe               | 
| c-68-58-108-131.hsd1.in.comcast.net               | 
| host81-154-148-241.range81-154.btcentralplus.com  | 
| 216-199-78-234.atl.fdn.com                        | 
| 218.206.98-84.rev.gaoland.net                     | 
| cpe-65-28-157-135.bak.res.rr.com                  | 
| 82-45-185-97.cable.ubr01.chel.blueyonder.co.uk    | 
| KH222-156-64-178.adsl.dynamic.apol.com.tw         | 
| dslb-084-056-113-198.pools.arcor-ip.net           | 
| 195.64.189.72.cfl.res.rr.com                      | 
| dsl-189-179-129-154.prod-infinitum.com.mx         | 
| cpe-24-175-188-97.stx.res.rr.com                  | 
| 78-56-100-93.ip.zebra.lt                          | 
| 75-131-161-195.dhcp.spbg.sc.charter.com           | 
| host-81-190-121-194.gdynia.mm.pl                  | 
| c9112907.rjo.virtua.com.br                        | 
| ppp-124.120.110.148.revip2.asianet.co.th          | 
| dsl-189-144-11-85.prod-infinitum.com.mx           | 
| 201-95-47-232.dsl.telesp.net.br                   | 
| 202.31.101-84.rev.gaoland.net                     | 
| i05v-212-194-122-1.d4.club-internet.fr            | 
| 201-88-68-211.cbace700.dsl.brasiltelecom.net.br   | 
| dslb-082-083-049-211.pools.arcor-ip.net           | 
| 89.140.22.68.static.user.ono.com                  | 
| cpe-76-170-82-98.socal.res.rr.com                 | 
| 112.75.128.219.broad.fs.gd.dynamic.163data.com.cn | 
| S01060015e97b2781.du.shawcable.net                | 
| c-24-8-151-233.hsd1.co.comcast.net                | 
| 20150032076.user.veloxzone.com.br                 | 
| host-69-221-111-24.midco.net                      | 
| 67-130-61-6.dia.static.qwest.net                  | 
+---------------------------------------------------+

So I wrote a little Perl module (to be released) that incorporates a few dozen tests and regexps, looking for the IP address (or part of it) encoded in the hostname in any number of totally f'ed up ways. It also double-checks that the sender isn't some home Linux user running their own mail server: if their HELO hostname resolves to their source IP, I allow an ugly reverse DNS hostname. You often can't control your reverse DNS, but forward DNS is free, so it's not an unreasonable barrier to entry.
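
Until the real module is out, here's a much smaller Python sketch of the same idea. It is not the Perl module: the function names are hypothetical, it only looks for the full IPv4 address (not partial ones), and it covers just a handful of encodings (dashed, dotted, zero-padded, reversed, and hex) rather than a few dozen tests. The `resolve` parameter exists only so the DNS lookup can be swapped out for testing.

```python
import ipaddress
import re
import socket


def ip_encodings(ip: str) -> tuple[set[str], str]:
    """Return (decimal forms, hex form) a hostname might use to embed `ip`."""
    octets = list(ipaddress.IPv4Address(ip).packed)
    decimal_forms = set()
    for order in (octets, octets[::-1]):  # forward and reversed octet order
        for sep in ("-", "."):
            decimal_forms.add(sep.join(str(o) for o in order))
            decimal_forms.add(sep.join(f"{o:03d}" for o in order))
        # Unseparated form only when zero-padded, otherwise it is ambiguous.
        decimal_forms.add("".join(f"{o:03d}" for o in order))
    hex_form = "".join(f"{o:02x}" for o in octets)
    return decimal_forms, hex_form


def ip_in_hostname(ip: str, hostname: str) -> bool:
    """True if the reverse DNS name appears to contain the IP address."""
    host = hostname.lower()
    decimal_forms, hex_form = ip_encodings(ip)
    if hex_form in host:
        return True
    # Require non-digit boundaries so 1.2.3.4 does not match 11.2.3.45.
    return any(
        re.search(r"(?<![0-9])" + re.escape(form) + r"(?![0-9])", host)
        for form in decimal_forms
    )


def helo_resolves_to(helo: str, ip: str, resolve=socket.gethostbyname_ex) -> bool:
    """True if the declared HELO name has a forward DNS record for `ip`."""
    try:
        _name, _aliases, addresses = resolve(helo)
    except (OSError, UnicodeError):
        return False
    return ip in addresses


def looks_like_zombie(ip: str, rdns: str, helo: str,
                      resolve=socket.gethostbyname_ex) -> bool:
    """Ugly reverse DNS is forgiven when HELO resolves back to the source IP."""
    if not ip_in_hostname(ip, rdns):
        return False
    return not helo_resolves_to(helo, ip, resolve)
```

For example, assuming the first host in the table above really connects from 71.187.1.147 (inferred from the name, not taken from my logs), `ip_in_hostname("71.187.1.147", "pool-71-187-1-147.nwrknj.fios.verizon.net")` matches on the dashed form, and the connection is flagged unless its HELO name resolves back to that address.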

Check out this table:
http://bradfitz.com/hacks/antispam/lj-example-table.txt

In conclusion, a lot can be learned from the (IP, reverse DNS of that IP, declared HELO host) tuple. I'm now rejecting 50% of incoming connections at the MAIL FROM stage (I have to give them a chance to AUTH after HELO), based on that tuple alone. Of the remaining 50%, I reject half (25% of the total) via spamassassin, check_earlytalker, and dnsbl, and then I queue/deliver the other 25% of the total email to users on danga.com.
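
Since those percentages are easy to misread, here's the arithmetic as a tiny Python sketch. The function name and the 1000-connection figure are made up for illustration; only the 50% / 25% / 25% split comes from the numbers above.

```python
def funnel(total: float, early_reject: float = 0.50,
           late_reject_of_rest: float = 0.50) -> tuple[float, float, float]:
    """Split connections into (rejected at MAIL FROM, rejected later, delivered)."""
    rejected_early = total * early_reject        # (IP, rDNS, HELO) tuple check
    rest = total - rejected_early
    rejected_late = rest * late_reject_of_rest   # spam/earlytalker/DNSBL checks
    delivered = rest - rejected_late
    return rejected_early, rejected_late, delivered


early, late, delivered = funnel(1000)
print(early, late, delivered)  # 500.0 250.0 250.0
```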

Yay qpsmtpd!