Back in April I started an informal spam research project to test a piece of conventional wisdom: that putting your email address on a website attracts spam. Basically, I littered my site with email addresses and then kept watch for messages delivered to them. Today I removed them from the site, and I'd like to share my results so far.
Here are the email addresses I used, where I placed them and how many messages they received:
- st1 - plain text in an html comment, at /blog - received 29
- st2 - href=mailto in an html comment, at /blog - received 17
- st3 - href=mailto in plain sight, at /blog - received 23
- st4 - plain text in an html comment, at /gallery - received 8
- st5 - href=mailto in an html comment, at /gallery - received 8
- st6 - href=mailto in plain sight, at /gallery - received 7
- Total messages: 92
- First hit: 2006-05-05 04:21:39
- First hit address: st2
- Unique hosts: 38
- Hosts sending only a single message: 22
- Most messages from a single host: 8
- Hosts listed in Spamhaus SBL-XBL: 25
- Messages blocked by SBL-XBL: 37
- Separate attacks: 18
- Attacks with delays between messages: 6
- Number of countries: 19
- Most common countries: South Korea (6), China (5), U.S. (5)
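As a sanity check, the six per-address counts sum to 92 (69 from /blog and 23 from /gallery), which matches the reported total. The Python sketch below does that tally. It also shows how the host-level figures (unique hosts, hosts sending a single message, most messages from one host) could be computed from a mail log. The `(recipient, sending_host)` record format, the function names and the sample hosts are hypothetical, because the post doesn't describe how the log was kept.

```python
from collections import Counter

# Per-address (page, messages received), as reported in the post.
RECEIVED = {
    "st1": ("/blog", 29),
    "st2": ("/blog", 17),
    "st3": ("/blog", 23),
    "st4": ("/gallery", 8),
    "st5": ("/gallery", 8),
    "st6": ("/gallery", 7),
}


def totals_by_page(received):
    """Sum message counts per page the addresses were placed on."""
    pages = Counter()
    for page, count in received.values():
        pages[page] += count
    return pages


def host_stats(messages):
    """Host-level summary from (recipient, sending_host) pairs.

    The record format is a hypothetical stand-in for a real mail log.
    """
    per_host = Counter(host for _recipient, host in messages)
    return {
        "total": sum(per_host.values()),
        "unique_hosts": len(per_host),
        "single_message_hosts": sum(1 for n in per_host.values() if n == 1),
        "max_from_one_host": max(per_host.values(), default=0),
    }


if __name__ == "__main__":
    print(sum(count for _page, count in RECEIVED.values()))  # 92
    print(dict(totals_by_page(RECEIVED)))  # {'/blog': 69, '/gallery': 23}
    # Toy log with made-up hosts, just to exercise host_stats.
    sample = [("st1", "host-a"), ("st1", "host-a"), ("st3", "host-b")]
    print(host_stats(sample))
```

The SBL-XBL and country figures would need extra lookups per host (a DNS blocklist query and a geolocation database), so this sketch leaves them out.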
The next step is to see how long I continue to receive messages at these addresses. I suspect it'll continue for at least 2 months. We'll see.
Finally, I came up with a clever way to trace when these addresses get skimmed off the site. Instead of a static address (st1, st2, etc.), I wrote a small bit of PHP code that generates a unique email address of the form stT<date>T<ip address>, so each address records when it was served and the IP address it was served to. The things I'm most curious to find out are 1) how long spammers continue to use a given address, 2) how widely skimmed addresses are shared, and 3) how far from the crawler the email address wanders. I'll let you know.
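The PHP code itself isn't in the post. Here is a minimal Python sketch of the same idea, which makes several assumptions. The `<date>` part is taken to be a UTC timestamp in `YYYYMMDDHHMMSS` form. The IP is IPv4 only, because an IPv6 address's colons aren't allowed unquoted in an email local part. `example.com` is a placeholder domain. `make_trap_address` and `decode_trap_address` are hypothetical names. In a PHP page, the visitor's IP would typically come from `$_SERVER['REMOTE_ADDR']`.

```python
import ipaddress
import re
from datetime import datetime, timezone

# Placeholder domain (assumption): the post does not name the real one.
DOMAIN = "example.com"
# Assumed layout of <date>: a compact UTC timestamp such as 20060601120000.
DATE_FORMAT = "%Y%m%d%H%M%S"
# Matches the local part stT<date>T<ipv4>. It is case-insensitive in case
# a mail server lowercases the address on the way in.
TRAP_RE = re.compile(r"stT(\d{14})T(\d{1,3}(?:\.\d{1,3}){3})", re.IGNORECASE)


def make_trap_address(visitor_ip: str, when: datetime) -> str:
    """Build a unique address of the form stT<date>T<ip address>@DOMAIN."""
    ip = ipaddress.IPv4Address(visitor_ip)  # rejects malformed or IPv6 input
    # Note: astimezone() treats a naive datetime as local time.
    stamp = when.astimezone(timezone.utc).strftime(DATE_FORMAT)
    return f"stT{stamp}T{ip}@{DOMAIN}"


def decode_trap_address(address: str) -> tuple[datetime, str]:
    """Recover (time the address was served, IP it was served to)."""
    local, _, domain = address.partition("@")
    match = TRAP_RE.fullmatch(local)
    if match is None or domain.lower() != DOMAIN:
        raise ValueError(f"not a trap address: {address!r}")
    stamp, ip = match.groups()
    when = datetime.strptime(stamp, DATE_FORMAT).replace(tzinfo=timezone.utc)
    return when, str(ipaddress.IPv4Address(ip))


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only address, not a real crawler.
    served = datetime(2006, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
    addr = make_trap_address("192.0.2.1", served)
    print(addr)  # stT20060601120000T192.0.2.1@example.com
    print(decode_trap_address(addr))
```

Decoding the recipient of each incoming spam then gives the harvest time and crawler IP. These can be compared with the message's arrival time and sending host to get at questions 1) and 3).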