Spammers are known to scan web pages looking for email addresses to add to their lists (a practice known as harvesting). Putting your unprotected email address on your web pages is inviting spammers to fill your mailbox!
The problem is, visitors like to have a ready-to-click link to send an email.
There are several anti-spam options when it comes to allowing people visiting your web site to contact you:
One option is to encode the @-sign (and perhaps the mailto: as well) as an HTML character entity:
<a href="mailto:andrew&#64;scss.com.au">Email me</a>
In the browser, this comes out as an ordinary "Email me" link. There isn't much point encoding the letters and dots in the address itself, as they look much like any other text you might find on a page, or indeed much like a web link you expect to find in an href attribute. This works quite well, as spammers are almost always looking for an @-sign and don't decode character entities.
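If you have many pages to maintain, you may prefer to generate the encoded markup rather than type the entities by hand. Here is a minimal sketch; encodeAtSign, encodeAllChars and entityMailtoLink are hypothetical helper names (not part of any library), and encodeAllChars is only there in case you decide to encode more than the @-sign:

```javascript
// Hypothetical helper: replace each "@" with its numeric character reference.
function encodeAtSign(address) {
  return address.replace(/@/g, '&#64;');
}

// Hypothetical helper: encode every character as a decimal character reference.
// for...of walks by code point, so characters outside the BMP stay intact.
function encodeAllChars(text) {
  let out = '';
  for (const ch of text) {
    out += '&#' + ch.codePointAt(0) + ';';
  }
  return out;
}

// Hypothetical helper: build the anchor markup. linkText is inserted as-is,
// so it must not contain markup characters.
function entityMailtoLink(address, linkText) {
  return '<a href="mailto:' + encodeAtSign(address) + '">' + linkText + '</a>';
}

console.log(entityMailtoLink('user@example.com', 'Email me'));
// prints <a href="mailto:user&#64;example.com">Email me</a>
```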
This page deals with the last two options: first the Javascript one, then (near the bottom of this page) the server-side script one.
The idea here is to use Javascript to generate a clickable link. The theory goes that spam harvesters are just looking at the HTML markup of your page and are not using Javascript interpreters. Therefore, if you can encode your address into Javascript somehow so that it is no longer recognizable as an email address, the spam harvesters should miss it.
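To make the idea concrete, here is a deliberately simple sketch. It is not the Enkoder's algorithm: ROT13 is just an illustrative stand-in for "some encoding", and decodeMailto is a hypothetical name. The user and domain parts are stored ROT13-encoded and joined with the @-sign only at run time, so the complete address never appears in the page source:

```javascript
// ROT13: rotate each ASCII letter 13 places; other characters pass through.
function rot13(s) {
  return s.replace(/[A-Za-z]/g, function (c) {
    const base = c <= 'Z' ? 65 : 97; // char code of 'A' or 'a'
    return String.fromCharCode((c.charCodeAt(0) - base + 13) % 26 + base);
  });
}

// Hypothetical helper: decode both parts and join them with "@" at run time.
function decodeMailto(encodedUser, encodedDomain) {
  return 'mailto:' + rot13(encodedUser) + '@' + rot13(encodedDomain);
}

// "hfre" and "rknzcyr.pbz" are ROT13 for "user" and "example.com".
console.log(decodeMailto('hfre', 'rknzcyr.pbz')); // prints mailto:user@example.com

// In a browser you would then create the link, e.g.:
// const a = document.createElement('a');
// a.href = decodeMailto('hfre', 'rknzcyr.pbz');
// a.appendChild(document.createTextNode('Email me'));
// document.body.appendChild(a);
```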
A reasonably well-known encoder is the Automatic Labs Enkoder Form. It produces code like:
<script type="text/javascript"> //<![CDATA[ function hiveware_enkoder(){var i,j,x,y,x= "x=\"0x=\\\"<m068$9-?_q\\\\8<;\\\"AA.o#4?mjsvDx=,C9\\\\\\\"|A=<@oBmA99=A?ml" + "&<P<<o99|B@|2pi2pi'*g~4r;ir9=r~4rkxl=yz?8=xsYjkt9kxm/Al?M54m|A8zn;8m8i-!kt" + "m}/A:.o?9i`&kAo|//-2wy!~Cf1C(:2<88n<9wxvzn7,m<<k88l/~AA#09|2g-?#<p8!1C</l<" + "j}<l<A}2o/<w<7:=;7~41evyfwGsrxvy~C={h9AhCk,nj8k-?\\\\%25\\\"<?!(8<yziEx;y=" + ",A=''x8b(.o8m/#8;x<7<=unpCk8i=|=-29e16?scag:6p/A:=m%25!1Ce(x;<r);:.~4?9h;j" + ",<fon@ir(il<=!C!87i4y={tk0;i<ng<l9xG89696-nx.l/z9en{=;;hyzA88xgtyi<h;i.6/g" + "<7/g=hb+8?}+){A=<j.pj<//A`6=:=x.v<~ch&C~4(A<kAWkarxvyCodl=;<?}<m.A+e<?=At(" + "k{hi+!yz-C-:rk2)-4jx8;i==?=?|Av+-yf(:<:j<3.7/r8-AisAl2qGl)j+!8;=<ii8ewge1~" + "494;<kCy+t--A;=u;vGx=Sshltri==8.i,9i9|-ny{hg.fkuxr?=.oj:6o,n-omC#C=haC8<:j" + "sv}&?,rCnA7ode6Aom6=Biz9<(ep,j)}~<?y\\\";=:|j=e2glvalek4(x.rktchavExrAt,4m" + "(0)<:<);x;--=x.?|Asub8znstrAo|(1)2wy;y=f1='';8<>forwxv(i=,580;iC</<x.#-?le" + "n}A+gth!8;;i+==+=10?js){yv91+=xC~4.su,mAbst4?yr(ii<8,5)6m@;}f|2por(:{hi=5y" + "zi;i<rkxx.llx7eng86=th;?m/i+=A5=10).o2{y+94-=x.!}/sub/8<str>=A(i,|2w5);y<A" + "}y=%25y.sfwxubsv,ltr(ux;j);\";j=eval(x.charAt(0));x=x.substr(1);y='';for(i" + "=0;i<x.length;i+=6){y+=x.substr(i,3);}for(i=3;i<x.length;i+=6){y+=x.substr" + "(i,3);}y=y.substr(j);"; while(x=eval(x));}hiveware_enkoder(); //]]> </script>
It's certainly difficult to read! It's also quite large, and needs to be duplicated everywhere you want your email address to appear. Then there's the maintenance issue: if you want to change an email address, you have to re-encode it, then cut-and-paste the result onto your page. What a hassle!
And what about people who are, for whatever reason, surfing with Javascript disabled? Several statistics indicate they could be more than 10% of your visitors! The usual method is to include <noscript> tags that describe your email address in some fashion. This duplicates the address (once encoded, once described), not to mention the possibility of the two becoming inconsistent.
My solution is to take advantage of the address described inside the <noscript> tags. What if there were some way for that described address to be automatically translated into a clickable link? Then all you would need to do is place the described address on your page, and let some Javascript automatically un-obfuscate it and turn it into a link!
That's what I've done. All you need to do is include my Javascript module and a common event-handling Javascript module (see below for downloads):
<script type="text/javascript" src="events.js"></script> <script type="text/javascript" src="emaillinks.js"></script>
and place the required markup onto your page:
<p>Send your email to: <a class="email">"Andrew Gregory" &lt;andrew at scss dot com dot au&gt;</a></p>
Which, if you have Javascript enabled, gets turned into:
Send your email to: Andrew Gregory
with "Andrew Gregory" as a clickable link to the un-obfuscated address, and, if you haven't, stays as:
Send your email to: "Andrew Gregory" <andrew at scss dot com dot au>
This work is licensed under a Creative Commons License.
Including the required Javascript files sets up an "onload" event handler for the web page, with the un-obfuscator configuration set to some useful defaults. When the page is loaded, the Javascript (if enabled) will run and search-and-replace the email addresses with links.
The email addresses are located by searching for <a> elements with the configured class name (default: "email"). Elements with an href already defined are skipped.
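The selection rule can be sketched as a predicate. This is an illustration rather than the code in emaillinks.js; isObfuscatedAnchor is a hypothetical name, and it accepts anything with a className string and a getAttribute method, so it can be tried outside a browser:

```javascript
// True if el has className among its (space-separated) classes
// and does not already have an href.
function isObfuscatedAnchor(el, className) {
  // getAttribute returns null when the attribute is absent;
  // this sketch also treats an empty href as "not defined".
  if (el.getAttribute('href')) return false;
  return el.className.split(/\s+/).indexOf(className) !== -1;
}

// In a browser (not run here):
// const candidates = Array.prototype.filter.call(
//   document.getElementsByTagName('a'),
//   function (a) { return isObfuscatedAnchor(a, 'email'); });
```

Splitting on whitespace matters because an element may carry several classes, such as class="note email".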
Once located, the destination name and address are extracted from the text content of the element or, if the element contains an image, from the image's alt attribute. The name and address are found using regular expressions defined in the configuration (by default, the name is inside double-quotes and the address is inside angle brackets). The name is optional; the address is not.
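The extraction step can be sketched with the default name and addr patterns. extractParts is a hypothetical helper, not part of emaillinks.js, and it only handles the text-content case (not the alt attribute of an image):

```javascript
// Default patterns from emaillinks_config.
const NAME_RE = /"([^"]*)"/; // recipient name: between double-quotes
const ADDR_RE = /<([^>]*)>/; // obfuscated address: between angle brackets

// Returns { name, addr } from the anchor's text, or null when there is
// no address (the address is mandatory, the name optional).
function extractParts(text, nameRe = NAME_RE, addrRe = ADDR_RE) {
  const a = addrRe.exec(text);
  if (!a) return null;
  const n = nameRe.exec(text);
  return { name: n ? n[1] : null, addr: a[1] };
}

const parts = extractParts('"Jo Bloggs" <jo at example dot com>');
console.log(parts.name + ' | ' + parts.addr); // prints Jo Bloggs | jo at example dot com
```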
The address, which should look something like "andrew at scss dot com dot au", then has a sequence of regular expression replacements applied to it. The default replacements convert " at ", " -at- " and " (at) " to "@", the matching " dot ", " -dot- " and " (dot) " to ".", remove any trailing "invalid" text (which may have an optional dot separating it from the rest of the domain), and finally strip any remaining whitespace.
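The replacement sequence amounts to a simple loop. Here is a sketch using the default unobs list copied from emaillinks_config; unobfuscate is a hypothetical name, and the example addresses use example.com:

```javascript
// Default replacements from emaillinks_config.unobs, applied first to last.
const DEFAULT_UNOBS = [
  { re: /\s+at\s+/ig, txt: '@' },
  { re: /\s+dot\s+/ig, txt: '.' },
  { re: /\s+-at-\s+/ig, txt: '@' },
  { re: /\s+-dot-\s+/ig, txt: '.' },
  { re: /\s+\(at\)\s+/ig, txt: '@' },
  { re: /\s+\(dot\)\s+/ig, txt: '.' },
  { re: /[\.]?invalid$/i, txt: '' },
  { re: /\s+/g, txt: '' }
];

// Feed the output of each replacement into the next one.
function unobfuscate(text, unobs = DEFAULT_UNOBS) {
  return unobs.reduce(function (s, step) {
    return s.replace(step.re, step.txt);
  }, text);
}

console.log(unobfuscate('jo at example dot com dot invalid')); // prints jo@example.com
```

Order matters: the whitespace-stripping step comes last, so the earlier patterns can still rely on the spaces around "at" and "dot".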
After all that, the element's link text is replaced (with the name if one was found, otherwise with the un-obfuscated address) and its href is set to a mailto: link for the email address.
At this point each function listed in the process configuration property is called to perform any last-minute processing that might be required.
It's possible spam harvesters could learn new ways of extracting email addresses from web pages. To counter this, I've made my script extremely flexible in its method of operation. If you're handy with regular expressions and a bit of Javascript, you can change how my script operates entirely.
Below are some links about regular expressions as they are implemented in Javascript:
The email un-obfuscator script uses an object to hold all the necessary configuration information. The default one is:
var emaillinks_config = {
  className: 'email',
  addr: /<([^>]*)>/,
  name: /"([^"]*)"/,
  subj: /with subject "([^"]*)"/,
  process: [emaillinks_subject],
  unobs: [
    {re: /\s+at\s+/ig, txt: '@'},
    {re: /\s+dot\s+/ig, txt: '.'},
    {re: /\s+-at-\s+/ig, txt: '@'},
    {re: /\s+-dot-\s+/ig, txt: '.'},
    {re: /\s+\(at\)\s+/ig, txt: '@'},
    {re: /\s+\(dot\)\s+/ig, txt: '.'},
    {re: /[\.]?invalid$/i, txt: ''},
    {re: /\s+/g, txt: ''}
  ]
};
You can replace it entirely (by assigning a new object to emaillinks_config), or you can replace just parts of it (for example, emaillinks_config.className='mailtolink';).
There are two methods of setting your customizations:
Put your code in the web page itself, somewhere after the lines that include the source files:
<script type="text/javascript" src="events.js"></script> <script type="text/javascript" src="emaillinks.js"></script> <script type="text/javascript"> emaillinks_config.className='mailtolink'; </script>
Put your code in an external file, referenced somewhere after the lines that include the source files:
<script type="text/javascript" src="events.js"></script> <script type="text/javascript" src="emaillinks.js"></script> <script type="text/javascript" src="custom-code.js"></script>
where "custom-code.js" contains things like:
emaillinks_config.className='mailtolink';
The un-obfuscator script recognizes the following configuration object properties:
addr
This is the regular expression used to extract the obfuscated email address from the anchor content. If no name is present in the anchor content, the un-obfuscated version of this text is used as the link text. Default: "/<([^>]*)>/" (everything between two angle brackets).
className
This is the name of the class used to mark <a> elements as containing an obfuscated email address. Default: "email".
name
This is the regular expression used to extract the name of the email recipient from the anchor content. If present, this is used as the link text. Default: "/"([^"]*)"/" (everything between two double-quotes).
process
This is an array of functions called after the un-obfuscator has built the link, but before it has inserted the link into the document or removed the original anchor. Each function is called with two parameters: the original anchor element and the newly built link element.
The intention is that you could perform some extra very fancy processing yourself, should you find that necessary, without needing to modify my original code.
This defaults to an array containing just emaillinks_subject, a function that demonstrates how to use this facility and supports the useful feature of adding a subject to the email link address.
subj
This is the regular expression used to extract the email subject (if any) from the anchor content. Default: "/with subject "([^"]*)"/" (everything between two double-quotes following the text "with subject").
unobs
An array of objects, each with a regular expression property (re) and a replacement text property (txt). The array is processed in order (first to last). At each step, the regular expression is applied to the address text (as left by the previous step) and every match is replaced with the replacement text (only the first match, if the regular expression lacks the g flag). The final result of the address text after being so processed is assumed to be the un-obfuscated email address.
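The subj and process properties work together: in the default configuration, emaillinks_subject provides the subject feature, and subj describes where the subject appears in the anchor text. The code of emaillinks_subject isn't reproduced on this page, so the following is only a sketch of what such a step might do; addSubject is a hypothetical name, and it works on the href string rather than on DOM elements:

```javascript
// Default subject pattern from emaillinks_config.subj.
const SUBJ_RE = /with subject "([^"]*)"/;

// If the anchor text names a subject, append it to the mailto: href.
// encodeURIComponent keeps spaces, '&' and '?' in the subject from
// breaking the URL.
function addSubject(text, href, subjRe = SUBJ_RE) {
  const m = subjRe.exec(text);
  if (!m) return href;
  const sep = href.indexOf('?') === -1 ? '?' : '&';
  return href + sep + 'subject=' + encodeURIComponent(m[1]);
}

const text = '"Jo Bloggs" <jo at example dot com> (with subject "Hello there")';
console.log(addSubject(text, 'mailto:jo@example.com'));
// prints mailto:jo@example.com?subject=Hello%20there
```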
See also: Address Munging FAQ.
Rather than stick with the defaults, make it a little more difficult for the spam harvesters by customizing the script operation!
Set the class name to something different, like "mailtolink" or "address".
emaillinks_config.className='mailtolink';
Use different delimiters for the parts of the email. For example, square brackets to go around the obfuscated address:
emaillinks_config.addr=/\[([^\]]*)\]/;
Instead of looking for " at " and " dot ", look for " -at- " or " -dot- ":
emaillinks_config.unobs=[ {re:/\s+-at-\s+/ig , txt:'@'}, {re:/\s+-dot-\s+/ig, txt:'.'} ];
Adding bogus parts to your domain name is best shown by example. Note in particular that this example also demonstrates that text not recognized as the recipient name, email address or subject will be ignored by the script. Code your address like:
<a class="email">"Andrew Gregory" &lt;andrew at bogus dot scss dot com dot au invalid&gt; (with subject "Feedback") (remove the bogus parts of the domain name before sending)</a>
Then modify the standard configuration by appending a new replacement object to the existing default ones:
emaillinks_config.unobs[emaillinks_config.unobs.length]={re:/@bogus\./, txt:'@'};
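With Javascript, this turns into a link reading "Andrew Gregory" that addresses the un-obfuscated address (with the bogus parts removed) and carries the subject "Feedback"; without Javascript, the text is shown exactly as written.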
You could write your email name and domain using lowercase letters, and write the "@" and "." using uppercase letters. For example, "andrewATscssDOTcomDOTau". A suitable configuration might be:
emaillinks_config.unobs=[{re:/AT/g,txt:'@'},{re:/DOT/g,txt:'.'}];
The only way I believe this technique could be defeated would be by harvesting software implementing a complete Javascript and DOM interpreter. By running every script on every page, then scanning through the resulting document objects, such a system could easily find anchor tags and the decoded href attributes.
This isn't necessarily difficult as open source browsers (such as Mozilla and Konqueror) provide a ready-to-go engine. All the spammers would need to do is create a modified version of the browser that can spider automatically.
What might stop this technique from being practical is that all the extra processing would significantly slow down harvesting.
I got this idea from "A New Form of Spam Protection". If you're able to, you can use a server-side script instead of a client-side script. Here is a suitable Perl script (tested on Apache servers):
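#!/usr/bin/perl -w
# Split "name=...&domain=..." into a hash. Values are left URL-encoded,
# so they can go straight into the mailto: URL.
my %q = split(/[=&]/, $ENV{'QUERY_STRING'} || '');
print 'Status: 307 Temporary Redirect', "\n";
print 'Location: mailto:', $q{'name'}, '%40', $q{'domain'};
my $c = '?';
foreach ('cc', 'bcc', 'subject', 'body') {
  if ($q{$_}) {
    print "$c$_=$q{$_}";
    $c = '&';
  }
}
# End the Location header and the header block.
print "\n\n";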
and some suitable PHP:
<?php
// Redirect to mailto:name@domain, passing any optional fields through.
if (isset($_GET["name"]) && isset($_GET["domain"])) {
  $loc = $_GET["name"] . "@" . $_GET["domain"];
  $ch = "?";
  foreach (array("subject", "cc", "bcc", "body") as $field) {
    if (isset($_GET[$field])) {
      // $_GET values arrive decoded, so re-encode them for the URL.
      $loc .= $ch . $field . "=" . rawurlencode($_GET[$field]);
      $ch = "&";
    }
  }
  header("Location: mailto:" . $loc);
}
?>
You may, of course, hard-code any of the parameters (domain being the obvious one). Note also that these scripts are not spam relays, because they don't actually send the email; they rely on the user agent (browser) to do that.
You use the script by creating a link to it:
<a href="mailto.pl?name=andrew&amp;domain=scss.com.au&amp;subject=Feedback">Email me</a>
<a href="mailto.php?name=andrew&amp;domain=scss.com.au&amp;subject=Feedback">Email me</a>
or a form:
<form action="/cgi-bin/mailto.pl">
  <fieldset>
    <input type="hidden" name="name" value="andrew" />
    <input type="hidden" name="domain" value="scss.com.au" />
    <input type="hidden" name="subject" value="Feedback" />
    <input type="submit" value="Email me" />
  </fieldset>
</form>
Clicking on the link or button executes the script, which redirects the browser to a mailto: address; the browser should then open a new message in the user's email program.
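Both scripts do essentially the same thing: rebuild a mailto: URL from the query string and return it in a Location header. For comparison, here is that logic as a pure Javascript function, which could serve as a starting point if your server runs Node.js. buildMailtoLocation is a hypothetical name; it decodes the parameter values with the standard URLSearchParams and re-encodes them with encodeURIComponent:

```javascript
// Build "mailto:name@domain?subject=...&cc=..." from a query string such as
// "name=jo&domain=example.com&subject=Feedback". Returns null if name or
// domain is missing. name and domain are used as-is; validate them in real use.
function buildMailtoLocation(queryString) {
  const q = new URLSearchParams(queryString); // decodes %XX and '+'
  const name = q.get('name');
  const domain = q.get('domain');
  if (!name || !domain) return null;
  let loc = 'mailto:' + name + '@' + domain;
  let sep = '?';
  for (const key of ['subject', 'cc', 'bcc', 'body']) {
    const value = q.get(key);
    if (value) {
      loc += sep + key + '=' + encodeURIComponent(value);
      sep = '&';
    }
  }
  return loc;
}

console.log(buildMailtoLocation('name=jo&domain=example.com&subject=Feedback'));
// prints mailto:jo@example.com?subject=Feedback

// In a Node.js request handler (not run here), roughly:
// const loc = buildMailtoLocation(req.url.split('?')[1] || '');
// if (loc) { res.writeHead(307, { Location: loc }); } else { res.writeHead(400); }
// res.end();
```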
Of course, these methods can also be defeated by spam harvesters using a web browser engine, but they would now have to follow every such link and submit every form they encounter just to see whether they get a mailto address back.
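You can also combine the two techniques, having the Javascript un-obfuscator generate links to the server-side script instead of plain mailto: links. This is easily done using the following processing function: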
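emaillinks_config.process.push(function(orig, link) {
  var href = link.getAttribute('href');
  // Any query an earlier process function added (e.g. "?subject=...")
  // must be joined with '&', since the script URL gets its own '?'.
  href = href.replace(/\?/, '&');
  href = href.replace(/^mailto:/, 'mailto.pl?name=');
  href = href.replace(/@/, '&domain=');
  link.setAttribute('href', href);
});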
Of course, I'm happy to receive feedback and suggestions on this script, page, or any other aspect of this web site. Follow the "Contact Me" link in the footer of the page.
Version | Date | Description |
---|---|---|
1.8 | 2007-04-25 | |
1.7 | 2006-09-07 | |
n/a | 2004-11-21 | |
n/a | 2004-10-22 | |
n/a | 2004-10-18 | |
n/a | 2004-10-08 | |
1.6 | 2004-10-08 | |
1.5 | 2004-08-10 | |
1.4 | 2004-05-31 | |
1.3 | 2004-05-29 | |
1.2 | 2004-05-28 | |
1.1 | 2004-05-28 | |
1.0 | 2004-05-28 | |
Each of the links below leads to a web page containing a single email link. The pages use different anti-spam techniques, and each address is named to indicate which technique it uses.
I originally had all the links on this page, but I got so few spams (even to the unprotected address) that I thought perhaps there were too many email addresses on this page and the harvesters were treating it as a spam trap. Maybe having just one address per page will work better.
Please don't use them to send me email! Instead, follow the "Contact Me" link at the bottom of this page.