Spam bots are annoying, there’s no question about that: they cost people money and, more importantly, time. So how do we stop form spam without using traditional CAPTCHA forms? Over the next few posts I'll be detailing the results of some research I've been conducting, as well as testing different methods of stopping form spam, or at the very least reducing it.
There are two main types of spam - email spam and form spam. Email spam is all those lovely messages you get that try to extort money or attempt to sell you things.
Form spam is where bots or people submit forms containing links or code, in an attempt to build traffic to their own site or to inject malicious scripts (XSS) into your pages.
In this post I’ll be talking about methods to deal with form spam.
There are a few restrictions I set myself.
The first is to avoid using traditional CAPTCHA forms, as they can be annoying and put users off.
First of all, know thy enemy: how do spam bots work? There are different types of spammer: playback bots, form-filling bots, and people. While people are obviously not bots, they do sometimes send spam (silly rick-rollers).
Playback bots
A person visits a form the first time and records the form data. Certain fields are marked as slots to be filled in with randomized spam. The bots then replay the POST data back to the form submission URL, but filled with spam. The structure of the form is played back exactly the same each time. This includes the names of the fields, and the contents of hidden inputs.
Form-filling bots
These bots read the form served by the site, and automatically fill data into the fields. Some of them blindly fill in all fields that they come across; others are a little smarter and only fill in certain values, leaving hidden fields alone.
People
There's very little you can do to stop people, as they can pass any CAPTCHA and anything else you put in place to stop bots. The only thing I would recommend is to use the rel="nofollow" attribute on all user-submitted links, and be clear that you’re doing it. This gives the spammer no real incentive to post spam, as they won’t be getting any search engine based link traffic.
Fighting the spam
Now that we have a little understanding of how spam bots work, we can look at different methods of stopping them, or at least making it more difficult for them.
Timestamps
Using a timestamp in a hidden field makes it possible to detect when old data is being reused, or when the form is being filled in too quickly. I would then recommend hashing the timestamp with a secret to make it tamper-proof. You can go one step further and include the client's IP address in the hash, which stops the replayed data being used by a large network of spambots.
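Here is a rough sketch of how a hashed timestamp could look in PHP. It is only one way of doing it: the function names, the FORM_SECRET constant and the 3 second / 1 hour limits are all my own assumptions for illustration, and the token is an HMAC of the timestamp and IP address rather than a plain hash.

```php
<?php
// Hypothetical secret; in practice load it from configuration, not source code.
const FORM_SECRET = 'change-me';

// Build the hidden field value: "<timestamp>:<hmac of timestamp and IP>".
function make_form_token(string $ip, int $now): string
{
    $mac = hash_hmac('sha256', $now . '|' . $ip, FORM_SECRET);
    return $now . ':' . $mac;
}

// Check a posted token. $minAge and $maxAge (in seconds) are assumed thresholds.
function check_form_token(string $token, string $ip, int $now, int $minAge = 3, int $maxAge = 3600): bool
{
    $parts = explode(':', $token, 2);
    if (count($parts) !== 2 || preg_match('/^\d+$/D', $parts[0]) !== 1) {
        return false; // malformed token
    }
    $issued = (int) $parts[0];
    $expected = hash_hmac('sha256', $issued . '|' . $ip, FORM_SECRET);
    if (!hash_equals($expected, $parts[1])) {
        return false; // timestamp was tampered with, or token was issued to another IP
    }
    $age = $now - $issued;
    // Too young means filled in too quickly; too old means replayed data.
    return $age >= $minAge && $age <= $maxAge;
}
```

When rendering the form you would call make_form_token($_SERVER['REMOTE_ADDR'], time()) and put the result in a hidden input, then pass the posted value to check_form_token() with the same two arguments on submission.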
Honeypots
A honeypot, an editable field on the form that is invisible to people, can be used to stop some form-filling bots. Honeypot fields are validated when the form data is posted: if any of them contains data, the submitter must be a bot (because people can’t see the field), and so the submission can be rejected.
One downside is that, depending on how you hide the honeypot, a user may still be able to use the tab key to navigate into it and fill it in by accident.
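A minimal sketch of a honeypot check might look like this. The field name 'website' and both function names are made up for the example, and hiding the field off-screen with tabindex="-1" is just one assumed way of keeping keyboard users out of it.

```php
<?php
// Hypothetical honeypot field name; pick something a bot would want to fill in.
const HONEYPOT_FIELDS = ['website'];

// Returns true when any honeypot field came back with content in it.
function honeypot_triggered(array $post, array $honeypots = HONEYPOT_FIELDS): bool
{
    foreach ($honeypots as $field) {
        $value = $post[$field] ?? '';
        if (is_array($value) || trim((string) $value) !== '') {
            return true;
        }
    }
    return false;
}

// Render the honeypot off-screen. tabindex="-1" takes it out of the tab order
// and autocomplete="off" asks browsers not to fill it in automatically.
function render_honeypot(string $name): string
{
    return '<div style="position:absolute;left:-9999px" aria-hidden="true">'
         . '<input type="text" name="' . htmlspecialchars($name, ENT_QUOTES) . '"'
         . ' tabindex="-1" autocomplete="off" value="">'
         . '</div>';
}
```

On submission, a call such as honeypot_triggered($_POST) returning true would be the signal to reject the post.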
Randomised field names
Some bots look for keywords in field names, such as ‘name’ or ‘email’, so that they can post what would be considered valid data to those fields. By randomising the field names you make it difficult for a bot to tell what each field is for.
It will also stop bots posting data to the submission URL with the same field names over and over.
The downside is that a user would be unable to make use of any form auto-fill functionality that may exist in their browser.
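One possible way to randomise field names, sketched below, is to derive each name from the real name plus a random value generated for every rendered form. Everything here (FIELD_SECRET, obscure_name, decode_post, new_nonce) is hypothetical naming of my own, not an existing library.

```php
<?php
// Hypothetical secret; load it from configuration in a real site.
const FIELD_SECRET = 'change-me';

// Derive an opaque field name from the logical name and a per-form nonce.
function obscure_name(string $logical, string $nonce): string
{
    // Prefix with a letter so the name never starts with a digit.
    return 'f' . substr(hash_hmac('sha256', $logical . '|' . $nonce, FIELD_SECRET), 0, 10);
}

// Map posted data back to logical names; fields that don't match are dropped.
function decode_post(array $post, array $logicalNames, string $nonce): array
{
    $decoded = [];
    foreach ($logicalNames as $logical) {
        $key = obscure_name($logical, $nonce);
        if (array_key_exists($key, $post)) {
            $decoded[$logical] = $post[$key];
        }
    }
    return $decoded;
}

// A fresh random value for each rendered form.
function new_nonce(): string
{
    return bin2hex(random_bytes(8));
}
```

If the nonce is kept in the user's session rather than in the form itself, data replayed with old field names simply won't decode to anything, which is the effect described above.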
Validation
Validate each field against the kind of data it should hold. For example, the email field must have an @ sign, and the name field must not. This works great alongside the randomised field names. If these fields were renamed to jdf56g and bf874 then a bot can’t tell the difference, so it has less chance of being able to make a successful post.
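As a small illustration, the check could be as simple as the sketch below. The function name and the 'name' / 'email' keys are assumptions, and I've used PHP's filter_var for the email, which is stricter than just looking for an @ sign.

```php
<?php
// Returns the names of the fields that failed; an empty array means it passed.
function validate_comment(array $fields): array
{
    $errors = [];
    $name  = is_string($fields['name'] ?? null) ? trim($fields['name']) : '';
    $email = is_string($fields['email'] ?? null) ? trim($fields['email']) : '';

    // A name must be present and must not look like an email address.
    if ($name === '' || strpos($name, '@') !== false) {
        $errors[] = 'name';
    }
    // An email must at least be syntactically valid (which implies an @ sign).
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errors[] = 'email';
    }
    return $errors;
}
```

A bot that pastes the same email address into every field would fail the name check, and one that pastes a name everywhere would fail the email check.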
Email domain checking
PHP has the handy function checkdnsrr, which you can use to see if the domain of the email address exists and if it has a mail exchange (MX) record. An MX record tells other mail servers where to deliver email for that domain. If a domain doesn't have one, it is fairly safe to say that it's not a valid email address, just as if the domain doesn’t exist it probably isn’t a real email.
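A rough sketch of the domain check is below. The wrapper function and its $lookup parameter are my own invention so that the DNS call can be swapped out when testing; by default it calls PHP's real checkdnsrr.

```php
<?php
// Returns true if the domain part of $email has an MX record.
// $lookup is a hypothetical hook for testing; it defaults to checkdnsrr().
function email_domain_accepts_mail(string $email, ?callable $lookup = null): bool
{
    $lookup = $lookup ?? 'checkdnsrr';

    $at = strrpos($email, '@');
    if ($at === false || $at === strlen($email) - 1) {
        return false; // no @ sign, or nothing after it
    }
    $domain = substr($email, $at + 1);

    // checkdnsrr() returns true if any record of the given type is found.
    return (bool) $lookup($domain, 'MX');
}
```

Keep in mind that a few domains receive mail without an MX record (mail servers fall back to the domain's address record), and DNS lookups can fail or be slow, so it may be wiser to treat a failed check as one signal among several rather than as proof of spam.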
Randomise the form's order
By randomising a form's order, it becomes impractical for someone to write a script that fills in the form based on field order.
This could easily be done by putting all the form elements in an array, using the PHP function shuffle to randomise the array, then looping over the array with a foreach to output each element.
For example, a form with name, email, comment, honeypots for name and email, and a hidden timestamp value would have 6 fields. That gives a total of 720 (6!) different orderings. Even if the comment field was always last there are still 120 (5!) different orderings.
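The shuffle-and-loop idea could be sketched like this. The function names and the shape of the $fields array (field name mapped to its HTML) are assumptions for the example; orderings() is only there to check the arithmetic.

```php
<?php
// Render the fields in a random order. $fields maps a field name to its HTML.
function render_shuffled(array $fields): string
{
    // shuffle() works in place and throws away the keys,
    // so shuffle the list of names rather than the array itself.
    $names = array_keys($fields);
    shuffle($names);

    $html = '';
    foreach ($names as $name) {
        $html .= $fields[$name] . "\n";
    }
    return $html;
}

// Number of possible orderings of $n fields (n factorial).
function orderings(int $n): int
{
    return $n <= 1 ? 1 : $n * orderings($n - 1);
}
```

To keep the comment field last, you would leave it out of $fields and append it after the shuffled output.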
Over the next few weeks I will be testing each of the individual methods on my own blog, then showing the results here. Then finally I will combine all methods, plus any others I discover along the way, to see how effective they are together.