Preventing Spam in Drupal

Spam is the bane of many a Drupal site. Organizations rely on Drupal webform data for lead generation and marketing, sales, user registration and onboarding, handling comments or reviews, and other business-critical functions. Spam floods these forms with useless or even malicious submissions and, in worst-case scenarios, can bring the site down with a fusillade of requests that overwhelm the server. What are the best ways to go about preventing it?

The many different spam prevention methods that exist can be placed in one of three broad categories:

Methods that examine the content
Methods that examine the source
Methods that detect human agency

We'll take a look at what they are, list some available spam prevention methods in each category, compare/contrast some of the pros and cons of each method, and then wrap up with a discussion on how to choose which method(s) to use for your particular situation.

Methods that examine the content

This one is pretty straightforward. Methods in this category check the content of a form submission to determine whether it is legitimate or spam.

The most common and straightforward example of this type is form validation, which ensures that the value submitted for a given form field is appropriate for that type of field. For example, validation can ensure that a phone field or email field contains valid input rather than bot-generated nonsense or a fake phone number or email address.

Custom form validation

Drupal's Webform module has a pretty solid set of default validation methods for the various fields it provides, but these occasionally let spam through. Text fields are a particular weak spot because, by default, generic text field validation doesn't care what sort of text is submitted.

For this, it may be necessary to implement custom form validation via a webform handler. This requires extra custom development time and testing. For example, any regex you write or library you use to validate a specific field must allow a broad enough range of expected user input, edge cases included, while still rejecting malicious input.
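To make the trade-off concrete, here is a minimal sketch of the kind of rules a custom validator for a "name" text field might apply. A real webform handler would be written in PHP inside Drupal; this Python version only illustrates the logic. The function name, patterns, length limit, and error messages are all assumptions for illustration, not part of the Webform module.

```python
import re

# Hypothetical rules for a "name" text field; tune them to your own forms.
URL_PATTERN = re.compile(r"(https?://|www\.)", re.IGNORECASE)
MARKUP_PATTERN = re.compile(r"<[^>]+>")
# Runs of letters (accented ones included) joined by spaces, apostrophes,
# periods, or hyphens, with an optional trailing period.
NAME_PATTERN = re.compile(r"[^\W\d_]+(?:[ '.\-]+[^\W\d_]+)*\.?")


def validate_name(value: str) -> list[str]:
    """Return a list of validation errors; an empty list means the value passed."""
    errors = []
    text = value.strip()
    if not text:
        return ["Name is required."]
    if len(text) > 100:
        errors.append("Name is too long.")
    if URL_PATTERN.search(text):
        errors.append("Name must not contain a URL.")
    if MARKUP_PATTERN.search(text):
        errors.append("Name must not contain markup.")
    if not NAME_PATTERN.fullmatch(text):
        errors.append("Name contains unexpected characters.")
    return errors
```

Note how the name pattern deliberately accepts hyphens, apostrophes, and accented letters. A stricter pattern would be simpler to write but would reject real names, which is exactly the edge-case testing cost described above.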

CleanTalk

CleanTalk, together with its associated Drupal module, is a third-party spam prevention option that examines content. It is passive and automatic but requires an external account and subscription, has a (small) yearly fee, and must be hooked up via a Drupal module.

Methods that examine the source

Rather than looking at behavior or the content of a form submission, methods that examine the source look at where spam is coming from, be it a geographic region, a particular user, or a specific Internet Protocol (IP) address.

User banning

If the spam is coming from a particular registered Drupal user or users, the remedy is to simply ban or block their Drupal user accounts.

IP address banning

For anonymous users who don't have a Drupal account, the most common approach is to deny access to anyone who attempts to reach the site from a given IP address. This can be done at the web host level, such as with Platform.sh HTTP access control, or at the application level via a module such as the core Ban module. You can determine the IP address of an offending user or users from the server logs or, if using the Webform module, from the Webform results admin screen.

A note of caution on IP banning: While this approach is useful for blocking mass amounts of spam incoming from a single IP or small set of IPs, it cannot be a substitute for other measures as a normal method of spam prevention. Playing manual whack-a-mole with dozens, hundreds, or even thousands of individual IPs is unsustainable.
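When the server logs are the starting point, a short script can show whether the spam really is concentrated in a handful of addresses before you commit to banning. The sketch below is a rough illustration that assumes the common/combined access log format and a hypothetical /form/ path prefix for webforms; the function name and sample lines are made up for the example.

```python
import re
from collections import Counter

# Assumes the common/combined access log format; adjust for your host's format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def top_form_posters(lines, path_prefix="/form/", limit=5):
    """Count POST requests per client IP for paths starting with path_prefix."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # Skip lines that don't fit the assumed format.
        if match["method"] == "POST" and match["path"].startswith(path_prefix):
            counts[match["ip"]] += 1
    return counts.most_common(limit)


# Made-up sample lines using documentation-reserved IP ranges.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "POST /form/contact HTTP/1.1" 200 512',
    '203.0.113.7 - - [10/Oct/2023:13:55:37 +0000] "POST /form/contact HTTP/1.1" 200 512',
    '198.51.100.4 - - [10/Oct/2023:13:56:01 +0000] "GET /form/contact HTTP/1.1" 200 2048',
    '198.51.100.4 - - [10/Oct/2023:13:56:09 +0000] "POST /form/contact HTTP/1.1" 200 512',
]

if __name__ == "__main__":
    for ip, count in top_form_posters(SAMPLE_LINES):
        print(ip, count)
```

If the counts are spread thinly across hundreds of addresses rather than piled onto a few, that is a sign IP banning will turn into the whack-a-mole described above.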

Geographic restrictions

You may instead find that spam is coming from a specific geographic region, country, or administrative area. This may require banning or restricting access via a service like Cloudflare, whose firewall rules can handle more specific source-based restrictions. If you do business with or provide content or services to the region in question, this might cast too wide a net and block legitimate users in that region from accessing your site.

Methods that detect human agency

The final and most complex bucket of spam prevention methods is those that detect human agency. These methods attempt to differentiate legitimate human behavior and site interaction from that of bots or scripts.

Antibot

Antibot is a Drupal module that detects whether a user agent has JavaScript enabled. If it does not, the module short-circuits the webform render process and the site does not render the form. It is lightweight, unobtrusive, and automatic, does not prevent page caching, and catches a wide range of bot-induced spam, which is what makes it our preferred default method for spam prevention.

Honeypot

The Honeypot module creates a hidden form field that, if populated, prevents form submission. A normal user will not be able to fill in this field, so it has a low rate of false positives. Enabling Honeypot on a form does, however, disable Drupal page cache for that page, which represents a significant performance hit for end users on subsequent page loads. As a result, Honeypot is best used as an elevated protection method for individual forms that are not embedded on more than one page.
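The hidden-field idea is simple enough to sketch in a few lines. This is not the Honeypot module's code (which is PHP and does more than this); it is a minimal Python illustration of the check, with a hypothetical field name and function name.

```python
# Hypothetical name for the hidden field; people never see it, bots often fill it.
HONEYPOT_FIELD = "url"


def check_submission(submission: dict) -> tuple[bool, dict]:
    """Return (accepted, cleaned_values) for a submitted form.

    The submission is rejected when the hidden field has any non-blank value.
    Accepted submissions are returned without the honeypot field.
    """
    trap_value = str(submission.get(HONEYPOT_FIELD, "")).strip()
    if trap_value:
        return False, {}
    cleaned = {key: value for key, value in submission.items() if key != HONEYPOT_FIELD}
    return True, cleaned
```

For example, `check_submission({"name": "Ada", "url": ""})` is accepted, while a bot that fills every field it finds, including `url`, is rejected.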

Captcha/reCaptcha

Google's Captcha/reCaptcha and its associated Drupal modules are a widely used third-party spam prevention measure. Captcha must be configured on a per-domain basis, which can present some issues with testing on feature branch dev/staging environments and represents a heightened level of manual work to set up and manage. Earlier versions (original and v2) require active input from the user and can present accessibility and UX issues or even an elevated level of false positives.

reCaptcha v3 uses a passive scoring system to assign a score from 0.0 to 1.0 based on user behavior (higher is more likely a human, lower is more likely a bot), preventing submission when the score falls below a set threshold, but this does require more complex setup.
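The threshold decision itself is a one-line comparison. In Google's reCAPTCHA v3, scores run from 0.0 (very likely a bot) to 1.0 (very likely a legitimate interaction). The sketch below assumes the score has already been retrieved from Google's verification response; the function name is hypothetical, and the 0.5 default is only a starting point to tune per form.

```python
def allow_submission(score: float, threshold: float = 0.5) -> bool:
    """Decide whether to accept a submission based on a reCAPTCHA v3 score.

    Scores range from 0.0 (very likely a bot) to 1.0 (very likely a human),
    so a submission is allowed only when the score meets the threshold.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return score >= threshold
```

A high-value form such as user registration might use a stricter threshold, for example `allow_submission(score, threshold=0.7)`, at the cost of turning away more borderline visitors.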

No version of Captcha/reCaptcha plays very well with Drupal page caching, as a cached reCaptcha auth key can cause a false-positive submission failure. As a result, just like with Honeypot, this is best used for individual forms that are not embedded on more than one page.

How do I decide what spam prevention method to use?

The first step to determining which spam prevention method to use is inspecting the spam for patterns. Determine the context of the spam you're seeing and tailor the solution from there.

Do the submissions all come from a single IP address, a small set of IP addresses, or a specific country? A method that examines the source may be the best place to start.

Do they tend to have legitimate-sounding personal information but include things like website URLs, arbitrary code, advertisements, email addresses, or phone numbers in text fields where they don't belong? A method that examines the content might be the ticket.

Are you seeing a lot of nonsense, bot-generated spam that comes from a wide range of sources? In that case, a spam prevention method that detects human agency is likely ideal.

We don't want to blindly or haphazardly add a bunch of different spam prevention methods. We want to be as unobtrusive as possible and present as little friction to a legitimate end user as we can. Many spam prevention methods come with substantial accessibility, UX, or technical downsides that need to be taken into account before implementing.

Hopefully this primer has been a helpful introduction to spam prevention in Drupal and helps you or your clients stamp out annoying spam before it can affect your business.


About the Author

Frank Holub

Frank is a web developer with experience in Drupal site building, project management, theme implementation, and WebOps management.
