To Block or Not Block: Good Bots vs Bad Bots: Full Guide
A bot, or more precisely an internet bot, is any software program that performs automated tasks over the internet without human intervention. In practice, bots are typically designed for relatively simple but repetitive tasks, since they can perform these tasks much faster than a human user ever could.
Bots can be used for malicious activities such as stealing sensitive data, scraping content, and launching DDoS attacks, so managing bot activity is necessary.
However, wouldn’t simply blocking all bots suffice? Why does bot management sound so complicated?
Blocking all bot activity isn’t always the best idea, for two main reasons:
First, there are good bots on the internet that can be really beneficial for your site and your business. Googlebot, for example, crawls and indexes your site so that Google can show it on its SERP (Search Engine Results Page). Obviously, we wouldn’t want to block Googlebot if we still want those valuable visitors coming from Google search.
Second, when you block a bad bot, its owner will know about it and might use what they learn from the block to update the bot into a more sophisticated version. In turn, this makes the bot much more difficult to manage than before.
So, what is the right approach to managing these bots? Read on and we’ll learn the answer.
Good Bots VS Bad Bots
Good bots, simply put, are any internet bots that are designed to help businesses and users.
We can typically recognize a good bot by three main criteria:
- It doesn’t hide the fact that it is a bot (i.e. it won’t masquerade as a human visitor).
- Its owner. Good bots are typically owned by legitimate, well-known companies (Google, Facebook, Microsoft, etc.). Even if the owner is not well known, it should be fairly easy to look up their information.
- Its objective/activity. If you can analyze what the bot does on your site, it should be fairly easy to tell whether it’s a good bot or a bad bot.
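The first two criteria can be checked together for a crawler that announces itself. The sketch below assumes the commonly documented approach for Googlebot: a reverse DNS lookup on the requesting IP, a check of the hostname's domain, and a forward lookup that must return the same IP. The function names and the injectable `reverse`/`forward` parameters are illustrative assumptions, and the domain suffixes should be checked against Google's current documentation before relying on them.

```python
import socket
from typing import Callable, List

# Domain suffixes commonly cited for Googlebot reverse DNS names (assumption:
# verify against Google's current documentation).
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the hostname that the IP address resolves to."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses that the hostname resolves to."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    user_agent: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Check a Googlebot claim with a reverse and then a forward DNS lookup."""
    if "googlebot" not in user_agent.lower():
        return False  # the request does not claim to be Googlebot at all
    try:
        hostname = reverse(ip)
        if not hostname.lower().endswith(GOOGLEBOT_SUFFIXES):
            return False  # the IP belongs to some other network
        return ip in forward(hostname)  # the name must map back to the same IP
    except OSError:
        return False  # a lookup failed, so the claim cannot be confirmed
```

A request that claims to be Googlebot in its User-Agent header but fails this check is hiding its real identity, which is itself a strong hint that it is a bad bot.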
As discussed above, Googlebot and the crawler bots of other search engines (Bingbot, etc.) are the most obvious examples of good bots. However, there are other examples of good bots, including but not limited to:
- Copyright bots: these bots crawl websites looking for copyrighted content and check whether the site violates copyright law. Typically operated by a company that owns copyrighted material.
- Chatbots: bots that imitate human conversation and can answer site visitors’ questions.
- Monitoring bots: bots that crawl websites to monitor website metrics (e.g. whether a site is down) and to perform analytics.
- Commercial bots: bots operated by commercial websites that crawl websites for information related to the owner’s business. For example, market research companies can use bots to monitor customer reviews.
While good bots are mostly harmless, they still consume your site’s resources, and although it is rare, there are cases where good bot activity can slow down your website. So, managing them is still fairly important, as we will discuss further below.
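One common way to manage well-behaved bots is a robots.txt file, which good bots are expected to read before crawling. The sketch below uses Python's standard `urllib.robotparser` to show how a compliant crawler would interpret such a file. The file contents, paths, and delay value are made up for illustration, and `Crawl-delay` is a non-standard directive that some crawlers honor and others ignore.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the paths and delay are made up.
ROBOTS_TXT = """\
User-agent: Bingbot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler would run checks like these before fetching a page.
print(parser.can_fetch("Bingbot", "https://example.com/products/1"))
print(parser.can_fetch("Bingbot", "https://example.com/search/?q=shoes"))
print(parser.crawl_delay("Bingbot"))
```

Keep in mind that robots.txt is only a request: good bots follow it voluntarily, while bad bots simply ignore it.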
Bad bots, on the other hand, are any internet bots that are utilized to achieve malicious objectives, such as:
- Performing DDoS (Distributed Denial of Service) attacks. Cybercriminals can use bots to infect computers and IoT devices with malware, turning these devices into zombies that form part of a botnet. The attacker then uses this botnet to attack websites and web servers, commonly by flooding the target with requests and traffic to slow it down or even shut it down completely (denial of service).
- Web scraping. The bot scans and extracts content from websites so that the content can be reused elsewhere. The scraped content can include unreleased product information, product prices, etc. Ticketing sites and price-sensitive businesses are vulnerable to scraper bots and should use bot mitigation and web scraping detection and protection tools like DataDome.
- Performing spammy behavior, such as sending spam/fraud emails, filling comment sections with spam containing fraudulent links, posting fake/biased product reviews, committing click fraud, and so on.
- Credential stuffing. Trying stolen credentials on many (possibly thousands of) websites to attempt account takeover.
And so on. What is listed above is just the tip of the iceberg, and the potential uses of bad bots are virtually limitless.
A key characteristic of bad bots is that they mask their identity and pose as legitimate human users. This makes them hard to detect and creates the dilemma of whether or not to block, because of two risks:
- We wouldn’t want to accidentally block the valuable human visitors, which may cause them to be angry and/or disappointed and leave your business
- We wouldn’t want to accidentally block good bots that are beneficial to our site
So, what’s the solution to these challenges? We will discuss it in the next section.
To Block or Not Block: Managing Bad Bots
With the increasing cybersecurity threats from bot activities, many companies have rapidly developed bot manager solutions in recent years.
The thing is, simply blocking the bot, even when we are 100% sure that it is malicious, is no longer the preferred solution. When bot operators know that they have been blocked, they can update the bot or switch to another IP address to try to bypass the bot manager. With every attempt, the operator also gains information about your bot management solution and can look for its weaknesses.
So, the objective here is to manage and mitigate the bot’s activity instead of trying to eliminate it outright, which is typically achieved with the following methods:
1. Honey Trap
The basic premise of this technique is to feed fake content or fake data to the bot depending on the bot’s objective. We typically use this technique against web scraper or content scraper bots.
With this technique, we keep the bot active and let it continue its activity within the website/app, but rather than serving it real content, we reply with fake content. For example, if it’s a price scraping bot, we can feed it wrong pricing values to fool it.
Another common technique is to redirect the bot to a mirror website/app where the content has been reduced/simplified, so the bot can’t access your original content. The bot then wastes its resources and might not realize that you’ve detected its presence.
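As a rough illustration of the fake-price idea, the sketch below returns a decoy price to visitors that have already been flagged as scrapers. The function names, the 15% maximum shift, and the `is_flagged_scraper` flag are assumptions made for this example; how a visitor gets flagged is outside its scope. The decoy is derived from the product ID so that repeated scrapes see the same value, which makes the fake data harder to spot.

```python
import hashlib
from decimal import Decimal


def decoy_price(real_price: Decimal, product_id: str, max_shift_pct: int = 15) -> Decimal:
    """Return a plausible but wrong price that stays stable for a given product."""
    # Derive the shift from the product ID so the same product always gets
    # the same decoy, no matter how often it is scraped.
    digest = hashlib.sha256(product_id.encode("utf-8")).digest()
    step = digest[0] % (2 * max_shift_pct)
    shift_pct = step - max_shift_pct
    if shift_pct >= 0:
        shift_pct += 1  # skip 0 so the shift is never zero percent
    factor = Decimal(100 + shift_pct) / Decimal(100)
    return (real_price * factor).quantize(Decimal("0.01"))


def price_for(real_price: Decimal, product_id: str, is_flagged_scraper: bool) -> Decimal:
    """Serve the real price to normal visitors and a decoy to flagged scrapers."""
    if is_flagged_scraper:
        return decoy_price(real_price, product_id)
    return real_price
```

Because a false positive here means showing a wrong price to a real customer, a setup like this would normally be reserved for traffic that has been identified as a scraper with high confidence.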
2. Throttling
Another common technique is to throttle, i.e. intentionally reduce, the bandwidth allocated to the bot in order to significantly slow down its activity. This technique is especially useful when the bot is persistently attacking your site/app, for example in a brute force attack. Throttling also limits the damage of false positives: a legitimate user who is mistaken for a bot can still access the website, only more slowly.
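One simple way to approximate throttling is to delay responses rather than cap bytes per second. The sketch below computes an artificial delay that grows with the number of requests a client has made within a sliding time window. The class name, parameters, and default values are assumptions chosen for illustration, not settings taken from any particular bot management product.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class BotThrottle:
    """Compute an artificial delay that grows with a client's request rate."""

    def __init__(
        self,
        window_seconds: float = 60.0,
        free_requests: int = 5,
        delay_step: float = 0.5,
        max_delay: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self.free_requests = free_requests
        self.delay_step = delay_step
        self.max_delay = max_delay
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def delay_for(self, client_id: str) -> float:
        """Record one request and return how long to wait before answering it."""
        now = self.clock()
        hits = self._hits[client_id]
        hits.append(now)
        # Forget requests that have left the sliding window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        excess = len(hits) - self.free_requests
        if excess <= 0:
            return 0.0  # within the free allowance, no slowdown
        return min(excess * self.delay_step, self.max_delay)
```

A request handler could call `time.sleep(throttle.delay_for(client_ip))` before responding. An asynchronous server would use a non-blocking sleep instead, so that slowing down one bot does not tie up a worker.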
3. CAPTCHA/Challenge-Based Mitigation
CAPTCHA is a Turing test designed to be very difficult for bots to solve while being relatively easy for humans, and for years it has been the staple approach to managing bot activity. However, there are two reasons why CAPTCHA is no longer a one-size-fits-all bot management approach:
- CAPTCHA hurts the site’s user experience. Ask visitors to solve too many CAPTCHAs and they will get annoyed.
- Various CAPTCHA farm services allow attackers to have real humans solve the CAPTCHA and then pass access back to the bot, rendering CAPTCHA ineffective.
However, we can still use a challenge-based approach to mitigate bots, for example by implementing low-friction challenges such as asking the user to type certain data into mandatory form fields or to move the mouse a certain way, which can be very difficult for bots to complete.
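A server-side check for this kind of challenge might look like the sketch below. It assumes the form shows the visitor a short code to type into a mandatory field, carries a signature of that code in a hidden field, and reports how many pointer-movement events the page observed. All names and thresholds are hypothetical. A real deployment would also need to expire codes and prevent replay, and client-reported signals such as pointer events can be forged by a sophisticated bot.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Per-process secret used to sign challenge codes (hypothetical setup; a real
# deployment would load a persistent key from configuration).
SECRET_KEY = secrets.token_bytes(32)


def _sign(code: str) -> str:
    """Return a hex signature for the challenge code."""
    return hmac.new(SECRET_KEY, code.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_challenge() -> Tuple[str, str]:
    """Create a code for the visitor to type and a signature to verify it later."""
    code = secrets.token_hex(3)  # six hexadecimal characters
    return code, _sign(code)


def passes_challenge(
    typed_code: str,
    signature: str,
    pointer_events: int,
    min_pointer_events: int = 3,
) -> bool:
    """Accept the form only if the code matches and some pointer movement was seen."""
    expected = _sign(typed_code.strip().lower())
    # Constant-time comparison of the signature bytes.
    code_ok = hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
    moved = pointer_events >= min_pointer_events
    return code_ok and moved
```

Signing the code keeps the check stateless, since the server does not have to remember which code it issued to which visitor.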
Since blocking is no longer effective in stopping bot activities, and at the same time bots are getting more sophisticated in mimicking human behaviors, a proper bot management solution like DataDome can help you:
- Differentiate between malicious bots and legitimate human visitors with low false positives
- Identify the source of all bot traffic and its reputation
- Analyze each bot’s behavior and make decisions on how to manage the bot’s activities depending on its behavior
- Allow beneficial good bots to access the website/app
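The decision step in the list above can be pictured as a small policy function. The sketch below is only an illustration of the idea: the `Visitor` fields, the 0-to-1 `bot_score` scale, and the two thresholds are assumptions, not the behavior of DataDome or any other product.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"
    FEED_DECOY = "feed_decoy"
    CHALLENGE = "challenge"


@dataclass(frozen=True)
class Visitor:
    is_verified_good_bot: bool  # e.g. a search crawler whose identity checked out
    bot_score: float            # hypothetical scale: 0.0 = human, 1.0 = automated
    is_scraping: bool           # behavior looks like content or price scraping
    is_brute_forcing: bool      # behavior looks like repeated login attempts


def choose_action(v: Visitor, challenge_at: float = 0.5, act_at: float = 0.9) -> Action:
    """Pick a mitigation based on who the visitor is and what it is doing."""
    if v.is_verified_good_bot:
        return Action.ALLOW
    if v.bot_score < challenge_at:
        return Action.ALLOW          # most likely a human visitor
    if v.bot_score < act_at:
        return Action.CHALLENGE      # uncertain, so ask for proof instead of punishing
    if v.is_brute_forcing:
        return Action.THROTTLE       # slow a persistent attacker down
    if v.is_scraping:
        return Action.FEED_DECOY     # keep the scraper busy with fake content
    return Action.CHALLENGE
```

Note that none of the outcomes is an outright block, in line with the goal of managing bot activity without telling the operator that the bot has been detected.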
With both good and bad bots now accounting for almost 40% of all internet traffic, management of bots cannot be approached as an isolated activity, but rather must be integrated with the whole web and app ecosystem to allow optimal management of all traffic.
While blocking the bot or the bot’s source might seem like the most cost-effective and efficient approach, a persistent bot operator may find a block easy to identify and bypass, and the block itself might give the attacker more information on how to better attack your site. In turn, this can lead to the attacker updating the bot code immediately, allowing the bot to evolve into a version that is much more challenging to detect and mitigate in the future.