As seen on The Future of Commerce Blog:
…and the troublesome ones in between
Web bots have been around for a long time and we all benefit from many of them. There are good bots (like Googlebot or Bingbot), and there are bad bots that automatically attempt to hack a web application or inject spam into websites. The good ones are generally beneficial, and the bad ones can often be dealt with by a solution such as a Web Application Firewall (WAF) or enterprise bot manager that will recognize malicious requests and block them.
The problematic bots are often those that sit between good and bad. These can be hard to detect, as they will often impersonate a normal user and make requests that, on their own and in isolation, are perfectly safe, legitimate, and seemingly harmless.
Although their intention is normally something other than a DDoS attack, the effect can sometimes be the same when they are too aggressive or when too many instances of a bot hit a website at once.
These bots are used commercially for a number of reasons including:
- Automatic purchasing of products (aggressive purchase bots can cause severe performance issues during product launches)
- Aggregation of content (your content can be passed off as someone else’s)
- Competitor price analysis (competitors can use this data to undercut you)
- Aggressive content crawling (aggressive crawlers can put strain on your web platform)
Real-world example of a commercial bot causing a lot of issues
We have a client that often sells limited edition products which are very sought after. These products can often fetch 3 times the RRP when sold on eBay, and the retailer will only have a limited supply to sell. Most of these products have a coordinated worldwide launch and therefore the exact time of the launch is well known.
Over the last few years, we have increasingly seen extremely aggressive bots, deployed in their thousands, attempting to purchase these products, to the point where the performance of the e-commerce platform can be seriously compromised.
In this instance, the bots have been specifically designed for this retailer’s website, and know the exact requests that need to be made to add the product to the basket and go through the checkout. They don’t even need to visit the product display page. They are normally distributed across multiple cloud servers with multiple instances of the bot installed on each server. Because the launch time is public and coordinated, the bots all start to attempt to add the product to the basket and go through the checkout at the exact same time, normally many thousands at once.
The record we have seen is 3 million attempts to purchase a single product in a 12 hour period.
Because the requests are all legitimate and the bot is impersonating a real user, it can be hard to block the bots quickly enough, before they do the damage, without also blocking real users. There is no point in waiting 1 minute to record how many requests a particular IP has made and then blocking it if the number is over a certain threshold. By that point, the damage has already been done and you have tens of thousands of bots simultaneously in your checkout.
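One alternative to counting requests over a fixed window is to make the decision on every request, for example with a token bucket. The following is a minimal sketch only: the class names, the capacity and the refill rate are all assumptions for illustration, and it is not the approach used for the client described here. It also keys on a single client identifier, so on its own it would not stop a bot that is spread across many IPs.

```python
import time


class TokenBucket:
    """One bucket per client: each request spends a token, tokens refill over time."""

    def __init__(self, capacity, refill_per_second, now):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start full so a normal user is not penalised
        self.last_seen = now

    def allow(self, now):
        # Top the bucket up for the time that has passed, never above capacity.
        elapsed = max(0.0, now - self.last_seen)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ClientLimiter:
    """Hypothetical helper that keeps a bucket per client key (e.g. an IP address)."""

    def __init__(self, capacity=5, refill_per_second=1.0):
        # Assumed defaults; real values would need tuning against genuine traffic.
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.buckets = {}

    def allow(self, client_key, now=None):
        if now is None:
            now = time.monotonic()
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, now)
            self.buckets[client_key] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = ClientLimiter(capacity=3, refill_per_second=1.0)
    # Five requests arriving at the same instant from one client.
    print([limiter.allow("client-a", now=100.0) for _ in range(5)])
```

With these assumed settings, the fourth request in a burst is refused immediately rather than after a counting window has elapsed.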
The bots also disadvantage real users, as you can guarantee that the bots will be first in the queue to get the products, as they are timed to start purchasing the second the products go live. Although the retailer obviously still gets the sale, they can lose brand loyalty because of this since real and loyal customers will always lose out.
We have spent the last few years fighting bots on behalf of this customer, with varying degrees of success. Every time we implement a new layer of protection, we keep them at bay for a short while, but the bot developers then adapt their software to bypass that protection. Licences to use these bots can sell for quite a lot of money, so there is a big incentive for the developers to adapt to get around the defences.
So how do you manage good bots and bad bots?
Many organizations, such as CDNs, have been rapidly developing bot management solutions over the last year in response to the increasing problems with bots that retailers are facing. Some, such as Akamai’s bot manager solution, can be very sophisticated in the way that they attempt to identify a bot, as well as in the options they give the retailer for dealing with the bot. Tools like this will constantly adapt and learn to keep up with the bots, which are always evolving in an attempt to evade detection.
Simply blocking the bot is not always the answer. If the bot operators know they have been blocked, they can just jump to another IP or evolve the bot in order to fool the bot manager. A better solution can be to fool the bot by showing it the wrong content (maybe higher prices, in the case of a bot used to analyze competitors’ prices) or to just slow it down. Slowing a bot down is also a useful technique for bots that are only harmful because they are too aggressive in their crawling. You don’t want to block them altogether, but you do want to slow them down a little to reduce the impact on your infrastructure.
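The idea of slowing a bot down rather than blocking it can be sketched as a function that maps a bot-likelihood score to an artificial response delay. This is an illustration only: the `bot_score` input, the threshold and the maximum delay are hypothetical, and the way a real bot manager scores or throttles traffic is not described here.

```python
def response_delay_seconds(bot_score, threshold=0.5, max_delay=5.0):
    """Return an artificial delay for a request, given a bot score in [0, 1].

    Requests scoring below the threshold are served with no added delay.
    Above it, the delay grows linearly up to max_delay for a score of 1.0.
    All three parameters are assumed values for illustration.
    """
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError("bot_score must be between 0 and 1")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be at least 0 and below 1")
    if bot_score < threshold:
        return 0.0
    scale = (bot_score - threshold) / (1.0 - threshold)
    return scale * max_delay


if __name__ == "__main__":
    for score in (0.2, 0.75, 1.0):
        print(score, response_delay_seconds(score))
```

A graduated delay like this means a suspected crawler is never told outright that it has been detected, which gives its operator less of a signal to react to than a hard block.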
Although a bot manager solution is certainly a useful tool, it is unlikely to identify and stop all bots and, in the real-world instance detailed above, by the time it would possibly identify the user as a bot, it may be too late as the damage would already be done. Bots will constantly adapt and evolve to stop bot managers blocking them and so it is a moving target.
The solution to effectively managing these bots is multi-faceted. There is no single solution that will catch everything and give you all of the control you need. Different services and solutions will give protection in different areas against different types of bots. Only by deploying multiple defences and solutions can you effectively manage these bots.
4 areas to consider when building a bot management strategy
A CDN can be a first line of defence against malicious or troublesome traffic. The ideal CDN configuration ensures that all requests to your web application, whether cacheable or not, are filtered through the CDN. You can then use tools that the CDN will provide such as a WAF, bot manager or even some basic rate limiting rules to protect your website against the most obvious bots.
Many retailers have a WAF layer sitting between their CDN and their hosting infrastructure. A high-quality WAF, such as Imperva WAF, can be used to automatically detect and block malicious requests such as those made by many bad bots. Additionally, custom rules can be added to recognize and block or limit those bots that are not malicious but can be troublesome.
Application caching layer
Implementing a tool such as Varnish that sits between your firewall and your web application can not only improve speed and performance, but can also be used to limit the impact of aggressive bots. A number of Varnish modules (VMODs) are available that can be used to effectively limit the rate of requests being made to specific URLs.
Changes can be made to your application to protect it from aggressive or troublesome bots. Examples include using simple tools like Google reCAPTCHA at relevant times, limiting the number of users who can add a specific product to their basket at any one time, or even introducing initiatives such as a raffle for exclusive and limited edition products so that they cannot be purchased in the conventional way. Each of these will help to prevent the bots from being successful.
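As an illustration of limiting how many users can hold a specific product in their basket at once, here is a minimal in-memory sketch. The class and method names are hypothetical, and it assumes a single application process; a real e-commerce platform would more likely need a shared store and an expiry on each reservation.

```python
import threading


class BasketReservations:
    """Caps how many distinct users may hold a given product in their basket."""

    def __init__(self, max_holders_per_product):
        self.max_holders = max_holders_per_product
        self.holders = {}  # product_id -> set of user_ids currently holding it
        self.lock = threading.Lock()  # guards the dict against concurrent requests

    def try_reserve(self, product_id, user_id):
        """Return True if the user may add the product, False if the cap is reached."""
        with self.lock:
            users = self.holders.setdefault(product_id, set())
            if user_id in users:
                return True  # already holding it, so adding again is harmless
            if len(users) >= self.max_holders:
                return False
            users.add(user_id)
            return True

    def release(self, product_id, user_id):
        """Free a slot when the user checks out, empties the basket or times out."""
        with self.lock:
            self.holders.get(product_id, set()).discard(user_id)


if __name__ == "__main__":
    reservations = BasketReservations(max_holders_per_product=2)
    print(reservations.try_reserve("limited-sku", "user-1"))
    print(reservations.try_reserve("limited-sku", "user-2"))
    print(reservations.try_reserve("limited-sku", "user-3"))
```

The cap bounds the load on the checkout regardless of how many bots arrive at launch time, although it does not by itself decide whether the slots go to bots or to real customers.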
It is important to consider implementing some or all of the solutions above rather than relying on just one of them, as each will provide defence against these bots in a slightly different way. For example, if you relied solely on an application change to prevent purchasing bots, they would still be hammering the rest of your infrastructure and could even cause issues such as filling Apache or Varnish log files to the extent that your server runs out of disk space.
Good bot vs bad bot: Don’t ignore the signs
In summary, bots are becoming an increasing commercial threat to e-commerce retailers and dealing with them effectively can be very complex. We have a customer that very recently implemented an enterprise bot manager and they have found that over 65% of their web traffic comes from bots. The bot manager filters every request made to the website and uses various complex techniques and algorithms to identify and block requests that it deems to be from a bot. We knew that they were being hit hard by bots at times but that number took us a bit by surprise.
If you consider this number and the amount of bandwidth and server capacity that is required to serve this traffic and the fact that around 75% of that bot traffic is from ‘bad’ or malicious bots, it is not something that any retailer should ignore.
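To put those two figures together, the short calculation below combines them into a share of total traffic. It treats "over 65%" and "around 75%" as exactly 65 and 75, which is an assumption, so the result is only a rough estimate.

```python
def bad_bot_share_of_all_traffic(bot_percent, bad_percent_of_bots):
    """Percentage of ALL traffic that comes from bad bots.

    bot_percent: share of total traffic that is bot traffic (0-100).
    bad_percent_of_bots: share of that bot traffic that is bad (0-100).
    """
    return bot_percent * bad_percent_of_bots / 100


if __name__ == "__main__":
    share = bad_bot_share_of_all_traffic(65, 75)
    print(f"{share:.2f}% of all requests")  # prints "48.75% of all requests"
```

On those assumed figures, roughly half of everything the platform serves for this customer would be going to bad or malicious bots.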
Director of CX Consulting
Branwell Moffat is the Director of CX Consulting at KPS Digital, an award-winning SAP partner and SAP CX SI in London, UK. He’s a highly technical, strategic and business-focused e-commerce consultant and business leader with over 20 years’ experience helping companies grow their digital businesses to levels of individual revenues in excess of $500 million per year.
During this time he has been the co-founder / manager of Envoy Digital, a successful digital and e-commerce agency and Gold SAP and Hybris Partner based in SW London, UK which was acquired by KPS in early 2018.
His career has been spent consulting on, architecting and sponsoring the development of a large number of enterprise e-commerce solutions for a range of global brands, online and high street retailers, Premier League football clubs, financial organisations as well as a number of other vertical industries.
This experience has given him a unique understanding of not only the commercial and strategic aspects of growing an omni-commerce business, but also the technical, tactical and practical aspects of doing so. His experience encompasses everything within the sphere of omni-commerce from user experience through to supply chain and ERP.
Branwell is often asked to talk on the subject of customer experience and, as a thought-leader, looks to write articles that not only get people thinking but also contain real and practical advice.