Crawler Detection
Crawler detection is available only for customers with non-legacy plans (introduced in April 2023) and requires PHP Extension 5.14+ and Daemon 1.9.12+.
Crawlers can significantly affect the performance of your site. When analyzing performance, it therefore helps to know whether a trace in Tideways was initiated by a real user in a browser or by a crawler.
Tideways can automatically detect whether a request was made by a crawler and, if so, tags the trace with crawler. You can then filter for traces with the crawler tag.
To enable crawler detection you must configure it in the PHP extension through php.ini or tideways.ini.
tideways.features.crawler_detection=1
If your application already detects crawlers for other purposes, you don’t need to enable Tideways crawler detection.
Please note that crawler detection is not guaranteed to work 100% of the time, and malicious actors can work around it.
For example, when using the PHP packages jaybizzle/crawler-detect or matomo/device-detector, you can programmatically mark the request as coming from a crawler by setting the crawler tag on the request:
<?php
use Jaybizzle\CrawlerDetect\CrawlerDetect;

$crawlerDetect = new CrawlerDetect();

if ($crawlerDetect->isCrawler($_SERVER['HTTP_USER_AGENT'] ?? '')) {
    if (class_exists('Tideways\Profiler')) {
        \Tideways\Profiler::setTags(['crawler']);
    }
}
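The same approach can be sketched for matomo/device-detector. This is a minimal example rather than an official integration: it assumes the package is installed via Composer and autoloaded, and the helper name tagCrawlerWithDeviceDetector is hypothetical. It returns whether the request was classified as a bot, and does nothing if the package is not available.

```php
<?php
use DeviceDetector\DeviceDetector;

/**
 * Hypothetical helper: tags the current Tideways trace as "crawler"
 * when matomo/device-detector classifies the user agent as a bot.
 */
function tagCrawlerWithDeviceDetector(string $userAgent): bool
{
    // Assumption: the package is autoloadable; bail out quietly if it is not.
    if (!class_exists(DeviceDetector::class)) {
        return false;
    }

    $detector = new DeviceDetector($userAgent);
    $detector->parse();

    if (!$detector->isBot()) {
        return false;
    }

    // Only tag the trace when the Tideways extension is loaded.
    if (class_exists('Tideways\Profiler')) {
        \Tideways\Profiler::setTags(['crawler']);
    }

    return true;
}
```

You would call it early in the request, for example with tagCrawlerWithDeviceDetector($_SERVER['HTTP_USER_AGENT'] ?? '').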
Similarly, if you want to mark AI bots as crawlers in Tideways and are using VolkswAIgen, you can do so as follows:
<?php
// $psr6CachePoolImplementation, $userAgent and $ipAddress must be
// provided by your application before this code runs.
$volkswaigen = new \VolkswAIgen\VolkswAIgen\Main(
    new \VolkswAIgen\VolkswAIgen\ListFetcher(
        $psr6CachePoolImplementation
    )
);

if ($volkswaigen->isAiBot($userAgent, $ipAddress)) {
    if (class_exists('Tideways\Profiler')) {
        \Tideways\Profiler::setTags(['crawler']);
    }
}
What database of crawlers is used for detection?
Tideways uses the Go library x-way/crawlerdetect, which in turn uses the patterns defined in JayBizzle/Crawler-Detect.
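To give a rough idea of what pattern-based detection means, the following sketch matches a user agent string against a short list of patterns. It is not the code Tideways or either library uses. The three patterns are illustrative stand-ins for the much longer upstream list, they are treated as literal strings here for simplicity, and the function name looksLikeCrawler is hypothetical.

```php
<?php
// Illustrative patterns only; the list in JayBizzle/Crawler-Detect is far longer.
const EXAMPLE_CRAWLER_PATTERNS = ['Googlebot', 'bingbot', 'AhrefsBot'];

/**
 * Hypothetical helper: returns true if the user agent contains
 * one of the example patterns (case-insensitive).
 */
function looksLikeCrawler(string $userAgent): bool
{
    if ($userAgent === '') {
        return false;
    }

    // Escape each pattern and combine them into one alternation.
    $alternatives = array_map(
        static fn (string $pattern): string => preg_quote($pattern, '/'),
        EXAMPLE_CRAWLER_PATTERNS
    );
    $regex = '/(' . implode('|', $alternatives) . ')/i';

    return preg_match($regex, $userAgent) === 1;
}
```

Because matching of this kind relies only on the User-Agent header, a client that sends a browser-like user agent is not recognized, which is one reason detection cannot be guaranteed.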