Sampling

Tideways uses sampling to keep the overhead of production profiling small and controllable.

You can influence the amount of tracing data collected with the sample rate configuration. It is a percentage value used to decide at random which PHP requests are fully traced by Tideways and which are not. A fully traced request collects detailed performance data that is visible in the Timeline and Callgraph Profiler.

When Tideways is enabled, it runs on every request. The sample rate then determines whether a full trace is collected, including calls to frameworks, SQL, caches and other services via HTTP. When no full trace is collected, Tideways still reports the response time of the request and watches for Exceptions and Fatal Errors. This gives you a full picture of your project, including the full distribution of response times for every endpoint/transaction.

By default, the sample rate for full traces is 25% of all PHP requests, and Tideways starts automatically on web requests.
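The random decision described above can be illustrated with a minimal sketch. It is not the extension's implementation; the function names `collects_full_trace` and `simulate` are hypothetical, and the only values taken from this page are the percentage semantics and the 25% default.

```python
import random

DEFAULT_SAMPLE_RATE = 25  # percent, the default stated on this page


def collects_full_trace(sample_rate: float, rng: random.Random) -> bool:
    """Decide at random whether one request gets a full trace."""
    # rng.random() is uniform in [0.0, 1.0); scale it to a percentage.
    return rng.random() * 100 < sample_rate


def simulate(requests: int, sample_rate: float = DEFAULT_SAMPLE_RATE, seed: int = 1) -> dict:
    """Count how many of `requests` simulated requests are fully traced."""
    rng = random.Random(seed)
    traced = sum(collects_full_trace(sample_rate, rng) for _ in range(requests))
    # Requests without a full trace still report basic monitoring data.
    return {"full_trace": traced, "basic_only": requests - traced}
```

With the default rate, roughly a quarter of the simulated requests end up in `full_trace`, while every request is still counted for monitoring.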

When the daemon receives the data from the PHP extension, it re-samples the traces to keep only interesting ones, up to a maximum number of traces per minute that depends on your plan.

Configuring Sampling

The following PHP INI configuration variables can affect how sampling works:

  • tideways.sample_rate can be modified to collect more tracing data to be sent to the local daemon.

  • tideways.auto_start defaults to true and can be set to false or 0 to avoid starting Tideways when a web request is run. This allows you to start Tideways programmatically using the Tideways\Profiler::start() API.

  • tideways.traces_only_keep_minimum_ms defaults to 0 (meaning all traces are kept) and can be used to automatically discard detailed traces for fast requests that you don’t want to keep traces for. Combined with a high tideways.sample_rate, it keeps only the traces of slow requests. Traces that are discarded because of this setting are still recorded for the monitoring statistics.
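The interaction between the sample rate and tideways.traces_only_keep_minimum_ms can be modelled as a simple filter. This is a sketch under assumptions: the `Request` class and `keeps_detailed_trace` function are hypothetical, and treating a request exactly at the threshold as kept is a guess, not documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    sampled: bool  # outcome of the random sample-rate decision


def keeps_detailed_trace(request: Request, keep_minimum_ms: float = 0) -> bool:
    """Hypothetical model of tideways.traces_only_keep_minimum_ms."""
    if not request.sampled:
        # No full trace was collected, so there is nothing to keep.
        return False
    # Assumption: a request exactly at the threshold is kept.
    # With the default of 0, every sampled trace passes this check.
    return request.duration_ms >= keep_minimum_ms
```

In this model, a high sample rate marks most requests as sampled, and the threshold then discards the detailed traces of the fast ones. Discarded requests would still count towards the monitoring statistics.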

Two further configuration variables control what Tideways does depending on the outcome of the random sampling decision. Each of them is set to one of the following modes:

  • "disabled" for not enabling Tideways.

  • "basic" for basic monitoring data such as latency and memory of request, transaction name and error yes/no.

  • "tracing" for enabling the timeline profiling.

  • "full" for enabling timeline and callgraph profiling.

The INI configuration variables are:

  • tideways.monitor sets the monitoring mode. It defaults to basic and can be changed to disabled.

  • tideways.collect decides what profiling data is collected. It defaults to tracing and can be changed to full.
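One way to read the two settings together is sketched below. It assumes that a sampled request uses the tideways.collect mode and any other request uses the tideways.monitor mode, and that each mode includes the data of the lighter modes. Both are interpretations of this page, and the names `COLLECTED`, `effective_mode` and `collected_data` are hypothetical.

```python
# Data collected per mode; the nesting of modes is an assumption.
COLLECTED = {
    "disabled": frozenset(),
    "basic": frozenset({"monitoring"}),
    "tracing": frozenset({"monitoring", "timeline"}),
    "full": frozenset({"monitoring", "timeline", "callgraph"}),
}


def effective_mode(sampled: bool, monitor: str = "basic", collect: str = "tracing") -> str:
    """Pick the mode for one request from the sampling outcome."""
    for mode in (monitor, collect):
        if mode not in COLLECTED:
            raise ValueError(f"unknown mode: {mode}")
    # Assumption: sampled requests use tideways.collect,
    # all other requests fall back to tideways.monitor.
    return collect if sampled else monitor


def collected_data(sampled: bool, monitor: str = "basic", collect: str = "tracing") -> frozenset:
    """Return the kinds of data gathered for one request."""
    return COLLECTED[effective_mode(sampled, monitor, collect)]
```

With the defaults, a sampled request yields monitoring data plus a timeline, and an unsampled request yields monitoring data only.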

Enabling tideways.collect=full increases the visibility into your project by always keeping callgraph traces, but it usually increases the overhead of each request by 50% or more. In certain cases we have seen overhead of 800% and more when the Callgraph Profiler was activated, because the code executed several usually quick PHP internal functions over 500,000 times in the request. Your mileage may vary, but the full mode is not recommended for permanent use in production. In addition, callgraph traces are usually 3 to 4 times larger than timeline traces, and Tideways may automatically reduce the number of traces collected per minute to adjust for this.

Resampling in the Daemon

Each daemon running on your project's servers re-samples traces based on the following criteria:

  • Servers (daemons) are assigned a traces/minute quota based on the project's license and size. This process is called server balancing and happens at least every 15 minutes.

  • For servers with tideways.collect=full set, the number of traces is reduced to account for the additional overhead of storing and processing a callgraph for each trace.

  • If a tracepoint is active, then traces for this transaction are prioritized up to the traces/minute quota of the server.

  • For every minute window, the daemon keeps 2 traces in local memory for each transaction with full visibility: one with a slow and one with an average response time. At the end of the minute it ranks the transactions by how long ago they last sent a trace to the backend, and sends the traces of the least recently traced transactions first, up to the traces/minute quota.

  • If there is still traces/minute quota left after the previous step, then the slowest traces for transactions with limited visibility get sent.

We call this strategy "prefer least recently traced" and it is used for Tideways 5 and 6 plans.
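The last two criteria can be sketched as a selection function. This is a simplified model, not the daemon's code: tracepoint prioritisation and server balancing are left out, and the names `Trace` and `select_traces` as well as the data shapes are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Trace(NamedTuple):
    transaction: str
    duration_ms: float


def select_traces(
    kept: Dict[str, List[Trace]],   # up to 2 traces per full-visibility transaction
    limited: List[Trace],           # traces of limited-visibility transactions
    last_sent: Dict[str, float],    # transaction -> time a trace was last sent
    quota: int,                     # traces/minute quota of this server
) -> List[Trace]:
    """Pick the traces to send at the end of one minute window."""
    selected: List[Trace] = []
    # Least recently traced first; never-traced transactions sort to the front.
    order = sorted(kept, key=lambda name: last_sent.get(name, float("-inf")))
    for name in order:
        for trace in kept[name]:
            if len(selected) == quota:
                return selected
            selected.append(trace)
    # Leftover quota goes to the slowest limited-visibility traces.
    remaining = quota - len(selected)
    slowest = sorted(limited, key=lambda t: t.duration_ms, reverse=True)
    selected.extend(slowest[:remaining])
    return selected
```

Sorting by the last-sent time gives rarely traced transactions a chance to appear, even when a few busy endpoints would otherwise use up the whole quota.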

Downsampling

Timeline traces and callgraph traces differ roughly 2-4x in terms of storage requirements. When a project collects an excessive amount of callgraphs, Tideways needs to downsample the project by a factor between 2 and 4.

  • Downsampling only applies to projects that collect more than 1000 callgraphs per day.

  • When >90% of all traces contain a callgraph, then the downsampling factor is 4.

  • When >70% of all traces contain a callgraph, then the downsampling factor is 3.

  • When >50% of all traces contain a callgraph, then the downsampling factor is 2.
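The thresholds above translate into a small lookup. This is a sketch for illustration: the function name `downsampling_factor` is hypothetical, and returning 1 to mean "no downsampling" is a convention chosen here, not a value from Tideways.

```python
def downsampling_factor(callgraphs_per_day: int, callgraph_share: float) -> int:
    """Return the downsampling factor for a project.

    callgraph_share is the fraction (0.0 to 1.0) of all traces
    that contain a callgraph.
    """
    # Downsampling only applies above 1000 callgraphs per day.
    if callgraphs_per_day <= 1000:
        return 1
    # Check the highest threshold first so the largest factor wins.
    if callgraph_share > 0.9:
        return 4
    if callgraph_share > 0.7:
        return 3
    if callgraph_share > 0.5:
        return 2
    return 1
```

For example, a project with 5000 callgraphs per day where 80% of traces contain a callgraph falls into the second bracket and is downsampled by a factor of 3.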

To avoid getting downsampled, the following levers may need adjustment:

  • The tideways.collect=full INI setting is not recommended for production use, only for short-term collection of all traces.

  • Long running tracepoints that collect callgraphs for the whole period can cause excessive callgraph collection.

  • Programmatic functionality like Tideways\Profiler::enableCallgraphProfiler() that is called without a programmatic sampling mechanism.
