No commit activity in last 3 years
No release in over 3 years
A Ruby C extension that enables defining static tracepoints from within a ruby context.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Development

 Project Readme

Build Status

Docker Repository on Quay

ponycorn

USDT Report

This gem is described in greater detail in this USDT report, and has rdoc for master and the latest release , as well as code coverage from last master build

Tracing ruby in Development and Production

Add tracepoints for any question you need answered about your code.

Until you enable them, these tracepoints will have 0 overhead, and after you enable them, these tracepoints will have almost no overhead.

No need to continuously log print statements, as output is only generated when tracepoints are actually being traced.

Write and test tracepoints on any Mac or Linux workstation, and have them be accessible to you in production in the exact same way.

This should be useful for:

  • Generating latency histograms for methods, code blocks, or entire libraries.
  • Comparing performance characteristics in development versus production.
  • Tracing the wall-lock time of calls to external services.
  • Diagnostic prints (the world's fanciest printf).
  • Collecting stack traces with surgical precision.
  • Exposing other ruby VM characteristics, such as for runtime heap analysis.

Both Darwin/OSX and Linux are supported.

There is a Development guide to show how to set up a dev env and test the gem and try out the linux examples.

Goals

Easy

Make registering static tracepoints in ruby as simple as possible with the lowest overhead possible.

Ideally, you shouldn't have to know anything about static tracing to add probes to your library.

We want to cater to frameworks like rails, to provide tracing patterns that will work well for both simple and advanced use cases.

Fast

Registering lots of tracepoints should be encouraged. To support this, the overhead of unused tracepoints should be:

  1. Minimized or eliminated if tracing is not enabled
  2. Measured with runtime complexity understood
  3. Simple, avoiding loops O(n) and using as many constant-time O(1) operations, such as (hopefully) table lookups

The approaches that eBPF itself takes to achieve this are:

  • Using a verifier before running, to perform static analysis on code. For this reason, loops are only allowed if fully unrolled.
  • To cause no overhead on a tracepoint if it is not being traced

Dtrace is able to overwrite the memory of the target process when a probe is enabled, and only then is assembly related to the tracing added to processing overhead.

The approach that eBPF on linux takes is similar, using uprobes to inform a process that it has a userspace probe enabled.

// TO DO add state diagrams for enabled/disabled flows of dtrace vs bpftrace, and how this maps to ruby

Powerful

Ideally, the values being probed should already be present in the execution context. Any code within the context of a probe will need to be executed each time it is fired.

If additional context needs to be gathered, this is possible through the use of ruby code. For instance, a code block can be passed to the tracepoint which is used as the wrapper code, which then yields the values to be fired. This allows for powerful dynamic tracing capabilities, being able to pull any relevant information in the execution context to deliver to the probe.

Safe

Where possible, table lookups or variables in local context should be preferred in order to gather the data to fire off in the probe. Any helper functions in ruby space should carefully describe their worst-case runtime complexity, and bound this at O(n), where n is a known small integer.

Latency tracer

Here's a demonstration of the latency tracer working end-to-end:

latencytracer.gif

Alternatives

Print statements and metrics

Ultimately, tracing is just a fancy printf in a lot of ways. If plain ol' puts and log statements get the job done with an acceptable performance overhead, use'm!

The same is true for metrics, if you have something like statsd that might be a better way to get the data you're looking for.

This type of tracing access at performance analysis, as you can easily generate latency histograms and perform other aggregation functions on scalar data.

Ruby tracing

Ruby has its own tracing support, but it theoretically has a higher overhead as requires running logic on every function call.

USDT tracing should solve this, by explicitly registering trace points and only adding complexity and extra instructions to execute if the tracepoint is actually enabled.

Ruby dtrace hooks

Ruby provides its own built-in dtrace probes, which can be used similarly to ruby's built-in tracing, to simply probe all executed functions. This has the same issue of running on every function, instead of just the ones you want, but it's really great for simple invocations:

bpftrace -e 'usdt::ruby:method__entry { @[str(arg1)]++ }' -p ${UNICORN_PID}

ruby-usdt

https://github.com/thekvn/ruby-usdt

This repo provides much of the inspiration for this project, but it appears lost to the sands of time.

We wrap a newer version of libusdt, and support both Darwin and Linux.

libusdt

https://github.com/chrisa/libusdt

This is used by this project to provide Darwin support, but libusdt doesn't support linux.

Works by writing a dof note dynamically into the processes heap.

libstapsdt

https://github.com/sthima/libstapsdt

This is used by this project to provide Linux support, but libusdt doesn't support Darwin.

Works by writing a ELF note to a stub library, and dlopen'ing it.

lttng-ust

There is an existing (apparently unmaintained) gem for ruby that utilizes lttng's userspace support.

It is possible that this could provide improved performance by way of lover overhead as compared to USDT probes. We may support lttng-ust probes in the future, as they appear to conform to the same interface. They could be used instead of USDT probes for linux.

lttng-ust does tracing 100% in userspace, as opposed to uprobes which are executed within a kernel context. It may be worthwile to benchmark agints lttng-ust libraries, to ascertain what the magnitude of the difference is overhead is, particularly if USDT probe overhead becomes observable and detrimental.