A lot of high-level languages, runtimes, and libraries people use on Linux have USDT probes embedded in them. In some cases, you have to compile with a specific flag to get the probes embedded (e.g. in Node.js), and in other cases they are part of the default package on most major distributions (e.g. in OpenJDK). Some examples of information these probes can provide include:
- Garbage collection events and latencies in Java and Node
- Method invocations and latencies in Python and Ruby
- Object allocations in Ruby and Java
- Thread start and stop events in Java
- Class load events in Java and Ruby
This information can be used for a variety of investigations in the field, as well as performance monitoring of a running production system.
If you have an application or runtime instrumented with USDT probes, there are a bunch of tools in the BCC project that you can now use to get low-overhead, safe, and powerful information from a production system (granted that you have a recent enough kernel to use BPF). A couple of days ago I was able to get several new tools merged, and it’s time to describe a few of them along with examples and usage scenarios. Let’s go!
Raw Probes: tplist, trace, and argdist
If you know exactly which probes you care about, and are willing to do the legwork of correlating trace output yourself, you can use the multi-tool tracing trio: tplist, trace, and argdist. All three tools support USDT probes. For example, you can use tplist to get a list of the probes embedded into a running Node process:
# tplist -p $(pgrep -n node) ../node node:gc__start ../node node:gc__done ../node node:http__server__request ../node node:http__server__response <...more output omitted for brevity...>
To figure out the format of the probe, you can apply tplist again to see the number and arguments and their types, but you will often need to consult the runtime’s source to see what the arguments stand for. Specifically, for the http__server__request probe in Node, the probe’s arguments include the requested URL, the client’s IP address, the HTTP method, and some other details.
Next, you can instrument probes you’re interested in, for example to monitor URLs that are frequently accessed by clients, with the argdist multi-tool:
# argdist -C 'u:/opt/bin/node:http__server__request():char*:arg6' -p $(pgrep -n node) [13:51:38] u:/opt/bin/node:http__server__request():char*:arg6 COUNT EVENT 123 arg6 = /index.html 11 arg6 = /favicon.ico 1 arg6 = / ^C
… or to print a trace message whenever there is a garbage collection in the Node process with the trace multi-tool:
# trace 'u:/opt/bin/node:gc__start' -p $(pgrep -n node) -T TIME PID TID COMM FUNC 13:56:54 7333 7333 node gc__start ^C
I especially love trace because it’s essentially a dynamic logger. I can attach it to arbitrary locations in the kernel or in user-space processes, and get live logs when interesting things happen — with minimal overhead. But I digress.
Java, Python, Ruby, Node: u* Tools
Learning the different kinds of probes that are available in different runtimes and then tracing and correlating this information can be a bit tedious. This is why I wrote a set of tools (inspired by Brendan Gregg’s work on the DTrace Toolkit for Solaris) that help trace GC latency, method calls and flow (including syscalls), thread start and stop events, and other interesting statistics and metrics for high-level language runtimes. These tools have just recently been merged into the BCC project, and should work on all Linux kernels after 4.3.
For example, let’s trace garbage collections in a Java process:
# ugc -m java $(pidof java) Tracing garbage collections in java process 1241... Ctrl-C to quit. START DESCRIPTION TIME (ms) 1.831 PS Scavenge PS Eden Space ... 50.00 2.005 PS Scavenge PS Old Gen ... 19.00 ^C
Or, trace method calls in a Python script:
# ucalls -L -l python $(pidof python) Tracing calls in process 4049 (language: python)... Ctrl-C to quit. ^C METHOD # CALLS TIME (us) ./runner.py.something_else 42 2017394.35 ./runner.py.something 17 1011.38
Or, count syscall frequencies and latencies in the same process:
# ucalls -LS 4923 Attached 742 kernel probes for syscall tracing. Tracing calls in process 4923 (language: none)... Ctrl-C to quit. ^C METHOD # CALLS TIME (us) sys_select 42 1998941.02
Or, trace Java thread creation and destruction events:
# uthreads -l java 27420 Tracing thread events in process 27420 (language: java)... Ctrl-C to quit. TIME ID TYPE DESCRIPTION 18.596 R=9/N=0 start SIGINT handler 18.596 R=4/N=0 stop Signal Dispatcher ^C
Or, trace Ruby object allocations in a REPL process:
# uobjnew ruby 27245 Tracing allocations in process 27245 (language: ruby)... Ctrl-C to quit. ^C TYPE # ALLOCS # BYTES NameError 1 0 RubyToken::TkSPACE 1 0 RubyToken::TkSTRING 1 0 String 7 0 RubyToken::TkNL 2 0 RubyToken::TkIDENTIFIER 2 0 array 55 129 string 344 1348
Or even trace method flow in a Ruby process, including filtering capabilities (this also works for Python OOTB and for Java with the -XX:+ExtendedDTraceProbes flag):
# uflow ruby 27245 Tracing method calls in ruby process 27245... Ctrl-C to quit. CPU PID TID TIME(us) METHOD 3 27245 27245 4.536 <- IO.gets 3 27245 27245 4.536 <- IRB::StdioInputMethod.gets 3 27245 27245 4.536 -> IRB::Context.verbose? 3 27245 27245 4.536 -> NilClass.nil? 3 27245 27245 4.536 <- NilClass.nil? 3 27245 27245 4.536 -> IO.tty? 3 27245 27245 4.536 <- IO.tty? 3 27245 27245 4.536 -> Kernel.kind_of? 3 27245 27245 4.536 <- Kernel.kind_of? 3 27245 27245 4.536 <- IRB::Context.verbose? 3 27245 27245 4.536 <- IRB::Irb.signal_status 3 27245 27245 4.536 -> String.chars 3 27245 27245 4.536 <- String.chars ^C
Or just monitor interesting events across all processes on the system that have one of the supported runtimes:
# ustat -C 5 Tracing... Output every 5 secs. Hit Ctrl-C to end 12:18:26 loadavg: 0.27 0.11 0.04 2/336 26455 PID CMDLINE METHOD/s GC/s OBJNEW/s CLOAD/s EXC/s THR/s 3018 node/node 0 1 0 0 0 0 12:18:31 loadavg: 0.33 0.12 0.04 2/336 26456 PID CMDLINE METHOD/s GC/s OBJNEW/s CLOAD/s EXC/s THR/s 3018 node/node 0 0 0 0 0 0 26439 java -XX:+ExtendedDT 2776045 0 0 0 0 0 12:18:37 loadavg: 0.38 0.14 0.05 2/336 26457 PID CMDLINE METHOD/s GC/s OBJNEW/s CLOAD/s EXC/s THR/s 3018 node/node 0 0 0 0 0 0 26439 java -XX:+ExtendedDT 2804378 0 0 0 0 0 ^C
Databases: dbslower
Finally, database engines such as MySQL and PostgreSQL are also instrumented with USDT probes. These can be very effective in identifying slow queries, tracing the query text, monitoring transaction execution and locks taken, and other various database events that might be difficult to trace without otherwise instrumenting your applications. The dbslower tool, which was developed as an exercise solution for my Linux tracing workshop, can attach to MySQL and PostgreSQL processes and trace queries slower than a predefined threshold (following the design of fileslower, ext4slower, and other similar tools from BCC). Already in BCC you can find Brendan Gregg’s mysqld_slower, which is a version that only supports MySQL:
# mysqld_qslower $(pgrep -n mysqld) 0.1 Tracing MySQL server queries for PID 14371 slower than 0.1 ms... TIME(s) PID MS QUERY 0.000000 18608 24.201 SELECT COUNT(*) FROM words 13.242390 18608 130.378 SELECT * FROM words WHERE word REGEXP '^bre.*n$' 23.601751 18608 119.198 SELECT * FROM words WHERE word REGEXP '^zzzzzzzz$' ^C
In Closing
If you haven’t seen the power of USDT tracing before, this is a good time to start. Runtimes and frameworks that are instrumented with USDT probes can be used very effectively for production investigations that do not require a system restart, do not require debug information, and do not require invasive steps such as ptrace-ing or suspending the target. I can only hope more software for Linux will ship with USDT on by default in the future, so we can all benefit from it in our production apps.
It’s also worth mentioning that even if you don’t have a kernel new enough to use the BPF tools above, you can still benefit from USDT tracepoints. The venerable perf is getting support for SDT events (try perf buildid-cache --add <path>
and then perf list sdt
), and you can also use tools that support uprobes to collect USDT information, albeit with some effort. SystemTap, too, is an option to consider.
You can also follow me on Twitter, where I put stuff that doesn’t necessarily deserve a full-blown blog post.