* Re: Feedback on your ARM LTTng benchmarks
From: Colin Ian King @ 2013-09-12 22:27 UTC
To: Mathieu Desnoyers; +Cc: lttng-dev, kernel-team

Hi Mathieu,

On 12/09/13 22:44, Mathieu Desnoyers wrote:
> Hi Colin,
>
> I just read your post on:
>
> https://lists.ubuntu.com/archives/kernel-team/2013-May/028450.html
>
> and, although I'm very pleased to see that LTTng provides good
> performance in your tests, there is a small detail in your benchmarking
> approach I would like to bring to your attention. If you followed the
> benchmarking procedure used by Romik Guha Anjoy and Soumya Kanti
> Chakraborty's "Efficiency of Lttng as a Kernel and Userspace Tracer"
> work, you only have part of the picture. I pointed this issue out to
> them when I stumbled on their work after it had been published.
>
> You see, they only benchmark the equivalent of lttng-consumerd and
> lttng-sessiond (in the lttng 0.x days, that was lttd). They entirely
> miss the impact of the lttng-modules kernel tracer and lttng-ust
> userspace tracer: the parts that write into the ring buffers.
>
> This part is slightly harder to benchmark. This is why I relied on
> system benchmarks with typical workloads to measure the overall system
> slowdown in my thesis
> (http://www.lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf)
> rather than use profiling.
>
> If you only profile lttng-sessiond and lttng-consumerd, you will end up
> noticing a very tiny impact indeed: while tracing is active,
> lttng-sessiond is almost never active. lttng-consumerd needs to
> transport the data, which indeed brings some overhead. However, if you
> use lttng's flight recorder tracing (with snapshots) introduced in lttng
> 2.3, the consumerd is entirely out of the picture: it's just writing
> into memory buffers. Even then, the lttng-modules and lttng-ust parts
> of the tracer have _some_ impact when writing into the buffers from the
> kernel and user-space application contexts.
>
> So overall, there is a part of the lttng footprint not accounted for.
> It's very small, but it exists.

That is very useful to know; many thanks for the clarification. Do you
have any ARM-based benchmarks that can give us an idea of the overhead
that I failed to account for?

>
> I just want to make sure that nobody can say later that the "lttng is
> fast" claim is based on bogus benchmarks. It is very fast, yes, but I
> recommend revisiting your benchmarking approach if you based it solely
> on Romik Guha Anjoy and Soumya Kanti Chakraborty's work.
>
> On typical benchmarks, my own results were usually under 5% of overhead
> system-side (see my thesis for details).

Is that specific to any particular architecture? I was concerned about
the impact on processors with relatively small instruction and data
caches such as ARM processors.

>
> Thank you!
>
> Mathieu
>

Thanks again for taking the effort to enlighten me.

Regards,

Colin

* Re: Feedback on your ARM LTTng benchmarks
From: Mathieu Desnoyers @ 2013-09-13 0:45 UTC
To: Colin Ian King; +Cc: lttng-dev, kernel-team

* Colin Ian King (colin.king@canonical.com) wrote:
> Hi Mathieu,
>
> On 12/09/13 22:44, Mathieu Desnoyers wrote:
> > Hi Colin,
> >
> > I just read your post on:
> >
> > https://lists.ubuntu.com/archives/kernel-team/2013-May/028450.html
> >
> > and, although I'm very pleased to see that LTTng provides good
> > performance in your tests, there is a small detail in your
> > benchmarking approach I would like to bring to your attention. If you
> > followed the benchmarking procedure used by Romik Guha Anjoy and
> > Soumya Kanti Chakraborty's "Efficiency of Lttng as a Kernel and
> > Userspace Tracer" work, you only have part of the picture. I pointed
> > this issue out to them when I stumbled on their work after it had
> > been published.
> >
> > You see, they only benchmark the equivalent of lttng-consumerd and
> > lttng-sessiond (in the lttng 0.x days, that was lttd). They entirely
> > miss the impact of the lttng-modules kernel tracer and lttng-ust
> > userspace tracer: the parts that write into the ring buffers.
> >
> > This part is slightly harder to benchmark. This is why I relied on
> > system benchmarks with typical workloads to measure the overall system
> > slowdown in my thesis
> > (http://www.lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf)
> > rather than use profiling.
> >
> > If you only profile lttng-sessiond and lttng-consumerd, you will end up
> > noticing a very tiny impact indeed: while tracing is active,
> > lttng-sessiond is almost never active. lttng-consumerd needs to
> > transport the data, which indeed brings some overhead. However, if you
> > use lttng's flight recorder tracing (with snapshots) introduced in lttng
> > 2.3, the consumerd is entirely out of the picture: it's just writing
> > into memory buffers. Even then, the lttng-modules and lttng-ust parts
> > of the tracer have _some_ impact when writing into the buffers from the
> > kernel and user-space application contexts.
> >
> > So overall, there is a part of the lttng footprint not accounted for.
> > It's very small, but it exists.
>
> That is very useful to know; many thanks for the clarification. Do you
> have any ARM-based benchmarks that can give us an idea of the overhead
> that I failed to account for?

Yes, but they will only give a basic estimate, I'm afraid. If we look at
my dissertation, p. 87, Section 5.5.2 "Probe CPU-cycles overhead", we
have the following figures:

Architecture        Cycles   Core freq.   Time
                               (GHz)      (ns)
Intel Pentium 4        545      3.0        182
AMD Athlon64 X2        628      2.0        314
Intel Core2 Xeon       238      2.0        119
ARMv7 OMAP3            507      0.5       1014

A couple of things to consider here:

- this is for a microbenchmark, where all the system is doing is writing
  events into the ring buffer, so it may fail to take into account
  instruction cache size, data cache size, TLB pollution and BPB
  pollution effects that come into play when the entire system is
  active,
- this is against lttng 0.x, not 2.x, and the ring buffer implementation
  has changed a lot between the two.

Still, in time per probe, the ARMv7 OMAP3 was a factor of 8.5 slower
than the Intel Core2 Xeon (1014 ns vs. 119 ns). I don't have numbers
against OMAP4 for lttng 2.x, and I would be very interested to see them.

> >
> > I just want to make sure that nobody can say later that the "lttng is
> > fast" claim is based on bogus benchmarks. It is very fast, yes, but I
> > recommend revisiting your benchmarking approach if you based it solely
> > on Romik Guha Anjoy and Soumya Kanti Chakraborty's work.
> >
> > On typical benchmarks, my own results were usually under 5% of overhead
> > system-side (see my thesis for details).
>
> Is that specific to any particular architecture? I was concerned about
> the impact on processors with relatively small instruction and data
> caches such as ARM processors.

Yes, this was against Intel Core2 Xeon. I'd be interested to see those
numbers against ARM OMAP4.

If you really care about getting precise overhead numbers, I would
recommend the following benchmarks, which I used for my thesis:

- tbench: see how much tracing degrades network throughput,
- dbench: similar for disk throughput,
- lmbench to test the speed degradation of specific system calls when
  tracing,
- a gcc benchmark (e.g. compiling the Linux kernel), for a CPU-intensive
  workload that uses the scheduler very heavily and forks lots of
  short-lived processes. Don't forget to use /proc/sys/vm/drop_caches to
  ensure your filesystem cache is in a pristine state prior to each run
  (or do a cache-priming run first),
- you might want to run those benchmarks both in flight recorder mode
  (lttng 2.3 new "snapshot" mode, very cool, try it out, it's amazingly
  fast!) ;-) as well as when tracing to disk, and while tracing to the
  network (streaming).

The basic idea is to have a repeatable benchmark, and see how fast it
completes (or what throughput it can reach) with and without tracing.

Thanks!

Mathieu

> >
> > Thank you!
> >
> > Mathieu
> >
> Thanks again for taking the effort to enlighten me.
>
> Regards,
>
> Colin

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

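
The arithmetic behind the figures in the reply above, and the "with and
without tracing" comparison it recommends, can be sketched in a few lines
of Python. This is only an illustration and not part of the original
thread: the names probe_time_ns and overhead_percent are hypothetical,
and the run times in the usage section are made-up placeholders, not
measurements. Only the cycle counts and core frequencies come from the
table quoted from the thesis.

```python
from statistics import mean

# Probe cost figures quoted in the thread (thesis section 5.5.2, lttng 0.x):
# architecture -> (cycles per probe, core frequency in GHz)
PROBE_COST = {
    "Intel Pentium 4": (545, 3.0),
    "AMD Athlon64 X2": (628, 2.0),
    "Intel Core2 Xeon": (238, 2.0),
    "ARMv7 OMAP3": (507, 0.5),
}


def probe_time_ns(cycles, freq_ghz):
    """Time per probe in nanoseconds (1 GHz is one cycle per nanosecond)."""
    return cycles / freq_ghz


def overhead_percent(baseline_runs, traced_runs):
    """Slowdown of the mean traced run relative to the mean baseline run, in percent.

    Both arguments are lists of completion times (same unit) for the same
    repeatable benchmark, run without and with tracing respectively.
    """
    base = mean(baseline_runs)
    traced = mean(traced_runs)
    return (traced - base) / base * 100.0


if __name__ == "__main__":
    # Reproduce the "Time (ns)" column of the table.
    for arch, (cycles, ghz) in PROBE_COST.items():
        print(f"{arch:18s} {probe_time_ns(cycles, ghz):7.0f} ns")

    # Ratio of ARMv7 OMAP3 to Intel Core2 Xeon time per probe.
    ratio = probe_time_ns(*PROBE_COST["ARMv7 OMAP3"]) / probe_time_ns(
        *PROBE_COST["Intel Core2 Xeon"]
    )
    print(f"ARMv7 / Core2 Xeon slowdown: {ratio:.1f}x")

    # Hypothetical completion times in seconds, placeholders only.
    baseline = [100.0, 100.0]
    traced = [104.0, 106.0]
    print(f"tracing overhead: {overhead_percent(baseline, traced):.1f}%")
```

Averaging several runs per configuration, and dropping or priming the
filesystem cache before each one as suggested above, is what keeps the
two means comparable; a single run per side would fold run-to-run noise
into the overhead figure.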
* Feedback on your ARM LTTng benchmarks
From: Mathieu Desnoyers @ 2013-09-12 21:44 UTC
To: Colin Ian King; +Cc: lttng-dev, kernel-team

Hi Colin,

I just read your post on:

https://lists.ubuntu.com/archives/kernel-team/2013-May/028450.html

and, although I'm very pleased to see that LTTng provides good
performance in your tests, there is a small detail in your benchmarking
approach I would like to bring to your attention. If you followed the
benchmarking procedure used by Romik Guha Anjoy and Soumya Kanti
Chakraborty's "Efficiency of Lttng as a Kernel and Userspace Tracer"
work, you only have part of the picture. I pointed this issue out to
them when I stumbled on their work after it had been published.

You see, they only benchmark the equivalent of lttng-consumerd and
lttng-sessiond (in the lttng 0.x days, that was lttd). They entirely
miss the impact of the lttng-modules kernel tracer and lttng-ust
userspace tracer: the parts that write into the ring buffers.

This part is slightly harder to benchmark. This is why I relied on
system benchmarks with typical workloads to measure the overall system
slowdown in my thesis
(http://www.lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf)
rather than use profiling.

If you only profile lttng-sessiond and lttng-consumerd, you will end up
noticing a very tiny impact indeed: while tracing is active,
lttng-sessiond is almost never active. lttng-consumerd needs to
transport the data, which indeed brings some overhead. However, if you
use lttng's flight recorder tracing (with snapshots) introduced in lttng
2.3, the consumerd is entirely out of the picture: it's just writing
into memory buffers. Even then, the lttng-modules and lttng-ust parts
of the tracer have _some_ impact when writing into the buffers from the
kernel and user-space application contexts.

So overall, there is a part of the lttng footprint not accounted for.
It's very small, but it exists.

I just want to make sure that nobody can say later that the "lttng is
fast" claim is based on bogus benchmarks. It is very fast, yes, but I
recommend revisiting your benchmarking approach if you based it solely
on Romik Guha Anjoy and Soumya Kanti Chakraborty's work.

On typical benchmarks, my own results were usually under 5% of overhead
system-side (see my thesis for details).

Thank you!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com