* [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms @ 2023-06-20 10:27 Mousa, Anas via lttng-dev 2023-06-20 14:20 ` Mathieu Desnoyers via lttng-dev 0 siblings, 1 reply; 9+ messages in thread From: Mousa, Anas via lttng-dev @ 2023-06-20 10:27 UTC (permalink / raw) To: lttng-dev@lists.lttng.org [-- Attachment #1.1: Type: text/plain, Size: 2638 bytes --] Hello, I've recently profiled the latency of LTTng tracepoints on arm platforms, using the follow sample program: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- static inline uint64_t get_time_nsec(void) { struct timespec ts; if (caa_unlikely(clock_gettime(CLOCK_MONOTONIC, &ts))) { ts.tv_sec = 0; ts.tv_nsec = 0; } return ((uint64_t) ts.tv_sec * 1000000000ULL) + ts.tv_nsec; } int main(int argc, char *argv[]) { unsigned int i; int tp_num = 0; uint64_t total_time = 0; uint64_t now, nowz; if (argc > 1) { sscanf (argv[1],"%d",&tp_num); } for (i = 0; i < tp_num; i++) { now = get_time_nsec(); lttng_ust_tracepoint(hello_world, my_first_tracepoint, i, "some_str"); nowz = get_time_nsec(); total_time += (nowz - now); } if (tp_num) { printf("---------------------------Average TP time is %"PRIu64"---------------------------\n", total_time / tp_num); } } ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- I observed a big average latency variance on different platforms when tracing a high number (many thousands to millions) of tracepoints: * [platform 1] with CPU info running a linux kernel based on Buildroot (4.19.273 aarch64 GNU/Linux): BogoMIPS : 187.50 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd08 CPU revision : 3 * Saw an average latency of 2-3usec * [platform 2] with CPU info running a linux kernel based on Amazon Linux (4.14.294-220.533.amzn2.aarch64 aarch64 GNU/Linux): BogoMIPS : 243.75 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x3 CPU part : 0xd0c CPU revision : 1 * Saw an average latency of ~0.5usec Are there any suggestions to root cause the high latency and potentially improve it on platform 1? Thanks and best regards, Anas. [-- Attachment #1.2: Type: text/html, Size: 102791 bytes --] [-- Attachment #2: Type: text/plain, Size: 156 bytes --] _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms 2023-06-20 10:27 [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms Mousa, Anas via lttng-dev @ 2023-06-20 14:20 ` Mathieu Desnoyers via lttng-dev 2023-06-20 18:03 ` Mathieu Desnoyers via lttng-dev 0 siblings, 1 reply; 9+ messages in thread From: Mathieu Desnoyers via lttng-dev @ 2023-06-20 14:20 UTC (permalink / raw) To: Mousa, Anas, lttng-dev@lists.lttng.org On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote: > Hello, > > Arethereanysuggestionstorootcausethehighlatencyandpotentiallyimproveiton*platform****1*? > > Thanks and best regards, > > Anas. > I recommend using "perf" when tracing with the sample program in a loop to figure out the hot spots. With that information on the "fast" and "slow" system, we might be able to figure out what differs. Also, comparing the kernel configurations of the two systems can help. Also comparing the glibc versions of the two systems would be relevant. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms 2023-06-20 14:20 ` Mathieu Desnoyers via lttng-dev @ 2023-06-20 18:03 ` Mathieu Desnoyers via lttng-dev 2023-06-21 5:39 ` Yitschak, Yehuda via lttng-dev 0 siblings, 1 reply; 9+ messages in thread From: Mathieu Desnoyers via lttng-dev @ 2023-06-20 18:03 UTC (permalink / raw) To: Mousa, Anas, lttng-dev@lists.lttng.org On 6/20/23 10:20, Mathieu Desnoyers via lttng-dev wrote: > On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote: >> Hello, > >> >> Arethereanysuggestionstorootcausethehighlatencyandpotentiallyimproveiton*platform****1*? >> >> Thanks and best regards, >> >> Anas. >> > > I recommend using "perf" when tracing with the sample program in a loop > to figure out the hot spots. With that information on the "fast" and > "slow" system, we might be able to figure out what differs. > > Also, comparing the kernel configurations of the two systems can help. > Also comparing the glibc versions of the two systems would be relevant. > Also make sure you benchmark the lttng "snapshot" mode [1] to make sure you don't run into a situation where the disk/network I/O throughput cannot cope with the generated event throughput, thus causing the ring buffer to discard events. This would therefore "speed up" tracing from the application perspective because discarding an event is faster than writing it to a ring buffer. Thanks, Mathieu [1] https://lttng.org/docs/v2.13/#doc-taking-a-snapshot > Thanks, > > Mathieu > > -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms 2023-06-20 18:03 ` Mathieu Desnoyers via lttng-dev @ 2023-06-21 5:39 ` Yitschak, Yehuda via lttng-dev 2023-06-21 13:47 ` Mathieu Desnoyers via lttng-dev 0 siblings, 1 reply; 9+ messages in thread From: Yitschak, Yehuda via lttng-dev @ 2023-06-21 5:39 UTC (permalink / raw) To: Mathieu Desnoyers, Mousa, Anas, lttng-dev@lists.lttng.org > On 6/20/23 10:20, Mathieu Desnoyers via lttng-dev wrote: > > On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote: > >> Hello, > > > >> > >> > Arethereanysuggestionstorootcausethehighlatencyandpotentiallyimproveito > n*platform****1*? > >> > >> Thanks and best regards, > >> > >> Anas. > >> > > > > I recommend using "perf" when tracing with the sample program in a > > loop to figure out the hot spots. With that information on the "fast" > > and "slow" system, we might be able to figure out what differs. > > > > Also, comparing the kernel configurations of the two systems can help. > > Also comparing the glibc versions of the two systems would be relevant. > > > > Also make sure you benchmark the lttng "snapshot" mode [1] to make sure > you don't run into a situation where the disk/network I/O throughput cannot > cope with the generated event throughput, thus causing the ring buffer to > discard events. This would therefore "speed up" tracing from the application > perspective because discarding an event is faster than writing it to a ring > buffer. You mean we should avoid the "discard" loss mode and use "overwrite" loss mode since discard mode can fake fast performance ? > > Thanks, > > Mathieu > > [1] https://lttng.org/docs/v2.13/#doc-taking-a-snapshot > > > Thanks, > > > > Mathieu > > > > > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com > > _______________________________________________ > lttng-dev mailing list > lttng-dev@lists.lttng.org > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms 2023-06-21 5:39 ` Yitschak, Yehuda via lttng-dev @ 2023-06-21 13:47 ` Mathieu Desnoyers via lttng-dev 2023-06-21 14:21 ` Yitschak, Yehuda via lttng-dev 0 siblings, 1 reply; 9+ messages in thread From: Mathieu Desnoyers via lttng-dev @ 2023-06-21 13:47 UTC (permalink / raw) To: Yitschak, Yehuda, Mousa, Anas, lttng-dev@lists.lttng.org On 6/21/23 01:39, Yitschak, Yehuda wrote: >> On 6/20/23 10:20, Mathieu Desnoyers via lttng-dev wrote: >>> On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote: >>>> Hello, >>> >>>> >>>> >> Arethereanysuggestionstorootcausethehighlatencyandpotentiallyimproveito >> n*platform****1*? >>>> >>>> Thanks and best regards, >>>> >>>> Anas. >>>> >>> >>> I recommend using "perf" when tracing with the sample program in a >>> loop to figure out the hot spots. With that information on the "fast" >>> and "slow" system, we might be able to figure out what differs. >>> >>> Also, comparing the kernel configurations of the two systems can help. >>> Also comparing the glibc versions of the two systems would be relevant. >>> >> >> Also make sure you benchmark the lttng "snapshot" mode [1] to make sure >> you don't run into a situation where the disk/network I/O throughput cannot >> cope with the generated event throughput, thus causing the ring buffer to >> discard events. This would therefore "speed up" tracing from the application >> perspective because discarding an event is faster than writing it to a ring >> buffer. > > You mean we should avoid the "discard" loss mode and use "overwrite" loss mode since discard mode can fake fast performance ? Yes. In addition to use "overwrite-when-buffer-full" mode, the "snapshot" session also ensures that no consumer daemon extracts the trace data (unless an explicit snapshot record is performed), which allows comparing the ring buffer producer performance with minimal noise. If you really want to benchmark the discard-when-buffer-full mode and the the consumer daemon I/O behavior, then you need to take into account event discarded counts and the actual trace data size that was written to disk. Thanks, Mathieu > >> >> Thanks, >> >> Mathieu >> >> [1] https://lttng.org/docs/v2.13/#doc-taking-a-snapshot >> >>> Thanks, >>> >>> Mathieu >>> >>> >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> https://www.efficios.com >> >> _______________________________________________ >> lttng-dev mailing list >> lttng-dev@lists.lttng.org >> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms 2023-06-21 13:47 ` Mathieu Desnoyers via lttng-dev @ 2023-06-21 14:21 ` Yitschak, Yehuda via lttng-dev 2023-09-10 14:18 ` Mousa, Anas via lttng-dev 0 siblings, 1 reply; 9+ messages in thread From: Yitschak, Yehuda via lttng-dev @ 2023-06-21 14:21 UTC (permalink / raw) To: Mathieu Desnoyers, Mousa, Anas, lttng-dev@lists.lttng.org > On 6/21/23 01:39, Yitschak, Yehuda wrote: > >> On 6/20/23 10:20, Mathieu Desnoyers via lttng-dev wrote: > >>> On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote: > >>>> Hello, > >>> > >>>> > >>>> > >> > Arethereanysuggestionstorootcausethehighlatencyandpotentiallyimprovei > >> to > >> n*platform****1*? > >>>> > >>>> Thanks and best regards, > >>>> > >>>> Anas. > >>>> > >>> > >>> I recommend using "perf" when tracing with the sample program in a > >>> loop to figure out the hot spots. With that information on the "fast" > >>> and "slow" system, we might be able to figure out what differs. > >>> > >>> Also, comparing the kernel configurations of the two systems can help. > >>> Also comparing the glibc versions of the two systems would be relevant. > >>> > >> > >> Also make sure you benchmark the lttng "snapshot" mode [1] to make > >> sure you don't run into a situation where the disk/network I/O > >> throughput cannot cope with the generated event throughput, thus > >> causing the ring buffer to discard events. This would therefore > >> "speed up" tracing from the application perspective because > >> discarding an event is faster than writing it to a ring buffer. > > > > You mean we should avoid the "discard" loss mode and use "overwrite" > loss mode since discard mode can fake fast performance ? > > Yes. In addition to use "overwrite-when-buffer-full" mode, the "snapshot" > session also ensures that no consumer daemon extracts the trace data > (unless an explicit snapshot record is performed), which allows comparing > the ring buffer producer performance with minimal noise. > > If you really want to benchmark the discard-when-buffer-full mode and the > the consumer daemon I/O behavior, then you need to take into account > event discarded counts and the actual trace data size that was written to > disk. Since you mentioned this, is there any "stat" command which lists events such as discards and disk writes, etc ? I looked this up in the past but couldn't find anything > > Thanks, > > Mathieu > > > > >> > >> Thanks, > >> > >> Mathieu > >> > >> [1] https://lttng.org/docs/v2.13/#doc-taking-a-snapshot > >> > >>> Thanks, > >>> > >>> Mathieu > >>> > >>> > >> > >> -- > >> Mathieu Desnoyers > >> EfficiOS Inc. > >> https://www.efficios.com > >> > >> _______________________________________________ > >> lttng-dev mailing list > >> lttng-dev@lists.lttng.org > >> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms 2023-06-21 14:21 ` Yitschak, Yehuda via lttng-dev @ 2023-09-10 14:18 ` Mousa, Anas via lttng-dev 2023-09-11 15:52 ` Mathieu Desnoyers via lttng-dev 0 siblings, 1 reply; 9+ messages in thread From: Mousa, Anas via lttng-dev @ 2023-09-10 14:18 UTC (permalink / raw) To: Yitschak, Yehuda, Mathieu Desnoyers, lttng-dev@lists.lttng.org [-- Attachment #1.1: Type: text/plain, Size: 3317 bytes --] Hey Mathieu, We see that upon recording a tracepoint, there are multiple stages of reserve-commit-write, where atomics and shared memory accesses take up a big part of the recording time, we're wondering, is there a "light-mode" of recording a tracepoint involving less logic or a mode which can potentially have lower latency? Also, are there any recent docs to share regarding tracepoint latency? Regards, Anas. ________________________________ From: Yitschak, Yehuda Sent: Wednesday, June 21, 2023 5:21:35 PM To: Mathieu Desnoyers; Mousa, Anas; lttng-dev@lists.lttng.org Subject: RE: [EXTERNAL][lttng-dev] Profiling LTTng tracepoint latency on different arm platforms > On 6/21/23 01:39, Yitschak, Yehuda wrote: > >> On 6/20/23 10:20, Mathieu Desnoyers via lttng-dev wrote: > >>> On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote: > >>>> Hello, > >>> > >>>> > >>>> > >> > Arethereanysuggestionstorootcausethehighlatencyandpotentiallyimprovei > >> to > >> n*platform****1*? > >>>> > >>>> Thanks and best regards, > >>>> > >>>> Anas. > >>>> > >>> > >>> I recommend using "perf" when tracing with the sample program in a > >>> loop to figure out the hot spots. With that information on the "fast" > >>> and "slow" system, we might be able to figure out what differs. > >>> > >>> Also, comparing the kernel configurations of the two systems can help. > >>> Also comparing the glibc versions of the two systems would be relevant. > >>> > >> > >> Also make sure you benchmark the lttng "snapshot" mode [1] to make > >> sure you don't run into a situation where the disk/network I/O > >> throughput cannot cope with the generated event throughput, thus > >> causing the ring buffer to discard events. This would therefore > >> "speed up" tracing from the application perspective because > >> discarding an event is faster than writing it to a ring buffer. > > > > You mean we should avoid the "discard" loss mode and use "overwrite" > loss mode since discard mode can fake fast performance ? > > Yes. In addition to use "overwrite-when-buffer-full" mode, the "snapshot" > session also ensures that no consumer daemon extracts the trace data > (unless an explicit snapshot record is performed), which allows comparing > the ring buffer producer performance with minimal noise. > > If you really want to benchmark the discard-when-buffer-full mode and the > the consumer daemon I/O behavior, then you need to take into account > event discarded counts and the actual trace data size that was written to > disk. Since you mentioned this, is there any "stat" command which lists events such as discards and disk writes, etc ? I looked this up in the past but couldn't find anything > > Thanks, > > Mathieu > > > > >> > >> Thanks, > >> > >> Mathieu > >> > >> [1] https://lttng.org/docs/v2.13/#doc-taking-a-snapshot > >> > >>> Thanks, > >>> > >>> Mathieu > >>> > >>> > >> > >> -- > >> Mathieu Desnoyers > >> EfficiOS Inc. > >> https://www.efficios.com > >> > >> _______________________________________________ > >> lttng-dev mailing list > >> lttng-dev@lists.lttng.org > >> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com [-- Attachment #1.2: Type: text/html, Size: 7798 bytes --] [-- Attachment #2: Type: text/plain, Size: 156 bytes --] _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms 2023-09-10 14:18 ` Mousa, Anas via lttng-dev @ 2023-09-11 15:52 ` Mathieu Desnoyers via lttng-dev 2023-09-11 16:20 ` Michel Dagenais via lttng-dev 0 siblings, 1 reply; 9+ messages in thread From: Mathieu Desnoyers via lttng-dev @ 2023-09-11 15:52 UTC (permalink / raw) To: Mousa, Anas, Yitschak, Yehuda, lttng-dev@lists.lttng.org On 9/10/23 10:18, Mousa, Anas wrote: > Hey Mathieu, Hi Anas, > > We see that upon recording a tracepoint, there are multiple stages of > reserve-commit-write, where atomics and shared memory accesses take up a big part of the > recording time, > > we're wondering, is there a "light-mode" of recording a tracepoint > involving less logic or > > a mode which can potentially have lower latency? I've been working on the rseq(2) system call for a few years now, and this is intended to help reduce the cost of lttng-ust's ring buffer atomics on the tracing fast-path. The road ahead there is integration of rseq with lttng-ust, which did not show up on our customer feature requirements radar yet. In terms of logic involved in the lttng-ust tracepoints, I hope that my current work on "libside" will help steer away from tracepoint providers based on macros and generated code, replacing this by an efficient bytecode interpreter. This should allow me to inline many of the calls that are currently needed between the tracepoint probe provider and the lttng-ust ring buffer. Again, this is an area where I think we can have great speed improvements, but it did not show up on our customer's feature requirement radar yet. > Also, are there any recent docs to share regarding tracepoint latency? There is a Polytechnique student who extensively analyzed this recently. Michel, do you have a pointer to his work ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms 2023-09-11 15:52 ` Mathieu Desnoyers via lttng-dev @ 2023-09-11 16:20 ` Michel Dagenais via lttng-dev 0 siblings, 0 replies; 9+ messages in thread From: Michel Dagenais via lttng-dev @ 2023-09-11 16:20 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: lttng-dev You are probably referring to the work of Mohammad Gebai: Survey and Analysis of Kernel and Userspace Tracers on Linux: Design, Implementation, and Overhead https://dl.acm.org/doi/abs/10.1145/3158644 ----- Le 11 Sep 23, à 11:52, Mathieu Desnoyers mathieu.desnoyers@efficios.com a écrit : > On 9/10/23 10:18, Mousa, Anas wrote: >> Hey Mathieu, > > Hi Anas, > >> >> We see that upon recording a tracepoint, there are multiple stages of >> reserve-commit-write, where atomics and shared memory accesses take up a big >> part of the >> recording time, >> >> we're wondering, is there a "light-mode" of recording a tracepoint >> involving less logic or >> >> a mode which can potentially have lower latency? > > I've been working on the rseq(2) system call for a few years now, and > this is intended to help reduce the cost of lttng-ust's ring buffer > atomics on the tracing fast-path. The road ahead there is integration of > rseq with lttng-ust, which did not show up on our customer feature > requirements radar yet. > > In terms of logic involved in the lttng-ust tracepoints, I hope that my > current work on "libside" will help steer away from tracepoint providers > based on macros and generated code, replacing this by an efficient > bytecode interpreter. This should allow me to inline many of the calls > that are currently needed between the tracepoint probe provider and the > lttng-ust ring buffer. Again, this is an area where I think we can have > great speed improvements, but it did not show up on our customer's > feature requirement radar yet. > >> Also, are there any recent docs to share regarding tracepoint latency? > > There is a Polytechnique student who extensively analyzed this recently. > Michel, do you have a pointer to his work ? > > Thanks, > > Mathieu > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-09-11 16:26 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-06-20 10:27 [lttng-dev] Profiling LTTng tracepoint latency on different arm platforms Mousa, Anas via lttng-dev 2023-06-20 14:20 ` Mathieu Desnoyers via lttng-dev 2023-06-20 18:03 ` Mathieu Desnoyers via lttng-dev 2023-06-21 5:39 ` Yitschak, Yehuda via lttng-dev 2023-06-21 13:47 ` Mathieu Desnoyers via lttng-dev 2023-06-21 14:21 ` Yitschak, Yehuda via lttng-dev 2023-09-10 14:18 ` Mousa, Anas via lttng-dev 2023-09-11 15:52 ` Mathieu Desnoyers via lttng-dev 2023-09-11 16:20 ` Michel Dagenais via lttng-dev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).