* [lttng-dev] (no subject)
@ 2023-06-20 10:21 Mousa, Anas via lttng-dev
From: Mousa, Anas via lttng-dev @ 2023-06-20 10:21 UTC
  To: lttng-dev@lists.lttng.org



Hello,

I've recently profiled the latency of LTTng tracepoints on ARM platforms,
using the following sample program:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

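#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <urcu/compiler.h>      /* caa_unlikely() */
#include "hello-tp.h"           /* tracepoint provider header; file name assumed */

/* Current CLOCK_MONOTONIC time in nanoseconds, or 0 if the clock read fails. */
static inline uint64_t get_time_nsec(void)
{
        struct timespec ts;
        if (caa_unlikely(clock_gettime(CLOCK_MONOTONIC, &ts))) {
                ts.tv_sec = 0;
                ts.tv_nsec = 0;
        }
        return ((uint64_t) ts.tv_sec * 1000000000ULL) + ts.tv_nsec;
}

int main(int argc, char *argv[])
{
    int i;      /* signed, to match tp_num in the loop condition */
    int tp_num = 0;
    uint64_t total_time = 0;
    uint64_t now, nowz;
    if (argc > 1) {
        sscanf (argv[1],"%d",&tp_num);
    }
    /* Time each tracepoint call individually and accumulate the deltas. */
    for (i = 0; i < tp_num; i++) {
        now = get_time_nsec();
        lttng_ust_tracepoint(hello_world, my_first_tracepoint,
                             i, "some_str");
        nowz = get_time_nsec();
        total_time += (nowz - now);
    }
    if (tp_num) {
        printf("---------------------------Average TP time is %"PRIu64"---------------------------\n", total_time / tp_num);
    }
}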

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


I observed a large difference in average latency between platforms when tracing a large number (many thousands to millions) of tracepoints:

  *   [platform 1] running a Linux kernel based on Buildroot (4.19.273 aarch64 GNU/Linux), with CPU info:

BogoMIPS        : 187.50
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

  *   Saw an average latency of 2-3 usec


  *   [platform 2] running a Linux kernel based on Amazon Linux (4.14.294-220.533.amzn2.aarch64 aarch64 GNU/Linux), with CPU info:

BogoMIPS        : 243.75
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

  *   Saw an average latency of ~0.5 usec



Are there any suggestions for finding the root cause of the high latency on platform 1, and for potentially improving it?


Thanks and best regards,

Anas.



_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

* [lttng-dev] (no subject)
@ 2023-03-21 13:30 Ondřej Surý via lttng-dev
From: Ondřej Surý via lttng-dev @ 2023-03-21 13:30 UTC
  To: lttng-dev

This is a second round of the patches after implementing the requested changes
from the first round.

Ondrej

[PATCH 1/7] Require __atomic builtins to build
- no changes

[PATCH 2/7] Use gcc __atomic builtins for <urcu/uatomic.h>
- the non return macros are now __ATOMIC_RELAXED
- the return macros are now __ATOMIC_SEQ_CST
- the memory barriers are now

[PATCH 3/7] Use __atomic_signal_fence() for cmm_barrier()
- this now uses __atomic_signal_fence() instead of __atomic_thread_fence()

[PATCH 4/7] Replace the internal pointer manipulation with __atomic
- changed the memory ordering to __ATOMIC_SEQ_CST for xchg and cmpxchg 

[PATCH 5/7] Replace the arch-specific memory barriers with __atomic
- dropped the changes to urcu/arch.h
- removed all custom cmm_*() macros from urcu/arch/*.h
- added the generic __atomic implementation to urcu/arch/generic.h

This way it's still possible to override the generics with arch-specific macros.

[PATCH 6/7] Use __atomic builtins to implement CMM_{LOAD,STORE}_SHARED
- _CMM_STORE_SHARED and CMM_STORE_SHARED now return the stored value

[PATCH 7/7] Fix: uatomic_or() need retyping to uintptr_t in
- no changes

