From: Kris Van Hees <kris.van.hees@oracle.com>
To: eugene.loh@oracle.com
Cc: dtrace@lists.linux.dev, dtrace-devel@oss.oracle.com
Subject: Re: [DTrace-devel] [PATCH 1/3] Cache per-CPU agg map IDs
Date: Wed, 16 Jul 2025 15:10:00 -0400 [thread overview]
Message-ID: <aHf5CFrpkCFYlz3F@oracle.com> (raw)
In-Reply-To: <20250501182252.27772-1-eugene.loh@oracle.com>
On Thu, May 01, 2025 at 02:22:50PM -0400, eugene.loh--- via DTrace-devel wrote:
> From: Eugene Loh <eugene.loh@oracle.com>
>
> The dt_bpf_map_lookup_fd(dtp->dt_aggmap_fd, &cpu) call that is used to
> snap or truncate aggregations takes a few milliseconds, which seems all
> right. For large systems (e.g., 100 CPUs) and many truncations (e.g.,
> tst.trunc.d, etc.), however, a trunc() might end up costing a minute on
> the consumer side, which is unreasonable and causes such tests to time
> out. The run time is due almost exclusively to looking up the per-CPU
> agg map ID.
Of course, we only go from 2 bpf() syscalls per call to 1 bpf() syscall per
call. Have you collected timings for the two kinds of bpf calls we use?
I would actually expect that the larger cost lies with the
bpf_map_get_fd_by_id() one rather than the bpf_map_lookup() since the
fd_by_id needs to allocate an fd and link it with a map.
In all, how bad would it be if we were to just keep the fds open for the
per-CPU aggregation data, and not having to do these bpf syscalls at all.
But we could end up with 100s of fds open of course...
> Cache the per-CPU agg map IDs.
>
> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> ---
> libdtrace/dt_aggregate.c | 5 ++---
> libdtrace/dt_bpf.c | 6 ++++++
> libdtrace/dt_impl.h | 1 +
> libdtrace/dt_open.c | 1 +
> 4 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/libdtrace/dt_aggregate.c b/libdtrace/dt_aggregate.c
> index 9e47fcab7..86f9d4d5b 100644
> --- a/libdtrace/dt_aggregate.c
> +++ b/libdtrace/dt_aggregate.c
> @@ -800,7 +800,7 @@ dtrace_aggregate_snap(dtrace_hdl_t *dtp)
>
> for (i = 0; i < dtp->dt_conf.num_online_cpus; i++) {
> int cpu = dtp->dt_conf.cpus[i].cpu_id;
> - int fd = dt_bpf_map_lookup_fd(dtp->dt_aggmap_fd, &cpu);
> + int fd = dt_bpf_map_get_fd_by_id(dtp->dt_aggmap_ids[i]);
>
> if (fd < 0)
> return DTRACE_WORKSTATUS_ERROR;
> @@ -1232,8 +1232,7 @@ dt_aggwalk_remove(dtrace_hdl_t *dtp, dt_ahashent_t *h)
> memcpy(key, agd->dtada_key, agd->dtada_desc->dtagd_ksize);
>
> for (i = 0; i < ncpus; i++) {
> - int cpu = dtp->dt_conf.cpus[i].cpu_id;
> - int fd = dt_bpf_map_lookup_fd(dtp->dt_aggmap_fd, &cpu);
> + int fd = dt_bpf_map_get_fd_by_id(dtp->dt_aggmap_ids[i]);
>
> if (fd < 0)
> return DTRACE_WORKSTATUS_ERROR;
> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> index d6722cbd1..635780738 100644
> --- a/libdtrace/dt_bpf.c
> +++ b/libdtrace/dt_bpf.c
> @@ -689,6 +689,10 @@ gmap_create_aggs(dtrace_hdl_t *dtp)
> if (dtp->dt_aggmap_fd == -1)
> return -1;
>
> + dtp->dt_aggmap_ids = dt_calloc(dtp, dtp->dt_conf.num_online_cpus, sizeof(int));
> + if (dtp->dt_aggmap_ids == NULL)
> + return dt_set_errno(dtp, EDT_NOMEM);
> +
> for (i = 0; i < dtp->dt_conf.num_online_cpus; i++) {
> int cpu = dtp->dt_conf.cpus[i].cpu_id;
> char name[16];
> @@ -702,6 +706,8 @@ gmap_create_aggs(dtrace_hdl_t *dtp)
> return map_create_error(dtp, name, errno);
>
> dt_bpf_map_update(dtp->dt_aggmap_fd, &cpu, &fd);
> + if (dt_bpf_map_lookup(dtp->dt_aggmap_fd, &cpu, &dtp->dt_aggmap_ids[i]) < 0)
> + return -1;
> }
>
> /* Create the agg generation value array. */
> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> index 68fb8ec53..1033154d9 100644
> --- a/libdtrace/dt_impl.h
> +++ b/libdtrace/dt_impl.h
> @@ -388,6 +388,7 @@ struct dtrace_hdl {
> int dt_proc_fd; /* file descriptor for proc eventfd */
> int dt_stmap_fd; /* file descriptor for the 'state' BPF map */
> int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
> + int *dt_aggmap_ids; /* ids for the 'aggN' BPF maps */
> int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
> int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
> int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> index 71ee21467..7da4c82cc 100644
> --- a/libdtrace/dt_open.c
> +++ b/libdtrace/dt_open.c
> @@ -1233,6 +1233,7 @@ dtrace_close(dtrace_hdl_t *dtp)
> dt_probe_detach_all(dtp);
>
> dt_free(dtp, dtp->dt_conf.cpus);
> + dt_free(dtp, dtp->dt_aggmap_ids);
>
> if (dtp->dt_procs != NULL)
> dt_proc_hash_destroy(dtp);
> --
> 2.43.5
>
>
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel
next prev parent reply other threads:[~2025-07-16 19:10 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-01 18:22 [PATCH 1/3] Cache per-CPU agg map IDs eugene.loh
2025-05-01 18:22 ` [PATCH 2/3] test: Convert tick-* probes to ioctl:entry for tst.trunc[quant].d eugene.loh
2025-07-16 13:21 ` Nick Alcock
2025-05-01 18:22 ` [PATCH 3/3] test: Mimic dtrace arithmetic more closely for avg/stddev eugene.loh
2025-07-16 13:22 ` [DTrace-devel] " Nick Alcock
2025-07-16 19:51 ` Eugene Loh
2025-07-16 13:20 ` [DTrace-devel] [PATCH 1/3] Cache per-CPU agg map IDs Nick Alcock
2025-07-16 18:42 ` Eugene Loh
2025-07-16 19:10 ` Kris Van Hees [this message]
2025-07-16 20:05 ` Eugene Loh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aHf5CFrpkCFYlz3F@oracle.com \
--to=kris.van.hees@oracle.com \
--cc=dtrace-devel@oss.oracle.com \
--cc=dtrace@lists.linux.dev \
--cc=eugene.loh@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.