From: Kris Van Hees <kris.van.hees@oracle.com>
To: eugene.loh@oracle.com
Cc: dtrace@lists.linux.dev, dtrace-devel@oss.oracle.com
Subject: Re: [DTrace-devel] [PATCH 1/3] Cache per-CPU agg map IDs
Date: Wed, 16 Jul 2025 15:10:00 -0400 [thread overview]
Message-ID: <aHf5CFrpkCFYlz3F@oracle.com> (raw)
In-Reply-To: <20250501182252.27772-1-eugene.loh@oracle.com>
On Thu, May 01, 2025 at 02:22:50PM -0400, eugene.loh--- via DTrace-devel wrote:
> From: Eugene Loh <eugene.loh@oracle.com>
>
> The dt_bpf_map_lookup_fd(dtp->dt_aggmap_fd, &cpu) call that is used to
> snap or truncate aggregations takes a few milliseconds, which seems all
> right. For large systems (e.g., 100 CPUs) and many truncations (e.g.,
> tst.trunc.d, etc.), however, a trunc() might end up costing a minute on
> the consumer side, which is unreasonable and causes such tests to time
> out. The run time is due almost exclusively to looking up the per-CPU
> agg map ID.
Of course, we only go from 2 bpf() syscalls per call to 1 bpf() syscall per
call. Have you collected timings for the two kinds of bpf calls we use?
I would actually expect that the larger cost lies with the
bpf_map_get_fd_by_id() one rather than the bpf_map_lookup() since the
fd_by_id needs to allocate an fd and link it with a map.
In all, how bad would it be if we were to just keep the fds open for the
per-CPU aggregation data, and not have to do these bpf() syscalls at all?
But we could end up with 100s of fds open of course...
> Cache the per-CPU agg map IDs.
>
> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> ---
> libdtrace/dt_aggregate.c | 5 ++---
> libdtrace/dt_bpf.c | 6 ++++++
> libdtrace/dt_impl.h | 1 +
> libdtrace/dt_open.c | 1 +
> 4 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/libdtrace/dt_aggregate.c b/libdtrace/dt_aggregate.c
> index 9e47fcab7..86f9d4d5b 100644
> --- a/libdtrace/dt_aggregate.c
> +++ b/libdtrace/dt_aggregate.c
> @@ -800,7 +800,7 @@ dtrace_aggregate_snap(dtrace_hdl_t *dtp)
>
> for (i = 0; i < dtp->dt_conf.num_online_cpus; i++) {
> int cpu = dtp->dt_conf.cpus[i].cpu_id;
> - int fd = dt_bpf_map_lookup_fd(dtp->dt_aggmap_fd, &cpu);
> + int fd = dt_bpf_map_get_fd_by_id(dtp->dt_aggmap_ids[i]);
>
> if (fd < 0)
> return DTRACE_WORKSTATUS_ERROR;
> @@ -1232,8 +1232,7 @@ dt_aggwalk_remove(dtrace_hdl_t *dtp, dt_ahashent_t *h)
> memcpy(key, agd->dtada_key, agd->dtada_desc->dtagd_ksize);
>
> for (i = 0; i < ncpus; i++) {
> - int cpu = dtp->dt_conf.cpus[i].cpu_id;
> - int fd = dt_bpf_map_lookup_fd(dtp->dt_aggmap_fd, &cpu);
> + int fd = dt_bpf_map_get_fd_by_id(dtp->dt_aggmap_ids[i]);
>
> if (fd < 0)
> return DTRACE_WORKSTATUS_ERROR;
> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> index d6722cbd1..635780738 100644
> --- a/libdtrace/dt_bpf.c
> +++ b/libdtrace/dt_bpf.c
> @@ -689,6 +689,10 @@ gmap_create_aggs(dtrace_hdl_t *dtp)
> if (dtp->dt_aggmap_fd == -1)
> return -1;
>
> + dtp->dt_aggmap_ids = dt_calloc(dtp, dtp->dt_conf.num_online_cpus, sizeof(int));
> + if (dtp->dt_aggmap_ids == NULL)
> + return dt_set_errno(dtp, EDT_NOMEM);
> +
> for (i = 0; i < dtp->dt_conf.num_online_cpus; i++) {
> int cpu = dtp->dt_conf.cpus[i].cpu_id;
> char name[16];
> @@ -702,6 +706,8 @@ gmap_create_aggs(dtrace_hdl_t *dtp)
> return map_create_error(dtp, name, errno);
>
> dt_bpf_map_update(dtp->dt_aggmap_fd, &cpu, &fd);
> + if (dt_bpf_map_lookup(dtp->dt_aggmap_fd, &cpu, &dtp->dt_aggmap_ids[i]) < 0)
> + return -1;
> }
>
> /* Create the agg generation value array. */
> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> index 68fb8ec53..1033154d9 100644
> --- a/libdtrace/dt_impl.h
> +++ b/libdtrace/dt_impl.h
> @@ -388,6 +388,7 @@ struct dtrace_hdl {
> int dt_proc_fd; /* file descriptor for proc eventfd */
> int dt_stmap_fd; /* file descriptor for the 'state' BPF map */
> int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
> + int *dt_aggmap_ids; /* ids for the 'aggN' BPF maps */
> int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
> int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
> int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> index 71ee21467..7da4c82cc 100644
> --- a/libdtrace/dt_open.c
> +++ b/libdtrace/dt_open.c
> @@ -1233,6 +1233,7 @@ dtrace_close(dtrace_hdl_t *dtp)
> dt_probe_detach_all(dtp);
>
> dt_free(dtp, dtp->dt_conf.cpus);
> + dt_free(dtp, dtp->dt_aggmap_ids);
>
> if (dtp->dt_procs != NULL)
> dt_proc_hash_destroy(dtp);
> --
> 2.43.5
>
>
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel
2025-05-01 18:22 [PATCH 1/3] Cache per-CPU agg map IDs eugene.loh
2025-05-01 18:22 ` [PATCH 2/3] test: Convert tick-* probes to ioctl:entry for tst.trunc[quant].d eugene.loh
2025-07-16 13:21 ` Nick Alcock
2025-05-01 18:22 ` [PATCH 3/3] test: Mimic dtrace arithmetic more closely for avg/stddev eugene.loh
2025-07-16 13:22 ` [DTrace-devel] " Nick Alcock
2025-07-16 19:51 ` Eugene Loh
2025-07-16 13:20 ` [DTrace-devel] [PATCH 1/3] Cache per-CPU agg map IDs Nick Alcock
2025-07-16 18:42 ` Eugene Loh
2025-07-16 19:10 ` Kris Van Hees [this message]
2025-07-16 20:05 ` Eugene Loh