From: Kris Van Hees <kris.van.hees@oracle.com>
To: Eugene Loh <eugene.loh@oracle.com>
Cc: Kris Van Hees <kris.van.hees@oracle.com>,
dtrace@lists.linux.dev, dtrace-devel@oss.oracle.com
Subject: Re: [DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map
Date: Tue, 1 Apr 2025 19:03:55 -0400 [thread overview]
Message-ID: <Z+xw22FDG66lrCkN@oracle.com> (raw)
In-Reply-To: <fd3e36b0-33f6-9d4f-7997-30dae143c6e1@oracle.com>
On Tue, Apr 01, 2025 at 06:54:29PM -0400, Eugene Loh wrote:
> Here is a proposal. First, two observations:
>
> 1. (As Alan pointed out to me in a facepalm moment), one can write a simple
> D script to check enqueue_task_*()'s rq->cpu against the current CPU. He
> and I both find that the two CPUs are generally -- but not always -- the
> same. So, the strictly correct thing to do is use the rq->cpu value, even
> though you can just use the current CPU and be correct "99%" of the time.
Sort of what I expected. But nice to see it confirmed.
> 2. A BPF program can access per-cpu-array values on other CPUs. Well, I
> guess you need commit 0734311 ("bpf: add bpf_map_lookup_percpu_elem for
> percpu map"). That's in 5.18. That is, UEK9.
We oculd get that backported I bet but that doesn't help upstream. So, not
really worth asking for a backport I think.
> So my proposal is to leave the per-cpu cpuinfo BPF map alone. Perform a
> runtime test whether bpf_map_lookup_percpu_elem() is available. If so, do
> that cross-CPU lookup -- the 2/2 patch I posted -- but using the new helper
> function. If not, use a simpler on-CPU lookup, which should be right "99%"
> of the time. (I have a simple patch that uses the current CPU. Pretty
> simple.)
But... 95% correct of the time doesn't quite cut it. I could see some quite
useful case for using these probes to specifically capture the times when it
does *not* originate from the same CPU. Settling for 95% correctness seems
like an odd tradeoff to me. Especally since we can get it right 100% of the
time without too much trouble. Just convert the cpuinfo map to a regular
array map, and instead of indexing it with a 0 key all the time, index it with
the value returned by bpf_get_smp_processor_id((). Then we have code that will
work on all kernels - no special casing. And I do not see there being much
performance difference by going this route.
> On 4/1/25 01:36, Kris Van Hees wrote:
>
> > This is not the way to go about this. If, in order to implement the cpuinfo_t
> > argument to sched probes, a regular BPF array map is needed so that cpuinfo
> > data can be accessed for any given CPU id, then the existing map should be
> > replaced with the new one, and its use updated to access the new one. That
> > way you can also keep the name of the map, etc...
> >
> > Introducing this new map with exactly the same data, and then hoping to
> > deprecate the old one later is making things more messy.
> >
> > On Mon, Mar 31, 2025 at 05:45:00PM -0400, eugene.loh--- via DTrace-devel wrote:
> > > From: Eugene Loh <eugene.loh@oracle.com>
> > >
> > > The cpuinfo BPF map is a per-CPU map that has CPU information
> > > on each CPU for that CPU.
> > >
> > > Add a cpuinfos BPF map that allows any CPU to access information
> > > for any other CPU.
> > >
> > > For now, we retain the older per-CPU map. If desired, a future
> > > patch can migrate existing uses of the per-CPU map to the new
> > > map, decommissioning the old one. This would include map set up:
> > >
> > > *) libdtrace/dt_dlibs.c: DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> > >
> > > *) libdtrace/dt_impl.h: int dt_cpumap_fd;
> > >
> > > *) libdtrace/dt_bpf.c: dtp->dt_cpumap_fd = ...
> > > libdtrace/dt_bpf.c: CREATE_MAP(cpuinfo)
> > >
> > > and map use:
> > >
> > > *) bpf/get_agg.c
> > > *) bpf/get_bvar.c
> > > *) libdtrace/dt_cg.c
> > > *) libdtrace/dt_prov_lockstat.c
> > >
> > > Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> > > ---
> > > libdtrace/dt_bpf.c | 13 +++++++++++++
> > > libdtrace/dt_dlibs.c | 1 +
> > > libdtrace/dt_impl.h | 1 +
> > > 3 files changed, 15 insertions(+)
> > >
> > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > > index 6d42a96c7..8da51d6b9 100644
> > > --- a/libdtrace/dt_bpf.c
> > > +++ b/libdtrace/dt_bpf.c
> > > @@ -786,7 +786,20 @@ gmap_create_cpuinfo(dtrace_hdl_t *dtp)
> > > if (dtp->dt_cpumap_fd == -1)
> > > return -1;
> > > + dtp->dt_cpusmap_fd = create_gmap(dtp, "cpuinfos",
> > > + BPF_MAP_TYPE_HASH,
> > > + sizeof(uint32_t),
> > > + sizeof(dt_bpf_cpuinfo_t), ncpus);
> > > + if (dtp->dt_cpusmap_fd == -1)
> > > + return -1;
> > > +
> > > rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data);
> > > +
> > > + for (i = 0, ci = &conf->cpus[0]; i < ncpus && rc != -1; i++, ci++) {
> > > + key = ci->cpu_id;
> > > + rc = dt_bpf_map_update(dtp->dt_cpusmap_fd, &key, &data[ci->cpu_id]);
> > > + }
> > > +
> > > dt_free(dtp, data);
> > > if (rc == -1)
> > > return dt_bpf_error(dtp,
> > > diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
> > > index 21df22a8a..0f19f3566 100644
> > > --- a/libdtrace/dt_dlibs.c
> > > +++ b/libdtrace/dt_dlibs.c
> > > @@ -61,6 +61,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
> > > DT_BPF_SYMBOL(agggen, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(buffers, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> > > + DT_BPF_SYMBOL(cpuinfos, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(dvars, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(gvars, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
> > > diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> > > index 68fb8ec53..a5e42801c 100644
> > > --- a/libdtrace/dt_impl.h
> > > +++ b/libdtrace/dt_impl.h
> > > @@ -390,6 +390,7 @@ struct dtrace_hdl {
> > > int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
> > > int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
> > > int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
> > > + int dt_cpusmap_fd; /* file descriptor for the 'cpuinfos' BPF map */
> > > int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
> > > int dt_usdt_namesmap_fd; /* file descriptor for the 'usdt_names' BPF map */
> > > dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
> > > --
> > > 2.43.5
> > >
> > >
> > > _______________________________________________
> > > DTrace-devel mailing list
> > > DTrace-devel@oss.oracle.com
> > > https://oss.oracle.com/mailman/listinfo/dtrace-devel
next prev parent reply other threads:[~2025-04-01 23:04 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-31 21:45 [PATCH 1/2] Add a cpuinfos BPF map eugene.loh
2025-03-31 21:45 ` [PATCH 2/2] Clean up sched provider trampoline FIXMEs eugene.loh
2025-04-01 5:36 ` [DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map Kris Van Hees
2025-04-01 22:54 ` Eugene Loh
2025-04-01 23:03 ` Kris Van Hees [this message]
2025-04-02 23:50 ` Eugene Loh
2025-04-02 9:37 ` Alan Maguire
2025-04-02 19:18 ` Eugene Loh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z+xw22FDG66lrCkN@oracle.com \
--to=kris.van.hees@oracle.com \
--cc=dtrace-devel@oss.oracle.com \
--cc=dtrace@lists.linux.dev \
--cc=eugene.loh@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.