From: Kris Van Hees <kris.van.hees@oracle.com>
To: Eugene Loh <eugene.loh@oracle.com>
Cc: Kris Van Hees <kris.van.hees@oracle.com>,
dtrace@lists.linux.dev, dtrace-devel@oss.oracle.com
Subject: Re: [DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map
Date: Tue, 1 Apr 2025 19:03:55 -0400 [thread overview]
Message-ID: <Z+xw22FDG66lrCkN@oracle.com> (raw)
In-Reply-To: <fd3e36b0-33f6-9d4f-7997-30dae143c6e1@oracle.com>
On Tue, Apr 01, 2025 at 06:54:29PM -0400, Eugene Loh wrote:
> Here is a proposal. First, two observations:
>
> 1. (As Alan pointed out to me in a facepalm moment), one can write a simple
> D script to check enqueue_task_*()'s rq->cpu against the current CPU. He
> and I both find that the two CPUs are generally -- but not always -- the
> same. So, the strictly correct thing to do is use the rq->cpu value, even
> though you can just use the current CPU and be correct "99%" of the time.
Sort of what I expected. But nice to see it confirmed.
> 2. A BPF program can access per-cpu-array values on other CPUs. Well, I
> guess you need commit 0734311 ("bpf: add bpf_map_lookup_percpu_elem for
> percpu map"). That's in 5.18. That is, UEK9.
We oculd get that backported I bet but that doesn't help upstream. So, not
really worth asking for a backport I think.
> So my proposal is to leave the per-cpu cpuinfo BPF map alone. Perform a
> runtime test whether bpf_map_lookup_percpu_elem() is available. If so, do
> that cross-CPU lookup -- the 2/2 patch I posted -- but using the new helper
> function. If not, use a simpler on-CPU lookup, which should be right "99%"
> of the time. (I have a simple patch that uses the current CPU. Pretty
> simple.)
But... 95% correct of the time doesn't quite cut it. I could see some quite
useful case for using these probes to specifically capture the times when it
does *not* originate from the same CPU. Settling for 95% correctness seems
like an odd tradeoff to me. Especally since we can get it right 100% of the
time without too much trouble. Just convert the cpuinfo map to a regular
array map, and instead of indexing it with a 0 key all the time, index it with
the value returned by bpf_get_smp_processor_id((). Then we have code that will
work on all kernels - no special casing. And I do not see there being much
performance difference by going this route.
> On 4/1/25 01:36, Kris Van Hees wrote:
>
> > This is not the way to go about this. If, in order to implement the cpuinfo_t
> > argument to sched probes, a regular BPF array map is needed so that cpuinfo
> > data can be accessed for any given CPU id, then the existing map should be
> > replaced with the new one, and its use updated to access the new one. That
> > way you can also keep the name of the map, etc...
> >
> > Introducing this new map with exactly the same data, and then hoping to
> > deprecate the old one later is making things more messy.
> >
> > On Mon, Mar 31, 2025 at 05:45:00PM -0400, eugene.loh--- via DTrace-devel wrote:
> > > From: Eugene Loh <eugene.loh@oracle.com>
> > >
> > > The cpuinfo BPF map is a per-CPU map that has CPU information
> > > on each CPU for that CPU.
> > >
> > > Add a cpuinfos BPF map that allows any CPU to access information
> > > for any other CPU.
> > >
> > > For now, we retain the older per-CPU map. If desired, a future
> > > patch can migrate existing uses of the per-CPU map to the new
> > > map, decommissioning the old one. This would include map set up:
> > >
> > > *) libdtrace/dt_dlibs.c: DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> > >
> > > *) libdtrace/dt_impl.h: int dt_cpumap_fd;
> > >
> > > *) libdtrace/dt_bpf.c: dtp->dt_cpumap_fd = ...
> > > libdtrace/dt_bpf.c: CREATE_MAP(cpuinfo)
> > >
> > > and map use:
> > >
> > > *) bpf/get_agg.c
> > > *) bpf/get_bvar.c
> > > *) libdtrace/dt_cg.c
> > > *) libdtrace/dt_prov_lockstat.c
> > >
> > > Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> > > ---
> > > libdtrace/dt_bpf.c | 13 +++++++++++++
> > > libdtrace/dt_dlibs.c | 1 +
> > > libdtrace/dt_impl.h | 1 +
> > > 3 files changed, 15 insertions(+)
> > >
> > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > > index 6d42a96c7..8da51d6b9 100644
> > > --- a/libdtrace/dt_bpf.c
> > > +++ b/libdtrace/dt_bpf.c
> > > @@ -786,7 +786,20 @@ gmap_create_cpuinfo(dtrace_hdl_t *dtp)
> > > if (dtp->dt_cpumap_fd == -1)
> > > return -1;
> > > + dtp->dt_cpusmap_fd = create_gmap(dtp, "cpuinfos",
> > > + BPF_MAP_TYPE_HASH,
> > > + sizeof(uint32_t),
> > > + sizeof(dt_bpf_cpuinfo_t), ncpus);
> > > + if (dtp->dt_cpusmap_fd == -1)
> > > + return -1;
> > > +
> > > rc = dt_bpf_map_update(dtp->dt_cpumap_fd, &key, data);
> > > +
> > > + for (i = 0, ci = &conf->cpus[0]; i < ncpus && rc != -1; i++, ci++) {
> > > + key = ci->cpu_id;
> > > + rc = dt_bpf_map_update(dtp->dt_cpusmap_fd, &key, &data[ci->cpu_id]);
> > > + }
> > > +
> > > dt_free(dtp, data);
> > > if (rc == -1)
> > > return dt_bpf_error(dtp,
> > > diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
> > > index 21df22a8a..0f19f3566 100644
> > > --- a/libdtrace/dt_dlibs.c
> > > +++ b/libdtrace/dt_dlibs.c
> > > @@ -61,6 +61,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
> > > DT_BPF_SYMBOL(agggen, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(buffers, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(cpuinfo, DT_IDENT_PTR),
> > > + DT_BPF_SYMBOL(cpuinfos, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(dvars, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(gvars, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
> > > diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> > > index 68fb8ec53..a5e42801c 100644
> > > --- a/libdtrace/dt_impl.h
> > > +++ b/libdtrace/dt_impl.h
> > > @@ -390,6 +390,7 @@ struct dtrace_hdl {
> > > int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
> > > int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
> > > int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
> > > + int dt_cpusmap_fd; /* file descriptor for the 'cpuinfos' BPF map */
> > > int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
> > > int dt_usdt_namesmap_fd; /* file descriptor for the 'usdt_names' BPF map */
> > > dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
> > > --
> > > 2.43.5
> > >
> > >
> > > _______________________________________________
> > > DTrace-devel mailing list
> > > DTrace-devel@oss.oracle.com
> > > https://oss.oracle.com/mailman/listinfo/dtrace-devel
next prev parent reply other threads:[~2025-04-01 23:04 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-31 21:45 [PATCH 1/2] Add a cpuinfos BPF map eugene.loh
2025-03-31 21:45 ` [PATCH 2/2] Clean up sched provider trampoline FIXMEs eugene.loh
2025-04-01 5:36 ` [DTrace-devel] [PATCH 1/2] Add a cpuinfos BPF map Kris Van Hees
2025-04-01 22:54 ` Eugene Loh
2025-04-01 23:03 ` Kris Van Hees [this message]
2025-04-02 23:50 ` Eugene Loh
2025-04-02 9:37 ` Alan Maguire
2025-04-02 19:18 ` Eugene Loh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z+xw22FDG66lrCkN@oracle.com \
--to=kris.van.hees@oracle.com \
--cc=dtrace-devel@oss.oracle.com \
--cc=dtrace@lists.linux.dev \
--cc=eugene.loh@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox