public inbox for dtrace@lists.linux.dev
* [PATCH] bpf: allocate the buffers BPF map to fit highest CPU id
@ 2025-12-11 22:22 Kris Van Hees
  2025-12-11 23:16 ` [DTrace-devel] " Eugene Loh
  0 siblings, 1 reply; 6+ messages in thread
From: Kris Van Hees @ 2025-12-11 22:22 UTC (permalink / raw)
  To: dtrace, dtrace-devel

Even when less than the possible number of CPUs are online, the 'buffers'
BPF map should be allocated based on the highest possible CPU id because
probe data is written to the bufer that corresponds to a given CPU id,
which could be part of non-sequential CPU id configurations.

Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
---
 libdtrace/dt_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
index 0a57b7d2..6568a572 100644
--- a/libdtrace/dt_bpf.c
+++ b/libdtrace/dt_bpf.c
@@ -755,7 +755,7 @@ gmap_create_buffers(dtrace_hdl_t *dtp)
 {
 	return create_gmap(dtp, "buffers", BPF_MAP_TYPE_PERF_EVENT_ARRAY,
 			   sizeof(uint32_t), sizeof(uint32_t),
-			   dtp->dt_conf.num_online_cpus);
+			   dtp->dt_conf.max_cpuid);
 }
 
 /*
-- 
2.43.5



* Re: [DTrace-devel] [PATCH] bpf: allocate the buffers BPF map to fit highest CPU id
  2025-12-11 22:22 [PATCH] bpf: allocate the buffers BPF map to fit highest CPU id Kris Van Hees
@ 2025-12-11 23:16 ` Eugene Loh
  2025-12-12  4:36   ` Kris Van Hees
  0 siblings, 1 reply; 6+ messages in thread
From: Eugene Loh @ 2025-12-11 23:16 UTC (permalink / raw)
  To: Kris Van Hees, dtrace, dtrace-devel

On 12/11/25 17:22, Kris Van Hees via DTrace-devel wrote:

> Even when less than the possible number of CPUs are online, the 'buffers'
> BPF map should be allocated based on the highest possible CPU id because
> probe data is written to the bufer

s/bufer/buffer/

I guess a problem here is that we do not test this case.  It would be 
hard, but I suppose we should change this situation?

What are the chances of our at least writing out what is per online CPU 
and what is per possible CPU id?  That way, a future developer would 
(hopefully) not have to reverse engineer this stuff.

It seems to me that this patch is an improvement over the status quo.  
On the other hand, given the limitations on what we're doing -- notably, 
the lack of regular, automated testing -- that should probably at least 
be acknowledged in the commit message.

> that corresponds to a given CPU id,
> which could be part of non-sequential CPU id configurations.
>
> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> ---
>   libdtrace/dt_bpf.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> index 0a57b7d2..6568a572 100644
> --- a/libdtrace/dt_bpf.c
> +++ b/libdtrace/dt_bpf.c
> @@ -755,7 +755,7 @@ gmap_create_buffers(dtrace_hdl_t *dtp)
>   {
>   	return create_gmap(dtp, "buffers", BPF_MAP_TYPE_PERF_EVENT_ARRAY,
>   			   sizeof(uint32_t), sizeof(uint32_t),
> -			   dtp->dt_conf.num_online_cpus);
> +			   dtp->dt_conf.max_cpuid);
>   }
>   
>   /*


* Re: [DTrace-devel] [PATCH] bpf: allocate the buffers BPF map to fit highest CPU id
  2025-12-11 23:16 ` [DTrace-devel] " Eugene Loh
@ 2025-12-12  4:36   ` Kris Van Hees
  2025-12-12  5:14     ` Eugene Loh
  0 siblings, 1 reply; 6+ messages in thread
From: Kris Van Hees @ 2025-12-12  4:36 UTC (permalink / raw)
  To: Eugene Loh; +Cc: Kris Van Hees, dtrace, dtrace-devel

On Thu, Dec 11, 2025 at 06:16:37PM -0500, Eugene Loh wrote:
> On 12/11/25 17:22, Kris Van Hees via DTrace-devel wrote:
> 
> > Even when less than the possible number of CPUs are online, the 'buffers'
> > BPF map should be allocated based on the highest possible CPU id because
> > probe data is written to the bufer
> 
> s/bufer/buffer/

Thanks.

> I guess a problem here is that we do not test this case.  It would be hard,
> but I suppose we should change this situation?

I am not entirely sure we *can* properly test these specialized configs.  The
problem came to light on a configuration that conveniently demonstrated the
problem because of a large gap of unavailable CPUs (0-127 possible, but only
0-7,100-127 were online).  Because in that situation, it is quite likely that
the CPU on which the events we are interested in occur falls beyond the end
of the buffers BPF map.

> What are the chances of our at least writing out what is per online CPU and
> what is per possible CPU id?  That way, a future developer would (hopefully)
> not have to reverse engineer this stuff.

What do you mean?  What is not clear about the current implementation?  I
do not think that there was an issue here of things not being documented
sufficiently - I simply made a silly mistake in the code.

> It seems to me that this patch is an improvement over the status quo.  On
> the other hand, given the limitations on what we're doing -- notably, the
> lack of regular, automated testing -- that should probably at least be
> acknowledged in the commit message.

I again do not know what you mean by this.  It is not an improvement over the
status quo; it is a fix for a genuine bug.

I also do not think it is fair to say that we lack regular, automated testing.
That is something we actually *are* doing (i.e. the build-and-test robots we
use internally).  This is a situation that has not been detected until now
primarily because we don't have all unusual configurations available for
testing at all times.  A system that has this gap of CPU ids in the list of
online CPUs is not common.  We can look into building a system just for that,
if we can, but it is the type of situation that hardly any automated testing
would uncover.  You simply cannot test everything.

I can add more details to the commit message about what the exact problem is
that is being solved.

> > that corresponds to a given CPU id,
> > which could be part of non-sequential CPU id configurations.
> > 
> > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> > ---
> >   libdtrace/dt_bpf.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > index 0a57b7d2..6568a572 100644
> > --- a/libdtrace/dt_bpf.c
> > +++ b/libdtrace/dt_bpf.c
> > @@ -755,7 +755,7 @@ gmap_create_buffers(dtrace_hdl_t *dtp)
> >   {
> >   	return create_gmap(dtp, "buffers", BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> >   			   sizeof(uint32_t), sizeof(uint32_t),
> > -			   dtp->dt_conf.num_online_cpus);
> > +			   dtp->dt_conf.max_cpuid);
> >   }
> >   /*


* Re: [DTrace-devel] [PATCH] bpf: allocate the buffers BPF map to fit highest CPU id
  2025-12-12  4:36   ` Kris Van Hees
@ 2025-12-12  5:14     ` Eugene Loh
  2025-12-12  5:34       ` Kris Van Hees
  0 siblings, 1 reply; 6+ messages in thread
From: Eugene Loh @ 2025-12-12  5:14 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: dtrace, dtrace-devel

On 12/11/25 23:36, Kris Van Hees wrote:

> On Thu, Dec 11, 2025 at 06:16:37PM -0500, Eugene Loh wrote:
>> On 12/11/25 17:22, Kris Van Hees via DTrace-devel wrote:
>>
>>> Even when less than the possible number of CPUs are online, the 'buffers'
>>> BPF map should be allocated based on the highest possible CPU id because
>>> probe data is written to the bufer
>> I guess a problem here is that we do not test this case.  It would be hard,
>> but I suppose we should change this situation?
> I am not entirely sure we *can* properly test these specialized configs.  The
> problem came to light on a configuration that conveniently demonstrated the
> problem because of a large gap of unavailable CPUs (0-127 possible, but only
> 0-7,100-127 were online).  Because in that situation, it is quite likely that
> the CPU on which the events we are interested in occur falls beyond the end
> of the buffers BPF map.

I think at some level we agree that it is possible to test but that it 
can be hard to do so (automated, regular, etc.).

>> What are the chances of our at least writing out what is per online CPU and
>> what is per possible CPU id?  That way, a future developer would (hopefully)
>> not have to reverse engineer this stuff.
> What do you mean?  What is not clear about the current implementation?  I
> do not think that there was an issue here of things not being documented
> sufficiently - I simply made a silly mistake in the code.

Maybe I just get easily confused, but it seems to me someone needs to 
reason through the code carefully to track what is per on-line-cpu 
index, what is per cpu id, etc.  E.g., the "aggs" map is made with ncpus 
(which in that case is max_cpuid+1) and the "cpuinfo" map is made with 
ncpus (which in that case is num_online_cpus).  So for someone like me, 
it'd be nice to say what I should index with what.

IIUC, there is no documentation, let alone safeguard, that these per-CPU 
things are being addressed correctly.  Instead, we are to read the code 
and reason locally about whether that is right or not.  And then we do 
not (regularly) test the whole thing?  That just feels brittle.

One thing I do when I look at a patch is ask myself whether a problem 
that's being fixed needs to be fixed in some other locations as well.  I 
didn't do that here since that's time-consuming and tedious.  I guess 
it's a question of how thorough one wants to be. E.g., "We fixed a bug, 
so we're done."  Or, "We found a problem that could occur in very many 
places, so let's check them all."  Or, "We got bitten by this problem, 
so we really need to test for it."  Or, "We only have so much time, so 
we just need to move on and hope for the best without thorough code 
inspection or regular testing." There's a judgment call to be made.  
Certainly fixing one bug is good.

>
>> It seems to me that this patch is an improvement over the status quo.  On
>> the other hand, given the limitations on what we're doing -- notably, the
>> lack of regular, automated testing -- that should probably at least be
>> acknowledged in the commit message.
> I again do not know what you mean by this.  It is not an improvement over the
> status quo; it is a fix for a genuine bug.
>
> I also do not think it is fair to say that we lack regular, automated testing.
> That is something we actually *are* doing (i.e. the build-and-test robots we
> use internally).  This is a situation that has not been detected until now
> primarily because we don't have all unusual configurations available for
> testing at all times.  A system that has this gap of CPU ids in the list of
> online CPUs is not common.  We can look into building a system just for that,
> if we can, but it is the type of situation that hardly any automated testing
> would uncover.  You simply cannot test everything.

Is there regular automated testing on systems with gaps in the CPU
numbering, stressing the cases where the code's handling of gaps is
exercised?

> I can add more details to the commit message about what the exact problem is
> that is being solved.
>
>>> that corresponds to a given CPU id,
>>> which could be part of non-sequential CPU id configurations.
>>>
>>> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
>>> ---
>>>    libdtrace/dt_bpf.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
>>> index 0a57b7d2..6568a572 100644
>>> --- a/libdtrace/dt_bpf.c
>>> +++ b/libdtrace/dt_bpf.c
>>> @@ -755,7 +755,7 @@ gmap_create_buffers(dtrace_hdl_t *dtp)
>>>    {
>>>    	return create_gmap(dtp, "buffers", BPF_MAP_TYPE_PERF_EVENT_ARRAY,
>>>    			   sizeof(uint32_t), sizeof(uint32_t),
>>> -			   dtp->dt_conf.num_online_cpus);
>>> +			   dtp->dt_conf.max_cpuid);
>>>    }
>>>    /*


* Re: [DTrace-devel] [PATCH] bpf: allocate the buffers BPF map to fit highest CPU id
  2025-12-12  5:14     ` Eugene Loh
@ 2025-12-12  5:34       ` Kris Van Hees
  2025-12-12 19:16         ` Eugene Loh
  0 siblings, 1 reply; 6+ messages in thread
From: Kris Van Hees @ 2025-12-12  5:34 UTC (permalink / raw)
  To: Eugene Loh; +Cc: Kris Van Hees, dtrace, dtrace-devel

On Fri, Dec 12, 2025 at 12:14:20AM -0500, Eugene Loh wrote:
> On 12/11/25 23:36, Kris Van Hees wrote:
> 
> > On Thu, Dec 11, 2025 at 06:16:37PM -0500, Eugene Loh wrote:
> > > On 12/11/25 17:22, Kris Van Hees via DTrace-devel wrote:
> > > 
> > > > Even when less than the possible number of CPUs are online, the 'buffers'
> > > > BPF map should be allocated based on the highest possible CPU id because
> > > > probe data is written to the bufer
> > > I guess a problem here is that we do not test this case.  It would be hard,
> > > but I suppose we should change this situation?
> > I am not entirely sure we *can* properly test these specialized configs.  The
> > problem came to light on a configuration that conveniently demonstrated the
> > problem because of a large gap of unavailable CPUs (0-127 possible, but only
> > 0-7,100-127 were online).  Because in that situation, it is quite likely that
> > the CPU on which the events we are interested in occur falls beyond the end
> > of the buffers BPF map.
> 
> I think at some level we agree that it is possible to test but that it can
> be hard to do so (automated, regular, etc.).

Yes, but quite difficult and very specialized.

> > > What are the chances of our at least writing out what is per online CPU and
> > > what is per possible CPU id?  That way, a future developer would (hopefully)
> > > not have to reverse engineer this stuff.
> > What do you mean?  What is not clear about the current implementation?  I
> > do not think that there was an issue here of things not being documented
> > sufficiently - I simply made a silly mistake in the code.
> 
> Maybe I just get easily confused, but it seems to me someone needs to reason
> through the code carefully to track what is per on-line-cpu index, what is
> per cpu id, etc.  E.g., the "aggs" map is made with ncpus (which in that
> case is max_cpuid+1) and the "cpuinfo" map is made with ncpus (which in that
> case is num_online_cpus).  So for someone like me, it'd be nice to say what
> I should index with what.
> 
> IIUC, there is no documentation, let alone safeguard, that these per-CPU
> things are being addressed correctly.  Instead, we are to read the code and
> reason locally about whether that is right or not.  And then we do not
> (regularly) test the whole thing?  That just feels brittle.

Well, yes and no.  We have BPF array maps and we have BPF hash maps.  We always
index CPU-specific information by CPU id.  When we index an array, the array
needs to have sufficient slots to support indexing by the highest CPU id.  When
we index a hash map, it needs to have sufficient slots to support the *number*
of online CPUs.  That is where the difference lies.  It is not really an
aspect of our code, but rather the nature of array vs hashtable.

> One thing I do when I look at a patch is ask myself whether a problem that's
> being fixed needs to be fixed in some other locations as well.  I didn't do
> that here since that's time-consuming and tedious.  I guess it's a question
> of how thorough one wants to be. E.g., "We fixed a bug, so we're done."  Or,
> "We found a problem that could occur in very many places, so let's check
> them all."  Or, "We got bitten by this problem, so we really need to test
> for it."  Or, "We only have so much time, so we just need to move on and
> hope for the best without thorough code inspection or regular testing."
> There's a judgment call to be made.  Certainly fixing one bug is good.

Whenever I fix a bug, I do the same :)

> > > It seems to me that this patch is an improvement over the status quo.  On
> > > the other hand, given the limitations on what we're doing -- notably, the
> > > lack of regular, automated testing -- that should probably at least be
> > > acknowledged in the commit message.
> > I again do not know what you mean by this.  It is not an improvement over the
> > status quo; it is a fix for a genuine bug.
> > 
> > I also do not think it is fair to say that we lack regular, automated testing.
> > That is something we actually *are* doing (i.e. the build-and-test robots we
> > use internally).  This is a situation that has not been detected until now
> > primarily because we don't have all unusual configurations available for
> > testing at all times.  A system that has this gap of CPU ids in the list of
> > online CPUs is not common.  We can look into building a system just for that,
> > if we can, but it is the type of situation that hardly any automated testing
> > would uncover.  You simply cannot test everything.
> 
> Is there regular automated testing on systems with gaps in the CPU
> numbering, stressing the cases where the code's handling of gaps is
> exercised?

No, there is not, simply because we do not have a system with that kind
of CPU configuration.  Similarly, we do not have systems that have CPU configs
with multiple gaps in the list of online CPUs, or have just non-sequential
CPU ids (no ranges), or have a mix of singular CPU ids and ranges, etc...

We can always improve testing, but this is the type of bug that got uncovered
because it was in an area that is very hard to test (or at a minimum, very
resource-expensive to test consistently and for all possible configurations),
and someone happened to try DTrace in that configuration.

So yes, *if* we had tested DTrace in such unusual configurations, we might have
caught this.  It is unfortunate that we did not.  But that will be the reality
forever - we simply cannot test everything in all cases.  There is no guarantee
of ever having conclusive testing.  That is not a reason not to try to do our
best - but I think we ought to be realistic and not beat ourselves up over not
having caught this.  Those things happen, and even if we had tested this, there
would be a fair chance of *not* triggering it.
Now that we know the situation, it is easy to see what was wrong and think of
ways to test it.  But until you know there is a problem, it is quite tricky to
come up with all scenarios that need testing, right?

> > I can add more details to the commit message about what the exact problem is
> > that is being solved.
> > 
> > > > that corresponds to a given CPU id,
> > > > which could be part of non-sequential CPU id configurations.
> > > > 
> > > > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> > > > ---
> > > >    libdtrace/dt_bpf.c | 2 +-
> > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > > > index 0a57b7d2..6568a572 100644
> > > > --- a/libdtrace/dt_bpf.c
> > > > +++ b/libdtrace/dt_bpf.c
> > > > @@ -755,7 +755,7 @@ gmap_create_buffers(dtrace_hdl_t *dtp)
> > > >    {
> > > >    	return create_gmap(dtp, "buffers", BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> > > >    			   sizeof(uint32_t), sizeof(uint32_t),
> > > > -			   dtp->dt_conf.num_online_cpus);
> > > > +			   dtp->dt_conf.max_cpuid);
> > > >    }
> > > >    /*


* Re: [DTrace-devel] [PATCH] bpf: allocate the buffers BPF map to fit highest CPU id
  2025-12-12  5:34       ` Kris Van Hees
@ 2025-12-12 19:16         ` Eugene Loh
  0 siblings, 0 replies; 6+ messages in thread
From: Eugene Loh @ 2025-12-12 19:16 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: dtrace, dtrace-devel

This patch is clearly a step in the right direction (fixes a bug), and I 
do not want perfection to be the enemy of good here. So, if you're 
willing to add some remarks to the commit message that describe some of 
the issues we're discussing here -- and, quite frankly, even if you're 
not -- I'm good with this patch.

And...

On 12/12/25 00:34, Kris Van Hees wrote:

> On Fri, Dec 12, 2025 at 12:14:20AM -0500, Eugene Loh wrote:
>> On 12/11/25 23:36, Kris Van Hees wrote:
>>> I am not entirely sure we *can* properly test these specialized configs.  The
>>> problem came to light on a configuration that conveniently demonstrated the
>>> problem because of a large gap of unavailable CPUs (0-127 possible, but only
>>> 0-7,100-127 were online).  Because in that situation, it is quite likely that
>>> the CPU on which the events we are interested in occur falls beyond the end
>>> of the buffers BPF map.
>> I think at some level we agree that it is possible to test but that it can
>> be hard to do so (automated, regular, etc.).
> Yes, but quite difficult and very specialized.

Difficult?  It's outside of our normal routine, but we can get access to 
such systems and we could devise tests.  The problem is not that such 
testing is more challenging than other things we have to do, but only 
that we have too many other things to do.

Specialized?  The distinction between on-line and possible CPUs shows up 
in a number of places in the code.

Anyhow, yeah, let's try our best *and* not beat ourselves up on this one.

And, of course, thanks for fixing this problem.

