[PATCH] Fix (well, work around) perf stat --null segfault

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] Fix (well, work around) perf stat --null segfault
@ 2025-11-30 11:43 Ingo Molnar
  2025-12-01 16:52 ` Ian Rogers
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2025-11-30 11:43 UTC (permalink / raw)
  To: linux-perf-users, Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
	Ian Rogers

JFYI:

  starship:~> perf stat --null ls
  Segmentation fault (core dumped)


  Program received signal SIGSEGV, Segmentation fault.
  0x000055555563596e in __perf_cpu_map__nr (cpus=<error reading variable: Cannot access memory at address 0x0>) at cpumap.c:257
  257		return RC_CHK_ACCESS(cpus)->nr;
  (gdb) bt
  #0  0x000055555563596e in __perf_cpu_map__nr (cpus=<error reading variable: Cannot access memory at address 0x0>) at cpumap.c:257
  #1  perf_cpu_map__max (map=0x0) at cpumap.c:372
  #2  0x00005555555cc249 in perf_stat_init_aggr_mode ()
  #3  0x00005555555d1963 in cmd_stat ()
  #4  0x0000555555631bb2 in run_builtin ()
  #5  0x0000555555631e9e in handle_internal_command ()
  #6  0x00005555555ab183 in main ()
  (gdb) 

  (gdb) print cpus
  Cannot access memory at address 0x0
  (gdb) bt
  #0  0x000055555563596e in __perf_cpu_map__nr (cpus=<error reading variable: Cannot access memory at address 0x0>) at cpumap.c:257
  #1  perf_cpu_map__max (map=0x0) at cpumap.c:372
  #2  0x00005555555cc249 in perf_stat_init_aggr_mode ()
  #3  0x00005555555d1963 in cmd_stat ()
  #4  0x0000555555631bb2 in run_builtin ()
  #5  0x0000555555631e9e in handle_internal_command ()
  #6  0x00005555555ab183 in main ()
  (gdb) 

For some build(?) reason I cannot set breakpoints 
inside builtin-stat.c:

  (gdb) b builtin-stat.c:1545
  No source file named builtin-stat.c.

... and didn't investigate any further.

System has 128 CPUs.

It's been broken for some time:

  starship:~/tip> git bisect bad
  ced4c249569ab25c32b0d36e2ebdb19c74394bdf is the first bad commit
  commit ced4c249569ab25c32b0d36e2ebdb19c74394bdf (HEAD)
  Author: Ian Rogers <irogers@google.com>
  Date:   Fri Jul 18 20:05:05 2025 -0700

    perf stat: Don't size aggregation ids from user_requested_cpus
    
    As evsels may have additional CPU terms, the user_requested_cpus may
    not reflect all the CPUs requested. Use evlist->all_cpus to size the
    array as that reflects all the CPUs potentially needed by the evlist.
    
    Reviewed-by: Thomas Falcon <thomas.falcon@intel.com>
    Signed-off-by: Ian Rogers <irogers@google.com>
    Tested-by: James Clark <james.clark@linaro.org>
    Link: https://lore.kernel.org/r/20250719030517.1990983-4-irogers@google.com
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>


The bug is the assumption that 
evsel_list->core.all_cpus exists at this stage.

The attached patch works this around like the old 
version, but there's more things wrong with this code:

BTW., I couldn't even find where all_cpus pointer gets 
allocated. It gets zalloc()-ed to NULL, and cleared 
back to NULL in two places:

  lib/perf/evlist.c:	evlist->all_cpus = NULL;
  lib/perf/evlist.c:	evlist->all_cpus = NULL;

But where does it get allocated? perf_cpu_map__nr() 
hides it in most cases that ->cpus_all is always NULL:

  int perf_cpu_map__nr(const struct perf_cpu_map *cpus)
  {
        return cpus ? __perf_cpu_map__nr(cpus) : 1;
  }

Which in general is a code robustness red flag, for 
something this fundamental I doubt there should be any 
conditionality to this pointer. Allowing cpus pointers 
to be NULL and hiding it in the helpers hid a real bug 
for months.

So this code looks like a bit of a trainwreck. ;-)

In addition to fixing the bug, it might also make sense 
to add a testcase for 'perf stat --null'? (which is a 
better, more accurate 'time' utility)

As an additional report and side note, I still get this 
bogus message during a build on standard Ubuntu:

 Makefile.config:1151: No openjdk development package found, please install JDK package, e.g. openjdk-8-jdk, java-latest-openjdk-devel

The suggested "e.g. openjdk-8-jdk" package *is* 
installed:

 starship:~/tip> dpkg -l | grep openjdk
 ii  openjdk-11-jre:amd64                                        11.0.29+7-1ubuntu1~25.10                            amd64        OpenJDK Java runtime, using Hotspot JIT
 ii  openjdk-11-jre-headless:amd64                               11.0.29+7-1ubuntu1~25.10                            amd64        OpenJDK Java runtime, using Hotspot JIT (headless)
 ii  openjdk-17-jre:amd64                                        17.0.17+10-1~25.10                                  amd64        OpenJDK Java runtime, using Hotspot JIT
 ii  openjdk-17-jre-headless:amd64                               17.0.17+10-1~25.10                                  amd64        OpenJDK Java runtime, using Hotspot JIT (headless)
 ii  openjdk-21-jre:amd64                                        21.0.9+10-1~25.10                                   amd64        OpenJDK Java runtime, using Hotspot JIT
 ii  openjdk-21-jre-headless:amd64                               21.0.9+10-1~25.10                                   amd64        OpenJDK Java runtime, using Hotspot JIT (headless)
 ii  openjdk-8-jdk:amd64                                         8u472-ga-1~25.10                                    amd64        OpenJDK Development Kit (JDK)
 ii  openjdk-8-jdk-headless:amd64                                8u472-ga-1~25.10                                    amd64        OpenJDK Development Kit (JDK) (headless)
 ii  openjdk-8-jre:amd64                                         8u472-ga-1~25.10                                    amd64        OpenJDK Java runtime, using Hotspot JIT
 ii  openjdk-8-jre-headless:amd64                                8u472-ga-1~25.10                                    amd64        OpenJDK Java runtime, using Hotspot JIT (headless)

Thanks,

	Ingo

====================>

Signed-off-by: Ingo Molnar <mingo@kernel.org>

 tools/perf/builtin-stat.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7006f848f87a..973b72218df7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1542,7 +1542,10 @@ static int perf_stat_init_aggr_mode(void)
 	 * taking the highest cpu number to be the size of
 	 * the aggregation translate cpumap.
 	 */
-	nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
+	if (evsel_list->core.all_cpus)
+		nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
+	else
+		nr = 0;
 	stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr);
 	return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
 }

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix (well, work around) perf stat --null segfault
  2025-11-30 11:43 [PATCH] Fix (well, work around) perf stat --null segfault Ingo Molnar
@ 2025-12-01 16:52 ` Ian Rogers
  2025-12-01 17:54   ` Ingo Molnar
  0 siblings, 1 reply; 4+ messages in thread
From: Ian Rogers @ 2025-12-01 16:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-perf-users, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Jiri Olsa

On Sun, Nov 30, 2025 at 3:43 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> JFYI:
>
>   starship:~> perf stat --null ls
>   Segmentation fault (core dumped)
>
>   Program received signal SIGSEGV, Segmentation fault.
>   0x000055555563596e in __perf_cpu_map__nr (cpus=<error reading variable: Cannot access memory at address 0x0>) at cpumap.c:257
>   257           return RC_CHK_ACCESS(cpus)->nr;
>   (gdb) bt
>   #0  0x000055555563596e in __perf_cpu_map__nr (cpus=<error reading variable: Cannot access memory at address 0x0>) at cpumap.c:257
>   #1  perf_cpu_map__max (map=0x0) at cpumap.c:372
>   #2  0x00005555555cc249 in perf_stat_init_aggr_mode ()
>   #3  0x00005555555d1963 in cmd_stat ()
>   #4  0x0000555555631bb2 in run_builtin ()
>   #5  0x0000555555631e9e in handle_internal_command ()
>   #6  0x00005555555ab183 in main ()
>   (gdb)
>
>   (gdb) print cpus
>   Cannot access memory at address 0x0
>   (gdb) bt
>   #0  0x000055555563596e in __perf_cpu_map__nr (cpus=<error reading variable: Cannot access memory at address 0x0>) at cpumap.c:257
>   #1  perf_cpu_map__max (map=0x0) at cpumap.c:372
>   #2  0x00005555555cc249 in perf_stat_init_aggr_mode ()
>   #3  0x00005555555d1963 in cmd_stat ()
>   #4  0x0000555555631bb2 in run_builtin ()
>   #5  0x0000555555631e9e in handle_internal_command ()
>   #6  0x00005555555ab183 in main ()
>   (gdb)
>
> For some build(?) reason I cannot set breakpoints
> inside builtin-stat.c:
>
>   (gdb) b builtin-stat.c:1545
>   No source file named builtin-stat.c.
>
> ... and didn't investigate any further.
>
> System has 128 CPUs.
>
> It's been broken for some time:
>
>   starship:~/tip> git bisect bad
>   ced4c249569ab25c32b0d36e2ebdb19c74394bdf is the first bad commit
>   commit ced4c249569ab25c32b0d36e2ebdb19c74394bdf (HEAD)
>   Author: Ian Rogers <irogers@google.com>
>   Date:   Fri Jul 18 20:05:05 2025 -0700
>
>     perf stat: Don't size aggregation ids from user_requested_cpus
>
>     As evsels may have additional CPU terms, the user_requested_cpus may
>     not reflect all the CPUs requested. Use evlist->all_cpus to size the
>     array as that reflects all the CPUs potentially needed by the evlist.
>
>     Reviewed-by: Thomas Falcon <thomas.falcon@intel.com>
>     Signed-off-by: Ian Rogers <irogers@google.com>
>     Tested-by: James Clark <james.clark@linaro.org>
>     Link: https://lore.kernel.org/r/20250719030517.1990983-4-irogers@google.com
>     Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>
>
> The bug is the assumption that
> evsel_list->core.all_cpus exists at this stage.
>
> The attached patch works this around like the old
> version, but there's more things wrong with this code:
>
> BTW., I couldn't even find where all_cpus pointer gets
> allocated. It gets zalloc()-ed to NULL, and cleared
> back to NULL in two places:
>
>   lib/perf/evlist.c:    evlist->all_cpus = NULL;
>   lib/perf/evlist.c:    evlist->all_cpus = NULL;
>
> But where does it get allocated? perf_cpu_map__nr()
> hides it in most cases that ->cpus_all is always NULL:
>
>   int perf_cpu_map__nr(const struct perf_cpu_map *cpus)
>   {
>         return cpus ? __perf_cpu_map__nr(cpus) : 1;
>   }
>
> Which in general is a code robustness red flag, for
> something this fundamental I doubt there should be any
> conditionality to this pointer. Allowing cpus pointers
> to be NULL and hiding it in the helpers hid a real bug
> for months.
>
> So this code looks like a bit of a trainwreck. ;-)

Agreed on the trainwreck :-) So NULL is a valid CPU map value. It
means no CPUs, but magically no CPUs can also be converted in the code
to become the "any" CPU value of -1. So iterating the perf_cpu_map of
NULL will yield a single CPU of -1 :-( This behavior existed before my
time working on the code, I've tried to make the expectation in the
using code clearer, does the user want the magic "any" CPU behavior?
"any" CPU is distinct from every/all CPUs which would be something
like CPUs 0-127, and then does "all" include online and offline CPUs?
This is something that's been discussed and I think things can be
better, I'd rather the code was explicit whenever terms like "all",
"online" and "any" were being used - it is very easy to confuse "all"
and "any". I've been trying to adjust the code and make it more
sensible with changes like:
https://lore.kernel.org/all/20240202234057.2085863-1-irogers@google.com/
but I agree with you that things are a mess here. At least we
shouldn't be leaking memory (due to the reference count checking) and
we shouldn't confuse indices and CPU numbers (due to the introduction
of a wrapper type), problems that were pervasive and broke things like
uncore counters frequently before.

Fwiw, part of me thinks we should change the perf_cpu_map to be a
cpu_set_t, the "any" case could be an additional boolean in the
struct. Perhaps we can just auto change "all" to "any" as is somewhat
done here:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n92
Then perf_cpu_map can be a cpu_set_t and I think life will be clearer.

Fwiw, even on the kernel side this is borked as I tried to explain in
this RFC with 0 follow ups:
https://lore.kernel.org/lkml/20250716223924.825772-1-irogers@google.com/
Imagine the case of an event opened with 1 CPU and a TID. The filter
will mean the counter only counts on the 1 CPU but the enabled and
running time will mean the counter will be scaled (on your system) 128
times!
All of this matters when you consider different events opened in the
same evlist and there being combined perf_cpu_maps like the evlist's
all_cpus value.

> In addition to fixing the bug, it might also make sense
> to add a testcase for 'perf stat --null'? (which is a
> better, more accurate 'time' utility)

Yup. A good place would be:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/stat.sh?h=perf-tools-next

> As an additional report and side note, I still get this
> bogus message during a build on standard Ubuntu:
>
>  Makefile.config:1151: No openjdk development package found, please install JDK package, e.g. openjdk-8-jdk, java-latest-openjdk-devel
>
> The suggested "e.g. openjdk-8-jdk" package *is*
> installed:
>
>  starship:~/tip> dpkg -l | grep openjdk
>  ii  openjdk-11-jre:amd64                                        11.0.29+7-1ubuntu1~25.10                            amd64        OpenJDK Java runtime, using Hotspot JIT
>  ii  openjdk-11-jre-headless:amd64                               11.0.29+7-1ubuntu1~25.10                            amd64        OpenJDK Java runtime, using Hotspot JIT (headless)
>  ii  openjdk-17-jre:amd64                                        17.0.17+10-1~25.10                                  amd64        OpenJDK Java runtime, using Hotspot JIT
>  ii  openjdk-17-jre-headless:amd64                               17.0.17+10-1~25.10                                  amd64        OpenJDK Java runtime, using Hotspot JIT (headless)
>  ii  openjdk-21-jre:amd64                                        21.0.9+10-1~25.10                                   amd64        OpenJDK Java runtime, using Hotspot JIT
>  ii  openjdk-21-jre-headless:amd64                               21.0.9+10-1~25.10                                   amd64        OpenJDK Java runtime, using Hotspot JIT (headless)
>  ii  openjdk-8-jdk:amd64                                         8u472-ga-1~25.10                                    amd64        OpenJDK Development Kit (JDK)
>  ii  openjdk-8-jdk-headless:amd64                                8u472-ga-1~25.10                                    amd64        OpenJDK Development Kit (JDK) (headless)
>  ii  openjdk-8-jre:amd64                                         8u472-ga-1~25.10                                    amd64        OpenJDK Java runtime, using Hotspot JIT
>  ii  openjdk-8-jre-headless:amd64                                8u472-ga-1~25.10                                    amd64        OpenJDK Java runtime, using Hotspot JIT (headless)

I'll let someone else worry about this for now.

> Thanks,
>
>         Ingo
>
> ====================>
>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>
>  tools/perf/builtin-stat.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 7006f848f87a..973b72218df7 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1542,7 +1542,10 @@ static int perf_stat_init_aggr_mode(void)
>          * taking the highest cpu number to be the size of
>          * the aggregation translate cpumap.
>          */
> -       nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
> +       if (evsel_list->core.all_cpus)
> +               nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
> +       else
> +               nr = 0;

So I think the better fix is to make max handle a NULL perf_cpu_map.
That will yield "any" CPU of -1 ( :-( ) and then the -1 + 1 will give
an empty aggregation map.

I'll write this up, add the test and see if I can spot similar issues
wrt NULL in the vicinity in the perf_cpu_map code.

Thanks,
Ian

>         stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr);
>         return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
>  }

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix (well, work around) perf stat --null segfault
  2025-12-01 16:52 ` Ian Rogers
@ 2025-12-01 17:54   ` Ingo Molnar
  2025-12-01 23:33     ` Ian Rogers
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2025-12-01 17:54 UTC (permalink / raw)
  To: Ian Rogers
  Cc: linux-perf-users, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Jiri Olsa


* Ian Rogers <irogers@google.com> wrote:

> Agreed on the trainwreck :-) So NULL is a valid CPU map value. It
> means no CPUs, but magically no CPUs can also be converted in the code
> to become the "any" CPU value of -1. So iterating the perf_cpu_map of
> NULL will yield a single CPU of -1 :-( This behavior existed before my
> time working on the code, I've tried to make the expectation in the
> using code clearer, does the user want the magic "any" CPU behavior?
> "any" CPU is distinct from every/all CPUs which would be something
> like CPUs 0-127, and then does "all" include online and offline CPUs?
> This is something that's been discussed and I think things can be
> better, I'd rather the code was explicit whenever terms like "all",
> "online" and "any" were being used - it is very easy to confuse "all"
> and "any". I've been trying to adjust the code and make it more
> sensible with changes like:
> https://lore.kernel.org/all/20240202234057.2085863-1-irogers@google.com/
> but I agree with you that things are a mess here. At least we
> shouldn't be leaking memory (due to the reference count checking) and
> we shouldn't confuse indices and CPU numbers (due to the introduction
> of a wrapper type), problems that were pervasive and broke things like
> uncore counters frequently before.
> 
> Fwiw, part of me thinks we should change the perf_cpu_map to be a
> cpu_set_t, the "any" case could be an additional boolean in the
> struct. Perhaps we can just auto change "all" to "any" as is somewhat
> done here:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n92
> Then perf_cpu_map can be a cpu_set_t and I think life will be clearer.

Yeah, so when I looked at the data structure a few 
minutes ago:

	/**
	 * A sized, reference counted, sorted array of integers representing CPU
	 * numbers. This is commonly used to capture which CPUs a PMU is associated
	 * with. The indices into the cpumap are frequently used as they avoid having
	 * gaps if CPU numbers were used. For events associated with a pid, rather than
	 * a CPU, a single dummy map with an entry of -1 is used.
	 */
	DECLARE_RC_STRUCT(perf_cpu_map) {
		refcount_t	refcnt;
		/** Length of the map array. */
		int		nr;
		/** The CPU values. */
		struct perf_cpu	map[];
	};


My first thought was "Why isn't this a bitmask?". :-)

The thing is, even with 8192 CPUs, the mask size is 1K. 
That's still not catastrophic compared to the nightmare 
of managing a [8192] array...

And I would keep such core data structures as simple as 
possible: a single continuous bitmask. No sparse 
support, no NULL tricks. Just an always-present 
bitmask, and maybe an any/all modifier flag if the ABIs 
strongly suggest so.

The closer a data structure is to the core concepts of 
a tool, the more important its robust KISS properties 
are - at almost any runtime cost. </handwaving> :)

> Fwiw, even on the kernel side this is borked as I tried to explain in
> this RFC with 0 follow ups:
> https://lore.kernel.org/lkml/20250716223924.825772-1-irogers@google.com/

So that's a neat patch and kudos on the ASCII graphics. :-)

I'd suggest a re-send.

> Imagine the case of an event opened with 1 CPU and a TID. The filter
> will mean the counter only counts on the 1 CPU but the enabled and
> running time will mean the counter will be scaled (on your system) 128
> times!
> All of this matters when you consider different events opened in the
> same evlist and there being combined perf_cpu_maps like the evlist's
> all_cpus value.

> > -       nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
> > +       if (evsel_list->core.all_cpus)
> > +               nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
> > +       else
> > +               nr = 0;
> 
> So I think the better fix is to make max handle a NULL perf_cpu_map.
> That will yield "any" CPU of -1 ( :-( ) and then the -1 + 1 will give
> an empty aggregation map.
> 
> I'll write this up, add the test and see if I can spot similar issues
> wrt NULL in the vicinity in the perf_cpu_map code.

Thank you! In the short run I'd prefer whichever fix 
can restore perf stat --null functionality the fastest. ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix (well, work around) perf stat --null segfault
  2025-12-01 17:54   ` Ingo Molnar
@ 2025-12-01 23:33     ` Ian Rogers
  0 siblings, 0 replies; 4+ messages in thread
From: Ian Rogers @ 2025-12-01 23:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-perf-users, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Jiri Olsa

On Mon, Dec 1, 2025 at 9:54 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ian Rogers <irogers@google.com> wrote:
>
> > Agreed on the trainwreck :-) So NULL is a valid CPU map value. It
> > means no CPUs, but magically no CPUs can also be converted in the code
> > to become the "any" CPU value of -1. So iterating the perf_cpu_map of
> > NULL will yield a single CPU of -1 :-( This behavior existed before my
> > time working on the code, I've tried to make the expectation in the
> > using code clearer, does the user want the magic "any" CPU behavior?
> > "any" CPU is distinct from every/all CPUs which would be something
> > like CPUs 0-127, and then does "all" include online and offline CPUs?
> > This is something that's been discussed and I think things can be
> > better, I'd rather the code was explicit whenever terms like "all",
> > "online" and "any" were being used - it is very easy to confuse "all"
> > and "any". I've been trying to adjust the code and make it more
> > sensible with changes like:
> > https://lore.kernel.org/all/20240202234057.2085863-1-irogers@google.com/
> > but I agree with you that things are a mess here. At least we
> > shouldn't be leaking memory (due to the reference count checking) and
> > we shouldn't confuse indices and CPU numbers (due to the introduction
> > of a wrapper type), problems that were pervasive and broke things like
> > uncore counters frequently before.
> >
> > Fwiw, part of me thinks we should change the perf_cpu_map to be a
> > cpu_set_t, the "any" case could be an additional boolean in the
> > struct. Perhaps we can just auto change "all" to "any" as is somewhat
> > done here:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n92
> > Then perf_cpu_map can be a cpu_set_t and I think life will be clearer.
>
> Yeah, so when I looked at the data structure a few
> minutes ago:
>
>         /**
>          * A sized, reference counted, sorted array of integers representing CPU
>          * numbers. This is commonly used to capture which CPUs a PMU is associated
>          * with. The indices into the cpumap are frequently used as they avoid having
>          * gaps if CPU numbers were used. For events associated with a pid, rather than
>          * a CPU, a single dummy map with an entry of -1 is used.
>          */
>         DECLARE_RC_STRUCT(perf_cpu_map) {
>                 refcount_t      refcnt;
>                 /** Length of the map array. */
>                 int             nr;
>                 /** The CPU values. */
>                 struct perf_cpu map[];
>         };
>
>
> My first thought was "Why isn't this a bitmask?". :-)
>
> The thing is, even with 8192 CPUs, the mask size is 1K.
> That's still not catastrophic compared to the nightmare
> of managing a [8192] array...
>
> And I would keep such core data structures as simple as
> possible: a single continuous bitmask. No sparse
> support, no NULL tricks. Just an always-present
> bitmask, and maybe an any/all modifier flag if the ABIs
> strongly suggest so.
>
> The closer a data structure is to the core concepts of
> a tool, the more important its robust KISS properties
> are - at almost any runtime cost. </handwaving> :)

Agreed. Some more things I can think of:

- the perf vs libperf separation is kind of redundant. The API of
libperf isn't good enough (no event parsing), nor stable and because
the license is GPLv2 doesn't benefit external users. It is a chore
making changes in two places and I think we should just focus on
making the python API the extension point.

- the /sys/bus/event_source/devices/<pmu>/cpumask aren't consistent in
whether there are hot plugged CPUs in there. Kan cleaned this up but
only as far as Intel PMUs:
https://lore.kernel.org/all/20240802151643.1691631-1-kan.liang@linux.intel.com/

- ARM PMUs ignore hot plugging leading to a scary intersect:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/arm/util/pmu.c?h=perf-tools-next#n42

But we did manage to make event encoding/displaying consistent with:
https://lore.kernel.org/lkml/20251005182430.2791371-1-irogers@google.com/
something I will no doubt get emails about in this v6.19 merge window :-)

> > Fwiw, even on the kernel side this is borked as I tried to explain in
> > this RFC with 0 follow ups:
> > https://lore.kernel.org/lkml/20250716223924.825772-1-irogers@google.com/
>
> So that's a neat patch and kudos on the ASCII graphics. :-)
>
> I'd suggest a re-send.

Agreed. I think it needs a refresh, I'll try to find the time.

> > Imagine the case of an event opened with 1 CPU and a TID. The filter
> > will mean the counter only counts on the 1 CPU but the enabled and
> > running time will mean the counter will be scaled (on your system) 128
> > times!
> > All of this matters when you consider different events opened in the
> > same evlist and there being combined perf_cpu_maps like the evlist's
> > all_cpus value.
>
> > > -       nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
> > > +       if (evsel_list->core.all_cpus)
> > > +               nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
> > > +       else
> > > +               nr = 0;
> >
> > So I think the better fix is to make max handle a NULL perf_cpu_map.
> > That will yield "any" CPU of -1 ( :-( ) and then the -1 + 1 will give
> > an empty aggregation map.
> >
> > I'll write this up, add the test and see if I can spot similar issues
> > wrt NULL in the vicinity in the perf_cpu_map code.
>
> Thank you! In the short run I'd prefer whichever fix
> can restore perf stat --null functionality the fastest. ;-)

Done in the series:
https://lore.kernel.org/lkml/20251201230904.290733-1-irogers@google.com/

Thanks,
Ian

> Thanks,
>
>         Ingo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-12-01 23:33 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-30 11:43 [PATCH] Fix (well, work around) perf stat --null segfault Ingo Molnar
2025-12-01 16:52 ` Ian Rogers
2025-12-01 17:54   ` Ingo Molnar
2025-12-01 23:33     ` Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).