From: Ingo Molnar <mingo@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: linux-perf-users@vger.kernel.org,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Mark Rutland <mark.rutland@arm.com>, Jiri Olsa <jolsa@kernel.org>
Subject: Re: [PATCH] Fix (well, work around) perf stat --null segfault
Date: Mon, 1 Dec 2025 18:54:38 +0100 [thread overview]
Message-ID: <aS3WXn4nz9GE87bM@gmail.com> (raw)
In-Reply-To: <CAP-5=fXK8HORZaFUxydEatLV1xXQTcLLBnEMeveJYDHDQPkf_w@mail.gmail.com>
* Ian Rogers <irogers@google.com> wrote:
> Agreed on the trainwreck :-) So NULL is a valid CPU map value. It
> means no CPUs, but magically no CPUs can also be converted in the code
> to become the "any" CPU value of -1. So iterating the perf_cpu_map of
> NULL will yield a single CPU of -1 :-( This behavior existed before my
> time working on the code, I've tried to make the expectation in the
> using code clearer, does the user want the magic "any" CPU behavior?
> "any" CPU is distinct from every/all CPUs which would be something
> like CPUs 0-127, and then does "all" include online and offline CPUs?
> This is something that's been discussed and I think things can be
> better, I'd rather the code was explicit whenever terms like "all",
> "online" and "any" were being used - it is very easy to confuse "all"
> and "any". I've been trying to adjust the code and make it more
> sensible with changes like:
> https://lore.kernel.org/all/20240202234057.2085863-1-irogers@google.com/
> but I agree with you that things are a mess here. At least we
> shouldn't be leaking memory (due to the reference count checking) and
> we shouldn't confuse indices and CPU numbers (due to the introduction
> of a wrapper type), problems that were pervasive and broke things like
> uncore counters frequently before.
>
> Fwiw, part of me thinks we should change the perf_cpu_map to be a
> cpu_set_t, the "any" case could be an additional boolean in the
> struct. Perhaps we can just auto change "all" to "any" as is somewhat
> done here:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n92
> Then perf_cpu_map can be a cpu_set_t and I think life will be clearer.
Yeah, so when I looked at the data structure a few
minutes ago:
/**
* A sized, reference counted, sorted array of integers representing CPU
* numbers. This is commonly used to capture which CPUs a PMU is associated
* with. The indices into the cpumap are frequently used as they avoid having
* gaps if CPU numbers were used. For events associated with a pid, rather than
* a CPU, a single dummy map with an entry of -1 is used.
*/
DECLARE_RC_STRUCT(perf_cpu_map) {
refcount_t refcnt;
/** Length of the map array. */
int nr;
/** The CPU values. */
struct perf_cpu map[];
};
My first thought was "Why isn't this a bitmask?". :-)
The thing is, even with 8192 CPUs, the mask size is 1K.
That's still not catastrophic compared to the nightmare
of managing a [8192] array...
And I would keep such core data structures as simple as
possible: a single continuous bitmask. No sparse
support, no NULL tricks. Just an always-present
bitmask, and maybe an any/all modifier flag if the ABIs
strongly suggest so.
The closer a data structure is to the core concepts of
a tool, the more important its robust KISS properties
are - at almost any runtime cost. </handwaving> :)
> Fwiw, even on the kernel side this is borked as I tried to explain in
> this RFC with 0 follow ups:
> https://lore.kernel.org/lkml/20250716223924.825772-1-irogers@google.com/
So that's a neat patch and kudos on the ASCII graphics. :-)
I'd suggest a re-send.
> Imagine the case of an event opened with 1 CPU and a TID. The filter
> will mean the counter only counts on the 1 CPU but the enabled and
> running time will mean the counter will be scaled (on your system) 128
> times!
> All of this matters when you consider different events opened in the
> same evlist and there being combined perf_cpu_maps like the evlist's
> all_cpus value.
> > - nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
> > + if (evsel_list->core.all_cpus)
> > + nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
> > + else
> > + nr = 0;
>
> So I think the better fix is to make max handle a NULL perf_cpu_map.
> That will yield "any" CPU of -1 ( :-( ) and then the -1 + 1 will give
> an empty aggregation map.
>
> I'll write this up, add the test and see if I can spot similar issues
> wrt NULL in the vicinity in the perf_cpu_map code.
Thank you! In the short run I'd prefer whichever fix
can restore perf stat --null functionality the fastest. ;-)
Thanks,
Ingo
next prev parent reply other threads:[~2025-12-01 17:54 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-30 11:43 [PATCH] Fix (well, work around) perf stat --null segfault Ingo Molnar
2025-12-01 16:52 ` Ian Rogers
2025-12-01 17:54 ` Ingo Molnar [this message]
2025-12-01 23:33 ` Ian Rogers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aS3WXn4nz9GE87bM@gmail.com \
--to=mingo@kernel.org \
--cc=acme@kernel.org \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=namhyung@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).