Linux Perf Users
From: Athira Rajeev <atrajeev@linux.ibm.com>
To: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	James Clark <james.clark@linaro.org>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	Venkat <venkat88@linux.ibm.com>
Subject: Re: [PATCH v1] perf sched stats: Fix segmentation faults in diff mode
Date: Wed, 29 Apr 2026 19:31:32 +0530	[thread overview]
Message-ID: <2238F7E3-D50A-449A-B295-C3C996162FBD@linux.ibm.com> (raw)
In-Reply-To: <20260428070811.1883202-1-irogers@google.com>



> On 28 Apr 2026, at 12:38 PM, Ian Rogers <irogers@google.com> wrote:
> 
> Address several segmentation fault vectors in `perf sched stats diff`:
> 
> 1. When processing invalid or empty data files, the CPU domain maps may
>   be NULL. Added NULL checks for `cd_map1` and `cd_map2` in
>   `show_schedstat_data()` to fail gracefully.
> 
> 2. When files contain a different number of CPUs or domains, the parallel
>   list iteration in `show_schedstat_data()` could wrap around the list
>   heads and dereference invalid memory. Added `list_is_last` checks
>   to safely terminate iteration at the end of each list.
> 
> 3. When summarizing CPU statistics in `get_all_cpu_stats()`, parallel list
>   iteration over domains could similarly wrap around if a CPU has more
>   domains than the first CPU. Added `list_is_last` check to prevent this.
> 
> 4. Added bounds checks for `cs1->cpu` and `cs2->cpu` against `nr1` and
>   `nr2` (passed from `env->nr_cpus_avail`) to prevent out-of-bounds
>   reads from `cd_map1` and `cd_map2` when processing data from machines
>   with different CPU configurations.
> 
> 5. Added NULL checks for `cd_info1` and `cd_info2` in `show_schedstat_data()`
>   to prevent crashes when a CPU has samples in the data file but no
>   corresponding domain info in the header (which leaves the map entry NULL).
> 
> 6. Added NULL checks for `dinfo1` and `dinfo2` in `show_schedstat_data()`
>   to prevent crashes when a domain is present in the list but has no
>   corresponding info in the CPU domain map (which leaves the entry NULL).
> 
> 7. Zero-initialized the `perf_data` array in `perf_sched__schedstat_diff()`
>   to prevent stack garbage from causing `perf_data_file__fd()` to attempt
>   to use a NULL `fptr` when `use_stdio` happened to be non-zero.
> 
> Assisted-by: Gemini:gemini-3.1-pro-preview
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> Previously this patch was part of a large perf script refactor:
> https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/
> ---
> tools/perf/builtin-sched.c | 95 +++++++++++++++++++++++++++++++-------
> 1 file changed, 79 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 555247568e7a..9ecda631b3ab 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -4212,12 +4212,20 @@ static int get_all_cpu_stats(struct list_head *head)
> 
> cnt++;
> summarize_schedstat_cpu(summary_head, cptr, cnt, is_last);
> - tdptr = list_first_entry(&summary_head->domain_head, struct schedstat_domain,
> - domain_list);
> + if (!list_empty(&summary_head->domain_head))
> + tdptr = list_first_entry(&summary_head->domain_head, struct schedstat_domain,
> + domain_list);
> + else
> + tdptr = NULL;
> 
> list_for_each_entry(dptr, &cptr->domain_head, domain_list) {
> - summarize_schedstat_domain(tdptr, dptr, cnt, is_last);
> - tdptr = list_next_entry(tdptr, domain_list);
> + if (tdptr) {
> + summarize_schedstat_domain(tdptr, dptr, cnt, is_last);
> + if (list_is_last(&tdptr->domain_list, &summary_head->domain_head))
> + tdptr = NULL;
> + else
> + tdptr = list_next_entry(tdptr, domain_list);
> + }
> }
> }
> 
> @@ -4225,8 +4233,8 @@ static int get_all_cpu_stats(struct list_head *head)
> return ret;
> }
> 
> -static int show_schedstat_data(struct list_head *head1, struct cpu_domain_map **cd_map1,
> -       struct list_head *head2, struct cpu_domain_map **cd_map2,
> +static int show_schedstat_data(struct list_head *head1, struct cpu_domain_map **cd_map1, int nr1,
> +       struct list_head *head2, struct cpu_domain_map **cd_map2, int nr2,
>       bool summary_only)
> {
> struct schedstat_cpu *cptr1 = list_first_entry(head1, struct schedstat_cpu, cpu_list);
> @@ -4238,6 +4246,15 @@ static int show_schedstat_data(struct list_head *head1, struct cpu_domain_map **
> bool is_summary = true;
> int ret = 0;
> 
> + if (!cd_map1) {
> + pr_err("Error: CPU domain map 1 is missing.\n");
> + return -1;
> + }
> + if (head2 && !cd_map2) {
> + pr_err("Error: CPU domain map 2 is missing.\n");
> + return -1;
> + }
> +
> printf("Description\n");
> print_separator2(SEP_LEN, "", 0);
> printf("%-30s-> %s\n", "DESC", "Description of the field");
> @@ -4269,12 +4286,33 @@ static int show_schedstat_data(struct list_head *head1, struct cpu_domain_map **
> struct cpu_domain_map *cd_info1 = NULL, *cd_info2 = NULL;
> 
> cs1 = cptr1->cpu_data;
> + cs2 = NULL;
> + dptr2 = NULL;
> + if (cs1->cpu >= (u32)nr1) {
> + pr_err("Error: CPU %d exceeds domain map size %d\n", cs1->cpu, nr1);
> + return -1;
> + }
> cd_info1 = cd_map1[cs1->cpu];
> + if (!cd_info1) {
> + pr_err("Error: CPU %d domain info is missing in map 1.\n", cs1->cpu);
> + return -1;
> + }
> if (cptr2) {
> cs2 = cptr2->cpu_data;
> + if (cs2->cpu >= (u32)nr2) {
> + pr_err("Error: CPU %d exceeds domain map size %d\n", cs2->cpu, nr2);
> + return -1;
> + }
> cd_info2 = cd_map2[cs2->cpu];
> - dptr2 = list_first_entry(&cptr2->domain_head, struct schedstat_domain,
> - domain_list);
> + if (!cd_info2) {
> + pr_err("Error: CPU %d domain info is missing in map 2.\n", cs2->cpu);
> + return -1;
> + }
> + if (!list_empty(&cptr2->domain_head))
> + dptr2 = list_first_entry(&cptr2->domain_head, struct schedstat_domain,
> + domain_list);
> + else
> + dptr2 = NULL;
> }
> 
> if (cs2 && cs1->cpu != cs2->cpu) {
> @@ -4302,10 +4340,26 @@ static int show_schedstat_data(struct list_head *head1, struct cpu_domain_map **
> struct domain_info *dinfo1 = NULL, *dinfo2 = NULL;
> 
> ds1 = dptr1->domain_data;
> + if (ds1->domain >= cd_info1->nr_domains) {
> + pr_err("Error: Domain %d exceeds max domains %d for CPU %d in map 1.\n", ds1->domain, cd_info1->nr_domains, cs1->cpu);
> + return -1;
> + }
> dinfo1 = cd_info1->domains[ds1->domain];
> + if (!dinfo1) {
> + pr_err("Error: Domain %d info is missing for CPU %d in map 1.\n", ds1->domain, cs1->cpu);
> + return -1;
> + }
> if (dptr2) {
> ds2 = dptr2->domain_data;
> + if (ds2->domain >= cd_info2->nr_domains) {
> + pr_err("Error: Domain %d exceeds max domains %d for CPU %d in map 2.\n", ds2->domain, cd_info2->nr_domains, cs2->cpu);
> + return -1;
> + }
> dinfo2 = cd_info2->domains[ds2->domain];
> + if (!dinfo2) {
> + pr_err("Error: Domain %d info is missing for CPU %d in map 2.\n", ds2->domain, cs2->cpu);
> + return -1;
> + }
> }
> 
> if (dinfo2 && dinfo1->domain != dinfo2->domain) {
> @@ -4334,14 +4388,22 @@ static int show_schedstat_data(struct list_head *head1, struct cpu_domain_map **
> print_domain_stats(ds1, ds2, jiffies1, jiffies2);
> print_separator2(SEP_LEN, "", 0);
> 
> - if (dptr2)
> - dptr2 = list_next_entry(dptr2, domain_list);
> + if (dptr2) {
> + if (list_is_last(&dptr2->domain_list, &cptr2->domain_head))
> + dptr2 = NULL;
> + else
> + dptr2 = list_next_entry(dptr2, domain_list);
> + }
> }
> if (summary_only)
> break;
> 
> - if (cptr2)
> - cptr2 = list_next_entry(cptr2, cpu_list);
> + if (cptr2) {
> + if (list_is_last(&cptr2->cpu_list, head2))
> + cptr2 = NULL;
> + else
> + cptr2 = list_next_entry(cptr2, cpu_list);
> + }
> 
> is_summary = false;
> }
> @@ -4523,7 +4585,7 @@ static int perf_sched__schedstat_report(struct perf_sched *sched)
> }
> 
> cd_map = session->header.env.cpu_domain;
> - err = show_schedstat_data(&cpu_head, cd_map, NULL, NULL, false);
> + err = show_schedstat_data(&cpu_head, cd_map, session->header.env.nr_cpus_avail, NULL, NULL, 0, false);
> }
> 
> out:
> @@ -4538,7 +4600,7 @@ static int perf_sched__schedstat_diff(struct perf_sched *sched,
> struct cpu_domain_map **cd_map0 = NULL, **cd_map1 = NULL;
> struct list_head cpu_head_ses0, cpu_head_ses1;
> struct perf_session *session[2];
> - struct perf_data data[2];
> + struct perf_data data[2] = {0};

Hi Ian

Thanks for these fixes.

I had sent a fix for just the perf_data initialization change above: https://lore.kernel.org/linux-perf-users/20260422173545.73144-1-atrajeev@linux.ibm.com/
Since that change is already covered in this patch, I tested with this patch and verified it covers the fix.

Before the patch:

# for i in {0..20}; do ./perf test "perf sched stats tests"; done
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : FAILED!
 92: perf sched stats tests                                          : FAILED!
 92: perf sched stats tests                                          : FAILED!


After the patch:

# for i in {0..20}; do ./perf test "perf sched stats tests"; done
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok
 92: perf sched stats tests                                          : Ok


Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>

Thanks
Athira
> int ret = 0, err = 0;
> static const char *defaults[] = {
> "perf.data.old",
> @@ -4610,7 +4672,8 @@ static int perf_sched__schedstat_diff(struct perf_sched *sched,
> goto out_delete_ses0;
> }
> 
> - show_schedstat_data(&cpu_head_ses0, cd_map0, &cpu_head_ses1, cd_map1, true);
> + show_schedstat_data(&cpu_head_ses0, cd_map0, session[0]->header.env.nr_cpus_avail,
> +    &cpu_head_ses1, cd_map1, session[1]->header.env.nr_cpus_avail, true);
> 
> out_delete_ses1:
> free_schedstat(&cpu_head_ses1);
> @@ -4720,7 +4783,7 @@ static int perf_sched__schedstat_live(struct perf_sched *sched,
> goto out;
> }
> 
> - show_schedstat_data(&cpu_head, cd_map, NULL, NULL, false);
> + show_schedstat_data(&cpu_head, cd_map, nr, NULL, NULL, 0, false);
> free_cpu_domain_info(cd_map, sv, nr);
> out:
> free_schedstat(&cpu_head);
> -- 
> 2.54.0.545.g6539524ca2-goog
> 



Thread overview: 14+ messages
2026-04-28  7:08 [PATCH v1] perf sched stats: Fix segmentation faults in diff mode Ian Rogers
2026-04-29 12:30 ` James Clark
2026-04-29 14:01 ` Athira Rajeev [this message]
2026-04-29 17:39 ` [PATCH v2] " Ian Rogers
2026-04-29 20:18   ` sashiko-bot
2026-04-30 13:29   ` James Clark
2026-05-01 15:16   ` Athira Rajeev
2026-05-06  0:00   ` [PATCH v3] perf sched stats: Fix segmentation faults, memory leaks, and stale pointers " Ian Rogers
2026-05-06  0:28     ` sashiko-bot
2026-05-06  4:10     ` [PATCH v4] " Ian Rogers
2026-05-06  6:45       ` Ian Rogers
2026-05-06 13:03         ` Arnaldo Carvalho de Melo
2026-05-11 16:20           ` Ian Rogers
2026-05-06  9:07       ` James Clark
