Re: [Patch v7 24/24] perf/x86/intel: Add sanity check for PEBS fragment size

Sashiko discussions

From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: sashiko@lists.linux.dev
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [Patch v7 24/24] perf/x86/intel: Add sanity check for PEBS fragment size
Date: Wed, 29 Apr 2026 15:04:16 +0800
Message-ID: <fdb06f4b-d6d4-4bbe-9861-e8aa52e7b053@linux.intel.com>
In-Reply-To: <20260425035339.8032DC2BCB2@smtp.kernel.org>


On 4/25/2026 11:53 AM, sashiko-bot@kernel.org wrote:
> Sashiko AI review found 3 potential issue(s):
> - [Critical] Skipping setup_sample() for corrupted records does not discard the record, but instead leaks stale register states to userspace.
> - [High] The global `corrupted` flag falsely invalidates valid pending samples for unrelated PMU counters.
> - [Medium] The corruption checks fail to flag records with out-of-bounds final fragments or zero-sized fragments as corrupted.
> --
>
> commit ed1fa3eb36814252a303cf5b2e6f3f4bfd3ed81e
> Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
>
> perf/x86/intel: Add sanity check for PEBS fragment size
>
> This commit adds a sanity check to prevent potential infinite loops
> caused by corrupted PEBS fragment sizes. It intends to discard
> corrupted PEBS records and all subsequent records.
>
>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>> index 19f96d9ae04b5..e7bc6a37a3575 100644
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -2951,13 +2951,17 @@ __intel_pmu_pebs_last_event(struct perf_event *event,
>>  			    struct pt_regs *iregs,
>>  			    struct pt_regs *regs,
>>  			    struct perf_sample_data *data,
>> -			    void *at,
>> -			    int count,
>> +			    void *at, int count, bool corrupted,
>>  			    setup_fn setup_sample)
>>  {
>>  	struct hw_perf_event *hwc = &event->hw;
>>  
>> -	setup_sample(event, iregs, at, data, regs);
>> +	/* Skip parsing corrupted PEBS record. */
>> +	if (corrupted)
>> +		perf_sample_data_init(data, 0, event->hw.last_period);
>> +	else
>> +		setup_sample(event, iregs, at, data, regs);
>> +
> The commit message states that corrupted PEBS records will be discarded, but
> skipping setup_sample() still allows the execution to continue to
> perf_event_output() or perf_event_overflow() located further down in this
> function.
>
> Since regs points to a per-CPU x86_pebs_regs, and it is not updated with
> the current hardware context when setup_sample() is skipped, does this emit
> stale register states from previously processed PEBS records into the
> userspace ring buffer?

Yes. We need to call x86_pmu_clear_perf_regs() in the corrupted case so
that no stale register state is emitted.
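A minimal user-space model of the corrupted path, assuming simplified
stand-in types. struct fake_regs plays the role of the per-CPU
x86_pebs_regs snapshot, and clear_regs() stands in for
x86_pmu_clear_perf_regs(), whose real signature is not reproduced here.
All names below are hypothetical. The sketch only illustrates the
ordering: when parsing is skipped, the shared registers are wiped before
the sample is emitted, so nothing from an earlier record reaches the
ring buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the per-CPU x86_pebs_regs snapshot. */
struct fake_regs {
	unsigned long ip;
	unsigned long sp;
	unsigned long ax;
};

/* Hypothetical stand-in for the sample written to the ring buffer. */
struct fake_sample {
	unsigned long period;
	struct fake_regs regs;
};

/* Models x86_pmu_clear_perf_regs(): drop state left by earlier records. */
static void clear_regs(struct fake_regs *regs)
{
	memset(regs, 0, sizeof(*regs));
}

/* Models setup_sample(): fill the shared regs from a valid record. */
static void parse_record(const struct fake_regs *record, struct fake_regs *regs)
{
	*regs = *record;
}

/*
 * Models the last-event path: a corrupted record is not parsed, but a
 * sample is still emitted, so the shared regs must be cleared first.
 */
static struct fake_sample emit_last_sample(const struct fake_regs *record,
					   struct fake_regs *regs,
					   unsigned long last_period,
					   bool corrupted)
{
	struct fake_sample s;

	if (corrupted)
		clear_regs(regs);
	else
		parse_record(record, regs);

	s.period = last_period;
	s.regs = *regs;
	return s;
}
```

In the model, a valid record followed by a corrupted one yields a second
sample with all-zero registers rather than a copy of the first record's.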


>
> [ ... ]
>
>> @@ -3391,6 +3399,10 @@ static int intel_pmu_drain_arch_pebs(struct pt_regs *iregs,
>>  			if (!header->size)
>>  				break;
> If a fragment has header->size == 0, the loop breaks and reaches done: with
> corrupted == false.
>
> Does this miss flagging malformed zero-sized fragments as corrupted, allowing
> them to be parsed and potentially causing out-of-bounds reads?

No. If the size is 0, setup_arch_pebs_sample_data() also breaks out of its
loop, so there should be no out-of-bounds reads.
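A hypothetical model of a per-record parser that stops at a zero-sized
header. struct frag_hdr and parse_one_record() are illustrative only and
are not the actual setup_arch_pebs_sample_data() code or the real
arch-PEBS header layout. Because the cursor only advances by a non-zero
size and every header read is bounds-checked, a zero size ends the walk
instead of spinning forever or reading past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment header; not the real arch-PEBS layout. */
struct frag_hdr {
	unsigned short size;	/* bytes in this fragment, header included */
	unsigned char cont;	/* non-zero if another fragment follows */
};

/*
 * Consume the fragments of one record starting at offset 'at' (at <= len),
 * stopping early on a zero-sized header. Returns fragments consumed.
 */
static int parse_one_record(const unsigned char *buf, size_t len, size_t at)
{
	int frags = 0;
	struct frag_hdr h;

	/* Only read a header if it fits entirely inside the buffer. */
	while (len - at >= sizeof(h)) {
		memcpy(&h, buf + at, sizeof(h));
		/* A zero size would never advance 'at': stop instead. */
		if (!h.size)
			break;
		frags++;
		if (!h.cont)
			break;
		at += h.size;
		if (at >= len)
			break;
	}
	return frags;
}
```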


>
>>  			at += header->size;
>> +			if (WARN_ON_ONCE(at >= top)) {
>> +				corrupted = true;
>> +				goto done;
>> +			}
>>  			header = at;
>>  		}
>>  
>>  		/* Skip last fragment or the single record */
>>  		at += header->size;
>>  	}
> If the final fragment of a record (where arch_pebs_record_continued()
> is false) has an out-of-bounds size, at += header->size; is executed
> and the loop terminates naturally with corrupted == false.
>
> Could this allow out-of-bounds reads past ds->pebs_index when the unflagged
> malformed record is parsed by setup_arch_pebs_sample_data()?

Yes, it's a bug. We also need to set corrupted to true when the final
fragment is corrupted.
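A user-space sketch of such a check, assuming a simplified header.
struct frag_hdr, read_hdr() and drain_records() are hypothetical and are
not the ds.c code; offsets are used instead of pointers to keep the
model free of out-of-range pointer arithmetic. In this model the final
fragment gets its own bounds check, with one asymmetry: a continued
fragment must leave room for a following header, so at >= top is
corrupt, while a final fragment may end exactly at top, so only at > top
is corrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment header; not the real arch-PEBS layout. */
struct frag_hdr {
	unsigned short size;	/* bytes in this fragment, header included */
	unsigned char cont;	/* non-zero if another fragment follows */
};

/* Copy the header at offset 'at' (at < top); fail if it is truncated. */
static bool read_hdr(const unsigned char *buf, size_t top, size_t at,
		     struct frag_hdr *h)
{
	if (top - at < sizeof(*h))
		return false;
	memcpy(h, buf + at, sizeof(*h));
	return true;
}

/*
 * Walk records stored in buf[0..top). Returns the number of complete
 * records and sets *corrupted if any fragment, including the final one,
 * claims a size that runs past 'top'. Treating a truncated header as
 * corruption is an extra assumption of this sketch.
 */
static int drain_records(const unsigned char *buf, size_t top, bool *corrupted)
{
	size_t at = 0;
	int records = 0;
	struct frag_hdr h;

	*corrupted = false;
	while (at < top) {
		if (!read_hdr(buf, top, at, &h))
			goto bad;

		/* Continued fragments: another header must follow. */
		while (h.cont) {
			if (!h.size)
				return records;
			at += h.size;
			if (at >= top || !read_hdr(buf, top, at, &h))
				goto bad;
		}

		/* Final fragment: ending exactly at top is fine, past it is not. */
		if (!h.size)
			return records;
		at += h.size;
		if (at > top)
			goto bad;
		records++;
	}
	return records;
bad:
	*corrupted = true;
	return records;
}
```

With a two-fragment record that ends exactly at top, the walk reports
one record and no corruption; enlarging the final fragment's size so it
overruns top flips the flag instead of silently finishing the loop.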


>
>> +done:
>>  	__intel_pmu_handle_last_pebs_record(iregs, regs, data, mask,
>> -					    counts, last,
>> +					    counts, last, corrupted,
>>  					    setup_arch_pebs_sample_data);
> Here a single corrupted flag is passed for the entire PEBS buffer drain
> operation.
>
> If one PMU counter encounters a corrupted record and jumps to done:, this
> global flag is applied to the last events of all active counters.
>
> Does this falsely invalidate perfectly valid pending samples for unrelated
> PMU counters, and subject them to the stale register leak described above?

If a record or fragment is corrupted, none of the fragments or records
after it can be located or parsed. As long as the stale register info is
cleared, there is no leak.

Thanks.


>
