Re: [Patch v7 4/4] perf regs: Enable dumping of SIMD registers

public inbox for linux-perf-users@vger.kernel.org
 help / color / mirror / Atom feed

From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Eranian Stephane <eranian@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	broonie@kernel.org, Ravi Bangoria <ravi.bangoria@amd.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Zide Chen <zide.chen@intel.com>,
	Falcon Thomas <thomas.falcon@intel.com>,
	Dapeng Mi <dapeng1.mi@intel.com>,
	Xudong Hao <xudong.hao@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>
Subject: Re: [Patch v7 4/4] perf regs: Enable dumping of SIMD registers
Date: Thu, 26 Mar 2026 13:48:34 +0800	[thread overview]
Message-ID: <383b7c01-3d83-48ac-9dab-e9a5894ca907@linux.intel.com> (raw)
In-Reply-To: <20260324005706.3778057-5-dapeng1.mi@linux.intel.com>


On 3/24/2026 8:57 AM, Dapeng Mi wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> This patch adds support for dumping SIMD registers using the new
> PERF_SAMPLE_REGS_ABI_SIMD ABI.
>
> Currently, the XMM, YMM, ZMM, OPMASK, eGPRs, and SSP registers on x86
> platforms are supported with the PERF_SAMPLE_REGS_ABI_SIMD ABI.
>
> An example of the output is displayed below.
>
> Example:
>
>  $perf record -e cycles:p -IXMM,YMM,OPMASK,SSP ./test
>  $perf report -D
>  ... ...
>  237538985992962 0x454d0 [0x480]: PERF_RECORD_SAMPLE(IP, 0x1):
>  179370/179370: 0xffffffff969627fc period: 124999 addr: 0
>  ... intr regs: mask 0x20000000000 ABI 64-bit
>  .... SSP   0x0000000000000000
>  ... SIMD ABI nr_vectors 32 vector_qwords 4 nr_pred 8 pred_qwords 1
>  .... YMM  [0] 0x0000000000004000
>  .... YMM  [0] 0x000055e828695270
>  .... YMM  [0] 0x0000000000000000
>  .... YMM  [0] 0x0000000000000000
>  .... YMM  [1] 0x000055e8286990e0
>  .... YMM  [1] 0x000055e828698dd0
>  .... YMM  [1] 0x0000000000000000
>  .... YMM  [1] 0x0000000000000000
>  ... ...
>  .... YMM  [31] 0x0000000000000000
>  .... YMM  [31] 0x0000000000000000
>  .... YMM  [31] 0x0000000000000000
>  .... YMM  [31] 0x0000000000000000
>  .... OPMASK[0] 0x0000000000100221
>  .... OPMASK[1] 0x0000000000000020
>  .... OPMASK[2] 0x000000007fffffff
>  .... OPMASK[3] 0x0000000000000000
>  .... OPMASK[4] 0x0000000000000000
>  .... OPMASK[5] 0x0000000000000000
>  .... OPMASK[6] 0x0000000000000000
>  .... OPMASK[7] 0x0000000000000000
>  ... ...
>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
>
> V7: 1) add assert() check for SIMD fields in sample data.
>     2) optimize regs_abi[] defination.
>
>  tools/perf/util/evsel.c   | 36 +++++++++++++++++++++
>  tools/perf/util/sample.h  | 10 ++++++
>  tools/perf/util/session.c | 66 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 111 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 5f00489e714a..24cc7ba71ae1 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -3520,6 +3520,24 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>  			regs->mask = mask;
>  			regs->regs = (u64 *)array;
>  			array = (void *)array + sz;
> +
> +			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
> +				assert(regs->nr_vectors <=
> +				       hweight64(evsel->core.attr.sample_simd_vec_reg_user));
> +				assert(regs->vector_qwords <=
> +				       evsel->core.attr.sample_simd_vec_reg_qwords);
> +				assert(regs->nr_pred <=
> +				       hweight64(evsel->core.attr.sample_simd_pred_reg_user));
> +				assert(regs->pred_qwords <=
> +					 evsel->core.attr.sample_simd_pred_reg_qwords);

Sashiko complains

"

Is it safe to use assert() to validate untrusted perf.data file contents here?
If the sample data exceeds the SIMD register bounds, wouldn't this crash the
tool instead of gracefully returning an error?
Additionally, if compiled with NDEBUG, would the bounds checks be removed
completely?
Could these asserts also be evaluating uninitialized or zeroed memory?
The assertions are executed before the union is populated from the sample via
regs->config = *(u64 *)array. This would allow the checks to trivially pass,
potentially allowing an attacker to bypass the intended capability bounds by
providing arbitrarily large values in the configuration data.

"

It's correct. Need to move the assert check after "regs->config = *(u64
*)array;", and would change the assert() check to a regular if check. If
the check fail, gracefully return an error.


> +				regs->config = *(u64 *)array;

Sashiko comments

"

Is a bounds check missing before reading the SIMD configuration word?
There doesn't appear to be an OVERFLOW_CHECK_u64(array) before dereferencing
the array pointer. If a malformed perf.data event payload ends exactly before
this configuration word, could this result in an out-of-bounds read?

"

It's correct. Need to move the later "OVERFLOW_CHECK(array, sz, max_size)"
before the accessing.


> +				array = (void *)array + sizeof(u64);
> +				regs->simd_data = (u64 *)array;
> +				sz = (regs->nr_vectors * regs->vector_qwords +
> +				      regs->nr_pred * regs->pred_qwords) * sizeof(u64);

Sashiko comments

"

Can this calculation overflow and bypass the OVERFLOW_CHECK below?
Because nr_vectors and vector_qwords are u16 fields, C integer promotion
evaluates the multiplication as a signed 32-bit int. If large values are
supplied (e.g., nr_vectors = 65535, vector_qwords = 65535), could the 32-bit
integer wrap around to a small positive value?
This might allow the size check to pass while subsequent reads process a large
amount of out-of-bounds memory.

"

Although "nr_vectors" and "vector_qwords" can't be so large in reality,
it's a risk in logic.  Would fix it. 


> +				OVERFLOW_CHECK(array, sz, max_size);
> +				array = (void *)array + sz;
> +			}
>  		}
>  	}
>  
> @@ -3577,6 +3595,24 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>  			regs->mask = mask;
>  			regs->regs = (u64 *)array;
>  			array = (void *)array + sz;
> +
> +			if (regs->abi & PERF_SAMPLE_REGS_ABI_SIMD) {
> +				assert(regs->nr_vectors <=
> +				       hweight64(evsel->core.attr.sample_simd_vec_reg_intr));
> +				assert(regs->vector_qwords <=
> +				       evsel->core.attr.sample_simd_vec_reg_qwords);
> +				assert(regs->nr_pred <=
> +				       hweight64(evsel->core.attr.sample_simd_pred_reg_intr));
> +				assert(regs->pred_qwords <=
> +					 evsel->core.attr.sample_simd_pred_reg_qwords);
> +				regs->config = *(u64 *)array;
> +				array = (void *)array + sizeof(u64);
> +				regs->simd_data = (u64 *)array;
> +				sz = (regs->nr_vectors * regs->vector_qwords +
> +				      regs->nr_pred * regs->pred_qwords) * sizeof(u64);
> +				OVERFLOW_CHECK(array, sz, max_size);
> +				array = (void *)array + sz;
> +			}
>  		}
>  	}
>  
> diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
> index 3cce8dd202aa..21f3416d3755 100644
> --- a/tools/perf/util/sample.h
> +++ b/tools/perf/util/sample.h
> @@ -15,6 +15,16 @@ struct regs_dump {
>  	u64 abi;
>  	u64 mask;
>  	u64 *regs;
> +	union {
> +		u64 config;
> +		struct {
> +			u16 nr_vectors;
> +			u16 vector_qwords;
> +			u16 nr_pred;
> +			u16 pred_qwords;
> +		};
> +	};

Sashiko comments

"

Does overlaying the 64-bit integer with four 16-bit fields break cross-endian
parsing?
If perf processes a file recorded on a machine with different endianness, the
sample payload is typically byte-swapped as an array of 64-bit integers.
Swapping the 64-bit config word would reverse the structural byte order of the
four 16-bit fields (e.g., nr_vectors shifts to the position of pred_qwords),
which could corrupt the dimension variables.

"

Wow, Sashiko really found a such deeply hidden issue (Copilot didn't find
it). Considering currently perf-tools only does 64bits word swap for the
endians, so it looks the simplest way to fix this issue is to change the 4
u16 fields to 4 u64 fields.

@Peter, @Arnaldo, @Namhyung, @Ian, do you have any idea on this?


> +	u64 *simd_data;
>  
>  	/* Cached values/mask filled by first register access. */
>  	u64 cache_regs[PERF_SAMPLE_REGS_CACHE_SIZE];
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 7cf7bf86205d..453d44d32162 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -972,15 +972,77 @@ static void regs_dump__printf(u64 mask, struct regs_dump *regs,
>  	}
>  }
>  
> +static void simd_regs_dump__printf(uint16_t e_machine, struct regs_dump *regs, bool intr)
> +{
> +	const char *name = "unknown";
> +	int i, idx = 0;
> +	uint16_t qwords;
> +	int reg_c;
> +
> +	if (!(regs->abi & PERF_SAMPLE_REGS_ABI_SIMD))
> +		return;
> +
> +	printf("... SIMD ABI nr_vectors %d vector_qwords %d nr_pred %d pred_qwords %d\n",
> +	       regs->nr_vectors, regs->vector_qwords,
> +	       regs->nr_pred, regs->pred_qwords);
> +
> +	for (reg_c = 0; reg_c < 64; reg_c++) {
> +		if (intr) {
> +			perf_intr_simd_reg_class_bitmap_qwords(e_machine, reg_c,
> +							       &qwords, /*pred=*/false);
> +		} else {
> +			perf_user_simd_reg_class_bitmap_qwords(e_machine, reg_c,
> +							       &qwords, /*pred=*/false);
> +		}
> +		if (regs->vector_qwords == qwords) {
> +			name = perf_simd_reg_class_name(e_machine, reg_c, /*pred=*/false);
> +			break;
> +		}
> +	}
> +
> +	for (i = 0; i < regs->nr_vectors; i++) {
> +		printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx++]);
> +		printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx++]);
> +		if (regs->vector_qwords > 2) {
> +			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx++]);
> +			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx++]);
> +		}
> +		if (regs->vector_qwords > 4) {
> +			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx++]);
> +			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx++]);
> +			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx++]);
> +			printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx++]);
> +		}
> +	}

Sashiko comments

"

Does this print logic safely handle all values of vector_qwords?
The loop unconditionally prints 2 elements per vector, and extends to 4 or 8
elements if vector_qwords > 2 or > 4. If an unusual value like vector_qwords
is 1 or 3, could the loop consume more elements than were allocated, leading
to reading past the simd_data array bounds?
The same pattern seems to apply to pred_qwords below if it is set to 0.

"

Make sense. Would precisely check if the qwords length instead of just a
range. 

Thanks.


> +
> +	name = "unknown";
> +	for (reg_c = 0; reg_c < 64; reg_c++) {
> +		if (intr) {
> +			perf_intr_simd_reg_class_bitmap_qwords(e_machine, reg_c,
> +							       &qwords, /*pred=*/true);
> +		} else {
> +			perf_user_simd_reg_class_bitmap_qwords(e_machine, reg_c,
> +							       &qwords, /*pred=*/true);
> +		}
> +		if (regs->pred_qwords == qwords) {
> +			name = perf_simd_reg_class_name(e_machine, reg_c, /*pred=*/true);
> +			break;
> +		}
> +	}
> +	for (i = 0; i < regs->nr_pred; i++)
> +		printf(".... %-5s[%d] 0x%016" PRIx64 "\n", name, i, regs->simd_data[idx++]);
> +}
> +
>  static const char *regs_abi[] = {
>  	[PERF_SAMPLE_REGS_ABI_NONE] = "none",
>  	[PERF_SAMPLE_REGS_ABI_32] = "32-bit",
>  	[PERF_SAMPLE_REGS_ABI_64] = "64-bit",
> +	[PERF_SAMPLE_REGS_ABI_SIMD | PERF_SAMPLE_REGS_ABI_64] = "64-bit SIMD",
>  };
>  
>  static inline const char *regs_dump_abi(struct regs_dump *d)
>  {
> -	if (d->abi > PERF_SAMPLE_REGS_ABI_64)
> +	if (d->abi >= ARRAY_SIZE(regs_abi) || !regs_abi[d->abi])
>  		return "unknown";
>  
>  	return regs_abi[d->abi];
> @@ -1010,6 +1072,7 @@ static void regs_user__printf(struct perf_sample *sample, uint16_t e_machine, ui
>  
>  	if (user_regs->regs)
>  		regs__printf("user", user_regs, e_machine, e_flags);
> +	simd_regs_dump__printf(e_machine, user_regs, /*intr=*/false);
>  }
>  
>  static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machine, uint32_t e_flags)
> @@ -1023,6 +1086,7 @@ static void regs_intr__printf(struct perf_sample *sample, uint16_t e_machine, ui
>  
>  	if (intr_regs->regs)
>  		regs__printf("intr", intr_regs, e_machine, e_flags);
> +	simd_regs_dump__printf(e_machine, intr_regs, /*intr=*/true);
>  }
>  
>  static void stack_user__printf(struct stack_dump *dump)

     prev parent reply	other threads:[~2026-03-26  5:48 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-24  0:57 [Patch v7 0/4] Perf tools: Support eGPRs/SSP/SIMD registers sampling Dapeng Mi
2026-03-24  0:57 ` [Patch v7 1/4] perf headers: Sync with the kernel headers Dapeng Mi
2026-03-24  0:57 ` [Patch v7 2/4] perf regs: Support x86 eGPRs/SSP sampling Dapeng Mi
2026-03-24  2:49   ` Ian Rogers
2026-03-25  2:08     ` Mi, Dapeng
2026-03-26  1:41   ` Mi, Dapeng
2026-03-24  0:57 ` [Patch v7 3/4] perf regs: Support x86 SIMD registers sampling Dapeng Mi
2026-03-26  2:50   ` Mi, Dapeng
2026-03-24  0:57 ` [Patch v7 4/4] perf regs: Enable dumping of SIMD registers Dapeng Mi
2026-03-26  5:48   ` Mi, Dapeng [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=383b7c01-3d83-48ac-9dab-e9a5894ca907@linux.intel.com \
    --to=dapeng1.mi@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=broonie@kernel.org \
    --cc=dapeng1.mi@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.falcon@intel.com \
    --cc=xudong.hao@intel.com \
    --cc=zide.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox