Linux Perf Users
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Athira Rajeev" <atrajeev@linux.ibm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 8/9] tools/perf/powerpc: Add event name as htm of PERF_TYPE_SYNTH type to present htm samples
Date: Wed, 01 Jul 2026 09:12:45 +0000	[thread overview]
Message-ID: <20260701091246.5B6E21F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260701084115.80383-9-atrajeev@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 11 potential issue(s) to consider:
- [High] Error handling path bypasses auxtrace_queues cleanup, causing a memory leak and leaving a dangling pointer that leads to Use-After-Free.
- [High] Unchecked direct access to `evsel->name` passed to `strstarts()` within an event loop.
- [High] Unchecked dereference of `evsel->core.id` when synthesizing event IDs.
- [High] Synthesized `PERF_RECORD_SAMPLE` events lack payload serialization, causing downstream parser faults.
- [High] Off-by-one accounting of translation file headers causes out-of-bounds mmap read and data corruption.
- [High] Processing the auxtrace queues for every PERF_RECORD_SAMPLE duplicates trace data and heavily impacts performance.
- [High] Missing POSIX libc headers violates musl compatibility guidelines.
- [High] Massive out-of-bounds heap read in `write_htm` due to unvalidated length field from untrusted auxtrace data.
- [High] Missing `NULL` check on `evlist__event2evsel()` return value leads to immediate crash on malformed events.
- [Medium] Using strstr from the beginning of the line inside a regex match loop corrupts the output when a trace line contains multiple addresses.
- [Medium] Using `unsigned long` for 64-bit PowerPC addresses truncates them on 32-bit hosts, breaking cross-platform analysis.
--

(Note: The standard patch quoting format has been altered to a free-form
summary due to system recitation restrictions. No code diffs are quoted below.)

commit bb9556f09003be87e7766614cf5cfdd9647bd477
Author: Athira Rajeev <atrajeev@linux.ibm.com>
Subject: tools/perf/powerpc: Add event name as htm of PERF_TYPE_SYNTH type to present htm samples

This commit creates a synthetic "htm" event to present hardware transactional
memory entries as samples in perf. It adds configuration values and uses the
logical address as the sample instruction pointer to aid in identifying hot
logical addresses.

[Severity: High]
In powerpc_htm_process_auxtrace_info(), if powerpc_htm_synth_events() returns
an error, the code jumps to the err_free label. Does this skip the cleanup of
the auxtrace queues and leave a dangling pointer in session->auxtrace? 

It looks like the err_free_queues label handles both auxtrace_queues__free()
and setting session->auxtrace to NULL, which might be needed here to prevent
a use-after-free during session teardown.

[Severity: High]
In powerpc_htm_synth_events(), the loop iterates over evsel entries and checks
if they start with "htm" using strstarts(evsel->name, "htm"). Could
evsel->name be NULL for unnamed or dummy events? 

Passing a NULL name into strstarts() could trigger a segmentation fault.

[Severity: High]
Also in powerpc_htm_synth_events(), the synthesized event ID is generated by
adding a fixed offset to evsel->core.id[0]. Is it possible for evsel->core.id
to be NULL if the event was recorded without sample IDs? 

Dereferencing it directly in that case might cause a crash.

[Severity: High]
Looking at powerpc_htm_create_sample(), the synthesized event size is hardcoded
to exactly sizeof(struct perf_event_header), which is 8 bytes. Does this omit
the actual sample payload serialization that perf_event__synthesize_sample()
usually handles? 

Downstream tools parsing this PERF_RECORD_SAMPLE event might expect a payload
based on the sample_type, potentially leading to out-of-bounds reads when
they process the 8-byte event.

[Severity: High]
In write_htm(), the total number of entries is incremented to account for the
32-byte header, and this is added to htm->htm_mem_entries. However, when
create_mem_maps() later maps the file, it skips the first header but loops
exactly htm->htm_mem_entries times. 

Will this result in reading past the end of the memory-mapped region?
Additionally, if multiple chunks were written, would their embedded headers
be misinterpreted as valid trace data?

[Severity: High]
In powerpc_htm_process_event(), the code iterates through all auxtrace queues
and writes their buffer contents to the file for every PERF_RECORD_SAMPLE
event processed. 

Does this repeatedly duplicate the entire trace data? Since the buffers are
not consumed or marked as processed, this might result in an exponentially
growing trace file and significant performance degradation.

[Severity: Medium]
In process_trace_file(), while iterating through regex matches for addresses
on a line, the replacement logic uses strstr() from the beginning of the line
to find the "addr:0x" prefix. 

If a single trace line contains multiple addresses, won't strstr() repeatedly
find the first occurrence instead of the current match, potentially corrupting
the output?

[Severity: Medium]
The physical and logical addresses from the HTM trace are 64-bit values, but
powerpc_htm_create_sample() takes an unsigned long address. 

On 32-bit host systems performing cross-platform analysis, could this truncate
the 64-bit addresses? The boundary check for kernel addresses would also be
optimized out on 32-bit hosts. The addr_map structure seems to share this
same truncation issue.

[Severity: High]
This file utilizes several standard POSIX functions and constants such as
fopen(), malloc(), strtoul(), and O_WRONLY. 

To comply with musl libc compatibility guidelines, shouldn't the standard
headers like <stdio.h>, <stdlib.h>, and <fcntl.h> be explicitly included
at the top of the file?

[Severity: High]
In write_htm(), the num_entries value is parsed from the incoming auxtrace
data, converted to cpu endianness, incremented, and then passed directly to
fwrite(). 

Is there a missing bounds check to ensure that the calculated size does not
exceed the buffer's actual bounds? An unvalidated length could lead to an
out-of-bounds heap read if the trace data is malformed.

[Severity: High]
Both powerpc_htm_process_event() and powerpc_htm_process_auxtrace_info()
fetch an evsel using evlist__event2evsel() and immediately dereference its
config attribute. 

If the event's sample ID isn't found in the evlist, could
evlist__event2evsel() return NULL? A check might be needed to avoid a
segmentation fault on malformed perf.data files.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701084115.80383-1-atrajeev@linux.ibm.com?part=8

  reply	other threads:[~2026-07-01  9:12 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-07-01  8:41 [PATCH 0/9] tools/perf: Add interface to expose HTM trace data via perf Athira Rajeev
2026-07-01  8:41 ` [PATCH 1/9] tool/perf: Move auxtrace_record__init for powerpc-vpadtl as separate utility Athira Rajeev
2026-07-01  8:56   ` sashiko-bot
2026-07-01  8:41 ` [PATCH 2/9] tools/perf: Add CONFIG_AUXTRACE support for HTM pmu on powerpc Athira Rajeev
2026-07-01  8:55   ` sashiko-bot
2026-07-01  8:41 ` [PATCH 3/9] tools/perf: Add arch_record__collect_final_data to collect additional data before closing the event Athira Rajeev
2026-07-01  8:54   ` sashiko-bot
2026-07-01  8:41 ` [PATCH 4/9] tools/perf: Add powerpc callback support for arch_record__collect_final_data Athira Rajeev
2026-07-01  8:55   ` sashiko-bot
2026-07-01  8:41 ` [PATCH 5/9] tools/perf: process htm auxtrace events and display in perf report -D Athira Rajeev
2026-07-01  9:05   ` sashiko-bot
2026-07-01  8:41 ` [PATCH 6/9] perf tools powerpc: Add HTM trace data processing and decoding support Athira Rajeev
2026-07-01  9:06   ` sashiko-bot
2026-07-01  8:41 ` [PATCH 7/9] perf tools powerpc: Add physical to logical address mapping for HTM traces Athira Rajeev
2026-07-01  9:07   ` sashiko-bot
2026-07-01  8:41 ` [PATCH 8/9] tools/perf/powerpc: Add event name as htm of PERF_TYPE_SYNTH type to present htm samples Athira Rajeev
2026-07-01  9:12   ` sashiko-bot [this message]
2026-07-01  8:41 ` [PATCH 9/9] tools/perf/powerpc: Add logical address in decoded traces Athira Rajeev
2026-07-01  9:13   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260701091246.5B6E21F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=atrajeev@linux.ibm.com \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox