From: sashiko-bot@kernel.org
To: "Tanushree Shah" <tshah@linux.ibm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend
Date: Mon, 08 Jun 2026 13:14:49 +0000 [thread overview]
Message-ID: <20260608131449.9DBDA1F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260608125951.90425-5-tshah@linux.ibm.com>
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] Out-of-bounds heap write when tracepoint event data size exceeds the remaining space in a trace.dat page.
- [Medium] The conversion tool fails to register required callbacks for pipe mode (such as `.attr`, `.feature`, and `.tracing_data`), silently dropping events.
- [Medium] The output file is overwritten unconditionally, ignoring the `opts->force` flag.
- [Medium] A `realloc` failure in `trace_dat__collect_cpu_event` causes a memory leak and subsequent segmentation fault.
--
commit da74698d88efeca5fa33802144416838acb15d2a
Author: Tanushree Shah <tshah@linux.ibm.com>
perf data-convert: Add perf.data to trace.dat conversion backend
Add data-convert-trace.c implementing trace_convert__perf2dat() to
convert perf.data tracepoint events to trace.dat format.
process_sample_event() is invoked for each PERF_TYPE_TRACEPOINT sample
during perf_session__process_events(), storing raw event bytes per-cpu
via trace_dat__collect_cpu_event().
Once all samples are collected:
- trace_dat__write_options_section1() writes the OPTIONS section with
CPUCOUNT, TRACECLOCK, HEADER_INFO, FTRACE_EVENTS, EVENT_FORMATS,
KALLSYMS, CMDLINES and DONE options.
- trace_dat__write__options_section2() writes the OPTIONS section with
BUFFER option holding per-cpu data offset placeholders and the DONE
option.
- trace_dat__write_flyrecord_section() builds ring buffer pages
per-cpu and patches BUFFER option with final offsets and sizes
Per-cpu buffers are sized to tep_get_page_size() from the session
tep handle and released on all exit paths.
> diff --git a/tools/perf/util/data-convert-trace.c b/tools/perf/util/data-convert-trace.c
> new file mode 100644
> index 0000000000000..e4f8b817be36d
> --- /dev/null
> +++ b/tools/perf/util/data-convert-trace.c
[ ... ]
> +/* Store raw tracepoint event data in per-cpu buffer for trace.dat flyrecord */
> +static int process_sample_event(const struct perf_tool *tool,
> + union perf_event *event __maybe_unused,
> + struct perf_sample *sample,
> + struct evsel *evsel,
> + struct machine *machine __maybe_unused)
> +{
> + struct trace_convert *tc = container_of(tool, struct trace_convert, tool);
> +
> + /* Collect raw tracepoint data per-cpu */
> + if (trace_dat_fp && sample->raw_size > 0 &&
> + evsel->core.attr.type == PERF_TYPE_TRACEPOINT) {
> + if (trace_dat__collect_cpu_event(sample->cpu, sample->time,
> + sample->raw_data, sample->raw_size) < 0) {
[Severity: Medium]
Is it possible for a memory allocation failure here to cause a crash
during cleanup?
If trace_dat__collect_cpu_event() fails during a realloc(), it might return
-ENOMEM while leaving the events array pointer as NULL but keeping the count
intact.
When trace_dat__free_cpu_buffers() is called on the error path, it could
dereference that NULL pointer:
tools/perf/util/trace-dat.c:trace_dat__free_cpu_buffers() {
...
for (i = 0; i < trace_cpu_data[cpu].count; i++)
free(trace_cpu_data[cpu].events[i].raw);
...
}
> + pr_err("Failed to collect CPU event\n");
> + return -ENOMEM;
> + }
> + tc->events_count++;
> + }
> +
> + return 0;
> +}
[ ... ]
> +/* Convert perf.data tracepoint events to trace.dat format */
> +int trace_convert__perf2dat(const char *input, const char *to_trace,
> + struct perf_data_convert_opts *opts)
> +{
> + struct perf_session *session;
> + struct trace_convert tc = {
> + .events_count = 0,
> + };
> + struct perf_data data = {
> + .path = input,
> + .mode = PERF_DATA_MODE_READ,
> + .force = opts->force,
> + };
> + int ret = -EINVAL;
> + bool cpu_buffers_initialized = false;
> +
> + /* Initialize tool with all required callbacks */
> + perf_tool__init(&tc.tool, /*ordered_events=*/true);
> + tc.tool.sample = process_sample_event;
[Severity: Medium]
Does this initialization omit the required callbacks for pipe mode?
By only overriding .sample, the .attr, .feature, and .tracing_data
callbacks remain as default stubs from perf_tool__init().
In pipe mode, without these callbacks, the tool might silently fail to
populate trace metadata and generate an empty trace.dat file.
> +
> + /* Open output trace.dat file */
> + trace_dat_fp = fopen(to_trace, "wb");
> + if (!trace_dat_fp) {
[Severity: Medium]
Can this unconditionally overwrite an existing file?
The fopen() call uses the "wb" mode which truncates the file, but it
doesn't appear to check the opts->force flag.
This regression could lead to accidental data loss.
> + pr_err("Failed to open output file: %s\n", to_trace);
> + return -EINVAL;
> + }
[ ... ]
> + /* Write trace.dat options and flyrecord sections */
> + if (trace_dat__write_options_section1() < 0) {
> + pr_err("Failed to write options section1\n");
> + ret = -EIO;
> + goto out_delete;
> + }
> + if (trace_dat__write_options_section2() < 0) {
> + pr_err("Failed to write options section2\n");
> + ret = -EIO;
> + goto out_delete;
> + }
> + if (trace_dat__write_flyrecord_section() < 0) {
[Severity: Critical]
Could an unusually large tracepoint event cause an out-of-bounds memory
access when writing the flyrecord section?
When trace_dat__write_cpu_dat() processes records, it queues them to a new
page without checking if the rec_size fits within trace_dat_page_size.
If an event size exceeds the page size, trace_dat__write_page() allocates a
fixed-size buffer and copies the data into it:
tools/perf/util/trace-dat.c:trace_dat__write_page() {
page = calloc(1, trace_dat_page_size);
...
for (i = 0; i < nr_recs; i++) {
memcpy(page + offset, records[i], rec_sizes[i]);
...
}
This regression could result in a heap buffer overflow.
> + pr_err("Failed to write flyrecord section\n");
> + ret = -EIO;
> + goto out_delete;
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=3
next prev parent reply other threads:[~2026-06-08 13:14 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
2026-06-08 12:59 ` [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure Tanushree Shah
2026-06-08 13:13 ` sashiko-bot
2026-06-08 12:59 ` [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing Tanushree Shah
2026-06-08 13:12 ` sashiko-bot
2026-06-08 12:59 ` [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend Tanushree Shah
2026-06-08 13:14 ` sashiko-bot [this message]
2026-06-08 12:59 ` [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format Tanushree Shah
2026-06-08 13:12 ` sashiko-bot
2026-06-08 15:18 ` [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Ian Rogers
2026-06-09 13:09 ` Tanushree Shah
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260608131449.9DBDA1F00893@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
--cc=tshah@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.