Re: [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend

Linux Perf Users
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Tanushree Shah" <tshah@linux.ibm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend
Date: Mon, 08 Jun 2026 13:14:49 +0000	[thread overview]
Message-ID: <20260608131449.9DBDA1F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260608125951.90425-5-tshah@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] Out-of-bounds heap write when tracepoint event data size exceeds the remaining space in a trace.dat page.
- [Medium] The conversion tool fails to register required callbacks for pipe mode (such as `.attr`, `.feature`, and `.tracing_data`), silently dropping events.
- [Medium] The output file is overwritten unconditionally, ignoring the `opts->force` flag.
- [Medium] A `realloc` failure in `trace_dat__collect_cpu_event` causes a memory leak and subsequent segmentation fault.
--

commit da74698d88efeca5fa33802144416838acb15d2a
Author: Tanushree Shah <tshah@linux.ibm.com>

perf data-convert: Add perf.data to trace.dat conversion backend

Add data-convert-trace.c implementing trace_convert__perf2dat() to
convert perf.data tracepoint events to trace.dat format.

process_sample_event() is invoked for each PERF_TYPE_TRACEPOINT sample
during perf_session__process_events(), storing raw event bytes per-cpu
via trace_dat__collect_cpu_event().

Once all samples are collected:
- trace_dat__write_options_section1() writes the OPTIONS section with
  CPUCOUNT, TRACECLOCK, HEADER_INFO, FTRACE_EVENTS, EVENT_FORMATS,
  KALLSYMS, CMDLINES and DONE options.
- trace_dat__write__options_section2() writes the OPTIONS section with
  BUFFER option holding per-cpu data offset placeholders and the DONE
  option.
- trace_dat__write_flyrecord_section() builds ring buffer pages
  per-cpu and patches BUFFER option with final offsets and sizes

Per-cpu buffers are sized to tep_get_page_size() from the session
tep handle and released on all exit paths.

> diff --git a/tools/perf/util/data-convert-trace.c b/tools/perf/util/data-convert-trace.c
> new file mode 100644
> index 0000000000000..e4f8b817be36d
> --- /dev/null
> +++ b/tools/perf/util/data-convert-trace.c
[ ... ]
> +/* Store raw tracepoint event data in per-cpu buffer for trace.dat flyrecord */
> +static int process_sample_event(const struct perf_tool *tool,
> +				union perf_event *event __maybe_unused,
> +				struct perf_sample *sample,
> +				struct evsel *evsel,
> +				struct machine *machine __maybe_unused)
> +{
> +	struct trace_convert *tc = container_of(tool, struct trace_convert, tool);
> +
> +	/* Collect raw tracepoint data per-cpu */
> +	if (trace_dat_fp && sample->raw_size > 0 &&
> +	    evsel->core.attr.type == PERF_TYPE_TRACEPOINT) {
> +		if (trace_dat__collect_cpu_event(sample->cpu, sample->time,
> +					sample->raw_data, sample->raw_size) < 0) {

[Severity: Medium]
Is it possible for a memory allocation failure here to cause a crash
during cleanup?

If trace_dat__collect_cpu_event() fails during a realloc(), it might return
-ENOMEM while leaving the events array pointer as NULL but keeping the count
intact.

When trace_dat__free_cpu_buffers() is called on the error path, it could
dereference that NULL pointer:

tools/perf/util/trace-dat.c:trace_dat__free_cpu_buffers() {
	...
	for (i = 0; i < trace_cpu_data[cpu].count; i++)
		free(trace_cpu_data[cpu].events[i].raw);
	...
}

> +			pr_err("Failed to collect CPU event\n");
> +			return -ENOMEM;
> +		}
> +		tc->events_count++;
> +	}
> +
> +	return 0;
> +}
[ ... ]
> +/* Convert perf.data tracepoint events to trace.dat format */
> +int trace_convert__perf2dat(const char *input, const char *to_trace,
> +			   struct perf_data_convert_opts *opts)
> +{
> +	struct perf_session *session;
> +	struct trace_convert tc = {
> +		.events_count = 0,
> +	};
> +	struct perf_data data = {
> +		.path = input,
> +		.mode = PERF_DATA_MODE_READ,
> +		.force = opts->force,
> +	};
> +	int ret = -EINVAL;
> +	bool cpu_buffers_initialized = false;
> +
> +	/* Initialize tool with all required callbacks */
> +	perf_tool__init(&tc.tool, /*ordered_events=*/true);
> +	tc.tool.sample = process_sample_event;

[Severity: Medium]
Does this initialization omit the required callbacks for pipe mode?

By only overriding .sample, the .attr, .feature, and .tracing_data
callbacks remain as default stubs from perf_tool__init().

In pipe mode, without these callbacks, the tool might silently fail to
populate trace metadata and generate an empty trace.dat file.

> +
> +	/* Open output trace.dat file */
> +	trace_dat_fp = fopen(to_trace, "wb");
> +	if (!trace_dat_fp) {

[Severity: Medium]
Can this unconditionally overwrite an existing file?

The fopen() call uses the "wb" mode which truncates the file, but it
doesn't appear to check the opts->force flag.

This regression could lead to accidental data loss.

> +		pr_err("Failed to open output file: %s\n", to_trace);
> +		return -EINVAL;
> +	}
[ ... ]
> +	/* Write trace.dat options and flyrecord sections */
> +	if (trace_dat__write_options_section1() < 0) {
> +		pr_err("Failed to write options section1\n");
> +		ret = -EIO;
> +		goto out_delete;
> +	}
> +	if (trace_dat__write_options_section2() < 0) {
> +		pr_err("Failed to write options section2\n");
> +		ret = -EIO;
> +		goto out_delete;
> +	}
> +	if (trace_dat__write_flyrecord_section() < 0) {

[Severity: Critical]
Could an unusually large tracepoint event cause an out-of-bounds memory
access when writing the flyrecord section?

When trace_dat__write_cpu_dat() processes records, it queues them to a new
page without checking if the rec_size fits within trace_dat_page_size.

If an event size exceeds the page size, trace_dat__write_page() allocates a
fixed-size buffer and copies the data into it:

tools/perf/util/trace-dat.c:trace_dat__write_page() {
	page = calloc(1, trace_dat_page_size);
	...
	for (i = 0; i < nr_recs; i++) {
		memcpy(page + offset, records[i], rec_sizes[i]);
	...
}

This regression could result in a heap buffer overflow.

> +		pr_err("Failed to write flyrecord section\n");
> +		ret = -EIO;
> +		goto out_delete;
> +	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=3

next prev parent reply	other threads:[~2026-06-08 13:14 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
2026-06-08 12:59 ` [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure Tanushree Shah
2026-06-08 13:13   ` sashiko-bot
2026-06-08 12:59 ` [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing Tanushree Shah
2026-06-08 13:12   ` sashiko-bot
2026-06-08 12:59 ` [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend Tanushree Shah
2026-06-08 13:14   ` sashiko-bot [this message]
2026-06-08 12:59 ` [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format Tanushree Shah
2026-06-08 13:12   ` sashiko-bot
2026-06-08 15:18 ` [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Ian Rogers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260608131449.9DBDA1F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=tshah@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox