From: sashiko-bot@kernel.org
To: "Tanushree Shah" <tshah@linux.ibm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing
Date: Mon, 08 Jun 2026 13:12:24 +0000 [thread overview]
Message-ID: <20260608131224.B448A1F00898@smtp.kernel.org> (raw)
In-Reply-To: <20260608125951.90425-4-tshah@linux.ibm.com>
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
New issues:
- [High] Using `read()` directly instead of the subsystem's `do_read()` loop fails to handle short reads, causing stream desynchronization and uninitialized memory writes.
- [High] Integer truncation in `do_read()` causes massive uninitialized heap memory dumps to the output file.
- [Medium] Metadata fields and integers for the `trace.dat` file are written in the host's native byte order, causing cross-platform incompatibility.
- [Medium] Error paths in `fwrite` operations return directly without cleaning up allocated resources, leading to memory and state leaks.
- [Medium] Writing 2 bytes from a 1-byte `char` variable causes a stack out-of-bounds read and file corruption.
Pre-existing issues:
- [High] Unbounded string accumulation in `read_string()` causes a stack buffer overflow.
--
commit f1360b22eb92ea64822af70555a808fd9d67ae93
Author: Tanushree Shah <tshah@linux.ibm.com>
Date: Mon Jun 8 12:59:50 2026 +0000
perf/trace-event: Write trace.dat metadata sections during parsing
Perf already captures the tracing metadata as a part of
data section in perf.data
When trace_dat_fp is set, write trace.dat compatible metadata
sections using the perf provided raw buffers.
Sections written:
- Initial format header (magic, version, endian, long_size,
page_size, compression, options_offset placeholder)
- Section 16: HEADER INFO (header_page + header_event)
- Section 17: FTRACE EVENT FORMATS
- Section 18: EVENT FORMATS (per system/event format files)
- Section 19: KALLSYMS
- Section 21: CMDLINES
- Section 15: STRINGS (written last after all sections)
Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>
This commit updates the trace-event parsing in perf to write trace.dat
compatible metadata sections when trace_dat_fp is set. It outputs sections
such as initial format headers, FTRACE EVENT FORMATS, EVENT FORMATS,
KALLSYMS, and CMDLINES using perf's raw buffers.
> diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
> index ecbbb93f01853..815577703c2e5 100644
> --- a/tools/perf/util/trace-event-read.c
> +++ b/tools/perf/util/trace-event-read.c
[ ... ]
> @@ -145,10 +146,9 @@ static char *read_string(void)
[Severity: High]
This isn't a bug introduced by this patch, but does read_string() allow a
stack buffer overflow when accumulating characters?
In read_string():
buf[size++] = c;
If the input file contains a string longer than BUFSIZ without a null
terminator, will this increment the size index and write past the bounds
of the fixed-size array buf?
> static int read_proc_kallsyms(struct tep_handle *pevent)
> {
> unsigned int size;
> + char *buf;
>
> size = read4(pevent);
> - if (!size)
> - return 0;
[ ... ]
> + buf = malloc(size);
> + if (buf == NULL)
> + return -1;
> + if (read(input_fd, buf, size) < 0) {
> + free(buf);
> + return -1;
> + }
> trace_data_size += size;
> + /* Write kallsyms section with data */
> + if (trace_dat_fp) {
> + unsigned short section_id = TRACE_DAT_SECTION_KALLSYMS;
> + unsigned int string_id = STRID_KALLSYMS;
> + unsigned long long section_size = sizeof(unsigned int) + size;
> + unsigned short flags = 0;
> +
> + trace_dat_kallsyms_offset = ftell(trace_dat_fp);
> + if (!fwrite(§ion_id, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
> + !fwrite(§ion_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
> + !fwrite(&size, sizeof(unsigned int), 1, trace_dat_fp) ||
> + !fwrite(buf, 1, size, trace_dat_fp)) {
[Severity: High]
Could using read() instead of do_read() here cause uninitialized memory to
be written to the output file?
Unlike do_read(), read() can return a short read. If a short read occurs, the
return value might still be greater than 0, bypassing the error check.
Will this leave the remainder of buf uninitialized, failing to advance the
input_fd offset by the expected amount, and then dump the partially
uninitialized heap memory into trace_dat_fp during the fwrite() call?
> + free(buf);
> + return -EIO;
> + }
> + }
> + free(buf);
> return 0;
> }
[ ... ]
> @@ -209,6 +261,7 @@ static int read_header_files(struct tep_handle *pevent)
>
> size = read8(pevent);
>
> + header_page_size = size;
> header_page = malloc(size);
> if (header_page == NULL)
> return -1;
> @@ -227,19 +280,59 @@ static int read_header_files(struct tep_handle *pevent)
> */
> tep_set_long_size(pevent, tep_get_header_page_size(pevent));
> }
> - free(header_page);
>
> - if (do_read(buf, 13) < 0)
> + if (do_read(buf, 13) < 0) {
> + free(header_page);
> return -1;
> + }
[Severity: High]
Can integer truncation in do_read() lead to massive uninitialized heap memory
dumps here?
Functions like read_header_files() allocate memory using the 64-bit size read
from the file:
size = read8(pevent);
header_page = malloc(size);
They then call do_read(header_page, size). Because do_read() takes an int
for the size argument, values over 4GB are truncated to their lower 32 bits.
If size is 0x100000004, do_read() will only read 4 bytes.
Later in the function, fwrite() is called with the full 64-bit size:
!fwrite(header_page, 1, header_page_size, trace_dat_fp)
Will this result in dumping 4GB of uninitialized heap memory into the
output file?
[ ... ]
> @@ -298,8 +405,31 @@ static int read_ftrace_files(struct tep_handle *pevent)
> int count;
> int i;
> int ret;
> + long section_size_pos = 0;
> + long count_pos = 0;
> + unsigned long long section_size = 0;
> + long end_pos;
>
> count = read4(pevent);
> + /* Write ftrace formats section to trace.dat output file */
> + if (trace_dat_fp) {
> + unsigned short section_id = TRACE_DAT_SECTION_FTRACE;
> + unsigned short flags = 0;
> + unsigned int string_id = STRID_FTRACE_FORMATS;
> +
> + trace_dat_ftrace_format_offset = ftell(trace_dat_fp);
> +
> + if (!fwrite(§ion_id, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))
[Severity: Medium]
Should these multi-byte integers be byte-swapped to match the output file's
endianness?
The trace.dat file endianness is explicitly declared to match the original
perf.data file (endian = file_bigendian).
However, variables like section_id, flags, and string_id are written using
fwrite(), which writes them in the host's native byte order. If a
little-endian host is processing a big-endian capture file, won't this corrupt
the section IDs and sizes in the resulting trace.dat file?
> + return -EIO;
> + section_size_pos = ftell(trace_dat_fp);
> + if (!fwrite(§ion_size, sizeof(unsigned long long), 1, trace_dat_fp))
> + return -EIO;
> + count_pos = ftell(trace_dat_fp);
> + if (!fwrite(&count, sizeof(unsigned int), 1, trace_dat_fp))
> + return -EIO;
> + }
[ ... ]
> @@ -363,6 +557,23 @@ static int read_saved_cmdline(struct tep_handle *pevent)
> pr_debug("error reading saved cmdlines\n");
> goto out;
> }
> + /* Write cmdlines section with data */
> + if (trace_dat_fp) {
> + unsigned short section_id = TRACE_DAT_SECTION_CMDLINE;
> + unsigned short flags = 0;
> + unsigned int string_id = STRID_CMDLINES;
> + unsigned long long section_size = sizeof(unsigned long long) + size;
> +
> + trace_dat_cmdline_offset = ftell(trace_dat_fp);
> + if (!fwrite(§ion_id, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
> + !fwrite(§ion_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
> + !fwrite(&size, sizeof(unsigned long long), 1, trace_dat_fp) ||
> + !fwrite(buf, 1, size, trace_dat_fp))
> + return -EIO;
[Severity: Medium]
Does returning directly bypass the cleanup block?
If fwrite() fails, this error path returns -EIO immediately instead of using
the "goto out" pattern established in this function.
Will this cause the allocated memory in buf to be leaked, since free(buf)
is skipped?
> + }
> +
> buf[ret] = '\0';
>
> parse_saved_cmdline(pevent, buf, size);
[ ... ]
> @@ -440,6 +657,28 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
> tep_set_long_size(pevent, file_long_size);
> tep_set_page_size(pevent, file_page_size);
>
> + /* Write initial file header to trace.dat */
> + if (trace_dat_fp) {
> + unsigned char endian = file_bigendian;
> + unsigned char long_size = file_long_size;
> + unsigned int page_size = file_page_size;
> + unsigned long long placeholder = 0;
> + char trace_dat_version = TRACE_DAT_VERSION;
> +
> + if (!fwrite(magic_buf, 1, 10, trace_dat_fp) || /* magic + "tracing" */
> + !fwrite(&trace_dat_version, 1, 2, trace_dat_fp) ||
[Severity: Medium]
Can this write out-of-bounds stack memory to the output file?
The variable trace_dat_version is declared as a 1-byte char:
char trace_dat_version = TRACE_DAT_VERSION;
However, the fwrite call asks to write 2 bytes:
!fwrite(&trace_dat_version, 1, 2, trace_dat_fp)
Will this read 1 byte past the trace_dat_version variable on the stack,
leaking uninitialized stack memory into the output file and potentially
failing to write the expected null terminator for the version string?
> + !fwrite(&endian, 1, 1, trace_dat_fp) ||
> + !fwrite(&long_size, 1, 1, trace_dat_fp) ||
> + !fwrite(&page_size, sizeof(unsigned int), 1, trace_dat_fp) ||
> + !fwrite("none", 1, 4, trace_dat_fp) ||
> + !fwrite("\0", 1, 1, trace_dat_fp) ||
> + !fwrite("\0", 1, 1, trace_dat_fp))
> + return -EIO;
> + trace_dat_options_offset = ftell(trace_dat_fp);
> + if (!fwrite(&placeholder, sizeof(unsigned long long), 1, trace_dat_fp))
> + return -EIO;
> + }
> +
> err = read_header_files(pevent);
> if (err)
> goto out;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=2
next prev parent reply other threads:[~2026-06-08 13:12 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
2026-06-08 12:59 ` [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure Tanushree Shah
2026-06-08 13:13 ` sashiko-bot
2026-06-08 12:59 ` [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing Tanushree Shah
2026-06-08 13:12 ` sashiko-bot [this message]
2026-06-08 12:59 ` [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend Tanushree Shah
2026-06-08 13:14 ` sashiko-bot
2026-06-08 12:59 ` [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format Tanushree Shah
2026-06-08 13:12 ` sashiko-bot
2026-06-08 15:18 ` [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Ian Rogers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260608131224.B448A1F00898@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
--cc=tshah@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox