[PATCH 0/3] perf record: fix multi-record Zstd compression

Linux Perf Users
 help / color / mirror / Atom feed

* [PATCH 0/3] perf record: fix multi-record Zstd compression
@ 2026-06-18 15:59 Dmitry Ilvokhin
  2026-06-18 15:59 ` [PATCH 1/3] perf record: Fix multiple PERF_RECORD_COMPRESSED2 records per push Dmitry Ilvokhin
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Dmitry Ilvokhin @ 2026-06-18 15:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Nick Terrell,
	David Sterba
  Cc: linux-kernel, linux-perf-users, kernel-team, Farid Zakaria,
	Dmitry Ilvokhin

Patch 1 fixes a 'perf record -z' regression that aborts recording with
"Bad address" and produces an undecompressable perf.data.

While fixing it I found two more latent bugs in the same compressor,
zstd_compress_stream_to_records(): an output-buffer overrun and a broken
zstd-error fallback. Addressed in patches 2 and 3.

Dmitry Ilvokhin (3):
  perf record: Fix multiple PERF_RECORD_COMPRESSED2 records per push
  perf record: Avoid overrunning the zstd output buffer
  perf record: Don't store raw data on zstd compression failure

 tools/perf/builtin-record.c                   | 40 ++++++------
 .../record+zstd_comp_decomp_multi_record.sh   | 64 +++++++++++++++++++
 tools/perf/util/zstd.c                        | 15 +++--
 3 files changed, 94 insertions(+), 25 deletions(-)
 create mode 100755 tools/perf/tests/shell/record+zstd_comp_decomp_multi_record.sh

-- 
2.53.0-Meta


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] perf record: Fix multiple PERF_RECORD_COMPRESSED2 records per push
  2026-06-18 15:59 [PATCH 0/3] perf record: fix multi-record Zstd compression Dmitry Ilvokhin
@ 2026-06-18 15:59 ` Dmitry Ilvokhin
  2026-06-18 16:26   ` sashiko-bot
  2026-06-18 15:59 ` [PATCH 2/3] perf record: Avoid overrunning the zstd output buffer Dmitry Ilvokhin
  2026-06-18 15:59 ` [PATCH 3/3] perf record: Don't store raw data on zstd compression failure Dmitry Ilvokhin
  2 siblings, 1 reply; 8+ messages in thread
From: Dmitry Ilvokhin @ 2026-06-18 15:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Nick Terrell,
	David Sterba
  Cc: linux-kernel, linux-perf-users, kernel-team, Farid Zakaria,
	Dmitry Ilvokhin

With Zstd compression enabled ('perf record -z'), a single mmap push
whose compressed output exceeds the maximum record size makes
zstd_compress_stream_to_records() emit several PERF_RECORD_COMPRESSED2
records back to back. record__pushfn() however rewrote only the first
record's header to describe the whole blob as one record:

  event->data_size   = compressed - sizeof(struct perf_record_compressed2);
  event->header.size = PERF_ALIGN(compressed, sizeof(u64));
  padding            = event->header.size - compressed;
  ...
  record__write(rec, map, &pad, padding);

perf_event_header::size is a __u16, so once the compressed blob no
longer fits in it the header.size assignment truncates and 'padding'
(size_t) underflows. write() is then handed that bogus length and fails
with EFAULT, aborting the recording:

  failed to write perf data, error: Bad address

The bytes that did reach the file are mis-framed, so reading it back
cannot be decompressed.

This is easy to hit with a high event rate and a large buffer, e.g.:

  perf record -z -F max -m 32M --per-thread -- perf test -w thloop 5 1

The single-record fixup is wrong by construction: because header.size is
16 bits a compressed record cannot exceed 64KB, so the compressor must
split a push into a chain of records, and the session reader already
consumes them as such.

Frame each record where it is produced instead: process_comp_header()
now sets the per-record data_size, 8-byte-aligns header.size and zeroes
the trailing padding, and record__pushfn() just writes the resulting
blob, as the AIO path already does. max_record_size is reduced by
sizeof(u64) so the per-record alignment padding cannot push header.size
past its u16 field.

There is no on-disk format change; a perf.data written by the fixed tool
is still read by existing perf.

Fixes: 208c0e168344 ("perf record: Add 8-byte aligned event type PERF_RECORD_COMPRESSED2")
Reported-by: Farid Zakaria <fmzakari@meta.com>
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
---
 tools/perf/builtin-record.c                   | 40 ++++++------
 .../record+zstd_comp_decomp_multi_record.sh   | 64 +++++++++++++++++++
 tools/perf/util/zstd.c                        |  2 +-
 3 files changed, 86 insertions(+), 20 deletions(-)
 create mode 100755 tools/perf/tests/shell/record+zstd_comp_decomp_multi_record.sh

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4a5eba498c02..314600812207 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -652,27 +652,14 @@ static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size)
 	struct record *rec = to;
 
 	if (record__comp_enabled(rec)) {
-		struct perf_record_compressed2 *event = map->data;
-		size_t padding = 0;
-		u8 pad[8] = {0};
 		ssize_t compressed = zstd_compress(rec->session, map, map->data,
 						   mmap__mmap_len(map), bf, size);
 
 		if (compressed < 0)
 			return (int)compressed;
 
-		bf = event;
 		thread->samples++;
-
-		/*
-		 * The record from `zstd_compress` is not 8 bytes aligned, which would cause asan
-		 * error. We make it aligned here.
-		 */
-		event->data_size = compressed - sizeof(struct perf_record_compressed2);
-		event->header.size = PERF_ALIGN(compressed, sizeof(u64));
-		padding = event->header.size - compressed;
-		return record__write(rec, map, bf, compressed) ||
-		       record__write(rec, map, &pad, padding);
+		return record__write(rec, map, map->data, compressed);
 	}
 
 	thread->samples++;
@@ -1590,18 +1577,28 @@ static void record__adjust_affinity(struct record *rec, struct mmap *map)
 	}
 }
 
-static size_t process_comp_header(void *record, size_t increment)
+/*
+ * Called once with data_size == 0 to start a record, then once with
+ * data_size == compressed payload size to finalize and 8-byte-pad it.
+ */
+static size_t process_comp_header(void *record, size_t data_size)
 {
 	struct perf_record_compressed2 *event = record;
 	size_t size = sizeof(*event);
 
-	if (increment) {
-		event->header.size += increment;
-		return increment;
+	if (data_size) {
+		size_t padding;
+
+		event->data_size = data_size;
+		event->header.size = PERF_ALIGN(size + data_size, sizeof(u64));
+		padding = event->header.size - size - data_size;
+		memset(record + size + data_size, 0, padding);
+		return data_size + padding;
 	}
 
 	event->header.type = PERF_RECORD_COMPRESSED2;
 	event->header.size = size;
+	event->data_size = 0;
 
 	return size;
 }
@@ -1610,7 +1607,12 @@ static ssize_t zstd_compress(struct perf_session *session, struct mmap *map,
 			    void *dst, size_t dst_size, void *src, size_t src_size)
 {
 	ssize_t compressed;
-	size_t max_record_size = PERF_SAMPLE_MAX_SIZE - sizeof(struct perf_record_compressed2) - 1;
+	/*
+	 * Reserve space so per-record PERF_ALIGN() padding keeps header.size
+	 * within u16.
+	 */
+	size_t max_record_size = PERF_SAMPLE_MAX_SIZE
+		- sizeof(struct perf_record_compressed2) - sizeof(u64);
 	struct zstd_data *zstd_data = &session->zstd_data;
 
 	if (map && map->file)
diff --git a/tools/perf/tests/shell/record+zstd_comp_decomp_multi_record.sh b/tools/perf/tests/shell/record+zstd_comp_decomp_multi_record.sh
new file mode 100755
index 000000000000..42efe7260def
--- /dev/null
+++ b/tools/perf/tests/shell/record+zstd_comp_decomp_multi_record.sh
@@ -0,0 +1,64 @@
+#!/bin/bash
+# Zstd perf.data compression/decompression of multi-record data
+
+# SPDX-License-Identifier: GPL-2.0
+
+perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
+recout=$(mktemp /tmp/__perf_test.zstd.rec.XXXXX)
+injout=$(mktemp /tmp/__perf_test.zstd.inj.XXXXX)
+perf_tool=perf
+
+cleanup() {
+	rm -f "${perfdata}" "${perfdata}".old "${perfdata}".decomp "${recout}" "${injout}"
+}
+trap cleanup EXIT TERM INT
+
+skip_if_no_z_record() {
+	$perf_tool record -h 2>&1 | grep -q -- '-z, --compression-level'
+}
+
+collect_z_record() {
+	echo "Collecting compressed record file:"
+	[ "$(uname -m)" != s390x ] && gflag='-g'
+	$perf_tool record -o "${perfdata}" $gflag -z -F max -m 32M --per-thread -- \
+		$perf_tool test -w thloop 5 1 \
+		>/dev/null 2>"${recout}"
+}
+
+check_record() {
+	echo "Checking record did not fail to write data:"
+	if grep -q "failed to write perf data" "${recout}"; then
+		cat "${recout}"
+		return 1
+	fi
+}
+
+check_decompress() {
+	echo "Checking compressed file decompresses cleanly:"
+	if ! $perf_tool inject -i "${perfdata}" -o "${perfdata}".decomp 2>"${injout}"; then
+		cat "${injout}"
+		return 1
+	fi
+	if grep -Eqi "decompress|corrupt|failed to process type" "${injout}"; then
+		cat "${injout}"
+		return 1
+	fi
+}
+
+skip_if_no_z_record || exit 2
+collect_z_record
+check_record || exit 1
+
+# Need >1 record, else the multi-record path wasn't exercised.
+# Skip rather than pass/fail spuriously.
+nr=$($perf_tool report -i "${perfdata}" --stats 2>/dev/null |
+	awk '/COMPRESSED2 events:/ { print $3 }')
+if [ -z "${nr}" ] || [ "${nr}" -lt 2 ]; then
+	echo "less than two compressed records (${nr:-0}), skipping"
+	exit 2
+fi
+echo "Produced ${nr} compressed records"
+
+check_decompress
+err=$?
+exit $err
diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
index 57027e0ac7b6..1955fa2431d1 100644
--- a/tools/perf/util/zstd.c
+++ b/tools/perf/util/zstd.c
@@ -30,7 +30,7 @@ int zstd_fini(struct zstd_data *data)
 
 ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
 				       void *src, size_t src_size, size_t max_record_size,
-				       size_t process_header(void *record, size_t increment))
+				       size_t process_header(void *record, size_t data_size))
 {
 	size_t ret, size, compressed = 0;
 	ZSTD_inBuffer input = { src, src_size, 0 };
-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] perf record: Avoid overrunning the zstd output buffer
  2026-06-18 15:59 [PATCH 0/3] perf record: fix multi-record Zstd compression Dmitry Ilvokhin
  2026-06-18 15:59 ` [PATCH 1/3] perf record: Fix multiple PERF_RECORD_COMPRESSED2 records per push Dmitry Ilvokhin
@ 2026-06-18 15:59 ` Dmitry Ilvokhin
  2026-06-18 16:20   ` sashiko-bot
  2026-06-18 15:59 ` [PATCH 3/3] perf record: Don't store raw data on zstd compression failure Dmitry Ilvokhin
  2 siblings, 1 reply; 8+ messages in thread
From: Dmitry Ilvokhin @ 2026-06-18 15:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Nick Terrell,
	David Sterba
  Cc: linux-kernel, linux-perf-users, kernel-team, Farid Zakaria,
	Dmitry Ilvokhin

zstd_compress_stream_to_records() compresses one mmap chunk into 'dst'
as a chain of records, decrementing 'dst_size' as each record's header
and compressed payload are written, but it never checks that the space
left can still hold a record before writing one.

When the compressed output does not fit in 'dst', 'dst_size -= size'
underflows ('dst_size' is a size_t), the output size is then taken as
max_record_size again, and ZSTD_compressStream() writes past the end
of 'dst'.

Bail out when the remaining 'dst_size' can no longer hold a full record
(header + max_record_size + 8-byte alignment) rather than writing past
'dst'.

Fixes: f24c1d7523e6 ("perf tools: Introduce Zstd streaming based compression API")
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
---
 tools/perf/util/zstd.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
index 1955fa2431d1..a0f6572ad201 100644
--- a/tools/perf/util/zstd.c
+++ b/tools/perf/util/zstd.c
@@ -32,7 +32,7 @@ ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_
 				       void *src, size_t src_size, size_t max_record_size,
 				       size_t process_header(void *record, size_t data_size))
 {
-	size_t ret, size, compressed = 0;
+	size_t ret, size, compressed = 0, hdr_size = 0;
 	ZSTD_inBuffer input = { src, src_size, 0 };
 	ZSTD_outBuffer output;
 	void *record;
@@ -53,8 +53,14 @@ ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_
 	}
 
 	while (input.pos < input.size) {
+		if (hdr_size &&
+		    dst_size < hdr_size + max_record_size + sizeof(u64)) {
+			pr_err("failed to compress: output buffer too small\n");
+			return -1;
+		}
 		record = dst;
 		size = process_header(record, 0);
+		hdr_size = size;
 		compressed += size;
 		dst += size;
 		dst_size -= size;
-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] perf record: Don't store raw data on zstd compression failure
  2026-06-18 15:59 [PATCH 0/3] perf record: fix multi-record Zstd compression Dmitry Ilvokhin
  2026-06-18 15:59 ` [PATCH 1/3] perf record: Fix multiple PERF_RECORD_COMPRESSED2 records per push Dmitry Ilvokhin
  2026-06-18 15:59 ` [PATCH 2/3] perf record: Avoid overrunning the zstd output buffer Dmitry Ilvokhin
@ 2026-06-18 15:59 ` Dmitry Ilvokhin
  2026-06-18 16:24   ` sashiko-bot
  2 siblings, 1 reply; 8+ messages in thread
From: Dmitry Ilvokhin @ 2026-06-18 15:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Nick Terrell,
	David Sterba
  Cc: linux-kernel, linux-perf-users, kernel-team, Farid Zakaria,
	Dmitry Ilvokhin

On a ZSTD_compressStream() error, zstd_compress_stream_to_records()
falls back to copying the uncompressed input into the output buffer and
returning its size. That cannot work:

- a COMPRESSED2 payload is always fed to zstd_decompress_stream(), so
  raw bytes placed there can never be read back.

- the record header is never finalized: process_header() is not called
  to write the real header.size and data_size, so they keep their
  initial values and do not match the bytes written.

- the copy is unbounded and can write past the output buffer.

Propagate the error instead.

Fixes: f24c1d7523e6 ("perf tools: Introduce Zstd streaming based compression API")
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
---
 tools/perf/util/zstd.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
index a0f6572ad201..2de777d49ee0 100644
--- a/tools/perf/util/zstd.c
+++ b/tools/perf/util/zstd.c
@@ -1,7 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
 
-#include <string.h>
-
 #include "util/compress.h"
 #include "util/debug.h"
 
@@ -71,8 +69,7 @@ ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_
 		if (ZSTD_isError(ret)) {
 			pr_err("failed to compress %ld bytes: %s\n",
 				(long)src_size, ZSTD_getErrorName(ret));
-			memcpy(dst, src, src_size);
-			return src_size;
+			return -1;
 		}
 		size = output.pos;
 		size = process_header(record, size);
-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] perf record: Avoid overrunning the zstd output buffer
  2026-06-18 15:59 ` [PATCH 2/3] perf record: Avoid overrunning the zstd output buffer Dmitry Ilvokhin
@ 2026-06-18 16:20   ` sashiko-bot
  2026-06-19 14:19     ` Dmitry Ilvokhin
  0 siblings, 1 reply; 8+ messages in thread
From: sashiko-bot @ 2026-06-18 16:20 UTC (permalink / raw)
  To: Dmitry Ilvokhin; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] The patch introduces an overly strict buffer size check that causes compression to fail prematurely whenever less than `max_record_size` (~64KB) space remains.

Pre-existing issues:
- [High] The safety bounds check added to `zstd_compress_stream_to_records` is bypassed on the first loop iteration, leaving the buffer vulnerable to overflow and integer underflow.
- [High] In `zstd_decompress_stream()`, the `output.dst` pointer is advanced while `output.pos` is improperly preserved, leading to out-of-bounds writes and corrupted output if the decompression loop executes more than once.
- [High] If `perf_mmap__push` processes the ring buffer in two chunks and the second chunk's compression fails, the reference to `map->core` acquired during the first chunk is leaked.
--

commit 09f418a2fa19ace0234b4443e193f65fb347ed8b
Author: Dmitry Ilvokhin <d@ilvokhin.com>

perf record: Avoid overrunning the zstd output buffer

This commit adds a bounds check in zstd_compress_stream_to_records() to
prevent writing past the end of the output buffer. It verifies if the
remaining buffer size is sufficient for a full record.

> diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
> index 1955fa2431d17..a0f6572ad2017 100644
> --- a/tools/perf/util/zstd.c
> +++ b/tools/perf/util/zstd.c

[ ... ]

> @@ -53,8 +53,14 @@ ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_
>  	}
>  
>  	while (input.pos < input.size) {
> +		if (hdr_size &&
> +		    dst_size < hdr_size + max_record_size + sizeof(u64)) {

[Severity: High]
This isn't a bug introduced by this patch, but by gating the bounds check on
hdr_size, isn't the check bypassed on the very first loop iteration since
hdr_size is initialized to 0?

If dst_size is initially very small (which can legitimately happen in 
record__aio_pushfn() when calculating mmap__mmap_len(map) - aio->size), the 
code will unconditionally write the header and decrement dst_size.

Could this underflow dst_size to a massive positive value, bypassing
subsequent bounds checks and allowing ZSTD_compressStream() to write out of
bounds?

[Severity: High]
Does this check introduce an overly strict requirement that causes compression 
to fail prematurely?

By requiring dst_size < hdr_size + max_record_size + sizeof(u64), this 
condition unconditionally fails compression if the remaining buffer is less 
than the maximum record size (typically ~64KB).

In AIO mode, dst_size represents the remaining space in the AIO buffer, which
naturally dwindles. If a small chunk needs to be compressed and could fit in
the remaining space, wouldn't this check spuriously abort the compression and
render the last ~64KB of every buffer unusable?

> +			pr_err("failed to compress: output buffer too small\n");
> +			return -1;
> +		}
>  		record = dst;
>  		size = process_header(record, 0);
> +		hdr_size = size;
>  		compressed += size;
>  		dst += size;
>  		dst_size -= size;

[Severity: High]
This isn't a bug introduced by this patch, but looking at
zstd_decompress_stream() in the same file, is there an issue with how
output.pos is handled?

tools/perf/util/zstd.c:zstd_decompress_stream() {
    ...
    output.dst  = dst + output.pos;
    output.size = dst_size - output.pos;
    ...
}

The loop advances the destination pointer but fails to clear output.pos.

If the decompression loop executes more than once, will ZSTD start writing
at the new output.dst + output.pos, effectively skipping output.pos bytes
and corrupting the decompressed stream?

[Severity: High]
This isn't a bug introduced by this patch, but does the strict bounds check
introduced above make a memory leak in AIO mode more easily triggerable?

In tools/perf/builtin-record.c:record__aio_pushfn(), a reference to map->core 
is acquired on the first successful chunk:

tools/perf/builtin-record.c:record__aio_pushfn() {
    ...
    if (!aio->size) {
        perf_mmap__get(&map->core);
    }
    ...
}

If perf_mmap__push() processes the ring buffer in two chunks, and the
second chunk's compression fails (which happens more frequently now due
to the ~64KB bounds check), perf_mmap__push() aborts and passes the error
back to record__aio_push():

tools/perf/builtin-record.c:record__aio_push() {
    ...
    ret = perf_mmap__push(map, &aio, record__aio_pushfn);
    if (ret != 0)
        return ret;
    ...
}

Will this return immediately on failure and bypass the perf_mmap__put()
cleanup that occurs later, resulting in a leak of the map->core resource?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1781797544.git.d@ilvokhin.com?part=2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] perf record: Don't store raw data on zstd compression failure
  2026-06-18 15:59 ` [PATCH 3/3] perf record: Don't store raw data on zstd compression failure Dmitry Ilvokhin
@ 2026-06-18 16:24   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-18 16:24 UTC (permalink / raw)
  To: Dmitry Ilvokhin; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:

Pre-existing issues:
- [Critical] The `zstd_decompress_stream` function incorrectly updates the output buffer pointer, causing an out-of-bounds write and data corruption.
- [Critical] Buffer overflow via integer underflow in `zstd_compress_stream_to_records` bypasses the bounds check.
- [High] Unflushed compressed data is left in the ZSTD internal buffer and permanently lost at the end of a recording session.
- [High] The return value of `ZSTD_flushStream()` is entirely ignored, masking potential compression state errors.
- [High] A missing `perf_mmap__put` in the error path of `record__aio_push` causes a reference count leak of `map->core`.
- [High] Skipped `perf_mmap__consume` on partial failure causes duplicate event records in the `perf.data` file.
- [High] An overly strict bounds check in `zstd_compress_stream_to_records` causes `perf record` to abort when compressing large chunks of trace data.
--

commit 460b94129652aafbc57f9667aa7c4a4a81ffacbe
Author: Dmitry Ilvokhin <d@ilvokhin.com>

perf record: Don't store raw data on zstd compression failure

On a ZSTD_compressStream() error, zstd_compress_stream_to_records()
falls back to copying the uncompressed input into the output buffer and
returning its size. That cannot work:

- a COMPRESSED2 payload is always fed to zstd_decompress_stream(), so
  raw bytes placed there can never be read back.

- the record header is never finalized: process_header() is not called
  to write the real header.size and data_size, so they keep their
  initial values and do not match the bytes written.

- the copy is unbounded and can write past the output buffer.

Propagate the error instead.

Fixes: f24c1d7523e6 ("perf tools: Introduce Zstd streaming based compression API")

> diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
> index a0f6572ad2017..2de777d49ee0b 100644
> --- a/tools/perf/util/zstd.c
> +++ b/tools/perf/util/zstd.c
[ ... ]
> @@ -71,8 +69,7 @@ ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_
>  		if (ZSTD_isError(ret)) {
>  			pr_err("failed to compress %ld bytes: %s\n",
>  				(long)src_size, ZSTD_getErrorName(ret));
> -			memcpy(dst, src, src_size);
> -			return src_size;
> +			return -1;
>  		}
>  		size = output.pos;
>  		size = process_header(record, size);

[Severity: High]
This isn't a bug introduced by this patch, but does the loop ignore the
return value of ZSTD_flushStream()?

Looking at zstd_compress_stream_to_records():

		ret = ZSTD_compressStream(data->cstream, &output, &input);
		ZSTD_flushStream(data->cstream, &output);
		if (ZSTD_isError(ret)) {

The return value of ZSTD_flushStream() is never checked. The subsequent
ZSTD_isError(ret) only evaluates the error code of ZSTD_compressStream().
If the flush operation encounters a failure, the error code is discarded,
potentially resulting in silently corrupted data streams or emitting an
incomplete compressed payload without propagating an error to the user.

[Severity: High]
This is a pre-existing issue, but does zstd_compress_stream_to_records()
leave unflushed compressed data in the ZSTD internal buffer?

In tools/perf/util/zstd.c:zstd_compress_stream_to_records():

	while (input.pos < input.size) {
        ...

The loop exits strictly when input.pos == input.size. Since
ZSTD_flushStream() can leave unflushed data in the internal compression
stream if the output buffer fills up, and perf record never calls
ZSTD_endStream() to force a final flush at the end of the session, any
remaining compressed data trapped in the stream buffer is silently and
permanently lost, resulting in truncated or corrupted trace files.

[Severity: Critical]
This is a pre-existing issue, but does zstd_decompress_stream() incorrectly
advance the output buffer pointer, causing an out-of-bounds write?

In tools/perf/util/zstd.c:zstd_decompress_stream():

	while (input.pos < input.size) {
		ret = ZSTD_decompressStream(data->dstream, &output, &input);
        ...
		output.dst  = dst + output.pos;
		output.size = dst_size - output.pos;
	}

The output.dst pointer is advanced by output.pos
(output.dst = dst + output.pos). However, output.pos is the accumulated
number of bytes written, which ZSTD_decompressStream() internally adds to
output.dst. This evaluates to an effective write address of
dst + 2 * output.pos.

Consequently, the decompression loop skips over output.pos bytes on every
iteration, leaving uninitialized gaps in the decompressed output and quickly
writing past the dst_size bounds, causing severe memory corruption.

[Severity: Critical]
This is a pre-existing issue, but does zstd_compress_stream_to_records()
have a buffer overflow via integer underflow that bypasses bounds checking?

In tools/perf/util/zstd.c:zstd_compress_stream_to_records():

		if (hdr_size &&
		    dst_size < hdr_size + max_record_size + sizeof(u64)) {
			pr_err("failed to compress: output buffer too small\n");
			return -1;
		}
		record = dst;
		size = process_header(record, 0);

When perf_mmap__push() splits the ring buffer, the second chunk can be
passed to zstd_compress() with a very small dst_size (e.g., 8 bytes). Since
hdr_size is initially 0, the boundary check is bypassed on the first
iteration.

process_header() unconditionally writes a 24-byte header into the <24 byte
remaining buffer, corrupting memory. Then dst_size -= 24 underflows to a
huge value (SIZE_MAX), causing ZSTD_compressStream() to write up to 64KB
(max_record_size) entirely out of bounds.

[Severity: High]
This is a pre-existing issue, but does the error path in record__aio_push()
cause a reference count leak of map->core?

In tools/perf/builtin-record.c:record__aio_push():

	if (ret != 0) /* ret > 0 - no data, ret < 0 - error */
		return ret;

When zstd_compress() fails on the second mmap chunk, record__aio_pushfn()
propagates the error (< 0). perf_mmap__push() then aborts and returns the
error code.

The code checks if (ret != 0) and returns immediately, skipping the
perf_mmap__put(&map->core) cleanup that balances the perf_mmap__get()
acquired during the successful first chunk. This unhandled error path causes
an LSAN-reportable memory leak and violates perf tool reference counting
invariants.

[Severity: High]
This is a pre-existing issue, but does skipping perf_mmap__consume() on
partial failure cause duplicate event records in the perf.data file?

In tools/perf/util/mmap.c:perf_mmap__push():

	if (push(md, to, buf, size) < 0) {
		rc = -1;
		goto out;
	}

If the second chunk of a split ring buffer fails during zstd_compress(),
perf_mmap__push() aborts via goto out, skipping the
perf_mmap__consume(&md->core) call.

Because perf_mmap__consume() is skipped, the ring buffer's consumer tail
pointer is never advanced, even though the first chunk was successfully
compressed and written. The caller record__mmap_read_evlist() receives the
error, aborts, and triggers a retry via record__mmap_read_all(rec, true),
which reads and writes the exact same first chunk again, resulting in
duplicated events and corrupted profiling data.

[Severity: High]
This is a pre-existing issue, but does an overly strict bounds check in
zstd_compress_stream_to_records() cause perf record to improperly abort?

In tools/perf/util/zstd.c:zstd_compress_stream_to_records():

		if (hdr_size &&
		    dst_size < hdr_size + max_record_size + sizeof(u64)) {
			pr_err("failed to compress: output buffer too small\n");
			return -1;
		}

The loop requires dst_size to be larger than approximately 64KB
(hdr_size + max_record_size + sizeof(u64)) to proceed to the next iteration.

If the remaining buffer space falls below this threshold near the end of a
compression loop, it unconditionally returns -1 and prints "output buffer
too small", even if only a few bytes of input remain to be compressed. This
logic effectively turns a normal buffer sizing condition into a fatal error,
instantly aborting the entire perf record session.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1781797544.git.d@ilvokhin.com?part=3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] perf record: Fix multiple PERF_RECORD_COMPRESSED2 records per push
  2026-06-18 15:59 ` [PATCH 1/3] perf record: Fix multiple PERF_RECORD_COMPRESSED2 records per push Dmitry Ilvokhin
@ 2026-06-18 16:26   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-18 16:26 UTC (permalink / raw)
  To: Dmitry Ilvokhin; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Missing bounds check on `dst_size` during the first iteration causes unsigned underflow and out-of-bounds writes.
--

commit 05c1411c44b6c31eba9854745a0f8d5ce767f17a
Author: Dmitry Ilvokhin <d@ilvokhin.com>

perf record: Fix multiple PERF_RECORD_COMPRESSED2 records per push

With Zstd compression enabled, a single mmap push whose compressed output
exceeds the maximum record size emits several back-to-back compressed
records. The record__pushfn() function incorrectly rewrote only the first
record's header to describe the whole blob, which resulted in a bogus
truncated length and corrupted the perf.data file. This patch fixes the
issue by framing each record properly where it is produced, ensuring the
sizes remain valid.

[ ... ]

> diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
> index 57027e0ac7b65..1955fa2431d17 100644
> --- a/tools/perf/util/zstd.c
> +++ b/tools/perf/util/zstd.c
> @@ -30,7 +30,7 @@ int zstd_fini(struct zstd_data *data)
>  
>  ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>  				       void *src, size_t src_size, size_t max_record_size,
> -				       size_t process_header(void *record, size_t increment))
> +				       size_t process_header(void *record, size_t data_size))
>  {
>  	size_t ret, size, compressed = 0;
>  	ZSTD_inBuffer input = { src, src_size, 0 };

[Severity: Critical]
This isn't a bug introduced by this patch, but is there a missing bounds check
on dst_size during the first iteration in this function?

Looking at zstd_compress_stream_to_records(), the bounds check explicitly
skips the first iteration because hdr_size is initialized to 0:

	while (input.pos < input.size) {
		if (hdr_size &&
		    dst_size < hdr_size + max_record_size + sizeof(u64)) {
			pr_err("failed to compress: output buffer too small\n");
			return -1;
		}
		record = dst;
		size = process_header(record, 0);
		hdr_size = size;
		compressed += size;
		dst += size;
		dst_size -= size;

If this function is called with an initially small dst_size (e.g., < 16),
process_header() will perform an out-of-bounds write of the 16-byte
perf_record_compressed2 header.

Following this, dst_size -= size will unconditionally unsigned underflow to a
massive value. This allows ZSTD_compressStream() to write up to
max_record_size bytes out-of-bounds into the dst buffer.

Could this be triggered in aio mode if perf_mmap__push() splits a wrap-around
kernel ring buffer, and the first compressed chunk leaves less than 16 bytes
in the aio->data buffer? The second call to zstd_compress() would then pass
a dst_size < 16, bypassing the bounds check on the first iteration and
triggering a buffer overflow.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1781797544.git.d@ilvokhin.com?part=1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] perf record: Avoid overrunning the zstd output buffer
  2026-06-18 16:20   ` sashiko-bot
@ 2026-06-19 14:19     ` Dmitry Ilvokhin
  0 siblings, 0 replies; 8+ messages in thread
From: Dmitry Ilvokhin @ 2026-06-19 14:19 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: linux-perf-users

On Thu, Jun 18, 2026 at 04:20:59PM +0000, sashiko-bot@kernel.org wrote:
> [Severity: High]
> Does this check introduce an overly strict requirement that causes compression 
> to fail prematurely?
> 
> By requiring dst_size < hdr_size + max_record_size + sizeof(u64), this 
> condition unconditionally fails compression if the remaining buffer is less 
> than the maximum record size (typically ~64KB).

Fair point.

After thinking about it a bit more, there is also another problem: the
check is bypassed for the first iteration of the loop. I think it is
correct, because of how the output buffer is sized, but it is not
immediately obvious from the code and needs to be traced through few
call levels.

Probably a better solution would be to have 'header_size' argument in
zstd_compress_stream_to_records(): then we would not need to rely on
process_header()'s return value, the check could run on every iteration,
and it would only need room for a header rather than a full
'max_record_size'.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-06-19 14:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-18 15:59 [PATCH 0/3] perf record: fix multi-record Zstd compression Dmitry Ilvokhin
2026-06-18 15:59 ` [PATCH 1/3] perf record: Fix multiple PERF_RECORD_COMPRESSED2 records per push Dmitry Ilvokhin
2026-06-18 16:26   ` sashiko-bot
2026-06-18 15:59 ` [PATCH 2/3] perf record: Avoid overrunning the zstd output buffer Dmitry Ilvokhin
2026-06-18 16:20   ` sashiko-bot
2026-06-19 14:19     ` Dmitry Ilvokhin
2026-06-18 15:59 ` [PATCH 3/3] perf record: Don't store raw data on zstd compression failure Dmitry Ilvokhin
2026-06-18 16:24   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox