Linux Perf Users
* [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion
@ 2026-06-08 12:59 Tanushree Shah
  2026-06-08 12:59 ` [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure Tanushree Shah
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Tanushree Shah @ 2026-06-08 12:59 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
	irogers, namhyung, linux-kernel
  Cc: linux-perf-users, linuxppc-dev, atrajeev, hbathini, Tejas.Manhas1,
	Tanushree.Shah, Shivani.Nittor, Tanushree Shah

This RFC patch series introduces support for converting perf.data files
containing tracepoint events into trace.dat format, enabling seamless
visualization and analysis using KernelShark.

======================
Background and Motivation
======================

Currently, perf and trace-cmd operate as separate tracing ecosystems with
incompatible data formats. Users who collect tracepoint data with
'perf record' cannot easily visualize it in KernelShark's graphical
timeline view or leverage trace-cmd's analysis capabilities.

This creates workflow friction when users need to:

- Visualize perf tracepoint data in KernelShark's interactive graphical
  timeline
- Share trace data between perf and trace-cmd workflows and toolchains
- Perform architecture-independent conversion and analysis of traces

This conversion bridge eliminates these barriers by enabling seamless
data exchange between perf and trace-cmd ecosystems, allowing users to
choose the best tool for each analysis phase.

======================
Implementation Overview
======================

The series implements the trace.dat file format specification (version 7)
within perf's data conversion framework.

**Patch 1/4: Core trace.dat Export Infrastructure**
Introduces util/trace-dat.c and util/trace-dat.h implementing:
- Per-CPU raw event buffer management (init, collect, free)
- Ftrace ring buffer page construction
- trace.dat section writers (strings, options, flyrecord sections)
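
As a rough illustration of the ring buffer page construction above, the
sketch below packs one record header word the way the TRACE_DAT_RECORD_*
constants in patch 1/4 describe it: the low 5 bits carry the payload length
in 4-byte words and the upper 27 bits carry the time delta from the page's
base timestamp. The REC_* macros and rec_*() helpers are illustrative names,
not the ones used in trace-dat.c. The sketch assumes inline lengths only
(1 to 28 words, since values 29-31 are reserved by the ring buffer for
padding and time records) and rejects anything it cannot encode rather than
truncating it.

```c
#include <stdint.h>

#define REC_TYPE_LEN_BITS	5		/* low bits: length in 4-byte words */
#define REC_TYPE_LEN_MASK	0x1Fu
#define REC_TIME_MASK		0x07FFFFFFu	/* 27-bit time delta */
#define REC_WORD_SIZE		4u
#define REC_MAX_INLINE_WORDS	28u		/* 29-31 are reserved type values */

/*
 * Pack a record header word for a payload of data_len bytes.
 * Returns 0 on success, -1 if the length or the time delta does not fit
 * into the inline encoding (hypothetical error convention).
 */
static int rec_pack_header(uint32_t data_len, uint64_t time_delta, uint32_t *hdr)
{
	uint32_t words;

	if (data_len == 0 || data_len > REC_MAX_INLINE_WORDS * REC_WORD_SIZE)
		return -1;
	if (time_delta > REC_TIME_MASK)
		return -1;

	/* Round the payload up to whole 4-byte words. */
	words = (data_len + REC_WORD_SIZE - 1) / REC_WORD_SIZE;

	*hdr = ((uint32_t)time_delta << REC_TYPE_LEN_BITS) | words;
	return 0;
}

/* Extract the length field (in words) from a header word. */
static uint32_t rec_header_words(uint32_t hdr)
{
	return hdr & REC_TYPE_LEN_MASK;
}

/* Extract the time delta from a header word. */
static uint32_t rec_header_delta(uint32_t hdr)
{
	return hdr >> REC_TYPE_LEN_BITS;
}
```

A caller would start a new page (or emit a separate time record) when
rec_pack_header() reports that the delta no longer fits in 27 bits.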

**Patch 2/4: Metadata Integration**
Extends util/trace-event-read.c to write trace.dat metadata during
perf.data parsing:
- Initial format header (magic, version, endian, page size, compression)
- Section 16: HEADER INFO (header_page + header_event)
- Section 17: FTRACE EVENT FORMATS
- Section 18: EVENT FORMATS (per system/event format files)
- Section 19: KALLSYMS
- Section 21: CMDLINES
- Section 15: STRINGS (written last after all sections)
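
Each section header refers to its name by byte offset into the STRINGS
section. A minimal sketch of how those offsets come about, assuming the names
are stored back to back as NUL-terminated strings in the order patch 1/4
writes them; strtab_offsets() is a made-up helper for this example:

```c
#include <stddef.h>
#include <string.h>

/* Section names in the order patch 1/4 writes them to the STRINGS section. */
static const char * const section_names[] = {
	"headers", "ftrace event formats", "events format", "kallsyms",
	"cmdlines", "strings", "options", "options", "buffer-flyrecord",
};
#define NR_SECTION_NAMES (sizeof(section_names) / sizeof(section_names[0]))

/*
 * Fill offsets[i] with the byte offset of names[i] inside the string table.
 * Returns the total table size in bytes.
 */
static size_t strtab_offsets(const char * const *names, size_t nr,
			     size_t *offsets)
{
	size_t pos = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		offsets[i] = pos;
		pos += strlen(names[i]) + 1;	/* +1 for the terminating NUL */
	}
	return pos;
}
```

For the nine names above this yields offsets 0, 8, 29, 43, 52, 61, 69, 77
and 85, which match the STRID_* constants in trace-dat.h, and a total table
size of 102 bytes.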

**Patch 3/4: Conversion Backend**
Implements util/data-convert-trace.c with the trace_convert__perf2dat()
function:
- Processes PERF_TYPE_TRACEPOINT samples via process_sample_event()
- Collects raw event data per-CPU using trace_dat__collect_cpu_event()
- Writes OPTIONS sections (CPUCOUNT, TRACECLOCK, metadata offsets)
- Writes FLYRECORD section with per-CPU ring buffer pages
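
Each entry in an OPTIONS section, as patch 1/4 writes it, is a 2-byte option
id, a 4-byte payload size, and then the payload, all in host byte order. The
sketch below appends one such entry to a caller-supplied buffer instead of a
FILE; opt_append() and its bounds handling are assumptions made for
illustration and do not appear in the series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_CPUCOUNT 8	/* same value as TRACE_DAT_OPTION_CPUCOUNT */

/*
 * Append one option (id, size, payload) at buf + pos.
 * Returns the new write position, or 0 if the entry does not fit.
 * A successful call never returns 0 because every entry is at least
 * 6 bytes long.
 */
static size_t opt_append(unsigned char *buf, size_t cap, size_t pos,
			 uint16_t id, const void *payload, uint32_t size)
{
	size_t need = sizeof(id) + sizeof(size) + size;

	if (pos > cap || need > cap - pos)
		return 0;

	memcpy(buf + pos, &id, sizeof(id));
	pos += sizeof(id);
	memcpy(buf + pos, &size, sizeof(size));
	pos += sizeof(size);
	if (size)
		memcpy(buf + pos, payload, size);
	return pos + size;
}
```

A CPUCOUNT entry with a 4-byte count therefore occupies 10 bytes, and the
payload of any entry starts 2 + 4 bytes after the entry itself, which is the
same skip patch 1/4 uses when it back-patches the DONE option.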

**Patch 4/4: User Interface**
Extends tools/perf/builtin-data.c with a --to-trace-dat option:
- Adds command-line option for trace.dat output
- Mutually exclusive with --to-ctf and --to-json
- Calls trace_convert__perf2dat() to perform conversion

======================
Current Implementation Details
======================

**trace.dat Format Version:**
The implementation currently targets trace.dat format version 7, which
is the stable version supported by current trace-cmd releases (v3.x).
This version is hardcoded to ensure compatibility with existing
trace-cmd and KernelShark installations. Future enhancements could add
version negotiation or support for newer format versions as they become
standardized.

**Compression Strategy:**
Compression is explicitly disabled (set to NONE) in the generated
trace.dat files. This design choice:
- Simplifies the initial implementation and testing
- Ensures maximum compatibility across trace-cmd versions
- Avoids external compression library dependencies

Future work could add support for various compression algorithms (zlib,
zstd, lz4) with runtime selection via command-line options, significantly
reducing file sizes for large traces.

======================
Usage Example
======================

```bash
# Record tracepoint events with perf
perf record -e sched:sched_switch -e sched:sched_wakeup -a sleep 10

# Convert to trace.dat format
perf data convert --to-trace-dat=output.dat

# Verify trace.dat structure
trace-cmd dump --summary output.dat

# Analyze with trace-cmd
trace-cmd report output.dat

# Visualize in KernelShark
kernelshark output.dat
```

**Conversion Output:**
```
[ perf data convert: Converted 'perf.data' into trace.dat format 'output.dat' ]
[ perf data convert: Converted 2684 events ]
```
**trace-cmd dump --summary Output:**
```
 Tracing meta data in file output.dat:
	[Initial format]
		7	[Version]
		0	[Little endian]
		8	[Bytes in a long]
		65536	[Page size, bytes]
		none	[Compression algorithm]
			[Compression version]
	[buffer "", "local" clock, 65536 page size, 16 cpus, 1048576 bytes flyrecord data]
	[10 options]
	[Saved command lines, 0 bytes]
	[Kallsyms, 0 bytes]
	[Ftrace format, 0 events]
	[Header page, 206 bytes]
	[Header event, 205 bytes]
	[Events format, 1 systems]
	[9 sections]
```
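
The flyrecord size in the summary above is consistent with one 65536-byte
page per CPU (16 x 65536 = 1048576). A small sketch of that arithmetic,
together with the page-boundary padding that patch 1/4 applies before the
per-CPU data; both helper names are hypothetical and page_size is assumed to
be non-zero:

```c
#include <stdint.h>

/* Bytes of zero padding needed to advance pos to the next page boundary. */
static uint64_t pad_to_page(uint64_t pos, uint32_t page_size)
{
	return (page_size - pos % page_size) % page_size;
}

/* Total flyrecord data size when every CPU holds nr_pages full pages. */
static uint64_t flyrecord_bytes(uint32_t nr_cpus, uint32_t nr_pages,
				uint32_t page_size)
{
	return (uint64_t)nr_cpus * nr_pages * page_size;
}
```

The outer modulo in pad_to_page() keeps the result at 0 when the position is
already page aligned, instead of adding a whole extra page.
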
======================
Testing and Verification
======================

The series has been extensively tested with:
- Various tracepoint events (sched, irq, syscalls, block I/O)
- Mixed recordings containing both tracepoint and non-tracepoint events
  (only tracepoints converted)
- Verification with trace-cmd report and KernelShark visualization
- Memory leak testing with Valgrind (0 bytes leaked)
- Cross-architecture testing (x86_64, ppc64le)

All generated trace.dat files successfully open in:
- trace-cmd report (v3.1+)
- KernelShark (v2.0+)

======================
Next Steps
======================

We would highly appreciate reviews, comments, and feedback on:
- The overall architectural approach and integration points
- Compatibility considerations with trace-cmd ecosystem
- Performance characteristics for large-scale traces
- Additional use cases or workflow scenarios
- Future enhancement priorities

Tanushree Shah (4):
  perf/trace-dat: Add trace.dat export infrastructure
  perf/trace-event: Write trace.dat metadata sections during parsing
  perf data-convert: Add perf.data to trace.dat conversion backend
  perf data: Add --to-trace-dat option for converting perf.data
    tracepoint events into trace.dat format

 tools/perf/builtin-data.c            |  38 +-
 tools/perf/util/Build                |   2 +
 tools/perf/util/data-convert-trace.c | 152 ++++++
 tools/perf/util/data-convert.h       |   4 +
 tools/perf/util/trace-dat.c          | 705 +++++++++++++++++++++++++++
 tools/perf/util/trace-dat.h          |  79 +++
 tools/perf/util/trace-event-read.c   | 259 +++++++++-
 7 files changed, 1230 insertions(+), 9 deletions(-)
 create mode 100644 tools/perf/util/data-convert-trace.c
 create mode 100644 tools/perf/util/trace-dat.c
 create mode 100644 tools/perf/util/trace-dat.h

-- 
2.53.0



* [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure
  2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
@ 2026-06-08 12:59 ` Tanushree Shah
  2026-06-08 13:13   ` sashiko-bot
  2026-06-08 12:59 ` [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing Tanushree Shah
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Tanushree Shah @ 2026-06-08 12:59 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
	irogers, namhyung, linux-kernel
  Cc: linux-perf-users, linuxppc-dev, atrajeev, hbathini, Tejas.Manhas1,
	Tanushree.Shah, Shivani.Nittor, Tanushree Shah

Add new utility files util/trace-dat.c and util/trace-dat.h
implementing the infrastructure for exporting perf.data tracepoints
to trace.dat format compatible with trace-cmd and KernelShark.

trace-dat.c defines all globals and functions needed for:
- Per-cpu raw event buffer management (init_cpu_buffers,
  collect_cpu_event, free_cpu_buffers)
- ftrace ring buffer page construction (write_page, write_cpu_dat)
- trace.dat section writers (write_strings_section,
  write_options_section1, write_options_section2,
  write_flyrecord_section)

trace-dat.h declares all globals and function prototypes to be
used by data-convert-trace.c and trace-event-read.c.

Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>
---
 tools/perf/util/Build       |   1 +
 tools/perf/util/trace-dat.c | 705 ++++++++++++++++++++++++++++++++++++
 tools/perf/util/trace-dat.h |  79 ++++
 3 files changed, 785 insertions(+)
 create mode 100644 tools/perf/util/trace-dat.c
 create mode 100644 tools/perf/util/trace-dat.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 70cc91d00804..c000d8032d25 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -98,6 +98,7 @@ perf-util-y += trace-event-scripting.o
 perf-util-$(CONFIG_LIBTRACEEVENT) += trace-event.o
 perf-util-$(CONFIG_LIBTRACEEVENT) += trace-event-parse.o
 perf-util-$(CONFIG_LIBTRACEEVENT) += trace-event-read.o
+perf-util-$(CONFIG_LIBTRACEEVENT) += trace-dat.o
 perf-util-y += sort.o
 perf-util-y += hist.o
 perf-util-y += util.o
diff --git a/tools/perf/util/trace-dat.c b/tools/perf/util/trace-dat.c
new file mode 100644
index 000000000000..aa34a7b89b7a
--- /dev/null
+++ b/tools/perf/util/trace-dat.c
@@ -0,0 +1,705 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2026, IBM Corporation
+ * Author: Tanushree Shah <tshah@linux.ibm.com>
+ *
+ * trace-dat.c
+ *
+ * This file implements the trace.dat format writer for perf tool.
+ * It collects trace events from multiple CPUs and writes them in
+ * the trace-cmd compatible format.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include "api/fs/tracing_path.h"
+#include "trace-dat.h"
+#include "trace-event.h"
+#include "session.h"
+#include "header.h"
+#include "../perf.h"
+#include "debug.h"
+
+/* ftrace ring buffer constants for trace.dat flyrecord section
+ *
+ * Each page has a 16-byte header (timestamp + commit size), followed by
+ * variable-length records. Each record has a 4-byte header word encoding:
+ *   Bits 0-4:	 Type/Length field (5 bits, masked by TYPE_LEN_MASK)
+ *   Bits 5-31:  Time delta from page base timestamp (27 bits, masked by TIME_MASK)
+ */
+#define TRACE_DAT_RECORD_HEADER_SIZE 16		/* Page header: 8-byte ts + 8-byte commit */
+#define TRACE_DAT_RECORD_TYPE_LEN_MASK 0x1F		/* Extract lower 5 bits for type/length */
+#define TRACE_DAT_RECORD_TIME_SHIFT	5		/* Shift to extract time delta */
+#define TRACE_DAT_RECORD_TIME_MASK 0x07FFFFFF	/* Mask for 27-bit time delta */
+#define TRACE_DAT_WORD_SIZE	4		/* Records aligned to 4-byte boundaries */
+#define TRACE_DAT_WORD_ALIGN_MASK 3
+
+/* Initial capacity for per-CPU event buffer (grows by doubling) */
+#define INITIAL_EVENT_CAPACITY 1024
+/* Initial capacity for page record array (grows by doubling) */
+#define INITIAL_PAGE_RECORD_CAPACITY 64
+/* Buffer size for reading trace_clock string from debugfs/tracefs */
+#define CLOCK_BUFFER_SIZE 256
+
+FILE *trace_dat_fp;
+int trace_dat_page_size;
+int trace_dat_nr_cpus;
+long trace_dat_options_offset;
+long trace_dat_header_info_offset;
+long trace_dat_events_format_offset;
+long trace_dat_ftrace_format_offset;
+long trace_dat_kallsyms_offset;
+long trace_dat_cmdline_offset;
+long trace_dat_next_options_offset;
+
+
+/**
+ * struct cpu_event - Single trace event from a CPU
+ * @ts: Timestamp of the event
+ * @raw: Raw event data
+ * @raw_size: Size of raw event data in bytes
+ */
+struct cpu_event {
+	unsigned long long ts;
+	void *raw;
+	unsigned int raw_size;
+};
+
+/**
+ * struct cpu_events - Collection of trace events for a single CPU
+ * @events: Array of events
+ * @count: Number of events currently stored
+ * @capacity: Maximum number of events that can be stored
+ */
+struct cpu_events {
+	struct cpu_event  *events;
+	int count;
+	int capacity;
+};
+
+static struct cpu_events *trace_cpu_data;
+static long *buffer_opt_cpu_offsets_pos;
+static long opt_payload_start;
+
+/* Allocate per-cpu event buffers for tracepoint data collection */
+int trace_dat__init_cpu_buffers(int nr_cpus)
+{
+	trace_cpu_data = calloc(nr_cpus, sizeof(struct cpu_events));
+	if (!trace_cpu_data)
+		return -ENOMEM;
+	buffer_opt_cpu_offsets_pos = calloc(nr_cpus, sizeof(long));
+	if (!buffer_opt_cpu_offsets_pos) {
+		free(trace_cpu_data);
+		trace_cpu_data = NULL;
+		return -ENOMEM;
+	}
+	trace_dat_nr_cpus = nr_cpus;
+	return 0;
+}
+
+/* Store raw tracepoint event data in per-cpu buffer for trace.dat
+ * flyrecord
+ */
+int trace_dat__collect_cpu_event(int cpu, unsigned long long ts,
+				 void *raw, unsigned int raw_size)
+{
+	struct cpu_events *cpu_events;
+
+	if (!trace_cpu_data || cpu < 0 || cpu >= trace_dat_nr_cpus)
+		return -EINVAL;
+
+	if (!raw || raw_size == 0)
+		return -EINVAL;
+
+	cpu_events = &trace_cpu_data[cpu];
+
+	if (cpu_events->count >= cpu_events->capacity) {
+		cpu_events->capacity = cpu_events->capacity ?
+				       cpu_events->capacity * 2 : INITIAL_EVENT_CAPACITY;
+		cpu_events->events   = realloc(cpu_events->events,
+				       cpu_events->capacity * sizeof(*cpu_events->events));
+		if (!cpu_events->events)
+			return -ENOMEM;
+	}
+
+	cpu_events->events[cpu_events->count].ts = ts;
+	cpu_events->events[cpu_events->count].raw = malloc(raw_size);
+	if (!cpu_events->events[cpu_events->count].raw)
+		return -ENOMEM;
+
+	memcpy(cpu_events->events[cpu_events->count].raw, raw, raw_size);
+	cpu_events->events[cpu_events->count].raw_size = raw_size;
+	cpu_events->count++;
+
+	return 0;
+}
+
+/* Write a single page of trace records */
+static int trace_dat__write_page(FILE *fp, unsigned long long base_ts,
+			char **records, int *rec_sizes, int nr_recs)
+{
+	unsigned long long commit = 0;
+	int offset = TRACE_DAT_RECORD_HEADER_SIZE;
+	int i;
+	char *page;
+
+	page = calloc(1, trace_dat_page_size);
+	if (!page)
+		return -ENOMEM;
+
+	for (i = 0; i < nr_recs; i++) {
+		memcpy(page + offset, records[i], rec_sizes[i]);
+		offset += rec_sizes[i];
+		commit += rec_sizes[i];
+	}
+
+	memcpy(page, &base_ts, sizeof(base_ts));
+	memcpy(page + sizeof(base_ts), &commit, sizeof(commit));
+
+	if (!fwrite(page, 1, trace_dat_page_size, fp)) {
+		free(page);
+		return -EIO;
+	}
+	free(page);
+
+	return 0;
+}
+
+/* Write all trace data for a single CPU as trace.dat flyrecord pages */
+static int trace_dat__write_cpu_dat(FILE *fp, int cpu, unsigned long long *file_offset_out)
+{
+	struct cpu_events *cpu_events = &trace_cpu_data[cpu];
+	unsigned long long base_ts;
+	unsigned long long file_offset;
+	char **page_records = NULL;
+	int *page_rec_sizes = NULL;
+	int page_cap = 0;
+	int nr_page_recs = 0;
+	int page_size_used = 0;
+	int ret = 0;
+	int i, j;
+
+	file_offset = ftell(fp);
+	*file_offset_out = file_offset;
+
+	if (cpu_events->count == 0) {
+		char *empty_page = calloc(1, trace_dat_page_size);
+
+		if (!empty_page)
+			return -ENOMEM;
+		if (!fwrite(empty_page, 1, trace_dat_page_size, fp)) {
+			free(empty_page);
+			return -EIO;
+		}
+		free(empty_page);
+		return 0;
+	}
+
+	base_ts = cpu_events->events[0].ts;
+
+	for (i = 0; i < cpu_events->count; i++) {
+		struct cpu_event *event = &cpu_events->events[i];
+		unsigned long long time_delta = event->ts - base_ts;
+		unsigned int data_len = event->raw_size;
+		unsigned int words = (data_len + TRACE_DAT_WORD_ALIGN_MASK) / TRACE_DAT_WORD_SIZE;
+		unsigned int type_len = words & TRACE_DAT_RECORD_TYPE_LEN_MASK;
+		unsigned int hdr_word = ((time_delta & TRACE_DAT_RECORD_TIME_MASK) <<
+					TRACE_DAT_RECORD_TIME_SHIFT) | type_len;
+		int rec_size;
+		char *rec;
+
+		rec_size = TRACE_DAT_WORD_SIZE + data_len;
+		if (rec_size % TRACE_DAT_WORD_SIZE)
+			rec_size += TRACE_DAT_WORD_SIZE - (rec_size % TRACE_DAT_WORD_SIZE);
+
+		rec = calloc(1, rec_size);
+		if (!rec)
+			return -ENOMEM;
+		memcpy(rec, &hdr_word, TRACE_DAT_WORD_SIZE);
+		memcpy(rec + TRACE_DAT_WORD_SIZE, event->raw, data_len);
+
+		if (page_size_used + rec_size > trace_dat_page_size -
+		    TRACE_DAT_RECORD_HEADER_SIZE) {
+			ret = trace_dat__write_page(fp, base_ts,
+			      page_records, page_rec_sizes, nr_page_recs);
+			for (j = 0; j < nr_page_recs; j++)
+				free(page_records[j]);
+			nr_page_recs = 0;
+			page_size_used = 0;
+			base_ts = event->ts;
+			if (ret < 0)
+				goto out_free;
+		}
+
+		if (nr_page_recs >= page_cap) {
+			char **tmp_records;
+			int *tmp_sizes;
+
+			page_cap = page_cap ? page_cap * 2 : INITIAL_PAGE_RECORD_CAPACITY;
+			tmp_records = realloc(page_records, page_cap * sizeof(char *));
+			tmp_sizes = realloc(page_rec_sizes, page_cap * sizeof(int));
+			if (!tmp_records || !tmp_sizes) {
+				ret = -ENOMEM;
+				goto out_free;
+			}
+			page_records = tmp_records;
+			page_rec_sizes = tmp_sizes;
+		}
+		page_records[nr_page_recs] = rec;
+		page_rec_sizes[nr_page_recs] = rec_size;
+		nr_page_recs++;
+		page_size_used += rec_size;
+	}
+
+	if (nr_page_recs > 0) {
+		ret = trace_dat__write_page(fp, base_ts,
+		      page_records, page_rec_sizes, nr_page_recs);
+	}
+out_free:
+	for (j = 0; j < nr_page_recs; j++)
+		free(page_records[j]);
+	free(page_records);
+	free(page_rec_sizes);
+	return ret;
+}
+
+/* Write the strings section containing section name lookup table */
+int trace_dat__write_strings_section(void)
+{
+	unsigned short section_id = TRACE_DAT_SECTION_STRINGS;
+	unsigned short flags = 0;
+	unsigned long long section_size = 0;
+	static const char * const section_names[] = {
+		"headers",		/* offset 0 - strid for section 16  */
+		"ftrace event formats", /* offset 8 - strid for section 17  */
+		"events format",	/* offset 29 - strid for section 18  */
+		"kallsyms",		/* offset 43 - strid for section 19  */
+		"cmdlines",		/* offset 52 - strid for section 21  */
+		"strings",		/* offset 61 - strid for section 15  */
+		"options",		/* offset 69 - strid for options 1   */
+		"options",		/* offset 77 - strid for options 2   */
+		"buffer-flyrecord",	/* offset 85 - strid for flyrecord 3 */
+		NULL
+	};
+
+	/* string_id points to "strings" string itself */
+	unsigned int string_id = STRID_STRINGS;
+	int i;
+
+	if (!trace_dat_fp)
+		return -EBADF;
+
+	for (i = 0; section_names[i] != NULL; i++)
+		section_size += strlen(section_names[i]) + 1;
+
+	/* write section header */
+	if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+		       !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+		       !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+		       !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+		return -EIO;
+
+	/* write strings */
+	for (i = 0; section_names[i] != NULL; i++)
+		if (!fwrite(section_names[i], 1, strlen(section_names[i]) + 1, trace_dat_fp))
+			return -EIO;
+	return 0;
+}
+
+/* Writes options section containing CPUCOUNT, TRACECLOCK, EVENT_FORMAT, HEADER_INFO,
+ * FTRACE_EVENTS, KALLSYMS, CMDLINES options, ending with DONE option pointing to next section.
+ */
+int trace_dat__write_options_section1(void)
+{
+	unsigned short section_id = TRACE_DAT_SECTION_OPTIONS;
+	unsigned short flags = 0;
+	unsigned int string_id = STRID_OPTIONS_1;
+	unsigned long long section_size = 0;
+	long section_size_pos;
+	long payload_start;
+	unsigned long long section_start;
+	unsigned short opt_id;
+	unsigned int opt_size;
+	char clock_buf[CLOCK_BUFFER_SIZE];
+	FILE *clock_file;
+	size_t bytes_read;
+	char *path;
+	unsigned long long next_offset;
+	long end_pos;
+
+	if (!trace_dat_fp)
+		return -EBADF;
+
+	/* fill options_offset in initial format */
+	section_start = ftell(trace_dat_fp);
+
+	if (fseek(trace_dat_fp, trace_dat_options_offset, SEEK_SET) < 0 ||
+	   !fwrite(&section_start, sizeof(unsigned long long), 1, trace_dat_fp) ||
+	   fseek(trace_dat_fp, 0, SEEK_END) < 0)
+		return -EIO;
+
+	/* write section header */
+	if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))
+		return -EIO;
+	section_size_pos = ftell(trace_dat_fp);
+	if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+		return -EIO;
+
+	payload_start = ftell(trace_dat_fp);
+
+	/* CPUCOUNT option */
+	opt_id = TRACE_DAT_OPTION_CPUCOUNT;
+	opt_size = sizeof(unsigned int);
+
+	if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+	    !fwrite(&trace_dat_nr_cpus, sizeof(unsigned int), 1, trace_dat_fp))
+		return -EIO;
+
+	/* TRACECLOCK option */
+	opt_id = TRACE_DAT_OPTION_TRACECLOCK;
+
+	path = get_tracing_file("trace_clock");
+	clock_file = fopen(path, "r");
+	put_tracing_file(path);
+	if (clock_file) {
+		bytes_read = fread(clock_buf, 1, sizeof(clock_buf) - 1, clock_file);
+		fclose(clock_file);
+		clock_buf[bytes_read] = '\0';
+	} else {
+		strcpy(clock_buf, "local\n");
+		bytes_read = strlen(clock_buf);
+	}
+	opt_size = bytes_read + 1;
+	if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+	    !fwrite(clock_buf, 1, opt_size, trace_dat_fp))
+		return -EIO;
+
+	/* EVENT option */
+	opt_id = TRACE_DAT_OPTION_EVENT;
+	opt_size = sizeof(unsigned long long);
+
+	if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	   !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+	   !fwrite(&trace_dat_events_format_offset, sizeof(unsigned long long),
+		   1, trace_dat_fp))
+		return -EIO;
+
+	/* HEADER option */
+	opt_id = TRACE_DAT_OPTION_HEADER;
+	opt_size = sizeof(unsigned long long);
+
+	if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+	    !fwrite(&trace_dat_header_info_offset, sizeof(unsigned long long),
+		    1, trace_dat_fp))
+		return -EIO;
+
+	/* FTRACE option */
+	opt_id = TRACE_DAT_OPTION_FTRACE;
+	opt_size = sizeof(unsigned long long);
+
+	if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+	    !fwrite(&trace_dat_ftrace_format_offset, sizeof(unsigned long long),
+		   1, trace_dat_fp))
+		return -EIO;
+
+	/* KALLSYMS option */
+	opt_id = TRACE_DAT_OPTION_KALLSYMS;
+	opt_size = sizeof(unsigned long long);
+
+	if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+	    !fwrite(&trace_dat_kallsyms_offset, sizeof(unsigned long long),
+		    1, trace_dat_fp))
+		return -EIO;
+
+	/* CMDLINE option */
+	opt_id = TRACE_DAT_OPTION_CMDLINE;
+	opt_size = sizeof(unsigned long long);
+
+	if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+	    !fwrite(&trace_dat_cmdline_offset, sizeof(unsigned long long),
+		    1, trace_dat_fp))
+		return -EIO;
+
+	/* DONE option id - next_options_offset filled later */
+	opt_id = TRACE_DAT_OPTION_DONE;
+	opt_size = sizeof(unsigned long long);
+	next_offset = 0;  /* placeholder */
+
+	trace_dat_next_options_offset = ftell(trace_dat_fp);
+	if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+	    !fwrite(&next_offset, sizeof(unsigned long long), 1, trace_dat_fp))
+		return -EIO;
+
+	/* fill section size */
+	end_pos = ftell(trace_dat_fp);
+
+	section_size = end_pos - payload_start;
+	if (fseek(trace_dat_fp, section_size_pos, SEEK_SET) < 0 ||
+	    !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+	    fseek(trace_dat_fp, end_pos, SEEK_SET) < 0)
+		return -EIO;
+
+	return 0;
+
+}
+
+/* Writes options section containing BUFFER option with flyrecord section
+ * (flyrecord section offset, clock type, page size, CPU count,
+ * per-CPU offsets/sizes) and DONE option.
+ */
+int trace_dat__write_options_section2(void)
+{
+	unsigned short section_id  = TRACE_DAT_SECTION_OPTIONS;
+	unsigned short flags = 0;
+	unsigned int string_id = STRID_OPTIONS_2;
+	unsigned long long section_size = 0;
+	long section_size_pos;
+	long payload_start;
+	int cpu;
+	unsigned short opt_id = TRACE_DAT_OPTION_BUFFER;
+	unsigned int opt_size = 0;
+	long opt_size_pos;
+	unsigned long long data_offset = 0;
+	unsigned int page_size = (unsigned int)trace_dat_page_size;
+	const char *clock = "local";
+	unsigned long long next;
+	long end_pos;
+	unsigned long long cpu_offset;
+	unsigned long long cpu_size;
+	unsigned short done_id;
+	unsigned int done_size;
+
+	if (!trace_dat_fp)
+		return -EINVAL;
+
+	/* fill done1 next offset - points to this section */
+	next = ftell(trace_dat_fp);
+
+	if (fseek(trace_dat_fp, trace_dat_next_options_offset + 2 + 4, SEEK_SET) < 0 ||
+	    !fwrite(&next, sizeof(unsigned long long), 1, trace_dat_fp) ||
+	    fseek(trace_dat_fp, 0, SEEK_END) < 0)
+		return -EIO;
+
+	/* write section header */
+	if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))
+		return -EIO;
+	section_size_pos = ftell(trace_dat_fp);
+	if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+		return -EIO;
+
+	payload_start = ftell(trace_dat_fp);
+
+	/* BUFFER option */
+	if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp))
+		return -EIO;
+	opt_size_pos = ftell(trace_dat_fp);
+	if (!fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp))
+		return -EIO;
+	opt_payload_start = ftell(trace_dat_fp);
+
+	/* data_offset placeholder */
+	if (!fwrite(&data_offset, sizeof(unsigned long long), 1, trace_dat_fp) ||
+	    !fwrite("\0", 1, 1, trace_dat_fp) ||
+	    !fwrite(clock, 1, strlen(clock) + 1, trace_dat_fp) ||
+	    !fwrite(&page_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+	    !fwrite(&trace_dat_nr_cpus, sizeof(unsigned int), 1, trace_dat_fp))
+		return -EIO;
+
+	/* per cpu: cpu_id + offset placeholder + size */
+	for (cpu = 0; cpu < trace_dat_nr_cpus; cpu++) {
+		cpu_offset = 0;  /* filled in write_flyrecord */
+		cpu_size   = 0;  /* filled in write_flyrecord */
+
+		if (!fwrite(&cpu, sizeof(unsigned int), 1, trace_dat_fp))
+			return -EIO;
+		buffer_opt_cpu_offsets_pos[cpu] = ftell(trace_dat_fp);
+		if (!fwrite(&cpu_offset, sizeof(unsigned long long), 1, trace_dat_fp) ||
+		    !fwrite(&cpu_size, sizeof(unsigned long long), 1, trace_dat_fp))
+			return -EIO;
+	}
+
+	/* fill opt_size */
+	end_pos = ftell(trace_dat_fp);
+
+	opt_size = end_pos - opt_payload_start;
+	fseek(trace_dat_fp, opt_size_pos, SEEK_SET);
+	if (!fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp))
+		return -EIO;
+	fseek(trace_dat_fp, end_pos, SEEK_SET);
+
+	/* DONE id=0 */
+	done_id = TRACE_DAT_OPTION_DONE;
+	done_size = sizeof(unsigned long long);
+	/* No additional options sections follow this one */
+	next = 0;
+
+	if (!fwrite(&done_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&done_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+	    !fwrite(&next, sizeof(unsigned long long), 1, trace_dat_fp))
+		return -EIO;
+
+	/* fill section size */
+	end_pos = ftell(trace_dat_fp);
+
+	section_size = end_pos - payload_start;
+	fseek(trace_dat_fp, section_size_pos, SEEK_SET);
+	if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+		return -EIO;
+	fseek(trace_dat_fp, end_pos, SEEK_SET);
+
+	return 0;
+
+}
+
+int trace_dat__write_flyrecord_section(void)
+{
+	unsigned short section_id = TRACE_DAT_SECTION_FLYRECORD;
+	unsigned short flags = 0;
+	unsigned int string_id = STRID_BUFFER_FLYRECORD;
+	unsigned long long section_size = 0;
+	long section_size_pos;
+	long flyrecord_start;
+	long after_header;
+	long padding_needed;
+	unsigned long long *cpu_offsets;
+	unsigned long long *cpu_sizes;
+	int cpu;
+	int ret = 0;
+	char *pad;
+	unsigned long long start;
+	long end_pos;
+
+	if (!trace_dat_fp)
+		return -EINVAL;
+
+	cpu_offsets = calloc(trace_dat_nr_cpus, sizeof(unsigned long long));
+	cpu_sizes   = calloc(trace_dat_nr_cpus, sizeof(unsigned long long));
+	if (!cpu_offsets || !cpu_sizes) {
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+	flyrecord_start = ftell(trace_dat_fp);
+	if (flyrecord_start < 0) {
+		ret = -EIO;
+		goto cleanup;
+	}
+
+	/* section header */
+	if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+	    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp)) {
+		ret = -EIO;
+		goto cleanup;
+	}
+	section_size_pos = ftell(trace_dat_fp);
+	if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp)) {
+		ret = -EIO;
+		goto cleanup;
+	}
+
+	/* Align to page boundary */
+	after_header   = ftell(trace_dat_fp);
+	padding_needed = (trace_dat_page_size -
+			 (after_header % trace_dat_page_size)) % trace_dat_page_size;
+
+	if (padding_needed > 0) {
+		pad = calloc(1, padding_needed);
+
+		if (!fwrite(pad, 1, padding_needed, trace_dat_fp)) {
+			free(pad);
+			ret = -EIO;
+			goto cleanup;
+		}
+		free(pad);
+	}
+
+	/* write per-cpu trace data */
+	for (cpu = 0; cpu < trace_dat_nr_cpus; cpu++) {
+		start = ftell(trace_dat_fp);
+
+		ret = trace_dat__write_cpu_dat(trace_dat_fp, cpu, &cpu_offsets[cpu]);
+
+		if (ret < 0) {
+			pr_err("Failed to write CPU %d data\n", cpu);
+			goto cleanup;
+		}
+		cpu_sizes[cpu]	 = ftell(trace_dat_fp) - start;
+	}
+
+	/* fill section size */
+	end_pos = ftell(trace_dat_fp);
+
+	section_size = end_pos - flyrecord_start;
+	if (fseek(trace_dat_fp, section_size_pos, SEEK_SET) < 0 ||
+	    !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp)) {
+		ret = -EIO;
+		goto cleanup;
+	}
+	if (fseek(trace_dat_fp, end_pos, SEEK_SET) < 0) {
+		ret = -EIO;
+		goto cleanup;
+	}
+
+	/* fill cpu offsets and sizes in BUFFER option */
+	for (cpu = 0; cpu < trace_dat_nr_cpus; cpu++) {
+		if (fseek(trace_dat_fp, buffer_opt_cpu_offsets_pos[cpu], SEEK_SET) < 0 ||
+		    !fwrite(&cpu_offsets[cpu], sizeof(unsigned long long), 1, trace_dat_fp) ||
+		    !fwrite(&cpu_sizes[cpu], sizeof(unsigned long long), 1, trace_dat_fp)) {
+			ret = -EIO;
+			goto cleanup;
+		}
+	}
+
+	/* fill data offset in buffer option */
+	if (fseek(trace_dat_fp, opt_payload_start, SEEK_SET) < 0 ||
+	    !fwrite(&flyrecord_start, sizeof(unsigned long long), 1, trace_dat_fp)) {
+		ret = -EIO;
+		goto cleanup;
+	}
+
+	if (fseek(trace_dat_fp, 0, SEEK_END) < 0) {
+		ret = -EIO;
+		goto cleanup;
+	}
+
+
+cleanup:
+	free(cpu_offsets);
+	free(cpu_sizes);
+	return ret;
+}
+
+/* Free all per-CPU event buffers */
+void trace_dat__free_cpu_buffers(void)
+{
+	int cpu;
+
+	if (!trace_cpu_data)
+		return;
+
+	for (cpu = 0; cpu < trace_dat_nr_cpus; cpu++) {
+		int i;
+
+		for (i = 0; i < trace_cpu_data[cpu].count; i++)
+			free(trace_cpu_data[cpu].events[i].raw);
+		free(trace_cpu_data[cpu].events);
+	}
+	free(trace_cpu_data);
+	trace_cpu_data = NULL;
+	free(buffer_opt_cpu_offsets_pos);
+	buffer_opt_cpu_offsets_pos = NULL;
+	trace_dat_nr_cpus = 0;
+}
diff --git a/tools/perf/util/trace-dat.h b/tools/perf/util/trace-dat.h
new file mode 100644
index 000000000000..7667a440330c
--- /dev/null
+++ b/tools/perf/util/trace-dat.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright 2026, IBM Corporation
+ * Author: Tanushree Shah <tshah@linux.ibm.com>
+ */
+
+#ifndef __PERF_TRACE_DAT_H
+#define __PERF_TRACE_DAT_H
+
+#include <stdio.h>
+
+/* trace.dat file format version */
+#define TRACE_DAT_VERSION '7'
+
+/*
+ * Section IDs for trace.dat format
+ */
+#define TRACE_DAT_SECTION_OPTIONS   0
+#define TRACE_DAT_SECTION_FLYRECORD 3
+#define TRACE_DAT_SECTION_STRINGS   15
+#define TRACE_DAT_SECTION_HEADER    16
+#define TRACE_DAT_SECTION_FTRACE    17
+#define TRACE_DAT_SECTION_EVENTS    18
+#define TRACE_DAT_SECTION_KALLSYMS  19
+#define TRACE_DAT_SECTION_CMDLINE   21
+
+/*
+ * Option IDs for trace.dat options sections
+ */
+#define TRACE_DAT_OPTION_DONE       0
+#define TRACE_DAT_OPTION_BUFFER     3
+#define TRACE_DAT_OPTION_TRACECLOCK 4
+#define TRACE_DAT_OPTION_CPUCOUNT   8
+#define TRACE_DAT_OPTION_HEADER     16
+#define TRACE_DAT_OPTION_FTRACE     17
+#define TRACE_DAT_OPTION_EVENT      18
+#define TRACE_DAT_OPTION_KALLSYMS   19
+#define TRACE_DAT_OPTION_CMDLINE    21
+
+/*
+ * String offsets in the strings section
+ * These point to null-terminated strings used as section names
+ */
+#define STRID_HEADERS          0
+#define STRID_FTRACE_FORMATS   8
+#define STRID_EVENT_FORMATS    29
+#define STRID_KALLSYMS         43
+#define STRID_CMDLINES         52
+#define STRID_STRINGS          61
+#define STRID_OPTIONS_1        69
+#define STRID_OPTIONS_2        77
+#define STRID_BUFFER_FLYRECORD 85
+
+struct perf_session;
+
+extern FILE *trace_dat_fp;
+extern int trace_dat_page_size;
+extern int trace_dat_nr_cpus;
+extern long trace_dat_options_offset;
+extern long trace_dat_header_info_offset;
+extern long trace_dat_events_format_offset;
+extern long trace_dat_ftrace_format_offset;
+extern long trace_dat_kallsyms_offset;
+extern long trace_dat_cmdline_offset;
+extern long trace_dat_next_options_offset;
+
+/* collect and manage per-cpu tracepoint event buffers */
+int trace_dat__init_cpu_buffers(int nr_cpus);
+int trace_dat__collect_cpu_event(int cpu, unsigned long long ts,
+		       void *raw, unsigned int raw_size);
+void trace_dat__free_cpu_buffers(void);
+
+/* write trace.dat file sections */
+int trace_dat__write_options_section1(void);
+int trace_dat__write_options_section2(void);
+int trace_dat__write_flyrecord_section(void);
+int trace_dat__write_strings_section(void);
+
+#endif /* __PERF_TRACE_DAT_H */
-- 
2.53.0



* [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing
  2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
  2026-06-08 12:59 ` [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure Tanushree Shah
@ 2026-06-08 12:59 ` Tanushree Shah
  2026-06-08 13:12   ` sashiko-bot
  2026-06-08 12:59 ` [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend Tanushree Shah
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Tanushree Shah @ 2026-06-08 12:59 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
	irogers, namhyung, linux-kernel
  Cc: linux-perf-users, linuxppc-dev, atrajeev, hbathini, Tejas.Manhas1,
	Tanushree.Shah, Shivani.Nittor, Tanushree Shah

Perf already captures the tracing metadata as part of the data
section in perf.data.

When trace_dat_fp is set, write trace.dat-compatible metadata
sections using the raw buffers that perf provides.

Sections written:
- Initial format header (magic, version, endian, long_size,
  page_size, compression, options_offset placeholder)
- Section 16: HEADER INFO (header_page + header_event)
- Section 17: FTRACE EVENT FORMATS
- Section 18: EVENT FORMATS (per system/event format files)
- Section 19: KALLSYMS
- Section 21: CMDLINES
- Section 15: STRINGS (written last after all sections)
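
For orientation, every section listed above is written with the same
16-byte header in front of its payload (section id, flags, string id,
payload size). The following is only an illustrative sketch and is not
part of this patch: write_section_header() is a hypothetical helper
name, and like the patch it writes the fields in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): emit the fixed header that
 * the patch writes in front of each metadata section.
 * Layout: u16 section id, u16 flags, u32 string id, u64 payload size.
 * Fields go out in host byte order, mirroring the patch.
 * Returns 0 on success, -1 on a short write.
 */
int write_section_header(FILE *fp, uint16_t id, uint16_t flags,
			 uint32_t string_id, uint64_t size)
{
	assert(fp != NULL);

	if (fwrite(&id, sizeof(id), 1, fp) != 1 ||
	    fwrite(&flags, sizeof(flags), 1, fp) != 1 ||
	    fwrite(&string_id, sizeof(string_id), 1, fp) != 1 ||
	    fwrite(&size, sizeof(size), 1, fp) != 1)
		return -1;
	return 0;
}
```

A helper of this shape would let the KALLSYMS, CMDLINES and HEADER INFO
writers share one code path instead of repeating the four fwrite() calls.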

Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>
---
 tools/perf/util/trace-event-read.c | 259 ++++++++++++++++++++++++++++-
 1 file changed, 252 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
index ecbbb93f0185..815577703c2e 100644
--- a/tools/perf/util/trace-event-read.c
+++ b/tools/perf/util/trace-event-read.c
@@ -19,6 +19,7 @@
 #include "trace-event.h"
 #include "debug.h"
 #include "util.h"
+#include "trace-dat.h"
 
 static int input_fd;
 
@@ -145,10 +146,9 @@ static char *read_string(void)
 static int read_proc_kallsyms(struct tep_handle *pevent)
 {
 	unsigned int size;
+	char *buf;
 
 	size = read4(pevent);
-	if (!size)
-		return 0;
 	/*
 	 * Just skip it, now that we configure libtraceevent to use the
 	 * tools/perf/ symbol resolver.
@@ -160,11 +160,56 @@ static int read_proc_kallsyms(struct tep_handle *pevent)
 	 * payload", so that older tools can continue reading it and interpret
 	 * it as "no kallsyms payload is present".
 	 */
-	lseek(input_fd, size, SEEK_CUR);
+	/* Write kallsyms section with empty payload if no data */
+	if (!size) {
+		if (trace_dat_fp) {
+			unsigned short section_id = TRACE_DAT_SECTION_KALLSYMS;
+			unsigned short flags = 0;
+			unsigned long long section_size = sizeof(unsigned int);
+			unsigned int kallsyms_data = 0;
+			unsigned int string_id = STRID_KALLSYMS;
+
+			trace_dat_kallsyms_offset = ftell(trace_dat_fp);
+			if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+			    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+			    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+			    !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+			    !fwrite(&kallsyms_data, sizeof(unsigned int), 1, trace_dat_fp))
+				return -EIO;
+		}
+		return 0;
+	}
+	buf = malloc(size);
+	if (buf == NULL)
+		return -1;
+	if (read(input_fd, buf, size) < 0) {
+		free(buf);
+		return -1;
+	}
 	trace_data_size += size;
+	/* Write kallsyms section with data */
+	if (trace_dat_fp) {
+		unsigned short section_id = TRACE_DAT_SECTION_KALLSYMS;
+		unsigned int string_id = STRID_KALLSYMS;
+		unsigned long long section_size = sizeof(unsigned int) + size;
+		unsigned short flags = 0;
+
+		trace_dat_kallsyms_offset = ftell(trace_dat_fp);
+		if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+		    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+		    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+		    !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+		    !fwrite(&size, sizeof(unsigned int), 1, trace_dat_fp) ||
+		    !fwrite(buf, 1, size, trace_dat_fp)) {
+			free(buf);
+			return -EIO;
+		}
+	}
+	free(buf);
 	return 0;
 }
 
+
 static int read_ftrace_printk(struct tep_handle *pevent)
 {
 	unsigned int size;
@@ -195,6 +240,13 @@ static int read_ftrace_printk(struct tep_handle *pevent)
 static int read_header_files(struct tep_handle *pevent)
 {
 	unsigned long long size;
+	unsigned long long header_page_size;
+	unsigned long long header_event_size;
+	char *header_event;
+	unsigned short section_id;
+	unsigned short flags;
+	unsigned int string_id;
+	unsigned long long section_size;
 	char *header_page;
 	char buf[BUFSIZ];
 	int ret = 0;
@@ -209,6 +261,7 @@ static int read_header_files(struct tep_handle *pevent)
 
 	size = read8(pevent);
 
+	header_page_size = size;
 	header_page = malloc(size);
 	if (header_page == NULL)
 		return -1;
@@ -227,19 +280,59 @@ static int read_header_files(struct tep_handle *pevent)
 		 */
 		tep_set_long_size(pevent, tep_get_header_page_size(pevent));
 	}
-	free(header_page);
 
-	if (do_read(buf, 13) < 0)
+	if (do_read(buf, 13) < 0) {
+		free(header_page);
 		return -1;
+	}
 
 	if (memcmp(buf, "header_event", 13) != 0) {
 		pr_debug("did not read header event");
+		free(header_page);
 		return -1;
 	}
 
 	size = read8(pevent);
-	skip(size);
+	if (trace_dat_fp) {
+		header_event_size = size;
+		header_event = malloc(size);
+		if (header_event == NULL) {
+			free(header_page);
+			return -1;
+		}
+		if (do_read(header_event, size) < 0) {
+			free(header_page);
+			free(header_event);
+			return -1;
+		}
+		/* Write header_page and header_event to trace.dat */
+		section_id = TRACE_DAT_SECTION_HEADER;
+		flags = 0;
+		string_id = STRID_HEADERS;
+		section_size = 12 + 8 + header_page_size + 13 + 8 +
+				header_event_size;
+
+		trace_dat_header_info_offset = ftell(trace_dat_fp);
+		if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+		    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+		    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+		    !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+		    !fwrite("header_page\0", 1, 12, trace_dat_fp) ||
+		    !fwrite(&header_page_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+		    !fwrite(header_page, 1, header_page_size, trace_dat_fp) ||
+		    !fwrite("header_event\0", 1, 13, trace_dat_fp) ||
+		    !fwrite(&header_event_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+		    !fwrite(header_event, 1, header_event_size, trace_dat_fp)) {
+			free(header_page);
+			free(header_event);
+			return -EIO;
+		}
+		free(header_event);
+	} else {
+		skip(size);
+	}
 
+	free(header_page);
 	return ret;
 }
 
@@ -259,6 +352,13 @@ static int read_ftrace_file(struct tep_handle *pevent, unsigned long long size)
 		pr_debug("error reading ftrace file.\n");
 		goto out;
 	}
+	if (trace_dat_fp) {
+		if (!fwrite(&size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+		    !fwrite(buf, 1, size, trace_dat_fp)) {
+			free(buf);
+			return -EIO;
+		}
+	}
 
 	ret = parse_ftrace_file(pevent, buf, size);
 	if (ret < 0)
@@ -283,6 +383,13 @@ static int read_event_file(struct tep_handle *pevent, char *sys,
 	ret = do_read(buf, size);
 	if (ret < 0)
 		goto out;
+	if (trace_dat_fp) {
+		if (!fwrite(&size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+		    !fwrite(buf, 1, size, trace_dat_fp)) {
+			free(buf);
+			return -EIO;
+		}
+	}
 
 	ret = parse_event_file(pevent, buf, size, sys);
 	if (ret < 0)
@@ -298,8 +405,31 @@ static int read_ftrace_files(struct tep_handle *pevent)
 	int count;
 	int i;
 	int ret;
+	long section_size_pos = 0;
+	long count_pos = 0;
+	unsigned long long section_size = 0;
+	long end_pos;
 
 	count = read4(pevent);
+	/* Write ftrace formats section to trace.dat output file */
+	if (trace_dat_fp) {
+		unsigned short section_id = TRACE_DAT_SECTION_FTRACE;
+		unsigned short flags = 0;
+		unsigned int string_id = STRID_FTRACE_FORMATS;
+
+		trace_dat_ftrace_format_offset = ftell(trace_dat_fp);
+
+		if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+		    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+		    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))
+			return -EIO;
+		section_size_pos = ftell(trace_dat_fp);
+		if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+			return -EIO;
+		count_pos = ftell(trace_dat_fp);
+		if (!fwrite(&count, sizeof(unsigned int), 1, trace_dat_fp))
+			return -EIO;
+	}
 
 	for (i = 0; i < count; i++) {
 		size = read8(pevent);
@@ -307,6 +437,16 @@ static int read_ftrace_files(struct tep_handle *pevent)
 		if (ret)
 			return ret;
 	}
+	/* Fill in section size after writing all ftrace files */
+	if (trace_dat_fp) {
+		end_pos = ftell(trace_dat_fp);
+		section_size = end_pos - count_pos;
+		fseek(trace_dat_fp, section_size_pos, SEEK_SET);
+		if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+			return -EIO;
+		fseek(trace_dat_fp, end_pos, SEEK_SET);
+	}
+
 	return 0;
 }
 
@@ -318,8 +458,30 @@ static int read_event_files(struct tep_handle *pevent)
 	int count;
 	int i,x;
 	int ret;
+	long section_size_pos = 0;
+	long sys_count_pos = 0;
+	unsigned long long section_size = 0;
+	long end_pos;
 
 	systems = read4(pevent);
+	/* Write event formats section to trace.dat output file */
+	if (trace_dat_fp) {
+		unsigned short section_id = TRACE_DAT_SECTION_EVENTS;
+		unsigned short flags = 0;
+		unsigned int string_id = STRID_EVENT_FORMATS;
+
+		trace_dat_events_format_offset = ftell(trace_dat_fp);
+		if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+		    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+		    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))
+			return -EIO;
+		section_size_pos = ftell(trace_dat_fp);
+		if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+			return -EIO;
+		sys_count_pos = ftell(trace_dat_fp);
+		if (!fwrite(&systems, sizeof(unsigned int), 1, trace_dat_fp))
+			return -EIO;
+	}
 
 	for (i = 0; i < systems; i++) {
 		sys = read_string();
@@ -327,6 +489,11 @@ static int read_event_files(struct tep_handle *pevent)
 			return -1;
 
 		count = read4(pevent);
+		if (trace_dat_fp) {
+			if (!fwrite(sys, 1, strlen(sys) + 1, trace_dat_fp) ||
+			   !fwrite(&count, sizeof(unsigned int), 1, trace_dat_fp))
+				return -EIO;
+		}
 
 		for (x=0; x < count; x++) {
 			size = read8(pevent);
@@ -338,6 +505,16 @@ static int read_event_files(struct tep_handle *pevent)
 		}
 		free(sys);
 	}
+	/* Fill in section size after writing all event files */
+	if (trace_dat_fp) {
+		end_pos = ftell(trace_dat_fp);
+		section_size = end_pos - sys_count_pos;
+		fseek(trace_dat_fp, section_size_pos, SEEK_SET);
+		if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+			return -EIO;
+		fseek(trace_dat_fp, end_pos, SEEK_SET);
+	}
+
 	return 0;
 }
 
@@ -349,8 +526,25 @@ static int read_saved_cmdline(struct tep_handle *pevent)
 
 	/* it can have 0 size */
 	size = read8(pevent);
-	if (!size)
+	/* Write cmdlines section with empty payload if no data */
+	if (!size) {
+		if (trace_dat_fp) {
+			unsigned short section_id = TRACE_DAT_SECTION_CMDLINE;
+			unsigned short flags = 0;
+			unsigned int string_id = STRID_CMDLINES;
+			unsigned long long section_size = sizeof(unsigned long long);
+			unsigned long long section_data = 0;
+
+			trace_dat_cmdline_offset = ftell(trace_dat_fp);
+			if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+			    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+			    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+			    !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+			    !fwrite(&section_data, sizeof(unsigned long long), 1, trace_dat_fp))
+				return -EIO;
+		}
 		return 0;
+	}
 
 	buf = malloc(size + 1);
 	if (buf == NULL) {
@@ -363,6 +557,23 @@ static int read_saved_cmdline(struct tep_handle *pevent)
 		pr_debug("error reading saved cmdlines\n");
 		goto out;
 	}
+	/* Write cmdlines section with data */
+	if (trace_dat_fp) {
+		unsigned short section_id = TRACE_DAT_SECTION_CMDLINE;
+		unsigned short flags = 0;
+		unsigned int string_id = STRID_CMDLINES;
+		unsigned long long section_size = sizeof(unsigned long long) + size;
+
+		trace_dat_cmdline_offset = ftell(trace_dat_fp);
+		if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+		    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+		    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+		    !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+		    !fwrite(&size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+		    !fwrite(buf, 1, size, trace_dat_fp))
+			return -EIO;
+	}
+
 	buf[ret] = '\0';
 
 	parse_saved_cmdline(pevent, buf, size);
@@ -387,6 +598,7 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
 	int file_page_size;
 	struct tep_handle *pevent = NULL;
 	int err;
+	char magic_buf[10];
 
 	repipe = __repipe;
 	input_fd = fd;
@@ -398,12 +610,17 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
 		return -1;
 	}
 
+	if (trace_dat_fp)
+		memcpy(magic_buf, buf, 3);
+
 	if (do_read(buf, 7) < 0)
 		return -1;
 	if (memcmp(buf, "tracing", 7) != 0) {
 		pr_debug("not a trace file (missing 'tracing' tag)");
 		return -1;
 	}
+	if (trace_dat_fp)
+		memcpy(magic_buf + 3, buf, 7);
 
 	version = read_string();
 	if (version == NULL)
@@ -440,6 +657,28 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
 	tep_set_long_size(pevent, file_long_size);
 	tep_set_page_size(pevent, file_page_size);
 
+	/* Write initial file header to trace.dat */
+	if (trace_dat_fp) {
+		unsigned char endian = file_bigendian;
+		unsigned char long_size = file_long_size;
+		unsigned int page_size = file_page_size;
+		unsigned long long placeholder = 0;
+		char trace_dat_version = TRACE_DAT_VERSION;
+
+		if (!fwrite(magic_buf, 1, 10, trace_dat_fp) ||    /* magic + "tracing" */
+		    !fwrite(&trace_dat_version, 1, 2, trace_dat_fp) ||
+		    !fwrite(&endian, 1, 1, trace_dat_fp) ||
+		    !fwrite(&long_size, 1, 1, trace_dat_fp) ||
+		    !fwrite(&page_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+		    !fwrite("none", 1, 4, trace_dat_fp) ||
+		    !fwrite("\0", 1, 1, trace_dat_fp) ||
+		    !fwrite("\0", 1, 1, trace_dat_fp))
+			return -EIO;
+		trace_dat_options_offset = ftell(trace_dat_fp);
+		if (!fwrite(&placeholder, sizeof(unsigned long long), 1, trace_dat_fp))
+			return -EIO;
+	}
+
 	err = read_header_files(pevent);
 	if (err)
 		goto out;
@@ -460,6 +699,12 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
 		if (err)
 			goto out;
 	}
+	/* Write strings section to trace.dat output file */
+	if (trace_dat_fp) {
+		err = trace_dat__write_strings_section();
+		if (err)
+			goto out;
+	}
 
 	size = trace_data_size;
 	repipe = false;
-- 
2.53.0



* [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend
  2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
  2026-06-08 12:59 ` [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure Tanushree Shah
  2026-06-08 12:59 ` [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing Tanushree Shah
@ 2026-06-08 12:59 ` Tanushree Shah
  2026-06-08 13:14   ` sashiko-bot
  2026-06-08 12:59 ` [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format Tanushree Shah
  2026-06-08 15:18 ` [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Ian Rogers
  4 siblings, 1 reply; 10+ messages in thread
From: Tanushree Shah @ 2026-06-08 12:59 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
	irogers, namhyung, linux-kernel
  Cc: linux-perf-users, linuxppc-dev, atrajeev, hbathini, Tejas.Manhas1,
	Tanushree.Shah, Shivani.Nittor, Tanushree Shah

Add data-convert-trace.c implementing trace_convert__perf2dat() to
convert perf.data tracepoint events to trace.dat format.

process_sample_event() is invoked for each PERF_TYPE_TRACEPOINT sample
during perf_session__process_events(), storing raw event bytes per-cpu
via trace_dat__collect_cpu_event().

Once all samples are collected:
- trace_dat__write_options_section1() writes the OPTIONS section with
  CPUCOUNT, TRACECLOCK, HEADER_INFO, FTRACE_EVENTS, EVENT_FORMATS,
  KALLSYMS, CMDLINES and DONE options.
- trace_dat__write_options_section2() writes the OPTIONS section with
  BUFFER option holding per-cpu data offset placeholders and the DONE
  option.
- trace_dat__write_flyrecord_section() builds ring buffer pages
  per-cpu and patches BUFFER option with final offsets and sizes.

Per-cpu buffers are sized to tep_get_page_size() from the session
tep handle and released on all exit paths.
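
The placeholder-then-patch step described above can be sketched as
follows. This is a minimal illustration under stated assumptions, not
the code in this series: reserve_u64() and patch_u64() are hypothetical
names, and values are written in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an 8-byte zero placeholder at the current
 * position. Returns the placeholder's file offset, or -1 on failure.
 */
long reserve_u64(FILE *fp)
{
	uint64_t zero = 0;
	long pos;

	assert(fp != NULL);
	pos = ftell(fp);
	if (pos < 0 || fwrite(&zero, sizeof(zero), 1, fp) != 1)
		return -1;
	return pos;
}

/*
 * Hypothetical helper: overwrite a previously reserved slot with its
 * final value, then reposition at the end of the file so that later
 * writes keep appending. Returns 0 on success, -1 on failure.
 */
int patch_u64(FILE *fp, long pos, uint64_t value)
{
	assert(fp != NULL);
	if (fseek(fp, pos, SEEK_SET) < 0 ||
	    fwrite(&value, sizeof(value), 1, fp) != 1 ||
	    fseek(fp, 0, SEEK_END) < 0)
		return -1;
	return 0;
}
```

Checking every fseek() result matters here, because a failed seek
followed by a successful fwrite() would overwrite unrelated data.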

Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>
---
 tools/perf/util/Build                |   1 +
 tools/perf/util/data-convert-trace.c | 152 +++++++++++++++++++++++++++
 tools/perf/util/data-convert.h       |   4 +
 3 files changed, 157 insertions(+)
 create mode 100644 tools/perf/util/data-convert-trace.c

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index c000d8032d25..88022b24e170 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -236,6 +236,7 @@ ifeq ($(CONFIG_LIBTRACEEVENT),y)
 endif
 
 perf-util-y += data-convert-json.o
+perf-util-$(CONFIG_LIBTRACEEVENT) += data-convert-trace.o
 
 perf-util-y += scripting-engines/
 
diff --git a/tools/perf/util/data-convert-trace.c b/tools/perf/util/data-convert-trace.c
new file mode 100644
index 000000000000..e4f8b817be36
--- /dev/null
+++ b/tools/perf/util/data-convert-trace.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2026, IBM Corporation
+ * Author: Tanushree Shah <tshah@linux.ibm.com>
+ *
+ * data-convert-trace.c
+ *
+ * Implements perf.data to trace.dat format conversion for tracepoint events.
+ */
+
+#include <errno.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/err.h>
+
+#include "data-convert.h"
+#include "session.h"
+#include "evsel.h"
+#include "tool.h"
+#include "debug.h"
+#include "trace-dat.h"
+#include "trace-event.h"
+#include "event.h"
+#include "sample.h"
+#include "evlist.h"
+
+struct trace_convert {
+	struct perf_tool tool;
+	u64 events_count;
+};
+
+/* Store raw tracepoint event data in per-cpu buffer for trace.dat flyrecord */
+static int process_sample_event(const struct perf_tool *tool,
+				union perf_event *event __maybe_unused,
+				struct perf_sample *sample,
+				struct evsel *evsel,
+				struct machine *machine __maybe_unused)
+{
+	struct trace_convert *tc = container_of(tool, struct trace_convert, tool);
+
+	/* Collect raw tracepoint data per-cpu */
+	if (trace_dat_fp && sample->raw_size > 0 &&
+	    evsel->core.attr.type == PERF_TYPE_TRACEPOINT) {
+		if (trace_dat__collect_cpu_event(sample->cpu, sample->time,
+					sample->raw_data, sample->raw_size) < 0) {
+			pr_err("Failed to collect CPU event\n");
+			return -ENOMEM;
+		}
+		tc->events_count++;
+	}
+
+	return 0;
+}
+
+/* Convert perf.data tracepoint events to trace.dat format */
+int trace_convert__perf2dat(const char *input, const char *to_trace,
+			   struct perf_data_convert_opts *opts)
+{
+	struct perf_session *session;
+	struct trace_convert tc = {
+		.events_count = 0,
+	};
+	struct perf_data data = {
+		.path = input,
+		.mode = PERF_DATA_MODE_READ,
+		.force = opts->force,
+	};
+	int ret = -EINVAL;
+	bool cpu_buffers_initialized = false;
+
+	/* Initialize tool with all required callbacks */
+	perf_tool__init(&tc.tool, /*ordered_events=*/true);
+	tc.tool.sample = process_sample_event;
+
+	/* Open output trace.dat file */
+	trace_dat_fp = fopen(to_trace, "wb");
+	if (!trace_dat_fp) {
+		pr_err("Failed to open output file: %s\n", to_trace);
+		return -EINVAL;
+	}
+
+	/* Open perf.data session - this writes trace.dat metadata sections */
+	session = perf_session__new(&data, &tc.tool);
+	if (IS_ERR(session)) {
+		pr_err("Failed to open perf.data file\n");
+		ret = PTR_ERR(session);
+		goto out_close;
+	}
+
+	/* Initialize per-CPU buffers for flyrecord data */
+	if (session->tevent.pevent) {
+		trace_dat_page_size = tep_get_page_size(session->tevent.pevent);
+		if (trace_dat__init_cpu_buffers(session->header.env.nr_cpus_online) < 0) {
+			pr_err("Failed to initialize CPU buffers\n");
+			ret = -ENOMEM;
+			goto out_delete;
+		}
+		cpu_buffers_initialized = true;
+	}
+
+	/* Process all events - collects raw data per-cpu */
+	ret = perf_session__process_events(session);
+	if (ret < 0) {
+		pr_err("Failed to process events\n");
+		goto out_delete;
+	}
+
+	/* Skip file creation if no tracepoint events found */
+	if (tc.events_count == 0) {
+		pr_warning("No tracepoint events found in '%s', skipping trace.dat creation\n",
+			input);
+		ret = -EINVAL;
+		goto out_delete;
+	}
+
+	/* Write trace.dat options and flyrecord sections */
+	if (trace_dat__write_options_section1() < 0) {
+		pr_err("Failed to write options section1\n");
+		ret = -EIO;
+		goto out_delete;
+	}
+	if (trace_dat__write_options_section2() < 0) {
+		pr_err("Failed to write options section2\n");
+		ret = -EIO;
+		goto out_delete;
+	}
+	if (trace_dat__write_flyrecord_section() < 0) {
+		pr_err("Failed to write flyrecord section\n");
+		ret = -EIO;
+		goto out_delete;
+	}
+
+	pr_info("[ perf data convert: Converted '%s' into trace.dat format '%s' ]\n",
+		input, to_trace);
+	pr_info("[ perf data convert: Converted %llu events ]\n",
+		(unsigned long long)tc.events_count);
+
+	ret = 0;
+
+out_delete:
+	if (cpu_buffers_initialized)
+		trace_dat__free_cpu_buffers();
+	perf_session__delete(session);
+out_close:
+	if (trace_dat_fp) {
+		fclose(trace_dat_fp);
+		trace_dat_fp = NULL;
+	}
+	if (ret != 0)
+		unlink(to_trace);
+	return ret;
+}
diff --git a/tools/perf/util/data-convert.h b/tools/perf/util/data-convert.h
index ee651fa680a1..d958e68367fe 100644
--- a/tools/perf/util/data-convert.h
+++ b/tools/perf/util/data-convert.h
@@ -19,4 +19,8 @@ int bt_convert__perf2ctf(const char *input_name, const char *to_ctf,
 int bt_convert__perf2json(const char *input_name, const char *to_ctf,
 			 struct perf_data_convert_opts *opts);
 
+#ifdef HAVE_LIBTRACEEVENT
+int trace_convert__perf2dat(const char *input, const char *to_trace,
+			   struct perf_data_convert_opts *opts);
+#endif /* HAVE_LIBTRACEEVENT */
 #endif /* __DATA_CONVERT_H */
-- 
2.53.0



* [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format
  2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
                   ` (2 preceding siblings ...)
  2026-06-08 12:59 ` [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend Tanushree Shah
@ 2026-06-08 12:59 ` Tanushree Shah
  2026-06-08 13:12   ` sashiko-bot
  2026-06-08 15:18 ` [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Ian Rogers
  4 siblings, 1 reply; 10+ messages in thread
From: Tanushree Shah @ 2026-06-08 12:59 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
	irogers, namhyung, linux-kernel
  Cc: linux-perf-users, linuxppc-dev, atrajeev, hbathini, Tejas.Manhas1,
	Tanushree.Shah, Shivani.Nittor, Tanushree Shah

Add new command-line option to perf data convert for generating
trace.dat output files.

The --to-trace-dat option:
- Accepts output filename for trace.dat format
- Mutually exclusive with --to-ctf and --to-json
- Calls trace_convert__perf2dat() to perform conversion

Usage:
  $ perf record -e sched:* -a sleep 1
  $ perf data convert --to-trace-dat=trace.dat
  $ trace-cmd report trace.dat

Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>
---
 tools/perf/builtin-data.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c
index 4c08ccb8c06b..96ccab77456a 100644
--- a/tools/perf/builtin-data.c
+++ b/tools/perf/builtin-data.c
@@ -30,6 +30,9 @@ static const char *data_usage[] = {
 
 static const char *to_json;
 static const char *to_ctf;
+#ifdef HAVE_LIBTRACEEVENT
+	static const char *trace_dat_output;
+#endif
 static struct perf_data_convert_opts opts = {
 	.force = false,
 	.all = false,
@@ -48,6 +51,10 @@ static const struct option data_options[] = {
 		OPT_BOOLEAN(0, "all", &opts.all, "Convert all events"),
 		OPT_STRING(0, "time", &opts.time_str, "str",
 			   "Time span of interest (start,stop)"),
+#ifdef HAVE_LIBTRACEEVENT
+		OPT_STRING(0, "to-trace-dat", &trace_dat_output,
+			   "file", "Convert to trace.dat format using perf.data tracepoints"),
+#endif
 		OPT_END()
 	};
 
@@ -65,16 +72,43 @@ static int cmd_data_convert(int argc, const char **argv)
 		pr_err("You cannot specify both --to-ctf and --to-json.\n");
 		return -1;
 	}
+#ifdef HAVE_LIBTRACEEVENT
+	if (trace_dat_output && (to_json || to_ctf)) {
+		pr_err("You cannot specify --to-trace-dat with --to-ctf or --to-json.\n");
+		return -1;
+	}
+#endif
+
 #ifdef HAVE_LIBBABELTRACE_SUPPORT
+	#ifdef HAVE_LIBTRACEEVENT
+	if (!to_json && !to_ctf && !trace_dat_output) {
+		pr_err("You must specify one of --to-ctf, --to-json, or --to-trace-dat.\n");
+		return -1;
+	}
+	#else
 	if (!to_json && !to_ctf) {
 		pr_err("You must specify one of --to-ctf or --to-json.\n");
 		return -1;
 	}
+	#endif
 #else
+	#ifdef HAVE_LIBTRACEEVENT
+	if (!to_json && !trace_dat_output) {
+		pr_err("You must specify --to-json or --to-trace-dat.\n");
+		return -1;
+	}
+	#else
 	if (!to_json) {
 		pr_err("You must specify --to-json.\n");
-	return -1;
-}
+		return -1;
+	}
+	#endif
+#endif
+
+#ifdef HAVE_LIBTRACEEVENT
+	if (trace_dat_output)
+		return trace_convert__perf2dat(input_name ? input_name : "perf.data",
+					       trace_dat_output, &opts);
 #endif
 
 	if (to_json)
-- 
2.53.0



* Re: [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing
  2026-06-08 12:59 ` [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing Tanushree Shah
@ 2026-06-08 13:12   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-08 13:12 UTC (permalink / raw)
  To: Tanushree Shah; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:

New issues:
- [High] Using `read()` directly instead of the subsystem's `do_read()` loop fails to handle short reads, causing stream desynchronization and uninitialized memory writes.
- [High] Integer truncation in `do_read()` causes massive uninitialized heap memory dumps to the output file.
- [Medium] Metadata fields and integers for the `trace.dat` file are written in the host's native byte order, causing cross-platform incompatibility.
- [Medium] Error paths in `fwrite` operations return directly without cleaning up allocated resources, leading to memory and state leaks.
- [Medium] Writing 2 bytes from a 1-byte `char` variable causes a stack out-of-bounds read and file corruption.

Pre-existing issues:
- [High] Unbounded string accumulation in `read_string()` causes a stack buffer overflow.
--

commit f1360b22eb92ea64822af70555a808fd9d67ae93
Author: Tanushree Shah <tshah@linux.ibm.com>
Date:   Mon Jun 8 12:59:50 2026 +0000

    perf/trace-event: Write trace.dat metadata sections during parsing
    
    Perf already captures the tracing metadata as part of the data
    section in perf.data.
    
    When trace_dat_fp is set, write trace.dat-compatible metadata
    sections using the raw buffers that perf provides.
    
    Sections written:
    - Initial format header (magic, version, endian, long_size,
      page_size, compression, options_offset placeholder)
    - Section 16: HEADER INFO (header_page + header_event)
    - Section 17: FTRACE EVENT FORMATS
    - Section 18: EVENT FORMATS (per system/event format files)
    - Section 19: KALLSYMS
    - Section 21: CMDLINES
    - Section 15: STRINGS (written last after all sections)
    
    Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>

This commit updates the trace-event parsing in perf to write trace.dat
compatible metadata sections when trace_dat_fp is set. It outputs sections
such as initial format headers, FTRACE EVENT FORMATS, EVENT FORMATS,
KALLSYMS, and CMDLINES using perf's raw buffers.

> diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
> index ecbbb93f01853..815577703c2e5 100644
> --- a/tools/perf/util/trace-event-read.c
> +++ b/tools/perf/util/trace-event-read.c

[ ... ]

> @@ -145,10 +146,9 @@ static char *read_string(void)

[Severity: High]
This isn't a bug introduced by this patch, but does read_string() allow a
stack buffer overflow when accumulating characters?

In read_string():
    buf[size++] = c;

If the input file contains a string longer than BUFSIZ without a null
terminator, will this increment the size index and write past the bounds
of the fixed-size array buf?
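
One possible shape for a bounded version is sketched below. It is an
illustration only: read_string_bounded() is a hypothetical name, and it
reads from a stdio stream to stay self-contained, whereas the real
read_string() reads from input_fd.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical bounded variant: read a NUL-terminated string from fp.
 * Returns a malloc()ed copy (including the terminator), or NULL on EOF,
 * allocation failure, or when no terminator appears within BUFSIZ bytes.
 */
char *read_string_bounded(FILE *fp)
{
	char buf[BUFSIZ];
	size_t size = 0;
	char *out;
	int c;

	assert(fp != NULL);

	for (;;) {
		c = fgetc(fp);
		if (c == EOF)
			return NULL;
		/* Check the bound before storing, never after. */
		if (size >= sizeof(buf))
			return NULL;
		buf[size++] = (char)c;
		if (c == '\0')
			break;
	}

	out = malloc(size);
	if (out)
		memcpy(out, buf, size);
	return out;
}
```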

>  static int read_proc_kallsyms(struct tep_handle *pevent)
>  {
>  	unsigned int size;
> +	char *buf;
>  
>  	size = read4(pevent);
> -	if (!size)
> -		return 0;

[ ... ]

> +	buf = malloc(size);
> +	if (buf == NULL)
> +		return -1;
> +	if (read(input_fd, buf, size) < 0) {
> +		free(buf);
> +		return -1;
> +	}
>  	trace_data_size += size;
> +	/* Write kallsyms section with data */
> +	if (trace_dat_fp) {
> +		unsigned short section_id = TRACE_DAT_SECTION_KALLSYMS;
> +		unsigned int string_id = STRID_KALLSYMS;
> +		unsigned long long section_size = sizeof(unsigned int) + size;
> +		unsigned short flags = 0;
> +
> +		trace_dat_kallsyms_offset = ftell(trace_dat_fp);
> +		if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
> +		    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
> +		    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
> +		    !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
> +		    !fwrite(&size, sizeof(unsigned int), 1, trace_dat_fp) ||
> +		    !fwrite(buf, 1, size, trace_dat_fp)) {

[Severity: High]
Could using read() instead of do_read() here cause uninitialized memory to
be written to the output file?

Unlike do_read(), read() can return a short read. If a short read occurs, the
return value might still be greater than 0, bypassing the error check.

Will this leave the remainder of buf uninitialized, failing to advance the
input_fd offset by the expected amount, and then dump the partially
uninitialized heap memory into trace_dat_fp during the fwrite() call?
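
As a sketch of the behaviour a do_read()-style helper is expected to provide,
the loop below keeps reading until the full length has arrived. read_fully()
is a hypothetical name used only for illustration; in the patch itself,
calling the existing do_read() is presumably the simpler fix.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a do_read()-style helper: keep calling read()
 * until exactly len bytes have arrived. Returns 0 on success, -1 on error
 * or premature EOF, so a short read can never leave buf partly unfilled.
 */
static int read_fully(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t r = read(fd, p, len);

		if (r < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (r == 0)
			return -1;	/* EOF before len bytes were read */
		p += r;
		len -= (size_t)r;
	}
	return 0;
}
```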

> +			free(buf);
> +			return -EIO;
> +		}
> +	}
> +	free(buf);
>  	return 0;
>  }

[ ... ]

> @@ -209,6 +261,7 @@ static int read_header_files(struct tep_handle *pevent)
>  
>  	size = read8(pevent);
>  
> +	header_page_size = size;
>  	header_page = malloc(size);
>  	if (header_page == NULL)
>  		return -1;
> @@ -227,19 +280,59 @@ static int read_header_files(struct tep_handle *pevent)
>  		 */
>  		tep_set_long_size(pevent, tep_get_header_page_size(pevent));
>  	}
> -	free(header_page);
>  
> -	if (do_read(buf, 13) < 0)
> +	if (do_read(buf, 13) < 0) {
> +		free(header_page);
>  		return -1;
> +	}

[Severity: High]
Can integer truncation in do_read() lead to massive uninitialized heap memory
dumps here?

Functions like read_header_files() allocate memory using the 64-bit size read
from the file:

    size = read8(pevent);
    header_page = malloc(size);

They then call do_read(header_page, size). Because do_read() takes an int
for the size argument, values over 4GB are truncated to their lower 32 bits.
If size is 0x100000004, do_read() will only read 4 bytes.

Later in the function, fwrite() is called with the full 64-bit size:

    !fwrite(header_page, 1, header_page_size, trace_dat_fp)

Will this result in dumping 4GB of uninitialized heap memory into the
output file?
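
A guard along these lines could reject such sizes before any allocation or
read takes place. This is only a sketch: size_fits_int() and low32() are
hypothetical helpers, and low32() merely shows which bits survive when a
64-bit size is squeezed through a 32-bit parameter.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical guard: true if a 64-bit on-disk size can be passed to a
 * reader that takes an int length without losing its upper bits.
 */
static bool size_fits_int(unsigned long long size)
{
	return size <= (unsigned long long)INT_MAX;
}

/* Illustration only: the low 32 bits that remain after truncation. */
static unsigned int low32(unsigned long long size)
{
	return (unsigned int)(size & 0xffffffffULL);
}
```

For the value used above, low32(0x100000004ULL) is 4, which is why only 4
bytes would be read while the later fwrite() still uses the full size.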

[ ... ]

> @@ -298,8 +405,31 @@ static int read_ftrace_files(struct tep_handle *pevent)
>  	int count;
>  	int i;
>  	int ret;
> +	long section_size_pos = 0;
> +	long count_pos = 0;
> +	unsigned long long section_size = 0;
> +	long end_pos;
>  
>  	count = read4(pevent);
> +	/* Write ftrace formats section to trace.dat output file */
> +	if (trace_dat_fp) {
> +		unsigned short section_id = TRACE_DAT_SECTION_FTRACE;
> +		unsigned short flags = 0;
> +		unsigned int string_id = STRID_FTRACE_FORMATS;
> +
> +		trace_dat_ftrace_format_offset = ftell(trace_dat_fp);
> +
> +		if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
> +		    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
> +		    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))

[Severity: Medium]
Should these multi-byte integers be byte-swapped to match the output file's
endianness?

The trace.dat file endianness is explicitly declared to match the original
perf.data file (endian = file_bigendian).

However, variables like section_id, flags, and string_id are written using
fwrite(), which writes them in the host's native byte order. If a
little-endian host is processing a big-endian capture file, won't this corrupt
the section IDs and sizes in the resulting trace.dat file?
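
One way to avoid depending on the host byte order is to serialize each field
byte by byte. The helper below is a sketch under that assumption;
write_uint() is a hypothetical name, and perf may already have byte-swapping
helpers that would be a better fit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the low `width` bytes of val in the byte
 * order declared by the output file rather than the host's native order.
 * Returns 0 on success, -1 on a bad width or a short write.
 */
static int write_uint(FILE *fp, uint64_t val, unsigned int width, bool big_endian)
{
	unsigned char bytes[8];
	unsigned int i;

	if (width == 0 || width > sizeof(bytes))
		return -1;
	for (i = 0; i < width; i++) {
		unsigned int shift = big_endian ? 8 * (width - 1 - i) : 8 * i;

		bytes[i] = (unsigned char)(val >> shift);
	}
	return fwrite(bytes, 1, width, fp) == width ? 0 : -1;
}
```

A section_id would then be written as write_uint(fp, section_id, 2,
file_bigendian), and likewise for the 4-byte and 8-byte fields.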

> +			return -EIO;
> +		section_size_pos = ftell(trace_dat_fp);
> +		if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
> +			return -EIO;
> +		count_pos = ftell(trace_dat_fp);
> +		if (!fwrite(&count, sizeof(unsigned int), 1, trace_dat_fp))
> +			return -EIO;
> +	}

[ ... ]

> @@ -363,6 +557,23 @@ static int read_saved_cmdline(struct tep_handle *pevent)
>  		pr_debug("error reading saved cmdlines\n");
>  		goto out;
>  	}
> +	/* Write cmdlines section with data */
> +	if (trace_dat_fp) {
> +		unsigned short section_id = TRACE_DAT_SECTION_CMDLINE;
> +		unsigned short flags = 0;
> +		unsigned int string_id = STRID_CMDLINES;
> +		unsigned long long section_size = sizeof(unsigned long long) + size;
> +
> +		trace_dat_cmdline_offset = ftell(trace_dat_fp);
> +		if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
> +		    !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
> +		    !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
> +		    !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
> +		    !fwrite(&size, sizeof(unsigned long long), 1, trace_dat_fp) ||
> +		    !fwrite(buf, 1, size, trace_dat_fp))
> +			return -EIO;

[Severity: Medium]
Does returning directly bypass the cleanup block?

If fwrite() fails, this error path returns -EIO immediately instead of using
the "goto out" pattern established in this function.

Will this cause the allocated memory in buf to be leaked, since free(buf)
is skipped?

> +	}
> +
>  	buf[ret] = '\0';
>  
>  	parse_saved_cmdline(pevent, buf, size);

[ ... ]

> @@ -440,6 +657,28 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
>  	tep_set_long_size(pevent, file_long_size);
>  	tep_set_page_size(pevent, file_page_size);
>  
> +	/* Write initial file header to trace.dat */
> +	if (trace_dat_fp) {
> +		unsigned char endian = file_bigendian;
> +		unsigned char long_size = file_long_size;
> +		unsigned int page_size = file_page_size;
> +		unsigned long long placeholder = 0;
> +		char trace_dat_version = TRACE_DAT_VERSION;
> +
> +		if (!fwrite(magic_buf, 1, 10, trace_dat_fp) ||    /* magic + "tracing" */
> +		    !fwrite(&trace_dat_version, 1, 2, trace_dat_fp) ||

[Severity: Medium]
Can this write out-of-bounds stack memory to the output file?

The variable trace_dat_version is declared as a 1-byte char:

    char trace_dat_version = TRACE_DAT_VERSION;

However, the fwrite call asks to write 2 bytes:

    !fwrite(&trace_dat_version, 1, 2, trace_dat_fp)

Will this read 1 byte past the trace_dat_version variable on the stack,
leaking uninitialized stack memory into the output file and potentially
failing to write the expected null terminator for the version string?
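
A sketch of one way to keep the write inside the object, assuming
TRACE_DAT_VERSION is a single character such as '7' and that the format
expects it to be followed by a NUL terminator. write_version() is a
hypothetical helper, not something in the patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: emit a single-character version followed by its
 * NUL terminator from a 2-byte array, so fwrite() never reads past the
 * object it is given. Returns 0 on success, -1 on a short write.
 */
static int write_version(FILE *fp, char version_digit)
{
	const char version[2] = { version_digit, '\0' };

	return fwrite(version, 1, sizeof(version), fp) == sizeof(version) ? 0 : -1;
}
```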

> +		    !fwrite(&endian, 1, 1, trace_dat_fp) ||
> +		    !fwrite(&long_size, 1, 1, trace_dat_fp) ||
> +		    !fwrite(&page_size, sizeof(unsigned int), 1, trace_dat_fp) ||
> +		    !fwrite("none", 1, 4, trace_dat_fp) ||
> +		    !fwrite("\0", 1, 1, trace_dat_fp) ||
> +		    !fwrite("\0", 1, 1, trace_dat_fp))
> +			return -EIO;
> +		trace_dat_options_offset = ftell(trace_dat_fp);
> +		if (!fwrite(&placeholder, sizeof(unsigned long long), 1, trace_dat_fp))
> +			return -EIO;
> +	}
> +
>  	err = read_header_files(pevent);
>  	if (err)
>  		goto out;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=2

* Re: [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format
  2026-06-08 12:59 ` [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format Tanushree Shah
@ 2026-06-08 13:12   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-08 13:12 UTC (permalink / raw)
  To: Tanushree Shah; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] The `trace_convert__perf2dat` tool fails to register required `.attr`, `.feature`, and `.tracing_data` callbacks, breaking pipe mode.
- [High] Accessing unverified `perf_env` field `nr_cpus_online` leads to deterministic failure when CPU topology feature is missing.
- [High] Failing `realloc()` in `trace_dat__collect_cpu_event()` leaks memory and causes a NULL pointer dereference in the cleanup path.
- [Low] Missing semicolon after `return -1` in the `#else` branch inside `cmd_data_convert`.
--

commit 789bd3d29fe2134b1ed0b9a17d6c674bc7d7db23
Author: Tanushree Shah <tshah@linux.ibm.com>

perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format
    
Add new command-line option to perf data convert for generating
trace.dat output files. Accepts output filename for trace.dat format,
mutually exclusive with --to-ctf and --to-json, and calls
trace_convert__perf2dat() to perform conversion.

> diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c
> index 4c08ccb8c06b7..96ccab77456a3 100644
> --- a/tools/perf/builtin-data.c
> +++ b/tools/perf/builtin-data.c

[ ... ]

> @@ -65,16 +72,43 @@ static int cmd_data_convert(int argc, const char **argv)
[ ... ]
>  #else
> +	#ifdef HAVE_LIBTRACEEVENT
> +	if (!to_json && !trace_dat_output) {
> +		pr_err("You must specify --to-json or --to-trace-dat.\n");
> +		return -1;
> +	}
> +	#else
>  	if (!to_json) {
>  		pr_err("You must specify --to-json.\n");
> -	return -1;
> -}
> +		return -1
> +		}

[Severity: Low]
Is there a missing semicolon here? It looks like this will break compilation
for environments lacking both HAVE_LIBBABELTRACE_SUPPORT and
HAVE_LIBTRACEEVENT.

> +	#endif
> +#endif
> +
> +#ifdef HAVE_LIBTRACEEVENT
> +	if (trace_dat_output)
> +		return trace_convert__perf2dat(input_name ? input_name : "perf.data",
> +					       trace_dat_output, &opts);

[Severity: High]
Does trace_convert__perf2dat() register the required callbacks for pipe mode?

Looking at the tool initialization in trace_convert__perf2dat():

	/* Initialize tool with all required callbacks */
	perf_tool__init(&tc.tool, /*ordered_events=*/true);
	tc.tool.sample = process_sample_event;

If .attr and .feature are not overridden, won't incoming
perf_event_header_attr and feature events from a pipe be silently dropped,
preventing the creation of evlists/evsels and breaking event processing?

[Severity: High]
How does this code handle missing CPU topology features?

Looking at trace_convert__perf2dat():

		if (trace_dat__init_cpu_buffers(session->header.env.nr_cpus_online) < 0) {

If the feature callback is not registered (like in pipe mode) or the CPU
topology feature wasn't recorded, wouldn't session->header.env.nr_cpus_online
remain 0?

If trace_dat__init_cpu_buffers(0) succeeds with 0 elements, won't
trace_dat__collect_cpu_event() later check cpu >= trace_dat_nr_cpus (0 >= 0)
and return -EINVAL, causing process_sample_event to return -ENOMEM and abort
the conversion?

[Severity: High]
Is it possible for a reallocation failure to cause a memory leak and
subsequent NULL pointer dereference?

Looking at trace_dat__collect_cpu_event():

		cpu_events->events   = realloc(cpu_events->events,
				       cpu_events->capacity * sizeof(*cpu_events->events));
		if (!cpu_events->events)
			return -ENOMEM;

If realloc fails, doesn't it overwrite cpu_events->events with NULL, leaking
the old array?

Furthermore, if this returns -ENOMEM and cascades up to trigger the cleanup
path in trace_dat__free_cpu_buffers():

		for (i = 0; i < trace_cpu_data[cpu].count; i++)
			free(trace_cpu_data[cpu].events[i].raw);

Since count was not reset, won't this loop access events[i] and dereference
the NULL pointer?
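
The usual pattern for this is to grow through a temporary pointer and commit
only on success. The sketch below uses a simplified stand-in (struct
event_list holding ints, with an illustrative INITIAL_CAPACITY) rather than
the patch's real structures.

```c
#include <errno.h>
#include <stdlib.h>

#define INITIAL_CAPACITY 4	/* illustrative value, not the patch's constant */

/* Simplified stand-in for the per-CPU event list in the patch. */
struct event_list {
	int *events;
	size_t count;
	size_t capacity;
};

/*
 * Append one element, growing through a temporary pointer so that a
 * failed realloc() leaves events, count and capacity consistent and the
 * old array still reachable for cleanup.
 */
static int event_list_push(struct event_list *list, int value)
{
	if (list->count >= list->capacity) {
		size_t new_cap = list->capacity ? list->capacity * 2 : INITIAL_CAPACITY;
		int *tmp = realloc(list->events, new_cap * sizeof(*tmp));

		if (!tmp)
			return -ENOMEM;	/* list->events is untouched */
		list->events = tmp;
		list->capacity = new_cap;
	}
	list->events[list->count++] = value;
	return 0;
}
```

With this shape, a cleanup loop that walks count entries of events stays
valid even after an allocation failure.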

> +#endif

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=4

* Re: [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure
  2026-06-08 12:59 ` [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure Tanushree Shah
@ 2026-06-08 13:13   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-08 13:13 UTC (permalink / raw)
  To: Tanushree Shah; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 10 potential issue(s) to consider:
- [High] Sequential realloc failures cause a use-after-free, double-free, and memory leak.
- [High] Unchecked realloc overwrite leads to array leak and a later NULL pointer dereference.
- [High] Unchecked calloc failure leads to NULL pointer dereference in fwrite.
- [High] Heap buffer overflow when processing abnormally large trace records.
- [High] Unchecked get_tracing_file failure leads to NULL pointer dereference in fopen.
- [High] Ftrace ring buffer format is incorrectly constructed for events > 28 words, causing data corruption.
- [High] Timestamp deltas larger than 27 bits are silently truncated.
- [Medium] Direct return on record allocation failure bypasses cleanup, leaking accumulated memory.
- [Medium] The section_size for the flyrecord section is incorrectly calculated, violating the trace.dat format.
- [Medium] The current event record is leaked if flushing the page or reallocating the batch array fails.
--

commit b7f763e536c02304fb52bceb9c78dd92220fa81a
Author: Tanushree Shah <tshah@linux.ibm.com>

perf/trace-dat: Add trace.dat export infrastructure

Add new utility files util/trace-dat.c and util/trace-dat.h
implementing the infrastructure for exporting perf.data tracepoints
to trace.dat format compatible with trace-cmd and KernelShark.

> diff --git a/tools/perf/util/trace-dat.c b/tools/perf/util/trace-dat.c
> index 0000000000000..aa34a7b89b7a3
> --- /dev/null
> +++ b/tools/perf/util/trace-dat.c
[ ... ]
> +int trace_dat__collect_cpu_event(int cpu, unsigned long long ts,
> +				 void *raw, unsigned int raw_size)
> +{
[ ... ]
> +	if (cpu_events->count >= cpu_events->capacity) {
> +		cpu_events->capacity = cpu_events->capacity ?
> +				       cpu_events->capacity * 2 : INITIAL_EVENT_CAPACITY;
> +		cpu_events->events   = realloc(cpu_events->events,
> +				       cpu_events->capacity * sizeof(*cpu_events->events));
> +		if (!cpu_events->events)
> +			return -ENOMEM;

[Severity: High]
If realloc fails, it returns NULL, and the assignment overwrites
cpu_events->events, leaking the original memory block. Later,
trace_dat__free_cpu_buffers() iterates over the events and dereferences
trace_cpu_data[cpu].events[i].raw. Can this cause a segmentation fault during
cleanup, since events is NULL while count is still non-zero?

> +	}
[ ... ]
> +static int trace_dat__write_page(FILE *fp, unsigned long long base_ts,
> +			char **records, int *rec_sizes, int nr_recs)
> +{
[ ... ]
> +	page = calloc(1, trace_dat_page_size);
> +	if (!page)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < nr_recs; i++) {
> +		memcpy(page + offset, records[i], rec_sizes[i]);

[Severity: High]
When a trace event's size exceeds trace_dat_page_size - 16 bytes, the
oversized event is queued into a new batch. Does this memcpy copy data past
the end of the page allocation, causing a heap buffer overflow?
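
A bounds check before the copy would turn the overflow into an error return.
The following is a sketch only; page_append() is a hypothetical helper, and
the patch may prefer to reject or split oversized events earlier, in
trace_dat__write_cpu_dat().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical bounds-checked copy: place rec at *offset inside a page
 * of page_size bytes only if it fits. Returns 0 and advances *offset on
 * success, -1 (copying nothing) when the record would overflow the page.
 */
static int page_append(char *page, size_t page_size, size_t *offset,
		       const void *rec, size_t rec_size)
{
	if (*offset > page_size || rec_size > page_size - *offset)
		return -1;
	memcpy(page + *offset, rec, rec_size);
	*offset += rec_size;
	return 0;
}
```

The comparison is written as rec_size > page_size - *offset so that it cannot
wrap around for very large record sizes.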

> +		offset += rec_sizes[i];
> +		commit += rec_sizes[i];
> +	}
[ ... ]
> +static int trace_dat__write_cpu_dat(FILE *fp, int cpu, unsigned long long *file_offset_out)
> +{
[ ... ]
> +	for (i = 0; i < cpu_events->count; i++) {
> +		struct cpu_event *event = &cpu_events->events[i];
> +		unsigned long long time_delta = event->ts - base_ts;
> +		unsigned int data_len = event->raw_size;
> +		unsigned int words = (data_len + TRACE_DAT_WORD_ALIGN_MASK) / TRACE_DAT_WORD_SIZE;
> +		unsigned int type_len = words & TRACE_DAT_RECORD_TYPE_LEN_MASK;
> +		unsigned int hdr_word = ((time_delta & TRACE_DAT_RECORD_TIME_MASK) <<
> +					TRACE_DAT_RECORD_TIME_SHIFT) | type_len;

[Severity: High]
If the gap between events on a CPU exceeds 0x07FFFFFF, the ftrace
specification requires emitting a TIME_EXTEND or TIME_STAMP record to advance
base_ts. Does masking off the upper bits via time_delta &
TRACE_DAT_RECORD_TIME_MASK silently truncate the timestamp and corrupt the
chronological ordering?
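
For reference, a TIME_EXTEND record could be built roughly as follows. This
is a sketch based on my reading of the kernel's ring buffer layout: the RB_*
names are local to the example, the bit positions assume the little-endian
bitfield layout, and the data record that follows would then be expected to
carry a zero delta.

```c
#include <stdint.h>

#define RB_TIME_BITS		27	/* width of time_delta */
#define RB_TIME_MASK		((1U << RB_TIME_BITS) - 1)
#define RB_TYPE_LEN_BITS	5	/* width of type_len */
#define RB_TYPE_TIME_EXTEND	30	/* RINGBUF_TYPE_TIME_EXTEND */

/*
 * Build the two 32-bit words of a TIME_EXTEND record for a delta that
 * does not fit in 27 bits: word[0] is the header carrying the low 27
 * bits, word[1] carries the remaining upper bits. Assumes the
 * little-endian bitfield layout (type_len in the low 5 bits).
 */
static void encode_time_extend(uint64_t delta, uint32_t word[2])
{
	word[0] = ((uint32_t)(delta & RB_TIME_MASK) << RB_TYPE_LEN_BITS) |
		  RB_TYPE_TIME_EXTEND;
	word[1] = (uint32_t)(delta >> RB_TIME_BITS);
}

/* True when delta needs a TIME_EXTEND record before the data record. */
static int needs_time_extend(uint64_t delta)
{
	return delta > RB_TIME_MASK;
}
```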

[Severity: High]
The ftrace ring buffer specification uses the 5-bit type_len field to encode
lengths up to 28 words (RINGBUF_TYPE_DATA_TYPE_LEN_MAX). For larger events,
type_len must be 0, and the actual length must be written as a 32-bit word
immediately following the header. Does applying words & 0x1F map large events
into special types (like PADDING for 29) or to type_len = 0 without appending
the required 32-bit length word?
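
A sketch of the length encoding, as far as I can tell from the kernel's
ring_buffer.h (where RINGBUF_TYPE_DATA_TYPE_LEN_MAX is 28) and libtraceevent's
kbuffer parser. encode_type_len() and the RB_* names are hypothetical, and the
convention that the extra length word counts itself plus the payload is my
reading of those sources, so it should be checked against them.

```c
#include <stdint.h>

#define RB_WORD_SIZE		4
#define RB_TYPE_LEN_MAX		28	/* RINGBUF_TYPE_DATA_TYPE_LEN_MAX */

/*
 * Pick the type_len value for a data record with data_len payload bytes.
 * Small payloads store their length in words in type_len; larger ones
 * use type_len = 0 and *len_word receives the value for the extra 32-bit
 * length word (assumed here to be the payload plus the length word
 * itself). *len_word is 0 when no extra word is needed.
 */
static uint32_t encode_type_len(uint32_t data_len, uint32_t *len_word)
{
	uint32_t words = (data_len + RB_WORD_SIZE - 1) / RB_WORD_SIZE;

	if (words >= 1 && words <= RB_TYPE_LEN_MAX) {
		*len_word = 0;
		return words;
	}
	*len_word = data_len + RB_WORD_SIZE;
	return 0;
}
```

When the extra word is used, the record also grows by RB_WORD_SIZE bytes,
which the page accounting would need to include.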

> +		int rec_size;
> +		char *rec;
> +
> +		rec_size = TRACE_DAT_WORD_SIZE + data_len;
> +		if (rec_size % TRACE_DAT_WORD_SIZE)
> +			rec_size += TRACE_DAT_WORD_SIZE - (rec_size % TRACE_DAT_WORD_SIZE);
> +
> +		rec = calloc(1, rec_size);
> +		if (!rec)
> +			return -ENOMEM;

[Severity: Medium]
If calloc fails, this directly returns -ENOMEM. Does this bypass the out_free
cleanup label entirely, leaking the page_records and page_rec_sizes tracking
arrays and any previously allocated record strings?

> +		memcpy(rec, &hdr_word, TRACE_DAT_WORD_SIZE);
> +		memcpy(rec + TRACE_DAT_WORD_SIZE, event->raw, data_len);
> +
> +		if (page_size_used + rec_size > trace_dat_page_size -
> +		    TRACE_DAT_RECORD_HEADER_SIZE) {
> +			ret = trace_dat__write_page(fp, base_ts,
> +			      page_records, page_rec_sizes, nr_page_recs);
> +			for (j = 0; j < nr_page_recs; j++)
> +				free(page_records[j]);
> +			nr_page_recs = 0;
> +			page_size_used = 0;
> +			base_ts = event->ts;
> +			if (ret < 0)
> +				goto out_free;

[Severity: Medium]
If trace_dat__write_page fails and we jump to out_free, the newly allocated
rec has not yet been appended to the page_records array. Is this memory
permanently leaked since it is skipped by the cleanup loop?

> +		}
> +
> +		if (nr_page_recs >= page_cap) {
> +			char **tmp_records;
> +			int *tmp_sizes;
> +
> +			page_cap = page_cap ? page_cap * 2 : INITIAL_PAGE_RECORD_CAPACITY;
> +			tmp_records = realloc(page_records, page_cap * sizeof(char *));
> +			tmp_sizes = realloc(page_rec_sizes, page_cap * sizeof(int));
> +			if (!tmp_records || !tmp_sizes) {
> +				ret = -ENOMEM;
> +				goto out_free;

[Severity: High]
If realloc for tmp_sizes fails, the code jumps to out_free without updating
page_records to point to the newly allocated block in tmp_records. Does the
cleanup path then iterate over the potentially freed page_records pointer,
dereference it (use-after-free), and then call free(page_records) again
(double-free)?
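
One way to keep the two arrays consistent is to commit each realloc() result
as soon as it succeeds. The helper below is a sketch with a hypothetical name,
grow_parallel(), standing in for the inline code in the patch.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Grow two parallel arrays to new_cap entries. Each successful realloc()
 * is committed to the caller's pointer immediately, so that if the second
 * call fails the first pointer already refers to the live block and a
 * later free() of both pointers is safe.
 */
static int grow_parallel(char ***records, int **sizes, size_t new_cap)
{
	char **tmp_records;
	int *tmp_sizes;

	tmp_records = realloc(*records, new_cap * sizeof(**records));
	if (!tmp_records)
		return -ENOMEM;
	*records = tmp_records;

	tmp_sizes = realloc(*sizes, new_cap * sizeof(**sizes));
	if (!tmp_sizes)
		return -ENOMEM;
	*sizes = tmp_sizes;
	return 0;
}
```

On failure the caller's capacity variable should be left unchanged, since at
most one of the two arrays has actually grown.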

> +			}
[ ... ]
> +int trace_dat__write_options_section1(void)
> +{
[ ... ]
> +	/* TRACECLOCK option */
> +	opt_id = TRACE_DAT_OPTION_TRACECLOCK;
> +
> +	path = get_tracing_file("trace_clock");
> +	clock_file = fopen(path, "r");

[Severity: High]
The get_tracing_file function internally uses asprintf and can return NULL on
memory allocation failure. Is it possible for fopen to receive a NULL path,
leading to a segmentation fault?

> +	put_tracing_file(path);
[ ... ]
> +int trace_dat__write_flyrecord_section(void)
> +{
[ ... ]
> +	/* Align to page boundary */
> +	after_header   = ftell(trace_dat_fp);
> +	padding_needed = (trace_dat_page_size -
> +			 (after_header % trace_dat_page_size)) % trace_dat_page_size;
> +
> +	if (padding_needed > 0) {
> +		pad = calloc(1, padding_needed);
> +
> +		if (!fwrite(pad, 1, padding_needed, trace_dat_fp)) {

[Severity: High]
Is pad passed directly to fwrite without checking if calloc succeeded? Can
this lead to a segmentation fault if the allocation fails and fwrite receives
a NULL buffer?

> +			free(pad);
> +			ret = -EIO;
> +			goto cleanup;
> +		}
> +		free(pad);
> +	}
[ ... ]
> +	/* fill section size */
> +	end_pos = ftell(trace_dat_fp);
> +
> +	section_size = end_pos - flyrecord_start;

[Severity: Medium]
In the trace.dat specification, the section_size field must indicate the size
of the payload following the 16-byte section header. By capturing
flyrecord_start before the header is written, does the final section_size
erroneously include the 16 bytes of the header itself, causing parsing tools
to miscalculate the boundary?
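
The arithmetic could look like the sketch below. section_payload_size() is a
hypothetical helper, and the 16-byte figure follows from the header fields the
series writes (2-byte id, 2-byte flags, 4-byte string id, 8-byte size).

```c
#define SECTION_HEADER_SIZE 16	/* id (2) + flags (2) + string id (4) + size (8) */

/*
 * Payload size for a section whose header starts at header_start and
 * whose data ends at end_pos (both file offsets). Returns -1 if the
 * offsets cannot describe a complete header.
 */
static long long section_payload_size(long header_start, long end_pos)
{
	if (header_start < 0 || end_pos < header_start + SECTION_HEADER_SIZE)
		return -1;
	return (long long)end_pos - header_start - SECTION_HEADER_SIZE;
}
```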

> +	if (fseek(trace_dat_fp, section_size_pos, SEEK_SET) < 0 ||
> +	    !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp)) {
> +		ret = -EIO;
> +		goto cleanup;
> +	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=1

* Re: [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend
  2026-06-08 12:59 ` [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend Tanushree Shah
@ 2026-06-08 13:14   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-08 13:14 UTC (permalink / raw)
  To: Tanushree Shah; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] Out-of-bounds heap write when tracepoint event data size exceeds the remaining space in a trace.dat page.
- [Medium] The conversion tool fails to register required callbacks for pipe mode (such as `.attr`, `.feature`, and `.tracing_data`), silently dropping events.
- [Medium] The output file is overwritten unconditionally, ignoring the `opts->force` flag.
- [Medium] A `realloc` failure in `trace_dat__collect_cpu_event` causes a memory leak and subsequent segmentation fault.
--

commit da74698d88efeca5fa33802144416838acb15d2a
Author: Tanushree Shah <tshah@linux.ibm.com>

perf data-convert: Add perf.data to trace.dat conversion backend

Add data-convert-trace.c implementing trace_convert__perf2dat() to
convert perf.data tracepoint events to trace.dat format.

process_sample_event() is invoked for each PERF_TYPE_TRACEPOINT sample
during perf_session__process_events(), storing raw event bytes per-cpu
via trace_dat__collect_cpu_event().

Once all samples are collected:
- trace_dat__write_options_section1() writes the OPTIONS section with
  CPUCOUNT, TRACECLOCK, HEADER_INFO, FTRACE_EVENTS, EVENT_FORMATS,
  KALLSYMS, CMDLINES and DONE options.
- trace_dat__write_options_section2() writes the OPTIONS section with
  BUFFER option holding per-cpu data offset placeholders and the DONE
  option.
- trace_dat__write_flyrecord_section() builds ring buffer pages
  per-cpu and patches BUFFER option with final offsets and sizes.

Per-cpu buffers are sized to tep_get_page_size() from the session
tep handle and released on all exit paths.

> diff --git a/tools/perf/util/data-convert-trace.c b/tools/perf/util/data-convert-trace.c
> new file mode 100644
> index 0000000000000..e4f8b817be36d
> --- /dev/null
> +++ b/tools/perf/util/data-convert-trace.c
[ ... ]
> +/* Store raw tracepoint event data in per-cpu buffer for trace.dat flyrecord */
> +static int process_sample_event(const struct perf_tool *tool,
> +				union perf_event *event __maybe_unused,
> +				struct perf_sample *sample,
> +				struct evsel *evsel,
> +				struct machine *machine __maybe_unused)
> +{
> +	struct trace_convert *tc = container_of(tool, struct trace_convert, tool);
> +
> +	/* Collect raw tracepoint data per-cpu */
> +	if (trace_dat_fp && sample->raw_size > 0 &&
> +	    evsel->core.attr.type == PERF_TYPE_TRACEPOINT) {
> +		if (trace_dat__collect_cpu_event(sample->cpu, sample->time,
> +					sample->raw_data, sample->raw_size) < 0) {

[Severity: Medium]
Is it possible for a memory allocation failure here to cause a crash
during cleanup?

If trace_dat__collect_cpu_event() fails during a realloc(), it might return
-ENOMEM while leaving the events array pointer as NULL but keeping the count
intact.

When trace_dat__free_cpu_buffers() is called on the error path, it could
dereference that NULL pointer:

tools/perf/util/trace-dat.c:trace_dat__free_cpu_buffers() {
	...
	for (i = 0; i < trace_cpu_data[cpu].count; i++)
		free(trace_cpu_data[cpu].events[i].raw);
	...
}

> +			pr_err("Failed to collect CPU event\n");
> +			return -ENOMEM;
> +		}
> +		tc->events_count++;
> +	}
> +
> +	return 0;
> +}
[ ... ]
> +/* Convert perf.data tracepoint events to trace.dat format */
> +int trace_convert__perf2dat(const char *input, const char *to_trace,
> +			   struct perf_data_convert_opts *opts)
> +{
> +	struct perf_session *session;
> +	struct trace_convert tc = {
> +		.events_count = 0,
> +	};
> +	struct perf_data data = {
> +		.path = input,
> +		.mode = PERF_DATA_MODE_READ,
> +		.force = opts->force,
> +	};
> +	int ret = -EINVAL;
> +	bool cpu_buffers_initialized = false;
> +
> +	/* Initialize tool with all required callbacks */
> +	perf_tool__init(&tc.tool, /*ordered_events=*/true);
> +	tc.tool.sample = process_sample_event;

[Severity: Medium]
Does this initialization omit the required callbacks for pipe mode?

By only overriding .sample, the .attr, .feature, and .tracing_data
callbacks remain as default stubs from perf_tool__init().

In pipe mode, without these callbacks, the tool might silently fail to
populate trace metadata and generate an empty trace.dat file.

> +
> +	/* Open output trace.dat file */
> +	trace_dat_fp = fopen(to_trace, "wb");
> +	if (!trace_dat_fp) {

[Severity: Medium]
Can this unconditionally overwrite an existing file?

The fopen() call uses the "wb" mode which truncates the file, but it
doesn't appear to check the opts->force flag.

Since this is newly added code rather than a regression, the concern is
simply that it could lead to accidental data loss.
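
A sketch of one way to honour the flag, assuming the intent is for
opts->force to gate overwriting an existing output file. open_output() is a
hypothetical helper, and it relies on the C11 "x" (exclusive create) mode
flag, which glibc supports.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical open helper: refuse to clobber an existing output file
 * unless force is set. Relies on the C11 "x" (exclusive create) mode
 * flag; C libraries without it would need open() with
 * O_CREAT | O_EXCL followed by fdopen() instead.
 */
static FILE *open_output(const char *path, bool force)
{
	return fopen(path, force ? "wb" : "wbx");
}
```

Without force, the call fails with a NULL return when the file already
exists, which the caller can report instead of truncating it.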

> +		pr_err("Failed to open output file: %s\n", to_trace);
> +		return -EINVAL;
> +	}
[ ... ]
> +	/* Write trace.dat options and flyrecord sections */
> +	if (trace_dat__write_options_section1() < 0) {
> +		pr_err("Failed to write options section1\n");
> +		ret = -EIO;
> +		goto out_delete;
> +	}
> +	if (trace_dat__write_options_section2() < 0) {
> +		pr_err("Failed to write options section2\n");
> +		ret = -EIO;
> +		goto out_delete;
> +	}
> +	if (trace_dat__write_flyrecord_section() < 0) {

[Severity: Critical]
Could an unusually large tracepoint event cause an out-of-bounds memory
access when writing the flyrecord section?

When trace_dat__write_cpu_dat() processes records, it queues them to a new
page without checking if the rec_size fits within trace_dat_page_size.

If an event size exceeds the page size, trace_dat__write_page() allocates a
fixed-size buffer and copies the data into it:

tools/perf/util/trace-dat.c:trace_dat__write_page() {
	page = calloc(1, trace_dat_page_size);
	...
	for (i = 0; i < nr_recs; i++) {
		memcpy(page + offset, records[i], rec_sizes[i]);
	...
}

This could result in a heap buffer overflow.

> +		pr_err("Failed to write flyrecord section\n");
> +		ret = -EIO;
> +		goto out_delete;
> +	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=3

* Re: [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion
  2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
                   ` (3 preceding siblings ...)
  2026-06-08 12:59 ` [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format Tanushree Shah
@ 2026-06-08 15:18 ` Ian Rogers
  4 siblings, 0 replies; 10+ messages in thread
From: Ian Rogers @ 2026-06-08 15:18 UTC (permalink / raw)
  To: Tanushree Shah
  Cc: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
	namhyung, linux-kernel, linux-perf-users, linuxppc-dev, atrajeev,
	hbathini, Tejas.Manhas1, Tanushree.Shah, Shivani.Nittor

On Mon, Jun 8, 2026 at 6:00 AM Tanushree Shah <tshah@linux.ibm.com> wrote:
>
> This RFC patch series introduces support for converting perf.data files
> containing tracepoint events into trace.dat format, enabling seamless
> visualization and analysis using KerneShark.

Thanks for doing this, this is a useful feature!

nit: typo KernelShark

>
> ======================
> Background and Motivation
> ======================
>
> Currently, perf and trace-cmd operate as separate tracing ecosystems with
> incompatible data formats. Users who collect tracepoint data with
> 'perf record' cannot easily visualize it in KernelShark's graphical
> timeline view or leverage trace-cmd's analysis capabilities.
>
> This creates workflow friction when users need to:
>
> - Visualize perf tracepoint data in KernelShark's interactive graphical
>   timeline
> - Share trace data between perf and trace-cmd workflows and toolchains
> - Perform architecture-independent conversion and analysis of traces
>
> This conversion bridge eliminates these barriers by enabling seamless
> data exchange between perf and trace-cmd ecosystems, allowing users to
> choose the best tool for each analysis phase.
>
> ======================
> Implementation Overview
> ======================
>
> The series implements the trace.dat file format specification (version 7)
> within perf's data conversion framework.
>
> **Patch 1/4: Core trace.dat Export Infrastructure**
> Introduces util/trace-dat.c and util/trace-dat.h implementing:
> - Per-CPU raw event buffer management (init, collect, free)
> - Ftrace ring buffer page construction
> - trace.dat section writers (strings, options, flyrecord sections)
>
> **Patch 2/4: Metadata Integration**
> Extends util/trace-event-read.c to write trace.dat metadata during
> perf.data parsing:
> - Initial format header (magic, version, endian, page size, compression)
> - Section 16: HEADER INFO (header_page + header_event)
> - Section 17: FTRACE EVENT FORMATS
> - Section 18: EVENT FORMATS (per system/event format files)
> - Section 19: KALLSYMS
> - Section 21: CMDLINES
> - Section 15: STRINGS (written last after all sections)
>
> **Patch 3/4: Conversion Backend**
> Implements util/data-convert-trace.c with trace_convert__perf2dat()
> function:
> - Processes PERF_TYPE_TRACEPOINT samples via process_sample_event()
> - Collects raw event data per-CPU using trace_dat__collect_cpu_event()
> - Writes OPTIONS sections (CPUCOUNT, TRACECLOCK, metadata offsets)
> - Writes FLYRECORD section with per-CPU ring buffer pages
>
> **Patch 4/4: User Interface**
> Extends tools/perf/builtin-data.c with --to-trace-dat option:
> - Adds command-line option for trace.dat output
> - Mutually exclusive with --to-ctf and --to-json
> - Calls trace_convert__perf2dat() to perform conversion
>
> ======================
> Current Implementation Details
> ======================
>
> **trace.dat Format Version:**
> The implementation currently targets trace.dat format version 7, which
> is the stable version supported by current trace-cmd releases (v3.x).
> This version is hardcoded to ensure compatibility with existing
> trace-cmd and KernelShark installations. Future enhancements could add
> version negotiation or support for newer format versions as they become
> standardized.
>
> **Compression Strategy:**
> Compression is explicitly disabled (set to NONE) in the generated
> trace.dat files. This design choice:
> - Simplifies the initial implementation and testing
> - Ensures maximum compatibility across trace-cmd versions
> - Avoids external compression library dependencies
>
> Future work could add support for various compression algorithms (zlib,
> zstd, lz4) with runtime selection via command-line options, significantly
> reducing file sizes for large traces.
>
> ======================
> Usage Example
> ======================
>
> ```bash
> *Record tracepoint events with perf*
> perf record -e sched:sched_switch -e sched:sched_wakeup -a sleep 10
>
> *Convert to trace.dat format*
> perf data convert --to-trace-dat=output.dat
>
> *Verify trace.dat structure*
> trace-cmd dump --summary output.dat
>
> *Analyze with trace-cmd*
> trace-cmd report output.dat
>
> *Visualize in KernelShark*
> kernelshark output.dat
> ```
>
> **Conversion Output:**
> ```
> [ perf data convert: Converted 'perf.data' into trace.dat format
> 'output.dat' ]
> [ perf data convert: Converted 2684 events ]
> ```
> **trace-cmd dump --summary Output:**
> ```
>  Tracing meta data in file output.dat:
>         [Initial format]
>                 7       [Version]
>                 0       [Little endian]
>                 8       [Bytes in a long]
>                 65536   [Page size, bytes]
>                 none    [Compression algorithm]
>                         [Compression version]
>         [buffer "", "local" clock, 65536 page size, 16 cpus, 1048576 bytes
>     flyrecord data]
>         [10 options]
>         [Saved command lines, 0 bytes]
>         [Kallsyms, 0 bytes]
>         [Ftrace format, 0 events]
>         [Header page, 206 bytes]
>         [Header event, 205 bytes]
>         [Events format, 1 systems]
>         [9 sections]
> ```
> ======================
> Testing and Verification
> ======================
>
> The series has been extensively tested with:
> - Various tracepoint events (sched, irq, syscalls, block I/O)
> - Mixed recordings containing both tracepoint and non-tracepoint events
>   (only tracepoints converted)
> - Verification with trace-cmd report and KernelShark visualization
> - Memory leak testing with Valgrind (0 bytes leaked)
> - Cross-architecture testing (x86_64, ppc64le)

It seems that some of this could be a test to give coverage of the
feature. We have similar tests for other convertors:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/test_perf_data_converter_ctf.sh?h=perf-tools-next
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/test_perf_data_converter_json.sh?h=perf-tools-next
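
As a rough illustration only, one check such a test could make is that
the converter reports a non-zero event count. The sketch below assumes
the status line format shown in the "Conversion Output" of the cover
letter; the helper names converted_events and check_conversion are
hypothetical and are not taken from the series or from the existing
converter tests.

```shell
#!/bin/sh
# Hypothetical helpers for a trace.dat converter test (a sketch, not the
# series' code). They only parse text in the status line format quoted in
# the cover letter: "[ perf data convert: Converted 2684 events ]".

# Print the event count found on stdin, or nothing if no such line exists.
converted_events() {
	sed -n 's/.*Converted \([0-9][0-9]*\) events.*/\1/p'
}

# Succeed only if the captured converter output ($1) reports > 0 events.
check_conversion() {
	count=$(printf '%s\n' "$1" | converted_events)
	[ -n "$count" ] && [ "$count" -gt 0 ]
}
```

A real test would capture the output of
'perf data convert --to-trace-dat=output.dat' (stdout and stderr) into a
variable, pass it to check_conversion, and clean up its temporary files
afterwards.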

I think Sashiko has caught some coding issues, so I'll hold off on a
full review until the churn from Sashiko subsides.

Thanks!
Ian

> All generated trace.dat files successfully open in:
> - trace-cmd report (v3.1+)
> - KernelShark (v2.0+)
>
> ======================
> Next Steps
> ======================
>
> We would highly appreciate reviews, comments, and feedback on:
> - The overall architectural approach and integration points
> - Compatibility considerations with trace-cmd ecosystem
> - Performance characteristics for large-scale traces
> - Additional use cases or workflow scenarios
> - Future enhancement priorities
>
> Tanushree Shah (4):
>   perf/trace-dat: Add trace.dat export infrastructure
>   perf/trace-event: Write trace.dat metadata sections during parsing
>   perf data-convert: Add perf.data to trace.dat conversion backend
>   perf data: Add --to-trace-dat option for converting perf.data
>     tracepoint events into trace.dat format
>
>  tools/perf/builtin-data.c            |  38 +-
>  tools/perf/util/Build                |   2 +
>  tools/perf/util/data-convert-trace.c | 152 ++++++
>  tools/perf/util/data-convert.h       |   4 +
>  tools/perf/util/trace-dat.c          | 705 +++++++++++++++++++++++++++
>  tools/perf/util/trace-dat.h          |  79 +++
>  tools/perf/util/trace-event-read.c   | 259 +++++++++-
>  7 files changed, 1230 insertions(+), 9 deletions(-)
>  create mode 100644 tools/perf/util/data-convert-trace.c
>  create mode 100644 tools/perf/util/trace-dat.c
>  create mode 100644 tools/perf/util/trace-dat.h
>
> --
> 2.53.0
>


