* [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure
From: Tanushree Shah @ 2026-06-08 12:59 UTC
To: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
irogers, namhyung, linux-kernel
Cc: linux-perf-users, linuxppc-dev, atrajeev, hbathini, Tejas.Manhas1,
Tanushree.Shah, Shivani.Nittor, Tanushree Shah
Add new utility files util/trace-dat.c and util/trace-dat.h
implementing the infrastructure for exporting perf.data tracepoints
to trace.dat format compatible with trace-cmd and KernelShark.
trace-dat.c defines all globals and functions needed for:
- Per-cpu raw event buffer management (init_cpu_buffers,
collect_cpu_event, free_cpu_buffers)
- ftrace ring buffer page construction (write_page, write_cpu_dat)
- trace.dat section writers (write_strings_section,
write_options_section1, write_options_section2,
write_flyrecord_section)
trace-dat.h declares all globals and function prototypes to be
used by data-convert-trace.c and trace-event-read.c.
Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>
---
tools/perf/util/Build | 1 +
tools/perf/util/trace-dat.c | 705 ++++++++++++++++++++++++++++++++++++
tools/perf/util/trace-dat.h | 79 ++++
3 files changed, 785 insertions(+)
create mode 100644 tools/perf/util/trace-dat.c
create mode 100644 tools/perf/util/trace-dat.h
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 70cc91d00804..c000d8032d25 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -98,6 +98,7 @@ perf-util-y += trace-event-scripting.o
perf-util-$(CONFIG_LIBTRACEEVENT) += trace-event.o
perf-util-$(CONFIG_LIBTRACEEVENT) += trace-event-parse.o
perf-util-$(CONFIG_LIBTRACEEVENT) += trace-event-read.o
+perf-util-$(CONFIG_LIBTRACEEVENT) += trace-dat.o
perf-util-y += sort.o
perf-util-y += hist.o
perf-util-y += util.o
diff --git a/tools/perf/util/trace-dat.c b/tools/perf/util/trace-dat.c
new file mode 100644
index 000000000000..aa34a7b89b7a
--- /dev/null
+++ b/tools/perf/util/trace-dat.c
@@ -0,0 +1,705 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2026, IBM Corporation
+ * Author: Tanushree Shah <tshah@linux.ibm.com>
+ *
+ * trace-dat.c
+ *
+ * This file implements the trace.dat format writer for perf tool.
+ * It collects trace events from multiple CPUs and writes them in
+ * the trace-cmd compatible format.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include "api/fs/tracing_path.h"
+#include "trace-dat.h"
+#include "trace-event.h"
+#include "session.h"
+#include "header.h"
+#include "../perf.h"
+#include "debug.h"
+
+/* ftrace ring buffer constants for trace.dat flyrecord section
+ *
+ * Each page has a 16-byte header (timestamp + commit size), followed by
+ * variable-length records. Each record has a 4-byte header word encoding:
+ * Bits 0-4: Type/Length field (5 bits, masked by TYPE_LEN_MASK)
+ * Bits 5-31: Time delta from page base timestamp (27 bits, masked by TIME_MASK)
+ */
+#define TRACE_DAT_RECORD_HEADER_SIZE 16 /* Page header: 8-byte ts + 8-byte commit */
+#define TRACE_DAT_RECORD_TYPE_LEN_MASK 0x1F /* Extract lower 5 bits for type/length */
+#define TRACE_DAT_RECORD_TIME_SHIFT 5 /* Shift to extract time delta */
+#define TRACE_DAT_RECORD_TIME_MASK 0x07FFFFFF /* Mask for 27-bit time delta */
+#define TRACE_DAT_WORD_SIZE 4 /* Records aligned to 4-byte boundaries */
+#define TRACE_DAT_WORD_ALIGN_MASK 3
+
+/* Initial capacity for per-CPU event buffer (grows by doubling) */
+#define INITIAL_EVENT_CAPACITY 1024
+/* Initial capacity for page record array (grows by doubling) */
+#define INITIAL_PAGE_RECORD_CAPACITY 64
+/* Buffer size for reading trace_clock string from debugfs/tracefs */
+#define CLOCK_BUFFER_SIZE 256
+
+FILE *trace_dat_fp;
+int trace_dat_page_size;
+int trace_dat_nr_cpus;
+long trace_dat_options_offset;
+long trace_dat_header_info_offset;
+long trace_dat_events_format_offset;
+long trace_dat_ftrace_format_offset;
+long trace_dat_kallsyms_offset;
+long trace_dat_cmdline_offset;
+long trace_dat_next_options_offset;
+
+
+/**
+ * struct cpu_event - Single trace event from a CPU
+ * @ts: Timestamp of the event
+ * @raw: Raw event data
+ * @raw_size: Size of raw event data in bytes
+ */
+struct cpu_event {
+ unsigned long long ts;
+ void *raw;
+ unsigned int raw_size;
+};
+
+/**
+ * struct cpu_events - Collection of trace events for a single CPU
+ * @events: Array of events
+ * @count: Number of events currently stored
+ * @capacity: Maximum number of events that can be stored
+ */
+struct cpu_events {
+ struct cpu_event *events;
+ int count;
+ int capacity;
+};
+
+static struct cpu_events *trace_cpu_data;
+static long *buffer_opt_cpu_offsets_pos;
+static long opt_payload_start;
+
+/* Allocate per-cpu event buffers for tracepoint data collection */
+int trace_dat__init_cpu_buffers(int nr_cpus)
+{
+ trace_cpu_data = calloc(nr_cpus, sizeof(struct cpu_events));
+ if (!trace_cpu_data)
+ return -ENOMEM;
+ buffer_opt_cpu_offsets_pos = calloc(nr_cpus, sizeof(long));
+ if (!buffer_opt_cpu_offsets_pos) {
+ free(trace_cpu_data);
+ trace_cpu_data = NULL;
+ return -ENOMEM;
+ }
+ trace_dat_nr_cpus = nr_cpus;
+ return 0;
+}
+
+/* Store raw tracepoint event data in per-cpu buffer for trace.dat
+ * flyrecord
+ */
+int trace_dat__collect_cpu_event(int cpu, unsigned long long ts,
+ void *raw, unsigned int raw_size)
+{
+ struct cpu_events *cpu_events;
+
+ if (!trace_cpu_data || cpu < 0 || cpu >= trace_dat_nr_cpus)
+ return -EINVAL;
+
+ if (!raw || raw_size == 0)
+ return -EINVAL;
+
+ cpu_events = &trace_cpu_data[cpu];
+
+ if (cpu_events->count >= cpu_events->capacity) {
+ cpu_events->capacity = cpu_events->capacity ?
+ cpu_events->capacity * 2 : INITIAL_EVENT_CAPACITY;
+ cpu_events->events = realloc(cpu_events->events,
+ cpu_events->capacity * sizeof(*cpu_events->events));
+ if (!cpu_events->events)
+ return -ENOMEM;
+ }
+
+ cpu_events->events[cpu_events->count].ts = ts;
+ cpu_events->events[cpu_events->count].raw = malloc(raw_size);
+ if (!cpu_events->events[cpu_events->count].raw)
+ return -ENOMEM;
+
+ memcpy(cpu_events->events[cpu_events->count].raw, raw, raw_size);
+ cpu_events->events[cpu_events->count].raw_size = raw_size;
+ cpu_events->count++;
+
+ return 0;
+}
+
+/* Write a single page of trace records */
+static int trace_dat__write_page(FILE *fp, unsigned long long base_ts,
+ char **records, int *rec_sizes, int nr_recs)
+{
+ unsigned long long commit = 0;
+ int offset = TRACE_DAT_RECORD_HEADER_SIZE;
+ int i;
+ char *page;
+
+ page = calloc(1, trace_dat_page_size);
+ if (!page)
+ return -ENOMEM;
+
+ for (i = 0; i < nr_recs; i++) {
+ memcpy(page + offset, records[i], rec_sizes[i]);
+ offset += rec_sizes[i];
+ commit += rec_sizes[i];
+ }
+
+ memcpy(page, &base_ts, sizeof(base_ts));
+ memcpy(page + sizeof(base_ts), &commit, sizeof(commit));
+
+ if (!fwrite(page, 1, trace_dat_page_size, fp)) {
+ free(page);
+ return -EIO;
+ }
+ free(page);
+
+ return 0;
+}
+
+/* Write all trace data for a single CPU as trace.dat flyrecord pages */
+static int trace_dat__write_cpu_dat(FILE *fp, int cpu, unsigned long long *file_offset_out)
+{
+ struct cpu_events *cpu_events = &trace_cpu_data[cpu];
+ unsigned long long base_ts;
+ unsigned long long file_offset;
+ char **page_records = NULL;
+ int *page_rec_sizes = NULL;
+ int page_cap = 0;
+ int nr_page_recs = 0;
+ int page_size_used = 0;
+ int ret = 0;
+ int i, j;
+
+ file_offset = ftell(fp);
+ *file_offset_out = file_offset;
+
+ if (cpu_events->count == 0) {
+ char *empty_page = calloc(1, trace_dat_page_size);
+
+ if (!empty_page)
+ return -ENOMEM;
+ if (!fwrite(empty_page, 1, trace_dat_page_size, fp)) {
+ free(empty_page);
+ return -EIO;
+ }
+ free(empty_page);
+ return 0;
+ }
+
+ base_ts = cpu_events->events[0].ts;
+
+ for (i = 0; i < cpu_events->count; i++) {
+ struct cpu_event *event = &cpu_events->events[i];
+ unsigned long long time_delta = event->ts - base_ts;
+ unsigned int data_len = event->raw_size;
+ unsigned int words = (data_len + TRACE_DAT_WORD_ALIGN_MASK) / TRACE_DAT_WORD_SIZE;
+ unsigned int type_len = words & TRACE_DAT_RECORD_TYPE_LEN_MASK;
+ unsigned int hdr_word = ((time_delta & TRACE_DAT_RECORD_TIME_MASK) <<
+ TRACE_DAT_RECORD_TIME_SHIFT) | type_len;
+ int rec_size;
+ char *rec;
+
+ rec_size = TRACE_DAT_WORD_SIZE + data_len;
+ if (rec_size % TRACE_DAT_WORD_SIZE)
+ rec_size += TRACE_DAT_WORD_SIZE - (rec_size % TRACE_DAT_WORD_SIZE);
+
+ rec = calloc(1, rec_size);
+ if (!rec)
+ return -ENOMEM;
+ memcpy(rec, &hdr_word, TRACE_DAT_WORD_SIZE);
+ memcpy(rec + TRACE_DAT_WORD_SIZE, event->raw, data_len);
+
+ if (page_size_used + rec_size > trace_dat_page_size -
+ TRACE_DAT_RECORD_HEADER_SIZE) {
+ ret = trace_dat__write_page(fp, base_ts,
+ page_records, page_rec_sizes, nr_page_recs);
+ for (j = 0; j < nr_page_recs; j++)
+ free(page_records[j]);
+ nr_page_recs = 0;
+ page_size_used = 0;
+ base_ts = event->ts;
+ if (ret < 0)
+ goto out_free;
+ }
+
+ if (nr_page_recs >= page_cap) {
+ char **tmp_records;
+ int *tmp_sizes;
+
+ page_cap = page_cap ? page_cap * 2 : INITIAL_PAGE_RECORD_CAPACITY;
+ tmp_records = realloc(page_records, page_cap * sizeof(char *));
+ tmp_sizes = realloc(page_rec_sizes, page_cap * sizeof(int));
+ if (!tmp_records || !tmp_sizes) {
+ ret = -ENOMEM;
+ goto out_free;
+ }
+ page_records = tmp_records;
+ page_rec_sizes = tmp_sizes;
+ }
+ page_records[nr_page_recs] = rec;
+ page_rec_sizes[nr_page_recs] = rec_size;
+ nr_page_recs++;
+ page_size_used += rec_size;
+ }
+
+ if (nr_page_recs > 0) {
+ ret = trace_dat__write_page(fp, base_ts,
+ page_records, page_rec_sizes, nr_page_recs);
+ }
+out_free:
+ for (j = 0; j < nr_page_recs; j++)
+ free(page_records[j]);
+ free(page_records);
+ free(page_rec_sizes);
+ return ret;
+}
+
+/* Write the strings section containing section name lookup table */
+int trace_dat__write_strings_section(void)
+{
+ unsigned short section_id = TRACE_DAT_SECTION_STRINGS;
+ unsigned short flags = 0;
+ unsigned long long section_size = 0;
+ static const char * const section_names[] = {
+ "headers", /* offset 0 - strid for section 16 */
+ "ftrace event formats", /* offset 8 - strid for section 17 */
+ "events format", /* offset 29 - strid for section 18 */
+ "kallsyms", /* offset 43 - strid for section 19 */
+ "cmdlines", /* offset 52 - strid for section 21 */
+ "strings", /* offset 61 - strid for section 15 */
+ "options", /* offset 69 - strid for options 1 */
+ "options", /* offset 77 - strid for options 2 */
+ "buffer-flyrecord", /* offset 85 - strid for flyrecord 3 */
+ NULL
+ };
+
+ /* string_id points to "strings" string itself */
+ unsigned int string_id = STRID_STRINGS;
+ int i;
+
+ if (!trace_dat_fp)
+ return -EBADF;
+
+ for (i = 0; section_names[i] != NULL; i++)
+ section_size += strlen(section_names[i]) + 1;
+
+ /* write section header */
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+
+ /* write strings */
+ for (i = 0; section_names[i] != NULL; i++)
+ if (!fwrite(section_names[i], 1, strlen(section_names[i]) + 1, trace_dat_fp))
+ return -EIO;
+ return 0;
+}
+
+/* Writes options section containing CPUCOUNT, TRACECLOCK, EVENT_FORMAT, HEADER_INFO,
+ * FTRACE_EVENTS, KALLSYMS, CMDLINES options, ending with DONE option pointing to next section.
+ */
+int trace_dat__write_options_section1(void)
+{
+ unsigned short section_id = TRACE_DAT_SECTION_OPTIONS;
+ unsigned short flags = 0;
+ unsigned int string_id = STRID_OPTIONS_1;
+ unsigned long long section_size = 0;
+ long section_size_pos;
+ long payload_start;
+ unsigned long long section_start;
+ unsigned short opt_id;
+ unsigned int opt_size;
+ char clock_buf[CLOCK_BUFFER_SIZE];
+ FILE *clock_file;
+ size_t bytes_read;
+ char *path;
+ unsigned long long next_offset;
+ long end_pos;
+
+ if (!trace_dat_fp)
+ return -EBADF;
+
+ /* fill options_offset in initial format */
+ section_start = ftell(trace_dat_fp);
+
+ if (fseek(trace_dat_fp, trace_dat_options_offset, SEEK_SET) < 0 ||
+ !fwrite(&section_start, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ fseek(trace_dat_fp, 0, SEEK_END) < 0)
+ return -EIO;
+
+ /* write section header */
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ section_size_pos = ftell(trace_dat_fp);
+ if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+
+ payload_start = ftell(trace_dat_fp);
+
+ /* CPUCOUNT option */
+ opt_id = TRACE_DAT_OPTION_CPUCOUNT;
+ opt_size = sizeof(unsigned int);
+
+ if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&trace_dat_nr_cpus, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+
+ /* TRACECLOCK option */
+ opt_id = TRACE_DAT_OPTION_TRACECLOCK;
+
+ path = get_tracing_file("trace_clock");
+ clock_file = fopen(path, "r");
+ put_tracing_file(path);
+ if (clock_file) {
+ bytes_read = fread(clock_buf, 1, sizeof(clock_buf) - 1, clock_file);
+ fclose(clock_file);
+ clock_buf[bytes_read] = '\0';
+ } else {
+ strcpy(clock_buf, "local\n");
+ bytes_read = strlen(clock_buf);
+ }
+ opt_size = bytes_read + 1;
+ if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(clock_buf, 1, opt_size, trace_dat_fp))
+ return -EIO;
+
+ /* EVENT option */
+ opt_id = TRACE_DAT_OPTION_EVENT;
+ opt_size = sizeof(unsigned long long);
+
+ if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&trace_dat_events_format_offset, sizeof(unsigned long long),
+ 1, trace_dat_fp))
+ return -EIO;
+
+ /* HEADER option */
+ opt_id = TRACE_DAT_OPTION_HEADER;
+ opt_size = sizeof(unsigned long long);
+
+ if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&trace_dat_header_info_offset, sizeof(unsigned long long),
+ 1, trace_dat_fp))
+ return -EIO;
+
+ /* FTRACE option */
+ opt_id = TRACE_DAT_OPTION_FTRACE;
+ opt_size = sizeof(unsigned long long);
+
+ if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&trace_dat_ftrace_format_offset, sizeof(unsigned long long),
+ 1, trace_dat_fp))
+ return -EIO;
+
+ /* KALLSYMS option */
+ opt_id = TRACE_DAT_OPTION_KALLSYMS;
+ opt_size = sizeof(unsigned long long);
+
+ if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&trace_dat_kallsyms_offset, sizeof(unsigned long long),
+ 1, trace_dat_fp))
+ return -EIO;
+
+ /* CMDLINE option */
+ opt_id = TRACE_DAT_OPTION_CMDLINE;
+ opt_size = sizeof(unsigned long long);
+
+ if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&trace_dat_cmdline_offset, sizeof(unsigned long long),
+ 1, trace_dat_fp))
+ return -EIO;
+
+ /* DONE option id - next_options_offset filled later */
+ opt_id = TRACE_DAT_OPTION_DONE;
+ opt_size = sizeof(unsigned long long);
+ next_offset = 0; /* placeholder */
+
+ trace_dat_next_options_offset = ftell(trace_dat_fp);
+ if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&next_offset, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+
+ /* fill section size */
+ end_pos = ftell(trace_dat_fp);
+
+ section_size = end_pos - payload_start;
+ if (fseek(trace_dat_fp, section_size_pos, SEEK_SET) < 0 ||
+ !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ fseek(trace_dat_fp, end_pos, SEEK_SET) < 0)
+ return -EIO;
+
+ return 0;
+
+}
+
+/* Writes options section containing BUFFER option with flyrecord section
+ * (flyrecord section offset, clock type, page size, CPU count,
+ * per-CPU offsets/sizes) and DONE option.
+ */
+int trace_dat__write_options_section2(void)
+{
+ unsigned short section_id = TRACE_DAT_SECTION_OPTIONS;
+ unsigned short flags = 0;
+ unsigned int string_id = STRID_OPTIONS_2;
+ unsigned long long section_size = 0;
+ long section_size_pos;
+ long payload_start;
+ int cpu;
+ unsigned short opt_id = TRACE_DAT_OPTION_BUFFER;
+ unsigned int opt_size = 0;
+ long opt_size_pos;
+ unsigned long long data_offset = 0;
+ unsigned int page_size = (unsigned int)trace_dat_page_size;
+ const char *clock = "local";
+ unsigned long long next;
+ long end_pos;
+ unsigned long long cpu_offset;
+ unsigned long long cpu_size;
+ unsigned short done_id;
+ unsigned int done_size;
+
+ if (!trace_dat_fp)
+ return -EINVAL;
+
+ /* fill done1 next offset - points to this section */
+ next = ftell(trace_dat_fp);
+
+ if (fseek(trace_dat_fp, trace_dat_next_options_offset + 2 + 4, SEEK_SET) < 0 ||
+ !fwrite(&next, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ fseek(trace_dat_fp, 0, SEEK_END) < 0)
+ return -EIO;
+
+ /* write section header */
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ section_size_pos = ftell(trace_dat_fp);
+ if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+
+ payload_start = ftell(trace_dat_fp);
+
+ /* BUFFER option */
+ if (!fwrite(&opt_id, sizeof(unsigned short), 1, trace_dat_fp))
+ return -EIO;
+ opt_size_pos = ftell(trace_dat_fp);
+ if (!fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ opt_payload_start = ftell(trace_dat_fp);
+
+ /* data_offset placeholder */
+ if (!fwrite(&data_offset, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite("\0", 1, 1, trace_dat_fp) ||
+ !fwrite(clock, 1, strlen(clock) + 1, trace_dat_fp) ||
+ !fwrite(&page_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&trace_dat_nr_cpus, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+
+ /* per cpu: cpu_id + offset placeholder + size */
+ for (cpu = 0; cpu < trace_dat_nr_cpus; cpu++) {
+ cpu_offset = 0; /* filled in write_flyrecord */
+ cpu_size = 0; /* filled in write_flyrecord */
+
+ if (!fwrite(&cpu, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ buffer_opt_cpu_offsets_pos[cpu] = ftell(trace_dat_fp);
+ if (!fwrite(&cpu_offset, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(&cpu_size, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+ }
+
+ /* fill opt_size */
+ end_pos = ftell(trace_dat_fp);
+
+ opt_size = end_pos - opt_payload_start;
+ fseek(trace_dat_fp, opt_size_pos, SEEK_SET);
+ if (!fwrite(&opt_size, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ fseek(trace_dat_fp, end_pos, SEEK_SET);
+
+ /* DONE id=0 */
+ done_id = TRACE_DAT_OPTION_DONE;
+ done_size = sizeof(unsigned long long);
+ /* No additional options sections follow this one */
+ next = 0;
+
+ if (!fwrite(&done_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&done_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&next, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+
+ /* fill section size */
+ end_pos = ftell(trace_dat_fp);
+
+ section_size = end_pos - payload_start;
+ fseek(trace_dat_fp, section_size_pos, SEEK_SET);
+ if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+ fseek(trace_dat_fp, end_pos, SEEK_SET);
+
+ return 0;
+
+}
+
+int trace_dat__write_flyrecord_section(void)
+{
+ unsigned short section_id = TRACE_DAT_SECTION_FLYRECORD;
+ unsigned short flags = 0;
+ unsigned int string_id = STRID_BUFFER_FLYRECORD;
+ unsigned long long section_size = 0;
+ long section_size_pos;
+ long flyrecord_start;
+ long after_header;
+ long padding_needed;
+ unsigned long long *cpu_offsets;
+ unsigned long long *cpu_sizes;
+ int cpu;
+ int ret = 0;
+ char *pad;
+ unsigned long long start;
+ long end_pos;
+
+ if (!trace_dat_fp)
+ return -EINVAL;
+
+ cpu_offsets = calloc(trace_dat_nr_cpus, sizeof(unsigned long long));
+ cpu_sizes = calloc(trace_dat_nr_cpus, sizeof(unsigned long long));
+ if (!cpu_offsets || !cpu_sizes) {
+ ret = -ENOMEM;
+ goto cleanup;
+ }
+ flyrecord_start = ftell(trace_dat_fp);
+ if (flyrecord_start < 0) {
+ ret = -EIO;
+ goto cleanup;
+ }
+
+ /* section header */
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp)) {
+ ret = -EIO;
+ goto cleanup;
+ }
+ section_size_pos = ftell(trace_dat_fp);
+ if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp)) {
+ ret = -EIO;
+ goto cleanup;
+ }
+
+ /* Align to page boundary */
+ after_header = ftell(trace_dat_fp);
+ padding_needed = (trace_dat_page_size -
+ (after_header % trace_dat_page_size)) % trace_dat_page_size;
+
+ if (padding_needed > 0) {
+ pad = calloc(1, padding_needed);
+
+ if (!fwrite(pad, 1, padding_needed, trace_dat_fp)) {
+ free(pad);
+ ret = -EIO;
+ goto cleanup;
+ }
+ free(pad);
+ }
+
+ /* write per-cpu trace data */
+ for (cpu = 0; cpu < trace_dat_nr_cpus; cpu++) {
+ start = ftell(trace_dat_fp);
+
+ ret = trace_dat__write_cpu_dat(trace_dat_fp, cpu, &cpu_offsets[cpu]);
+
+ if (ret < 0) {
+ pr_err("Failed to write CPU %d data\n", cpu);
+ goto cleanup;
+ }
+ cpu_sizes[cpu] = ftell(trace_dat_fp) - start;
+ }
+
+ /* fill section size */
+ end_pos = ftell(trace_dat_fp);
+
+ section_size = end_pos - flyrecord_start;
+ if (fseek(trace_dat_fp, section_size_pos, SEEK_SET) < 0 ||
+ !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp)) {
+ ret = -EIO;
+ goto cleanup;
+ }
+ if (fseek(trace_dat_fp, end_pos, SEEK_SET) < 0) {
+ ret = -EIO;
+ goto cleanup;
+ }
+
+ /* fill cpu offsets and sizes in BUFFER option */
+ for (cpu = 0; cpu < trace_dat_nr_cpus; cpu++) {
+ if (fseek(trace_dat_fp, buffer_opt_cpu_offsets_pos[cpu], SEEK_SET) < 0 ||
+ !fwrite(&cpu_offsets[cpu], sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(&cpu_sizes[cpu], sizeof(unsigned long long), 1, trace_dat_fp)) {
+ ret = -EIO;
+ goto cleanup;
+ }
+ }
+
+ /* fill data offset in buffer option */
+ if (fseek(trace_dat_fp, opt_payload_start, SEEK_SET) < 0 ||
+ !fwrite(&flyrecord_start, sizeof(unsigned long long), 1, trace_dat_fp)) {
+ ret = -EIO;
+ goto cleanup;
+ }
+
+ if (fseek(trace_dat_fp, 0, SEEK_END) < 0) {
+ ret = -EIO;
+ goto cleanup;
+ }
+
+
+cleanup:
+ free(cpu_offsets);
+ free(cpu_sizes);
+ return ret;
+}
+
+/* Free all per-CPU event buffers */
+void trace_dat__free_cpu_buffers(void)
+{
+ int cpu;
+
+ if (!trace_cpu_data)
+ return;
+
+ for (cpu = 0; cpu < trace_dat_nr_cpus; cpu++) {
+ int i;
+
+ for (i = 0; i < trace_cpu_data[cpu].count; i++)
+ free(trace_cpu_data[cpu].events[i].raw);
+ free(trace_cpu_data[cpu].events);
+ }
+ free(trace_cpu_data);
+ trace_cpu_data = NULL;
+ free(buffer_opt_cpu_offsets_pos);
+ buffer_opt_cpu_offsets_pos = NULL;
+ trace_dat_nr_cpus = 0;
+}
diff --git a/tools/perf/util/trace-dat.h b/tools/perf/util/trace-dat.h
new file mode 100644
index 000000000000..7667a440330c
--- /dev/null
+++ b/tools/perf/util/trace-dat.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright 2026, IBM Corporation
+ * Author: Tanushree Shah <tshah@linux.ibm.com>
+ */
+
+#ifndef __PERF_TRACE_DAT_H
+#define __PERF_TRACE_DAT_H
+
+#include <stdio.h>
+
+/* trace.dat file format version */
+#define TRACE_DAT_VERSION '7'
+
+/*
+ * Section IDs for trace.dat format
+ */
+#define TRACE_DAT_SECTION_OPTIONS 0
+#define TRACE_DAT_SECTION_FLYRECORD 3
+#define TRACE_DAT_SECTION_STRINGS 15
+#define TRACE_DAT_SECTION_HEADER 16
+#define TRACE_DAT_SECTION_FTRACE 17
+#define TRACE_DAT_SECTION_EVENTS 18
+#define TRACE_DAT_SECTION_KALLSYMS 19
+#define TRACE_DAT_SECTION_CMDLINE 21
+
+/*
+ * Option IDs for trace.dat options sections
+ */
+#define TRACE_DAT_OPTION_DONE 0
+#define TRACE_DAT_OPTION_BUFFER 3
+#define TRACE_DAT_OPTION_TRACECLOCK 4
+#define TRACE_DAT_OPTION_CPUCOUNT 8
+#define TRACE_DAT_OPTION_HEADER 16
+#define TRACE_DAT_OPTION_FTRACE 17
+#define TRACE_DAT_OPTION_EVENT 18
+#define TRACE_DAT_OPTION_KALLSYMS 19
+#define TRACE_DAT_OPTION_CMDLINE 21
+
+/*
+ * String offsets in the strings section
+ * These point to null-terminated strings used as section names
+ */
+#define STRID_HEADERS 0
+#define STRID_FTRACE_FORMATS 8
+#define STRID_EVENT_FORMATS 29
+#define STRID_KALLSYMS 43
+#define STRID_CMDLINES 52
+#define STRID_STRINGS 61
+#define STRID_OPTIONS_1 69
+#define STRID_OPTIONS_2 77
+#define STRID_BUFFER_FLYRECORD 85
+
+struct perf_session;
+
+extern FILE *trace_dat_fp;
+extern int trace_dat_page_size;
+extern int trace_dat_nr_cpus;
+extern long trace_dat_options_offset;
+extern long trace_dat_header_info_offset;
+extern long trace_dat_events_format_offset;
+extern long trace_dat_ftrace_format_offset;
+extern long trace_dat_kallsyms_offset;
+extern long trace_dat_cmdline_offset;
+extern long trace_dat_next_options_offset;
+
+/* collect and manage per-cpu tracepoint event buffers */
+int trace_dat__init_cpu_buffers(int nr_cpus);
+int trace_dat__collect_cpu_event(int cpu, unsigned long long ts,
+ void *raw, unsigned int raw_size);
+void trace_dat__free_cpu_buffers(void);
+
+/* write trace.dat file sections */
+int trace_dat__write_options_section1(void);
+int trace_dat__write_options_section2(void);
+int trace_dat__write_flyrecord_section(void);
+int trace_dat__write_strings_section(void);
+
+#endif /* __PERF_TRACE_DAT_H */
--
2.53.0
* Re: [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure
From: sashiko-bot @ 2026-06-08 13:13 UTC
To: Tanushree Shah; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 10 potential issue(s) to consider:
- [High] Sequential realloc failures cause a use-after-free, double-free, and memory leak.
- [High] Unchecked realloc overwrite leads to array leak and a later NULL pointer dereference.
- [High] Unchecked calloc failure leads to NULL pointer dereference in fwrite.
- [High] Heap buffer overflow when processing abnormally large trace records.
- [High] Unchecked get_tracing_file failure leads to NULL pointer dereference in fopen.
- [High] Ftrace ring buffer format is incorrectly constructed for events >= 28 words, causing data corruption.
- [High] Timestamp deltas larger than 27 bits are silently truncated.
- [Medium] Direct return on record allocation failure bypasses cleanup, leaking accumulated memory.
- [Medium] The section_size for the flyrecord section is incorrectly calculated, violating the trace.dat format.
- [Medium] The current event record is leaked if flushing the page or reallocating the batch array fails.
--
commit b7f763e536c02304fb52bceb9c78dd92220fa81a
Author: Tanushree Shah <tshah@linux.ibm.com>
perf/trace-dat: Add trace.dat export infrastructure
Add new utility files util/trace-dat.c and util/trace-dat.h
implementing the infrastructure for exporting perf.data tracepoints
to trace.dat format compatible with trace-cmd and KernelShark.
> diff --git a/tools/perf/util/trace-dat.c b/tools/perf/util/trace-dat.c
> index 0000000000000..aa34a7b89b7a3
> --- /dev/null
> +++ b/tools/perf/util/trace-dat.c
[ ... ]
> +int trace_dat__collect_cpu_event(int cpu, unsigned long long ts,
> + void *raw, unsigned int raw_size)
> +{
[ ... ]
> + if (cpu_events->count >= cpu_events->capacity) {
> + cpu_events->capacity = cpu_events->capacity ?
> + cpu_events->capacity * 2 : INITIAL_EVENT_CAPACITY;
> + cpu_events->events = realloc(cpu_events->events,
> + cpu_events->capacity * sizeof(*cpu_events->events));
> + if (!cpu_events->events)
> + return -ENOMEM;
[Severity: High]
If realloc fails, it returns NULL and overwrites cpu_events->events, leaking
the original memory block. Later, trace_dat__free_cpu_buffers() iterates over
the events and dereferences trace_cpu_data[cpu].events[i].raw. Can this cause
a segmentation fault during cleanup since events is NULL?
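One way to avoid losing the old array is to keep the realloc result in a temporary and commit it only on success. The following is a minimal sketch, not the author's code: cpu_events__reserve is a hypothetical helper name, and the structs simply mirror the ones in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define INITIAL_EVENT_CAPACITY 1024

struct cpu_event {
	unsigned long long ts;
	void *raw;
	unsigned int raw_size;
};

struct cpu_events {
	struct cpu_event *events;
	int count;
	int capacity;
};

/*
 * Hypothetical helper: make room for one more event.
 * On failure the old array, count and capacity are left untouched,
 * so a later cleanup pass can still walk and free every entry.
 */
static int cpu_events__reserve(struct cpu_events *ce)
{
	struct cpu_event *tmp;
	int new_cap;

	assert(ce && ce->count <= ce->capacity);

	if (ce->count < ce->capacity)
		return 0;

	new_cap = ce->capacity ? ce->capacity * 2 : INITIAL_EVENT_CAPACITY;
	tmp = realloc(ce->events, (size_t)new_cap * sizeof(*tmp));
	if (!tmp)
		return -ENOMEM;

	/* Commit only after realloc succeeded. */
	ce->events = tmp;
	ce->capacity = new_cap;
	return 0;
}
```

With this shape, capacity is also only updated on success, so count and capacity stay consistent with the array that trace_dat__free_cpu_buffers() later frees.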
> + }
[ ... ]
> +static int trace_dat__write_page(FILE *fp, unsigned long long base_ts,
> + char **records, int *rec_sizes, int nr_recs)
> +{
[ ... ]
> + page = calloc(1, trace_dat_page_size);
> + if (!page)
> + return -ENOMEM;
> +
> + for (i = 0; i < nr_recs; i++) {
> + memcpy(page + offset, records[i], rec_sizes[i]);
[Severity: High]
When a trace event's size exceeds trace_dat_page_size - 16 bytes, the
oversized event is queued into a new batch. Does this memcpy copy data past
the end of the page allocation, causing a heap buffer overflow?
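A bounds check before each copy would turn the overflow into an error return. The sketch below assumes the 16-byte page header from the patch; fill_page and its signature are hypothetical and only illustrate the check, they are not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define PAGE_HEADER_SIZE 16	/* 8-byte timestamp + 8-byte commit, as in the patch */

/*
 * Hypothetical helper: copy records into an already allocated page.
 * Returns -E2BIG instead of writing past the end of the page.
 */
static int fill_page(char *page, int page_size, char **records,
		     const int *rec_sizes, int nr_recs,
		     unsigned long long *commit_out)
{
	int offset = PAGE_HEADER_SIZE;
	int i;

	assert(page && page_size > PAGE_HEADER_SIZE);

	for (i = 0; i < nr_recs; i++) {
		/* Reject any record that does not fit in the remaining space. */
		if (rec_sizes[i] < 0 || rec_sizes[i] > page_size - offset)
			return -E2BIG;
		memcpy(page + offset, records[i], rec_sizes[i]);
		offset += rec_sizes[i];
	}

	*commit_out = (unsigned long long)(offset - PAGE_HEADER_SIZE);
	return 0;
}
```

The caller would still need a policy for a single record larger than one page payload (skip it with a warning, or fail the conversion); this sketch only makes the condition detectable.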
> + offset += rec_sizes[i];
> + commit += rec_sizes[i];
> + }
[ ... ]
> +static int trace_dat__write_cpu_dat(FILE *fp, int cpu, unsigned long long *file_offset_out)
> +{
[ ... ]
> + for (i = 0; i < cpu_events->count; i++) {
> + struct cpu_event *event = &cpu_events->events[i];
> + unsigned long long time_delta = event->ts - base_ts;
> + unsigned int data_len = event->raw_size;
> + unsigned int words = (data_len + TRACE_DAT_WORD_ALIGN_MASK) / TRACE_DAT_WORD_SIZE;
> + unsigned int type_len = words & TRACE_DAT_RECORD_TYPE_LEN_MASK;
> + unsigned int hdr_word = ((time_delta & TRACE_DAT_RECORD_TIME_MASK) <<
> + TRACE_DAT_RECORD_TIME_SHIFT) | type_len;
[Severity: High]
If the gap between events on a CPU exceeds 0x07FFFFFF, the ftrace
specification requires emitting a TIME_EXTEND or TIME_STAMP record to advance
base_ts. Does masking off the upper bits via time_delta &
TRACE_DAT_RECORD_TIME_MASK silently truncate the timestamp and corrupt the
chronological ordering?
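For illustration, a time-extend record could be encoded roughly as follows. This is a sketch under assumptions: the type value 30 and the split of the delta into a low 27-bit part (header word) and a high part (following word) reflect my understanding of the kernel ring buffer layout and should be checked against include/linux/ring_buffer.h. The bit placement follows the little-endian layout the patch already uses; encode_time_extend is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define TS_SHIFT		27
#define TS_MASK			((1ULL << TS_SHIFT) - 1)
#define TYPE_LEN_BITS		5
#define TYPE_TIME_EXTEND	30	/* assumed value of RINGBUF_TYPE_TIME_EXTEND */

/*
 * Hypothetical helper: if delta does not fit in 27 bits, emit a two-word
 * time-extend record into out[] and return 2; otherwise return 0.
 * The second word is 32 bits wide, so bits of delta above bit 58 are lost.
 */
static int encode_time_extend(uint64_t delta, uint32_t out[2])
{
	assert(out);

	if (delta <= TS_MASK)
		return 0;

	/* Low 27 bits share the header word with the 5-bit type field. */
	out[0] = (uint32_t)((delta & TS_MASK) << TYPE_LEN_BITS) | TYPE_TIME_EXTEND;
	/* Remaining high bits go into the word that follows. */
	out[1] = (uint32_t)(delta >> TS_SHIFT);
	return 2;
}
```

As far as I can tell, the data record that follows a time extend then carries a time delta of 0, since the extend record already accounts for the whole gap.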
[Severity: High]
The ftrace ring buffer specification uses the 5-bit type_len field to encode
lengths up to 27 words. For 28 words or more, type_len must be 0, and the
actual length must be written as a 32-bit word immediately following the
header. Does applying words & 0x1F map large events into special types (like
PADDING for 29) or to type_len = 0 without appending the required 32-bit
length word?
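The two record shapes can be sketched as a small layout calculation. Several details here are assumptions to verify against the kernel headers: the small-record limit is taken as RINGBUF_TYPE_DATA_TYPE_LEN_MAX, which is 28 in the headers I am aware of (the comment above cites 27), and the explicit length word is assumed to count itself plus the padded payload. The names rec_layout and layout_record are hypothetical.

```c
#include <assert.h>

#define WORD_SIZE	4
#define SMALL_MAX_WORDS	28	/* assumed: RINGBUF_TYPE_DATA_TYPE_LEN_MAX */

struct rec_layout {
	unsigned int type_len;	/* value for the 5-bit header field */
	int has_len_word;	/* 1 if a 32-bit length word follows the header */
	unsigned int len_word;	/* value of that length word, if present */
	unsigned int rec_size;	/* total bytes the record occupies in the page */
};

/* Hypothetical helper: decide how a payload of data_len bytes is framed. */
static struct rec_layout layout_record(unsigned int data_len)
{
	struct rec_layout l = { 0, 0, 0, 0 };
	unsigned int padded = (data_len + WORD_SIZE - 1) & ~(WORD_SIZE - 1u);

	/* A zero-length payload would be ambiguous with the large form. */
	assert(data_len > 0);

	if (padded / WORD_SIZE <= SMALL_MAX_WORDS) {
		/* Small record: length in words fits in the type_len field. */
		l.type_len = padded / WORD_SIZE;
		l.rec_size = WORD_SIZE + padded;
	} else {
		/* Large record: type_len is 0 and a length word follows. */
		l.type_len = 0;
		l.has_len_word = 1;
		l.len_word = padded + WORD_SIZE;	/* assumed to include itself */
		l.rec_size = 2 * WORD_SIZE + padded;
	}
	return l;
}
```

Under these assumptions an 8-byte payload takes 12 bytes with type_len 2, while a 113-byte payload takes 124 bytes with type_len 0 and an explicit length word.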
> + int rec_size;
> + char *rec;
> +
> + rec_size = TRACE_DAT_WORD_SIZE + data_len;
> + if (rec_size % TRACE_DAT_WORD_SIZE)
> + rec_size += TRACE_DAT_WORD_SIZE - (rec_size % TRACE_DAT_WORD_SIZE);
> +
> + rec = calloc(1, rec_size);
> + if (!rec)
> + return -ENOMEM;
[Severity: Medium]
If calloc fails, this directly returns -ENOMEM. Does this bypass the out_free
cleanup label entirely, leaking the page_records and page_rec_sizes tracking
arrays and any previously allocated record strings?
> + memcpy(rec, &hdr_word, TRACE_DAT_WORD_SIZE);
> + memcpy(rec + TRACE_DAT_WORD_SIZE, event->raw, data_len);
> +
> + if (page_size_used + rec_size > trace_dat_page_size -
> + TRACE_DAT_RECORD_HEADER_SIZE) {
> + ret = trace_dat__write_page(fp, base_ts,
> + page_records, page_rec_sizes, nr_page_recs);
> + for (j = 0; j < nr_page_recs; j++)
> + free(page_records[j]);
> + nr_page_recs = 0;
> + page_size_used = 0;
> + base_ts = event->ts;
> + if (ret < 0)
> + goto out_free;
[Severity: Medium]
If trace_dat__write_page fails and we jump to out_free, the newly allocated
rec has not yet been appended to the page_records array. Is this memory
permanently leaked since it is skipped by the cleanup loop?
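Both this leak and the direct return noted earlier come down to who owns rec at each point. A minimal sketch of one possible ownership rule follows: the function that allocates rec frees it on every error path until the batch has accepted it. Everything here is hypothetical scaffolding: the fixed-capacity batch, add_record, and the flush_ok/flush_fail stand-ins for trace_dat__write_page() exist only to keep the example short and self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define BATCH_CAP 4	/* tiny fixed capacity, only to keep the sketch short */

struct batch {
	char *recs[BATCH_CAP];
	int nr;
};

/* On success the batch owns rec; on failure the caller still does. */
static int batch_push(struct batch *b, char *rec)
{
	if (b->nr >= BATCH_CAP)
		return -ENOSPC;
	b->recs[b->nr++] = rec;
	return 0;
}

/* Free every queued record. */
static void batch_reset(struct batch *b)
{
	while (b->nr > 0)
		free(b->recs[--b->nr]);
}

/* Stand-ins for trace_dat__write_page() succeeding or failing. */
static int flush_ok(struct batch *b)   { (void)b; return 0; }
static int flush_fail(struct batch *b) { (void)b; return -EIO; }

static int add_record(struct batch *b, size_t rec_size, int page_full,
		      int (*flush)(struct batch *))
{
	char *rec;
	int ret;

	assert(b && flush && rec_size > 0);

	rec = calloc(1, rec_size);
	if (!rec)
		return -ENOMEM;	/* nothing of ours to free yet */

	if (page_full) {
		ret = flush(b);
		batch_reset(b);
		if (ret < 0)
			goto err_free_rec;
	}

	ret = batch_push(b, rec);
	if (ret < 0)
		goto err_free_rec;
	return 0;

err_free_rec:
	free(rec);	/* rec never reached the batch, so free it here */
	return ret;
}
```

In the patch itself the equivalent would be to route the calloc failure through out_free and to free rec before jumping there from the flush and realloc error paths.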
> + }
> +
> + if (nr_page_recs >= page_cap) {
> + char **tmp_records;
> + int *tmp_sizes;
> +
> + page_cap = page_cap ? page_cap * 2 : INITIAL_PAGE_RECORD_CAPACITY;
> + tmp_records = realloc(page_records, page_cap * sizeof(char *));
> + tmp_sizes = realloc(page_rec_sizes, page_cap * sizeof(int));
> + if (!tmp_records || !tmp_sizes) {
> + ret = -ENOMEM;
> + goto out_free;
[Severity: High]
If realloc for tmp_sizes fails, the code jumps to out_free without updating
page_records to point to the newly allocated block in tmp_records. Does the
cleanup path then iterate over the potentially freed page_records pointer,
dereference it (use-after-free), and then call free(page_records) again
(double-free)?
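One possible shape for the growth step, as a sketch only: each successful realloc() result is stored back before the next allocation is attempted, so the cleanup path never sees a stale pointer. The struct and function names here are hypothetical stand-ins for the patch's local variables:

```c
#include <stdlib.h>

/* Hypothetical container mirroring the two parallel arrays in the patch. */
struct page_recs {
	char **records;
	int *sizes;
	int cap;
};

/*
 * Grow both arrays. Each successful realloc() is committed immediately,
 * so the struct never holds a freed pointer even if the second call fails.
 * Returns 0 on success, -1 on allocation failure.
 */
static int page_recs_grow(struct page_recs *p, int initial_cap)
{
	int new_cap = p->cap ? p->cap * 2 : initial_cap;
	char **tmp_records;
	int *tmp_sizes;

	tmp_records = realloc(p->records, new_cap * sizeof(*tmp_records));
	if (!tmp_records)
		return -1;
	p->records = tmp_records;

	tmp_sizes = realloc(p->sizes, new_cap * sizeof(*tmp_sizes));
	if (!tmp_sizes)
		return -1;	/* records is larger than cap now, which is harmless */
	p->sizes = tmp_sizes;

	p->cap = new_cap;
	return 0;
}
```

On failure the caller can still free p->records and p->sizes exactly once, because both always point at live allocations (or are NULL).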
> + }
[ ... ]
> +int trace_dat__write_options_section1(void)
> +{
[ ... ]
> + /* TRACECLOCK option */
> + opt_id = TRACE_DAT_OPTION_TRACECLOCK;
> +
> + path = get_tracing_file("trace_clock");
> + clock_file = fopen(path, "r");
[Severity: High]
The get_tracing_file function internally uses asprintf and can return NULL on
memory allocation failure. Is it possible for fopen to receive a NULL path,
leading to a segmentation fault?
> + put_tracing_file(path);
[ ... ]
> +int trace_dat__write_flyrecord_section(void)
> +{
[ ... ]
> + /* Align to page boundary */
> + after_header = ftell(trace_dat_fp);
> + padding_needed = (trace_dat_page_size -
> + (after_header % trace_dat_page_size)) % trace_dat_page_size;
> +
> + if (padding_needed > 0) {
> + pad = calloc(1, padding_needed);
> +
> + if (!fwrite(pad, 1, padding_needed, trace_dat_fp)) {
[Severity: High]
Is pad passed directly to fwrite without checking if calloc succeeded? Can
this lead to a segmentation fault if the allocation fails and fwrite receives
a NULL buffer?
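A sketch of the padding step with the allocation checked, assuming a standalone helper (pad_to_page() is a hypothetical name) rather than the inline code in the patch:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: pad fp with zero bytes up to the next multiple of
 * page_size. Returns 0 on success, -1 on any failure.
 */
static int pad_to_page(FILE *fp, long page_size)
{
	long pos = ftell(fp);
	long padding;
	char *pad;
	int ret = 0;

	if (pos < 0 || page_size <= 0)
		return -1;

	padding = (page_size - (pos % page_size)) % page_size;
	if (!padding)
		return 0;

	pad = calloc(1, padding);
	if (!pad)
		return -1;	/* report the failure instead of passing NULL on */

	if (fwrite(pad, 1, padding, fp) != (size_t)padding)
		ret = -1;	/* a short write also counts as failure */

	free(pad);
	return ret;
}
```

Comparing the fwrite() return value against the requested count also catches partial writes, which a plain !fwrite() test would miss.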
> + free(pad);
> + ret = -EIO;
> + goto cleanup;
> + }
> + free(pad);
> + }
[ ... ]
> + /* fill section size */
> + end_pos = ftell(trace_dat_fp);
> +
> + section_size = end_pos - flyrecord_start;
[Severity: Medium]
In the trace.dat specification, the section_size field must indicate the size
of the payload following the 16-byte section header. By capturing
flyrecord_start before the header is written, does the final section_size
erroneously include the 16 bytes of the header itself, causing parsing tools
to miscalculate the boundary?
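To illustrate the distinction, here is a rough sketch that measures the size from the first payload byte rather than from the start of the header. The helper patch_section_size() and its parameters are hypothetical, and the value is written in host byte order for brevity:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: patch the 8-byte size field at size_pos with the
 * number of bytes written after payload_start (the first byte following
 * the section header), then restore the file position.
 */
static int patch_section_size(FILE *fp, long size_pos, long payload_start)
{
	long end_pos = ftell(fp);
	uint64_t section_size;

	if (payload_start < 0 || end_pos < payload_start)
		return -1;

	section_size = (uint64_t)(end_pos - payload_start);
	if (fseek(fp, size_pos, SEEK_SET) < 0 ||
	    fwrite(&section_size, sizeof(section_size), 1, fp) != 1 ||
	    fseek(fp, end_pos, SEEK_SET) < 0)
		return -1;
	return 0;
}
```

With a 16-byte header followed by 7 payload bytes, this stores 7, whereas subtracting the header's start offset would store 23.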
> + if (fseek(trace_dat_fp, section_size_pos, SEEK_SET) < 0 ||
> + !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp)) {
> + ret = -EIO;
> + goto cleanup;
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing
2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
2026-06-08 12:59 ` [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure Tanushree Shah
@ 2026-06-08 12:59 ` Tanushree Shah
2026-06-08 13:12 ` sashiko-bot
2026-06-08 12:59 ` [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend Tanushree Shah
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Tanushree Shah @ 2026-06-08 12:59 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
irogers, namhyung, linux-kernel
Cc: linux-perf-users, linuxppc-dev, atrajeev, hbathini, Tejas.Manhas1,
Tanushree.Shah, Shivani.Nittor, Tanushree Shah
Perf already captures the tracing metadata as part of the
data section in perf.data.
When trace_dat_fp is set, write trace.dat compatible metadata
sections using the perf provided raw buffers.
Sections written:
- Initial format header (magic, version, endian, long_size,
page_size, compression, options_offset placeholder)
- Section 16: HEADER INFO (header_page + header_event)
- Section 17: FTRACE EVENT FORMATS
- Section 18: EVENT FORMATS (per system/event format files)
- Section 19: KALLSYMS
- Section 21: CMDLINES
- Section 15: STRINGS (written last after all sections)
Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>
---
tools/perf/util/trace-event-read.c | 259 ++++++++++++++++++++++++++++-
1 file changed, 252 insertions(+), 7 deletions(-)
diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
index ecbbb93f0185..815577703c2e 100644
--- a/tools/perf/util/trace-event-read.c
+++ b/tools/perf/util/trace-event-read.c
@@ -19,6 +19,7 @@
#include "trace-event.h"
#include "debug.h"
#include "util.h"
+#include "trace-dat.h"
static int input_fd;
@@ -145,10 +146,9 @@ static char *read_string(void)
static int read_proc_kallsyms(struct tep_handle *pevent)
{
unsigned int size;
+ char *buf;
size = read4(pevent);
- if (!size)
- return 0;
/*
* Just skip it, now that we configure libtraceevent to use the
* tools/perf/ symbol resolver.
@@ -160,11 +160,56 @@ static int read_proc_kallsyms(struct tep_handle *pevent)
* payload", so that older tools can continue reading it and interpret
* it as "no kallsyms payload is present".
*/
- lseek(input_fd, size, SEEK_CUR);
+ /* Write kallsyms section with empty payload if no data */
+ if (!size) {
+ if (trace_dat_fp) {
+ unsigned short section_id = TRACE_DAT_SECTION_KALLSYMS;
+ unsigned short flags = 0;
+ unsigned long long section_size = sizeof(unsigned int);
+ unsigned int kallsyms_data = 0;
+ unsigned int string_id = STRID_KALLSYMS;
+
+ trace_dat_kallsyms_offset = ftell(trace_dat_fp);
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(&kallsyms_data, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ }
+ return 0;
+ }
+ buf = malloc(size);
+ if (buf == NULL)
+ return -1;
+ if (read(input_fd, buf, size) < 0) {
+ free(buf);
+ return -1;
+ }
trace_data_size += size;
+ /* Write kallsyms section with data */
+ if (trace_dat_fp) {
+ unsigned short section_id = TRACE_DAT_SECTION_KALLSYMS;
+ unsigned int string_id = STRID_KALLSYMS;
+ unsigned long long section_size = sizeof(unsigned int) + size;
+ unsigned short flags = 0;
+
+ trace_dat_kallsyms_offset = ftell(trace_dat_fp);
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(&size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(buf, 1, size, trace_dat_fp)) {
+ free(buf);
+ return -EIO;
+ }
+ }
+ free(buf);
return 0;
}
+
static int read_ftrace_printk(struct tep_handle *pevent)
{
unsigned int size;
@@ -195,6 +240,13 @@ static int read_ftrace_printk(struct tep_handle *pevent)
static int read_header_files(struct tep_handle *pevent)
{
unsigned long long size;
+ unsigned long long header_page_size;
+ unsigned long long header_event_size;
+ char *header_event;
+ unsigned short section_id;
+ unsigned short flags;
+ unsigned int string_id;
+ unsigned long long section_size;
char *header_page;
char buf[BUFSIZ];
int ret = 0;
@@ -209,6 +261,7 @@ static int read_header_files(struct tep_handle *pevent)
size = read8(pevent);
+ header_page_size = size;
header_page = malloc(size);
if (header_page == NULL)
return -1;
@@ -227,19 +280,59 @@ static int read_header_files(struct tep_handle *pevent)
*/
tep_set_long_size(pevent, tep_get_header_page_size(pevent));
}
- free(header_page);
- if (do_read(buf, 13) < 0)
+ if (do_read(buf, 13) < 0) {
+ free(header_page);
return -1;
+ }
if (memcmp(buf, "header_event", 13) != 0) {
pr_debug("did not read header event");
+ free(header_page);
return -1;
}
size = read8(pevent);
- skip(size);
+ if (trace_dat_fp) {
+ header_event_size = size;
+ header_event = malloc(size);
+ if (header_event == NULL) {
+ free(header_page);
+ return -1;
+ }
+ if (do_read(header_event, size) < 0) {
+ free(header_page);
+ free(header_event);
+ return -1;
+ }
+ /* Write header_page and header_event to trace.dat */
+ section_id = TRACE_DAT_SECTION_HEADER;
+ flags = 0;
+ string_id = STRID_HEADERS;
+ section_size = 12 + 8 + header_page_size + 13 + 8 +
+ header_event_size;
+
+ trace_dat_header_info_offset = ftell(trace_dat_fp);
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite("header_page\0", 1, 12, trace_dat_fp) ||
+ !fwrite(&header_page_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(header_page, 1, header_page_size, trace_dat_fp) ||
+ !fwrite("header_event\0", 1, 13, trace_dat_fp) ||
+ !fwrite(&header_event_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(header_event, 1, header_event_size, trace_dat_fp)) {
+ free(header_page);
+ free(header_event);
+ return -EIO;
+ }
+ free(header_event);
+ } else {
+ skip(size);
+ }
+ free(header_page);
return ret;
}
@@ -259,6 +352,13 @@ static int read_ftrace_file(struct tep_handle *pevent, unsigned long long size)
pr_debug("error reading ftrace file.\n");
goto out;
}
+ if (trace_dat_fp) {
+ if (!fwrite(&size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(buf, 1, size, trace_dat_fp)) {
+ free(buf);
+ return -EIO;
+ }
+ }
ret = parse_ftrace_file(pevent, buf, size);
if (ret < 0)
@@ -283,6 +383,13 @@ static int read_event_file(struct tep_handle *pevent, char *sys,
ret = do_read(buf, size);
if (ret < 0)
goto out;
+ if (trace_dat_fp) {
+ if (!fwrite(&size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(buf, 1, size, trace_dat_fp)) {
+ free(buf);
+ return -EIO;
+ }
+ }
ret = parse_event_file(pevent, buf, size, sys);
if (ret < 0)
@@ -298,8 +405,31 @@ static int read_ftrace_files(struct tep_handle *pevent)
int count;
int i;
int ret;
+ long section_size_pos = 0;
+ long count_pos = 0;
+ unsigned long long section_size = 0;
+ long end_pos;
count = read4(pevent);
+ /* Write ftrace formats section to trace.dat output file */
+ if (trace_dat_fp) {
+ unsigned short section_id = TRACE_DAT_SECTION_FTRACE;
+ unsigned short flags = 0;
+ unsigned int string_id = STRID_FTRACE_FORMATS;
+
+ trace_dat_ftrace_format_offset = ftell(trace_dat_fp);
+
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ section_size_pos = ftell(trace_dat_fp);
+ if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+ count_pos = ftell(trace_dat_fp);
+ if (!fwrite(&count, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ }
for (i = 0; i < count; i++) {
size = read8(pevent);
@@ -307,6 +437,16 @@ static int read_ftrace_files(struct tep_handle *pevent)
if (ret)
return ret;
}
+ /* Fill in section size after writing all ftrace files */
+ if (trace_dat_fp) {
+ end_pos = ftell(trace_dat_fp);
+ section_size = end_pos - count_pos;
+ fseek(trace_dat_fp, section_size_pos, SEEK_SET);
+ if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+ fseek(trace_dat_fp, end_pos, SEEK_SET);
+ }
+
return 0;
}
@@ -318,8 +458,30 @@ static int read_event_files(struct tep_handle *pevent)
int count;
int i,x;
int ret;
+ long section_size_pos = 0;
+ long sys_count_pos = 0;
+ unsigned long long section_size = 0;
+ long end_pos;
systems = read4(pevent);
+ /* Write event formats section to trace.dat output file */
+ if (trace_dat_fp) {
+ unsigned short section_id = TRACE_DAT_SECTION_EVENTS;
+ unsigned short flags = 0;
+ unsigned int string_id = STRID_EVENT_FORMATS;
+
+ trace_dat_events_format_offset = ftell(trace_dat_fp);
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ section_size_pos = ftell(trace_dat_fp);
+ if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+ sys_count_pos = ftell(trace_dat_fp);
+ if (!fwrite(&systems, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ }
for (i = 0; i < systems; i++) {
sys = read_string();
@@ -327,6 +489,11 @@ static int read_event_files(struct tep_handle *pevent)
return -1;
count = read4(pevent);
+ if (trace_dat_fp) {
+ if (!fwrite(sys, 1, strlen(sys) + 1, trace_dat_fp) ||
+ !fwrite(&count, sizeof(unsigned int), 1, trace_dat_fp))
+ return -EIO;
+ }
for (x=0; x < count; x++) {
size = read8(pevent);
@@ -338,6 +505,16 @@ static int read_event_files(struct tep_handle *pevent)
}
free(sys);
}
+ /* Fill in section size after writing all event files */
+ if (trace_dat_fp) {
+ end_pos = ftell(trace_dat_fp);
+ section_size = end_pos - sys_count_pos;
+ fseek(trace_dat_fp, section_size_pos, SEEK_SET);
+ if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+ fseek(trace_dat_fp, end_pos, SEEK_SET);
+ }
+
return 0;
}
@@ -349,8 +526,25 @@ static int read_saved_cmdline(struct tep_handle *pevent)
/* it can have 0 size */
size = read8(pevent);
- if (!size)
+ /* Write cmdlines section with empty payload if no data */
+ if (!size) {
+ if (trace_dat_fp) {
+ unsigned short section_id = TRACE_DAT_SECTION_CMDLINE;
+ unsigned short flags = 0;
+ unsigned int string_id = STRID_CMDLINES;
+ unsigned long long section_size = sizeof(unsigned long long);
+ unsigned long long section_data = 0;
+
+ trace_dat_cmdline_offset = ftell(trace_dat_fp);
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(&section_data, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+ }
return 0;
+ }
buf = malloc(size + 1);
if (buf == NULL) {
@@ -363,6 +557,23 @@ static int read_saved_cmdline(struct tep_handle *pevent)
pr_debug("error reading saved cmdlines\n");
goto out;
}
+ /* Write cmdlines section with data */
+ if (trace_dat_fp) {
+ unsigned short section_id = TRACE_DAT_SECTION_CMDLINE;
+ unsigned short flags = 0;
+ unsigned int string_id = STRID_CMDLINES;
+ unsigned long long section_size = sizeof(unsigned long long) + size;
+
+ trace_dat_cmdline_offset = ftell(trace_dat_fp);
+ if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
+ !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(&size, sizeof(unsigned long long), 1, trace_dat_fp) ||
+ !fwrite(buf, 1, size, trace_dat_fp))
+ return -EIO;
+ }
+
buf[ret] = '\0';
parse_saved_cmdline(pevent, buf, size);
@@ -387,6 +598,7 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
int file_page_size;
struct tep_handle *pevent = NULL;
int err;
+ char magic_buf[10];
repipe = __repipe;
input_fd = fd;
@@ -398,12 +610,17 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
return -1;
}
+ if (trace_dat_fp)
+ memcpy(magic_buf, buf, 3);
+
if (do_read(buf, 7) < 0)
return -1;
if (memcmp(buf, "tracing", 7) != 0) {
pr_debug("not a trace file (missing 'tracing' tag)");
return -1;
}
+ if (trace_dat_fp)
+ memcpy(magic_buf + 3, buf, 7);
version = read_string();
if (version == NULL)
@@ -440,6 +657,28 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
tep_set_long_size(pevent, file_long_size);
tep_set_page_size(pevent, file_page_size);
+ /* Write initial file header to trace.dat */
+ if (trace_dat_fp) {
+ unsigned char endian = file_bigendian;
+ unsigned char long_size = file_long_size;
+ unsigned int page_size = file_page_size;
+ unsigned long long placeholder = 0;
+ char trace_dat_version = TRACE_DAT_VERSION;
+
+ if (!fwrite(magic_buf, 1, 10, trace_dat_fp) || /* magic + "tracing" */
+ !fwrite(&trace_dat_version, 1, 2, trace_dat_fp) ||
+ !fwrite(&endian, 1, 1, trace_dat_fp) ||
+ !fwrite(&long_size, 1, 1, trace_dat_fp) ||
+ !fwrite(&page_size, sizeof(unsigned int), 1, trace_dat_fp) ||
+ !fwrite("none", 1, 4, trace_dat_fp) ||
+ !fwrite("\0", 1, 1, trace_dat_fp) ||
+ !fwrite("\0", 1, 1, trace_dat_fp))
+ return -EIO;
+ trace_dat_options_offset = ftell(trace_dat_fp);
+ if (!fwrite(&placeholder, sizeof(unsigned long long), 1, trace_dat_fp))
+ return -EIO;
+ }
+
err = read_header_files(pevent);
if (err)
goto out;
@@ -460,6 +699,12 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
if (err)
goto out;
}
+ /* Write strings section to trace.dat output file */
+ if (trace_dat_fp) {
+ err = trace_dat__write_strings_section();
+ if (err)
+ goto out;
+ }
size = trace_data_size;
repipe = false;
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing
2026-06-08 12:59 ` [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing Tanushree Shah
@ 2026-06-08 13:12 ` sashiko-bot
0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-08 13:12 UTC (permalink / raw)
To: Tanushree Shah; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
New issues:
- [High] Using `read()` directly instead of the subsystem's `do_read()` loop fails to handle short reads, causing stream desynchronization and uninitialized memory writes.
- [High] Integer truncation in `do_read()` causes massive uninitialized heap memory dumps to the output file.
- [Medium] Metadata fields and integers for the `trace.dat` file are written in the host's native byte order, causing cross-platform incompatibility.
- [Medium] Error paths in `fwrite` operations return directly without cleaning up allocated resources, leading to memory and state leaks.
- [Medium] Writing 2 bytes from a 1-byte `char` variable causes a stack out-of-bounds read and file corruption.
Pre-existing issues:
- [High] Unbounded string accumulation in `read_string()` causes a stack buffer overflow.
--
commit f1360b22eb92ea64822af70555a808fd9d67ae93
Author: Tanushree Shah <tshah@linux.ibm.com>
Date: Mon Jun 8 12:59:50 2026 +0000
perf/trace-event: Write trace.dat metadata sections during parsing
Perf already captures the tracing metadata as part of the
data section in perf.data.
When trace_dat_fp is set, write trace.dat compatible metadata
sections using the perf provided raw buffers.
Sections written:
- Initial format header (magic, version, endian, long_size,
page_size, compression, options_offset placeholder)
- Section 16: HEADER INFO (header_page + header_event)
- Section 17: FTRACE EVENT FORMATS
- Section 18: EVENT FORMATS (per system/event format files)
- Section 19: KALLSYMS
- Section 21: CMDLINES
- Section 15: STRINGS (written last after all sections)
Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>
This commit updates the trace-event parsing in perf to write trace.dat
compatible metadata sections when trace_dat_fp is set. It outputs sections
such as initial format headers, FTRACE EVENT FORMATS, EVENT FORMATS,
KALLSYMS, and CMDLINES using perf's raw buffers.
> diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
> index ecbbb93f01853..815577703c2e5 100644
> --- a/tools/perf/util/trace-event-read.c
> +++ b/tools/perf/util/trace-event-read.c
[ ... ]
> @@ -145,10 +146,9 @@ static char *read_string(void)
[Severity: High]
This isn't a bug introduced by this patch, but does read_string() allow a
stack buffer overflow when accumulating characters?
In read_string():
buf[size++] = c;
If the input file contains a string longer than BUFSIZ without a null
terminator, will this increment the size index and write past the bounds
of the fixed-size array buf?
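A bounded variant might look roughly like the sketch below. It reads from a FILE * rather than the file-scope input_fd used by perf, and read_string_bounded() is a hypothetical name, so this is only an illustration of the bounds check:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical bounded variant: read a NUL-terminated string from fp into
 * a fixed buffer, failing instead of writing past the end of buf.
 * Returns a malloc'd copy, or NULL on EOF, oversize input, or OOM.
 */
static char *read_string_bounded(FILE *fp)
{
	char buf[BUFSIZ];
	size_t size = 0;
	char *out;
	int c;

	for (;;) {
		c = fgetc(fp);
		if (c == EOF)
			return NULL;	/* unterminated string */
		if (size >= sizeof(buf))
			return NULL;	/* next store would overflow buf */
		buf[size++] = (char)c;
		if (c == '\0')
			break;
	}

	out = malloc(size);
	if (out)
		memcpy(out, buf, size);	/* size includes the terminator */
	return out;
}
```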
> static int read_proc_kallsyms(struct tep_handle *pevent)
> {
> unsigned int size;
> + char *buf;
>
> size = read4(pevent);
> - if (!size)
> - return 0;
[ ... ]
> + buf = malloc(size);
> + if (buf == NULL)
> + return -1;
> + if (read(input_fd, buf, size) < 0) {
> + free(buf);
> + return -1;
> + }
> trace_data_size += size;
> + /* Write kallsyms section with data */
> + if (trace_dat_fp) {
> + unsigned short section_id = TRACE_DAT_SECTION_KALLSYMS;
> + unsigned int string_id = STRID_KALLSYMS;
> + unsigned long long section_size = sizeof(unsigned int) + size;
> + unsigned short flags = 0;
> +
> + trace_dat_kallsyms_offset = ftell(trace_dat_fp);
> + if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
> + !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
> + !fwrite(&size, sizeof(unsigned int), 1, trace_dat_fp) ||
> + !fwrite(buf, 1, size, trace_dat_fp)) {
[Severity: High]
Could using read() instead of do_read() here cause uninitialized memory to
be written to the output file?
Unlike do_read(), read() can return a short read. If a short read occurs, the
return value might still be greater than 0, bypassing the error check.
Will this leave the remainder of buf uninitialized, failing to advance the
input_fd offset by the expected amount, and then dump the partially
uninitialized heap memory into trace_dat_fp during the fwrite() call?
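Calling the existing do_read() would likely be the simplest fix. For reference, the general pattern for tolerating short reads is sketched below, with read_full() being a hypothetical name and not a perf function:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling read() until exactly len bytes have
 * arrived. Returns 0 on success, -1 on error or premature end of file.
 */
static int read_full(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0)
			return -1;	/* EOF before len bytes were read */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Only after this returns 0 is every byte of buf initialized and the descriptor advanced by exactly len.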
> + free(buf);
> + return -EIO;
> + }
> + }
> + free(buf);
> return 0;
> }
[ ... ]
> @@ -209,6 +261,7 @@ static int read_header_files(struct tep_handle *pevent)
>
> size = read8(pevent);
>
> + header_page_size = size;
> header_page = malloc(size);
> if (header_page == NULL)
> return -1;
> @@ -227,19 +280,59 @@ static int read_header_files(struct tep_handle *pevent)
> */
> tep_set_long_size(pevent, tep_get_header_page_size(pevent));
> }
> - free(header_page);
>
> - if (do_read(buf, 13) < 0)
> + if (do_read(buf, 13) < 0) {
> + free(header_page);
> return -1;
> + }
[Severity: High]
Can integer truncation in do_read() lead to massive uninitialized heap memory
dumps here?
Functions like read_header_files() allocate memory using the 64-bit size read
from the file:
size = read8(pevent);
header_page = malloc(size);
They then call do_read(header_page, size). Because do_read() takes an int
for the size argument, values over 4GB are truncated to their lower 32 bits.
If size is 0x100000004, do_read() will only read 4 bytes.
Later in the function, fwrite() is called with the full 64-bit size:
!fwrite(header_page, 1, header_page_size, trace_dat_fp)
Will this result in dumping 4GB of uninitialized heap memory into the
output file?
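One way to avoid the mismatch, short of widening do_read()'s parameter, might be to reject sizes that cannot pass through an int unchanged. A minimal sketch, with size_fits_int() as a hypothetical helper:

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical guard: report whether a 64-bit size taken from the file can
 * be passed through an int parameter without losing bits or going negative.
 */
static int size_fits_int(uint64_t size)
{
	return size <= (uint64_t)INT_MAX;
}
```

A caller would check this right after read8() and bail out before the malloc(), so the allocation size, the read size, and the later fwrite() size all agree.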
[ ... ]
> @@ -298,8 +405,31 @@ static int read_ftrace_files(struct tep_handle *pevent)
> int count;
> int i;
> int ret;
> + long section_size_pos = 0;
> + long count_pos = 0;
> + unsigned long long section_size = 0;
> + long end_pos;
>
> count = read4(pevent);
> + /* Write ftrace formats section to trace.dat output file */
> + if (trace_dat_fp) {
> + unsigned short section_id = TRACE_DAT_SECTION_FTRACE;
> + unsigned short flags = 0;
> + unsigned int string_id = STRID_FTRACE_FORMATS;
> +
> + trace_dat_ftrace_format_offset = ftell(trace_dat_fp);
> +
> + if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp))
[Severity: Medium]
Should these multi-byte integers be byte-swapped to match the output file's
endianness?
The trace.dat file endianness is explicitly declared to match the original
perf.data file (endian = file_bigendian).
However, variables like section_id, flags, and string_id are written using
fwrite(), which writes them in the host's native byte order. If a
little-endian host is processing a big-endian capture file, won't this corrupt
the section IDs and sizes in the resulting trace.dat file?
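As an illustration, a byte-order-explicit writer could serialize each integer by hand instead of relying on the host layout. The helpers put_u32() and write_u32() below are hypothetical and cover only the 32-bit case; 16-bit and 64-bit fields would follow the same pattern:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: serialize v into 4 bytes in the byte order the
 * output file declares, independent of the host's byte order.
 */
static void put_u32(unsigned char out[4], uint32_t v, int big_endian)
{
	int i;

	for (i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		out[i] = (unsigned char)(v >> shift);
	}
}

/* Write v to fp in the requested byte order; 0 on success, -1 on error. */
static int write_u32(FILE *fp, uint32_t v, int big_endian)
{
	unsigned char bytes[4];

	put_u32(bytes, v, big_endian);
	return fwrite(bytes, 1, sizeof(bytes), fp) == sizeof(bytes) ? 0 : -1;
}
```

Here big_endian would be driven by the same file_bigendian value that is written into the header's endian byte.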
> + return -EIO;
> + section_size_pos = ftell(trace_dat_fp);
> + if (!fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp))
> + return -EIO;
> + count_pos = ftell(trace_dat_fp);
> + if (!fwrite(&count, sizeof(unsigned int), 1, trace_dat_fp))
> + return -EIO;
> + }
[ ... ]
> @@ -363,6 +557,23 @@ static int read_saved_cmdline(struct tep_handle *pevent)
> pr_debug("error reading saved cmdlines\n");
> goto out;
> }
> + /* Write cmdlines section with data */
> + if (trace_dat_fp) {
> + unsigned short section_id = TRACE_DAT_SECTION_CMDLINE;
> + unsigned short flags = 0;
> + unsigned int string_id = STRID_CMDLINES;
> + unsigned long long section_size = sizeof(unsigned long long) + size;
> +
> + trace_dat_cmdline_offset = ftell(trace_dat_fp);
> + if (!fwrite(&section_id, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&flags, sizeof(unsigned short), 1, trace_dat_fp) ||
> + !fwrite(&string_id, sizeof(unsigned int), 1, trace_dat_fp) ||
> + !fwrite(&section_size, sizeof(unsigned long long), 1, trace_dat_fp) ||
> + !fwrite(&size, sizeof(unsigned long long), 1, trace_dat_fp) ||
> + !fwrite(buf, 1, size, trace_dat_fp))
> + return -EIO;
[Severity: Medium]
Does returning directly bypass the cleanup block?
If fwrite() fails, this error path returns -EIO immediately instead of using
the "goto out" pattern established in this function.
Will this cause the allocated memory in buf to be leaked, since free(buf)
is skipped?
> + }
> +
> buf[ret] = '\0';
>
> parse_saved_cmdline(pevent, buf, size);
[ ... ]
> @@ -440,6 +657,28 @@ ssize_t trace_report(int fd, struct trace_event *tevent, bool __repipe)
> tep_set_long_size(pevent, file_long_size);
> tep_set_page_size(pevent, file_page_size);
>
> + /* Write initial file header to trace.dat */
> + if (trace_dat_fp) {
> + unsigned char endian = file_bigendian;
> + unsigned char long_size = file_long_size;
> + unsigned int page_size = file_page_size;
> + unsigned long long placeholder = 0;
> + char trace_dat_version = TRACE_DAT_VERSION;
> +
> + if (!fwrite(magic_buf, 1, 10, trace_dat_fp) || /* magic + "tracing" */
> + !fwrite(&trace_dat_version, 1, 2, trace_dat_fp) ||
[Severity: Medium]
Can this write out-of-bounds stack memory to the output file?
The variable trace_dat_version is declared as a 1-byte char:
char trace_dat_version = TRACE_DAT_VERSION;
However, the fwrite call asks to write 2 bytes:
!fwrite(&trace_dat_version, 1, 2, trace_dat_fp)
Will this read 1 byte past the trace_dat_version variable on the stack,
leaking uninitialized stack memory into the output file and potentially
failing to write the expected null terminator for the version string?
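A sketch of one way to make the two written bytes well defined, assuming the version is a single ASCII character followed by a NUL. The actual value of TRACE_DAT_VERSION lives in trace-dat.h and is not shown in this patch, and write_version() is a hypothetical name:

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a single-character version followed by its
 * NUL terminator from a 2-byte array, so the two bytes handed to fwrite()
 * are exactly the two bytes that were initialized.
 * Assumes the version is one ASCII character.
 */
static int write_version(FILE *fp, char version)
{
	const char buf[2] = { version, '\0' };

	return fwrite(buf, 1, sizeof(buf), fp) == sizeof(buf) ? 0 : -1;
}
```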
> + !fwrite(&endian, 1, 1, trace_dat_fp) ||
> + !fwrite(&long_size, 1, 1, trace_dat_fp) ||
> + !fwrite(&page_size, sizeof(unsigned int), 1, trace_dat_fp) ||
> + !fwrite("none", 1, 4, trace_dat_fp) ||
> + !fwrite("\0", 1, 1, trace_dat_fp) ||
> + !fwrite("\0", 1, 1, trace_dat_fp))
> + return -EIO;
> + trace_dat_options_offset = ftell(trace_dat_fp);
> + if (!fwrite(&placeholder, sizeof(unsigned long long), 1, trace_dat_fp))
> + return -EIO;
> + }
> +
> err = read_header_files(pevent);
> if (err)
> goto out;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=2
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend
2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
2026-06-08 12:59 ` [RFC PATCH 1/4] perf/trace-dat: Add trace.dat export infrastructure Tanushree Shah
2026-06-08 12:59 ` [RFC PATCH 2/4] perf/trace-event: Write trace.dat metadata sections during parsing Tanushree Shah
@ 2026-06-08 12:59 ` Tanushree Shah
2026-06-08 13:14 ` sashiko-bot
2026-06-08 12:59 ` [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format Tanushree Shah
2026-06-08 15:18 ` [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Ian Rogers
4 siblings, 1 reply; 10+ messages in thread
From: Tanushree Shah @ 2026-06-08 12:59 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
irogers, namhyung, linux-kernel
Cc: linux-perf-users, linuxppc-dev, atrajeev, hbathini, Tejas.Manhas1,
Tanushree.Shah, Shivani.Nittor, Tanushree Shah
Add data-convert-trace.c implementing trace_convert__perf2dat() to
convert perf.data tracepoint events to trace.dat format.
process_sample_event() is invoked for each PERF_TYPE_TRACEPOINT sample
during perf_session__process_events(), storing raw event bytes per-cpu
via trace_dat__collect_cpu_event().
Once all samples are collected:
- trace_dat__write_options_section1() writes the OPTIONS section with
CPUCOUNT, TRACECLOCK, HEADER_INFO, FTRACE_EVENTS, EVENT_FORMATS,
KALLSYMS, CMDLINES and DONE options.
- trace_dat__write_options_section2() writes the OPTIONS section with
BUFFER option holding per-cpu data offset placeholders and the DONE
option.
- trace_dat__write_flyrecord_section() builds ring buffer pages
per-cpu and patches BUFFER option with final offsets and sizes.
Per-cpu buffers are sized to tep_get_page_size() from the session
tep handle and released on all exit paths.
Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>
---
tools/perf/util/Build | 1 +
tools/perf/util/data-convert-trace.c | 152 +++++++++++++++++++++++++++
tools/perf/util/data-convert.h | 4 +
3 files changed, 157 insertions(+)
create mode 100644 tools/perf/util/data-convert-trace.c
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index c000d8032d25..88022b24e170 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -236,6 +236,7 @@ ifeq ($(CONFIG_LIBTRACEEVENT),y)
endif
perf-util-y += data-convert-json.o
+perf-util-$(CONFIG_LIBTRACEEVENT) += data-convert-trace.o
perf-util-y += scripting-engines/
diff --git a/tools/perf/util/data-convert-trace.c b/tools/perf/util/data-convert-trace.c
new file mode 100644
index 000000000000..e4f8b817be36
--- /dev/null
+++ b/tools/perf/util/data-convert-trace.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2026, IBM Corporation
+ * Author: Tanushree Shah <tshah@linux.ibm.com>
+ *
+ * data-convert-trace.c
+ *
+ * Implements perf.data to trace.dat format conversion for tracepoint events.
+ */
+
+#include <errno.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/err.h>
+
+#include "data-convert.h"
+#include "session.h"
+#include "evsel.h"
+#include "tool.h"
+#include "debug.h"
+#include "trace-dat.h"
+#include "trace-event.h"
+#include "event.h"
+#include "sample.h"
+#include "evlist.h"
+
+struct trace_convert {
+ struct perf_tool tool;
+ u64 events_count;
+};
+
+/* Store raw tracepoint event data in per-cpu buffer for trace.dat flyrecord */
+static int process_sample_event(const struct perf_tool *tool,
+ union perf_event *event __maybe_unused,
+ struct perf_sample *sample,
+ struct evsel *evsel,
+ struct machine *machine __maybe_unused)
+{
+ struct trace_convert *tc = container_of(tool, struct trace_convert, tool);
+
+ /* Collect raw tracepoint data per-cpu */
+ if (trace_dat_fp && sample->raw_size > 0 &&
+ evsel->core.attr.type == PERF_TYPE_TRACEPOINT) {
+ if (trace_dat__collect_cpu_event(sample->cpu, sample->time,
+ sample->raw_data, sample->raw_size) < 0) {
+ pr_err("Failed to collect CPU event\n");
+ return -ENOMEM;
+ }
+ tc->events_count++;
+ }
+
+ return 0;
+}
+
+/* Convert perf.data tracepoint events to trace.dat format */
+int trace_convert__perf2dat(const char *input, const char *to_trace,
+ struct perf_data_convert_opts *opts)
+{
+ struct perf_session *session;
+ struct trace_convert tc = {
+ .events_count = 0,
+ };
+ struct perf_data data = {
+ .path = input,
+ .mode = PERF_DATA_MODE_READ,
+ .force = opts->force,
+ };
+ int ret = -EINVAL;
+ bool cpu_buffers_initialized = false;
+
+ /* Initialize tool with all required callbacks */
+ perf_tool__init(&tc.tool, /*ordered_events=*/true);
+ tc.tool.sample = process_sample_event;
+
+ /* Open output trace.dat file */
+ trace_dat_fp = fopen(to_trace, "wb");
+ if (!trace_dat_fp) {
+ pr_err("Failed to open output file: %s\n", to_trace);
+ return -EINVAL;
+ }
+
+ /* Open perf.data session - this writes trace.dat metadata sections */
+ session = perf_session__new(&data, &tc.tool);
+ if (IS_ERR(session)) {
+ pr_err("Failed to open perf.data file\n");
+ ret = PTR_ERR(session);
+ goto out_close;
+ }
+
+ /* Initialize per-CPU buffers for flyrecord data */
+ if (session->tevent.pevent) {
+ trace_dat_page_size = tep_get_page_size(session->tevent.pevent);
+ if (trace_dat__init_cpu_buffers(session->header.env.nr_cpus_online) < 0) {
+ pr_err("Failed to initialize CPU buffers\n");
+ ret = -ENOMEM;
+ goto out_delete;
+ }
+ cpu_buffers_initialized = true;
+ }
+
+ /* Process all events - collects raw data per-cpu */
+ ret = perf_session__process_events(session);
+ if (ret < 0) {
+ pr_err("Failed to process events\n");
+ goto out_delete;
+ }
+
+ /* Skip file creation if no tracepoint events found */
+ if (tc.events_count == 0) {
+ pr_warning("No tracepoint events found in '%s', skipping trace.dat creation\n",
+ input);
+ ret = -EINVAL;
+ goto out_delete;
+ }
+
+ /* Write trace.dat options and flyrecord sections */
+ if (trace_dat__write_options_section1() < 0) {
+ pr_err("Failed to write options section1\n");
+ ret = -EIO;
+ goto out_delete;
+ }
+ if (trace_dat__write_options_section2() < 0) {
+ pr_err("Failed to write options section2\n");
+ ret = -EIO;
+ goto out_delete;
+ }
+ if (trace_dat__write_flyrecord_section() < 0) {
+ pr_err("Failed to write flyrecord section\n");
+ ret = -EIO;
+ goto out_delete;
+ }
+
+ pr_info("[ perf data convert: Converted '%s' into trace.dat format '%s' ]\n",
+ input, to_trace);
+ pr_info("[ perf data convert: Converted %llu events ]\n",
+ (unsigned long long)tc.events_count);
+
+ ret = 0;
+
+out_delete:
+ if (cpu_buffers_initialized)
+ trace_dat__free_cpu_buffers();
+ perf_session__delete(session);
+out_close:
+ if (trace_dat_fp) {
+ fclose(trace_dat_fp);
+ trace_dat_fp = NULL;
+ }
+ if (ret != 0)
+ unlink(to_trace);
+ return ret;
+}
diff --git a/tools/perf/util/data-convert.h b/tools/perf/util/data-convert.h
index ee651fa680a1..d958e68367fe 100644
--- a/tools/perf/util/data-convert.h
+++ b/tools/perf/util/data-convert.h
@@ -19,4 +19,8 @@ int bt_convert__perf2ctf(const char *input_name, const char *to_ctf,
int bt_convert__perf2json(const char *input_name, const char *to_ctf,
struct perf_data_convert_opts *opts);
+#ifdef HAVE_LIBTRACEEVENT
+int trace_convert__perf2dat(const char *input, const char *to_trace,
+ struct perf_data_convert_opts *opts);
+#endif /* HAVE_LIBTRACEEVENT */
#endif /* __DATA_CONVERT_H */
--
2.53.0
* Re: [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend
2026-06-08 12:59 ` [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend Tanushree Shah
@ 2026-06-08 13:14 ` sashiko-bot
0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-08 13:14 UTC (permalink / raw)
To: Tanushree Shah; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] Out-of-bounds heap write when tracepoint event data size exceeds the remaining space in a trace.dat page.
- [Medium] The conversion tool fails to register required callbacks for pipe mode (such as `.attr`, `.feature`, and `.tracing_data`), silently dropping events.
- [Medium] The output file is overwritten unconditionally, ignoring the `opts->force` flag.
- [Medium] A `realloc` failure in `trace_dat__collect_cpu_event` causes a memory leak and subsequent segmentation fault.
--
commit da74698d88efeca5fa33802144416838acb15d2a
Author: Tanushree Shah <tshah@linux.ibm.com>
perf data-convert: Add perf.data to trace.dat conversion backend
Add data-convert-trace.c implementing trace_convert__perf2dat() to
convert perf.data tracepoint events to trace.dat format.
process_sample_event() is invoked for each PERF_TYPE_TRACEPOINT sample
during perf_session__process_events(), storing raw event bytes per-cpu
via trace_dat__collect_cpu_event().
Once all samples are collected:
- trace_dat__write_options_section1() writes the OPTIONS section with
CPUCOUNT, TRACECLOCK, HEADER_INFO, FTRACE_EVENTS, EVENT_FORMATS,
KALLSYMS, CMDLINES and DONE options.
- trace_dat__write_options_section2() writes the OPTIONS section with
BUFFER option holding per-cpu data offset placeholders and the DONE
option.
- trace_dat__write_flyrecord_section() builds ring buffer pages
per-cpu and patches BUFFER option with final offsets and sizes
Per-cpu buffers are sized to tep_get_page_size() from the session
tep handle and released on all exit paths.
> diff --git a/tools/perf/util/data-convert-trace.c b/tools/perf/util/data-convert-trace.c
> new file mode 100644
> index 0000000000000..e4f8b817be36d
> --- /dev/null
> +++ b/tools/perf/util/data-convert-trace.c
[ ... ]
> +/* Store raw tracepoint event data in per-cpu buffer for trace.dat flyrecord */
> +static int process_sample_event(const struct perf_tool *tool,
> + union perf_event *event __maybe_unused,
> + struct perf_sample *sample,
> + struct evsel *evsel,
> + struct machine *machine __maybe_unused)
> +{
> + struct trace_convert *tc = container_of(tool, struct trace_convert, tool);
> +
> + /* Collect raw tracepoint data per-cpu */
> + if (trace_dat_fp && sample->raw_size > 0 &&
> + evsel->core.attr.type == PERF_TYPE_TRACEPOINT) {
> + if (trace_dat__collect_cpu_event(sample->cpu, sample->time,
> + sample->raw_data, sample->raw_size) < 0) {
[Severity: Medium]
Is it possible for a memory allocation failure here to cause a crash
during cleanup?
If trace_dat__collect_cpu_event() fails during a realloc(), it might return
-ENOMEM while leaving the events array pointer as NULL but keeping the count
intact.
When trace_dat__free_cpu_buffers() is called on the error path, it could
dereference that NULL pointer:
tools/perf/util/trace-dat.c:trace_dat__free_cpu_buffers() {
...
for (i = 0; i < trace_cpu_data[cpu].count; i++)
free(trace_cpu_data[cpu].events[i].raw);
...
}
> + pr_err("Failed to collect CPU event\n");
> + return -ENOMEM;
> + }
> + tc->events_count++;
> + }
> +
> + return 0;
> +}
[ ... ]
> +/* Convert perf.data tracepoint events to trace.dat format */
> +int trace_convert__perf2dat(const char *input, const char *to_trace,
> + struct perf_data_convert_opts *opts)
> +{
> + struct perf_session *session;
> + struct trace_convert tc = {
> + .events_count = 0,
> + };
> + struct perf_data data = {
> + .path = input,
> + .mode = PERF_DATA_MODE_READ,
> + .force = opts->force,
> + };
> + int ret = -EINVAL;
> + bool cpu_buffers_initialized = false;
> +
> + /* Initialize tool with all required callbacks */
> + perf_tool__init(&tc.tool, /*ordered_events=*/true);
> + tc.tool.sample = process_sample_event;
[Severity: Medium]
Does this initialization omit the required callbacks for pipe mode?
By only overriding .sample, the .attr, .feature, and .tracing_data
callbacks remain as default stubs from perf_tool__init().
In pipe mode, without these callbacks, the tool might silently fail to
populate trace metadata and generate an empty trace.dat file.
> +
> + /* Open output trace.dat file */
> + trace_dat_fp = fopen(to_trace, "wb");
> + if (!trace_dat_fp) {
[Severity: Medium]
Can this unconditionally overwrite an existing file?
The fopen() call uses the "wb" mode which truncates the file, but it
doesn't appear to check the opts->force flag.
Silently truncating an existing file could lead to accidental data loss.
> + pr_err("Failed to open output file: %s\n", to_trace);
> + return -EINVAL;
> + }
[ ... ]
> + /* Write trace.dat options and flyrecord sections */
> + if (trace_dat__write_options_section1() < 0) {
> + pr_err("Failed to write options section1\n");
> + ret = -EIO;
> + goto out_delete;
> + }
> + if (trace_dat__write_options_section2() < 0) {
> + pr_err("Failed to write options section2\n");
> + ret = -EIO;
> + goto out_delete;
> + }
> + if (trace_dat__write_flyrecord_section() < 0) {
[Severity: Critical]
Could an unusually large tracepoint event cause an out-of-bounds memory
access when writing the flyrecord section?
When trace_dat__write_cpu_dat() processes records, it queues them to a new
page without checking if the rec_size fits within trace_dat_page_size.
If an event size exceeds the page size, trace_dat__write_page() allocates a
fixed-size buffer and copies the data into it:
tools/perf/util/trace-dat.c:trace_dat__write_page() {
page = calloc(1, trace_dat_page_size);
...
for (i = 0; i < nr_recs; i++) {
memcpy(page + offset, records[i], rec_sizes[i]);
...
}
This could result in a heap buffer overflow.
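A bounds check before each copy would avoid the overflow. The sketch below is
illustrative only: pack_page() and its parameters are hypothetical names, and
it treats the page header as an opaque header_size rather than reproducing the
real ring buffer page layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: pack records into one fixed-size page, stopping
 * before any record that would not fit. Returns the number of records
 * packed, or -1 on allocation failure or when the first record could
 * never fit in an empty page. The caller frees *page_out.
 */
static int pack_page(size_t page_size, size_t header_size,
		     const void *const *records, const size_t *rec_sizes,
		     size_t nr_recs, unsigned char **page_out)
{
	unsigned char *page;
	size_t offset = header_size;
	size_t i;

	assert(page_out != NULL);
	*page_out = NULL;

	if (header_size > page_size)
		return -1;
	if (nr_recs > 0 && rec_sizes[0] > page_size - header_size)
		return -1;	/* oversized record: fits in no page */

	page = calloc(1, page_size);
	if (!page)
		return -1;

	for (i = 0; i < nr_recs; i++) {
		/* Compare against remaining space so offset + size cannot wrap. */
		if (rec_sizes[i] > page_size - offset)
			break;
		memcpy(page + offset, records[i], rec_sizes[i]);
		offset += rec_sizes[i];
	}

	*page_out = page;
	return (int)i;
}
```

The caller would start a new page with the records that were not packed, and
would need a policy (skip with a warning, or fail the conversion) for a record
that is larger than any page can hold.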
> + pr_err("Failed to write flyrecord section\n");
> + ret = -EIO;
> + goto out_delete;
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=3
* [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format
2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
` (2 preceding siblings ...)
2026-06-08 12:59 ` [RFC PATCH 3/4] perf data-convert: Add perf.data to trace.dat conversion backend Tanushree Shah
@ 2026-06-08 12:59 ` Tanushree Shah
2026-06-08 13:12 ` sashiko-bot
2026-06-08 15:18 ` [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Ian Rogers
4 siblings, 1 reply; 10+ messages in thread
From: Tanushree Shah @ 2026-06-08 12:59 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
irogers, namhyung, linux-kernel
Cc: linux-perf-users, linuxppc-dev, atrajeev, hbathini, Tejas.Manhas1,
Tanushree.Shah, Shivani.Nittor, Tanushree Shah
Add new command-line option to perf data convert for generating
trace.dat output files.
The --to-trace-dat option:
- Accepts output filename for trace.dat format
- Mutually exclusive with --to-ctf and --to-json
- Calls trace_convert__perf2dat() to perform conversion
Usage:
$ perf record -e sched:* -a sleep 1
$ perf data convert --to-trace-dat=trace.dat
$ trace-cmd report trace.dat
Signed-off-by: Tanushree Shah <tshah@linux.ibm.com>
---
tools/perf/builtin-data.c | 38 ++++++++++++++++++++++++++++++++++++--
1 file changed, 36 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c
index 4c08ccb8c06b..96ccab77456a 100644
--- a/tools/perf/builtin-data.c
+++ b/tools/perf/builtin-data.c
@@ -30,6 +30,9 @@ static const char *data_usage[] = {
static const char *to_json;
static const char *to_ctf;
+#ifdef HAVE_LIBTRACEEVENT
+ static const char *trace_dat_output;
+#endif
static struct perf_data_convert_opts opts = {
.force = false,
.all = false,
@@ -48,6 +51,10 @@ static const struct option data_options[] = {
OPT_BOOLEAN(0, "all", &opts.all, "Convert all events"),
OPT_STRING(0, "time", &opts.time_str, "str",
"Time span of interest (start,stop)"),
+#ifdef HAVE_LIBTRACEEVENT
+ OPT_STRING(0, "to-trace-dat", &trace_dat_output,
+ "file", "Convert to trace.dat format using perf.data tracepoints"),
+#endif
OPT_END()
};
@@ -65,16 +72,43 @@ static int cmd_data_convert(int argc, const char **argv)
pr_err("You cannot specify both --to-ctf and --to-json.\n");
return -1;
}
+#ifdef HAVE_LIBTRACEEVENT
+ if (trace_dat_output && (to_json || to_ctf)) {
+ pr_err("You cannot specify --to-trace-dat with --to-ctf or --to-json.\n");
+ return -1;
+ }
+#endif
+
#ifdef HAVE_LIBBABELTRACE_SUPPORT
+ #ifdef HAVE_LIBTRACEEVENT
+ if (!to_json && !to_ctf && !trace_dat_output) {
+ pr_err("You must specify one of --to-ctf, --to-json, or --to-trace-dat.\n");
+ return -1;
+ }
+ #else
if (!to_json && !to_ctf) {
pr_err("You must specify one of --to-ctf or --to-json.\n");
return -1;
}
+ #endif
#else
+ #ifdef HAVE_LIBTRACEEVENT
+ if (!to_json && !trace_dat_output) {
+ pr_err("You must specify --to-json or --to-trace-dat.\n");
+ return -1;
+ }
+ #else
if (!to_json) {
pr_err("You must specify --to-json.\n");
- return -1;
-}
+ return -1
+ }
+ #endif
+#endif
+
+#ifdef HAVE_LIBTRACEEVENT
+ if (trace_dat_output)
+ return trace_convert__perf2dat(input_name ? input_name : "perf.data",
+ trace_dat_output, &opts);
#endif
if (to_json)
--
2.53.0
* Re: [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format
2026-06-08 12:59 ` [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format Tanushree Shah
@ 2026-06-08 13:12 ` sashiko-bot
0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-08 13:12 UTC (permalink / raw)
To: Tanushree Shah; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] The `trace_convert__perf2dat` tool fails to register required `.attr`, `.feature`, and `.tracing_data` callbacks, breaking pipe mode.
- [High] Accessing unverified `perf_env` field `nr_cpus_online` leads to deterministic failure when CPU topology feature is missing.
- [High] Failing `realloc()` in `trace_dat__collect_cpu_event()` leaks memory and causes a NULL pointer dereference in the cleanup path.
- [Low] Missing semicolon after `return -1` in the `#else` branch inside `cmd_data_convert`.
--
commit 789bd3d29fe2134b1ed0b9a17d6c674bc7d7db23
Author: Tanushree Shah <tshah@linux.ibm.com>
perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format
Add new command-line option to perf data convert for generating
trace.dat output files. Accepts output filename for trace.dat format,
mutually exclusive with --to-ctf and --to-json, and calls
trace_convert__perf2dat() to perform conversion.
> diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c
> index 4c08ccb8c06b7..96ccab77456a3 100644
> --- a/tools/perf/builtin-data.c
> +++ b/tools/perf/builtin-data.c
[ ... ]
> @@ -65,16 +72,43 @@ static int cmd_data_convert(int argc, const char **argv)
[ ... ]
> #else
> + #ifdef HAVE_LIBTRACEEVENT
> + if (!to_json && !trace_dat_output) {
> + pr_err("You must specify --to-json or --to-trace-dat.\n");
> + return -1;
> + }
> + #else
> if (!to_json) {
> pr_err("You must specify --to-json.\n");
> - return -1;
> -}
> + return -1
> + }
[Severity: Low]
Is there a missing semicolon here? It looks like this will break compilation
for environments lacking both HAVE_LIBBABELTRACE_SUPPORT and
HAVE_LIBTRACEEVENT.
> + #endif
> +#endif
> +
> +#ifdef HAVE_LIBTRACEEVENT
> + if (trace_dat_output)
> + return trace_convert__perf2dat(input_name ? input_name : "perf.data",
> + trace_dat_output, &opts);
[Severity: High]
Does trace_convert__perf2dat() register the required callbacks for pipe mode?
Looking at the tool initialization in trace_convert__perf2dat():
/* Initialize tool with all required callbacks */
perf_tool__init(&tc.tool, /*ordered_events=*/true);
tc.tool.sample = process_sample_event;
If .attr and .feature are not overridden, won't incoming
perf_event_header_attr and feature events from a pipe be silently dropped,
preventing the creation of evlists/evsels and breaking event processing?
[Severity: High]
How does this code handle missing CPU topology features?
Looking at trace_convert__perf2dat():
if (trace_dat__init_cpu_buffers(session->header.env.nr_cpus_online) < 0) {
If the feature callback is not registered (like in pipe mode) or the CPU
topology feature wasn't recorded, wouldn't session->header.env.nr_cpus_online
remain 0?
If trace_dat__init_cpu_buffers(0) succeeds with 0 elements, won't
trace_dat__collect_cpu_event() later check cpu >= trace_dat_nr_cpus (0 >= 0)
and return -EINVAL, causing process_sample_event to return -ENOMEM and abort
the conversion?
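One possible mitigation is to size the per-CPU table on demand from the CPU
ids actually seen in samples, instead of trusting a header field that may be
0. This is a sketch under assumptions: struct cpu_table, cpu_table__ensure()
and SKETCH_MAX_CPUS are hypothetical names, and the cap is an arbitrary guard
against absurd ids such as an unset sample CPU.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_CPUS 65536u	/* hypothetical upper bound, not from the patch */

struct cpu_slot {
	size_t count;		/* placeholder for real per-CPU state */
};

struct cpu_table {
	struct cpu_slot *slots;
	unsigned int nr;
};

/*
 * Hypothetical sketch: make sure the table has a zero-initialized slot
 * for 'cpu', growing it if needed. Existing slots are preserved.
 */
static int cpu_table__ensure(struct cpu_table *t, unsigned int cpu)
{
	struct cpu_slot *tmp;
	unsigned int new_nr;

	assert(t != NULL);

	if (cpu < t->nr)
		return 0;
	if (cpu >= SKETCH_MAX_CPUS)
		return -EINVAL;	/* also rejects an unset cpu of (unsigned)-1 */

	new_nr = cpu + 1;
	tmp = realloc(t->slots, (size_t)new_nr * sizeof(*tmp));
	if (!tmp)
		return -ENOMEM;	/* old table is untouched on failure */

	/* Zero only the newly added tail. */
	memset(tmp + t->nr, 0, (size_t)(new_nr - t->nr) * sizeof(*tmp));
	t->slots = tmp;
	t->nr = new_nr;
	return 0;
}
```

Independently of how the table is sized, process_sample_event() could return
the error code it receives from trace_dat__collect_cpu_event() instead of a
fixed -ENOMEM, so that an out-of-range CPU is not reported as an allocation
failure.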
[Severity: High]
Is it possible for a reallocation failure to cause a memory leak and
subsequent NULL pointer dereference?
Looking at trace_dat__collect_cpu_event():
cpu_events->events = realloc(cpu_events->events,
cpu_events->capacity * sizeof(*cpu_events->events));
if (!cpu_events->events)
return -ENOMEM;
If realloc fails, doesn't it overwrite cpu_events->events with NULL, leaking
the old array?
Furthermore, if this returns -ENOMEM and cascades up to trigger the cleanup
path in trace_dat__free_cpu_buffers():
for (i = 0; i < trace_cpu_data[cpu].count; i++)
free(trace_cpu_data[cpu].events[i].raw);
Since count was not reset, won't this loop access events[i] and dereference
the NULL pointer?
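The usual fix is to assign the realloc() result to a temporary and commit it
only on success. A minimal sketch follows; the field names events, count,
capacity and raw are taken from the snippets above, while the struct and
function names and the initial capacity of 64 are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct raw_event {
	void *raw;
	size_t size;
};

struct cpu_events {
	struct raw_event *events;
	size_t count;
	size_t capacity;
};

/*
 * Hypothetical sketch: ensure room for one more event. On failure the
 * old array, count and capacity are all left intact, so the cleanup
 * path can still walk and free every stored event.
 */
static int cpu_events__reserve(struct cpu_events *ce)
{
	struct raw_event *tmp;
	size_t new_cap;

	assert(ce != NULL);

	if (ce->count < ce->capacity)
		return 0;

	new_cap = ce->capacity ? ce->capacity * 2 : 64;
	tmp = realloc(ce->events, new_cap * sizeof(*tmp));
	if (!tmp)
		return -ENOMEM;	/* ce->events is still valid here */

	ce->events = tmp;
	ce->capacity = new_cap;
	return 0;
}

/* Free every stored payload, then the array itself. Safe when empty. */
static void cpu_events__free(struct cpu_events *ce)
{
	size_t i;

	for (i = 0; i < ce->count; i++)
		free(ce->events[i].raw);
	free(ce->events);
	ce->events = NULL;
	ce->count = 0;
	ce->capacity = 0;
}
```

Because count is only ever incremented after a slot has been filled, the free
loop never indexes past valid entries, and a failed grow neither leaks the old
array nor leaves a NULL pointer behind a non-zero count.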
> +#endif
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260608125951.90425-2-tshah@linux.ibm.com?part=4
* Re: [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion
2026-06-08 12:59 [RFC PATCH 0/4] perf: Add perf.data tracepoint events to trace.dat conversion Tanushree Shah
` (3 preceding siblings ...)
2026-06-08 12:59 ` [RFC PATCH 4/4] perf data: Add --to-trace-dat option for converting perf.data tracepoint events into trace.dat format Tanushree Shah
@ 2026-06-08 15:18 ` Ian Rogers
4 siblings, 0 replies; 10+ messages in thread
From: Ian Rogers @ 2026-06-08 15:18 UTC (permalink / raw)
To: Tanushree Shah
Cc: acme, jolsa, adrian.hunter, vmolnaro, mpetlan, tmricht, maddy,
namhyung, linux-kernel, linux-perf-users, linuxppc-dev, atrajeev,
hbathini, Tejas.Manhas1, Tanushree.Shah, Shivani.Nittor
On Mon, Jun 8, 2026 at 6:00 AM Tanushree Shah <tshah@linux.ibm.com> wrote:
>
> This RFC patch series introduces support for converting perf.data files
> containing tracepoint events into trace.dat format, enabling seamless
> visualization and analysis using KerneShark.
Thanks for doing this, this is a useful feature!
nit: typo KernelShark
>
> ======================
> Background and Motivation
> ======================
>
> Currently, perf and trace-cmd operate as separate tracing ecosystems with
> incompatible data formats. Users who collect tracepoint data with
> 'perf record' cannot easily visualize it in KernelShark's graphical
> timeline view or leverage trace-cmd's analysis capabilities.
>
> This creates workflow friction when users need to:
>
> - Visualize perf tracepoint data in KernelShark's interactive graphical
> timeline
> - Share trace data between perf and trace-cmd workflows and toolchains
> - Perform architecture-independent conversion and analysis of traces
>
> This conversion bridge eliminates these barriers by enabling seamless
> data exchange between perf and trace-cmd ecosystems, allowing users to
> choose the best tool for each analysis phase.
>
> ======================
> Implementation Overview
> ======================
>
> The series implements the trace.dat file format specification (version 7)
> within perf's data conversion framework.
>
> **Patch 1/4: Core trace.dat Export Infrastructure**
> Introduces util/trace-dat.c and util/trace-dat.h implementing:
> - Per-CPU raw event buffer management (init, collect, free)
> - Ftrace ring buffer page construction
> - trace.dat section writers (strings, options, flyrecord sections)
>
> **Patch 2/4: Metadata Integration**
> Extends util/trace-event-read.c to write trace.dat metadata during
> perf.data parsing:
> - Initial format header (magic, version, endian, page size, compression)
> - Section 16: HEADER INFO (header_page + header_event)
> - Section 17: FTRACE EVENT FORMATS
> - Section 18: EVENT FORMATS (per system/event format files)
> - Section 19: KALLSYMS
> - Section 21: CMDLINES
> - Section 15: STRINGS (written last after all sections)
>
> **Patch 3/4: Conversion Backend**
> Implements util/data-convert-trace.c with trace_convert__perf2dat()
> function:
> - Processes PERF_TYPE_TRACEPOINT samples via process_sample_event()
> - Collects raw event data per-CPU using trace_dat__collect_cpu_event()
> - Writes OPTIONS sections (CPUCOUNT, TRACECLOCK, metadata offsets)
> - Writes FLYRECORD section with per-CPU ring buffer pages
>
> **Patch 4/4: User Interface**
> Extends tools/perf/builtin-data.c with --to-trace-dat option:
> - Adds command-line option for trace.dat output
> - Mutually exclusive with --to-ctf and --to-json
> - Calls trace_convert__perf2dat() to perform conversion
>
> ======================
> Current Implementation Details
> ======================
>
> **trace.dat Format Version:**
> The implementation currently targets trace.dat format version 7, which
> is the stable version supported by current trace-cmd releases (v3.x).
> This version is hardcoded to ensure compatibility with existing
> trace-cmd and KernelShark installations. Future enhancements could add
> version negotiation or support for newer format versions as they become
> standardized.
>
> **Compression Strategy:**
> Compression is explicitly disabled (set to NONE) in the generated
> trace.dat files.
> This design choice:
> - Simplifies the initial implementation and testing
> - Ensures maximum compatibility across trace-cmd versions
> - Avoids external compression library dependencies
>
> Future work could add support for various compression algorithms (zlib,
> zstd, lz4) with runtime selection via command-line options, significantly
> reducing file sizes for large traces.
>
> ======================
> Usage Example
> ======================
>
> ```bash
> # Record tracepoint events with perf
> perf record -e sched:sched_switch -e sched:sched_wakeup -a sleep 10
>
> # Convert to trace.dat format
> perf data convert --to-trace-dat=output.dat
>
> # Verify trace.dat structure
> trace-cmd dump --summary output.dat
>
> # Analyze with trace-cmd
> trace-cmd report output.dat
>
> # Visualize in KernelShark
> kernelshark output.dat
> ```
>
> **Conversion Output:**
> ```
> [ perf data convert: Converted 'perf.data' into trace.dat format 'output.dat' ]
> [ perf data convert: Converted 2684 events ]
> ```
> **trace-cmd dump --summary Output:**
> ```
> Tracing meta data in file output.dat:
> [Initial format]
> 7 [Version]
> 0 [Little endian]
> 8 [Bytes in a long]
> 65536 [Page size, bytes]
> none [Compression algorithm]
> [Compression version]
> [buffer "", "local" clock, 65536 page size, 16 cpus, 1048576 bytes flyrecord data]
> [10 options]
> [Saved command lines, 0 bytes]
> [Kallsyms, 0 bytes]
> [Ftrace format, 0 events]
> [Header page, 206 bytes]
> [Header event, 205 bytes]
> [Events format, 1 systems]
> [9 sections]
> ```
> ======================
> Testing and Verification
> ======================
>
> The series has been extensively tested with:
> - Various tracepoint events (sched, irq, syscalls, block I/O)
> - Mixed recordings containing both tracepoint and non-tracepoint events
> (only tracepoints converted)
> - Verification with trace-cmd report and KernelShark visualization
> - Memory leak testing with Valgrind (0 bytes leaked)
> - Cross-architecture testing (x86_64, ppc64le)
It seems that some of this could be a test to give coverage of the
feature. We have similar tests for other converters:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/test_perf_data_converter_ctf.sh?h=perf-tools-next
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/test_perf_data_converter_json.sh?h=perf-tools-next
I think Sashiko has caught some coding issues, so I'll hold off on a
full review until the churn from Sashiko subsides.
Thanks!
Ian
> All generated trace.dat files successfully open in:
> - trace-cmd report (v3.1+)
> - KernelShark (v2.0+)
>
> ======================
> Next Steps
> ======================
>
> We would highly appreciate reviews, comments, and feedback on:
> - The overall architectural approach and integration points
> - Compatibility considerations with trace-cmd ecosystem
> - Performance characteristics for large-scale traces
> - Additional use cases or workflow scenarios
> - Future enhancement priorities
>
> Tanushree Shah (4):
> perf/trace-dat: Add trace.dat export infrastructure
> perf/trace-event: Write trace.dat metadata sections during parsing
> perf data-convert: Add perf.data to trace.dat conversion backend
> perf data: Add --to-trace-dat option for converting perf.data
> tracepoint events into trace.dat format
>
> tools/perf/builtin-data.c | 38 +-
> tools/perf/util/Build | 2 +
> tools/perf/util/data-convert-trace.c | 152 ++++++
> tools/perf/util/data-convert.h | 4 +
> tools/perf/util/trace-dat.c | 705 +++++++++++++++++++++++++++
> tools/perf/util/trace-dat.h | 79 +++
> tools/perf/util/trace-event-read.c | 259 +++++++++-
> 7 files changed, 1230 insertions(+), 9 deletions(-)
> create mode 100644 tools/perf/util/data-convert-trace.c
> create mode 100644 tools/perf/util/trace-dat.c
> create mode 100644 tools/perf/util/trace-dat.h
>
> --
> 2.53.0
>