* [PATCH 00/14] Add interface to expose vpa dtl counters via perf
From: Athira Rajeev @ 2025-08-15 8:33 UTC
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
The pseries Shared Processor Logical Partition (SPLPAR) machines can
retrieve a log of dispatch and preempt events from the hypervisor
using data from the Dispatch Trace Log (DTL) buffer. With this information,
the user can determine when and why each dispatch and preempt occurred.
The vpa-dtl PMU exposes the Virtual Processor Area (VPA) DTL counters
via perf.
- Patches 1 to 6 contain the powerpc PMU driver code changes to capture the
DTL trace in perf.data. Patch 14 contains the documentation update.
- Patches 7 to 13 contain the perf tools side code changes to enable perf
report/script on the perf.data file.
Infrastructure used
===================
The VPA DTL PMU counters do not interrupt on overflow or generate any
PMI interrupts. Therefore, an hrtimer is used to poll the DTL data. The timer
interval can be provided by the user via the sample_period field, in nanoseconds.
The vpa-dtl PMU has one hrtimer per vpa-dtl PMU thread. The DTL (Dispatch
Trace Log) contains information about dispatch/preempt, enqueue time etc.
We directly copy the DTL buffer data into the auxiliary buffer and it
is processed later. This avoids the time taken to create samples
in kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
entries makes use of the AUX support in the perf infrastructure. On the tools
side, this data is made available as PERF_RECORD_AUXTRACE records.
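As a rough illustration of how the poll interval follows from sample_period,
here is a minimal user-space sketch. It mirrors the max_t() clamp and the
freq-to-period mapping used by the driver in patch 02; the helper names
dtl_poll_period_ns() and dtl_freq_to_period_ns() are made up for this example
and are not part of the series.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL
#define NSEC_PER_SEC  1000000000ULL

/*
 * Hypothetical helper: interval (in ns) the poll timer would use for a
 * given sample_period. Mirrors max_t(u64, NSEC_PER_MSEC, sample_period),
 * so anything below 1 ms is raised to 1 ms.
 */
static uint64_t dtl_poll_period_ns(uint64_t sample_period_ns)
{
	return sample_period_ns < NSEC_PER_MSEC ? NSEC_PER_MSEC
						: sample_period_ns;
}

/*
 * Hypothetical helper: static frequency -> period mapping, as done for
 * attr.freq events (period = NSEC_PER_SEC / sample_freq).
 */
static uint64_t dtl_freq_to_period_ns(uint64_t sample_freq_hz)
{
	assert(sample_freq_hz != 0);	/* a zero frequency has no period */
	return NSEC_PER_SEC / sample_freq_hz;
}
```

With "-c 1000000000" as in the usage example further down, this gives a
one second poll interval, while a period shorter than 1 ms is raised to 1 ms.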
To correlate each DTL entry with other events across CPUs, an auxtrace_queue
is created for each CPU. Each auxtrace queue has an array/list of auxtrace buffers.
All auxtrace queues are maintained in the auxtrace heap. The queues are sorted
based on timestamp. When the different PERF_RECORD_XX records are processed,
the timestamp of the perf record is compared with the timestamp of the top
element in the auxtrace heap so that DTL events can be correlated with other
events. The auxtrace queue is processed if the timestamp of the element from
the heap is lower than the timestamp of the perf record. Sometimes a buffer is
only partially processed: once the entry currently being processed in the
queue is no longer older than the perf record, processing moves on to the
next perf record. The position in the buffer is therefore tracked so that
processing can continue from there next time. The timestamp of the auxtrace
heap is updated with the timestamp of the last processed entry from the
auxtrace buffer.
This infrastructure ensures dispatch trace log entries can be correlated
and presented along with other events like sched.
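The ordering described above can be sketched in user space as follows. This
is only a simplified model: it picks the earliest queue with a linear scan
instead of the auxtrace heap, and struct dtl_queue, earliest_queue() and
process_until() are illustrative names, not the ones used in the perf tools
patches.

```c
#include <stddef.h>
#include <stdint.h>

/* One per-CPU queue: entry timestamps in ascending order. */
struct dtl_queue {
	const uint64_t *ts;	/* timestamps of the DTL entries */
	size_t nr;		/* number of entries */
	size_t pos;		/* next unprocessed entry (kept across calls) */
};

/* Return the index of the queue with the earliest pending entry, or -1. */
static int earliest_queue(const struct dtl_queue *q, size_t nr_queues)
{
	int best = -1;

	for (size_t i = 0; i < nr_queues; i++) {
		if (q[i].pos >= q[i].nr)
			continue;	/* queue fully processed */
		if (best < 0 || q[i].ts[q[i].pos] < q[best].ts[q[best].pos])
			best = (int)i;
	}
	return best;
}

/*
 * Deliver every pending DTL entry that is older than record_ts, in
 * timestamp order across all queues. Entries at or after record_ts stay
 * queued; q[i].pos remembers where to resume for the next perf record.
 * Returns the number of timestamps written to out[].
 */
static size_t process_until(struct dtl_queue *q, size_t nr_queues,
			    uint64_t record_ts, uint64_t *out, size_t out_cap)
{
	size_t n = 0;

	while (n < out_cap) {
		int i = earliest_queue(q, nr_queues);

		if (i < 0 || q[i].ts[q[i].pos] >= record_ts)
			break;
		out[n++] = q[i].ts[q[i].pos++];
	}
	return n;
}
```

Calling process_until() once per perf record, with that record's timestamp,
interleaves the DTL entries of all CPUs with the other events in time order.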
vpa-dtl PMU example usage
# ls /sys/devices/vpa_dtl/
events format perf_event_mux_interval_ms power subsystem type uevent
To capture the DTL data using perf record:
# ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
The result can be interpreted using perf report. Snippet of perf report -D:
# ./perf report -D
There are different PERF_RECORD_XX records. The records that correspond to
auxtrace buffers include:
1. PERF_RECORD_AUX
Conveys that new data is available in AUX area
2. PERF_RECORD_AUXTRACE_INFO
Describes offset and size of auxtrace data in the buffers
3. PERF_RECORD_AUXTRACE
This is the record that carries the auxtrace data, which for the
vpa-dtl PMU is the dispatch trace log data.
Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump
0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
.
. ... VPA DTL PMU data: size 1680 bytes, entries is 35
. 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
. 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
. 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
. 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
. 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
. 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
. 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
. 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
. 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
The above is a representation of DTL entries in the following format:
struct dtl_entry {
u8 dispatch_reason;
u8 preempt_reason;
u16 processor_id;
u32 enqueue_to_dispatch_time;
u32 ready_to_enqueue_time;
u32 waiting_to_ready_time;
u64 timebase;
u64 fault_addr;
u64 srr0;
u64 srr1;
};
The first two fields represent the dispatch reason and the preempt reason. The
post-processing of PERF_RECORD_AUXTRACE records translates them into meaningful
data for the user to consume.
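To make the layout concrete, here is a small user-space sketch that decodes
one 48-byte (0x30) entry and converts its timebase to nanoseconds since boot.
It assumes the fields are stored big-endian, as written by the hypervisor,
and that the conversion is (timebase - boot_tb) scaled by tb_freq. The names
be_read(), dtl_decode(), dtl_nr_entries() and dtl_tb_to_ns() are made up for
this example and are not taken from the perf tools patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DTL_ENTRY_SIZE 48	/* 0x30, matches the offsets in the dump */

struct dtl_entry {
	uint8_t  dispatch_reason;
	uint8_t  preempt_reason;
	uint16_t processor_id;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
	uint32_t waiting_to_ready_time;
	uint64_t timebase;
	uint64_t fault_addr;
	uint64_t srr0;
	uint64_t srr1;
};

/* Read an n-byte big-endian unsigned integer (n <= 8). */
static uint64_t be_read(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	for (size_t i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode one raw entry (assumed big-endian) into host byte order. */
static void dtl_decode(const uint8_t *p, struct dtl_entry *e)
{
	e->dispatch_reason          = p[0];
	e->preempt_reason           = p[1];
	e->processor_id             = (uint16_t)be_read(p + 2, 2);
	e->enqueue_to_dispatch_time = (uint32_t)be_read(p + 4, 4);
	e->ready_to_enqueue_time    = (uint32_t)be_read(p + 8, 4);
	e->waiting_to_ready_time    = (uint32_t)be_read(p + 12, 4);
	e->timebase                 = be_read(p + 16, 8);
	e->fault_addr               = be_read(p + 24, 8);
	e->srr0                     = be_read(p + 32, 8);
	e->srr1                     = be_read(p + 40, 8);
}

/* Number of entry-sized slots in an AUXTRACE payload of 'size' bytes. */
static size_t dtl_nr_entries(size_t size)
{
	return size / DTL_ENTRY_SIZE;
}

/* Timebase ticks since boot -> nanoseconds, split to avoid overflow. */
static uint64_t dtl_tb_to_ns(uint64_t timebase, uint64_t boot_tb,
			     uint64_t tb_freq)
{
	assert(tb_freq != 0 && timebase >= boot_tb);

	uint64_t delta = timebase - boot_tb;

	return (delta / tb_freq) * 1000000000ULL +
	       (delta % tb_freq) * 1000000000ULL / tb_freq;
}
```

For the dump above, dtl_nr_entries(1680) gives 35, which matches the
"entries is 35" line, and with tb_freq 512000000 one second corresponds to
512000000 timebase ticks.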
Visualize the dispatch trace log entries with perf report:
# ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.300 MB perf.data ]
# ./perf report
# Samples: 321 of event 'vpa-dtl'
# Event count (approx.): 321
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ..............................
#
100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
Visualize the dispatch trace log entries with perf script:
# ./perf script
perf 13322 [002] 233.835807: sched:sched_switch: perf:13322 [120] R ==> migration/2:27 [0]
migration/2 27 [002] 233.835811: sched:sched_migrate_task: comm=perf pid=13322 prio=120 orig_cpu=2 dest_cpu=3
migration/2 27 [002] 233.835818: sched:sched_stat_runtime: comm=migration/2 pid=27 runtime=9214 [ns]
migration/2 27 [002] 233.835819: sched:sched_switch: migration/2:27 [0] S ==> swapper/2:0 [120]
swapper 0 [002] 233.835822: vpa-dtl: timebase: 338954486062657 dispatch_reason:decrementer_interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:435, ready_to_enqueue_time:0, waiting_to_ready_time:34775058, processor_id: 202 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
swapper 0 [001] 233.835886: vpa-dtl: timebase: 338954486095398 dispatch_reason:priv_doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:542, ready_to_enqueue_time:0, waiting_to_ready_time:1245360, processor_id: 201 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
Aboorva Devarajan (1):
powerpc/time: Expose boot_tb via accessor
Athira Rajeev (11):
powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for
capturing DTL data
powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake
up is needed
tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc
tools/perf: process auxtrace events and display in perf report -D
tools/perf: Add event name as vpa-dtl of PERF_TYPE_SYNTH type to
present DTL samples
tools/perf: Allocate and setup aux buffer queue to help co-relate with
other events across CPU's
tools/perf: Process the DTL entries in queue and deliver samples
tools/perf: Add support for printing synth event details via default
callback
tools/perf: Enable perf script to present the DTL entries
powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
Kajol Jain (2):
powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs
event format entries for vpa_dtl pmu
.../sysfs-bus-event_source-devices-vpa-dtl | 25 +
Documentation/arch/powerpc/index.rst | 1 +
Documentation/arch/powerpc/vpa-dtl.rst | 155 ++++
arch/powerpc/include/asm/time.h | 2 +
arch/powerpc/kernel/time.c | 7 +-
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/vpa-dtl.c | 605 ++++++++++++++
tools/perf/arch/powerpc/util/Build | 1 +
tools/perf/arch/powerpc/util/auxtrace.c | 122 +++
tools/perf/builtin-script.c | 26 +
tools/perf/util/Build | 1 +
tools/perf/util/auxtrace.c | 4 +
tools/perf/util/auxtrace.h | 1 +
tools/perf/util/event.h | 1 +
tools/perf/util/powerpc-vpadtl.c | 756 ++++++++++++++++++
tools/perf/util/powerpc-vpadtl.h | 45 ++
16 files changed, 1752 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
create mode 100644 Documentation/arch/powerpc/vpa-dtl.rst
create mode 100644 arch/powerpc/perf/vpa-dtl.c
create mode 100644 tools/perf/arch/powerpc/util/auxtrace.c
create mode 100644 tools/perf/util/powerpc-vpadtl.c
create mode 100644 tools/perf/util/powerpc-vpadtl.h
--
2.47.1
* [PATCH 01/14] powerpc/time: Expose boot_tb via accessor
From: Athira Rajeev @ 2025-08-15 8:33 UTC
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
From: Aboorva Devarajan <aboorvad@linux.ibm.com>
- Define accessor function get_boot_tb() to safely return boot_tb.
- Tag boot_tb as __ro_after_init (only initialized once and never updated).
- Add a debug log to output the boot timebase during initialization.
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
arch/powerpc/include/asm/time.h | 2 ++
arch/powerpc/kernel/time.c | 7 ++++++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index f8885586efaf..31bb6be4d355 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -29,6 +29,8 @@ extern u64 decrementer_max;
extern void generic_calibrate_decr(void);
+extern u64 get_boot_tb(void);
+
/* Some sane defaults: 125 MHz timebase, 1GHz processor */
extern unsigned long ppc_proc_freq;
#define DEFAULT_PROC_FREQ (DEFAULT_TB_FREQ * 8)
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 8224381c1dba..f5106b90e517 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -137,7 +137,7 @@ EXPORT_SYMBOL_GPL(rtc_lock);
static u64 tb_to_ns_scale __read_mostly;
static unsigned tb_to_ns_shift __read_mostly;
-static u64 boot_tb __read_mostly;
+static u64 boot_tb __ro_after_init;
extern struct timezone sys_tz;
static long timezone_offset;
@@ -639,6 +639,10 @@ notrace unsigned long long sched_clock(void)
return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
}
+u64 get_boot_tb(void)
+{
+ return boot_tb;
+}
#ifdef CONFIG_PPC_PSERIES
@@ -975,6 +979,7 @@ void __init time_init(void)
tb_to_ns_shift = shift;
/* Save the current timebase to pretty up CONFIG_PRINTK_TIME */
boot_tb = get_tb();
+ pr_debug("%s: timebase at boot: %llu\n", __func__, (unsigned long long)boot_tb);
/* If platform provided a timezone (pmac), we correct the time */
if (timezone_offset) {
--
2.47.1
* [PATCH 02/14] powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
From: Athira Rajeev @ 2025-08-15 8:33 UTC
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
From: Kajol Jain <kjain@linux.ibm.com>
The pseries Shared Processor Logical Partition (SPLPAR) machines
can retrieve a log of dispatch and preempt events from the
hypervisor using data from the Dispatch Trace Log (DTL) buffer.
With this information, the user can determine when and why each dispatch
and preempt occurred. Add an interface to expose the Virtual Processor
Area (VPA) DTL counters via perf.
The following events are available and exposed in sysfs:
vpa_dtl/dtl_cede/ - Trace voluntary (OS initiated) virtual processor waits
vpa_dtl/dtl_preempt/ - Trace time slice preempts
vpa_dtl/dtl_fault/ - Trace virtual partition memory page faults.
vpa_dtl/dtl_all/ - Trace all (dtl_cede/dtl_preempt/dtl_fault)
The interface defines the supported event list, the config fields for the
event attributes and their corresponding bit values, which are exported
via sysfs. Users can use the standard perf tool to access the perf events
exposed via the vpa-dtl PMU.
The VPA DTL PMU counters do not interrupt on overflow or generate any
PMI interrupts. Therefore, the kernel needs to poll the counters; hrtimer
code is added to do that. The timer interval can be provided by the user via
the sample_period field, in nanoseconds. There is one hrtimer per
vpa-dtl PMU thread.
To ensure there are no other conflicting DTL users (for example, debugfs dtl
or /proc/powerpc/vcpudispatch_stats), the interface uses the
"down_write_trylock" call to take the dtl_access_lock. The dtl_access_lock
is defined in the dtl.h file. A global reference count variable called
"dtl_global_refc" is also added, to ensure DTL data can be captured per-cpu,
along with a global lock called "dtl_global_lock" to avoid race conditions.
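The bookkeeping around dtl_access_lock and dtl_global_refc can be modelled
with the following single-threaded sketch. It is only a model: a bool stands
in for the write-held rwsem, there is no real locking, and struct dtl_state,
dtl_get() and dtl_put() are hypothetical names used for illustration.

```c
#include <errno.h>
#include <stdbool.h>

struct dtl_state {
	bool access_locked;	/* stands in for dtl_access_lock held for write */
	int  refc;		/* stands in for dtl_global_refc */
};

/*
 * Modelled on the event_init path: the first vpa_dtl user must win the
 * trylock; later users only bump the reference count.
 */
static int dtl_get(struct dtl_state *s)
{
	if (s->refc == 0) {
		if (s->access_locked)
			return -EBUSY;	/* another DTL user holds the lock */
		s->access_locked = true;
	}
	s->refc++;
	return 0;
}

/*
 * Modelled on the event destroy path: the last user releases the lock.
 */
static void dtl_put(struct dtl_state *s)
{
	if (--s->refc <= 0) {
		s->refc = 0;
		s->access_locked = false;
	}
}
```

In this model a second event on another cpu succeeds while the first is
still active, and a pre-existing lock holder makes the first dtl_get()
return -EBUSY.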
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/vpa-dtl.c | 349 ++++++++++++++++++++++++++++++++++++
2 files changed, 350 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/perf/vpa-dtl.c
diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 7f53fcb7495a..78dd7e25219e 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_PPC_POWERNV) += imc-pmu.o
obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
-obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o vpa-dtl.o
obj-$(CONFIG_VPA_PMU) += vpa-pmu.o
diff --git a/arch/powerpc/perf/vpa-dtl.c b/arch/powerpc/perf/vpa-dtl.c
new file mode 100644
index 000000000000..e92756f88801
--- /dev/null
+++ b/arch/powerpc/perf/vpa-dtl.c
@@ -0,0 +1,349 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Perf interface to expose Dispatch Trace Log counters.
+ *
+ * Copyright (C) 2024 Kajol Jain, IBM Corporation
+ */
+
+#ifdef CONFIG_PPC_SPLPAR
+#define pr_fmt(fmt) "vpa_dtl: " fmt
+
+#include <asm/dtl.h>
+#include <linux/perf_event.h>
+#include <asm/plpar_wrappers.h>
+
+#define EVENT(_name, _code) enum{_name = _code}
+
+/*
+ * Based on Power Architecture Platform Reference(PAPR) documentation,
+ * Table 14.14. Per Virtual Processor Area, below Dispatch Trace Log(DTL)
+ * Enable Mask used to get corresponding virtual processor dispatch
+ * to preempt traces:
+ * DTL_CEDE(0x1): Trace voluntary (OS initiated) virtual
+ * processor waits
+ * DTL_PREEMPT(0x2): Trace time slice preempts
+ * DTL_FAULT(0x4): Trace virtual partition memory page
+ *                    faults.
+ * DTL_ALL(0x7): Trace all (DTL_CEDE | DTL_PREEMPT | DTL_FAULT)
+ *
+ * Event codes based on Dispatch Trace Log Enable Mask.
+ */
+EVENT(DTL_CEDE, 0x1);
+EVENT(DTL_PREEMPT, 0x2);
+EVENT(DTL_FAULT, 0x4);
+EVENT(DTL_ALL, 0x7);
+
+GENERIC_EVENT_ATTR(dtl_cede, DTL_CEDE);
+GENERIC_EVENT_ATTR(dtl_preempt, DTL_PREEMPT);
+GENERIC_EVENT_ATTR(dtl_fault, DTL_FAULT);
+GENERIC_EVENT_ATTR(dtl_all, DTL_ALL);
+
+PMU_FORMAT_ATTR(event, "config:0-7");
+
+static struct attribute *events_attr[] = {
+ GENERIC_EVENT_PTR(DTL_CEDE),
+ GENERIC_EVENT_PTR(DTL_PREEMPT),
+ GENERIC_EVENT_PTR(DTL_FAULT),
+ GENERIC_EVENT_PTR(DTL_ALL),
+ NULL
+};
+
+static struct attribute_group event_group = {
+ .name = "events",
+ .attrs = events_attr,
+};
+
+static struct attribute *format_attrs[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static const struct attribute_group format_group = {
+ .name = "format",
+ .attrs = format_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+ &format_group,
+ &event_group,
+ NULL,
+};
+
+struct vpa_dtl {
+ struct dtl_entry *buf;
+ u64 last_idx;
+ bool active_lock;
+};
+
+static DEFINE_PER_CPU(struct vpa_dtl, vpa_dtl_cpu);
+
+/* variable to capture reference count for the active dtl threads */
+static int dtl_global_refc;
+static spinlock_t dtl_global_lock = __SPIN_LOCK_UNLOCKED(dtl_global_lock);
+
+/*
+ * Function to dump the dispatch trace log buffer data to the
+ * perf data.
+ */
+static void vpa_dtl_dump_sample_data(struct perf_event *event)
+{
+ return;
+}
+
+/*
+ * The VPA Dispatch Trace log counters do not interrupt on overflow.
+ * Therefore, the kernel needs to poll the counters to avoid missing
+ * an overflow using hrtimer. The timer interval is based on sample_period
+ * count provided by user, and minimum interval is 1 millisecond.
+ */
+static enum hrtimer_restart vpa_dtl_hrtimer_handle(struct hrtimer *hrtimer)
+{
+ struct perf_event *event;
+ u64 period;
+
+ event = container_of(hrtimer, struct perf_event, hw.hrtimer);
+
+ if (event->state != PERF_EVENT_STATE_ACTIVE)
+ return HRTIMER_NORESTART;
+
+ vpa_dtl_dump_sample_data(event);
+ period = max_t(u64, NSEC_PER_MSEC, event->hw.sample_period);
+ hrtimer_forward_now(hrtimer, ns_to_ktime(period));
+
+ return HRTIMER_RESTART;
+}
+
+static void vpa_dtl_start_hrtimer(struct perf_event *event)
+{
+ u64 period;
+ struct hw_perf_event *hwc = &event->hw;
+
+ period = max_t(u64, NSEC_PER_MSEC, hwc->sample_period);
+ hrtimer_start(&hwc->hrtimer, ns_to_ktime(period), HRTIMER_MODE_REL_PINNED);
+}
+
+static void vpa_dtl_stop_hrtimer(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ hrtimer_cancel(&hwc->hrtimer);
+}
+
+static void vpa_dtl_reset_global_refc(struct perf_event *event)
+{
+ spin_lock(&dtl_global_lock);
+ dtl_global_refc--;
+ if (dtl_global_refc <= 0) {
+ dtl_global_refc = 0;
+ up_write(&dtl_access_lock);
+ }
+ spin_unlock(&dtl_global_lock);
+}
+
+/* Allocate dtl buffer memory for given cpu. */
+static int vpa_dtl_mem_alloc(int cpu)
+{
+ struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, cpu);
+ struct dtl_entry *buf = NULL;
+
+ /* Check for dispatch trace log buffer cache */
+ if (!dtl_cache)
+ return -ENOMEM;
+
+ buf = kmem_cache_alloc_node(dtl_cache, GFP_KERNEL, cpu_to_node(cpu));
+ if (!buf) {
+ pr_warn("buffer allocation failed for cpu %d\n", cpu);
+ return -ENOMEM;
+ }
+ dtl->buf = buf;
+ return 0;
+}
+
+static int vpa_dtl_event_init(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
+
+ /* test the event attr type for PMU enumeration */
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ if (!perfmon_capable())
+ return -EACCES;
+
+ /* Return if this is a counting event */
+ if (!is_sampling_event(event))
+ return -EOPNOTSUPP;
+
+ /* no branch sampling */
+ if (has_branch_stack(event))
+ return -EOPNOTSUPP;
+
+ /* Invalid eventcode */
+ switch (event->attr.config) {
+ case DTL_LOG_CEDE:
+ case DTL_LOG_PREEMPT:
+ case DTL_LOG_FAULT:
+ case DTL_LOG_ALL:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ spin_lock(&dtl_global_lock);
+
+ /*
+ * To ensure there are no other conflicting dtl users
+ * (example: /proc/powerpc/vcpudispatch_stats or debugfs dtl),
+ * the code below tries to take the dtl_access_lock.
+ * The dtl_access_lock is a rwlock defined in dtl.h, which is used
+ * to ensure there are no conflicting dtl users.
+ * Based on below code, vpa_dtl pmu tries to take write access lock
+ * and also checks for dtl_global_refc, to make sure that the
+ * dtl_access_lock is taken by vpa_dtl pmu interface.
+ */
+ if (dtl_global_refc == 0 && !down_write_trylock(&dtl_access_lock)) {
+ spin_unlock(&dtl_global_lock);
+ return -EBUSY;
+ }
+
+ /* Allocate dtl buffer memory */
+ if (vpa_dtl_mem_alloc(event->cpu)) {
+ spin_unlock(&dtl_global_lock);
+ return -ENOMEM;
+ }
+
+ /*
+ * Increment the number of active vpa_dtl pmu threads. The
+ * dtl_global_refc is used to keep count of cpu threads that
+ * currently capturing dtl data using vpa_dtl pmu interface.
+ */
+ dtl_global_refc++;
+
+ /*
+ * active_lock is a per cpu variable which is set if
+ * current cpu is running vpa_dtl perf record session.
+ */
+ dtl->active_lock = true;
+ spin_unlock(&dtl_global_lock);
+
+ hrtimer_setup(&hwc->hrtimer, vpa_dtl_hrtimer_handle, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+ /*
+ * Since hrtimers have a fixed rate, we can do a static freq->period
+ * mapping and avoid the whole period adjust feedback stuff.
+ */
+ if (event->attr.freq) {
+ long freq = event->attr.sample_freq;
+
+ event->attr.sample_period = NSEC_PER_SEC / freq;
+ hwc->sample_period = event->attr.sample_period;
+ local64_set(&hwc->period_left, hwc->sample_period);
+ hwc->last_period = hwc->sample_period;
+ event->attr.freq = 0;
+ }
+
+ event->destroy = vpa_dtl_reset_global_refc;
+ return 0;
+}
+
+static int vpa_dtl_event_add(struct perf_event *event, int flags)
+{
+ int ret, hwcpu;
+ unsigned long addr;
+ struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
+
+ /*
+ * Register our dtl buffer with the hypervisor. The
+ * HV expects the buffer size to be passed in the second
+ * word of the buffer. Refer section '14.11.3.2. H_REGISTER_VPA'
+ * from PAPR for more information.
+ */
+ ((u32 *)dtl->buf)[1] = cpu_to_be32(DISPATCH_LOG_BYTES);
+ dtl->last_idx = 0;
+
+ hwcpu = get_hard_smp_processor_id(event->cpu);
+ addr = __pa(dtl->buf);
+
+ ret = register_dtl(hwcpu, addr);
+ if (ret) {
+ pr_warn("DTL registration for cpu %d (hw %d) failed with %d\n",
+ event->cpu, hwcpu, ret);
+ return ret;
+ }
+
+ /* set our initial buffer indices */
+ lppaca_of(event->cpu).dtl_idx = 0;
+
+ /*
+ * Ensure that our updates to the lppaca fields have
+ * occurred before we actually enable the logging
+ */
+ smp_wmb();
+
+ /* enable event logging */
+ lppaca_of(event->cpu).dtl_enable_mask = event->attr.config;
+
+ vpa_dtl_start_hrtimer(event);
+
+ return 0;
+}
+
+static void vpa_dtl_event_del(struct perf_event *event, int flags)
+{
+ int hwcpu = get_hard_smp_processor_id(event->cpu);
+ struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
+
+ vpa_dtl_stop_hrtimer(event);
+ unregister_dtl(hwcpu);
+ kmem_cache_free(dtl_cache, dtl->buf);
+ dtl->buf = NULL;
+ lppaca_of(event->cpu).dtl_enable_mask = 0x0;
+ dtl->active_lock = false;
+}
+
+/*
+ * This function definition is empty as vpa_dtl_dump_sample_data
+ * is used to parse and dump the dispatch trace log data,
+ * to perf data.
+ */
+static void vpa_dtl_event_read(struct perf_event *event)
+{
+}
+
+static struct pmu vpa_dtl_pmu = {
+ .task_ctx_nr = perf_invalid_context,
+
+ .name = "vpa_dtl",
+ .attr_groups = attr_groups,
+ .event_init = vpa_dtl_event_init,
+ .add = vpa_dtl_event_add,
+ .del = vpa_dtl_event_del,
+ .read = vpa_dtl_event_read,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_EXCLUSIVE,
+};
+
+static int vpa_dtl_init(void)
+{
+ int r;
+
+ if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
+ pr_debug("not a shared virtualized system, not enabling\n");
+ return -ENODEV;
+ }
+
+ /* This driver is intended only for L1 host. */
+ if (is_kvm_guest()) {
+ pr_debug("Only supported for L1 host system\n");
+ return -ENODEV;
+ }
+
+ r = perf_pmu_register(&vpa_dtl_pmu, vpa_dtl_pmu.name, -1);
+ if (r)
+ return r;
+
+ return 0;
+}
+
+device_initcall(vpa_dtl_init);
+#endif //CONFIG_PPC_SPLPAR
--
2.47.1
* [PATCH 03/14] docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
From: Athira Rajeev @ 2025-08-15 8:33 UTC
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
From: Kajol Jain <kjain@linux.ibm.com>
Details are added for the vpa_dtl pmu event and format
attributes in the ABI documentation.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
.../sysfs-bus-event_source-devices-vpa-dtl | 25 +++++++++++++++++++
1 file changed, 25 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl b/Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
new file mode 100644
index 000000000000..7b7c789a5cf5
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
@@ -0,0 +1,25 @@
+What: /sys/bus/event_source/devices/vpa_dtl/format
+Date: February 2025
+Contact: Linux on PowerPC Developer List <linuxppc-dev at lists.ozlabs.org>
+Description: Read-only. Attribute group to describe the magic bits
+ that go into perf_event_attr.config for a particular pmu.
+ (See ABI/testing/sysfs-bus-event_source-devices-format).
+
+ Each attribute under this group defines a bit range of the
+ perf_event_attr.config. Supported attribute are listed
+ below::
+
+ event = "config:0-7" - event ID
+
+ For example::
+
+ dtl_cede = "event=0x1"
+
+What: /sys/bus/event_source/devices/vpa_dtl/events
+Date: February 2025
+Contact: Linux on PowerPC Developer List <linuxppc-dev at lists.ozlabs.org>
+Description: (RO) Attribute group to describe performance monitoring events
+ for the Virtual Processor Dispatch Trace Log. Each attribute in
+ this group describes a single performance monitoring event
+ supported by vpa_dtl pmu. The name of the file is the name of
+ the event (See ABI/testing/sysfs-bus-event_source-devices-events).
--
2.47.1
* [PATCH 04/14] powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data
From: Athira Rajeev @ 2025-08-15 8:33 UTC
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
The vpa-dtl PMU has one hrtimer per vpa-dtl PMU thread. When the
hrtimer expires, the timer handler saves the DTL
data to a perf event record. The DTL (Dispatch Trace Log) contains information
about dispatch/preempt, enqueue time etc. We directly copy the DTL
buffer data into the auxiliary buffer and it is post-processed
later. To enable support for the aux buffer, add the PMU callbacks for
setup_aux and free_aux.
In setup_aux, set up the pmu-private data structures for an AUX
area. rb_alloc_aux uses "alloc_pages_node" and returns a pointer to each
page address. Map these pages to contiguous space using vmap and use
that as the base address. The aux private data structure, i.e.
"struct vpa_pmu_buf", mainly saves:
1. buf->base: aux buffer base address
2. buf->head: offset from base address where data will be written to.
3. buf->size: Size of allocated memory
free_aux will free pmu-private AUX data structures.
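How base, head and size work together can be illustrated with a minimal
user-space sketch. It only shows the offset bookkeeping: struct aux_buf and
aux_append() are hypothetical names, and the real copy into the AUX area is
added by a later patch in this series and may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the base/size/head part of struct vpa_pmu_buf. */
struct aux_buf {
	uint8_t  *base;	/* start of the contiguous mapping */
	uint64_t  size;	/* total bytes available */
	uint64_t  head;	/* offset from base where the next write goes */
};

/*
 * Append len bytes at base + head and advance head. Returns false,
 * leaving the buffer untouched, when the data does not fit; a caller
 * would then have to hand the buffer over and start again at head 0.
 */
static bool aux_append(struct aux_buf *buf, const void *data, size_t len)
{
	if (len > buf->size - buf->head)
		return false;

	memcpy(buf->base + buf->head, data, len);
	buf->head += len;
	return true;
}
```

With a buffer sized for two 48-byte entries, two appends succeed and move
head to 48 and then 96, and a third append reports that the buffer is full.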
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
arch/powerpc/perf/vpa-dtl.c | 77 +++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
diff --git a/arch/powerpc/perf/vpa-dtl.c b/arch/powerpc/perf/vpa-dtl.c
index e92756f88801..364242cbfa8a 100644
--- a/arch/powerpc/perf/vpa-dtl.c
+++ b/arch/powerpc/perf/vpa-dtl.c
@@ -11,6 +11,7 @@
#include <asm/dtl.h>
#include <linux/perf_event.h>
#include <asm/plpar_wrappers.h>
+#include <linux/vmalloc.h>
#define EVENT(_name, _code) enum{_name = _code}
@@ -75,6 +76,19 @@ struct vpa_dtl {
bool active_lock;
};
+struct vpa_pmu_ctx {
+ struct perf_output_handle handle;
+};
+
+struct vpa_pmu_buf {
+ int nr_pages;
+ bool snapshot;
+ u64 *base;
+ u64 size;
+ u64 head;
+};
+
+static DEFINE_PER_CPU(struct vpa_pmu_ctx, vpa_pmu_ctx);
static DEFINE_PER_CPU(struct vpa_dtl, vpa_dtl_cpu);
/* variable to capture reference count for the active dtl threads */
@@ -311,6 +325,67 @@ static void vpa_dtl_event_read(struct perf_event *event)
{
}
+/*
+ * Set up pmu-private data structures for an AUX area
+ * **pages contains the aux buffer allocated for this event
+ * for the corresponding cpu. rb_alloc_aux uses "alloc_pages_node"
+ * and returns pointer to each page address. Map these pages to
+ * contiguous space using vmap and use that as base address.
+ *
+ * The aux private data structure ie, "struct vpa_pmu_buf" mainly
+ * saves
+ * - buf->base: aux buffer base address
+ * - buf->head: offset from base address where data will be written to.
+ * - buf->size: Size of allocated memory
+ */
+static void *vpa_dtl_setup_aux(struct perf_event *event, void **pages,
+ int nr_pages, bool snapshot)
+{
+ int i, cpu = event->cpu;
+ struct vpa_pmu_buf *buf __free(kfree) = NULL;
+ struct page **pglist __free(kfree) = NULL;
+
+ /* We need at least one page for this to work. */
+ if (!nr_pages)
+ return NULL;
+
+ if (cpu == -1)
+ cpu = raw_smp_processor_id();
+
+ buf = kzalloc_node(sizeof(*buf), GFP_KERNEL, cpu_to_node(cpu));
+ if (!buf)
+ return NULL;
+
+ pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
+ if (!pglist)
+ return NULL;
+
+ for (i = 0; i < nr_pages; ++i)
+ pglist[i] = virt_to_page(pages[i]);
+
+ buf->base = vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
+ if (!buf->base)
+ return NULL;
+
+ buf->nr_pages = nr_pages;
+ buf->snapshot = false;
+
+ buf->size = nr_pages << PAGE_SHIFT;
+ buf->head = 0;
+ return no_free_ptr(buf);
+}
+
+/*
+ * free pmu-private AUX data structures
+ */
+static void vpa_dtl_free_aux(void *aux)
+{
+ struct vpa_pmu_buf *buf = aux;
+
+ vunmap(buf->base);
+ kfree(buf);
+}
+
static struct pmu vpa_dtl_pmu = {
.task_ctx_nr = perf_invalid_context,
@@ -320,6 +395,8 @@ static struct pmu vpa_dtl_pmu = {
.add = vpa_dtl_event_add,
.del = vpa_dtl_event_del,
.read = vpa_dtl_event_read,
+ .setup_aux = vpa_dtl_setup_aux,
+ .free_aux = vpa_dtl_free_aux,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_EXCLUSIVE,
};
--
2.47.1
* [PATCH 05/14] powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (3 preceding siblings ...)
2025-08-15 8:33 ` [PATCH 04/14] powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data Athira Rajeev
@ 2025-08-15 8:33 ` Athira Rajeev
2025-08-15 8:33 ` [PATCH 06/14] powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed Athira Rajeev
` (10 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 8:33 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
The vpa-dtl PMU has one hrtimer per vpa-dtl PMU thread. When the hrtimer
expires, the timer handler saves the DTL data to the perf event record
via the vpa_dtl_capture_aux() function.
The DTL (Dispatch Trace Log) contains information about dispatch and
preempt events, enqueue time, etc. The DTL buffer data is copied
directly into the auxiliary (AUX) buffer. Data is written to disk only
when the allocated buffer is full.
With this approach, all the DTL data is present as-is in perf.data.
The data is post-processed on the perf tools side when running
perf report/perf script, which avoids the time taken to create samples
in kernel space.
To correlate each DTL entry with other events across CPUs, we need to
map the timebase from "struct dtl_entry", which phyp provides, against
the boot timebase. This also needs the timebase frequency. Define
"struct boottb_freq" to save these details.
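As a rough illustration of the mapping described above, the sketch below converts a raw DTL timebase value into nanoseconds since boot using (timebase - boot timebase) / frequency. The helper name dtl_tb_to_ns() is hypothetical and not part of this series; it is a userspace model only, and it assumes a timebase frequency well below about 18 GHz so the remainder multiplication stays within 64 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: convert a raw DTL timebase
 * value to nanoseconds since boot, following
 *   (timebase from DTL entry - boot timebase) / frequency
 * The whole seconds and the sub-second remainder are scaled separately so
 * the intermediate products do not overflow 64 bits for realistic inputs.
 */
uint64_t dtl_tb_to_ns(uint64_t timebase, uint64_t boot_tb, uint64_t tb_freq)
{
	uint64_t delta;

	/* Reject a zero frequency and timestamps taken before boot_tb. */
	if (tb_freq == 0 || timebase < boot_tb)
		return 0;

	delta = timebase - boot_tb;

	return (delta / tb_freq) * 1000000000ULL +
	       (delta % tb_freq) * 1000000000ULL / tb_freq;
}
```

With a 512 MHz timebase, an entry stamped 512000000 ticks after boot_tb maps to one second after boot.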
Add changes to capture the Dispatch Trace Log details to the AUX buffer
in vpa_dtl_dump_sample_data(). The boot timebase and frequency need to be
saved only once, so add a field to the "vpa_pmu_buf" structure to
indicate this.
perf_aux_output_begin: This function is called before writing to the AUX
area. It returns the pointer to the AUX area private structure, i.e.
"struct vpa_pmu_buf", and obtains the output handle
(used in perf_aux_output_end). When capture completes in
vpa_dtl_capture_aux(), call perf_aux_output_end() to commit the recorded
data. perf_aux_output_end() moves the aux_head of
"struct perf_buffer" to indicate the size of data in the AUX buffer.
aux_tail is moved on the perf tools side when writing the data from
the AUX buffer to the perf.data file on disk.
It is the responsibility of the PMU driver to make sure data is copied
between perf_aux_output_begin and perf_aux_output_end.
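The copy that has to happen between perf_aux_output_begin() and perf_aux_output_end() reads from a circular log, so new entries may straddle the end of the ring. The sketch below is a userspace model of that two-step copy, not the kernel code: RING_ENTRIES stands in for N_DISPATCH_LOG with an arbitrary small value, entries are plain 64-bit values rather than "struct dtl_entry", and ring_copy_new() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_ENTRIES 8	/* stand-in for N_DISPATCH_LOG, value chosen for illustration */

/*
 * Copy the entries in [last_idx, cur_idx) from a circular log into a linear
 * destination that has room for RING_ENTRIES entries.
 * Returns the number of entries copied.
 */
size_t ring_copy_new(const uint64_t *ring, uint64_t last_idx,
		     uint64_t cur_idx, uint64_t *dst)
{
	uint64_t n_req, start, first;

	/* The writer lapped the reader: keep only the newest entries. */
	if (last_idx + RING_ENTRIES <= cur_idx)
		last_idx = cur_idx - RING_ENTRIES + 1;

	/* Nothing new was logged. */
	if (cur_idx <= last_idx)
		return 0;

	n_req = cur_idx - last_idx;
	start = last_idx % RING_ENTRIES;

	/* First the part that runs up to the end of the ring ... */
	first = n_req;
	if (start + n_req > RING_ENTRIES)
		first = RING_ENTRIES - start;
	memcpy(dst, &ring[start], first * sizeof(*ring));

	/* ... then whatever wrapped around to the start of the ring. */
	memcpy(dst + first, ring, (n_req - first) * sizeof(*ring));

	return (size_t)n_req;
}
```

The returned count, multiplied by the entry size, corresponds to the size that the driver would report when committing the data.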
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
arch/powerpc/perf/vpa-dtl.c | 131 +++++++++++++++++++++++++++++++++++-
1 file changed, 130 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/vpa-dtl.c b/arch/powerpc/perf/vpa-dtl.c
index 364242cbfa8a..ce17beddd4b4 100644
--- a/arch/powerpc/perf/vpa-dtl.c
+++ b/arch/powerpc/perf/vpa-dtl.c
@@ -86,6 +86,24 @@ struct vpa_pmu_buf {
u64 *base;
u64 size;
u64 head;
+ /* boot timebase and frequency need to be saved only once */
+ int boottb_freq_saved;
+};
+
+/*
+ * To correlate each DTL entry with other events across CPUs,
+ * we need to map the timebase from "struct dtl_entry", which phyp
+ * provides, against the boot timebase. This also needs the timebase frequency.
+ * Formula is: ((timebase from DTL entry - boot timebase) / frequency)
+ *
+ * To match the size of "struct dtl_entry" and ease post processing,
+ * the structure is padded with 24 bytes.
+ */
+struct boottb_freq {
+ u64 boot_tb;
+ u64 tb_freq;
+ u64 timebase;
+ u64 padded[3];
};
static DEFINE_PER_CPU(struct vpa_pmu_ctx, vpa_pmu_ctx);
@@ -95,13 +113,123 @@ static DEFINE_PER_CPU(struct vpa_dtl, vpa_dtl_cpu);
static int dtl_global_refc;
static spinlock_t dtl_global_lock = __SPIN_LOCK_UNLOCKED(dtl_global_lock);
+/*
+ * Capture DTL data in AUX buffer
+ */
+static void vpa_dtl_capture_aux(long *n_entries, struct vpa_pmu_buf *buf,
+ struct vpa_dtl *dtl, int index)
+{
+ struct dtl_entry *aux_copy_buf = (struct dtl_entry *)buf->base;
+
+ /*
+ * Copy to AUX buffer from per-thread address
+ */
+ memcpy(aux_copy_buf + buf->head, &dtl->buf[index], *n_entries * sizeof(struct dtl_entry));
+
+ buf->head += *n_entries;
+
+ return;
+}
+
/*
* Function to dump the dispatch trace log buffer data to the
* perf data.
+ *
+ * perf_aux_output_begin: This function is called before writing
+ * to AUX area. This returns the pointer to aux area private structure,
+ * i.e. "struct vpa_pmu_buf" here, which is set in setup_aux() function.
+ * The function obtains the output handle (used in perf_aux_output_end).
+ * When capture completes in vpa_dtl_capture_aux(), call perf_aux_output_end()
+ * to commit the recorded data.
+ *
+ * perf_aux_output_end: This function commits data by adjusting the
+ * aux_head of "struct perf_buffer". aux_tail will be moved in perf tools
+ * side when writing the data from aux buffer to perf.data file in disk.
+ *
+ * Here in the private aux structure, we maintain head to know where
+ * to copy data next time in the PMU driver. vpa_pmu_buf->head is moved to
+ * maintain the aux head for PMU driver. It is the responsibility of the PMU
+ * driver to make sure data is copied between perf_aux_output_begin and
+ * perf_aux_output_end.
+ *
+ * After data is copied in vpa_dtl_capture_aux() function, perf_aux_output_end()
+ * is called to move the aux_head of "struct perf_buffer" to indicate size of
+ * data in aux buffer. This will post a PERF_RECORD_AUX into the perf buffer.
+ * Data will be written to disk only when the allocated buffer is full.
+ *
+ * By this approach, all the DTL data will be present as-is in the
+ * perf.data. The data will be post-processed in perf tools side when doing
+ * perf report/perf script and this will avoid time taken to create samples
+ * in the kernel space.
*/
static void vpa_dtl_dump_sample_data(struct perf_event *event)
{
- return;
+ u64 cur_idx, last_idx, i;
+ u64 boot_tb;
+ struct boottb_freq boottb_freq;
+
+ /* actual number of entries read */
+ long n_read = 0, read_size = 0;
+
+ /* number of entries added to dtl buffer */
+ long n_req;
+
+ struct vpa_pmu_ctx *vpa_ctx = this_cpu_ptr(&vpa_pmu_ctx);
+
+ struct vpa_pmu_buf *aux_buf;
+
+ struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
+
+ cur_idx = be64_to_cpu(lppaca_of(event->cpu).dtl_idx);
+ last_idx = dtl->last_idx;
+
+ if (last_idx + N_DISPATCH_LOG <= cur_idx)
+ last_idx = cur_idx - N_DISPATCH_LOG + 1;
+
+ n_req = cur_idx - last_idx;
+
+ /* no new entry added to the buffer, return */
+ if (n_req <= 0)
+ return;
+
+ dtl->last_idx = last_idx + n_req;
+ boot_tb = get_boot_tb();
+
+ i = last_idx % N_DISPATCH_LOG;
+
+ aux_buf = perf_aux_output_begin(&vpa_ctx->handle, event);
+ if (!aux_buf) {
+ pr_debug("returning. no aux\n");
+ return;
+ }
+
+ if (!aux_buf->boottb_freq_saved) {
+ pr_debug("Copying boot tb to aux buffer: %lld\n", boot_tb);
+ /* Save boot_tb to convert raw timebase to its relative system boot time */
+ boottb_freq.boot_tb = boot_tb;
+ /* Save tb_ticks_per_sec to convert timebase to sec */
+ boottb_freq.tb_freq = tb_ticks_per_sec;
+ boottb_freq.timebase = 0;
+ memcpy(aux_buf->base, &boottb_freq, sizeof(boottb_freq));
+ aux_buf->head += 1;
+ aux_buf->boottb_freq_saved = 1;
+ n_read += 1;
+ }
+
+ /* read the tail of the buffer if we've wrapped */
+ if (i + n_req > N_DISPATCH_LOG) {
+ read_size = N_DISPATCH_LOG - i;
+ vpa_dtl_capture_aux(&read_size, aux_buf, dtl, i);
+ n_req -= read_size;
+ n_read += read_size;
+ i = 0;
+ }
+
+ /* .. and now the head */
+ vpa_dtl_capture_aux(&n_req, aux_buf, dtl, i);
+
+ /* Move the aux->head to indicate size of data in aux buffer */
+ perf_aux_output_end(&vpa_ctx->handle, (n_req + n_read) * sizeof(struct dtl_entry));
}
/*
@@ -372,6 +500,7 @@ static void *vpa_dtl_setup_aux(struct perf_event *event, void **pages,
buf->size = nr_pages << PAGE_SHIFT;
buf->head = 0;
+ buf->boottb_freq_saved = 0;
return no_free_ptr(buf);
}
--
2.47.1
* [PATCH 06/14] powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (4 preceding siblings ...)
2025-08-15 8:33 ` [PATCH 05/14] powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer Athira Rajeev
@ 2025-08-15 8:33 ` Athira Rajeev
2025-08-15 8:34 ` [PATCH 07/14] tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc Athira Rajeev
` (9 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 8:33 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
Handle the case when the aux buffer is about to become full and
data needs to be written to the data file. The perf_aux_output_begin()
function checks whether there is enough space depending on the values of
aux_wakeup and aux_watermark, which are part of "struct perf_buffer".
In order to track where to write in the aux buffer, add two fields
to "struct vpa_pmu_buf": "threshold", to indicate the total number of
DTL entries that can be contained in the aux buffer, and "full", to
indicate when the buffer is full. In perf_aux_output_end(), there
is a check to see if a wake up is needed based on the aux head value.
In vpa_dtl_capture_aux(), check if there is enough space to contain the
DTL data. If not, save as much data as fits in the available memory and
set full to true. The address used for copying to aux is
"aux_copy_buf + buf->head", so once the buffer is full, set head of the
private aux to zero so that the next data is copied to the beginning of
the buffer.
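The clamp-and-rewind behaviour described above can be modelled in userspace as follows. This is a simplified sketch and not the driver code: "struct model_buf" and model_capture() are hypothetical names, entries are plain 64-bit values, and only the "threshold", "full" and "head" handling is reproduced.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for "struct vpa_pmu_buf"; entries are plain u64 here. */
struct model_buf {
	uint64_t *base;		/* start of the destination buffer */
	uint64_t head;		/* next free slot, counted in entries */
	uint64_t threshold;	/* number of entries the buffer can hold */
	bool full;		/* set when a copy reaches the threshold */
};

/*
 * Copy up to *n_entries from src. If the request reaches the threshold,
 * copy only what fits, report the clamped count through *n_entries, mark
 * the buffer full and rewind head so the next copy starts at the beginning.
 */
void model_capture(struct model_buf *buf, const uint64_t *src, long *n_entries)
{
	if (buf->head + (uint64_t)*n_entries >= buf->threshold) {
		*n_entries = (long)(buf->threshold - buf->head);
		buf->full = true;
	}

	memcpy(buf->base + buf->head, src, (size_t)*n_entries * sizeof(*src));

	if (buf->full) {
		buf->head = 0;
		return;
	}

	buf->head += (uint64_t)*n_entries;
}
```

As in the patch, the caller is expected to clear "full" once it has committed the data, and the clamped count tells it how many entries were actually stored.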
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
arch/powerpc/perf/vpa-dtl.c | 54 +++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/perf/vpa-dtl.c b/arch/powerpc/perf/vpa-dtl.c
index ce17beddd4b4..d9651c5ef215 100644
--- a/arch/powerpc/perf/vpa-dtl.c
+++ b/arch/powerpc/perf/vpa-dtl.c
@@ -86,8 +86,11 @@ struct vpa_pmu_buf {
u64 *base;
u64 size;
u64 head;
+ u64 head_size;
/* boot timebase and frequency need to be saved only once */
int boottb_freq_saved;
+ u64 threshold;
+ bool full;
};
/*
@@ -121,11 +124,31 @@ static void vpa_dtl_capture_aux(long *n_entries, struct vpa_pmu_buf *buf,
{
struct dtl_entry *aux_copy_buf = (struct dtl_entry *)buf->base;
+ /*
+ * check if there is enough space to contain the
+ * DTL data. If not, save the data for available
+ * memory and set full to true.
+ */
+ if (buf->head + *n_entries >= buf->threshold) {
+ *n_entries = buf->threshold - buf->head;
+ buf->full = 1;
+ }
+
/*
* Copy to AUX buffer from per-thread address
*/
memcpy(aux_copy_buf + buf->head, &dtl->buf[index], *n_entries * sizeof(struct dtl_entry));
+ if (buf->full) {
+ /*
+ * Set head of private aux to zero when buffer is full
+ * so that next data will be copied to beginning of the
+ * buffer
+ */
+ buf->head = 0;
+ return;
+ }
+
buf->head += *n_entries;
return;
@@ -179,6 +202,7 @@ static void vpa_dtl_dump_sample_data(struct perf_event *event)
struct vpa_pmu_buf *aux_buf;
struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
+ u64 size;
cur_idx = be64_to_cpu(lppaca_of(event->cpu).dtl_idx);
last_idx = dtl->last_idx;
@@ -223,13 +247,37 @@ static void vpa_dtl_dump_sample_data(struct perf_event *event)
n_req -= read_size;
n_read += read_size;
i = 0;
+ if (aux_buf->full) {
+ size = (n_read * sizeof(struct dtl_entry));
+ if ((size + aux_buf->head_size) > aux_buf->size) {
+ size = aux_buf->size - aux_buf->head_size;
+ perf_aux_output_end(&vpa_ctx->handle, size);
+ aux_buf->head = 0;
+ aux_buf->head_size = 0;
+ } else {
+ aux_buf->head_size += (n_read * sizeof(struct dtl_entry));
+ perf_aux_output_end(&vpa_ctx->handle, n_read * sizeof(struct dtl_entry));
+ }
+ goto out;
+ }
}
/* .. and now the head */
vpa_dtl_capture_aux(&n_req, aux_buf, dtl, i);
- /* Move the aux->head to indicate size of data in aux buffer */
- perf_aux_output_end(&vpa_ctx->handle, (n_req + n_read) * sizeof(struct dtl_entry));
+ size = ((n_req + n_read) * sizeof(struct dtl_entry));
+ if ((size + aux_buf->head_size) > aux_buf->size) {
+ size = aux_buf->size - aux_buf->head_size;
+ perf_aux_output_end(&vpa_ctx->handle, size);
+ aux_buf->head = 0;
+ aux_buf->head_size = 0;
+ } else {
+ aux_buf->head_size += ((n_req + n_read) * sizeof(struct dtl_entry));
+ /* Move the aux->head to indicate size of data in aux buffer */
+ perf_aux_output_end(&vpa_ctx->handle, (n_req + n_read) * sizeof(struct dtl_entry));
+ }
+out:
+ aux_buf->full = 0;
}
/*
@@ -500,7 +548,9 @@ static void *vpa_dtl_setup_aux(struct perf_event *event, void **pages,
buf->size = nr_pages << PAGE_SHIFT;
buf->head = 0;
+ buf->head_size = 0;
buf->boottb_freq_saved = 0;
+ buf->threshold = ((buf->size - 32) / sizeof(struct dtl_entry));
return no_free_ptr(buf);
}
--
2.47.1
* [PATCH 07/14] tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (5 preceding siblings ...)
2025-08-15 8:33 ` [PATCH 06/14] powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed Athira Rajeev
@ 2025-08-15 8:34 ` Athira Rajeev
2025-08-27 17:27 ` Adrian Hunter
2025-08-15 8:34 ` [PATCH 08/14] tools/perf: process auxtrace events and display in perf report -D Athira Rajeev
` (8 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 8:34 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
The powerpc PMU collecting Dispatch Trace Log (DTL) entries makes use of
AUX support in perf infrastructure. The PMU driver has the functionality
to collect trace entries in the aux buffer. On the tools side, this data
is made available as PERF_RECORD_AUXTRACE records. This record is
generated by the "perf record" command. To enable the creation of
PERF_RECORD_AUXTRACE, add the function that initializes auxtrace records,
i.e. "auxtrace_record__init()". Fill in the other callbacks such as
info_priv_size, info_fill, free, recording options, etc. Define
auxtrace_type as PERF_AUXTRACE_VPA_PMU. Add a header file to define vpa
dtl pmu specific details.
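auxtrace_record__init() has to decide whether an event in the evlist belongs to the vpa_dtl PMU, for example "vpa_dtl/dtl_all/". One possible way to do that comparison without duplicating the event name is sketched below. event_is_vpa_dtl() is a hypothetical helper, not an existing perf function, and it assumes the event name starts with the PMU name followed by an optional '/'.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not an existing perf API: return true when the text
 * before the first '/' in evname is exactly "vpa_dtl". No memory is
 * allocated, so there is nothing to free afterwards.
 */
bool event_is_vpa_dtl(const char *evname)
{
	static const char pmu[] = "vpa_dtl";
	size_t len;

	if (!evname)
		return false;

	/* Length of the part before the first '/', or of the whole string. */
	len = strcspn(evname, "/");

	return len == sizeof(pmu) - 1 && !strncmp(evname, pmu, len);
}
```

Unlike a strtok() based split, this does not skip leading '/' characters, so a name such as "/vpa_dtl/" is not treated as a match.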
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
tools/perf/arch/powerpc/util/Build | 1 +
tools/perf/arch/powerpc/util/auxtrace.c | 122 ++++++++++++++++++++++++
tools/perf/util/auxtrace.c | 2 +
tools/perf/util/auxtrace.h | 1 +
tools/perf/util/powerpc-vpadtl.h | 26 +++++
5 files changed, 152 insertions(+)
create mode 100644 tools/perf/arch/powerpc/util/auxtrace.c
create mode 100644 tools/perf/util/powerpc-vpadtl.h
diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
index fdd6a77a3432..a5b0babd307e 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -10,3 +10,4 @@ perf-util-$(CONFIG_LIBDW) += skip-callchain-idx.o
perf-util-$(CONFIG_LIBUNWIND) += unwind-libunwind.o
perf-util-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
+perf-util-$(CONFIG_AUXTRACE) += auxtrace.o
diff --git a/tools/perf/arch/powerpc/util/auxtrace.c b/tools/perf/arch/powerpc/util/auxtrace.c
new file mode 100644
index 000000000000..ec8ec601fd08
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/auxtrace.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * VPA support
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+#include <time.h>
+
+#include "../../util/cpumap.h"
+#include "../../util/evsel.h"
+#include "../../util/evlist.h"
+#include "../../util/session.h"
+#include "../../util/util.h"
+#include "../../util/pmu.h"
+#include "../../util/debug.h"
+#include "../../util/auxtrace.h"
+#include "../../util/powerpc-vpadtl.h"
+#include "../../util/record.h"
+#include <internal/lib.h> // page_size
+
+#define KiB(x) ((x) * 1024)
+
+static int
+powerpc_vpadtl_parse_snapshot_options(struct auxtrace_record *itr __maybe_unused,
+ struct record_opts *opts __maybe_unused,
+ const char *str __maybe_unused)
+{
+ return 0;
+}
+
+static int
+powerpc_vpadtl_recording_options(struct auxtrace_record *ar __maybe_unused,
+ struct evlist *evlist __maybe_unused,
+ struct record_opts *opts)
+{
+ opts->full_auxtrace = true;
+
+ /*
+ * Default auxtrace_mmap_pages to 128 KiB,
+ * i.e. two pages with a 64 KiB page size
+ */
+ if (!opts->auxtrace_mmap_pages) {
+ opts->auxtrace_mmap_pages = KiB(128) / page_size;
+ if (opts->mmap_pages == UINT_MAX)
+ opts->mmap_pages = KiB(256) / page_size;
+ }
+
+ return 0;
+}
+
+static size_t powerpc_vpadtl_info_priv_size(struct auxtrace_record *itr __maybe_unused,
+ struct evlist *evlist __maybe_unused)
+{
+ return 0;
+}
+
+static int
+powerpc_vpadtl_info_fill(struct auxtrace_record *itr __maybe_unused,
+ struct perf_session *session __maybe_unused,
+ struct perf_record_auxtrace_info *auxtrace_info __maybe_unused,
+ size_t priv_size __maybe_unused)
+{
+ auxtrace_info->type = PERF_AUXTRACE_VPA_PMU;
+
+ return 0;
+}
+
+static u64 powerpc_vpadtl_reference(struct auxtrace_record *itr __maybe_unused)
+{
+ return 0;
+}
+
+static void powerpc_vpadtl_free(struct auxtrace_record *itr)
+{
+ free(itr);
+}
+
+struct auxtrace_record *auxtrace_record__init(struct evlist *evlist __maybe_unused,
+ int *err)
+{
+ struct auxtrace_record *aux;
+ struct evsel *pos;
+ char *pmu_name;
+ int found = 0;
+
+ evlist__for_each_entry(evlist, pos) {
+ pmu_name = strdup(pos->name);
+ pmu_name = strtok(pmu_name, "/");
+ if (!strcmp(pmu_name, "vpa_dtl")) {
+ found = 1;
+ pos->needs_auxtrace_mmap = true;
+ break;
+ }
+ }
+
+ if (!found)
+ return NULL;
+
+ /*
+ * To obtain the auxtrace buffer file descriptor, the auxtrace event
+ * must come first.
+ */
+ evlist__to_front(pos->evlist, pos);
+
+ aux = zalloc(sizeof(*aux));
+ if (aux == NULL) {
+ pr_debug("aux record is NULL\n");
+ *err = -ENOMEM;
+ return NULL;
+ }
+
+ aux->parse_snapshot_options = powerpc_vpadtl_parse_snapshot_options;
+ aux->recording_options = powerpc_vpadtl_recording_options;
+ aux->info_priv_size = powerpc_vpadtl_info_priv_size;
+ aux->info_fill = powerpc_vpadtl_info_fill;
+ aux->free = powerpc_vpadtl_free;
+ aux->reference = powerpc_vpadtl_reference;
+ return aux;
+}
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index ebd32f1b8f12..f587d386c5ef 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -55,6 +55,7 @@
#include "hisi-ptt.h"
#include "s390-cpumsf.h"
#include "util/mmap.h"
+#include "powerpc-vpadtl.h"
#include <linux/ctype.h>
#include "symbol/kallsyms.h"
@@ -1393,6 +1394,7 @@ int perf_event__process_auxtrace_info(struct perf_session *session,
case PERF_AUXTRACE_HISI_PTT:
err = hisi_ptt_process_auxtrace_info(event, session);
break;
+ case PERF_AUXTRACE_VPA_PMU:
case PERF_AUXTRACE_UNKNOWN:
default:
return -EINVAL;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index f001cbb68f8e..1f9ef473af77 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -50,6 +50,7 @@ enum auxtrace_type {
PERF_AUXTRACE_ARM_SPE,
PERF_AUXTRACE_S390_CPUMSF,
PERF_AUXTRACE_HISI_PTT,
+ PERF_AUXTRACE_VPA_PMU,
};
enum itrace_period_type {
diff --git a/tools/perf/util/powerpc-vpadtl.h b/tools/perf/util/powerpc-vpadtl.h
new file mode 100644
index 000000000000..625172adaba5
--- /dev/null
+++ b/tools/perf/util/powerpc-vpadtl.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * VPA DTL PMU Support
+ */
+
+#ifndef INCLUDE__PERF_POWERPC_VPADTL_H__
+#define INCLUDE__PERF_POWERPC_VPADTL_H__
+
+#define POWERPC_VPADTL_NAME "powerpc_vpadtl_"
+
+enum {
+ POWERPC_VPADTL_TYPE,
+ VPADTL_PER_CPU_MMAPS,
+ VPADTL_AUXTRACE_PRIV_MAX,
+};
+
+#define VPADTL_AUXTRACE_PRIV_SIZE (VPADTL_AUXTRACE_PRIV_MAX * sizeof(u64))
+
+union perf_event;
+struct perf_session;
+struct perf_pmu;
+
+int powerpc_vpadtl_process_auxtrace_info(union perf_event *event,
+ struct perf_session *session);
+
+#endif
--
2.47.1
* [PATCH 08/14] tools/perf: process auxtrace events and display in perf report -D
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (6 preceding siblings ...)
2025-08-15 8:34 ` [PATCH 07/14] tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc Athira Rajeev
@ 2025-08-15 8:34 ` Athira Rajeev
2025-08-27 17:28 ` Adrian Hunter
2025-08-15 8:34 ` [PATCH 09/14] tools/perf: Add event name as vpa-dtl of PERF_TYPE_SYNTH type to present DTL samples Athira Rajeev
` (7 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 8:34 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
Add vpa dtl pmu auxtrace process function for "perf report -D".
The auxtrace event processing functions are defined in file
"util/powerpc-vpadtl.c". Data structures used include "struct
powerpc_vpadtl_queue" and "struct powerpc_vpadtl" to store the auxtrace
buffers in queues. Different PERF_RECORD_XXX records are generated
during recording. PERF_RECORD_AUXTRACE_INFO is processed first
since it is of type perf_user_event_type and the perf session
delivers user events via perf_session__process_user_event() first.
Define function powerpc_vpadtl_process_auxtrace_info() to handle the
processing of PERF_RECORD_AUXTRACE_INFO records. In this function,
initialize the aux buffer queues using auxtrace_queues__init() and set up
the required infrastructure for aux data processing. The data is collected
per CPU and an auxtrace_queue is created for each CPU.
Define the powerpc_vpadtl_process_auxtrace_event() function to process
PERF_RECORD_AUXTRACE records. In this, add the event to the queue using
auxtrace_queues__add_event() and process the buffer in
powerpc_vpadtl_dump_event(). The first entry in the buffer, with
timebase as zero, has the boot timebase and frequency. The remaining data
is in the format of "struct dtl_entry". Define the translation for
dispatch_reasons and preempt_reasons, and report this when dump trace is
invoked via powerpc_vpadtl_dump().
Sample output:
./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.300 MB perf.data ]
./perf report -D
0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
.
. ... VPA DTL PMU data: size 1680 bytes, entries is 35
. 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
. 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
. 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
. 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
. 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
. 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
. 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
. 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
. 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
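The dump above walks the buffer in 48-byte records and translates the reason codes into names. A minimal userspace sketch of those two steps follows, assuming the "struct dtl_entry" layout and the dispatch reason table from this patch. dispatch_reason_name() and dtl_whole_entries() are hypothetical helpers, and the bounds check on the reason code is an addition for illustration rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of one DTL record, mirroring "struct dtl_entry" in the patch. */
struct dtl_entry {
	uint8_t  dispatch_reason;
	uint8_t  preempt_reason;
	uint16_t processor_id;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
	uint32_t waiting_to_ready_time;
	uint64_t timebase;
	uint64_t fault_addr;
	uint64_t srr0;
	uint64_t srr1;
};

/* The dump code steps through the buffer in 48-byte records. */
static_assert(sizeof(struct dtl_entry) == 48, "DTL record must be 48 bytes");

/* Names copied from the dispatch_reasons table in the patch. */
static const char *const dispatch_names[] = {
	"external_interrupt", "firmware_internal_event", "H_PROD",
	"decrementer_interrupt", "system_reset", "firmware_internal_event",
	"conferred_cycles", "time_slice", "virtual_memory_page_fault",
	"expropriated_adjunct", "priv_doorbell",
};

/* Hypothetical helper: translate a reason code, guarding the table bounds. */
const char *dispatch_reason_name(uint8_t reason)
{
	if (reason >= sizeof(dispatch_names) / sizeof(dispatch_names[0]))
		return "unknown";

	return dispatch_names[reason];
}

/* Hypothetical helper: number of complete records in len bytes. */
size_t dtl_whole_entries(size_t len)
{
	return len / sizeof(struct dtl_entry);
}
```

For the 1680-byte buffer in the sample output, dtl_whole_entries() gives 35, which matches the entry count printed by the dump.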
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
tools/perf/util/Build | 1 +
tools/perf/util/auxtrace.c | 2 +
tools/perf/util/powerpc-vpadtl.c | 299 +++++++++++++++++++++++++++++++
3 files changed, 302 insertions(+)
create mode 100644 tools/perf/util/powerpc-vpadtl.c
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 4959e7a990e4..5ead46dc98e7 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -136,6 +136,7 @@ perf-util-$(CONFIG_AUXTRACE) += arm-spe-decoder/
perf-util-$(CONFIG_AUXTRACE) += hisi-ptt.o
perf-util-$(CONFIG_AUXTRACE) += hisi-ptt-decoder/
perf-util-$(CONFIG_AUXTRACE) += s390-cpumsf.o
+perf-util-$(CONFIG_AUXTRACE) += powerpc-vpadtl.o
ifdef CONFIG_LIBOPENCSD
perf-util-$(CONFIG_AUXTRACE) += cs-etm.o
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index f587d386c5ef..bd1404f26bb7 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1395,6 +1395,8 @@ int perf_event__process_auxtrace_info(struct perf_session *session,
err = hisi_ptt_process_auxtrace_info(event, session);
break;
case PERF_AUXTRACE_VPA_PMU:
+ err = powerpc_vpadtl_process_auxtrace_info(event, session);
+ break;
case PERF_AUXTRACE_UNKNOWN:
default:
return -EINVAL;
diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
new file mode 100644
index 000000000000..ea7b59c45f4a
--- /dev/null
+++ b/tools/perf/util/powerpc-vpadtl.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * VPA DTL PMU support
+ */
+
+#include <endian.h>
+#include <errno.h>
+#include <byteswap.h>
+#include <inttypes.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+#include <elf.h>
+#include <limits.h>
+
+#include "cpumap.h"
+#include "color.h"
+#include "evsel.h"
+#include "evlist.h"
+#include "machine.h"
+#include "session.h"
+#include "util.h"
+#include "thread.h"
+#include "debug.h"
+#include "auxtrace.h"
+#include "powerpc-vpadtl.h"
+#include "map.h"
+#include "symbol_conf.h"
+#include "symbol.h"
+
+/*
+ * The DTL entries are of below format
+ */
+struct dtl_entry {
+ u8 dispatch_reason;
+ u8 preempt_reason;
+ u16 processor_id;
+ u32 enqueue_to_dispatch_time;
+ u32 ready_to_enqueue_time;
+ u32 waiting_to_ready_time;
+ u64 timebase;
+ u64 fault_addr;
+ u64 srr0;
+ u64 srr1;
+};
+
+/*
+ * Structure to save the auxtrace queue
+ */
+struct powerpc_vpadtl {
+ struct auxtrace auxtrace;
+ struct auxtrace_queues queues;
+ struct auxtrace_heap heap;
+ u32 auxtrace_type;
+ struct perf_session *session;
+ struct machine *machine;
+ u32 pmu_type;
+};
+
+struct boottb_freq {
+ u64 boot_tb;
+ u64 tb_freq;
+ u64 timebase;
+ u64 padded[3];
+};
+
+struct powerpc_vpadtl_queue {
+ struct powerpc_vpadtl *vpa;
+ unsigned int queue_nr;
+ struct auxtrace_buffer *buffer;
+ struct thread *thread;
+ bool on_heap;
+ bool done;
+ pid_t pid;
+ pid_t tid;
+ int cpu;
+};
+
+const char *dispatch_reasons[11] = {
+ "external_interrupt",
+ "firmware_internal_event",
+ "H_PROD",
+ "decrementer_interrupt",
+ "system_reset",
+ "firmware_internal_event",
+ "conferred_cycles",
+ "time_slice",
+ "virtual_memory_page_fault",
+ "expropriated_adjunct",
+ "priv_doorbell"};
+
+const char *preempt_reasons[10] = {
+ "unused",
+ "firmware_internal_event",
+ "H_CEDE",
+ "H_CONFER",
+ "time_slice",
+ "migration_hibernation_page_fault",
+ "virtual_memory_page_fault",
+ "H_CONFER_ADJUNCT",
+ "hcall_adjunct",
+ "HDEC_adjunct"};
+
+#define dtl_entry_size 48
+
+/*
+ * Function to dump the dispatch trace data when perf report
+ * is invoked with -D
+ */
+static void powerpc_vpadtl_dump(struct powerpc_vpadtl *vpa __maybe_unused,
+ unsigned char *buf, size_t len)
+{
+ struct dtl_entry *dtl;
+ int pkt_len, pos = 0;
+ const char *color = PERF_COLOR_BLUE;
+
+ color_fprintf(stdout, color,
+ ". ... VPA DTL PMU data: size %zu bytes, entries is %zu\n",
+ len, len/dtl_entry_size);
+
+ if (len % dtl_entry_size)
+ len = len - (len % dtl_entry_size);
+
+ while (len) {
+ pkt_len = 48;
+ printf(".");
+ color_fprintf(stdout, color, " %08x: ", pos);
+ dtl = (struct dtl_entry *)buf;
+ if (dtl->timebase != 0) {
+ printf("dispatch_reason:%s, preempt_reason:%s, enqueue_to_dispatch_time:%d, ready_to_enqueue_time:%d, waiting_to_ready_time:%d\n",
+ dispatch_reasons[dtl->dispatch_reason], preempt_reasons[dtl->preempt_reason], be32_to_cpu(dtl->enqueue_to_dispatch_time),
+ be32_to_cpu(dtl->ready_to_enqueue_time), be32_to_cpu(dtl->waiting_to_ready_time));
+ } else {
+ struct boottb_freq *boot_tb = (struct boottb_freq *)buf;
+
+ printf("boot_tb: %" PRIu64 ", tb_freq: %" PRIu64 "\n", boot_tb->boot_tb, boot_tb->tb_freq);
+ }
+
+ pos += pkt_len;
+ buf += pkt_len;
+ len -= pkt_len;
+ }
+}
+
+static void powerpc_vpadtl_dump_event(struct powerpc_vpadtl *vpa, unsigned char *buf,
+ size_t len)
+{
+ printf(".\n");
+ powerpc_vpadtl_dump(vpa, buf, len);
+}
+
+static int powerpc_vpadtl_process_event(struct perf_session *session __maybe_unused,
+ union perf_event *event __maybe_unused,
+ struct perf_sample *sample __maybe_unused,
+ const struct perf_tool *tool __maybe_unused)
+{
+ return 0;
+}
+
+/*
+ * Process PERF_RECORD_AUXTRACE records
+ */
+static int powerpc_vpadtl_process_auxtrace_event(struct perf_session *session,
+ union perf_event *event,
+ const struct perf_tool *tool __maybe_unused)
+{
+ struct powerpc_vpadtl *vpa = container_of(session->auxtrace, struct powerpc_vpadtl,
+ auxtrace);
+ struct auxtrace_buffer *buffer;
+ off_t data_offset;
+ int fd = perf_data__fd(session->data);
+ int err;
+
+ if (perf_data__is_pipe(session->data)) {
+ data_offset = 0;
+ } else {
+ data_offset = lseek(fd, 0, SEEK_CUR);
+ if (data_offset == -1)
+ return -errno;
+ }
+
+ err = auxtrace_queues__add_event(&vpa->queues, session, event,
+ data_offset, &buffer);
+ if (err)
+ return err;
+
+ /* Dump here now we have copied a piped trace out of the pipe */
+ if (dump_trace) {
+ if (auxtrace_buffer__get_data(buffer, fd)) {
+ powerpc_vpadtl_dump_event(vpa, buffer->data,
+ buffer->size);
+ auxtrace_buffer__put_data(buffer);
+ }
+ }
+
+ return 0;
+}
+
+static int powerpc_vpadtl_flush(struct perf_session *session __maybe_unused,
+ const struct perf_tool *tool __maybe_unused)
+{
+ return 0;
+}
+
+static void powerpc_vpadtl_free_queue(void *priv)
+{
+ struct powerpc_vpadtl_queue *vpaq = priv;
+
+ if (!vpaq)
+ return;
+
+ free(vpaq);
+}
+
+static void powerpc_vpadtl_free_events(struct perf_session *session)
+{
+ struct powerpc_vpadtl *vpa = container_of(session->auxtrace, struct powerpc_vpadtl,
+ auxtrace);
+ struct auxtrace_queues *queues = &vpa->queues;
+ unsigned int i;
+
+ for (i = 0; i < queues->nr_queues; i++) {
+ powerpc_vpadtl_free_queue(queues->queue_array[i].priv);
+ queues->queue_array[i].priv = NULL;
+ }
+ auxtrace_queues__free(queues);
+}
+
+static void powerpc_vpadtl_free(struct perf_session *session)
+{
+ struct powerpc_vpadtl *vpa = container_of(session->auxtrace, struct powerpc_vpadtl,
+ auxtrace);
+
+ auxtrace_heap__free(&vpa->heap);
+ powerpc_vpadtl_free_events(session);
+ session->auxtrace = NULL;
+ free(vpa);
+}
+
+static const char * const powerpc_vpadtl_info_fmts[] = {
+ [POWERPC_VPADTL_TYPE] = " PMU Type %"PRId64"\n",
+};
+
+static void powerpc_vpadtl_print_info(__u64 *arr)
+{
+ if (!dump_trace)
+ return;
+
+ fprintf(stdout, powerpc_vpadtl_info_fmts[POWERPC_VPADTL_TYPE], arr[POWERPC_VPADTL_TYPE]);
+}
+
+/*
+ * Process the PERF_RECORD_AUXTRACE_INFO records and setup
+ * the infrastructure to process auxtrace events. PERF_RECORD_AUXTRACE_INFO
+ * is processed first since it is of type perf_user_event_type.
+ * Initialise the aux buffer queues using auxtrace_queues__init().
+ * auxtrace_queue is created for each CPU.
+ */
+int powerpc_vpadtl_process_auxtrace_info(union perf_event *event,
+ struct perf_session *session)
+{
+ struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
+ size_t min_sz = sizeof(u64) * POWERPC_VPADTL_TYPE;
+ struct powerpc_vpadtl *vpa;
+ int err;
+
+ if (auxtrace_info->header.size < sizeof(struct perf_record_auxtrace_info) +
+ min_sz)
+ return -EINVAL;
+
+ vpa = zalloc(sizeof(struct powerpc_vpadtl));
+ if (!vpa)
+ return -ENOMEM;
+
+ err = auxtrace_queues__init(&vpa->queues);
+ if (err)
+ goto err_free;
+
+ vpa->session = session;
+ vpa->machine = &session->machines.host; /* No kvm support */
+ vpa->auxtrace_type = auxtrace_info->type;
+ vpa->pmu_type = auxtrace_info->priv[POWERPC_VPADTL_TYPE];
+
+ vpa->auxtrace.process_event = powerpc_vpadtl_process_event;
+ vpa->auxtrace.process_auxtrace_event = powerpc_vpadtl_process_auxtrace_event;
+ vpa->auxtrace.flush_events = powerpc_vpadtl_flush;
+ vpa->auxtrace.free_events = powerpc_vpadtl_free_events;
+ vpa->auxtrace.free = powerpc_vpadtl_free;
+ session->auxtrace = &vpa->auxtrace;
+
+ powerpc_vpadtl_print_info(&auxtrace_info->priv[0]);
+
+ return 0;
+
+err_free:
+ free(vpa);
+ return err;
+}
--
2.47.1
* [PATCH 09/14] tools/perf: Add event name as vpa-dtl of PERF_TYPE_SYNTH type to present DTL samples
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (7 preceding siblings ...)
2025-08-15 8:34 ` [PATCH 08/14] tools/perf: process auxtrace events and display in perf report -D Athira Rajeev
@ 2025-08-15 8:34 ` Athira Rajeev
2025-08-15 8:34 ` [PATCH 10/14] tools/perf: Allocate and setup aux buffer queue to help co-relate with other events across CPU's Athira Rajeev
` (6 subsequent siblings)
15 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 8:34 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
Dispatch Trace Log details are captured as-is in PERF_RECORD_AUXTRACE
records. To present DTL entries as samples, create an event named
"vpa-dtl" of type PERF_TYPE_SYNTH. Add the perf_synth_id
"PERF_SYNTH_POWERPC_VPA_DTL" as the config value for the event. The
sample id is created at a fixed offset from the evsel id.
To present the relevant fields from "struct dtl_entry",
prepare the entries as events of type PERF_TYPE_SYNTH. Because they are
defined as PERF_TYPE_SYNTH, the samples can be printed by
perf_sample__fprintf_synth() in builtin-script.c.
From powerpc_vpadtl_process_auxtrace_info(), invoke
auxtrace_queues__process_index(), which queues the
auxtrace buffers by invoking auxtrace_queues__add_event().
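As a rough illustration of the sample id derivation described above, the
following standalone sketch mirrors the "fixed offset from evsel id" rule,
including the fallback that avoids an id of zero. The names
VPADTL_SYNTH_ID_OFFSET and vpadtl_synth_id() are hypothetical and exist only
for this example; the patch open-codes the same arithmetic.

```c
#include <stdint.h>

/* Hypothetical name for the fixed offset used by the patch. */
#define VPADTL_SYNTH_ID_OFFSET 1000000000ULL

/*
 * Derive the id for synthesized "vpa-dtl" samples from the id of the
 * recorded evsel. If the addition wraps around to zero, fall back to 1
 * so that the result is never zero.
 */
static uint64_t vpadtl_synth_id(uint64_t evsel_id)
{
	uint64_t id = evsel_id + VPADTL_SYNTH_ID_OFFSET;

	return id ? id : 1;
}
```

The offset only needs to keep the synthesized id distinct from the ids
already present in the perf.data file; it carries no other meaning.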
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
tools/perf/util/event.h | 1 +
tools/perf/util/powerpc-vpadtl.c | 75 ++++++++++++++++++++++++++++++++
2 files changed, 76 insertions(+)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index e40d16d3246c..eea21a542b22 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -117,6 +117,7 @@ enum perf_synth_id {
PERF_SYNTH_INTEL_PSB,
PERF_SYNTH_INTEL_EVT,
PERF_SYNTH_INTEL_IFLAG_CHG,
+ PERF_SYNTH_POWERPC_VPA_DTL,
};
/*
diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
index ea7b59c45f4a..36c02821cf0a 100644
--- a/tools/perf/util/powerpc-vpadtl.c
+++ b/tools/perf/util/powerpc-vpadtl.c
@@ -56,6 +56,7 @@ struct powerpc_vpadtl {
struct perf_session *session;
struct machine *machine;
u32 pmu_type;
+ u64 sample_id;
};
struct boottb_freq {
@@ -250,6 +251,65 @@ static void powerpc_vpadtl_print_info(__u64 *arr)
fprintf(stdout, powerpc_vpadtl_info_fmts[POWERPC_VPADTL_TYPE], arr[POWERPC_VPADTL_TYPE]);
}
+static void set_event_name(struct evlist *evlist, u64 id,
+ const char *name)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel->core.id && evsel->core.id[0] == id) {
+ if (evsel->name)
+ zfree(&evsel->name);
+ evsel->name = strdup(name);
+ break;
+ }
+ }
+}
+
+static int
+powerpc_vpadtl_synth_events(struct powerpc_vpadtl *vpa, struct perf_session *session)
+{
+ struct evlist *evlist = session->evlist;
+ struct evsel *evsel;
+ struct perf_event_attr attr;
+ bool found = false;
+ u64 id;
+ int err;
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel->core.attr.type == vpa->pmu_type) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ pr_debug("No selected events with VPA trace data\n");
+ return 0;
+ }
+
+ memset(&attr, 0, sizeof(struct perf_event_attr));
+ attr.size = sizeof(struct perf_event_attr);
+ attr.sample_type = evsel->core.attr.sample_type;
+ attr.sample_id_all = evsel->core.attr.sample_id_all;
+ attr.type = PERF_TYPE_SYNTH;
+ attr.config = PERF_SYNTH_POWERPC_VPA_DTL;
+
+ /* create new id val to be a fixed offset from evsel id */
+ id = evsel->core.id[0] + 1000000000;
+ if (!id)
+ id = 1;
+
+ err = perf_session__deliver_synth_attr_event(session, &attr, id);
+ if (err)
+ return err;
+
+ vpa->sample_id = id;
+ set_event_name(evlist, id, "vpa-dtl");
+
+ return 0;
+}
+
/*
* Process the PERF_RECORD_AUXTRACE_INFO records and setup
* the infrastructure to process auxtrace events. PERF_RECORD_AUXTRACE_INFO
@@ -291,8 +351,23 @@ int powerpc_vpadtl_process_auxtrace_info(union perf_event *event,
powerpc_vpadtl_print_info(&auxtrace_info->priv[0]);
+ if (dump_trace)
+ return 0;
+
+ err = powerpc_vpadtl_synth_events(vpa, session);
+ if (err)
+ goto err_free_queues;
+
+ err = auxtrace_queues__process_index(&vpa->queues, session);
+ if (err)
+ goto err_free_queues;
+
return 0;
+err_free_queues:
+ auxtrace_queues__free(&vpa->queues);
+ session->auxtrace = NULL;
+
err_free:
free(vpa);
return err;
--
2.47.1
* [PATCH 10/14] tools/perf: Allocate and setup aux buffer queue to help co-relate with other events across CPU's
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (8 preceding siblings ...)
2025-08-15 8:34 ` [PATCH 09/14] tools/perf: Add event name as vpa-dtl of PERF_TYPE_SYNTH type to present DTL samples Athira Rajeev
@ 2025-08-15 8:34 ` Athira Rajeev
2025-08-27 17:29 ` Adrian Hunter
2025-08-15 8:34 ` [PATCH 11/14] tools/perf: Process the DTL entries in queue and deliver samples Athira Rajeev
` (5 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 8:34 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
When the Dispatch Trace Log data is collected along with other events
like sched tracepoint events, it needs to be correlated and presented
interleaved with these events. Perf events can be collected
in parallel across the CPUs. Hence events and DTL entries must be
processed in timestamp order.
An auxtrace_queue is created for each CPU. Data within each queue is in
increasing order of timestamp. Each auxtrace queue has an array/list of
auxtrace buffers. When processing an auxtrace buffer, the data is
mmap'ed. All auxtrace queues are maintained in the auxtrace heap. Each queue
has a queue number and a timestamp. The queues are added to the heap
sorted by timestamp, so the lowest timestamp (the entries to be
processed first) is always on top of the heap.
The auxtrace queue needs to be allocated and the heap needs to be populated
in sorted order of timestamp. The queue needs to be filled with data
only once, via the powerpc_vpadtl__update_queues() function.
powerpc_vpadtl__setup_queues() iterates through all the entries to
allocate and set up the auxtrace queues. To add a queue to the auxtrace
heap, the timebase of the first entry of that queue has to be fetched.
The first entry in the queue for the VPA DTL PMU has the boot timebase and
frequency details, which are needed to compute the timestamp used to
correlate with other events. The very next entry is the actual trace data
that provides the timestamp of the DTL event. The formula used to get
the timestamp from a DTL entry is:
((timebase from DTL entry - boot time) / frequency) * 1000000000
powerpc_vpadtl_decode() saves the boot time and frequency in the
powerpc_vpadtl_queue structure so that they can be reused. Each
dtl_entry is 48 bytes in size. Sometimes a buffer is only partially
processed: if the next entry in the queue is not older than the perf
event being processed, decoding stops and processing moves on to the
next event. In order to keep track of the position in the buffer, additional
fields are added to the powerpc_vpadtl_queue structure.
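The timestamp formula above can be sketched in integer arithmetic as
follows. This is an illustrative variant, not the code in this patch (which
uses double arithmetic); tb_to_ns() is a hypothetical helper name, and the
sketch assumes the timebase frequency is small enough (well below ~18 GHz)
that the remainder multiplication does not overflow 64 bits.

```c
#include <stdint.h>

/*
 * Convert a raw timebase value to nanoseconds since boot:
 *   ((timebase - boot_tb) / tb_freq) * 1000000000
 * Whole seconds and the sub-second remainder are converted separately
 * so that no precision is lost to integer division.
 */
static uint64_t tb_to_ns(uint64_t timebase, uint64_t boot_tb, uint64_t tb_freq)
{
	uint64_t ticks, secs, rem;

	/* Guard against a missing frequency or an entry older than boot. */
	if (!tb_freq || timebase < boot_tb)
		return 0;

	ticks = timebase - boot_tb;
	secs = ticks / tb_freq;
	rem = ticks % tb_freq;

	return secs * 1000000000ULL + (rem * 1000000000ULL) / tb_freq;
}
```

For example, with a 512 MHz timebase, an entry that is 256000000 ticks past
the boot timebase corresponds to half a second, i.e. 500000000 ns.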
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
tools/perf/util/powerpc-vpadtl.c | 219 ++++++++++++++++++++++++++++++-
1 file changed, 218 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
index 36c02821cf0a..299927901c9d 100644
--- a/tools/perf/util/powerpc-vpadtl.c
+++ b/tools/perf/util/powerpc-vpadtl.c
@@ -28,6 +28,7 @@
#include "map.h"
#include "symbol_conf.h"
#include "symbol.h"
+#include "tool.h"
/*
* The DTL entries are of below format
@@ -72,6 +73,14 @@ struct powerpc_vpadtl_queue {
struct auxtrace_buffer *buffer;
struct thread *thread;
bool on_heap;
+ struct dtl_entry *dtl;
+ u64 timestamp;
+ unsigned long pkt_len;
+ unsigned long buf_len;
+ u64 boot_tb;
+ u64 tb_freq;
+ unsigned int tb_buffer;
+ unsigned int size;
bool done;
pid_t pid;
pid_t tid;
@@ -151,12 +160,217 @@ static void powerpc_vpadtl_dump_event(struct powerpc_vpadtl *vpa, unsigned char
powerpc_vpadtl_dump(vpa, buf, len);
}
+static int powerpc_vpadtl_get_buffer(struct powerpc_vpadtl_queue *vpaq)
+{
+ struct auxtrace_buffer *buffer = vpaq->buffer;
+ struct auxtrace_queues *queues = &vpaq->vpa->queues;
+ struct auxtrace_queue *queue;
+
+ queue = &queues->queue_array[vpaq->queue_nr];
+ buffer = auxtrace_buffer__next(queue, buffer);
+
+ if (!buffer)
+ return 0;
+
+ vpaq->buffer = buffer;
+ vpaq->size = buffer->size;
+
+ /* If the aux_buffer doesn't have data associated, try to load it */
+ if (!buffer->data) {
+ /* get the file desc associated with the perf data file */
+ int fd = perf_data__fd(vpaq->vpa->session->data);
+
+ buffer->data = auxtrace_buffer__get_data(buffer, fd);
+ if (!buffer->data)
+ return -ENOMEM;
+ }
+
+ vpaq->buf_len = buffer->size;
+
+ if (buffer->size % dtl_entry_size)
+ vpaq->buf_len = buffer->size - (buffer->size % dtl_entry_size);
+
+ if (vpaq->tb_buffer != buffer->buffer_nr) {
+ vpaq->pkt_len = 0;
+ vpaq->tb_buffer = 0;
+ }
+
+ return 1;
+}
+
+/*
+ * The first entry in the queue for VPA DTL PMU has the boot timebase,
+ * frequency details which are needed to get timestamp which is required to
+ * correlate with other events. Save the boot_tb and tb_freq as part of
+ * powerpc_vpadtl_queue. The very next entry is the actual trace data to
+ * be returned.
+ */
+static int powerpc_vpadtl_decode(struct powerpc_vpadtl_queue *vpaq)
+{
+ int ret;
+ char *buf;
+ struct boottb_freq *boottb;
+
+ ret = powerpc_vpadtl_get_buffer(vpaq);
+ if (ret <= 0)
+ return ret;
+
+ boottb = (struct boottb_freq *)vpaq->buffer->data;
+ if (boottb->timebase == 0) {
+ vpaq->boot_tb = boottb->boot_tb;
+ vpaq->tb_freq = boottb->tb_freq;
+ vpaq->pkt_len += dtl_entry_size;
+ }
+
+ buf = vpaq->buffer->data;
+ buf += vpaq->pkt_len;
+ vpaq->dtl = (struct dtl_entry *)buf;
+
+ vpaq->tb_buffer = vpaq->buffer->buffer_nr;
+ vpaq->buffer = NULL;
+ vpaq->buf_len = 0;
+
+ return 1;
+}
+
+static struct powerpc_vpadtl_queue *powerpc_vpadtl__alloc_queue(struct powerpc_vpadtl *vpa,
+ unsigned int queue_nr)
+{
+ struct powerpc_vpadtl_queue *vpaq;
+
+ vpaq = zalloc(sizeof(*vpaq));
+ if (!vpaq)
+ return NULL;
+
+ vpaq->vpa = vpa;
+ vpaq->queue_nr = queue_nr;
+
+ return vpaq;
+}
+
+/*
+ * When the Dispatch Trace Log data is collected along with other events
+ * like sched tracepoint events, it needs to be correlated and presented
+ * interleaved with these events. Perf events can be collected
+ * in parallel across the CPUs.
+ *
+ * An auxtrace_queue is created for each CPU. Data within each queue is in
+ * increasing order of timestamp. Allocate and set up auxtrace queues here.
+ * All auxtrace queues are maintained in the auxtrace heap in increasing order
+ * of timestamp, so the lowest timestamp (entries to be processed first)
+ * is always on top of the heap.
+ *
+ * To add to auxtrace heap, fetch the timestamp from first DTL entry
+ * for each of the queue.
+ */
+static int powerpc_vpadtl__setup_queue(struct powerpc_vpadtl *vpa,
+ struct auxtrace_queue *queue,
+ unsigned int queue_nr)
+{
+ struct powerpc_vpadtl_queue *vpaq = queue->priv;
+ struct dtl_entry *record;
+ double result, div;
+ double boot_freq;
+ unsigned long long boot_tb;
+ unsigned long long diff;
+ unsigned long long save = 0;
+
+ if (list_empty(&queue->head) || vpaq)
+ return 0;
+
+ vpaq = powerpc_vpadtl__alloc_queue(vpa, queue_nr);
+ if (!vpaq)
+ return -ENOMEM;
+
+ queue->priv = vpaq;
+
+ if (queue->cpu != -1)
+ vpaq->cpu = queue->cpu;
+
+ if (!vpaq->on_heap) {
+ int ret;
+retry:
+ ret = powerpc_vpadtl_decode(vpaq);
+ if (!ret)
+ return 0;
+
+ if (ret < 0)
+ goto retry;
+
+ record = vpaq->dtl;
+ /*
+ * Formula used to get a timestamp that can be correlated with
+ * other perf events:
+ * ((timebase from DTL entry - boot time) / frequency) * 1000000000
+ */
+ if (record->timebase) {
+ boot_tb = vpaq->boot_tb;
+ boot_freq = vpaq->tb_freq;
+ diff = be64_to_cpu(record->timebase) - boot_tb;
+ div = diff / boot_freq;
+ result = div;
+ result = result * 1000000000;
+ save = result;
+ }
+
+ vpaq->timestamp = save;
+ ret = auxtrace_heap__add(&vpa->heap, queue_nr, vpaq->timestamp);
+ if (ret)
+ return ret;
+ vpaq->on_heap = true;
+ }
+
+ return 0;
+}
+
+static int powerpc_vpadtl__setup_queues(struct powerpc_vpadtl *vpa)
+{
+ unsigned int i;
+ int ret;
+
+ for (i = 0; i < vpa->queues.nr_queues; i++) {
+ ret = powerpc_vpadtl__setup_queue(vpa, &vpa->queues.queue_array[i], i);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static int powerpc_vpadtl__update_queues(struct powerpc_vpadtl *vpa)
+{
+ if (vpa->queues.new_data) {
+ vpa->queues.new_data = false;
+ return powerpc_vpadtl__setup_queues(vpa);
+ }
+
+ return 0;
+}
+
static int powerpc_vpadtl_process_event(struct perf_session *session __maybe_unused,
union perf_event *event __maybe_unused,
struct perf_sample *sample __maybe_unused,
const struct perf_tool *tool __maybe_unused)
{
- return 0;
+ int err = 0;
+ struct powerpc_vpadtl *vpa = container_of(session->auxtrace,
+ struct powerpc_vpadtl, auxtrace);
+
+ if (dump_trace)
+ return 0;
+
+ if (!tool->ordered_events) {
+ pr_err("VPA requires ordered events\n");
+ return -EINVAL;
+ }
+
+ if (sample->time) {
+ err = powerpc_vpadtl__update_queues(vpa);
+ if (err)
+ return err;
+ }
+
+ return err;
}
/*
@@ -181,6 +395,9 @@ static int powerpc_vpadtl_process_auxtrace_event(struct perf_session *session,
return -errno;
}
+ if (!dump_trace)
+ return 0;
+
err = auxtrace_queues__add_event(&vpa->queues, session, event,
data_offset, &buffer);
if (err)
--
2.47.1
* [PATCH 11/14] tools/perf: Process the DTL entries in queue and deliver samples
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (9 preceding siblings ...)
2025-08-15 8:34 ` [PATCH 10/14] tools/perf: Allocate and setup aux buffer queue to help co-relate with other events across CPU's Athira Rajeev
@ 2025-08-15 8:34 ` Athira Rajeev
2025-08-27 17:29 ` Adrian Hunter
2025-08-15 8:34 ` [PATCH 12/14] tools/perf: Add support for printing synth event details via default callback Athira Rajeev
` (4 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 8:34 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
Create samples from DTL entries for display in perf report
and perf script. When the different PERF_RECORD_XX records are
processed from the perf session, powerpc_vpadtl_process_event() is
invoked. For each PERF_RECORD_XX record, compare the timestamp
of the perf record with the timestamp of the top element in the auxtrace heap.
Process the auxtrace queue if the timestamp of the element from the heap is
lower than the timestamp of the perf record.
Sometimes a buffer is only partially processed: if the next entry
in the queue is not older than the perf record being processed,
decoding stops and processing moves on
to the next perf record. So keep track of the position in the buffer to
continue processing next time. Update the timestamp of the
auxtrace heap with the timestamp of the last processed entry from
the auxtrace buffer.
Generate a perf sample for each entry in the dispatch trace log.
Fill in the sample details:
- sample ip is picked from the srr0 field of dtl_entry
- sample cpu is the logical cpu of the auxtrace queue
- sample id is from sample_id of powerpc_vpadtl
- cpumode is set to PERF_RECORD_MISC_KERNEL
- Additionally save the details in raw_data of the sample. This
is to print the relevant fields in perf_sample__fprintf_synth()
when called from builtin-script
The sample is processed by calling perf_session__deliver_synth_event()
so that it gets included in perf report.
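The partial-buffer handling described above can be pictured with a small
standalone sketch. It is not the code from this patch: struct demo_queue and
demo_drain() are hypothetical stand-ins that keep only a cursor and a list of
timestamps, to show how decoding stops at the first entry that is not older
than the current perf record and resumes from the same position later.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one per-CPU queue of DTL entries. */
struct demo_queue {
	const uint64_t *ts;	/* entry timestamps, in ascending order */
	size_t nr;		/* number of entries */
	size_t pos;		/* next entry to deliver */
};

/*
 * Deliver every entry older than 'limit' (the perf record's timestamp).
 * Returns 1 once the queue is fully drained. Returns 0 when it stopped
 * early; *next_ts then holds the timestamp with which the queue would be
 * re-added to the heap, and q->pos remembers where to resume.
 */
static int demo_drain(struct demo_queue *q, uint64_t limit,
		      uint64_t *next_ts, size_t *delivered)
{
	while (q->pos < q->nr) {
		if (q->ts[q->pos] >= limit) {
			*next_ts = q->ts[q->pos];
			return 0;
		}
		(*delivered)++;	/* a real decoder would synthesize a sample here */
		q->pos++;
	}

	return 1;
}
```

With timestamps {10, 20, 30, 40} and a perf record at time 25, the first call
delivers two entries and reports 30 as the next timestamp; a later call with
a larger limit continues from the third entry.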
Sample Output:
./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.300 MB perf.data ]
./perf report
# Samples: 321 of event 'vpa-dtl'
# Event count (approx.): 321
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ..............................
#
100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
tools/perf/util/powerpc-vpadtl.c | 181 +++++++++++++++++++++++++++++++
1 file changed, 181 insertions(+)
diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
index 299927901c9d..370c566f9ac2 100644
--- a/tools/perf/util/powerpc-vpadtl.c
+++ b/tools/perf/util/powerpc-vpadtl.c
@@ -160,6 +160,43 @@ static void powerpc_vpadtl_dump_event(struct powerpc_vpadtl *vpa, unsigned char
powerpc_vpadtl_dump(vpa, buf, len);
}
+/*
+ * Generate perf sample for each entry in the dispatch trace log.
+ * - sample ip is picked from srr0 field of dtl_entry
+ * - sample cpu is picked from logical cpu.
+ * - sample id is from sample_id of powerpc_vpadtl
+ * - cpumode is set to PERF_RECORD_MISC_KERNEL
+ * - Additionally save the details in raw_data of sample. This
+ * is to print the relevant fields in perf_sample__fprintf_synth()
+ * when called from builtin-script
+ */
+static int powerpc_vpadtl_sample(struct dtl_entry *record, struct powerpc_vpadtl *vpa, u64 save, int cpu)
+{
+ struct perf_sample sample;
+ union perf_event event;
+
+ sample.ip = be64_to_cpu(record->srr0);
+ sample.period = 1;
+ sample.cpu = cpu;
+ sample.id = vpa->sample_id;
+ sample.callchain = NULL;
+ sample.branch_stack = NULL;
+ memset(&event, 0, sizeof(event));
+ sample.cpumode = PERF_RECORD_MISC_KERNEL;
+ sample.time = save;
+ sample.raw_data = record;
+ sample.raw_size = sizeof(*record);
+ event.sample.header.type = PERF_RECORD_SAMPLE;
+ event.sample.header.misc = sample.cpumode;
+ event.sample.header.size = sizeof(struct perf_event_header);
+ if (perf_session__deliver_synth_event(vpa->session, &event,
+ &sample)) {
+ pr_debug("Failed to create sample for dtl entry\n");
+ return -1;
+ }
+ return 0;
+}
+
static int powerpc_vpadtl_get_buffer(struct powerpc_vpadtl_queue *vpaq)
{
struct auxtrace_buffer *buffer = vpaq->buffer;
@@ -233,6 +270,148 @@ static int powerpc_vpadtl_decode(struct powerpc_vpadtl_queue *vpaq)
return 1;
}
+static int powerpc_vpadtl_decode_all(struct powerpc_vpadtl_queue *vpaq)
+{
+ int ret;
+ unsigned char *buf;
+
+ if (!vpaq->buf_len || (vpaq->pkt_len == vpaq->size)) {
+ ret = powerpc_vpadtl_get_buffer(vpaq);
+ if (ret <= 0)
+ return ret;
+ }
+
+ if (vpaq->buffer) {
+ buf = vpaq->buffer->data;
+ buf += vpaq->pkt_len;
+ vpaq->dtl = (struct dtl_entry *)buf;
+ if ((long long)be64_to_cpu(vpaq->dtl->timebase) <= 0) {
+ if (vpaq->pkt_len != dtl_entry_size && vpaq->buf_len) {
+ vpaq->pkt_len += dtl_entry_size;
+ vpaq->buf_len -= dtl_entry_size;
+ }
+ return -1;
+ }
+ vpaq->pkt_len += dtl_entry_size;
+ vpaq->buf_len -= dtl_entry_size;
+ } else
+ return 0;
+
+
+ return 1;
+}
+
+static int powerpc_vpadtl_run_decoder(struct powerpc_vpadtl_queue *vpaq, u64 *timestamp)
+{
+ struct powerpc_vpadtl *vpa = vpaq->vpa;
+ struct dtl_entry *record;
+ int ret;
+ double result, div;
+ double boot_freq = vpaq->tb_freq;
+ unsigned long long boot_tb = vpaq->boot_tb;
+ unsigned long long diff;
+ unsigned long long save;
+
+ while (1) {
+ ret = powerpc_vpadtl_decode_all(vpaq);
+ if (!ret) {
+ pr_debug("All data in the queue has been processed.\n");
+ return 1;
+ }
+
+ /*
+ * Error is detected when decoding VPA PMU trace. Continue to
+ * the next trace data and find out more dtl entries.
+ */
+ if (ret < 0)
+ continue;
+
+ record = vpaq->dtl;
+
+ diff = be64_to_cpu(record->timebase) - boot_tb;
+ div = diff / boot_freq;
+ result = div;
+ result = result * 1000000000;
+ save = result;
+
+ /* Update timestamp for the last record */
+ if (save > vpaq->timestamp)
+ vpaq->timestamp = save;
+
+ /*
+ * If the timestamp of the queue is later than timestamp of the
+ * coming perf event, bail out so can allow the perf event to
+ * be processed ahead.
+ */
+ if (vpaq->timestamp >= *timestamp) {
+ *timestamp = vpaq->timestamp;
+ vpaq->pkt_len -= dtl_entry_size;
+ vpaq->buf_len += dtl_entry_size;
+ return 0;
+ }
+
+ ret = powerpc_vpadtl_sample(record, vpa, save, vpaq->cpu);
+ if (ret)
+ continue;
+ }
+ return 0;
+}
+
+/*
+ * For each of the PERF_RECORD_XX record, compare the timestamp
+ * of perf record with timestamp of top element in the auxtrace heap.
+ * Process the auxtrace queue if the timestamp of element from heap is
+ * lower than timestamp from entry in perf record.
+ *
+ * Update the timestamp of the auxtrace heap with the timestamp
+ * of last processed entry from the auxtrace buffer.
+ */
+static int powerpc_vpadtl_process_queues(struct powerpc_vpadtl *vpa, u64 timestamp)
+{
+ unsigned int queue_nr;
+ u64 ts;
+ int ret;
+
+ while (1) {
+ struct auxtrace_queue *queue;
+ struct powerpc_vpadtl_queue *vpaq;
+
+ if (!vpa->heap.heap_cnt)
+ return 0;
+
+ if (vpa->heap.heap_array[0].ordinal >= timestamp)
+ return 0;
+
+ queue_nr = vpa->heap.heap_array[0].queue_nr;
+ queue = &vpa->queues.queue_array[queue_nr];
+ vpaq = queue->priv;
+
+ auxtrace_heap__pop(&vpa->heap);
+
+ if (vpa->heap.heap_cnt) {
+ ts = vpa->heap.heap_array[0].ordinal + 1;
+ if (ts > timestamp)
+ ts = timestamp;
+ } else
+ ts = timestamp;
+
+ ret = powerpc_vpadtl_run_decoder(vpaq, &ts);
+ if (ret < 0) {
+ auxtrace_heap__add(&vpa->heap, queue_nr, ts);
+ return ret;
+ }
+
+ if (!ret) {
+ ret = auxtrace_heap__add(&vpa->heap, queue_nr, ts);
+ if (ret < 0)
+ return ret;
+ } else {
+ vpaq->on_heap = false;
+ }
+ }
+ return 0;
+}
+
static struct powerpc_vpadtl_queue *powerpc_vpadtl__alloc_queue(struct powerpc_vpadtl *vpa,
unsigned int queue_nr)
{
@@ -368,6 +547,8 @@ static int powerpc_vpadtl_process_event(struct perf_session *session __maybe_unu
err = powerpc_vpadtl__update_queues(vpa);
if (err)
return err;
+
+ err = powerpc_vpadtl_process_queues(vpa, sample->time);
}
return err;
--
2.47.1
* [PATCH 12/14] tools/perf: Add support for printing synth event details via default callback
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (10 preceding siblings ...)
2025-08-15 8:34 ` [PATCH 11/14] tools/perf: Process the DTL entries in queue and deliver samples Athira Rajeev
@ 2025-08-15 8:34 ` Athira Rajeev
2025-08-27 17:29 ` Adrian Hunter
2025-08-15 8:34 ` [PATCH 13/14] tools/perf: Enable perf script to present the DTL entries Athira Rajeev
` (3 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 8:34 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
Introduce arch_perf_sample__fprintf_synth_evt() to add support for
printing arch-specific synth event details. The process_event()
function in "builtin-script.c" invokes perf_sample__fprintf_synth() to
display PERF_TYPE_SYNTH type events:
if (attr->type == PERF_TYPE_SYNTH && PRINT_FIELD(SYNTH))
perf_sample__fprintf_synth(sample, evsel, fp);
perf_sample__fprintf_synth() processes the sample depending on the value
in evsel->core.attr.config. Currently all the arch-specific callbacks
perf_sample__fprintf_synth* are part of "builtin-script.c" itself,
for example perf_sample__fprintf_synth_ptwrite() and
perf_sample__fprintf_synth_mwait(). As a result, any new perf_synth_id
event requires adding arch-specific details to builtin-script.c.
Introduce arch_perf_sample__fprintf_synth_evt() and invoke it as the
default callback from perf_sample__fprintf_synth(). This way, arch-specific
code can handle processing the details.
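The dispatch pattern described above, where known config values are handled
by dedicated printers and everything else falls through to an arch hook, can
be sketched as follows. All identifiers here (demo_synth_id,
demo_arch_fprintf(), demo_fprintf_synth()) are hypothetical and are not perf
APIs; unlike the void callback added by this patch, the sketch returns the
number of characters printed so the behaviour is easy to check.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical config values, for illustration only. */
enum demo_synth_id {
	DEMO_SYNTH_GENERIC = 1,
	DEMO_SYNTH_ARCH = 100,
};

/* Arch hook: prints only the config values it understands. */
static int demo_arch_fprintf(FILE *fp, uint64_t config)
{
	if (config != DEMO_SYNTH_ARCH)
		return 0;

	return fprintf(fp, "arch-specific payload");
}

/* Generic printer: known ids first, the arch hook as the default case. */
static int demo_fprintf_synth(FILE *fp, uint64_t config)
{
	switch (config) {
	case DEMO_SYNTH_GENERIC:
		return fprintf(fp, "generic payload");
	default:
		return demo_arch_fprintf(fp, config);
	}
}
```

The benefit of this shape is that adding a new synth id for one architecture
does not require a new case in the generic switch statement.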
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
tools/perf/builtin-script.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index d9fbdcf72f25..eff584735980 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2003,6 +2003,12 @@ static int perf_sample__fprintf_synth_iflag_chg(struct perf_sample *sample, FILE
return len + perf_sample__fprintf_pt_spacing(len, fp);
}
+static void arch_perf_sample__fprintf_synth_evt(struct perf_sample *data __maybe_unused,
+ FILE *fp __maybe_unused, u64 config __maybe_unused)
+{
+ return;
+}
+
static int perf_sample__fprintf_synth(struct perf_sample *sample,
struct evsel *evsel, FILE *fp)
{
@@ -2026,6 +2032,7 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample,
case PERF_SYNTH_INTEL_IFLAG_CHG:
return perf_sample__fprintf_synth_iflag_chg(sample, fp);
default:
+ arch_perf_sample__fprintf_synth_evt(sample, fp, evsel->core.attr.config);
break;
}
--
2.47.1
* [PATCH 13/14] tools/perf: Enable perf script to present the DTL entries
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (11 preceding siblings ...)
2025-08-15 8:34 ` [PATCH 12/14] tools/perf: Add support for printing synth event details via default callback Athira Rajeev
@ 2025-08-15 8:34 ` Athira Rajeev
2025-08-27 17:30 ` Adrian Hunter
2025-08-15 8:34 ` [PATCH 14/14] powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU Athira Rajeev
` (2 subsequent siblings)
15 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 8:34 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
Enable perf script to present the DTL entries. Process the
dispatch trace log details in arch_perf_sample__fprintf_synth_evt(),
defined in builtin-script.c, for the config value
PERF_SYNTH_POWERPC_VPA_DTL.
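To make the decoding step concrete, here is a standalone sketch of reading
fields from one raw 48-byte DTL entry. The field layout follows struct
dtl_entry from this series; the demo_* names are hypothetical helpers written
for this example. It assumes the multi-byte fields are stored big-endian (as
the be*_to_cpu() calls in the patch imply), and it shows a bounds-checked
lookup of the reason strings, which the patch itself does not do.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct dtl_entry in this series. */
struct demo_dtl_entry {
	uint8_t  dispatch_reason;
	uint8_t  preempt_reason;
	uint16_t processor_id;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
	uint32_t waiting_to_ready_time;
	uint64_t timebase;
	uint64_t fault_addr;
	uint64_t srr0;
	uint64_t srr1;
};

_Static_assert(sizeof(struct demo_dtl_entry) == 48,
	       "a DTL entry is expected to be 48 bytes");

/* Read an nbytes-wide big-endian value, independent of host endianness. */
static uint64_t demo_read_be(const uint8_t *p, size_t nbytes)
{
	uint64_t v = 0;

	for (size_t i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];

	return v;
}

/* Look up a reason string without indexing past the end of the table. */
static const char *demo_reason(const char *const *names, size_t nr, uint8_t idx)
{
	return idx < nr ? names[idx] : "unknown";
}
```

A caller would use offsetof(struct demo_dtl_entry, timebase) and similar
offsets to locate each field inside the raw sample data before printing it.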
Sample output:
./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.300 MB perf.data ]
./perf script
perf 13322 [002] 233.835807: sched:sched_switch: perf:13322 [120] R ==> migration/2:27 [0]
migration/2 27 [002] 233.835811: sched:sched_migrate_task: comm=perf pid=13322 prio=120 orig_cpu=2 dest_cpu=3
migration/2 27 [002] 233.835818: sched:sched_stat_runtime: comm=migration/2 pid=27 runtime=9214 [ns]
migration/2 27 [002] 233.835819: sched:sched_switch: migration/2:27 [0] S ==> swapper/2:0 [120]
swapper 0 [002] 233.835822: vpa-dtl: timebase: 338954486062657 dispatch_reason:decrementer_interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:435, ready_to_enqueue_time:0, waiting_to_ready_time:34775058, processor_id: 202 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
swapper 0 [001] 233.835886: vpa-dtl: timebase: 338954486095398 dispatch_reason:priv_doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:542, ready_to_enqueue_time:0, waiting_to_ready_time:1245360, processor_id: 201 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
tools/perf/builtin-script.c | 23 +++++++++++++++++++++--
tools/perf/util/powerpc-vpadtl.c | 16 ----------------
tools/perf/util/powerpc-vpadtl.h | 19 +++++++++++++++++++
3 files changed, 40 insertions(+), 18 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index eff584735980..a0faadaadc4d 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -66,6 +66,7 @@
#include "util/cgroup.h"
#include "util/annotate.h"
#include "perf.h"
+#include "util/powerpc-vpadtl.h"
#include <linux/ctype.h>
#ifdef HAVE_LIBTRACEEVENT
@@ -2004,8 +2005,26 @@ static int perf_sample__fprintf_synth_iflag_chg(struct perf_sample *sample, FILE
}
static void arch_perf_sample__fprintf_synth_evt(struct perf_sample *data __maybe_unused,
- FILE *fp __maybe_unused, u64 config __maybe_unused)
+ FILE *fp __maybe_unused, u64 config __maybe_unused, struct perf_env *env)
{
+ const char *arch = perf_env__arch(env);
+
+ if (!strcmp("powerpc", arch)) {
+ struct dtl_entry *dtl = (struct dtl_entry *)data->raw_data;
+
+ if (config != PERF_SYNTH_POWERPC_VPA_DTL)
+ return;
+ fprintf(fp, "timebase: %" PRIu64 " dispatch_reason:%s, preempt_reason:%s, enqueue_to_dispatch_time:%u, "
+ "ready_to_enqueue_time:%u, waiting_to_ready_time:%u, processor_id: %d",
+ be64_to_cpu(dtl->timebase),
+ dispatch_reasons[dtl->dispatch_reason],
+ preempt_reasons[dtl->preempt_reason],
+ be32_to_cpu(dtl->enqueue_to_dispatch_time),
+ be32_to_cpu(dtl->ready_to_enqueue_time),
+ be32_to_cpu(dtl->waiting_to_ready_time),
+ be16_to_cpu(dtl->processor_id));
+ }
+
return;
}
@@ -2032,7 +2051,7 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample,
case PERF_SYNTH_INTEL_IFLAG_CHG:
return perf_sample__fprintf_synth_iflag_chg(sample, fp);
default:
- arch_perf_sample__fprintf_synth_evt(sample, fp, evsel->core.attr.config);
+ arch_perf_sample__fprintf_synth_evt(sample, fp, evsel->core.attr.config, evsel__env(evsel));
break;
}
diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
index 370c566f9ac2..482ddf1a2d51 100644
--- a/tools/perf/util/powerpc-vpadtl.c
+++ b/tools/perf/util/powerpc-vpadtl.c
@@ -30,22 +30,6 @@
#include "symbol.h"
#include "tool.h"
-/*
- * The DTL entries are of below format
- */
-struct dtl_entry {
- u8 dispatch_reason;
- u8 preempt_reason;
- u16 processor_id;
- u32 enqueue_to_dispatch_time;
- u32 ready_to_enqueue_time;
- u32 waiting_to_ready_time;
- u64 timebase;
- u64 fault_addr;
- u64 srr0;
- u64 srr1;
-};
-
/*
* Structure to save the auxtrace queue
*/
diff --git a/tools/perf/util/powerpc-vpadtl.h b/tools/perf/util/powerpc-vpadtl.h
index 625172adaba5..497f704787a5 100644
--- a/tools/perf/util/powerpc-vpadtl.h
+++ b/tools/perf/util/powerpc-vpadtl.h
@@ -20,6 +20,25 @@ union perf_event;
struct perf_session;
struct perf_pmu;
+/*
+ * The DTL entries are of below format
+ */
+struct dtl_entry {
+ u8 dispatch_reason;
+ u8 preempt_reason;
+ u16 processor_id;
+ u32 enqueue_to_dispatch_time;
+ u32 ready_to_enqueue_time;
+ u32 waiting_to_ready_time;
+ u64 timebase;
+ u64 fault_addr;
+ u64 srr0;
+ u64 srr1;
+};
+
+extern const char *dispatch_reasons[11];
+extern const char *preempt_reasons[10];
+
int powerpc_vpadtl_process_auxtrace_info(union perf_event *event,
struct perf_session *session);
--
2.47.1
* [PATCH 14/14] powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (12 preceding siblings ...)
2025-08-15 8:34 ` [PATCH 13/14] tools/perf: Enable perf script to present the DTL entries Athira Rajeev
@ 2025-08-15 8:34 ` Athira Rajeev
2025-08-15 12:17 ` [PATCH 00/14] Add interface to expose vpa dtl counters via perf Venkat Rao Bagalkote
2025-08-18 14:41 ` tejas05
15 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 8:34 UTC (permalink / raw)
To: acme, jolsa, adrian.hunter, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, atrajeev,
kjain, hbathini, Aditya.Bodkhe1, venkat88
Add documentation for the vpa-dtl (Virtual Processor Area - Dispatch Trace
Log) PMU interface. It describes how to collect the dispatch trace log
entries in perf.data and how to process and report them with perf report
and perf script.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
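Illustrative note, not part of the commit: the queue handling described in
the new document (per-CPU queues, oldest entry first, stop at the current
perf record, remember the resume position) can be sketched as below. This is
a simplified sketch under assumptions: struct dtl_queue, oldest_queue() and
drain_before() are hypothetical names, and a linear scan stands in for the
auxtrace heap used by the perf tools.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU queue: timestamps sorted ascending, pos = resume point. */
struct dtl_queue {
	const uint64_t *ts;
	size_t nr;
	size_t pos;
};

/* Index of the queue whose next entry is oldest, or -1 if all are drained. */
static int oldest_queue(const struct dtl_queue *q, int nr_queues)
{
	int i, best = -1;

	for (i = 0; i < nr_queues; i++) {
		if (q[i].pos >= q[i].nr)
			continue;
		if (best < 0 || q[i].ts[q[i].pos] < q[best].ts[q[best].pos])
			best = i;
	}
	return best;
}

/*
 * Emit every queued entry strictly older than record_ts, oldest first.
 * A queue may be left partially processed; pos records where to resume.
 * Returns the number of timestamps written to out (at most max_out).
 */
static size_t drain_before(struct dtl_queue *q, int nr_queues,
			   uint64_t record_ts, uint64_t *out, size_t max_out)
{
	size_t n = 0;
	int i;

	while (n < max_out && (i = oldest_queue(q, nr_queues)) >= 0) {
		if (q[i].ts[q[i].pos] >= record_ts)
			break;
		out[n++] = q[i].ts[q[i].pos++];
	}
	return n;
}
```

Calling drain_before() once per perf record, with that record's timestamp,
interleaves the DTL entries with the other events in time order.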
Documentation/arch/powerpc/index.rst | 1 +
Documentation/arch/powerpc/vpa-dtl.rst | 155 +++++++++++++++++++++++++
2 files changed, 156 insertions(+)
create mode 100644 Documentation/arch/powerpc/vpa-dtl.rst
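Illustrative note, not part of the commit: each DTL entry carries a raw
timebase value, and the AUXTRACE dump prints boot_tb and tb_freq. A minimal
sketch of turning such a value into a perf-style timestamp follows. It
assumes time = (timebase - boot_tb) / tb_freq; the helper name
dtl_tb_to_ns() is hypothetical and is not taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a raw DTL timebase value to nanoseconds since boot.
 * Assumption: boot_tb is the timebase at boot and tb_freq is ticks per second.
 */
static uint64_t dtl_tb_to_ns(uint64_t timebase, uint64_t boot_tb,
			     uint64_t tb_freq)
{
	uint64_t ticks, secs, rem;

	assert(tb_freq != 0 && timebase >= boot_tb);
	ticks = timebase - boot_tb;
	secs = ticks / tb_freq;
	rem = ticks % tb_freq;
	/* Handle whole seconds and the remainder separately to avoid overflow. */
	return secs * 1000000000ULL + rem * 1000000000ULL / tb_freq;
}
```

With the values shown in the new document (boot_tb 21349649546353231,
tb_freq 512000000), a timebase of 21403600706628832 maps to 105373.359913
seconds, which matches the timestamp perf script prints for that entry.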
diff --git a/Documentation/arch/powerpc/index.rst b/Documentation/arch/powerpc/index.rst
index 53fc9f89f3e4..1be2ee3f0361 100644
--- a/Documentation/arch/powerpc/index.rst
+++ b/Documentation/arch/powerpc/index.rst
@@ -37,6 +37,7 @@ powerpc
vas-api
vcpudispatch_stats
vmemmap_dedup
+ vpa-dtl
features
diff --git a/Documentation/arch/powerpc/vpa-dtl.rst b/Documentation/arch/powerpc/vpa-dtl.rst
new file mode 100644
index 000000000000..98a8550ae1cc
--- /dev/null
+++ b/Documentation/arch/powerpc/vpa-dtl.rst
@@ -0,0 +1,155 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. _vpa-dtl:
+
+===================================
+DTL (Dispatch Trace Log)
+===================================
+
+Athira Rajeev, 19 April 2025
+
+.. contents::
+ :depth: 3
+
+
+Basic overview
+==============
+
+The pseries Shared Processor Logical Partition (SPLPAR) machines can
+retrieve a log of dispatch and preempt events from the hypervisor
+using data from the Dispatch Trace Log (DTL) buffer. With this information,
+the user can determine when and why each dispatch and preempt occurred.
+The vpa-dtl PMU exposes the Virtual Processor Area (VPA) DTL counters
+via perf.
+
+Infrastructure used
+===================
+
+The VPA DTL PMU counters do not interrupt on overflow or generate any
+PMI interrupts. Therefore, an hrtimer is used to poll the DTL data. The timer
+interval can be provided by the user via the sample_period field, in nanoseconds.
+The vpa-dtl PMU has one hrtimer per vpa-dtl PMU thread. The DTL (Dispatch
+Trace Log) contains information about dispatch/preempt, enqueue time, etc.
+The DTL buffer data is copied directly into the auxiliary buffer and is
+processed later. This avoids the time taken to create samples
+in kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
+entries makes use of AUX support in the perf infrastructure. On the tools side,
+this data is made available as PERF_RECORD_AUXTRACE records.
+
+To correlate each DTL entry with other events across CPUs, an auxtrace_queue
+is created for each CPU. Each auxtrace queue has an array/list of auxtrace buffers.
+All auxtrace queues are maintained in an auxtrace heap. The queues are sorted
+based on timestamp. When the different PERF_RECORD_XX records are processed,
+the timestamp of the perf record is compared with the timestamp of the top
+element in the auxtrace heap so that DTL events can be correlated with other
+events. The auxtrace queue is processed if the timestamp of the element from
+the heap is lower than the timestamp of the perf record. Sometimes
+one buffer is only partially processed: if the timestamp of
+another event is more than that of the currently processed element in the queue,
+processing moves on to the next perf record. The position in the buffer is
+tracked so that processing can continue from there next time. The timestamp
+of the auxtrace heap is updated with that of the last processed buffer entry.
+
+This infrastructure ensures dispatch trace log entries can be correlated
+and presented along with other events like sched.
+
+vpa-dtl PMU example usage
+=========================
+
+.. code-block:: sh
+
+ # ls /sys/devices/vpa_dtl/
+ events format perf_event_mux_interval_ms power subsystem type uevent
+
+To capture the DTL data using perf record:
+
+.. code-block:: sh
+
+ # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
+
+The result can be interpreted using perf report. Snippet of perf report -D:
+
+.. code-block:: sh
+
+ # ./perf report -D
+
+There are different PERF_RECORD_XX records. The records corresponding to
+auxtrace buffers include:
+
+1. PERF_RECORD_AUX
+ Conveys that new data is available in AUX area
+
+2. PERF_RECORD_AUXTRACE_INFO
+ Describes offset and size of auxtrace data in the buffers
+
+3. PERF_RECORD_AUXTRACE
+ Defines the auxtrace data, which in the case of the vpa-dtl PMU is the
+ dispatch trace log data.
+
+Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump
+
+.. code-block:: sh
+
+ 0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
+ .
+ . ... VPA DTL PMU data: size 1680 bytes, entries is 35
+ . 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
+ . 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
+ . 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
+ . 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
+ . 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
+ . 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
+ . 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
+ . 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
+ . 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
+
+The above is a representation of DTL entries in the following format::
+
+ struct dtl_entry {
+         u8 dispatch_reason;
+         u8 preempt_reason;
+         u16 processor_id;
+         u32 enqueue_to_dispatch_time;
+         u32 ready_to_enqueue_time;
+         u32 waiting_to_ready_time;
+         u64 timebase;
+         u64 fault_addr;
+         u64 srr0;
+         u64 srr1;
+ };
+
+The first two fields represent the dispatch reason and preempt reason. Post
+processing of PERF_RECORD_AUXTRACE records translates them into meaningful
+data for the user to consume.
+
+Visualize the dispatch trace log entries with perf report
+=========================================================
+
+.. code-block:: sh
+
+ # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
+ [ perf record: Woken up 1 times to write data ]
+ [ perf record: Captured and wrote 0.300 MB perf.data ]
+
+ # ./perf report
+ # Samples: 321 of event 'vpa-dtl'
+ # Event count (approx.): 321
+ #
+ # Children Self Command Shared Object Symbol
+ # ........ ........ ....... ................. ..............................
+ #
+ 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
+
+Visualize the dispatch trace log entries with perf script
+=========================================================
+
+.. code-block:: sh
+
+ # ./perf script
+ migration/9 67 [009] 105373.359903: sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009
+ migration/9 67 [009] 105373.359904: sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10
+ migration/9 67 [009] 105373.359907: sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns]
+ migration/9 67 [009] 105373.359908: sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120
+ :256 256 [016] 105373.359913: vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854, ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
+ :256 256 [017] 105373.360012: vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236, ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
+ perf 13418 [010] 105373.360048: sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns]
+ perf 13418 [010] 105373.360052: sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010
--
2.47.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 00/14] Add interface to expose vpa dtl counters via perf
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (13 preceding siblings ...)
2025-08-15 8:34 ` [PATCH 14/14] powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU Athira Rajeev
@ 2025-08-15 12:17 ` Venkat Rao Bagalkote
2025-08-15 12:51 ` Athira Rajeev
2025-08-18 14:41 ` tejas05
15 siblings, 1 reply; 29+ messages in thread
From: Venkat Rao Bagalkote @ 2025-08-15 12:17 UTC (permalink / raw)
To: Athira Rajeev, acme, jolsa, adrian.hunter, maddy, irogers,
namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, kjain,
hbathini, Aditya.Bodkhe1
On 15/08/25 2:03 pm, Athira Rajeev wrote:
> The pseries Shared Processor Logical Partition(SPLPAR) machines can
> retrieve a log of dispatch and preempt events from the hypervisor
> using data from Disptach Trace Log(DTL) buffer. With this information,
> user can retrieve when and why each dispatch & preempt has occurred.
> The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
> via perf.
>
> - Patches 1 to 6 has powerpc PMU driver code changes to capture DTL
> trace in perf.data. And patch 14 has documentation update.
> - Patch 7 to 13 is perf tools side code changes to enable perf
> report/script on perf.data file
>
> Infrastructure used
> ===================
>
> The VPA DTL PMU counters do not interrupt on overflow or generate any
> PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
> nterval can be provided by user via sample_period field in nano seconds.
> vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
> Trace Log) contains information about dispatch/preempt, enqueue time etc.
> We directly copy the DTL buffer data as part of auxiliary buffer and it
> will be processed later. This will avoid time taken to create samples
> in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
> entries makes use of AUX support in perf infrastructure. On the tools side,
> this data is made available as PERF_RECORD_AUXTRACE records.
>
> To corelate each DTL entry with other events across CPU's, an auxtrace_queue
> is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers.
> All auxtrace queues is maintained in auxtrace heap. The queues are sorted
> based on timestamp. When the different PERF_RECORD_XX records are processed,
> compare the timestamp of perf record with timestamp of top element in the
> auxtrace heap so that DTL events can be co-related with other events
> Process the auxtrace queue if the timestamp of element from heap is
> lower than timestamp from entry in perf record. Sometimes it could happen that
> one buffer is only partially processed. if the timestamp of occurrence of
> another event is more than currently processed element in the queue, it will
> move on to next perf record. So keep track of position of buffer to continue
> processing next time. Update the timestamp of the auxtrace heap with the timestamp
> of last processed entry from the auxtrace buffer.
>
> This infrastructure ensures dispatch trace log entries can be corelated
> and presented along with other events like sched.
>
> vpa-dtl PMU example usage
>
> # ls /sys/devices/vpa_dtl/
> events format perf_event_mux_interval_ms power subsystem type uevent
>
>
> To capture the DTL data using perf record:
>
> # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
>
> The result can be interpreted using perf report. Snippet of perf report -D:
>
> # ./perf report -D
>
> There are different PERF_RECORD_XX records. In that records corresponding to
> auxtrace buffers includes:
>
> 1. PERF_RECORD_AUX
> Conveys that new data is available in AUX area
>
> 2. PERF_RECORD_AUXTRACE_INFO
> Describes offset and size of auxtrace data in the buffers
>
> 3. PERF_RECORD_AUXTRACE
> This is the record that defines the auxtrace data which here in case of
> vpa-dtl pmu is dispatch trace log data.
>
> Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump
>
> 0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
> .
> . ... VPA DTL PMU data: size 1680 bytes, entries is 35
> . 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
> . 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
> . 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
> . 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
> . 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
> . 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
> . 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
> . 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
> . 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
>
> Above is representation of dtl entry of below format:
>
> struct dtl_entry {
> u8 dispatch_reason;
> u8 preempt_reason;
> u16 processor_id;
> u32 enqueue_to_dispatch_time;
> u32 ready_to_enqueue_time;
> u32 waiting_to_ready_time;
> u64 timebase;
> u64 fault_addr;
> u64 srr0;
> u64 srr1;
> };
>
> First two fields represent the dispatch reason and preempt reason. The post
> procecssing of PERF_RECORD_AUXTRACE records will translate to meaninful data
> for user to consume.
>
> Visualize the dispatch trace log entries with perf report:
>
> # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.300 MB perf.data ]
>
> # ./perf report
> # Samples: 321 of event 'vpa-dtl'
> # Event count (approx.): 321
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ................. ..............................
> #
> 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
>
> Visualize the dispatch trace log entries with perf script:
>
> # ./perf script
> perf 13322 [002] 233.835807: sched:sched_switch: perf:13322 [120] R ==> migration/2:27 [0]
> migration/2 27 [002] 233.835811: sched:sched_migrate_task: comm=perf pid=13322 prio=120 orig_cpu=2 dest_cpu=3
> migration/2 27 [002] 233.835818: sched:sched_stat_runtime: comm=migration/2 pid=27 runtime=9214 [ns]
> migration/2 27 [002] 233.835819: sched:sched_switch: migration/2:27 [0] S ==> swapper/2:0 [120]
> swapper 0 [002] 233.835822: vpa-dtl: timebase: 338954486062657 dispatch_reason:decrementer_interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:435, ready_to_enqueue_time:0, waiting_to_ready_time:34775058, processor_id: 202 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
> swapper 0 [001] 233.835886: vpa-dtl: timebase: 338954486095398 dispatch_reason:priv_doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:542, ready_to_enqueue_time:0, waiting_to_ready_time:1245360, processor_id: 201 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
>
> Aboorva Devarajan (1):
> powerpc/time: Expose boot_tb via accessor
>
> Athira Rajeev (11):
> powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for
> capturing DTL data
> powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
> powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake
> up is needed
> tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc
> tools/perf: process auxtrace events and display in perf report -D
> tools/perf: Add event name as vpa-dtl of PERF_TYPE_SYNTH type to
> present DTL samples
> tools/perf: Allocate and setup aux buffer queue to help co-relate with
> other events across CPU's
> tools/perf: Process the DTL entries in queue and deliver samples
> tools/perf: Add support for printing synth event details via default
> callback
> tools/perf: Enable perf script to present the DTL entries
> powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
>
> Kajol Jain (2):
> powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
> docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs
> event format entries for vpa_dtl pmu
>
> .../sysfs-bus-event_source-devices-vpa-dtl | 25 +
> Documentation/arch/powerpc/index.rst | 1 +
> Documentation/arch/powerpc/vpa-dtl.rst | 155 ++++
> arch/powerpc/include/asm/time.h | 2 +
> arch/powerpc/kernel/time.c | 7 +-
> arch/powerpc/perf/Makefile | 2 +-
> arch/powerpc/perf/vpa-dtl.c | 605 ++++++++++++++
> tools/perf/arch/powerpc/util/Build | 1 +
> tools/perf/arch/powerpc/util/auxtrace.c | 122 +++
> tools/perf/builtin-script.c | 26 +
> tools/perf/util/Build | 1 +
> tools/perf/util/auxtrace.c | 4 +
> tools/perf/util/auxtrace.h | 1 +
> tools/perf/util/event.h | 1 +
> tools/perf/util/powerpc-vpadtl.c | 756 ++++++++++++++++++
> tools/perf/util/powerpc-vpadtl.h | 45 ++
> 16 files changed, 1752 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
> create mode 100644 Documentation/arch/powerpc/vpa-dtl.rst
> create mode 100644 arch/powerpc/perf/vpa-dtl.c
> create mode 100644 tools/perf/arch/powerpc/util/auxtrace.c
> create mode 100644 tools/perf/util/powerpc-vpadtl.c
> create mode 100644 tools/perf/util/powerpc-vpadtl.h
>
Tested this patchset by applying it on top of the mainline kernel, and it's
working as expected. Hence, for the entire series, please add the tag below.
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Regards,
Venkat.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 00/14] Add interface to expose vpa dtl counters via perf
2025-08-15 12:17 ` [PATCH 00/14] Add interface to expose vpa dtl counters via perf Venkat Rao Bagalkote
@ 2025-08-15 12:51 ` Athira Rajeev
0 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2025-08-15 12:51 UTC (permalink / raw)
To: Venkat Rao Bagalkote, Namhyung Kim, Arnaldo Carvalho de Melo,
Madhavan Srinivasan
Cc: Jiri Olsa, Adrian Hunter, Ian Rogers,
open list:PERFORMANCE EVENTS SUBSYSTEM, PowerPC,
Aboorva Devarajan, Shrikanth Hegde, Kajol Jain, hbathini,
Aditya Bodkhe
> On 15 Aug 2025, at 5:47 PM, Venkat Rao Bagalkote <venkat88@linux.ibm.com> wrote:
>
>
> On 15/08/25 2:03 pm, Athira Rajeev wrote:
>> The pseries Shared Processor Logical Partition(SPLPAR) machines can
>> retrieve a log of dispatch and preempt events from the hypervisor
>> using data from Disptach Trace Log(DTL) buffer. With this information,
>> user can retrieve when and why each dispatch & preempt has occurred.
>> The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
>> via perf.
>>
>> - Patches 1 to 6 has powerpc PMU driver code changes to capture DTL
>> trace in perf.data. And patch 14 has documentation update.
>> - Patch 7 to 13 is perf tools side code changes to enable perf
>> report/script on perf.data file
>>
>> Infrastructure used
>> ===================
>>
>> The VPA DTL PMU counters do not interrupt on overflow or generate any
>> PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
>> nterval can be provided by user via sample_period field in nano seconds.
>> vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
>> Trace Log) contains information about dispatch/preempt, enqueue time etc.
>> We directly copy the DTL buffer data as part of auxiliary buffer and it
>> will be processed later. This will avoid time taken to create samples
>> in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
>> entries makes use of AUX support in perf infrastructure. On the tools side,
>> this data is made available as PERF_RECORD_AUXTRACE records.
>>
>> To corelate each DTL entry with other events across CPU's, an auxtrace_queue
>> is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers.
>> All auxtrace queues is maintained in auxtrace heap. The queues are sorted
>> based on timestamp. When the different PERF_RECORD_XX records are processed,
>> compare the timestamp of perf record with timestamp of top element in the
>> auxtrace heap so that DTL events can be co-related with other events
>> Process the auxtrace queue if the timestamp of element from heap is
>> lower than timestamp from entry in perf record. Sometimes it could happen that
>> one buffer is only partially processed. if the timestamp of occurrence of
>> another event is more than currently processed element in the queue, it will
>> move on to next perf record. So keep track of position of buffer to continue
>> processing next time. Update the timestamp of the auxtrace heap with the timestamp
>> of last processed entry from the auxtrace buffer.
>>
>> This infrastructure ensures dispatch trace log entries can be corelated
>> and presented along with other events like sched.
>>
>> vpa-dtl PMU example usage
>>
>> # ls /sys/devices/vpa_dtl/
>> events format perf_event_mux_interval_ms power subsystem type uevent
>>
>>
>> To capture the DTL data using perf record:
>>
>> # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
>>
>> The result can be interpreted using perf report. Snippet of perf report -D:
>>
>> # ./perf report -D
>>
>> There are different PERF_RECORD_XX records. In that records corresponding to
>> auxtrace buffers includes:
>>
>> 1. PERF_RECORD_AUX
>> Conveys that new data is available in AUX area
>>
>> 2. PERF_RECORD_AUXTRACE_INFO
>> Describes offset and size of auxtrace data in the buffers
>>
>> 3. PERF_RECORD_AUXTRACE
>> This is the record that defines the auxtrace data which here in case of
>> vpa-dtl pmu is dispatch trace log data.
>>
>> Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump
>>
>> 0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
>> .
>> . ... VPA DTL PMU data: size 1680 bytes, entries is 35
>> . 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
>> . 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
>> . 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
>> . 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
>> . 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
>> . 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
>> . 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
>> . 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
>> . 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
>>
>> Above is representation of dtl entry of below format:
>>
>> struct dtl_entry {
>> u8 dispatch_reason;
>> u8 preempt_reason;
>> u16 processor_id;
>> u32 enqueue_to_dispatch_time;
>> u32 ready_to_enqueue_time;
>> u32 waiting_to_ready_time;
>> u64 timebase;
>> u64 fault_addr;
>> u64 srr0;
>> u64 srr1;
>> };
>>
>> First two fields represent the dispatch reason and preempt reason. The post
>> procecssing of PERF_RECORD_AUXTRACE records will translate to meaninful data
>> for user to consume.
>>
>> Visualize the dispatch trace log entries with perf report:
>>
>> # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.300 MB perf.data ]
>>
>> # ./perf report
>> # Samples: 321 of event 'vpa-dtl'
>> # Event count (approx.): 321
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. ..............................
>> #
>> 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
>>
>> Visualize the dispatch trace log entries with perf script:
>>
>> # ./perf script
>> perf 13322 [002] 233.835807: sched:sched_switch: perf:13322 [120] R ==> migration/2:27 [0]
>> migration/2 27 [002] 233.835811: sched:sched_migrate_task: comm=perf pid=13322 prio=120 orig_cpu=2 dest_cpu=3
>> migration/2 27 [002] 233.835818: sched:sched_stat_runtime: comm=migration/2 pid=27 runtime=9214 [ns]
>> migration/2 27 [002] 233.835819: sched:sched_switch: migration/2:27 [0] S ==> swapper/2:0 [120]
>> swapper 0 [002] 233.835822: vpa-dtl: timebase: 338954486062657 dispatch_reason:decrementer_interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:435, ready_to_enqueue_time:0, waiting_to_ready_time:34775058, processor_id: 202 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
>> swapper 0 [001] 233.835886: vpa-dtl: timebase: 338954486095398 dispatch_reason:priv_doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:542, ready_to_enqueue_time:0, waiting_to_ready_time:1245360, processor_id: 201 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
>>
>> Aboorva Devarajan (1):
>> powerpc/time: Expose boot_tb via accessor
>>
>> Athira Rajeev (11):
>> powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for
>> capturing DTL data
>> powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
>> powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake
>> up is needed
>> tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc
>> tools/perf: process auxtrace events and display in perf report -D
>> tools/perf: Add event name as vpa-dtl of PERF_TYPE_SYNTH type to
>> present DTL samples
>> tools/perf: Allocate and setup aux buffer queue to help co-relate with
>> other events across CPU's
>> tools/perf: Process the DTL entries in queue and deliver samples
>> tools/perf: Add support for printing synth event details via default
>> callback
>> tools/perf: Enable perf script to present the DTL entries
>> powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
>>
>> Kajol Jain (2):
>> powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
>> docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs
>> event format entries for vpa_dtl pmu
>>
>> .../sysfs-bus-event_source-devices-vpa-dtl | 25 +
>> Documentation/arch/powerpc/index.rst | 1 +
>> Documentation/arch/powerpc/vpa-dtl.rst | 155 ++++
>> arch/powerpc/include/asm/time.h | 2 +
>> arch/powerpc/kernel/time.c | 7 +-
>> arch/powerpc/perf/Makefile | 2 +-
>> arch/powerpc/perf/vpa-dtl.c | 605 ++++++++++++++
>> tools/perf/arch/powerpc/util/Build | 1 +
>> tools/perf/arch/powerpc/util/auxtrace.c | 122 +++
>> tools/perf/builtin-script.c | 26 +
>> tools/perf/util/Build | 1 +
>> tools/perf/util/auxtrace.c | 4 +
>> tools/perf/util/auxtrace.h | 1 +
>> tools/perf/util/event.h | 1 +
>> tools/perf/util/powerpc-vpadtl.c | 756 ++++++++++++++++++
>> tools/perf/util/powerpc-vpadtl.h | 45 ++
>> 16 files changed, 1752 insertions(+), 2 deletions(-)
>> create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
>> create mode 100644 Documentation/arch/powerpc/vpa-dtl.rst
>> create mode 100644 arch/powerpc/perf/vpa-dtl.c
>> create mode 100644 tools/perf/arch/powerpc/util/auxtrace.c
>> create mode 100644 tools/perf/util/powerpc-vpadtl.c
>> create mode 100644 tools/perf/util/powerpc-vpadtl.h
>>
>
> Tested this patchset by applying on top of mainline kernel, and its working as expected. Hence for the entire series, please add below tag.
>
>
> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>
>
> Regards,
>
> Venkat.
Thanks Venkat for checking.
Hi All,
I have CC-ed linuxppc-dev for the powerpc-specific patches and linux-perf-users for the perf tools side patches.
As mentioned in the cover letter:
- Patches 1 to 6 have the powerpc PMU driver code changes to capture trace data in perf.data, and patch 14 has the documentation update.
- Patches 7 to 13 are the perf tools side code changes to enable perf report/script on perf.data
Thanks
Athira
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 00/14] Add interface to expose vpa dtl counters via perf
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
` (14 preceding siblings ...)
2025-08-15 12:17 ` [PATCH 00/14] Add interface to expose vpa dtl counters via perf Venkat Rao Bagalkote
@ 2025-08-18 14:41 ` tejas05
15 siblings, 0 replies; 29+ messages in thread
From: tejas05 @ 2025-08-18 14:41 UTC (permalink / raw)
To: Athira Rajeev, acme, jolsa, adrian.hunter, maddy, irogers,
namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, kjain,
hbathini, Aditya.Bodkhe1, venkat88
On 8/15/25 14:03, Athira Rajeev wrote:
> The pseries Shared Processor Logical Partition(SPLPAR) machines can
> retrieve a log of dispatch and preempt events from the hypervisor
> using data from Disptach Trace Log(DTL) buffer. With this information,
> user can retrieve when and why each dispatch & preempt has occurred.
> The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
> via perf.
>
> - Patches 1 to 6 has powerpc PMU driver code changes to capture DTL
> trace in perf.data. And patch 14 has documentation update.
> - Patch 7 to 13 is perf tools side code changes to enable perf
> report/script on perf.data file
>
> Infrastructure used
> ===================
>
> The VPA DTL PMU counters do not interrupt on overflow or generate any
> PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
> nterval can be provided by user via sample_period field in nano seconds.
> vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
> Trace Log) contains information about dispatch/preempt, enqueue time etc.
> We directly copy the DTL buffer data as part of auxiliary buffer and it
> will be processed later. This will avoid time taken to create samples
> in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
> entries makes use of AUX support in perf infrastructure. On the tools side,
> this data is made available as PERF_RECORD_AUXTRACE records.
>
> To correlate each DTL entry with other events across CPUs, an auxtrace_queue
> is created for each CPU. Each auxtrace queue has an array/list of auxtrace buffers.
> All auxtrace queues are maintained in the auxtrace heap. The queues are sorted
> based on timestamp. When the different PERF_RECORD_XX records are processed,
> the timestamp of the perf record is compared with the timestamp of the top element in the
> auxtrace heap so that DTL events can be correlated with other events.
> The auxtrace queue is processed if the timestamp of the element from the heap is
> lower than the timestamp of the perf record. Sometimes it can happen that
> one buffer is only partially processed: if the timestamp of occurrence of
> another event is more than that of the currently processed element in the queue, processing
> moves on to the next perf record. So the position in the buffer is tracked to continue
> processing next time. The timestamp of the auxtrace heap is updated with the timestamp
> of the last processed entry from the auxtrace buffer.
>
> This infrastructure ensures dispatch trace log entries can be correlated
> and presented along with other events like sched.
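
The per-queue part of the ordering described above can be sketched in
userspace C. This is only an illustration under stated assumptions: it
models a single queue with timestamps already sorted, it leaves out the
heap that orders queues across CPUs, and the names struct dtl_queue and
drain_older_than() are hypothetical, not taken from the patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU queue: timestamps sorted ascending, pos = next unprocessed entry. */
struct dtl_queue {
	const uint64_t *ts;
	size_t len;
	size_t pos;
};

/*
 * Deliver every queued entry whose timestamp is lower than record_ts and
 * remember where processing stopped, so that a partially processed buffer
 * is resumed on the next call. Returns the number of entries delivered
 * by this call.
 */
static size_t drain_older_than(struct dtl_queue *q, uint64_t record_ts)
{
	size_t start = q->pos;

	while (q->pos < q->len && q->ts[q->pos] < record_ts)
		q->pos++;

	return q->pos - start;
}
```

Calling drain_older_than() once per perf record, with that record's
timestamp, interleaves the DTL entries with the other events in time
order; the saved pos is what lets a later call pick up mid-buffer.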
>
> vpa-dtl PMU example usage
>
> # ls /sys/devices/vpa_dtl/
> events format perf_event_mux_interval_ms power subsystem type uevent
>
>
> To capture the DTL data using perf record:
>
> # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
>
> The result can be interpreted using perf report. Snippet of perf report -D:
>
> # ./perf report -D
>
> There are different PERF_RECORD_XX records. Those corresponding to
> auxtrace buffers include:
>
> 1. PERF_RECORD_AUX
> Conveys that new data is available in AUX area
>
> 2. PERF_RECORD_AUXTRACE_INFO
> Describes offset and size of auxtrace data in the buffers
>
> 3. PERF_RECORD_AUXTRACE
> This is the record that defines the auxtrace data which here in case of
> vpa-dtl pmu is dispatch trace log data.
>
> Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump
>
> 0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
> .
> . ... VPA DTL PMU data: size 1680 bytes, entries is 35
> . 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
> . 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
> . 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
> . 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
> . 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
> . 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
> . 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
> . 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
> . 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
>
> The above is a representation of DTL entries in the format below:
>
> struct dtl_entry {
> u8 dispatch_reason;
> u8 preempt_reason;
> u16 processor_id;
> u32 enqueue_to_dispatch_time;
> u32 ready_to_enqueue_time;
> u32 waiting_to_ready_time;
> u64 timebase;
> u64 fault_addr;
> u64 srr0;
> u64 srr1;
> };
>
> The first two fields represent the dispatch reason and preempt reason. The post
> processing of PERF_RECORD_AUXTRACE records translates them to meaningful data
> for the user to consume.
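
As a cross-check of the dump above against the quoted layout, here is a
small userspace sketch. It assumes a typical ABI with natural alignment
and no padding, it ignores byte order, and dtl_entry_count() is a
hypothetical helper rather than anything from the series. The struct
mirrors the quoted one with fixed-width types.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the quoted struct dtl_entry, using fixed-width types. */
struct dtl_entry {
	uint8_t  dispatch_reason;
	uint8_t  preempt_reason;
	uint16_t processor_id;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
	uint32_t waiting_to_ready_time;
	uint64_t timebase;
	uint64_t fault_addr;
	uint64_t srr0;
	uint64_t srr1;
};

/* The dump offsets advance by 0x30, i.e. one 48-byte slot per line. */
_Static_assert(sizeof(struct dtl_entry) == 48, "unexpected dtl_entry size");

/* Number of 48-byte slots in an AUX buffer of the given size in bytes. */
static size_t dtl_entry_count(size_t aux_size_bytes)
{
	return aux_size_bytes / sizeof(struct dtl_entry);
}
```

With the size reported in the dump, dtl_entry_count(0x690) is
1680 / 48 = 35, which matches "entries is 35". That count includes the
first slot at offset 0, which carries boot_tb and tb_freq rather than a
dispatch record.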
>
> Visualize the dispatch trace log entries with perf report:
>
> # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.300 MB perf.data ]
>
> # ./perf report
> # Samples: 321 of event 'vpa-dtl'
> # Event count (approx.): 321
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ................. ..............................
> #
> 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
>
> Visualize the dispatch trace log entries with perf script:
>
> # ./perf script
> perf 13322 [002] 233.835807: sched:sched_switch: perf:13322 [120] R ==> migration/2:27 [0]
> migration/2 27 [002] 233.835811: sched:sched_migrate_task: comm=perf pid=13322 prio=120 orig_cpu=2 dest_cpu=3
> migration/2 27 [002] 233.835818: sched:sched_stat_runtime: comm=migration/2 pid=27 runtime=9214 [ns]
> migration/2 27 [002] 233.835819: sched:sched_switch: migration/2:27 [0] S ==> swapper/2:0 [120]
> swapper 0 [002] 233.835822: vpa-dtl: timebase: 338954486062657 dispatch_reason:decrementer_interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:435, ready_to_enqueue_time:0, waiting_to_ready_time:34775058, processor_id: 202 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
> swapper 0 [001] 233.835886: vpa-dtl: timebase: 338954486095398 dispatch_reason:priv_doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:542, ready_to_enqueue_time:0, waiting_to_ready_time:1245360, processor_id: 201 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
>
> Aboorva Devarajan (1):
> powerpc/time: Expose boot_tb via accessor
>
> Athira Rajeev (11):
> powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for
> capturing DTL data
> powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
> powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake
> up is needed
> tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc
> tools/perf: process auxtrace events and display in perf report -D
> tools/perf: Add event name as vpa-dtl of PERF_TYPE_SYNTH type to
> present DTL samples
> tools/perf: Allocate and setup aux buffer queue to help co-relate with
> other events across CPU's
> tools/perf: Process the DTL entries in queue and deliver samples
> tools/perf: Add support for printing synth event details via default
> callback
> tools/perf: Enable perf script to present the DTL entries
> powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
>
> Kajol Jain (2):
> powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
> docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs
> event format entries for vpa_dtl pmu
>
> .../sysfs-bus-event_source-devices-vpa-dtl | 25 +
> Documentation/arch/powerpc/index.rst | 1 +
> Documentation/arch/powerpc/vpa-dtl.rst | 155 ++++
> arch/powerpc/include/asm/time.h | 2 +
> arch/powerpc/kernel/time.c | 7 +-
> arch/powerpc/perf/Makefile | 2 +-
> arch/powerpc/perf/vpa-dtl.c | 605 ++++++++++++++
> tools/perf/arch/powerpc/util/Build | 1 +
> tools/perf/arch/powerpc/util/auxtrace.c | 122 +++
> tools/perf/builtin-script.c | 26 +
> tools/perf/util/Build | 1 +
> tools/perf/util/auxtrace.c | 4 +
> tools/perf/util/auxtrace.h | 1 +
> tools/perf/util/event.h | 1 +
> tools/perf/util/powerpc-vpadtl.c | 756 ++++++++++++++++++
> tools/perf/util/powerpc-vpadtl.h | 45 ++
> 16 files changed, 1752 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
> create mode 100644 Documentation/arch/powerpc/vpa-dtl.rst
> create mode 100644 arch/powerpc/perf/vpa-dtl.c
> create mode 100644 tools/perf/arch/powerpc/util/auxtrace.c
> create mode 100644 tools/perf/util/powerpc-vpadtl.c
> create mode 100644 tools/perf/util/powerpc-vpadtl.h
>
Hi,
I have tested the above patchset on the mainline kernel (6.17.0-rc2) and
it works fine. The vpa-dtl PMU is recognized, and perf record and perf
report work as expected. Please add the tag below for the entire series.
Tested-by: Tejas Manhas <tejas05@linux.ibm.com>
Thanks & Regards,
Tejas
* Re: [PATCH 02/14] powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
2025-08-15 8:33 ` [PATCH 02/14] powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf Athira Rajeev
@ 2025-08-20 11:53 ` Shrikanth Hegde
0 siblings, 0 replies; 29+ messages in thread
From: Shrikanth Hegde @ 2025-08-20 11:53 UTC (permalink / raw)
To: Athira Rajeev, maddy
Cc: linux-perf-users, linuxppc-dev, aboorvad, kjain, hbathini,
Aditya.Bodkhe1, venkat88, acme, jolsa, adrian.hunter, irogers,
namhyung
On 8/15/25 14:03, Athira Rajeev wrote:
> From: Kajol Jain <kjain@linux.ibm.com>
>
> The pseries Shared Processor Logical Partition(SPLPAR) machines
> can retrieve a log of dispatch and preempt events from the
> hypervisor using data from the Dispatch Trace Log (DTL) buffer.
> With this information, user can retrieve when and why each dispatch &
> preempt has occurred. Added an interface to expose the Virtual Processor
> Area(VPA) DTL counters via perf.
>
> The following events are available and exposed in sysfs:
>
> vpa_dtl/dtl_cede/ - Trace voluntary (OS initiated) virtual processor waits
> vpa_dtl/dtl_preempt/ - Trace time slice preempts
> vpa_dtl/dtl_fault/ - Trace virtual partition memory page faults.
> vpa_dtl/dtl_all/ - Trace all (dtl_cede/dtl_preempt/dtl_fault)
>
> Added interface defines supported event list, config fields for the
> event attributes and their corresponding bit values which are exported
> via sysfs. User could use the standard perf tool to access perf events
> exposed via vpa-dtl pmu.
>
> The VPA DTL PMU counters do not interrupt on overflow or generate any
> PMI interrupts. Therefore, the kernel needs to poll the counters; hrtimer
> code is added to do that. The timer interval can be provided by the user via
> the sample_period field in nanoseconds. There is one hrtimer added per
> vpa-dtl pmu thread.
>
> To ensure there are no other conflicting dtl users (example: debugfs dtl
> or /proc/powerpc/vcpudispatch_stats), interface added code to use
> "down_write_trylock" call to take the dtl_access_lock. The dtl_access_lock
> is defined in dtl.h file. Also added global reference count variable called
> "dtl_global_refc", to ensure dtl data can be captured per-cpu. Code also
> added global lock called "dtl_global_lock" to avoid race condition.
>
> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
> ---
> arch/powerpc/perf/Makefile | 2 +-
> arch/powerpc/perf/vpa-dtl.c | 349 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 350 insertions(+), 1 deletion(-)
> create mode 100644 arch/powerpc/perf/vpa-dtl.c
>
> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> index 7f53fcb7495a..78dd7e25219e 100644
> --- a/arch/powerpc/perf/Makefile
> +++ b/arch/powerpc/perf/Makefile
> @@ -14,7 +14,7 @@ obj-$(CONFIG_PPC_POWERNV) += imc-pmu.o
> obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
> obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
>
> -obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
> +obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o vpa-dtl.o
>
> obj-$(CONFIG_VPA_PMU) += vpa-pmu.o
>
> diff --git a/arch/powerpc/perf/vpa-dtl.c b/arch/powerpc/perf/vpa-dtl.c
> new file mode 100644
> index 000000000000..e92756f88801
> --- /dev/null
> +++ b/arch/powerpc/perf/vpa-dtl.c
> @@ -0,0 +1,349 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Perf interface to expose Dispatch Trace Log counters.
> + *
> + * Copyright (C) 2024 Kajol Jain, IBM Corporation
> + */
> +
> +#ifdef CONFIG_PPC_SPLPAR
> +#define pr_fmt(fmt) "vpa_dtl: " fmt
> +
> +#include <asm/dtl.h>
> +#include <linux/perf_event.h>
> +#include <asm/plpar_wrappers.h>
> +
> +#define EVENT(_name, _code) enum{_name = _code}
> +
> +/*
> + * Based on Power Architecture Platform Reference(PAPR) documentation,
> + * Table 14.14. Per Virtual Processor Area, below Dispatch Trace Log(DTL)
> + * Enable Mask used to get corresponding virtual processor dispatch
> + * to preempt traces:
> + * DTL_CEDE(0x1): Trace voluntary (OS initiated) virtual
> + * processor waits
> + * DTL_PREEMPT(0x2): Trace time slice preempts
> + * DTL_FAULT(0x4): Trace virtual partition memory page
> + * faults.
> + * DTL_ALL(0x7): Trace all (DTL_CEDE | DTL_PREEMPT | DTL_FAULT)
> + *
> + * Event codes based on Dispatch Trace Log Enable Mask.
> + */
> +EVENT(DTL_CEDE, 0x1);
> +EVENT(DTL_PREEMPT, 0x2);
> +EVENT(DTL_FAULT, 0x4);
> +EVENT(DTL_ALL, 0x7);
> +
> +GENERIC_EVENT_ATTR(dtl_cede, DTL_CEDE);
> +GENERIC_EVENT_ATTR(dtl_preempt, DTL_PREEMPT);
> +GENERIC_EVENT_ATTR(dtl_fault, DTL_FAULT);
> +GENERIC_EVENT_ATTR(dtl_all, DTL_ALL);
> +
> +PMU_FORMAT_ATTR(event, "config:0-7");
> +
> +static struct attribute *events_attr[] = {
> + GENERIC_EVENT_PTR(DTL_CEDE),
> + GENERIC_EVENT_PTR(DTL_PREEMPT),
> + GENERIC_EVENT_PTR(DTL_FAULT),
> + GENERIC_EVENT_PTR(DTL_ALL),
> + NULL
> +};
> +
> +static struct attribute_group event_group = {
> + .name = "events",
> + .attrs = events_attr,
> +};
> +
> +static struct attribute *format_attrs[] = {
> + &format_attr_event.attr,
> + NULL,
> +};
> +
> +static const struct attribute_group format_group = {
> + .name = "format",
> + .attrs = format_attrs,
> +};
> +
> +static const struct attribute_group *attr_groups[] = {
> + &format_group,
> + &event_group,
> + NULL,
> +};
> +
> +struct vpa_dtl {
> + struct dtl_entry *buf;
> + u64 last_idx;
> + bool active_lock;
How is this active_lock being used?
I see it is set/unset, but couldn't figure out how it is used.
> +};
> +
> +static DEFINE_PER_CPU(struct vpa_dtl, vpa_dtl_cpu);
> +
> +/* variable to capture reference count for the active dtl threads */
> +static int dtl_global_refc;
> +static spinlock_t dtl_global_lock = __SPIN_LOCK_UNLOCKED(dtl_global_lock);
> +
> +/*
> + * Function to dump the dispatch trace log buffer data to the
> + * perf data.
> + */
> +static void vpa_dtl_dump_sample_data(struct perf_event *event)
> +{
> + return;
> +}
> +
> +/*
> + * The VPA Dispatch Trace log counters do not interrupt on overflow.
> + * Therefore, the kernel needs to poll the counters to avoid missing
> + * an overflow using hrtimer. The timer interval is based on sample_period
> + * count provided by user, and minimum interval is 1 millisecond.
> + */
> +static enum hrtimer_restart vpa_dtl_hrtimer_handle(struct hrtimer *hrtimer)
> +{
> + struct perf_event *event;
> + u64 period;
> +
> + event = container_of(hrtimer, struct perf_event, hw.hrtimer);
> +
> + if (event->state != PERF_EVENT_STATE_ACTIVE)
> + return HRTIMER_NORESTART;
> +
> + vpa_dtl_dump_sample_data(event);
> + period = max_t(u64, NSEC_PER_MSEC, event->hw.sample_period);
> + hrtimer_forward_now(hrtimer, ns_to_ktime(period));
> +
> + return HRTIMER_RESTART;
> +}
> +
> +static void vpa_dtl_start_hrtimer(struct perf_event *event)
> +{
> + u64 period;
> + struct hw_perf_event *hwc = &event->hw;
> +
> + period = max_t(u64, NSEC_PER_MSEC, hwc->sample_period);
> + hrtimer_start(&hwc->hrtimer, ns_to_ktime(period), HRTIMER_MODE_REL_PINNED);
> +}
> +
> +static void vpa_dtl_stop_hrtimer(struct perf_event *event)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> +
> + hrtimer_cancel(&hwc->hrtimer);
> +}
> +
> +static void vpa_dtl_reset_global_refc(struct perf_event *event)
> +{
> + spin_lock(&dtl_global_lock);
> + dtl_global_refc--;
> + if (dtl_global_refc <= 0) {
> + dtl_global_refc = 0;
> + up_write(&dtl_access_lock);
> + }
> + spin_unlock(&dtl_global_lock);
> +}
> +
> +/* Allocate dtl buffer memory for given cpu. */
The function name is self-explanatory, so the above comment may not be needed.
> +static int vpa_dtl_mem_alloc(int cpu)
> +{
> + struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, cpu);
> + struct dtl_entry *buf = NULL;
> +
> + /* Check for dispatch trace log buffer cache */
> + if (!dtl_cache)
> + return -ENOMEM;
> +
> + buf = kmem_cache_alloc_node(dtl_cache, GFP_KERNEL, cpu_to_node(cpu));
You probably need GFP_ATOMIC here, since this is called from
vpa_dtl_event_init() with the dtl_global_lock spinlock held.
> + if (!buf) {
> + pr_warn("buffer allocation failed for cpu %d\n", cpu);
> + return -ENOMEM;
> + }
> + dtl->buf = buf;
> + return 0;
> +}
> +
> +static int vpa_dtl_event_init(struct perf_event *event)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> + struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
> +
> + /* test the event attr type for PMU enumeration */
> + if (event->attr.type != event->pmu->type)
> + return -ENOENT;
> +
> + if (!perfmon_capable())
> + return -EACCES;
> +
> + /* Return if this is a counting event */
> + if (!is_sampling_event(event))
> + return -EOPNOTSUPP;
> +
> + /* no branch sampling */
> + if (has_branch_stack(event))
> + return -EOPNOTSUPP;
> +
> + /* Invalid eventcode */
> + switch (event->attr.config) {
> + case DTL_LOG_CEDE:
> + case DTL_LOG_PREEMPT:
> + case DTL_LOG_FAULT:
> + case DTL_LOG_ALL:
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + spin_lock(&dtl_global_lock);
> +
> + /*
> + * To ensure there are no other conflicting dtl users
> + * (example: /proc/powerpc/vcpudispatch_stats or debugfs dtl),
> + * below code try to take the dtl_access_lock.
> + * The dtl_access_lock is a rwlock defined in dtl.h, which is used
> + * to ensure there are no conflicting dtl users.
> + * Based on below code, vpa_dtl pmu tries to take write access lock
> + * and also checks for dtl_global_refc, to make sure that the
> + * dtl_access_lock is taken by vpa_dtl pmu interface.
> + */
> + if (dtl_global_refc == 0 && !down_write_trylock(&dtl_access_lock)) {
> + spin_unlock(&dtl_global_lock);
> + return -EBUSY;
> + }
> +
> + /* Allocate dtl buffer memory */
> + if (vpa_dtl_mem_alloc(event->cpu)) {
> + spin_unlock(&dtl_global_lock);
> + return -ENOMEM;
> + }
> +
> + /*
> + * Increment the number of active vpa_dtl pmu threads. The
> + * dtl_global_refc is used to keep count of cpu threads that
> + * currently capturing dtl data using vpa_dtl pmu interface.
> + */
> + dtl_global_refc++;
> +
> + /*
> + * active_lock is a per cpu variable which is set if
> + * current cpu is running vpa_dtl perf record session.
> + */
> + dtl->active_lock = true;
> + spin_unlock(&dtl_global_lock);
> +
> + hrtimer_setup(&hwc->hrtimer, vpa_dtl_hrtimer_handle, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +
> + /*
> + * Since hrtimers have a fixed rate, we can do a static freq->period
> + * mapping and avoid the whole period adjust feedback stuff.
> + */
I didn't get this comment. What is meant by "hrtimers have a fixed rate"? You can
always adjust the period value for the next expiry.
> + if (event->attr.freq) {
> + long freq = event->attr.sample_freq;
> +
> + event->attr.sample_period = NSEC_PER_SEC / freq;
> + hwc->sample_period = event->attr.sample_period;
> + local64_set(&hwc->period_left, hwc->sample_period);
> + hwc->last_period = hwc->sample_period;
> + event->attr.freq = 0;
> + }
I am not very familiar with PMU stuff.
What does the above do? What is period_left?
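
For readers following the question, the arithmetic in the quoted hunk can
be sketched in userspace C. The constants are assumed to match the
kernel's NSEC_PER_SEC and NSEC_PER_MSEC; the helper names
freq_to_period_ns() and timer_interval_ns() are hypothetical, and the
zero-frequency guard is an addition for this sketch only. In the generic
perf code, period_left holds how much of the current period remains
before the next sample is due; the hunk simply initialises it to the
full period.

```c
#include <stdint.h>

#define NSEC_PER_SEC  1000000000ULL
#define NSEC_PER_MSEC 1000000ULL

/* Map a sampling frequency in Hz to a fixed period in ns, as the quoted hunk does. */
static uint64_t freq_to_period_ns(uint64_t freq_hz)
{
	/* Guard added for this sketch only; the quoted code divides unconditionally. */
	if (freq_hz == 0)
		return 0;
	return NSEC_PER_SEC / freq_hz;
}

/* Interval actually programmed into the hrtimer: never below 1 ms. */
static uint64_t timer_interval_ns(uint64_t sample_period_ns)
{
	return sample_period_ns > NSEC_PER_MSEC ? sample_period_ns : NSEC_PER_MSEC;
}
```

Under these assumptions a request of 100 Hz becomes a 10,000,000 ns
period, while 4000 Hz becomes 250,000 ns and is then raised to the 1 ms
floor by the max_t() in the timer code.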
> +
> + event->destroy = vpa_dtl_reset_global_refc;
> + return 0;
> +}
> +
> +static int vpa_dtl_event_add(struct perf_event *event, int flags)
> +{
> + int ret, hwcpu;
> + unsigned long addr;
> + struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
> +
> + /*
> + * Register our dtl buffer with the hypervisor. The
> + * HV expects the buffer size to be passed in the second
> + * word of the buffer. Refer section '14.11.3.2. H_REGISTER_VPA'
> + * from PAPR for more information.
> + */
> + ((u32 *)dtl->buf)[1] = cpu_to_be32(DISPATCH_LOG_BYTES);
> + dtl->last_idx = 0;
> +
> + hwcpu = get_hard_smp_processor_id(event->cpu);
> + addr = __pa(dtl->buf);
> +
> + ret = register_dtl(hwcpu, addr);
> + if (ret) {
> + pr_warn("DTL registration for cpu %d (hw %d) failed with %d\n",
> + event->cpu, hwcpu, ret);
> + return ret;
> + }
> +
> + /* set our initial buffer indices */
> + lppaca_of(event->cpu).dtl_idx = 0;
> +
> + /*
> + * Ensure that our updates to the lppaca fields have
> + * occurred before we actually enable the logging
> + */
> + smp_wmb();
> +
> + /* enable event logging */
> + lppaca_of(event->cpu).dtl_enable_mask = event->attr.config;
> +
> + vpa_dtl_start_hrtimer(event);
> +
> + return 0;
> +}
> +
> +static void vpa_dtl_event_del(struct perf_event *event, int flags)
> +{
> + int hwcpu = get_hard_smp_processor_id(event->cpu);
> + struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
> +
> + vpa_dtl_stop_hrtimer(event);
> + unregister_dtl(hwcpu);
> + kmem_cache_free(dtl_cache, dtl->buf);
> + dtl->buf = NULL;
> + lppaca_of(event->cpu).dtl_enable_mask = 0x0;
> + dtl->active_lock = false;
> +}
> +
> +/*
> + * This function definition is empty as vpa_dtl_dump_sample_data
> + * is used to parse and dump the dispatch trace log data,
> + * to perf data.
> + */
> +static void vpa_dtl_event_read(struct perf_event *event)
> +{
> +}
> +
> +static struct pmu vpa_dtl_pmu = {
> + .task_ctx_nr = perf_invalid_context,
> +
> + .name = "vpa_dtl",
> + .attr_groups = attr_groups,
> + .event_init = vpa_dtl_event_init,
> + .add = vpa_dtl_event_add,
> + .del = vpa_dtl_event_del,
> + .read = vpa_dtl_event_read,
> + .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_EXCLUSIVE,
> +};
> +
> +static int vpa_dtl_init(void)
> +{
> + int r;
> +
> + if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
> + pr_debug("not a shared virtualized system, not enabling\n");
> + return -ENODEV;
> + }
> +
> + /* This driver is intended only for L1 host. */
> + if (is_kvm_guest()) {
> + pr_debug("Only supported for L1 host system\n");
> + return -ENODEV;
> + }
> +
> + r = perf_pmu_register(&vpa_dtl_pmu, vpa_dtl_pmu.name, -1);
> + if (r)
> + return r;
> +
> + return 0;
> +}
> +
> +device_initcall(vpa_dtl_init);
> +#endif //CONFIG_PPC_SPLPAR
* Re: [PATCH 07/14] tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc
2025-08-15 8:34 ` [PATCH 07/14] tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc Athira Rajeev
@ 2025-08-27 17:27 ` Adrian Hunter
2025-08-29 8:29 ` Athira Rajeev
0 siblings, 1 reply; 29+ messages in thread
From: Adrian Hunter @ 2025-08-27 17:27 UTC (permalink / raw)
To: Athira Rajeev, acme, jolsa, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, kjain,
hbathini, Aditya.Bodkhe1, venkat88
On 15/08/2025 11:34, Athira Rajeev wrote:
> The powerpc PMU collecting Dispatch Trace Log (DTL) entries makes use of
> AUX support in perf infrastructure. The PMU driver has the functionality
> to collect trace entries in the aux buffer. On the tools side, this data
> is made available as PERF_RECORD_AUXTRACE records. This record is
> generated by "perf record" command. To enable the creation of
> PERF_RECORD_AUXTRACE, add functions to initialize auxtrace records ie
> "auxtrace_record__init()". Fill in fields for other callbacks like
> info_priv_size, info_fill, free, recording options etc. Define
> auxtrace_type as PERF_AUXTRACE_VPA_PMU. Add header file to define vpa
> dtl pmu specific details.
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
> ---
> tools/perf/arch/powerpc/util/Build | 1 +
> tools/perf/arch/powerpc/util/auxtrace.c | 122 ++++++++++++++++++++++++
> tools/perf/util/auxtrace.c | 2 +
> tools/perf/util/auxtrace.h | 1 +
> tools/perf/util/powerpc-vpadtl.h | 26 +++++
> 5 files changed, 152 insertions(+)
> create mode 100644 tools/perf/arch/powerpc/util/auxtrace.c
> create mode 100644 tools/perf/util/powerpc-vpadtl.h
>
> diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
> index fdd6a77a3432..a5b0babd307e 100644
> --- a/tools/perf/arch/powerpc/util/Build
> +++ b/tools/perf/arch/powerpc/util/Build
> @@ -10,3 +10,4 @@ perf-util-$(CONFIG_LIBDW) += skip-callchain-idx.o
>
> perf-util-$(CONFIG_LIBUNWIND) += unwind-libunwind.o
> perf-util-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
> +perf-util-$(CONFIG_AUXTRACE) += auxtrace.o
> diff --git a/tools/perf/arch/powerpc/util/auxtrace.c b/tools/perf/arch/powerpc/util/auxtrace.c
> new file mode 100644
> index 000000000000..ec8ec601fd08
> --- /dev/null
> +++ b/tools/perf/arch/powerpc/util/auxtrace.c
> @@ -0,0 +1,122 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * VPA support
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/bitops.h>
> +#include <linux/log2.h>
> +#include <time.h>
> +
> +#include "../../util/cpumap.h"
> +#include "../../util/evsel.h"
> +#include "../../util/evlist.h"
> +#include "../../util/session.h"
> +#include "../../util/util.h"
> +#include "../../util/pmu.h"
> +#include "../../util/debug.h"
> +#include "../../util/auxtrace.h"
> +#include "../../util/powerpc-vpadtl.h"
It would be better to only add #includes when they are needed
> +#include "../../util/record.h"
> +#include <internal/lib.h> // page_size
> +
> +#define KiB(x) ((x) * 1024)
> +
> +static int
> +powerpc_vpadtl_parse_snapshot_options(struct auxtrace_record *itr __maybe_unused,
> + struct record_opts *opts __maybe_unused,
> + const char *str __maybe_unused)
> +{
> + return 0;
> +}
> +
> +static int
> +powerpc_vpadtl_recording_options(struct auxtrace_record *ar __maybe_unused,
> + struct evlist *evlist __maybe_unused,
> + struct record_opts *opts)
> +{
> + opts->full_auxtrace = true;
> +
> + /*
> + * Set auxtrace_mmap_pages to minimum
> + * two pages
> + */
> + if (!opts->auxtrace_mmap_pages) {
> + opts->auxtrace_mmap_pages = KiB(128) / page_size;
> + if (opts->mmap_pages == UINT_MAX)
> + opts->mmap_pages = KiB(256) / page_size;
> + }
> +
> + return 0;
> +}
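
A side note on the "minimum two pages" comment in the hunk above: the
code computes KiB(128) / page_size, so the result depends on the page
size. A minimal sketch of that arithmetic follows; default_aux_pages()
is a hypothetical name, and the 64 KiB page size is an assumption about
a common ppc64 configuration, not something stated in the patch.

```c
#include <stddef.h>

#define KiB(x) ((x) * 1024)

/* Number of AUX mmap pages chosen by default for a given page size in bytes. */
static size_t default_aux_pages(size_t page_size)
{
	return KiB(128) / page_size;
}
```

With 64 KiB pages this gives 2, which agrees with the comment; with
4 KiB pages it gives 32, so the comment only describes the 64 KiB case.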
> +
> +static size_t powerpc_vpadtl_info_priv_size(struct auxtrace_record *itr __maybe_unused,
> + struct evlist *evlist __maybe_unused)
> +{
> + return 0;
return VPADTL_AUXTRACE_PRIV_SIZE;
> +}
> +
> +static int
> +powerpc_vpadtl_info_fill(struct auxtrace_record *itr __maybe_unused,
> + struct perf_session *session __maybe_unused,
> + struct perf_record_auxtrace_info *auxtrace_info __maybe_unused,
auxtrace_info is not __maybe_unused
> + size_t priv_size __maybe_unused)
> +{
> + auxtrace_info->type = PERF_AUXTRACE_VPA_PMU;
> +
> + return 0;
> +}
> +
> +static u64 powerpc_vpadtl_reference(struct auxtrace_record *itr __maybe_unused)
> +{
> + return 0;
> +}
> +
> +static void powerpc_vpadtl_free(struct auxtrace_record *itr)
> +{
> + free(itr);
> +}
> +
> +struct auxtrace_record *auxtrace_record__init(struct evlist *evlist __maybe_unused,
evlist is not __maybe_unused
> + int *err)
> +{
> + struct auxtrace_record *aux;
> + struct evsel *pos;
> + char *pmu_name;
> + int found = 0;
> +
> + evlist__for_each_entry(evlist, pos) {
> + pmu_name = strdup(pos->name);
> + pmu_name = strtok(pmu_name, "/");
> + if (!strcmp(pmu_name, "vpa_dtl")) {
pmu_name is leaked, but strstarts() could be used instead
of the above
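
A minimal userspace sketch of the allocation-free check suggested above.
has_prefix() is a hypothetical stand-in for perf's strstarts(), and
is_vpa_dtl_event() is likewise an illustrative name. Note one assumed
difference from the quoted code: matching the prefix "vpa_dtl/" requires
the trailing slash, whereas the strtok() version also accepts a bare
"vpa_dtl".

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for strstarts(): true if str begins with prefix. */
static bool has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Match an event name such as "vpa_dtl/dtl_all/" without strdup()/strtok(). */
static bool is_vpa_dtl_event(const char *evsel_name)
{
	return evsel_name && has_prefix(evsel_name, "vpa_dtl/");
}
```

Nothing is allocated, so there is nothing to free on either the match or
the no-match path, and the NULL check covers an evsel without a name.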
> + found = 1;
> + pos->needs_auxtrace_mmap = true;
> + break;
> + }
> + }
> +
> + if (!found)
> + return NULL;
> +
> + /*
> + * To obtain the auxtrace buffer file descriptor, the auxtrace event
> + * must come first.
> + */
> + evlist__to_front(pos->evlist, pos);
> +
> + aux = zalloc(sizeof(*aux));
> + if (aux == NULL) {
> + pr_debug("aux record is NULL\n");
> + *err = -ENOMEM;
> + return NULL;
> + }
> +
> + aux->parse_snapshot_options = powerpc_vpadtl_parse_snapshot_options;
Doesn't look like snapshot mode is supported, so
powerpc_vpadtl_parse_snapshot_options() is not needed
> + aux->recording_options = powerpc_vpadtl_recording_options;
> + aux->info_priv_size = powerpc_vpadtl_info_priv_size;
> + aux->info_fill = powerpc_vpadtl_info_fill;
> + aux->free = powerpc_vpadtl_free;
> + aux->reference = powerpc_vpadtl_reference;
reference is optional. powerpc_vpadtl_reference() stub is not needed
> + return aux;
> +}
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index ebd32f1b8f12..f587d386c5ef 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -55,6 +55,7 @@
> #include "hisi-ptt.h"
> #include "s390-cpumsf.h"
> #include "util/mmap.h"
> +#include "powerpc-vpadtl.h"
Isn't needed yet
>
> #include <linux/ctype.h>
> #include "symbol/kallsyms.h"
> @@ -1393,6 +1394,7 @@ int perf_event__process_auxtrace_info(struct perf_session *session,
> case PERF_AUXTRACE_HISI_PTT:
> err = hisi_ptt_process_auxtrace_info(event, session);
> break;
> + case PERF_AUXTRACE_VPA_PMU:
> case PERF_AUXTRACE_UNKNOWN:
> default:
> return -EINVAL;
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index f001cbb68f8e..1f9ef473af77 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -50,6 +50,7 @@ enum auxtrace_type {
> PERF_AUXTRACE_ARM_SPE,
> PERF_AUXTRACE_S390_CPUMSF,
> PERF_AUXTRACE_HISI_PTT,
> + PERF_AUXTRACE_VPA_PMU,
Everything else is called some variation of vpa dtl, so
PERF_AUXTRACE_VPA_DTL would seem a more consistent name
> };
>
> enum itrace_period_type {
> diff --git a/tools/perf/util/powerpc-vpadtl.h b/tools/perf/util/powerpc-vpadtl.h
> new file mode 100644
> index 000000000000..625172adaba5
> --- /dev/null
> +++ b/tools/perf/util/powerpc-vpadtl.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * VPA DTL PMU Support
> + */
> +
> +#ifndef INCLUDE__PERF_POWERPC_VPADTL_H__
> +#define INCLUDE__PERF_POWERPC_VPADTL_H__
> +
> +#define POWERPC_VPADTL_NAME "powerpc_vpadtl_"
> +
> +enum {
> + POWERPC_VPADTL_TYPE,
> + VPADTL_PER_CPU_MMAPS,
VPADTL_PER_CPU_MMAPS is never used
> + VPADTL_AUXTRACE_PRIV_MAX,
> +};
> +
> +#define VPADTL_AUXTRACE_PRIV_SIZE (VPADTL_AUXTRACE_PRIV_MAX * sizeof(u64))
> +
> +union perf_event;
> +struct perf_session;
> +struct perf_pmu;
> +
> +int powerpc_vpadtl_process_auxtrace_info(union perf_event *event,
> + struct perf_session *session);
None of these definitions are used in this patch, although probably
VPADTL_AUXTRACE_PRIV_SIZE should be.
It would be better to add definitions only when they are needed.
> +
> +#endif
* Re: [PATCH 08/14] tools/perf: process auxtrace events and display in perf report -D
2025-08-15 8:34 ` [PATCH 08/14] tools/perf: process auxtrace events and display in perf report -D Athira Rajeev
@ 2025-08-27 17:28 ` Adrian Hunter
2025-08-29 8:31 ` Athira Rajeev
0 siblings, 1 reply; 29+ messages in thread
From: Adrian Hunter @ 2025-08-27 17:28 UTC (permalink / raw)
To: Athira Rajeev, acme, jolsa, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, kjain,
hbathini, Aditya.Bodkhe1, venkat88
On 15/08/2025 11:34, Athira Rajeev wrote:
> Add vpa dtl pmu auxtrace process function for "perf report -D".
> The auxtrace event processing functions are defined in file
> "util/powerpc-vpadtl.c". Data structures used includes "struct
> powerpc_vpadtl_queue", "struct powerpc_vpadtl" to store the auxtrace
> buffers in queue. Different PERF_RECORD_XXX are generated
> during recording. PERF_RECORD_AUXTRACE_INFO is processed first
> since it is of type perf_user_event_type and perf session event
> delivers perf_session__process_user_event() first. Define function
> powerpc_vpadtl_process_auxtrace_info() to handle the processing of
> PERF_RECORD_AUXTRACE_INFO records. In this function, initialize
> the aux buffer queues using auxtrace_queues__init(). Setup the
> required infrastructure for aux data processing. The data is collected
> per CPU and auxtrace_queue is created for each CPU.
>
> Define powerpc_vpadtl_process_event() function to process
> PERF_RECORD_AUXTRACE records. In this, add the event to queue using
> auxtrace_queues__add_event() and process the buffer in
> powerpc_vpadtl_dump_event(). The first entry in the buffer with
> timebase as zero has boot timebase and frequency. Remaining data is of
> format for "struct dtl_entry". Define the translation for
> dispatch_reasons and preempt_reasons, report this when dump trace is
> invoked via powerpc_vpadtl_dump()
>
> Sample output:
>
> ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.300 MB perf.data ]
>
> ./perf report -D
>
> 0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
> .
> . ... VPA DTL PMU data: size 1680 bytes, entries is 35
> . 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
> . 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
> . 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
> . 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
> . 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
> . 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
> . 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
> . 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
> . 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
> ---
> tools/perf/util/Build | 1 +
> tools/perf/util/auxtrace.c | 2 +
> tools/perf/util/powerpc-vpadtl.c | 299 +++++++++++++++++++++++++++++++
> 3 files changed, 302 insertions(+)
> create mode 100644 tools/perf/util/powerpc-vpadtl.c
>
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index 4959e7a990e4..5ead46dc98e7 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -136,6 +136,7 @@ perf-util-$(CONFIG_AUXTRACE) += arm-spe-decoder/
> perf-util-$(CONFIG_AUXTRACE) += hisi-ptt.o
> perf-util-$(CONFIG_AUXTRACE) += hisi-ptt-decoder/
> perf-util-$(CONFIG_AUXTRACE) += s390-cpumsf.o
> +perf-util-$(CONFIG_AUXTRACE) += powerpc-vpadtl.o
>
> ifdef CONFIG_LIBOPENCSD
> perf-util-$(CONFIG_AUXTRACE) += cs-etm.o
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index f587d386c5ef..bd1404f26bb7 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -1395,6 +1395,8 @@ int perf_event__process_auxtrace_info(struct perf_session *session,
> err = hisi_ptt_process_auxtrace_info(event, session);
> break;
> case PERF_AUXTRACE_VPA_PMU:
> + err = powerpc_vpadtl_process_auxtrace_info(event, session);
> + break;
> case PERF_AUXTRACE_UNKNOWN:
> default:
> return -EINVAL;
> diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
> new file mode 100644
> index 000000000000..ea7b59c45f4a
> --- /dev/null
> +++ b/tools/perf/util/powerpc-vpadtl.c
> @@ -0,0 +1,299 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * VPA DTL PMU support
> + */
> +
> +#include <endian.h>
> +#include <errno.h>
> +#include <byteswap.h>
> +#include <inttypes.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/bitops.h>
> +#include <linux/log2.h>
> +#include <elf.h>
> +#include <limits.h>
> +
> +#include "cpumap.h"
> +#include "color.h"
> +#include "evsel.h"
> +#include "evlist.h"
> +#include "machine.h"
> +#include "session.h"
> +#include "util.h"
> +#include "thread.h"
> +#include "debug.h"
> +#include "auxtrace.h"
> +#include "powerpc-vpadtl.h"
> +#include "map.h"
> +#include "symbol_conf.h"
> +#include "symbol.h"
Are all these #includes really needed?
> +
> +/*
> + * The DTL entries are of below format
> + */
> +struct dtl_entry {
> + u8 dispatch_reason;
> + u8 preempt_reason;
> + u16 processor_id;
> + u32 enqueue_to_dispatch_time;
> + u32 ready_to_enqueue_time;
> + u32 waiting_to_ready_time;
> + u64 timebase;
> + u64 fault_addr;
> + u64 srr0;
> + u64 srr1;
> +};
struct dtl_entry is moved in a later patch.
Maybe call it vpadtl_entry or powerpc_vpadtl_entry and
put it in perf/util/event.h since it is eventually needed
in perf/builtin-script.c
> +
> +/*
> + * Structure to save the auxtrace queue
> + */
> +struct powerpc_vpadtl {
> + struct auxtrace auxtrace;
> + struct auxtrace_queues queues;
> + struct auxtrace_heap heap;
> + u32 auxtrace_type;
> + struct perf_session *session;
> + struct machine *machine;
> + u32 pmu_type;
> +};
> +
> +struct boottb_freq {
> + u64 boot_tb;
> + u64 tb_freq;
> + u64 timebase;
> + u64 padded[3];
> +};
> +
> +struct powerpc_vpadtl_queue {
> + struct powerpc_vpadtl *vpa;
> + unsigned int queue_nr;
> + struct auxtrace_buffer *buffer;
> + struct thread *thread;
> + bool on_heap;
> + bool done;
> + pid_t pid;
> + pid_t tid;
> + int cpu;
> +};
> +
> +const char *dispatch_reasons[11] = {
> + "external_interrupt",
> + "firmware_internal_event",
> + "H_PROD",
> + "decrementer_interrupt",
> + "system_reset",
> + "firmware_internal_event",
> + "conferred_cycles",
> + "time_slice",
> + "virtual_memory_page_fault",
> + "expropriated_adjunct",
> + "priv_doorbell"};
> +
> +const char *preempt_reasons[10] = {
> + "unused",
> + "firmware_internal_event",
> + "H_CEDE",
> + "H_CONFER",
> + "time_slice",
> + "migration_hibernation_page_fault",
> + "virtual_memory_page_fault",
> + "H_CONFER_ADJUNCT",
> + "hcall_adjunct",
> + "HDEC_adjunct"};
> +
> +#define dtl_entry_size 48
sizeof(struct dtl_entry) ?
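For illustration, the 48 could also be pinned down at build time. This is only a standalone sketch: the struct name and the helper are hypothetical, and <stdint.h> types stand in for the kernel's u8/u16/u32/u64.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dtl_entry; uintN_t replaces the uN types. */
struct vpadtl_entry_sketch {
	uint8_t  dispatch_reason;
	uint8_t  preempt_reason;
	uint16_t processor_id;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
	uint32_t waiting_to_ready_time;
	uint64_t timebase;
	uint64_t fault_addr;
	uint64_t srr0;
	uint64_t srr1;
};

/* Break the build if padding ever changes the entry size. */
_Static_assert(sizeof(struct vpadtl_entry_sketch) == 48,
	       "DTL entry must be 48 bytes");

/* Whole entries in a buffer of len bytes; a trailing partial entry is ignored. */
static size_t vpadtl_nr_entries(size_t len)
{
	return len / sizeof(struct vpadtl_entry_sketch);
}
```

With the 1680-byte buffer from the sample output this gives 35 entries, matching the dump.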
> +
> +/*
> + * Function to dump the dispatch trace data when perf report
> + * is invoked with -D
> + */
> +static void powerpc_vpadtl_dump(struct powerpc_vpadtl *vpa __maybe_unused,
> + unsigned char *buf, size_t len)
> +{
> + struct dtl_entry *dtl;
> + int pkt_len, pos = 0;
> + const char *color = PERF_COLOR_BLUE;
> +
> + color_fprintf(stdout, color,
> + ". ... VPA DTL PMU data: size %zu bytes, entries is %zu\n",
> + len, len/dtl_entry_size);
> +
> + if (len % dtl_entry_size)
> + len = len - (len % dtl_entry_size);
> +
> + while (len) {
> + pkt_len = 48;
dtl_entry_size ?
> + printf(".");
> + color_fprintf(stdout, color, " %08x: ", pos);
> + dtl = (struct dtl_entry *)buf;
> + if (dtl->timebase != 0) {
> + printf("dispatch_reason:%s, preempt_reason:%s, enqueue_to_dispatch_time:%d, ready_to_enqueue_time:%d, waiting_to_ready_time:%d\n",
> + dispatch_reasons[dtl->dispatch_reason], preempt_reasons[dtl->preempt_reason], be32_to_cpu(dtl->enqueue_to_dispatch_time),
> + be32_to_cpu(dtl->ready_to_enqueue_time), be32_to_cpu(dtl->waiting_to_ready_time));
Lines are getting a bit long
> + } else {
> + struct boottb_freq *boot_tb = (struct boottb_freq *)buf;
> +
> + printf("boot_tb: %" PRIu64 ", tb_freq: %" PRIu64 "\n", boot_tb->boot_tb, boot_tb->tb_freq);
> + }
> +
> + pos += pkt_len;
> + buf += pkt_len;
> + len -= pkt_len;
> + }
> +}
> +
> +static void powerpc_vpadtl_dump_event(struct powerpc_vpadtl *vpa, unsigned char *buf,
> + size_t len)
> +{
> + printf(".\n");
> + powerpc_vpadtl_dump(vpa, buf, len);
> +}
> +
> +static int powerpc_vpadtl_process_event(struct perf_session *session __maybe_unused,
> + union perf_event *event __maybe_unused,
> + struct perf_sample *sample __maybe_unused,
> + const struct perf_tool *tool __maybe_unused)
> +{
> + return 0;
> +}
> +
> +/*
> + * Process PERF_RECORD_AUXTRACE records
> + */
> +static int powerpc_vpadtl_process_auxtrace_event(struct perf_session *session,
> + union perf_event *event,
> + const struct perf_tool *tool __maybe_unused)
> +{
> + struct powerpc_vpadtl *vpa = container_of(session->auxtrace, struct powerpc_vpadtl,
> + auxtrace);
Might be worth adding a helper like
static struct powerpc_vpadtl *session_to_vpa(struct perf_session *session)
{
return container_of(session->auxtrace, struct powerpc_vpadtl, auxtrace);
}
> + struct auxtrace_buffer *buffer;
> + off_t data_offset;
> + int fd = perf_data__fd(session->data);
> + int err;
> +
> + if (perf_data__is_pipe(session->data)) {
> + data_offset = 0;
> + } else {
> + data_offset = lseek(fd, 0, SEEK_CUR);
> + if (data_offset == -1)
> + return -errno;
> + }
> +
> + err = auxtrace_queues__add_event(&vpa->queues, session, event,
> + data_offset, &buffer);
auxtrace_queues__add_event() is only needed here if there is no
auxtrace index, however an auxtrace index is always written for
new perf.data files. The index gets processed and data queued
by auxtrace_queues__process_index() which is added in patch 11.
Piped data, on the other hand, has no index and needs to be
handled here.
So:
if (perf_data__is_pipe(session->data)) {
err = auxtrace_queues__add_event(&vpa->queues, session, event, 0, &buffer);
if (err)
return err;
}
> + if (err)
> + return err;
> +
> + /* Dump here now we have copied a piped trace out of the pipe */
> + if (dump_trace) {
> + if (auxtrace_buffer__get_data(buffer, fd)) {
> + powerpc_vpadtl_dump_event(vpa, buffer->data,
> + buffer->size);
Unnecessary line wrap
> + auxtrace_buffer__put_data(buffer);
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int powerpc_vpadtl_flush(struct perf_session *session __maybe_unused,
> + const struct perf_tool *tool __maybe_unused)
> +{
> + return 0;
> +}
> +
> +static void powerpc_vpadtl_free_queue(void *priv)
> +{
> + struct powerpc_vpadtl_queue *vpaq = priv;
> +
> + if (!vpaq)
> + return;
> +
> + free(vpaq);
> +}
> +
> +static void powerpc_vpadtl_free_events(struct perf_session *session)
> +{
> + struct powerpc_vpadtl *vpa = container_of(session->auxtrace, struct powerpc_vpadtl,
> + auxtrace);
> + struct auxtrace_queues *queues = &vpa->queues;
> + unsigned int i;
> +
> + for (i = 0; i < queues->nr_queues; i++) {
> + powerpc_vpadtl_free_queue(queues->queue_array[i].priv);
This is the same as free(queues->queue_array[i].priv)
> + queues->queue_array[i].priv = NULL;
Could all be reduced to zfree(&queues->queue_array[i].priv)
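A standalone sketch of the free-and-clear pattern, assuming (as in perf's zalloc.h, to the best of my knowledge) that zfree() is passed the address of the pointer. The names zfree_sketch, queue_sketch and free_queue_privs are made up for the example.

```c
#include <stdlib.h>

/* Local stand-in for perf's zfree(): free *ptr, then clear it. */
static void zfree_sketch(void **ptr)
{
	free(*ptr);	/* free(NULL) is a no-op, so no NULL check is needed */
	*ptr = NULL;
}

/* Hypothetical reduced form of struct auxtrace_queue. */
struct queue_sketch {
	void *priv;
};

/* Release every queue's private data and leave no dangling pointers. */
static void free_queue_privs(struct queue_sketch *queues, unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++)
		zfree_sketch(&queues[i].priv);
}
```

The separate NULL check in powerpc_vpadtl_free_queue() then becomes unnecessary as well.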
> + }
> + auxtrace_queues__free(queues);
> +}
> +
> +static void powerpc_vpadtl_free(struct perf_session *session)
> +{
> + struct powerpc_vpadtl *vpa = container_of(session->auxtrace, struct powerpc_vpadtl,
> + auxtrace);
> +
> + auxtrace_heap__free(&vpa->heap);
> + powerpc_vpadtl_free_events(session);
> + session->auxtrace = NULL;
> + free(vpa);
> +}
> +
> +static const char * const powerpc_vpadtl_info_fmts[] = {
> + [POWERPC_VPADTL_TYPE] = " PMU Type %"PRId64"\n",
> +};
> +
> +static void powerpc_vpadtl_print_info(__u64 *arr)
> +{
> + if (!dump_trace)
> + return;
> +
> + fprintf(stdout, powerpc_vpadtl_info_fmts[POWERPC_VPADTL_TYPE], arr[POWERPC_VPADTL_TYPE]);
> +}
> +
> +/*
> + * Process the PERF_RECORD_AUXTRACE_INFO records and setup
> + * the infrastructure to process auxtrace events. PERF_RECORD_AUXTRACE_INFO
> + * is processed first since it is of type perf_user_event_type.
> + * Initialise the aux buffer queues using auxtrace_queues__init().
> + * auxtrace_queue is created for each CPU.
> + */
> +int powerpc_vpadtl_process_auxtrace_info(union perf_event *event,
> + struct perf_session *session)
> +{
> + struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
> + size_t min_sz = sizeof(u64) * POWERPC_VPADTL_TYPE;
> + struct powerpc_vpadtl *vpa;
> + int err;
> +
> + if (auxtrace_info->header.size < sizeof(struct perf_record_auxtrace_info) +
> + min_sz)
> + return -EINVAL;
> +
> + vpa = zalloc(sizeof(struct powerpc_vpadtl));
> + if (!vpa)
> + return -ENOMEM;
> +
> + err = auxtrace_queues__init(&vpa->queues);
> + if (err)
> + goto err_free;
> +
> + vpa->session = session;
> + vpa->machine = &session->machines.host; /* No kvm support */
> + vpa->auxtrace_type = auxtrace_info->type;
> + vpa->pmu_type = auxtrace_info->priv[POWERPC_VPADTL_TYPE];
> +
> + vpa->auxtrace.process_event = powerpc_vpadtl_process_event;
> + vpa->auxtrace.process_auxtrace_event = powerpc_vpadtl_process_auxtrace_event;
> + vpa->auxtrace.flush_events = powerpc_vpadtl_flush;
> + vpa->auxtrace.free_events = powerpc_vpadtl_free_events;
> + vpa->auxtrace.free = powerpc_vpadtl_free;
> + session->auxtrace = &vpa->auxtrace;
> +
> + powerpc_vpadtl_print_info(&auxtrace_info->priv[0]);
> +
> + return 0;
> +
> +err_free:
> + free(vpa);
> + return err;
> +}
* Re: [PATCH 10/14] tools/perf: Allocate and setup aux buffer queue to help co-relate with other events across CPU's
2025-08-15 8:34 ` [PATCH 10/14] tools/perf: Allocate and setup aux buffer queue to help co-relate with other events across CPU's Athira Rajeev
@ 2025-08-27 17:29 ` Adrian Hunter
0 siblings, 0 replies; 29+ messages in thread
From: Adrian Hunter @ 2025-08-27 17:29 UTC (permalink / raw)
To: Athira Rajeev, acme, jolsa, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, kjain,
hbathini, Aditya.Bodkhe1, venkat88
On 15/08/2025 11:34, Athira Rajeev wrote:
> When the Dispatch Trace Log data is collected along with other events
> like sched tracepoint events, it needs to be correlated and present
> interleaved along with these events. Perf events can be collected
> parallely across the CPUs. Hence it needs to be ensured events/dtl
> entries are processed in timestamp order.
>
> An auxtrace_queue is created for each CPU. Data within each queue is in
> increasing order of timestamp. Each auxtrace queue has a array/list of
> auxtrace buffers. When processing the auxtrace buffer, the data is
> mmapp'ed. All auxtrace queues is maintained in auxtrace heap. Each queue
> has a queue number and a timestamp. The queues are sorted/added to head
> based on the time stamp. So always the lowest timestamp (entries to be
> processed first) is on top of the heap.
>
> The auxtrace queue needs to be allocated and heap needs to be populated
> in the sorted order of timestamp. The queue needs to be filled with data
> only once via powerpc_vpadtl__update_queues() function.
> powerpc_vpadtl__setup_queues() iterates through all the entries to
> allocate and setup the auxtrace queue. To add to auxtrace heap, it is
> required to fetch the timebase of first entry for each of the queue.
>
> The first entry in the queue for VPA DTL PMU has the boot timebase,
> frequency details which are needed to get timestamp which is required to
> correlate with other events. The very next entry is the actual trace data
> that provides timestamp for occurrence of DTL event. Formula used to get
> the timestamp from dtl entry is:
>
> ((timbase from DTL entry - boot time) / frequency) * 1000000000
>
> powerpc_vpadtl_decode() adds the boot time and frequency as part of
> powerpc_vpadtl_queue structure so that it can be reused. Each of the
> dtl_entry is of 48 bytes size. Sometimes it could happen that one buffer
> is only partially processed (if the timestamp of occurrence of another
> event is more than currently processed element in queue, it will move on
> to next event). Inorder to keep track of position of buffer, additional
Inorder -> In order
> fields is added to powerpc_vpadtl_queue structure.
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
> ---
> tools/perf/util/powerpc-vpadtl.c | 219 ++++++++++++++++++++++++++++++-
> 1 file changed, 218 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
> index 36c02821cf0a..299927901c9d 100644
> --- a/tools/perf/util/powerpc-vpadtl.c
> +++ b/tools/perf/util/powerpc-vpadtl.c
> @@ -28,6 +28,7 @@
> #include "map.h"
> #include "symbol_conf.h"
> #include "symbol.h"
> +#include "tool.h"
>
> /*
> * The DTL entries are of below format
> @@ -72,6 +73,14 @@ struct powerpc_vpadtl_queue {
> struct auxtrace_buffer *buffer;
> struct thread *thread;
> bool on_heap;
> + struct dtl_entry *dtl;
> + u64 timestamp;
> + unsigned long pkt_len;
> + unsigned long buf_len;
> + u64 boot_tb;
> + u64 tb_freq;
> + unsigned int tb_buffer;
> + unsigned int size;
> bool done;
> pid_t pid;
> pid_t tid;
> @@ -151,12 +160,217 @@ static void powerpc_vpadtl_dump_event(struct powerpc_vpadtl *vpa, unsigned char
> powerpc_vpadtl_dump(vpa, buf, len);
> }
>
> +static int powerpc_vpadtl_get_buffer(struct powerpc_vpadtl_queue *vpaq)
> +{
> + struct auxtrace_buffer *buffer = vpaq->buffer;
> + struct auxtrace_queues *queues = &vpaq->vpa->queues;
> + struct auxtrace_queue *queue;
> +
> + queue = &queues->queue_array[vpaq->queue_nr];
> + buffer = auxtrace_buffer__next(queue, buffer);
> +
> + if (!buffer)
> + return 0;
> +
> + vpaq->buffer = buffer;
> + vpaq->size = buffer->size;
> +
> + /* If the aux_buffer doesn't have data associated, try to load it */
> + if (!buffer->data) {
> + /* get the file desc associated with the perf data file */
> + int fd = perf_data__fd(vpaq->vpa->session->data);
> +
> + buffer->data = auxtrace_buffer__get_data(buffer, fd);
> + if (!buffer->data)
> + return -ENOMEM;
> + }
> +
> + vpaq->buf_len = buffer->size;
> +
> + if (buffer->size % dtl_entry_size)
> + vpaq->buf_len = buffer->size - (buffer->size % dtl_entry_size);
> +
> + if (vpaq->tb_buffer != buffer->buffer_nr) {
> + vpaq->pkt_len = 0;
> + vpaq->tb_buffer = 0;
> + }
> +
> + return 1;
> +}
> +
> +/*
> + * The first entry in the queue for VPA DTL PMU has the boot timebase,
> + * frequency details which are needed to get timestamp which is required to
> + * correlate with other events. Save the boot_tb and tb_freq as part of
> + * powerpc_vpadtl_queue. The very next entry is the actual trace data to
> + * be returned.
> + */
> +static int powerpc_vpadtl_decode(struct powerpc_vpadtl_queue *vpaq)
> +{
> + int ret;
> + char *buf;
> + struct boottb_freq *boottb;
> +
> + ret = powerpc_vpadtl_get_buffer(vpaq);
> + if (ret <= 0)
> + return ret;
> +
> + boottb = (struct boottb_freq *)vpaq->buffer->data;
> + if (boottb->timebase == 0) {
> + vpaq->boot_tb = boottb->boot_tb;
> + vpaq->tb_freq = boottb->tb_freq;
> + vpaq->pkt_len += dtl_entry_size;
> + }
> +
> + buf = vpaq->buffer->data;
> + buf += vpaq->pkt_len;
> + vpaq->dtl = (struct dtl_entry *)buf;
> +
> + vpaq->tb_buffer = vpaq->buffer->buffer_nr;
> + vpaq->buffer = NULL;
> + vpaq->buf_len = 0;
> +
> + return 1;
> +}
> +
> +static struct powerpc_vpadtl_queue *powerpc_vpadtl__alloc_queue(struct powerpc_vpadtl *vpa,
> + unsigned int queue_nr)
> +{
> + struct powerpc_vpadtl_queue *vpaq;
> +
> + vpaq = zalloc(sizeof(*vpaq));
> + if (!vpaq)
> + return NULL;
> +
> + vpaq->vpa = vpa;
> + vpaq->queue_nr = queue_nr;
> +
> + return vpaq;
> +}
> +
> +/*
> + * When the Dispatch Trace Log data is collected along with other events
> + * like sched tracepoint events, it needs to be correlated and present
> + * interleaved along with these events. Perf events can be collected
> + * parallely across the CPUs.
> + *
> + * An auxtrace_queue is created for each CPU. Data within each queue is in
> + * increasing order of timestamp. Allocate and setup auxtrace queues here.
> + * All auxtrace queues is maintained in auxtrace heap in the increasing order
> + * of timestamp. So always the lowest timestamp (entries to be processed first)
> + * is on top of the heap.
> + *
> + * To add to auxtrace heap, fetch the timestamp from first DTL entry
> + * for each of the queue.
> + */
> +static int powerpc_vpadtl__setup_queue(struct powerpc_vpadtl *vpa,
> + struct auxtrace_queue *queue,
> + unsigned int queue_nr)
> +{
> + struct powerpc_vpadtl_queue *vpaq = queue->priv;
> + struct dtl_entry *record;
> + double result, div;
> + double boot_freq;
> + unsigned long long boot_tb;
> + unsigned long long diff;
> + unsigned long long save = 0;
> +
> + if (list_empty(&queue->head) || vpaq)
> + return 0;
> +
> + vpaq = powerpc_vpadtl__alloc_queue(vpa, queue_nr);
> + if (!vpaq)
> + return -ENOMEM;
> +
> + queue->priv = vpaq;
> +
> + if (queue->cpu != -1)
> + vpaq->cpu = queue->cpu;
> +
> + if (!vpaq->on_heap) {
> + int ret;
> +retry:
> + ret = powerpc_vpadtl_decode(vpaq);
> + if (!ret)
> + return 0;
> +
> + if (ret < 0)
> + goto retry;
> +
> + record = vpaq->dtl;
> + /*
> + * Formula used to get timestamp that can be co-related with
> + * other perf events:
> + * ((timbase from DTL entry - boot time) / frequency) * 1000000000
> + */
> + if (record->timebase) {
> + boot_tb = vpaq->boot_tb;
> + boot_freq = vpaq->tb_freq;
> + diff = be64_to_cpu(record->timebase) - boot_tb;
> + div = diff / boot_freq;
> + result = div;
> + result = result * 1000000000;
> + save = result;
It would be nicer for the time calculation to be in a separate function.
Also 'save' is an odd choice of variable name for a timestamp.
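Something along these lines, perhaps. This is a sketch only: the function name is hypothetical, it takes the timebase already converted with be64_to_cpu(), and it uses integer arithmetic where the patch uses double, so sub-nanosecond rounding may differ.

```c
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL

/*
 * Convert a DTL timebase value to nanoseconds since boot:
 *   ((timebase - boot_tb) / tb_freq) * 1000000000
 * Whole seconds and the remainder are handled separately so the
 * multiplication does not overflow for realistic tb_freq values
 * (anything below roughly 18 GHz).
 */
static uint64_t vpadtl_tb_to_ns(uint64_t timebase, uint64_t boot_tb,
				uint64_t tb_freq)
{
	uint64_t diff, secs, rem;

	/* No frequency recorded yet, or an entry from before boot_tb */
	if (!tb_freq || timebase < boot_tb)
		return 0;

	diff = timebase - boot_tb;
	secs = diff / tb_freq;
	rem = diff % tb_freq;

	return secs * NSEC_PER_SEC_SKETCH +
	       rem * NSEC_PER_SEC_SKETCH / tb_freq;
}
```

The same helper could then be used from both the queue setup and the decoder loop.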
> + }
> +
> + vpaq->timestamp = save;
> + ret = auxtrace_heap__add(&vpa->heap, queue_nr, vpaq->timestamp);
> + if (ret)
> + return ret;
> + vpaq->on_heap = true;
> + }
> +
> + return 0;
> +}
> +
> +static int powerpc_vpadtl__setup_queues(struct powerpc_vpadtl *vpa)
> +{
> + unsigned int i;
> + int ret;
> +
> + for (i = 0; i < vpa->queues.nr_queues; i++) {
> + ret = powerpc_vpadtl__setup_queue(vpa, &vpa->queues.queue_array[i], i);
> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static int powerpc_vpadtl__update_queues(struct powerpc_vpadtl *vpa)
> +{
> + if (vpa->queues.new_data) {
> + vpa->queues.new_data = false;
> + return powerpc_vpadtl__setup_queues(vpa);
> + }
> +
> + return 0;
> +}
> +
> static int powerpc_vpadtl_process_event(struct perf_session *session __maybe_unused,
> union perf_event *event __maybe_unused,
> struct perf_sample *sample __maybe_unused,
> const struct perf_tool *tool __maybe_unused)
tool, sample and session are not __maybe_unused
> {
> - return 0;
> + int err = 0;
> + struct powerpc_vpadtl *vpa = container_of(session->auxtrace,
> + struct powerpc_vpadtl, auxtrace);
Arranging local variable declarations in order of descending line
length is often more readable:
struct powerpc_vpadtl *vpa = session_to_vpa(session);
int err = 0;
> +
> + if (dump_trace)
> + return 0;
> +
> + if (!tool->ordered_events) {
> + pr_err("VPA requires ordered events\n");
> + return -EINVAL;
> + }
> +
> + if (sample->time) {
> + err = powerpc_vpadtl__update_queues(vpa);
> + if (err)
> + return err;
> + }
> +
> + return err;
> }
>
> /*
> @@ -181,6 +395,9 @@ static int powerpc_vpadtl_process_auxtrace_event(struct perf_session *session,
> return -errno;
> }
>
> + if (!dump_trace)
> + return 0;
See comment about auxtrace_queues__add_event() in patch 8.
> +
> err = auxtrace_queues__add_event(&vpa->queues, session, event,
> data_offset, &buffer);
> if (err)
* Re: [PATCH 11/14] tools/perf: Process the DTL entries in queue and deliver samples
2025-08-15 8:34 ` [PATCH 11/14] tools/perf: Process the DTL entries in queue and deliver samples Athira Rajeev
@ 2025-08-27 17:29 ` Adrian Hunter
2025-08-29 8:33 ` Athira Rajeev
0 siblings, 1 reply; 29+ messages in thread
From: Adrian Hunter @ 2025-08-27 17:29 UTC (permalink / raw)
To: Athira Rajeev, acme, jolsa, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, kjain,
hbathini, Aditya.Bodkhe1, venkat88
On 15/08/2025 11:34, Athira Rajeev wrote:
> Create samples from DTL entries for displaying in perf report
> and perf script. When the different PERF_RECORD_XX records are
> processed from perf session, powerpc_vpadtl_process_event() will
> be invoked. For each of the PERF_RECORD_XX record, compare the timestamp
> of perf record with timestamp of top element in the auxtrace heap.
> Process the auxtrace queue if the timestamp of element from heap is
> lower than timestamp from entry in perf record.
>
> Sometimes it could happen that one buffer is only partially
> processed. if the timestamp of occurrence of another event is more
> than currently processed element in the queue, it will move on
> to next perf record. So keep track of position of buffer to
> continue processing next time. Update the timestamp of the
> auxtrace heap with the timestamp of last processed entry from
> the auxtrace buffer.
>
> Generate perf sample for each entry in the dispatch trace log.
> Fill in the sample details:
> - sample ip is picked from srr0 field of dtl_entry
> - sample cpu is picked from processor_id of dtl_entry
> - sample id is from sample_id of powerpc_vpadtl
> - cpumode is set to PERF_RECORD_MISC_KERNEL
> - Additionally save the details in raw_data of sample. This
> is to print the relevant fields in perf_sample__fprintf_synth()
> when called from builtin-script
>
> The sample is processed by calling perf_session__deliver_synth_event()
> so that it gets included in perf report.
>
> Sample Output:
>
> ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.300 MB perf.data ]
>
> ./perf report
>
> # Samples: 321 of event 'vpa-dtl'
> # Event count (approx.): 321
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ................. ..............................
> #
> 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
> ---
> tools/perf/util/powerpc-vpadtl.c | 181 +++++++++++++++++++++++++++++++
> 1 file changed, 181 insertions(+)
>
> diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
> index 299927901c9d..370c566f9ac2 100644
> --- a/tools/perf/util/powerpc-vpadtl.c
> +++ b/tools/perf/util/powerpc-vpadtl.c
> @@ -160,6 +160,43 @@ static void powerpc_vpadtl_dump_event(struct powerpc_vpadtl *vpa, unsigned char
> powerpc_vpadtl_dump(vpa, buf, len);
> }
>
> +/*
> + * Generate perf sample for each entry in the dispatch trace log.
> + * - sample ip is picked from srr0 field of dtl_entry
> + * - sample cpu is picked from logical cpu.
> + * - sample id is from sample_id of powerpc_vpadtl
> + * - cpumode is set to PERF_RECORD_MISC_KERNEL
Above 4 lines of comments are a bit redundant.
> + * - Additionally save the details in raw_data of sample. This
> + * is to print the relevant fields in perf_sample__fprintf_synth()
> + * when called from builtin-script
> + */
> +static int powerpc_vpadtl_sample(struct dtl_entry *record, struct powerpc_vpadtl *vpa, u64 save, int cpu)
> +{
> + struct perf_sample sample;
> + union perf_event event;
> +
> + sample.ip = be64_to_cpu(record->srr0);
> + sample.period = 1;
> + sample.cpu = cpu;
> + sample.id = vpa->sample_id;
> + sample.callchain = NULL;
> + sample.branch_stack = NULL;
> + memset(&event, 0, sizeof(event));
> + sample.cpumode = PERF_RECORD_MISC_KERNEL;
> + sample.time = save;
> + sample.raw_data = record;
> + sample.raw_size = sizeof(record);
> + event.sample.header.type = PERF_RECORD_SAMPLE;
> + event.sample.header.misc = sample.cpumode;
> + event.sample.header.size = sizeof(struct perf_event_header);
> + if (perf_session__deliver_synth_event(vpa->session, &event,
> + &sample)) {
There is some inconsistency with line wrapping
> + pr_debug("Failed to create sample for dtl entry\n");
> + return -1;
> + }
> + return 0;
> +}
> +
> static int powerpc_vpadtl_get_buffer(struct powerpc_vpadtl_queue *vpaq)
> {
> struct auxtrace_buffer *buffer = vpaq->buffer;
> @@ -233,6 +270,148 @@ static int powerpc_vpadtl_decode(struct powerpc_vpadtl_queue *vpaq)
> return 1;
> }
>
> +static int powerpc_vpadtl_decode_all(struct powerpc_vpadtl_queue *vpaq)
> +{
> + int ret;
> + unsigned char *buf;
> +
> + if (!vpaq->buf_len || (vpaq->pkt_len == vpaq->size)) {
Unnecessary parentheses around 'vpaq->pkt_len == vpaq->size'
> + ret = powerpc_vpadtl_get_buffer(vpaq);
> + if (ret <= 0)
> + return ret;
> + }
> +
> + if (vpaq->buffer) {
> + buf = vpaq->buffer->data;
> + buf += vpaq->pkt_len;
> + vpaq->dtl = (struct dtl_entry *)buf;
> + if ((long long)be64_to_cpu(vpaq->dtl->timebase) <= 0) {
> + if (vpaq->pkt_len != dtl_entry_size && vpaq->buf_len) {
> + vpaq->pkt_len += dtl_entry_size;
> + vpaq->buf_len -= dtl_entry_size;
> + }
> + return -1;
> + }
> + vpaq->pkt_len += dtl_entry_size;
> + vpaq->buf_len -= dtl_entry_size;
> + } else
> + return 0;
braces {} should be used on all arms of this statement
> +
> +
> + return 1;
> +}
> +
> +static int powerpc_vpadtl_run_decoder(struct powerpc_vpadtl_queue *vpaq, u64 *timestamp)
> +{
> + struct powerpc_vpadtl *vpa = vpaq->vpa;
> + struct dtl_entry *record;
> + int ret;
> + double result, div;
> + double boot_freq = vpaq->tb_freq;
> + unsigned long long boot_tb = vpaq->boot_tb;
> + unsigned long long diff;
> + unsigned long long save;
> +
> + while (1) {
> + ret = powerpc_vpadtl_decode_all(vpaq);
> + if (!ret) {
> + pr_debug("All data in the queue has been processed.\n");
> + return 1;
> + }
> +
> + /*
> + * Error is detected when decoding VPA PMU trace. Continue to
> + * the next trace data and find out more dtl entries.
> + */
> + if (ret < 0)
> + continue;
> +
> + record = vpaq->dtl;
> +
> + diff = be64_to_cpu(record->timebase) - boot_tb;
> + div = diff / boot_freq;
> + result = div;
> + result = result * 1000000000;
> + save = result;
It would be nicer for the time calculation to be in a separate function.
Also 'save' is an odd choice of variable name for a timestamp.
> +
> + /* Update timestamp for the last record */
> + if (save > vpaq->timestamp)
> + vpaq->timestamp = save;
> +
> + /*
> + * If the timestamp of the queue is later than timestamp of the
> + * coming perf event, bail out so can allow the perf event to
> + * be processed ahead.
> + */
> + if (vpaq->timestamp >= *timestamp) {
> + *timestamp = vpaq->timestamp;
> + vpaq->pkt_len -= dtl_entry_size;
> + vpaq->buf_len += dtl_entry_size;
> + return 0;
> + }
> +
> + ret = powerpc_vpadtl_sample(record, vpa, save, vpaq->cpu);
> + if (ret)
> + continue;
> + }
> + return 0;
> +}
> +
> +/*
> + * For each of the PERF_RECORD_XX record, compare the timestamp
> + * of perf record with timestamp of top element in the auxtrace heap.
> + * Process the auxtrace queue if the timestamp of element from heap is
> + * lower than timestamp from entry in perf record.
> + *
> + * Update the timestamp of the auxtrace heap with the timestamp
> + * of last processed entry from the auxtrace buffer.
> + */
> +static int powerpc_vpadtl_process_queues(struct powerpc_vpadtl *vpa, u64 timestamp)
> +{
> + unsigned int queue_nr;
> + u64 ts;
> + int ret;
> +
> + while (1) {
> + struct auxtrace_queue *queue;
> + struct powerpc_vpadtl_queue *vpaq;
> +
> + if (!vpa->heap.heap_cnt)
> + return 0;
> +
> + if (vpa->heap.heap_array[0].ordinal >= timestamp)
> + return 0;
> +
> + queue_nr = vpa->heap.heap_array[0].queue_nr;
> + queue = &vpa->queues.queue_array[queue_nr];
> + vpaq = queue->priv;
> +
> + auxtrace_heap__pop(&vpa->heap);
> +
> + if (vpa->heap.heap_cnt) {
> + ts = vpa->heap.heap_array[0].ordinal + 1;
> + if (ts > timestamp)
> + ts = timestamp;
> + } else
> + ts = timestamp;
braces {} should be used on all arms of this statement
> +
> + ret = powerpc_vpadtl_run_decoder(vpaq, &ts);
> + if (ret < 0) {
> + auxtrace_heap__add(&vpa->heap, queue_nr, ts);
> + return ret;
> + }
> +
> + if (!ret) {
> + ret = auxtrace_heap__add(&vpa->heap, queue_nr, ts);
> + if (ret < 0)
> + return ret;
> + } else {
> + vpaq->on_heap = false;
> + }
> + }
> + return 0;
> +}
> +
> static struct powerpc_vpadtl_queue *powerpc_vpadtl__alloc_queue(struct powerpc_vpadtl *vpa,
> unsigned int queue_nr)
> {
> @@ -368,6 +547,8 @@ static int powerpc_vpadtl_process_event(struct perf_session *session __maybe_unu
> err = powerpc_vpadtl__update_queues(vpa);
> if (err)
> return err;
> +
> + err = powerpc_vpadtl_process_queues(vpa, sample->time);
> }
>
> return err;
* Re: [PATCH 12/14] tools/perf: Add support for printing synth event details via default callback
2025-08-15 8:34 ` [PATCH 12/14] tools/perf: Add support for printing synth event details via default callback Athira Rajeev
@ 2025-08-27 17:29 ` Adrian Hunter
2025-08-29 8:35 ` Athira Rajeev
0 siblings, 1 reply; 29+ messages in thread
From: Adrian Hunter @ 2025-08-27 17:29 UTC (permalink / raw)
To: Athira Rajeev, acme, jolsa, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, kjain,
hbathini, Aditya.Bodkhe1, venkat88
On 15/08/2025 11:34, Athira Rajeev wrote:
> Introduce arch_perf_sample__fprintf_synth_evt to add support for
> printing arch specific synth event details. The process_event()
> function in "builtin-script.c" invokes perf_sample__fprintf_synth() for
> displaying PERF_TYPE_SYNTH type events.
>
> if (attr->type == PERF_TYPE_SYNTH && PRINT_FIELD(SYNTH))
> perf_sample__fprintf_synth(sample, evsel, fp);
>
> perf_sample__fprintf_synth() process the sample depending on the value
> in evsel->core.attr.config . Currently all the arch specific callbacks
> perf_sample__fprintf_synth* are part of "builtin-script.c" itself.
> Example: perf_sample__fprintf_synth_ptwrite,
> perf_sample__fprintf_synth_mwait etc. This will need adding arch
> specific details in builtin-script.c for any new perf_synth_id events.
>
> Introduce arch_perf_sample__fprintf_synth_evt() and invoke this as
> default callback for perf_sample__fprintf_synth(). This way, arch
> specific code can handle processing the details.
A default callback is not needed.
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
> ---
> tools/perf/builtin-script.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index d9fbdcf72f25..eff584735980 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -2003,6 +2003,12 @@ static int perf_sample__fprintf_synth_iflag_chg(struct perf_sample *sample, FILE
> return len + perf_sample__fprintf_pt_spacing(len, fp);
> }
>
> +static void arch_perf_sample__fprintf_synth_evt(struct perf_sample *data __maybe_unused,
> + FILE *fp __maybe_unused, u64 config __maybe_unused)
> +{
> + return;
> +}
> +
> static int perf_sample__fprintf_synth(struct perf_sample *sample,
> struct evsel *evsel, FILE *fp)
> {
> @@ -2026,6 +2032,7 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample,
> case PERF_SYNTH_INTEL_IFLAG_CHG:
> return perf_sample__fprintf_synth_iflag_chg(sample, fp);
> default:
Should just add something like:
case PERF_SYNTH_POWERPC_VPA_DTL:
return perf_sample__fprintf_synth_vpadtl(sample, fp);
> + arch_perf_sample__fprintf_synth_evt(sample, fp, evsel->core.attr.config);
> break;
> }
>
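A rough standalone sketch of the "one case per synth id" shape suggested above, in place of a default callback. The enum values and the two printer stubs are made up for the example; the real ids and printers live in perf.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical config values, chosen only for this example. */
enum synth_id {
	SYNTH_PTWRITE = 1,
	SYNTH_VPA_DTL = 2,
};

static int fprintf_synth_ptwrite(FILE *fp)
{
	return fprintf(fp, "ptwrite");
}

static int fprintf_synth_vpadtl(FILE *fp)
{
	return fprintf(fp, "vpa-dtl");
}

/* One case per synth id; unknown ids print nothing and return 0. */
static int fprintf_synth(uint64_t config, FILE *fp)
{
	switch (config) {
	case SYNTH_PTWRITE:
		return fprintf_synth_ptwrite(fp);
	case SYNTH_VPA_DTL:
		return fprintf_synth_vpadtl(fp);
	default:
		break;
	}
	return 0;
}
```

Because each config value is unique, the switch alone selects the right printer and no arch string comparison is needed.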
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 13/14] tools/perf: Enable perf script to present the DTL entries
2025-08-15 8:34 ` [PATCH 13/14] tools/perf: Enable perf script to present the DTL entries Athira Rajeev
@ 2025-08-27 17:30 ` Adrian Hunter
0 siblings, 0 replies; 29+ messages in thread
From: Adrian Hunter @ 2025-08-27 17:30 UTC (permalink / raw)
To: Athira Rajeev, acme, jolsa, maddy, irogers, namhyung
Cc: linux-perf-users, linuxppc-dev, aboorvad, sshegde, kjain,
hbathini, Aditya.Bodkhe1, venkat88
On 15/08/2025 11:34, Athira Rajeev wrote:
> Enable perf script to present the DTL entries. Process the
> dispatch trace log details in arch_perf_sample__fprintf_synth_evt()
> defined in builtin-script.c file for config value:
> PERF_SYNTH_POWERPC_VPA_DTL.
>
> Sample output:
>
> ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.300 MB perf.data ]
>
> ./perf script
> perf 13322 [002] 233.835807: sched:sched_switch: perf:13322 [120] R ==> migration/2:27 [0]
> migration/2 27 [002] 233.835811: sched:sched_migrate_task: comm=perf pid=13322 prio=120 orig_cpu=2 dest_cpu=3
> migration/2 27 [002] 233.835818: sched:sched_stat_runtime: comm=migration/2 pid=27 runtime=9214 [ns]
> migration/2 27 [002] 233.835819: sched:sched_switch: migration/2:27 [0] S ==> swapper/2:0 [120]
> swapper 0 [002] 233.835822: vpa-dtl: timebase: 338954486062657 dispatch_reason:decrementer_interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:435, ready_to_enqueue_time:0, waiting_to_ready_time:34775058, processor_id: 202 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
> swapper 0 [001] 233.835886: vpa-dtl: timebase: 338954486095398 dispatch_reason:priv_doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:542, ready_to_enqueue_time:0, waiting_to_ready_time:1245360, processor_id: 201 c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
> ---
> tools/perf/builtin-script.c | 23 +++++++++++++++++++++--
> tools/perf/util/powerpc-vpadtl.c | 16 ----------------
> tools/perf/util/powerpc-vpadtl.h | 19 +++++++++++++++++++
> 3 files changed, 40 insertions(+), 18 deletions(-)
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index eff584735980..a0faadaadc4d 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -66,6 +66,7 @@
> #include "util/cgroup.h"
> #include "util/annotate.h"
> #include "perf.h"
> +#include "util/powerpc-vpadtl.h"
>
> #include <linux/ctype.h>
> #ifdef HAVE_LIBTRACEEVENT
> @@ -2004,8 +2005,26 @@ static int perf_sample__fprintf_synth_iflag_chg(struct perf_sample *sample, FILE
> }
>
> static void arch_perf_sample__fprintf_synth_evt(struct perf_sample *data __maybe_unused,
> - FILE *fp __maybe_unused, u64 config __maybe_unused)
> + FILE *fp __maybe_unused, u64 config __maybe_unused, struct perf_env *env)
> {
> + const char *arch = perf_env__arch(env);
> +
> + if (!strcmp("powerpc", arch)) {
Not needed. PERF_SYNTH_POWERPC_VPA_DTL is unique.
> + struct dtl_entry *dtl = (struct dtl_entry *)data->raw_data;
> +
> + if (config != PERF_SYNTH_POWERPC_VPA_DTL)
> + return;
> + fprintf(fp, "timebase: %" PRIu64 "dispatch_reason:%s, preempt_reason:%s, enqueue_to_dispatch_time:%d,\
> + ready_to_enqueue_time:%d, waiting_to_ready_time:%d, processor_id: %d",\
> + be64_to_cpu(dtl->timebase),
If the output were ever to be injected into another
perf.data file (by adding support in perf inject) then it
would be aligned to 4 bytes not 8, so for 64-bit access it
would be safer to use get_unaligned_be64()
> + dispatch_reasons[dtl->dispatch_reason],
> + preempt_reasons[dtl->preempt_reason],
> + be32_to_cpu(dtl->enqueue_to_dispatch_time),
> + be32_to_cpu(dtl->ready_to_enqueue_time),
> + be32_to_cpu(dtl->waiting_to_ready_time),
> + be16_to_cpu(dtl->processor_id));
> + }
> +
> return;
> }
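On the alignment remark for the timebase field above: a minimal sketch of an alignment-safe big-endian 64-bit read. read_be64() is a local, hypothetical helper written for this example; in perf itself get_unaligned_be64() would be used as suggested.

```c
#include <stdint.h>

/*
 * Read a big-endian 64-bit value byte by byte, so the pointer needs
 * no particular alignment and host endianness does not matter.
 */
static uint64_t read_be64(const void *p)
{
	const unsigned char *b = p;
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | b[i];

	return v;
}
```

The byte-wise loop works even when the field starts at a 4-byte boundary, which is the case the comment warns about.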
>
> @@ -2032,7 +2051,7 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample,
> case PERF_SYNTH_INTEL_IFLAG_CHG:
> return perf_sample__fprintf_synth_iflag_chg(sample, fp);
> default:
> - arch_perf_sample__fprintf_synth_evt(sample, fp, evsel->core.attr.config);
> + arch_perf_sample__fprintf_synth_evt(sample, fp, evsel->core.attr.config, evsel__env(evsel));
> break;
> }
>
> diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
> index 370c566f9ac2..482ddf1a2d51 100644
> --- a/tools/perf/util/powerpc-vpadtl.c
> +++ b/tools/perf/util/powerpc-vpadtl.c
> @@ -30,22 +30,6 @@
> #include "symbol.h"
> #include "tool.h"
>
> -/*
> - * The DTL entries are of below format
> - */
> -struct dtl_entry {
> - u8 dispatch_reason;
> - u8 preempt_reason;
> - u16 processor_id;
> - u32 enqueue_to_dispatch_time;
> - u32 ready_to_enqueue_time;
> - u32 waiting_to_ready_time;
> - u64 timebase;
> - u64 fault_addr;
> - u64 srr0;
> - u64 srr1;
> -};
> -
> /*
> * Structure to save the auxtrace queue
> */
> diff --git a/tools/perf/util/powerpc-vpadtl.h b/tools/perf/util/powerpc-vpadtl.h
> index 625172adaba5..497f704787a5 100644
> --- a/tools/perf/util/powerpc-vpadtl.h
> +++ b/tools/perf/util/powerpc-vpadtl.h
> @@ -20,6 +20,25 @@ union perf_event;
> struct perf_session;
> struct perf_pmu;
>
> +/*
> + * The DTL entries are of below format
> + */
> +struct dtl_entry {
> + u8 dispatch_reason;
> + u8 preempt_reason;
> + u16 processor_id;
> + u32 enqueue_to_dispatch_time;
> + u32 ready_to_enqueue_time;
> + u32 waiting_to_ready_time;
> + u64 timebase;
> + u64 fault_addr;
> + u64 srr0;
> + u64 srr1;
> +};
As mentioned for patch 8, maybe call it vpadtl_entry or powerpc_vpadtl_entry and
put it in perf/util/event.h
> +
> +extern const char *dispatch_reasons[11];
> +extern const char *preempt_reasons[10];
These are in perf/util/powerpc-vpadtl.c which is conditionally compiled
depending on CONFIG_AUXTRACE. So this happens when building with
NO_AUXTRACE=1 :
/usr/bin/ld: perf-in.o: in function `process_sample_event':
builtin-script.c:(.text+0x379a6): undefined reference to `preempt_reasons'
/usr/bin/ld: builtin-script.c:(.text+0x379d5): undefined reference to `dispatch_reasons'
> +
> int powerpc_vpadtl_process_auxtrace_info(union perf_event *event,
> struct perf_session *session);
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 07/14] tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc
2025-08-27 17:27 ` Adrian Hunter
@ 2025-08-29 8:29 ` Athira Rajeev
0 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2025-08-29 8:29 UTC (permalink / raw)
To: Adrian Hunter
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Madhavan Srinivasan,
Ian Rogers, Namhyung Kim, open list:PERFORMANCE EVENTS SUBSYSTEM,
PowerPC, Aboorva Devarajan, Shrikanth Hegde, Kajol Jain, hbathini,
Aditya Bodkhe, Venkat Rao Bagalkote
> On 27 Aug 2025, at 10:57 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 15/08/2025 11:34, Athira Rajeev wrote:
>> The powerpc PMU collecting Dispatch Trace Log (DTL) entries makes use of
>> AUX support in perf infrastructure. The PMU driver has the functionality
>> to collect trace entries in the aux buffer. On the tools side, this data
>> is made available as PERF_RECORD_AUXTRACE records. This record is
>> generated by "perf record" command. To enable the creation of
>> PERF_RECORD_AUXTRACE, add functions to initialize auxtrace records ie
>> "auxtrace_record__init()". Fill in fields for other callbacks like
>> info_priv_size, info_fill, free, recording options etc. Define
>> auxtrace_type as PERF_AUXTRACE_VPA_PMU. Add header file to define vpa
>> dtl pmu specific details.
>>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
>> ---
>> tools/perf/arch/powerpc/util/Build | 1 +
>> tools/perf/arch/powerpc/util/auxtrace.c | 122 ++++++++++++++++++++++++
>> tools/perf/util/auxtrace.c | 2 +
>> tools/perf/util/auxtrace.h | 1 +
>> tools/perf/util/powerpc-vpadtl.h | 26 +++++
>> 5 files changed, 152 insertions(+)
>> create mode 100644 tools/perf/arch/powerpc/util/auxtrace.c
>> create mode 100644 tools/perf/util/powerpc-vpadtl.h
>>
>> diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
>> index fdd6a77a3432..a5b0babd307e 100644
>> --- a/tools/perf/arch/powerpc/util/Build
>> +++ b/tools/perf/arch/powerpc/util/Build
>> @@ -10,3 +10,4 @@ perf-util-$(CONFIG_LIBDW) += skip-callchain-idx.o
>>
>> perf-util-$(CONFIG_LIBUNWIND) += unwind-libunwind.o
>> perf-util-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
>> +perf-util-$(CONFIG_AUXTRACE) += auxtrace.o
>> diff --git a/tools/perf/arch/powerpc/util/auxtrace.c b/tools/perf/arch/powerpc/util/auxtrace.c
>> new file mode 100644
>> index 000000000000..ec8ec601fd08
>> --- /dev/null
>> +++ b/tools/perf/arch/powerpc/util/auxtrace.c
>> @@ -0,0 +1,122 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * VPA support
>> + */
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/types.h>
>> +#include <linux/bitops.h>
>> +#include <linux/log2.h>
>> +#include <time.h>
>> +
>> +#include "../../util/cpumap.h"
>> +#include "../../util/evsel.h"
>> +#include "../../util/evlist.h"
>> +#include "../../util/session.h"
>> +#include "../../util/util.h"
>> +#include "../../util/pmu.h"
>> +#include "../../util/debug.h"
>> +#include "../../util/auxtrace.h"
>> +#include "../../util/powerpc-vpadtl.h"
>
> It would be better to only add #includes when they are needed
>
>> +#include "../../util/record.h"
>> +#include <internal/lib.h> // page_size
>> +
>> +#define KiB(x) ((x) * 1024)
>> +
>> +static int
>> +powerpc_vpadtl_parse_snapshot_options(struct auxtrace_record *itr __maybe_unused,
>> + struct record_opts *opts __maybe_unused,
>> + const char *str __maybe_unused)
>> +{
>> + return 0;
>> +}
>> +
>> +static int
>> +powerpc_vpadtl_recording_options(struct auxtrace_record *ar __maybe_unused,
>> + struct evlist *evlist __maybe_unused,
>> + struct record_opts *opts)
>> +{
>> + opts->full_auxtrace = true;
>> +
>> + /*
>> + * Set auxtrace_mmap_pages to minimum
>> + * two pages
>> + */
>> + if (!opts->auxtrace_mmap_pages) {
>> + opts->auxtrace_mmap_pages = KiB(128) / page_size;
>> + if (opts->mmap_pages == UINT_MAX)
>> + opts->mmap_pages = KiB(256) / page_size;
>> + }
>> +
>> + return 0;
>> +}
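As a worked example of the default above: the "two pages" in the comment only holds for one page size. The sketch below assumes 64 KiB pages to get that figure; default_aux_pages() is a hypothetical name used for illustration.

```c
#include <stddef.h>

#define KiB(x) ((x) * 1024)

/* Number of AUX mmap pages for a given page size, as in the hunk above. */
static size_t default_aux_pages(size_t page_size)
{
	return KiB(128) / page_size;
}
```

With 64 KiB pages this yields 2 pages, while with 4 KiB pages the same 128 KiB works out to 32 pages.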
>> +
>> +static size_t powerpc_vpadtl_info_priv_size(struct auxtrace_record *itr __maybe_unused,
>> + struct evlist *evlist __maybe_unused)
>> +{
>> + return 0;
>
> return VPADTL_AUXTRACE_PRIV_SIZE;
>> +}
>> +
>> +static int
>> +powerpc_vpadtl_info_fill(struct auxtrace_record *itr __maybe_unused,
>> + struct perf_session *session __maybe_unused,
>> + struct perf_record_auxtrace_info *auxtrace_info __maybe_unused,
>
> auxtrace_info is not __maybe_unused
>
>> + size_t priv_size __maybe_unused)
>> +{
>> + auxtrace_info->type = PERF_AUXTRACE_VPA_PMU;
>> +
>> + return 0;
>> +}
>> +
>> +static u64 powerpc_vpadtl_reference(struct auxtrace_record *itr __maybe_unused)
>> +{
>> + return 0;
>> +}
>> +
>> +static void powerpc_vpadtl_free(struct auxtrace_record *itr)
>> +{
>> + free(itr);
>> +}
>> +
>> +struct auxtrace_record *auxtrace_record__init(struct evlist *evlist __maybe_unused,
>
> evlist is not __maybe_unused
>
>> + int *err)
>> +{
>> + struct auxtrace_record *aux;
>> + struct evsel *pos;
>> + char *pmu_name;
>> + int found = 0;
>> +
>> + evlist__for_each_entry(evlist, pos) {
>> + pmu_name = strdup(pos->name);
>> + pmu_name = strtok(pmu_name, "/");
>> + if (!strcmp(pmu_name, "vpa_dtl")) {
>
> pmu_name is leaked but strstarts() could be used instead
> of above
>
>> + found = 1;
>> + pos->needs_auxtrace_mmap = true;
>> + break;
>> + }
>> + }
>> +
>> + if (!found)
>> + return NULL;
>> +
>> + /*
>> + * To obtain the auxtrace buffer file descriptor, the auxtrace event
>> + * must come first.
>> + */
>> + evlist__to_front(pos->evlist, pos);
>> +
>> + aux = zalloc(sizeof(*aux));
>> + if (aux == NULL) {
>> + pr_debug("aux record is NULL\n");
>> + *err = -ENOMEM;
>> + return NULL;
>> + }
>> +
>> + aux->parse_snapshot_options = powerpc_vpadtl_parse_snapshot_options;
>
> Doesn't look like snapshot mode is supported, so
> powerpc_vpadtl_parse_snapshot_options() is not needed
>
>> + aux->recording_options = powerpc_vpadtl_recording_options;
>> + aux->info_priv_size = powerpc_vpadtl_info_priv_size;
>> + aux->info_fill = powerpc_vpadtl_info_fill;
>> + aux->free = powerpc_vpadtl_free;
>> + aux->reference = powerpc_vpadtl_reference;
>
> reference is optional. powerpc_vpadtl_reference() stub is not needed
>
>> + return aux;
>> +}
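On the pmu_name leak noted in auxtrace_record__init() above: a small sketch of the prefix check without strdup()/strtok(). starts_with() is a local equivalent of strstarts() written for this example, and is_vpa_dtl_event() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <string.h>

/* Local equivalent of the strstarts() helper mentioned in the review. */
static bool starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * True when an event name such as "vpa_dtl/dtl_all/" belongs to the
 * vpa_dtl PMU. Nothing is allocated, so nothing can leak.
 */
static bool is_vpa_dtl_event(const char *name)
{
	return name && starts_with(name, "vpa_dtl/");
}
```

Including the trailing slash in the prefix keeps names like "vpa_dtlx/..." from matching. Note that, unlike the strtok() version, a bare "vpa_dtl" without a slash would not match here; whether that matters is an open question for the patch.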
>> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
>> index ebd32f1b8f12..f587d386c5ef 100644
>> --- a/tools/perf/util/auxtrace.c
>> +++ b/tools/perf/util/auxtrace.c
>> @@ -55,6 +55,7 @@
>> #include "hisi-ptt.h"
>> #include "s390-cpumsf.h"
>> #include "util/mmap.h"
>> +#include "powerpc-vpadtl.h"
>
> Isn't needed yet
>
>>
>> #include <linux/ctype.h>
>> #include "symbol/kallsyms.h"
>> @@ -1393,6 +1394,7 @@ int perf_event__process_auxtrace_info(struct perf_session *session,
>> case PERF_AUXTRACE_HISI_PTT:
>> err = hisi_ptt_process_auxtrace_info(event, session);
>> break;
>> + case PERF_AUXTRACE_VPA_PMU:
>> case PERF_AUXTRACE_UNKNOWN:
>> default:
>> return -EINVAL;
>> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
>> index f001cbb68f8e..1f9ef473af77 100644
>> --- a/tools/perf/util/auxtrace.h
>> +++ b/tools/perf/util/auxtrace.h
>> @@ -50,6 +50,7 @@ enum auxtrace_type {
>> PERF_AUXTRACE_ARM_SPE,
>> PERF_AUXTRACE_S390_CPUMSF,
>> PERF_AUXTRACE_HISI_PTT,
>> + PERF_AUXTRACE_VPA_PMU,
>
> Everything else is called some variation of vpa dtl, so
> PERF_AUXTRACE_VPA_DTL would seem a more consistent name
>
>> };
>>
>> enum itrace_period_type {
>> diff --git a/tools/perf/util/powerpc-vpadtl.h b/tools/perf/util/powerpc-vpadtl.h
>> new file mode 100644
>> index 000000000000..625172adaba5
>> --- /dev/null
>> +++ b/tools/perf/util/powerpc-vpadtl.h
>> @@ -0,0 +1,26 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * VPA DTL PMU Support
>> + */
>> +
>> +#ifndef INCLUDE__PERF_POWERPC_VPADTL_H__
>> +#define INCLUDE__PERF_POWERPC_VPADTL_H__
>> +
>> +#define POWERPC_VPADTL_NAME "powerpc_vpadtl_"
>> +
>> +enum {
>> + POWERPC_VPADTL_TYPE,
>> + VPADTL_PER_CPU_MMAPS,
>
> VPADTL_PER_CPU_MMAPS is never used
>
>> + VPADTL_AUXTRACE_PRIV_MAX,
>> +};
>> +
>> +#define VPADTL_AUXTRACE_PRIV_SIZE (VPADTL_AUXTRACE_PRIV_MAX * sizeof(u64))
>> +
>> +union perf_event;
>> +struct perf_session;
>> +struct perf_pmu;
>> +
>> +int powerpc_vpadtl_process_auxtrace_info(union perf_event *event,
>> + struct perf_session *session);
>
> None of these definitions are used in this patch, although probably
> VPADTL_AUXTRACE_PRIV_SIZE should be.
> It would be better to add definitions only when they are needed.
Hi Adrian
Thanks for taking the time to review this patch set and for sharing your comments. I will address the changes suggested on each patch in the next version.
Thanks
Athira
>
>> +
>> +#endif
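Regarding the comments on info_priv_size and VPADTL_AUXTRACE_PRIV_SIZE above, a possible shape for the two callbacks is sketched below. It assumes the unused VPADTL_PER_CPU_MMAPS slot is dropped, and it uses simplified signatures (no auxtrace_record or session arguments), so it is an illustration rather than the perf callback API.

```c
#include <stddef.h>
#include <stdint.h>

enum {
	POWERPC_VPADTL_TYPE,
	VPADTL_AUXTRACE_PRIV_MAX,	/* number of u64 slots, keep last */
};

#define VPADTL_AUXTRACE_PRIV_SIZE (VPADTL_AUXTRACE_PRIV_MAX * sizeof(uint64_t))

/* Size the tool would reserve after the auxtrace_info header. */
static size_t info_priv_size(void)
{
	return VPADTL_AUXTRACE_PRIV_SIZE;
}

/* Fill the private slots; pmu_type is a hypothetical input here. */
static int info_fill(uint64_t *priv, size_t priv_size, uint32_t pmu_type)
{
	if (priv_size != VPADTL_AUXTRACE_PRIV_SIZE)
		return -1;

	priv[POWERPC_VPADTL_TYPE] = pmu_type;
	return 0;
}
```

Deriving the size from the enum keeps the writer and the reader of the private area in step when a slot is added later.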
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 08/14] tools/perf: process auxtrace events and display in perf report -D
2025-08-27 17:28 ` Adrian Hunter
@ 2025-08-29 8:31 ` Athira Rajeev
0 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2025-08-29 8:31 UTC (permalink / raw)
To: Adrian Hunter
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Madhavan Srinivasan,
Ian Rogers, Namhyung Kim, open list:PERFORMANCE EVENTS SUBSYSTEM,
PowerPC, Aboorva Devarajan, Shrikanth Hegde, Kajol Jain, hbathini,
Aditya Bodkhe, Venkat Rao Bagalkote
> On 27 Aug 2025, at 10:58 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 15/08/2025 11:34, Athira Rajeev wrote:
>> Add vpa dtl pmu auxtrace process function for "perf report -D".
>> The auxtrace event processing functions are defined in file
>> "util/powerpc-vpadtl.c". Data structures used includes "struct
>> powerpc_vpadtl_queue", "struct powerpc_vpadtl" to store the auxtrace
>> buffers in queue. Different PERF_RECORD_XXX are generated
>> during recording. PERF_RECORD_AUXTRACE_INFO is processed first
>> since it is of type perf_user_event_type and perf session event
>> delivers perf_session__process_user_event() first. Define function
>> powerpc_vpadtl_process_auxtrace_info() to handle the processing of
>> PERF_RECORD_AUXTRACE_INFO records. In this function, initialize
>> the aux buffer queues using auxtrace_queues__init(). Setup the
>> required infrastructure for aux data processing. The data is collected
>> per CPU and auxtrace_queue is created for each CPU.
>>
>> Define powerpc_vpadtl_process_event() function to process
>> PERF_RECORD_AUXTRACE records. In this, add the event to queue using
>> auxtrace_queues__add_event() and process the buffer in
>> powerpc_vpadtl_dump_event(). The first entry in the buffer with
>> timebase as zero has boot timebase and frequency. Remaining data is of
>> format for "struct dtl_entry". Define the translation for
>> dispatch_reasons and preempt_reasons, report this when dump trace is
>> invoked via powerpc_vpadtl_dump()
>>
>> Sample output:
>>
>> ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.300 MB perf.data ]
>>
>> ./perf report -D
>>
>> 0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
>> .
>> . ... VPA DTL PMU data: size 1680 bytes, entries is 35
>> . 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
>> . 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
>> . 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
>> . 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
>> . 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
>> . 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
>> . 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
>> . 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
>> . 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
>>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
>> ---
>> tools/perf/util/Build | 1 +
>> tools/perf/util/auxtrace.c | 2 +
>> tools/perf/util/powerpc-vpadtl.c | 299 +++++++++++++++++++++++++++++++
>> 3 files changed, 302 insertions(+)
>> create mode 100644 tools/perf/util/powerpc-vpadtl.c
>>
>> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
>> index 4959e7a990e4..5ead46dc98e7 100644
>> --- a/tools/perf/util/Build
>> +++ b/tools/perf/util/Build
>> @@ -136,6 +136,7 @@ perf-util-$(CONFIG_AUXTRACE) += arm-spe-decoder/
>> perf-util-$(CONFIG_AUXTRACE) += hisi-ptt.o
>> perf-util-$(CONFIG_AUXTRACE) += hisi-ptt-decoder/
>> perf-util-$(CONFIG_AUXTRACE) += s390-cpumsf.o
>> +perf-util-$(CONFIG_AUXTRACE) += powerpc-vpadtl.o
>>
>> ifdef CONFIG_LIBOPENCSD
>> perf-util-$(CONFIG_AUXTRACE) += cs-etm.o
>> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
>> index f587d386c5ef..bd1404f26bb7 100644
>> --- a/tools/perf/util/auxtrace.c
>> +++ b/tools/perf/util/auxtrace.c
>> @@ -1395,6 +1395,8 @@ int perf_event__process_auxtrace_info(struct perf_session *session,
>> err = hisi_ptt_process_auxtrace_info(event, session);
>> break;
>> case PERF_AUXTRACE_VPA_PMU:
>> + err = powerpc_vpadtl_process_auxtrace_info(event, session);
>> + break;
>> case PERF_AUXTRACE_UNKNOWN:
>> default:
>> return -EINVAL;
>> diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
>> new file mode 100644
>> index 000000000000..ea7b59c45f4a
>> --- /dev/null
>> +++ b/tools/perf/util/powerpc-vpadtl.c
>> @@ -0,0 +1,299 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * VPA DTL PMU support
>> + */
>> +
>> +#include <endian.h>
>> +#include <errno.h>
>> +#include <byteswap.h>
>> +#include <inttypes.h>
>> +#include <linux/kernel.h>
>> +#include <linux/types.h>
>> +#include <linux/bitops.h>
>> +#include <linux/log2.h>
>> +#include <elf.h>
>> +#include <limits.h>
>> +
>> +#include "cpumap.h"
>> +#include "color.h"
>> +#include "evsel.h"
>> +#include "evlist.h"
>> +#include "machine.h"
>> +#include "session.h"
>> +#include "util.h"
>> +#include "thread.h"
>> +#include "debug.h"
>> +#include "auxtrace.h"
>> +#include "powerpc-vpadtl.h"
>> +#include "map.h"
>> +#include "symbol_conf.h"
>> +#include "symbol.h"
>
> Are all these #includes really needed
>
>> +
>> +/*
>> + * The DTL entries are of below format
>> + */
>> +struct dtl_entry {
>> + u8 dispatch_reason;
>> + u8 preempt_reason;
>> + u16 processor_id;
>> + u32 enqueue_to_dispatch_time;
>> + u32 ready_to_enqueue_time;
>> + u32 waiting_to_ready_time;
>> + u64 timebase;
>> + u64 fault_addr;
>> + u64 srr0;
>> + u64 srr1;
>> +};
>
> struct dtl_entry is moved in a later patch.
> Maybe call it vpadtl_entry or powerpc_vpadtl_entry and
> put it in perf/util/event.h since it is eventually needed
> in perf/builtin-script.c
Sure
>
>> +
>> +/*
>> + * Structure to save the auxtrace queue
>> + */
>> +struct powerpc_vpadtl {
>> + struct auxtrace auxtrace;
>> + struct auxtrace_queues queues;
>> + struct auxtrace_heap heap;
>> + u32 auxtrace_type;
>> + struct perf_session *session;
>> + struct machine *machine;
>> + u32 pmu_type;
>> +};
>> +
>> +struct boottb_freq {
>> + u64 boot_tb;
>> + u64 tb_freq;
>> + u64 timebase;
>> + u64 padded[3];
>> +};
>> +
>> +struct powerpc_vpadtl_queue {
>> + struct powerpc_vpadtl *vpa;
>> + unsigned int queue_nr;
>> + struct auxtrace_buffer *buffer;
>> + struct thread *thread;
>> + bool on_heap;
>> + bool done;
>> + pid_t pid;
>> + pid_t tid;
>> + int cpu;
>> +};
>> +
>> +const char *dispatch_reasons[11] = {
>> + "external_interrupt",
>> + "firmware_internal_event",
>> + "H_PROD",
>> + "decrementer_interrupt",
>> + "system_reset",
>> + "firmware_internal_event",
>> + "conferred_cycles",
>> + "time_slice",
>> + "virtual_memory_page_fault",
>> + "expropriated_adjunct",
>> + "priv_doorbell"};
>> +
>> +const char *preempt_reasons[10] = {
>> + "unused",
>> + "firmware_internal_event",
>> + "H_CEDE",
>> + "H_CONFER",
>> + "time_slice",
>> + "migration_hibernation_page_fault",
>> + "virtual_memory_page_fault",
>> + "H_CONFER_ADJUNCT",
>> + "hcall_adjunct",
>> + "HDEC_adjunct"};
>> +
>> +#define dtl_entry_size 48
>
> sizeof(struct dtl_entry) ?
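To back the sizeof() suggestion, a standalone sketch that checks the layout at build time. It uses fixed-width stdint types in place of perf's u8/u16/u32/u64, and dtl_nr_entries() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the quoted struct, with fixed-width types. */
struct dtl_entry {
	uint8_t  dispatch_reason;
	uint8_t  preempt_reason;
	uint16_t processor_id;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
	uint32_t waiting_to_ready_time;
	uint64_t timebase;
	uint64_t fault_addr;
	uint64_t srr0;
	uint64_t srr1;
};

/* Fails the build if padding ever changes the size. */
static_assert(sizeof(struct dtl_entry) == 48, "DTL entry must be 48 bytes");

/* Whole entries in a buffer; a trailing partial entry is ignored. */
static size_t dtl_nr_entries(size_t len)
{
	return len / sizeof(struct dtl_entry);
}
```

This agrees with the sample output in the commit message: a 1680-byte buffer holds 1680 / 48 = 35 entries.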
>
>> +
>> +/*
>> + * Function to dump the dispatch trace data when perf report
>> + * is invoked with -D
>> + */
>> +static void powerpc_vpadtl_dump(struct powerpc_vpadtl *vpa __maybe_unused,
>> + unsigned char *buf, size_t len)
>> +{
>> + struct dtl_entry *dtl;
>> + int pkt_len, pos = 0;
>> + const char *color = PERF_COLOR_BLUE;
>> +
>> + color_fprintf(stdout, color,
>> + ". ... VPA DTL PMU data: size %zu bytes, entries is %zu\n",
>> + len, len/dtl_entry_size);
>> +
>> + if (len % dtl_entry_size)
>> + len = len - (len % dtl_entry_size);
>> +
>> + while (len) {
>> + pkt_len = 48;
>
> dtl_entry_size ?
Yes, thanks for pointing that out. Will change in V2.
>
>> + printf(".");
>> + color_fprintf(stdout, color, " %08x: ", pos);
>> + dtl = (struct dtl_entry *)buf;
>> + if (dtl->timebase != 0) {
>> + printf("dispatch_reason:%s, preempt_reason:%s, enqueue_to_dispatch_time:%d, ready_to_enqueue_time:%d, waiting_to_ready_time:%d\n",
>> + dispatch_reasons[dtl->dispatch_reason], preempt_reasons[dtl->preempt_reason], be32_to_cpu(dtl->enqueue_to_dispatch_time),
>> + be32_to_cpu(dtl->ready_to_enqueue_time), be32_to_cpu(dtl->waiting_to_ready_time));
>
> Lines are getting a bit long
Will split these.
>
>> + } else {
>> + struct boottb_freq *boot_tb = (struct boottb_freq *)buf;
>> +
>> + printf("boot_tb: %" PRIu64 ", tb_freq: %" PRIu64 "\n", boot_tb->boot_tb, boot_tb->tb_freq);
>> + }
>> +
>> + pos += pkt_len;
>> + buf += pkt_len;
>> + len -= pkt_len;
>> + }
>> +}
>> +
>> +static void powerpc_vpadtl_dump_event(struct powerpc_vpadtl *vpa, unsigned char *buf,
>> + size_t len)
>> +{
>> + printf(".\n");
>> + powerpc_vpadtl_dump(vpa, buf, len);
>> +}
>> +
>> +static int powerpc_vpadtl_process_event(struct perf_session *session __maybe_unused,
>> + union perf_event *event __maybe_unused,
>> + struct perf_sample *sample __maybe_unused,
>> + const struct perf_tool *tool __maybe_unused)
>> +{
>> + return 0;
>> +}
>> +
>> +/*
>> + * Process PERF_RECORD_AUXTRACE records
>> + */
>> +static int powerpc_vpadtl_process_auxtrace_event(struct perf_session *session,
>> + union perf_event *event,
>> + const struct perf_tool *tool __maybe_unused)
>> +{
>> + struct powerpc_vpadtl *vpa = container_of(session->auxtrace, struct powerpc_vpadtl,
>> + auxtrace);
>
> Might be worth adding a helper like
>
> static struct powerpc_vpadtl *session_to_vpa(struct perf_session *session)
> {
> return container_of(session->auxtrace, struct powerpc_vpadtl, auxtrace);
> }
Ok Adrian
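For readers less familiar with container_of(), a self-contained userspace sketch of what the helper does. The struct definitions are trimmed, hypothetical stand-ins for the perf ones, and the macro is a plain offsetof()-based equivalent of the kernel version.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed stand-ins for the perf structures. */
struct auxtrace { int dummy; };
struct session { struct auxtrace *auxtrace; };

struct vpadtl {
	unsigned int pmu_type;
	struct auxtrace auxtrace;	/* embedded, not a pointer */
};

/* Recover the enclosing struct vpadtl from session->auxtrace. */
static struct vpadtl *session_to_vpa(struct session *session)
{
	return container_of(session->auxtrace, struct vpadtl, auxtrace);
}
```

The helper works because session->auxtrace points at the member embedded inside the larger structure, so subtracting the member offset gives back the enclosing object.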
>
>> + struct auxtrace_buffer *buffer;
>> + off_t data_offset;
>> + int fd = perf_data__fd(session->data);
>> + int err;
>> +
>> + if (perf_data__is_pipe(session->data)) {
>> + data_offset = 0;
>> + } else {
>> + data_offset = lseek(fd, 0, SEEK_CUR);
>> + if (data_offset == -1)
>> + return -errno;
>> + }
>> +
>> + err = auxtrace_queues__add_event(&vpa->queues, session, event,
>> + data_offset, &buffer);
>
> auxtrace_queues__add_event() is only needed here if there is no
> auxtrace index, however an auxtrace index is always written for
> new perf.data files. The index gets processed and data queued
> by auxtrace_queues__process_index() which is added in patch 11.
>
> Piped data, on the other hand, has no index and needs to be
> handled here.
>
> So:
>
> if (perf_data__is_pipe(session->data)) {
> err = auxtrace_queues__add_event(&vpa->queues, session, event, 0, &buffer);
> if (err)
> return err;
> }
>
>
Ok, will handle this change
>> + if (err)
>> + return err;
>> +
>> + /* Dump here now we have copied a piped trace out of the pipe */
>> + if (dump_trace) {
>> + if (auxtrace_buffer__get_data(buffer, fd)) {
>> + powerpc_vpadtl_dump_event(vpa, buffer->data,
>> + buffer->size);
>
> Unnecessary line wrap
>
>> + auxtrace_buffer__put_data(buffer);
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int powerpc_vpadtl_flush(struct perf_session *session __maybe_unused,
>> + const struct perf_tool *tool __maybe_unused)
>> +{
>> + return 0;
>> +}
>> +
>> +static void powerpc_vpadtl_free_queue(void *priv)
>> +{
>> + struct powerpc_vpadtl_queue *vpaq = priv;
>> +
>> + if (!vpaq)
>> + return;
>> +
>> + free(vpaq);
>> +}
>> +
>> +static void powerpc_vpadtl_free_events(struct perf_session *session)
>> +{
>> + struct powerpc_vpadtl *vpa = container_of(session->auxtrace, struct powerpc_vpadtl,
>> + auxtrace);
>> + struct auxtrace_queues *queues = &vpa->queues;
>> + unsigned int i;
>> +
>> + for (i = 0; i < queues->nr_queues; i++) {
>> + powerpc_vpadtl_free_queue(queues->queue_array[i].priv);
>
> This is the same as free(queues->queue_array[i].priv)
>
>> + queues->queue_array[i].priv = NULL;
>
> Could all be reduced to zfree(queues->queue_array[i].priv)
Sure
>
>> + }
>> + auxtrace_queues__free(queues);
>> +}
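A sketch of the free-and-clear pattern behind the zfree() suggestion above. As far as I know, perf's zfree() takes the address of the pointer, so the call would be zfree(&queues->queue_array[i].priv). zfree_ptr() and struct queue below are hypothetical local stand-ins.

```c
#include <stdlib.h>

/* Free *ptr and clear it so a later free or dereference is harmless. */
static void zfree_ptr(void **ptr)
{
	free(*ptr);
	*ptr = NULL;
}

struct queue { void *priv; };

/* Release the private data of every queue in the array. */
static void free_queue_privs(struct queue *queues, unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++)
		zfree_ptr(&queues[i].priv);
}
```

Since free(NULL) is a no-op, the separate NULL check in powerpc_vpadtl_free_queue() is not needed, and calling the loop twice is safe.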
>> +
>> +static void powerpc_vpadtl_free(struct perf_session *session)
>> +{
>> + struct powerpc_vpadtl *vpa = container_of(session->auxtrace, struct powerpc_vpadtl,
>> + auxtrace);
>> +
>> + auxtrace_heap__free(&vpa->heap);
>> + powerpc_vpadtl_free_events(session);
>> + session->auxtrace = NULL;
>> + free(vpa);
>> +}
>> +
>> +static const char * const powerpc_vpadtl_info_fmts[] = {
>> + [POWERPC_VPADTL_TYPE] = " PMU Type %"PRId64"\n",
>> +};
>> +
>> +static void powerpc_vpadtl_print_info(__u64 *arr)
>> +{
>> + if (!dump_trace)
>> + return;
>> +
>> + fprintf(stdout, powerpc_vpadtl_info_fmts[POWERPC_VPADTL_TYPE], arr[POWERPC_VPADTL_TYPE]);
>> +}
>> +
>> +/*
>> + * Process the PERF_RECORD_AUXTRACE_INFO records and setup
>> + * the infrastructure to process auxtrace events. PERF_RECORD_AUXTRACE_INFO
>> + * is processed first since it is of type perf_user_event_type.
>> + * Initialise the aux buffer queues using auxtrace_queues__init().
>> + * auxtrace_queue is created for each CPU.
>> + */
>> +int powerpc_vpadtl_process_auxtrace_info(union perf_event *event,
>> + struct perf_session *session)
>> +{
>> + struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
>> + size_t min_sz = sizeof(u64) * POWERPC_VPADTL_TYPE;
>> + struct powerpc_vpadtl *vpa;
>> + int err;
>> +
>> + if (auxtrace_info->header.size < sizeof(struct perf_record_auxtrace_info) +
>> + min_sz)
>> + return -EINVAL;
>> +
>> + vpa = zalloc(sizeof(struct powerpc_vpadtl));
>> + if (!vpa)
>> + return -ENOMEM;
>> +
>> + err = auxtrace_queues__init(&vpa->queues);
>> + if (err)
>> + goto err_free;
>> +
>> + vpa->session = session;
>> + vpa->machine = &session->machines.host; /* No kvm support */
>> + vpa->auxtrace_type = auxtrace_info->type;
>> + vpa->pmu_type = auxtrace_info->priv[POWERPC_VPADTL_TYPE];
>> +
>> + vpa->auxtrace.process_event = powerpc_vpadtl_process_event;
>> + vpa->auxtrace.process_auxtrace_event = powerpc_vpadtl_process_auxtrace_event;
>> + vpa->auxtrace.flush_events = powerpc_vpadtl_flush;
>> + vpa->auxtrace.free_events = powerpc_vpadtl_free_events;
>> + vpa->auxtrace.free = powerpc_vpadtl_free;
>> + session->auxtrace = &vpa->auxtrace;
>> +
>> + powerpc_vpadtl_print_info(&auxtrace_info->priv[0]);
>> +
>> + return 0;
>> +
>> +err_free:
>> + free(vpa);
>> + return err;
>> +}
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 11/14] tools/perf: Process the DTL entries in queue and deliver samples
2025-08-27 17:29 ` Adrian Hunter
@ 2025-08-29 8:33 ` Athira Rajeev
0 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2025-08-29 8:33 UTC (permalink / raw)
To: Adrian Hunter
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Madhavan Srinivasan,
Ian Rogers, Namhyung Kim, open list:PERFORMANCE EVENTS SUBSYSTEM,
PowerPC, Aboorva Devarajan, Shrikanth Hegde, Kajol Jain, hbathini,
Aditya Bodkhe, Venkat Rao Bagalkote
> On 27 Aug 2025, at 10:59 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 15/08/2025 11:34, Athira Rajeev wrote:
>> Create samples from DTL entries for display in perf report
>> and perf script. When the different PERF_RECORD_XX records are
>> processed from the perf session, powerpc_vpadtl_process_event() is
>> invoked. For each PERF_RECORD_XX record, compare the timestamp
>> of the perf record with the timestamp of the top element in the
>> auxtrace heap. Process the auxtrace queue if the timestamp of the
>> heap element is lower than the timestamp of the perf record.
>>
>> Sometimes a buffer is only partially processed: if the timestamp
>> of the next DTL entry in the queue is not earlier than the
>> timestamp of the perf record being processed, decoding stops and
>> the perf record is handled first. So keep track of the position in
>> the buffer to continue processing next time. Update the timestamp
>> of the auxtrace heap with the timestamp of the last processed entry
>> from the auxtrace buffer.
>>
>> Generate perf sample for each entry in the dispatch trace log.
>> Fill in the sample details:
>> - sample ip is picked from srr0 field of dtl_entry
>> - sample cpu is picked from processor_id of dtl_entry
>> - sample id is from sample_id of powerpc_vpadtl
>> - cpumode is set to PERF_RECORD_MISC_KERNEL
>> - Additionally save the details in raw_data of sample. This
>> is to print the relevant fields in perf_sample__fprintf_synth()
>> when called from builtin-script
>>
>> The sample is processed by calling perf_session__deliver_synth_event()
>> so that it gets included in perf report.
>>
>> Sample Output:
>>
>> ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.300 MB perf.data ]
>>
>> ./perf report
>>
>> # Samples: 321 of event 'vpa-dtl'
>> # Event count (approx.): 321
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. ..............................
>> #
>> 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
>>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
>> ---
>> tools/perf/util/powerpc-vpadtl.c | 181 +++++++++++++++++++++++++++++++
>> 1 file changed, 181 insertions(+)
>>
>> diff --git a/tools/perf/util/powerpc-vpadtl.c b/tools/perf/util/powerpc-vpadtl.c
>> index 299927901c9d..370c566f9ac2 100644
>> --- a/tools/perf/util/powerpc-vpadtl.c
>> +++ b/tools/perf/util/powerpc-vpadtl.c
>> @@ -160,6 +160,43 @@ static void powerpc_vpadtl_dump_event(struct powerpc_vpadtl *vpa, unsigned char
>> powerpc_vpadtl_dump(vpa, buf, len);
>> }
>>
>> +/*
>> + * Generate perf sample for each entry in the dispatch trace log.
>> + * - sample ip is picked from srr0 field of dtl_entry
>> + * - sample cpu is picked from logical cpu.
>> + * - sample id is from sample_id of powerpc_vpadtl
>> + * - cpumode is set to PERF_RECORD_MISC_KERNEL
>
> Above 4 lines of comments are a bit redundant.
>
>> + * - Additionally save the details in raw_data of sample. This
>> + * is to print the relevant fields in perf_sample__fprintf_synth()
>> + * when called from builtin-script
>> + */
>> +static int powerpc_vpadtl_sample(struct dtl_entry *record, struct powerpc_vpadtl *vpa, u64 save, int cpu)
>> +{
>> + struct perf_sample sample;
>> + union perf_event event;
>> +
>> + sample.ip = be64_to_cpu(record->srr0);
>> + sample.period = 1;
>> + sample.cpu = cpu;
>> + sample.id = vpa->sample_id;
>> + sample.callchain = NULL;
>> + sample.branch_stack = NULL;
>> + memset(&event, 0, sizeof(event));
>> + sample.cpumode = PERF_RECORD_MISC_KERNEL;
>> + sample.time = save;
>> + sample.raw_data = record;
>> + sample.raw_size = sizeof(record);
>> + event.sample.header.type = PERF_RECORD_SAMPLE;
>> + event.sample.header.misc = sample.cpumode;
>> + event.sample.header.size = sizeof(struct perf_event_header);
>> + if (perf_session__deliver_synth_event(vpa->session, &event,
>> + &sample)) {
>
> There is some inconsistency with line wrapping
I will fix the line wrapping in V2.
>
>> + pr_debug("Failed to create sample for dtl entry\n");
>> + return -1;
>> + }
>> + return 0;
>> +}
>> +
>> static int powerpc_vpadtl_get_buffer(struct powerpc_vpadtl_queue *vpaq)
>> {
>> struct auxtrace_buffer *buffer = vpaq->buffer;
>> @@ -233,6 +270,148 @@ static int powerpc_vpadtl_decode(struct powerpc_vpadtl_queue *vpaq)
>> return 1;
>> }
>>
>> +static int powerpc_vpadtl_decode_all(struct powerpc_vpadtl_queue *vpaq)
>> +{
>> + int ret;
>> + unsigned char *buf;
>> +
>> + if (!vpaq->buf_len || (vpaq->pkt_len == vpaq->size)) {
>
> Unnecessary parentheses around 'vpaq->pkt_len == vpaq->size'
Ok.
>
>> + ret = powerpc_vpadtl_get_buffer(vpaq);
>> + if (ret <= 0)
>> + return ret;
>> + }
>> +
>> + if (vpaq->buffer) {
>> + buf = vpaq->buffer->data;
>> + buf += vpaq->pkt_len;
>> + vpaq->dtl = (struct dtl_entry *)buf;
>> + if ((long long)be64_to_cpu(vpaq->dtl->timebase) <= 0) {
>> + if (vpaq->pkt_len != dtl_entry_size && vpaq->buf_len) {
>> + vpaq->pkt_len += dtl_entry_size;
>> + vpaq->buf_len -= dtl_entry_size;
>> + }
>> + return -1;
>> + }
>> + vpaq->pkt_len += dtl_entry_size;
>> + vpaq->buf_len -= dtl_entry_size;
>> + } else
>> + return 0;
>
> braces {} should be used on all arms of this statement
Sure.
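For reference, a minimal sketch of the braced form. It uses a simplified stand-in for the queue state: the demo_queue struct and demo_decode_step() name are hypothetical and are not part of the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the queue state; not the real powerpc_vpadtl_queue. */
struct demo_queue {
	const unsigned char *data;	/* current buffer, NULL when none is left */
	size_t pkt_len;			/* bytes already consumed */
	size_t buf_len;			/* bytes still to consume */
};

/* Returns 1 when one entry was consumed, 0 when there is no buffer. */
static int demo_decode_step(struct demo_queue *q, size_t entry_size)
{
	if (q->data) {
		q->pkt_len += entry_size;
		q->buf_len -= entry_size;
	} else {
		/* Braces on the single-statement arm too, matching the other arm. */
		return 0;
	}

	return 1;
}
```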
>
>> +
>> +
>> + return 1;
>> +}
>> +
>> +static int powerpc_vpadtl_run_decoder(struct powerpc_vpadtl_queue *vpaq, u64 *timestamp)
>> +{
>> + struct powerpc_vpadtl *vpa = vpaq->vpa;
>> + struct dtl_entry *record;
>> + int ret;
>> + double result, div;
>> + double boot_freq = vpaq->tb_freq;
>> + unsigned long long boot_tb = vpaq->boot_tb;
>> + unsigned long long diff;
>> + unsigned long long save;
>> +
>> + while (1) {
>> + ret = powerpc_vpadtl_decode_all(vpaq);
>> + if (!ret) {
>> + pr_debug("All data in the queue has been processed.\n");
>> + return 1;
>> + }
>> +
>> + /*
>> + * An error was detected while decoding the VPA PMU trace. Continue
>> + * with the next trace data to find more DTL entries.
>> + */
>> + if (ret < 0)
>> + continue;
>> +
>> + record = vpaq->dtl;
>> +
>> + diff = be64_to_cpu(record->timebase) - boot_tb;
>> + div = diff / boot_freq;
>> + result = div;
>> + result = result * 1000000000;
>> + save = result;
>
> It would be nicer for the time calculation to be in a separate function.
> Also 'save' is an odd choice of variable name for a timestamp.
I will move the time calculation into a separate function and give
the timestamp variable a more meaningful name in V2.
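A rough sketch of what such a helper could look like, keeping the same arithmetic as the loop above. The name demo_dtl_tb_to_ns() is hypothetical, and the sketch assumes the entry's timebase has already been converted to host byte order (be64_to_cpu() in the patch):

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a DTL timebase value to nanoseconds
 * since boot. boot_tb is the timebase value at boot and tb_freq is the
 * timebase frequency in Hz. The timebase argument is assumed to be in
 * host byte order already.
 */
static uint64_t demo_dtl_tb_to_ns(uint64_t timebase, uint64_t boot_tb, double tb_freq)
{
	uint64_t delta = timebase - boot_tb;

	/* ticks / (ticks per second) = seconds; scale to nanoseconds */
	return (uint64_t)((double)delta / tb_freq * 1000000000.0);
}
```

The caller would then read something like "ts = demo_dtl_tb_to_ns(be64_to_cpu(record->timebase), boot_tb, boot_freq);", which also replaces the 'save' variable with a name that says what it holds.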
Thanks
Athira
>
>> +
>> + /* Update timestamp for the last record */
>> + if (save > vpaq->timestamp)
>> + vpaq->timestamp = save;
>> +
>> + /*
>> + * If the timestamp of the queue is not earlier than the timestamp
>> + * of the incoming perf event, bail out so that the perf event can
>> + * be processed first.
>> + */
>> + if (vpaq->timestamp >= *timestamp) {
>> + *timestamp = vpaq->timestamp;
>> + vpaq->pkt_len -= dtl_entry_size;
>> + vpaq->buf_len += dtl_entry_size;
>> + return 0;
>> + }
>> +
>> + ret = powerpc_vpadtl_sample(record, vpa, save, vpaq->cpu);
>> + if (ret)
>> + continue;
>> + }
>> + return 0;
>> +}
>> +
>> +/*
>> + * For each PERF_RECORD_XX record, compare the timestamp of the
>> + * perf record with the timestamp of the top element in the auxtrace
>> + * heap. Process the auxtrace queue if the timestamp of the heap
>> + * element is lower than the timestamp of the perf record.
>> + *
>> + * Update the timestamp of the auxtrace heap with the timestamp
>> + * of the last processed entry from the auxtrace buffer.
>> + */
>> +static int powerpc_vpadtl_process_queues(struct powerpc_vpadtl *vpa, u64 timestamp)
>> +{
>> + unsigned int queue_nr;
>> + u64 ts;
>> + int ret;
>> +
>> + while (1) {
>> + struct auxtrace_queue *queue;
>> + struct powerpc_vpadtl_queue *vpaq;
>> +
>> + if (!vpa->heap.heap_cnt)
>> + return 0;
>> +
>> + if (vpa->heap.heap_array[0].ordinal >= timestamp)
>> + return 0;
>> +
>> + queue_nr = vpa->heap.heap_array[0].queue_nr;
>> + queue = &vpa->queues.queue_array[queue_nr];
>> + vpaq = queue->priv;
>> +
>> + auxtrace_heap__pop(&vpa->heap);
>> +
>> + if (vpa->heap.heap_cnt) {
>> + ts = vpa->heap.heap_array[0].ordinal + 1;
>> + if (ts > timestamp)
>> + ts = timestamp;
>> + } else
>> + ts = timestamp;
>
> braces {} should be used on all arms of this statement
>
>> +
>> + ret = powerpc_vpadtl_run_decoder(vpaq, &ts);
>> + if (ret < 0) {
>> + auxtrace_heap__add(&vpa->heap, queue_nr, ts);
>> + return ret;
>> + }
>> +
>> + if (!ret) {
>> + ret = auxtrace_heap__add(&vpa->heap, queue_nr, ts);
>> + if (ret < 0)
>> + return ret;
>> + } else {
>> + vpaq->on_heap = false;
>> + }
>> + }
>> + return 0;
>> +}
>> +
>> static struct powerpc_vpadtl_queue *powerpc_vpadtl__alloc_queue(struct powerpc_vpadtl *vpa,
>> unsigned int queue_nr)
>> {
>> @@ -368,6 +547,8 @@ static int powerpc_vpadtl_process_event(struct perf_session *session __maybe_unu
>> err = powerpc_vpadtl__update_queues(vpa);
>> if (err)
>> return err;
>> +
>> + err = powerpc_vpadtl_process_queues(vpa, sample->time);
>> }
>>
>> return err;
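The ordering rule from the commit message can be sketched in isolation as follows. This is a simplified illustration and not the patch code: demo_dtl_queue and demo_drain_queue() are hypothetical names, and the queue is reduced to an array of timestamps with a saved position.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified queue: entry timestamps (ns) in ascending order. */
struct demo_dtl_queue {
	const uint64_t *ts;	/* entry timestamps */
	size_t nr;		/* number of entries */
	size_t pos;		/* next entry to process; kept across calls */
	size_t delivered;	/* entries turned into samples so far */
};

/*
 * Deliver entries older than 'limit' (the timestamp of the perf record
 * being processed). Returns 1 when the queue is exhausted, or 0 when an
 * entry at or after 'limit' is reached. That entry is left unconsumed
 * and *next_ts is set to its timestamp, so the queue can be put back on
 * the heap with that ordinal.
 */
static int demo_drain_queue(struct demo_dtl_queue *q, uint64_t limit, uint64_t *next_ts)
{
	while (q->pos < q->nr) {
		if (q->ts[q->pos] >= limit) {
			*next_ts = q->ts[q->pos];
			return 0;
		}
		q->delivered++;		/* stands in for delivering a sample */
		q->pos++;
	}

	return 1;
}
```

Because pos survives between calls, a buffer that was only partially processed for one perf record resumes from the same entry when the next record arrives.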
* Re: [PATCH 12/14] tools/perf: Add support for printing synth event details via default callback
2025-08-27 17:29 ` Adrian Hunter
@ 2025-08-29 8:35 ` Athira Rajeev
0 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2025-08-29 8:35 UTC (permalink / raw)
To: Adrian Hunter
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Madhavan Srinivasan,
Ian Rogers, Namhyung Kim, open list:PERFORMANCE EVENTS SUBSYSTEM,
PowerPC, Aboorva Devarajan, Shrikanth Hegde, Kajol Jain, hbathini,
Aditya Bodkhe, Venkat Rao Bagalkote
> On 27 Aug 2025, at 10:59 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 15/08/2025 11:34, Athira Rajeev wrote:
>> Introduce arch_perf_sample__fprintf_synth_evt to add support for
>> printing arch specific synth event details. The process_event()
>> function in "builtin-script.c" invokes perf_sample__fprintf_synth() for
>> displaying PERF_TYPE_SYNTH type events.
>>
>> if (attr->type == PERF_TYPE_SYNTH && PRINT_FIELD(SYNTH))
>> perf_sample__fprintf_synth(sample, evsel, fp);
>>
>> perf_sample__fprintf_synth() process the sample depending on the value
>> in evsel->core.attr.config . Currently all the arch specific callbacks
>> perf_sample__fprintf_synth* are part of "builtin-script.c" itself.
>> Example: perf_sample__fprintf_synth_ptwrite,
>> perf_sample__fprintf_synth_mwait etc. This will need adding arch
>> specific details in builtin-script.c for any new perf_synth_id events.
>>
>> Introduce arch_perf_sample__fprintf_synth_evt() and invoke this as
>> default callback for perf_sample__fprintf_synth(). This way, arch
>> specific code can handle processing the details.
>
> A default callback is not needed.
>
>>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
>> ---
>> tools/perf/builtin-script.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
>> index d9fbdcf72f25..eff584735980 100644
>> --- a/tools/perf/builtin-script.c
>> +++ b/tools/perf/builtin-script.c
>> @@ -2003,6 +2003,12 @@ static int perf_sample__fprintf_synth_iflag_chg(struct perf_sample *sample, FILE
>> return len + perf_sample__fprintf_pt_spacing(len, fp);
>> }
>>
>> +static void arch_perf_sample__fprintf_synth_evt(struct perf_sample *data __maybe_unused,
>> + FILE *fp __maybe_unused, u64 config __maybe_unused)
>> +{
>> + return;
>> +}
>> +
>> static int perf_sample__fprintf_synth(struct perf_sample *sample,
>> struct evsel *evsel, FILE *fp)
>> {
>> @@ -2026,6 +2032,7 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample,
>> case PERF_SYNTH_INTEL_IFLAG_CHG:
>> return perf_sample__fprintf_synth_iflag_chg(sample, fp);
>> default:
>
> Should just add something like:
>
> case PERF_SYNTH_POWERPC_VPA_DTL:
> return perf_sample__fprintf_synth_vpadtl(sample, fp);
Ok Adrian,
I will call perf_sample__fprintf_synth_vpadtl() directly instead of adding a
default callback.
Thanks for all the comments; I will post a V2 addressing them.
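As a rough, self-contained illustration of that shape (one explicit case per synth ID, no catch-all callback), with hypothetical stand-ins for the perf enum values and printer functions:

```c
#include <stdio.h>

/* Hypothetical stand-ins; the real IDs and printers live in the perf sources. */
enum demo_synth_id {
	DEMO_SYNTH_IFLAG_CHG = 1,
	DEMO_SYNTH_POWERPC_VPA_DTL,
};

static int demo_fprintf_synth_iflag_chg(FILE *fp)
{
	return fprintf(fp, "iflag-chg");
}

static int demo_fprintf_synth_vpadtl(FILE *fp)
{
	return fprintf(fp, "vpa-dtl");
}

/* Each synth ID gets its own case; unknown IDs print nothing. */
static int demo_fprintf_synth(enum demo_synth_id config, FILE *fp)
{
	switch (config) {
	case DEMO_SYNTH_IFLAG_CHG:
		return demo_fprintf_synth_iflag_chg(fp);
	case DEMO_SYNTH_POWERPC_VPA_DTL:
		return demo_fprintf_synth_vpadtl(fp);
	default:
		break;
	}

	return 0;
}
```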
Thanks
Athira
>
>> + arch_perf_sample__fprintf_synth_evt(sample, fp, evsel->core.attr.config);
>> break;
>> }
Thread overview: 29+ messages
2025-08-15 8:33 [PATCH 00/14] Add interface to expose vpa dtl counters via perf Athira Rajeev
2025-08-15 8:33 ` [PATCH 01/14] powerpc/time: Expose boot_tb via accessor Athira Rajeev
2025-08-15 8:33 ` [PATCH 02/14] powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf Athira Rajeev
2025-08-20 11:53 ` Shrikanth Hegde
2025-08-15 8:33 ` [PATCH 03/14] docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu Athira Rajeev
2025-08-15 8:33 ` [PATCH 04/14] powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data Athira Rajeev
2025-08-15 8:33 ` [PATCH 05/14] powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer Athira Rajeev
2025-08-15 8:33 ` [PATCH 06/14] powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed Athira Rajeev
2025-08-15 8:34 ` [PATCH 07/14] tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc Athira Rajeev
2025-08-27 17:27 ` Adrian Hunter
2025-08-29 8:29 ` Athira Rajeev
2025-08-15 8:34 ` [PATCH 08/14] tools/perf: process auxtrace events and display in perf report -D Athira Rajeev
2025-08-27 17:28 ` Adrian Hunter
2025-08-29 8:31 ` Athira Rajeev
2025-08-15 8:34 ` [PATCH 09/14] tools/perf: Add event name as vpa-dtl of PERF_TYPE_SYNTH type to present DTL samples Athira Rajeev
2025-08-15 8:34 ` [PATCH 10/14] tools/perf: Allocate and setup aux buffer queue to help co-relate with other events across CPU's Athira Rajeev
2025-08-27 17:29 ` Adrian Hunter
2025-08-15 8:34 ` [PATCH 11/14] tools/perf: Process the DTL entries in queue and deliver samples Athira Rajeev
2025-08-27 17:29 ` Adrian Hunter
2025-08-29 8:33 ` Athira Rajeev
2025-08-15 8:34 ` [PATCH 12/14] tools/perf: Add support for printing synth event details via default callback Athira Rajeev
2025-08-27 17:29 ` Adrian Hunter
2025-08-29 8:35 ` Athira Rajeev
2025-08-15 8:34 ` [PATCH 13/14] tools/perf: Enable perf script to present the DTL entries Athira Rajeev
2025-08-27 17:30 ` Adrian Hunter
2025-08-15 8:34 ` [PATCH 14/14] powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU Athira Rajeev
2025-08-15 12:17 ` [PATCH 00/14] Add interface to expose vpa dtl counters via perf Venkat Rao Bagalkote
2025-08-15 12:51 ` Athira Rajeev
2025-08-18 14:41 ` tejas05