* [PATCH V2 0/7] Add interface to expose vpa dtl counters via
@ 2025-09-15 7:22 Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 1/7] powerpc/time: Expose boot_tb via accessor Athira Rajeev
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Athira Rajeev @ 2025-09-15 7:22 UTC (permalink / raw)
To: maddy, linuxppc-dev
Cc: acme, jolsa, adrian.hunter, irogers, namhyung, linux-perf-users,
aboorvad, sshegde, atrajeev, hbathini, Aditya.Bodkhe1, venkat88,
Tejas.Manhas1
The pseries Shared Processor Logical Partition(SPLPAR) machines can
retrieve a log of dispatch and preempt events from the hypervisor
using data from Disptach Trace Log(DTL) buffer. With this information,
user can retrieve when and why each dispatch & preempt has occurred.
The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
via perf.
- Patches 1 to 6 has powerpc PMU driver code changes to capture DTL
trace in perf.data. And patch 7 has documentation update.
Infrastructure used
===================
The VPA DTL PMU counters do not interrupt on overflow or generate any
PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
nterval can be provided by user via sample_period field in nano seconds.
vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
Trace Log) contains information about dispatch/preempt, enqueue time etc.
We directly copy the DTL buffer data as part of auxiliary buffer and it
will be processed later. This will avoid time taken to create samples
in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
entries makes use of AUX support in perf infrastructure. On the tools side,
this data is made available as PERF_RECORD_AUXTRACE records.
To corelate each DTL entry with other events across CPU's, an auxtrace_queue
is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers.
All auxtrace queues is maintained in auxtrace heap. The queues are sorted
based on timestamp. When the different PERF_RECORD_XX records are processed,
compare the timestamp of perf record with timestamp of top element in the
auxtrace heap so that DTL events can be co-related with other events
Process the auxtrace queue if the timestamp of element from heap is
lower than timestamp from entry in perf record. Sometimes it could happen that
one buffer is only partially processed. if the timestamp of occurrence of
another event is more than currently processed element in the queue, it will
move on to next perf record. So keep track of position of buffer to continue
processing next time. Update the timestamp of the auxtrace heap with the timestamp
of last processed entry from the auxtrace buffer.
This infrastructure ensures dispatch trace log entries can be corelated
and presented along with other events like sched.
With the kernel changes;
# ls /sys/devices/vpa_dtl/
events format perf_event_mux_interval_ms power subsystem type uevent
Thanks
Athira
Aboorva Devarajan (1):
powerpc/time: Expose boot_tb via accessor
Athira Rajeev (4):
powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for
capturing DTL data
powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake
up is needed
powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
Kajol Jain (2):
powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs
event format entries for vpa_dtl pmu
.../sysfs-bus-event_source-devices-vpa-dtl | 25 +
Documentation/arch/powerpc/index.rst | 1 +
Documentation/arch/powerpc/vpa-dtl.rst | 155 +++++
arch/powerpc/include/asm/time.h | 4 +
arch/powerpc/kernel/time.c | 8 +-
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/vpa-dtl.c | 596 ++++++++++++++++++
7 files changed, 789 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
create mode 100644 Documentation/arch/powerpc/vpa-dtl.rst
create mode 100644 arch/powerpc/perf/vpa-dtl.c
--
2.47.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH V2 1/7] powerpc/time: Expose boot_tb via accessor
2025-09-15 7:22 [PATCH V2 0/7] Add interface to expose vpa dtl counters via Athira Rajeev
@ 2025-09-15 7:22 ` Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 2/7] powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf Athira Rajeev
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Athira Rajeev @ 2025-09-15 7:22 UTC (permalink / raw)
To: maddy, linuxppc-dev
Cc: acme, jolsa, adrian.hunter, irogers, namhyung, linux-perf-users,
aboorvad, sshegde, atrajeev, hbathini, Aditya.Bodkhe1, venkat88,
Tejas.Manhas1
From: Aboorva Devarajan <aboorvad@linux.ibm.com>
- Define accessor function get_boot_tb() to safely return boot_tb value,
this is only needed when running in SPLPAR environments, so the
accessor is built conditionally under CONFIG_PPC_SPLPAR.
- Tag boot_tb as __ro_after_init since it is written once at initialized
and never updated afterwards.
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
Changelog:
As suggested by Maddy, wrap boot_tb under
CONFIG_PPC_SPLPAR
arch/powerpc/include/asm/time.h | 4 ++++
arch/powerpc/kernel/time.c | 8 +++++++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index f8885586efaf..7991ab1d4cb8 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -29,6 +29,10 @@ extern u64 decrementer_max;
extern void generic_calibrate_decr(void);
+#ifdef CONFIG_PPC_SPLPAR
+extern u64 get_boot_tb(void);
+#endif
+
/* Some sane defaults: 125 MHz timebase, 1GHz processor */
extern unsigned long ppc_proc_freq;
#define DEFAULT_PROC_FREQ (DEFAULT_TB_FREQ * 8)
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 8224381c1dba..4bbeb8644d3d 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -137,7 +137,7 @@ EXPORT_SYMBOL_GPL(rtc_lock);
static u64 tb_to_ns_scale __read_mostly;
static unsigned tb_to_ns_shift __read_mostly;
-static u64 boot_tb __read_mostly;
+static u64 boot_tb __ro_after_init;
extern struct timezone sys_tz;
static long timezone_offset;
@@ -639,6 +639,12 @@ notrace unsigned long long sched_clock(void)
return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
}
+#ifdef CONFIG_PPC_SPLPAR
+u64 get_boot_tb(void)
+{
+ return boot_tb;
+}
+#endif
#ifdef CONFIG_PPC_PSERIES
--
2.47.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH V2 2/7] powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
2025-09-15 7:22 [PATCH V2 0/7] Add interface to expose vpa dtl counters via Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 1/7] powerpc/time: Expose boot_tb via accessor Athira Rajeev
@ 2025-09-15 7:22 ` Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 3/7] docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu Athira Rajeev
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Athira Rajeev @ 2025-09-15 7:22 UTC (permalink / raw)
To: maddy, linuxppc-dev
Cc: acme, jolsa, adrian.hunter, irogers, namhyung, linux-perf-users,
aboorvad, sshegde, atrajeev, hbathini, Aditya.Bodkhe1, venkat88,
Tejas.Manhas1, Kajol Jain
From: Kajol Jain <kjain@linux.ibm.com>
The pseries Shared Processor Logical Partition(SPLPAR) machines
can retrieve a log of dispatch and preempt events from the
hypervisor using data from Disptach Trace Log(DTL) buffer.
With this information, user can retrieve when and why each dispatch &
preempt has occurred. Added an interface to expose the Virtual Processor
Area(VPA) DTL counters via perf.
The following events are available and exposed in sysfs:
vpa_dtl/dtl_cede/ - Trace voluntary (OS initiated) virtual processor waits
vpa_dtl/dtl_preempt/ - Trace time slice preempts
vpa_dtl/dtl_fault/ - Trace virtual partition memory page faults.
vpa_dtl/dtl_all/ - Trace all (dtl_cede/dtl_preempt/dtl_fault)
Added interface defines supported event list, config fields for the
event attributes and their corresponding bit values which are exported
via sysfs. User could use the standard perf tool to access perf events
exposed via vpa-dtl pmu.
The VPA DTL PMU counters do not interrupt on overflow or generate any
PMI interrupts. Therefore, the kernel needs to poll the counters, added
hrtimer code to do that. The timer interval can be provided by user via
sample_period field in nano seconds. There is one hrtimer added per
vpa-dtl pmu thread.
To ensure there are no other conflicting dtl users (example: debugfs dtl
or /proc/powerpc/vcpudispatch_stats), interface added code to use
"down_write_trylock" call to take the dtl_access_lock. The dtl_access_lock
is defined in dtl.h file. Also added global reference count variable called
"dtl_global_refc", to ensure dtl data can be captured per-cpu. Code also
added global lock called "dtl_global_lock" to avoid race condition.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
Changelog:
Addressed review comments from Shrikanth
- Removed unused active_lock
- Added GFP_ATOMIC as part of kmem_cache_alloc_node
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/vpa-dtl.c | 340 ++++++++++++++++++++++++++++++++++++
2 files changed, 341 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/perf/vpa-dtl.c
diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 7f53fcb7495a..78dd7e25219e 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_PPC_POWERNV) += imc-pmu.o
obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
-obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o vpa-dtl.o
obj-$(CONFIG_VPA_PMU) += vpa-pmu.o
diff --git a/arch/powerpc/perf/vpa-dtl.c b/arch/powerpc/perf/vpa-dtl.c
new file mode 100644
index 000000000000..4a8ba5b3ae3f
--- /dev/null
+++ b/arch/powerpc/perf/vpa-dtl.c
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Perf interface to expose Dispatch Trace Log counters.
+ *
+ * Copyright (C) 2024 Kajol Jain, IBM Corporation
+ */
+
+#ifdef CONFIG_PPC_SPLPAR
+#define pr_fmt(fmt) "vpa_dtl: " fmt
+
+#include <asm/dtl.h>
+#include <linux/perf_event.h>
+#include <asm/plpar_wrappers.h>
+
+#define EVENT(_name, _code) enum{_name = _code}
+
+/*
+ * Based on Power Architecture Platform Reference(PAPR) documentation,
+ * Table 14.14. Per Virtual Processor Area, below Dispatch Trace Log(DTL)
+ * Enable Mask used to get corresponding virtual processor dispatch
+ * to preempt traces:
+ * DTL_CEDE(0x1): Trace voluntary (OS initiated) virtual
+ * processor waits
+ * DTL_PREEMPT(0x2): Trace time slice preempts
+ * DTL_FAULT(0x4): Trace virtual partition memory page
+ faults.
+ * DTL_ALL(0x7): Trace all (DTL_CEDE | DTL_PREEMPT | DTL_FAULT)
+ *
+ * Event codes based on Dispatch Trace Log Enable Mask.
+ */
+EVENT(DTL_CEDE, 0x1);
+EVENT(DTL_PREEMPT, 0x2);
+EVENT(DTL_FAULT, 0x4);
+EVENT(DTL_ALL, 0x7);
+
+GENERIC_EVENT_ATTR(dtl_cede, DTL_CEDE);
+GENERIC_EVENT_ATTR(dtl_preempt, DTL_PREEMPT);
+GENERIC_EVENT_ATTR(dtl_fault, DTL_FAULT);
+GENERIC_EVENT_ATTR(dtl_all, DTL_ALL);
+
+PMU_FORMAT_ATTR(event, "config:0-7");
+
+static struct attribute *events_attr[] = {
+ GENERIC_EVENT_PTR(DTL_CEDE),
+ GENERIC_EVENT_PTR(DTL_PREEMPT),
+ GENERIC_EVENT_PTR(DTL_FAULT),
+ GENERIC_EVENT_PTR(DTL_ALL),
+ NULL
+};
+
+static struct attribute_group event_group = {
+ .name = "events",
+ .attrs = events_attr,
+};
+
+static struct attribute *format_attrs[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static const struct attribute_group format_group = {
+ .name = "format",
+ .attrs = format_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+ &format_group,
+ &event_group,
+ NULL,
+};
+
+struct vpa_dtl {
+ struct dtl_entry *buf;
+ u64 last_idx;
+};
+
+static DEFINE_PER_CPU(struct vpa_dtl, vpa_dtl_cpu);
+
+/* variable to capture reference count for the active dtl threads */
+static int dtl_global_refc;
+static spinlock_t dtl_global_lock = __SPIN_LOCK_UNLOCKED(dtl_global_lock);
+
+/*
+ * Function to dump the dispatch trace log buffer data to the
+ * perf data.
+ */
+static void vpa_dtl_dump_sample_data(struct perf_event *event)
+{
+ return;
+}
+
+/*
+ * The VPA Dispatch Trace log counters do not interrupt on overflow.
+ * Therefore, the kernel needs to poll the counters to avoid missing
+ * an overflow using hrtimer. The timer interval is based on sample_period
+ * count provided by user, and minimum interval is 1 millisecond.
+ */
+static enum hrtimer_restart vpa_dtl_hrtimer_handle(struct hrtimer *hrtimer)
+{
+ struct perf_event *event;
+ u64 period;
+
+ event = container_of(hrtimer, struct perf_event, hw.hrtimer);
+
+ if (event->state != PERF_EVENT_STATE_ACTIVE)
+ return HRTIMER_NORESTART;
+
+ vpa_dtl_dump_sample_data(event);
+ period = max_t(u64, NSEC_PER_MSEC, event->hw.sample_period);
+ hrtimer_forward_now(hrtimer, ns_to_ktime(period));
+
+ return HRTIMER_RESTART;
+}
+
+static void vpa_dtl_start_hrtimer(struct perf_event *event)
+{
+ u64 period;
+ struct hw_perf_event *hwc = &event->hw;
+
+ period = max_t(u64, NSEC_PER_MSEC, hwc->sample_period);
+ hrtimer_start(&hwc->hrtimer, ns_to_ktime(period), HRTIMER_MODE_REL_PINNED);
+}
+
+static void vpa_dtl_stop_hrtimer(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ hrtimer_cancel(&hwc->hrtimer);
+}
+
+static void vpa_dtl_reset_global_refc(struct perf_event *event)
+{
+ spin_lock(&dtl_global_lock);
+ dtl_global_refc--;
+ if (dtl_global_refc <= 0) {
+ dtl_global_refc = 0;
+ up_write(&dtl_access_lock);
+ }
+ spin_unlock(&dtl_global_lock);
+}
+
+static int vpa_dtl_mem_alloc(int cpu)
+{
+ struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, cpu);
+ struct dtl_entry *buf = NULL;
+
+ /* Check for dispatch trace log buffer cache */
+ if (!dtl_cache)
+ return -ENOMEM;
+
+ buf = kmem_cache_alloc_node(dtl_cache, GFP_KERNEL | GFP_ATOMIC, cpu_to_node(cpu));
+ if (!buf) {
+ pr_warn("buffer allocation failed for cpu %d\n", cpu);
+ return -ENOMEM;
+ }
+ dtl->buf = buf;
+ return 0;
+}
+
+static int vpa_dtl_event_init(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ /* test the event attr type for PMU enumeration */
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ if (!perfmon_capable())
+ return -EACCES;
+
+ /* Return if this is a counting event */
+ if (!is_sampling_event(event))
+ return -EOPNOTSUPP;
+
+ /* no branch sampling */
+ if (has_branch_stack(event))
+ return -EOPNOTSUPP;
+
+ /* Invalid eventcode */
+ switch (event->attr.config) {
+ case DTL_LOG_CEDE:
+ case DTL_LOG_PREEMPT:
+ case DTL_LOG_FAULT:
+ case DTL_LOG_ALL:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ spin_lock(&dtl_global_lock);
+
+ /*
+ * To ensure there are no other conflicting dtl users
+ * (example: /proc/powerpc/vcpudispatch_stats or debugfs dtl),
+ * below code try to take the dtl_access_lock.
+ * The dtl_access_lock is a rwlock defined in dtl.h, which is used
+ * to unsure there is no conflicting dtl users.
+ * Based on below code, vpa_dtl pmu tries to take write access lock
+ * and also checks for dtl_global_refc, to make sure that the
+ * dtl_access_lock is taken by vpa_dtl pmu interface.
+ */
+ if (dtl_global_refc == 0 && !down_write_trylock(&dtl_access_lock)) {
+ spin_unlock(&dtl_global_lock);
+ return -EBUSY;
+ }
+
+ /* Allocate dtl buffer memory */
+ if (vpa_dtl_mem_alloc(event->cpu)) {
+ spin_unlock(&dtl_global_lock);
+ return -ENOMEM;
+ }
+
+ /*
+ * Increment the number of active vpa_dtl pmu threads. The
+ * dtl_global_refc is used to keep count of cpu threads that
+ * currently capturing dtl data using vpa_dtl pmu interface.
+ */
+ dtl_global_refc++;
+
+ spin_unlock(&dtl_global_lock);
+
+ hrtimer_setup(&hwc->hrtimer, vpa_dtl_hrtimer_handle, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+ /*
+ * Since hrtimers have a fixed rate, we can do a static freq->period
+ * mapping and avoid the whole period adjust feedback stuff.
+ */
+ if (event->attr.freq) {
+ long freq = event->attr.sample_freq;
+
+ event->attr.sample_period = NSEC_PER_SEC / freq;
+ hwc->sample_period = event->attr.sample_period;
+ local64_set(&hwc->period_left, hwc->sample_period);
+ hwc->last_period = hwc->sample_period;
+ event->attr.freq = 0;
+ }
+
+ event->destroy = vpa_dtl_reset_global_refc;
+ return 0;
+}
+
+static int vpa_dtl_event_add(struct perf_event *event, int flags)
+{
+ int ret, hwcpu;
+ unsigned long addr;
+ struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
+
+ /*
+ * Register our dtl buffer with the hypervisor. The
+ * HV expects the buffer size to be passed in the second
+ * word of the buffer. Refer section '14.11.3.2. H_REGISTER_VPA'
+ * from PAPR for more information.
+ */
+ ((u32 *)dtl->buf)[1] = cpu_to_be32(DISPATCH_LOG_BYTES);
+ dtl->last_idx = 0;
+
+ hwcpu = get_hard_smp_processor_id(event->cpu);
+ addr = __pa(dtl->buf);
+
+ ret = register_dtl(hwcpu, addr);
+ if (ret) {
+ pr_warn("DTL registration for cpu %d (hw %d) failed with %d\n",
+ event->cpu, hwcpu, ret);
+ return ret;
+ }
+
+ /* set our initial buffer indices */
+ lppaca_of(event->cpu).dtl_idx = 0;
+
+ /*
+ * Ensure that our updates to the lppaca fields have
+ * occurred before we actually enable the logging
+ */
+ smp_wmb();
+
+ /* enable event logging */
+ lppaca_of(event->cpu).dtl_enable_mask = event->attr.config;
+
+ vpa_dtl_start_hrtimer(event);
+
+ return 0;
+}
+
+static void vpa_dtl_event_del(struct perf_event *event, int flags)
+{
+ int hwcpu = get_hard_smp_processor_id(event->cpu);
+ struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
+
+ vpa_dtl_stop_hrtimer(event);
+ unregister_dtl(hwcpu);
+ kmem_cache_free(dtl_cache, dtl->buf);
+ dtl->buf = NULL;
+ lppaca_of(event->cpu).dtl_enable_mask = 0x0;
+}
+
+/*
+ * This function definition is empty as vpa_dtl_dump_sample_data
+ * is used to parse and dump the dispatch trace log data,
+ * to perf data.
+ */
+static void vpa_dtl_event_read(struct perf_event *event)
+{
+}
+
+static struct pmu vpa_dtl_pmu = {
+ .task_ctx_nr = perf_invalid_context,
+
+ .name = "vpa_dtl",
+ .attr_groups = attr_groups,
+ .event_init = vpa_dtl_event_init,
+ .add = vpa_dtl_event_add,
+ .del = vpa_dtl_event_del,
+ .read = vpa_dtl_event_read,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_EXCLUSIVE,
+};
+
+static int vpa_dtl_init(void)
+{
+ int r;
+
+ if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
+ pr_debug("not a shared virtualized system, not enabling\n");
+ return -ENODEV;
+ }
+
+ /* This driver is intended only for L1 host. */
+ if (is_kvm_guest()) {
+ pr_debug("Only supported for L1 host system\n");
+ return -ENODEV;
+ }
+
+ r = perf_pmu_register(&vpa_dtl_pmu, vpa_dtl_pmu.name, -1);
+ if (r)
+ return r;
+
+ return 0;
+}
+
+device_initcall(vpa_dtl_init);
+#endif //CONFIG_PPC_SPLPAR
--
2.47.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH V2 3/7] docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
2025-09-15 7:22 [PATCH V2 0/7] Add interface to expose vpa dtl counters via Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 1/7] powerpc/time: Expose boot_tb via accessor Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 2/7] powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf Athira Rajeev
@ 2025-09-15 7:22 ` Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 4/7] powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data Athira Rajeev
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Athira Rajeev @ 2025-09-15 7:22 UTC (permalink / raw)
To: maddy, linuxppc-dev
Cc: acme, jolsa, adrian.hunter, irogers, namhyung, linux-perf-users,
aboorvad, sshegde, atrajeev, hbathini, Aditya.Bodkhe1, venkat88,
Tejas.Manhas1, Kajol Jain
From: Kajol Jain <kjain@linux.ibm.com>
Details are added for the vpa_dtl pmu event and format
attributes in the ABI documentation.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
.../sysfs-bus-event_source-devices-vpa-dtl | 25 +++++++++++++++++++
1 file changed, 25 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl b/Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
new file mode 100644
index 000000000000..7b7c789a5cf5
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
@@ -0,0 +1,25 @@
+What: /sys/bus/event_source/devices/vpa_dtl/format
+Date: February 2025
+Contact: Linux on PowerPC Developer List <linuxppc-dev at lists.ozlabs.org>
+Description: Read-only. Attribute group to describe the magic bits
+ that go into perf_event_attr.config for a particular pmu.
+ (See ABI/testing/sysfs-bus-event_source-devices-format).
+
+ Each attribute under this group defines a bit range of the
+ perf_event_attr.config. Supported attribute are listed
+ below::
+
+ event = "config:0-7" - event ID
+
+ For example::
+
+ dtl_cede = "event=0x1"
+
+What: /sys/bus/event_source/devices/vpa_dtl/events
+Date: February 2025
+Contact: Linux on PowerPC Developer List <linuxppc-dev at lists.ozlabs.org>
+Description: (RO) Attribute group to describe performance monitoring events
+ for the Virtual Processor Dispatch Trace Log. Each attribute in
+ this group describes a single performance monitoring event
+ supported by vpa_dtl pmu. The name of the file is the name of
+ the event (See ABI/testing/sysfs-bus-event_source-devices-events).
--
2.47.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH V2 4/7] powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data
2025-09-15 7:22 [PATCH V2 0/7] Add interface to expose vpa dtl counters via Athira Rajeev
` (2 preceding siblings ...)
2025-09-15 7:22 ` [PATCH V2 3/7] docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu Athira Rajeev
@ 2025-09-15 7:22 ` Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 5/7] powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer Athira Rajeev
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Athira Rajeev @ 2025-09-15 7:22 UTC (permalink / raw)
To: maddy, linuxppc-dev
Cc: acme, jolsa, adrian.hunter, irogers, namhyung, linux-perf-users,
aboorvad, sshegde, atrajeev, hbathini, Aditya.Bodkhe1, venkat88,
Tejas.Manhas1
vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. When the
hrtimer expires, in the timer handler, code is added to save the DTL
data to perf event record. DTL (Dispatch Trace Log) contains information
about dispatch/preempt, enqueue time etc. We directly copy the DTL
buffer data as part of auxiliary buffer and it will be postprocessed
later. To enable the support for aux buffer, add the PMU callbacks for
setup_aux and free_aux.
In setup_aux, set up pmu-private data structures for an AUX
area. rb_alloc_aux uses "alloc_pages_node" and returns pointer to each
page address. Map these pages to contiguous space using vmap and use
that as base address. The aux private data structure ie,
"struct vpa_pmu_buf" mainly saves:
1. buf->base: aux buffer base address
2. buf->head: offset from base address where data will be written to.
3. buf->size: Size of allocated memory
free_aux will free pmu-private AUX data structures.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
arch/powerpc/perf/vpa-dtl.c | 77 +++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
diff --git a/arch/powerpc/perf/vpa-dtl.c b/arch/powerpc/perf/vpa-dtl.c
index 4a8ba5b3ae3f..28e7fc6e7015 100644
--- a/arch/powerpc/perf/vpa-dtl.c
+++ b/arch/powerpc/perf/vpa-dtl.c
@@ -11,6 +11,7 @@
#include <asm/dtl.h>
#include <linux/perf_event.h>
#include <asm/plpar_wrappers.h>
+#include <linux/vmalloc.h>
#define EVENT(_name, _code) enum{_name = _code}
@@ -74,6 +75,19 @@ struct vpa_dtl {
u64 last_idx;
};
+struct vpa_pmu_ctx {
+ struct perf_output_handle handle;
+};
+
+struct vpa_pmu_buf {
+ int nr_pages;
+ bool snapshot;
+ u64 *base;
+ u64 size;
+ u64 head;
+};
+
+static DEFINE_PER_CPU(struct vpa_pmu_ctx, vpa_pmu_ctx);
static DEFINE_PER_CPU(struct vpa_dtl, vpa_dtl_cpu);
/* variable to capture reference count for the active dtl threads */
@@ -302,6 +316,67 @@ static void vpa_dtl_event_read(struct perf_event *event)
{
}
+/*
+ * Set up pmu-private data structures for an AUX area
+ * **pages contains the aux buffer allocated for this event
+ * for the corresponding cpu. rb_alloc_aux uses "alloc_pages_node"
+ * and returns pointer to each page address. Map these pages to
+ * contiguous space using vmap and use that as base address.
+ *
+ * The aux private data structure ie, "struct vpa_pmu_buf" mainly
+ * saves
+ * - buf->base: aux buffer base address
+ * - buf->head: offset from base address where data will be written to.
+ * - buf->size: Size of allocated memory
+ */
+static void *vpa_dtl_setup_aux(struct perf_event *event, void **pages,
+ int nr_pages, bool snapshot)
+{
+ int i, cpu = event->cpu;
+ struct vpa_pmu_buf *buf __free(kfree) = NULL;
+ struct page **pglist __free(kfree) = NULL;
+
+ /* We need at least one page for this to work. */
+ if (!nr_pages)
+ return NULL;
+
+ if (cpu == -1)
+ cpu = raw_smp_processor_id();
+
+ buf = kzalloc_node(sizeof(*buf), GFP_KERNEL, cpu_to_node(cpu));
+ if (!buf)
+ return NULL;
+
+ pglist = kcalloc(nr_pages, sizeof(*pglist), GFP_KERNEL);
+ if (!pglist)
+ return NULL;
+
+ for (i = 0; i < nr_pages; ++i)
+ pglist[i] = virt_to_page(pages[i]);
+
+ buf->base = vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
+ if (!buf->base)
+ return NULL;
+
+ buf->nr_pages = nr_pages;
+ buf->snapshot = false;
+
+ buf->size = nr_pages << PAGE_SHIFT;
+ buf->head = 0;
+ return no_free_ptr(buf);
+}
+
+/*
+ * free pmu-private AUX data structures
+ */
+static void vpa_dtl_free_aux(void *aux)
+{
+ struct vpa_pmu_buf *buf = aux;
+
+ vunmap(buf->base);
+ kfree(buf);
+}
+
static struct pmu vpa_dtl_pmu = {
.task_ctx_nr = perf_invalid_context,
@@ -311,6 +386,8 @@ static struct pmu vpa_dtl_pmu = {
.add = vpa_dtl_event_add,
.del = vpa_dtl_event_del,
.read = vpa_dtl_event_read,
+ .setup_aux = vpa_dtl_setup_aux,
+ .free_aux = vpa_dtl_free_aux,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_EXCLUSIVE,
};
--
2.47.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH V2 5/7] powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
2025-09-15 7:22 [PATCH V2 0/7] Add interface to expose vpa dtl counters via Athira Rajeev
` (3 preceding siblings ...)
2025-09-15 7:22 ` [PATCH V2 4/7] powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data Athira Rajeev
@ 2025-09-15 7:22 ` Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 6/7] powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 7/7] powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU Athira Rajeev
6 siblings, 0 replies; 8+ messages in thread
From: Athira Rajeev @ 2025-09-15 7:22 UTC (permalink / raw)
To: maddy, linuxppc-dev
Cc: acme, jolsa, adrian.hunter, irogers, namhyung, linux-perf-users,
aboorvad, sshegde, atrajeev, hbathini, Aditya.Bodkhe1, venkat88,
Tejas.Manhas1
vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. When the
hrtimer expires, in the timer handler, code is added to save the DTL
data to perf event record via vpa_dtl_capture_aux() function.
The DTL (Dispatch Trace Log) contains information
about dispatch/preempt, enqueue time etc. We directly copy the DTL
buffer data as part of auxiliary buffer. Data will be written to
disk only when the allocated buffer is full.
By this approach, all the DTL data will be present as-is in the
perf.data. The data will be post-processed in perf tools side when doing
perf report/perf script and this will avoid time taken to create samples
in the kernel space.
To corelate each DTL entry with other events across CPU's, we need to
map timebase from "struct dtl_entry" which phyp provides with boot
timebase. This also needs timebase frequency. Define "struct boottb_freq"
to save these details.
Added changes to capture the Dispatch Trace Log details to AUX buffer
in vpa_dtl_dump_sample_data(). Boot timebase and frequency needs to be
saved only at once, added field to indicate this as part of
"vpa_pmu_buf" structure.
perf_aux_output_begin: This function is called before writing to AUX
area. This returns the pointer to aux area private structure, ie
"struct vpa_pmu_buf". The function obtains the output handle
(used in perf_aux_output_end). when capture completes in
vpa_dtl_capture_aux(), call perf_aux_output_end() to commit the recorded
data. perf_aux_output_end() is called to move the aux->head of
"struct perf_buffer" to indicate size of data in aux buffer.
aux_tail will be moved in perf tools side when writing the data from
aux buffer to perf.data file in disk.
It is responsiblity of PMU driver to make sure data is copied between
perf_aux_output_begin and perf_aux_output_end.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
arch/powerpc/perf/vpa-dtl.c | 131 +++++++++++++++++++++++++++++++++++-
1 file changed, 130 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/vpa-dtl.c b/arch/powerpc/perf/vpa-dtl.c
index 28e7fc6e7015..ead96af37997 100644
--- a/arch/powerpc/perf/vpa-dtl.c
+++ b/arch/powerpc/perf/vpa-dtl.c
@@ -85,6 +85,24 @@ struct vpa_pmu_buf {
u64 *base;
u64 size;
u64 head;
+ /* boot timebase and frequency needs to be saved only at once */
+ int boottb_freq_saved;
+};
+
+/*
+ * To corelate each DTL entry with other events across CPU's,
+ * we need to map timebase from "struct dtl_entry" which phyp
+ * provides with boot timebase. This also needs timebase frequency.
+ * Formula is: ((timbase from DTL entry - boot time) / frequency)
+ *
+ * To match with size of "struct dtl_entry" to ease post processing,
+ * padded 24 bytes to the structure.
+ */
+struct boottb_freq {
+ u64 boot_tb;
+ u64 tb_freq;
+ u64 timebase;
+ u64 padded[3];
};
static DEFINE_PER_CPU(struct vpa_pmu_ctx, vpa_pmu_ctx);
@@ -94,13 +112,123 @@ static DEFINE_PER_CPU(struct vpa_dtl, vpa_dtl_cpu);
static int dtl_global_refc;
static spinlock_t dtl_global_lock = __SPIN_LOCK_UNLOCKED(dtl_global_lock);
+/*
+ * Capture DTL data in AUX buffer
+ */
+static void vpa_dtl_capture_aux(long *n_entries, struct vpa_pmu_buf *buf,
+ struct vpa_dtl *dtl, int index)
+{
+ struct dtl_entry *aux_copy_buf = (struct dtl_entry *)buf->base;
+
+ /*
+ * Copy to AUX buffer from per-thread address
+ */
+ memcpy(aux_copy_buf + buf->head, &dtl->buf[index], *n_entries * sizeof(struct dtl_entry));
+
+ buf->head += *n_entries;
+
+ return;
+}
+
/*
* Function to dump the dispatch trace log buffer data to the
* perf data.
+ *
+ * perf_aux_output_begin: This function is called before writing
+ * to AUX area. This returns the pointer to aux area private structure,
+ * ie "struct vpa_pmu_buf" here which is set in setup_aux() function.
+ * The function obtains the output handle (used in perf_aux_output_end).
+ * when capture completes in vpa_dtl_capture_aux(), call perf_aux_output_end()
+ * to commit the recorded data.
+ *
+ * perf_aux_output_end: This function commits data by adjusting the
+ * aux_head of "struct perf_buffer". aux_tail will be moved in perf tools
+ * side when writing the data from aux buffer to perf.data file in disk.
+ *
+ * Here in the private aux structure, we maintain head to know where
+ * to copy data next time in the PMU driver. vpa_pmu_buf->head is moved to
+ * maintain the aux head for PMU driver. It is responsiblity of PMU
+ * driver to make sure data is copied between perf_aux_output_begin and
+ * perf_aux_output_end.
+ *
+ * After data is copied in vpa_dtl_capture_aux() function, perf_aux_output_end()
+ * is called to move the aux->head of "struct perf_buffer" to indicate size of
+ * data in aux buffer. This will post a PERF_RECORD_AUX into the perf buffer.
+ * Data will be written to disk only when the allocated buffer is full.
+ *
+ * By this approach, all the DTL data will be present as-is in the
+ * perf.data. The data will be pre-processed in perf tools side when doing
+ * perf report/perf script and this will avoid time taken to create samples
+ * in the kernel space.
*/
static void vpa_dtl_dump_sample_data(struct perf_event *event)
{
- return;
+ u64 cur_idx, last_idx, i;
+ u64 boot_tb;
+ struct boottb_freq boottb_freq;
+
+ /* actual number of entries read */
+ long n_read = 0, read_size = 0;
+
+ /* number of entries added to dtl buffer */
+ long n_req;
+
+ struct vpa_pmu_ctx *vpa_ctx = this_cpu_ptr(&vpa_pmu_ctx);
+
+ struct vpa_pmu_buf *aux_buf;
+
+ struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
+
+ cur_idx = be64_to_cpu(lppaca_of(event->cpu).dtl_idx);
+ last_idx = dtl->last_idx;
+
+ if (last_idx + N_DISPATCH_LOG <= cur_idx)
+ last_idx = cur_idx - N_DISPATCH_LOG + 1;
+
+ n_req = cur_idx - last_idx;
+
+ /* no new entry added to the buffer, return */
+ if (n_req <= 0)
+ return;
+
+ dtl->last_idx = last_idx + n_req;
+ boot_tb = get_boot_tb();
+
+ i = last_idx % N_DISPATCH_LOG;
+
+ aux_buf = perf_aux_output_begin(&vpa_ctx->handle, event);
+ if (!aux_buf) {
+ pr_debug("returning. no aux\n");
+ return;
+ }
+
+ if (!aux_buf->boottb_freq_saved) {
+ pr_debug("Copying boot tb to aux buffer: %lld\n", boot_tb);
+ /* Save boot_tb to convert raw timebase to it's relative system boot time */
+ boottb_freq.boot_tb = boot_tb;
+ /* Save tb_ticks_per_sec to convert timebase to sec */
+ boottb_freq.tb_freq = tb_ticks_per_sec;
+ boottb_freq.timebase = 0;
+ memcpy(aux_buf->base, &boottb_freq, sizeof(boottb_freq));
+ aux_buf->head += 1;
+ aux_buf->boottb_freq_saved = 1;
+ n_read += 1;
+ }
+
+ /* read the tail of the buffer if we've wrapped */
+ if (i + n_req > N_DISPATCH_LOG) {
+ read_size = N_DISPATCH_LOG - i;
+ vpa_dtl_capture_aux(&read_size, aux_buf, dtl, i);
+ n_req -= read_size;
+ n_read += read_size;
+ i = 0;
+ }
+
+ /* .. and now the head */
+ vpa_dtl_capture_aux(&n_req, aux_buf, dtl, i);
+
+ /* Move the aux->head to indicate size of data in aux buffer */
+ perf_aux_output_end(&vpa_ctx->handle, (n_req + n_read) * sizeof(struct dtl_entry));
}
/*
@@ -363,6 +491,7 @@ static void *vpa_dtl_setup_aux(struct perf_event *event, void **pages,
buf->size = nr_pages << PAGE_SHIFT;
buf->head = 0;
+ buf->boottb_freq_saved = 0;
return no_free_ptr(buf);
}
--
2.47.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH V2 6/7] powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed
2025-09-15 7:22 [PATCH V2 0/7] Add interface to expose vpa dtl counters via Athira Rajeev
` (4 preceding siblings ...)
2025-09-15 7:22 ` [PATCH V2 5/7] powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer Athira Rajeev
@ 2025-09-15 7:22 ` Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 7/7] powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU Athira Rajeev
6 siblings, 0 replies; 8+ messages in thread
From: Athira Rajeev @ 2025-09-15 7:22 UTC (permalink / raw)
To: maddy, linuxppc-dev
Cc: acme, jolsa, adrian.hunter, irogers, namhyung, linux-perf-users,
aboorvad, sshegde, atrajeev, hbathini, Aditya.Bodkhe1, venkat88,
Tejas.Manhas1
Handle the case when the aux buffer is going to be full and
data needs to be written to the data file. perf_aux_output_begin()
function checks if there is enough space depending on the values of
aux_wakeup and aux_watermark which is part of "struct perf_buffer".
Inorder to maintain where to write to aux buffer, add two fields
to "struct vpa_pmu_buf". Field "threshold" to indicate total possible
DTL entries that can be contained in aux buffer and field "full" to
indicate anytime when buffer is full. In perf_aux_output_end, there
is check to see if wake up is needed based on aux head value.
In vpa_dtl_capture_aux(), check if there is enough space to contain the
DTL data. If not, save the data for available memory and set full to true.
Set head of private aux to zero when buffer is full so that next data
will be copied to beginning of the buffer. The address used for copying
to aux is "aux_copy_buf + buf->head". So once buffer is full, set head
to zero, so that next time it will be written from start of the buffer.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
arch/powerpc/perf/vpa-dtl.c | 54 +++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/perf/vpa-dtl.c b/arch/powerpc/perf/vpa-dtl.c
index ead96af37997..3c1d1c28deb9 100644
--- a/arch/powerpc/perf/vpa-dtl.c
+++ b/arch/powerpc/perf/vpa-dtl.c
@@ -85,8 +85,11 @@ struct vpa_pmu_buf {
u64 *base;
u64 size;
u64 head;
+ u64 head_size;
/* boot timebase and frequency needs to be saved only at once */
int boottb_freq_saved;
+ u64 threshold;
+ bool full;
};
/*
@@ -120,11 +123,31 @@ static void vpa_dtl_capture_aux(long *n_entries, struct vpa_pmu_buf *buf,
{
struct dtl_entry *aux_copy_buf = (struct dtl_entry *)buf->base;
+ /*
+ * check if there is enough space to contain the
+ * DTL data. If not, save the data for available
+ * memory and set full to true.
+ */
+ if (buf->head + *n_entries >= buf->threshold) {
+ *n_entries = buf->threshold - buf->head;
+ buf->full = 1;
+ }
+
/*
* Copy to AUX buffer from per-thread address
*/
memcpy(aux_copy_buf + buf->head, &dtl->buf[index], *n_entries * sizeof(struct dtl_entry));
+ if (buf->full) {
+ /*
+ * Set head of private aux to zero when buffer is full
+ * so that next data will be copied to beginning of the
+ * buffer
+ */
+ buf->head = 0;
+ return;
+ }
+
buf->head += *n_entries;
return;
@@ -178,6 +201,7 @@ static void vpa_dtl_dump_sample_data(struct perf_event *event)
struct vpa_pmu_buf *aux_buf;
struct vpa_dtl *dtl = &per_cpu(vpa_dtl_cpu, event->cpu);
+ u64 size;
cur_idx = be64_to_cpu(lppaca_of(event->cpu).dtl_idx);
last_idx = dtl->last_idx;
@@ -222,13 +246,37 @@ static void vpa_dtl_dump_sample_data(struct perf_event *event)
n_req -= read_size;
n_read += read_size;
i = 0;
+ if (aux_buf->full) {
+ size = (n_read * sizeof(struct dtl_entry));
+ if ((size + aux_buf->head_size) > aux_buf->size) {
+ size = aux_buf->size - aux_buf->head_size;
+ perf_aux_output_end(&vpa_ctx->handle, size);
+ aux_buf->head = 0;
+ aux_buf->head_size = 0;
+ } else {
+ aux_buf->head_size += (n_read * sizeof(struct dtl_entry));
+ perf_aux_output_end(&vpa_ctx->handle, n_read * sizeof(struct dtl_entry));
+ }
+ goto out;
+ }
}
/* .. and now the head */
vpa_dtl_capture_aux(&n_req, aux_buf, dtl, i);
- /* Move the aux->head to indicate size of data in aux buffer */
- perf_aux_output_end(&vpa_ctx->handle, (n_req + n_read) * sizeof(struct dtl_entry));
+ size = ((n_req + n_read) * sizeof(struct dtl_entry));
+ if ((size + aux_buf->head_size) > aux_buf->size) {
+ size = aux_buf->size - aux_buf->head_size;
+ perf_aux_output_end(&vpa_ctx->handle, size);
+ aux_buf->head = 0;
+ aux_buf->head_size = 0;
+ } else {
+ aux_buf->head_size += ((n_req + n_read) * sizeof(struct dtl_entry));
+ /* Move the aux->head to indicate size of data in aux buffer */
+ perf_aux_output_end(&vpa_ctx->handle, (n_req + n_read) * sizeof(struct dtl_entry));
+ }
+out:
+ aux_buf->full = 0;
}
/*
@@ -491,7 +539,9 @@ static void *vpa_dtl_setup_aux(struct perf_event *event, void **pages,
buf->size = nr_pages << PAGE_SHIFT;
buf->head = 0;
+ buf->head_size = 0;
buf->boottb_freq_saved = 0;
+ buf->threshold = ((buf->size - 32) / sizeof(struct dtl_entry));
return no_free_ptr(buf);
}
--
2.47.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH V2 7/7] powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
2025-09-15 7:22 [PATCH V2 0/7] Add interface to expose vpa dtl counters via Athira Rajeev
` (5 preceding siblings ...)
2025-09-15 7:22 ` [PATCH V2 6/7] powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed Athira Rajeev
@ 2025-09-15 7:22 ` Athira Rajeev
6 siblings, 0 replies; 8+ messages in thread
From: Athira Rajeev @ 2025-09-15 7:22 UTC (permalink / raw)
To: maddy, linuxppc-dev
Cc: acme, jolsa, adrian.hunter, irogers, namhyung, linux-perf-users,
aboorvad, sshegde, atrajeev, hbathini, Aditya.Bodkhe1, venkat88,
Tejas.Manhas1
Documentation for vpa-dtl (Virtual Processor Area - Dispatch Trace Log)
PMU interface. And how it can be used to collect the distrace trace log
entries in perf data, how to process/report as part of perf report/perf
script.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
---
Documentation/arch/powerpc/index.rst | 1 +
Documentation/arch/powerpc/vpa-dtl.rst | 155 +++++++++++++++++++++++++
2 files changed, 156 insertions(+)
create mode 100644 Documentation/arch/powerpc/vpa-dtl.rst
diff --git a/Documentation/arch/powerpc/index.rst b/Documentation/arch/powerpc/index.rst
index 53fc9f89f3e4..1be2ee3f0361 100644
--- a/Documentation/arch/powerpc/index.rst
+++ b/Documentation/arch/powerpc/index.rst
@@ -37,6 +37,7 @@ powerpc
vas-api
vcpudispatch_stats
vmemmap_dedup
+ vpa-dtl
features
diff --git a/Documentation/arch/powerpc/vpa-dtl.rst b/Documentation/arch/powerpc/vpa-dtl.rst
new file mode 100644
index 000000000000..98a8550ae1cc
--- /dev/null
+++ b/Documentation/arch/powerpc/vpa-dtl.rst
@@ -0,0 +1,155 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. _htm:
+
+===================================
+DTL (Dispatch Trace Log)
+===================================
+
+Athira Rajeev, 19 April 2025
+
+.. contents::
+ :depth: 3
+
+
+Basic overview
+==============
+
+The pseries Shared Processor Logical Partition(SPLPAR) machines can
+retrieve a log of dispatch and preempt events from the hypervisor
+using data from Disptach Trace Log(DTL) buffer. With this information,
+user can retrieve when and why each dispatch & preempt has occurred.
+The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
+via perf.
+
+Infrastructure used
+===================
+
+The VPA DTL PMU counters do not interrupt on overflow or generate any
+PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
+nterval can be provided by user via sample_period field in nano seconds.
+vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
+Trace Log) contains information about dispatch/preempt, enqueue time etc.
+We directly copy the DTL buffer data as part of auxiliary buffer and it
+will be processed later. This will avoid time taken to create samples
+in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
+entries makes use of AUX support in perf infrastructure. On the tools side,
+this data is made available as PERF_RECORD_AUXTRACE records.
+
+To correlate each DTL entry with other events across CPU's, an auxtrace_queue
+is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers.
+All auxtrace queues is maintained in auxtrace heap. The queues are sorted
+based on timestamp. When the different PERF_RECORD_XX records are processed,
+compare the timestamp of perf record with timestamp of top element in the
+auxtrace heap so that DTL events can be co-related with other events
+Process the auxtrace queue if the timestamp of element from heap is
+lower than timestamp from entry in perf record. Sometimes it could happen that
+one buffer is only partially processed. if the timestamp of occurrence of
+another event is more than currently processed element in the queue, it will
+move on to next perf record. So keep track of position of buffer to continue
+processing next time. Update the timestamp of the auxtrace heap with the timestamp
+of last processed entry from the auxtrace buffer.
+
+This infrastructure ensures dispatch trace log entries can be correlated
+and presented along with other events like sched.
+
+vpa-dtl PMU example usage
+=========================
+
+.. code-block:: sh
+
+ # ls /sys/devices/vpa_dtl/
+ events format perf_event_mux_interval_ms power subsystem type uevent
+
+
+To capture the DTL data using perf record:
+.. code-block:: sh
+
+ # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
+
+The result can be interpreted using perf record. Snippet of perf report -D:
+
+.. code-block:: sh
+
+ # ./perf report -D
+
+There are different PERF_RECORD_XX records. In that records corresponding to
+auxtrace buffers includes:
+
+1. PERF_RECORD_AUX
+ Conveys that new data is available in AUX area
+
+2. PERF_RECORD_AUXTRACE_INFO
+ Describes offset and size of auxtrace data in the buffers
+
+3. PERF_RECORD_AUXTRACE
+ This is the record that defines the auxtrace data which here in case of
+ vpa-dtl pmu is dispatch trace log data.
+
+Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump
+
+.. code-block:: sh
+
+0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
+.
+. ... VPA DTL PMU data: size 1680 bytes, entries is 35
+. 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
+. 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
+. 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
+. 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
+. 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
+. 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
+. 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
+. 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
+. 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
+
+Above is representation of dtl entry of below format:
+
+struct dtl_entry {
+ u8 dispatch_reason;
+ u8 preempt_reason;
+ u16 processor_id;
+ u32 enqueue_to_dispatch_time;
+ u32 ready_to_enqueue_time;
+ u32 waiting_to_ready_time;
+ u64 timebase;
+ u64 fault_addr;
+ u64 srr0;
+ u64 srr1;
+};
+
+First two fields represent the dispatch reason and preempt reason. The post
+processing of PERF_RECORD_AUXTRACE records will translate to meaningful data
+for user to consume.
+
+Visualize the dispatch trace log entries with perf report
+=========================================================
+
+.. code-block:: sh
+
+ # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
+ [ perf record: Woken up 1 times to write data ]
+ [ perf record: Captured and wrote 0.300 MB perf.data ]
+
+ # ./perf report
+ # Samples: 321 of event 'vpa-dtl'
+ # Event count (approx.): 321
+ #
+ # Children Self Command Shared Object Symbol
+ # ........ ........ ....... ................. ..............................
+ #
+ 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
+
+Visualize the dispatch trace log entries with perf script
+=========================================================
+
+.. code-block:: sh
+
+ # ./perf script
+ migration/9 67 [009] 105373.359903: sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009
+ migration/9 67 [009] 105373.359904: sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10
+ migration/9 67 [009] 105373.359907: sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns]
+ migration/9 67 [009] 105373.359908: sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120
+ :256 256 [016] 105373.359913: vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854, ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
+ :256 256 [017] 105373.360012: vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236, ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
+ perf 13418 [010] 105373.360048: sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns]
+ perf 13418 [010] 105373.360052: sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010
--
2.47.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-09-15 7:23 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-15 7:22 [PATCH V2 0/7] Add interface to expose vpa dtl counters via Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 1/7] powerpc/time: Expose boot_tb via accessor Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 2/7] powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 3/7] docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 4/7] powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 5/7] powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 6/7] powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed Athira Rajeev
2025-09-15 7:22 ` [PATCH V2 7/7] powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU Athira Rajeev
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).