* [PATCH v1 0/3] perf pmu: Add tool-provided NVMe PMU
From: Ian Rogers @ 2026-06-09 7:03 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Tomas Glozar, Michael Jeanson, Dmitrii Dolgov, Alexandre Chartre,
Yuzhuo Jing, Leo Yan, German Gomez, Anubhav Shelat, linux-kernel,
linux-perf-users
Cc: linux-nvme
This patch series introduces a tool-provided PMU for NVMe devices to expose
various hardware logs as perf events.
Using libnvme, the new nvme_pmu extracts metrics from standard NVMe log pages
including:
- SMART / Health Information (e.g. data units read/written, power cycles,
temperature)
- Endurance Group Information
- FDP (Flexible Data Placement) Statistics
- Error Information
- Zoned Namespaces (ZNS) Changed Zones
Key features:
- Dynamic probing of supported log pages on /dev/nvmeX. Unsupported events
are marked as deprecated and hidden from `perf list` by default.
- Generic configuration encoding: the log type (bits 24-31), the field size
in bytes (bits 16-23), and the byte offset within the log page (bits 0-15)
are encoded directly into the event config.
- Correct interval delta tracking using baseline offset snapshots. This mimics
the behavior of the hwmon PMU, allowing gauges like Temperature to be
reported instantaneously in `perf stat -I`, while standard counters
correctly accumulate deltas.
- Conversion of little-endian NVMe fields to host byte order on any
architecture. 128-bit counters are read through their low 64 bits.
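As an illustration of the configuration encoding, here is a minimal
standalone sketch. The bit layout follows the NVME_CONFIG() macro in patch 2;
the sketch_* names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical log type numbers, mirroring the enum in patch 2. */
enum sketch_log_type {
	SKETCH_LOG_SMART = 0,
	SKETCH_LOG_ENDURANCE = 1,
};

/* Decoded view of an event config (names are illustrative only). */
struct sketch_field {
	unsigned int log_type;	/* bits 24-31 */
	unsigned int size;	/* bits 16-23, field size in bytes */
	unsigned int offset;	/* bits 0-15, byte offset in the log page */
};

/* Pack log type, field size and byte offset into one 64-bit config. */
uint64_t sketch_encode(unsigned int log_type, unsigned int size,
		       unsigned int offset)
{
	assert(log_type <= 0xFF && size <= 0xFF && offset <= 0xFFFF);
	return ((uint64_t)log_type << 24) | ((uint64_t)size << 16) | offset;
}

/* Split a config back into its three components. */
struct sketch_field sketch_decode(uint64_t config)
{
	struct sketch_field f = {
		.log_type = (config >> 24) & 0xFF,
		.size = (config >> 16) & 0xFF,
		.offset = config & 0xFFFF,
	};

	return f;
}
```

With this layout a 16-byte field at offset 32 of log type 1 would encode as
0x01100020, which is the kind of value shown after "config=" in the event
encoding description.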
The series is broken down into:
1. Adding the libnvme feature detection via pkg-config in tools/build.
2. The core implementation of the NVMe tool PMU.
3. A unit test suite verifying the parsing and encoding of the NVMe PMU
events using a mocked device.
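To give a feel for what decoding against a mocked device involves, the sketch
below fills a fake log page and reads fields back byte by byte, which works on
any host endianness and has no alignment requirement. This is not the test
added in patch 3; the buffer size, offsets, and sketch_* names are assumptions
made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MOCK_LEN 64

/*
 * Read a little-endian unsigned field of 1 to 8 bytes from a log buffer.
 * Returns 0 on success, -1 if the field does not fit in the buffer.
 */
int sketch_read_le(const uint8_t *buf, size_t buf_len, size_t offset,
		   size_t size, uint64_t *val)
{
	uint64_t v = 0;
	size_t i;

	assert(buf != NULL && val != NULL);
	if (size == 0 || size > 8 || offset > buf_len || size > buf_len - offset)
		return -1;

	for (i = 0; i < size; i++)
		v |= (uint64_t)buf[offset + i] << (8 * i);

	*val = v;
	return 0;
}

/* Fill a mocked log page with two fields at illustrative offsets. */
void sketch_fill_mock(uint8_t buf[SKETCH_MOCK_LEN])
{
	size_t i;

	for (i = 0; i < SKETCH_MOCK_LEN; i++)
		buf[i] = 0;

	/* 2-byte field at offset 1 holding 0x012C (300). */
	buf[1] = 0x2C;
	buf[2] = 0x01;

	/* 8-byte field at offset 32 holding 0x010203. */
	buf[32] = 0x03;
	buf[33] = 0x02;
	buf[34] = 0x01;
}
```

A test built this way can check both the decoded values and the rejection of
fields that would run past the end of the page.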
Ian Rogers (3):
perf build: Add libnvme feature detection
perf pmu: Implement tool-provided NVMe PMU
perf tests: Add NVMe PMU event parsing test
tools/build/Makefile.feature | 2 +
tools/build/feature/Makefile | 3 +
tools/build/feature/test-libnvme.c | 11 +
tools/perf/Makefile.config | 12 +
tools/perf/tests/Build | 1 +
tools/perf/tests/builtin-test.c | 1 +
tools/perf/tests/nvme_pmu.c | 176 ++++++++++
tools/perf/tests/tests.h | 1 +
tools/perf/util/Build | 1 +
tools/perf/util/evsel.c | 11 +-
tools/perf/util/nvme_pmu.c | 535 +++++++++++++++++++++++++++++
tools/perf/util/nvme_pmu.h | 143 ++++++++
tools/perf/util/pmu.c | 15 +
tools/perf/util/pmu.h | 8 +-
tools/perf/util/pmus.c | 31 +-
tools/perf/util/pmus.h | 1 +
16 files changed, 945 insertions(+), 7 deletions(-)
create mode 100644 tools/build/feature/test-libnvme.c
create mode 100644 tools/perf/tests/nvme_pmu.c
create mode 100644 tools/perf/util/nvme_pmu.c
create mode 100644 tools/perf/util/nvme_pmu.h
--
2.54.0.1064.gd145956f57-goog
* [PATCH v1 1/3] perf build: Add libnvme feature detection
From: Ian Rogers @ 2026-06-09 7:03 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Tomas Glozar, Michael Jeanson, Dmitrii Dolgov, Alexandre Chartre,
Yuzhuo Jing, Leo Yan, German Gomez, Anubhav Shelat, linux-kernel,
linux-perf-users
Cc: linux-nvme
Introduce a feature check for libnvme so that we can conditionally
compile the NVMe PMU driver. This checks for the availability of the
libnvme headers and library using pkg-config.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/build/Makefile.feature | 2 ++
tools/build/feature/Makefile | 3 +++
tools/build/feature/test-libnvme.c | 11 +++++++++++
tools/perf/Makefile.config | 12 ++++++++++++
4 files changed, 28 insertions(+)
create mode 100644 tools/build/feature/test-libnvme.c
diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index ed1374af31c1..bf8503241539 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -73,6 +73,7 @@ FEATURE_TESTS_BASIC := \
libbfd \
libbfd-threadsafe \
libelf \
+ libnvme \
libelf-getphdrnum \
libelf-gelf_getnote \
libelf-getshdrstrndx \
@@ -146,6 +147,7 @@ FEATURE_DISPLAY ?= \
libdw \
glibc \
libelf \
+ libnvme \
libnuma \
numa_num_possible_cpus \
libpython \
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 62909a9c799d..fd00e0be6069 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -147,6 +147,9 @@ $(OUTPUT)test-libcap.bin:
$(OUTPUT)test-libelf.bin:
$(BUILD) -lelf
+$(OUTPUT)test-libnvme.bin:
+ $(BUILD) -lnvme
+
$(OUTPUT)test-eventfd.bin:
$(BUILD)
diff --git a/tools/build/feature/test-libnvme.c b/tools/build/feature/test-libnvme.c
new file mode 100644
index 000000000000..3fefad70a680
--- /dev/null
+++ b/tools/build/feature/test-libnvme.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#include <libnvme.h>
+
+int main(void)
+{
+ nvme_root_t r = nvme_scan(NULL);
+
+ if (r)
+ nvme_free_tree(r);
+ return 0;
+}
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 6e7b15fab2ec..79c4ad10bc18 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -1060,6 +1060,18 @@ ifndef NO_CAPSTONE
endif
endif
+ifndef NO_LIBNVME
+ $(call feature_check,libnvme)
+ ifeq ($(feature-libnvme), 1)
+ CFLAGS += -DHAVE_LIBNVME_SUPPORT $(shell $(PKG_CONFIG) --cflags libnvme)
+ LDFLAGS += $(shell $(PKG_CONFIG) --libs-only-L libnvme)
+ EXTLIBS += $(shell $(PKG_CONFIG) --libs-only-l libnvme)
+ $(call detected,CONFIG_LIBNVME)
+ else
+ msg := $(warning No libnvme found, disables NVMe PMU support, please install libnvme-dev/libnvme-devel);
+ endif
+endif
+
ifdef EXTRA_TESTS
$(call detected,CONFIG_EXTRA_TESTS)
CFLAGS += -DHAVE_EXTRA_TESTS
--
2.54.0.1064.gd145956f57-goog
* [PATCH v1 2/3] perf pmu: Implement tool-provided NVMe PMU
From: Ian Rogers @ 2026-06-09 7:03 UTC
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Tomas Glozar, Michael Jeanson, Dmitrii Dolgov, Alexandre Chartre,
Yuzhuo Jing, Leo Yan, German Gomez, Anubhav Shelat, linux-kernel,
linux-perf-users
Cc: linux-nvme
Add a tool-provided PMU for NVMe devices using libnvme. This PMU
exposes metrics from various NVMe logs (SMART, Endurance Group,
FDP, Error Information, and Zoned Namespaces) as perf events under
the 'nvme_nvmeX' PMUs.
Features:
- Generic configuration encoding (log type, size, and offset mapped
inside the event config).
- Dynamic probing of supported log pages on /dev/nvmeX. Unsupported
events are marked deprecated and hidden from 'perf list' by
default.
- Correct interval delta tracking using baseline offset snapshots on
evsel->priv.
- Proper conversion and scaling of properties like temperature (K
to C).
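The baseline snapshot and the temperature conversion can be sketched in
isolation as below. This is a simplified model, not the code in this patch:
the sketch_* names are hypothetical, the baseline lives in a plain struct
rather than on evsel->priv, and a gauge simply reports the current raw value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-event state (illustrative stand-in for the value kept on evsel->priv). */
struct sketch_counter {
	uint64_t baseline;	/* raw value captured when the event was opened */
	bool is_gauge;		/* gauges report the current value, not a delta */
};

/* Kelvin to milli-degrees Celsius, matching a "0.001'C" scale string. */
int64_t sketch_kelvin_to_millicelsius(uint16_t kelvin)
{
	return (int64_t)kelvin * 1000 - 273150;
}

/* Snapshot the raw device value at open time. */
void sketch_open(struct sketch_counter *c, uint64_t raw_now, bool is_gauge)
{
	assert(c);
	c->baseline = raw_now;
	c->is_gauge = is_gauge;
}

/* Counters report growth since open; gauges report the value as read. */
uint64_t sketch_read(const struct sketch_counter *c, uint64_t raw_now)
{
	assert(c);
	if (c->is_gauge)
		return raw_now;
	return raw_now - c->baseline;
}
```

Subtracting the baseline keeps a lifetime counter such as power cycles from
showing its full historical value in the first interval.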
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/Build | 1 +
tools/perf/util/evsel.c | 11 +-
tools/perf/util/nvme_pmu.c | 562 +++++++++++++++++++++++++++++++++++++
tools/perf/util/nvme_pmu.h | 112 ++++++++
tools/perf/util/pmu.c | 15 +
tools/perf/util/pmu.h | 8 +-
tools/perf/util/pmus.c | 26 +-
7 files changed, 728 insertions(+), 7 deletions(-)
create mode 100644 tools/perf/util/nvme_pmu.c
create mode 100644 tools/perf/util/nvme_pmu.h
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index b22cdc24082a..e28e99634178 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -20,6 +20,7 @@ perf-util-y += disasm.o
perf-util-y += env.o
perf-util-y += event.o
perf-util-y += evlist.o
+perf-util-y += nvme_pmu.o
perf-util-y += sideband_evlist.o
perf-util-y += evsel.o
perf-util-y += evsel_fprintf.o
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 34c03f47a913..8caa626af57a 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -57,6 +57,7 @@
#include "hashmap.h"
#include "hist.h"
#include "hwmon_pmu.h"
+#include "nvme_pmu.h"
#include "intel-tpebs.h"
#include "memswap.h"
#include "off_cpu.h"
@@ -2207,6 +2208,8 @@ int evsel__read_counter(struct evsel *evsel, int cpu_map_idx, int thread)
if (evsel__is_hwmon(evsel))
return evsel__hwmon_pmu_read(evsel, cpu_map_idx, thread);
+ if (evsel__is_nvme(evsel))
+ return evsel__nvme_pmu_read(evsel, cpu_map_idx, thread);
if (evsel__is_drm(evsel))
return evsel__drm_pmu_read(evsel, cpu_map_idx, thread);
@@ -2947,8 +2950,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
}
if (evsel__is_hwmon(evsel)) {
err = evsel__hwmon_pmu_open(evsel, threads,
- start_cpu_map_idx,
- end_cpu_map_idx);
+ start_cpu_map_idx, end_cpu_map_idx);
+ goto out;
+ }
+ if (evsel__is_nvme(evsel)) {
+ err = evsel__nvme_pmu_open(evsel, threads,
+ start_cpu_map_idx, end_cpu_map_idx);
goto out;
}
if (evsel__is_drm(evsel)) {
diff --git a/tools/perf/util/nvme_pmu.c b/tools/perf/util/nvme_pmu.c
new file mode 100644
index 000000000000..17ba758aec59
--- /dev/null
+++ b/tools/perf/util/nvme_pmu.c
@@ -0,0 +1,562 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#include "counts.h"
+#include "debug.h"
+#include "evsel.h"
+#include "hashmap.h"
+#include "nvme_pmu.h"
+#include "pmu.h"
+#include <internal/xyarray.h>
+#include <internal/threadmap.h>
+#include <perf/threadmap.h>
+#include <sys/types.h>
+#include <assert.h>
+#include <ctype.h>
+#include <fcntl.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <api/fs/fs.h>
+#include <api/io.h>
+#include <api/io_dir.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/zalloc.h>
+
+#ifdef HAVE_LIBNVME_SUPPORT
+#include <libnvme.h>
+
+
+#define NVME_CONFIG(log, size, offset) \
+ (((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))
+
+enum nvme_log_type {
+ NVME_LOG_SMART = 0,
+ NVME_LOG_ENDURANCE = 1,
+ NVME_LOG_FDP = 2,
+ NVME_LOG_ERROR = 3,
+ NVME_LOG_ZNS = 4,
+};
+
+#define NVME_SMART(size, field) \
+ NVME_CONFIG(NVME_LOG_SMART, size, offsetof(struct nvme_smart_log, field))
+
+#define NVME_ENDURANCE(size, field) \
+ NVME_CONFIG(NVME_LOG_ENDURANCE, size, offsetof(struct nvme_endurance_group_log, field))
+
+#define NVME_FDP(size, field) \
+ NVME_CONFIG(NVME_LOG_FDP, size, offsetof(struct nvme_fdp_stats_log, field))
+
+#define NVME_ERROR(size, field) \
+ NVME_CONFIG(NVME_LOG_ERROR, size, offsetof(struct nvme_error_log_page, field))
+
+#define NVME_ZNS(size, field) \
+ NVME_CONFIG(NVME_LOG_ZNS, size, offsetof(struct nvme_zns_changed_zone_log, field))
+
+struct nvme_event {
+ const char *name;
+ const char *desc;
+ const char *scale_unit;
+ uint64_t config;
+};
+
+static const struct nvme_event nvme_events[] = {
+ { "smart_data_units_read",
+ "Data units read (in 1000s of 512-byte units)",
+ "512000B", NVME_SMART(16, data_units_read) },
+ { "smart_data_units_written",
+ "Data units written (in 1000s of 512-byte units)",
+ "512000B", NVME_SMART(16, data_units_written) },
+ { "smart_host_read_commands", "Host read commands", NULL, NVME_SMART(16, host_reads) },
+ { "smart_host_write_commands", "Host write commands", NULL, NVME_SMART(16, host_writes) },
+ { "smart_ctrl_busy_time", "Controller busy time", "60s", NVME_SMART(16, ctrl_busy_time) },
+ { "smart_power_cycles", "Power cycles", NULL, NVME_SMART(16, power_cycles) },
+ { "smart_power_on_hours", "Power on hours", "1h", NVME_SMART(16, power_on_hours) },
+ { "smart_unsafe_shutdowns", "Unsafe shutdowns", NULL, NVME_SMART(16, unsafe_shutdowns) },
+ { "smart_media_errors", "Media errors", NULL, NVME_SMART(16, media_errors) },
+ { "smart_num_err_log_entries",
+ "Number of error log entries",
+ NULL, NVME_SMART(16, num_err_log_entries) },
+ { "smart_warning_temp_time",
+ "Warning temperature time",
+ "60s", NVME_SMART(4, warning_temp_time) },
+ { "smart_crit_comp_time",
+ "Critical composite temperature time",
+ "60s", NVME_SMART(4, critical_comp_time) },
+ { "smart_temperature", "Temperature", "0.001'C", NVME_SMART(2, temperature) },
+
+ { "endurance_percent_used",
+ "Endurance group percentage used",
+ NULL, NVME_ENDURANCE(1, percent_used) },
+ { "endurance_data_units_read",
+ "Endurance group data units read",
+ "512000B", NVME_ENDURANCE(16, data_units_read) },
+ { "endurance_data_units_written",
+ "Endurance group data units written",
+ "512000B", NVME_ENDURANCE(16, data_units_written) },
+ { "endurance_media_units_written",
+ "Endurance group media units written",
+ "512000B", NVME_ENDURANCE(16, media_units_written) },
+ { "endurance_host_read_cmds",
+ "Endurance group host read commands",
+ NULL, NVME_ENDURANCE(16, host_read_cmds) },
+ { "endurance_host_write_cmds",
+ "Endurance group host write commands",
+ NULL, NVME_ENDURANCE(16, host_write_cmds) },
+ { "endurance_num_err_info_log_entries",
+ "Endurance group number of error information log entries",
+ NULL, NVME_ENDURANCE(16, num_err_info_log_entries) },
+
+ { "fdp_hbmw", "FDP host bytes with metadata written", "1B", NVME_FDP(16, hbmw) },
+ { "fdp_mbmw", "FDP media bytes with metadata written", "1B", NVME_FDP(16, mbmw) },
+ { "fdp_mbe", "FDP media bytes erased", "1B", NVME_FDP(16, mbe) },
+
+ { "error_count", "Error info log error count", NULL, NVME_ERROR(8, error_count) },
+
+ { "zns_nrzid", "ZNS changed zone nrzid", NULL, NVME_ZNS(2, nrzid) },
+};
+
+
+struct nvme_pmu {
+ struct perf_pmu pmu;
+ char *dev_name;
+ bool support_checked;
+ bool log_supported[256];
+};
+
+
+bool perf_pmu__is_nvme(const struct perf_pmu *pmu)
+{
+ return pmu && pmu->type >= PERF_PMU_TYPE_NVME_START &&
+ pmu->type <= PERF_PMU_TYPE_NVME_END;
+}
+
+bool evsel__is_nvme(const struct evsel *evsel)
+{
+ return perf_pmu__is_nvme(evsel->pmu);
+}
+
+struct perf_pmu *nvme_pmu__new(struct list_head *pmus, const char *sysfs_name, const char *name)
+{
+ struct nvme_pmu *nvm;
+ char buf[64];
+ __u32 type;
+
+ /*
+ * Usually sysfs_name is something like "nvme0".
+ * We try to extract the number. If parsing fails, we use 0.
+ */
+ type = PERF_PMU_TYPE_NVME_START + strtoul(sysfs_name + 4, NULL, 10);
+
+ if (type > PERF_PMU_TYPE_NVME_END) {
+ pr_err("Unable to encode NVMe type from %s in valid PMU type\n", sysfs_name);
+ return NULL;
+ }
+
+ snprintf(buf, sizeof(buf), "nvme_%s", name);
+
+ nvm = zalloc(sizeof(*nvm));
+ if (!nvm)
+ return NULL;
+
+ if (perf_pmu__init(&nvm->pmu, type, buf) != 0) {
+ free(nvm);
+ return NULL;
+ }
+
+ nvm->dev_name = strdup(sysfs_name);
+ if (!nvm->dev_name) {
+ perf_pmu__delete(&nvm->pmu);
+ return NULL;
+ }
+ nvm->pmu.alias_name = strdup(sysfs_name);
+ if (!nvm->pmu.alias_name) {
+ perf_pmu__delete(&nvm->pmu);
+ return NULL;
+ }
+ nvm->pmu.cpus = perf_cpu_map__new_int(0);
+ if (!nvm->pmu.cpus) {
+ perf_pmu__delete(&nvm->pmu);
+ return NULL;
+ }
+ INIT_LIST_HEAD(&nvm->pmu.format);
+ INIT_LIST_HEAD(&nvm->pmu.caps);
+
+ list_add_tail(&nvm->pmu.list, pmus);
+ return &nvm->pmu;
+}
+
+void nvme_pmu__exit(struct perf_pmu *pmu)
+{
+ struct nvme_pmu *nvm = container_of(pmu, struct nvme_pmu, pmu);
+
+ zfree(&nvm->dev_name);
+}
+
+
+
+static void nvme_pmu__check_support(struct nvme_pmu *nvm)
+{
+ int fd;
+ char path[PATH_MAX];
+ struct nvme_smart_log smart_log;
+ struct nvme_endurance_group_log endurance_log;
+ struct nvme_fdp_stats_log fdp_log;
+ struct nvme_error_log_page error_log;
+ struct nvme_zns_changed_zone_log zns_log;
+
+ if (nvm->support_checked)
+ return;
+
+ nvm->support_checked = true;
+
+ /* Assume all supported if we can't test. */
+ memset(nvm->log_supported, 1, sizeof(nvm->log_supported));
+
+ snprintf(path, sizeof(path), "/dev/%s", nvm->dev_name);
+ fd = open(path, O_RDONLY);
+ if (fd < 0)
+ return;
+
+ if (nvme_get_log_smart(fd, NVME_NSID_ALL, true, &smart_log) != 0)
+ nvm->log_supported[NVME_LOG_SMART] = false;
+
+ if (nvme_get_log_endurance_group(fd, 0, &endurance_log) != 0)
+ nvm->log_supported[NVME_LOG_ENDURANCE] = false;
+
+ if (nvme_get_log_fdp_stats(fd, 0, 0, sizeof(fdp_log), &fdp_log) != 0)
+ nvm->log_supported[NVME_LOG_FDP] = false;
+
+ if (nvme_get_log_error(fd, 1, true, &error_log) != 0)
+ nvm->log_supported[NVME_LOG_ERROR] = false;
+
+ if (nvme_get_log_zns_changed_zones(fd, NVME_NSID_ALL, true, &zns_log) != 0)
+ nvm->log_supported[NVME_LOG_ZNS] = false;
+
+ close(fd);
+}
+
+int nvme_pmu__for_each_event(struct perf_pmu *pmu, void *state, pmu_event_callback cb)
+{
+ struct nvme_pmu *nvm = container_of(pmu, struct nvme_pmu, pmu);
+ size_t i;
+
+ nvme_pmu__check_support(nvm);
+ for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+ const struct nvme_event *e = &nvme_events[i];
+ char alias_buf[64];
+ char desc_buf[256];
+ char encoding_buf[128];
+ struct pmu_event_info info = {
+ .pmu = pmu,
+ .name = e->name,
+ .alias = alias_buf,
+ .scale_unit = e->scale_unit,
+ .desc = desc_buf,
+ .long_desc = NULL,
+ .encoding_desc = encoding_buf,
+
+ .topic = "nvme",
+ .pmu_name = pmu->name,
+ .event_type_desc = "NVMe event",
+ .deprecated = !nvm->log_supported[(e->config >> 24) & 0xFF],
+ };
+
+ int ret;
+
+ snprintf(alias_buf, sizeof(alias_buf), "%s", e->name);
+ snprintf(desc_buf, sizeof(desc_buf), "%s", e->desc);
+ snprintf(encoding_buf, sizeof(encoding_buf),
+ "%s/config=0x%lx/", pmu->name, e->config);
+
+ ret = cb(state, &info);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
+size_t nvme_pmu__num_events(struct perf_pmu *pmu __maybe_unused)
+{
+ return ARRAY_SIZE(nvme_events);
+}
+
+bool nvme_pmu__have_event(struct perf_pmu *pmu __maybe_unused, const char *name)
+{
+ size_t i;
+
+ for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+ if (!strcasecmp(name, nvme_events[i].name))
+ return true;
+ }
+ return false;
+}
+
+static int nvme_pmu__config_term(const struct nvme_pmu *nvm __maybe_unused,
+ struct perf_event_attr *attr,
+ struct parse_events_term *term,
+ struct parse_events_error *err)
+{
+ if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER) {
+ size_t i;
+
+ for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+ if (!strcasecmp(term->config, nvme_events[i].name)) {
+ attr->config = nvme_events[i].config;
+ return 0;
+ }
+ }
+ }
+ if (err) {
+ char *err_str;
+
+ parse_events_error__handle(err, term->err_val,
+ asprintf(&err_str,
+ "unexpected nvme event term (%s) %s",
+ parse_events__term_type_str(term->type_term),
+ term->config) < 0
+ ? strdup("unexpected nvme event term")
+ : err_str,
+ NULL);
+ }
+ return -EINVAL;
+}
+
+int nvme_pmu__config_terms(const struct perf_pmu *pmu,
+ struct perf_event_attr *attr,
+ struct parse_events_terms *terms,
+ struct parse_events_error *err)
+{
+ struct nvme_pmu *nvm = container_of(pmu, struct nvme_pmu, pmu);
+ struct parse_events_term *term;
+
+ list_for_each_entry(term, &terms->terms, list) {
+ if (nvme_pmu__config_term(nvm, attr, term, err))
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+int nvme_pmu__check_alias(struct parse_events_terms *terms, struct perf_pmu_info *info,
+ struct parse_events_error *err)
+{
+ struct parse_events_term *term =
+ list_first_entry(&terms->terms, struct parse_events_term, list);
+
+ if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER) {
+ size_t i;
+
+ for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+ if (!strcasecmp(term->config, nvme_events[i].name)) {
+ if (nvme_events[i].scale_unit) {
+ char *unit;
+
+ perf_pmu__convert_scale(nvme_events[i].scale_unit,
+ &unit, &info->scale);
+ info->unit = unit;
+ }
+ return 0;
+ }
+ }
+ }
+ if (err) {
+ char *err_str;
+
+ parse_events_error__handle(err, term->err_val,
+ asprintf(&err_str,
+ "unexpected nvme event term (%s) %s",
+ parse_events__term_type_str(term->type_term),
+ term->config) < 0
+ ? strdup("unexpected nvme event term")
+ : err_str,
+ NULL);
+ }
+ return -EINVAL;
+}
+
+int perf_pmus__read_nvme_pmus(struct list_head *pmus)
+{
+ nvme_root_t r = nvme_scan(NULL);
+ nvme_host_t h;
+ nvme_subsystem_t s;
+ nvme_ctrl_t c;
+
+ if (!r)
+ return 0;
+
+ nvme_for_each_host(r, h) {
+ nvme_for_each_subsystem(h, s) {
+ nvme_subsystem_for_each_ctrl(s, c) {
+ nvme_pmu__new(pmus, nvme_ctrl_get_name(c), nvme_ctrl_get_name(c));
+ }
+ }
+ }
+ nvme_free_tree(r);
+ return 0;
+}
+
+
+static int nvme_pmu__read_val(int fd, uint64_t config, uint64_t *val)
+{
+ int log_type = (config >> 24) & 0xFF;
+ unsigned int size = (config >> 16) & 0xFF;
+ unsigned int offset = config & 0xFFFF;
+ uint8_t buf[4096];
+ uint8_t *p;
+
+ if (log_type == NVME_LOG_SMART) {
+ if (offset + size > sizeof(struct nvme_smart_log))
+ return -EINVAL;
+ if (nvme_get_log_smart(fd, NVME_NSID_ALL, true, (struct nvme_smart_log *)buf) != 0)
+ return -EINVAL;
+
+ if (offset == offsetof(struct nvme_smart_log, temperature)) {
+ uint64_t kelvin = ((struct nvme_smart_log *)buf)->temperature[0] |
+ (((struct nvme_smart_log *)buf)->temperature[1] << 8);
+ *val = (kelvin * 1000) - 273150;
+ return 0;
+ }
+ } else if (log_type == NVME_LOG_ENDURANCE) {
+ if (offset + size > sizeof(struct nvme_endurance_group_log))
+ return -EINVAL;
+ if (nvme_get_log_endurance_group(fd, 0,
+ (struct nvme_endurance_group_log *)buf) != 0)
+ return -EINVAL;
+ } else if (log_type == NVME_LOG_FDP) {
+ if (offset + size > sizeof(struct nvme_fdp_stats_log))
+ return -EINVAL;
+ if (nvme_get_log_fdp_stats(fd, 0, 0, sizeof(struct nvme_fdp_stats_log), buf) != 0)
+ return -EINVAL;
+ } else if (log_type == NVME_LOG_ERROR) {
+ if (offset + size > sizeof(struct nvme_error_log_page))
+ return -EINVAL;
+ if (nvme_get_log_error(fd, 1, true, (struct nvme_error_log_page *)buf) != 0)
+ return -EINVAL;
+ } else if (log_type == NVME_LOG_ZNS) {
+ if (offset + size > sizeof(struct nvme_zns_changed_zone_log))
+ return -EINVAL;
+ if (nvme_get_log_zns_changed_zones(fd, NVME_NSID_ALL, true,
+ (struct nvme_zns_changed_zone_log *)buf) != 0)
+ return -EINVAL;
+ } else {
+ return -EINVAL;
+ }
+
+ p = buf + offset;
+ if (size == 16 || size == 8)
+ *val = le64_to_cpu(*(uint64_t *)p);
+ else if (size == 4)
+ *val = le32_to_cpu(*(uint32_t *)p);
+ else if (size == 2)
+ *val = le16_to_cpu(*(uint16_t *)p);
+ else if (size == 1)
+ *val = *(uint8_t *)p;
+ else
+ return -EINVAL;
+
+ return 0;
+}
+
+static bool nvme_pmu__is_gauge(uint64_t config)
+{
+ if (config == NVME_SMART(2, temperature) ||
+ config == NVME_ENDURANCE(1, percent_used) ||
+ config == NVME_ZNS(2, nrzid))
+ return true;
+ return false;
+}
+
+#define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
+
+int evsel__nvme_pmu_open(struct evsel *evsel,
+ struct perf_thread_map *threads,
+ int start_cpu_map_idx, int end_cpu_map_idx)
+{
+ struct nvme_pmu *nvm = container_of(evsel->pmu, struct nvme_pmu, pmu);
+ int idx = 0, thread = 0, nthreads, err = 0;
+ char path[PATH_MAX];
+
+ snprintf(path, sizeof(path), "/dev/%s", nvm->dev_name);
+
+ nthreads = perf_thread_map__nr(threads);
+
+ if (!evsel->priv) {
+ int max_cpus = evsel->core.cpus ? perf_cpu_map__nr(evsel->core.cpus) : 1;
+
+ evsel->priv = xyarray__new(max_cpus, nthreads, sizeof(uint64_t));
+ }
+
+ for (idx = start_cpu_map_idx; idx < end_cpu_map_idx; idx++) {
+ for (thread = 0; thread < nthreads; thread++) {
+ int fd = open(path, O_RDONLY);
+
+ FD(evsel, idx, thread) = fd;
+ if (fd < 0) {
+ err = -errno;
+ goto out_close;
+ }
+ if (evsel->priv) {
+ uint64_t *initial_val = xyarray__entry(evsel->priv, idx, thread);
+
+ if (nvme_pmu__read_val(fd, evsel->core.attr.config, initial_val))
+ *initial_val = 0;
+ }
+ }
+ }
+ return 0;
+out_close:
+ if (err)
+ threads->err_thread = thread;
+
+ do {
+ while (--thread >= 0) {
+ if (FD(evsel, idx, thread) >= 0)
+ close(FD(evsel, idx, thread));
+ FD(evsel, idx, thread) = -1;
+ }
+ thread = nthreads;
+ } while (--idx >= 0);
+ return err;
+}
+
+int evsel__nvme_pmu_read(struct evsel *evsel, int cpu_map_idx, int thread)
+{
+ int fd;
+ struct perf_counts_values *count, *old_count = NULL;
+ uint64_t val = 0;
+ uint64_t *initial_val = NULL;
+
+ if (evsel->prev_raw_counts)
+ old_count = perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
+
+ count = perf_counts(evsel->counts, cpu_map_idx, thread);
+ fd = FD(evsel, cpu_map_idx, thread);
+
+ if (fd < 0 || nvme_pmu__read_val(fd, evsel->core.attr.config, &val)) {
+ count->lost++;
+ return -EINVAL;
+ }
+
+ if (evsel->priv)
+ initial_val = xyarray__entry(evsel->priv, cpu_map_idx, thread);
+
+ if (old_count) {
+ if (nvme_pmu__is_gauge(evsel->core.attr.config))
+ count->val = old_count->val + val;
+ else
+ count->val = val - (initial_val ? *initial_val : 0);
+ count->run = old_count->run + 1;
+ count->ena = old_count->ena + 1;
+ } else {
+ if (nvme_pmu__is_gauge(evsel->core.attr.config))
+ count->val = val;
+ else
+ count->val = val - (initial_val ? *initial_val : 0);
+ count->run++;
+ count->ena++;
+ }
+ return 0;
+}
+
+
+#endif
diff --git a/tools/perf/util/nvme_pmu.h b/tools/perf/util/nvme_pmu.h
new file mode 100644
index 000000000000..6d5d2bbe4167
--- /dev/null
+++ b/tools/perf/util/nvme_pmu.h
@@ -0,0 +1,112 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+#ifndef __NVME_PMU_H
+#define __NVME_PMU_H
+
+#include "pmu.h"
+#include <stdbool.h>
+#include <errno.h>
+
+struct list_head;
+struct perf_thread_map;
+struct evsel;
+
+#ifdef HAVE_LIBNVME_SUPPORT
+struct perf_pmu *nvme_pmu__new(struct list_head *pmus, const char *sysfs_name, const char *name);
+void nvme_pmu__exit(struct perf_pmu *pmu);
+
+int nvme_pmu__for_each_event(struct perf_pmu *pmu, void *state, pmu_event_callback cb);
+size_t nvme_pmu__num_events(struct perf_pmu *pmu);
+bool nvme_pmu__have_event(struct perf_pmu *pmu, const char *name);
+int nvme_pmu__config_terms(const struct perf_pmu *pmu,
+ struct perf_event_attr *attr,
+ struct parse_events_terms *terms,
+ struct parse_events_error *err);
+int nvme_pmu__check_alias(struct parse_events_terms *terms, struct perf_pmu_info *info,
+ struct parse_events_error *err);
+
+bool perf_pmu__is_nvme(const struct perf_pmu *pmu);
+bool evsel__is_nvme(const struct evsel *evsel);
+
+int perf_pmus__read_nvme_pmus(struct list_head *pmus);
+
+int evsel__nvme_pmu_open(struct evsel *evsel,
+ struct perf_thread_map *threads,
+ int start_cpu_map_idx, int end_cpu_map_idx);
+int evsel__nvme_pmu_read(struct evsel *evsel, int cpu_map_idx, int thread);
+#else
+static inline struct perf_pmu *nvme_pmu__new(struct list_head *pmus __maybe_unused,
+ const char *sysfs_name __maybe_unused,
+ const char *name __maybe_unused)
+{
+ return NULL;
+}
+
+static inline void nvme_pmu__exit(struct perf_pmu *pmu __maybe_unused)
+{
+}
+
+static inline int nvme_pmu__for_each_event(struct perf_pmu *pmu __maybe_unused,
+ void *state __maybe_unused,
+ pmu_event_callback cb __maybe_unused)
+{
+ return 0;
+}
+
+static inline size_t nvme_pmu__num_events(struct perf_pmu *pmu __maybe_unused)
+{
+ return 0;
+}
+
+static inline bool nvme_pmu__have_event(struct perf_pmu *pmu __maybe_unused,
+ const char *name __maybe_unused)
+{
+ return false;
+}
+
+static inline int nvme_pmu__config_terms(const struct perf_pmu *pmu __maybe_unused,
+ struct perf_event_attr *attr __maybe_unused,
+ struct parse_events_terms *terms __maybe_unused,
+ struct parse_events_error *err __maybe_unused)
+{
+ return -EINVAL;
+}
+
+static inline int nvme_pmu__check_alias(struct parse_events_terms *terms __maybe_unused,
+ struct perf_pmu_info *info __maybe_unused,
+ struct parse_events_error *err __maybe_unused)
+{
+ return -EINVAL;
+}
+
+static inline bool perf_pmu__is_nvme(const struct perf_pmu *pmu __maybe_unused)
+{
+ return false;
+}
+
+static inline bool evsel__is_nvme(const struct evsel *evsel __maybe_unused)
+{
+ return false;
+}
+
+static inline int perf_pmus__read_nvme_pmus(struct list_head *pmus __maybe_unused)
+{
+ return 0;
+}
+
+static inline int evsel__nvme_pmu_open(struct evsel *evsel __maybe_unused,
+ struct perf_thread_map *threads __maybe_unused,
+ int start_cpu_map_idx __maybe_unused,
+ int end_cpu_map_idx __maybe_unused)
+{
+ return 0;
+}
+
+static inline int evsel__nvme_pmu_read(struct evsel *evsel __maybe_unused,
+ int cpu_map_idx __maybe_unused,
+ int thread __maybe_unused)
+{
+ return 0;
+}
+#endif
+
+#endif /* __NVME_PMU_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 9994709ef12b..26ec19753644 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -22,6 +22,7 @@
#include "pmu.h"
#include "drm_pmu.h"
#include "hwmon_pmu.h"
+#include "nvme_pmu.h"
#include "pmus.h"
#include "tool_pmu.h"
#include "tp_pmu.h"
@@ -1687,6 +1688,8 @@ int perf_pmu__config_terms(const struct perf_pmu *pmu,
if (perf_pmu__is_hwmon(pmu))
return hwmon_pmu__config_terms(pmu, attr, terms, err);
+ if (perf_pmu__is_nvme(pmu))
+ return nvme_pmu__config_terms(pmu, attr, terms, err);
if (perf_pmu__is_drm(pmu))
return drm_pmu__config_terms(pmu, attr, terms, err);
@@ -1851,6 +1854,10 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct parse_events_terms *head_
ret = hwmon_pmu__check_alias(head_terms, info, err);
goto out;
}
+ if (perf_pmu__is_nvme(pmu)) {
+ ret = nvme_pmu__check_alias(head_terms, info, err);
+ goto out;
+ }
if (perf_pmu__is_drm(pmu)) {
ret = drm_pmu__check_alias(pmu, head_terms, info, err);
goto out;
@@ -2071,6 +2078,8 @@ bool perf_pmu__have_event(struct perf_pmu *pmu, const char *name)
return tp_pmu__have_event(pmu, name);
if (perf_pmu__is_hwmon(pmu))
return hwmon_pmu__have_event(pmu, name);
+ if (perf_pmu__is_nvme(pmu))
+ return nvme_pmu__have_event(pmu, name);
if (perf_pmu__is_drm(pmu))
return drm_pmu__have_event(pmu, name);
if (perf_pmu__find_alias(pmu, name, /*load=*/ true) != NULL)
@@ -2092,6 +2101,8 @@ size_t perf_pmu__num_events(struct perf_pmu *pmu)
return tp_pmu__num_events(pmu);
if (perf_pmu__is_hwmon(pmu))
return hwmon_pmu__num_events(pmu);
+ if (perf_pmu__is_nvme(pmu))
+ return nvme_pmu__num_events(pmu);
if (perf_pmu__is_drm(pmu))
return drm_pmu__num_events(pmu);
@@ -2223,6 +2234,8 @@ int perf_pmu__for_each_event(struct perf_pmu *pmu, bool skip_duplicate_pmus,
return tp_pmu__for_each_event(pmu, state, cb);
if (perf_pmu__is_hwmon(pmu))
return hwmon_pmu__for_each_event(pmu, state, cb);
+ if (perf_pmu__is_nvme(pmu))
+ return nvme_pmu__for_each_event(pmu, state, cb);
if (perf_pmu__is_drm(pmu))
return drm_pmu__for_each_event(pmu, state, cb);
@@ -2714,6 +2727,8 @@ void perf_pmu__delete(struct perf_pmu *pmu)
if (perf_pmu__is_hwmon(pmu))
hwmon_pmu__exit(pmu);
+ else if (perf_pmu__is_nvme(pmu))
+ nvme_pmu__exit(pmu);
else if (perf_pmu__is_drm(pmu))
drm_pmu__exit(pmu);
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 0d9f3c57e8e8..0fe47dd429e8 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -45,6 +45,8 @@ enum pmu_kind {
PERF_PMU_KIND_DRM,
/* A perf tool provided HWMON PMU. */
PERF_PMU_KIND_HWMON,
+ /* A perf tool provided NVME PMU. */
+ PERF_PMU_KIND_NVME,
/* Perf tool provided PMU for tool events like time. */
PERF_PMU_KIND_TOOL,
/* A testing PMU kind. */
@@ -53,7 +55,9 @@ enum pmu_kind {
enum {
PERF_PMU_TYPE_PE_START = 0,
- PERF_PMU_TYPE_PE_END = 0xFFFDFFFF,
+ PERF_PMU_TYPE_PE_END = 0xFFFCFFFF,
+ PERF_PMU_TYPE_NVME_START = 0xFFFD0000,
+ PERF_PMU_TYPE_NVME_END = 0xFFFDFFFF,
PERF_PMU_TYPE_DRM_START = 0xFFFE0000,
PERF_PMU_TYPE_DRM_END = 0xFFFEFFFF,
PERF_PMU_TYPE_HWMON_START = 0xFFFF0000,
@@ -363,6 +367,8 @@ static inline enum pmu_kind perf_pmu__kind(const struct perf_pmu *pmu)
type = pmu->type;
if (type <= PERF_PMU_TYPE_PE_END)
return PERF_PMU_KIND_PE;
+ if (type <= PERF_PMU_TYPE_NVME_END)
+ return PERF_PMU_KIND_NVME;
if (type <= PERF_PMU_TYPE_DRM_END)
return PERF_PMU_KIND_DRM;
if (type <= PERF_PMU_TYPE_HWMON_END)
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 5e3f571450fe..83777f941e9a 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -17,6 +17,7 @@
#include "pmus.h"
#include "pmu.h"
#include "hwmon_pmu.h"
+#include "nvme_pmu.h"
#include "tool_pmu.h"
#include "print-events.h"
#include "strbuf.h"
@@ -44,18 +45,21 @@ enum perf_tool_pmu_type {
PERF_TOOL_PMU_TYPE_PE_OTHER,
PERF_TOOL_PMU_TYPE_TOOL,
PERF_TOOL_PMU_TYPE_HWMON,
+ PERF_TOOL_PMU_TYPE_NVME,
PERF_TOOL_PMU_TYPE_DRM,
#define PERF_TOOL_PMU_TYPE_PE_CORE_MASK (1 << PERF_TOOL_PMU_TYPE_PE_CORE)
#define PERF_TOOL_PMU_TYPE_PE_OTHER_MASK (1 << PERF_TOOL_PMU_TYPE_PE_OTHER)
#define PERF_TOOL_PMU_TYPE_TOOL_MASK (1 << PERF_TOOL_PMU_TYPE_TOOL)
#define PERF_TOOL_PMU_TYPE_HWMON_MASK (1 << PERF_TOOL_PMU_TYPE_HWMON)
+#define PERF_TOOL_PMU_TYPE_NVME_MASK (1 << PERF_TOOL_PMU_TYPE_NVME)
#define PERF_TOOL_PMU_TYPE_DRM_MASK (1 << PERF_TOOL_PMU_TYPE_DRM)
#define PERF_TOOL_PMU_TYPE_ALL_MASK (PERF_TOOL_PMU_TYPE_PE_CORE_MASK | \
PERF_TOOL_PMU_TYPE_PE_OTHER_MASK | \
PERF_TOOL_PMU_TYPE_TOOL_MASK | \
PERF_TOOL_PMU_TYPE_HWMON_MASK | \
+ PERF_TOOL_PMU_TYPE_NVME_MASK | \
PERF_TOOL_PMU_TYPE_DRM_MASK)
};
static unsigned int read_pmu_types;
@@ -175,12 +179,15 @@ struct perf_pmu *perf_pmus__find(const char *name)
return pmu;
/* Looking up an individual perf event PMU failed, check if a tool PMU should be read. */
- if (!strncmp(name, "hwmon_", 6))
- to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK;
- else if (!strncmp(name, "drm_", 4))
+ if (!strncmp(name, "hwmon_", 6)) {
+ to_read_pmus = PERF_TOOL_PMU_TYPE_HWMON_MASK;
+ } else if (!strncmp(name, "nvme_", 5)) {
+ to_read_pmus = PERF_TOOL_PMU_TYPE_NVME_MASK;
+ } else if (!strncmp(name, "drm_", 4)) {
to_read_pmus |= PERF_TOOL_PMU_TYPE_DRM_MASK;
- else if (!strcmp(name, "tool"))
+ } else if (!strcmp(name, "tool")) {
to_read_pmus |= PERF_TOOL_PMU_TYPE_TOOL_MASK;
+ }
if (to_read_pmus) {
pmu_read_sysfs(to_read_pmus);
@@ -279,6 +286,10 @@ static void pmu_read_sysfs(unsigned int to_read_types)
(read_pmu_types & PERF_TOOL_PMU_TYPE_HWMON_MASK) == 0)
perf_pmus__read_hwmon_pmus(&other_pmus);
+ if ((to_read_types & PERF_TOOL_PMU_TYPE_NVME_MASK) != 0 &&
+ (read_pmu_types & PERF_TOOL_PMU_TYPE_NVME_MASK) == 0)
+ perf_pmus__read_nvme_pmus(&other_pmus);
+
if ((to_read_types & PERF_TOOL_PMU_TYPE_DRM_MASK) != 0 &&
(read_pmu_types & PERF_TOOL_PMU_TYPE_DRM_MASK) == 0)
perf_pmus__read_drm_pmus(&other_pmus);
@@ -387,6 +398,10 @@ struct perf_pmu *perf_pmus__scan_for_event(struct perf_pmu *pmu, const char *eve
if (strlen(event) > 4 && strncmp("drm-", event, 4) == 0)
to_read_pmus |= PERF_TOOL_PMU_TYPE_DRM_MASK;
+ /* Could the event be an nvme event? */
+ if (nvme_pmu__have_event(NULL, event))
+ to_read_pmus |= PERF_TOOL_PMU_TYPE_NVME_MASK;
+
pmu_read_sysfs(to_read_pmus);
pmu = list_prepare_entry(pmu, &core_pmus, list);
}
@@ -424,11 +439,14 @@ struct perf_pmu *perf_pmus__scan_matching_wildcard(struct perf_pmu *pmu, const c
*/
if (strisglob(wildcard)) {
to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK |
+ PERF_TOOL_PMU_TYPE_NVME_MASK |
PERF_TOOL_PMU_TYPE_DRM_MASK;
} else if (strlen(wildcard) >= 4 && strncmp("drm_", wildcard, 4) == 0) {
to_read_pmus |= PERF_TOOL_PMU_TYPE_DRM_MASK;
} else if (strlen(wildcard) >= 5 && strncmp("hwmon", wildcard, 5) == 0) {
to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK;
+ } else if (strlen(wildcard) >= 4 && strncmp("nvme", wildcard, 4) == 0) {
+ to_read_pmus |= PERF_TOOL_PMU_TYPE_NVME_MASK;
}
pmu_read_sysfs(to_read_pmus);
--
2.54.0.1064.gd145956f57-goog
* [PATCH v1 3/3] perf tests: Add NVMe PMU event parsing test
2026-06-09 7:03 [PATCH v1 0/3] perf pmu: Add tool-provided NVMe PMU Ian Rogers
2026-06-09 7:03 ` [PATCH v1 1/3] perf build: Add libnvme feature detection Ian Rogers
2026-06-09 7:03 ` [PATCH v1 2/3] perf pmu: Implement tool-provided NVMe PMU Ian Rogers
@ 2026-06-09 7:03 ` Ian Rogers
2 siblings, 0 replies; 6+ messages in thread
From: Ian Rogers @ 2026-06-09 7:03 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Tomas Glozar, Michael Jeanson, Dmitrii Dolgov, Alexandre Chartre,
Yuzhuo Jing, Leo Yan, German Gomez, Anubhav Shelat, linux-kernel,
linux-perf-users
Cc: linux-nvme
Introduce a unit test suite for the NVMe PMU event parser.
The test registers a mock 'nvme_nvme0' PMU and checks that events from
each mapped log type (SMART, Endurance, FDP, Error, ZNS) parse to the
expected config values, both with and without the PMU name prefix.
To support this, move the NVMe event config encoding macros and the
'nvme_log_type' enum from nvme_pmu.c to nvme_pmu.h.
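As a rough illustration of the encoding the test compares against, the sketch below packs and unpacks a config value. The NVME_CONFIG macro is copied from this patch and the decode masks mirror those in nvme_pmu__read_val() from patch 2; the nvme_config_fields struct, nvme_config_encode() and nvme_config_decode() are hypothetical names used only for this example, and the offsets are illustrative rather than taken from libnvme's structs.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as the patch: log type from bit 24, size in bits 16-23, offset in bits 0-15. */
#define NVME_CONFIG(log, size, offset) \
	(((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))

/* Hypothetical container for the three decoded fields. */
struct nvme_config_fields {
	unsigned int log_type;
	unsigned int size;
	unsigned int offset;
};

/* Hypothetical wrapper: reject values that would bleed into a neighbouring field. */
static uint64_t nvme_config_encode(unsigned int log, unsigned int size,
				   unsigned int offset)
{
	assert(log <= 0xFF && size <= 0xFF && offset <= 0xFFFF);
	return NVME_CONFIG(log, size, offset);
}

/* Inverse of the packing, using the same shifts and masks as nvme_pmu__read_val(). */
static struct nvme_config_fields nvme_config_decode(uint64_t config)
{
	struct nvme_config_fields f = {
		.log_type = (unsigned int)((config >> 24) & 0xFF),
		.size = (unsigned int)((config >> 16) & 0xFF),
		.offset = (unsigned int)(config & 0xFFFF),
	};

	return f;
}
```

Under these assumptions, a 16-byte field at byte offset 32 of log type 0 encodes to 0x100020, and decoding that value returns the same three fields.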
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/Build | 1 +
tools/perf/tests/builtin-test.c | 1 +
tools/perf/tests/nvme_pmu.c | 176 ++++++++++++++++++++++++++++++++
tools/perf/tests/tests.h | 1 +
tools/perf/util/nvme_pmu.c | 27 -----
tools/perf/util/nvme_pmu.h | 31 ++++++
tools/perf/util/pmus.c | 5 +
tools/perf/util/pmus.h | 1 +
8 files changed, 216 insertions(+), 27 deletions(-)
create mode 100644 tools/perf/tests/nvme_pmu.c
diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 66944a4f4968..acaf5e3e728b 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -68,6 +68,7 @@ perf-test-y += event_groups.o
perf-test-y += symbols.o
perf-test-y += util.o
perf-test-y += hwmon_pmu.o
+perf-test-y += nvme_pmu.o
perf-test-y += tool_pmu.o
perf-test-y += subcmd-help.o
perf-test-y += kallsyms-split.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index b64fc2204f22..9a2fb706c2bc 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -83,6 +83,7 @@ static struct test_suite *generic_tests[] = {
&suite__pmu,
&suite__pmu_events,
&suite__hwmon_pmu,
+ &suite__nvme_pmu,
&suite__tool_pmu,
&suite__dso_data,
&suite__perf_evsel__roundtrip_name_test,
diff --git a/tools/perf/tests/nvme_pmu.c b/tools/perf/tests/nvme_pmu.c
new file mode 100644
index 000000000000..3c1de6e92efc
--- /dev/null
+++ b/tools/perf/tests/nvme_pmu.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#include "nvme_pmu.h"
+
+#include <errno.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+
+#include "debug.h"
+#include "evlist.h"
+#include "parse-events.h"
+#include "pmus.h"
+#include "tests.h"
+
+#ifdef HAVE_LIBNVME_SUPPORT
+
+static const struct test_event {
+ const char *name;
+ const char *alias;
+ uint64_t config;
+} test_events[] = {
+ {
+ "smart_temperature",
+ "smart_temperature",
+ NVME_SMART(2, temperature),
+ },
+ {
+ "smart_data_units_read",
+ "smart_data_units_read",
+ NVME_SMART(16, data_units_read),
+ },
+ {
+ "endurance_percent_used",
+ "endurance_percent_used",
+ NVME_ENDURANCE(1, percent_used),
+ },
+ {
+ "fdp_hbmw",
+ "fdp_hbmw",
+ NVME_FDP(16, hbmw),
+ },
+ {
+ "error_count",
+ "error_count",
+ NVME_ERROR(8, error_count),
+ },
+ {
+ "zns_nrzid",
+ "zns_nrzid",
+ NVME_ZNS(2, nrzid),
+ },
+};
+
+static int do_test(size_t i, bool with_pmu, bool with_alias)
+{
+ const char *test_event = with_alias ? test_events[i].alias : test_events[i].name;
+ struct evlist *evlist = evlist__new();
+ struct evsel *evsel;
+ struct parse_events_error err;
+ int ret;
+ char str[128];
+ bool found = false;
+
+ if (!evlist) {
+ pr_err("evlist allocation failed\n");
+ return TEST_FAIL;
+ }
+
+ if (with_pmu)
+ snprintf(str, sizeof(str), "nvme_nvme0/%s/", test_event);
+ else
+ strlcpy(str, test_event, sizeof(str));
+
+ pr_debug("Testing '%s'\n", str);
+ parse_events_error__init(&err);
+ ret = parse_events(evlist, str, &err);
+ if (ret) {
+ pr_debug("FAILED %s:%d failed to parse event '%s', err %d\n",
+ __FILE__, __LINE__, str, ret);
+ parse_events_error__print(&err, str);
+ ret = TEST_FAIL;
+ goto out;
+ }
+
+ ret = TEST_OK;
+ if (with_pmu ? (evlist->core.nr_entries != 1) : (evlist->core.nr_entries < 1)) {
+ pr_debug("FAILED %s:%d Unexpected number of events for '%s' of %d\n",
+ __FILE__, __LINE__, str, evlist->core.nr_entries);
+ ret = TEST_FAIL;
+ goto out;
+ }
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (!evsel->pmu || !evsel->pmu->name ||
+ strcmp(evsel->pmu->name, "nvme_nvme0"))
+ continue;
+
+ if (evsel->core.attr.config != test_events[i].config) {
+ pr_debug("FAILED %s:%d Unexpected config for '%s', %"
+ PRIu64 " != %" PRIu64 "\n",
+ __FILE__, __LINE__, str,
+ (uint64_t)evsel->core.attr.config,
+ test_events[i].config);
+ ret = TEST_FAIL;
+ goto out;
+ }
+ found = true;
+ }
+
+ if (!found) {
+ pr_debug("FAILED %s:%d Didn't find nvme event '%s' in parsed evsels\n",
+ __FILE__, __LINE__, str);
+ ret = TEST_FAIL;
+ }
+
+out:
+ parse_events_error__exit(&err);
+ evlist__delete(evlist);
+ return ret;
+}
+
+static int test__nvme_pmu(bool with_pmu)
+{
+ struct perf_pmu *pmu = perf_pmus__add_test_nvme_pmu("nvme0", "nvme0");
+ int ret = TEST_OK;
+
+ if (!pmu)
+ return TEST_FAIL;
+
+ for (size_t i = 0; i < ARRAY_SIZE(test_events); i++) {
+ ret = do_test(i, with_pmu, /*with_alias=*/false);
+ if (ret != TEST_OK)
+ break;
+
+ ret = do_test(i, with_pmu, /*with_alias=*/true);
+ if (ret != TEST_OK)
+ break;
+ }
+
+ list_del(&pmu->list);
+ perf_pmu__delete(pmu);
+ return ret;
+}
+
+static int test__nvme_pmu_without_pmu(struct test_suite *test __maybe_unused,
+ int subtest __maybe_unused)
+{
+ return test__nvme_pmu(/*with_pmu=*/false);
+}
+
+static int test__nvme_pmu_with_pmu(struct test_suite *test __maybe_unused,
+ int subtest __maybe_unused)
+{
+ return test__nvme_pmu(/*with_pmu=*/true);
+}
+
+static struct test_case tests__nvme_pmu[] = {
+ TEST_CASE("Parsing without PMU name", nvme_pmu_without_pmu),
+ TEST_CASE("Parsing with PMU name", nvme_pmu_with_pmu),
+ { .name = NULL, }
+};
+
+struct test_suite suite__nvme_pmu = {
+ .desc = "NVMe PMU",
+ .test_cases = tests__nvme_pmu,
+};
+
+#else
+
+struct test_suite suite__nvme_pmu = {
+ .desc = "NVMe PMU",
+ .test_cases = NULL,
+};
+
+#endif
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index bf8ff7d54727..abffa51c5937 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -115,6 +115,7 @@ DECLARE_SUITE(syscall_openat_tp_fields);
DECLARE_SUITE(pmu);
DECLARE_SUITE(pmu_events);
DECLARE_SUITE(hwmon_pmu);
+DECLARE_SUITE(nvme_pmu);
DECLARE_SUITE(tool_pmu);
DECLARE_SUITE(attr);
DECLARE_SUITE(dso_data);
diff --git a/tools/perf/util/nvme_pmu.c b/tools/perf/util/nvme_pmu.c
index 17ba758aec59..ee0e958ea6ef 100644
--- a/tools/perf/util/nvme_pmu.c
+++ b/tools/perf/util/nvme_pmu.c
@@ -25,33 +25,6 @@
#ifdef HAVE_LIBNVME_SUPPORT
#include <libnvme.h>
-
-#define NVME_CONFIG(log, size, offset) \
- (((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))
-
-enum nvme_log_type {
- NVME_LOG_SMART = 0,
- NVME_LOG_ENDURANCE = 1,
- NVME_LOG_FDP = 2,
- NVME_LOG_ERROR = 3,
- NVME_LOG_ZNS = 4,
-};
-
-#define NVME_SMART(size, field) \
- NVME_CONFIG(NVME_LOG_SMART, size, offsetof(struct nvme_smart_log, field))
-
-#define NVME_ENDURANCE(size, field) \
- NVME_CONFIG(NVME_LOG_ENDURANCE, size, offsetof(struct nvme_endurance_group_log, field))
-
-#define NVME_FDP(size, field) \
- NVME_CONFIG(NVME_LOG_FDP, size, offsetof(struct nvme_fdp_stats_log, field))
-
-#define NVME_ERROR(size, field) \
- NVME_CONFIG(NVME_LOG_ERROR, size, offsetof(struct nvme_error_log_page, field))
-
-#define NVME_ZNS(size, field) \
- NVME_CONFIG(NVME_LOG_ZNS, size, offsetof(struct nvme_zns_changed_zone_log, field))
-
struct nvme_event {
const char *name;
const char *desc;
diff --git a/tools/perf/util/nvme_pmu.h b/tools/perf/util/nvme_pmu.h
index 6d5d2bbe4167..9203f461f381 100644
--- a/tools/perf/util/nvme_pmu.h
+++ b/tools/perf/util/nvme_pmu.h
@@ -6,6 +6,37 @@
#include <stdbool.h>
#include <errno.h>
+#ifdef HAVE_LIBNVME_SUPPORT
+#include <libnvme.h>
+#include <stddef.h>
+
+#define NVME_CONFIG(log, size, offset) \
+ (((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))
+
+enum nvme_log_type {
+ NVME_LOG_SMART = 0,
+ NVME_LOG_ENDURANCE = 1,
+ NVME_LOG_FDP = 2,
+ NVME_LOG_ERROR = 3,
+ NVME_LOG_ZNS = 4,
+};
+
+#define NVME_SMART(size, field) \
+ NVME_CONFIG(NVME_LOG_SMART, size, offsetof(struct nvme_smart_log, field))
+
+#define NVME_ENDURANCE(size, field) \
+ NVME_CONFIG(NVME_LOG_ENDURANCE, size, offsetof(struct nvme_endurance_group_log, field))
+
+#define NVME_FDP(size, field) \
+ NVME_CONFIG(NVME_LOG_FDP, size, offsetof(struct nvme_fdp_stats_log, field))
+
+#define NVME_ERROR(size, field) \
+ NVME_CONFIG(NVME_LOG_ERROR, size, offsetof(struct nvme_error_log_page, field))
+
+#define NVME_ZNS(size, field) \
+ NVME_CONFIG(NVME_LOG_ZNS, size, offsetof(struct nvme_zns_changed_zone_log, field))
+#endif
+
struct list_head;
struct perf_thread_map;
struct evsel;
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 83777f941e9a..1c45164ae244 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -909,6 +909,11 @@ struct perf_pmu *perf_pmus__add_test_hwmon_pmu(const char *hwmon_dir,
return hwmon_pmu__new(&other_pmus, hwmon_dir, sysfs_name, name);
}
+struct perf_pmu *perf_pmus__add_test_nvme_pmu(const char *sysfs_name, const char *name)
+{
+ return nvme_pmu__new(&other_pmus, sysfs_name, name);
+}
+
struct perf_pmu *perf_pmus__fake_pmu(void)
{
static struct perf_pmu fake = {
diff --git a/tools/perf/util/pmus.h b/tools/perf/util/pmus.h
index 0d55edb3f2fc..2045d4cf44d1 100644
--- a/tools/perf/util/pmus.h
+++ b/tools/perf/util/pmus.h
@@ -37,6 +37,7 @@ struct perf_pmu *perf_pmus__add_test_pmu(int test_sysfs_dirfd, const char *name)
struct perf_pmu *perf_pmus__add_test_hwmon_pmu(const char *hwmon_dir,
const char *sysfs_name,
const char *name);
+struct perf_pmu *perf_pmus__add_test_nvme_pmu(const char *sysfs_name, const char *name);
struct perf_pmu *perf_pmus__fake_pmu(void);
struct perf_pmu *perf_pmus__find_core_pmu(void);
--
2.54.0.1064.gd145956f57-goog
* Re: [PATCH v1 1/3] perf build: Add libnvme feature detection
2026-06-09 7:03 ` [PATCH v1 1/3] perf build: Add libnvme feature detection Ian Rogers
@ 2026-06-09 7:12 ` sashiko-bot
0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-06-09 7:12 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] The feature check for `libnvme` fails to utilize `pkg-config` during the test compilation, contradicting the commit message and breaking detection when the library is in a non-standard path.
- [Medium] `libnvme` is omitted from `test-all.c`, which causes false positives in fast-path detection and triggers continuous feature-dump flip-flops and rebuilds.
--
commit 1f4a72ba50b99659f0972d94ef171f8be76bc329
Author: Ian Rogers <irogers@google.com>
perf build: Add libnvme feature detection
Introduce a feature check for libnvme so that we can conditionally
compile the NVMe PMU driver. This checks for the availability of the
libnvme headers and library using pkg-config.
> diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
> index ed1374af31c16..bf85032415397 100644
> --- a/tools/build/Makefile.feature
> +++ b/tools/build/Makefile.feature
> @@ -73,6 +73,7 @@ FEATURE_TESTS_BASIC := \
> libbfd \
> libbfd-threadsafe \
> libelf \
> + libnvme \
[Severity: Medium]
Without including test-libnvme.c in test-all.c, the fast-path build will
succeed even if libnvme is absent.
This can cause the initial discovery to falsely report libnvme as present,
which then flip-flops when individual checks are run later in
tools/perf/Makefile.config. This mismatch invalidates the build cache and
triggers full feature evaluations and rebuilds on every make invocation.
Since libnvme was added to FEATURE_TESTS_BASIC, does it also need to be
included in tools/build/feature/test-all.c?
> libelf-getphdrnum \
> libelf-gelf_getnote \
> libelf-getshdrstrndx \
[ ... ]
> diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
> index 62909a9c799d3..fd00e0be60690 100644
> --- a/tools/build/feature/Makefile
> +++ b/tools/build/feature/Makefile
> @@ -147,6 +147,9 @@ $(OUTPUT)test-libcap.bin:
> $(OUTPUT)test-libelf.bin:
> $(BUILD) -lelf
>
> +$(OUTPUT)test-libnvme.bin:
> + $(BUILD) -lnvme
> +
[Severity: Medium]
The commit message states the availability of libnvme headers and library are
checked using pkg-config. However, because libnvme is omitted from the
FEATURE_PKG_CONFIG list in tools/build/Makefile.feature, the variables
FEATURE_CHECK_CFLAGS-libnvme and FEATURE_CHECK_LDFLAGS-libnvme are never
initialized.
This rule manually invokes $(BUILD) -lnvme, falling back to standard compiler
paths instead of leveraging pkg-config variables.
If libnvme is installed in a non-standard location discoverable only via
pkg-config, could the feature test fail and incorrectly disable NVMe PMU
support?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260609070348.541964-1-irogers@google.com?part=1
* Re: [PATCH v1 2/3] perf pmu: Implement tool-provided NVMe PMU
2026-06-09 7:03 ` [PATCH v1 2/3] perf pmu: Implement tool-provided NVMe PMU Ian Rogers
@ 2026-06-09 7:21 ` sashiko-bot
0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-06-09 7:21 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Unaligned memory access and strict aliasing violation in nvme_pmu__read_val() due to byte-aligned stack array cast to wider types.
- [High] The NVMe PMU implementation hijacks the tool-owned `evsel->priv` field, leading to arbitrary memory corruption and memory leaks.
--
commit 444d6a9adc97df412d7040ffe34f89b94548c30b
Author: Ian Rogers <irogers@google.com>
perf pmu: Implement tool-provided NVMe PMU
This commit adds a tool-provided PMU for NVMe devices using libnvme to
expose metrics from various NVMe logs as perf events under the
nvme_nvmeX PMUs.
> diff --git a/tools/perf/util/nvme_pmu.c b/tools/perf/util/nvme_pmu.c
> new file mode 100644
> index 0000000000000..17ba758aec59b
> --- /dev/null
> +++ b/tools/perf/util/nvme_pmu.c
[ ... ]
> +static int nvme_pmu__read_val(int fd, uint64_t config, uint64_t *val)
> +{
> + int log_type = (config >> 24) & 0xFF;
> + unsigned int size = (config >> 16) & 0xFF;
> + unsigned int offset = config & 0xFFFF;
> + uint8_t buf[4096];
> + uint8_t *p;
[ ... ]
> + p = buf + offset;
> + if (size == 16 || size == 8)
> + *val = le64_to_cpu(*(uint64_t *)p);
[Severity: High]
Does this cast violate C strict aliasing rules and risk unaligned memory
access crashes?
Since buf is a byte-aligned stack array and offset can be an arbitrary
value, p is not guaranteed to be 8-byte aligned. On architectures that
strictly enforce memory alignment, dereferencing an unaligned 64-bit
pointer can cause an immediate bus error (SIGBUS).
Additionally, casting a byte array to a wider type violates strict aliasing,
which is enforced in the tools/ directory with -fstrict-aliasing.
Could we use get_unaligned_le64() here instead?
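For illustration only, a minimal sketch of an alignment-safe read. It assumes a hypothetical helper, read_le_bytes(), which is not part of the patch. The value is assembled one byte at a time, so the read depends neither on the alignment of p nor on host endianness, and no pointer cast to a wider type is needed. In-tree code would more likely call the get_unaligned_le64() family suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): build a little-endian value of
 * 1..8 bytes from a byte pointer that may sit at any offset in the buffer.
 */
static uint64_t read_le_bytes(const uint8_t *p, size_t size)
{
	uint64_t v = 0;

	assert(size >= 1 && size <= 8);
	/* Byte i carries bits 8*i .. 8*i+7 of the result. */
	for (size_t i = 0; i < size; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}
```

A caller that mirrors the quoted code would pass 8 for the 16-byte fields, since only the low 64 bits are reported there.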
> + else if (size == 4)
> + *val = le32_to_cpu(*(uint32_t *)p);
> + else if (size == 2)
> + *val = le16_to_cpu(*(uint16_t *)p);
> + else if (size == 1)
> + *val = *(uint8_t *)p;
> + else
> + return -EINVAL;
> +
> + return 0;
> +}
[ ... ]
> +int evsel__nvme_pmu_open(struct evsel *evsel,
> + struct perf_thread_map *threads,
> + int start_cpu_map_idx, int end_cpu_map_idx)
> +{
> + struct nvme_pmu *nvm = container_of(evsel->pmu, struct nvme_pmu, pmu);
> + int idx = 0, thread = 0, nthreads, err = 0;
> + char path[PATH_MAX];
> +
> + snprintf(path, sizeof(path), "/dev/%s", nvm->dev_name);
> +
> + nthreads = perf_thread_map__nr(threads);
> +
> + if (!evsel->priv) {
> + int max_cpus = evsel->core.cpus ? perf_cpu_map__nr(evsel->core.cpus) : 1;
> +
> + evsel->priv = xyarray__new(max_cpus, nthreads, sizeof(uint64_t));
> + }
[Severity: High]
Can this leak the xyarray allocation?
Since evsel->priv is conventionally owned by the specific perf tool executing
the command, allocating an xyarray here does not seem to have a matching
destructor for the PMU, which might leave the memory unfreed when the event
is closed.
> +
> + for (idx = start_cpu_map_idx; idx < end_cpu_map_idx; idx++) {
> + for (thread = 0; thread < nthreads; thread++) {
> + int fd = open(path, O_RDONLY);
> +
> + FD(evsel, idx, thread) = fd;
> + if (fd < 0) {
> + err = -errno;
> + goto out_close;
> + }
> + if (evsel->priv) {
> + uint64_t *initial_val = xyarray__entry(evsel->priv, idx, thread);
> +
> + if (nvme_pmu__read_val(fd, evsel->core.attr.config, initial_val))
> + *initial_val = 0;
> + }
[Severity: High]
Is there a risk of memory corruption if a tool has already populated
evsel->priv?
If an active tool like perf stat has already set evsel->priv to its own
structure, this code assumes it is an xyarray. The xyarray__entry()
function will read from the tool's struct to compute a pointer offset,
and then nvme_pmu__read_val() will write a 64-bit value to that
invalid address, causing arbitrary heap corruption.
> + }
> + }
> + return 0;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260609070348.541964-1-irogers@google.com?part=2