Linux Perf Users
* [PATCH v1 0/3] perf pmu: Add tool-provided NVMe PMU
@ 2026-06-09  7:03 Ian Rogers
From: Ian Rogers @ 2026-06-09  7:03 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
	Tomas Glozar, Michael Jeanson, Dmitrii Dolgov, Alexandre Chartre,
	Yuzhuo Jing, Leo Yan, German Gomez, Anubhav Shelat, linux-kernel,
	linux-perf-users
  Cc: linux-nvme

This patch series introduces a tool-provided PMU for NVMe devices to expose
various hardware logs as perf events.

Using libnvme, the new nvme_pmu extracts metrics from standard NVMe log pages
including:
- SMART / Health Information (e.g. data units read/written, power cycles,
  temperature)
- Endurance Group Information
- FDP (Flexible Data Placement) Statistics
- Error Information
- Zoned Namespaces (ZNS) Changed Zones

Key features:
- Dynamic probing of supported log pages on /dev/nvmeX. Unsupported events
  are marked as deprecated and hidden from `perf list` by default.
- Generic configuration encoding where the log type, size, and offset are
  encoded directly into the event configuration.
- Correct interval delta tracking using baseline offset snapshots. This mimics
  the behavior of the hwmon PMU, allowing gauges like Temperature to be
  reported instantaneously in `perf stat -I`, while standard counters
  correctly accumulate deltas.
- Byte-order-independent reading of the little-endian 64-bit and 128-bit
  fields defined by the NVMe specification; 128-bit counters are reported
  from their low 64 bits.
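To make the encoding concrete, here is a minimal C sketch that mirrors the bit layout of the patch's NVME_CONFIG() macro. It is not code from the series. The layout is: log type in bits 24-31, field size in bytes in bits 16-23, and byte offset in bits 0-15.

The helper names nvme_config_encode(), nvme_config_decode(), nvme_event_hidden() and nvme_read_le() are hypothetical. nvme_read_le() assembles a little-endian field byte by byte, so it behaves the same on any host. The patch itself uses le64_to_cpu() and friends instead. Like the patch, the sketch keeps only the low 64 bits of 128-bit counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same packing as NVME_CONFIG(): log << 24 | size << 16 | offset. */
static uint64_t nvme_config_encode(unsigned int log, unsigned int size,
				   unsigned int offset)
{
	assert(log <= 0xFF && size <= 0xFF && offset <= 0xFFFF);
	return ((uint64_t)log << 24) | ((uint64_t)size << 16) | offset;
}

/* Inverse of nvme_config_encode(). */
static void nvme_config_decode(uint64_t config, unsigned int *log,
			       unsigned int *size, unsigned int *offset)
{
	*log = (config >> 24) & 0xFF;
	*size = (config >> 16) & 0xFF;
	*offset = config & 0xFFFF;
}

/* An event is hidden (deprecated) when probing found its log unsupported. */
static bool nvme_event_hidden(const bool log_supported[256], uint64_t config)
{
	return !log_supported[(config >> 24) & 0xFF];
}

/*
 * Read a little-endian field of 'size' bytes at 'offset' in a raw log
 * page. Only the low 8 bytes are used, so 16-byte counters are truncated
 * to 64 bits.
 */
static uint64_t nvme_read_le(const uint8_t *page, size_t page_len,
			     unsigned int size, unsigned int offset)
{
	unsigned int n = size > 8 ? 8 : size;
	uint64_t val = 0;

	assert((size_t)offset + size <= page_len);
	for (unsigned int i = 0; i < n; i++)
		val |= (uint64_t)page[offset + i] << (8 * i);
	return val;
}
```

For example, a 16-byte SMART counter (log type 0) at offset 32 encodes as 0x100020. Decoding that value recovers log 0, size 16 and offset 32.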

The series is broken down into:
1. Adding libnvme feature detection to tools/build, with compiler and linker
   flags taken from pkg-config.
2. The core implementation of the NVMe tool PMU.
3. A unit test suite verifying the parsing and encoding of the NVMe PMU
   events using a mocked device.

Ian Rogers (3):
  perf build: Add libnvme feature detection
  perf pmu: Implement tool-provided NVMe PMU
  perf tests: Add NVMe PMU event parsing test

 tools/build/Makefile.feature       |   2 +
 tools/build/feature/Makefile       |   3 +
 tools/build/feature/test-libnvme.c |  11 +
 tools/perf/Makefile.config         |  12 +
 tools/perf/tests/Build             |   1 +
 tools/perf/tests/builtin-test.c    |   1 +
 tools/perf/tests/nvme_pmu.c        | 176 ++++++++++
 tools/perf/tests/tests.h           |   1 +
 tools/perf/util/Build              |   1 +
 tools/perf/util/evsel.c            |  11 +-
 tools/perf/util/nvme_pmu.c         | 535 +++++++++++++++++++++++++++++
 tools/perf/util/nvme_pmu.h         | 143 ++++++++
 tools/perf/util/pmu.c              |  15 +
 tools/perf/util/pmu.h              |   8 +-
 tools/perf/util/pmus.c             |  31 +-
 tools/perf/util/pmus.h             |   1 +
 16 files changed, 945 insertions(+), 7 deletions(-)
 create mode 100644 tools/build/feature/test-libnvme.c
 create mode 100644 tools/perf/tests/nvme_pmu.c
 create mode 100644 tools/perf/util/nvme_pmu.c
 create mode 100644 tools/perf/util/nvme_pmu.h

-- 
2.54.0.1064.gd145956f57-goog



* [PATCH v1 1/3] perf build: Add libnvme feature detection
From: Ian Rogers @ 2026-06-09  7:03 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
	Tomas Glozar, Michael Jeanson, Dmitrii Dolgov, Alexandre Chartre,
	Yuzhuo Jing, Leo Yan, German Gomez, Anubhav Shelat, linux-kernel,
	linux-perf-users
  Cc: linux-nvme

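Introduce a feature check for libnvme so that the NVMe PMU can be
conditionally compiled. The feature test builds a small program that
includes libnvme.h and links with -lnvme. When it succeeds, perf takes
the libnvme compiler and linker flags from pkg-config and defines
HAVE_LIBNVME_SUPPORT.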

Signed-off-by: Ian Rogers <irogers@google.com>
CONV=ca4c5d09-4ef8-405a-80bb-aa988020b436
TAG=agy
---
 tools/build/Makefile.feature       |  2 ++
 tools/build/feature/Makefile       |  3 +++
 tools/build/feature/test-libnvme.c | 11 +++++++++++
 tools/perf/Makefile.config         | 12 ++++++++++++
 4 files changed, 28 insertions(+)
 create mode 100644 tools/build/feature/test-libnvme.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index ed1374af31c1..bf8503241539 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -73,6 +73,7 @@ FEATURE_TESTS_BASIC :=                  \
         libbfd                          \
 	libbfd-threadsafe		\
         libelf                          \
+        libnvme                         \
         libelf-getphdrnum               \
         libelf-gelf_getnote             \
         libelf-getshdrstrndx            \
@@ -146,6 +147,7 @@ FEATURE_DISPLAY ?=              \
          libdw                  \
          glibc                  \
          libelf                 \
+         libnvme                \
          libnuma                \
          numa_num_possible_cpus \
          libpython              \
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 62909a9c799d..fd00e0be6069 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -147,6 +147,9 @@ $(OUTPUT)test-libcap.bin:
 $(OUTPUT)test-libelf.bin:
 	$(BUILD) -lelf
 
+$(OUTPUT)test-libnvme.bin:
+	$(BUILD) -lnvme
+
 $(OUTPUT)test-eventfd.bin:
 	$(BUILD)
 
diff --git a/tools/build/feature/test-libnvme.c b/tools/build/feature/test-libnvme.c
new file mode 100644
index 000000000000..3fefad70a680
--- /dev/null
+++ b/tools/build/feature/test-libnvme.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#include <libnvme.h>
+
+int main(void)
+{
+	nvme_root_t r = nvme_scan(NULL);
+
+	if (r)
+		nvme_free_tree(r);
+	return 0;
+}
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 6e7b15fab2ec..79c4ad10bc18 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -1060,6 +1060,18 @@ ifndef NO_CAPSTONE
   endif
 endif
 
+ifndef NO_LIBNVME
+  $(call feature_check,libnvme)
+  ifeq ($(feature-libnvme), 1)
+    CFLAGS += -DHAVE_LIBNVME_SUPPORT $(shell $(PKG_CONFIG) --cflags libnvme)
+    LDFLAGS += $(shell $(PKG_CONFIG) --libs-only-L libnvme)
+    EXTLIBS += $(shell $(PKG_CONFIG) --libs-only-l libnvme)
+    $(call detected,CONFIG_LIBNVME)
+  else
+    msg := $(warning No libnvme found, disables NVMe PMU support, please install libnvme-dev/libnvme-devel);
+  endif
+endif
+
 ifdef EXTRA_TESTS
     $(call detected,CONFIG_EXTRA_TESTS)
     CFLAGS += -DHAVE_EXTRA_TESTS
-- 
2.54.0.1064.gd145956f57-goog



* [PATCH v1 2/3] perf pmu: Implement tool-provided NVMe PMU
From: Ian Rogers @ 2026-06-09  7:03 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
	Tomas Glozar, Michael Jeanson, Dmitrii Dolgov, Alexandre Chartre,
	Yuzhuo Jing, Leo Yan, German Gomez, Anubhav Shelat, linux-kernel,
	linux-perf-users
  Cc: linux-nvme

Add a tool-provided PMU for NVMe devices using libnvme. This PMU
exposes metrics from various NVMe logs (SMART, Endurance Group,
FDP, Error Information, and Zoned Namespaces) as perf events under
the 'nvme_nvmeX' PMUs.
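The naming and type allocation can be sketched as follows. This is a hedged illustration, not the patch's code. nvme_pmu_name_and_type() is a hypothetical helper, and the range constants copy the PERF_PMU_TYPE_NVME_START/END values that this patch adds to pmu.h. Each controller nvmeN gets the PMU name nvme_nvmeN and the type 0xFFFD0000 + N.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Values of PERF_PMU_TYPE_NVME_START/END from this patch's pmu.h hunk. */
#define NVME_TYPE_START 0xFFFD0000u
#define NVME_TYPE_END   0xFFFDFFFFu

/*
 * Hypothetical helper: map a controller name such as "nvme3" to the PMU
 * name "nvme_nvme3" and type NVME_TYPE_START + 3. Returns 0 on success,
 * or -1 if the name is not "nvme<N>" or N is outside the reserved range.
 */
static int nvme_pmu_name_and_type(const char *ctrl, char *name, size_t len,
				  uint32_t *type)
{
	unsigned long idx;
	char *end;

	assert(ctrl && name && type);
	if (strncmp(ctrl, "nvme", 4) != 0)
		return -1;
	idx = strtoul(ctrl + 4, &end, 10);
	if (end == ctrl + 4 || *end != '\0' ||
	    idx > NVME_TYPE_END - NVME_TYPE_START)
		return -1;
	*type = NVME_TYPE_START + (uint32_t)idx;
	snprintf(name, len, "nvme_%s", ctrl);
	return 0;
}
```

nvme_pmu__new() in the patch falls back to index 0 when the number cannot be parsed. The sketch rejects such names instead. One possible reason to prefer that is that two unparseable controller names would otherwise map to the same PMU type.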

Features:
- Generic configuration encoding (log type, size, and offset mapped
  inside the event config).
- Dynamic probing of supported log pages on /dev/nvmeX. Unsupported
  events are marked deprecated and hidden from 'perf list' by
  default.
- Correct interval delta tracking using baseline offset snapshots on
  evsel->priv.
- Proper conversion and scaling of properties like temperature (K
  to C).
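A simplified model of the per-read logic may help when reviewing the delta tracking. It is a sketch under assumptions, not the patch's code. struct nvme_sample_state, nvme_report() and kelvin_to_milli_celsius() are hypothetical names.

- The baseline field stands in for the snapshot the patch stores in evsel->priv at open time.
- The prev field stands in for perf's prev_raw_counts. perf stat -I roughly prints the difference between the current and the previous reported value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nvme_sample_state {
	uint64_t baseline;	/* raw value snapshotted when the event was opened */
	uint64_t prev;		/* value reported at the end of the previous interval */
	bool have_prev;		/* false until the first read */
};

/* SMART composite temperature is in Kelvin; report millidegrees Celsius. */
static int64_t kelvin_to_milli_celsius(uint16_t kelvin)
{
	return (int64_t)kelvin * 1000 - 273150;
}

/*
 * Return the value handed to perf for one read. Counters report the
 * increase since open, so successive differences give per-interval
 * deltas. Gauges report prev + raw, so the difference perf computes is
 * the instantaneous raw reading.
 */
static uint64_t nvme_report(struct nvme_sample_state *s, uint64_t raw,
			    bool gauge)
{
	uint64_t reported;

	if (gauge)
		reported = s->have_prev ? s->prev + raw : raw;
	else
		reported = raw - s->baseline;

	s->prev = reported;
	s->have_prev = true;
	return reported;
}
```

With a baseline of 100, raw counter readings of 150 and 180 are reported as 50 and 80. perf turns those into interval values of 50 and 30. For a gauge, each interval value is simply the latest raw reading.

The sketch's temperature conversion returns a signed result. The patch instead computes (kelvin * 1000) - 273150 in uint64_t, which would wrap if a device reported a composite temperature at or below 273 K.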

Signed-off-by: Ian Rogers <irogers@google.com>
CONV=ca4c5d09-4ef8-405a-80bb-aa988020b436
TAG=agy
---
 tools/perf/util/Build      |   1 +
 tools/perf/util/evsel.c    |  11 +-
 tools/perf/util/nvme_pmu.c | 562 +++++++++++++++++++++++++++++++++++++
 tools/perf/util/nvme_pmu.h | 112 ++++++++
 tools/perf/util/pmu.c      |  15 +
 tools/perf/util/pmu.h      |   8 +-
 tools/perf/util/pmus.c     |  26 +-
 7 files changed, 728 insertions(+), 7 deletions(-)
 create mode 100644 tools/perf/util/nvme_pmu.c
 create mode 100644 tools/perf/util/nvme_pmu.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index b22cdc24082a..e28e99634178 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -20,6 +20,7 @@ perf-util-y += disasm.o
 perf-util-y += env.o
 perf-util-y += event.o
 perf-util-y += evlist.o
+perf-util-y += nvme_pmu.o
 perf-util-y += sideband_evlist.o
 perf-util-y += evsel.o
 perf-util-y += evsel_fprintf.o
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 34c03f47a913..8caa626af57a 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -57,6 +57,7 @@
 #include "hashmap.h"
 #include "hist.h"
 #include "hwmon_pmu.h"
+#include "nvme_pmu.h"
 #include "intel-tpebs.h"
 #include "memswap.h"
 #include "off_cpu.h"
@@ -2207,6 +2208,8 @@ int evsel__read_counter(struct evsel *evsel, int cpu_map_idx, int thread)
 
 	if (evsel__is_hwmon(evsel))
 		return evsel__hwmon_pmu_read(evsel, cpu_map_idx, thread);
+	if (evsel__is_nvme(evsel))
+		return evsel__nvme_pmu_read(evsel, cpu_map_idx, thread);
 
 	if (evsel__is_drm(evsel))
 		return evsel__drm_pmu_read(evsel, cpu_map_idx, thread);
@@ -2947,8 +2950,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 	}
 	if (evsel__is_hwmon(evsel)) {
 		err = evsel__hwmon_pmu_open(evsel, threads,
-					    start_cpu_map_idx,
-					    end_cpu_map_idx);
+					    start_cpu_map_idx, end_cpu_map_idx);
+		goto out;
+	}
+	if (evsel__is_nvme(evsel)) {
+		err = evsel__nvme_pmu_open(evsel, threads,
+					   start_cpu_map_idx, end_cpu_map_idx);
 		goto out;
 	}
 	if (evsel__is_drm(evsel)) {
diff --git a/tools/perf/util/nvme_pmu.c b/tools/perf/util/nvme_pmu.c
new file mode 100644
index 000000000000..17ba758aec59
--- /dev/null
+++ b/tools/perf/util/nvme_pmu.c
@@ -0,0 +1,562 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#include "counts.h"
+#include "debug.h"
+#include "evsel.h"
+#include "hashmap.h"
+#include "nvme_pmu.h"
+#include "pmu.h"
+#include <internal/xyarray.h>
+#include <internal/threadmap.h>
+#include <perf/threadmap.h>
+#include <sys/types.h>
+#include <assert.h>
+#include <ctype.h>
+#include <fcntl.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <api/fs/fs.h>
+#include <api/io.h>
+#include <api/io_dir.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/zalloc.h>
+
+#ifdef HAVE_LIBNVME_SUPPORT
+#include <libnvme.h>
+
+
+#define NVME_CONFIG(log, size, offset) \
+	(((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))
+
+enum nvme_log_type {
+	NVME_LOG_SMART = 0,
+	NVME_LOG_ENDURANCE = 1,
+	NVME_LOG_FDP = 2,
+	NVME_LOG_ERROR = 3,
+	NVME_LOG_ZNS = 4,
+};
+
+#define NVME_SMART(size, field) \
+	NVME_CONFIG(NVME_LOG_SMART, size, offsetof(struct nvme_smart_log, field))
+
+#define NVME_ENDURANCE(size, field) \
+	NVME_CONFIG(NVME_LOG_ENDURANCE, size, offsetof(struct nvme_endurance_group_log, field))
+
+#define NVME_FDP(size, field) \
+	NVME_CONFIG(NVME_LOG_FDP, size, offsetof(struct nvme_fdp_stats_log, field))
+
+#define NVME_ERROR(size, field) \
+	NVME_CONFIG(NVME_LOG_ERROR, size, offsetof(struct nvme_error_log_page, field))
+
+#define NVME_ZNS(size, field) \
+	NVME_CONFIG(NVME_LOG_ZNS, size, offsetof(struct nvme_zns_changed_zone_log, field))
+
+struct nvme_event {
+	const char *name;
+	const char *desc;
+	const char *scale_unit;
+	uint64_t config;
+};
+
+static const struct nvme_event nvme_events[] = {
+	{ "smart_data_units_read",
+	  "Data units read (in 1000s of 512-byte units)",
+	  "512000B", NVME_SMART(16, data_units_read) },
+	{ "smart_data_units_written",
+	  "Data units written (in 1000s of 512-byte units)",
+	  "512000B", NVME_SMART(16, data_units_written) },
+	{ "smart_host_read_commands", "Host read commands", NULL, NVME_SMART(16, host_reads) },
+	{ "smart_host_write_commands", "Host write commands", NULL, NVME_SMART(16, host_writes) },
+	{ "smart_ctrl_busy_time", "Controller busy time", "60s", NVME_SMART(16, ctrl_busy_time) },
+	{ "smart_power_cycles", "Power cycles", NULL, NVME_SMART(16, power_cycles) },
+	{ "smart_power_on_hours", "Power on hours", "1h", NVME_SMART(16, power_on_hours) },
+	{ "smart_unsafe_shutdowns", "Unsafe shutdowns", NULL, NVME_SMART(16, unsafe_shutdowns) },
+	{ "smart_media_errors", "Media errors", NULL, NVME_SMART(16, media_errors) },
+	{ "smart_num_err_log_entries",
+	  "Number of error log entries",
+	  NULL, NVME_SMART(16, num_err_log_entries) },
+	{ "smart_warning_temp_time",
+	  "Warning temperature time",
+	  "60s", NVME_SMART(4, warning_temp_time) },
+	{ "smart_crit_comp_time",
+	  "Critical composite temperature time",
+	  "60s", NVME_SMART(4, critical_comp_time) },
+	{ "smart_temperature", "Temperature", "0.001'C", NVME_SMART(2, temperature) },
+
+	{ "endurance_percent_used",
+	  "Endurance group percentage used",
+	  NULL, NVME_ENDURANCE(1, percent_used) },
+	{ "endurance_data_units_read",
+	  "Endurance group data units read",
+	  "512000B", NVME_ENDURANCE(16, data_units_read) },
+	{ "endurance_data_units_written",
+	  "Endurance group data units written",
+	  "512000B", NVME_ENDURANCE(16, data_units_written) },
+	{ "endurance_media_units_written",
+	  "Endurance group media units written",
+	  "512000B", NVME_ENDURANCE(16, media_units_written) },
+	{ "endurance_host_read_cmds",
+	  "Endurance group host read commands",
+	  NULL, NVME_ENDURANCE(16, host_read_cmds) },
+	{ "endurance_host_write_cmds",
+	  "Endurance group host write commands",
+	  NULL, NVME_ENDURANCE(16, host_write_cmds) },
+	{ "endurance_num_err_info_log_entries",
+	  "Endurance group number of error information log entries",
+	  NULL, NVME_ENDURANCE(16, num_err_info_log_entries) },
+
+	{ "fdp_hbmw", "FDP host bytes with metadata written", "1B", NVME_FDP(16, hbmw) },
+	{ "fdp_mbmw", "FDP media bytes with metadata written", "1B", NVME_FDP(16, mbmw) },
+	{ "fdp_mbe", "FDP media bytes erased", "1B", NVME_FDP(16, mbe) },
+
+	{ "error_count", "Error info log error count", NULL, NVME_ERROR(8, error_count) },
+
+	{ "zns_nrzid", "ZNS changed zone nrzid", NULL, NVME_ZNS(2, nrzid) },
+};
+
+
+struct nvme_pmu {
+	struct perf_pmu pmu;
+	char *dev_name;
+	bool support_checked;
+	bool log_supported[256];
+};
+
+
+bool perf_pmu__is_nvme(const struct perf_pmu *pmu)
+{
+	return pmu && pmu->type >= PERF_PMU_TYPE_NVME_START &&
+		pmu->type <= PERF_PMU_TYPE_NVME_END;
+}
+
+bool evsel__is_nvme(const struct evsel *evsel)
+{
+	return perf_pmu__is_nvme(evsel->pmu);
+}
+
+struct perf_pmu *nvme_pmu__new(struct list_head *pmus, const char *sysfs_name, const char *name)
+{
+	struct nvme_pmu *nvm;
+	char buf[64];
+	__u32 type;
+
+	/*
+	 * Usually sysfs_name is something like "nvme0".
+	 * We try to extract the number. If parsing fails, we use 0.
+	 */
+	type = PERF_PMU_TYPE_NVME_START + strtoul(sysfs_name + 4, NULL, 10);
+
+	if (type > PERF_PMU_TYPE_NVME_END) {
+		pr_err("Unable to encode NVMe type from %s in valid PMU type\n", sysfs_name);
+		return NULL;
+	}
+
+	snprintf(buf, sizeof(buf), "nvme_%s", name);
+
+	nvm = zalloc(sizeof(*nvm));
+	if (!nvm)
+		return NULL;
+
+	if (perf_pmu__init(&nvm->pmu, type, buf) != 0) {
+		free(nvm);
+		return NULL;
+	}
+
+	nvm->dev_name = strdup(sysfs_name);
+	if (!nvm->dev_name) {
+		perf_pmu__delete(&nvm->pmu);
+		return NULL;
+	}
+	nvm->pmu.alias_name = strdup(sysfs_name);
+	if (!nvm->pmu.alias_name) {
+		perf_pmu__delete(&nvm->pmu);
+		return NULL;
+	}
+	nvm->pmu.cpus = perf_cpu_map__new_int(0);
+	if (!nvm->pmu.cpus) {
+		perf_pmu__delete(&nvm->pmu);
+		return NULL;
+	}
+	INIT_LIST_HEAD(&nvm->pmu.format);
+	INIT_LIST_HEAD(&nvm->pmu.caps);
+
+	list_add_tail(&nvm->pmu.list, pmus);
+	return &nvm->pmu;
+}
+
+void nvme_pmu__exit(struct perf_pmu *pmu)
+{
+	struct nvme_pmu *nvm = container_of(pmu, struct nvme_pmu, pmu);
+
+	zfree(&nvm->dev_name);
+}
+
+
+
+static void nvme_pmu__check_support(struct nvme_pmu *nvm)
+{
+	int fd;
+	char path[PATH_MAX];
+	struct nvme_smart_log smart_log;
+	struct nvme_endurance_group_log endurance_log;
+	struct nvme_fdp_stats_log fdp_log;
+	struct nvme_error_log_page error_log;
+	struct nvme_zns_changed_zone_log zns_log;
+
+	if (nvm->support_checked)
+		return;
+
+	nvm->support_checked = true;
+
+	/* Assume all supported if we can't test. */
+	memset(nvm->log_supported, 1, sizeof(nvm->log_supported));
+
+	snprintf(path, sizeof(path), "/dev/%s", nvm->dev_name);
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return;
+
+	if (nvme_get_log_smart(fd, NVME_NSID_ALL, true, &smart_log) != 0)
+		nvm->log_supported[NVME_LOG_SMART] = false;
+
+	if (nvme_get_log_endurance_group(fd, 0, &endurance_log) != 0)
+		nvm->log_supported[NVME_LOG_ENDURANCE] = false;
+
+	if (nvme_get_log_fdp_stats(fd, 0, 0, sizeof(fdp_log), &fdp_log) != 0)
+		nvm->log_supported[NVME_LOG_FDP] = false;
+
+	if (nvme_get_log_error(fd, 1, true, &error_log) != 0)
+		nvm->log_supported[NVME_LOG_ERROR] = false;
+
+	if (nvme_get_log_zns_changed_zones(fd, NVME_NSID_ALL, true, &zns_log) != 0)
+		nvm->log_supported[NVME_LOG_ZNS] = false;
+
+	close(fd);
+}
+
+int nvme_pmu__for_each_event(struct perf_pmu *pmu, void *state, pmu_event_callback cb)
+{
+	struct nvme_pmu *nvm = container_of(pmu, struct nvme_pmu, pmu);
+	size_t i;
+
+	nvme_pmu__check_support(nvm);
+	for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+		const struct nvme_event *e = &nvme_events[i];
+		char alias_buf[64];
+		char desc_buf[256];
+		char encoding_buf[128];
+		struct pmu_event_info info = {
+			.pmu = pmu,
+			.name = e->name,
+			.alias = alias_buf,
+			.scale_unit = e->scale_unit,
+			.desc = desc_buf,
+			.long_desc = NULL,
+			.encoding_desc = encoding_buf,
+
+			.topic = "nvme",
+			.pmu_name = pmu->name,
+			.event_type_desc = "NVMe event",
+			.deprecated = !nvm->log_supported[(e->config >> 24) & 0xFF],
+		};
+
+		int ret;
+
+		snprintf(alias_buf, sizeof(alias_buf), "%s", e->name);
+		snprintf(desc_buf, sizeof(desc_buf), "%s", e->desc);
+		snprintf(encoding_buf, sizeof(encoding_buf),
+			 "%s/config=0x%llx/", pmu->name, (unsigned long long)e->config);
+
+		ret = cb(state, &info);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+size_t nvme_pmu__num_events(struct perf_pmu *pmu __maybe_unused)
+{
+	return ARRAY_SIZE(nvme_events);
+}
+
+bool nvme_pmu__have_event(struct perf_pmu *pmu __maybe_unused, const char *name)
+{
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+		if (!strcasecmp(name, nvme_events[i].name))
+			return true;
+	}
+	return false;
+}
+
+static int nvme_pmu__config_term(const struct nvme_pmu *nvm __maybe_unused,
+				 struct perf_event_attr *attr,
+				 struct parse_events_term *term,
+				 struct parse_events_error *err)
+{
+	if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER) {
+		size_t i;
+
+		for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+			if (!strcasecmp(term->config, nvme_events[i].name)) {
+				attr->config = nvme_events[i].config;
+				return 0;
+			}
+		}
+	}
+	if (err) {
+		char *err_str;
+
+		parse_events_error__handle(err, term->err_val,
+					asprintf(&err_str,
+						"unexpected nvme event term (%s) %s",
+						parse_events__term_type_str(term->type_term),
+						term->config) < 0
+					? strdup("unexpected nvme event term")
+					: err_str,
+					NULL);
+	}
+	return -EINVAL;
+}
+
+int nvme_pmu__config_terms(const struct perf_pmu *pmu,
+			   struct perf_event_attr *attr,
+			   struct parse_events_terms *terms,
+			   struct parse_events_error *err)
+{
+	struct nvme_pmu *nvm = container_of(pmu, struct nvme_pmu, pmu);
+	struct parse_events_term *term;
+
+	list_for_each_entry(term, &terms->terms, list) {
+		if (nvme_pmu__config_term(nvm, attr, term, err))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+int nvme_pmu__check_alias(struct parse_events_terms *terms, struct perf_pmu_info *info,
+			  struct parse_events_error *err)
+{
+	struct parse_events_term *term =
+		list_first_entry(&terms->terms, struct parse_events_term, list);
+
+	if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER) {
+		size_t i;
+
+		for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+			if (!strcasecmp(term->config, nvme_events[i].name)) {
+				if (nvme_events[i].scale_unit) {
+					char *unit;
+
+					perf_pmu__convert_scale(nvme_events[i].scale_unit,
+								&unit, &info->scale);
+					info->unit = unit;
+				}
+				return 0;
+			}
+		}
+	}
+	if (err) {
+		char *err_str;
+
+		parse_events_error__handle(err, term->err_val,
+					asprintf(&err_str,
+						"unexpected nvme event term (%s) %s",
+						parse_events__term_type_str(term->type_term),
+						term->config) < 0
+					? strdup("unexpected nvme event term")
+					: err_str,
+					NULL);
+	}
+	return -EINVAL;
+}
+
+int perf_pmus__read_nvme_pmus(struct list_head *pmus)
+{
+	nvme_root_t r = nvme_scan(NULL);
+	nvme_host_t h;
+	nvme_subsystem_t s;
+	nvme_ctrl_t c;
+
+	if (!r)
+		return 0;
+
+	nvme_for_each_host(r, h) {
+		nvme_for_each_subsystem(h, s) {
+			nvme_subsystem_for_each_ctrl(s, c) {
+				nvme_pmu__new(pmus, nvme_ctrl_get_name(c), nvme_ctrl_get_name(c));
+			}
+		}
+	}
+	nvme_free_tree(r);
+	return 0;
+}
+
+
+static int nvme_pmu__read_val(int fd, uint64_t config, uint64_t *val)
+{
+	int log_type = (config >> 24) & 0xFF;
+	unsigned int size = (config >> 16) & 0xFF;
+	unsigned int offset = config & 0xFFFF;
+	uint8_t buf[4096];
+	uint8_t *p;
+
+	if (log_type == NVME_LOG_SMART) {
+		if (offset + size > sizeof(struct nvme_smart_log))
+			return -EINVAL;
+		if (nvme_get_log_smart(fd, NVME_NSID_ALL, true, (struct nvme_smart_log *)buf) != 0)
+			return -EINVAL;
+
+		if (offset == offsetof(struct nvme_smart_log, temperature)) {
+			uint64_t kelvin = ((struct nvme_smart_log *)buf)->temperature[0] |
+					  (((struct nvme_smart_log *)buf)->temperature[1] << 8);
+			*val = (kelvin * 1000) - 273150;
+			return 0;
+		}
+	} else if (log_type == NVME_LOG_ENDURANCE) {
+		if (offset + size > sizeof(struct nvme_endurance_group_log))
+			return -EINVAL;
+		if (nvme_get_log_endurance_group(fd, 0,
+				(struct nvme_endurance_group_log *)buf) != 0)
+			return -EINVAL;
+	} else if (log_type == NVME_LOG_FDP) {
+		if (offset + size > sizeof(struct nvme_fdp_stats_log))
+			return -EINVAL;
+		if (nvme_get_log_fdp_stats(fd, 0, 0, sizeof(struct nvme_fdp_stats_log), buf) != 0)
+			return -EINVAL;
+	} else if (log_type == NVME_LOG_ERROR) {
+		if (offset + size > sizeof(struct nvme_error_log_page))
+			return -EINVAL;
+		if (nvme_get_log_error(fd, 1, true, (struct nvme_error_log_page *)buf) != 0)
+			return -EINVAL;
+	} else if (log_type == NVME_LOG_ZNS) {
+		if (offset + size > sizeof(struct nvme_zns_changed_zone_log))
+			return -EINVAL;
+		if (nvme_get_log_zns_changed_zones(fd, NVME_NSID_ALL, true,
+				(struct nvme_zns_changed_zone_log *)buf) != 0)
+			return -EINVAL;
+	} else {
+		return -EINVAL;
+	}
+
+	p = buf + offset;
+	if (size == 16 || size == 8)
+		*val = le64_to_cpu(*(uint64_t *)p);
+	else if (size == 4)
+		*val = le32_to_cpu(*(uint32_t *)p);
+	else if (size == 2)
+		*val = le16_to_cpu(*(uint16_t *)p);
+	else if (size == 1)
+		*val = *(uint8_t *)p;
+	else
+		return -EINVAL;
+
+	return 0;
+}
+
+static bool nvme_pmu__is_gauge(uint64_t config)
+{
+	if (config == NVME_SMART(2, temperature) ||
+	    config == NVME_ENDURANCE(1, percent_used) ||
+	    config == NVME_ZNS(2, nrzid))
+		return true;
+	return false;
+}
+
+#define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
+
+int evsel__nvme_pmu_open(struct evsel *evsel,
+			 struct perf_thread_map *threads,
+			 int start_cpu_map_idx, int end_cpu_map_idx)
+{
+	struct nvme_pmu *nvm = container_of(evsel->pmu, struct nvme_pmu, pmu);
+	int idx = 0, thread = 0, nthreads, err = 0;
+	char path[PATH_MAX];
+
+	snprintf(path, sizeof(path), "/dev/%s", nvm->dev_name);
+
+	nthreads = perf_thread_map__nr(threads);
+
+	if (!evsel->priv) {
+		int max_cpus = evsel->core.cpus ? perf_cpu_map__nr(evsel->core.cpus) : 1;
+
+		evsel->priv = xyarray__new(max_cpus, nthreads, sizeof(uint64_t));
+	}
+
+	for (idx = start_cpu_map_idx; idx < end_cpu_map_idx; idx++) {
+		for (thread = 0; thread < nthreads; thread++) {
+			int fd = open(path, O_RDONLY);
+
+			FD(evsel, idx, thread) = fd;
+			if (fd < 0) {
+				err = -errno;
+				goto out_close;
+			}
+			if (evsel->priv) {
+				uint64_t *initial_val = xyarray__entry(evsel->priv, idx, thread);
+
+				if (nvme_pmu__read_val(fd, evsel->core.attr.config, initial_val))
+					*initial_val = 0;
+			}
+		}
+	}
+	return 0;
+out_close:
+	if (err)
+		threads->err_thread = thread;
+
+	do {
+		while (--thread >= 0) {
+			if (FD(evsel, idx, thread) >= 0)
+				close(FD(evsel, idx, thread));
+			FD(evsel, idx, thread) = -1;
+		}
+		thread = nthreads;
+	} while (--idx >= 0);
+	return err;
+}
+
+int evsel__nvme_pmu_read(struct evsel *evsel, int cpu_map_idx, int thread)
+{
+	int fd;
+	struct perf_counts_values *count, *old_count = NULL;
+	uint64_t val = 0;
+	uint64_t *initial_val = NULL;
+
+	if (evsel->prev_raw_counts)
+		old_count = perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
+
+	count = perf_counts(evsel->counts, cpu_map_idx, thread);
+	fd = FD(evsel, cpu_map_idx, thread);
+
+	if (fd < 0 || nvme_pmu__read_val(fd, evsel->core.attr.config, &val)) {
+		count->lost++;
+		return -EINVAL;
+	}
+
+	if (evsel->priv)
+		initial_val = xyarray__entry(evsel->priv, cpu_map_idx, thread);
+
+	if (old_count) {
+		if (nvme_pmu__is_gauge(evsel->core.attr.config))
+			count->val = old_count->val + val;
+		else
+			count->val = val - (initial_val ? *initial_val : 0);
+		count->run = old_count->run + 1;
+		count->ena = old_count->ena + 1;
+	} else {
+		if (nvme_pmu__is_gauge(evsel->core.attr.config))
+			count->val = val;
+		else
+			count->val = val - (initial_val ? *initial_val : 0);
+		count->run++;
+		count->ena++;
+	}
+	return 0;
+}
+
+
+#endif
diff --git a/tools/perf/util/nvme_pmu.h b/tools/perf/util/nvme_pmu.h
new file mode 100644
index 000000000000..6d5d2bbe4167
--- /dev/null
+++ b/tools/perf/util/nvme_pmu.h
@@ -0,0 +1,112 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+#ifndef __NVME_PMU_H
+#define __NVME_PMU_H
+
+#include "pmu.h"
+#include <stdbool.h>
+#include <errno.h>
+
+struct list_head;
+struct perf_thread_map;
+struct evsel;
+
+#ifdef HAVE_LIBNVME_SUPPORT
+struct perf_pmu *nvme_pmu__new(struct list_head *pmus, const char *sysfs_name, const char *name);
+void nvme_pmu__exit(struct perf_pmu *pmu);
+
+int nvme_pmu__for_each_event(struct perf_pmu *pmu, void *state, pmu_event_callback cb);
+size_t nvme_pmu__num_events(struct perf_pmu *pmu);
+bool nvme_pmu__have_event(struct perf_pmu *pmu, const char *name);
+int nvme_pmu__config_terms(const struct perf_pmu *pmu,
+			   struct perf_event_attr *attr,
+			   struct parse_events_terms *terms,
+			   struct parse_events_error *err);
+int nvme_pmu__check_alias(struct parse_events_terms *terms, struct perf_pmu_info *info,
+			  struct parse_events_error *err);
+
+bool perf_pmu__is_nvme(const struct perf_pmu *pmu);
+bool evsel__is_nvme(const struct evsel *evsel);
+
+int perf_pmus__read_nvme_pmus(struct list_head *pmus);
+
+int evsel__nvme_pmu_open(struct evsel *evsel,
+			 struct perf_thread_map *threads,
+			 int start_cpu_map_idx, int end_cpu_map_idx);
+int evsel__nvme_pmu_read(struct evsel *evsel, int cpu_map_idx, int thread);
+#else
+static inline struct perf_pmu *nvme_pmu__new(struct list_head *pmus __maybe_unused,
+					     const char *sysfs_name __maybe_unused,
+					     const char *name __maybe_unused)
+{
+	return NULL;
+}
+
+static inline void nvme_pmu__exit(struct perf_pmu *pmu __maybe_unused)
+{
+}
+
+static inline int nvme_pmu__for_each_event(struct perf_pmu *pmu __maybe_unused,
+					   void *state __maybe_unused,
+					   pmu_event_callback cb __maybe_unused)
+{
+	return 0;
+}
+
+static inline size_t nvme_pmu__num_events(struct perf_pmu *pmu __maybe_unused)
+{
+	return 0;
+}
+
+static inline bool nvme_pmu__have_event(struct perf_pmu *pmu __maybe_unused,
+					const char *name __maybe_unused)
+{
+	return false;
+}
+
+static inline int nvme_pmu__config_terms(const struct perf_pmu *pmu __maybe_unused,
+					 struct perf_event_attr *attr __maybe_unused,
+					 struct parse_events_terms *terms __maybe_unused,
+					 struct parse_events_error *err __maybe_unused)
+{
+	return -EINVAL;
+}
+
+static inline int nvme_pmu__check_alias(struct parse_events_terms *terms __maybe_unused,
+					struct perf_pmu_info *info __maybe_unused,
+					struct parse_events_error *err __maybe_unused)
+{
+	return -EINVAL;
+}
+
+static inline bool perf_pmu__is_nvme(const struct perf_pmu *pmu __maybe_unused)
+{
+	return false;
+}
+
+static inline bool evsel__is_nvme(const struct evsel *evsel __maybe_unused)
+{
+	return false;
+}
+
+static inline int perf_pmus__read_nvme_pmus(struct list_head *pmus __maybe_unused)
+{
+	return 0;
+}
+
+static inline int evsel__nvme_pmu_open(struct evsel *evsel __maybe_unused,
+				       struct perf_thread_map *threads __maybe_unused,
+				       int start_cpu_map_idx __maybe_unused,
+				       int end_cpu_map_idx __maybe_unused)
+{
+	return 0;
+}
+
+static inline int evsel__nvme_pmu_read(struct evsel *evsel __maybe_unused,
+				       int cpu_map_idx __maybe_unused,
+				       int thread __maybe_unused)
+{
+	return 0;
+}
+#endif
+
+#endif /* __NVME_PMU_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 9994709ef12b..26ec19753644 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -22,6 +22,7 @@
 #include "pmu.h"
 #include "drm_pmu.h"
 #include "hwmon_pmu.h"
+#include "nvme_pmu.h"
 #include "pmus.h"
 #include "tool_pmu.h"
 #include "tp_pmu.h"
@@ -1687,6 +1688,8 @@ int perf_pmu__config_terms(const struct perf_pmu *pmu,
 
 	if (perf_pmu__is_hwmon(pmu))
 		return hwmon_pmu__config_terms(pmu, attr, terms, err);
+	if (perf_pmu__is_nvme(pmu))
+		return nvme_pmu__config_terms(pmu, attr, terms, err);
 	if (perf_pmu__is_drm(pmu))
 		return drm_pmu__config_terms(pmu, attr, terms, err);
 
@@ -1851,6 +1854,10 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct parse_events_terms *head_
 		ret = hwmon_pmu__check_alias(head_terms, info, err);
 		goto out;
 	}
+	if (perf_pmu__is_nvme(pmu)) {
+		ret = nvme_pmu__check_alias(head_terms, info, err);
+		goto out;
+	}
 	if (perf_pmu__is_drm(pmu)) {
 		ret = drm_pmu__check_alias(pmu, head_terms, info, err);
 		goto out;
@@ -2071,6 +2078,8 @@ bool perf_pmu__have_event(struct perf_pmu *pmu, const char *name)
 		return tp_pmu__have_event(pmu, name);
 	if (perf_pmu__is_hwmon(pmu))
 		return hwmon_pmu__have_event(pmu, name);
+	if (perf_pmu__is_nvme(pmu))
+		return nvme_pmu__have_event(pmu, name);
 	if (perf_pmu__is_drm(pmu))
 		return drm_pmu__have_event(pmu, name);
 	if (perf_pmu__find_alias(pmu, name, /*load=*/ true) != NULL)
@@ -2092,6 +2101,8 @@ size_t perf_pmu__num_events(struct perf_pmu *pmu)
 		return tp_pmu__num_events(pmu);
 	if (perf_pmu__is_hwmon(pmu))
 		return hwmon_pmu__num_events(pmu);
+	if (perf_pmu__is_nvme(pmu))
+		return nvme_pmu__num_events(pmu);
 	if (perf_pmu__is_drm(pmu))
 		return drm_pmu__num_events(pmu);
 
@@ -2223,6 +2234,8 @@ int perf_pmu__for_each_event(struct perf_pmu *pmu, bool skip_duplicate_pmus,
 		return tp_pmu__for_each_event(pmu, state, cb);
 	if (perf_pmu__is_hwmon(pmu))
 		return hwmon_pmu__for_each_event(pmu, state, cb);
+	if (perf_pmu__is_nvme(pmu))
+		return nvme_pmu__for_each_event(pmu, state, cb);
 	if (perf_pmu__is_drm(pmu))
 		return drm_pmu__for_each_event(pmu, state, cb);
 
@@ -2714,6 +2727,8 @@ void perf_pmu__delete(struct perf_pmu *pmu)
 
 	if (perf_pmu__is_hwmon(pmu))
 		hwmon_pmu__exit(pmu);
+	else if (perf_pmu__is_nvme(pmu))
+		nvme_pmu__exit(pmu);
 	else if (perf_pmu__is_drm(pmu))
 		drm_pmu__exit(pmu);
 
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 0d9f3c57e8e8..0fe47dd429e8 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -45,6 +45,8 @@ enum pmu_kind {
 	PERF_PMU_KIND_DRM,
 	/* A perf tool provided HWMON PMU. */
 	PERF_PMU_KIND_HWMON,
+	/* A perf tool provided NVME PMU. */
+	PERF_PMU_KIND_NVME,
 	/* Perf tool provided PMU for tool events like time. */
 	PERF_PMU_KIND_TOOL,
 	/* A testing PMU kind. */
@@ -53,7 +55,9 @@ enum pmu_kind {
 
 enum {
 	PERF_PMU_TYPE_PE_START    = 0,
-	PERF_PMU_TYPE_PE_END      = 0xFFFDFFFF,
+	PERF_PMU_TYPE_PE_END      = 0xFFFCFFFF,
+	PERF_PMU_TYPE_NVME_START  = 0xFFFD0000,
+	PERF_PMU_TYPE_NVME_END    = 0xFFFDFFFF,
 	PERF_PMU_TYPE_DRM_START   = 0xFFFE0000,
 	PERF_PMU_TYPE_DRM_END     = 0xFFFEFFFF,
 	PERF_PMU_TYPE_HWMON_START = 0xFFFF0000,
@@ -363,6 +367,8 @@ static inline enum pmu_kind perf_pmu__kind(const struct perf_pmu *pmu)
 	type = pmu->type;
 	if (type <= PERF_PMU_TYPE_PE_END)
 		return PERF_PMU_KIND_PE;
+	if (type <= PERF_PMU_TYPE_NVME_END)
+		return PERF_PMU_KIND_NVME;
 	if (type <= PERF_PMU_TYPE_DRM_END)
 		return PERF_PMU_KIND_DRM;
 	if (type <= PERF_PMU_TYPE_HWMON_END)
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 5e3f571450fe..83777f941e9a 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -17,6 +17,7 @@
 #include "pmus.h"
 #include "pmu.h"
 #include "hwmon_pmu.h"
+#include "nvme_pmu.h"
 #include "tool_pmu.h"
 #include "print-events.h"
 #include "strbuf.h"
@@ -44,18 +45,21 @@ enum perf_tool_pmu_type {
 	PERF_TOOL_PMU_TYPE_PE_OTHER,
 	PERF_TOOL_PMU_TYPE_TOOL,
 	PERF_TOOL_PMU_TYPE_HWMON,
+	PERF_TOOL_PMU_TYPE_NVME,
 	PERF_TOOL_PMU_TYPE_DRM,
 
 #define PERF_TOOL_PMU_TYPE_PE_CORE_MASK (1 << PERF_TOOL_PMU_TYPE_PE_CORE)
 #define PERF_TOOL_PMU_TYPE_PE_OTHER_MASK (1 << PERF_TOOL_PMU_TYPE_PE_OTHER)
 #define PERF_TOOL_PMU_TYPE_TOOL_MASK (1 << PERF_TOOL_PMU_TYPE_TOOL)
 #define PERF_TOOL_PMU_TYPE_HWMON_MASK (1 << PERF_TOOL_PMU_TYPE_HWMON)
+#define PERF_TOOL_PMU_TYPE_NVME_MASK (1 << PERF_TOOL_PMU_TYPE_NVME)
 #define PERF_TOOL_PMU_TYPE_DRM_MASK (1 << PERF_TOOL_PMU_TYPE_DRM)
 
 #define PERF_TOOL_PMU_TYPE_ALL_MASK (PERF_TOOL_PMU_TYPE_PE_CORE_MASK |	\
 					PERF_TOOL_PMU_TYPE_PE_OTHER_MASK | \
 					PERF_TOOL_PMU_TYPE_TOOL_MASK |	\
 					PERF_TOOL_PMU_TYPE_HWMON_MASK | \
+					PERF_TOOL_PMU_TYPE_NVME_MASK | \
 					PERF_TOOL_PMU_TYPE_DRM_MASK)
 };
 static unsigned int read_pmu_types;
@@ -175,12 +179,15 @@ struct perf_pmu *perf_pmus__find(const char *name)
 		return pmu;
 
 	/* Looking up an individual perf event PMU failed, check if a tool PMU should be read. */
-	if (!strncmp(name, "hwmon_", 6))
-		to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK;
-	else if (!strncmp(name, "drm_", 4))
+	if (!strncmp(name, "hwmon_", 6)) {
+		to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK;
+	} else if (!strncmp(name, "nvme_", 5)) {
+		to_read_pmus |= PERF_TOOL_PMU_TYPE_NVME_MASK;
+	} else if (!strncmp(name, "drm_", 4)) {
 		to_read_pmus |= PERF_TOOL_PMU_TYPE_DRM_MASK;
-	else if (!strcmp(name, "tool"))
+	} else if (!strcmp(name, "tool")) {
 		to_read_pmus |= PERF_TOOL_PMU_TYPE_TOOL_MASK;
+	}
 
 	if (to_read_pmus) {
 		pmu_read_sysfs(to_read_pmus);
@@ -279,6 +286,10 @@ static void pmu_read_sysfs(unsigned int to_read_types)
 	    (read_pmu_types & PERF_TOOL_PMU_TYPE_HWMON_MASK) == 0)
 		perf_pmus__read_hwmon_pmus(&other_pmus);
 
+	if ((to_read_types & PERF_TOOL_PMU_TYPE_NVME_MASK) != 0 &&
+	    (read_pmu_types & PERF_TOOL_PMU_TYPE_NVME_MASK) == 0)
+		perf_pmus__read_nvme_pmus(&other_pmus);
+
 	if ((to_read_types & PERF_TOOL_PMU_TYPE_DRM_MASK) != 0 &&
 	    (read_pmu_types & PERF_TOOL_PMU_TYPE_DRM_MASK) == 0)
 		perf_pmus__read_drm_pmus(&other_pmus);
@@ -387,6 +398,10 @@ struct perf_pmu *perf_pmus__scan_for_event(struct perf_pmu *pmu, const char *eve
 		if (strlen(event) > 4 && strncmp("drm-", event, 4) == 0)
 			to_read_pmus |= PERF_TOOL_PMU_TYPE_DRM_MASK;
 
+		/* Could the event be an nvme event? */
+		if (nvme_pmu__have_event(NULL, event))
+			to_read_pmus |= PERF_TOOL_PMU_TYPE_NVME_MASK;
+
 		pmu_read_sysfs(to_read_pmus);
 		pmu = list_prepare_entry(pmu, &core_pmus, list);
 	}
@@ -424,11 +439,14 @@ struct perf_pmu *perf_pmus__scan_matching_wildcard(struct perf_pmu *pmu, const c
 		 */
 		if (strisglob(wildcard)) {
 			to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK |
+				PERF_TOOL_PMU_TYPE_NVME_MASK |
 				PERF_TOOL_PMU_TYPE_DRM_MASK;
 		} else if (strlen(wildcard) >= 4 && strncmp("drm_", wildcard, 4) == 0) {
 			to_read_pmus |= PERF_TOOL_PMU_TYPE_DRM_MASK;
 		} else if (strlen(wildcard) >= 5 && strncmp("hwmon", wildcard, 5) == 0) {
 			to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK;
+		} else if (strlen(wildcard) >= 4 && strncmp("nvme", wildcard, 4) == 0) {
+			to_read_pmus |= PERF_TOOL_PMU_TYPE_NVME_MASK;
 		}
 
 		pmu_read_sysfs(to_read_pmus);
-- 
2.54.0.1064.gd145956f57-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v1 3/3] perf tests: Add NVMe PMU event parsing test
  2026-06-09  7:03 [PATCH v1 0/3] perf pmu: Add tool-provided NVMe PMU Ian Rogers
  2026-06-09  7:03 ` [PATCH v1 1/3] perf build: Add libnvme feature detection Ian Rogers
  2026-06-09  7:03 ` [PATCH v1 2/3] perf pmu: Implement tool-provided NVMe PMU Ian Rogers
@ 2026-06-09  7:03 ` Ian Rogers
  2026-06-09 16:57 ` [PATCH v2 0/3] perf pmu: Add tool-provided NVMe PMU Ian Rogers
  3 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2026-06-09  7:03 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
	Tomas Glozar, Michael Jeanson, Dmitrii Dolgov, Alexandre Chartre,
	Yuzhuo Jing, Leo Yan, German Gomez, Anubhav Shelat, linux-kernel,
	linux-perf-users
  Cc: linux-nvme

Introduce a unit test suite for the NVMe PMU event parser.
The test registers a mock 'nvme_nvme0' PMU and checks that a sample of
the mapped events, covering each supported log page (SMART, Endurance,
FDP, Error, ZNS), parses to the expected event config values.

To support this, move the NVMe event config encoding macros and the
'nvme_log_type' enum from nvme_pmu.c to nvme_pmu.h.
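For illustration, here is a stand-alone sketch of the bit layout these macros produce and of the matching decode; the shifts and masks mirror those in nvme_pmu__read_val() as quoted in the review of patch 2/3. The struct nvme_config_fields type and the nvme_config_encode()/nvme_config_decode() helpers are hypothetical names invented for this sketch, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same bit layout as NVME_CONFIG() in nvme_pmu.h:
 *   bits 31..24  log type (enum nvme_log_type, e.g. 0 = SMART)
 *   bits 23..16  field size in bytes
 *   bits 15..0   byte offset of the field within the log page
 */
#define NVME_CONFIG(log, size, offset) \
	(((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))

/* Hypothetical container for the decoded fields. */
struct nvme_config_fields {
	unsigned int log_type;
	unsigned int size;
	unsigned int offset;
};

/* Hypothetical checked encoder: each value must fit its bit field. */
static uint64_t nvme_config_encode(unsigned int log, unsigned int size,
				   unsigned int offset)
{
	assert(log <= 0xFF && size <= 0xFF && offset <= 0xFFFF);
	return NVME_CONFIG(log, size, offset);
}

/* Decode using the same shifts and masks as nvme_pmu__read_val(). */
static struct nvme_config_fields nvme_config_decode(uint64_t config)
{
	struct nvme_config_fields f;

	f.log_type = (config >> 24) & 0xFF;
	f.size = (config >> 16) & 0xFF;
	f.offset = config & 0xFFFF;
	return f;
}
```

For example, assuming the composite temperature sits at byte offset 1 of the SMART log as in the NVMe specification, nvme_config_encode(0, 2, 1) describes that 2-byte field and yields 0x20001.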

Signed-off-by: Ian Rogers <irogers@google.com>
CONV=ca4c5d09-4ef8-405a-80bb-aa988020b436
TAG=agy
---
 tools/perf/tests/Build          |   1 +
 tools/perf/tests/builtin-test.c |   1 +
 tools/perf/tests/nvme_pmu.c     | 176 ++++++++++++++++++++++++++++++++
 tools/perf/tests/tests.h        |   1 +
 tools/perf/util/nvme_pmu.c      |  27 -----
 tools/perf/util/nvme_pmu.h      |  31 ++++++
 tools/perf/util/pmus.c          |   5 +
 tools/perf/util/pmus.h          |   1 +
 8 files changed, 216 insertions(+), 27 deletions(-)
 create mode 100644 tools/perf/tests/nvme_pmu.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 66944a4f4968..acaf5e3e728b 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -68,6 +68,7 @@ perf-test-y += event_groups.o
 perf-test-y += symbols.o
 perf-test-y += util.o
 perf-test-y += hwmon_pmu.o
+perf-test-y += nvme_pmu.o
 perf-test-y += tool_pmu.o
 perf-test-y += subcmd-help.o
 perf-test-y += kallsyms-split.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index b64fc2204f22..9a2fb706c2bc 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -83,6 +83,7 @@ static struct test_suite *generic_tests[] = {
 	&suite__pmu,
 	&suite__pmu_events,
 	&suite__hwmon_pmu,
+	&suite__nvme_pmu,
 	&suite__tool_pmu,
 	&suite__dso_data,
 	&suite__perf_evsel__roundtrip_name_test,
diff --git a/tools/perf/tests/nvme_pmu.c b/tools/perf/tests/nvme_pmu.c
new file mode 100644
index 000000000000..3c1de6e92efc
--- /dev/null
+++ b/tools/perf/tests/nvme_pmu.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#include "nvme_pmu.h"
+
+#include <errno.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+
+#include "debug.h"
+#include "evlist.h"
+#include "parse-events.h"
+#include "pmus.h"
+#include "tests.h"
+
+#ifdef HAVE_LIBNVME_SUPPORT
+
+static const struct test_event {
+	const char *name;
+	const char *alias;
+	uint64_t config;
+} test_events[] = {
+	{
+		"smart_temperature",
+		"smart_temperature",
+		NVME_SMART(2, temperature),
+	},
+	{
+		"smart_data_units_read",
+		"smart_data_units_read",
+		NVME_SMART(16, data_units_read),
+	},
+	{
+		"endurance_percent_used",
+		"endurance_percent_used",
+		NVME_ENDURANCE(1, percent_used),
+	},
+	{
+		"fdp_hbmw",
+		"fdp_hbmw",
+		NVME_FDP(16, hbmw),
+	},
+	{
+		"error_count",
+		"error_count",
+		NVME_ERROR(8, error_count),
+	},
+	{
+		"zns_nrzid",
+		"zns_nrzid",
+		NVME_ZNS(2, nrzid),
+	},
+};
+
+static int do_test(size_t i, bool with_pmu, bool with_alias)
+{
+	const char *test_event = with_alias ? test_events[i].alias : test_events[i].name;
+	struct evlist *evlist = evlist__new();
+	struct evsel *evsel;
+	struct parse_events_error err;
+	int ret;
+	char str[128];
+	bool found = false;
+
+	if (!evlist) {
+		pr_err("evlist allocation failed\n");
+		return TEST_FAIL;
+	}
+
+	if (with_pmu)
+		snprintf(str, sizeof(str), "nvme_nvme0/%s/", test_event);
+	else
+		strlcpy(str, test_event, sizeof(str));
+
+	pr_debug("Testing '%s'\n", str);
+	parse_events_error__init(&err);
+	ret = parse_events(evlist, str, &err);
+	if (ret) {
+		pr_debug("FAILED %s:%d failed to parse event '%s', err %d\n",
+			 __FILE__, __LINE__, str, ret);
+		parse_events_error__print(&err, str);
+		ret = TEST_FAIL;
+		goto out;
+	}
+
+	ret = TEST_OK;
+	if (with_pmu ? (evlist->core.nr_entries != 1) : (evlist->core.nr_entries < 1)) {
+		pr_debug("FAILED %s:%d Unexpected number of events for '%s' of %d\n",
+			 __FILE__, __LINE__, str, evlist->core.nr_entries);
+		ret = TEST_FAIL;
+		goto out;
+	}
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (!evsel->pmu || !evsel->pmu->name ||
+		    strcmp(evsel->pmu->name, "nvme_nvme0"))
+			continue;
+
+		if (evsel->core.attr.config != test_events[i].config) {
+			pr_debug("FAILED %s:%d Unexpected config for '%s', %"
+				 PRIu64 " != %" PRIu64 "\n",
+				 __FILE__, __LINE__, str,
+				 (uint64_t)evsel->core.attr.config,
+				 test_events[i].config);
+			ret = TEST_FAIL;
+			goto out;
+		}
+		found = true;
+	}
+
+	if (!found) {
+		pr_debug("FAILED %s:%d Didn't find nvme event '%s' in parsed evsels\n",
+			 __FILE__, __LINE__, str);
+		ret = TEST_FAIL;
+	}
+
+out:
+	parse_events_error__exit(&err);
+	evlist__delete(evlist);
+	return ret;
+}
+
+static int test__nvme_pmu(bool with_pmu)
+{
+	struct perf_pmu *pmu = perf_pmus__add_test_nvme_pmu("nvme0", "nvme0");
+	int ret = TEST_OK;
+
+	if (!pmu)
+		return TEST_FAIL;
+
+	for (size_t i = 0; i < ARRAY_SIZE(test_events); i++) {
+		ret = do_test(i, with_pmu, /*with_alias=*/false);
+		if (ret != TEST_OK)
+			break;
+
+		ret = do_test(i, with_pmu, /*with_alias=*/true);
+		if (ret != TEST_OK)
+			break;
+	}
+
+	list_del(&pmu->list);
+	perf_pmu__delete(pmu);
+	return ret;
+}
+
+static int test__nvme_pmu_without_pmu(struct test_suite *test __maybe_unused,
+				      int subtest __maybe_unused)
+{
+	return test__nvme_pmu(/*with_pmu=*/false);
+}
+
+static int test__nvme_pmu_with_pmu(struct test_suite *test __maybe_unused,
+				   int subtest __maybe_unused)
+{
+	return test__nvme_pmu(/*with_pmu=*/true);
+}
+
+static struct test_case tests__nvme_pmu[] = {
+	TEST_CASE("Parsing without PMU name", nvme_pmu_without_pmu),
+	TEST_CASE("Parsing with PMU name", nvme_pmu_with_pmu),
+	{	.name = NULL, }
+};
+
+struct test_suite suite__nvme_pmu = {
+	.desc = "NVMe PMU",
+	.test_cases = tests__nvme_pmu,
+};
+
+#else
+
+struct test_suite suite__nvme_pmu = {
+	.desc = "NVMe PMU",
+	.test_cases = NULL,
+};
+
+#endif
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index bf8ff7d54727..abffa51c5937 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -115,6 +115,7 @@ DECLARE_SUITE(syscall_openat_tp_fields);
 DECLARE_SUITE(pmu);
 DECLARE_SUITE(pmu_events);
 DECLARE_SUITE(hwmon_pmu);
+DECLARE_SUITE(nvme_pmu);
 DECLARE_SUITE(tool_pmu);
 DECLARE_SUITE(attr);
 DECLARE_SUITE(dso_data);
diff --git a/tools/perf/util/nvme_pmu.c b/tools/perf/util/nvme_pmu.c
index 17ba758aec59..ee0e958ea6ef 100644
--- a/tools/perf/util/nvme_pmu.c
+++ b/tools/perf/util/nvme_pmu.c
@@ -25,33 +25,6 @@
 #ifdef HAVE_LIBNVME_SUPPORT
 #include <libnvme.h>
 
-
-#define NVME_CONFIG(log, size, offset) \
-	(((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))
-
-enum nvme_log_type {
-	NVME_LOG_SMART = 0,
-	NVME_LOG_ENDURANCE = 1,
-	NVME_LOG_FDP = 2,
-	NVME_LOG_ERROR = 3,
-	NVME_LOG_ZNS = 4,
-};
-
-#define NVME_SMART(size, field) \
-	NVME_CONFIG(NVME_LOG_SMART, size, offsetof(struct nvme_smart_log, field))
-
-#define NVME_ENDURANCE(size, field) \
-	NVME_CONFIG(NVME_LOG_ENDURANCE, size, offsetof(struct nvme_endurance_group_log, field))
-
-#define NVME_FDP(size, field) \
-	NVME_CONFIG(NVME_LOG_FDP, size, offsetof(struct nvme_fdp_stats_log, field))
-
-#define NVME_ERROR(size, field) \
-	NVME_CONFIG(NVME_LOG_ERROR, size, offsetof(struct nvme_error_log_page, field))
-
-#define NVME_ZNS(size, field) \
-	NVME_CONFIG(NVME_LOG_ZNS, size, offsetof(struct nvme_zns_changed_zone_log, field))
-
 struct nvme_event {
 	const char *name;
 	const char *desc;
diff --git a/tools/perf/util/nvme_pmu.h b/tools/perf/util/nvme_pmu.h
index 6d5d2bbe4167..9203f461f381 100644
--- a/tools/perf/util/nvme_pmu.h
+++ b/tools/perf/util/nvme_pmu.h
@@ -6,6 +6,37 @@
 #include <stdbool.h>
 #include <errno.h>
 
+#ifdef HAVE_LIBNVME_SUPPORT
+#include <libnvme.h>
+#include <stddef.h>
+
+#define NVME_CONFIG(log, size, offset) \
+	(((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))
+
+enum nvme_log_type {
+	NVME_LOG_SMART = 0,
+	NVME_LOG_ENDURANCE = 1,
+	NVME_LOG_FDP = 2,
+	NVME_LOG_ERROR = 3,
+	NVME_LOG_ZNS = 4,
+};
+
+#define NVME_SMART(size, field) \
+	NVME_CONFIG(NVME_LOG_SMART, size, offsetof(struct nvme_smart_log, field))
+
+#define NVME_ENDURANCE(size, field) \
+	NVME_CONFIG(NVME_LOG_ENDURANCE, size, offsetof(struct nvme_endurance_group_log, field))
+
+#define NVME_FDP(size, field) \
+	NVME_CONFIG(NVME_LOG_FDP, size, offsetof(struct nvme_fdp_stats_log, field))
+
+#define NVME_ERROR(size, field) \
+	NVME_CONFIG(NVME_LOG_ERROR, size, offsetof(struct nvme_error_log_page, field))
+
+#define NVME_ZNS(size, field) \
+	NVME_CONFIG(NVME_LOG_ZNS, size, offsetof(struct nvme_zns_changed_zone_log, field))
+#endif
+
 struct list_head;
 struct perf_thread_map;
 struct evsel;
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 83777f941e9a..1c45164ae244 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -909,6 +909,11 @@ struct perf_pmu *perf_pmus__add_test_hwmon_pmu(const char *hwmon_dir,
 	return hwmon_pmu__new(&other_pmus, hwmon_dir, sysfs_name, name);
 }
 
+struct perf_pmu *perf_pmus__add_test_nvme_pmu(const char *sysfs_name, const char *name)
+{
+	return nvme_pmu__new(&other_pmus, sysfs_name, name);
+}
+
 struct perf_pmu *perf_pmus__fake_pmu(void)
 {
 	static struct perf_pmu fake = {
diff --git a/tools/perf/util/pmus.h b/tools/perf/util/pmus.h
index 0d55edb3f2fc..2045d4cf44d1 100644
--- a/tools/perf/util/pmus.h
+++ b/tools/perf/util/pmus.h
@@ -37,6 +37,7 @@ struct perf_pmu *perf_pmus__add_test_pmu(int test_sysfs_dirfd, const char *name)
 struct perf_pmu *perf_pmus__add_test_hwmon_pmu(const char *hwmon_dir,
 					       const char *sysfs_name,
 					       const char *name);
+struct perf_pmu *perf_pmus__add_test_nvme_pmu(const char *sysfs_name, const char *name);
 struct perf_pmu *perf_pmus__fake_pmu(void);
 struct perf_pmu *perf_pmus__find_core_pmu(void);
 
-- 
2.54.0.1064.gd145956f57-goog


* Re: [PATCH v1 1/3] perf build: Add libnvme feature detection
  2026-06-09  7:03 ` [PATCH v1 1/3] perf build: Add libnvme feature detection Ian Rogers
@ 2026-06-09  7:12   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09  7:12 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] The feature check for `libnvme` fails to utilize `pkg-config` during the test compilation, contradicting the commit message and breaking detection when the library is in a non-standard path.
- [Medium] `libnvme` is omitted from `test-all.c`, which causes false positives in fast-path detection and triggers continuous feature-dump flip-flops and rebuilds.
--

commit 1f4a72ba50b99659f0972d94ef171f8be76bc329
Author: Ian Rogers <irogers@google.com>

perf build: Add libnvme feature detection

Introduce a feature check for libnvme so that we can conditionally
compile the NVMe PMU driver. This checks for the availability of the
libnvme headers and library using pkg-config.

> diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
> index ed1374af31c16..bf85032415397 100644
> --- a/tools/build/Makefile.feature
> +++ b/tools/build/Makefile.feature
> @@ -73,6 +73,7 @@ FEATURE_TESTS_BASIC :=                  \
>          libbfd                          \
>  	libbfd-threadsafe		\
>          libelf                          \
> +        libnvme                         \

[Severity: Medium]
Without including test-libnvme.c in test-all.c, the fast-path build will
succeed even if libnvme is absent.

This can cause the initial discovery to falsely report libnvme as present,
which then flip-flops when individual checks are run later in
tools/perf/Makefile.config. This mismatch invalidates the build cache and
triggers full feature evaluations and rebuilds on every make invocation.

Since libnvme was added to FEATURE_TESTS_BASIC, does it also need to be
included in tools/build/feature/test-all.c?

>          libelf-getphdrnum               \
>          libelf-gelf_getnote             \
>          libelf-getshdrstrndx            \

[ ... ]

> diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
> index 62909a9c799d3..fd00e0be60690 100644
> --- a/tools/build/feature/Makefile
> +++ b/tools/build/feature/Makefile
> @@ -147,6 +147,9 @@ $(OUTPUT)test-libcap.bin:
>  $(OUTPUT)test-libelf.bin:
>  	$(BUILD) -lelf
>  
> +$(OUTPUT)test-libnvme.bin:
> +	$(BUILD) -lnvme
> +

[Severity: Medium]
The commit message states that the availability of the libnvme headers and
library is checked using pkg-config. However, because libnvme is omitted from the
FEATURE_PKG_CONFIG list in tools/build/Makefile.feature, the variables
FEATURE_CHECK_CFLAGS-libnvme and FEATURE_CHECK_LDFLAGS-libnvme are never
initialized.

This rule manually invokes $(BUILD) -lnvme, falling back to standard compiler
paths instead of leveraging pkg-config variables.

If libnvme is installed in a non-standard location discoverable only via
pkg-config, could the feature test fail and incorrectly disable NVMe PMU
support?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609070348.541964-1-irogers@google.com?part=1

* Re: [PATCH v1 2/3] perf pmu: Implement tool-provided NVMe PMU
  2026-06-09  7:03 ` [PATCH v1 2/3] perf pmu: Implement tool-provided NVMe PMU Ian Rogers
@ 2026-06-09  7:21   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09  7:21 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Unaligned memory access and strict aliasing violation in nvme_pmu__read_val() due to byte-aligned stack array cast to wider types.
- [High] The NVMe PMU implementation hijacks the tool-owned `evsel->priv` field, leading to arbitrary memory corruption and memory leaks.
--

commit 444d6a9adc97df412d7040ffe34f89b94548c30b
Author: Ian Rogers <irogers@google.com>

perf pmu: Implement tool-provided NVMe PMU

This commit adds a tool-provided PMU for NVMe devices using libnvme to
expose metrics from various NVMe logs as perf events under the
nvme_nvmeX PMUs.

> diff --git a/tools/perf/util/nvme_pmu.c b/tools/perf/util/nvme_pmu.c
> new file mode 100644
> index 0000000000000..17ba758aec59b
> --- /dev/null
> +++ b/tools/perf/util/nvme_pmu.c
[ ... ]
> +static int nvme_pmu__read_val(int fd, uint64_t config, uint64_t *val)
> +{
> +	int log_type = (config >> 24) & 0xFF;
> +	unsigned int size = (config >> 16) & 0xFF;
> +	unsigned int offset = config & 0xFFFF;
> +	uint8_t buf[4096];
> +	uint8_t *p;
[ ... ]
> +	p = buf + offset;
> +	if (size == 16 || size == 8)
> +		*val = le64_to_cpu(*(uint64_t *)p);

[Severity: High]
Does this cast violate C strict aliasing rules and risk unaligned memory
access crashes?

Since buf is a byte-aligned stack array and offset can be an arbitrary
value, p is not guaranteed to be 8-byte aligned. On architectures that
strictly enforce memory alignment, dereferencing an unaligned 64-bit
pointer can cause an immediate bus error (SIGBUS).

Additionally, casting a byte array to a wider type violates strict aliasing,
which is enforced in the tools/ directory with -fstrict-aliasing.

Could we use get_unaligned_le64() here instead?

> +	else if (size == 4)
> +		*val = le32_to_cpu(*(uint32_t *)p);
> +	else if (size == 2)
> +		*val = le16_to_cpu(*(uint16_t *)p);
> +	else if (size == 1)
> +		*val = *(uint8_t *)p;
> +	else
> +		return -EINVAL;
> +
> +	return 0;
> +}
[ ... ]
> +int evsel__nvme_pmu_open(struct evsel *evsel,
> +			 struct perf_thread_map *threads,
> +			 int start_cpu_map_idx, int end_cpu_map_idx)
> +{
> +	struct nvme_pmu *nvm = container_of(evsel->pmu, struct nvme_pmu, pmu);
> +	int idx = 0, thread = 0, nthreads, err = 0;
> +	char path[PATH_MAX];
> +
> +	snprintf(path, sizeof(path), "/dev/%s", nvm->dev_name);
> +
> +	nthreads = perf_thread_map__nr(threads);
> +
> +	if (!evsel->priv) {
> +		int max_cpus = evsel->core.cpus ? perf_cpu_map__nr(evsel->core.cpus) : 1;
> +
> +		evsel->priv = xyarray__new(max_cpus, nthreads, sizeof(uint64_t));
> +	}

[Severity: High]
Can this leak the xyarray allocation?

Since evsel->priv is conventionally owned by the perf tool running the
command, there does not seem to be a matching destructor in the PMU for
the xyarray allocated here, which might leave the memory unfreed when the
event is closed.

> +
> +	for (idx = start_cpu_map_idx; idx < end_cpu_map_idx; idx++) {
> +		for (thread = 0; thread < nthreads; thread++) {
> +			int fd = open(path, O_RDONLY);
> +
> +			FD(evsel, idx, thread) = fd;
> +			if (fd < 0) {
> +				err = -errno;
> +				goto out_close;
> +			}
> +			if (evsel->priv) {
> +				uint64_t *initial_val = xyarray__entry(evsel->priv, idx, thread);
> +
> +				if (nvme_pmu__read_val(fd, evsel->core.attr.config, initial_val))
> +					*initial_val = 0;
> +			}

[Severity: High]
Is there a risk of memory corruption if a tool has already populated
evsel->priv?

If an active tool like perf stat has already set evsel->priv to its own
structure, this code assumes it is an xyarray. The xyarray__entry()
function will read from the tool's struct to compute a pointer offset,
and then nvme_pmu__read_val() will write a 64-bit value to that
invalid address, causing arbitrary heap corruption.

> +		}
> +	}
> +	return 0;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609070348.541964-1-irogers@google.com?part=2

* [PATCH v2 0/3] perf pmu: Add tool-provided NVMe PMU
  2026-06-09  7:03 [PATCH v1 0/3] perf pmu: Add tool-provided NVMe PMU Ian Rogers
                   ` (2 preceding siblings ...)
  2026-06-09  7:03 ` [PATCH v1 3/3] perf tests: Add NVMe PMU event parsing test Ian Rogers
@ 2026-06-09 16:57 ` Ian Rogers
  2026-06-09 16:57   ` [PATCH v2 1/3] perf build: Add libnvme feature detection Ian Rogers
                     ` (2 more replies)
  3 siblings, 3 replies; 12+ messages in thread
From: Ian Rogers @ 2026-06-09 16:57 UTC (permalink / raw)
  To: irogers, yuzhuo
  Cc: 9erthalion6, acme, adrian.hunter, alexandre.chartre, ashelat,
	german.gomez, james.clark, jolsa, leo.yan, linux-kernel,
	linux-nvme, linux-perf-users, mingo, mjeanson, namhyung, peterz,
	tglozar

This patch series introduces a tool-provided PMU for NVMe devices to expose
various hardware logs as perf events.

Using libnvme, the new nvme_pmu extracts metrics from standard NVMe log pages
including:
- SMART / Health Information (e.g. data units read/written, power cycles,
  temperature)
- Endurance Group Information
- FDP (Flexible Data Placement) Statistics
- Error Information
- Zoned Namespaces (ZNS) Changed Zones

Changes in v2:
- Fixed feature check: added test-libnvme.c to test-all.c to correctly
  propagate fast-path discovery failures.
- Makefile cleanup: added libnvme to FEATURE_PKG_CONFIG in Makefile.feature
  to ensure correct pkg-config flags are used during feature testing.
- Alignment safety: replaced the pointer casts that violated strict aliasing
  with get_unaligned_le* accessors inside nvme_pmu__read_val() to prevent
  crashes on strict-alignment architectures.
- Heap safety: refactored evsel delta snapshots to use a new 'nvme' union
  member inside struct evsel instead of clashing with evsel->priv, preventing
  memory leaks and heap corruption with active tools like perf script.
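The alignment fix above can be illustrated outside perf. According to this changelog, v2 uses perf's get_unaligned_le* accessors; the sketch below instead assembles the value byte by byte, which is one portable way to avoid both misaligned wide loads and casting a byte array to uint64_t *. load_le() and read_le_field() are hypothetical helpers; the bounds check and the truncation of 16-byte fields to their low 64 bits are assumptions modelled on the v1 code quoted in the review of patch 2/3.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build a little-endian value of 'size' bytes (at most 8) using only
 * byte loads, so 'p' may have any alignment and no wider pointer type
 * ever aliases the byte buffer. The result is host-endian independent.
 */
static uint64_t load_le(const uint8_t *p, size_t size)
{
	uint64_t v = 0;

	/* Start at the most significant byte, which is stored last. */
	for (size_t i = size; i > 0; i--)
		v = (v << 8) | p[i - 1];
	return v;
}

/*
 * Hypothetical bounds-checked field reader. As in the v1 code, a 16-byte
 * (128-bit) field yields only its low 64 bits, i.e. its first 8 bytes.
 */
static int read_le_field(const uint8_t *buf, size_t buf_len, size_t offset,
			 size_t size, uint64_t *val)
{
	if (size != 1 && size != 2 && size != 4 && size != 8 && size != 16)
		return -EINVAL;
	/* Reject fields that would extend past the end of the buffer. */
	if (offset > buf_len || buf_len - offset < size)
		return -EINVAL;
	*val = load_le(buf + offset, size > 8 ? 8 : size);
	return 0;
}
```

Because every access is a single byte, the same code behaves identically on little- and big-endian hosts and on architectures that fault on misaligned loads.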

Ian Rogers (3):
  perf build: Add libnvme feature detection
  perf pmu: Implement tool-provided NVMe PMU
  perf tests: Add NVMe PMU event parsing test

 tools/build/Makefile.feature       |   5 +-
 tools/build/feature/Makefile       |   3 +
 tools/build/feature/test-all.c     |   5 +
 tools/build/feature/test-libnvme.c |  11 +
 tools/perf/Makefile.config         |  12 +
 tools/perf/tests/Build             |   1 +
 tools/perf/tests/builtin-test.c    |   1 +
 tools/perf/tests/nvme_pmu.c        | 176 ++++++++++
 tools/perf/tests/tests.h           |   1 +
 tools/perf/util/Build              |   1 +
 tools/perf/util/evsel.c            |  13 +-
 tools/perf/util/evsel.h            |   4 +
 tools/perf/util/nvme_pmu.c         | 537 +++++++++++++++++++++++++++++
 tools/perf/util/nvme_pmu.h         | 143 ++++++++
 tools/perf/util/pmu.c              |  15 +
 tools/perf/util/pmu.h              |   8 +-
 tools/perf/util/pmus.c             |  31 +-
 tools/perf/util/pmus.h             |   1 +
 18 files changed, 960 insertions(+), 8 deletions(-)
 create mode 100644 tools/build/feature/test-libnvme.c
 create mode 100644 tools/perf/tests/nvme_pmu.c
 create mode 100644 tools/perf/util/nvme_pmu.c
 create mode 100644 tools/perf/util/nvme_pmu.h

-- 
2.54.0.1099.g489fc7bff1-goog


* [PATCH v2 1/3] perf build: Add libnvme feature detection
  2026-06-09 16:57 ` [PATCH v2 0/3] perf pmu: Add tool-provided NVMe PMU Ian Rogers
@ 2026-06-09 16:57   ` Ian Rogers
  2026-06-09 17:08     ` sashiko-bot
  2026-06-09 16:57   ` [PATCH v2 2/3] perf pmu: Implement tool-provided NVMe PMU Ian Rogers
  2026-06-09 16:57   ` [PATCH v2 3/3] perf tests: Add NVMe PMU event parsing test Ian Rogers
  2 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2026-06-09 16:57 UTC (permalink / raw)
  To: irogers, yuzhuo
  Cc: 9erthalion6, acme, adrian.hunter, alexandre.chartre, ashelat,
	german.gomez, james.clark, jolsa, leo.yan, linux-kernel,
	linux-nvme, linux-perf-users, mingo, mjeanson, namhyung, peterz,
	tglozar

Introduce a feature check for libnvme so that we can conditionally
compile the NVMe PMU driver. This checks for the availability of the
libnvme headers and library using pkg-config.

Signed-off-by: Ian Rogers <irogers@google.com>
CONV=ca4c5d09-4ef8-405a-80bb-aa988020b436
TAG=agy
---
 tools/build/Makefile.feature       |  5 ++++-
 tools/build/feature/Makefile       |  3 +++
 tools/build/feature/test-all.c     |  5 +++++
 tools/build/feature/test-libnvme.c | 11 +++++++++++
 tools/perf/Makefile.config         | 12 ++++++++++++
 5 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 tools/build/feature/test-libnvme.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index ed1374af31c1..3bc9a305256b 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -73,6 +73,7 @@ FEATURE_TESTS_BASIC :=                  \
         libbfd                          \
 	libbfd-threadsafe		\
         libelf                          \
+        libnvme                         \
         libelf-getphdrnum               \
         libelf-gelf_getnote             \
         libelf-getshdrstrndx            \
@@ -146,6 +147,7 @@ FEATURE_DISPLAY ?=              \
          libdw                  \
          glibc                  \
          libelf                 \
+         libnvme                \
          libnuma                \
          numa_num_possible_cpus \
          libpython              \
@@ -171,7 +173,8 @@ FEATURE_GROUP_MEMBERS-libbfd = libbfd-liberty libbfd-liberty-z
 FEATURE_PKG_CONFIG ?=           \
 	 babeltrace2-ctf-writer \
          libtraceevent          \
-         libtracefs
+         libtracefs             \
+         libnvme
 
 feature_pkg_config = $(eval $(feature_pkg_config_code))
 define feature_pkg_config_code
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 62909a9c799d..fd00e0be6069 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -147,6 +147,9 @@ $(OUTPUT)test-libcap.bin:
 $(OUTPUT)test-libelf.bin:
 	$(BUILD) -lelf
 
+$(OUTPUT)test-libnvme.bin:
+	$(BUILD) -lnvme
+
 $(OUTPUT)test-eventfd.bin:
 	$(BUILD)
 
diff --git a/tools/build/feature/test-all.c b/tools/build/feature/test-all.c
index 544563d62950..875eb9185e23 100644
--- a/tools/build/feature/test-all.c
+++ b/tools/build/feature/test-all.c
@@ -146,6 +146,10 @@
 # include "test-libopenssl.c"
 #undef main
 
+#define main main_test_libnvme
+# include "test-libnvme.c"
+#undef main
+
 int main(int argc, char *argv[])
 {
 	main_test_libpython();
@@ -178,6 +182,7 @@ int main(int argc, char *argv[])
 	main_test_libzstd();
 	main_test_libtraceevent();
 	main_test_libopenssl();
+	main_test_libnvme();
 
 	return 0;
 }
diff --git a/tools/build/feature/test-libnvme.c b/tools/build/feature/test-libnvme.c
new file mode 100644
index 000000000000..3fefad70a680
--- /dev/null
+++ b/tools/build/feature/test-libnvme.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#include <libnvme.h>
+
+int main(void)
+{
+	nvme_root_t r = nvme_scan(NULL);
+
+	if (r)
+		nvme_free_tree(r);
+	return 0;
+}
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 6e7b15fab2ec..79c4ad10bc18 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -1060,6 +1060,18 @@ ifndef NO_CAPSTONE
   endif
 endif
 
+ifndef NO_LIBNVME
+  $(call feature_check,libnvme)
+  ifeq ($(feature-libnvme), 1)
+    CFLAGS += -DHAVE_LIBNVME_SUPPORT $(shell $(PKG_CONFIG) --cflags libnvme)
+    LDFLAGS += $(shell $(PKG_CONFIG) --libs-only-L libnvme)
+    EXTLIBS += $(shell $(PKG_CONFIG) --libs-only-l libnvme)
+    $(call detected,CONFIG_LIBNVME)
+  else
+    msg := $(warning No libnvme found, disables NVMe PMU support, please install libnvme-dev/libnvme-devel);
+  endif
+endif
+
 ifdef EXTRA_TESTS
     $(call detected,CONFIG_EXTRA_TESTS)
     CFLAGS += -DHAVE_EXTRA_TESTS
-- 
2.54.0.1099.g489fc7bff1-goog


* [PATCH v2 2/3] perf pmu: Implement tool-provided NVMe PMU
  2026-06-09 16:57 ` [PATCH v2 0/3] perf pmu: Add tool-provided NVMe PMU Ian Rogers
  2026-06-09 16:57   ` [PATCH v2 1/3] perf build: Add libnvme feature detection Ian Rogers
@ 2026-06-09 16:57   ` Ian Rogers
  2026-06-09 17:19     ` sashiko-bot
  2026-06-09 16:57   ` [PATCH v2 3/3] perf tests: Add NVMe PMU event parsing test Ian Rogers
  2 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2026-06-09 16:57 UTC
  To: irogers, yuzhuo
  Cc: 9erthalion6, acme, adrian.hunter, alexandre.chartre, ashelat,
	german.gomez, james.clark, jolsa, leo.yan, linux-kernel,
	linux-nvme, linux-perf-users, mingo, mjeanson, namhyung, peterz,
	tglozar

Add a tool-provided PMU for NVMe devices using libnvme. This PMU
exposes metrics from various NVMe logs (SMART, Endurance Group,
FDP, Error Information, and Zoned Namespaces) as perf events, with
one 'nvme_nvmeX' PMU per NVMe controller.

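Features:
- Generic configuration encoding: the log type, field size, and byte
  offset are packed into the event config.
- Dynamic probing of supported log pages on /dev/nvmeX. Events whose
  log page is unsupported are marked deprecated and hidden from
  'perf list' by default.
- Correct interval delta tracking: counters subtract a baseline
  snapshot taken at open time and stored in evsel->nvme.initial_values,
  while gauges (temperature, endurance percentage used, ZNS nrzid) are
  reported as instantaneous values.
- Conversion and scaling of values such as temperature (Kelvin to
  degrees Celsius).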
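The config layout and read semantics listed above can be sketched in plain C. This is an illustration only, assuming the bit positions used by NVME_CONFIG() and nvme_pmu__read_val() (log type in bits 24-31, field size in bits 16-23, byte offset in bits 0-15), and assuming, as with the hwmon PMU, that 'perf stat -I' prints each count minus the previous raw count. The helpers nvme_cfg_encode(), nvme_cfg_decode(), nvme_count_value() and nvme_kelvin_to_millicelsius() are hypothetical and are not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: same packing as NVME_CONFIG(log, size, offset). */
static uint64_t nvme_cfg_encode(unsigned int log, unsigned int size,
				unsigned int offset)
{
	return ((uint64_t)(log & 0xFF) << 24) |
	       ((uint64_t)(size & 0xFF) << 16) |
	       (uint64_t)(offset & 0xFFFF);
}

struct nvme_cfg {
	unsigned int log;	/* enum nvme_log_type value */
	unsigned int size;	/* field width in bytes: 1, 2, 4, 8 or 16 */
	unsigned int offset;	/* byte offset of the field in the log page */
};

/* Hypothetical helper: the masks nvme_pmu__read_val() applies to config. */
static struct nvme_cfg nvme_cfg_decode(uint64_t config)
{
	struct nvme_cfg c = {
		.log = (config >> 24) & 0xFF,
		.size = (config >> 16) & 0xFF,
		.offset = config & 0xFFFF,
	};

	return c;
}

/*
 * Hypothetical helper mirroring evsel__nvme_pmu_read(): counters report
 * growth since open (raw - baseline); gauges add the raw reading to the
 * previous count, so count - prev_count equals the current reading.
 */
static uint64_t nvme_count_value(bool gauge, uint64_t raw, uint64_t baseline,
				 bool have_prev, uint64_t prev_count)
{
	if (gauge)
		return have_prev ? prev_count + raw : raw;
	return raw - baseline;
}

/* SMART temperature is in Kelvin; a signed result avoids wrapping below 0 C. */
static int64_t nvme_kelvin_to_millicelsius(uint16_t kelvin)
{
	return (int64_t)kelvin * 1000 - 273150;
}
```

For instance, nvme_cfg_encode(3, 8, 0) gives 0x3080000, an 8-byte field at offset 0 of the error log. The sketch returns a signed temperature, whereas the patch stores kelvin * 1000 - 273150 in a uint64_t, which wraps if a device reports a value below 273 K.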

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/Build      |   1 +
 tools/perf/util/evsel.c    |  13 +-
 tools/perf/util/evsel.h    |   4 +
 tools/perf/util/nvme_pmu.c | 564 +++++++++++++++++++++++++++++++++++++
 tools/perf/util/nvme_pmu.h | 112 ++++++++
 tools/perf/util/pmu.c      |  15 +
 tools/perf/util/pmu.h      |   8 +-
 tools/perf/util/pmus.c     |  26 +-
 8 files changed, 736 insertions(+), 7 deletions(-)
 create mode 100644 tools/perf/util/nvme_pmu.c
 create mode 100644 tools/perf/util/nvme_pmu.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index b22cdc24082a..e28e99634178 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -20,6 +20,7 @@ perf-util-y += disasm.o
 perf-util-y += env.o
 perf-util-y += event.o
 perf-util-y += evlist.o
+perf-util-y += nvme_pmu.o
 perf-util-y += sideband_evlist.o
 perf-util-y += evsel.o
 perf-util-y += evsel_fprintf.o
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 34c03f47a913..4148186384a6 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -57,6 +57,7 @@
 #include "hashmap.h"
 #include "hist.h"
 #include "hwmon_pmu.h"
+#include "nvme_pmu.h"
 #include "intel-tpebs.h"
 #include "memswap.h"
 #include "off_cpu.h"
@@ -2006,6 +2007,8 @@ void evsel__exit(struct evsel *evsel)
 		xyarray__delete(evsel->process_time.start_times);
 		xyarray__delete(evsel->process_time.accumulated_times);
 	}
+	if (perf_pmu__is_nvme(evsel->pmu))
+		xyarray__delete(evsel->nvme.initial_values);
 }
 
 void evsel__delete(struct evsel *evsel)
@@ -2207,6 +2210,8 @@ int evsel__read_counter(struct evsel *evsel, int cpu_map_idx, int thread)
 
 	if (evsel__is_hwmon(evsel))
 		return evsel__hwmon_pmu_read(evsel, cpu_map_idx, thread);
+	if (evsel__is_nvme(evsel))
+		return evsel__nvme_pmu_read(evsel, cpu_map_idx, thread);
 
 	if (evsel__is_drm(evsel))
 		return evsel__drm_pmu_read(evsel, cpu_map_idx, thread);
@@ -2947,8 +2952,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 	}
 	if (evsel__is_hwmon(evsel)) {
 		err = evsel__hwmon_pmu_open(evsel, threads,
-					    start_cpu_map_idx,
-					    end_cpu_map_idx);
+					    start_cpu_map_idx, end_cpu_map_idx);
+		goto out;
+	}
+	if (evsel__is_nvme(evsel)) {
+		err = evsel__nvme_pmu_open(evsel, threads,
+					   start_cpu_map_idx, end_cpu_map_idx);
 		goto out;
 	}
 	if (evsel__is_drm(evsel)) {
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 8178858d168a..002e045e9015 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -202,6 +202,10 @@ struct evsel {
 			struct xyarray *start_times;
 			struct xyarray *accumulated_times;
 		} process_time;
+		/* For NVMe events */
+		struct {
+			struct xyarray *initial_values;
+		} nvme;
 	};
 	/* Is the tool's fd for /proc/pid/stat or /proc/stat. */
 	bool pid_stat;
diff --git a/tools/perf/util/nvme_pmu.c b/tools/perf/util/nvme_pmu.c
new file mode 100644
index 000000000000..e2391726d76e
--- /dev/null
+++ b/tools/perf/util/nvme_pmu.c
@@ -0,0 +1,564 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#include "counts.h"
+#include "debug.h"
+#include "evsel.h"
+#include "hashmap.h"
+#include "nvme_pmu.h"
+#include "pmu.h"
+#include <internal/xyarray.h>
+#include <internal/threadmap.h>
+#include <perf/threadmap.h>
+#include <sys/types.h>
+#include <assert.h>
+#include <ctype.h>
+#include <fcntl.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <api/fs/fs.h>
+#include <api/io.h>
+#include <api/io_dir.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/zalloc.h>
+#include <linux/unaligned.h>
+
+#ifdef HAVE_LIBNVME_SUPPORT
+#include <libnvme.h>
+
+
+#define NVME_CONFIG(log, size, offset) \
+	(((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))
+
+enum nvme_log_type {
+	NVME_LOG_SMART = 0,
+	NVME_LOG_ENDURANCE = 1,
+	NVME_LOG_FDP = 2,
+	NVME_LOG_ERROR = 3,
+	NVME_LOG_ZNS = 4,
+};
+
+#define NVME_SMART(size, field) \
+	NVME_CONFIG(NVME_LOG_SMART, size, offsetof(struct nvme_smart_log, field))
+
+#define NVME_ENDURANCE(size, field) \
+	NVME_CONFIG(NVME_LOG_ENDURANCE, size, offsetof(struct nvme_endurance_group_log, field))
+
+#define NVME_FDP(size, field) \
+	NVME_CONFIG(NVME_LOG_FDP, size, offsetof(struct nvme_fdp_stats_log, field))
+
+#define NVME_ERROR(size, field) \
+	NVME_CONFIG(NVME_LOG_ERROR, size, offsetof(struct nvme_error_log_page, field))
+
+#define NVME_ZNS(size, field) \
+	NVME_CONFIG(NVME_LOG_ZNS, size, offsetof(struct nvme_zns_changed_zone_log, field))
+
+struct nvme_event {
+	const char *name;
+	const char *desc;
+	const char *scale_unit;
+	uint64_t config;
+};
+
+static const struct nvme_event nvme_events[] = {
+	{ "smart_data_units_read",
+	  "Data units read (in 1000s of 512-byte units)",
+	  "512000B", NVME_SMART(16, data_units_read) },
+	{ "smart_data_units_written",
+	  "Data units written (in 1000s of 512-byte units)",
+	  "512000B", NVME_SMART(16, data_units_written) },
+	{ "smart_host_read_commands", "Host read commands", NULL, NVME_SMART(16, host_reads) },
+	{ "smart_host_write_commands", "Host write commands", NULL, NVME_SMART(16, host_writes) },
+	{ "smart_ctrl_busy_time", "Controller busy time", "60s", NVME_SMART(16, ctrl_busy_time) },
+	{ "smart_power_cycles", "Power cycles", NULL, NVME_SMART(16, power_cycles) },
+	{ "smart_power_on_hours", "Power on hours", "1h", NVME_SMART(16, power_on_hours) },
+	{ "smart_unsafe_shutdowns", "Unsafe shutdowns", NULL, NVME_SMART(16, unsafe_shutdowns) },
+	{ "smart_media_errors", "Media errors", NULL, NVME_SMART(16, media_errors) },
+	{ "smart_num_err_log_entries",
+	  "Number of error log entries",
+	  NULL, NVME_SMART(16, num_err_log_entries) },
+	{ "smart_warning_temp_time",
+	  "Warning temperature time",
+	  "60s", NVME_SMART(4, warning_temp_time) },
+	{ "smart_crit_comp_time",
+	  "Critical composite temperature time",
+	  "60s", NVME_SMART(4, critical_comp_time) },
+	{ "smart_temperature", "Temperature", "0.001'C", NVME_SMART(2, temperature) },
+
+	{ "endurance_percent_used",
+	  "Endurance group percentage used",
+	  NULL, NVME_ENDURANCE(1, percent_used) },
+	{ "endurance_data_units_read",
+	  "Endurance group data units read",
+	  "512000B", NVME_ENDURANCE(16, data_units_read) },
+	{ "endurance_data_units_written",
+	  "Endurance group data units written",
+	  "512000B", NVME_ENDURANCE(16, data_units_written) },
+	{ "endurance_media_units_written",
+	  "Endurance group media units written",
+	  "512000B", NVME_ENDURANCE(16, media_units_written) },
+	{ "endurance_host_read_cmds",
+	  "Endurance group host read commands",
+	  NULL, NVME_ENDURANCE(16, host_read_cmds) },
+	{ "endurance_host_write_cmds",
+	  "Endurance group host write commands",
+	  NULL, NVME_ENDURANCE(16, host_write_cmds) },
+	{ "endurance_num_err_info_log_entries",
+	  "Endurance group number of error information log entries",
+	  NULL, NVME_ENDURANCE(16, num_err_info_log_entries) },
+
+	{ "fdp_hbmw", "FDP host bytes with metadata written", "1B", NVME_FDP(16, hbmw) },
+	{ "fdp_mbmw", "FDP media bytes with metadata written", "1B", NVME_FDP(16, mbmw) },
+	{ "fdp_mbe", "FDP media bytes erased", "1B", NVME_FDP(16, mbe) },
+
+	{ "error_count", "Error info log error count", NULL, NVME_ERROR(8, error_count) },
+
+	{ "zns_nrzid", "ZNS changed zone nrzid", NULL, NVME_ZNS(2, nrzid) },
+};
+
+
+struct nvme_pmu {
+	struct perf_pmu pmu;
+	char *dev_name;
+	bool support_checked;
+	bool log_supported[256];
+};
+
+
+bool perf_pmu__is_nvme(const struct perf_pmu *pmu)
+{
+	return pmu && pmu->type >= PERF_PMU_TYPE_NVME_START &&
+		pmu->type <= PERF_PMU_TYPE_NVME_END;
+}
+
+bool evsel__is_nvme(const struct evsel *evsel)
+{
+	return perf_pmu__is_nvme(evsel->pmu);
+}
+
+struct perf_pmu *nvme_pmu__new(struct list_head *pmus, const char *sysfs_name, const char *name)
+{
+	struct nvme_pmu *nvm;
+	char buf[64];
+	__u32 type;
+
+	/*
+	 * sysfs_name is expected to look like "nvme0": parse the controller
+	 * number after the "nvme" prefix, falling back to 0 if there is none.
+	 */
+	type = PERF_PMU_TYPE_NVME_START + strtoul(sysfs_name + 4, NULL, 10);
+
+	if (type > PERF_PMU_TYPE_NVME_END) {
+		pr_err("Unable to encode NVMe type from %s in valid PMU type\n", sysfs_name);
+		return NULL;
+	}
+
+	snprintf(buf, sizeof(buf), "nvme_%s", name);
+
+	nvm = zalloc(sizeof(*nvm));
+	if (!nvm)
+		return NULL;
+
+	if (perf_pmu__init(&nvm->pmu, type, buf) != 0) {
+		free(nvm);
+		return NULL;
+	}
+
+	nvm->dev_name = strdup(sysfs_name);
+	if (!nvm->dev_name) {
+		perf_pmu__delete(&nvm->pmu);
+		return NULL;
+	}
+	nvm->pmu.alias_name = strdup(sysfs_name);
+	if (!nvm->pmu.alias_name) {
+		perf_pmu__delete(&nvm->pmu);
+		return NULL;
+	}
+	nvm->pmu.cpus = perf_cpu_map__new_int(0);
+	if (!nvm->pmu.cpus) {
+		perf_pmu__delete(&nvm->pmu);
+		return NULL;
+	}
+	INIT_LIST_HEAD(&nvm->pmu.format);
+	INIT_LIST_HEAD(&nvm->pmu.caps);
+
+	list_add_tail(&nvm->pmu.list, pmus);
+	return &nvm->pmu;
+}
+
+void nvme_pmu__exit(struct perf_pmu *pmu)
+{
+	struct nvme_pmu *nvm = container_of(pmu, struct nvme_pmu, pmu);
+
+	zfree(&nvm->dev_name);
+}
+
+
+
+static void nvme_pmu__check_support(struct nvme_pmu *nvm)
+{
+	int fd;
+	char path[PATH_MAX];
+	struct nvme_smart_log smart_log;
+	struct nvme_endurance_group_log endurance_log;
+	struct nvme_fdp_stats_log fdp_log;
+	struct nvme_error_log_page error_log;
+	struct nvme_zns_changed_zone_log zns_log;
+
+	if (nvm->support_checked)
+		return;
+
+	nvm->support_checked = true;
+
+	/* If the device cannot be opened, assume every log is supported. */
+	memset(nvm->log_supported, 1, sizeof(nvm->log_supported));
+
+	snprintf(path, sizeof(path), "/dev/%s", nvm->dev_name);
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return;
+
+	if (nvme_get_log_smart(fd, NVME_NSID_ALL, true, &smart_log) != 0)
+		nvm->log_supported[NVME_LOG_SMART] = false;
+
+	if (nvme_get_log_endurance_group(fd, 0, &endurance_log) != 0)
+		nvm->log_supported[NVME_LOG_ENDURANCE] = false;
+
+	if (nvme_get_log_fdp_stats(fd, 0, 0, sizeof(fdp_log), &fdp_log) != 0)
+		nvm->log_supported[NVME_LOG_FDP] = false;
+
+	if (nvme_get_log_error(fd, 1, true, &error_log) != 0)
+		nvm->log_supported[NVME_LOG_ERROR] = false;
+
+	if (nvme_get_log_zns_changed_zones(fd, NVME_NSID_ALL, true, &zns_log) != 0)
+		nvm->log_supported[NVME_LOG_ZNS] = false;
+
+	close(fd);
+}
+
+int nvme_pmu__for_each_event(struct perf_pmu *pmu, void *state, pmu_event_callback cb)
+{
+	struct nvme_pmu *nvm = container_of(pmu, struct nvme_pmu, pmu);
+	size_t i;
+
+	nvme_pmu__check_support(nvm);
+	for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+		const struct nvme_event *e = &nvme_events[i];
+		char alias_buf[64];
+		char desc_buf[256];
+		char encoding_buf[128];
+		struct pmu_event_info info = {
+			.pmu = pmu,
+			.name = e->name,
+			.alias = alias_buf,
+			.scale_unit = e->scale_unit,
+			.desc = desc_buf,
+			.long_desc = NULL,
+			.encoding_desc = encoding_buf,
+
+			.topic = "nvme",
+			.pmu_name = pmu->name,
+			.event_type_desc = "NVMe event",
+			.deprecated = !nvm->log_supported[(e->config >> 24) & 0xFF],
+		};
+
+		int ret;
+
+		snprintf(alias_buf, sizeof(alias_buf), "%s", e->name);
+		snprintf(desc_buf, sizeof(desc_buf), "%s", e->desc);
+		snprintf(encoding_buf, sizeof(encoding_buf), "%s/config=0x%llx/",
+			 pmu->name, (unsigned long long)e->config);
+
+		ret = cb(state, &info);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+size_t nvme_pmu__num_events(struct perf_pmu *pmu __maybe_unused)
+{
+	return ARRAY_SIZE(nvme_events);
+}
+
+bool nvme_pmu__have_event(struct perf_pmu *pmu __maybe_unused, const char *name)
+{
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+		if (!strcasecmp(name, nvme_events[i].name))
+			return true;
+	}
+	return false;
+}
+
+static int nvme_pmu__config_term(const struct nvme_pmu *nvm __maybe_unused,
+				 struct perf_event_attr *attr,
+				 struct parse_events_term *term,
+				 struct parse_events_error *err)
+{
+	if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER) {
+		size_t i;
+
+		for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+			if (!strcasecmp(term->config, nvme_events[i].name)) {
+				attr->config = nvme_events[i].config;
+				return 0;
+			}
+		}
+	}
+	if (err) {
+		char *err_str;
+
+		parse_events_error__handle(err, term->err_val,
+					asprintf(&err_str,
+						"unexpected nvme event term (%s) %s",
+						parse_events__term_type_str(term->type_term),
+						term->config) < 0
+					? strdup("unexpected nvme event term")
+					: err_str,
+					NULL);
+	}
+	return -EINVAL;
+}
+
+int nvme_pmu__config_terms(const struct perf_pmu *pmu,
+			   struct perf_event_attr *attr,
+			   struct parse_events_terms *terms,
+			   struct parse_events_error *err)
+{
+	struct nvme_pmu *nvm = container_of(pmu, struct nvme_pmu, pmu);
+	struct parse_events_term *term;
+
+	list_for_each_entry(term, &terms->terms, list) {
+		if (nvme_pmu__config_term(nvm, attr, term, err))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+int nvme_pmu__check_alias(struct parse_events_terms *terms, struct perf_pmu_info *info,
+			  struct parse_events_error *err)
+{
+	struct parse_events_term *term =
+		list_first_entry(&terms->terms, struct parse_events_term, list);
+
+	if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER) {
+		size_t i;
+
+		for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
+			if (!strcasecmp(term->config, nvme_events[i].name)) {
+				if (nvme_events[i].scale_unit) {
+					char *unit;
+
+					perf_pmu__convert_scale(nvme_events[i].scale_unit,
+								&unit, &info->scale);
+					info->unit = unit;
+				}
+				return 0;
+			}
+		}
+	}
+	if (err) {
+		char *err_str;
+
+		parse_events_error__handle(err, term->err_val,
+					asprintf(&err_str,
+						"unexpected nvme event term (%s) %s",
+						parse_events__term_type_str(term->type_term),
+						term->config) < 0
+					? strdup("unexpected nvme event term")
+					: err_str,
+					NULL);
+	}
+	return -EINVAL;
+}
+
+int perf_pmus__read_nvme_pmus(struct list_head *pmus)
+{
+	nvme_root_t r = nvme_scan(NULL);
+	nvme_host_t h;
+	nvme_subsystem_t s;
+	nvme_ctrl_t c;
+
+	if (!r)
+		return 0;
+
+	nvme_for_each_host(r, h) {
+		nvme_for_each_subsystem(h, s) {
+			nvme_subsystem_for_each_ctrl(s, c) {
+				nvme_pmu__new(pmus, nvme_ctrl_get_name(c), nvme_ctrl_get_name(c));
+			}
+		}
+	}
+	nvme_free_tree(r);
+	return 0;
+}
+
+
+static int nvme_pmu__read_val(int fd, uint64_t config, uint64_t *val)
+{
+	int log_type = (config >> 24) & 0xFF;
+	unsigned int size = (config >> 16) & 0xFF;
+	unsigned int offset = config & 0xFFFF;
+	uint8_t buf[4096] __attribute__((aligned(8)));
+	uint8_t *p;
+
+	if (log_type == NVME_LOG_SMART) {
+		if (offset + size > sizeof(struct nvme_smart_log))
+			return -EINVAL;
+		if (nvme_get_log_smart(fd, NVME_NSID_ALL, true, (struct nvme_smart_log *)buf) != 0)
+			return -EINVAL;
+
+		if (offset == offsetof(struct nvme_smart_log, temperature)) {
+			uint64_t kelvin = ((struct nvme_smart_log *)buf)->temperature[0] |
+					  (((struct nvme_smart_log *)buf)->temperature[1] << 8);
+			*val = (kelvin * 1000) - 273150;
+			return 0;
+		}
+	} else if (log_type == NVME_LOG_ENDURANCE) {
+		if (offset + size > sizeof(struct nvme_endurance_group_log))
+			return -EINVAL;
+		if (nvme_get_log_endurance_group(fd, 0,
+				(struct nvme_endurance_group_log *)buf) != 0)
+			return -EINVAL;
+	} else if (log_type == NVME_LOG_FDP) {
+		if (offset + size > sizeof(struct nvme_fdp_stats_log))
+			return -EINVAL;
+		if (nvme_get_log_fdp_stats(fd, 0, 0, sizeof(struct nvme_fdp_stats_log), buf) != 0)
+			return -EINVAL;
+	} else if (log_type == NVME_LOG_ERROR) {
+		if (offset + size > sizeof(struct nvme_error_log_page))
+			return -EINVAL;
+		if (nvme_get_log_error(fd, 1, true, (struct nvme_error_log_page *)buf) != 0)
+			return -EINVAL;
+	} else if (log_type == NVME_LOG_ZNS) {
+		if (offset + size > sizeof(struct nvme_zns_changed_zone_log))
+			return -EINVAL;
+		if (nvme_get_log_zns_changed_zones(fd, NVME_NSID_ALL, true,
+				(struct nvme_zns_changed_zone_log *)buf) != 0)
+			return -EINVAL;
+	} else {
+		return -EINVAL;
+	}
+
+	p = buf + offset;
+	if (size == 16 || size == 8)
+		*val = get_unaligned_le64(p);
+	else if (size == 4)
+		*val = get_unaligned_le32(p);
+	else if (size == 2)
+		*val = get_unaligned_le16(p);
+	else if (size == 1)
+		*val = *p;
+	else
+		return -EINVAL;
+
+	return 0;
+}
+
+static bool nvme_pmu__is_gauge(uint64_t config)
+{
+	if (config == NVME_SMART(2, temperature) ||
+	    config == NVME_ENDURANCE(1, percent_used) ||
+	    config == NVME_ZNS(2, nrzid))
+		return true;
+	return false;
+}
+
+#define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
+
+int evsel__nvme_pmu_open(struct evsel *evsel,
+			 struct perf_thread_map *threads,
+			 int start_cpu_map_idx, int end_cpu_map_idx)
+{
+	struct nvme_pmu *nvm = container_of(evsel->pmu, struct nvme_pmu, pmu);
+	int idx = 0, thread = 0, nthreads, err = 0;
+	char path[PATH_MAX];
+
+	snprintf(path, sizeof(path), "/dev/%s", nvm->dev_name);
+
+	nthreads = perf_thread_map__nr(threads);
+
+	if (!evsel->nvme.initial_values) {
+		int max_cpus = evsel->core.cpus ? perf_cpu_map__nr(evsel->core.cpus) : 1;
+
+		evsel->nvme.initial_values = xyarray__new(max_cpus, nthreads, sizeof(uint64_t));
+	}
+
+	for (idx = start_cpu_map_idx; idx < end_cpu_map_idx; idx++) {
+		for (thread = 0; thread < nthreads; thread++) {
+			int fd = open(path, O_RDONLY);
+
+			FD(evsel, idx, thread) = fd;
+			if (fd < 0) {
+				err = -errno;
+				goto out_close;
+			}
+			if (evsel->nvme.initial_values) {
+				uint64_t *initial_val =
+					xyarray__entry(evsel->nvme.initial_values, idx, thread);
+
+				if (nvme_pmu__read_val(fd, evsel->core.attr.config, initial_val))
+					*initial_val = 0;
+			}
+		}
+	}
+	return 0;
+out_close:
+	if (err)
+		threads->err_thread = thread;
+
+	do {
+		while (--thread >= 0) {
+			if (FD(evsel, idx, thread) >= 0)
+				close(FD(evsel, idx, thread));
+			FD(evsel, idx, thread) = -1;
+		}
+		thread = nthreads;
+	} while (--idx >= 0);
+	return err;
+}
+
+int evsel__nvme_pmu_read(struct evsel *evsel, int cpu_map_idx, int thread)
+{
+	int fd;
+	struct perf_counts_values *count, *old_count = NULL;
+	uint64_t val = 0;
+	uint64_t *initial_val = NULL;
+
+	if (evsel->prev_raw_counts)
+		old_count = perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
+
+	count = perf_counts(evsel->counts, cpu_map_idx, thread);
+	fd = FD(evsel, cpu_map_idx, thread);
+
+	if (fd < 0 || nvme_pmu__read_val(fd, evsel->core.attr.config, &val)) {
+		count->lost++;
+		return -EINVAL;
+	}
+
+	if (evsel->nvme.initial_values)
+		initial_val = xyarray__entry(evsel->nvme.initial_values, cpu_map_idx, thread);
+
+	if (old_count) {
+		if (nvme_pmu__is_gauge(evsel->core.attr.config))
+			count->val = old_count->val + val;
+		else
+			count->val = val - (initial_val ? *initial_val : 0);
+		count->run = old_count->run + 1;
+		count->ena = old_count->ena + 1;
+	} else {
+		if (nvme_pmu__is_gauge(evsel->core.attr.config))
+			count->val = val;
+		else
+			count->val = val - (initial_val ? *initial_val : 0);
+		count->run++;
+		count->ena++;
+	}
+	return 0;
+}
+
+
+#endif
diff --git a/tools/perf/util/nvme_pmu.h b/tools/perf/util/nvme_pmu.h
new file mode 100644
index 000000000000..6d5d2bbe4167
--- /dev/null
+++ b/tools/perf/util/nvme_pmu.h
@@ -0,0 +1,112 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+#ifndef __NVME_PMU_H
+#define __NVME_PMU_H
+
+#include "pmu.h"
+#include <stdbool.h>
+#include <errno.h>
+
+struct list_head;
+struct perf_thread_map;
+struct evsel;
+
+#ifdef HAVE_LIBNVME_SUPPORT
+struct perf_pmu *nvme_pmu__new(struct list_head *pmus, const char *sysfs_name, const char *name);
+void nvme_pmu__exit(struct perf_pmu *pmu);
+
+int nvme_pmu__for_each_event(struct perf_pmu *pmu, void *state, pmu_event_callback cb);
+size_t nvme_pmu__num_events(struct perf_pmu *pmu);
+bool nvme_pmu__have_event(struct perf_pmu *pmu, const char *name);
+int nvme_pmu__config_terms(const struct perf_pmu *pmu,
+			   struct perf_event_attr *attr,
+			   struct parse_events_terms *terms,
+			   struct parse_events_error *err);
+int nvme_pmu__check_alias(struct parse_events_terms *terms, struct perf_pmu_info *info,
+			  struct parse_events_error *err);
+
+bool perf_pmu__is_nvme(const struct perf_pmu *pmu);
+bool evsel__is_nvme(const struct evsel *evsel);
+
+int perf_pmus__read_nvme_pmus(struct list_head *pmus);
+
+int evsel__nvme_pmu_open(struct evsel *evsel,
+			 struct perf_thread_map *threads,
+			 int start_cpu_map_idx, int end_cpu_map_idx);
+int evsel__nvme_pmu_read(struct evsel *evsel, int cpu_map_idx, int thread);
+#else
+static inline struct perf_pmu *nvme_pmu__new(struct list_head *pmus __maybe_unused,
+					     const char *sysfs_name __maybe_unused,
+					     const char *name __maybe_unused)
+{
+	return NULL;
+}
+
+static inline void nvme_pmu__exit(struct perf_pmu *pmu __maybe_unused)
+{
+}
+
+static inline int nvme_pmu__for_each_event(struct perf_pmu *pmu __maybe_unused,
+					   void *state __maybe_unused,
+					   pmu_event_callback cb __maybe_unused)
+{
+	return 0;
+}
+
+static inline size_t nvme_pmu__num_events(struct perf_pmu *pmu __maybe_unused)
+{
+	return 0;
+}
+
+static inline bool nvme_pmu__have_event(struct perf_pmu *pmu __maybe_unused,
+					const char *name __maybe_unused)
+{
+	return false;
+}
+
+static inline int nvme_pmu__config_terms(const struct perf_pmu *pmu __maybe_unused,
+					 struct perf_event_attr *attr __maybe_unused,
+					 struct parse_events_terms *terms __maybe_unused,
+					 struct parse_events_error *err __maybe_unused)
+{
+	return -EINVAL;
+}
+
+static inline int nvme_pmu__check_alias(struct parse_events_terms *terms __maybe_unused,
+					struct perf_pmu_info *info __maybe_unused,
+					struct parse_events_error *err __maybe_unused)
+{
+	return -EINVAL;
+}
+
+static inline bool perf_pmu__is_nvme(const struct perf_pmu *pmu __maybe_unused)
+{
+	return false;
+}
+
+static inline bool evsel__is_nvme(const struct evsel *evsel __maybe_unused)
+{
+	return false;
+}
+
+static inline int perf_pmus__read_nvme_pmus(struct list_head *pmus __maybe_unused)
+{
+	return 0;
+}
+
+static inline int evsel__nvme_pmu_open(struct evsel *evsel __maybe_unused,
+				       struct perf_thread_map *threads __maybe_unused,
+				       int start_cpu_map_idx __maybe_unused,
+				       int end_cpu_map_idx __maybe_unused)
+{
+	return 0;
+}
+
+static inline int evsel__nvme_pmu_read(struct evsel *evsel __maybe_unused,
+				       int cpu_map_idx __maybe_unused,
+				       int thread __maybe_unused)
+{
+	return 0;
+}
+#endif
+
+#endif /* __NVME_PMU_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 9994709ef12b..26ec19753644 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -22,6 +22,7 @@
 #include "pmu.h"
 #include "drm_pmu.h"
 #include "hwmon_pmu.h"
+#include "nvme_pmu.h"
 #include "pmus.h"
 #include "tool_pmu.h"
 #include "tp_pmu.h"
@@ -1687,6 +1688,8 @@ int perf_pmu__config_terms(const struct perf_pmu *pmu,
 
 	if (perf_pmu__is_hwmon(pmu))
 		return hwmon_pmu__config_terms(pmu, attr, terms, err);
+	if (perf_pmu__is_nvme(pmu))
+		return nvme_pmu__config_terms(pmu, attr, terms, err);
 	if (perf_pmu__is_drm(pmu))
 		return drm_pmu__config_terms(pmu, attr, terms, err);
 
@@ -1851,6 +1854,10 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct parse_events_terms *head_
 		ret = hwmon_pmu__check_alias(head_terms, info, err);
 		goto out;
 	}
+	if (perf_pmu__is_nvme(pmu)) {
+		ret = nvme_pmu__check_alias(head_terms, info, err);
+		goto out;
+	}
 	if (perf_pmu__is_drm(pmu)) {
 		ret = drm_pmu__check_alias(pmu, head_terms, info, err);
 		goto out;
@@ -2071,6 +2078,8 @@ bool perf_pmu__have_event(struct perf_pmu *pmu, const char *name)
 		return tp_pmu__have_event(pmu, name);
 	if (perf_pmu__is_hwmon(pmu))
 		return hwmon_pmu__have_event(pmu, name);
+	if (perf_pmu__is_nvme(pmu))
+		return nvme_pmu__have_event(pmu, name);
 	if (perf_pmu__is_drm(pmu))
 		return drm_pmu__have_event(pmu, name);
 	if (perf_pmu__find_alias(pmu, name, /*load=*/ true) != NULL)
@@ -2092,6 +2101,8 @@ size_t perf_pmu__num_events(struct perf_pmu *pmu)
 		return tp_pmu__num_events(pmu);
 	if (perf_pmu__is_hwmon(pmu))
 		return hwmon_pmu__num_events(pmu);
+	if (perf_pmu__is_nvme(pmu))
+		return nvme_pmu__num_events(pmu);
 	if (perf_pmu__is_drm(pmu))
 		return drm_pmu__num_events(pmu);
 
@@ -2223,6 +2234,8 @@ int perf_pmu__for_each_event(struct perf_pmu *pmu, bool skip_duplicate_pmus,
 		return tp_pmu__for_each_event(pmu, state, cb);
 	if (perf_pmu__is_hwmon(pmu))
 		return hwmon_pmu__for_each_event(pmu, state, cb);
+	if (perf_pmu__is_nvme(pmu))
+		return nvme_pmu__for_each_event(pmu, state, cb);
 	if (perf_pmu__is_drm(pmu))
 		return drm_pmu__for_each_event(pmu, state, cb);
 
@@ -2714,6 +2727,8 @@ void perf_pmu__delete(struct perf_pmu *pmu)
 
 	if (perf_pmu__is_hwmon(pmu))
 		hwmon_pmu__exit(pmu);
+	else if (perf_pmu__is_nvme(pmu))
+		nvme_pmu__exit(pmu);
 	else if (perf_pmu__is_drm(pmu))
 		drm_pmu__exit(pmu);
 
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 0d9f3c57e8e8..0fe47dd429e8 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -45,6 +45,8 @@ enum pmu_kind {
 	PERF_PMU_KIND_DRM,
 	/* A perf tool provided HWMON PMU. */
 	PERF_PMU_KIND_HWMON,
+	/* A perf tool provided NVMe PMU. */
+	PERF_PMU_KIND_NVME,
 	/* Perf tool provided PMU for tool events like time. */
 	PERF_PMU_KIND_TOOL,
 	/* A testing PMU kind. */
@@ -53,7 +55,9 @@ enum pmu_kind {
 
 enum {
 	PERF_PMU_TYPE_PE_START    = 0,
-	PERF_PMU_TYPE_PE_END      = 0xFFFDFFFF,
+	PERF_PMU_TYPE_PE_END      = 0xFFFCFFFF,
+	PERF_PMU_TYPE_NVME_START  = 0xFFFD0000,
+	PERF_PMU_TYPE_NVME_END    = 0xFFFDFFFF,
 	PERF_PMU_TYPE_DRM_START   = 0xFFFE0000,
 	PERF_PMU_TYPE_DRM_END     = 0xFFFEFFFF,
 	PERF_PMU_TYPE_HWMON_START = 0xFFFF0000,
@@ -363,6 +367,8 @@ static inline enum pmu_kind perf_pmu__kind(const struct perf_pmu *pmu)
 	type = pmu->type;
 	if (type <= PERF_PMU_TYPE_PE_END)
 		return PERF_PMU_KIND_PE;
+	if (type <= PERF_PMU_TYPE_NVME_END)
+		return PERF_PMU_KIND_NVME;
 	if (type <= PERF_PMU_TYPE_DRM_END)
 		return PERF_PMU_KIND_DRM;
 	if (type <= PERF_PMU_TYPE_HWMON_END)
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 5e3f571450fe..83777f941e9a 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -17,6 +17,7 @@
 #include "pmus.h"
 #include "pmu.h"
 #include "hwmon_pmu.h"
+#include "nvme_pmu.h"
 #include "tool_pmu.h"
 #include "print-events.h"
 #include "strbuf.h"
@@ -44,18 +45,21 @@ enum perf_tool_pmu_type {
 	PERF_TOOL_PMU_TYPE_PE_OTHER,
 	PERF_TOOL_PMU_TYPE_TOOL,
 	PERF_TOOL_PMU_TYPE_HWMON,
+	PERF_TOOL_PMU_TYPE_NVME,
 	PERF_TOOL_PMU_TYPE_DRM,
 
 #define PERF_TOOL_PMU_TYPE_PE_CORE_MASK (1 << PERF_TOOL_PMU_TYPE_PE_CORE)
 #define PERF_TOOL_PMU_TYPE_PE_OTHER_MASK (1 << PERF_TOOL_PMU_TYPE_PE_OTHER)
 #define PERF_TOOL_PMU_TYPE_TOOL_MASK (1 << PERF_TOOL_PMU_TYPE_TOOL)
 #define PERF_TOOL_PMU_TYPE_HWMON_MASK (1 << PERF_TOOL_PMU_TYPE_HWMON)
+#define PERF_TOOL_PMU_TYPE_NVME_MASK (1 << PERF_TOOL_PMU_TYPE_NVME)
 #define PERF_TOOL_PMU_TYPE_DRM_MASK (1 << PERF_TOOL_PMU_TYPE_DRM)
 
 #define PERF_TOOL_PMU_TYPE_ALL_MASK (PERF_TOOL_PMU_TYPE_PE_CORE_MASK |	\
 					PERF_TOOL_PMU_TYPE_PE_OTHER_MASK | \
 					PERF_TOOL_PMU_TYPE_TOOL_MASK |	\
 					PERF_TOOL_PMU_TYPE_HWMON_MASK | \
+					PERF_TOOL_PMU_TYPE_NVME_MASK | \
 					PERF_TOOL_PMU_TYPE_DRM_MASK)
 };
 static unsigned int read_pmu_types;
@@ -175,12 +179,15 @@ struct perf_pmu *perf_pmus__find(const char *name)
 		return pmu;
 
 	/* Looking up an individual perf event PMU failed, check if a tool PMU should be read. */
-	if (!strncmp(name, "hwmon_", 6))
-		to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK;
-	else if (!strncmp(name, "drm_", 4))
+	if (!strncmp(name, "hwmon_", 6)) {
+		to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK;
+	} else if (!strncmp(name, "nvme_", 5)) {
+		to_read_pmus |= PERF_TOOL_PMU_TYPE_NVME_MASK;
+	} else if (!strncmp(name, "drm_", 4)) {
 		to_read_pmus |= PERF_TOOL_PMU_TYPE_DRM_MASK;
-	else if (!strcmp(name, "tool"))
+	} else if (!strcmp(name, "tool")) {
 		to_read_pmus |= PERF_TOOL_PMU_TYPE_TOOL_MASK;
+	}
 
 	if (to_read_pmus) {
 		pmu_read_sysfs(to_read_pmus);
@@ -279,6 +286,10 @@ static void pmu_read_sysfs(unsigned int to_read_types)
 	    (read_pmu_types & PERF_TOOL_PMU_TYPE_HWMON_MASK) == 0)
 		perf_pmus__read_hwmon_pmus(&other_pmus);
 
+	if ((to_read_types & PERF_TOOL_PMU_TYPE_NVME_MASK) != 0 &&
+	    (read_pmu_types & PERF_TOOL_PMU_TYPE_NVME_MASK) == 0)
+		perf_pmus__read_nvme_pmus(&other_pmus);
+
 	if ((to_read_types & PERF_TOOL_PMU_TYPE_DRM_MASK) != 0 &&
 	    (read_pmu_types & PERF_TOOL_PMU_TYPE_DRM_MASK) == 0)
 		perf_pmus__read_drm_pmus(&other_pmus);
@@ -387,6 +398,10 @@ struct perf_pmu *perf_pmus__scan_for_event(struct perf_pmu *pmu, const char *eve
 		if (strlen(event) > 4 && strncmp("drm-", event, 4) == 0)
 			to_read_pmus |= PERF_TOOL_PMU_TYPE_DRM_MASK;
 
+		/* Could the event be an nvme event? */
+		if (nvme_pmu__have_event(NULL, event))
+			to_read_pmus |= PERF_TOOL_PMU_TYPE_NVME_MASK;
+
 		pmu_read_sysfs(to_read_pmus);
 		pmu = list_prepare_entry(pmu, &core_pmus, list);
 	}
@@ -424,11 +439,14 @@ struct perf_pmu *perf_pmus__scan_matching_wildcard(struct perf_pmu *pmu, const c
 		 */
 		if (strisglob(wildcard)) {
 			to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK |
+				PERF_TOOL_PMU_TYPE_NVME_MASK |
 				PERF_TOOL_PMU_TYPE_DRM_MASK;
 		} else if (strlen(wildcard) >= 4 && strncmp("drm_", wildcard, 4) == 0) {
 			to_read_pmus |= PERF_TOOL_PMU_TYPE_DRM_MASK;
 		} else if (strlen(wildcard) >= 5 && strncmp("hwmon", wildcard, 5) == 0) {
 			to_read_pmus |= PERF_TOOL_PMU_TYPE_HWMON_MASK;
+		} else if (strlen(wildcard) >= 4 && strncmp("nvme", wildcard, 4) == 0) {
+			to_read_pmus |= PERF_TOOL_PMU_TYPE_NVME_MASK;
 		}
 
 		pmu_read_sysfs(to_read_pmus);
-- 
2.54.0.1099.g489fc7bff1-goog



* [PATCH v2 3/3] perf tests: Add NVMe PMU event parsing test
  2026-06-09 16:57 ` [PATCH v2 0/3] perf pmu: Add tool-provided NVMe PMU Ian Rogers
  2026-06-09 16:57   ` [PATCH v2 1/3] perf build: Add libnvme feature detection Ian Rogers
  2026-06-09 16:57   ` [PATCH v2 2/3] perf pmu: Implement tool-provided NVMe PMU Ian Rogers
@ 2026-06-09 16:57   ` Ian Rogers
  2 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2026-06-09 16:57 UTC
  To: irogers, yuzhuo
  Cc: 9erthalion6, acme, adrian.hunter, alexandre.chartre, ashelat,
	german.gomez, james.clark, jolsa, leo.yan, linux-kernel,
	linux-nvme, linux-perf-users, mingo, mjeanson, namhyung, peterz,
	tglozar

Introduce a unit test suite for the NVMe PMU event parser.
The test registers a mock 'nvme_nvme0' PMU and checks that each
mapped event (SMART, Endurance, FDP, Error, ZNS) parses to the
expected config value.

To support this, move the NVMe event config encoding macros and the
'nvme_log_type' enum from nvme_pmu.c to nvme_pmu.h.
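The parse check described above boils down to a name-to-config table compared against values computed with the same encoding macro. The following self-contained sketch illustrates that pattern under stated assumptions: struct mock_log is a hypothetical stand-in that does NOT reproduce any libnvme log layout, and mock_parse() only imitates the case-insensitive lookup done in nvme_pmu__config_term(), not perf's real parse_events() path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>

/* Packing copied from NVME_CONFIG() in the patch: log | size | offset. */
#define NVME_CONFIG(log, size, offset) \
	(((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))

/* Hypothetical stand-in for a log page; NOT the real libnvme layout. */
struct mock_log {
	uint8_t warning;
	uint8_t temperature[2];
	uint8_t pad[5];
	uint8_t data_units_read[16];
};

/* Log type 0 plays the role of NVME_LOG_SMART in this sketch. */
#define MOCK(size, field) NVME_CONFIG(0, size, offsetof(struct mock_log, field))

struct mock_event {
	const char *name;
	uint64_t config;
};

static const struct mock_event mock_events[] = {
	{ "smart_temperature", MOCK(2, temperature) },
	{ "smart_data_units_read", MOCK(16, data_units_read) },
};

/*
 * Hypothetical lookup in the style of nvme_pmu__config_term(): match the
 * event name case-insensitively, store its config and return 0, else -1.
 */
static int mock_parse(const char *name, uint64_t *config)
{
	for (size_t i = 0; i < sizeof(mock_events) / sizeof(mock_events[0]); i++) {
		if (!strcasecmp(name, mock_events[i].name)) {
			*config = mock_events[i].config;
			return 0;
		}
	}
	return -1;
}
```

Computing the expected value with the same macro (as test_events[] does with NVME_SMART()) checks the name-to-config mapping rather than the field offsets themselves.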

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/Build          |   1 +
 tools/perf/tests/builtin-test.c |   1 +
 tools/perf/tests/nvme_pmu.c     | 176 ++++++++++++++++++++++++++++++++
 tools/perf/tests/tests.h        |   1 +
 tools/perf/util/nvme_pmu.c      |  27 -----
 tools/perf/util/nvme_pmu.h      |  31 ++++++
 tools/perf/util/pmus.c          |   5 +
 tools/perf/util/pmus.h          |   1 +
 8 files changed, 216 insertions(+), 27 deletions(-)
 create mode 100644 tools/perf/tests/nvme_pmu.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 66944a4f4968..acaf5e3e728b 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -68,6 +68,7 @@ perf-test-y += event_groups.o
 perf-test-y += symbols.o
 perf-test-y += util.o
 perf-test-y += hwmon_pmu.o
+perf-test-y += nvme_pmu.o
 perf-test-y += tool_pmu.o
 perf-test-y += subcmd-help.o
 perf-test-y += kallsyms-split.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index b64fc2204f22..9a2fb706c2bc 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -83,6 +83,7 @@ static struct test_suite *generic_tests[] = {
 	&suite__pmu,
 	&suite__pmu_events,
 	&suite__hwmon_pmu,
+	&suite__nvme_pmu,
 	&suite__tool_pmu,
 	&suite__dso_data,
 	&suite__perf_evsel__roundtrip_name_test,
diff --git a/tools/perf/tests/nvme_pmu.c b/tools/perf/tests/nvme_pmu.c
new file mode 100644
index 000000000000..3c1de6e92efc
--- /dev/null
+++ b/tools/perf/tests/nvme_pmu.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#include "nvme_pmu.h"
+
+#include <errno.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+
+#include "debug.h"
+#include "evlist.h"
+#include "parse-events.h"
+#include "pmus.h"
+#include "tests.h"
+
+#ifdef HAVE_LIBNVME_SUPPORT
+
+static const struct test_event {
+	const char *name;
+	const char *alias;
+	uint64_t config;
+} test_events[] = {
+	{
+		"smart_temperature",
+		"smart_temperature",
+		NVME_SMART(2, temperature),
+	},
+	{
+		"smart_data_units_read",
+		"smart_data_units_read",
+		NVME_SMART(16, data_units_read),
+	},
+	{
+		"endurance_percent_used",
+		"endurance_percent_used",
+		NVME_ENDURANCE(1, percent_used),
+	},
+	{
+		"fdp_hbmw",
+		"fdp_hbmw",
+		NVME_FDP(16, hbmw),
+	},
+	{
+		"error_count",
+		"error_count",
+		NVME_ERROR(8, error_count),
+	},
+	{
+		"zns_nrzid",
+		"zns_nrzid",
+		NVME_ZNS(2, nrzid),
+	},
+};
+
+static int do_test(size_t i, bool with_pmu, bool with_alias)
+{
+	const char *test_event = with_alias ? test_events[i].alias : test_events[i].name;
+	struct evlist *evlist = evlist__new();
+	struct evsel *evsel;
+	struct parse_events_error err;
+	int ret;
+	char str[128];
+	bool found = false;
+
+	if (!evlist) {
+		pr_err("evlist allocation failed\n");
+		return TEST_FAIL;
+	}
+
+	if (with_pmu)
+		snprintf(str, sizeof(str), "nvme_nvme0/%s/", test_event);
+	else
+		strlcpy(str, test_event, sizeof(str));
+
+	pr_debug("Testing '%s'\n", str);
+	parse_events_error__init(&err);
+	ret = parse_events(evlist, str, &err);
+	if (ret) {
+		pr_debug("FAILED %s:%d failed to parse event '%s', err %d\n",
+			 __FILE__, __LINE__, str, ret);
+		parse_events_error__print(&err, str);
+		ret = TEST_FAIL;
+		goto out;
+	}
+
+	ret = TEST_OK;
+	if (with_pmu ? (evlist->core.nr_entries != 1) : (evlist->core.nr_entries < 1)) {
+		pr_debug("FAILED %s:%d Unexpected number of events for '%s' of %d\n",
+			 __FILE__, __LINE__, str, evlist->core.nr_entries);
+		ret = TEST_FAIL;
+		goto out;
+	}
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (!evsel->pmu || !evsel->pmu->name ||
+		    strcmp(evsel->pmu->name, "nvme_nvme0"))
+			continue;
+
+		if (evsel->core.attr.config != test_events[i].config) {
+			pr_debug("FAILED %s:%d Unexpected config for '%s', %"
+				 PRIu64 " != %" PRIu64 "\n",
+				 __FILE__, __LINE__, str,
+				 (uint64_t)evsel->core.attr.config,
+				 test_events[i].config);
+			ret = TEST_FAIL;
+			goto out;
+		}
+		found = true;
+	}
+
+	if (!found) {
+		pr_debug("FAILED %s:%d Didn't find nvme event '%s' in parsed evsels\n",
+			 __FILE__, __LINE__, str);
+		ret = TEST_FAIL;
+	}
+
+out:
+	parse_events_error__exit(&err);
+	evlist__delete(evlist);
+	return ret;
+}
+
+static int test__nvme_pmu(bool with_pmu)
+{
+	struct perf_pmu *pmu = perf_pmus__add_test_nvme_pmu("nvme0", "nvme0");
+	int ret = TEST_OK;
+
+	if (!pmu)
+		return TEST_FAIL;
+
+	for (size_t i = 0; i < ARRAY_SIZE(test_events); i++) {
+		ret = do_test(i, with_pmu, /*with_alias=*/false);
+		if (ret != TEST_OK)
+			break;
+
+		ret = do_test(i, with_pmu, /*with_alias=*/true);
+		if (ret != TEST_OK)
+			break;
+	}
+
+	list_del(&pmu->list);
+	perf_pmu__delete(pmu);
+	return ret;
+}
+
+static int test__nvme_pmu_without_pmu(struct test_suite *test __maybe_unused,
+				      int subtest __maybe_unused)
+{
+	return test__nvme_pmu(/*with_pmu=*/false);
+}
+
+static int test__nvme_pmu_with_pmu(struct test_suite *test __maybe_unused,
+				   int subtest __maybe_unused)
+{
+	return test__nvme_pmu(/*with_pmu=*/true);
+}
+
+static struct test_case tests__nvme_pmu[] = {
+	TEST_CASE("Parsing without PMU name", nvme_pmu_without_pmu),
+	TEST_CASE("Parsing with PMU name", nvme_pmu_with_pmu),
+	{	.name = NULL, }
+};
+
+struct test_suite suite__nvme_pmu = {
+	.desc = "NVMe PMU",
+	.test_cases = tests__nvme_pmu,
+};
+
+#else
+
+struct test_suite suite__nvme_pmu = {
+	.desc = "NVMe PMU",
+	.test_cases = NULL,
+};
+
+#endif
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index bf8ff7d54727..abffa51c5937 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -115,6 +115,7 @@ DECLARE_SUITE(syscall_openat_tp_fields);
 DECLARE_SUITE(pmu);
 DECLARE_SUITE(pmu_events);
 DECLARE_SUITE(hwmon_pmu);
+DECLARE_SUITE(nvme_pmu);
 DECLARE_SUITE(tool_pmu);
 DECLARE_SUITE(attr);
 DECLARE_SUITE(dso_data);
diff --git a/tools/perf/util/nvme_pmu.c b/tools/perf/util/nvme_pmu.c
index e2391726d76e..c58048f69b43 100644
--- a/tools/perf/util/nvme_pmu.c
+++ b/tools/perf/util/nvme_pmu.c
@@ -26,33 +26,6 @@
 #ifdef HAVE_LIBNVME_SUPPORT
 #include <libnvme.h>
 
-
-#define NVME_CONFIG(log, size, offset) \
-	(((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))
-
-enum nvme_log_type {
-	NVME_LOG_SMART = 0,
-	NVME_LOG_ENDURANCE = 1,
-	NVME_LOG_FDP = 2,
-	NVME_LOG_ERROR = 3,
-	NVME_LOG_ZNS = 4,
-};
-
-#define NVME_SMART(size, field) \
-	NVME_CONFIG(NVME_LOG_SMART, size, offsetof(struct nvme_smart_log, field))
-
-#define NVME_ENDURANCE(size, field) \
-	NVME_CONFIG(NVME_LOG_ENDURANCE, size, offsetof(struct nvme_endurance_group_log, field))
-
-#define NVME_FDP(size, field) \
-	NVME_CONFIG(NVME_LOG_FDP, size, offsetof(struct nvme_fdp_stats_log, field))
-
-#define NVME_ERROR(size, field) \
-	NVME_CONFIG(NVME_LOG_ERROR, size, offsetof(struct nvme_error_log_page, field))
-
-#define NVME_ZNS(size, field) \
-	NVME_CONFIG(NVME_LOG_ZNS, size, offsetof(struct nvme_zns_changed_zone_log, field))
-
 struct nvme_event {
 	const char *name;
 	const char *desc;
diff --git a/tools/perf/util/nvme_pmu.h b/tools/perf/util/nvme_pmu.h
index 6d5d2bbe4167..9203f461f381 100644
--- a/tools/perf/util/nvme_pmu.h
+++ b/tools/perf/util/nvme_pmu.h
@@ -6,6 +6,37 @@
 #include <stdbool.h>
 #include <errno.h>
 
+#ifdef HAVE_LIBNVME_SUPPORT
+#include <libnvme.h>
+#include <stddef.h>
+
+#define NVME_CONFIG(log, size, offset) \
+	(((uint64_t)(log) << 24) | ((uint64_t)(size) << 16) | (offset))
+
+enum nvme_log_type {
+	NVME_LOG_SMART = 0,
+	NVME_LOG_ENDURANCE = 1,
+	NVME_LOG_FDP = 2,
+	NVME_LOG_ERROR = 3,
+	NVME_LOG_ZNS = 4,
+};
+
+#define NVME_SMART(size, field) \
+	NVME_CONFIG(NVME_LOG_SMART, size, offsetof(struct nvme_smart_log, field))
+
+#define NVME_ENDURANCE(size, field) \
+	NVME_CONFIG(NVME_LOG_ENDURANCE, size, offsetof(struct nvme_endurance_group_log, field))
+
+#define NVME_FDP(size, field) \
+	NVME_CONFIG(NVME_LOG_FDP, size, offsetof(struct nvme_fdp_stats_log, field))
+
+#define NVME_ERROR(size, field) \
+	NVME_CONFIG(NVME_LOG_ERROR, size, offsetof(struct nvme_error_log_page, field))
+
+#define NVME_ZNS(size, field) \
+	NVME_CONFIG(NVME_LOG_ZNS, size, offsetof(struct nvme_zns_changed_zone_log, field))
+#endif
+
 struct list_head;
 struct perf_thread_map;
 struct evsel;
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 83777f941e9a..1c45164ae244 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -909,6 +909,11 @@ struct perf_pmu *perf_pmus__add_test_hwmon_pmu(const char *hwmon_dir,
 	return hwmon_pmu__new(&other_pmus, hwmon_dir, sysfs_name, name);
 }
 
+struct perf_pmu *perf_pmus__add_test_nvme_pmu(const char *sysfs_name, const char *name)
+{
+	return nvme_pmu__new(&other_pmus, sysfs_name, name);
+}
+
 struct perf_pmu *perf_pmus__fake_pmu(void)
 {
 	static struct perf_pmu fake = {
diff --git a/tools/perf/util/pmus.h b/tools/perf/util/pmus.h
index 0d55edb3f2fc..2045d4cf44d1 100644
--- a/tools/perf/util/pmus.h
+++ b/tools/perf/util/pmus.h
@@ -37,6 +37,7 @@ struct perf_pmu *perf_pmus__add_test_pmu(int test_sysfs_dirfd, const char *name)
 struct perf_pmu *perf_pmus__add_test_hwmon_pmu(const char *hwmon_dir,
 					       const char *sysfs_name,
 					       const char *name);
+struct perf_pmu *perf_pmus__add_test_nvme_pmu(const char *sysfs_name, const char *name);
 struct perf_pmu *perf_pmus__fake_pmu(void);
 struct perf_pmu *perf_pmus__find_core_pmu(void);
 
-- 
2.54.0.1099.g489fc7bff1-goog



* Re: [PATCH v2 1/3] perf build: Add libnvme feature detection
  2026-06-09 16:57   ` [PATCH v2 1/3] perf build: Add libnvme feature detection Ian Rogers
@ 2026-06-09 17:08     ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09 17:08 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The libnvme feature test hardcodes the `-lnvme` compiler flag, allowing the test to pass even if `pkg-config` is missing or fails to locate `libnvme.pc` (provided the library is in a standard system path). However, `tools/perf/Makefile.config` strictly relies on the output of `pkg-config --libs-only-l libnvme` to populate `EXTLIBS`. This mismatch causes the build system to enable NVMe PMU support based on the successful feature test, but then fail during the final linking stage of the `perf` executable due to missing `-lnvme` flags, resulting in undefined references to libnvme functions.
--

commit a99ed0c67b6c2d89a762f7be2b57899a90210460
Author: Ian Rogers <irogers@google.com>

perf build: Add libnvme feature detection

Introduce a feature check for libnvme so that we can conditionally compile
the NVMe PMU driver. This checks for the availability of the libnvme headers
and library using pkg-config.

> diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
> index 62909a9c799d3..fd00e0be60690 100644
> --- a/tools/build/feature/Makefile
> +++ b/tools/build/feature/Makefile

[ ... ]

> @@ -147,6 +147,9 @@ $(OUTPUT)test-libcap.bin:
>  $(OUTPUT)test-libelf.bin:
>  	$(BUILD) -lelf
>  
> +$(OUTPUT)test-libnvme.bin:
> +	$(BUILD) -lnvme

[Severity: Low]
Does hardcoding the -lnvme flag here bypass the pkg-config requirement?

If libnvme is installed in standard paths but its pkg-config file is missing
or pkg-config is absent, this feature test will still pass.

> +
>  $(OUTPUT)test-eventfd.bin:
>  	$(BUILD)

[ ... ]

> diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
> index 6e7b15fab2ecc..79c4ad10bc181 100644
> --- a/tools/perf/Makefile.config
> +++ b/tools/perf/Makefile.config

[ ... ]

> @@ -1060,6 +1060,18 @@ ifndef NO_CAPSTONE
>    endif
>  endif
>  
> +ifndef NO_LIBNVME
> +  $(call feature_check,libnvme)
> +  ifeq ($(feature-libnvme), 1)
> +    CFLAGS += -DHAVE_LIBNVME_SUPPORT $(shell $(PKG_CONFIG) --cflags libnvme)
> +    LDFLAGS += $(shell $(PKG_CONFIG) --libs-only-L libnvme)
> +    EXTLIBS += $(shell $(PKG_CONFIG) --libs-only-l libnvme)

[Severity: Low]
Since EXTLIBS relies strictly on pkg-config here, will the build fail during
final linking if the feature test succeeded using the hardcoded -lnvme flag?

If pkg-config doesn't return the required flags, the perf executable would
encounter undefined references instead of gracefully falling back to a build
without NVMe PMU support.

> +    $(call detected,CONFIG_LIBNVME)
> +  else
> +    msg := $(warning No libnvme found, disables NVMe PMU support, please install libnvme-dev/libnvme-devel);
> +  endif
> +endif

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609165726.786694-1-irogers@google.com?part=1


* Re: [PATCH v2 2/3] perf pmu: Implement tool-provided NVMe PMU
  2026-06-09 16:57   ` [PATCH v2 2/3] perf pmu: Implement tool-provided NVMe PMU Ian Rogers
@ 2026-06-09 17:19     ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09 17:19 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Silent fallback on memory allocation failure for `evsel->nvme.initial_values` leads to reporting absolute lifetime metric values instead of interval deltas.
- [High] Passing a potentially NULL `term->config` to `asprintf` with `%s` causes undefined behavior or crashes on certain libcs.
- [Medium] The PMU falsely advertises raw configuration encodings that it actively rejects.
- [Medium] Temperature conversion underflows for sub-zero Celsius values.
--

commit eb0821bb5f35088b9c452cacc4d9efb1228b06b1
Author: Ian Rogers <irogers@google.com>

perf pmu: Implement tool-provided NVMe PMU

Add a tool-provided PMU for NVMe devices using libnvme. This PMU exposes
metrics from various NVMe logs as perf events under the 'nvme_nvmeX' PMUs.

> diff --git a/tools/perf/util/nvme_pmu.c b/tools/perf/util/nvme_pmu.c
> new file mode 100644
> index 0000000000000..e2391726d76e7
> --- /dev/null
> +++ b/tools/perf/util/nvme_pmu.c
[ ... ]
> +static int nvme_pmu__config_term(const struct nvme_pmu *nvm __maybe_unused,
> +				 struct perf_event_attr *attr,
> +				 struct parse_events_term *term,
> +				 struct parse_events_error *err)
> +{
> +	if (term->type_term == PARSE_EVENTS__TERM_TYPE_USER) {

[Severity: Medium]
In nvme_pmu__for_each_event(), the encoding description is formatted as
config=0x%lx. However, here in nvme_pmu__config_term() it seems only
PARSE_EVENTS__TERM_TYPE_USER is accepted.

Would this incorrectly reject valid config terms that match the advertised
encoding if a user manually specifies them? For example, if they were to
use a command like: perf stat -e nvme_nvme0/config=0x123/

> +		size_t i;
> +
> +		for (i = 0; i < ARRAY_SIZE(nvme_events); i++) {
> +			if (!strcasecmp(term->config, nvme_events[i].name)) {
> +				attr->config = nvme_events[i].config;
> +				return 0;
> +			}
> +		}
> +	}
> +	if (err) {
> +		char *err_str;
> +
> +		parse_events_error__handle(err, term->err_val,
> +					asprintf(&err_str,
> +						"unexpected nvme event term (%s) %s",
> +						parse_events__term_type_str(term->type_term),
> +						term->config) < 0

[Severity: High]
If a user specifies a non-string config term like name=myname or a bare
number, the parser might create a term with a NULL term->config string key.

Does passing a potentially NULL term->config to asprintf() with %s cause
undefined behavior or a crash on certain libc implementations (like musl)?

> +					? strdup("unexpected nvme event term")
> +					: err_str,
> +					NULL);
> +	}
> +	return -EINVAL;
> +}
[ ... ]
> +int nvme_pmu__check_alias(struct parse_events_terms *terms, struct perf_pmu_info *info,
> +			  struct parse_events_error *err)
> +{
[ ... ]
> +	if (err) {
> +		char *err_str;
> +
> +		parse_events_error__handle(err, term->err_val,
> +					asprintf(&err_str,
> +						"unexpected nvme event term (%s) %s",
> +						parse_events__term_type_str(term->type_term),
> +						term->config) < 0

[Severity: High]
Similarly, this error formatting path also appears to suffer from the same
potential NULL dereference of term->config.

> +					? strdup("unexpected nvme event term")
> +					: err_str,
> +					NULL);
> +	}
> +	return -EINVAL;
> +}
[ ... ]
> +static int nvme_pmu__read_val(int fd, uint64_t config, uint64_t *val)
> +{
[ ... ]
> +	if (log_type == NVME_LOG_SMART) {
> +		if (offset + size > sizeof(struct nvme_smart_log))
> +			return -EINVAL;
> +		if (nvme_get_log_smart(fd, NVME_NSID_ALL, true, (struct nvme_smart_log *)buf) != 0)
> +			return -EINVAL;
> +
> +		if (offset == offsetof(struct nvme_smart_log, temperature)) {
> +			uint64_t kelvin = ((struct nvme_smart_log *)buf)->temperature[0] |
> +					  (((struct nvme_smart_log *)buf)->temperature[1] << 8);
> +			*val = (kelvin * 1000) - 273150;

[Severity: Medium]
Does this temperature conversion risk an unsigned integer underflow for
sub-zero Celsius values?

If the NVMe SMART temperature drops below 0 degrees Celsius (the field
holds whole kelvins, so any reading of 273 K or less), kelvin * 1000 is
less than 273150. Since kelvin is a uint64_t, the subtraction wraps around
and a huge value would be reported instead of a valid negative temperature.

> +			return 0;
> +		}
> +	} else if (log_type == NVME_LOG_ENDURANCE) {
[ ... ]
> +int evsel__nvme_pmu_open(struct evsel *evsel,
> +			 struct perf_thread_map *threads,
> +			 int start_cpu_map_idx, int end_cpu_map_idx)
> +{
> +	struct nvme_pmu *nvm = container_of(evsel->pmu, struct nvme_pmu, pmu);
> +	int idx = 0, thread = 0, nthreads, err = 0;
> +	char path[PATH_MAX];
> +
> +	snprintf(path, sizeof(path), "/dev/%s", nvm->dev_name);
> +
> +	nthreads = perf_thread_map__nr(threads);
> +
> +	if (!evsel->nvme.initial_values) {
> +		int max_cpus = evsel->core.cpus ? perf_cpu_map__nr(evsel->core.cpus) : 1;
> +
> +		evsel->nvme.initial_values = xyarray__new(max_cpus, nthreads, sizeof(uint64_t));

[Severity: High]
What happens if xyarray__new() fails and returns NULL here?

It looks like the allocation failure is silently ignored, and the open
operation succeeds with evsel->nvme.initial_values left as NULL.
Later in evsel__nvme_pmu_read(), the baseline subtraction defaults to 0.

Could this cause the perf tool to silently report the absolute lifetime
device counter on the first measurement interval instead of the expected
interval deltas? Should this check for allocation failure and return
-ENOMEM instead?

> +	}
> +
> +	for (idx = start_cpu_map_idx; idx < end_cpu_map_idx; idx++) {

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609165726.786694-1-irogers@google.com?part=2



Thread overview: 12+ messages
2026-06-09  7:03 [PATCH v1 0/3] perf pmu: Add tool-provided NVMe PMU Ian Rogers
2026-06-09  7:03 ` [PATCH v1 1/3] perf build: Add libnvme feature detection Ian Rogers
2026-06-09  7:12   ` sashiko-bot
2026-06-09  7:03 ` [PATCH v1 2/3] perf pmu: Implement tool-provided NVMe PMU Ian Rogers
2026-06-09  7:21   ` sashiko-bot
2026-06-09  7:03 ` [PATCH v1 3/3] perf tests: Add NVMe PMU event parsing test Ian Rogers
2026-06-09 16:57 ` [PATCH v2 0/3] perf pmu: Add tool-provided NVMe PMU Ian Rogers
2026-06-09 16:57   ` [PATCH v2 1/3] perf build: Add libnvme feature detection Ian Rogers
2026-06-09 17:08     ` sashiko-bot
2026-06-09 16:57   ` [PATCH v2 2/3] perf pmu: Implement tool-provided NVMe PMU Ian Rogers
2026-06-09 17:19     ` sashiko-bot
2026-06-09 16:57   ` [PATCH v2 3/3] perf tests: Add NVMe PMU event parsing test Ian Rogers
