* [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
@ 2026-02-20 17:48 Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 1/7] nvme: export command retry count " Nilay Shroff
` (9 more replies)
0 siblings, 10 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-02-20 17:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
Hi,
The NVMe driver encounters various events and conditions during normal
operation that are either not tracked today or not exposed to userspace
via sysfs. Lack of visibility into these events can make it difficult to
diagnose subtle issues related to controller behavior, multipath
stability, and I/O reliability.
This patchset adds several diagnostic counters that provide improved
observability into NVMe behavior. These counters are intended to help
users understand events such as transient path unavailability,
controller retries/reconnect/reset, failovers, and I/O failures. They
can also be consumed by monitoring tools such as nvme-top.
Specifically, this series proposes to export the following counters via
sysfs:
- Command retry count
- Multipath failover count
- Command error count
- I/O requeue count
- I/O failure count
- Controller reset event counts
- Controller reconnect counts
The patchset consists of seven patches:
Patch 1: Export command retry count
Patch 2: Export multipath failover count
Patch 3: Export command error count
Patch 4: Export I/O requeue count
Patch 5: Export I/O failure count
Patch 6: Export controller reset event counts
Patch 7: Export controller reconnect event count
Please note that this patchset doesn't make any functional changes; it
only exports the relevant counters to user space via sysfs.
As usual, feedback/comments/suggestions are welcome!
Changes from v2:
- Make the sysfs attributes writable so that users can reset the
stat counters if needed (Sagi)
- The controller reconnect counter nr_reconnects can reset
to zero once a connection is re-established, so instead of
exposing nr_reconnects via sysfs, introduce a new counter
that accumulates the reconnect attempts and export this
accumulated counter via sysfs (Sagi)
Link to v2: https://lore.kernel.org/all/20260205124810.682559-1-nilay@linux.ibm.com/
Changes from v1:
- Remove export of stats for admin command retry count (Keith)
- Use size_add() to ensure stat counters don't overflow (Keith)
Link to v1: https://lore.kernel.org/all/20260130182028.885089-1-nilay@linux.ibm.com/
Nilay Shroff (7):
nvme: export command retry count via sysfs
nvme: export multipath failover count via sysfs
nvme: export command error counters via sysfs
nvme: export I/O requeue count when no path is available via sysfs
nvme: export I/O failure count when no path is available via sysfs
nvme: export controller reset event count via sysfs
nvme: export controller reconnect event count via sysfs
drivers/nvme/host/core.c | 18 +++-
drivers/nvme/host/fc.c | 5 +
drivers/nvme/host/multipath.c | 89 ++++++++++++++++++
drivers/nvme/host/nvme.h | 13 ++-
drivers/nvme/host/rdma.c | 4 +
drivers/nvme/host/sysfs.c | 167 ++++++++++++++++++++++++++++++++++
drivers/nvme/host/tcp.c | 3 +
7 files changed, 297 insertions(+), 2 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCHv3 1/7] nvme: export command retry count via sysfs
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
@ 2026-02-20 17:48 ` Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 2/7] nvme: export multipath failover " Nilay Shroff
` (8 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-02-20 17:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When Advanced Command Retry Enable (ACRE) is configured, a controller
may interrupt command execution and return a completion status
indicating command interrupted with the DNR bit cleared. In this case,
the driver retries the command based on the Command Retry Delay (CRD)
value provided in the completion status.
Currently, these command retries are handled entirely within the NVMe
driver and are not visible to userspace. As a result, there is no
observability into retry behavior, which can be a useful diagnostic
signal.
Expose the command retry count through sysfs to provide visibility
into retry activity. This information can help identify controller-side
congestion under load and enables comparison across paths in multipath
setups (for example, detecting cases where one path experiences
significantly more retries than another under identical workloads).
This exported metric is intended for diagnostics and monitoring tools
such as nvme-top, and does not change command retry behavior. A new
sysfs attribute named "command_retries" is added for this purpose.
This attribute is both readable and writable, so users can reset the
counter if needed.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/core.c | 4 ++++
drivers/nvme/host/nvme.h | 2 +-
drivers/nvme/host/sysfs.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 19b67cf5d550..212dabc807bb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -323,6 +323,7 @@ static void nvme_retry_req(struct request *req)
{
unsigned long delay = 0;
u16 crd;
+ struct nvme_ns *ns = req->q->queuedata;
/* The mask and shift result must be <= 3 */
crd = (nvme_req(req)->status & NVME_STATUS_CRD) >> 11;
@@ -330,6 +331,9 @@ static void nvme_retry_req(struct request *req)
delay = nvme_req(req)->ctrl->crdt[crd - 1] * 100;
nvme_req(req)->retries++;
+ if (ns)
+ WRITE_ONCE(ns->retries, size_add(READ_ONCE(ns->retries), 1));
+
blk_mq_requeue_request(req, false);
blk_mq_delay_kick_requeue_list(req->q, delay);
}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9a5f28c5103c..237829cdc151 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -359,7 +359,6 @@ struct nvme_ctrl {
unsigned long ka_last_check_time;
struct work_struct fw_act_work;
unsigned long events;
-
#ifdef CONFIG_NVME_MULTIPATH
/* asymmetric namespace access: */
u8 anacap;
@@ -535,6 +534,7 @@ struct nvme_ns {
enum nvme_ana_state ana_state;
u32 ana_grpid;
#endif
+ size_t retries;
struct list_head siblings;
struct kref kref;
struct nvme_ns_head *head;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 29430949ce2f..11e7016954a7 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -246,6 +246,31 @@ static ssize_t nuse_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(nuse);
+static ssize_t command_retries_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+ return sysfs_emit(buf, "%lu\n", READ_ONCE(ns->retries));
+}
+
+static ssize_t command_retries_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ unsigned long retries;
+ int err;
+ struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+ err = kstrtoul(buf, 0, &retries);
+ if (err)
+ return -EINVAL;
+
+ WRITE_ONCE(ns->retries, retries);
+
+ return count;
+}
+static DEVICE_ATTR_RW(command_retries);
+
static struct attribute *nvme_ns_attrs[] = {
&dev_attr_wwid.attr,
&dev_attr_uuid.attr,
@@ -263,6 +288,7 @@ static struct attribute *nvme_ns_attrs[] = {
&dev_attr_delayed_removal_secs.attr,
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
+ &dev_attr_command_retries.attr,
NULL,
};
@@ -285,6 +311,10 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (!memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
return 0;
}
+ if (a == &dev_attr_command_retries.attr) {
+ if (nvme_disk_is_ns_head(dev_to_disk(dev)))
+ return 0;
+ }
#ifdef CONFIG_NVME_MULTIPATH
if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
/* per-path attr */
--
2.52.0
* [PATCHv3 2/7] nvme: export multipath failover count via sysfs
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 1/7] nvme: export command retry count " Nilay Shroff
@ 2026-02-20 17:48 ` Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 3/7] nvme: export command error counters " Nilay Shroff
` (7 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-02-20 17:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When an NVMe command completes with a path-specific error, the NVMe
driver may retry the command on an alternate controller or path if one
is available. These failover events indicate that I/O was redirected
away from the original path.
Currently, the number of times requests are failed over to another
available path is not visible to userspace. Exposing this information
can be useful for diagnosing path health and stability.
Export a multipath failover count via a sysfs attribute named
"multipath_failover_count". The attribute is both readable and
writable, allowing users to reset the counter. It can be consumed by
monitoring tools such as nvme-top to help identify paths that
consistently trigger failovers under load.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/multipath.c | 27 +++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/sysfs.c | 5 +++++
3 files changed, 34 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 174027d1cc19..c8ae935658a4 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -142,6 +142,7 @@ void nvme_failover_req(struct request *req)
struct bio *bio;
nvme_mpath_clear_current_path(ns);
+ WRITE_ONCE(ns->failover, size_add(READ_ONCE(ns->failover), 1));
/*
* If we got back an ANA error, we know the controller is alive but not
@@ -1168,6 +1169,32 @@ static ssize_t delayed_removal_secs_store(struct device *dev,
DEVICE_ATTR_RW(delayed_removal_secs);
+static ssize_t multipath_failover_count_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+ return sysfs_emit(buf, "%lu\n", READ_ONCE(ns->failover));
+}
+
+static ssize_t multipath_failover_count_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ unsigned long failover;
+ int ret;
+ struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+ ret = kstrtoul(buf, 0, &failover);
+ if (ret)
+ return -EINVAL;
+
+ WRITE_ONCE(ns->failover, failover);
+
+ return count;
+}
+
+DEVICE_ATTR_RW(multipath_failover_count);
+
static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 237829cdc151..6307243fd216 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -533,6 +533,7 @@ struct nvme_ns {
#ifdef CONFIG_NVME_MULTIPATH
enum nvme_ana_state ana_state;
u32 ana_grpid;
+ size_t failover;
#endif
size_t retries;
struct list_head siblings;
@@ -1000,6 +1001,7 @@ extern struct device_attribute dev_attr_ana_state;
extern struct device_attribute dev_attr_queue_depth;
extern struct device_attribute dev_attr_numa_nodes;
extern struct device_attribute dev_attr_delayed_removal_secs;
+extern struct device_attribute dev_attr_multipath_failover_count;
extern struct device_attribute subsys_attr_iopolicy;
static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 11e7016954a7..78c3a6f78ef8 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -286,6 +286,7 @@ static struct attribute *nvme_ns_attrs[] = {
&dev_attr_queue_depth.attr,
&dev_attr_numa_nodes.attr,
&dev_attr_delayed_removal_secs.attr,
+ &dev_attr_multipath_failover_count.attr,
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
&dev_attr_command_retries.attr,
@@ -333,6 +334,10 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (!nvme_disk_is_ns_head(disk))
return 0;
}
+ if (a == &dev_attr_multipath_failover_count.attr) {
+ if (nvme_disk_is_ns_head(dev_to_disk(dev)))
+ return 0;
+ }
#endif
return a->mode;
}
--
2.52.0
* [PATCHv3 3/7] nvme: export command error counters via sysfs
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 1/7] nvme: export command retry count " Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 2/7] nvme: export multipath failover " Nilay Shroff
@ 2026-02-20 17:48 ` Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 4/7] nvme: export I/O requeue count when no path is available " Nilay Shroff
` (6 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-02-20 17:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When an NVMe command completes with an error status, the driver
logs the error to the kernel log. However, these messages may be
lost or overwritten over time since dmesg is a circular buffer.
Expose per-path and per-controller command error counters through sysfs to
provide persistent visibility into error occurrences. This allows
users to observe the total number of commands that have failed on
a given path over time, which can be useful for diagnosing path
health and stability.
Add a new sysfs attribute named "command_error_count" for each path
and controller. The attribute is both readable and writable, allowing
users to reset these counters. The counters can also be consumed by
observability tools such as nvme-top to provide additional insight
into NVMe error behavior.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/core.c | 12 +++++++-
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/sysfs.c | 65 +++++++++++++++++++++++++++++++++++++++
3 files changed, 78 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 212dabc807bb..d07e2ed9e494 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -438,11 +438,21 @@ static inline void nvme_end_req_zoned(struct request *req)
static inline void __nvme_end_req(struct request *req)
{
- if (unlikely(nvme_req(req)->status && !(req->rq_flags & RQF_QUIET))) {
+ struct nvme_ns *ns = req->q->queuedata;
+ struct nvme_request *nr = nvme_req(req);
+
+ if (unlikely(nr->status && !(req->rq_flags & RQF_QUIET))) {
if (blk_rq_is_passthrough(req))
nvme_log_err_passthru(req);
else
nvme_log_error(req);
+
+ if (ns)
+ WRITE_ONCE(ns->errors,
+ size_add(READ_ONCE(ns->errors), 1));
+ else
+ WRITE_ONCE(nr->ctrl->errors,
+ size_add(READ_ONCE(nr->ctrl->errors), 1));
}
nvme_end_req_zoned(req);
nvme_trace_bio_complete(req);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 6307243fd216..83b102a0ad89 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -359,6 +359,7 @@ struct nvme_ctrl {
unsigned long ka_last_check_time;
struct work_struct fw_act_work;
unsigned long events;
+ size_t errors;
#ifdef CONFIG_NVME_MULTIPATH
/* asymmetric namespace access: */
u8 anacap;
@@ -536,6 +537,7 @@ struct nvme_ns {
size_t failover;
#endif
size_t retries;
+ size_t errors;
struct list_head siblings;
struct kref kref;
struct nvme_ns_head *head;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 78c3a6f78ef8..4012123be507 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -6,6 +6,7 @@
*/
#include <linux/nvme-auth.h>
+#include <linux/blkdev.h>
#include "nvme.h"
#include "fabrics.h"
@@ -271,6 +272,34 @@ static ssize_t command_retries_store(struct device *dev,
}
static DEVICE_ATTR_RW(command_retries);
+static ssize_t nvme_io_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+ return sysfs_emit(buf, "%lu\n", READ_ONCE(ns->errors));
+}
+
+static ssize_t nvme_io_errors_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ unsigned long errors;
+ int err;
+ struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+ err = kstrtoul(buf, 0, &errors);
+ if (err)
+ return -EINVAL;
+
+ WRITE_ONCE(ns->errors, errors);
+
+ return count;
+}
+
+struct device_attribute dev_attr_io_errors =
+ __ATTR(command_error_count, 0644,
+ nvme_io_errors_show, nvme_io_errors_store);
+
static struct attribute *nvme_ns_attrs[] = {
&dev_attr_wwid.attr,
&dev_attr_uuid.attr,
@@ -290,6 +319,7 @@ static struct attribute *nvme_ns_attrs[] = {
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
&dev_attr_command_retries.attr,
+ &dev_attr_io_errors.attr,
NULL,
};
@@ -316,6 +346,12 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (nvme_disk_is_ns_head(dev_to_disk(dev)))
return 0;
}
+ if (a == &dev_attr_io_errors.attr) {
+ struct gendisk *disk = dev_to_disk(dev);
+
+ if (nvme_disk_is_ns_head(disk))
+ return 0;
+ }
#ifdef CONFIG_NVME_MULTIPATH
if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
/* per-path attr */
@@ -636,6 +672,34 @@ static ssize_t dctype_show(struct device *dev,
}
static DEVICE_ATTR_RO(dctype);
+static ssize_t nvme_adm_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%lu\n", READ_ONCE(ctrl->errors));
+}
+
+static ssize_t nvme_adm_errors_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ unsigned long errors;
+ int err;
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ err = kstrtoul(buf, 0, &errors);
+ if (err)
+ return -EINVAL;
+
+ WRITE_ONCE(ctrl->errors, errors);
+
+ return count;
+}
+
+struct device_attribute dev_attr_adm_errors =
+ __ATTR(command_error_count, 0644,
+ nvme_adm_errors_show, nvme_adm_errors_store);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -782,6 +846,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_dhchap_ctrl_secret.attr,
#endif
&dev_attr_adm_passthru_err_log_enabled.attr,
+ &dev_attr_adm_errors.attr,
NULL
};
--
2.52.0
* [PATCHv3 4/7] nvme: export I/O requeue count when no path is available via sysfs
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (2 preceding siblings ...)
2026-02-20 17:48 ` [PATCHv3 3/7] nvme: export command error counters " Nilay Shroff
@ 2026-02-20 17:48 ` Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 5/7] nvme: export I/O failure " Nilay Shroff
` (5 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-02-20 17:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When the NVMe namespace head determines that there is no currently
available path to handle I/O (for example, while a controller is
resetting/connecting or due to a transient link failure), incoming
I/Os are added to the requeue list.
Currently, there is no visibility into how many I/Os have been requeued
in this situation. Add a new sysfs counter, requeue_no_available_path,
to expose the number of I/Os that were requeued due to the absence of
an available path. The counter is also writable, allowing users to
reset it if needed.
This statistic can help users understand I/O slowdowns or stalls caused
by temporary path unavailability, and can be consumed by monitoring
tools such as nvme-top for real-time observability.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/multipath.c | 31 +++++++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/sysfs.c | 5 +++++
3 files changed, 38 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index c8ae935658a4..c80d5e27d318 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -539,6 +539,8 @@ static void nvme_ns_head_submit_bio(struct bio *bio)
spin_lock_irq(&head->requeue_lock);
bio_list_add(&head->requeue_list, bio);
spin_unlock_irq(&head->requeue_lock);
+ WRITE_ONCE(head->requeue_no_usable_path,
+ size_add(READ_ONCE(head->requeue_no_usable_path), 1));
} else {
dev_warn_ratelimited(dev, "no available path - failing I/O\n");
@@ -1195,6 +1197,35 @@ static ssize_t multipath_failover_count_store(struct device *dev,
DEVICE_ATTR_RW(multipath_failover_count);
+static ssize_t requeue_no_usable_path_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+ struct nvme_ns_head *head = disk->private_data;
+
+ return sysfs_emit(buf, "%lu\n",
+ READ_ONCE(head->requeue_no_usable_path));
+}
+
+static ssize_t requeue_no_usable_path_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ int err;
+ unsigned long requeue_cnt;
+ struct gendisk *disk = dev_to_disk(dev);
+ struct nvme_ns_head *head = disk->private_data;
+
+ err = kstrtoul(buf, 0, &requeue_cnt);
+ if (err)
+ return -EINVAL;
+
+ WRITE_ONCE(head->requeue_no_usable_path, requeue_cnt);
+
+ return count;
+}
+
+DEVICE_ATTR_RW(requeue_no_usable_path);
+
static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 83b102a0ad89..39e5b5c7885b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -508,6 +508,7 @@ struct nvme_ns_head {
unsigned long flags;
struct delayed_work remove_work;
unsigned int delayed_removal_secs;
+ size_t requeue_no_usable_path;
#define NVME_NSHEAD_DISK_LIVE 0
#define NVME_NSHEAD_QUEUE_IF_NO_PATH 1
struct nvme_ns __rcu *current_path[];
@@ -1004,6 +1005,7 @@ extern struct device_attribute dev_attr_queue_depth;
extern struct device_attribute dev_attr_numa_nodes;
extern struct device_attribute dev_attr_delayed_removal_secs;
extern struct device_attribute dev_attr_multipath_failover_count;
+extern struct device_attribute dev_attr_requeue_no_usable_path;
extern struct device_attribute subsys_attr_iopolicy;
static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 4012123be507..a4700ef9d18a 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -316,6 +316,7 @@ static struct attribute *nvme_ns_attrs[] = {
&dev_attr_numa_nodes.attr,
&dev_attr_delayed_removal_secs.attr,
&dev_attr_multipath_failover_count.attr,
+ &dev_attr_requeue_no_usable_path.attr,
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
&dev_attr_command_retries.attr,
@@ -374,6 +375,10 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (nvme_disk_is_ns_head(dev_to_disk(dev)))
return 0;
}
+ if (a == &dev_attr_requeue_no_usable_path.attr) {
+ if (!nvme_disk_is_ns_head(dev_to_disk(dev)))
+ return 0;
+ }
#endif
return a->mode;
}
--
2.52.0
* [PATCHv3 5/7] nvme: export I/O failure count when no path is available via sysfs
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (3 preceding siblings ...)
2026-02-20 17:48 ` [PATCHv3 4/7] nvme: export I/O requeue count when no path is available " Nilay Shroff
@ 2026-02-20 17:48 ` Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 6/7] nvme: export controller reset event count " Nilay Shroff
` (4 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-02-20 17:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When I/O is submitted to the NVMe namespace head and no available path
can handle the request, the driver fails the I/O immediately. Currently,
such failures are only reported via kernel log messages, which may be
lost over time since dmesg is a circular buffer.
Add a new sysfs counter, fail_no_available_path, to expose the number of
I/Os that failed due to the absence of an available path. This provides
persistent visibility into path-related I/O failures and can help users
diagnose the cause of I/O errors. The counter is also writable, so
users may reset its value if needed.
This counter can also be consumed by monitoring tools such as nvme-top.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/multipath.c | 31 +++++++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/sysfs.c | 5 +++++
3 files changed, 38 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index c80d5e27d318..a50845833c89 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -545,6 +545,8 @@ static void nvme_ns_head_submit_bio(struct bio *bio)
dev_warn_ratelimited(dev, "no available path - failing I/O\n");
bio_io_error(bio);
+ WRITE_ONCE(head->fail_no_available_path,
+ size_add(READ_ONCE(head->fail_no_available_path), 1));
}
srcu_read_unlock(&head->srcu, srcu_idx);
@@ -1226,6 +1228,35 @@ static ssize_t requeue_no_usable_path_store(struct device *dev,
DEVICE_ATTR_RW(requeue_no_usable_path);
+static ssize_t fail_no_available_path_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+ struct nvme_ns_head *head = disk->private_data;
+
+ return sysfs_emit(buf, "%lu\n",
+ READ_ONCE(head->fail_no_available_path));
+}
+
+static ssize_t fail_no_available_path_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ int err;
+ unsigned long fail_cnt;
+ struct gendisk *disk = dev_to_disk(dev);
+ struct nvme_ns_head *head = disk->private_data;
+
+ err = kstrtoul(buf, 0, &fail_cnt);
+ if (err)
+ return -EINVAL;
+
+ WRITE_ONCE(head->fail_no_available_path, fail_cnt);
+
+ return count;
+}
+
+DEVICE_ATTR_RW(fail_no_available_path);
+
static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 39e5b5c7885b..b1ce2857899a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -509,6 +509,7 @@ struct nvme_ns_head {
struct delayed_work remove_work;
unsigned int delayed_removal_secs;
size_t requeue_no_usable_path;
+ size_t fail_no_available_path;
#define NVME_NSHEAD_DISK_LIVE 0
#define NVME_NSHEAD_QUEUE_IF_NO_PATH 1
struct nvme_ns __rcu *current_path[];
@@ -1006,6 +1007,7 @@ extern struct device_attribute dev_attr_numa_nodes;
extern struct device_attribute dev_attr_delayed_removal_secs;
extern struct device_attribute dev_attr_multipath_failover_count;
extern struct device_attribute dev_attr_requeue_no_usable_path;
+extern struct device_attribute dev_attr_fail_no_available_path;
extern struct device_attribute subsys_attr_iopolicy;
static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index a4700ef9d18a..790bf875dd1b 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -317,6 +317,7 @@ static struct attribute *nvme_ns_attrs[] = {
&dev_attr_delayed_removal_secs.attr,
&dev_attr_multipath_failover_count.attr,
&dev_attr_requeue_no_usable_path.attr,
+ &dev_attr_fail_no_available_path.attr,
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
&dev_attr_command_retries.attr,
@@ -379,6 +380,10 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (!nvme_disk_is_ns_head(dev_to_disk(dev)))
return 0;
}
+ if (a == &dev_attr_fail_no_available_path.attr) {
+ if (!nvme_disk_is_ns_head(dev_to_disk(dev)))
+ return 0;
+ }
#endif
return a->mode;
}
--
2.52.0
* [PATCHv3 6/7] nvme: export controller reset event count via sysfs
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (4 preceding siblings ...)
2026-02-20 17:48 ` [PATCHv3 5/7] nvme: export I/O failure " Nilay Shroff
@ 2026-02-20 17:48 ` Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 7/7] nvme: export controller reconnect " Nilay Shroff
` (3 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-02-20 17:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
The NVMe controller transitions into the RESETTING state during error
recovery, link instability, firmware activation, or when a reset is
explicitly triggered by the user.
Expose a controller reset event count via sysfs attribute named
"reset_events" to provide visibility into these RESETTING state
transitions. Observing the frequency of reset events can help users
identify issues such as PCIe errors or unstable fabric links. The
counter is also writable, allowing users to reset its value if
needed.
This counter can also be consumed by monitoring tools such as nvme-top
to improve controller-level observability.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/core.c | 2 ++
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/sysfs.c | 27 +++++++++++++++++++++++++++
3 files changed, 30 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d07e2ed9e494..1cba460d8563 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -593,6 +593,8 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
case NVME_CTRL_NEW:
case NVME_CTRL_LIVE:
changed = true;
+ WRITE_ONCE(ctrl->nr_reset,
+ size_add(READ_ONCE(ctrl->nr_reset), 1));
fallthrough;
default:
break;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b1ce2857899a..5d90e5fa7298 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -360,6 +360,7 @@ struct nvme_ctrl {
struct work_struct fw_act_work;
unsigned long events;
size_t errors;
+ size_t nr_reset;
#ifdef CONFIG_NVME_MULTIPATH
/* asymmetric namespace access: */
u8 anacap;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 790bf875dd1b..f3e6c7208315 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -710,6 +710,32 @@ struct device_attribute dev_attr_adm_errors =
__ATTR(command_error_count, 0644,
nvme_adm_errors_show, nvme_adm_errors_store);
+static ssize_t reset_events_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%lu\n", READ_ONCE(ctrl->nr_reset));
+}
+
+static ssize_t reset_events_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ int err;
+ unsigned long reset_cnt;
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ err = kstrtoul(buf, 0, &reset_cnt);
+ if (err)
+ return -EINVAL;
+
+ WRITE_ONCE(ctrl->nr_reset, reset_cnt);
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(reset_events);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -857,6 +883,7 @@ static struct attribute *nvme_dev_attrs[] = {
#endif
&dev_attr_adm_passthru_err_log_enabled.attr,
&dev_attr_adm_errors.attr,
+ &dev_attr_reset_events.attr,
NULL
};
--
2.52.0
* [PATCHv3 7/7] nvme: export controller reconnect event count via sysfs
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (5 preceding siblings ...)
2026-02-20 17:48 ` [PATCHv3 6/7] nvme: export controller reset event count " Nilay Shroff
@ 2026-02-20 17:48 ` Nilay Shroff
2026-02-22 12:36 ` [PATCHv3 0/7] nvme: export additional diagnostic counters " Venkat
` (2 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-02-20 17:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When an NVMe-oF link goes down, the driver attempts to recover the
connection by repeatedly reconnecting to the remote controller at
configured intervals. A maximum number of reconnect attempts is also
configured, after which recovery stops and the controller is removed
if the connection cannot be re-established.
The driver maintains a counter, nr_reconnects, which is incremented on
each reconnect attempt. However, if the reconnect succeeds, this
counter is reset to zero. Moreover, the counter is currently only
reported via kernel log messages and is not exposed to userspace.
Since dmesg is a circular buffer, this information may be lost over
time.
So introduce a new accumulator that records the total number of
reconnect attempts, and expose it via a new sysfs attribute
"reconnect_events" to provide persistent visibility into the number
of reconnect attempts made by the host. This information can help
users diagnose unstable links or connectivity issues. Furthermore,
the sysfs attribute is also writable, so users may reset it to zero
if needed.
The "reconnect_events" can also be consumed by monitoring tools such
as nvme-top to improve controller-level observability.
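The accumulate-before-reset scheme described above can be sketched in
plain C (a userspace model for illustration only, not the driver code;
the field and function names merely mirror the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model: nr_reconnects counts attempts within one recovery
 * episode and is reset on success; acc_reconnects keeps the running
 * total so the history survives that reset. */
struct ctrl {
	int nr_reconnects;      /* per-episode attempt counter */
	size_t acc_reconnects;  /* lifetime accumulator */
};

static void reconnect_attempt(struct ctrl *c)
{
	c->nr_reconnects++;
}

static void reconnect_success(struct ctrl *c)
{
	/* accumulate reconnect attempts before resetting to zero */
	c->acc_reconnects += c->nr_reconnects;
	c->nr_reconnects = 0;
}

/* What the "reconnect_events" attribute would report: attempts from
 * completed episodes plus any attempts in the episode in progress. */
static size_t reconnect_events(const struct ctrl *c)
{
	return c->acc_reconnects + c->nr_reconnects;
}
```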
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/fc.c | 5 +++++
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/rdma.c | 4 ++++
drivers/nvme/host/sysfs.c | 30 ++++++++++++++++++++++++++++++
drivers/nvme/host/tcp.c | 3 +++
5 files changed, 44 insertions(+)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 6948de3f438a..a918217620d1 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3148,6 +3148,10 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
goto out_term_aen_ops;
}
+ /* accumulate reconnect attempts before resetting it to zero */
+ WRITE_ONCE(ctrl->ctrl.acc_reconnects,
+ READ_ONCE(ctrl->ctrl.acc_reconnects) +
+ ctrl->ctrl.nr_reconnects);
ctrl->ctrl.nr_reconnects = 0;
nvme_start_ctrl(&ctrl->ctrl);
@@ -3470,6 +3474,7 @@ nvme_fc_alloc_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
ctrl->ctrl.opts = opts;
ctrl->ctrl.nr_reconnects = 0;
+ ctrl->ctrl.acc_reconnects = 0;
INIT_LIST_HEAD(&ctrl->ctrl_list);
ctrl->lport = lport;
ctrl->rport = rport;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 5d90e5fa7298..9146d1b48606 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -401,6 +401,8 @@ struct nvme_ctrl {
u16 icdoff;
u16 maxcmd;
int nr_reconnects;
+ /* accumulate reconnect attempts, as nr_reconnects can reset to zero */
+ size_t acc_reconnects;
unsigned long flags;
struct nvmf_ctrl_options *opts;
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 35c0822edb2d..bd5492ad3da6 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1110,6 +1110,10 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
dev_info(ctrl->ctrl.device, "Successfully reconnected (%d attempts)\n",
ctrl->ctrl.nr_reconnects);
+ /* accumulate reconnect attempts before resetting it to zero */
+ WRITE_ONCE(ctrl->ctrl.acc_reconnects,
+ READ_ONCE(ctrl->ctrl.acc_reconnects) +
+ ctrl->ctrl.nr_reconnects);
ctrl->ctrl.nr_reconnects = 0;
return;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index f3e6c7208315..166e45b589ad 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -736,6 +736,33 @@ static ssize_t reset_events_store(struct device *dev,
static DEVICE_ATTR_RW(reset_events);
+static ssize_t reconnect_events_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%lu\n",
+ READ_ONCE(ctrl->acc_reconnects) + ctrl->nr_reconnects);
+}
+
+static ssize_t reconnect_events_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ int err;
+ unsigned long reconnect_cnt;
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ err = kstrtoul(buf, 0, &reconnect_cnt);
+ if (err)
+ return -EINVAL;
+
+ WRITE_ONCE(ctrl->acc_reconnects, reconnect_cnt);
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(reconnect_events);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -884,6 +911,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_adm_passthru_err_log_enabled.attr,
&dev_attr_adm_errors.attr,
&dev_attr_reset_events.attr,
+ &dev_attr_reconnect_events.attr,
NULL
};
@@ -913,6 +941,8 @@ static umode_t nvme_dev_attrs_are_visible(struct kobject *kobj,
if (a == &dev_attr_dhchap_ctrl_secret.attr && !ctrl->opts)
return 0;
#endif
+ if (a == &dev_attr_reconnect_events.attr && !ctrl->opts)
+ return 0;
return a->mode;
}
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 69cb04406b47..46398c826368 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2460,6 +2460,9 @@ static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
dev_info(ctrl->device, "Successfully reconnected (attempt %d/%d)\n",
ctrl->nr_reconnects, ctrl->opts->max_reconnects);
+ /* accumulate reconnect attempts before resetting it to zero */
+ WRITE_ONCE(ctrl->acc_reconnects,
+ READ_ONCE(ctrl->acc_reconnects) + ctrl->nr_reconnects);
ctrl->nr_reconnects = 0;
return;
--
2.52.0
* Re: [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (6 preceding siblings ...)
2026-02-20 17:48 ` [PATCHv3 7/7] nvme: export controller reconnect " Nilay Shroff
@ 2026-02-22 12:36 ` Venkat
2026-02-22 14:10 ` Nilay Shroff
2026-02-26 5:37 ` Chaitanya Kulkarni
2026-03-04 14:33 ` Nilay Shroff
9 siblings, 1 reply; 18+ messages in thread
From: Venkat @ 2026-02-22 12:36 UTC (permalink / raw)
To: Nilay Shroff
Cc: linux-nvme, kbusch, axboe, hch, sagi, hare, dwagner, wenxiong,
gjoyce
> On 20 Feb 2026, at 11:18 PM, Nilay Shroff <nilay@linux.ibm.com> wrote:
>
> Hi,
>
> The NVMe driver encounters various events and conditions during normal
> operation that are either not tracked today or not exposed to userspace
> via sysfs. Lack of visibility into these events can make it difficult to
> diagnose subtle issues related to controller behavior, multipath
> stability, and I/O reliability.
>
> This patchset adds several diagnostic counters that provide improved
> observability into NVMe behavior. These counters are intended to help
> users understand events such as transient path unavailability,
> controller retries/reconnect/reset, failovers, and I/O failures. They
> can also be consumed by monitoring tools such as nvme-top.
>
> Specifically, this series proposes to export the following counters via
> sysfs:
> - Command retry count
> - Multipath failover count
> - Command error count
> - I/O requeue count
> - I/O failure count
> - Controller reset event counts
> - Controller reconnect counts
>
> The patchset consists of seven patches:
> Patch 1: Export command retry count
> Patch 2: Export multipath failover count
> Patch 3: Export command error count
> Patch 4: Export I/O requeue count
> Patch 5: Export I/O failure count
> Patch 6: Export controller reset event counts
> Patch 7: Export controller reconnect event count
>
> Please note that this patchset doesn't make any functional change but
> rather export relevant counters to user space via sysfs.
>
> As usual, feedback/comments/suggestions are welcome!
>
> Changes from v2:
> - Allow user to write to sysfs attributes so that user could
> reset stat counters, if needed (Sagi)
> - The controller reconnect counter nr_reconnects could reset
> to zero once connection is re-established, so instead of
> exposing nr_reconnects counter via sysfs introduce a new
> counter which accumulates the reconnect attempts and export
> this accumulated counter via sysfs (Sagi)
> Link to v2: https://lore.kernel.org/all/20260205124810.682559-1-nilay@linux.ibm.com/
>
> Changes from v1:
> - Remove export of stats for admin command retry count (Keith)
> - Use size_add() to ensure stat counters don't overflow (Keith)
> Link to v1: https://lore.kernel.org/all/20260130182028.885089-1-nilay@linux.ibm.com/
>
> Nilay Shroff (7):
> nvme: export command retry count via sysfs
> nvme: export multipath failover count via sysfs
> nvme: export command error counters via sysfs
> nvme: export I/O requeue count when no path is available via sysfs
> nvme: export I/O failure count when no path is available via sysfs
> nvme: export controller reset event count via sysfs
> nvme: export controller reconnect event count via sysfs
>
> drivers/nvme/host/core.c | 18 +++-
> drivers/nvme/host/fc.c | 5 +
> drivers/nvme/host/multipath.c | 89 ++++++++++++++++++
> drivers/nvme/host/nvme.h | 13 ++-
> drivers/nvme/host/rdma.c | 4 +
> drivers/nvme/host/sysfs.c | 167 ++++++++++++++++++++++++++++++++++
> drivers/nvme/host/tcp.c | 3 +
> 7 files changed, 297 insertions(+), 2 deletions(-)
>
> --
> 2.52.0
>
>
Hello Nilay,
I tested this patch series and found couple of attributes are missing.
Missing diag counters:
1. I/O requeue count
2. I/O failure count
Rest all diag counters are exposed via sysfs properly.
Controller-level counters observed:
- reset_events
- reconnect_events
- command_error_count
Namespace-instance counters observed:
- command_retries
- multipath_failover_count
- command_error_count
Logs:
ll /sys/class/nvme/nvme3/
total 0
-r--r--r-- 1 root root 65536 Feb 22 05:49 address
-r--r--r-- 1 root root 65536 Feb 22 05:58 cntlid
-r--r--r-- 1 root root 65536 Feb 22 05:49 cntrltype
-rw-r--r-- 1 root root 65536 Feb 22 06:10 command_error_count
-rw-r--r-- 1 root root 65536 Feb 22 05:58 ctrl_loss_tmo
-r--r--r-- 1 root root 65536 Feb 22 05:49 dctype
--w------- 1 root root 65536 Feb 22 05:58 delete_controller
-r--r--r-- 1 root root 65536 Feb 22 05:58 dev
lrwxrwxrwx 1 root root 0 Feb 22 05:50 device -> ../../ctl
-rw-r--r-- 1 root root 65536 Feb 22 05:58 fast_io_fail_tmo
-r--r--r-- 1 root root 65536 Feb 22 05:49 firmware_rev
-r--r--r-- 1 root root 65536 Feb 22 05:51 hostid
-r--r--r-- 1 root root 65536 Feb 22 05:51 hostnqn
-r--r--r-- 1 root root 65536 Feb 22 05:58 kato
-r--r--r-- 1 root root 65536 Feb 22 05:49 model
-r--r--r-- 1 root root 65536 Feb 22 05:49 numa_node
drwxr-xr-x 9 root root 0 Feb 22 05:49 nvme3c3n1
drwxr-xr-x 9 root root 0 Feb 22 05:49 nvme3c3n10
drwxr-xr-x 9 root root 0 Feb 22 05:49 nvme3c3n2
drwxr-xr-x 9 root root 0 Feb 22 05:49 nvme3c3n3
drwxr-xr-x 9 root root 0 Feb 22 05:49 nvme3c3n4
drwxr-xr-x 9 root root 0 Feb 22 05:49 nvme3c3n5
drwxr-xr-x 9 root root 0 Feb 22 05:49 nvme3c3n6
drwxr-xr-x 9 root root 0 Feb 22 05:49 nvme3c3n7
drwxr-xr-x 9 root root 0 Feb 22 05:49 nvme3c3n8
drwxr-xr-x 9 root root 0 Feb 22 05:49 nvme3c3n9
-rw-r--r-- 1 root root 65536 Feb 22 05:58 passthru_err_log_enabled
drwxr-xr-x 2 root root 0 Feb 22 05:58 power
-r--r--r-- 1 root root 65536 Feb 22 05:49 queue_count
-rw-r--r-- 1 root root 65536 Feb 22 05:58 reconnect_delay
-rw-r--r-- 1 root root 65536 Feb 22 06:11 reconnect_events
--w------- 1 root root 65536 Feb 22 05:58 rescan_controller
--w------- 1 root root 65536 Feb 22 06:11 reset_controller
-rw-r--r-- 1 root root 65536 Feb 22 06:10 reset_events
-r--r--r-- 1 root root 65536 Feb 22 05:49 serial
-r--r--r-- 1 root root 65536 Feb 22 05:49 sqsize
-r--r--r-- 1 root root 65536 Feb 22 05:49 state
-r--r--r-- 1 root root 65536 Feb 22 05:51 subsysnqn
lrwxrwxrwx 1 root root 0 Feb 22 05:49 subsystem -> ../../../../../class/nvme
-r--r--r-- 1 root root 65536 Feb 22 05:51 transport
-rw-r--r-- 1 root root 65536 Feb 22 05:49 uevent
ll /sys/class/nvme/nvme3/nvme3c3n8
total 0
-r--r--r-- 1 root root 65536 Feb 22 06:02 alignment_offset
-r--r--r-- 1 root root 65536 Feb 22 05:51 ana_grpid
-r--r--r-- 1 root root 65536 Feb 22 05:51 ana_state
-r--r--r-- 1 root root 65536 Feb 22 06:02 capability
-rw-r--r-- 1 root root 65536 Feb 22 06:07 command_error_count
-rw-r--r-- 1 root root 65536 Feb 22 06:07 command_retries
-r--r--r-- 1 root root 65536 Feb 22 06:02 csi
lrwxrwxrwx 1 root root 0 Feb 22 05:50 device -> ../../nvme3
-r--r--r-- 1 root root 65536 Feb 22 06:02 discard_alignment
-r--r--r-- 1 root root 65536 Feb 22 06:02 diskseq
-r--r--r-- 1 root root 65536 Feb 22 06:02 events
-r--r--r-- 1 root root 65536 Feb 22 06:02 events_async
-rw-r--r-- 1 root root 65536 Feb 22 06:02 events_poll_msecs
-r--r--r-- 1 root root 65536 Feb 22 06:02 ext_range
-r--r--r-- 1 root root 65536 Feb 22 06:02 hidden
drwxr-xr-x 2 root root 0 Feb 22 06:02 holders
-r--r--r-- 1 root root 65536 Feb 22 06:02 inflight
drwxr-xr-x 2 root root 0 Feb 22 06:02 integrity
-r--r--r-- 1 root root 65536 Feb 22 06:02 metadata_bytes
drwxr-xr-x 18 root root 0 Feb 22 06:02 mq
-rw-r--r-- 1 root root 65536 Feb 22 06:07 multipath_failover_count
-r--r--r-- 1 root root 65536 Feb 22 06:02 nguid
-r--r--r-- 1 root root 65536 Feb 22 06:02 nsid
-r--r--r-- 1 root root 65536 Feb 22 06:02 numa_nodes
-r--r--r-- 1 root root 65536 Feb 22 06:02 nuse
-r--r--r-- 1 root root 65536 Feb 22 06:02 partscan
-rw-r--r-- 1 root root 65536 Feb 22 06:02 passthru_err_log_enabled
drwxr-xr-x 2 root root 0 Feb 22 06:02 power
drwxr-xr-x 2 root root 0 Feb 22 05:49 queue
-r--r--r-- 1 root root 65536 Feb 22 06:02 queue_depth
-r--r--r-- 1 root root 65536 Feb 22 06:02 range
-r--r--r-- 1 root root 65536 Feb 22 05:49 removable
-r--r--r-- 1 root root 65536 Feb 22 06:02 ro
-r--r--r-- 1 root root 65536 Feb 22 05:50 size
drwxr-xr-x 2 root root 0 Feb 22 06:02 slaves
-r--r--r-- 1 root root 65536 Feb 22 06:02 stat
lrwxrwxrwx 1 root root 0 Feb 22 05:49 subsystem -> ../../../../../../class/block
drwxr-xr-x 2 root root 0 Feb 22 06:02 trace
-rw-r--r-- 1 root root 65536 Feb 22 05:49 uevent
-r--r--r-- 1 root root 65536 Feb 22 06:02 uuid
-r--r--r-- 1 root root 65536 Feb 22 06:02 wwid
Regards,
Venkat.
* Re: [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
2026-02-22 12:36 ` [PATCHv3 0/7] nvme: export additional diagnostic counters " Venkat
@ 2026-02-22 14:10 ` Nilay Shroff
2026-02-22 15:06 ` Venkat Rao Bagalkote
0 siblings, 1 reply; 18+ messages in thread
From: Nilay Shroff @ 2026-02-22 14:10 UTC (permalink / raw)
To: Venkat
Cc: linux-nvme, kbusch, axboe, hch, sagi, hare, dwagner, wenxiong,
gjoyce
On 2/22/26 6:06 PM, Venkat wrote:
> Hello Nilay,
>
> I tested this patch series and found couple of attributes are missing.
>
> Missing diag counters:
>
> 1. I/O requeue count
> 2. I/O failure count
These counters are exported under the head node, so you should be able
to access them here:
# ll /sys/block/nvme3nX
Thanks,
--Nilay
* Re: [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
2026-02-22 14:10 ` Nilay Shroff
@ 2026-02-22 15:06 ` Venkat Rao Bagalkote
0 siblings, 0 replies; 18+ messages in thread
From: Venkat Rao Bagalkote @ 2026-02-22 15:06 UTC (permalink / raw)
To: Nilay Shroff
Cc: linux-nvme, kbusch, axboe, hch, sagi, hare, dwagner, wenxiong,
gjoyce
On 22/02/26 7:40 pm, Nilay Shroff wrote:
>
> On 2/22/26 6:06 PM, Venkat wrote:
>> Hello Nilay,
>>
>> I tested this patch series and found couple of attributes are missing.
>>
>> Missing diag counters:
>>
>> 1. I/O requeue count
>> 2. I/O failure count
> These counters are exported under head node. So you should be able
> to access it under here:
>
> # ll /sys/block/nvme3nX
Thanks Nilay, for pointing it to me. With this, all the counters are
exposed.
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
=== /sys/block/nvme4n1 ===
fail_no_available_path
requeue_no_usable_path
=== /sys/block/nvme4n10 ===
fail_no_available_path
requeue_no_usable_path
=== /sys/block/nvme4n2 ===
fail_no_available_path
requeue_no_usable_path
=== /sys/block/nvme4n3 ===
fail_no_available_path
requeue_no_usable_path
=== /sys/block/nvme4n4 ===
fail_no_available_path
requeue_no_usable_path
=== /sys/block/nvme4n5 ===
fail_no_available_path
requeue_no_usable_path
=== /sys/block/nvme4n6 ===
fail_no_available_path
requeue_no_usable_path
=== /sys/block/nvme4n7 ===
fail_no_available_path
requeue_no_usable_path
=== /sys/block/nvme4n8 ===
fail_no_available_path
requeue_no_usable_path
=== /sys/block/nvme4n9 ===
fail_no_available_path
requeue_no_usable_path
Regards,
Venkat.
>
> Thanks,
> --Nilay
* Re: [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (7 preceding siblings ...)
2026-02-22 12:36 ` [PATCHv3 0/7] nvme: export additional diagnostic counters " Venkat
@ 2026-02-26 5:37 ` Chaitanya Kulkarni
2026-03-04 14:33 ` Nilay Shroff
9 siblings, 0 replies; 18+ messages in thread
From: Chaitanya Kulkarni @ 2026-02-26 5:37 UTC (permalink / raw)
To: Nilay Shroff, linux-nvme@lists.infradead.org
Cc: kbusch@kernel.org, axboe@kernel.dk, hch@lst.de, sagi@grimberg.me,
hare@suse.de, dwagner@suse.de, wenxiong@linux.ibm.com,
gjoyce@ibm.com
On 2/20/26 09:48, Nilay Shroff wrote:
> Hi,
>
> The NVMe driver encounters various events and conditions during normal
> operation that are either not tracked today or not exposed to userspace
> via sysfs. Lack of visibility into these events can make it difficult to
> diagnose subtle issues related to controller behavior, multipath
> stability, and I/O reliability.
>
> This patchset adds several diagnostic counters that provide improved
> observability into NVMe behavior. These counters are intended to help
> users understand events such as transient path unavailability,
> controller retries/reconnect/reset, failovers, and I/O failures. They
> can also be consumed by monitoring tools such as nvme-top.
>
> Specifically, this series proposes to export the following counters via
> sysfs:
> - Command retry count
> - Multipath failover count
> - Command error count
> - I/O requeue count
> - I/O failure count
> - Controller reset event counts
> - Controller reconnect counts
>
> The patchset consists of seven patches:
> Patch 1: Export command retry count
> Patch 2: Export multipath failover count
> Patch 3: Export command error count
> Patch 4: Export I/O requeue count
> Patch 5: Export I/O failure count
> Patch 6: Export controller reset event counts
> Patch 7: Export controller reconnect event count
The cover letter tooling automatically lists the number of patches and
their respective authors, so you can avoid adding this next time.
For the whole series, looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
* Re: [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (8 preceding siblings ...)
2026-02-26 5:37 ` Chaitanya Kulkarni
@ 2026-03-04 14:33 ` Nilay Shroff
2026-03-06 16:02 ` Keith Busch
9 siblings, 1 reply; 18+ messages in thread
From: Nilay Shroff @ 2026-03-04 14:33 UTC (permalink / raw)
To: linux-nvme, Keith Busch; +Cc: axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce
Hi Keith,
A gentle ping on this. I’ve incorporated the review comments,
and the series has already received Reviewed-by and Tested-by tags.
Could you please consider pulling it? Also, please let me know if
you have any further comments or if additional changes are needed.
Thanks,
--Nilay
On 2/20/26 11:18 PM, Nilay Shroff wrote:
> Hi,
>
> The NVMe driver encounters various events and conditions during normal
> operation that are either not tracked today or not exposed to userspace
> via sysfs. Lack of visibility into these events can make it difficult to
> diagnose subtle issues related to controller behavior, multipath
> stability, and I/O reliability.
>
> This patchset adds several diagnostic counters that provide improved
> observability into NVMe behavior. These counters are intended to help
> users understand events such as transient path unavailability,
> controller retries/reconnect/reset, failovers, and I/O failures. They
> can also be consumed by monitoring tools such as nvme-top.
>
> Specifically, this series proposes to export the following counters via
> sysfs:
> - Command retry count
> - Multipath failover count
> - Command error count
> - I/O requeue count
> - I/O failure count
> - Controller reset event counts
> - Controller reconnect counts
>
> The patchset consists of seven patches:
> Patch 1: Export command retry count
> Patch 2: Export multipath failover count
> Patch 3: Export command error count
> Patch 4: Export I/O requeue count
> Patch 5: Export I/O failure count
> Patch 6: Export controller reset event counts
> Patch 7: Export controller reconnect event count
>
> Please note that this patchset doesn't make any functional change but
> rather export relevant counters to user space via sysfs.
>
> As usual, feedback/comments/suggestions are welcome!
>
> Changes from v2:
> - Allow user to write to sysfs attributes so that user could
> reset stat counters, if needed (Sagi)
> - The controller reconnect counter nr_reconnects could reset
> to zero once connection is re-established, so instead of
> exposing nr_reconnects counter via sysfs introduce a new
> counter which accumulates the reconnect attempts and export
> this accumulated counter via sysfs (Sagi)
> Link to v2: https://lore.kernel.org/all/20260205124810.682559-1-nilay@linux.ibm.com/
>
> Changes from v1:
> - Remove export of stats for admin command retry count (Keith)
> - Use size_add() to ensure stat counters don't overflow (Keith)
> Link to v1: https://lore.kernel.org/all/20260130182028.885089-1-nilay@linux.ibm.com/
>
> Nilay Shroff (7):
> nvme: export command retry count via sysfs
> nvme: export multipath failover count via sysfs
> nvme: export command error counters via sysfs
> nvme: export I/O requeue count when no path is available via sysfs
> nvme: export I/O failure count when no path is available via sysfs
> nvme: export controller reset event count via sysfs
> nvme: export controller reconnect event count via sysfs
>
> drivers/nvme/host/core.c | 18 +++-
> drivers/nvme/host/fc.c | 5 +
> drivers/nvme/host/multipath.c | 89 ++++++++++++++++++
> drivers/nvme/host/nvme.h | 13 ++-
> drivers/nvme/host/rdma.c | 4 +
> drivers/nvme/host/sysfs.c | 167 ++++++++++++++++++++++++++++++++++
> drivers/nvme/host/tcp.c | 3 +
> 7 files changed, 297 insertions(+), 2 deletions(-)
>
* Re: [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
2026-03-04 14:33 ` Nilay Shroff
@ 2026-03-06 16:02 ` Keith Busch
2026-03-08 18:55 ` Nilay Shroff
0 siblings, 1 reply; 18+ messages in thread
From: Keith Busch @ 2026-03-06 16:02 UTC (permalink / raw)
To: Nilay Shroff
Cc: linux-nvme, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce
On Wed, Mar 04, 2026 at 08:03:02PM +0530, Nilay Shroff wrote:
> Hi Keith,
>
> A gentle ping on this. I´ve incorporated the review comments,
>
> A gentle ping on this. I've incorporated the review comments,
>
> Could you please consider pulling it? Also, please let me know if
> you have any further comments or if additional changes are needed.
Thanks, I was hoping free time would show up to let me look closer at
this, but that doesn't appear to be happening, so looks like I have to
force the time :)
We always need to be a bit cautious on adding user visible attributes,
so just throwing out some thoughts on this feature.
The event counters are not atomic. I know these are informational, so
maybe it's not important to be 100% accurate in the reporting, but I
don't know. Maybe you do want accuracy in which case using atomic_long_t
might be the right type. I think that should be okay to use since none
of these counters are in the normal fast path.
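(As a userspace illustration of the point being made here, with C11
atomics standing in for the kernel's atomic_long_t; this is not the
driver code:)

```c
#include <assert.h>
#include <stdatomic.h>

/* With a plain counter, two racing "errors++" updates can collapse
 * into one lost increment. An atomic fetch-add makes each increment
 * take effect exactly once, at negligible cost off the fast path. */
static atomic_long errors;

static void count_error(void)
{
	atomic_fetch_add_explicit(&errors, 1, memory_order_relaxed);
}

static long read_errors(void)
{
	return atomic_load_explicit(&errors, memory_order_relaxed);
}

/* sysfs store-handler analogue: overwrite the counter with a
 * user-supplied value (e.g. zero, to reset the statistic). */
static void reset_errors(long v)
{
	atomic_store_explicit(&errors, v, memory_order_relaxed);
}
```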
The names of the attributes are a bit inconsistent. Some have a "_count"
suffix, some have "_events" suffix, and some have no suffix at all.
In order to keep sysfs a bit cleaner, should we consider moving these
attributes under a sub-directory specifically for reporting event
counters?
Last thought, as you are probably aware, John Garry is proposing to
lift the nvme multipath into a generic library, which suggests many of
these events would also need to be generic. Should some of these, like
error and retry counts, be appended to the generic disk stats instead?
* Re: [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
2026-03-06 16:02 ` Keith Busch
@ 2026-03-08 18:55 ` Nilay Shroff
2026-03-09 15:32 ` John Garry
2026-03-16 12:56 ` Nilay Shroff
0 siblings, 2 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-03-08 18:55 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-nvme, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce
On 3/6/26 9:32 PM, Keith Busch wrote:
> On Wed, Mar 04, 2026 at 08:03:02PM +0530, Nilay Shroff wrote:
>> Hi Keith,
>>
>> A gentle ping on this. I've incorporated the review comments,
>> and the series has already received Reviewed-by and Tested-by tags.
>>
>> Could you please consider pulling it? Also, please let me know if
>> you have any further comments or if additional changes are needed.
>
> Thanks, I was hoping free time would show up to let me look closer at
> this, but that doesn't appear to be happening, so looks like I have to
> force the time :)
>
Thanks for your time and review!
> We always need to be a bit cautious on adding user visible attributes,
> so just throwing out some thoughts on this feature.
>
> The event counters are not atomic. I know these are informational, so
> maybe it's not important to be 100% accurate in the reporting, but I
> don't know. Maybe you do want accuracy in which case using atomic_long_t
> might be the right type. I think that should be okay to use since none
> of these counters are in the normal fast path.
Currently the counters are implemented using size_t and updated through
the size_add() helper to avoid overflow (as you suggested earlier in the
review). Since these counters are not updated in performance critical
paths, switching them to atomic_long_t should be fine. I'll update the
implementation to use atomic counters.
>
> The names of the attributes are a bit inconsistent. Some have a "_count"
> suffix, some have "_events" suffix, and some have no suffix at all.
>
Yes, I originally chose names that I felt were descriptive while trying
to keep them concise. However I agree that consistent naming is more
important. I'll update all attributes to use a uniform "_count" suffix.
> In order to keep sysfs a bit cleaner, should we consider moving these
> attributes under a sub-directory specifically for reporting event
> counters?
>
The counters are currently exported from different objects depending on
what they represent. Some are per-namespace path, some per-controller,
and others are associated with the namespace head. Because of this
separation, placing all counters under a single sub-directory would
not be feasible.
If the concern is reducing clutter in existing directories, we could
instead introduce a dedicated sub-directory under each object and export
the counters there. For example, we could create a "diag" directory under
the per-ns path, controller, and ns-head directories and then export the
respective counters under "diag". What do you suggest?
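(For reference, sysfs already supports this directly: giving an
attribute_group a name makes its attributes appear under a
subdirectory of that name. A hypothetical, non-runnable kernel sketch
reusing the attributes from this series, not part of the posted
patches:)

```c
static struct attribute *nvme_diag_attrs[] = {
	&dev_attr_reset_events.attr,
	&dev_attr_reconnect_events.attr,
	NULL
};

static const struct attribute_group nvme_diag_attr_group = {
	.name	= "diag",	/* named group => "diag" subdirectory */
	.attrs	= nvme_diag_attrs,
};
```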
> Last thought, as you are probably aware, John Garry is proposing to
> lift the nvme multipath into a generic library, which suggests many of
> these events would also need to be generic. Should some of these, like
> error and retry counts, be appended to the generic disk stats instead?
Yes I am aware about libmultipath work.
I agree that retry and error counters might conceptually fit into
generic disk statistics. However the intent of these diagnostic counters
is to capture all relevant events, including passthrough commands.
Passthrough requests are typically not accounted for in generic disk
statistics, which makes that interface unsuitable for these counters.
Additionally some counters are reported at the controller level, and
controllers do not have an associated gendisk or block device.
For these reasons exporting them through the dedicated sysfs interfaces
appears to be the most appropriate approach.
Thanks,
--Nilay
* Re: [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
2026-03-08 18:55 ` Nilay Shroff
@ 2026-03-09 15:32 ` John Garry
2026-03-19 15:55 ` Nilay Shroff
2026-03-16 12:56 ` Nilay Shroff
1 sibling, 1 reply; 18+ messages in thread
From: John Garry @ 2026-03-09 15:32 UTC (permalink / raw)
To: linux-nvme
On 08/03/2026 18:55, Nilay Shroff wrote:
>> Last thought, as you are probably aware, John Garry is proposing to
>> lift the nvme multipath into a generic library, which suggests many of
>> these events would also need to be generic. Should some of these, like
>> error and retry counts, be appended to the generic disk stats instead?
>
Thanks for the mention
> Yes I am aware about libmultipath work.
> I agree that retry and error counters might conceptually fit into
> generic disk statistics. However the intent of these diagnostic counters
> is to capture all relevant events, including passthrough commands.
>
> Passthrough requests are typically not accounted for in generic disk
> statistics, which makes that interface unsuitable for these counters.
> Additionally some counters are reported at the controller level, and
> controllers do not have an associated gendisk or block device.
>
> For these reasons exporting them through the dedicated sysfs interfaces
> appears to be the most appropriate approach.
From the current list of proposed counters, my thoughts per counter are
WRT SCSI:
"nvme: export command retry count"
The ACRE which this is based on is not relevant (to SCSI), and I would
be reluctant to add such a counter for scsi_devices
"nvme: export multipath failover "
I think that this could be added for scsi_mpath_device class
"nvme: export command error counters "
Similar as "nvme: export command retry count"
"nvme: export I/O requeue count when no path is available "
I think that this could be added for scsi_mpath_device class
"nvme: export I/O failure"
Not really relevant to SCSI, or more relevant to SCSI low-level drivers
(which I would not want to expose as an ABI for SCSI multipath)
"nvme: export controller reset event count "
Same as "nvme: export I/O failure"
"nvme: export controller reconnect "
Again, same as "nvme: export I/O failure"
BTW, I think that the counters should be atomic - otherwise we are not
getting accurate results. And, as is mentioned, none seem to be in the
fastpath (so I don't know why not have them as atomic).
Finally, some of these counters seem to me to be more suitable for a
debugfs (and not sysfs).
Cheers!
* Re: [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
2026-03-08 18:55 ` Nilay Shroff
2026-03-09 15:32 ` John Garry
@ 2026-03-16 12:56 ` Nilay Shroff
1 sibling, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-03-16 12:56 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-nvme, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce
Hi Keith,
On 3/9/26 12:25 AM, Nilay Shroff wrote:
> On 3/6/26 9:32 PM, Keith Busch wrote:
>> On Wed, Mar 04, 2026 at 08:03:02PM +0530, Nilay Shroff wrote:
>>> Hi Keith,
>>>
>>> A gentle ping on this. I've incorporated the review comments,
>>> and the series has already received Reviewed-by and Tested-by tags.
>>>
>>> Could you please consider pulling it? Also, please let me know if
>>> you have any further comments or if additional changes are needed.
>>
>> Thanks, I was hoping free time would show up to let me look closer at
>> this, but that doesn't appear to be happening, so looks like I have to
>> force the time :)
>>
> Thanks for your time and review!
>
>> We always need to be a bit cautious on adding user visible attributes,
>> so just throwing out some thoughts on this feature.
>>
>> The event counters are not atomic. I know these are informational, so
>> maybe it's not important to be 100% accurate in the reporting, but I
>> don't know. Maybe you do want accuracy in which case using atomic_long_t
>> might be the right type. I think that should be okay to use since none
>> of these counters are in the normal fast path.
>
> Currently the counters are implemented using size_t and updated through
> the size_add() helper to avoid overflow (as you suggested earlier in the
> review). Since these counters are not updated in performance critical
> paths, switching them to atomic_long_t should be fine. I'll update the
> implementation to use atomic counters.
>
>>
>> The names of the attributes are a bit inconsistent. Some have a "_count"
>> suffix, some have "_events" suffix, and some have no suffix at all.
>>
> Yes, I originally chose names that I felt were descriptive while trying
> to keep them concise. However I agree that consistent naming is more
> important. I'd update all attributes to use a uniform "_count" suffix.
>
>> In order to keep sysfs a bit cleaner, should we consider moving these
>> attributes under a sub-directory specifically for reporting event
>> counters?
>>
> The counters are currently exported from different objects depending on
> what they represent. Some are per-namespace path, some per-controller,
> and others are associated with the namespace head. Because of this
> separation, placing all counters under a single sub-directory would
> not be feasible.
>
> If the concern is reducing clutter in existing directories, we could
> instead introduce a dedicated sub-directory under each object and export
> the counters there. For example, we could create a "diag" directory under
> the per-ns path, controller, and ns-head directories and then export the
> respective counters under "diag". What do you suggest?
>
>> Last thought, as you are probably aware, John Garry is proposing to
>> lift the nvme multipath into a generic library, which suggests many of
>> these events would also need to be generic. Should some of these, like
>> error and retry counts, be appended to the generic disk stats instead?
>
> Yes, I am aware of the libmultipath work.
> I agree that retry and error counters might conceptually fit into
> generic disk statistics. However the intent of these diagnostic counters
> is to capture all relevant events, including passthrough commands.
>
> Passthrough requests are typically not accounted for in generic disk
> statistics, which makes that interface unsuitable for these counters.
> Additionally some counters are reported at the controller level, and
> controllers do not have an associated gendisk or block device.
>
> For these reasons exporting them through the dedicated sysfs interfaces
> appears to be the most appropriate approach.
>
Based on the points you raised earlier in the thread, I propose the
following updates to the series:
1. Convert the stat/event counters to atomic_long_t to ensure correctness.
2. Make the attribute naming consistent by using a uniform _count suffix
for all counters.
3. Export the counters under a dedicated sysfs sub-directory. Since the
counters are exposed from different kobjects (per-ns path, controller,
and ns-head), I would add a diag sub-directory under each of those
objects and place the respective counters there.
4. Keep these counters out of the generic gendisk statistics, as some of the
counters include passthrough requests, which are typically not accounted
for in the generic disk stats interface.
Please let me know if this approach looks reasonable.
Thanks,
--Nilay
* Re: [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
2026-03-09 15:32 ` John Garry
@ 2026-03-19 15:55 ` Nilay Shroff
0 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2026-03-19 15:55 UTC (permalink / raw)
To: John Garry, linux-nvme
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Hannes Reinecke,
Daniel Wagner, wenxiong, gjoyce, Sagi Grimberg
Hi John,
Sorry for the delayed reply; it looks like you sent your response
(probably by mistake) only to the list, so I have only just seen your
message. I am adding back all recipients.
On 3/9/26 9:02 PM, John Garry wrote:
> On 08/03/2026 18:55, Nilay Shroff wrote:
>>> Last thought, as you are probably aware, John Garry is proposing to
>>> lift the nvme multipath into a generic library, which suggests many of
>>> these events would also need to be generic. Should some of these, like
>>> error and retry counts, be appended to the generic disk stats instead?
>>
>
> Thanks for the mention
>
>> Yes, I am aware of the libmultipath work.
>> I agree that retry and error counters might conceptually fit into
>> generic disk statistics. However the intent of these diagnostic counters
>> is to capture all relevant events, including passthrough commands.
>>
>> Passthrough requests are typically not accounted for in generic disk
>> statistics, which makes that interface unsuitable for these counters.
>> Additionally some counters are reported at the controller level, and
>> controllers do not have an associated gendisk or block device.
>>
>> For these reasons exporting them through the dedicated sysfs interfaces
>> appears to be the most appropriate approach.
>
> From the current list of proposed counters, my thoughts per counter are WRT SCSI:
> "nvme: export command retry count"
> The ACRE which this is based on is not relevant (to SCSI), and I would be reluctant to add such a counter for scsi_devices
>
> "nvme: export multipath failover "
> I think that this could be added for scsi_mpath_device class
Ack
>
> "nvme: export command error counters "
> Similar as "nvme: export command retry count"
>
> "nvme: export I/O requeue count when no path is available "
> I think that this could be added for scsi_mpath_device class
I think this one should be added under mpath_head; this counter
represents the number of I/Os that had to be re-queued (i.e. placed
on mpath_head->requeue_list) because none of the paths was available
at the time.
>
> "nvme: export I/O failure"
> Not really relevant to SCSI, or more relevant to SCSI low-level drivers (which I would not want to expose as an ABI for SCSI multipath)
>
I think this one as well could be added under mpath_head; this
counter represents the number of I/Os that were forced to fail,
typically because all paths reachable via the head node were either
deleted or not usable at all.
> "nvme: export controller reset event count "
> Same as "nvme: export I/O failure"
>
> "nvme: export controller reconnect "
> Again, same as "nvme: export I/O failure"
>
> BTW, I think that the counters should be atomic - otherwise we are not getting accurate results. And, as is mentioned, none seem to be in the fastpath (so I don't know why not have them as atomic).
>
Yes, I will make the counters atomic, as Keith suggested earlier.
> Finally, some of these counters seem to me to be more suitable for a debugfs (and not sysfs).
>
You are correct, but these counters are meant to be consumed by
nvme-cli (and mostly by nvme-top), and debugfs may not always be
available or mounted on a production system. For that reason,
exporting the metrics through sysfs ensures they are consistently
accessible in production environments.
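As an illustration of that consumption model, a monitoring tool only needs to re-read the counter files periodically and report deltas. The sketch below is hypothetical (the "diag/*_count" sysfs layout is only what this thread proposes, not a merged ABI), so the reader function is exercised against an ordinary temp file rather than a real sysfs path:

```python
from pathlib import Path

def read_count(path):
    """Read one integer counter file (e.g. a sysfs attribute)."""
    return int(Path(path).read_text().strip())

def delta(prev, cur):
    """Per-interval increase; counters only move forward unless reset."""
    return cur - prev if cur >= prev else cur  # treat a reset as a fresh start

# Usage against a plain file standing in for a sysfs attribute:
import os
import tempfile

fd, tmp = tempfile.mkstemp()
os.write(fd, b"7\n")
os.close(fd)
prev = read_count(tmp)               # counter reads 7
Path(tmp).write_text("12\n")         # counter advanced to 12
print(delta(prev, read_count(tmp)))  # prints 5
os.remove(tmp)
```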
Thanks,
--Nilay
end of thread, other threads:[~2026-03-19 15:56 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-20 17:48 [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 1/7] nvme: export command retry count " Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 2/7] nvme: export multipath failover " Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 3/7] nvme: export command error counters " Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 4/7] nvme: export I/O requeue count when no path is available " Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 5/7] nvme: export I/O failure " Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 6/7] nvme: export controller reset event count " Nilay Shroff
2026-02-20 17:48 ` [PATCHv3 7/7] nvme: export controller reconnect " Nilay Shroff
2026-02-22 12:36 ` [PATCHv3 0/7] nvme: export additional diagnostic counters " Venkat
2026-02-22 14:10 ` Nilay Shroff
2026-02-22 15:06 ` Venkat Rao Bagalkote
2026-02-26 5:37 ` Chaitanya Kulkarni
2026-03-04 14:33 ` Nilay Shroff
2026-03-06 16:02 ` Keith Busch
2026-03-08 18:55 ` Nilay Shroff
2026-03-09 15:32 ` John Garry
2026-03-19 15:55 ` Nilay Shroff
2026-03-16 12:56 ` Nilay Shroff