* [PATCHv2 0/7] nvme: export additional diagnostic counters via sysfs
@ 2026-02-05 12:47 Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 1/7] nvme: export command retry count " Nilay Shroff
` (6 more replies)
0 siblings, 7 replies; 15+ messages in thread
From: Nilay Shroff @ 2026-02-05 12:47 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
Hi,
The NVMe driver encounters various events and conditions during normal
operation that are either not tracked today or not exposed to userspace
via sysfs. Lack of visibility into these events can make it difficult to
diagnose subtle issues related to controller behavior, multipath
stability, and I/O reliability.
This patchset adds several diagnostic counters that provide improved
observability into NVMe behavior. These counters are intended to help
users understand events such as transient path unavailability,
controller retries/reconnect/reset, failovers, and I/O failures. They
can also be consumed by monitoring tools such as nvme-top.
Specifically, this series proposes to export the following counters via
sysfs:
- Command retry count
- Multipath failover count
- Command error count
- I/O requeue count
- I/O failure count
- Controller reset event counts
- Controller reconnect counts
The patchset consists of seven patches:
Patch 1: Export command retry count
Patch 2: Export multipath failover count
Patch 3: Export command error count
Patch 4: Export I/O requeue count
Patch 5: Export I/O failure count
Patch 6: Export controller reset event counts
Patch 7: Export controller reconnect event count
Please note that this patchset doesn't make any functional changes; it
only exports the relevant counters to user space via sysfs.
As usual, feedback/comments/suggestions are welcome!
Changes from v1:
- Remove export of stats for admin command retry count (Keith)
- Use size_add() to ensure stat counters don't overflow (Keith)
Nilay Shroff (7):
nvme: export command retry count via sysfs
nvme: export multipath failover count via sysfs
nvme: export command error counters via sysfs
nvme: export I/O requeue count when no path is available via sysfs
nvme: export I/O failure count when no path is available via sysfs
nvme: export controller reset event count via sysfs
nvme: export controller reconnect event count via sysfs
drivers/nvme/host/core.c | 15 ++++++-
drivers/nvme/host/multipath.c | 34 +++++++++++++++
drivers/nvme/host/nvme.h | 11 ++++-
drivers/nvme/host/sysfs.c | 78 +++++++++++++++++++++++++++++++++++
4 files changed, 136 insertions(+), 2 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCHv2 1/7] nvme: export command retry count via sysfs
2026-02-05 12:47 [PATCHv2 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
@ 2026-02-05 12:48 ` Nilay Shroff
2026-02-07 13:28 ` Sagi Grimberg
2026-02-05 12:48 ` [PATCHv2 2/7] nvme: export multipath failover " Nilay Shroff
` (5 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: Nilay Shroff @ 2026-02-05 12:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When Advanced Command Retry Enable (ACRE) is configured, a controller
may interrupt command execution and return a completion status
indicating command interrupted with the DNR bit cleared. In this case,
the driver retries the command based on the Command Retry Delay (CRD)
value provided in the completion status.
Currently, these command retries are handled entirely within the NVMe
driver and are not visible to userspace. As a result, there is no
observability into retry behavior, which can be a useful diagnostic
signal.
Expose the command retry count through sysfs to provide visibility
into retry activity. This information can help identify controller-side
congestion under load and enables comparison across paths in multipath
setups (for example, detecting cases where one path experiences
significantly more retries than another under identical workloads).
This exported metric is intended for diagnostics and monitoring tools
such as nvme-top, and does not change command retry behavior.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/core.c | 4 ++++
drivers/nvme/host/nvme.h | 2 +-
drivers/nvme/host/sysfs.c | 14 ++++++++++++++
3 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7bf228df6001..d2c430ec0077 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -323,6 +323,7 @@ static void nvme_retry_req(struct request *req)
{
unsigned long delay = 0;
u16 crd;
+ struct nvme_ns *ns = req->q->queuedata;
/* The mask and shift result must be <= 3 */
crd = (nvme_req(req)->status & NVME_STATUS_CRD) >> 11;
@@ -330,6 +331,9 @@ static void nvme_retry_req(struct request *req)
delay = nvme_req(req)->ctrl->crdt[crd - 1] * 100;
nvme_req(req)->retries++;
+ if (ns)
+ ns->retries = size_add(ns->retries, 1);
+
blk_mq_requeue_request(req, false);
blk_mq_delay_kick_requeue_list(req->q, delay);
}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9a5f28c5103c..237829cdc151 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -359,7 +359,6 @@ struct nvme_ctrl {
unsigned long ka_last_check_time;
struct work_struct fw_act_work;
unsigned long events;
-
#ifdef CONFIG_NVME_MULTIPATH
/* asymmetric namespace access: */
u8 anacap;
@@ -535,6 +534,7 @@ struct nvme_ns {
enum nvme_ana_state ana_state;
u32 ana_grpid;
#endif
+ size_t retries;
struct list_head siblings;
struct kref kref;
struct nvme_ns_head *head;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 29430949ce2f..174d099246b4 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -246,6 +246,15 @@ static ssize_t nuse_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(nuse);
+static ssize_t command_retries_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+ return sysfs_emit(buf, "%zu\n", ns->retries);
+}
+static DEVICE_ATTR_RO(command_retries);
+
static struct attribute *nvme_ns_attrs[] = {
&dev_attr_wwid.attr,
&dev_attr_uuid.attr,
@@ -263,6 +272,7 @@ static struct attribute *nvme_ns_attrs[] = {
&dev_attr_delayed_removal_secs.attr,
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
+ &dev_attr_command_retries.attr,
NULL,
};
@@ -285,6 +295,10 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (!memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
return 0;
}
+ if (a == &dev_attr_command_retries.attr) {
+ if (nvme_disk_is_ns_head(dev_to_disk(dev)))
+ return 0;
+ }
#ifdef CONFIG_NVME_MULTIPATH
if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
/* per-path attr */
--
2.52.0
* [PATCHv2 2/7] nvme: export multipath failover count via sysfs
2026-02-05 12:47 [PATCHv2 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 1/7] nvme: export command retry count " Nilay Shroff
@ 2026-02-05 12:48 ` Nilay Shroff
2026-02-07 13:30 ` Sagi Grimberg
2026-02-05 12:48 ` [PATCHv2 3/7] nvme: export command error counters " Nilay Shroff
` (4 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: Nilay Shroff @ 2026-02-05 12:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When an NVMe command completes with a path-specific error, the NVMe
driver may retry the command on an alternate controller or path if one
is available. These failover events indicate that I/O was redirected
away from the original path.
Currently, the number of times requests are failed over to another
available path is not visible to userspace. Exposing this information
can be useful for diagnosing path health and stability.
Export the multipath failover count through sysfs to provide visibility
into path failover behavior. This statistic can be consumed by
monitoring tools such as nvme-top to help identify paths that
consistently trigger failovers under load.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/multipath.c | 10 ++++++++++
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/sysfs.c | 5 +++++
3 files changed, 17 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 174027d1cc19..792385477211 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -142,6 +142,7 @@ void nvme_failover_req(struct request *req)
struct bio *bio;
nvme_mpath_clear_current_path(ns);
+ ns->failover = size_add(ns->failover, 1);
/*
* If we got back an ANA error, we know the controller is alive but not
@@ -1168,6 +1169,15 @@ static ssize_t delayed_removal_secs_store(struct device *dev,
DEVICE_ATTR_RW(delayed_removal_secs);
+static ssize_t multipath_failover_count_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+ return sysfs_emit(buf, "%zu\n", ns->failover);
+}
+DEVICE_ATTR_RO(multipath_failover_count);
+
static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 237829cdc151..6307243fd216 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -533,6 +533,7 @@ struct nvme_ns {
#ifdef CONFIG_NVME_MULTIPATH
enum nvme_ana_state ana_state;
u32 ana_grpid;
+ size_t failover;
#endif
size_t retries;
struct list_head siblings;
@@ -1000,6 +1001,7 @@ extern struct device_attribute dev_attr_ana_state;
extern struct device_attribute dev_attr_queue_depth;
extern struct device_attribute dev_attr_numa_nodes;
extern struct device_attribute dev_attr_delayed_removal_secs;
+extern struct device_attribute dev_attr_multipath_failover_count;
extern struct device_attribute subsys_attr_iopolicy;
static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 174d099246b4..34dcb6db9b5c 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -270,6 +270,7 @@ static struct attribute *nvme_ns_attrs[] = {
&dev_attr_queue_depth.attr,
&dev_attr_numa_nodes.attr,
&dev_attr_delayed_removal_secs.attr,
+ &dev_attr_multipath_failover_count.attr,
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
&dev_attr_command_retries.attr,
@@ -317,6 +318,10 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (!nvme_disk_is_ns_head(disk))
return 0;
}
+ if (a == &dev_attr_multipath_failover_count.attr) {
+ if (nvme_disk_is_ns_head(dev_to_disk(dev)))
+ return 0;
+ }
#endif
return a->mode;
}
--
2.52.0
* [PATCHv2 3/7] nvme: export command error counters via sysfs
2026-02-05 12:47 [PATCHv2 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 1/7] nvme: export command retry count " Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 2/7] nvme: export multipath failover " Nilay Shroff
@ 2026-02-05 12:48 ` Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 4/7] nvme: export I/O requeue count when no path is available " Nilay Shroff
` (3 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: Nilay Shroff @ 2026-02-05 12:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When an NVMe command completes with an error status, the driver
logs the error to the kernel log. However, these messages may be
lost or overwritten over time since dmesg is a circular buffer.
Expose per-path and per-controller command error counters through sysfs
provide persistent visibility into error occurrences. This allows
users to observe the total number of commands that have failed on
a given path over time, which can be useful for diagnosing path
health and stability.
These counters can also be consumed by observability tools such as
nvme-top to provide additional insight into NVMe error behavior.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/core.c | 10 +++++++++-
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/sysfs.c | 29 +++++++++++++++++++++++++++++
3 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d2c430ec0077..11eb28117501 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -438,11 +438,19 @@ static inline void nvme_end_req_zoned(struct request *req)
static inline void __nvme_end_req(struct request *req)
{
- if (unlikely(nvme_req(req)->status && !(req->rq_flags & RQF_QUIET))) {
+ struct nvme_ns *ns = req->q->queuedata;
+ struct nvme_request *nr = nvme_req(req);
+
+ if (unlikely(nr->status && !(req->rq_flags & RQF_QUIET))) {
if (blk_rq_is_passthrough(req))
nvme_log_err_passthru(req);
else
nvme_log_error(req);
+
+ if (ns)
+ ns->errors = size_add(ns->errors, 1);
+ else
+ nr->ctrl->errors = size_add(nr->ctrl->errors, 1);
}
nvme_end_req_zoned(req);
nvme_trace_bio_complete(req);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 6307243fd216..83b102a0ad89 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -359,6 +359,7 @@ struct nvme_ctrl {
unsigned long ka_last_check_time;
struct work_struct fw_act_work;
unsigned long events;
+ size_t errors;
#ifdef CONFIG_NVME_MULTIPATH
/* asymmetric namespace access: */
u8 anacap;
@@ -536,6 +537,7 @@ struct nvme_ns {
size_t failover;
#endif
size_t retries;
+ size_t errors;
struct list_head siblings;
struct kref kref;
struct nvme_ns_head *head;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 34dcb6db9b5c..4690ef9a1948 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -6,6 +6,7 @@
*/
#include <linux/nvme-auth.h>
+#include <linux/blkdev.h>
#include "nvme.h"
#include "fabrics.h"
@@ -255,6 +256,16 @@ static ssize_t command_retries_show(struct device *dev,
}
static DEVICE_ATTR_RO(command_retries);
+static ssize_t nvme_io_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+ return sysfs_emit(buf, "%zu\n", ns->errors);
+}
+struct device_attribute dev_attr_io_errors =
+ __ATTR(command_error_count, 0444, nvme_io_errors_show, NULL);
+
static struct attribute *nvme_ns_attrs[] = {
&dev_attr_wwid.attr,
&dev_attr_uuid.attr,
@@ -274,6 +285,7 @@ static struct attribute *nvme_ns_attrs[] = {
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
&dev_attr_command_retries.attr,
+ &dev_attr_io_errors.attr,
NULL,
};
@@ -300,6 +312,12 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (nvme_disk_is_ns_head(dev_to_disk(dev)))
return 0;
}
+ if (a == &dev_attr_io_errors.attr) {
+ struct gendisk *disk = dev_to_disk(dev);
+
+ if (nvme_disk_is_ns_head(disk))
+ return 0;
+ }
#ifdef CONFIG_NVME_MULTIPATH
if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
/* per-path attr */
@@ -620,6 +638,16 @@ static ssize_t dctype_show(struct device *dev,
}
static DEVICE_ATTR_RO(dctype);
+static ssize_t nvme_adm_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%zu\n", ctrl->errors);
+}
+struct device_attribute dev_attr_adm_errors =
+ __ATTR(command_error_count, 0444, nvme_adm_errors_show, NULL);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -766,6 +794,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_dhchap_ctrl_secret.attr,
#endif
&dev_attr_adm_passthru_err_log_enabled.attr,
+ &dev_attr_adm_errors.attr,
NULL
};
--
2.52.0
* [PATCHv2 4/7] nvme: export I/O requeue count when no path is available via sysfs
2026-02-05 12:47 [PATCHv2 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (2 preceding siblings ...)
2026-02-05 12:48 ` [PATCHv2 3/7] nvme: export command error counters " Nilay Shroff
@ 2026-02-05 12:48 ` Nilay Shroff
2026-02-07 13:33 ` Sagi Grimberg
2026-02-05 12:48 ` [PATCHv2 5/7] nvme: export I/O failure " Nilay Shroff
` (2 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: Nilay Shroff @ 2026-02-05 12:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When the NVMe namespace head determines that there is no currently
available path to handle I/O (for example, while a controller is
resetting/connecting or due to a transient link failure), incoming
I/Os are added to the requeue list.
Currently, there is no visibility into how many I/Os have been requeued
in this situation. Add a new sysfs counter, requeue_no_usable_path,
to expose the number of I/Os that were requeued due to the absence of
an available path.
This statistic can help users understand I/O slowdowns or stalls caused
by temporary path unavailability, and can be consumed by monitoring
tools such as nvme-top for real-time observability.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/multipath.c | 12 ++++++++++++
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/sysfs.c | 5 +++++
3 files changed, 19 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 792385477211..e0bfbc659963 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -539,6 +539,8 @@ static void nvme_ns_head_submit_bio(struct bio *bio)
spin_lock_irq(&head->requeue_lock);
bio_list_add(&head->requeue_list, bio);
spin_unlock_irq(&head->requeue_lock);
+ head->requeue_no_usable_path =
+ size_add(head->requeue_no_usable_path, 1);
} else {
dev_warn_ratelimited(dev, "no available path - failing I/O\n");
@@ -1178,6 +1180,16 @@ static ssize_t multipath_failover_count_show(struct device *dev,
}
DEVICE_ATTR_RO(multipath_failover_count);
+static ssize_t requeue_no_usable_path_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+ struct nvme_ns_head *head = disk->private_data;
+
+ return sysfs_emit(buf, "%zu\n", head->requeue_no_usable_path);
+}
+DEVICE_ATTR_RO(requeue_no_usable_path);
+
static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 83b102a0ad89..39e5b5c7885b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -508,6 +508,7 @@ struct nvme_ns_head {
unsigned long flags;
struct delayed_work remove_work;
unsigned int delayed_removal_secs;
+ size_t requeue_no_usable_path;
#define NVME_NSHEAD_DISK_LIVE 0
#define NVME_NSHEAD_QUEUE_IF_NO_PATH 1
struct nvme_ns __rcu *current_path[];
@@ -1004,6 +1005,7 @@ extern struct device_attribute dev_attr_queue_depth;
extern struct device_attribute dev_attr_numa_nodes;
extern struct device_attribute dev_attr_delayed_removal_secs;
extern struct device_attribute dev_attr_multipath_failover_count;
+extern struct device_attribute dev_attr_requeue_no_usable_path;
extern struct device_attribute subsys_attr_iopolicy;
static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 4690ef9a1948..a6b8539074ee 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -282,6 +282,7 @@ static struct attribute *nvme_ns_attrs[] = {
&dev_attr_numa_nodes.attr,
&dev_attr_delayed_removal_secs.attr,
&dev_attr_multipath_failover_count.attr,
+ &dev_attr_requeue_no_usable_path.attr,
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
&dev_attr_command_retries.attr,
@@ -340,6 +341,10 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (nvme_disk_is_ns_head(dev_to_disk(dev)))
return 0;
}
+ if (a == &dev_attr_requeue_no_usable_path.attr) {
+ if (!nvme_disk_is_ns_head(dev_to_disk(dev)))
+ return 0;
+ }
#endif
return a->mode;
}
--
2.52.0
* [PATCHv2 5/7] nvme: export I/O failure count when no path is available via sysfs
2026-02-05 12:47 [PATCHv2 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (3 preceding siblings ...)
2026-02-05 12:48 ` [PATCHv2 4/7] nvme: export I/O requeue count when no path is available " Nilay Shroff
@ 2026-02-05 12:48 ` Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 6/7] nvme: export controller reset event count " Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 7/7] nvme: export controller reconnect " Nilay Shroff
6 siblings, 0 replies; 15+ messages in thread
From: Nilay Shroff @ 2026-02-05 12:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When I/O is submitted to the NVMe namespace head and no available path
can handle the request, the driver fails the I/O immediately. Currently,
such failures are only reported via kernel log messages, which may be
lost over time since dmesg is a circular buffer.
Add a new sysfs counter, fail_no_available_path, to expose the number of
I/Os that failed due to the absence of an available path. This provides
persistent visibility into path-related I/O failures and can help users
diagnose the cause of I/O errors.
This counter can also be consumed by monitoring tools such as nvme-top.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/multipath.c | 12 ++++++++++++
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/sysfs.c | 5 +++++
3 files changed, 19 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index e0bfbc659963..9984221bcec0 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -545,6 +545,8 @@ static void nvme_ns_head_submit_bio(struct bio *bio)
dev_warn_ratelimited(dev, "no available path - failing I/O\n");
bio_io_error(bio);
+ head->fail_no_available_path =
+ size_add(head->fail_no_available_path, 1);
}
srcu_read_unlock(&head->srcu, srcu_idx);
@@ -1190,6 +1192,16 @@ static ssize_t requeue_no_usable_path_show(struct device *dev,
}
DEVICE_ATTR_RO(requeue_no_usable_path);
+static ssize_t fail_no_available_path_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+ struct nvme_ns_head *head = disk->private_data;
+
+ return sysfs_emit(buf, "%zu\n", head->fail_no_available_path);
+}
+DEVICE_ATTR_RO(fail_no_available_path);
+
static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 39e5b5c7885b..b1ce2857899a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -509,6 +509,7 @@ struct nvme_ns_head {
struct delayed_work remove_work;
unsigned int delayed_removal_secs;
size_t requeue_no_usable_path;
+ size_t fail_no_available_path;
#define NVME_NSHEAD_DISK_LIVE 0
#define NVME_NSHEAD_QUEUE_IF_NO_PATH 1
struct nvme_ns __rcu *current_path[];
@@ -1006,6 +1007,7 @@ extern struct device_attribute dev_attr_numa_nodes;
extern struct device_attribute dev_attr_delayed_removal_secs;
extern struct device_attribute dev_attr_multipath_failover_count;
extern struct device_attribute dev_attr_requeue_no_usable_path;
+extern struct device_attribute dev_attr_fail_no_available_path;
extern struct device_attribute subsys_attr_iopolicy;
static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index a6b8539074ee..c1e2b93f7ae8 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -283,6 +283,7 @@ static struct attribute *nvme_ns_attrs[] = {
&dev_attr_delayed_removal_secs.attr,
&dev_attr_multipath_failover_count.attr,
&dev_attr_requeue_no_usable_path.attr,
+ &dev_attr_fail_no_available_path.attr,
#endif
&dev_attr_io_passthru_err_log_enabled.attr,
&dev_attr_command_retries.attr,
@@ -345,6 +346,10 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
if (!nvme_disk_is_ns_head(dev_to_disk(dev)))
return 0;
}
+ if (a == &dev_attr_fail_no_available_path.attr) {
+ if (!nvme_disk_is_ns_head(dev_to_disk(dev)))
+ return 0;
+ }
#endif
return a->mode;
}
--
2.52.0
* [PATCHv2 6/7] nvme: export controller reset event count via sysfs
2026-02-05 12:47 [PATCHv2 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (4 preceding siblings ...)
2026-02-05 12:48 ` [PATCHv2 5/7] nvme: export I/O failure " Nilay Shroff
@ 2026-02-05 12:48 ` Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 7/7] nvme: export controller reconnect " Nilay Shroff
6 siblings, 0 replies; 15+ messages in thread
From: Nilay Shroff @ 2026-02-05 12:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
The NVMe controller transitions into the RESETTING state during error
recovery, link instability, firmware activation, or when a reset is
explicitly triggered by the user.
Expose a controller reset event count via sysfs to provide visibility
into these RESETTING state transitions. Observing the frequency of reset
events can help users identify issues such as PCIe errors or unstable
fabric links.
This counter can also be consumed by monitoring tools such as nvme-top
to improve controller-level observability.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/core.c | 1 +
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/sysfs.c | 10 ++++++++++
3 files changed, 12 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 11eb28117501..36af86515cb7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -591,6 +591,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
case NVME_CTRL_NEW:
case NVME_CTRL_LIVE:
changed = true;
+ ctrl->nr_reset = size_add(ctrl->nr_reset, 1);
fallthrough;
default:
break;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b1ce2857899a..5d90e5fa7298 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -360,6 +360,7 @@ struct nvme_ctrl {
struct work_struct fw_act_work;
unsigned long events;
size_t errors;
+ size_t nr_reset;
#ifdef CONFIG_NVME_MULTIPATH
/* asymmetric namespace access: */
u8 anacap;
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index c1e2b93f7ae8..7958fe998139 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -658,6 +658,15 @@ static ssize_t nvme_adm_errors_show(struct device *dev,
struct device_attribute dev_attr_adm_errors =
__ATTR(command_error_count, 0444, nvme_adm_errors_show, NULL);
+static ssize_t reset_events_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%zu\n", ctrl->nr_reset);
+}
+static DEVICE_ATTR_RO(reset_events);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -805,6 +814,7 @@ static struct attribute *nvme_dev_attrs[] = {
#endif
&dev_attr_adm_passthru_err_log_enabled.attr,
&dev_attr_adm_errors.attr,
+ &dev_attr_reset_events.attr,
NULL
};
--
2.52.0
* [PATCHv2 7/7] nvme: export controller reconnect event count via sysfs
2026-02-05 12:47 [PATCHv2 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
` (5 preceding siblings ...)
2026-02-05 12:48 ` [PATCHv2 6/7] nvme: export controller reset event count " Nilay Shroff
@ 2026-02-05 12:48 ` Nilay Shroff
2026-02-07 13:37 ` Sagi Grimberg
6 siblings, 1 reply; 15+ messages in thread
From: Nilay Shroff @ 2026-02-05 12:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
Nilay Shroff
When an NVMe-oF link goes down, the driver attempts to recover the
connection by repeatedly reconnecting to the remote controller at
configured intervals. A maximum number of reconnect attempts is also
configured, after which recovery stops and the controller is removed
if the connection cannot be re-established.
The driver maintains a counter, nr_reconnects, which is incremented on
each reconnect attempt. Currently, this counter is only reported via
kernel log messages and is not exposed to userspace. Since dmesg is a
circular buffer, this information may be lost over time.
Expose the nr_reconnects counter via a new sysfs attribute,
reconnect_events, to provide persistent visibility into the number of
reconnect attempts made by the host. This information can help users
diagnose unstable links or connectivity issues.
This counter can also be consumed by monitoring tools such as nvme-top
to improve controller-level observability.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
drivers/nvme/host/sysfs.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 7958fe998139..0b3fdd55ba30 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -667,6 +667,15 @@ static ssize_t reset_events_show(struct device *dev,
}
static DEVICE_ATTR_RO(reset_events);
+static ssize_t reconnect_events_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%u\n", ctrl->nr_reconnects);
+}
+static DEVICE_ATTR_RO(reconnect_events);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -815,6 +824,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_adm_passthru_err_log_enabled.attr,
&dev_attr_adm_errors.attr,
&dev_attr_reset_events.attr,
+ &dev_attr_reconnect_events.attr,
NULL
};
--
2.52.0
* Re: [PATCHv2 1/7] nvme: export command retry count via sysfs
2026-02-05 12:48 ` [PATCHv2 1/7] nvme: export command retry count " Nilay Shroff
@ 2026-02-07 13:28 ` Sagi Grimberg
2026-02-09 11:48 ` Nilay Shroff
0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2026-02-07 13:28 UTC (permalink / raw)
To: Nilay Shroff, linux-nvme
Cc: kbusch, axboe, hch, hare, dwagner, wenxiong, gjoyce
On 05/02/2026 14:48, Nilay Shroff wrote:
> When Advanced Command Retry Enable (ACRE) is configured, a controller
> may interrupt command execution and return a completion status
> indicating command interrupted with the DNR bit cleared. In this case,
> the driver retries the command based on the Command Retry Delay (CRD)
> value provided in the completion status.
>
> Currently, these command retries are handled entirely within the NVMe
> driver and are not visible to userspace. As a result, there is no
> observability into retry behavior, which can be a useful diagnostic
> signal.
>
> Expose the command retries count through sysfs to provide visibility
> into retry activity. This information can help identify controller-side
> congestion under load and enables comparison across paths in multipath
> setups (for example, detecting cases where one path experiences
> significantly more retries than another under identical workloads).
>
> This exported metric is intended for diagnostics and monitoring tools
> such as nvme-top, and does not change command retry behavior.
This is designed to show an accumulated value of how many retries were
done on a namespace since boot? I'm wondering if this does not belong in
debugfs?
* Re: [PATCHv2 2/7] nvme: export multipath failover count via sysfs
2026-02-05 12:48 ` [PATCHv2 2/7] nvme: export multipath failover " Nilay Shroff
@ 2026-02-07 13:30 ` Sagi Grimberg
0 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2026-02-07 13:30 UTC (permalink / raw)
To: Nilay Shroff, linux-nvme
Cc: kbusch, axboe, hch, hare, dwagner, wenxiong, gjoyce
On 05/02/2026 14:48, Nilay Shroff wrote:
> When an NVMe command completes with a path-specific error, the NVMe
> driver may retry the command on an alternate controller or path if one
> is available. These failover events indicate that I/O was redirected
> away from the original path.
>
> Currently, the number of times requests are failed over to another
> available path is not visible to userspace. Exposing this information
> can be useful for diagnosing path health and stability.
>
> Export the multipath failover count through sysfs to provide visibility
> into path failover behavior. This statistic can be consumed by
> monitoring tools such as nvme-top to help identify paths that
> consistently trigger failovers under load.
Same comment. Other than that, patch looks fine.
* Re: [PATCHv2 4/7] nvme: export I/O requeue count when no path is available via sysfs
2026-02-05 12:48 ` [PATCHv2 4/7] nvme: export I/O requeue count when no path is available " Nilay Shroff
@ 2026-02-07 13:33 ` Sagi Grimberg
2026-02-09 11:53 ` Nilay Shroff
0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2026-02-07 13:33 UTC (permalink / raw)
To: Nilay Shroff, linux-nvme
Cc: kbusch, axboe, hch, hare, dwagner, wenxiong, gjoyce
> When the NVMe namespace head determines that there is no currently
> available path to handle I/O (for example, while a controller is
> resetting/connecting or due to a transient link failure), incoming
> I/Os are added to the requeue list.
>
> Currently, there is no visibility into how many I/Os have been requeued
> in this situation. Add a new sysfs counter, requeue_no_available_path,
> to expose the number of I/Os that were requeued due to the absence of
> an available path.
>
> This statistic can help users understand I/O slowdowns or stalls caused
> by temporary path unavailability, and can be consumed by monitoring
> tools such as nvme-top for real-time observability.
Other than the debugfs comment, would it make sense to add reset
functionality to these files (via a write to the file)?
Other than that, patch makes sense to me.
* Re: [PATCHv2 7/7] nvme: export controller reconnect event count via sysfs
2026-02-05 12:48 ` [PATCHv2 7/7] nvme: export controller reconnect " Nilay Shroff
@ 2026-02-07 13:37 ` Sagi Grimberg
2026-02-09 12:00 ` Nilay Shroff
0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2026-02-07 13:37 UTC (permalink / raw)
To: Nilay Shroff, linux-nvme
Cc: kbusch, axboe, hch, hare, dwagner, wenxiong, gjoyce
On 05/02/2026 14:48, Nilay Shroff wrote:
> When an NVMe-oF link goes down, the driver attempts to recover the
> connection by repeatedly reconnecting to the remote controller at
> configured intervals. A maximum number of reconnect attempts is also
> configured, after which recovery stops and the controller is removed
> if the connection cannot be re-established.
>
> The driver maintains a counter, nr_reconnects, which is incremented on
> each reconnect attempt. Currently, this counter is only reported via
> kernel log messages and is not exposed to userspace. Since dmesg is a
> circular buffer, this information may be lost over time.
>
> Expose the nr_reconnects counter via a new sysfs attribute, reconnect_
> events, to provide persistent visibility into the number of reconnect
> attempts made by the host. This information can help users diagnose
> unstable links or connectivity issues.
This one, unlike the others, is zeroed once the controller successfully
reconnects. I think it is better to keep these sysfs entries consistent.
* Re: [PATCHv2 1/7] nvme: export command retry count via sysfs
2026-02-07 13:28 ` Sagi Grimberg
@ 2026-02-09 11:48 ` Nilay Shroff
0 siblings, 0 replies; 15+ messages in thread
From: Nilay Shroff @ 2026-02-09 11:48 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme
Cc: kbusch, axboe, hch, hare, dwagner, wenxiong, gjoyce
On 2/7/26 6:58 PM, Sagi Grimberg wrote:
>
>
> On 05/02/2026 14:48, Nilay Shroff wrote:
>> When Advanced Command Retry Enable (ACRE) is configured, a controller
>> may interrupt command execution and return a completion status
>> indicating command interrupted with the DNR bit cleared. In this case,
>> the driver retries the command based on the Command Retry Delay (CRD)
>> value provided in the completion status.
>>
>> Currently, these command retries are handled entirely within the NVMe
>> driver and are not visible to userspace. As a result, there is no
>> observability into retry behavior, which can be a useful diagnostic
>> signal.
>>
>> Expose the command retries count through sysfs to provide visibility
>> into retry activity. This information can help identify controller-side
>> congestion under load and enables comparison across paths in multipath
>> setups (for example, detecting cases where one path experiences
>> significantly more retries than another under identical workloads).
>>
>> This exported metric is intended for diagnostics and monitoring tools
>> such as nvme-top, and does not change command retry behavior.
>
> This is designed to show an accumulated value of how many retries were
> done on a namespace since boot? I'm wondering whether this belongs in
> debugfs instead.
Yes, that’s correct — the intent is to expose an accumulated count of command
retries for a namespace since boot.
While debugfs could be used for this type of diagnostic information, it would
not work well for the intended userspace consumption. The retry counter is
expected to be consumed by tools such as nvme-cli (and potentially nvme-top),
so it needs to be available via sysfs. Relying on debugfs would make this
information unavailable on production systems where debugfs may not be enabled
or mounted. For that reason, exporting the metric through sysfs ensures it is
consistently accessible in production environments.
Thanks,
--Nilay
* Re: [PATCHv2 4/7] nvme: export I/O requeue count when no path is available via sysfs
2026-02-07 13:33 ` Sagi Grimberg
@ 2026-02-09 11:53 ` Nilay Shroff
0 siblings, 0 replies; 15+ messages in thread
From: Nilay Shroff @ 2026-02-09 11:53 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme
Cc: kbusch, axboe, hch, hare, dwagner, wenxiong, gjoyce
On 2/7/26 7:03 PM, Sagi Grimberg wrote:
>
>> When the NVMe namespace head determines that there is no currently
>> available path to handle I/O (for example, while a controller is
>> resetting/connecting or due to a transient link failure), incoming
>> I/Os are added to the requeue list.
>>
>> Currently, there is no visibility into how many I/Os have been requeued
>> in this situation. Add a new sysfs counter, requeue_no_available_path,
>> to expose the number of I/Os that were requeued due to the absence of
>> an available path.
>>
>> This statistic can help users understand I/O slowdowns or stalls caused
>> by temporary path unavailability, and can be consumed by monitoring
>> tools such as nvme-top for real-time observability.
>
> Other than the debugfs comment, would it make sense to have a reset
> functionality to these files (via write to the file)?
>
Hmm, yes, that can be added. However, do you have a use case in mind
where allowing resets of this counter would be useful?
Thanks,
--Nilay
* Re: [PATCHv2 7/7] nvme: export controller reconnect event count via sysfs
2026-02-07 13:37 ` Sagi Grimberg
@ 2026-02-09 12:00 ` Nilay Shroff
0 siblings, 0 replies; 15+ messages in thread
From: Nilay Shroff @ 2026-02-09 12:00 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme
Cc: kbusch, axboe, hch, hare, dwagner, wenxiong, gjoyce
On 2/7/26 7:07 PM, Sagi Grimberg wrote:
>
>
> On 05/02/2026 14:48, Nilay Shroff wrote:
>> When an NVMe-oF link goes down, the driver attempts to recover the
>> connection by repeatedly reconnecting to the remote controller at
>> configured intervals. A maximum number of reconnect attempts is also
>> configured, after which recovery stops and the controller is removed
>> if the connection cannot be re-established.
>>
>> The driver maintains a counter, nr_reconnects, which is incremented on
>> each reconnect attempt. Currently, this counter is only reported via
>> kernel log messages and is not exposed to userspace. Since dmesg is a
>> circular buffer, this information may be lost over time.
>>
>> Expose the nr_reconnects counter via a new sysfs attribute, reconnect_
>> events, to provide persistent visibility into the number of reconnect
>> attempts made by the host. This information can help users diagnose
>> unstable links or connectivity issues.
>
> This one, unlike the others, is zeroed once the controller successfully
> reconnects. I think it is better to keep these sysfs entries consistent.
Yes, that’s correct. The current patch simply exposes an existing counter via
sysfs, and that counter is reset to zero once a reconnect attempt succeeds.
That said, I agree: for consistency with the other sysfs entries, it would make
sense for this value to behave as an accumulator rather than a transient counter.
We can introduce a new counter that accumulates the total number of reconnect
attempts over time and export that through sysfs instead.
Thanks,
--Nilay
end of thread, other threads:[~2026-02-09 12:00 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-05 12:47 [PATCHv2 0/7] nvme: export additional diagnostic counters via sysfs Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 1/7] nvme: export command retry count " Nilay Shroff
2026-02-07 13:28 ` Sagi Grimberg
2026-02-09 11:48 ` Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 2/7] nvme: export multipath failover " Nilay Shroff
2026-02-07 13:30 ` Sagi Grimberg
2026-02-05 12:48 ` [PATCHv2 3/7] nvme: export command error counters " Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 4/7] nvme: export I/O requeue count when no path is available " Nilay Shroff
2026-02-07 13:33 ` Sagi Grimberg
2026-02-09 11:53 ` Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 5/7] nvme: export I/O failure " Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 6/7] nvme: export controller reset event count " Nilay Shroff
2026-02-05 12:48 ` [PATCHv2 7/7] nvme: export controller reconnect " Nilay Shroff
2026-02-07 13:37 ` Sagi Grimberg
2026-02-09 12:00 ` Nilay Shroff