From: Nilay Shroff <nilay@linux.ibm.com>
To: linux-nvme@lists.infradead.org
Cc: dwagner@suse.de, hare@suse.com, kbusch@kernel.org, hch@lst.de,
sagi@grimberg.me, axboe@kernel.dk, chaitanyak@nvidia.com,
venkat88@linux.ibm.com, gjoyce@linux.ibm.com,
wenxiong@linux.ibm.com, Nilay Shroff <nilay@linux.ibm.com>
Subject: [PATCHv4 0/8] nvme: export additional diagnostic counters via sysfs
Date: Sun, 17 May 2026 00:06:47 +0530
Message-ID: <20260516183709.269937-1-nilay@linux.ibm.com>
Hi,
The NVMe driver encounters various events and conditions during normal
operation that are either not tracked today or not exposed to userspace
via sysfs. Lack of visibility into these events can make it difficult to
diagnose subtle issues related to controller behavior, multipath
stability, and I/O reliability.
This patchset adds several diagnostic counters that provide improved
observability into NVMe behavior. These counters are intended to help
users understand events such as transient path unavailability,
controller retries/reconnect/reset, failovers, and I/O failures. They
can also be consumed by monitoring tools such as nvme-top.
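
As a rough illustration of how a monitoring tool might consume such
counters, the sketch below computes the per-counter increase between two
snapshots. The function name counter_deltas and the counter names used in
the usage note are hypothetical and only serve the example; nothing here
is taken from nvme-top or from the patches themselves.

```python
def counter_deltas(previous, current):
    """Return the per-counter increase between two {name: value} snapshots.

    A counter that only appears in `current` is reported with its full
    value. A counter that went backwards (for example because it was
    reset between the two samples) is reported with its current value
    instead of a negative delta.
    """
    deltas = {}
    for name, value in current.items():
        before = previous.get(name, 0)
        deltas[name] = value - before if value >= before else value
    return deltas
```

For example, counter_deltas({"a_count": 3}, {"a_count": 5}) yields
{"a_count": 2}, which is the number of events seen between the samples.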
Specifically, this series proposes to export the following counters via
sysfs:
- Command retry count
- Multipath failover count
- Command error count
- I/O requeue count
- I/O failure count
- Controller reset event counts
- Controller reconnect counts
The first patch in the series adds a new diag attribute group under the
per-path, ns-head and ctrl sysfs directories so that all diagnostic counters
can be grouped together under a diag sub-directory. The subsequent patches in
the series add the diagnostic counters listed above.
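
To make the layout above concrete, here is a minimal userspace sketch
that collects every attribute ending in _count from a diag sub-directory.
The helper name read_diag_counters is hypothetical, and the directory
passed in the usage note is an assumption; the actual attribute names and
their locations are defined by the individual patches, not by this cover
letter.

```python
from pathlib import Path


def read_diag_counters(sysfs_dir):
    """Return {attribute name: integer value} for <sysfs_dir>/diag/*_count.

    Hypothetical helper: the layout is assumed from the cover letter
    (a "diag" group whose counters are suffixed with "_count").
    """
    counters = {}
    diag = Path(sysfs_dir) / "diag"
    if not diag.is_dir():
        # No diag group here (for example, an unpatched kernel).
        return counters
    for attr in sorted(diag.glob("*_count")):
        try:
            counters[attr.name] = int(attr.read_text().strip())
        except (OSError, ValueError):
            # Skip attributes that cannot be read or parsed as an integer.
            continue
    return counters
```

A call such as read_diag_counters("/sys/class/nvme/nvme0") would then
return whatever counters the running kernel exposes for that controller,
or an empty dict if the diag group is absent.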
Please note that this patchset doesn't make any functional change but
rather exports relevant counters to user space via sysfs.
As usual, feedback/comments/suggestions are welcome!
Changes from v3:
- To be consistent in naming, all counters are suffixed with _count
(Keith Busch)
- The first patch in the series creates a new attribute group named
diag and all counters are now grouped under this new sysfs
attribute group (Keith Busch)
- Counters are defined as atomic_long_t instead of size_t (Keith Busch)
- Removed RB and TB tags due to the above changes
Link to v3: https://lore.kernel.org/all/20260220175024.292898-1-nilay@linux.ibm.com/
Changes from v2:
- Allow users to write to the sysfs attributes so that they can
reset the stat counters, if needed (Sagi)
- The controller reconnect counter nr_reconnects can be reset
to zero once the connection is re-established, so instead of
exposing nr_reconnects via sysfs, introduce a new counter
which accumulates the reconnect attempts and export this
accumulated counter via sysfs (Sagi)
Link to v2: https://lore.kernel.org/all/20260205124810.682559-1-nilay@linux.ibm.com/
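
The reasoning behind the accumulated reconnect counter can be shown with
a toy model. This is not driver code: the class ReconnectModel and the
field total_reconnects are hypothetical names chosen for the example;
only nr_reconnects and its reset-on-reconnect behavior come from the
changelog entry above.

```python
class ReconnectModel:
    """Toy model of a per-episode attempt counter plus an accumulated total."""

    def __init__(self):
        self.nr_reconnects = 0     # cleared once the connection is re-established
        self.total_reconnects = 0  # hypothetical name; keeps accumulating

    def attempt(self):
        """Record one reconnect attempt in both counters."""
        self.nr_reconnects += 1
        self.total_reconnects += 1

    def connected(self):
        """Connection re-established: only the per-episode counter is cleared."""
        self.nr_reconnects = 0
```

After two attempts, a successful reconnect and one further attempt,
nr_reconnects reads 1 while total_reconnects reads 3, so only the
accumulated value preserves the history a diagnostic tool would want.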
Changes from v1:
- Remove export of stats for admin command retry count (Keith)
- Use size_add() to ensure stat counters don't overflow (Keith)
Link to v1: https://lore.kernel.org/all/20260130182028.885089-1-nilay@linux.ibm.com/
Nilay Shroff (8):
  nvme: add diag attribute group under sysfs
  nvme: export command retry count via sysfs
  nvme: export multipath failover count via sysfs
  nvme: export command error counters via sysfs
  nvme: export I/O requeue count when no path is available via sysfs
  nvme: export I/O failure count when no path is available via sysfs
  nvme: export controller reset event count via sysfs
  nvme: export controller reconnect event count via sysfs

 drivers/nvme/host/core.c      |  15 ++-
 drivers/nvme/host/fc.c        |   3 +
 drivers/nvme/host/multipath.c |  87 ++++++++++++++
 drivers/nvme/host/nvme.h      |  13 +++
 drivers/nvme/host/pci.c       |   1 +
 drivers/nvme/host/rdma.c      |   2 +
 drivers/nvme/host/sysfs.c     | 214 ++++++++++++++++++++++++++++++++++
 drivers/nvme/host/tcp.c       |   2 +
 8 files changed, 336 insertions(+), 1 deletion(-)
--
2.53.0