* [PATCHv3 0/7] nvme: export additional diagnostic counters via sysfs
@ 2026-02-20 17:48 Nilay Shroff
From: Nilay Shroff @ 2026-02-20 17:48 UTC
  To: linux-nvme
  Cc: kbusch, axboe, hch, sagi, hare, dwagner, wenxiong, gjoyce,
	Nilay Shroff

Hi,

The NVMe driver encounters various events and conditions during normal
operation that are either not tracked today or not exposed to userspace
via sysfs. Lack of visibility into these events can make it difficult to
diagnose subtle issues related to controller behavior, multipath
stability, and I/O reliability.

This patchset adds several diagnostic counters that improve
observability into NVMe behavior. These counters are intended to help
users understand events such as transient path unavailability,
controller retries, reconnects, and resets, failovers, and I/O failures.
They can also be consumed by monitoring tools such as nvme-top.

Specifically, this series proposes to export the following counters via
sysfs:
  - Command retry count
  - Multipath failover count
  - Command error count
  - I/O requeue count
  - I/O failure count
  - Controller reset event counts
  - Controller reconnect counts
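
For illustration only (not part of this series), a monitoring tool could
sample such counters roughly as sketched below. The attribute names and
the directory layout are assumptions made for this sketch; the real names
are defined by the individual patches.

```python
import os
import tempfile

# Hypothetical attribute names, chosen only for this sketch.
ASSUMED_ATTRS = ("command_retries", "command_errors")


def read_counters(dev_dir, attrs=ASSUMED_ATTRS):
    """Return {attr: int} for each attribute file found under dev_dir."""
    counters = {}
    for name in attrs:
        path = os.path.join(dev_dir, name)
        try:
            with open(path) as f:
                counters[name] = int(f.read().strip())
        except (FileNotFoundError, ValueError):
            # Attribute missing (e.g. a kernel without this series) or
            # not parsable as an integer: skip it.
            continue
    return counters


def counter_deltas(before, after):
    """Per-attribute increase between two samples."""
    deltas = {}
    for name, now in after.items():
        if name not in before:
            continue
        prev = before[name]
        # A smaller value suggests the counter was reset between samples,
        # so the current value is reported as the increase.
        deltas[name] = now - prev if now >= prev else now
    return deltas


if __name__ == "__main__":
    # Stand-in directory so the sketch runs without NVMe hardware.
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "command_retries"), "w") as f:
            f.write("3\n")
        print(read_counters(d))
```

Since the counters can be reset by the user, the sketch treats a value
that went down between two samples as a reset rather than as a negative
delta.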

The patchset consists of seven patches:
  Patch 1: Export command retry count
  Patch 2: Export multipath failover count
  Patch 3: Export command error count
  Patch 4: Export I/O requeue count
  Patch 5: Export I/O failure count
  Patch 6: Export controller reset event counts
  Patch 7: Export controller reconnect event count

Please note that this patchset doesn't make any functional change; it
only exports the relevant counters to user space via sysfs.

As usual, feedback/comments/suggestions are welcome!

Changes from v2:
  - Allow the user to write to the sysfs attributes so that the
    stat counters can be reset, if needed (Sagi)
  - The controller reconnect counter nr_reconnects can be reset
    to zero once the connection is re-established, so instead of
    exposing nr_reconnects via sysfs, introduce a new counter
    which accumulates the reconnect attempts and export this
    accumulated counter via sysfs (Sagi)
Link to v2: https://lore.kernel.org/all/20260205124810.682559-1-nilay@linux.ibm.com/
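
To illustrate the second v2 change, here is a toy model (not the driver
code) of why a separate accumulated counter is needed. Only the name
nr_reconnects comes from this cover letter; total_reconnects and the
method names are hypothetical.

```python
class ReconnectStats:
    """Toy model of a per-attempt counter that is cleared on success and
    an accumulated counter that only an explicit user reset clears."""

    def __init__(self):
        self.nr_reconnects = 0      # name taken from the cover letter
        self.total_reconnects = 0   # hypothetical name for the new counter

    def reconnect_attempt(self):
        self.nr_reconnects += 1
        self.total_reconnects += 1

    def connection_established(self):
        # The per-attempt counter starts over; the accumulated one is kept.
        self.nr_reconnects = 0

    def user_reset(self):
        # Models a user write to the sysfs attribute that clears the stat.
        self.total_reconnects = 0


if __name__ == "__main__":
    stats = ReconnectStats()
    for _ in range(3):
        stats.reconnect_attempt()
    stats.connection_established()
    stats.reconnect_attempt()
    print(stats.nr_reconnects, stats.total_reconnects)
```

In this model, reading nr_reconnects after a successful reconnect would
hide the earlier attempts, while the accumulated counter still reflects
them until the user resets it.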

Changes from v1:
  - Remove export of stats for admin command retry count (Keith)
  - Use size_add() to ensure stat counters don't overflow (Keith)
Link to v1: https://lore.kernel.org/all/20260130182028.885089-1-nilay@linux.ibm.com/  
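
As a reminder of the semantics behind the size_add() change, the kernel
helper saturates at SIZE_MAX instead of wrapping around. A small model of
that behavior, assuming a 64-bit size_t (how the series applies it to
each counter is not shown here):

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def size_add(a, b):
    """Model of saturating addition: clamp at SIZE_MAX instead of wrapping."""
    total = a + b
    return SIZE_MAX if total > SIZE_MAX else total


if __name__ == "__main__":
    print(size_add(1, 2))
    print(size_add(SIZE_MAX, 1) == SIZE_MAX)
```

With a saturating add, a counter that reaches the maximum stays there
rather than silently restarting from a small value.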

Nilay Shroff (7):
  nvme: export command retry count via sysfs
  nvme: export multipath failover count via sysfs
  nvme: export command error counters via sysfs
  nvme: export I/O requeue count when no path is available via sysfs
  nvme: export I/O failure count when no path is available via sysfs
  nvme: export controller reset event count via sysfs
  nvme: export controller reconnect event count via sysfs

 drivers/nvme/host/core.c      |  18 +++-
 drivers/nvme/host/fc.c        |   5 +
 drivers/nvme/host/multipath.c |  89 ++++++++++++++++++
 drivers/nvme/host/nvme.h      |  13 ++-
 drivers/nvme/host/rdma.c      |   4 +
 drivers/nvme/host/sysfs.c     | 167 ++++++++++++++++++++++++++++++++++
 drivers/nvme/host/tcp.c       |   3 +
 7 files changed, 297 insertions(+), 2 deletions(-)

-- 
2.52.0


