From: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
To: Nilay Shroff <nilay@linux.ibm.com>, linux-nvme@lists.infradead.org
Cc: dwagner@suse.de, hare@suse.com, kbusch@kernel.org, hch@lst.de,
sagi@grimberg.me, axboe@kernel.dk, chaitanyak@nvidia.com,
gjoyce@linux.ibm.com, wenxiong@linux.ibm.com
Subject: Re: [PATCHv4 0/8] nvme: export additional diagnostic counters via sysfs
Date: Mon, 25 May 2026 14:42:31 +0530
Message-ID: <72318dfe-aaa7-4c11-8254-ea3163b31eba@linux.ibm.com>
In-Reply-To: <20260516183709.269937-1-nilay@linux.ibm.com>
On 17/05/26 12:06 am, Nilay Shroff wrote:
> Hi,
>
> The NVMe driver encounters various events and conditions during normal
> operation that are either not tracked today or not exposed to userspace
> via sysfs. Lack of visibility into these events can make it difficult to
> diagnose subtle issues related to controller behavior, multipath
> stability, and I/O reliability.
>
> This patchset adds several diagnostic counters that provide improved
> observability into NVMe behavior. These counters are intended to help
> users understand events such as transient path unavailability,
> controller retries/reconnect/reset, failovers, and I/O failures. They
> can also be consumed by monitoring tools such as nvme-top.
>
> Specifically, this series proposes to export the following counters via
> sysfs:
> - Command retry count
> - Multipath failover count
> - Command error count
> - I/O requeue count
> - I/O failure count
> - Controller reset event counts
> - Controller reconnect counts
>
> The first patch in the series adds a new diag attribute group under the per-path,
> ns-head and ctrl sysfs directories so that all diagnostic counters can be
> grouped together under a diag sub-directory. The subsequent patches in the series
> add the diagnostic counters listed above.
>
> Please note that this patchset doesn't make any functional change but
> rather exports relevant counters to user space via sysfs.
>
> As usual, feedback/comments/suggestions are welcome!
>
> Changes from v3:
> - To be consistent in naming, all counters are suffixed with _count
> (Keith Busch)
> - The first patch in the series creates new attribute group named
> diag and all counters are now grouped under this new sysfs
> attribute group (Keith Busch)
> - Counters are defined as atomic_long_t instead of size_t (Keith Busch)
> - Removed RB and TB tags due to above changes
> Link to v3: https://lore.kernel.org/all/20260220175024.292898-1-nilay@linux.ibm.com/
>
> Changes from v2:
> - Allow user to write to sysfs attributes so that user could
> reset stat counters, if needed (Sagi)
> - The controller reconnect counter nr_reconnects could reset
> to zero once connection is re-established, so instead of
> exposing nr_reconnects counter via sysfs introduce a new
> counter which accumulates the reconnect attempts and export
> this accumulated counter via sysfs (Sagi)
> Link to v2: https://lore.kernel.org/all/20260205124810.682559-1-nilay@linux.ibm.com/
>
> Changes from v1:
> - Remove export of stats for admin command retry count (Keith)
> - Use size_add() to ensure stat counters don't overflow (Keith)
> Link to v1: https://lore.kernel.org/all/20260130182028.885089-1-nilay@linux.ibm.com/
>
> Nilay Shroff (8):
> nvme: add diag attribute group under sysfs
> nvme: export command retry count via sysfs
> nvme: export multipath failover count via sysfs
> nvme: export command error counters via sysfs
> nvme: export I/O requeue count when no path is available via sysfs
> nvme: export I/O failure count when no path is available via sysfs
> nvme: export controller reset event count via sysfs
> nvme: export controller reconnect event count via sysfs
>
> drivers/nvme/host/core.c | 15 ++-
> drivers/nvme/host/fc.c | 3 +
> drivers/nvme/host/multipath.c | 87 ++++++++++++++
> drivers/nvme/host/nvme.h | 13 +++
> drivers/nvme/host/pci.c | 1 +
> drivers/nvme/host/rdma.c | 2 +
> drivers/nvme/host/sysfs.c | 214 ++++++++++++++++++++++++++++++++++
> drivers/nvme/host/tcp.c | 2 +
> 8 files changed, 336 insertions(+), 1 deletion(-)
>
Hello Nilay,
Applied this patch series on top of v7.1-rc5 and boot-tested on ppc64le.
Verified the new NVMe diag sysfs hierarchy and counters exposed by this
series.
Validation steps executed:

1. Read all exported NVMe diag counters:
   for f in $(find /sys -path '*nvme*diag/*_count' 2>/dev/null); do echo "$f: $(cat "$f")"; done

2. Reset all writable counters to zero:
   for f in $(find /sys -path '*nvme*diag/*_count' 2>/dev/null); do echo 0 > "$f" && echo "reset ok $f"; done

3. Negative test with invalid input:
   echo abc > /sys/devices/pci0525:48/0525:48:00.0/nvme/nvme0/diag/command_error_count
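
For anyone who wants to repeat the read check, the loop above can be wrapped in a small script. This is only a sketch under stated assumptions: the helper names list_diag_counters, is_numeric and check_counters are hypothetical and not part of the series, and it assumes every counter file holds a single non-negative decimal value.

```shell
# Hypothetical helpers, not part of the patch series.

# list_diag_counters [ROOT]: print every *_count file that sits in an
# nvme diag directory below ROOT (default: /sys), one path per line.
list_diag_counters() {
    find "${1:-/sys}" -path '*nvme*diag/*_count' 2>/dev/null | sort
}

# is_numeric VALUE: succeed only if VALUE is a non-empty string of digits.
# Assumption: counters are shown as non-negative decimal numbers.
is_numeric() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# check_counters [ROOT]: read each counter and report whether it is numeric.
# Returns non-zero if any counter is unreadable or non-numeric.
# Word splitting on the file list is acceptable here because sysfs
# paths do not contain whitespace.
check_counters() {
    rc=0
    for f in $(list_diag_counters "${1:-/sys}"); do
        if ! val=$(cat "$f" 2>/dev/null); then
            echo "FAIL (unreadable) $f"
            rc=1
        elif is_numeric "$val"; then
            echo "ok $f: $val"
        else
            echo "FAIL (non-numeric) $f: $val"
            rc=1
        fi
    done
    return $rc
}
```

Running check_counters with no argument walks /sys; passing a directory limits the scan to one controller or subsystem.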
Observed results:

diag directories were present under:
- controller paths, e.g. /sys/devices/.../nvme/nvmeX/diag/
- per-path namespace paths, e.g. /sys/devices/.../nvme/nvmeX/nvmeYcZnW/diag/
- namespace-head paths, e.g. /sys/devices/virtual/nvme-subsystem/nvme-subsysX/nvmeYnZ/diag/

Controller counters observed:
- reset_count
- command_error_count
- reconnect_count (on fabrics controllers)

# ll /sys/devices/virtual/nvme-fabrics/ctl/nvme7/diag
total 0
-rw-r--r--. 1 root root 65536 May 25 03:58 command_error_count
-rw-r--r--. 1 root root 65536 May 25 03:58 reconnect_count
-rw-r--r--. 1 root root 65536 May 25 03:58 reset_count
# ll /sys/devices/pci052a:58/052a:58:00.0/nvme/nvme2/diag
total 0
-rw-r--r--. 1 root root 65536 May 25 03:58 command_error_count
-rw-r--r--. 1 root root 65536 May 25 03:58 reset_count

Per-path counters observed:
- multipath_failover_count
- command_error_count
- command_retries_count

# ll /sys/devices/pci052a:58/052a:58:00.0/nvme/nvme2/nvme2c2n1/diag
total 0
-rw-r--r--. 1 root root 65536 May 25 03:58 command_error_count
-rw-r--r--. 1 root root 65536 May 25 03:58 command_retries_count
-rw-r--r--. 1 root root 65536 May 25 03:58 multipath_failover_count

Namespace-head counters observed:
- io_fail_no_available_path_count
- io_requeue_no_usable_path_count

# ll /sys/devices/virtual/nvme-subsystem/nvme-subsys1/nvme1n2/diag
total 0
-rw-r--r--. 1 root root 65536 May 25 03:58 io_fail_no_available_path_count
-rw-r--r--. 1 root root 65536 May 25 03:58 io_requeue_no_usable_path_count

Summary:
- All reads returned numeric values.
- All reset writes to 0 succeeded.
- The invalid text write failed as expected:
  -bash: echo: write error: Invalid argument
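
The reset and negative checks can be scripted in the same way. Again this is only a sketch: reset_counter and expect_invalid_write are hypothetical helper names, and the read-back comparison assumes nothing increments the counter between the write and the read, which may not hold on a busy system.

```shell
# Hypothetical helpers, not part of the patch series.

# reset_counter FILE: write 0 to FILE and confirm it reads back as 0.
# Assumption: no event bumps the counter between the write and the read.
reset_counter() {
    echo 0 > "$1" 2>/dev/null || return 1
    [ "$(cat "$1")" = "0" ]
}

# expect_invalid_write FILE: succeed only if writing non-numeric text
# to FILE is rejected, as observed for the diag counters above.
expect_invalid_write() {
    if echo abc > "$1" 2>/dev/null; then
        return 1    # the write was accepted, which is unexpected
    fi
    return 0
}
```

Both helpers need write access to the attribute, so in practice they would be run as root against a single counter file.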

If it all looks good, please add the tag below.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Regards,
Venkat.