* [PATCH blktests v2] nvme/068: add a test for multipath delayed removal
@ 2026-04-17 9:48 John Garry
2026-04-17 12:36 ` Nilay Shroff
0 siblings, 1 reply; 2+ messages in thread
From: John Garry @ 2026-04-17 9:48 UTC (permalink / raw)
To: shinichiro.kawasaki, linux-block; +Cc: linux-nvme, kch, nilay, John Garry
For NVMe multipath, the delayed removal feature allows the multipath
gendisk to remain present when all available paths are gone. The purpose of
this feature is to ensure that we keep the gendisk for intermittent path
failures.
The delayed removal works on a timer - when all paths are gone, a timer is
kicked off; once the timer expires and no paths have returned, the gendisk
is removed.
When all paths are gone and the gendisk is still present, all reads and
writes to the disk are queued. If a path returns before the timer
expiration, the timer canceled and the queued IO is submitted;
otherwise they fail when the timer expires.
This testcase covers two scenarios in separate parts:
a. test that IOs submitted after all paths are removed (and do not return)
fail
b. test that IOs submitted between all paths removed and a path
returning succeed
During the period of the timer being active, it must be ensured that the
nvme-core module is not removed. Otherwise the driver may not be present
to handle the timeout expiry. The kernel ensures this by taking a
reference to the module. Ideally, we would try to remove the module during
this test to prove that this is not possible (and the kernel behaves as
expected), but that module will probably not be removable anyway due to
many references. To test this feature, check that the refcount of the
nvme-core module is incremented when the delayed timer is active.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
Differences to v1:
- Address shellcheck issues (Shinichiro)
- Drop Quick=1 (Shinichiro)
- Reference kernel commit (Nilay)
- Tidy setting refcnt (Nilay)
- Add RB tag from Chaitanya (Thanks!)
Note that a fix for a refcnt issue has been merged into nvme-7.1 queue
diff --git a/common/rc b/common/rc
index 5350057..375c8f1 100644
--- a/common/rc
+++ b/common/rc
@@ -117,6 +117,16 @@ _module_not_in_use() {
fi
}
+_module_use_count() {
+ local refcnt
+ if [ -f "/sys/module/$1/refcnt" ]; then
+ refcnt="$(cat /sys/module/"$1"/refcnt)"
+ echo "$refcnt"
+ return
+ fi
+ echo ""
+}
+
_have_module_param() {
_have_driver "$1" || return
diff --git a/tests/nvme/068 b/tests/nvme/068
new file mode 100755
index 0000000..81f3219
--- /dev/null
+++ b/tests/nvme/068
@@ -0,0 +1,118 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright (C) 2026 John Garry
+#
+# Test NVMe multipath delayed removal works as expected. This feature was
+# introduced in commit 62188639ec16 ("nvme-multipath: introduce delayed removal
+# of the multipath head node")
+
+. tests/nvme/rc
+. common/xfs
+
+DESCRIPTION="NVMe multipath delayed removal test"
+
+requires() {
+ _nvme_requires
+ _have_loop
+ _have_module_param_value nvme_core multipath Y
+ _require_nvme_trtype_is_fabrics
+}
+
+set_conditions() {
+ _set_nvme_trtype "$@"
+}
+
+_delayed_nvme_reconnect_ctrl() {
+ sleep 5
+ _nvme_connect_subsys
+}
+
+test() {
+ echo "Running ${TEST_NAME}"
+
+ _setup_nvmet
+
+ local nvmedev
+ local ns
+ local bytes_written
+ local refcnt_orig
+ local refcnt
+ _nvmet_target_setup
+
+ _nvme_connect_subsys
+
+ # Part a: Prove that writes fail when no path returns. Any reads or
+ # writes are queued during the delayed removal period. If no
+ # paths return before the timer expires, then those IOs should
+ # fail.
+ # During the delayed removal period, ensure that the module
+ # refcnt is incremented, to prove that we cannot remove the
+ # driver during this period.
+ nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ refcnt_orig=$(_module_use_count nvme_core)
+ echo 10 > "/sys/block/${ns}/delayed_removal_secs"
+ _nvme_disconnect_ctrl "${nvmedev}"
+ sleep 1
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ if [[ "${ns}" = "" ]]; then
+ echo "could not find ns after disconnect"
+ fi
+ refcnt=$(_module_use_count nvme_core)
+ if [ "$refcnt" != "" ] && [ "$refcnt" -le "$refcnt_orig" ]; then
+ echo "module refcount did not increase"
+ fi
+ bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096)
+ if [ "$bytes_written" == 4096 ]; then
+ echo "wrote successfully after disconnect"
+ fi
+ sleep 10
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ if [[ "${ns}" != "" ]]; then
+ echo "found ns after delayed removal"
+ fi
+ refcnt=$(_module_use_count nvme_core)
+ if [ "$refcnt" != "" ] && [ "$refcnt" -ne "$refcnt_orig" ]; then
+ echo "module refcount not as original"
+ fi
+
+ # Part b: Ensure writes for an intermittent disconnect are successful.
+ # During an intermittent disconnect, any reads or writes
+ # queued should succeed after a path returns.
+ # Also ensure module refcount behaviour is as expected, as
+ # above.
+ _nvme_connect_subsys
+
+ nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ refcnt_orig=$(_module_use_count nvme_core)
+ echo 10 > "/sys/block/${ns}/delayed_removal_secs"
+ _nvme_disconnect_ctrl "${nvmedev}"
+ sleep 1
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ if [[ "${ns}" = "" ]]; then
+ echo "could not find ns after disconnect"
+ fi
+ _delayed_nvme_reconnect_ctrl "${nvmedev}" &
+ bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096)
+ if [ "$bytes_written" != 4096 ]; then
+ echo "could not write successfully with reconnect"
+ fi
+ sleep 10
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ if [[ "${ns}" = "" ]]; then
+ echo "could not find ns after delayed reconnect"
+ fi
+ refcnt=$(_module_use_count nvme_core)
+ if [ "$refcnt" != "" ] && [ "$refcnt" -ne "$refcnt_orig" ]; then
+ echo "module refcount not as original"
+ fi
+
+ # Final tidy-up
+ echo 0 > "/sys/block/${ns}/delayed_removal_secs"
+ nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
+ _nvme_disconnect_ctrl "${nvmedev}"
+ _nvmet_target_cleanup
+
+ echo "Test complete"
+}
diff --git a/tests/nvme/068.out b/tests/nvme/068.out
new file mode 100644
index 0000000..b913d19
--- /dev/null
+++ b/tests/nvme/068.out
@@ -0,0 +1,3 @@
+Running nvme/068
+pwrite: Input/output error
+Test complete
--
2.43.5
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH blktests v2] nvme/068: add a test for multipath delayed removal
2026-04-17 9:48 [PATCH blktests v2] nvme/068: add a test for multipath delayed removal John Garry
@ 2026-04-17 12:36 ` Nilay Shroff
0 siblings, 0 replies; 2+ messages in thread
From: Nilay Shroff @ 2026-04-17 12:36 UTC (permalink / raw)
To: John Garry, shinichiro.kawasaki, linux-block; +Cc: linux-nvme, kch
On 4/17/26 3:18 PM, John Garry wrote:
> For NVMe multipath, the delayed removal feature allows the multipath
> gendisk to remain present when all available paths are gone. The purpose of
> this feature is to ensure that we keep the gendisk for intermittent path
> failures.
>
> The delayed removal works on a timer - when all paths are gone, a timer is
> kicked off; once the timer expires and no paths have returned, the gendisk
> is removed.
>
> When all paths are gone and the gendisk is still present, all reads and
> writes to the disk are queued. If a path returns before the timer
> expiration, the timer canceled and the queued IO is submitted;
> otherwise they fail when the timer expires.
>
> This testcase covers two scenarios in separate parts:
> a. test that IOs submitted after all paths are removed (and do not return)
> fail
> b. test that IOs submitted between all paths removed and a path
> returning succeed
>
> During the period of the timer being active, it must be ensured that the
> nvme-core module is not removed. Otherwise the driver may not be present
> to handle the timeout expiry. The kernel ensures this by taking a
> reference to the module. Ideally, we would try to remove the module during
> this test to prove that this is not possible (and the kernel behaves as
> expected), but that module will probably not be removable anyway due to
> many references. To test this feature, check that the refcount of the
> nvme-core module is incremented when the delayed timer is active.
>
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-04-17 12:36 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-17 9:48 [PATCH blktests v2] nvme/068: add a test for multipath delayed removal John Garry
2026-04-17 12:36 ` Nilay Shroff
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox