* [PATCH blktests] nvme/068: add a test for multipath delayed removal
@ 2026-04-15 10:41 John Garry
2026-04-15 17:58 ` Chaitanya Kulkarni
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: John Garry @ 2026-04-15 10:41 UTC (permalink / raw)
To: shinichiro.kawasaki, linux-block; +Cc: linux-nvme, nilay, John Garry
For NVMe multipath, the delayed removal feature allows the multipath
gendisk to remain present when all available paths are gone. The purpose of
this feature is to ensure that we keep the gendisk for intermittent path
failures.
The delayed removal works on a timer - when all paths are gone, a timer is
kicked off; once the timer expires and no paths have returned, the gendisk
is removed.
When all paths are gone and the gendisk is still present, all reads and
writes to the disk are queued. If a path returns before the timer
expiration, the timer canceled and the queued IO is submitted;
otherwise they fail when the timer expires.
This testcase covers two scenarios in separate parts:
a. test that IOs submitted after all paths are removed (and do not return)
fail
b. test that IOs submitted between all paths removed and a path
returning succeed
During the period of the timer being active, it must be ensured that the
nvme-core module is not removed. Otherwise the driver may not be present
to handle the timeout expiry. The kernel ensures this by taking a
reference to the module. Ideally, we would try to remove the module during
this test to prove that this is not possible (and the kernel behaves as
expected), but that module will probably not be removable anyway due to
many references. To test this feature, check that the refcount of the
nvme-core module is incremented when the delayed timer is active.
Signed-off-by: John Garry <john.g.garry@oracle.com>
diff --git a/common/rc b/common/rc
index 5350057..6eae0e2 100644
--- a/common/rc
+++ b/common/rc
@@ -117,6 +117,16 @@ _module_not_in_use() {
fi
}
+_module_use_count() {
+ local refcnt
+ if [ -f "/sys/module/$1/refcnt" ]; then
+ refcnt="$(cat /sys/module/"$1"/refcnt)"
+ echo $refcnt
+ return
+ fi
+ echo ""
+}
+
_have_module_param() {
_have_driver "$1" || return
diff --git a/tests/nvme/068 b/tests/nvme/068
new file mode 100644
index 0000000..e06fd6b
--- /dev/null
+++ b/tests/nvme/068
@@ -0,0 +1,118 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright (C) 2026 John Garry
+#
+# Test NVMe multipath delayed removal works as expected
+
+. tests/nvme/rc
+. common/xfs
+
+DESCRIPTION="NVMe multipath delayed removal test"
+QUICK=1
+
+requires() {
+ _nvme_requires
+ _have_loop
+ _have_module_param_value nvme_core multipath Y
+ _require_nvme_trtype_is_fabrics
+}
+
+set_conditions() {
+ _set_nvme_trtype "$@"
+}
+
+_delayed_nvme_reconnect_ctrl() {
+ sleep 5
+ _nvme_connect_subsys
+}
+
+test() {
+ echo "Running ${TEST_NAME}"
+
+ _setup_nvmet
+
+ local nvmedev
+ local ns
+ local bytes_written
+ local refcnt_orig
+ local refcnt
+ _nvmet_target_setup
+
+ _nvme_connect_subsys
+
+ # Part a: Prove that writes fail when no path returns. Any reads or
+ # writes are queued during the delayed removal period. If no
+ # paths return before the timer expires, then those IOs should
+ # fail.
+ # During the delayed removal period, ensure that the module
+ # refcnt is incremented, to prove that we cannot remove the
+ # driver during this period.
+ nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ refcnt=$(_module_use_count nvme_core)
+ echo 10 > "/sys/block/"$ns"/delayed_removal_secs"
+ refcnt_orig=$(_module_use_count nvme_core)
+ _nvme_disconnect_ctrl "${nvmedev}"
+ sleep 1
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ if [[ "${ns}" = "" ]]; then
+ echo "could not find ns after disconnect"
+ fi
+ refcnt=$(_module_use_count nvme_core)
+ if [ "$refcnt" != "" ] && [ "$refcnt" -le "$refcnt_orig" ]; then
+ echo "module refcount did not increase"
+ fi
+ bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096)
+ if [ "$bytes_written" == 4096 ]; then
+ echo "wrote successfully after disconnect"
+ fi
+ sleep 10
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ if [[ !"${ns}" = "" ]]; then
+ echo "found ns after delayed removal"
+ fi
+ refcnt=$(_module_use_count nvme_core)
+ if [ "$refcnt" != "" ] && [ "$refcnt" -ne "$refcnt_orig" ]; then
+ echo "module refcount not as original"
+ fi
+
+ # Part b: Ensure writes for an intermittent disconnect are successful.
+ # During an intermittent disconnect, any reads or writes
+ # queued should succeed after a path returns.
+ # Also ensure module refcount behaviour is as expected, as
+ # above.
+ _nvme_connect_subsys
+
+ nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ refcnt_orig=$(_module_use_count nvme_core)
+ echo 10 > "/sys/block/"$ns"/delayed_removal_secs"
+ _nvme_disconnect_ctrl "${nvmedev}"
+ sleep 1
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ if [[ "${ns}" = "" ]]; then
+ echo "could not find ns after disconnect"
+ fi
+ _delayed_nvme_reconnect_ctrl "${nvmedev}" &
+ bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096)
+ if [ "$bytes_written" != 4096 ]; then
+ echo "could not write successfully with reconnect"
+ fi
+ sleep 10
+ ns=$(_find_nvme_ns "${def_subsys_uuid}")
+ if [[ "${ns}" = "" ]]; then
+ echo "could not find ns after delayed reconnect"
+ fi
+ refcnt=$(_module_use_count nvme_core)
+ if [ "$refcnt" != "" ] && [ "$refcnt" -ne "$refcnt_orig" ]; then
+ echo "module refcount not as original"
+ fi
+
+ # Final tidy-up
+ echo 0 > /sys/block/"$ns"/delayed_removal_secs
+ nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
+ _nvme_disconnect_ctrl "${nvmedev}"
+ _nvmet_target_cleanup
+
+ echo "Test complete"
+}
diff --git a/tests/nvme/068.out b/tests/nvme/068.out
new file mode 100644
index 0000000..b913d19
--- /dev/null
+++ b/tests/nvme/068.out
@@ -0,0 +1,3 @@
+Running nvme/068
+pwrite: Input/output error
+Test complete
--
2.43.5
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH blktests] nvme/068: add a test for multipath delayed removal 2026-04-15 10:41 [PATCH blktests] nvme/068: add a test for multipath delayed removal John Garry @ 2026-04-15 17:58 ` Chaitanya Kulkarni 2026-04-16 11:18 ` Nilay Shroff 2026-04-16 12:50 ` Shinichiro Kawasaki 2 siblings, 0 replies; 7+ messages in thread From: Chaitanya Kulkarni @ 2026-04-15 17:58 UTC (permalink / raw) To: John Garry, shinichiro.kawasaki@wdc.com, linux-block@vger.kernel.org Cc: linux-nvme@lists.infradead.org, nilay@linux.ibm.com John, On 4/15/26 03:41, John Garry wrote: > For NVMe multipath, the delayed removal feature allows the multipath > gendisk to remain present when all available paths are gone. The purpose of > this feature is to ensure that we keep the gendisk for intermittent path > failures. > > The delayed removal works on a timer - when all paths are gone, a timer is > kicked off; once the timer expires and no paths have returned, the gendisk > is removed. > > When all paths are gone and the gendisk is still present, all reads and > writes to the disk are queued. If a path returns before the timer > expiration, the timer canceled and the queued IO is submitted; > otherwise they fail when the timer expires. > > This testcase covers two scenarios in separate parts: > a. test that IOs submitted after all paths are removed (and do not return) > fail > b. test that IOs submitted between all paths removed and a path > returning succeed > > During the period of the timer being active, it must be ensured that the > nvme-core module is not removed. Otherwise the driver may not be present > to handle the timeout expiry. The kernel ensures this by taking a > reference to the module. Ideally, we would try to remove the module during > this test to prove that this is not possible (and the kernel behaves as > expected), but that module will probably not be removable anyway due to > many references. To test this feature, check that the refcount of the > nvme-core module is incremented when the delayed timer is active. > > Signed-off-by: John Garry<john.g.garry@oracle.com> Thanks you so much for this testcase. I've failed to add tests in this areas but this will be absolutely helpful to have it in regular testing. Looks good. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> -ck ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH blktests] nvme/068: add a test for multipath delayed removal 2026-04-15 10:41 [PATCH blktests] nvme/068: add a test for multipath delayed removal John Garry 2026-04-15 17:58 ` Chaitanya Kulkarni @ 2026-04-16 11:18 ` Nilay Shroff 2026-04-16 11:45 ` John Garry 2026-04-16 12:50 ` Shinichiro Kawasaki 2 siblings, 1 reply; 7+ messages in thread From: Nilay Shroff @ 2026-04-16 11:18 UTC (permalink / raw) To: John Garry, shinichiro.kawasaki, linux-block; +Cc: linux-nvme > diff --git a/tests/nvme/068 b/tests/nvme/068 > new file mode 100644 > index 0000000..e06fd6b > --- /dev/null > +++ b/tests/nvme/068 > @@ -0,0 +1,118 @@ > +#!/bin/bash > +# SPDX-License-Identifier: GPL-3.0+ > +# Copyright (C) 2026 John Garry > +# > +# Test NVMe multipath delayed removal works as expected > + Should we also reference the commit 62188639ec16 ("nvme-multipath: introduce delayed removal of the multipath head node") which introduced this feature in the kernel? > +. tests/nvme/rc > +. common/xfs > + > +DESCRIPTION="NVMe multipath delayed removal test" > +QUICK=1 > + > +requires() { > + _nvme_requires > + _have_loop > + _have_module_param_value nvme_core multipath Y > + _require_nvme_trtype_is_fabrics > +} > + > +set_conditions() { > + _set_nvme_trtype "$@" > +} > + > +_delayed_nvme_reconnect_ctrl() { > + sleep 5 > + _nvme_connect_subsys > +} > + > +test() { > + echo "Running ${TEST_NAME}" > + > + _setup_nvmet > + > + local nvmedev > + local ns > + local bytes_written > + local refcnt_orig > + local refcnt > + _nvmet_target_setup > + > + _nvme_connect_subsys > + > + # Part a: Prove that writes fail when no path returns. Any reads or > + # writes are queued during the delayed removal period. If no > + # paths return before the timer expires, then those IOs should > + # fail. > + # During the delayed removal period, ensure that the module > + # refcnt is incremented, to prove that we cannot remove the > + # driver during this period. > + nvmedev=$(_find_nvme_dev "${def_subsysnqn}") > + ns=$(_find_nvme_ns "${def_subsys_uuid}") > + refcnt=$(_module_use_count nvme_core) Should the refcount be stored in $refcnt_orig instead of $refcnt? > + echo 10 > "/sys/block/"$ns"/delayed_removal_secs" > + refcnt_orig=$(_module_use_count nvme_core) We may not need to again store refcnt here if we store it before enabling delayed_removal_sec above. Other than above trivial comments, this looks good to me. Thanks, --Nilay ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH blktests] nvme/068: add a test for multipath delayed removal 2026-04-16 11:18 ` Nilay Shroff @ 2026-04-16 11:45 ` John Garry 0 siblings, 0 replies; 7+ messages in thread From: John Garry @ 2026-04-16 11:45 UTC (permalink / raw) To: Nilay Shroff, shinichiro.kawasaki, linux-block; +Cc: linux-nvme On 16/04/2026 12:18, Nilay Shroff wrote: > >> diff --git a/tests/nvme/068 b/tests/nvme/068 >> new file mode 100644 >> index 0000000..e06fd6b >> --- /dev/null >> +++ b/tests/nvme/068 >> @@ -0,0 +1,118 @@ >> +#!/bin/bash >> +# SPDX-License-Identifier: GPL-3.0+ >> +# Copyright (C) 2026 John Garry >> +# >> +# Test NVMe multipath delayed removal works as expected >> + > Should we also reference the commit 62188639ec16 ("nvme-multipath: > introduce delayed removal of the multipath head node") which introduced > this feature in the kernel? > Sure, if you think that it is useful. We could also introduce at some point the _fixed_by_kernel_commit (like xfstests), but I am not sure if it has value for blktests >> +. tests/nvme/rc >> +. common/xfs >> + >> +DESCRIPTION="NVMe multipath delayed removal test" >> +QUICK=1 >> + >> +requires() { >> + _nvme_requires >> + _have_loop >> + _have_module_param_value nvme_core multipath Y >> + _require_nvme_trtype_is_fabrics >> +} >> + >> +set_conditions() { >> + _set_nvme_trtype "$@" >> +} >> + >> +_delayed_nvme_reconnect_ctrl() { >> + sleep 5 >> + _nvme_connect_subsys >> +} >> + >> +test() { >> + echo "Running ${TEST_NAME}" >> + >> + _setup_nvmet >> + >> + local nvmedev >> + local ns >> + local bytes_written >> + local refcnt_orig >> + local refcnt >> + _nvmet_target_setup >> + >> + _nvme_connect_subsys >> + >> + # Part a: Prove that writes fail when no path returns. Any reads or >> + # writes are queued during the delayed removal period. If no >> + # paths return before the timer expires, then those IOs should >> + # fail. >> + # During the delayed removal period, ensure that the module >> + # refcnt is incremented, to prove that we cannot remove the >> + # driver during this period. >> + nvmedev=$(_find_nvme_dev "${def_subsysnqn}") >> + ns=$(_find_nvme_ns "${def_subsys_uuid}") >> + refcnt=$(_module_use_count nvme_core) > > Should the refcount be stored in $refcnt_orig instead of $refcnt? Yeah, that's a small bug, as I needlessly store it twice: + refcnt=$(_module_use_count nvme_core) + echo 10 > "/sys/block/"$ns"/delayed_removal_secs" + refcnt_orig=$(_module_use_count nvme_core) > >> + echo 10 > "/sys/block/"$ns"/delayed_removal_secs" >> + refcnt_orig=$(_module_use_count nvme_core) > > We may not need to again store refcnt here if we store it > before enabling delayed_removal_sec above. > > Other than above trivial comments, this looks good to me. > Cheers, John ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH blktests] nvme/068: add a test for multipath delayed removal 2026-04-15 10:41 [PATCH blktests] nvme/068: add a test for multipath delayed removal John Garry 2026-04-15 17:58 ` Chaitanya Kulkarni 2026-04-16 11:18 ` Nilay Shroff @ 2026-04-16 12:50 ` Shinichiro Kawasaki 2026-04-16 13:03 ` John Garry 2 siblings, 1 reply; 7+ messages in thread From: Shinichiro Kawasaki @ 2026-04-16 12:50 UTC (permalink / raw) To: John Garry Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, nilay@linux.ibm.com On Apr 15, 2026 / 10:41, John Garry wrote: > For NVMe multipath, the delayed removal feature allows the multipath > gendisk to remain present when all available paths are gone. The purpose of > this feature is to ensure that we keep the gendisk for intermittent path > failures. > > The delayed removal works on a timer - when all paths are gone, a timer is > kicked off; once the timer expires and no paths have returned, the gendisk > is removed. > > When all paths are gone and the gendisk is still present, all reads and > writes to the disk are queued. If a path returns before the timer > expiration, the timer canceled and the queued IO is submitted; > otherwise they fail when the timer expires. > > This testcase covers two scenarios in separate parts: > a. test that IOs submitted after all paths are removed (and do not return) > fail > b. test that IOs submitted between all paths removed and a path > returning succeed > > During the period of the timer being active, it must be ensured that the > nvme-core module is not removed. Otherwise the driver may not be present > to handle the timeout expiry. The kernel ensures this by taking a > reference to the module. Ideally, we would try to remove the module during > this test to prove that this is not possible (and the kernel behaves as > expected), but that module will probably not be removable anyway due to > many references. To test this feature, check that the refcount of the > nvme-core module is incremented when the delayed timer is active. > > Signed-off-by: John Garry <john.g.garry@oracle.com> John, thanks for the patch. When I ran the new test case in my test environment, it failed. The reported refcount mismatch looks happening in the Part b. nvme/068 (tr=loop) (NVMe multipath delayed removal test) [failed] runtime 38.579s ... 38.770s --- tests/nvme/068.out 2026-04-16 20:50:21.228000000 +0900 +++ /home/shin/Blktests/blktests/results/nodev_tr_loop/nvme/068.out.bad 2026-04-16 21:30:36.215000000 +0900 @@ -1,3 +1,4 @@ Running nvme/068 pwrite: Input/output error +module refcount not as original Test complete I have no idea why it fails. Do you have any guess about the failure cause? Also, please find my review comments in line. > > diff --git a/common/rc b/common/rc > index 5350057..6eae0e2 100644 > --- a/common/rc > +++ b/common/rc > @@ -117,6 +117,16 @@ _module_not_in_use() { > fi > } > > +_module_use_count() { > + local refcnt > + if [ -f "/sys/module/$1/refcnt" ]; then > + refcnt="$(cat /sys/module/"$1"/refcnt)" > + echo $refcnt To suppress Shellcheck warning, please add quotation marks: "$refcnt". > + return > + fi > + echo "" > +} > + > _have_module_param() { > _have_driver "$1" || return > > diff --git a/tests/nvme/068 b/tests/nvme/068 > new file mode 100644 File mode 755 is recommended for consistency. > index 0000000..e06fd6b > --- /dev/null > +++ b/tests/nvme/068 > @@ -0,0 +1,118 @@ > +#!/bin/bash > +# SPDX-License-Identifier: GPL-3.0+ > +# Copyright (C) 2026 John Garry > +# > +# Test NVMe multipath delayed removal works as expected > + > +. tests/nvme/rc > +. common/xfs > + > +DESCRIPTION="NVMe multipath delayed removal test" > +QUICK=1 It is guided to set QUICK=1 for the test cases which completes "in ~30 seconds or less". In my environment, this test case took 38 seconds, so I'm not so sure if this test case is quick. How long does it take in your environment? > + > +requires() { > + _nvme_requires > + _have_loop > + _have_module_param_value nvme_core multipath Y > + _require_nvme_trtype_is_fabrics > +} > + > +set_conditions() { > + _set_nvme_trtype "$@" > +} > + > +_delayed_nvme_reconnect_ctrl() { > + sleep 5 > + _nvme_connect_subsys > +} > + > +test() { > + echo "Running ${TEST_NAME}" > + > + _setup_nvmet > + > + local nvmedev > + local ns > + local bytes_written > + local refcnt_orig > + local refcnt > + _nvmet_target_setup > + > + _nvme_connect_subsys > + > + # Part a: Prove that writes fail when no path returns. Any reads or > + # writes are queued during the delayed removal period. If no > + # paths return before the timer expires, then those IOs should > + # fail. > + # During the delayed removal period, ensure that the module > + # refcnt is incremented, to prove that we cannot remove the > + # driver during this period. > + nvmedev=$(_find_nvme_dev "${def_subsysnqn}") > + ns=$(_find_nvme_ns "${def_subsys_uuid}") > + refcnt=$(_module_use_count nvme_core) > + echo 10 > "/sys/block/"$ns"/delayed_removal_secs" Shellcheck complains about the line above. I think it can be modified as below: echo 10 > "/sys/block/${ns}/delayed_removal_secs" > + refcnt_orig=$(_module_use_count nvme_core) > + _nvme_disconnect_ctrl "${nvmedev}" > + sleep 1 > + ns=$(_find_nvme_ns "${def_subsys_uuid}") > + if [[ "${ns}" = "" ]]; then > + echo "could not find ns after disconnect" > + fi > + refcnt=$(_module_use_count nvme_core) > + if [ "$refcnt" != "" ] && [ "$refcnt" -le "$refcnt_orig" ]; then > + echo "module refcount did not increase" > + fi > + bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096) > + if [ "$bytes_written" == 4096 ]; then > + echo "wrote successfully after disconnect" > + fi > + sleep 10 > + ns=$(_find_nvme_ns "${def_subsys_uuid}") > + if [[ !"${ns}" = "" ]]; then Shellcheck warns the line above. I guess it can be as follows: if [[ "${ns}" != "" ]]; then > + echo "found ns after delayed removal" > + fi > + refcnt=$(_module_use_count nvme_core) > + if [ "$refcnt" != "" ] && [ "$refcnt" -ne "$refcnt_orig" ]; then > + echo "module refcount not as original" > + fi > + > + # Part b: Ensure writes for an intermittent disconnect are successful. > + # During an intermittent disconnect, any reads or writes > + # queued should succeed after a path returns. > + # Also ensure module refcount behaviour is as expected, as > + # above. > + _nvme_connect_subsys > + > + nvmedev=$(_find_nvme_dev "${def_subsysnqn}") > + ns=$(_find_nvme_ns "${def_subsys_uuid}") > + refcnt_orig=$(_module_use_count nvme_core) > + echo 10 > "/sys/block/"$ns"/delayed_removal_secs" Again, the line above can be modified as follows for Shellcheck. echo 10 > "/sys/block/${ns}/delayed_removal_secs" ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH blktests] nvme/068: add a test for multipath delayed removal 2026-04-16 12:50 ` Shinichiro Kawasaki @ 2026-04-16 13:03 ` John Garry 2026-04-17 2:22 ` Shinichiro Kawasaki 0 siblings, 1 reply; 7+ messages in thread From: John Garry @ 2026-04-16 13:03 UTC (permalink / raw) To: Shinichiro Kawasaki Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, nilay@linux.ibm.com On 16/04/2026 13:50, Shinichiro Kawasaki wrote: > On Apr 15, 2026 / 10:41, John Garry wrote: >> For NVMe multipath, the delayed removal feature allows the multipath >> gendisk to remain present when all available paths are gone. The purpose of >> this feature is to ensure that we keep the gendisk for intermittent path >> failures. >> >> The delayed removal works on a timer - when all paths are gone, a timer is >> kicked off; once the timer expires and no paths have returned, the gendisk >> is removed. >> >> When all paths are gone and the gendisk is still present, all reads and >> writes to the disk are queued. If a path returns before the timer >> expiration, the timer canceled and the queued IO is submitted; >> otherwise they fail when the timer expires. >> >> This testcase covers two scenarios in separate parts: >> a. test that IOs submitted after all paths are removed (and do not return) >> fail >> b. test that IOs submitted between all paths removed and a path >> returning succeed >> >> During the period of the timer being active, it must be ensured that the >> nvme-core module is not removed. Otherwise the driver may not be present >> to handle the timeout expiry. The kernel ensures this by taking a >> reference to the module. Ideally, we would try to remove the module during >> this test to prove that this is not possible (and the kernel behaves as >> expected), but that module will probably not be removable anyway due to >> many references. To test this feature, check that the refcount of the >> nvme-core module is incremented when the delayed timer is active. >> >> Signed-off-by: John Garry <john.g.garry@oracle.com> > > John, thanks for the patch. When I ran the new test case in my test environment, > it failed. The reported refcount mismatch looks happening in the Part b. > > nvme/068 (tr=loop) (NVMe multipath delayed removal test) [failed] > runtime 38.579s ... 38.770s > --- tests/nvme/068.out 2026-04-16 20:50:21.228000000 +0900 > +++ /home/shin/Blktests/blktests/results/nodev_tr_loop/nvme/068.out.bad 2026-04-16 21:30:36.215000000 +0900 > @@ -1,3 +1,4 @@ > Running nvme/068 > pwrite: Input/output error > +module refcount not as original > Test complete > > I have no idea why it fails. Do you have any guess about the failure cause? I posted a fix for this here: https://lore.kernel.org/linux-nvme/20260416051459.GA14802@lst.de/T/#m17eb549aba102198009a37621cb45775841639d9 I should have mentioned this when posting this patch. About this specific part of the test (checking module refcnt), I was wondering if we should be testing kernel internals like this. But, since it found a problem, I suppose that makes it ok :) > > Also, please find my review comments in line. ok > >> >> diff --git a/common/rc b/common/rc >> index 5350057..6eae0e2 100644 >> --- a/common/rc >> +++ b/common/rc >> @@ -117,6 +117,16 @@ _module_not_in_use() { >> fi >> } >> >> +_module_use_count() { >> + local refcnt >> + if [ -f "/sys/module/$1/refcnt" ]; then >> + refcnt="$(cat /sys/module/"$1"/refcnt)" >> + echo $refcnt > > To suppress Shellcheck warning, please add quotation marks: "$refcnt". ok > >> + return >> + fi >> + echo "" >> +} >> + >> _have_module_param() { >> _have_driver "$1" || return >> >> diff --git a/tests/nvme/068 b/tests/nvme/068 >> new file mode 100644 > > File mode 755 is recommended for consistency. > ok >> index 0000000..e06fd6b >> --- /dev/null >> +++ b/tests/nvme/068 >> @@ -0,0 +1,118 @@ >> +#!/bin/bash >> +# SPDX-License-Identifier: GPL-3.0+ >> +# Copyright (C) 2026 John Garry >> +# >> +# Test NVMe multipath delayed removal works as expected >> + >> +. tests/nvme/rc >> +. common/xfs >> + >> +DESCRIPTION="NVMe multipath delayed removal test" >> +QUICK=1 > > It is guided to set QUICK=1 for the test cases which completes > "in ~30 seconds or less". In my environment, this test case took 38 > seconds, so I'm not so sure if this test case is quick. How long > does it take in your environment? I think that I will drop this, as the test time run should be > 30 with the delays, below > >> + >> +requires() { >> + _nvme_requires >> + _have_loop >> + _have_module_param_value nvme_core multipath Y >> + _require_nvme_trtype_is_fabrics >> +} >> + >> +set_conditions() { >> + _set_nvme_trtype "$@" >> +} >> + >> +_delayed_nvme_reconnect_ctrl() { >> + sleep 5 >> + _nvme_connect_subsys >> +} >> + >> +test() { >> + echo "Running ${TEST_NAME}" >> + >> + _setup_nvmet >> + >> + local nvmedev >> + local ns >> + local bytes_written >> + local refcnt_orig >> + local refcnt >> + _nvmet_target_setup >> + >> + _nvme_connect_subsys >> + >> + # Part a: Prove that writes fail when no path returns. Any reads or >> + # writes are queued during the delayed removal period. If no >> + # paths return before the timer expires, then those IOs should >> + # fail. >> + # During the delayed removal period, ensure that the module >> + # refcnt is incremented, to prove that we cannot remove the >> + # driver during this period. >> + nvmedev=$(_find_nvme_dev "${def_subsysnqn}") >> + ns=$(_find_nvme_ns "${def_subsys_uuid}") >> + refcnt=$(_module_use_count nvme_core) >> + echo 10 > "/sys/block/"$ns"/delayed_removal_secs" > > Shellcheck complains about the line above. I think it can be modified as below: > > echo 10 > "/sys/block/${ns}/delayed_removal_secs" ok, I'll check that > >> + refcnt_orig=$(_module_use_count nvme_core) >> + _nvme_disconnect_ctrl "${nvmedev}" >> + sleep 1 >> + ns=$(_find_nvme_ns "${def_subsys_uuid}") >> + if [[ "${ns}" = "" ]]; then >> + echo "could not find ns after disconnect" >> + fi >> + refcnt=$(_module_use_count nvme_core) >> + if [ "$refcnt" != "" ] && [ "$refcnt" -le "$refcnt_orig" ]; then >> + echo "module refcount did not increase" >> + fi >> + bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096) >> + if [ "$bytes_written" == 4096 ]; then >> + echo "wrote successfully after disconnect" >> + fi >> + sleep 10 >> + ns=$(_find_nvme_ns "${def_subsys_uuid}") >> + if [[ !"${ns}" = "" ]]; then > > Shellcheck warns the line above. I guess it can be as follows: > > if [[ "${ns}" != "" ]]; then ok, I'll check that > >> + echo "found ns after delayed removal" >> + fi >> + refcnt=$(_module_use_count nvme_core) >> + if [ "$refcnt" != "" ] && [ "$refcnt" -ne "$refcnt_orig" ]; then >> + echo "module refcount not as original" >> + fi >> + >> + # Part b: Ensure writes for an intermittent disconnect are successful. >> + # During an intermittent disconnect, any reads or writes >> + # queued should succeed after a path returns. >> + # Also ensure module refcount behaviour is as expected, as >> + # above. >> + _nvme_connect_subsys >> + >> + nvmedev=$(_find_nvme_dev "${def_subsysnqn}") >> + ns=$(_find_nvme_ns "${def_subsys_uuid}") >> + refcnt_orig=$(_module_use_count nvme_core) >> + echo 10 > "/sys/block/"$ns"/delayed_removal_secs" > > Again, the line above can be modified as follows for Shellcheck. > sure > echo 10 > "/sys/block/${ns}/delayed_removal_secs" Thanks, John ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH blktests] nvme/068: add a test for multipath delayed removal 2026-04-16 13:03 ` John Garry @ 2026-04-17 2:22 ` Shinichiro Kawasaki 0 siblings, 0 replies; 7+ messages in thread From: Shinichiro Kawasaki @ 2026-04-17 2:22 UTC (permalink / raw) To: John Garry Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, nilay@linux.ibm.com On Apr 16, 2026 / 14:03, John Garry wrote: > On 16/04/2026 13:50, Shinichiro Kawasaki wrote: [...] > > John, thanks for the patch. When I ran the new test case in my test environment, > > it failed. The reported refcount mismatch looks happening in the Part b. > > > > nvme/068 (tr=loop) (NVMe multipath delayed removal test) [failed] > > runtime 38.579s ... 38.770s > > --- tests/nvme/068.out 2026-04-16 20:50:21.228000000 +0900 > > +++ /home/shin/Blktests/blktests/results/nodev_tr_loop/nvme/068.out.bad 2026-04-16 21:30:36.215000000 +0900 > > @@ -1,3 +1,4 @@ > > Running nvme/068 > > pwrite: Input/output error > > +module refcount not as original > > Test complete > > > > I have no idea why it fails. Do you have any guess about the failure cause? > > I posted a fix for this here: > https://lore.kernel.org/linux-nvme/20260416051459.GA14802@lst.de/T/#m17eb549aba102198009a37621cb45775841639d9 Thanks. With the kenrel fix, I confirmed that the test case passes. Good. [...] > About this specific part of the test (checking module refcnt), I was > wondering if we should be testing kernel internals like this. But, since it > found a problem, I suppose that makes it ok :) Agreed, this test case already proved its value :) ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-04-17 2:22 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-15 10:41 [PATCH blktests] nvme/068: add a test for multipath delayed removal John Garry 2026-04-15 17:58 ` Chaitanya Kulkarni 2026-04-16 11:18 ` Nilay Shroff 2026-04-16 11:45 ` John Garry 2026-04-16 12:50 ` Shinichiro Kawasaki 2026-04-16 13:03 ` John Garry 2026-04-17 2:22 ` Shinichiro Kawasaki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox