public inbox for linux-block@vger.kernel.org
From: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: John Garry <john.g.garry@oracle.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"nilay@linux.ibm.com" <nilay@linux.ibm.com>
Subject: Re: [PATCH blktests] nvme/068: add a test for multipath delayed removal
Date: Thu, 16 Apr 2026 12:50:19 +0000
Message-ID: <aeDYE1BfumNvutvS@shinmob>
In-Reply-To: <20260415104111.1439459-1-john.g.garry@oracle.com>

On Apr 15, 2026 / 10:41, John Garry wrote:
> For NVMe multipath, the delayed removal feature allows the multipath
> gendisk to remain present when all available paths are gone. The purpose of
> this feature is to ensure that we keep the gendisk for intermittent path
> failures.
> 
> The delayed removal works on a timer - when all paths are gone, a timer is
> kicked off; once the timer expires and no paths have returned, the gendisk
> is removed.
> 
> When all paths are gone and the gendisk is still present, all reads and
> writes to the disk are queued. If a path returns before the timer
> expiration, the timer canceled and the queued IO is submitted;
> otherwise they fail when the timer expires.
> 
> This testcase covers two scenarios in separate parts:
> a. test that IOs submitted after all paths are removed (and do not return)
>    fail
> b. test that IOs submitted between all paths removed and a path
>    returning succeed
> 
> During the period of the timer being active, it must be ensured that the
> nvme-core module is not removed. Otherwise the driver may not be present
> to handle the timeout expiry. The kernel ensures this by taking a
> reference to the module. Ideally, we would try to remove the module during
> this test to prove that this is not possible (and the kernel behaves as
> expected), but that module will probably not be removable anyway due to
> many references. To test this feature, check that the refcount of the
> nvme-core module is incremented when the delayed timer is active.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>

John, thanks for the patch. When I ran the new test case in my test environment,
it failed. The reported refcount mismatch appears to happen in Part b.

nvme/068 (tr=loop) (NVMe multipath delayed removal test)     [failed]
    runtime  38.579s  ...  38.770s
    --- tests/nvme/068.out      2026-04-16 20:50:21.228000000 +0900
    +++ /home/shin/Blktests/blktests/results/nodev_tr_loop/nvme/068.out.bad     2026-04-16 21:30:36.215000000 +0900
    @@ -1,3 +1,4 @@
     Running nvme/068
     pwrite: Input/output error
    +module refcount not as original
     Test complete

I do not know why it fails. Do you have any idea what causes the failure?

Also, please find my review comments inline.

> 
> diff --git a/common/rc b/common/rc
> index 5350057..6eae0e2 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -117,6 +117,16 @@ _module_not_in_use() {
>  	fi
>  }
>  
> +_module_use_count() {
> +	local refcnt
> +	if [ -f "/sys/module/$1/refcnt" ]; then
> +		refcnt="$(cat /sys/module/"$1"/refcnt)"
> +		echo $refcnt

To suppress the Shellcheck warning, please add quotation marks: "$refcnt".

> +		return
> +	fi
> +	echo ""
> +}
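
For reference, the helper with the quoting applied might look like the sketch
below. SYSFS_MODULE_DIR is a hypothetical override that is not part of the
patch; it is there only so the sketch can be exercised without a real
/sys/module tree.

```shell
#!/bin/bash
# Hypothetical override (not in the patch): lets the sketch run against a
# fake directory tree. It defaults to the real sysfs location.
: "${SYSFS_MODULE_DIR:=/sys/module}"

# Print the reference count of module $1, or an empty line if the module
# has no refcnt attribute (for example, when it is not loaded).
_module_use_count() {
	local refcnt
	if [ -f "${SYSFS_MODULE_DIR}/$1/refcnt" ]; then
		refcnt="$(cat "${SYSFS_MODULE_DIR}/$1/refcnt")"
		# Quoted, so the value is passed to echo as a single word.
		echo "$refcnt"
		return
	fi
	echo ""
}
```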
> +
>  _have_module_param() {
>  	 _have_driver "$1" || return
>  
> diff --git a/tests/nvme/068 b/tests/nvme/068
> new file mode 100644

File mode 755 is recommended for consistency.
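
One way to confirm the mode locally before resending is sketched below.
has_mode is a hypothetical helper, not part of blktests, and it assumes GNU
stat.

```shell
#!/bin/bash
# Hypothetical helper: succeed if file $1 has exactly the octal mode $2.
# Assumes GNU coreutils stat, which supports -c '%a'.
has_mode() {
	local file="$1" want="$2"
	[ "$(stat -c '%a' "$file")" = "$want" ]
}

# Example use from the top of the blktests tree:
#   chmod 755 tests/nvme/068
#   has_mode tests/nvme/068 755 && echo "mode ok"
```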

> index 0000000..e06fd6b
> --- /dev/null
> +++ b/tests/nvme/068
> @@ -0,0 +1,118 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright (C) 2026 John Garry
> +#
> +# Test NVMe multipath delayed removal works as expected
> +
> +. tests/nvme/rc
> +. common/xfs
> +
> +DESCRIPTION="NVMe multipath delayed removal test"
> +QUICK=1

The guideline is to set QUICK=1 for test cases that complete
"in ~30 seconds or less". In my environment, this test case took 38
seconds, so I'm not sure this test case qualifies as quick. How long
does it take in your environment?

> +
> +requires() {
> +	_nvme_requires
> +	_have_loop
> +	_have_module_param_value nvme_core multipath Y
> +	_require_nvme_trtype_is_fabrics
> +}
> +
> +set_conditions() {
> +	_set_nvme_trtype "$@"
> +}
> +
> +_delayed_nvme_reconnect_ctrl() {
> +	sleep 5
> +	_nvme_connect_subsys
> +}
> +
> +test() {
> +	echo "Running ${TEST_NAME}"
> +
> +	_setup_nvmet
> +
> +	local nvmedev
> +	local ns
> +	local bytes_written
> +	local refcnt_orig
> +	local refcnt
> +	_nvmet_target_setup
> +
> +	_nvme_connect_subsys
> +
> +	# Part a: Prove that writes fail when no path returns. Any reads or
> +	#	  writes are queued during the delayed removal period. If no
> +	#	  paths return before the timer expires, then those IOs should
> +	#	  fail.
> +	#	  During the delayed removal period, ensure that the module
> +	#	  refcnt is incremented, to prove that we cannot remove the
> +	#	  driver during this period.
> +	nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
> +	ns=$(_find_nvme_ns "${def_subsys_uuid}")
> +	refcnt=$(_module_use_count nvme_core)
> +	echo 10 > "/sys/block/"$ns"/delayed_removal_secs"

Shellcheck complains about the line above. I think it can be modified as below:

	echo 10 > "/sys/block/${ns}/delayed_removal_secs"
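
To illustrate why the original form draws a warning: the inner quotes close
and reopen the string, so $ns itself is expanded unquoted and is subject to
word splitting. A minimal sketch follows. count_args is a hypothetical helper,
and the value of ns is contrived (real namespace names contain no spaces) just
to make the effect visible.

```shell
#!/bin/bash
# Hypothetical helper: print how many arguments it received.
count_args() {
	echo "$#"
}

ns="nvme0 n1"	# contrived value with a space, for illustration only

# Original form: $ns sits outside the quotes, so the word is split in two.
count_args "/sys/block/"$ns"/delayed_removal_secs"	# prints 2

# Suggested form: the whole path stays a single word.
count_args "/sys/block/${ns}/delayed_removal_secs"	# prints 1
```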

> +	refcnt_orig=$(_module_use_count nvme_core)
> +	_nvme_disconnect_ctrl "${nvmedev}"
> +	sleep 1
> +	ns=$(_find_nvme_ns "${def_subsys_uuid}")
> +	if [[ "${ns}" = "" ]]; then
> +		echo "could not find ns after disconnect"
> +	fi
> +	refcnt=$(_module_use_count nvme_core)
> +	if [ "$refcnt" != "" ] && [ "$refcnt" -le "$refcnt_orig" ]; then
> +		echo "module refcount did not increase"
> +	fi
> +	bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096)
> +	if [ "$bytes_written" == 4096 ]; then
> +		echo "wrote successfully after disconnect"
> +	fi
> +	sleep 10
> +	ns=$(_find_nvme_ns "${def_subsys_uuid}")
> +	if [[ !"${ns}" = "" ]]; then

Shellcheck warns about the line above. I guess it can be rewritten as follows:

	if [[ "${ns}" != "" ]]; then
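
This is more than a style issue. Without a space after it, "!" is not treated
as the negation operator but becomes part of the string, so the left-hand side
is never empty and the condition is always false. The "found ns after delayed
removal" message could then never be printed. A minimal sketch of the
difference, where both function names are hypothetical and exist only for
illustration:

```shell
#!/bin/bash
# Original form: '!' is glued to the expansion, so the comparison is
# between the string "!<ns>" and "", which never matches.
ns_found_original() {
	local ns="$1"
	if [[ !"${ns}" = "" ]]; then
		echo "found"
	else
		echo "not found"
	fi
}

# Suggested form: a plain inequality test on the variable.
ns_found_suggested() {
	local ns="$1"
	if [[ "${ns}" != "" ]]; then
		echo "found"
	else
		echo "not found"
	fi
}

ns_found_original nvme0n1	# prints "not found", although ns is set
ns_found_suggested nvme0n1	# prints "found"
ns_found_suggested ""		# prints "not found"
```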

> +		echo "found ns after delayed removal"
> +	fi
> +	refcnt=$(_module_use_count nvme_core)
> +	if [ "$refcnt" != "" ] && [ "$refcnt" -ne "$refcnt_orig" ]; then
> +		echo "module refcount not as original"
> +	fi
> +
> +	# Part b: Ensure writes for an intermittent disconnect are successful.
> +	#	  During an intermittent disconnect, any reads or writes
> +	#	  queued should succeed after a path returns.
> +	#	  Also ensure module refcount behaviour is as expected, as
> +	#	  above.
> +	_nvme_connect_subsys
> +
> +	nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
> +	ns=$(_find_nvme_ns "${def_subsys_uuid}")
> +	refcnt_orig=$(_module_use_count nvme_core)
> +	echo 10 > "/sys/block/"$ns"/delayed_removal_secs"

Again, the line above can be modified as follows for Shellcheck.

	echo 10 > "/sys/block/${ns}/delayed_removal_secs"
