Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns
@ 2024-05-21  8:56 Sagi Grimberg
  2024-05-21  8:56 ` [PATCH] nvmet: fix ns enable/disable possible hang Sagi Grimberg
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Sagi Grimberg @ 2024-05-21  8:56 UTC (permalink / raw)
  To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni

Reproduce a hang in nvmet when concurrently disabling/enabling
an nvmet namespace.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 tests/nvme/051     | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/nvme/051.out |  2 ++
 2 files changed, 50 insertions(+)
 create mode 100755 tests/nvme/051
 create mode 100644 tests/nvme/051.out

diff --git a/tests/nvme/051 b/tests/nvme/051
new file mode 100755
index 000000000000..ddc097310dd7
--- /dev/null
+++ b/tests/nvme/051
@@ -0,0 +1,48 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright (C) 2024 Sagi Grimberg
+#
+# Regression test for kernel hang when enabling/disabling nvmet
+# namespace concurrently
+
+. tests/nvme/rc
+
+DESCRIPTION="test nvmet concurrent ns enable/disable"
+QUICK=1
+
+requires() {
+	_nvme_requires
+	_require_nvme_trtype_is_fabrics
+}
+
+set_conditions() {
+	_set_nvme_trtype "$@"
+}
+
+ns_enable_disable_loop() {
+	local ns="$1"
+	local iterations=200
+	for ((i = 1; i <= ${iterations}; i++)); do
+		echo 0 > $ns/enable
+		echo 1 > $ns/enable
+	done
+}
+
+test() {
+	echo "Running ${TEST_NAME}"
+
+	_setup_nvmet
+
+	_nvmet_target_setup
+	ns="${NVMET_CFS}subsystems/${def_subsysnqn}/namespaces/1"
+
+	# fire off two enable/disable loops concurrently and wait
+	# for them to complete...
+	ns_enable_disable_loop $ns &
+	ns_enable_disable_loop $ns &
+	wait
+
+	_nvmet_target_cleanup
+
+	echo "Test complete"
+}
diff --git a/tests/nvme/051.out b/tests/nvme/051.out
new file mode 100644
index 000000000000..156f0687aab2
--- /dev/null
+++ b/tests/nvme/051.out
@@ -0,0 +1,2 @@
+Running nvme/051
+Test complete
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] nvmet: fix ns enable/disable possible hang
  2024-05-21  8:56 [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns Sagi Grimberg
@ 2024-05-21  8:56 ` Sagi Grimberg
  2024-05-21 19:39   ` Christoph Hellwig
  2024-05-22  0:45   ` Chaitanya Kulkarni
  2024-05-21 10:07 ` [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns Sagi Grimberg
  2024-05-22  0:54 ` Chaitanya Kulkarni
  2 siblings, 2 replies; 8+ messages in thread
From: Sagi Grimberg @ 2024-05-21  8:56 UTC (permalink / raw)
  To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni

When disabling an nvmet namespace, there is a period where
the subsys->lock is released, as the ns disable waits for
backend IO to complete, and the ns percpu ref to be properly
killed. The original intent was to avoid taking the subsystem
lock for a prolong period as other processes may need to acquire
it (for example new incoming connections).

However, it opens up a window where another process may come in
and enable the ns, (re)intiailizing the ns percpu_ref, causing
the disable sequence to hang.

Solve this by taking the global nvmet_config_sem over the entire
configfs enable/disable sequence.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/target/configfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 6a736d28d767..b38097acc605 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -676,10 +676,12 @@ static ssize_t nvmet_ns_enable_store(struct config_item *item,
 	if (kstrtobool(page, &enable))
 		return -EINVAL;
 
+	down_write(&nvmet_config_sem);
 	if (enable)
 		ret = nvmet_ns_enable(ns);
 	else
 		nvmet_ns_disable(ns);
+	up_write(&nvmet_config_sem);
 
 	return ret ? ret : count;
 }
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns
  2024-05-21  8:56 [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns Sagi Grimberg
  2024-05-21  8:56 ` [PATCH] nvmet: fix ns enable/disable possible hang Sagi Grimberg
@ 2024-05-21 10:07 ` Sagi Grimberg
  2024-05-22  3:46   ` Yi Zhang
  2024-05-22  0:54 ` Chaitanya Kulkarni
  2 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2024-05-21 10:07 UTC (permalink / raw)
  To: linux-nvme
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki

Forgot to CC Shinichiro...

On 21/05/2024 11:56, Sagi Grimberg wrote:
> Reproduce a hang in nvmet when concurrently disabling/enabling
> an nvmet namespace.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>   tests/nvme/051     | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>   tests/nvme/051.out |  2 ++
>   2 files changed, 50 insertions(+)
>   create mode 100755 tests/nvme/051
>   create mode 100644 tests/nvme/051.out
>
> diff --git a/tests/nvme/051 b/tests/nvme/051
> new file mode 100755
> index 000000000000..ddc097310dd7
> --- /dev/null
> +++ b/tests/nvme/051
> @@ -0,0 +1,48 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright (C) 2024 Sagi Grimberg
> +#
> +# Regression test for kernel hang when enabling/disabling nvmet
> +# namespace concurrently
> +
> +. tests/nvme/rc
> +
> +DESCRIPTION="test nvmet concurrent ns enable/disable"
> +QUICK=1
> +
> +requires() {
> +	_nvme_requires
> +	_require_nvme_trtype_is_fabrics
> +}
> +
> +set_conditions() {
> +	_set_nvme_trtype "$@"
> +}
> +
> +ns_enable_disable_loop() {
> +	local ns="$1"
> +	local iterations=200
> +	for ((i = 1; i <= ${iterations}; i++)); do
> +		echo 0 > $ns/enable
> +		echo 1 > $ns/enable
> +	done
> +}
> +
> +test() {
> +	echo "Running ${TEST_NAME}"
> +
> +	_setup_nvmet
> +
> +	_nvmet_target_setup
> +	ns="${NVMET_CFS}subsystems/${def_subsysnqn}/namespaces/1"
> +
> +	# fire off two enable/disable loops concurrently and wait
> +	# for them to complete...
> +	ns_enable_disable_loop $ns &
> +	ns_enable_disable_loop $ns &
> +	wait
> +
> +	_nvmet_target_cleanup
> +
> +	echo "Test complete"
> +}
> diff --git a/tests/nvme/051.out b/tests/nvme/051.out
> new file mode 100644
> index 000000000000..156f0687aab2
> --- /dev/null
> +++ b/tests/nvme/051.out
> @@ -0,0 +1,2 @@
> +Running nvme/051
> +Test complete



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvmet: fix ns enable/disable possible hang
  2024-05-21  8:56 ` [PATCH] nvmet: fix ns enable/disable possible hang Sagi Grimberg
@ 2024-05-21 19:39   ` Christoph Hellwig
  2024-05-22  0:45   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2024-05-21 19:39 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, Christoph Hellwig, Keith Busch, Chaitanya Kulkarni

On Tue, May 21, 2024 at 11:56:23AM +0300, Sagi Grimberg wrote:
> When disabling an nvmet namespace, there is a period where
> the subsys->lock is released, as the ns disable waits for
> backend IO to complete, and the ns percpu ref to be properly
> killed. The original intent was to avoid taking the subsystem
> lock for a prolong period as other processes may need to acquire
> it (for example new incoming connections).
> 
> However, it opens up a window where another process may come in
> and enable the ns, (re)intiailizing the ns percpu_ref, causing
> the disable sequence to hang.
> 
> Solve this by taking the global nvmet_config_sem over the entire
> configfs enable/disable sequence.

Can you use up all 73 characters for the commit log and add a little
comment to the code on why nvmet_config_sem is taken to the code?

With that this looks good to me.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvmet: fix ns enable/disable possible hang
  2024-05-21  8:56 ` [PATCH] nvmet: fix ns enable/disable possible hang Sagi Grimberg
  2024-05-21 19:39   ` Christoph Hellwig
@ 2024-05-22  0:45   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 8+ messages in thread
From: Chaitanya Kulkarni @ 2024-05-22  0:45 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme@lists.infradead.org
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni

On 5/21/24 01:56, Sagi Grimberg wrote:
> When disabling an nvmet namespace, there is a period where
> the subsys->lock is released, as the ns disable waits for
> backend IO to complete, and the ns percpu ref to be properly
> killed. The original intent was to avoid taking the subsystem
> lock for a prolong period as other processes may need to acquire
> it (for example new incoming connections).
>
> However, it opens up a window where another process may come in
> and enable the ns, (re)intiailizing the ns percpu_ref, causing
> the disable sequence to hang.
>
> Solve this by taking the global nvmet_config_sem over the entire
> configfs enable/disable sequence.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>

good catch, looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns
  2024-05-21  8:56 [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns Sagi Grimberg
  2024-05-21  8:56 ` [PATCH] nvmet: fix ns enable/disable possible hang Sagi Grimberg
  2024-05-21 10:07 ` [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns Sagi Grimberg
@ 2024-05-22  0:54 ` Chaitanya Kulkarni
  2024-05-22  8:58   ` Sagi Grimberg
  2 siblings, 1 reply; 8+ messages in thread
From: Chaitanya Kulkarni @ 2024-05-22  0:54 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme@lists.infradead.org
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni

On 5/21/24 01:56, Sagi Grimberg wrote:
> Reproduce a hang in nvmet when concurrently disabling/enabling
> an nvmet namespace.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

thanks for the test  ...

> ---
>   tests/nvme/051     | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>   tests/nvme/051.out |  2 ++
>   2 files changed, 50 insertions(+)
>   create mode 100755 tests/nvme/051
>   create mode 100644 tests/nvme/051.out
>
> diff --git a/tests/nvme/051 b/tests/nvme/051
> new file mode 100755
> index 000000000000..ddc097310dd7
> --- /dev/null
> +++ b/tests/nvme/051
> @@ -0,0 +1,48 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright (C) 2024 Sagi Grimberg
> +#
> +# Regression test for kernel hang when enabling/disabling nvmet
> +# namespace concurrently
> +
> +. tests/nvme/rc
> +
> +DESCRIPTION="test nvmet concurrent ns enable/disable"
> +QUICK=1
> +
> +requires() {
> +	_nvme_requires
> +	_require_nvme_trtype_is_fabrics
> +}
> +
> +set_conditions() {
> +	_set_nvme_trtype "$@"
> +}
> +
> +ns_enable_disable_loop() {
> +	local ns="$1"

nit: how about :-

	local ns="${NVMET_CFS}subsystems/${def_subsysnqn}/namespaces/1"

and get rid of the argument at the call site ? unless there is
a plan to use different namespaces


> +	local iterations=200

nit: not sure if we need iterations as variable, unless there is
      a plan to make it configurable in the future maybe we can
      remove that ?

> +	for ((i = 1; i <= ${iterations}; i++)); do
> +		echo 0 > $ns/enable
> +		echo 1 > $ns/enable
> +	done
> +}
> +
> +test() {
> +	echo "Running ${TEST_NAME}"
> +
> +	_setup_nvmet
> +
> +	_nvmet_target_setup
> +	ns="${NVMET_CFS}subsystems/${def_subsysnqn}/namespaces/1"
> +
> +	# fire off two enable/disable loops concurrently and wait
> +	# for them to complete...
> +	ns_enable_disable_loop $ns &
> +	ns_enable_disable_loop $ns &
> +	wait
> +
> +	_nvmet_target_cleanup
> +
> +	echo "Test complete"
> +}
> diff --git a/tests/nvme/051.out b/tests/nvme/051.out
> new file mode 100644
> index 000000000000..156f0687aab2
> --- /dev/null
> +++ b/tests/nvme/051.out
> @@ -0,0 +1,2 @@
> +Running nvme/051
> +Test complete

irrespective of couple of nits, looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns
  2024-05-21 10:07 ` [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns Sagi Grimberg
@ 2024-05-22  3:46   ` Yi Zhang
  0 siblings, 0 replies; 8+ messages in thread
From: Yi Zhang @ 2024-05-22  3:46 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, Christoph Hellwig, Keith Busch, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki

Hi Sagi

Some note from shellcheck
$ make check
shellcheck -x -e SC2119 -f gcc check common/* \
tests/*/rc tests/*/[0-9]*[0-9] src/*.sh
tests/nvme/051:25:20: note: $/${} is unnecessary on arithmetic
variables. [SC2004]
tests/nvme/051:26:12: note: Double quote to prevent globbing and word
splitting. [SC2086]
tests/nvme/051:27:12: note: Double quote to prevent globbing and word
splitting. [SC2086]

On Tue, May 21, 2024 at 6:08 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
> Forgot to CC Shinichiro...
>
> On 21/05/2024 11:56, Sagi Grimberg wrote:
> > Reproduce a hang in nvmet when concurrently disabling/enabling
> > an nvmet namespace.
> >
> > Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> > ---
> >   tests/nvme/051     | 48 ++++++++++++++++++++++++++++++++++++++++++++++
> >   tests/nvme/051.out |  2 ++
> >   2 files changed, 50 insertions(+)
> >   create mode 100755 tests/nvme/051
> >   create mode 100644 tests/nvme/051.out
> >
> > diff --git a/tests/nvme/051 b/tests/nvme/051
> > new file mode 100755
> > index 000000000000..ddc097310dd7
> > --- /dev/null
> > +++ b/tests/nvme/051
> > @@ -0,0 +1,48 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-3.0+
> > +# Copyright (C) 2024 Sagi Grimberg
> > +#
> > +# Regression test for kernel hang when enabling/disabling nvmet
> > +# namespace concurrently
> > +
> > +. tests/nvme/rc
> > +
> > +DESCRIPTION="test nvmet concurrent ns enable/disable"
> > +QUICK=1
> > +
> > +requires() {
> > +     _nvme_requires
> > +     _require_nvme_trtype_is_fabrics
> > +}
> > +
> > +set_conditions() {
> > +     _set_nvme_trtype "$@"
> > +}
> > +
> > +ns_enable_disable_loop() {
> > +     local ns="$1"
> > +     local iterations=200
> > +     for ((i = 1; i <= ${iterations}; i++)); do
> > +             echo 0 > $ns/enable
> > +             echo 1 > $ns/enable
> > +     done
> > +}
> > +
> > +test() {
> > +     echo "Running ${TEST_NAME}"
> > +
> > +     _setup_nvmet
> > +
> > +     _nvmet_target_setup
> > +     ns="${NVMET_CFS}subsystems/${def_subsysnqn}/namespaces/1"
> > +
> > +     # fire off two enable/disable loops concurrently and wait
> > +     # for them to complete...
> > +     ns_enable_disable_loop $ns &
> > +     ns_enable_disable_loop $ns &
> > +     wait
> > +
> > +     _nvmet_target_cleanup
> > +
> > +     echo "Test complete"
> > +}
> > diff --git a/tests/nvme/051.out b/tests/nvme/051.out
> > new file mode 100644
> > index 000000000000..156f0687aab2
> > --- /dev/null
> > +++ b/tests/nvme/051.out
> > @@ -0,0 +1,2 @@
> > +Running nvme/051
> > +Test complete
>
>


-- 
Best Regards,
  Yi Zhang



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns
  2024-05-22  0:54 ` Chaitanya Kulkarni
@ 2024-05-22  8:58   ` Sagi Grimberg
  0 siblings, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2024-05-22  8:58 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme@lists.infradead.org
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni



On 22/05/2024 3:54, Chaitanya Kulkarni wrote:
> On 5/21/24 01:56, Sagi Grimberg wrote:
>> Reproduce a hang in nvmet when concurrently disabling/enabling
>> an nvmet namespace.
>>
>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> thanks for the test  ...
>
>> ---
>>    tests/nvme/051     | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>>    tests/nvme/051.out |  2 ++
>>    2 files changed, 50 insertions(+)
>>    create mode 100755 tests/nvme/051
>>    create mode 100644 tests/nvme/051.out
>>
>> diff --git a/tests/nvme/051 b/tests/nvme/051
>> new file mode 100755
>> index 000000000000..ddc097310dd7
>> --- /dev/null
>> +++ b/tests/nvme/051
>> @@ -0,0 +1,48 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-3.0+
>> +# Copyright (C) 2024 Sagi Grimberg
>> +#
>> +# Regression test for kernel hang when enabling/disabling nvmet
>> +# namespace concurrently
>> +
>> +. tests/nvme/rc
>> +
>> +DESCRIPTION="test nvmet concurrent ns enable/disable"
>> +QUICK=1
>> +
>> +requires() {
>> +	_nvme_requires
>> +	_require_nvme_trtype_is_fabrics
>> +}
>> +
>> +set_conditions() {
>> +	_set_nvme_trtype "$@"
>> +}
>> +
>> +ns_enable_disable_loop() {
>> +	local ns="$1"
> nit: how about :-
>
> 	local ns="${NVMET_CFS}subsystems/${def_subsysnqn}/namespaces/1"
>
> and get rid of the argument at the call site ? unless there is
> a plan to use different namespaces

I think its better that the ns comes as an argument, maybe someone may 
make this
function a generic utility in the future.

>
>
>> +	local iterations=200
> nit: not sure if we need iterations as variable, unless there is
>        a plan to make it configurable in the future maybe we can
>        remove that ?

At first I used NVME_NUM_ITERS, but that made the test too long, I can
make it hard-coded.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-05-22  8:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-21  8:56 [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns Sagi Grimberg
2024-05-21  8:56 ` [PATCH] nvmet: fix ns enable/disable possible hang Sagi Grimberg
2024-05-21 19:39   ` Christoph Hellwig
2024-05-22  0:45   ` Chaitanya Kulkarni
2024-05-21 10:07 ` [PATCH blktests] nvme: add regression test for concurrently enable/disable nvmet ns Sagi Grimberg
2024-05-22  3:46   ` Yi Zhang
2024-05-22  0:54 ` Chaitanya Kulkarni
2024-05-22  8:58   ` Sagi Grimberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox