* parallel nvme format
From: Niklas Cassel @ 2023-12-14 13:56 UTC (permalink / raw)
To: linux-nvme@lists.infradead.org
Cc: Kamaljit Singh, Shinichiro Kawasaki, Damien Le Moal
Hello guys,
We have observed the following on linux stable v6.6.x.
(We tried to backport commit 839a40d1e730 ("nvme: fix deadlock between reset
and scan"), but it didn't fix the problem.)
What happens is that we are formatting 8 different namespaces in parallel.
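Roughly, what our test does, as a minimal C sketch (the device path,
namespace count, and LBA format are illustrative assumptions, and error
handling is trimmed) -- it issues one Format NVM admin command per namespace
from parallel processes, much like running eight "nvme format" instances:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	unsigned int nsid;

	for (nsid = 1; nsid <= 8; nsid++) {
		if (fork() == 0) {
			/* One Format NVM per namespace, in parallel. */
			struct nvme_admin_cmd cmd;
			int fd = open("/dev/nvme0", O_RDWR);

			memset(&cmd, 0, sizeof(cmd));
			cmd.opcode = 0x80;	/* Format NVM */
			cmd.nsid = nsid;
			cmd.cdw10 = 0;		/* LBAF 0, SES 0 (no secure erase) */
			cmd.timeout_ms = 600000; /* nvme-cli's default: 600 s */
			if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0)
				perror("format");
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		;	/* reap all children */
	return 0;
}

With that running, we see: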
[Wed Nov 29 16:43:56 2023] INFO: task nvme:18411 blocked for more than 120 seconds.
[Wed Nov 29 16:43:56 2023] Not tainted 6.6.2 #1
[Wed Nov 29 16:43:56 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
+disables this message.
[Wed Nov 29 16:43:56 2023] task:nvme state:D stack:0 pid:18411 ppid:18405 flags:0x00004000
[Wed Nov 29 16:43:56 2023] Call Trace:
[Wed Nov 29 16:43:56 2023] <TASK>
[Wed Nov 29 16:43:56 2023] __schedule+0x34a/0xef0
[Wed Nov 29 16:43:56 2023] ? preempt_count_add+0x7c/0xc0
[Wed Nov 29 16:43:56 2023] ? _raw_spin_lock_irqsave+0x29/0x60
[Wed Nov 29 16:43:56 2023] schedule+0x6b/0xf0
[Wed Nov 29 16:43:56 2023] schedule_timeout+0x233/0x290
[Wed Nov 29 16:43:56 2023] ? __timer_delete_sync+0x31/0x50
[Wed Nov 29 16:43:56 2023] ? schedule_timeout+0x1a3/0x290
[Wed Nov 29 16:43:56 2023] wait_for_completion+0x8b/0x140
[Wed Nov 29 16:43:56 2023] __flush_work+0x139/0x240
[Wed Nov 29 16:43:56 2023] ? __pfx_wq_barrier_func+0x10/0x10
[Wed Nov 29 16:43:56 2023] flush_work+0x10/0x20
[Wed Nov 29 16:43:56 2023] nvme_passthru_end+0x60/0x170 [nvme_core]
[Wed Nov 29 16:43:56 2023] nvme_submit_user_cmd+0x162/0x170 [nvme_core]
[Wed Nov 29 16:43:56 2023] nvme_user_cmd.constprop.0+0x156/0x1d0 [nvme_core]
[Wed Nov 29 16:43:56 2023] nvme_dev_ioctl+0x12c/0x2a0 [nvme_core]
[Wed Nov 29 16:43:56 2023] __x64_sys_ioctl+0x95/0xd0
[Wed Nov 29 16:43:56 2023] do_syscall_64+0x38/0x90
[Wed Nov 29 16:43:56 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
flush_work() waits more than 120 seconds for scan_work to complete,
but even after 120 seconds, scan_work hasn't completed.
Inspecting nvme_scan_work(), the only thing that can cause it to not return
after 120 seconds is if it is waiting on mutex_lock() to get the scan mutex.
What appears to be happening is:
format command #1 completes and calls nvme_passthru_end(),
nvme_passthru_end() calls:
mutex_unlock(&ctrl->scan_lock);
...
nvme_queue_scan(ctrl);
flush_work(&ctrl->scan_work);
Directly after format command #1 unlocks the mutex,
format command #2 will get the mutex in nvme_passthru_start().
Format command #2 will release the mutex in nvme_passthru_end(),
but only after the format command has completed.
The default timeout for the nvme format command via nvme-cli is 600 seconds,
so it is easy to imagine that the format command will take longer than 120
seconds.
So the scan work at the end of format command #1 will eventually get to run,
but only after the second nvme format command has completed.
(nvme_queue_scan() will finally be able to get the mutex, once
nvme_passthru_end() for the second format command has released the mutex.)
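To make the interleaving concrete (a simplified sketch of the sequence
described above; task A and task B are the two nvme-cli processes):

    task A (format #1)                  task B (format #2)
    ------------------                  ------------------
    nvme_passthru_start()
      mutex_lock(&ctrl->scan_lock)
    <Format NVM executes>
    nvme_passthru_end()
      mutex_unlock(&ctrl->scan_lock)
                                        nvme_passthru_start()
                                          mutex_lock(&ctrl->scan_lock)
      nvme_queue_scan()
      flush_work(&ctrl->scan_work)
        scan_work blocks in             <Format NVM executes, possibly
        mutex_lock(&ctrl->scan_lock)     for hundreds of seconds>
                                        nvme_passthru_end()
                                          mutex_unlock(&ctrl->scan_lock)
        scan_work finally runs;
        flush_work() returns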
I don't think that formatting multiple namespaces in parallel is an
unreasonable thing to do, so I do not think that we should scare our
users with "hung task timeouts".
My first thought was that perhaps we could do something like:
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 46a4c9c5ea96..811d40cf23f6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1113,6 +1113,7 @@ u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
* effects say only one namespace is affected.
*/
if (effects & NVME_CMD_EFFECTS_CSE_MASK) {
+ flush_work(&ctrl->scan_work);
mutex_lock(&ctrl->scan_lock);
mutex_lock(&ctrl->subsys->lock);
nvme_mpath_start_freeze(ctrl->subsys);
But that will not work, because there will still be a race window between
the time nvme_passthru_end() has unlocked the mutex and the time it has
queued the scan work.
I guess, in the case that nvme_passthru_end() has to perform a scan, we want
to unlock the scan mutex after nvme_passthru_end() has performed the scan,
but right now the scan work queued by nvme_queue_scan() will acquire the
mutex (and is running on the workqueue)...
Thoughts?
Kind regards,
Niklas
* Re: parallel nvme format
From: Keith Busch @ 2023-12-14 17:19 UTC (permalink / raw)
To: Niklas Cassel
Cc: linux-nvme@lists.infradead.org, Kamaljit Singh,
Shinichiro Kawasaki, Damien Le Moal
On Thu, Dec 14, 2023 at 01:56:05PM +0000, Niklas Cassel wrote:
> Hello guys,
>
> We have observed the following on linux stable v6.6.x.
>
> (We tried to backport commit 839a40d1e730 ("nvme: fix deadlock between reset
> and scan"), but it didn't fix the problem.)
>
>
> What happens is that we are formatting 8 different namespaces in parallel.
>
> [Wed Nov 29 16:43:56 2023] INFO: task nvme:18411 blocked for more than 120 seconds.
> [Wed Nov 29 16:43:56 2023] Not tainted 6.6.2 #1
> [Wed Nov 29 16:43:56 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> +disables this message.
> [Wed Nov 29 16:43:56 2023] task:nvme state:D stack:0 pid:18411 ppid:18405 flags:0x00004000
> [Wed Nov 29 16:43:56 2023] Call Trace:
> [Wed Nov 29 16:43:56 2023] <TASK>
> [Wed Nov 29 16:43:56 2023] __schedule+0x34a/0xef0
> [Wed Nov 29 16:43:56 2023] ? preempt_count_add+0x7c/0xc0
> [Wed Nov 29 16:43:56 2023] ? _raw_spin_lock_irqsave+0x29/0x60
> [Wed Nov 29 16:43:56 2023] schedule+0x6b/0xf0
> [Wed Nov 29 16:43:56 2023] schedule_timeout+0x233/0x290
> [Wed Nov 29 16:43:56 2023] ? __timer_delete_sync+0x31/0x50
> [Wed Nov 29 16:43:56 2023] ? schedule_timeout+0x1a3/0x290
> [Wed Nov 29 16:43:56 2023] wait_for_completion+0x8b/0x140
> [Wed Nov 29 16:43:56 2023] __flush_work+0x139/0x240
> [Wed Nov 29 16:43:56 2023] ? __pfx_wq_barrier_func+0x10/0x10
> [Wed Nov 29 16:43:56 2023] flush_work+0x10/0x20
> [Wed Nov 29 16:43:56 2023] nvme_passthru_end+0x60/0x170 [nvme_core]
> [Wed Nov 29 16:43:56 2023] nvme_submit_user_cmd+0x162/0x170 [nvme_core]
> [Wed Nov 29 16:43:56 2023] nvme_user_cmd.constprop.0+0x156/0x1d0 [nvme_core]
> [Wed Nov 29 16:43:56 2023] nvme_dev_ioctl+0x12c/0x2a0 [nvme_core]
> [Wed Nov 29 16:43:56 2023] __x64_sys_ioctl+0x95/0xd0
> [Wed Nov 29 16:43:56 2023] do_syscall_64+0x38/0x90
> [Wed Nov 29 16:43:56 2023] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>
> flush_work() waits more than 120 seconds for scan_work to complete,
> but even after 120 seconds, scan_work hasn't completed.
>
> Inspecting nvme_scan_work(), the only thing that can cause it to not return
> after 120 seconds is if it is waiting on mutex_lock() to get the scan mutex.
>
>
> What appears to be happening is:
>
> format command #1 completes and calls nvme_passthru_end(),
> nvme_passthru_end() calls:
>
> mutex_unlock(&ctrl->scan_lock);
> ...
> nvme_queue_scan(ctrl);
> flush_work(&ctrl->scan_work);
>
>
> Directly after format command #1 unlocks the mutex,
> format command #2 will get the mutex in nvme_passthru_start().
>
> Format command #2 will release the mutex in nvme_passthru_end(),
> but only after the format command has completed.
>
> The default timeout for the nvme format command via nvme-cli is 600 seconds,
> so it is easy to imagine that the format command will take longer than 120
> seconds.
>
>
> So the scan work at the end of format command #1 will eventually get to run,
> but only after the second nvme format command has completed.
> (nvme_queue_scan() will finally be able to get the mutex, once
> nvme_passthru_end() for the second format command has released the mutex.)
>
>
> I don't think that formatting multiple namespaces in parallel is an
> unreasonable thing to do, so I do not think that we should scare our
> users with "hung task timeouts".
>
>
>
> My first thought was that perhaps we could do something like:
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 46a4c9c5ea96..811d40cf23f6 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1113,6 +1113,7 @@ u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
> * effects say only one namespace is affected.
> */
> if (effects & NVME_CMD_EFFECTS_CSE_MASK) {
> + flush_work(&ctrl->scan_work);
> mutex_lock(&ctrl->scan_lock);
> mutex_lock(&ctrl->subsys->lock);
> nvme_mpath_start_freeze(ctrl->subsys);
>
>
> But that will not work, because there will still be a race window between
> the time nvme_passthru_end() has unlocked the mutex and the time it has
> queued the scan work.
>
> I guess, in the case that nvme_passthru_end() has to perform a scan, we
> want to unlock the scan mutex after nvme_passthru_end() has performed the
> scan, but right now the scan work queued by nvme_queue_scan() will acquire
> the mutex (and is running on the workqueue)...
>
> Thoughts?
Hi Niklas!
Thanks for the report and detailed analysis. What do you think about
this idea: if scan_lock is already locked, then some other thread is
expecting to requeue scan_work shortly, so no need to proceed with the
scan.
---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 590cd4f097c22..c6317489022c3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3945,7 +3945,13 @@ static void nvme_scan_work(struct work_struct *work)
nvme_clear_changed_ns_log(ctrl);
}
- mutex_lock(&ctrl->scan_lock);
+ /*
+ * Failure to acquire scan_lock means another thread will requeue this
+ * work shortly, so abandon the current scan.
+ */
+ if (!mutex_trylock(&ctrl->scan_lock))
+ return;
+
if (nvme_ctrl_limited_cns(ctrl)) {
nvme_scan_ns_sequential(ctrl);
} else {
--
* Re: parallel nvme format
From: Niklas Cassel @ 2023-12-14 21:21 UTC (permalink / raw)
To: Keith Busch
Cc: linux-nvme@lists.infradead.org, Kamaljit Singh,
Shinichiro Kawasaki, Damien Le Moal
On Thu, Dec 14, 2023 at 09:19:03AM -0800, Keith Busch wrote:
>
> Hi Niklas!
Hello Keith!
>
> Thanks for the report and detailed analysis. What do you think about
> this idea: if scan_lock is already locked, then some other thread is
> expecting to requeue scan_work shortly, so no need to proceed with the
> scan.
nvme_passthru_start() has:
if (effects & NVME_CMD_EFFECTS_CSE_MASK) {
mutex_lock(&ctrl->scan_lock);
nvme_passthru_end() has:
if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)) {
nvme_queue_scan(ctrl);
What if the first passthru command (format) has
NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC
set, but the second passthru command (some other command)
satisfies: if (effects & NVME_CMD_EFFECTS_CSE_MASK)
but not:
if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
In that case, the scanning for the first command can get abandoned,
and the second command would never queue a new scan.
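A sketch of that interleaving (command X is hypothetical: any admin command
whose effects have CSE set but neither NIC nor NCC):

    format (#1)                          command X
    -----------                          ---------
    nvme_passthru_end()
      mutex_unlock(&ctrl->scan_lock)
                                         nvme_passthru_start()
                                           mutex_lock(&ctrl->scan_lock)
      nvme_queue_scan()
      flush_work(&ctrl->scan_work)
        scan_work: mutex_trylock()
        fails -> scan abandoned
                                         nvme_passthru_end()
                                           mutex_unlock(&ctrl->scan_lock)
                                           NIC/NCC not set, so no
                                           nvme_queue_scan()
    => the rescan needed after the format never happens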
Perhaps we could introduce a new bit, NVME_CTRL_NEEDS_SCAN or similar?
(That gets cleared by nvme_scan_work(), after a scan has been performed.)
I guess for the format command specifically, we already have
NVME_AER_NOTICE_NS_CHANGED, but that seems to be cleared by
nvme_scan_work() before the mutex is acquired.
And I'm not sure if all commands that have
effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)
set will trigger an NVME_AER_NOTICE_NS_CHANGED AEN.
Kind regards,
Niklas
* Re: parallel nvme format
From: Keith Busch @ 2023-12-14 22:49 UTC (permalink / raw)
To: Niklas Cassel
Cc: linux-nvme@lists.infradead.org, Kamaljit Singh,
Shinichiro Kawasaki, Damien Le Moal
On Thu, Dec 14, 2023 at 09:21:28PM +0000, Niklas Cassel wrote:
> Perhaps we could introduce a new bit, NVME_CTRL_NEEDS_SCAN or similar?
> (That gets cleared by nvme_scan_work(), after a scan has been performed.)
Good point. Updated patch below.
> I guess for the format command specifically, we already have
> NVME_AER_NOTICE_NS_CHANGED, but that seems to be cleared by
> nvme_scan_work() before the mutex is acquired.
The clearing should just re-arm the controller to send the AER again if
any other namespace changes. But we can't really depend on the AER
anyway.
> And I'm not sure if all commands that have
> effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)
> set will trigger an NVME_AER_NOTICE_NS_CHANGED AEN.
We have the "known_effects" for the format command to ensure the driver
will always request a rescan on a format no matter what a controller
reports, but yeah, the format could be followed up by another command
with NVME_CMD_EFFECTS_CSE_MASK set without NIC/NCC, so we definitely need
to retain the request to restart the scan.
---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 590cd4f097c22..0b062a2268447 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1120,6 +1120,8 @@ u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
nvme_start_freeze(ctrl);
nvme_wait_freeze(ctrl);
}
+ if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
+ set_bit(NVME_CTRL_NEEDS_RESCAN, &ctrl->flags);
return effects;
}
EXPORT_SYMBOL_NS_GPL(nvme_passthru_start, NVME_TARGET_PASSTHRU);
@@ -1140,7 +1142,7 @@ void nvme_passthru_end(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u32 effects,
"controller capabilities changed, reset may be required to take effect.\n");
}
}
- if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)) {
+ if (test_bit(NVME_CTRL_NEEDS_RESCAN, &ctrl->flags)) {
nvme_queue_scan(ctrl);
flush_work(&ctrl->scan_work);
}
@@ -3945,7 +3947,14 @@ static void nvme_scan_work(struct work_struct *work)
nvme_clear_changed_ns_log(ctrl);
}
- mutex_lock(&ctrl->scan_lock);
+ /*
+ * Failure to acquire scan_lock means another thread will requeue this
+ * work shortly
+ */
+ if (!mutex_trylock(&ctrl->scan_lock))
+ return;
+
+ clear_bit(NVME_CTRL_NEEDS_RESCAN, &ctrl->flags);
if (nvme_ctrl_limited_cns(ctrl)) {
nvme_scan_ns_sequential(ctrl);
} else {
--
* Re: parallel nvme format
From: Keith Busch @ 2023-12-14 23:08 UTC (permalink / raw)
To: Niklas Cassel
Cc: linux-nvme@lists.infradead.org, Kamaljit Singh,
Shinichiro Kawasaki, Damien Le Moal
On Thu, Dec 14, 2023 at 02:49:54PM -0800, Keith Busch wrote:
> On Thu, Dec 14, 2023 at 09:21:28PM +0000, Niklas Cassel wrote:
> > Perhaps we could introduce a new bit, NVME_CTRL_NEEDS_SCAN or similar?
> > (That gets cleared by nvme_scan_work(), after a scan has been performed.)
>
> Good point. Updated patch below.
>
> > I guess for the format command specifically, we already have
> > NVME_AER_NOTICE_NS_CHANGED, but that seems to be cleared by
> > nvme_scan_work() before the mutex is acquired.
>
> The clearing should just re-arm the controller to send the AER again if
> any other namespace changes. But we can't really depend on the AER
> anyway.
>
> > And I'm not sure if all commands that have
> > effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)
> > set will trigger an NVME_AER_NOTICE_NS_CHANGED AEN.
>
> We have the "known_effects" for the format command to ensure the driver
> will always request a rescan on a format no matter what a controller
> reports, but yeah, the format could be followed up by another command
> with NVME_CMD_EFFECTS_CSE_MASK set without NIC/NCC, so we definitely need
> to retain the request to restart the scan.
The previous diff missed the header update:
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 39a90b7cb1254..79a71b6ae64fe 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -251,6 +251,7 @@ enum nvme_ctrl_flags {
NVME_CTRL_STOPPED = 3,
NVME_CTRL_SKIP_ID_CNS_CS = 4,
NVME_CTRL_DIRTY_CAPABILITY = 5,
+ NVME_CTRL_NEEDS_RESCAN = 6,
};
struct nvme_ctrl {
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 590cd4f097c22..0b062a2268447 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1120,6 +1120,8 @@ u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
> nvme_start_freeze(ctrl);
> nvme_wait_freeze(ctrl);
> }
> + if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
> + set_bit(NVME_CTRL_NEEDS_RESCAN, &ctrl->flags);
> return effects;
> }
> EXPORT_SYMBOL_NS_GPL(nvme_passthru_start, NVME_TARGET_PASSTHRU);
> @@ -1140,7 +1142,7 @@ void nvme_passthru_end(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u32 effects,
> "controller capabilities changed, reset may be required to take effect.\n");
> }
> }
> - if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)) {
> + if (test_bit(NVME_CTRL_NEEDS_RESCAN, &ctrl->flags)) {
> nvme_queue_scan(ctrl);
> flush_work(&ctrl->scan_work);
> }
> @@ -3945,7 +3947,14 @@ static void nvme_scan_work(struct work_struct *work)
> nvme_clear_changed_ns_log(ctrl);
> }
>
> - mutex_lock(&ctrl->scan_lock);
> + /*
> + * Failure to acquire scan_lock means another thread will requeue this
> + * work shortly
> + */
> + if (!mutex_trylock(&ctrl->scan_lock))
> + return;
> +
> + clear_bit(NVME_CTRL_NEEDS_RESCAN, &ctrl->flags);
> if (nvme_ctrl_limited_cns(ctrl)) {
> nvme_scan_ns_sequential(ctrl);
> } else {
> --
>
* Re: parallel nvme format
From: Keith Busch @ 2023-12-14 23:37 UTC (permalink / raw)
To: Kamaljit Singh
Cc: Niklas Cassel, linux-nvme@lists.infradead.org,
Shinichiro Kawasaki, Damien Le Moal
On Thu, Dec 14, 2023 at 11:27:57PM +0000, Kamaljit Singh wrote:
> Hi Keith,
>
> Good suggestions, but we may need another change to check for
> NVME_CTRL_NEEDS_RESCAN and kick off nvme_scan_work later, to follow up on
> the scan that may have been skipped.
Another check where? These are the only two paths that hold scan_lock,
so if the initial scan was aborted because the other path was holding
the lock, that path has to see that the needs rescan flag was set and
requeue it.
I guess there could be some user-initiated scan or an AER-triggered scan
in a multipath+multihost setup that was aborted because of an admin
command without ever explicitly setting the flag bit, so we could set it
on trylock failure to make doubly sure it gets re-kicked after the
scan_lock is released:
if (!mutex_trylock(&ctrl->scan_lock)) {
set_bit(NVME_CTRL_NEEDS_RESCAN, &ctrl->flags);
return;
}
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -1120,6 +1120,8 @@ u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
> > nvme_start_freeze(ctrl);
> > nvme_wait_freeze(ctrl);
> > }
> > + if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
> > + set_bit(NVME_CTRL_NEEDS_RESCAN, &ctrl->flags);
> > return effects;
> > }
> > EXPORT_SYMBOL_NS_GPL(nvme_passthru_start, NVME_TARGET_PASSTHRU);
> > @@ -1140,7 +1142,7 @@ void nvme_passthru_end(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u32 effects,
> > "controller capabilities changed, reset may be required to take effect.\n");
> > }
> > }
> > - if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)) {
> > + if (test_bit(NVME_CTRL_NEEDS_RESCAN, &ctrl->flags)) {
> > nvme_queue_scan(ctrl);
> > flush_work(&ctrl->scan_work);
> > }
> > @@ -3945,7 +3947,14 @@ static void nvme_scan_work(struct work_struct *work)
> > nvme_clear_changed_ns_log(ctrl);
> > }
> >
> > - mutex_lock(&ctrl->scan_lock);
> > + /*
> > + * Failure to acquire scan_lock means another thread will requeue this
> > + * work shortly
> > + */
> > + if (!mutex_trylock(&ctrl->scan_lock))
> > + return;
> > +
> > + clear_bit(NVME_CTRL_NEEDS_RESCAN, &ctrl->flags);
> > if (nvme_ctrl_limited_cns(ctrl)) {
> > nvme_scan_ns_sequential(ctrl);
> > } else {
> > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> > index 39a90b7cb1254..79a71b6ae64fe 100644
> > --- a/drivers/nvme/host/nvme.h
> > +++ b/drivers/nvme/host/nvme.h
> > @@ -251,6 +251,7 @@ enum nvme_ctrl_flags {
> > NVME_CTRL_STOPPED = 3,
> > NVME_CTRL_SKIP_ID_CNS_CS = 4,
> > NVME_CTRL_DIRTY_CAPABILITY = 5,
> > + NVME_CTRL_NEEDS_RESCAN = 6,
> > };
> >
> > struct nvme_ctrl {
> > --
> >