public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable}
@ 2025-01-24  8:25 hare
  2025-01-24 10:50 ` Sagi Grimberg
  2025-01-24 19:59 ` Keith Busch
  0 siblings, 2 replies; 9+ messages in thread
From: hare @ 2025-01-24  8:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, Keith Busch, linux-nvme, Hannes Reinecke

From: Hannes Reinecke <hare@kernel.org>

The namespace percpu counter protects pending I/O, and we can
only safely diable the namespace once the counter drop to zero.
So we need to init the percpu counter in nvmet_ns_enable(), and
wait for it to drop to zero in nvmet_ns_disable() to avoid having
I/O pending after the namespace has been disabled.

Fixes: 74d16965d7ac ("nvmet-loop: avoid using mutex in IO hotpath")

Signed-off-by: Hannes Reinecke <hare@kernel.org>
---
 drivers/nvme/target/core.c | 44 +++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 5f7b5d1f78c0..cdf8e16c1c6d 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -611,6 +611,9 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 			goto out_dev_put;
 	}
 
+	if (percpu_ref_init(&ns->ref, nvmet_destroy_namespace, 0, GFP_KERNEL))
+		goto out_pr_exit;
+
 	nvmet_ns_changed(subsys, ns->nsid);
 	ns->enabled = true;
 	xa_set_mark(&subsys->namespaces, ns->nsid, NVMET_NS_ENABLED);
@@ -618,6 +621,9 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 out_unlock:
 	mutex_unlock(&subsys->lock);
 	return ret;
+out_pr_exit:
+	if (ns->pr.enable)
+		nvmet_pr_exit_ns(ns);
 out_dev_put:
 	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
 		pci_dev_put(radix_tree_delete(&ctrl->p2p_ns_map, ns->nsid));
@@ -643,6 +649,19 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 
 	mutex_unlock(&subsys->lock);
 
+	/*
+	 * Now that we removed the namespaces from the lookup list, we
+	 * can kill the per_cpu ref and wait for any remaining references
+	 * to be dropped, as well as a RCU grace period for anyone only
+	 * using the namepace under rcu_read_lock().  Note that we can't
+	 * use call_rcu here as we need to ensure the namespaces have
+	 * been fully destroyed before unloading the module.
+	 */
+	percpu_ref_kill(&ns->ref);
+	synchronize_rcu();
+	wait_for_completion(&ns->disable_done);
+	percpu_ref_exit(&ns->ref);
+
 	if (ns->pr.enable)
 		nvmet_pr_exit_ns(ns);
 
@@ -660,25 +679,7 @@ void nvmet_ns_free(struct nvmet_ns *ns)
 	nvmet_ns_disable(ns);
 
 	mutex_lock(&subsys->lock);
-
 	xa_erase(&subsys->namespaces, ns->nsid);
-
-	mutex_unlock(&subsys->lock);
-
-	/*
-	 * Now that we removed the namespaces from the lookup list, we
-	 * can kill the per_cpu ref and wait for any remaining references
-	 * to be dropped, as well as a RCU grace period for anyone only
-	 * using the namepace under rcu_read_lock().  Note that we can't
-	 * use call_rcu here as we need to ensure the namespaces have
-	 * been fully destroyed before unloading the module.
-	 */
-	percpu_ref_kill(&ns->ref);
-	synchronize_rcu();
-	wait_for_completion(&ns->disable_done);
-	percpu_ref_exit(&ns->ref);
-
-	mutex_lock(&subsys->lock);
 	subsys->nr_namespaces--;
 	mutex_unlock(&subsys->lock);
 
@@ -708,11 +709,8 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
 	ns->nsid = nsid;
 	ns->subsys = subsys;
 
-	if (percpu_ref_init(&ns->ref, nvmet_destroy_namespace, 0, GFP_KERNEL))
-		goto out_free;
-
 	if (xa_insert(&subsys->namespaces, ns->nsid, ns, GFP_KERNEL))
-		goto out_exit;
+		goto out_free;
 
 	subsys->nr_namespaces++;
 
@@ -728,8 +726,6 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
 	ns->csi = NVME_CSI_NVM;
 
 	return ns;
-out_exit:
-	percpu_ref_exit(&ns->ref);
 out_free:
 	kfree(ns);
 out_unlock:
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable}
  2025-01-24  8:25 [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable} hare
@ 2025-01-24 10:50 ` Sagi Grimberg
  2025-01-24 11:09   ` Hannes Reinecke
  2025-01-24 19:59 ` Keith Busch
  1 sibling, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2025-01-24 10:50 UTC (permalink / raw)
  To: hare, Christoph Hellwig; +Cc: Keith Busch, linux-nvme




On 24/01/2025 10:25, hare@kernel.org wrote:
> From: Hannes Reinecke <hare@kernel.org>
>
> The namespace percpu counter protects pending I/O, and we can
> only safely diable the namespace once the counter drop to zero.
> So we need to init the percpu counter in nvmet_ns_enable(), and
> wait for it to drop to zero in nvmet_ns_disable() to avoid having
> I/O pending after the namespace has been disabled.
>
> Fixes: 74d16965d7ac ("nvmet-loop: avoid using mutex in IO hotpath")

Can you please describe the bug and scenario which you are hitting this bug?
It is also unclear how the above patch is causing this.

 From quick look patch itself looks reasonable.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable}
  2025-01-24 10:50 ` Sagi Grimberg
@ 2025-01-24 11:09   ` Hannes Reinecke
  2025-01-24 14:06     ` Nilay Shroff
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Hannes Reinecke @ 2025-01-24 11:09 UTC (permalink / raw)
  To: Sagi Grimberg, hare, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Shin'ichiro Kawasaki

On 1/24/25 11:50, Sagi Grimberg wrote:
> 
> 
> 
> On 24/01/2025 10:25, hare@kernel.org wrote:
>> From: Hannes Reinecke <hare@kernel.org>
>>
>> The namespace percpu counter protects pending I/O, and we can
>> only safely diable the namespace once the counter drop to zero.
>> So we need to init the percpu counter in nvmet_ns_enable(), and
>> wait for it to drop to zero in nvmet_ns_disable() to avoid having
>> I/O pending after the namespace has been disabled.
>>
>> Fixes: 74d16965d7ac ("nvmet-loop: avoid using mutex in IO hotpath")
> 
> Can you please describe the bug and scenario which you are hitting this 
> bug?
> It is also unclear how the above patch is causing this.
> 
>  From quick look patch itself looks reasonable.

Should've said: this is the fix to the reported blktest failure
on nvme/058...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable}
  2025-01-24 11:09   ` Hannes Reinecke
@ 2025-01-24 14:06     ` Nilay Shroff
  2025-01-24 15:09       ` Hannes Reinecke
  2025-01-26  8:04     ` Sagi Grimberg
  2025-01-28  7:46     ` Christoph Hellwig
  2 siblings, 1 reply; 9+ messages in thread
From: Nilay Shroff @ 2025-01-24 14:06 UTC (permalink / raw)
  To: Hannes Reinecke, Sagi Grimberg, hare, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Shin'ichiro Kawasaki



On 1/24/25 4:39 PM, Hannes Reinecke wrote:
> On 1/24/25 11:50, Sagi Grimberg wrote:
>>
>>
>>
>> On 24/01/2025 10:25, hare@kernel.org wrote:
>>> From: Hannes Reinecke <hare@kernel.org>
>>>
>>> The namespace percpu counter protects pending I/O, and we can
>>> only safely diable the namespace once the counter drop to zero.
>>> So we need to init the percpu counter in nvmet_ns_enable(), and
>>> wait for it to drop to zero in nvmet_ns_disable() to avoid having
>>> I/O pending after the namespace has been disabled.
>>>
>>> Fixes: 74d16965d7ac ("nvmet-loop: avoid using mutex in IO hotpath")
>>
>> Can you please describe the bug and scenario which you are hitting this bug?
>> It is also unclear how the above patch is causing this.
>>
>>  From quick look patch itself looks reasonable.
> 
> Should've said: this is the fix to the reported blktest failure
> on nvme/058...
> 
Thanks Hannes for addressing this issue!

I think while implementing 74d16965d7ac ("nvmet-loop: avoid using mutex in 
IO hotpath"), I moved initialization of ns->ref from nvmet_ns_enable() to 
nvmet_ns_alloc() as well moved killing of ns->ref logic from nvmet_ns_disable() 
to nvmet_ns_free() and that probably caused the observed symptom. I thought we 
need to init ns->ref while we insert ns in Xarray and kill ns->ref while we erase
it from Xarray, but that was not correct. My bad!

Though I was never able to recreate this issue on my setup even after running
nvme/058 hundreds of times, still I think the issue reported by Shinichiro 
appears plausible. So we need this fix and your changes look good to me.

Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable}
  2025-01-24 14:06     ` Nilay Shroff
@ 2025-01-24 15:09       ` Hannes Reinecke
  2025-01-24 15:20         ` Nilay Shroff
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Reinecke @ 2025-01-24 15:09 UTC (permalink / raw)
  To: Nilay Shroff, Sagi Grimberg, hare, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Shin'ichiro Kawasaki

On 1/24/25 15:06, Nilay Shroff wrote:
> 
> 
[ .. ]>
> Though I was never able to recreate this issue on my setup even after running
> nvme/058 hundreds of times, still I think the issue reported by Shinichiro
> appears plausible. So we need this fix and your changes look good to me.
> 
> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>

Yeah, we definitely need this. I was able to trigger it quite 
consistently with qemu after just a few retries.

And I guess we can now kill the ->enable setting completely, and
just use the xarray mark instead.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable}
  2025-01-24 15:09       ` Hannes Reinecke
@ 2025-01-24 15:20         ` Nilay Shroff
  0 siblings, 0 replies; 9+ messages in thread
From: Nilay Shroff @ 2025-01-24 15:20 UTC (permalink / raw)
  To: Hannes Reinecke, Sagi Grimberg, hare, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Shin'ichiro Kawasaki



On 1/24/25 8:39 PM, Hannes Reinecke wrote:
> On 1/24/25 15:06, Nilay Shroff wrote:
>>
>>
> [ .. ]>
>> Though I was never able to recreate this issue on my setup even after running
>> nvme/058 hundreds of times, still I think the issue reported by Shinichiro
>> appears plausible. So we need this fix and your changes look good to me.
>>
>> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
> 
> Yeah, we definitely need this. I was able to trigger it quite consistently with qemu after just a few retries.
>
I was using PPC machine for testing, on which I couldn't recreate, but I would try using qemu.
I saw that Shinichiro also reported it on qemu.

Thanks,
--Nilay


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable}
  2025-01-24  8:25 [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable} hare
  2025-01-24 10:50 ` Sagi Grimberg
@ 2025-01-24 19:59 ` Keith Busch
  1 sibling, 0 replies; 9+ messages in thread
From: Keith Busch @ 2025-01-24 19:59 UTC (permalink / raw)
  To: hare; +Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme

On Fri, Jan 24, 2025 at 09:25:05AM +0100, hare@kernel.org wrote:
> From: Hannes Reinecke <hare@kernel.org>
> 
> The namespace percpu counter protects pending I/O, and we can
> only safely diable the namespace once the counter drop to zero.
> So we need to init the percpu counter in nvmet_ns_enable(), and
> wait for it to drop to zero in nvmet_ns_disable() to avoid having
> I/O pending after the namespace has been disabled.
> 
> Fixes: 74d16965d7ac ("nvmet-loop: avoid using mutex in IO hotpath")

We'll have to wait for after the merge window when the block tree merges
up to apply this: the nvme-6.14 branch doesn't contain the offending
commit yet.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable}
  2025-01-24 11:09   ` Hannes Reinecke
  2025-01-24 14:06     ` Nilay Shroff
@ 2025-01-26  8:04     ` Sagi Grimberg
  2025-01-28  7:46     ` Christoph Hellwig
  2 siblings, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2025-01-26  8:04 UTC (permalink / raw)
  To: Hannes Reinecke, hare, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Shin'ichiro Kawasaki


>>
>> On 24/01/2025 10:25, hare@kernel.org wrote:
>>> From: Hannes Reinecke <hare@kernel.org>
>>>
>>> The namespace percpu counter protects pending I/O, and we can
>>> only safely diable the namespace once the counter drop to zero.
>>> So we need to init the percpu counter in nvmet_ns_enable(), and
>>> wait for it to drop to zero in nvmet_ns_disable() to avoid having
>>> I/O pending after the namespace has been disabled.
>>>
>>> Fixes: 74d16965d7ac ("nvmet-loop: avoid using mutex in IO hotpath")
>>
>> Can you please describe the bug and scenario which you are hitting 
>> this bug?
>> It is also unclear how the above patch is causing this.
>>
>>  From quick look patch itself looks reasonable.
>
> Should've said: this is the fix to the reported blktest failure
> on nvme/058...

OK, then please resend this patch that explains the exact issues that it 
solves.
Also, please rephrase your patch title to describe that this is a bug 
fix. i.e.
"nvmet: fix ...."


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable}
  2025-01-24 11:09   ` Hannes Reinecke
  2025-01-24 14:06     ` Nilay Shroff
  2025-01-26  8:04     ` Sagi Grimberg
@ 2025-01-28  7:46     ` Christoph Hellwig
  2 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2025-01-28  7:46 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Sagi Grimberg, hare, Christoph Hellwig, Keith Busch, linux-nvme,
	Shin'ichiro Kawasaki

On Fri, Jan 24, 2025 at 12:09:31PM +0100, Hannes Reinecke wrote:
>> It is also unclear how the above patch is causing this.
>>
>>  From quick look patch itself looks reasonable.
>
> Should've said: this is the fix to the reported blktest failure
> on nvme/058...

Can you please resend with that included?



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-01-28  7:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-24  8:25 [PATCH] nvmet: move percpu handling into nvmet_ns_{enable,disable} hare
2025-01-24 10:50 ` Sagi Grimberg
2025-01-24 11:09   ` Hannes Reinecke
2025-01-24 14:06     ` Nilay Shroff
2025-01-24 15:09       ` Hannes Reinecke
2025-01-24 15:20         ` Nilay Shroff
2025-01-26  8:04     ` Sagi Grimberg
2025-01-28  7:46     ` Christoph Hellwig
2025-01-24 19:59 ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox