All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme: fix namespace removal list
@ 2024-06-11 15:20 Keith Busch
  2024-06-11 16:39 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2024-06-11 15:20 UTC (permalink / raw)
  To: linux-nvme, hch, sagi; +Cc: Keith Busch, Venkat Rao Bagalkote

From: Keith Busch <kbusch@kernel.org>

This function wants to move a list element to another list, so do that
instead of using the wrong list splice API.

Fixes: be647e2c76b27f4 ("nvme: use srcu for iterating namespace list")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f5d150c62955d..3db8e94c85026 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 
 	mutex_lock(&ctrl->namespaces_lock);
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
-		if (ns->head->ns_id > nsid)
-			list_splice_init_rcu(&ns->list, &rm_list,
-					     synchronize_rcu);
+		if (ns->head->ns_id > nsid) {
+			list_del_rcu(&ns->list);
+			list_add_tail_rcu(&ns->list, &rm_list);
+		}
 	}
 	mutex_unlock(&ctrl->namespaces_lock);
 	synchronize_srcu(&ctrl->srcu);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: fix namespace removal list
  2024-06-11 15:20 [PATCH] nvme: fix namespace removal list Keith Busch
@ 2024-06-11 16:39 ` Christoph Hellwig
  2024-06-11 17:10   ` Nilay Shroff
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-06-11 16:39 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, hch, sagi, Keith Busch, Venkat Rao Bagalkote,
	Paul E. McKenney, rcu

On Tue, Jun 11, 2024 at 08:20:55AM -0700, Keith Busch wrote:
>  	mutex_lock(&ctrl->namespaces_lock);
>  	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> -		if (ns->head->ns_id > nsid)
> -			list_splice_init_rcu(&ns->list, &rm_list,
> -					     synchronize_rcu);
> +		if (ns->head->ns_id > nsid) {
> +			list_del_rcu(&ns->list);
> +			list_add_tail_rcu(&ns->list, &rm_list);
> +		}

Is this actually valid for a (S)RCU protected list?  If the entry gets
added to the new list before the grace period has completed, we could
trick a concurrent traversal into following the new list unless I'm
mistaken (although chances I'm mistaken on RCU corner cases aren't that
low..).



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: fix namespace removal list
  2024-06-11 16:39 ` Christoph Hellwig
@ 2024-06-11 17:10   ` Nilay Shroff
  2024-06-11 17:23     ` Paul E. McKenney
  2024-06-11 17:22   ` Keith Busch
  2024-06-11 17:22   ` Paul E. McKenney
  2 siblings, 1 reply; 9+ messages in thread
From: Nilay Shroff @ 2024-06-11 17:10 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: linux-nvme, sagi, Keith Busch, Venkat Rao Bagalkote,
	Paul E. McKenney, rcu



On 6/11/24 22:09, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 08:20:55AM -0700, Keith Busch wrote:
>>  	mutex_lock(&ctrl->namespaces_lock);
>>  	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
>> -		if (ns->head->ns_id > nsid)
>> -			list_splice_init_rcu(&ns->list, &rm_list,
>> -					     synchronize_rcu);
>> +		if (ns->head->ns_id > nsid) {
>> +			list_del_rcu(&ns->list);
>> +			list_add_tail_rcu(&ns->list, &rm_list);
>> +		}
> 
> Is this actually valid for a (S)RCU protected list?  If the entry gets
> added to the new list before the grace period has completed, we could
> trick a concurrent traversal into following the new list unless I'm
> mistaken (although chances I'm mistaken on RCU corner cases aren't that
> low..).
> 
> 
I think we need synchronize_srcu() before we add deleted ns to rm_list.

Thanks,
--Nilay


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: fix namespace removal list
  2024-06-11 16:39 ` Christoph Hellwig
  2024-06-11 17:10   ` Nilay Shroff
@ 2024-06-11 17:22   ` Keith Busch
  2024-06-11 17:24     ` Christoph Hellwig
  2024-06-11 17:22   ` Paul E. McKenney
  2 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2024-06-11 17:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme, sagi, Venkat Rao Bagalkote,
	Paul E. McKenney, rcu

On Tue, Jun 11, 2024 at 06:39:55PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 08:20:55AM -0700, Keith Busch wrote:
> >  	mutex_lock(&ctrl->namespaces_lock);
> >  	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> > -		if (ns->head->ns_id > nsid)
> > -			list_splice_init_rcu(&ns->list, &rm_list,
> > -					     synchronize_rcu);
> > +		if (ns->head->ns_id > nsid) {
> > +			list_del_rcu(&ns->list);
> > +			list_add_tail_rcu(&ns->list, &rm_list);
> > +		}
> 
> Is this actually valid for a (S)RCU protected list?  If the entry gets
> added to the new list before the grace period has completed, we could
> trick a concurrent traversal into following the new list unless I'm
> mistaken (although chances I'm mistaken on RCU corner cases aren't that
> low..).

Good call, you are absolutely right that a sync should happen between
the del and the add for readers to consistently iterate this list.

I might be able to weasel out of this though: our namespace list is
sorted, and this function wants to append everything from this element
all the way to the end to the "rm_list": a reader should get the same
result either way, whether if it was torn at this element or the move
happened without the reader seeing it.

Now if there was a way to rcu split the list so we don't have to do it
element by element...


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: fix namespace removal list
  2024-06-11 16:39 ` Christoph Hellwig
  2024-06-11 17:10   ` Nilay Shroff
  2024-06-11 17:22   ` Keith Busch
@ 2024-06-11 17:22   ` Paul E. McKenney
  2 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2024-06-11 17:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme, sagi, Keith Busch, Venkat Rao Bagalkote,
	rcu

On Tue, Jun 11, 2024 at 06:39:55PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 08:20:55AM -0700, Keith Busch wrote:
> >  	mutex_lock(&ctrl->namespaces_lock);
> >  	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> > -		if (ns->head->ns_id > nsid)
> > -			list_splice_init_rcu(&ns->list, &rm_list,
> > -					     synchronize_rcu);
> > +		if (ns->head->ns_id > nsid) {
> > +			list_del_rcu(&ns->list);
> > +			list_add_tail_rcu(&ns->list, &rm_list);
> > +		}
> 
> Is this actually valid for a (S)RCU protected list?  If the entry gets
> added to the new list before the grace period has completed, we could
> trick a concurrent traversal into following the new list unless I'm
> mistaken (although chances I'm mistaken on RCU corner cases aren't that
> low..).

Just to confirm, yes, if an RCU reader was referencing the object being
deleted from the first list, that reader really could be transported
(free of charge!) to the second list.

One (slow) way to avoid this involuntary transportation is to do something
like this:

		list_del_rcu(&ns->list);
		synchronize_rcu(); // drain all readers from ns.
		list_add_tail_rcu(&ns->list, &rm_list);

You could speed things up using synchronize_rcu_expedited, with the
usual downsides.

There are faster ways of getting this effect, but the ones that I
know of involve putting generation numbers on all objects, duplicating
objects on each change, and having readers chase pointers to find the
desired version.  Usually not worth it, but I don't know enough about
this code to judge.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: fix namespace removal list
  2024-06-11 17:10   ` Nilay Shroff
@ 2024-06-11 17:23     ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2024-06-11 17:23 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, sagi, Keith Busch,
	Venkat Rao Bagalkote, rcu

On Tue, Jun 11, 2024 at 10:40:13PM +0530, Nilay Shroff wrote:
> 
> 
> On 6/11/24 22:09, Christoph Hellwig wrote:
> > On Tue, Jun 11, 2024 at 08:20:55AM -0700, Keith Busch wrote:
> >>  	mutex_lock(&ctrl->namespaces_lock);
> >>  	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> >> -		if (ns->head->ns_id > nsid)
> >> -			list_splice_init_rcu(&ns->list, &rm_list,
> >> -					     synchronize_rcu);
> >> +		if (ns->head->ns_id > nsid) {
> >> +			list_del_rcu(&ns->list);
> >> +			list_add_tail_rcu(&ns->list, &rm_list);
> >> +		}
> > 
> > Is this actually valid for a (S)RCU protected list?  If the entry gets
> > added to the new list before the grace period has completed, we could
> > trick a concurrent traversal into following the new list unless I'm
> > mistaken (although chances I'm mistaken on RCU corner cases aren't that
> > low..).
> > 
> > 
> I think we need synchronize_srcu() before we add deleted ns to rm_list.

After we delete it, but yes.

There will also be a window during which the element is not in either
list, but given the name "rm_list", I am guessing that this is OK.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: fix namespace removal list
  2024-06-11 17:22   ` Keith Busch
@ 2024-06-11 17:24     ` Christoph Hellwig
  2024-06-11 17:43       ` Paul E. McKenney
  2024-06-11 17:47       ` Keith Busch
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-06-11 17:24 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, sagi,
	Venkat Rao Bagalkote, Paul E. McKenney, rcu

On Tue, Jun 11, 2024 at 11:22:04AM -0600, Keith Busch wrote:
> > Is this actually valid for a (S)RCU protected list?  If the entry gets
> > added to the new list before the grace period has completed, we could
> > trick a concurrent traversal into following the new list unless I'm
> > mistaken (although chances I'm mistaken on RCU corner cases aren't that
> > low..).
> 
> Good call, you are absolutely right that a sync should happen between
> the del and the add for readers to consistently iterate this list.
> 
> I might be able to weasel out of this though: our namespace list is
> sorted, and this function wants to append everything from this element
> all the way to the end to the "rm_list": a reader should get the same
> result either way, whether if it was torn at this element or the move
> happened without the reader seeing it.
> 
> Now if there was a way to rcu split the list so we don't have to do it
> element by element...

Isn't that exactly what the list_splice helpers do?  We'd just need to
do that exactly once at the cutoff point and not once per namespace.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: fix namespace removal list
  2024-06-11 17:24     ` Christoph Hellwig
@ 2024-06-11 17:43       ` Paul E. McKenney
  2024-06-11 17:47       ` Keith Busch
  1 sibling, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2024-06-11 17:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Keith Busch, linux-nvme, sagi, Venkat Rao Bagalkote,
	rcu

On Tue, Jun 11, 2024 at 07:24:48PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 11:22:04AM -0600, Keith Busch wrote:
> > > Is this actually valid for a (S)RCU protected list?  If the entry gets
> > > added to the new list before the grace period has completed, we could
> > > trick a concurrent traversal into following the new list unless I'm
> > > mistaken (although chances I'm mistaken on RCU corner cases aren't that
> > > low..).
> > 
> > Good call, you are absolutely right that a sync should happen between
> > the del and the add for readers to consistently iterate this list.
> > 
> > I might be able to weasel out of this though: our namespace list is
> > sorted, and this function wants to append everything from this element
> > all the way to the end to the "rm_list": a reader should get the same
> > result either way, whether if it was torn at this element or the move
> > happened without the reader seeing it.
> > 
> > Now if there was a way to rcu split the list so we don't have to do it
> > element by element...
> 
> Isn't that exactly what the list_splice helpers do?  We'd just need to
> do that exactly once at the cutoff point and not once per namespace.

Almost.  They move the whole list and Keith wants just part of it moved.
It should not be hard to create such a thing, and Keith said (offlist)
that he would like to give it a try.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: fix namespace removal list
  2024-06-11 17:24     ` Christoph Hellwig
  2024-06-11 17:43       ` Paul E. McKenney
@ 2024-06-11 17:47       ` Keith Busch
  1 sibling, 0 replies; 9+ messages in thread
From: Keith Busch @ 2024-06-11 17:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme, sagi, Venkat Rao Bagalkote,
	Paul E. McKenney, rcu

On Tue, Jun 11, 2024 at 07:24:48PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 11:22:04AM -0600, Keith Busch wrote:
> > > Is this actually valid for a (S)RCU protected list?  If the entry gets
> > > added to the new list before the grace period has completed, we could
> > > trick a concurrent traversal into following the new list unless I'm
> > > mistaken (although chances I'm mistaken on RCU corner cases aren't that
> > > low..).
> > 
> > Good call, you are absolutely right that a sync should happen between
> > the del and the add for readers to consistently iterate this list.
> > 
> > I might be able to weasel out of this though: our namespace list is
> > sorted, and this function wants to append everything from this element
> > all the way to the end to the "rm_list": a reader should get the same
> > result either way, whether if it was torn at this element or the move
> > happened without the reader seeing it.
> > 
> > Now if there was a way to rcu split the list so we don't have to do it
> > element by element...
> 
> Isn't that exactly what the list_splice helpers do?  We'd just need to
> do that exactly once at the cutoff point and not once per namespace.

Not a splice since that takes the entire source list; this usage is more
aligned to "list_cut_position()", which doesn't look rcu safe as-is, but
I think we can make one!


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-06-11 17:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-11 15:20 [PATCH] nvme: fix namespace removal list Keith Busch
2024-06-11 16:39 ` Christoph Hellwig
2024-06-11 17:10   ` Nilay Shroff
2024-06-11 17:23     ` Paul E. McKenney
2024-06-11 17:22   ` Keith Busch
2024-06-11 17:24     ` Christoph Hellwig
2024-06-11 17:43       ` Paul E. McKenney
2024-06-11 17:47       ` Keith Busch
2024-06-11 17:22   ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.