* Re: [PATCH] nvme: fix namespace removal list
2024-06-11 16:39 ` Christoph Hellwig
@ 2024-06-11 17:10 ` Nilay Shroff
2024-06-11 17:23 ` Paul E. McKenney
2024-06-11 17:22 ` Keith Busch
2024-06-11 17:22 ` Paul E. McKenney
2 siblings, 1 reply; 9+ messages in thread
From: Nilay Shroff @ 2024-06-11 17:10 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch
Cc: linux-nvme, sagi, Keith Busch, Venkat Rao Bagalkote,
Paul E. McKenney, rcu
On 6/11/24 22:09, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 08:20:55AM -0700, Keith Busch wrote:
>> mutex_lock(&ctrl->namespaces_lock);
>> list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
>> - if (ns->head->ns_id > nsid)
>> - list_splice_init_rcu(&ns->list, &rm_list,
>> - synchronize_rcu);
>> + if (ns->head->ns_id > nsid) {
>> + list_del_rcu(&ns->list);
>> + list_add_tail_rcu(&ns->list, &rm_list);
>> + }
>
> Is this actually valid for a (S)RCU protected list? If the entry gets
> added to the new list before the grace period has completed, we could
> trick a concurrent traversal into following the new list unless I'm
> mistaken (although chances I'm mistaken on RCU corner cases aren't that
> low..).
>
>
I think we need synchronize_srcu() before we add deleted ns to rm_list.
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] nvme: fix namespace removal list
2024-06-11 17:10 ` Nilay Shroff
@ 2024-06-11 17:23 ` Paul E. McKenney
0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2024-06-11 17:23 UTC (permalink / raw)
To: Nilay Shroff
Cc: Christoph Hellwig, Keith Busch, linux-nvme, sagi, Keith Busch,
Venkat Rao Bagalkote, rcu
On Tue, Jun 11, 2024 at 10:40:13PM +0530, Nilay Shroff wrote:
>
>
> On 6/11/24 22:09, Christoph Hellwig wrote:
> > On Tue, Jun 11, 2024 at 08:20:55AM -0700, Keith Busch wrote:
> >> mutex_lock(&ctrl->namespaces_lock);
> >> list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> >> - if (ns->head->ns_id > nsid)
> >> - list_splice_init_rcu(&ns->list, &rm_list,
> >> - synchronize_rcu);
> >> + if (ns->head->ns_id > nsid) {
> >> + list_del_rcu(&ns->list);
> >> + list_add_tail_rcu(&ns->list, &rm_list);
> >> + }
> >
> > Is this actually valid for a (S)RCU protected list? If the entry gets
> > added to the new list before the grace period has completed, we could
> > trick a concurrent traversal into following the new list unless I'm
> > mistaken (although chances I'm mistaken on RCU corner cases aren't that
> > low..).
> >
> >
> I think we need synchronize_srcu() before we add deleted ns to rm_list.
After we delete it, but yes.
There will also be a window during which the element is not in either
list, but given the name "rm_list", I am guessing that this is OK. ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: fix namespace removal list
2024-06-11 16:39 ` Christoph Hellwig
2024-06-11 17:10 ` Nilay Shroff
@ 2024-06-11 17:22 ` Keith Busch
2024-06-11 17:24 ` Christoph Hellwig
2024-06-11 17:22 ` Paul E. McKenney
2 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2024-06-11 17:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, linux-nvme, sagi, Venkat Rao Bagalkote,
Paul E. McKenney, rcu
On Tue, Jun 11, 2024 at 06:39:55PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 08:20:55AM -0700, Keith Busch wrote:
> > mutex_lock(&ctrl->namespaces_lock);
> > list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> > - if (ns->head->ns_id > nsid)
> > - list_splice_init_rcu(&ns->list, &rm_list,
> > - synchronize_rcu);
> > + if (ns->head->ns_id > nsid) {
> > + list_del_rcu(&ns->list);
> > + list_add_tail_rcu(&ns->list, &rm_list);
> > + }
>
> Is this actually valid for a (S)RCU protected list? If the entry gets
> added to the new list before the grace period has completed, we could
> trick a concurrent traversal into following the new list unless I'm
> mistaken (although chances I'm mistaken on RCU corner cases aren't that
> low..).
Good call, you are absolutely right that a sync should happen between
the del and the add for readers to consistently iterate this list.
I might be able to weasel out of this though: our namespace list is
sorted, and this function wants to append everything from this element
all the way to the end to the "rm_list": a reader should get the same
result either way, whether if it was torn at this element or the move
happened without the reader seeing it.
Now if there was a way to rcu split the list so we don't have to do it
element by element...
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] nvme: fix namespace removal list
2024-06-11 17:22 ` Keith Busch
@ 2024-06-11 17:24 ` Christoph Hellwig
2024-06-11 17:43 ` Paul E. McKenney
2024-06-11 17:47 ` Keith Busch
0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-06-11 17:24 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Keith Busch, linux-nvme, sagi,
Venkat Rao Bagalkote, Paul E. McKenney, rcu
On Tue, Jun 11, 2024 at 11:22:04AM -0600, Keith Busch wrote:
> > Is this actually valid for a (S)RCU protected list? If the entry gets
> > added to the new list before the grace period has completed, we could
> > trick a concurrent traversal into following the new list unless I'm
> > mistaken (although chances I'm mistaken on RCU corner cases aren't that
> > low..).
>
> Good call, you are absolutely right that a sync should happen between
> the del and the add for readers to consistently iterate this list.
>
> I might be able to weasel out of this though: our namespace list is
> sorted, and this function wants to append everything from this element
> all the way to the end to the "rm_list": a reader should get the same
> result either way, whether if it was torn at this element or the move
> happened without the reader seeing it.
>
> Now if there was a way to rcu split the list so we don't have to do it
> element by element...
Isn't that exactly what the list_splice helpers do? We'd just need to
do that exactly once at the cutoff point and not once per namespace.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: fix namespace removal list
2024-06-11 17:24 ` Christoph Hellwig
@ 2024-06-11 17:43 ` Paul E. McKenney
2024-06-11 17:47 ` Keith Busch
1 sibling, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2024-06-11 17:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Keith Busch, linux-nvme, sagi, Venkat Rao Bagalkote,
rcu
On Tue, Jun 11, 2024 at 07:24:48PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 11:22:04AM -0600, Keith Busch wrote:
> > > Is this actually valid for a (S)RCU protected list? If the entry gets
> > > added to the new list before the grace period has completed, we could
> > > trick a concurrent traversal into following the new list unless I'm
> > > mistaken (although chances I'm mistaken on RCU corner cases aren't that
> > > low..).
> >
> > Good call, you are absolutely right that a sync should happen between
> > the del and the add for readers to consistently iterate this list.
> >
> > I might be able to weasel out of this though: our namespace list is
> > sorted, and this function wants to append everything from this element
> > all the way to the end to the "rm_list": a reader should get the same
> > result either way, whether if it was torn at this element or the move
> > happened without the reader seeing it.
> >
> > Now if there was a way to rcu split the list so we don't have to do it
> > element by element...
>
> Isn't that exactly what the list_splice helpers do? We'd just need to
> do that exactly once at the cutoff point and not once per namespace.
Almost. They move the whole list and Keith wants just part of it moved.
It should not be hard to create such a thing, and Keith said (offlist)
that he would like to give it a try.
Thanx, Paul
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: fix namespace removal list
2024-06-11 17:24 ` Christoph Hellwig
2024-06-11 17:43 ` Paul E. McKenney
@ 2024-06-11 17:47 ` Keith Busch
1 sibling, 0 replies; 9+ messages in thread
From: Keith Busch @ 2024-06-11 17:47 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, linux-nvme, sagi, Venkat Rao Bagalkote,
Paul E. McKenney, rcu
On Tue, Jun 11, 2024 at 07:24:48PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 11:22:04AM -0600, Keith Busch wrote:
> > > Is this actually valid for a (S)RCU protected list? If the entry gets
> > > added to the new list before the grace period has completed, we could
> > > trick a concurrent traversal into following the new list unless I'm
> > > mistaken (although chances I'm mistaken on RCU corner cases aren't that
> > > low..).
> >
> > Good call, you are absolutely right that a sync should happen between
> > the del and the add for readers to consistently iterate this list.
> >
> > I might be able to weasel out of this though: our namespace list is
> > sorted, and this function wants to append everything from this element
> > all the way to the end to the "rm_list": a reader should get the same
> > result either way, whether if it was torn at this element or the move
> > happened without the reader seeing it.
> >
> > Now if there was a way to rcu split the list so we don't have to do it
> > element by element...
>
> Isn't that exactly what the list_splice helpers do? We'd just need to
> do that exactly once at the cutoff point and not once per namespace.
Not a splice since that takes the entire source list; this usage is more
aligned to "list_cut_position()", which doesn't look rcu safe as-is, but
I think we can make one!
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: fix namespace removal list
2024-06-11 16:39 ` Christoph Hellwig
2024-06-11 17:10 ` Nilay Shroff
2024-06-11 17:22 ` Keith Busch
@ 2024-06-11 17:22 ` Paul E. McKenney
2 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2024-06-11 17:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, linux-nvme, sagi, Keith Busch, Venkat Rao Bagalkote,
rcu
On Tue, Jun 11, 2024 at 06:39:55PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 08:20:55AM -0700, Keith Busch wrote:
> > mutex_lock(&ctrl->namespaces_lock);
> > list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> > - if (ns->head->ns_id > nsid)
> > - list_splice_init_rcu(&ns->list, &rm_list,
> > - synchronize_rcu);
> > + if (ns->head->ns_id > nsid) {
> > + list_del_rcu(&ns->list);
> > + list_add_tail_rcu(&ns->list, &rm_list);
> > + }
>
> Is this actually valid for a (S)RCU protected list? If the entry gets
> added to the new list before the grace period has completed, we could
> trick a concurrent traversal into following the new list unless I'm
> mistaken (although chances I'm mistaken on RCU corner cases aren't that
> low..).
Just to confirm, yes, if an RCU reader was referencing the object being
deleted from the first list, that reader really could be transported
(free of charge!) to the second list.
One (slow) way to avoid this involuntary transportation is to do something
like this:
list_del_rcu(&ns->list);
synchronize_rcu(); // drain all readers from ns.
list_add_tail_rcu(&ns->list, &rm_list);
You could speed things up using synchronize_rcu_expedited, with the
usual downsides.
There are faster ways of getting this effect, but the ones that I
know of involve putting generation numbers on all objects, duplicating
objects on each change, and having readers chase pointers to find the
desired version. Usually not worth it, but I don't know enough about
this code to judge.
Thanx, Paul
^ permalink raw reply [flat|nested] 9+ messages in thread