* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
2023-11-01 20:23 ` Oleg Nesterov
@ 2023-11-01 20:40 ` Oleg Nesterov
2023-11-01 21:22 ` David Howells
2023-11-01 20:52 ` Al Viro
2023-11-01 21:20 ` David Howells
2 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 20:40 UTC (permalink / raw)
To: David Howells
Cc: Marc Dionne, Alexander Viro, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
linux-kernel
In case I was not clear, I am not saying this code is buggy.
Just none of read_seqbegin_or_lock/need_seqretry/done_seqretry
helpers make any sense in this code. It can use read_seqbegin/
read_seqretry and this won't change the current behaviour.
On 11/01, Oleg Nesterov wrote:
>
> On 11/01, David Howells wrote:
> >
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> > > after the lockless access failed.
> >
> > I think you're wrong.
>
> I think you missed the point ;)
>
> > write_seqlock() turns it odd.
>
> It changes seqcount_t->sequence but not "seq" so this doesn't matter.
>
> > For instance, if the read lock is taken first:
> >
> > sequence seq CPU 1 CPU 2
> > ======= ======= =============================== ===============
> > 0
> > 0 0 seq = 0 MUST BE EVEN
>
> This is correct,
>
> > ACCORDING TO DOC
>
> documentation is wrong, please see
>
> [PATCH 1/2] seqlock: fix the wrong read_seqbegin_or_lock/need_seqretry documentation
> https://lore.kernel.org/all/20231024120808.GA15382@redhat.com/
>
> > 0 0 read_seqbegin_or_lock() [lockless]
> > ...
> > 1 0 write_seqlock()
> > 1 0 need_seqretry() [seq=even; sequence!=seq: retry]
>
> Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,
>
> > 1 1 read_seqbegin_or_lock() [exclusive]
>
> No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
> it will do
>
> seq = read_seqbegin(lock);
>
> again.
>
> > Note that it spins in __read_seqcount_begin() until we get an even seq,
> > indicating that no write is currently in progress - at which point we can
> > perform a lockless pass.
>
> Exactly. And this means that "seq" is always even.
>
> > > See thread_group_cputime() as an example, note that it does nextseq = 1 for
> > > the 2nd round.
> >
> > That's not especially convincing.
>
> See also the usage of read_seqbegin_or_lock() in fs/dcache.c and fs/d_path.c.
> All other users are wrong.
>
> Lets start from the very beginning. This code does
>
> int seq = 0;
> do {
> read_seqbegin_or_lock(service_conn_lock, &seq);
>
> do_something();
>
> } while (need_seqretry(service_conn_lock, seq));
>
> done_seqretry(service_conn_lock, seq);
>
> Initially seq is even (it is zero), so read_seqbegin_or_lock(&seq) does
>
> *seq = read_seqbegin(lock);
>
> and returns. Note that "seq" is still even.
>
> Now. If need_seqretry(seq) detects the race with write_seqlock() it returns
> true but it does NOT change this "seq", it is still even. So on the next
> iteration read_seqbegin_or_lock() will do
>
> *seq = read_seqbegin(lock);
>
> again, it won't take this lock for writing. And again, seq will be even.
> And so on.
>
> And this means that the code above is equivalent to
>
> do {
> seq = read_seqbegin(service_conn_lock);
>
> do_something();
>
> } while (read_seqretry(service_conn_lock, seq));
>
> and this is what this patch does.
>
> Yes this is confusing. Again, even the documentation is wrong! That is why
> I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
> to change the semantics of need_seqretry() to enforce the locking on the 2nd
> pass.
>
> Oleg.
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
2023-11-01 20:40 ` Oleg Nesterov
@ 2023-11-01 21:22 ` David Howells
2023-11-01 22:38 ` Oleg Nesterov
0 siblings, 1 reply; 19+ messages in thread
From: David Howells @ 2023-11-01 21:22 UTC (permalink / raw)
To: Oleg Nesterov
Cc: dhowells, Marc Dionne, Alexander Viro, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs,
netdev, linux-kernel
Oleg Nesterov <oleg@redhat.com> wrote:
> Just none of read_seqbegin_or_lock/need_seqretry/done_seqretry
> helpers make any sense in this code.
I disagree. I think in at least a couple of cases I do want a locked second
path - ideally locked shared if seqlock can be made to use an rwlock instead
of a spinlock.
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
2023-11-01 21:22 ` David Howells
@ 2023-11-01 22:38 ` Oleg Nesterov
0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 22:38 UTC (permalink / raw)
To: David Howells
Cc: Marc Dionne, Alexander Viro, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
linux-kernel
On 11/01, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Just none of read_seqbegin_or_lock/need_seqretry/done_seqretry
> > helpers make any sense in this code.
>
> I disagree. I think in at least a couple of cases I do want a locked second
> path
Sorry for confusion. I never said that the 2nd locked pass makes no sense.
My only point is that rxrpc_find_service_conn_rcu() (and more) use
read_seqbegin_or_lock() incorrectly. They can use read_seqbegin() and this
won't change the current behaviour.
So lets change these users first? Then we can discuss the possible changes
in include/linux/seqlock.h and (perhaps) update the users which actually
want the locking on the 2nd pass.
Oleg.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
2023-11-01 20:23 ` Oleg Nesterov
2023-11-01 20:40 ` Oleg Nesterov
@ 2023-11-01 20:52 ` Al Viro
2023-11-01 21:52 ` Oleg Nesterov
2023-11-01 21:20 ` David Howells
2 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2023-11-01 20:52 UTC (permalink / raw)
To: Oleg Nesterov
Cc: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
linux-kernel
On Wed, Nov 01, 2023 at 09:23:03PM +0100, Oleg Nesterov wrote:
> Yes this is confusing. Again, even the documentation is wrong! That is why
> I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
> to change the semantics of need_seqretry() to enforce the locking on the 2nd
> pass.
What for? Sure, documentation needs to be fixed, but *not* in direction you
suggested in that patch.
Why would you want to force that "switch to locked on the second pass" policy
on every possible caller?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
2023-11-01 20:52 ` Al Viro
@ 2023-11-01 21:52 ` Oleg Nesterov
2023-11-01 22:48 ` Al Viro
0 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 21:52 UTC (permalink / raw)
To: Al Viro
Cc: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
linux-kernel
On 11/01, Al Viro wrote:
>
> On Wed, Nov 01, 2023 at 09:23:03PM +0100, Oleg Nesterov wrote:
>
> > Yes this is confusing. Again, even the documentation is wrong! That is why
> > I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
> > to change the semantics of need_seqretry() to enforce the locking on the 2nd
> > pass.
>
> What for? Sure, documentation needs to be fixed,
So do you agree that the current usage of read_seqbegin_or_lock() in
rxrpc_find_service_conn_rcu() is misleading ? Do you agree it can use
read_seqbegin/read_seqretry without changing the current behaviour?
> but *not* in direction you
> suggested in that patch.
Hmm. then how do you think the doc should be changed? To describe the
current behaviour.
> Why would you want to force that "switch to locked on the second pass" policy
> on every possible caller?
Because this is what (I think) read_seqbegin_or_lock() is supposed to do.
It should take the lock for writing if the lockless access failed. At least
according to the documentation.
This needs another discussion and perhaps this makes no sense. But I'd
like to turn need_seqretry(seq) into something like
static inline int need_seqretry(seqlock_t *lock, int *seq)
{
int ret = !(*seq & 1) && read_seqretry(lock, *seq);
if (ret)
*seq = 1; /* make this counter odd */
return ret;
}
and update the users which actually want read_seqlock_excl() on the 2nd pass.
thread_group_cputime(), fs/d_path.c and fs/dcache.c.
For example, __dentry_path()
--- a/fs/d_path.c
+++ b/fs/d_path.c
@@ -349,10 +349,9 @@ static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p)
}
if (!(seq & 1))
rcu_read_unlock();
- if (need_seqretry(&rename_lock, seq)) {
- seq = 1;
+ if (need_seqretry(&rename_lock, &seq))
goto restart;
- }
+
done_seqretry(&rename_lock, seq);
if (b.len == p->len)
prepend_char(&b, '/');
but again, this need another discussion.
Oleg.
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
2023-11-01 21:52 ` Oleg Nesterov
@ 2023-11-01 22:48 ` Al Viro
2023-11-01 23:17 ` Oleg Nesterov
0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2023-11-01 22:48 UTC (permalink / raw)
To: Oleg Nesterov
Cc: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
linux-kernel
On Wed, Nov 01, 2023 at 10:52:15PM +0100, Oleg Nesterov wrote:
> > Why would you want to force that "switch to locked on the second pass" policy
> > on every possible caller?
>
> Because this is what (I think) read_seqbegin_or_lock() is supposed to do.
> It should take the lock for writing if the lockless access failed. At least
> according to the documentation.
Not really - it's literally seqbegin or lock, depending upon what the caller
tells it... IMO the mistake in docs is the insistence on using do-while
loop for its users.
Take a look at d_walk() and try to shoehorn that into your variant. Especially
the D_WALK_NORETRY handling...
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
2023-11-01 22:48 ` Al Viro
@ 2023-11-01 23:17 ` Oleg Nesterov
0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 23:17 UTC (permalink / raw)
To: Al Viro
Cc: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
linux-kernel
On 11/01, Al Viro wrote:
>
> On Wed, Nov 01, 2023 at 10:52:15PM +0100, Oleg Nesterov wrote:
>
> > > Why would you want to force that "switch to locked on the second pass" policy
> > > on every possible caller?
> >
> > Because this is what (I think) read_seqbegin_or_lock() is supposed to do.
> > It should take the lock for writing if the lockless access failed. At least
> > according to the documentation.
>
> Not really - it's literally seqbegin or lock, depending upon what the caller
> tells it...
OK, I won't argue right now. But again, this patch doesn't change the current
behaviour. Exactly because the caller does NOT tell read_seqbegin_or_lock() that
it wants "or lock" on the 2nd pass.
> Take a look at d_walk() and try to shoehorn that into your variant. Especially
> the D_WALK_NORETRY handling...
I am already sleeping, quite possibly I am wrong. But it seems that if we change
done_seqretry() then d_walk() needs something like
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1420,7 +1420,7 @@ static void d_walk(struct dentry *parent, void *data,
spin_lock(&this_parent->d_lock);
/* might go back up the wrong parent if we have had a rename. */
- if (need_seqretry(&rename_lock, seq))
+ if (need_seqretry(&rename_lock, &seq))
goto rename_retry;
/* go into the first sibling still alive */
do {
@@ -1432,22 +1432,20 @@ static void d_walk(struct dentry *parent, void *data,
rcu_read_unlock();
goto resume;
}
- if (need_seqretry(&rename_lock, seq))
+ if (need_seqretry(&rename_lock, &seq))
goto rename_retry;
rcu_read_unlock();
out_unlock:
spin_unlock(&this_parent->d_lock);
- done_seqretry(&rename_lock, seq);
+ done_seqretry(&rename_lock, &seq);
return;
rename_retry:
spin_unlock(&this_parent->d_lock);
rcu_read_unlock();
- BUG_ON(seq & 1);
if (!retry)
return;
- seq = 1;
goto again;
}
But again, again, this is off-topic and needs another discussion. Right now I am just
trying to audit the users of read_seqbegin_or_lock/need_seqretry and change those who
use them incorrectly.
Oleg.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
2023-11-01 20:23 ` Oleg Nesterov
2023-11-01 20:40 ` Oleg Nesterov
2023-11-01 20:52 ` Al Viro
@ 2023-11-01 21:20 ` David Howells
2023-11-01 22:15 ` Oleg Nesterov
2 siblings, 1 reply; 19+ messages in thread
From: David Howells @ 2023-11-01 21:20 UTC (permalink / raw)
To: Oleg Nesterov
Cc: dhowells, Marc Dionne, Alexander Viro, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs,
netdev, linux-kernel
Oleg Nesterov <oleg@redhat.com> wrote:
> > 1 0 need_seqretry() [seq=even; sequence!=seq: retry]
>
> Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,
>
> > 1 1 read_seqbegin_or_lock() [exclusive]
>
> No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
> it will do
Yeah, you're right. I missed the fact that I got in the second example that
read_seqbegin_or_lock() spins until it sees a positive seq.
However, I think just changing all of these to always-lockless isn't
necessarily the most optimal way. Yes, it will work... eventually. But the
point is to limit the number of iterations.
So I have the following:
(1) rxrpc_find_service_conn_rcu().
I want to move the first part of the reaper to the I/O thread at some
point, then the locking here can go away entirely. However, this is
drivable by external events, so I would prefer to limit the number of
passes to just two and take a lock on the second pass. Holding up the
reaper thread for the moment is fine; holding up the I/O thread is not.
(2) afs_lookup_volume_rcu().
There can be a lot of volumes known by a system. A thousand would
require a 10-step walk and this is drivable by remote operation, so I
think this should probably take a lock on the second pass too.
(3) afs_check_validity().
(4) afs_get_attr().
These are both pretty short, so your solution is probably good for them.
That said, afs_vnode_commit_status() can spend a long time under the
write lock - and pretty much every file RPC op returns a status update.
(5) afs_find_server().
There could be a lot of servers in the list and each server can have
multiple addresses, so I think this would be better with an exclusive
second pass.
The server list isn't likely to change all that often, but when it does
change, there's a good chance several servers are going to be
added/removed one after the other. Further, this is only going to be
used for incoming cache management/callback requests from the server,
which hopefully aren't going to happen too often - but it is remotely
drivable.
(6) afs_find_server_by_uuid().
Similarly to (5), there could be a lot of servers to search through, but
they are in a tree not a flat list, so it should be faster to process.
Again, it's not likely to change that often and, again, when it does
change it's likely to involve multiple changes. This can be driven
remotely by an incoming cache management request but is mostly going to
be driven by setting up or reconfiguring a volume's server list -
something that also isn't likely to happen often.
I wonder if struct seqlock would make more sense with an rwlock rather than a
spinlock. As it is, it does an exclusive spinlock for the readpath which is
kind of overkill.
David
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
2023-11-01 21:20 ` David Howells
@ 2023-11-01 22:15 ` Oleg Nesterov
2023-11-01 22:29 ` Oleg Nesterov
0 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 22:15 UTC (permalink / raw)
To: David Howells
Cc: Marc Dionne, Alexander Viro, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
linux-kernel
On 11/01, David Howells wrote:
>
> However, I think just changing all of these to always-lockless isn't
> necessarily the most optimal way.
Yes, but so far I am trying to change the users which never take the
lock for writing, so this patch doesn't change the current behaviour.
> I wonder if struct seqlock would make more sense with an rwlock rather than a
> spinlock. As it is, it does an exclusive spinlock for the readpath which is
> kind of overkill.
Heh. Please see
[PATCH 4/5] seqlock: introduce read_seqcount_begin_or_lock() and friends
https://lore.kernel.org/all/20230913155005.GA26252@redhat.com/
I am going to return to this later.
Oleg.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
2023-11-01 22:15 ` Oleg Nesterov
@ 2023-11-01 22:29 ` Oleg Nesterov
0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 22:29 UTC (permalink / raw)
To: David Howells
Cc: Marc Dionne, Alexander Viro, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
linux-kernel
sorry for noise, but in case I wasn't clear...
On 11/01, Oleg Nesterov wrote:
>
> On 11/01, David Howells wrote:
> >
> > However, I think just changing all of these to always-lockless isn't
> > necessarily the most optimal way.
>
> Yes, but so far I am trying to change the users which never take the
> lock for writing, so this patch doesn't change the current behaviour.
>
> > I wonder if struct seqlock would make more sense with an rwlock rather than a
> > spinlock. As it is, it does an exclusive spinlock for the readpath which is
> > kind of overkill.
>
> Heh. Please see
>
> [PATCH 4/5] seqlock: introduce read_seqcount_begin_or_lock() and friends
> https://lore.kernel.org/all/20230913155005.GA26252@redhat.com/
>
I meant, we already have seqcount_rwlock_t, but currently you can't do
something like read_seqbegin_or_lock(&seqcount_rwlock_t).
> I am going to return to this later.
Yes.
Oleg.
^ permalink raw reply [flat|nested] 19+ messages in thread