[PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
@ 2023-10-27  9:58 Oleg Nesterov
  2023-10-27 10:00 ` Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Oleg Nesterov @ 2023-10-27  9:58 UTC (permalink / raw)
  To: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Chuck Lever, linux-afs, netdev, linux-kernel

read_seqbegin_or_lock() makes no sense unless you make "seq" odd
after the lockless access failed. See thread_group_cputime() as
an example, note that it does nextseq = 1 for the 2nd round.

So this code can use read_seqbegin() without changing the current
behaviour.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 net/rxrpc/conn_service.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c
index 89ac05a711a4..bfafe58681d9 100644
--- a/net/rxrpc/conn_service.c
+++ b/net/rxrpc/conn_service.c
@@ -25,7 +25,7 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 	struct rxrpc_conn_proto k;
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rb_node *p;
-	unsigned int seq = 0;
+	unsigned int seq;
 
 	k.epoch	= sp->hdr.epoch;
 	k.cid	= sp->hdr.cid & RXRPC_CIDMASK;
@@ -35,7 +35,7 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 		 * under just the RCU read lock, so we have to check for
 		 * changes.
 		 */
-		read_seqbegin_or_lock(&peer->service_conn_lock, &seq);
+		seq = read_seqbegin(&peer->service_conn_lock);
 
 		p = rcu_dereference_raw(peer->service_conns.rb_node);
 		while (p) {
@@ -49,9 +49,8 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 				break;
 			conn = NULL;
 		}
-	} while (need_seqretry(&peer->service_conn_lock, seq));
+	} while (read_seqretry(&peer->service_conn_lock, seq));
 
-	done_seqretry(&peer->service_conn_lock, seq);
 	_leave(" = %d", conn ? conn->debug_id : -1);
 	return conn;
 }
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-10-27  9:58 [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock() Oleg Nesterov
@ 2023-10-27 10:00 ` Oleg Nesterov
  2023-11-01 15:45 ` David Howells
  2023-11-16 13:18 ` Oleg Nesterov
  2 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2023-10-27 10:00 UTC (permalink / raw)
  To: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Chuck Lever, linux-afs, netdev, linux-kernel

On 10/27, Oleg Nesterov wrote:
>
> read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> after the lockless access failed. See thread_group_cputime() as
> an example, note that it does nextseq = 1 for the 2nd round.

See also

	[PATCH 1/2] seqlock: fix the wrong read_seqbegin_or_lock/need_seqretry documentation
	https://lore.kernel.org/all/20231024120808.GA15382@redhat.com/

> So this code can use read_seqbegin() without changing the current
> behaviour.

I am trying to remove the misuse of read_seqbegin_or_lock(),
then I am going to turn need_seqretry() into

	static inline int need_seqretry(seqlock_t *lock, int *seq)
	{
		int ret = !(*seq & 1) && read_seqretry(lock, *seq);

		if (ret)
			*seq = 1; /* make this counter odd */

		return ret;
	}

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-10-27  9:58 [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock() Oleg Nesterov
  2023-10-27 10:00 ` Oleg Nesterov
@ 2023-11-01 15:45 ` David Howells
  2023-11-01 20:23   ` Oleg Nesterov
  2023-11-16 13:18 ` Oleg Nesterov
  2 siblings, 1 reply; 19+ messages in thread
From: David Howells @ 2023-11-01 15:45 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: dhowells, Marc Dionne, Alexander Viro, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs,
	netdev, linux-kernel

Oleg Nesterov <oleg@redhat.com> wrote:

> read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> after the lockless access failed.

I think you're wrong.

write_seqlock() turns it odd.  For instance, if the read lock is taken first:

	sequence seq	CPU 1				CPU 2
	======= =======	===============================	===============
	0
	0	0	seq = 0 // MUST BE EVEN ACCORDING TO DOC
	0	0	read_seqbegin_or_lock() [lockless]
			...
	1	0					write_seqlock()
	1	0	need_seqretry() [seq=even; sequence!=seq: retry]
	1	1	read_seqbegin_or_lock() [exclusive]
			-->spin_lock(lock);
	2	1					write_sequnlock()
			<--locked
			...
	2	1	need_seqretry()

However, if the write lock is taken first:

	sequence seq	CPU 1				CPU 2
	======= =======	===============================	===============
	0
	1						write_seqlock()
	1	0	seq = 0 // MUST BE EVEN ACCORDING TO DOC
	1	0	read_seqbegin_or_lock() [lockless]
	1	0	    __read_seqcount_begin()
				while (lock.sequence is odd)
				    cpu_relax();
	2	0					write_sequnlock()
	2	2		[loop end]
			...
	2	2	need_seqretry() [seq=even; sequence==seq; done]

Note that it spins in __read_seqcount_begin() until we get an even seq,
indicating that no write is currently in progress - at which point we can
perform a lockless pass.

> See thread_group_cputime() as an example, note that it does nextseq = 1 for
> the 2nd round.

That's not especially convincing.

David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 15:45 ` David Howells
@ 2023-11-01 20:23   ` Oleg Nesterov
  2023-11-01 20:40     ` Oleg Nesterov
                       ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 20:23 UTC (permalink / raw)
  To: David Howells
  Cc: Marc Dionne, Alexander Viro, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
	linux-kernel

On 11/01, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> > after the lockless access failed.
>
> I think you're wrong.

I think you missed the point ;)

> write_seqlock() turns it odd.

It changes seqcount_t->sequence but not "seq" so this doesn't matter.

> For instance, if the read lock is taken first:
>
> 	sequence seq	CPU 1				CPU 2
> 	======= =======	===============================	===============
> 	0
> 	0	0	seq = 0  MUST BE EVEN

This is correct,

> ACCORDING TO DOC

documentation is wrong, please see

	[PATCH 1/2] seqlock: fix the wrong read_seqbegin_or_lock/need_seqretry documentation
	https://lore.kernel.org/all/20231024120808.GA15382@redhat.com/

> 	0	0	read_seqbegin_or_lock() [lockless]
> 			...
> 	1	0					write_seqlock()
> 	1	0	need_seqretry() [seq=even; sequence!=seq: retry]

Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,

> 	1	1	read_seqbegin_or_lock() [exclusive]

No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
it will do

	seq = read_seqbegin(lock);

again.

> Note that it spins in __read_seqcount_begin() until we get an even seq,
> indicating that no write is currently in progress - at which point we can
> perform a lockless pass.

Exactly. And this means that "seq" is always even.

> > See thread_group_cputime() as an example, note that it does nextseq = 1 for
> > the 2nd round.
>
> That's not especially convincing.

See also the usage of read_seqbegin_or_lock() in fs/dcache.c and fs/d_path.c.
All other users are wrong.

Lets start from the very beginning. This code does

        int seq = 0;
        do {
                read_seqbegin_or_lock(service_conn_lock, &seq);

                do_something();

        } while (need_seqretry(service_conn_lock, seq));

        done_seqretry(service_conn_lock, seq);

Initially seq is even (it is zero), so read_seqbegin_or_lock(&seq) does

	*seq = read_seqbegin(lock);

and returns. Note that "seq" is still even.

Now. If need_seqretry(seq) detects the race with write_seqlock() it returns
true but it does NOT change this "seq", it is still even. So on the next
iteration read_seqbegin_or_lock() will do

	*seq = read_seqbegin(lock);

again, it won't take this lock for writing. And again, seq will be even.
And so on.

And this means that the code above is equivalent to

	do {
		seq = read_seqbegin(service_conn_lock);

		do_something();

	} while (read_seqretry(service_conn_lock, seq));

and this is what this patch does.

Yes this is confusing. Again, even the documentation is wrong! That is why
I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
to change the semantics of need_seqretry() to enforce the locking on the 2nd
pass.

Oleg.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 20:23   ` Oleg Nesterov
@ 2023-11-01 20:40     ` Oleg Nesterov
  2023-11-01 21:22       ` David Howells
  2023-11-01 20:52     ` Al Viro
  2023-11-01 21:20     ` David Howells
  2 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 20:40 UTC (permalink / raw)
  To: David Howells
  Cc: Marc Dionne, Alexander Viro, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
	linux-kernel

In case I was not clear, I am not saying this code is buggy.

Just none of read_seqbegin_or_lock/need_seqretry/done_seqretry
helpers make any sense in this code. It can use read_seqbegin/
read_seqretry and this won't change the current behaviour.

On 11/01, Oleg Nesterov wrote:
>
> On 11/01, David Howells wrote:
> >
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> > > after the lockless access failed.
> >
> > I think you're wrong.
>
> I think you missed the point ;)
>
> > write_seqlock() turns it odd.
>
> It changes seqcount_t->sequence but not "seq" so this doesn't matter.
>
> > For instance, if the read lock is taken first:
> >
> > 	sequence seq	CPU 1				CPU 2
> > 	======= =======	===============================	===============
> > 	0
> > 	0	0	seq = 0  MUST BE EVEN
>
> This is correct,
>
> > ACCORDING TO DOC
>
> documentation is wrong, please see
>
> 	[PATCH 1/2] seqlock: fix the wrong read_seqbegin_or_lock/need_seqretry documentation
> 	https://lore.kernel.org/all/20231024120808.GA15382@redhat.com/
>
> > 	0	0	read_seqbegin_or_lock() [lockless]
> > 			...
> > 	1	0					write_seqlock()
> > 	1	0	need_seqretry() [seq=even; sequence!=seq: retry]
>
> Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,
>
> > 	1	1	read_seqbegin_or_lock() [exclusive]
>
> No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
> it will do
>
> 	seq = read_seqbegin(lock);
>
> again.
>
> > Note that it spins in __read_seqcount_begin() until we get an even seq,
> > indicating that no write is currently in progress - at which point we can
> > perform a lockless pass.
>
> Exactly. And this means that "seq" is always even.
>
> > > See thread_group_cputime() as an example, note that it does nextseq = 1 for
> > > the 2nd round.
> >
> > That's not especially convincing.
>
> See also the usage of read_seqbegin_or_lock() in fs/dcache.c and fs/d_path.c.
> All other users are wrong.
>
> Lets start from the very beginning. This code does
>
>         int seq = 0;
>         do {
>                 read_seqbegin_or_lock(service_conn_lock, &seq);
>
>                 do_something();
>
>         } while (need_seqretry(service_conn_lock, seq));
>
>         done_seqretry(service_conn_lock, seq);
>
> Initially seq is even (it is zero), so read_seqbegin_or_lock(&seq) does
>
> 	*seq = read_seqbegin(lock);
>
> and returns. Note that "seq" is still even.
>
> Now. If need_seqretry(seq) detects the race with write_seqlock() it returns
> true but it does NOT change this "seq", it is still even. So on the next
> iteration read_seqbegin_or_lock() will do
>
> 	*seq = read_seqbegin(lock);
>
> again, it won't take this lock for writing. And again, seq will be even.
> And so on.
>
> And this means that the code above is equivalent to
>
> 	do {
> 		seq = read_seqbegin(service_conn_lock);
>
> 		do_something();
>
> 	} while (read_seqretry(service_conn_lock, seq));
>
> and this is what this patch does.
>
> Yes this is confusing. Again, even the documentation is wrong! That is why
> I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
> to change the semantics of need_seqretry() to enforce the locking on the 2nd
> pass.
>
> Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 20:40     ` Oleg Nesterov
@ 2023-11-01 21:22       ` David Howells
  2023-11-01 22:38         ` Oleg Nesterov
  0 siblings, 1 reply; 19+ messages in thread
From: David Howells @ 2023-11-01 21:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: dhowells, Marc Dionne, Alexander Viro, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs,
	netdev, linux-kernel

Oleg Nesterov <oleg@redhat.com> wrote:

> Just none of read_seqbegin_or_lock/need_seqretry/done_seqretry
> helpers make any sense in this code.

I disagree.  I think in at least a couple of cases I do want a locked second
path - ideally locked shared if seqlock can be made to use an rwlock instead
of a spinlock.

David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 21:22       ` David Howells
@ 2023-11-01 22:38         ` Oleg Nesterov
  0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 22:38 UTC (permalink / raw)
  To: David Howells
  Cc: Marc Dionne, Alexander Viro, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
	linux-kernel

On 11/01, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Just none of read_seqbegin_or_lock/need_seqretry/done_seqretry
> > helpers make any sense in this code.
>
> I disagree.  I think in at least a couple of cases I do want a locked second
> path

Sorry for confusion. I never said that the 2nd locked pass makes no sense.

My only point is that rxrpc_find_service_conn_rcu() (and more) use
read_seqbegin_or_lock() incorrectly. They can use read_seqbegin() and this
won't change the current behaviour.

So lets change these users first? Then we can discuss the possible changes
in include/linux/seqlock.h and (perhaps) update the users which actually
want the locking on the 2nd pass.

Oleg.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 20:23   ` Oleg Nesterov
  2023-11-01 20:40     ` Oleg Nesterov
@ 2023-11-01 20:52     ` Al Viro
  2023-11-01 21:52       ` Oleg Nesterov
  2023-11-01 21:20     ` David Howells
  2 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2023-11-01 20:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
	linux-kernel

On Wed, Nov 01, 2023 at 09:23:03PM +0100, Oleg Nesterov wrote:

> Yes this is confusing. Again, even the documentation is wrong! That is why
> I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
> to change the semantics of need_seqretry() to enforce the locking on the 2nd
> pass.

What for?  Sure, documentation needs to be fixed, but *not* in direction you
suggested in that patch.

Why would you want to force that "switch to locked on the second pass" policy
on every possible caller?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 20:52     ` Al Viro
@ 2023-11-01 21:52       ` Oleg Nesterov
  2023-11-01 22:48         ` Al Viro
  0 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 21:52 UTC (permalink / raw)
  To: Al Viro
  Cc: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
	linux-kernel

On 11/01, Al Viro wrote:
>
> On Wed, Nov 01, 2023 at 09:23:03PM +0100, Oleg Nesterov wrote:
>
> > Yes this is confusing. Again, even the documentation is wrong! That is why
> > I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
> > to change the semantics of need_seqretry() to enforce the locking on the 2nd
> > pass.
>
> What for?  Sure, documentation needs to be fixed,

So do you agree that the current usage of read_seqbegin_or_lock() in
rxrpc_find_service_conn_rcu() is misleading ? Do you agree it can use
read_seqbegin/read_seqretry without changing the current behaviour?

> but *not* in direction you
> suggested in that patch.

Hmm. then how do you think the doc should be changed? To describe the
current behaviour.

> Why would you want to force that "switch to locked on the second pass" policy
> on every possible caller?

Because this is what (I think) read_seqbegin_or_lock() is supposed to do.
It should take the lock for writing if the lockless access failed. At least
according to the documentation.

This needs another discussion and perhaps this makes no sense. But I'd
like to turn need_seqretry(seq) into something like

	static inline int need_seqretry(seqlock_t *lock, int *seq)
	{
		int ret = !(*seq & 1) && read_seqretry(lock, *seq);

		if (ret)
			*seq = 1; /* make this counter odd */

		return ret;
	}

and update the users which actually want read_seqlock_excl() on the 2nd pass.
thread_group_cputime(), fs/d_path.c and fs/dcache.c.

For example, __dentry_path()

	--- a/fs/d_path.c
	+++ b/fs/d_path.c
	@@ -349,10 +349,9 @@ static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p)
		}
		if (!(seq & 1))
			rcu_read_unlock();
	-	if (need_seqretry(&rename_lock, seq)) {
	-		seq = 1;
	+	if (need_seqretry(&rename_lock, &seq))
			goto restart;
	-	}
	+
		done_seqretry(&rename_lock, seq);
		if (b.len == p->len)
			prepend_char(&b, '/');

but again, this need another discussion.

Oleg.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 21:52       ` Oleg Nesterov
@ 2023-11-01 22:48         ` Al Viro
  2023-11-01 23:17           ` Oleg Nesterov
  0 siblings, 1 reply; 19+ messages in thread
From: Al Viro @ 2023-11-01 22:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
	linux-kernel

On Wed, Nov 01, 2023 at 10:52:15PM +0100, Oleg Nesterov wrote:

> > Why would you want to force that "switch to locked on the second pass" policy
> > on every possible caller?
> 
> Because this is what (I think) read_seqbegin_or_lock() is supposed to do.
> It should take the lock for writing if the lockless access failed. At least
> according to the documentation.

Not really - it's literally seqbegin or lock, depending upon what the caller
tells it...  IMO the mistake in docs is the insistence on using do-while
loop for its users.

Take a look at d_walk() and try to shoehorn that into your variant.  Especially
the D_WALK_NORETRY handling...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 22:48         ` Al Viro
@ 2023-11-01 23:17           ` Oleg Nesterov
  0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 23:17 UTC (permalink / raw)
  To: Al Viro
  Cc: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
	linux-kernel

On 11/01, Al Viro wrote:
>
> On Wed, Nov 01, 2023 at 10:52:15PM +0100, Oleg Nesterov wrote:
>
> > > Why would you want to force that "switch to locked on the second pass" policy
> > > on every possible caller?
> >
> > Because this is what (I think) read_seqbegin_or_lock() is supposed to do.
> > It should take the lock for writing if the lockless access failed. At least
> > according to the documentation.
>
> Not really - it's literally seqbegin or lock, depending upon what the caller
> tells it...

OK, I won't argue right now. But again, this patch doesn't change the current
behaviour. Exactly because the caller does NOT tell read_seqbegin_or_lock() that
it wants "or lock" on the 2nd pass.

> Take a look at d_walk() and try to shoehorn that into your variant.  Especially
> the D_WALK_NORETRY handling...

I am already sleeping, quite possibly I am wrong. But it seems that if we change
done_seqretry() then d_walk() needs something like

	--- a/fs/dcache.c
	+++ b/fs/dcache.c
	@@ -1420,7 +1420,7 @@ static void d_walk(struct dentry *parent, void *data,
			spin_lock(&this_parent->d_lock);
	 
			/* might go back up the wrong parent if we have had a rename. */
	-		if (need_seqretry(&rename_lock, seq))
	+		if (need_seqretry(&rename_lock, &seq))
				goto rename_retry;
			/* go into the first sibling still alive */
			do {
	@@ -1432,22 +1432,20 @@ static void d_walk(struct dentry *parent, void *data,
			rcu_read_unlock();
			goto resume;
		}
	-	if (need_seqretry(&rename_lock, seq))
	+	if (need_seqretry(&rename_lock, &seq))
			goto rename_retry;
		rcu_read_unlock();
	 
	 out_unlock:
		spin_unlock(&this_parent->d_lock);
	-	done_seqretry(&rename_lock, seq);
	+	done_seqretry(&rename_lock, &seq);
		return;
	 
	 rename_retry:
		spin_unlock(&this_parent->d_lock);
		rcu_read_unlock();
	-	BUG_ON(seq & 1);
		if (!retry)
			return;
	-	seq = 1;
		goto again;
	 }


But again, again, this is off-topic and needs another discussion. Right now I am just
trying to audit the users of read_seqbegin_or_lock/need_seqretry and change those who
use them incorrectly.

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 20:23   ` Oleg Nesterov
  2023-11-01 20:40     ` Oleg Nesterov
  2023-11-01 20:52     ` Al Viro
@ 2023-11-01 21:20     ` David Howells
  2023-11-01 22:15       ` Oleg Nesterov
  2 siblings, 1 reply; 19+ messages in thread
From: David Howells @ 2023-11-01 21:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: dhowells, Marc Dionne, Alexander Viro, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs,
	netdev, linux-kernel

Oleg Nesterov <oleg@redhat.com> wrote:

> > 	1	0	need_seqretry() [seq=even; sequence!=seq: retry]
>
> Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,
>
> > 	1	1	read_seqbegin_or_lock() [exclusive]
>
> No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
> it will do

Yeah, you're right.  I missed the fact that I got in the second example that
read_seqbegin_or_lock() spins until it sees a positive seq.

However, I think just changing all of these to always-lockless isn't
necessarily the most optimal way.  Yes, it will work... eventually.  But the
point is to limit the number of iterations.

So I have the following:

 (1) rxrpc_find_service_conn_rcu().

     I want to move the first part of the reaper to the I/O thread at some
     point, then the locking here can go away entirely.  However, this is
     drivable by external events, so I would prefer to limit the number of
     passes to just two and take a lock on the second pass.  Holding up the
     reaper thread for the moment is fine; holding up the I/O thread is not.

 (2) afs_lookup_volume_rcu().

     There can be a lot of volumes known by a system.  A thousand would
     require a 10-step walk and this is drivable by remote operation, so I
     think this should probably take a lock on the second pass too.

 (3) afs_check_validity().
 (4) afs_get_attr().

     These are both pretty short, so your solution is probably good for them.
     That said, afs_vnode_commit_status() can spend a long time under the
     write lock - and pretty much every file RPC op returns a status update.

 (5) afs_find_server().

     There could be a lot of servers in the list and each server can have
     multiple addresses, so I think this would be better with an exclusive
     second pass.

     The server list isn't likely to change all that often, but when it does
     change, there's a good chance several servers are going to be
     added/removed one after the other.  Further, this is only going to be
     used for incoming cache management/callback requests from the server,
     which hopefully aren't going to happen too often - but it is remotely
     drivable.

 (6) afs_find_server_by_uuid().

     Similarly to (5), there could be a lot of servers to search through, but
     they are in a tree not a flat list, so it should be faster to process.
     Again, it's not likely to change that often and, again, when it does
     change it's likely to involve multiple changes.  This can be driven
     remotely by an incoming cache management request but is mostly going to
     be driven by setting up or reconfiguring a volume's server list -
     something that also isn't likely to happen often.

I wonder if struct seqlock would make more sense with an rwlock rather than a
spinlock.  As it is, it does an exclusive spinlock for the readpath which is
kind of overkill.

David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 21:20     ` David Howells
@ 2023-11-01 22:15       ` Oleg Nesterov
  2023-11-01 22:29         ` Oleg Nesterov
  0 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 22:15 UTC (permalink / raw)
  To: David Howells
  Cc: Marc Dionne, Alexander Viro, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
	linux-kernel

On 11/01, David Howells wrote:
>
> However, I think just changing all of these to always-lockless isn't
> necessarily the most optimal way.

Yes, but so far I am trying to change the users which never take the
lock for writing, so this patch doesn't change the current behaviour.

> I wonder if struct seqlock would make more sense with an rwlock rather than a
> spinlock.  As it is, it does an exclusive spinlock for the readpath which is
> kind of overkill.

Heh. Please see

	[PATCH 4/5] seqlock: introduce read_seqcount_begin_or_lock() and friends
	https://lore.kernel.org/all/20230913155005.GA26252@redhat.com/

I am going to return to this later.

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-01 22:15       ` Oleg Nesterov
@ 2023-11-01 22:29         ` Oleg Nesterov
  0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-01 22:29 UTC (permalink / raw)
  To: David Howells
  Cc: Marc Dionne, Alexander Viro, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs, netdev,
	linux-kernel

sorry for noise, but in case I wasn't clear...

On 11/01, Oleg Nesterov wrote:
>
> On 11/01, David Howells wrote:
> >
> > However, I think just changing all of these to always-lockless isn't
> > necessarily the most optimal way.
>
> Yes, but so far I am trying to change the users which never take the
> lock for writing, so this patch doesn't change the current behaviour.
>
> > I wonder if struct seqlock would make more sense with an rwlock rather than a
> > spinlock.  As it is, it does an exclusive spinlock for the readpath which is
> > kind of overkill.
>
> Heh. Please see
>
> 	[PATCH 4/5] seqlock: introduce read_seqcount_begin_or_lock() and friends
> 	https://lore.kernel.org/all/20230913155005.GA26252@redhat.com/
>

I meant, we already have seqcount_rwlock_t, but currently you can't do
something like read_seqbegin_or_lock(&seqcount_rwlock_t).

> I am going to return to this later.

Yes.

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-10-27  9:58 [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock() Oleg Nesterov
  2023-10-27 10:00 ` Oleg Nesterov
  2023-11-01 15:45 ` David Howells
@ 2023-11-16 13:18 ` Oleg Nesterov
  2023-11-16 13:41   ` David Howells
  2 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-16 13:18 UTC (permalink / raw)
  To: David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Al Viro
  Cc: Chuck Lever, linux-afs, netdev, linux-kernel

David, Al,

So do you agree that

	- the usage of read_seqbegin_or_lock/need_seqretry in
	  this code makes no sense because read_seqlock_excl()
	  is not possible

	- this patch doesn't change the current behaviour but
	  simplifies the code and makes it more clear

?

On 10/27, Oleg Nesterov wrote:
>
> read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> after the lockless access failed. See thread_group_cputime() as
> an example, note that it does nextseq = 1 for the 2nd round.
> 
> So this code can use read_seqbegin() without changing the current
> behaviour.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  net/rxrpc/conn_service.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c
> index 89ac05a711a4..bfafe58681d9 100644
> --- a/net/rxrpc/conn_service.c
> +++ b/net/rxrpc/conn_service.c
> @@ -25,7 +25,7 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
>  	struct rxrpc_conn_proto k;
>  	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
>  	struct rb_node *p;
> -	unsigned int seq = 0;
> +	unsigned int seq;
>  
>  	k.epoch	= sp->hdr.epoch;
>  	k.cid	= sp->hdr.cid & RXRPC_CIDMASK;
> @@ -35,7 +35,7 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
>  		 * under just the RCU read lock, so we have to check for
>  		 * changes.
>  		 */
> -		read_seqbegin_or_lock(&peer->service_conn_lock, &seq);
> +		seq = read_seqbegin(&peer->service_conn_lock);
>  
>  		p = rcu_dereference_raw(peer->service_conns.rb_node);
>  		while (p) {
> @@ -49,9 +49,8 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
>  				break;
>  			conn = NULL;
>  		}
> -	} while (need_seqretry(&peer->service_conn_lock, seq));
> +	} while (read_seqretry(&peer->service_conn_lock, seq));
>  
> -	done_seqretry(&peer->service_conn_lock, seq);
>  	_leave(" = %d", conn ? conn->debug_id : -1);
>  	return conn;
>  }
> -- 
> 2.25.1.362.g51ebf55
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-16 13:18 ` Oleg Nesterov
@ 2023-11-16 13:41   ` David Howells
  2023-11-16 14:19     ` Oleg Nesterov
  0 siblings, 1 reply; 19+ messages in thread
From: David Howells @ 2023-11-16 13:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: dhowells, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Al Viro, Chuck Lever, linux-afs,
	netdev, linux-kernel

Oleg Nesterov <oleg@redhat.com> wrote:

> So do you agree that
> 
> 	- the usage of read_seqbegin_or_lock/need_seqretry in
> 	  this code makes no sense because read_seqlock_excl()
> 	  is not possible

Not exactly.  I think it should take a lock on the second pass.

> 	- this patch doesn't change the current behaviour but
> 	  simplifies the code and makes it more clear

That is true.

David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-16 13:41   ` David Howells
@ 2023-11-16 14:19     ` Oleg Nesterov
  2023-11-16 15:02       ` David Howells
  0 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-16 14:19 UTC (permalink / raw)
  To: David Howells
  Cc: Marc Dionne, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Al Viro, Chuck Lever, linux-afs, netdev,
	linux-kernel

On 11/16, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > So do you agree that
> >
> > 	- the usage of read_seqbegin_or_lock/need_seqretry in
> > 	  this code makes no sense because read_seqlock_excl()
> > 	  is not possible
>
> Not exactly.  I think it should take a lock on the second pass.

OK, then how about the patch below?

Again, I'd prefer to change the semantics/prototype of need_seqretry()
to enforce the locking on the 2nd pass "automatically", but a) this
needs more discussion and b) I can't do this before I update the users
which use read_seqbegin_or_lock/need_seqretry incorrectly. So lets
discuss this later.

Oleg.

--- a/net/rxrpc/conn_service.c
+++ b/net/rxrpc/conn_service.c
@@ -25,7 +25,7 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 	struct rxrpc_conn_proto k;
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rb_node *p;
-	unsigned int seq = 0;
+	unsigned int seq = 1;
 
 	k.epoch	= sp->hdr.epoch;
 	k.cid	= sp->hdr.cid & RXRPC_CIDMASK;
@@ -35,6 +35,7 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 		 * under just the RCU read lock, so we have to check for
 		 * changes.
 		 */
+		seq++; /* 2 on the 1st/lockless path, otherwise odd */
 		read_seqbegin_or_lock(&peer->service_conn_lock, &seq);
 
 		p = rcu_dereference_raw(peer->service_conns.rb_node);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-16 14:19     ` Oleg Nesterov
@ 2023-11-16 15:02       ` David Howells
  2023-11-16 15:06         ` Oleg Nesterov
  0 siblings, 1 reply; 19+ messages in thread
From: David Howells @ 2023-11-16 15:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: dhowells, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Al Viro, Chuck Lever, linux-afs,
	netdev, linux-kernel

Oleg Nesterov <oleg@redhat.com> wrote:

> > > 	- the usage of read_seqbegin_or_lock/need_seqretry in
> > > 	  this code makes no sense because read_seqlock_excl()
> > > 	  is not possible
> >
> > Not exactly.  I think it should take a lock on the second pass.
> 
> OK, then how about the patch below?

That seems to work.

David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
  2023-11-16 15:02       ` David Howells
@ 2023-11-16 15:06         ` Oleg Nesterov
  0 siblings, 0 replies; 19+ messages in thread
From: Oleg Nesterov @ 2023-11-16 15:06 UTC (permalink / raw)
  To: David Howells
  Cc: Marc Dionne, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Al Viro, Chuck Lever, linux-afs, netdev,
	linux-kernel

On 11/16, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > > > 	- the usage of read_seqbegin_or_lock/need_seqretry in
> > > > 	  this code makes no sense because read_seqlock_excl()
> > > > 	  is not possible
> > >
> > > Not exactly.  I think it should take a lock on the second pass.
> >
> > OK, then how about the patch below?
>
> That seems to work.

OK, I'll send V2 tomorrow.

Should I change fs/afs the same way?

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-11-16 15:08 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-27  9:58 [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock() Oleg Nesterov
2023-10-27 10:00 ` Oleg Nesterov
2023-11-01 15:45 ` David Howells
2023-11-01 20:23   ` Oleg Nesterov
2023-11-01 20:40     ` Oleg Nesterov
2023-11-01 21:22       ` David Howells
2023-11-01 22:38         ` Oleg Nesterov
2023-11-01 20:52     ` Al Viro
2023-11-01 21:52       ` Oleg Nesterov
2023-11-01 22:48         ` Al Viro
2023-11-01 23:17           ` Oleg Nesterov
2023-11-01 21:20     ` David Howells
2023-11-01 22:15       ` Oleg Nesterov
2023-11-01 22:29         ` Oleg Nesterov
2023-11-16 13:18 ` Oleg Nesterov
2023-11-16 13:41   ` David Howells
2023-11-16 14:19     ` Oleg Nesterov
2023-11-16 15:02       ` David Howells
2023-11-16 15:06         ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).