All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: David Howells <dhowells@redhat.com>
Cc: Marc Dionne <marc.dionne@auristor.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	linux-afs@lists.infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
Date: Wed, 1 Nov 2023 21:40:23 +0100	[thread overview]
Message-ID: <20231101204023.GC32034@redhat.com> (raw)
In-Reply-To: <20231101202302.GB32034@redhat.com>

In case I was not clear, I am not saying this code is buggy.

Just none of read_seqbegin_or_lock/need_seqretry/done_seqretry
helpers make any sense in this code. It can use read_seqbegin/
read_seqretry and this won't change the current behaviour.

On 11/01, Oleg Nesterov wrote:
>
> On 11/01, David Howells wrote:
> >
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> > > after the lockless access failed.
> >
> > I think you're wrong.
>
> I think you missed the point ;)
>
> > write_seqlock() turns it odd.
>
> It changes seqcount_t->sequence but not "seq" so this doesn't matter.
>
> > For instance, if the read lock is taken first:
> >
> > 	sequence seq	CPU 1				CPU 2
> > 	======= =======	===============================	===============
> > 	0
> > 	0	0	seq = 0  MUST BE EVEN
>
> This is correct,
>
> > ACCORDING TO DOC
>
> documentation is wrong, please see
>
> 	[PATCH 1/2] seqlock: fix the wrong read_seqbegin_or_lock/need_seqretry documentation
> 	https://lore.kernel.org/all/20231024120808.GA15382@redhat.com/
>
> > 	0	0	read_seqbegin_or_lock() [lockless]
> > 			...
> > 	1	0					write_seqlock()
> > 	1	0	need_seqretry() [seq=even; sequence!=seq: retry]
>
> Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,
>
> > 	1	1	read_seqbegin_or_lock() [exclusive]
>
> No. "seq" is still even, so read_seqbegin_or_lock() won't do read_seqlock_excl(),
> it will do
>
> 	seq = read_seqbegin(lock);
>
> again.
>
> > Note that it spins in __read_seqcount_begin() until we get an even seq,
> > indicating that no write is currently in progress - at which point we can
> > perform a lockless pass.
>
> Exactly. And this means that "seq" is always even.
>
> > > See thread_group_cputime() as an example, note that it does nextseq = 1 for
> > > the 2nd round.
> >
> > That's not especially convincing.
>
> See also the usage of read_seqbegin_or_lock() in fs/dcache.c and fs/d_path.c.
> All other users are wrong.
>
> Lets start from the very beginning. This code does
>
>         int seq = 0;
>         do {
>                 read_seqbegin_or_lock(service_conn_lock, &seq);
>
>                 do_something();
>
>         } while (need_seqretry(service_conn_lock, seq));
>
>         done_seqretry(service_conn_lock, seq);
>
> Initially seq is even (it is zero), so read_seqbegin_or_lock(&seq) does
>
> 	*seq = read_seqbegin(lock);
>
> and returns. Note that "seq" is still even.
>
> Now. If need_seqretry(seq) detects the race with write_seqlock() it returns
> true but it does NOT change this "seq", it is still even. So on the next
> iteration read_seqbegin_or_lock() will do
>
> 	*seq = read_seqbegin(lock);
>
> again, it won't take this lock for writing. And again, seq will be even.
> And so on.
>
> And this means that the code above is equivalent to
>
> 	do {
> 		seq = read_seqbegin(service_conn_lock);
>
> 		do_something();
>
> 	} while (read_seqretry(service_conn_lock, seq));
>
> and this is what this patch does.
>
> Yes this is confusing. Again, even the documentation is wrong! That is why
> I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
> to change the semantics of need_seqretry() to enforce the locking on the 2nd
> pass.
>
> Oleg.


  reply	other threads:[~2023-11-01 20:42 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-27  9:58 [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock() Oleg Nesterov
2023-10-27 10:00 ` Oleg Nesterov
2023-11-01 15:45 ` David Howells
2023-11-01 20:23   ` Oleg Nesterov
2023-11-01 20:40     ` Oleg Nesterov [this message]
2023-11-01 21:22       ` David Howells
2023-11-01 22:38         ` Oleg Nesterov
2023-11-01 20:52     ` Al Viro
2023-11-01 21:52       ` Oleg Nesterov
2023-11-01 22:48         ` Al Viro
2023-11-01 23:17           ` Oleg Nesterov
2023-11-01 21:20     ` David Howells
2023-11-01 22:15       ` Oleg Nesterov
2023-11-01 22:29         ` Oleg Nesterov
2023-11-16 13:18 ` Oleg Nesterov
2023-11-16 13:41   ` David Howells
2023-11-16 14:19     ` Oleg Nesterov
2023-11-16 15:02       ` David Howells
2023-11-16 15:06         ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231101204023.GC32034@redhat.com \
    --to=oleg@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=davem@davemloft.net \
    --cc=dhowells@redhat.com \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.dionne@auristor.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.