Date: Wed, 1 Nov 2023 21:23:03 +0100
From: Oleg Nesterov
To: David Howells
Cc: Marc Dionne, Alexander Viro, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Chuck Lever, linux-afs@lists.infradead.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] rxrpc_find_service_conn_rcu: use read_seqbegin() rather than read_seqbegin_or_lock()
Message-ID: <20231101202302.GB32034@redhat.com>
References: <20231027095842.GA30868@redhat.com>
 <1952182.1698853516@warthog.procyon.org.uk>
In-Reply-To: <1952182.1698853516@warthog.procyon.org.uk>
X-Mailing-List: netdev@vger.kernel.org

On 11/01, David Howells wrote:
>
> Oleg Nesterov wrote:
>
> > read_seqbegin_or_lock() makes no sense unless you make "seq" odd
> > after the lockless access failed.
>
> I think you're wrong.

I think you missed the point ;)

> write_seqlock() turns it odd.

It changes seqcount_t->sequence but not "seq" so this doesn't matter.

> For instance, if the read lock is taken first:
>
> sequence seq     CPU 1                           CPU 2
> ======== ======= =============================== ===============
> 0
> 0        0       seq = 0 MUST BE EVEN

This is correct,

> ACCORDING TO DOC

documentation is wrong, please see

	[PATCH 1/2] seqlock: fix the wrong read_seqbegin_or_lock/need_seqretry documentation
	https://lore.kernel.org/all/20231024120808.GA15382@redhat.com/

> 0        0       read_seqbegin_or_lock() [lockless]
> ...
> 1        0                                       write_seqlock()
> 1        0       need_seqretry() [seq=even; sequence!=seq: retry]

Yes, if CPU_1 races with write_seqlock() need_seqretry() returns true,

> 1        1       read_seqbegin_or_lock() [exclusive]

No.
"seq" is still even, so read_seqbegin_or_lock() won't do
read_seqlock_excl(), it will do

	seq = read_seqbegin(lock);

again.

> Note that it spins in __read_seqcount_begin() until we get an even seq,
> indicating that no write is currently in progress - at which point we can
> perform a lockless pass.

Exactly. And this means that "seq" is always even.

> > See thread_group_cputime() as an example, note that it does nextseq = 1 for
> > the 2nd round.
>
> That's not especially convincing.

See also the usage of read_seqbegin_or_lock() in fs/dcache.c and
fs/d_path.c. All other users are wrong.

Let's start from the very beginning. This code does

	int seq = 0;
	do {
		read_seqbegin_or_lock(service_conn_lock, &seq);
		do_something();
	} while (need_seqretry(service_conn_lock, seq));
	done_seqretry(service_conn_lock, seq);

Initially seq is even (it is zero), so read_seqbegin_or_lock(&seq) does

	*seq = read_seqbegin(lock);

and returns. Note that "seq" is still even.

Now. If need_seqretry(seq) detects the race with write_seqlock() it returns
true but it does NOT change this "seq", it is still even. So on the next
iteration read_seqbegin_or_lock() will do

	*seq = read_seqbegin(lock);

again, it won't take this lock for writing. And again, seq will be even.
And so on.

And this means that the code above is equivalent to

	do {
		seq = read_seqbegin(service_conn_lock);
		do_something();
	} while (read_seqretry(service_conn_lock, seq));

and this is what this patch does.

Yes, this is confusing. Again, even the documentation is wrong! That is why
I am trying to remove the misuse of read_seqbegin_or_lock(), then I am going
to change the semantics of need_seqretry() to enforce the locking on the 2nd pass.

Oleg.
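The argument above can be checked with a small single-threaded user-space model. All names here (struct toy_seqlock, the toy_* helpers, simulated_writer() and the pattern_*() loops) are hypothetical; the branching in toy_read_seqbegin_or_lock(), toy_need_seqretry() and toy_done_seqretry() is intended to mirror the kernel helpers, but this is a sketch, not the kernel code. There is no real concurrency: a "race" is modelled as a complete write_seqlock()/write_sequnlock() section running inside the read section, skipped while the reader holds the lock. pattern_or_lock() is the loop quoted above, pattern_plain() is the read_seqbegin()/read_seqretry() loop the patch switches to, and pattern_nextseq() follows the nextseq = 1 idiom from thread_group_cputime().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for seqlock_t (not the kernel type). */
struct toy_seqlock {
	unsigned sequence;	/* odd while a writer is inside its section */
	bool excl;		/* a reader holds the lock (read_seqlock_excl analogue) */
	int excl_count;		/* how many times a reader took the lock */
};

static void toy_write_seqlock(struct toy_seqlock *l)   { l->sequence++; }
static void toy_write_sequnlock(struct toy_seqlock *l) { l->sequence++; }

/* The kernel's read_seqbegin() spins until the count is even; in this
 * single-threaded model a writer is never caught mid-update. */
static unsigned toy_read_seqbegin(struct toy_seqlock *l)
{
	assert(!(l->sequence & 1));
	return l->sequence;
}

static bool toy_read_seqretry(struct toy_seqlock *l, unsigned start)
{
	return l->sequence != start;
}

/* Only an odd *seq selects the locked path; the lockless path
 * stores the (even) current count into *seq. */
static void toy_read_seqbegin_or_lock(struct toy_seqlock *l, int *seq)
{
	if (!(*seq & 1)) {
		*seq = toy_read_seqbegin(l);
	} else {
		l->excl = true;
		l->excl_count++;
	}
}

/* Never modifies seq: an even seq stays even on every retry. */
static bool toy_need_seqretry(struct toy_seqlock *l, int seq)
{
	return !(seq & 1) && toy_read_seqretry(l, seq);
}

static void toy_done_seqretry(struct toy_seqlock *l, int seq)
{
	if (seq & 1)
		l->excl = false;
}

/* Simulated race: a whole write section runs inside the read section,
 * unless the reader holds the lock, which excludes writers. */
static void simulated_writer(struct toy_seqlock *l, int *races)
{
	if (*races > 0 && !l->excl) {
		toy_write_seqlock(l);
		toy_write_sequnlock(l);
		(*races)--;
	}
}

/* The loop quoted in the mail: seq starts at 0 and is never made odd. */
int pattern_or_lock(struct toy_seqlock *l, int races)
{
	int seq = 0, passes = 0;

	do {
		toy_read_seqbegin_or_lock(l, &seq);
		passes++;
		simulated_writer(l, &races);
	} while (toy_need_seqretry(l, seq));
	toy_done_seqretry(l, seq);
	return passes;
}

/* The plain lockless loop. */
int pattern_plain(struct toy_seqlock *l, int races)
{
	unsigned seq;
	int passes = 0;

	do {
		seq = toy_read_seqbegin(l);
		passes++;
		simulated_writer(l, &races);
	} while (toy_read_seqretry(l, seq));
	return passes;
}

/* thread_group_cputime()-style loop: nextseq = 1 forces the lock on pass 2. */
int pattern_nextseq(struct toy_seqlock *l, int races)
{
	int seq, nextseq = 0, passes = 0;

	do {
		seq = nextseq;
		toy_read_seqbegin_or_lock(l, &seq);
		passes++;
		simulated_writer(l, &races);
		nextseq = 1;	/* if the lockless pass failed, lock next time */
	} while (toy_need_seqretry(l, seq));
	toy_done_seqretry(l, seq);
	return passes;
}
```

With three simulated races on a fresh lock, pattern_or_lock() and pattern_plain() both take four passes and the locked path is never entered, while pattern_nextseq() finishes in two passes, the second one under the lock.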