Re: [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff

public inbox for linux-kernel@vger.kernel.org

From: Peter Zijlstra <peterz@infradead.org>
To: Waiman Long <longman@redhat.com>
Cc: "Ingo Molnar" <mingo@redhat.com>, "Will Deacon" <will@kernel.org>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
	"Hillf Danton" <hdanton@sina.com>,
	"Mukesh Ojha" <quic_mojha@quicinc.com>,
	"Ting11 Wang 王婷" <wangting11@xiaomi.com>
Subject: Re: [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff
Date: Mon, 23 Jan 2023 15:59:26 +0100
Message-ID: <Y86gzkVHlsOTY8QL@hirez.programming.kicks-ass.net>
In-Reply-To: <20221118022016.462070-6-longman@redhat.com>

On Thu, Nov 17, 2022 at 09:20:15PM -0500, Waiman Long wrote:
> The lock handoff provided in rwsem isn't a true handoff like that in
> the mutex. Instead, it is more like a quiescent state where optimistic
> spinning and lock stealing are disabled to make it easier for the first
> waiter to acquire the lock.
> 
> Reworking the code to enable a true lock handoff is more complex due to
> the following facts:
>  1) The RWSEM_FLAG_HANDOFF bit is protected by the wait_lock and it
>     is too expensive to always take the wait_lock in the unlock path
>     to prevent racing.

Specifically, the issue is that we'd need to turn the
atomic_long_add_return_release() into an atomic_long_try_cmpxchg_release()
loop, like:

	tmp = atomic_long_read(&sem->count);
	do {
		/* waiters queued or handoff pending: release via the slow path */
		if (tmp & (WAITERS|HANDOFF))
			return slow_unlock();
	} while (!atomic_long_try_cmpxchg_release(&sem->count, &tmp,
						  tmp - RWSEM_{READER_BIAS,WRITER_LOCKED}));

in order to not race with a concurrent setting of the HANDOFF bit,
right? And we don't like to turn unlock into a cmpxchg loop.

(OTOH we sorta do this for mutex, uncontended mutex has cmpxchg lock and
unlock, any failure and we go to the slow path -- I suppose the distinct
difference is that we sorta expect some contention on the read side)
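
To make the trade-off concrete, here is a minimal user-space sketch of the
two unlock styles using C11 atomics. It is a model, not kernel code: the
struct model_sem type, both function names, and the bit layout (assumed to
mirror kernel/locking/rwsem.c) are illustrative assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit layout, modelled on kernel/locking/rwsem.c. */
#define WRITER_LOCKED	(1L << 0)
#define FLAG_WAITERS	(1L << 1)
#define FLAG_HANDOFF	(1L << 2)
#define READER_BIAS	(1L << 8)

/* Hypothetical stand-in for struct rw_semaphore; only the count matters. */
struct model_sem {
	atomic_long count;
};

/*
 * Unconditional release: a single RMW that always succeeds. Returns the
 * count left behind so the caller can decide whether to wake waiters.
 * A concurrently set HANDOFF bit cannot prevent the release.
 */
static long unlock_fetch_add(struct model_sem *sem, long bias)
{
	assert(bias == WRITER_LOCKED || bias == READER_BIAS);
	return atomic_fetch_sub_explicit(&sem->count, bias,
					 memory_order_release) - bias;
}

/*
 * Conditional release: retry until the bias is dropped from a count that
 * had neither WAITERS nor HANDOFF set. Returns true if the fast path
 * released the lock, false if the lock is still held and the caller must
 * release it through a slow path.
 */
static bool unlock_cmpxchg(struct model_sem *sem, long bias)
{
	long tmp = atomic_load_explicit(&sem->count, memory_order_relaxed);

	assert(bias == WRITER_LOCKED || bias == READER_BIAS);
	do {
		if (tmp & (FLAG_WAITERS | FLAG_HANDOFF))
			return false;
		/* on failure, tmp is refreshed with the current count */
	} while (!atomic_compare_exchange_weak_explicit(&sem->count, &tmp,
							tmp - bias,
							memory_order_release,
							memory_order_relaxed));
	return true;
}
```

The cmpxchg variant never releases behind the back of a handoff request,
but every concurrent reader that changes the count forces a retry, which
is the cost being objected to for the read side.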

>  2) The reader lock fast path may add a RWSEM_READER_BIAS at the wrong
>     time to prevent a proper lock handoff from a reader owned rwsem.

This would be much the same, right? We'd have to turn
rwsem_read_trylock() into a cmpxchg-loop and we don't like that.
Therefore we do that speculative add and fix up later.
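
As an illustration of that speculative add and later fix-up, a small
user-space model with C11 atomics follows. The function names, the
simplified READ_FAILED_MASK, and the bit layout are assumptions made for
the sketch; in the kernel the fix-up is done from the read slow path
rather than by a standalone helper.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit layout, modelled on kernel/locking/rwsem.c. */
#define WRITER_LOCKED	(1L << 0)
#define FLAG_WAITERS	(1L << 1)
#define FLAG_HANDOFF	(1L << 2)
#define READER_BIAS	(1L << 8)

/* Simplified: any of these bits makes the reader fast path fail. */
#define READ_FAILED_MASK	(WRITER_LOCKED | FLAG_WAITERS | FLAG_HANDOFF)

/*
 * Add the reader bias first, look at the flags afterwards. *cntp receives
 * the count including our own bias. Returns true if the read lock was
 * acquired; on false the bias is still in the count and must be undone.
 */
static bool read_trylock_speculative(atomic_long *count, long *cntp)
{
	*cntp = atomic_fetch_add_explicit(count, READER_BIAS,
					  memory_order_acquire) + READER_BIAS;
	assert(*cntp >= 0);	/* reader count overflow is out of scope */
	return !(*cntp & READ_FAILED_MASK);
}

/* Fix-up after a failed speculative add. Returns the count afterwards. */
static long read_undo_bias(atomic_long *count)
{
	return atomic_fetch_sub_explicit(count, READER_BIAS,
					 memory_order_relaxed) - READER_BIAS;
}

/* What another CPU inspecting the count would conclude about readers. */
static bool count_has_readers(long cnt)
{
	return (cnt & ~(READER_BIAS - 1)) != 0;
}
```

Between the failed add and the undo, count_has_readers() is true even
though no reader holds the lock. A releasing task that samples the count
in that window sees a phantom reader, which is one way a handoff attempt
can be spoiled.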

Now, I'm not entirely sure what issue you allude to here; is the problem
that you can't quite tell when the last reader is gone?

> A lock handoff can only be initiated when the following conditions are
> true:
>  1) The RWSEM_FLAG_HANDOFF bit is set.

d'uh ;-)

>  2) The task to do the handoff don't see any other active lock
>     excluding the lock that it might have held.

2) here is the 2) above, right?

> The new handoff mechanism performs handoff in rwsem_wakeup() to minimize
> overhead. The rwsem count will be known at that point to determine if
> handoff should be done. However, there is a small time gap between the
> rwsem becomes free and the wait_lock is taken

Right, that's between atomic_long_fetch_add_release() and calling the
slow path because WAITERS bit is set.

> where a reader can come in and add a RWSEM_READER_BIAS to the count or

Both 2s above.

> the current first waiter can take the rwsem and clear
> RWSEM_FLAG_HANDOFF in the interim.

The actual intended action.

> That will fail the handoff operation.

I would not list the latter as a failure, it's exactly what we want to
happen, no?

> To handle the former case, a secondary handoff will also be done in
> the rwsem_down_read_slowpath() to catch it.

Right. In short:

Having HANDOFF set:
 - implies WAITERS set
 - disables all fastpaths (spinning or otherwise)
 - disallows anybody except first waiter to obtain the lock

Therefore, having the window between clearing owner and prodding first
waiter is 'harmless'.
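
The three clauses above can be written down as predicates over the count
word. The sketch below only models exclusive (writer-style) acquisition;
the enum, both function names, and the bit layout are hypothetical and
exist purely to state the rules in checkable form.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, modelled on kernel/locking/rwsem.c. */
#define WRITER_LOCKED	(1L << 0)
#define FLAG_WAITERS	(1L << 1)
#define FLAG_HANDOFF	(1L << 2)
#define READER_BIAS	(1L << 8)
#define READER_MASK	(~(READER_BIAS - 1))

/* Hypothetical classification of who is trying to take the lock. */
enum acquirer {
	ACQ_FASTPATH,		/* plain cmpxchg from an unlocked count */
	ACQ_SPINNER,		/* optimistic spinner, not queued */
	ACQ_OTHER_WAITER,	/* queued, but not at the head */
	ACQ_FIRST_WAITER,	/* head of the wait queue */
};

/* Clause 1: the handoff bit is only meaningful with waiters queued. */
static bool count_is_consistent(long count)
{
	return !(count & FLAG_HANDOFF) || (count & FLAG_WAITERS);
}

/* Clauses 2 and 3: with the handoff bit set, only the first waiter wins. */
static bool may_take_write_lock(long count, enum acquirer who)
{
	assert(count_is_consistent(count));

	/* Still owned by a writer or by readers: nobody can take it. */
	if (count & (WRITER_LOCKED | READER_MASK))
		return false;
	/* The fast path only succeeds from a completely clear count. */
	if (who == ACQ_FASTPATH)
		return count == 0;
	if (count & FLAG_HANDOFF)
		return who == ACQ_FIRST_WAITER;
	return true;
}
```

Under these rules a lock that becomes free while the handoff bit is set
can only ever go to the first waiter, regardless of how long it takes for
that waiter to be prodded.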

> With true lock handoff, there is no need to do a NULL owner spinning
> anymore as wakeup will be performed if handoff is possible. So it
> is likely that the first waiter won't actually go to sleep even when
> schedule() is called in this case.

Right, removing that NULL spinning was the whole purpose -- except I
seem to have forgotten why it was a problem :-)

OK, lemme go read the actual patch.

Hmmm... you made it a wee bit more complicated, instead of my 3rd clause
above, you added a whole intermediate GRANTED state. Why?

Since we fundamentally must deal with the release->wait_lock hole, why
do we need to do the whole rwsem_wake()->GRANTED->*_slowpath() dance?
Why can't we skip the whole rwsem_wake()->GRANTED part and only do
handoff in the slowpath?


Thread overview: 21+ messages
2022-11-18  2:20 [PATCH v6 0/6] lockinig/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
2022-11-18  2:20 ` [PATCH v6 1/6] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
2022-12-16 15:02   ` Jiri Wiesner
2023-01-20 22:58   ` Waiman Long
2022-11-18  2:20 ` [PATCH v6 2/6] locking/rwsem: Disable preemption at all down_read*() and up_read() code paths Waiman Long
2022-12-16 15:03   ` Jiri Wiesner
2022-11-18  2:20 ` [PATCH v6 3/6] locking/rwsem: Disable preemption at all down_write*() and up_write() " Waiman Long
2022-12-16 15:03   ` Jiri Wiesner
2022-11-18  2:20 ` [PATCH v6 4/6] locking/rwsem: Change waiter->hanodff_set to a handoff_state enum Waiman Long
2022-11-18  2:20 ` [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff Waiman Long
2023-01-23 14:59   ` Peter Zijlstra [this message]
2023-01-23 17:30     ` Waiman Long
2023-01-23 22:07       ` Waiman Long
2023-01-24 12:58         ` Peter Zijlstra
2023-01-24 12:29       ` Peter Zijlstra
2023-01-25  1:53         ` Waiman Long
2022-11-18  2:20 ` [PATCH v6 6/6] locking/rwsem: Update handoff lock events tracking Waiman Long
2023-01-17 20:53 ` [PATCH v6 0/6] lockinig/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
2023-01-22 13:46 ` Peter Zijlstra
2023-01-23  3:40   ` Waiman Long
2023-01-23 21:10     ` Waiman Long
