Re: [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Waiman Long <longman@redhat.com>
Cc: "Ingo Molnar" <mingo@redhat.com>, "Will Deacon" <will@kernel.org>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
	"Hillf Danton" <hdanton@sina.com>,
	"Mukesh Ojha" <quic_mojha@quicinc.com>,
	"Ting11 Wang 王婷" <wangting11@xiaomi.com>
Subject: Re: [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff
Date: Mon, 23 Jan 2023 15:59:26 +0100	[thread overview]
Message-ID: <Y86gzkVHlsOTY8QL@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20221118022016.462070-6-longman@redhat.com>

On Thu, Nov 17, 2022 at 09:20:15PM -0500, Waiman Long wrote:
> The lock handoff provided in rwsem isn't a true handoff like that in
> the mutex. Instead, it is more like a quiescent state where optimistic
> spinning and lock stealing are disabled to make it easier for the first
> waiter to acquire the lock.
> 
> Reworking the code to enable a true lock handoff is more complex due to
> the following facts:
>  1) The RWSEM_FLAG_HANDOFF bit is protected by the wait_lock and it
>     is too expensive to always take the wait_lock in the unlock path
>     to prevent racing.

Specifically, the issue is that we'd need to turn the
atomic_long_add_return_release() into an atomic_try_cmpxchg_release()
loop, like:

	tmp = atomic_long_read(&sem->count);
	do {
		if (tmp & (WAITERS|HANDOFF))
			return slow_unock();
	} while (atomic_long_try_cmpxchg_release(&sem->count, &tmp,
						 tmp - RWSEM_{READER_BIAS,WRITE_LOCKED});

in order to not race with a concurrent setting of the HANDOFF bit,
right? And we don't like to turn unlock into a cmpxchg loop.

(OTOH we sorta do this for mutex, unconteded mutex has cmpxchg lock and
unlock, any fail and we go to the slow path -- I suppose the distinct
difference is that we sorta expect some contention on the read side)

>  2) The reader lock fast path may add a RWSEM_READER_BIAS at the wrong
>     time to prevent a proper lock handoff from a reader owned rwsem.

This would be much the same, right? We'd have to turn
rwsem_read_trylock() into a cmpxchg-loop and we don't like that.
Therefore we do that speculative add and fix up later.

Now, I'm not enturely sure what issue you allude to here; is the problem
that you can't quite tell when the last reader is gone?

> A lock handoff can only be initiated when the following conditions are
> true:
>  1) The RWSEM_FLAG_HANDOFF bit is set.

d'uh ;-)

>  2) The task to do the handoff don't see any other active lock
>     excluding the lock that it might have held.

2) here is the 2) above, right?

> The new handoff mechanism performs handoff in rwsem_wakeup() to minimize
> overhead. The rwsem count will be known at that point to determine if
> handoff should be done. However, there is a small time gap between the
> rwsem becomes free and the wait_lock is taken

Right, that's between atomic_long_fetch_add_release() and calling the
slow path because WAITERS bit is set.

> where a reader can come in and add a RWSEM_READER_BIAS to the count or

Both 2s above.

> the current first waiter can take the rwsem and clear
> RWSEM_FLAG_HANDOFF in the interim.

The actual intended action.

> That will fail the handoff operation.

I would not list that latter as a failure, it's exactly what we want to
happen, no?

> To handle the former case, a secondary handoff will also be done in
> the rwsem_down_read_slowpath() to catch it.

Right. In short:

Having HANDOVER set:
 - implies WAITERS set
 - disables all fastpaths (spinning or otherwise)
 - dis-allows anybody except first waiter to obtain lock

Therefore, having the window between clearing owner and prodding first
waiter is 'harmless'.

> With true lock handoff, there is no need to do a NULL owner spinning
> anymore as wakeup will be performed if handoff is possible. So it
> is likely that the first waiter won't actually go to sleep even when
> schedule() is called in this case.

Right, removing that NULL spinning was the whole purpose -- except I
seem to have forgotten why it was a problem :-)

OK, lemme go read the actual patch.

Hmmm... you made it a wee bit more complicated, instead of my 3rd clause
above, you added a whole intermediate GRANTED state. Why?

Since we fundamentally must deal with the release->wait_lock hole, why
do we need to do the whole rwsem_wake()->GRANTED->*_slowpath() dance?
Why can't we skip the whole rwsem_wake()->GRANTED part and only do
handoff in the slowpath?

next prev parent reply	other threads:[~2023-01-23 15:00 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-18  2:20 [PATCH v6 0/6] lockinig/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
2022-11-18  2:20 ` [PATCH v6 1/6] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
2022-12-16 15:02   ` Jiri Wiesner
2023-01-20 22:58   ` Waiman Long
2022-11-18  2:20 ` [PATCH v6 2/6] locking/rwsem: Disable preemption at all down_read*() and up_read() code paths Waiman Long
2022-12-16 15:03   ` Jiri Wiesner
2022-11-18  2:20 ` [PATCH v6 3/6] locking/rwsem: Disable preemption at all down_write*() and up_write() " Waiman Long
2022-12-16 15:03   ` Jiri Wiesner
2022-11-18  2:20 ` [PATCH v6 4/6] locking/rwsem: Change waiter->hanodff_set to a handoff_state enum Waiman Long
2022-11-18  2:20 ` [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff Waiman Long
2023-01-23 14:59   ` Peter Zijlstra [this message]
2023-01-23 17:30     ` Waiman Long
2023-01-23 22:07       ` Waiman Long
2023-01-24 12:58         ` Peter Zijlstra
2023-01-24 12:29       ` Peter Zijlstra
2023-01-25  1:53         ` Waiman Long
2022-11-18  2:20 ` [PATCH v6 6/6] locking/rwsem: Update handoff lock events tracking Waiman Long
2023-01-17 20:53 ` [PATCH v6 0/6] lockinig/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
2023-01-22 13:46 ` Peter Zijlstra
2023-01-23  3:40   ` Waiman Long
2023-01-23 21:10     ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y86gzkVHlsOTY8QL@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=boqun.feng@gmail.com \
    --cc=hdanton@sina.com \
    --cc=john.p.donnelly@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=quic_mojha@quicinc.com \
    --cc=wangting11@xiaomi.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.