From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>
Cc: linux-kernel@vger.kernel.org, Waiman Long <longman@redhat.com>
Subject: [PATCH v2 8/8] locking/rwsem: Restore old write lock slow path behavior
Date: Mon, 27 Mar 2023 16:24:13 -0400
Message-ID: <20230327202413.1955856-9-longman@redhat.com>
In-Reply-To: <20230327202413.1955856-1-longman@redhat.com>

An earlier commit ("locking/rwsem: Rework writer wakeup") moved writer
lock acquisition to the write wakeup path only. This can result in
an almost immediate transfer of write lock ownership after an unlock,
leaving little time for lock stealing. The transfer can happen long
before the new write lock owner wakes up and runs its critical section.

As a result, write lock stealing from optimistic spinning is greatly
suppressed. By enabling CONFIG_LOCK_EVENT_COUNTS and running a rwsem
locking microbenchmark on a 2-socket x86-64 test machine for 15s, it
was found that the locking rate dropped to about 30% of what it was
before that commit (171,184 ops/s vs. 584,091 ops/s). The total number
of lock steals within the testing period dropped to less than 1% of
what it was before that commit (17,939 vs. 4,252,147) [1].

To restore the lost performance, this patch restores the old write lock
slow path behavior of acquiring the lock after the waiter has been woken
up, with the exception that lock transfer may still happen in the wakeup
path if the HANDOFF bit has been set. In addition, the waiter that sets
the HANDOFF bit will still try to spin on the lock owner if it is on CPU.

With this patch, the locking rate is back up to 580,256 ops/s, which
is almost the same as before.

[1] https://lore.kernel.org/lkml/c126f079-88a2-4067-6f94-82f51cf5ff2b@redhat.com/

Signed-off-by: Waiman Long <longman@redhat.com>
---
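Illustration only, not part of the commit: a minimal userspace sketch of
the gate this patch adds at the top of the rwsem_try_write_lock() retry
loop. It assumes C11 atomics in place of the kernel's atomic_long_t API.
The MODEL_* constants and model_try_write_lock() are hypothetical
stand-ins, and the branch that sets the HANDOFF bit when the lock is
held is deliberately left out, so the sketch shows only which waiter may
take a free lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rwsem count bits; not the kernel headers. */
#define MODEL_WRITER_LOCKED	(1L << 0)
#define MODEL_FLAG_WAITERS	(1L << 1)
#define MODEL_FLAG_HANDOFF	(1L << 2)
#define MODEL_READER_BIAS	(1L << 8)
#define MODEL_READER_MASK	(~(MODEL_READER_BIAS - 1))
#define MODEL_LOCK_MASK		(MODEL_WRITER_LOCKED | MODEL_READER_MASK)

/*
 * Simplified model of the write trylock done by a queued waiter.
 * 'first' says whether the caller is at the head of the wait queue.
 * Returns true if the caller now owns the write lock.
 */
static bool model_try_write_lock(atomic_long *count_p, bool first)
{
	long count = atomic_load_explicit(count_p, memory_order_relaxed);
	long new;

	do {
		/*
		 * Gate modelled on the patch: a waiter that is not first
		 * gives up when the lock is held or a handoff is pending.
		 */
		if (!first && (count & (MODEL_FLAG_HANDOFF | MODEL_LOCK_MASK)))
			return false;

		/* Simplification: no handoff request, just fail if held. */
		if (count & MODEL_LOCK_MASK)
			return false;

		/* Lock is free: take it and clear any pending handoff. */
		new = (count | MODEL_WRITER_LOCKED) & ~MODEL_FLAG_HANDOFF;
	} while (!atomic_compare_exchange_weak_explicit(count_p, &count, new,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}
```

In this model, with the count at MODEL_FLAG_WAITERS | MODEL_FLAG_HANDOFF
and the lock free, a call with first == false fails while a call with
first == true succeeds and clears the HANDOFF bit. With no HANDOFF bit
set and the lock free, either waiter can take the lock.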
kernel/locking/rwsem.c | 34 +++++++++++++++++++++++++++++++---
1 file changed, 31 insertions(+), 3 deletions(-)
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 7bd26e64827a..cf9dc1e250e0 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -426,6 +426,7 @@ rwsem_waiter_wake(struct rwsem_waiter *waiter, struct wake_q_head *wake_q)
static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
struct rwsem_waiter *waiter)
{
+ bool first = (rwsem_first_waiter(sem) == waiter);
long count, new;
lockdep_assert_held(&sem->wait_lock);
@@ -434,6 +435,9 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
do {
new = count;
+ if (!first && (count & (RWSEM_FLAG_HANDOFF | RWSEM_LOCK_MASK)))
+ return false;
+
if (count & RWSEM_LOCK_MASK) {
/*
* A waiter (first or not) can set the handoff bit
@@ -501,11 +505,18 @@ static void rwsem_writer_wake(struct rw_semaphore *sem,
*/
list_del(&waiter->list);
atomic_long_set(&sem->owner, (long)waiter->task);
-
- } else if (!rwsem_try_write_lock(sem, waiter))
+ rwsem_waiter_wake(waiter, wake_q);
return;
+ }
- rwsem_waiter_wake(waiter, wake_q);
+ /*
+ * Mark writer at the front of the queue for wakeup.
+ * Until the task is actually awoken later by the caller, other
+ * writers are able to steal it. Readers, on the other hand, will
+ * block as they will notice the queued writer.
+ */
+ wake_q_add(wake_q, waiter->task);
+ lockevent_inc(rwsem_wake_writer);
}
static void rwsem_reader_wake(struct rw_semaphore *sem,
@@ -1038,6 +1049,23 @@ rwsem_waiter_wait(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
/* Matches rwsem_waiter_wake()'s smp_store_release(). */
break;
}
+ if (!reader) {
+ /*
+ * Writer still needs to do a trylock here
+ */
+ raw_spin_lock_irq(&sem->wait_lock);
+ if (waiter->task && rwsem_try_write_lock(sem, waiter))
+ waiter->task = NULL;
+ raw_spin_unlock_irq(&sem->wait_lock);
+ if (!smp_load_acquire(&waiter->task))
+ break;
+
+ if (waiter->handoff_set) {
+ rwsem_spin_on_owner(sem);
+ if (!smp_load_acquire(&waiter->task))
+ break;
+ }
+ }
if (signal_pending_state(state, current)) {
raw_spin_lock_irq(&sem->wait_lock);
if (waiter->task)
--
2.31.1