public inbox for linux-kernel@vger.kernel.org
From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
	"Hillf Danton" <hdanton@sina.com>,
	"Mukesh Ojha" <quic_mojha@quicinc.com>,
	"Ting11 Wang 王婷" <wangting11@xiaomi.com>,
	"Waiman Long" <longman@redhat.com>
Subject: [PATCH v7 4/4] locking/rwsem: Enable direct rwsem lock handoff
Date: Wed, 25 Jan 2023 19:36:28 -0500	[thread overview]
Message-ID: <20230126003628.365092-5-longman@redhat.com> (raw)
In-Reply-To: <20230126003628.365092-1-longman@redhat.com>

The lock handoff provided in rwsem isn't a true handoff like that in
the mutex. Instead, it is more like a quiescent state where optimistic
spinning and lock stealing are disabled to make it easier for the first
waiter to acquire the lock.

For mutex, lock handoff is done at unlock time as the owner value and
the handoff bit are in the same lock word and can be updated atomically.

That is not the case for rwsem, which has a count value for locking and
a separate owner value for storing the lock owner. In addition, the handoff
processing differs depending on whether the first waiter is a writer or
a reader, and we can only determine the waiter type after acquiring the
wait_lock. Together with the fact that the RWSEM_FLAG_HANDOFF bit is
stable while holding the wait_lock, the most convenient place to do the
handoff is in rwsem_wake(), where the wait_lock has to be acquired anyway.

Since a lot can happen between unlock time and the acquisition of the
wait_lock in rwsem_wake(), we have to reconfirm that the handoff bit is
set and the lock is free before doing the handoff. Handing off to a
reader is already handled well by rwsem_mark_wake(); we don't need to do
anything extra other than disabling optimistic spinning. For a writer,
additional code is added to pass lock ownership to it. The waiter is
removed from the wait queue and waiter->task is cleared in this case to
signal that the handoff has happened. This is similar to what
rwsem_mark_wake() does for readers whether a handoff has happened
or not.

Running a 96-thread rwsem locking test on a 96-thread x86-64 system,
the locking throughput increases slightly from 588 kops/s to 592 kops/s
with this change.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 74 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 63 insertions(+), 11 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index acb5a50309a1..2cf1e0bfdaa5 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -40,7 +40,7 @@
  *
  * When the rwsem is reader-owned and a spinning writer has timed out,
  * the nonspinnable bit will be set to disable optimistic spinning.
-
+ *
  * When a writer acquires a rwsem, it puts its task_struct pointer
  * into the owner field. It is cleared after an unlock.
  *
@@ -430,6 +430,10 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 			 * Mark writer at the front of the queue for wakeup.
 			 * Until the task is actually later awoken later by
 			 * the caller, other writers are able to steal it.
+			 *
+			 * *Unless* HANDOFF is set, in which case only the
+			 * first waiter is allowed to take it.
+			 *
 			 * Readers, on the other hand, will block as they
 			 * will notice the queued writer.
 			 */
@@ -467,7 +471,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 					adjustment -= RWSEM_FLAG_HANDOFF;
 					lockevent_inc(rwsem_rlock_handoff);
 				}
+				/*
+				 * With HANDOFF set for reader, we must
+				 * terminate all spinning.
+				 */
 				waiter->handoff_set = true;
+				rwsem_set_nonspinnable(sem);
 			}
 
 			atomic_long_add(-adjustment, &sem->count);
@@ -609,6 +618,12 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 
 	lockdep_assert_held(&sem->wait_lock);
 
+	if (!waiter->task) {
+		/* Write lock handed off */
+		smp_acquire__after_ctrl_dep();
+		return true;
+	}
+
 	count = atomic_long_read(&sem->count);
 	do {
 		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
@@ -754,6 +769,10 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
 
 	owner = rwsem_owner_flags(sem, &flags);
 	state = rwsem_owner_state(owner, flags);
+
+	if (owner == current)
+		return OWNER_NONSPINNABLE;	/* Handoff granted */
+
 	if (state != OWNER_WRITER)
 		return state;
 
@@ -844,7 +863,6 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * Try to acquire the lock
 		 */
 		taken = rwsem_try_write_lock_unqueued(sem);
-
 		if (taken)
 			break;
 
@@ -1168,21 +1186,23 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		 * without sleeping.
 		 */
 		if (waiter.handoff_set) {
-			enum owner_state owner_state;
-
-			owner_state = rwsem_spin_on_owner(sem);
-			if (owner_state == OWNER_NULL)
-				goto trylock_again;
+			rwsem_spin_on_owner(sem);
+			if (!READ_ONCE(waiter.task)) {
+				/* Write lock handed off */
+				smp_acquire__after_ctrl_dep();
+				set_current_state(TASK_RUNNING);
+				goto out;
+			}
 		}
 
 		schedule_preempt_disabled();
 		lockevent_inc(rwsem_sleep_writer);
 		set_current_state(state);
-trylock_again:
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
 	__set_current_state(TASK_RUNNING);
 	raw_spin_unlock_irq(&sem->wait_lock);
+out:
 	lockevent_inc(rwsem_wlock);
 	trace_contention_end(sem, 0);
 	return sem;
@@ -1190,6 +1210,11 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 out_nolock:
 	__set_current_state(TASK_RUNNING);
 	raw_spin_lock_irq(&sem->wait_lock);
+	if (!waiter.task) {
+		smp_acquire__after_ctrl_dep();
+		raw_spin_unlock_irq(&sem->wait_lock);
+		goto out;
+	}
 	rwsem_del_wake_waiter(sem, &waiter, &wake_q);
 	lockevent_inc(rwsem_wlock_fail);
 	trace_contention_end(sem, -EINTR);
@@ -1202,14 +1227,41 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
  */
 static struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 {
-	unsigned long flags;
 	DEFINE_WAKE_Q(wake_q);
+	unsigned long flags;
+	unsigned long count;
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
-	if (!list_empty(&sem->wait_list))
-		rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
+	if (list_empty(&sem->wait_list))
+		goto unlock_out;
+
+	/*
+	 * If the rwsem is free and handoff flag is set with wait_lock held,
+	 * no other CPUs can take an active lock.
+	 */
+	count = atomic_long_read(&sem->count);
+	if (!(count & RWSEM_LOCK_MASK) && (count & RWSEM_FLAG_HANDOFF)) {
+		/*
+		 * Since rwsem_mark_wake() will handle the handoff to reader
+		 * properly, we don't need to do anything extra for reader.
+		 * Special handoff processing will only be needed for writer.
+		 */
+		struct rwsem_waiter *waiter = rwsem_first_waiter(sem);
+		long adj = RWSEM_WRITER_LOCKED - RWSEM_FLAG_HANDOFF;
+
+		if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
+			atomic_long_set(&sem->owner, (long)waiter->task);
+			atomic_long_add(adj, &sem->count);
+			wake_q_add(&wake_q, waiter->task);
+			rwsem_del_waiter(sem, waiter);
+			waiter->task = NULL;	/* Signal the handoff */
+			goto unlock_out;
+		}
+	}
+	rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
+unlock_out:
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 	wake_up_q(&wake_q);
 
-- 
2.31.1



Thread overview: 17+ messages
2023-01-26  0:36 [PATCH v7 0/4] lockinig/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
2023-01-26  0:36 ` [PATCH v7 1/4] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
2023-01-26 11:38   ` [tip: locking/core] " tip-bot2 for Waiman Long
2023-01-26  0:36 ` [PATCH v7 2/4] locking/rwsem: Disable preemption at all down_read*() and up_read() code paths Waiman Long
2023-01-26 11:38   ` [tip: locking/core] locking/rwsem: Disable preemption in " tip-bot2 for Waiman Long
2023-01-26  0:36 ` [PATCH v7 3/4] locking/rwsem: Disable preemption at all down_write*() and up_write() " Waiman Long
2023-01-26 11:38   ` [tip: locking/core] locking/rwsem: Disable preemption in " tip-bot2 for Waiman Long
2023-01-26  0:36 ` Waiman Long [this message]
2023-02-12 13:33   ` [PATCH v7 4/4] locking/rwsem: Enable direct rwsem lock handoff kernel test robot
2023-02-13 12:31   ` Peter Zijlstra
2023-02-13 17:14     ` Waiman Long
2023-02-13 21:52       ` Peter Zijlstra
2023-02-14  2:03         ` Waiman Long
2023-02-23  1:44   ` kernel test robot
2023-01-26 12:46 ` [PATCH v7 0/4] lockinig/rwsem: Fix rwsem bugs & enable true " Ingo Molnar
2023-01-26 13:17   ` Waiman Long
     [not found] ` <20230126100441.4823-1-hdanton@sina.com>
2023-01-26 12:59   ` [PATCH v7 4/4] locking/rwsem: Enable direct rwsem " Waiman Long
