From: tip-bot for Davidlohr Bueso <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: mingo@kernel.org, dave@stgolabs.net, peterz@infradead.org,
tglx@linutronix.de, paulmck@linux.vnet.ibm.com, dbueso@suse.de,
linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
hpa@zytor.com, akpm@linux-foundation.org
Subject: [tip:locking/core] locking/rwsem: Scan the wait_list for readers only once
Date: Thu, 18 Aug 2016 04:01:20 -0700 [thread overview]
Message-ID: <tip-1dc3e13144720bd1b0adbcfac044234c2d09b2a4@git.kernel.org> (raw)
In-Reply-To: <1470384285-32163-4-git-send-email-dave@stgolabs.net>
Commit-ID: 1dc3e13144720bd1b0adbcfac044234c2d09b2a4
Gitweb: http://git.kernel.org/tip/1dc3e13144720bd1b0adbcfac044234c2d09b2a4
Author: Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Fri, 5 Aug 2016 01:04:45 -0700
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 18 Aug 2016 11:35:39 +0200
locking/rwsem: Scan the wait_list for readers only once
When wanting to wakeup readers, __rwsem_mark_wakeup() currently
iterates the wait_list twice while looking to wakeup the first N
queued reader-tasks. While this can be quite inefficient, it was
there such that a awoken reader would be first and foremost
acknowledged by the lock counter.
Keeping the same logic, we can further benefit from the use of
wake_qs and avoid entirely the first wait_list iteration that sets
the counter as wake_up_process() isn't going to occur right away,
and therefore we maintain the counter->list order of going about
things.
Other than saving cycles with O(n) "scanning", this change also
nicely cleans up a good chunk of __rwsem_mark_wakeup(); both
visually and less tedious to read.
For example, the following improvements where seen on some will
it scale microbenchmarks, on a 48-core Haswell:
v4.7 v4.7-rwsem-v1
Hmean signal1-processes-8 5792691.42 ( 0.00%) 5771971.04 ( -0.36%)
Hmean signal1-processes-12 6081199.96 ( 0.00%) 6072174.38 ( -0.15%)
Hmean signal1-processes-21 3071137.71 ( 0.00%) 3041336.72 ( -0.97%)
Hmean signal1-processes-48 3712039.98 ( 0.00%) 3708113.59 ( -0.11%)
Hmean signal1-processes-79 4464573.45 ( 0.00%) 4682798.66 ( 4.89%)
Hmean signal1-processes-110 4486842.01 ( 0.00%) 4633781.71 ( 3.27%)
Hmean signal1-processes-141 4611816.83 ( 0.00%) 4692725.38 ( 1.75%)
Hmean signal1-processes-172 4638157.05 ( 0.00%) 4714387.86 ( 1.64%)
Hmean signal1-processes-203 4465077.80 ( 0.00%) 4690348.07 ( 5.05%)
Hmean signal1-processes-224 4410433.74 ( 0.00%) 4687534.43 ( 6.28%)
Stddev signal1-processes-8 6360.47 ( 0.00%) 8455.31 ( 32.94%)
Stddev signal1-processes-12 4004.98 ( 0.00%) 9156.13 (128.62%)
Stddev signal1-processes-21 3273.14 ( 0.00%) 5016.80 ( 53.27%)
Stddev signal1-processes-48 28420.25 ( 0.00%) 26576.22 ( -6.49%)
Stddev signal1-processes-79 22038.34 ( 0.00%) 18992.70 (-13.82%)
Stddev signal1-processes-110 23226.93 ( 0.00%) 17245.79 (-25.75%)
Stddev signal1-processes-141 6358.98 ( 0.00%) 7636.14 ( 20.08%)
Stddev signal1-processes-172 9523.70 ( 0.00%) 4824.75 (-49.34%)
Stddev signal1-processes-203 13915.33 ( 0.00%) 9326.33 (-32.98%)
Stddev signal1-processes-224 15573.94 ( 0.00%) 10613.82 (-31.85%)
Other runs that saw improvements include context_switch and pipe; and
as expected, this is particularly highlighted on larger thread counts
as it becomes more expensive to walk the list twice.
No change in wakeup ordering or semantics.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman.Long@hp.com
Cc: dave@stgolabs.net
Cc: jason.low2@hpe.com
Cc: wanpeng.li@hotmail.com
Link: http://lkml.kernel.org/r/1470384285-32163-4-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/locking/rwsem-xadd.c | 58 ++++++++++++++++++++-------------------------
1 file changed, 26 insertions(+), 32 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index e02fe32..2337b4b 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -125,12 +125,14 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
enum rwsem_wake_type wake_type,
struct wake_q_head *wake_q)
{
- struct rwsem_waiter *waiter;
- struct task_struct *tsk;
- struct list_head *next;
- long loop, oldcount, woken = 0, adjustment = 0;
+ struct rwsem_waiter *waiter, *tmp;
+ long oldcount, woken = 0, adjustment = 0;
- waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
+ /*
+ * Take a peek at the queue head waiter such that we can determine
+ * the wakeup(s) to perform.
+ */
+ waiter = list_first_entry(&sem->wait_list, struct rwsem_waiter, list);
if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
if (wake_type == RWSEM_WAKE_ANY) {
@@ -180,36 +182,21 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
/*
* Grant an infinite number of read locks to the readers at the front
- * of the queue. Note we increment the 'active part' of the count by
- * the number of readers before waking any processes up.
+ * of the queue. We know that woken will be at least 1 as we accounted
+ * for above. Note we increment the 'active part' of the count by the
+ * number of readers before waking any processes up.
*/
- do {
- woken++;
+ list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
+ struct task_struct *tsk;
- if (waiter->list.next == &sem->wait_list)
+ if (waiter->type == RWSEM_WAITING_FOR_WRITE)
break;
- waiter = list_entry(waiter->list.next,
- struct rwsem_waiter, list);
-
- } while (waiter->type != RWSEM_WAITING_FOR_WRITE);
-
- adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
- if (waiter->type != RWSEM_WAITING_FOR_WRITE)
- /* hit end of list above */
- adjustment -= RWSEM_WAITING_BIAS;
-
- if (adjustment)
- atomic_long_add(adjustment, &sem->count);
-
- next = sem->wait_list.next;
- loop = woken;
- do {
- waiter = list_entry(next, struct rwsem_waiter, list);
- next = waiter->list.next;
+ woken++;
tsk = waiter->task;
wake_q_add(wake_q, tsk);
+ list_del(&waiter->list);
/*
* Ensure that the last operation is setting the reader
* waiter to nil such that rwsem_down_read_failed() cannot
@@ -217,10 +204,16 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
* to the task to wakeup.
*/
smp_store_release(&waiter->task, NULL);
- } while (--loop);
+ }
- sem->wait_list.next = next;
- next->prev = &sem->wait_list;
+ adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+ if (list_empty(&sem->wait_list)) {
+ /* hit end of list above */
+ adjustment -= RWSEM_WAITING_BIAS;
+ }
+
+ if (adjustment)
+ atomic_long_add(adjustment, &sem->count);
}
/*
@@ -245,7 +238,8 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
/* we're now waiting on the lock, but no longer actively locking */
count = atomic_long_add_return(adjustment, &sem->count);
- /* If there are no active locks, wake the front queued process(es).
+ /*
+ * If there are no active locks, wake the front queued process(es).
*
* If there are no writers and we are first in the queue,
* wake our own waiter to join the existing active readers !
next prev parent reply other threads:[~2016-08-18 11:02 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-05 8:04 [PATCH 0/3] locking/rwsem: __rwsem_mark_wake() improvements Davidlohr Bueso
2016-08-05 8:04 ` [PATCH 1/3] locking/rwsem: Return void in __rwsem_mark_wake() Davidlohr Bueso
2016-08-18 11:00 ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2016-08-18 13:41 ` tip-bot for Davidlohr Bueso
2016-08-05 8:04 ` [PATCH 2/3] locking/rwsem: Remove a few useless comments Davidlohr Bueso
2016-08-18 11:00 ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2016-08-18 13:42 ` tip-bot for Davidlohr Bueso
2016-08-05 8:04 ` [PATCH 3/3] locking/rwsem: Scan the wait_list for readers only once Davidlohr Bueso
2016-08-06 6:05 ` Ingo Molnar
2016-08-08 16:37 ` Davidlohr Bueso
2016-08-18 11:01 ` tip-bot for Davidlohr Bueso [this message]
2016-08-18 13:42 ` [tip:locking/core] " tip-bot for Davidlohr Bueso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=tip-1dc3e13144720bd1b0adbcfac044234c2d09b2a4@git.kernel.org \
--to=tipbot@zytor.com \
--cc=akpm@linux-foundation.org \
--cc=dave@stgolabs.net \
--cc=dbueso@suse.de \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tip-commits@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.