* [PATCH v4 0/2] locking/rwsem: optimize rwsem_wakeup()
@ 2015-04-30 21:12 Waiman Long
2015-04-30 21:12 ` [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write Waiman Long
2015-04-30 21:12 ` [PATCH v4 2/2] locking/rwsem: check for active writer before wakeup Waiman Long
0 siblings, 2 replies; 8+ messages in thread
From: Waiman Long @ 2015-04-30 21:12 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, Jason Low, Davidlohr Bueso, Scott J Norton,
Douglas Hatch, Waiman Long
v3->v4:
- Break out the active writer check into a separate patch and move
it from __rwsem_do_wake() to rwsem_wake().
- Use smp_rmb() instead of the incorrect smp_mb__after_atomic() as
suggested by PeterZ.
v2->v3:
- Fix errors in commit log.
v1->v2:
- Add a memory barrier before calling spin_trylock for proper memory
ordering.
This patch set aims to reduce spinlock contention on the wait_lock
caused by excessive activity in the rwsem_wake() code path. This, in
turn, reduces up_write()/up_read() latency and improves performance
when the rwsem is heavily contended.
On an 8-socket Westmere-EX server (80 cores, HT off), running AIM7's
high_systime workload (1000 users) on a vanilla 4.0 kernel produced
the following perf profile for spinlock contention:
9.23% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|--97.39%-- rwsem_wake
|--0.69%-- try_to_wake_up
|--0.52%-- release_pages
--1.40%-- [...]
1.70% reaim [kernel.kallsyms] [k] _raw_spin_lock_irq
|--96.61%-- rwsem_down_write_failed
|--2.03%-- __schedule
|--0.50%-- run_timer_softirq
--0.86%-- [...]
Here the contended rwsems are the mmap_sem (mm_struct) and the
i_mmap_rwsem (address_space) with mostly write locking. With a
patched 4.0 kernel, the perf profile became:
1.87% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|--87.64%-- rwsem_wake
|--2.80%-- release_pages
|--2.56%-- try_to_wake_up
|--1.10%-- __wake_up
|--1.06%-- pagevec_lru_move_fn
|--0.93%-- prepare_to_wait_exclusive
|--0.71%-- free_pid
|--0.58%-- get_page_from_freelist
|--0.57%-- add_device_randomness
--2.04%-- [...]
0.80% reaim [kernel.kallsyms] [k] _raw_spin_lock_irq
|--92.49%-- rwsem_down_write_failed
|--4.24%-- __schedule
|--1.37%-- run_timer_softirq
--1.91%-- [...]
The table below shows the % improvement in throughput (1100-2000 users)
in the various AIM7 workloads:
Workload % increase in throughput
-------- ------------------------
custom 3.8%
five-sec 3.5%
fserver 4.1%
high_systime 22.2%
shared 2.1%
short 10.1%
Waiman Long (2):
locking/rwsem: reduce spinlock contention in wakeup after
up_read/up_write
locking/rwsem: check for active writer before wakeup
include/linux/osq_lock.h | 5 +++
kernel/locking/rwsem-xadd.c | 65 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 68 insertions(+), 2 deletions(-)
* [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write
  2015-04-30 21:12 [PATCH v4 0/2] locking/rwsem: optimize rwsem_wakeup() Waiman Long
@ 2015-04-30 21:12 ` Waiman Long
  2015-04-30 21:21   ` Jason Low
                     ` (3 more replies)
  2015-04-30 21:12 ` [PATCH v4 2/2] locking/rwsem: check for active writer before wakeup Waiman Long
  1 sibling, 4 replies; 8+ messages in thread
From: Waiman Long @ 2015-04-30 21:12 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, Jason Low, Davidlohr Bueso, Scott J Norton,
	Douglas Hatch, Waiman Long

In up_write()/up_read(), rwsem_wake() will be called whenever it
detects that some writers/readers are waiting. The rwsem_wake()
function will take the wait_lock and call __rwsem_do_wake() to do
the real wakeup. For a heavily contended rwsem, doing a spin_lock()
on wait_lock will cause further contention on the heavily contended
rwsem cacheline resulting in delay in the completion of the
up_read/up_write operations.

This patch makes the wait_lock taking and the call to __rwsem_do_wake()
optional if at least one spinning writer is present. The spinning
writer will be able to take the rwsem and call rwsem_wake() later
when it calls up_write(). With the presence of a spinning writer,
rwsem_wake() will now try to acquire the lock using trylock. If that
fails, it will just quit.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/osq_lock.h    |  5 ++++
 kernel/locking/rwsem-xadd.c | 44 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 3a6490e..703ea5c 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -32,4 +32,9 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
 extern bool osq_lock(struct optimistic_spin_queue *lock);
 extern void osq_unlock(struct optimistic_spin_queue *lock);
 
+static inline bool osq_is_locked(struct optimistic_spin_queue *lock)
+{
+	return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
+}
+
 #endif
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2f7cc40..2bb25e2 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -391,11 +391,24 @@ done:
 	return taken;
 }
 
+/*
+ * Return true if the rwsem has active spinner
+ */
+static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
+{
+	return osq_is_locked(&sem->osq);
+}
+
 #else
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
 	return false;
 }
+
+static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
+{
+	return false;
+}
 #endif
 
 /*
@@ -478,7 +491,38 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 {
 	unsigned long flags;
 
+	/*
+	 * If a spinner is present, it is not necessary to do the wakeup.
+	 * Try to do wakeup only if the trylock succeeds to minimize
+	 * spinlock contention which may introduce too much delay in the
+	 * unlock operation.
+	 *
+	 *       spinning writer                up_write/up_read caller
+	 *       ---------------                -----------------------
+	 * [S]   osq_unlock()                   [L]   osq
+	 *       MB                                   RMB
+	 * [RmW] rwsem_try_write_lock()         [RmW] spin_trylock(wait_lock)
+	 *
+	 * Here, it is important to make sure that there won't be a missed
+	 * wakeup while the rwsem is free and the only spinning writer goes
+	 * to sleep without taking the rwsem. Even when the spinning writer
+	 * is just going to break out of the waiting loop, it will still do
+	 * a trylock in rwsem_down_write_failed() before sleeping. IOW, if
+	 * rwsem_has_spinner() is true, it will guarantee at least one
+	 * trylock attempt on the rwsem later on.
+	 */
+	if (rwsem_has_spinner(sem)) {
+		/*
+		 * The smp_rmb() here is to make sure that the spinner
+		 * state is consulted before reading the wait_lock.
+		 */
+		smp_rmb();
+		if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
+			return sem;
+		goto locked;
+	}
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
+locked:
 
 	/* do nothing if list empty */
 	if (!list_empty(&sem->wait_list))
-- 
1.7.1
* Re: [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write
  2015-04-30 21:12 ` [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write Waiman Long
@ 2015-04-30 21:21   ` Jason Low
  2015-05-01 10:14   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Jason Low @ 2015-04-30 21:21 UTC (permalink / raw)
To: Waiman Long
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Scott J Norton, Douglas Hatch, jason.low2

On Thu, 2015-04-30 at 17:12 -0400, Waiman Long wrote:
> In up_write()/up_read(), rwsem_wake() will be called whenever it
> detects that some writers/readers are waiting. The rwsem_wake()
> function will take the wait_lock and call __rwsem_do_wake() to do the
> real wakeup. For a heavily contended rwsem, doing a spin_lock() on
> wait_lock will cause further contention on the heavily contended rwsem
> cacheline resulting in delay in the completion of the up_read/up_write
> operations.
>
> This patch makes the wait_lock taking and the call to __rwsem_do_wake()
> optional if at least one spinning writer is present. The spinning
> writer will be able to take the rwsem and call rwsem_wake() later
> when it calls up_write(). With the presence of a spinning writer,
> rwsem_wake() will now try to acquire the lock using trylock. If that
> fails, it will just quit.
>
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Jason Low <jason.low2@hp.com>
* Re: [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write
  2015-04-30 21:12 ` [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write Waiman Long
  2015-04-30 21:21   ` Jason Low
@ 2015-05-01 10:14   ` Peter Zijlstra
  2015-05-06 11:18   ` Davidlohr Bueso
  2015-05-08 13:24   ` [tip:locking/core] locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write() tip-bot for Waiman Long
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2015-05-01 10:14 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, linux-kernel, Jason Low, Davidlohr Bueso,
	Scott J Norton, Douglas Hatch

On Thu, Apr 30, 2015 at 05:12:16PM -0400, Waiman Long wrote:
> In up_write()/up_read(), rwsem_wake() will be called whenever it
> detects that some writers/readers are waiting. The rwsem_wake()
> function will take the wait_lock and call __rwsem_do_wake() to do the
> real wakeup. For a heavily contended rwsem, doing a spin_lock() on
> wait_lock will cause further contention on the heavily contended rwsem
> cacheline resulting in delay in the completion of the up_read/up_write
> operations.
>
> This patch makes the wait_lock taking and the call to __rwsem_do_wake()
> optional if at least one spinning writer is present. The spinning
> writer will be able to take the rwsem and call rwsem_wake() later
> when it calls up_write(). With the presence of a spinning writer,
> rwsem_wake() will now try to acquire the lock using trylock. If that
> fails, it will just quit.
>
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Thanks!
* Re: [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write
  2015-04-30 21:12 ` [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write Waiman Long
  2015-04-30 21:21   ` Jason Low
  2015-05-01 10:14   ` Peter Zijlstra
@ 2015-05-06 11:18   ` Davidlohr Bueso
  2015-05-06 11:20     ` Davidlohr Bueso
  2015-05-08 13:24   ` [tip:locking/core] locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write() tip-bot for Waiman Long
  3 siblings, 1 reply; 8+ messages in thread
From: Davidlohr Bueso @ 2015-05-06 11:18 UTC (permalink / raw)
To: Waiman Long
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Jason Low,
	Scott J Norton, Douglas Hatch

On Thu, 2015-04-30 at 17:12 -0400, Waiman Long wrote:
> In up_write()/up_read(), rwsem_wake() will be called whenever it
> detects that some writers/readers are waiting. The rwsem_wake()
> function will take the wait_lock and call __rwsem_do_wake() to do the
> real wakeup. For a heavily contended rwsem, doing a spin_lock() on
> wait_lock will cause further contention on the heavily contended rwsem
> cacheline resulting in delay in the completion of the up_read/up_write
> operations.
>
> This patch makes the wait_lock taking and the call to __rwsem_do_wake()
> optional if at least one spinning writer is present. The spinning
> writer will be able to take the rwsem and call rwsem_wake() later
> when it calls up_write(). With the presence of a spinning writer,
> rwsem_wake() will now try to acquire the lock using trylock. If that
> fails, it will just quit.
>
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
* Re: [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write
  2015-05-06 11:18 ` Davidlohr Bueso
@ 2015-05-06 11:20   ` Davidlohr Bueso
  0 siblings, 0 replies; 8+ messages in thread
From: Davidlohr Bueso @ 2015-05-06 11:20 UTC (permalink / raw)
To: Waiman Long
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Jason Low,
	Scott J Norton, Douglas Hatch

On Wed, 2015-05-06 at 04:18 -0700, Davidlohr Bueso wrote:
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

A nit, but it would be useful if the benchmark/perf numbers were also
in this changelog, for future references.

Thanks,
Davidlohr
* [tip:locking/core] locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write()
  2015-04-30 21:12 ` [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write Waiman Long
                    ` (2 preceding siblings ...)
  2015-05-06 11:18 ` Davidlohr Bueso
@ 2015-05-08 13:24 ` tip-bot for Waiman Long
  3 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Waiman Long @ 2015-05-08 13:24 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, akpm, hpa, torvalds, bp, mingo, doug.hatch, linux-kernel,
	peterz, dave, scott.norton, jason.low2, Waiman.Long

Commit-ID:  59aabfc7e959f5f213e4e5cc7567ab4934da2adf
Gitweb:     http://git.kernel.org/tip/59aabfc7e959f5f213e4e5cc7567ab4934da2adf
Author:     Waiman Long <Waiman.Long@hp.com>
AuthorDate: Thu, 30 Apr 2015 17:12:16 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 8 May 2015 12:27:59 +0200

locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write()

In up_write()/up_read(), rwsem_wake() will be called whenever it
detects that some writers/readers are waiting. The rwsem_wake()
function will take the wait_lock and call __rwsem_do_wake() to do the
real wakeup. For a heavily contended rwsem, doing a spin_lock() on
wait_lock will cause further contention on the heavily contended rwsem
cacheline resulting in delay in the completion of the up_read/up_write
operations.

This patch makes the wait_lock taking and the call to __rwsem_do_wake()
optional if at least one spinning writer is present. The spinning
writer will be able to take the rwsem and call rwsem_wake() later
when it calls up_write(). With the presence of a spinning writer,
rwsem_wake() will now try to acquire the lock using trylock. If that
fails, it will just quit.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Jason Low <jason.low2@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1430428337-16802-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/osq_lock.h    |  5 +++++
 kernel/locking/rwsem-xadd.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 3a6490e..703ea5c 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -32,4 +32,9 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
 extern bool osq_lock(struct optimistic_spin_queue *lock);
 extern void osq_unlock(struct optimistic_spin_queue *lock);
 
+static inline bool osq_is_locked(struct optimistic_spin_queue *lock)
+{
+	return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
+}
+
 #endif
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 3417d01..0f18971 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -409,11 +409,24 @@ done:
 	return taken;
 }
 
+/*
+ * Return true if the rwsem has active spinner
+ */
+static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
+{
+	return osq_is_locked(&sem->osq);
+}
+
 #else
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
 	return false;
 }
+
+static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
+{
+	return false;
+}
 #endif
 
 /*
@@ -496,7 +509,38 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 {
 	unsigned long flags;
 
+	/*
+	 * If a spinner is present, it is not necessary to do the wakeup.
+	 * Try to do wakeup only if the trylock succeeds to minimize
+	 * spinlock contention which may introduce too much delay in the
+	 * unlock operation.
+	 *
+	 *       spinning writer                up_write/up_read caller
+	 *       ---------------                -----------------------
+	 * [S]   osq_unlock()                   [L]   osq
+	 *       MB                                   RMB
+	 * [RmW] rwsem_try_write_lock()         [RmW] spin_trylock(wait_lock)
+	 *
+	 * Here, it is important to make sure that there won't be a missed
+	 * wakeup while the rwsem is free and the only spinning writer goes
+	 * to sleep without taking the rwsem. Even when the spinning writer
+	 * is just going to break out of the waiting loop, it will still do
+	 * a trylock in rwsem_down_write_failed() before sleeping. IOW, if
+	 * rwsem_has_spinner() is true, it will guarantee at least one
+	 * trylock attempt on the rwsem later on.
+	 */
+	if (rwsem_has_spinner(sem)) {
+		/*
+		 * The smp_rmb() here is to make sure that the spinner
+		 * state is consulted before reading the wait_lock.
+		 */
+		smp_rmb();
+		if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
+			return sem;
+		goto locked;
+	}
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
+locked:
 
 	/* do nothing if list empty */
 	if (!list_empty(&sem->wait_list))
* [PATCH v4 2/2] locking/rwsem: check for active writer before wakeup
  2015-04-30 21:12 [PATCH v4 0/2] locking/rwsem: optimize rwsem_wakeup() Waiman Long
  2015-04-30 21:12 ` [PATCH v4 1/2] locking/rwsem: reduce spinlock contention in wakeup after up_read/up_write Waiman Long
@ 2015-04-30 21:12 ` Waiman Long
  1 sibling, 0 replies; 8+ messages in thread
From: Waiman Long @ 2015-04-30 21:12 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, Jason Low, Davidlohr Bueso, Scott J Norton,
	Douglas Hatch, Waiman Long

On a highly contended rwsem, spinlock contention due to the slow
rwsem_wake() call can be a significant portion of the total CPU cycles
used. With writer lock stealing and writer optimistic spinning, there
is also a chance that the lock may have been stolen by the time that
the wait_lock is acquired.

This patch adds a low-cost check after acquiring the wait_lock to look
for an active writer. The presence of an active writer will abort the
wakeup operation.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/rwsem-xadd.c | 21 +++++++++++++++++++--
 1 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2bb25e2..815f0cc 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -399,6 +399,15 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	return osq_is_locked(&sem->osq);
 }
 
+/*
+ * Return true if there is an active writer by checking the owner field which
+ * should be set if there is one.
+ */
+static inline bool rwsem_has_active_writer(struct rw_semaphore *sem)
+{
+	return READ_ONCE(sem->owner) != NULL;
+}
+
 #else
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
@@ -409,6 +418,11 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 {
 	return false;
 }
+
+static inline bool rwsem_has_active_writer(struct rw_semaphore *sem)
+{
+	return false;	/* Assume it has no active writer */
+}
 #endif
 
 /*
@@ -524,8 +538,11 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 locked:
 
-	/* do nothing if list empty */
-	if (!list_empty(&sem->wait_list))
+	/*
+	 * Do nothing if list empty or the lock has just been stolen by a
+	 * writer after a possibly long wait in getting the wait_lock.
+	 */
+	if (!list_empty(&sem->wait_list) && !rwsem_has_active_writer(sem))
 		sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);
 
 	raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
-- 
1.7.1