* [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features
@ 2017-10-18 18:30 Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 01/15] locking/rwsem: relocate rwsem_down_read_failed() Waiman Long
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

v6->v7:
 - Remove reader lock stealing patch and add other patches to improve
   fairness to writers.
 - Remove rwsem_wake() optimization, but eliminate duplicated wakeup
   call to the same waiting writer.
 - Enable a waiting writer to optimistically spin on the lock.
 - Reader wakeup will now wake up all readers in the queue.

v5->v6:
 - Reworked the locking algorithm to make it similar to qrwlock.
 - Removed all the architecture specific code & use only generic code.
 - Added waiter lock handoff and time-based reader lock stealing.

v4->v5:
 - Drop the OSQ patch, the need to increase the size of the rwsem
   structure and the autotuning mechanism.
 - Add an intermediate patch to enable readers spinning on writer.
 - Other miscellaneous changes and optimizations.

v3->v4:
 - Rebased to the latest tip tree due to changes to rwsem-xadd.c.
 - Update the OSQ patch to fix a race condition.

v2->v3:
 - Used smp_acquire__after_ctrl_dep() to provide acquire barrier.
 - Added the following new patches:
   1) make rwsem_spin_on_owner() return a tristate value.
   2) reactivate reader spinning when there is a large number of
      favorable writer-on-writer spinnings.
   3) move all the rwsem macros in arch-specific rwsem.h files
      into a common asm-generic/rwsem_types.h file.
   4) add a boot parameter to specify the reader spinning threshold.
 - Updated some of the patches as suggested by PeterZ and adjusted
   some of the reader spinning parameters.

v1->v2:
 - Fixed a 0day build error.
 - Added a new patch 1 to make osq_lock() a proper acquire memory
   barrier.
 - Replaced the explicit enabling of reader spinning with an autotuning
   mechanism that disables reader spinning for those rwsems that may
   not benefit from it.
 - Remove the last xfs patch as it is no longer necessary.

v4: https://lkml.org/lkml/2016/8/18/1039
v5: https://lkml.org/lkml/2017/6/1/841
v6: https://lkml.org/lkml/2017/10/11/722

This patchset revamps the current rwsem-xadd implementation to make
it saner and easier to work with. It also implements the following
two new features:

 1) Waiter lock handoff
 2) Reader optimistic spinning

With these changes, performance on workloads with a mix of readers
and writers will improve substantially. The rwsem will also become
more balanced in its preference for readers versus writers.

Because multiple readers can share the same lock, there is a natural
preference for readers when measuring locking throughput, as more
readers than writers are likely to get into the locking fast path.
For those that enter the locking slowpath, the ratio of readers to
writers processed is usually in the 1-4 range when equal numbers of
reader and writer threads are used. The actual ratio depends on the
load and can vary somewhat from run to run.

This patchset also uses generic code for all architectures, so all
the architecture-specific assembly code can be removed, easing
maintenance.

Patch 1 moves down the rwsem_down_read_failed() function for later
patches.

Patch 2 reworks the rwsem-xadd locking and unlocking code to use
an algorithm somewhat similar to what qrwlock is doing today. All
the fastpath code is moved to a new kernel/locking/rwsem-xadd.h
header file.

Patch 3 moves all the owner setting code to the fastpath in the
rwsem-xadd.h file as well.

Patch 4 moves the content of kernel/locking/rwsem.h into rwsem-xadd.h
and removes the file.

Patch 5 moves rwsem internal functions from include/linux/rwsem.h
to rwsem-xadd.h.

Patch 6 removes all the architecture specific rwsem files.

Patch 7 enables forced lock handoff to the first waiter in the wait
queue when it has waited for too long without acquiring the lock. This
prevents lock starvation and makes rwsem more fair.
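
Conceptually, the handoff works by having the first waiter set a flag
in the lock count after it has waited beyond a threshold; would-be
lock stealers in the fastpath then back off until that waiter has
taken the lock. The fragment below is only an illustrative sketch of
the idea, not the patch code: the flag name and helper are
hypothetical, and the count-bit constants follow the scheme
introduced in patch 2.

  /* Hypothetical sketch only -- not the actual patch code. */
  #define RWSEM_FLAG_HANDOFF	0x00000004	/* assumed: a reserved bit */

  static inline bool rwsem_try_read_lock_fast(struct rw_semaphore *sem)
  {
  	int count = atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);

  	/*
  	 * Back off if the lock is writer-owned or a handoff to the
  	 * first waiter is pending, so that waiter cannot be starved.
  	 */
  	if (count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) {
  		atomic_sub(RWSEM_READER_BIAS, &sem->count);
  		return false;
  	}
  	return true;
  }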

Patch 8 enables readers to optimistically spin on a writer owned lock.

Patch 9 modifies rwsem_spin_on_owner() to return a tri-state value
that can be used in a later patch.

Patch 10 enables writers to optimistically spin on a reader-owned lock
using a fixed iteration count.

Patch 11 removes the rwsem_wake() optimization as its effectiveness
has been reduced recently.

Patch 12 eliminates redundant wakeup calls to the same waiter by
multiple wakers.

Patch 13 improves fairness to writers by disabling reader spinning
when writers cannot spin on readers or there is a time stamp mismatch.

Patch 14 makes a recently woken-up waiting writer set the handoff
bit and optimistically spin for the lock instead of sleeping again
and waiting for another wakeup.  This makes rwsem favor writers from
the wakeup perspective.

Patch 15 makes reader wakeup wake up all the readers in the wait
queue instead of just the ones at the front. This reduces the writer
preference of the previous two patches.

In terms of rwsem performance, an rwsem microbenchmark and an fio
randrw test with an xfs filesystem on a ramdisk were used to verify
the performance changes due to these patches. Both tests were run on
a 2-socket, 40-core Gold 6148 system. The rwsem microbenchmark (1:1
reader/writer ratio) has a short critical section while the fio
randrw test has a long critical section (4k read/write).
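
For reference, the per-thread worker of such a microbenchmark is
roughly of the form sketched below. This is only an illustrative
sketch, not the actual benchmark code: the struct and function names
are made up and the kthread setup/teardown is omitted. The critical
section load in the tables below corresponds to the number of
cpu_relax() iterations done while holding the lock.

  #include <linux/kthread.h>
  #include <linux/rwsem.h>
  #include <linux/sched.h>

  /* Hypothetical per-thread argument -- not from the actual benchmark. */
  struct rwsem_bench_arg {
  	struct rw_semaphore	*sem;
  	unsigned long		ops;		/* locking ops done by this thread */
  	int			cs_load;	/* # of cpu_relax() calls in the CS */
  	bool			writer;
  };

  static int rwsem_bench_worker(void *data)
  {
  	struct rwsem_bench_arg *a = data;
  	int i;

  	while (!kthread_should_stop()) {
  		if (a->writer) {
  			down_write(a->sem);
  			for (i = 0; i < a->cs_load; i++)
  				cpu_relax();
  			up_write(a->sem);
  		} else {
  			down_read(a->sem);
  			for (i = 0; i < a->cs_load; i++)
  				cpu_relax();
  			up_read(a->sem);
  		}
  		a->ops++;
  	}
  	return 0;
  }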

The following tables show the performance of the rwsem microbenchmark
running on a 2-socket 36-core 72-thread x86-64 system. The
microbenchmark had 18 writer threads and 18 reader threads running on
a patched 4.14 based kernel for 10s under different critical section
loads (# of pause instructions).

           	       Reader 		 	    Writer
  CS Load	  Locking Ops/Thread	       Locking Ops/Thread
  -------	  ------------------	       ------------------
     1	    2,079,544/2,284,896/2,457,118   713,537/914,695/1,166,480
    10	    2,239,922/3,126,189/4,076,386   249,201/415,814/  612,465
    50	    1,826,276/2,163,704/2,842,305    72,111/198,479/  359,692
   100      1,587,516/1,899,778/2,256,065    14,586/251,545/  654,608
 1us sleep      8,034/    8,267/    8,555    57,555/ 64,190/   70,046

           	     Reader 		 	  Writer
  CS Load     Slowpath Locking Ops	   Slowpath Locking Ops
  -------     --------------------	   --------------------
     1	  	  3,987,189			3,992,714
    10	  	  1,460,878			1,463,589
    50	  	    609,273			  610,224
   100	  	    202,764			  201,770
 1us sleep	    148,805			1,155,410

The first table shows the minimum, average and maximum number of
locking operations done within the 10s period per locking thread. The
second table shows the total number of reader and writer locking
operations that were done in the slowpath.

Looking at the first table, it is obvious that readers are preferred
over writers for non-sleeping loads.  Because multiple readers can
share the same lock, readers have a much higher chance of acquiring
the lock via the fastpath.  This is a natural preference for readers
when measuring locking throughput.

Looking at what happens within the slowpath, the numbers of reader
and writer operations processed there were about the same.  From the
slowpath's perspective, readers and writers get equal preference for
non-sleeping loads.  For sleeping loads, however, writers are
preferred.

The table below compares the mean per-thread writer locking
operations done with equal numbers of reader and writer threads
versus an all-writers configuration.

                 All Writers	    Half Writers
  CS Load    Locking Ops/Thread  Locking Ops/Thread	% Change
  -------    ------------------  ------------------	--------
     1	  	 1,183,273	     914,695		 -22.7%
    10	  	 1,035,676	     415,814		 -59.9%
    50	           577,067	     198,479		 -65.6%
   100    	   392,179	     251,545		 -35.9%
 1us sleep	    35,823	      64,190		 +79.2%

The corresponding rwsem microbenchmark performance numbers on an
unpatched kernel were:

           	     Reader 		      Writer
  CS Load	Locking Ops/Thread	 Locking Ops/Thread
  -------	------------------	 ------------------
     1	    	9,521/9,521/9,522	9,534/397,336/710,196
    10	    	8,045/8,046/8,046	8,047/209,955/489,798
    50	    	7,730/7,730/7,731	7,730/172,723/347,213
   100      	5,037/5,038/5,039	5,037/163,691/694,101
 1us sleep        230/  231/  232	  230/ 97,288/822,645

                 All Writers	    Half Writers
  CS Load    Locking Ops/Thread  Locking Ops/Thread	% Change
  -------    ------------------  ------------------	--------
     1	  	 1,135,832	     397,336		 -65.0%
    10	  	   989,950	     209,955		 -78.8%
    50	           593,352	     172,723		 -70.9%
   100    	   369,227	     163,691		 -55.7%
 1us sleep	    49,437	      97,288		 +96.8%

All the performance numbers were worse than those of the patched
kernel, with the exception of the 1us sleep load writer performance.
That case, however, showed greater variance, as indicated by the
difference between the minimum and maximum numbers.

The corresponding all-writers numbers for the patched and unpatched
kernels were 32,743/35,823/38,697 and 9,378/49,437/137,200
respectively.  The patched kernel was fairer and hence suffered some
performance loss.

Running a 36-thread fio randrw test on a ramdisk formatted with an
xfs filesystem, the aggregate bandwidths of the patched and unpatched
kernels were 2787 MB/s and 297 MB/s respectively, a difference of
about 10X.

Waiman Long (15):
  locking/rwsem: relocate rwsem_down_read_failed()
  locking/rwsem: Implement a new locking scheme
  locking/rwsem: Move owner setting code from rwsem.c to rwsem-xadd.h
  locking/rwsem: Remove kernel/locking/rwsem.h
  locking/rwsem: Move rwsem internal function declarations to
    rwsem-xadd.h
  locking/rwsem: Remove arch specific rwsem files
  locking/rwsem: Implement lock handoff to prevent lock starvation
  locking/rwsem: Enable readers spinning on writer
  locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value
  locking/rwsem: Enable count-based spinning on reader
  locking/rwsem: Remove rwsem_wake spinlock optimization
  locking/rwsem: Eliminate redundant writer wakeup calls
  locking/rwsem: Improve fairness to writers
  locking/rwsem: Make waiting writer to optimistically spin for the lock
  locking/rwsem: Wake up all readers in wait queue

 arch/alpha/include/asm/rwsem.h  | 210 -------------
 arch/arm/include/asm/Kbuild     |   1 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/include/asm/rwsem.h   | 171 -----------
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/s390/include/asm/rwsem.h   | 225 --------------
 arch/sh/include/asm/Kbuild      |   1 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/x86/include/asm/rwsem.h    | 236 ---------------
 arch/x86/lib/Makefile           |   1 -
 arch/x86/lib/rwsem.S            | 156 ----------
 arch/xtensa/include/asm/Kbuild  |   1 -
 include/asm-generic/rwsem.h     | 139 ---------
 include/linux/rwsem.h           |  19 +-
 kernel/locking/percpu-rwsem.c   |   4 +
 kernel/locking/rwsem-xadd.c     | 644 +++++++++++++++++++++++-----------------
 kernel/locking/rwsem-xadd.h     | 284 ++++++++++++++++++
 kernel/locking/rwsem.c          |  21 +-
 kernel/locking/rwsem.h          |  68 -----
 20 files changed, 674 insertions(+), 1511 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/s390/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S
 delete mode 100644 include/asm-generic/rwsem.h
 create mode 100644 kernel/locking/rwsem-xadd.h
 delete mode 100644 kernel/locking/rwsem.h

-- 
1.8.3.1



* [PATCH-tip v7 01/15] locking/rwsem: relocate rwsem_down_read_failed()
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 02/15] locking/rwsem: Implement a new locking scheme Waiman Long
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

The rwsem_down_read_failed*() functions are relocated from above the
optimistic spinning section to below that section. This enables
them to use functions in that section in future patches. There is no
code change.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 150 ++++++++++++++++++++++----------------------
 1 file changed, 75 insertions(+), 75 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 1fefe6d..db5dedf 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -219,81 +219,6 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 }
 
 /*
- * Wait for the read lock to be granted
- */
-static inline struct rw_semaphore __sched *
-__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
-{
-	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
-	struct rwsem_waiter waiter;
-	DEFINE_WAKE_Q(wake_q);
-
-	waiter.task = current;
-	waiter.type = RWSEM_WAITING_FOR_READ;
-
-	raw_spin_lock_irq(&sem->wait_lock);
-	if (list_empty(&sem->wait_list))
-		adjustment += RWSEM_WAITING_BIAS;
-	list_add_tail(&waiter.list, &sem->wait_list);
-
-	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
-
-	/*
-	 * If there are no active locks, wake the front queued process(es).
-	 *
-	 * If there are no writers and we are first in the queue,
-	 * wake our own waiter to join the existing active readers !
-	 */
-	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS &&
-	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
-		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
-
-	raw_spin_unlock_irq(&sem->wait_lock);
-	wake_up_q(&wake_q);
-
-	/* wait to be given the lock */
-	while (true) {
-		set_current_state(state);
-		if (!waiter.task)
-			break;
-		if (signal_pending_state(state, current)) {
-			raw_spin_lock_irq(&sem->wait_lock);
-			if (waiter.task)
-				goto out_nolock;
-			raw_spin_unlock_irq(&sem->wait_lock);
-			break;
-		}
-		schedule();
-	}
-
-	__set_current_state(TASK_RUNNING);
-	return sem;
-out_nolock:
-	list_del(&waiter.list);
-	if (list_empty(&sem->wait_list))
-		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
-	raw_spin_unlock_irq(&sem->wait_lock);
-	__set_current_state(TASK_RUNNING);
-	return ERR_PTR(-EINTR);
-}
-
-__visible struct rw_semaphore * __sched
-rwsem_down_read_failed(struct rw_semaphore *sem)
-{
-	return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE);
-}
-EXPORT_SYMBOL(rwsem_down_read_failed);
-
-__visible struct rw_semaphore * __sched
-rwsem_down_read_failed_killable(struct rw_semaphore *sem)
-{
-	return __rwsem_down_read_failed_common(sem, TASK_KILLABLE);
-}
-EXPORT_SYMBOL(rwsem_down_read_failed_killable);
-
-/*
  * This function must be called with the sem->wait_lock held to prevent
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
@@ -488,6 +413,81 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 #endif
 
 /*
+ * Wait for the read lock to be granted
+ */
+static inline struct rw_semaphore __sched *
+__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
+{
+	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+	struct rwsem_waiter waiter;
+	DEFINE_WAKE_Q(wake_q);
+
+	waiter.task = current;
+	waiter.type = RWSEM_WAITING_FOR_READ;
+
+	raw_spin_lock_irq(&sem->wait_lock);
+	if (list_empty(&sem->wait_list))
+		adjustment += RWSEM_WAITING_BIAS;
+	list_add_tail(&waiter.list, &sem->wait_list);
+
+	/* we're now waiting on the lock, but no longer actively locking */
+	count = atomic_long_add_return(adjustment, &sem->count);
+
+	/*
+	 * If there are no active locks, wake the front queued process(es).
+	 *
+	 * If there are no writers and we are first in the queue,
+	 * wake our own waiter to join the existing active readers !
+	 */
+	if (count == RWSEM_WAITING_BIAS ||
+	    (count > RWSEM_WAITING_BIAS &&
+	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
+		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
+
+	raw_spin_unlock_irq(&sem->wait_lock);
+	wake_up_q(&wake_q);
+
+	/* wait to be given the lock */
+	while (true) {
+		set_current_state(state);
+		if (!waiter.task)
+			break;
+		if (signal_pending_state(state, current)) {
+			raw_spin_lock_irq(&sem->wait_lock);
+			if (waiter.task)
+				goto out_nolock;
+			raw_spin_unlock_irq(&sem->wait_lock);
+			break;
+		}
+		schedule();
+	}
+
+	__set_current_state(TASK_RUNNING);
+	return sem;
+out_nolock:
+	list_del(&waiter.list);
+	if (list_empty(&sem->wait_list))
+		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
+	raw_spin_unlock_irq(&sem->wait_lock);
+	__set_current_state(TASK_RUNNING);
+	return ERR_PTR(-EINTR);
+}
+
+__visible struct rw_semaphore * __sched
+rwsem_down_read_failed(struct rw_semaphore *sem)
+{
+	return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE);
+}
+EXPORT_SYMBOL(rwsem_down_read_failed);
+
+__visible struct rw_semaphore * __sched
+rwsem_down_read_failed_killable(struct rw_semaphore *sem)
+{
+	return __rwsem_down_read_failed_common(sem, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(rwsem_down_read_failed_killable);
+
+/*
  * Wait until we successfully acquire the write lock
  */
 static inline struct rw_semaphore *
-- 
1.8.3.1



* [PATCH-tip v7 02/15] locking/rwsem: Implement a new locking scheme
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 01/15] locking/rwsem: relocate rwsem_down_read_failed() Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 03/15] locking/rwsem: Move owner setting code from rwsem.c to rwsem-xadd.h Waiman Long
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

The current way of using various reader, writer and waiting biases
in the rwsem code is confusing and hard to understand. I have to
reread the rwsem count guide in the rwsem-xadd.c file from time to
time to remind myself how the whole thing works. It also makes the
rwsem code harder to optimize.

To make rwsem more sane, a new locking scheme similar to the one in
qrwlock is now being used.  The count is now a 32-bit atomic value
in all architectures. The current bit definitions are:

  Bit  0    - writer locked bit
  Bit  1    - waiters present bit
  Bits 2-7  - reserved for future extension
  Bits 8-31 - reader count

The cmpxchg instruction is now used to acquire the write lock. The
read lock is still acquired with the xadd instruction, so there is no
change there.  This scheme allows up to 16M active readers, which
should be more than enough. We can always use more of the reserved
bits if necessary.
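
For illustration with this layout (using the names from the new
rwsem-xadd.h below): three active readers on an rwsem with queued
waiters give a count of (3 << 8) | RWSEM_FLAG_WAITERS = 0x00000302,
while a write-locked rwsem with queued waiters gives
RWSEM_WRITER_LOCKED | RWSEM_FLAG_WAITERS = 0x00000003.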

The same generic locking code will be used for all the architectures
and the architecture specific files will be retired.

This patch also hides the fastpath implementation of rwsem (now in
kernel/locking/rwsem-xadd.h) from other kernel code, as
include/linux/rwsem.h will not include it.

With a locking microbenchmark running on a 3.13 based kernel, the
total locking rates (in Mops/s) of the benchmark on a 2-socket
36-core x86-64 system before and after the patch were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        39.039   33.401   40.432   33.093
        2         9.767   17.250   11.424   18.763
        4         9.069   17.580   10.085   17.372
        8         9.390   15.372   11.733   14.507

The locking rates of the benchmark on a 16-processor Power8 system
were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        15.086   13.738    9.373   13.597
        2         4.864    6.280    5.514    6.309
        4         3.286    4.932    4.153    5.011
        8         2.637    2.248    3.528    2.189

The locking rates of the benchmark on a 32-core Cavium ARM64 system
were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        4.849    3.972    5.194    4.223
        2        3.165    4.628    3.077    4.885
        4        0.742    3.856    0.716    4.136
        8        1.639    2.443    1.330    2.475

For the read lock, locking performance was about the same before and
after the patch. For the write lock, the new code had better contended
performance (2 or more threads) for both x86 and ppc, but it seemed
to slow down a bit on arm64. The uncontended performance, however,
suffered quite a bit on ppc, but not on x86 or arm64. So cmpxchg does
have a noticeably higher cost than xadd on ppc; the elimination of
the atomic count reversal in the slowpath helps the contended
performance, though.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/asm-generic/rwsem.h   | 139 --------------------------------------
 include/linux/rwsem.h         |  12 ++--
 kernel/locking/percpu-rwsem.c |   2 +
 kernel/locking/rwsem-xadd.c   | 150 ++++++++++++++----------------------------
 kernel/locking/rwsem-xadd.h   | 128 +++++++++++++++++++++++++++++++++++
 kernel/locking/rwsem.h        |   4 ++
 6 files changed, 187 insertions(+), 248 deletions(-)
 delete mode 100644 include/asm-generic/rwsem.h
 create mode 100644 kernel/locking/rwsem-xadd.h

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
deleted file mode 100644
index b2d68d2..0000000
--- a/include/asm-generic/rwsem.h
+++ /dev/null
@@ -1,139 +0,0 @@
-#ifndef _ASM_GENERIC_RWSEM_H
-#define _ASM_GENERIC_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "Please don't include <asm/rwsem.h> directly, use <linux/rwsem.h> instead."
-#endif
-
-#ifdef __KERNEL__
-
-/*
- * R/W semaphores originally for PPC using the stuff in lib/rwsem.c.
- * Adapted largely from include/asm-i386/rwsem.h
- * by Paul Mackerras <paulus@samba.org>.
- */
-
-/*
- * the semaphore definition
- */
-#ifdef CONFIG_64BIT
-# define RWSEM_ACTIVE_MASK		0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK		0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000L
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0))
-		rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_killable(struct rw_semaphore *sem)
-{
-	if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) {
-		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
-			return -EINTR;
-	}
-
-	return 0;
-}
-
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
-		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
-				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
-					     &sem->count);
-	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
-					     &sem->count);
-	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-	return 0;
-}
-
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_dec_return_release(&sem->count);
-	if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0))
-		rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS,
-						    &sem->count) < 0))
-		rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	/*
-	 * When downgrading from exclusive to shared ownership,
-	 * anything inside the write-locked region cannot leak
-	 * into the read side. In contrast, anything in the
-	 * read-locked region is ok to be re-ordered into the
-	 * write side. As such, rely on RELEASE semantics.
-	 */
-	tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count);
-	if (tmp < 0)
-		rwsem_downgrade_wake(sem);
-}
-
-#endif	/* __KERNEL__ */
-#endif	/* _ASM_GENERIC_RWSEM_H */
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 6ac8ee5..398c748 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -25,11 +25,10 @@
 #include <linux/rwsem-spinlock.h> /* use a generic implementation */
 #define __RWSEM_INIT_COUNT(name)	.count = RWSEM_UNLOCKED_VALUE
 #else
-/* All arch specific implementations share the same struct */
 struct rw_semaphore {
-	atomic_long_t count;
-	struct list_head wait_list;
+	atomic_t count;
 	raw_spinlock_t wait_lock;
+	struct list_head wait_list;
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* spinner MCS lock */
 	/*
@@ -50,16 +49,15 @@ struct rw_semaphore {
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
-/* Include the arch specific part */
-#include <asm/rwsem.h>
+#define RWSEM_UNLOCKED_VALUE	0
 
 /* In all implementations count != 0 means locked */
 static inline int rwsem_is_locked(struct rw_semaphore *sem)
 {
-	return atomic_long_read(&sem->count) != 0;
+	return atomic_read(&sem->count) != RWSEM_UNLOCKED_VALUE;
 }
 
-#define __RWSEM_INIT_COUNT(name)	.count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE)
+#define __RWSEM_INIT_COUNT(name)	.count = ATOMIC_INIT(RWSEM_UNLOCKED_VALUE)
 #endif
 
 /* Common initializer macros and functions */
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index 883cf1b..f17dad9 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -7,6 +7,8 @@
 #include <linux/sched.h>
 #include <linux/errno.h>
 
+#include "rwsem.h"
+
 int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
 			const char *name, struct lock_class_key *rwsem_key)
 {
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index db5dedf..1d02b8b 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -21,52 +21,20 @@
 #include "rwsem.h"
 
 /*
- * Guide to the rw_semaphore's count field for common values.
- * (32-bit case illustrated, similar for 64-bit)
+ * Guide to the rw_semaphore's count field.
  *
- * 0x0000000X	(1) X readers active or attempting lock, no writer waiting
- *		    X = #active_readers + #readers attempting to lock
- *		    (X*ACTIVE_BIAS)
+ * When the RWSEM_WRITER_LOCKED bit in count is set, the lock is owned
+ * by a writer.
  *
- * 0x00000000	rwsem is unlocked, and no one is waiting for the lock or
- *		attempting to read lock or write lock.
- *
- * 0xffff000X	(1) X readers active or attempting lock, with waiters for lock
- *		    X = #active readers + # readers attempting lock
- *		    (X*ACTIVE_BIAS + WAITING_BIAS)
- *		(2) 1 writer attempting lock, no waiters for lock
- *		    X-1 = #active readers + #readers attempting lock
- *		    ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
- *		(3) 1 writer active, no waiters for lock
- *		    X-1 = #active readers + #readers attempting lock
- *		    ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
- *
- * 0xffff0001	(1) 1 reader active or attempting lock, waiters for lock
- *		    (WAITING_BIAS + ACTIVE_BIAS)
- *		(2) 1 writer active or attempting lock, no waiters for lock
- *		    (ACTIVE_WRITE_BIAS)
- *
- * 0xffff0000	(1) There are writers or readers queued but none active
- *		    or in the process of attempting lock.
- *		    (WAITING_BIAS)
- *		Note: writer can attempt to steal lock for this count by adding
- *		ACTIVE_WRITE_BIAS in cmpxchg and checking the old count
- *
- * 0xfffe0001	(1) 1 writer active, or attempting lock. Waiters on queue.
- *		    (ACTIVE_WRITE_BIAS + WAITING_BIAS)
- *
- * Note: Readers attempt to lock by adding ACTIVE_BIAS in down_read and checking
- *	 the count becomes more than 0 for successful lock acquisition,
- *	 i.e. the case where there are only readers or nobody has lock.
- *	 (1st and 2nd case above).
- *
- *	 Writers attempt to lock by adding ACTIVE_WRITE_BIAS in down_write and
- *	 checking the count becomes ACTIVE_WRITE_BIAS for successful lock
- *	 acquisition (i.e. nobody else has lock or attempts lock).  If
- *	 unsuccessful, in rwsem_down_write_failed, we'll check to see if there
- *	 are only waiters but none active (5th case above), and attempt to
- *	 steal the lock.
+ * The lock is owned by readers when
+ * (1) the RWSEM_WRITER_LOCKED isn't set in count,
+ * (2) some of the reader bits are set in count, and
+ * (3) the owner field is RWSEM_READ_OWNED.
  *
+ * Having some reader bits set is not enough to guarantee a readers owned
+ * lock as the readers may be in the process of backing out from the count
+ * and a writer has just released the lock. So another writer may steal
+ * the lock immediately after that.
  */
 
 /*
@@ -82,7 +50,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name,
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
 	lockdep_init_map(&sem->dep_map, name, key, 0);
 #endif
-	atomic_long_set(&sem->count, RWSEM_UNLOCKED_VALUE);
+	atomic_set(&sem->count, RWSEM_UNLOCKED_VALUE);
 	raw_spin_lock_init(&sem->wait_lock);
 	INIT_LIST_HEAD(&sem->wait_list);
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
@@ -112,9 +80,8 @@ enum rwsem_wake_type {
 
 /*
  * handle the lock release when processes blocked on it that can now run
- * - if we come here from up_xxxx(), then:
- *   - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed)
- *   - the 'waiting part' of count (&0xffff0000) is -ve (and will still be so)
+ * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
+ *   have been set.
  * - there must be someone on the queue
  * - the wait_lock must be held by the caller
  * - tasks are marked for wakeup, the caller must later invoke wake_up_q()
@@ -128,7 +95,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 			      struct wake_q_head *wake_q)
 {
 	struct rwsem_waiter *waiter, *tmp;
-	long oldcount, woken = 0, adjustment = 0;
+	int oldcount, woken = 0, adjustment = 0;
 
 	/*
 	 * Take a peek at the queue head waiter such that we can determine
@@ -157,22 +124,11 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	 * so we can bail out early if a writer stole the lock.
 	 */
 	if (wake_type != RWSEM_WAKE_READ_OWNED) {
-		adjustment = RWSEM_ACTIVE_READ_BIAS;
- try_reader_grant:
-		oldcount = atomic_long_fetch_add(adjustment, &sem->count);
-		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
-			/*
-			 * If the count is still less than RWSEM_WAITING_BIAS
-			 * after removing the adjustment, it is assumed that
-			 * a writer has stolen the lock. We have to undo our
-			 * reader grant.
-			 */
-			if (atomic_long_add_return(-adjustment, &sem->count) <
-			    RWSEM_WAITING_BIAS)
-				return;
-
-			/* Last active locker left. Retry waking readers. */
-			goto try_reader_grant;
+		adjustment = RWSEM_READER_BIAS;
+		oldcount = atomic_fetch_add(adjustment, &sem->count);
+		if (unlikely(oldcount & RWSEM_WRITER_LOCKED)) {
+			atomic_sub(adjustment, &sem->count);
+			return;
 		}
 		/*
 		 * It is not really necessary to set it to reader-owned here,
@@ -208,14 +164,14 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		smp_store_release(&waiter->task, NULL);
 	}
 
-	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+	adjustment = woken * RWSEM_READER_BIAS - adjustment;
 	if (list_empty(&sem->wait_list)) {
 		/* hit end of list above */
-		adjustment -= RWSEM_WAITING_BIAS;
+		adjustment -= RWSEM_FLAG_WAITERS;
 	}
 
 	if (adjustment)
-		atomic_long_add(adjustment, &sem->count);
+		atomic_add(adjustment, &sem->count);
 }
 
 /*
@@ -223,24 +179,17 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
  */
-static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
+static inline bool rwsem_try_write_lock(int count, struct rw_semaphore *sem)
 {
-	/*
-	 * Avoid trying to acquire write lock if count isn't RWSEM_WAITING_BIAS.
-	 */
-	if (count != RWSEM_WAITING_BIAS)
+	int new;
+
+	if (RWSEM_COUNT_LOCKED(count))
 		return false;
 
-	/*
-	 * Acquire the lock by trying to set it to ACTIVE_WRITE_BIAS. If there
-	 * are other tasks on the wait list, we need to add on WAITING_BIAS.
-	 */
-	count = list_is_singular(&sem->wait_list) ?
-			RWSEM_ACTIVE_WRITE_BIAS :
-			RWSEM_ACTIVE_WRITE_BIAS + RWSEM_WAITING_BIAS;
+	new = count + RWSEM_WRITER_LOCKED -
+	     (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0);
 
-	if (atomic_long_cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS, count)
-							== RWSEM_WAITING_BIAS) {
+	if (atomic_cmpxchg_acquire(&sem->count, count, new) == count) {
 		rwsem_set_owner(sem);
 		return true;
 	}
@@ -254,14 +203,14 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
  */
 static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 {
-	long old, count = atomic_long_read(&sem->count);
+	int old, count = atomic_read(&sem->count);
 
 	while (true) {
-		if (!(count == 0 || count == RWSEM_WAITING_BIAS))
+		if (RWSEM_COUNT_LOCKED(count))
 			return false;
 
-		old = atomic_long_cmpxchg_acquire(&sem->count, count,
-				      count + RWSEM_ACTIVE_WRITE_BIAS);
+		old = atomic_cmpxchg_acquire(&sem->count, count,
+				count + RWSEM_WRITER_LOCKED);
 		if (old == count) {
 			rwsem_set_owner(sem);
 			return true;
@@ -418,7 +367,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 static inline struct rw_semaphore __sched *
 __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 {
-	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+	int count, adjustment = -RWSEM_READER_BIAS;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
@@ -427,11 +376,11 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list))
-		adjustment += RWSEM_WAITING_BIAS;
+		adjustment += RWSEM_FLAG_WAITERS;
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
+	count = atomic_add_return(adjustment, &sem->count);
 
 	/*
 	 * If there are no active locks, wake the front queued process(es).
@@ -439,9 +388,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	 * If there are no writers and we are first in the queue,
 	 * wake our own waiter to join the existing active readers !
 	 */
-	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS &&
-	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
+	if (!RWSEM_COUNT_LOCKED(count))
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
 	raw_spin_unlock_irq(&sem->wait_lock);
@@ -467,7 +414,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 out_nolock:
 	list_del(&waiter.list);
 	if (list_empty(&sem->wait_list))
-		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
+		atomic_add(-RWSEM_FLAG_WAITERS, &sem->count);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	__set_current_state(TASK_RUNNING);
 	return ERR_PTR(-EINTR);
@@ -493,15 +440,12 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 static inline struct rw_semaphore *
 __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 {
-	long count;
+	int count;
 	bool waiting = true; /* any queued threads before us */
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
 
-	/* undo write bias from down_write operation, stop active locking */
-	count = atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS, &sem->count);
-
 	/* do optimistic spinning and steal lock if possible */
 	if (rwsem_optimistic_spin(sem))
 		return sem;
@@ -523,14 +467,14 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 	/* we're now waiting on the lock, but no longer actively locking */
 	if (waiting) {
-		count = atomic_long_read(&sem->count);
+		count = atomic_read(&sem->count);
 
 		/*
 		 * If there were already threads queued before us and there are
 		 * no active writers, the lock must be read owned; so we try to
 		 * wake any read locks that were queued ahead of us.
 		 */
-		if (count > RWSEM_WAITING_BIAS) {
+		if (!(count & RWSEM_WRITER_LOCKED)) {
 			__rwsem_mark_wake(sem, RWSEM_WAKE_READERS, &wake_q);
 			/*
 			 * The wakeup is normally called _after_ the wait_lock
@@ -547,8 +491,9 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 			wake_q_init(&wake_q);
 		}
 
-	} else
-		count = atomic_long_add_return(RWSEM_WAITING_BIAS, &sem->count);
+	} else {
+		count = atomic_add_return(RWSEM_FLAG_WAITERS, &sem->count);
+	}
 
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
@@ -564,7 +509,8 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 			schedule();
 			set_current_state(state);
-		} while ((count = atomic_long_read(&sem->count)) & RWSEM_ACTIVE_MASK);
+			count = atomic_read(&sem->count);
+		} while (RWSEM_COUNT_LOCKED(count));
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
@@ -579,7 +525,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	raw_spin_lock_irq(&sem->wait_lock);
 	list_del(&waiter.list);
 	if (list_empty(&sem->wait_list))
-		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
+		atomic_add(-RWSEM_FLAG_WAITERS, &sem->count);
 	else
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 	raw_spin_unlock_irq(&sem->wait_lock);
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
new file mode 100644
index 0000000..f0e8ba3
--- /dev/null
+++ b/kernel/locking/rwsem-xadd.h
@@ -0,0 +1,128 @@
+#ifndef _ASM_GENERIC_RWSEM_H
+#define _ASM_GENERIC_RWSEM_H
+
+#include <linux/rwsem.h>
+
+/*
+ * The definition of the atomic counter in the semaphore:
+ *
+ * Bit  0    - writer locked bit
+ * Bit  1    - waiters present bit
+ * Bits 2-7  - reserved
+ * Bits 8-31 - 24-bit reader count
+ *
+ * atomic_fetch_add() is used to obtain reader lock, whereas atomic_cmpxchg()
+ * will be used to obtain writer lock.
+ */
+#define RWSEM_WRITER_LOCKED	0X00000001
+#define RWSEM_FLAG_WAITERS	0X00000002
+#define RWSEM_READER_BIAS	0x00000100
+#define RWSEM_READER_SHIFT	8
+#define RWSEM_READER_MASK	(~((1U << RWSEM_READER_SHIFT) - 1))
+#define RWSEM_LOCK_MASK 	(RWSEM_WRITER_LOCKED|RWSEM_READER_MASK)
+#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED|RWSEM_FLAG_WAITERS)
+
+#define RWSEM_COUNT_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
+
+/*
+ * lock for reading
+ */
+static inline void __down_read(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
+		     & RWSEM_READ_FAILED_MASK))
+		rwsem_down_read_failed(sem);
+}
+
+static inline int __down_read_killable(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
+		& RWSEM_READ_FAILED_MASK)) {
+		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
+			return -EINTR;
+	}
+
+	return 0;
+}
+
+static inline int __down_read_trylock(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	while (!((tmp = atomic_read(&sem->count)) & RWSEM_READ_FAILED_MASK)) {
+		if (tmp == atomic_cmpxchg_acquire(&sem->count, tmp,
+				   tmp + RWSEM_READER_BIAS)) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * lock for writing
+ */
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_cmpxchg_acquire(&sem->count, 0,
+					    RWSEM_WRITER_LOCKED)))
+		rwsem_down_write_failed(sem);
+}
+
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_cmpxchg_acquire(&sem->count, 0,
+					    RWSEM_WRITER_LOCKED)))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+	return 0;
+}
+
+static inline int __down_write_trylock(struct rw_semaphore *sem)
+{
+	return !atomic_cmpxchg_acquire(&sem->count, 0, RWSEM_WRITER_LOCKED);
+}
+
+/*
+ * unlock after reading
+ */
+static inline void __up_read(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	tmp = atomic_add_return_release(-RWSEM_READER_BIAS, &sem->count);
+	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS))
+			== RWSEM_FLAG_WAITERS))
+		rwsem_wake(sem);
+}
+
+/*
+ * unlock after writing
+ */
+static inline void __up_write(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_fetch_add_release(-RWSEM_WRITER_LOCKED,
+			&sem->count) & RWSEM_FLAG_WAITERS))
+		rwsem_wake(sem);
+}
+
+/*
+ * downgrade write lock to read lock
+ */
+static inline void __downgrade_write(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	/*
+	 * When downgrading from exclusive to shared ownership,
+	 * anything inside the write-locked region cannot leak
+	 * into the read side. In contrast, anything in the
+	 * read-locked region is ok to be re-ordered into the
+	 * write side. As such, rely on RELEASE semantics.
+	 */
+	tmp = atomic_fetch_add_release(-RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS,
+					&sem->count);
+	if (tmp & RWSEM_FLAG_WAITERS)
+		rwsem_downgrade_wake(sem);
+}
+
+#endif	/* _ASM_GENERIC_RWSEM_H */
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index a699f40..adcc5af 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -66,3 +66,7 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
 }
 #endif
+
+#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
+#include "rwsem-xadd.h"
+#endif
-- 
1.8.3.1



* [PATCH-tip v7 03/15] locking/rwsem: Move owner setting code from rwsem.c to rwsem-xadd.h
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 01/15] locking/rwsem: relocate rwsem_down_read_failed() Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 02/15] locking/rwsem: Implement a new locking scheme Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 04/15] locking/rwsem: Remove kernel/locking/rwsem.h Waiman Long
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

The setting of the owner field is specific to rwsem-xadd; it is not
needed for rwsem-spinlock. This patch moves all the owner setting
code directly into the fast paths in the rwsem-xadd.h file.

rwsem_set_reader_owned() is now only called by the first reader, so
there is no need to read the owner field first to check whether it
is already properly set.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c |  7 +++----
 kernel/locking/rwsem-xadd.h | 26 +++++++++++++++++++++-----
 kernel/locking/rwsem.c      | 17 ++---------------
 kernel/locking/rwsem.h      | 11 ++++-------
 4 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 1d02b8b..8df0b51 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -131,11 +131,10 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 			return;
 		}
 		/*
-		 * It is not really necessary to set it to reader-owned here,
-		 * but it gives the spinners an early indication that the
-		 * readers now have the lock.
+		 * Set it to reader-owned for first reader.
 		 */
-		rwsem_set_reader_owned(sem);
+		if (!(oldcount >> RWSEM_READER_SHIFT))
+			rwsem_set_reader_owned(sem);
 	}
 
 	/*
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index f0e8ba3..f727b6c 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -29,17 +29,23 @@
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
-		     & RWSEM_READ_FAILED_MASK))
+	int count = atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
+
+	if (unlikely(count & RWSEM_READ_FAILED_MASK))
 		rwsem_down_read_failed(sem);
+	else if ((count >> RWSEM_READER_SHIFT) == 1)
+		rwsem_set_reader_owned(sem);
 }
 
 static inline int __down_read_killable(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
-		& RWSEM_READ_FAILED_MASK)) {
+	int count = atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
+
+	if (unlikely(count & RWSEM_READ_FAILED_MASK)) {
 		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
 			return -EINTR;
+	} else if ((count >> RWSEM_READER_SHIFT) == 1) {
+		rwsem_set_reader_owned(sem);
 	}
 
 	return 0;
@@ -52,6 +58,8 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	while (!((tmp = atomic_read(&sem->count)) & RWSEM_READ_FAILED_MASK)) {
 		if (tmp == atomic_cmpxchg_acquire(&sem->count, tmp,
 				   tmp + RWSEM_READER_BIAS)) {
+			if (!(tmp >> RWSEM_READER_SHIFT))
+				rwsem_set_reader_owned(sem);
 			return 1;
 		}
 	}
@@ -66,6 +74,7 @@ static inline void __down_write(struct rw_semaphore *sem)
 	if (unlikely(atomic_cmpxchg_acquire(&sem->count, 0,
 					    RWSEM_WRITER_LOCKED)))
 		rwsem_down_write_failed(sem);
+	rwsem_set_owner(sem);
 }
 
 static inline int __down_write_killable(struct rw_semaphore *sem)
@@ -74,12 +83,17 @@ static inline int __down_write_killable(struct rw_semaphore *sem)
 					    RWSEM_WRITER_LOCKED)))
 		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
 			return -EINTR;
+	rwsem_set_owner(sem);
 	return 0;
 }
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	return !atomic_cmpxchg_acquire(&sem->count, 0, RWSEM_WRITER_LOCKED);
+	bool taken = !atomic_cmpxchg_acquire(&sem->count, 0,
+					     RWSEM_WRITER_LOCKED);
+	if (taken)
+		rwsem_set_owner(sem);
+	return taken;
 }
 
 /*
@@ -100,6 +114,7 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
+	rwsem_clear_owner(sem);
 	if (unlikely(atomic_fetch_add_release(-RWSEM_WRITER_LOCKED,
 			&sem->count) & RWSEM_FLAG_WAITERS))
 		rwsem_wake(sem);
@@ -121,6 +136,7 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 	 */
 	tmp = atomic_fetch_add_release(-RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS,
 					&sem->count);
+	rwsem_set_reader_owned(sem);
 	if (tmp & RWSEM_FLAG_WAITERS)
 		rwsem_downgrade_wake(sem);
 }
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index e53f7746..a1451d8 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -23,7 +23,6 @@ void __sched down_read(struct rw_semaphore *sem)
 	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
-	rwsem_set_reader_owned(sem);
 }
 
 EXPORT_SYMBOL(down_read);
@@ -51,10 +50,8 @@ int down_read_trylock(struct rw_semaphore *sem)
 {
 	int ret = __down_read_trylock(sem);
 
-	if (ret == 1) {
+	if (ret == 1)
 		rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
-		rwsem_set_reader_owned(sem);
-	}
 	return ret;
 }
 
@@ -69,7 +66,6 @@ void __sched down_write(struct rw_semaphore *sem)
 	rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
-	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(down_write);
@@ -87,7 +83,6 @@ int __sched down_write_killable(struct rw_semaphore *sem)
 		return -EINTR;
 	}
 
-	rwsem_set_owner(sem);
 	return 0;
 }
 
@@ -100,10 +95,8 @@ int down_write_trylock(struct rw_semaphore *sem)
 {
 	int ret = __down_write_trylock(sem);
 
-	if (ret == 1) {
+	if (ret == 1)
 		rwsem_acquire(&sem->dep_map, 0, 1, _RET_IP_);
-		rwsem_set_owner(sem);
-	}
 
 	return ret;
 }
@@ -129,7 +122,6 @@ void up_write(struct rw_semaphore *sem)
 {
 	rwsem_release(&sem->dep_map, 1, _RET_IP_);
 
-	rwsem_clear_owner(sem);
 	__up_write(sem);
 }
 
@@ -142,7 +134,6 @@ void downgrade_write(struct rw_semaphore *sem)
 {
 	lock_downgrade(&sem->dep_map, _RET_IP_);
 
-	rwsem_set_reader_owned(sem);
 	__downgrade_write(sem);
 }
 
@@ -156,7 +147,6 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
 	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
-	rwsem_set_reader_owned(sem);
 }
 
 EXPORT_SYMBOL(down_read_nested);
@@ -167,7 +157,6 @@ void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest)
 	rwsem_acquire_nest(&sem->dep_map, 0, 0, nest, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
-	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(_down_write_nest_lock);
@@ -187,7 +176,6 @@ void down_write_nested(struct rw_semaphore *sem, int subclass)
 	rwsem_acquire(&sem->dep_map, subclass, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
-	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(down_write_nested);
@@ -202,7 +190,6 @@ int __sched down_write_killable_nested(struct rw_semaphore *sem, int subclass)
 		return -EINTR;
 	}
 
-	rwsem_set_owner(sem);
 	return 0;
 }
 
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index adcc5af..612109f 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -33,15 +33,12 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 	WRITE_ONCE(sem->owner, NULL);
 }
 
+/*
+ * This should only be called by the first reader that acquires the read lock.
+ */
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
-	/*
-	 * We check the owner value first to make sure that we will only
-	 * do a write to the rwsem cacheline when it is really necessary
-	 * to minimize cacheline contention.
-	 */
-	if (sem->owner != RWSEM_READER_OWNED)
-		WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
+	WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
 }
 
 static inline bool rwsem_owner_is_writer(struct task_struct *owner)
-- 
1.8.3.1



* [PATCH-tip v7 04/15] locking/rwsem: Remove kernel/locking/rwsem.h
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (2 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 03/15] locking/rwsem: Move owner setting code from rwsem.c to rwsem-xadd.h Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 05/15] locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h Waiman Long
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

The content of kernel/locking/rwsem.h is now specific to rwsem-xadd. So
we can just move its content into rwsem-xadd.h and remove the file.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/percpu-rwsem.c |  4 ++-
 kernel/locking/rwsem-xadd.c   |  2 +-
 kernel/locking/rwsem-xadd.h   | 66 +++++++++++++++++++++++++++++++++++++++++
 kernel/locking/rwsem.c        |  4 ++-
 kernel/locking/rwsem.h        | 69 -------------------------------------------
 5 files changed, 73 insertions(+), 72 deletions(-)
 delete mode 100644 kernel/locking/rwsem.h

diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index f17dad9..d06f7c3 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -7,7 +7,9 @@
 #include <linux/sched.h>
 #include <linux/errno.h>
 
-#include "rwsem.h"
+#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
+#include "rwsem-xadd.h"
+#endif
 
 int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
 			const char *name, struct lock_class_key *rwsem_key)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 8df0b51..9ca3b64 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -18,7 +18,7 @@
 #include <linux/sched/debug.h>
 #include <linux/osq_lock.h>
 
-#include "rwsem.h"
+#include "rwsem-xadd.h"
 
 /*
  * Guide to the rw_semaphore's count field.
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index f727b6c..64afa35 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -4,6 +4,72 @@
 #include <linux/rwsem.h>
 
 /*
+ * The owner field of the rw_semaphore structure will be set to
+ * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
+ * the owner field when it unlocks. A reader, on the other hand, will
+ * not touch the owner field when it unlocks.
+ *
+ * In essence, the owner field now has the following 3 states:
+ *  1) 0
+ *     - lock is free or the owner hasn't set the field yet
+ *  2) RWSEM_READER_OWNED
+ *     - lock is currently or previously owned by readers (lock is free
+ *       or not set by owner yet)
+ *  3) Other non-zero value
+ *     - a writer owns the lock
+ */
+#define RWSEM_READER_OWNED	((struct task_struct *)1UL)
+
+#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+/*
+ * All writes to owner are protected by WRITE_ONCE() to make sure that
+ * store tearing can't happen as optimistic spinners may read and use
+ * the owner value concurrently without lock. Read from owner, however,
+ * may not need READ_ONCE() as long as the pointer value is only used
+ * for comparison and isn't being dereferenced.
+ */
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+	WRITE_ONCE(sem->owner, current);
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+	WRITE_ONCE(sem->owner, NULL);
+}
+
+/*
+ * This should only be called by the first reader that acquires the read lock.
+ */
+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+	WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
+}
+
+static inline bool rwsem_owner_is_writer(struct task_struct *owner)
+{
+	return owner && (owner != RWSEM_READER_OWNED);
+}
+
+static inline bool rwsem_owner_is_reader(struct task_struct *owner)
+{
+	return owner == RWSEM_READER_OWNED;
+}
+#else
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+}
+
+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+}
+#endif
+
+/*
  * The definition of the atomic counter in the semaphore:
  *
  * Bit  0    - writer locked bit
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index a1451d8..530567e 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -12,7 +12,9 @@
 #include <linux/rwsem.h>
 #include <linux/atomic.h>
 
-#include "rwsem.h"
+#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
+#include "rwsem-xadd.h"
+#endif
 
 /*
  * lock for reading
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
deleted file mode 100644
index 612109f..0000000
--- a/kernel/locking/rwsem.h
+++ /dev/null
@@ -1,69 +0,0 @@
-/*
- * The owner field of the rw_semaphore structure will be set to
- * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
- * the owner field when it unlocks. A reader, on the other hand, will
- * not touch the owner field when it unlocks.
- *
- * In essence, the owner field now has the following 3 states:
- *  1) 0
- *     - lock is free or the owner hasn't set the field yet
- *  2) RWSEM_READER_OWNED
- *     - lock is currently or previously owned by readers (lock is free
- *       or not set by owner yet)
- *  3) Other non-zero value
- *     - a writer owns the lock
- */
-#define RWSEM_READER_OWNED	((struct task_struct *)1UL)
-
-#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
-/*
- * All writes to owner are protected by WRITE_ONCE() to make sure that
- * store tearing can't happen as optimistic spinners may read and use
- * the owner value concurrently without lock. Read from owner, however,
- * may not need READ_ONCE() as long as the pointer value is only used
- * for comparison and isn't being dereferenced.
- */
-static inline void rwsem_set_owner(struct rw_semaphore *sem)
-{
-	WRITE_ONCE(sem->owner, current);
-}
-
-static inline void rwsem_clear_owner(struct rw_semaphore *sem)
-{
-	WRITE_ONCE(sem->owner, NULL);
-}
-
-/*
- * This should only be called by the first reader that acquires the read lock.
- */
-static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
-{
-	WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
-}
-
-static inline bool rwsem_owner_is_writer(struct task_struct *owner)
-{
-	return owner && owner != RWSEM_READER_OWNED;
-}
-
-static inline bool rwsem_owner_is_reader(struct task_struct *owner)
-{
-	return owner == RWSEM_READER_OWNED;
-}
-#else
-static inline void rwsem_set_owner(struct rw_semaphore *sem)
-{
-}
-
-static inline void rwsem_clear_owner(struct rw_semaphore *sem)
-{
-}
-
-static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
-{
-}
-#endif
-
-#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
-#include "rwsem-xadd.h"
-#endif
-- 
1.8.3.1



* [PATCH-tip v7 05/15] locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (3 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 04/15] locking/rwsem: Remove kernel/locking/rwsem.h Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 06/15] locking/rwsem: Remove arch specific rwsem files Waiman Long
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

We don't need to expose rwsem internal functions which are not supposed
to be called directly from other kernel code.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/rwsem.h       | 7 -------
 kernel/locking/rwsem-xadd.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 398c748..e75ea44 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -42,13 +42,6 @@ struct rw_semaphore {
 #endif
 };
 
-extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
-extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
-
 #define RWSEM_UNLOCKED_VALUE	0
 
 /* In all implementations count != 0 means locked */
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index 64afa35..2a8d878 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -90,6 +90,13 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 
 #define RWSEM_COUNT_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
 
+extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
+extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
+
 /*
  * lock for reading
  */
-- 
1.8.3.1



* [PATCH-tip v7 06/15] locking/rwsem: Remove arch specific rwsem files
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (4 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 05/15] locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 07/15] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

As the generic rwsem-xadd code is using the appropriate acquire and
release versions of the atomic operations, the arch specific rwsem.h
files will not be that much faster than the generic code. So we can
remove those arch specific rwsem.h files and stop building asm/rwsem.h
to reduce the maintenance effort.
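
For illustration only (not part of this patch), the generic fast paths
look roughly like the sketch below, assuming the definitions in
kernel/locking/rwsem-xadd.h; the _sketch suffixes are mine:

/*
 * Sketch, not the actual kernel code: the generic fast paths get their
 * ordering from the acquire/release variants of the atomic ops, so no
 * per-arch barriers or assembly are needed.
 */
static inline void __down_read_sketch(struct rw_semaphore *sem)
{
	/* atomic_fetch_add_acquire() provides the acquire ordering. */
	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
		     & RWSEM_READ_FAILED_MASK))
		rwsem_down_read_failed(sem);
}

static inline void __up_read_sketch(struct rw_semaphore *sem)
{
	/* atomic_add_return_release() provides the release ordering. */
	int tmp = atomic_add_return_release(-RWSEM_READER_BIAS, &sem->count);

	if (unlikely((tmp & (RWSEM_LOCK_MASK | RWSEM_FLAG_WAITERS))
		     == RWSEM_FLAG_WAITERS))
		rwsem_wake(sem);
}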

Signed-off-by: Waiman Long <longman@redhat.com>
---
 arch/alpha/include/asm/rwsem.h  | 210 -----------------------------------
 arch/arm/include/asm/Kbuild     |   1 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/include/asm/rwsem.h   | 171 -----------------------------
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/s390/include/asm/rwsem.h   | 225 --------------------------------------
 arch/sh/include/asm/Kbuild      |   1 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/x86/include/asm/rwsem.h    | 236 ----------------------------------------
 arch/x86/lib/Makefile           |   1 -
 arch/x86/lib/rwsem.S            | 156 --------------------------
 arch/xtensa/include/asm/Kbuild  |   1 -
 13 files changed, 1006 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/s390/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S

diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
deleted file mode 100644
index 9624cb4..0000000
--- a/arch/alpha/include/asm/rwsem.h
+++ /dev/null
@@ -1,210 +0,0 @@
-#ifndef _ALPHA_RWSEM_H
-#define _ALPHA_RWSEM_H
-
-/*
- * Written by Ivan Kokshaysky <ink@jurassic.park.msu.ru>, 2001.
- * Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#include <linux/compiler.h>
-
-#define RWSEM_UNLOCKED_VALUE		0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS		0x0000000000000001L
-#define RWSEM_ACTIVE_MASK		0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS		(-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-static inline int ___down_read(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter += RWSEM_ACTIVE_READ_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l	%0,%1\n"
-	"	addq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	mb\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_READ_BIAS), "m" (sem->count) : "memory");
-#endif
-	return (oldcount < 0);
-}
-
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	if (unlikely(___down_read(sem)))
-		rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_killable(struct rw_semaphore *sem)
-{
-	if (unlikely(___down_read(sem)))
-		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	long old, new, res;
-
-	res = atomic_long_read(&sem->count);
-	do {
-		new = res + RWSEM_ACTIVE_READ_BIAS;
-		if (new <= 0)
-			break;
-		old = res;
-		res = atomic_long_cmpxchg(&sem->count, old, new);
-	} while (res != old);
-	return res >= 0 ? 1 : 0;
-}
-
-static inline long ___down_write(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter += RWSEM_ACTIVE_WRITE_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l	%0,%1\n"
-	"	addq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	mb\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
-#endif
-	return oldcount;
-}
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	if (unlikely(___down_write(sem)))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	if (unlikely(___down_write(sem))) {
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-	}
-
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	long ret = atomic_long_cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-			   RWSEM_ACTIVE_WRITE_BIAS);
-	if (ret == RWSEM_UNLOCKED_VALUE)
-		return 1;
-	return 0;
-}
-
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter -= RWSEM_ACTIVE_READ_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"	mb\n"
-	"1:	ldq_l	%0,%1\n"
-	"	subq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_READ_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(oldcount < 0))
-		if ((int)oldcount - RWSEM_ACTIVE_READ_BIAS == 0)
-			rwsem_wake(sem);
-}
-
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	long count;
-#ifndef	CONFIG_SMP
-	sem->count.counter -= RWSEM_ACTIVE_WRITE_BIAS;
-	count = sem->count.counter;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"	mb\n"
-	"1:	ldq_l	%0,%1\n"
-	"	subq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	subq	%0,%3,%0\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (count), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(count))
-		if ((int)count == 0)
-			rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter -= RWSEM_WAITING_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l	%0,%1\n"
-	"	addq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	mb\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (-RWSEM_WAITING_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(oldcount < 0))
-		rwsem_downgrade_wake(sem);
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ALPHA_RWSEM_H */
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 721ab5e..58337ef 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -12,7 +12,6 @@ generic-y += mm-arch-hooks.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += seccomp.h
 generic-y += segment.h
 generic-y += serial.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 2326e39..38366a6 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -16,7 +16,6 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += msi.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += segment.h
 generic-y += serial.h
 generic-y += set_memory.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 3401368..002eb1f 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -24,7 +24,6 @@ generic-y += mm-arch-hooks.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h
deleted file mode 100644
index 49f7db0..0000000
--- a/arch/ia64/include/asm/rwsem.h
+++ /dev/null
@@ -1,171 +0,0 @@
-/*
- * R/W semaphores for ia64
- *
- * Copyright (C) 2003 Ken Chen <kenneth.w.chen@intel.com>
- * Copyright (C) 2003 Asit Mallick <asit.k.mallick@intel.com>
- * Copyright (C) 2005 Christoph Lameter <cl@linux.com>
- *
- * Based on asm-i386/rwsem.h and other architecture implementation.
- *
- * The MSW of the count is the negated number of active writers and
- * waiting lockers, and the LSW is the total number of active locks.
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffffffff00000001 for
- * the case of an uncontended lock. Readers increment by 1 and see a positive
- * value when uncontended, negative if there are writers (and maybe) readers
- * waiting (in which case it goes to sleep).
- */
-
-#ifndef _ASM_IA64_RWSEM_H
-#define _ASM_IA64_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "Please don't include <asm/rwsem.h> directly, use <linux/rwsem.h> instead."
-#endif
-
-#include <asm/intrinsics.h>
-
-#define RWSEM_UNLOCKED_VALUE		__IA64_UL_CONST(0x0000000000000000)
-#define RWSEM_ACTIVE_BIAS		(1L)
-#define RWSEM_ACTIVE_MASK		(0xffffffffL)
-#define RWSEM_WAITING_BIAS		(-0x100000000L)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline int
-___down_read (struct rw_semaphore *sem)
-{
-	long result = ia64_fetchadd8_acq((unsigned long *)&sem->count.counter, 1);
-
-	return (result < 0);
-}
-
-static inline void
-__down_read (struct rw_semaphore *sem)
-{
-	if (___down_read(sem))
-		rwsem_down_read_failed(sem);
-}
-
-static inline int
-__down_read_killable (struct rw_semaphore *sem)
-{
-	if (___down_read(sem))
-		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-/*
- * lock for writing
- */
-static inline long
-___down_write (struct rw_semaphore *sem)
-{
-	long old, new;
-
-	do {
-		old = atomic_long_read(&sem->count);
-		new = old + RWSEM_ACTIVE_WRITE_BIAS;
-	} while (atomic_long_cmpxchg_acquire(&sem->count, old, new) != old);
-
-	return old;
-}
-
-static inline void
-__down_write (struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int
-__down_write_killable (struct rw_semaphore *sem)
-{
-	if (___down_write(sem)) {
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-	}
-
-	return 0;
-}
-
-/*
- * unlock after reading
- */
-static inline void
-__up_read (struct rw_semaphore *sem)
-{
-	long result = ia64_fetchadd8_rel((unsigned long *)&sem->count.counter, -1);
-
-	if (result < 0 && (--result & RWSEM_ACTIVE_MASK) == 0)
-		rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void
-__up_write (struct rw_semaphore *sem)
-{
-	long old, new;
-
-	do {
-		old = atomic_long_read(&sem->count);
-		new = old - RWSEM_ACTIVE_WRITE_BIAS;
-	} while (atomic_long_cmpxchg_release(&sem->count, old, new) != old);
-
-	if (new < 0 && (new & RWSEM_ACTIVE_MASK) == 0)
-		rwsem_wake(sem);
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int
-__down_read_trylock (struct rw_semaphore *sem)
-{
-	long tmp;
-	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
-		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp, tmp+1)) {
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int
-__down_write_trylock (struct rw_semaphore *sem)
-{
-	long tmp = atomic_long_cmpxchg_acquire(&sem->count,
-			RWSEM_UNLOCKED_VALUE, RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void
-__downgrade_write (struct rw_semaphore *sem)
-{
-	long old, new;
-
-	do {
-		old = atomic_long_read(&sem->count);
-		new = old - RWSEM_WAITING_BIAS;
-	} while (atomic_long_cmpxchg_release(&sem->count, old, new) != old);
-
-	if (old < 0)
-		rwsem_downgrade_wake(sem);
-}
-
-#endif /* _ASM_IA64_RWSEM_H */
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 2542ea1..e25807a 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -6,6 +6,5 @@ generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += vtime.h
 generic-y += msi.h
diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
deleted file mode 100644
index fda9481..0000000
--- a/arch/s390/include/asm/rwsem.h
+++ /dev/null
@@ -1,225 +0,0 @@
-#ifndef _S390_RWSEM_H
-#define _S390_RWSEM_H
-
-/*
- *  S390 version
- *    Copyright IBM Corp. 2002
- *    Author(s): Martin Schwidefsky (schwidefsky@de.ibm.com)
- *
- *  Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-/*
- *
- * The MSW of the count is the negated number of active writers and waiting
- * lockers, and the LSW is the total number of active locks
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an
- * uncontended lock. This can be determined because XADD returns the old value.
- * Readers increment by 1 and see a positive value when uncontended, negative
- * if there are writers (and maybe) readers waiting (in which case it goes to
- * sleep).
- *
- * The value of WAITING_BIAS supports up to 32766 waiting processes. This can
- * be extended to 65534 by manually checking the whole MSW rather than relying
- * on the S flag.
- *
- * The value of ACTIVE_BIAS supports up to 65535 active processes.
- *
- * This should be totally fair - if anything is waiting, a process that wants a
- * lock will go to the back of the queue. When the currently active lock is
- * released, if there's a writer at the front of the queue, then that and only
- * that will be woken up; if there's a bunch of consecutive readers at the
- * front, then they'll all be woken up, but no other readers will be.
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#define RWSEM_UNLOCKED_VALUE	0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS	0x0000000000000001L
-#define RWSEM_ACTIVE_MASK	0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS	(-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS	RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline int ___down_read(struct rw_semaphore *sem)
-{
-	signed long old, new;
-
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	aghi	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "i" (RWSEM_ACTIVE_READ_BIAS)
-		: "cc", "memory");
-	return (old < 0);
-}
-
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	if (___down_read(sem))
-		rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_killable(struct rw_semaphore *sem)
-{
-	if (___down_read(sem)) {
-		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
-			return -EINTR;
-	}
-
-	return 0;
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	signed long old, new;
-
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	ltgr	%1,%0\n"
-		"	jm	1f\n"
-		"	aghi	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b\n"
-		"1:"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "i" (RWSEM_ACTIVE_READ_BIAS)
-		: "cc", "memory");
-	return old >= 0 ? 1 : 0;
-}
-
-/*
- * lock for writing
- */
-static inline long ___down_write(struct rw_semaphore *sem)
-{
-	signed long old, new, tmp;
-
-	tmp = RWSEM_ACTIVE_WRITE_BIAS;
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	ag	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "m" (tmp)
-		: "cc", "memory");
-
-	return old;
-}
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	signed long old;
-
-	asm volatile(
-		"	lg	%0,%1\n"
-		"0:	ltgr	%0,%0\n"
-		"	jnz	1f\n"
-		"	csg	%0,%3,%1\n"
-		"	jl	0b\n"
-		"1:"
-		: "=&d" (old), "=Q" (sem->count)
-		: "Q" (sem->count), "d" (RWSEM_ACTIVE_WRITE_BIAS)
-		: "cc", "memory");
-	return (old == RWSEM_UNLOCKED_VALUE) ? 1 : 0;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	signed long old, new;
-
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	aghi	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "i" (-RWSEM_ACTIVE_READ_BIAS)
-		: "cc", "memory");
-	if (new < 0)
-		if ((new & RWSEM_ACTIVE_MASK) == 0)
-			rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	signed long old, new, tmp;
-
-	tmp = -RWSEM_ACTIVE_WRITE_BIAS;
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	ag	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "m" (tmp)
-		: "cc", "memory");
-	if (new < 0)
-		if ((new & RWSEM_ACTIVE_MASK) == 0)
-			rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	signed long old, new, tmp;
-
-	tmp = -RWSEM_WAITING_BIAS;
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	ag	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "m" (tmp)
-		: "cc", "memory");
-	if (new > 1)
-		rwsem_downgrade_wake(sem);
-}
-
-#endif /* _S390_RWSEM_H */
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 1a6f9c3..24af7e6 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -13,7 +13,6 @@ generic-y += mm-arch-hooks.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += serial.h
 generic-y += sizes.h
 generic-y += trace_clock.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 80ddc01..2c6d988 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -15,7 +15,6 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += module.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += serial.h
 generic-y += trace_clock.h
 generic-y += word-at-a-time.h
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
deleted file mode 100644
index 1e51195..0000000
--- a/arch/x86/include/asm/rwsem.h
+++ /dev/null
@@ -1,236 +0,0 @@
-/* rwsem.h: R/W semaphores implemented using XADD/CMPXCHG for i486+
- *
- * Written by David Howells (dhowells@redhat.com).
- *
- * Derived from asm-x86/semaphore.h
- *
- *
- * The MSW of the count is the negated number of active writers and waiting
- * lockers, and the LSW is the total number of active locks
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an
- * uncontended lock. This can be determined because XADD returns the old value.
- * Readers increment by 1 and see a positive value when uncontended, negative
- * if there are writers (and maybe) readers waiting (in which case it goes to
- * sleep).
- *
- * The value of WAITING_BIAS supports up to 32766 waiting processes. This can
- * be extended to 65534 by manually checking the whole MSW rather than relying
- * on the S flag.
- *
- * The value of ACTIVE_BIAS supports up to 65535 active processes.
- *
- * This should be totally fair - if anything is waiting, a process that wants a
- * lock will go to the back of the queue. When the currently active lock is
- * released, if there's a writer at the front of the queue, then that and only
- * that will be woken up; if there's a bunch of consecutive readers at the
- * front, then they'll all be woken up, but no other readers will be.
- */
-
-#ifndef _ASM_X86_RWSEM_H
-#define _ASM_X86_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-#include <asm/asm.h>
-
-/*
- * The bias values and the counter type limits the number of
- * potential readers/writers to 32767 for 32 bits and 2147483647
- * for 64 bits.
- */
-
-#ifdef CONFIG_X86_64
-# define RWSEM_ACTIVE_MASK		0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK		0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000L
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-#define ____down_read(sem, slow_path)					\
-({									\
-	struct rw_semaphore* ret;					\
-	asm volatile("# beginning down_read\n\t"			\
-		     LOCK_PREFIX _ASM_INC "(%[sem])\n\t"		\
-		     /* adds 0x00000001 */				\
-		     "  jns        1f\n"				\
-		     "  call " slow_path "\n"				\
-		     "1:\n\t"						\
-		     "# ending down_read\n\t"				\
-		     : "+m" (sem->count), "=a" (ret),			\
-			ASM_CALL_CONSTRAINT				\
-		     : [sem] "a" (sem)					\
-		     : "memory", "cc");					\
-	ret;								\
-})
-
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	____down_read(sem, "call_rwsem_down_read_failed");
-}
-
-static inline int __down_read_killable(struct rw_semaphore *sem)
-{
-	if (IS_ERR(____down_read(sem, "call_rwsem_down_read_failed_killable")))
-		return -EINTR;
-	return 0;
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline bool __down_read_trylock(struct rw_semaphore *sem)
-{
-	long result, tmp;
-	asm volatile("# beginning __down_read_trylock\n\t"
-		     "  mov          %[count],%[result]\n\t"
-		     "1:\n\t"
-		     "  mov          %[result],%[tmp]\n\t"
-		     "  add          %[inc],%[tmp]\n\t"
-		     "  jle	     2f\n\t"
-		     LOCK_PREFIX "  cmpxchg  %[tmp],%[count]\n\t"
-		     "  jnz	     1b\n\t"
-		     "2:\n\t"
-		     "# ending __down_read_trylock\n\t"
-		     : [count] "+m" (sem->count), [result] "=&a" (result),
-		       [tmp] "=&r" (tmp)
-		     : [inc] "i" (RWSEM_ACTIVE_READ_BIAS)
-		     : "memory", "cc");
-	return result >= 0;
-}
-
-/*
- * lock for writing
- */
-#define ____down_write(sem, slow_path)			\
-({							\
-	long tmp;					\
-	struct rw_semaphore* ret;			\
-							\
-	asm volatile("# beginning down_write\n\t"	\
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"	\
-		     /* adds 0xffff0001, returns the old value */ \
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
-		     /* was the active mask 0 before? */\
-		     "  jz        1f\n"			\
-		     "  call " slow_path "\n"		\
-		     "1:\n"				\
-		     "# ending down_write"		\
-		     : "+m" (sem->count), [tmp] "=d" (tmp),	\
-		       "=a" (ret), ASM_CALL_CONSTRAINT	\
-		     : [sem] "a" (sem), "[tmp]" (RWSEM_ACTIVE_WRITE_BIAS) \
-		     : "memory", "cc");			\
-	ret;						\
-})
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	____down_write(sem, "call_rwsem_down_write_failed");
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	if (IS_ERR(____down_write(sem, "call_rwsem_down_write_failed_killable")))
-		return -EINTR;
-
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline bool __down_write_trylock(struct rw_semaphore *sem)
-{
-	bool result;
-	long tmp0, tmp1;
-	asm volatile("# beginning __down_write_trylock\n\t"
-		     "  mov          %[count],%[tmp0]\n\t"
-		     "1:\n\t"
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t"
-		     /* was the active mask 0 before? */
-		     "  jnz          2f\n\t"
-		     "  mov          %[tmp0],%[tmp1]\n\t"
-		     "  add          %[inc],%[tmp1]\n\t"
-		     LOCK_PREFIX "  cmpxchg  %[tmp1],%[count]\n\t"
-		     "  jnz	     1b\n\t"
-		     "2:\n\t"
-		     CC_SET(e)
-		     "# ending __down_write_trylock\n\t"
-		     : [count] "+m" (sem->count), [tmp0] "=&a" (tmp0),
-		       [tmp1] "=&r" (tmp1), CC_OUT(e) (result)
-		     : [inc] "er" (RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory");
-	return result;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long tmp;
-	asm volatile("# beginning __up_read\n\t"
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
-		     /* subtracts 1, returns the old value */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_wake\n" /* expects old value in %edx */
-		     "1:\n"
-		     "# ending __up_read\n"
-		     : "+m" (sem->count), [tmp] "=d" (tmp)
-		     : [sem] "a" (sem), "[tmp]" (-RWSEM_ACTIVE_READ_BIAS)
-		     : "memory", "cc");
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	long tmp;
-	asm volatile("# beginning __up_write\n\t"
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
-		     /* subtracts 0xffff0001, returns the old value */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_wake\n" /* expects old value in %edx */
-		     "1:\n\t"
-		     "# ending __up_write\n"
-		     : "+m" (sem->count), [tmp] "=d" (tmp)
-		     : [sem] "a" (sem), "[tmp]" (-RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc");
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	asm volatile("# beginning __downgrade_write\n\t"
-		     LOCK_PREFIX _ASM_ADD "%[inc],(%[sem])\n\t"
-		     /*
-		      * transitions 0xZZZZ0001 -> 0xYYYY0001 (i386)
-		      *     0xZZZZZZZZ00000001 -> 0xYYYYYYYY00000001 (x86_64)
-		      */
-		     "  jns       1f\n\t"
-		     "  call call_rwsem_downgrade_wake\n"
-		     "1:\n\t"
-		     "# ending __downgrade_write\n"
-		     : "+m" (sem->count)
-		     : [sem] "a" (sem), [inc] "er" (-RWSEM_WAITING_BIAS)
-		     : "memory", "cc");
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ASM_X86_RWSEM_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a7413..b4e8c8e 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -22,7 +22,6 @@ obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
 lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
-lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 
diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
deleted file mode 100644
index dc2ab6e..0000000
--- a/arch/x86/lib/rwsem.S
+++ /dev/null
@@ -1,156 +0,0 @@
-/*
- * x86 semaphore implementation.
- *
- * (C) Copyright 1999 Linus Torvalds
- *
- * Portions Copyright 1999 Red Hat, Inc.
- *
- *	This program is free software; you can redistribute it and/or
- *	modify it under the terms of the GNU General Public License
- *	as published by the Free Software Foundation; either version
- *	2 of the License, or (at your option) any later version.
- *
- * rw semaphores implemented November 1999 by Benjamin LaHaise <bcrl@kvack.org>
- */
-
-#include <linux/linkage.h>
-#include <asm/alternative-asm.h>
-#include <asm/frame.h>
-
-#define __ASM_HALF_REG(reg)	__ASM_SEL(reg, e##reg)
-#define __ASM_HALF_SIZE(inst)	__ASM_SEL(inst##w, inst##l)
-
-#ifdef CONFIG_X86_32
-
-/*
- * The semaphore operations have a special calling sequence that
- * allow us to do a simpler in-line version of them. These routines
- * need to convert that sequence back into the C sequence when
- * there is contention on the semaphore.
- *
- * %eax contains the semaphore pointer on entry. Save the C-clobbered
- * registers (%eax, %edx and %ecx) except %eax which is either a return
- * value or just gets clobbered. Same is true for %edx so make sure GCC
- * reloads it after the slow path, by making it hold a temporary, for
- * example see ____down_write().
- */
-
-#define save_common_regs \
-	pushl %ecx
-
-#define restore_common_regs \
-	popl %ecx
-
-	/* Avoid uglifying the argument copying x86-64 needs to do. */
-	.macro movq src, dst
-	.endm
-
-#else
-
-/*
- * x86-64 rwsem wrappers
- *
- * This interfaces the inline asm code to the slow-path
- * C routines. We need to save the call-clobbered regs
- * that the asm does not mark as clobbered, and move the
- * argument from %rax to %rdi.
- *
- * NOTE! We don't need to save %rax, because the functions
- * will always return the semaphore pointer in %rax (which
- * is also the input argument to these helpers)
- *
- * The following can clobber %rdx because the asm clobbers it:
- *   call_rwsem_down_write_failed
- *   call_rwsem_wake
- * but %rdi, %rsi, %rcx, %r8-r11 always need saving.
- */
-
-#define save_common_regs \
-	pushq %rdi; \
-	pushq %rsi; \
-	pushq %rcx; \
-	pushq %r8;  \
-	pushq %r9;  \
-	pushq %r10; \
-	pushq %r11
-
-#define restore_common_regs \
-	popq %r11; \
-	popq %r10; \
-	popq %r9; \
-	popq %r8; \
-	popq %rcx; \
-	popq %rsi; \
-	popq %rdi
-
-#endif
-
-/* Fix up special calling conventions */
-ENTRY(call_rwsem_down_read_failed)
-	FRAME_BEGIN
-	save_common_regs
-	__ASM_SIZE(push,) %__ASM_REG(dx)
-	movq %rax,%rdi
-	call rwsem_down_read_failed
-	__ASM_SIZE(pop,) %__ASM_REG(dx)
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_read_failed)
-
-ENTRY(call_rwsem_down_read_failed_killable)
-	FRAME_BEGIN
-	save_common_regs
-	__ASM_SIZE(push,) %__ASM_REG(dx)
-	movq %rax,%rdi
-	call rwsem_down_read_failed_killable
-	__ASM_SIZE(pop,) %__ASM_REG(dx)
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_read_failed_killable)
-
-ENTRY(call_rwsem_down_write_failed)
-	FRAME_BEGIN
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_down_write_failed
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_write_failed)
-
-ENTRY(call_rwsem_down_write_failed_killable)
-	FRAME_BEGIN
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_down_write_failed_killable
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_write_failed_killable)
-
-ENTRY(call_rwsem_wake)
-	FRAME_BEGIN
-	/* do nothing if still outstanding active readers */
-	__ASM_HALF_SIZE(dec) %__ASM_HALF_REG(dx)
-	jnz 1f
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_wake
-	restore_common_regs
-1:	FRAME_END
-	ret
-ENDPROC(call_rwsem_wake)
-
-ENTRY(call_rwsem_downgrade_wake)
-	FRAME_BEGIN
-	save_common_regs
-	__ASM_SIZE(push,) %__ASM_REG(dx)
-	movq %rax,%rdi
-	call rwsem_downgrade_wake
-	__ASM_SIZE(pop,) %__ASM_REG(dx)
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_downgrade_wake)
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index dff7cc3..199523e 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -21,7 +21,6 @@ generic-y += mm-arch-hooks.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += sections.h
 generic-y += topology.h
 generic-y += trace_clock.h
-- 
1.8.3.1



* [PATCH-tip v7 07/15] locking/rwsem: Implement lock handoff to prevent lock starvation
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (5 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 06/15] locking/rwsem: Remove arch specific rwsem files Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 08/15] locking/rwsem: Enable readers spinning on writer Waiman Long
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

Because of writer lock stealing, it is possible that a constant
stream of incoming writers will cause a waiting writer or reader to
wait indefinitely, leading to lock starvation.

The mutex code has a lock handoff mechanism to prevent lock starvation.
This patch implements a similar lock handoff mechanism to disable
lock stealing and force lock handoff to the first waiter in the queue
after at least a 5ms waiting period. The waiting period keeps lock
stealing from being discouraged so much that performance suffers.

Besides the handoff bit, an extra bit is used to distinguish between
reader and writer handoff so that the code can act differently
depending on the type of handoff.
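
As an illustration (not part of the patch), the handoff gating boils
down to the sketch below; the flag bits and the waiter timeout field
are the ones added in this patch, while the helper names here are
made up for the example:

/*
 * Illustrative sketch only.  A lock stealer must back off once a
 * handoff has been requested, and the first waiter requests a handoff
 * only after waiting for at least RWSEM_WAIT_TIMEOUT.
 */
static bool can_steal_write_lock(int count)
{
	/* The lock must be free and no handoff may be pending. */
	return !(count & (RWSEM_LOCK_MASK | RWSEM_FLAG_HANDOFF));
}

static void maybe_request_handoff(struct rw_semaphore *sem,
				  struct rwsem_waiter *waiter, bool first)
{
	int count = atomic_read(&sem->count);

	if (first && !(count & RWSEM_FLAG_HANDOFF) &&
	    time_after(jiffies, waiter->timeout))
		atomic_or(RWSEM_FLAG_WHANDOFF, &sem->count);
}

With HZ=1000, RWSEM_WAIT_TIMEOUT = ((HZ - 1)/200 + 1) = 5 jiffies,
i.e. 5ms; with smaller HZ values the period rounds up to a whole jiffy.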

A rwsem microbenchmark was run for 5 seconds on a 2-socket 40-core
80-thread x86-64 system with a 4.14-rc2 based kernel and 60 write_lock
threads with 1us sleep critical section.

For the unpatched kernel, the locking rate was 15,519 kop/s. However,
there were 28 threads that completed only 1 locking operation each
(practically starved), while the thread with the highest locking rate
had done more than 646k of them.

For the patched kernel, the locking rate dropped to 12,590 kop/s. The
number of locking operations done per thread ranged from 14,450 to
22,648. The rwsem became much fairer, with the tradeoff of lower
overall throughput.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 94 +++++++++++++++++++++++++++++++++++++--------
 kernel/locking/rwsem-xadd.h | 32 +++++++++++----
 2 files changed, 102 insertions(+), 24 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 9ca3b64..65717d8 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -70,6 +70,7 @@ struct rwsem_waiter {
 	struct list_head list;
 	struct task_struct *task;
 	enum rwsem_waiter_type type;
+	unsigned long timeout;
 };
 
 enum rwsem_wake_type {
@@ -79,6 +80,12 @@ enum rwsem_wake_type {
 };
 
 /*
+ * The minimum waiting time (5ms) in the wait queue before initiating the
+ * handoff protocol.
+ */
+#define RWSEM_WAIT_TIMEOUT	((HZ - 1)/200 + 1)
+
+/*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
  *   have been set.
@@ -127,6 +134,13 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		adjustment = RWSEM_READER_BIAS;
 		oldcount = atomic_fetch_add(adjustment, &sem->count);
 		if (unlikely(oldcount & RWSEM_WRITER_LOCKED)) {
+			/*
+			 * Initiate handoff to reader, if applicable.
+			 */
+			if (!(oldcount & RWSEM_FLAG_RHANDOFF) &&
+			    time_after(jiffies, waiter->timeout))
+				adjustment -= RWSEM_FLAG_RHANDOFF;
+
 			atomic_sub(adjustment, &sem->count);
 			return;
 		}
@@ -169,6 +183,13 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		adjustment -= RWSEM_FLAG_WAITERS;
 	}
 
+	/*
+	 * Clear the handoff flag
+	 */
+	if (woken && (wake_type != RWSEM_WAKE_READ_OWNED) &&
+	    RWSEM_COUNT_HANDOFF(atomic_read(&sem->count)))
+		adjustment -= RWSEM_FLAG_RHANDOFF;
+
 	if (adjustment)
 		atomic_add(adjustment, &sem->count);
 }
@@ -178,15 +199,20 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
  */
-static inline bool rwsem_try_write_lock(int count, struct rw_semaphore *sem)
+static inline bool
+rwsem_try_write_lock(int count, struct rw_semaphore *sem, bool first)
 {
 	int new;
 
 	if (RWSEM_COUNT_LOCKED(count))
 		return false;
 
+	if (!first && RWSEM_COUNT_HANDOFF(count))
+		return false;
+
 	new = count + RWSEM_WRITER_LOCKED -
-	     (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0);
+	     (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0) -
+	     (count & RWSEM_FLAG_WHANDOFF);
 
 	if (atomic_cmpxchg_acquire(&sem->count, count, new) == count) {
 		rwsem_set_owner(sem);
@@ -205,7 +231,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 	int old, count = atomic_read(&sem->count);
 
 	while (true) {
-		if (RWSEM_COUNT_LOCKED(count))
+		if (RWSEM_COUNT_LOCKED_OR_HANDOFF(count))
 			return false;
 
 		old = atomic_cmpxchg_acquire(&sem->count, count,
@@ -361,6 +387,16 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 #endif
 
 /*
+ * This is safe to be called without holding the wait_lock.
+ */
+static inline bool
+rwsem_waiter_is_first(struct rw_semaphore *sem, struct rwsem_waiter *waiter)
+{
+	return list_first_entry(&sem->wait_list, struct rwsem_waiter, list)
+			== waiter;
+}
+
+/*
  * Wait for the read lock to be granted
  */
 static inline struct rw_semaphore __sched *
@@ -372,6 +408,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
+	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list))
@@ -412,8 +449,12 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	return sem;
 out_nolock:
 	list_del(&waiter.list);
-	if (list_empty(&sem->wait_list))
-		atomic_add(-RWSEM_FLAG_WAITERS, &sem->count);
+	if (list_empty(&sem->wait_list)) {
+		int adjustment = -RWSEM_FLAG_WAITERS -
+			(atomic_read(&sem->count) & RWSEM_FLAG_HANDOFFS);
+
+		atomic_add(adjustment, &sem->count);
+	}
 	raw_spin_unlock_irq(&sem->wait_lock);
 	__set_current_state(TASK_RUNNING);
 	return ERR_PTR(-EINTR);
@@ -439,8 +480,8 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 static inline struct rw_semaphore *
 __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 {
-	int count;
-	bool waiting = true; /* any queued threads before us */
+	int count, adjustment;
+	bool first = false;	/* First one in queue */
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
@@ -455,17 +496,18 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	 */
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
+	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 
 	/* account for this before adding a new element to the list */
 	if (list_empty(&sem->wait_list))
-		waiting = false;
+		first = true;
 
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-	if (waiting) {
+	if (!first) {
 		count = atomic_read(&sem->count);
 
 		/*
@@ -497,19 +539,30 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
 	while (true) {
-		if (rwsem_try_write_lock(count, sem))
+		if (rwsem_try_write_lock(count, sem, first))
 			break;
+
 		raw_spin_unlock_irq(&sem->wait_lock);
 
 		/* Block until there are no active lockers. */
-		do {
+		for (;;) {
 			if (signal_pending_state(state, current))
 				goto out_nolock;
 
 			schedule();
 			set_current_state(state);
 			count = atomic_read(&sem->count);
-		} while (RWSEM_COUNT_LOCKED(count));
+
+			if (!first)
+				first = rwsem_waiter_is_first(sem, &waiter);
+
+			if (!RWSEM_COUNT_LOCKED(count))
+				break;
+
+			if (first && !RWSEM_COUNT_HANDOFF(count) &&
+			    time_after(jiffies, waiter.timeout))
+				atomic_or(RWSEM_FLAG_WHANDOFF, &sem->count);
+		}
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
@@ -523,9 +576,14 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	__set_current_state(TASK_RUNNING);
 	raw_spin_lock_irq(&sem->wait_lock);
 	list_del(&waiter.list);
+	adjustment = 0;
 	if (list_empty(&sem->wait_list))
-		atomic_add(-RWSEM_FLAG_WAITERS, &sem->count);
-	else
+		adjustment -= RWSEM_FLAG_WAITERS;
+	if (first)
+		adjustment -= (atomic_read(&sem->count) & RWSEM_FLAG_WHANDOFF);
+	if (adjustment)
+		atomic_add(adjustment, &sem->count);
+	if (!list_empty(&sem->wait_list))
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	wake_up_q(&wake_q);
@@ -552,7 +610,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
  * - up_read/up_write has decremented the active part of count if we come here
  */
 __visible
-struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
+struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, int count)
 {
 	unsigned long flags;
 	DEFINE_WAKE_Q(wake_q);
@@ -585,7 +643,9 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	smp_rmb();
 
 	/*
-	 * If a spinner is present, it is not necessary to do the wakeup.
+	 * If a spinner is present and the handoff flag isn't set, it is
+	 * not necessary to do the wakeup.
+	 *
 	 * Try to do wakeup only if the trylock succeeds to minimize
 	 * spinlock contention which may introduce too much delay in the
 	 * unlock operation.
@@ -604,7 +664,7 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	 * rwsem_has_spinner() is true, it will guarantee at least one
 	 * trylock attempt on the rwsem later on.
 	 */
-	if (rwsem_has_spinner(sem)) {
+	if (rwsem_has_spinner(sem) && !RWSEM_COUNT_HANDOFF(count)) {
 		/*
 		 * The smp_rmb() here is to make sure that the spinner
 		 * state is consulted before reading the wait_lock.
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index 2a8d878..a340ea4 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -74,7 +74,9 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
  *
  * Bit  0    - writer locked bit
  * Bit  1    - waiters present bit
- * Bits 2-7  - reserved
+ * Bit  2    - lock handoff bit
+ * Bit  3    - reader handoff bit
+ * Bits 4-7  - reserved
  * Bits 8-31 - 24-bit reader count
  *
  * atomic_fetch_add() is used to obtain reader lock, whereas atomic_cmpxchg()
@@ -82,19 +84,33 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
  */
 #define RWSEM_WRITER_LOCKED	0X00000001
 #define RWSEM_FLAG_WAITERS	0X00000002
+#define RWSEM_FLAG_HANDOFF	0X00000004
+#define RWSEM_FLAG_READER	0X00000008
+#define RWSEM_FLAG_RHANDOFF	(RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READER)
+#define RWSEM_FLAG_WHANDOFF	RWSEM_FLAG_HANDOFF
+#define RWSEM_FLAG_HANDOFFS	RWSEM_FLAG_RHANDOFF
+
 #define RWSEM_READER_BIAS	0x00000100
 #define RWSEM_READER_SHIFT	8
 #define RWSEM_READER_MASK	(~((1U << RWSEM_READER_SHIFT) - 1))
 #define RWSEM_LOCK_MASK 	(RWSEM_WRITER_LOCKED|RWSEM_READER_MASK)
-#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED|RWSEM_FLAG_WAITERS)
+#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED|RWSEM_FLAG_WAITERS|\
+				 RWSEM_FLAG_HANDOFF)
 
 #define RWSEM_COUNT_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
+#define RWSEM_COUNT_HANDOFF(c)	((c) & RWSEM_FLAG_HANDOFF)
+#define RWSEM_COUNT_HANDOFF_WRITER(c)		\
+	(((c) & RWSEM_FLAG_HANDOFFS) == RWSEM_FLAG_WHANDOFF)
+#define RWSEM_COUNT_HANDOFF_READER(c)		\
+	(((c) & RWSEM_FLAG_HANDOFFS) == RWSEM_FLAG_RHANDOFF)
+#define RWSEM_COUNT_LOCKED_OR_HANDOFF(c)	\
+	((c) & (RWSEM_LOCK_MASK|RWSEM_FLAG_HANDOFF))
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
+extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *, int count);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
 /*
@@ -179,7 +195,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 	tmp = atomic_add_return_release(-RWSEM_READER_BIAS, &sem->count);
 	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS))
 			== RWSEM_FLAG_WAITERS))
-		rwsem_wake(sem);
+		rwsem_wake(sem, tmp);
 }
 
 /*
@@ -187,10 +203,12 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
+	int tmp;
+
 	rwsem_clear_owner(sem);
-	if (unlikely(atomic_fetch_add_release(-RWSEM_WRITER_LOCKED,
-			&sem->count) & RWSEM_FLAG_WAITERS))
-		rwsem_wake(sem);
+	tmp = atomic_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
+	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
+		rwsem_wake(sem, tmp);
 }
 
 /*
-- 
1.8.3.1



* [PATCH-tip v7 08/15] locking/rwsem: Enable readers spinning on writer
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (6 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 07/15] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 09/15] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value Waiman Long
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

This patch enables readers to optimistically spin on a
rwsem when it is owned by a writer instead of going to sleep
directly.  The rwsem_can_spin_on_owner() function is extracted
out of rwsem_optimistic_spin() and is called directly by
rwsem_down_read_failed() and rwsem_down_write_failed().

This patch may actually reduce performance under certain circumstances
as the readers may no longer be grouped together in the wait queue.
We may then end up with a number of small reader groups interspersed
among writers instead of one large reader group. However, this change
is needed for some of the subsequent patches.
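
In outline, the reader slowpath after this patch behaves roughly as in
the sketch below; rwsem_queue_and_sleep() is a made-up placeholder for
the existing wait-queue path:

/*
 * Simplified sketch of the reader slowpath with optimistic spinning.
 * rwsem_queue_and_sleep() stands in for the existing queueing code.
 */
static struct rw_semaphore *down_read_slowpath_sketch(struct rw_semaphore *sem)
{
	bool can_spin = rwsem_can_spin_on_owner(sem);

	if (can_spin || rwsem_has_spinner(sem)) {
		/* Undo the read bias taken in the down_read fast path. */
		atomic_add(-RWSEM_READER_BIAS, &sem->count);

		/* Spin on the owner and try to steal the read lock. */
		if (can_spin &&
		    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ))
			return sem;
	}

	/* Otherwise queue up on the wait list as before. */
	return rwsem_queue_and_sleep(sem);
}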

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 68 +++++++++++++++++++++++++++++++++++++++------
 kernel/locking/rwsem-xadd.h |  1 +
 2 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 65717d8..15cdb28 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -224,6 +224,30 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
+ * Try to acquire read lock before the reader is put on wait queue.
+ * Lock acquisition isn't allowed if the rwsem is locked or a writer handoff
+ * is ongoing.
+ */
+static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
+{
+	int count = atomic_read(&sem->count);
+
+	if (RWSEM_COUNT_WLOCKED(count) || RWSEM_COUNT_HANDOFF_WRITER(count))
+		return false;
+
+	count = atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
+	if (!RWSEM_COUNT_WLOCKED(count) && !RWSEM_COUNT_HANDOFF_WRITER(count)) {
+		if (!(count >> RWSEM_READER_SHIFT))
+			rwsem_set_reader_owned(sem);
+		return true;
+	}
+
+	/* Back out the change */
+	atomic_add(-RWSEM_READER_BIAS, &sem->count);
+	return false;
+}
+
+/*
  * Try to acquire write lock before the writer has been put on wait queue.
  */
 static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
@@ -314,16 +338,14 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 	return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
 }
 
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+				  enum rwsem_waiter_type type)
 {
 	bool taken = false;
 
 	preempt_disable();
 
 	/* sem->wait_lock should not be held when doing optimistic spinning */
-	if (!rwsem_can_spin_on_owner(sem))
-		goto done;
-
 	if (!osq_lock(&sem->osq))
 		goto done;
 
@@ -338,10 +360,12 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		/*
 		 * Try to acquire the lock
 		 */
-		if (rwsem_try_write_lock_unqueued(sem)) {
-			taken = true;
+		taken = (type == RWSEM_WAITING_FOR_WRITE)
+		      ? rwsem_try_write_lock_unqueued(sem)
+		      : rwsem_try_read_lock_unqueued(sem);
+
+		if (taken)
 			break;
-		}
 
 		/*
 		 * When there's no owner, we might have preempted between the
@@ -375,7 +399,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 }
 
 #else
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+{
+	return false;
+}
+
+static inline bool
+rwsem_optimistic_spin(struct rw_semaphore *sem, enum rwsem_waiter_type type)
 {
 	return false;
 }
@@ -402,10 +432,29 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 static inline struct rw_semaphore __sched *
 __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 {
+	bool can_spin;
 	int count, adjustment = -RWSEM_READER_BIAS;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
+	/*
+	 * Undo read bias from down_read operation to stop active locking if:
+	 * 1) Optimistic spinners are present; or
+	 * 2) optimistic spinning is allowed.
+	 */
+	can_spin = rwsem_can_spin_on_owner(sem);
+	if (can_spin || rwsem_has_spinner(sem)) {
+		atomic_add(-RWSEM_READER_BIAS, &sem->count);
+		adjustment = 0;
+
+		/*
+		 * Do optimistic spinning and steal lock if possible.
+		 */
+		if (can_spin &&
+		    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ))
+			return sem;
+	}
+
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
@@ -487,7 +536,8 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
-	if (rwsem_optimistic_spin(sem))
+	if (rwsem_can_spin_on_owner(sem) &&
+	    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
 		return sem;
 
 	/*
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index a340ea4..db9726b 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -98,6 +98,7 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 				 RWSEM_FLAG_HANDOFF)
 
 #define RWSEM_COUNT_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
+#define RWSEM_COUNT_WLOCKED(c)	((c) & RWSEM_WRITER_LOCKED)
 #define RWSEM_COUNT_HANDOFF(c)	((c) & RWSEM_FLAG_HANDOFF)
 #define RWSEM_COUNT_HANDOFF_WRITER(c)		\
 	(((c) & RWSEM_FLAG_HANDOFFS) == RWSEM_FLAG_WHANDOFF)
-- 
1.8.3.1



* [PATCH-tip v7 09/15] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (7 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 08/15] locking/rwsem: Enable readers spinning on writer Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 10/15] locking/rwsem: Enable count-based spinning on reader Waiman Long
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

This patch modifies rwsem_spin_on_owner() to return a tri-state value
to better reflect the state of the lock holder, which enables us to
make a better decision about what to do next.
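
For example (illustrative only, not how the final code is structured),
a spinner could branch on the three states like this:

/* Sketch: acting on the tri-state value of rwsem_spin_on_owner(). */
static bool spin_for_write_lock_sketch(struct rw_semaphore *sem)
{
	for (;;) {
		int state = rwsem_spin_on_owner(sem);

		if (state < 0)
			return false;	/* owner stopped running: give up */
		if (state == 0)
			return false;	/* reader-owned: handled separately */

		/* state > 0: writer owner changed, retry the trylock */
		if (rwsem_try_write_lock_unqueued(sem))
			return true;
	}
}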

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 15cdb28..7597736 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -298,9 +298,13 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 }
 
 /*
- * Return true only if we can still spin on the owner field of the rwsem.
+ * Return the following three values depending on the lock owner state.
+ *   1	when owner has changed and no reader is detected yet.
+ *   0	when owner has changed and/or owner is a reader.
+ *  -1	when optimistic spinning has to stop because either the owner stops
+ *	running or its timeslice has been used up.
  */
-static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
+static noinline int rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner = READ_ONCE(sem->owner);
 
@@ -324,7 +328,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 		if (!owner->on_cpu || need_resched() ||
 				vcpu_is_preempted(task_cpu(owner))) {
 			rcu_read_unlock();
-			return false;
+			return -1;
 		}
 
 		cpu_relax();
@@ -335,7 +339,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 	 * If there is a new owner or the owner is not set, we continue
 	 * spinning.
 	 */
-	return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
+	return rwsem_owner_is_reader(READ_ONCE(sem->owner)) ? 0 : 1;
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
@@ -356,7 +360,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 	 *  2) readers own the lock as we can't determine if they are
 	 *     actively running or not.
 	 */
-	while (rwsem_spin_on_owner(sem)) {
+	while (rwsem_spin_on_owner(sem) > 0) {
 		/*
 		 * Try to acquire the lock
 		 */
-- 
1.8.3.1



* [PATCH-tip v7 10/15] locking/rwsem: Enable count-based spinning on reader
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (8 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 09/15] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 11/15] locking/rwsem: Remove rwsem_wake spinlock optimization Waiman Long
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

When the rwsem is owned by a reader, writers stop optimistic spinning
simply because there is no easy way to figure out if all the readers
are actively running or not. However, there are scenarios where
the readers are unlikely to sleep and optimistic spinning can help
performance.

This patch provides a simple mechanism for spinning on a reader-owned
rwsem. It is a loop count threshold based spin where the count will
get reset whenever the rwsem reader count value changes, indicating
that the rwsem is still active. There is another maximum count value
that limits the maximum number of spins that can happen.

When the loop or max counts reach 0, a bit will be set in the owner
field to indicate that no more optimistic spinning should be done on
this rwsem until it becomes writer owned again.

The spinning threshold and maximum values can be overridden by an
architecture-specific rwsem.h header file, if necessary. The current
default threshold value is 512 iterations.
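
As a rough stand-alone sketch of the spin-budget accounting (C11
atomics instead of the kernel atomics; the helper and its names are
simplified for illustration and are not the actual rwsem code):

#include <stdatomic.h>
#include <stdbool.h>

#define RSPIN_THRESHOLD	(1 << 9)	/* reset value: 512 iterations */
#define RSPIN_MAX	(1 << 12)	/* cap on reader-owned spins */

/*
 * Spin while the lock stays reader-owned. Return true if the reader
 * count dropped to zero (worth retrying the lock), false if the spin
 * budget ran out (the real code then sets a disable bit in ->owner).
 */
static bool reader_spin_budget(atomic_int *reader_count)
{
	int rspin_cnt = RSPIN_THRESHOLD;
	int rspin_max = RSPIN_MAX;
	int old_count = atomic_load(reader_count);

	while (rspin_cnt && rspin_max) {
		int count = atomic_load(reader_count);

		if (count == 0)
			return true;		/* readers are gone */

		if (count != old_count) {
			/* reader count changed: rwsem is still active */
			old_count = count;
			rspin_cnt = RSPIN_THRESHOLD;
		} else {
			rspin_cnt--;
		}
		rspin_max--;
	}
	return false;				/* budget exhausted */
}
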

On a 2-socket 40-core x86-64 Gold 6148 system, a rwsem microbenchmark
was run with 40 locking threads (one per core), each doing 10s of an
equal number of reader and writer lock/unlock operations alternately
on the same rwsem. The resulting total locking rates on a 4.14 based
kernel were 927 kop/s without the patch and 3,218 kop/s with it, an
increase of about 247%.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 72 ++++++++++++++++++++++++++++++++++++++-------
 kernel/locking/rwsem-xadd.h | 40 +++++++++++++++++++++----
 2 files changed, 95 insertions(+), 17 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 7597736..8205910 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -86,6 +86,22 @@ enum rwsem_wake_type {
 #define RWSEM_WAIT_TIMEOUT	((HZ - 1)/200 + 1)
 
 /*
+ * Reader-owned rwsem spinning threshold and maximum value
+ *
+ * These threshold and maximum values can be overridden by architecture
+ * specific values. The loop count will be reset whenever the rwsem count
+ * value changes. The max value constrains the total number of reader-owned
+ * lock spins that can happen.
+ */
+#ifdef	ARCH_RWSEM_RSPIN_THRESHOLD
+# define RWSEM_RSPIN_THRESHOLD	ARCH_RWSEM_RSPIN_THRESHOLD
+# define RWSEM_RSPIN_MAX	ARCH_RWSEM_RSPIN_MAX
+#else
+# define RWSEM_RSPIN_THRESHOLD	(1 << 9)
+# define RWSEM_RSPIN_MAX	(1 << 12)
+#endif
+
+/*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
  *   have been set.
@@ -269,7 +285,8 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 	}
 }
 
-static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem,
+					   bool reader)
 {
 	struct task_struct *owner;
 	bool ret = true;
@@ -281,9 +298,9 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 	owner = READ_ONCE(sem->owner);
 	if (!rwsem_owner_is_writer(owner)) {
 		/*
-		 * Don't spin if the rwsem is readers owned.
+		 * Don't spin if the rspin disable bit is set for writer.
 		 */
-		ret = !rwsem_owner_is_reader(owner);
+		ret = reader || !rwsem_owner_is_spin_disabled(owner);
 		goto done;
 	}
 
@@ -346,6 +363,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 				  enum rwsem_waiter_type type)
 {
 	bool taken = false;
+	bool reader = (type == RWSEM_WAITING_FOR_READ);
+	int owner_state;	/* Lock owner state */
+	int rspin_cnt = RWSEM_RSPIN_THRESHOLD;
+	int rspin_max = RWSEM_RSPIN_MAX;
+	int old_count = 0;
 
 	preempt_disable();
 
@@ -353,25 +375,53 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 	if (!osq_lock(&sem->osq))
 		goto done;
 
+	if (rwsem_is_spin_disabled(sem))
+		rspin_cnt = 0;
+
 	/*
 	 * Optimistically spin on the owner field and attempt to acquire the
 	 * lock whenever the owner changes. Spinning will be stopped when:
 	 *  1) the owning writer isn't running; or
-	 *  2) readers own the lock as we can't determine if they are
-	 *     actively running or not.
+	 *  2) writer: readers own the lock and spinning count has reached 0;
 	 */
-	while (rwsem_spin_on_owner(sem) > 0) {
+	while ((owner_state = rwsem_spin_on_owner(sem)) >= 0) {
 		/*
 		 * Try to acquire the lock
 		 */
-		taken = (type == RWSEM_WAITING_FOR_WRITE)
-		      ? rwsem_try_write_lock_unqueued(sem)
-		      : rwsem_try_read_lock_unqueued(sem);
+		taken = reader ? rwsem_try_read_lock_unqueued(sem)
+			       : rwsem_try_write_lock_unqueued(sem);
 
 		if (taken)
 			break;
 
 		/*
+		 * We only decrement the rspin_cnt when the lock is owned
+		 * by readers (owner_state == 0), in which case
+		 * rwsem_spin_on_owner() will essentially be a no-op
+		 * and we will be spinning in this main loop. The spinning
+		 * count will be reset whenever the rwsem count value
+		 * changes.
+		 */
+		if (!owner_state) {
+			int count;
+
+			if (!rspin_cnt || !rspin_max) {
+				if (!rwsem_is_spin_disabled(sem))
+					rwsem_set_spin_disabled(sem);
+				break;
+			}
+
+			count = atomic_read(&sem->count) >> RWSEM_READER_SHIFT;
+			if (count != old_count) {
+				old_count = count;
+				rspin_cnt = RWSEM_RSPIN_THRESHOLD;
+			} else {
+				rspin_cnt--;
+			}
+			rspin_max--;
+		}
+
+		/*
 		 * When there's no owner, we might have preempted between the
 		 * owner acquiring the lock and setting the owner field. If
 		 * we're an RT task that will live-lock because we won't let
@@ -446,7 +496,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	 * 1) Optimistic spinners are present; or
 	 * 2) optimistic spinning is allowed.
 	 */
-	can_spin = rwsem_can_spin_on_owner(sem);
+	can_spin = rwsem_can_spin_on_owner(sem, true);
 	if (can_spin || rwsem_has_spinner(sem)) {
 		atomic_add(-RWSEM_READER_BIAS, &sem->count);
 		adjustment = 0;
@@ -540,7 +590,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
-	if (rwsem_can_spin_on_owner(sem) &&
+	if (rwsem_can_spin_on_owner(sem, false) &&
 	    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
 		return sem;
 
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index db9726b..2429d8e 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -12,13 +12,15 @@
  * In essence, the owner field now has the following 3 states:
  *  1) 0
  *     - lock is free or the owner hasn't set the field yet
- *  2) RWSEM_READER_OWNED
+ *  2) (owner & RWSEM_READER_OWNED) == RWSEM_READER_OWNED
  *     - lock is currently or previously owned by readers (lock is free
- *       or not set by owner yet)
+ *       or not set by owner yet). The other bits in the owner field can
+ *	 be used for other purposes.
  *  3) Other non-zero value
  *     - a writer owns the lock
  */
-#define RWSEM_READER_OWNED	((struct task_struct *)1UL)
+#define RWSEM_READER_OWNED	(1UL)
+#define RWSEM_SPIN_DISABLED	(2UL)
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
@@ -43,17 +45,43 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
  */
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
-	WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
+	WRITE_ONCE(sem->owner, (void *)RWSEM_READER_OWNED);
 }
 
 static inline bool rwsem_owner_is_writer(struct task_struct *owner)
 {
-	return owner && (owner != RWSEM_READER_OWNED);
+	return owner && !((unsigned long)owner & RWSEM_READER_OWNED);
 }
 
 static inline bool rwsem_owner_is_reader(struct task_struct *owner)
 {
-	return owner == RWSEM_READER_OWNED;
+	return (unsigned long)owner & RWSEM_READER_OWNED;
+}
+
+static inline bool rwsem_owner_is_spin_disabled(struct task_struct *owner)
+{
+	return (unsigned long)owner & RWSEM_SPIN_DISABLED;
+}
+
+/*
+ * Try to set an optimistic spinning disable bit while it is reader-owned.
+ */
+static inline void rwsem_set_spin_disabled(struct rw_semaphore *sem)
+{
+	struct task_struct *owner = READ_ONCE(sem->owner);
+
+	/*
+	 * Failure in cmpxchg() will be ignored, and the caller is expected
+	 * to retry later.
+	 */
+	if (rwsem_owner_is_reader(owner))
+		cmpxchg(&sem->owner, owner,
+			(void *)((unsigned long)owner|RWSEM_SPIN_DISABLED));
+}
+
+static inline bool rwsem_is_spin_disabled(struct rw_semaphore *sem)
+{
+	return rwsem_owner_is_spin_disabled(READ_ONCE(sem->owner));
 }
 #else
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
-- 
1.8.3.1



* [PATCH-tip v7 11/15] locking/rwsem: Remove rwsem_wake spinlock optimization
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (9 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 10/15] locking/rwsem: Enable count-based spinning on reader Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 12/15] locking/rwsem: Eliminate redundant writer wakeup calls Waiman Long
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

The rwsem_wake spinlock optimization was originally put there to
reduce lock contention on the wait_lock. However, the usefulness of
this optimization has been reduced for the following reasons:

 1) The recent use of wake_q in the wakeup path has greatly
    reduced the wait_lock hold time.
 2) Contending writers used to produce false positive calls to
    rwsem_wake without any waiter rather easily with the original
    locking scheme. This is no longer the case with the new locking
    scheme.
 3) Reader optimistic spinning reduces the chance of having waiters
    in the wait queue.

On the other hand, the complexity of making sure there is no missed
wakeup keeps increasing. It has reached the point where the drawbacks
outweigh its usefulness, so the optimization code is now taken out.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 62 +--------------------------------------------
 kernel/locking/rwsem-xadd.h |  6 ++---
 2 files changed, 4 insertions(+), 64 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 8205910..ba00795 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -714,72 +714,12 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
  * - up_read/up_write has decremented the active part of count if we come here
  */
 __visible
-struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, int count)
+struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 {
 	unsigned long flags;
 	DEFINE_WAKE_Q(wake_q);
 
-	/*
-	* __rwsem_down_write_failed_common(sem)
-	*   rwsem_optimistic_spin(sem)
-	*     osq_unlock(sem->osq)
-	*   ...
-	*   atomic_long_add_return(&sem->count)
-	*
-	*      - VS -
-	*
-	*              __up_write()
-	*                if (atomic_long_sub_return_release(&sem->count) < 0)
-	*                  rwsem_wake(sem)
-	*                    osq_is_locked(&sem->osq)
-	*
-	* And __up_write() must observe !osq_is_locked() when it observes the
-	* atomic_long_add_return() in order to not miss a wakeup.
-	*
-	* This boils down to:
-	*
-	* [S.rel] X = 1                [RmW] r0 = (Y += 0)
-	*         MB                         RMB
-	* [RmW]   Y += 1               [L]   r1 = X
-	*
-	* exists (r0=1 /\ r1=0)
-	*/
-	smp_rmb();
-
-	/*
-	 * If a spinner is present and the handoff flag isn't set, it is
-	 * not necessary to do the wakeup.
-	 *
-	 * Try to do wakeup only if the trylock succeeds to minimize
-	 * spinlock contention which may introduce too much delay in the
-	 * unlock operation.
-	 *
-	 *    spinning writer		up_write/up_read caller
-	 *    ---------------		-----------------------
-	 * [S]   osq_unlock()		[L]   osq
-	 *	 MB			      RMB
-	 * [RmW] rwsem_try_write_lock() [RmW] spin_trylock(wait_lock)
-	 *
-	 * Here, it is important to make sure that there won't be a missed
-	 * wakeup while the rwsem is free and the only spinning writer goes
-	 * to sleep without taking the rwsem. Even when the spinning writer
-	 * is just going to break out of the waiting loop, it will still do
-	 * a trylock in rwsem_down_write_failed() before sleeping. IOW, if
-	 * rwsem_has_spinner() is true, it will guarantee at least one
-	 * trylock attempt on the rwsem later on.
-	 */
-	if (rwsem_has_spinner(sem) && !RWSEM_COUNT_HANDOFF(count)) {
-		/*
-		 * The smp_rmb() here is to make sure that the spinner
-		 * state is consulted before reading the wait_lock.
-		 */
-		smp_rmb();
-		if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
-			return sem;
-		goto locked;
-	}
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
-locked:
 
 	if (!list_empty(&sem->wait_list))
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index 2429d8e..1e87e85 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -139,7 +139,7 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *, int count);
+extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
 /*
@@ -224,7 +224,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 	tmp = atomic_add_return_release(-RWSEM_READER_BIAS, &sem->count);
 	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS))
 			== RWSEM_FLAG_WAITERS))
-		rwsem_wake(sem, tmp);
+		rwsem_wake(sem);
 }
 
 /*
@@ -237,7 +237,7 @@ static inline void __up_write(struct rw_semaphore *sem)
 	rwsem_clear_owner(sem);
 	tmp = atomic_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
 	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
-		rwsem_wake(sem, tmp);
+		rwsem_wake(sem);
 }
 
 /*
-- 
1.8.3.1



* [PATCH-tip v7 12/15] locking/rwsem: Eliminate redundant writer wakeup calls
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (10 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 11/15] locking/rwsem: Remove rwsem_wake spinlock optimization Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 13/15] locking/rwsem: Improve fairness to writers Waiman Long
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

When tasks with short lock hold times acquire a rwsem consecutively,
it is possible that they all try to wake up the same waiting task at
unlock time. Besides the first wakeup, the rest are a waste of precious
CPU cycles. This patch limits the actual wakeup call to only one.

To simplify synchronization of the waking flag, the inner schedule
loop of __rwsem_down_write_failed_common() is removed and
raw_spin_lock_irq() is always called after wakeup.
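
A minimal, self-contained sketch of the wake-once idea is shown below;
the struct and helper names are made up for the example, and the small
array stands in for the kernel's wake_q:

#include <stdbool.h>

/* Simplified stand-ins for the kernel's waiter and wake_q types. */
struct waiter {
	int  task_id;
	bool waking;		/* protected by the wait-queue lock */
};

struct wake_list {
	int ids[16];
	int n;
};

/*
 * Called with the wait-queue lock held by every unlocker that finds a
 * writer at the head of the queue. Only the first caller queues the
 * wakeup; later callers see waking == true and return immediately.
 */
static void mark_writer_for_wakeup(struct waiter *w, struct wake_list *wl)
{
	if (w->waking)
		return;			/* wakeup already queued */

	w->waking = true;
	if (wl->n < 16)
		wl->ids[wl->n++] = w->task_id;
}
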

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index ba00795..3bdbf39 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -70,6 +70,7 @@ struct rwsem_waiter {
 	struct list_head list;
 	struct task_struct *task;
 	enum rwsem_waiter_type type;
+	bool waking;	/* For writer, protected by wait_lock */
 	unsigned long timeout;
 };
 
@@ -129,6 +130,12 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
 		if (wake_type == RWSEM_WAKE_ANY) {
 			/*
+			 * No redundant wakeup if the waiter is waking up.
+			 */
+			if (waiter->waking)
+				return;
+
+			/*
 			 * Mark writer at the front of the queue for wakeup.
 			 * Until the task is actually awoken later by
 			 * the caller, other writers are able to steal it.
@@ -136,6 +143,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 			 * will notice the queued writer.
 			 */
 			wake_q_add(wake_q, waiter->task);
+			waiter->waking = true;
 		}
 
 		return;
@@ -600,6 +608,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	 */
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
+	waiter.waking = false;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
 
 	raw_spin_lock_irq(&sem->wait_lock);
@@ -646,29 +655,24 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 		if (rwsem_try_write_lock(count, sem, first))
 			break;
 
-		raw_spin_unlock_irq(&sem->wait_lock);
+		if (signal_pending_state(state, current))
+			goto out_nolock;
 
-		/* Block until there are no active lockers. */
-		for (;;) {
-			if (signal_pending_state(state, current))
-				goto out_nolock;
+		if (!first)
+			first = rwsem_waiter_is_first(sem, &waiter);
 
-			schedule();
-			set_current_state(state);
-			count = atomic_read(&sem->count);
+		waiter.waking = false;
+		raw_spin_unlock_irq(&sem->wait_lock);
 
-			if (!first)
-				first = rwsem_waiter_is_first(sem, &waiter);
+		if (first && !RWSEM_COUNT_HANDOFF(count) &&
+		    time_after(jiffies, waiter.timeout))
+			atomic_or(RWSEM_FLAG_WHANDOFF, &sem->count);
 
-			if (!RWSEM_COUNT_LOCKED(count))
-				break;
-
-			if (first && !RWSEM_COUNT_HANDOFF(count) &&
-			    time_after(jiffies, waiter.timeout))
-				atomic_or(RWSEM_FLAG_WHANDOFF, &sem->count);
-		}
+		schedule();
+		set_current_state(state);
 
 		raw_spin_lock_irq(&sem->wait_lock);
+		count = atomic_read(&sem->count);
 	}
 	__set_current_state(TASK_RUNNING);
 	list_del(&waiter.list);
@@ -678,7 +682,6 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 out_nolock:
 	__set_current_state(TASK_RUNNING);
-	raw_spin_lock_irq(&sem->wait_lock);
 	list_del(&waiter.list);
 	adjustment = 0;
 	if (list_empty(&sem->wait_list))
-- 
1.8.3.1



* [PATCH-tip v7 13/15] locking/rwsem: Improve fairness to writers
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (11 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 12/15] locking/rwsem: Eliminate redundant writer wakeup calls Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 14/15] locking/rwsem: Make waiting writer to optimistically spin for the lock Waiman Long
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

When the spin disable bit in the owner field is set, writers will
stop spinning. However, an incoming stream of readers will be able to
go to the front of the OSQ and acquire the lock continuously without
ever releasing the read lock. This can lead to writer starvation.

Two changes are made to improve fairness to writers and hence prevent
writer starvation. The readers will stop spinning and go to the wait
queue when:

 1) The reader is not the first reader and the current time doesn't
    match the timestamp stored in the owner field.
 2) Writers cannot spin on readers.

This will allow the reader count to eventually reach 0 and wake up
the first waiter in the wait queue.
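
The timestamp packing can be sketched in stand-alone C as follows
(constants copied from the patch, jiffies stubbed as a plain counter;
this is an illustration only, not the kernel code):

#include <stdbool.h>

#define RWSEM_READER_OWNED		(1UL)
#define RWSEM_READER_TIMESTAMP_SHIFT	8
#define RWSEM_READER_TIMESTAMP_MASK	\
	(~((1UL << RWSEM_READER_TIMESTAMP_SHIFT) - 1))

static unsigned long jiffies;	/* stand-in for the kernel's jiffies */

/* Value stored in the owner word when a reader takes the lock. */
static unsigned long reader_owner_value(void)
{
	return RWSEM_READER_OWNED |
	       (jiffies << RWSEM_READER_TIMESTAMP_SHIFT);
}

/*
 * An incoming reader in the slowpath is only allowed to join the read
 * lock while the stored timestamp still matches the current time.
 */
static bool timestamp_matches(unsigned long owner)
{
	return (owner & RWSEM_READER_TIMESTAMP_MASK) ==
	       (jiffies << RWSEM_READER_TIMESTAMP_SHIFT);
}
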

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 60 +++++++++++++++++++++++++++++++++------------
 kernel/locking/rwsem-xadd.h | 32 +++++++++++++++++++-----
 2 files changed, 70 insertions(+), 22 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 3bdbf39..edb5ecc 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -169,10 +169,11 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 			return;
 		}
 		/*
-		 * Set it to reader-owned for first reader.
+		 * Since the wait queue head is a reader, we can set a
+		 * new timestamp even if it is not the first reader to
+		 * acquire the current lock.
 		 */
-		if (!(oldcount >> RWSEM_READER_SHIFT))
-			rwsem_set_reader_owned(sem);
+		rwsem_set_reader_owned(sem);
 	}
 
 	/*
@@ -249,44 +250,64 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
  * Try to acquire read lock before the reader is put on wait queue.
- * Lock acquisition isn't allowed if the rwsem is locked or a writer handoff
- * is ongoing.
+ *
+ * Lock acquisition isn't allowed if the rwsem is locked.
+ *
+ * To avoid writer starvation and improve fairness, the reader has to stop
+ * spinning if the reader count isn't 0, a waiter is present and there is a
+ * timestamp mismatch.
+ *
+ * This will allow the reader count to go to 0 and wake up the first waiter
+ * in the wait queue, which can initiate the handoff protocol, if necessary.
+ *
+ * Return:  1 - lock acquired
+ *	    0 - lock acquisition failed
+ *	   -1 - stop spinning
  */
-static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
+static inline int rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
 {
 	int count = atomic_read(&sem->count);
 
 	if (RWSEM_COUNT_WLOCKED(count) || RWSEM_COUNT_HANDOFF_WRITER(count))
-		return false;
+		return 0;
 
 	count = atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
 	if (!RWSEM_COUNT_WLOCKED(count) && !RWSEM_COUNT_HANDOFF_WRITER(count)) {
-		if (!(count >> RWSEM_READER_SHIFT))
+		if (!(count >> RWSEM_READER_SHIFT)) {
 			rwsem_set_reader_owned(sem);
-		return true;
+		} else if (count & RWSEM_FLAG_WAITERS) {
+			struct task_struct *owner = READ_ONCE(sem->owner);
+
+			if (rwsem_owner_is_reader(owner) &&
+			   !rwsem_owner_timestamp_match(owner)) {
+				atomic_add(-RWSEM_READER_BIAS, &sem->count);
+				return -1;
+			}
+		}
+		return 1;
 	}
 
 	/* Back out the change */
 	atomic_add(-RWSEM_READER_BIAS, &sem->count);
-	return false;
+	return 0;
 }
 
 /*
  * Try to acquire write lock before the writer has been put on wait queue.
  */
-static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
+static inline int rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 {
 	int old, count = atomic_read(&sem->count);
 
 	while (true) {
 		if (RWSEM_COUNT_LOCKED_OR_HANDOFF(count))
-			return false;
+			return 0;
 
 		old = atomic_cmpxchg_acquire(&sem->count, count,
 				count + RWSEM_WRITER_LOCKED);
 		if (old == count) {
 			rwsem_set_owner(sem);
-			return true;
+			return 1;
 		}
 
 		count = old;
@@ -370,7 +391,7 @@ static noinline int rwsem_spin_on_owner(struct rw_semaphore *sem)
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 				  enum rwsem_waiter_type type)
 {
-	bool taken = false;
+	int taken = 0;
 	bool reader = (type == RWSEM_WAITING_FOR_READ);
 	int owner_state;	/* Lock owner state */
 	int rspin_cnt = RWSEM_RSPIN_THRESHOLD;
@@ -390,10 +411,17 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 	 * Optimistically spin on the owner field and attempt to acquire the
 	 * lock whenever the owner changes. Spinning will be stopped when:
 	 *  1) the owning writer isn't running; or
-	 *  2) writer: readers own the lock and spinning count has reached 0;
+	 *  2) readers own the lock and spinning count has reached 0;
 	 */
 	while ((owner_state = rwsem_spin_on_owner(sem)) >= 0) {
 		/*
+		 * For fairness, reader will stop taking reader-owned lock
+		 * when writers cannot spin on reader.
+		 */
+		if (reader && !rspin_cnt && !owner_state)
+			break;
+
+		/*
 		 * Try to acquire the lock
 		 */
 		taken = reader ? rwsem_try_read_lock_unqueued(sem)
@@ -449,7 +477,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 	osq_unlock(&sem->osq);
 done:
 	preempt_enable();
-	return taken;
+	return taken > 0;
 }
 
 /*
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index 1e87e85..ea87bf1 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -4,10 +4,16 @@
 #include <linux/rwsem.h>
 
 /*
- * The owner field of the rw_semaphore structure will be set to
- * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
- * the owner field when it unlocks. A reader, on the other hand, will
- * not touch the owner field when it unlocks.
+ * When a reader acquires the lock, the RWSEM_READER_OWNED bit of the owner
+ * field will be set. In addition, a timestamp based on the current jiffies
+ * value will be put into the upper bits of the owner. An incoming reader in
+ * the slowpath will not be allowed to join the read lock if the current
+ * time does not match the timestamp. This check will prevent a continuous
+ * stream of incoming readers from monopolizing the lock and starving the
+ * writers.
+ *
+ * A writer will clear the owner field when it unlocks. A reader, on the
+ * other hand, will not touch the owner field when it unlocks.
  *
  * In essence, the owner field now has the following 3 states:
  *  1) 0
@@ -19,8 +25,11 @@
  *  3) Other non-zero value
  *     - a writer owns the lock
  */
-#define RWSEM_READER_OWNED	(1UL)
-#define RWSEM_SPIN_DISABLED	(2UL)
+#define RWSEM_READER_OWNED		(1UL)
+#define RWSEM_SPIN_DISABLED		(2UL)
+#define RWSEM_READER_TIMESTAMP_SHIFT	8
+#define RWSEM_READER_TIMESTAMP_MASK	\
+	~((1UL << RWSEM_READER_TIMESTAMP_SHIFT) - 1)
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
@@ -46,6 +55,8 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
 	WRITE_ONCE(sem->owner, (void *)RWSEM_READER_OWNED);
+	WRITE_ONCE(sem->owner, (void *)(RWSEM_READER_OWNED |
+		  (jiffies << RWSEM_READER_TIMESTAMP_SHIFT)));
 }
 
 static inline bool rwsem_owner_is_writer(struct task_struct *owner)
@@ -58,6 +69,15 @@ static inline bool rwsem_owner_is_reader(struct task_struct *owner)
 	return (unsigned long)owner & RWSEM_READER_OWNED;
 }
 
+/*
+ * Return true if the timestamp matches the current time.
+ */
+static inline bool rwsem_owner_timestamp_match(struct task_struct *owner)
+{
+	return ((unsigned long)owner & RWSEM_READER_TIMESTAMP_MASK) ==
+	       (jiffies << RWSEM_READER_TIMESTAMP_SHIFT);
+}
+
 static inline bool rwsem_owner_is_spin_disabled(struct task_struct *owner)
 {
 	return (unsigned long)owner & RWSEM_SPIN_DISABLED;
-- 
1.8.3.1



* [PATCH-tip v7 14/15] locking/rwsem: Make waiting writer to optimistically spin for the lock
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (12 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 13/15] locking/rwsem: Improve fairness to writers Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-18 18:30 ` [PATCH-tip v7 15/15] locking/rwsem: Wake up all readers in wait queue Waiman Long
  2017-10-19 15:21 ` [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

When a waiting writer set the handoff bit, it previously went back to
sleep. This introduced a waiting period during which the rwsem could
not be acquired, but the designated waiter was not yet ready to
acquire it. To help eliminate this waiting period, the writer that
sets the handoff bit will now optimistically spin for the rwsem
instead of going to sleep immediately.

The waiting writer at the front of the queue will also set the handoff
bit after it has slept at least once to make the rwsem favor writers
more than readers.
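
A self-contained sketch of the trylock used by the designated waiter
is given below (C11 atomics and made-up bit values instead of the
kernel's; only the handoff handling mirrors the patch):

#include <stdatomic.h>
#include <stdbool.h>

#define WRITER_LOCKED	0x01U		/* illustrative bit layout */
#define FLAG_WHANDOFF	0x04U
#define READER_MASK	0xffffff00U
#define LOCK_MASK	(WRITER_LOCKED | READER_MASK)

/*
 * Try to take the write lock. The handoff bit blocks ordinary
 * spinners, but the first waiter (fwaiter) may clear it and take
 * the lock in the same cmpxchg.
 */
static bool try_write_lock(atomic_uint *count, bool fwaiter)
{
	unsigned int old = atomic_load(count);

	for (;;) {
		unsigned int new;

		if (old & LOCK_MASK)
			return false;		/* already locked */

		new = old | WRITER_LOCKED;
		if (old & FLAG_WHANDOFF) {
			if (!fwaiter)
				return false;	/* reserved for head waiter */
			new &= ~FLAG_WHANDOFF;	/* consume the handoff */
		}

		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(count, &old, new))
			return true;
	}
}
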

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 64 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index edb5ecc..42129fa 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -294,17 +294,28 @@ static inline int rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
 
 /*
  * Try to acquire write lock before the writer has been put on wait queue.
+ *
+ * The fwaiter flag can only be set by a waiting writer at the head of the
+ * wait queue. So the handoff flag can be ignored and cleared in this case.
  */
-static inline int rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
+static inline int rwsem_try_write_lock_unqueued(struct rw_semaphore *sem,
+						bool fwaiter)
 {
-	int old, count = atomic_read(&sem->count);
+	int old, new, count = atomic_read(&sem->count);
 
 	while (true) {
-		if (RWSEM_COUNT_LOCKED_OR_HANDOFF(count))
+		if (RWSEM_COUNT_LOCKED(count))
 			return 0;
 
-		old = atomic_cmpxchg_acquire(&sem->count, count,
-				count + RWSEM_WRITER_LOCKED);
+		new = count + RWSEM_WRITER_LOCKED;
+		if (RWSEM_COUNT_HANDOFF(count)) {
+			if (fwaiter)
+				new -= RWSEM_FLAG_WHANDOFF;
+			else
+				return 0;
+		}
+
+		old = atomic_cmpxchg_acquire(&sem->count, count, new);
 		if (old == count) {
 			rwsem_set_owner(sem);
 			return 1;
@@ -389,7 +400,7 @@ static noinline int rwsem_spin_on_owner(struct rw_semaphore *sem)
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
-				  enum rwsem_waiter_type type)
+				  enum rwsem_waiter_type type, bool fwaiter)
 {
 	int taken = 0;
 	bool reader = (type == RWSEM_WAITING_FOR_READ);
@@ -401,7 +412,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 	preempt_disable();
 
 	/* sem->wait_lock should not be held when doing optimistic spinning */
-	if (!osq_lock(&sem->osq))
+	if (!fwaiter && !osq_lock(&sem->osq))
 		goto done;
 
 	if (rwsem_is_spin_disabled(sem))
@@ -425,7 +436,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 		 * Try to acquire the lock
 		 */
 		taken = reader ? rwsem_try_read_lock_unqueued(sem)
-			       : rwsem_try_write_lock_unqueued(sem);
+			       : rwsem_try_write_lock_unqueued(sem, fwaiter);
 
 		if (taken)
 			break;
@@ -474,7 +485,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 		 */
 		cpu_relax();
 	}
-	osq_unlock(&sem->osq);
+	if (!fwaiter)
+		osq_unlock(&sem->osq);
 done:
 	preempt_enable();
 	return taken > 0;
@@ -494,8 +506,9 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 	return false;
 }
 
-static inline bool
-rwsem_optimistic_spin(struct rw_semaphore *sem, enum rwsem_waiter_type type)
+static inline bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+					 enum rwsem_waiter_type type,
+					 bool fwaiter)
 {
 	return false;
 }
@@ -541,7 +554,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 		 * Do optimistic spinning and steal lock if possible.
 		 */
 		if (can_spin &&
-		    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ))
+		    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ, false))
 			return sem;
 	}
 
@@ -627,7 +640,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 	/* do optimistic spinning and steal lock if possible */
 	if (rwsem_can_spin_on_owner(sem, false) &&
-	    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
+	    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE, false))
 		return sem;
 
 	/*
@@ -637,7 +650,6 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
 	waiter.waking = false;
-	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 
@@ -691,14 +703,30 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 		waiter.waking = false;
 		raw_spin_unlock_irq(&sem->wait_lock);
+		schedule();
 
-		if (first && !RWSEM_COUNT_HANDOFF(count) &&
-		    time_after(jiffies, waiter.timeout))
+		if (!first)
+			goto relock;
+		/*
+		 * As the writer can optimistically spin for the rwsem here,
+		 * we set the handoff bit after it has slept at
+		 * least once.
+		 */
+		if (!RWSEM_COUNT_HANDOFF(count))
 			atomic_or(RWSEM_FLAG_WHANDOFF, &sem->count);
 
-		schedule();
+		/*
+		 * Optimistically spin on the lock after the handoff bit is
+		 * set.
+		 */
+		if (rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE, true)) {
+			 raw_spin_lock_irq(&sem->wait_lock);
+			 if (list_is_singular(&sem->wait_list))
+				atomic_sub(RWSEM_FLAG_WAITERS, &sem->count);
+			 break;
+		}
+relock:
 		set_current_state(state);
-
 		raw_spin_lock_irq(&sem->wait_lock);
 		count = atomic_read(&sem->count);
 	}
-- 
1.8.3.1



* [PATCH-tip v7 15/15] locking/rwsem: Wake up all readers in wait queue
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (13 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 14/15] locking/rwsem: Make waiting writer to optimistically spin for the lock Waiman Long
@ 2017-10-18 18:30 ` Waiman Long
  2017-10-19 15:21 ` [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-18 18:30 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

When the front of the wait queue is a reader, other readers
immediately following the first reader will also be woken up at the
same time.

Assuming that the lock hold times of the other readers still in the
queue will be about the same as those of the readers being woken up,
there is not much extra cost beyond the additional wakeup latency
incurred by the waker. Therefore, all the readers in the queue are now
woken up to improve reader throughput.
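
As a stand-alone illustration of the change (a simplified singly
linked list instead of the kernel's list_head; not the actual wakeup
code):

enum waiter_type { WAITING_FOR_WRITE, WAITING_FOR_READ };

struct waiter {
	enum waiter_type type;
	struct waiter *next;
};

/*
 * Count how many readers a single wakeup pass would now grant the
 * lock: writers in the queue are skipped (previously the walk stopped
 * at the first writer), so every queued reader is accounted for.
 */
static int count_woken_readers(const struct waiter *head)
{
	int woken = 0;
	const struct waiter *w;

	for (w = head; w; w = w->next) {
		if (w->type == WAITING_FOR_WRITE)
			continue;	/* was: break */
		woken++;
	}
	return woken;
}
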

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 42129fa..e8da0a9 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -177,16 +177,16 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	}
 
 	/*
-	 * Grant an infinite number of read locks to the readers at the front
-	 * of the queue. We know that woken will be at least 1 as we accounted
-	 * for above. Note we increment the 'active part' of the count by the
+	 * Grant an infinite number of read locks to all the readers in the
+	 * queue. We know that woken will be at least 1 as we accounted for
+	 * above. Note we increment the 'active part' of the count by the
 	 * number of readers before waking any processes up.
 	 */
 	list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
 		struct task_struct *tsk;
 
 		if (waiter->type == RWSEM_WAITING_FOR_WRITE)
-			break;
+			continue;
 
 		woken++;
 		tsk = waiter->task;
-- 
1.8.3.1



* Re: [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features
  2017-10-18 18:30 [PATCH-tip v7 00/15] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
                   ` (14 preceding siblings ...)
  2017-10-18 18:30 ` [PATCH-tip v7 15/15] locking/rwsem: Wake up all readers in wait queue Waiman Long
@ 2017-10-19 15:21 ` Waiman Long
  15 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-10-19 15:21 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner

Hi,

I have just run the rwsem microbenchmark on a 1-socket 44-core Qualcomm
Amberwing (Centriq 2400) arm64 system. There were 18 writer and 18
reader threads running.

For the patched kernel, the results were:

                       Reader                       Writer
  CS Load         Locking Ops/Thread           Locking Ops/Thread
  -------         ------------------           ------------------
     1          18,800/103,894/223,371      496,362/695,560/1,034,278
    10          28,503/ 68,834/154,348      425,708/791,553/1,469,845
    50           7,997/ 28,278/102,327      431,577/897,064/1,898,146
   100          31,628/ 52,555/ 89,431      432,844/580,496/  910,290
 1us sleep      15,625/ 16,071/ 16,535       42,339/ 44,866/   46,189

                     Reader                       Writer
  CS Load     Slowpath Locking Ops         Slowpath Locking Ops
  -------     --------------------         --------------------
     1            1,296,904                     11,196,177
    10            1,125,334                     13,242,082
    50              284,342                     14,960,882
   100              916,305                      9,652,818
 1us sleep          289,177                        807,584

                 All Writers        Half Writers
  CS Load    Locking Ops/Thread  Locking Ops/Thread     % Change
  -------    ------------------  ------------------     --------
     1           1,634,230           695,560             -57.4
    10           1,658,228           791,553             -52.3
    50           1,494,180           897,064             -40.0
   100           1,089,364           580,496             -46.7
 1us sleep          25,380            44,866             +76.8

It is obvious that for arm64, the writers are preferred under all
circumstances. One special thing about the results was that for the
all-writers case, the number of slowpath calls was exceedingly small.
It was about 1,000 or less, which is significantly less than on x86-64,
where it was in the millions. Maybe that was due to the LL/SC
architecture allowing it to stay in the fast path as much as possible
with homogeneous operations.

The corresponding results for the unpatched kernel were:

                       Reader                       Writer
  CS Load         Locking Ops/Thread           Locking Ops/Thread
  -------         ------------------           ------------------
     1           23,898/23,899/23,905        45,264/177,375/461,387
    10           25,114/25,115/25,122        26,188/190,517/458,960
    50           23,762/23,762/23,763        67,862/174,640/269,519
   100           25,050/25,051/25,053        57,214/200,725/814,178
 1us sleep            6/     6/     7             6/ 58,512/180,892

                 All Writers        Half Writers
  CS Load    Locking Ops/Thread  Locking Ops/Thread     % Change
  -------    ------------------  ------------------     --------
     1           1,687,691           177,375             -89.5
    10           1,627,061           190,517             -88.3
    50           1,469,431           174,640             -88.1
   100           1,148,905           200,725             -82.5
 1us sleep          29,865            58,512             +95.9

Cheers,
Longman

