* [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance
@ 2015-07-09 16:32 Waiman Long
  2015-07-09 16:32 ` [PATCH v2 1/2] locking/qrwlock: Reduce reader/writer to reader lock transfer latency Waiman Long
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Waiman Long @ 2015-07-09 16:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, Thomas Gleixner
  Cc: linux-arch, linux-kernel, Will Deacon, Scott J Norton,
	Douglas Hatch, Waiman Long

v1->v2:
 - Take out patch 1 which had been merged to tip.
 - Take out patch 4 as the change may impact light load performance
 - Rebased to the latest tip branch

In converting some existing spinlocks to rwlocks, it was found that
the write lock slowpath performance isn't as good as that of the
qspinlock. This patch series tries to improve qrwlock performance to
close the gap between qspinlock and qrwlock.

With this patch series in place, we can start converting some spinlocks
back to rwlocks where it makes sense and the lock size increase isn't
a concern.
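
For reference, the qrwlock packs the reader count and the writer mode
byte into a single atomic word, with a separate queuing spinlock. The
sketch below roughly follows include/asm-generic/qrwlock_types.h and
qrwlock.h of that era; it is shown for context only and is not part of
this series.

	typedef struct qrwlock {
		atomic_t	cnts;	/* reader count (bits 8+) + writer mode byte */
		arch_spinlock_t	lock;	/* spinlock used to queue the waiters        */
	} arch_rwlock_t;

	#define	_QW_WAITING	1		/* a writer is waiting      */
	#define	_QW_LOCKED	0xff		/* a writer holds the lock  */
	#define	_QW_WMASK	0xff		/* writer mode mask         */
	#define	_QR_SHIFT	8		/* reader count shift       */
	#define _QR_BIAS	(1U << _QR_SHIFT)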

Waiman Long (2):
  locking/qrwlock: Reduce reader/writer to reader lock transfer latency
  locking/qrwlock: Reduce writer to writer lock transfer latency

 kernel/locking/qrwlock.c |   32 +++++++++++++++++---------------
 1 files changed, 17 insertions(+), 15 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] locking/qrwlock: Reduce reader/writer to reader lock transfer latency
  2015-07-09 16:32 [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance Waiman Long
@ 2015-07-09 16:32 ` Waiman Long
  2015-07-09 16:32   ` Waiman Long
  2015-07-09 20:52   ` Davidlohr Bueso
  2015-07-09 16:32 ` [PATCH v2 2/2] locking/qrwlock: Reduce writer to writer " Waiman Long
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Waiman Long @ 2015-07-09 16:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, Thomas Gleixner
  Cc: linux-arch, linux-kernel, Will Deacon, Scott J Norton,
	Douglas Hatch, Waiman Long

Currently, a reader first checks to make sure that the writer mode
byte is cleared before incrementing the reader count. That waiting is
not really necessary. It increases the latency of the reader/writer
to reader transition and reduces reader performance.

This patch eliminates that waiting. It also has the side effect
of reducing the chance of writer lock stealing and improving the
fairness of the lock. Using a locking microbenchmark, a 10-thread 5M
locking loop of mostly readers (RW ratio = 10,000:1) has the following
performance numbers on a Haswell-EX box:

        Kernel          Locking Rate (Kops/s)
        ------          ---------------------
        4.1.1               15,063,081
        Patched 4.1.1       17,241,552
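
For context, the reader fast path in the asm-generic header of that era
already adds the reader bias first and only drops into the slowpath when
a writer byte is observed; the change below brings the slowpath in line
with that behaviour. Roughly (sketch of include/asm-generic/qrwlock.h in
tip at the time, not part of this patch):

	static inline void queued_read_lock(struct qrwlock *lock)
	{
		u32 cnts;

		cnts = atomic_add_return(_QR_BIAS, &lock->cnts);
		if (likely(!(cnts & _QW_WMASK)))
			return;

		/* The slowpath will decrement the reader count, if necessary. */
		queued_read_lock_slowpath(lock, cnts);
	}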

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/qrwlock.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index d9c36c5..6a7a3b8 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -88,15 +88,11 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
 	arch_spin_lock(&lock->lock);
 
 	/*
-	 * At the head of the wait queue now, wait until the writer state
-	 * goes to 0 and then try to increment the reader count and get
-	 * the lock. It is possible that an incoming writer may steal the
-	 * lock in the interim, so it is necessary to check the writer byte
-	 * to make sure that the write lock isn't taken.
+	 * At the head of the wait queue now, increment the reader count
+	 * and wait until the writer, if it has the lock, has gone away.
+	 * At ths stage, it is not possible for a writer to remain in the
+	 * waiting state (_QW_WAITING). So there won't be any deadlock.
 	 */
-	while (atomic_read(&lock->cnts) & _QW_WMASK)
-		cpu_relax_lowlatency();
-
 	cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
 	rspin_until_writer_unlock(lock, cnts);
 
-- 
1.7.1
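
Assembling the hunk above with its surrounding context, the patched
reader slowpath roughly reads as follows (sketch for illustration; the
tree at the time is the authoritative version):

	void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
	{
		/* Readers in interrupt context spin without queuing */
		if (unlikely(in_interrupt())) {
			rspin_until_writer_unlock(lock, cnts);
			return;
		}
		atomic_sub(_QR_BIAS, &lock->cnts);	/* undo the fast-path bias */

		/* Put the reader into the wait queue */
		arch_spin_lock(&lock->lock);

		/*
		 * At the head of the wait queue now, increment the reader count
		 * and wait until the writer, if it has the lock, has gone away.
		 */
		cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
		rspin_until_writer_unlock(lock, cnts);

		/* Signal the next one in queue to become queue head */
		arch_spin_unlock(&lock->lock);
	}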

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] locking/qrwlock: Reduce writer to writer lock transfer latency
  2015-07-09 16:32 [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance Waiman Long
  2015-07-09 16:32 ` [PATCH v2 1/2] locking/qrwlock: Reduce reader/writer to reader lock transfer latency Waiman Long
@ 2015-07-09 16:32 ` Waiman Long
  2015-07-09 16:32   ` Waiman Long
  2015-07-09 22:04 ` [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance Davidlohr Bueso
  2015-07-16 15:53 ` Will Deacon
  3 siblings, 1 reply; 12+ messages in thread
From: Waiman Long @ 2015-07-09 16:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, Thomas Gleixner
  Cc: linux-arch, linux-kernel, Will Deacon, Scott J Norton,
	Douglas Hatch, Waiman Long

In most cases, a writer acquires the lock in two steps - first setting
the writer mode byte to _QW_WAITING and then to _QW_LOCKED. So two
atomic operations are required. This 2-step dance is only needed if
readers are present. This patch modifies the logic so that a writer
will keep trying to acquire the lock in a single step for as long as
possible, until it sees some readers.
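
A minimal sketch of the single-step transition the new code keeps
retrying while no reader count is visible (hypothetical helper name,
for illustration only; the actual change is in the diff below):

	/*
	 * Illustration only, not part of the patch: while the lock word is
	 * entirely clear, a writer can go 0 -> _QW_LOCKED in one cmpxchg
	 * instead of the two-step 0 -> _QW_WAITING -> _QW_LOCKED dance.
	 */
	static inline bool qrwlock_try_write_direct(struct qrwlock *lock)
	{
		return atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0;
	}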

Using a locking microbenchmark, a 10-thread 5M locking loop of only
writers has the following performance numbers on a Haswell-EX box:

        Kernel          Locking Rate (Kops/s)
        ------          ---------------------
        4.1.1               11,939,648
        Patched 4.1.1       12,906,593

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/qrwlock.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 6a7a3b8..9f64493 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -109,15 +109,22 @@ EXPORT_SYMBOL(queued_read_lock_slowpath);
  */
 void queued_write_lock_slowpath(struct qrwlock *lock)
 {
-	u32 cnts;
-
 	/* Put the writer into the wait queue */
 	arch_spin_lock(&lock->lock);
 
 	/* Try to acquire the lock directly if no reader is present */
-	if (!atomic_read(&lock->cnts) &&
-	    (atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0))
-		goto unlock;
+	for (;;) {
+		u32 cnts = atomic_read(&lock->cnts);
+
+		if (!cnts) {
+			cnts = atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED);
+			if (cnts == 0)
+				goto unlock;
+		}
+		if (cnts & ~_QW_WMASK)
+			break;	/* Reader is present */
+		cpu_relax_lowlatency();
+	}
 
 	/*
 	 * Set the waiting flag to notify readers that a writer is pending,
@@ -135,8 +142,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 
 	/* When no more readers, set the locked flag */
 	for (;;) {
-		cnts = atomic_read(&lock->cnts);
-		if ((cnts == _QW_WAITING) &&
+		if ((atomic_read(&lock->cnts) == _QW_WAITING) &&
 		    (atomic_cmpxchg(&lock->cnts, _QW_WAITING,
 				    _QW_LOCKED) == _QW_WAITING))
 			break;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] locking/qrwlock: Reduce reader/writer to reader lock transfer latency
  2015-07-09 16:32 ` [PATCH v2 1/2] locking/qrwlock: Reduce reader/writer to reader lock transfer latency Waiman Long
  2015-07-09 16:32   ` Waiman Long
@ 2015-07-09 20:52   ` Davidlohr Bueso
  2015-07-10  1:10     ` Waiman Long
  1 sibling, 1 reply; 12+ messages in thread
From: Davidlohr Bueso @ 2015-07-09 20:52 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, Thomas Gleixner,
	linux-arch, linux-kernel, Will Deacon, Scott J Norton,
	Douglas Hatch

On Thu, 2015-07-09 at 12:32 -0400, Waiman Long wrote:
> This patch eliminates that waiting. It also has the side effect
> of reducing the chance of writer lock stealing and improving the
> fairness of the lock. Using a locking microbenchmark, a 10-threads 5M
> locking loop of mostly readers (RW ratio = 10,000:1) has the following
> performance numbers in a Haswell-EX box:
> 
>         Kernel          Locking Rate (Kops/s)
>         ------          ---------------------
>         4.1.1               15,063,081
>         Patched 4.1.1       17,241,552

In any case, for such read-mostly scenarios, you'd probably want to be
using rcu ;-).

> 
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> ---
>  kernel/locking/qrwlock.c |   12 ++++--------
>  1 files changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> index d9c36c5..6a7a3b8 100644
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -88,15 +88,11 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
>  	arch_spin_lock(&lock->lock);
>  
>  	/*
> -	 * At the head of the wait queue now, wait until the writer state
> -	 * goes to 0 and then try to increment the reader count and get
> -	 * the lock. It is possible that an incoming writer may steal the
> -	 * lock in the interim, so it is necessary to check the writer byte
> -	 * to make sure that the write lock isn't taken.
> +	 * At the head of the wait queue now, increment the reader count
> +	 * and wait until the writer, if it has the lock, has gone away.
> +	 * At ths
                ^^ this

>  stage, it is not possible for a writer to remain in the
> +	 * waiting state (_QW_WAITING). So there won't be any deadlock.

Because the writer setting _QW_WAITING is done in the slowpath,
serialized with the qrwlock->lock, right?

>  	 */
> -	while (atomic_read(&lock->cnts) & _QW_WMASK)
> -		cpu_relax_lowlatency();
> -
>  	cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;

Nit: since 'cnts' is now only the original value of lock->cnts before
adding _QR_BIAS, could we rename it to 'prev_cnts' (or something)? --
iirc you removed the need for the variable when in interrupt context.

>  	rspin_until_writer_unlock(lock, cnts);

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance
  2015-07-09 16:32 [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance Waiman Long
  2015-07-09 16:32 ` [PATCH v2 1/2] locking/qrwlock: Reduce reader/writer to reader lock transfer latency Waiman Long
  2015-07-09 16:32 ` [PATCH v2 2/2] locking/qrwlock: Reduce writer to writer " Waiman Long
@ 2015-07-09 22:04 ` Davidlohr Bueso
  2015-07-09 22:04   ` Davidlohr Bueso
  2015-07-10  1:16   ` Waiman Long
  2015-07-16 15:53 ` Will Deacon
  3 siblings, 2 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2015-07-09 22:04 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, Thomas Gleixner,
	linux-arch, linux-kernel, Will Deacon, Scott J Norton,
	Douglas Hatch

On Thu, 2015-07-09 at 12:32 -0400, Waiman Long wrote:
> With this patch series in place, we can start converting some spinlocks
> back to rwlocks where it makes sense and the lock size increase isn't
> a concern.

Nice, have any users to convert? I can think of a few I've encountered,
but there must be quite a few, especially those nasty global spinlocks
where nobody cares about the size.

o hugetlb reservation map lock: Updating hugepage ranges does a two-step
read/update for the reservation map. The first step could now be done
concurrently if converted.

o The infamous swap_lock; although I doubt any of the serious offenders
(ie zswap callbacks) would benefit much from anything
beyond /proc/meminfo and related.

o async cookie sync wait_event, battery/ata bootup(?).

etc. etc. Obviously the fairness factor is also something to consider.

Thanks,
Davidlohr 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] locking/qrwlock: Reduce reader/writer to reader lock transfer latency
  2015-07-09 20:52   ` Davidlohr Bueso
@ 2015-07-10  1:10     ` Waiman Long
  0 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2015-07-10  1:10 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, Thomas Gleixner,
	linux-arch, linux-kernel, Will Deacon, Scott J Norton,
	Douglas Hatch

On 07/09/2015 04:52 PM, Davidlohr Bueso wrote:
> On Thu, 2015-07-09 at 12:32 -0400, Waiman Long wrote:
>> This patch eliminates that waiting. It also has the side effect
>> of reducing the chance of writer lock stealing and improving the
>> fairness of the lock. Using a locking microbenchmark, a 10-threads 5M
>> locking loop of mostly readers (RW ratio = 10,000:1) has the following
>> performance numbers in a Haswell-EX box:
>>
>>          Kernel          Locking Rate (Kops/s)
>>          ------          ---------------------
>>          4.1.1               15,063,081
>>          Patched 4.1.1       17,241,552
> In any case, for such read-mostly scenarios, you'd probably want to be
> using rcu ;-).

Yes, I agree :-)

>> Signed-off-by: Waiman Long<Waiman.Long@hp.com>
>> ---
>>   kernel/locking/qrwlock.c |   12 ++++--------
>>   1 files changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
>> index d9c36c5..6a7a3b8 100644
>> --- a/kernel/locking/qrwlock.c
>> +++ b/kernel/locking/qrwlock.c
>> @@ -88,15 +88,11 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
>>   	arch_spin_lock(&lock->lock);
>>
>>   	/*
>> -	 * At the head of the wait queue now, wait until the writer state
>> -	 * goes to 0 and then try to increment the reader count and get
>> -	 * the lock. It is possible that an incoming writer may steal the
>> -	 * lock in the interim, so it is necessary to check the writer byte
>> -	 * to make sure that the write lock isn't taken.
>> +	 * At the head of the wait queue now, increment the reader count
>> +	 * and wait until the writer, if it has the lock, has gone away.
>> +	 * At ths
>                  ^^ this
>
>>   stage, it is not possible for a writer to remain in the
>> +	 * waiting state (_QW_WAITING). So there won't be any deadlock.
> Because the writer setting _QW_WAITING is done in the slowpath,
> serialized with the qrwlock->lock, right?

_QW_WAITING can only be set when the writer is at the queue head, and it 
will become _QW_LOCKED when it gets the lock. When a reader becomes 
queue head, the writer byte can either be 0 or _QW_LOCKED, but it can 
never be _QW_WAITING.
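
Stated as a sketch (illustrative assertion only, not something in the
patch), the invariant being argued at the point where a queue-head
reader holds lock->lock is:

	/*
	 * Any writer that set _QW_WAITING was itself at the queue head and
	 * upgrades to _QW_LOCKED before releasing lock->lock, so a reader
	 * at the queue head can only observe 0 or _QW_LOCKED here.
	 */
	WARN_ON_ONCE((atomic_read(&lock->cnts) & _QW_WMASK) == _QW_WAITING);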

>>   	*/
>> -	while (atomic_read(&lock->cnts)&  _QW_WMASK)
>> -		cpu_relax_lowlatency();
>> -
>>   	cnts = atomic_add_return(_QR_BIAS,&lock->cnts) - _QR_BIAS;
> Nit: since 'cnts' is now only the original value of lock->cnts before
> adding _QR_BIAS, could we rename it to 'prev_cnts' (or something)? --
> iirc you removed the need for the variable when in interrupt context.

The subtraction sign is there to simulate an xadd instruction. Without 
that, the generated code will have an additional add instruction. Yes, 
it is kind of a hack. It will be changed later on when other 
architectures start using qrwlock.
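
A minimal sketch of what the '- _QR_BIAS' adjustment expresses, assuming
a fetch-style atomic primitive were available (atomic_fetch_add() was not
part of the atomic_t API at that point, so this is illustrative only):

	/*
	 * Illustrative equivalent: the value of lock->cnts *before* the
	 * reader bias is added is what rspin_until_writer_unlock() needs.
	 * Subtracting _QR_BIAS from atomic_add_return() lets the compiler
	 * fold the correction into the xadd-based implementation instead
	 * of emitting an extra add instruction.
	 */
	cnts = atomic_fetch_add(_QR_BIAS, &lock->cnts);
	rspin_until_writer_unlock(lock, cnts);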

Cheers,
Longman

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance
  2015-07-09 22:04 ` [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance Davidlohr Bueso
  2015-07-09 22:04   ` Davidlohr Bueso
@ 2015-07-10  1:16   ` Waiman Long
  1 sibling, 0 replies; 12+ messages in thread
From: Waiman Long @ 2015-07-10  1:16 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, Thomas Gleixner,
	linux-arch, linux-kernel, Will Deacon, Scott J Norton,
	Douglas Hatch

On 07/09/2015 06:04 PM, Davidlohr Bueso wrote:
> On Thu, 2015-07-09 at 12:32 -0400, Waiman Long wrote:
>> With this patch series in place, we can start converting some spinlocks
>> back to rwlocks where it makes sense and the lock size increase isn't
>> a concern.
> Nice, have any users to convert? I can think of a few I've encountered,
> but there must be quite a few, specially those nasty global spinlocks
> where nobody cares about the size.
>
> o hugetlb reservation map lock: Updating hugepage ranges does a two step
> read/update for the reservation map. The first step could now be done
> concurrently if converted.
>
> o The infamous swap_lock; although I doubt any of the serious offenders
> (ie zswap callbacks) would benefit much for anything
> beyond /proc/meminfo and related.
>
> o async cookie sync wait_event, battery/ata bootup(?).
>
> etc. etc. Obviously the fairness factor is also something to consider.

Yes, I saw a couple of global spinlocks that can be converted to
rwlocks. The read lock can be used for lookups, whereas the write lock
is used for modifications. Doing so will enable parallel lookups. As
the qrwlock is almost fair, unlike the old rwlock implementation, it
removes a big roadblock to the conversion.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance
  2015-07-09 16:32 [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance Waiman Long
                   ` (2 preceding siblings ...)
  2015-07-09 22:04 ` [PATCH-tip v2 0/2] locking/qrwlock: Improve qrwlock performance Davidlohr Bueso
@ 2015-07-16 15:53 ` Will Deacon
  2015-07-16 15:53   ` Will Deacon
  3 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2015-07-16 15:53 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, Thomas Gleixner,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	Scott J Norton, Douglas Hatch

Hi Waiman,

On Thu, Jul 09, 2015 at 05:32:21PM +0100, Waiman Long wrote:
> v1->v2:
>  - Take out patch 1 which had been merged to tip.
>  - Take out patch 4 as the change may impact light load performance
>  - Rebased to the latest tip branch
> 
> In converting some existing spinlocks to rwlock, it was found that
> the write lock slowpath performance isn't as good as the qspinlock.
> This patch series tries to improve qrwlock performance to close the
> gap between qspinlock and qrwlock.
> 
> With this patch series in place, we can start converting some spinlocks
> back to rwlocks where it makes sense and the lock size increase isn't
> a concern.

Both of these patches look fine to me:

  Acked-by: Will Deacon <will.deacon@arm.com>

Cheers,

Will

^ permalink raw reply	[flat|nested] 12+ messages in thread
