Re: [PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath

From: Waiman Long <waiman.long@hp.com>
To: Will Deacon <will.deacon@arm.com>
Cc: linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	peterz@infradead.org, mingo@redhat.com
Subject: Re: [PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath
Date: Tue, 07 Jul 2015 13:51:54 -0400
Message-ID: <559C11BA.1070309@hp.com>
In-Reply-To: <1436289865-2331-3-git-send-email-will.deacon@arm.com>

On 07/07/2015 01:24 PM, Will Deacon wrote:
> When a slow-path reader gets to the front of the wait queue outside of
> interrupt context, it waits for any writers to drain, increments the
> reader count and again waits for any additional writers that may have
> snuck in between the initial check and the increment.
>
> Given that this second check is performed with acquire semantics, there
> is no need to perform the increment using atomic_add_return, which acts
> as a full barrier.
>
> This patch changes the slow-path code to use smp_load_acquire and
> atomic_add instead of atomic_add_return. Since the check only involves
> the writer count, we can perform the acquire after the add.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>   kernel/locking/qrwlock.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> index 96b77d1e0545..4e29bef688ac 100644
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -98,7 +98,8 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
>   	while (atomic_read(&lock->cnts) & _QW_WMASK)
>   		cpu_relax_lowlatency();
>
> -	cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> +	atomic_add(_QR_BIAS, &lock->cnts);
> +	cnts = smp_load_acquire((u32 *)&lock->cnts);
>   	rspin_until_writer_unlock(lock, cnts);
>
>   	/*

An atomic add on x86 is actually a full barrier too, so the performance
difference between "lock add" and "lock xadd" should be minor. The
additional load, however, could potentially cause an additional
cacheline load on a contended lock. So do you see an actual performance
benefit from this change on ARM?

Cheers,
Longman
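
For illustration only, the two variants under discussion can be sketched
in userspace C11. This is not the kernel code: QW_WMASK, QR_BIAS and the
function names are hypothetical stand-ins with illustrative values, and
C11 memory orders only approximate the kernel's atomic_add_return(),
atomic_add() and smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel constants; values are illustrative. */
#define QW_WMASK 0xffu       /* writer bits */
#define QR_BIAS  (1u << 8)   /* increment for one reader */

/* Before the patch: a single fully ordered read-modify-write.
 * Returns the value seen before the add (seq_cst is an approximation). */
static uint32_t reader_inc_xadd(_Atomic uint32_t *cnts)
{
	return atomic_fetch_add_explicit(cnts, QR_BIAS, memory_order_seq_cst);
}

/* After the patch: an unordered add, then a separate acquire load.
 * Returns the value seen after the add; the second access is the
 * "additional load" that the reply is concerned about. */
static uint32_t reader_inc_add_then_load(_Atomic uint32_t *cnts)
{
	atomic_fetch_add_explicit(cnts, QR_BIAS, memory_order_relaxed);
	return atomic_load_explicit(cnts, memory_order_acquire);
}

/* Single-threaded sanity check: both variants leave the same counter
 * value and report the same (empty) writer bits. */
static void demo_check(void)
{
	_Atomic uint32_t a = 0, b = 0;
	uint32_t old = reader_inc_xadd(&a);
	uint32_t cur = reader_inc_add_then_load(&b);

	assert(old == 0 && cur == QR_BIAS);
	assert(atomic_load(&a) == atomic_load(&b));
	assert((old & QW_WMASK) == (cur & QW_WMASK));
}
```

The returned values differ by the reader's own QR_BIAS, which does not
matter when only the writer bits are examined afterwards. The sketch says
nothing about which variant is faster on a given architecture.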
