From: Will Deacon <will.deacon@arm.com>
To: Waiman Long <waiman.long@hp.com>
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
    "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>,
    "peterz@infradead.org" <peterz@infradead.org>,
    "mingo@redhat.com" <mingo@redhat.com>
Subject: Re: [PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath
Date: Tue, 7 Jul 2015 19:19:41 +0100
Message-ID: <20150707181941.GL23879@arm.com>
In-Reply-To: <559C11BA.1070309@hp.com>

On Tue, Jul 07, 2015 at 06:51:54PM +0100, Waiman Long wrote:
> On 07/07/2015 01:24 PM, Will Deacon wrote:
> > When a slow-path reader gets to the front of the wait queue outside of
> > interrupt context, it waits for any writers to drain, increments the
> > reader count and again waits for any additional writers that may have
> > snuck in between the initial check and the increment.
> >
> > Given that this second check is performed with acquire semantics, there
> > is no need to perform the increment using atomic_add_return, which acts
> > as a full barrier.
> >
> > This patch changes the slow-path code to use smp_load_acquire and
> > atomic_add instead of atomic_add_return. Since the check only involves
> > the writer count, we can perform the acquire after the add.
> >
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> > kernel/locking/qrwlock.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> > index 96b77d1e0545..4e29bef688ac 100644
> > --- a/kernel/locking/qrwlock.c
> > +++ b/kernel/locking/qrwlock.c
> > @@ -98,7 +98,8 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
> >  	while (atomic_read(&lock->cnts) & _QW_WMASK)
> >  		cpu_relax_lowlatency();
> >
> > -	cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> > +	atomic_add(_QR_BIAS, &lock->cnts);
> > +	cnts = smp_load_acquire((u32 *)&lock->cnts);
> >  	rspin_until_writer_unlock(lock, cnts);
> >
> >  	/*
>
> Atomic add in x86 is actually a full barrier too. The performance
> difference between "lock add" and "lock xadd" should be minor. The
> additional load, however, could potentially cause an additional
> cacheline load on a contended lock. So do you see actual performance
> benefit of this change in ARM?

I'd need to re-run the numbers, but atomic_add is significantly less
work on ARM than atomic_add_return, which basically has two full memory
barriers compared to none for the former.

Will
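
For readers following the barrier argument, the two variants of the reader
increment can be modelled in userspace with C11 atomics. This is only a rough
sketch, not the kernel code: struct model_lock, reader_add_before(),
reader_add_after() and writer_present() are hypothetical names, the QW_WMASK
and QR_BIAS values are assumptions chosen for illustration, and the C11 memory
orders only approximate the kernel primitives (a seq_cst fetch-add stands in
for the fully ordered atomic_add_return, a relaxed fetch-add for atomic_add,
and an acquire load for smp_load_acquire).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout for this sketch: the low byte holds the writer state. */
#define QW_WMASK 0xffu
#define QR_BIAS  (1u << 8)

/* Hypothetical stand-in for struct qrwlock. */
struct model_lock {
	_Atomic uint32_t cnts;
};

/*
 * "Before" variant: a single fully ordered read-modify-write.
 * fetch_add returns the value held before the add, which matches
 * atomic_add_return(_QR_BIAS, ...) - _QR_BIAS in the quoted diff.
 */
static uint32_t reader_add_before(struct model_lock *lock)
{
	return atomic_fetch_add_explicit(&lock->cnts, QR_BIAS,
					 memory_order_seq_cst);
}

/*
 * "After" variant: an unordered add followed by a separate acquire load.
 * The returned value already includes this reader's bias, which is
 * harmless as long as the caller only inspects the writer byte.
 */
static uint32_t reader_add_after(struct model_lock *lock)
{
	atomic_fetch_add_explicit(&lock->cnts, QR_BIAS, memory_order_relaxed);
	return atomic_load_explicit(&lock->cnts, memory_order_acquire);
}

/* The follow-up check in the slow path only looks at the writer bits. */
static int writer_present(uint32_t cnts)
{
	return (cnts & QW_WMASK) != 0;
}
```

The second variant issues an extra load of the lock word, which is the
cacheline cost raised in the quoted question, in exchange for dropping the
full ordering on the increment.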