From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2 0/5] Switch arm64 over to qrwlock
Date: Mon, 9 Oct 2017 10:59:36 +0100
Message-ID: <20171009095935.GC5127@arm.com>
In-Reply-To: <20171008213052.ojyxpr56d2ypscjy@yury-thinkpad>
Hi Yury,
On Mon, Oct 09, 2017 at 12:30:52AM +0300, Yury Norov wrote:
> On Fri, Oct 06, 2017 at 02:34:37PM +0100, Will Deacon wrote:
> > This is version two of the patches I posted yesterday:
> >
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534666.html
> >
> > I'd normally leave it longer before posting again, but Peter had a good
> > suggestion to rework the layout of the lock word, so I wanted to post a
> > version that follows that approach.
> >
> > I've updated my branch if you're after the full patch stack:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git qrwlock
> >
> > As before, all comments (particularly related to testing and performance)
> > welcome!
> >
> I tested your patches with locktorture and found a measurable performance
> regression. I also respun Jan Glauber's patch [1] and tried it together
> with patch 5 from this series. The numbers differ a lot from my previous
> measurements, but since then I have changed workstation and now use qemu
> with support for parallel threads.
>                   Spinlock     Read-RW lock   Write-RW lock
> Vanilla:          129804626    12340895       14716138
> This series:      113718002    10982159       13068934
> Jan patch:        117977108    11363462       13615449
> Jan patch + #5:   121483176    11696728       13618967
>
> The bottom line of discussion [1] was that queued locks are more
> effective when the SoC has many CPUs, and 4 is not many. My measurement
> was made on a 4-CPU machine, and it seems to confirm that. Does
> it make sense to make queued locks the default for many-CPU machines only?
Just to confirm, you're running this under qemu on an x86 host, using full
AArch64 system emulation? If so, I really don't think we should base the
merits of qrwlocks on arm64 around this type of configuration. Given that
you work for a silicon vendor, could you try running on real arm64 hardware
instead, please? My measurements on 6-core and 8-core systems look a lot
better with qrwlock than what we currently have in mainline, and qrwlock
also fixes a real starvation issue reported by Jeremy [1].
I'd also add that lock fairness comes at a cost, so I'd expect a small drop
in total throughput for some workloads. I encourage you to try passing
different arguments to locktorture to see this in action. For example, on
an 8-core machine:
# insmod ./locktorture.ko nwriters_stress=2 nreaders_stress=8 torture_type="rw_lock_irq" stat_interval=2
-rc3:
Writes: Total: 6612 Max/Min: 0/0 Fail: 0
Reads : Total: 1265230 Max/Min: 0/0 Fail: 0
Writes: Total: 6709 Max/Min: 0/0 Fail: 0
Reads : Total: 1916418 Max/Min: 0/0 Fail: 0
Writes: Total: 6725 Max/Min: 0/0 Fail: 0
Reads : Total: 5103727 Max/Min: 0/0 Fail: 0
Notice how the writers are really struggling here (tweak the parameters a
bit more and you get RCU stalls, lost interrupts, etc.).
With the qrwlock:
Writes: Total: 47962 Max/Min: 0/0 Fail: 0
Reads : Total: 277903 Max/Min: 0/0 Fail: 0
Writes: Total: 100151 Max/Min: 0/0 Fail: 0
Reads : Total: 525781 Max/Min: 0/0 Fail: 0
Writes: Total: 155284 Max/Min: 0/0 Fail: 0
Reads : Total: 767703 Max/Min: 0/0 Fail: 0
which is an awful lot better for maximum latency and fairness, despite the
much lower reader count.
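To make the comparison easier to read, here is a rough sketch that turns the
two locktorture dumps above into per-interval deltas. It assumes the "Total:"
values are cumulative across stat intervals, which is not stated above; the
helper names (parse_totals, interval_deltas) are made up for illustration and
are not part of locktorture.

```python
import re

# Matches locktorture summary lines such as
# "Writes: Total: 6612 Max/Min: 0/0 Fail: 0" and "Reads : Total: 1265230 ...".
STAT_RE = re.compile(r"^(Writes|Reads)\s*:\s*Total:\s*(\d+)")

RC3_LOG = """\
Writes: Total: 6612 Max/Min: 0/0 Fail: 0
Reads : Total: 1265230 Max/Min: 0/0 Fail: 0
Writes: Total: 6709 Max/Min: 0/0 Fail: 0
Reads : Total: 1916418 Max/Min: 0/0 Fail: 0
Writes: Total: 6725 Max/Min: 0/0 Fail: 0
Reads : Total: 5103727 Max/Min: 0/0 Fail: 0
"""

QRWLOCK_LOG = """\
Writes: Total: 47962 Max/Min: 0/0 Fail: 0
Reads : Total: 277903 Max/Min: 0/0 Fail: 0
Writes: Total: 100151 Max/Min: 0/0 Fail: 0
Reads : Total: 525781 Max/Min: 0/0 Fail: 0
Writes: Total: 155284 Max/Min: 0/0 Fail: 0
Reads : Total: 767703 Max/Min: 0/0 Fail: 0
"""


def parse_totals(log):
    """Collect the 'Total:' samples for writers and readers, in log order."""
    totals = {"Writes": [], "Reads": []}
    for line in log.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            totals[match.group(1)].append(int(match.group(2)))
    return totals


def interval_deltas(samples):
    """Difference between consecutive samples (assumes cumulative totals)."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def summarize(name, log):
    """Print how many acquisitions each side gained per stat interval."""
    totals = parse_totals(log)
    writes = interval_deltas(totals["Writes"])
    reads = interval_deltas(totals["Reads"])
    for index, (w, r) in enumerate(zip(writes, reads), start=1):
        print(f"{name} interval {index}: writes +{w}, reads +{r}")
    return writes, reads


if __name__ == "__main__":
    summarize("-rc3", RC3_LOG)
    summarize("qrwlock", QRWLOCK_LOG)
```

If the cumulative assumption holds, the -rc3 writers gain only 97 and then 16
acquisitions per interval, whereas the qrwlock writers gain roughly 52k and
55k while the readers still make steady progress.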
> There were 2 preparing patches in the series:
> [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock
> and
> [PATCH 2/3] asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h
>
> 1st patch is not needed anymore because Babu Moger submitted similar patch that
> is already in mainline: 9ab6055f95903 ("kernel/locking: Fix compile error with
> qrwlock.c"). Could you revisit second patch?
Sorry, not sure what you're asking me to do here.
Will
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534299.html