From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2 0/5] Switch arm64 over to qrwlock
Date: Mon, 9 Oct 2017 10:59:36 +0100
Message-ID: <20171009095935.GC5127@arm.com>
In-Reply-To: <20171008213052.ojyxpr56d2ypscjy@yury-thinkpad>

Hi Yury,

On Mon, Oct 09, 2017 at 12:30:52AM +0300, Yury Norov wrote:
> On Fri, Oct 06, 2017 at 02:34:37PM +0100, Will Deacon wrote:
> > This is version two of the patches I posted yesterday:
> > 
> >   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534666.html
> > 
> > I'd normally leave it longer before posting again, but Peter had a good
> > suggestion to rework the layout of the lock word, so I wanted to post a
> > version that follows that approach.
> > 
> > I've updated my branch if you're after the full patch stack:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git qrwlock
> > 
> > As before, all comments (particularly related to testing and performance)
> > welcome!
> > 
> I tested your patches with locktorture and found a measurable performance
> regression. I also respun Jan Glauber's patch [1], and I also
> tried Jan's patch with patch 5 from this series. The numbers differ a lot
> from my previous measurements, but since then I have changed workstation
> and now use qemu with support for parallel threads.
>                         Spinlock        Read-RW lock    Write-RW lock
> Vanilla:                129804626       12340895        14716138
> This series:            113718002       10982159        13068934
> Jan patch:              117977108       11363462        13615449
> Jan patch + #5:         121483176       11696728        13618967
> 
> The bottom line of discussion [1] was that queued locks are more
> effective when the SoC has many CPUs. And 4 is not many. My measurement
> was made on a 4-CPU machine, and it seems to confirm that. Does
> it make sense to make queued locks the default for many-CPU machines only?

Just to confirm, you're running this under qemu on an x86 host, using full
AArch64 system emulation? If so, I really don't think we should base the
merits of qrwlocks on arm64 around this type of configuration. Given that
you work for a silicon vendor, could you try running on real arm64 hardware
instead, please? My measurements on 6-core and 8-core systems look a lot
better with qrwlock than what we currently have in mainline, and they
also fix a real starvation issue reported by Jeremy [1].

I'd also add that lock fairness comes at a cost, so I'd expect a small drop
in total throughput for some workloads. I encourage you to try passing
different arguments to locktorture to see this in action. For example, on
an 8-core machine:

# insmod ./locktorture.ko nwriters_stress=2 nreaders_stress=8 torture_type="rw_lock_irq" stat_interval=2

-rc3:

  Writes:  Total: 6612  Max/Min: 0/0   Fail: 0
  Reads :  Total: 1265230  Max/Min: 0/0   Fail: 0
  Writes:  Total: 6709  Max/Min: 0/0   Fail: 0
  Reads :  Total: 1916418  Max/Min: 0/0   Fail: 0
  Writes:  Total: 6725  Max/Min: 0/0   Fail: 0
  Reads :  Total: 5103727  Max/Min: 0/0   Fail: 0

notice how the writers are really struggling here (you only have to tweak a
bit more and you get RCU stalls, lose interrupts etc).

With the qrwlock:

  Writes:  Total: 47962  Max/Min: 0/0   Fail: 0
  Reads :  Total: 277903  Max/Min: 0/0   Fail: 0
  Writes:  Total: 100151  Max/Min: 0/0   Fail: 0
  Reads :  Total: 525781  Max/Min: 0/0   Fail: 0
  Writes:  Total: 155284  Max/Min: 0/0   Fail: 0
  Reads :  Total: 767703  Max/Min: 0/0   Fail: 0

which is an awful lot better for maximum latency and fairness, despite the
much lower reader count.
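
To put a rough number on the fairness difference, here is a minimal Python
sketch that compares the last sample of each run above. It assumes the
locktorture totals are cumulative, so the final sample summarizes the run;
the names RUNS, reads_per_write and write_share are illustrative only and
are not part of locktorture.

```python
# Final totals copied from the two locktorture runs quoted above
# (2 writer threads, 8 reader threads). Assumed to be cumulative counts.
RUNS = {
    "-rc3": {"writes": 6725, "reads": 5103727},
    "qrwlock": {"writes": 155284, "reads": 767703},
}


def reads_per_write(run):
    """Number of read acquisitions for each write acquisition."""
    return run["reads"] / run["writes"]


def write_share(run):
    """Fraction of all lock acquisitions that were writes."""
    return run["writes"] / (run["writes"] + run["reads"])


for name, run in RUNS.items():
    print(f"{name}: {reads_per_write(run):.1f} reads per write, "
          f"writes are {100 * write_share(run):.2f}% of acquisitions")
```

With 8 readers and 2 writers, a ratio near 4 reads per write would mean each
thread gets a similar number of acquisitions, so the lower the ratio, the
less the writers are being starved.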

> There were two preparatory patches in the series:
> [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock
> and
> [PATCH 2/3] asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h
> 
> The first patch is no longer needed because Babu Moger submitted a similar patch that
> is already in mainline: 9ab6055f95903 ("kernel/locking: Fix compile error with
> qrwlock.c"). Could you revisit the second patch?

Sorry, not sure what you're asking me to do here.

Will

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534299.html
