From: arnd@arndb.de (Arnd Bergmann)
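To: linux-arm-kernel@lists.infradead.org
Cc: Will Deacon <will.deacon@arm.com>,
Andrew Pinski <apinski@cavium.com>,
pinsia@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ARM64: Improve copy_page for 128 cache line sizes.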
Date: Mon, 21 Dec 2015 14:42:58 +0100
Message-ID: <201512211442.58803.arnd@arndb.de>
In-Reply-To: <20151221124637.GN23092@arm.com>
On Monday 21 December 2015, Will Deacon wrote:
> On Sat, Dec 19, 2015 at 04:11:18PM -0800, Andrew Pinski wrote:
> > Adding a check for the cache line size is not much overhead.
> > Special case 128 byte cache line size.
> > This improves copy_page by 85% on ThunderX compared to the
> > original implementation.
>
> So this patch seems to:
>
> - Align the loop
> - Increase the prefetch size
> - Unroll the loop once
>
> Do you know where your 85% boost comes from between these? I'd really
> like to avoid having multiple versions of copy_page, if possible, but
> maybe we could end up with something that works well enough regardless
> of cacheline size. Understanding what your bottleneck is would help to
> lead us in the right direction.
>
> Also, how are you measuring the improvement? If you can share your
> test somewhere, I can see how it affects the other systems I have access
> to.
A related question would be how other CPU cores are affected by the change.
The test for the cache line size is going to take a few cycles, possibly
a lot on certain implementations, e.g. if we ever get one where 'mrs' is
microcoded or trapped by a hypervisor.
Are there any possible downsides to using the ThunderX version on other
microarchitectures too, and skipping the check?
Arnd
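To make the check under discussion concrete, here is a minimal C sketch, assuming a GCC or Clang toolchain, of how a copy_page-style routine could read the data cache line size from CTR_EL0 and then pick a 128-byte or 64-byte stride. It is not the patch itself, which changes the assembly copy_page. The function names, the 4 KiB page size and the prefetch distance of four lines are hypothetical choices for illustration. It relies on one architectural fact: CTR_EL0.DminLine (bits [19:16]) is log2 of the number of 4-byte words in the smallest data cache line.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages. All names here are hypothetical. */
#define SKETCH_PAGE_SIZE 4096u

/* CTR_EL0.DminLine (bits [19:16]) is log2 of the number of 4-byte words
 * in the smallest data cache line, so the line size is 4 << DminLine. */
static unsigned int dcache_line_bytes(uint64_t ctr)
{
    return 4u << ((ctr >> 16) & 0xfu);
}

/* Reads CTR_EL0 on AArch64. On other targets, returns a value that decodes
 * to 64-byte lines so the sketch still compiles and runs. */
static uint64_t read_ctr(void)
{
#if defined(__aarch64__)
    uint64_t ctr;
    /* Depending on the system, this mrs may be cheap, microcoded, or trapped. */
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    return ctr;
#else
    return (uint64_t)4 << 16; /* DminLine = 4 -> 64-byte lines */
#endif
}

/* Copies one page two cache lines per iteration (loop unrolled once).
 * Each iteration prefetches the two lines `ahead` lines further on,
 * but only while they are still inside the source page. */
static void copy_page_lines(void *dst, const void *src, size_t line, size_t ahead)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t off = 0; off < SKETCH_PAGE_SIZE; off += 2 * line) {
        size_t pf = off + ahead * line;
        if (pf < SKETCH_PAGE_SIZE)
            __builtin_prefetch(s + pf);
        if (pf + line < SKETCH_PAGE_SIZE)
            __builtin_prefetch(s + pf + line);
        memcpy(d + off, s + off, line);
        memcpy(d + off + line, s + off + line, line);
    }
}

/* Hypothetical dispatcher: special-cases 128-byte lines, as the patch
 * description suggests, and falls back to a 64-byte stride otherwise. */
void copy_page_sketch(void *dst, const void *src)
{
    size_t line = dcache_line_bytes(read_ctr());

    if (line == 128)
        copy_page_lines(dst, src, 128, 4);
    else
        copy_page_lines(dst, src, 64, 4);
}
```

In this sketch, the `mrs` in `read_ctr()` is the per-call cost the question above is about. Reading it once and caching the decoded size, or always taking the 128-byte path, would avoid repeating it on every page copy.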
Thread overview:
2015-12-20 0:11 [PATCH] ARM64: Improve copy_page for 128 cache line sizes Andrew Pinski
2015-12-21 12:46 ` Will Deacon
2015-12-21 13:42 ` Arnd Bergmann [this message]
-- strict thread matches above, loose matches on Subject: below --
2015-12-22 23:32 Andrew Pinski
2016-01-06 16:31 ` Will Deacon