From: Arnd Bergmann <arnd@arndb.de>
To: linux-arm-kernel@lists.infradead.org
Cc: Will Deacon <will.deacon@arm.com>,
	Andrew Pinski <apinski@cavium.com>,
	pinsia@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ARM64: Improve copy_page for 128 cache line sizes.
Date: Mon, 21 Dec 2015 14:42:58 +0100
Message-ID: <201512211442.58803.arnd@arndb.de>
In-Reply-To: <20151221124637.GN23092@arm.com>

On Monday 21 December 2015, Will Deacon wrote:
> On Sat, Dec 19, 2015 at 04:11:18PM -0800, Andrew Pinski wrote:
> > Adding a check for the cache line size is not much overhead.
> > Special case 128 byte cache line size.
> > This improves copy_page by 85% on ThunderX compared to the
> > original implementation.
> 
> So this patch seems to:
> 
>   - Align the loop
>   - Increase the prefetch size
>   - Unroll the loop once
> 
> Do you know where your 85% boost comes from between these? I'd really
> like to avoid having multiple versions of copy_page, if possible, but
> maybe we could end up with something that works well enough regardless
> of cacheline size. Understanding what your bottleneck is would help to
> lead us in the right direction.
> 
> Also, how are you measuring the improvement? If you can share your
> test somewhere, I can see how it affects the other systems I have access
> to.
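
To make Will's list concrete, here is a rough C model of those
transformations. It is not the assembly from the patch. It assumes 4 KiB
pages, and the name copy_page_sketch() and the 512-byte prefetch distance
are made up for illustration. Loop alignment has no portable C spelling
(in assembly it would be a .p2align directive, and GCC has -falign-loops),
so only the prefetch and the unrolling appear:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */
#define CHUNK_BYTES      64u    /* one 64-byte block per copy step */
#define PREFETCH_AHEAD   512u   /* hypothetical prefetch distance */

/* Hypothetical C model of a page copy whose 64-byte body is unrolled once,
 * so each iteration moves 128 bytes, with a software prefetch issued ahead
 * of the current source position. */
static void copy_page_sketch(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t off = 0; off < SKETCH_PAGE_SIZE; off += 2 * CHUNK_BYTES) {
        /* Prefetch for read (rw = 0) with low temporal locality (0), only
         * while the target address is still inside the source page. */
        if (off + PREFETCH_AHEAD < SKETCH_PAGE_SIZE)
            __builtin_prefetch(s + off + PREFETCH_AHEAD, 0, 0);

        /* Unrolled once: two 64-byte blocks per iteration. */
        memcpy(d + off, s + off, CHUNK_BYTES);
        memcpy(d + off + CHUNK_BYTES, s + off + CHUNK_BYTES, CHUNK_BYTES);
    }
}
```

Varying PREFETCH_AHEAD and the unroll factor independently is one way to
attribute the speedup to a single change.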

A related question would be how other CPU cores are affected by the change.
The test for the cache line size is going to take a few cycles, possibly
many more on some implementations, e.g. if we ever get one where 'mrs' is
microcoded or trapped by a hypervisor.
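
For reference, the line size comes from the DminLine field of CTR_EL0
(bits [19:16], log2 of the number of 4-byte words in the smallest data
cache line). Below is a sketch of the decode plus a read that is done once
and cached, so the 'mrs' is paid a single time rather than on every call.
The helper names are hypothetical, the inline asm assumes GCC or Clang,
and the caching scheme is only an illustration, not what the patch does:

```c
#include <assert.h>
#include <stdint.h>

/* CTR_EL0.DminLine, bits [19:16]: log2 of the number of 4-byte words in
 * the smallest data cache line, so the size in bytes is 4 << DminLine. */
static unsigned int dcache_line_bytes(uint64_t ctr_el0)
{
    unsigned int dminline = (unsigned int)((ctr_el0 >> 16) & 0xf);
    return 4u << dminline;
}

/* Hypothetical policy: use the wider copy loop for 128-byte lines. */
static int prefers_128_byte_copy(uint64_t ctr_el0)
{
    return dcache_line_bytes(ctr_el0) >= 128;
}

#if defined(__aarch64__)
/* The 'mrs' whose cost is in question. From user space this relies on
 * the kernel permitting (or emulating) EL0 reads of CTR_EL0. */
static uint64_t read_ctr_el0(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(v));
    return v;
}

/* Probe once and remember the answer (not thread-safe as written). */
static int wide_copy_preferred(void)
{
    static int cached = -1;  /* -1: not probed yet */
    if (cached < 0)
        cached = prefers_128_byte_copy(read_ctr_el0());
    return cached;
}
#endif
```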

Are there any downsides to using the ThunderX version on other
microarchitectures too and skipping the check?
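
One way to answer that would be to time both loops on several cores. A
minimal user-space harness might look like the following. It assumes
POSIX clock_gettime(). bench_copy() and copy_page_memcpy() are
hypothetical names, and user-space timings only approximate the in-kernel
copy_page:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BENCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

typedef void (*page_copy_fn)(void *dst, const void *src);

/* Baseline: plain memcpy of one page. Any candidate with the same
 * signature can be timed the same way. */
static void copy_page_memcpy(void *dst, const void *src)
{
    memcpy(dst, src, BENCH_PAGE_SIZE);
}

/* Copies page p of src to page p of dst for all npages pages, 'rounds'
 * times, and returns mean wall-clock nanoseconds per page copy. When
 * npages * 4 KiB is well above the last-level cache size, the data cannot
 * stay cache-resident between copies. npages and rounds must be nonzero. */
static double bench_copy(page_copy_fn fn, unsigned char *dst,
                         const unsigned char *src, size_t npages,
                         unsigned rounds)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned r = 0; r < rounds; r++)
        for (size_t p = 0; p < npages; p++)
            fn(dst + p * BENCH_PAGE_SIZE, src + p * BENCH_PAGE_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / ((double)rounds * (double)npages);
}
```

Running the same binary pinned to different cores (e.g. with taskset)
would give a first rough idea of how the 128-byte loop behaves on parts
with 64-byte lines.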

	Arnd

Thread overview: 9+ messages
2015-12-20  0:11 [PATCH] ARM64: Improve copy_page for 128 cache line sizes Andrew Pinski
2015-12-21 12:46 ` Will Deacon
2015-12-21 13:42   ` Arnd Bergmann [this message]
  -- strict thread matches above, loose matches on Subject: below --
2015-12-22 23:32 Andrew Pinski
2016-01-06 16:31 ` Will Deacon
