* [PATCH] x86: Micro-optimise clflush_cache_range()
From: Chris Wilson @ 2016-01-08 9:55 UTC (permalink / raw)
To: x86
Cc: Chris Wilson, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Toshi Kani, Borislav Petkov, Luis R. Rodriguez, Stephen Rothwell,
Ross Zwisler, Sai Praneeth, linux-kernel
Whilst inspecting the asm for clflush_cache_range() and some perf profiles
that required extensive flushing of single cachelines (from part of the
intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
manually hoist that read, which perf regarded as taking ~25% of the
function time for a single cacheline flush.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com>
Cc: linux-kernel@vger.kernel.org
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/mm/pageattr.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a3137a4feed1..6000ad7f560c 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -129,14 +129,16 @@ within(unsigned long addr, unsigned long start, unsigned long end)
  */
 void clflush_cache_range(void *vaddr, unsigned int size)
 {
-	unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+	const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
+	void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
 	void *vend = vaddr + size;
-	void *p;
+
+	if (p >= vend)
+		return;
 
 	mb();
 
-	for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
-	     p < vend; p += boot_cpu_data.x86_clflush_size)
+	for (; p < vend; p += clflush_size)
 		clflushopt(p);
 
 	mb();
--
2.7.0.rc3
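
For readers following the address arithmetic, here is a minimal user-space sketch of the patched loop. It is not kernel code: count_flushed_lines() is a hypothetical helper, the cacheline size is passed in as a parameter instead of being read from boot_cpu_data, the mb() barriers are omitted, and the loop only counts the lines where the kernel would call clflushopt(p).

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of the patched loop. It returns how many
 * cachelines would be flushed for [vaddr, vaddr + size), assuming that
 * line_size is a power of two.
 */
static unsigned int count_flushed_lines(uintptr_t vaddr, unsigned int size,
					uintptr_t line_size)
{
	/* Read once up front, mirroring the hoisted clflush_size. */
	const uintptr_t mask = line_size - 1;
	uintptr_t p = vaddr & ~mask;	/* round down to a line boundary */
	uintptr_t vend = vaddr + size;
	unsigned int lines = 0;

	/* Nothing to do, e.g. an aligned vaddr with size == 0. */
	if (p >= vend)
		return 0;

	for (; p < vend; p += line_size)
		lines++;	/* the kernel calls clflushopt(p) here */

	return lines;
}
```

With a 64-byte line, a 64-byte range starting at 0x1000 touches one line, while the same length starting at 0x1030 straddles a boundary and touches two.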
* Re: [PATCH] x86: Micro-optimise clflush_cache_range()
From: Ross Zwisler @ 2016-01-08 16:58 UTC (permalink / raw)
To: Chris Wilson
Cc: x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Toshi Kani,
Borislav Petkov, Luis R. Rodriguez, Stephen Rothwell,
Ross Zwisler, Sai Praneeth, linux-kernel
On Fri, Jan 08, 2016 at 09:55:33AM +0000, Chris Wilson wrote:
> Whilst inspecting the asm for clflush_cache_range() and some perf profiles
> that required extensive flushing of single cachelines (from part of the
> intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
> boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
> manually hoist that read which perf regarded as taking ~25% of the
> function time for a single cacheline flush.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Toshi Kani <toshi.kani@hpe.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com>
> Cc: linux-kernel@vger.kernel.org
> Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Looks good to me.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
* Re: [PATCH] x86: Micro-optimise clflush_cache_range()
From: Toshi Kani @ 2016-01-08 18:25 UTC (permalink / raw)
To: Chris Wilson, x86
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
Luis R. Rodriguez, Stephen Rothwell, Ross Zwisler, Sai Praneeth,
linux-kernel
On Fri, 2016-01-08 at 09:55 +0000, Chris Wilson wrote:
> Whilst inspecting the asm for clflush_cache_range() and some perf
> profiles
> that required extensive flushing of single cachelines (from part of the
> intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
> boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
> manually hoist that read which perf regarded as taking ~25% of the
> function time for a single cacheline flush.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Toshi Kani <toshi.kani@hpe.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com>
> Cc: linux-kernel@vger.kernel.org
> Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Thanks for the improvement! The change looks good to me.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
-Toshi
* [tip:x86/mm] x86/mm: Micro-optimise clflush_cache_range()
From: tip-bot for Chris Wilson @ 2016-01-08 18:30 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, ross.zwisler, mcgrof, chris, bp, hpa, linux-kernel, mingo,
sfr, sai.praneeth.prakhya, toshi.kani
Commit-ID: 1f1a89ac05f6e88aa341e86e57435fdbb1177c0c
Gitweb: http://git.kernel.org/tip/1f1a89ac05f6e88aa341e86e57435fdbb1177c0c
Author: Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Fri, 8 Jan 2016 09:55:33 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 8 Jan 2016 19:27:39 +0100
x86/mm: Micro-optimise clflush_cache_range()
Whilst inspecting the asm for clflush_cache_range() and some perf profiles
that required extensive flushing of single cachelines (from part of the
intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
manually hoist that read, which perf regarded as taking ~25% of the
function time for a single cacheline flush.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com>
Link: http://lkml.kernel.org/r/1452246933-10890-1-git-send-email-chris@chris-wilson.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/mm/pageattr.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a3137a4..6000ad7 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -129,14 +129,16 @@ within(unsigned long addr, unsigned long start, unsigned long end)
  */
 void clflush_cache_range(void *vaddr, unsigned int size)
 {
-	unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+	const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
+	void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
 	void *vend = vaddr + size;
-	void *p;
+
+	if (p >= vend)
+		return;
 
 	mb();
 
-	for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
-	     p < vend; p += boot_cpu_data.x86_clflush_size)
+	for (; p < vend; p += clflush_size)
 		clflushopt(p);
 
 	mb();
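
The thread does not say why gcc kept reloading the field. As an illustration of the general C rule only (an assumption about the cause, not something stated above), the sketch below shows that a compiler may not hoist a global read out of a loop by itself when the loop body contains a store that could alias that global. The names line_size, count_reloading() and count_hoisted() are hypothetical stand-ins, not kernel symbols.

```c
#include <stddef.h>

/* Hypothetical stand-in for boot_cpu_data.x86_clflush_size. */
static unsigned long line_size = 64;

/* Reads line_size on every iteration, like the pre-patch loop. */
static unsigned int count_reloading(unsigned long *scratch, size_t len)
{
	unsigned int n = 0;
	size_t off;

	for (off = 0; off < len; off += line_size) {
		*scratch = 128;	/* a store that might alias line_size */
		n++;
	}
	return n;
}

/* Reads line_size once into a local, like the patched loop. */
static unsigned int count_hoisted(unsigned long *scratch, size_t len)
{
	const unsigned long step = line_size;
	unsigned int n = 0;
	size_t off;

	for (off = 0; off < len; off += step) {
		*scratch = 128;	/* can no longer affect the stride */
		n++;
	}
	return n;
}
```

When scratch points at an unrelated variable, both functions iterate the same number of times. When it points at line_size itself, the reloading version sees the new stride mid-loop and the hoisted one does not, so the two forms are not equivalent in general and the hoist has to be written by hand once the programmer knows the value cannot change.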