* [PATCH] x86: Micro-optimise clflush_cache_range()
@ 2016-01-08  9:55 Chris Wilson
From: Chris Wilson @ 2016-01-08  9:55 UTC
  To: x86
  Cc: Chris Wilson, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Toshi Kani, Borislav Petkov, Luis R. Rodriguez, Stephen Rothwell,
	Ross Zwisler, Sai Praneeth, linux-kernel

Whilst inspecting the asm for clflush_cache_range() and some perf profiles
of workloads that flush single cachelines extensively (part of the
intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
manually hoist that read, which perf regarded as taking ~25% of the
function time for a single cacheline flush.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com>
Cc: linux-kernel@vger.kernel.org
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/mm/pageattr.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a3137a4feed1..6000ad7f560c 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -129,14 +129,16 @@ within(unsigned long addr, unsigned long start, unsigned long end)
  */
 void clflush_cache_range(void *vaddr, unsigned int size)
 {
-	unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+	const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
+	void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
 	void *vend = vaddr + size;
-	void *p;
+
+	if (p >= vend)
+		return;
 
 	mb();
 
-	for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
-	     p < vend; p += boot_cpu_data.x86_clflush_size)
+	for (; p < vend; p += clflush_size)
 		clflushopt(p);
 
 	mb();
-- 
2.7.0.rc3
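
For readers who want to try the loop shapes outside the kernel, the
following is a minimal user-space sketch of the change, not the kernel
code itself. Everything prefixed demo_ is a hypothetical stand-in:
demo_cpu_data replaces boot_cpu_data, and demo_flush_line() replaces
clflushopt() by counting calls instead of flushing. The mb() barriers are
omitted. Whether a given compiler reloads the size field in the first
variant depends on what it can prove about the flush primitive, so the
sketch only shows that both shapes visit the same cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for boot_cpu_data (not the kernel's struct). */
struct demo_cpu_info {
	unsigned int x86_clflush_size;	/* cache line size in bytes */
};

struct demo_cpu_info demo_cpu_data = { .x86_clflush_size = 64 };

/* Hypothetical stand-in for clflushopt(): counts calls instead of flushing. */
static size_t demo_flush_count;

static void demo_flush_line(uintptr_t line_addr)
{
	(void)line_addr;
	demo_flush_count++;
}

/* Loop shape before the patch: the size field is read in the loop step. */
size_t demo_flush_range_reload(uintptr_t vaddr, unsigned int size)
{
	uintptr_t mask = (uintptr_t)demo_cpu_data.x86_clflush_size - 1;
	uintptr_t vend = vaddr + size;
	uintptr_t p;

	demo_flush_count = 0;
	for (p = vaddr & ~mask; p < vend; p += demo_cpu_data.x86_clflush_size)
		demo_flush_line(p);
	return demo_flush_count;
}

/* Loop shape after the patch: the size is read once into a local const. */
size_t demo_flush_range_hoisted(uintptr_t vaddr, unsigned int size)
{
	const uintptr_t line = demo_cpu_data.x86_clflush_size;
	uintptr_t p = vaddr & ~(line - 1);	/* round down to line start */
	uintptr_t vend = vaddr + size;

	/* The mask trick only works for power-of-two line sizes. */
	assert(line != 0 && (line & (line - 1)) == 0);

	demo_flush_count = 0;
	if (p >= vend)
		return 0;	/* nothing to flush; the patch returns before mb() here */

	for (; p < vend; p += line)
		demo_flush_line(p);
	return demo_flush_count;
}
```

Both functions return the number of lines they would have flushed, so a
64-byte range starting 16 bytes into a line reports two lines from either
variant.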


Thread overview: 4+ messages
2016-01-08  9:55 [PATCH] x86: Micro-optimise clflush_cache_range() Chris Wilson
2016-01-08 16:58 ` Ross Zwisler
2016-01-08 18:25 ` Toshi Kani
2016-01-08 18:30 ` [tip:x86/mm] x86/mm: " tip-bot for Chris Wilson
