From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932557AbcAHQ63 (ORCPT ); Fri, 8 Jan 2016 11:58:29 -0500
Received: from mga03.intel.com ([134.134.136.65]:30865 "EHLO mga03.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932220AbcAHQ62 (ORCPT ); Fri, 8 Jan 2016 11:58:28 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.20,539,1444719600"; d="scan'208";a="723116982"
Date: Fri, 8 Jan 2016 09:58:26 -0700
From: Ross Zwisler
To: Chris Wilson
Cc: x86@kernel.org, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
	Toshi Kani, Borislav Petkov, "Luis R. Rodriguez", Stephen Rothwell,
	Ross Zwisler, Sai Praneeth, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86: Micro-optimise clflush_cache_range()
Message-ID: <20160108165826.GA5000@linux.intel.com>
References: <1452246933-10890-1-git-send-email-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1452246933-10890-1-git-send-email-chris@chris-wilson.co.uk>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 08, 2016 at 09:55:33AM +0000, Chris Wilson wrote:
> Whilst inspecting the asm for clflush_cache_range() and some perf profiles
> that required extensive flushing of single cachelines (from part of the
> intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
> boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
> manually hoist that read which perf regarded as taking ~25% of the
> function time for a single cacheline flush.
>
> Signed-off-by: Chris Wilson
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: "H. Peter Anvin"
> Cc: x86@kernel.org
> Cc: Toshi Kani
> Cc: Borislav Petkov
> Cc: "Luis R. Rodriguez"
> Cc: Stephen Rothwell
> Cc: Ross Zwisler
> Cc: Sai Praneeth
> Cc: linux-kernel@vger.kernel.org
> Acked-by: "H. Peter Anvin"

Looks good to me.

Reviewed-by: Ross Zwisler

> ---
>  arch/x86/mm/pageattr.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index a3137a4feed1..6000ad7f560c 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -129,14 +129,16 @@ within(unsigned long addr, unsigned long start, unsigned long end)
>   */
>  void clflush_cache_range(void *vaddr, unsigned int size)
>  {
> -	unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
> +	const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
> +	void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
>  	void *vend = vaddr + size;
> -	void *p;
> +
> +	if (p >= vend)
> +		return;
>
>  	mb();
>
> -	for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
> -	     p < vend; p += boot_cpu_data.x86_clflush_size)
> +	for (; p < vend; p += clflush_size)
>  		clflushopt(p);
>
>  	mb();
> --
> 2.7.0.rc3
>
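
For anyone who wants to check the address arithmetic outside the kernel,
here is a minimal userspace sketch of the patched loop. It is a model under
stated assumptions, not the kernel code: model_clflush_size,
model_clflushopt() and model_flush_range() are hypothetical names made up
for this illustration, the flush is replaced by a counter, and the mb()
barriers are left out. The volatile qualifier only forces a reload on each
access in this model; it says nothing about why gcc reloads the field in
the kernel build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for boot_cpu_data.x86_clflush_size. */
static volatile unsigned int model_clflush_size = 64;

/* Hypothetical stand-in for clflushopt(): counts calls, flushes nothing. */
static unsigned long model_flush_count;

static void model_clflushopt(const void *p)
{
	(void)p;
	model_flush_count++;
}

/*
 * Userspace model of the patched loop. Returns how many cachelines
 * would have been flushed. The address is never dereferenced.
 */
static unsigned long model_flush_range(const void *vaddr, unsigned int size)
{
	/* Read the line size once, as the patch does with clflush_size. */
	const uintptr_t line = model_clflush_size;
	const uintptr_t vend = (uintptr_t)vaddr + size;
	uintptr_t p;

	/* The mask trick below only works for a power-of-two line size. */
	assert(line != 0 && (line & (line - 1)) == 0);

	/* Round the start address down to a cacheline boundary. */
	p = (uintptr_t)vaddr & ~(line - 1);

	model_flush_count = 0;

	/* Early exit from the patch; the kernel code skips both mb() here. */
	if (p >= vend)
		return 0;

	for (; p < vend; p += line)
		model_clflushopt((const void *)p);

	return model_flush_count;
}
```

With a 64-byte line, model_flush_range((void *)0x1000, 64) counts one
line, while a 2-byte range starting at 0x103f counts two because it
straddles a line boundary.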