From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756427AbcAHS0l (ORCPT ); Fri, 8 Jan 2016 13:26:41 -0500
Received: from g4t3427.houston.hp.com ([15.201.208.55]:49964 "EHLO
	g4t3427.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756440AbcAHS0i (ORCPT );
	Fri, 8 Jan 2016 13:26:38 -0500
Message-ID: <1452277557.19330.64.camel@hpe.com>
Subject: Re: [PATCH] x86: Micro-optimise clflush_cache_range()
From: Toshi Kani
To: Chris Wilson, x86@kernel.org
Cc: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Borislav Petkov,
	"Luis R. Rodriguez", Stephen Rothwell, Ross Zwisler, Sai Praneeth,
	linux-kernel@vger.kernel.org
Date: Fri, 08 Jan 2016 11:25:57 -0700
In-Reply-To: <1452246933-10890-1-git-send-email-chris@chris-wilson.co.uk>
References: <1452246933-10890-1-git-send-email-chris@chris-wilson.co.uk>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.16.5 (3.16.5-3.fc22)
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2016-01-08 at 09:55 +0000, Chris Wilson wrote:
> Whilst inspecting the asm for clflush_cache_range() and some perf profiles
> that required extensive flushing of single cachelines (from part of the
> intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
> boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
> manually hoist that read which perf regarded as taking ~25% of the
> function time for a single cacheline flush.
>
> Signed-off-by: Chris Wilson
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: "H. Peter Anvin"
> Cc: x86@kernel.org
> Cc: Toshi Kani
> Cc: Borislav Petkov
> Cc: "Luis R. Rodriguez"
> Cc: Stephen Rothwell
> Cc: Ross Zwisler
> Cc: Sai Praneeth
> Cc: linux-kernel@vger.kernel.org
> Acked-by: "H. Peter Anvin"

Thanks for the improvement!  The change looks good to me.

Reviewed-by: Toshi Kani

-Toshi
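
For readers without the patch at hand, the hoisting described in the quoted
commit message can be illustrated roughly as below. This is only a userspace
sketch and not the kernel code: cpu_data, flush_line() and both
flush_range_*() functions are hypothetical stand-ins, the flush is a stub
that counts calls, and the compiler barrier is merely an assumed way to make
the compiler treat the global as possibly modified (the message does not say
why gcc reloads the field in the kernel). It uses GCC/Clang inline asm syntax.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for boot_cpu_data; not the kernel's definition. */
struct cpu_info {
	unsigned int clflush_size;
};
static struct cpu_info cpu_data = { .clflush_size = 64 };

static unsigned long lines_flushed;

/*
 * Stub for the real flush instruction: it only counts calls. The "memory"
 * clobber is an assumption of this sketch; it stops the compiler from
 * keeping cpu_data.clflush_size cached in a register across the call.
 */
static void flush_line(const void *p)
{
	(void)p;
	lines_flushed++;
	__asm__ volatile("" ::: "memory");
}

/* Before: the global field is read again on every loop iteration. */
static void flush_range_reload(const void *vaddr, size_t size)
{
	uintptr_t end = (uintptr_t)vaddr + size;
	uintptr_t p = (uintptr_t)vaddr &
		      ~((uintptr_t)cpu_data.clflush_size - 1);

	for (; p < end; p += cpu_data.clflush_size)
		flush_line((const void *)p);
}

/* After: the field is read once into a local before the loop. */
static void flush_range_hoisted(const void *vaddr, size_t size)
{
	const uintptr_t line = cpu_data.clflush_size;
	uintptr_t end = (uintptr_t)vaddr + size;
	uintptr_t p = (uintptr_t)vaddr & ~(line - 1);

	for (; p < end; p += line)
		flush_line((const void *)p);
}
```

Both variants visit the same cachelines; the only difference is how often
the size field is loaded, which is the cost the commit message attributes
to the reload.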