From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753796AbcAGWap (ORCPT ); Thu, 7 Jan 2016 17:30:45 -0500 Received: from terminus.zytor.com ([198.137.202.10]:44731 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753199AbcAGWan (ORCPT ); Thu, 7 Jan 2016 17:30:43 -0500 Subject: Re: [PATCH] x86: Add an explicit barrier() to clflushopt() To: Chris Wilson , Andy Lutomirski , "linux-kernel@vger.kernel.org" , Ross Zwisler , "H . Peter Anvin" , Borislav Petkov , Brian Gerst , Denys Vlasenko , Linus Torvalds , Thomas Gleixner , Imre Deak , Daniel Vetter , DRI References: <1445248735-11915-1-git-send-email-chris@chris-wilson.co.uk> <20160107101652.GF652@nuc-i3427.alporthouse.com> <20160107194413.GA25144@nuc-i3427.alporthouse.com> <568ED31F.1090004@zytor.com> <20160107215401.GB25144@nuc-i3427.alporthouse.com> From: "H. Peter Anvin" X-Enigmail-Draft-Status: N1110 Message-ID: <568EE6E6.4000904@zytor.com> Date: Thu, 7 Jan 2016 14:29:58 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: <20160107215401.GB25144@nuc-i3427.alporthouse.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/07/16 13:54, Chris Wilson wrote: > > Whilst you are looking at this asm, note that we reload > boot_cpu_data.x86_cflush_size everytime around the loop. That's a small > but noticeable extra cost (especially when we are only flushing a single > cacheline). > I did notice that; I don't know if this is a compiler failure to do an obvious hoisting (which should then be merged with the other load) or a consequence of the volatile. It might be useful to report this to the gcc bugzilla to let them look at it. Either way, the diff looks good and you can add my: Acked-by: H. Peter Anvin However, I see absolutely nothing wrong with the assembly other than minor optimization possibilities: since gcc generates an early-out test as well as a late-out test anyway, we could add an explicit: if (p < vend) return; before the first mb() at no additional cost (assuming gcc is smart enough to skip the second test; otherwise that can easily be done manually by replacing the for loop with a do { } while loop. I would be very interested in knowing if replacing the final clflushopt with a clflush would resolve your problems (in which case the last mb() shouldn't be necessary either.) Something like: void clflush_cache_range(void *vaddr, unsigned int size) { unsigned long clflush_size = boot_cpu_data.x86_clflush_size; char *vend = (char *)vaddr + size; char *p = (char *)((unsigned long)vaddr & ~(clflush_size - 1); if (p >= vend) return; mb(); for (; p < vend - clflush_size; p += clflush_size) clflushopt(p); clflush(p); /* Serializing and thus a barrier */ }