From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755493AbaLWJOr (ORCPT );
	Tue, 23 Dec 2014 04:14:47 -0500
Received: from mail-wg0-f41.google.com ([74.125.82.41]:62217 "EHLO
	mail-wg0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754628AbaLWJOp (ORCPT );
	Tue, 23 Dec 2014 04:14:45 -0500
Date: Tue, 23 Dec 2014 10:14:40 +0100
From: Ingo Molnar 
To: Dave Hansen 
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, x86@kernel.org,
	dave.hansen@linux.intel.com, Linus Torvalds ,
	Andrew Morton 
Subject: Re: [PATCH 2/2] x86 mpx: fix potential performance issue on unmaps
Message-ID: <20141223091440.GA9112@gmail.com>
References: <20141222200803.D316DA2A@viggo.jf.intel.com>
 <20141222200805.13639956@viggo.jf.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141222200805.13639956@viggo.jf.intel.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

* Dave Hansen  wrote:

> 
> From: Dave Hansen 
> 
> The 3.19 release window saw some TLB modifications merged which
> caused a performance regression.  They were fixed in commit
> 045bbb9fa.
> 
> Once that fix was applied, I also noticed that there was a small
> but intermittent regression still present.  It was not present
> consistently enough to bisect reliably, but I'm fairly confident
> that it came from (my own) MPX patches.  The source was reading
> a relatively unused field in the mm_struct via arch_unmap.
> 
> I also noted that this code was in the main instruction flow of
> do_munmap() and probably had more icache impact than we want.
> 
> This patch does two things:
> 1. Adds a static (via Kconfig) and dynamic (via cpuid) check
>    for MPX with cpu_feature_enabled().  This keeps us from
>    reading that cacheline in the mm and trades it for a check
>    of the global CPUID variables, at least on CPUs without MPX.
> 2. Adds an unlikely() to ensure that the MPX call ends up out
>    of the main instruction flow in do_munmap().  I've added
>    a detailed comment about why this was done and why we want
>    it even on systems where MPX is present.
> 
> Signed-off-by: Dave Hansen 
> ---
> 
>  b/arch/x86/include/asm/mmu_context.h |   20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff -puN arch/x86/include/asm/mmu_context.h~fix-mpx-regression-on-unmap arch/x86/include/asm/mmu_context.h
> --- a/arch/x86/include/asm/mmu_context.h~fix-mpx-regression-on-unmap	2014-12-22 12:06:18.677928330 -0800
> +++ b/arch/x86/include/asm/mmu_context.h	2014-12-22 12:06:18.680928465 -0800
> @@ -130,7 +130,25 @@ static inline void arch_bprm_mm_init(str
>  static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
>  			      unsigned long start, unsigned long end)
>  {
> -	mpx_notify_unmap(mm, vma, start, end);
> +	/*
> +	 * mpx_notify_unmap() goes and reads a rarely-hot
> +	 * cacheline in the mm_struct.  That can be expensive
> +	 * enough to be seen in profiles.
> +	 *
> +	 * The mpx_notify_unmap() call and its contents have been
> +	 * observed to affect munmap() performance on hardware
> +	 * where MPX is not present.
> +	 *
> +	 * The unlikely() optimizes for the fast case: no MPX
> +	 * in the CPU, or no MPX use in the process.  Even if
> +	 * we get this wrong (in the unlikely event that MPX
> +	 * is widely enabled on some system) the overhead of
> +	 * MPX itself (reading bounds tables) is expected to
> +	 * overwhelm the overhead of getting this unlikely()
> +	 * consistently wrong.
> +	 */
> +	if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
> +		mpx_notify_unmap(mm, vma, start, end);
>  }

Hm, so this patch still does not help people who have an MPX capable 
CPU but don't have (or don't have many) MPX using apps. What about 
them?

Thanks,

	Ingo