Date: Fri, 4 Sep 2015 18:29:54 +1000
From: Dave Chinner
To: Linus Torvalds
Cc: Linux Kernel Mailing List , Peter Zijlstra , Waiman Long , Ingo Molnar
Subject: Re: [4.2, Regression] Queued spinlocks cause major XFS performance regression
Message-ID: <20150904082954.GB3902@dastard>
References: <20150904054820.GY3902@dastard> <20150904071143.GZ3902@dastard>
In-Reply-To: <20150904071143.GZ3902@dastard>

On Fri, Sep 04, 2015 at 05:11:43PM +1000, Dave Chinner wrote:
> On Thu, Sep 03, 2015 at 11:39:21PM -0700, Linus Torvalds wrote:
> > There doesn't seem to be anything even remotely strange going on in that area.
> >
> > Is this a PARAVIRT configuration? There were issues with PV
> > interaction at some point. If it is PV, and you don't actually use PV,
> > can you test with PV support disabled?
>
> $ grep PARAVIRT .config
> CONFIG_PARAVIRT=y
> # CONFIG_PARAVIRT_DEBUG is not set
> # CONFIG_PARAVIRT_SPINLOCKS is not set
> CONFIG_PARAVIRT_TIME_ACCOUNTING=y
> CONFIG_PARAVIRT_CLOCK=y
> $
>
> I'll retest with CONFIG_PARAVIRT=n....

$ grep PARAVIRT .config
# CONFIG_PARAVIRT is not set
$

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0     123407.7          9202289
     0      3200000            0      97271.9          9187905
     0      4800000            0     101010.3         11246527
....

So, no, that doesn't affect the queued spinlock performance at all.

> > Also, if you look at the instruction-level profile for
> > queued_spin_lock_slowpath itself, does anything stand out? For
> > example, I note that the for-loop with the atomic_cmpxchg() call in it
> > doesn't ever do a cpu_relax(). It doesn't look like that should
> > normally loop, but obviously that function also shouldn't normally use
> > 2/3rds of the cpu, so.. Maybe some part of queued_spin_lock_slowpath()
> > stands out as "it's spending 99% of the time in _that_ particular
> > part, and it gives some clue what goes wrong.
>
> I'll have a look when the current tests on that machine have
> finished running.

       │    Disassembly of section load2:
       │
       │    ffffffff810e0f30 :
  0.00 │      nop
       │      push   %rbp
  0.00 │      mov    %rsp,%rbp
  0.00 │      xchg   %ax,%ax
       │      xor    %eax,%eax
  0.00 │      mov    $0x1,%edx
       │      lock   cmpxchg %edx,(%rdi)
  0.33 │      xor    %ecx,%ecx
       │      test   %eax,%eax
       │    ↓ je     28
  0.02 │1c:   pause
  4.45 │      mov    %ecx,%eax
  0.00 │      lock   cmpxchg %edx,(%rdi)
 95.18 │      test   %eax,%eax
       │    ↑ jne    1c
  0.01 │28:   pop    %rbp
  0.01 │    ← retq
.....

It looks like it's spending all its time looping around the cmpxchg.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
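
For readers following the disassembly, the annotated loop has the shape of a plain test-and-set spin: one `lock cmpxchg` attempt to change the lock word from 0 to 1, and on failure a `pause` followed by another attempt, repeated until it succeeds. The following is a minimal userspace sketch of that shape in C11 atomics, not the kernel's code. The names `tas_lock_t`, `tas_trylock`, `tas_lock`, `tas_unlock` and `spin_relax` are hypothetical and chosen only for illustration, and the memory orderings are an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock word: 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_int val;
} tas_lock_t;

/* Spin-wait hint: the x86 "pause" instruction, a no-op elsewhere. */
static inline void spin_relax(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_ia32_pause();
#endif
}

/* One attempt, like "lock cmpxchg %edx,(%rdi)" with %eax = 0, %edx = 1. */
static inline bool tas_trylock(tas_lock_t *lock)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &lock->val, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Try once; if that fails, pause and retry until the cmpxchg succeeds. */
static inline void tas_lock(tas_lock_t *lock)
{
    if (tas_trylock(lock))
        return;
    do {
        spin_relax();
    } while (!tas_trylock(lock));
}

/* Release the lock by storing 0 back into the lock word. */
static inline void tas_unlock(tas_lock_t *lock)
{
    atomic_store_explicit(&lock->val, 0, memory_order_release);
}
```

In a loop of this shape every waiter retries the locked cmpxchg on the same word, so under contention the time shows up on and immediately after that instruction, which matches where the samples land in the profile above.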