Date: Fri, 15 Aug 2014 13:34:48 +1000
From: Dave Chinner
To: Waiman Long
Cc: Jason Low, Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org,
	Davidlohr Bueso, Scott J Norton
Subject: Re: [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning
Message-ID: <20140815033447.GJ20518@dastard>
References: <1407119782-41119-1-git-send-email-Waiman.Long@hp.com>
 <1407119782-41119-3-git-send-email-Waiman.Long@hp.com>
 <1407125450.4710.38.camel@j-VirtualBox>
 <53DFAA53.4010003@hp.com>
 <20140813055153.GD20518@dastard>
 <53EB9522.2070804@hp.com>
In-Reply-To: <53EB9522.2070804@hp.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 13, 2014 at 12:41:06PM -0400, Waiman Long wrote:
> On 08/13/2014 01:51 AM, Dave Chinner wrote:
> >On Mon, Aug 04, 2014 at 11:44:19AM -0400, Waiman Long wrote:
> >>On 08/04/2014 12:10 AM, Jason Low wrote:
> >>>On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> >>>>The rwsem_can_spin_on_owner() function currently allows optimistic
> >>>>spinning only if the owner field is defined and is running. That is
> >>>>too conservative as it will cause some tasks to miss the opportunity
> >>>>of doing spinning in case the owner hasn't been able to set the owner
> >>>>field in time or the lock has just become available.
> >>>>
> >>>>This patch enables more aggressive use of optimistic spinning by
> >>>>assuming that the lock is spinnable unless proved otherwise.
> >>>>
> >>>>Signed-off-by: Waiman Long
> >>>>---
> >>>> kernel/locking/rwsem-xadd.c |    2 +-
> >>>> 1 files changed, 1 insertions(+), 1 deletions(-)
> >>>>
> >>>>diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> >>>>index d058946..dce22b8 100644
> >>>>--- a/kernel/locking/rwsem-xadd.c
> >>>>+++ b/kernel/locking/rwsem-xadd.c
> >>>>@@ -285,7 +285,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
> >>>>  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
> >>>>  {
> >>>>  	struct task_struct *owner;
> >>>>- 	bool on_cpu = false;
> >>>>+ 	bool on_cpu = true;	/* Assume spinnable unless proved not to be */
> >>>Hi,
> >>>
> >>>So "on_cpu = true" was recently converted to "on_cpu = false" in order
> >>>to address issues such as a 5x performance regression in the xfs_repair
> >>>workload that was caused by the original rwsem optimistic spinning code.
> >>>
> >>>However, patch 4 in this patchset does address some of the problems with
> >>>spinning when there are readers. CC'ing Dave Chinner, who did the
> >>>testing with the xfs_repair workload.
> >>>
> >>This patch set enables proper reader spinning and so the problem
> >>that we see with xfs_repair workload should go away. I should have
> >>this patch after patch 4 to make it less confusing.
> >>BTW, patch 3 can significantly reduce spinlock contention in rwsem.
> >>So I believe the xfs_repair workload should run faster with this
> >>patch than both 3.15 and 3.16.
> >
> >I see lots of handwaving. I documented the test I ran when I
> >reported the problem so anyone with a 16p system and an SSD can
> >reproduce it. I don't have the bandwidth to keep track of the lunacy
> >of making locks scale these days - that's what you guys are doing.
> >
> >I gave you a simple, reliable workload that is extremely sensitive
> >to rwsem perturbations, so you should be adding it to your
> >regression tests rather than leaving it for others to notice you
> >screwed up....
> >
> >Cheers,
> >
> >Dave.
>
> If you can send me a rwsem workload that I can use for testing
> purpose, it will be highly appreciated.

Create the VM image file on the host:

# xfs_io -f -c "truncate 500t" -c "extsize 1m" /path/to/vm/image/file

In the VM, download and build fsmark from here:

  git://oss.sgi.com/dgc/fs_mark

and download and install xfsprogs v3.2.1 from here:

  git://oss.sgi.com/xfs/cmds/xfsprogs.git tags/v3.2.1

Set up the target filesystem:

# mkfs.xfs -f -m "crc=1,finobt=1" /dev/vda
# mount -o logbsize=262144,nobarrier /dev/vda /mnt/scratch

Run:

# fs_mark -D 10000 -S0 -n 50000 -s 0 -L 32 \
	-d /mnt/scratch/0 -d /mnt/scratch/1 \
	-d /mnt/scratch/2 -d /mnt/scratch/3 \
	-d /mnt/scratch/4 -d /mnt/scratch/5 \
	-d /mnt/scratch/6 -d /mnt/scratch/7 \
	-d /mnt/scratch/8 -d /mnt/scratch/9 \
	-d /mnt/scratch/10 -d /mnt/scratch/11 \
	-d /mnt/scratch/12 -d /mnt/scratch/13 \
	-d /mnt/scratch/14 -d /mnt/scratch/15

If you've got everything set up right, that should run at around
200-250,000 file creates/s. When finished, unmount and run:

# xfs_repair -o bhash=500000 /dev/vda

And that should spend quite a long while pounding on the mmap_sem
until the userspace buffer cache stops growing.

I just ran the above on 3.16, saw this from perf:

  37.30%  [kernel]  [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      - 62.00% rwsem_wake
         - call_rwsem_wake
            + 83.52% sys_mprotect
            + 16.23% __do_page_fault
      + 35.15% try_to_wake_up
      + 0.96% update_blocked_averages
      + 0.61% pagevec_lru_move_fn
  - 23.35%  [kernel]  [k] _raw_spin_unlock_irq
     - _raw_spin_unlock_irq
        + 51.37% finish_task_switch
        + 39.37% rwsem_down_write_failed
        + 8.49% rwsem_down_read_failed
          0.62% run_timer_softirq
  +  5.22%  [kernel]  [k] native_read_tsc
  +  3.89%  [kernel]  [k] rwsem_down_write_failed
  .....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
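
[For anyone reading the one-line hunk above out of context, here is a
rough sketch of what the pre-patch rwsem_can_spin_on_owner() looks like
in a 3.16-era tree. It is reconstructed from kernel/locking/rwsem-xadd.c
of that period and should be treated as illustrative rather than the
exact upstream source. The point is that the initial value of on_cpu is
what gets returned when sem->owner is NULL, e.g. when readers hold the
lock or a writer has not yet recorded itself, so flipping the default
from false to true is what makes the slowpath assume the lock is
spinnable; whether that assumption pays off is what the fs_mark +
xfs_repair workload above is meant to show.]

	static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
	{
		struct task_struct *owner;
		bool on_cpu = false;	/* patch 2/7 changes this default to true */

		/* Don't spin if we are already due for a reschedule. */
		if (need_resched())
			return false;

		/* Owner is only a hint; sample it under RCU protection. */
		rcu_read_lock();
		owner = ACCESS_ONCE(sem->owner);
		if (owner)
			on_cpu = owner->on_cpu;
		rcu_read_unlock();

		/*
		 * If sem->owner is NULL we never updated on_cpu above, so the
		 * default decides whether the caller will try optimistic
		 * spinning at all: false today, true with this patch applied.
		 */
		return on_cpu;
	}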