Date: Tue, 16 Mar 2010 03:15:32 +1100
From: Nick Piggin
To: Dave Chinner
Cc: john stultz, Christoph Hellwig, Thomas Gleixner, lkml, Clark Williams, John Kacur
Subject: Re: Nick's vfs-scalability patches ported to 2.6.33-rt
Message-ID: <20100315161531.GF2869@laptop>
In-Reply-To: <20100312044112.GC4732@dastard>

On Fri, Mar 12, 2010 at 03:41:12PM +1100, Dave Chinner wrote:
> On Thu, Mar 11, 2010 at 07:08:32PM -0800, john stultz wrote:
> > On Wed, 2010-03-10 at 04:01 -0500, Christoph Hellwig wrote:
> > > On Tue, Mar 09, 2010 at 06:51:02PM -0800, john stultz wrote:
> > > > So this all means that with Nick's patch set, we're no longer getting
> > > > bogged down in the vfs (at least at 8-way) at all. All the contention is
> > > > in the actual filesystem (ext2 in group_adjust_blocks, and ext3 in the
> > > > journal and block allocation code).
> > >
> > > Can you check if you're running into any fs scaling limit with xfs?
> >
> > Here's the charts from some limited testing:
> > http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/xfs-dbench.png
>
> What's the X-axis? Number of clients?

Yes I think so (either it's dbench clients, or CPUs).

> If so, I have previously tested XFS to make sure throughput is flat
> out to about 1000 clients, not 8. i.e. I'm not interested in peak
> throughput from dbench (generally a meaningless number), I'm much
> more interested in sustaining that throughput under the sorts of
> loads a real fileserver would see...

dbench is simply a benchmark that is known to be bad for core vfs locks.
If it is run on top of tmpfs it gives relatively stable numbers, and on a
real filesystem on ramdisk it works OK too. Not sure if John was running
it on a ramdisk though.

It does emulate the syscall pattern coming from samba running the netbench
test, so it's not _totally_ meaningless :)

In this case, we're mostly interested in it to see if there are contended
locks or cachelines left.

>
> > They're not great. And compared to ext3, the results are basically
> > flat.
> > http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ext3-dbench.png
> >
> > Now, I've not done any real xfs work before, so if there is any tuning
> > needed for dbench, please let me know.
>
> Dbench does lots of transactions which runs XFS into being log IO
> bound. Make sure you have at least a 128MB log and are using
> lazy-count=1 and perhaps even the logbsize=262144 mount option. But
> in general it only takes 2-4 clients to reach maximum throughput on
> XFS....
>
> > The odd bit is that perf doesn't show huge overheads in the xfs runs.
> > The spinlock contention is supposedly under 5%. So I'm not sure what's
> > causing the numbers to be so bad.
>
> It's bound by sleeping locks or IO. call-graph based profiles
> triggered on context switches are the easiest way to find the
> contending lock.
>
> Last time I did this (around 2.6.16, IIRC) it involved patching the
> kernel to put the sample point in the context switch code - can we
> do that now without patching the kernel?

Lock profiling can track sleeping locks, and profile=schedule and
profile=sleep still work OK too. Don't know if any useful tracing
stuff is there for locks yet.
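
For reference, the XFS tuning Dave suggests in the quoted text (a log of at
least 128MB, lazy-count=1, and the logbsize=262144 mount option) could be
spelled out roughly as below. This is only a sketch that builds and prints
the command lines without running them. The device /dev/sdX1, the mount
point /mnt/dbench and the helper names are hypothetical and not from the
thread; check the option spellings against your xfsprogs and kernel
versions before using them.

```python
# Dry-run sketch: build the mkfs.xfs and mount command lines for the XFS
# log tuning quoted above. Nothing is executed; the commands are only printed.
# The device, mount point and function names are illustrative assumptions.


def xfs_mkfs_command(device, log_size="128m", lazy_count=1):
    """Return the argv for mkfs.xfs with a larger log and lazy-count set."""
    log_opts = f"size={log_size},lazy-count={lazy_count}"
    return ["mkfs.xfs", "-l", log_opts, device]


def xfs_mount_command(device, mountpoint, logbsize=262144):
    """Return the argv for mounting with a larger in-memory log buffer."""
    return ["mount", "-t", "xfs", "-o", f"logbsize={logbsize}", device, mountpoint]


if __name__ == "__main__":
    dev, mnt = "/dev/sdX1", "/mnt/dbench"  # hypothetical placeholders
    for argv in (xfs_mkfs_command(dev), xfs_mount_command(dev, mnt)):
        print(" ".join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere, since
mkfs.xfs destroys whatever is on the target device.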