From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755982Ab0CLDIi (ORCPT ); Thu, 11 Mar 2010 22:08:38 -0500
Received: from e31.co.us.ibm.com ([32.97.110.149]:47999 "EHLO
        e31.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753367Ab0CLDIh (ORCPT );
        Thu, 11 Mar 2010 22:08:37 -0500
Subject: Re: Nick's vfs-scalability patches ported to 2.6.33-rt
From: john stultz
To: Christoph Hellwig
Cc: Nick Piggin, Thomas Gleixner, lkml, Clark Williams, John Kacur
In-Reply-To: <20100310090142.GA9529@infradead.org>
References: <1267163608.2002.9.camel@work-vm>
        <20100226060109.GH9738@laptop>
        <1267659090.4317.67.camel@localhost.localdomain>
        <20100304033312.GO8653@laptop>
        <1267675511.4317.78.camel@localhost.localdomain>
        <1268189462.3339.12.camel@localhost.localdomain>
        <20100310090142.GA9529@infradead.org>
Content-Type: text/plain; charset="UTF-8"
Date: Thu, 11 Mar 2010 19:08:32 -0800
Message-ID: <1268363312.3475.85.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.28.1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2010-03-10 at 04:01 -0500, Christoph Hellwig wrote:
> On Tue, Mar 09, 2010 at 06:51:02PM -0800, john stultz wrote:
> > So this all means that with Nick's patch set, we're no longer getting
> > bogged down in the vfs (at least at 8-way) at all. All the contention is
> > in the actual filesystem (ext2 in group_adjust_blocks, and ext3 in the
> > journal and block allocation code).
> Can you check if you're running into any fs scaling limit with xfs?

Here's the charts from some limited testing:
http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/xfs-dbench.png

They're not great. And compared to ext3, the results are basically flat.
http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ext3-dbench.png

Now, I've not done any real xfs work before, so if there is any tuning
needed for dbench, please let me know.

The odd bit is that perf doesn't show huge overheads in the xfs runs.
The spinlock contention is supposedly under 5%. So I'm not sure what's
causing the numbers to be so bad.

Clipped perf log below.

thanks
-john

    11.06%  dbench  [kernel]  [k] copy_user_generic_strin
     4.82%  dbench  [kernel]  [k] __lock_acquire
            |
            |--94.74%-- lock_acquire
            |          |
            |          |--38.89%-- rt_spin_lock
            |          |          |
            |          |          |--28.57%-- _slab_irq_disable
            |          |          |          |
            |          |          |          |--50.00%-- kmem_cache_alloc
            |          |          |          |          kmem_zone_alloc
            |          |          |          |          xfs_buf_get
            |          |          |          |          xfs_buf_read
            |          |          |          |          xfs_trans_read_buf
            |          |          |          |          xfs_btree_read_buf_b
            |          |          |          |          xfs_btree_lookup_get
            |          |          |          |          xfs_btree_lookup
            |          |          |          |          xfs_alloc_lookup_eq
            |          |          |          |          xfs_alloc_fixup_tree
            |          |          |          |          xfs_alloc_ag_vextent
            |          |          |          |          xfs_alloc_ag_vextent
            |          |          |          |          xfs_alloc_vextent
            |          |          |          |          xfs_ialloc_ag_alloc
            |          |          |          |          xfs_dialloc
            |          |          |          |          xfs_ialloc
            |          |          |          |          xfs_dir_ialloc
            |          |          |          |          xfs_create
            |          |          |          |          xfs_vn_mknod
            |          |          |          |          xfs_vn_mkdir
            |          |          |          |          vfs_mkdir
            |          |          |          |          sys_mkdirat
            |          |          |          |          sys_mkdir
            |          |          |          |          system_call_fastpath
            |          |          |          |          __GI___mkdir
            |          |          |          |
            |          |          |           --50.00%-- kmem_cache_free
            |          |          |                      xfs_buf_get
            |          |          |                      xfs_buf_read
            |          |          |                      xfs_trans_read_buf
            |          |          |                      xfs_btree_read_buf_b
            |          |          |                      xfs_btree_lookup_get
            |          |          |                      xfs_btree_lookup
            |          |          |                      xfs_dialloc
            |          |          |                      xfs_ialloc
            |          |          |                      xfs_dir_ialloc
            |          |          |                      xfs_create
            |          |          |                      xfs_vn_mknod
            |          |          |                      xfs_vn_mkdir
            |          |          |                      vfs_mkdir
            |          |          |                      sys_mkdirat
            |          |          |                      sys_mkdir
            |          |          |                      system_call_fastpath
            |          |          |                      __GI___mkdir
            |          |          |
            |          |          |--14.29%-- dput
            |          |          |          path_put
            |          |          |          link_path_walk
            |          |          |          path_walk
            |          |          |          do_path_lookup
            |          |          |          user_path_at
            |          |          |          vfs_fstatat
            |          |          |          vfs_stat
            |          |          |          sys_newstat
            |          |          |          system_call_fastpath
            |          |          |          _xstat
            |          |          |
            |          |          |--14.29%-- add_to_page_cache_locked
            |          |          |          add_to_page_cache_lru
            |          |          |          grab_cache_page_write_begin
            |          |          |          block_write_begin
            |          |          |          xfs_vm_write_begin
            |          |          |          generic_file_buffered_write
            |          |          |          xfs_write
            |          |          |          xfs_file_aio_write
            |          |          |          do_sync_write
            |          |          |          vfs_write
            |          |          |          sys_pwrite64
            |          |          |          system_call_fastpath
            |          |          |          __GI_pwrite
:
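
A note on reading the call graph in the perf log: assuming it is perf's
parent-relative ("fractal") callchain output, which the log itself does not
state, each nested percentage is a share of its parent entry rather than of
the whole profile. The Python sketch below multiplies a chain of such
percentages into one absolute share. The helper name absolute_share is made
up for illustration; only the numbers come from the log.

```python
def absolute_share(chain):
    """Fold a chain of parent-relative percentages into one absolute percentage.

    ASSUMPTION: every entry after the first is relative to the entry before
    it; the first entry is relative to the whole profile.
    """
    share = 100.0
    for pct in chain:
        share *= pct / 100.0
    return share


# Percentages copied from the clipped log, outermost first:
# __lock_acquire -> lock_acquire -> rt_spin_lock -> _slab_irq_disable
slab_path = [4.82, 94.74, 38.89, 28.57]

print(f"_slab_irq_disable path: {absolute_share(slab_path):.2f}% of all samples")
```

Under that assumption, the _slab_irq_disable path under rt_spin_lock works
out to roughly half a percent of all samples, which is in line with the
observation that lock overhead looks small in the xfs runs.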