* Nick's vfs-scalability patches ported to 2.6.33-rt
@ 2010-02-26  5:53 john stultz

From: john stultz
To: Thomas Gleixner, Nick Piggin
Cc: lkml, Clark Williams, John Kacur

Hey Thomas, Nick,
	I just wanted to let you know I've just finished forward porting Nick's
patches to 2.6.33-rc8-rt2. Luckily my forward port of Nick's patches to
2.6.33 applies on top of the -rt tree without any collisions, and I've
added a handful of maybe sketchy fixups to get it working with -rt.

You can find the patchset here:
http://sr71.net/~jstultz/dbench-scalability/patches/2.6.33-rc8-rt2/vfs-scale.33-rt.tar.bz2

Here's a chart showing how much these patches help dbench numbers on
ramfs:
http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ramfs-dbench.png

I've not done any serious stress testing with the patchset yet, but
wanted to post it for your review.

Nick: I'd appreciate any feedback as to whether any of my forward porting
has gone awry. I'm still very green with respect to the vfs, so I don't
doubt there are some issues hiding here.

Thomas: Let me know if you want to start playing with this in the -rt
tree. I'm not seeing any warnings with the debugging options on, so I
think I squashed all of those issues, but let me know if you manage to
trigger anything.

thanks
-john
* Re: Nick's vfs-scalability patches ported to 2.6.33-rt
  2010-02-26  6:01 ` Nick Piggin

From: Nick Piggin
To: john stultz
Cc: Thomas Gleixner, lkml, Clark Williams, John Kacur

On Thu, Feb 25, 2010 at 09:53:28PM -0800, john stultz wrote:
> Hey Thomas, Nick,
> 	I just wanted to let you know I've just finished forward porting Nick's
> patches to 2.6.33-rc8-rt2. Luckily my forward port of Nick's patches to
> 2.6.33 applies on top of the -rt tree without any collisions, and I've
> added a handful of maybe sketchy fixups to get it working with -rt.
>
> You can find the patchset here:
> http://sr71.net/~jstultz/dbench-scalability/patches/2.6.33-rc8-rt2/vfs-scale.33-rt.tar.bz2
>
> Here's a chart showing how much these patches help dbench numbers on
> ramfs:
> http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ramfs-dbench.png
>
> I've not done any serious stress testing with the patchset yet, but
> wanted to post it for your review.
>
> Nick: I'd appreciate any feedback as to whether any of my forward porting
> has gone awry. I'm still very green with respect to the vfs, so I don't
> doubt there are some issues hiding here.

BTW, there are a few issues Al pointed out. We have to synchronize RCU
after unregistering a filesystem so d_ops/i_ops don't go away, and
mntput can sleep, so we can't do it under the RCU read lock. The
store-free path walk patches don't really have the required RCU
barriers in them either (which is fine for x86, but would have to be
fixed).
* Re: Nick's vfs-scalability patches ported to 2.6.33-rt
  2010-03-03 23:31 ` john stultz

From: john stultz
To: Nick Piggin
Cc: Thomas Gleixner, lkml, Clark Williams, John Kacur

On Fri, 2010-02-26 at 17:01 +1100, Nick Piggin wrote:
> On Thu, Feb 25, 2010 at 09:53:28PM -0800, john stultz wrote:
> > Hey Thomas, Nick,
> > 	I just wanted to let you know I've just finished forward porting Nick's
> > patches to 2.6.33-rc8-rt2. Luckily my forward port of Nick's patches to
> > 2.6.33 applies on top of the -rt tree without any collisions, and I've
> > added a handful of maybe sketchy fixups to get it working with -rt.
> >
> > You can find the patchset here:
> > http://sr71.net/~jstultz/dbench-scalability/patches/2.6.33-rc8-rt2/vfs-scale.33-rt.tar.bz2
> >
> > Here's a chart showing how much these patches help dbench numbers on
> > ramfs:
> > http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ramfs-dbench.png
> >
> > I've not done any serious stress testing with the patchset yet, but
> > wanted to post it for your review.
> >
> > Nick: I'd appreciate any feedback as to whether any of my forward porting
> > has gone awry. I'm still very green with respect to the vfs, so I don't
> > doubt there are some issues hiding here.
>
> BTW, there are a few issues Al pointed out. We have to synchronize RCU
> after unregistering a filesystem so d_ops/i_ops don't go away, and
> mntput can sleep, so we can't do it under the RCU read lock.

Does the following address this issue properly?
Signed-off-by: John Stultz <johnstul@us.ibm.com>

diff --git a/fs/filesystems.c b/fs/filesystems.c
index a24c58e..3448e7c 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -110,6 +110,7 @@ int unregister_filesystem(struct file_system_type * fs)
 			*tmp = fs->next;
 			fs->next = NULL;
 			write_unlock(&file_systems_lock);
+			synchronize_rcu();
 			return 0;
 		}
 		tmp = &(*tmp)->next;
* Re: Nick's vfs-scalability patches ported to 2.6.33-rt
  2010-03-04  3:33 ` Nick Piggin

From: Nick Piggin
To: john stultz
Cc: Thomas Gleixner, lkml, Clark Williams, John Kacur

On Wed, Mar 03, 2010 at 03:31:30PM -0800, john stultz wrote:
> On Fri, 2010-02-26 at 17:01 +1100, Nick Piggin wrote:
> > On Thu, Feb 25, 2010 at 09:53:28PM -0800, john stultz wrote:
> > > Hey Thomas, Nick,
> > > 	I just wanted to let you know I've just finished forward porting Nick's
> > > patches to 2.6.33-rc8-rt2. Luckily my forward port of Nick's patches to
> > > 2.6.33 applies on top of the -rt tree without any collisions, and I've
> > > added a handful of maybe sketchy fixups to get it working with -rt.
> > >
> > > You can find the patchset here:
> > > http://sr71.net/~jstultz/dbench-scalability/patches/2.6.33-rc8-rt2/vfs-scale.33-rt.tar.bz2
> > >
> > > Here's a chart showing how much these patches help dbench numbers on
> > > ramfs:
> > > http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ramfs-dbench.png
> > >
> > > I've not done any serious stress testing with the patchset yet, but
> > > wanted to post it for your review.
> > >
> > > Nick: I'd appreciate any feedback as to whether any of my forward porting
> > > has gone awry. I'm still very green with respect to the vfs, so I don't
> > > doubt there are some issues hiding here.
> >
> > BTW, there are a few issues Al pointed out. We have to synchronize RCU
> > after unregistering a filesystem so d_ops/i_ops don't go away, and
> > mntput can sleep, so we can't do it under the RCU read lock.
>
> Does the following address this issue properly?

As far as I could tell, yes, that should solve the code reference
problem. I don't see a problem with synchronizing RCU here.
>
> Signed-off-by: John Stultz <johnstul@us.ibm.com>
>
> diff --git a/fs/filesystems.c b/fs/filesystems.c
> index a24c58e..3448e7c 100644
> --- a/fs/filesystems.c
> +++ b/fs/filesystems.c
> @@ -110,6 +110,7 @@ int unregister_filesystem(struct file_system_type * fs)
>  			*tmp = fs->next;
>  			fs->next = NULL;
>  			write_unlock(&file_systems_lock);
> +			synchronize_rcu();
>  			return 0;
>  		}
>  		tmp = &(*tmp)->next;
* Re: Nick's vfs-scalability patches ported to 2.6.33-rt
  2010-03-04  4:05 ` john stultz

From: john stultz
To: Nick Piggin, Thomas Gleixner
Cc: lkml, Clark Williams, John Kacur

On Thu, 2010-03-04 at 14:33 +1100, Nick Piggin wrote:
> On Wed, Mar 03, 2010 at 03:31:30PM -0800, john stultz wrote:
> > On Fri, 2010-02-26 at 17:01 +1100, Nick Piggin wrote:
> > > On Thu, Feb 25, 2010 at 09:53:28PM -0800, john stultz wrote:
> > > > Hey Thomas, Nick,
> > > > 	I just wanted to let you know I've just finished forward porting Nick's
> > > > patches to 2.6.33-rc8-rt2. Luckily my forward port of Nick's patches to
> > > > 2.6.33 applies on top of the -rt tree without any collisions, and I've
> > > > added a handful of maybe sketchy fixups to get it working with -rt.
> > > >
> > > > You can find the patchset here:
> > > > http://sr71.net/~jstultz/dbench-scalability/patches/2.6.33-rc8-rt2/vfs-scale.33-rt.tar.bz2
> > > >
> > > > Here's a chart showing how much these patches help dbench numbers on
> > > > ramfs:
> > > > http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ramfs-dbench.png
> > > >
> > > > I've not done any serious stress testing with the patchset yet, but
> > > > wanted to post it for your review.
> > > >
> > > > Nick: I'd appreciate any feedback as to whether any of my forward porting
> > > > has gone awry. I'm still very green with respect to the vfs, so I don't
> > > > doubt there are some issues hiding here.
> > >
> > > BTW, there are a few issues Al pointed out. We have to synchronize RCU
> > > after unregistering a filesystem so d_ops/i_ops don't go away, and
> > > mntput can sleep, so we can't do it under the RCU read lock.
> >
> > Does the following address this issue properly?
>
> As far as I could tell, yes, that should solve the code reference
> problem. I don't see a problem with synchronizing RCU here.

Good to hear!
Thanks for the review, Nick!

Thomas: I ran a number of kernel-bench and dbench stress tests on this
today and I've not seen any issues, so unless Nick has other issues, I
think it should be ok to pull into -rt.

You can grab the full patchset that builds on top of 2.6.33-rt4 here:
http://sr71.net/~jstultz/dbench-scalability/patches/2.6.33-rt4/vfs-scale.33-rt.tar.bz2

thanks
-john
* Re: Nick's vfs-scalability patches ported to 2.6.33-rt
  2010-03-10  2:51 ` john stultz

From: john stultz
To: Nick Piggin
Cc: Thomas Gleixner, lkml, Clark Williams, John Kacur

On Wed, 2010-03-03 at 20:05 -0800, john stultz wrote:
> Thomas: I ran a number of kernel-bench and dbench stress tests on this
> today and I've not seen any issues, so unless Nick has other issues, I
> think it should be ok to pull into -rt.
>
> You can grab the full patchset that builds on top of 2.6.33-rt4 here:
> http://sr71.net/~jstultz/dbench-scalability/patches/2.6.33-rt4/vfs-scale.33-rt.tar.bz2

Oh, and another interesting data point! The ext2 performance numbers
with this patch set are scaling better than the 2.6.31-rt-vfs set
tested earlier!
http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ext2-dbench.png

It's not perfect, but it's closing the gap.

More interestingly, whereas we were still seeing path lookup contention
in 2.6.31, it's not showing up in the perf logs with 2.6.33. Instead,
the contention is on the ext2 group_adjust_blocks function. And
replacing the statvfs call in dbench with statfs pushes the results
past mainline:
http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ext2-dbench-statfs.png

So this all means that with Nick's patch set, we're no longer getting
bogged down in the vfs (at least at 8-way) at all. All the contention
is in the actual filesystem (ext2 in group_adjust_blocks, and ext3 in
the journal and block allocation code).

So again, kudos to Nick!

thanks
-john
* Re: Nick's vfs-scalability patches ported to 2.6.33-rt
  2010-03-10  9:01 ` Christoph Hellwig

From: Christoph Hellwig
To: john stultz
Cc: Nick Piggin, Thomas Gleixner, lkml, Clark Williams, John Kacur

On Tue, Mar 09, 2010 at 06:51:02PM -0800, john stultz wrote:
> So this all means that with Nick's patch set, we're no longer getting
> bogged down in the vfs (at least at 8-way) at all. All the contention
> is in the actual filesystem (ext2 in group_adjust_blocks, and ext3 in
> the journal and block allocation code).

Can you check if you're running into any fs scaling limit with xfs?
* Re: Nick's vfs-scalability patches ported to 2.6.33-rt
  2010-03-12  3:08 ` john stultz

From: john stultz
To: Christoph Hellwig
Cc: Nick Piggin, Thomas Gleixner, lkml, Clark Williams, John Kacur

On Wed, 2010-03-10 at 04:01 -0500, Christoph Hellwig wrote:
> On Tue, Mar 09, 2010 at 06:51:02PM -0800, john stultz wrote:
> > So this all means that with Nick's patch set, we're no longer getting
> > bogged down in the vfs (at least at 8-way) at all. All the contention
> > is in the actual filesystem (ext2 in group_adjust_blocks, and ext3 in
> > the journal and block allocation code).
>
> Can you check if you're running into any fs scaling limit with xfs?

Here are the charts from some limited testing:
http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/xfs-dbench.png

They're not great. And compared to ext3, the results are basically
flat:
http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ext3-dbench.png

Now, I've not done any real xfs work before, so if there is any tuning
needed for dbench, please let me know.

The odd bit is that perf doesn't show huge overheads in the xfs runs.
The spinlock contention is supposedly under 5%, so I'm not sure what's
causing the numbers to be so bad. Clipped perf log below.

thanks
-john

 11.06%  dbench  [kernel]  [k] copy_user_generic_strin
  4.82%  dbench  [kernel]  [k] __lock_acquire
            |
            |--94.74%-- lock_acquire
            |          |
            |          |--38.89%-- rt_spin_lock
            |          |          |
            |          |          |--28.57%-- _slab_irq_disable
            |          |          |          |
            |          |          |          |--50.00%-- kmem_cache_alloc
            |          |          |          |          kmem_zone_alloc
            |          |          |          |          xfs_buf_get
            |          |          |          |          xfs_buf_read
            |          |          |          |          xfs_trans_read_buf
            |          |          |          |          xfs_btree_read_buf_b
            |          |          |          |          xfs_btree_lookup_get
            |          |          |          |          xfs_btree_lookup
            |          |          |          |          xfs_alloc_lookup_eq
            |          |          |          |          xfs_alloc_fixup_tree
            |          |          |          |          xfs_alloc_ag_vextent
            |          |          |          |          xfs_alloc_ag_vextent
            |          |          |          |          xfs_alloc_vextent
            |          |          |          |          xfs_ialloc_ag_alloc
            |          |          |          |          xfs_dialloc
            |          |          |          |          xfs_ialloc
            |          |          |          |          xfs_dir_ialloc
            |          |          |          |          xfs_create
            |          |          |          |          xfs_vn_mknod
            |          |          |          |          xfs_vn_mkdir
            |          |          |          |          vfs_mkdir
            |          |          |          |          sys_mkdirat
            |          |          |          |          sys_mkdir
            |          |          |          |          system_call_fastpath
            |          |          |          |          __GI___mkdir
            |          |          |          |
            |          |          |           --50.00%-- kmem_cache_free
            |          |          |                      xfs_buf_get
            |          |          |                      xfs_buf_read
            |          |          |                      xfs_trans_read_buf
            |          |          |                      xfs_btree_read_buf_b
            |          |          |                      xfs_btree_lookup_get
            |          |          |                      xfs_btree_lookup
            |          |          |                      xfs_dialloc
            |          |          |                      xfs_ialloc
            |          |          |                      xfs_dir_ialloc
            |          |          |                      xfs_create
            |          |          |                      xfs_vn_mknod
            |          |          |                      xfs_vn_mkdir
            |          |          |                      vfs_mkdir
            |          |          |                      sys_mkdirat
            |          |          |                      sys_mkdir
            |          |          |                      system_call_fastpath
            |          |          |                      __GI___mkdir
            |          |          |
            |          |          |--14.29%-- dput
            |          |          |          path_put
            |          |          |          link_path_walk
            |          |          |          path_walk
            |          |          |          do_path_lookup
            |          |          |          user_path_at
            |          |          |          vfs_fstatat
            |          |          |          vfs_stat
            |          |          |          sys_newstat
            |          |          |          system_call_fastpath
            |          |          |          _xstat
            |          |          |
            |          |          |--14.29%-- add_to_page_cache_locked
            |          |          |          add_to_page_cache_lru
            |          |          |          grab_cache_page_write_begin
            |          |          |          block_write_begin
            |          |          |          xfs_vm_write_begin
            |          |          |          generic_file_buffered_write
            |          |          |          xfs_write
            |          |          |          xfs_file_aio_write
            |          |          |          do_sync_write
            |          |          |          vfs_write
            |          |          |          sys_pwrite64
            |          |          |          system_call_fastpath
            |          |          |          __GI_pwrite
            :
* Re: Nick's vfs-scalability patches ported to 2.6.33-rt
  2010-03-12  4:41 ` Dave Chinner

From: Dave Chinner
To: john stultz
Cc: Christoph Hellwig, Nick Piggin, Thomas Gleixner, lkml, Clark Williams, John Kacur

On Thu, Mar 11, 2010 at 07:08:32PM -0800, john stultz wrote:
> On Wed, 2010-03-10 at 04:01 -0500, Christoph Hellwig wrote:
> > On Tue, Mar 09, 2010 at 06:51:02PM -0800, john stultz wrote:
> > > So this all means that with Nick's patch set, we're no longer getting
> > > bogged down in the vfs (at least at 8-way) at all. All the contention
> > > is in the actual filesystem (ext2 in group_adjust_blocks, and ext3 in
> > > the journal and block allocation code).
> >
> > Can you check if you're running into any fs scaling limit with xfs?
>
> Here are the charts from some limited testing:
> http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/xfs-dbench.png

What's the X-axis? Number of clients?

If so, I have previously tested XFS to make sure throughput is flat out
to about 1000 clients, not 8. i.e. I'm not interested in peak
throughput from dbench (generally a meaningless number); I'm much more
interested in sustaining that throughput under the sorts of loads a
real fileserver would see...

> They're not great. And compared to ext3, the results are basically
> flat:
> http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ext3-dbench.png
>
> Now, I've not done any real xfs work before, so if there is any tuning
> needed for dbench, please let me know.

Dbench does lots of transactions, which runs XFS into being log IO
bound. Make sure you have at least a 128MB log and are using
lazy-count=1, and perhaps even the logbsize=262144 mount option. But in
general it only takes 2-4 clients to reach maximum throughput on
XFS....

> The odd bit is that perf doesn't show huge overheads in the xfs runs.
> The spinlock contention is supposedly under 5%, so I'm not sure what's
> causing the numbers to be so bad.

It's bound by sleeping locks or IO. Call-graph based profiles triggered
on context switches are the easiest way to find the contending lock.

Last time I did this (around 2.6.16, IIRC) it involved patching the
kernel to put the sample point in the context switch code - can we do
that now without patching the kernel?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Nick's vfs-scalability patches ported to 2.6.33-rt
  2010-03-15 16:15 ` Nick Piggin

From: Nick Piggin
To: Dave Chinner
Cc: john stultz, Christoph Hellwig, Thomas Gleixner, lkml, Clark Williams, John Kacur

On Fri, Mar 12, 2010 at 03:41:12PM +1100, Dave Chinner wrote:
> On Thu, Mar 11, 2010 at 07:08:32PM -0800, john stultz wrote:
> > On Wed, 2010-03-10 at 04:01 -0500, Christoph Hellwig wrote:
> > > On Tue, Mar 09, 2010 at 06:51:02PM -0800, john stultz wrote:
> > > > So this all means that with Nick's patch set, we're no longer getting
> > > > bogged down in the vfs (at least at 8-way) at all. All the contention
> > > > is in the actual filesystem (ext2 in group_adjust_blocks, and ext3 in
> > > > the journal and block allocation code).
> > >
> > > Can you check if you're running into any fs scaling limit with xfs?
> >
> > Here are the charts from some limited testing:
> > http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/xfs-dbench.png
>
> What's the X-axis? Number of clients?

Yes, I think so (either it's dbench clients, or CPUs).

> If so, I have previously tested XFS to make sure throughput is flat
> out to about 1000 clients, not 8. i.e. I'm not interested in peak
> throughput from dbench (generally a meaningless number); I'm much
> more interested in sustaining that throughput under the sorts of
> loads a real fileserver would see...

dbench is simply one that is known bad for core vfs locks. If it is
run on top of tmpfs it gives relatively stable numbers, and on a real
filesystem on ramdisk it works OK too. Not sure if John was running it
on a ramdisk though. It does emulate the syscall pattern coming from
samba running a netbench test, so it's not _totally_ meaningless :)

In this case, we're mostly interested in it to see if there are
contended locks or cachelines left.

> > They're not great.
> > And compared to ext3, the results are basically flat:
> > http://sr71.net/~jstultz/dbench-scalability/graphs/2.6.33/ext3-dbench.png
> >
> > Now, I've not done any real xfs work before, so if there is any tuning
> > needed for dbench, please let me know.
>
> Dbench does lots of transactions, which runs XFS into being log IO
> bound. Make sure you have at least a 128MB log and are using
> lazy-count=1, and perhaps even the logbsize=262144 mount option. But
> in general it only takes 2-4 clients to reach maximum throughput on
> XFS....
>
> > The odd bit is that perf doesn't show huge overheads in the xfs runs.
> > The spinlock contention is supposedly under 5%, so I'm not sure what's
> > causing the numbers to be so bad.
>
> It's bound by sleeping locks or IO. Call-graph based profiles
> triggered on context switches are the easiest way to find the
> contending lock.
>
> Last time I did this (around 2.6.16, IIRC) it involved patching the
> kernel to put the sample point in the context switch code - can we
> do that now without patching the kernel?

Lock profiling can track sleeping locks, and profile=schedule and
profile=sleep still work OK too. Don't know if any useful tracing
stuff is there for locks yet.