* Re: [LKP] [fs/locks] 83b381078b: will-it-scale.per_thread_ops -62.5% regression
  From: J. Bruce Fields
  Date: 2018-11-27 17:43 UTC
  In-reply-to: <20181127060102.GF6163@shao2-debian>
  To: kernel test robot
  Cc: NeilBrown, Jeff Layton, LKML, Jeff Layton, lkp, linux-nfs

Thanks for the report!

On Tue, Nov 27, 2018 at 02:01:02PM +0800, kernel test robot wrote:
> FYI, we noticed a -62.5% regression of will-it-scale.per_thread_ops due to commit:
>
> commit: 83b381078b5ecab098ebf6bc9548bb32af1dbf31 ("fs/locks: always delete_block after waiting.")
> https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git locks-next
>
> in testcase: will-it-scale
> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
> with following parameters:
>
> 	nr_task: 16
> 	mode: thread
> 	test: lock1

So I guess it's doing this, uncontended file lock/unlock?:

    https://github.com/antonblanchard/will-it-scale/blob/master/tests/lock1.c

Each thread is repeatedly locking and unlocking a file that is only used
by that thread.

By the way, what's the X-axis on these graphs?  (Or the y-axis, for that
matter?)

--b.

> will-it-scale.per_thread_ops
>
>   [ASCII plot: bisect-good (+) samples roughly 350000-420000; bisect-bad (O) samples roughly 130000]
>
> will-it-scale.workload
>
>   [ASCII plot: bisect-good (+) samples roughly 6e+06; bisect-bad (O) samples roughly 2e+06-2.5e+06]
>
> will-it-scale.time.user_time
>
>   [ASCII plot: bisect-good (+) samples roughly 200; bisect-bad (O) samples roughly 60]
>
> will-it-scale.time.system_time
>
>   [ASCII plot: bisect-bad (O) samples steady around 4500]
>
> [*] bisect-good sample
> [O] bisect-bad sample
* Re: [LKP] [fs/locks] 83b381078b: will-it-scale.per_thread_ops -62.5% regression
  From: NeilBrown
  Date: 2018-11-27 23:20 UTC
  To: J. Bruce Fields, kernel test robot
  Cc: Jeff Layton, LKML, Jeff Layton, lkp, linux-nfs

On Tue, Nov 27 2018, J. Bruce Fields wrote:
> Thanks for the report!

Yes, thanks.  I thought I had replied to the previous report of a
similar problem, but I didn't actually send that email - oops.

Though the test is the same and the regression similar, this is a
different patch.  The previous report identified

    fs/locks: allow a lock request to block other requests

this one identifies

    fs/locks: always delete_block after waiting.

Both cause blocked_lock_lock to be taken more often.

In one case it is due to locks_move_blocks().  That can probably be
optimised to skip the lock if list_empty(&fl->fl_blocked_requests).
I'd need to double-check, but I think that is safe to check without
locking.

This one causes locks_delete_block() to be called more often.  We now
call it even if no waiting happened at all.  I suspect we can test for
that and avoid it.

I'll have a look.

> On Tue, Nov 27, 2018 at 02:01:02PM +0800, kernel test robot wrote:
>> FYI, we noticed a -62.5% regression of will-it-scale.per_thread_ops due to commit:
>>
>> commit: 83b381078b5ecab098ebf6bc9548bb32af1dbf31 ("fs/locks: always delete_block after waiting.")
>> https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git locks-next
>>
>> in testcase: will-it-scale
>> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
>> with following parameters:
>>
>> 	nr_task: 16
>> 	mode: thread
>> 	test: lock1
>
> So I guess it's doing this, uncontended file lock/unlock?:
>
>     https://github.com/antonblanchard/will-it-scale/blob/master/tests/lock1.c
>
> Each thread is repeatedly locking and unlocking a file that is only used
> by that thread.

Thanks for identifying that, Bruce.  This would certainly be a case
where locks_delete_block() is now being called when it wasn't before.

> By the way, what's the X-axis on these graphs?  (Or the y-axis, for that
> matter?)

A key would help.  I think the X-axis is number-of-threads.  The y-axis
might be ops-per-second??

Thanks,
NeilBrown

>> [quoted will-it-scale plots trimmed - see the parent message]
>>
>> [*] bisect-good sample
>> [O] bisect-bad sample
* [PATCH] locks: fix performance regressions
  From: NeilBrown
  Date: 2018-11-28 0:53 UTC
  To: J. Bruce Fields, kernel test robot, Jeff Layton
  Cc: LKML, Jeff Layton, lkp, linux-nfs

The kernel test robot reported two performance regressions caused by
recent patches.  Both appear to be related to the global spinlock
blocked_lock_lock being taken more often.

This patch avoids taking that lock in the cases tested.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---

Hi Jeff,
 you might like to merge these back into the patches that introduced
the problem.
Or you might like me to re-send the series with these merged in,
in which case, please ask.

And a BIG thank-you to the kernel-test-robot team!!

Thanks,
NeilBrown

 fs/locks.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index f456cd3d9d50..67519a43e27a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -444,6 +444,13 @@ static void locks_move_blocks(struct file_lock *new, struct file_lock *fl)
 {
 	struct file_lock *f;
 
+	/*
+	 * As ctx->flc_lock is held, new requests cannot be added to
+	 * ->fl_blocked_requests, so we don't need a lock to check if it
+	 * is empty.
+	 */
+	if (list_empty(&fl->fl_blocked_requests))
+		return;
 	spin_lock(&blocked_lock_lock);
 	list_splice_init(&fl->fl_blocked_requests, &new->fl_blocked_requests);
 	list_for_each_entry(f, &fl->fl_blocked_requests, fl_blocked_member)
@@ -749,6 +756,20 @@ int locks_delete_block(struct file_lock *waiter)
 {
 	int status = -ENOENT;
 
+	/*
+	 * If fl_blocker is NULL, it won't be set again as this thread
+	 * "owns" the lock and is the only one that might try to claim
+	 * the lock.  So it is safe to test fl_blocker locklessly.
+	 * Also if fl_blocker is NULL, this waiter is not listed on
+	 * fl_blocked_requests for some lock, so no other request can
+	 * be added to the list of fl_blocked_requests for this
+	 * request.  So if fl_blocker is NULL, it is safe to
+	 * locklessly check if fl_blocked_requests is empty.  If both
+	 * of these checks succeed, there is no need to take the lock.
+	 */
+	if (waiter->fl_blocker == NULL &&
+	    list_empty(&waiter->fl_blocked_requests))
+		return status;
 	spin_lock(&blocked_lock_lock);
 	if (waiter->fl_blocker)
 		status = 0;
-- 
2.14.0.rc0.dirty
* Re: [PATCH] locks: fix performance regressions
  From: kernel test robot
  Date: 2018-11-28 9:17 UTC
  To: NeilBrown
  Cc: J. Bruce Fields, Jeff Layton, LKML, Jeff Layton, lkp, linux-nfs

Hi,

On Wed, Nov 28, 2018 at 11:53:48AM +1100, NeilBrown wrote:
>
> The kernel test robot reported two performance regressions
> caused by recent patches.
> Both appear to related to the global spinlock blocked_lock_lock
> being taken more often.
>
> This patch avoids taking that lock in the cases tested.
>
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> [quoted patch trimmed - see the parent message]

FYI, the performance has recovered; we didn't find any regression
between the two commits.
commit:
  48a7a13ff3 ("locks: use properly initialized file_lock when unlocking.")
  8f64e497be ("locks: fix performance regressions.")

48a7a13ff31f0728           8f64e497be9929a2d5904c39c4
----------------           --------------------------
     %stddev    change          %stddev
         \        |                 \
    33.56 ±  3%      5%      35.30        boot-time.boot
    10497 ±  3%     12%      11733 ±  4%  proc-vmstat.nr_shmem
    67392                    68449        proc-vmstat.nr_zone_active_anon
    67392                    68449        proc-vmstat.nr_active_anon
    16303                    16206        proc-vmstat.nr_slab_reclaimable
    30602                    29921        proc-vmstat.nr_slab_unreclaimable
        0        9e+03        9009 ± 80%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
        0        6e+03        5837 ±139%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
      149 ± 17%  5e+03        5457 ±137%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_getattr.__nfs_revalidate_inode.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat
      175 ± 29%  4e+03        3807 ±136%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.__lookup_slow.lookup_slow.walk_component.path_lookupat.filename_lookup
    52868 ±110% -4e+04       17482 ±  4%  latency_stats.avg.max
    45055 ±141% -5e+04           0        latency_stats.avg.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      227 ± 10%  1e+04        9907 ±136%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_getattr.__nfs_revalidate_inode.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat
        0        9e+03        9367 ± 78%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
        0        6e+03        5837 ±139%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
      175 ± 29%  4e+03        3807 ±136%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.__lookup_slow.lookup_slow.walk_component.path_lookupat.filename_lookup
    98043 ±124% -8e+04       20999 ± 27%  latency_stats.max.max
    90059 ±141% -9e+04           0        latency_stats.max.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     1251 ± 23%  5e+04       49005 ±137%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_getattr.__nfs_revalidate_inode.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat
        0        1e+04       12061 ± 70%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
        0        6e+03        5837 ±139%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
      175 ± 29%  4e+03        3807 ±136%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.__lookup_slow.lookup_slow.walk_component.path_lookupat.filename_lookup
    90111 ±141% -9e+04           0        latency_stats.sum.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe

Best Regards,
Rong Chen
* Re: [PATCH] locks: fix performance regressions
  From: Jeff Layton
  Date: 2018-11-28 11:37 UTC
  To: NeilBrown, J. Bruce Fields, kernel test robot
  Cc: LKML, lkp, linux-nfs

On Wed, 2018-11-28 at 11:53 +1100, NeilBrown wrote:
> The kernel test robot reported two performance regressions
> caused by recent patches.
> Both appear to related to the global spinlock blocked_lock_lock
> being taken more often.
>
> This patch avoids taking that lock in the cases tested.
>
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>
> Hi Jeff,
> you might like to merge these back into the patches that introduced
> the problem.
> Or you might like me to re-send the series with these merged in,
> in which case, please ask.

Thanks Neil,

This looks great.  I'll go ahead and toss this patch on top of the pile
in linux-next for now.

Would you mind resending the series with this patch merged in?  I took a
quick stab at squashing it into the earlier patch, but there is some
churn in this area.

Maybe you can also turn that Reported-by: into a Tested-by: in the
changelog afterward?

> And a BIG thank-you to the kernel-test-robot team!!

Absolutely!  We love you guys!

> [quoted patch trimmed - see the parent message]

-- 
Jeff Layton <jlayton@redhat.com>