* blkcg: circular dependency between blkcg_pol_mutex and s_active
@ 2014-05-02 15:56 Sasha Levin
[not found] ` <5363C04B.4010400-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Sasha Levin @ 2014-05-02 15:56 UTC (permalink / raw)
To: Tejun Heo, axboe, Li Zefan; +Cc: LKML, Dave Jones, cgroups
Hi all,
While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel I've stumbled on the following:
[ 1585.219328] ======================================================
[ 1585.220035] [ INFO: possible circular locking dependency detected ]
[ 1585.220035] 3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429 Tainted: G W
[ 1585.220035] -------------------------------------------------------
[ 1585.220035] trinity-c173/9024 is trying to acquire lock:
[ 1585.220035] (blkcg_pol_mutex){+.+.+.}, at: blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
[ 1585.220035]
[ 1585.220035] but task is already holding lock:
[ 1585.220035] (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283)
[ 1585.220035]
[ 1585.220035] which lock already depends on the new lock.
[ 1585.220035]
[ 1585.220035]
[ 1585.220035] the existing dependency chain (in reverse order) is:
[ 1585.220035]
-> #2 (s_active#89){++++.+}:
[ 1585.220035] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 1585.220035] __kernfs_remove (arch/x86/include/asm/atomic.h:27 fs/kernfs/dir.c:352 fs/kernfs/dir.c:1024)
[ 1585.220035] kernfs_remove_by_name_ns (fs/kernfs/dir.c:1219)
[ 1585.220035] cgroup_addrm_files (include/linux/kernfs.h:427 kernel/cgroup.c:1074 kernel/cgroup.c:2899)
[ 1585.220035] cgroup_clear_dir (kernel/cgroup.c:1092 (discriminator 2))
[ 1585.240173] rebind_subsystems (kernel/cgroup.c:1144)
[ 1585.240173] cgroup_setup_root (kernel/cgroup.c:1568)
[ 1585.240173] cgroup_mount (kernel/cgroup.c:1716)
[ 1585.240173] mount_fs (fs/super.c:1094)
[ 1585.240173] vfs_kern_mount (fs/namespace.c:899)
[ 1585.240173] do_mount (fs/namespace.c:2238 fs/namespace.c:2561)
[ 1585.240173] SyS_mount (fs/namespace.c:2758 fs/namespace.c:2729)
[ 1585.240173] tracesys (arch/x86/kernel/entry_64.S:746)
[ 1585.240173]
-> #1 (cgroup_tree_mutex){+.+.+.}:
[ 1585.240173] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 1585.240173] mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
[ 1585.240173] cgroup_add_cftypes (include/linux/list.h:76 kernel/cgroup.c:3040)
[ 1585.240173] blkcg_policy_register (block/blk-cgroup.c:1106)
[ 1585.240173] throtl_init (block/blk-throttle.c:1694)
[ 1585.240173] do_one_initcall (init/main.c:789)
[ 1585.240173] kernel_init_freeable (init/main.c:854 init/main.c:863 init/main.c:882 init/main.c:1003)
[ 1585.240173] kernel_init (init/main.c:935)
[ 1585.240173] ret_from_fork (arch/x86/kernel/entry_64.S:552)
[ 1585.240173]
-> #0 (blkcg_pol_mutex){+.+.+.}:
[ 1585.240173] __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 1585.240173] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 1585.240173] mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
[ 1585.240173] blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
[ 1585.240173] cgroup_file_write (kernel/cgroup.c:2714)
[ 1585.240173] kernfs_fop_write (fs/kernfs/file.c:295)
[ 1585.240173] vfs_write (fs/read_write.c:532)
[ 1585.240173] SyS_write (fs/read_write.c:584 fs/read_write.c:576)
[ 1585.240173] tracesys (arch/x86/kernel/entry_64.S:746)
[ 1585.240173]
[ 1585.240173] other info that might help us debug this:
[ 1585.240173]
[ 1585.240173] Chain exists of:
blkcg_pol_mutex --> cgroup_tree_mutex --> s_active#89
[ 1585.240173] Possible unsafe locking scenario:
[ 1585.240173]
[ 1585.240173] CPU0 CPU1
[ 1585.240173] ---- ----
[ 1585.240173] lock(s_active#89);
[ 1585.240173] lock(cgroup_tree_mutex);
[ 1585.240173] lock(s_active#89);
[ 1585.240173] lock(blkcg_pol_mutex);
[ 1585.240173]
[ 1585.240173] *** DEADLOCK ***
[ 1585.240173]
[ 1585.240173] 4 locks held by trinity-c173/9024:
[ 1585.240173] #0: (&f->f_pos_lock){+.+.+.}, at: __fdget_pos (fs/file.c:714)
[ 1585.240173] #1: (sb_writers#18){.+.+.+}, at: vfs_write (include/linux/fs.h:2255 fs/read_write.c:530)
[ 1585.240173] #2: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:283)
[ 1585.240173] #3: (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283)
[ 1585.240173]
[ 1585.240173] stack backtrace:
[ 1585.240173] CPU: 3 PID: 9024 Comm: trinity-c173 Tainted: G W 3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429
[ 1585.240173] ffffffff919687b0 ffff8805f6373bb8 ffffffff8e52cdbb 0000000000000002
[ 1585.240173] ffffffff919d8400 ffff8805f6373c08 ffffffff8e51fb88 0000000000000004
[ 1585.240173] ffff8805f6373c98 ffff8805f6373c08 ffff88061be70d98 ffff88061be70dd0
[ 1585.240173] Call Trace:
[ 1585.240173] dump_stack (lib/dump_stack.c:52)
[ 1585.240173] print_circular_bug (kernel/locking/lockdep.c:1216)
[ 1585.240173] __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 1585.240173] ? sched_clock (arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305)
[ 1585.240173] ? sched_clock_local (kernel/sched/clock.c:214)
[ 1585.240173] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 1585.240173] ? blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
[ 1585.240173] mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
[ 1585.240173] ? blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
[ 1585.240173] ? get_parent_ip (kernel/sched/core.c:2485)
[ 1585.240173] ? get_parent_ip (kernel/sched/core.c:2485)
[ 1585.240173] ? blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
[ 1585.240173] ? preempt_count_sub (kernel/sched/core.c:2541)
[ 1585.240173] blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
[ 1585.240173] cgroup_file_write (kernel/cgroup.c:2714)
[ 1585.240173] ? cgroup_file_write (kernel/cgroup.c:2692)
[ 1585.240173] kernfs_fop_write (fs/kernfs/file.c:295)
[ 1585.240173] vfs_write (fs/read_write.c:532)
[ 1585.240173] SyS_write (fs/read_write.c:584 fs/read_write.c:576)
Thanks,
Sasha
^ permalink raw reply [flat|nested] 6+ messages in thread[parent not found: <5363C04B.4010400-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* Re: blkcg: circular dependency between blkcg_pol_mutex and s_active [not found] ` <5363C04B.4010400-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2014-05-04 19:26 ` Tejun Heo 2014-05-05 16:37 ` [PATCH cgroup/for-3.15-fixes] blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats() Tejun Heo 1 sibling, 0 replies; 6+ messages in thread From: Tejun Heo @ 2014-05-04 19:26 UTC (permalink / raw) To: Sasha Levin Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, Li Zefan, LKML, Dave Jones, cgroups-u79uwXL29TY76Z2rM5mHXA Hello, On Fri, May 02, 2014 at 11:56:59AM -0400, Sasha Levin wrote: > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel I've stumbled on the following: This is a highly unlikely deadlock which got introduced by recent changes in cgroup core. I'm working to further simplify locking in cgroup core. Will prep a quick fix for this issue soon. Thanks. -- tejun ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH cgroup/for-3.15-fixes] blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats() [not found] ` <5363C04B.4010400-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 2014-05-04 19:26 ` Tejun Heo @ 2014-05-05 16:37 ` Tejun Heo 2014-05-05 16:38 ` Tejun Heo 2014-05-05 17:47 ` Sasha Levin 1 sibling, 2 replies; 6+ messages in thread From: Tejun Heo @ 2014-05-05 16:37 UTC (permalink / raw) To: Li Zefan Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, Li Zefan, Sasha Levin, LKML, Dave Jones, cgroups-u79uwXL29TY76Z2rM5mHXA During the recent conversion of cgroup to kernfs, cgroup_tree_mutex which nests above both the kernfs s_active protection and cgroup_mutex is added to synchronize cgroup file type operations as cgroup_mutex needed to be grabbed from some file operations and thus can't be put above s_active protection. While this arrangement mostly worked for cgroup, this triggered the following lockdep warning. ====================================================== [ INFO: possible circular locking dependency detected ] 3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429 Tainted: G W ------------------------------------------------------- trinity-c173/9024 is trying to acquire lock: (blkcg_pol_mutex){+.+.+.}, at: blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455) but task is already holding lock: (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283) which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (s_active#89){++++.+}: lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) __kernfs_remove (arch/x86/include/asm/atomic.h:27 fs/kernfs/dir.c:352 fs/kernfs/dir.c:1024) kernfs_remove_by_name_ns (fs/kernfs/dir.c:1219) cgroup_addrm_files (include/linux/kernfs.h:427 kernel/cgroup.c:1074 kernel/cgroup.c:2899) cgroup_clear_dir (kernel/cgroup.c:1092 (discriminator 2)) rebind_subsystems (kernel/cgroup.c:1144) cgroup_setup_root (kernel/cgroup.c:1568) cgroup_mount (kernel/cgroup.c:1716) mount_fs (fs/super.c:1094) vfs_kern_mount (fs/namespace.c:899) do_mount (fs/namespace.c:2238 fs/namespace.c:2561) SyS_mount (fs/namespace.c:2758 fs/namespace.c:2729) tracesys (arch/x86/kernel/entry_64.S:746) -> #1 (cgroup_tree_mutex){+.+.+.}: lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587) cgroup_add_cftypes (include/linux/list.h:76 kernel/cgroup.c:3040) blkcg_policy_register (block/blk-cgroup.c:1106) throtl_init (block/blk-throttle.c:1694) do_one_initcall (init/main.c:789) kernel_init_freeable (init/main.c:854 init/main.c:863 init/main.c:882 init/main.c:1003) kernel_init (init/main.c:935) ret_from_fork (arch/x86/kernel/entry_64.S:552) -> #0 (blkcg_pol_mutex){+.+.+.}: __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587) blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455) cgroup_file_write (kernel/cgroup.c:2714) kernfs_fop_write (fs/kernfs/file.c:295) vfs_write (fs/read_write.c:532) SyS_write (fs/read_write.c:584 fs/read_write.c:576) tracesys (arch/x86/kernel/entry_64.S:746) other info that might help us debug this: Chain exists of: blkcg_pol_mutex --> cgroup_tree_mutex --> s_active#89 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#89); lock(cgroup_tree_mutex); lock(s_active#89); lock(blkcg_pol_mutex); *** DEADLOCK *** 4 locks held by trinity-c173/9024: #0: (&f->f_pos_lock){+.+.+.}, at: __fdget_pos (fs/file.c:714) #1: (sb_writers#18){.+.+.+}, at: vfs_write (include/linux/fs.h:2255 fs/read_write.c:530) #2: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:283) #3: (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283) stack backtrace: CPU: 3 PID: 9024 Comm: trinity-c173 Tainted: G W 3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429 ffffffff919687b0 ffff8805f6373bb8 ffffffff8e52cdbb 0000000000000002 ffffffff919d8400 ffff8805f6373c08 ffffffff8e51fb88 0000000000000004 ffff8805f6373c98 ffff8805f6373c08 ffff88061be70d98 ffff88061be70dd0 Call Trace: dump_stack (lib/dump_stack.c:52) print_circular_bug (kernel/locking/lockdep.c:1216) __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587) blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455) cgroup_file_write (kernel/cgroup.c:2714) kernfs_fop_write (fs/kernfs/file.c:295) vfs_write (fs/read_write.c:532) SyS_write (fs/read_write.c:584 fs/read_write.c:576) This is a highly unlikely but valid circular dependency between "echo 1 > blkcg.reset_stats" and cfq module [un]loading. cgroup is going through further locking update which will remove this complication but for now let's use trylock on blkcg_pol_mutex and retry the file operation if the trylock fails. Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> Reported-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> References: http://lkml.kernel.org/g/5363C04B.4010400-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org --- block/blk-cgroup.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index e4a4145..1039fb9 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -451,7 +451,20 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css, struct blkcg_gq *blkg; int i; - mutex_lock(&blkcg_pol_mutex); + /* + * XXX: We invoke cgroup_add/rm_cftypes() under blkcg_pol_mutex + * which ends up putting cgroup's internal cgroup_tree_mutex under + * it; however, cgroup_tree_mutex is nested above cgroup file + * active protection and grabbing blkcg_pol_mutex from a cgroup + * file operation creates a possible circular dependency. cgroup + * internal locking is planned to go through further simplification + * and this issue should go away soon. For now, let's trylock + * blkcg_pol_mutex and restart the write on failure. + * + * http://lkml.kernel.org/g/5363C04B.4010400-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org + */ + if (!mutex_trylock(&blkcg_pol_mutex)) + return restart_syscall(); spin_lock_irq(&blkcg->lock); /* ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH cgroup/for-3.15-fixes] blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats() 2014-05-05 16:37 ` [PATCH cgroup/for-3.15-fixes] blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats() Tejun Heo @ 2014-05-05 16:38 ` Tejun Heo 2014-05-05 17:47 ` Sasha Levin 1 sibling, 0 replies; 6+ messages in thread From: Tejun Heo @ 2014-05-05 16:38 UTC (permalink / raw) To: Li Zefan; +Cc: axboe, Sasha Levin, LKML, Dave Jones, cgroups On Mon, May 05, 2014 at 12:37:30PM -0400, Tejun Heo wrote: > During the recent conversion of cgroup to kernfs, cgroup_tree_mutex > which nests above both the kernfs s_active protection and cgroup_mutex > is added to synchronize cgroup file type operations as cgroup_mutex > needed to be grabbed from some file operations and thus can't be put > above s_active protection. Ooh, I verified that the patch removes the lockdep warning and am routing it through cgroup/for-3.15-fixes. If there's any objection, please let me know. Thanks! -- tejun ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH cgroup/for-3.15-fixes] blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats() 2014-05-05 16:37 ` [PATCH cgroup/for-3.15-fixes] blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats() Tejun Heo 2014-05-05 16:38 ` Tejun Heo @ 2014-05-05 17:47 ` Sasha Levin 2014-05-05 17:48 ` Tejun Heo 1 sibling, 1 reply; 6+ messages in thread From: Sasha Levin @ 2014-05-05 17:47 UTC (permalink / raw) To: Tejun Heo, Li Zefan; +Cc: axboe, LKML, Dave Jones, cgroups Hey Tejun, On 05/05/2014 12:37 PM, Tejun Heo wrote: > Signed-off-by: Tejun Heo <tj@kernel.org> > Reported-by: Li Zefan <lizefan@huawei.com> > References: http://lkml.kernel.org/g/5363C04B.4010400@oracle.com I think that the Reported-by here should be me :) Thanks, Sasha ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH cgroup/for-3.15-fixes] blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats() 2014-05-05 17:47 ` Sasha Levin @ 2014-05-05 17:48 ` Tejun Heo 0 siblings, 0 replies; 6+ messages in thread From: Tejun Heo @ 2014-05-05 17:48 UTC (permalink / raw) To: Sasha Levin; +Cc: Li Zefan, axboe, LKML, Dave Jones, cgroups On Mon, May 05, 2014 at 01:47:10PM -0400, Sasha Levin wrote: > Hey Tejun, > > On 05/05/2014 12:37 PM, Tejun Heo wrote: > > Signed-off-by: Tejun Heo <tj@kernel.org> > > Reported-by: Li Zefan <lizefan@huawei.com> > > References: http://lkml.kernel.org/g/5363C04B.4010400@oracle.com > > I think that the Reported-by here should be me :) Oops, how did that happen? :) Fixing up in the branch. Thanks! -- tejun ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-05-05 17:48 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-02 15:56 blkcg: circular dependency between blkcg_pol_mutex and s_active Sasha Levin
[not found] ` <5363C04B.4010400-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-05-04 19:26 ` Tejun Heo
2014-05-05 16:37 ` [PATCH cgroup/for-3.15-fixes] blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats() Tejun Heo
2014-05-05 16:38 ` Tejun Heo
2014-05-05 17:47 ` Sasha Levin
2014-05-05 17:48 ` Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).