* VFS deadlock ?
  From: Dave Jones @ 2013-03-21 19:06 UTC
  To: Al Viro; +Cc: Linux Kernel, Linus Torvalds

Al, Linus,

 I've seen this happen a few times recently. I have a bunch of trinity
processes that run for a few hours, and then eventually all those
processes end up spinning waiting on each other. The stack traces always
seem to look like this. Does this look like a real kernel deadlock, or is
it more likely that trinity was unlucky enough to do something really
dumb that I need to protect it against?

I've only seen it happening in the last week or so (but that may be
because all the crap I ended up finding last week prevented me from
running long enough to hit this until now..)

Tree is Linus' latest with Peter Hurley's tty patchset on top (which
afair doesn't touch anything in these paths). lockdep hasn't picked up
anything, which makes me wonder if it's my problem..

	Dave

trinity-child2  D ffff880110b3a7e0  5448  7669      1 0x00000004
 ffff8801272a7d20 0000000000000086 ffff880101874920 ffff8801272a7fd8
 ffff8801272a7fd8 ffff8801272a7fd8 ffff880128ca8000 ffff880101874920
 ffff8801039a0250 0000000000000246 ffff880101874920 ffff8801272a7fd8
Call Trace:
 [<ffffffff816c2fd3>] schedule_preempt_disabled+0x33/0x80
 [<ffffffff816bfcc1>] mutex_lock_nested+0x1a1/0x3d0
 [<ffffffff811c7034>] ? lock_rename+0x114/0x120
 [<ffffffff811c7034>] ? lock_rename+0x114/0x120
 [<ffffffff811c7034>] lock_rename+0x114/0x120
 [<ffffffff811cd7a7>] sys_renameat+0x1f7/0x3b0
 [<ffffffff810b33a2>] ? get_lock_stats+0x22/0x70
 [<ffffffff810b6b95>] ? trace_hardirqs_on_caller+0x115/0x1a0
 [<ffffffff810b6b95>] ? trace_hardirqs_on_caller+0x115/0x1a0
 [<ffffffff8134a24e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816cd102>] system_call_fastpath+0x16/0x1b

trinity-child1  D ffff880110b3a7e0  5200  7901      1 0x00000004
 ffff8801015ddc50 0000000000000086 ffff880123fca490 ffff8801015ddfd8
 ffff8801015ddfd8 ffff8801015ddfd8 ffff880128cac920 ffff880123fca490
 ffff8801015dde60 0000000000000246 ffff880123fca490 ffff8801015ddfd8
Call Trace:
 [<ffffffff816c2fd3>] schedule_preempt_disabled+0x33/0x80
 [<ffffffff816bfcc1>] mutex_lock_nested+0x1a1/0x3d0
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
 [<ffffffff811c805a>] ? lookup_fast+0x16a/0x2d0
 [<ffffffff811c9d7c>] path_lookupat+0x79c/0x7c0
 [<ffffffff8119fa87>] ? kmem_cache_alloc+0xf7/0x340
 [<ffffffff8100a144>] ? native_sched_clock+0x24/0x80
 [<ffffffff811c826a>] ? getname_flags+0x5a/0x1a0
 [<ffffffff811c9dd4>] filename_lookup+0x34/0xc0
 [<ffffffff811ccd23>] user_path_at_empty+0x63/0xa0
 [<ffffffff8100a144>] ? native_sched_clock+0x24/0x80
 [<ffffffff810b3288>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff816cd127>] ? sysret_check+0x1b/0x56
 [<ffffffff811ccd71>] user_path_at+0x11/0x20
 [<ffffffff811bb538>] sys_fchmodat+0x38/0xa0
 [<ffffffff8134a24e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff811bb5b9>] sys_chmod+0x19/0x20
 [<ffffffff816cd102>] system_call_fastpath+0x16/0x1b

trinity-child1  D ffff880110b3a7e0  5528  8651  29705 0x00000004
 ffff8801187cfc60 0000000000000082 ffff880126a28000 ffff8801187cffd8
 ffff8801187cffd8 ffff8801187cffd8 ffffffff81c10440 ffff880126a28000
 ffff8801187cfe70 0000000000000246 ffff880126a28000 ffff8801187cffd8
Call Trace:
 [<ffffffff816c2fd3>] schedule_preempt_disabled+0x33/0x80
 [<ffffffff816bfcc1>] mutex_lock_nested+0x1a1/0x3d0
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
 [<ffffffff811c805a>] ? lookup_fast+0x16a/0x2d0
 [<ffffffff811c9d7c>] path_lookupat+0x79c/0x7c0
 [<ffffffff8119fa87>] ? kmem_cache_alloc+0xf7/0x340
 [<ffffffff810b3288>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff811c826a>] ? getname_flags+0x5a/0x1a0
 [<ffffffff811c9dd4>] filename_lookup+0x34/0xc0
 [<ffffffff811ccd23>] user_path_at_empty+0x63/0xa0
 [<ffffffff810b3288>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff816cd127>] ? sysret_check+0x1b/0x56
 [<ffffffff811ccd71>] user_path_at+0x11/0x20
 [<ffffffff811e4904>] sys_removexattr+0x34/0xb0
 [<ffffffff816cd102>] system_call_fastpath+0x16/0x1b

trinity-child3  D ffff880110b3a7e0  5312  9067      1 0x00000004
 ffff880100d29c60 0000000000000082 ffff8801271a0000 ffff880100d29fd8
 ffff880100d29fd8 ffff880100d29fd8 ffff880128cac920 ffff8801271a0000
 ffff880100d29e70 0000000000000246 ffff8801271a0000 ffff880100d29fd8
Call Trace:
 [<ffffffff816c2fd3>] schedule_preempt_disabled+0x33/0x80
 [<ffffffff816bfcc1>] mutex_lock_nested+0x1a1/0x3d0
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
 [<ffffffff811c805a>] ? lookup_fast+0x16a/0x2d0
 [<ffffffff811c9d7c>] path_lookupat+0x79c/0x7c0
 [<ffffffff8119fa87>] ? kmem_cache_alloc+0xf7/0x340
 [<ffffffff811c826a>] ? getname_flags+0x5a/0x1a0
 [<ffffffff811c9dd4>] filename_lookup+0x34/0xc0
 [<ffffffff811ccd23>] user_path_at_empty+0x63/0xa0
 [<ffffffff810b3ace>] ? put_lock_stats.isra.27+0xe/0x40
 [<ffffffff810b6c2d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff816c4787>] ? _raw_spin_unlock_irq+0x37/0x60
 [<ffffffff816cd127>] ? sysret_check+0x1b/0x56
 [<ffffffff811ccd71>] user_path_at+0x11/0x20
 [<ffffffff811bb538>] sys_fchmodat+0x38/0xa0
 [<ffffffff816cd102>] system_call_fastpath+0x16/0x1b

trinity-child0  D ffff880110b3a7e0  5320  9112      1 0x00000004
 ffff8801014fbc50 0000000000000086 ffff880123a94920 ffff8801014fbfd8
 ffff8801014fbfd8 ffff8801014fbfd8 ffffffff81c10440 ffff880123a94920
 ffff8801014fbe60 0000000000000246 ffff880123a94920 ffff8801014fbfd8
Call Trace:
 [<ffffffff816c2fd3>] schedule_preempt_disabled+0x33/0x80
 [<ffffffff816bfcc1>] mutex_lock_nested+0x1a1/0x3d0
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
 [<ffffffff811c805a>] ? lookup_fast+0x16a/0x2d0
 [<ffffffff811c9d7c>] path_lookupat+0x79c/0x7c0
 [<ffffffff8119fa87>] ? kmem_cache_alloc+0xf7/0x340
 [<ffffffff8100a144>] ? native_sched_clock+0x24/0x80
 [<ffffffff811c826a>] ? getname_flags+0x5a/0x1a0
 [<ffffffff811c9dd4>] filename_lookup+0x34/0xc0
 [<ffffffff811ccd23>] user_path_at_empty+0x63/0xa0
 [<ffffffff8100a144>] ? native_sched_clock+0x24/0x80
 [<ffffffff810b3288>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff816cd127>] ? sysret_check+0x1b/0x56
 [<ffffffff811ccd71>] user_path_at+0x11/0x20
 [<ffffffff811babb5>] do_sys_truncate+0x35/0x90
 [<ffffffff811bad7e>] sys_truncate+0xe/0x10
 [<ffffffff816cd102>] system_call_fastpath+0x16/0x1b

trinity-child3  D ffff880110b3a7e0  5304  9414  29705 0x00000004
 ffff880127b23c40 0000000000000082 ffff880128d38000 ffff880127b23fd8
 ffff880127b23fd8 ffff880127b23fd8 ffffffff81c10440 ffff880128d38000
 ffff880127b23e50 0000000000000246 ffff880128d38000 ffff880127b23fd8
Call Trace:
 [<ffffffff816c2fd3>] schedule_preempt_disabled+0x33/0x80
 [<ffffffff816bfcc1>] mutex_lock_nested+0x1a1/0x3d0
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
 [<ffffffff811c805a>] ? lookup_fast+0x16a/0x2d0
 [<ffffffff811c9d7c>] path_lookupat+0x79c/0x7c0
 [<ffffffff8119fa87>] ? kmem_cache_alloc+0xf7/0x340
 [<ffffffff811c826a>] ? getname_flags+0x5a/0x1a0
 [<ffffffff811c9dd4>] filename_lookup+0x34/0xc0
 [<ffffffff811ccd23>] user_path_at_empty+0x63/0xa0
 [<ffffffff8119fa87>] ? kmem_cache_alloc+0xf7/0x340
 [<ffffffff81078c32>] ? creds_are_invalid.part.8+0x12/0x50
 [<ffffffff81078f6a>] ? prepare_creds+0x3a/0x1f0
 [<ffffffff811ccd71>] user_path_at+0x11/0x20
 [<ffffffff811bb093>] sys_faccessat+0xa3/0x230
 [<ffffffff816cd102>] system_call_fastpath+0x16/0x1b

trinity-child2  D ffff880110b3a7e0  5704 10021  29705 0x00000004
 ffff8801228a7be0 0000000000000086 ffff880128c18000 ffff8801228a7fd8
 ffff8801228a7fd8 ffff8801228a7fd8 ffffffff81c10440 ffff880128c18000
 ffff8801228a7df0 0000000000000246 ffff880128c18000 ffff8801228a7fd8
Call Trace:
 [<ffffffff816c2fd3>] schedule_preempt_disabled+0x33/0x80
 [<ffffffff816bfcc1>] mutex_lock_nested+0x1a1/0x3d0
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
 [<ffffffff811c805a>] ? lookup_fast+0x16a/0x2d0
 [<ffffffff8100a144>] ? native_sched_clock+0x24/0x80
 [<ffffffff811c9d7c>] path_lookupat+0x79c/0x7c0
 [<ffffffff8119fa87>] ? kmem_cache_alloc+0xf7/0x340
 [<ffffffff811c826a>] ? getname_flags+0x5a/0x1a0
 [<ffffffff811c9dd4>] filename_lookup+0x34/0xc0
 [<ffffffff811ccd23>] user_path_at_empty+0x63/0xa0
 [<ffffffff8116fbdc>] ? might_fault+0x9c/0xb0
 [<ffffffff811ccd71>] user_path_at+0x11/0x20
 [<ffffffff811c1379>] vfs_fstatat+0x49/0x90
 [<ffffffff811c1917>] sys_newlstat+0x27/0x40
 [<ffffffff810b6b95>] ? trace_hardirqs_on_caller+0x115/0x1a0
 [<ffffffff8134a24e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816cd102>] system_call_fastpath+0x16/0x1b

trinity-child0  D ffff880110b3a7e0  5624 10222  29705 0x00000004
 ffff8801005ebc50 0000000000000086 ffff880101872490 ffff8801005ebfd8
 ffff8801005ebfd8 ffff8801005ebfd8 ffffffff81c10440 ffff880101872490
 ffff8801005ebe60 0000000000000246 ffff880101872490 ffff8801005ebfd8
Call Trace:
 [<ffffffff816c2fd3>] schedule_preempt_disabled+0x33/0x80
 [<ffffffff816bfcc1>] mutex_lock_nested+0x1a1/0x3d0
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] ? lookup_slow+0x13f/0x1b8
 [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
 [<ffffffff811c805a>] ? lookup_fast+0x16a/0x2d0
 [<ffffffff811c9d7c>] path_lookupat+0x79c/0x7c0
 [<ffffffff8119fa87>] ? kmem_cache_alloc+0xf7/0x340
 [<ffffffff811c826a>] ? getname_flags+0x5a/0x1a0
 [<ffffffff811c9dd4>] filename_lookup+0x34/0xc0
 [<ffffffff811ccd23>] user_path_at_empty+0x63/0xa0
 [<ffffffff810b33a2>] ? get_lock_stats+0x22/0x70
 [<ffffffff810b6c2d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff816c4787>] ? _raw_spin_unlock_irq+0x37/0x60
 [<ffffffff816cd127>] ? sysret_check+0x1b/0x56
 [<ffffffff811ccd71>] user_path_at+0x11/0x20
 [<ffffffff811bb626>] sys_fchownat+0x66/0x110
 [<ffffffff816cd102>] system_call_fastpath+0x16/0x1b

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

Showing all locks held in the system:
4 locks on stack by trinity-child2/7669:
 #0: blocked: (sb_writers#4){.+.+.+}, instance: ffff8801292d17d8, at: [<ffffffff811df134>] mnt_want_write+0x24/0x50
 #1: held:    (&type->s_vfs_rename_key){+.+.+.}, instance: ffff8801292d1928, at: [<ffffffff811c6f5e>] lock_rename+0x3e/0x120
 #2: held:    (&type->i_mutex_dir_key#2/1){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff811c701e>] lock_rename+0xfe/0x120
 #3: blocked: (&type->i_mutex_dir_key#2/2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff811c7034>] lock_rename+0x114/0x120
1 lock on stack by trinity-child1/7901:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
1 lock on stack by trinity-child1/8651:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
1 lock on stack by trinity-child3/9067:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
1 lock on stack by trinity-child0/9112:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
1 lock on stack by trinity-child3/9414:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
1 lock on stack by trinity-child2/10021:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
1 lock on stack by trinity-child0/10222:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
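[For reference: the lock_rename() these tasks are queued behind takes its
locks in a fixed order - the per-superblock s_vfs_rename_mutex first, then
the two parents' i_mutex, ancestor before descendant. A minimal sketch of
its shape in this era, simplified from fs/namei.c of 3.9-rc (not verbatim):

	struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
	{
		struct dentry *p;

		/* Same parent dentry: a single i_mutex is enough. */
		if (p1 == p2) {
			mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
			return NULL;
		}

		mutex_lock(&p1->d_inode->i_sb->s_vfs_rename_mutex);

		/* If one parent is an ancestor of the other, lock it first. */
		p = d_ancestor(p2, p1);
		if (p) {
			mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_PARENT);
			mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_CHILD);
			return p;
		}

		p = d_ancestor(p1, p2);
		if (p) {
			mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
			mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_CHILD);
			return p;
		}

		mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
		mutex_lock_nested(&p2->d_inode->i_mutex, I_MUTEX_CHILD);
		return NULL;
	}

Note that the fast path compares *dentries*: if two distinct dentries alias
the same directory inode, the code falls through and ends up taking the same
i_mutex twice - consistent with locks #2 and #3 above being the same
instance (ffff880110b3a858) at lock_rename+0xfe and +0x114.]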
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-21 19:21 UTC
  To: Dave Jones, Linux Kernel, Linus Torvalds

On Thu, Mar 21, 2013 at 03:06:53PM -0400, Dave Jones wrote:
> Al, Linus,

> trinity-child2  D ffff880110b3a7e0  5448  7669      1 0x00000004

	[sits in rename(), trying to grab ->i_mutex on a parent]

> Showing all locks held in the system:
> 4 locks on stack by trinity-child2/7669:
>  #0: blocked: (sb_writers#4){.+.+.+}, instance: ffff8801292d17d8, at: [<ffffffff811df134>] mnt_want_write+0x24/0x50
>  #1: held: (&type->s_vfs_rename_key){+.+.+.}, instance: ffff8801292d1928, at: [<ffffffff811c6f5e>] lock_rename+0x3e/0x120
>  #2: held: (&type->i_mutex_dir_key#2/1){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff811c701e>] lock_rename+0xfe/0x120
>  #3: blocked: (&type->i_mutex_dir_key#2/2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff811c7034>] lock_rename+0x114/0x120

Wait a minute...  How the hell does it manage to sit with *two* blocked
locks?  Confused...
* Re: VFS deadlock ?
  From: Dave Jones @ 2013-03-21 20:31 UTC
  To: Al Viro; +Cc: Linux Kernel, Linus Torvalds

On Thu, Mar 21, 2013 at 07:21:46PM +0000, Al Viro wrote:
> On Thu, Mar 21, 2013 at 03:06:53PM -0400, Dave Jones wrote:
> > Al, Linus,
> >
> > trinity-child2  D ffff880110b3a7e0  5448  7669      1 0x00000004
> 	[sits in rename(), trying to grab ->i_mutex on a parent]
>
> > Showing all locks held in the system:
> > 4 locks on stack by trinity-child2/7669:
> >  #0: blocked: (sb_writers#4){.+.+.+}, instance: ffff8801292d17d8, at: [<ffffffff811df134>] mnt_want_write+0x24/0x50
> >  #1: held: (&type->s_vfs_rename_key){+.+.+.}, instance: ffff8801292d1928, at: [<ffffffff811c6f5e>] lock_rename+0x3e/0x120
> >  #2: held: (&type->i_mutex_dir_key#2/1){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff811c701e>] lock_rename+0xfe/0x120
> >  #3: blocked: (&type->i_mutex_dir_key#2/2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff811c7034>] lock_rename+0x114/0x120
>
> Wait a minute... How the hell does it manage to sit with *two* blocked locks?
> Confused...

Same thing happened when I ran into this yesterday.. (a bit more confusing,
as there were more parallel runs going on, but similar patterns..)

Showing all locks held in the system:
1 lock on stack by trinity-main/784:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
2 locks on stack by trinity-child3/6674:
 #0: blocked: (sb_writers#5){.+.+.+}, instance: ffff8801292d17d8, at: [<ffffffff811df134>] mnt_want_write+0x24/0x50
 #1: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff811ba1d9>] chmod_common+0x49/0xa0
1 lock on stack by trinity-child3/9678:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
1 lock on stack by trinity-child0/9845:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
2 locks on stack by trinity-child1/10147:
 #0: blocked: (sb_writers#5){.+.+.+}, instance: ffff8801292d17d8, at: [<ffffffff811df134>] mnt_want_write+0x24/0x50
 #1: blocked: (&type->i_mutex_dir_key#2/1){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff811c9ffd>] kern_path_create+0xad/0x190
1 lock on stack by trinity-child2/10793:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff811cb5da>] do_last+0x33a/0xf20
1 lock on stack by trinity-child1/11549:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
4 locks on stack by trinity-child0/11592:
 #0: blocked: (sb_writers#5){.+.+.+}, instance: ffff8801292d17d8, at: [<ffffffff811df134>] mnt_want_write+0x24/0x50
 #1: held:    (&type->s_vfs_rename_key){+.+.+.}, instance: ffff8801292d1928, at: [<ffffffff811c6f5e>] lock_rename+0x3e/0x120
 #2: held:    (&type->i_mutex_dir_key#2/1){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff811c701e>] lock_rename+0xfe/0x120
 #3: blocked: (&type->i_mutex_dir_key#2/2){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff811c7034>] lock_rename+0x114/0x120
1 lock on stack by trinity-child0/11645:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
2 locks on stack by trinity-child3/11740:
 #0: blocked: (sb_writers#5){.+.+.+}, instance: ffff8801292d17d8, at: [<ffffffff811df134>] mnt_want_write+0x24/0x50
 #1: blocked: (&type->i_mutex_dir_key#2/1){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff811c9ffd>] kern_path_create+0xad/0x190
1 lock on stack by trinity-child2/11953:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
1 lock on stack by trinity-child2/12018:
 #0: blocked: (&type->i_mutex_dir_key#2){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff816bb512>] lookup_slow+0x13f/0x1b8
2 locks on stack by trinity-child1/12312:
 #0: blocked: (sb_writers#5){.+.+.+}, instance: ffff8801292d17d8, at: [<ffffffff811df134>] mnt_want_write+0x24/0x50
 #1: blocked: (&type->i_mutex_dir_key#2/1){+.+.+.}, instance: ffff88010c1717c0, at: [<ffffffff811caaa2>] do_unlinkat+0x112/0x230
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-21 19:29 UTC
  To: Dave Jones, Linux Kernel, Linus Torvalds

On Thu, Mar 21, 2013 at 03:06:53PM -0400, Dave Jones wrote:
> Showing all locks held in the system:
> 4 locks on stack by trinity-child2/7669:
>  #0: blocked: (sb_writers#4){.+.+.+}, instance: ffff8801292d17d8, at: [<ffffffff811df134>] mnt_want_write+0x24/0x50
>  #1: held: (&type->s_vfs_rename_key){+.+.+.}, instance: ffff8801292d1928, at: [<ffffffff811c6f5e>] lock_rename+0x3e/0x120
>  #2: held: (&type->i_mutex_dir_key#2/1){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff811c701e>] lock_rename+0xfe/0x120
>  #3: blocked: (&type->i_mutex_dir_key#2/2){+.+.+.}, instance: ffff880110b3a858, at: [<ffffffff811c7034>] lock_rename+0x114/0x120

#0 oddity aside, that looks very much like a directory aliased by two
different dentries.  Try to add
	BUG_ON(p1->d_inode == p2->d_inode);
just before
	mutex_lock(&p1->d_inode->i_sb->s_vfs_rename_mutex);
and see if it triggers.
* Re: VFS deadlock ?
  From: Linus Torvalds @ 2013-03-21 20:15 UTC
  To: Al Viro; +Cc: Dave Jones, Linux Kernel

On Thu, Mar 21, 2013 at 12:29 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> #0 oddity aside, that looks very much like a directory aliased by two
> different dentries.  Try to add
>	BUG_ON(p1->d_inode == p2->d_inode);
> just before
>	mutex_lock(&p1->d_inode->i_sb->s_vfs_rename_mutex);
> and see if it triggers.

Don't do a BUG_ON(), instead do something like

	if (WARN_ON_ONCE(p1->d_inode == p2->d_inode)) {
		printk("p1=%s p2=%s\n", p1->d_name, p2->d_name);
		mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
		return NULL;
	}

so that we actually see where it is. I'm assuming it's some sysfs oddity
again..

            Linus
* Re: VFS deadlock ?
  From: Dave Jones @ 2013-03-21 20:26 UTC
  To: Linus Torvalds; +Cc: Al Viro, Linux Kernel

On Thu, Mar 21, 2013 at 01:15:08PM -0700, Linus Torvalds wrote:
> On Thu, Mar 21, 2013 at 12:29 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > #0 oddity aside, that looks very much like a directory aliased by two
> > different dentries.  Try to add
> >	BUG_ON(p1->d_inode == p2->d_inode);
> > just before
> >	mutex_lock(&p1->d_inode->i_sb->s_vfs_rename_mutex);
> > and see if it triggers.
>
> Don't do a BUG_ON(), instead do something like
>
>	if (WARN_ON_ONCE(p1->d_inode == p2->d_inode)) {
>		printk("p1=%s p2=%s\n", p1->d_name, p2->d_name);
>		mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
>		return NULL;
>	}

those are qstr's, so I used d_name.name, right ?

> so that we actually see where it is. I'm assuming it's some sysfs oddity
> again..

I'd be surprised actually, I've got sysfs excluded from its list of victim
files, due to unrelated issues still unresolved. So unless it followed a
symlink into /sys from somewhere in /proc or /dev...

It took a few hours to reproduce last time, I'll increase the number of
child processes to see if I can trigger it faster now that I have the
debug stuff in there.

	Dave
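[For context: ->d_name is indeed a struct qstr, not a bare string, so the
printk argument needs the extra .name dereference. Conceptually - the real
3.9-era definition in include/linux/dcache.h wraps the hash/length pair in
a union for word-at-a-time lookup - it is:

	struct qstr {
		u32 hash;			/* precomputed name hash */
		u32 len;			/* name length in bytes */
		const unsigned char *name;	/* not NUL-terminated in general */
	};

so the debug line becomes printk("p1=%s p2=%s\n", p1->d_name.name,
p2->d_name.name);]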
* Re: VFS deadlock ?
  From: Linus Torvalds @ 2013-03-21 20:32 UTC
  To: Dave Jones, Linus Torvalds, Al Viro, Linux Kernel

On Thu, Mar 21, 2013 at 1:26 PM, Dave Jones <davej@redhat.com> wrote:
>
> those are qstr's, so I used d_name.name, right ?

Yup. And if you want to, you could do p1->d_parent->d_name.name too, just
to make things obvious. It's technically racy, but by the time the bug
happens, who cares?

> I'd be surprised actually, I've got sysfs excluded from its list of victim
> files, due to unrelated issues still unresolved. So unless it followed a
> symlink into /sys from somewhere in /proc or /dev...
>
> It took a few hours to reproduce last time, I'll increase the number of
> child processes to see if I can trigger it faster now that I have the
> debug stuff in there.

Hmm, ok. Do you have any network mounts or fuse or other "odd" filesystems
etc? The whole "aliased inodes" thing might come from something like that.

            Linus
* Re: VFS deadlock ?
  From: Dave Jones @ 2013-03-21 20:36 UTC
  To: Linus Torvalds; +Cc: Al Viro, Linux Kernel

On Thu, Mar 21, 2013 at 01:32:45PM -0700, Linus Torvalds wrote:
> On Thu, Mar 21, 2013 at 1:26 PM, Dave Jones <davej@redhat.com> wrote:
> >
> > those are qstr's, so I used d_name.name, right ?
>
> Yup. And if you want to, you could do p1->d_parent->d_name.name too,
> just to make things obvious. It's technically racy, but by the time
> the bug happens, who cares?

I'll add that next time around if the current run doesn't turn up anything
interesting.

> Hmm, ok. Do you have any network mounts or fuse or other "odd"
> filesystems etc? The whole "aliased inodes" thing might come from
> something like that.

at some point during the fuzz run, this happened..

Mar 20 15:20:41 kernel: [ 7578.784674] fuse init (API version 7.21)
Mar 20 15:20:41 systemd[1]: Mounting FUSE Control File System...
Mar 20 15:20:41 systemd[1]: Mounted FUSE Control File System.

I guess something wandered into /dev/fuse and did something. Not sure why
systemd reacted though...

	Dave
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-21 20:47 UTC
  To: Dave Jones, Linus Torvalds, Linux Kernel

On Thu, Mar 21, 2013 at 04:36:39PM -0400, Dave Jones wrote:
> at some point during the fuzz run, this happened..
>
> Mar 20 15:20:41 kernel: [ 7578.784674] fuse init (API version 7.21)
> Mar 20 15:20:41 systemd[1]: Mounting FUSE Control File System...
> Mar 20 15:20:41 systemd[1]: Mounted FUSE Control File System.
>
> I guess something wandered into /dev/fuse and did something. Not sure why
> systemd reacted though...

AFAICS, fuse uses d_materialise_unique() and d_splice_alias(), though, so
it's not too likely a source of that crap; there are no d_instantiate()
calls at all and the sole d_add() is using an inode that has just been
allocated, so it won't be creating aliases either...
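[The distinction Al is drawing, in rough terms - semantics paraphrased
from the 3.9-era dcache API, not verbatim kernel documentation:

	/*
	 * d_add(dentry, inode)          - bind dentry to inode unconditionally;
	 *                                 if the directory inode already has a
	 *                                 dentry elsewhere, this creates a
	 *                                 second alias for it.
	 * d_splice_alias(inode, dentry) - for directory inodes, reuse an
	 *                                 existing (disconnected) alias
	 *                                 instead of binding the new dentry.
	 * d_materialise_unique(dentry, inode)
	 *                               - stronger still: moves even a
	 *                                 connected, in-use alias into place,
	 *                                 so a directory inode never gains a
	 *                                 second dentry.
	 */

Filesystems that only ever instantiate freshly allocated inodes, or that
go through the latter two helpers, cannot produce the aliased-directory
situation suspected here.]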
* Re: VFS deadlock ?
  From: Dave Jones @ 2013-03-21 21:02 UTC
  To: Al Viro; +Cc: Linus Torvalds, Linux Kernel

On Thu, Mar 21, 2013 at 08:47:04PM +0000, Al Viro wrote:
> AFAICS, fuse uses d_materialise_unique() and d_splice_alias(), though, so
> it's not too likely a source of that crap; there are no d_instantiate()
> calls at all and the sole d_add() is using an inode that has just been
> allocated, so it won't be creating aliases either...

here we go...

WARNING: at fs/namei.c:2335 lock_rename+0x156/0x160()
Hardware name: GA-MA78GM-S2H
Modules linked in: vmw_vsock_vmci_transport vmw_vmci vsock hidp cmtp kernelcapi bnep caif_socket caif phonet nfnetlink rfcomm l2tp_ppp l2tp_netlink l2tp_core rose pppoe pppox ppp_generic slhc llc2 netrom af_key can_raw af_rxrpc scsi_transport_iscsi ipt_ULOG appletalk irda crc_ccitt decnet can_bcm can rds atm x25 ipx p8023 psnap p8022 llc ax25 nfc af_802154 lockd sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables snd_hda_codec_realtek snd_hda_intel snd_hda_codec btusb snd_pcm bluetooth rfkill snd_page_alloc snd_timer microcode serio_raw pcspkr snd edac_core soundcore r8169 mii vhost_net tun macvtap macvlan kvm_amd kvm radeon backlight drm_kms_helper ttm
Pid: 5633, comm: trinity-child3 Not tainted 3.9.0-rc3+ #107
Call Trace:
 [<ffffffff810450f5>] warn_slowpath_common+0x75/0xa0
 [<ffffffff810451da>] warn_slowpath_null+0x1a/0x20
 [<ffffffff811c6df6>] lock_rename+0x156/0x160
 [<ffffffff811cd907>] sys_renameat+0x1f7/0x3b0
 [<ffffffff810b33a2>] ? get_lock_stats+0x22/0x70
 [<ffffffff810b6b95>] ? trace_hardirqs_on_caller+0x115/0x1a0
 [<ffffffff810b6b95>] ? trace_hardirqs_on_caller+0x115/0x1a0
 [<ffffffff8134a3ae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816cd242>] system_call_fastpath+0x16/0x1b
---[ end trace 421592dfa22abfb0 ]---
p1=irda p2=irda

followed by...

=====================================
[ BUG: bad unlock balance detected! ]
3.9.0-rc3+ #107 Tainted: G        W
-------------------------------------
trinity-child3/5633 is trying to release lock (&type->i_mutex_dir_key) at:
[<ffffffff816c104e>] mutex_unlock+0xe/0x10
but there are no more locks to release!

other info that might help us debug this:
1 lock on stack by trinity-child3/5633:
 #0: blocked: (sb_writers#4){.+.+.+}, instance: ffff8801292d17d8, at: [<ffffffff811df294>] mnt_want_write+0x24/0x50

stack backtrace:
Pid: 5633, comm: trinity-child3 Tainted: G        W    3.9.0-rc3+ #107
Call Trace:
 [<ffffffff816c104e>] ? mutex_unlock+0xe/0x10
 [<ffffffff810b4e10>] print_unlock_imbalance_bug+0x100/0x110
 [<ffffffff810b9417>] lock_release_non_nested+0x257/0x320
 [<ffffffff810b3288>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff816c104e>] ? mutex_unlock+0xe/0x10
 [<ffffffff816c104e>] ? mutex_unlock+0xe/0x10
 [<ffffffff810b9587>] lock_release+0xa7/0x310
 [<ffffffff816c0f4a>] __mutex_unlock_slowpath+0x8a/0x180
 [<ffffffff816c104e>] mutex_unlock+0xe/0x10
 [<ffffffff811c67f1>] unlock_rename+0x41/0x60
 [<ffffffff811cd996>] sys_renameat+0x286/0x3b0
 [<ffffffff810b33a2>] ? get_lock_stats+0x22/0x70
 [<ffffffff810b6b95>] ? trace_hardirqs_on_caller+0x115/0x1a0
 [<ffffffff810b6b95>] ? trace_hardirqs_on_caller+0x115/0x1a0
 [<ffffffff8134a3ae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816cd242>] system_call_fastpath+0x16/0x1b
* Re: VFS deadlock ?
  From: Linus Torvalds @ 2013-03-21 21:18 UTC
  To: Dave Jones, Al Viro, Linus Torvalds, Linux Kernel

On Thu, Mar 21, 2013 at 2:02 PM, Dave Jones <davej@redhat.com> wrote:
>
> here we go...
>
> WARNING: at fs/namei.c:2335 lock_rename+0x156/0x160()
> p1=irda p2=irda

Ok, good. I suspect it's /proc or /sys, we do have that entry there.

But in fact I suspect we do want the parent name after all, because I
think we have multiple "irda" directories. There's the one in /proc/net/
(added by net/irda/irproc.c), and there's a sysctl CTL_DIR "irda"
directory (kernel/sysctl_binary.c). And there might even be a few other
ones in /sys too, thanks to the ldisc called "irda" etc.

I don't see where the shared inode comes from, but I suspect that would
be easier to guess if we actually see which particular case it ends up
being..

> followed by...
> =====================================
> [ BUG: bad unlock balance detected! ]

Oh, ok, that's just because the unlock path doesn't have the same logic
for unlocking identical inodes that the thing just added to the locking
path. You'd need to add a check for "same inode" and only unlock it once.

So that was my fault in asking for a non-BUG_ON and not doing the
complete thing. See "unlock_rename()" - you'd need to change the
"p1 != p2" test there to "p1->d_inode != p2->d_inode" to match the logic
in lock_rename().

            Linus
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-21 21:26 UTC
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel

On Thu, Mar 21, 2013 at 02:18:15PM -0700, Linus Torvalds wrote:
> Ok, good. I suspect it's /proc or /sys, we do have that entry there.
>
> But in fact I suspect we do want the parent name after all, because I
> think we have multiple "irda" directories. There's the one in /proc/net/
> (added by net/irda/irproc.c), and there's a sysctl CTL_DIR "irda"
> directory (kernel/sysctl_binary.c). And there might even be a few other
> ones in /sys too, thanks to the ldisc called "irda" etc.
>
> I don't see where the shared inode comes from, but I suspect that would
> be easier to guess if we actually see which particular case it ends up
> being..

Well, something like
	static char path[4096];
	d_absolute_path(p1, path, 4096);
	printk(KERN_ERR "%s %s %d %d",
		path, p1->d_sb->s_type->name, d_unlinked(p1), d_unlinked(p2));
might be useful - pathname within fs, fs type and which of those suckers
are unlinked...
* Re: VFS deadlock ?
  From: Dave Jones @ 2013-03-21 21:41 UTC
  To: Al Viro; +Cc: Linus Torvalds, Linux Kernel

On Thu, Mar 21, 2013 at 09:26:17PM +0000, Al Viro wrote:
> Well, something like
>	static char path[4096];
>	d_absolute_path(p1, path, 4096);
>	printk(KERN_ERR "%s %s %d %d",
>		path, p1->d_sb->s_type->name, d_unlinked(p1), d_unlinked(p2));
> might be useful - pathname within fs, fs type and which of those suckers
> are unlinked...

uh..

fs/namei.c:2342:3: warning: passing argument 1 of ‘d_absolute_path’ from incompatible pointer type [enabled by default]
   d_absolute_path(p1, path, 4096);
   ^
In file included from include/linux/fs.h:8:0,
                 from fs/namei.c:21:
include/linux/dcache.h:337:14: note: expected ‘const struct path *’ but argument is of type ‘struct dentry *’
 extern char *d_absolute_path(const struct path *, char *, int);
              ^

How do I go from the dentry in p1 to the path d_absolute_path expects ?

	Dave
* Re: VFS deadlock ?
  From: Linus Torvalds @ 2013-03-21 21:47 UTC
  To: Dave Jones, Al Viro, Linus Torvalds, Linux Kernel

On Thu, Mar 21, 2013 at 2:41 PM, Dave Jones <davej@redhat.com> wrote:
>
> How do I go from the dentry in p1 to the path d_absolute_path expects ?

You don't. Use dentry_path() instead. It will also end up giving you the
information about whether the dentry is unlinked or not.

            Linus
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-21 21:55 UTC
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel

On Thu, Mar 21, 2013 at 02:47:06PM -0700, Linus Torvalds wrote:
> It will also end up giving you the information about whether the dentry
> is unlinked or not.

You also want to know if p2 is unlinked, though (and yes, it's
dentry_path(), of course - apologies for the braino).
* Re: VFS deadlock ?
  From: Linus Torvalds @ 2013-03-21 21:57 UTC
  To: Al Viro; +Cc: Dave Jones, Linux Kernel

On Thu, Mar 21, 2013 at 2:55 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Mar 21, 2013 at 02:47:06PM -0700, Linus Torvalds wrote:
>
>> It will also end up giving you the information about whether the dentry
>> is unlinked or not.
>
> You also want to know if p2 is unlinked, though (and yes, it's
> dentry_path(), of course - apologies for the braino).

Just do dentry_path on both p1 and p2. You'll have to have two different
buffers for them, of course.

            Linus
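[A minimal sketch of what that looks like. One subtlety worth noting:
dentry_path() builds the string at the *end* of the supplied buffer and
returns a pointer into it (appending "//deleted" for unlinked dentries),
so the returned pointer - not the buffer start - is what should be
printed. Buffer names here are illustrative; error handling omitted:

	static char buf1[4096], buf2[4096];
	char *s1, *s2;

	s1 = dentry_path(p1, buf1, sizeof(buf1));	/* pointer into buf1 */
	s2 = dentry_path(p2, buf2, sizeof(buf2));	/* pointer into buf2 */
	printk(KERN_ERR "path1=%s path2=%s %s %d %d\n",
	       s1, s2, p1->d_sb->s_type->name,
	       d_unlinked(p1), d_unlinked(p2));
]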
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-21 22:03 UTC
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel

On Thu, Mar 21, 2013 at 02:57:29PM -0700, Linus Torvalds wrote:
> Just do dentry_path on both p1 and p2. You'll have to have two
> different buffers for them, of course.

I would be rather surprised if the pathnames proper would turn out to be
different, actually...
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-21 21:52 UTC
  To: Dave Jones, Linus Torvalds, Linux Kernel

On Thu, Mar 21, 2013 at 05:41:18PM -0400, Dave Jones wrote:
> fs/namei.c:2342:3: warning: passing argument 1 of ‘d_absolute_path’ from incompatible pointer type [enabled by default]
>    d_absolute_path(p1, path, 4096);
>    ^
> In file included from include/linux/fs.h:8:0,
>                  from fs/namei.c:21:
> include/linux/dcache.h:337:14: note: expected ‘const struct path *’ but argument is of type ‘struct dentry *’
>  extern char *d_absolute_path(const struct path *, char *, int);
>               ^
>
> How do I go from the dentry in p1 to the path d_absolute_path expects ?

Ugh...  s/d_absolute_path/dentry_path/, of course.  Sorry about the
braino...
* Re: VFS deadlock ?
  From: Dave Jones @ 2013-03-21 22:12 UTC
  To: Linus Torvalds; +Cc: Al Viro, Linux Kernel

On Thu, Mar 21, 2013 at 02:18:15PM -0700, Linus Torvalds wrote:
> Ok, good. I suspect it's /proc or /sys, we do have that entry there.
>
> But in fact I suspect we do want the parent name after all, because I
> think we have multiple "irda" directories. There's the one in /proc/net/
> (added by net/irda/irproc.c), and there's a sysctl CTL_DIR "irda"
> directory (kernel/sysctl_binary.c). And there might even be a few other
> ones in /sys too, thanks to the ldisc called "irda" etc.
>
> I don't see where the shared inode comes from, but I suspect that would
> be easier to guess if we actually see which particular case it ends up
> being..

it's not just irda fwiw..

p1=rpc p2=rpc p1parent=net p2parent=net

not sure why the printk with Al's last debugging info didn't print out..
Investigating..

	Dave
* Re: VFS deadlock ?
  From: Dave Jones @ 2013-03-21 22:29 UTC
  To: Linus Torvalds, Al Viro, Linux Kernel

On Thu, Mar 21, 2013 at 06:12:56PM -0400, Dave Jones wrote:
> it's not just irda fwiw..
>
> p1=rpc p2=rpc p1parent=net p2parent=net
>
> not sure why the printk with Al's last debugging info didn't print out..
> Investigating..

missing \n meant it didn't go over serial.. on screen however was..

path1: path2: proc 0 0

sanity check ?

diff --git a/fs/namei.c b/fs/namei.c
index 57ae9c8..5afffe1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2282,6 +2332,26 @@ struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
 		return NULL;
 	}
 
+	if (WARN_ON_ONCE(p1->d_inode == p2->d_inode)) {
+		static char path1[4096];
+		static char path2[4096];
+
+		printk(KERN_ERR "p1=%s p2=%s p1parent=%s p2parent=%s\n",
+			p1->d_name.name, p2->d_name.name,
+			p1->d_parent->d_name.name,
+			p2->d_parent->d_name.name);
+
+		dentry_path(p1, path1, 4096);
+		dentry_path(p2, path2, 4096);
+
+		printk(KERN_ERR "path1:%s path2:%s %s %d %d\n",
+			path1, path2, p1->d_sb->s_type->name,
+			d_unlinked(p1), d_unlinked(p2));
+
+		mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
+		return NULL;
+	}
+
 	mutex_lock(&p1->d_inode->i_sb->s_vfs_rename_mutex);
 
 	p = d_ancestor(p2, p1);
@@ -2306,7 +2376,8 @@ struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
 void unlock_rename(struct dentry *p1, struct dentry *p2)
 {
 	mutex_unlock(&p1->d_inode->i_mutex);
-	if (p1 != p2) {
+
+	if (p1->d_inode != p2->d_inode) {
 		mutex_unlock(&p2->d_inode->i_mutex);
 		mutex_unlock(&p1->d_inode->i_sb->s_vfs_rename_mutex);
 	}
* Re: VFS deadlock ?
  From: Linus Torvalds @ 2013-03-21 22:53 UTC
  To: Dave Jones, Linus Torvalds, Al Viro, Linux Kernel

On Thu, Mar 21, 2013 at 3:12 PM, Dave Jones <davej@redhat.com> wrote:
>
> it's not just irda fwiw..
>
> p1=rpc p2=rpc p1parent=net p2parent=net

Ok, good. The only rpc/irda that has something in common is /proc/net/,
and they both use proc_mkdir() to create the directory:

        proc_irda = proc_mkdir("irda", init_net.proc_net);
        ...
        sn->proc_net_rpc = proc_mkdir("rpc", net->proc_net);

so it's almost certainly that case. What I do *not* see is how we got two
different dentries for the same name in /proc. But if that happens, then
yes, they will have aliased inodes (because proc_get_inode() will look
them up by "sb,de->low_ino").

Al, any ideas? There shouldn't be some lookup race, because that's done
under the parent inode lock. And multiple mount-points will have
different superblocks, so proc_get_inode() will give them separate
inodes. And bind mounts should have all the same dentry tree. So what the
heck am I missing?

            Linus
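[The lookup Linus is describing is keyed only on (superblock, low_ino). A
simplified sketch of the 3.9-era fs/proc/inode.c logic - details trimmed,
not verbatim:

	struct inode *proc_get_inode(struct super_block *sb,
				     struct proc_dir_entry *de)
	{
		struct inode *inode = iget_locked(sb, de->low_ino);

		if (inode && (inode->i_state & I_NEW)) {
			/* first lookup of this entry on this mount:
			 * fill in ops/mode from the proc_dir_entry */
			inode->i_mode = de->mode;
			inode->i_op = de->proc_iops;
			inode->i_fop = de->proc_fops;
			unlock_new_inode(inode);
		}
		return inode;
	}

Consequence: two distinct dentries resolving to the same proc_dir_entry
on the same mount necessarily share one in-core inode.]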
* Re: VFS deadlock ?
  From: Dave Jones @ 2013-03-21 23:07 UTC
  To: Linus Torvalds; +Cc: Al Viro, Linux Kernel

On Thu, Mar 21, 2013 at 03:53:13PM -0700, Linus Torvalds wrote:
> Ok, good. The only rpc/irda that has something in common is /proc/net/,
> and they both use proc_mkdir() to create the directory:
>
>         proc_irda = proc_mkdir("irda", init_net.proc_net);
>         ...
>         sn->proc_net_rpc = proc_mkdir("rpc", net->proc_net);
>
> so it's almost certainly that case. What I do *not* see is how we got two
> different dentries for the same name in /proc. But if that happens, then
> yes, they will have aliased inodes (because proc_get_inode() will look
> them up by "sb,de->low_ino").
>
> Al, any ideas? There shouldn't be some lookup race, because that's done
> under the parent inode lock. And multiple mount-points will have
> different superblocks, so proc_get_inode() will give them separate
> inodes. And bind mounts should have all the same dentry tree. So what the
> heck am I missing?

Hmm, these also seem to have appeared around the time I reenabled all the
namespace options after Eric fixed that last proc bug. Could that be
related ?

	Dave
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-21 23:36 UTC
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 03:53:13PM -0700, Linus Torvalds wrote:
> Ok, good. The only rpc/irda that has something in common is /proc/net/,
> and they both use proc_mkdir() to create the directory:
>
>         proc_irda = proc_mkdir("irda", init_net.proc_net);
>         ...
>         sn->proc_net_rpc = proc_mkdir("rpc", net->proc_net);
>
> so it's almost certainly that case. What I do *not* see is how we got two
> different dentries for the same name in /proc. But if that happens, then
> yes, they will have aliased inodes (because proc_get_inode() will look
> them up by "sb,de->low_ino").
>
> Al, any ideas? There shouldn't be some lookup race, because that's done
> under the parent inode lock. And multiple mount-points will have
> different superblocks, so proc_get_inode() will give them separate
> inodes. And bind mounts should have all the same dentry tree. So what the
> heck am I missing?

Some netns-related idiocy.  Oh, shit...

al@duke:~/linux/trees/vfs$ ls -lid /proc/{1,2}/net/stat
4026531842 dr-xr-xr-x 2 root root 0 Mar 21 19:33 /proc/1/net/stat
4026531842 dr-xr-xr-x 2 root root 0 Mar 21 19:33 /proc/2/net/stat

Eric, would you mind explaining WTF is going on here? Again, WE CAN NOT
HAVE SEVERAL DENTRIES OVER THE SAME DIRECTORY INODE. Ever. We do that, we
are fucked.

Sigh...  Namespace kinds - there should've been only one...
* Re: VFS deadlock ?
  From: Linus Torvalds @ 2013-03-21 23:58 UTC
  To: Al Viro; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 4:36 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Some netns-related idiocy.  Oh, shit...
>
> al@duke:~/linux/trees/vfs$ ls -lid /proc/{1,2}/net/stat
> 4026531842 dr-xr-xr-x 2 root root 0 Mar 21 19:33 /proc/1/net/stat
> 4026531842 dr-xr-xr-x 2 root root 0 Mar 21 19:33 /proc/2/net/stat
>
> Eric, would you mind explaining WTF is going on here? Again, WE CAN NOT
> HAVE SEVERAL DENTRIES OVER THE SAME DIRECTORY INODE. Ever. We do that,
> we are fucked.

Hmm. That certainly explains the situation, but it leaves me wondering
whether the simplest solution to this is not to say "ok, let's allow it
in this case". The locking is already per-inode, so we can literally
change the code that checks "if same dentry" to "if same inode" instead.

And the only other reason we don't want to allow it is to make sure you
can't have directory loops etc, afaik, and again, for this particular
case of /proc, we happen to be ok.

So yes, it's against the rules, and we get that deadlock right now, but
one solution would be to just allow this particular case. The patch for
the deadlock looks dead simple:

diff --git a/fs/namei.c b/fs/namei.c
index 57ae9c8c66bf..435002f99bd8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2277,7 +2277,7 @@ struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
 {
 	struct dentry *p;
 
-	if (p1 == p2) {
+	if (p1->d_inode == p2->d_inode) {
 		mutex_lock_nested(&p1->d_inode->i_mutex, I_MUTEX_PARENT);
 		return NULL;
 	}
@@ -2306,7 +2306,7 @@ struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
 void unlock_rename(struct dentry *p1, struct dentry *p2)
 {
 	mutex_unlock(&p1->d_inode->i_mutex);
-	if (p1 != p2) {
+	if (p1->d_inode != p2->d_inode) {
 		mutex_unlock(&p2->d_inode->i_mutex);
 		mutex_unlock(&p1->d_inode->i_sb->s_vfs_rename_mutex);
 	}

Are there any other reasons why these kinds of "hardlinked directories"
would cause problems?

            Linus
* Re: VFS deadlock ?
  From: Linus Torvalds @ 2013-03-22 0:01 UTC
  To: Al Viro; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 4:58 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So yes, it's against the rules, and we get that deadlock right now,
> but one solution would be to just allow this particular case. The
> patch for the deadlock looks dead simple:

It should go without saying that that whitespace-damaged patch is
entirely untested. But especially since we need to back-port whatever
fix, it would be good if we could make the fix be something as simple as
this. Because I don't believe we really want to backport some big
network-namespace reorganization.

This is, of course, all assuming that hardlinked directories are ok if
we can just guarantee the absence of loops. If there's some other reason
why they'd be problematic, we're screwed.

            Linus
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-22 0:12 UTC
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 05:01:49PM -0700, Linus Torvalds wrote:
> It should go without saying that that whitespace-damaged patch is
> entirely untested. But especially since we need to back-port whatever
> fix, it would be good if we could make the fix be something as simple
> as this. Because I don't believe we really want to backport some big
> network-namespace reorganization.
>
> This is, of course, all assuming that hardlinked directories are ok if
> we can just guarantee the absence of loops. If there's some other
> reason why they'd be problematic, we're screwed.

See the posting upthread. We could try to kludge around that as well
(e.g. have d_ancestor() compare ->d_inode instead of dentries
themselves), but I really think it's a lousy idea only inviting further
abuse.

What we should do, IMO, is to turn /proc/<pid>/net into an honest
symlink - to ../nets/<netns ID>/net. Hell, might even make it a magical
symlink instead...
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-22 0:20 UTC
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Fri, Mar 22, 2013 at 12:12:57AM +0000, Al Viro wrote:
> See the posting upthread. We could try to kludge around that as well
> (e.g. have d_ancestor() compare ->d_inode instead of dentries
> themselves), but I really think it's a lousy idea only inviting further
> abuse.
>
> What we should do, IMO, is to turn /proc/<pid>/net into an honest
> symlink - to ../nets/<netns ID>/net. Hell, might even make it a magical
> symlink instead...

BTW, the root cause is that what used to be /proc/net became per-process.
So Eric (IIRC) had added /proc/<pid>/net. Only they are not really
per-process - they are per-netns. And instead of putting those per-ns
trees elsewhere and having /proc/<pid>/net resolve to the right one, we
got them as directories, with each entry hardlinked between all
/proc/<pid>/net for processes from the same netns. Including the
subdirectory ones. Oops...

Another variant is to keep cross-hardlinks for non-directories and
duplicate directory dentries/inodes as we do for /proc/<pid>/net
themselves.
* Re: VFS deadlock ?
  From: Linus Torvalds @ 2013-03-22 0:22 UTC
  To: Al Viro; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 5:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> What we should do, IMO, is to turn /proc/<pid>/net into an honest
> symlink - to ../nets/<netns ID>/net. Hell, might even make it a magical
> symlink instead...

Ok, having seen the error of my ways, I'm starting to agree with you..
How painful would that be? Especially since we'd need to backport it..

            Linus
* Re: VFS deadlock ?
  From: Al Viro @ 2013-03-22 1:22 UTC
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 05:22:59PM -0700, Linus Torvalds wrote:
> On Thu, Mar 21, 2013 at 5:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > What we should do, IMO, is to turn /proc/<pid>/net into an honest
> > symlink - to ../nets/<netns ID>/net. Hell, might even make it a
> > magical symlink instead...
>
> Ok, having seen the error of my ways, I'm starting to agree with you..
> How painful would that be? Especially since we'd need to backport it..

Not sure; right now I'm looking through the guts of what procfs has
become. Unfortunately, there are fairly subtle interactions with other
shit - tomoyo, etc. Sigh...

BTW, the variant with the d_ancestor() modification is also not enough -
/proc/1/net and /proc/2/net have different inodes, so for the pair
(/proc/1/net, /proc/2/net/stat) d_ancestor() won't trigger even with this
change. And we have /proc/1/net < /proc/1/net/stat, since the latter is a
subdirectory of the former. With /proc/{1,2}/net/stat having the same
inode...

In theory, we can make vfs_rmdir() and vfs_unlink() check the presence of
the corresponding method before locking the victim; that would suffice to
kludge around that mess on procfs. Along with the ->d_inode comparison in
lock_rename() it *might* suffice. OTOH, there are places in fs/dcache.c
where we rely on the absence of such aliases; they might or might not
trigger in case of procfs. We are talking about the violation of a
fundamental assert used in correctness analysis all over the place,
unfortunately.

The right fix is to restore it; I'll try to come up with something that
could be reasonably easily backported - the kludge above is a fallback in
case no real fix turns out to be easy to backport. Assuming that this
kludge is sufficient, that is... For 3.9 and later we *definitely* want
to restore that assertion.

PS: Once more, with feeling, to everyone even thinking of pulling
something like that again: Hardlinks to directories do not work. Don't do
that, or we'll be sorry, and then so will you.

			A Very Peeved BOFH
* Re: VFS deadlock ?
  From: Linus Torvalds @ 2013-03-22 1:33 UTC
  To: Al Viro; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 6:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> In theory, we can make vfs_rmdir() and vfs_unlink() check the presence
> of the corresponding method before locking the victim; that would
> suffice to kludge around that mess on procfs. Along with the ->d_inode
> comparison in lock_rename() it *might* suffice.

Hmm, yes. Maybe we can do that as a stopgap, backport that, and leave any
bigger changes for the development tree. That would make the issue less
urgent, never mind all the other worries about backporting complicated
patches for subtle issues.

I realize you aren't entirely thrilled about it, but we actually already
seem to do that check in both vfs_rmdir() and vfs_unlink() before getting
the child i_mutex. I wonder if that is because we've already seen lockdep
splats for this case...

            Linus
* Re: VFS deadlock ?
  2013-03-22 1:33 ` Linus Torvalds
@ 2013-03-22 1:40 ` Al Viro
  2013-03-22 4:37 ` [CFT] " Al Viro
  0 siblings, 1 reply; 45+ messages in thread
From: Al Viro @ 2013-03-22 1:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 06:33:35PM -0700, Linus Torvalds wrote:
> On Thu, Mar 21, 2013 at 6:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > In theory, we can make vfs_rmdir() and vfs_unlink() check the presence
> > of the corresponding method before locking the victim; that would
> > suffice to kludge around that mess on procfs. Along with a ->d_inode
> > comparison in lock_rename() it *might* suffice.
>
> Hmm, yes. Maybe we can do that as a stopgap, backport that, and leave
> any bigger changes for the development tree. That would make the issue
> less urgent, never mind all the other worries about backporting
> complicated patches for subtle issues.
>
> I realize you aren't entirely thrilled about it, but we actually
> already seem to do that check in both vfs_rmdir() and vfs_unlink()
> before getting the child i_mutex. I wonder if that is because we've
> already seen lockdep splats for this case...

Yeah, I went to do such a patch after sending the previous mail and
noticed that we already did it that way. Simplicity of error recovery
was probably the more important consideration there - I honestly don't
remember the reasoning in that much detail; it had been a decade or
so... So lock_rename() doing a ->d_inode comparison (with a dire
comment re not expecting that to be sufficient for anything other than
this bug in procfs) will probably suffice for the fs/namei.c part of
it; I'm still looking at the dcache.c side of things...
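For context, the ordering Linus points at looks roughly like this in
3.9-era fs/namei.c (an abridged sketch, not a verbatim copy - error
paths and security hooks are trimmed). Because the ->rmdir method is
checked before the victim's i_mutex is taken, a filesystem that simply
lacks the method never reaches the problematic lock:

/* Abridged sketch of 3.9-era vfs_rmdir(); details trimmed. */
int vfs_rmdir(struct inode *dir, struct dentry *dentry)
{
	int error = may_delete(dir, dentry, 1);

	if (error)
		return error;

	/* Method presence is checked before any victim locking... */
	if (!dir->i_op->rmdir)
		return -EPERM;

	dget(dentry);
	mutex_lock(&dentry->d_inode->i_mutex);	/* ...which happens here */

	error = -EBUSY;
	if (!d_mountpoint(dentry))
		error = dir->i_op->rmdir(dir, dentry);

	mutex_unlock(&dentry->d_inode->i_mutex);
	dput(dentry);
	if (!error)
		d_delete(dentry);
	return error;
}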
* [CFT] Re: VFS deadlock ?
  2013-03-22 1:40 ` Al Viro
@ 2013-03-22 4:37 ` Al Viro
  2013-03-22 4:55 ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Al Viro @ 2013-03-22 4:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Fri, Mar 22, 2013 at 01:40:37AM +0000, Al Viro wrote:
> Yeah, I went to do such a patch after sending the previous mail and
> noticed that we already did it that way. Simplicity of error recovery
> was probably the more important consideration there - I honestly don't
> remember the reasoning in that much detail; it had been a decade or
> so... So lock_rename() doing a ->d_inode comparison (with a dire
> comment re not expecting that to be sufficient for anything other than
> this bug in procfs) will probably suffice for the fs/namei.c part of
> it; I'm still looking at the dcache.c side of things...

FWIW, a relatively crude solution is this:

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 4b3b3ff..778cbac 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -416,8 +416,7 @@ struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *dir,
 			if (!inode)
 				return ERR_PTR(-ENOMEM);
 			d_set_d_op(dentry, &proc_dentry_operations);
-			d_add(dentry, inode);
-			return NULL;
+			return d_materialise_unique(dentry, inode);
 		}
 	}
 	spin_unlock(&proc_subdir_lock);

It *is* crude, but it restores the assert, kills the deadlock and lets
everything work more or less as it used to. The case where things start
to look odd is this:

root@kvm-amd64:~# cd /proc/1/net/stat/; ls /proc/2/net/stat; /bin/pwd
arp_cache  ndisc_cache  rt_cache
/proc/2/net/stat

IOW, if we were about to create a directory alias, the old dentry gets
moved to the new place. OTOH, I think it's the most robust backportable
variant we can do. And yes, that should apply at least all the way back
to 2.6.25, when Eric acked a patch from Pavel that really should've
been nacked...

Folks, could you test that one and see if any real userland breaks on
it? If everything works, I'd propose that one for -stable.
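Why the one-liner restores the "one dentry per directory inode" assert:
unlike d_add(), d_materialise_unique() refuses to create a second dentry
for a directory inode. A simplified sketch of its contract against the
3.9-era dcache API (the real function also deals with disconnected
IS_ROOT aliases and error returns):

/* Sketch of a ->lookup() using d_materialise_unique(); the wrapper
 * name is illustrative, not an actual procfs function. */
static struct dentry *sketch_lookup(struct dentry *dentry,
				    struct inode *inode)
{
	/*
	 * If 'inode' is a directory that already has a dentry, that
	 * existing alias is moved to 'dentry's name and parent and
	 * returned; the VFS then uses the returned dentry instead of
	 * the one it passed in. If there is no alias, 'dentry' itself
	 * is bound to 'inode' and NULL is returned, just like d_add().
	 */
	return d_materialise_unique(dentry, inode);
}

This is exactly the "old dentry gets moved to the new place" behavior
Al shows with /bin/pwd above: correctness is preserved at the cost of
the previous path's view of the dcache.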
* Re: [CFT] Re: VFS deadlock ?
  2013-03-22 4:37 ` [CFT] " Al Viro
@ 2013-03-22 4:55 ` Linus Torvalds
  2013-03-22 5:18 ` Al Viro
  2013-03-22 5:19 ` Linus Torvalds
  0 siblings, 2 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-03-22 4:55 UTC (permalink / raw)
To: Al Viro; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 1791 bytes --]

On Thu, Mar 21, 2013 at 9:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> FWIW, a relatively crude solution is this:
>
> -			d_add(dentry, inode);
> -			return NULL;
> +			return d_materialise_unique(dentry, inode);
>
> It *is* crude, but it restores the assert, kills the deadlock and lets
> everything work more or less as it used to. The case where things start
> to look odd is this:
>
> root@kvm-amd64:~# cd /proc/1/net/stat/; ls /proc/2/net/stat; /bin/pwd
> arp_cache  ndisc_cache  rt_cache
> /proc/2/net/stat
>
> IOW, if we were about to create a directory alias, the old dentry gets
> moved to the new place.

Ugh, no, this is worse than the bug we're trying to fix.

However, I do wonder if we could take another approach... There's
really no reason why we should look up an old inode with iget_locked()
in proc_get_inode() and only fill it in if it is new. We might as well
just always create a new inode. The "iget_locked()" logic really comes
from the bad old days when the inode was the primary data structure,
but it's really the dentry that is the important data structure, and
the inode might as well follow the dentry, instead of the other way
around.

So why not just use "new_inode_pseudo()" instead? IOW, something like
this (totally untested, natch) patch? This way, if you have a new
dentry, you are guaranteed to always have a new inode. None of the
silly inode alias issues..

This seems too simple, but I don't see why iget_locked() would be any
better. It just wastes time hashing the inode that we aren't really
interested in hashing. The inode is always filled by the caller
anyway, so we might as well just get a fresh pseudo-filesystem inode
without any crazy hashing..

                  Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 672 bytes --]

 fs/proc/inode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index a86aebc9ba7c..a2395317b3bf 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -446,9 +446,10 @@ static const struct file_operations proc_reg_file_ops_no_compat = {
 
 struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 {
-	struct inode *inode = iget_locked(sb, de->low_ino);
+	struct inode *inode = new_inode_pseudo(sb);
 
-	if (inode && (inode->i_state & I_NEW)) {
+	if (inode) {
+		inode->i_ino = de->low_ino;
 		inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
 		PROC_I(inode)->pde = de;
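For reference, new_inode_pseudo() is tiny; a sketch of how it looked
around 3.9 (from memory, modulo comments - check fs/inode.c of the
actual tree before relying on details):

struct inode *new_inode_pseudo(struct super_block *sb)
{
	struct inode *inode = alloc_inode(sb);

	if (inode) {
		spin_lock(&inode->i_lock);
		inode->i_state = 0;	/* never I_NEW */
		spin_unlock(&inode->i_lock);
		INIT_LIST_HEAD(&inode->i_sb_list);
	}
	return inode;
}

The point of the switch: the returned inode is never inserted into the
inode hash, so no later iget_locked()/ilookup() can ever find it and
hand the same struct inode to a second dentry - the alias becomes
impossible by construction.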
* Re: [CFT] Re: VFS deadlock ?
  2013-03-22 4:55 ` Linus Torvalds
@ 2013-03-22 5:18 ` Al Viro
  2013-03-22 5:33 ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Al Viro @ 2013-03-22 5:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 09:55:10PM -0700, Linus Torvalds wrote:
> However, I do wonder if we could take another approach... There's
> really no reason why we should look up an old inode with iget_locked()
> in proc_get_inode() and only fill it in if it is new. We might as well
> just always create a new inode. The "iget_locked()" logic really comes
> from the bad old days when the inode was the primary data structure,
> but it's really the dentry that is the important data structure, and
> the inode might as well follow the dentry, instead of the other way
> around.
>
> So why not just use "new_inode_pseudo()" instead? IOW, something like
> this (totally untested, natch) patch? This way, if you have a new
> dentry, you are guaranteed to always have a new inode. None of the
> silly inode alias issues..
>
> This seems too simple, but I don't see why iget_locked() would be any
> better. It just wastes time hashing the inode that we aren't really
> interested in hashing. The inode is always filled by the caller
> anyway, so we might as well just get a fresh pseudo-filesystem inode
> without any crazy hashing..

Umm...

static int proc_delete_dentry(const struct dentry * dentry)
{
	return 1;
}

static const struct dentry_operations proc_dentry_operations =
{
	.d_delete	= proc_delete_dentry,
};

IOW, dcache retention in procfs is nonexistent and the damn thing tries
to cut down on the amount of inode allocation/freeing/filling. I agree
that we could get rid of icache retention there and everything ought to
keep working. Hell knows... It applies only to the stuff that isn't
per-process, so it's probably not particularly hot anyway, but it does
need profiling... OTOH, we probably could mark "stable" dentries in
some way and let proc_delete_dentry() check that flag in proc_dir_entry
- I seriously suspect that the really hot non-per-process ones are of
the "never become invalid" variety.
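The retention idea Al floats could look something like the sketch below.
Everything procfs-specific here is hypothetical - procfs of that era had
no such flag, and both PROC_ENTRY_STABLE and the 'flags' field are made
up for illustration; only PDE() and the ->d_delete calling convention
are real 3.9-era API:

/* Hypothetical: keep dentries whose proc_dir_entry is marked stable. */
#define PROC_ENTRY_STABLE	0x1	/* made-up flag */

static int proc_delete_dentry(const struct dentry *dentry)
{
	struct proc_dir_entry *de;

	if (!dentry->d_inode)
		return 1;		/* negative dentry: drop as before */
	de = PDE(dentry->d_inode);
	if (de && (de->flags & PROC_ENTRY_STABLE))	/* 'flags' assumed */
		return 0;		/* retain in the dcache */
	return 1;			/* current behavior: drop at dput() */
}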
* Re: [CFT] Re: VFS deadlock ?
  2013-03-22 5:18 ` Al Viro
@ 2013-03-22 5:33 ` Linus Torvalds
  2013-03-22 6:09 ` Al Viro
  ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-03-22 5:33 UTC (permalink / raw)
To: Al Viro; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 10:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> This seems too simple, but I don't see why iget_locked() would be any
>> better. It just wastes time hashing the inode that we aren't really
>> interested in hashing. The inode is always filled by the caller
>> anyway, so we might as well just get a fresh pseudo-filesystem inode
>> without any crazy hashing..
>
> Umm...
>
> static int proc_delete_dentry(const struct dentry * dentry)
> {
>	return 1;
> }
>
> static const struct dentry_operations proc_dentry_operations =
> {
>	.d_delete	= proc_delete_dentry,
> };
>
> IOW, dcache retention in procfs is nonexistent and the damn thing tries
> to cut down on the amount of inode allocation/freeing/filling.

Ok, that's kind of ugly, but shouldn't be a correctness issue. It
should still work - just cycle through inodes quite aggressively due to
no longer re-using them - so I suspect Dave could test it (with the
extra line removed I pointed out just a moment ago).

And I wonder how big of a deal the aggressive dentry deletion is.
Maybe it's even ok to allocate/free the inodes all the time. The whole
"get the inode hash lock and look it up there" can't be all that
wonderful either. It takes the inode->i_lock for each entry it finds
on the hash list, which looks horrible. I suspect our slab allocator
isn't much worse than that, although the RCU freeing of the inodes
could end up being problematic.

                  Linus
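The hash walk Linus is criticizing is the one iget_locked() does under
inode_hash_lock. An abridged sketch, modeled on 3.9-era
find_inode_fast() in fs/inode.c (simplified; the real code runs with
the hash lock already held by the caller, and __wait_on_freeing_inode()
drops the locks internally):

static struct inode *find_inode_fast(struct super_block *sb,
				     struct hlist_head *head,
				     unsigned long ino)
{
	struct inode *inode;

repeat:
	hlist_for_each_entry(inode, head, i_hash) {
		if (inode->i_ino != ino)
			continue;
		if (inode->i_sb != sb)
			continue;
		spin_lock(&inode->i_lock);	/* per-match lock traffic */
		if (inode->i_state & (I_FREEING | I_WILL_FREE)) {
			__wait_on_freeing_inode(inode);
			goto repeat;
		}
		__iget(inode);
		spin_unlock(&inode->i_lock);
		return inode;
	}
	return NULL;
}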
* Re: [CFT] Re: VFS deadlock ?
  2013-03-22 5:33 ` Linus Torvalds
@ 2013-03-22 6:09 ` Al Viro
  2013-03-22 6:22 ` Al Viro
  0 siblings, 1 reply; 45+ messages in thread
From: Al Viro @ 2013-03-22 6:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 10:33:27PM -0700, Linus Torvalds wrote:
> Ok, that's kind of ugly, but shouldn't be a correctness issue. It
> should still work - just cycle through inodes quite aggressively due
> to no longer re-using them - so I suspect Dave could test it (with the
> extra line removed I pointed out just a moment ago).
>
> And I wonder how big of a deal the aggressive dentry deletion is.
> Maybe it's even ok to allocate/free the inodes all the time. The whole
> "get the inode hash lock and look it up there" can't be all that
> wonderful either. It takes the inode->i_lock for each entry it finds
> on the hash list, which looks horrible. I suspect our slab allocator
> isn't much worse than that, although the RCU freeing of the inodes
> could end up being problematic.

Hell knows... At the very least, I'd expect /proc/self to be fairly
hot. During boot time - /proc/mounts, /proc/filesystems,
/proc/cmdline... Dunno. Would be interesting to slap a printk into
proc_lookup_de() and see how much (and what) it hits on a busy
system...
* Re: [CFT] Re: VFS deadlock ?
  2013-03-22 6:09 ` Al Viro
@ 2013-03-22 6:22 ` Al Viro
  0 siblings, 0 replies; 45+ messages in thread
From: Al Viro @ 2013-03-22 6:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Fri, Mar 22, 2013 at 06:09:53AM +0000, Al Viro wrote:
> Hell knows... At the very least, I'd expect /proc/self to be fairly
> hot. During boot time - /proc/mounts, /proc/filesystems,
> /proc/cmdline... Dunno. Would be interesting to slap a printk into
> proc_lookup_de() and see how much (and what) it hits on a busy
> system...

FWIW, I've just done a quick and dirty stats collector - added a hit
count to proc_dir_entry, plus a /proc/counts walking the pde tree and
dumping the counts. Left it running on a practically idle kvm testbox;
after boot it shows:

kmsg => 2
kcore => 4
version => 4
uptime => 3
stat => 38
meminfo => 22
loadavg => 2
devices => 4
consoles => 4
cmdline => 109
filesystems => 118
swaps => 8
modules => 2
misc => 1
acpi => 6
fb => 1
counts => 14
sys => 1
bus => 1
fs => 1
net => 18
mounts => 10
self => 123

IOW, the counts on this one are very low so far. OTOH, that kvm image
is practically idle, doesn't have desktop shite on it, etc. This really
ought to be checked on something busy... The first impression is that
the stuff outside of /proc/<pid> and /proc/sys isn't hot enough to care
about any cache retention...
* Re: [CFT] Re: VFS deadlock ?
  2013-03-22 5:33 ` Linus Torvalds
  2013-03-22 6:09 ` Al Viro
@ 2013-03-22 16:23 ` Dave Jones
  2013-03-22 19:43 ` Linus Torvalds
  2 siblings, 0 replies; 45+ messages in thread
From: Dave Jones @ 2013-03-22 16:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Al Viro, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 10:33:27PM -0700, Linus Torvalds wrote:
> Ok, that's kind of ugly, but shouldn't be a correctness issue. It
> should still work - just cycle through inodes quite aggressively due
> to no longer re-using them - so I suspect Dave could test it (with the
> extra line removed I pointed out just a moment ago).

Been running a few hours now without incident. Of course, proving the
absence of a bug is always fun when fuzzing, as it may just not have
run long enough yet... though I was hitting it pretty quickly last
night.

	Dave
* Re: [CFT] Re: VFS deadlock ?
  2013-03-22 5:33 ` Linus Torvalds
  2013-03-22 6:09 ` Al Viro
  2013-03-22 16:23 ` Dave Jones
@ 2013-03-22 19:43 ` Linus Torvalds
  2013-03-22 21:28 ` Al Viro
  2013-03-22 22:57 ` Eric W. Biederman
  2 siblings, 2 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-03-22 19:43 UTC (permalink / raw)
To: Al Viro; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 10:33 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And I wonder how big of a deal the aggressive dentry deletion is.
> Maybe it's even ok to allocate/free the inodes all the time. The whole
> "get the inode hash lock and look it up there" can't be all that
> wonderful either. It takes the inode->i_lock for each entry it finds
> on the hash list, which looks horrible. I suspect our slab allocator
> isn't much worse than that, although the RCU freeing of the inodes
> could end up being problematic.

I tested this out by just having a process that keeps stat'ing the
same /proc inode over and over and over again, which should be pretty
much the worst-case situation (because we will delete the dentry after
each stat, and so we'll repopulate it for each stat).

The allocation overhead seems to be in the noise. The real costs are
all in proc_lookup_de() just finding the entry, with strlen/strcmp
being much hotter. So if we actually care about performance for these
cases, what we would actually want to do is to just cache the dentries
and have some kind of timeout instead of getting rid of them
immediately. That would get rid of the lookup costs, and in the
process also get rid of the constant inode allocation/deallocation
costs.

There was also some locking overhead, but that's also mainly
dentry-related, with the inode side being visible but not dominant.
Again, not immediately expiring the dentries would make all that just
go away.

Regardless, the patch seems to be the right thing to do. It actually
simplifies /proc, seems to fix the problem for Dave, and is small and
should be trivial to backport. I've committed it and marked it for
stable. Let's hope there won't be any nasty surprises, but it does
seem to be the simplest non-hacky patch.

                  Linus
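The "cache with a timeout" idea was never part of the committed fix,
but for illustration it could be built on ->d_revalidate() (a sketch
only; PROC_DENTRY_TTL is a made-up tunable, and using d_time this way
is an assumption, not existing procfs code):

/* Hypothetical: expire procfs dentries after a TTL instead of
 * dropping them on every dput(). d_time would be set to jiffies in
 * ->lookup() when the dentry is instantiated. */
#define PROC_DENTRY_TTL	(5 * HZ)	/* assumed value */

static int proc_ttl_d_revalidate(struct dentry *dentry, unsigned int flags)
{
	if (flags & LOOKUP_RCU)
		return -ECHILD;
	/* valid while younger than the TTL; forces re-lookup afterwards */
	return time_before(jiffies, dentry->d_time + PROC_DENTRY_TTL);
}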
* Re: [CFT] Re: VFS deadlock ?
  2013-03-22 19:43 ` Linus Torvalds
@ 2013-03-22 21:28 ` Al Viro
  2013-03-22 22:57 ` Eric W. Biederman
  1 sibling, 0 replies; 45+ messages in thread
From: Al Viro @ 2013-03-22 21:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Fri, Mar 22, 2013 at 12:43:50PM -0700, Linus Torvalds wrote:
> I tested this out by just having a process that keeps stat'ing the
> same /proc inode over and over and over again, which should be pretty
> much the worst-case situation (because we will delete the dentry after
> each stat, and so we'll repopulate it for each stat).
>
> The allocation overhead seems to be in the noise. The real costs are
> all in proc_lookup_de() just finding the entry, with strlen/strcmp
> being much hotter. So if we actually care about performance for these
> cases, what we would actually want to do is to just cache the dentries
> and have some kind of timeout instead of getting rid of them
> immediately. That would get rid of the lookup costs, and in the
> process also get rid of the constant inode allocation/deallocation
> costs.

As far as I can see, the whole thing is fairly cold, both the globals
and the per-netns stuff... /proc/<pid> is a different story, and
/proc/sys can also get hit quite a bit, but those... nope.

> There was also some locking overhead, but that's also mainly
> dentry-related, with the inode side being visible but not dominant.
> Again, not immediately expiring the dentries would make all that just
> go away.
>
> Regardless, the patch seems to be the right thing to do. It actually
> simplifies /proc, seems to fix the problem for Dave, and is small and
> should be trivial to backport. I've committed it and marked it for
> stable. Let's hope there won't be any nasty surprises, but it does
> seem to be the simplest non-hacky patch.

ACK. It might make sense to look into making procfs retain dentries
better, but I seriously suspect that we would get a much bigger win
simply from
	* refusing to create an entry with a numeric name in procfs root
	* reordering proc_root_lookup() - try the per-process lookup first.
The thing is, proc_pid_lookup() will start by checking whether the name
is a decimal number; that immediately rejects the ones that should be
handled by proc_lookup(). A proc_lookup() miss, OTOH, costs more. Sure,
the length won't match for most of them, but grabbing the spinlock /
walking the list / comparing the length for each entry / dropping the
spinlock is going to be more costly than checking that a short string
isn't a decimal number. And we look there for /proc/<pid>/something a
lot more often... IOW, I suspect that something like the (very lightly
tested) patch below would be more useful than anything we can get from
playing with dcache retention.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 69078c7..02b07c7 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2727,12 +2727,12 @@ out:
 struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, unsigned int flags)
 {
-	struct dentry *result = NULL;
+	struct dentry *result = ERR_PTR(-ENOENT);
 	struct task_struct *task;
 	unsigned tgid;
 	struct pid_namespace *ns;
 
-	tgid = name_to_int(dentry);
+	tgid = name_to_int(&dentry->d_name);
 	if (tgid == ~0U)
 		goto out;
@@ -2990,7 +2990,7 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry
 	if (!leader)
 		goto out_no_task;
 
-	tid = name_to_int(dentry);
+	tid = name_to_int(&dentry->d_name);
 	if (tid == ~0U)
 		goto out;
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index d7a4a28..5f4286c 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -205,7 +205,7 @@ static struct dentry *proc_lookupfd_common(struct inode *dir,
 {
 	struct task_struct *task = get_proc_task(dir);
 	struct dentry *result = ERR_PTR(-ENOENT);
-	unsigned fd = name_to_int(dentry);
+	unsigned fd = name_to_int(&dentry->d_name);
 
 	if (!task)
 		goto out_no_task;
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 4b3b3ff..e13d2da 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -580,7 +580,7 @@ static struct proc_dir_entry *__proc_create(struct proc_dir_entry **parent,
 {
 	struct proc_dir_entry *ent = NULL;
 	const char *fn = name;
-	unsigned int len;
+	struct qstr q;
 
 	/* make sure name is valid */
 	if (!name || !strlen(name))
@@ -593,14 +593,17 @@ static struct proc_dir_entry *__proc_create(struct proc_dir_entry **parent,
 	if (strchr(fn, '/'))
 		goto out;
 
-	len = strlen(fn);
+	q.name = fn;
+	q.len = strlen(fn);
+	if (*parent == &proc_root && name_to_int(&q) != ~0U)
+		return NULL;
 
-	ent = kzalloc(sizeof(struct proc_dir_entry) + len + 1, GFP_KERNEL);
+	ent = kzalloc(sizeof(struct proc_dir_entry) + q.len + 1, GFP_KERNEL);
 	if (!ent)
 		goto out;
 
-	memcpy(ent->name, fn, len + 1);
-	ent->namelen = len;
+	memcpy(ent->name, q.name, q.len + 1);
+	ent->namelen = q.len;
 	ent->mode = mode;
 	ent->nlink = nlink;
 	atomic_set(&ent->count, 1);
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 85ff3a4..fba6da2 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -123,10 +123,10 @@ static inline int pid_delete_dentry(const struct dentry * dentry)
 	return !proc_pid(dentry->d_inode)->tasks[PIDTYPE_PID].first;
 }
 
-static inline unsigned name_to_int(struct dentry *dentry)
+static inline unsigned name_to_int(struct qstr *qstr)
 {
-	const char *name = dentry->d_name.name;
-	int len = dentry->d_name.len;
+	const char *name = qstr->name;
+	int len = qstr->len;
 	unsigned n = 0;
 
 	if (len > 1 && *name == '0')
diff --git a/fs/proc/root.c b/fs/proc/root.c
index c6e9fac..67c9dc4 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -190,10 +190,10 @@ static int proc_root_getattr(struct vfsmount *mnt, struct dentry *dentry, struct
 static struct dentry *proc_root_lookup(struct inode * dir, struct dentry * dentry, unsigned int flags)
 {
-	if (!proc_lookup(dir, dentry, flags))
+	if (!proc_pid_lookup(dir, dentry, flags))
 		return NULL;
 
-	return proc_pid_lookup(dir, dentry, flags);
+	return proc_lookup(dir, dentry, flags);
 }
 
 static int proc_root_readdir(struct file * filp,
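To make the "checking that a short string isn't a decimal number" cost
concrete, here is a standalone userspace re-implementation of the
name_to_int() helper touched in the internal.h hunk above (the loop
body past the quoted hunk is reproduced from my reading of the 3.9-era
source; the main() wrapper is purely illustrative):

#include <stdio.h>

static unsigned name_to_int(const char *name, int len)
{
	unsigned n = 0;

	if (len > 1 && *name == '0')
		goto out;	/* no leading zeroes: "012" is not a PID */
	while (len-- > 0) {
		unsigned c = *name++ - '0';
		if (c > 9)
			goto out;
		if (n >= (~0U - 9) / 10)
			goto out;	/* would overflow */
		n = n * 10 + c;
	}
	return n;
out:
	return ~0U;
}

int main(void)
{
	/* "42" parses; "012" and "net" are rejected with ~0U */
	printf("%u %u %u\n", name_to_int("42", 2),
	       name_to_int("012", 3), name_to_int("net", 3));
	return 0;
}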
* Re: [CFT] Re: VFS deadlock ?
  2013-03-22 19:43 ` Linus Torvalds
  2013-03-22 21:28 ` Al Viro
@ 2013-03-22 22:57 ` Eric W. Biederman
  1 sibling, 0 replies; 45+ messages in thread
From: Eric W. Biederman @ 2013-03-22 22:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Al Viro, Dave Jones, Linux Kernel

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Regardless, the patch seems to be the right thing to do. It actually
> simplifies /proc, seems to fix the problem for Dave, and is small and
> should be trivial to backport. I've committed it and marked it for
> stable. Let's hope there won't be any nasty surprises, but it does
> seem to be the simplest non-hacky patch.

Thanks for pushing this to completion. I thought I had fixed this
several years ago by not reusing the directory inodes under /proc/net,
but apparently that never happened.

Eric
* Re: [CFT] Re: VFS deadlock ?
  2013-03-22 4:55 ` Linus Torvalds
  2013-03-22 5:18 ` Al Viro
@ 2013-03-22 5:19 ` Linus Torvalds
  1 sibling, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-03-22 5:19 UTC (permalink / raw)
To: Al Viro; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 9:55 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So why not just use "new_inode_pseudo()" instead? IOW, something like
> this (totally untested, natch) patch?

Ok, so I think that patch is still probably the right way to go, but
it's broken for a silly reason: because it's not using iget_locked()
any more, it also needs to remove the now unnecessary (and actively
incorrect) call to unlock_new_inode(). So remove that one line too,
and maybe it could even work. Still not *tested*, though.

                  Linus
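Why the leftover call is "actively incorrect", sketched against the
generic icache API (illustrative fragments, not a complete function):

/* iget_locked() pattern: a new inode comes back with I_NEW set and
 * must be published with unlock_new_inode() once filled. */
inode = iget_locked(sb, ino);
if (inode && (inode->i_state & I_NEW)) {
	/* ... fill the inode ... */
	unlock_new_inode(inode);	/* clears I_NEW, wakes waiters */
}

/* new_inode_pseudo() pattern: I_NEW is never set, so calling
 * unlock_new_inode() here would trip its WARN_ON on the missing
 * I_NEW bit - hence the extra line has to go. */
inode = new_inode_pseudo(sb);
if (inode) {
	/* ... fill the inode; no unlock_new_inode() ... */
}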
* Re: VFS deadlock ?
  2013-03-21 23:58 ` Linus Torvalds
  2013-03-22 0:01 ` Linus Torvalds
@ 2013-03-22 0:08 ` Al Viro
  2013-03-22 0:15 ` Linus Torvalds
  1 sibling, 1 reply; 45+ messages in thread
From: Al Viro @ 2013-03-22 0:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 04:58:41PM -0700, Linus Torvalds wrote:
> And the only other reason we don't want to allow it is to make sure
> you can't have directory loops etc, afaik, and again, for this
> particular case of /proc, we happen to be ok.

Not really. Do that and yes, this deadlock goes away. But the locking
order in general goes to hell - we order directory inodes by "which
dentry is an ancestor of another?", so we have no guarantee that we
won't get alias1/foo/bar/baz < alias2/foo. Take lock_rename() on those
two and have it race with rmdir alias2/foo/bar/baz (locks
alias2/foo/bar, then alias2/foo/bar/baz) and rmdir alias2/foo/bar
(locks alias2/foo and alias2/foo/bar). Oops - we have a cycle now...
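Spelling the cycle out (a worked restatement of Al's scenario; the task
labels are illustrative, and alias1/alias2 name the same underlying
directories, so each path component below is a single inode):

  task A: lock_rename(alias1/foo/bar/baz, alias2/foo)
          locks baz first (ordered before foo), then waits for foo
  task B: rmdir alias2/foo/bar/baz
          holds bar, waits for baz
  task C: rmdir alias2/foo/bar
          holds foo, waits for bar

A holds baz and waits on foo (held by C); C waits on bar (held by B);
B waits on baz (held by A). Three tasks, three i_mutexes, one cycle -
exactly the kind of ordering violation the ancestor rule is supposed to
make impossible, broken here by the directory alias.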
* Re: VFS deadlock ?
  2013-03-22 0:08 ` Al Viro
@ 2013-03-22 0:15 ` Linus Torvalds
  2013-03-22 0:19 ` Linus Torvalds
  0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2013-03-22 0:15 UTC (permalink / raw)
To: Al Viro; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

On Thu, Mar 21, 2013 at 5:08 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Not really. Do that and yes, this deadlock goes away. But the locking
> order in general goes to hell - we order directory inodes by "which
> dentry is an ancestor of another?", so we have no guarantee that we
> won't get alias1/foo/bar/baz < alias2/foo. Take lock_rename() on those
> two and have it race with rmdir alias2/foo/bar/baz (locks
> alias2/foo/bar, then alias2/foo/bar/baz) and rmdir alias2/foo/bar
> (locks alias2/foo and alias2/foo/bar). Oops - we have a cycle now...

Hmm. But again, that can't actually happen here. We're in /proc. You
can't move the entries around. Also, we only changed the locking order
for the "inode is identical" case where we take only *one* lock; we
didn't change it for the cases where we take multiple locks (and order
them topologically).

So I agree that we need to avoid aliased directories in the *general*
case. I'm just arguing that for the specific case of /proc, we should
be ok. No?

                  Linus
* Re: VFS deadlock ?
  2013-03-22 0:15 ` Linus Torvalds
@ 2013-03-22 0:19 ` Linus Torvalds
  0 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2013-03-22 0:19 UTC (permalink / raw)
To: Al Viro; +Cc: Dave Jones, Linux Kernel, Eric W. Biederman

Thinking some more..

On Thu, Mar 21, 2013 at 5:15 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. But again, that can't actually happen here. We're in /proc. You
> can't move the entries around.

.. this wasn't a good argument, because we will take the locks before
we do that.

> Also, we only changed the locking order
> for the "inode is identical" case where we take only *one* lock; we
> didn't change it for the cases where we take multiple locks (and order
> them topologically).

.. and this isn't a good argument either, because your argument was
that you can get the deadlock by always taking two directories, and
never hitting the alias case itself.

Hmm.

                  Linus
end of thread, other threads: [~2013-03-22 22:58 UTC | newest]

Thread overview: 45+ messages -- links below jump to the message on this page:
2013-03-21 19:06 VFS deadlock ? Dave Jones
2013-03-21 19:21 ` Al Viro
2013-03-21 20:31 ` Dave Jones
2013-03-21 19:29 ` Al Viro
2013-03-21 20:15 ` Linus Torvalds
2013-03-21 20:26 ` Dave Jones
2013-03-21 20:32 ` Linus Torvalds
2013-03-21 20:36 ` Dave Jones
2013-03-21 20:47 ` Al Viro
2013-03-21 21:02 ` Dave Jones
2013-03-21 21:18 ` Linus Torvalds
2013-03-21 21:26 ` Al Viro
2013-03-21 21:41 ` Dave Jones
2013-03-21 21:47 ` Linus Torvalds
2013-03-21 21:55 ` Al Viro
2013-03-21 21:57 ` Linus Torvalds
2013-03-21 22:03 ` Al Viro
2013-03-21 21:52 ` Al Viro
2013-03-21 22:12 ` Dave Jones
2013-03-21 22:29 ` Dave Jones
2013-03-21 22:53 ` Linus Torvalds
2013-03-21 23:07 ` Dave Jones
2013-03-21 23:36 ` Al Viro
2013-03-21 23:58 ` Linus Torvalds
2013-03-22 0:01 ` Linus Torvalds
2013-03-22 0:12 ` Al Viro
2013-03-22 0:20 ` Al Viro
2013-03-22 0:22 ` Linus Torvalds
2013-03-22 1:22 ` Al Viro
2013-03-22 1:33 ` Linus Torvalds
2013-03-22 1:40 ` Al Viro
2013-03-22 4:37 ` [CFT] " Al Viro
2013-03-22 4:55 ` Linus Torvalds
2013-03-22 5:18 ` Al Viro
2013-03-22 5:33 ` Linus Torvalds
2013-03-22 6:09 ` Al Viro
2013-03-22 6:22 ` Al Viro
2013-03-22 16:23 ` Dave Jones
2013-03-22 19:43 ` Linus Torvalds
2013-03-22 21:28 ` Al Viro
2013-03-22 22:57 ` Eric W. Biederman
2013-03-22 5:19 ` Linus Torvalds
2013-03-22 0:08 ` Al Viro
2013-03-22 0:15 ` Linus Torvalds
2013-03-22 0:19 ` Linus Torvalds