* /proc/<pid>/status & task struct locking @ 2016-04-15 16:49 Dave Jones 2016-04-15 18:07 ` Linus Torvalds 0 siblings, 1 reply; 4+ messages in thread From: Dave Jones @ 2016-04-15 16:49 UTC (permalink / raw) To: Linux Kernel; +Cc: Linus Torvalds, Al Viro I've been chasing this bug which shows up occasionally. We've seen it happen all the way back to 3.10, and more recently, on 4.0 and the code affected doesn't look to have changed much since.. BUG: unable to handle kernel NULL pointer dereference at 0000000000000019 IP: [<ffffffff81442bc0>] vsnprintf+0x60/0x580 PGD 42c42a067 PUD 28a441067 PMD 0 Oops: 0000 [#1] SMP CPU: 13 PID: 2364235 Comm: ps Not tainted 4.0.9 #1 task: ffff880445f16880 ti: ffff8801ac52c000 task.ti: ffff8801ac52c000 RIP: 0010:[<ffffffff81442bc0>] [<ffffffff81442bc0>] vsnprintf+0x60/0x580 RSP: 0018:ffff8801ac52fc08 EFLAGS: 00010286 RAX: 0000000000000019 RBX: ffff88033a46720d RCX: ffffffff81981e31 RDX: 0000000000000025 RSI: ffff8801ac52fc40 RDI: ffffffff81981e18 RBP: ffff8801ac52fc78 R08: ffffffff81981e18 R09: 00000000ffffffff R10: 0000000000000000 R11: ffff8801ac52fa9e R12: ffff88033a468000 R13: ffff8801ac52fcb0 R14: 000000000001f199 R15: ffffffff81981e18 FS: 00007f907481a880(0000) GS:ffff88046fba0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000019 CR3: 0000000342931000 CR4: 00000000003406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: 0000000000000017 0000000000000001 0000000000001000 ffffffff81977b7f ffff88033a46720d 0000000000000df3 ffffffff811d7b39 0000000000000000 0000000000000296 ffff8801cffcc780 ffff8801cffcc780 ffff88046d7c7200 Call Trace: [<ffffffff811d7b39>] ? seq_vprintf+0x39/0x70 [<ffffffff811d7b35>] seq_vprintf+0x35/0x70 [<ffffffff811d7bad>] seq_printf+0x3d/0x40 [<ffffffff8121b675>] proc_pid_status+0x735/0x7c0 [<ffffffff811bdc32>] ? path_cleanup+0x42/0x60 [<ffffffff812168f4>] proc_single_show+0x54/0xa0 [<ffffffff811d746a>] seq_read+0xea/0x370 [<ffffffff811b4cc0>] __vfs_read+0x20/0x60 [<ffffffff811b4d86>] vfs_read+0x86/0x140 [<ffffffff811b4e86>] SyS_read+0x46/0xb0 [<ffffffff8175930e>] ? int_check_syscall_exit_work+0x34/0x3d [<ffffffff817590f2>] system_call_fastpath+0x12/0x17 Code: 89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 00 00 00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> b6 55 c8 48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff RIP [<ffffffff81442bc0>] vsnprintf+0x60/0x580 RSP <ffff8801ac52fc08> My first thought on this was "ps is reading /proc/pid/status while the process is exiting/crashing" Looking at proc_pid_status(), I see a bunch of the functions it calls take task_lock(), but there are also a bunch of bare task struct dereferences. Wrapping those in task_lock()/task_unlock()'s might narrow the window some, but it still seems like there would be a small opportunity for this same scenario to exist. What puzzles me, is that I can't make this bug reproduce for the life of me, even by adding sleeps in proc_pid_status, so I'm wondering if I'm way off base. Is there some other locking at play here protecting the task struct that I'm overlooking ? Dave ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: /proc/<pid>/status & task struct locking 2016-04-15 16:49 /proc/<pid>/status & task struct locking Dave Jones @ 2016-04-15 18:07 ` Linus Torvalds 2016-04-15 18:23 ` Dave Jones 0 siblings, 1 reply; 4+ messages in thread From: Linus Torvalds @ 2016-04-15 18:07 UTC (permalink / raw) To: Dave Jones, Linux Kernel, Linus Torvalds, Al Viro On Fri, Apr 15, 2016 at 9:49 AM, Dave Jones <davej@codemonkey.org.uk> wrote: > [<ffffffff811d7b39>] ? seq_vprintf+0x39/0x70 > [<ffffffff811d7b35>] seq_vprintf+0x35/0x70 > Code: 89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 00 00 00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> b6 55 c8 48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff The code disassembles to 0: 48 89 7d b0 mov %rdi,-0x50(%rbp) 4: 41 0f b6 07 movzbl (%r15),%eax 8: 0f 1f 84 00 00 00 00 nopl 10: 84 c0 test %al,%al 12: 74 43 je 0x57 14: 48 8d 75 c8 lea -0x38(%rbp),%rsi 18: 4c 89 ff mov %r15,%rdi 1b: e8 30 d4 ff ff callq 0xffffffffffffd450 20:* 0f b6 55 c8 movzbl -0x38(%rbp),%edx <-- trapping instruction 24: 48 63 c8 movslq %eax,%rcx which is interesting. That "-0x38(%rbp)" was passed (by reference) to some subroutine, and now that we try to read the value, we take a fault. And it makes even less sense because %rbp really seems to be not a random register, but the frame pointer: RBP: ffff8801ac52fc78 RSP: ffff8801ac52fc08 So why the *hell* do we get BUG: unable to handle kernel NULL pointer dereference at 0000000000000019 for that? That makes no sense. Quite frankly, I would not attribute this to /proc/pid/status with this kind of insane oops. Maybe I misread your oops, but that just all looks completely bogus. Even if the stack got corrupted and/or unmapped, how did %cr2 get that odd "0000000000000019" fault address? None of this makes any sense at all to me. What CPU is this on? There was the crazy AMD microcode bug. This looks even more random, because now the registers look fine, and the oops just looks bad. Do you have other versions of the oops for this same problem? Linus ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: /proc/<pid>/status & task struct locking 2016-04-15 18:07 ` Linus Torvalds @ 2016-04-15 18:23 ` Dave Jones 2016-04-15 18:39 ` Linus Torvalds 0 siblings, 1 reply; 4+ messages in thread From: Dave Jones @ 2016-04-15 18:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel, Al Viro On Fri, Apr 15, 2016 at 11:07:16AM -0700, Linus Torvalds wrote: > On Fri, Apr 15, 2016 at 9:49 AM, Dave Jones <davej@codemonkey.org.uk> wrote: > > [<ffffffff811d7b39>] ? seq_vprintf+0x39/0x70 > > [<ffffffff811d7b35>] seq_vprintf+0x35/0x70 > > Code: 89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 00 00 00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> b6 55 c8 48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff > > The code disassembles to > > 0: 48 89 7d b0 mov %rdi,-0x50(%rbp) > 4: 41 0f b6 07 movzbl (%r15),%eax > 8: 0f 1f 84 00 00 00 00 nopl > 10: 84 c0 test %al,%al > 12: 74 43 je 0x57 > 14: 48 8d 75 c8 lea -0x38(%rbp),%rsi > 18: 4c 89 ff mov %r15,%rdi > 1b: e8 30 d4 ff ff callq 0xffffffffffffd450 > 20:* 0f b6 55 c8 movzbl -0x38(%rbp),%edx <-- trapping instruction > 24: 48 63 c8 movslq %eax,%rcx > > which is interesting. That "-0x38(%rbp)" was passed (by reference) to > some subroutine, and now that we try to read the value, we take a > fault. > > And it makes even less sense because %rbp really seems to be not a > random register, but the frame pointer: > > RBP: ffff8801ac52fc78 > RSP: ffff8801ac52fc08 > > So why the *hell* do we get > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000019 > > for that? That makes no sense. That's a really good question. > Quite frankly, I would not attribute this to /proc/pid/status with > this kind of insane oops. > > Maybe I misread your oops, but that just all looks completely bogus. > Even if the stack got corrupted and/or unmapped, how did %cr2 get that > odd "0000000000000019" fault address? None of this makes any sense at > all to me. > > What CPU is this on? There was the crazy AMD microcode bug. This looks > even more random, because now the registers look fine, and the oops > just looks bad. It's a Xeon E5-2680 v4. > Do you have other versions of the oops for this same problem? I've seen this a few times, so I'll see if I can dig up some more next week. I'm now wondering if it's just some hardware bug though. As mentioned it's an outlier bug, but one that seems to pop up enough times that it's been nagging at me, in case it's also responsible for similar weird /proc traces I've been seeing (more frequently), those have a different signature to this though. To put my mind at rest though, am I wrong about that absent task_lock() stuff ? Dave ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: /proc/<pid>/status & task struct locking 2016-04-15 18:23 ` Dave Jones @ 2016-04-15 18:39 ` Linus Torvalds 0 siblings, 0 replies; 4+ messages in thread From: Linus Torvalds @ 2016-04-15 18:39 UTC (permalink / raw) To: Dave Jones, Linus Torvalds, Linux Kernel, Al Viro On Fri, Apr 15, 2016 at 11:23 AM, Dave Jones <davej@codemonkey.org.uk> wrote: > > To put my mind at rest though, am I wrong about that absent task_lock() stuff ? So the task shouldn't be going away, because we are using the proc_single_file_operations, which use proc_single_show(), which in turn do a task = get_pid_task(pid, PIDTYPE_PID); ... put_task_struct(task); around it all. So dereferencing the task pointer is all safe, and the only reason to use task_lock() is if you end up doing something more complicated. I'm not seeing anything wrong there. It does do the get_task_mm() before touching mm fields, and the signal fields get protected by lock_task_sighand(). The rest seems to just dereference the task struct directly, and if those values fluctuate that's fine: you get one or the other, no amount of locking will make /proc/pid/status give "reliable" values in the big picture, since the user-space reader won't have the lock anyway. So it all looks fine to me, but I'm not saying I did some exhaustive check. Linus ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2016-04-15 18:39 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-04-15 16:49 /proc/<pid>/status & task struct locking Dave Jones 2016-04-15 18:07 ` Linus Torvalds 2016-04-15 18:23 ` Dave Jones 2016-04-15 18:39 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox