public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* /proc/<pid>/status & task struct locking
@ 2016-04-15 16:49 Dave Jones
  2016-04-15 18:07 ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Jones @ 2016-04-15 16:49 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Linus Torvalds, Al Viro

I've been chasing this bug which shows up occasionally.
We've seen it happen all the way back to 3.10, and more recently, on 4.0
and the code affected doesn't look to have changed much since..


BUG: unable to handle kernel NULL pointer dereference at 0000000000000019
IP: [<ffffffff81442bc0>] vsnprintf+0x60/0x580
PGD 42c42a067 PUD 28a441067 PMD 0 
Oops: 0000 [#1] SMP 
CPU: 13 PID: 2364235 Comm: ps Not tainted 4.0.9 #1
task: ffff880445f16880 ti: ffff8801ac52c000 task.ti: ffff8801ac52c000
RIP: 0010:[<ffffffff81442bc0>]  [<ffffffff81442bc0>] vsnprintf+0x60/0x580
RSP: 0018:ffff8801ac52fc08  EFLAGS: 00010286
RAX: 0000000000000019 RBX: ffff88033a46720d RCX: ffffffff81981e31
RDX: 0000000000000025 RSI: ffff8801ac52fc40 RDI: ffffffff81981e18
RBP: ffff8801ac52fc78 R08: ffffffff81981e18 R09: 00000000ffffffff
R10: 0000000000000000 R11: ffff8801ac52fa9e R12: ffff88033a468000
R13: ffff8801ac52fcb0 R14: 000000000001f199 R15: ffffffff81981e18
FS:  00007f907481a880(0000) GS:ffff88046fba0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000019 CR3: 0000000342931000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 0000000000000017 0000000000000001 0000000000001000 ffffffff81977b7f
 ffff88033a46720d 0000000000000df3 ffffffff811d7b39 0000000000000000
 0000000000000296 ffff8801cffcc780 ffff8801cffcc780 ffff88046d7c7200
Call Trace:
 [<ffffffff811d7b39>] ? seq_vprintf+0x39/0x70
 [<ffffffff811d7b35>] seq_vprintf+0x35/0x70
 [<ffffffff811d7bad>] seq_printf+0x3d/0x40
 [<ffffffff8121b675>] proc_pid_status+0x735/0x7c0
 [<ffffffff811bdc32>] ? path_cleanup+0x42/0x60
 [<ffffffff812168f4>] proc_single_show+0x54/0xa0
 [<ffffffff811d746a>] seq_read+0xea/0x370
 [<ffffffff811b4cc0>] __vfs_read+0x20/0x60
 [<ffffffff811b4d86>] vfs_read+0x86/0x140
 [<ffffffff811b4e86>] SyS_read+0x46/0xb0
 [<ffffffff8175930e>] ? int_check_syscall_exit_work+0x34/0x3d
 [<ffffffff817590f2>] system_call_fastpath+0x12/0x17
Code: 89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 00 00 00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> b6 55 c8 48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff 
RIP  [<ffffffff81442bc0>] vsnprintf+0x60/0x580
 RSP <ffff8801ac52fc08>

My first thought on this was "ps is reading /proc/pid/status while the
process is exiting/crashing"

Looking at proc_pid_status(), I see a bunch of the functions it calls
take task_lock(), but there are also a bunch of bare task struct dereferences.
Wrapping those in task_lock()/task_unlock()'s might narrow the window some,
but it still seems like there would be a small opportunity for this
same scenario to exist.

What puzzles me, is that I can't make this bug reproduce for the life of me,
even by adding sleeps in proc_pid_status, so I'm wondering if I'm way off base.
Is there some other locking at play here protecting the task struct that I'm overlooking ?

	Dave

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: /proc/<pid>/status & task struct locking
  2016-04-15 16:49 /proc/<pid>/status & task struct locking Dave Jones
@ 2016-04-15 18:07 ` Linus Torvalds
  2016-04-15 18:23   ` Dave Jones
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2016-04-15 18:07 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, Linus Torvalds, Al Viro

On Fri, Apr 15, 2016 at 9:49 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
>  [<ffffffff811d7b39>] ? seq_vprintf+0x39/0x70
>  [<ffffffff811d7b35>] seq_vprintf+0x35/0x70
> Code: 89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 00 00 00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> b6 55 c8 48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff

The code disassembles to

   0: 48 89 7d b0           mov    %rdi,-0x50(%rbp)
   4: 41 0f b6 07           movzbl (%r15),%eax
   8: 0f 1f 84 00 00 00 00 nopl
  10: 84 c0                 test   %al,%al
  12: 74 43                 je     0x57
  14: 48 8d 75 c8           lea    -0x38(%rbp),%rsi
  18: 4c 89 ff             mov    %r15,%rdi
  1b: e8 30 d4 ff ff       callq  0xffffffffffffd450
  20:* 0f b6 55 c8           movzbl -0x38(%rbp),%edx <-- trapping instruction
  24: 48 63 c8             movslq %eax,%rcx

which is interesting. That "-0x38(%rbp)" was passed (by reference) to
some subroutine, and now that we try to read the value, we take a
fault.

And it makes even less sense because %rbp really seems to be not a
random register, but the frame pointer:

  RBP: ffff8801ac52fc78
  RSP: ffff8801ac52fc08

So why the *hell* do we get

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000019

for that? That makes no sense.

Quite frankly, I would not attribute this to /proc/pid/status with
this kind of insane oops.

Maybe I misread your oops, but that just all looks completely bogus.
Even if the stack got corrupted and/or unmapped, how did %cr2 get that
odd "0000000000000019" fault address? None of this makes any sense at
all to me.

What CPU is this on? There was the crazy AMD microcode bug. This looks
even more random, because now the registers look fine, and the oops
just looks bad.

Do you have other versions of the oops for this same problem?

               Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: /proc/<pid>/status & task struct locking
  2016-04-15 18:07 ` Linus Torvalds
@ 2016-04-15 18:23   ` Dave Jones
  2016-04-15 18:39     ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Jones @ 2016-04-15 18:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, Al Viro

On Fri, Apr 15, 2016 at 11:07:16AM -0700, Linus Torvalds wrote:
 > On Fri, Apr 15, 2016 at 9:49 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
 > >  [<ffffffff811d7b39>] ? seq_vprintf+0x39/0x70
 > >  [<ffffffff811d7b35>] seq_vprintf+0x35/0x70
 > > Code: 89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 00 00 00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> b6 55 c8 48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff
 > 
 > The code disassembles to
 > 
 >    0: 48 89 7d b0           mov    %rdi,-0x50(%rbp)
 >    4: 41 0f b6 07           movzbl (%r15),%eax
 >    8: 0f 1f 84 00 00 00 00 nopl
 >   10: 84 c0                 test   %al,%al
 >   12: 74 43                 je     0x57
 >   14: 48 8d 75 c8           lea    -0x38(%rbp),%rsi
 >   18: 4c 89 ff             mov    %r15,%rdi
 >   1b: e8 30 d4 ff ff       callq  0xffffffffffffd450
 >   20:* 0f b6 55 c8           movzbl -0x38(%rbp),%edx <-- trapping instruction
 >   24: 48 63 c8             movslq %eax,%rcx
 > 
 > which is interesting. That "-0x38(%rbp)" was passed (by reference) to
 > some subroutine, and now that we try to read the value, we take a
 > fault.
 > 
 > And it makes even less sense because %rbp really seems to be not a
 > random register, but the frame pointer:
 > 
 >   RBP: ffff8801ac52fc78
 >   RSP: ffff8801ac52fc08
 > 
 > So why the *hell* do we get
 > 
 >   BUG: unable to handle kernel NULL pointer dereference at 0000000000000019
 > 
 > for that? That makes no sense.

That's a really good question.

 > Quite frankly, I would not attribute this to /proc/pid/status with
 > this kind of insane oops.
 > 
 > Maybe I misread your oops, but that just all looks completely bogus.
 > Even if the stack got corrupted and/or unmapped, how did %cr2 get that
 > odd "0000000000000019" fault address? None of this makes any sense at
 > all to me.
 > 
 > What CPU is this on? There was the crazy AMD microcode bug. This looks
 > even more random, because now the registers look fine, and the oops
 > just looks bad.

It's a Xeon E5-2680 v4.

 > Do you have other versions of the oops for this same problem?

I've seen this a few times, so I'll see if I can dig up some more next week.

I'm now wondering if it's just some hardware bug though. As mentioned it's
an outlier bug, but one that seems to pop up enough times that it's been
nagging at me, in case it's also responsible for similar weird /proc traces
I've been seeing (more frequently), those have a different signature to this
though.

To put my mind at rest though, am I wrong about that absent task_lock() stuff ?

	Dave

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: /proc/<pid>/status & task struct locking
  2016-04-15 18:23   ` Dave Jones
@ 2016-04-15 18:39     ` Linus Torvalds
  0 siblings, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2016-04-15 18:39 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel, Al Viro

On Fri, Apr 15, 2016 at 11:23 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
>
> To put my mind at rest though, am I wrong about that absent task_lock() stuff ?

So the task shouldn't be going away, because we are using the
proc_single_file_operations, which use proc_single_show(), which in
turn do a

        task = get_pid_task(pid, PIDTYPE_PID);
        ...
        put_task_struct(task);

around it all.

So dereferencing the task pointer is all safe, and the only reason to
use task_lock() is if you end up doing something more complicated.

I'm not seeing anything wrong there. It does do the get_task_mm()
before touching mm fields, and the signal fields get protected by
lock_task_sighand().  The rest seems to just dereference the task
struct directly, and if those values fluctuate that's fine: you get
one or the other, no amount of locking will make /proc/pid/status give
"reliable" values in the big picture, since the user-space reader
won't have the lock anyway.

So it all looks fine to me, but I'm not saying I did some exhaustive check.

                  Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-04-15 18:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-15 16:49 /proc/<pid>/status & task struct locking Dave Jones
2016-04-15 18:07 ` Linus Torvalds
2016-04-15 18:23   ` Dave Jones
2016-04-15 18:39     ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox