* BUG in find_pid_ns
@ 2013-02-17 17:48 Sasha Levin
From: Sasha Levin @ 2013-02-17 17:48 UTC (permalink / raw)
To: Eric W. Biederman, Andrew Morton, serge.hallyn
Cc: Dave Jones, linux-kernel@vger.kernel.org
Hi all,
While fuzzing with trinity inside a KVM tools guest, running latest -next kernel,
I've stumbled on the following spew:
[ 400.345287] BUG: unable to handle kernel paging request at fffffffffffffff0
[ 400.346614] IP: [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
[ 400.347649] PGD 5429067 PUD 542b067 PMD 0
[ 400.348459] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 400.351166] Dumping ftrace buffer:
[ 400.352884] (ftrace buffer empty)
[ 400.354640] Modules linked in:
[ 400.355021] CPU 1
[ 400.355021] Pid: 6890, comm: trinity Tainted: G W 3.8.0-rc7-next-20130215-sasha-00027-gb399f44-dirty #288
[ 400.355021] RIP: 0010:[<ffffffff81131d50>] [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
[ 400.375245] RSP: 0018:ffff8800aedb5e18 EFLAGS: 00010286
[ 400.380086] RAX: 0000000000000001 RBX: 0000000000007e7d RCX: 0000000000000000
[ 400.383643] RDX: 0000000000000001 RSI: ffffffff85466e40 RDI: 0000000000000286
[ 400.383643] RBP: ffff8800aedb5e48 R08: 0000000000000001 R09: 0000000000000001
[ 400.383643] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff85466460
[ 400.383643] R13: ffff8800bf8d3928 R14: fffffffffffffff0 R15: ffff8800a5b7f140
[ 400.383643] FS: 00007faab0ad2700(0000) GS:ffff8800bb800000(0000) knlGS:0000000000000000
[ 400.383643] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 400.383643] CR2: fffffffffffffff0 CR3: 00000000b07a1000 CR4: 00000000000406e0
[ 400.383643] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 400.383643] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 400.383643] Process trinity (pid: 6890, threadinfo ffff8800aedb4000, task ffff8800b0660000)
[ 400.383643] Stack:
[ 400.383643] ffffffff85466e40 0000000000007e7d ffff8800aedb5ed8 0000000000000000
[ 400.383643] 0000000000000004 20c49ba5e353f7cf ffff8800aedb5e58 ffffffff81131e5c
[ 400.383643] ffff8800aedb5ec8 ffffffff8112400f ffffffff81123f9c 0000000000000000
[ 400.383643] Call Trace:
[ 400.383643] [<ffffffff81131e5c>] find_vpid+0x2c/0x30
[ 400.383643] [<ffffffff81123f9c>] ? kill_something_info+0x2c/0x270
[ 400.383643] [<ffffffff81125e38>] sys_kill+0x88/0xa0
[ 400.383643] [<ffffffff8107ad34>] ? syscall_trace_enter+0x24/0x2e0
[ 400.383643] [<ffffffff811813b8>] ? trace_hardirqs_on_caller+0x128/0x160
[ 400.383643] [<ffffffff83d96275>] ? tracesys+0x7e/0xe6
[ 400.383643] [<ffffffff83d962d8>] tracesys+0xe1/0xe6
[ 400.383643] Code: 4d 8b 75 00 e8 b2 0e 00 00 85 c0 0f 84 d2 00 00 00 80 3d fa 17 d5 04 00 0f 85 c5 00 00 00 e9 93 00 00 00 0f
1f 84 00 00 00 00 00 <41> 39 1e 75 2b 4d 39 66 08 75 25 41 8b 84 24 20 08 00 00 48 c1
[ 400.383643] RIP [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
[ 400.383643] RSP <ffff8800aedb5e18>
[ 400.383643] CR2: fffffffffffffff0
[ 400.383643] ---[ end trace 5bae629f658bf736 ]---
This points to:
struct pid *find_pid_ns(int nr, struct pid_namespace *ns)
{
	struct upid *pnr;

	hlist_for_each_entry_rcu(pnr,
			&pid_hash[pid_hashfn(nr, ns)], pid_chain)
		if (pnr->nr == nr && pnr->ns == ns)	<=== here
			return container_of(pnr, struct pid,
					numbers[ns->level]);

	return NULL;
}
Thanks,
Sasha
* Re: BUG in find_pid_ns
@ 2013-02-18 0:17 ` Eric W. Biederman
From: Eric W. Biederman @ 2013-02-18 0:17 UTC (permalink / raw)
To: Sasha Levin
Cc: Andrew Morton, serge.hallyn, Dave Jones,
linux-kernel@vger.kernel.org, Oleg Nesterov
Adding Oleg since he knows about as much about signals and pids as
anyone.
Sasha Levin <sasha.levin@oracle.com> writes:
> Hi all,
>
> While fuzzing with trinity inside a KVM tools guest, running latest -next kernel,
> I've stumbled on the following spew:
To my knowledge there are no in progress patches to this area of the kernel
and nothing has changed in quite a while.
The bad pointer value is 0xfffffffffffffff0. Hmm.
If you have the failure location correct it looks like a corrupted hash
entry was found while following the hash chain.
It looks like the memory has been set to -16 -EBUSY? Weird.
It smells like something is stomping on the memory of a struct pid, with
the same hash value and thus in the same hash chain as the current pid.
Can you reproduce this?
Memory corruption is hard to trace down with just a single data point.
Looking a little closer, Sasha, you have rewritten
hlist_for_each_entry_rcu, and that seems to be the most recent patch
dealing with pids; we are failing in hlist_for_each_entry_rcu.
I haven't looked at your patch in enough detail to know if you have
missed something or not, but a brand new patch and a brand new failure
certainly look suspicious at first glance.
Eric
> [ 400.345287] BUG: unable to handle kernel paging request at fffffffffffffff0
> [ 400.346614] IP: [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
> [ 400.347649] PGD 5429067 PUD 542b067 PMD 0
> [ 400.348459] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 400.351166] Dumping ftrace buffer:
> [ 400.352884] (ftrace buffer empty)
> [ 400.354640] Modules linked in:
> [ 400.355021] CPU 1
> [ 400.355021] Pid: 6890, comm: trinity Tainted: G W 3.8.0-rc7-next-20130215-sasha-00027-gb399f44-dirty #288
> [ 400.355021] RIP: 0010:[<ffffffff81131d50>] [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
> [ 400.375245] RSP: 0018:ffff8800aedb5e18 EFLAGS: 00010286
> [ 400.380086] RAX: 0000000000000001 RBX: 0000000000007e7d RCX: 0000000000000000
> [ 400.383643] RDX: 0000000000000001 RSI: ffffffff85466e40 RDI: 0000000000000286
> [ 400.383643] RBP: ffff8800aedb5e48 R08: 0000000000000001 R09: 0000000000000001
> [ 400.383643] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff85466460
> [ 400.383643] R13: ffff8800bf8d3928 R14: fffffffffffffff0 R15: ffff8800a5b7f140
> [ 400.383643] FS: 00007faab0ad2700(0000) GS:ffff8800bb800000(0000) knlGS:0000000000000000
> [ 400.383643] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 400.383643] CR2: fffffffffffffff0 CR3: 00000000b07a1000 CR4: 00000000000406e0
> [ 400.383643] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 400.383643] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 400.383643] Process trinity (pid: 6890, threadinfo ffff8800aedb4000, task ffff8800b0660000)
> [ 400.383643] Stack:
> [ 400.383643] ffffffff85466e40 0000000000007e7d ffff8800aedb5ed8 0000000000000000
> [ 400.383643] 0000000000000004 20c49ba5e353f7cf ffff8800aedb5e58 ffffffff81131e5c
> [ 400.383643] ffff8800aedb5ec8 ffffffff8112400f ffffffff81123f9c 0000000000000000
> [ 400.383643] Call Trace:
> [ 400.383643] [<ffffffff81131e5c>] find_vpid+0x2c/0x30
> [ 400.383643] [<ffffffff81123f9c>] ? kill_something_info+0x2c/0x270
> [ 400.383643] [<ffffffff81125e38>] sys_kill+0x88/0xa0
> [ 400.383643] [<ffffffff8107ad34>] ? syscall_trace_enter+0x24/0x2e0
> [ 400.383643] [<ffffffff811813b8>] ? trace_hardirqs_on_caller+0x128/0x160
> [ 400.383643] [<ffffffff83d96275>] ? tracesys+0x7e/0xe6
> [ 400.383643] [<ffffffff83d962d8>] tracesys+0xe1/0xe6
> [ 400.383643] Code: 4d 8b 75 00 e8 b2 0e 00 00 85 c0 0f 84 d2 00 00 00 80 3d fa 17 d5 04 00 0f 85 c5 00 00 00 e9 93 00 00 00 0f
> 1f 84 00 00 00 00 00 <41> 39 1e 75 2b 4d 39 66 08 75 25 41 8b 84 24 20 08 00 00 48 c1
> [ 400.383643] RIP [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
> [ 400.383643] RSP <ffff8800aedb5e18>
> [ 400.383643] CR2: fffffffffffffff0
> [ 400.383643] ---[ end trace 5bae629f658bf736 ]---
>
> This points out to:
>
> struct pid *find_pid_ns(int nr, struct pid_namespace *ns)
> {
> struct upid *pnr;
>
> hlist_for_each_entry_rcu(pnr,
> &pid_hash[pid_hashfn(nr, ns)], pid_chain)
> if (pnr->nr == nr && pnr->ns == ns) <=== here
> return container_of(pnr, struct pid,
> numbers[ns->level]);
>
> return NULL;
> }
>
>
> Thanks,
> Sasha
* Re: BUG in find_pid_ns
@ 2013-02-18 2:51 ` Sasha Levin
From: Sasha Levin @ 2013-02-18 2:51 UTC (permalink / raw)
To: ebiederm
Cc: Andrew Morton, serge.hallyn, Dave Jones,
linux-kernel@vger.kernel.org, Oleg Nesterov
On 02/17/2013 07:17 PM, ebiederm@xmission.com wrote:
> The bad pointer value is 0xfffffffffffffff0. Hmm.
>
> If you have the failure location correct it looks like a corrupted hash
> entry was found while following the hash chain.
>
> It looks like the memory has been set to -16 -EBUSY? Weird.
>
> It smells like something is stomping on the memory of a struct pid, with
> the same hash value and thus in the same hash chain as the current pid.
>
> Can you reproduce this?
I've just reproduced it again:
[ 2404.518957] BUG: unable to handle kernel paging request at fffffffffffffff0
[ 2404.520024] IP: [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
[ 2404.520024] PGD 5429067 PUD 542b067 PMD 0
[ 2404.520024] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 2404.520024] Dumping ftrace buffer:
[ 2404.520024] (ftrace buffer empty)
[ 2404.520024] Modules linked in:
[ 2404.520024] CPU 3
[ 2404.520024] Pid: 6890, comm: trinity Tainted: G W 3.8.0-rc7-next-20130215-sasha-00027-gb399f44-dirty #288
[ 2404.520024] RIP: 0010:[<ffffffff81131d50>] [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
[ 2404.520024] RSP: 0018:ffff8800af1dfe18 EFLAGS: 00010286
[ 2404.520024] RAX: 0000000000000001 RBX: 0000000000004b72 RCX: 0000000000000000
[ 2404.520024] RDX: 0000000000000001 RSI: ffffffff85466e40 RDI: 0000000000000286
[ 2404.520024] RBP: ffff8800af1dfe48 R08: 0000000000000001 R09: 0000000000000001
[ 2404.520024] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff85466460
[ 2404.520024] R13: ffff8800bf8d3ef8 R14: fffffffffffffff0 R15: ffff8800a43d9a40
[ 2404.520024] FS: 00007f8300f79700(0000) GS:ffff8800bbc00000(0000) knlGS:0000000000000000
[ 2404.520024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2404.520024] CR2: fffffffffffffff0 CR3: 00000000af0b7000 CR4: 00000000000406e0
[ 2404.520024] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2404.520024] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2404.520024] Process trinity (pid: 6890, threadinfo ffff8800af1de000, task ffff8800b060b000)
[ 2404.520024] Stack:
[ 2404.520024] ffffffff85466e40 0000000000004b72 ffff8800af1dfed8 0000000000000000
[ 2404.520024] 0000000000000003 20c49ba5e353f7cf ffff8800af1dfe58 ffffffff81131e5c
[ 2404.520024] ffff8800af1dfec8 ffffffff8112400f ffffffff81123f9c 0000000000000000
[ 2404.520024] Call Trace:
[ 2404.520024] [<ffffffff81131e5c>] find_vpid+0x2c/0x30
[ 2404.520024] [<ffffffff8112400f>] kill_something_info+0x9f/0x270
[ 2404.673395] [<ffffffff81123f9c>] ? kill_something_info+0x2c/0x270
[ 2404.673395] [<ffffffff81125e38>] sys_kill+0x88/0xa0
[ 2404.673395] [<ffffffff8107ad34>] ? syscall_trace_enter+0x24/0x2e0
[ 2404.694324] [<ffffffff811813b8>] ? trace_hardirqs_on_caller+0x128/0x160
[ 2404.694324] [<ffffffff83d96275>] ? tracesys+0x7e/0xe6
[ 2404.694324] [<ffffffff83d962d8>] tracesys+0xe1/0xe6
[ 2404.694324] Code: 4d 8b 75 00 e8 b2 0e 00 00 85 c0 0f 84 d2 00 00 00 80 3d fa 17 d5 04 00 0f 85 c5 00 00 00 e9 93 00 00 00 0f
1f 84 00 00 00 00 00 <41> 39 1e 75 2b 4d 39 66 08 75 25 41 8b 84 24 20 08 00 00 48 c1
[ 2404.733487] RIP [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
[ 2404.740299] RSP <ffff8800af1dfe18>
[ 2404.740299] CR2: fffffffffffffff0
[ 2404.740299] ---[ end trace 9f8bc22bbe4fe990 ]---
I'm not sure what debug info I could throw in that would be helpful. Dump
the entire chain or table if 'pnr' happens to look odd?
> Memory corruption is hard to trace down with just a single data point.
>
> Looking a little closer Sasha you have rewritten
> hlist_for_each_entry_rcu, and that seems to be the most recent patch
> dealing with pids, and we are failing in hlist_for_each_entry_rcu.
>
> I haven't looked at your patch in enough detail to know if you have
> missed something or not, but a brand new patch and a brand new failure
> certainly look suspicious at first glance.
Agreed, I also took a second look at it when this BUG popped up. What
surprises me is that if the new iteration were broken, the kernel
would break spectacularly in a bunch of places instead of failing in the
exact same place twice.
Not ruling out the possibility that it's broken, though.
Thanks,
Sasha
* Re: BUG in find_pid_ns
@ 2013-02-19 4:38 ` Eric W. Biederman
From: Eric W. Biederman @ 2013-02-19 4:38 UTC (permalink / raw)
To: Sasha Levin
Cc: Andrew Morton, serge.hallyn, Dave Jones,
linux-kernel@vger.kernel.org, Oleg Nesterov
Sasha Levin <sasha.levin@oracle.com> writes:
> On 02/17/2013 07:17 PM, ebiederm@xmission.com wrote:
>> The bad pointer value is 0xfffffffffffffff0. Hmm.
>>
>> If you have the failure location correct it looks like a corrupted hash
>> entry was found while following the hash chain.
>>
>> It looks like the memory has been set to -16 -EBUSY? Weird.
>>
>> It smells like something is stomping on the memory of a struct pid, with
>> the same hash value and thus in the same hash chain as the current pid.
>>
>> Can you reproduce this?
>
> I've just reproduced it again:
>
> [ 2404.518957] BUG: unable to handle kernel paging request at fffffffffffffff0
> [ 2404.520024] IP: [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
> [ 2404.520024] PGD 5429067 PUD 542b067 PMD 0
> [ 2404.520024] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 2404.520024] Dumping ftrace buffer:
> [ 2404.520024] (ftrace buffer empty)
> [ 2404.520024] Modules linked in:
> [ 2404.520024] CPU 3
> [ 2404.520024] Pid: 6890, comm: trinity Tainted: G W 3.8.0-rc7-next-20130215-sasha-00027-gb399f44-dirty #288
> [ 2404.520024] RIP: 0010:[<ffffffff81131d50>] [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
> [ 2404.520024] RSP: 0018:ffff8800af1dfe18 EFLAGS: 00010286
> [ 2404.520024] RAX: 0000000000000001 RBX: 0000000000004b72 RCX: 0000000000000000
> [ 2404.520024] RDX: 0000000000000001 RSI: ffffffff85466e40 RDI: 0000000000000286
> [ 2404.520024] RBP: ffff8800af1dfe48 R08: 0000000000000001 R09: 0000000000000001
> [ 2404.520024] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff85466460
> [ 2404.520024] R13: ffff8800bf8d3ef8 R14: fffffffffffffff0 R15: ffff8800a43d9a40
> [ 2404.520024] FS: 00007f8300f79700(0000) GS:ffff8800bbc00000(0000) knlGS:0000000000000000
> [ 2404.520024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2404.520024] CR2: fffffffffffffff0 CR3: 00000000af0b7000 CR4: 00000000000406e0
> [ 2404.520024] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2404.520024] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 2404.520024] Process trinity (pid: 6890, threadinfo ffff8800af1de000, task ffff8800b060b000)
> [ 2404.520024] Stack:
> [ 2404.520024] ffffffff85466e40 0000000000004b72 ffff8800af1dfed8 0000000000000000
> [ 2404.520024] 0000000000000003 20c49ba5e353f7cf ffff8800af1dfe58 ffffffff81131e5c
> [ 2404.520024] ffff8800af1dfec8 ffffffff8112400f ffffffff81123f9c 0000000000000000
> [ 2404.520024] Call Trace:
> [ 2404.520024] [<ffffffff81131e5c>] find_vpid+0x2c/0x30
> [ 2404.520024] [<ffffffff8112400f>] kill_something_info+0x9f/0x270
> [ 2404.673395] [<ffffffff81123f9c>] ? kill_something_info+0x2c/0x270
> [ 2404.673395] [<ffffffff81125e38>] sys_kill+0x88/0xa0
> [ 2404.673395] [<ffffffff8107ad34>] ? syscall_trace_enter+0x24/0x2e0
> [ 2404.694324] [<ffffffff811813b8>] ? trace_hardirqs_on_caller+0x128/0x160
> [ 2404.694324] [<ffffffff83d96275>] ? tracesys+0x7e/0xe6
> [ 2404.694324] [<ffffffff83d962d8>] tracesys+0xe1/0xe6
> [ 2404.694324] Code: 4d 8b 75 00 e8 b2 0e 00 00 85 c0 0f 84 d2 00 00 00 80 3d fa 17 d5 04 00 0f 85 c5 00 00 00 e9 93 00 00 00 0f
> 1f 84 00 00 00 00 00 <41> 39 1e 75 2b 4d 39 66 08 75 25 41 8b 84 24 20 08 00 00 48 c1
> [ 2404.733487] RIP [<ffffffff81131d50>] find_pid_ns+0x110/0x1f0
> [ 2404.740299] RSP <ffff8800af1dfe18>
> [ 2404.740299] CR2: fffffffffffffff0
> [ 2404.740299] ---[ end trace 9f8bc22bbe4fe990 ]---
>
> I'm not sure what debug info I could throw in which will be helpful. Dump
> the entire chain or table if 'pnr' happens to look odd?
>
>> Memory corruption is hard to trace down with just a single data point.
>>
> Looking a little closer, Sasha, you have rewritten
> hlist_for_each_entry_rcu, and that seems to be the most recent patch
> dealing with pids; we are failing in hlist_for_each_entry_rcu.
>>
>> I haven't looked at your patch in enough detail to know if you have
>> missed something or not, but a brand new patch and a brand new failure
>> certainly look suspicious at first glance.
>
> Agreed, I also took a second look at it when this BUG popped up. What
> surprises me is that if the new iteration were broken, the kernel
> would break spectacularly in a bunch of places instead of failing in the
> exact same place twice.
>
> Not ruling out the possibility that it's broken, though.
I don't see any obvious problems with your code, however. struct upid is:
struct upid {
	/* Try to keep pid_chain in the same cacheline as nr for find_vpid */
	int nr;
	struct pid_namespace *ns;
	struct hlist_node pid_chain;
};
Which puts pid_chain at offset 16. nr is at offset 0 and is the first
field we access.
Trying to access a upid at address fffffffffffffff0 looks for all the
world like hlist_entry_safe decremented a NULL pointer by 16 bytes.
I suggest you take a look at the assembly. On occasion gcc
optimizations get smart and optimize out things we would prefer they
kept.
As for the rest, perhaps there is something odd about the optimization
of pid_nr_ns, or perhaps kill_something_info is the only frequent test
where we try to access something that does not exist.
Eric