* general protection fault on finalizing task
@ 2012-06-14 8:03 Andrey Vagin
2012-06-14 16:01 ` Oleg Nesterov
0 siblings, 1 reply; 6+ messages in thread
From: Andrey Vagin @ 2012-06-14 8:03 UTC (permalink / raw)
To: LKML, Oleg Nesterov, Andrew Morton, Cyrill Gorcunov,
Pavel Emelyanov
Hello,
I'm developing CRIU (criu.org) and got this GP fault. I have seen it a few
times, always with the same stack trace.
It does not reproduce on 3.4.0-rc4+.
general protection fault: 0000 [#1] SMP
CPU 0
Modules linked in: udp_diag bridge stp llc ipv6 ext4 jbd2 dm_mirror
dm_region_hash dm_log dm_mod pcspkr virtio_balloon 8139too 8139cp mii
i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring
virtio pata_acpi ata_generic ata_piix floppy [last unloaded:
scsi_wait_scan]
Pid: 1647, comm: crtools Not tainted 3.5.0-rc2+ #203 Red Hat KVM
RIP: 0010:[<ffffffff811b453a>] [<ffffffff811b453a>] d_hash_and_lookup+0x2a/0x70
RSP: 0018:ffff88001651bd28 EFLAGS: 00010246
RAX: 0000000000003531 RBX: ffff88001651bd68 RCX: 0000000000000010
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000003531
RBP: ffff88001651bd38 R08: 000000000000fffa R09: 0000000000000002
R10: 0000000000000000 R11: 000000000000fffd R12: 6b6b6b6b6b6b6b6b
R13: ffff88001a3b3db0 R14: ffff88001651bd68 R15: 000000000000000f
FS: 00007ff80c4a2700(0000) GS:ffff88001f800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ff80c4ac000 CR3: 0000000001a0b000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process crtools (pid: 1647, threadinfo ffff88001651a000, task ffff880017154c40)
Stack:
ffff88001651bd78 0000000000000001 ffff88001651bdc8 ffffffff812050c0
ffff8800185b44b0 ffff88001721e4a0 ffff88001721e4a0 0000000f81057b6c
0000000200003531 ffff88001651bd78 ffff880032003531 0000000000000246
Call Trace:
[<ffffffff812050c0>] proc_flush_task+0xa0/0x1e0
[<ffffffff81057c0e>] release_task+0xce/0x690
[<ffffffff81057b6c>] ? release_task+0x2c/0x690
[<ffffffff810622c2>] exit_ptrace+0x102/0x140
[<ffffffff81059c64>] do_exit+0x214/0xa70
[<ffffffff81553cbb>] ? _raw_read_unlock+0x2b/0x50
[<ffffffff8105a51b>] do_group_exit+0x5b/0xd0
[<ffffffff8105a5a7>] sys_exit_group+0x17/0x20
[<ffffffff8155cee9>] system_call_fastpath+0x16/0x1b
Code: 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 66 66 66
66 90 48 89 f3 49 89 fc 8b 76 04 48 8b 7b 08 e8 58 0c ff ff 89 03 <41>
f6 04 24 01 75 1f 48 89 de 4c 89 e7 e8 64 ff ff ff 48 8b 1c
RIP [<ffffffff811b453a>] d_hash_and_lookup+0x2a/0x70
RSP <ffff88001651bd28>
---[ end trace 250bb1fa95f4b805 ]---
Fixing recursive fault but reboot is needed!
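A note on reading the register dump above: R12 holds 6b6b6b6b6b6b6b6b, and the faulting bytes marked <41> f6 04 24 01 in the Code: line decode to a byte test through %r12. 0x6b is the byte that the kernel's slab debugging writes over freed objects (POISON_FREE), so the dump suggests, although nobody in the thread states it, that d_hash_and_lookup() was handed a pointer read from freed memory. Such a value is also a non-canonical x86-64 address, which is consistent with a general protection fault rather than a page fault. Below is a minimal C sketch of both checks. The function names are hypothetical and 48-bit virtual addressing is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte that slab debugging writes over freed objects (POISON_FREE, 0x6b). */
#define SLAB_POISON_FREE 0x6bU

/* Hypothetical helper: true if every byte of val is the freed-object poison. */
static bool looks_like_freed_poison(uint64_t val)
{
	for (int i = 0; i < 8; i++) {
		if (((val >> (8 * i)) & 0xff) != SLAB_POISON_FREE)
			return false;
	}
	return true;
}

/*
 * Hypothetical helper: true if addr is canonical under 48-bit virtual
 * addressing, i.e. bits 63..47 are all zeros or all ones. The 48-bit
 * width is an assumption about the CPU mode, not something the oops says.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return top == 0 || top == 0x1ffff;
}
```

Applied to the dump, R12 passes the poison check and fails the canonical check, while RBX (ffff88001651bd68) is an ordinary canonical kernel address.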
Steps to reproduce:
* # git clone git://github.com/avagin/crtools.git -b gp-3.5
* # cd crtools
* # make && make -C test
* # while :; do bash test/zdtm.sh pidns/static/session00 || break; done
* Wait a few seconds
session00 is a test case that checks that session IDs are restored correctly.
It creates about 10 processes in a separate pidns; some of them wait for
children, the others wait on a read from a pipe. crtools freezes these
processes, dumps their state, and then kills them.
The bug reproduces when crtools tries to kill the tasks (at that moment
crtools is attached to them via ptrace).
The meta code looks like:
for_each_task(pid) {
	kill(pid, SIGKILL);
	ptrace(PTRACE_DETACH, pid, NULL, NULL);
}
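One iteration of the meta code above can be sketched in plain C as follows. This is not crtools code: the helper names are hypothetical, the tracee is a freshly forked child rather than a task in a separate pidns, and whether this small case alone triggers the fault is not claimed. It only illustrates the call sequence, including the fact that PTRACE_DETACH may fail with ESRCH once the tracee has been hit by SIGKILL.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a sleeping child and attach to it with ptrace.
 * Returns the child's pid, or -1 if forking or attaching failed. */
static pid_t spawn_traced_child(void)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		for (;;)
			pause();	/* child: sleep until killed */
	}
	/* Attach, then wait for the stop that the attach generates. */
	if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0 ||
	    waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status)) {
		kill(pid, SIGKILL);
		waitpid(pid, NULL, 0);
		return -1;
	}
	return pid;
}

/* One loop iteration: SIGKILL, then PTRACE_DETACH.
 * The tracee may already be dying when we detach, so ESRCH is tolerated. */
static int kill_and_detach(pid_t pid)
{
	if (kill(pid, SIGKILL) < 0)
		return -1;
	if (ptrace(PTRACE_DETACH, pid, NULL, NULL) < 0 && errno != ESRCH)
		return -1;
	return 0;
}

/* Reap the child; return the signal that terminated it, or -1. */
static int reap_signal(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A caller would run spawn_traced_child(), pass the returned pid to kill_and_detach(), and then expect reap_signal() to report SIGKILL. Attaching can be refused on systems that restrict ptrace, in which case spawn_traced_child() returns -1.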
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: general protection fault on finalizing task
2012-06-14 8:03 general protection fault on finalizing task Andrey Vagin
@ 2012-06-14 16:01 ` Oleg Nesterov
2012-06-14 20:37 ` Andrew Wagin
2012-06-14 21:05 ` Andrew Wagin
0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2012-06-14 16:01 UTC (permalink / raw)
To: Andrey Vagin; +Cc: LKML, Andrew Morton, Cyrill Gorcunov, Pavel Emelyanov
Hi Andrey,
On 06/14, Andrey Vagin wrote:
>
> Hello,
>
> I'm developing CRIU (criu.org) and got this GP fault. I have seen it a few
> times, always with the same stack trace.
> It does not reproduce on 3.4.0-rc4+.
>
> general protection fault: 0000 [#1] SMP
> CPU 0
> Modules linked in: udp_diag bridge stp llc ipv6 ext4 jbd2 dm_mirror
> dm_region_hash dm_log dm_mod pcspkr virtio_balloon 8139too 8139cp mii
> i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring
> virtio pata_acpi ata_generic ata_piix floppy [last unloaded:
> scsi_wait_scan]
>
> Pid: 1647, comm: crtools Not tainted 3.5.0-rc2+ #203 Red Hat KVM
> RIP: 0010:[<ffffffff811b453a>] [<ffffffff811b453a>] d_hash_and_lookup+0x2a/0x70
Could you please re-test with these
http://marc.info/?l=linux-mm-commits&m=133962463616232
http://marc.info/?l=linux-mm-commits&m=133962463616231
patches applied?
> RSP: 0018:ffff88001651bd28 EFLAGS: 00010246
> RAX: 0000000000003531 RBX: ffff88001651bd68 RCX: 0000000000000010
> RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000003531
> RBP: ffff88001651bd38 R08: 000000000000fffa R09: 0000000000000002
> R10: 0000000000000000 R11: 000000000000fffd R12: 6b6b6b6b6b6b6b6b
> R13: ffff88001a3b3db0 R14: ffff88001651bd68 R15: 000000000000000f
> FS: 00007ff80c4a2700(0000) GS:ffff88001f800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007ff80c4ac000 CR3: 0000000001a0b000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process crtools (pid: 1647, threadinfo ffff88001651a000, task ffff880017154c40)
> Stack:
> ffff88001651bd78 0000000000000001 ffff88001651bdc8 ffffffff812050c0
> ffff8800185b44b0 ffff88001721e4a0 ffff88001721e4a0 0000000f81057b6c
> 0000000200003531 ffff88001651bd78 ffff880032003531 0000000000000246
> Call Trace:
> [<ffffffff812050c0>] proc_flush_task+0xa0/0x1e0
> [<ffffffff81057c0e>] release_task+0xce/0x690
> [<ffffffff81057b6c>] ? release_task+0x2c/0x690
> [<ffffffff810622c2>] exit_ptrace+0x102/0x140
> [<ffffffff81059c64>] do_exit+0x214/0xa70
> [<ffffffff81553cbb>] ? _raw_read_unlock+0x2b/0x50
> [<ffffffff8105a51b>] do_group_exit+0x5b/0xd0
> [<ffffffff8105a5a7>] sys_exit_group+0x17/0x20
> [<ffffffff8155cee9>] system_call_fastpath+0x16/0x1b
> Code: 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 66 66 66
> 66 90 48 89 f3 49 89 fc 8b 76 04 48 8b 7b 08 e8 58 0c ff ff 89 03 <41>
> f6 04 24 01 75 1f 48 89 de 4c 89 e7 e8 64 ff ff ff 48 8b 1c
> RIP [<ffffffff811b453a>] d_hash_and_lookup+0x2a/0x70
> RSP <ffff88001651bd28>
> ---[ end trace 250bb1fa95f4b805 ]---
> Fixing recursive fault but reboot is needed!
>
> Steps to reproduce:
> * # git clone git://github.com/avagin/crtools.git -b gp-3.5
> * # cd crtools
> * # make && make -C test
> * # while :; do bash test/zdtm.sh pidns/static/session00 || break; done
> * Wait a few seconds
>
> session00 is a test case that checks that session IDs are restored correctly.
> It creates about 10 processes in a separate pidns; some of them wait for
> children, the others wait on a read from a pipe. crtools freezes these
> processes, dumps their state, and then kills them.
>
> The bug reproduces when crtools tries to kill the tasks (at that moment
> crtools is attached to them via ptrace).
> The meta code looks like:
> for_each_task(pid) {
> kill(pid, SIGKILL);
> ptrace(PTRACE_DETACH, pid, NULL, NULL);
> }
* Re: general protection fault on finalizing task
2012-06-14 16:01 ` Oleg Nesterov
@ 2012-06-14 20:37 ` Andrew Wagin
2012-06-15 9:42 ` Oleg Nesterov
2012-06-14 21:05 ` Andrew Wagin
1 sibling, 1 reply; 6+ messages in thread
From: Andrew Wagin @ 2012-06-14 20:37 UTC (permalink / raw)
To: Oleg Nesterov
Cc: LKML, Andrew Morton, Cyrill Gorcunov, Pavel Emelyanov,
Eric W. Biederman
Oleg, thank you for the response. I'm going to test your patches.
FYI: I bisected this problem.
# git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[3208450488ae724196f1efffc457e4265957c04e] pidns: use task_active_pid_ns in do_notify_parent

commit 3208450488ae724196f1efffc457e4265957c04e
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Thu May 31 16:26:39 2012 -0700

    pidns: use task_active_pid_ns in do_notify_parent

    Using task_active_pid_ns is more robust because it works even after we
    have called exit_namespaces. This change allows us to have parent
    processes that are zombies. Normally a zombie parent processes is crazy
    and the last thing you would want to have but in the case of not letting
    the init process of a pid namespace be reaped until all of it's children
    are dead and reaped a zombie parent process is exactly what we want.

    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Pavel Emelyanov <xemul@parallels.com>
    Cc: Cyrill Gorcunov <gorcunov@openvz.org>
    Cc: Louis Rilling <louis.rilling@kerlabs.com>
    Cc: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Re: general protection fault on finalizing task
2012-06-14 16:01 ` Oleg Nesterov
2012-06-14 20:37 ` Andrew Wagin
@ 2012-06-14 21:05 ` Andrew Wagin
2012-06-14 22:28 ` Andrew Morton
1 sibling, 1 reply; 6+ messages in thread
From: Andrew Wagin @ 2012-06-14 21:05 UTC (permalink / raw)
To: Oleg Nesterov
Cc: LKML, Andrew Morton, Cyrill Gorcunov, Pavel Emelyanov,
Eric W. Biederman
>
> Could you please re-test with these
>
> http://marc.info/?l=linux-mm-commits&m=133962463616232
> http://marc.info/?l=linux-mm-commits&m=133962463616231
>
> patches applied?
Yes. They fixed the bug. Thanks.
* Re: general protection fault on finalizing task
2012-06-14 21:05 ` Andrew Wagin
@ 2012-06-14 22:28 ` Andrew Morton
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2012-06-14 22:28 UTC (permalink / raw)
To: Andrew Wagin
Cc: Oleg Nesterov, LKML, Cyrill Gorcunov, Pavel Emelyanov,
Eric W. Biederman
On Fri, 15 Jun 2012 01:05:51 +0400 Andrew Wagin <avagin@gmail.com> wrote:
> >
> > Could you please re-test with these
> >
> > http://marc.info/?l=linux-mm-commits&m=133962463616232
> > http://marc.info/?l=linux-mm-commits&m=133962463616231
> >
> > patches applied?
>
> Yes. They fixed the bug. Thanks.
OK, thanks. I didn't actually have those queued for 3.5. Do now.
I'll get them into Linus next week.
* Re: general protection fault on finalizing task
2012-06-14 20:37 ` Andrew Wagin
@ 2012-06-15 9:42 ` Oleg Nesterov
0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2012-06-15 9:42 UTC (permalink / raw)
To: Andrew Wagin
Cc: LKML, Andrew Morton, Cyrill Gorcunov, Pavel Emelyanov,
Eric W. Biederman
Hi Andrew,
Thanks a lot for testing. I guess we need these fixes in 3.5.
But I am puzzled...
On 06/15, Andrew Wagin wrote:
>
> FYI: I bisected this problem.
>
> # git bisect bad
> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> [3208450488ae724196f1efffc457e4265957c04e] pidns: use
> task_active_pid_ns in do_notify_parent
>
> commit 3208450488ae724196f1efffc457e4265957c04e
> Author: Eric W. Biederman <ebiederm@xmission.com>
> Date: Thu May 31 16:26:39 2012 -0700
>
> pidns: use task_active_pid_ns in do_notify_parent
Impossible ;) I think. I'd say it should be the next change
00c10bc13cdb58447d6bb2a003afad7bd60f5a5f
"pidns: make killed children autoreap"
which is fine by itself, but makes the problem (hopefully fixed
by -mm patches) more visible.
Oleg.