linux-kernel.vger.kernel.org archive mirror
* kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)
@ 2016-06-27  5:22 Andy Lutomirski
From: Andy Lutomirski @ 2016-06-27  5:22 UTC
  To: Linus Torvalds, Peter Zijlstra, Oleg Nesterov, Tejun Heo
  Cc: Andy Lutomirski, LKP, LKML, kernel test robot

My v4 series was doing pretty well until this explosion:

On Sun, Jun 26, 2016 at 9:41 PM, kernel test robot
<xiaolong.ye@intel.com> wrote:
>
>
> FYI, we noticed the following commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git x86/vmap_stack
> commit 26424589626d7f82d09d4e7c0569f9487b2e810a ("[DEBUG] force-enable CONFIG_VMAP_STACK")
>

...

> [    4.425052] BUG: unable to handle kernel paging request at ffffc90000997f18
> [    4.426645] IP: [<ffffffff81a9ace0>] _raw_spin_lock_irq+0x2c/0x3d
> [    4.427869] PGD 1249e067 PUD 1249f067 PMD 11e4e067 PTE 0
> [    4.429245] Oops: 0002 [#1] SMP
> [    4.430086] Modules linked in:
> [    4.430992] CPU: 0 PID: 1741 Comm: mount Not tainted 4.7.0-rc4-00258-g26424589 #1
> [    4.432727] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
> [    4.434646] task: ffff88000d950c80 ti: ffff88000d950c80 task.ti: ffff88000d950c80

Yeah, this line is meaningless with the thread_info cleanups, and I
have it fixed for v5.

> [    4.436406] RIP: 0010:[<ffffffff81a9ace0>]  [<ffffffff81a9ace0>] _raw_spin_lock_irq+0x2c/0x3d
> [    4.438341] RSP: 0018:ffffc90000957c80  EFLAGS: 00010046
> [    4.439438] RAX: 0000000000000000 RBX: 7fffffffffffffff RCX: 0000000000000a66
> [    4.440735] RDX: 0000000000000001 RSI: ffff880013619bc0 RDI: ffffc90000997f18
> [    4.442035] RBP: ffffc90000957c88 R08: 0000000000019bc0 R09: ffffffff81200748
> [    4.443323] R10: ffffea0000474900 R11: 000000000001a2a0 R12: ffffc90000997f10
> [    4.444614] R13: 0000000000000002 R14: ffffc90000997f18 R15: 00000000ffffffea
> [    4.445896] FS:  00007f9ca6a32700(0000) GS:ffff880013600000(0000) knlGS:0000000000000000
> [    4.447690] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.448819] CR2: ffffc90000997f18 CR3: 000000000d87c000 CR4: 00000000000006f0
> [    4.450102] Stack:
> [    4.450810]  ffffc90000997f18 ffffc90000957d00 ffffffff81a982eb 0000000000000246
> [    4.452827]  0000000000000000 ffffc90000957d00 ffffffff8112584b 0000000000000000
> [    4.454838]  0000000000000246 ffff88000e27f6bc 0000000000000000 ffff88000e27f080
> [    4.456845] Call Trace:
> [    4.457616]  [<ffffffff81a982eb>] wait_for_common+0x44/0x197
> [    4.458719]  [<ffffffff8112584b>] ? try_to_wake_up+0x2dd/0x2ef
> [    4.459877]  [<ffffffff81a9845b>] wait_for_completion+0x1d/0x1f
> [    4.461027]  [<ffffffff8111db10>] kthread_stop+0x82/0x10a
> [    4.462125]  [<ffffffff81117f08>] destroy_workqueue+0x10d/0x1cd
> [    4.463347]  [<ffffffff81445236>] xfs_destroy_mount_workqueues+0x49/0x64
> [    4.464620]  [<ffffffff81445c03>] xfs_fs_fill_super+0x2c0/0x49c
> [    4.465807]  [<ffffffff8123547a>] mount_bdev+0x143/0x195
> [    4.466937]  [<ffffffff81445943>] ? xfs_test_remount_options+0x5b/0x5b
> [    4.468727]  [<ffffffff81444568>] xfs_fs_mount+0x15/0x17
> [    4.469838]  [<ffffffff8123614a>] mount_fs+0x15/0x8c
> [    4.470882]  [<ffffffff8124cfc4>] vfs_kern_mount+0x6a/0xfe
> [    4.472005]  [<ffffffff8124fc2f>] do_mount+0x985/0xa9a
> [    4.473078]  [<ffffffff811e0846>] ? strndup_user+0x3a/0x6a
> [    4.474193]  [<ffffffff8124ff6a>] SyS_mount+0x77/0x9f
> [    4.475255]  [<ffffffff81a9b081>] entry_SYSCALL_64_fastpath+0x1f/0xbd
> [    4.476463] Code: 66 66 66 90 55 48 89 e5 50 48 89 7d f8 fa 66 66 90 66 66 90 e8 2d 0a 70 ff 65 ff 05 73 18 57 7e 31 c0 ba 01 00 00 00 48 8b 7d f8 <f0> 0f b1 17 85 c0 74 07 89 c6 e8 3e 20 6a ff c9 c3 66 66 66 66
> [    4.484413] RIP  [<ffffffff81a9ace0>] _raw_spin_lock_irq+0x2c/0x3d
> [    4.485639]  RSP <ffffc90000957c80>
> [    4.486509] CR2: ffffc90000997f18
> [    4.487366] ---[ end trace 79763b41869f2580 ]---
> [    4.488367] Kernel panic - not syncing: Fatal exception
>

kthread_stop is *sick*.

    struct kthread self;

...

    current->vfork_done = &self.exited;

...

    do_exit(ret);

And then some other thread goes and waits for the completion, which is
*on the stack*, which, in any sane world (e.g. with my series
applied), is long gone by then.

But this is broken even without any changes: since when is gcc
guaranteed to preserve the stack contents when a function ends with a
sibling call, let alone with a __noreturn call?

Is there seriously no way to directly wait for a struct task_struct to
exit?  Could we, say, kmalloc the completion (or maybe even the whole
struct kthread) and (ick!) hang it off ->vfork_done?
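
A rough userspace sketch of the kmalloc idea above, using pthreads as a
stand-in for kernel threads. Every name here (struct worker,
run_and_stop_worker, and the pthread-based struct completion with its
helpers) is hypothetical and only mirrors the kernel API by analogy; it is
not the kthread code and not a proposed patch. The point it tries to show is
that the waiter only ever touches heap memory, so nothing depends on the
exiting thread's stack still being around.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

/* Hypothetical heap-allocated analogue of the on-stack "struct kthread self". */
struct worker {
	struct completion exited;
	int result;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static void *worker_fn(void *arg)
{
	struct worker *w = arg;		/* lives on the heap, not on this stack */

	w->result = 42;
	complete(&w->exited);		/* last access to *w from this thread */
	return NULL;
}

/* Start a worker, wait for it through the heap completion, return its result. */
static int run_and_stop_worker(void)
{
	struct worker *w = malloc(sizeof(*w));
	pthread_t tid;
	int result;

	if (!w)
		return -1;
	init_completion(&w->exited);
	w->result = 0;

	if (pthread_create(&tid, NULL, worker_fn, w) != 0) {
		pthread_cond_destroy(&w->exited.cond);
		pthread_mutex_destroy(&w->exited.lock);
		free(w);
		return -1;
	}

	wait_for_completion(&w->exited);
	assert(w->exited.done);
	result = w->result;

	/* Join only so that teardown in this sketch is trivially safe. */
	pthread_join(tid, NULL);
	pthread_cond_destroy(&w->exited.cond);
	pthread_mutex_destroy(&w->exited.lock);
	free(w);
	return result;
}
```

The waiter owns the allocation and frees it, so the lifetime of the
completion no longer has anything to do with when the worker's stack goes
away. A kernel version would still need to decide who frees the object and
how it hangs off the task, which this sketch does not attempt to answer.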

Linus, maybe it's time for you to carve another wax figurine.

--Andy

