linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook
@ 2016-10-25 11:03 Roman Pen
  2016-10-25 12:56 ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Roman Pen @ 2016-10-25 11:03 UTC (permalink / raw)
  Cc: Roman Pen, Andy Lutomirski, Oleg Nesterov, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Tejun Heo, linux-kernel

If panic_on_oops is not set and oops happens inside workqueue kthread,
kernel kills this kthread.  Current patch fixes recursive GPF which
happens when wq_worker_sleeping() function unconditionally accesses
the NULL kthread->vfork_done ptr thru kthread_data() -> to_kthread().

The stack is the following:

[<ffffffff81397f75>] dump_stack+0x68/0x93
[<ffffffff8106954b>] ? do_exit+0x7ab/0xc10
[<ffffffff8108fd73>] __schedule_bug+0x83/0xe0
[<ffffffff81716d5a>] __schedule+0x7ea/0xba0
[<ffffffff810c864f>] ? vprintk_default+0x1f/0x30
[<ffffffff8116a63c>] ? printk+0x48/0x50
[<ffffffff81717150>] schedule+0x40/0x90
[<ffffffff8106976a>] do_exit+0x9ca/0xc10
[<ffffffff810c8e3d>] ? kmsg_dump+0x11d/0x190
[<ffffffff810c8d37>] ? kmsg_dump+0x17/0x190
[<ffffffff81021ee9>] oops_end+0x99/0xd0
[<ffffffff81052da5>] no_context+0x185/0x3e0
[<ffffffff81053083>] __bad_area_nosemaphore+0x83/0x1c0
[<ffffffff810c820e>] ? vprintk_emit+0x25e/0x530
[<ffffffff810531d4>] bad_area_nosemaphore+0x14/0x20
[<ffffffff8105355c>] __do_page_fault+0xac/0x570
[<ffffffff810c66fe>] ? console_trylock+0x1e/0xe0
[<ffffffff81002036>] ? trace_hardirqs_off_thunk+0x1a/0x1c
[<ffffffff81053a2c>] do_page_fault+0xc/0x10
[<ffffffff8171f812>] page_fault+0x22/0x30
[<ffffffff81089bc3>] ? kthread_data+0x33/0x40
[<ffffffff8108427e>] ? wq_worker_sleeping+0xe/0x80
[<ffffffff817169eb>] __schedule+0x47b/0xba0
[<ffffffff81717150>] schedule+0x40/0x90
[<ffffffff8106957d>] do_exit+0x7dd/0xc10
[<ffffffff81021ee9>] oops_end+0x99/0xd0

kthread->vfork_done is zeroed out on the following path:

    do_exit()
    exit_mm()
    mm_release()
    complete_vfork_done()

In order to fix a bug dead tasks must be ignored.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org
---
v3:
  o minor comment and coding style fixes.

v2:
  o put a task->state check directly into a wq_worker_sleeping() function
    instead of changing the __schedule().

 kernel/workqueue.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9dc7ac5101e0..1525456b18c6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -875,9 +875,33 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
  */
 struct task_struct *wq_worker_sleeping(struct task_struct *task)
 {
-	struct worker *worker = kthread_data(task), *to_wakeup = NULL;
+	struct worker *worker, *to_wakeup = NULL;
 	struct worker_pool *pool;
 
+
+	if (task->state == TASK_DEAD) {
+		/*
+		 * Here we try to catch the following path before
+		 * accessing NULL kthread->vfork_done ptr thru
+		 * kthread_data():
+		 *
+		 *    oops_end()
+		 *    do_exit()
+		 *    schedule()
+		 *
+		 * If panic_on_oops is not set and oops happens on
+		 * a workqueue execution path, thread will be killed.
+		 * That is definitly sad, but not to make the situation
+		 * even worse we have to ignore dead tasks in order not
+		 * to step on zeroed out members (e.g. t->vfork_done is
+		 * already NULL on that path, since we were called by
+		 * do_exit())).
+		 */
+		return NULL;
+	}
+
+	worker = kthread_data(task);
+
 	/*
 	 * Rescuers, which may not have all the fields set up like normal
 	 * workers, also reach here, let's not access anything before
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook
  2016-10-25 11:03 [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook Roman Pen
@ 2016-10-25 12:56 ` Oleg Nesterov
  2016-10-25 13:00   ` Roman Penyaev
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2016-10-25 12:56 UTC (permalink / raw)
  To: Roman Pen
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Tejun Heo, linux-kernel

On 10/25, Roman Pen wrote:
>
>  struct task_struct *wq_worker_sleeping(struct task_struct *task)
>  {
> -	struct worker *worker = kthread_data(task), *to_wakeup = NULL;
> +	struct worker *worker, *to_wakeup = NULL;
>  	struct worker_pool *pool;
>  
> +
> +	if (task->state == TASK_DEAD) {
> +		/*
> +		 * Here we try to catch the following path before
> +		 * accessing NULL kthread->vfork_done ptr thru
> +		 * kthread_data():
> +		 *
> +		 *    oops_end()
> +		 *    do_exit()
> +		 *    schedule()
> +		 *
> +		 * If panic_on_oops is not set and oops happens on
> +		 * a workqueue execution path, thread will be killed.
> +		 * That is definitly sad, but not to make the situation
> +		 * even worse we have to ignore dead tasks in order not
> +		 * to step on zeroed out members (e.g. t->vfork_done is
> +		 * already NULL on that path, since we were called by
> +		 * do_exit())).
> +		 */
> +		return NULL;
> +	}

I still think that PF_EXITING check makes more sense than TASK_DEAD,
but I won't insist.

Oleg.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook
  2016-10-25 12:56 ` Oleg Nesterov
@ 2016-10-25 13:00   ` Roman Penyaev
  2016-10-25 14:19     ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Roman Penyaev @ 2016-10-25 13:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Tejun Heo, linux-kernel

On Tue, Oct 25, 2016 at 2:56 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 10/25, Roman Pen wrote:
>>
>>  struct task_struct *wq_worker_sleeping(struct task_struct *task)
>>  {
>> -     struct worker *worker = kthread_data(task), *to_wakeup = NULL;
>> +     struct worker *worker, *to_wakeup = NULL;
>>       struct worker_pool *pool;
>>
>> +
>> +     if (task->state == TASK_DEAD) {
>> +             /*
>> +              * Here we try to catch the following path before
>> +              * accessing NULL kthread->vfork_done ptr thru
>> +              * kthread_data():
>> +              *
>> +              *    oops_end()
>> +              *    do_exit()
>> +              *    schedule()
>> +              *
>> +              * If panic_on_oops is not set and oops happens on
>> +              * a workqueue execution path, thread will be killed.
>> +              * That is definitly sad, but not to make the situation
>> +              * even worse we have to ignore dead tasks in order not
>> +              * to step on zeroed out members (e.g. t->vfork_done is
>> +              * already NULL on that path, since we were called by
>> +              * do_exit())).
>> +              */
>> +             return NULL;
>> +     }
>
> I still think that PF_EXITING check makes more sense than TASK_DEAD,
> but I won't insist.

Why?  I probably do not see the corner cases, so, please, explain.

--
Roman

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook
  2016-10-25 13:00   ` Roman Penyaev
@ 2016-10-25 14:19     ` Oleg Nesterov
  2016-10-25 15:57       ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2016-10-25 14:19 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Tejun Heo, linux-kernel

On 10/25, Roman Penyaev wrote:
>
> On Tue, Oct 25, 2016 at 2:56 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 10/25, Roman Pen wrote:
> >>
> >>  struct task_struct *wq_worker_sleeping(struct task_struct *task)
> >>  {
> >> -     struct worker *worker = kthread_data(task), *to_wakeup = NULL;
> >> +     struct worker *worker, *to_wakeup = NULL;
> >>       struct worker_pool *pool;
> >>
> >> +
> >> +     if (task->state == TASK_DEAD) {
> >> +             /*
> >> +              * Here we try to catch the following path before
> >> +              * accessing NULL kthread->vfork_done ptr thru
> >> +              * kthread_data():
> >> +              *
> >> +              *    oops_end()
> >> +              *    do_exit()
> >> +              *    schedule()
> >> +              *
> >> +              * If panic_on_oops is not set and oops happens on
> >> +              * a workqueue execution path, thread will be killed.
> >> +              * That is definitly sad, but not to make the situation
> >> +              * even worse we have to ignore dead tasks in order not
> >> +              * to step on zeroed out members (e.g. t->vfork_done is
> >> +              * already NULL on that path, since we were called by
> >> +              * do_exit())).
> >> +              */
> >> +             return NULL;
> >> +     }
> >
> > I still think that PF_EXITING check makes more sense than TASK_DEAD,
> > but I won't insist.
>
> Why?  I probably do not see the corner cases, so, please, explain.

If nothing else the crashed worker can schedule() before do_task_dead(),

But mainly, to me PF_EXITING just looks better. TASK_DEAD is the very
special state, only sched/core.c should use it.

and... perhaps we can just add

	void oops_end_exit(void)
	{
		current->flags &= ~PF_WQ_WORKER;
		perhaps sonething else;
	}

called by oops_end() before rewind_stack_do_exit() ?

Oleg.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook
  2016-10-25 14:19     ` Oleg Nesterov
@ 2016-10-25 15:57       ` Oleg Nesterov
  2016-10-25 16:08         ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2016-10-25 15:57 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Tejun Heo, linux-kernel

On 10/25, Oleg Nesterov wrote:
>
> 	void oops_end_exit(void)
> 	{
> 		current->flags &= ~PF_WQ_WORKER;
> 		perhaps sonething else;
> 	}
>
> called by oops_end() before rewind_stack_do_exit() ?

and "perhaps sonething else" above should probably clear current->plug,
it likely points to "struct blk_plug" on stack.

and perhaps call task_work_run(). Currently only irq_thread() uses the
"destructor" work on stack, but think can have more users.

Oleg.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook
  2016-10-25 15:57       ` Oleg Nesterov
@ 2016-10-25 16:08         ` Oleg Nesterov
  2016-10-25 18:15           ` Roman Penyaev
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2016-10-25 16:08 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Tejun Heo, linux-kernel

sorry for noise, forgot to mention...

On 10/25, Oleg Nesterov wrote:
>
> On 10/25, Oleg Nesterov wrote:
> >
> > 	void oops_end_exit(void)
> > 	{
> > 		current->flags &= ~PF_WQ_WORKER;
> > 		perhaps sonething else;
> > 	}
> >
> > called by oops_end() before rewind_stack_do_exit() ?
>
> and "perhaps sonething else" above should probably clear current->plug,
> it likely points to "struct blk_plug" on stack.
>
> and perhaps call task_work_run(). Currently only irq_thread() uses the
> "destructor" work on stack, but think can have more users.

and, probably absorb some code from do_exit(), say set_fs(USER_DS) and/or
PF_EXITING check.

Oleg.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook
  2016-10-25 16:08         ` Oleg Nesterov
@ 2016-10-25 18:15           ` Roman Penyaev
  0 siblings, 0 replies; 7+ messages in thread
From: Roman Penyaev @ 2016-10-25 18:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Tejun Heo, linux-kernel

On Tue, Oct 25, 2016 at 6:08 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> sorry for noise, forgot to mention...
>
> On 10/25, Oleg Nesterov wrote:
>>
>> On 10/25, Oleg Nesterov wrote:
>> >
>> >     void oops_end_exit(void)
>> >     {
>> >             current->flags &= ~PF_WQ_WORKER;
>> >             perhaps sonething else;
>> >     }
>> >
>> > called by oops_end() before rewind_stack_do_exit() ?
>>
>> and "perhaps sonething else" above should probably clear current->plug,
>> it likely points to "struct blk_plug" on stack.
>>
>> and perhaps call task_work_run(). Currently only irq_thread() uses the
>> "destructor" work on stack, but think can have more users.
>
> and, probably absorb some code from do_exit(), say set_fs(USER_DS) and/or
> PF_EXITING check.

I agree that for stack which was rewound we need more cleanups.  But in the
current patch I do not want to mix several cases.  Here I try to avoid access
to a NULL pointer of a dead task.  That is not related to a corrupted stack.

Regarding the PF_WQ_WORKER flags.  I liked the idea that workqueue itself
should be able to detect that task is dead.  Spreading the code which does
flags cleanup for a special workqueue case IMO is not nice.

And, indeed, PF_EXITING looks more generic.  Thanks.  I will resend this.

--
Roman

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-10-25 18:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-10-25 11:03 [PATCH v3 1/1] workqueue: ignore dead tasks in a workqueue sleep hook Roman Pen
2016-10-25 12:56 ` Oleg Nesterov
2016-10-25 13:00   ` Roman Penyaev
2016-10-25 14:19     ` Oleg Nesterov
2016-10-25 15:57       ` Oleg Nesterov
2016-10-25 16:08         ` Oleg Nesterov
2016-10-25 18:15           ` Roman Penyaev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).