From mboxrd@z Thu Jan  1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============8615787312279447014=="
MIME-Version: 1.0
From: Oleg Nesterov <oleg@redhat.com>
To: lkp@lists.01.org
Subject: Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)
Date: Mon, 27 Jun 2016 16:54:44 +0200
Message-ID: <20160627145443.GA17145@redhat.com>
In-Reply-To: <CALCETrVZUJdVNdt3W9_o6EO_BjeO1V8JtQ+aig=zkR5xukeZhg@mail.gmail.com>
List-Id: <oe-lkp.lists.linux.dev>

--===============8615787312279447014==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

On 06/26, Andy Lutomirski wrote:
>
> kthread_stop is *sick*.
>
>     struct kthread self;
>
> ...
>
>     current->vfork_done =3D &self.exited;
>
> ...
>
>     do_exit(ret);
>
> And then some other thread goes and waits for the completion, which is
> *on the stack*, which, in any sane world (e.g. with my series
> applied), is long gone by then.

Yes, I forgot this when we discussed the problems with ti->flags/etc...

> But this is broken even without any changes: since when is gcc
> guaranteed to preserve the stack contents when a function ends with a
> sibling call, let alone with a __noreturn call?

I don't know if gcc can actually drop the stack frame in this case,
but even if it can this looks fixeable.

> Is there seriously no way to directly wait for a struct task_struct to
> exit?  Could we, say, kmalloc the completion (or maybe even the whole
> struct kthread) and (ick!) hang it off ->vfork_done?

Sure we can... And yes, I think we need to alloc the whole struct kthread.
Just another (unfortunate) complication, the current code is simple.

And probably kthread/kthread_stop should switch to task_work_exit().

Oleg.


--===============8615787312279447014==--


From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751961AbcF0OyK (ORCPT <rfc822;w@1wt.eu>);
	Mon, 27 Jun 2016 10:54:10 -0400
Received: from mx1.redhat.com ([209.132.183.28]:51307 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751860AbcF0OyH (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Mon, 27 Jun 2016 10:54:07 -0400
Date: Mon, 27 Jun 2016 16:54:44 +0200
From: Oleg Nesterov <oleg@redhat.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
        LKP <lkp@01.org>, LKML <linux-kernel@vger.kernel.org>,
        kernel test robot <xiaolong.ye@intel.com>
Subject: Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG:
	unable to handle kernel paging request at ffffc90000997f18)
Message-ID: <20160627145443.GA17145@redhat.com>
References: <CALCETrVZUJdVNdt3W9_o6EO_BjeO1V8JtQ+aig=zkR5xukeZhg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CALCETrVZUJdVNdt3W9_o6EO_BjeO1V8JtQ+aig=zkR5xukeZhg@mail.gmail.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Mon, 27 Jun 2016 14:54:06 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/26, Andy Lutomirski wrote:
>
> kthread_stop is *sick*.
>
>     struct kthread self;
>
> ...
>
>     current->vfork_done = &self.exited;
>
> ...
>
>     do_exit(ret);
>
> And then some other thread goes and waits for the completion, which is
> *on the stack*, which, in any sane world (e.g. with my series
> applied), is long gone by then.

Yes, I forgot this when we discussed the problems with ti->flags/etc...

> But this is broken even without any changes: since when is gcc
> guaranteed to preserve the stack contents when a function ends with a
> sibling call, let alone with a __noreturn call?

I don't know if gcc can actually drop the stack frame in this case,
but even if it can this looks fixeable.

> Is there seriously no way to directly wait for a struct task_struct to
> exit?  Could we, say, kmalloc the completion (or maybe even the whole
> struct kthread) and (ick!) hang it off ->vfork_done?

Sure we can... And yes, I think we need to alloc the whole struct kthread.
Just another (unfortunate) complication, the current code is simple.

And probably kthread/kthread_stop should switch to task_work_exit().

Oleg.