From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S933208AbdBVSID (ORCPT <rfc822;w@1wt.eu>);
        Wed, 22 Feb 2017 13:08:03 -0500
Received: from mx1.redhat.com ([209.132.183.28]:37602 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S932610AbdBVSHv (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 22 Feb 2017 13:07:51 -0500
Date: Wed, 22 Feb 2017 18:41:03 +0100
From: Oleg Nesterov <oleg@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
        Mika =?iso-8859-1?Q?Penttil=E4?= <mika.penttila@nextfour.com>,
        Aleksa Sarai <asarai@suse.com>, Andy Lutomirski <luto@amacapital.net>,
        Attila Fazekas <afazekas@redhat.com>, Jann Horn <jann@thejh.net>,
        Kees Cook <keescook@chromium.org>, Michal Hocko <mhocko@kernel.org>,
        Ulrich Obergfell <uobergfe@redhat.com>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 1/2] exec: don't wait for zombie threads with
        cred_guard_mutex held
Message-ID: <20170222174102.GA19788@redhat.com>
References: <20170213141452.GA30203@redhat.com> <20170213141516.GA30233@redhat.com> <20170213180454.GA2858@redhat.com> <87zihmpdkf.fsf@xmission.com> <20170220152202.GA13726@redhat.com> <87vas4wl3z.fsf@xmission.com> <20170221175354.GA31436@redhat.com> <871surmh34.fsf@xmission.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <871surmh34.fsf@xmission.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Wed, 22 Feb 2017 17:42:40 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/22, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> >> Reducing the
> >> scope of cred_guard_mutex concerns me.  There appear to be some fields
> >> like sighand that we currently expose in proc
> >
> > please see another email, collect_sigign_sigcatch() is called without this
> > mutex.
>
> I agree that it is called without the mutex.  It is not clear to me that
> is the correct behavior.

I fail to understand how/why this can be wrong.

> >> Do you know if we can make cred_guard_mutex a per-task lock again?
> >
> > I think we can, but this needs some (afaics simple) changes too.
> >
> > But for what? Note that the problem fixed by this series won't go away
> > if we do this.
>
> I believe it will if the other waiters use mutex_lock_killable.

No. They already use mutex_lock_killable/interruptible. And the test-case
can be killed, it is not the hard-lockup.

> I really don't like the first patch.

Just in case, I don't really like it too. Simply because it makes execve
more complex, we need to wait for sub-threads twice.

> It makes an information leak part
> a required detail of the implementation and as such possibly something
> we can never change.

Again, I simply can't understand how flush_signal_handlers() outside of
cred_guard_mutex can be treated as information leak. Even _if_
collect_sigign_sigcatch() was called with this mutex held.

Or do you mean something else?

> I suspect that a good fix that respects that proc and ptrace_attach need
> to exclude the setuid exec case for semantic reasons would have a similar
> complexity.

I am not sure I understand how we can do this. We need cred_guard_mutex
or something else even if exec is not setuid and does not change the
credentials, an LSM module can nack exec-under-ptrace by any reason.

> I think fixing the deadlock is important.

Yes. People actually hit this bug, it was reported several times.

> Right now it feels like your fix in patch 1 makes things a bit more
> brittle and I don't like that at all.

See above, I am not proud of this change too. I even mentioned on 0/2
that it would be nice to reconsider this change in the long term.

But I do not see another simple and _backportable_ solution for now.

What do you think we can do instead for stable trees?

Oleg.