From mboxrd@z Thu Jan  1 00:00:00 1970
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
Subject: Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
Date: Mon, 03 Apr 2017 17:49:17 -0500
Message-ID: <87h925jedu.fsf@xmission.com>
References: <87shmv6ufl.fsf@xmission.com> <20170303173326.GA17899@redhat.com>
        <87tw7axlr0.fsf@xmission.com> <87d1dyw5iw.fsf@xmission.com>
        <87tw7aunuh.fsf@xmission.com> <87lgsmunmj.fsf_-_@xmission.com>
        <20170304170312.GB13131@redhat.com> <8760ir192p.fsf@xmission.com>
        <20170402161518.GC12637@redhat.com> <87inmmbjsq.fsf@xmission.com>
        <20170403183728.GB31390@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain
Return-path: <linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
In-Reply-To: <20170403183728.GB31390-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> (Oleg Nesterov's message of
        "Mon, 3 Apr 2017 20:37:28 +0200")
Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, Aleksa Sarai <asarai-IBi9RG/b67k@public.gmane.org>, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>, Attila Fazekas <afazekas-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, Jann Horn <jann-XZ1E9jl8jIdeoWH0uzbU5w@public.gmane.org>, Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>, Ulrich Obergfell <uobergfe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-api@vger.kernel.org

Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> Eric,
>
> I see another series from you, but I simply failed to force myself to read
> it carefully. Because at first glance it makes me really sad, I do dislike
> it even if it is correct. Yes, yes, sure, I can be wrong. Will try
> tomorrow.

Yes.  I needed to get my thoughts concrete.  I missed fixing the race in
zap_other_threads.  But overall I think things are moving in a good
direction.

>>
>> I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually
>> know what the implications of changing it are.  Let's see...
>
> And nobody knows ;) This is the problem, even the clear ptrace bugfix can
> break something, this happened before and we had to revert the obviously-
> correct patches; the bug was already used as feature.

Yes that is the challenge of changing userspace.   Which is why it helps
to test as much of a userspace change as possible.  Or to get very
clever, and figure out how to avoid the userspace change.

So I think it is worth knowing the lldb actually uses
PTRACE_O_TRACEEXIT.  So we can test at least some programs to verify
that all is well.

I don't see any way around cleaning up PTRACE_O_TRACEEXIT.  As
we fundamentally have the non-thread-group-leader exec problem.
We have to reap that previous leader thread with release_task.
Which means we can't stop for a PTRACE_O_TRACEEXIT.


>> If delivering a second SIGKILL
> ...
>> So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT
>> before the tracers find it.
>>
>> Therefore we are only talking a quality of implementation issue
>> if we actually stop and wait for the tracer or not.
>
> Oh, this is another story, needs another discussion. We really need some
> changes in this area, we need to distinguish SIGKILL sent from user-space
> and (say) from group-exit, and we need to decide when should we stop.
>
> But at least I think the tracee should never stop if SIGKILL comes from
> user space. And yes ptrace_stop() is ugly and wrong, just look at the
> arch_ptrace_stop_needed() check. The problem, again, is that any fix will
> be user-visible.

The only issue I see is that arch_ptrace_stop() may sleep (sparc and
ia64 do as they flush the register stack to memory).  As the
code may sleep it means we can't set TASK_TRACED until after calling
arch_ptrace_stop().

My inclination is to just solve that by saying:
if (!sigkill_pending(current))
	set_current_task(TASK_TRACED);

That removes the special case.  We have to handle SIGKILL being
delivered immediately after set_current_state in any event.  And as we
are talking about something that happens on rare architecutres I don't
see any problem with tweaking that code at all.

It is closely enough related I will fold that into the next version of
my patch.

Eric

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752158AbdDCWyg (ORCPT <rfc822;w@1wt.eu>);
        Mon, 3 Apr 2017 18:54:36 -0400
Received: from out03.mta.xmission.com ([166.70.13.233]:57968 "EHLO
        out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751564AbdDCWye (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 3 Apr 2017 18:54:34 -0400
From: ebiederm@xmission.com (Eric W. Biederman)
To: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
        Aleksa Sarai <asarai@suse.com>, Andy Lutomirski <luto@amacapital.net>,
        Attila Fazekas <afazekas@redhat.com>, Jann Horn <jann@thejh.net>,
        Kees Cook <keescook@chromium.org>, Michal Hocko <mhocko@kernel.org>,
        Ulrich Obergfell <uobergfe@redhat.com>, linux-kernel@vger.kernel.org,
        linux-api@vger.kernel.org
References: <87shmv6ufl.fsf@xmission.com> <20170303173326.GA17899@redhat.com>
        <87tw7axlr0.fsf@xmission.com> <87d1dyw5iw.fsf@xmission.com>
        <87tw7aunuh.fsf@xmission.com> <87lgsmunmj.fsf_-_@xmission.com>
        <20170304170312.GB13131@redhat.com> <8760ir192p.fsf@xmission.com>
        <20170402161518.GC12637@redhat.com> <87inmmbjsq.fsf@xmission.com>
        <20170403183728.GB31390@redhat.com>
Date: Mon, 03 Apr 2017 17:49:17 -0500
In-Reply-To: <20170403183728.GB31390@redhat.com> (Oleg Nesterov's message of
        "Mon, 3 Apr 2017 20:37:28 +0200")
Message-ID: <87h925jedu.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-XM-SPF: eid=1cvArp-0000av-4M;;;mid=<87h925jedu.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=67.3.234.240;;;frm=ebiederm@xmission.com;;;spf=neutral
X-XM-AID: U2FsdGVkX1/qbXziOTgRqHPyz7sCXLDm9L5pSwPknM4=
X-SA-Exim-Connect-IP: 67.3.234.240
X-SA-Exim-Mail-From: ebiederm@xmission.com
X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP
        *  0.7 XMSubLong Long Subject
        *  1.5 TR_Symld_Words too many words that have symbols inside
        *  0.0 TVD_RCVD_IP Message was received from an IP address
        *  0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available.
        *  0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
        *      [score: 0.5000]
        * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC
        *      [sa07 1397; Body=1 Fuz1=1 Fuz2=1]
        *  0.0 T_TooManySym_01 4+ unique symbols in subject
        *  1.0 T_XMHurry_00 Hurry and Do Something
        *  0.1 XMSolicitRefs_0 Weightloss drug
X-Spam-DCC: XMission; sa07 1397; Body=1 Fuz1=1 Fuz2=1 
X-Spam-Combo: ***;Oleg Nesterov <oleg@redhat.com>
X-Spam-Relay-Country: 
X-Spam-Timing: total 258 ms - load_scoreonly_sql: 0.04 (0.0%),
        signal_user_changed: 3.4 (1.3%), b_tie_ro: 2.3 (0.9%), parse: 1.09 (0.4%),
        extract_message_metadata: 4.8 (1.9%), get_uri_detail_list: 2.7 (1.0%),
        tests_pri_-1000: 4.2 (1.6%), tests_pri_-950: 1.14 (0.4%), tests_pri_-900:
        0.96 (0.4%), tests_pri_-400: 23 (9.1%), check_bayes: 22 (8.6%), b_tokenize: 7
        (2.9%), b_tok_get_all: 8 (3.0%), b_comp_prob: 2.6 (1.0%), b_tok_touch_all:
        2.5 (1.0%), b_finish: 0.65 (0.3%), tests_pri_0: 205 (79.4%),
        check_dkim_signature: 0.48 (0.2%), check_dkim_adsp: 2.7 (1.1%),
        tests_pri_500: 5 (2.0%), rewrite_mail: 0.00 (0.0%)
Subject: Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
X-Spam-Flag: No
X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600)
X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Oleg Nesterov <oleg@redhat.com> writes:

> Eric,
>
> I see another series from you, but I simply failed to force myself to read
> it carefully. Because at first glance it makes me really sad, I do dislike
> it even if it is correct. Yes, yes, sure, I can be wrong. Will try
> tomorrow.

Yes.  I needed to get my thoughts concrete.  I missed fixing the race in
zap_other_threads.  But overall I think things are moving in a good
direction.

>>
>> I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually
>> know what the implications of changing it are.  Let's see...
>
> And nobody knows ;) This is the problem, even the clear ptrace bugfix can
> break something, this happened before and we had to revert the obviously-
> correct patches; the bug was already used as feature.

Yes that is the challenge of changing userspace.   Which is why it helps
to test as much of a userspace change as possible.  Or to get very
clever, and figure out how to avoid the userspace change.

So I think it is worth knowing the lldb actually uses
PTRACE_O_TRACEEXIT.  So we can test at least some programs to verify
that all is well.

I don't see any way around cleaning up PTRACE_O_TRACEEXIT.  As
we fundamentally have the non-thread-group-leader exec problem.
We have to reap that previous leader thread with release_task.
Which means we can't stop for a PTRACE_O_TRACEEXIT.


>> If delivering a second SIGKILL
> ...
>> So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT
>> before the tracers find it.
>>
>> Therefore we are only talking a quality of implementation issue
>> if we actually stop and wait for the tracer or not.
>
> Oh, this is another story, needs another discussion. We really need some
> changes in this area, we need to distinguish SIGKILL sent from user-space
> and (say) from group-exit, and we need to decide when should we stop.
>
> But at least I think the tracee should never stop if SIGKILL comes from
> user space. And yes ptrace_stop() is ugly and wrong, just look at the
> arch_ptrace_stop_needed() check. The problem, again, is that any fix will
> be user-visible.

The only issue I see is that arch_ptrace_stop() may sleep (sparc and
ia64 do as they flush the register stack to memory).  As the
code may sleep it means we can't set TASK_TRACED until after calling
arch_ptrace_stop().

My inclination is to just solve that by saying:
if (!sigkill_pending(current))
	set_current_task(TASK_TRACED);

That removes the special case.  We have to handle SIGKILL being
delivered immediately after set_current_state in any event.  And as we
are talking about something that happens on rare architecutres I don't
see any problem with tweaking that code at all.

It is closely enough related I will fold that into the next version of
my patch.

Eric