All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Weimer <fweimer@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>, Kevin Easton <kevin@guarana.org>,
	Andy Lutomirski <luto@kernel.org>,
	Christian Brauner <christian@brauner.io>,
	Aleksa Sarai <cyphar@cyphar.com>,
	"Enrico Weigelt, metux IT consult" <lkml@metux.net>,
	Al Viro <viro@zeniv.linux.org.uk>,
	David Howells <dhowells@redhat.com>,
	Linux API <linux-api@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Kees Cook <keescook@chromium.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Daniel Colascione <dancol@goo>
Subject: Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD]
Date: Tue, 30 Apr 2019 19:07:19 +0200	[thread overview]
Message-ID: <871s1j777c.fsf@oldenburg2.str.redhat.com> (raw)
In-Reply-To: <CAHk-=wgxqoH=Cztd7W67pYJyZwPPNtY5P7K66u2QTDEpEYHVAw@mail.gmail.com> (Linus Torvalds's message of "Tue, 30 Apr 2019 09:26:57 -0700")

* Linus Torvalds:

> On Tue, Apr 30, 2019 at 9:19 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Of course, if you *don't* need the exact vfork() semantics, clone
>> itself actually very much supports a callback model with s separate
>> stack. You can basically do this:
>>
>>  - allocate new stack for the child
>>  - in trivial asm wrapper, do:
>>     - push the callback address on the child stack
>>     - clone(CLONE_VFORK|CLONE_VM|CLONE_SIGCHLD, chld_stack, NULL, NULL,NULL)
>>     - "ret"
>>  - free new stack
>>
>> where the "ret" in the child will just go to the callback, while the
>> parent (eventually) just returns from the trivial wrapper and frees
>> the new stack (which by definition is no longer used, since the child
>> has exited or execve'd.
>
> In fact, Florian, maybe this is the solution to your "I want to use
> vfork for posix_spawn(), but I don't know if I can trust it" problem.
>
> Just use clone() directly. On WSL it will presumably just fail, and
> you can then fall back on doing the slow stupid
> fork+pipes-to-communicate.

We already use clone.  I don't know why.  We should add a comment that
provides the reason.

> On valgrind, I don't know what will happen. Maybe it will just do an
> unchecked posix_spawn() because valgrind doesn't catch it?

I think what happens with these emulators that they use fork (no shared
address space) but suspend the parent thread.  clone with CLONE_VFORK
definitely does not fail.  That mostly works, except that you do not get
back the error code from the execve.  Instead, the process is considered
launched, and the caller collects the exit status from the _exit after
the failed execve.

> Of course, if you *don't* need the exact vfork() semantics, clone
> itself actually very much supports a callback model with s separate
> stack. You can basically do this:
> 
>  - allocate new stack for the child
>  - in trivial asm wrapper, do:
>     - push the callback address on the child stack
>     - clone(CLONE_VFORK|CLONE_VM|CLONE_SIGCHLD, chld_stack, NULL, NULL,NULL)
>     - "ret"
>  - free new stack
> 
> where the "ret" in the child will just go to the callback, while the
> parent (eventually) just returns from the trivial wrapper and frees
> the new stack (which by definition is no longer used, since the child
> has exited or execve'd.
> 
> So you can most definitely create a "vfork_with_child_callback()" with
> clone, and it would arguably be a much superior interface to vfork()
> anyway (maybe you'd like to pass in some arguments to the callback too
> - add more stack setup for the child as needed), but it wouldn't be
> the right solution for programs that just want to use the standard BSD
> vfork() model.

As far as we understand the situation, we believe that we absolutely
must block all signals for both the parent thread and the new
subprocess.  Signals can be unblocked in the subprocess, but only after
setting their handlers to SIG_DFL or SIG_IGN.  (Parent signal handlers
cannot run in the subprocess because application-supplied signal
handlers generally do not expect to run with a corrupt TCB—or a
different PID.)

At that point, I wonder if we can just skip the stack setup for the
CLONE_VFORK case and reuse the existing stack.

Thanks,
Florian

WARNING: multiple messages have this Message-ID (diff)
From: Florian Weimer <fweimer@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>, Kevin Easton <kevin@guarana.org>,
	Andy Lutomirski <luto@kernel.org>,
	Christian Brauner <christian@brauner.io>,
	Aleksa Sarai <cyphar@cyphar.com>, "Enrico Weigelt\,
	metux IT consult" <lkml@metux.net>,
	Al Viro <viro@zeniv.linux.org.uk>,
	David Howells <dhowells@redhat.com>,
	Linux API <linux-api@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Kees Cook <keescook@chromium.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Daniel Colascione <dancol@google.com>
Subject: Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD]
Date: Tue, 30 Apr 2019 19:07:19 +0200	[thread overview]
Message-ID: <871s1j777c.fsf@oldenburg2.str.redhat.com> (raw)
In-Reply-To: <CAHk-=wgxqoH=Cztd7W67pYJyZwPPNtY5P7K66u2QTDEpEYHVAw@mail.gmail.com> (Linus Torvalds's message of "Tue, 30 Apr 2019 09:26:57 -0700")

* Linus Torvalds:

> On Tue, Apr 30, 2019 at 9:19 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Of course, if you *don't* need the exact vfork() semantics, clone
>> itself actually very much supports a callback model with s separate
>> stack. You can basically do this:
>>
>>  - allocate new stack for the child
>>  - in trivial asm wrapper, do:
>>     - push the callback address on the child stack
>>     - clone(CLONE_VFORK|CLONE_VM|CLONE_SIGCHLD, chld_stack, NULL, NULL,NULL)
>>     - "ret"
>>  - free new stack
>>
>> where the "ret" in the child will just go to the callback, while the
>> parent (eventually) just returns from the trivial wrapper and frees
>> the new stack (which by definition is no longer used, since the child
>> has exited or execve'd.
>
> In fact, Florian, maybe this is the solution to your "I want to use
> vfork for posix_spawn(), but I don't know if I can trust it" problem.
>
> Just use clone() directly. On WSL it will presumably just fail, and
> you can then fall back on doing the slow stupid
> fork+pipes-to-communicate.

We already use clone.  I don't know why.  We should add a comment that
provides the reason.

> On valgrind, I don't know what will happen. Maybe it will just do an
> unchecked posix_spawn() because valgrind doesn't catch it?

I think what happens with these emulators that they use fork (no shared
address space) but suspend the parent thread.  clone with CLONE_VFORK
definitely does not fail.  That mostly works, except that you do not get
back the error code from the execve.  Instead, the process is considered
launched, and the caller collects the exit status from the _exit after
the failed execve.

> Of course, if you *don't* need the exact vfork() semantics, clone
> itself actually very much supports a callback model with s separate
> stack. You can basically do this:
> 
>  - allocate new stack for the child
>  - in trivial asm wrapper, do:
>     - push the callback address on the child stack
>     - clone(CLONE_VFORK|CLONE_VM|CLONE_SIGCHLD, chld_stack, NULL, NULL,NULL)
>     - "ret"
>  - free new stack
> 
> where the "ret" in the child will just go to the callback, while the
> parent (eventually) just returns from the trivial wrapper and frees
> the new stack (which by definition is no longer used, since the child
> has exited or execve'd.
> 
> So you can most definitely create a "vfork_with_child_callback()" with
> clone, and it would arguably be a much superior interface to vfork()
> anyway (maybe you'd like to pass in some arguments to the callback too
> - add more stack setup for the child as needed), but it wouldn't be
> the right solution for programs that just want to use the standard BSD
> vfork() model.

As far as we understand the situation, we believe that we absolutely
must block all signals for both the parent thread and the new
subprocess.  Signals can be unblocked in the subprocess, but only after
setting their handlers to SIG_DFL or SIG_IGN.  (Parent signal handlers
cannot run in the subprocess because application-supplied signal
handlers generally do not expect to run with a corrupt TCB—or a
different PID.)

At that point, I wonder if we can just skip the stack setup for the
CLONE_VFORK case and reuse the existing stack.

Thanks,
Florian

  reply	other threads:[~2019-04-30 17:07 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-14 20:14 [PATCH 0/4] clone: add CLONE_PIDFD Christian Brauner
2019-04-14 20:14 ` [PATCH 1/4] Make anon_inodes unconditional Christian Brauner
2019-04-14 20:14 ` [PATCH 2/4] clone: add CLONE_PIDFD Christian Brauner
2019-04-15 10:52   ` Oleg Nesterov
2019-04-15 11:42     ` Christian Brauner
2019-04-15 13:24       ` Oleg Nesterov
2019-04-15 13:52         ` Christian Brauner
2019-04-15 16:25           ` Joel Fernandes
2019-04-15 17:15         ` Jonathan Kowalski
2019-04-15 19:39           ` Daniel Colascione
2019-04-14 20:14 ` [PATCH 3/4] signal: support CLONE_PIDFD with pidfd_send_signal Christian Brauner
2019-04-14 20:14 ` [PATCH 4/4] samples: show race-free pidfd metadata access Christian Brauner
2019-04-15 10:08 ` RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD] Enrico Weigelt, metux IT consult
2019-04-15 15:50   ` Serge E. Hallyn
2019-04-16 18:32     ` Enrico Weigelt, metux IT consult
2019-04-29 15:49       ` Serge E. Hallyn
2019-04-29 17:31         ` Enrico Weigelt, metux IT consult
2019-05-05  2:32           ` Serge E. Hallyn
2019-04-15 19:59   ` Aleksa Sarai
2019-04-15 20:29     ` Andy Lutomirski
2019-04-15 21:27       ` Jonathan Kowalski
2019-04-15 23:58         ` Andy Lutomirski
2019-04-15 23:58           ` Andy Lutomirski
2019-04-16 18:45       ` Enrico Weigelt, metux IT consult
2019-04-16 21:31         ` Andy Lutomirski
2019-04-17 12:03           ` Enrico Weigelt, metux IT consult
2019-04-17 12:54             ` Christian Brauner
2019-04-18 15:46               ` Enrico Weigelt, metux IT consult
2019-04-17 12:19       ` Florian Weimer
2019-04-17 12:19         ` Florian Weimer
2019-04-17 16:46         ` Andy Lutomirski
2019-04-17 16:46           ` Andy Lutomirski
2019-04-20  7:14       ` Kevin Easton
2019-04-20 11:15         ` Christian Brauner
2019-04-20 15:06         ` Daniel Colascione
2019-04-29 19:30         ` Jann Horn
2019-04-29 19:55           ` Jann Horn
2019-04-29 20:21             ` Linus Torvalds
2019-04-29 20:38               ` Florian Weimer
2019-04-29 20:38                 ` Florian Weimer
2019-04-29 20:51                 ` Christian Brauner
2019-04-29 20:51                   ` Christian Brauner
2019-04-29 21:31                 ` Linus Torvalds
2019-04-29 21:31                   ` Linus Torvalds
2019-04-30  7:01                   ` Florian Weimer
2019-04-30  7:01                     ` Florian Weimer
2019-04-30  0:38               ` Jann Horn
2019-04-30  2:16                 ` Linus Torvalds
2019-04-30  8:21                   ` Florian Weimer
2019-04-30  8:21                     ` Florian Weimer
2019-04-30 16:19                     ` Linus Torvalds
2019-04-30 16:19                       ` Linus Torvalds
2019-04-30 16:26                       ` Linus Torvalds
2019-04-30 16:26                         ` Linus Torvalds
2019-04-30 17:07                         ` Florian Weimer [this message]
2019-04-30 17:07                           ` Florian Weimer
2019-04-30 12:39               ` Oleg Nesterov
2019-04-30 16:24                 ` Linus Torvalds
2019-04-29 20:49             ` Florian Weimer
2019-04-29 20:49               ` Florian Weimer
2019-04-29 20:52               ` Christian Brauner
2019-04-29 20:52                 ` Christian Brauner
2019-04-20 15:28       ` Al Viro
2019-04-16 18:37     ` Enrico Weigelt, metux IT consult
2019-04-15 10:16 ` [PATCH 0/4] clone: add CLONE_PIDFD Enrico Weigelt, metux IT consult

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871s1j777c.fsf@oldenburg2.str.redhat.com \
    --to=fweimer@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=christian@brauner.io \
    --cc=cyphar@cyphar.com \
    --cc=dancol@goo \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=kevin@guarana.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkml@metux.net \
    --cc=luto@kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=oleg@redhat.com \
    --cc=serge@hallyn.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.