From: bugzilla-daemon@kernel.org
To: linux-man@vger.kernel.org
Subject: [Bug 215769] man 2 vfork() does not document corner case when PID == 1
Date: Wed, 06 Apr 2022 08:46:20 +0000
Message-ID: <bug-215769-11311-qMg2u7PeXK@https.bugzilla.kernel.org/>
In-Reply-To: <bug-215769-11311@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=215769
--- Comment #10 from brauner@kernel.org ---
On Tue, Apr 05, 2022 at 09:28:12PM +0200, Alejandro Colomar wrote:
> Hey, Christian!
>
> On 4/4/22 10:05, Christian Brauner wrote:
> > On Sat, Apr 02, 2022 at 11:15:52PM +0200, Alejandro Colomar (man-pages) wrote:
> > > [Added some kernel CCs that may know what's going on]
> [...]
> > > Maybe someone on the kernel side can send a patch for the clone(2) and/or
> > > vfork(2) manual pages that explains the reason (if it's intended).
> >
> > Hey Alejandro,
> >
> > I won't be able to send a patch very soon but I can at least explain why
> > you see EINVAL. :)
>
> Don't hurry, we're not planning a release anytime soon :)
>
> >
> > This is intended.
> >
> > vfork() suspends the parent process, and the child process shares the
> > same VM as the parent process. If the child process is in a time
> > namespace different from its parent's, it is not allowed to be in the
> > same thread group or to share virtual memory with the parent process.
> > That's why you see EINVAL.
>
> That makes a lot of sense to me.
>
> >
> > Note, the unshare(CLONE_NEWTIME) call will _not_ cause the calling
> > process to be moved into a different time namespace. Only the newly
> > created child process will be after a subsequent
> > fork()/vfork()/clone()/clone3()...
> >
> > The semantics are equivalent to those of CLONE_NEWPID in this regard. You
> > can see this via /proc/<pid>/ns/, where you see two entries for PID
> > namespaces and also two entries for time namespaces:
> >
> > * CLONE_NEWTIME
> > * /proc/<pid>/ns/time // current time namespace
> > * /proc/<pid>/ns/time_for_children // time namespace for the new child process
>
> Also makes sense. Michael taught me that a few weeks ago :)
>
> This also raises a question: will the same problem happen with
> CLONE_NEWPID, since it also moves the child into a new namespace (in this
> case a PID one)? See the test program below.
No, it won't. A PID namespace places no relevant constraints on VM usage,
whereas a time namespace does.
If a task joins a new time namespace, its VVAR page tables are cleared
and refaulted with the new layout after the timens change. That
affects all tasks that use the same task->mm.
Since CLONE_THREAD implies CLONE_VM, this would affect the whole
thread group behind its back: all threads would suddenly change
time namespaces.
No such issues exist for PID namespaces; they don't need to alter
task->mm.
>
> >
> > If during fork:
> >
> > parent_process->time != parent_process->time_for_children
> >
> > and either CLONE_VM or CLONE_THREAD is set, you see EINVAL.
> >
> > You can thus replicate the same error via:
> >
> > unshare(CLONE_NEWTIME)
> >
> > and a
> >
> > clone() or clone3() call with CLONE_VM or CLONE_THREAD.
>
> So, to test my doubts, I wrote this similar program (and also similar
> programs where only the CLONE_NEW* flag was changed, one with CLONE_NEWTIME
> and one with CLONE_NEWNS):
>
> $ cat vfork_newpid.c
> #define _GNU_SOURCE
> #include <err.h>
> #include <errno.h>
> #include <linux/sched.h>
> #include <sched.h>
> #include <signal.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/syscall.h>
> #include <unistd.h>
>
> static char *const child_argv[] = {
> 	"print_pid",
> 	NULL
> };
>
> static char *const child_envp[] = {
> 	NULL
> };
>
> int
> main(void)
> {
> 	pid_t pid;
>
> 	printf("%s: PID: %ld\n", program_invocation_short_name, (long) getpid());
>
> 	if (unshare(CLONE_NEWPID) == -1)
> 		err(EXIT_FAILURE, "unshare(2)");
> 	if (signal(SIGCHLD, SIG_IGN) == SIG_ERR)
> 		err(EXIT_FAILURE, "signal(2)");
>
> 	pid = syscall(SYS_vfork);
> 	//pid = vfork();  // This behaves differently.
> 	switch (pid) {
> 	case 0:
> 		execve("/home/alx/tmp/print_pid", child_argv, child_envp);
> 		err(EXIT_SUCCESS, "PID %ld exiting after execve(2)",
> 		    (long) getpid());
> 	case -1:
> 		err(EXIT_FAILURE, "vfork(2)");
> 	default:
> 		errx(EXIT_SUCCESS, "Parent exiting after vfork(2).");
> 	}
> }
>
> $ cat print_pid.c
> #include <err.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int
> main(void)
> {
> 	errx(EXIT_SUCCESS, "PID %ld exiting.", (long) getpid());
> }
>
> $ cc -Wall -Wextra -Werror -o print_pid print_pid.c
> $ cc -Wall -Wextra -Werror -o vfork_newpid vfork_newpid.c
> $
> $
> $ sudo ./vfork_newpid
> vfork_newpid: PID: 8479
> vfork_newpid: PID 8479 exiting after execve(2): Success
> print_pid: PID 1 exiting.
> $
> $
> $ sudo ./vfork_newtime
> vfork_newtime: PID: 8484
> vfork_newtime: vfork(2): Invalid argument
> $
> $
> $ sudo ./vfork_newns
> vfork_newns: PID: 8486
> vfork_newns: PID 8486 exiting after execve(2): Success
> print_pid: PID 8487 exiting.
>
>
> The first thing I noticed is that usage of vfork(2) differs considerably from
> fork(2), and that's not clear from reading the manual page.
> It says that the parent process is suspended until the child calls
> execve(2). I took that to mean that vfork(2) doesn't return to the
> parent until that has happened, but is otherwise transparent. I was wrong,
> and my tests showed me that.
>
> I was going to propose an example program for the manual page when I
> decided to try something slightly different: calling vfork() instead of
> syscall(SYS_vfork). That changed the behavior to match fork(2) (i.e., the
> parent resumes after vfork(2) returns the PID of the child).
>
> Is that also intended? I couldn't find the glibc wrapper source code, so I
> don't know what glibc is doing here, but I straced the processes, and
> they're all calling vfork(), so the behavior should be consistent; it's
> quite weird. I'm very confused at this point.
glibc implements vfork() via assembly massaging. There are probably
atfork handlers and a bunch of other stuff involved, so it's difficult to
do a remote diagnosis.
(And note that calling anything other than execve() or _exit() after
vfork() is basically undefined behavior.)
>
>
> I'm also wondering why it's okay to have processes in different PID
> namespaces share the same VM, but I guess those are implementation details
> that I don't need to care much about.
See my answer above.