* [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec"
@ 2022-09-13 10:25 Andrei Vagin
2022-09-13 10:25 ` [PATCH 1/2] Revert "selftests/timens: add a test for vfork+exit" Andrei Vagin
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Andrei Vagin @ 2022-09-13 10:25 UTC (permalink / raw)
To: Kees Cook; +Cc: linux-kernel, Andrei Vagin
This reverts commits:
133e2d3e81de ("fs/exec: allow to unshare a time namespace on vfork+exec")
6342140db660 ("selftests/timens: add a test for vfork+exit")
Alexey pointed out a few undesirable side effects of the reverted change.
First, it doesn't take into account that CLONE_VFORK can be used with
CLONE_THREAD. Second, a child process doesn't enter a target time name-space,
if its parent dies before the child calls exec. It happens because the parent
clears vfork_done.
Eric W. Biederman suggests installing a time namespace as a task gets a new mm.
It includes all new processes cloned without CLONE_VM and all tasks that call
exec(). This is an user API change, but we think there aren't users that depend
on the old behavior.
It is too late to make such changes in this release, so let's roll back
this patch and introduce the right one in the next release.
Andrei Vagin (2):
Revert "selftests/timens: add a test for vfork+exit"
Revert "fs/exec: allow to unshare a time namespace on vfork+exec"
fs/exec.c | 7 --
kernel/fork.c | 5 +-
kernel/nsproxy.c | 3 +-
tools/testing/selftests/timens/Makefile | 2 +-
tools/testing/selftests/timens/vfork_exec.c | 90 ---------------------
5 files changed, 3 insertions(+), 104 deletions(-)
delete mode 100644 tools/testing/selftests/timens/vfork_exec.c
--
2.37.2.789.g6183377224-goog
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 1/2] Revert "selftests/timens: add a test for vfork+exit" 2022-09-13 10:25 [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" Andrei Vagin @ 2022-09-13 10:25 ` Andrei Vagin 2022-09-13 10:25 ` [PATCH 2/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" Andrei Vagin ` (2 subsequent siblings) 3 siblings, 0 replies; 6+ messages in thread From: Andrei Vagin @ 2022-09-13 10:25 UTC (permalink / raw) To: Kees Cook; +Cc: linux-kernel, Andrei Vagin, Andrei Vagin The next patch reverts the code that this test verified. This reverts commit 6342140db6609a0c7d34f68c52b2947468e0e630. Signed-off-by: Andrei Vagin <avagin@gmail.com> --- tools/testing/selftests/timens/Makefile | 2 +- tools/testing/selftests/timens/vfork_exec.c | 90 --------------------- 2 files changed, 1 insertion(+), 91 deletions(-) delete mode 100644 tools/testing/selftests/timens/vfork_exec.c diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile index f0d51d4d2c87..3a5936cc10ab 100644 --- a/tools/testing/selftests/timens/Makefile +++ b/tools/testing/selftests/timens/Makefile @@ -1,4 +1,4 @@ -TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex vfork_exec +TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex TEST_GEN_PROGS_EXTENDED := gettime_perf CFLAGS := -Wall -Werror -pthread diff --git a/tools/testing/selftests/timens/vfork_exec.c b/tools/testing/selftests/timens/vfork_exec.c deleted file mode 100644 index e6ccd900f30a..000000000000 --- a/tools/testing/selftests/timens/vfork_exec.c +++ /dev/null @@ -1,90 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#define _GNU_SOURCE -#include <errno.h> -#include <fcntl.h> -#include <sched.h> -#include <stdio.h> -#include <stdbool.h> -#include <sys/stat.h> -#include <sys/syscall.h> -#include <sys/types.h> -#include <sys/wait.h> -#include <time.h> -#include <unistd.h> -#include <string.h> - -#include "log.h" -#include "timens.h" - -#define OFFSET (36000) - -int main(int argc, char *argv[]) -{ - struct timespec now, tst; - int status, i; - pid_t pid; - - if (argc > 1) { - if (sscanf(argv[1], "%ld", &now.tv_sec) != 1) - return pr_perror("sscanf"); - - for (i = 0; i < 2; i++) { - _gettime(CLOCK_MONOTONIC, &tst, i); - if (abs(tst.tv_sec - now.tv_sec) > 5) - return pr_fail("%ld %ld\n", now.tv_sec, tst.tv_sec); - } - return 0; - } - - nscheck(); - - ksft_set_plan(1); - - clock_gettime(CLOCK_MONOTONIC, &now); - - if (unshare_timens()) - return 1; - - if (_settime(CLOCK_MONOTONIC, OFFSET)) - return 1; - - for (i = 0; i < 2; i++) { - _gettime(CLOCK_MONOTONIC, &tst, i); - if (abs(tst.tv_sec - now.tv_sec) > 5) - return pr_fail("%ld %ld\n", - now.tv_sec, tst.tv_sec); - } - - pid = vfork(); - if (pid < 0) - return pr_perror("fork"); - - if (pid == 0) { - char now_str[64]; - char *cargv[] = {"exec", now_str, NULL}; - char *cenv[] = {NULL}; - - // Check that we are still in the source timens. - for (i = 0; i < 2; i++) { - _gettime(CLOCK_MONOTONIC, &tst, i); - if (abs(tst.tv_sec - now.tv_sec) > 5) - return pr_fail("%ld %ld\n", - now.tv_sec, tst.tv_sec); - } - - /* Check for proper vvar offsets after execve. */ - snprintf(now_str, sizeof(now_str), "%ld", now.tv_sec + OFFSET); - execve("/proc/self/exe", cargv, cenv); - return pr_perror("execve"); - } - - if (waitpid(pid, &status, 0) != pid) - return pr_perror("waitpid"); - - if (status) - ksft_exit_fail(); - - ksft_test_result_pass("exec\n"); - ksft_exit_pass(); - return 0; -} -- 2.37.2.789.g6183377224-goog ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" 2022-09-13 10:25 [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" Andrei Vagin 2022-09-13 10:25 ` [PATCH 1/2] Revert "selftests/timens: add a test for vfork+exit" Andrei Vagin @ 2022-09-13 10:25 ` Andrei Vagin 2022-09-13 11:37 ` [PATCH 0/2] " Kees Cook 2022-09-13 17:40 ` Kees Cook 3 siblings, 0 replies; 6+ messages in thread From: Andrei Vagin @ 2022-09-13 10:25 UTC (permalink / raw) To: Kees Cook Cc: linux-kernel, Andrei Vagin, Andrei Vagin, Alexey Izbyshev, Christian Brauner, Dmitry Safonov, Eric W. Biederman, Florian Weimer From: Andrei Vagin <avagin@gmail.com> This reverts commit 133e2d3e81de5d9706cab2dd1d52d231c27382e5. Alexey pointed out a few undesirable side effects of the reverted change. First, it doesn't take into account that CLONE_VFORK can be used with CLONE_THREAD. Second, a child process doesn't enter a target time name-space, if its parent dies before the child calls exec. It happens because the parent clears vfork_done. Eric W. Biederman suggests installing a time namespace as a task gets a new mm. It includes all new processes cloned without CLONE_VM and all tasks that call exec(). This is an user API change, but we think there aren't users that depend on the old behavior. It is too late to make such changes in this release, so let's roll back this patch and introduce the right one in the next release. Cc: Alexey Izbyshev <izbyshev@ispras.ru> Cc: Christian Brauner <brauner@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrei Vagin <avagin@gmail.com> --- fs/exec.c | 7 ------- kernel/fork.c | 5 +---- kernel/nsproxy.c | 3 +-- 3 files changed, 2 insertions(+), 13 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 9a5ca7b82bfc..d046dbb9cbd0 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -65,7 +65,6 @@ #include <linux/io_uring.h> #include <linux/syscall_user_dispatch.h> #include <linux/coredump.h> -#include <linux/time_namespace.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -979,12 +978,10 @@ static int exec_mmap(struct mm_struct *mm) { struct task_struct *tsk; struct mm_struct *old_mm, *active_mm; - bool vfork; int ret; /* Notify parent that we're no longer interested in the old VM */ tsk = current; - vfork = !!tsk->vfork_done; old_mm = current->mm; exec_mm_release(tsk, old_mm); if (old_mm) @@ -1029,10 +1026,6 @@ static int exec_mmap(struct mm_struct *mm) tsk->mm->vmacache_seqnum = 0; vmacache_flush(tsk); task_unlock(tsk); - - if (vfork) - timens_on_fork(tsk->nsproxy, tsk); - if (old_mm) { mmap_read_unlock(old_mm); BUG_ON(active_mm != old_mm); diff --git a/kernel/fork.c b/kernel/fork.c index 8a9e92068b15..2b6bd511c6ed 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2047,11 +2047,8 @@ static __latent_entropy struct task_struct *copy_process( /* * If the new process will be in a different time namespace * do not allow it to share VM or a thread group with the forking task. - * - * On vfork, the child process enters the target time namespace only - * after exec. */ - if ((clone_flags & (CLONE_VM | CLONE_VFORK)) == CLONE_VM) { + if (clone_flags & (CLONE_THREAD | CLONE_VM)) { if (nsp->time_ns != nsp->time_ns_for_children) return ERR_PTR(-EINVAL); } diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c index b4cbb406bc28..eec72ca962e2 100644 --- a/kernel/nsproxy.c +++ b/kernel/nsproxy.c @@ -179,8 +179,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk) if (IS_ERR(new_ns)) return PTR_ERR(new_ns); - if ((flags & CLONE_VM) == 0) - timens_on_fork(new_ns, tsk); + timens_on_fork(new_ns, tsk); tsk->nsproxy = new_ns; return 0; -- 2.37.2.789.g6183377224-goog ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" 2022-09-13 10:25 [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" Andrei Vagin 2022-09-13 10:25 ` [PATCH 1/2] Revert "selftests/timens: add a test for vfork+exit" Andrei Vagin 2022-09-13 10:25 ` [PATCH 2/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" Andrei Vagin @ 2022-09-13 11:37 ` Kees Cook 2022-09-13 16:35 ` Andrei Vagin 2022-09-13 17:40 ` Kees Cook 3 siblings, 1 reply; 6+ messages in thread From: Kees Cook @ 2022-09-13 11:37 UTC (permalink / raw) To: Andrei Vagin; +Cc: linux-kernel On Tue, Sep 13, 2022 at 03:25:49AM -0700, Andrei Vagin wrote: > This reverts commits: > 133e2d3e81de ("fs/exec: allow to unshare a time namespace on vfork+exec") > 6342140db660 ("selftests/timens: add a test for vfork+exit") > > Alexey pointed out a few undesirable side effects of the reverted change. > First, it doesn't take into account that CLONE_VFORK can be used with > CLONE_THREAD. Second, a child process doesn't enter a target time name-space, > if its parent dies before the child calls exec. It happens because the parent > clears vfork_done. > > Eric W. Biederman suggests installing a time namespace as a task gets a new mm. > It includes all new processes cloned without CLONE_VM and all tasks that call > exec(). This is an user API change, but we think there aren't users that depend > on the old behavior. Can we include that patch here as well? > It is too late to make such changes in this release, so let's roll back > this patch and introduce the right one in the next release. Do you mean you'd like this revert to land for v6.0, and we should wait for the new API for later? -- Kees Cook ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" 2022-09-13 11:37 ` [PATCH 0/2] " Kees Cook @ 2022-09-13 16:35 ` Andrei Vagin 0 siblings, 0 replies; 6+ messages in thread From: Andrei Vagin @ 2022-09-13 16:35 UTC (permalink / raw) To: Kees Cook; +Cc: Andrei Vagin, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1406 bytes --] On Tue, Sep 13, 2022 at 5:18 AM Kees Cook <keescook@chromium.org> wrote: > > On Tue, Sep 13, 2022 at 03:25:49AM -0700, Andrei Vagin wrote: > > This reverts commits: > > 133e2d3e81de ("fs/exec: allow to unshare a time namespace on vfork+exec") > > 6342140db660 ("selftests/timens: add a test for vfork+exit") > > > > Alexey pointed out a few undesirable side effects of the reverted change. > > First, it doesn't take into account that CLONE_VFORK can be used with > > CLONE_THREAD. Second, a child process doesn't enter a target time name-space, > > if its parent dies before the child calls exec. It happens because the parent > > clears vfork_done. > > > > Eric W. Biederman suggests installing a time namespace as a task gets a new mm. > > It includes all new processes cloned without CLONE_VM and all tasks that call > > exec(). This is an user API change, but we think there aren't users that depend > > on the old behavior. > > Can we include that patch here as well? It is attached. I need to test it and then I will send it properly. > > > It is too late to make such changes in this release, so let's roll back > > this patch and introduce the right one in the next release. > > Do you mean you'd like this revert to land for v6.0, and we should wait > for the new API for later? Yes, I mean this. I think we should merge the new patch in v6.1-rc1 so it sits there for a while. Thanks, Andrei [-- Attachment #2: 0001-fs-exec-switch-timens-when-a-task-gets-a-new-mm.patch --] [-- Type: text/x-patch, Size: 5306 bytes --] From 23e06b14354b328a100229c2d5c66b1d1ffc7a6f Mon Sep 17 00:00:00 2001 From: Andrei Vagin <avagin@gmail.com> Date: Tue, 13 Sep 2022 14:10:28 +0000 Subject: [PATCH] fs/exec: switch timens when a task gets a new mm Changing a time namespace requires remapping a vvar page, so we don't want to allow doing that if any other tasks can use the same mm. Currently, we install a time namespace when a task is cloned without CLONE_VM. exec() is another case when a task gets a new mm and so it can switch a time namespace safely, but this case isn't handled now. One more issue of the current interface is that clone() with CLONE_VM isn't allowed if the current task has unshared a time namespace (timens_for_children doesn't match the current timens). Both these issues make some inconvenience for users. For example, Florian reported that posix_spawn() uses vfork+exec and this pattern doesn't work well with time namespaces due to the both described issues. LXC needed to workaround the exec() issue by calling setns. In 133e2d3e81de5 "fs/exec: allow to unshare a time namespace on vfork+exec", we tried to fix these issues with minimal impact on UAPI. But it adds extra complexity and some undesirable side effects. Eric suggested fixing the issues properly because here are all the reasons to suppose that there are no users that depend on the old behavior. Cc: Alexey Izbyshev <izbyshev@ispras.ru> Cc: Christian Brauner <brauner@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Kees Cook <keescook@chromium.org> Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Original-author: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> --- fs/exec.c | 5 +++++ include/linux/nsproxy.h | 1 + kernel/fork.c | 9 --------- kernel/nsproxy.c | 23 +++++++++++++++++++++-- 4 files changed, 27 insertions(+), 11 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index d046dbb9cbd0..71284188b96d 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -65,6 +65,7 @@ #include <linux/io_uring.h> #include <linux/syscall_user_dispatch.h> #include <linux/coredump.h> +#include <linux/time_namespace.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -1296,6 +1297,10 @@ int begin_new_exec(struct linux_binprm * bprm) bprm->mm = NULL; + retval = exec_task_namespaces(); + if (retval) + goto out_unlock; + #ifdef CONFIG_POSIX_TIMERS spin_lock_irq(&me->sighand->siglock); posix_cpu_timers_exit(me); diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h index cdb171efc7cb..fee881cded01 100644 --- a/include/linux/nsproxy.h +++ b/include/linux/nsproxy.h @@ -94,6 +94,7 @@ static inline struct cred *nsset_cred(struct nsset *set) int copy_namespaces(unsigned long flags, struct task_struct *tsk); void exit_task_namespaces(struct task_struct *tsk); void switch_task_namespaces(struct task_struct *tsk, struct nsproxy *new); +int exec_task_namespaces(void); void free_nsproxy(struct nsproxy *ns); int unshare_nsproxy_namespaces(unsigned long, struct nsproxy **, struct cred *, struct fs_struct *); diff --git a/kernel/fork.c b/kernel/fork.c index 2b6bd511c6ed..4eb803f75225 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2044,15 +2044,6 @@ static __latent_entropy struct task_struct *copy_process( return ERR_PTR(-EINVAL); } - /* - * If the new process will be in a different time namespace - * do not allow it to share VM or a thread group with the forking task. - */ - if (clone_flags & (CLONE_THREAD | CLONE_VM)) { - if (nsp->time_ns != nsp->time_ns_for_children) - return ERR_PTR(-EINVAL); - } - if (clone_flags & CLONE_PIDFD) { /* * - CLONE_DETACHED is blocked so that we can potentially diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c index eec72ca962e2..a487ff24129b 100644 --- a/kernel/nsproxy.c +++ b/kernel/nsproxy.c @@ -157,7 +157,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk) if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWCGROUP | CLONE_NEWTIME)))) { - if (likely(old_ns->time_ns_for_children == old_ns->time_ns)) { + if ((flags & CLONE_VM) || + likely(old_ns->time_ns_for_children == old_ns->time_ns)) { get_nsproxy(old_ns); return 0; } @@ -179,7 +180,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk) if (IS_ERR(new_ns)) return PTR_ERR(new_ns); - timens_on_fork(new_ns, tsk); + if ((flags & CLONE_VM) == 0) + timens_on_fork(new_ns, tsk); tsk->nsproxy = new_ns; return 0; @@ -254,6 +256,23 @@ void exit_task_namespaces(struct task_struct *p) switch_task_namespaces(p, NULL); } +int exec_task_namespaces(void) +{ + struct task_struct *tsk = current; + struct nsproxy *new; + + if (tsk->nsproxy->time_ns_for_children == tsk->nsproxy->time_ns) + return 0; + + new = create_new_namespaces(0, tsk, current_user_ns(), tsk->fs); + if (IS_ERR(new)) + return PTR_ERR(new); + + timens_on_fork(new, tsk); + switch_task_namespaces(tsk, new); + return 0; +} + static int check_setns_flags(unsigned long flags) { if (!flags || (flags & ~(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | -- 2.37.2.789.g6183377224-goog ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" 2022-09-13 10:25 [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" Andrei Vagin ` (2 preceding siblings ...) 2022-09-13 11:37 ` [PATCH 0/2] " Kees Cook @ 2022-09-13 17:40 ` Kees Cook 3 siblings, 0 replies; 6+ messages in thread From: Kees Cook @ 2022-09-13 17:40 UTC (permalink / raw) To: avagin; +Cc: Kees Cook, linux-kernel On Tue, 13 Sep 2022 03:25:49 -0700, Andrei Vagin wrote: > This reverts commits: > 133e2d3e81de ("fs/exec: allow to unshare a time namespace on vfork+exec") > 6342140db660 ("selftests/timens: add a test for vfork+exit") > > Alexey pointed out a few undesirable side effects of the reverted change. > First, it doesn't take into account that CLONE_VFORK can be used with > CLONE_THREAD. Second, a child process doesn't enter a target time name-space, > if its parent dies before the child calls exec. It happens because the parent > clears vfork_done. > > [...] Applied to for-linus/execve, thanks! [1/2] Revert "selftests/timens: add a test for vfork+exit" https://git.kernel.org/kees/c/2b1e8921fc35 [2/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" https://git.kernel.org/kees/c/33a2d6bc3480 -- Kees Cook ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-09-13 18:23 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-09-13 10:25 [PATCH 0/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" Andrei Vagin 2022-09-13 10:25 ` [PATCH 1/2] Revert "selftests/timens: add a test for vfork+exit" Andrei Vagin 2022-09-13 10:25 ` [PATCH 2/2] Revert "fs/exec: allow to unshare a time namespace on vfork+exec" Andrei Vagin 2022-09-13 11:37 ` [PATCH 0/2] " Kees Cook 2022-09-13 16:35 ` Andrei Vagin 2022-09-13 17:40 ` Kees Cook
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox