From: Tycho Andersen <tycho@tycho.pizza>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
Christian Brauner <brauner@kernel.org>,
fuse-devel <fuse-devel@lists.sourceforge.net>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: strange interaction between fuse + pidns
Date: Tue, 12 Jul 2022 09:14:09 -0600 [thread overview]
Message-ID: <Ys2PwTS0qFmGNFqy@netflix> (raw)
In-Reply-To: <87sfn62yd1.fsf@email.froward.int.ebiederm.org>
On Tue, Jul 12, 2022 at 09:34:50AM -0500, Eric W. Biederman wrote:
> Tycho Andersen <tycho@tycho.pizza> writes:
>
> > On Mon, Jul 11, 2022 at 06:06:21PM -0500, Eric W. Biederman wrote:
> >> Tycho Andersen <tycho@tycho.pizza> writes:
> >> It is not different enough to change the semantics. What I am aiming
> >> for is having a dedicated flag indicating a task will exit, that
> >> fatal_signal_pending can check. And I intend to make that flag one way
> >> so that once it is set it will never be cleared.
> >
> > Ok - how far out is that? I'd like to try to convince Miklos to land
> > the fuse part of this fix now, but without the "look at shared signals
> > too" patch, that fix is useless. I'm not married to my patch, but I
> > would like to get this fixed somehow soon.
>
> My point is that we need to figure out why you need the look at shared
> signals.
At least in the case where the task was already exiting, it's because
complete_signal() never wakes them up.
> If I can get everything reviewed my changes will be in the next merge
> window (it unfortunately always takes longer to get the code reviewed
> than I would like).
>
> However when my changes land does not matter. What you are trying to
> solve is orthogonal of my on-going work.
>
> The problem is that looking at shared signals is fundamentally broken.
> A case in point is that kernel threads can have a pending SIGKILL that
> is not a fatal signal. As kernel threads are allowed to ignore or even
> handle SIGKILL.
>
> If you want to change fatal_signal_pending to include PF_EXITING I would
> need to double check the implications but I think that would work, and
> would not have the problems including the shared pending state of
> SIGKILL.
I think that would work. I'll test it out, thanks.
> >> The other thing I have played with that might be relevant was removing
> >> the explicit wait in zap_pid_ns_processes and simply not allowing wait
> >> to reap the pid namespace init until all it's children had been reaped.
> >> Essentially how we deal with the thread group leader for ordinary
> >> processes. Does that sound like it might help in the fuse case?
> >
> > No, the problem is that the wait code doesn't know to look in the
> > right place, so waiting later still won't help.
>
> I was suggesting to modify the kernel so that zap_pid_ns_processes would
> not wait for the zapped processes. Instead I was proposing that
> delay_group_leader called from wait_consider_task would simply refuse to
> allow the init process of a pid namespace to be reaped until every other
> process of that pid namespace had exited.
>
> You can prototype how that would affect the deadlock by simply removing
> the waiting from zap_pid_ns_processes.
>
> I suggest that simply because that has the potential to remove some of
> the strange pid namespace cases.
>
> I don't understand the problematic interaction between pid namespace
> shutdown and the fuse daemon, so I am merely suggesting a possibility
> that I know can simplify pid namespace shutdown.
>
> Something like:
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index f4f8cb0435b4..d22a30b0b0cf 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -207,47 +207,6 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> read_unlock(&tasklist_lock);
> rcu_read_unlock();
>
> - /*
> - * Reap the EXIT_ZOMBIE children we had before we ignored SIGCHLD.
> - * kernel_wait4() will also block until our children traced from the
> - * parent namespace are detached and become EXIT_DEAD.
> - */
> - do {
> - clear_thread_flag(TIF_SIGPENDING);
> - rc = kernel_wait4(-1, NULL, __WALL, NULL);
> - } while (rc != -ECHILD);
> -
> - /*
> - * kernel_wait4() misses EXIT_DEAD children, and EXIT_ZOMBIE
> - * process whose parents processes are outside of the pid
> - * namespace. Such processes are created with setns()+fork().
> - *
> - * If those EXIT_ZOMBIE processes are not reaped by their
> - * parents before their parents exit, they will be reparented
> - * to pid_ns->child_reaper. Thus pidns->child_reaper needs to
> - * stay valid until they all go away.
> - *
> - * The code relies on the pid_ns->child_reaper ignoring
> - * SIGCHILD to cause those EXIT_ZOMBIE processes to be
> - * autoreaped if reparented.
> - *
> - * Semantically it is also desirable to wait for EXIT_ZOMBIE
> - * processes before allowing the child_reaper to be reaped, as
> - * that gives the invariant that when the init process of a
> - * pid namespace is reaped all of the processes in the pid
> - * namespace are gone.
> - *
> - * Once all of the other tasks are gone from the pid_namespace
> - * free_pid() will awaken this task.
> - */
> - for (;;) {
> - set_current_state(TASK_INTERRUPTIBLE);
> - if (pid_ns->pid_allocated == init_pids)
> - break;
> - schedule();
> - }
> - __set_current_state(TASK_RUNNING);
> -
> if (pid_ns->reboot)
> current->signal->group_exit_code = pid_ns->reboot;
Yes, but we need to add the wait to delay_group_leader(), and if the
tasks are stuck indefinitely looking at the wrong condition, I don't
see how moving it will help resolve things.
Thanks,
Tycho
next prev parent reply other threads:[~2022-07-12 15:20 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-23 17:21 strange interaction between fuse + pidns Tycho Andersen
2022-06-23 21:55 ` Vivek Goyal
2022-06-23 23:41 ` Tycho Andersen
2022-06-24 17:36 ` Vivek Goyal
2022-07-11 10:35 ` Miklos Szeredi
2022-07-11 13:59 ` Miklos Szeredi
2022-07-11 20:25 ` Tycho Andersen
2022-07-11 21:37 ` Eric W. Biederman
2022-07-11 22:53 ` Tycho Andersen
2022-07-11 23:06 ` Eric W. Biederman
2022-07-12 13:43 ` Tycho Andersen
2022-07-12 14:34 ` Eric W. Biederman
2022-07-12 15:14 ` Tycho Andersen [this message]
2022-07-13 17:53 ` [PATCH] sched: __fatal_signal_pending() should also check PF_EXITING Tycho Andersen
2022-07-20 15:03 ` Serge E. Hallyn
2022-07-20 20:58 ` Tycho Andersen
2022-07-21 1:54 ` Serge E. Hallyn
2022-07-27 15:44 ` Tycho Andersen
2022-07-27 16:32 ` Eric W. Biederman
2022-07-27 17:55 ` Tycho Andersen
2022-07-28 18:48 ` Eric W. Biederman
2022-07-27 17:55 ` Oleg Nesterov
2022-07-27 18:18 ` Tycho Andersen
2022-07-27 19:19 ` Oleg Nesterov
2022-07-27 19:40 ` Tycho Andersen
2022-07-28 9:12 ` Oleg Nesterov
2022-07-28 21:20 ` Tycho Andersen
2022-07-29 5:04 ` Eric W. Biederman
2022-07-29 13:50 ` Tycho Andersen
2022-07-29 16:15 ` Eric W. Biederman
2022-07-29 16:48 ` Tycho Andersen
2022-07-29 17:40 ` [RFC][PATCH] fuse: In fuse_flush only wait if someone wants the return code Eric W. Biederman
2022-07-29 20:47 ` Oleg Nesterov
2022-07-30 0:15 ` Al Viro
2022-07-30 5:10 ` [RFC][PATCH v2] " Eric W. Biederman
2022-08-01 15:16 ` Tycho Andersen
2022-08-02 12:50 ` Miklos Szeredi
2022-08-15 13:59 ` Tycho Andersen
2022-08-15 17:55 ` Serge E. Hallyn
2022-09-01 14:06 ` [PATCH] " Tycho Andersen
2022-09-19 15:03 ` Tycho Andersen
2022-09-20 18:02 ` Serge E. Hallyn
2022-09-26 14:17 ` Tycho Andersen
2022-09-27 9:46 ` Miklos Szeredi
2022-09-29 14:05 ` [fuse-devel] " Stef Bon
2022-09-29 16:39 ` [PATCH v2] " Tycho Andersen
2022-09-30 13:35 ` Miklos Szeredi
2022-09-30 14:01 ` Tycho Andersen
2022-09-30 14:41 ` Miklos Szeredi
2022-09-30 16:09 ` Tycho Andersen
2022-10-26 9:01 ` Miklos Szeredi
2022-11-14 16:02 ` [PATCH v3] " Tycho Andersen
2022-11-28 15:00 ` Tycho Andersen
2022-12-08 14:26 ` Miklos Szeredi
2022-12-08 17:49 ` Tycho Andersen
2022-12-19 19:16 ` Tycho Andersen
2023-01-03 14:51 ` Tycho Andersen
2023-01-05 15:15 ` Serge E. Hallyn
2023-01-26 14:12 ` Miklos Szeredi
2022-09-30 19:47 ` [PATCH] " Serge E. Hallyn
2022-09-19 15:46 ` [RFC][PATCH v2] " Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Ys2PwTS0qFmGNFqy@netflix \
--to=tycho@tycho.pizza \
--cc=brauner@kernel.org \
--cc=ebiederm@xmission.com \
--cc=fuse-devel@lists.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.