Re: [RFC PATCH] Minimal non-child process exit notification support

linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [RFC PATCH] Minimal non-child process exit notification support
       [not found] <20181029175322.189042-1-dancol@google.com>
@ 2018-10-31 16:53 ` Daniel Colascione
       [not found]   ` <175DDE5D-E738-4C35-B664-6AD8B9CF446D@amacapital.net>
  0 siblings, 1 reply; 2+ messages in thread
From: Daniel Colascione @ 2018-10-31 16:53 UTC (permalink / raw)
  To: linux-kernel, linux-api; +Cc: Tim Murray, Joel Fernandes, Daniel Colascione

+ linux-api

On Mon, Oct 29, 2018 at 5:53 PM, Daniel Colascione <dancol@google.com> wrote:
> This patch adds a new file under /proc/pid, /proc/pid/exithand.
> Attempting to read from an exithand file will block until the
> corresponding process exits, at which point the read will successfully
> complete with EOF.  The file descriptor supports both blocking
> operations and poll(2). It's intended to be a minimal interface for
> allowing a program to wait for the exit of a process that is not one
> of its children.
>
> Why might we want this interface? Android's lmkd kills processes in
> order to free memory in response to various memory pressure
> signals. It's desirable to wait until a killed process actually exits
> before moving on (if needed) to killing the next process. Since the
> processes that lmkd kills are not lmkd's children, lmkd currently
> lacks a way to wait for a proces to actually die after being sent
> SIGKILL; today, lmkd resorts to polling the proc filesystem pid
> entry. This interface allow lmkd to give up polling and instead block
> and wait for process death.
>
> Signed-off-by: Daniel Colascione <dancol@google.com>
> ---
>  fs/proc/Makefile             |   1 +
>  fs/proc/base.c               |   1 +
>  fs/proc/exithand.c           | 117 +++++++++++++++++++++++++++++++++++
>  fs/proc/internal.h           |   4 ++
>  include/linux/sched/signal.h |   7 +++
>  kernel/exit.c                |   2 +
>  kernel/signal.c              |   3 +
>  7 files changed, 135 insertions(+)
>  create mode 100644 fs/proc/exithand.c
>
> diff --git a/fs/proc/Makefile b/fs/proc/Makefile
> index ead487e80510..21322280a2c1 100644
> --- a/fs/proc/Makefile
> +++ b/fs/proc/Makefile
> @@ -27,6 +27,7 @@ proc-y        += softirqs.o
>  proc-y += namespaces.o
>  proc-y += self.o
>  proc-y += thread_self.o
> +proc-y  += exithand.o
>  proc-$(CONFIG_PROC_SYSCTL)     += proc_sysctl.o
>  proc-$(CONFIG_NET)             += proc_net.o
>  proc-$(CONFIG_PROC_KCORE)      += kcore.o
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 7e9f07bf260d..31bc6bbb6dc4 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3006,6 +3006,7 @@ static const struct pid_entry tgid_base_stuff[] = {
>  #ifdef CONFIG_LIVEPATCH
>         ONE("patch_state",  S_IRUSR, proc_pid_patch_state),
>  #endif
> +       REG("exithand", S_IRUGO, proc_tgid_exithand_operations),
>  };
>
>  static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
> diff --git a/fs/proc/exithand.c b/fs/proc/exithand.c
> new file mode 100644
> index 000000000000..358b08da6a08
> --- /dev/null
> +++ b/fs/proc/exithand.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Synchronous exit notification of non-child processes
> + *
> + * Simple file descriptor /proc/pid/exithand. Read blocks (and poll
> + * reports non-readable) until process either dies or becomes
> + * a zombie.
> + */
> +#include <linux/printk.h>
> +#include <linux/sched/signal.h>
> +#include <linux/poll.h>
> +#include "internal.h"
> +
> +static int proc_tgid_exithand_open(struct inode *inode, struct file *file)
> +{
> +       struct task_struct* task = get_proc_task(inode);
> +       /* If get_proc_task failed, it means the task is dead, which
> +        * is fine, since a subsequent read will return
> +        * immediately.  */
> +       if (task && !thread_group_leader(task))
> +               return -EINVAL;
> +       return 0;
> +}
> +
> +static ssize_t proc_tgid_exithand_read(struct file * file,
> +                                      char __user * buf,
> +                                      size_t count, loff_t *ppos)
> +{
> +       struct task_struct* task = NULL;
> +       wait_queue_entry_t wait;
> +       ssize_t res = 0;
> +       bool locked = false;
> +
> +       for (;;) {
> +               /* Retrieve the task from the struct pid each time
> +                * through the loop in case the exact struct task
> +                * changes underneath us (e.g., if in exec.c, the
> +                * execing process kills the group leader and starts
> +                * using its PID).  The struct signal should be the
> +                * same though even in this case.
> +                */
> +               task = get_proc_task(file_inode(file));
> +               res = 0;
> +               if (!task)
> +                       goto out;  /* No task?  Must have died.  */
> +
> +               BUG_ON(!thread_group_leader(task));
> +
> +               /* Synchronizes with exit.c machinery. */
> +               read_lock(&tasklist_lock);
> +               locked = true;
> +
> +               res = 0;
> +               if (task->exit_state)
> +                       goto out;
> +
> +               res = -EAGAIN;
> +               if (file->f_flags & O_NONBLOCK)
> +                       goto out;
> +
> +               /* Tell exit.c to go to the trouble of waking our
> +                * runqueue when this process gets around to
> +                * exiting. */
> +               task->signal->exithand_is_interested = true;
> +
> +               /* Even if the task identity changes, task->signal
> +                * should be invariant across the wait, making it safe
> +                * to go remove our wait record from the wait queue
> +                * after we come back from schedule.  */
> +
> +               init_waitqueue_entry(&wait, current);
> +               add_wait_queue(&wait_exithand, &wait);
> +
> +               read_unlock(&tasklist_lock);
> +               locked = false;
> +
> +               put_task_struct(task);
> +               task = NULL;
> +
> +               set_current_state(TASK_INTERRUPTIBLE);
> +               schedule();
> +               set_current_state(TASK_RUNNING);
> +               remove_wait_queue(&wait_exithand, &wait);
> +
> +               res = -ERESTARTSYS;
> +               if (signal_pending(current))
> +                       goto out;
> +       }
> +out:
> +       if (locked)
> +               read_unlock(&tasklist_lock);
> +       if (task)
> +               put_task_struct(task);
> +       return res;
> +}
> +
> +static __poll_t proc_tgid_exithand_poll(struct file *file, poll_table *wait)
> +{
> +       __poll_t mask = 0;
> +       struct task_struct* task = get_proc_task(file_inode(file));
> +       if (!task) {
> +               mask |= POLLIN;
> +       } else if (READ_ONCE(task->exit_state)) {
> +               mask |= POLLIN;
> +       } else {
> +               read_lock(&tasklist_lock);
> +               task->signal->exithand_is_interested = true;
> +               read_unlock(&tasklist_lock);
> +               poll_wait(file, &wait_exithand, wait);
> +       }
> +       return mask;
> +}
> +
> +const struct file_operations proc_tgid_exithand_operations = {
> +       .open           = proc_tgid_exithand_open,
> +       .read           = proc_tgid_exithand_read,
> +       .poll           = proc_tgid_exithand_poll,
> +};
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index 5185d7f6a51e..1009d20475bc 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -304,3 +304,7 @@ extern unsigned long task_statm(struct mm_struct *,
>                                 unsigned long *, unsigned long *,
>                                 unsigned long *, unsigned long *);
>  extern void task_mem(struct seq_file *, struct mm_struct *);
> +
> +/* exithand.c */
> +
> +extern const struct file_operations proc_tgid_exithand_operations;
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 13789d10a50e..44131cb6c7f4 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -74,6 +74,10 @@ struct multiprocess_signals {
>         struct hlist_node node;
>  };
>
> +/* Need to stick the waitq for exithand outside process structures in
> + * case a process disappears across a poll.  */
> +extern wait_queue_head_t wait_exithand;
> +
>  /*
>   * NOTE! "signal_struct" does not have its own
>   * locking, because a shared signal_struct always
> @@ -87,6 +91,9 @@ struct signal_struct {
>         int                     nr_threads;
>         struct list_head        thread_head;
>
> +       /* Protected with tasklist_lock.  */
> +       bool                    exithand_is_interested;
> +
>         wait_queue_head_t       wait_chldexit;  /* for wait4() */
>
>         /* current thread group signal load-balancing target: */
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 0e21e6d21f35..44a4e3796f8b 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1485,6 +1485,8 @@ void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
>  {
>         __wake_up_sync_key(&parent->signal->wait_chldexit,
>                                 TASK_INTERRUPTIBLE, 1, p);
> +       if (p->signal->exithand_is_interested)
> +               __wake_up_sync(&wait_exithand, TASK_INTERRUPTIBLE, 0);
>  }
>
>  static long do_wait(struct wait_opts *wo)
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 17565240b1c6..e156d48da70a 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -454,6 +454,9 @@ void flush_sigqueue(struct sigpending *queue)
>         }
>  }
>
> +wait_queue_head_t wait_exithand =
> +       __WAIT_QUEUE_HEAD_INITIALIZER(wait_exithand);
> +
>  /*
>   * Flush all pending signals for this kthread.
>   */
> --
> 2.19.1.568.g152ad8e336-goog
>

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [RFC PATCH] Minimal non-child process exit notification support
       [not found]   ` <175DDE5D-E738-4C35-B664-6AD8B9CF446D@amacapital.net>
@ 2018-10-31 17:44     ` Daniel Colascione
  0 siblings, 0 replies; 2+ messages in thread
From: Daniel Colascione @ 2018-10-31 17:44 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: linux-kernel, linux-api, Tim Murray, Joel Fernandes

On Wed, Oct 31, 2018 at 5:25 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> I had an old patch to do much the same thing:

It's a perennial idea. :-)

> https://lore.kernel.org/patchwork/patch/345098/
>
> Can you comment as to how your API compares to my old patch?

Sure. Basically, my approach is sort-of eventfd-esque, whereas your
approach involves adding a very unusual operation (poll support) to a
type of file (a directory) that normally doesn't support it. My
approach feels a bit more "conventional" than poll on a dfd.
Additionally, my approach is usable from the shell. In your model,
poll(2) returning *is* the notification, whereas in my approach, the
canonical notification is read() yielding EOF, with poll(2) acting
like a wakeup hint, just like for eventfd. (You can set O_NONBLOCK on
the exithand FD just like you would any other FD.)

The use of read() for notification of exit also allows for a simple
extension in which we return a siginfo_t with exit information to the
waiter, without changing the API model. My initial patch doesn't
include this feature because I wanted to keep the initial version as
simple as possible.

> You’re using
> some fairly gnarly global synchronization

The global synchronization only kicks for a particular process exit if
somebody has used an exithand FD to wait on that process. (Or more
precisely, that process's struct signal.) Since most process exits
don't require global synchronization, I don't think the global
waitqueue for exithand is a big problem, but if it is, there are
options for fixing it.

> , and that seems unnecessary

It is necessary, and I don't see how your patch is correct. In your
proc_task_base_poll, you call poll_wait() with &task->detach_wqh. What
prevents that waitqueue disappearing (and the poll table waitqueue
pointer dangling) immediately after proc_task_base_poll returns? The
proc_inode maintains a reference to a struct pid, not a task_struct,
but your waitqueue lives in task_struct.

The waitqueue living in task_struct is also wrong in the case that a
multithreaded program execs from a non-main thread; in this case (if
I'm reading the code in exec.c right) we destroy the old main thread
task_struct and have the caller-of-exec's task_struct adopt the old
main thread's struct pid. That is, identity-continuity of struct task
is not the same as identity-continuity of the logical thread group.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-10-31 17:44 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20181029175322.189042-1-dancol@google.com>
2018-10-31 16:53 ` [RFC PATCH] Minimal non-child process exit notification support Daniel Colascione
     [not found]   ` <175DDE5D-E738-4C35-B664-6AD8B9CF446D@amacapital.net>
2018-10-31 17:44     ` Daniel Colascione

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).