public inbox for linux-kernel@vger.kernel.org
* [PATCH] prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
@ 2012-01-07 15:56 Kay Sievers
  2012-01-07 16:13 ` Valdis.Kletnieks
  2012-04-23 22:36 ` Michael Kerrisk
  0 siblings, 2 replies; 7+ messages in thread
From: Kay Sievers @ 2012-01-07 15:56 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Oleg Nesterov, Lennart Poettering

Resending this; it got lost last September.

We still need it to properly implement init-like service managers.

Andrew, care to pick this up again? The issues raised last year should
all be fixed.

Thanks,
Kay


From: Lennart Poettering <lennart@poettering.net>
Subject: prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision

Userspace service managers/supervisors need to track their started
services. Many services daemonize by double-forking and get implicitly
re-parented to PID 1. The service manager will no longer be able to
receive the SIGCHLD signals for them, and is no longer in charge of
reaping the children with wait(). All information about the children
is lost at the moment PID 1 cleans up the re-parented processes.

With this prctl, a service manager process can mark itself as a sort of
'sub-init', able to stay as the parent for all orphaned processes
created by the started services. All SIGCHLD signals will be delivered
to the service manager.

For a service manager, receiving SIGCHLD and calling wait() is much
preferred over any possible asynchronous notification about specific
PIDs, because the service manager has full access to the child process
data in /proc, and the PID cannot be re-used until the service manager
itself has called wait().
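
To illustrate the userspace side, here is a minimal sketch (not part of
the patch) of a manager that marks itself as a child subreaper, starts a
double-forking "service", and then reaps the orphaned grandchild itself.
The function name subreaper_demo() and the two pipes are made up for the
example, and PR_SET_CHILD_SUBREAPER is assumed to come from the installed
<sys/prctl.h> rather than from the numeric value in this posting.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper. Returns 1 if an orphaned grandchild was
 * re-parented to the caller and reaped by it, 0 if it was not, and
 * -1 on setup errors.
 */
int subreaper_demo(void)
{
	int pid_pipe[2], go_pipe[2], status;
	pid_t manager = getpid(), child, grandchild;
	char c;

	/* Mark the calling process as a child subreaper. */
	if (prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL) < 0)
		return -1;
	if (pipe(pid_pipe) < 0 || pipe(go_pipe) < 0)
		return -1;

	child = fork();
	if (child < 0)
		return -1;
	if (child == 0) {
		/* Intermediate process: fork the "daemon", report its PID, exit. */
		close(pid_pipe[0]);
		grandchild = fork();
		if (grandchild < 0)
			_exit(1);
		if (grandchild == 0) {
			close(pid_pipe[1]);
			close(go_pipe[1]);
			/* Block until the manager closes its write end (EOF). */
			if (read(go_pipe[0], &c, 1) < 0)
				_exit(2);
			/* By now the intermediate parent is gone; check who adopted us. */
			_exit(getppid() == manager ? 0 : 1);
		}
		if (write(pid_pipe[1], &grandchild, sizeof(grandchild)) !=
		    (ssize_t)sizeof(grandchild))
			_exit(1);
		_exit(0);
	}

	close(pid_pipe[1]);
	if (read(pid_pipe[0], &grandchild, sizeof(grandchild)) !=
	    (ssize_t)sizeof(grandchild))
		return -1;
	close(pid_pipe[0]);

	/* Reap the intermediate process; the grandchild is now an orphan. */
	if (waitpid(child, &status, 0) != child)
		return -1;

	/* Release the grandchild by closing the last write end of go_pipe. */
	close(go_pipe[0]);
	close(go_pipe[1]);

	/* This waitpid() can only succeed if we are the grandchild's parent. */
	if (waitpid(grandchild, &status, 0) != grandchild)
		return 0;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Without the prctl() call, the second waitpid() would normally fail with
ECHILD, because the grandchild would have been handed to PID 1 instead.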

As a side effect, the relevant parent PID information does not get lost
by a double-fork, which results in a more elaborate process tree and 'ps'
output:

before:
  # ps afx
  253 ?        Ss     0:00 /bin/dbus-daemon --system --nofork
  294 ?        Sl     0:00 /usr/libexec/polkit-1/polkitd
  328 ?        S      0:00 /usr/sbin/modem-manager
  608 ?        Sl     0:00 /usr/libexec/colord
  658 ?        Sl     0:00 /usr/libexec/upowerd
  819 ?        Sl     0:00 /usr/libexec/imsettings-daemon
  916 ?        Sl     0:00 /usr/libexec/udisks-daemon
  917 ?        S      0:00  \_ udisks-daemon: not polling any devices

after:
  # ps afx
  294 ?        Ss     0:00 /bin/dbus-daemon --system --nofork
  426 ?        Sl     0:00  \_ /usr/libexec/polkit-1/polkitd
  449 ?        S      0:00  \_ /usr/sbin/modem-manager
  635 ?        Sl     0:00  \_ /usr/libexec/colord
  705 ?        Sl     0:00  \_ /usr/libexec/upowerd
  959 ?        Sl     0:00  \_ /usr/libexec/udisks-daemon
  960 ?        S      0:00  |   \_ udisks-daemon: not polling any devices
  977 ?        Sl     0:00  \_ /usr/libexec/packagekitd

This prctl is orthogonal to PID namespaces. PID namespaces are isolated
from each other, while a service management process usually requires
the services to live in the same namespace, to be able to talk to each
other.

Users of this will be the systemd per-user instance, which provides
init-like functionality for the user's login session, and D-Bus, which
activates bus services on demand. Both need init-like capabilities
to be able to properly keep track of the services they start.

Many thanks to Oleg for several rounds of review and insights.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
---

 include/linux/prctl.h |    3 +++
 include/linux/sched.h |   12 ++++++++++++
 kernel/exit.c         |   28 +++++++++++++++++++++++-----
 kernel/fork.c         |    3 +++
 kernel/sys.c          |    8 ++++++++
 5 files changed, 49 insertions(+), 5 deletions(-)

--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -102,4 +102,7 @@
 
 #define PR_MCE_KILL_GET 34
 
+#define PR_SET_CHILD_SUBREAPER 35
+#define PR_GET_CHILD_SUBREAPER 36
+
 #endif /* _LINUX_PRCTL_H */
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -552,6 +552,18 @@ struct signal_struct {
 	int			group_stop_count;
 	unsigned int		flags; /* see SIGNAL_* flags below */
 
+	/*
+	 * PR_SET_CHILD_SUBREAPER marks a process, like a service
+	 * manager, to re-parent orphan (double-forking) child processes
+	 * to this process instead of 'init'. The service manager is
+	 * able to receive SIGCHLD signals and is able to investigate
+	 * the process until it calls wait(). All children of this
+	 * process will inherit a flag if they should look for a
+	 * child_subreaper process at exit.
+	 */
+	unsigned int		is_child_subreaper:1;
+	unsigned int		has_child_subreaper:1;
+
 	/* POSIX.1b Interval Timers */
 	struct list_head posix_timers;
 
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -687,11 +687,12 @@ static void exit_mm(struct task_struct *
 }
 
 /*
- * When we die, we re-parent all our children.
- * Try to give them to another thread in our thread
- * group, and if no such member exists, give it to
- * the child reaper process (ie "init") in our pid
- * space.
+ * When we die, we re-parent all our children, and try to:
+ * 1. give them to another thread in our thread group, if such a
+ *    member exists
+ * 2. give it to the first ancestor process which prctl'd itself
+ *    as a child_subreaper for its children (like a service manager)
+ * 3. give it to the init process (PID 1) in our pid namespace
  */
 static struct task_struct *find_new_reaper(struct task_struct *father)
 	__releases(&tasklist_lock)
@@ -722,6 +723,23 @@ static struct task_struct *find_new_reap
 		 * forget_original_parent() must move them somewhere.
 		 */
 		pid_ns->child_reaper = init_pid_ns.child_reaper;
+	} else if (father->signal->has_child_subreaper) {
+		struct task_struct *reaper;
+
+		/* find the first ancestor marked as child_subreaper */
+		for (reaper = father->real_parent;
+		     reaper != &init_task;
+		     reaper = reaper->real_parent) {
+			if (same_thread_group(reaper, pid_ns->child_reaper))
+				break;
+			if (!reaper->signal->is_child_subreaper)
+				continue;
+			thread = reaper;
+			do {
+				if (!(thread->flags & PF_EXITING))
+					return reaper;
+			} while_each_thread(reaper, thread);
+		}
 	}
 
 	return pid_ns->child_reaper;
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -979,6 +979,9 @@ static int copy_signal(unsigned long clo
 	sig->oom_score_adj = current->signal->oom_score_adj;
 	sig->oom_score_adj_min = current->signal->oom_score_adj_min;
 
+	sig->has_child_subreaper = current->signal->has_child_subreaper ||
+				   current->signal->is_child_subreaper;
+
 	mutex_init(&sig->cred_guard_mutex);
 
 	return 0;
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1841,6 +1841,14 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
 			else
 				error = PR_MCE_KILL_DEFAULT;
 			break;
+		case PR_SET_CHILD_SUBREAPER:
+			me->signal->is_child_subreaper = !!arg2;
+			error = 0;
+			break;
+		case PR_GET_CHILD_SUBREAPER:
+			error = put_user(me->signal->is_child_subreaper,
+					 (int __user *) arg2);
+			break;
 		default:
 			error = -EINVAL;
 			break;




* Re: [PATCH] prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
  2012-01-07 15:56 [PATCH] prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision Kay Sievers
@ 2012-01-07 16:13 ` Valdis.Kletnieks
  2012-01-09 15:07   ` Oleg Nesterov
  2012-04-23 22:36 ` Michael Kerrisk
  1 sibling, 1 reply; 7+ messages in thread
From: Valdis.Kletnieks @ 2012-01-07 16:13 UTC (permalink / raw)
  To: Kay Sievers; +Cc: akpm, linux-kernel, Oleg Nesterov, Lennart Poettering


On Sat, 07 Jan 2012 16:56:37 +0100, Kay Sievers said:
> Resending this, it got lost last year's September.
>
> We still need it to properly implement init-like service managers.

> From: Lennart Poettering <lennart@poettering.net>
> Subject: prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision

> Users of this will be the systemd per-user instance, which provides
> init-like functionality for the user's login session and D-Bus, which
> activates bus services on-demand. Both need init-like capabilities
> to be able to properly keep track of the services they start.

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -552,6 +552,18 @@ struct signal_struct {
>  	int			group_stop_count;
>  	unsigned int		flags; /* see SIGNAL_* flags below */
>
> +	/*
> +	 * PR_SET_CHILD_SUBREAPER marks a process, like a service
> +	 * manager, to re-parent orphan (double-forking) child processes
> +	 * to this process instead of 'init'. The service manager is
> +	 * able to receive SIGCHLD signals and is able to investigate
> +	 * the process until it calls wait(). All children of this
> +	 * process will inherit a flag if they should look for a
> +	 * child_subreaper process at exit.
> +	 */
> +	unsigned int		is_child_subreaper:1;
> +	unsigned int		has_child_subreaper:1;

Is there someplace we can stick these two fields where they won't expand the
signal_struct?  Can we stick them in signal_struct->flags instead? Looks like we've
only burned 3 bits of that unsigned int.  Yes, I know that would complicate the
prctl get/set code.

> +		/* find the first ancestor marked as child_subreaper */
> +		for (reaper = father->real_parent;
> +		     reaper != &init_task;
> +		     reaper = reaper->real_parent) {

I admit being insufficiently caffeinated - does this DTRT in a PID namespace? That
&init_task looks fishy to me...



* Re: [PATCH] prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
  2012-01-07 16:13 ` Valdis.Kletnieks
@ 2012-01-09 15:07   ` Oleg Nesterov
  2012-01-14  0:35     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2012-01-09 15:07 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Kay Sievers, akpm, linux-kernel, Lennart Poettering

On 01/07, Valdis.Kletnieks@vt.edu wrote:
>
> On Sat, 07 Jan 2012 16:56:37 +0100, Kay Sievers said:
> > Resending this, it got lost last year's September.
> >
> > We still need it to properly implement init-like service managers.
>
> > From: Lennart Poettering <lennart@poettering.net>
> > Subject: prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
>
> > Users of this will be the systemd per-user instance, which provides
> > init-like functionality for the user's login session and D-Bus, which
> > activates bus services on-demand. Both need init-like capabilities
> > to be able to properly keep track of the services they start.
>
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -552,6 +552,18 @@ struct signal_struct {
> >  	int			group_stop_count;
> >  	unsigned int		flags; /* see SIGNAL_* flags below */
> >
> > +	/*
> > +	 * PR_SET_CHILD_SUBREAPER marks a process, like a service
> > +	 * manager, to re-parent orphan (double-forking) child processes
> > +	 * to this process instead of 'init'. The service manager is
> > +	 * able to receive SIGCHLD signals and is able to investigate
> > +	 * the process until it calls wait(). All children of this
> > +	 * process will inherit a flag if they should look for a
> > +	 * child_subreaper process at exit.
> > +	 */
> > +	unsigned int		is_child_subreaper:1;
> > +	unsigned int		has_child_subreaper:1;
>
> Is there someplace we can stick these two fields where they won't expand the
> signal_struct?  Can we stick them in signal_struct->flags instead?

Yes, it would be better to use signal_struct->flags. But we can't do this
until we clean up the usage of ->flags. For example,
task_participate_group_stop() simply does sig->flags = SIGNAL_STOP_STOPPED.


> > +		/* find the first ancestor marked as child_subreaper */
> > +		for (reaper = father->real_parent;
> > +		     reaper != &init_task;
> > +		     reaper = reaper->real_parent) {
>
> I admit being insufficiently caffienated - does this DTRT in a PID namespace? That
> &init_task looks fishy to me...

Probably this needs a comment. Initially I was confused too.

Note that the code below checks same_thread_group(reaper, pid_ns->child_reaper);
this is what we need to DTRT in a PID namespace. However, we still need the
check above, see http://marc.info/?l=linux-kernel&m=131385460420380
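
For readers trying to follow the two checks, a simplified userspace model
of the ancestor walk may help. This is an illustration only, not kernel
code: struct proc, model_fork() and model_find_new_reaper() are
hypothetical names, every process is single-threaded, a NULL real_parent
stands in for the `reaper != &init_task` bound, and a pointer comparison
stands in for same_thread_group().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for task_struct plus the two new signal_struct bits. */
struct proc {
	struct proc *real_parent;	/* NULL plays the role of init_task */
	bool is_child_subreaper;	/* set via PR_SET_CHILD_SUBREAPER */
	bool has_child_subreaper;	/* some ancestor was a subreaper at fork */
	bool exiting;			/* models PF_EXITING */
};

/* Mirrors the copy_signal() hunk: only has_child_subreaper is derived. */
void model_fork(struct proc *child, struct proc *parent)
{
	child->real_parent = parent;
	child->is_child_subreaper = false;
	child->has_child_subreaper = parent->has_child_subreaper ||
				     parent->is_child_subreaper;
	child->exiting = false;
}

/*
 * Pick the new parent for the children of a dying 'father'.
 * 'ns_init' is the child reaper of father's PID namespace.
 */
struct proc *model_find_new_reaper(struct proc *father, struct proc *ns_init)
{
	struct proc *reaper;

	if (!father->has_child_subreaper)
		return ns_init;

	for (reaper = father->real_parent; reaper != NULL;
	     reaper = reaper->real_parent) {
		/* Never walk past the namespace's own init. */
		if (reaper == ns_init)
			break;
		if (!reaper->is_child_subreaper)
			continue;
		/* A subreaper that is itself exiting cannot adopt anyone. */
		if (!reaper->exiting)
			return reaper;
	}
	return ns_init;
}
```

In this model, an orphan inside a PID namespace stops at that namespace's
init even when a subreaper exists further up the chain, which is the
behaviour the same_thread_group() check is meant to provide.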

Oleg.



* Re: [PATCH] prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
  2012-01-09 15:07   ` Oleg Nesterov
@ 2012-01-14  0:35     ` Andrew Morton
  2012-01-14 13:59       ` Kay Sievers
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2012-01-14  0:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Valdis.Kletnieks, Kay Sievers, linux-kernel, Lennart Poettering

On Mon, 9 Jan 2012 16:07:01 +0100
Oleg Nesterov <oleg@redhat.com> wrote:

> > > +		/* find the first ancestor marked as child_subreaper */
> > > +		for (reaper = father->real_parent;
> > > +		     reaper != &init_task;
> > > +		     reaper = reaper->real_parent) {
> >
> > I admit being insufficiently caffienated - does this DTRT in a PID namespace? That
> > &init_task looks fishy to me...
> 
> Probably this needs a comment. Initially I was confused too.
> 
> Note that the code below checks same_thread_group(reaper, pid_ns->child_reaper),
> this is what we need to DTRT in a PID namespace. However we still need the
> check above, see http://marc.info/?l=linux-kernel&m=131385460420380

In light of Kay's haughty silence, I did this:

--- a/kernel/exit.c~prctl-add-pr_setget_child_subreaper-to-allow-simple-process-supervision-fix-fix
+++ a/kernel/exit.c
@@ -724,7 +724,13 @@ static struct task_struct *find_new_reap
 	} else if (father->signal->has_child_subreaper) {
 		struct task_struct *reaper;
 
-		/* find the first ancestor marked as child_subreaper */
+		/*
+		 * Find the first ancestor marked as child_subreaper.
+		 * Note that the code below checks same_thread_group(reaper,
+		 * pid_ns->child_reaper).  This is what we need to DTRT in a
+		 * PID namespace. However we still need the check above, see
+		 * http://marc.info/?l=linux-kernel&m=131385460420380
+		 */
 		for (reaper = father->real_parent;
 		     reaper != &init_task;
 		     reaper = reaper->real_parent) {
_

I'm not a fan of URLs-in-comments, but that email was too gnarly to be
condensed into a sane comment.



* Re: [PATCH] prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
  2012-01-14  0:35     ` Andrew Morton
@ 2012-01-14 13:59       ` Kay Sievers
  0 siblings, 0 replies; 7+ messages in thread
From: Kay Sievers @ 2012-01-14 13:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, Valdis.Kletnieks, linux-kernel, Lennart Poettering

On Sat, Jan 14, 2012 at 01:35, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 9 Jan 2012 16:07:01 +0100
> Oleg Nesterov <oleg@redhat.com> wrote:
>
>> > > +         /* find the first ancestor marked as child_subreaper */
>> > > +         for (reaper = father->real_parent;
>> > > +              reaper != &init_task;
>> > > +              reaper = reaper->real_parent) {
>> >
>> > I admit being insufficiently caffienated - does this DTRT in a PID namespace? That
>> > &init_task looks fishy to me...
>>
>> Probably this needs a comment. Initially I was confused too.
>>
>> Note that the code below checks same_thread_group(reaper, pid_ns->child_reaper),
>> this is what we need to DTRT in a PID namespace. However we still need the
>> check above, see http://marc.info/?l=linux-kernel&m=131385460420380
>
> In light of Kay's haughty silence, I did this:
>
> --- a/kernel/exit.c~prctl-add-pr_setget_child_subreaper-to-allow-simple-process-supervision-fix-fix
> +++ a/kernel/exit.c
> @@ -724,7 +724,13 @@ static struct task_struct *find_new_reap
>        } else if (father->signal->has_child_subreaper) {
>                struct task_struct *reaper;
>
> -               /* find the first ancestor marked as child_subreaper */
> +               /*
> +                * Find the first ancestor marked as child_subreaper.
> +                * Note that the code below checks same_thread_group(reaper,
> +                * pid_ns->child_reaper).  This is what we need to DTRT in a
> +                * PID namespace. However we still need the check above, see
> +                * http://marc.info/?l=linux-kernel&m=131385460420380
> +                */
>                for (reaper = father->real_parent;
>                     reaper != &init_task;
>                     reaper = reaper->real_parent) {

Oleg and I got nowhere really, discussing if we should switch the
check in the for() parameters and the one inside the loop; so we just
decided to keep it the way it is, and did not reply any further. It
wasn't that we didn't care, we just couldn't decide. Sorry for that.

The comment indeed sounds good to have. Thanks for adding this.

Kay


* Re: [PATCH] prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
  2012-01-07 15:56 [PATCH] prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision Kay Sievers
  2012-01-07 16:13 ` Valdis.Kletnieks
@ 2012-04-23 22:36 ` Michael Kerrisk
  2013-01-10 22:48   ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 7+ messages in thread
From: Michael Kerrisk @ 2012-04-23 22:36 UTC (permalink / raw)
  To: Kay Sievers, Lennart Poettering
  Cc: akpm, linux-kernel, Oleg Nesterov, Michael Kerrisk

Lennart,

On Sun, Jan 8, 2012 at 4:56 AM, Kay Sievers <kay.sievers@vrfy.org> wrote:
> Resending this, it got lost last year's September.
>
> We still need it to properly implement init-like service managers.
>
> Andrew, care to pick this up again? The issues raised the last year are
> all expected to be fixed.
>
> Thanks,
> Kay
>
>
> From: Lennart Poettering <lennart@poettering.net>
> Subject: prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision

You have often enthusiastically pointed out to me pieces that are
missing from the man pages. When will you be sending me a man-pages
patch to document these new prctl() options that you've added?
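
As a starting point for such documentation, here is a minimal sketch of
how the two options would be called from userspace, going by the
kernel/sys.c hunk: PR_SET_CHILD_SUBREAPER takes a boolean in arg2, and
PR_GET_CHILD_SUBREAPER writes an int through the pointer passed in arg2.
The wrapper names are hypothetical, and the remaining arguments are
passed as zero as a precaution, not because the posted patch checks them.

```c
#include <sys/prctl.h>

/* Hypothetical wrapper: set or clear the caller's subreaper flag. */
int set_child_subreaper(int on)
{
	/* The kernel side reads arg2 as unsigned long and applies !!arg2. */
	return prctl(PR_SET_CHILD_SUBREAPER, (unsigned long)!!on,
		     0UL, 0UL, 0UL);
}

/* Hypothetical wrapper: returns the flag (0 or 1), or -1 on error. */
int get_child_subreaper(void)
{
	int flag = -1;

	/* The kernel stores the current setting into the int at arg2. */
	if (prctl(PR_GET_CHILD_SUBREAPER, &flag, 0UL, 0UL, 0UL) < 0)
		return -1;
	return flag;
}
```

A caller would typically invoke set_child_subreaper(1) once, early in the
service manager's startup, before forking any services.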

Thanks,

Michael



-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/


* Re: [PATCH] prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
  2012-04-23 22:36 ` Michael Kerrisk
@ 2013-01-10 22:48   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-10 22:48 UTC (permalink / raw)
  To: Kay Sievers, Lennart Poettering
  Cc: akpm, linux-kernel, Oleg Nesterov, Michael Kerrisk, shawnlandden

Hi Lennart,

Ping!

Also, I'll add you to CC on Shawn's mail. Perhaps you can
review/improve his patch. Or please send me an alternative.

Thanks,

Michael


On Tue, Apr 24, 2012 at 12:36 AM, Michael Kerrisk
<mtk.manpages@gmail.com> wrote:
> Lennart,
>
> On Sun, Jan 8, 2012 at 4:56 AM, Kay Sievers <kay.sievers@vrfy.org> wrote:
>> Resending this, it got lost last year's September.
>>
>> We still need it to properly implement init-like service managers.
>>
>> Andrew, care to pick this up again? The issues raised the last year are
>> all expected to be fixed.
>>
>> Thanks,
>> Kay
>>
>>
>> From: Lennart Poettering <lennart@poettering.net>
>> Subject: prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision
>
> You have often enthusiastically pointed out to me pieces that are
> missing from the man pages. When will you be sending me a man-pages
> patch to document these new prctl() options that you've added?
>
> Thanks,
>
> Michael
>
>
>
>> Userspace service managers/supervisors need to track their started
>> services. Many services daemonize by double-forking and get implicitly
>> re-parented to PID 1. The service manager will no longer be able to
>> receive the SIGCHLD signals for them, and is no longer in charge of
>> reaping the children with wait(). All information about the children
>> is lost at the moment PID 1 cleans up the re-parented processes.
>>
>> With this prctl, a service manager process can mark itself as a sort of
>> 'sub-init', able to stay as the parent for all orphaned processes
>> created by the started services. All SIGCHLD signals will be delivered
>> to the service manager.
>>
>> Receiving SIGCHLD and doing wait() is in cases of a service-manager
>> much preferred over any possible asynchronous notification about
>> specific PIDs, because the service manager has full access to the
>> child process data in /proc and the PID can not be re-used until
>> the wait(), the service-manager itself is in charge of, has happened.
>>
>> As a side effect, the relevant parent PID information does not get lost
>> by a double-fork, which results in a more elaborate process tree and 'ps'
>> output:
>>
>> before:
>>  # ps afx
>>  253 ?        Ss     0:00 /bin/dbus-daemon --system --nofork
>>  294 ?        Sl     0:00 /usr/libexec/polkit-1/polkitd
>>  328 ?        S      0:00 /usr/sbin/modem-manager
>>  608 ?        Sl     0:00 /usr/libexec/colord
>>  658 ?        Sl     0:00 /usr/libexec/upowerd
>>  819 ?        Sl     0:00 /usr/libexec/imsettings-daemon
>>  916 ?        Sl     0:00 /usr/libexec/udisks-daemon
>>  917 ?        S      0:00  \_ udisks-daemon: not polling any devices
>>
>> after:
>>  # ps afx
>>  294 ?        Ss     0:00 /bin/dbus-daemon --system --nofork
>>  426 ?        Sl     0:00  \_ /usr/libexec/polkit-1/polkitd
>>  449 ?        S      0:00  \_ /usr/sbin/modem-manager
>>  635 ?        Sl     0:00  \_ /usr/libexec/colord
>>  705 ?        Sl     0:00  \_ /usr/libexec/upowerd
>>  959 ?        Sl     0:00  \_ /usr/libexec/udisks-daemon
>>  960 ?        S      0:00  |   \_ udisks-daemon: not polling any devices
>>  977 ?        Sl     0:00  \_ /usr/libexec/packagekitd
>>
>> This prctl is orthogonal to PID namespaces. PID namespaces are isolated
>> from each other, while a service management process usually requires
>> the services to live in the same namespace, to be able to talk to each
>> other.
>>
>> Users of this will be the systemd per-user instance, which provides
>> init-like functionality for the user's login session and D-Bus, which
>> activates bus services on-demand. Both need init-like capabilities
>> to be able to properly keep track of the services they start.
>>
>> Many thanks to Oleg for several rounds of review and insights.
>>
>> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
>> Signed-off-by: Lennart Poettering <lennart@poettering.net>
>> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
>> ---
>>
>>  include/linux/prctl.h |    3 +++
>>  include/linux/sched.h |   12 ++++++++++++
>>  kernel/exit.c         |   28 +++++++++++++++++++++++-----
>>  kernel/fork.c         |    3 +++
>>  kernel/sys.c          |    8 ++++++++
>>  5 files changed, 49 insertions(+), 5 deletions(-)
>>
>> --- a/include/linux/prctl.h
>> +++ b/include/linux/prctl.h
>> @@ -102,4 +102,7 @@
>>
>>  #define PR_MCE_KILL_GET 34
>>
>> +#define PR_SET_CHILD_SUBREAPER 35
>> +#define PR_GET_CHILD_SUBREAPER 36
>> +
>>  #endif /* _LINUX_PRCTL_H */
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -552,6 +552,18 @@ struct signal_struct {
>>        int                     group_stop_count;
>>        unsigned int            flags; /* see SIGNAL_* flags below */
>>
>> +       /*
>> +        * PR_SET_CHILD_SUBREAPER marks a process, like a service
>> +        * manager, to re-parent orphan (double-forking) child processes
>> +        * to this process instead of 'init'. The service manager is
>> +        * able to receive SIGCHLD signals and is able to investigate
>> +        * the process until it calls wait(). All children of this
>> +        * process will inherit a flag if they should look for a
>> +        * child_subreaper process at exit.
>> +        */
>> +       unsigned int            is_child_subreaper:1;
>> +       unsigned int            has_child_subreaper:1;
>> +
>>        /* POSIX.1b Interval Timers */
>>        struct list_head posix_timers;
>>
>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -687,11 +687,12 @@ static void exit_mm(struct task_struct *
>>  }
>>
>>  /*
>> - * When we die, we re-parent all our children.
>> - * Try to give them to another thread in our thread
>> - * group, and if no such member exists, give it to
>> - * the child reaper process (ie "init") in our pid
>> - * space.
>> + * When we die, we re-parent all our children, and try to:
>> + * 1. give them to another thread in our thread group, if such a
>> + *    member exists
>> + * 2. give it to the first ancestor process which prctl'd itself
>> + *    as a child_subreaper for its children (like a service manager)
>> + * 3. give it to the init process (PID 1) in our pid namespace
>>  */
>>  static struct task_struct *find_new_reaper(struct task_struct *father)
>>        __releases(&tasklist_lock)
>> @@ -722,6 +723,23 @@ static struct task_struct *find_new_reap
>>                 * forget_original_parent() must move them somewhere.
>>                 */
>>                pid_ns->child_reaper = init_pid_ns.child_reaper;
>> +       } else if (father->signal->has_child_subreaper) {
>> +               struct task_struct *reaper;
>> +
>> +               /* find the first ancestor marked as child_subreaper */
>> +               for (reaper = father->real_parent;
>> +                    reaper != &init_task;
>> +                    reaper = reaper->real_parent) {
>> +                       if (same_thread_group(reaper, pid_ns->child_reaper))
>> +                               break;
>> +                       if (!reaper->signal->is_child_subreaper)
>> +                               continue;
>> +                       thread = reaper;
>> +                       do {
>> +                               if (!(thread->flags & PF_EXITING))
>> +                                       return reaper;
>> +                       } while_each_thread(reaper, thread);
>> +               }
>>        }
>>
>>        return pid_ns->child_reaper;
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -979,6 +979,9 @@ static int copy_signal(unsigned long clo
>>        sig->oom_score_adj = current->signal->oom_score_adj;
>>        sig->oom_score_adj_min = current->signal->oom_score_adj_min;
>>
>> +       sig->has_child_subreaper = current->signal->has_child_subreaper ||
>> +                                  current->signal->is_child_subreaper;
>> +
>>        mutex_init(&sig->cred_guard_mutex);
>>
>>        return 0;
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -1841,6 +1841,14 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
>>                        else
>>                                error = PR_MCE_KILL_DEFAULT;
>>                        break;
>> +               case PR_SET_CHILD_SUBREAPER:
>> +                       me->signal->is_child_subreaper = !!arg2;
>> +                       error = 0;
>> +                       break;
>> +               case PR_GET_CHILD_SUBREAPER:
>> +                       error = put_user(me->signal->is_child_subreaper,
>> +                                        (int __user *) arg2);
>> +                       break;
>>                default:
>>                        error = -EINVAL;
>>                        break;
>>
>>
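For readers who want to see the userspace side of the quoted patch, here is a
minimal sketch, assuming a kernel and libc headers that already provide
PR_SET_CHILD_SUBREAPER and PR_GET_CHILD_SUBREAPER via <sys/prctl.h>. It uses
the header constants rather than the numeric values in this revision of the
patch, since those may differ in the merged version. The helper names
spawn_double_forked() and subreaper_demo() are hypothetical and exist only for
this illustration. The process marks itself as a child subreaper, starts a
double-forking "service", and then reaps the orphaned grandchild itself with
waitpid().

```c
#include <poll.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: emulate a daemon that double-forks.
 * Returns the PID of the grandchild, or -1 on error.
 */
static pid_t spawn_double_forked(void)
{
	int fds[2];
	pid_t child, grandchild = -1;

	if (pipe(fds) < 0)
		return -1;

	child = fork();
	if (child < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (child == 0) {
		/* intermediate process: fork the "daemon" and exit */
		pid_t gc;

		close(fds[0]);
		gc = fork();
		if (gc == 0) {
			close(fds[1]);
			/* linger briefly so the intermediate parent
			 * is likely to exit first */
			poll(NULL, 0, 100);
			_exit(7);
		}
		/* report the grandchild's PID to the supervisor */
		if (write(fds[1], &gc, sizeof(gc)) != (ssize_t)sizeof(gc))
			_exit(1);
		_exit(gc < 0 ? 1 : 0);
	}

	close(fds[1]);
	if (read(fds[0], &grandchild, sizeof(grandchild)) !=
	    (ssize_t)sizeof(grandchild))
		grandchild = -1;
	close(fds[0]);

	/* reap the intermediate process */
	if (waitpid(child, NULL, 0) != child)
		return -1;

	return grandchild;
}

/*
 * Hypothetical demo: returns the grandchild's exit status,
 * or -1 on any error.
 */
int subreaper_demo(void)
{
	int is_subreaper = 0, status = 0;
	pid_t gc;

	/* mark this process as subreaper before forking any service */
	if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) < 0)
		return -1;
	if (prctl(PR_GET_CHILD_SUBREAPER, &is_subreaper, 0, 0, 0) < 0 ||
	    !is_subreaper)
		return -1;

	gc = spawn_double_forked();
	if (gc < 0)
		return -1;

	/* the orphan was re-parented to us, so we can wait for it */
	if (waitpid(gc, &status, 0) != gc || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

Without the PR_SET_CHILD_SUBREAPER call, the grandchild would be re-parented
to init and the final waitpid() would be expected to fail with ECHILD. Going
by the copy_signal() hunk in this revision, has_child_subreaper is inherited
at fork time, so the sketch sets the flag before starting any child.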



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


Thread overview: 7+ messages
2012-01-07 15:56 [PATCH] prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision Kay Sievers
2012-01-07 16:13 ` Valdis.Kletnieks
2012-01-09 15:07   ` Oleg Nesterov
2012-01-14  0:35     ` Andrew Morton
2012-01-14 13:59       ` Kay Sievers
2012-04-23 22:36 ` Michael Kerrisk
2013-01-10 22:48   ` Michael Kerrisk (man-pages)