public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* adopt(pid_t pid) syscall proposal [patch included]
@ 2013-06-11  1:23 vcaputo
  2013-06-11 16:48 ` Andy Lutomirski
  0 siblings, 1 reply; 4+ messages in thread
From: vcaputo @ 2013-06-11  1:23 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2390 bytes --]

Hello all,

I'd like to propose a new system call, I know I'll probably be flamed,
but here goes.

Today there is no way to get a process which has been orphaned, and thus
become a child of init, back as a child of another process.  This is
annoying me, please allow me to explain why.

The CONFIG_CHECKPOINT_RESTORE feature enables a useful little thing in
proc which I've been taking advantage of to make some interesting
omnipresent desktop monitoring features built into my own composited
window manager.  That thing is the /proc/$pid/task/$pid/children file.

Using this file, I've created scoped per-window process monitors which
can efficiently limit their monitoring to only the client pid of the X
window and its decendants.  These monitors are always running, but
visibility is toggled when desired by a keypress, efficiency is
important and the children file helps tremendously in this regard.

So I'm thrilled with all this and everything is fantastically efficient
and simple, until the first time I detach from my screen session and
reattach.

Initially, I'll have a scenario like:

 xterm
  bash
   screen -S stuff
    screen -S stuff
     bash
     bash
      top
     bash
      vi foo.c
     bash
      make
       sh -c cc -Wall ...
        cc1 -quiet -I ...
     bash

Behind these rows of text I have user/cpu graphs for each process
sliding by.

After detaching from screen, then reattaching, this is what the above
turns into, as you can probably predict:

 xterm
  bash
   screen -dr stuff

All subsequent processes created through my interactions with this
screen are lost to the monitoring, since the "screen -S stuff" backend
is now stuck being a child of init.

With the attached adopt() syscall patch, and a 1 line change to screen's
attacher function invoking adopt(), I get the following on reattach:

 xterm
  bash
   screen -dr stuff
    screen -S stuff
     bash
     bash
      top
     bash
      vi foo.c
     bash
      make
       sh -c cc -Wall ...
        cc1 -quiet -I ...


Which makes me very happy, dance around the room happy, and I'm not a
dancer.

Here is a screenshot of the results enabled by the patch:
http://pengaru.com/~swivel/vwm/screen_sys_adopt.png

Thanks for reading, and hopefuly considering this addition, I'm sure
there are other uses as well.  Please CC me in any replies as I'm not
currently subscribed.

Regards,
Vito Caputo

[-- Attachment #2: 0001-sched-implement-adopt-system-call.patch --]
[-- Type: text/x-diff, Size: 4211 bytes --]

>From 2d9f1781c6cdec8d74f44f98624af2eacbd21810 Mon Sep 17 00:00:00 2001
From: Vito Caputo <vcaputo@gnugeneration.com>
Date: Mon, 10 Jun 2013 17:28:49 -0700
Subject: [PATCH 1/1] 	sched: implement adopt() system call

	This implements a proposed facility for adopting a process
	by another.  The immediate use case is programs like GNU
	screen which lose their parent-child relationship when
	detached.  Using this sytem call on reattach, the relation
	can be restored, which is particularly of use to those
	taking advantage of the /proc/$pid/task/$pid/children list
	provided by CONFIG_CHECKPOINT_RESTORE.

	This implementation applies permission checks similar to
	that of kill(), in addition to preventing the adoption of
	an ancestor process.

	I have tested it and use the change myself to complement
	process subtree monitoring, without which I lose all
	visibility of the descendants of my screen sessions on
	reattach.

Signed-off-by: Vito Caputo <vcaputo@gnugeneration.com>
---
 arch/x86/syscalls/syscall_32.tbl |    1 +
 arch/x86/syscalls/syscall_64.tbl |    1 +
 include/linux/syscalls.h         |    2 ++
 kernel/exit.c                    |   59 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 63 insertions(+)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index aabfb83..d219781 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -357,3 +357,4 @@
 348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
 349	i386	kcmp			sys_kcmp
 350	i386	finit_module		sys_finit_module
+351	i386	adopt			sys_adopt
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 38ae65d..6345ebf 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -320,6 +320,7 @@
 311	64	process_vm_writev	sys_process_vm_writev
 312	common	kcmp			sys_kcmp
 313	common	finit_module		sys_finit_module
+314	common	adopt			sys_adopt
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 4147d70..3997cde 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -841,4 +841,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
 asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
 			 unsigned long idx1, unsigned long idx2);
 asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags);
+
+asmlinkage long sys_adopt(pid_t upid);
 #endif
diff --git a/kernel/exit.c b/kernel/exit.c
index af2eb3c..5b554c8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1681,3 +1681,62 @@ SYSCALL_DEFINE3(waitpid, pid_t, pid, int __user *, stat_addr, int, options)
 }
 
 #endif
+
+/*
+ * sys_adopt() adopts the process specified as upid by the current
+ * process if permitted.  This is linux-specific and provided so
+ * programs such as GNU screen may restore the parent-child
+ * relationship lost in detaching when reattaching.
+ */
+SYSCALL_DEFINE1(adopt, pid_t, upid)
+{
+	long ret = -ENOENT;
+	struct pid *pid;
+
+	if ((pid = find_get_pid(upid))) {
+		struct task_struct	*p;
+
+		rcu_read_lock();
+		write_lock_irq(&tasklist_lock);
+		p = pid_task(pid, PIDTYPE_PID);
+		if (p) {
+			struct task_struct *t;
+			const struct cred *cred = current_cred();
+			const struct cred *tcred = __task_cred(p);
+
+			if (!uid_eq(cred->euid, tcred->suid) &&
+			    !uid_eq(cred->euid, tcred->uid)  &&
+			    !uid_eq(cred->uid,  tcred->suid) &&
+			    !uid_eq(cred->uid,  tcred->uid) &&
+			    !ns_capable(cred->user_ns, CAP_KILL)) {
+				ret = -EPERM;
+				goto out_unlock;
+			}
+
+
+			/* upid cannot be current nor an ancestor of current */
+			for (t = current;; t = t->real_parent) {
+				if (t == p) {
+					ret = -EINVAL;
+					goto out_unlock;
+				}
+				if (is_global_init(t))
+					break;
+			}
+
+			t = p;
+			do {
+				t->real_parent = current;
+			} while_each_thread(p, t);
+
+			list_move_tail(&p->sibling, &p->real_parent->children);
+			ret = 0;
+		} /* else { ret = -ENOENT } */
+out_unlock:
+		write_unlock_irq(&tasklist_lock);
+		rcu_read_unlock();
+		put_pid(pid);
+	} /* else { ret = -ENOENT } */
+
+	return ret;
+}
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: adopt(pid_t pid) syscall proposal [patch included]
  2013-06-11  1:23 adopt(pid_t pid) syscall proposal [patch included] vcaputo
@ 2013-06-11 16:48 ` Andy Lutomirski
  2013-06-11 18:38   ` vcaputo
  0 siblings, 1 reply; 4+ messages in thread
From: Andy Lutomirski @ 2013-06-11 16:48 UTC (permalink / raw)
  To: vcaputo; +Cc: linux-kernel

On 06/10/2013 06:23 PM, vcaputo@gnugeneration.com wrote:
> +			if (!uid_eq(cred->euid, tcred->suid) &&
> +			    !uid_eq(cred->euid, tcred->uid)  &&
> +			    !uid_eq(cred->uid,  tcred->suid) &&
> +			    !uid_eq(cred->uid,  tcred->uid) &&
> +			    !ns_capable(cred->user_ns, CAP_KILL)) {
> +				ret = -EPERM;
> +				goto out_unlock;
> +			}
> +

That check's far too permissive.

This sounds like it will break anything that uses wait and expects its 
children to not be stolen out from under it.

Also, you'll have problems with screen -x or the default tmux shareable 
configuration.  It sounds like this is better done in userspace.

--Andy

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: adopt(pid_t pid) syscall proposal [patch included]
  2013-06-11 16:48 ` Andy Lutomirski
@ 2013-06-11 18:38   ` vcaputo
  2013-06-11 18:47     ` Andrew Lutomirski
  0 siblings, 1 reply; 4+ messages in thread
From: vcaputo @ 2013-06-11 18:38 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: linux-kernel

On Tue, Jun 11, 2013 at 09:48:22AM -0700, Andy Lutomirski wrote:
> On 06/10/2013 06:23 PM, vcaputo@gnugeneration.com wrote:
> >+			if (!uid_eq(cred->euid, tcred->suid) &&
> >+			    !uid_eq(cred->euid, tcred->uid)  &&
> >+			    !uid_eq(cred->uid,  tcred->suid) &&
> >+			    !uid_eq(cred->uid,  tcred->uid) &&
> >+			    !ns_capable(cred->user_ns, CAP_KILL)) {
> >+				ret = -EPERM;
> >+				goto out_unlock;
> >+			}
> >+
> 
> That check's far too permissive.

I suspected, but that's easily improved, I just wanted to get this out
there and see what people thought, start the discussion, as well as
get my use case functional.  Although I'm curious, what is your cause
for concern with the existing checks?

> 
> This sounds like it will break anything that uses wait and expects its 
> children to not be stolen out from under it.

This thought crossed my mind, and originally I intended to restrict
adoption to orphans who had become children of init.  It seemed like
more general use might be desirable so I left that out.  If this
really is a possibility worth preventing we could put that restriction
on it.  To the best of my knowledge, nothing informs init of its
becoming parent of an orphan, so we should be able to steal the orphan
back without harm.  This still enables the use case of screen/tmux
reattachment.

> 
> Also, you'll have problems with screen -x or the default tmux shareable 
> configuration.  It sounds like this is better done in userspace.

It just needs some determination of "session leader" applied to which
attach adopts the backend before invoking adopt(), that logic goes in
userspace.  I imagine the implementations of both screen and tmux
already have such determinations happening for other reasons.

Regards,
Vito Caputo

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: adopt(pid_t pid) syscall proposal [patch included]
  2013-06-11 18:38   ` vcaputo
@ 2013-06-11 18:47     ` Andrew Lutomirski
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Lutomirski @ 2013-06-11 18:47 UTC (permalink / raw)
  To: vcaputo; +Cc: linux-kernel

On Tue, Jun 11, 2013 at 11:38 AM,  <vcaputo@gnugeneration.com> wrote:
> On Tue, Jun 11, 2013 at 09:48:22AM -0700, Andy Lutomirski wrote:
>> On 06/10/2013 06:23 PM, vcaputo@gnugeneration.com wrote:
>> >+                    if (!uid_eq(cred->euid, tcred->suid) &&
>> >+                        !uid_eq(cred->euid, tcred->uid)  &&
>> >+                        !uid_eq(cred->uid,  tcred->suid) &&
>> >+                        !uid_eq(cred->uid,  tcred->uid) &&
>> >+                        !ns_capable(cred->user_ns, CAP_KILL)) {
>> >+                            ret = -EPERM;
>> >+                            goto out_unlock;
>> >+                    }
>> >+
>>
>> That check's far too permissive.
>
> I suspected, but that's easily improved, I just wanted to get this out
> there and see what people thought, start the discussion, as well as
> get my use case functional.  Although I'm curious, what is your cause
> for concern with the existing checks?

For example, "!uid_eq(cred->euid, tcred->uid)" means that you can
adopt setuid things.  Looking at ptrace may be a good start.

>
>>
>> This sounds like it will break anything that uses wait and expects its
>> children to not be stolen out from under it.
>
> This thought crossed my mind, and originally I intended to restrict
> adoption to orphans who had become children of init.  It seemed like
> more general use might be desirable so I left that out.  If this
> really is a possibility worth preventing we could put that restriction
> on it.  To the best of my knowledge, nothing informs init of its
> becoming parent of an orphan, so we should be able to steal the orphan
> back without harm.  This still enables the use case of screen/tmux
> reattachment.
>
>>
>> Also, you'll have problems with screen -x or the default tmux shareable
>> configuration.  It sounds like this is better done in userspace.
>
> It just needs some determination of "session leader" applied to which
> attach adopts the backend before invoking adopt(), that logic goes in
> userspace.  I imagine the implementations of both screen and tmux
> already have such determinations happening for other reasons.

What I mean is: why not just teach your userspace tool to do this
without kernel help?

--Andy

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-06-11 18:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-11  1:23 adopt(pid_t pid) syscall proposal [patch included] vcaputo
2013-06-11 16:48 ` Andy Lutomirski
2013-06-11 18:38   ` vcaputo
2013-06-11 18:47     ` Andrew Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox