* A feature suggestion for sandboxing processes
From: Victor Porton @ 2014-01-09 23:55 UTC
To: Linus Torvalds, linux-kernel

In Fedora there is a bin/sandbox command which runs a specified command in a so-called 'sandbox'. A program running in the sandbox cannot open new files (it is commonly used with preopened stdin and stdout), and possibly its access to the network is limited. It is intended to run potentially malicious software safely.

This Fedora sandbox is not perfect, however.

One problem is this:

Suppose the sandboxed program spawned some child processes and then exited.

Suppose we want to kill the sandboxed program after 30 seconds if it has not exited voluntarily.

The trouble is that the software cannot figure out which processes descend from the sandboxed binary, so we are unable to kill these processes automatically. This means that a hacker can create thousands (or more) of processes in this way and overload the system.

Also note that the sandboxed program may call setsid(), and thus its identity may be lost completely.

I propose adding a sandbox_id parameter to each process in the kernel. It would be 0 for normal processes and allocated like a PID or GID for processes we create in a sandbox. Children inherit sandbox_id. There should be an API call with which a process makes its sandbox_id non-zero (returning EPERM if it is already non-zero).

Then there should be an API to enumerate all processes with a given sandbox_id, so that we would be able to kill them (-TERM or -KILL). Or maybe we should also have a function which sends a given signal to all processes with a given sandbox_id (otherwise we would be at war with a hacker who could possibly create new children faster than we kill them).

Please add me in CC: (I am not subscribed to this mailing list.)

--
Victor Porton - http://portonvictor.org

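The semantics proposed in this message can be sketched as a toy user-space model. This is only an illustration of the rules as stated (inheritance by children, EPERM on a second attempt, signalling by sandbox_id); it is not kernel code, and the names `ProcessTable`, `enter_sandbox` and `kill_sandbox` are hypothetical, since the message does not name the API calls.

```python
import errno
import itertools


class ProcessTable:
    """Toy model of the proposed per-process sandbox_id (not a real kernel API)."""

    def __init__(self):
        self._next_pid = itertools.count(1)
        self._next_sandbox_id = itertools.count(1)  # 0 is reserved for "not sandboxed"
        self.sandbox_of = {}  # pid -> sandbox_id
        self.signals = {}     # pid -> list of signals delivered in this model

    def spawn(self, parent=None):
        """Create a process; a child inherits its parent's sandbox_id."""
        pid = next(self._next_pid)
        self.sandbox_of[pid] = self.sandbox_of[parent] if parent is not None else 0
        self.signals[pid] = []
        return pid

    def enter_sandbox(self, pid):
        """Give pid a fresh non-zero sandbox_id; fail with EPERM if it already has one."""
        if self.sandbox_of[pid] != 0:
            raise PermissionError(errno.EPERM, "process is already sandboxed")
        self.sandbox_of[pid] = next(self._next_sandbox_id)
        return self.sandbox_of[pid]

    def members(self, sandbox_id):
        """Enumerate every process carrying the given sandbox_id."""
        return [pid for pid, sid in self.sandbox_of.items() if sid == sandbox_id]

    def kill_sandbox(self, sandbox_id, sig):
        """Deliver sig to all members in one step and return the pids signalled."""
        targets = self.members(sandbox_id)
        for pid in targets:
            self.signals[pid].append(sig)
        return targets
```

In this model the identifier follows the process tree rather than the session or process group, which is why a setsid() call inside the sandbox would not hide a process from `kill_sandbox`.
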
* [RFC] subreaper mode 2 (Re: A feature suggestion for sandboxing processes)
From: Andy Lutomirski @ 2014-01-10  2:55 UTC
To: Victor Porton, Linus Torvalds, linux-kernel

I think you need to think bigger :)

I've occasionally pondered how to do real tracking of process trees (sandbox could use it, but I was thinking of systemd and other service managers). cgroups* suck for this purpose.

One approach would be to have another subreaper mode (subreaper mode 2) that does three things:

 - Subreaper mode 2 zombies do not send SIGCHLD and cannot be reaped until they have no descendents left.
 - Direct zombie children of subreaper mode 2 zombies are automatically reaped.
 - Descendents that need to be reparented are reparented to the subreaper, just like in subreaper mode 1.

Then you'd add an API that takes the PID of a mode 2 subreaper and kills its entire process subtree. (Optionally, tgkill could do that automatically.)

To use this for sandbox, sandbox would set subreaper mode 2 and then fork. The initial sandbox process would exit and the child would exec into the sandbox. The parent would stick around as a zombie until the whole tree went away.

To use this for an init-like program, the service manager would fork/clone a dummy PID, set subreaper mode 2, fork again, and exec the service. That dummy PID would serve as a persistent reference to the subtree.

For added fun, there should be a way to efficiently find the mode 2 subreaper that owns a given pid/tid. That way systemd / journald could map PIDs to service names without mucking with cgroups.

An alternative formulation of more or less the same thing would be a syscall manage_pid_subtree(pid_t pid) that does, roughly:

	if (pid->real_parent != current) return -EINVAL;
	set subreaper mode;
	exit current mm, signal set, etc to conserve resources;
	/* at this point, current is essentially a kernel thread. */
	wait for pid to exit;
	exit, copying pid's return code and other exit siginfo state;

To manage a subreaper, you double-fork, and then the middle process would call manage_pid_subtree on its child.

Thoughts?

* Goddamnit, systemd, I want a way to turn *off* your control of the One True Cgroup Hierarchy (TM). I consider the lack of such a mechanism to be a serious upcoming regression. Maybe if the kernel gives systemd a way to do this, systemd will use it.

--Andy

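For context, the existing behaviour that the message calls "subreaper mode 1" is enabled with prctl(PR_SET_CHILD_SUBREAPER), available since Linux 3.4. The sketch below only demonstrates that existing mode: after a double fork, the orphaned grandchild is reparented to the subreaper instead of to init, so the subreaper can wait for it. It does not implement the proposed mode 2 or manage_pid_subtree(), which are ideas from this thread rather than real interfaces. The helper names are made up for illustration, and the code assumes a Linux system where `ctypes.CDLL(None)` exposes libc.

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>, Linux 3.4 and later


def become_subreaper():
    """Mark the calling process as a child subreaper (the existing "mode 1")."""
    libc = ctypes.CDLL(None, use_errno=True)
    args = [ctypes.c_ulong(v) for v in (1, 0, 0, 0)]
    if libc.prctl(PR_SET_CHILD_SUBREAPER, *args) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def reap_orphaned_grandchild():
    """Double-fork, then wait for both the middle process and the orphan."""
    become_subreaper()
    read_end, write_end = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Middle process: fork the grandchild, report its pid, then exit at once.
        os.close(read_end)
        grandchild = os.fork()
        if grandchild == 0:
            os.close(write_end)
            os._exit(7)
        os.write(write_end, str(grandchild).encode())
        os._exit(0)
    os.close(write_end)
    grandchild = int(os.read(read_end, 32))
    os.close(read_end)
    exit_codes = {}
    # Wait for the middle process first: once it has exited, its child has
    # been reparented to us, so waitpid() on the grandchild is permitted.
    for pid in (middle, grandchild):
        _, status = os.waitpid(pid, 0)
        exit_codes[pid] = os.WEXITSTATUS(status)
    return middle, grandchild, exit_codes
```

Mode 1 alone still requires the subreaper to stay alive and to discover its descendants by itself; the mode 2 rules above are aimed at removing that burden by keeping a zombie as the handle for the whole subtree.
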
* Re: [RFC] subreaper mode 2 (Re: A feature suggestion for sandboxing processes)
From: Victor Porton @ 2014-01-10 15:00 UTC
To: Andy Lutomirski, Linus Torvalds, linux-kernel@vger.kernel.org

I don't quite understand your subreaper mode 2, but to me it looks like it would break compatibility. Sandboxed applications ideally should not have to be written in any special way: any application which does not open new files (or do similar things) should work in the sandbox just as if there were no sandbox.

--
Victor Porton - http://portonvictor.org

* Re: [RFC] subreaper mode 2 (Re: A feature suggestion for sandboxing processes)
From: Andy Lutomirski @ 2014-01-10 18:56 UTC
To: Victor Porton; +Cc: Linus Torvalds, linux-kernel@vger.kernel.org

On Fri, Jan 10, 2014 at 7:00 AM, Victor Porton <porton@narod.ru> wrote:
> I don't quite understand your subreaper mode 2, but to me it looks like it would break compatibility. Sandboxed applications ideally should not have to be written in any special way: any application which does not open new files (or do similar things) should work in the sandbox just as if there were no sandbox.

I'm suggesting that *sandbox*, not the application in the sandbox, use subreaper mode 2 (or whatever the new mechanism is). The sandboxed app shouldn't care, except insofar as the sandbox can't double-fork to create long-lived subprocesses.

--Andy

* Re: A feature suggestion for sandboxing processes
From: Victor Porton @ 2014-01-10 18:14 UTC
To: Linus Torvalds, linux-kernel@vger.kernel.org

I was told that it can be done using cgroups, so there is no urgent need to add my new syscall.

--
Victor Porton - http://portonvictor.org

* Re: A feature suggestion for sandboxing processes
From: Andy Lutomirski @ 2014-01-10 19:35 UTC
To: Victor Porton, Linus Torvalds, linux-kernel@vger.kernel.org

On 01/10/2014 10:14 AM, Victor Porton wrote:
> I was told that it can be done using cgroups, so there is no urgent need to add my new syscall.

Yeah, right. Good luck writing a program that will work on modern Red Hat, Fedora, and Ubuntu systems. Cgroups is IMO a complete and utter failure in providing an interface usable by normal programs, and it's getting *worse* over time.

--Andy

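For reference, the cgroup-based approach mentioned in the last two messages usually comes down to reading the PIDs listed in a cgroup's `cgroup.procs` file and signalling them until the list is empty. The sketch below assumes the sandbox has already been placed in a dedicated cgroup; how that cgroup gets created, and where it is mounted, differs between distributions and is not covered here, which is the portability complaint raised above. The function names, the `max_rounds` bound and the injectable `kill` parameter are illustrative choices, not an existing API.

```python
import os
import signal


def read_cgroup_pids(procs_path):
    """Return the PIDs currently listed in a cgroup.procs file."""
    with open(procs_path) as f:
        return [int(line) for line in f if line.strip()]


def kill_cgroup(procs_path, sig=signal.SIGKILL, max_rounds=100, kill=os.kill):
    """Signal every listed PID, re-reading the list until it is empty.

    Returns True once the cgroup is empty, False if max_rounds is exhausted.
    The loop is inherently racy: members can fork between the read and the
    kill, so a caller may want to stop the group from running first.
    """
    for _ in range(max_rounds):
        pids = read_cgroup_pids(procs_path)
        if not pids:
            return True
        for pid in pids:
            try:
                kill(pid, sig)
            except ProcessLookupError:
                pass  # the process exited between the read and the kill
    return False
```

The re-read loop is the user-space form of the "war" with a fast-forking process described in the first message, and a single kernel-side operation on the whole group would avoid it.
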