A feature suggestion for sandboxing processes

public inbox for linux-kernel@vger.kernel.org

* A feature suggestion for sandboxing processes
@ 2014-01-09 23:55 Victor Porton
  2014-01-10  2:55 ` [RFC] subreaper mode 2 (Re: A feature suggestion for sandboxing processes) Andy Lutomirski
  2014-01-10 18:14 ` A feature suggestion for sandboxing processes Victor Porton
  0 siblings, 2 replies; 6+ messages in thread
From: Victor Porton @ 2014-01-09 23:55 UTC
  To: Linus Torvalds, linux-kernel

Fedora has a bin/sandbox command which runs a specified command in a so-called 'sandbox'. A program running in the sandbox cannot open new files (it is commonly used with preopened stdin and stdout), and its network access may also be limited. It is intended for running potentially malicious software safely.

This Fedora sandbox is not perfect, however.

One problem is:

Suppose the sandboxed program spawns some child processes and then exits.

Suppose we want to kill the sandboxed program after 30 seconds if it has not exited voluntarily.

The trouble is that the sandbox software cannot tell which processes descend from the sandboxed binary, so it cannot kill them automatically. A hacker can exploit this to create thousands (or more) of processes and overload the system.

Also note that the sandboxed program may call setsid(), after which its identity may be lost completely.
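
The setsid() concern can be reproduced with existing calls. The following is a minimal sketch, assuming Linux or another POSIX system; the function name child_escapes_process_group is made up for illustration. It shows that a child which calls setsid() leaves its parent's process group, so a signal sent to that group no longer reaches it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: forks a child that calls setsid() and reports its
 * new process group ID through a pipe.
 * Returns 1 if the child left the caller's process group, 0 if it did not,
 * and -1 on error.
 */
int child_escapes_process_group(void)
{
    int fds[2];
    pid_t child_pgid = -1;

    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        /* Child: a new session also means a new process group. */
        close(fds[0]);
        pid_t pgid = (setsid() == -1) ? -1 : getpgrp();
        if (write(fds[1], &pgid, sizeof pgid) != (ssize_t)sizeof pgid)
            _exit(1);
        _exit(0);
    }
    assert(child > 0); /* only the parent reaches this point */

    /* Parent: read the child's process group ID, then reap the child. */
    close(fds[1]);
    ssize_t n = read(fds[0], &child_pgid, sizeof child_pgid);
    close(fds[0]);
    waitpid(child, NULL, 0);

    if (n != (ssize_t)sizeof child_pgid || child_pgid == -1)
        return -1;

    /* kill(-getpgrp(), sig) would not have reached this child. */
    return child_pgid != getpgrp();
}
```

A sandbox that tracks its payload only by process group or session would lose such a child, which is the gap the proposal below tries to close.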

I propose adding a sandbox_id attribute to each process in the kernel. It would be 0 for normal processes and allocated like a PID or GID for processes we create in a sandbox. Children inherit sandbox_id. There should be an API call by which a process makes its sandbox_id non-zero (returning EPERM if it is already non-zero).

Then there should be an API to enumerate all processes with a given sandbox_id, so that we can kill them (-TERM or -KILL). Or maybe there should also be a function which sends a given signal to all processes with a given sandbox_id (otherwise we would be racing a hacker who could possibly create new children faster than we kill them).
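
The proposed semantics can be sketched as a small user-space model. This is not kernel code and not an existing API: every name here (struct proc, model_fork, model_enter_sandbox, model_kill_sandbox, model_last_signal) is hypothetical, and only the sandbox_id rules (0 for normal processes, inherited by children, settable once, EPERM otherwise) come from the proposal above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PROCS 64

/* Hypothetical per-process state for the model. */
struct proc {
    int pid;         /* 0 marks a free slot */
    int sandbox_id;  /* 0 means "not sandboxed", as in the proposal */
    int last_signal; /* last signal the model delivered, 0 if none */
};

static struct proc table[MAX_PROCS];
static int next_pid = 1;
static int next_sandbox_id = 1;

static struct proc *find_proc(int pid)
{
    for (size_t i = 0; i < MAX_PROCS; i++)
        if (pid > 0 && table[i].pid == pid)
            return &table[i];
    return NULL;
}

/* Creates a process (parent_pid 0 means "no parent"); the child inherits
 * the parent's sandbox_id. Returns the new PID or a negative errno. */
int model_fork(int parent_pid)
{
    struct proc *parent = find_proc(parent_pid);

    if (parent_pid != 0 && parent == NULL)
        return -ESRCH;
    for (size_t i = 0; i < MAX_PROCS; i++) {
        if (table[i].pid == 0) {
            table[i].pid = next_pid++;
            table[i].sandbox_id = parent ? parent->sandbox_id : 0;
            table[i].last_signal = 0;
            return table[i].pid;
        }
    }
    return -EAGAIN;
}

/* Puts a process into a fresh sandbox; fails with -EPERM if it already
 * has a non-zero sandbox_id. Returns the new sandbox_id on success. */
int model_enter_sandbox(int pid)
{
    struct proc *p = find_proc(pid);

    assert(next_sandbox_id > 0); /* 0 is reserved for "not sandboxed" */
    if (p == NULL)
        return -ESRCH;
    if (p->sandbox_id != 0)
        return -EPERM;
    p->sandbox_id = next_sandbox_id++;
    return p->sandbox_id;
}

/* Delivers sig to every process in the sandbox in one pass, so new
 * children cannot outrun the caller. Returns the number signalled. */
int model_kill_sandbox(int sandbox_id, int sig)
{
    int count = 0;

    if (sandbox_id == 0)
        return -EINVAL;
    for (size_t i = 0; i < MAX_PROCS; i++) {
        if (table[i].pid != 0 && table[i].sandbox_id == sandbox_id) {
            table[i].last_signal = sig;
            count++;
        }
    }
    return count;
}

/* Returns the last signal delivered to pid, or -ESRCH if unknown. */
int model_last_signal(int pid)
{
    struct proc *p = find_proc(pid);
    return p ? p->last_signal : -ESRCH;
}
```

In the model, a process that enters a sandbox and then forks twice yields three processes sharing one sandbox_id, and a single model_kill_sandbox call reaches all of them while leaving unsandboxed processes alone.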

Please add me to CC: (I am not subscribed to this mailing list.)

-- 
Victor Porton - http://portonvictor.org

* [RFC] subreaper mode 2 (Re: A feature suggestion for sandboxing processes)
  2014-01-09 23:55 A feature suggestion for sandboxing processes Victor Porton
@ 2014-01-10  2:55 ` Andy Lutomirski
  2014-01-10 15:00   ` Victor Porton
  2014-01-10 18:14 ` A feature suggestion for sandboxing processes Victor Porton
  1 sibling, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2014-01-10  2:55 UTC
  To: Victor Porton, Linus Torvalds, linux-kernel

On 01/09/2014 03:55 PM, Victor Porton wrote:
> Fedora has a bin/sandbox command which runs a specified command in a so-called 'sandbox'. A program running in the sandbox cannot open new files (it is commonly used with preopened stdin and stdout), and its network access may also be limited. It is intended for running potentially malicious software safely.
> 
> This Fedora sandbox is not perfect, however.
> 
> One problem is:
> 
> Suppose the sandboxed program spawns some child processes and then exits.
> 
> Suppose we want to kill the sandboxed program after 30 seconds if it has not exited voluntarily.
> 
> The trouble is that the sandbox software cannot tell which processes descend from the sandboxed binary, so it cannot kill them automatically. A hacker can exploit this to create thousands (or more) of processes and overload the system.
> 
> Also note that the sandboxed program may call setsid(), after which its identity may be lost completely.
> 
> I propose adding a sandbox_id attribute to each process in the kernel. It would be 0 for normal processes and allocated like a PID or GID for processes we create in a sandbox. Children inherit sandbox_id. There should be an API call by which a process makes its sandbox_id non-zero (returning EPERM if it is already non-zero).
> 
> Then there should be an API to enumerate all processes with a given sandbox_id, so that we can kill them (-TERM or -KILL). Or maybe there should also be a function which sends a given signal to all processes with a given sandbox_id (otherwise we would be racing a hacker who could possibly create new children faster than we kill them).

I think you need to think bigger :)

I've occasionally pondered how to do real tracking of process trees
(sandbox could use it, but I was thinking of systemd and other service
managers).  cgroups* suck for this purpose.

One approach would be to have another subreaper mode (subreaper mode 2)
that does three things:
 - Subreaper mode 2 zombies do not send SIGCHLD and cannot be reaped
until they have no descendants left.
 - Direct zombie children of subreaper mode 2 zombies are automatically
reaped.
 - Descendants that need to be reparented are reparented to the
subreaper, just like in subreaper mode 1.

Then you'd add an API that takes the PID of a mode 2 subreaper and kills
its entire process subtree.  (Optionally, tgkill could do that
automatically.)
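
For reference, the reparenting behaviour that already exists (what this message calls subreaper mode 1) is enabled on Linux 3.4 and later with prctl(PR_SET_CHILD_SUBREAPER). The following is a minimal sketch of that existing mechanism only; the function name and the exit code 42 are arbitrary choices for the demonstration, and nothing here implements the proposed mode 2.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ORPHAN_EXIT_CODE 42 /* arbitrary marker for the grandchild */

/*
 * Hypothetical helper: double-forks so that the grandchild is orphaned,
 * then checks whether the orphan shows up as a child of the caller.
 * Returns 1 if it was reparented to the caller, 0 if not, -1 on error.
 */
int orphan_is_reparented_to_us(void)
{
    int status;

    /* Become a subreaper before forking, so every descendant is covered. */
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1)
        return -1;

    pid_t middle = fork();
    if (middle == -1)
        return -1;

    if (middle == 0) {
        /* Middle process: create the grandchild, then exit to orphan it. */
        pid_t grandchild = fork();
        if (grandchild == 0)
            _exit(ORPHAN_EXIT_CODE);
        _exit(grandchild == -1 ? 1 : 0);
    }
    assert(middle > 0); /* only the original process reaches this point */

    /* Reap the middle process; its children have been reparented by now. */
    if (waitpid(middle, &status, 0) != middle)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* If the orphan became our child, we can wait for it like any child. */
    pid_t orphan = waitpid(-1, &status, 0);

    /* Restore the default so the caller is not left as a subreaper. */
    prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);

    return orphan > 0 && WIFEXITED(status) &&
           WEXITSTATUS(status) == ORPHAN_EXIT_CODE;
}
```

Note that this only covers reparenting. The existing prctl has no call that kills the whole subtree, which is the part the proposal above would add.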

To use this for sandbox, sandbox would set subreaper mode 2 and then
fork.  The initial sandbox process would exit and the child would exec
into the sandbox.  The parent would stick around as a zombie until the
whole tree went away.

To use this for an init-like program, the service manager would
fork/clone a dummy PID, set subreaper mode 2, fork again, and exec the
service.  That dummy PID would serve as a persistent reference to the
subtree.

For added fun, there should be a way to efficiently find the mode 2
subreaper that owns a given pid/tid.  That way systemd / journald could
map PIDs to service names without mucking with cgroups.
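
For contrast, here is roughly what user space can do today without such a lookup: walk the parent chain through /proc. This is a sketch under assumptions (Linux with /proc mounted; the helper names parent_of and is_in_subtree are made up), and it is inherently racy because processes can exit or be reparented while the walk is in progress.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent PID recorded in
 * /proc/<pid>/status, or -1 if it cannot be read. */
pid_t parent_of(pid_t pid)
{
    char path[64];
    char line[256];
    int ppid = -1;

    snprintf(path, sizeof path, "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    /* The file contains a line of the form "PPid:<tab><number>". */
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "PPid: %d", &ppid) == 1)
            break;
    }
    fclose(f);
    return (pid_t)ppid;
}

/* Hypothetical helper: returns 1 if `ancestor` is `pid` itself or one of
 * its ancestors, 0 if the walk reaches the top without finding it, and
 * -1 if a /proc entry could not be read along the way. */
int is_in_subtree(pid_t pid, pid_t ancestor)
{
    while (pid > 0) {
        if (pid == ancestor)
            return 1;
        pid = parent_of(pid);
    }
    return pid == 0 ? 0 : -1;
}
```

Each step opens and parses a file, so the cost grows with tree depth, which is part of why an efficient kernel-side lookup is attractive.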

An alternative formulation of more or less the same thing would be a
syscall manage_pid_subtree(pid_t pid) that does, roughly:

  if (pid->real_parent != current) return -EINVAL;
  set subreaper mode;
  exit current mm, signal set, etc to conserve resources;
  /* at this point, current is essentially a kernel thread. */
  wait for pid to exit;
  exit, copying pid's return code and other exit siginfo state;

To manage a subreaper, you double-fork, and then the middle process
would call manage_pid_subtree on its child.
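
manage_pid_subtree does not exist, but part of its behaviour can be approximated in user space with the existing subreaper prctl. The sketch below is an assumption-laden illustration, not the proposed syscall: the names wait_for_subtree and run_demo_subtree are hypothetical, the managing process keeps its mm and signal state, and nothing stops an outsider from reaping or killing it early.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: reaps every process that is, or becomes, a child
 * of the caller. Stores the number reaped in *reaped and returns the
 * wait status of `child`, or -1 on error or if `child` was never seen.
 */
int wait_for_subtree(pid_t child, int *reaped)
{
    int child_status = -1;

    *reaped = 0;
    for (;;) {
        int status;
        pid_t pid = wait(&status);

        if (pid == -1) {
            if (errno == EINTR)
                continue;
            if (errno == ECHILD)
                break; /* no descendants left */
            return -1;
        }
        (*reaped)++;
        if (pid == child)
            child_status = status;
    }
    return child_status;
}

/*
 * Demo: the child forks a grandchild and exits with code 7. The caller,
 * acting as subreaper, waits for both and returns the child's exit code.
 */
int run_demo_subtree(int *reaped)
{
    /* Set the subreaper flag before forking so descendants inherit it. */
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        pid_t grandchild = fork();
        if (grandchild == 0)
            _exit(0);
        _exit(grandchild == -1 ? 1 : 7);
    }
    assert(child > 0); /* only the managing process reaches this point */

    int status = wait_for_subtree(child, reaped);
    prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);

    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status); /* "copy pid's return code" */
}
```

In the double-fork arrangement described above, the middle process would run the equivalent of wait_for_subtree and then exit with the returned code, so its own parent sees the payload's exit status only after the whole subtree is gone.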

Thoughts?

* Goddamnit, systemd, I want a way to turn *off* your control of the One
True Cgroup Hierarchy (TM).  I consider the lack of such a mechanism to
be a serious upcoming regression.  Maybe if the kernel gives systemd a
way to do this, systemd will use it.

--Andy

* Re: [RFC] subreaper mode 2 (Re: A feature suggestion for sandboxing processes)
  2014-01-10  2:55 ` [RFC] subreaper mode 2 (Re: A feature suggestion for sandboxing processes) Andy Lutomirski
@ 2014-01-10 15:00   ` Victor Porton
  2014-01-10 18:56     ` Andy Lutomirski
  0 siblings, 1 reply; 6+ messages in thread
From: Victor Porton @ 2014-01-10 15:00 UTC
  To: Andy Lutomirski, Linus Torvalds, linux-kernel@vger.kernel.org

I don't quite understand your subreaper mode 2, but to me it looks like this would break compatibility. Sandboxed applications ideally should not have to be written in any special way: any application which does not open new files (or do similar things) should work in the sandbox just as if there were no sandbox.

-- 
Victor Porton - http://portonvictor.org

* Re: [RFC] subreaper mode 2 (Re: A feature suggestion for sandboxing processes)
  2014-01-10 15:00   ` Victor Porton
@ 2014-01-10 18:56     ` Andy Lutomirski
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Lutomirski @ 2014-01-10 18:56 UTC
  To: Victor Porton; +Cc: Linus Torvalds, linux-kernel@vger.kernel.org

On Fri, Jan 10, 2014 at 7:00 AM, Victor Porton <porton@narod.ru> wrote:
> I don't quite understand your subreaper mode 2, but to me it looks like this would break compatibility. Sandboxed applications ideally should not have to be written in any special way: any application which does not open new files (or do similar things) should work in the sandbox just as if there were no sandbox.

I'm suggesting that *sandbox*, not the application in the sandbox, use
subreaper mode 2 (or whatever the new mechanism is).  The sandboxed app
shouldn't care, except insofar as the sandbox can't double-fork to
create long-lived subprocesses.

--Andy

* Re: A feature suggestion for sandboxing processes
  2014-01-09 23:55 A feature suggestion for sandboxing processes Victor Porton
  2014-01-10  2:55 ` [RFC] subreaper mode 2 (Re: A feature suggestion for sandboxing processes) Andy Lutomirski
@ 2014-01-10 18:14 ` Victor Porton
  2014-01-10 19:35   ` Andy Lutomirski
  1 sibling, 1 reply; 6+ messages in thread
From: Victor Porton @ 2014-01-10 18:14 UTC
  To: Linus Torvalds, linux-kernel@vger.kernel.org

I was told that this can be done using cgroups, so there is no urgent need to add my new syscall.

-- 
Victor Porton - http://portonvictor.org

* Re: A feature suggestion for sandboxing processes
  2014-01-10 18:14 ` A feature suggestion for sandboxing processes Victor Porton
@ 2014-01-10 19:35   ` Andy Lutomirski
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Lutomirski @ 2014-01-10 19:35 UTC
  To: Victor Porton, Linus Torvalds, linux-kernel@vger.kernel.org

On 01/10/2014 10:14 AM, Victor Porton wrote:
> I was told that this can be done using cgroups, so there is no urgent need to add my new syscall.

Yeah, right.  Good luck writing a program that will work on modern Red
Hat, Fedora, and Ubuntu systems.

Cgroups is IMO a complete and utter failure in providing an interface
usable by normal programs, and it's getting *worse* over time.

--Andy

Thread overview: 6+ messages
2014-01-09 23:55 A feature suggestion for sandboxing processes Victor Porton
2014-01-10  2:55 ` [RFC] subreaper mode 2 (Re: A feature suggestion for sandboxing processes) Andy Lutomirski
2014-01-10 15:00   ` Victor Porton
2014-01-10 18:56     ` Andy Lutomirski
2014-01-10 18:14 ` A feature suggestion for sandboxing processes Victor Porton
2014-01-10 19:35   ` Andy Lutomirski
