From: daw@taverner.cs.berkeley.edu (David Wagner)
To: linux-kernel@vger.kernel.org
Subject: Re: seccomp for 2.6.11-rc1-bk8
Date: Sun, 23 Jan 2005 07:34:24 +0000 (UTC)
Message-ID: <csvk20$6qa$1@abraham.cs.berkeley.edu>
In-Reply-To: 20050121111700.Q469@build.pdx.osdl.net

Chris Wright wrote:
>* David Wagner (daw@taverner.cs.berkeley.edu) wrote:
>> There is a simple tweak to ptrace which fixes that: one could add an
>> API to specify a set of syscalls that ptrace should not trap on.  To get
>> seccomp-like semantics, the user program could specify {read,write}, but
>> if the user program ever wants to change its policy, it could change that
>> set.  Solaris /proc (which is what is used for tracing) has this feature.
>> I coded up such an extension to ptrace semantics a long time ago, and
>> it seemed to work fine for me, though of course I am not a ptrace expert.
>
>Hmm, yeah, that'd be nice.  That only leaves the issue of tracer dying
>(say from that crazy oom killer ;-).

Yes, another thing I implemented was a ptrace option which causes the
child to be slaughtered if the parent dies for any reason.  I could dig
up the code, but I don't recall it being very hard.  This was ages ago
(a 2.0.x kernel) and I have no idea what might have changed.  Also, I am
definitely not a guru on kernel internals, so it is always possible I
missed something.  But, at least on the surface this doesn't seem hard
to implement.

A third thing I implemented was an option which would cause ptrace() to be
inherited across forks.  The way that strace does this (last I looked)
is an unreliable abomination: when it sees a request to call fork(), it
sets a breakpoint at the next instruction after the fork() by re-writing
the code of the parent, then when that breakpoint triggers it attaches to
the child, restores the parent's code, and lets them continue executing.
This is icky, and I have little confidence in its security to prevent
children from escaping a ptrace() jail, so I added a feature to ptrace()
that remedies the situation.

Anyway, back to the main topic: ptrace() vs seccomp.  I think one
plausible reason to prefer some mechanism that allows user level to
specify the allowed syscall set is that it might provide more flexibility.
What if 6 months from now we discover that we really should have enabled
one more syscall in seccomp to accommodate other applications?

At the same time, I truly empathize with Andrea's position that something
like seccomp ought to be a lot easier to verify correct than ptrace().
I think several people here are underestimating the importance of
clean design.  ptrace() is, frankly, a godawful mess, and I am skeptical
of the idea that you can take a godawful mess, audit it carefully, and
call it secure.  That seems unlikely to ever lead to the same level of
assurance that you can get with a much cleaner design.
(This business of overloading as a means of sending
ptrace events to user level was in retrospect probably a bad design
decision, for instance.  See, e.g., Section 12 of my MS thesis for more.
http://www.cs.berkeley.edu/~daw/papers/janus-masters.ps)  Given this,
I can see real value in seccomp.

Perhaps there is a compromise position.  What if one started from seccomp,
but then extended it so the set of allowed syscalls can be specified by
user level?  This would push policy to user level, while retaining the
attractive simplicity and ease-of-audit properties of the seccomp design.
Does something like this make sense?
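
To make the compromise a little more concrete, here is a toy user-space
model of "seccomp with a user-specified syscall set".  It is only a
sketch: nothing in it talks to the kernel, and the names SyscallPolicy,
SECCOMP_LIKE and DELEGATING are hypothetical, invented for illustration.
The one property it tries to capture is that the set is chosen once by
user level and then frozen, so the check itself stays trivial to audit.

```python
class SyscallPolicy:
    """Toy model: an allow set picked by user level, fixed once created."""

    def __init__(self, allowed):
        # frozenset, so the policy cannot be widened after the jail starts
        self._allowed = frozenset(allowed)

    def permits(self, syscall):
        # The whole "kernel side" of the check is one set-membership test.
        return syscall in self._allowed

    def enforce(self, syscall):
        # Assumption: a denied call is modeled as an exception here; a real
        # mechanism would more likely kill the offending process.
        if not self.permits(syscall):
            raise PermissionError(f"syscall {syscall!r} not in allowed set")


# The set mentioned earlier in the thread for seccomp-like semantics.
SECCOMP_LIKE = SyscallPolicy({"read", "write"})

# The set used by the delegating-jail idea described below.
DELEGATING = SyscallPolicy({"read", "write", "sendmsg", "recvmsg"})
```

With these definitions, SECCOMP_LIKE.permits("recvmsg") is False while
DELEGATING.permits("recvmsg") is True; the enforcement code is identical
and only the user-supplied set differs.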

Let me give you some idea of new applications that might be enabled
by this kind of functionality.  One cool idea is a 'delegating
architecture' for jails.  The jailed process inherits an open file
descriptor to its jailor, and is only allowed to call read(), write(),
sendmsg(), and recvmsg().  If the jailed process wants to interact
with the outside world, it can send a request to its jailor to this
effect.  For instance, suppose the jailed process wants to create a
file called "/tmp/whatever", so it sends this request to the jailor.
The jailor can decide whether it wants this to be allowed.  If it is
to be allowed, the jailor can create this file and transfer a file
descriptor to the jailed process using sendmsg().  Note that this
mechanism allows the jailor to completely virtualize the system call
interface; for instance, the jailor could transparently instead create
"/tmp/jail17/whatever" and return a fd to it to the jailed process,
without the jailed process being any the wiser.  (For more on this,
see http://www.stanford.edu/~talg/papers/NDSS04/abstract.html and
http://www.cs.jhu.edu/~seaborn/plash/plash.html)
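
The descriptor hand-off at the heart of this scheme can be sketched in a
few lines of Python using SCM_RIGHTS over a Unix-domain socket pair.
This is a minimal illustration under stated assumptions, not the design
of any of the systems cited above: the "create <path>" request format,
the helper names (send_fd, recv_fd, jailor_serve_one, demo) and the
basename-only path remapping are all made up for the example, and both
ends run in one process purely to keep it short.  In a real jail the
requesting end would live in a separate, syscall-restricted process.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, fd, msg=b"ok"):
    # Pass one open descriptor as SCM_RIGHTS ancillary data.
    sock.sendmsg([msg], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                          array.array("i", [fd]))])


def recv_fd(sock):
    # Receive a short reply and, if present, one passed descriptor.
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(
        64, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    return msg, (fds[0] if fds else None)


def jailor_serve_one(sock, jail_root):
    # Handle one request.  Hypothetical wire format: b"create <path>".
    verb, _, path = sock.recv(4096).decode().partition(" ")
    if verb != "create":
        sock.sendall(b"denied")
        return None
    # Virtualize the name: whatever path was asked for, the file is
    # actually created under jail_root.
    real_path = os.path.join(jail_root, os.path.basename(path))
    fd = os.open(real_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        send_fd(sock, fd)
    finally:
        os.close(fd)  # the receiver holds its own copy of the descriptor
    return real_path


def demo():
    jailed_end, jailor_end = socket.socketpair()
    with tempfile.TemporaryDirectory() as jail_root, jailed_end, jailor_end:
        # Jailed side: ask for a file using only a write on the socket.
        jailed_end.sendall(b"create /tmp/whatever")
        real_path = jailor_serve_one(jailor_end, jail_root)
        # Jailed side: pick up the descriptor and write through it.
        reply, fd = recv_fd(jailed_end)
        os.write(fd, b"hello from jail\n")
        os.close(fd)
        with open(real_path, "rb") as f:
            return reply, os.path.dirname(real_path) == jail_root, f.read()
```

Calling demo() has the requesting side ask for "/tmp/whatever" and end
up writing into a file of that name under the jailor's temporary
directory.  The requester never learns the real location; it only ever
holds the descriptor it was given.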

So this is one example of an application that is enabled by adding
recvmsg() to the set of allowed syscalls.  When it comes to the broader
question of seccomp vs ptrace(), I don't know what strategy makes most
sense for the Linux kernel, but I hope these ideas help give you some
idea of what might be possible and how these mechanisms could be used.
