Re: Using ftrace/perf as a basis for generic seccomp

public inbox for linux-kernel@vger.kernel.org

From: Eric Paris <eparis@redhat.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
	Stefan Fritsch <sf@sfritsch.de>, Ingo Molnar <mingo@elte.hu>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	linux-kernel@vger.kernel.org, agl@google.com, tzanussi@gmail.com,
	Jason Baron <jbaron@redhat.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	2nddept-manager@sdl.hitachi.co.jp,
	Steven Rostedt <rostedt@goodmis.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	James Morris <jmorris@namei.org>
Subject: Re: Using ftrace/perf as a basis for generic seccomp
Date: Fri, 04 Feb 2011 11:29:19 -0500
Message-ID: <1296836962.3145.75.camel@localhost.localdomain>
In-Reply-To: <1296829915.26581.658.camel@laptop>

On Fri, 2011-02-04 at 15:31 +0100, Peter Zijlstra wrote:
> On Thu, 2011-02-03 at 20:50 -0500, Eric Paris wrote:
> > I'm going to try to work on it over
> > the next week or two.  
> 
> What is your use-case? Going by: http://lwn.net/Articles/332990/ syscall
> based stuff (seccomp) is broken by design.

My personal goal is very different from that of an LSM.  My goal is to
reduce attack surface.  I'm not trying to implement an LSM.  LSM hooks are
(intentionally) placed in the kernel after object resolution is
complete.  In an LSM we don't check 'open'-type operations until after
the pathname has been converted to an inode.  We don't check some
'sendto' operations until after the data has been placed into an skb and
is about to be queued to a socket.  There is a LOT of code between
syscall_entry() and any given LSM hook.

An obvious example, which I'm sure everyone involved here knows, is the
original perf syscall bounds-checking vulnerability.  If I'm dealing with
an application that I know will never use perf, I'd like a way to
completely disable the perf syscall and greatly reduce the kernel attack
surface.  It would be almost impossible for an LSM to hook between
syscall_enter() and the location of that vulnerability in the perf
syscall.  In my particular case I'm thinking about qemu, which never needs
to call perf.  I want a way to disable all of the code after
syscall_enter() for huge swaths of the kernel.

What we have today, called "seccomp", is a one-way toggle,
prctl(PR_SET_SECCOMP, 1), which reduces the available syscalls to
read, write, exit, and sigreturn.  Any other syscall results in the
process being immediately killed.  It's a great idea to reduce the attack
surface of the kernel, but it is too inflexible to be useful.  I wonder
if anyone is using it.
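For reference, a minimal sketch of the existing strict mode, assuming a Linux system with CONFIG_SECCOMP and the <linux/seccomp.h> header.  The helper run_strict_child() and the STRICT_UNAVAILABLE value are illustrative names, not an existing API.  A forked child enters strict mode, writes to stdout (allowed), and then optionally calls getpid(), which is outside the allowed set and should get the child killed with SIGKILL.  Note that glibc's _exit() uses exit_group(2), which strict mode does not allow, so the child calls the plain exit syscall directly.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/seccomp.h>          /* SECCOMP_MODE_STRICT (== 1) */

/* Returned when prctl() refuses strict mode, e.g. on a kernel built without
 * CONFIG_SECCOMP or in a process already running under another seccomp mode. */
#define STRICT_UNAVAILABLE (-2)

/*
 * Fork a child, switch it to strict seccomp, and report how it ended:
 * the terminating signal number, 0 for a clean exit, STRICT_UNAVAILABLE
 * if strict mode could not be enabled, or -1 on any other failure.
 * If call_forbidden is nonzero the child also calls getpid(2), which is
 * not in the strict-mode set, so the kernel should kill it with SIGKILL.
 */
static int run_strict_child(int call_forbidden)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        static const char msg[] = "strict seccomp: write() still allowed\n";

        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0)
            _exit(3);               /* strict mode not active: any exit works */

        /* write(2) is one of the four permitted syscalls. */
        if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0)
            syscall(SYS_exit, 4);

        if (call_forbidden)
            syscall(SYS_getpid);    /* not permitted: kernel sends SIGKILL */

        /* exit(2), not exit_group(2): only the former is permitted. */
        syscall(SYS_exit, 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 3)
        return STRICT_UNAVAILABLE;
    return -1;
}
```

On a kernel where strict mode can be enabled, run_strict_child(0) should return 0 and run_strict_child(1) should return SIGKILL, which illustrates how all-or-nothing the current toggle is.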

In just a couple of seconds of strace on my box, qemu was found to use
futex, ioctl, read, rt_sigaction, select, timer_gettime, timer_settime,
and write.  I'm sure that other well-defined processes have similarly
short lists of required syscalls.  I think a more flexible seccomp, which
lets one remove syscalls from the allowed set (but never add them back),
can GREATLY reduce the kernel attack surface exposed to malicious processes.
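To make the "remove but never add back" rule concrete, here is a user-space model of the proposed allowed set.  It is a hypothetical sketch: struct allow_set, the allow_set_* helpers, and the NR_SYSCALLS_MODEL bound are invented for illustration and are not a kernel interface or the implementation proposed in this thread.  The live set starts with every syscall allowed, and the only operation that changes it is intersection with a requested set, so a later request can narrow the set but can never re-enable a syscall that was already removed.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on syscall numbers, for this model only. */
#define NR_SYSCALLS_MODEL 512
#define ALLOW_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define ALLOW_WORDS \
    ((NR_SYSCALLS_MODEL + ALLOW_BITS_PER_WORD - 1) / ALLOW_BITS_PER_WORD)

/* One bit per syscall number; a set bit means "still allowed". */
struct allow_set {
    unsigned long bits[ALLOW_WORDS];
};

/* Initial state for every process: nothing has been removed yet. */
static void allow_set_init_all(struct allow_set *s)
{
    for (size_t i = 0; i < ALLOW_WORDS; i++)
        s->bits[i] = ~0UL;
}

/* Build a request naming the syscalls a process wants to keep. */
static void allow_set_from_list(struct allow_set *s,
                                const unsigned int *nrs, size_t n)
{
    for (size_t i = 0; i < ALLOW_WORDS; i++)
        s->bits[i] = 0;
    for (size_t i = 0; i < n; i++)
        if (nrs[i] < NR_SYSCALLS_MODEL)
            s->bits[nrs[i] / ALLOW_BITS_PER_WORD] |=
                1UL << (nrs[i] % ALLOW_BITS_PER_WORD);
}

/* The only way to change a live set: AND it with a request.  Bits can be
 * cleared here but never set, so every restriction is one-way. */
static void allow_set_restrict(struct allow_set *live,
                               const struct allow_set *req)
{
    for (size_t i = 0; i < ALLOW_WORDS; i++)
        live->bits[i] &= req->bits[i];
}

/* The check a syscall-entry path would make in this model;
 * out-of-range numbers are denied. */
static bool allow_set_permits(const struct allow_set *s, unsigned int nr)
{
    if (nr >= NR_SYSCALLS_MODEL)
        return false;
    return (s->bits[nr / ALLOW_BITS_PER_WORD] >> (nr % ALLOW_BITS_PER_WORD)) & 1UL;
}
```

A qemu-like process could build its request once from the short list strace showed and apply it; a later call to allow_set_restrict() with a broader (even all-ones) request leaves perf and every other removed syscall removed.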

This is not a sandbox.  This is not an LSM replacement.  This is a
per-syscall cutoff.  It can be used to help build a stronger sandbox.
I'll likely see whether it can be used by the SELinux sandbox, which
already uses the LSM hooks to control information flow and mediate
access.  But SELinux does not control the sheer amount of kernel code
that can be executed.  I believe we can build a stronger sandbox using a
flexible seccomp as one of the tools.  To see the value of a per-syscall
control mechanism, all we have to do is find one vulnerability in the
code between syscall entry and an LSM hook that would deny the
operation.

Whether to do this in the seccomp code, where a syscall is either
allowed entirely or not at all, or to make use of the filter
infrastructure to allow even finer-grained control over each syscall is
an open question.  I'm leaning more towards just doing it in seccomp.  We
can't ever build a full and complete strong sandbox using the filter
code.  James' assertions about copy_from_user() are obviously correct.  A
private chat with PeterZ on IRC indicated that he also was not interested
in seeing this creep into the tracing code.  Do we have a user who can
articulate a need for greater flexibility in such a hardening tool?

Given all of these things, I think I'm going to go back to looking at
the flexible seccomp for now.  Maybe we should work towards using the
tracing filter code in the future if someone can articulate a real use
case.

-Eric


Thread overview: 20+ messages
2011-01-12 21:28 Using ftrace/perf as a basis for generic seccomp Eric Paris
2011-02-01 14:58 ` Eric Paris
2011-02-02 12:14   ` Masami Hiramatsu
2011-02-02 12:26     ` Ingo Molnar
2011-02-02 16:45       ` Eric Paris
2011-02-02 17:55         ` Ingo Molnar
2011-02-02 18:17           ` Steven Rostedt
2011-02-03 19:06         ` Frederic Weisbecker
2011-02-03 19:18           ` Frederic Weisbecker
2011-02-03 22:06           ` Stefan Fritsch
2011-02-03 23:10             ` Frederic Weisbecker
2011-02-04  1:50               ` Eric Paris
2011-02-04 14:31                 ` Peter Zijlstra
2011-02-04 16:29                   ` Eric Paris [this message]
2011-02-04 17:04                     ` Frederic Weisbecker
2011-02-05 11:51                       ` Stefan Fritsch
2011-02-07 12:26                         ` Peter Zijlstra
2011-02-04 16:36             ` Eric Paris
2011-02-05 11:42               ` Stefan Fritsch
2011-02-06 16:51                 ` Eric Paris
