public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Will Drewry <wad@chromium.org>
To: linux-kernel@vger.kernel.org
Cc: torvalds@linux-foundation.org, djm@mindrot.org,
	segoon@openwall.com, kees.cook@canonical.com, mingo@elte.hu,
	rostedt@goodmis.org, jmorris@namei.org, fweisbec@gmail.com,
	tglx@linutronix.de, scarybeasts@gmail.com,
	Will Drewry <wad@chromium.org>,
	Randy Dunlap <rdunlap@xenotime.net>,
	linux-doc@vger.kernel.org
Subject: [PATCH v9 05/13] seccomp_filter: Document what seccomp_filter is and how it works.
Date: Thu, 23 Jun 2011 19:36:44 -0500	[thread overview]
Message-ID: <1308875813-20122-5-git-send-email-wad@chromium.org> (raw)
In-Reply-To: <1308875813-20122-1-git-send-email-wad@chromium.org>

Adds a text file covering what CONFIG_SECCOMP_FILTER is, how it is
implemented presently, and what it may be used for.  In addition,
the limitations and caveats of the proposed implementation are
included.

v9: rebase on to bccaeafd7c117acee36e90d37c7e05c19be9e7bf
v8: -
v7: Add a caveat around fork behavior and execve
v6: -
v5: -
v4: rewording (courtesy kees.cook@canonical.com)
    reflect support for event ids
    add a small section on adding per-arch support
v3: a little more cleanup
v2: moved to prctl/
    updated for the v2 syntax.
    adds a note about compat behavior

Signed-off-by: Will Drewry <wad@chromium.org>
---
 Documentation/prctl/seccomp_filter.txt |  189 ++++++++++++++++++++++++++++++++
 1 files changed, 189 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/prctl/seccomp_filter.txt

diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
new file mode 100644
index 0000000..a9cddc2
--- /dev/null
+++ b/Documentation/prctl/seccomp_filter.txt
@@ -0,0 +1,189 @@
+		Seccomp filtering
+		=================
+
+Introduction
+------------
+
+A large number of system calls are exposed to every userland process
+with many of them going unused for the entire lifetime of the process.
+As system calls change and mature, bugs are found and eradicated.  A
+certain subset of userland applications benefit by having a reduced set
+of available system calls.  The resulting set reduces the total kernel
+surface exposed to the application.  System call filtering is meant for
+use with those applications.
+
+The implementation currently leverages both the existing seccomp
+infrastructure and the kernel tracing infrastructure.  By centralizing
+hooks for attack surface reduction in seccomp, it is possible to assure
+attention to security that is less relevant in normal ftrace scenarios,
+such as time-of-check, time-of-use attacks.  However, ftrace provides a
+rich, human-friendly environment for interfacing with system call
+specific arguments.  (As such, this requires FTRACE_SYSCALLS for any
+introspective filtering support.)
+
+
+What it isn't
+-------------
+
+System call filtering isn't a sandbox.  It provides a clearly defined
+mechanism for minimizing the exposed kernel surface.  Beyond that,
+policy for logical behavior and information flow should be managed with
+a combinations of other system hardening techniques and, potentially, a
+LSM of your choosing.  Expressive, dynamic filters based on the ftrace
+filter engine provide further options down this path (avoiding
+pathological sizes or selecting which of the multiplexed system calls in
+socketcall() is allowed, for instance) which could be construed,
+incorrectly, as a more complete sandboxing solution.
+
+
+Usage
+-----
+
+An additional seccomp mode is exposed through mode '2'.
+This mode depends on CONFIG_SECCOMP_FILTER.  By default, it provides
+only the most trivial of filter support "1" or cleared.  However, if
+CONFIG_FTRACE_SYSCALLS is enabled, the ftrace filter engine may be used
+for more expressive filters.
+
+A collection of filters may be supplied via prctl, and the current set
+of filters is exposed in /proc/<pid>/seccomp_filter.
+
+Interacting with seccomp filters can be done through three new prctl calls
+and one existing one.
+
+PR_SET_SECCOMP:
+	A pre-existing option for enabling strict seccomp mode (1) or
+	filtering seccomp (2).
+
+	Usage:
+		prctl(PR_SET_SECCOMP, 1);  /* strict */
+		prctl(PR_SET_SECCOMP, 2);  /* filters */
+
+PR_SET_SECCOMP_FILTER:
+	Allows the specification of a new filter for a given system
+	call, by number, and filter string.  By default, the filter
+	string may only be "1".  However, if CONFIG_FTRACE_SYSCALLS is
+	supported, the filter string may make use of the ftrace
+	filtering language's awareness of system call arguments.
+
+	In addition, the event id for the system call entry may be
+	specified in lieu of the system call number itself, as
+	determined by the 'type' argument.  This allows for the future
+	addition of seccomp-based filtering on other registered,
+	relevant ftrace events.
+
+	All calls to PR_SET_SECCOMP_FILTER for a given system
+	call will append the supplied string to any existing filters.
+	Filter construction looks as follows:
+		(Nothing) + "fd == 1 || fd == 2" => fd == 1 || fd == 2
+		... + "fd != 2" => (fd == 1 || fd == 2) && fd != 2
+		... + "size < 100" =>
+			((fd == 1 || fd == 2) && fd != 2) && size < 100
+	If there is no filter and the seccomp mode has already
+	transitioned to filtering, additions cannot be made.  Filters
+	may only be added that reduce the available kernel surface.
+
+	Usage (per the construction example above):
+		unsigned long type = PR_SECCOMP_FILTER_SYSCALL;
+		prctl(PR_SET_SECCOMP_FILTER, type, __NR_write,
+			"fd == 1 || fd == 2");
+		prctl(PR_SET_SECCOMP_FILTER, type, __NR_write,
+			"fd != 2");
+		prctl(PR_SET_SECCOMP_FILTER, type, __NR_write,
+			"size < 100");
+
+	The 'type' argument may be one of PR_SECCOMP_FILTER_SYSCALL or
+	PR_SECCOMP_FILTER_EVENT.
+
+PR_CLEAR_SECCOMP_FILTER:
+	Removes all filter entries for a given system call number or
+	event id.  When called prior to entering seccomp filtering mode,
+	it allows for new filters to be applied to the same system call.
+	After transition, however, it completely drops access to the
+	call.
+
+	Usage:
+		prctl(PR_CLEAR_SECCOMP_FILTER,
+			PR_SECCOMP_FILTER_SYSCALL, __NR_open);
+
+PR_GET_SECCOMP_FILTER:
+	Returns the aggregated filter string for a system call into a
+	user-supplied buffer of a given length.
+
+	Usage:
+		prctl(PR_GET_SECCOMP_FILTER,
+			PR_SECCOMP_FILTER_SYSCALL, __NR_write, buf,
+			sizeof(buf));
+
+All of the above calls return 0 on success and non-zero on error.  If
+CONFIG_FTRACE_SYSCALLS is not supported and a rich-filter was specified,
+the caller may check the errno for -ENOSYS.  The same is true if
+specifying an filter by the event id fails to discover any relevant
+event entries.
+
+
+Example
+-------
+
+Assume a process would like to cleanly read and write to stdin/out/err
+as well as access its filters after seccomp enforcement begins.  This
+may be done as follows:
+
+  int filter_syscall(int nr, char *buf) {
+    return prctl(PR_SET_SECCOMP_FILTER, PR_SECCOMP_FILTER_SYSCALL,
+                 nr, buf);
+  }
+
+  filter_syscall(__NR_read, "fd == 0");
+  filter_syscall(_NR_write, "fd == 1 || fd == 2");
+  filter_syscall(__NR_exit, "1");
+  filter_syscall(__NR_prctl, "1");
+  prctl(PR_SET_SECCOMP, 2);
+
+  /* Do stuff with fdset . . .*/
+
+  /* Drop read access and keep only write access to fd 1. */
+  prctl(PR_CLEAR_SECCOMP_FILTER, PR_SECCOMP_FILTER_SYSCALL, __NR_read);
+  filter_syscall(__NR_write, "fd != 2");
+
+  /* Perform any final processing . . . */
+  syscall(__NR_exit, 0);
+
+
+Caveats
+-------
+
+- Avoid using a filter of "0" to disable a filter.  Always favor calling
+  prctl(PR_CLEAR_SECCOMP_FILTER, ...).  Otherwise the behavior may vary
+  depending on if CONFIG_FTRACE_SYSCALLS support exists -- though an
+  error will be returned if the support is missing.
+
+- execve is always blocked.  seccomp filters may not cross that boundary.
+
+- Filters can be inherited across fork/clone but only when they are
+  active (e.g., PR_SET_SECCOMP has been set to 2), but not prior to use.
+  This stops the parent process from adding filters that may undermine
+  the child process security or create unexpected behavior after an
+  execve.
+
+- Some platforms support a 32-bit userspace with 64-bit kernels.  In
+  these cases (CONFIG_COMPAT), system call numbers may not match across
+  64-bit and 32-bit system calls. When the first PRCTL_SET_SECCOMP_FILTER
+  is called, the in-memory filters state is annotated with whether the
+  call has been made via the compat interface.  All subsequent calls will
+  be checked for compat call mismatch.  In the long run, it may make sense
+  to store compat and non-compat filters separately, but that is not
+  supported at present. Once one type of system call interface has been
+  used, it must be continued to be used.
+
+
+Adding architecture support
+-----------------------
+
+Any platform with seccomp support should be able to support the bare
+minimum of seccomp filter features.  However, since seccomp_filter
+requires that execve be blocked, it expects the architecture to expose a
+__NR_seccomp_execve define that maps to the execve system call number.
+On platforms where CONFIG_COMPAT applies, __NR_seccomp_execve_32 must
+also be provided.  Once those macros exist, "select HAVE_SECCOMP_FILTER"
+support may be added to the architectures Kconfig.
-- 
1.7.0.4


  parent reply	other threads:[~2011-06-24  0:40 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-24  0:36 [PATCH v9 01/13] tracing: split out filter initialization and clean up uses Will Drewry
2011-06-24  0:36 ` [PATCH v9 02/13] tracing: split out syscall_trace_enter construction Will Drewry
2011-06-24  0:36 ` [PATCH v9 03/13] seccomp_filter: new mode with configurable syscall filters Will Drewry
2011-06-24  7:30   ` Damien Miller
2011-06-24 20:20   ` Kees Cook
2011-06-24  0:36 ` [PATCH v9 04/13] seccomp_filter: add process state reporting Will Drewry
2011-06-24  0:36 ` Will Drewry [this message]
2011-06-24  7:24   ` [PATCH v9 05/13] seccomp_filter: Document what seccomp_filter is and how it works Chris Evans
     [not found]   ` <BANLkTimtYUyXbZjWhjK61B_1WBXE4MoAeA@mail.gmail.com>
2011-06-26 23:20     ` James Morris
2011-06-29 19:13       ` Will Drewry
2011-06-30  1:30         ` James Morris
2011-07-01 11:56           ` Ingo Molnar
2011-07-01 12:56             ` Will Drewry
2011-07-01 13:07               ` Ingo Molnar
2011-07-01 15:46                 ` Will Drewry
2011-07-01 16:10                   ` Ingo Molnar
2011-07-01 16:43                     ` Will Drewry
2011-07-01 18:04                       ` Steven Rostedt
2011-07-01 18:09                         ` Will Drewry
2011-07-01 18:48                           ` Steven Rostedt
2011-07-04  2:19                             ` James Morris
2011-07-05 12:40                               ` Steven Rostedt
2011-07-05 23:46                                 ` James Morris
2011-07-06  0:37                                   ` [Ksummit-2011-discuss] " Ted Ts'o
2011-07-05 23:56                               ` Steven Rostedt
2011-07-05  2:54                           ` [Ksummit-2011-discuss] " Eugene Teo
2011-07-01 20:25                         ` Kees Cook
2011-07-04 16:09                           ` [Ksummit-2011-discuss] " Greg KH
2011-07-01 21:00                       ` Ingo Molnar
2011-07-01 21:34                         ` Will Drewry
2011-07-05  9:50                           ` Ingo Molnar
2011-07-06 18:24                             ` Will Drewry
2011-07-05 15:26                 ` Vasiliy Kulikov
2011-06-24  0:36 ` [PATCH v9 06/13] x86: add HAVE_SECCOMP_FILTER and seccomp_execve Will Drewry
2011-06-24  0:36 ` [PATCH v9 07/13] arm: select HAVE_SECCOMP_FILTER Will Drewry
2011-06-24  0:36 ` [PATCH v9 08/13] microblaze: select HAVE_SECCOMP_FILTER and provide seccomp_execve Will Drewry
2011-06-24  0:36 ` [PATCH v9 09/13] mips: " Will Drewry
2011-06-24  0:36 ` [PATCH v9 10/13] s390: " Will Drewry
2011-06-24  0:36 ` [PATCH v9 11/13] powerpc: " Will Drewry
2011-08-30  5:28   ` Benjamin Herrenschmidt
2011-11-28  0:14     ` Benjamin Herrenschmidt
2011-11-28  1:45       ` Will Drewry
2011-06-24  0:36 ` [PATCH v9 12/13] sparc: " Will Drewry
2011-06-24  0:36 ` [PATCH v9 13/13] sh: select HAVE_SECCOMP_FILTER Will Drewry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1308875813-20122-5-git-send-email-wad@chromium.org \
    --to=wad@chromium.org \
    --cc=djm@mindrot.org \
    --cc=fweisbec@gmail.com \
    --cc=jmorris@namei.org \
    --cc=kees.cook@canonical.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rdunlap@xenotime.net \
    --cc=rostedt@goodmis.org \
    --cc=scarybeasts@gmail.com \
    --cc=segoon@openwall.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox