From: Will Drewry <wad@chromium.org>
To: linux-kernel@vger.kernel.org
Cc: torvalds@linux-foundation.org, djm@mindrot.org,
	segoon@openwall.com, kees.cook@canonical.com, mingo@elte.hu,
	rostedt@goodmis.org, jmorris@namei.org, fweisbec@gmail.com,
	tglx@linutronix.de, scarybeasts@gmail.com,
	Will Drewry <wad@chromium.org>,
	Randy Dunlap <rdunlap@xenotime.net>,
	linux-doc@vger.kernel.org
Subject: [PATCH v9 05/13] seccomp_filter: Document what seccomp_filter is and how it works.
Date: Thu, 23 Jun 2011 19:36:44 -0500
Message-ID: <1308875813-20122-5-git-send-email-wad@chromium.org>
In-Reply-To: <1308875813-20122-1-git-send-email-wad@chromium.org>

Adds a text file covering what CONFIG_SECCOMP_FILTER is, how it is
implemented presently, and what it may be used for.  In addition,
the limitations and caveats of the proposed implementation are
included.

v9: rebase on to bccaeafd7c117acee36e90d37c7e05c19be9e7bf
v8: -
v7: Add a caveat around fork behavior and execve
v6: -
v5: -
v4: rewording (courtesy kees.cook@canonical.com)
    reflect support for event ids
    add a small section on adding per-arch support
v3: a little more cleanup
v2: moved to prctl/
    updated for the v2 syntax.
    adds a note about compat behavior

Signed-off-by: Will Drewry <wad@chromium.org>
---
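Note for reviewers: the append rule documented under
PR_SET_SECCOMP_FILTER below can be modelled in plain userspace C.  This
is only a minimal sketch of the string construction shown in the text,
not the kernel implementation; compose_filter() and its 256-byte scratch
buffer are hypothetical and exist only for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not part of this patch): append `add` to
 * the aggregate filter held in `agg`, mirroring the documented rule
 *   (nothing) + A => A
 *   X + B         => (X) && B
 * Returns 0 on success, -1 if the result does not fit in `len` bytes
 * (or in the illustrative 256-byte scratch buffer).  On failure `agg`
 * is left unchanged.
 */
static int compose_filter(char *agg, size_t len, const char *add)
{
	char tmp[256];
	int n;

	if (agg[0] == '\0')
		n = snprintf(tmp, sizeof(tmp), "%s", add);
	else
		n = snprintf(tmp, sizeof(tmp), "(%s) && %s", agg, add);

	if (n < 0 || (size_t)n >= sizeof(tmp) || (size_t)n >= len)
		return -1;

	memcpy(agg, tmp, (size_t)n + 1);	/* include the trailing NUL */
	return 0;
}
```

Feeding it "fd == 1 || fd == 2", "fd != 2" and "size < 100" in turn
reproduces the three aggregate strings listed in the document.  Because
each step only ANDs a new term onto the old expression, the permitted
set can only shrink.
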
 Documentation/prctl/seccomp_filter.txt |  189 ++++++++++++++++++++++++++++++++
 1 files changed, 189 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/prctl/seccomp_filter.txt
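
The CONFIG_COMPAT caveat in the document (the first filter call fixes
the system call interface, later calls are checked against it) can be
sketched as below.  This is an assumption-laden model for discussion
only: struct filter_state and check_compat() are hypothetical names and
do not correspond to anything in this series.

```c
#include <stdbool.h>

/* Hypothetical model of the per-task filter state; names are illustrative. */
struct filter_state {
	bool initialized;	/* has any filter been installed yet? */
	bool compat;		/* interface used by the first install */
};

/*
 * Record the calling interface on the first install and reject any later
 * call made through the other interface.
 * Returns 0 if the call is allowed, -1 on a compat mismatch.
 */
static int check_compat(struct filter_state *st, bool is_compat_call)
{
	if (!st->initialized) {
		st->initialized = true;
		st->compat = is_compat_call;
		return 0;
	}
	return st->compat == is_compat_call ? 0 : -1;
}
```

A single flag is enough here because, as the text states, compat and
non-compat filters are not stored separately at present.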

diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
new file mode 100644
index 0000000..a9cddc2
--- /dev/null
+++ b/Documentation/prctl/seccomp_filter.txt
@@ -0,0 +1,189 @@
+		Seccomp filtering
+		=================
+
+Introduction
+------------
+
+A large number of system calls are exposed to every userland process
+with many of them going unused for the entire lifetime of the process.
+As system calls change and mature, bugs are found and eradicated.  A
+certain subset of userland applications benefit by having a reduced set
+of available system calls.  The resulting set reduces the total kernel
+surface exposed to the application.  System call filtering is meant for
+use with those applications.
+
+The implementation currently leverages both the existing seccomp
+infrastructure and the kernel tracing infrastructure.  By centralizing
+hooks for attack surface reduction in seccomp, it is possible to assure
+attention to security that is less relevant in normal ftrace scenarios,
+such as time-of-check, time-of-use attacks.  However, ftrace provides a
+rich, human-friendly environment for interfacing with system call
+specific arguments.  (As such, this requires FTRACE_SYSCALLS for any
+introspective filtering support.)
+
+
+What it isn't
+-------------
+
+System call filtering isn't a sandbox.  It provides a clearly defined
+mechanism for minimizing the exposed kernel surface.  Beyond that,
+policy for logical behavior and information flow should be managed with
+a combination of other system hardening techniques and, potentially, an
+LSM of your choosing.  Expressive, dynamic filters based on the ftrace
+filter engine provide further options down this path (avoiding
+pathological sizes or selecting which of the multiplexed system calls in
+socketcall() is allowed, for instance) which could be construed,
+incorrectly, as a more complete sandboxing solution.
+
+
+Usage
+-----
+
+An additional seccomp mode is exposed through mode '2'.
+This mode depends on CONFIG_SECCOMP_FILTER.  By default, it provides
+only the most trivial of filter support: "1" or cleared.  However, if
+CONFIG_FTRACE_SYSCALLS is enabled, the ftrace filter engine may be used
+for more expressive filters.
+
+A collection of filters may be supplied via prctl, and the current set
+of filters is exposed in /proc/<pid>/seccomp_filter.
+
+Interacting with seccomp filters can be done through three new prctl calls
+and one existing one.
+
+PR_SET_SECCOMP:
+	A pre-existing option for enabling strict seccomp mode (1) or
+	filtering seccomp (2).
+
+	Usage:
+		prctl(PR_SET_SECCOMP, 1);  /* strict */
+		prctl(PR_SET_SECCOMP, 2);  /* filters */
+
+PR_SET_SECCOMP_FILTER:
+	Allows the specification of a new filter for a given system
+	call, by number, and filter string.  By default, the filter
+	string may only be "1".  However, if CONFIG_FTRACE_SYSCALLS is
+	supported, the filter string may make use of the ftrace
+	filtering language's awareness of system call arguments.
+
+	In addition, the event id for the system call entry may be
+	specified in lieu of the system call number itself, as
+	determined by the 'type' argument.  This allows for the future
+	addition of seccomp-based filtering on other registered,
+	relevant ftrace events.
+
+	All calls to PR_SET_SECCOMP_FILTER for a given system
+	call will append the supplied string to any existing filters.
+	Filter construction looks as follows:
+		(Nothing) + "fd == 1 || fd == 2" => fd == 1 || fd == 2
+		... + "fd != 2" => (fd == 1 || fd == 2) && fd != 2
+		... + "size < 100" =>
+			((fd == 1 || fd == 2) && fd != 2) && size < 100
+	If there is no filter and the seccomp mode has already
+	transitioned to filtering, additions cannot be made.  Filters
+	may only be added that reduce the available kernel surface.
+
+	Usage (per the construction example above):
+		unsigned long type = PR_SECCOMP_FILTER_SYSCALL;
+		prctl(PR_SET_SECCOMP_FILTER, type, __NR_write,
+			"fd == 1 || fd == 2");
+		prctl(PR_SET_SECCOMP_FILTER, type, __NR_write,
+			"fd != 2");
+		prctl(PR_SET_SECCOMP_FILTER, type, __NR_write,
+			"size < 100");
+
+	The 'type' argument may be one of PR_SECCOMP_FILTER_SYSCALL or
+	PR_SECCOMP_FILTER_EVENT.
+
+PR_CLEAR_SECCOMP_FILTER:
+	Removes all filter entries for a given system call number or
+	event id.  When called prior to entering seccomp filtering mode,
+	it allows for new filters to be applied to the same system call.
+	After transition, however, it completely drops access to the
+	call.
+
+	Usage:
+		prctl(PR_CLEAR_SECCOMP_FILTER,
+			PR_SECCOMP_FILTER_SYSCALL, __NR_open);
+
+PR_GET_SECCOMP_FILTER:
+	Returns the aggregated filter string for a system call into a
+	user-supplied buffer of a given length.
+
+	Usage:
+		prctl(PR_GET_SECCOMP_FILTER,
+			PR_SECCOMP_FILTER_SYSCALL, __NR_write, buf,
+			sizeof(buf));
+
+All of the above calls return 0 on success and non-zero on error.  If
+CONFIG_FTRACE_SYSCALLS is not supported and a rich filter was specified,
+the caller may check errno for ENOSYS.  The same is true if
+specifying a filter by the event id fails to discover any relevant
+event entries.
+
+
+Example
+-------
+
+Assume a process would like to cleanly read and write to stdin/out/err
+as well as access its filters after seccomp enforcement begins.  This
+may be done as follows:
+
+  int filter_syscall(int nr, char *buf) {
+    return prctl(PR_SET_SECCOMP_FILTER, PR_SECCOMP_FILTER_SYSCALL,
+                 nr, buf);
+  }
+
+  filter_syscall(__NR_read, "fd == 0");
+  filter_syscall(__NR_write, "fd == 1 || fd == 2");
+  filter_syscall(__NR_exit, "1");
+  filter_syscall(__NR_prctl, "1");
+  prctl(PR_SET_SECCOMP, 2);
+
+  /* Do stuff with fdset . . .*/
+
+  /* Drop read access and keep only write access to fd 1. */
+  prctl(PR_CLEAR_SECCOMP_FILTER, PR_SECCOMP_FILTER_SYSCALL, __NR_read);
+  filter_syscall(__NR_write, "fd != 2");
+
+  /* Perform any final processing . . . */
+  syscall(__NR_exit, 0);
+
+
+Caveats
+-------
+
+- Avoid using a filter of "0" to disable a filter.  Always favor calling
+  prctl(PR_CLEAR_SECCOMP_FILTER, ...).  Otherwise the behavior may vary
+  depending on if CONFIG_FTRACE_SYSCALLS support exists -- though an
+  error will be returned if the support is missing.
+
+- execve is always blocked.  seccomp filters may not cross that boundary.
+
+- Filters are inherited across fork/clone only when they are already
+  active (i.e., PR_SET_SECCOMP has been set to 2), not prior to use.
+  This stops the parent process from adding filters that may undermine
+  the child process's security or create unexpected behavior after an
+  execve.
+
+- Some platforms support a 32-bit userspace with 64-bit kernels.  In
+  these cases (CONFIG_COMPAT), system call numbers may not match across
+  64-bit and 32-bit system calls.  When the first PR_SET_SECCOMP_FILTER
+  is called, the in-memory filter state is annotated with whether the
+  call has been made via the compat interface.  All subsequent calls will
+  be checked for compat call mismatch.  In the long run, it may make sense
+  to store compat and non-compat filters separately, but that is not
+  supported at present.  Once one type of system call interface has been
+  used, it must continue to be used.
+
+
+Adding architecture support
+---------------------------
+
+Any platform with seccomp support should be able to support the bare
+minimum of seccomp filter features.  However, since seccomp_filter
+requires that execve be blocked, it expects the architecture to expose a
+__NR_seccomp_execve define that maps to the execve system call number.
+On platforms where CONFIG_COMPAT applies, __NR_seccomp_execve_32 must
+also be provided.  Once those macros exist, "select HAVE_SECCOMP_FILTER"
+support may be added to the architecture's Kconfig.
-- 
1.7.0.4

