Re: [PATCH v6 3/3] Documentation: prctl/seccomp_filter

From: Corey Bryant <coreyb@linux.vnet.ibm.com>
To: Will Drewry <wad@chromium.org>
Cc: linux-kernel@vger.kernel.org, keescook@chromium.org,
	john.johansen@canonical.com, serge.hallyn@canonical.com,
	pmoore@redhat.com, eparis@redhat.com, djm@mindrot.org,
	torvalds@linux-foundation.org, segoon@openwall.com,
	rostedt@goodmis.org, jmorris@namei.org, scarybeasts@gmail.com,
	avi@redhat.com, penberg@cs.helsinki.fi, viro@zeniv.linux.org.uk,
	luto@mit.edu, mingo@elte.hu, akpm@linux-foundation.org,
	khilman@ti.com, borislav.petkov@amd.com, amwang@redhat.com,
	oleg@redhat.com, ak@linux.intel.com, eric.dumazet@gmail.com,
	gregkh@suse.de, dhowells@redhat.com, daniel.lezcano@free.fr,
	linux-fsdevel@vger.kernel.org,
	linux-security-module@vger.kernel.org, olofj@chromium.org,
	mhalcrow@google.com, dlaor@redhat.com, corbet@lwn.net,
	alan@lxorguk.ukuu.org.uk, indan@nul.nu, mcgrathr@chromium.org
Subject: Re: [PATCH v6 3/3] Documentation: prctl/seccomp_filter
Date: Mon, 30 Jan 2012 17:47:26 -0500
Message-ID: <4F271DFE.3080202@linux.vnet.ibm.com>
In-Reply-To: <1327788715-24076-3-git-send-email-wad@chromium.org>



On 01/28/2012 05:11 PM, Will Drewry wrote:
> Documents how system call filtering using Berkeley Packet
> Filter programs works and how it may be used.
> Includes an example for x86 (32-bit) and a semi-generic
> example using an example code generator.
>
> v6: - tweak the language to note the requirement of
>        PR_SET_NO_NEW_PRIVS being called prior to use. (luto@mit.edu)
> v5: - update sample to use system call arguments
>      - adds a "fancy" example using a macro-based generator
>      - cleaned up bpf in the sample
>      - update docs to mention arguments
>      - fix prctl value (eparis@redhat.com)
>      - language cleanup (rdunlap@xenotime.net)
> v4: - update for no_new_privs use
>      - minor tweaks
> v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net)
>      - document use of tentative always-unprivileged
>      - guard sample compilation for i386 and x86_64
> v2: - move code to samples (corbet@lwn.net)
>
> Signed-off-by: Will Drewry <wad@chromium.org>
> ---
>   Documentation/prctl/seccomp_filter.txt |  100 +++++++++++++++
>   samples/Makefile                       |    2 +-
>   samples/seccomp/Makefile               |   27 ++++
>   samples/seccomp/bpf-direct.c           |   77 +++++++++++
>   samples/seccomp/bpf-fancy.c            |   95 ++++++++++++++
>   samples/seccomp/bpf-helper.c           |   89 +++++++++++++
>   samples/seccomp/bpf-helper.h           |  219 ++++++++++++++++++++++++++++++++
>   7 files changed, 608 insertions(+), 1 deletions(-)
>   create mode 100644 Documentation/prctl/seccomp_filter.txt
>   create mode 100644 samples/seccomp/Makefile
>   create mode 100644 samples/seccomp/bpf-direct.c
>   create mode 100644 samples/seccomp/bpf-fancy.c
>   create mode 100644 samples/seccomp/bpf-helper.c
>   create mode 100644 samples/seccomp/bpf-helper.h
>
> diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
> new file mode 100644
> index 0000000..4ad7649
> --- /dev/null
> +++ b/Documentation/prctl/seccomp_filter.txt
> @@ -0,0 +1,100 @@
> +		Seccomp filtering
> +		=================
> +
> +Introduction
> +------------
> +
> +A large number of system calls are exposed to every userland process
> +with many of them going unused for the entire lifetime of the process.
> +As system calls change and mature, bugs are found and eradicated.  A
> +certain subset of userland applications benefit by having a reduced set
> +of available system calls.  The resulting set reduces the total kernel
> +surface exposed to the application.  System call filtering is meant for
> +use with those applications.
> +
> +Seccomp filtering provides a means for a process to specify a filter for
> +incoming system calls.  The filter is expressed as a Berkeley Packet
> +Filter (BPF) program, as with socket filters, except that the data
> +operated on is related to the system call being made: system call
> +number, and the system call arguments.  This allows for expressive
> +filtering of system calls using a filter program language with a long
> +history of being exposed to userland and a straightforward data set.
> +
> +Additionally, BPF makes it impossible for users of seccomp to fall prey
> +to time-of-check-time-of-use (TOCTOU) attacks that are common in system
> +call interposition frameworks.  BPF programs may not dereference
> +pointers which constrains all filters to solely evaluating the system
> +call arguments directly.
> +
> +What it isn't
> +-------------
> +
> +System call filtering isn't a sandbox.  It provides a clearly defined
> +mechanism for minimizing the exposed kernel surface.  Beyond that,
> +policy for logical behavior and information flow should be managed with
> +a combination of other system hardening techniques and, potentially, an
> +LSM of your choosing.  Expressive, dynamic filters provide further options down
> +this path (avoiding pathological sizes or selecting which of the multiplexed
> +system calls in socketcall() is allowed, for instance) which could be
> +construed, incorrectly, as a more complete sandboxing solution.
> +
> +Usage
> +-----
> +
> +An additional seccomp mode is added, but it is not directly set by
> +the consuming process.  The new mode, '2', is only available if
> +CONFIG_SECCOMP_FILTER is set and enabled using prctl with the
> +PR_ATTACH_SECCOMP_FILTER argument.
> +
> +Interacting with seccomp filters is done using one prctl(2) call.
> +
> +PR_ATTACH_SECCOMP_FILTER:
> +	Allows the specification of a new filter using a BPF program.
> +	The BPF program will be executed over struct seccomp_filter_data
> +	reflecting the system call number, arguments, and other
> +	metadata.  To allow a system call, SECCOMP_BPF_ALLOW must be
> +	returned.  At present, all other return values result in the
> +	system call being blocked, but it is recommended to return
> +	SECCOMP_BPF_DENY in those cases.  This will allow for future
> +	custom return values to be introduced, if ever desired.
> +
> +	Usage:
> +		prctl(PR_ATTACH_SECCOMP_FILTER, prog);
> +
> +	The 'prog' argument is a pointer to a struct sock_fprog which will
> +	contain the filter program.  If the program is invalid, the call
> +	will return -1 and set errno to EINVAL.
> +
> +	Note, is_compat_task is also tracked for the @prog.  This means
> +	that once set the calling task will have all of its system calls
> +	blocked if it switches its system call ABI.
> +
> +	If fork/clone and execve are allowed by @prog, any child processes will
> +	be constrained to the same filters and system call ABI as the parent.
> +
> +	Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
> +	run with CAP_SYS_ADMIN privileges in its namespace.  If these are not
> +	true, -EACCES will be returned.  This requirement ensures that filter
> +	programs cannot be applied to child processes with greater privileges
> +	than the task that installed them.
> +
> +	Additionally, if prctl(2) is allowed by the attached filter,
> +	additional filters may be layered on which will increase evaluation
> +	time, but allow for further decreasing the attack surface during
> +	execution of a process.
> +
> +The above call returns 0 on success and non-zero on error.
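
A minimal sketch of the prerequisite step described above might help
readers of the doc.  It assumes the installed headers already define
PR_SET_NO_NEW_PRIVS (that comes from the separate no_new_privs work, not
from this series), and set_no_new_privs() is only a hypothetical helper
name.  The PR_ATTACH_SECCOMP_FILTER call itself is left out on purpose:

```c
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: set no_new_privs on the calling task so that an
 * unprivileged task is allowed to attach a filter afterwards.
 * Returns 0 on success, -1 on failure (errno is set by prctl).
 */
static int set_no_new_privs(void)
{
	/* Unused prctl arguments are passed as 0. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
		perror("prctl(PR_SET_NO_NEW_PRIVS)");
		return -1;
	}
	return 0;
}
```

A caller would run this once, before attaching its filter, and treat a
failure as fatal.
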
> +
> +Example
> +-------
> +
> +The samples/seccomp/ directory contains both a 32-bit specific example
> +and a more generic example of a higher level macro interface for BPF
> +program generation.
> +
> +Adding architecture support
> +---------------------------
> +
> +Any platform with seccomp support will support seccomp filters as long
> +as CONFIG_SECCOMP_FILTER is enabled and the architecture has implemented
> +syscall_get_arguments.
> diff --git a/samples/Makefile b/samples/Makefile
> index 6280817..f29b19c 100644
> --- a/samples/Makefile
> +++ b/samples/Makefile
> @@ -1,4 +1,4 @@
>   # Makefile for Linux samples code
>
>   obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ tracepoints/ trace_events/ \
> -			   hw_breakpoint/ kfifo/ kdb/ hidraw/
> +			   hw_breakpoint/ kfifo/ kdb/ hidraw/ seccomp/
> diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
> new file mode 100644
> index 0000000..0298c6f
> --- /dev/null
> +++ b/samples/seccomp/Makefile
> @@ -0,0 +1,27 @@
> +# kbuild trick to avoid linker error. Can be omitted if a module is built.
> +obj- := dummy.o
> +
> +hostprogs-y := bpf-fancy
> +bpf-fancy-objs := bpf-fancy.o bpf-helper.o
> +
> +HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
> +HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include
> +HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include
> +HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include
> +
> +# bpf-direct.c is x86-only.
> +ifeq ($(filter-out x86_64 i386,$(KBUILD_BUILDHOST)),)
> +# List of programs to build
> +hostprogs-y += bpf-direct
> +bpf-direct-objs := bpf-direct.o
> +endif
> +
> +# Tell kbuild to always build the programs
> +always := $(hostprogs-y)
> +
> +HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include
> +HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
> +ifeq ($(KBUILD_BUILDHOST),x86_64)
> +HOSTCFLAGS_bpf-direct.o += -m32
> +HOSTLOADLIBES_bpf-direct += -m32
> +endif
> diff --git a/samples/seccomp/bpf-direct.c b/samples/seccomp/bpf-direct.c
> new file mode 100644
> index 0000000..d799244
> --- /dev/null
> +++ b/samples/seccomp/bpf-direct.c
> @@ -0,0 +1,77 @@
> +/*
> + * 32-bit seccomp filter example with BPF macros
> + *
> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
> + * Author: Will Drewry <wad@chromium.org>
> + *
> + * The code may be used by anyone for any purpose,
> + * and can serve as a starting point for developing
> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
> + */
> +
> +#include <linux/filter.h>
> +#include <linux/ptrace.h>
> +#include <linux/seccomp_filter.h>
> +#include <linux/unistd.h>
> +#include <stdio.h>
> +#include <stddef.h>
> +#include <sys/prctl.h>
> +#include <unistd.h>
> +
> +#ifndef PR_ATTACH_SECCOMP_FILTER
> +#	define PR_ATTACH_SECCOMP_FILTER 37
> +#endif
> +
> +#define syscall_arg(_n) (offsetof(struct seccomp_filter_data, args[_n].lo32))
> +#define nr (offsetof(struct seccomp_filter_data, syscall_nr))
> +
> +static int install_filter(void)
> +{
> +	struct seccomp_filter_block filter[] = {
> +		/* Grab the system call number */
> +		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, nr),
> +		/* Jump table for the allowed syscalls */
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 10, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 9, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 8, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 7, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 2, 6),
> +
> +		/* Check that read is only using stdin. */
> +		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 3, 4),
> +
> +		/* Check that write is only using stdout/stderr */
> +		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0),
> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 0, 1),
> +
> +		BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_ALLOW),
> +		BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_DENY),
> +	};
> +	struct seccomp_fprog prog = {
> +		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
> +		.filter = filter,
> +	};
> +	if (prctl(PR_ATTACH_SECCOMP_FILTER, &prog)) {
> +		perror("prctl");
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +#define payload(_c) (_c), sizeof((_c))
> +int main(int argc, char **argv)
> +{
> +	char buf[4096];
> +	ssize_t bytes = 0;
> +	if (install_filter())
> +		return 1;
> +	syscall(__NR_write, STDOUT_FILENO,
> +		payload("OHAI! WHAT IS YOUR NAME? "));
> +	bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf));
> +	syscall(__NR_write, STDOUT_FILENO, payload("HELLO, "));
> +	syscall(__NR_write, STDOUT_FILENO, buf, bytes);
> +	return 0;
> +}
> diff --git a/samples/seccomp/bpf-fancy.c b/samples/seccomp/bpf-fancy.c
> new file mode 100644
> index 0000000..1318b1a
> --- /dev/null
> +++ b/samples/seccomp/bpf-fancy.c
> @@ -0,0 +1,95 @@
> +/*
> + * Seccomp BPF example using a macro-based generator.
> + *
> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
> + * Author: Will Drewry <wad@chromium.org>
> + *
> + * The code may be used by anyone for any purpose,
> + * and can serve as a starting point for developing
> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
> + */
> +
> +#include <linux/seccomp_filter.h>
> +#include <linux/unistd.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/prctl.h>
> +#include <unistd.h>
> +
> +#include "bpf-helper.h"
> +
> +#ifndef PR_ATTACH_SECCOMP_FILTER
> +#	define PR_ATTACH_SECCOMP_FILTER 37
> +#endif
> +
> +int main(int argc, char **argv)
> +{
> +	struct bpf_labels l;
> +	static const char msg1[] = "Please type something: ";
> +	static const char msg2[] = "You typed: ";
> +	char buf[256];
> +	struct seccomp_filter_block filter[] = {
> +		LOAD_SYSCALL_NR,
> +		SYSCALL(__NR_exit, ALLOW),
> +		SYSCALL(__NR_exit_group, ALLOW),
> +		SYSCALL(__NR_write, JUMP(&l, write_fd)),
> +		SYSCALL(__NR_read, JUMP(&l, read)),
> +		DENY,  /* Don't passthrough into a label */
> +
> +		LABEL(&l, read),
> +		ARG(0),
> +		JNE(STDIN_FILENO, DENY),
> +		ARG(1),
> +		JNE((unsigned long)buf, DENY),
> +		ARG(2),
> +		JGE(sizeof(buf), DENY),
> +		ALLOW,
> +
> +		LABEL(&l, write_fd),
> +		ARG(0),
> +		JEQ(STDOUT_FILENO, JUMP(&l, write_buf)),
> +		JEQ(STDERR_FILENO, JUMP(&l, write_buf)),
> +		DENY,
> +
> +		LABEL(&l, write_buf),
> +		ARG(1),
> +		JEQ((unsigned long)msg1, JUMP(&l, msg1_len)),
> +		JEQ((unsigned long)msg2, JUMP(&l, msg2_len)),
> +		JEQ((unsigned long)buf, JUMP(&l, buf_len)),
> +		DENY,
> +
> +		LABEL(&l, msg1_len),
> +		ARG(2),
> +		JLT(sizeof(msg1), ALLOW),
> +		DENY,
> +
> +		LABEL(&l, msg2_len),
> +		ARG(2),
> +		JLT(sizeof(msg2), ALLOW),
> +		DENY,
> +
> +		LABEL(&l, buf_len),
> +		ARG(2),
> +		JLT(sizeof(buf), ALLOW),
> +		DENY,
> +	};
> +	struct seccomp_fprog prog = {
> +		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
> +		.filter = filter,
> +	};
> +	ssize_t bytes;
> +	bpf_resolve_jumps(&l, filter, sizeof(filter)/sizeof(*filter));
> +
> +	if (prctl(PR_ATTACH_SECCOMP_FILTER, &prog)) {
> +		perror("prctl");
> +		return 1;
> +	}
> +	syscall(__NR_write, STDOUT_FILENO, msg1, strlen(msg1));
> +	bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)-1);
> +	bytes = (bytes > 0 ? bytes : 0);
> +	syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2));
> +	syscall(__NR_write, STDERR_FILENO, buf, bytes);
> +	/* Now get killed */
> +	syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)+2);
> +	return 0;
> +}
> diff --git a/samples/seccomp/bpf-helper.c b/samples/seccomp/bpf-helper.c
> new file mode 100644
> index 0000000..e1b6bc7
> --- /dev/null
> +++ b/samples/seccomp/bpf-helper.c
> @@ -0,0 +1,89 @@
> +/*
> + * Seccomp BPF helper functions
> + *
> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
> + * Author: Will Drewry <wad@chromium.org>
> + *
> + * The code may be used by anyone for any purpose,
> + * and can serve as a starting point for developing
> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
> + */
> +
> +#include <stdio.h>
> +#include <string.h>
> +
> +#include "bpf-helper.h"
> +
> +int bpf_resolve_jumps(struct bpf_labels *labels,
> +		      struct seccomp_filter_block *filter, size_t count)
> +{
> +	struct seccomp_filter_block *begin = filter;
> +	__u8 insn = count - 1;
> +
> +	if (count < 1)
> +		return -1;
> +	/*
> +	* Walk it once, backwards, to build the label table and do fixups.
> +	* Since backward jumps are disallowed by BPF, this is easy.
> +	*/
> +	filter += insn;
> +	for (; filter >= begin; --insn, --filter) {
> +		if (filter->code != (BPF_JMP+BPF_JA))
> +			continue;
> +		switch ((filter->jt<<8)|filter->jf) {
> +		case (JUMP_JT<<8)|JUMP_JF:
> +			if (labels->labels[filter->k].location == 0xffffffff) {
> +				fprintf(stderr, "Unresolved label: '%s'\n",
> +					labels->labels[filter->k].label);
> +				return 1;
> +			}
> +			filter->k = labels->labels[filter->k].location -
> +				    (insn + 1);
> +			filter->jt = 0;
> +			filter->jf = 0;
> +			continue;
> +		case (LABEL_JT<<8)|LABEL_JF:
> +			if (labels->labels[filter->k].location != 0xffffffff) {
> +				fprintf(stderr, "Duplicate label use: '%s'\n",
> +					labels->labels[filter->k].label);
> +				return 1;
> +			}
> +			labels->labels[filter->k].location = insn;
> +			filter->k = 0; /* fall through */
> +			filter->jt = 0;
> +			filter->jf = 0;
> +			continue;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/* Simple lookup table for labels. */
> +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label)
> +{
> +	struct __bpf_label *begin = labels->labels, *end;
> +	int id;
> +	if (labels->count == 0) {
> +		begin->label = label;
> +		begin->location = 0xffffffff;
> +		labels->count++;
> +		return 0;
> +	}
> +	end = begin + labels->count;
> +	for (id = 0; begin < end; ++begin, ++id) {
> +		if (!strcmp(label, begin->label))
> +			return id;
> +	}
> +	begin->label = label;
> +	begin->location = 0xffffffff;
> +	labels->count++;
> +	return id;
> +}
> +
> +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count)
> +{
> +	struct seccomp_filter_block *end = filter + count;
> +	for ( ; filter < end; ++filter)
> +		printf("{ code=%u,jt=%u,jf=%u,k=%u },\n",
> +			filter->code, filter->jt, filter->jf, filter->k);
> +}
> diff --git a/samples/seccomp/bpf-helper.h b/samples/seccomp/bpf-helper.h
> new file mode 100644
> index 0000000..92b94ec
> --- /dev/null
> +++ b/samples/seccomp/bpf-helper.h
> @@ -0,0 +1,219 @@
> +/*
> + * Example wrapper around BPF macros.
> + *
> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
> + * Author: Will Drewry <wad@chromium.org>
> + *
> + * The code may be used by anyone for any purpose,
> + * and can serve as a starting point for developing
> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
> + *
> + * No guarantees are provided with respect to the correctness
> + * or functionality of this code.
> + */
> +#ifndef __BPF_HELPER_H__
> +#define __BPF_HELPER_H__
> +
> +#include <asm/bitsperlong.h>	/* for __BITS_PER_LONG */
> +#include <linux/filter.h>
> +#include <linux/seccomp_filter.h>	/* for seccomp_filter_data.arg */
> +#include <linux/types.h>
> +#include <linux/unistd.h>
> +#include <stddef.h>
> +
> +#define BPF_LABELS_MAX 256
> +struct bpf_labels {
> +	int count;
> +	struct __bpf_label {
> +		const char *label;
> +		__u32 location;
> +	} labels[BPF_LABELS_MAX];
> +};
> +
> +int bpf_resolve_jumps(struct bpf_labels *labels,
> +		      struct seccomp_filter_block *filter, size_t count);
> +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label);
> +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count);
> +
> +#define JUMP_JT 0xff
> +#define JUMP_JF 0xff
> +#define LABEL_JT 0xfe
> +#define LABEL_JF 0xfe
> +
> +#define ALLOW \
> +	BPF_STMT(BPF_RET+BPF_K, 0xFFFFFFFF)
> +#define DENY \
> +	BPF_STMT(BPF_RET+BPF_K, 0)
> +#define JUMP(labels, label) \
> +	BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
> +		 JUMP_JT, JUMP_JF)
> +#define LABEL(labels, label) \
> +	BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
> +		 LABEL_JT, LABEL_JF)
> +#define SYSCALL(nr, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (nr), 0, 1), \
> +	jt
> +
> +/* Lame, but just an example */
> +#define FIND_LABEL(labels, label) seccomp_bpf_label((labels), #label)
> +
> +#define EXPAND(...) __VA_ARGS__
> +/* Map all width-sensitive operations */
> +#if __BITS_PER_LONG == 32
> +
> +#define JEQ(x, jt) JEQ32(x, EXPAND(jt))
> +#define JNE(x, jt) JNE32(x, EXPAND(jt))
> +#define JGT(x, jt) JGT32(x, EXPAND(jt))
> +#define JLT(x, jt) JLT32(x, EXPAND(jt))
> +#define JGE(x, jt) JGE32(x, EXPAND(jt))
> +#define JLE(x, jt) JLE32(x, EXPAND(jt))
> +#define JA(x, jt) JA32(x, EXPAND(jt))
> +#define ARG(i) ARG_32(i)
> +
> +#elif __BITS_PER_LONG == 64
> +
> +#define JEQ(x, jt) \
> +	JEQ64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +#define JGT(x, jt) \
> +	JGT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +#define JGE(x, jt) \
> +	JGE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +#define JNE(x, jt) \
> +	JNE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +#define JLT(x, jt) \
> +	JLT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +#define JLE(x, jt) \
> +	JLE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	      EXPAND(jt))
> +
> +#define JA(x, jt) \
> +	JA64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
> +	       ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
> +	       EXPAND(jt))
> +#define ARG(i) ARG_64(i)
> +
> +#else
> +#error __BITS_PER_LONG value unusable.
> +#endif
> +
> +/* Loads the arg into A */
> +#define ARG_32(idx) \
> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
> +		offsetof(struct seccomp_filter_data, args[(idx)].lo32))
> +
> +/* Loads hi into A and lo in X */
> +#define ARG_64(idx) \
> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
> +	  offsetof(struct seccomp_filter_data, args[(idx)].lo32)), \
> +	BPF_STMT(BPF_ST, 0), /* lo -> M[0] */ \
> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
> +	  offsetof(struct seccomp_filter_data, args[(idx)].hi32)), \
> +	BPF_STMT(BPF_ST, 1) /* hi -> M[1] */
> +
> +#define JEQ32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 0, 1), \
> +	jt
> +
> +#define JNE32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 1, 0), \
> +	jt
> +
> +/* Checks the lo, then swaps to check the hi. A=lo,X=hi */
> +#define JEQ64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 0, 2), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
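
As I read it, ARG_64 leaves the hi word in A, so JEQ64 compares hi first
and only then loads lo from M[0], which is the opposite of what the
comment above the macro says.  Here is a plain C sketch of that order,
in case it helps when checking the other 64-bit macros.  struct
arg_halves, split64() and jeq64() are hypothetical stand-ins for
illustration, not names from the patch:

```c
#include <stdint.h>

/* Hypothetical stand-in for the lo32/hi32 view of one syscall argument. */
struct arg_halves {
	uint32_t lo32;
	uint32_t hi32;
};

/* Split a 64-bit value into its low and high 32-bit words. */
static struct arg_halves split64(uint64_t v)
{
	struct arg_halves h = { (uint32_t)v, (uint32_t)(v >> 32) };
	return h;
}

/*
 * Equality in the order JEQ64 appears to use: compare the hi words
 * first, and only look at the lo words when the hi words match.
 * Returns 1 when the two values are equal, 0 otherwise.
 */
static int jeq64(uint64_t arg, uint64_t want)
{
	struct arg_halves a = split64(arg);
	struct arg_halves w = split64(want);

	if (a.hi32 != w.hi32)
		return 0;
	return a.lo32 == w.lo32;
}
```
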
> +
> +#define JNE64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 5, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 2, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JA32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (value), 0, 1), \
> +	jt
> +
> +#define JA64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (hi), 3, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (lo), 0, 2), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JGE32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 0, 1), \
> +	jt
> +
> +#define JLT32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 1, 0), \
> +	jt
> +
> +/* Shortcut checking if hi > arg.hi. */
> +#define JGE64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 0, 2), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JLT64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (hi), 0, 4), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JGT32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
> +	jt
> +
> +#define JLE32(value, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
> +	jt

JLE32 as posted is identical to JGT32 just above it.  Should the
true/false offsets be reversed here (1, 0), the way JLT32 reverses JGE32?
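
To make the question concrete, here is a small userspace sketch of what I
would expect, assuming JLE32 is meant to be the inverse of JGT32.  The
run() evaluator, the RET_ALLOW/RET_DENY names and the allow_if_*()
helpers are hypothetical and exist only for this illustration.  The
evaluator understands just the two opcodes used here:

```c
#include <linux/filter.h>	/* struct sock_filter, BPF_JUMP, BPF_STMT */
#include <stdint.h>

/* JGT32 as posted: run jt when A > value, otherwise skip over it. */
#define JGT32(value, jt) \
	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
	jt

/* JLE32 with the offsets reversed: skip jt when A > value. */
#define JLE32(value, jt) \
	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 1, 0), \
	jt

#define RET_ALLOW BPF_STMT(BPF_RET+BPF_K, 0xFFFFFFFF)
#define RET_DENY  BPF_STMT(BPF_RET+BPF_K, 0)

/* Hypothetical mini-evaluator: handles only JGT+K and RET+K. */
static uint32_t run(const struct sock_filter *f, uint32_t a)
{
	for (;;) {
		if (f->code == (BPF_RET+BPF_K))
			return f->k;
		/* Anything else here is BPF_JMP+BPF_JGT+BPF_K. */
		f += 1 + (a > f->k ? f->jt : f->jf);
	}
}

/* Returns the ALLOW value when a <= limit, the DENY value otherwise. */
static uint32_t allow_if_le(uint32_t a, uint32_t limit)
{
	struct sock_filter prog[] = { JLE32(limit, RET_ALLOW), RET_DENY };
	return run(prog, a);
}

/* Returns the ALLOW value when a > limit, the DENY value otherwise. */
static uint32_t allow_if_gt(uint32_t a, uint32_t limit)
{
	struct sock_filter prog[] = { JGT32(limit, RET_ALLOW), RET_DENY };
	return run(prog, a);
}
```

With the (0, 1) offsets from the patch, allow_if_le() would behave
exactly like allow_if_gt().
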

Thanks for all the work on this.  We're looking forward to using it with 
QEMU.

> +
> +/* Check hi > args.hi first, then do the GE checking */
> +#define JGT64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 0, 2), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define JLE64(lo, hi, jt) \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 6, 0), \
> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 3), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
> +	jt, \
> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
> +
> +#define LOAD_SYSCALL_NR \
> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
> +		 offsetof(struct seccomp_filter_data, syscall_nr))
> +
> +#endif  /* __BPF_HELPER_H__ */


-- 
Regards,
Corey
