* v2 of seccomp filter c/r patches
@ 2015-09-11  0:20 Tycho Andersen
  2015-09-11  0:20 ` [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well Tycho Andersen
                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11  0:20 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: David S. Miller, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann,
	linux-kernel, netdev, linux-api
Hi all,
Here is v2 of the seccomp filter c/r set. The patch notes have individual
changes from the last series, but there are two points not noted:
* The series still does not allow us to correctly restore state for programs
  that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we want to
  keep seccomp_filter's identity, I think something along the lines of another
  seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not sure
  if this can even be done yet). In addition, we'll need a kcmp command for
  figuring out if filters are the same, although this too needs to compare
  seccomp_filter objects, so it's a little screwy. Any thoughts on how to do
  this nicely are welcome.
* I've dropped the bpf converter bug fix from the set and will submit it
  separately.
Alexei mentioned that this should go via net-next to minimize cross-tree
conflicts. Does that make sense here?
Thanks,
Tycho
* [PATCH v2 1/5] ebpf: add a seccomp program type
       [not found] ` <1441930862-14347-1-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
@ 2015-09-11  0:20   ` Tycho Andersen
  2015-09-11 12:09     ` Michael Kerrisk (man-pages)
  2015-09-11  0:21   ` [PATCH v2 3/5] ebpf: add a way to dump an eBPF program Tycho Andersen
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11  0:20 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: David S. Miller, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann,
	linux-kernel, netdev, linux-api, Tycho Andersen
seccomp uses eBPF as its underlying storage and execution format, and eBPF
has features that seccomp would like to make use of in the future. This
patch adds a formal seccomp type to the eBPF verifier.
The current implementation of the seccomp eBPF type is very limited, and
doesn't support some interesting features (notably, maps) of eBPF. However,
the primary motivation for this patchset is to enable checkpoint/restore
for seccomp filters later in the series, so this limited feature set is ok
for now.
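As a rough userspace sketch of the context-access rule this type enforces
(reads only, inside struct seccomp_data, 4-byte aligned): the names
seccomp_data_sketch, access_type_sketch and seccomp_access_ok below are
illustrative stand-ins, not kernel symbols, and the struct layout is assumed
to mirror the UAPI struct seccomp_data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of the UAPI struct seccomp_data (hypothetical name). */
struct seccomp_data_sketch {
	int32_t  nr;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t args[6];
};

static_assert(sizeof(struct seccomp_data_sketch) == 64,
	      "sketch assumes the 64-byte seccomp_data layout");

/* Hypothetical stand-in for enum bpf_access_type. */
enum access_type_sketch { ACCESS_READ = 1, ACCESS_WRITE = 2 };

/* Same three checks as seccomp_is_valid_access() in the patch. */
static bool seccomp_access_ok(int off, enum access_type_sketch type)
{
	/* The context is read-only for seccomp programs. */
	if (type == ACCESS_WRITE)
		return false;

	/* Offset must be in bounds and 4-byte aligned. */
	if (off < 0 || (size_t)off >= sizeof(struct seccomp_data_sketch) ||
	    (off & 3))
		return false;

	return true;
}
```

Under these assumptions, offset 0 (nr) and offset 60 (upper half of args[5])
are readable, while offset 2, offset 64, and any write are rejected.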
v2: * don't allow seccomp eBPF programs to call any functions
    * get rid of superfluous seccomp_convert_ctx_access
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/bpf.h |  1 +
 net/core/filter.c        | 31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 92a48e2..631cdee 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -123,6 +123,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_KPROBE,
 	BPF_PROG_TYPE_SCHED_CLS,
 	BPF_PROG_TYPE_SCHED_ACT,
+	BPF_PROG_TYPE_SECCOMP,
 };
 
 #define BPF_PSEUDO_MAP_FD	1
diff --git a/net/core/filter.c b/net/core/filter.c
index 13079f0..faaae67 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1612,6 +1612,15 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
 	}
 }
 
+static const struct bpf_func_proto *
+seccomp_func_proto(enum bpf_func_id func_id)
+{
+	/* At some point in the future seccomp filters may grow support for
+	 * eBPF functions. For now, these are disabled.
+	 */
+	return NULL;
+}
+
 static bool __is_valid_access(int off, int size, enum bpf_access_type type)
 {
 	/* check bounds */
@@ -1662,6 +1671,17 @@ static bool tc_cls_act_is_valid_access(int off, int size,
 	return __is_valid_access(off, size, type);
 }
 
+static bool seccomp_is_valid_access(int off, int size,
+				    enum bpf_access_type type)
+{
+	if (type == BPF_WRITE)
+		return false;
+
+	if (off < 0 || off >= sizeof(struct seccomp_data) || off & 3)
+		return false;
+
+	return true;
+}
 static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 				      int src_reg, int ctx_off,
 				      struct bpf_insn *insn_buf)
@@ -1795,6 +1815,11 @@ static const struct bpf_verifier_ops tc_cls_act_ops = {
 	.convert_ctx_access = bpf_net_convert_ctx_access,
 };
 
+static const struct bpf_verifier_ops seccomp_ops = {
+	.get_func_proto = seccomp_func_proto,
+	.is_valid_access = seccomp_is_valid_access,
+};
+
 static struct bpf_prog_type_list sk_filter_type __read_mostly = {
 	.ops = &sk_filter_ops,
 	.type = BPF_PROG_TYPE_SOCKET_FILTER,
@@ -1810,11 +1835,17 @@ static struct bpf_prog_type_list sched_act_type __read_mostly = {
 	.type = BPF_PROG_TYPE_SCHED_ACT,
 };
 
+static struct bpf_prog_type_list seccomp_type __read_mostly = {
+	.ops = &seccomp_ops,
+	.type = BPF_PROG_TYPE_SECCOMP,
+};
+
 static int __init register_sk_filter_ops(void)
 {
 	bpf_register_prog_type(&sk_filter_type);
 	bpf_register_prog_type(&sched_cls_type);
 	bpf_register_prog_type(&sched_act_type);
+	bpf_register_prog_type(&seccomp_type);
 
 	return 0;
 }
-- 
2.1.4
* [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well
  2015-09-11  0:20 v2 of seccomp filter c/r patches Tycho Andersen
@ 2015-09-11  0:20 ` Tycho Andersen
       [not found]   ` <1441930862-14347-3-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2015-09-11  0:21 ` [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds Tycho Andersen
       [not found] ` <1441930862-14347-1-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2 siblings, 1 reply; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11  0:20 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: David S. Miller, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	netdev, linux-api, Tycho Andersen
In the next patch, we're going to add a way to access the underlying
filters via bpf fds. This means that we need to ref-count both the
struct seccomp_filter objects and the struct bpf_prog objects separately,
in case a process dies but a filter is still referred to by another
process.
Additionally, we mark classic converted seccomp filters as seccomp eBPF
programs, since they are a subset of what is supported in seccomp eBPF.
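To illustrate why two separate counts are needed, here is a minimal
userspace sketch using C11 atomics. It is not the kernel code: prog_sketch,
filter_sketch and the helper names are hypothetical, and progs_freed exists
only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct bpf_prog and struct seccomp_filter. */
struct prog_sketch   { atomic_int refcnt; };
struct filter_sketch { atomic_int usage; struct prog_sketch *prog; };

static int progs_freed;	/* demo-only counter */

static struct prog_sketch *prog_new(void)
{
	struct prog_sketch *p = malloc(sizeof(*p));

	if (p)
		atomic_init(&p->refcnt, 1);
	return p;
}

static void prog_get(struct prog_sketch *p)
{
	atomic_fetch_add(&p->refcnt, 1);
}

static void prog_put(struct prog_sketch *p)
{
	int old = atomic_fetch_sub(&p->refcnt, 1);

	assert(old > 0);
	if (old == 1) {		/* last reference dropped */
		free(p);
		progs_freed++;
	}
}

/* The filter takes over the caller's reference on prog. */
static struct filter_sketch *filter_new(struct prog_sketch *prog)
{
	struct filter_sketch *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	atomic_init(&f->usage, 1);
	f->prog = prog;
	return f;
}

/* Dropping the last filter reference puts, rather than frees, the prog. */
static void filter_put(struct filter_sketch *f)
{
	if (atomic_fetch_sub(&f->usage, 1) == 1) {
		prog_put(f->prog);
		free(f);
	}
}
```

In this sketch, a second holder (standing in for a bpf fd) that called
prog_get() keeps the program alive after filter_put() releases the filter.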
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/seccomp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 245df6b..afaeddf 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -378,6 +378,8 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 	}
 
 	atomic_set(&sfilter->usage, 1);
+	atomic_set(&sfilter->prog->aux->refcnt, 1);
+	sfilter->prog->type = BPF_PROG_TYPE_SECCOMP;
 
 	return sfilter;
 }
@@ -470,7 +472,7 @@ void get_seccomp_filter(struct task_struct *tsk)
 static inline void seccomp_filter_free(struct seccomp_filter *filter)
 {
 	if (filter) {
-		bpf_prog_free(filter->prog);
+		bpf_prog_put(filter->prog);
 		kfree(filter);
 	}
 }
-- 
2.1.4
* [PATCH v2 3/5] ebpf: add a way to dump an eBPF program
       [not found] ` <1441930862-14347-1-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2015-09-11  0:20   ` [PATCH v2 1/5] ebpf: add a seccomp program type Tycho Andersen
@ 2015-09-11  0:21   ` Tycho Andersen
       [not found]     ` <1441930862-14347-4-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2015-09-11 12:11     ` Michael Kerrisk (man-pages)
  2015-09-11  0:21   ` [PATCH v2 5/5] seccomp: add a way to attach a filter via eBPF fd Tycho Andersen
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11  0:21 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: David S. Miller, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann,
	linux-kernel, netdev, linux-api, Tycho Andersen
This commit adds a way to dump eBPF programs. The initial implementation
doesn't support maps, and therefore only allows dumping seccomp eBPF
programs, which themselves don't currently support maps.
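One plausible way for a caller to use the proposed command is a two-pass
dump: query the instruction count with a NULL buffer, then allocate and dump
again. The sketch below assumes that usage and replaces the real
bpf(BPF_PROG_DUMP, ...) call with a mock, so every name in it
(insn_sketch, dump_attr_sketch, mock_prog_dump, dump_two_pass) is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 8-byte stand-in for struct bpf_insn. */
struct insn_sketch { uint8_t code; uint8_t regs; int16_t off; int32_t imm; };

/* Mirrors the fields the patch adds to union bpf_attr. */
struct dump_attr_sketch {
	uint32_t prog_fd;
	uint32_t dump_insn_cnt;
	uint64_t dump_insns;	/* user supplied buffer, 0 means "count only" */
	uint8_t  gpl_compatible;
};

/* Mock of the kernel side: always reports the count, copies if asked. */
static int mock_prog_dump(const struct insn_sketch *prog, uint32_t len,
			  struct dump_attr_sketch *attr)
{
	attr->dump_insn_cnt = len;
	attr->gpl_compatible = 0;
	if (attr->dump_insns)
		memcpy((void *)(uintptr_t)attr->dump_insns, prog,
		       len * sizeof(*prog));
	return 0;
}

/* Pass 1 learns the size, pass 2 fills a buffer of exactly that size. */
static struct insn_sketch *dump_two_pass(const struct insn_sketch *prog,
					 uint32_t len, uint32_t *cnt)
{
	struct dump_attr_sketch attr = {0};
	struct insn_sketch *buf;

	assert(cnt != NULL);
	if (mock_prog_dump(prog, len, &attr) != 0)
		return NULL;

	buf = calloc(attr.dump_insn_cnt ? attr.dump_insn_cnt : 1, sizeof(*buf));
	if (!buf)
		return NULL;

	attr.dump_insns = (uint64_t)(uintptr_t)buf;
	if (mock_prog_dump(prog, len, &attr) != 0) {
		free(buf);
		return NULL;
	}
	*cnt = attr.dump_insn_cnt;
	return buf;
}
```

Sizing the buffer from the first pass matters here because the dump path in
this patch copies prog->len instructions without consulting a caller-supplied
count.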
v2: don't export a prog_id for the filter
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/bpf.h | 14 ++++++++++++++
 kernel/bpf/syscall.c     | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 631cdee..e037a76 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -107,6 +107,13 @@ enum bpf_cmd {
 	 * returns fd or negative error
 	 */
 	BPF_PROG_LOAD,
+
+	/* dump an existing bpf
+	 * err = bpf(BPF_PROG_DUMP, union bpf_attr *attr, u32 size)
+	 * Using attr->prog_fd, attr->dump_insn_cnt, attr->dump_insns
+	 * returns zero or negative error
+	 */
+	BPF_PROG_DUMP,
 };
 
 enum bpf_map_type {
@@ -161,6 +168,13 @@ union bpf_attr {
 		__aligned_u64	log_buf;	/* user supplied buffer */
 		__u32		kern_version;	/* checked when prog_type=kprobe */
 	};
+
+	struct { /* anonymous struct used by BPF_PROG_DUMP command */
+		__u32		prog_fd;
+		__u32		dump_insn_cnt;
+		__aligned_u64	dump_insns;	/* user supplied buffer */
+		__u8		gpl_compatible;
+	};
 } __attribute__((aligned(8)));
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index dc9b464..58ae9f4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -586,6 +586,44 @@ free_prog:
 	return err;
 }
 
+static int bpf_prog_dump(union bpf_attr *attr, union bpf_attr __user *uattr)
+{
+	int ufd = attr->prog_fd;
+	struct fd f = fdget(ufd);
+	struct bpf_prog *prog;
+	int ret = -EINVAL;
+
+	prog = get_prog(f);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	/* For now, let's refuse to dump anything that isn't a seccomp program.
+	 * Other program types have support for maps, which our current dump
+	 * code doesn't support.
+	 */
+	if (prog->type != BPF_PROG_TYPE_SECCOMP)
+		goto out;
+
+	ret = -EFAULT;
+	if (put_user(prog->len, &uattr->dump_insn_cnt))
+		goto out;
+
+	if (put_user((u8) prog->gpl_compatible, &uattr->gpl_compatible))
+		goto out;
+
+	if (attr->dump_insns) {
+		u32 len = prog->len * sizeof(struct bpf_insn);
+
+		if (copy_to_user(u64_to_ptr(attr->dump_insns),
+				 prog->insns, len) != 0)
+			goto out;
+	}
+
+	ret = 0;
+out:
+	return ret;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -650,6 +688,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_PROG_LOAD:
 		err = bpf_prog_load(&attr);
 		break;
+	case BPF_PROG_DUMP:
+		err = bpf_prog_dump(&attr, uattr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
-- 
2.1.4
* [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds
  2015-09-11  0:20 v2 of seccomp filter c/r patches Tycho Andersen
  2015-09-11  0:20 ` [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well Tycho Andersen
@ 2015-09-11  0:21 ` Tycho Andersen
  2015-09-11 11:47   ` Daniel Borkmann
                     ` (2 more replies)
       [not found] ` <1441930862-14347-1-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2 siblings, 3 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11  0:21 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: David S. Miller, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, linux-kernel,
	netdev, linux-api, Tycho Andersen
This patch adds a way for a process that is "real root" to access the
seccomp filters of another process. The process first does a
PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
bpf(BPF_PROG_DUMP) to dump the actual program at each step.
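The iteration can be pictured as walking a singly linked list from the
newest filter toward the oldest. The following userspace sketch models that
walk only; filter_node and next_filter are hypothetical names, and integer
ids stand in for the struct bpf_prog pointers the patch compares.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for struct seccomp_filter: each filter points at its parent. */
struct filter_node {
	int prog_id;
	struct filter_node *prev;
};

/*
 * Advance *cur_prog to the next older filter in the chain.
 * Returns 0 on success, -ENOENT when *cur_prog is already the oldest
 * filter, and -ESRCH when *cur_prog is not in the chain at all.
 */
static int next_filter(const struct filter_node *newest, int *cur_prog)
{
	const struct filter_node *cur;

	assert(cur_prog != NULL);
	for (cur = newest; cur; cur = cur->prev) {
		if (cur->prog_id != *cur_prog)
			continue;
		if (!cur->prev)
			return -ENOENT;
		*cur_prog = cur->prev->prog_id;
		return 0;
	}
	return -ESRCH;
}
```

A caller would start from the id handed back by the GET_FILTER_FD step, dump
it, and repeat next_filter() until it reports -ENOENT.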
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h         | 12 ++++++++++
 include/linux/seccomp.h     | 14 +++++++++++
 include/uapi/linux/ptrace.h |  3 +++
 kernel/bpf/syscall.c        | 26 ++++++++++++++++++++-
 kernel/ptrace.c             |  7 ++++++
 kernel/seccomp.c            | 57 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 118 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f57d7fe..bfd9cab 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -162,6 +162,8 @@ void bpf_register_prog_type(struct bpf_prog_type_list *tl);
 void bpf_register_map_type(struct bpf_map_type_list *tl);
 
 struct bpf_prog *bpf_prog_get(u32 ufd);
+int bpf_prog_set(u32 ufd, struct bpf_prog *new);
+int bpf_new_fd(struct bpf_prog *prog, int flags);
 void bpf_prog_put(struct bpf_prog *prog);
 void bpf_prog_put_rcu(struct bpf_prog *prog);
 
@@ -180,6 +182,16 @@ static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 	return ERR_PTR(-EOPNOTSUPP);
 }
 
+static inline int bpf_prog_set(u32 ufd, struct bpf_prog *new)
+{
+	return -EINVAL;
+}
+
+static inline int bpf_new_fd(struct bpf_prog *prog, int flags)
+{
+	return -EINVAL;
+}
+
 static inline void bpf_prog_put(struct bpf_prog *prog)
 {
 }
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index a19ddac..41b083c 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -95,4 +95,18 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
 	return;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+extern long seccomp_get_filter_fd(struct task_struct *child);
+extern long seccomp_next_filter(struct task_struct *child, u32 fd);
+#else
+static inline long seccomp_get_filter_fd(struct task_struct *child)
+{
+	return -EINVAL;
+}
+static inline long seccomp_next_filter(struct task_struct *child, u32 fd)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
 #endif /* _LINUX_SECCOMP_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index cf1019e..041c3c3 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -23,6 +23,9 @@
 
 #define PTRACE_SYSCALL		  24
 
+#define PTRACE_SECCOMP_GET_FILTER_FD	40
+#define PTRACE_SECCOMP_NEXT_FILTER	41
+
 /* 0x4200-0x4300 are reserved for architecture-independent additions.  */
 #define PTRACE_SETOPTIONS	0x4200
 #define PTRACE_GETEVENTMSG	0x4201
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 58ae9f4..ac3ed1c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -506,6 +506,30 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
 }
 EXPORT_SYMBOL_GPL(bpf_prog_get);
 
+int bpf_prog_set(u32 ufd, struct bpf_prog *new)
+{
+	struct fd f;
+	struct bpf_prog *prog;
+
+	f = fdget(ufd);
+
+	prog = get_prog(f);
+	if (!IS_ERR(prog) && prog)
+		bpf_prog_put(prog);
+
+	atomic_inc(&new->aux->refcnt);
+	f.file->private_data = new;
+	fdput(f);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(bpf_prog_set);
+
+int bpf_new_fd(struct bpf_prog *prog, int flags)
+{
+	return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, flags);
+}
+EXPORT_SYMBOL_GPL(bpf_new_fd);
+
 /* last field in 'union bpf_attr' used by this command */
 #define	BPF_PROG_LOAD_LAST_FIELD kern_version
 
@@ -572,7 +596,7 @@ static int bpf_prog_load(union bpf_attr *attr)
 	if (err < 0)
 		goto free_used_maps;
 
-	err = anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, O_RDWR | O_CLOEXEC);
+	err = bpf_new_fd(prog, O_RDWR | O_CLOEXEC);
 	if (err < 0)
 		/* failed to allocate fd */
 		goto free_used_maps;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c8e0e05..a151c35 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1003,6 +1003,13 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 	}
 #endif
+
+	case PTRACE_SECCOMP_GET_FILTER_FD:
+		return seccomp_get_filter_fd(child);
+
+	case PTRACE_SECCOMP_NEXT_FILTER:
+		return seccomp_next_filter(child, data);
+
 	default:
 		break;
 	}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index afaeddf..1856f69 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -26,6 +26,8 @@
 #endif
 
 #ifdef CONFIG_SECCOMP_FILTER
+#include <linux/bpf.h>
+#include <uapi/linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/pid.h>
 #include <linux/ptrace.h>
@@ -807,6 +809,61 @@ static inline long seccomp_set_mode_filter(unsigned int flags,
 }
 #endif
 
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+long seccomp_get_filter_fd(struct task_struct *child)
+{
+	long fd;
+	struct seccomp_filter *filter;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (child->seccomp.mode != SECCOMP_MODE_FILTER)
+		return -EINVAL;
+
+	filter = child->seccomp.filter;
+
+	fd = bpf_new_fd(filter->prog, O_RDONLY);
+	if (fd > 0)
+		atomic_inc(&filter->prog->aux->refcnt);
+
+	return fd;
+}
+
+long seccomp_next_filter(struct task_struct *child, u32 fd)
+{
+	struct seccomp_filter *cur;
+	struct bpf_prog *prog;
+	long ret = -ESRCH;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (child->seccomp.mode != SECCOMP_MODE_FILTER)
+		return -EINVAL;
+
+	prog = bpf_prog_get(fd);
+	if (IS_ERR(prog)) {
+		ret = PTR_ERR(prog);
+		goto out;
+	}
+
+	for (cur = child->seccomp.filter; cur; cur = cur->prev) {
+		if (cur->prog == prog) {
+			if (!cur->prev)
+				ret = -ENOENT;
+			else
+				ret = bpf_prog_set(fd, cur->prev->prog);
+			break;
+		}
+	}
+
+out:
+	bpf_prog_put(prog);
+	return ret;
+}
+#endif
+
 /* Common entry point for both prctl and syscall. */
 static long do_seccomp(unsigned int op, unsigned int flags,
 		       const char __user *uargs)
-- 
2.1.4
* [PATCH v2 5/5] seccomp: add a way to attach a filter via eBPF fd
       [not found] ` <1441930862-14347-1-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2015-09-11  0:20   ` [PATCH v2 1/5] ebpf: add a seccomp program type Tycho Andersen
  2015-09-11  0:21   ` [PATCH v2 3/5] ebpf: add a way to dump an eBPF program Tycho Andersen
@ 2015-09-11  0:21   ` Tycho Andersen
  2015-09-11 12:10     ` Michael Kerrisk (man-pages)
  2015-09-11 12:37     ` Daniel Borkmann
  2015-09-11  2:50   ` v2 of seccomp filter c/r patches Alexei Starovoitov
  2015-09-11 16:30   ` Andy Lutomirski
  4 siblings, 2 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11  0:21 UTC (permalink / raw)
  To: Kees Cook, Alexei Starovoitov
  Cc: David S. Miller, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann,
	linux-kernel, netdev, linux-api, Tycho Andersen
This is the final bit needed to support seccomp filters created via the bpf
syscall. The patch adds a new seccomp operation SECCOMP_MODE_FILTER_EBPF,
which takes exactly one command (presumably to be expanded upon later when
seccomp eBPF programs support more interesting things) and an argument struct
similar to that of bpf(), although the size is explicit in the struct to
avoid changing the signature of seccomp().
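The explicit size field allows the struct to grow later: a caller built
against a larger struct is accepted as long as the bytes this side does not
know about are zero. Below is a userspace sketch of that check, with
ebpf_args_sketch and copy_sized_args as hypothetical names. Two details are
sketch-only choices rather than behaviour of the patch: sizes smaller than
the size field itself are rejected, and the destination is zeroed before the
copy so that a short struct leaves no uninitialized fields.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flattened stand-in for struct seccomp_ebpf. */
struct ebpf_args_sketch {
	unsigned int size;
	unsigned int add_flags;
	uint32_t add_fd;
};

/* src must point at a buffer of at least the size it declares. */
static int copy_sized_args(struct ebpf_args_sketch *dst,
			   const unsigned char *src)
{
	unsigned int size;
	size_t i;

	assert(dst != NULL && src != NULL);
	memcpy(&size, src, sizeof(size));

	if (size < sizeof(size))
		return -EINVAL;		/* sketch-only guard */

	if (size > sizeof(*dst)) {
		/* Unknown trailing bytes must all be zero. */
		for (i = sizeof(*dst); i < size; i++)
			if (src[i])
				return -E2BIG;
		size = sizeof(*dst);
	}

	memset(dst, 0, sizeof(*dst));	/* sketch-only: no stale bytes */
	memcpy(dst, src, size);
	return 0;
}
```

With this shape, a newer caller passing a 32-byte struct with a zero tail is
accepted, and the same struct with any nonzero tail byte fails with -E2BIG.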
v2: Don't abuse seccomp's third argument; use a separate command and a
    pointer to a structure instead.
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Kees Cook <keescook@chromium.org>
CC: Will Drewry <wad@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/seccomp.h |  16 +++++
 kernel/seccomp.c             | 135 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 138 insertions(+), 13 deletions(-)
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a4..a8694e2 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -13,10 +13,14 @@
 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT	0
 #define SECCOMP_SET_MODE_FILTER	1
+#define SECCOMP_MODE_FILTER_EBPF	2
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC	1
 
+/* Valid cmds for SECCOMP_MODE_FILTER_EBPF */
+#define SECCOMP_EBPF_ADD_FD	0
+
 /*
  * All BPF programs must return a 32-bit value.
  * The bottom 16-bits are for optional return data.
@@ -51,4 +55,16 @@ struct seccomp_data {
 	__u64 args[6];
 };
 
+struct seccomp_ebpf {
+	unsigned int size;
+
+	union {
+		/* SECCOMP_EBPF_ADD_FD */
+		struct {
+			unsigned int	add_flags;
+			__u32		add_fd;
+		};
+	};
+};
+
 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 1856f69..e78175a 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -65,6 +65,9 @@ struct seccomp_filter {
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
+static long seccomp_install_filter(unsigned int flags,
+				   struct seccomp_filter *prepared);
+
 /*
  * Endianness is explicitly ignored and left for BPF program authors to manage
  * as per the specific architecture.
@@ -356,17 +359,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 
 	BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
 
-	/*
-	 * Installing a seccomp filter requires that the task has
-	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
-	 * This avoids scenarios where unprivileged tasks can affect the
-	 * behavior of privileged children.
-	 */
-	if (!task_no_new_privs(current) &&
-	    security_capable_noaudit(current_cred(), current_user_ns(),
-				     CAP_SYS_ADMIN) != 0)
-		return ERR_PTR(-EACCES);
-
 	/* Allocate a new seccomp_filter */
 	sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
 	if (!sfilter)
@@ -510,8 +502,105 @@ static void seccomp_send_sigsys(int syscall, int reason)
 	info.si_syscall = syscall;
 	force_sig_info(SIGSYS, &info, current);
 }
+
 #endif	/* CONFIG_SECCOMP_FILTER */
 
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_SECCOMP_FILTER)
+static struct seccomp_filter *seccomp_prepare_ebpf(int fd)
+{
+	struct seccomp_filter *ret;
+	struct bpf_prog *prog;
+
+	prog = bpf_prog_get(fd);
+	if (IS_ERR(prog))
+		return (struct seccomp_filter *) prog;
+
+	if (prog->type != BPF_PROG_TYPE_SECCOMP) {
+		bpf_prog_put(prog);
+		return ERR_PTR(-EINVAL);
+	}
+
+	ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
+	if (!ret) {
+		bpf_prog_put(prog);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ret->prog = prog;
+	atomic_set(&ret->usage, 1);
+
+	/* Intentionally don't bpf_prog_put() here, because the underlying prog
+	 * is refcounted too and we're holding a reference from the struct
+	 * seccomp_filter object.
+	 */
+	return ret;
+}
+
+static long seccomp_ebpf_add_fd(struct seccomp_ebpf *ebpf)
+{
+	struct seccomp_filter *prepared;
+
+	prepared = seccomp_prepare_ebpf(ebpf->add_fd);
+	if (IS_ERR(prepared))
+		return PTR_ERR(prepared);
+
+	return seccomp_install_filter(ebpf->add_flags, prepared);
+}
+
+static long seccomp_mode_filter_ebpf(unsigned int cmd, const char __user *uargs)
+{
+	const struct seccomp_ebpf __user *uebpf;
+	struct seccomp_ebpf ebpf;
+	unsigned int size;
+	long ret = -EFAULT;
+
+	uebpf = (const struct seccomp_ebpf __user *) uargs;
+
+	if (get_user(size, &uebpf->size) != 0)
+		return -EFAULT;
+
+	/* If we're handed a bigger struct than we know of,
+	 * ensure all the unknown bits are 0 - i.e. new
+	 * user-space does not rely on any kernel feature
+	 * extensions we don't know about yet.
+	 */
+	if (size > sizeof(ebpf)) {
+		unsigned char __user *addr;
+		unsigned char __user *end;
+		unsigned char val;
+
+		addr = (void __user *)uebpf + sizeof(ebpf);
+		end  = (void __user *)uebpf + size;
+
+		for (; addr < end; addr++) {
+			int err = get_user(val, addr);
+
+			if (err)
+				return err;
+			if (val)
+				return -E2BIG;
+		}
+		size = sizeof(ebpf);
+	}
+
+	if (copy_from_user(&ebpf, uebpf, size) != 0)
+		return -EFAULT;
+
+	switch (cmd) {
+	case SECCOMP_EBPF_ADD_FD:
+		ret = seccomp_ebpf_add_fd(&ebpf);
+		break;
+	}
+
+	return ret;
+}
+#else
+static long seccomp_mode_filter_ebpf(unsigned int cmd, const char __user *uargs)
+{
+	return -EINVAL;
+}
+#endif
+
 /*
  * Secure computing mode 1 allows only read/write/exit/sigreturn.
  * To be fully secure this must be combined with rlimit
@@ -760,9 +849,7 @@ out:
 static long seccomp_set_mode_filter(unsigned int flags,
 				    const char __user *filter)
 {
-	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
 	struct seccomp_filter *prepared = NULL;
-	long ret = -EINVAL;
 
 	/* Validate flags. */
 	if (flags & ~SECCOMP_FILTER_FLAG_MASK)
@@ -773,6 +860,26 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	if (IS_ERR(prepared))
 		return PTR_ERR(prepared);
 
+	return seccomp_install_filter(flags, prepared);
+}
+
+static long seccomp_install_filter(unsigned int flags,
+				   struct seccomp_filter *prepared)
+{
+	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
+	long ret = -EINVAL;
+
+	/*
+	 * Installing a seccomp filter requires that the task has
+	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
+	 * This avoids scenarios where unprivileged tasks can affect the
+	 * behavior of privileged children.
+	 */
+	if (!task_no_new_privs(current) &&
+	    security_capable_noaudit(current_cred(), current_user_ns(),
+				     CAP_SYS_ADMIN) != 0)
+		return -EACCES;
+
 	/*
 	 * Make sure we cannot change seccomp or nnp state via TSYNC
 	 * while another thread is in the middle of calling exec.
@@ -875,6 +982,8 @@ static long do_seccomp(unsigned int op, unsigned int flags,
 		return seccomp_set_mode_strict();
 	case SECCOMP_SET_MODE_FILTER:
 		return seccomp_set_mode_filter(flags, uargs);
+	case SECCOMP_MODE_FILTER_EBPF:
+		return seccomp_mode_filter_ebpf(flags, uargs);
 	default:
 		return -EINVAL;
 	}
-- 
2.1.4
* Re: [PATCH v2 3/5] ebpf: add a way to dump an eBPF program
       [not found]     ` <1441930862-14347-4-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
@ 2015-09-11  2:29       ` Alexei Starovoitov
       [not found]         ` <20150911022940.GA4903-2RGepAHry06MXrjNfwE7T/6muRTtt8+awzqs5ZKRSiY@public.gmane.org>
  2015-09-11 13:39       ` Daniel Borkmann
  1 sibling, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2015-09-11  2:29 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev, linux-api
On Thu, Sep 10, 2015 at 06:21:00PM -0600, Tycho Andersen wrote:
> +static int bpf_prog_dump(union bpf_attr *attr, union bpf_attr __user *uattr)
> +{
> +	int ufd = attr->prog_fd;
> +	struct fd f = fdget(ufd);
> +	struct bpf_prog *prog;
> +	int ret = -EINVAL;
> +
> +	prog = get_prog(f);
> +	if (IS_ERR(prog))
> +		return PTR_ERR(prog);
> +
> +	/* For now, let's refuse to dump anything that isn't a seccomp program.
> +	 * Other program types have support for maps, which our current dump
> +	 * code doesn't support.
> +	 */
> +	if (prog->type != BPF_PROG_TYPE_SECCOMP)
> +		goto out;
> +
> +	ret = -EFAULT;
> +	if (put_user(prog->len, &uattr->dump_insn_cnt))
> +		goto out;
> +
> +	if (put_user((u8) prog->gpl_compatible, &uattr->gpl_compatible))
> +		goto out;
> +
> +	if (attr->dump_insns) {
> +		u32 len = prog->len * sizeof(struct bpf_insn);
> +
> +		if (copy_to_user(u64_to_ptr(attr->dump_insns),
> +				 prog->insns, len) != 0)
> +			goto out;
> +	}
> +
> +	ret = 0;
> +out:
> +	return ret;
fdput() is missing in all error paths.
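The usual shape of the fix is a single exit label that releases the
reference on every path after it was taken. The userspace sketch below only
models that control flow: res_get()/res_put() are hypothetical stand-ins for
fdget()/fdput(), and open_refs exists purely to make leaks observable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

static int open_refs;	/* demo-only: number of references currently held */

struct res_sketch { int held; };

static int res_get(struct res_sketch *r, int fd)
{
	if (fd < 0)
		return -EBADF;
	r->held = 1;
	open_refs++;
	return 0;
}

static void res_put(struct res_sketch *r)
{
	if (r->held) {
		r->held = 0;
		open_refs--;
		assert(open_refs >= 0);
	}
}

/* Every path after a successful res_get() leaves through "out". */
static int dump_sketch(int fd, int is_seccomp, void *out_buf,
		       const void *src, size_t len)
{
	struct res_sketch r = {0};
	int ret = res_get(&r, fd);

	if (ret)
		return ret;	/* nothing held yet, plain return is fine */

	ret = -EINVAL;
	if (!is_seccomp)
		goto out;

	ret = -EFAULT;
	if (!out_buf)
		goto out;

	memcpy(out_buf, src, len);
	ret = 0;
out:
	res_put(&r);
	return ret;
}
```

Whichever branch is taken, open_refs returns to zero once dump_sketch()
returns.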
* Re: v2 of seccomp filter c/r patches
       [not found] ` <1441930862-14347-1-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-09-11  0:21   ` [PATCH v2 5/5] seccomp: add a way to attach a filter via eBPF fd Tycho Andersen
@ 2015-09-11  2:50   ` Alexei Starovoitov
  2015-09-11 16:30   ` Andy Lutomirski
  4 siblings, 0 replies; 40+ messages in thread
From: Alexei Starovoitov @ 2015-09-11  2:50 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel, netdev, linux-api
On Thu, Sep 10, 2015 at 06:20:57PM -0600, Tycho Andersen wrote:
> Hi all,
> 
> Here is v2 of the seccomp filter c/r set. The patch notes have individual
> changes from the last series, but there are two points not noted:
> 
> * The series still does not allow us to correctly restore state for programs
>   that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we want to
>   keep seccomp_filter's identity, I think something along the lines of another
>   seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not sure
>   if this can even be done yet). In addition, we'll need a kcmp command for
>   figuring out if filters are the same, although this too needs to compare
>   seccomp_filter objects, so it's a little screwy. Any thoughts on how to do
>   this nicely are welcome.
> 
> * I've dropped the bpf converter bug from the set and will submit it
>   separately.
> 
> Alexei mentioned that this should go via net-next to minimize cross-tree
> conflicts. Does that make sense here?
Having looked at the set again, I already see conflicts in net/core/filter.c
and in linux/bpf.h with things that others and I are working on for net-next.
So I think it makes the most sense to take the whole set via net-next,
since the seccomp bits look limited compared to the bpf changes.
Otherwise the merge window will be unpleasant.
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds
  2015-09-11  0:21 ` [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds Tycho Andersen
@ 2015-09-11 11:47   ` Daniel Borkmann
       [not found]     ` <55F2BF5A.8010006-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
       [not found]   ` <1441930862-14347-5-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2015-09-11 16:20   ` Andy Lutomirski
  2 siblings, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2015-09-11 11:47 UTC (permalink / raw)
  To: Tycho Andersen, Kees Cook, Alexei Starovoitov
  Cc: David S. Miller, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, linux-kernel, netdev, linux-api
On 09/11/2015 02:21 AM, Tycho Andersen wrote:
> This patch adds a way for a process that is "real root" to access the
> seccomp filters of another process. The process first does a
> PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> bpf(BPF_PROG_DUMP) to dump the actual program at each step.
>
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
[...]
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 58ae9f4..ac3ed1c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -506,6 +506,30 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
>   }
>   EXPORT_SYMBOL_GPL(bpf_prog_get);
>
> +int bpf_prog_set(u32 ufd, struct bpf_prog *new)
> +{
> +	struct fd f;
> +	struct bpf_prog *prog;
> +
> +	f = fdget(ufd);
> +
> +	prog = get_prog(f);
> +	if (!IS_ERR(prog) && prog)
> +		bpf_prog_put(prog);
> +
> +	atomic_inc(&new->aux->refcnt);
> +	f.file->private_data = new;
> +	fdput(f);
> +	return 0;
So if get_prog() fails, for example because f.file is in fact NULL, you still
assign the bpf prog to f.file->private_data, i.e. dereference a NULL file? :(
> +}
> +EXPORT_SYMBOL_GPL(bpf_prog_set);
> +
> +int bpf_new_fd(struct bpf_prog *prog, int flags)
> +{
> +	return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, flags);
> +}
> +EXPORT_SYMBOL_GPL(bpf_new_fd);
Any reason why these two need to be exported for modules? Which
modules are using them?
I think modules should probably not mess with this.
Since the name is already generic, it would also be good to use bpf_new_fd()
for the map code that calls anon_inode_getfd(), too.
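
A minimal userspace sketch of the validation the first comment above points
at. The mock_* types and names are hypothetical stand-ins rather than kernel
API; in the real code the file would come from fdget() and the counter would
be prog->aux->refcnt. The sketch only shows the ordering: reject an invalid
file before touching it, then take the new reference before dropping the old
one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_prog and struct file. */
struct mock_prog { int refcnt; };
struct mock_file { struct mock_prog *private_data; };

/* Replace the program held by a file, refusing to touch an invalid file. */
static int mock_prog_set(struct mock_file *file, struct mock_prog *new_prog)
{
	struct mock_prog *old;

	if (!file)
		return -EBADF;	/* lookup failed: do not dereference */
	if (!new_prog)
		return -EINVAL;

	new_prog->refcnt++;	/* take the new reference first */
	old = file->private_data;
	file->private_data = new_prog;

	if (old) {
		assert(old->refcnt > 0);
		old->refcnt--;	/* then drop the one the file used to hold */
	}
	return 0;
}
```

A failed lookup now returns an error and leaves both reference counts
untouched, instead of writing through a NULL pointer.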
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds
       [not found]   ` <1441930862-14347-5-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
@ 2015-09-11 12:08     ` Michael Kerrisk (man-pages)
       [not found]       ` <CAKgNAki99ZFgLPE5mWWjj1nvdNyke1w0ttqmiG+Uk0rVfqutZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-09-11 12:08 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, lkml, netdev, Linux API
Hi Tycho,
On 11 September 2015 at 02:21, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> This patch adds a way for a process that is "real root" to access the
> seccomp filters of another process. The process first does a
> PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> bpf(BPF_PROG_DUMP) to dump the actual program at each step.
Do you have a man-page patch for this change?
Cheers,
Michael
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/linux/bpf.h         | 12 ++++++++++
>  include/linux/seccomp.h     | 14 +++++++++++
>  include/uapi/linux/ptrace.h |  3 +++
>  kernel/bpf/syscall.c        | 26 ++++++++++++++++++++-
>  kernel/ptrace.c             |  7 ++++++
>  kernel/seccomp.c            | 57 +++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 118 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f57d7fe..bfd9cab 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -162,6 +162,8 @@ void bpf_register_prog_type(struct bpf_prog_type_list *tl);
>  void bpf_register_map_type(struct bpf_map_type_list *tl);
>
>  struct bpf_prog *bpf_prog_get(u32 ufd);
> +int bpf_prog_set(u32 ufd, struct bpf_prog *new);
> +int bpf_new_fd(struct bpf_prog *prog, int flags);
>  void bpf_prog_put(struct bpf_prog *prog);
>  void bpf_prog_put_rcu(struct bpf_prog *prog);
>
> @@ -180,6 +182,16 @@ static inline struct bpf_prog *bpf_prog_get(u32 ufd)
>         return ERR_PTR(-EOPNOTSUPP);
>  }
>
> +static inline int bpf_prog_set(u32 ufd, struct bpf_prog *new)
> +{
> +       return -EINVAL;
> +}
> +
> +static inline int bpf_new_fd(struct bpf_prog *prog, int flags)
> +{
> +       return -EINVAL;
> +}
> +
>  static inline void bpf_prog_put(struct bpf_prog *prog)
>  {
>  }
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index a19ddac..41b083c 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -95,4 +95,18 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
>         return;
>  }
>  #endif /* CONFIG_SECCOMP_FILTER */
> +
> +#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
> +extern long seccomp_get_filter_fd(struct task_struct *child);
> +extern long seccomp_next_filter(struct task_struct *child, u32 fd);
> +#else
> +static inline long seccomp_get_filter_fd(struct task_struct *child)
> +{
> +       return -EINVAL;
> +}
> +static inline long seccomp_next_filter(struct task_struct *child, u32 fd)
> +{
> +       return -EINVAL;
> +}
> +#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
>  #endif /* _LINUX_SECCOMP_H */
> diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
> index cf1019e..041c3c3 100644
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -23,6 +23,9 @@
>
>  #define PTRACE_SYSCALL           24
>
> +#define PTRACE_SECCOMP_GET_FILTER_FD   40
> +#define PTRACE_SECCOMP_NEXT_FILTER     41
> +
>  /* 0x4200-0x4300 are reserved for architecture-independent additions.  */
>  #define PTRACE_SETOPTIONS      0x4200
>  #define PTRACE_GETEVENTMSG     0x4201
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 58ae9f4..ac3ed1c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -506,6 +506,30 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
>  }
>  EXPORT_SYMBOL_GPL(bpf_prog_get);
>
> +int bpf_prog_set(u32 ufd, struct bpf_prog *new)
> +{
> +       struct fd f;
> +       struct bpf_prog *prog;
> +
> +       f = fdget(ufd);
> +
> +       prog = get_prog(f);
> +       if (!IS_ERR(prog) && prog)
> +               bpf_prog_put(prog);
> +
> +       atomic_inc(&new->aux->refcnt);
> +       f.file->private_data = new;
> +       fdput(f);
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(bpf_prog_set);
> +
> +int bpf_new_fd(struct bpf_prog *prog, int flags)
> +{
> +       return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, flags);
> +}
> +EXPORT_SYMBOL_GPL(bpf_new_fd);
> +
>  /* last field in 'union bpf_attr' used by this command */
>  #define        BPF_PROG_LOAD_LAST_FIELD kern_version
>
> @@ -572,7 +596,7 @@ static int bpf_prog_load(union bpf_attr *attr)
>         if (err < 0)
>                 goto free_used_maps;
>
> -       err = anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, O_RDWR | O_CLOEXEC);
> +       err = bpf_new_fd(prog, O_RDWR | O_CLOEXEC);
>         if (err < 0)
>                 /* failed to allocate fd */
>                 goto free_used_maps;
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index c8e0e05..a151c35 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -1003,6 +1003,13 @@ int ptrace_request(struct task_struct *child, long request,
>                 break;
>         }
>  #endif
> +
> +       case PTRACE_SECCOMP_GET_FILTER_FD:
> +               return seccomp_get_filter_fd(child);
> +
> +       case PTRACE_SECCOMP_NEXT_FILTER:
> +               return seccomp_next_filter(child, data);
> +
>         default:
>                 break;
>         }
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index afaeddf..1856f69 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -26,6 +26,8 @@
>  #endif
>
>  #ifdef CONFIG_SECCOMP_FILTER
> +#include <linux/bpf.h>
> +#include <uapi/linux/bpf.h>
>  #include <linux/filter.h>
>  #include <linux/pid.h>
>  #include <linux/ptrace.h>
> @@ -807,6 +809,61 @@ static inline long seccomp_set_mode_filter(unsigned int flags,
>  }
>  #endif
>
> +#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
> +long seccomp_get_filter_fd(struct task_struct *child)
> +{
> +       long fd;
> +       struct seccomp_filter *filter;
> +
> +       if (!capable(CAP_SYS_ADMIN))
> +               return -EACCES;
> +
> +       if (child->seccomp.mode != SECCOMP_MODE_FILTER)
> +               return -EINVAL;
> +
> +       filter = child->seccomp.filter;
> +
> +       fd = bpf_new_fd(filter->prog, O_RDONLY);
> +       if (fd > 0)
> +               atomic_inc(&filter->prog->aux->refcnt);
> +
> +       return fd;
> +}
> +
> +long seccomp_next_filter(struct task_struct *child, u32 fd)
> +{
> +       struct seccomp_filter *cur;
> +       struct bpf_prog *prog;
> +       long ret = -ESRCH;
> +
> +       if (!capable(CAP_SYS_ADMIN))
> +               return -EACCES;
> +
> +       if (child->seccomp.mode != SECCOMP_MODE_FILTER)
> +               return -EINVAL;
> +
> +       prog = bpf_prog_get(fd);
> +       if (IS_ERR(prog)) {
> +               ret = PTR_ERR(prog);
> +               goto out;
> +       }
> +
> +       for (cur = child->seccomp.filter; cur; cur = cur->prev) {
> +               if (cur->prog == prog) {
> +                       if (!cur->prev)
> +                               ret = -ENOENT;
> +                       else
> +                               ret = bpf_prog_set(fd, cur->prev->prog);
> +                       break;
> +               }
> +       }
> +
> +out:
> +       bpf_prog_put(prog);
> +       return ret;
> +}
> +#endif
> +
>  /* Common entry point for both prctl and syscall. */
>  static long do_seccomp(unsigned int op, unsigned int flags,
>                        const char __user *uargs)
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 1/5] ebpf: add a seccomp program type
  2015-09-11  0:20   ` [PATCH v2 1/5] ebpf: add a seccomp program type Tycho Andersen
@ 2015-09-11 12:09     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 40+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-09-11 12:09 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, lkml, netdev, Linux API
On 11 September 2015 at 02:20, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> seccomp uses eBPF as its underlying storage and execution format, and eBPF
> has features that seccomp would like to make use of in the future. This
> patch adds a formal seccomp type to the eBPF verifier.
>
> The current implementation of the seccomp eBPF type is very limited, and
> doesn't support some interesting features (notably, maps) of eBPF. However,
> the primary motivation for this patchset is to enable checkpoint/restore
> for seccomp filters later in the series, so this limited feature set is ok
> for now.
Hi Tycho,
Seems like a man-pages patch is warranted here also?
Cheers,
Michael
> v2: * don't allow seccomp eBPF programs to call any functions
>     * get rid of superfluous seccomp_convert_ctx_access
>
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/uapi/linux/bpf.h |  1 +
>  net/core/filter.c        | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 92a48e2..631cdee 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -123,6 +123,7 @@ enum bpf_prog_type {
>         BPF_PROG_TYPE_KPROBE,
>         BPF_PROG_TYPE_SCHED_CLS,
>         BPF_PROG_TYPE_SCHED_ACT,
> +       BPF_PROG_TYPE_SECCOMP,
>  };
>
>  #define BPF_PSEUDO_MAP_FD      1
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 13079f0..faaae67 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1612,6 +1612,15 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
>         }
>  }
>
> +static const struct bpf_func_proto *
> +seccomp_func_proto(enum bpf_func_id func_id)
> +{
> +       /* At some point in the future seccomp filters may grow support for
> +        * eBPF functions. For now, these are disabled.
> +        */
> +       return NULL;
> +}
> +
>  static bool __is_valid_access(int off, int size, enum bpf_access_type type)
>  {
>         /* check bounds */
> @@ -1662,6 +1671,17 @@ static bool tc_cls_act_is_valid_access(int off, int size,
>         return __is_valid_access(off, size, type);
>  }
>
> +static bool seccomp_is_valid_access(int off, int size,
> +                                   enum bpf_access_type type)
> +{
> +       if (type == BPF_WRITE)
> +               return false;
> +
> +       if (off < 0 || off >= sizeof(struct seccomp_data) || off & 3)
> +               return false;
> +
> +       return true;
> +}
>  static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
>                                       int src_reg, int ctx_off,
>                                       struct bpf_insn *insn_buf)
> @@ -1795,6 +1815,11 @@ static const struct bpf_verifier_ops tc_cls_act_ops = {
>         .convert_ctx_access = bpf_net_convert_ctx_access,
>  };
>
> +static const struct bpf_verifier_ops seccomp_ops = {
> +       .get_func_proto = seccomp_func_proto,
> +       .is_valid_access = seccomp_is_valid_access,
> +};
> +
>  static struct bpf_prog_type_list sk_filter_type __read_mostly = {
>         .ops = &sk_filter_ops,
>         .type = BPF_PROG_TYPE_SOCKET_FILTER,
> @@ -1810,11 +1835,17 @@ static struct bpf_prog_type_list sched_act_type __read_mostly = {
>         .type = BPF_PROG_TYPE_SCHED_ACT,
>  };
>
> +static struct bpf_prog_type_list seccomp_type __read_mostly = {
> +       .ops = &seccomp_ops,
> +       .type = BPF_PROG_TYPE_SECCOMP,
> +};
> +
>  static int __init register_sk_filter_ops(void)
>  {
>         bpf_register_prog_type(&sk_filter_type);
>         bpf_register_prog_type(&sched_cls_type);
>         bpf_register_prog_type(&sched_act_type);
> +       bpf_register_prog_type(&seccomp_type);
>
>         return 0;
>  }
> --
> 2.1.4
-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 5/5] seccomp: add a way to attach a filter via eBPF fd
  2015-09-11  0:21   ` [PATCH v2 5/5] seccomp: add a way to attach a filter via eBPF fd Tycho Andersen
@ 2015-09-11 12:10     ` Michael Kerrisk (man-pages)
  2015-09-11 12:37     ` Daniel Borkmann
  1 sibling, 0 replies; 40+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-09-11 12:10 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, lkml, netdev, Linux API
On 11 September 2015 at 02:21, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> This is the final bit needed to support seccomp filters created via the bpf
> syscall. The patch adds a new seccomp operation SECCOMP_MODE_FILTER_EBPF,
> which takes exactly one command (presumably to be expanded upon later when
> seccomp EBPFs support more interesting things) and an argument struct
> similar to that of bpf(), although the size is explicit in the struct to
> avoid changing the signature of seccomp().
>
> v2: Don't abuse seccomp's third argument; use a separate command and a
>     pointer to a structure instead.
Hi Tycho,
Here, I'm entering broken record territory :-). Seems like a man-pages
patch is warranted here also?
Cheers,
Michael
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/uapi/linux/seccomp.h |  16 +++++
>  kernel/seccomp.c             | 135 ++++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 138 insertions(+), 13 deletions(-)
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 0f238a4..a8694e2 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -13,10 +13,14 @@
>  /* Valid operations for seccomp syscall. */
>  #define SECCOMP_SET_MODE_STRICT        0
>  #define SECCOMP_SET_MODE_FILTER        1
> +#define SECCOMP_MODE_FILTER_EBPF       2
>
>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>  #define SECCOMP_FILTER_FLAG_TSYNC      1
>
> +/* Valid cmds for SECCOMP_MODE_FILTER_EBPF */
> +#define SECCOMP_EBPF_ADD_FD    0
> +
>  /*
>   * All BPF programs must return a 32-bit value.
>   * The bottom 16-bits are for optional return data.
> @@ -51,4 +55,16 @@ struct seccomp_data {
>         __u64 args[6];
>  };
>
> +struct seccomp_ebpf {
> +       unsigned int size;
> +
> +       union {
> +               /* SECCOMP_EBPF_ADD_FD */
> +               struct {
> +                       unsigned int    add_flags;
> +                       __u32           add_fd;
> +               };
> +       };
> +};
> +
>  #endif /* _UAPI_LINUX_SECCOMP_H */
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 1856f69..e78175a 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -65,6 +65,9 @@ struct seccomp_filter {
>  /* Limit any path through the tree to 256KB worth of instructions. */
>  #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
>
> +static long seccomp_install_filter(unsigned int flags,
> +                                  struct seccomp_filter *prepared);
> +
>  /*
>   * Endianness is explicitly ignored and left for BPF program authors to manage
>   * as per the specific architecture.
> @@ -356,17 +359,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>
>         BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>
> -       /*
> -        * Installing a seccomp filter requires that the task has
> -        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> -        * This avoids scenarios where unprivileged tasks can affect the
> -        * behavior of privileged children.
> -        */
> -       if (!task_no_new_privs(current) &&
> -           security_capable_noaudit(current_cred(), current_user_ns(),
> -                                    CAP_SYS_ADMIN) != 0)
> -               return ERR_PTR(-EACCES);
> -
>         /* Allocate a new seccomp_filter */
>         sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
>         if (!sfilter)
> @@ -510,8 +502,105 @@ static void seccomp_send_sigsys(int syscall, int reason)
>         info.si_syscall = syscall;
>         force_sig_info(SIGSYS, &info, current);
>  }
> +
>  #endif /* CONFIG_SECCOMP_FILTER */
>
> +#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_SECCOMP_FILTER)
> +static struct seccomp_filter *seccomp_prepare_ebpf(int fd)
> +{
> +       struct seccomp_filter *ret;
> +       struct bpf_prog *prog;
> +
> +       prog = bpf_prog_get(fd);
> +       if (IS_ERR(prog))
> +               return (struct seccomp_filter *) prog;
> +
> +       if (prog->type != BPF_PROG_TYPE_SECCOMP) {
> +               bpf_prog_put(prog);
> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
> +       if (!ret) {
> +               bpf_prog_put(prog);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
> +       ret->prog = prog;
> +       atomic_set(&ret->usage, 1);
> +
> +       /* Intentionally don't bpf_prog_put() here, because the underlying prog
> +        * is refcounted too and we're holding a reference from the struct
> +        * seccomp_filter object.
> +        */
> +       return ret;
> +}
> +
> +static long seccomp_ebpf_add_fd(struct seccomp_ebpf *ebpf)
> +{
> +       struct seccomp_filter *prepared;
> +
> +       prepared = seccomp_prepare_ebpf(ebpf->add_fd);
> +       if (IS_ERR(prepared))
> +               return PTR_ERR(prepared);
> +
> +       return seccomp_install_filter(ebpf->add_flags, prepared);
> +}
> +
> +static long seccomp_mode_filter_ebpf(unsigned int cmd, const char __user *uargs)
> +{
> +       const struct seccomp_ebpf __user *uebpf;
> +       struct seccomp_ebpf ebpf;
> +       unsigned int size;
> +       long ret = -EFAULT;
> +
> +       uebpf = (const struct seccomp_ebpf __user *) uargs;
> +
> +       if (get_user(size, &uebpf->size) != 0)
> +               return -EFAULT;
> +
> +       /* If we're handed a bigger struct than we know of,
> +        * ensure all the unknown bits are 0 - i.e. new
> +        * user-space does not rely on any kernel feature
> +        * extensions we dont know about yet.
> +        */
> +       if (size > sizeof(ebpf)) {
> +               unsigned char __user *addr;
> +               unsigned char __user *end;
> +               unsigned char val;
> +
> +               addr = (void __user *)uebpf + sizeof(ebpf);
> +               end  = (void __user *)uebpf + size;
> +
> +               for (; addr < end; addr++) {
> +                       int err = get_user(val, addr);
> +
> +                       if (err)
> +                               return err;
> +                       if (val)
> +                               return -E2BIG;
> +               }
> +               size = sizeof(ebpf);
> +       }
> +
> +       if (copy_from_user(&ebpf, uebpf, size) != 0)
> +               return -EFAULT;
> +
> +       switch (cmd) {
> +       case SECCOMP_EBPF_ADD_FD:
> +               ret = seccomp_ebpf_add_fd(&ebpf);
> +               break;
> +       }
> +
> +       return ret;
> +}
> +#else
> +static long seccomp_mode_filter_ebpf(unsigned int cmd, const char __user *uargs)
> +{
> +       return -EINVAL;
> +}
> +#endif
> +
>  /*
>   * Secure computing mode 1 allows only read/write/exit/sigreturn.
>   * To be fully secure this must be combined with rlimit
> @@ -760,9 +849,7 @@ out:
>  static long seccomp_set_mode_filter(unsigned int flags,
>                                     const char __user *filter)
>  {
> -       const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
>         struct seccomp_filter *prepared = NULL;
> -       long ret = -EINVAL;
>
>         /* Validate flags. */
>         if (flags & ~SECCOMP_FILTER_FLAG_MASK)
> @@ -773,6 +860,26 @@ static long seccomp_set_mode_filter(unsigned int flags,
>         if (IS_ERR(prepared))
>                 return PTR_ERR(prepared);
>
> +       return seccomp_install_filter(flags, prepared);
> +}
> +
> +static long seccomp_install_filter(unsigned int flags,
> +                                  struct seccomp_filter *prepared)
> +{
> +       const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
> +       long ret = -EINVAL;
> +
> +       /*
> +        * Installing a seccomp filter requires that the task has
> +        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> +        * This avoids scenarios where unprivileged tasks can affect the
> +        * behavior of privileged children.
> +        */
> +       if (!task_no_new_privs(current) &&
> +           security_capable_noaudit(current_cred(), current_user_ns(),
> +                                    CAP_SYS_ADMIN) != 0)
> +               return -EACCES;
> +
>         /*
>          * Make sure we cannot change seccomp or nnp state via TSYNC
>          * while another thread is in the middle of calling exec.
> @@ -875,6 +982,8 @@ static long do_seccomp(unsigned int op, unsigned int flags,
>                 return seccomp_set_mode_strict();
>         case SECCOMP_SET_MODE_FILTER:
>                 return seccomp_set_mode_filter(flags, uargs);
> +       case SECCOMP_MODE_FILTER_EBPF:
> +               return seccomp_mode_filter_ebpf(flags, uargs);
>         default:
>                 return -EINVAL;
>         }
> --
> 2.1.4
-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 3/5] ebpf: add a way to dump an eBPF program
  2015-09-11  0:21   ` [PATCH v2 3/5] ebpf: add a way to dump an eBPF program Tycho Andersen
       [not found]     ` <1441930862-14347-4-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
@ 2015-09-11 12:11     ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 40+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-09-11 12:11 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, lkml, netdev, Linux API
Hi Tycho,
On 11 September 2015 at 02:21, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> This commit adds a way to dump eBPF programs. The initial implementation
> doesn't support maps, and therefore only allows dumping seccomp ebpf
> programs which themselves don't currently support maps.
Same broken record :-).
Cheers,
Michael
> v2: don't export a prog_id for the filter
>
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/uapi/linux/bpf.h | 14 ++++++++++++++
>  kernel/bpf/syscall.c     | 41 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 55 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 631cdee..e037a76 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -107,6 +107,13 @@ enum bpf_cmd {
>          * returns fd or negative error
>          */
>         BPF_PROG_LOAD,
> +
> +       /* dump an existing bpf
> +        * err = bpf(BPF_PROG_DUMP, union bpf_attr *attr, u32 size)
> +        * Using attr->prog_fd, attr->dump_insn_cnt, attr->dump_insns
> +        * returns zero or negative error
> +        */
> +       BPF_PROG_DUMP,
>  };
>
>  enum bpf_map_type {
> @@ -161,6 +168,13 @@ union bpf_attr {
>                 __aligned_u64   log_buf;        /* user supplied buffer */
>                 __u32           kern_version;   /* checked when prog_type=kprobe */
>         };
> +
> +       struct { /* anonymous struct used by BPF_PROG_DUMP command */
> +               __u32           prog_fd;
> +               __u32           dump_insn_cnt;
> +               __aligned_u64   dump_insns;     /* user supplied buffer */
> +               __u8            gpl_compatible;
> +       };
>  } __attribute__((aligned(8)));
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index dc9b464..58ae9f4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -586,6 +586,44 @@ free_prog:
>         return err;
>  }
>
> +static int bpf_prog_dump(union bpf_attr *attr, union bpf_attr __user *uattr)
> +{
> +       int ufd = attr->prog_fd;
> +       struct fd f = fdget(ufd);
> +       struct bpf_prog *prog;
> +       int ret = -EINVAL;
> +
> +       prog = get_prog(f);
> +       if (IS_ERR(prog))
> +               return PTR_ERR(prog);
> +
> +       /* For now, let's refuse to dump anything that isn't a seccomp program.
> +        * Other program types have support for maps, which our current dump
> +        * code doesn't support.
> +        */
> +       if (prog->type != BPF_PROG_TYPE_SECCOMP)
> +               goto out;
> +
> +       ret = -EFAULT;
> +       if (put_user(prog->len, &uattr->dump_insn_cnt))
> +               goto out;
> +
> +       if (put_user((u8) prog->gpl_compatible, &uattr->gpl_compatible))
> +               goto out;
> +
> +       if (attr->dump_insns) {
> +               u32 len = prog->len * sizeof(struct bpf_insn);
> +
> +               if (copy_to_user(u64_to_ptr(attr->dump_insns),
> +                                prog->insns, len) != 0)
> +                       goto out;
> +       }
> +
> +       ret = 0;
> +out:
> +       return ret;
> +}
> +
>  SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
>  {
>         union bpf_attr attr = {};
> @@ -650,6 +688,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>         case BPF_PROG_LOAD:
>                 err = bpf_prog_load(&attr);
>                 break;
> +       case BPF_PROG_DUMP:
> +               err = bpf_prog_dump(&attr, uattr);
> +               break;
>         default:
>                 err = -EINVAL;
>                 break;
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 5/5] seccomp: add a way to attach a filter via eBPF fd
  2015-09-11  0:21   ` [PATCH v2 5/5] seccomp: add a way to attach a filter via eBPF fd Tycho Andersen
  2015-09-11 12:10     ` Michael Kerrisk (man-pages)
@ 2015-09-11 12:37     ` Daniel Borkmann
       [not found]       ` <55F2CB27.7030804-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2015-09-11 12:37 UTC (permalink / raw)
  To: Tycho Andersen, Kees Cook, Alexei Starovoitov
  Cc: David S. Miller, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, linux-kernel, netdev, linux-api
On 09/11/2015 02:21 AM, Tycho Andersen wrote:
> This is the final bit needed to support seccomp filters created via the bpf
> syscall. The patch adds a new seccomp operation SECCOMP_MODE_FILTER_EBPF,
> which takes exactly one command (presumably to be expanded upon later when
> seccomp EBPFs support more interesting things) and an argument struct
> similar to that of bpf(), although the size is explicit in the struct to
> avoid changing the signature of seccomp().
>
> v2: Don't abuse seccomp's third argument; use a separate command and a
>      pointer to a structure instead.
Comments below ...
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Kees Cook <keescook@chromium.org>
> CC: Will Drewry <wad@chromium.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Pavel Emelyanov <xemul@parallels.com>
> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> ---
>   include/uapi/linux/seccomp.h |  16 +++++
>   kernel/seccomp.c             | 135 ++++++++++++++++++++++++++++++++++++++-----
>   2 files changed, 138 insertions(+), 13 deletions(-)
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 0f238a4..a8694e2 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -13,10 +13,14 @@
>   /* Valid operations for seccomp syscall. */
>   #define SECCOMP_SET_MODE_STRICT	0
>   #define SECCOMP_SET_MODE_FILTER	1
> +#define SECCOMP_MODE_FILTER_EBPF	2
Should this be SECCOMP_SET_MODE_FILTER_EBPF or just SECCOMP_SET_MODE_EBPF?
>   /* Valid flags for SECCOMP_SET_MODE_FILTER */
>   #define SECCOMP_FILTER_FLAG_TSYNC	1
>
> +/* Valid cmds for SECCOMP_MODE_FILTER_EBPF */
> +#define SECCOMP_EBPF_ADD_FD	0
> +
>   /*
>    * All BPF programs must return a 32-bit value.
>    * The bottom 16-bits are for optional return data.
> @@ -51,4 +55,16 @@ struct seccomp_data {
>   	__u64 args[6];
>   };
>
> +struct seccomp_ebpf {
> +	unsigned int size;
> +
> +	union {
> +		/* SECCOMP_EBPF_ADD_FD */
> +		struct {
> +			unsigned int	add_flags;
> +			__u32		add_fd;
> +		};
> +	};
> +};
> +
>   #endif /* _UAPI_LINUX_SECCOMP_H */
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 1856f69..e78175a 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -65,6 +65,9 @@ struct seccomp_filter {
>   /* Limit any path through the tree to 256KB worth of instructions. */
>   #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
>
> +static long seccomp_install_filter(unsigned int flags,
> +				   struct seccomp_filter *prepared);
> +
>   /*
>    * Endianness is explicitly ignored and left for BPF program authors to manage
>    * as per the specific architecture.
> @@ -356,17 +359,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>
>   	BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>
> -	/*
> -	 * Installing a seccomp filter requires that the task has
> -	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> -	 * This avoids scenarios where unprivileged tasks can affect the
> -	 * behavior of privileged children.
> -	 */
> -	if (!task_no_new_privs(current) &&
> -	    security_capable_noaudit(current_cred(), current_user_ns(),
> -				     CAP_SYS_ADMIN) != 0)
> -		return ERR_PTR(-EACCES);
> -
>   	/* Allocate a new seccomp_filter */
>   	sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
>   	if (!sfilter)
> @@ -510,8 +502,105 @@ static void seccomp_send_sigsys(int syscall, int reason)
>   	info.si_syscall = syscall;
>   	force_sig_info(SIGSYS, &info, current);
>   }
> +
>   #endif	/* CONFIG_SECCOMP_FILTER */
>
> +#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_SECCOMP_FILTER)
> +static struct seccomp_filter *seccomp_prepare_ebpf(int fd)
> +{
> +	struct seccomp_filter *ret;
> +	struct bpf_prog *prog;
> +
> +	prog = bpf_prog_get(fd);
> +	if (IS_ERR(prog))
> +		return (struct seccomp_filter *) prog;
ERR_CAST()
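For readers who have not met the helper: ERR_CAST() forwards an
error-encoded pointer under a different pointer type, so no cast to the
target struct is needed. What follows is a simplified userspace model of
the <linux/err.h> idiom, not kernel code; struct prog, struct filter,
lookup_prog() and prepare_filter() are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model of the <linux/err.h> idiom (simplified assumption). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	/* Model-only sanity check: only small negative errnos are encodable. */
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Forward an error pointer under another type without a struct cast. */
static inline void *ERR_CAST(const void *ptr)
{
	return (void *)ptr;
}

struct prog { int type; };		/* stand-in for struct bpf_prog */
struct filter { struct prog *prog; };	/* stand-in for struct seccomp_filter */

static struct prog demo_prog;
static struct filter demo_filter;

/* Hypothetical lookup: only fd 3 resolves to a program. */
static struct prog *lookup_prog(int fd)
{
	if (fd != 3)
		return ERR_PTR(-EBADF);
	return &demo_prog;
}

static struct filter *prepare_filter(int fd)
{
	struct prog *prog = lookup_prog(fd);

	if (IS_ERR(prog))
		return ERR_CAST(prog);	/* rather than (struct filter *) prog */

	demo_filter.prog = prog;
	return &demo_filter;
}
```

The behaviour is identical to the open-coded cast; the helper only makes
the intent (propagating an error, not reinterpreting an object) explicit.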
> +
> +	if (prog->type != BPF_PROG_TYPE_SECCOMP) {
> +		bpf_prog_put(prog);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
> +	if (!ret) {
> +		bpf_prog_put(prog);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	ret->prog = prog;
> +	atomic_set(&ret->usage, 1);
> +
> +	/* Intentionally don't bpf_prog_put() here, because the underlying prog
> +	 * is refcounted too and we're holding a reference from the struct
> +	 * seccomp_filter object.
> +	 */
> +	return ret;
> +}
> +
> +static long seccomp_ebpf_add_fd(struct seccomp_ebpf *ebpf)
> +{
> +	struct seccomp_filter *prepared;
> +
> +	prepared = seccomp_prepare_ebpf(ebpf->add_fd);
> +	if (IS_ERR(prepared))
> +		return PTR_ERR(prepared);
> +
> +	return seccomp_install_filter(ebpf->add_flags, prepared);
> +}
> +
> +static long seccomp_mode_filter_ebpf(unsigned int cmd, const char __user *uargs)
> +{
> +	const struct seccomp_ebpf __user *uebpf;
> +	struct seccomp_ebpf ebpf;
> +	unsigned int size;
> +	long ret = -EFAULT;
> +
> +	uebpf = (const struct seccomp_ebpf __user *) uargs;
> +
> +	if (get_user(size, &uebpf->size) != 0)
> +		return -EFAULT;
> +
> +	/* If we're handed a bigger struct than we know of,
> +	 * ensure all the unknown bits are 0 - i.e. new
> +	 * user-space does not rely on any kernel feature
> +	 * extensions we dont know about yet.
> +	 */
> +	if (size > sizeof(ebpf)) {
> +		unsigned char __user *addr;
> +		unsigned char __user *end;
> +		unsigned char val;
> +
> +		addr = (void __user *)uebpf + sizeof(ebpf);
> +		end  = (void __user *)uebpf + size;
> +
> +		for (; addr < end; addr++) {
> +			int err = get_user(val, addr);
> +
> +			if (err)
> +				return err;
> +			if (val)
> +				return -E2BIG;
> +		}
> +		size = sizeof(ebpf);
> +	}
> +
> +	if (copy_from_user(&ebpf, uebpf, size) != 0)
> +		return -EFAULT;
I'm not sure it's worth adding all this bpf(2)-like interface complexity
here, but fair enough; I guess there are some very good reasons, and bigger
additions coming ...
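For reference, the size-versioned copy quoted above accepts a larger
struct from newer userspace only if every byte past the fields this
kernel knows about is zero. A userspace sketch of that rule, with plain
memcpy() standing in for get_user()/copy_from_user(); struct known_args
and copy_versioned() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "kernel-known" layout; newer callers may append fields. */
struct known_args {
	unsigned int size;
	unsigned int flags;
	unsigned int fd;
};

/*
 * Copy a caller-supplied struct of `usize` bytes into `dst`.
 * Returns 0 on success, -E2BIG if the caller set bytes that this
 * version does not understand.
 */
static int copy_versioned(struct known_args *dst, const void *src, size_t usize)
{
	const unsigned char *bytes = src;
	size_t i;

	assert(dst && src);

	/* Larger struct than we know: the unknown tail must be all zero. */
	for (i = sizeof(*dst); i < usize; i++)
		if (bytes[i])
			return -E2BIG;
	if (usize > sizeof(*dst))
		usize = sizeof(*dst);

	/* Smaller (older) struct: fields it lacks default to zero. */
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, usize);
	return 0;
}
```

One difference from the quoted hunk: the sketch zeroes the destination
before copying, so a caller passing a smaller struct never leaves
uninitialized fields behind. The hunk declares `struct seccomp_ebpf ebpf;`
without an initializer, which may be worth a second look.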
> +	switch (cmd) {
> +	case SECCOMP_EBPF_ADD_FD:
> +		ret = seccomp_ebpf_add_fd(&ebpf);
> +		break;
> +	}
> +
> +	return ret;
> +}
> +#else
> +static long seccomp_mode_filter_ebpf(unsigned int cmd, const char __user *uargs)
> +{
> +	return -EINVAL;
> +}
> +#endif
> +
>   /*
>    * Secure computing mode 1 allows only read/write/exit/sigreturn.
>    * To be fully secure this must be combined with rlimit
> @@ -760,9 +849,7 @@ out:
>   static long seccomp_set_mode_filter(unsigned int flags,
>   				    const char __user *filter)
>   {
> -	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
>   	struct seccomp_filter *prepared = NULL;
> -	long ret = -EINVAL;
>
>   	/* Validate flags. */
>   	if (flags & ~SECCOMP_FILTER_FLAG_MASK)
> @@ -773,6 +860,26 @@ static long seccomp_set_mode_filter(unsigned int flags,
>   	if (IS_ERR(prepared))
>   		return PTR_ERR(prepared);
>
> +	return seccomp_install_filter(flags, prepared);
I (truly) hope I'm overlooking something ;) ...
... but why do all the (classic) seccomp-BPF preparation work (which is rather
a lot) up to this point, where you have it ready, only to *then* find out that
we don't have the actual permissions?!
Plus, when seccomp_install_filter() fails with -EACCES, who releases all the
allocated resources and drops the program refs that were taken?
I see the same in seccomp_ebpf_add_fd().
So, an unprivileged child could increase the parent's bpf_prog reference count
without having the actual permissions to do so, and thus control it to the
point where the next bpf_prog_put() would unintentionally release it?
(So yeah, I'm hoping I misread something ... ;))
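To make the concern concrete, here is a userspace model of the ordering
being asked for: check permission before taking any reference, and drop
whatever was taken on every failure path. All names (struct prog,
prog_get(), attach_filter(), the `allowed` flag) are hypothetical
stand-ins and not the seccomp code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical refcounted program, standing in for struct bpf_prog. */
struct prog {
	int refcnt;
};

struct filter {
	struct prog *prog;
};

static struct prog *prog_get(struct prog *p)
{
	p->refcnt++;
	return p;
}

static void prog_put(struct prog *p)
{
	assert(p->refcnt > 0);
	p->refcnt--;
}

/*
 * Attach `p` as a filter. `allowed` models the no_new_privs /
 * CAP_SYS_ADMIN check. On success *out owns one reference to `p`.
 */
static int attach_filter(struct prog *p, int allowed, struct filter **out)
{
	struct filter *f;

	/* 1. Cheap permission check first: nothing to undo on failure. */
	if (!allowed)
		return -EACCES;

	/* 2. Only now take a reference and allocate. */
	prog_get(p);
	f = calloc(1, sizeof(*f));
	if (!f) {
		prog_put(p);	/* undo step 2 on the error path */
		return -ENOMEM;
	}

	f->prog = p;
	*out = f;
	return 0;
}

static void detach_filter(struct filter *f)
{
	prog_put(f->prog);
	free(f);
}
```

With this ordering, a caller that fails the permission check leaves the
program's reference count exactly where it found it.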
> +}
> +
> +static long seccomp_install_filter(unsigned int flags,
> +				   struct seccomp_filter *prepared)
> +{
> +	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
> +	long ret = -EINVAL;
> +
> +	/*
> +	 * Installing a seccomp filter requires that the task has
> +	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> +	 * This avoids scenarios where unprivileged tasks can affect the
> +	 * behavior of privileged children.
> +	 */
> +	if (!task_no_new_privs(current) &&
> +	    security_capable_noaudit(current_cred(), current_user_ns(),
> +				     CAP_SYS_ADMIN) != 0)
> +		return -EACCES;
> +
>   	/*
>   	 * Make sure we cannot change seccomp or nnp state via TSYNC
>   	 * while another thread is in the middle of calling exec.
> @@ -875,6 +982,8 @@ static long do_seccomp(unsigned int op, unsigned int flags,
>   		return seccomp_set_mode_strict();
>   	case SECCOMP_SET_MODE_FILTER:
>   		return seccomp_set_mode_filter(flags, uargs);
> +	case SECCOMP_MODE_FILTER_EBPF:
> +		return seccomp_mode_filter_ebpf(flags, uargs);
>   	default:
>   		return -EINVAL;
>   	}
>
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well
       [not found]   ` <1441930862-14347-3-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
@ 2015-09-11 13:02     ` Daniel Borkmann
       [not found]       ` <55F2D0EC.9090004-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2015-09-11 13:02 UTC (permalink / raw)
  To: Tycho Andersen, Kees Cook, Alexei Starovoitov
  Cc: David S. Miller, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
On 09/11/2015 02:20 AM, Tycho Andersen wrote:
> In the next patch, we're going to add a way to access the underlying
> filters via bpf fds. This means that we need to ref-count both the
> struct seccomp_filter objects and the struct bpf_prog objects separately,
> in case a process dies but a filter is still referred to by another
> process.
>
> Additionally, we mark classic converted seccomp filters as seccomp eBPF
> programs, since they are a subset of what is supported in seccomp eBPF.
>
> Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
> ---
>   kernel/seccomp.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 245df6b..afaeddf 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -378,6 +378,8 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>   	}
>
>   	atomic_set(&sfilter->usage, 1);
> +	atomic_set(&sfilter->prog->aux->refcnt, 1);
> +	sfilter->prog->type = BPF_PROG_TYPE_SECCOMP;
So, if you do this, then this breaks the assumption of eBPF JITs
that, currently, all classic converted BPF programs always have a
prog->type of BPF_PROG_TYPE_UNSPEC (see: bpf_prog_was_classic()).
Currently, JITs make use of this information to determine whether
A and X mappings for such programs should or should not be cleared
in the prologue (s390 currently).
In the seccomp_prepare_filter() stage, we're already past that, so
it will not cause an issue, but we certainly would need to be very
careful in the future: if bpf_prog_was_classic() is ever used at a
later stage, when we already have a generated bpf_prog somewhere,
this assumption will break.
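The check in question is a plain type comparison, which is why retagging
a converted program changes its answer. A userspace model of that
assumption; the enum values and struct prog are simplified stand-ins,
and was_classic() only mirrors the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's program types. */
enum prog_type {
	PROG_TYPE_UNSPEC,	/* what classic-converted programs carry today */
	PROG_TYPE_SECCOMP,	/* the new type introduced by this series */
};

struct prog {
	enum prog_type type;
};

/* Mirrors the assumption: "classic" means the type is still UNSPEC. */
static bool was_classic(const struct prog *p)
{
	return p->type == PROG_TYPE_UNSPEC;
}

/* Models the retagging the patch does in seccomp_prepare_filter(). */
static void mark_seccomp(struct prog *p)
{
	assert(p);
	p->type = PROG_TYPE_SECCOMP;
}
```

After mark_seccomp(), was_classic() returns false for a program that did
start life as classic BPF, so any later caller relying on it would be
misled.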
>   	return sfilter;
>   }
> @@ -470,7 +472,7 @@ void get_seccomp_filter(struct task_struct *tsk)
>   static inline void seccomp_filter_free(struct seccomp_filter *filter)
>   {
>   	if (filter) {
> -		bpf_prog_free(filter->prog);
> +		bpf_prog_put(filter->prog);
>   		kfree(filter);
>   	}
>   }
>
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 3/5] ebpf: add a way to dump an eBPF program
       [not found]     ` <1441930862-14347-4-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
  2015-09-11  2:29       ` Alexei Starovoitov
@ 2015-09-11 13:39       ` Daniel Borkmann
  2015-09-11 14:44         ` Tycho Andersen
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2015-09-11 13:39 UTC (permalink / raw)
  To: Tycho Andersen, Kees Cook, Alexei Starovoitov
  Cc: David S. Miller, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
On 09/11/2015 02:21 AM, Tycho Andersen wrote:
> This commit adds a way to dump eBPF programs. The initial implementation
> doesn't support maps, and therefore only allows dumping seccomp ebpf
> programs which themselves don't currently support maps.
>
> v2: don't export a prog_id for the filter
>
> Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
[...]
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index dc9b464..58ae9f4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -586,6 +586,44 @@ free_prog:
>   	return err;
>   }
>
> +static int bpf_prog_dump(union bpf_attr *attr, union bpf_attr __user *uattr)
> +{
> +	int ufd = attr->prog_fd;
> +	struct fd f = fdget(ufd);
> +	struct bpf_prog *prog;
> +	int ret = -EINVAL;
> +
> +	prog = get_prog(f);
> +	if (IS_ERR(prog))
> +		return PTR_ERR(prog);
> +
> +	/* For now, let's refuse to dump anything that isn't a seccomp program.
> +	 * Other program types have support for maps, which our current dump
> +	 * code doesn't support.
> +	 */
> +	if (prog->type != BPF_PROG_TYPE_SECCOMP)
> +		goto out;
Yep, also when you start adding helper calls (next to map objects) you'd
need to undo kernel pointers that the verifier sets here.
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds
       [not found]     ` <55F2BF5A.8010006-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
@ 2015-09-11 14:29       ` Tycho Andersen
  0 siblings, 0 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11 14:29 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
On Fri, Sep 11, 2015 at 01:47:38PM +0200, Daniel Borkmann wrote:
> On 09/11/2015 02:21 AM, Tycho Andersen wrote:
> >This patch adds a way for a process that is "real root" to access the
> >seccomp filters of another process. The process first does a
> >PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> >attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> >bpf(BPF_PROG_DUMP) to dump the actual program at each step.
> >
> >Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> >CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> >CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> >CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> >CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> >CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> >CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
> [...]
> >diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >index 58ae9f4..ac3ed1c 100644
> >--- a/kernel/bpf/syscall.c
> >+++ b/kernel/bpf/syscall.c
> >@@ -506,6 +506,30 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
> >  }
> >  EXPORT_SYMBOL_GPL(bpf_prog_get);
> >
> >+int bpf_prog_set(u32 ufd, struct bpf_prog *new)
> >+{
> >+	struct fd f;
> >+	struct bpf_prog *prog;
> >+
> >+	f = fdget(ufd);
> >+
> >+	prog = get_prog(f);
> >+	if (!IS_ERR(prog) && prog)
> >+		bpf_prog_put(prog);
> >+
> >+	atomic_inc(&new->aux->refcnt);
> >+	f.file->private_data = new;
> >+	fdput(f);
> >+	return 0;
> 
> So in case get_prog() fails, and for example f.file is in fact NULL,
> you then assign the bpf prog to ERR_PTR(-EBADF)'s private_data? :(
Thanks, I will fix for the next version.
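One possible shape for that fix, as a userspace sketch only: bail out
before touching the file when the lookup fails, and take the new
reference only once the swap is certain to happen. The fd table,
file_get() and prog_set() are hypothetical stand-ins for fdget(),
get_prog() and bpf_prog_set(); only the "fd not open" failure is
modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct prog { int refcnt; };
struct file { struct prog *private_data; };

#define NR_FDS 4
static struct file *fd_table[NR_FDS];	/* NULL slot == fd not open */

/* Hypothetical stand-in for fdget(): NULL when the fd is not open. */
static struct file *file_get(unsigned int fd)
{
	return fd < NR_FDS ? fd_table[fd] : NULL;
}

static void prog_put(struct prog *p)
{
	assert(p->refcnt > 0);
	p->refcnt--;
}

/* Swap the program behind `fd`, bailing out before touching a bad file. */
static int prog_set(unsigned int fd, struct prog *new)
{
	struct file *file = file_get(fd);

	if (!file)
		return -EBADF;	/* nothing taken yet, nothing to undo */

	/* Drop the reference the fd held on the old program, if any. */
	if (file->private_data)
		prog_put(file->private_data);

	new->refcnt++;
	file->private_data = new;
	return 0;
}
```

On the failure path neither program's reference count changes, which is
the property the review comment above is asking for.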
> >+}
> >+EXPORT_SYMBOL_GPL(bpf_prog_set);
> >+
> >+int bpf_new_fd(struct bpf_prog *prog, int flags)
> >+{
> >+	return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, flags);
> >+}
> >+EXPORT_SYMBOL_GPL(bpf_new_fd);
> 
> Any reason why these two need to be exported for modules? Which
> modules are using them?
> 
> I think modules should probably not mess with this.
No reason, I suppose. I was just exporting because bpf_prog_get is;
I'll drop it for the next version.
> If you already name it generic, it would also be good if bpf_new_fd()
> is used in case of maps that call anon_inode_getfd(), too.
I needed to call bpf_new_fd from kernel/seccomp.c, which it seems
shouldn't be able to reference bpf_prog_fops, which is why I added the
little "proxy". If we change the api to something like,
bpf_new_fd("bpf-map", &bpf_map_fops, map);
bpf_new_fd("bpf-prog", &bpf_prog_fops, prog);
I'd need access to bpf_prog_fops again. What about changing the name
to bpf_new_prog_fd?
Tycho
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds
       [not found]       ` <CAKgNAki99ZFgLPE5mWWjj1nvdNyke1w0ttqmiG+Uk0rVfqutZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-11 14:31         ` Tycho Andersen
  0 siblings, 0 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11 14:31 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, lkml, netdev, Linux API
Hi Michael,
On Fri, Sep 11, 2015 at 02:08:50PM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Tycho
> 
> On 11 September 2015 at 02:21, Tycho Andersen
> <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> > This patch adds a way for a process that is "real root" to access the
> > seccomp filters of another process. The process first does a
> > PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> > attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> > bpf(BPF_PROG_DUMP) to dump the actual program at each step.
> 
> Do you have a man-page patch for this change?
Not yet (re: all the man page requests); I can draft them ASAP, though.
Hopefully the API is mostly stable at this point :).
Thanks,
Tycho
> Cheers,
> 
> Michael
> 
> > Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> > CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> > CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> > CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> > CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> > CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> > CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
> > ---
> >  include/linux/bpf.h         | 12 ++++++++++
> >  include/linux/seccomp.h     | 14 +++++++++++
> >  include/uapi/linux/ptrace.h |  3 +++
> >  kernel/bpf/syscall.c        | 26 ++++++++++++++++++++-
> >  kernel/ptrace.c             |  7 ++++++
> >  kernel/seccomp.c            | 57 +++++++++++++++++++++++++++++++++++++++++++++
> >  6 files changed, 118 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f57d7fe..bfd9cab 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -162,6 +162,8 @@ void bpf_register_prog_type(struct bpf_prog_type_list *tl);
> >  void bpf_register_map_type(struct bpf_map_type_list *tl);
> >
> >  struct bpf_prog *bpf_prog_get(u32 ufd);
> > +int bpf_prog_set(u32 ufd, struct bpf_prog *new);
> > +int bpf_new_fd(struct bpf_prog *prog, int flags);
> >  void bpf_prog_put(struct bpf_prog *prog);
> >  void bpf_prog_put_rcu(struct bpf_prog *prog);
> >
> > @@ -180,6 +182,16 @@ static inline struct bpf_prog *bpf_prog_get(u32 ufd)
> >         return ERR_PTR(-EOPNOTSUPP);
> >  }
> >
> > +static inline int bpf_prog_set(u32 ufd, struct bpf_prog *new)
> > +{
> > +       return -EINVAL;
> > +}
> > +
> > +static inline int bpf_new_fd(struct bpf_prog *prog, int flags)
> > +{
> > +       return -EINVAL;
> > +}
> > +
> >  static inline void bpf_prog_put(struct bpf_prog *prog)
> >  {
> >  }
> > diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> > index a19ddac..41b083c 100644
> > --- a/include/linux/seccomp.h
> > +++ b/include/linux/seccomp.h
> > @@ -95,4 +95,18 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
> >         return;
> >  }
> >  #endif /* CONFIG_SECCOMP_FILTER */
> > +
> > +#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
> > +extern long seccomp_get_filter_fd(struct task_struct *child);
> > +extern long seccomp_next_filter(struct task_struct *child, u32 fd);
> > +#else
> > +static inline long seccomp_get_filter_fd(struct task_struct *child)
> > +{
> > +       return -EINVAL;
> > +}
> > +static inline long seccomp_next_filter(struct task_struct *child, u32 fd)
> > +{
> > +       return -EINVAL;
> > +}
> > +#endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
> >  #endif /* _LINUX_SECCOMP_H */
> > diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
> > index cf1019e..041c3c3 100644
> > --- a/include/uapi/linux/ptrace.h
> > +++ b/include/uapi/linux/ptrace.h
> > @@ -23,6 +23,9 @@
> >
> >  #define PTRACE_SYSCALL           24
> >
> > +#define PTRACE_SECCOMP_GET_FILTER_FD   40
> > +#define PTRACE_SECCOMP_NEXT_FILTER     41
> > +
> >  /* 0x4200-0x4300 are reserved for architecture-independent additions.  */
> >  #define PTRACE_SETOPTIONS      0x4200
> >  #define PTRACE_GETEVENTMSG     0x4201
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 58ae9f4..ac3ed1c 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -506,6 +506,30 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
> >  }
> >  EXPORT_SYMBOL_GPL(bpf_prog_get);
> >
> > +int bpf_prog_set(u32 ufd, struct bpf_prog *new)
> > +{
> > +       struct fd f;
> > +       struct bpf_prog *prog;
> > +
> > +       f = fdget(ufd);
> > +
> > +       prog = get_prog(f);
> > +       if (!IS_ERR(prog) && prog)
> > +               bpf_prog_put(prog);
> > +
> > +       atomic_inc(&new->aux->refcnt);
> > +       f.file->private_data = new;
> > +       fdput(f);
> > +       return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(bpf_prog_set);
> > +
> > +int bpf_new_fd(struct bpf_prog *prog, int flags)
> > +{
> > +       return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(bpf_new_fd);
> > +
> >  /* last field in 'union bpf_attr' used by this command */
> >  #define        BPF_PROG_LOAD_LAST_FIELD kern_version
> >
> > @@ -572,7 +596,7 @@ static int bpf_prog_load(union bpf_attr *attr)
> >         if (err < 0)
> >                 goto free_used_maps;
> >
> > -       err = anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, O_RDWR | O_CLOEXEC);
> > +       err = bpf_new_fd(prog, O_RDWR | O_CLOEXEC);
> >         if (err < 0)
> >                 /* failed to allocate fd */
> >                 goto free_used_maps;
> > diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> > index c8e0e05..a151c35 100644
> > --- a/kernel/ptrace.c
> > +++ b/kernel/ptrace.c
> > @@ -1003,6 +1003,13 @@ int ptrace_request(struct task_struct *child, long request,
> >                 break;
> >         }
> >  #endif
> > +
> > +       case PTRACE_SECCOMP_GET_FILTER_FD:
> > +               return seccomp_get_filter_fd(child);
> > +
> > +       case PTRACE_SECCOMP_NEXT_FILTER:
> > +               return seccomp_next_filter(child, data);
> > +
> >         default:
> >                 break;
> >         }
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index afaeddf..1856f69 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -26,6 +26,8 @@
> >  #endif
> >
> >  #ifdef CONFIG_SECCOMP_FILTER
> > +#include <linux/bpf.h>
> > +#include <uapi/linux/bpf.h>
> >  #include <linux/filter.h>
> >  #include <linux/pid.h>
> >  #include <linux/ptrace.h>
> > @@ -807,6 +809,61 @@ static inline long seccomp_set_mode_filter(unsigned int flags,
> >  }
> >  #endif
> >
> > +#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
> > +long seccomp_get_filter_fd(struct task_struct *child)
> > +{
> > +       long fd;
> > +       struct seccomp_filter *filter;
> > +
> > +       if (!capable(CAP_SYS_ADMIN))
> > +               return -EACCES;
> > +
> > +       if (child->seccomp.mode != SECCOMP_MODE_FILTER)
> > +               return -EINVAL;
> > +
> > +       filter = child->seccomp.filter;
> > +
> > +       fd = bpf_new_fd(filter->prog, O_RDONLY);
> > +       if (fd > 0)
> > +               atomic_inc(&filter->prog->aux->refcnt);
> > +
> > +       return fd;
> > +}
> > +
> > +long seccomp_next_filter(struct task_struct *child, u32 fd)
> > +{
> > +       struct seccomp_filter *cur;
> > +       struct bpf_prog *prog;
> > +       long ret = -ESRCH;
> > +
> > +       if (!capable(CAP_SYS_ADMIN))
> > +               return -EACCES;
> > +
> > +       if (child->seccomp.mode != SECCOMP_MODE_FILTER)
> > +               return -EINVAL;
> > +
> > +       prog = bpf_prog_get(fd);
> > +       if (IS_ERR(prog)) {
> > +               ret = PTR_ERR(prog);
> > +               goto out;
> > +       }
> > +
> > +       for (cur = child->seccomp.filter; cur; cur = cur->prev) {
> > +               if (cur->prog == prog) {
> > +                       if (!cur->prev)
> > +                               ret = -ENOENT;
> > +                       else
> > +                               ret = bpf_prog_set(fd, cur->prev->prog);
> > +                       break;
> > +               }
> > +       }
> > +
> > +out:
> > +       bpf_prog_put(prog);
> > +       return ret;
> > +}
> > +#endif
> > +
> >  /* Common entry point for both prctl and syscall. */
> >  static long do_seccomp(unsigned int op, unsigned int flags,
> >                        const char __user *uargs)
> > --
> > 2.1.4
> >
> 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 5/5] seccomp: add a way to attach a filter via eBPF fd
       [not found]       ` <55F2CB27.7030804-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
@ 2015-09-11 14:40         ` Tycho Andersen
  0 siblings, 0 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11 14:40 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
On Fri, Sep 11, 2015 at 02:37:59PM +0200, Daniel Borkmann wrote:
> On 09/11/2015 02:21 AM, Tycho Andersen wrote:
> >This is the final bit needed to support seccomp filters created via the bpf
> >syscall. The patch adds a new seccomp operation SECCOMP_MODE_FILTER_EBPF,
> >which takes exactly one command (presumably to be expanded upon later when
> >seccomp EBPFs support more interesting things) and an argument struct
> >similar to that of bpf(), although the size is explicit in the struct to
> >avoid changing the signature of seccomp().
> >
> >v2: Don't abuse seccomp's third argument; use a separate command and a
> >     pointer to a structure instead.
> 
> Comments below ...
> 
> >Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> >CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> >CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> >CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> >CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> >CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> >CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
> >---
> >  include/uapi/linux/seccomp.h |  16 +++++
> >  kernel/seccomp.c             | 135 ++++++++++++++++++++++++++++++++++++++-----
> >  2 files changed, 138 insertions(+), 13 deletions(-)
> >
> >diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> >index 0f238a4..a8694e2 100644
> >--- a/include/uapi/linux/seccomp.h
> >+++ b/include/uapi/linux/seccomp.h
> >@@ -13,10 +13,14 @@
> >  /* Valid operations for seccomp syscall. */
> >  #define SECCOMP_SET_MODE_STRICT	0
> >  #define SECCOMP_SET_MODE_FILTER	1
> >+#define SECCOMP_MODE_FILTER_EBPF	2
> 
> Should this be SECCOMP_SET_MODE_FILTER_EBPF or just SECCOMP_SET_MODE_EBPF?
I just stole the name Kees gave it in the previous thread, but I think
that perhaps there are other plans for manipulating seccomp ebpfs (?).
The command is SECCOMP_EBPF_ADD_FD, so it seems like we could add a
command like SECCOMP_EBPF_SOMETHING in the future.
> >  /* Valid flags for SECCOMP_SET_MODE_FILTER */
> >  #define SECCOMP_FILTER_FLAG_TSYNC	1
> >
> >+/* Valid cmds for SECCOMP_MODE_FILTER_EBPF */
> >+#define SECCOMP_EBPF_ADD_FD	0
> >+
> >  /*
> >   * All BPF programs must return a 32-bit value.
> >   * The bottom 16-bits are for optional return data.
> >@@ -51,4 +55,16 @@ struct seccomp_data {
> >  	__u64 args[6];
> >  };
> >
> >+struct seccomp_ebpf {
> >+	unsigned int size;
> >+
> >+	union {
> >+		/* SECCOMP_EBPF_ADD_FD */
> >+		struct {
> >+			unsigned int	add_flags;
> >+			__u32		add_fd;
> >+		};
> >+	};
> >+};
> >+
> >  #endif /* _UAPI_LINUX_SECCOMP_H */
> >diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> >index 1856f69..e78175a 100644
> >--- a/kernel/seccomp.c
> >+++ b/kernel/seccomp.c
> >@@ -65,6 +65,9 @@ struct seccomp_filter {
> >  /* Limit any path through the tree to 256KB worth of instructions. */
> >  #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
> >
> >+static long seccomp_install_filter(unsigned int flags,
> >+				   struct seccomp_filter *prepared);
> >+
> >  /*
> >   * Endianness is explicitly ignored and left for BPF program authors to manage
> >   * as per the specific architecture.
> >@@ -356,17 +359,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
> >
> >  	BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
> >
> >-	/*
> >-	 * Installing a seccomp filter requires that the task has
> >-	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> >-	 * This avoids scenarios where unprivileged tasks can affect the
> >-	 * behavior of privileged children.
> >-	 */
> >-	if (!task_no_new_privs(current) &&
> >-	    security_capable_noaudit(current_cred(), current_user_ns(),
> >-				     CAP_SYS_ADMIN) != 0)
> >-		return ERR_PTR(-EACCES);
> >-
> >  	/* Allocate a new seccomp_filter */
> >  	sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
> >  	if (!sfilter)
> >@@ -510,8 +502,105 @@ static void seccomp_send_sigsys(int syscall, int reason)
> >  	info.si_syscall = syscall;
> >  	force_sig_info(SIGSYS, &info, current);
> >  }
> >+
> >  #endif	/* CONFIG_SECCOMP_FILTER */
> >
> >+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_SECCOMP_FILTER)
> >+static struct seccomp_filter *seccomp_prepare_ebpf(int fd)
> >+{
> >+	struct seccomp_filter *ret;
> >+	struct bpf_prog *prog;
> >+
> >+	prog = bpf_prog_get(fd);
> >+	if (IS_ERR(prog))
> >+		return (struct seccomp_filter *) prog;
> 
> ERR_CAST()
> 
> >+
> >+	if (prog->type != BPF_PROG_TYPE_SECCOMP) {
> >+		bpf_prog_put(prog);
> >+		return ERR_PTR(-EINVAL);
> >+	}
> >+
> >+	ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
> >+	if (!ret) {
> >+		bpf_prog_put(prog);
> >+		return ERR_PTR(-ENOMEM);
> >+	}
> >+
> >+	ret->prog = prog;
> >+	atomic_set(&ret->usage, 1);
> >+
> >+	/* Intentionally don't bpf_prog_put() here, because the underlying prog
> >+	 * is refcounted too and we're holding a reference from the struct
> >+	 * seccomp_filter object.
> >+	 */
> >+	return ret;
> >+}
> >+
> >+static long seccomp_ebpf_add_fd(struct seccomp_ebpf *ebpf)
> >+{
> >+	struct seccomp_filter *prepared;
> >+
> >+	prepared = seccomp_prepare_ebpf(ebpf->add_fd);
> >+	if (IS_ERR(prepared))
> >+		return PTR_ERR(prepared);
> >+
> >+	return seccomp_install_filter(ebpf->add_flags, prepared);
> >+}
> >+
> >+static long seccomp_mode_filter_ebpf(unsigned int cmd, const char __user *uargs)
> >+{
> >+	const struct seccomp_ebpf __user *uebpf;
> >+	struct seccomp_ebpf ebpf;
> >+	unsigned int size;
> >+	long ret = -EFAULT;
> >+
> >+	uebpf = (const struct seccomp_ebpf __user *) uargs;
> >+
> >+	if (get_user(size, &uebpf->size) != 0)
> >+		return -EFAULT;
> >+
> >+	/* If we're handed a bigger struct than we know of,
> >+	 * ensure all the unknown bits are 0 - i.e. new
> >+	 * user-space does not rely on any kernel feature
> >+	 * extensions we dont know about yet.
> >+	 */
> >+	if (size > sizeof(ebpf)) {
> >+		unsigned char __user *addr;
> >+		unsigned char __user *end;
> >+		unsigned char val;
> >+
> >+		addr = (void __user *)uebpf + sizeof(ebpf);
> >+		end  = (void __user *)uebpf + size;
> >+
> >+		for (; addr < end; addr++) {
> >+			int err = get_user(val, addr);
> >+
> >+			if (err)
> >+				return err;
> >+			if (val)
> >+				return -E2BIG;
> >+		}
> >+		size = sizeof(ebpf);
> >+	}
> >+
> >+	if (copy_from_user(&ebpf, uebpf, size) != 0)
> >+		return -EFAULT;
> 
> Not sure it's worth adding all this bpf(2)-alike interface complexity into
> this, but fair enough, I guess there are some very good reasons and bigger
> additions coming then ...
I'm not sure what bigger additions are coming, although it seems Andy
might have something. I think this is just an attempt to future proof
things.
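For illustration, a minimal userspace sketch of that future-proofing
check. The struct and function names (known_args, copy_extensible_args)
are made up for the example and a plain buffer stands in for user memory;
this is not the patch's code. It also zero-fills fields that an older,
shorter caller did not supply, which is an assumption of the sketch rather
than something the patch does.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the argument struct as this "kernel" knows it. */
struct known_args {
	unsigned int size;
	unsigned int add_flags;
	unsigned int add_fd;
};

/*
 * Copy a caller-supplied, size-prefixed struct into *out.
 * buf/buf_len model the caller's memory.
 * Returns 0 on success, -E2BIG if bytes beyond the known layout are
 * non-zero, -EINVAL if the buffer cannot hold the size it claims.
 */
static int copy_extensible_args(const unsigned char *buf, size_t buf_len,
				struct known_args *out)
{
	unsigned int size;
	size_t i;

	if (buf_len < sizeof(size))
		return -EINVAL;
	memcpy(&size, buf, sizeof(size));
	if (size < sizeof(size) || size > buf_len)
		return -EINVAL;

	/* Newer caller: everything past what we understand must be zero. */
	for (i = sizeof(*out); i < size; i++)
		if (buf[i] != 0)
			return -E2BIG;

	/* Older caller: fields it did not supply default to zero. */
	memset(out, 0, sizeof(*out));
	memcpy(out, buf, size < sizeof(*out) ? size : sizeof(*out));
	return 0;
}
```

The point of the trailing-zero rule is that a newer userspace can pass a
larger struct to an older kernel as long as it does not actually use the
fields the kernel has never heard of.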
> >+	switch (cmd) {
> >+	case SECCOMP_EBPF_ADD_FD:
> >+		ret = seccomp_ebpf_add_fd(&ebpf);
> >+		break;
> >+	}
> >+
> >+	return ret;
> >+}
> >+#else
> >+static long seccomp_mode_filter_ebpf(unsigned int cmd, const char __user *uargs)
> >+{
> >+	return -EINVAL;
> >+}
> >+#endif
> >+
> >  /*
> >   * Secure computing mode 1 allows only read/write/exit/sigreturn.
> >   * To be fully secure this must be combined with rlimit
> >@@ -760,9 +849,7 @@ out:
> >  static long seccomp_set_mode_filter(unsigned int flags,
> >  				    const char __user *filter)
> >  {
> >-	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
> >  	struct seccomp_filter *prepared = NULL;
> >-	long ret = -EINVAL;
> >
> >  	/* Validate flags. */
> >  	if (flags & ~SECCOMP_FILTER_FLAG_MASK)
> >@@ -773,6 +860,26 @@ static long seccomp_set_mode_filter(unsigned int flags,
> >  	if (IS_ERR(prepared))
> >  		return PTR_ERR(prepared);
> >
> >+	return seccomp_install_filter(flags, prepared);
> 
> I (truly) hope, I'm overseeing something ;) ...
> 
> ... but why doing all the (classic) seccomp-BPF preparation work (which is rather
> a lot) up to this point, where you have it ready, only to *then* find out we don't
> have the actual permissions ?!
Yes, this seems dumb. I was trying to avoid having the check in two
places, but that's probably what's necessary.
> Plus, when seccomp_install_filter() fails with -EACCES, who is releasing all the
> allocated foo resp. dropping taken program refs !?
Yes, seccomp_install_filter is /supposed/ to free things if the
install fails, although it doesn't in the permissions case because
of a copy-paste error, doh.
> I see the same in seccomp_ebpf_add_fd().
Same as above, seccomp_install_filter is supposed to call
seccomp_filter_free in case of an error, but it doesn't.
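A rough userspace sketch of the intended behaviour, using mock types only
(mock_prog, mock_filter, may_install and add_filter are made-up names, not
kernel API): the permission check runs before anything is allocated, and a
failed install drops the program reference it took.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Mock objects; field and function names are illustrative only. */
struct mock_prog { int refcnt; };
struct mock_filter { struct mock_prog *prog; };

static int may_install(int no_new_privs, int has_cap_sys_admin)
{
	return no_new_privs || has_cap_sys_admin;
}

static void mock_filter_free(struct mock_filter *f)
{
	f->prog->refcnt--;	/* drop the reference the filter held */
	free(f);
}

/*
 * fail_install simulates an error after the filter has been prepared.
 * On success *installed owns the filter and its program reference.
 */
static int add_filter(struct mock_prog *prog, int no_new_privs,
		      int has_cap_sys_admin, int fail_install,
		      struct mock_filter **installed)
{
	struct mock_filter *f;

	if (!may_install(no_new_privs, has_cap_sys_admin))
		return -EACCES;	/* nothing allocated, nothing to undo */

	f = calloc(1, sizeof(*f));
	if (!f)
		return -ENOMEM;
	f->prog = prog;
	prog->refcnt++;

	if (fail_install) {
		mock_filter_free(f);	/* undo the reference on failure */
		return -EINVAL;
	}

	*installed = f;
	return 0;
}
```

With this ordering an unprivileged caller never touches the program's
reference count at all, and a privileged caller that hits a later error
leaves the count where it found it.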
Thanks for the look. I'll make the changes for the next set.
Tycho
> So, an unprivileged child could increase the parent's bpf_prog's reference count
> w/o having the actual permissions to do so, and thus controlling it to the point
> where the next bpf_prog_put() would unintentionally release it?
> 
> (So yeah, I'm hoping I misread something ... ;))
> 
> >+}
> >+
> >+static long seccomp_install_filter(unsigned int flags,
> >+				   struct seccomp_filter *prepared)
> >+{
> >+	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
> >+	long ret = -EINVAL;
> >+
> >+	/*
> >+	 * Installing a seccomp filter requires that the task has
> >+	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> >+	 * This avoids scenarios where unprivileged tasks can affect the
> >+	 * behavior of privileged children.
> >+	 */
> >+	if (!task_no_new_privs(current) &&
> >+	    security_capable_noaudit(current_cred(), current_user_ns(),
> >+				     CAP_SYS_ADMIN) != 0)
> >+		return -EACCES;
> >+
> >  	/*
> >  	 * Make sure we cannot change seccomp or nnp state via TSYNC
> >  	 * while another thread is in the middle of calling exec.
> >@@ -875,6 +982,8 @@ static long do_seccomp(unsigned int op, unsigned int flags,
> >  		return seccomp_set_mode_strict();
> >  	case SECCOMP_SET_MODE_FILTER:
> >  		return seccomp_set_mode_filter(flags, uargs);
> >+	case SECCOMP_MODE_FILTER_EBPF:
> >+		return seccomp_mode_filter_ebpf(flags, uargs);
> >  	default:
> >  		return -EINVAL;
> >  	}
> >
> 
* Re: [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well
       [not found]       ` <55F2D0EC.9090004-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
@ 2015-09-11 14:44         ` Tycho Andersen
  2015-09-11 16:03           ` Daniel Borkmann
  0 siblings, 1 reply; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11 14:44 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
On Fri, Sep 11, 2015 at 03:02:36PM +0200, Daniel Borkmann wrote:
> On 09/11/2015 02:20 AM, Tycho Andersen wrote:
> >In the next patch, we're going to add a way to access the underlying
> >filters via bpf fds. This means that we need to ref-count both the
> >struct seccomp_filter objects and the struct bpf_prog objects separately,
> >in case a process dies but a filter is still referred to by another
> >process.
> >
> >Additionally, we mark classic converted seccomp filters as seccomp eBPF
> >programs, since they are a subset of what is supported in seccomp eBPF.
> >
> >Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> >CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> >CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> >CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> >CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> >CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> >CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
> >---
> >  kernel/seccomp.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> >diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> >index 245df6b..afaeddf 100644
> >--- a/kernel/seccomp.c
> >+++ b/kernel/seccomp.c
> >@@ -378,6 +378,8 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
> >  	}
> >
> >  	atomic_set(&sfilter->usage, 1);
> >+	atomic_set(&sfilter->prog->aux->refcnt, 1);
> >+	sfilter->prog->type = BPF_PROG_TYPE_SECCOMP;
> 
> So, if you do this, then this breaks the assumption of eBPF JITs
> that, currently, all classic converted BPF programs always have a
> prog->type of BPF_PROG_TYPE_UNSPEC (see: bpf_prog_was_classic()).
> 
> Currently, JITs make use of this information to determine whether
> A and X mappings for such programs should or should not be cleared
> in the prologue (s390 currently).
> 
> In the seccomp_prepare_filter() stage, we're already past that, so
> it will not cause an issue, but we certainly would need to be very
> careful in future, if bpf_prog_was_classic() is then used at a later
> stage when we already have a generated bpf_prog somewhere, as then
> this assumption will break.
The only reason we need to do this is to allow BPF_DUMP_PROG to work,
since we were restricting it to only allow dumping of seccomp
programs, since those don't have maps. Instead, perhaps we could allow
dumping of BPF_PROG_TYPE_SECCOMP and BPF_PROG_TYPE_UNSPEC?
Tycho
* Re: [PATCH v2 3/5] ebpf: add a way to dump an eBPF program
  2015-09-11 13:39       ` Daniel Borkmann
@ 2015-09-11 14:44         ` Tycho Andersen
  0 siblings, 0 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11 14:44 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel, netdev, linux-api
On Fri, Sep 11, 2015 at 03:39:14PM +0200, Daniel Borkmann wrote:
> On 09/11/2015 02:21 AM, Tycho Andersen wrote:
> >This commit adds a way to dump eBPF programs. The initial implementation
> >doesn't support maps, and therefore only allows dumping seccomp ebpf
> >programs which themselves don't currently support maps.
> >
> >v2: don't export a prog_id for the filter
> >
> >Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> >CC: Kees Cook <keescook@chromium.org>
> >CC: Will Drewry <wad@chromium.org>
> >CC: Oleg Nesterov <oleg@redhat.com>
> >CC: Andy Lutomirski <luto@amacapital.net>
> >CC: Pavel Emelyanov <xemul@parallels.com>
> >CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> >CC: Alexei Starovoitov <ast@kernel.org>
> >CC: Daniel Borkmann <daniel@iogearbox.net>
> [...]
> >diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >index dc9b464..58ae9f4 100644
> >--- a/kernel/bpf/syscall.c
> >+++ b/kernel/bpf/syscall.c
> >@@ -586,6 +586,44 @@ free_prog:
> >  	return err;
> >  }
> >
> >+static int bpf_prog_dump(union bpf_attr *attr, union bpf_attr __user *uattr)
> >+{
> >+	int ufd = attr->prog_fd;
> >+	struct fd f = fdget(ufd);
> >+	struct bpf_prog *prog;
> >+	int ret = -EINVAL;
> >+
> >+	prog = get_prog(f);
> >+	if (IS_ERR(prog))
> >+		return PTR_ERR(prog);
> >+
> >+	/* For now, let's refuse to dump anything that isn't a seccomp program.
> >+	 * Other program types have support for maps, which our current dump
> >+	 * code doesn't support.
> >+	 */
> >+	if (prog->type != BPF_PROG_TYPE_SECCOMP)
> >+		goto out;
> 
> Yep, also when you start adding helper calls (next to map objects) you'd
> need to undo kernel pointers that the verifier sets here.
Good point, I'll add that to the comment as well.
Tycho
* Re: [PATCH v2 3/5] ebpf: add a way to dump an eBPF program
       [not found]         ` <20150911022940.GA4903-2RGepAHry06MXrjNfwE7T/6muRTtt8+awzqs5ZKRSiY@public.gmane.org>
@ 2015-09-11 14:59           ` Tycho Andersen
  0 siblings, 0 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11 14:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
On Thu, Sep 10, 2015 at 07:29:42PM -0700, Alexei Starovoitov wrote:
> On Thu, Sep 10, 2015 at 06:21:00PM -0600, Tycho Andersen wrote:
> > +static int bpf_prog_dump(union bpf_attr *attr, union bpf_attr __user *uattr)
> > +{
> > +	int ufd = attr->prog_fd;
> > +	struct fd f = fdget(ufd);
> > +	struct bpf_prog *prog;
> > +	int ret = -EINVAL;
> > +
> > +	prog = get_prog(f);
> > +	if (IS_ERR(prog))
> > +		return PTR_ERR(prog);
> > +
> > +	/* For now, let's refuse to dump anything that isn't a seccomp program.
> > +	 * Other program types have support for maps, which our current dump
> > +	 * code doesn't support.
> > +	 */
> > +	if (prog->type != BPF_PROG_TYPE_SECCOMP)
> > +		goto out;
> > +
> > +	ret = -EFAULT;
> > +	if (put_user(prog->len, &uattr->dump_insn_cnt))
> > +		goto out;
> > +
> > +	if (put_user((u8) prog->gpl_compatible, &uattr->gpl_compatible))
> > +		goto out;
> > +
> > +	if (attr->dump_insns) {
> > +		u32 len = prog->len * sizeof(struct bpf_insn);
> > +
> > +		if (copy_to_user(u64_to_ptr(attr->dump_insns),
> > +				 prog->insns, len) != 0)
> > +			goto out;
> > +	}
> > +
> > +	ret = 0;
> > +out:
> > +	return ret;
> 
> fdput() is missing in all error paths.
So it is, thanks!
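A small userspace sketch of the single-exit shape that would cover every
path. mock_fdget() and mock_fdput() are stand-ins invented for the example
(they only count outstanding references); the real fdget()/fdput() and the
dump logic are not reproduced here.

```c
#include <errno.h>

/* Counts references handed out by mock_fdget() and not yet put back. */
static int outstanding_refs;

struct mock_fd { int valid; int is_seccomp; };

static struct mock_fd mock_fdget(int valid, int is_seccomp)
{
	struct mock_fd f = { valid, is_seccomp };

	if (valid)
		outstanding_refs++;
	return f;
}

static void mock_fdput(struct mock_fd f)
{
	if (f.valid)
		outstanding_refs--;
}

/* copy_fails simulates a failing copy to the caller's buffer. */
static int dump_prog(int valid, int is_seccomp, int copy_fails)
{
	struct mock_fd f = mock_fdget(valid, is_seccomp);
	int ret;

	if (!f.valid)
		return -EBADF;	/* no reference was taken */

	ret = -EINVAL;
	if (!f.is_seccomp)
		goto out;

	ret = -EFAULT;
	if (copy_fails)
		goto out;

	ret = 0;
out:
	mock_fdput(f);	/* runs on success and on every failure above */
	return ret;
}
```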
Tycho
* Re: [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well
  2015-09-11 14:44         ` Tycho Andersen
@ 2015-09-11 16:03           ` Daniel Borkmann
       [not found]             ` <55F2FB6F.7050708-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2015-09-11 16:03 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
On 09/11/2015 04:44 PM, Tycho Andersen wrote:
> On Fri, Sep 11, 2015 at 03:02:36PM +0200, Daniel Borkmann wrote:
>> On 09/11/2015 02:20 AM, Tycho Andersen wrote:
>>> In the next patch, we're going to add a way to access the underlying
>>> filters via bpf fds. This means that we need to ref-count both the
>>> struct seccomp_filter objects and the struct bpf_prog objects separately,
>>> in case a process dies but a filter is still referred to by another
>>> process.
>>>
>>> Additionally, we mark classic converted seccomp filters as seccomp eBPF
>>> programs, since they are a subset of what is supported in seccomp eBPF.
>>>
>>> Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
>>> CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>>> CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>>> CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
>>> CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
>>> CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
>>> CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>> CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
>>> ---
>>>   kernel/seccomp.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>>> index 245df6b..afaeddf 100644
>>> --- a/kernel/seccomp.c
>>> +++ b/kernel/seccomp.c
>>> @@ -378,6 +378,8 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>>>   	}
>>>
>>>   	atomic_set(&sfilter->usage, 1);
>>> +	atomic_set(&sfilter->prog->aux->refcnt, 1);
>>> +	sfilter->prog->type = BPF_PROG_TYPE_SECCOMP;
>>
>> So, if you do this, then this breaks the assumption of eBPF JITs
>> that, currently, all classic converted BPF programs always have a
>> prog->type of BPF_PROG_TYPE_UNSPEC (see: bpf_prog_was_classic()).
>>
>> Currently, JITs make use of this information to determine whether
>> A and X mappings for such programs should or should not be cleared
>> in the prologue (s390 currently).
>>
>> In the seccomp_prepare_filter() stage, we're already past that, so
>> it will not cause an issue, but we certainly would need to be very
>> careful in future, if bpf_prog_was_classic() is then used at a later
>> stage when we already have a generated bpf_prog somewhere, as then
>> this assumption will break.
>
> The only reason we need to do this is to allow BPF_DUMP_PROG to work,
> since we were restricting it to only allow dumping of seccomp
> programs, since those don't have maps. Instead, perhaps we could allow
> dumping of BPF_PROG_TYPE_SECCOMP and BPF_PROG_TYPE_UNSPEC?
BPF_PROG_TYPE_UNSPEC programs may already be calling helpers today,
at least in the networking case (though not in seccomp). So, since
you want to export [classic -> eBPF] only for seccomp, put fds on them,
and dump these via bpf(2), you could allow that (with a big comment
stating why it's safe), but mid-term we really need to sanitize all
this stuff properly, as this is needed for other types, too.
* Re: [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds
  2015-09-11  0:21 ` [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds Tycho Andersen
  2015-09-11 11:47   ` Daniel Borkmann
       [not found]   ` <1441930862-14347-5-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
@ 2015-09-11 16:20   ` Andy Lutomirski
  2015-09-11 16:44     ` Tycho Andersen
  2 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-09-11 16:20 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Pavel Emelyanov, Kees Cook, linux-kernel@vger.kernel.org,
	Serge E. Hallyn, Oleg Nesterov, David S. Miller,
	Alexei Starovoitov, Will Drewry, Network Development,
	Daniel Borkmann, Linux API
On Sep 10, 2015 5:22 PM, "Tycho Andersen" <tycho.andersen@canonical.com> wrote:
>
> This patch adds a way for a process that is "real root" to access the
> seccomp filters of another process. The process first does a
> PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> bpf(BPF_PROG_DUMP) to dump the actual program at each step.
>
> +
> +       fd = bpf_new_fd(filter->prog, O_RDONLY);
> +       if (fd > 0)
> +               atomic_inc(&filter->prog->aux->refcnt);
Why isn't this folded into bpf_new_fd?
> +
> +       return fd;
> +}
> +
> +long seccomp_next_filter(struct task_struct *child, u32 fd)
> +{
> +       struct seccomp_filter *cur;
> +       struct bpf_prog *prog;
> +       long ret = -ESRCH;
> +
> +       if (!capable(CAP_SYS_ADMIN))
> +               return -EACCES;
> +
> +       if (child->seccomp.mode != SECCOMP_MODE_FILTER)
> +               return -EINVAL;
> +
> +       prog = bpf_prog_get(fd);
> +       if (IS_ERR(prog)) {
> +               ret = PTR_ERR(prog);
> +               goto out;
> +       }
> +
> +       for (cur = child->seccomp.filter; cur; cur = cur->prev) {
> +               if (cur->prog == prog) {
> +                       if (!cur->prev)
> +                               ret = -ENOENT;
> +                       else
> +                               ret = bpf_prog_set(fd, cur->prev->prog);
This lets you take an fd pointing to one prog and point it elsewhere.
I'm not sure that's a good idea.
--Andy
* Re: v2 of seccomp filter c/r patches
       [not found] ` <1441930862-14347-1-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-09-11  2:50   ` v2 of seccomp filter c/r patches Alexei Starovoitov
@ 2015-09-11 16:30   ` Andy Lutomirski
       [not found]     ` <CALCETrVYtv1=g-xPjQ-LiX+5GK3xtB6a2hYbat0TuU-Bd4QA6Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  4 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-09-11 16:30 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Pavel Emelyanov, Kees Cook,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Serge E. Hallyn, Oleg Nesterov, David S. Miller,
	Alexei Starovoitov, Will Drewry, Network Development,
	Daniel Borkmann, Linux API
On Sep 10, 2015 5:22 PM, "Tycho Andersen" <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>
> Hi all,
>
> Here is v2 of the seccomp filter c/r set. The patch notes have individual
> changes from the last series, but there are two points not noted:
>
> * The series still does not allow us to correctly restore state for programs
>   that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we want to
>   keep seccomp_filter's identity, I think something along the lines of another
>   seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not sure
>   if this can even be done yet). In addition, we'll need a kcmp command for
>   figuring out if filters are the same, although this too needs to compare
>   seccomp_filter objects, so it's a little screwy. Any thoughts on how to do
>   this nicely are welcome.
Let's add a concept of a seccompfd.
For background of what I want to add: I want to be able to create a
seccomp monitor.  A seccomp monitor will be, logically, a pair of a
struct file that represents the monitor and a seccomp_filter that is
controlled by the monitor.  Depending on flags, whoever holds the
monitor fd could change the active filter, intercept syscalls, and
issue syscalls on behalf of a process that is trapped in an
intercepted syscall.
Seccomp filters would nest properly.
The interface would probably be (extremely pseudocoded):
monitor_fd, filter_fd = seccomp(CREATE_MONITOR, flags, ...);
Then, later:
seccomp(ATTACH_TO_FILTER, filter_fd);  /* now filtered */
read(monitor_fd, buf, size); /* returns an intercepted syscall */
write(monitor_fd, buf, size); /* issues a syscall or releases the
trapped task */
This can't be implemented on x86 without either going insane or
finishing the massive set of pending cleanups to the x86 entry code.
I favor the latter.
We could, however, add part of it right now: we could have a way to
create a filterfd, we could add kcmp support for it, and we could add
the ATTACH_TO_FILTER thing.  I think that would solve your problem.
One major open question: does a filter_fd know what its parent is and,
if so, will it just refuse to attach if the caller's parent is wrong?
Or will a filter_fd attach anywhere?
--Andy
* Re: [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds
  2015-09-11 16:20   ` Andy Lutomirski
@ 2015-09-11 16:44     ` Tycho Andersen
  2015-09-14 17:52       ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11 16:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Pavel Emelyanov, Kees Cook, linux-kernel@vger.kernel.org,
	Serge E. Hallyn, Oleg Nesterov, David S. Miller,
	Alexei Starovoitov, Will Drewry, Network Development,
	Daniel Borkmann, Linux API
On Fri, Sep 11, 2015 at 09:20:55AM -0700, Andy Lutomirski wrote:
> On Sep 10, 2015 5:22 PM, "Tycho Andersen" <tycho.andersen@canonical.com> wrote:
> >
> > This patch adds a way for a process that is "real root" to access the
> > seccomp filters of another process. The process first does a
> > PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> > attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> > bpf(BPF_PROG_DUMP) to dump the actual program at each step.
> >
> 
> > +
> > +       fd = bpf_new_fd(filter->prog, O_RDONLY);
> > +       if (fd > 0)
> > +               atomic_inc(&filter->prog->aux->refcnt);
> 
> Why isn't this folded into bpf_new_fd?
No reason it can't be as far as I can see. I'll make the change for
the next version.
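Roughly what the folded version could look like, as a userspace mock
(mock_prog, mock_alloc_fd and mock_prog_new_fd are invented names, not the
kernel helpers). The sketch also tests fd >= 0 rather than fd > 0, since
descriptor 0 is a valid return value when it happens to be free.

```c
#include <errno.h>

struct mock_prog { int refcnt; };

/* Pretend fd allocator: returns next_fd, or -EMFILE when told to fail. */
static int mock_alloc_fd(int next_fd, int fail)
{
	return fail ? -EMFILE : next_fd;
}

/*
 * The helper takes the program reference itself, so callers cannot
 * forget it and cannot get the success test wrong.
 */
static int mock_prog_new_fd(struct mock_prog *prog, int next_fd, int fail)
{
	int fd = mock_alloc_fd(next_fd, fail);

	if (fd >= 0)
		prog->refcnt++;
	return fd;
}
```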
> > +
> > +       return fd;
> > +}
> > +
> > +long seccomp_next_filter(struct task_struct *child, u32 fd)
> > +{
> > +       struct seccomp_filter *cur;
> > +       struct bpf_prog *prog;
> > +       long ret = -ESRCH;
> > +
> > +       if (!capable(CAP_SYS_ADMIN))
> > +               return -EACCES;
> > +
> > +       if (child->seccomp.mode != SECCOMP_MODE_FILTER)
> > +               return -EINVAL;
> > +
> > +       prog = bpf_prog_get(fd);
> > +       if (IS_ERR(prog)) {
> > +               ret = PTR_ERR(prog);
> > +               goto out;
> > +       }
> > +
> > +       for (cur = child->seccomp.filter; cur; cur = cur->prev) {
> > +               if (cur->prog == prog) {
> > +                       if (!cur->prev)
> > +                               ret = -ENOENT;
> > +                       else
> > +                               ret = bpf_prog_set(fd, cur->prev->prog);
> 
> This lets you take an fd pointing to one prog and point it elsewhere.
> I'm not sure that's a good idea.
That's how the interface was designed (calling ptrace(NEXT_FILTER, fd) and
then doing bpf(DUMP, fd)). I suppose we could have NEXT_FILTER return
a new fd instead if that seems better to you.
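A sketch of the lookup half of that alternative, in userspace with mock
types (mock_prog, mock_filter and next_older_prog are made-up names). It
only finds the next older program; the caller would then wrap the result
in a fresh fd, and the fd that was passed in keeps pointing where it
always did. Unlike the patch, the sketch does not distinguish "oldest
filter" from "program not in this chain"; both return NULL.

```c
#include <stddef.h>

struct mock_prog { int id; };

struct mock_filter {
	struct mock_prog *prog;
	struct mock_filter *prev;	/* older filter, NULL at the root */
};

/*
 * Walk the chain from the newest filter towards the root, find the
 * filter running cur_prog, and return the program of the next older one.
 */
static struct mock_prog *next_older_prog(struct mock_filter *newest,
					 const struct mock_prog *cur_prog)
{
	struct mock_filter *cur;

	for (cur = newest; cur; cur = cur->prev)
		if (cur->prog == cur_prog)
			return cur->prev ? cur->prev->prog : NULL;
	return NULL;
}
```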
Tycho
* Re: v2 of seccomp filter c/r patches
       [not found]     ` <CALCETrVYtv1=g-xPjQ-LiX+5GK3xtB6a2hYbat0TuU-Bd4QA6Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-11 17:00       ` Andy Lutomirski
       [not found]         ` <CALCETrWxLMSgdsdT9gTL80LSovONmCcTYjzqrHqF-WdJ4BN1Uw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-09-11 17:00 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Pavel Emelyanov, Kees Cook,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Serge E. Hallyn, Oleg Nesterov, David S. Miller,
	Alexei Starovoitov, Will Drewry, Network Development,
	Daniel Borkmann, Linux API
On Fri, Sep 11, 2015 at 9:30 AM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> On Sep 10, 2015 5:22 PM, "Tycho Andersen" <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>>
>> Hi all,
>>
>> Here is v2 of the seccomp filter c/r set. The patch notes have individual
>> changes from the last series, but there are two points not noted:
>>
>> * The series still does not allow us to correctly restore state for programs
>>   that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we want to
>>   keep seccomp_filter's identity, I think something along the lines of another
>>   seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not sure
>>   if this can even be done yet). In addition, we'll need a kcmp command for
>>   figuring out if filters are the same, although this too needs to compare
>>   seccomp_filter objects, so it's a little screwy. Any thoughts on how to do
>>   this nicely are welcome.
>
> Let's add a concept of a seccompfd.
>
> For background of what I want to add: I want to be able to create a
> seccomp monitor.  A seccomp monitor will be, logically, a pair of a
> struct file that represents the monitor and a seccomp_filter that is
> controlled by the monitor.  Depending on flags, whoever holds the
> monitor fd could change the active filter, intercept syscalls, and
> issue syscalls on behalf of a process that is trapped in an
> intercepted syscall.
>
> Seccomp filters would nest properly.
>
> The interface would probably be (extremely pseudocoded):
>
> monitor_fd, filter_fd = seccomp(CREATE_MONITOR, flags, ...);
>
> Then, later:
>
> seccomp(ATTACH_TO_FILTER, filter_fd);  /* now filtered */
>
> read(monitor_fd, buf, size); /* returns an intercepted syscall */
> write(monitor_fd, buf, size); /* issues a syscall or releases the
> trapped task */
>
> This can't be implemented on x86 without either going insane or
> finishing the massive set of pending cleanups to the x86 entry code.
> I favor the latter.
>
> We could, however, add part of it right now: we could have a way to
> create a filterfd, we could add kcmp support for it, and we could add
> the ATTACH_TO_FILTER thing.  I think that would solve your problem.
>
> One major open question: does a filter_fd know what its parent is and,
> if so, will it just refuse to attach if the caller's parent is wrong?
> Or will a filter_fd attach anywhere.
>
Let me add one more thought:
Currently, struct seccomp_filter encodes a strict tree hierarchy: it
knows what its parent is.  This only matters as an implementation
detail and because TSYNC checks for seccomp_filter equality.
We could change this without user-visible effects.  We could say that,
for TSYNC purposes, two filter states match if they contain exactly
the same layers in the same order where a layer does *not* encode a
concept of parent.  We could then say that attaching a classic bpf
filter creates a brand new layer that is not equal to any other layer
that's been created.
This has no effect whatsoever.  The difference would be that we could
declare that attaching the same ebpf program twice creates the *same*
layer so that, if you fork and both children attach the same ebpf
program, then they match for TSYNC purposes.  Similarly, attaching the
same hypothetical filterfd would create the same layer.
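To make the comparison concrete, a userspace sketch under the assumption
that a layer is identified purely by the program object it runs
(mock_layer and filter_states_match are hypothetical names, nothing like
this exists in the tree):

```c
#include <stdbool.h>
#include <stddef.h>

/* A layer is identified by the program it runs; it has no parent pointer. */
struct mock_layer { const void *prog; };

/* Two filter states match when they hold the same layers in the same order. */
static bool filter_states_match(const struct mock_layer *a, size_t na,
				const struct mock_layer *b, size_t nb)
{
	size_t i;

	if (na != nb)
		return false;
	for (i = 0; i < na; i++)
		if (a[i].prog != b[i].prog)
			return false;
	return true;
}
```

Under that model two tasks that attach the same ebpf program on top of a
common base compare equal, while two classic attaches never do, because
each classic attach produces its own program object.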
Thoughts?
--Andy
* Re: v2 of seccomp filter c/r patches
       [not found]         ` <CALCETrWxLMSgdsdT9gTL80LSovONmCcTYjzqrHqF-WdJ4BN1Uw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-11 17:28           ` Tycho Andersen
  2015-09-14 17:52             ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11 17:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Pavel Emelyanov, Kees Cook,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Serge E. Hallyn, Oleg Nesterov, David S. Miller,
	Alexei Starovoitov, Will Drewry, Network Development,
	Daniel Borkmann, Linux API
On Fri, Sep 11, 2015 at 10:00:22AM -0700, Andy Lutomirski wrote:
> On Fri, Sep 11, 2015 at 9:30 AM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> > On Sep 10, 2015 5:22 PM, "Tycho Andersen" <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> >>
> >> Hi all,
> >>
> >> Here is v2 of the seccomp filter c/r set. The patch notes have individual
> >> changes from the last series, but there are two points not noted:
> >>
> >> * The series still does not allow us to correctly restore state for programs
> >>   that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we want to
> >>   keep seccomp_filter's identity, I think something along the lines of another
> >>   seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not sure
> >>   if this can even be done yet). In addition, we'll need a kcmp command for
> >>   figuring out if filters are the same, although this too needs to compare
> >>   seccomp_filter objects, so it's a little screwy. Any thoughts on how to do
> >>   this nicely are welcome.
> >
> > Let's add a concept of a seccompfd.
> >
> > For background of what I want to add: I want to be able to create a
> > seccomp monitor.  A seccomp monitor will be, logically, a pair of a
> > struct file that represents the monitor and a seccomp_filter that is
> > controlled by the monitor.  Depending on flags, whoever holds the
> > monitor fd could change the active filter, intercept syscalls, and
> > issue syscalls on behalf of a process that is trapped in an
> > intercepted syscall.
> >
> > Seccomp filters would nest properly.
> >
> > The interface would probably be (extremely pseudocoded):
> >
> > monitor_fd, filter_fd = seccomp(CREATE_MONITOR, flags, ...);
> >
> > Then, later:
> >
> > seccomp(ATTACH_TO_FILTER, filter_fd);  /* now filtered */
> >
> > read(monitor_fd, buf, size); /* returns an intercepted syscall */
> > write(monitor_fd, buf, size); /* issues a syscall or releases the
> > trapped task */
> >
> > This can't be implemented on x86 without either going insane or
> > finishing the massive set of pending cleanups to the x86 entry code.
> > I favor the latter.
> >
> > We could, however, add part of it right now: we could have a way to
> > create a filterfd, we could add kcmp support for it, and we could add
> > the ATTACH_TO_FILTER thing.  I think that would solve your problem.
> >
> > One major open question: does a filter_fd know what its parent is and,
> > if so, will it just refuse to attach if the caller's parent is wrong?
> > Or will a filter_fd attach anywhere?
> >
> 
> Let me add one more thought:
> 
> Currently, struct seccomp_filter encodes a strict tree hierarchy: it
> knows what its parent is.  This only matters as an implementation
> detail and because TSYNC checks for seccomp_filter equality.
> 
> We could change this without user-visible effects.  We could say that,
> for TSYNC purposes, two filter states match if they contain exactly
> the same layers in the same order where a layer does *not* encode a
> concept of parent.  We could then say that attaching a classic bpf
> filter creates a brand new layer that is not equal to any other layer
> that's been created.
> 
> This has no effect whatsoever.  The difference would be that we could
> declare that attaching the same ebpf program twice creates the *same*
> layer so that, if you fork and both children attach the same ebpf
> program, then they match for TSYNC purposes.
Would you keep struct seccomp_filter identity here (meaning that you'd
reach over and grab the seccomp_filter from a sibling thread if it
existed)? Would it only work for the last filter attached to siblings,
or for all the filters? This does make my life easier, but I like the
idea of just using seccompfd directly below as it seems somewhat
easier (for me at least) to understand.
> Similarly, attaching the
> same hypothetical filterfd would create the same layer.
If we change the api of my current set to have the ptrace commands
iterate over seccomp fds, it looks something like:

seccompfd = ptrace(GET_FILTER_FD, pid);
while (ptrace(NEXT_FD, pid, seccompfd) == 0) {
        if (seccomp(CHECK_INHERITED, seccompfd))
                break;

        bpffd = seccomp(GET_BPF_FD, seccompfd);
        err = bpf(BPF_PROG_DUMP, bpffd, &attr);
        /* save the bpf prog */
}

then restore can look like:

while (have_noninherited_filters()) {
        filter = load_filter();
        bpffd = bpf(BPF_PROG_LOAD, filter);
        seccompfd = seccomp(SECCOMP_FD_CREATE, bpffd);

        filters[n_filters++] = seccompfd;
}

/* fork any children as necessary and do the rest of the restore */

for (i = 0; i < n_filters; i++) {
        seccomp(SECCOMP_FD_INSTALL, filters[i]);
}

then the only question is how to implement the CHECK_INHERITED command
on dump.

If we support the above API, we don't need to think about the concept
of layers at all, or do any extra work on filter install to preserve
struct seccomp_filter identity, it just comes naturally.

Tycho
> Thoughts?
> 
> --Andy
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well
       [not found]             ` <55F2FB6F.7050708-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
@ 2015-09-11 17:33               ` Tycho Andersen
  2015-09-11 18:28                 ` Daniel Borkmann
  0 siblings, 1 reply; 40+ messages in thread
From: Tycho Andersen @ 2015-09-11 17:33 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
On Fri, Sep 11, 2015 at 06:03:59PM +0200, Daniel Borkmann wrote:
> On 09/11/2015 04:44 PM, Tycho Andersen wrote:
> >On Fri, Sep 11, 2015 at 03:02:36PM +0200, Daniel Borkmann wrote:
> >>On 09/11/2015 02:20 AM, Tycho Andersen wrote:
> >>>In the next patch, we're going to add a way to access the underlying
> >>>filters via bpf fds. This means that we need to ref-count both the
> >>>struct seccomp_filter objects and the struct bpf_prog objects separately,
> >>>in case a process dies but a filter is still referred to by another
> >>>process.
> >>>
> >>>Additionally, we mark classic converted seccomp filters as seccomp eBPF
> >>>programs, since they are a subset of what is supported in seccomp eBPF.
> >>>
> >>>Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> >>>CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> >>>CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> >>>CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >>>CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> >>>CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> >>>CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> >>>CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >>>CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
> >>>---
> >>>  kernel/seccomp.c | 4 +++-
> >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>>diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> >>>index 245df6b..afaeddf 100644
> >>>--- a/kernel/seccomp.c
> >>>+++ b/kernel/seccomp.c
> >>>@@ -378,6 +378,8 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
> >>>  	}
> >>>
> >>>  	atomic_set(&sfilter->usage, 1);
> >>>+	atomic_set(&sfilter->prog->aux->refcnt, 1);
> >>>+	sfilter->prog->type = BPF_PROG_TYPE_SECCOMP;
> >>
> >>So, if you do this, then this breaks the assumption of eBPF JITs
> >>that, currently, all classic converted BPF programs always have a
> >>prog->type of BPF_PROG_TYPE_UNSPEC (see: bpf_prog_was_classic()).
> >>
> >>Currently, JITs make use of this information to determine whether
> >>A and X mappings for such programs should or should not be cleared
> >>in the prologue (s390 currently).
> >>
> >>In the seccomp_prepare_filter() stage, we're already past that, so
> >>it will not cause an issue, but we certainly would need to be very
> >>careful in future, if bpf_prog_was_classic() is then used at a later
> >>stage when we already have a generated bpf_prog somewhere, as then
> >>this assumption will break.
> >
> >The only reason we need to do this is to allow BPF_DUMP_PROG to work,
> >since we were restricting it to only allow dumping of seccomp
> >programs, since those don't have maps. Instead, perhaps we could allow
> >dumping of BPF_PROG_TYPE_SECCOMP and BPF_PROG_TYPE_UNSPEC?
> 
> There are possibilities that BPF_PROG_TYPE_UNSPEC is calling helpers
> already today, at least in networking case, not seccomp. So, since
> you want to export [classic -> eBPF] only for seccomp, put fds on them
> and dump these via bpf(2), you could allow that (with a big comment
> stating why it's safe), but mid-term we really need to sanitize all
> this stuff properly as this is needed for other types, too.
Sorry, just to be clear, you're suggesting that the patch is ok modulo
a comment describing the jit issues?
Tycho
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well
  2015-09-11 17:33               ` Tycho Andersen
@ 2015-09-11 18:28                 ` Daniel Borkmann
  2015-09-14 16:00                   ` Tycho Andersen
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2015-09-11 18:28 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel, netdev, linux-api
On 09/11/2015 07:33 PM, Tycho Andersen wrote:
> On Fri, Sep 11, 2015 at 06:03:59PM +0200, Daniel Borkmann wrote:
>> On 09/11/2015 04:44 PM, Tycho Andersen wrote:
>>> On Fri, Sep 11, 2015 at 03:02:36PM +0200, Daniel Borkmann wrote:
>>>> On 09/11/2015 02:20 AM, Tycho Andersen wrote:
>>>>> In the next patch, we're going to add a way to access the underlying
>>>>> filters via bpf fds. This means that we need to ref-count both the
>>>>> struct seccomp_filter objects and the struct bpf_prog objects separately,
>>>>> in case a process dies but a filter is still referred to by another
>>>>> process.
>>>>>
>>>>> Additionally, we mark classic converted seccomp filters as seccomp eBPF
>>>>> programs, since they are a subset of what is supported in seccomp eBPF.
>>>>>
>>>>> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
>>>>> CC: Kees Cook <keescook@chromium.org>
>>>>> CC: Will Drewry <wad@chromium.org>
>>>>> CC: Oleg Nesterov <oleg@redhat.com>
>>>>> CC: Andy Lutomirski <luto@amacapital.net>
>>>>> CC: Pavel Emelyanov <xemul@parallels.com>
>>>>> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
>>>>> CC: Alexei Starovoitov <ast@kernel.org>
>>>>> CC: Daniel Borkmann <daniel@iogearbox.net>
>>>>> ---
>>>>>   kernel/seccomp.c | 4 +++-
>>>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>>>>> index 245df6b..afaeddf 100644
>>>>> --- a/kernel/seccomp.c
>>>>> +++ b/kernel/seccomp.c
>>>>> @@ -378,6 +378,8 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>>>>>   	}
>>>>>
>>>>>   	atomic_set(&sfilter->usage, 1);
>>>>> +	atomic_set(&sfilter->prog->aux->refcnt, 1);
>>>>> +	sfilter->prog->type = BPF_PROG_TYPE_SECCOMP;
>>>>
>>>> So, if you do this, then this breaks the assumption of eBPF JITs
>>>> that, currently, all classic converted BPF programs always have a
>>>> prog->type of BPF_PROG_TYPE_UNSPEC (see: bpf_prog_was_classic()).
>>>>
>>>> Currently, JITs make use of this information to determine whether
>>>> A and X mappings for such programs should or should not be cleared
>>>> in the prologue (s390 currently).
>>>>
>>>> In the seccomp_prepare_filter() stage, we're already past that, so
>>>> it will not cause an issue, but we certainly would need to be very
>>>> careful in future, if bpf_prog_was_classic() is then used at a later
>>>> stage when we already have a generated bpf_prog somewhere, as then
>>>> this assumption will break.
>>>
>>> The only reason we need to do this is to allow BPF_DUMP_PROG to work,
>>> since we were restricting it to only allow dumping of seccomp
>>> programs, since those don't have maps. Instead, perhaps we could allow
>>> dumping of BPF_PROG_TYPE_SECCOMP and BPF_PROG_TYPE_UNSPEC?
>>
>> There are possibilities that BPF_PROG_TYPE_UNSPEC is calling helpers
>> already today, at least in networking case, not seccomp. So, since
>> you want to export [classic -> eBPF] only for seccomp, put fds on them
>> and dump these via bpf(2), you could allow that (with a big comment
>> stating why it's safe), but mid-term we really need to sanitize all
>> this stuff properly as this is needed for other types, too.
>
> Sorry, just to be clear, you're suggesting that the patch is ok modulo
> a comment describing the jit issues?
I think due to the given insns restrictions on classic seccomp, this
could work for "most cases" (see below) for the time being until pointer
sanitation is resolved and that seccomp-only restriction from the dump
could be removed, BUT there's one more stone in the road which you still
need to take care of with this whole 'giving classic seccomp-BPF -> eBPF
transforms an fd, dumping and restoring that via bpf(2)' approach:

If you have JIT enabled on ARM32, and add a classic seccomp-BPF filter,
and dump that via your bpf(2) interface based on the current patches, what
you'll get is not eBPF opcodes but classic (!) BPF opcodes as ARM32 classic
JIT supports compilation of seccomp, since commit 24e737c1ebac ("ARM: net:
add JIT support for loads from struct seccomp_data.").

So in that case, bpf_prepare_filter() will not call into bpf_migrate_filter()
as there's simply no need for it, because the classic code could already
be JITed there. I guess other archs where JIT support for eBPF is not yet
within near sight might sooner or later support this insn for their classic
JITs, too ...
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well
  2015-09-11 18:28                 ` Daniel Borkmann
@ 2015-09-14 16:00                   ` Tycho Andersen
  2015-09-14 16:48                     ` Daniel Borkmann
  0 siblings, 1 reply; 40+ messages in thread
From: Tycho Andersen @ 2015-09-14 16:00 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel, netdev, linux-api
Hi Daniel,
On Fri, Sep 11, 2015 at 08:28:19PM +0200, Daniel Borkmann wrote:
> I think due to the given insns restrictions on classic seccomp, this
> could work for "most cases" (see below) for the time being until pointer
> sanitation is resolved and that seccomp-only restriction from the dump
> could be removed,
Ok, thanks.
> BUT there's one more stone in the road which you still
> need to take care of with this whole 'giving classic seccomp-BPF -> eBPF
> transforms an fd, dumping and restoring that via bpf(2)' approach:
> 
> If you have JIT enabled on ARM32, and add a classic seccomp-BPF filter,
> and dump that via your bpf(2) interface based on the current patches, what
> you'll get is not eBPF opcodes but classic (!) BPF opcodes as ARM32 classic
> JIT supports compilation of seccomp, since commit 24e737c1ebac ("ARM: net:
> add JIT support for loads from struct seccomp_data.").
> 
> So in that case, bpf_prepare_filter() will not call into bpf_migrate_filter()
> as there's simply no need for it, because the classic code could already
> be JITed there. I guess other archs where JIT support for eBPF is not yet
> within near sight might sooner or later support this insn for their classic
> JITs, too ...
Thanks for pointing this out.
What if we legislate that the output of bpf(BPF_PROG_DUMP, ...) is
always eBPF? As near as I can tell there is no way to determine if a
struct bpf_prog is classic or eBPF, so we'd need to add a bit to
indicate whether or not the prog has been converted so that
BPF_PROG_DUMP knows when to convert it.
Tycho
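
A minimal user-space sketch of the "extra bit" idea above, assuming a
simplified stand-in for struct bpf_prog. `struct prog_model`, the
`converted` flag and `dump_needs_conversion()` are hypothetical and only
illustrate the decision a dump path would have to make; `was_classic()`
mirrors the type check that bpf_prog_was_classic() is described as doing
earlier in the thread:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's program types. */
enum prog_type { PROG_TYPE_UNSPEC, PROG_TYPE_SECCOMP };

/* Simplified model of a program; `converted` is the proposed extra bit. */
struct prog_model {
	enum prog_type type;
	bool converted;	/* true once the stored instructions are eBPF */
};

/* Classic origin is inferred from the type being left unspecified. */
static bool was_classic(const struct prog_model *p)
{
	return p->type == PROG_TYPE_UNSPEC;
}

/*
 * If a dump must always emit eBPF, only classic programs whose
 * instructions were never migrated still need converting.
 */
static bool dump_needs_conversion(const struct prog_model *p)
{
	return was_classic(p) && !p->converted;
}
```

The type check alone cannot separate a classic program that was JITed
directly from one that was migrated to eBPF first, which is why the sketch
needs the second field.
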
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well
  2015-09-14 16:00                   ` Tycho Andersen
@ 2015-09-14 16:48                     ` Daniel Borkmann
       [not found]                       ` <55F6FA6B.1060108-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2015-09-14 16:48 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel, netdev, linux-api
On 09/14/2015 06:00 PM, Tycho Andersen wrote:
> On Fri, Sep 11, 2015 at 08:28:19PM +0200, Daniel Borkmann wrote:
>> I think due to the given insns restrictions on classic seccomp, this
>> could work for "most cases" (see below) for the time being until pointer
>> sanitation is resolved and that seccomp-only restriction from the dump
>> could be removed,
>
> Ok, thanks.
>
>> BUT there's one more stone in the road which you still
>> need to take care of with this whole 'giving classic seccomp-BPF -> eBPF
>> transforms an fd, dumping and restoring that via bpf(2)' approach:
>>
>> If you have JIT enabled on ARM32, and add a classic seccomp-BPF filter,
>> and dump that via your bpf(2) interface based on the current patches, what
>> you'll get is not eBPF opcodes but classic (!) BPF opcodes as ARM32 classic
>> JIT supports compilation of seccomp, since commit 24e737c1ebac ("ARM: net:
>> add JIT support for loads from struct seccomp_data.").
>>
>> So in that case, bpf_prepare_filter() will not call into bpf_migrate_filter()
>> as there's simply no need for it, because the classic code could already
>> be JITed there. I guess other archs where JIT support for eBPF is not yet
>> within near sight might sooner or later support this insn for their classic
>> JITs, too ...
>
> Thanks for pointing this out.
>
> What if we legislate that the output of bpf(BPF_PROG_DUMP, ...) is
> always eBPF? As near as I can tell there is no way to determine if a
> struct bpf_prog is classic or eBPF, so we'd need to add a bit to
> indicate whether or not the prog has been converted so that
> BPF_PROG_DUMP knows when to convert it.
As I said, you have bpf_prog_was_classic() function to determine exactly
this (so without your type re-assignment you have a way to distinguish it).
Wouldn't it be much easier to rip this set apart into multiple ones, solving
one individual thing at a time, f.e. starting out simple and 1) only add
native eBPF support to seccomp, after that 2) add a method to dump native-only
eBPF programs for criu, then 3) think about a right interface for classic
BPF seccomp dumping, etc, etc? Currently, it tries to solve everything at
once, and with some early assumptions that have non-trivial side-effects.
Thanks,
Daniel
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well
       [not found]                       ` <55F6FA6B.1060108-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
@ 2015-09-14 17:30                         ` Tycho Andersen
  0 siblings, 0 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-14 17:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Alexei Starovoitov, David S. Miller, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
On Mon, Sep 14, 2015 at 06:48:43PM +0200, Daniel Borkmann wrote:
> On 09/14/2015 06:00 PM, Tycho Andersen wrote:
> >On Fri, Sep 11, 2015 at 08:28:19PM +0200, Daniel Borkmann wrote:
> >>I think due to the given insns restrictions on classic seccomp, this
> >>could work for "most cases" (see below) for the time being until pointer
> >>sanitation is resolved and that seccomp-only restriction from the dump
> >>could be removed,
> >
> >Ok, thanks.
> >
> >>BUT there's one more stone in the road which you still
> >>need to take care of with this whole 'giving classic seccomp-BPF -> eBPF
> >>transforms an fd, dumping and restoring that via bpf(2)' approach:
> >>
> >>If you have JIT enabled on ARM32, and add a classic seccomp-BPF filter,
> >>and dump that via your bpf(2) interface based on the current patches, what
> >>you'll get is not eBPF opcodes but classic (!) BPF opcodes as ARM32 classic
> >>JIT supports compilation of seccomp, since commit 24e737c1ebac ("ARM: net:
> >>add JIT support for loads from struct seccomp_data.").
> >>
> >>So in that case, bpf_prepare_filter() will not call into bpf_migrate_filter()
> >>as there's simply no need for it, because the classic code could already
> >>be JITed there. I guess other archs where JIT support for eBPF is not yet
> >>within near sight might sooner or later support this insn for their classic
> >>JITs, too ...
> >
> >Thanks for pointing this out.
> >
> >What if we legislate that the output of bpf(BPF_PROG_DUMP, ...) is
> >always eBPF? As near as I can tell there is no way to determine if a
> >struct bpf_prog is classic or eBPF, so we'd need to add a bit to
> >indicate whether or not the prog has been converted so that
> >BPF_PROG_DUMP knows when to convert it.
> 
> As I said, you have bpf_prog_was_classic() function to determine exactly
> this (so without your type re-assignment you have a way to distinguish it).
I don't think this is the same thing, though. IIUC, when the classic
jit succeeds, bpf_prog_was_classic() will still return true even
though prog->insnsi points to classic instructions instead of eBPF
ones, and (I think) this situation is impossible to distinguish.
Anyway, it sounds like this doesn't matter, as we have...
> Wouldn't it be much easier to rip this set apart into multiple ones, solving
> one individual thing at a time, f.e. starting out simple and 1) only add
> native eBPF support to seccomp, after that 2) add a method to dump native-only
> eBPF programs for criu, then 3) think about a right interface for classic
> BPF seccomp dumping, etc, etc? Currently, it tries to solve everything at
> once, and with some early assumptions that have non-trivial side-effects.
The primary motivation for this set is your bullet 3, c/r of programs
with classic bpf programs (i.e. what seccomp supports now). Initially,
I thought it was best to try and dump the eBPFs directly, but it seems
there are a lot of complications I wasn't aware of. Perhaps I'll look
at a bpf_prog_store_orig_filter() style approach.
Thanks,
Tycho
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: v2 of seccomp filter c/r patches
  2015-09-11 17:28           ` Tycho Andersen
@ 2015-09-14 17:52             ` Andy Lutomirski
  2015-09-15 16:07               ` Tycho Andersen
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-09-14 17:52 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Pavel Emelyanov, Network Development,
	Alexei Starovoitov, David S. Miller, Oleg Nesterov,
	Serge E. Hallyn, Linux API, Will Drewry,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Daniel Borkmann
On Sep 11, 2015 10:28 AM, "Tycho Andersen" <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>
> On Fri, Sep 11, 2015 at 10:00:22AM -0700, Andy Lutomirski wrote:
> > On Fri, Sep 11, 2015 at 9:30 AM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> > > On Sep 10, 2015 5:22 PM, "Tycho Andersen" <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> Here is v2 of the seccomp filter c/r set. The patch notes have individual
> > >> changes from the last series, but there are two points not noted:
> > >>
> > >> * The series still does not allow us to correctly restore state for programs
> > >>   that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we want to
> > >>   keep seccomp_filter's identity, I think something along the lines of another
> > >>   seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not sure
> > >>   if this can even be done yet). In addition, we'll need a kcmp command for
> > >>   figuring out if filters are the same, although this too needs to compare
> > >>   seccomp_filter objects, so it's a little screwy. Any thoughts on how to do
> > >>   this nicely are welcome.
> > >
> > > Let's add a concept of a seccompfd.
> > >
> > > For background of what I want to add: I want to be able to create a
> > > seccomp monitor.  A seccomp monitor will be, logically, a pair of a
> > > struct file that represents the monitor and a seccomp_filter that is
> > > controlled by the monitor.  Depending on flags, whoever holds the
> > > monitor fd could change the active filter, intercept syscalls, and
> > > issue syscalls on behalf of a process that is trapped in an
> > > intercepted syscall.
> > >
> > > Seccomp filters would nest properly.
> > >
> > > The interface would probably be (extremely pseudocoded):
> > >
> > > monitor_fd, filter_fd = seccomp(CREATE_MONITOR, flags, ...);
> > >
> > > Then, later:
> > >
> > > seccomp(ATTACH_TO_FILTER, filter_fd);  /* now filtered */
> > >
> > > read(monitor_fd, buf, size); /* returns an intercepted syscall */
> > > write(monitor_fd, buf, size); /* issues a syscall or releases the
> > > trapped task */
> > >
> > > This can't be implemented on x86 without either going insane or
> > > finishing the massive set of pending cleanups to the x86 entry code.
> > > I favor the latter.
> > >
> > > We could, however, add part of it right now: we could have a way to
> > > create a filterfd, we could add kcmp support for it, and we could add
> > > the ATTACH_TO_FILTER thing.  I think that would solve your problem.
> > >
> > > One major open question: does a filter_fd know what its parent is and,
> > > if so, will it just refuse to attach if the caller's parent is wrong?
> > > Or will a filter_fd attach anywhere?
> > >
> >
> > Let me add one more thought:
> >
> > Currently, struct seccomp_filter encodes a strict tree hierarchy: it
> > knows what its parent is.  This only matters as an implementation
> > detail and because TSYNC checks for seccomp_filter equality.
> >
> > We could change this without user-visible effects.  We could say that,
> > for TSYNC purposes, two filter states match if they contain exactly
> > the same layers in the same order where a layer does *not* encode a
> > concept of parent.  We could then say that attaching a classic bpf
> > filter creates a brand new layer that is not equal to any other layer
> > that's been created.
> >
> > This has no effect whatsoever.  The difference would be that we could
> > declare that attaching the same ebpf program twice creates the *same*
> > layer so that, if you fork and both children attach the same ebpf
> > program, then they match for TSYNC purposes.
>
> Would you keep struct seccomp_filter identity here (meaning that you'd
> reach over and grab the seccomp_filter from a sibling thread if it
> existed)? Would it only work for the last filter attached to siblings,
> or for all the filters? This does make my life easier, but I like the
> idea of just using seccompfd directly below as it seems somewhat
> easier (for me at least) to understand.
>
If we did that, it would just be an internal optimization.
> > Similarly, attaching the
> > same hypothetical filterfd would create the same layer.
>
> If we change the api of my current set to have the ptrace commands
> iterate over seccomp fds, it looks something like:
>
> seccompfd = ptrace(GET_FILTER_FD, pid);
> while (ptrace(NEXT_FD, pid, seccompfd) == 0) {
>         if (seccomp(CHECK_INHERITED, seccompfd))
>                 break;
>
>         bpffd = seccomp(GET_BPF_FD, seccompfd);
>         err = bpf(BPF_PROG_DUMP, bpffd, &attr);
>         /* save the bpf prog */
> }
>
> then restore can look like:
>
> while (have_noninherited_filters()) {
>         filter = load_filter();
>         bpffd = bpf(BPF_PROG_LOAD, filter);
>         seccompfd = seccomp(SECCOMP_FD_CREATE, bpffd);
>
>         filters[n_filters++] = seccompfd;
> }
>
> /* fork any children as necessary and do the rest of the restore */
>
> for (i = 0; i < n_filters; i++) {
>         seccomp(SECCOMP_FD_INSTALL, filters[i]);
> }
>
> then the only question is how to implement the CHECK_INHERITED command
> on dump.
I don't think it would be a well defined operation.  I think you'd
have to ask "for this pid, give me the nth thing in the stack", since
an fd identifying a layer without reference to its parent would no
longer even be guaranteed to be unique in the filter stack for a given
task.
I'm not sure I entirely like this solution...
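
A minimal user-space sketch of the "give me the nth thing in the stack"
lookup, assuming the stack is a singly linked list walked through prev
pointers from the most recently attached filter. `struct filter_node`,
`nth_filter()` and the choice of -ESRCH for "no such layer" are
assumptions made for illustration:

```c
#include <errno.h>
#include <stddef.h>

/* Simplified model of a filter stack: each node points to the older filter. */
struct filter_node {
	int id;
	struct filter_node *prev;
};

/*
 * Return the nth filter counting from the most recently attached one
 * (n == 0). If the stack is shorter than that, return NULL and set
 * *err to -ESRCH; otherwise *err is set to 0.
 */
static struct filter_node *nth_filter(struct filter_node *newest,
				      unsigned int n, int *err)
{
	struct filter_node *cur = newest;

	for (; cur && n > 0; n--)
		cur = cur->prev;

	*err = cur ? 0 : -ESRCH;
	return cur;
}
```

Addressing by (task, index) stays well defined even if the same layer
object could appear more than once in a task's stack, which is the case an
fd naming a bare layer cannot disambiguate.
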
>
> If we support the above API, we don't need to think about the concept
> of layers at all, or do any extra work on filter install to preserve
> struct seccomp_filter identity, it just comes naturally.
>
> Tycho
>
> > Thoughts?
> >
> > --Andy
^ permalink raw reply	[flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds
  2015-09-11 16:44     ` Tycho Andersen
@ 2015-09-14 17:52       ` Andy Lutomirski
  0 siblings, 0 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-09-14 17:52 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Pavel Emelyanov, Network Development,
	Alexei Starovoitov, David S. Miller, Oleg Nesterov,
	Serge E. Hallyn, Linux API, Will Drewry,
	linux-kernel@vger.kernel.org, Daniel Borkmann
On Sep 11, 2015 9:44 AM, "Tycho Andersen" <tycho.andersen@canonical.com> wrote:
>
> On Fri, Sep 11, 2015 at 09:20:55AM -0700, Andy Lutomirski wrote:
> > On Sep 10, 2015 5:22 PM, "Tycho Andersen" <tycho.andersen@canonical.com> wrote:
> > >
> > > This patch adds a way for a process that is "real root" to access the
> > > seccomp filters of another process. The process first does a
> > > PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> > > attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> > > bpf(BPF_PROG_DUMP) to dump the actual program at each step.
> > >
> >
> > > +
> > > +       fd = bpf_new_fd(filter->prog, O_RDONLY);
> > > +       if (fd > 0)
> > > +               atomic_inc(&filter->prog->aux->refcnt);
> >
> > Why isn't this folded into bpf_new_fd?
>
> No reason it can't be as far as I can see. I'll make the change for
> the next version.
>
> > > +
> > > +       return fd;
> > > +}
> > > +
> > > +long seccomp_next_filter(struct task_struct *child, u32 fd)
> > > +{
> > > +       struct seccomp_filter *cur;
> > > +       struct bpf_prog *prog;
> > > +       long ret = -ESRCH;
> > > +
> > > +       if (!capable(CAP_SYS_ADMIN))
> > > +               return -EACCES;
> > > +
> > > +       if (child->seccomp.mode != SECCOMP_MODE_FILTER)
> > > +               return -EINVAL;
> > > +
> > > +       prog = bpf_prog_get(fd);
> > > +       if (IS_ERR(prog)) {
> > > +               ret = PTR_ERR(prog);
> > > +               goto out;
> > > +       }
> > > +
> > > +       for (cur = child->seccomp.filter; cur; cur = cur->prev) {
> > > +               if (cur->prog == prog) {
> > > +                       if (!cur->prev)
> > > +                               ret = -ENOENT;
> > > +                       else
> > > +                               ret = bpf_prog_set(fd, cur->prev->prog);
> >
> > This lets you take an fd pointing to one prog and point it elsewhere.
> > I'm not sure that's a good idea.
>
> That's how the interface was designed (calling ptrace(NEXT_FILTER, fd) and
> then doing bpf(DUMP, fd)). I suppose we could have NEXT_FILTER return
> a new fd instead if that seems better to you.
It'll be slower, but it avoids a weird side effect.
>
> Tycho
* Re: v2 of seccomp filter c/r patches
  2015-09-14 17:52             ` Andy Lutomirski
@ 2015-09-15 16:07               ` Tycho Andersen
  2015-09-15 18:13                 ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Tycho Andersen @ 2015-09-15 16:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Pavel Emelyanov, Network Development,
	Alexei Starovoitov, David S. Miller, Oleg Nesterov,
	Serge E. Hallyn, Linux API, Will Drewry,
	linux-kernel@vger.kernel.org, Daniel Borkmann
Hi Andy,
On Mon, Sep 14, 2015 at 10:52:46AM -0700, Andy Lutomirski wrote:
>
> I'm not sure I entirely like this solution...
Ok. Since we also aren't going to do all the eBPF stuff now, how about
something that looks like this:
struct seccomp_layer {
  unsigned int size;
  unsigned int type; /* SECCOMP_BPF_CLASSIC or SECCOMP_EBPF or ... */
  bool inherited;
  union {
    unsigned int insn_cnt;
    struct bpf_insn *insns;
  };
};
with a ptrace command:
ptrace(PTRACE_SECCOMP_DUMP_LAYER, pid, i, &layer);
If we save a pointer to the current seccomp filter on fork (if there
is one), then I think the inherited flag is just,
inherited = is_ancestor(child->seccomp.filter, child->seccomp.inherited_filter)
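To make the ancestor check concrete, here is a minimal userspace sketch. It is not kernel code: `struct filter`, `is_ancestor()` and `layer_inherited()` are hypothetical stand-ins, and it assumes one reading of the expression above, namely that a layer counts as inherited when it lies on the `prev` chain starting at the filter pointer saved at fork.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock of a filter chain: each filter points at the one installed before it.
 * This is a stand-in for illustration, not the kernel's struct seccomp_filter. */
struct filter {
    int id;
    struct filter *prev;
};

/* True if `candidate` is `start` itself or is reachable from `start`
 * by following ->prev. */
static bool is_ancestor(const struct filter *start,
                        const struct filter *candidate)
{
    for (const struct filter *cur = start; cur; cur = cur->prev)
        if (cur == candidate)
            return true;
    return false;
}

/* Assumed per-layer rule: a layer is inherited if it is on the chain that
 * existed at fork time. A task forked without filters inherits nothing. */
static bool layer_inherited(const struct filter *inherited_filter,
                            const struct filter *layer)
{
    return inherited_filter && is_ancestor(inherited_filter, layer);
}
```

With a chain a <- b <- c and b saved at fork, a and b would be reported as inherited and c as installed after the fork.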
In order to restore this (so it can be checkpointed again), we need a
command that looks like:
seccomp(SECCOMP_INHERIT_FILTER);
which sets the current and inherited filter to that of the parent
process. (Optionally we could have seccomp(SECCOMP_INHERIT_FILTER, i)
to inherit the ith filter from the parent, but we can coordinate this
via userspace so it's not strictly necessary.) So the whole c/r process
looks something like:
--- dump ---
for (i = 0; true; i++) {
  ret = ptrace(PTRACE_SECCOMP_DUMP_LAYER, pid, i, &layer);
  if (ret == -ESRCH)
    break;
  if (ret < 0)
    return ret; /* real error */
  /* save the filter if it's not inherited, if it is, mark the filter
   * to be inherited from ppid; note that this index is walking up the
   * tree following filter->prev, and the index we want to reason
   * about on restore is walking down, so we should reverse the whole
   * array.
   */
}
--- restore ---
if (have_inherited_filters) {
  wait_for_ppid_seccomp_restore(n_inherited);
  seccomp(SECCOMP_INHERIT_FILTER);
  signal_done_inheriting();
}
for (i = 0; i < n_filters; i++) {
  seccomp(SECCOMP_SET_MODE_FILTER, ...);
  if (child_inherited_filter(i))
    signal_children_filter(i);
}
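The index reversal mentioned in the dump comment can be sketched as below. This is only an illustration: `struct dumped_layer` and `reverse_layers()` are hypothetical names, and the sketch assumes the dump collected layers newest-first (following filter->prev) into a plain array that restore wants oldest-first.

```c
#include <stddef.h>

/* Hypothetical record for one dumped layer; the real dump would also
 * carry the BPF instructions. */
struct dumped_layer {
    int id;
    int inherited;
};

/* Reverse the array in place so that index 0 is the oldest filter,
 * which is the order restore has to install them in. */
static void reverse_layers(struct dumped_layer *layers, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        struct dumped_layer tmp = layers[i];
        layers[i] = layers[n - 1 - i];
        layers[n - 1 - i] = tmp;
    }
}
```

After the reversal, the inherited layers (if any) sit at the front of the array, so restore can inherit first and then install the remaining entries in order.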
I played around with an implementation of SECCOMP_INHERIT_FILTER last
night and I think I have one that might work. Thoughts?
Tycho
* Re: v2 of seccomp filter c/r patches
  2015-09-15 16:07               ` Tycho Andersen
@ 2015-09-15 18:13                 ` Andy Lutomirski
       [not found]                   ` <CALCETrVxhNvmEdMq0XRy1YZ+oJLDwcmE1y6prs7FGGhsS-Y5gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-09-15 18:13 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Pavel Emelyanov, Network Development,
	Alexei Starovoitov, David S. Miller, Oleg Nesterov,
	Serge E. Hallyn, Linux API, Will Drewry,
	linux-kernel@vger.kernel.org, Daniel Borkmann
On Tue, Sep 15, 2015 at 9:07 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> Hi Andy,
>
> On Mon, Sep 14, 2015 at 10:52:46AM -0700, Andy Lutomirski wrote:
>>
>> I'm not sure I entirely like this solution...
>
> Ok. Since we also aren't going to do all the eBPF stuff now, how about
> something that looks like this:
>
> struct seccomp_layer {
>   unsigned int size;
>   unsigned int type; /* SECCOMP_BPF_CLASSIC or SECCOMP_EBPF or ... */
>   bool inherited;
>   union {
>     unsigned int insn_cnt;
>     struct bpf_insn *insns;
>   };
> };
>
> with a ptrace command:
>
> ptrace(PTRACE_SECCOMP_DUMP_LAYER, pid, i, &layer);
>
> If we save a pointer to the current seccomp filter on fork (if there
> is one), then I think the inherited flag is just,
>
> inherited = is_ancestor(child->seccomp.filter, child->seccomp.inherited_filter)
>
I'm lost.  What is the inherited flag for?
--Andy
* Re: v2 of seccomp filter c/r patches
       [not found]                   ` <CALCETrVxhNvmEdMq0XRy1YZ+oJLDwcmE1y6prs7FGGhsS-Y5gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-15 18:26                     ` Tycho Andersen
  2015-09-15 20:01                       ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Tycho Andersen @ 2015-09-15 18:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Pavel Emelyanov, Network Development,
	Alexei Starovoitov, David S. Miller, Oleg Nesterov,
	Serge E. Hallyn, Linux API, Will Drewry,
	linux-kernel@vger.kernel.org, Daniel Borkmann
Hi Andy,
On Tue, Sep 15, 2015 at 11:13:51AM -0700, Andy Lutomirski wrote:
> On Tue, Sep 15, 2015 at 9:07 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > Hi Andy,
> >
> > On Mon, Sep 14, 2015 at 10:52:46AM -0700, Andy Lutomirski wrote:
> >>
> >> I'm not sure I entirely like this solution...
> >
> > Ok. Since we also aren't going to do all the eBPF stuff now, how about
> > something that looks like this:
> >
> > struct seccomp_layer {
> >   unsigned int size;
> >   unsigned int type; /* SECCOMP_BPF_CLASSIC or SECCOMP_EBPF or ... */
> >   bool inherited;
> >   union {
> >     unsigned int insn_cnt;
> >     struct bpf_insn *insns;
> >   };
> > };
> >
> > with a ptrace command:
> >
> > ptrace(PTRACE_SECCOMP_DUMP_LAYER, pid, i, &layer);
> >
> > If we save a pointer to the current seccomp filter on fork (if there
> > is one), then I think the inherited flag is just,
> >
> > inherited = is_ancestor(child->seccomp.filter, child->seccomp.inherited_filter)
> >
> 
> I'm lost.  What is the inherited flag for?
We need some way to expose the seccomp hierarchy, specifically which
filters are inherited, so that we can correctly restore the filter
tree for tasks that may use TSYNC in the future. You've mentioned that
you don't like kcmp, so this is an alternative to that.
Tycho
* Re: v2 of seccomp filter c/r patches
  2015-09-15 18:26                     ` Tycho Andersen
@ 2015-09-15 20:01                       ` Andy Lutomirski
  2015-09-15 21:38                         ` Tycho Andersen
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-09-15 20:01 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Pavel Emelyanov, Network Development,
	Alexei Starovoitov, David S. Miller, Oleg Nesterov,
	Serge E. Hallyn, Linux API, Will Drewry,
	linux-kernel@vger.kernel.org, Daniel Borkmann
On Tue, Sep 15, 2015 at 11:26 AM, Tycho Andersen
<tycho.andersen@canonical.com> wrote:
> Hi Andy,
>
> On Tue, Sep 15, 2015 at 11:13:51AM -0700, Andy Lutomirski wrote:
>> On Tue, Sep 15, 2015 at 9:07 AM, Tycho Andersen
>> <tycho.andersen@canonical.com> wrote:
>> > Hi Andy,
>> >
>> > On Mon, Sep 14, 2015 at 10:52:46AM -0700, Andy Lutomirski wrote:
>> >>
>> >> I'm not sure I entirely like this solution...
>> >
>> > Ok. Since we also aren't going to do all the eBPF stuff now, how about
>> > something that looks like this:
>> >
>> > struct seccomp_layer {
>> >   unsigned int size;
>> >   unsigned int type; /* SECCOMP_BPF_CLASSIC or SECCOMP_EBPF or ... */
>> >   bool inherited;
>> >   union {
>> >     unsigned int insn_cnt;
>> >     struct bpf_insn *insns;
>> >   };
>> > };
>> >
>> > with a ptrace command:
>> >
>> > ptrace(PTRACE_SECCOMP_DUMP_LAYER, pid, i, &layer);
>> >
>> > If we save a pointer to the current seccomp filter on fork (if there
>> > is one), then I think the inherited flag is just,
>> >
>> > inherited = is_ancestor(child->seccomp.filter, child->seccomp.inherited_filter)
>> >
>>
>> I'm lost.  What is the inherited flag for?
>
> We need some way to expose the seccomp hierarchy, specifically which
> filters are inherited, so that we can correctly restore the filter
> tree for tasks that may use TSYNC in the future. You've mentioned that
> you don't like kcmp, so this is an alternative to that.
>
My only objection to kcmp is that IMO it's a suboptimal interface and
could be better.  I have no problem with the general principle of
asking to compare two objects.
The thing I really don't have a good handle on is whether the seccomp
filter hierarchy should look more like A:
struct seccomp_filter {
    ...;
    struct seccomp_filter *prev;
};
with the seccomp_filter being the user-visible object
Or B:
struct seccomp_layer {
   ...;  /* BPF program, etc. */
};
struct seccomp_filter {
   struct seccomp_layer *layer;
   struct seccomp_filter *prev;
};  /* or equivalent */
with seccomp_layer being the user-visible object.
A is simpler to implement in a memory-efficient way, but it's less
flexible.  I haven't come up with a compelling use case for B where A
doesn't work, with the caveat that, if an fd points to a
seccomp_filter in model A, you can't attach it unless your current
state matches its "prev" state (or an ancestor thereof), which might
be a little bit awkward.
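The attach restriction in model A can be sketched in userspace as follows. This is an interpretation, not a proposed kernel implementation: `struct filter_a` and `can_attach()` are hypothetical, and the sketch assumes that a filter is attachable when the task's current filter appears somewhere on the new filter's "prev" chain, and that a task with no filter at all may adopt any chain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model A stand-in: the filter itself carries the link to its predecessor. */
struct filter_a {
    int id;
    struct filter_a *prev;
};

/* Assumed rule: `f` may be attached only if the task's current filter is
 * f->prev or one of its ancestors, so attaching never drops a filter the
 * task already has. */
static bool can_attach(const struct filter_a *current,
                       const struct filter_a *f)
{
    if (!current)
        return true; /* assumption: an unfiltered task can adopt any chain */

    for (const struct filter_a *cur = f->prev; cur; cur = cur->prev)
        if (cur == current)
            return true;
    return false;
}
```

Under this reading, a task sitting on an unrelated chain is refused, which is the awkward case: the fd is only usable by tasks that share the filter's history.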
Am I making more sense now?
--Andy
* Re: v2 of seccomp filter c/r patches
  2015-09-15 20:01                       ` Andy Lutomirski
@ 2015-09-15 21:38                         ` Tycho Andersen
  0 siblings, 0 replies; 40+ messages in thread
From: Tycho Andersen @ 2015-09-15 21:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Pavel Emelyanov, Network Development,
	Alexei Starovoitov, David S. Miller, Oleg Nesterov,
	Serge E. Hallyn, Linux API, Will Drewry,
	linux-kernel@vger.kernel.org, Daniel Borkmann
Hi Andy,
On Tue, Sep 15, 2015 at 01:01:23PM -0700, Andy Lutomirski wrote:
> On Tue, Sep 15, 2015 at 11:26 AM, Tycho Andersen
> <tycho.andersen@canonical.com> wrote:
> > Hi Andy,
> >
> > On Tue, Sep 15, 2015 at 11:13:51AM -0700, Andy Lutomirski wrote:
> >> On Tue, Sep 15, 2015 at 9:07 AM, Tycho Andersen
> >> <tycho.andersen@canonical.com> wrote:
> >> > Hi Andy,
> >> >
> >> > On Mon, Sep 14, 2015 at 10:52:46AM -0700, Andy Lutomirski wrote:
> >> >>
> >> >> I'm not sure I entirely like this solution...
> >> >
> >> > Ok. Since we also aren't going to do all the eBPF stuff now, how about
> >> > something that looks like this:
> >> >
> >> > struct seccomp_layer {
> >> >   unsigned int size;
> >> >   unsigned int type; /* SECCOMP_BPF_CLASSIC or SECCOMP_EBPF or ... */
> >> >   bool inherited;
> >> >   union {
> >> >     unsigned int insn_cnt;
> >> >     struct bpf_insn *insns;
> >> >   };
> >> > };
> >> >
> >> > with a ptrace command:
> >> >
> >> > ptrace(PTRACE_SECCOMP_DUMP_LAYER, pid, i, &layer);
> >> >
> >> > If we save a pointer to the current seccomp filter on fork (if there
> >> > is one), then I think the inherited flag is just,
> >> >
> >> > inherited = is_ancestor(child->seccomp.filter, child->seccomp.inherited_filter)
> >> >
> >>
> >> I'm lost.  What is the inherited flag for?
> >
> > We need some way to expose the seccomp hierarchy, specifically which
> > filters are inherited, so that we can correctly restore the filter
> > tree for tasks that may use TSYNC in the future. You've mentioned that
> > you don't like kcmp, so this is an alternative to that.
> >
> 
> My only objection to kcmp is that IMO it's a suboptimal interface and
> could be better.  I have no problem with the general principle of
> asking to compare two objects.
Ok, in that case I think we can get rid of all the inherited stuff,
and use kcmp to figure it out.
> The thing I really don't have a good handle on is whether the seccomp
> filter hierarchy should look more like A:
> 
> struct seccomp_filter {
>     ...;
>     struct seccomp_filter *prev;
> };
> 
> with the seccomp_filter being the user-visible object
> 
> Or B:
> 
> struct seccomp_layer {
>    ...;  /* BPF program, etc. */
> };
> 
> struct seccomp_filter {
>    struct seccomp_layer *layer;
>    struct seccomp_filter *prev;
> };  /* or equivalent */
> 
> with seccomp_layer being the user-visible object.
> 
> A is simpler to implement in a memory-efficient way, but it's less
> flexible.  I haven't come up with a compelling use case for B where A
> doesn't work, with the caveat that, if an fd points to a
> seccomp_filter in model A, you can't attach it unless your current
> state matches its "prev" state (or an ancestor thereof), which might
> be a little bit awkward.
Perhaps, although I don't think it would be an issue for c/r.
> Am I making more sense now?
Yes, thanks for the clarifications. I guess personally I'd probably
choose option A. If this (using kcmp and one of A/B) sounds good to
you, I'll start working on a set to do c/r that way.
Tycho
Thread overview: 40+ messages
2015-09-11  0:20 v2 of seccomp filter c/r patches Tycho Andersen
2015-09-11  0:20 ` [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well Tycho Andersen
     [not found]   ` <1441930862-14347-3-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
2015-09-11 13:02     ` Daniel Borkmann
     [not found]       ` <55F2D0EC.9090004-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
2015-09-11 14:44         ` Tycho Andersen
2015-09-11 16:03           ` Daniel Borkmann
     [not found]             ` <55F2FB6F.7050708-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
2015-09-11 17:33               ` Tycho Andersen
2015-09-11 18:28                 ` Daniel Borkmann
2015-09-14 16:00                   ` Tycho Andersen
2015-09-14 16:48                     ` Daniel Borkmann
     [not found]                       ` <55F6FA6B.1060108-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
2015-09-14 17:30                         ` Tycho Andersen
2015-09-11  0:21 ` [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds Tycho Andersen
2015-09-11 11:47   ` Daniel Borkmann
     [not found]     ` <55F2BF5A.8010006-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
2015-09-11 14:29       ` Tycho Andersen
     [not found]   ` <1441930862-14347-5-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
2015-09-11 12:08     ` Michael Kerrisk (man-pages)
     [not found]       ` <CAKgNAki99ZFgLPE5mWWjj1nvdNyke1w0ttqmiG+Uk0rVfqutZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-11 14:31         ` Tycho Andersen
2015-09-11 16:20   ` Andy Lutomirski
2015-09-11 16:44     ` Tycho Andersen
2015-09-14 17:52       ` Andy Lutomirski
     [not found] ` <1441930862-14347-1-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
2015-09-11  0:20   ` [PATCH v2 1/5] ebpf: add a seccomp program type Tycho Andersen
2015-09-11 12:09     ` Michael Kerrisk (man-pages)
2015-09-11  0:21   ` [PATCH v2 3/5] ebpf: add a way to dump an eBPF program Tycho Andersen
     [not found]     ` <1441930862-14347-4-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
2015-09-11  2:29       ` Alexei Starovoitov
     [not found]         ` <20150911022940.GA4903-2RGepAHry06MXrjNfwE7T/6muRTtt8+awzqs5ZKRSiY@public.gmane.org>
2015-09-11 14:59           ` Tycho Andersen
2015-09-11 13:39       ` Daniel Borkmann
2015-09-11 14:44         ` Tycho Andersen
2015-09-11 12:11     ` Michael Kerrisk (man-pages)
2015-09-11  0:21   ` [PATCH v2 5/5] seccomp: add a way to attach a filter via eBPF fd Tycho Andersen
2015-09-11 12:10     ` Michael Kerrisk (man-pages)
2015-09-11 12:37     ` Daniel Borkmann
     [not found]       ` <55F2CB27.7030804-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
2015-09-11 14:40         ` Tycho Andersen
2015-09-11  2:50   ` v2 of seccomp filter c/r patches Alexei Starovoitov
2015-09-11 16:30   ` Andy Lutomirski
     [not found]     ` <CALCETrVYtv1=g-xPjQ-LiX+5GK3xtB6a2hYbat0TuU-Bd4QA6Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-11 17:00       ` Andy Lutomirski
     [not found]         ` <CALCETrWxLMSgdsdT9gTL80LSovONmCcTYjzqrHqF-WdJ4BN1Uw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-11 17:28           ` Tycho Andersen
2015-09-14 17:52             ` Andy Lutomirski
2015-09-15 16:07               ` Tycho Andersen
2015-09-15 18:13                 ` Andy Lutomirski
     [not found]                   ` <CALCETrVxhNvmEdMq0XRy1YZ+oJLDwcmE1y6prs7FGGhsS-Y5gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-15 18:26                     ` Tycho Andersen
2015-09-15 20:01                       ` Andy Lutomirski
2015-09-15 21:38                         ` Tycho Andersen