* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd [not found] ` <1441382664-17437-6-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> @ 2015-09-04 20:41 ` Kees Cook [not found] ` <CAGXu5jKke44txdYqEgPRrkn8SyWGjJuHxT2qMdq2ztp_16mQyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Kees Cook @ 2015-09-04 20:41 UTC (permalink / raw) To: Tycho Andersen, Linux API Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML, Network Development On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote: > This is the final bit needed to support seccomp filters created via the bpf > syscall. > > One concern with this patch is exactly what the interface should look like > for users, since seccomp()'s second argument is a pointer, we could ask > people to pass a pointer to the fd, but implies we might write to it which > seems impolite. Right now we cast the pointer (and force the user to cast > it), which generates ugly warnings. I'm not sure what the right answer is > here. > > Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> > CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> > CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> > CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> > CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> > CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> > CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> > CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org> > --- > include/linux/seccomp.h | 3 +- > include/uapi/linux/seccomp.h | 1 + > kernel/seccomp.c | 70 ++++++++++++++++++++++++++++++++++++-------- > 3 files changed, 61 insertions(+), 13 deletions(-) > > diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h > index d1a86ed..a725dd5 100644 > --- a/include/linux/seccomp.h > +++ b/include/linux/seccomp.h > @@ -3,7 +3,8 @@ > > #include <uapi/linux/seccomp.h> > > -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC) > +#define SECCOMP_FILTER_FLAG_MASK (\ > + SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF) > > #ifdef CONFIG_SECCOMP > > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h > index 0f238a4..c29a423 100644 > --- a/include/uapi/linux/seccomp.h > +++ b/include/uapi/linux/seccomp.h > @@ -16,6 +16,7 @@ > > /* Valid flags for SECCOMP_SET_MODE_FILTER */ > #define SECCOMP_FILTER_FLAG_TSYNC 1 > +#define SECCOMP_FILTER_FLAG_EBPF (1 << 1) > > /* > * All BPF programs must return a 32-bit value. > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index a2c5b32..9c6bea6 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog) > > BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter)); > > - /* > - * Installing a seccomp filter requires that the task has > - * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. > - * This avoids scenarios where unprivileged tasks can affect the > - * behavior of privileged children. > - */ > - if (!task_no_new_privs(current) && > - security_capable_noaudit(current_cred(), current_user_ns(), > - CAP_SYS_ADMIN) != 0) > - return ERR_PTR(-EACCES); > - > /* Allocate a new seccomp_filter */ > sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN); > if (!sfilter) > @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason) > info.si_syscall = syscall; > force_sig_info(SIGSYS, &info, current); > } > + > +#ifdef CONFIG_BPF_SYSCALL > +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter) > +{ > + /* XXX: this cast generates a warning. should we make people pass in > + * &fd, or is there some nicer way of doing this? > + */ > + u32 fd = (u32) filter; I think this is probably the right way to do it, modulo getting the warning fixed. Let me invoke the great linux-api subscribers to get some more opinions. tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the pointer argument into an fd argument. Is this sane, should it be a pointer to an fd, or should it not be a flag at all, creating a new seccomp command instead (SECCOMP_MODE_FILTER_EBPF)? -Kees > + struct seccomp_filter *ret; > + struct bpf_prog *prog; > + > + prog = bpf_prog_get(fd); > + if (IS_ERR(prog)) > + return (struct seccomp_filter *) prog; > + > + if (prog->type != BPF_PROG_TYPE_SECCOMP) { > + bpf_prog_put(prog); > + return ERR_PTR(-EINVAL); > + } > + > + ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN); > + if (!ret) { > + bpf_prog_put(prog); > + return ERR_PTR(-ENOMEM); > + } > + > + ret->prog = prog; > + atomic_set(&ret->usage, 1); > + > + /* Intentionally don't bpf_prog_put() here, because the underlying prog > + * is refcounted too and we're holding a reference from the struct > + * seccomp_filter object. > + */ > + > + return ret; > +} > +#else > +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter) > +{ > + return ERR_PTR(-EINVAL); > +} > +#endif > #endif /* CONFIG_SECCOMP_FILTER */ > > /* > @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags, > if (flags & ~SECCOMP_FILTER_FLAG_MASK) > return -EINVAL; > > + /* > + * Installing a seccomp filter requires that the task has > + * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. > + * This avoids scenarios where unprivileged tasks can affect the > + * behavior of privileged children. > + */ > + if (!task_no_new_privs(current) && > + security_capable_noaudit(current_cred(), current_user_ns(), > + CAP_SYS_ADMIN) != 0) > + return -EACCES; > + > /* Prepare the new filter before holding any locks. */ > - prepared = seccomp_prepare_user_filter(filter); > + if (flags & SECCOMP_FILTER_FLAG_EBPF) > + prepared = seccomp_prepare_ebpf(filter); > + else > + prepared = seccomp_prepare_user_filter(filter); > + > if (IS_ERR(prepared)) > return PTR_ERR(prepared); > > -- > 2.1.4 > -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <CAGXu5jKke44txdYqEgPRrkn8SyWGjJuHxT2qMdq2ztp_16mQyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd [not found] ` <CAGXu5jKke44txdYqEgPRrkn8SyWGjJuHxT2qMdq2ztp_16mQyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-09-05 7:13 ` Michael Kerrisk (man-pages) [not found] ` <55EA95FE.7000006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Michael Kerrisk (man-pages) @ 2015-09-05 7:13 UTC (permalink / raw) To: Kees Cook, Tycho Andersen, Linux API Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML, Network Development On 09/04/2015 10:41 PM, Kees Cook wrote: > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen > <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote: >> This is the final bit needed to support seccomp filters created via the bpf >> syscall. Hmm. Thanks Kees, for CCinf linux-api@. That really should have been done at the outset. Tycho, where's the man-pages patch describing this new kernel-userspace API feature? :-) >> One concern with this patch is exactly what the interface should look like >> for users, since seccomp()'s second argument is a pointer, we could ask >> people to pass a pointer to the fd, but implies we might write to it which >> seems impolite. Right now we cast the pointer (and force the user to cast >> it), which generates ugly warnings. I'm not sure what the right answer is >> here. >> >> Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> >> CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> >> CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> >> CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> >> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> >> CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> >> CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> >> CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> >> CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org> >> --- >> include/linux/seccomp.h | 3 +- >> include/uapi/linux/seccomp.h | 1 + >> kernel/seccomp.c | 70 ++++++++++++++++++++++++++++++++++++-------- >> 3 files changed, 61 insertions(+), 13 deletions(-) >> >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h >> index d1a86ed..a725dd5 100644 >> --- a/include/linux/seccomp.h >> +++ b/include/linux/seccomp.h >> @@ -3,7 +3,8 @@ >> >> #include <uapi/linux/seccomp.h> >> >> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC) >> +#define SECCOMP_FILTER_FLAG_MASK (\ >> + SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF) >> >> #ifdef CONFIG_SECCOMP >> >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h >> index 0f238a4..c29a423 100644 >> --- a/include/uapi/linux/seccomp.h >> +++ b/include/uapi/linux/seccomp.h >> @@ -16,6 +16,7 @@ >> >> /* Valid flags for SECCOMP_SET_MODE_FILTER */ >> #define SECCOMP_FILTER_FLAG_TSYNC 1 >> +#define SECCOMP_FILTER_FLAG_EBPF (1 << 1) >> >> /* >> * All BPF programs must return a 32-bit value. >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c >> index a2c5b32..9c6bea6 100644 >> --- a/kernel/seccomp.c >> +++ b/kernel/seccomp.c >> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog) >> >> BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter)); >> >> - /* >> - * Installing a seccomp filter requires that the task has >> - * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. >> - * This avoids scenarios where unprivileged tasks can affect the >> - * behavior of privileged children. >> - */ >> - if (!task_no_new_privs(current) && >> - security_capable_noaudit(current_cred(), current_user_ns(), >> - CAP_SYS_ADMIN) != 0) >> - return ERR_PTR(-EACCES); >> - >> /* Allocate a new seccomp_filter */ >> sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN); >> if (!sfilter) >> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason) >> info.si_syscall = syscall; >> force_sig_info(SIGSYS, &info, current); >> } >> + >> +#ifdef CONFIG_BPF_SYSCALL >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter) >> +{ >> + /* XXX: this cast generates a warning. should we make people pass in >> + * &fd, or is there some nicer way of doing this? >> + */ >> + u32 fd = (u32) filter; > > I think this is probably the right way to do it, modulo getting the > warning fixed. Let me invoke the great linux-api subscribers to get > some more opinions. Sigh. It's sad, but the using a cast does seem the simplest option. But, how about another idea... > tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the > pointer argument into an fd argument. Is this sane, should it be a > pointer to an fd, or should it not be a flag at all, creating a new > seccomp command instead (SECCOMP_MODE_FILTER_EBPF)? What about seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp) Where structp is a pointer to something like struct seccomp_ebpf { int size; /* Size of this whole struct */ int fd; } 'size' allows for future expansion of the struct (in case we want to expand it later), and placing 'fd' inside a struct avoids unpleasant implication that would be made by passing a pointer to an fd as the third argument. Cheers, Michael > -Kees > >> + struct seccomp_filter *ret; >> + struct bpf_prog *prog; >> + >> + prog = bpf_prog_get(fd); >> + if (IS_ERR(prog)) >> + return (struct seccomp_filter *) prog; >> + >> + if (prog->type != BPF_PROG_TYPE_SECCOMP) { >> + bpf_prog_put(prog); >> + return ERR_PTR(-EINVAL); >> + } >> + >> + ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN); >> + if (!ret) { >> + bpf_prog_put(prog); >> + return ERR_PTR(-ENOMEM); >> + } >> + >> + ret->prog = prog; >> + atomic_set(&ret->usage, 1); >> + >> + /* Intentionally don't bpf_prog_put() here, because the underlying prog >> + * is refcounted too and we're holding a reference from the struct >> + * seccomp_filter object. >> + */ >> + >> + return ret; >> +} >> +#else >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter) >> +{ >> + return ERR_PTR(-EINVAL); >> +} >> +#endif >> #endif /* CONFIG_SECCOMP_FILTER */ >> >> /* >> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags, >> if (flags & ~SECCOMP_FILTER_FLAG_MASK) >> return -EINVAL; >> >> + /* >> + * Installing a seccomp filter requires that the task has >> + * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. >> + * This avoids scenarios where unprivileged tasks can affect the >> + * behavior of privileged children. >> + */ >> + if (!task_no_new_privs(current) && >> + security_capable_noaudit(current_cred(), current_user_ns(), >> + CAP_SYS_ADMIN) != 0) >> + return -EACCES; >> + >> /* Prepare the new filter before holding any locks. */ >> - prepared = seccomp_prepare_user_filter(filter); >> + if (flags & SECCOMP_FILTER_FLAG_EBPF) >> + prepared = seccomp_prepare_ebpf(filter); >> + else >> + prepared = seccomp_prepare_user_filter(filter); >> + >> if (IS_ERR(prepared)) >> return PTR_ERR(prepared); >> >> -- >> 2.1.4 >> > > > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <55EA95FE.7000006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd [not found] ` <55EA95FE.7000006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2015-09-08 13:40 ` Tycho Andersen 2015-09-09 0:07 ` Kees Cook 0 siblings, 1 reply; 7+ messages in thread From: Tycho Andersen @ 2015-09-08 13:40 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Kees Cook, Linux API, Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML, Network Development On Sat, Sep 05, 2015 at 09:13:02AM +0200, Michael Kerrisk (man-pages) wrote: > On 09/04/2015 10:41 PM, Kees Cook wrote: > > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen > > <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote: > >> This is the final bit needed to support seccomp filters created via the bpf > >> syscall. > > Hmm. Thanks Kees, for CCinf linux-api@. That really should have been done at > the outset. Apologies, I'll cc the list on future versions. > Tycho, where's the man-pages patch describing this new kernel-userspace > API feature? :-) Once we get the API finalized I'm happy to write it. > >> One concern with this patch is exactly what the interface should look like > >> for users, since seccomp()'s second argument is a pointer, we could ask > >> people to pass a pointer to the fd, but implies we might write to it which > >> seems impolite. Right now we cast the pointer (and force the user to cast > >> it), which generates ugly warnings. I'm not sure what the right answer is > >> here. > >> > >> Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> > >> CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> > >> CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> > >> CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> > >> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> > >> CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> > >> CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> > >> CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> > >> CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org> > >> --- > >> include/linux/seccomp.h | 3 +- > >> include/uapi/linux/seccomp.h | 1 + > >> kernel/seccomp.c | 70 ++++++++++++++++++++++++++++++++++++-------- > >> 3 files changed, 61 insertions(+), 13 deletions(-) > >> > >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h > >> index d1a86ed..a725dd5 100644 > >> --- a/include/linux/seccomp.h > >> +++ b/include/linux/seccomp.h > >> @@ -3,7 +3,8 @@ > >> > >> #include <uapi/linux/seccomp.h> > >> > >> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC) > >> +#define SECCOMP_FILTER_FLAG_MASK (\ > >> + SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF) > >> > >> #ifdef CONFIG_SECCOMP > >> > >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h > >> index 0f238a4..c29a423 100644 > >> --- a/include/uapi/linux/seccomp.h > >> +++ b/include/uapi/linux/seccomp.h > >> @@ -16,6 +16,7 @@ > >> > >> /* Valid flags for SECCOMP_SET_MODE_FILTER */ > >> #define SECCOMP_FILTER_FLAG_TSYNC 1 > >> +#define SECCOMP_FILTER_FLAG_EBPF (1 << 1) > >> > >> /* > >> * All BPF programs must return a 32-bit value. > >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c > >> index a2c5b32..9c6bea6 100644 > >> --- a/kernel/seccomp.c > >> +++ b/kernel/seccomp.c > >> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog) > >> > >> BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter)); > >> > >> - /* > >> - * Installing a seccomp filter requires that the task has > >> - * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. > >> - * This avoids scenarios where unprivileged tasks can affect the > >> - * behavior of privileged children. > >> - */ > >> - if (!task_no_new_privs(current) && > >> - security_capable_noaudit(current_cred(), current_user_ns(), > >> - CAP_SYS_ADMIN) != 0) > >> - return ERR_PTR(-EACCES); > >> - > >> /* Allocate a new seccomp_filter */ > >> sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN); > >> if (!sfilter) > >> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason) > >> info.si_syscall = syscall; > >> force_sig_info(SIGSYS, &info, current); > >> } > >> + > >> +#ifdef CONFIG_BPF_SYSCALL > >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter) > >> +{ > >> + /* XXX: this cast generates a warning. should we make people pass in > >> + * &fd, or is there some nicer way of doing this? > >> + */ > >> + u32 fd = (u32) filter; > > > > I think this is probably the right way to do it, modulo getting the > > warning fixed. Let me invoke the great linux-api subscribers to get > > some more opinions. > > Sigh. It's sad, but the using a cast does seem the simplest option. > But, how about another idea... > > > tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the > > pointer argument into an fd argument. Is this sane, should it be a > > pointer to an fd, or should it not be a flag at all, creating a new > > seccomp command instead (SECCOMP_MODE_FILTER_EBPF)? > > What about > > seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp) > > Where structp is a pointer to something like > > struct seccomp_ebpf { > int size; /* Size of this whole struct */ > int fd; > } > > 'size' allows for future expansion of the struct (in case we want to > expand it later), and placing 'fd' inside a struct avoids unpleasant > implication that would be made by passing a pointer to an fd as the > third argument. I like this; although perhaps something like bpf() has, with the unions inside the struct so that we don't have to declare another struct if we add another seccomp command. Kees? Tycho > Cheers, > > Michael > > > > -Kees > > > >> + struct seccomp_filter *ret; > >> + struct bpf_prog *prog; > >> + > >> + prog = bpf_prog_get(fd); > >> + if (IS_ERR(prog)) > >> + return (struct seccomp_filter *) prog; > >> + > >> + if (prog->type != BPF_PROG_TYPE_SECCOMP) { > >> + bpf_prog_put(prog); > >> + return ERR_PTR(-EINVAL); > >> + } > >> + > >> + ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN); > >> + if (!ret) { > >> + bpf_prog_put(prog); > >> + return ERR_PTR(-ENOMEM); > >> + } > >> + > >> + ret->prog = prog; > >> + atomic_set(&ret->usage, 1); > >> + > >> + /* Intentionally don't bpf_prog_put() here, because the underlying prog > >> + * is refcounted too and we're holding a reference from the struct > >> + * seccomp_filter object. > >> + */ > >> + > >> + return ret; > >> +} > >> +#else > >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter) > >> +{ > >> + return ERR_PTR(-EINVAL); > >> +} > >> +#endif > >> #endif /* CONFIG_SECCOMP_FILTER */ > >> > >> /* > >> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags, > >> if (flags & ~SECCOMP_FILTER_FLAG_MASK) > >> return -EINVAL; > >> > >> + /* > >> + * Installing a seccomp filter requires that the task has > >> + * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. > >> + * This avoids scenarios where unprivileged tasks can affect the > >> + * behavior of privileged children. > >> + */ > >> + if (!task_no_new_privs(current) && > >> + security_capable_noaudit(current_cred(), current_user_ns(), > >> + CAP_SYS_ADMIN) != 0) > >> + return -EACCES; > >> + > >> /* Prepare the new filter before holding any locks. */ > >> - prepared = seccomp_prepare_user_filter(filter); > >> + if (flags & SECCOMP_FILTER_FLAG_EBPF) > >> + prepared = seccomp_prepare_ebpf(filter); > >> + else > >> + prepared = seccomp_prepare_user_filter(filter); > >> + > >> if (IS_ERR(prepared)) > >> return PTR_ERR(prepared); > >> > >> -- > >> 2.1.4 > >> > > > > > > > > > -- > Michael Kerrisk > Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ > Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd 2015-09-08 13:40 ` Tycho Andersen @ 2015-09-09 0:07 ` Kees Cook [not found] ` <CAGXu5jKS0yX92XXhL6ZkqMrxkqFpPyyBd7wbsvEEx4rqZ0VG6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Kees Cook @ 2015-09-09 0:07 UTC (permalink / raw) To: Tycho Andersen Cc: Michael Kerrisk (man-pages), Linux API, Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML, Network Development On Tue, Sep 8, 2015 at 6:40 AM, Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote: > On Sat, Sep 05, 2015 at 09:13:02AM +0200, Michael Kerrisk (man-pages) wrote: >> On 09/04/2015 10:41 PM, Kees Cook wrote: >> > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen >> > <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote: >> >> This is the final bit needed to support seccomp filters created via the bpf >> >> syscall. >> >> Hmm. Thanks Kees, for CCinf linux-api@. That really should have been done at >> the outset. > > Apologies, I'll cc the list on future versions. > >> Tycho, where's the man-pages patch describing this new kernel-userspace >> API feature? :-) > > Once we get the API finalized I'm happy to write it. > >> >> One concern with this patch is exactly what the interface should look like >> >> for users, since seccomp()'s second argument is a pointer, we could ask >> >> people to pass a pointer to the fd, but implies we might write to it which >> >> seems impolite. Right now we cast the pointer (and force the user to cast >> >> it), which generates ugly warnings. I'm not sure what the right answer is >> >> here. >> >> >> >> Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> >> >> CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> >> >> CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> >> >> CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> >> >> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> >> >> CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> >> >> CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> >> >> CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> >> >> CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org> >> >> --- >> >> include/linux/seccomp.h | 3 +- >> >> include/uapi/linux/seccomp.h | 1 + >> >> kernel/seccomp.c | 70 ++++++++++++++++++++++++++++++++++++-------- >> >> 3 files changed, 61 insertions(+), 13 deletions(-) >> >> >> >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h >> >> index d1a86ed..a725dd5 100644 >> >> --- a/include/linux/seccomp.h >> >> +++ b/include/linux/seccomp.h >> >> @@ -3,7 +3,8 @@ >> >> >> >> #include <uapi/linux/seccomp.h> >> >> >> >> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC) >> >> +#define SECCOMP_FILTER_FLAG_MASK (\ >> >> + SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF) >> >> >> >> #ifdef CONFIG_SECCOMP >> >> >> >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h >> >> index 0f238a4..c29a423 100644 >> >> --- a/include/uapi/linux/seccomp.h >> >> +++ b/include/uapi/linux/seccomp.h >> >> @@ -16,6 +16,7 @@ >> >> >> >> /* Valid flags for SECCOMP_SET_MODE_FILTER */ >> >> #define SECCOMP_FILTER_FLAG_TSYNC 1 >> >> +#define SECCOMP_FILTER_FLAG_EBPF (1 << 1) >> >> >> >> /* >> >> * All BPF programs must return a 32-bit value. >> >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c >> >> index a2c5b32..9c6bea6 100644 >> >> --- a/kernel/seccomp.c >> >> +++ b/kernel/seccomp.c >> >> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog) >> >> >> >> BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter)); >> >> >> >> - /* >> >> - * Installing a seccomp filter requires that the task has >> >> - * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. >> >> - * This avoids scenarios where unprivileged tasks can affect the >> >> - * behavior of privileged children. >> >> - */ >> >> - if (!task_no_new_privs(current) && >> >> - security_capable_noaudit(current_cred(), current_user_ns(), >> >> - CAP_SYS_ADMIN) != 0) >> >> - return ERR_PTR(-EACCES); >> >> - >> >> /* Allocate a new seccomp_filter */ >> >> sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN); >> >> if (!sfilter) >> >> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason) >> >> info.si_syscall = syscall; >> >> force_sig_info(SIGSYS, &info, current); >> >> } >> >> + >> >> +#ifdef CONFIG_BPF_SYSCALL >> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter) >> >> +{ >> >> + /* XXX: this cast generates a warning. should we make people pass in >> >> + * &fd, or is there some nicer way of doing this? >> >> + */ >> >> + u32 fd = (u32) filter; >> > >> > I think this is probably the right way to do it, modulo getting the >> > warning fixed. Let me invoke the great linux-api subscribers to get >> > some more opinions. >> >> Sigh. It's sad, but the using a cast does seem the simplest option. >> But, how about another idea... >> >> > tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the >> > pointer argument into an fd argument. Is this sane, should it be a >> > pointer to an fd, or should it not be a flag at all, creating a new >> > seccomp command instead (SECCOMP_MODE_FILTER_EBPF)? >> >> What about >> >> seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp) >> >> Where structp is a pointer to something like >> >> struct seccomp_ebpf { >> int size; /* Size of this whole struct */ >> int fd; >> } >> >> 'size' allows for future expansion of the struct (in case we want to >> expand it later), and placing 'fd' inside a struct avoids unpleasant >> implication that would be made by passing a pointer to an fd as the >> third argument. > > I like this; although perhaps something like bpf() has, with the > unions inside the struct so that we don't have to declare another > struct if we add another seccomp command. Kees? Yeah, bpf's union looks good. Let's add a "command" flag, though: seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size); And this cmd could be ADD_FD or something? How's that look? -Kees > > Tycho > >> Cheers, >> >> Michael >> >> >> > -Kees >> > >> >> + struct seccomp_filter *ret; >> >> + struct bpf_prog *prog; >> >> + >> >> + prog = bpf_prog_get(fd); >> >> + if (IS_ERR(prog)) >> >> + return (struct seccomp_filter *) prog; >> >> + >> >> + if (prog->type != BPF_PROG_TYPE_SECCOMP) { >> >> + bpf_prog_put(prog); >> >> + return ERR_PTR(-EINVAL); >> >> + } >> >> + >> >> + ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN); >> >> + if (!ret) { >> >> + bpf_prog_put(prog); >> >> + return ERR_PTR(-ENOMEM); >> >> + } >> >> + >> >> + ret->prog = prog; >> >> + atomic_set(&ret->usage, 1); >> >> + >> >> + /* Intentionally don't bpf_prog_put() here, because the underlying prog >> >> + * is refcounted too and we're holding a reference from the struct >> >> + * seccomp_filter object. >> >> + */ >> >> + >> >> + return ret; >> >> +} >> >> +#else >> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter) >> >> +{ >> >> + return ERR_PTR(-EINVAL); >> >> +} >> >> +#endif >> >> #endif /* CONFIG_SECCOMP_FILTER */ >> >> >> >> /* >> >> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags, >> >> if (flags & ~SECCOMP_FILTER_FLAG_MASK) >> >> return -EINVAL; >> >> >> >> + /* >> >> + * Installing a seccomp filter requires that the task has >> >> + * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. >> >> + * This avoids scenarios where unprivileged tasks can affect the >> >> + * behavior of privileged children. >> >> + */ >> >> + if (!task_no_new_privs(current) && >> >> + security_capable_noaudit(current_cred(), current_user_ns(), >> >> + CAP_SYS_ADMIN) != 0) >> >> + return -EACCES; >> >> + >> >> /* Prepare the new filter before holding any locks. */ >> >> - prepared = seccomp_prepare_user_filter(filter); >> >> + if (flags & SECCOMP_FILTER_FLAG_EBPF) >> >> + prepared = seccomp_prepare_ebpf(filter); >> >> + else >> >> + prepared = seccomp_prepare_user_filter(filter); >> >> + >> >> if (IS_ERR(prepared)) >> >> return PTR_ERR(prepared); >> >> >> >> -- >> >> 2.1.4 >> >> >> > >> > >> > >> >> >> -- >> Michael Kerrisk >> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ >> Linux/UNIX System Programming Training: http://man7.org/training/ -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <CAGXu5jKS0yX92XXhL6ZkqMrxkqFpPyyBd7wbsvEEx4rqZ0VG6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd [not found] ` <CAGXu5jKS0yX92XXhL6ZkqMrxkqFpPyyBd7wbsvEEx4rqZ0VG6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-09-09 14:47 ` Tycho Andersen 2015-09-09 15:14 ` Alexei Starovoitov 0 siblings, 1 reply; 7+ messages in thread From: Tycho Andersen @ 2015-09-09 14:47 UTC (permalink / raw) To: Kees Cook Cc: Michael Kerrisk (man-pages), Linux API, Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML, Network Development On Tue, Sep 08, 2015 at 05:07:03PM -0700, Kees Cook wrote: > > Yeah, bpf's union looks good. Let's add a "command" flag, though: > > seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size); > > And this cmd could be ADD_FD or something? > > How's that look? I think we can drop the size (using the same strategy as bpf() and checking for zeroes at the end), and keep the same signature for seccomp(); so: seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_ADD_BPF_FD, &union) Yes, I'll use this in the next version. Tycho ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd 2015-09-09 14:47 ` Tycho Andersen @ 2015-09-09 15:14 ` Alexei Starovoitov [not found] ` <20150909151402.GA3429-2RGepAHry04KGsCuBW9QBvb0xQGhdpdCAL8bYrjMMd8@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Alexei Starovoitov @ 2015-09-09 15:14 UTC (permalink / raw) To: Tycho Andersen Cc: Kees Cook, Michael Kerrisk (man-pages), Linux API, Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML, Network Development On Wed, Sep 09, 2015 at 08:47:24AM -0600, Tycho Andersen wrote: > On Tue, Sep 08, 2015 at 05:07:03PM -0700, Kees Cook wrote: > > > > Yeah, bpf's union looks good. Let's add a "command" flag, though: > > > > seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size); > > > > And this cmd could be ADD_FD or something? > > > > How's that look? > > I think we can drop the size (using the same strategy as bpf() and > checking for zeroes at the end), and keep the same signature for > seccomp(); so: > > seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_ADD_BPF_FD, &union) > > Yes, I'll use this in the next version. actually bpf() has size as the last argument: SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) perf_event_open() doesn't and size is embedded as one of the fields. Both approaches are equivally powerfull from extensitiblity point of view. My preference was to keep size as an explicit argument. ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20150909151402.GA3429-2RGepAHry04KGsCuBW9QBvb0xQGhdpdCAL8bYrjMMd8@public.gmane.org>]
* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd [not found] ` <20150909151402.GA3429-2RGepAHry04KGsCuBW9QBvb0xQGhdpdCAL8bYrjMMd8@public.gmane.org> @ 2015-09-09 15:55 ` Tycho Andersen 0 siblings, 0 replies; 7+ messages in thread From: Tycho Andersen @ 2015-09-09 15:55 UTC (permalink / raw) To: Alexei Starovoitov Cc: Kees Cook, Michael Kerrisk (man-pages), Linux API, Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML, Network Development On Wed, Sep 09, 2015 at 08:14:04AM -0700, Alexei Starovoitov wrote: > On Wed, Sep 09, 2015 at 08:47:24AM -0600, Tycho Andersen wrote: > > On Tue, Sep 08, 2015 at 05:07:03PM -0700, Kees Cook wrote: > > > > > > Yeah, bpf's union looks good. Let's add a "command" flag, though: > > > > > > seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size); > > > > > > And this cmd could be ADD_FD or something? > > > > > > How's that look? > > > > I think we can drop the size (using the same strategy as bpf() and > > checking for zeroes at the end), and keep the same signature for > > seccomp(); so: > > > > seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_ADD_BPF_FD, &union) > > > > Yes, I'll use this in the next version. > > actually bpf() has size as the last argument: > SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) > perf_event_open() doesn't and size is embedded as one of the fields. > Both approaches are equivally powerfull from extensitiblity > point of view. My preference was to keep size as an explicit > argument. Yep, sorry that was poorly written. I meant keeping the size as a member of the struct as Michael originally suggested, mostly to avoid having to change the signature of seccomp(). Tycho ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-09-09 15:55 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1441382664-17437-1-git-send-email-tycho.andersen@canonical.com>
[not found] ` <1441382664-17437-6-git-send-email-tycho.andersen@canonical.com>
[not found] ` <1441382664-17437-6-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
2015-09-04 20:41 ` [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd Kees Cook
[not found] ` <CAGXu5jKke44txdYqEgPRrkn8SyWGjJuHxT2qMdq2ztp_16mQyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-05 7:13 ` Michael Kerrisk (man-pages)
[not found] ` <55EA95FE.7000006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-09-08 13:40 ` Tycho Andersen
2015-09-09 0:07 ` Kees Cook
[not found] ` <CAGXu5jKS0yX92XXhL6ZkqMrxkqFpPyyBd7wbsvEEx4rqZ0VG6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-09 14:47 ` Tycho Andersen
2015-09-09 15:14 ` Alexei Starovoitov
[not found] ` <20150909151402.GA3429-2RGepAHry04KGsCuBW9QBvb0xQGhdpdCAL8bYrjMMd8@public.gmane.org>
2015-09-09 15:55 ` Tycho Andersen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).