Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd

linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
       [not found]   ` <1441382664-17437-6-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
@ 2015-09-04 20:41     ` Kees Cook
       [not found]       ` <CAGXu5jKke44txdYqEgPRrkn8SyWGjJuHxT2qMdq2ztp_16mQyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2015-09-04 20:41 UTC (permalink / raw)
  To: Tycho Andersen, Linux API
  Cc: Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
<tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> This is the final bit needed to support seccomp filters created via the bpf
> syscall.
>
> One concern with this patch is exactly what the interface should look like
> for users, since seccomp()'s second argument is a pointer, we could ask
> people to pass a pointer to the fd, but implies we might write to it which
> seems impolite. Right now we cast the pointer (and force the user to cast
> it), which generates ugly warnings. I'm not sure what the right answer is
> here.
>
> Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
> ---
>  include/linux/seccomp.h      |  3 +-
>  include/uapi/linux/seccomp.h |  1 +
>  kernel/seccomp.c             | 70 ++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 61 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index d1a86ed..a725dd5 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -3,7 +3,8 @@
>
>  #include <uapi/linux/seccomp.h>
>
> -#define SECCOMP_FILTER_FLAG_MASK       (SECCOMP_FILTER_FLAG_TSYNC)
> +#define SECCOMP_FILTER_FLAG_MASK       (\
> +       SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
>
>  #ifdef CONFIG_SECCOMP
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 0f238a4..c29a423 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -16,6 +16,7 @@
>
>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>  #define SECCOMP_FILTER_FLAG_TSYNC      1
> +#define SECCOMP_FILTER_FLAG_EBPF       (1 << 1)
>
>  /*
>   * All BPF programs must return a 32-bit value.
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index a2c5b32..9c6bea6 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>
>         BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>
> -       /*
> -        * Installing a seccomp filter requires that the task has
> -        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> -        * This avoids scenarios where unprivileged tasks can affect the
> -        * behavior of privileged children.
> -        */
> -       if (!task_no_new_privs(current) &&
> -           security_capable_noaudit(current_cred(), current_user_ns(),
> -                                    CAP_SYS_ADMIN) != 0)
> -               return ERR_PTR(-EACCES);
> -
>         /* Allocate a new seccomp_filter */
>         sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
>         if (!sfilter)
> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
>         info.si_syscall = syscall;
>         force_sig_info(SIGSYS, &info, current);
>  }
> +
> +#ifdef CONFIG_BPF_SYSCALL
> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
> +{
> +       /* XXX: this cast generates a warning. should we make people pass in
> +        * &fd, or is there some nicer way of doing this?
> +        */
> +       u32 fd = (u32) filter;

I think this is probably the right way to do it, modulo getting the
warning fixed. Let me invoke the great linux-api subscribers to get
some more opinions.

tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
pointer argument into an fd argument. Is this sane, should it be a
pointer to an fd, or should it not be a flag at all, creating a new
seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?

-Kees

> +       struct seccomp_filter *ret;
> +       struct bpf_prog *prog;
> +
> +       prog = bpf_prog_get(fd);
> +       if (IS_ERR(prog))
> +               return (struct seccomp_filter *) prog;
> +
> +       if (prog->type != BPF_PROG_TYPE_SECCOMP) {
> +               bpf_prog_put(prog);
> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
> +       if (!ret) {
> +               bpf_prog_put(prog);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
> +       ret->prog = prog;
> +       atomic_set(&ret->usage, 1);
> +
> +       /* Intentionally don't bpf_prog_put() here, because the underlying prog
> +        * is refcounted too and we're holding a reference from the struct
> +        * seccomp_filter object.
> +        */
> +
> +       return ret;
> +}
> +#else
> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
> +{
> +       return ERR_PTR(-EINVAL);
> +}
> +#endif
>  #endif /* CONFIG_SECCOMP_FILTER */
>
>  /*
> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags,
>         if (flags & ~SECCOMP_FILTER_FLAG_MASK)
>                 return -EINVAL;
>
> +       /*
> +        * Installing a seccomp filter requires that the task has
> +        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> +        * This avoids scenarios where unprivileged tasks can affect the
> +        * behavior of privileged children.
> +        */
> +       if (!task_no_new_privs(current) &&
> +           security_capable_noaudit(current_cred(), current_user_ns(),
> +                                    CAP_SYS_ADMIN) != 0)
> +               return -EACCES;
> +
>         /* Prepare the new filter before holding any locks. */
> -       prepared = seccomp_prepare_user_filter(filter);
> +       if (flags & SECCOMP_FILTER_FLAG_EBPF)
> +               prepared = seccomp_prepare_ebpf(filter);
> +       else
> +               prepared = seccomp_prepare_user_filter(filter);
> +
>         if (IS_ERR(prepared))
>                 return PTR_ERR(prepared);
>
> --
> 2.1.4
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 7+ messages in thread

[parent not found: <CAGXu5jKke44txdYqEgPRrkn8SyWGjJuHxT2qMdq2ztp_16mQyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
       [not found]       ` <CAGXu5jKke44txdYqEgPRrkn8SyWGjJuHxT2qMdq2ztp_16mQyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-05  7:13         ` Michael Kerrisk (man-pages)
       [not found]           ` <55EA95FE.7000006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-09-05  7:13 UTC (permalink / raw)
  To: Kees Cook, Tycho Andersen, Linux API
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Alexei Starovoitov,
	Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, LKML, Network Development

On 09/04/2015 10:41 PM, Kees Cook wrote:
> On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>> This is the final bit needed to support seccomp filters created via the bpf
>> syscall.

Hmm. Thanks Kees, for CCinf linux-api@. That really should have been done at
the outset.

Tycho, where's the man-pages patch describing this new kernel-userspace
API feature? :-)

>> One concern with this patch is exactly what the interface should look like
>> for users, since seccomp()'s second argument is a pointer, we could ask
>> people to pass a pointer to the fd, but implies we might write to it which
>> seems impolite. Right now we cast the pointer (and force the user to cast
>> it), which generates ugly warnings. I'm not sure what the right answer is
>> here.
>>
>> Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
>> CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>> CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>> CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
>> CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
>> CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
>> CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
>> ---
>>  include/linux/seccomp.h      |  3 +-
>>  include/uapi/linux/seccomp.h |  1 +
>>  kernel/seccomp.c             | 70 ++++++++++++++++++++++++++++++++++++--------
>>  3 files changed, 61 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> index d1a86ed..a725dd5 100644
>> --- a/include/linux/seccomp.h
>> +++ b/include/linux/seccomp.h
>> @@ -3,7 +3,8 @@
>>
>>  #include <uapi/linux/seccomp.h>
>>
>> -#define SECCOMP_FILTER_FLAG_MASK       (SECCOMP_FILTER_FLAG_TSYNC)
>> +#define SECCOMP_FILTER_FLAG_MASK       (\
>> +       SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
>>
>>  #ifdef CONFIG_SECCOMP
>>
>> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
>> index 0f238a4..c29a423 100644
>> --- a/include/uapi/linux/seccomp.h
>> +++ b/include/uapi/linux/seccomp.h
>> @@ -16,6 +16,7 @@
>>
>>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>>  #define SECCOMP_FILTER_FLAG_TSYNC      1
>> +#define SECCOMP_FILTER_FLAG_EBPF       (1 << 1)
>>
>>  /*
>>   * All BPF programs must return a 32-bit value.
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index a2c5b32..9c6bea6 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>>
>>         BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>>
>> -       /*
>> -        * Installing a seccomp filter requires that the task has
>> -        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
>> -        * This avoids scenarios where unprivileged tasks can affect the
>> -        * behavior of privileged children.
>> -        */
>> -       if (!task_no_new_privs(current) &&
>> -           security_capable_noaudit(current_cred(), current_user_ns(),
>> -                                    CAP_SYS_ADMIN) != 0)
>> -               return ERR_PTR(-EACCES);
>> -
>>         /* Allocate a new seccomp_filter */
>>         sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
>>         if (!sfilter)
>> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
>>         info.si_syscall = syscall;
>>         force_sig_info(SIGSYS, &info, current);
>>  }
>> +
>> +#ifdef CONFIG_BPF_SYSCALL
>> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
>> +{
>> +       /* XXX: this cast generates a warning. should we make people pass in
>> +        * &fd, or is there some nicer way of doing this?
>> +        */
>> +       u32 fd = (u32) filter;
> 
> I think this is probably the right way to do it, modulo getting the
> warning fixed. Let me invoke the great linux-api subscribers to get
> some more opinions.

Sigh. It's sad, but the using a cast does seem the simplest option.
But, how about another idea...

> tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
> pointer argument into an fd argument. Is this sane, should it be a
> pointer to an fd, or should it not be a flag at all, creating a new
> seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?

What about

    seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp)

Where structp is a pointer to something like

struct seccomp_ebpf {
    int size;      /* Size of this whole struct */
    int fd;
}

'size' allows for future expansion of the struct (in case we want to 
expand it later), and placing 'fd' inside a struct avoids unpleasant
implication that would be made by passing a pointer to an fd as the
third argument.

Cheers,

Michael


> -Kees
> 
>> +       struct seccomp_filter *ret;
>> +       struct bpf_prog *prog;
>> +
>> +       prog = bpf_prog_get(fd);
>> +       if (IS_ERR(prog))
>> +               return (struct seccomp_filter *) prog;
>> +
>> +       if (prog->type != BPF_PROG_TYPE_SECCOMP) {
>> +               bpf_prog_put(prog);
>> +               return ERR_PTR(-EINVAL);
>> +       }
>> +
>> +       ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
>> +       if (!ret) {
>> +               bpf_prog_put(prog);
>> +               return ERR_PTR(-ENOMEM);
>> +       }
>> +
>> +       ret->prog = prog;
>> +       atomic_set(&ret->usage, 1);
>> +
>> +       /* Intentionally don't bpf_prog_put() here, because the underlying prog
>> +        * is refcounted too and we're holding a reference from the struct
>> +        * seccomp_filter object.
>> +        */
>> +
>> +       return ret;
>> +}
>> +#else
>> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
>> +{
>> +       return ERR_PTR(-EINVAL);
>> +}
>> +#endif
>>  #endif /* CONFIG_SECCOMP_FILTER */
>>
>>  /*
>> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags,
>>         if (flags & ~SECCOMP_FILTER_FLAG_MASK)
>>                 return -EINVAL;
>>
>> +       /*
>> +        * Installing a seccomp filter requires that the task has
>> +        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
>> +        * This avoids scenarios where unprivileged tasks can affect the
>> +        * behavior of privileged children.
>> +        */
>> +       if (!task_no_new_privs(current) &&
>> +           security_capable_noaudit(current_cred(), current_user_ns(),
>> +                                    CAP_SYS_ADMIN) != 0)
>> +               return -EACCES;
>> +
>>         /* Prepare the new filter before holding any locks. */
>> -       prepared = seccomp_prepare_user_filter(filter);
>> +       if (flags & SECCOMP_FILTER_FLAG_EBPF)
>> +               prepared = seccomp_prepare_ebpf(filter);
>> +       else
>> +               prepared = seccomp_prepare_user_filter(filter);
>> +
>>         if (IS_ERR(prepared))
>>                 return PTR_ERR(prepared);
>>
>> --
>> 2.1.4
>>
> 
> 
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 7+ messages in thread

[parent not found: <55EA95FE.7000006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
       [not found]           ` <55EA95FE.7000006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-09-08 13:40             ` Tycho Andersen
  2015-09-09  0:07               ` Kees Cook
  0 siblings, 1 reply; 7+ messages in thread
From: Tycho Andersen @ 2015-09-08 13:40 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Kees Cook, Linux API, Alexei Starovoitov, Will Drewry,
	Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov, Serge E. Hallyn,
	Daniel Borkmann, LKML, Network Development

On Sat, Sep 05, 2015 at 09:13:02AM +0200, Michael Kerrisk (man-pages) wrote:
> On 09/04/2015 10:41 PM, Kees Cook wrote:
> > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> > <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> >> This is the final bit needed to support seccomp filters created via the bpf
> >> syscall.
> 
> Hmm. Thanks Kees, for CCinf linux-api@. That really should have been done at
> the outset.

Apologies, I'll cc the list on future versions.

> Tycho, where's the man-pages patch describing this new kernel-userspace
> API feature? :-)

Once we get the API finalized I'm happy to write it.

> >> One concern with this patch is exactly what the interface should look like
> >> for users, since seccomp()'s second argument is a pointer, we could ask
> >> people to pass a pointer to the fd, but implies we might write to it which
> >> seems impolite. Right now we cast the pointer (and force the user to cast
> >> it), which generates ugly warnings. I'm not sure what the right answer is
> >> here.
> >>
> >> Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> >> CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> >> CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> >> CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> >> CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> >> CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> >> CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >> CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
> >> ---
> >>  include/linux/seccomp.h      |  3 +-
> >>  include/uapi/linux/seccomp.h |  1 +
> >>  kernel/seccomp.c             | 70 ++++++++++++++++++++++++++++++++++++--------
> >>  3 files changed, 61 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> >> index d1a86ed..a725dd5 100644
> >> --- a/include/linux/seccomp.h
> >> +++ b/include/linux/seccomp.h
> >> @@ -3,7 +3,8 @@
> >>
> >>  #include <uapi/linux/seccomp.h>
> >>
> >> -#define SECCOMP_FILTER_FLAG_MASK       (SECCOMP_FILTER_FLAG_TSYNC)
> >> +#define SECCOMP_FILTER_FLAG_MASK       (\
> >> +       SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
> >>
> >>  #ifdef CONFIG_SECCOMP
> >>
> >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> >> index 0f238a4..c29a423 100644
> >> --- a/include/uapi/linux/seccomp.h
> >> +++ b/include/uapi/linux/seccomp.h
> >> @@ -16,6 +16,7 @@
> >>
> >>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
> >>  #define SECCOMP_FILTER_FLAG_TSYNC      1
> >> +#define SECCOMP_FILTER_FLAG_EBPF       (1 << 1)
> >>
> >>  /*
> >>   * All BPF programs must return a 32-bit value.
> >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> >> index a2c5b32..9c6bea6 100644
> >> --- a/kernel/seccomp.c
> >> +++ b/kernel/seccomp.c
> >> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
> >>
> >>         BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
> >>
> >> -       /*
> >> -        * Installing a seccomp filter requires that the task has
> >> -        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> >> -        * This avoids scenarios where unprivileged tasks can affect the
> >> -        * behavior of privileged children.
> >> -        */
> >> -       if (!task_no_new_privs(current) &&
> >> -           security_capable_noaudit(current_cred(), current_user_ns(),
> >> -                                    CAP_SYS_ADMIN) != 0)
> >> -               return ERR_PTR(-EACCES);
> >> -
> >>         /* Allocate a new seccomp_filter */
> >>         sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
> >>         if (!sfilter)
> >> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
> >>         info.si_syscall = syscall;
> >>         force_sig_info(SIGSYS, &info, current);
> >>  }
> >> +
> >> +#ifdef CONFIG_BPF_SYSCALL
> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
> >> +{
> >> +       /* XXX: this cast generates a warning. should we make people pass in
> >> +        * &fd, or is there some nicer way of doing this?
> >> +        */
> >> +       u32 fd = (u32) filter;
> > 
> > I think this is probably the right way to do it, modulo getting the
> > warning fixed. Let me invoke the great linux-api subscribers to get
> > some more opinions.
> 
> Sigh. It's sad, but the using a cast does seem the simplest option.
> But, how about another idea...
> 
> > tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
> > pointer argument into an fd argument. Is this sane, should it be a
> > pointer to an fd, or should it not be a flag at all, creating a new
> > seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?
> 
> What about
> 
>     seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp)
> 
> Where structp is a pointer to something like
> 
> struct seccomp_ebpf {
>     int size;      /* Size of this whole struct */
>     int fd;
> }
> 
> 'size' allows for future expansion of the struct (in case we want to 
> expand it later), and placing 'fd' inside a struct avoids unpleasant
> implication that would be made by passing a pointer to an fd as the
> third argument.

I like this; although perhaps something like bpf() has, with the
unions inside the struct so that we don't have to declare another
struct if we add another seccomp command. Kees?

Tycho

> Cheers,
> 
> Michael
> 
> 
> > -Kees
> > 
> >> +       struct seccomp_filter *ret;
> >> +       struct bpf_prog *prog;
> >> +
> >> +       prog = bpf_prog_get(fd);
> >> +       if (IS_ERR(prog))
> >> +               return (struct seccomp_filter *) prog;
> >> +
> >> +       if (prog->type != BPF_PROG_TYPE_SECCOMP) {
> >> +               bpf_prog_put(prog);
> >> +               return ERR_PTR(-EINVAL);
> >> +       }
> >> +
> >> +       ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
> >> +       if (!ret) {
> >> +               bpf_prog_put(prog);
> >> +               return ERR_PTR(-ENOMEM);
> >> +       }
> >> +
> >> +       ret->prog = prog;
> >> +       atomic_set(&ret->usage, 1);
> >> +
> >> +       /* Intentionally don't bpf_prog_put() here, because the underlying prog
> >> +        * is refcounted too and we're holding a reference from the struct
> >> +        * seccomp_filter object.
> >> +        */
> >> +
> >> +       return ret;
> >> +}
> >> +#else
> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
> >> +{
> >> +       return ERR_PTR(-EINVAL);
> >> +}
> >> +#endif
> >>  #endif /* CONFIG_SECCOMP_FILTER */
> >>
> >>  /*
> >> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags,
> >>         if (flags & ~SECCOMP_FILTER_FLAG_MASK)
> >>                 return -EINVAL;
> >>
> >> +       /*
> >> +        * Installing a seccomp filter requires that the task has
> >> +        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> >> +        * This avoids scenarios where unprivileged tasks can affect the
> >> +        * behavior of privileged children.
> >> +        */
> >> +       if (!task_no_new_privs(current) &&
> >> +           security_capable_noaudit(current_cred(), current_user_ns(),
> >> +                                    CAP_SYS_ADMIN) != 0)
> >> +               return -EACCES;
> >> +
> >>         /* Prepare the new filter before holding any locks. */
> >> -       prepared = seccomp_prepare_user_filter(filter);
> >> +       if (flags & SECCOMP_FILTER_FLAG_EBPF)
> >> +               prepared = seccomp_prepare_ebpf(filter);
> >> +       else
> >> +               prepared = seccomp_prepare_user_filter(filter);
> >> +
> >>         if (IS_ERR(prepared))
> >>                 return PTR_ERR(prepared);
> >>
> >> --
> >> 2.1.4
> >>
> > 
> > 
> > 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-08 13:40             ` Tycho Andersen
@ 2015-09-09  0:07               ` Kees Cook
       [not found]                 ` <CAGXu5jKS0yX92XXhL6ZkqMrxkqFpPyyBd7wbsvEEx4rqZ0VG6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2015-09-09  0:07 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Michael Kerrisk (man-pages), Linux API, Alexei Starovoitov,
	Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, LKML, Network Development

On Tue, Sep 8, 2015 at 6:40 AM, Tycho Andersen
<tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> On Sat, Sep 05, 2015 at 09:13:02AM +0200, Michael Kerrisk (man-pages) wrote:
>> On 09/04/2015 10:41 PM, Kees Cook wrote:
>> > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> > <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>> >> This is the final bit needed to support seccomp filters created via the bpf
>> >> syscall.
>>
>> Hmm. Thanks Kees, for CCinf linux-api@. That really should have been done at
>> the outset.
>
> Apologies, I'll cc the list on future versions.
>
>> Tycho, where's the man-pages patch describing this new kernel-userspace
>> API feature? :-)
>
> Once we get the API finalized I'm happy to write it.
>
>> >> One concern with this patch is exactly what the interface should look like
>> >> for users, since seccomp()'s second argument is a pointer, we could ask
>> >> people to pass a pointer to the fd, but implies we might write to it which
>> >> seems impolite. Right now we cast the pointer (and force the user to cast
>> >> it), which generates ugly warnings. I'm not sure what the right answer is
>> >> here.
>> >>
>> >> Signed-off-by: Tycho Andersen <tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
>> >> CC: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>> >> CC: Will Drewry <wad-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>> >> CC: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> >> CC: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
>> >> CC: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
>> >> CC: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
>> >> CC: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> >> CC: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
>> >> ---
>> >>  include/linux/seccomp.h      |  3 +-
>> >>  include/uapi/linux/seccomp.h |  1 +
>> >>  kernel/seccomp.c             | 70 ++++++++++++++++++++++++++++++++++++--------
>> >>  3 files changed, 61 insertions(+), 13 deletions(-)
>> >>
>> >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> >> index d1a86ed..a725dd5 100644
>> >> --- a/include/linux/seccomp.h
>> >> +++ b/include/linux/seccomp.h
>> >> @@ -3,7 +3,8 @@
>> >>
>> >>  #include <uapi/linux/seccomp.h>
>> >>
>> >> -#define SECCOMP_FILTER_FLAG_MASK       (SECCOMP_FILTER_FLAG_TSYNC)
>> >> +#define SECCOMP_FILTER_FLAG_MASK       (\
>> >> +       SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
>> >>
>> >>  #ifdef CONFIG_SECCOMP
>> >>
>> >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
>> >> index 0f238a4..c29a423 100644
>> >> --- a/include/uapi/linux/seccomp.h
>> >> +++ b/include/uapi/linux/seccomp.h
>> >> @@ -16,6 +16,7 @@
>> >>
>> >>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>> >>  #define SECCOMP_FILTER_FLAG_TSYNC      1
>> >> +#define SECCOMP_FILTER_FLAG_EBPF       (1 << 1)
>> >>
>> >>  /*
>> >>   * All BPF programs must return a 32-bit value.
>> >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> >> index a2c5b32..9c6bea6 100644
>> >> --- a/kernel/seccomp.c
>> >> +++ b/kernel/seccomp.c
>> >> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
>> >>
>> >>         BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>> >>
>> >> -       /*
>> >> -        * Installing a seccomp filter requires that the task has
>> >> -        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
>> >> -        * This avoids scenarios where unprivileged tasks can affect the
>> >> -        * behavior of privileged children.
>> >> -        */
>> >> -       if (!task_no_new_privs(current) &&
>> >> -           security_capable_noaudit(current_cred(), current_user_ns(),
>> >> -                                    CAP_SYS_ADMIN) != 0)
>> >> -               return ERR_PTR(-EACCES);
>> >> -
>> >>         /* Allocate a new seccomp_filter */
>> >>         sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
>> >>         if (!sfilter)
>> >> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int reason)
>> >>         info.si_syscall = syscall;
>> >>         force_sig_info(SIGSYS, &info, current);
>> >>  }
>> >> +
>> >> +#ifdef CONFIG_BPF_SYSCALL
>> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
>> >> +{
>> >> +       /* XXX: this cast generates a warning. should we make people pass in
>> >> +        * &fd, or is there some nicer way of doing this?
>> >> +        */
>> >> +       u32 fd = (u32) filter;
>> >
>> > I think this is probably the right way to do it, modulo getting the
>> > warning fixed. Let me invoke the great linux-api subscribers to get
>> > some more opinions.
>>
>> Sigh. It's sad, but the using a cast does seem the simplest option.
>> But, how about another idea...
>>
>> > tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
>> > pointer argument into an fd argument. Is this sane, should it be a
>> > pointer to an fd, or should it not be a flag at all, creating a new
>> > seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?
>>
>> What about
>>
>>     seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp)
>>
>> Where structp is a pointer to something like
>>
>> struct seccomp_ebpf {
>>     int size;      /* Size of this whole struct */
>>     int fd;
>> }
>>
>> 'size' allows for future expansion of the struct (in case we want to
>> expand it later), and placing 'fd' inside a struct avoids unpleasant
>> implication that would be made by passing a pointer to an fd as the
>> third argument.
>
> I like this; although perhaps something like bpf() has, with the
> unions inside the struct so that we don't have to declare another
> struct if we add another seccomp command. Kees?

Yeah, bpf's union looks good. Let's add a "command" flag, though:

seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size);

And this cmd could be ADD_FD or something?

How's that look?

-Kees

>
> Tycho
>
>> Cheers,
>>
>> Michael
>>
>>
>> > -Kees
>> >
>> >> +       struct seccomp_filter *ret;
>> >> +       struct bpf_prog *prog;
>> >> +
>> >> +       prog = bpf_prog_get(fd);
>> >> +       if (IS_ERR(prog))
>> >> +               return (struct seccomp_filter *) prog;
>> >> +
>> >> +       if (prog->type != BPF_PROG_TYPE_SECCOMP) {
>> >> +               bpf_prog_put(prog);
>> >> +               return ERR_PTR(-EINVAL);
>> >> +       }
>> >> +
>> >> +       ret = kzalloc(sizeof(*ret), GFP_KERNEL | __GFP_NOWARN);
>> >> +       if (!ret) {
>> >> +               bpf_prog_put(prog);
>> >> +               return ERR_PTR(-ENOMEM);
>> >> +       }
>> >> +
>> >> +       ret->prog = prog;
>> >> +       atomic_set(&ret->usage, 1);
>> >> +
>> >> +       /* Intentionally don't bpf_prog_put() here, because the underlying prog
>> >> +        * is refcounted too and we're holding a reference from the struct
>> >> +        * seccomp_filter object.
>> >> +        */
>> >> +
>> >> +       return ret;
>> >> +}
>> >> +#else
>> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user *filter)
>> >> +{
>> >> +       return ERR_PTR(-EINVAL);
>> >> +}
>> >> +#endif
>> >>  #endif /* CONFIG_SECCOMP_FILTER */
>> >>
>> >>  /*
>> >> @@ -775,8 +806,23 @@ static long seccomp_set_mode_filter(unsigned int flags,
>> >>         if (flags & ~SECCOMP_FILTER_FLAG_MASK)
>> >>                 return -EINVAL;
>> >>
>> >> +       /*
>> >> +        * Installing a seccomp filter requires that the task has
>> >> +        * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
>> >> +        * This avoids scenarios where unprivileged tasks can affect the
>> >> +        * behavior of privileged children.
>> >> +        */
>> >> +       if (!task_no_new_privs(current) &&
>> >> +           security_capable_noaudit(current_cred(), current_user_ns(),
>> >> +                                    CAP_SYS_ADMIN) != 0)
>> >> +               return -EACCES;
>> >> +
>> >>         /* Prepare the new filter before holding any locks. */
>> >> -       prepared = seccomp_prepare_user_filter(filter);
>> >> +       if (flags & SECCOMP_FILTER_FLAG_EBPF)
>> >> +               prepared = seccomp_prepare_ebpf(filter);
>> >> +       else
>> >> +               prepared = seccomp_prepare_user_filter(filter);
>> >> +
>> >>         if (IS_ERR(prepared))
>> >>                 return PTR_ERR(prepared);
>> >>
>> >> --
>> >> 2.1.4
>> >>
>> >
>> >
>> >
>>
>>
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 7+ messages in thread

[parent not found: <CAGXu5jKS0yX92XXhL6ZkqMrxkqFpPyyBd7wbsvEEx4rqZ0VG6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
       [not found]                 ` <CAGXu5jKS0yX92XXhL6ZkqMrxkqFpPyyBd7wbsvEEx4rqZ0VG6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-09 14:47                   ` Tycho Andersen
  2015-09-09 15:14                     ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Tycho Andersen @ 2015-09-09 14:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: Michael Kerrisk (man-pages), Linux API, Alexei Starovoitov,
	Will Drewry, Oleg Nesterov, Andy Lutomirski, Pavel Emelyanov,
	Serge E. Hallyn, Daniel Borkmann, LKML, Network Development

On Tue, Sep 08, 2015 at 05:07:03PM -0700, Kees Cook wrote:
>
> Yeah, bpf's union looks good. Let's add a "command" flag, though:
> 
> seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size);
> 
> And this cmd could be ADD_FD or something?
> 
> How's that look?

I think we can drop the size (using the same strategy as bpf() and
checking for zeroes at the end), and keep the same signature for
seccomp(); so:

seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_ADD_BPF_FD, &union)

Yes, I'll use this in the next version.

Tycho

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
  2015-09-09 14:47                   ` Tycho Andersen
@ 2015-09-09 15:14                     ` Alexei Starovoitov
       [not found]                       ` <20150909151402.GA3429-2RGepAHry04KGsCuBW9QBvb0xQGhdpdCAL8bYrjMMd8@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2015-09-09 15:14 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Kees Cook, Michael Kerrisk (man-pages), Linux API,
	Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Wed, Sep 09, 2015 at 08:47:24AM -0600, Tycho Andersen wrote:
> On Tue, Sep 08, 2015 at 05:07:03PM -0700, Kees Cook wrote:
> >
> > Yeah, bpf's union looks good. Let's add a "command" flag, though:
> > 
> > seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size);
> > 
> > And this cmd could be ADD_FD or something?
> > 
> > How's that look?
> 
> I think we can drop the size (using the same strategy as bpf() and
> checking for zeroes at the end), and keep the same signature for
> seccomp(); so:
> 
> seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_ADD_BPF_FD, &union)
> 
> Yes, I'll use this in the next version.

actually bpf() has size as the last argument:
SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
perf_event_open() doesn't and size is embedded as one of the fields.
Both approaches are equivally powerfull from extensitiblity
point of view. My preference was to keep size as an explicit
argument.

^ permalink raw reply	[flat|nested] 7+ messages in thread

[parent not found: <20150909151402.GA3429-2RGepAHry04KGsCuBW9QBvb0xQGhdpdCAL8bYrjMMd8@public.gmane.org>]

* Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd
       [not found]                       ` <20150909151402.GA3429-2RGepAHry04KGsCuBW9QBvb0xQGhdpdCAL8bYrjMMd8@public.gmane.org>
@ 2015-09-09 15:55                         ` Tycho Andersen
  0 siblings, 0 replies; 7+ messages in thread
From: Tycho Andersen @ 2015-09-09 15:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, Michael Kerrisk (man-pages), Linux API,
	Alexei Starovoitov, Will Drewry, Oleg Nesterov, Andy Lutomirski,
	Pavel Emelyanov, Serge E. Hallyn, Daniel Borkmann, LKML,
	Network Development

On Wed, Sep 09, 2015 at 08:14:04AM -0700, Alexei Starovoitov wrote:
> On Wed, Sep 09, 2015 at 08:47:24AM -0600, Tycho Andersen wrote:
> > On Tue, Sep 08, 2015 at 05:07:03PM -0700, Kees Cook wrote:
> > >
> > > Yeah, bpf's union looks good. Let's add a "command" flag, though:
> > > 
> > > seccomp(SECCOMP_MODE_FILTER_EBPF, int cmd, union, size);
> > > 
> > > And this cmd could be ADD_FD or something?
> > > 
> > > How's that look?
> > 
> > I think we can drop the size (using the same strategy as bpf() and
> > checking for zeroes at the end), and keep the same signature for
> > seccomp(); so:
> > 
> > seccomp(SECCOMP_MODE_FILTER_EBPF, SECCOMP_ADD_BPF_FD, &union)
> > 
> > Yes, I'll use this in the next version.
> 
> actually bpf() has size as the last argument:
> SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
> perf_event_open() doesn't and size is embedded as one of the fields.
> Both approaches are equivally powerfull from extensitiblity
> point of view. My preference was to keep size as an explicit
> argument.

Yep, sorry that was poorly written. I meant keeping the size as a
member of the struct as Michael originally suggested, mostly to avoid
having to change the signature of seccomp().

Tycho

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-09-09 15:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1441382664-17437-1-git-send-email-tycho.andersen@canonical.com>
     [not found] ` <1441382664-17437-6-git-send-email-tycho.andersen@canonical.com>
     [not found]   ` <1441382664-17437-6-git-send-email-tycho.andersen-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
2015-09-04 20:41     ` [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd Kees Cook
     [not found]       ` <CAGXu5jKke44txdYqEgPRrkn8SyWGjJuHxT2qMdq2ztp_16mQyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-05  7:13         ` Michael Kerrisk (man-pages)
     [not found]           ` <55EA95FE.7000006-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-09-08 13:40             ` Tycho Andersen
2015-09-09  0:07               ` Kees Cook
     [not found]                 ` <CAGXu5jKS0yX92XXhL6ZkqMrxkqFpPyyBd7wbsvEEx4rqZ0VG6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-09 14:47                   ` Tycho Andersen
2015-09-09 15:14                     ` Alexei Starovoitov
     [not found]                       ` <20150909151402.GA3429-2RGepAHry04KGsCuBW9QBvb0xQGhdpdCAL8bYrjMMd8@public.gmane.org>
2015-09-09 15:55                         ` Tycho Andersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).