All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@fomichev.me>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Stanislav Fomichev <sdf@google.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>, Martin Lau <kafai@fb.com>
Subject: Re: [PATCH bpf-next v7 1/9] bpf: implement getsockopt and setsockopt hooks
Date: Wed, 19 Jun 2019 15:20:32 -0700	[thread overview]
Message-ID: <20190619222032.GC19111@mini-arch> (raw)
In-Reply-To: <CAEf4BzbTfCG3Mk9=78=Kh5u4ofxdYePm=xVEcN8g38cXQ_gq6Q@mail.gmail.com>

On 06/19, Andrii Nakryiko wrote:
> On Wed, Jun 19, 2019 at 1:17 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 06/19, Andrii Nakryiko wrote:
> > > On Wed, Jun 19, 2019 at 10:00 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > >
> > > > Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
> > > > BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.
> > > >
> > > > BPF_CGROUP_SETSOCKOPT get a read-only view of the setsockopt arguments.
> > > > BPF_CGROUP_GETSOCKOPT can modify the supplied buffer.
> > > > Both of them reuse existing PTR_TO_PACKET{,_END} infrastructure.
> > > >
> > > > The buffer memory is pre-allocated (because I don't think there is
> > > > a precedent for working with __user memory from bpf). This might be
> > > > slow to do for each {s,g}etsockopt call, that's why I've added
> > > > __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
> > > > attached to a cgroup. Note, however, that there is a race between
> > > > __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
> > > > program layout might have changed; this should not be a problem
> > > > because in general there is a race between multiple calls to
> > > > {s,g}etsocktop and user adding/removing bpf progs from a cgroup.
> > > >
> > > > The return code of the BPF program is handled as follows:
> > > > * 0: EPERM
> > > > * 1: success, continue with next BPF program in the cgroup chain
> > > >
> > > > v7:
> > > > * return only 0 or 1 (Alexei Starovoitov)
> > > > * always run all progs (Alexei Starovoitov)
> > > > * use optval=0 as kernel bypass in setsockopt (Alexei Starovoitov)
> > > >   (decided to use optval=-1 instead, optval=0 might be a valid input)
> > > > * call getsockopt hook after kernel handlers (Alexei Starovoitov)
> > > >
> > > > v6:
> > > > * rework cgroup chaining; stop as soon as bpf program returns
> > > >   0 or 2; see patch with the documentation for the details
> > > > * drop Andrii's and Martin's Acked-by (not sure they are comfortable
> > > >   with the new state of things)
> > >
> > > I like the general approach, just overall unclear about seemingly
> > > artificial restrictions I mentioned below.
> > >
> > > >
> > > > v5:
> > > > * skip copy_to_user() and put_user() when ret == 0 (Martin Lau)
> > > >
> > > > v4:
> > > > * don't export bpf_sk_fullsock helper (Martin Lau)
> > > > * size != sizeof(__u64) for uapi pointers (Martin Lau)
> > > > * offsetof instead of bpf_ctx_range when checking ctx access (Martin Lau)
> > > >
> > > > v3:
> > > > * typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
> > > > * reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
> > > >   Nakryiko)
> > > > * use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
> > > > * use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
> > > > * new CG_SOCKOPT_ACCESS macro to wrap repeated parts
> > > >
> > > > v2:
> > > > * moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
> > > > * aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau)
> > > > * bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
> > > > * added [0,2] return code check to verifier (Martin Lau)
> > > > * dropped unused buf[64] from the stack (Martin Lau)
> > > > * use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau)
> > > > * dropped bpf_target_off from ctx rewrites (Martin Lau)
> > > > * use return code for kernel bypass (Martin Lau & Andrii Nakryiko)
> > > >
> > > > Cc: Martin Lau <kafai@fb.com>
> > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > ---
> > >
> > > <snip>
> > >
> > > >
> > > > +struct bpf_sockopt_kern {
> > > > +       struct sock     *sk;
> > > > +       u8              *optval;
> > > > +       u8              *optval_end;
> > > > +       s32             level;
> > > > +       s32             optname;
> > > > +       u32             optlen;
> > >
> > > Optlen is used below as signed integer, so switch it to s32?
> > Good catch, should be s32 here and below, thanks!
> >
> > > > +       s32             retval;
> > > > +
> > > > +       /* Small on-stack optval buffer to avoid small allocations.
> > > > +        */
> > > > +       u8 buf[64] __aligned(8);
> > > > +};
> > > > +
> > >
> > > <snip>
> > >
> > > >
> > > > +struct bpf_sockopt {
> > > > +       __bpf_md_ptr(struct bpf_sock *, sk);
> > > > +       __bpf_md_ptr(void *, optval);
> > > > +       __bpf_md_ptr(void *, optval_end);
> > > > +
> > > > +       __s32   level;
> > > > +       __s32   optname;
> > > > +       __u32   optlen;
> > >
> > > Same as above, we expect BPF program to be able to set it to -1, so __s32?
> > >
> > > > +       __s32   retval;
> > > > +};
> > > > +
> > > >  #endif /* _UAPI__LINUX_BPF_H__ */
> > > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > >
> > > <snip>
> > >
> > > > +
> > > > +       if (ctx.optlen == -1)
> > > > +               /* optlen set to -1, bypass kernel */
> > > > +               ret = 1;
> > > > +       else if (ctx.optlen == optlen)
> > > > +               /* optlen not changed, run kernel handler */
> > > > +               ret = 0;
> > > > +       else
> > > > +               /* any other value is rejected */
> > > > +               ret = -EFAULT;
> > >
> > > I'm consufed about this assymetry between getsockopt and setsockopt
> > > behavior. Why we are disallowing setsockopt from changing optlen (and
> > > value itself)? Is there any harm in allowing that? Imagining some use
> > > case that provides transparent "support" for some option, you'd need
> > > to be able to intercept and provide custom values both for setsockopt
> > > and getsockopt. So unless I'm missing some security implications, why
> > > not make both sides able to write?
> > Because kernel setsockopt handlers use get_user to read the data. We
> > can definitely allow changing optval+optlen, but we'd have to copy
> > that data back to userspace to let kernel handle it. I'm not sure how
> > userspace might feel about it. Can it be a buffer in the readonly
> > elf section?
> 
> Ah, ok, now I see why :) Yeah, I guess it can be in read-only section.
> Alright, I don't see an easy solution to that, I guess we can live
> with that for now.
> 
> >
> > > Similar will apply w.r.t. retval, why can't setsockopt return EINVAL
> > > to reject some options? This seems very useful and very similar to
> > > what sysctl BPF hooks do.
> > I was just being defensive because I'm not sure what's the use-case.
> > We can already return EPERM, why do we need to return a different
> > error code? Are we comfortable letting progs return arbitrary number?
> > Or you just want to allow a bunch of pre-defined error codes?
> >
> > I haven't seen the ability to return arbitrary error from the sysctl
> > hooks, but maybe I didn't look hard enough.
> 
> Yeah, seems like sysctl is only 0 or EPERM. I missed for a moment that
> there is return value from BPF program and retval from the context. I
> think it's good enough as is.
> 
> >
> > > > +
> > > > +out:
> > > > +       sockopt_free_buf(&ctx);
> > > > +       return ret;
> > > > +}
> > > > +EXPORT_SYMBOL(__cgroup_bpf_run_filter_setsockopt);
> > > > +
> > > > +int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
> > > > +                                      int optname, char __user *optval,
> > > > +                                      int __user *optlen, int max_optlen,
> > > > +                                      int retval)
> > > > +{
> > >
> > > <snip>
> > >
> > > > +
> > > > +       if (ctx.optlen > max_optlen) {
> > > > +               ret = -EFAULT;
> > > > +               goto out;
> > > > +       }
> > > > +
> > > > +       /* BPF programs only allowed to set retval to 0, not some
> > > > +        * arbitrary value.
> > > > +        */
> > > > +       if (ctx.retval != 0 && ctx.retval != retval) {
> > >
> > > Lookin at manpage of getsockopt, seems like at least two error codes
> > > are relevant and generally useful for BPF program to be able to
> > > return: EINVAL and ENOPROTOOPT? Why we are disallowing anything but 0
> > > (or preserving original retval)?
> > I was thinking about simple use-case where it's either BPF that
> > handles the opt or the kernel. And then it's BFP returning success or
> > EPERM. I don't think I understand why BPF needs to be able to
> > return different error codes. We can certainly do that if you think
> > that it makes sense; alternatively, we can start with 0 or kernel retval
> > and relax the requirements if someone really needs that in the future.
> >
> > (I don't have a strong opinion here tbh).
> 
> As replied above, EPERM is probably good enough for practical
> purposes, I was being a bit pedantic :)
Sounds good! I was also debating whether to allow BPF programs
to set arbitrary retval, but didn't find any good example on
why we need it :-)

> > > > +               ret = -EFAULT;
> > > > +               goto out;
> > > > +       }
> > > > +
> > > > +       if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
> > > > +           put_user(ctx.optlen, optlen)) {
> > > > +               ret = -EFAULT;
> > > > +               goto out;
> > > > +       }
> > > > +
> > > > +       ret = ctx.retval;
> > > > +
> > > > +out:
> > > > +       sockopt_free_buf(&ctx);
> > > > +       return ret;
> > > > +}
> > > > +EXPORT_SYMBOL(__cgroup_bpf_run_filter_getsockopt);
> > > > +
> > >
> > > <snip>

  reply	other threads:[~2019-06-19 22:20 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-19 16:59 [PATCH bpf-next v7 0/9] bpf: getsockopt and setsockopt hooks Stanislav Fomichev
2019-06-19 16:59 ` [PATCH bpf-next v7 1/9] bpf: implement " Stanislav Fomichev
2019-06-19 19:31   ` Andrii Nakryiko
2019-06-19 20:17     ` Stanislav Fomichev
2019-06-19 21:45       ` Andrii Nakryiko
2019-06-19 22:20         ` Stanislav Fomichev [this message]
2019-06-21 17:11   ` kbuild test robot
2019-06-19 16:59 ` [PATCH bpf-next v7 2/9] bpf: sync bpf.h to tools/ Stanislav Fomichev
2019-06-19 16:59 ` [PATCH bpf-next v7 3/9] libbpf: support sockopt hooks Stanislav Fomichev
2019-06-19 16:59 ` [PATCH bpf-next v7 4/9] selftests/bpf: test sockopt section name Stanislav Fomichev
2019-06-19 16:59 ` [PATCH bpf-next v7 5/9] selftests/bpf: add sockopt test Stanislav Fomichev
2019-06-19 16:59 ` [PATCH bpf-next v7 6/9] selftests/bpf: add sockopt test that exercises sk helpers Stanislav Fomichev
2019-06-19 16:59 ` [PATCH bpf-next v7 7/9] selftests/bpf: add sockopt test that exercises BPF_F_ALLOW_MULTI Stanislav Fomichev
2019-06-19 16:59 ` [PATCH bpf-next v7 8/9] bpf: add sockopt documentation Stanislav Fomichev
2019-06-19 16:59 ` [PATCH bpf-next v7 9/9] bpftool: support cgroup sockopt Stanislav Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190619222032.GC19111@mini-arch \
    --to=sdf@fomichev.me \
    --cc=andrii.nakryiko@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.