From: Stanislav Fomichev <sdf@fomichev.me>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Stanislav Fomichev <sdf@google.com>,
Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>, Martin Lau <kafai@fb.com>,
Andrii Nakryiko <andriin@fb.com>
Subject: Re: [PATCH bpf-next v5 1/8] bpf: implement getsockopt and setsockopt hooks
Date: Thu, 13 Jun 2019 14:54:00 -0700 [thread overview]
Message-ID: <20190613215400.GC9636@mini-arch> (raw)
In-Reply-To: <CAEf4Bza0D6=4a6D1ErpT+nh8_byufOz4qhvBmCsBV9zLFHP0eA@mail.gmail.com>
On 06/13, Andrii Nakryiko wrote:
> On Thu, Jun 13, 2019 at 2:20 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 06/13, Alexei Starovoitov wrote:
> > > C based example doesn't use ret=1.
> > > imo that's a sign that something is odd in the api.
> > I decided not to test ret=1 it because there are tests in the test_sockopt.c
> > for ret=1 usecase. But I can certainly extend C based test to cover
> > ret=1 as well. I agree that C based test can be used as an example,
> > will extend that to cover ret=0/1/2.
> >
> > > In particular ret=1 doesn't prohibit bpf prog to modify the optval.
> > That's a good point, Martin brought that up as well. We were trying
> > to remedy it by doing copy_to_user only if any program returned 2 ("BPF
> > handled that, bypass the kernel"). But I agree, the fact that the prog in
> > the chain can modify optval and return 1 is suboptimal. Especially if
> > the previous one filled in some valid data and returned 2.
> >
> > > Multiple progs can overwrite it and still return 1.
> > > But that optval is not going to be processed by the kernel.
> > > Should we do copy_to_user(optval, ctx.optval, ctx.optlen) here
> > > and let kernel pick it up from there?
> > I was thinking initially about that, that kernel can "transparently"
> > modify user buffer and then kernel (or next BPF program in the chain)
> > can run standard getsockopt on that.
> >
> > But it sounds a bit complicated and I don't really have a good use case
> > for that.
> >
> > > Should bpf prog be allowed to change optlen as well?
> > > ret=1 would mean that bpf prog did something and needs kernel
> > > to continue.
> > >
> > > Now consider a sequence of bpf progs.
> > > Some are doing ret=1. Some others are doing ret=2
> > > ret=2 will supersede.
> > > If first executed prog (child in cgroup) did ret=2
> > > the parent has no way to tell kernel to handle it.
> > > Even if parent does ret=1, it's effectively ignored.
> > > Parent can enforce rejection with ret=0, but it's a weird
> > > discrepancy.
> > > The rule for cgroup progs was 'all yes is yes, any no is no'.
> > My canonical example when reasoning about multiple progs was that each one
> > of them would implement handling for a particular level+optname. So only
> > a single one form the chain would return 2 or 0, the rest would return 1
> > without touching the buffer. I can't come up with a good use-case where
> > two programs in the chain can both return 2 and fill out the buffer.
> > The majority of the sockopts would still be handled by the kernel,
> > we'd have only a handful of bpf progs that handle a tiny subset
> > and delegate the rest to the kernel.
> >
> > How about we stop further processing as soon as some program in the chain
> > returned 2? I think that would address most of the concerns?
>
> What about a case of passive "auditing" BPF programs, that are not
> modifying anything, but want to capture every single
> getsockopt/setsockopt call? This premature stop would render that
> whole approach broken.
In that case you'd attach that program to the root of a cgroup
(sub)tree what you want to audit (and it would be always executed and
would return 1)? And you'd have to attach it first.
> > Maybe, in this case, also stop further processing as soon as
> > we get ret=0 (EPERM) for consistency?
> >
> > > So if ret=1 means 'kernel handles it'. Should it be almost
> > > as strong as 'reject it': any prog doing ret=1 means 'kernel does it'
> > > (unless some prog did ret=0. then reject it) ?
> > > if ret=1 means 'bpf did some and needs kernel to continue' that's
> > > another story.
> > > For ret=2 being 'bpf handled it completely', should parent overwrite it?
> > See above, I was thinking the opposite. Treat ret=1 from the BPF
> > program as "I'm not interested in this level+optname, other BPF
> > program or kernel should do the job". Essentially, as soon as bpf program
> > returns 2, that means BPF had consumed the request and no further processing
> > from neither BPF, nor kernel is requred; we can return to userspace.
> >
> > There is a problem that some prog in the chain might do some
> > "background" work and still return 1, but so far I don't see why
> > that can be useful. The pattern should be: filter the option
> > you want, handle it, otherwise return 1 to let the other progs/kernel
> > run.
> >
> > That BPF_F_ALLOW_MULTI use-case probably deserves another selftest :-/
> >
> > > May be retval from child prog should be seen by parent prog?
> > >
> > > In some sense kernel can be seen as another bpf prog in a sequence.
> > >
> > > Whatever new behavior is with 3 values it needs to be
> > > documented in uapi/bpf.h
> > > We were sloppy with such docs in the past, but that's not
> > > a reason to continue.
> > Good point on documenting that, I was trying to document everything
> > in Documentation/bpf/prog_cgroup_sockopt.rst, uapi/bpf.h seems too
> > constrained (I didn't find a good place to put that ret 1 vs 2 info).
> > Do you think having a file under Documentation/ with all the details
> > is not enough? Where can I put this ret=0/1/2 handing info in the
> > uapi/bpf.h?
next prev parent reply other threads:[~2019-06-13 21:54 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-10 21:08 [PATCH bpf-next v5 0/8] bpf: getsockopt and setsockopt hooks Stanislav Fomichev
2019-06-10 21:08 ` [PATCH bpf-next v5 1/8] bpf: implement " Stanislav Fomichev
2019-06-13 20:16 ` Alexei Starovoitov
2019-06-13 21:20 ` Stanislav Fomichev
2019-06-13 21:50 ` Andrii Nakryiko
2019-06-13 21:54 ` Stanislav Fomichev [this message]
2019-06-14 16:32 ` Stanislav Fomichev
2019-06-10 21:08 ` [PATCH bpf-next v5 2/8] bpf: sync bpf.h to tools/ Stanislav Fomichev
2019-06-10 21:08 ` [PATCH bpf-next v5 3/8] libbpf: support sockopt hooks Stanislav Fomichev
2019-06-10 21:08 ` [PATCH bpf-next v5 4/8] selftests/bpf: test sockopt section name Stanislav Fomichev
2019-06-10 21:08 ` [PATCH bpf-next v5 5/8] selftests/bpf: add sockopt test Stanislav Fomichev
2019-06-10 21:08 ` [PATCH bpf-next v5 6/8] selftests/bpf: add sockopt test that exercises sk helpers Stanislav Fomichev
2019-06-10 21:08 ` [PATCH bpf-next v5 7/8] bpf: add sockopt documentation Stanislav Fomichev
2019-06-10 21:08 ` [PATCH bpf-next v5 8/8] bpftool: support cgroup sockopt Stanislav Fomichev
2019-06-10 21:38 ` [PATCH bpf-next v5 0/8] bpf: getsockopt and setsockopt hooks Martin Lau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190613215400.GC9636@mini-arch \
--to=sdf@fomichev.me \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andriin@fb.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=kafai@fb.com \
--cc=netdev@vger.kernel.org \
--cc=sdf@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.