From: sdf@google.com
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net,
ast@kernel.org, daniel@iogearbox.net,
David Laight <David.Laight@ACULAB.COM>
Subject: Re: [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
Date: Wed, 17 Jun 2020 10:45:08 -0700
Message-ID: <20200617174508.GA246265@google.com>
In-Reply-To: <20200617170909.koev3t5fmngla3c4@ast-mbp.dhcp.thefacebook.com>
On 06/17, Alexei Starovoitov wrote:
> On Tue, Jun 16, 2020 at 06:04:14PM -0700, Stanislav Fomichev wrote:
> > Attaching to these hooks can break iptables because its optval is
> > usually quite big, or at least bigger than the current PAGE_SIZE limit.
> > David also mentioned some SCTP options can be big (around 256k).
> >
> > For such optvals we expose only the first PAGE_SIZE bytes to
> > the BPF program. BPF program has two options:
> > 1. Set ctx->optlen to 0 to indicate that the BPF's optval
> > should be ignored and the kernel should use original userspace
> > value.
> > 2. Set ctx->optlen to something that's smaller than the PAGE_SIZE.
> >
> > v5:
> > * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
> > * update the docs accordingly
> >
> > v4:
> > * use temporary buffer to avoid optval == optval_end == NULL;
> > this removes the corner case in the verifier that might assume
> > non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.
> >
> > v3:
> > * don't increase the limit, bypass the argument
> >
> > v2:
> > * proper comments formatting (Jakub Kicinski)
> >
> > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > Cc: David Laight <David.Laight@ACULAB.COM>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> > kernel/bpf/cgroup.c | 53 ++++++++++++++++++++++++++++-----------------
> > 1 file changed, 33 insertions(+), 20 deletions(-)
> >
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index 4d76f16524cc..ac53102e244a 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -1276,16 +1276,23 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
> >
> > static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
> > {
> > - if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> > + if (unlikely(max_optlen < 0))
> > return -EINVAL;
> >
> > + if (unlikely(max_optlen > PAGE_SIZE)) {
> > + /* We don't expose optvals that are greater than PAGE_SIZE
> > + * to the BPF program.
> > + */
> > + max_optlen = PAGE_SIZE;
> > + }
> > +
> > ctx->optval = kzalloc(max_optlen, GFP_USER);
> > if (!ctx->optval)
> > return -ENOMEM;
> >
> > ctx->optval_end = ctx->optval + max_optlen;
> >
> > - return 0;
> > + return max_optlen;
> > }
> >
> > static void sockopt_free_buf(struct bpf_sockopt_kern *ctx)
> > @@ -1319,13 +1326,13 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
> > */
> > max_optlen = max_t(int, 16, *optlen);
> >
> > - ret = sockopt_alloc_buf(&ctx, max_optlen);
> > - if (ret)
> > - return ret;
> > + max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
> > + if (max_optlen < 0)
> > + return max_optlen;
> >
> > ctx.optlen = *optlen;
> >
> > - if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
> > + if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
> > ret = -EFAULT;
> > goto out;
> > }
> > @@ -1353,8 +1360,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
> > /* export any potential modifications */
> > *level = ctx.level;
> > *optname = ctx.optname;
> > - *optlen = ctx.optlen;
> > - *kernel_optval = ctx.optval;
> > +
> > + /* optlen == 0 from BPF indicates that we should
> > + * use original userspace data.
> > + */
> > + if (ctx.optlen != 0) {
> > + *optlen = ctx.optlen;
> I think it should be:
> *optlen = min(ctx.optlen, max_optlen);

We do have the following (existing) check above:

	} else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
		/* optlen is out of bounds */
		ret = -EFAULT;
	} else {

So we shouldn't need any min() here? Or am I missing something?

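To illustrate the check quoted above, here is a minimal user-space sketch of how the setsockopt hook would resolve ctx.optlen after the BPF program has run, assuming a 4096-byte page. The names model_alloc_len, model_setsockopt, and the sockopt_outcome enum are hypothetical and exist only for illustration; this is not the kernel code. The -1 case, which the `< -1` bound suggests is handled by an earlier branch not quoted here, is deliberately left out.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Hypothetical outcomes of the hook in this simplified model. */
enum sockopt_outcome {
	USE_USER_BUFFER, /* BPF set optlen to 0: keep the original userspace optval */
	USE_BPF_BUFFER,  /* kernel uses the (possibly trimmed) BPF buffer */
	REJECT_EFAULT,   /* optlen is out of bounds */
};

/* Mirrors the clamp added to sockopt_alloc_buf(): returns the buffer
 * size that would be allocated, or -EINVAL for a negative length.
 */
static int model_alloc_len(int max_optlen)
{
	if (max_optlen < 0)
		return -EINVAL;
	if (max_optlen > MODEL_PAGE_SIZE)
		max_optlen = MODEL_PAGE_SIZE;
	return max_optlen;
}

/* user_optlen: length passed by userspace.
 * bpf_optlen:  value of ctx.optlen after the BPF program ran
 *              (must not be -1, which this model does not cover).
 */
static enum sockopt_outcome model_setsockopt(int user_optlen, int bpf_optlen)
{
	/* Same as max_t(int, 16, *optlen) followed by the clamp. */
	int max_optlen = model_alloc_len(user_optlen < 16 ? 16 : user_optlen);

	assert(max_optlen >= 0);
	assert(bpf_optlen != -1);

	if (bpf_optlen > max_optlen || bpf_optlen < -1)
		return REJECT_EFAULT;
	if (bpf_optlen == 0)
		return USE_USER_BUFFER;
	return USE_BPF_BUFFER;
}
```

In this model, a two-page optval that the program leaves untouched (bpf_optlen == 8192) fails the bounds check and is rejected, so it never reaches the kernel handler with a length larger than the trimmed buffer.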
> Otherwise when bpf prog doesn't adjust ctx.optlen the kernel will see
> 4k only in kernel_optval whereas optlen will be > 4k.
> I suspect iptables sockopt should have crashed at this point.
> How did you test it?

With the selftests that I've attached in the series. The test passes
a two-page optval: for IP_TOS the BPF program bypasses its buffer via
optlen=0, and for IP_FREEBIND it trims the buffer to 1 byte. I think
this should cover the check here.
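
As a rough picture of what that test exercises, the sketch below models the two cases in plain user-space C. Everything here is hypothetical (struct model_ctx, model_ctx_init, model_prog, the MODEL_* names, the 4096-byte page, and the 0x55 byte written for the trim case); it is not the selftest itself, only an approximation of the behavior described above.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Hypothetical stand-ins for the two options used in the test. */
enum model_optname { MODEL_IP_TOS, MODEL_IP_FREEBIND };

struct model_ctx {
	unsigned char buf[MODEL_PAGE_SIZE]; /* what the BPF program can see */
	int visible;                        /* bytes actually copied in */
	int optlen;                         /* starts as the userspace optlen */
};

/* Copy at most one page of the userspace value, but keep the full
 * userspace length in optlen, as the patch does for setsockopt.
 */
static void model_ctx_init(struct model_ctx *ctx,
			   const unsigned char *user, int user_optlen)
{
	assert(user_optlen >= 0);
	ctx->visible = user_optlen < MODEL_PAGE_SIZE ? user_optlen
						     : MODEL_PAGE_SIZE;
	memset(ctx->buf, 0, sizeof(ctx->buf));
	memcpy(ctx->buf, user, (size_t)ctx->visible);
	ctx->optlen = user_optlen;
}

/* Stand-in for the BPF program: bypass for IP_TOS, trim for IP_FREEBIND. */
static void model_prog(struct model_ctx *ctx, enum model_optname name)
{
	switch (name) {
	case MODEL_IP_TOS:
		/* optlen = 0: kernel should use the original userspace value. */
		ctx->optlen = 0;
		break;
	case MODEL_IP_FREEBIND:
		/* Rewrite the first byte and trim the buffer to 1 byte. */
		ctx->buf[0] = 0x55; /* illustrative value only */
		ctx->optlen = 1;
		break;
	}
}
```

Either way the program ends with an optlen that fits in the trimmed buffer, which is the condition the existing bounds check enforces.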

One thing I didn't really test is getsockopt when the kernel
returns a really large buffer (as with iptables). Right now, the test
gets only 4 bytes (trimmed) from the kernel. I think that's the only
path that I didn't properly test. I wonder whether I should
add a real iptables-like setsockopt/getsockopt test :-/