From: Stanislav Fomichev <sdf@google.com>
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	Stanislav Fomichev <sdf@google.com>,
	David Laight <David.Laight@ACULAB.COM>
Subject: [PATCH bpf v4 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
Date: Tue, 16 Jun 2020 15:52:55 -0700
Message-ID: <20200616225256.246769-1-sdf@google.com>

Attaching BPF programs to the {get,set}sockopt hooks can break iptables
because its optval is usually larger than the current PAGE_SIZE limit.
David also mentioned that some SCTP options can be big (around 256k).

There are two possible ways to fix it:

1. Increase the limit to match iptables max optval. There is, however,
   no clear upper limit. Technically, iptables can accept up to
   512M of data (not sure how practical it is though).
2. Bypass the value (don't expose to BPF) if it's too big and trigger
   BPF only with level/optname so BPF can still decide whether
   to allow/deny big sockopts.

The initial attempt was implemented using strategy #1. Due to the
listed shortcomings, let's switch to strategy #2. When there is
a legitimate use-case for iptables/SCTP, we can consider increasing
the PAGE_SIZE limit.
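
To illustrate strategy #2 from the program's side, here is a minimal
userspace sketch (not part of this patch). struct sockopt_view, the
DEMO_* values and demo_filter() are hypothetical names made up for the
example; only the level/optname/optlen/optval/optval_end fields follow
the hook context. It shows a policy that can still decide on
level/optname when the payload was bypassed, and that touches optval
only after a bounds check:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for what a sockopt program sees (illustrative). */
struct sockopt_view {
	int level;
	int optname;
	int optlen;
	const unsigned char *optval;     /* start of the visible bytes */
	const unsigned char *optval_end; /* one past the last visible byte */
};

/* Hypothetical policy values, not real socket constants. */
enum { DEMO_LEVEL = 1, DEMO_BLOCKED_OPT = 42, DEMO_CHECKED_OPT = 7 };

/* Returns 1 to allow the call and 0 to reject it. */
static int demo_filter(const struct sockopt_view *ctx)
{
	if (ctx->level != DEMO_LEVEL)
		return 1;

	/* level/optname stay available even when optval is bypassed */
	if (ctx->optname == DEMO_BLOCKED_OPT)
		return 0;

	if (ctx->optname == DEMO_CHECKED_OPT) {
		/* Empty window (optval == optval_end): payload was too
		 * large to expose, so fall back to allowing the call.
		 */
		if (ctx->optval + 1 > ctx->optval_end)
			return 1;
		return ctx->optval[0] != 0;
	}

	return 1;
}

/* Usage: an oversized value can still be rejected by name alone. */
static void demo_filter_selfcheck(void)
{
	unsigned char sentinel = 0;
	struct sockopt_view big = {
		DEMO_LEVEL, DEMO_BLOCKED_OPT, 8192, &sentinel, &sentinel
	};

	assert(demo_filter(&big) == 0);
}
```

Whether a program should allow or deny when it cannot see the payload
is a policy choice; the sketch allows by default only to keep the
example short.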

To support the cases where len(optval) > PAGE_SIZE, we can
leverage the upcoming sleepable BPF work by providing a helper
which can do copy_from_user (sleepable) at the given offset
from the original large buffer.
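
As a rough idea of what such an offset-based helper could look like to
its caller, here is a userspace sketch. No such helper exists in this
patch: demo_copy_at_offset() and demo_count_byte() are hypothetical
names, memcpy() stands in for copy_from_user(), and DEMO_CHUNK assumes
a 4 KiB page. The point is only that a large optval can be inspected
one bounded chunk at a time:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CHUNK 4096 /* stands in for PAGE_SIZE (assumed 4 KiB) */

/* Hypothetical helper: copy up to dst_len bytes, starting at offset
 * off of a large source buffer, into a small bounce buffer.
 * Returns the number of bytes copied (0 at the end of the source)
 * or -1 when off lies beyond the source.
 */
static long demo_copy_at_offset(void *dst, size_t dst_len,
				const void *src, size_t src_len, size_t off)
{
	size_t n;

	if (off > src_len)
		return -1;

	n = src_len - off;
	if (n > dst_len)
		n = dst_len;

	memcpy(dst, (const unsigned char *)src + off, n);
	return (long)n;
}

/* Walk an oversized optval chunk by chunk and count matching bytes. */
static size_t demo_count_byte(const void *optval, size_t optlen,
			      unsigned char needle)
{
	unsigned char chunk[DEMO_CHUNK];
	size_t off = 0, hits = 0;
	long n;

	while ((n = demo_copy_at_offset(chunk, sizeof(chunk),
					optval, optlen, off)) > 0) {
		for (long i = 0; i < n; i++) {
			if (chunk[i] == needle)
				hits++;
		}
		off += (size_t)n;
	}

	return hits;
}
```

The caller never needs a buffer larger than one chunk, which is the
property that would let the PAGE_SIZE limit on the exposed window stay
in place.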

v4:
* use temporary buffer to avoid optval == optval_end == NULL;
  this removes the corner case in the verifier that might assume
  non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.

v3:
* don't increase the limit, bypass the argument

v2:
* proper comments formatting (Jakub Kicinski)
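
The v4 note about the temporary buffer can be modelled in userspace as
follows. This is a sketch under stated assumptions, not the kernel
code: the demo_* names and DEMO_PAGE_SIZE are made up, calloc()/free()
stand in for kzalloc()/kfree(), and setting optval_end to
optval + max_optlen in the normal path is assumed from context. It
shows how pointing both ends at a one-byte field yields an empty but
non-NULL window that the free path can recognise:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096 /* assumed page size for the model */

struct demo_sockopt_ctx {
	unsigned char *optval;
	unsigned char *optval_end;
	unsigned char optval_too_large; /* storage for the sentinel */
};

static int demo_alloc_buf(struct demo_sockopt_ctx *ctx, int max_optlen)
{
	if (max_optlen < 0)
		return -EINVAL;

	if (max_optlen > DEMO_PAGE_SIZE) {
		/* Empty window [p, p) with p != NULL: nothing exposed. */
		ctx->optval = &ctx->optval_too_large;
		ctx->optval_end = &ctx->optval_too_large;
		return 0;
	}

	/* calloc(1, 0) may return NULL, so request at least one byte. */
	ctx->optval = calloc(1, max_optlen ? (size_t)max_optlen : 1);
	if (!ctx->optval)
		return -ENOMEM;

	ctx->optval_end = ctx->optval + max_optlen;
	return 0;
}

static int demo_has_optval(const struct demo_sockopt_ctx *ctx)
{
	return ctx->optval != &ctx->optval_too_large;
}

static void demo_free_buf(struct demo_sockopt_ctx *ctx)
{
	/* The sentinel lives inside ctx and must never be freed. */
	if (demo_has_optval(ctx))
		free(ctx->optval);
}

/* Usage: an oversized request succeeds but exposes no bytes. */
static void demo_selfcheck(void)
{
	struct demo_sockopt_ctx ctx = { 0 };

	assert(demo_alloc_buf(&ctx, 2 * DEMO_PAGE_SIZE) == 0);
	assert(ctx.optval != NULL && ctx.optval == ctx.optval_end);
	assert(!demo_has_optval(&ctx));
	demo_free_buf(&ctx);
}
```

Using a field inside the context as the sentinel avoids a separate
flag: the pointer value itself says whether a real buffer was
allocated.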

Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Cc: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/filter.h |  1 +
 kernel/bpf/cgroup.c    | 31 +++++++++++++++++++++++++------
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 259377723603..f4565a70f8ba 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1276,6 +1276,7 @@ struct bpf_sockopt_kern {
 	s32		optname;
 	s32		optlen;
 	s32		retval;
+	u8		optval_too_large;
 };
 
 #endif /* __LINUX_FILTER_H__ */
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 4d76f16524cc..be78c01bf459 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1276,9 +1276,18 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
 
 static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
 {
-	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
+	if (unlikely(max_optlen < 0))
 		return -EINVAL;
 
+	if (unlikely(max_optlen > PAGE_SIZE)) {
+		/* We don't expose optvals that are greater than PAGE_SIZE
+		 * to the BPF program.
+		 */
+		ctx->optval = &ctx->optval_too_large;
+		ctx->optval_end = &ctx->optval_too_large;
+		return 0;
+	}
+
 	ctx->optval = kzalloc(max_optlen, GFP_USER);
 	if (!ctx->optval)
 		return -ENOMEM;
@@ -1288,9 +1297,15 @@ static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
 	return 0;
 }
 
+static int sockopt_has_optval(struct bpf_sockopt_kern *ctx)
+{
+	return ctx->optval != &ctx->optval_too_large;
+}
+
 static void sockopt_free_buf(struct bpf_sockopt_kern *ctx)
 {
-	kfree(ctx->optval);
+	if (sockopt_has_optval(ctx))
+		kfree(ctx->optval);
 }
 
 int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
@@ -1325,7 +1340,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 
 	ctx.optlen = *optlen;
 
-	if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
+	if (sockopt_has_optval(&ctx) &&
+	    copy_from_user(ctx.optval, optval, *optlen) != 0) {
 		ret = -EFAULT;
 		goto out;
 	}
@@ -1354,7 +1370,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 		*level = ctx.level;
 		*optname = ctx.optname;
 		*optlen = ctx.optlen;
-		*kernel_optval = ctx.optval;
+		if (sockopt_has_optval(&ctx))
+			*kernel_optval = ctx.optval;
 	}
 
 out:
@@ -1407,7 +1424,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		if (ctx.optlen > max_optlen)
 			ctx.optlen = max_optlen;
 
-		if (copy_from_user(ctx.optval, optval, ctx.optlen) != 0) {
+		if (sockopt_has_optval(&ctx) &&
+		    copy_from_user(ctx.optval, optval, ctx.optlen) != 0) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -1436,7 +1454,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		goto out;
 	}
 
-	if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
+	if ((sockopt_has_optval(&ctx) &&
+	     copy_to_user(optval, ctx.optval, ctx.optlen)) ||
 	    put_user(ctx.optlen, optlen)) {
 		ret = -EFAULT;
 		goto out;
--
2.27.0.290.gba653c62da-goog