* [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
@ 2020-06-17 1:04 Stanislav Fomichev
From: Stanislav Fomichev @ 2020-06-17 1:04 UTC
To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, David Laight
Attaching to the cgroup {get,set}sockopt hooks can break iptables because
its optval is usually quite big, or at least bigger than the current
PAGE_SIZE limit. David also mentioned that some SCTP options can be big
(around 256k).

For such optvals we expose only the first PAGE_SIZE bytes to
the BPF program. The BPF program has two options:
1. Set ctx->optlen to 0 to indicate that the BPF's optval
   should be ignored and the kernel should use the original
   userspace value.
2. Set ctx->optlen to something that's smaller than PAGE_SIZE,
   in which case the kernel uses the BPF program's trimmed optval.
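
The sizing described above can be modelled in plain userspace C. This is a minimal sketch, not kernel code: `model_setsockopt_view()`, `struct sockopt_view` and `MODEL_PAGE_SIZE` are hypothetical names, and a 4 KiB page is assumed. It mirrors three things in the setsockopt path of this patch: the `max_t(int, 16, *optlen)` allocation floor, the new PAGE_SIZE clamp in `sockopt_alloc_buf()`, and the `min(*optlen, max_optlen)` length passed to `copy_from_user()`.

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the real PAGE_SIZE is architecture dependent. */
#define MODEL_PAGE_SIZE 4096

/* What a cgroup setsockopt program would see (hypothetical model). */
struct sockopt_view {
	int buf_len;    /* optval_end - optval: bytes the program can touch */
	int copy_len;   /* bytes copied in from the userspace buffer */
	int ctx_optlen; /* initial value of ctx->optlen */
};

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Userspace model of sockopt_alloc_buf() plus the copy_from_user()
 * sizing in __cgroup_bpf_run_filter_setsockopt() after this patch.
 */
static struct sockopt_view model_setsockopt_view(int user_optlen)
{
	struct sockopt_view v;
	int max_optlen;

	/* Precondition of this model only: lengths are non-negative. */
	assert(user_optlen >= 0);

	/* setsockopt allocates at least 16 bytes. */
	max_optlen = imax(16, user_optlen);

	/* New behavior: clamp to PAGE_SIZE instead of failing with -EINVAL. */
	if (max_optlen > MODEL_PAGE_SIZE)
		max_optlen = MODEL_PAGE_SIZE;

	v.buf_len = max_optlen;
	v.copy_len = imin(user_optlen, max_optlen);
	/* ctx.optlen = *optlen: the original length is still reported. */
	v.ctx_optlen = user_optlen;
	return v;
}
```

Under these assumptions, `model_setsockopt_view(8192)` reports a 4096-byte buffer with 4096 bytes copied, while `ctx_optlen` still reads 8192. `model_setsockopt_view(1)` reports a 16-byte buffer with 1 byte copied.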

v5:
* use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
* update the docs accordingly

v4:
* use temporary buffer to avoid optval == optval_end == NULL;
  this removes the corner case in the verifier that might assume
  non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.

v3:
* don't increase the limit, bypass the argument

v2:
* proper comments formatting (Jakub Kicinski)

Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Cc: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
kernel/bpf/cgroup.c | 53 ++++++++++++++++++++++++++++-----------------
1 file changed, 33 insertions(+), 20 deletions(-)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 4d76f16524cc..ac53102e244a 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1276,16 +1276,23 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
 
 static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
 {
-	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
+	if (unlikely(max_optlen < 0))
 		return -EINVAL;
 
+	if (unlikely(max_optlen > PAGE_SIZE)) {
+		/* We don't expose optvals that are greater than PAGE_SIZE
+		 * to the BPF program.
+		 */
+		max_optlen = PAGE_SIZE;
+	}
+
 	ctx->optval = kzalloc(max_optlen, GFP_USER);
 	if (!ctx->optval)
 		return -ENOMEM;
 
 	ctx->optval_end = ctx->optval + max_optlen;
 
-	return 0;
+	return max_optlen;
 }
 
 static void sockopt_free_buf(struct bpf_sockopt_kern *ctx)
@@ -1319,13 +1326,13 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 	 */
 	max_optlen = max_t(int, 16, *optlen);
 
-	ret = sockopt_alloc_buf(&ctx, max_optlen);
-	if (ret)
-		return ret;
+	max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
+	if (max_optlen < 0)
+		return max_optlen;
 
 	ctx.optlen = *optlen;
 
-	if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
+	if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
 		ret = -EFAULT;
 		goto out;
 	}
@@ -1353,8 +1360,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 		/* export any potential modifications */
 		*level = ctx.level;
 		*optname = ctx.optname;
-		*optlen = ctx.optlen;
-		*kernel_optval = ctx.optval;
+
+		/* optlen == 0 from BPF indicates that we should
+		 * use original userspace data.
+		 */
+		if (ctx.optlen != 0) {
+			*optlen = ctx.optlen;
+			*kernel_optval = ctx.optval;
+		}
 	}
 
 out:
@@ -1385,12 +1398,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	    __cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_GETSOCKOPT))
 		return retval;
 
-	ret = sockopt_alloc_buf(&ctx, max_optlen);
-	if (ret)
-		return ret;
-
 	ctx.optlen = max_optlen;
 
+	max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
+	if (max_optlen < 0)
+		return max_optlen;
+
 	if (!retval) {
 		/* If kernel getsockopt finished successfully,
 		 * copy whatever was returned to the user back
@@ -1404,10 +1417,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 			goto out;
 		}
 
-		if (ctx.optlen > max_optlen)
-			ctx.optlen = max_optlen;
-
-		if (copy_from_user(ctx.optval, optval, ctx.optlen) != 0) {
+		if (copy_from_user(ctx.optval, optval,
+				   min(ctx.optlen, max_optlen)) != 0) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -1436,10 +1447,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		goto out;
 	}
 
-	if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
-	    put_user(ctx.optlen, optlen)) {
-		ret = -EFAULT;
-		goto out;
+	if (ctx.optlen != 0) {
+		if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
+		    put_user(ctx.optlen, optlen)) {
+			ret = -EFAULT;
+			goto out;
+		}
 	}
 
 	ret = ctx.retval;
--
2.27.0.290.gba653c62da-goog

* [PATCH bpf v5 2/3] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed
From: Stanislav Fomichev @ 2020-06-17 1:04 UTC
To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev

We rely on the fact that we can pass optvals larger than sizeof(int)
to the SOL_IP+IP_FREEBIND option (the kernel takes the first 4 bytes).
In the BPF program we check that we can touch only PAGE_SIZE bytes,
even though the real optlen is PAGE_SIZE * 2. In both the getsockopt
and setsockopt programs, we override the value with a predefined one
and trim the optlen.

Also, modify the existing IP_TOS use case to test the optlen=0 case,
where the BPF program just bypasses the data as is.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/prog_tests/sockopt_sk.c     | 46 +++++++++++++---
 .../testing/selftests/bpf/progs/sockopt_sk.c  | 54 ++++++++++++++++++-
 2 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
index 2061a6beac0f..5f54c6aec7f0 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
@@ -13,6 +13,7 @@ static int getsetsockopt(void)
 		char cc[16]; /* TCP_CA_NAME_MAX */
 	} buf = {};
 	socklen_t optlen;
+	char *big_buf = NULL;
 
 	fd = socket(AF_INET, SOCK_STREAM, 0);
 	if (fd < 0) {
@@ -22,24 +23,31 @@ static int getsetsockopt(void)
 
 	/* IP_TOS - BPF bypass */
 
-	buf.u8[0] = 0x08;
-	err = setsockopt(fd, SOL_IP, IP_TOS, &buf, 1);
+	optlen = getpagesize() * 2;
+	big_buf = calloc(1, optlen);
+	if (!big_buf) {
+		log_err("Couldn't allocate two pages");
+		goto err;
+	}
+
+	*(int *)big_buf = 0x08;
+	err = setsockopt(fd, SOL_IP, IP_TOS, big_buf, optlen);
 	if (err) {
 		log_err("Failed to call setsockopt(IP_TOS)");
 		goto err;
 	}
 
-	buf.u8[0] = 0x00;
+	memset(big_buf, 0, optlen);
 	optlen = 1;
-	err = getsockopt(fd, SOL_IP, IP_TOS, &buf, &optlen);
+	err = getsockopt(fd, SOL_IP, IP_TOS, big_buf, &optlen);
 	if (err) {
 		log_err("Failed to call getsockopt(IP_TOS)");
 		goto err;
 	}
 
-	if (buf.u8[0] != 0x08) {
-		log_err("Unexpected getsockopt(IP_TOS) buf[0] 0x%02x != 0x08",
-			buf.u8[0]);
+	if (*(int *)big_buf != 0x08) {
+		log_err("Unexpected getsockopt(IP_TOS) optval 0x%x != 0x08",
+			*(int *)big_buf);
 		goto err;
 	}
 
@@ -78,6 +86,28 @@ static int getsetsockopt(void)
 		goto err;
 	}
 
+	/* IP_FREEBIND - BPF can't access optval past PAGE_SIZE */
+
+	optlen = getpagesize() * 2;
+	memset(big_buf, 0, optlen);
+
+	err = setsockopt(fd, SOL_IP, IP_FREEBIND, big_buf, optlen);
+	if (err != 0) {
+		log_err("Failed to call setsockopt, ret=%d", err);
+		goto err;
+	}
+
+	err = getsockopt(fd, SOL_IP, IP_FREEBIND, big_buf, &optlen);
+	if (err != 0) {
+		log_err("Failed to call getsockopt, ret=%d", err);
+		goto err;
+	}
+
+	if (optlen != 1 || *(__u8 *)big_buf != 0x55) {
+		log_err("Unexpected IP_FREEBIND getsockopt, optlen=%d, optval=0x%x",
+			optlen, *(__u8 *)big_buf);
+	}
+
 	/* SO_SNDBUF is overwritten */
 
 	buf.u32 = 0x01010101;
@@ -124,9 +154,11 @@ static int getsetsockopt(void)
 		goto err;
 	}
 
+	free(big_buf);
 	close(fd);
 	return 0;
 err:
+	free(big_buf);
 	close(fd);
 	return -1;
 }
diff --git a/tools/testing/selftests/bpf/progs/sockopt_sk.c b/tools/testing/selftests/bpf/progs/sockopt_sk.c
index d5a5eeb5fb52..712df7b49cb1 100644
--- a/tools/testing/selftests/bpf/progs/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/progs/sockopt_sk.c
@@ -8,6 +8,10 @@ char _license[] SEC("license") = "GPL";
 __u32 _version SEC("version") = 1;
 
 
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
+
 #define SOL_CUSTOM			0xdeadbeef
 
 struct sockopt_sk {
@@ -28,12 +32,14 @@ int _getsockopt(struct bpf_sockopt *ctx)
 	__u8 *optval = ctx->optval;
 	struct sockopt_sk *storage;
 
-	if (ctx->level == SOL_IP && ctx->optname == IP_TOS)
+	if (ctx->level == SOL_IP && ctx->optname == IP_TOS) {
 		/* Not interested in SOL_IP:IP_TOS;
 		 * let next BPF program in the cgroup chain or kernel
 		 * handle it.
 		 */
+		ctx->optlen = 0; /* bypass optval>PAGE_SIZE */
 		return 1;
+	}
 
 	if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
 		/* Not interested in SOL_SOCKET:SO_SNDBUF;
@@ -51,6 +57,26 @@ int _getsockopt(struct bpf_sockopt *ctx)
 		return 1;
 	}
 
+	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+		if (optval + 1 > optval_end)
+			return 0; /* EPERM, bounds check */
+
+		ctx->retval = 0; /* Reset system call return value to zero */
+
+		/* Always export 0x55 */
+		optval[0] = 0x55;
+		ctx->optlen = 1;
+
+		/* Userspace buffer is PAGE_SIZE * 2, but BPF
+		 * program can only see the first PAGE_SIZE
+		 * bytes of data.
+		 */
+		if (optval_end - optval != PAGE_SIZE)
+			return 0; /* EPERM, unexpected data size */
+
+		return 1;
+	}
+
 	if (ctx->level != SOL_CUSTOM)
 		return 0; /* EPERM, deny everything except custom level */
 
@@ -81,12 +107,14 @@ int _setsockopt(struct bpf_sockopt *ctx)
 	__u8 *optval = ctx->optval;
 	struct sockopt_sk *storage;
 
-	if (ctx->level == SOL_IP && ctx->optname == IP_TOS)
+	if (ctx->level == SOL_IP && ctx->optname == IP_TOS) {
 		/* Not interested in SOL_IP:IP_TOS;
 		 * let next BPF program in the cgroup chain or kernel
 		 * handle it.
 		 */
+		ctx->optlen = 0; /* bypass optval>PAGE_SIZE */
 		return 1;
+	}
 
 	if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
 		/* Overwrite SO_SNDBUF value */
@@ -112,6 +140,28 @@ int _setsockopt(struct bpf_sockopt *ctx)
 		return 1;
 	}
 
+	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+		/* Original optlen is larger than PAGE_SIZE. */
+		if (ctx->optlen != PAGE_SIZE * 2)
+			return 0; /* EPERM, unexpected data size */
+
+		if (optval + 1 > optval_end)
+			return 0; /* EPERM, bounds check */
+
+		/* Make sure we can trim the buffer. */
+		optval[0] = 0;
+		ctx->optlen = 1;
+
+		/* Userspace buffer is PAGE_SIZE * 2, but BPF
+		 * program can only see the first PAGE_SIZE
+		 * bytes of data.
+		 */
+		if (optval_end - optval != PAGE_SIZE)
+			return 0; /* EPERM, unexpected data size */
+
+		return 1;
+	}
+
 	if (ctx->level != SOL_CUSTOM)
 		return 0; /* EPERM, deny everything except custom level */
 
--
2.27.0.290.gba653c62da-goog

* [PATCH bpf v5 3/3] bpf: document optval > PAGE_SIZE behavior for sockopt hooks
From: Stanislav Fomichev @ 2020-06-17 1:04 UTC
To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev

Extend the existing doc with more details about setting ctx->optlen = 0
to handle optval > PAGE_SIZE.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/bpf/prog_cgroup_sockopt.rst | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/bpf/prog_cgroup_sockopt.rst b/Documentation/bpf/prog_cgroup_sockopt.rst
index c47d974629ae..172f957204bf 100644
--- a/Documentation/bpf/prog_cgroup_sockopt.rst
+++ b/Documentation/bpf/prog_cgroup_sockopt.rst
@@ -86,6 +86,20 @@ then the next program in the chain (A) will see those changes,
 *not* the original input ``setsockopt`` arguments. The potentially
 modified values will be then passed down to the kernel.
 
+Large optval
+============
+When the ``optval`` is greater than ``PAGE_SIZE``, the BPF program
+can access only the first ``PAGE_SIZE`` bytes of that data, so it has two options:
+
+* Set ``optlen`` to zero, which indicates that the kernel should
+  use the original buffer from the userspace. Any modifications
+  done by the BPF program to the ``optval`` are ignored.
+* Set ``optlen`` to a value less than ``PAGE_SIZE``, which
+  indicates that the kernel should use BPF's trimmed ``optval``.
+
+When the BPF program returns with the ``optlen`` greater than
+``PAGE_SIZE``, the userspace will receive ``EFAULT`` errno.
+
 Example
 =======
 
--
2.27.0.290.gba653c62da-goog
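
The rules in the new "Large optval" section, together with the pre-existing bounds check in `__cgroup_bpf_run_filter_setsockopt()` (quoted later in this thread), can be summarised as a small decision function. This is a hypothetical userspace sketch, assuming the program returned 1 (allow): the enum and function names are not kernel identifiers, and `max_optlen` stands for the size of the buffer the program saw (at most `PAGE_SIZE` after this series). The `-1` branch reflects the existing "bypass the kernel handler" behavior that the `ctx.optlen < -1` bound leaves room for; this series does not change it.

```c
#include <assert.h>

/* Hypothetical outcome names; these are not kernel identifiers. */
enum setsockopt_outcome {
	OUTCOME_BYPASS_KERNEL, /* ctx->optlen == -1 (pre-existing rule) */
	OUTCOME_USE_ORIGINAL,  /* ctx->optlen == 0: original user buffer */
	OUTCOME_USE_BPF_BUF,   /* 0 < ctx->optlen <= max_optlen */
	OUTCOME_EFAULT,        /* ctx->optlen out of bounds */
};

/* Model of what happens after a cgroup setsockopt program returns 1.
 * max_optlen is the size of the buffer the program saw.
 */
static enum setsockopt_outcome
model_setsockopt_outcome(int ctx_optlen, int max_optlen)
{
	assert(max_optlen >= 0);

	if (ctx_optlen == -1)
		return OUTCOME_BYPASS_KERNEL;
	if (ctx_optlen > max_optlen || ctx_optlen < -1)
		return OUTCOME_EFAULT;
	if (ctx_optlen == 0)
		return OUTCOME_USE_ORIGINAL; /* new in this series */
	return OUTCOME_USE_BPF_BUF;
}
```

For a two-page user buffer (so `max_optlen` is 4096 on a 4 KiB-page system), leaving `ctx_optlen` at 8192 maps to `OUTCOME_EFAULT`, 0 maps to `OUTCOME_USE_ORIGINAL`, and 1 maps to `OUTCOME_USE_BPF_BUF`.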

* Re: [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
From: Alexei Starovoitov @ 2020-06-17 17:09 UTC
To: Stanislav Fomichev; +Cc: netdev, bpf, davem, ast, daniel, David Laight

On Tue, Jun 16, 2020 at 06:04:14PM -0700, Stanislav Fomichev wrote:
[...]
> @@ -1353,8 +1360,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
>  		/* export any potential modifications */
>  		*level = ctx.level;
>  		*optname = ctx.optname;
> -		*optlen = ctx.optlen;
> -		*kernel_optval = ctx.optval;
> +
> +		/* optlen == 0 from BPF indicates that we should
> +		 * use original userspace data.
> +		 */
> +		if (ctx.optlen != 0) {
> +			*optlen = ctx.optlen;

I think it should be:
	*optlen = min(ctx.optlen, max_optlen);

Otherwise when bpf prog doesn't adjust ctx.optlen the kernel will see
4k only in kernel_optval whereas optlen will be > 4k.
I suspect iptables sockopt should have crashed at this point.
How did you test it?

> +			*kernel_optval = ctx.optval;
> +		}
>  	}

* Re: [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
From: sdf @ 2020-06-17 17:45 UTC
To: Alexei Starovoitov; +Cc: netdev, bpf, davem, ast, daniel, David Laight

On 06/17, Alexei Starovoitov wrote:
> On Tue, Jun 16, 2020 at 06:04:14PM -0700, Stanislav Fomichev wrote:
[...]
> > +		/* optlen == 0 from BPF indicates that we should
> > +		 * use original userspace data.
> > +		 */
> > +		if (ctx.optlen != 0) {
> > +			*optlen = ctx.optlen;
> I think it should be:
> 	*optlen = min(ctx.optlen, max_optlen);
We do have the following (existing) check above:

	} else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
		/* optlen is out of bounds */
		ret = -EFAULT;
	} else {

So we shouldn't need any min here? Or am I missing something?

> Otherwise when bpf prog doesn't adjust ctx.optlen the kernel will see
> 4k only in kernel_optval whereas optlen will be > 4k.
> I suspect iptables sockopt should have crashed at this point.
> How did you test it?
The selftests that I've attached in the series. The test is passing
two pages and for IP_TOS we bypass the value via optlen=0 and
for IP_FREEBIND we trim the buffer to 1 byte. I think this should
cover this check here.

One thing I didn't really test is getsockopt when the kernel
returns a really large buffer (iptables). Right now, the test
gets 4 bytes (trimmed) from the kernel. I think that's the only
place that I didn't properly test. I wonder whether I should
do a real iptables-like setsockopt/getsockopt :-/
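
The argument above can be checked mechanically. This is a brute-force sketch under stated assumptions: `optlen_in_bounds()`, `min_ever_matters()` and `MODEL_PAGE_SIZE` are hypothetical names, and 4 KiB pages are assumed. It walks every candidate `ctx->optlen`, keeps only values that pass the quoted bounds check and reach the `*optlen = ctx.optlen` assignment (non-zero, not -1), and asks whether `min(ctx.optlen, max_optlen)` would ever change them.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096

/* The pre-existing check quoted above: true if the program's
 * ctx->optlen is accepted rather than turned into -EFAULT.
 */
static bool optlen_in_bounds(int ctx_optlen, int max_optlen)
{
	return !(ctx_optlen > max_optlen || ctx_optlen < -1);
}

/* Brute force: does min(ctx_optlen, max_optlen) ever differ from
 * ctx_optlen on the path that assigns *optlen (accepted, non-zero)?
 */
static bool min_ever_matters(int max_optlen)
{
	assert(max_optlen > 0);

	for (int optlen = -2; optlen <= 2 * max_optlen; optlen++) {
		/* -1 bypasses the kernel and 0 keeps the user buffer,
		 * so neither reaches the assignment.
		 */
		if (!optlen_in_bounds(optlen, max_optlen) || optlen <= 0)
			continue;

		int clamped = optlen < max_optlen ? optlen : max_optlen;
		if (clamped != optlen)
			return true;
	}
	return false;
}
```

With these assumptions, `min_ever_matters(MODEL_PAGE_SIZE)` returns false. A program that leaves `ctx->optlen` at `2 * MODEL_PAGE_SIZE` fails `optlen_in_bounds()`, which is the `EFAULT` case rather than a silent 4k/8k mismatch.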

* Re: [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
From: Alexei Starovoitov @ 2020-06-17 17:59 UTC
To: sdf; +Cc: netdev, bpf, davem, ast, daniel, David Laight

On Wed, Jun 17, 2020 at 10:45:08AM -0700, sdf@google.com wrote:
> On 06/17, Alexei Starovoitov wrote:
> > On Tue, Jun 16, 2020 at 06:04:14PM -0700, Stanislav Fomichev wrote:
[...]
> > > +		if (ctx.optlen != 0) {
> > > +			*optlen = ctx.optlen;
> > I think it should be:
> > 	*optlen = min(ctx.optlen, max_optlen);
> We do have the following (existing) check above:
> 	} else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
> 		/* optlen is out of bounds */
> 		ret = -EFAULT;
> 	} else {
>
> So we shouldn't need any min here? Or am I missing something?

ahh. you're right.
Applied to bpf tree.

> > Otherwise when bpf prog doesn't adjust ctx.optlen the kernel will see
> > 4k only in kernel_optval whereas optlen will be > 4k.
> > I suspect iptables sockopt should have crashed at this point.
> > How did you test it?
> The selftests that I've attached in the series. The test is passing
> two pages and for IP_TOS we bypass the value via optlen=0 and
> for IP_FREEBIND we trim the buffer to 1 byte. I think this should
> cover this check here.
>
> One thing I didn't really test is getsockopt when the kernel
> returns a really large buffer (iptables). Right now, the test
> gets 4 bytes (trimmed) from the kernel. I think that's the only
> place that I didn't properly test. I wonder whether I should
> do a real iptables-like setsockopt/getsockopt :-/

would be nice :)