[PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE

netdev.vger.kernel.org archive mirror

* [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
@ 2020-06-17  1:04 Stanislav Fomichev
  2020-06-17  1:04 ` [PATCH bpf v5 2/3] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed Stanislav Fomichev
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Stanislav Fomichev @ 2020-06-17  1:04 UTC
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, David Laight

Attaching to these hooks can break iptables because its optval is
usually quite big, or at least bigger than the current PAGE_SIZE limit.
David also mentioned some SCTP options can be big (around 256k).

For such optvals we expose only the first PAGE_SIZE bytes to
the BPF program. The BPF program then has two options:
1. Set ctx->optlen to 0 to indicate that the BPF program's optval
   should be ignored and the kernel should use the original userspace
   value.
2. Set ctx->optlen to a value that fits in the trimmed PAGE_SIZE
   buffer, so that the kernel uses the BPF program's optval.
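
The trimming described above can be sketched in plain userspace C. This is
only a model of the arithmetic in the diff below, not kernel code: the
names model_alloc_len() and model_setsockopt_copy_len() are made up for
illustration, and MODEL_PAGE_SIZE assumes 4 KiB pages.

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the kernel uses the real PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for sockopt_alloc_buf(): returns the size of the
 * buffer the BPF program gets to see (optval_end - optval), or -1 where
 * the kernel would return -EINVAL.
 */
static int model_alloc_len(int max_optlen)
{
	if (max_optlen < 0)
		return -1;

	/* Oversized optvals are trimmed instead of rejected. */
	if (max_optlen > MODEL_PAGE_SIZE)
		max_optlen = MODEL_PAGE_SIZE;

	return max_optlen;
}

/* Number of bytes copied from userspace on the setsockopt path:
 * min(optlen, max_optlen) after the clamp. The buffer is at least
 * 16 bytes, mirroring max_t(int, 16, *optlen) in the patch.
 */
static int model_setsockopt_copy_len(int optlen)
{
	int max_optlen;

	assert(optlen >= 0); /* negative lengths are not modeled here */

	max_optlen = model_alloc_len(optlen > 16 ? optlen : 16);

	return optlen < max_optlen ? optlen : max_optlen;
}
```

With these assumptions, a 256k SCTP-style optval would give the BPF
program a 4096-byte window, while a 4-byte optval is copied whole into
a 16-byte buffer.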

v5:
* use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
* update the docs accordingly

v4:
* use temporary buffer to avoid optval == optval_end == NULL;
  this removes the corner case in the verifier that might assume
  non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.

v3:
* don't increase the limit, bypass the argument

v2:
* proper comments formatting (Jakub Kicinski)

Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Cc: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 kernel/bpf/cgroup.c | 53 ++++++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 4d76f16524cc..ac53102e244a 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1276,16 +1276,23 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
 
 static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
 {
-	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
+	if (unlikely(max_optlen < 0))
 		return -EINVAL;
 
+	if (unlikely(max_optlen > PAGE_SIZE)) {
+		/* We don't expose optvals that are greater than PAGE_SIZE
+		 * to the BPF program.
+		 */
+		max_optlen = PAGE_SIZE;
+	}
+
 	ctx->optval = kzalloc(max_optlen, GFP_USER);
 	if (!ctx->optval)
 		return -ENOMEM;
 
 	ctx->optval_end = ctx->optval + max_optlen;
 
-	return 0;
+	return max_optlen;
 }
 
 static void sockopt_free_buf(struct bpf_sockopt_kern *ctx)
@@ -1319,13 +1326,13 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 	 */
 	max_optlen = max_t(int, 16, *optlen);
 
-	ret = sockopt_alloc_buf(&ctx, max_optlen);
-	if (ret)
-		return ret;
+	max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
+	if (max_optlen < 0)
+		return max_optlen;
 
 	ctx.optlen = *optlen;
 
-	if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
+	if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
 		ret = -EFAULT;
 		goto out;
 	}
@@ -1353,8 +1360,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 		/* export any potential modifications */
 		*level = ctx.level;
 		*optname = ctx.optname;
-		*optlen = ctx.optlen;
-		*kernel_optval = ctx.optval;
+
+		/* optlen == 0 from BPF indicates that we should
+		 * use original userspace data.
+		 */
+		if (ctx.optlen != 0) {
+			*optlen = ctx.optlen;
+			*kernel_optval = ctx.optval;
+		}
 	}
 
 out:
@@ -1385,12 +1398,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	    __cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_GETSOCKOPT))
 		return retval;
 
-	ret = sockopt_alloc_buf(&ctx, max_optlen);
-	if (ret)
-		return ret;
-
 	ctx.optlen = max_optlen;
 
+	max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
+	if (max_optlen < 0)
+		return max_optlen;
+
 	if (!retval) {
 		/* If kernel getsockopt finished successfully,
 		 * copy whatever was returned to the user back
@@ -1404,10 +1417,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 			goto out;
 		}
 
-		if (ctx.optlen > max_optlen)
-			ctx.optlen = max_optlen;
-
-		if (copy_from_user(ctx.optval, optval, ctx.optlen) != 0) {
+		if (copy_from_user(ctx.optval, optval,
+				   min(ctx.optlen, max_optlen)) != 0) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -1436,10 +1447,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		goto out;
 	}
 
-	if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
-	    put_user(ctx.optlen, optlen)) {
-		ret = -EFAULT;
-		goto out;
+	if (ctx.optlen != 0) {
+		if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
+		    put_user(ctx.optlen, optlen)) {
+			ret = -EFAULT;
+			goto out;
+		}
 	}
 
 	ret = ctx.retval;
-- 
2.27.0.290.gba653c62da-goog



* [PATCH bpf v5 2/3] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed
  2020-06-17  1:04 [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Stanislav Fomichev
@ 2020-06-17  1:04 ` Stanislav Fomichev
  2020-06-17  1:04 ` [PATCH bpf v5 3/3] bpf: document optval > PAGE_SIZE behavior for sockopt hooks Stanislav Fomichev
  2020-06-17 17:09 ` [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Alexei Starovoitov
  2 siblings, 0 replies; 6+ messages in thread
From: Stanislav Fomichev @ 2020-06-17  1:04 UTC
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev

We rely on the fact that we can pass optvals larger than sizeof(int)
to the SOL_IP+IP_FREEBIND option (the kernel will take the first 4 bytes).
In the BPF program we check that we can only touch PAGE_SIZE bytes,
even though the real optlen is PAGE_SIZE * 2. In both cases, we override
the optval with some predefined value and trim the optlen.

Also, let's modify the existing IP_TOS use case to test the optlen=0 case
where the BPF program just bypasses the data as is.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/prog_tests/sockopt_sk.c     | 46 +++++++++++++---
 .../testing/selftests/bpf/progs/sockopt_sk.c  | 54 ++++++++++++++++++-
 2 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
index 2061a6beac0f..5f54c6aec7f0 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
@@ -13,6 +13,7 @@ static int getsetsockopt(void)
 		char cc[16]; /* TCP_CA_NAME_MAX */
 	} buf = {};
 	socklen_t optlen;
+	char *big_buf = NULL;
 
 	fd = socket(AF_INET, SOCK_STREAM, 0);
 	if (fd < 0) {
@@ -22,24 +23,31 @@ static int getsetsockopt(void)
 
 	/* IP_TOS - BPF bypass */
 
-	buf.u8[0] = 0x08;
-	err = setsockopt(fd, SOL_IP, IP_TOS, &buf, 1);
+	optlen = getpagesize() * 2;
+	big_buf = calloc(1, optlen);
+	if (!big_buf) {
+		log_err("Couldn't allocate two pages");
+		goto err;
+	}
+
+	*(int *)big_buf = 0x08;
+	err = setsockopt(fd, SOL_IP, IP_TOS, big_buf, optlen);
 	if (err) {
 		log_err("Failed to call setsockopt(IP_TOS)");
 		goto err;
 	}
 
-	buf.u8[0] = 0x00;
+	memset(big_buf, 0, optlen);
 	optlen = 1;
-	err = getsockopt(fd, SOL_IP, IP_TOS, &buf, &optlen);
+	err = getsockopt(fd, SOL_IP, IP_TOS, big_buf, &optlen);
 	if (err) {
 		log_err("Failed to call getsockopt(IP_TOS)");
 		goto err;
 	}
 
-	if (buf.u8[0] != 0x08) {
-		log_err("Unexpected getsockopt(IP_TOS) buf[0] 0x%02x != 0x08",
-			buf.u8[0]);
+	if (*(int *)big_buf != 0x08) {
+		log_err("Unexpected getsockopt(IP_TOS) optval 0x%x != 0x08",
+			*(int *)big_buf);
 		goto err;
 	}
 
@@ -78,6 +86,28 @@ static int getsetsockopt(void)
 		goto err;
 	}
 
+	/* IP_FREEBIND - BPF can't access optval past PAGE_SIZE */
+
+	optlen = getpagesize() * 2;
+	memset(big_buf, 0, optlen);
+
+	err = setsockopt(fd, SOL_IP, IP_FREEBIND, big_buf, optlen);
+	if (err != 0) {
+		log_err("Failed to call setsockopt, ret=%d", err);
+		goto err;
+	}
+
+	err = getsockopt(fd, SOL_IP, IP_FREEBIND, big_buf, &optlen);
+	if (err != 0) {
+		log_err("Failed to call getsockopt, ret=%d", err);
+		goto err;
+	}
+
+	if (optlen != 1 || *(__u8 *)big_buf != 0x55) {
+		log_err("Unexpected IP_FREEBIND getsockopt, optlen=%d, optval=0x%x",
+			optlen, *(__u8 *)big_buf);
+	}
+
 	/* SO_SNDBUF is overwritten */
 
 	buf.u32 = 0x01010101;
@@ -124,9 +154,11 @@ static int getsetsockopt(void)
 		goto err;
 	}
 
+	free(big_buf);
 	close(fd);
 	return 0;
 err:
+	free(big_buf);
 	close(fd);
 	return -1;
 }
diff --git a/tools/testing/selftests/bpf/progs/sockopt_sk.c b/tools/testing/selftests/bpf/progs/sockopt_sk.c
index d5a5eeb5fb52..712df7b49cb1 100644
--- a/tools/testing/selftests/bpf/progs/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/progs/sockopt_sk.c
@@ -8,6 +8,10 @@
 char _license[] SEC("license") = "GPL";
 __u32 _version SEC("version") = 1;
 
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
+
 #define SOL_CUSTOM			0xdeadbeef
 
 struct sockopt_sk {
@@ -28,12 +32,14 @@ int _getsockopt(struct bpf_sockopt *ctx)
 	__u8 *optval = ctx->optval;
 	struct sockopt_sk *storage;
 
-	if (ctx->level == SOL_IP && ctx->optname == IP_TOS)
+	if (ctx->level == SOL_IP && ctx->optname == IP_TOS) {
 		/* Not interested in SOL_IP:IP_TOS;
 		 * let next BPF program in the cgroup chain or kernel
 		 * handle it.
 		 */
+		ctx->optlen = 0; /* bypass optval>PAGE_SIZE */
 		return 1;
+	}
 
 	if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
 		/* Not interested in SOL_SOCKET:SO_SNDBUF;
@@ -51,6 +57,26 @@ int _getsockopt(struct bpf_sockopt *ctx)
 		return 1;
 	}
 
+	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+		if (optval + 1 > optval_end)
+			return 0; /* EPERM, bounds check */
+
+		ctx->retval = 0; /* Reset system call return value to zero */
+
+		/* Always export 0x55 */
+		optval[0] = 0x55;
+		ctx->optlen = 1;
+
+		/* Userspace buffer is PAGE_SIZE * 2, but BPF
+		 * program can only see the first PAGE_SIZE
+		 * bytes of data.
+		 */
+		if (optval_end - optval != PAGE_SIZE)
+			return 0; /* EPERM, unexpected data size */
+
+		return 1;
+	}
+
 	if (ctx->level != SOL_CUSTOM)
 		return 0; /* EPERM, deny everything except custom level */
 
@@ -81,12 +107,14 @@ int _setsockopt(struct bpf_sockopt *ctx)
 	__u8 *optval = ctx->optval;
 	struct sockopt_sk *storage;
 
-	if (ctx->level == SOL_IP && ctx->optname == IP_TOS)
+	if (ctx->level == SOL_IP && ctx->optname == IP_TOS) {
 		/* Not interested in SOL_IP:IP_TOS;
 		 * let next BPF program in the cgroup chain or kernel
 		 * handle it.
 		 */
+		ctx->optlen = 0; /* bypass optval>PAGE_SIZE */
 		return 1;
+	}
 
 	if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
 		/* Overwrite SO_SNDBUF value */
@@ -112,6 +140,28 @@ int _setsockopt(struct bpf_sockopt *ctx)
 		return 1;
 	}
 
+	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+		/* Original optlen is larger than PAGE_SIZE. */
+		if (ctx->optlen != PAGE_SIZE * 2)
+			return 0; /* EPERM, unexpected data size */
+
+		if (optval + 1 > optval_end)
+			return 0; /* EPERM, bounds check */
+
+		/* Make sure we can trim the buffer. */
+		optval[0] = 0;
+		ctx->optlen = 1;
+
+		/* Userspace buffer is PAGE_SIZE * 2, but BPF
+		 * program can only see the first PAGE_SIZE
+		 * bytes of data.
+		 */
+		if (optval_end - optval != PAGE_SIZE)
+			return 0; /* EPERM, unexpected data size */
+
+		return 1;
+	}
+
 	if (ctx->level != SOL_CUSTOM)
 		return 0; /* EPERM, deny everything except custom level */
 
-- 
2.27.0.290.gba653c62da-goog



* [PATCH bpf v5 3/3] bpf: document optval > PAGE_SIZE behavior for sockopt hooks
  2020-06-17  1:04 [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Stanislav Fomichev
  2020-06-17  1:04 ` [PATCH bpf v5 2/3] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed Stanislav Fomichev
@ 2020-06-17  1:04 ` Stanislav Fomichev
  2020-06-17 17:09 ` [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Alexei Starovoitov
  2 siblings, 0 replies; 6+ messages in thread
From: Stanislav Fomichev @ 2020-06-17  1:04 UTC
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev

Extend existing doc with more details about requiring ctx->optlen = 0
for handling optval > PAGE_SIZE.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/bpf/prog_cgroup_sockopt.rst | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/bpf/prog_cgroup_sockopt.rst b/Documentation/bpf/prog_cgroup_sockopt.rst
index c47d974629ae..172f957204bf 100644
--- a/Documentation/bpf/prog_cgroup_sockopt.rst
+++ b/Documentation/bpf/prog_cgroup_sockopt.rst
@@ -86,6 +86,20 @@ then the next program in the chain (A) will see those changes,
 *not* the original input ``setsockopt`` arguments. The potentially
 modified values will be then passed down to the kernel.
 
+Large optval
+============
+When the ``optval`` is greater than the ``PAGE_SIZE``, the BPF program
+can access only the first ``PAGE_SIZE`` of that data. So it has two options:
+
+* Set ``optlen`` to zero, which indicates that the kernel should
+  use the original buffer from the userspace. Any modifications
+  done by the BPF program to the ``optval`` are ignored.
+* Set ``optlen`` to the value less than ``PAGE_SIZE``, which
+  indicates that the kernel should use BPF's trimmed ``optval``.
+
+When the BPF program returns with the ``optlen`` greater than
+``PAGE_SIZE``, the userspace will receive ``EFAULT`` errno.
+
 Example
 =======
 
-- 
2.27.0.290.gba653c62da-goog



* Re: [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
  2020-06-17  1:04 [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Stanislav Fomichev
  2020-06-17  1:04 ` [PATCH bpf v5 2/3] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed Stanislav Fomichev
  2020-06-17  1:04 ` [PATCH bpf v5 3/3] bpf: document optval > PAGE_SIZE behavior for sockopt hooks Stanislav Fomichev
@ 2020-06-17 17:09 ` Alexei Starovoitov
  2020-06-17 17:45   ` sdf
  2 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2020-06-17 17:09 UTC
  To: Stanislav Fomichev; +Cc: netdev, bpf, davem, ast, daniel, David Laight

On Tue, Jun 16, 2020 at 06:04:14PM -0700, Stanislav Fomichev wrote:
> Attaching to these hooks can break iptables because its optval is
> usually quite big, or at least bigger than the current PAGE_SIZE limit.
> David also mentioned some SCTP options can be big (around 256k).
> 
> For such optvals we expose only the first PAGE_SIZE bytes to
> the BPF program. BPF program has two options:
> 1. Set ctx->optlen to 0 to indicate that the BPF's optval
>    should be ignored and the kernel should use original userspace
>    value.
> 2. Set ctx->optlen to something that's smaller than the PAGE_SIZE.
> 
> v5:
> * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
> * update the docs accordingly
> 
> v4:
> * use temporary buffer to avoid optval == optval_end == NULL;
>   this removes the corner case in the verifier that might assume
>   non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.
> 
> v3:
> * don't increase the limit, bypass the argument
> 
> v2:
> * proper comments formatting (Jakub Kicinski)
> 
> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> Cc: David Laight <David.Laight@ACULAB.COM>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  kernel/bpf/cgroup.c | 53 ++++++++++++++++++++++++++++-----------------
>  1 file changed, 33 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 4d76f16524cc..ac53102e244a 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1276,16 +1276,23 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
>  
>  static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
>  {
> -	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> +	if (unlikely(max_optlen < 0))
>  		return -EINVAL;
>  
> +	if (unlikely(max_optlen > PAGE_SIZE)) {
> +		/* We don't expose optvals that are greater than PAGE_SIZE
> +		 * to the BPF program.
> +		 */
> +		max_optlen = PAGE_SIZE;
> +	}
> +
>  	ctx->optval = kzalloc(max_optlen, GFP_USER);
>  	if (!ctx->optval)
>  		return -ENOMEM;
>  
>  	ctx->optval_end = ctx->optval + max_optlen;
>  
> -	return 0;
> +	return max_optlen;
>  }
>  
>  static void sockopt_free_buf(struct bpf_sockopt_kern *ctx)
> @@ -1319,13 +1326,13 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
>  	 */
>  	max_optlen = max_t(int, 16, *optlen);
>  
> -	ret = sockopt_alloc_buf(&ctx, max_optlen);
> -	if (ret)
> -		return ret;
> +	max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
> +	if (max_optlen < 0)
> +		return max_optlen;
>  
>  	ctx.optlen = *optlen;
>  
> -	if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
> +	if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
>  		ret = -EFAULT;
>  		goto out;
>  	}
> @@ -1353,8 +1360,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
>  		/* export any potential modifications */
>  		*level = ctx.level;
>  		*optname = ctx.optname;
> -		*optlen = ctx.optlen;
> -		*kernel_optval = ctx.optval;
> +
> +		/* optlen == 0 from BPF indicates that we should
> +		 * use original userspace data.
> +		 */
> +		if (ctx.optlen != 0) {
> +			*optlen = ctx.optlen;

I think it should be:
*optlen = min(ctx.optlen, max_optlen);

Otherwise, when the bpf prog doesn't adjust ctx.optlen, the kernel will see
only 4k in kernel_optval whereas optlen will be > 4k.
I suspect iptables sockopt should have crashed at this point.
How did you test it?

> +			*kernel_optval = ctx.optval;
> +		}
>  	}
>  


* Re: [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
  2020-06-17 17:09 ` [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE Alexei Starovoitov
@ 2020-06-17 17:45   ` sdf
  2020-06-17 17:59     ` Alexei Starovoitov
  0 siblings, 1 reply; 6+ messages in thread
From: sdf @ 2020-06-17 17:45 UTC
  To: Alexei Starovoitov; +Cc: netdev, bpf, davem, ast, daniel, David Laight

On 06/17, Alexei Starovoitov wrote:
> On Tue, Jun 16, 2020 at 06:04:14PM -0700, Stanislav Fomichev wrote:
> > Attaching to these hooks can break iptables because its optval is
> > usually quite big, or at least bigger than the current PAGE_SIZE limit.
> > David also mentioned some SCTP options can be big (around 256k).
> >
> > For such optvals we expose only the first PAGE_SIZE bytes to
> > the BPF program. BPF program has two options:
> > 1. Set ctx->optlen to 0 to indicate that the BPF's optval
> >    should be ignored and the kernel should use original userspace
> >    value.
> > 2. Set ctx->optlen to something that's smaller than the PAGE_SIZE.
> >
> > v5:
> > * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
> > * update the docs accordingly
> >
> > v4:
> > * use temporary buffer to avoid optval == optval_end == NULL;
> >   this removes the corner case in the verifier that might assume
> >   non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.
> >
> > v3:
> > * don't increase the limit, bypass the argument
> >
> > v2:
> > * proper comments formatting (Jakub Kicinski)
> >
> > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > Cc: David Laight <David.Laight@ACULAB.COM>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  kernel/bpf/cgroup.c | 53 ++++++++++++++++++++++++++++-----------------
> >  1 file changed, 33 insertions(+), 20 deletions(-)
> >
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index 4d76f16524cc..ac53102e244a 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -1276,16 +1276,23 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
> >
> >  static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
> >  {
> > -	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> > +	if (unlikely(max_optlen < 0))
> >  		return -EINVAL;
> >
> > +	if (unlikely(max_optlen > PAGE_SIZE)) {
> > +		/* We don't expose optvals that are greater than PAGE_SIZE
> > +		 * to the BPF program.
> > +		 */
> > +		max_optlen = PAGE_SIZE;
> > +	}
> > +
> >  	ctx->optval = kzalloc(max_optlen, GFP_USER);
> >  	if (!ctx->optval)
> >  		return -ENOMEM;
> >
> >  	ctx->optval_end = ctx->optval + max_optlen;
> >
> > -	return 0;
> > +	return max_optlen;
> >  }
> >
> >  static void sockopt_free_buf(struct bpf_sockopt_kern *ctx)
> > @@ -1319,13 +1326,13 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
> >  	 */
> >  	max_optlen = max_t(int, 16, *optlen);
> >
> > -	ret = sockopt_alloc_buf(&ctx, max_optlen);
> > -	if (ret)
> > -		return ret;
> > +	max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
> > +	if (max_optlen < 0)
> > +		return max_optlen;
> >
> >  	ctx.optlen = *optlen;
> >
> > -	if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
> > +	if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
> >  		ret = -EFAULT;
> >  		goto out;
> >  	}
> > @@ -1353,8 +1360,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
> >  		/* export any potential modifications */
> >  		*level = ctx.level;
> >  		*optname = ctx.optname;
> > -		*optlen = ctx.optlen;
> > -		*kernel_optval = ctx.optval;
> > +
> > +		/* optlen == 0 from BPF indicates that we should
> > +		 * use original userspace data.
> > +		 */
> > +		if (ctx.optlen != 0) {
> > +			*optlen = ctx.optlen;

> I think it should be:
> *optlen = min(ctx.optlen, max_optlen);
We do have the following (existing) check above:
	} else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
		/* optlen is out of bounds */
		ret = -EFAULT;
	} else {

So we shouldn't need any min here? Or am I missing something?
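
To illustrate the point, here is a userspace sketch of how the value that
the BPF program leaves in ctx.optlen is handled on the setsockopt path.
It is a model built from the check quoted above and the hunk in patch 1/3,
not kernel code; the enum and the function model_setsockopt_export() are
made-up names, and the ctx.optlen == -1 case belongs to an earlier branch
that is not shown in this thread, so it is left unmodeled.

```c
#include <assert.h>

enum model_outcome {
	MODEL_USE_USER_BUFFER,	/* ctx.optlen == 0: kernel keeps the original optval */
	MODEL_USE_BPF_BUFFER,	/* 0 < ctx.optlen <= max_optlen: kernel uses BPF's optval */
	MODEL_EFAULT,		/* out of bounds: the syscall fails with -EFAULT */
	MODEL_NOT_MODELED,	/* ctx.optlen == -1: handled by a branch not shown here */
};

/* max_optlen is the (possibly trimmed) buffer size that
 * sockopt_alloc_buf() returned; ctx_optlen is what the BPF program
 * left in ctx->optlen.
 */
static enum model_outcome model_setsockopt_export(int ctx_optlen, int max_optlen)
{
	assert(max_optlen >= 0);

	if (ctx_optlen == -1)
		return MODEL_NOT_MODELED;

	/* The existing bounds check: this is why no min() is needed later. */
	if (ctx_optlen > max_optlen || ctx_optlen < -1)
		return MODEL_EFAULT;

	/* New in this patch: optlen == 0 means "use the userspace data". */
	if (ctx_optlen == 0)
		return MODEL_USE_USER_BUFFER;

	return MODEL_USE_BPF_BUFFER;
}
```

In this model, a program that leaves ctx.optlen at 8192 while max_optlen
was trimmed to 4096 ends up in MODEL_EFAULT, so the kernel never sees a
4k kernel_optval paired with a larger optlen.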

> Otherwise, when the bpf prog doesn't adjust ctx.optlen, the kernel will see
> only 4k in kernel_optval whereas optlen will be > 4k.
> I suspect iptables sockopt should have crashed at this point.
> How did you test it?
The selftests that I've attached in the series. The test is passing
two pages and for IP_TOS we bypass the value via optlen=0 and
for IP_FREEBIND we trim the buffer to 1 byte. I think this should
cover this check here.

One thing I didn't really test is getsockopt when the kernel
returns a really large buffer (iptables). Right now, the test
gets 4 bytes (trimmed) from the kernel. I think that's the only
place that I didn't properly test. I wonder whether I should
do a real iptables-like setsockopt/getsockopt :-/
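
For reference, the getsockopt export step from patch 1/3 can be modeled in
userspace as below. This is only a sketch under stated assumptions: plain
memcpy() stands in for copy_to_user()/put_user(), the function name
model_getsockopt_export() is made up, and ctx_optlen is assumed to have
already passed the kernel's bounds checks (which are not shown in this
thread).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the final getsockopt export step.
 * user_buf/user_len model the caller's optval/optlen; bpf_buf models
 * ctx.optval. Nothing here can fault, unlike the real copy_to_user().
 */
static void model_getsockopt_export(unsigned char *user_buf, int *user_len,
				    const unsigned char *bpf_buf, int ctx_optlen)
{
	assert(user_buf && user_len && bpf_buf && ctx_optlen >= 0);

	/* optlen == 0 from BPF: leave whatever the kernel already
	 * returned to userspace untouched, however large it is.
	 */
	if (ctx_optlen != 0) {
		memcpy(user_buf, bpf_buf, (size_t)ctx_optlen);
		*user_len = ctx_optlen;
	}
}
```

Under this model, a large kernel reply survives intact when the BPF
program sets ctx->optlen to 0, and only the first ctx->optlen bytes are
overwritten otherwise.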


* Re: [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
  2020-06-17 17:45   ` sdf
@ 2020-06-17 17:59     ` Alexei Starovoitov
  0 siblings, 0 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2020-06-17 17:59 UTC
  To: sdf; +Cc: netdev, bpf, davem, ast, daniel, David Laight

On Wed, Jun 17, 2020 at 10:45:08AM -0700, sdf@google.com wrote:
> On 06/17, Alexei Starovoitov wrote:
> > On Tue, Jun 16, 2020 at 06:04:14PM -0700, Stanislav Fomichev wrote:
> > > Attaching to these hooks can break iptables because its optval is
> > > usually quite big, or at least bigger than the current PAGE_SIZE limit.
> > > David also mentioned some SCTP options can be big (around 256k).
> > >
> > > For such optvals we expose only the first PAGE_SIZE bytes to
> > > the BPF program. BPF program has two options:
> > > 1. Set ctx->optlen to 0 to indicate that the BPF's optval
> > >    should be ignored and the kernel should use original userspace
> > >    value.
> > > 2. Set ctx->optlen to something that's smaller than the PAGE_SIZE.
> > >
> > > v5:
> > > * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
> > > * update the docs accordingly
> > >
> > > v4:
> > > * use temporary buffer to avoid optval == optval_end == NULL;
> > >   this removes the corner case in the verifier that might assume
> > >   non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.
> > >
> > > v3:
> > > * don't increase the limit, bypass the argument
> > >
> > > v2:
> > > * proper comments formatting (Jakub Kicinski)
> > >
> > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > > Cc: David Laight <David.Laight@ACULAB.COM>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  kernel/bpf/cgroup.c | 53 ++++++++++++++++++++++++++++-----------------
> > >  1 file changed, 33 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > > index 4d76f16524cc..ac53102e244a 100644
> > > --- a/kernel/bpf/cgroup.c
> > > +++ b/kernel/bpf/cgroup.c
> > > @@ -1276,16 +1276,23 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
> > >
> > >  static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
> > >  {
> > > -	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> > > +	if (unlikely(max_optlen < 0))
> > >  		return -EINVAL;
> > >
> > > +	if (unlikely(max_optlen > PAGE_SIZE)) {
> > > +		/* We don't expose optvals that are greater than PAGE_SIZE
> > > +		 * to the BPF program.
> > > +		 */
> > > +		max_optlen = PAGE_SIZE;
> > > +	}
> > > +
> > >  	ctx->optval = kzalloc(max_optlen, GFP_USER);
> > >  	if (!ctx->optval)
> > >  		return -ENOMEM;
> > >
> > >  	ctx->optval_end = ctx->optval + max_optlen;
> > >
> > > -	return 0;
> > > +	return max_optlen;
> > >  }
> > >
> > >  static void sockopt_free_buf(struct bpf_sockopt_kern *ctx)
> > > @@ -1319,13 +1326,13 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
> > >  	 */
> > >  	max_optlen = max_t(int, 16, *optlen);
> > >
> > > -	ret = sockopt_alloc_buf(&ctx, max_optlen);
> > > -	if (ret)
> > > -		return ret;
> > > +	max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
> > > +	if (max_optlen < 0)
> > > +		return max_optlen;
> > >
> > >  	ctx.optlen = *optlen;
> > >
> > > -	if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
> > > +	if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
> > >  		ret = -EFAULT;
> > >  		goto out;
> > >  	}
> > > @@ -1353,8 +1360,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
> > >  		/* export any potential modifications */
> > >  		*level = ctx.level;
> > >  		*optname = ctx.optname;
> > > -		*optlen = ctx.optlen;
> > > -		*kernel_optval = ctx.optval;
> > > +
> > > +		/* optlen == 0 from BPF indicates that we should
> > > +		 * use original userspace data.
> > > +		 */
> > > +		if (ctx.optlen != 0) {
> > > +			*optlen = ctx.optlen;
> 
> > I think it should be:
> > *optlen = min(ctx.optlen, max_optlen);
> We do have the following (existing) check above:
> 	} else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
> 		/* optlen is out of bounds */
> 		ret = -EFAULT;
> 	} else {
> 
> So we shouldn't need any min here? Or am I missing something?

ahh. you're right.
Applied to bpf tree.

> > Otherwise, when the bpf prog doesn't adjust ctx.optlen, the kernel will see
> > only 4k in kernel_optval whereas optlen will be > 4k.
> > I suspect iptables sockopt should have crashed at this point.
> > How did you test it?
> The selftests that I've attached in the series. The test is passing
> two pages and for IP_TOS we bypass the value via optlen=0 and
> for IP_FREEBIND we trim the buffer to 1 byte. I think this should
> cover this check here.
> 
> One thing I didn't really test is getsockopt when the kernel
> returns really large buffer (iptables). Right now, the test
> gets 4 bytes (trimmed) from the kernel. I think that's the only
> place that I didn't properly test. I wonder whether I should
> do a real iptables-like setsockopt/getsockopt :-/

would be nice :)

