* [net PATCH 1/5] bpf: enforce TCP only support for sockmap
From: John Fastabend @ 2017-10-18 14:10 UTC
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
From: John Fastabend <john.fastabend@gmail.com>
Only TCP sockets have been tested, and at the moment the state change
callback only handles TCP sockets. This adds a check to ensure that
the sockets being added to the map are in fact TCP sockets.
For net-next we can consider UDP support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
kernel/bpf/sockmap.c | 6 ++++++
tools/testing/selftests/bpf/test_maps.c | 12 +++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 6424ce0..c68899d 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -840,6 +840,12 @@ static int sock_map_update_elem(struct bpf_map *map,
return -EINVAL;
}
+ if (skops.sk->sk_type != SOCK_STREAM ||
+ skops.sk->sk_protocol != IPPROTO_TCP) {
+ fput(socket->file);
+ return -EOPNOTSUPP;
+ }
+
err = sock_map_ctx_update_elem(&skops, map, key, flags);
fput(socket->file);
return err;
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index fe3a443..50ce52d 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -466,7 +466,7 @@ static void test_sockmap(int tasks, void *data)
int one = 1, map_fd_rx, map_fd_tx, map_fd_break, s, sc, rc;
struct bpf_map *bpf_map_rx, *bpf_map_tx, *bpf_map_break;
int ports[] = {50200, 50201, 50202, 50204};
- int err, i, fd, sfd[6] = {0xdeadbeef};
+ int err, i, fd, udp, sfd[6] = {0xdeadbeef};
u8 buf[20] = {0x0, 0x5, 0x3, 0x2, 0x1, 0x0};
int parse_prog, verdict_prog;
struct sockaddr_in addr;
@@ -548,6 +548,16 @@ static void test_sockmap(int tasks, void *data)
goto out_sockmap;
}
+ /* Test update with unsupported UDP socket */
+ udp = socket(AF_INET, SOCK_DGRAM, 0);
+ i = 0;
+ err = bpf_map_update_elem(fd, &i, &udp, BPF_ANY);
+ if (!err) {
+ printf("Failed socket SOCK_DGRAM allowed '%i:%i'\n",
+ i, udp);
+ goto out_sockmap;
+ }
+
/* Test update without programs */
for (i = 0; i < 6; i++) {
err = bpf_map_update_elem(fd, &i, &sfd[i], BPF_ANY);
* Re: [net PATCH 1/5] bpf: enforce TCP only support for sockmap
From: Alexei Starovoitov @ 2017-10-18 17:33 UTC
To: John Fastabend; +Cc: davem, netdev, borkmann
On Wed, Oct 18, 2017 at 07:10:15AM -0700, John Fastabend wrote:
> From: John Fastabend <john.fastabend@gmail.com>
>
> Only TCP sockets have been tested, and at the moment the state change
> callback only handles TCP sockets. This adds a check to ensure that
> the sockets being added to the map are in fact TCP sockets.
>
> For net-next we can consider UDP support.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
* [net PATCH 2/5] bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region
From: John Fastabend @ 2017-10-18 14:10 UTC
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
From: John Fastabend <john.fastabend@gmail.com>
SK_SKB BPF programs are run from the socket/TCP context, but early
enough in the stack that much of the TCP metadata kept in tcp_skb_cb
is not yet needed. So we can use some of those currently unused
fields to hold the BPF metadata needed for SK_SKB programs when
implementing the redirect function.

This allows us to drop the preempt disable logic. It does, however,
require an API change: sk_redirect_map() has been updated to
additionally take the skb ctx pointer. Note that we still
disable/enable preemption around the actual BPF program run to
account for map updates.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
include/linux/filter.h | 2 +-
include/net/tcp.h | 5 +++
kernel/bpf/sockmap.c | 19 ++++++-------
net/core/filter.c | 29 ++++++++++----------
samples/sockmap/sockmap_kern.c | 2 +-
tools/include/uapi/linux/bpf.h | 3 +-
tools/testing/selftests/bpf/bpf_helpers.h | 2 +-
tools/testing/selftests/bpf/sockmap_verdict_prog.c | 4 +--
8 files changed, 36 insertions(+), 30 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index d29e58f..818a0b2 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -728,7 +728,7 @@ void xdp_do_flush_map(void);
void bpf_warn_invalid_xdp_action(u32 act);
void bpf_warn_invalid_xdp_redirect(u32 ifindex);
-struct sock *do_sk_redirect_map(void);
+struct sock *do_sk_redirect_map(struct sk_buff *skb);
#ifdef CONFIG_BPF_JIT
extern int bpf_jit_enable;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 89974c5..b1ef98e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -840,6 +840,11 @@ struct tcp_skb_cb {
struct inet6_skb_parm h6;
#endif
} header; /* For incoming skbs */
+ struct {
+ __u32 key;
+ __u32 flags;
+ struct bpf_map *map;
+ } bpf;
};
};
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index c68899d..beaabb2 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -39,6 +39,7 @@
#include <linux/workqueue.h>
#include <linux/list.h>
#include <net/strparser.h>
+#include <net/tcp.h>
struct bpf_stab {
struct bpf_map map;
@@ -101,9 +102,16 @@ static int smap_verdict_func(struct smap_psock *psock, struct sk_buff *skb)
return SK_DROP;
skb_orphan(skb);
+ /* We need to ensure that BPF metadata for maps is also cleared
+ * when we orphan the skb so that we don't have the possibility
+ * to reference a stale map.
+ */
+ TCP_SKB_CB(skb)->bpf.map = NULL;
skb->sk = psock->sock;
bpf_compute_data_end(skb);
+ preempt_disable();
rc = (*prog->bpf_func)(skb, prog->insnsi);
+ preempt_enable();
skb->sk = NULL;
return rc;
@@ -114,17 +122,10 @@ static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)
struct sock *sk;
int rc;
- /* Because we use per cpu values to feed input from sock redirect
- * in BPF program to do_sk_redirect_map() call we need to ensure we
- * are not preempted. RCU read lock is not sufficient in this case
- * with CONFIG_PREEMPT_RCU enabled so we must be explicit here.
- */
- preempt_disable();
rc = smap_verdict_func(psock, skb);
switch (rc) {
case SK_REDIRECT:
- sk = do_sk_redirect_map();
- preempt_enable();
+ sk = do_sk_redirect_map(skb);
if (likely(sk)) {
struct smap_psock *peer = smap_psock_sk(sk);
@@ -141,8 +142,6 @@ static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)
/* Fall through and free skb otherwise */
case SK_DROP:
default:
- if (rc != SK_REDIRECT)
- preempt_enable();
kfree_skb(skb);
}
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 74b8c91..ca1ba0b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1839,31 +1839,31 @@ static const struct bpf_func_proto bpf_redirect_proto = {
.arg2_type = ARG_ANYTHING,
};
-BPF_CALL_3(bpf_sk_redirect_map, struct bpf_map *, map, u32, key, u64, flags)
+BPF_CALL_4(bpf_sk_redirect_map, struct sk_buff *, skb,
+ struct bpf_map *, map, u32, key, u64, flags)
{
- struct redirect_info *ri = this_cpu_ptr(&redirect_info);
+ struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
if (unlikely(flags))
return SK_ABORTED;
- ri->ifindex = key;
- ri->flags = flags;
- ri->map = map;
+ tcb->bpf.key = key;
+ tcb->bpf.flags = flags;
+ tcb->bpf.map = map;
return SK_REDIRECT;
}
-struct sock *do_sk_redirect_map(void)
+struct sock *do_sk_redirect_map(struct sk_buff *skb)
{
- struct redirect_info *ri = this_cpu_ptr(&redirect_info);
+ struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
struct sock *sk = NULL;
- if (ri->map) {
- sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
+ if (tcb->bpf.map) {
+ sk = __sock_map_lookup_elem(tcb->bpf.map, tcb->bpf.key);
- ri->ifindex = 0;
- ri->map = NULL;
- /* we do not clear flags for future lookup */
+ tcb->bpf.key = 0;
+ tcb->bpf.map = NULL;
}
return sk;
@@ -1873,9 +1873,10 @@ static const struct bpf_func_proto bpf_sk_redirect_map_proto = {
.func = bpf_sk_redirect_map,
.gpl_only = false,
.ret_type = RET_INTEGER,
- .arg1_type = ARG_CONST_MAP_PTR,
- .arg2_type = ARG_ANYTHING,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_CONST_MAP_PTR,
.arg3_type = ARG_ANYTHING,
+ .arg4_type = ARG_ANYTHING,
};
BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index f9b38ef..52b0053 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -62,7 +62,7 @@ int bpf_prog2(struct __sk_buff *skb)
ret = 1;
bpf_printk("sockmap: %d -> %d @ %d\n", lport, bpf_ntohl(rport), ret);
- return bpf_sk_redirect_map(&sock_map, ret, 0);
+ return bpf_sk_redirect_map(skb, &sock_map, ret, 0);
}
SEC("sockops")
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 43ab5c4..be9a631 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -569,9 +569,10 @@ union bpf_attr {
* @flags: reserved for future use
* Return: 0 on success or negative error code
*
- * int bpf_sk_redirect_map(map, key, flags)
+ * int bpf_sk_redirect_map(skb, map, key, flags)
* Redirect skb to a sock in map using key as a lookup key for the
* sock in map.
+ * @skb: pointer to skb
* @map: pointer to sockmap
* @key: key to lookup sock in map
* @flags: reserved for future use
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 36fb916..b2e02bd 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -65,7 +65,7 @@ static int (*bpf_xdp_adjust_head)(void *ctx, int offset) =
static int (*bpf_setsockopt)(void *ctx, int level, int optname, void *optval,
int optlen) =
(void *) BPF_FUNC_setsockopt;
-static int (*bpf_sk_redirect_map)(void *map, int key, int flags) =
+static int (*bpf_sk_redirect_map)(void *ctx, void *map, int key, int flags) =
(void *) BPF_FUNC_sk_redirect_map;
static int (*bpf_sock_map_update)(void *map, void *key, void *value,
unsigned long long flags) =
diff --git a/tools/testing/selftests/bpf/sockmap_verdict_prog.c b/tools/testing/selftests/bpf/sockmap_verdict_prog.c
index 9b99bd1..2cd2d55 100644
--- a/tools/testing/selftests/bpf/sockmap_verdict_prog.c
+++ b/tools/testing/selftests/bpf/sockmap_verdict_prog.c
@@ -61,8 +61,8 @@ int bpf_prog2(struct __sk_buff *skb)
bpf_printk("verdict: data[0] = redir(%u:%u)\n", map, sk);
if (!map)
- return bpf_sk_redirect_map(&sock_map_rx, sk, 0);
- return bpf_sk_redirect_map(&sock_map_tx, sk, 0);
+ return bpf_sk_redirect_map(skb, &sock_map_rx, sk, 0);
+ return bpf_sk_redirect_map(skb, &sock_map_tx, sk, 0);
}
char _license[] SEC("license") = "GPL";
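
For reference, a minimal SK_SKB verdict program against the new
four-argument signature (a sketch in the selftests style; the map and
section names are illustrative). The skb ctx must now be passed so the
helper can stash the map/key in the tcp_skb_cb scratch area instead of
per-cpu redirect_info.

#include <linux/bpf.h>
#include "bpf_helpers.h"	/* selftests header with the updated prototype */

struct bpf_map_def SEC("maps") sock_map = {
	.type = BPF_MAP_TYPE_SOCKMAP,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 20,
};

SEC("sk_skb_verdict")
int prog_verdict(struct __sk_buff *skb)
{
	int key = 0;

	/* records map/key in TCP_SKB_CB(skb)->bpf, consumed later by
	 * do_sk_redirect_map(skb)
	 */
	return bpf_sk_redirect_map(skb, &sock_map, key, 0);
}

char _license[] SEC("license") = "GPL";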
* Re: [net PATCH 2/5] bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region
From: Alexei Starovoitov @ 2017-10-18 17:36 UTC
To: John Fastabend; +Cc: davem, netdev, borkmann, Eric Dumazet
On Wed, Oct 18, 2017 at 07:10:36AM -0700, John Fastabend wrote:
> From: John Fastabend <john.fastabend@gmail.com>
>
> SK_SKB BPF programs are run from the socket/TCP context, but early
> enough in the stack that much of the TCP metadata kept in tcp_skb_cb
> is not yet needed. So we can use some of those currently unused
> fields to hold the BPF metadata needed for SK_SKB programs when
> implementing the redirect function.
>
> This allows us to drop the preempt disable logic. It does, however,
> require an API change: sk_redirect_map() has been updated to
> additionally take the skb ctx pointer. Note that we still
> disable/enable preemption around the actual BPF program run to
> account for map updates.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
[...]
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 89974c5..b1ef98e 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -840,6 +840,11 @@ struct tcp_skb_cb {
> struct inet6_skb_parm h6;
> #endif
> } header; /* For incoming skbs */
> + struct {
> + __u32 key;
> + __u32 flags;
> + struct bpf_map *map;
> + } bpf;
I think it should be ok. cc Eric for visibility.
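
(The layout concern is that tcp_skb_cb must keep fitting in the 48-byte
skb->cb[] area; since the new bpf struct sits in a union with the inet
header parms it does not grow the struct, which the kernel already
asserts at boot along these lines:)

	/* paraphrased from tcp_init() in net/ipv4/tcp.c */
	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) >
		     FIELD_SIZEOF(struct sk_buff, cb));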
* [net PATCH 3/5] bpf: remove mark access for SK_SKB program types
From: John Fastabend @ 2017-10-18 14:10 UTC
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
From: John Fastabend <john.fastabend@gmail.com>
The skb->mark field is in a union with reserved_tailroom, which is
used in the TCP code paths for stream memory allocation. Allowing
SK_SKB programs to set this field creates a conflict with future code
optimizations, such as "gifting" the skb to the egress path instead
of creating a new skb and doing a memcpy.

Because we do not have a released version of SK_SKB yet, let's just
remove it for now. A more appropriate scratch pad to use at the
socket layer is dev_scratch, but let's add that in future kernels
when needed.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
net/core/filter.c | 2 +-
tools/testing/selftests/bpf/test_verifier.c | 16 ++++++++++++++--
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index ca1ba0b..aa02659 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3684,7 +3684,6 @@ static bool sk_skb_is_valid_access(int off, int size,
{
if (type == BPF_WRITE) {
switch (off) {
- case bpf_ctx_range(struct __sk_buff, mark):
case bpf_ctx_range(struct __sk_buff, tc_index):
case bpf_ctx_range(struct __sk_buff, priority):
break;
@@ -3694,6 +3693,7 @@ static bool sk_skb_is_valid_access(int off, int size,
}
switch (off) {
+ case bpf_ctx_range(struct __sk_buff, mark):
case bpf_ctx_range(struct __sk_buff, tc_classid):
return false;
case bpf_ctx_range(struct __sk_buff, data):
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 26f3250..16cca57 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -1130,15 +1130,27 @@ static struct bpf_test tests[] = {
.errstr = "invalid bpf_context access",
},
{
- "check skb->mark is writeable by SK_SKB",
+ "invalid access of skb->mark for SK_SKB",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct __sk_buff, mark)),
+ BPF_EXIT_INSN(),
+ },
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_SK_SKB,
+ .errstr = "invalid bpf_context access",
+ },
+ {
+ "check skb->mark is not writeable by SK_SKB",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
offsetof(struct __sk_buff, mark)),
BPF_EXIT_INSN(),
},
- .result = ACCEPT,
+ .result = REJECT,
.prog_type = BPF_PROG_TYPE_SK_SKB,
+ .errstr = "invalid bpf_context access",
},
{
"check skb->tc_index is writeable by SK_SKB",
* Re: [net PATCH 3/5] bpf: remove mark access for SK_SKB program types
From: Alexei Starovoitov @ 2017-10-18 17:34 UTC
To: John Fastabend; +Cc: davem, netdev, borkmann
On Wed, Oct 18, 2017 at 07:10:58AM -0700, John Fastabend wrote:
> From: John Fastabend <john.fastabend@gmail.com>
>
> The skb->mark field is in a union with reserved_tailroom, which is
> used in the TCP code paths for stream memory allocation. Allowing
> SK_SKB programs to set this field creates a conflict with future
> code optimizations, such as "gifting" the skb to the egress path
> instead of creating a new skb and doing a memcpy.
>
> Because we do not have a released version of SK_SKB yet, let's just
> remove it for now. A more appropriate scratch pad to use at the
> socket layer is dev_scratch, but let's add that in future kernels
> when needed.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
* [net PATCH 4/5] bpf: require CAP_NET_ADMIN when using sockmap maps
From: John Fastabend @ 2017-10-18 14:11 UTC
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
From: John Fastabend <john.fastabend@gmail.com>
Restrict sockmap to CAP_NET_ADMIN.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
kernel/bpf/sockmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index beaabb2..2b6eb35 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -486,6 +486,9 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
int err = -EINVAL;
u64 cost;
+ if (!capable(CAP_NET_ADMIN))
+ return ERR_PTR(-EPERM);
+
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
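
What unprivileged callers now see, sketched with the tools/lib/bpf
bpf_create_map() wrapper (map parameters are arbitrary): creation
fails with EPERM unless the process has CAP_NET_ADMIN.

#include <errno.h>
#include <stdio.h>
#include <linux/bpf.h>
#include "bpf/bpf.h"

int main(void)
{
	int fd = bpf_create_map(BPF_MAP_TYPE_SOCKMAP,
				sizeof(int), sizeof(int), 20, 0);

	if (fd < 0 && errno == EPERM)
		printf("sockmap creation requires CAP_NET_ADMIN\n");
	return 0;
}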
* [net PATCH 5/5] bpf: require CAP_NET_ADMIN when using devmap
From: John Fastabend @ 2017-10-18 14:11 UTC
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
From: John Fastabend <john.fastabend@gmail.com>
Devmap is used with XDP, which requires CAP_NET_ADMIN, so let's also
require CAP_NET_ADMIN to use the map.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
kernel/bpf/devmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index e093d9a..7d9f32f 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -78,6 +78,9 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
int err = -EINVAL;
u64 cost;
+ if (!capable(CAP_NET_ADMIN))
+ return ERR_PTR(-EPERM);
+
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
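
The devmap check behaves the same way; in the sockmap sketch after the
previous patch, swapping the map type suffices (again assuming the
tools/lib/bpf wrapper):

	int fd = bpf_create_map(BPF_MAP_TYPE_DEVMAP,
				sizeof(int), sizeof(int), 64, 0);
	/* without CAP_NET_ADMIN: fd == -1, errno == EPERM */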
* Re: [net PATCH 0/5] sockmap fixes for net
From: David Miller @ 2017-10-20 12:01 UTC
To: john.r.fastabend; +Cc: alexei.starovoitov, netdev, borkmann
From: John Fastabend <john.r.fastabend@gmail.com>
Date: Wed, 18 Oct 2017 07:09:48 -0700
> The following implements a set of fixes for sockmap and changes the
> API slightly in a few places to reduce preempt_disable/enable scope.
> We do this here in net because it requires an API change, and this
> avoids getting stuck with a legacy API going forward.
>
> The short description:
>
> Access to skb->mark is removed; it is problematic when we add
> features in the future because mark is a union used internally by
> the TCP/socket code. We don't want to expose this to BPF programs
> or let programs change the values.
>
> The other change is caching metadata in the skb itself between
> when the BPF program returns a redirect code and the core code
> implements the redirect. This avoids having per-cpu metadata.
>
> Finally, tighten restrictions on using sockmap: CAP_NET_ADMIN is
> required and only SOCK_STREAM sockets are supported.
Series applied, thanks.