* [net PATCH 0/5] sockmap fixes for net
@ 2017-10-18 14:09 John Fastabend
2017-10-18 14:10 ` [net PATCH 1/5] bpf: enforce TCP only support for sockmap John Fastabend
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: John Fastabend @ 2017-10-18 14:09 UTC (permalink / raw)
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
The following series implements a set of fixes for sockmap and
slightly changes the API in a few places to reduce the
preempt_disable/enable scope. We do this here in net because it
requires an API change, and doing it now avoids getting stuck with a
legacy API going forward.
The short description:
Access to skb->mark is removed; it is problematic when we add
features in the future because mark is a union and is used internally
by the TCP/socket code. We don't want to expose this to BPF programs
or let programs change the value.
The other change is caching metadata in the skb itself between when
the BPF program returns a redirect code and when the core code
implements the redirect. This avoids keeping per-cpu metadata.
Finally, use of sockmap is tightened to require CAP_NET_ADMIN and to
accept only SOCK_STREAM sockets.
Thanks,
John
---
John Fastabend (5):
bpf: enforce TCP only support for sockmap
bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region
bpf: remove mark access for SK_SKB program types
bpf: require CAP_NET_ADMIN when using sockmap maps
bpf: require CAP_NET_ADMIN when using devmap
include/linux/filter.h | 2 +
include/net/tcp.h | 5 +++
kernel/bpf/devmap.c | 3 ++
kernel/bpf/sockmap.c | 28 ++++++++++++------
net/core/filter.c | 31 ++++++++++----------
samples/sockmap/sockmap_kern.c | 2 +
tools/include/uapi/linux/bpf.h | 3 +-
tools/testing/selftests/bpf/bpf_helpers.h | 2 +
tools/testing/selftests/bpf/sockmap_verdict_prog.c | 4 +--
tools/testing/selftests/bpf/test_maps.c | 12 +++++++-
tools/testing/selftests/bpf/test_verifier.c | 16 +++++++++-
11 files changed, 74 insertions(+), 34 deletions(-)
* [net PATCH 1/5] bpf: enforce TCP only support for sockmap
2017-10-18 14:09 [net PATCH 0/5] sockmap fixes for net John Fastabend
@ 2017-10-18 14:10 ` John Fastabend
2017-10-18 17:33 ` Alexei Starovoitov
2017-10-18 14:10 ` [net PATCH 2/5] bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region John Fastabend
` (4 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2017-10-18 14:10 UTC (permalink / raw)
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
From: John Fastabend <john.fastabend@gmail.com>
Only TCP sockets have been tested, and at the moment the state-change
callback only handles TCP sockets. Add a check to ensure that sockets
being added to the map are in fact TCP sockets.
For net-next we can consider UDP support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
kernel/bpf/sockmap.c | 6 ++++++
tools/testing/selftests/bpf/test_maps.c | 12 +++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 6424ce0..c68899d 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -840,6 +840,12 @@ static int sock_map_update_elem(struct bpf_map *map,
return -EINVAL;
}
+ if (skops.sk->sk_type != SOCK_STREAM ||
+ skops.sk->sk_protocol != IPPROTO_TCP) {
+ fput(socket->file);
+ return -EOPNOTSUPP;
+ }
+
err = sock_map_ctx_update_elem(&skops, map, key, flags);
fput(socket->file);
return err;
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index fe3a443..50ce52d 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -466,7 +466,7 @@ static void test_sockmap(int tasks, void *data)
int one = 1, map_fd_rx, map_fd_tx, map_fd_break, s, sc, rc;
struct bpf_map *bpf_map_rx, *bpf_map_tx, *bpf_map_break;
int ports[] = {50200, 50201, 50202, 50204};
- int err, i, fd, sfd[6] = {0xdeadbeef};
+ int err, i, fd, udp, sfd[6] = {0xdeadbeef};
u8 buf[20] = {0x0, 0x5, 0x3, 0x2, 0x1, 0x0};
int parse_prog, verdict_prog;
struct sockaddr_in addr;
@@ -548,6 +548,16 @@ static void test_sockmap(int tasks, void *data)
goto out_sockmap;
}
+ /* Test update with unsupported UDP socket */
+ udp = socket(AF_INET, SOCK_DGRAM, 0);
+ i = 0;
+ err = bpf_map_update_elem(fd, &i, &udp, BPF_ANY);
+ if (!err) {
+ printf("Failed socket SOCK_DGRAM allowed '%i:%i'\n",
+ i, udp);
+ goto out_sockmap;
+ }
+
/* Test update without programs */
for (i = 0; i < 6; i++) {
err = bpf_map_update_elem(fd, &i, &sfd[i], BPF_ANY);
* [net PATCH 2/5] bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region
2017-10-18 14:09 [net PATCH 0/5] sockmap fixes for net John Fastabend
2017-10-18 14:10 ` [net PATCH 1/5] bpf: enforce TCP only support for sockmap John Fastabend
@ 2017-10-18 14:10 ` John Fastabend
2017-10-18 17:36 ` Alexei Starovoitov
2017-10-18 14:10 ` [net PATCH 3/5] bpf: remove mark access for SK_SKB program types John Fastabend
` (3 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2017-10-18 14:10 UTC (permalink / raw)
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
From: John Fastabend <john.fastabend@gmail.com>
SK_SKB BPF programs run in socket/TCP context but early in the stack,
before most of the TCP metadata in tcp_skb_cb is needed. We can
therefore use the unused fields to store the BPF metadata needed for
SK_SKB programs when implementing the redirect function.

This allows us to drop the preempt-disable logic. It does, however,
require an API change, so bpf_sk_redirect_map() has been updated to
additionally take the skb context pointer. Note that we still
disable/enable preemption around the actual BPF program run to
account for map updates.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
include/linux/filter.h | 2 +
include/net/tcp.h | 5 +++
kernel/bpf/sockmap.c | 19 ++++++-------
net/core/filter.c | 29 ++++++++++----------
samples/sockmap/sockmap_kern.c | 2 +
tools/include/uapi/linux/bpf.h | 3 +-
tools/testing/selftests/bpf/bpf_helpers.h | 2 +
tools/testing/selftests/bpf/sockmap_verdict_prog.c | 4 +--
8 files changed, 36 insertions(+), 30 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index d29e58f..818a0b2 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -728,7 +728,7 @@ void xdp_do_flush_map(void);
void bpf_warn_invalid_xdp_action(u32 act);
void bpf_warn_invalid_xdp_redirect(u32 ifindex);
-struct sock *do_sk_redirect_map(void);
+struct sock *do_sk_redirect_map(struct sk_buff *skb);
#ifdef CONFIG_BPF_JIT
extern int bpf_jit_enable;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 89974c5..b1ef98e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -840,6 +840,11 @@ struct tcp_skb_cb {
struct inet6_skb_parm h6;
#endif
} header; /* For incoming skbs */
+ struct {
+ __u32 key;
+ __u32 flags;
+ struct bpf_map *map;
+ } bpf;
};
};
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index c68899d..beaabb2 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -39,6 +39,7 @@
#include <linux/workqueue.h>
#include <linux/list.h>
#include <net/strparser.h>
+#include <net/tcp.h>
struct bpf_stab {
struct bpf_map map;
@@ -101,9 +102,16 @@ static int smap_verdict_func(struct smap_psock *psock, struct sk_buff *skb)
return SK_DROP;
skb_orphan(skb);
+ /* We need to ensure that BPF metadata for maps is also cleared
+ * when we orphan the skb so that we don't have the possibility
+ * to reference a stale map.
+ */
+ TCP_SKB_CB(skb)->bpf.map = NULL;
skb->sk = psock->sock;
bpf_compute_data_end(skb);
+ preempt_disable();
rc = (*prog->bpf_func)(skb, prog->insnsi);
+ preempt_enable();
skb->sk = NULL;
return rc;
@@ -114,17 +122,10 @@ static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)
struct sock *sk;
int rc;
- /* Because we use per cpu values to feed input from sock redirect
- * in BPF program to do_sk_redirect_map() call we need to ensure we
- * are not preempted. RCU read lock is not sufficient in this case
- * with CONFIG_PREEMPT_RCU enabled so we must be explicit here.
- */
- preempt_disable();
rc = smap_verdict_func(psock, skb);
switch (rc) {
case SK_REDIRECT:
- sk = do_sk_redirect_map();
- preempt_enable();
+ sk = do_sk_redirect_map(skb);
if (likely(sk)) {
struct smap_psock *peer = smap_psock_sk(sk);
@@ -141,8 +142,6 @@ static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)
/* Fall through and free skb otherwise */
case SK_DROP:
default:
- if (rc != SK_REDIRECT)
- preempt_enable();
kfree_skb(skb);
}
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 74b8c91..ca1ba0b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1839,31 +1839,31 @@ static const struct bpf_func_proto bpf_redirect_proto = {
.arg2_type = ARG_ANYTHING,
};
-BPF_CALL_3(bpf_sk_redirect_map, struct bpf_map *, map, u32, key, u64, flags)
+BPF_CALL_4(bpf_sk_redirect_map, struct sk_buff *, skb,
+ struct bpf_map *, map, u32, key, u64, flags)
{
- struct redirect_info *ri = this_cpu_ptr(&redirect_info);
+ struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
if (unlikely(flags))
return SK_ABORTED;
- ri->ifindex = key;
- ri->flags = flags;
- ri->map = map;
+ tcb->bpf.key = key;
+ tcb->bpf.flags = flags;
+ tcb->bpf.map = map;
return SK_REDIRECT;
}
-struct sock *do_sk_redirect_map(void)
+struct sock *do_sk_redirect_map(struct sk_buff *skb)
{
- struct redirect_info *ri = this_cpu_ptr(&redirect_info);
+ struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
struct sock *sk = NULL;
- if (ri->map) {
- sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
+ if (tcb->bpf.map) {
+ sk = __sock_map_lookup_elem(tcb->bpf.map, tcb->bpf.key);
- ri->ifindex = 0;
- ri->map = NULL;
- /* we do not clear flags for future lookup */
+ tcb->bpf.key = 0;
+ tcb->bpf.map = NULL;
}
return sk;
@@ -1873,9 +1873,10 @@ static const struct bpf_func_proto bpf_sk_redirect_map_proto = {
.func = bpf_sk_redirect_map,
.gpl_only = false,
.ret_type = RET_INTEGER,
- .arg1_type = ARG_CONST_MAP_PTR,
- .arg2_type = ARG_ANYTHING,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_CONST_MAP_PTR,
.arg3_type = ARG_ANYTHING,
+ .arg4_type = ARG_ANYTHING,
};
BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index f9b38ef..52b0053 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -62,7 +62,7 @@ int bpf_prog2(struct __sk_buff *skb)
ret = 1;
bpf_printk("sockmap: %d -> %d @ %d\n", lport, bpf_ntohl(rport), ret);
- return bpf_sk_redirect_map(&sock_map, ret, 0);
+ return bpf_sk_redirect_map(skb, &sock_map, ret, 0);
}
SEC("sockops")
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 43ab5c4..be9a631 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -569,9 +569,10 @@ union bpf_attr {
* @flags: reserved for future use
* Return: 0 on success or negative error code
*
- * int bpf_sk_redirect_map(map, key, flags)
+ * int bpf_sk_redirect_map(skb, map, key, flags)
* Redirect skb to a sock in map using key as a lookup key for the
* sock in map.
+ * @skb: pointer to skb
* @map: pointer to sockmap
* @key: key to lookup sock in map
* @flags: reserved for future use
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 36fb916..b2e02bd 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -65,7 +65,7 @@ static int (*bpf_xdp_adjust_head)(void *ctx, int offset) =
static int (*bpf_setsockopt)(void *ctx, int level, int optname, void *optval,
int optlen) =
(void *) BPF_FUNC_setsockopt;
-static int (*bpf_sk_redirect_map)(void *map, int key, int flags) =
+static int (*bpf_sk_redirect_map)(void *ctx, void *map, int key, int flags) =
(void *) BPF_FUNC_sk_redirect_map;
static int (*bpf_sock_map_update)(void *map, void *key, void *value,
unsigned long long flags) =
diff --git a/tools/testing/selftests/bpf/sockmap_verdict_prog.c b/tools/testing/selftests/bpf/sockmap_verdict_prog.c
index 9b99bd1..2cd2d55 100644
--- a/tools/testing/selftests/bpf/sockmap_verdict_prog.c
+++ b/tools/testing/selftests/bpf/sockmap_verdict_prog.c
@@ -61,8 +61,8 @@ int bpf_prog2(struct __sk_buff *skb)
bpf_printk("verdict: data[0] = redir(%u:%u)\n", map, sk);
if (!map)
- return bpf_sk_redirect_map(&sock_map_rx, sk, 0);
- return bpf_sk_redirect_map(&sock_map_tx, sk, 0);
+ return bpf_sk_redirect_map(skb, &sock_map_rx, sk, 0);
+ return bpf_sk_redirect_map(skb, &sock_map_tx, sk, 0);
}
char _license[] SEC("license") = "GPL";
* [net PATCH 3/5] bpf: remove mark access for SK_SKB program types
2017-10-18 14:09 [net PATCH 0/5] sockmap fixes for net John Fastabend
2017-10-18 14:10 ` [net PATCH 1/5] bpf: enforce TCP only support for sockmap John Fastabend
2017-10-18 14:10 ` [net PATCH 2/5] bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region John Fastabend
@ 2017-10-18 14:10 ` John Fastabend
2017-10-18 17:34 ` Alexei Starovoitov
2017-10-18 14:11 ` [net PATCH 4/5] bpf: require CAP_NET_ADMIN when using sockmap maps John Fastabend
` (2 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2017-10-18 14:10 UTC (permalink / raw)
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
From: John Fastabend <john.fastabend@gmail.com>
The skb->mark field is in a union with reserved_tailroom, which is
used in the TCP code paths for stream memory allocation. Allowing
SK_SKB programs to set this field conflicts with future code
optimizations, such as "gifting" the skb to the egress path instead
of creating a new skb and doing a memcpy.

Because we do not have a released version of SK_SKB yet, let's just
remove it for now. A more appropriate scratch pad to use at the
socket layer is dev_scratch, but let's add that in future kernels
when needed.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
net/core/filter.c | 2 +-
tools/testing/selftests/bpf/test_verifier.c | 16 ++++++++++++++--
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index ca1ba0b..aa02659 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3684,7 +3684,6 @@ static bool sk_skb_is_valid_access(int off, int size,
{
if (type == BPF_WRITE) {
switch (off) {
- case bpf_ctx_range(struct __sk_buff, mark):
case bpf_ctx_range(struct __sk_buff, tc_index):
case bpf_ctx_range(struct __sk_buff, priority):
break;
@@ -3694,6 +3693,7 @@ static bool sk_skb_is_valid_access(int off, int size,
}
switch (off) {
+ case bpf_ctx_range(struct __sk_buff, mark):
case bpf_ctx_range(struct __sk_buff, tc_classid):
return false;
case bpf_ctx_range(struct __sk_buff, data):
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 26f3250..16cca57 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -1130,15 +1130,27 @@ static struct bpf_test tests[] = {
.errstr = "invalid bpf_context access",
},
{
- "check skb->mark is writeable by SK_SKB",
+ "invalid access of skb->mark for SK_SKB",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct __sk_buff, mark)),
+ BPF_EXIT_INSN(),
+ },
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_SK_SKB,
+ .errstr = "invalid bpf_context access",
+ },
+ {
+ "check skb->mark is not writeable by SK_SKB",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
offsetof(struct __sk_buff, mark)),
BPF_EXIT_INSN(),
},
- .result = ACCEPT,
+ .result = REJECT,
.prog_type = BPF_PROG_TYPE_SK_SKB,
+ .errstr = "invalid bpf_context access",
},
{
"check skb->tc_index is writeable by SK_SKB",
* [net PATCH 4/5] bpf: require CAP_NET_ADMIN when using sockmap maps
2017-10-18 14:09 [net PATCH 0/5] sockmap fixes for net John Fastabend
` (2 preceding siblings ...)
2017-10-18 14:10 ` [net PATCH 3/5] bpf: remove mark access for SK_SKB program types John Fastabend
@ 2017-10-18 14:11 ` John Fastabend
2017-10-18 17:34 ` Alexei Starovoitov
2017-10-18 14:11 ` [net PATCH 5/5] bpf: require CAP_NET_ADMIN when using devmap John Fastabend
2017-10-20 12:01 ` [net PATCH 0/5] sockmap fixes for net David Miller
5 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2017-10-18 14:11 UTC (permalink / raw)
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
From: John Fastabend <john.fastabend@gmail.com>
Restrict sockmap to CAP_NET_ADMIN.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
kernel/bpf/sockmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index beaabb2..2b6eb35 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -486,6 +486,9 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
int err = -EINVAL;
u64 cost;
+ if (!capable(CAP_NET_ADMIN))
+ return ERR_PTR(-EPERM);
+
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
* [net PATCH 5/5] bpf: require CAP_NET_ADMIN when using devmap
2017-10-18 14:09 [net PATCH 0/5] sockmap fixes for net John Fastabend
` (3 preceding siblings ...)
2017-10-18 14:11 ` [net PATCH 4/5] bpf: require CAP_NET_ADMIN when using sockmap maps John Fastabend
@ 2017-10-18 14:11 ` John Fastabend
2017-10-18 17:34 ` Alexei Starovoitov
2017-10-20 12:01 ` [net PATCH 0/5] sockmap fixes for net David Miller
5 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2017-10-18 14:11 UTC (permalink / raw)
To: alexei.starovoitov, davem; +Cc: netdev, borkmann
From: John Fastabend <john.fastabend@gmail.com>
Devmap is used with XDP, which requires CAP_NET_ADMIN, so let's also
require CAP_NET_ADMIN to use the map.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
kernel/bpf/devmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index e093d9a..7d9f32f 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -78,6 +78,9 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
int err = -EINVAL;
u64 cost;
+ if (!capable(CAP_NET_ADMIN))
+ return ERR_PTR(-EPERM);
+
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
* Re: [net PATCH 1/5] bpf: enforce TCP only support for sockmap
2017-10-18 14:10 ` [net PATCH 1/5] bpf: enforce TCP only support for sockmap John Fastabend
@ 2017-10-18 17:33 ` Alexei Starovoitov
0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2017-10-18 17:33 UTC (permalink / raw)
To: John Fastabend; +Cc: davem, netdev, borkmann
On Wed, Oct 18, 2017 at 07:10:15AM -0700, John Fastabend wrote:
> From: John Fastabend <john.fastabend@gmail.com>
>
> Only TCP sockets have been tested, and at the moment the state-change
> callback only handles TCP sockets. Add a check to ensure that sockets
> being added to the map are in fact TCP sockets.
>
> For net-next we can consider UDP support.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
* Re: [net PATCH 3/5] bpf: remove mark access for SK_SKB program types
2017-10-18 14:10 ` [net PATCH 3/5] bpf: remove mark access for SK_SKB program types John Fastabend
@ 2017-10-18 17:34 ` Alexei Starovoitov
0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2017-10-18 17:34 UTC (permalink / raw)
To: John Fastabend; +Cc: davem, netdev, borkmann
On Wed, Oct 18, 2017 at 07:10:58AM -0700, John Fastabend wrote:
> From: John Fastabend <john.fastabend@gmail.com>
>
> The skb->mark field is in a union with reserved_tailroom, which is
> used in the TCP code paths for stream memory allocation. Allowing
> SK_SKB programs to set this field conflicts with future code
> optimizations, such as "gifting" the skb to the egress path instead
> of creating a new skb and doing a memcpy.
>
> Because we do not have a released version of SK_SKB yet, let's just
> remove it for now. A more appropriate scratch pad to use at the
> socket layer is dev_scratch, but let's add that in future kernels
> when needed.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
* Re: [net PATCH 4/5] bpf: require CAP_NET_ADMIN when using sockmap maps
2017-10-18 14:11 ` [net PATCH 4/5] bpf: require CAP_NET_ADMIN when using sockmap maps John Fastabend
@ 2017-10-18 17:34 ` Alexei Starovoitov
0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2017-10-18 17:34 UTC (permalink / raw)
To: John Fastabend; +Cc: davem, netdev, borkmann
On Wed, Oct 18, 2017 at 07:11:22AM -0700, John Fastabend wrote:
> From: John Fastabend <john.fastabend@gmail.com>
>
> Restrict sockmap to CAP_NET_ADMIN.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
* Re: [net PATCH 5/5] bpf: require CAP_NET_ADMIN when using devmap
2017-10-18 14:11 ` [net PATCH 5/5] bpf: require CAP_NET_ADMIN when using devmap John Fastabend
@ 2017-10-18 17:34 ` Alexei Starovoitov
0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2017-10-18 17:34 UTC (permalink / raw)
To: John Fastabend; +Cc: davem, netdev, borkmann
On Wed, Oct 18, 2017 at 07:11:44AM -0700, John Fastabend wrote:
> From: John Fastabend <john.fastabend@gmail.com>
>
> Devmap is used with XDP, which requires CAP_NET_ADMIN, so let's also
> require CAP_NET_ADMIN to use the map.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
* Re: [net PATCH 2/5] bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region
2017-10-18 14:10 ` [net PATCH 2/5] bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region John Fastabend
@ 2017-10-18 17:36 ` Alexei Starovoitov
0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2017-10-18 17:36 UTC (permalink / raw)
To: John Fastabend; +Cc: davem, netdev, borkmann, Eric Dumazet
On Wed, Oct 18, 2017 at 07:10:36AM -0700, John Fastabend wrote:
> From: John Fastabend <john.fastabend@gmail.com>
>
> SK_SKB BPF programs run in socket/TCP context but early in the stack,
> before most of the TCP metadata in tcp_skb_cb is needed. We can
> therefore use the unused fields to store the BPF metadata needed for
> SK_SKB programs when implementing the redirect function.
>
> This allows us to drop the preempt-disable logic. It does, however,
> require an API change, so bpf_sk_redirect_map() has been updated to
> additionally take the skb context pointer. Note that we still
> disable/enable preemption around the actual BPF program run to
> account for map updates.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
> ---
> include/linux/filter.h | 2 +
> include/net/tcp.h | 5 +++
> kernel/bpf/sockmap.c | 19 ++++++-------
> net/core/filter.c | 29 ++++++++++----------
> samples/sockmap/sockmap_kern.c | 2 +
> tools/include/uapi/linux/bpf.h | 3 +-
> tools/testing/selftests/bpf/bpf_helpers.h | 2 +
> tools/testing/selftests/bpf/sockmap_verdict_prog.c | 4 +--
> 8 files changed, 36 insertions(+), 30 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index d29e58f..818a0b2 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -728,7 +728,7 @@ void xdp_do_flush_map(void);
> void bpf_warn_invalid_xdp_action(u32 act);
> void bpf_warn_invalid_xdp_redirect(u32 ifindex);
>
> -struct sock *do_sk_redirect_map(void);
> +struct sock *do_sk_redirect_map(struct sk_buff *skb);
>
> #ifdef CONFIG_BPF_JIT
> extern int bpf_jit_enable;
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 89974c5..b1ef98e 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -840,6 +840,11 @@ struct tcp_skb_cb {
> struct inet6_skb_parm h6;
> #endif
> } header; /* For incoming skbs */
> + struct {
> + __u32 key;
> + __u32 flags;
> + struct bpf_map *map;
> + } bpf;
I think it should be ok. cc Eric for visibility.
> };
> };
* Re: [net PATCH 0/5] sockmap fixes for net
2017-10-18 14:09 [net PATCH 0/5] sockmap fixes for net John Fastabend
` (4 preceding siblings ...)
2017-10-18 14:11 ` [net PATCH 5/5] bpf: require CAP_NET_ADMIN when using devmap John Fastabend
@ 2017-10-20 12:01 ` David Miller
5 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2017-10-20 12:01 UTC (permalink / raw)
To: john.r.fastabend; +Cc: alexei.starovoitov, netdev, borkmann
From: John Fastabend <john.r.fastabend@gmail.com>
Date: Wed, 18 Oct 2017 07:09:48 -0700
> The following series implements a set of fixes for sockmap and
> slightly changes the API in a few places to reduce the
> preempt_disable/enable scope. We do this here in net because it
> requires an API change, and doing it now avoids getting stuck with a
> legacy API going forward.
>
> The short description:
>
> Access to skb->mark is removed; it is problematic when we add
> features in the future because mark is a union and is used internally
> by the TCP/socket code. We don't want to expose this to BPF programs
> or let programs change the value.
>
> The other change is caching metadata in the skb itself between when
> the BPF program returns a redirect code and when the core code
> implements the redirect. This avoids keeping per-cpu metadata.
>
> Finally, use of sockmap is tightened to require CAP_NET_ADMIN and to
> accept only SOCK_STREAM sockets.
Series applied, thanks.