* [PATCH v4 bpf-next 0/4] bpf: Support setting variable-length tunnel options
@ 2022-08-23 13:30 Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper Shmulik Ladkani
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Shmulik Ladkani @ 2022-08-23 13:30 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend
Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani
Introduce 'skb_set_tunnel_opt_dynptr' to allow setting tunnel options of
dynamic length.
v2:
- Place test_tunnel's local route in a custom table, to ensure the IP
isn't considered assigned to a device.
v3:
- Avoid 'inline' for the __bpf_skb_set_tunopt helper function
v4:
- change API to be based on bpf_dynptr,
suggested by John Fastabend <john.fastabend@gmail.com>
Shmulik Ladkani (4):
bpf: Add 'bpf_dynptr_get_data' helper
bpf: Support setting variable-length tunnel options
selftests/bpf: Simplify test_tunnel setup for allowing non-local
tunnel traffic
selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case
to test_progs
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 12 ++
kernel/bpf/helpers.c | 10 +
net/core/filter.c | 36 +++-
tools/include/uapi/linux/bpf.h | 12 ++
.../selftests/bpf/prog_tests/test_tunnel.c | 131 ++++++++++--
.../selftests/bpf/progs/test_tunnel_kern.c | 200 ++++++++++++------
7 files changed, 322 insertions(+), 80 deletions(-)
--
2.37.2
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v4 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper
2022-08-23 13:30 [PATCH v4 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
@ 2022-08-23 13:30 ` Shmulik Ladkani
2022-08-23 19:11 ` Joanne Koong
2022-08-23 13:30 ` [PATCH v4 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Shmulik Ladkani @ 2022-08-23 13:30 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend
Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani
The task of calculating bpf_dynptr_kern's available size, and the
current (offset) data pointer is common for BPF functions working with
ARG_PTR_TO_DYNPTR parameters.
Introduce 'bpf_dynptr_get_data' which returns the current data
(with properer offset), and the number of usable bytes it has.
This will void callers from directly calculating bpf_dynptr_kern's
data, offset and size.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
include/linux/bpf.h | 1 +
kernel/bpf/helpers.c | 10 ++++++++++
2 files changed, 11 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 39bd36359c1e..6d288dfc302b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2576,6 +2576,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
enum bpf_dynptr_type type, u32 offset, u32 size);
void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
int bpf_dynptr_check_size(u32 size);
+void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr, u32 *avail_bytes);
#ifdef CONFIG_BPF_LSM
void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3c1b9bbcf971..91f406a9c37f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1461,6 +1461,16 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
bpf_dynptr_set_type(ptr, type);
}
+void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr, u32 *avail_bytes)
+{
+ u32 size = bpf_dynptr_get_size(ptr);
+
+ if (!ptr->data || ptr->offset > size)
+ return NULL;
+ *avail_bytes = size - ptr->offset;
+ return ptr->data + ptr->offset;
+}
+
void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
{
memset(ptr, 0, sizeof(*ptr));
--
2.37.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 bpf-next 2/4] bpf: Support setting variable-length tunnel options
2022-08-23 13:30 [PATCH v4 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper Shmulik Ladkani
@ 2022-08-23 13:30 ` Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
3 siblings, 0 replies; 7+ messages in thread
From: Shmulik Ladkani @ 2022-08-23 13:30 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend
Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani
Existing 'bpf_skb_set_tunnel_opt' allows setting tunnel options given
an option buffer (ARG_PTR_TO_MEM) and the compile-time fixed buffer
size (ARG_CONST_SIZE).
However, in certain cases we wish to set tunnel options of dynamic
length.
For example, we have an ebpf program that gets geneve options on
incoming packets, stores them into a map (using a key representing
the incoming flow), and later needs to assign *same* options to
reply packets (belonging to same flow).
This is currently imposssible without knowing sender's exact geneve
options length, which unfortunately is dymamic.
Introduce 'bpf_skb_set_tunnel_opt_dynptr'.
This is a variant of 'bpf_skb_set_tunnel_opt' which gets a bpf dynamic
pointer (ARG_PTR_TO_DYNPTR) parameter 'ptr' whose data points to the
options buffer, and 'len', the byte length of options data caller wishes
to copy into ip_tunnnel_info.
'len' must never exceed the dynptr's internal size, o/w EINVAL is
returned.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
v3: Avoid 'inline' for the __bpf_skb_set_tunopt helper function
v4: change API to be based on bpf_dynptr, suggested by John Fastabend <john.fastabend@gmail.com>
---
include/uapi/linux/bpf.h | 12 ++++++++++++
net/core/filter.c | 36 ++++++++++++++++++++++++++++++++--
tools/include/uapi/linux/bpf.h | 12 ++++++++++++
3 files changed, 58 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 934a2a8beb87..6d7eedf3644e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5355,6 +5355,17 @@ union bpf_attr {
* Return
* Current *ktime*.
*
+ * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)
+ * Description
+ * Set tunnel options metadata for the packet associated to *skb*
+ * to the variable length *len* bytes of option data pointed to
+ * by the *opts* dynptr.
+ *
+ * See also the description of the **bpf_skb_get_tunnel_opt**\ ()
+ * helper for additional information.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5566,6 +5577,7 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \
FN(ktime_get_tai_ns), \
+ FN(skb_set_tunnel_opt_dynptr), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/core/filter.c b/net/core/filter.c
index 1acfaffeaf32..406ed0c50149 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4669,8 +4669,7 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_key_proto = {
.arg4_type = ARG_ANYTHING,
};
-BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
- const u8 *, from, u32, size)
+static u64 __bpf_skb_set_tunopt(struct sk_buff *skb, const u8 *from, u32 size)
{
struct ip_tunnel_info *info = skb_tunnel_info(skb);
const struct metadata_dst *md = this_cpu_ptr(md_dst);
@@ -4685,6 +4684,26 @@ BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
return 0;
}
+BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
+ const u8 *, from, u32, size)
+{
+ return __bpf_skb_set_tunopt(skb, from, size);
+}
+
+BPF_CALL_3(bpf_skb_set_tunnel_opt_dynptr, struct sk_buff *, skb,
+ struct bpf_dynptr_kern *, ptr, u32, len)
+{
+ const u8 *from;
+ u32 avail;
+
+ from = bpf_dynptr_get_data(ptr, &avail);
+ if (unlikely(!from))
+ return -EFAULT;
+ if (unlikely(len > avail))
+ return -EINVAL;
+ return __bpf_skb_set_tunopt(skb, from, len);
+}
+
static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
.func = bpf_skb_set_tunnel_opt,
.gpl_only = false,
@@ -4694,6 +4713,15 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
.arg3_type = ARG_CONST_SIZE,
};
+static const struct bpf_func_proto bpf_skb_set_tunnel_opt_dynptr_proto = {
+ .func = bpf_skb_set_tunnel_opt_dynptr,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL,
+ .arg3_type = ARG_ANYTHING,
+};
+
static const struct bpf_func_proto *
bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
{
@@ -4714,6 +4742,8 @@ bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
return &bpf_skb_set_tunnel_key_proto;
case BPF_FUNC_skb_set_tunnel_opt:
return &bpf_skb_set_tunnel_opt_proto;
+ case BPF_FUNC_skb_set_tunnel_opt_dynptr:
+ return &bpf_skb_set_tunnel_opt_dynptr_proto;
default:
return NULL;
}
@@ -7826,6 +7856,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_skb_get_tunnel_opt:
return &bpf_skb_get_tunnel_opt_proto;
case BPF_FUNC_skb_set_tunnel_opt:
+ case BPF_FUNC_skb_set_tunnel_opt_dynptr:
return bpf_get_skb_set_tunnel_proto(func_id);
case BPF_FUNC_redirect:
return &bpf_redirect_proto;
@@ -8169,6 +8200,7 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_skb_get_tunnel_opt:
return &bpf_skb_get_tunnel_opt_proto;
case BPF_FUNC_skb_set_tunnel_opt:
+ case BPF_FUNC_skb_set_tunnel_opt_dynptr:
return bpf_get_skb_set_tunnel_proto(func_id);
case BPF_FUNC_redirect:
return &bpf_redirect_proto;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1d6085e15fc8..b9bbccd1e5fa 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5355,6 +5355,17 @@ union bpf_attr {
* Return
* Current *ktime*.
*
+ * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)
+ * Description
+ * Set tunnel options metadata for the packet associated to *skb*
+ * to the variable length *len* bytes of option data pointed to
+ * by the *opts* dynptr.
+ *
+ * See also the description of the **bpf_skb_get_tunnel_opt**\ ()
+ * helper for additional information.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5566,6 +5577,7 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \
FN(ktime_get_tai_ns), \
+ FN(skb_set_tunnel_opt_dynptr), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
--
2.37.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic
2022-08-23 13:30 [PATCH v4 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
@ 2022-08-23 13:30 ` Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
3 siblings, 0 replies; 7+ messages in thread
From: Shmulik Ladkani @ 2022-08-23 13:30 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend
Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani
Commit 1115169f47ae ("selftests/bpf: Don't assign outer source IP to host")
removed the secondary IP (IP4_ADDR2_VETH1) assigned to veth1, in order
to test bpf_skb_set_tunnel_key's functionality when tunnel destination
isn't assigned to an interface.
The chosen setup for testing the "tunnel to unassigned outer IP"
scenario was rather complex: (1) static ARP entries in order to
bypass ARP (o/w requests will fail as the target address isn't assigned
locally), and (2) a BPF program running on veth1 ingress which
manipulates the IP header's daddr to the actual IP assigned to the
interface (o/w tunnel traffic won't be accepted locally).
This is complex, and adds a dependency on this hidden "dnat"-like eBPF
program, that needs to be replicated when new tunnel tests are added.
Instead, we can have a much simpler setup: Add the secondary IP as a
*local route* in a table pointed by a custom fib rule. No static arp
entries are needed, and the special eBPF program that "dnats" the outer
destination can be removed.
This commit is a revert of 1115169f47ae, with the addition of the local
route of IP4_ADDR2_VETH1 (instead of the original address assignment).
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
v2: Place the local route for the secondary IP in a custom table
pointed by a custom fib rule; this ensures the IP is not considered
assigned to a device.
---
.../selftests/bpf/prog_tests/test_tunnel.c | 23 ++----
.../selftests/bpf/progs/test_tunnel_kern.c | 80 +++----------------
2 files changed, 17 insertions(+), 86 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
index eea274110267..852da04ff281 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
@@ -82,7 +82,6 @@
#define MAC_TUNL_DEV0 "52:54:00:d9:01:00"
#define MAC_TUNL_DEV1 "52:54:00:d9:02:00"
-#define MAC_VETH1 "52:54:00:d9:03:00"
#define VXLAN_TUNL_DEV0 "vxlan00"
#define VXLAN_TUNL_DEV1 "vxlan11"
@@ -109,9 +108,15 @@
static int config_device(void)
{
SYS("ip netns add at_ns0");
- SYS("ip link add veth0 address " MAC_VETH1 " type veth peer name veth1");
+ SYS("ip link add veth0 type veth peer name veth1");
SYS("ip link set veth0 netns at_ns0");
SYS("ip addr add " IP4_ADDR1_VETH1 "/24 dev veth1");
+ /* Create a custom rule routing IP4_ADDR2_VETH1 as local.
+ * Do not place it in "local" table, to avoid this IP being considered
+ * assigned to a device.
+ */
+ SYS("ip rule add to " IP4_ADDR2_VETH1 " table 20");
+ SYS("ip route add local " IP4_ADDR2_VETH1 "/32 dev veth1 table 20");
SYS("ip link set dev veth1 up mtu 1500");
SYS("ip netns exec at_ns0 ip addr add " IP4_ADDR_VETH0 "/24 dev veth0");
SYS("ip netns exec at_ns0 ip link set dev veth0 up mtu 1500");
@@ -125,6 +130,7 @@ static void cleanup(void)
{
SYS_NOFAIL("test -f /var/run/netns/at_ns0 && ip netns delete at_ns0");
SYS_NOFAIL("ip link del veth1 2> /dev/null");
+ SYS_NOFAIL("ip rule del to %s table 20 2> /dev/null", IP4_ADDR2_VETH1);
SYS_NOFAIL("ip link del %s 2> /dev/null", VXLAN_TUNL_DEV1);
SYS_NOFAIL("ip link del %s 2> /dev/null", IP6VXLAN_TUNL_DEV1);
}
@@ -140,8 +146,6 @@ static int add_vxlan_tunnel(void)
VXLAN_TUNL_DEV0, IP4_ADDR_TUNL_DEV0);
SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev %s",
IP4_ADDR_TUNL_DEV1, MAC_TUNL_DEV1, VXLAN_TUNL_DEV0);
- SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev veth0",
- IP4_ADDR2_VETH1, MAC_VETH1);
/* root namespace */
SYS("ip link add dev %s type vxlan external gbp dstport 4789",
@@ -279,17 +283,6 @@ static void test_vxlan_tunnel(void)
if (attach_tc_prog(&tc_hook, get_src_prog_fd, set_src_prog_fd))
goto done;
- /* load and attach bpf prog to veth dev tc hook point */
- ifindex = if_nametoindex("veth1");
- if (!ASSERT_NEQ(ifindex, 0, "veth1 ifindex"))
- goto done;
- tc_hook.ifindex = ifindex;
- set_dst_prog_fd = bpf_program__fd(skel->progs.veth_set_outer_dst);
- if (!ASSERT_GE(set_dst_prog_fd, 0, "bpf_program__fd"))
- goto done;
- if (attach_tc_prog(&tc_hook, set_dst_prog_fd, -1))
- goto done;
-
/* load and attach prog set_md to tunnel dev tc hook point at_ns0 */
nstoken = open_netns("at_ns0");
if (!ASSERT_OK_PTR(nstoken, "setns src"))
diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index df0673c4ecbe..17f2f325b3f3 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -14,24 +14,15 @@
#include <linux/if_packet.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
-#include <linux/icmp.h>
#include <linux/types.h>
#include <linux/socket.h>
#include <linux/pkt_cls.h>
#include <linux/erspan.h>
-#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)
-#define VXLAN_UDP_PORT 4789
-
-/* Only IPv4 address assigned to veth1.
- * 172.16.1.200
- */
-#define ASSIGNED_ADDR_VETH1 0xac1001c8
-
struct geneve_opt {
__be16 opt_class;
__u8 type;
@@ -42,11 +33,6 @@ struct geneve_opt {
__u8 opt_data[8]; /* hard-coded to 8 byte */
};
-struct vxlanhdr {
- __be32 vx_flags;
- __be32 vx_vni;
-} __attribute__((packed));
-
struct vxlan_metadata {
__u32 gbp;
};
@@ -383,8 +369,14 @@ int vxlan_get_tunnel_src(struct __sk_buff *skb)
int ret;
struct bpf_tunnel_key key;
struct vxlan_metadata md;
- __u32 orig_daddr;
__u32 index = 0;
+ __u32 *local_ip = NULL;
+
+ local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+ if (!local_ip) {
+ log_err(ret);
+ return TC_ACT_SHOT;
+ }
ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
if (ret < 0) {
@@ -398,10 +390,11 @@ int vxlan_get_tunnel_src(struct __sk_buff *skb)
return TC_ACT_SHOT;
}
- if (key.local_ipv4 != ASSIGNED_ADDR_VETH1 || md.gbp != 0x800FF) {
+ if (key.local_ipv4 != *local_ip || md.gbp != 0x800FF) {
bpf_printk("vxlan key %d local ip 0x%x remote ip 0x%x gbp 0x%x\n",
key.tunnel_id, key.local_ipv4,
key.remote_ipv4, md.gbp);
+ bpf_printk("local_ip 0x%x\n", *local_ip);
log_err(ret);
return TC_ACT_SHOT;
}
@@ -409,61 +402,6 @@ int vxlan_get_tunnel_src(struct __sk_buff *skb)
return TC_ACT_OK;
}
-SEC("tc")
-int veth_set_outer_dst(struct __sk_buff *skb)
-{
- struct ethhdr *eth = (struct ethhdr *)(long)skb->data;
- __u32 assigned_ip = bpf_htonl(ASSIGNED_ADDR_VETH1);
- void *data_end = (void *)(long)skb->data_end;
- struct udphdr *udph;
- struct iphdr *iph;
- __u32 index = 0;
- int ret = 0;
- int shrink;
- __s64 csum;
-
- if ((void *)eth + sizeof(*eth) > data_end) {
- log_err(ret);
- return TC_ACT_SHOT;
- }
-
- if (eth->h_proto != bpf_htons(ETH_P_IP))
- return TC_ACT_OK;
-
- iph = (struct iphdr *)(eth + 1);
- if ((void *)iph + sizeof(*iph) > data_end) {
- log_err(ret);
- return TC_ACT_SHOT;
- }
- if (iph->protocol != IPPROTO_UDP)
- return TC_ACT_OK;
-
- udph = (struct udphdr *)(iph + 1);
- if ((void *)udph + sizeof(*udph) > data_end) {
- log_err(ret);
- return TC_ACT_SHOT;
- }
- if (udph->dest != bpf_htons(VXLAN_UDP_PORT))
- return TC_ACT_OK;
-
- if (iph->daddr != assigned_ip) {
- csum = bpf_csum_diff(&iph->daddr, sizeof(__u32), &assigned_ip,
- sizeof(__u32), 0);
- if (bpf_skb_store_bytes(skb, ETH_HLEN + offsetof(struct iphdr, daddr),
- &assigned_ip, sizeof(__u32), 0) < 0) {
- log_err(ret);
- return TC_ACT_SHOT;
- }
- if (bpf_l3_csum_replace(skb, ETH_HLEN + offsetof(struct iphdr, check),
- 0, csum, 0) < 0) {
- log_err(ret);
- return TC_ACT_SHOT;
- }
- bpf_skb_change_type(skb, PACKET_HOST);
- }
- return TC_ACT_OK;
-}
-
SEC("tc")
int ip6vxlan_set_tunnel_dst(struct __sk_buff *skb)
{
--
2.37.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs
2022-08-23 13:30 [PATCH v4 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
` (2 preceding siblings ...)
2022-08-23 13:30 ` [PATCH v4 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic Shmulik Ladkani
@ 2022-08-23 13:30 ` Shmulik Ladkani
3 siblings, 0 replies; 7+ messages in thread
From: Shmulik Ladkani @ 2022-08-23 13:30 UTC (permalink / raw)
To: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend
Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani
Add geneve test to test_tunnel. The test setup and scheme resembles the
existing vxlan test.
The test also checks variable length tunnel option assignment using
bpf_skb_set_tunnel_opt_dynptr.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
.../selftests/bpf/prog_tests/test_tunnel.c | 108 ++++++++++++++
.../selftests/bpf/progs/test_tunnel_kern.c | 136 ++++++++++++++++++
2 files changed, 244 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
index 852da04ff281..9aae03c720e9 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
@@ -87,6 +87,8 @@
#define VXLAN_TUNL_DEV1 "vxlan11"
#define IP6VXLAN_TUNL_DEV0 "ip6vxlan00"
#define IP6VXLAN_TUNL_DEV1 "ip6vxlan11"
+#define GENEVE_TUNL_DEV0 "geneve00"
+#define GENEVE_TUNL_DEV1 "geneve11"
#define PING_ARGS "-i 0.01 -c 3 -w 10 -q"
@@ -133,6 +135,38 @@ static void cleanup(void)
SYS_NOFAIL("ip rule del to %s table 20 2> /dev/null", IP4_ADDR2_VETH1);
SYS_NOFAIL("ip link del %s 2> /dev/null", VXLAN_TUNL_DEV1);
SYS_NOFAIL("ip link del %s 2> /dev/null", IP6VXLAN_TUNL_DEV1);
+ SYS_NOFAIL("ip link del %s 2> /dev/null", GENEVE_TUNL_DEV1);
+}
+
+static int add_geneve_tunnel(void)
+{
+ /* at_ns0 namespace */
+ SYS("ip netns exec at_ns0 ip link add dev %s type geneve external",
+ GENEVE_TUNL_DEV0);
+ SYS("ip netns exec at_ns0 ip link set dev %s address %s up",
+ GENEVE_TUNL_DEV0, MAC_TUNL_DEV0);
+ SYS("ip netns exec at_ns0 ip addr add dev %s %s/24",
+ GENEVE_TUNL_DEV0, IP4_ADDR_TUNL_DEV0);
+ SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev %s",
+ IP4_ADDR_TUNL_DEV1, MAC_TUNL_DEV1, GENEVE_TUNL_DEV0);
+
+ /* root namespace */
+ SYS("ip link add dev %s type geneve external", GENEVE_TUNL_DEV1);
+ SYS("ip link set dev %s address %s up", GENEVE_TUNL_DEV1, MAC_TUNL_DEV1);
+ SYS("ip addr add dev %s %s/24", GENEVE_TUNL_DEV1, IP4_ADDR_TUNL_DEV1);
+ SYS("ip neigh add %s lladdr %s dev %s",
+ IP4_ADDR_TUNL_DEV0, MAC_TUNL_DEV0, GENEVE_TUNL_DEV1);
+
+ return 0;
+fail:
+ return -1;
+}
+
+static void delete_geneve_tunnel(void)
+{
+ SYS_NOFAIL("ip netns exec at_ns0 ip link delete dev %s",
+ GENEVE_TUNL_DEV0);
+ SYS_NOFAIL("ip link delete dev %s", GENEVE_TUNL_DEV1);
}
static int add_vxlan_tunnel(void)
@@ -248,6 +282,79 @@ static int attach_tc_prog(struct bpf_tc_hook *hook, int igr_fd, int egr_fd)
return 0;
}
+static void test_geneve_tunnel(void)
+{
+ struct test_tunnel_kern *skel = NULL;
+ struct nstoken *nstoken;
+ int local_ip_map_fd = -1;
+ int set_src_prog_fd, get_src_prog_fd;
+ int set_dst_prog_fd;
+ int key = 0, ifindex = -1;
+ uint local_ip;
+ int err;
+ DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
+ .attach_point = BPF_TC_INGRESS);
+
+ /* add genve tunnel */
+ err = add_geneve_tunnel();
+ if (!ASSERT_OK(err, "add geneve tunnel"))
+ goto done;
+
+ /* load and attach bpf prog to tunnel dev tc hook point */
+ skel = test_tunnel_kern__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "test_tunnel_kern__open_and_load"))
+ goto done;
+ ifindex = if_nametoindex(GENEVE_TUNL_DEV1);
+ if (!ASSERT_NEQ(ifindex, 0, "geneve11 ifindex"))
+ goto done;
+ tc_hook.ifindex = ifindex;
+ get_src_prog_fd = bpf_program__fd(skel->progs.geneve_get_tunnel_src);
+ set_src_prog_fd = bpf_program__fd(skel->progs.geneve_set_tunnel_src);
+ if (!ASSERT_GE(get_src_prog_fd, 0, "bpf_program__fd"))
+ goto done;
+ if (!ASSERT_GE(set_src_prog_fd, 0, "bpf_program__fd"))
+ goto done;
+ if (attach_tc_prog(&tc_hook, get_src_prog_fd, set_src_prog_fd))
+ goto done;
+
+ /* load and attach prog set_md to tunnel dev tc hook point at_ns0 */
+ nstoken = open_netns("at_ns0");
+ if (!ASSERT_OK_PTR(nstoken, "setns src"))
+ goto done;
+ ifindex = if_nametoindex(GENEVE_TUNL_DEV0);
+ if (!ASSERT_NEQ(ifindex, 0, "geneve00 ifindex"))
+ goto done;
+ tc_hook.ifindex = ifindex;
+ set_dst_prog_fd = bpf_program__fd(skel->progs.geneve_set_tunnel_dst);
+ if (!ASSERT_GE(set_dst_prog_fd, 0, "bpf_program__fd"))
+ goto done;
+ if (attach_tc_prog(&tc_hook, -1, set_dst_prog_fd))
+ goto done;
+ close_netns(nstoken);
+
+ /* use veth1 ip 1 as tunnel source ip */
+ local_ip_map_fd = bpf_map__fd(skel->maps.local_ip_map);
+ if (!ASSERT_GE(local_ip_map_fd, 0, "bpf_map__fd"))
+ goto done;
+ local_ip = IP4_ADDR1_HEX_VETH1;
+ err = bpf_map_update_elem(local_ip_map_fd, &key, &local_ip, BPF_ANY);
+ if (!ASSERT_OK(err, "update bpf local_ip_map"))
+ goto done;
+
+ /* ping test */
+ err = test_ping(AF_INET, IP4_ADDR_TUNL_DEV0);
+ if (!ASSERT_OK(err, "test_ping"))
+ goto done;
+
+done:
+ /* delete geneve tunnel */
+ delete_geneve_tunnel();
+ if (local_ip_map_fd >= 0)
+ close(local_ip_map_fd);
+ if (skel)
+ test_tunnel_kern__destroy(skel);
+}
+
static void test_vxlan_tunnel(void)
{
struct test_tunnel_kern *skel = NULL;
@@ -408,6 +515,7 @@ static void *test_tunnel_run_tests(void *arg)
RUN_TEST(vxlan_tunnel);
RUN_TEST(ip6vxlan_tunnel);
+ RUN_TEST(geneve_tunnel);
cleanup();
diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index 17f2f325b3f3..dfd274d2f65d 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -23,6 +23,17 @@
#define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)
+struct tun_opts_raw {
+ __u8 data[64];
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+ __uint(max_entries, 1);
+ __type(key, __u32);
+ __type(value, struct tun_opts_raw);
+} geneve_opts SEC(".maps");
+
struct geneve_opt {
__be16 opt_class;
__u8 type;
@@ -285,6 +296,131 @@ int ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
return TC_ACT_OK;
}
+SEC("tc")
+int geneve_set_tunnel_dst(struct __sk_buff *skb)
+{
+ int ret;
+ struct bpf_tunnel_key key;
+ struct tun_opts_raw *opts;
+ struct bpf_dynptr dptr;
+ __u32 index = 0;
+ __u32 *local_ip = NULL;
+ int opts_len;
+
+ local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+ if (!local_ip) {
+ log_err(-1);
+ return TC_ACT_SHOT;
+ }
+
+ index = 0;
+ opts = bpf_map_lookup_elem(&geneve_opts, &index);
+ if (!opts) {
+ log_err(-1);
+ return TC_ACT_SHOT;
+ }
+
+ __builtin_memset(&key, 0x0, sizeof(key));
+ key.local_ipv4 = 0xac100164; /* 172.16.1.100 */
+ key.remote_ipv4 = *local_ip;
+ key.tunnel_id = 2;
+ key.tunnel_tos = 0;
+ key.tunnel_ttl = 64;
+
+ ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_ZERO_CSUM_TX);
+ if (ret < 0) {
+ log_err(ret);
+ return TC_ACT_SHOT;
+ }
+
+ __builtin_memset(opts, 0x0, sizeof(*opts));
+ bpf_dynptr_from_mem(opts, sizeof(*opts), 0, &dptr);
+ /* dynamic number of empty geneve options (4 bytes each).
+ * total len capped at sizeof(*opts) and is multiple of 4
+ */
+ opts_len = (skb->len % sizeof(*opts)) & ~(sizeof(__u32) - 1);
+ ret = bpf_skb_set_tunnel_opt_dynptr(skb, &dptr, opts_len);
+ if (ret < 0) {
+ log_err(ret);
+ return TC_ACT_SHOT;
+ }
+
+ return TC_ACT_OK;
+}
+
+SEC("tc")
+int geneve_set_tunnel_src(struct __sk_buff *skb)
+{
+ int ret;
+ struct bpf_tunnel_key key;
+ __u32 index = 0;
+ __u32 *local_ip = NULL;
+
+ local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+ if (!local_ip) {
+ log_err(ret);
+ return TC_ACT_SHOT;
+ }
+
+ __builtin_memset(&key, 0x0, sizeof(key));
+ key.local_ipv4 = *local_ip;
+ key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+ key.tunnel_id = 2;
+ key.tunnel_tos = 0;
+ key.tunnel_ttl = 64;
+
+ ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_ZERO_CSUM_TX);
+ if (ret < 0) {
+ log_err(ret);
+ return TC_ACT_SHOT;
+ }
+
+ return TC_ACT_OK;
+}
+
+SEC("tc")
+int geneve_get_tunnel_src(struct __sk_buff *skb)
+{
+ int ret;
+ struct bpf_tunnel_key key;
+ struct tun_opts_raw opts;
+ int expected_opts_len;
+ __u32 index = 0;
+ __u32 *local_ip = NULL;
+
+ local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+ if (!local_ip) {
+ log_err(ret);
+ return TC_ACT_SHOT;
+ }
+
+ ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+ if (ret < 0) {
+ log_err(ret);
+ return TC_ACT_SHOT;
+ }
+
+ ret = bpf_skb_get_tunnel_opt(skb, &opts, sizeof(opts));
+ if (ret < 0) {
+ log_err(ret);
+ return TC_ACT_SHOT;
+ }
+
+ expected_opts_len = (skb->len % sizeof(opts)) & ~(sizeof(__u32) - 1);
+ if (key.local_ipv4 != *local_ip || ret != expected_opts_len) {
+ bpf_printk("geneve key %d local ip 0x%x remote ip 0x%x opts_len %d\n",
+ key.tunnel_id, key.local_ipv4,
+ key.remote_ipv4, ret);
+ bpf_printk("local_ip 0x%x\n", *local_ip);
+ log_err(ret);
+ return TC_ACT_SHOT;
+ }
+
+ return TC_ACT_OK;
+}
+
SEC("tc")
int vxlan_set_tunnel_dst(struct __sk_buff *skb)
{
--
2.37.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v4 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper
2022-08-23 13:30 ` [PATCH v4 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper Shmulik Ladkani
@ 2022-08-23 19:11 ` Joanne Koong
2022-08-24 4:07 ` Shmulik Ladkani
0 siblings, 1 reply; 7+ messages in thread
From: Joanne Koong @ 2022-08-23 19:11 UTC (permalink / raw)
To: Shmulik Ladkani
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani
On Tue, Aug 23, 2022 at 10:01 AM Shmulik Ladkani
<shmulik@metanetworks.com> wrote:
>
> The task of calculating bpf_dynptr_kern's available size, and the
> current (offset) data pointer is common for BPF functions working with
> ARG_PTR_TO_DYNPTR parameters.
>
> Introduce 'bpf_dynptr_get_data' which returns the current data
> (with properer offset), and the number of usable bytes it has.
>
> This will void callers from directly calculating bpf_dynptr_kern's
> data, offset and size.
>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> ---
> include/linux/bpf.h | 1 +
> kernel/bpf/helpers.c | 10 ++++++++++
> 2 files changed, 11 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 39bd36359c1e..6d288dfc302b 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2576,6 +2576,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
> enum bpf_dynptr_type type, u32 offset, u32 size);
> void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
> int bpf_dynptr_check_size(u32 size);
> +void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr, u32 *avail_bytes);
>
> #ifdef CONFIG_BPF_LSM
> void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 3c1b9bbcf971..91f406a9c37f 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1461,6 +1461,16 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
> bpf_dynptr_set_type(ptr, type);
> }
>
> +void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr, u32 *avail_bytes)
> +{
> + u32 size = bpf_dynptr_get_size(ptr);
> +
> + if (!ptr->data || ptr->offset > size)
The "ptr->offset > size" check isn't quite correct because size is the
number of usable bytes (more on this below :))
> + return NULL;
> + *avail_bytes = size - ptr->offset;
dynptr->size is already the number of usable bytes; this is noted in
include/linux/bpf.h
/* the implementation of the opaque uapi struct bpf_dynptr */
struct bpf_dynptr_kern {
void *data;
/* Size represents the number of usable bytes of dynptr data.
* If for example the offset is at 4 for a local dynptr whose data is
* of type u64, the number of usable bytes is 4.
*
* The upper 8 bits are reserved. It is as follows:
* Bits 0 - 23 = size
* Bits 24 - 30 = dynptr type
* Bit 31 = whether dynptr is read-only
*/
u32 size;
u32 offset;
} __aligned(8);
> + return ptr->data + ptr->offset;
> +}
> +
> void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
> {
> memset(ptr, 0, sizeof(*ptr));
> --
> 2.37.2
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper
2022-08-23 19:11 ` Joanne Koong
@ 2022-08-24 4:07 ` Shmulik Ladkani
0 siblings, 0 replies; 7+ messages in thread
From: Shmulik Ladkani @ 2022-08-24 4:07 UTC (permalink / raw)
To: Joanne Koong
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani
On Tue, 23 Aug 2022 12:11:38 -0700
Joanne Koong <joannelkoong@gmail.com> wrote:
> The "ptr->offset > size" check isn't quite correct because size is the
> number of usable bytes (more on this below :))
>
> > + return NULL;
> > + *avail_bytes = size - ptr->offset;
>
> dynptr->size is already the number of usable bytes; this is noted in
> include/linux/bpf.h
>
> /* the implementation of the opaque uapi struct bpf_dynptr */
> struct bpf_dynptr_kern {
> void *data;
> /* Size represents the number of usable bytes of dynptr data.
Thanks.
BTW, despite the comment I was under the impression the 'size' is the
*fixed* allocation size associated with 'data' (and not the usable bytes
left at data+offset), since (1) havn't encounterd 'size' adjustments in
the helpers code, and (2) 'size' arithmetic isn't trivial (due to
the flags stored into size's upper bits). Therefore, assumed it is
probably the fixed size.
Anyway will fix the new 'bpf_dynptr_get_data'.
Best,
Shmulik
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-08-24 4:07 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-23 13:30 [PATCH v4 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper Shmulik Ladkani
2022-08-23 19:11 ` Joanne Koong
2022-08-24 4:07 ` Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic Shmulik Ladkani
2022-08-23 13:30 ` [PATCH v4 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox