* [PATCH bpf-next v2 0/4] netfilter: Add the capability to offload flowtable in XDP layer
@ 2024-05-18 10:12 Lorenzo Bianconi
2024-05-18 10:12 ` [PATCH bpf-next v2 1/4] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2024-05-18 10:12 UTC (permalink / raw)
To: bpf
Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, lorenzo.bianconi, toke, fw, hawk,
horms, donhunte, memxor
Introduce the bpf_xdp_flow_offload_lookup kfunc in order to perform the
lookup of a given flowtable entry based on the fib tuple of incoming
traffic.
bpf_xdp_flow_offload_lookup can be used as a building block to offload
the sw flowtable processing to XDP when hw support is not available.
This series has been tested by running the xdp_flowtable_offload eBPF
program on an ixgbe 10Gbps NIC (eno2), using XDP_REDIRECT to forward TCP
traffic to a veth pair (veth0-veth1) based on the content of the
nf_flowtable as soon as the TCP connection reaches the established state:
[tcp client] (eno1) == LAN == (eno2) xdp_flowtable_offload [XDP_REDIRECT] --> veth0 == veth1 [tcp server]
table inet filter {
flowtable ft {
hook ingress priority filter
devices = { eno2, veth0 }
}
chain forward {
type filter hook forward priority filter
meta l4proto { tcp, udp } flow add @ft
}
}
- sw flowtable [1 TCP stream, T = 300s]: ~ 6.2 Gbps
- xdp flowtable [1 TCP stream, T = 300s]: ~ 7.6 Gbps
- sw flowtable [3 TCP streams, T = 300s]: ~ 7.7 Gbps
- xdp flowtable [3 TCP streams, T = 300s]: ~ 8.8 Gbps
Changes since v1:
- return NULL in bpf_xdp_flow_offload_lookup kfunc in case of error
- take into account kfunc registration possible failures
Changes since RFC:
- fix compilation error if BTF is not enabled
Florian Westphal (1):
netfilter: nf_tables: add flowtable map for xdp offload
Lorenzo Bianconi (3):
netfilter: add bpf_xdp_flow_offload_lookup kfunc
samples/bpf: Add bpf sample to offload flowtable traffic to xdp
selftests/bpf: Add selftest for bpf_xdp_flow_offload_lookup kfunc
include/net/netfilter/nf_flow_table.h | 12 +
net/netfilter/Makefile | 5 +
net/netfilter/nf_flow_table_bpf.c | 94 +++
net/netfilter/nf_flow_table_inet.c | 2 +-
net/netfilter/nf_flow_table_offload.c | 161 ++++-
samples/bpf/Makefile | 7 +-
samples/bpf/xdp_flowtable_offload.bpf.c | 591 ++++++++++++++++++
samples/bpf/xdp_flowtable_offload_user.c | 128 ++++
tools/testing/selftests/bpf/Makefile | 10 +-
tools/testing/selftests/bpf/config | 4 +
.../selftests/bpf/progs/xdp_flowtable.c | 141 +++++
.../selftests/bpf/test_xdp_flowtable.sh | 112 ++++
tools/testing/selftests/bpf/xdp_flowtable.c | 142 +++++
13 files changed, 1403 insertions(+), 6 deletions(-)
create mode 100644 net/netfilter/nf_flow_table_bpf.c
create mode 100644 samples/bpf/xdp_flowtable_offload.bpf.c
create mode 100644 samples/bpf/xdp_flowtable_offload_user.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_flowtable.c
create mode 100755 tools/testing/selftests/bpf/test_xdp_flowtable.sh
create mode 100644 tools/testing/selftests/bpf/xdp_flowtable.c
--
2.45.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH bpf-next v2 1/4] netfilter: nf_tables: add flowtable map for xdp offload
2024-05-18 10:12 [PATCH bpf-next v2 0/4] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
@ 2024-05-18 10:12 ` Lorenzo Bianconi
2024-05-18 10:12 ` [PATCH bpf-next v2 2/4] netfilter: add bpf_xdp_flow_offload_lookup kfunc Lorenzo Bianconi
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2024-05-18 10:12 UTC (permalink / raw)
To: bpf
Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, lorenzo.bianconi, toke, fw, hawk,
horms, donhunte, memxor
From: Florian Westphal <fw@strlen.de>
This adds a small internal mapping table so that a new bpf (xdp) kfunc
can perform lookups in a flowtable.
As-is, an XDP program has access to the device pointer, but no way to do
a lookup in a flowtable -- there is no way to obtain the needed struct
without questionable stunts.
This allows obtaining an nf_flowtable pointer given a net_device
structure.
In order to keep backward compatibility, the infrastructure allows the
user to add a given device to multiple flowtables, but the lookup will
always return the first added mapping, since it assumes the right
configuration is a 1:1 mapping between flowtables and net_devices.
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
include/net/netfilter/nf_flow_table.h | 2 +
net/netfilter/nf_flow_table_offload.c | 161 +++++++++++++++++++++++++-
2 files changed, 161 insertions(+), 2 deletions(-)
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 9abb7ee40d72f..0bbe6ea8e0651 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -305,6 +305,8 @@ struct flow_ports {
__be16 source, dest;
};
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev);
+
unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state);
unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index a010b25076ca0..1acfcdbee42e8 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -17,6 +17,129 @@ static struct workqueue_struct *nf_flow_offload_add_wq;
static struct workqueue_struct *nf_flow_offload_del_wq;
static struct workqueue_struct *nf_flow_offload_stats_wq;
+struct flow_offload_xdp_ft {
+ struct list_head head;
+ struct nf_flowtable *ft;
+ struct rcu_head rcuhead;
+};
+
+struct flow_offload_xdp {
+ struct hlist_node hnode;
+ unsigned long net_device_addr;
+ struct list_head head;
+};
+
+#define NF_XDP_HT_BITS 4
+static DEFINE_HASHTABLE(nf_xdp_hashtable, NF_XDP_HT_BITS);
+static DEFINE_MUTEX(nf_xdp_hashtable_lock);
+
+/* caller must hold rcu read lock */
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev)
+{
+ unsigned long key = (unsigned long)dev;
+ struct flow_offload_xdp *iter;
+
+ hash_for_each_possible_rcu(nf_xdp_hashtable, iter, hnode, key) {
+ if (key == iter->net_device_addr) {
+ struct flow_offload_xdp_ft *ft_elem;
+
+ /* The user is supposed to insert a given net_device
+ * just into a single nf_flowtable so we always return
+ * the first element here.
+ */
+ ft_elem = list_first_or_null_rcu(&iter->head,
+ struct flow_offload_xdp_ft,
+ head);
+ return ft_elem ? ft_elem->ft : NULL;
+ }
+ }
+
+ return NULL;
+}
+
+static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
+ const struct net_device *dev)
+{
+ struct flow_offload_xdp *iter, *elem = NULL;
+ unsigned long key = (unsigned long)dev;
+ struct flow_offload_xdp_ft *ft_elem;
+
+ ft_elem = kzalloc(sizeof(*ft_elem), GFP_KERNEL_ACCOUNT);
+ if (!ft_elem)
+ return -ENOMEM;
+
+ ft_elem->ft = ft;
+
+ mutex_lock(&nf_xdp_hashtable_lock);
+
+ hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
+ if (key == iter->net_device_addr) {
+ elem = iter;
+ break;
+ }
+ }
+
+ if (!elem) {
+ elem = kzalloc(sizeof(*elem), GFP_KERNEL_ACCOUNT);
+ if (!elem)
+ goto err_unlock;
+
+ elem->net_device_addr = key;
+ INIT_LIST_HEAD(&elem->head);
+ hash_add_rcu(nf_xdp_hashtable, &elem->hnode, key);
+ }
+ list_add_tail_rcu(&ft_elem->head, &elem->head);
+
+ mutex_unlock(&nf_xdp_hashtable_lock);
+
+ return 0;
+
+err_unlock:
+ mutex_unlock(&nf_xdp_hashtable_lock);
+ kfree(ft_elem);
+
+ return -ENOMEM;
+}
+
+static void nf_flowtable_by_dev_remove(struct nf_flowtable *ft,
+ const struct net_device *dev)
+{
+ struct flow_offload_xdp *iter, *elem = NULL;
+ unsigned long key = (unsigned long)dev;
+
+ mutex_lock(&nf_xdp_hashtable_lock);
+
+ hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
+ if (key == iter->net_device_addr) {
+ elem = iter;
+ break;
+ }
+ }
+
+ if (elem) {
+ struct flow_offload_xdp_ft *ft_elem, *ft_next;
+
+ list_for_each_entry_safe(ft_elem, ft_next, &elem->head, head) {
+ if (ft_elem->ft == ft) {
+ list_del_rcu(&ft_elem->head);
+ kfree_rcu(ft_elem, rcuhead);
+ }
+ }
+
+ if (list_empty(&elem->head))
+ hash_del_rcu(&elem->hnode);
+ else
+ elem = NULL;
+ }
+
+ mutex_unlock(&nf_xdp_hashtable_lock);
+
+ if (elem) {
+ synchronize_rcu();
+ kfree(elem);
+ }
+}
+
struct flow_offload_work {
struct list_head list;
enum flow_cls_command cmd;
@@ -1183,6 +1306,38 @@ static int nf_flow_table_offload_cmd(struct flow_block_offload *bo,
return 0;
}
+static int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
+ struct net_device *dev,
+ enum flow_block_command cmd)
+{
+ switch (cmd) {
+ case FLOW_BLOCK_BIND:
+ return nf_flowtable_by_dev_insert(flowtable, dev);
+ case FLOW_BLOCK_UNBIND:
+ nf_flowtable_by_dev_remove(flowtable, dev);
+ return 0;
+ }
+
+ WARN_ON_ONCE(1);
+ return 0;
+}
+
+static void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
+ struct net_device *dev,
+ enum flow_block_command cmd)
+{
+ switch (cmd) {
+ case FLOW_BLOCK_BIND:
+ nf_flowtable_by_dev_remove(flowtable, dev);
+ return;
+ case FLOW_BLOCK_UNBIND:
+ /* We do not re-bind in case hw offload would report error
+ * on *unregister*.
+ */
+ break;
+ }
+}
+
int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
struct net_device *dev,
enum flow_block_command cmd)
@@ -1192,7 +1347,7 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
int err;
if (!nf_flowtable_hw_offload(flowtable))
- return 0;
+ return nf_flow_offload_xdp_setup(flowtable, dev, cmd);
if (dev->netdev_ops->ndo_setup_tc)
err = nf_flow_table_offload_cmd(&bo, flowtable, dev, cmd,
@@ -1200,8 +1355,10 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
else
err = nf_flow_table_indr_offload_cmd(&bo, flowtable, dev, cmd,
&extack);
- if (err < 0)
+ if (err < 0) {
+ nf_flow_offload_xdp_cancel(flowtable, dev, cmd);
return err;
+ }
return nf_flow_table_block_setup(flowtable, &bo, cmd);
}
--
2.45.1
* [PATCH bpf-next v2 2/4] netfilter: add bpf_xdp_flow_offload_lookup kfunc
2024-05-18 10:12 [PATCH bpf-next v2 0/4] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
2024-05-18 10:12 ` [PATCH bpf-next v2 1/4] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
@ 2024-05-18 10:12 ` Lorenzo Bianconi
2024-05-18 21:50 ` Kumar Kartikeya Dwivedi
2024-05-21 1:41 ` Alexei Starovoitov
2024-05-18 10:12 ` [PATCH bpf-next v2 3/4] samples/bpf: Add bpf sample to offload flowtable traffic to xdp Lorenzo Bianconi
2024-05-18 10:12 ` [PATCH bpf-next v2 4/4] selftests/bpf: Add selftest for bpf_xdp_flow_offload_lookup kfunc Lorenzo Bianconi
3 siblings, 2 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2024-05-18 10:12 UTC (permalink / raw)
To: bpf
Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, lorenzo.bianconi, toke, fw, hawk,
horms, donhunte, memxor
Introduce the bpf_xdp_flow_offload_lookup kfunc in order to perform the
lookup of a given flowtable entry based on a fib tuple of incoming
traffic.
bpf_xdp_flow_offload_lookup can be used as a building block to offload
the sw flowtable processing to XDP when a hw flowtable is not
available.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
include/net/netfilter/nf_flow_table.h | 10 +++
net/netfilter/Makefile | 5 ++
net/netfilter/nf_flow_table_bpf.c | 94 +++++++++++++++++++++++++++
net/netfilter/nf_flow_table_inet.c | 2 +-
4 files changed, 110 insertions(+), 1 deletion(-)
create mode 100644 net/netfilter/nf_flow_table_bpf.c
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 0bbe6ea8e0651..085660cbcd3f2 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -312,6 +312,16 @@ unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state);
+#if (IS_BUILTIN(CONFIG_NF_FLOW_TABLE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
+ (IS_MODULE(CONFIG_NF_FLOW_TABLE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
+extern int nf_flow_offload_register_bpf(void);
+#else
+static inline int nf_flow_offload_register_bpf(void)
+{
+ return 0;
+}
+#endif
+
#define MODULE_ALIAS_NF_FLOWTABLE(family) \
MODULE_ALIAS("nf-flowtable-" __stringify(family))
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 614815a3ed738..18b09cec92024 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -144,6 +144,11 @@ obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
nf_flow_table_offload.o
nf_flow_table-$(CONFIG_NF_FLOW_TABLE_PROCFS) += nf_flow_table_procfs.o
+ifeq ($(CONFIG_NF_FLOW_TABLE),m)
+nf_flow_table-$(CONFIG_DEBUG_INFO_BTF_MODULES) += nf_flow_table_bpf.o
+else ifeq ($(CONFIG_NF_FLOW_TABLE),y)
+nf_flow_table-$(CONFIG_DEBUG_INFO_BTF) += nf_flow_table_bpf.o
+endif
obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
diff --git a/net/netfilter/nf_flow_table_bpf.c b/net/netfilter/nf_flow_table_bpf.c
new file mode 100644
index 0000000000000..f999ed9712796
--- /dev/null
+++ b/net/netfilter/nf_flow_table_bpf.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Unstable Flow Table Helpers for XDP hook
+ *
+ * These are called from the XDP programs.
+ * Note that it is allowed to break compatibility for these functions since
+ * the interface they are exposed through to BPF programs is explicitly
+ * unstable.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <net/netfilter/nf_flow_table.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <net/xdp.h>
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+ "Global functions as their definitions will be in nf_flow_table BTF");
+
+static struct flow_offload_tuple_rhash *
+bpf_xdp_flow_offload_tuple_lookup(struct net_device *dev,
+ struct flow_offload_tuple *tuple,
+ __be16 proto)
+{
+ struct flow_offload_tuple_rhash *tuplehash;
+ struct nf_flowtable *flow_table;
+ struct flow_offload *flow;
+
+ flow_table = nf_flowtable_by_dev(dev);
+ if (!flow_table)
+ return NULL;
+
+ tuplehash = flow_offload_lookup(flow_table, tuple);
+ if (!tuplehash)
+ return NULL;
+
+ flow = container_of(tuplehash, struct flow_offload,
+ tuplehash[tuplehash->tuple.dir]);
+ flow_offload_refresh(flow_table, flow, false);
+
+ return tuplehash;
+}
+
+__bpf_kfunc struct flow_offload_tuple_rhash *
+bpf_xdp_flow_offload_lookup(struct xdp_md *ctx,
+ struct bpf_fib_lookup *fib_tuple)
+{
+ struct xdp_buff *xdp = (struct xdp_buff *)ctx;
+ struct flow_offload_tuple tuple = {
+ .iifidx = fib_tuple->ifindex,
+ .l3proto = fib_tuple->family,
+ .l4proto = fib_tuple->l4_protocol,
+ .src_port = fib_tuple->sport,
+ .dst_port = fib_tuple->dport,
+ };
+ __be16 proto;
+
+ switch (fib_tuple->family) {
+ case AF_INET:
+ tuple.src_v4.s_addr = fib_tuple->ipv4_src;
+ tuple.dst_v4.s_addr = fib_tuple->ipv4_dst;
+ proto = htons(ETH_P_IP);
+ break;
+ case AF_INET6:
+ tuple.src_v6 = *(struct in6_addr *)&fib_tuple->ipv6_src;
+ tuple.dst_v6 = *(struct in6_addr *)&fib_tuple->ipv6_dst;
+ proto = htons(ETH_P_IPV6);
+ break;
+ default:
+ return NULL;
+ }
+
+ return bpf_xdp_flow_offload_tuple_lookup(xdp->rxq->dev, &tuple, proto);
+}
+
+__diag_pop()
+
+BTF_KFUNCS_START(nf_ft_kfunc_set)
+BTF_ID_FLAGS(func, bpf_xdp_flow_offload_lookup)
+BTF_KFUNCS_END(nf_ft_kfunc_set)
+
+static const struct btf_kfunc_id_set nf_flow_offload_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &nf_ft_kfunc_set,
+};
+
+int nf_flow_offload_register_bpf(void)
+{
+ return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
+ &nf_flow_offload_kfunc_set);
+}
+EXPORT_SYMBOL_GPL(nf_flow_offload_register_bpf);
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index 6eef15648b7b0..6175f7556919d 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -98,7 +98,7 @@ static int __init nf_flow_inet_module_init(void)
nft_register_flowtable_type(&flowtable_ipv6);
nft_register_flowtable_type(&flowtable_inet);
- return 0;
+ return nf_flow_offload_register_bpf();
}
static void __exit nf_flow_inet_module_exit(void)
--
2.45.1
* [PATCH bpf-next v2 3/4] samples/bpf: Add bpf sample to offload flowtable traffic to xdp
2024-05-18 10:12 [PATCH bpf-next v2 0/4] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
2024-05-18 10:12 ` [PATCH bpf-next v2 1/4] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
2024-05-18 10:12 ` [PATCH bpf-next v2 2/4] netfilter: add bpf_xdp_flow_offload_lookup kfunc Lorenzo Bianconi
@ 2024-05-18 10:12 ` Lorenzo Bianconi
2024-05-21 1:45 ` Alexei Starovoitov
2024-05-18 10:12 ` [PATCH bpf-next v2 4/4] selftests/bpf: Add selftest for bpf_xdp_flow_offload_lookup kfunc Lorenzo Bianconi
3 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Bianconi @ 2024-05-18 10:12 UTC (permalink / raw)
To: bpf
Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, lorenzo.bianconi, toke, fw, hawk,
horms, donhunte, memxor
Introduce the xdp_flowtable_offload bpf sample to offload the sw
flowtable logic to the XDP layer when a hw flowtable is not available
or does not support a specific kind of traffic.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
samples/bpf/Makefile | 7 +-
samples/bpf/xdp_flowtable_offload.bpf.c | 591 +++++++++++++++++++++++
samples/bpf/xdp_flowtable_offload_user.c | 128 +++++
3 files changed, 725 insertions(+), 1 deletion(-)
create mode 100644 samples/bpf/xdp_flowtable_offload.bpf.c
create mode 100644 samples/bpf/xdp_flowtable_offload_user.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 9aa027b144df6..a3d089ca224d5 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -46,6 +46,7 @@ tprogs-y += xdp_fwd
tprogs-y += task_fd_query
tprogs-y += ibumad
tprogs-y += hbm
+tprogs-y += xdp_flowtable_offload
# Libbpf dependencies
LIBBPF_SRC = $(TOOLS_PATH)/lib/bpf
@@ -98,6 +99,7 @@ ibumad-objs := ibumad_user.o
hbm-objs := hbm.o $(CGROUP_HELPERS)
xdp_router_ipv4-objs := xdp_router_ipv4_user.o $(XDP_SAMPLE)
+xdp_flowtable_offload-objs := xdp_flowtable_offload_user.o $(XDP_SAMPLE)
# Tell kbuild to always build the programs
always-y := $(tprogs-y)
@@ -306,6 +308,7 @@ $(obj)/$(TRACE_HELPERS) $(obj)/$(CGROUP_HELPERS) $(obj)/$(XDP_SAMPLE): | libbpf_
.PHONY: libbpf_hdrs
$(obj)/xdp_router_ipv4_user.o: $(obj)/xdp_router_ipv4.skel.h
+$(obj)/xdp_flowtable_offload_user.o: $(obj)/xdp_flowtable_offload.skel.h
$(obj)/tracex5.bpf.o: $(obj)/syscall_nrs.h
$(obj)/hbm_out_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
@@ -361,6 +364,7 @@ endef
CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG))
$(obj)/xdp_router_ipv4.bpf.o: $(obj)/xdp_sample.bpf.o
+$(obj)/xdp_flowtable_offload.bpf.o: $(obj)/xdp_sample.bpf.o
$(obj)/%.bpf.o: $(src)/%.bpf.c $(obj)/vmlinux.h $(src)/xdp_sample.bpf.h $(src)/xdp_sample_shared.h
@echo " CLANG-BPF " $@
@@ -370,10 +374,11 @@ $(obj)/%.bpf.o: $(src)/%.bpf.c $(obj)/vmlinux.h $(src)/xdp_sample.bpf.h $(src)/x
-I$(LIBBPF_INCLUDE) $(CLANG_SYS_INCLUDES) \
-c $(filter %.bpf.c,$^) -o $@
-LINKED_SKELS := xdp_router_ipv4.skel.h
+LINKED_SKELS := xdp_router_ipv4.skel.h xdp_flowtable_offload.skel.h
clean-files += $(LINKED_SKELS)
xdp_router_ipv4.skel.h-deps := xdp_router_ipv4.bpf.o xdp_sample.bpf.o
+xdp_flowtable_offload.skel.h-deps := xdp_flowtable_offload.bpf.o xdp_sample.bpf.o
LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.bpf.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps)))
diff --git a/samples/bpf/xdp_flowtable_offload.bpf.c b/samples/bpf/xdp_flowtable_offload.bpf.c
new file mode 100644
index 0000000000000..2c41054b2eb95
--- /dev/null
+++ b/samples/bpf/xdp_flowtable_offload.bpf.c
@@ -0,0 +1,591 @@
+/* Copyright (c) 2024 Lorenzo Bianconi <lorenzo@kernel.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include "vmlinux.h"
+#include "xdp_sample.bpf.h"
+#include "xdp_sample_shared.h"
+
+#define MAX_ERRNO 4095
+#define BIT(x) (1 << (x))
+
+#define ETH_P_IP 0x0800
+#define IP_MF 0x2000 /* "More Fragments" */
+#define IP_OFFSET 0x1fff /* "Fragment Offset" */
+
+#define IPV6_FLOWINFO_MASK __cpu_to_be32(0x0fffffff)
+
+#define CSUM_MANGLED_0 ((__sum16)0xffff)
+
+struct flow_offload_tuple_rhash *
+bpf_xdp_flow_offload_lookup(struct xdp_md *,
+ struct bpf_fib_lookup *) __ksym;
+
+/* IP checksum utility routines */
+
+static __always_inline __u32 csum_add(__u32 csum, __u32 addend)
+{
+ __u32 res = csum + addend;
+
+ return res + (res < addend);
+}
+
+static __always_inline __u16 csum_fold(__u32 csum)
+{
+ csum = (csum & 0xffff) + (csum >> 16);
+ csum = (csum & 0xffff) + (csum >> 16);
+ return ~csum;
+}
+
+static __always_inline __u16 csum_replace4(__u32 csum, __u32 from, __u32 to)
+{
+ __u32 tmp = csum_add(~csum, ~from);
+
+ return csum_fold(csum_add(tmp, to));
+}
+
+static __always_inline __u16 csum_replace16(__u32 csum, __u32 *from, __u32 *to)
+{
+ __u32 diff[] = {
+ ~from[0], ~from[1], ~from[2], ~from[3],
+ to[0], to[1], to[2], to[3],
+ };
+
+ csum = bpf_csum_diff(0, 0, diff, sizeof(diff), ~csum);
+ return csum_fold(csum);
+}
+
+/* IP-TCP header utility routines */
+
+static __always_inline void ip_decrease_ttl(struct iphdr *iph)
+{
+ __u32 check = (__u32)iph->check;
+
+ check += (__u32)bpf_htons(0x0100);
+ iph->check = (__sum16)(check + (check >= 0xffff));
+ iph->ttl--;
+}
+
+static __always_inline bool
+xdp_flowtable_offload_check_iphdr(struct iphdr *iph)
+{
+ /* ip fragmented traffic */
+ if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET))
+ return false;
+
+ /* ip options */
+ if (iph->ihl * 4 != sizeof(*iph))
+ return false;
+
+ if (iph->ttl <= 1)
+ return false;
+
+ return true;
+}
+
+static __always_inline bool
+xdp_flowtable_offload_check_tcp_state(void *ports, void *data_end, u8 proto)
+{
+ if (proto == IPPROTO_TCP) {
+ struct tcphdr *tcph = ports;
+
+ if (tcph + 1 > data_end)
+ return false;
+
+ if (tcph->fin || tcph->rst)
+ return false;
+ }
+
+ return true;
+}
+
+/* IP nat utility routines */
+
+static __always_inline void
+xdp_flowtable_offload_nat_port(struct flow_ports *ports, void *data_end,
+ u8 proto, __be16 port, __be16 nat_port)
+{
+ switch (proto) {
+ case IPPROTO_TCP: {
+ struct tcphdr *tcph = (struct tcphdr *)ports;
+
+ if (tcph + 1 > data_end)
+ break;
+
+ tcph->check = csum_replace4((__u32)tcph->check, (__u32)port,
+ (__u32)nat_port);
+ break;
+ }
+ case IPPROTO_UDP: {
+ struct udphdr *udph = (struct udphdr *)ports;
+
+ if (udph + 1 > data_end)
+ break;
+
+ if (!udph->check)
+ break;
+
+ udph->check = csum_replace4((__u32)udph->check, (__u32)port,
+ (__u32)nat_port);
+ if (!udph->check)
+ udph->check = CSUM_MANGLED_0;
+ break;
+ }
+ default:
+ break;
+ }
+}
+
+static __always_inline void
+xdp_flowtable_offload_snat_port(const struct flow_offload *flow,
+ struct flow_ports *ports, void *data_end,
+ u8 proto, enum flow_offload_tuple_dir dir)
+{
+ __be16 port, nat_port;
+
+ if (ports + 1 > data_end)
+ return;
+
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ port = ports->source;
+ bpf_core_read(&nat_port, bpf_core_type_size(nat_port),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_port);
+ ports->source = nat_port;
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ port = ports->dest;
+ bpf_core_read(&nat_port, bpf_core_type_size(nat_port),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_port);
+ ports->dest = nat_port;
+ break;
+ default:
+ return;
+ }
+
+ xdp_flowtable_offload_nat_port(ports, data_end, proto, port, nat_port);
+}
+
+static __always_inline void
+xdp_flowtable_offload_dnat_port(const struct flow_offload *flow,
+ struct flow_ports *ports, void *data_end,
+ u8 proto, enum flow_offload_tuple_dir dir)
+{
+ __be16 port, nat_port;
+
+ if (ports + 1 > data_end)
+ return;
+
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ port = ports->dest;
+ bpf_core_read(&nat_port, bpf_core_type_size(nat_port),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_port);
+ ports->dest = nat_port;
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ port = ports->source;
+ bpf_core_read(&nat_port, bpf_core_type_size(nat_port),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_port);
+ ports->source = nat_port;
+ break;
+ default:
+ return;
+ }
+
+ xdp_flowtable_offload_nat_port(ports, data_end, proto, port, nat_port);
+}
+
+static __always_inline void
+xdp_flowtable_offload_ip_l4(struct iphdr *iph, void *data_end,
+ __be32 addr, __be32 nat_addr)
+{
+ switch (iph->protocol) {
+ case IPPROTO_TCP: {
+ struct tcphdr *tcph = (struct tcphdr *)(iph + 1);
+
+ if (tcph + 1 > data_end)
+ break;
+
+ tcph->check = csum_replace4((__u32)tcph->check, addr,
+ nat_addr);
+ break;
+ }
+ case IPPROTO_UDP: {
+ struct udphdr *udph = (struct udphdr *)(iph + 1);
+
+ if (udph + 1 > data_end)
+ break;
+
+ if (!udph->check)
+ break;
+
+ udph->check = csum_replace4((__u32)udph->check, addr,
+ nat_addr);
+ if (!udph->check)
+ udph->check = CSUM_MANGLED_0;
+ break;
+ }
+ default:
+ break;
+ }
+}
+
+static __always_inline void
+xdp_flowtable_offload_snat_ip(const struct flow_offload *flow,
+ struct iphdr *iph, void *data_end,
+ enum flow_offload_tuple_dir dir)
+{
+ __be32 addr, nat_addr;
+
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ addr = iph->saddr;
+ bpf_core_read(&nat_addr, bpf_core_type_size(nat_addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v4.s_addr);
+ iph->saddr = nat_addr;
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ addr = iph->daddr;
+ bpf_core_read(&nat_addr, bpf_core_type_size(nat_addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v4.s_addr);
+ iph->daddr = nat_addr;
+ break;
+ default:
+ return;
+ }
+ iph->check = csum_replace4((__u32)iph->check, addr, nat_addr);
+
+ xdp_flowtable_offload_ip_l4(iph, data_end, addr, nat_addr);
+}
+
+static __always_inline void
+xdp_flowtable_offload_get_dnat_ip(const struct flow_offload *flow,
+ enum flow_offload_tuple_dir dir,
+ __be32 *addr)
+{
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ bpf_core_read(addr, sizeof(*addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v4.s_addr);
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ bpf_core_read(addr, sizeof(*addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v4.s_addr);
+ break;
+ }
+}
+
+static __always_inline void
+xdp_flowtable_offload_dnat_ip(const struct flow_offload *flow,
+ struct iphdr *iph, void *data_end,
+ enum flow_offload_tuple_dir dir)
+{
+ __be32 addr, nat_addr;
+
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ addr = iph->daddr;
+ bpf_core_read(&nat_addr, bpf_core_type_size(nat_addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v4.s_addr);
+ iph->daddr = nat_addr;
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ addr = iph->saddr;
+ bpf_core_read(&nat_addr, bpf_core_type_size(nat_addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v4.s_addr);
+ iph->saddr = nat_addr;
+ break;
+ default:
+ return;
+ }
+ iph->check = csum_replace4((__u32)iph->check, addr, nat_addr);
+
+ xdp_flowtable_offload_ip_l4(iph, data_end, addr, nat_addr);
+}
+
+static __always_inline void
+xdp_flowtable_offload_ipv6_l4(struct ipv6hdr *ip6h, void *data_end,
+ struct in6_addr *addr, struct in6_addr *nat_addr)
+{
+ switch (ip6h->nexthdr) {
+ case IPPROTO_TCP: {
+ struct tcphdr *tcph = (struct tcphdr *)(ip6h + 1);
+
+ if (tcph + 1 > data_end)
+ break;
+
+ tcph->check = csum_replace16((__u32)tcph->check,
+ addr->in6_u.u6_addr32,
+ nat_addr->in6_u.u6_addr32);
+ break;
+ }
+ case IPPROTO_UDP: {
+ struct udphdr *udph = (struct udphdr *)(ip6h + 1);
+
+ if (udph + 1 > data_end)
+ break;
+
+ if (!udph->check)
+ break;
+
+ udph->check = csum_replace16((__u32)udph->check,
+ addr->in6_u.u6_addr32,
+ nat_addr->in6_u.u6_addr32);
+ if (!udph->check)
+ udph->check = CSUM_MANGLED_0;
+ break;
+ }
+ default:
+ break;
+ }
+}
+
+static __always_inline void
+xdp_flowtable_offload_snat_ipv6(const struct flow_offload *flow,
+ struct ipv6hdr *ip6h, void *data_end,
+ enum flow_offload_tuple_dir dir)
+{
+ struct in6_addr addr, nat_addr;
+
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ addr = ip6h->saddr;
+ bpf_core_read(&nat_addr, bpf_core_type_size(nat_addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v6);
+ ip6h->saddr = nat_addr;
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ addr = ip6h->daddr;
+ bpf_core_read(&nat_addr, bpf_core_type_size(nat_addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v6);
+ ip6h->daddr = nat_addr;
+ break;
+ default:
+ return;
+ }
+
+ xdp_flowtable_offload_ipv6_l4(ip6h, data_end, &addr, &nat_addr);
+}
+
+static __always_inline void
+xdp_flowtable_offload_get_dnat_ipv6(const struct flow_offload *flow,
+ enum flow_offload_tuple_dir dir,
+ struct in6_addr *addr)
+{
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ bpf_core_read(addr, sizeof(*addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v6);
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ bpf_core_read(addr, sizeof(*addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v6);
+ break;
+ }
+}
+
+static __always_inline void
+xdp_flowtable_offload_dnat_ipv6(const struct flow_offload *flow,
+ struct ipv6hdr *ip6h, void *data_end,
+ enum flow_offload_tuple_dir dir)
+{
+ struct in6_addr addr, nat_addr;
+
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ addr = ip6h->daddr;
+ bpf_core_read(&nat_addr, bpf_core_type_size(nat_addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v6);
+ ip6h->daddr = nat_addr;
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ addr = ip6h->saddr;
+ bpf_core_read(&nat_addr, bpf_core_type_size(nat_addr),
+ &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v6);
+ ip6h->saddr = nat_addr;
+ break;
+ default:
+ return;
+ }
+
+ xdp_flowtable_offload_ipv6_l4(ip6h, data_end, &addr, &nat_addr);
+}
+
+static __always_inline void
+xdp_flowtable_offload_forward_ip(const struct flow_offload *flow,
+ void *data, void *data_end,
+ struct flow_ports *ports,
+ enum flow_offload_tuple_dir dir,
+ unsigned long flags)
+{
+ struct iphdr *iph = data + sizeof(struct ethhdr);
+
+ if (iph + 1 > data_end)
+ return;
+
+ if (flags & BIT(NF_FLOW_SNAT)) {
+ xdp_flowtable_offload_snat_port(flow, ports, data_end,
+ iph->protocol, dir);
+ xdp_flowtable_offload_snat_ip(flow, iph, data_end, dir);
+ }
+ if (flags & BIT(NF_FLOW_DNAT)) {
+ xdp_flowtable_offload_dnat_port(flow, ports, data_end,
+ iph->protocol, dir);
+ xdp_flowtable_offload_dnat_ip(flow, iph, data_end, dir);
+ }
+
+ ip_decrease_ttl(iph);
+}
+
+static __always_inline void
+xdp_flowtable_offload_forward_ipv6(const struct flow_offload *flow,
+ void *data, void *data_end,
+ struct flow_ports *ports,
+ enum flow_offload_tuple_dir dir,
+ unsigned long flags)
+{
+ struct ipv6hdr *ip6h = data + sizeof(struct ethhdr);
+
+ if (ip6h + 1 > data_end)
+ return;
+
+ if (flags & BIT(NF_FLOW_SNAT)) {
+ xdp_flowtable_offload_snat_port(flow, ports, data_end,
+ ip6h->nexthdr, dir);
+ xdp_flowtable_offload_snat_ipv6(flow, ip6h, data_end, dir);
+ }
+ if (flags & BIT(NF_FLOW_DNAT)) {
+ xdp_flowtable_offload_dnat_port(flow, ports, data_end,
+ ip6h->nexthdr, dir);
+ xdp_flowtable_offload_dnat_ipv6(flow, ip6h, data_end, dir);
+ }
+
+ ip6h->hop_limit--;
+}
+
+SEC("xdp")
+int xdp_flowtable_offload(struct xdp_md *ctx)
+{
+ void *data_end = (void *)(long)ctx->data_end;
+ struct flow_offload_tuple_rhash *tuplehash;
+ struct bpf_fib_lookup tuple = {
+ .ifindex = ctx->ingress_ifindex,
+ };
+ void *data = (void *)(long)ctx->data;
+ enum flow_offload_tuple_dir dir;
+ struct ethhdr *eth = data;
+ struct flow_offload *flow;
+ struct flow_ports *ports;
+ unsigned long flags;
+ int iifindex;
+
+ if (eth + 1 > data_end)
+ return XDP_PASS;
+
+ switch (eth->h_proto) {
+ case bpf_htons(ETH_P_IP): {
+ struct iphdr *iph = data + sizeof(*eth);
+
+ ports = (struct flow_ports *)(iph + 1);
+ if (ports + 1 > data_end)
+ return XDP_PASS;
+
+ /* sanity check on ip header */
+ if (!xdp_flowtable_offload_check_iphdr(iph))
+ return XDP_PASS;
+
+ if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
+ iph->protocol))
+ return XDP_PASS;
+
+ tuple.family = AF_INET;
+ tuple.tos = iph->tos;
+ tuple.l4_protocol = iph->protocol;
+ tuple.tot_len = bpf_ntohs(iph->tot_len);
+ tuple.ipv4_src = iph->saddr;
+ tuple.ipv4_dst = iph->daddr;
+ tuple.sport = ports->source;
+ tuple.dport = ports->dest;
+ break;
+ }
+ case bpf_htons(ETH_P_IPV6): {
+ struct in6_addr *src = (struct in6_addr *)tuple.ipv6_src;
+ struct in6_addr *dst = (struct in6_addr *)tuple.ipv6_dst;
+ struct ipv6hdr *ip6h = data + sizeof(*eth);
+
+ ports = (struct flow_ports *)(ip6h + 1);
+ if (ports + 1 > data_end)
+ return XDP_PASS;
+
+ if (ip6h->hop_limit <= 1)
+ return XDP_PASS;
+
+ if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
+ ip6h->nexthdr))
+ return XDP_PASS;
+
+ tuple.family = AF_INET6;
+ tuple.l4_protocol = ip6h->nexthdr;
+ tuple.tot_len = bpf_ntohs(ip6h->payload_len);
+ *src = ip6h->saddr;
+ *dst = ip6h->daddr;
+ tuple.sport = ports->source;
+ tuple.dport = ports->dest;
+ break;
+ }
+ default:
+ return XDP_PASS;
+ }
+
+ tuplehash = bpf_xdp_flow_offload_lookup(ctx, &tuple);
+ if (!tuplehash)
+ return XDP_PASS;
+
+ dir = tuplehash->tuple.dir;
+ flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
+ if (bpf_core_read(&flags, sizeof(flags), &flow->flags))
+ return XDP_PASS;
+
+ switch (tuplehash->tuple.xmit_type) {
+ case FLOW_OFFLOAD_XMIT_NEIGH:
+ /* update the destination address in case of dnatting before
+ * performing the route lookup
+ */
+ if (tuple.family == AF_INET6)
+ xdp_flowtable_offload_get_dnat_ipv6(flow, dir,
+ (struct in6_addr *)&tuple.ipv6_dst);
+ else
+ xdp_flowtable_offload_get_dnat_ip(flow, dir, &tuple.ipv4_dst);
+
+ if (bpf_fib_lookup(ctx, &tuple, sizeof(tuple), 0))
+ return XDP_PASS;
+
+ if (tuple.family == AF_INET6)
+ xdp_flowtable_offload_forward_ipv6(flow, data, data_end,
+ ports, dir, flags);
+ else
+ xdp_flowtable_offload_forward_ip(flow, data, data_end,
+ ports, dir, flags);
+
+ __builtin_memcpy(eth->h_dest, tuple.dmac, ETH_ALEN);
+ __builtin_memcpy(eth->h_source, tuple.smac, ETH_ALEN);
+ iifindex = tuple.ifindex;
+ break;
+ case FLOW_OFFLOAD_XMIT_DIRECT:
+ default:
+ return XDP_PASS;
+ }
+
+ return bpf_redirect(iifindex, 0);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/xdp_flowtable_offload_user.c b/samples/bpf/xdp_flowtable_offload_user.c
new file mode 100644
index 0000000000000..179b1f34b48fd
--- /dev/null
+++ b/samples/bpf/xdp_flowtable_offload_user.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Lorenzo Bianconi <lorenzo@kernel.org>
+ */
+static const char *__doc__ =
+"XDP flowtable integration example\n"
+"Usage: xdp_flowtable_offload <IFINDEX|IFNAME>\n";
+
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <assert.h>
+#include <errno.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <net/if.h>
+#include <unistd.h>
+#include <libgen.h>
+#include <getopt.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include "bpf_util.h"
+#include "xdp_sample_user.h"
+#include "xdp_flowtable_offload.skel.h"
+
+static int mask = SAMPLE_RX_CNT | SAMPLE_EXCEPTION_CNT;
+
+DEFINE_SAMPLE_INIT(xdp_flowtable_offload);
+
+static const struct option long_options[] = {
+ { "help", no_argument, NULL, 'h' },
+ { "generic", no_argument, NULL, 'g' },
+ {}
+};
+
+int main(int argc, char **argv)
+{
+ struct xdp_flowtable_offload *skel;
+ int ret = EXIT_FAIL_OPTION;
+ char ifname[IF_NAMESIZE];
+ bool generic = false;
+ int ifindex;
+ int opt;
+
+ while ((opt = getopt_long(argc, argv, "hg",
+ long_options, NULL)) != -1) {
+ switch (opt) {
+ case 'g':
+ generic = true;
+ break;
+ case 'h':
+ default:
+ sample_usage(argv, long_options, __doc__, mask, false);
+ return ret;
+ }
+ }
+
+ if (argc <= optind) {
+ sample_usage(argv, long_options, __doc__, mask, true);
+ goto end;
+ }
+
+ ifindex = if_nametoindex(argv[optind]);
+ if (!ifindex)
+ ifindex = strtoul(argv[optind], NULL, 0);
+
+ if (!ifindex) {
+ fprintf(stderr, "Bad interface index or name\n");
+ sample_usage(argv, long_options, __doc__, mask, true);
+ goto end;
+ }
+
+ skel = xdp_flowtable_offload__open();
+ if (!skel) {
+ fprintf(stderr, "Failed to xdp_flowtable_offload__open: %s\n",
+ strerror(errno));
+ ret = EXIT_FAIL_BPF;
+ goto end;
+ }
+
+ ret = sample_init_pre_load(skel);
+ if (ret < 0) {
+ fprintf(stderr, "Failed to sample_init_pre_load: %s\n", strerror(-ret));
+ ret = EXIT_FAIL_BPF;
+ goto end_destroy;
+ }
+
+ ret = xdp_flowtable_offload__load(skel);
+ if (ret < 0) {
+ fprintf(stderr, "Failed to xdp_flowtable_offload__load: %s\n",
+ strerror(errno));
+ ret = EXIT_FAIL_BPF;
+ goto end_destroy;
+ }
+
+ ret = sample_init(skel, mask);
+ if (ret < 0) {
+ fprintf(stderr, "Failed to initialize sample: %s\n", strerror(-ret));
+ ret = EXIT_FAIL;
+ goto end_destroy;
+ }
+
+ if (sample_install_xdp(skel->progs.xdp_flowtable_offload,
+ ifindex, generic, false) < 0) {
+ ret = EXIT_FAIL_XDP;
+ goto end_destroy;
+ }
+
+ ret = EXIT_FAIL;
+ if (!if_indextoname(ifindex, ifname)) {
+ fprintf(stderr, "Failed to if_indextoname for %d: %s\n", ifindex,
+ strerror(errno));
+ goto end_destroy;
+ }
+
+ ret = sample_run(2, NULL, NULL);
+ if (ret < 0) {
+ fprintf(stderr, "Failed during sample run: %s\n", strerror(-ret));
+ ret = EXIT_FAIL;
+ goto end_destroy;
+ }
+ ret = EXIT_OK;
+end_destroy:
+ xdp_flowtable_offload__destroy(skel);
+end:
+ sample_exit(ret);
+}
--
2.45.1
* [PATCH bpf-next v2 4/4] selftests/bpf: Add selftest for bpf_xdp_flow_offload_lookup kfunc
2024-05-18 10:12 [PATCH bpf-next v2 0/4] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
` (2 preceding siblings ...)
2024-05-18 10:12 ` [PATCH bpf-next v2 3/4] samples/bpf: Add bpf sample to offload flowtable traffic to xdp Lorenzo Bianconi
@ 2024-05-18 10:12 ` Lorenzo Bianconi
2024-05-21 1:43 ` Alexei Starovoitov
3 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Bianconi @ 2024-05-18 10:12 UTC (permalink / raw)
To: bpf
Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, lorenzo.bianconi, toke, fw, hawk,
horms, donhunte, memxor
Introduce an end-to-end selftest for the bpf_xdp_flow_offload_lookup kfunc
through the xdp_flowtable utility.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
tools/testing/selftests/bpf/Makefile | 10 +-
tools/testing/selftests/bpf/config | 4 +
.../selftests/bpf/progs/xdp_flowtable.c | 141 +++++++++++++++++
.../selftests/bpf/test_xdp_flowtable.sh | 112 ++++++++++++++
tools/testing/selftests/bpf/xdp_flowtable.c | 142 ++++++++++++++++++
5 files changed, 407 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/xdp_flowtable.c
create mode 100755 tools/testing/selftests/bpf/test_xdp_flowtable.sh
create mode 100644 tools/testing/selftests/bpf/xdp_flowtable.c
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index e0b3887b3d2df..7361c429bed62 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -133,7 +133,8 @@ TEST_PROGS := test_kmod.sh \
test_bpftool_metadata.sh \
test_doc_build.sh \
test_xsk.sh \
- test_xdp_features.sh
+ test_xdp_features.sh \
+ test_xdp_flowtable.sh
TEST_PROGS_EXTENDED := with_addr.sh \
with_tunnels.sh ima_setup.sh verify_sig_setup.sh \
@@ -144,7 +145,7 @@ TEST_GEN_PROGS_EXTENDED = test_skb_cgroup_id_user \
flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata \
- xdp_features bpf_test_no_cfi.ko
+ xdp_features bpf_test_no_cfi.ko xdp_flowtable
TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi
@@ -476,6 +477,7 @@ test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o
xsk_xdp_progs.skel.h-deps := xsk_xdp_progs.bpf.o
xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o
xdp_features.skel.h-deps := xdp_features.bpf.o
+xdp_flowtable.skel.h-deps := xdp_flowtable.bpf.o
LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps)))
@@ -710,6 +712,10 @@ $(OUTPUT)/xdp_features: xdp_features.c $(OUTPUT)/network_helpers.o $(OUTPUT)/xdp
$(call msg,BINARY,,$@)
$(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@
+$(OUTPUT)/xdp_flowtable: xdp_flowtable.c $(OUTPUT)/xdp_flowtable.skel.h | $(OUTPUT)
+ $(call msg,BINARY,,$@)
+ $(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@
+
# Make sure we are able to include and link libbpf against c++.
$(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ)
$(call msg,CXX,,$@)
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index eeabd798bc3ae..1a9aea01145f7 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -82,6 +82,10 @@ CONFIG_NF_CONNTRACK=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_DEFRAG_IPV4=y
CONFIG_NF_DEFRAG_IPV6=y
+CONFIG_NF_TABLES=y
+CONFIG_NETFILTER_INGRESS=y
+CONFIG_NF_FLOW_TABLE=y
+CONFIG_NF_FLOW_TABLE_INET=y
CONFIG_NF_NAT=y
CONFIG_RC_CORE=y
CONFIG_SECURITY=y
diff --git a/tools/testing/selftests/bpf/progs/xdp_flowtable.c b/tools/testing/selftests/bpf/progs/xdp_flowtable.c
new file mode 100644
index 0000000000000..888ac87790f90
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_flowtable.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#define MAX_ERRNO 4095
+
+#define ETH_P_IP 0x0800
+#define ETH_P_IPV6 0x86dd
+#define IP_MF 0x2000 /* "More Fragments" */
+#define IP_OFFSET 0x1fff /* "Fragment Offset" */
+#define AF_INET 2
+#define AF_INET6 10
+
+struct flow_offload_tuple_rhash *
+bpf_xdp_flow_offload_lookup(struct xdp_md *,
+ struct bpf_fib_lookup *) __ksym;
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __type(key, __u32);
+ __type(value, __u32);
+ __uint(max_entries, 1);
+} stats SEC(".maps");
+
+static __always_inline bool
+xdp_flowtable_offload_check_iphdr(struct iphdr *iph)
+{
+ /* ip fragmented traffic */
+ if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET))
+ return false;
+
+ /* ip options */
+ if (iph->ihl * 4 != sizeof(*iph))
+ return false;
+
+ if (iph->ttl <= 1)
+ return false;
+
+ return true;
+}
+
+static __always_inline bool
+xdp_flowtable_offload_check_tcp_state(void *ports, void *data_end, u8 proto)
+{
+ if (proto == IPPROTO_TCP) {
+ struct tcphdr *tcph = ports;
+
+ if (tcph + 1 > data_end)
+ return false;
+
+ if (tcph->fin || tcph->rst)
+ return false;
+ }
+
+ return true;
+}
+
+SEC("xdp.frags")
+int xdp_flowtable_do_lookup(struct xdp_md *ctx)
+{
+ void *data_end = (void *)(long)ctx->data_end;
+ struct flow_offload_tuple_rhash *tuplehash;
+ struct bpf_fib_lookup tuple = {
+ .ifindex = ctx->ingress_ifindex,
+ };
+ void *data = (void *)(long)ctx->data;
+ struct ethhdr *eth = data;
+ struct flow_ports *ports;
+ __u32 *val, key = 0;
+
+ if (eth + 1 > data_end)
+ return XDP_DROP;
+
+ switch (eth->h_proto) {
+ case bpf_htons(ETH_P_IP): {
+ struct iphdr *iph = data + sizeof(*eth);
+
+ ports = (struct flow_ports *)(iph + 1);
+ if (ports + 1 > data_end)
+ return XDP_PASS;
+
+ /* sanity check on ip header */
+ if (!xdp_flowtable_offload_check_iphdr(iph))
+ return XDP_PASS;
+
+ if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
+ iph->protocol))
+ return XDP_PASS;
+
+ tuple.family = AF_INET;
+ tuple.tos = iph->tos;
+ tuple.l4_protocol = iph->protocol;
+ tuple.tot_len = bpf_ntohs(iph->tot_len);
+ tuple.ipv4_src = iph->saddr;
+ tuple.ipv4_dst = iph->daddr;
+ tuple.sport = ports->source;
+ tuple.dport = ports->dest;
+ break;
+ }
+ case bpf_htons(ETH_P_IPV6): {
+ struct in6_addr *src = (struct in6_addr *)tuple.ipv6_src;
+ struct in6_addr *dst = (struct in6_addr *)tuple.ipv6_dst;
+ struct ipv6hdr *ip6h = data + sizeof(*eth);
+
+ ports = (struct flow_ports *)(ip6h + 1);
+ if (ports + 1 > data_end)
+ return XDP_PASS;
+
+ if (ip6h->hop_limit <= 1)
+ return XDP_PASS;
+
+ if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
+ ip6h->nexthdr))
+ return XDP_PASS;
+
+ tuple.family = AF_INET6;
+ tuple.l4_protocol = ip6h->nexthdr;
+ tuple.tot_len = bpf_ntohs(ip6h->payload_len);
+ *src = ip6h->saddr;
+ *dst = ip6h->daddr;
+ tuple.sport = ports->source;
+ tuple.dport = ports->dest;
+ break;
+ }
+ default:
+ return XDP_PASS;
+ }
+
+ tuplehash = bpf_xdp_flow_offload_lookup(ctx, &tuple);
+ if (!tuplehash)
+ return XDP_PASS;
+
+ val = bpf_map_lookup_elem(&stats, &key);
+ if (val)
+ __sync_add_and_fetch(val, 1);
+
+ return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_xdp_flowtable.sh b/tools/testing/selftests/bpf/test_xdp_flowtable.sh
new file mode 100755
index 0000000000000..1a8a40aebbdf1
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_xdp_flowtable.sh
@@ -0,0 +1,112 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+readonly NS0="ns0-$(mktemp -u XXXXXX)"
+readonly NS1="ns1-$(mktemp -u XXXXXX)"
+readonly infile="$(mktemp)"
+readonly outfile="$(mktemp)"
+
+xdp_flowtable_pid=""
+ret=1
+
+setup_flowtable() {
+nft -f /dev/stdin <<EOF
+table inet nat {
+ chain postrouting {
+ type nat hook postrouting priority filter; policy accept;
+ meta oif v10 masquerade
+ }
+}
+table inet filter {
+ flowtable ft {
+ hook ingress priority filter
+ devices = { v01, v10 }
+ }
+ chain forward {
+ type filter hook forward priority filter
+ meta l4proto { tcp, udp } flow add @ft
+ }
+}
+EOF
+}
+
+setup() {
+ sysctl -w net.ipv4.ip_forward=1
+ sysctl -w net.ipv6.conf.all.forwarding=1
+
+ ip netns add ${NS0}
+ ip netns add ${NS1}
+
+ ip link add v01 type veth peer name v00 netns ${NS0}
+ ip link add v10 type veth peer name v11 netns ${NS1}
+
+ ip -n ${NS0} addr add 192.168.0.1/24 dev v00
+ ip -6 -n ${NS0} addr add 2001:db8::1/64 dev v00
+ ip -n ${NS0} link set dev v00 up
+ ip -n ${NS0} route add default via 192.168.0.2
+ ip -6 -n ${NS0} route add default via 2001:db8::2
+
+ ip addr add 192.168.0.2/24 dev v01
+ ip -6 addr add 2001:db8::2/64 dev v01
+ ip link set dev v01 up
+ ip addr add 192.168.1.1/24 dev v10
+ ip -6 addr add 2001:db8:1::1/64 dev v10
+ ip link set dev v10 up
+
+ ip -n ${NS1} addr add 192.168.1.2/24 dev v11
+ ip -6 -n ${NS1} addr add 2001:db8:1::2/64 dev v11
+ ip -n ${NS1} link set dev v11 up
+ ip -n ${NS1} route add default via 192.168.1.1
+ ip -6 -n ${NS1} route add default via 2001:db8:1::1
+
+ # Load XDP program
+ ./xdp_flowtable v01 &
+ xdp_flowtable_pid=$!
+
+ setup_flowtable
+
+ dd if=/dev/urandom of="${infile}" bs=8192 count=16 status=none
+}
+
+wait_for_nc_server() {
+ while sleep 1; do
+ ip netns exec ${NS1} ss -nutlp | grep -q ":$1"
+ [ $? -eq 0 ] && break
+ done
+}
+
+cleanup() {
+ {
+ rm -f "${infile}" "${outfile}"
+
+ nft delete table inet filter
+ nft delete table inet nat
+
+ ip link del v01
+ ip link del v10
+
+ ip netns del ${NS0}
+ ip netns del ${NS1}
+ } >/dev/null 2>/dev/null
+}
+
+test_xdp_flowtable_lookup() {
+ ## Run IPv4 test
+ ip netns exec ${NS1} nc -4 --no-shutdown -l 8084 > ${outfile} &
+ wait_for_nc_server 8084
+ ip netns exec ${NS0} timeout 2 nc -4 192.168.1.2 8084 < ${infile}
+
+ ## Run IPv6 test
+ ip netns exec ${NS1} nc -6 --no-shutdown -l 8086 > ${outfile} &
+ wait_for_nc_server 8086
+ ip netns exec ${NS0} timeout 2 nc -6 2001:db8:1::2 8086 < ${infile}
+
+ wait $xdp_flowtable_pid && ret=0
+}
+
+trap cleanup 0 2 3 6 9
+setup
+
+test_xdp_flowtable_lookup
+
+exit $ret
diff --git a/tools/testing/selftests/bpf/xdp_flowtable.c b/tools/testing/selftests/bpf/xdp_flowtable.c
new file mode 100644
index 0000000000000..dea24deda7359
--- /dev/null
+++ b/tools/testing/selftests/bpf/xdp_flowtable.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <uapi/linux/bpf.h>
+#include <linux/if_link.h>
+#include <net/if.h>
+#include <unistd.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <signal.h>
+#include <argp.h>
+
+#include "xdp_flowtable.skel.h"
+
+#define MAX_ITERATION 10
+
+static volatile bool exiting, verbosity;
+static char ifname[IF_NAMESIZE];
+static int ifindex = -ENODEV;
+const char *argp_program_version = "xdp-flowtable 0.0";
+const char argp_program_doc[] =
+"XDP flowtable application.\n"
+"\n"
+"USAGE: ./xdp_flowtable [-v] <iface-name>\n";
+
+static const struct argp_option opts[] = {
+ { "verbose", 'v', NULL, 0, "Verbose debug output" },
+ {},
+};
+
+static void sig_handler(int sig)
+{
+ exiting = true;
+}
+
+static int libbpf_print_fn(enum libbpf_print_level level,
+ const char *format, va_list args)
+{
+ if (level == LIBBPF_DEBUG && !verbosity)
+ return 0;
+ return vfprintf(stderr, format, args);
+}
+
+static error_t parse_arg(int key, char *arg, struct argp_state *state)
+{
+ switch (key) {
+ case 'v':
+ verbosity = true;
+ break;
+ case ARGP_KEY_ARG:
+ errno = 0;
+ if (strlen(arg) >= IF_NAMESIZE) {
+ fprintf(stderr, "Invalid device name: %s\n", arg);
+ argp_usage(state);
+ return ARGP_ERR_UNKNOWN;
+ }
+
+ ifindex = if_nametoindex(arg);
+ if (!ifindex)
+ ifindex = strtoul(arg, NULL, 0);
+ if (!ifindex || !if_indextoname(ifindex, ifname)) {
+ fprintf(stderr,
+ "Bad interface index or name (%d): %s\n",
+ errno, strerror(errno));
+ argp_usage(state);
+ return ARGP_ERR_UNKNOWN;
+ }
+ break;
+ default:
+ return ARGP_ERR_UNKNOWN;
+ }
+
+ return 0;
+}
+
+static const struct argp argp = {
+ .options = opts,
+ .parser = parse_arg,
+ .doc = argp_program_doc,
+};
+
+int main(int argc, char **argv)
+{
+ unsigned int count = 0, key = 0;
+ struct xdp_flowtable *skel;
+ int i, err;
+
+ libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
+ libbpf_set_print(libbpf_print_fn);
+
+ signal(SIGINT, sig_handler);
+ signal(SIGTERM, sig_handler);
+
+ /* Parse command line arguments */
+ err = argp_parse(&argp, argc, argv, 0, NULL, NULL);
+ if (err)
+ return err;
+
+ /* Load and verify BPF application */
+ skel = xdp_flowtable__open();
+ if (!skel) {
+ fprintf(stderr, "Failed to open and load BPF skeleton\n");
+ return -EINVAL;
+ }
+
+ /* Load & verify BPF programs */
+ err = xdp_flowtable__load(skel);
+ if (err) {
+ fprintf(stderr, "Failed to load and verify BPF skeleton\n");
+ goto cleanup;
+ }
+
+ /* Attach the XDP program */
+ err = xdp_flowtable__attach(skel);
+ if (err) {
+ fprintf(stderr, "Failed to attach BPF skeleton\n");
+ goto cleanup;
+ }
+
+ err = bpf_xdp_attach(ifindex,
+ bpf_program__fd(skel->progs.xdp_flowtable_do_lookup),
+ XDP_FLAGS_DRV_MODE, NULL);
+ if (err) {
+ fprintf(stderr, "Failed attaching XDP program to device %s\n",
+ ifname);
+ goto cleanup;
+ }
+
+ /* Collect stats */
+ for (i = 0; i < MAX_ITERATION && !exiting; i++)
+ sleep(1);
+
+ /* Check results */
+ err = bpf_map__lookup_elem(skel->maps.stats, &key, sizeof(key),
+ &count, sizeof(count), 0);
+ if (!err && !count)
+ err = -EINVAL;
+
+ bpf_xdp_detach(ifindex, XDP_FLAGS_DRV_MODE, NULL);
+cleanup:
+ xdp_flowtable__destroy(skel);
+
+ return err;
+}
--
2.45.1
* Re: [PATCH bpf-next v2 2/4] netfilter: add bpf_xdp_flow_offload_lookup kfunc
2024-05-18 10:12 ` [PATCH bpf-next v2 2/4] netfilter: add bpf_xdp_flow_offload_lookup kfunc Lorenzo Bianconi
@ 2024-05-18 21:50 ` Kumar Kartikeya Dwivedi
2024-05-21 1:41 ` Alexei Starovoitov
1 sibling, 0 replies; 12+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2024-05-18 21:50 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: bpf, pablo, kadlec, davem, edumazet, kuba, pabeni,
netfilter-devel, netdev, ast, daniel, andrii, lorenzo.bianconi,
toke, fw, hawk, horms, donhunte
On Sat, 18 May 2024 at 12:13, Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> Introduce bpf_xdp_flow_offload_lookup kfunc in order to perform the
> lookup of a given flowtable entry based on a fib tuple of incoming
> traffic.
> bpf_xdp_flow_offload_lookup can be used as building block to offload
> in xdp the processing of sw flowtable when hw flowtable is not
> available.
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Though I think it might have been better to have an opts parameter for
extensibility (with opts->error for now, to aid debugging when NULL
is returned), I won't insist: it's not a big deal, as only two things
can go wrong: the tuple->family is unsupported or the lookup fails.
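For illustration, here is a minimal userspace mock of the opts pattern being suggested. All names in it (`bpf_flowtable_opts`, its `sz`/`error` fields, `flow_lookup`) and the error codes are hypothetical stand-ins, not the actual kernel API; the point is only to show how an extensible opts struct would let the caller distinguish the two failure modes behind a NULL return:

```c
/* Userspace mock of the suggested opts pattern: the caller passes an
 * extensible opts struct and, when the lookup returns NULL, reads
 * opts->error to tell the two failure modes apart.
 * All names here (bpf_flowtable_opts, flow_lookup) are hypothetical.
 */
#include <errno.h>
#include <stddef.h>

struct bpf_flowtable_opts {
	size_t sz;	/* struct size, for forward compatibility */
	int error;	/* set on failure to aid debugging */
};

#define AF_INET_MOCK	2
#define AF_INET6_MOCK	10

/* Mock lookup: distinguishes "unsupported family" from "lookup miss". */
static void *flow_lookup(int family, struct bpf_flowtable_opts *opts)
{
	if (family != AF_INET_MOCK && family != AF_INET6_MOCK) {
		opts->error = -EAFNOSUPPORT;
		return NULL;
	}
	opts->error = -ENOENT;	/* no flowtable entry in this mock */
	return NULL;
}
```

The `sz` field mirrors the libbpf opts convention, which is what would let a later revision append fields without breaking existing callers.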
* Re: [PATCH bpf-next v2 2/4] netfilter: add bpf_xdp_flow_offload_lookup kfunc
2024-05-18 10:12 ` [PATCH bpf-next v2 2/4] netfilter: add bpf_xdp_flow_offload_lookup kfunc Lorenzo Bianconi
2024-05-18 21:50 ` Kumar Kartikeya Dwivedi
@ 2024-05-21 1:41 ` Alexei Starovoitov
2024-05-21 13:21 ` Lorenzo Bianconi
1 sibling, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2024-05-21 1:41 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: bpf, Pablo Neira Ayuso, Jozsef Kadlecsik, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netfilter-devel,
Network Development, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Lorenzo Bianconi,
Toke Høiland-Jørgensen, Florian Westphal,
Jesper Dangaard Brouer, Simon Horman, donhunte,
Kumar Kartikeya Dwivedi
On Sat, May 18, 2024 at 3:13 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> Introduce bpf_xdp_flow_offload_lookup kfunc in order to perform the
> lookup of a given flowtable entry based on a fib tuple of incoming
> traffic.
> bpf_xdp_flow_offload_lookup can be used as building block to offload
> in xdp the processing of sw flowtable when hw flowtable is not
> available.
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> include/net/netfilter/nf_flow_table.h | 10 +++
> net/netfilter/Makefile | 5 ++
> net/netfilter/nf_flow_table_bpf.c | 94 +++++++++++++++++++++++++++
> net/netfilter/nf_flow_table_inet.c | 2 +-
> 4 files changed, 110 insertions(+), 1 deletion(-)
> create mode 100644 net/netfilter/nf_flow_table_bpf.c
>
> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
> index 0bbe6ea8e0651..085660cbcd3f2 100644
> --- a/include/net/netfilter/nf_flow_table.h
> +++ b/include/net/netfilter/nf_flow_table.h
> @@ -312,6 +312,16 @@ unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
> unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
> const struct nf_hook_state *state);
>
> +#if (IS_BUILTIN(CONFIG_NF_FLOW_TABLE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
> + (IS_MODULE(CONFIG_NF_FLOW_TABLE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
> +extern int nf_flow_offload_register_bpf(void);
> +#else
> +static inline int nf_flow_offload_register_bpf(void)
> +{
> + return 0;
> +}
> +#endif
> +
> #define MODULE_ALIAS_NF_FLOWTABLE(family) \
> MODULE_ALIAS("nf-flowtable-" __stringify(family))
>
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index 614815a3ed738..18b09cec92024 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -144,6 +144,11 @@ obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
> nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
> nf_flow_table_offload.o
> nf_flow_table-$(CONFIG_NF_FLOW_TABLE_PROCFS) += nf_flow_table_procfs.o
> +ifeq ($(CONFIG_NF_FLOW_TABLE),m)
> +nf_flow_table-$(CONFIG_DEBUG_INFO_BTF_MODULES) += nf_flow_table_bpf.o
> +else ifeq ($(CONFIG_NF_FLOW_TABLE),y)
> +nf_flow_table-$(CONFIG_DEBUG_INFO_BTF) += nf_flow_table_bpf.o
> +endif
>
> obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
>
> diff --git a/net/netfilter/nf_flow_table_bpf.c b/net/netfilter/nf_flow_table_bpf.c
> new file mode 100644
> index 0000000000000..f999ed9712796
> --- /dev/null
> +++ b/net/netfilter/nf_flow_table_bpf.c
> @@ -0,0 +1,94 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Unstable Flow Table Helpers for XDP hook
> + *
> + * These are called from the XDP programs.
> + * Note that it is allowed to break compatibility for these functions since
> + * the interface they are exposed through to BPF programs is explicitly
> + * unstable.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <net/netfilter/nf_flow_table.h>
> +#include <linux/bpf.h>
> +#include <linux/btf.h>
> +#include <net/xdp.h>
> +
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> + "Global functions as their definitions will be in nf_flow_table BTF");
> +
> +static struct flow_offload_tuple_rhash *
> +bpf_xdp_flow_offload_tuple_lookup(struct net_device *dev,
> + struct flow_offload_tuple *tuple,
> + __be16 proto)
> +{
> + struct flow_offload_tuple_rhash *tuplehash;
> + struct nf_flowtable *flow_table;
> + struct flow_offload *flow;
> +
> + flow_table = nf_flowtable_by_dev(dev);
> + if (!flow_table)
> + return NULL;
> +
> + tuplehash = flow_offload_lookup(flow_table, tuple);
> + if (!tuplehash)
> + return NULL;
> +
> + flow = container_of(tuplehash, struct flow_offload,
> + tuplehash[tuplehash->tuple.dir]);
> + flow_offload_refresh(flow_table, flow, false);
> +
> + return tuplehash;
> +}
> +
> +__bpf_kfunc struct flow_offload_tuple_rhash *
> +bpf_xdp_flow_offload_lookup(struct xdp_md *ctx,
> + struct bpf_fib_lookup *fib_tuple)
> +{
> + struct xdp_buff *xdp = (struct xdp_buff *)ctx;
> + struct flow_offload_tuple tuple = {
> + .iifidx = fib_tuple->ifindex,
> + .l3proto = fib_tuple->family,
> + .l4proto = fib_tuple->l4_protocol,
> + .src_port = fib_tuple->sport,
> + .dst_port = fib_tuple->dport,
> + };
> + __be16 proto;
> +
> + switch (fib_tuple->family) {
> + case AF_INET:
> + tuple.src_v4.s_addr = fib_tuple->ipv4_src;
> + tuple.dst_v4.s_addr = fib_tuple->ipv4_dst;
> + proto = htons(ETH_P_IP);
> + break;
> + case AF_INET6:
> + tuple.src_v6 = *(struct in6_addr *)&fib_tuple->ipv6_src;
> + tuple.dst_v6 = *(struct in6_addr *)&fib_tuple->ipv6_dst;
> + proto = htons(ETH_P_IPV6);
> + break;
> + default:
> + return NULL;
> + }
> +
> + return bpf_xdp_flow_offload_tuple_lookup(xdp->rxq->dev, &tuple, proto);
> +}
> +
> +__diag_pop()
> +
> +BTF_KFUNCS_START(nf_ft_kfunc_set)
> +BTF_ID_FLAGS(func, bpf_xdp_flow_offload_lookup)
I think it needs to be KF_RET_NULL.
And most likely KF_TRUSTED_ARGS as well.
Also the "offload" doesn't fit in the name.
The existing code calls it "offload", because it's actually
pushing the rules to HW (if I understand the code),
but here it's just a lookup from xdp.
So call it
bpf_xdp_flow_lookup() ?
Though "flow" is a bit too generic here.
nf_flow maybe?
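Concretely, the change being suggested would be a one-line tweak to the registration table, something like the sketch below (kernel-side code, not compilable standalone; the bpf_xdp_flow_lookup name follows the proposal above, and KF_RET_NULL / KF_TRUSTED_ARGS are the existing kfunc flags marking the return value as possibly NULL and requiring trusted pointer arguments):

```c
/* Sketch only: kfunc registration with the suggested flags and name */
BTF_KFUNCS_START(nf_ft_kfunc_set)
BTF_ID_FLAGS(func, bpf_xdp_flow_lookup, KF_TRUSTED_ARGS | KF_RET_NULL)
BTF_KFUNCS_END(nf_ft_kfunc_set)
```

With KF_RET_NULL set, the verifier forces BPF programs to NULL-check the returned pointer before dereferencing it.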
> +BTF_KFUNCS_END(nf_ft_kfunc_set)
> +
> +static const struct btf_kfunc_id_set nf_flow_offload_kfunc_set = {
> + .owner = THIS_MODULE,
> + .set = &nf_ft_kfunc_set,
> +};
> +
> +int nf_flow_offload_register_bpf(void)
> +{
> + return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
> + &nf_flow_offload_kfunc_set);
> +}
> +EXPORT_SYMBOL_GPL(nf_flow_offload_register_bpf);
> diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
> index 6eef15648b7b0..6175f7556919d 100644
> --- a/net/netfilter/nf_flow_table_inet.c
> +++ b/net/netfilter/nf_flow_table_inet.c
> @@ -98,7 +98,7 @@ static int __init nf_flow_inet_module_init(void)
> nft_register_flowtable_type(&flowtable_ipv6);
> nft_register_flowtable_type(&flowtable_inet);
>
> - return 0;
> + return nf_flow_offload_register_bpf();
> }
>
> static void __exit nf_flow_inet_module_exit(void)
> --
> 2.45.1
>
* Re: [PATCH bpf-next v2 4/4] selftests/bpf: Add selftest for bpf_xdp_flow_offload_lookup kfunc
2024-05-18 10:12 ` [PATCH bpf-next v2 4/4] selftests/bpf: Add selftest for bpf_xdp_flow_offload_lookup kfunc Lorenzo Bianconi
@ 2024-05-21 1:43 ` Alexei Starovoitov
0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2024-05-21 1:43 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: bpf, Pablo Neira Ayuso, Jozsef Kadlecsik, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netfilter-devel,
Network Development, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Lorenzo Bianconi,
Toke Høiland-Jørgensen, Florian Westphal,
Jesper Dangaard Brouer, Simon Horman, donhunte,
Kumar Kartikeya Dwivedi
On Sat, May 18, 2024 at 3:13 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> Introduce e2e selftest for bpf_xdp_flow_offload_lookup kfunc through
> xdp_flowtable utility.
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> tools/testing/selftests/bpf/Makefile | 10 +-
> tools/testing/selftests/bpf/config | 4 +
> .../selftests/bpf/progs/xdp_flowtable.c | 141 +++++++++++++++++
> .../selftests/bpf/test_xdp_flowtable.sh | 112 ++++++++++++++
> tools/testing/selftests/bpf/xdp_flowtable.c | 142 ++++++++++++++++++
> 5 files changed, 407 insertions(+), 2 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/progs/xdp_flowtable.c
> create mode 100755 tools/testing/selftests/bpf/test_xdp_flowtable.sh
> create mode 100644 tools/testing/selftests/bpf/xdp_flowtable.c
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index e0b3887b3d2df..7361c429bed62 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -133,7 +133,8 @@ TEST_PROGS := test_kmod.sh \
> test_bpftool_metadata.sh \
> test_doc_build.sh \
> test_xsk.sh \
> - test_xdp_features.sh
> + test_xdp_features.sh \
> + test_xdp_flowtable.sh
>
> TEST_PROGS_EXTENDED := with_addr.sh \
> with_tunnels.sh ima_setup.sh verify_sig_setup.sh \
> @@ -144,7 +145,7 @@ TEST_GEN_PROGS_EXTENDED = test_skb_cgroup_id_user \
> flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
> test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
> xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata \
> - xdp_features bpf_test_no_cfi.ko
> + xdp_features bpf_test_no_cfi.ko xdp_flowtable
>
> TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi
>
> @@ -476,6 +477,7 @@ test_usdt.skel.h-deps := test_usdt.bpf.o test_usdt_multispec.bpf.o
> xsk_xdp_progs.skel.h-deps := xsk_xdp_progs.bpf.o
> xdp_hw_metadata.skel.h-deps := xdp_hw_metadata.bpf.o
> xdp_features.skel.h-deps := xdp_features.bpf.o
> +xdp_flowtable.skel.h-deps := xdp_flowtable.bpf.o
>
> LINKED_BPF_SRCS := $(patsubst %.bpf.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps)))
>
> @@ -710,6 +712,10 @@ $(OUTPUT)/xdp_features: xdp_features.c $(OUTPUT)/network_helpers.o $(OUTPUT)/xdp
> $(call msg,BINARY,,$@)
> $(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@
>
> +$(OUTPUT)/xdp_flowtable: xdp_flowtable.c $(OUTPUT)/xdp_flowtable.skel.h | $(OUTPUT)
> + $(call msg,BINARY,,$@)
> + $(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@
> +
> # Make sure we are able to include and link libbpf against c++.
> $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ)
> $(call msg,CXX,,$@)
> diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
> index eeabd798bc3ae..1a9aea01145f7 100644
> --- a/tools/testing/selftests/bpf/config
> +++ b/tools/testing/selftests/bpf/config
> @@ -82,6 +82,10 @@ CONFIG_NF_CONNTRACK=y
> CONFIG_NF_CONNTRACK_MARK=y
> CONFIG_NF_DEFRAG_IPV4=y
> CONFIG_NF_DEFRAG_IPV6=y
> +CONFIG_NF_TABLES=y
> +CONFIG_NETFILTER_INGRESS=y
> +CONFIG_NF_FLOW_TABLE=y
> +CONFIG_NF_FLOW_TABLE_INET=y
> CONFIG_NF_NAT=y
> CONFIG_RC_CORE=y
> CONFIG_SECURITY=y
> diff --git a/tools/testing/selftests/bpf/progs/xdp_flowtable.c b/tools/testing/selftests/bpf/progs/xdp_flowtable.c
> new file mode 100644
> index 0000000000000..888ac87790f90
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/xdp_flowtable.c
> @@ -0,0 +1,141 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <vmlinux.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_endian.h>
> +
> +#define MAX_ERRNO 4095
> +
> +#define ETH_P_IP 0x0800
> +#define ETH_P_IPV6 0x86dd
> +#define IP_MF 0x2000 /* "More Fragments" */
> +#define IP_OFFSET 0x1fff /* "Fragment Offset" */
> +#define AF_INET 2
> +#define AF_INET6 10
> +
> +struct flow_offload_tuple_rhash *
> +bpf_xdp_flow_offload_lookup(struct xdp_md *,
> + struct bpf_fib_lookup *) __ksym;
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_ARRAY);
> + __type(key, __u32);
> + __type(value, __u32);
> + __uint(max_entries, 1);
> +} stats SEC(".maps");
> +
> +static __always_inline bool
> +xdp_flowtable_offload_check_iphdr(struct iphdr *iph)
Please do not use __always_inline in bpf code.
It was needed 10 years ago. Not any more.
> +{
> + /* ip fragmented traffic */
> + if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET))
> + return false;
> +
> + /* ip options */
> + if (iph->ihl * 4 != sizeof(*iph))
> + return false;
> +
> + if (iph->ttl <= 1)
> + return false;
> +
> + return true;
> +}
> +
> +static __always_inline bool
> +xdp_flowtable_offload_check_tcp_state(void *ports, void *data_end, u8 proto)
> +{
> + if (proto == IPPROTO_TCP) {
> + struct tcphdr *tcph = ports;
> +
> + if (tcph + 1 > data_end)
> + return false;
> +
> + if (tcph->fin || tcph->rst)
> + return false;
> + }
> +
> + return true;
> +}
> +
> +SEC("xdp.frags")
> +int xdp_flowtable_do_lookup(struct xdp_md *ctx)
> +{
> + void *data_end = (void *)(long)ctx->data_end;
> + struct flow_offload_tuple_rhash *tuplehash;
> + struct bpf_fib_lookup tuple = {
> + .ifindex = ctx->ingress_ifindex,
> + };
> + void *data = (void *)(long)ctx->data;
> + struct ethhdr *eth = data;
> + struct flow_ports *ports;
> + __u32 *val, key = 0;
> +
> + if (eth + 1 > data_end)
> + return XDP_DROP;
> +
> + switch (eth->h_proto) {
> + case bpf_htons(ETH_P_IP): {
> + struct iphdr *iph = data + sizeof(*eth);
> +
> + ports = (struct flow_ports *)(iph + 1);
> + if (ports + 1 > data_end)
> + return XDP_PASS;
> +
> + /* sanity check on ip header */
> + if (!xdp_flowtable_offload_check_iphdr(iph))
> + return XDP_PASS;
> +
> + if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
> + iph->protocol))
> + return XDP_PASS;
> +
> + tuple.family = AF_INET;
> + tuple.tos = iph->tos;
> + tuple.l4_protocol = iph->protocol;
> + tuple.tot_len = bpf_ntohs(iph->tot_len);
> + tuple.ipv4_src = iph->saddr;
> + tuple.ipv4_dst = iph->daddr;
> + tuple.sport = ports->source;
> + tuple.dport = ports->dest;
> + break;
> + }
> + case bpf_htons(ETH_P_IPV6): {
> + struct in6_addr *src = (struct in6_addr *)tuple.ipv6_src;
> + struct in6_addr *dst = (struct in6_addr *)tuple.ipv6_dst;
> + struct ipv6hdr *ip6h = data + sizeof(*eth);
> +
> + ports = (struct flow_ports *)(ip6h + 1);
> + if (ports + 1 > data_end)
> + return XDP_PASS;
> +
> + if (ip6h->hop_limit <= 1)
> + return XDP_PASS;
> +
> + if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
> + ip6h->nexthdr))
> + return XDP_PASS;
> +
> + tuple.family = AF_INET6;
> + tuple.l4_protocol = ip6h->nexthdr;
> + tuple.tot_len = bpf_ntohs(ip6h->payload_len);
> + *src = ip6h->saddr;
> + *dst = ip6h->daddr;
> + tuple.sport = ports->source;
> + tuple.dport = ports->dest;
> + break;
> + }
> + default:
> + return XDP_PASS;
> + }
> +
> + tuplehash = bpf_xdp_flow_offload_lookup(ctx, &tuple);
> + if (!tuplehash)
> + return XDP_PASS;
> +
> + val = bpf_map_lookup_elem(&stats, &key);
> + if (val)
> + __sync_add_and_fetch(val, 1);
> +
> + return XDP_PASS;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/test_xdp_flowtable.sh b/tools/testing/selftests/bpf/test_xdp_flowtable.sh
> new file mode 100755
> index 0000000000000..1a8a40aebbdf1
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_xdp_flowtable.sh
Sorry, shell scripts are not allowed.
Integrate it into test_progs.
pw-bot: cr
* Re: [PATCH bpf-next v2 3/4] samples/bpf: Add bpf sample to offload flowtable traffic to xdp
2024-05-18 10:12 ` [PATCH bpf-next v2 3/4] samples/bpf: Add bpf sample to offload flowtable traffic to xdp Lorenzo Bianconi
@ 2024-05-21 1:45 ` Alexei Starovoitov
2024-05-21 10:19 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2024-05-21 1:45 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: bpf, Pablo Neira Ayuso, Jozsef Kadlecsik, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netfilter-devel,
Network Development, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Lorenzo Bianconi,
Toke Høiland-Jørgensen, Florian Westphal,
Jesper Dangaard Brouer, Simon Horman, donhunte,
Kumar Kartikeya Dwivedi
On Sat, May 18, 2024 at 3:13 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> Introduce xdp_flowtable_offload bpf sample to offload sw flowtable logic
> in xdp layer if hw flowtable is not available or does not support a
> specific kind of traffic.
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> samples/bpf/Makefile | 7 +-
> samples/bpf/xdp_flowtable_offload.bpf.c | 591 +++++++++++++++++++++++
> samples/bpf/xdp_flowtable_offload_user.c | 128 +++++
> 3 files changed, 725 insertions(+), 1 deletion(-)
> create mode 100644 samples/bpf/xdp_flowtable_offload.bpf.c
> create mode 100644 samples/bpf/xdp_flowtable_offload_user.c
I feel this sample code is dead on arrival.
Make selftest more real if you want people to use it as an example,
but samples dir is just a dumping ground.
We shouldn't be adding anything to it.
* Re: [PATCH bpf-next v2 3/4] samples/bpf: Add bpf sample to offload flowtable traffic to xdp
2024-05-21 1:45 ` Alexei Starovoitov
@ 2024-05-21 10:19 ` Toke Høiland-Jørgensen
2024-05-21 13:19 ` Lorenzo Bianconi
0 siblings, 1 reply; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-05-21 10:19 UTC (permalink / raw)
To: Alexei Starovoitov, Lorenzo Bianconi
Cc: bpf, Pablo Neira Ayuso, Jozsef Kadlecsik, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netfilter-devel,
Network Development, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Lorenzo Bianconi, Florian Westphal,
Jesper Dangaard Brouer, Simon Horman, donhunte,
Kumar Kartikeya Dwivedi
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Sat, May 18, 2024 at 3:13 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>>
>> Introduce xdp_flowtable_offload bpf sample to offload sw flowtable logic
>> in xdp layer if hw flowtable is not available or does not support a
>> specific kind of traffic.
>>
>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>> ---
>> samples/bpf/Makefile | 7 +-
>> samples/bpf/xdp_flowtable_offload.bpf.c | 591 +++++++++++++++++++++++
>> samples/bpf/xdp_flowtable_offload_user.c | 128 +++++
>> 3 files changed, 725 insertions(+), 1 deletion(-)
>> create mode 100644 samples/bpf/xdp_flowtable_offload.bpf.c
>> create mode 100644 samples/bpf/xdp_flowtable_offload_user.c
>
> I feel this sample code is dead on arrival.
> Make selftest more real if you want people to use it as an example,
> but samples dir is just a dumping ground.
> We shouldn't be adding anything to it.
Agreed. We can integrate a working sample into xdp-tools instead :)
-Toke
* Re: [PATCH bpf-next v2 3/4] samples/bpf: Add bpf sample to offload flowtable traffic to xdp
2024-05-21 10:19 ` Toke Høiland-Jørgensen
@ 2024-05-21 13:19 ` Lorenzo Bianconi
0 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2024-05-21 13:19 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Alexei Starovoitov, bpf, Pablo Neira Ayuso, Jozsef Kadlecsik,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netfilter-devel, Network Development, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Lorenzo Bianconi,
Florian Westphal, Jesper Dangaard Brouer, Simon Horman, donhunte,
Kumar Kartikeya Dwivedi
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Sat, May 18, 2024 at 3:13 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >>
> >> Introduce xdp_flowtable_offload bpf sample to offload sw flowtable logic
> >> in xdp layer if hw flowtable is not available or does not support a
> >> specific kind of traffic.
> >>
> >> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> >> ---
> >> samples/bpf/Makefile | 7 +-
> >> samples/bpf/xdp_flowtable_offload.bpf.c | 591 +++++++++++++++++++++++
> >> samples/bpf/xdp_flowtable_offload_user.c | 128 +++++
> >> 3 files changed, 725 insertions(+), 1 deletion(-)
> >> create mode 100644 samples/bpf/xdp_flowtable_offload.bpf.c
> >> create mode 100644 samples/bpf/xdp_flowtable_offload_user.c
> >
> > I feel this sample code is dead on arrival.
> > Make selftest more real if you want people to use it as an example,
> > but samples dir is just a dumping ground.
> > We shouldn't be adding anything to it.
>
> Agreed. We can integrate a working sample into xdp-tools instead :)
ack fine, I can post a patch for xdp-tools.
Regards,
Lorenzo
>
> -Toke
>
* Re: [PATCH bpf-next v2 2/4] netfilter: add bpf_xdp_flow_offload_lookup kfunc
2024-05-21 1:41 ` Alexei Starovoitov
@ 2024-05-21 13:21 ` Lorenzo Bianconi
0 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2024-05-21 13:21 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Pablo Neira Ayuso, Jozsef Kadlecsik, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netfilter-devel,
Network Development, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Lorenzo Bianconi,
Toke Høiland-Jørgensen, Florian Westphal,
Jesper Dangaard Brouer, Simon Horman, donhunte,
Kumar Kartikeya Dwivedi
> On Sat, May 18, 2024 at 3:13 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
[...]
>
> I think it needs to be KF_RET_NULL.
> And most likely KF_TRUSTED_ARGS as well.
ack, I will fix it in v2.
>
> Also the "offload" doesn't fit in the name.
> The existing code calls it "offload", because it's actually
> pushing the rules to HW (if I understand the code),
> but here it's just a lookup from xdp.
> So call it
> bpf_xdp_flow_lookup() ?
ack fine, I do not have a strong opinion on it. I will fix it in v2.
>
> Though "flow" is a bit too generic here.
> nf_flow maybe?
ack, I will fix it in v2.
Regards,
Lorenzo
>
> > +BTF_KFUNCS_END(nf_ft_kfunc_set)
> > +
> > +static const struct btf_kfunc_id_set nf_flow_offload_kfunc_set = {
> > + .owner = THIS_MODULE,
> > + .set = &nf_ft_kfunc_set,
> > +};
> > +
> > +int nf_flow_offload_register_bpf(void)
> > +{
> > + return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
> > + &nf_flow_offload_kfunc_set);
> > +}
> > +EXPORT_SYMBOL_GPL(nf_flow_offload_register_bpf);
> > diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
> > index 6eef15648b7b0..6175f7556919d 100644
> > --- a/net/netfilter/nf_flow_table_inet.c
> > +++ b/net/netfilter/nf_flow_table_inet.c
> > @@ -98,7 +98,7 @@ static int __init nf_flow_inet_module_init(void)
> > nft_register_flowtable_type(&flowtable_ipv6);
> > nft_register_flowtable_type(&flowtable_inet);
> >
> > - return 0;
> > + return nf_flow_offload_register_bpf();
> > }
> >
> > static void __exit nf_flow_inet_module_exit(void)
> > --
> > 2.45.1
> >
Thread overview: 12+ messages
2024-05-18 10:12 [PATCH bpf-next v2 0/4] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
2024-05-18 10:12 ` [PATCH bpf-next v2 1/4] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
2024-05-18 10:12 ` [PATCH bpf-next v2 2/4] netfilter: add bpf_xdp_flow_offload_lookup kfunc Lorenzo Bianconi
2024-05-18 21:50 ` Kumar Kartikeya Dwivedi
2024-05-21 1:41 ` Alexei Starovoitov
2024-05-21 13:21 ` Lorenzo Bianconi
2024-05-18 10:12 ` [PATCH bpf-next v2 3/4] samples/bpf: Add bpf sample to offload flowtable traffic to xdp Lorenzo Bianconi
2024-05-21 1:45 ` Alexei Starovoitov
2024-05-21 10:19 ` Toke Høiland-Jørgensen
2024-05-21 13:19 ` Lorenzo Bianconi
2024-05-18 10:12 ` [PATCH bpf-next v2 4/4] selftests/bpf: Add selftest for bpf_xdp_flow_offload_lookup kfunc Lorenzo Bianconi
2024-05-21 1:43 ` Alexei Starovoitov