* [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer
@ 2024-05-29 13:04 Lorenzo Bianconi
2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-05-29 13:04 UTC
To: bpf
Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, martin.lau, eddyz87,
lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor
Introduce the bpf_xdp_flow_lookup kfunc to look up a flowtable entry
based on the fib tuple of incoming traffic.
bpf_xdp_flow_lookup can be used as a building block to offload sw
flowtable processing to XDP when hw support is not available.
This series has been tested running the xdp_flowtable_offload eBPF program
on an ixgbe 10Gbps NIC (eno2) in order to XDP_REDIRECT the TCP traffic to
a veth pair (veth0-veth1) based on the content of the nf_flowtable as soon
as the TCP connection is in the established state:
[tcp client] (eno1) == LAN == (eno2) xdp_flowtable_offload [XDP_REDIRECT] --> veth0 == veth1 [tcp server]
table inet filter {
flowtable ft {
hook ingress priority filter
devices = { eno2, veth0 }
}
chain forward {
type filter hook forward priority filter
meta l4proto { tcp, udp } flow add @ft
}
}
- sw flowtable [1 TCP stream, T = 300s]: ~ 6.2 Gbps
- xdp flowtable [1 TCP stream, T = 300s]: ~ 7.6 Gbps
- sw flowtable [3 TCP streams, T = 300s]: ~ 7.7 Gbps
- xdp flowtable [3 TCP streams, T = 300s]: ~ 8.8 Gbps
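For reference, the relative gain implied by the figures above can be computed as in the following sketch. The helper names (relative_gain_pct, print_gains) are made up for illustration and are not part of the series. With the quoted numbers, the gain works out to roughly 23% for one stream and roughly 14% for three streams.

```c
#include <stdio.h>

/* Relative gain of the XDP path over the sw flowtable path, in percent. */
static double relative_gain_pct(double sw_gbps, double xdp_gbps)
{
	return (xdp_gbps - sw_gbps) / sw_gbps * 100.0;
}

/* Print the gain for the two measurements quoted in the cover letter. */
static void print_gains(void)
{
	printf("1 stream:  %.1f%%\n", relative_gain_pct(6.2, 7.6));
	printf("3 streams: %.1f%%\n", relative_gain_pct(7.7, 8.8));
}
```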
Changes since v3:
- move flowtable map utilities to nf_flow_table_xdp.c
Changes since v2:
- introduce bpf_flowtable_opts struct in bpf_xdp_flow_lookup signature
- get rid of xdp_flowtable_offload bpf sample
- get rid of test_xdp_flowtable.sh for selftest and rely on prog_tests instead
- rename bpf_xdp_flow_offload_lookup to bpf_xdp_flow_lookup
Changes since v1:
- return NULL in bpf_xdp_flow_offload_lookup kfunc in case of error
- take into account kfunc registration possible failures
Changes since RFC:
- fix compilation error if BTF is not enabled
Florian Westphal (1):
netfilter: nf_tables: add flowtable map for xdp offload
Lorenzo Bianconi (2):
netfilter: add bpf_xdp_flow_lookup kfunc
selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc
include/net/netfilter/nf_flow_table.h | 18 ++
net/netfilter/Makefile | 7 +-
net/netfilter/nf_flow_table_bpf.c | 117 ++++++++++++
net/netfilter/nf_flow_table_inet.c | 2 +-
net/netfilter/nf_flow_table_offload.c | 6 +-
net/netfilter/nf_flow_table_xdp.c | 163 +++++++++++++++++
tools/testing/selftests/bpf/config | 13 ++
.../selftests/bpf/prog_tests/xdp_flowtable.c | 168 ++++++++++++++++++
.../selftests/bpf/progs/xdp_flowtable.c | 145 +++++++++++++++
9 files changed, 635 insertions(+), 4 deletions(-)
create mode 100644 net/netfilter/nf_flow_table_bpf.c
create mode 100644 net/netfilter/nf_flow_table_xdp.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_flowtable.c
--
2.45.1
* [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload
2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
@ 2024-05-29 13:04 ` Lorenzo Bianconi
2024-06-13 22:34 ` Lorenzo Bianconi
2024-05-29 13:04 ` [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-05-29 13:04 UTC
To: bpf
Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, martin.lau, eddyz87,
lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor
From: Florian Westphal <fw@strlen.de>
This adds a small internal mapping table so that a new bpf (xdp) kfunc
can perform lookups in a flowtable.
As-is, an xdp program has access to the device pointer, but no way to do a
lookup in a flowtable -- there is no way to obtain the needed struct
without questionable stunts.
This allows obtaining an nf_flowtable pointer given a net_device
structure.
In order to keep backward compatibility, the infrastructure allows the
user to add a given device to multiple flowtables, but a lookup will
always return the first added mapping, since it assumes the right
configuration is a 1:1 mapping between flowtables and net_devices.
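The "first added mapping wins" behaviour described above can be modelled in plain userspace C as follows. This is only a sketch under simplifying assumptions: it has no locking and no RCU, the hash function is hypothetical, and the names (ft_elem, dev_entry, map_insert, map_lookup) are invented for illustration rather than taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HT_BITS 4
#define HT_SIZE (1u << HT_BITS)

struct ft_elem {		/* one flowtable bound to a device */
	const void *ft;
	struct ft_elem *next;
};

struct dev_entry {		/* one device; flowtables kept in insertion order */
	const void *dev;
	struct ft_elem *first, *last;
	struct dev_entry *next;
};

static struct dev_entry *buckets[HT_SIZE];

/* Hypothetical hash: drop low alignment bits, keep HT_BITS bits. */
static unsigned int bucket_of(const void *dev)
{
	return (unsigned int)(((uintptr_t)dev >> 4) & (HT_SIZE - 1));
}

static struct dev_entry *find_dev(const void *dev)
{
	struct dev_entry *e;

	for (e = buckets[bucket_of(dev)]; e; e = e->next)
		if (e->dev == dev)
			return e;
	return NULL;
}

/* Append ft to the list of flowtables registered for dev. */
static int map_insert(const void *ft, const void *dev)
{
	struct ft_elem *fe = calloc(1, sizeof(*fe));
	struct dev_entry *e;

	if (!fe)
		return -1;
	fe->ft = ft;

	e = find_dev(dev);
	if (!e) {
		e = calloc(1, sizeof(*e));
		if (!e) {
			free(fe);
			return -1;
		}
		e->dev = dev;
		e->next = buckets[bucket_of(dev)];
		buckets[bucket_of(dev)] = e;
	}
	if (e->last)
		e->last->next = fe;
	else
		e->first = fe;
	e->last = fe;
	return 0;
}

/* Return the first flowtable registered for dev, or NULL if none. */
static const void *map_lookup(const void *dev)
{
	struct dev_entry *e = find_dev(dev);

	return (e && e->first) ? e->first->ft : NULL;
}
```

In this model, registering two flowtables for the same device and then calling map_lookup() yields the one registered first, which mirrors the 1:1 assumption stated in the commit message.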
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
include/net/netfilter/nf_flow_table.h | 8 ++
net/netfilter/Makefile | 2 +-
net/netfilter/nf_flow_table_offload.c | 6 +-
net/netfilter/nf_flow_table_xdp.c | 163 ++++++++++++++++++++++++++
4 files changed, 176 insertions(+), 3 deletions(-)
create mode 100644 net/netfilter/nf_flow_table_xdp.c
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 9abb7ee40d72f..688e02b287cc4 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -305,6 +305,14 @@ struct flow_ports {
__be16 source, dest;
};
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev);
+int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
+ struct net_device *dev,
+ enum flow_block_command cmd);
+void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
+ struct net_device *dev,
+ enum flow_block_command cmd);
+
unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state);
unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 614815a3ed738..18046872a38aa 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -142,7 +142,7 @@ obj-$(CONFIG_NFT_FWD_NETDEV) += nft_fwd_netdev.o
# flow table infrastructure
obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
- nf_flow_table_offload.o
+ nf_flow_table_offload.o nf_flow_table_xdp.o
nf_flow_table-$(CONFIG_NF_FLOW_TABLE_PROCFS) += nf_flow_table_procfs.o
obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index a010b25076ca0..d9b019c98694b 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -1192,7 +1192,7 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
int err;
if (!nf_flowtable_hw_offload(flowtable))
- return 0;
+ return nf_flow_offload_xdp_setup(flowtable, dev, cmd);
if (dev->netdev_ops->ndo_setup_tc)
err = nf_flow_table_offload_cmd(&bo, flowtable, dev, cmd,
@@ -1200,8 +1200,10 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
else
err = nf_flow_table_indr_offload_cmd(&bo, flowtable, dev, cmd,
&extack);
- if (err < 0)
+ if (err < 0) {
+ nf_flow_offload_xdp_cancel(flowtable, dev, cmd);
return err;
+ }
return nf_flow_table_block_setup(flowtable, &bo, cmd);
}
diff --git a/net/netfilter/nf_flow_table_xdp.c b/net/netfilter/nf_flow_table_xdp.c
new file mode 100644
index 0000000000000..b9bdf27ba9bd3
--- /dev/null
+++ b/net/netfilter/nf_flow_table_xdp.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netfilter.h>
+#include <linux/rhashtable.h>
+#include <linux/netdevice.h>
+#include <net/flow_offload.h>
+#include <net/netfilter/nf_flow_table.h>
+
+struct flow_offload_xdp_ft {
+ struct list_head head;
+ struct nf_flowtable *ft;
+ struct rcu_head rcuhead;
+};
+
+struct flow_offload_xdp {
+ struct hlist_node hnode;
+ unsigned long net_device_addr;
+ struct list_head head;
+};
+
+#define NF_XDP_HT_BITS 4
+static DEFINE_HASHTABLE(nf_xdp_hashtable, NF_XDP_HT_BITS);
+static DEFINE_MUTEX(nf_xdp_hashtable_lock);
+
+/* caller must hold rcu read lock */
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev)
+{
+ unsigned long key = (unsigned long)dev;
+ struct flow_offload_xdp *iter;
+
+ hash_for_each_possible_rcu(nf_xdp_hashtable, iter, hnode, key) {
+ if (key == iter->net_device_addr) {
+ struct flow_offload_xdp_ft *ft_elem;
+
+ /* The user is supposed to insert a given net_device
+ * just into a single nf_flowtable so we always return
+ * the first element here.
+ */
+ ft_elem = list_first_or_null_rcu(&iter->head,
+ struct flow_offload_xdp_ft,
+ head);
+ return ft_elem ? ft_elem->ft : NULL;
+ }
+ }
+
+ return NULL;
+}
+
+static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
+ const struct net_device *dev)
+{
+ struct flow_offload_xdp *iter, *elem = NULL;
+ unsigned long key = (unsigned long)dev;
+ struct flow_offload_xdp_ft *ft_elem;
+
+ ft_elem = kzalloc(sizeof(*ft_elem), GFP_KERNEL_ACCOUNT);
+ if (!ft_elem)
+ return -ENOMEM;
+
+ ft_elem->ft = ft;
+
+ mutex_lock(&nf_xdp_hashtable_lock);
+
+ hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
+ if (key == iter->net_device_addr) {
+ elem = iter;
+ break;
+ }
+ }
+
+ if (!elem) {
+ elem = kzalloc(sizeof(*elem), GFP_KERNEL_ACCOUNT);
+ if (!elem)
+ goto err_unlock;
+
+ elem->net_device_addr = key;
+ INIT_LIST_HEAD(&elem->head);
+ hash_add_rcu(nf_xdp_hashtable, &elem->hnode, key);
+ }
+ list_add_tail_rcu(&ft_elem->head, &elem->head);
+
+ mutex_unlock(&nf_xdp_hashtable_lock);
+
+ return 0;
+
+err_unlock:
+ mutex_unlock(&nf_xdp_hashtable_lock);
+ kfree(ft_elem);
+
+ return -ENOMEM;
+}
+
+static void nf_flowtable_by_dev_remove(struct nf_flowtable *ft,
+ const struct net_device *dev)
+{
+ struct flow_offload_xdp *iter, *elem = NULL;
+ unsigned long key = (unsigned long)dev;
+
+ mutex_lock(&nf_xdp_hashtable_lock);
+
+ hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
+ if (key == iter->net_device_addr) {
+ elem = iter;
+ break;
+ }
+ }
+
+ if (elem) {
+ struct flow_offload_xdp_ft *ft_elem, *ft_next;
+
+ list_for_each_entry_safe(ft_elem, ft_next, &elem->head, head) {
+ if (ft_elem->ft == ft) {
+ list_del_rcu(&ft_elem->head);
+ kfree_rcu(ft_elem, rcuhead);
+ }
+ }
+
+ if (list_empty(&elem->head))
+ hash_del_rcu(&elem->hnode);
+ else
+ elem = NULL;
+ }
+
+ mutex_unlock(&nf_xdp_hashtable_lock);
+
+ if (elem) {
+ synchronize_rcu();
+ kfree(elem);
+ }
+}
+
+int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
+ struct net_device *dev,
+ enum flow_block_command cmd)
+{
+ switch (cmd) {
+ case FLOW_BLOCK_BIND:
+ return nf_flowtable_by_dev_insert(flowtable, dev);
+ case FLOW_BLOCK_UNBIND:
+ nf_flowtable_by_dev_remove(flowtable, dev);
+ return 0;
+ }
+
+ WARN_ON_ONCE(1);
+ return 0;
+}
+
+void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
+ struct net_device *dev,
+ enum flow_block_command cmd)
+{
+ switch (cmd) {
+ case FLOW_BLOCK_BIND:
+ nf_flowtable_by_dev_remove(flowtable, dev);
+ return;
+ case FLOW_BLOCK_UNBIND:
+ /* We do not re-bind in case hw offload would report error
+ * on *unregister*.
+ */
+ break;
+ }
+}
--
2.45.1
* [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc
2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
@ 2024-05-29 13:04 ` Lorenzo Bianconi
2024-05-29 21:53 ` Alexei Starovoitov
2024-05-29 13:04 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
2024-06-14 15:19 ` [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Pablo Neira Ayuso
3 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-05-29 13:04 UTC
To: bpf
Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, martin.lau, eddyz87,
lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor
Introduce the bpf_xdp_flow_lookup kfunc to look up a flowtable entry
based on a fib tuple of incoming traffic.
bpf_xdp_flow_lookup can be used as a building block to offload sw
flowtable processing to xdp when hw flowtable is not available.
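The error-reporting convention used by the kfunc in this patch (return NULL and store a negative errno in an options struct, with an error-pointer helper underneath) can be illustrated with a userspace model. This is a sketch only: err_ptr/ptr_err/is_err are local re-creations of the kernel's ERR_PTR idiom, and lookup_opts, inner_lookup and flow_lookup_model are hypothetical names standing in for the real types and functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creation of the kernel error-pointer idiom. */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errors. */
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Stand-in for the options struct: a single out parameter. */
struct lookup_opts {
	int32_t error;
};

static int table_entry = 42;	/* stand-in for a flow entry */

/* Inner helper: returns the entry or an encoded -ENOENT. */
static void *inner_lookup(int key)
{
	return key == 1 ? (void *)&table_entry : err_ptr(-ENOENT);
}

/* Outer entry point: never returns an error pointer, only NULL. */
static void *flow_lookup_model(int key, struct lookup_opts *opts,
			       uint32_t opts_len)
{
	void *res;

	if (opts_len != sizeof(*opts)) {
		opts->error = -EINVAL;
		return NULL;
	}

	res = inner_lookup(key);
	if (is_err(res)) {
		opts->error = (int32_t)ptr_err(res);
		return NULL;
	}

	return res;
}
```

The point of the split is that the caller only ever has to test the return value against NULL, and can read opts->error afterwards if it wants the reason for the failure.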
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
include/net/netfilter/nf_flow_table.h | 10 +++
net/netfilter/Makefile | 5 ++
net/netfilter/nf_flow_table_bpf.c | 117 ++++++++++++++++++++++++++
net/netfilter/nf_flow_table_inet.c | 2 +-
4 files changed, 133 insertions(+), 1 deletion(-)
create mode 100644 net/netfilter/nf_flow_table_bpf.c
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 688e02b287cc4..cc52234ef71af 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -318,6 +318,16 @@ unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state);
+#if (IS_BUILTIN(CONFIG_NF_FLOW_TABLE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
+ (IS_MODULE(CONFIG_NF_FLOW_TABLE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
+extern int nf_flow_register_bpf(void);
+#else
+static inline int nf_flow_register_bpf(void)
+{
+ return 0;
+}
+#endif
+
#define MODULE_ALIAS_NF_FLOWTABLE(family) \
MODULE_ALIAS("nf-flowtable-" __stringify(family))
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 18046872a38aa..f0aa4d7ef4998 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -144,6 +144,11 @@ obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
nf_flow_table_offload.o nf_flow_table_xdp.o
nf_flow_table-$(CONFIG_NF_FLOW_TABLE_PROCFS) += nf_flow_table_procfs.o
+ifeq ($(CONFIG_NF_FLOW_TABLE),m)
+nf_flow_table-$(CONFIG_DEBUG_INFO_BTF_MODULES) += nf_flow_table_bpf.o
+else ifeq ($(CONFIG_NF_FLOW_TABLE),y)
+nf_flow_table-$(CONFIG_DEBUG_INFO_BTF) += nf_flow_table_bpf.o
+endif
obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
diff --git a/net/netfilter/nf_flow_table_bpf.c b/net/netfilter/nf_flow_table_bpf.c
new file mode 100644
index 0000000000000..b3f8dffe62535
--- /dev/null
+++ b/net/netfilter/nf_flow_table_bpf.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Unstable Flow Table Helpers for XDP hook
+ *
+ * These are called from the XDP programs.
+ * Note that it is allowed to break compatibility for these functions since
+ * the interface they are exposed through to BPF programs is explicitly
+ * unstable.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <net/netfilter/nf_flow_table.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <net/xdp.h>
+
+/* bpf_flowtable_opts - options for bpf flowtable helpers
+ * @error: out parameter, set for any encountered error
+ */
+struct bpf_flowtable_opts {
+ s32 error;
+};
+
+enum {
+ NF_BPF_FLOWTABLE_OPTS_SZ = 4,
+};
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+ "Global functions as their definitions will be in nf_flow_table BTF");
+
+static struct flow_offload_tuple_rhash *
+bpf_xdp_flow_tuple_lookup(struct net_device *dev,
+ struct flow_offload_tuple *tuple, __be16 proto)
+{
+ struct flow_offload_tuple_rhash *tuplehash;
+ struct nf_flowtable *nf_flow_table;
+ struct flow_offload *nf_flow;
+
+ nf_flow_table = nf_flowtable_by_dev(dev);
+ if (!nf_flow_table)
+ return ERR_PTR(-ENOENT);
+
+ tuplehash = flow_offload_lookup(nf_flow_table, tuple);
+ if (!tuplehash)
+ return ERR_PTR(-ENOENT);
+
+ nf_flow = container_of(tuplehash, struct flow_offload,
+ tuplehash[tuplehash->tuple.dir]);
+ flow_offload_refresh(nf_flow_table, nf_flow, false);
+
+ return tuplehash;
+}
+
+__bpf_kfunc struct flow_offload_tuple_rhash *
+bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple,
+ struct bpf_flowtable_opts *opts, u32 opts_len)
+{
+ struct xdp_buff *xdp = (struct xdp_buff *)ctx;
+ struct flow_offload_tuple tuple = {
+ .iifidx = fib_tuple->ifindex,
+ .l3proto = fib_tuple->family,
+ .l4proto = fib_tuple->l4_protocol,
+ .src_port = fib_tuple->sport,
+ .dst_port = fib_tuple->dport,
+ };
+ struct flow_offload_tuple_rhash *tuplehash;
+ __be16 proto;
+
+ if (opts_len != NF_BPF_FLOWTABLE_OPTS_SZ) {
+ opts->error = -EINVAL;
+ return NULL;
+ }
+
+ switch (fib_tuple->family) {
+ case AF_INET:
+ tuple.src_v4.s_addr = fib_tuple->ipv4_src;
+ tuple.dst_v4.s_addr = fib_tuple->ipv4_dst;
+ proto = htons(ETH_P_IP);
+ break;
+ case AF_INET6:
+ tuple.src_v6 = *(struct in6_addr *)&fib_tuple->ipv6_src;
+ tuple.dst_v6 = *(struct in6_addr *)&fib_tuple->ipv6_dst;
+ proto = htons(ETH_P_IPV6);
+ break;
+ default:
+ opts->error = -EAFNOSUPPORT;
+ return NULL;
+ }
+
+ tuplehash = bpf_xdp_flow_tuple_lookup(xdp->rxq->dev, &tuple, proto);
+ if (IS_ERR(tuplehash)) {
+ opts->error = PTR_ERR(tuplehash);
+ return NULL;
+ }
+
+ return tuplehash;
+}
+
+__diag_pop()
+
+BTF_KFUNCS_START(nf_ft_kfunc_set)
+BTF_ID_FLAGS(func, bpf_xdp_flow_lookup, KF_TRUSTED_ARGS | KF_RET_NULL)
+BTF_KFUNCS_END(nf_ft_kfunc_set)
+
+static const struct btf_kfunc_id_set nf_flow_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &nf_ft_kfunc_set,
+};
+
+int nf_flow_register_bpf(void)
+{
+ return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
+ &nf_flow_kfunc_set);
+}
+EXPORT_SYMBOL_GPL(nf_flow_register_bpf);
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index 6eef15648b7b0..88787b45e30d6 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -98,7 +98,7 @@ static int __init nf_flow_inet_module_init(void)
nft_register_flowtable_type(&flowtable_ipv6);
nft_register_flowtable_type(&flowtable_inet);
- return 0;
+ return nf_flow_register_bpf();
}
static void __exit nf_flow_inet_module_exit(void)
--
2.45.1
* [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc
2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
2024-05-29 13:04 ` [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
@ 2024-05-29 13:04 ` Lorenzo Bianconi
2024-06-13 16:06 ` Daniel Borkmann
2024-06-14 15:19 ` [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Pablo Neira Ayuso
3 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-05-29 13:04 UTC
To: bpf
Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, martin.lau, eddyz87,
lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor
Introduce an e2e selftest for the bpf_xdp_flow_lookup kfunc through
the xdp_flowtable utility.
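The XDP program added by this patch only attempts the lookup for IPv4 packets that pass a few header checks (not fragmented, no IP options, TTL greater than 1). A userspace model of that check might look like the sketch below. The struct ipv4_fields type and the MODEL_* constants are hypothetical stand-ins for struct iphdr and the IP_MF/IP_OFFSET defines used in the test program.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IP_MF	0x2000	/* "More Fragments" flag */
#define MODEL_IP_OFFSET	0x1fff	/* fragment offset mask */

/* Only the IPv4 header fields the check needs. */
struct ipv4_fields {
	uint8_t ihl;		/* header length in 32-bit words */
	uint8_t ttl;
	uint16_t frag_off;	/* network byte order */
};

static bool ipv4_ok_for_offload(const struct ipv4_fields *h)
{
	/* Fragmented traffic: MF set or non-zero fragment offset. */
	if (h->frag_off & htons(MODEL_IP_MF | MODEL_IP_OFFSET))
		return false;

	/* IP options present: header longer than the 20-byte minimum. */
	if (h->ihl * 4 != 20)
		return false;

	/* Packet would expire when forwarded. */
	if (h->ttl <= 1)
		return false;

	return true;
}
```

Note that a packet with only the "Don't Fragment" bit (0x4000) set still passes, since that bit is outside the mask.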
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
tools/testing/selftests/bpf/config | 13 ++
.../selftests/bpf/prog_tests/xdp_flowtable.c | 168 ++++++++++++++++++
.../selftests/bpf/progs/xdp_flowtable.c | 145 +++++++++++++++
3 files changed, 326 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_flowtable.c
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 2fb16da78dce8..5291e97df7494 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -83,6 +83,19 @@ CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_DEFRAG_IPV4=y
CONFIG_NF_DEFRAG_IPV6=y
+CONFIG_NF_TABLES=y
+CONFIG_NF_TABLES_INET=y
+CONFIG_NF_TABLES_NETDEV=y
+CONFIG_NF_TABLES_IPV4=y
+CONFIG_NF_TABLES_IPV6=y
+CONFIG_NETFILTER_INGRESS=y
+CONFIG_NF_FLOW_TABLE=y
+CONFIG_NF_FLOW_TABLE_INET=y
+CONFIG_NETFILTER_NETLINK=y
+CONFIG_NFT_FLOW_OFFLOAD=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_FILTER=y
CONFIG_NF_NAT=y
CONFIG_RC_CORE=y
CONFIG_SECURITY=y
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c b/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
new file mode 100644
index 0000000000000..e1bf141d34015
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <bpf/btf.h>
+#include <linux/if_link.h>
+#include <linux/udp.h>
+#include <net/if.h>
+#include <unistd.h>
+
+#include "xdp_flowtable.skel.h"
+
+#define TX_NETNS_NAME "ns0"
+#define RX_NETNS_NAME "ns1"
+
+#define TX_NAME "v0"
+#define FORWARD_NAME "v1"
+#define RX_NAME "d0"
+
+#define TX_MAC "00:00:00:00:00:01"
+#define FORWARD_MAC "00:00:00:00:00:02"
+#define RX_MAC "00:00:00:00:00:03"
+#define DST_MAC "00:00:00:00:00:04"
+
+#define TX_ADDR "10.0.0.1"
+#define FORWARD_ADDR "10.0.0.2"
+#define RX_ADDR "20.0.0.1"
+#define DST_ADDR "20.0.0.2"
+
+#define PREFIX_LEN "8"
+#define N_PACKETS 10
+#define UDP_PORT 12345
+#define UDP_PORT_STR "12345"
+
+static int send_udp_traffic(void)
+{
+ struct sockaddr_storage addr;
+ int i, sock;
+
+ if (make_sockaddr(AF_INET, DST_ADDR, UDP_PORT, &addr, NULL))
+ return -EINVAL;
+
+ sock = socket(AF_INET, SOCK_DGRAM, 0);
+ if (sock < 0)
+ return sock;
+
+ for (i = 0; i < N_PACKETS; i++) {
+ unsigned char buf[] = { 0xaa, 0xbb, 0xcc };
+ int n;
+
+ n = sendto(sock, buf, sizeof(buf), MSG_NOSIGNAL | MSG_CONFIRM,
+ (struct sockaddr *)&addr, sizeof(addr));
+ if (n != sizeof(buf)) {
+ close(sock);
+ return -EINVAL;
+ }
+
+ usleep(50000); /* 50ms */
+ }
+ close(sock);
+
+ return 0;
+}
+
+void test_xdp_flowtable(void)
+{
+ struct xdp_flowtable *skel = NULL;
+ struct nstoken *tok = NULL;
+ int iifindex, stats_fd;
+ __u32 value, key = 0;
+ struct bpf_link *link;
+
+ if (SYS_NOFAIL("nft -v")) {
+ fprintf(stdout, "Missing required nft tool\n");
+ test__skip();
+ return;
+ }
+
+ SYS(out, "ip netns add " TX_NETNS_NAME);
+ SYS(out, "ip netns add " RX_NETNS_NAME);
+
+ tok = open_netns(RX_NETNS_NAME);
+ if (!ASSERT_OK_PTR(tok, "setns"))
+ goto out;
+
+ SYS(out, "sysctl -qw net.ipv4.conf.all.forwarding=1");
+
+ SYS(out, "ip link add " TX_NAME " type veth peer " FORWARD_NAME);
+ SYS(out, "ip link set " TX_NAME " netns " TX_NETNS_NAME);
+ SYS(out, "ip link set dev " FORWARD_NAME " address " FORWARD_MAC);
+ SYS(out,
+ "ip addr add " FORWARD_ADDR "/" PREFIX_LEN " dev " FORWARD_NAME);
+ SYS(out, "ip link set dev " FORWARD_NAME " up");
+
+ SYS(out, "ip link add " RX_NAME " type dummy");
+ SYS(out, "ip link set dev " RX_NAME " address " RX_MAC);
+ SYS(out, "ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
+ SYS(out, "ip link set dev " RX_NAME " up");
+
+ /* configure the flowtable */
+ SYS(out, "nft add table ip filter");
+ SYS(out,
+ "nft add flowtable ip filter f { hook ingress priority 0\\; "
+ "devices = { " FORWARD_NAME ", " RX_NAME " }\\; }");
+ SYS(out,
+ "nft add chain ip filter forward "
+ "{ type filter hook forward priority 0\\; }");
+ SYS(out,
+ "nft add rule ip filter forward ip protocol udp th dport "
+ UDP_PORT_STR " flow add @f");
+
+ /* Avoid ARP calls */
+ SYS(out,
+ "ip -4 neigh add " DST_ADDR " lladdr " DST_MAC " dev " RX_NAME);
+
+ close_netns(tok);
+ tok = open_netns(TX_NETNS_NAME);
+ if (!ASSERT_OK_PTR(tok, "setns"))
+ goto out;
+
+ SYS(out, "ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
+ SYS(out, "ip link set dev " TX_NAME " address " TX_MAC);
+ SYS(out, "ip link set dev " TX_NAME " up");
+ SYS(out, "ip route add default via " FORWARD_ADDR);
+
+ close_netns(tok);
+ tok = open_netns(RX_NETNS_NAME);
+ if (!ASSERT_OK_PTR(tok, "setns"))
+ goto out;
+
+ iifindex = if_nametoindex(FORWARD_NAME);
+ if (!ASSERT_NEQ(iifindex, 0, "iifindex"))
+ goto out;
+
+ skel = xdp_flowtable__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel"))
+ goto out;
+
+ link = bpf_program__attach_xdp(skel->progs.xdp_flowtable_do_lookup,
+ iifindex);
+ if (!ASSERT_OK_PTR(link, "prog_attach"))
+ goto out;
+
+ close_netns(tok);
+ tok = open_netns(TX_NETNS_NAME);
+ if (!ASSERT_OK_PTR(tok, "setns"))
+ goto out;
+
+ if (!ASSERT_OK(send_udp_traffic(), "send udp"))
+ goto out;
+
+ close_netns(tok);
+ tok = open_netns(RX_NETNS_NAME);
+ if (!ASSERT_OK_PTR(tok, "setns"))
+ goto out;
+
+ stats_fd = bpf_map__fd(skel->maps.stats);
+ if (!ASSERT_OK(bpf_map_lookup_elem(stats_fd, &key, &value),
+ "bpf_map_lookup_elem stats"))
+ goto out;
+
+ ASSERT_GE(value, N_PACKETS - 2, "bpf_xdp_flow_lookup failed");
+out:
+ xdp_flowtable__destroy(skel);
+ if (tok)
+ close_netns(tok);
+ SYS_NOFAIL("ip netns del " TX_NETNS_NAME);
+ SYS_NOFAIL("ip netns del " RX_NETNS_NAME);
+}
diff --git a/tools/testing/selftests/bpf/progs/xdp_flowtable.c b/tools/testing/selftests/bpf/progs/xdp_flowtable.c
new file mode 100644
index 0000000000000..fb7f6fac57459
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_flowtable.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#define MAX_ERRNO 4095
+
+#define ETH_P_IP 0x0800
+#define ETH_P_IPV6 0x86dd
+#define IP_MF 0x2000 /* "More Fragments" */
+#define IP_OFFSET 0x1fff /* "Fragment Offset" */
+#define AF_INET 2
+#define AF_INET6 10
+
+struct bpf_flowtable_opts___local {
+ s32 error;
+};
+
+struct flow_offload_tuple_rhash *
+bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
+ struct bpf_flowtable_opts___local *, u32) __ksym;
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __type(key, __u32);
+ __type(value, __u32);
+ __uint(max_entries, 1);
+} stats SEC(".maps");
+
+static bool xdp_flowtable_offload_check_iphdr(struct iphdr *iph)
+{
+ /* ip fragmented traffic */
+ if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET))
+ return false;
+
+ /* ip options */
+ if (iph->ihl * 4 != sizeof(*iph))
+ return false;
+
+ if (iph->ttl <= 1)
+ return false;
+
+ return true;
+}
+
+static bool xdp_flowtable_offload_check_tcp_state(void *ports, void *data_end,
+ u8 proto)
+{
+ if (proto == IPPROTO_TCP) {
+ struct tcphdr *tcph = ports;
+
+ if (tcph + 1 > data_end)
+ return false;
+
+ if (tcph->fin || tcph->rst)
+ return false;
+ }
+
+ return true;
+}
+
+SEC("xdp.frags")
+int xdp_flowtable_do_lookup(struct xdp_md *ctx)
+{
+ void *data_end = (void *)(long)ctx->data_end;
+ struct bpf_flowtable_opts___local opts = {};
+ struct flow_offload_tuple_rhash *tuplehash;
+ struct bpf_fib_lookup tuple = {
+ .ifindex = ctx->ingress_ifindex,
+ };
+ void *data = (void *)(long)ctx->data;
+ struct ethhdr *eth = data;
+ struct flow_ports *ports;
+ __u32 *val, key = 0;
+
+ if (eth + 1 > data_end)
+ return XDP_DROP;
+
+ switch (eth->h_proto) {
+ case bpf_htons(ETH_P_IP): {
+ struct iphdr *iph = data + sizeof(*eth);
+
+ ports = (struct flow_ports *)(iph + 1);
+ if (ports + 1 > data_end)
+ return XDP_PASS;
+
+ /* sanity check on ip header */
+ if (!xdp_flowtable_offload_check_iphdr(iph))
+ return XDP_PASS;
+
+ if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
+ iph->protocol))
+ return XDP_PASS;
+
+ tuple.family = AF_INET;
+ tuple.tos = iph->tos;
+ tuple.l4_protocol = iph->protocol;
+ tuple.tot_len = bpf_ntohs(iph->tot_len);
+ tuple.ipv4_src = iph->saddr;
+ tuple.ipv4_dst = iph->daddr;
+ tuple.sport = ports->source;
+ tuple.dport = ports->dest;
+ break;
+ }
+ case bpf_htons(ETH_P_IPV6): {
+ struct in6_addr *src = (struct in6_addr *)tuple.ipv6_src;
+ struct in6_addr *dst = (struct in6_addr *)tuple.ipv6_dst;
+ struct ipv6hdr *ip6h = data + sizeof(*eth);
+
+ ports = (struct flow_ports *)(ip6h + 1);
+ if (ports + 1 > data_end)
+ return XDP_PASS;
+
+ if (ip6h->hop_limit <= 1)
+ return XDP_PASS;
+
+ if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
+ ip6h->nexthdr))
+ return XDP_PASS;
+
+ tuple.family = AF_INET6;
+ tuple.l4_protocol = ip6h->nexthdr;
+ tuple.tot_len = bpf_ntohs(ip6h->payload_len);
+ *src = ip6h->saddr;
+ *dst = ip6h->daddr;
+ tuple.sport = ports->source;
+ tuple.dport = ports->dest;
+ break;
+ }
+ default:
+ return XDP_PASS;
+ }
+
+ tuplehash = bpf_xdp_flow_lookup(ctx, &tuple, &opts, sizeof(opts));
+ if (!tuplehash)
+ return XDP_PASS;
+
+ val = bpf_map_lookup_elem(&stats, &key);
+ if (val)
+ __sync_add_and_fetch(val, 1);
+
+ return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.45.1
* Re: [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc
2024-05-29 13:04 ` [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
@ 2024-05-29 21:53 ` Alexei Starovoitov
0 siblings, 0 replies; 10+ messages in thread
From: Alexei Starovoitov @ 2024-05-29 21:53 UTC
To: Lorenzo Bianconi
Cc: bpf, Pablo Neira Ayuso, Jozsef Kadlecsik, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netfilter-devel,
Network Development, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Lorenzo Bianconi,
Toke Høiland-Jørgensen, Florian Westphal,
Jesper Dangaard Brouer, Simon Horman, donhunte,
Kumar Kartikeya Dwivedi
On Wed, May 29, 2024 at 6:04 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup
> of a given flowtable entry based on a fib tuple of incoming traffic.
> bpf_xdp_flow_lookup can be used as building block to offload in xdp
> the processing of sw flowtable when hw flowtable is not available.
>
> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
lgtm
Waiting for the Ack from netfilter folks...
So we can land it through bpf-next and pass it to net-next
a week or so later.
* Re: [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc
2024-05-29 13:04 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
@ 2024-06-13 16:06 ` Daniel Borkmann
2024-06-13 16:54 ` Daniel Xu
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Borkmann @ 2024-06-13 16:06 UTC
To: Lorenzo Bianconi, bpf
Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, andrii, martin.lau, eddyz87, lorenzo.bianconi, toke,
fw, hawk, horms, donhunte, memxor
On 5/29/24 3:04 PM, Lorenzo Bianconi wrote:
> Introduce e2e selftest for bpf_xdp_flow_lookup kfunc through
> xdp_flowtable utility.
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
[...]
> +struct flow_offload_tuple_rhash *
> +bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
> + struct bpf_flowtable_opts___local *, u32) __ksym;
Btw, this fails the CI build:
https://github.com/kernel-patches/bpf/actions/runs/9499749947/job/26190382116
[...]
progs/xdp_flowtable.c:20:1: error: conflicting types for 'bpf_xdp_flow_lookup'
20 | bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
| ^
/tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:41: note: previous declaration is here
106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
| ^
progs/xdp_flowtable.c:134:47: error: incompatible pointer types passing 'struct bpf_flowtable_opts___local *' to parameter of type 'struct bpf_flowtable_opts *' [-Werror,-Wincompatible-pointer-types]
134 | tuplehash = bpf_xdp_flow_lookup(ctx, &tuple, &opts, sizeof(opts));
| ^~~~~
/tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:142: note: passing argument to parameter 'opts' here
106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
| ^
2 errors generated.
CLNG-BPF [test_maps] kprobe_multi_override.bpf.o
CLNG-BPF [test_maps] tailcall_bpf2bpf1.bpf.o
make: *** [Makefile:654: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/xdp_flowtable.bpf.o] Error 1
make: *** Waiting for unfinished jobs....
make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
Error: Process completed with exit code 2.
* Re: [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc
2024-06-13 16:06 ` Daniel Borkmann
@ 2024-06-13 16:54 ` Daniel Xu
2024-06-13 22:11 ` Lorenzo Bianconi
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Xu @ 2024-06-13 16:54 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Lorenzo Bianconi, bpf, pablo, kadlec, davem, edumazet, kuba,
pabeni, netfilter-devel, netdev, ast, andrii, martin.lau, eddyz87,
lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor
On Thu, Jun 13, 2024 at 06:06:29PM GMT, Daniel Borkmann wrote:
> On 5/29/24 3:04 PM, Lorenzo Bianconi wrote:
> > Introduce e2e selftest for bpf_xdp_flow_lookup kfunc through
> > xdp_flowtable utility.
> >
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> [...]
> > +struct flow_offload_tuple_rhash *
> > +bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
> > + struct bpf_flowtable_opts___local *, u32) __ksym;
>
> Btw, this fails the CI build:
>
> https://github.com/kernel-patches/bpf/actions/runs/9499749947/job/26190382116
>
> [...]
> progs/xdp_flowtable.c:20:1: error: conflicting types for 'bpf_xdp_flow_lookup'
> 20 | bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
> | ^
> /tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:41: note: previous declaration is here
> 106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
> | ^
> progs/xdp_flowtable.c:134:47: error: incompatible pointer types passing 'struct bpf_flowtable_opts___local *' to parameter of type 'struct bpf_flowtable_opts *' [-Werror,-Wincompatible-pointer-types]
> 134 | tuplehash = bpf_xdp_flow_lookup(ctx, &tuple, &opts, sizeof(opts));
> | ^~~~~
> /tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:142: note: passing argument to parameter 'opts' here
> 106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
> | ^
> 2 errors generated.
> CLNG-BPF [test_maps] kprobe_multi_override.bpf.o
> CLNG-BPF [test_maps] tailcall_bpf2bpf1.bpf.o
> make: *** [Makefile:654: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/xdp_flowtable.bpf.o] Error 1
> make: *** Waiting for unfinished jobs....
> make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
> Error: Process completed with exit code 2.
>
We'll probably want to do the same thing as in f709124dd72f ("bpf:
selftests: nf: Opt out of using generated kfunc prototypes").
Daniel
* Re: [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc
2024-06-13 16:54 ` Daniel Xu
@ 2024-06-13 22:11 ` Lorenzo Bianconi
0 siblings, 0 replies; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-06-13 22:11 UTC (permalink / raw)
To: Daniel Xu
Cc: Daniel Borkmann, bpf, pablo, kadlec, davem, edumazet, kuba,
pabeni, netfilter-devel, netdev, ast, andrii, martin.lau, eddyz87,
lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor
> On Thu, Jun 13, 2024 at 06:06:29PM GMT, Daniel Borkmann wrote:
> > On 5/29/24 3:04 PM, Lorenzo Bianconi wrote:
> > > Introduce e2e selftest for bpf_xdp_flow_lookup kfunc through
> > > xdp_flowtable utility.
> > >
> > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > [...]
> > > +struct flow_offload_tuple_rhash *
> > > +bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
> > > + struct bpf_flowtable_opts___local *, u32) __ksym;
> >
> > Btw, this fails the CI build:
> >
> > https://github.com/kernel-patches/bpf/actions/runs/9499749947/job/26190382116
> >
> > [...]
> > progs/xdp_flowtable.c:20:1: error: conflicting types for 'bpf_xdp_flow_lookup'
> > 20 | bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
> > | ^
> > /tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:41: note: previous declaration is here
> > 106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
> > | ^
> > progs/xdp_flowtable.c:134:47: error: incompatible pointer types passing 'struct bpf_flowtable_opts___local *' to parameter of type 'struct bpf_flowtable_opts *' [-Werror,-Wincompatible-pointer-types]
> > 134 | tuplehash = bpf_xdp_flow_lookup(ctx, &tuple, &opts, sizeof(opts));
> > | ^~~~~
> > /tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:142: note: passing argument to parameter 'opts' here
> > 106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
> > | ^
> > 2 errors generated.
> > CLNG-BPF [test_maps] kprobe_multi_override.bpf.o
> > CLNG-BPF [test_maps] tailcall_bpf2bpf1.bpf.o
> > make: *** [Makefile:654: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/xdp_flowtable.bpf.o] Error 1
> > make: *** Waiting for unfinished jobs....
> > make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
> > Error: Process completed with exit code 2.
> >
>
> We'll probably want to do the same thing as in f709124dd72f ("bpf:
> selftests: nf: Opt out of using generated kfunc prototypes").
Ack, I added BPF_NO_KFUNC_PROTOTYPES to the selftest patch. CI seems fine now:
https://github.com/kernel-patches/bpf/pull/7202
Regards,
Lorenzo
>
> Daniel
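The build error quoted above comes from two declarations of bpf_xdp_flow_lookup with different opts types: the selftest's own (using the ___local struct) and the one generated into vmlinux.h. A minimal userspace sketch of the opt-out follows, assuming the generated kfunc prototypes sit behind an #ifndef BPF_NO_KFUNC_PROTOTYPES guard, which is what the commit referenced above relies on. The inlined "header" section, the struct layouts, the stub body, and lookup_with_local_opts() are stand-ins for illustration only; they are not the real vmlinux.h or the selftest.

```c
#include <stddef.h>

/* Opt out before the (simulated) generated header is seen. */
#define BPF_NO_KFUNC_PROTOTYPES

/* ---- stand-in for the relevant part of a generated vmlinux.h ---- */
struct xdp_md;
struct bpf_fib_lookup;
struct flow_offload_tuple_rhash { int dummy; };	/* placeholder layout */
struct bpf_flowtable_opts { int error; };	/* placeholder layout */

#ifndef BPF_NO_KFUNC_PROTOTYPES
/* Generated prototype: skipped because the opt-out macro is defined. */
extern struct flow_offload_tuple_rhash *
bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple,
		    struct bpf_flowtable_opts *opts, unsigned int opts_len);
#endif
/* ---- end of stand-in header ---- */

/* Local mirror of the opts type, in the spirit of the ___local suffix. */
struct bpf_flowtable_opts___local { int error; };

static struct flow_offload_tuple_rhash fake_entry;

/* Hypothetical stub standing in for the kfunc. With the guard active this
 * is the only declaration in scope, so there is no "conflicting types"
 * error even though the opts parameter uses the local type.
 */
struct flow_offload_tuple_rhash *
bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple,
		    struct bpf_flowtable_opts___local *opts,
		    unsigned int opts_len)
{
	(void)ctx;
	(void)fib_tuple;
	if (!opts || opts_len != sizeof(*opts))
		return NULL;
	opts->error = 0;
	return &fake_entry;
}

/* Calls the stub the same way the selftest calls the kfunc. */
int lookup_with_local_opts(void)
{
	struct bpf_flowtable_opts___local opts = { .error = -1 };

	return bpf_xdp_flow_lookup(NULL, NULL, &opts, sizeof(opts)) != NULL &&
	       opts.error == 0;
}
```

Removing the #define line in this sketch brings the guarded prototype back into scope and reproduces the same class of error as the CI log.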
* Re: [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload
2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
@ 2024-06-13 22:34 ` Lorenzo Bianconi
0 siblings, 0 replies; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-06-13 22:34 UTC (permalink / raw)
To: pablo
Cc: bpf, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, martin.lau, eddyz87,
lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor
> From: Florian Westphal <fw@strlen.de>
>
> This adds a small internal mapping table so that a new bpf (xdp) kfunc
> can perform lookups in a flowtable.
>
> As-is, xdp program has access to the device pointer, but no way to do a
> lookup in a flowtable -- there is no way to obtain the needed struct
> without questionable stunts.
>
> This allows to obtain an nf_flowtable pointer given a net_device
> structure.
>
> In order to keep backward compatibility, the infrastructure allows the
> user to add a given device to multiple flowtables, but it will always
> return the first added mapping performing the lookup since it assumes
> the right configuration is 1:1 mapping between flowtables and net_devices.
Hi Pablo,
do you have any feedback on the nft part? Thanks.
Regards,
Lorenzo
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> include/net/netfilter/nf_flow_table.h | 8 ++
> net/netfilter/Makefile | 2 +-
> net/netfilter/nf_flow_table_offload.c | 6 +-
> net/netfilter/nf_flow_table_xdp.c | 163 ++++++++++++++++++++++++++
> 4 files changed, 176 insertions(+), 3 deletions(-)
> create mode 100644 net/netfilter/nf_flow_table_xdp.c
>
> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
> index 9abb7ee40d72f..688e02b287cc4 100644
> --- a/include/net/netfilter/nf_flow_table.h
> +++ b/include/net/netfilter/nf_flow_table.h
> @@ -305,6 +305,14 @@ struct flow_ports {
>  	__be16 source, dest;
>  };
>
> +struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev);
> +int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
> +			      struct net_device *dev,
> +			      enum flow_block_command cmd);
> +void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
> +				struct net_device *dev,
> +				enum flow_block_command cmd);
> +
>  unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
>  				     const struct nf_hook_state *state);
>  unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index 614815a3ed738..18046872a38aa 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -142,7 +142,7 @@ obj-$(CONFIG_NFT_FWD_NETDEV) += nft_fwd_netdev.o
> # flow table infrastructure
> obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
> nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o \
> - nf_flow_table_offload.o
> + nf_flow_table_offload.o nf_flow_table_xdp.o
> nf_flow_table-$(CONFIG_NF_FLOW_TABLE_PROCFS) += nf_flow_table_procfs.o
>
> obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index a010b25076ca0..d9b019c98694b 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -1192,7 +1192,7 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
>  	int err;
>
>  	if (!nf_flowtable_hw_offload(flowtable))
> -		return 0;
> +		return nf_flow_offload_xdp_setup(flowtable, dev, cmd);
>
>  	if (dev->netdev_ops->ndo_setup_tc)
>  		err = nf_flow_table_offload_cmd(&bo, flowtable, dev, cmd,
> @@ -1200,8 +1200,10 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
>  	else
>  		err = nf_flow_table_indr_offload_cmd(&bo, flowtable, dev, cmd,
>  						     &extack);
> -	if (err < 0)
> +	if (err < 0) {
> +		nf_flow_offload_xdp_cancel(flowtable, dev, cmd);
>  		return err;
> +	}
>
>  	return nf_flow_table_block_setup(flowtable, &bo, cmd);
>  }
> diff --git a/net/netfilter/nf_flow_table_xdp.c b/net/netfilter/nf_flow_table_xdp.c
> new file mode 100644
> index 0000000000000..b9bdf27ba9bd3
> --- /dev/null
> +++ b/net/netfilter/nf_flow_table_xdp.c
> @@ -0,0 +1,163 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/netfilter.h>
> +#include <linux/rhashtable.h>
> +#include <linux/netdevice.h>
> +#include <net/flow_offload.h>
> +#include <net/netfilter/nf_flow_table.h>
> +
> +struct flow_offload_xdp_ft {
> +	struct list_head head;
> +	struct nf_flowtable *ft;
> +	struct rcu_head rcuhead;
> +};
> +
> +struct flow_offload_xdp {
> +	struct hlist_node hnode;
> +	unsigned long net_device_addr;
> +	struct list_head head;
> +};
> +
> +#define NF_XDP_HT_BITS 4
> +static DEFINE_HASHTABLE(nf_xdp_hashtable, NF_XDP_HT_BITS);
> +static DEFINE_MUTEX(nf_xdp_hashtable_lock);
> +
> +/* caller must hold rcu read lock */
> +struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev)
> +{
> +	unsigned long key = (unsigned long)dev;
> +	struct flow_offload_xdp *iter;
> +
> +	hash_for_each_possible_rcu(nf_xdp_hashtable, iter, hnode, key) {
> +		if (key == iter->net_device_addr) {
> +			struct flow_offload_xdp_ft *ft_elem;
> +
> +			/* The user is supposed to insert a given net_device
> +			 * just into a single nf_flowtable so we always return
> +			 * the first element here.
> +			 */
> +			ft_elem = list_first_or_null_rcu(&iter->head,
> +							 struct flow_offload_xdp_ft,
> +							 head);
> +			return ft_elem ? ft_elem->ft : NULL;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
> +				      const struct net_device *dev)
> +{
> +	struct flow_offload_xdp *iter, *elem = NULL;
> +	unsigned long key = (unsigned long)dev;
> +	struct flow_offload_xdp_ft *ft_elem;
> +
> +	ft_elem = kzalloc(sizeof(*ft_elem), GFP_KERNEL_ACCOUNT);
> +	if (!ft_elem)
> +		return -ENOMEM;
> +
> +	ft_elem->ft = ft;
> +
> +	mutex_lock(&nf_xdp_hashtable_lock);
> +
> +	hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
> +		if (key == iter->net_device_addr) {
> +			elem = iter;
> +			break;
> +		}
> +	}
> +
> +	if (!elem) {
> +		elem = kzalloc(sizeof(*elem), GFP_KERNEL_ACCOUNT);
> +		if (!elem)
> +			goto err_unlock;
> +
> +		elem->net_device_addr = key;
> +		INIT_LIST_HEAD(&elem->head);
> +		hash_add_rcu(nf_xdp_hashtable, &elem->hnode, key);
> +	}
> +	list_add_tail_rcu(&ft_elem->head, &elem->head);
> +
> +	mutex_unlock(&nf_xdp_hashtable_lock);
> +
> +	return 0;
> +
> +err_unlock:
> +	mutex_unlock(&nf_xdp_hashtable_lock);
> +	kfree(ft_elem);
> +
> +	return -ENOMEM;
> +}
> +
> +static void nf_flowtable_by_dev_remove(struct nf_flowtable *ft,
> +				       const struct net_device *dev)
> +{
> +	struct flow_offload_xdp *iter, *elem = NULL;
> +	unsigned long key = (unsigned long)dev;
> +
> +	mutex_lock(&nf_xdp_hashtable_lock);
> +
> +	hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
> +		if (key == iter->net_device_addr) {
> +			elem = iter;
> +			break;
> +		}
> +	}
> +
> +	if (elem) {
> +		struct flow_offload_xdp_ft *ft_elem, *ft_next;
> +
> +		list_for_each_entry_safe(ft_elem, ft_next, &elem->head, head) {
> +			if (ft_elem->ft == ft) {
> +				list_del_rcu(&ft_elem->head);
> +				kfree_rcu(ft_elem, rcuhead);
> +			}
> +		}
> +
> +		if (list_empty(&elem->head))
> +			hash_del_rcu(&elem->hnode);
> +		else
> +			elem = NULL;
> +	}
> +
> +	mutex_unlock(&nf_xdp_hashtable_lock);
> +
> +	if (elem) {
> +		synchronize_rcu();
> +		kfree(elem);
> +	}
> +}
> +
> +int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
> +			      struct net_device *dev,
> +			      enum flow_block_command cmd)
> +{
> +	switch (cmd) {
> +	case FLOW_BLOCK_BIND:
> +		return nf_flowtable_by_dev_insert(flowtable, dev);
> +	case FLOW_BLOCK_UNBIND:
> +		nf_flowtable_by_dev_remove(flowtable, dev);
> +		return 0;
> +	}
> +
> +	WARN_ON_ONCE(1);
> +	return 0;
> +}
> +
> +void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
> +				struct net_device *dev,
> +				enum flow_block_command cmd)
> +{
> +	switch (cmd) {
> +	case FLOW_BLOCK_BIND:
> +		nf_flowtable_by_dev_remove(flowtable, dev);
> +		return;
> +	case FLOW_BLOCK_UNBIND:
> +		/* We do not re-bind in case hw offload would report error
> +		 * on *unregister*.
> +		 */
> +		break;
> +	}
> +}
> --
> 2.45.1
>
>
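The lookup semantics described in the commit message (a device may be bound to several flowtables, but the lookup always returns the first one added) can be modelled in userspace. This is a rough sketch only: fixed-size arrays replace the kernel hashtable and RCU lists, there is no locking, and every name in it (dev_map_bind, dev_map_unbind, dev_map_lookup, MAX_DEVS, MAX_FTS) is hypothetical rather than taken from the patch.

```c
#include <stddef.h>

#define MAX_DEVS 8	/* arbitrary capacity for the sketch */
#define MAX_FTS  4	/* arbitrary per-device limit for the sketch */

struct dev_entry {
	const void *dev;		/* key: address of the device */
	const void *fts[MAX_FTS];	/* flowtables, in insertion order */
	size_t nfts;			/* 0 means the slot is free */
};

static struct dev_entry table[MAX_DEVS];

static struct dev_entry *find_entry(const void *dev)
{
	for (size_t i = 0; i < MAX_DEVS; i++)
		if (table[i].nfts && table[i].dev == dev)
			return &table[i];
	return NULL;
}

/* Bind: append ft to the device's list, creating the entry if needed. */
int dev_map_bind(const void *ft, const void *dev)
{
	struct dev_entry *e = find_entry(dev);

	if (!e) {
		for (size_t i = 0; i < MAX_DEVS && !e; i++)
			if (!table[i].nfts)
				e = &table[i];
		if (!e)
			return -1;	/* no free slot */
		e->dev = dev;
	}
	if (e->nfts == MAX_FTS)
		return -1;		/* per-device list is full */
	e->fts[e->nfts++] = ft;
	return 0;
}

/* Unbind: drop every mapping of ft on dev; an empty entry becomes free. */
void dev_map_unbind(const void *ft, const void *dev)
{
	struct dev_entry *e = find_entry(dev);
	size_t kept = 0;

	if (!e)
		return;
	for (size_t i = 0; i < e->nfts; i++)
		if (e->fts[i] != ft)
			e->fts[kept++] = e->fts[i];
	e->nfts = kept;
}

/* Lookup: return the first flowtable bound to dev, or NULL if none. */
const void *dev_map_lookup(const void *dev)
{
	struct dev_entry *e = find_entry(dev);

	return e ? e->fts[0] : NULL;
}
```

In this model, binding two flowtables to the same device and then looking the device up yields the first one; the second only becomes visible once the first is unbound, which mirrors the "first added mapping" behaviour the commit message describes.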
* Re: [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer
2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
` (2 preceding siblings ...)
2024-05-29 13:04 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
@ 2024-06-14 15:19 ` Pablo Neira Ayuso
3 siblings, 0 replies; 10+ messages in thread
From: Pablo Neira Ayuso @ 2024-06-14 15:19 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: bpf, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
netdev, ast, daniel, andrii, martin.lau, eddyz87,
lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor
On Wed, May 29, 2024 at 03:04:29PM +0200, Lorenzo Bianconi wrote:
> Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup of
> a given flowtable entry based on the fib tuple of incoming traffic.
> bpf_xdp_flow_lookup can be used as building block to offload in XDP
> the sw flowtable processing when the hw support is not available.
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Thread overview: 10+ messages
2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
2024-06-13 22:34 ` Lorenzo Bianconi
2024-05-29 13:04 ` [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
2024-05-29 21:53 ` Alexei Starovoitov
2024-05-29 13:04 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
2024-06-13 16:06 ` Daniel Borkmann
2024-06-13 16:54 ` Daniel Xu
2024-06-13 22:11 ` Lorenzo Bianconi
2024-06-14 15:19 ` [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Pablo Neira Ayuso