All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer
@ 2024-05-29 13:04 Lorenzo Bianconi
  2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-05-29 13:04 UTC (permalink / raw)
  To: bpf
  Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup of
a given flowtable entry based on the fib tuple of incoming traffic.
bpf_xdp_flow_lookup can be used as building block to offload in XDP
the sw flowtable processing when the hw support is not available.

This series has been tested running the xdp_flowtable_offload eBPF program
on an ixgbe 10Gbps NIC (eno2) in order to XDP_REDIRECT the TCP traffic to
a veth pair (veth0-veth1) based on the content of the nf_flowtable as soon
as the TCP connection is in the established state:

[tcp client] (eno1) == LAN == (eno2) xdp_flowtable_offload [XDP_REDIRECT] --> veth0 == veth1 [tcp server]

table inet filter {
	flowtable ft {
		hook ingress priority filter
		devices = { eno2, veth0 }
	}
	chain forward {
		type filter hook forward priority filter
		meta l4proto { tcp, udp } flow add @ft
	}
}

-  sw flowtable [1 TCP stream, T = 300s]: ~ 6.2 Gbps
- xdp flowtable [1 TCP stream, T = 300s]: ~ 7.6 Gbps

-  sw flowtable [3 TCP stream, T = 300s]: ~ 7.7 Gbps
- xdp flowtable [3 TCP stream, T = 300s]: ~ 8.8 Gbps

Changes since v3:
- move flowtable map utilities in nf_flow_table_xdp.c
Changes since v2:
- introduce bpf_flowtable_opts struct in bpf_xdp_flow_lookup signature
- get rid of xdp_flowtable_offload bpf sample
- get rid of test_xdp_flowtable.sh for selftest and rely on prog_tests instead
- rename bpf_xdp_flow_offload_lookup in bpf_xdp_flow_lookup
Changes since v1:
- return NULL in bpf_xdp_flow_offload_lookup kfunc in case of error
- take into account kfunc registration possible failures
Changes since RFC:
- fix compilation error if BTF is not enabled

Florian Westphal (1):
  netfilter: nf_tables: add flowtable map for xdp offload

Lorenzo Bianconi (2):
  netfilter: add bpf_xdp_flow_lookup kfunc
  selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc

 include/net/netfilter/nf_flow_table.h         |  18 ++
 net/netfilter/Makefile                        |   7 +-
 net/netfilter/nf_flow_table_bpf.c             | 117 ++++++++++++
 net/netfilter/nf_flow_table_inet.c            |   2 +-
 net/netfilter/nf_flow_table_offload.c         |   6 +-
 net/netfilter/nf_flow_table_xdp.c             | 163 +++++++++++++++++
 tools/testing/selftests/bpf/config            |  13 ++
 .../selftests/bpf/prog_tests/xdp_flowtable.c  | 168 ++++++++++++++++++
 .../selftests/bpf/progs/xdp_flowtable.c       | 145 +++++++++++++++
 9 files changed, 635 insertions(+), 4 deletions(-)
 create mode 100644 net/netfilter/nf_flow_table_bpf.c
 create mode 100644 net/netfilter/nf_flow_table_xdp.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_flowtable.c

-- 
2.45.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload
  2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
@ 2024-05-29 13:04 ` Lorenzo Bianconi
  2024-06-13 22:34   ` Lorenzo Bianconi
  2024-05-29 13:04 ` [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-05-29 13:04 UTC (permalink / raw)
  To: bpf
  Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

From: Florian Westphal <fw@strlen.de>

This adds a small internal mapping table so that a new bpf (xdp) kfunc
can perform lookups in a flowtable.

As-is, xdp program has access to the device pointer, but no way to do a
lookup in a flowtable -- there is no way to obtain the needed struct
without questionable stunts.

This allows to obtain an nf_flowtable pointer given a net_device
structure.

In order to keep backward compatibility, the infrastructure allows the
user to add a given device to multiple flowtables, but it will always
return the first added mapping performing the lookup since it assumes
the right configuration is 1:1 mapping between flowtables and net_devices.

Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/netfilter/nf_flow_table.h |   8 ++
 net/netfilter/Makefile                |   2 +-
 net/netfilter/nf_flow_table_offload.c |   6 +-
 net/netfilter/nf_flow_table_xdp.c     | 163 ++++++++++++++++++++++++++
 4 files changed, 176 insertions(+), 3 deletions(-)
 create mode 100644 net/netfilter/nf_flow_table_xdp.c

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 9abb7ee40d72f..688e02b287cc4 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -305,6 +305,14 @@ struct flow_ports {
 	__be16 source, dest;
 };
 
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev);
+int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
+			      struct net_device *dev,
+			      enum flow_block_command cmd);
+void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
+				struct net_device *dev,
+				enum flow_block_command cmd);
+
 unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 				     const struct nf_hook_state *state);
 unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 614815a3ed738..18046872a38aa 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -142,7 +142,7 @@ obj-$(CONFIG_NFT_FWD_NETDEV)	+= nft_fwd_netdev.o
 # flow table infrastructure
 obj-$(CONFIG_NF_FLOW_TABLE)	+= nf_flow_table.o
 nf_flow_table-objs		:= nf_flow_table_core.o nf_flow_table_ip.o \
-				   nf_flow_table_offload.o
+				   nf_flow_table_offload.o nf_flow_table_xdp.o
 nf_flow_table-$(CONFIG_NF_FLOW_TABLE_PROCFS) += nf_flow_table_procfs.o
 
 obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index a010b25076ca0..d9b019c98694b 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -1192,7 +1192,7 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
 	int err;
 
 	if (!nf_flowtable_hw_offload(flowtable))
-		return 0;
+		return nf_flow_offload_xdp_setup(flowtable, dev, cmd);
 
 	if (dev->netdev_ops->ndo_setup_tc)
 		err = nf_flow_table_offload_cmd(&bo, flowtable, dev, cmd,
@@ -1200,8 +1200,10 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
 	else
 		err = nf_flow_table_indr_offload_cmd(&bo, flowtable, dev, cmd,
 						     &extack);
-	if (err < 0)
+	if (err < 0) {
+		nf_flow_offload_xdp_cancel(flowtable, dev, cmd);
 		return err;
+	}
 
 	return nf_flow_table_block_setup(flowtable, &bo, cmd);
 }
diff --git a/net/netfilter/nf_flow_table_xdp.c b/net/netfilter/nf_flow_table_xdp.c
new file mode 100644
index 0000000000000..b9bdf27ba9bd3
--- /dev/null
+++ b/net/netfilter/nf_flow_table_xdp.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netfilter.h>
+#include <linux/rhashtable.h>
+#include <linux/netdevice.h>
+#include <net/flow_offload.h>
+#include <net/netfilter/nf_flow_table.h>
+
+struct flow_offload_xdp_ft {
+	struct list_head head;
+	struct nf_flowtable *ft;
+	struct rcu_head rcuhead;
+};
+
+struct flow_offload_xdp {
+	struct hlist_node hnode;
+	unsigned long net_device_addr;
+	struct list_head head;
+};
+
+#define NF_XDP_HT_BITS	4
+static DEFINE_HASHTABLE(nf_xdp_hashtable, NF_XDP_HT_BITS);
+static DEFINE_MUTEX(nf_xdp_hashtable_lock);
+
+/* caller must hold rcu read lock */
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev)
+{
+	unsigned long key = (unsigned long)dev;
+	struct flow_offload_xdp *iter;
+
+	hash_for_each_possible_rcu(nf_xdp_hashtable, iter, hnode, key) {
+		if (key == iter->net_device_addr) {
+			struct flow_offload_xdp_ft *ft_elem;
+
+			/* The user is supposed to insert a given net_device
+			 * just into a single nf_flowtable so we always return
+			 * the first element here.
+			 */
+			ft_elem = list_first_or_null_rcu(&iter->head,
+							 struct flow_offload_xdp_ft,
+							 head);
+			return ft_elem ? ft_elem->ft : NULL;
+		}
+	}
+
+	return NULL;
+}
+
+static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
+				      const struct net_device *dev)
+{
+	struct flow_offload_xdp *iter, *elem = NULL;
+	unsigned long key = (unsigned long)dev;
+	struct flow_offload_xdp_ft *ft_elem;
+
+	ft_elem = kzalloc(sizeof(*ft_elem), GFP_KERNEL_ACCOUNT);
+	if (!ft_elem)
+		return -ENOMEM;
+
+	ft_elem->ft = ft;
+
+	mutex_lock(&nf_xdp_hashtable_lock);
+
+	hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
+		if (key == iter->net_device_addr) {
+			elem = iter;
+			break;
+		}
+	}
+
+	if (!elem) {
+		elem = kzalloc(sizeof(*elem), GFP_KERNEL_ACCOUNT);
+		if (!elem)
+			goto err_unlock;
+
+		elem->net_device_addr = key;
+		INIT_LIST_HEAD(&elem->head);
+		hash_add_rcu(nf_xdp_hashtable, &elem->hnode, key);
+	}
+	list_add_tail_rcu(&ft_elem->head, &elem->head);
+
+	mutex_unlock(&nf_xdp_hashtable_lock);
+
+	return 0;
+
+err_unlock:
+	mutex_unlock(&nf_xdp_hashtable_lock);
+	kfree(ft_elem);
+
+	return -ENOMEM;
+}
+
+static void nf_flowtable_by_dev_remove(struct nf_flowtable *ft,
+				       const struct net_device *dev)
+{
+	struct flow_offload_xdp *iter, *elem = NULL;
+	unsigned long key = (unsigned long)dev;
+
+	mutex_lock(&nf_xdp_hashtable_lock);
+
+	hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
+		if (key == iter->net_device_addr) {
+			elem = iter;
+			break;
+		}
+	}
+
+	if (elem) {
+		struct flow_offload_xdp_ft *ft_elem, *ft_next;
+
+		list_for_each_entry_safe(ft_elem, ft_next, &elem->head, head) {
+			if (ft_elem->ft == ft) {
+				list_del_rcu(&ft_elem->head);
+				kfree_rcu(ft_elem, rcuhead);
+			}
+		}
+
+		if (list_empty(&elem->head))
+			hash_del_rcu(&elem->hnode);
+		else
+			elem = NULL;
+	}
+
+	mutex_unlock(&nf_xdp_hashtable_lock);
+
+	if (elem) {
+		synchronize_rcu();
+		kfree(elem);
+	}
+}
+
+int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
+			      struct net_device *dev,
+			      enum flow_block_command cmd)
+{
+	switch (cmd) {
+	case FLOW_BLOCK_BIND:
+		return nf_flowtable_by_dev_insert(flowtable, dev);
+	case FLOW_BLOCK_UNBIND:
+		nf_flowtable_by_dev_remove(flowtable, dev);
+		return 0;
+	}
+
+	WARN_ON_ONCE(1);
+	return 0;
+}
+
+void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
+				struct net_device *dev,
+				enum flow_block_command cmd)
+{
+	switch (cmd) {
+	case FLOW_BLOCK_BIND:
+		nf_flowtable_by_dev_remove(flowtable, dev);
+		return;
+	case FLOW_BLOCK_UNBIND:
+		/* We do not re-bind in case hw offload would report error
+		 * on *unregister*.
+		 */
+		break;
+	}
+}
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc
  2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
  2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
@ 2024-05-29 13:04 ` Lorenzo Bianconi
  2024-05-29 21:53   ` Alexei Starovoitov
  2024-05-29 13:04 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
  2024-06-14 15:19 ` [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Pablo Neira Ayuso
  3 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-05-29 13:04 UTC (permalink / raw)
  To: bpf
  Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup
of a given flowtable entry based on a fib tuple of incoming traffic.
bpf_xdp_flow_lookup can be used as building block to offload in xdp
the processing of sw flowtable when hw flowtable is not available.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/netfilter/nf_flow_table.h |  10 +++
 net/netfilter/Makefile                |   5 ++
 net/netfilter/nf_flow_table_bpf.c     | 117 ++++++++++++++++++++++++++
 net/netfilter/nf_flow_table_inet.c    |   2 +-
 4 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 net/netfilter/nf_flow_table_bpf.c

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 688e02b287cc4..cc52234ef71af 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -318,6 +318,16 @@ unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 				       const struct nf_hook_state *state);
 
+#if (IS_BUILTIN(CONFIG_NF_FLOW_TABLE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
+    (IS_MODULE(CONFIG_NF_FLOW_TABLE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
+extern int nf_flow_register_bpf(void);
+#else
+static inline int nf_flow_register_bpf(void)
+{
+	return 0;
+}
+#endif
+
 #define MODULE_ALIAS_NF_FLOWTABLE(family)	\
 	MODULE_ALIAS("nf-flowtable-" __stringify(family))
 
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 18046872a38aa..f0aa4d7ef4998 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -144,6 +144,11 @@ obj-$(CONFIG_NF_FLOW_TABLE)	+= nf_flow_table.o
 nf_flow_table-objs		:= nf_flow_table_core.o nf_flow_table_ip.o \
 				   nf_flow_table_offload.o nf_flow_table_xdp.o
 nf_flow_table-$(CONFIG_NF_FLOW_TABLE_PROCFS) += nf_flow_table_procfs.o
+ifeq ($(CONFIG_NF_FLOW_TABLE),m)
+nf_flow_table-$(CONFIG_DEBUG_INFO_BTF_MODULES) += nf_flow_table_bpf.o
+else ifeq ($(CONFIG_NF_FLOW_TABLE),y)
+nf_flow_table-$(CONFIG_DEBUG_INFO_BTF) += nf_flow_table_bpf.o
+endif
 
 obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
 
diff --git a/net/netfilter/nf_flow_table_bpf.c b/net/netfilter/nf_flow_table_bpf.c
new file mode 100644
index 0000000000000..b3f8dffe62535
--- /dev/null
+++ b/net/netfilter/nf_flow_table_bpf.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Unstable Flow Table Helpers for XDP hook
+ *
+ * These are called from the XDP programs.
+ * Note that it is allowed to break compatibility for these functions since
+ * the interface they are exposed through to BPF programs is explicitly
+ * unstable.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <net/netfilter/nf_flow_table.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <net/xdp.h>
+
+/* bpf_flowtable_opts - options for bpf flowtable helpers
+ * @error: out parameter, set for any encountered error
+ */
+struct bpf_flowtable_opts {
+	s32 error;
+};
+
+enum {
+	NF_BPF_FLOWTABLE_OPTS_SZ = 4,
+};
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in nf_flow_table BTF");
+
+static struct flow_offload_tuple_rhash *
+bpf_xdp_flow_tuple_lookup(struct net_device *dev,
+			  struct flow_offload_tuple *tuple, __be16 proto)
+{
+	struct flow_offload_tuple_rhash *tuplehash;
+	struct nf_flowtable *nf_flow_table;
+	struct flow_offload *nf_flow;
+
+	nf_flow_table = nf_flowtable_by_dev(dev);
+	if (!nf_flow_table)
+		return ERR_PTR(-ENOENT);
+
+	tuplehash = flow_offload_lookup(nf_flow_table, tuple);
+	if (!tuplehash)
+		return ERR_PTR(-ENOENT);
+
+	nf_flow = container_of(tuplehash, struct flow_offload,
+			       tuplehash[tuplehash->tuple.dir]);
+	flow_offload_refresh(nf_flow_table, nf_flow, false);
+
+	return tuplehash;
+}
+
+__bpf_kfunc struct flow_offload_tuple_rhash *
+bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple,
+		    struct bpf_flowtable_opts *opts, u32 opts_len)
+{
+	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
+	struct flow_offload_tuple tuple = {
+		.iifidx = fib_tuple->ifindex,
+		.l3proto = fib_tuple->family,
+		.l4proto = fib_tuple->l4_protocol,
+		.src_port = fib_tuple->sport,
+		.dst_port = fib_tuple->dport,
+	};
+	struct flow_offload_tuple_rhash *tuplehash;
+	__be16 proto;
+
+	if (opts_len != NF_BPF_FLOWTABLE_OPTS_SZ) {
+		opts->error = -EINVAL;
+		return NULL;
+	}
+
+	switch (fib_tuple->family) {
+	case AF_INET:
+		tuple.src_v4.s_addr = fib_tuple->ipv4_src;
+		tuple.dst_v4.s_addr = fib_tuple->ipv4_dst;
+		proto = htons(ETH_P_IP);
+		break;
+	case AF_INET6:
+		tuple.src_v6 = *(struct in6_addr *)&fib_tuple->ipv6_src;
+		tuple.dst_v6 = *(struct in6_addr *)&fib_tuple->ipv6_dst;
+		proto = htons(ETH_P_IPV6);
+		break;
+	default:
+		opts->error = -EAFNOSUPPORT;
+		return NULL;
+	}
+
+	tuplehash = bpf_xdp_flow_tuple_lookup(xdp->rxq->dev, &tuple, proto);
+	if (IS_ERR(tuplehash)) {
+		opts->error = PTR_ERR(tuplehash);
+		return NULL;
+	}
+
+	return tuplehash;
+}
+
+__diag_pop()
+
+BTF_KFUNCS_START(nf_ft_kfunc_set)
+BTF_ID_FLAGS(func, bpf_xdp_flow_lookup, KF_TRUSTED_ARGS | KF_RET_NULL)
+BTF_KFUNCS_END(nf_ft_kfunc_set)
+
+static const struct btf_kfunc_id_set nf_flow_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &nf_ft_kfunc_set,
+};
+
+int nf_flow_register_bpf(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
+					 &nf_flow_kfunc_set);
+}
+EXPORT_SYMBOL_GPL(nf_flow_register_bpf);
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index 6eef15648b7b0..88787b45e30d6 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -98,7 +98,7 @@ static int __init nf_flow_inet_module_init(void)
 	nft_register_flowtable_type(&flowtable_ipv6);
 	nft_register_flowtable_type(&flowtable_inet);
 
-	return 0;
+	return nf_flow_register_bpf();
 }
 
 static void __exit nf_flow_inet_module_exit(void)
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc
  2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
  2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
  2024-05-29 13:04 ` [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
@ 2024-05-29 13:04 ` Lorenzo Bianconi
  2024-06-13 16:06   ` Daniel Borkmann
  2024-06-14 15:19 ` [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Pablo Neira Ayuso
  3 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-05-29 13:04 UTC (permalink / raw)
  To: bpf
  Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

Introduce e2e selftest for bpf_xdp_flow_lookup kfunc through
xdp_flowtable utility.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 tools/testing/selftests/bpf/config            |  13 ++
 .../selftests/bpf/prog_tests/xdp_flowtable.c  | 168 ++++++++++++++++++
 .../selftests/bpf/progs/xdp_flowtable.c       | 145 +++++++++++++++
 3 files changed, 326 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_flowtable.c

diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 2fb16da78dce8..5291e97df7494 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -83,6 +83,19 @@ CONFIG_NF_CONNTRACK_MARK=y
 CONFIG_NF_CONNTRACK_ZONES=y
 CONFIG_NF_DEFRAG_IPV4=y
 CONFIG_NF_DEFRAG_IPV6=y
+CONFIG_NF_TABLES=y
+CONFIG_NF_TABLES_INET=y
+CONFIG_NF_TABLES_NETDEV=y
+CONFIG_NF_TABLES_IPV4=y
+CONFIG_NF_TABLES_IPV6=y
+CONFIG_NETFILTER_INGRESS=y
+CONFIG_NF_FLOW_TABLE=y
+CONFIG_NF_FLOW_TABLE_INET=y
+CONFIG_NETFILTER_NETLINK=y
+CONFIG_NFT_FLOW_OFFLOAD=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_FILTER=y
 CONFIG_NF_NAT=y
 CONFIG_RC_CORE=y
 CONFIG_SECURITY=y
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c b/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
new file mode 100644
index 0000000000000..e1bf141d34015
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <bpf/btf.h>
+#include <linux/if_link.h>
+#include <linux/udp.h>
+#include <net/if.h>
+#include <unistd.h>
+
+#include "xdp_flowtable.skel.h"
+
+#define TX_NETNS_NAME	"ns0"
+#define RX_NETNS_NAME	"ns1"
+
+#define TX_NAME		"v0"
+#define FORWARD_NAME	"v1"
+#define RX_NAME		"d0"
+
+#define TX_MAC		"00:00:00:00:00:01"
+#define FORWARD_MAC	"00:00:00:00:00:02"
+#define RX_MAC		"00:00:00:00:00:03"
+#define DST_MAC		"00:00:00:00:00:04"
+
+#define TX_ADDR		"10.0.0.1"
+#define FORWARD_ADDR	"10.0.0.2"
+#define RX_ADDR		"20.0.0.1"
+#define DST_ADDR	"20.0.0.2"
+
+#define PREFIX_LEN	"8"
+#define N_PACKETS	10
+#define UDP_PORT	12345
+#define UDP_PORT_STR	"12345"
+
+static int send_udp_traffic(void)
+{
+	struct sockaddr_storage addr;
+	int i, sock;
+
+	if (make_sockaddr(AF_INET, DST_ADDR, UDP_PORT, &addr, NULL))
+		return -EINVAL;
+
+	sock = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sock < 0)
+		return sock;
+
+	for (i = 0; i < N_PACKETS; i++) {
+		unsigned char buf[] = { 0xaa, 0xbb, 0xcc };
+		int n;
+
+		n = sendto(sock, buf, sizeof(buf), MSG_NOSIGNAL | MSG_CONFIRM,
+			   (struct sockaddr *)&addr, sizeof(addr));
+		if (n != sizeof(buf)) {
+			close(sock);
+			return -EINVAL;
+		}
+
+		usleep(50000); /* 50ms */
+	}
+	close(sock);
+
+	return 0;
+}
+
+void test_xdp_flowtable(void)
+{
+	struct xdp_flowtable *skel = NULL;
+	struct nstoken *tok = NULL;
+	int iifindex, stats_fd;
+	__u32 value, key = 0;
+	struct bpf_link *link;
+
+	if (SYS_NOFAIL("nft -v")) {
+		fprintf(stdout, "Missing required nft tool\n");
+		test__skip();
+		return;
+	}
+
+	SYS(out, "ip netns add " TX_NETNS_NAME);
+	SYS(out, "ip netns add " RX_NETNS_NAME);
+
+	tok = open_netns(RX_NETNS_NAME);
+	if (!ASSERT_OK_PTR(tok, "setns"))
+		goto out;
+
+	SYS(out, "sysctl -qw net.ipv4.conf.all.forwarding=1");
+
+	SYS(out, "ip link add " TX_NAME " type veth peer " FORWARD_NAME);
+	SYS(out, "ip link set " TX_NAME " netns " TX_NETNS_NAME);
+	SYS(out, "ip link set dev " FORWARD_NAME " address " FORWARD_MAC);
+	SYS(out,
+	    "ip addr add " FORWARD_ADDR "/" PREFIX_LEN " dev " FORWARD_NAME);
+	SYS(out, "ip link set dev " FORWARD_NAME " up");
+
+	SYS(out, "ip link add " RX_NAME " type dummy");
+	SYS(out, "ip link set dev " RX_NAME " address " RX_MAC);
+	SYS(out, "ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
+	SYS(out, "ip link set dev " RX_NAME " up");
+
+	/* configure the flowtable */
+	SYS(out, "nft add table ip filter");
+	SYS(out,
+	    "nft add flowtable ip filter f { hook ingress priority 0\\; "
+	    "devices = { " FORWARD_NAME ", " RX_NAME " }\\; }");
+	SYS(out,
+	    "nft add chain ip filter forward "
+	    "{ type filter hook forward priority 0\\; }");
+	SYS(out,
+	    "nft add rule ip filter forward ip protocol udp th dport "
+	    UDP_PORT_STR " flow add @f");
+
+	/* Avoid ARP calls */
+	SYS(out,
+	    "ip -4 neigh add " DST_ADDR " lladdr " DST_MAC " dev " RX_NAME);
+
+	close_netns(tok);
+	tok = open_netns(TX_NETNS_NAME);
+	if (!ASSERT_OK_PTR(tok, "setns"))
+		goto out;
+
+	SYS(out, "ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
+	SYS(out, "ip link set dev " TX_NAME " address " TX_MAC);
+	SYS(out, "ip link set dev " TX_NAME " up");
+	SYS(out, "ip route add default via " FORWARD_ADDR);
+
+	close_netns(tok);
+	tok = open_netns(RX_NETNS_NAME);
+	if (!ASSERT_OK_PTR(tok, "setns"))
+		goto out;
+
+	iifindex = if_nametoindex(FORWARD_NAME);
+	if (!ASSERT_NEQ(iifindex, 0, "iifindex"))
+		goto out;
+
+	skel = xdp_flowtable__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel"))
+		goto out;
+
+	link = bpf_program__attach_xdp(skel->progs.xdp_flowtable_do_lookup,
+				       iifindex);
+	if (!ASSERT_OK_PTR(link, "prog_attach"))
+		goto out;
+
+	close_netns(tok);
+	tok = open_netns(TX_NETNS_NAME);
+	if (!ASSERT_OK_PTR(tok, "setns"))
+		goto out;
+
+	if (!ASSERT_OK(send_udp_traffic(), "send udp"))
+		goto out;
+
+	close_netns(tok);
+	tok = open_netns(RX_NETNS_NAME);
+	if (!ASSERT_OK_PTR(tok, "setns"))
+		goto out;
+
+	stats_fd = bpf_map__fd(skel->maps.stats);
+	if (!ASSERT_OK(bpf_map_lookup_elem(stats_fd, &key, &value),
+		       "bpf_map_update_elem stats"))
+		goto out;
+
+	ASSERT_GE(value, N_PACKETS - 2, "bpf_xdp_flow_lookup failed");
+out:
+	xdp_flowtable__destroy(skel);
+	if (tok)
+		close_netns(tok);
+	SYS_NOFAIL("ip netns del " TX_NETNS_NAME);
+	SYS_NOFAIL("ip netns del " RX_NETNS_NAME);
+}
diff --git a/tools/testing/selftests/bpf/progs/xdp_flowtable.c b/tools/testing/selftests/bpf/progs/xdp_flowtable.c
new file mode 100644
index 0000000000000..fb7f6fac57459
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_flowtable.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#define MAX_ERRNO	4095
+
+#define ETH_P_IP	0x0800
+#define ETH_P_IPV6	0x86dd
+#define IP_MF		0x2000	/* "More Fragments" */
+#define IP_OFFSET	0x1fff	/* "Fragment Offset" */
+#define AF_INET		2
+#define AF_INET6	10
+
+struct bpf_flowtable_opts___local {
+	s32 error;
+};
+
+struct flow_offload_tuple_rhash *
+bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
+		    struct bpf_flowtable_opts___local *, u32) __ksym;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, __u32);
+	__type(value, __u32);
+	__uint(max_entries, 1);
+} stats SEC(".maps");
+
+static bool xdp_flowtable_offload_check_iphdr(struct iphdr *iph)
+{
+	/* ip fragmented traffic */
+	if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET))
+		return false;
+
+	/* ip options */
+	if (iph->ihl * 4 != sizeof(*iph))
+		return false;
+
+	if (iph->ttl <= 1)
+		return false;
+
+	return true;
+}
+
+static bool xdp_flowtable_offload_check_tcp_state(void *ports, void *data_end,
+						  u8 proto)
+{
+	if (proto == IPPROTO_TCP) {
+		struct tcphdr *tcph = ports;
+
+		if (tcph + 1 > data_end)
+			return false;
+
+		if (tcph->fin || tcph->rst)
+			return false;
+	}
+
+	return true;
+}
+
+SEC("xdp.frags")
+int xdp_flowtable_do_lookup(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	struct bpf_flowtable_opts___local opts = {};
+	struct flow_offload_tuple_rhash *tuplehash;
+	struct bpf_fib_lookup tuple = {
+		.ifindex = ctx->ingress_ifindex,
+	};
+	void *data = (void *)(long)ctx->data;
+	struct ethhdr *eth = data;
+	struct flow_ports *ports;
+	__u32 *val, key = 0;
+
+	if (eth + 1 > data_end)
+		return XDP_DROP;
+
+	switch (eth->h_proto) {
+	case bpf_htons(ETH_P_IP): {
+		struct iphdr *iph = data + sizeof(*eth);
+
+		ports = (struct flow_ports *)(iph + 1);
+		if (ports + 1 > data_end)
+			return XDP_PASS;
+
+		/* sanity check on ip header */
+		if (!xdp_flowtable_offload_check_iphdr(iph))
+			return XDP_PASS;
+
+		if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
+							   iph->protocol))
+			return XDP_PASS;
+
+		tuple.family		= AF_INET;
+		tuple.tos		= iph->tos;
+		tuple.l4_protocol	= iph->protocol;
+		tuple.tot_len		= bpf_ntohs(iph->tot_len);
+		tuple.ipv4_src		= iph->saddr;
+		tuple.ipv4_dst		= iph->daddr;
+		tuple.sport		= ports->source;
+		tuple.dport		= ports->dest;
+		break;
+	}
+	case bpf_htons(ETH_P_IPV6): {
+		struct in6_addr *src = (struct in6_addr *)tuple.ipv6_src;
+		struct in6_addr *dst = (struct in6_addr *)tuple.ipv6_dst;
+		struct ipv6hdr *ip6h = data + sizeof(*eth);
+
+		ports = (struct flow_ports *)(ip6h + 1);
+		if (ports + 1 > data_end)
+			return XDP_PASS;
+
+		if (ip6h->hop_limit <= 1)
+			return XDP_PASS;
+
+		if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
+							   ip6h->nexthdr))
+			return XDP_PASS;
+
+		tuple.family		= AF_INET6;
+		tuple.l4_protocol	= ip6h->nexthdr;
+		tuple.tot_len		= bpf_ntohs(ip6h->payload_len);
+		*src			= ip6h->saddr;
+		*dst			= ip6h->daddr;
+		tuple.sport		= ports->source;
+		tuple.dport		= ports->dest;
+		break;
+	}
+	default:
+		return XDP_PASS;
+	}
+
+	tuplehash = bpf_xdp_flow_lookup(ctx, &tuple, &opts, sizeof(opts));
+	if (!tuplehash)
+		return XDP_PASS;
+
+	val = bpf_map_lookup_elem(&stats, &key);
+	if (val)
+		__sync_add_and_fetch(val, 1);
+
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc
  2024-05-29 13:04 ` [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
@ 2024-05-29 21:53   ` Alexei Starovoitov
  0 siblings, 0 replies; 10+ messages in thread
From: Alexei Starovoitov @ 2024-05-29 21:53 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, Pablo Neira Ayuso, Jozsef Kadlecsik, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netfilter-devel,
	Network Development, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Lorenzo Bianconi,
	Toke Høiland-Jørgensen, Florian Westphal,
	Jesper Dangaard Brouer, Simon Horman, donhunte,
	Kumar Kartikeya Dwivedi

On Wed, May 29, 2024 at 6:04 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup
> of a given flowtable entry based on a fib tuple of incoming traffic.
> bpf_xdp_flow_lookup can be used as building block to offload in xdp
> the processing of sw flowtable when hw flowtable is not available.
>
> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>

lgtm
Waiting for the Ack from netfilter folks...

So we can land it through bpf-next and pass it to net-next
a week or so later.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc
  2024-05-29 13:04 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
@ 2024-06-13 16:06   ` Daniel Borkmann
  2024-06-13 16:54     ` Daniel Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Borkmann @ 2024-06-13 16:06 UTC (permalink / raw)
  To: Lorenzo Bianconi, bpf
  Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, andrii, martin.lau, eddyz87, lorenzo.bianconi, toke,
	fw, hawk, horms, donhunte, memxor

On 5/29/24 3:04 PM, Lorenzo Bianconi wrote:
> Introduce e2e selftest for bpf_xdp_flow_lookup kfunc through
> xdp_flowtable utility.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
[...]
> +struct flow_offload_tuple_rhash *
> +bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
> +		    struct bpf_flowtable_opts___local *, u32) __ksym;

Btw, this fails CI build :

https://github.com/kernel-patches/bpf/actions/runs/9499749947/job/26190382116

   [...]
   progs/xdp_flowtable.c:20:1: error: conflicting types for 'bpf_xdp_flow_lookup'
      20 | bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
         | ^
   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:41: note: previous declaration is here
    106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
           |                                         ^
   progs/xdp_flowtable.c:134:47: error: incompatible pointer types passing 'struct bpf_flowtable_opts___local *' to parameter of type 'struct bpf_flowtable_opts *' [-Werror,-Wincompatible-pointer-types]
     134 |         tuplehash = bpf_xdp_flow_lookup(ctx, &tuple, &opts, sizeof(opts));
         |                                                      ^~~~~
   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:142: note: passing argument to parameter 'opts' here
    106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
           |                                                                                                                                              ^
   2 errors generated.
     CLNG-BPF [test_maps] kprobe_multi_override.bpf.o
     CLNG-BPF [test_maps] tailcall_bpf2bpf1.bpf.o
   make: *** [Makefile:654: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/xdp_flowtable.bpf.o] Error 1
   make: *** Waiting for unfinished jobs....
   make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
   Error: Process completed with exit code 2.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc
  2024-06-13 16:06   ` Daniel Borkmann
@ 2024-06-13 16:54     ` Daniel Xu
  2024-06-13 22:11       ` Lorenzo Bianconi
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Xu @ 2024-06-13 16:54 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Lorenzo Bianconi, bpf, pablo, kadlec, davem, edumazet, kuba,
	pabeni, netfilter-devel, netdev, ast, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

On Thu, Jun 13, 2024 at 06:06:29PM GMT, Daniel Borkmann wrote:
> On 5/29/24 3:04 PM, Lorenzo Bianconi wrote:
> > Introduce e2e selftest for bpf_xdp_flow_lookup kfunc through
> > xdp_flowtable utility.
> > 
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> [...]
> > +struct flow_offload_tuple_rhash *
> > +bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
> > +		    struct bpf_flowtable_opts___local *, u32) __ksym;
> 
> Btw, this fails CI build :
> 
> https://github.com/kernel-patches/bpf/actions/runs/9499749947/job/26190382116
> 
>   [...]
>   progs/xdp_flowtable.c:20:1: error: conflicting types for 'bpf_xdp_flow_lookup'
>      20 | bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
>         | ^
>   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:41: note: previous declaration is here
>    106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
>           |                                         ^
>   progs/xdp_flowtable.c:134:47: error: incompatible pointer types passing 'struct bpf_flowtable_opts___local *' to parameter of type 'struct bpf_flowtable_opts *' [-Werror,-Wincompatible-pointer-types]
>     134 |         tuplehash = bpf_xdp_flow_lookup(ctx, &tuple, &opts, sizeof(opts));
>         |                                                      ^~~~~
>   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:142: note: passing argument to parameter 'opts' here
>    106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
>           |                                                                                                                                              ^
>   2 errors generated.
>     CLNG-BPF [test_maps] kprobe_multi_override.bpf.o
>     CLNG-BPF [test_maps] tailcall_bpf2bpf1.bpf.o
>   make: *** [Makefile:654: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/xdp_flowtable.bpf.o] Error 1
>   make: *** Waiting for unfinished jobs....
>   make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
>   Error: Process completed with exit code 2.
> 

We'll probably want to do the same thing as in f709124dd72f ("bpf:
selftests: nf: Opt out of using generated kfunc prototypes").

Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc
  2024-06-13 16:54     ` Daniel Xu
@ 2024-06-13 22:11       ` Lorenzo Bianconi
  0 siblings, 0 replies; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-06-13 22:11 UTC (permalink / raw)
  To: Daniel Xu
  Cc: Daniel Borkmann, bpf, pablo, kadlec, davem, edumazet, kuba,
	pabeni, netfilter-devel, netdev, ast, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

[-- Attachment #1: Type: text/plain, Size: 2763 bytes --]

> On Thu, Jun 13, 2024 at 06:06:29PM GMT, Daniel Borkmann wrote:
> > On 5/29/24 3:04 PM, Lorenzo Bianconi wrote:
> > > Introduce e2e selftest for bpf_xdp_flow_lookup kfunc through
> > > xdp_flowtable utility.
> > > 
> > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > [...]
> > > +struct flow_offload_tuple_rhash *
> > > +bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
> > > +		    struct bpf_flowtable_opts___local *, u32) __ksym;
> > 
> > Btw, this fails CI build :
> > 
> > https://github.com/kernel-patches/bpf/actions/runs/9499749947/job/26190382116
> > 
> >   [...]
> >   progs/xdp_flowtable.c:20:1: error: conflicting types for 'bpf_xdp_flow_lookup'
> >      20 | bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
> >         | ^
> >   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:41: note: previous declaration is here
> >    106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
> >           |                                         ^
> >   progs/xdp_flowtable.c:134:47: error: incompatible pointer types passing 'struct bpf_flowtable_opts___local *' to parameter of type 'struct bpf_flowtable_opts *' [-Werror,-Wincompatible-pointer-types]
> >     134 |         tuplehash = bpf_xdp_flow_lookup(ctx, &tuple, &opts, sizeof(opts));
> >         |                                                      ^~~~~
> >   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:106755:142: note: passing argument to parameter 'opts' here
> >    106755 | extern struct flow_offload_tuple_rhash *bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple, struct bpf_flowtable_opts *opts, u32 opts_len) __weak __ksym;
> >           |                                                                                                                                              ^
> >   2 errors generated.
> >     CLNG-BPF [test_maps] kprobe_multi_override.bpf.o
> >     CLNG-BPF [test_maps] tailcall_bpf2bpf1.bpf.o
> >   make: *** [Makefile:654: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/xdp_flowtable.bpf.o] Error 1
> >   make: *** Waiting for unfinished jobs....
> >   make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
> >   Error: Process completed with exit code 2.
> > 
> 
> We'll probably want to do the same thing as in f709124dd72f ("bpf:
> selftests: nf: Opt out of using generated kfunc prototypes").

ack, I added BPF_NO_KFUNC_PROTOTYPES to selftest patch. CI seems fine now:
https://github.com/kernel-patches/bpf/pull/7202

Regards,
Lorenzo

> 
> Daniel

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload
  2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
@ 2024-06-13 22:34   ` Lorenzo Bianconi
  0 siblings, 0 replies; 10+ messages in thread
From: Lorenzo Bianconi @ 2024-06-13 22:34 UTC (permalink / raw)
  To: pablo
  Cc: bpf, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

[-- Attachment #1: Type: text/plain, Size: 8359 bytes --]

> From: Florian Westphal <fw@strlen.de>
> 
> This adds a small internal mapping table so that a new bpf (xdp) kfunc
> can perform lookups in a flowtable.
> 
> As-is, xdp program has access to the device pointer, but no way to do a
> lookup in a flowtable -- there is no way to obtain the needed struct
> without questionable stunts.
> 
> This allows to obtain an nf_flowtable pointer given a net_device
> structure.
> 
> In order to keep backward compatibility, the infrastructure allows the
> user to add a given device to multiple flowtables, but it will always
> return the first added mapping performing the lookup since it assumes
> the right configuration is 1:1 mapping between flowtables and net_devices.

Hi Pablo,

do you have any feedback about nft part? Thanks.

Regards,
Lorenzo

> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/netfilter/nf_flow_table.h |   8 ++
>  net/netfilter/Makefile                |   2 +-
>  net/netfilter/nf_flow_table_offload.c |   6 +-
>  net/netfilter/nf_flow_table_xdp.c     | 163 ++++++++++++++++++++++++++
>  4 files changed, 176 insertions(+), 3 deletions(-)
>  create mode 100644 net/netfilter/nf_flow_table_xdp.c
> 
> diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
> index 9abb7ee40d72f..688e02b287cc4 100644
> --- a/include/net/netfilter/nf_flow_table.h
> +++ b/include/net/netfilter/nf_flow_table.h
> @@ -305,6 +305,14 @@ struct flow_ports {
>  	__be16 source, dest;
>  };
>  
> +struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev);
> +int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
> +			      struct net_device *dev,
> +			      enum flow_block_command cmd);
> +void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
> +				struct net_device *dev,
> +				enum flow_block_command cmd);
> +
>  unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
>  				     const struct nf_hook_state *state);
>  unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index 614815a3ed738..18046872a38aa 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -142,7 +142,7 @@ obj-$(CONFIG_NFT_FWD_NETDEV)	+= nft_fwd_netdev.o
>  # flow table infrastructure
>  obj-$(CONFIG_NF_FLOW_TABLE)	+= nf_flow_table.o
>  nf_flow_table-objs		:= nf_flow_table_core.o nf_flow_table_ip.o \
> -				   nf_flow_table_offload.o
> +				   nf_flow_table_offload.o nf_flow_table_xdp.o
>  nf_flow_table-$(CONFIG_NF_FLOW_TABLE_PROCFS) += nf_flow_table_procfs.o
>  
>  obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index a010b25076ca0..d9b019c98694b 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -1192,7 +1192,7 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
>  	int err;
>  
>  	if (!nf_flowtable_hw_offload(flowtable))
> -		return 0;
> +		return nf_flow_offload_xdp_setup(flowtable, dev, cmd);
>  
>  	if (dev->netdev_ops->ndo_setup_tc)
>  		err = nf_flow_table_offload_cmd(&bo, flowtable, dev, cmd,
> @@ -1200,8 +1200,10 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
>  	else
>  		err = nf_flow_table_indr_offload_cmd(&bo, flowtable, dev, cmd,
>  						     &extack);
> -	if (err < 0)
> +	if (err < 0) {
> +		nf_flow_offload_xdp_cancel(flowtable, dev, cmd);
>  		return err;
> +	}
>  
>  	return nf_flow_table_block_setup(flowtable, &bo, cmd);
>  }
> diff --git a/net/netfilter/nf_flow_table_xdp.c b/net/netfilter/nf_flow_table_xdp.c
> new file mode 100644
> index 0000000000000..b9bdf27ba9bd3
> --- /dev/null
> +++ b/net/netfilter/nf_flow_table_xdp.c
> @@ -0,0 +1,163 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/netfilter.h>
> +#include <linux/rhashtable.h>
> +#include <linux/netdevice.h>
> +#include <net/flow_offload.h>
> +#include <net/netfilter/nf_flow_table.h>
> +
> +struct flow_offload_xdp_ft {
> +	struct list_head head;
> +	struct nf_flowtable *ft;
> +	struct rcu_head rcuhead;
> +};
> +
> +struct flow_offload_xdp {
> +	struct hlist_node hnode;
> +	unsigned long net_device_addr;
> +	struct list_head head;
> +};
> +
> +#define NF_XDP_HT_BITS	4
> +static DEFINE_HASHTABLE(nf_xdp_hashtable, NF_XDP_HT_BITS);
> +static DEFINE_MUTEX(nf_xdp_hashtable_lock);
> +
> +/* caller must hold rcu read lock */
> +struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev)
> +{
> +	unsigned long key = (unsigned long)dev;
> +	struct flow_offload_xdp *iter;
> +
> +	hash_for_each_possible_rcu(nf_xdp_hashtable, iter, hnode, key) {
> +		if (key == iter->net_device_addr) {
> +			struct flow_offload_xdp_ft *ft_elem;
> +
> +			/* The user is supposed to insert a given net_device
> +			 * just into a single nf_flowtable so we always return
> +			 * the first element here.
> +			 */
> +			ft_elem = list_first_or_null_rcu(&iter->head,
> +							 struct flow_offload_xdp_ft,
> +							 head);
> +			return ft_elem ? ft_elem->ft : NULL;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
> +				      const struct net_device *dev)
> +{
> +	struct flow_offload_xdp *iter, *elem = NULL;
> +	unsigned long key = (unsigned long)dev;
> +	struct flow_offload_xdp_ft *ft_elem;
> +
> +	ft_elem = kzalloc(sizeof(*ft_elem), GFP_KERNEL_ACCOUNT);
> +	if (!ft_elem)
> +		return -ENOMEM;
> +
> +	ft_elem->ft = ft;
> +
> +	mutex_lock(&nf_xdp_hashtable_lock);
> +
> +	hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
> +		if (key == iter->net_device_addr) {
> +			elem = iter;
> +			break;
> +		}
> +	}
> +
> +	if (!elem) {
> +		elem = kzalloc(sizeof(*elem), GFP_KERNEL_ACCOUNT);
> +		if (!elem)
> +			goto err_unlock;
> +
> +		elem->net_device_addr = key;
> +		INIT_LIST_HEAD(&elem->head);
> +		hash_add_rcu(nf_xdp_hashtable, &elem->hnode, key);
> +	}
> +	list_add_tail_rcu(&ft_elem->head, &elem->head);
> +
> +	mutex_unlock(&nf_xdp_hashtable_lock);
> +
> +	return 0;
> +
> +err_unlock:
> +	mutex_unlock(&nf_xdp_hashtable_lock);
> +	kfree(ft_elem);
> +
> +	return -ENOMEM;
> +}
> +
> +static void nf_flowtable_by_dev_remove(struct nf_flowtable *ft,
> +				       const struct net_device *dev)
> +{
> +	struct flow_offload_xdp *iter, *elem = NULL;
> +	unsigned long key = (unsigned long)dev;
> +
> +	mutex_lock(&nf_xdp_hashtable_lock);
> +
> +	hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
> +		if (key == iter->net_device_addr) {
> +			elem = iter;
> +			break;
> +		}
> +	}
> +
> +	if (elem) {
> +		struct flow_offload_xdp_ft *ft_elem, *ft_next;
> +
> +		list_for_each_entry_safe(ft_elem, ft_next, &elem->head, head) {
> +			if (ft_elem->ft == ft) {
> +				list_del_rcu(&ft_elem->head);
> +				kfree_rcu(ft_elem, rcuhead);
> +			}
> +		}
> +
> +		if (list_empty(&elem->head))
> +			hash_del_rcu(&elem->hnode);
> +		else
> +			elem = NULL;
> +	}
> +
> +	mutex_unlock(&nf_xdp_hashtable_lock);
> +
> +	if (elem) {
> +		synchronize_rcu();
> +		kfree(elem);
> +	}
> +}
> +
> +int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
> +			      struct net_device *dev,
> +			      enum flow_block_command cmd)
> +{
> +	switch (cmd) {
> +	case FLOW_BLOCK_BIND:
> +		return nf_flowtable_by_dev_insert(flowtable, dev);
> +	case FLOW_BLOCK_UNBIND:
> +		nf_flowtable_by_dev_remove(flowtable, dev);
> +		return 0;
> +	}
> +
> +	WARN_ON_ONCE(1);
> +	return 0;
> +}
> +
> +void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
> +				struct net_device *dev,
> +				enum flow_block_command cmd)
> +{
> +	switch (cmd) {
> +	case FLOW_BLOCK_BIND:
> +		nf_flowtable_by_dev_remove(flowtable, dev);
> +		return;
> +	case FLOW_BLOCK_UNBIND:
> +		/* We do not re-bind in case hw offload would report error
> +		 * on *unregister*.
> +		 */
> +		break;
> +	}
> +}
> -- 
> 2.45.1
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer
  2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
                   ` (2 preceding siblings ...)
  2024-05-29 13:04 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
@ 2024-06-14 15:19 ` Pablo Neira Ayuso
  3 siblings, 0 replies; 10+ messages in thread
From: Pablo Neira Ayuso @ 2024-06-14 15:19 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

On Wed, May 29, 2024 at 03:04:29PM +0200, Lorenzo Bianconi wrote:
> Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup of
> a given flowtable entry based on the fib tuple of incoming traffic.
> bpf_xdp_flow_lookup can be used as building block to offload in XDP
> the sw flowtable processing when the hw support is not available.

Akced-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-06-14 15:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-29 13:04 [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
2024-05-29 13:04 ` [PATCH v4 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
2024-06-13 22:34   ` Lorenzo Bianconi
2024-05-29 13:04 ` [PATCH v4 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
2024-05-29 21:53   ` Alexei Starovoitov
2024-05-29 13:04 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
2024-06-13 16:06   ` Daniel Borkmann
2024-06-13 16:54     ` Daniel Xu
2024-06-13 22:11       ` Lorenzo Bianconi
2024-06-14 15:19 ` [PATCH v4 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.