netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer
@ 2024-05-23 12:06 Lorenzo Bianconi
  2024-05-23 12:06 ` [PATCH v3 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Lorenzo Bianconi @ 2024-05-23 12:06 UTC (permalink / raw)
  To: bpf
  Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup of
a given flowtable entry based on the fib tuple of incoming traffic.
bpf_xdp_flow_lookup can be used as building block to offload in XDP
the sw flowtable processing when the hw support is not available.

This series has been tested running the xdp_flowtable_offload eBPF program
on an ixgbe 10Gbps NIC (eno2) in order to XDP_REDIRECT the TCP traffic to
a veth pair (veth0-veth1) based on the content of the nf_flowtable as soon
as the TCP connection is in the established state:

[tcp client] (eno1) == LAN == (eno2) xdp_flowtable_offload [XDP_REDIRECT] --> veth0 == veth1 [tcp server]

table inet filter {
	flowtable ft {
		hook ingress priority filter
		devices = { eno2, veth0 }
	}
	chain forward {
		type filter hook forward priority filter
		meta l4proto { tcp, udp } flow add @ft
	}
}

-  sw flowtable [1 TCP stream, T = 300s]: ~ 6.2 Gbps
- xdp flowtable [1 TCP stream, T = 300s]: ~ 7.6 Gbps

-  sw flowtable [3 TCP stream, T = 300s]: ~ 7.7 Gbps
- xdp flowtable [3 TCP stream, T = 300s]: ~ 8.8 Gbps

Changes since v2:
- introduce bpf_flowtable_opts struct in bpf_xdp_flow_lookup signature
- get rid of xdp_flowtable_offload bpf sample
- get rid of test_xdp_flowtable.sh for selftest and rely on prog_tests instead
- rename bpf_xdp_flow_offload_lookup in bpf_xdp_flow_lookup
Changes since v1:
- return NULL in bpf_xdp_flow_offload_lookup kfunc in case of error
- take into account kfunc registration possible failures
Changes since RFC:
- fix compilation error if BTF is not enabled

Florian Westphal (1):
  netfilter: nf_tables: add flowtable map for xdp offload

Lorenzo Bianconi (2):
  netfilter: add bpf_xdp_flow_lookup kfunc
  selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc

 include/net/netfilter/nf_flow_table.h         |  12 ++
 net/netfilter/Makefile                        |   5 +
 net/netfilter/nf_flow_table_bpf.c             | 117 ++++++++++++
 net/netfilter/nf_flow_table_inet.c            |   2 +-
 net/netfilter/nf_flow_table_offload.c         | 161 ++++++++++++++++-
 tools/testing/selftests/bpf/config            |  13 ++
 .../selftests/bpf/prog_tests/xdp_flowtable.c  | 168 ++++++++++++++++++
 .../selftests/bpf/progs/xdp_flowtable.c       | 145 +++++++++++++++
 8 files changed, 620 insertions(+), 3 deletions(-)
 create mode 100644 net/netfilter/nf_flow_table_bpf.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_flowtable.c

-- 
2.45.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload
  2024-05-23 12:06 [PATCH v3 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
@ 2024-05-23 12:06 ` Lorenzo Bianconi
  2024-05-23 15:51   ` Pablo Neira Ayuso
  2024-05-23 12:06 ` [PATCH v3 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
  2024-05-23 12:06 ` [PATCH v3 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
  2 siblings, 1 reply; 6+ messages in thread
From: Lorenzo Bianconi @ 2024-05-23 12:06 UTC (permalink / raw)
  To: bpf
  Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

From: Florian Westphal <fw@strlen.de>

This adds a small internal mapping table so that a new bpf (xdp) kfunc
can perform lookups in a flowtable.

As-is, xdp program has access to the device pointer, but no way to do a
lookup in a flowtable -- there is no way to obtain the needed struct
without questionable stunts.

This allows to obtain an nf_flowtable pointer given a net_device
structure.

In order to keep backward compatibility, the infrastructure allows the
user to add a given device to multiple flowtables, but it will always
return the first added mapping performing the lookup since it assumes
the right configuration is 1:1 mapping between flowtables and net_devices.

Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/netfilter/nf_flow_table.h |   2 +
 net/netfilter/nf_flow_table_offload.c | 161 +++++++++++++++++++++++++-
 2 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 9abb7ee40d72f..0bbe6ea8e0651 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -305,6 +305,8 @@ struct flow_ports {
 	__be16 source, dest;
 };
 
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev);
+
 unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 				     const struct nf_hook_state *state);
 unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index a010b25076ca0..1acfcdbee42e8 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -17,6 +17,129 @@ static struct workqueue_struct *nf_flow_offload_add_wq;
 static struct workqueue_struct *nf_flow_offload_del_wq;
 static struct workqueue_struct *nf_flow_offload_stats_wq;
 
+struct flow_offload_xdp_ft {
+	struct list_head head;
+	struct nf_flowtable *ft;
+	struct rcu_head rcuhead;
+};
+
+struct flow_offload_xdp {
+	struct hlist_node hnode;
+	unsigned long net_device_addr;
+	struct list_head head;
+};
+
+#define NF_XDP_HT_BITS	4
+static DEFINE_HASHTABLE(nf_xdp_hashtable, NF_XDP_HT_BITS);
+static DEFINE_MUTEX(nf_xdp_hashtable_lock);
+
+/* caller must hold rcu read lock */
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev)
+{
+	unsigned long key = (unsigned long)dev;
+	struct flow_offload_xdp *iter;
+
+	hash_for_each_possible_rcu(nf_xdp_hashtable, iter, hnode, key) {
+		if (key == iter->net_device_addr) {
+			struct flow_offload_xdp_ft *ft_elem;
+
+			/* The user is supposed to insert a given net_device
+			 * just into a single nf_flowtable so we always return
+			 * the first element here.
+			 */
+			ft_elem = list_first_or_null_rcu(&iter->head,
+							 struct flow_offload_xdp_ft,
+							 head);
+			return ft_elem ? ft_elem->ft : NULL;
+		}
+	}
+
+	return NULL;
+}
+
+static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
+				      const struct net_device *dev)
+{
+	struct flow_offload_xdp *iter, *elem = NULL;
+	unsigned long key = (unsigned long)dev;
+	struct flow_offload_xdp_ft *ft_elem;
+
+	ft_elem = kzalloc(sizeof(*ft_elem), GFP_KERNEL_ACCOUNT);
+	if (!ft_elem)
+		return -ENOMEM;
+
+	ft_elem->ft = ft;
+
+	mutex_lock(&nf_xdp_hashtable_lock);
+
+	hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
+		if (key == iter->net_device_addr) {
+			elem = iter;
+			break;
+		}
+	}
+
+	if (!elem) {
+		elem = kzalloc(sizeof(*elem), GFP_KERNEL_ACCOUNT);
+		if (!elem)
+			goto err_unlock;
+
+		elem->net_device_addr = key;
+		INIT_LIST_HEAD(&elem->head);
+		hash_add_rcu(nf_xdp_hashtable, &elem->hnode, key);
+	}
+	list_add_tail_rcu(&ft_elem->head, &elem->head);
+
+	mutex_unlock(&nf_xdp_hashtable_lock);
+
+	return 0;
+
+err_unlock:
+	mutex_unlock(&nf_xdp_hashtable_lock);
+	kfree(ft_elem);
+
+	return -ENOMEM;
+}
+
+static void nf_flowtable_by_dev_remove(struct nf_flowtable *ft,
+				       const struct net_device *dev)
+{
+	struct flow_offload_xdp *iter, *elem = NULL;
+	unsigned long key = (unsigned long)dev;
+
+	mutex_lock(&nf_xdp_hashtable_lock);
+
+	hash_for_each_possible(nf_xdp_hashtable, iter, hnode, key) {
+		if (key == iter->net_device_addr) {
+			elem = iter;
+			break;
+		}
+	}
+
+	if (elem) {
+		struct flow_offload_xdp_ft *ft_elem, *ft_next;
+
+		list_for_each_entry_safe(ft_elem, ft_next, &elem->head, head) {
+			if (ft_elem->ft == ft) {
+				list_del_rcu(&ft_elem->head);
+				kfree_rcu(ft_elem, rcuhead);
+			}
+		}
+
+		if (list_empty(&elem->head))
+			hash_del_rcu(&elem->hnode);
+		else
+			elem = NULL;
+	}
+
+	mutex_unlock(&nf_xdp_hashtable_lock);
+
+	if (elem) {
+		synchronize_rcu();
+		kfree(elem);
+	}
+}
+
 struct flow_offload_work {
 	struct list_head	list;
 	enum flow_cls_command	cmd;
@@ -1183,6 +1306,38 @@ static int nf_flow_table_offload_cmd(struct flow_block_offload *bo,
 	return 0;
 }
 
+static int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
+				     struct net_device *dev,
+				     enum flow_block_command cmd)
+{
+	switch (cmd) {
+	case FLOW_BLOCK_BIND:
+		return nf_flowtable_by_dev_insert(flowtable, dev);
+	case FLOW_BLOCK_UNBIND:
+		nf_flowtable_by_dev_remove(flowtable, dev);
+		return 0;
+	}
+
+	WARN_ON_ONCE(1);
+	return 0;
+}
+
+static void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
+				       struct net_device *dev,
+				       enum flow_block_command cmd)
+{
+	switch (cmd) {
+	case FLOW_BLOCK_BIND:
+		nf_flowtable_by_dev_remove(flowtable, dev);
+		return;
+	case FLOW_BLOCK_UNBIND:
+		/* We do not re-bind in case hw offload would report error
+		 * on *unregister*.
+		 */
+		break;
+	}
+}
+
 int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
 				struct net_device *dev,
 				enum flow_block_command cmd)
@@ -1192,7 +1347,7 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
 	int err;
 
 	if (!nf_flowtable_hw_offload(flowtable))
-		return 0;
+		return nf_flow_offload_xdp_setup(flowtable, dev, cmd);
 
 	if (dev->netdev_ops->ndo_setup_tc)
 		err = nf_flow_table_offload_cmd(&bo, flowtable, dev, cmd,
@@ -1200,8 +1355,10 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
 	else
 		err = nf_flow_table_indr_offload_cmd(&bo, flowtable, dev, cmd,
 						     &extack);
-	if (err < 0)
+	if (err < 0) {
+		nf_flow_offload_xdp_cancel(flowtable, dev, cmd);
 		return err;
+	}
 
 	return nf_flow_table_block_setup(flowtable, &bo, cmd);
 }
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc
  2024-05-23 12:06 [PATCH v3 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
  2024-05-23 12:06 ` [PATCH v3 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
@ 2024-05-23 12:06 ` Lorenzo Bianconi
  2024-05-23 12:06 ` [PATCH v3 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi
  2 siblings, 0 replies; 6+ messages in thread
From: Lorenzo Bianconi @ 2024-05-23 12:06 UTC (permalink / raw)
  To: bpf
  Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup
of a given flowtable entry based on a fib tuple of incoming traffic.
bpf_xdp_flow_lookup can be used as building block to offload in xdp
the processing of sw flowtable when hw flowtable is not available.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/netfilter/nf_flow_table.h |  10 +++
 net/netfilter/Makefile                |   5 ++
 net/netfilter/nf_flow_table_bpf.c     | 117 ++++++++++++++++++++++++++
 net/netfilter/nf_flow_table_inet.c    |   2 +-
 4 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 net/netfilter/nf_flow_table_bpf.c

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 0bbe6ea8e0651..fb4bfeb569dee 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -312,6 +312,16 @@ unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 				       const struct nf_hook_state *state);
 
+#if (IS_BUILTIN(CONFIG_NF_FLOW_TABLE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
+    (IS_MODULE(CONFIG_NF_FLOW_TABLE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
+extern int nf_flow_register_bpf(void);
+#else
+static inline int nf_flow_register_bpf(void)
+{
+	return 0;
+}
+#endif
+
 #define MODULE_ALIAS_NF_FLOWTABLE(family)	\
 	MODULE_ALIAS("nf-flowtable-" __stringify(family))
 
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 614815a3ed738..18b09cec92024 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -144,6 +144,11 @@ obj-$(CONFIG_NF_FLOW_TABLE)	+= nf_flow_table.o
 nf_flow_table-objs		:= nf_flow_table_core.o nf_flow_table_ip.o \
 				   nf_flow_table_offload.o
 nf_flow_table-$(CONFIG_NF_FLOW_TABLE_PROCFS) += nf_flow_table_procfs.o
+ifeq ($(CONFIG_NF_FLOW_TABLE),m)
+nf_flow_table-$(CONFIG_DEBUG_INFO_BTF_MODULES) += nf_flow_table_bpf.o
+else ifeq ($(CONFIG_NF_FLOW_TABLE),y)
+nf_flow_table-$(CONFIG_DEBUG_INFO_BTF) += nf_flow_table_bpf.o
+endif
 
 obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
 
diff --git a/net/netfilter/nf_flow_table_bpf.c b/net/netfilter/nf_flow_table_bpf.c
new file mode 100644
index 0000000000000..b3f8dffe62535
--- /dev/null
+++ b/net/netfilter/nf_flow_table_bpf.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Unstable Flow Table Helpers for XDP hook
+ *
+ * These are called from the XDP programs.
+ * Note that it is allowed to break compatibility for these functions since
+ * the interface they are exposed through to BPF programs is explicitly
+ * unstable.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <net/netfilter/nf_flow_table.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <net/xdp.h>
+
+/* bpf_flowtable_opts - options for bpf flowtable helpers
+ * @error: out parameter, set for any encountered error
+ */
+struct bpf_flowtable_opts {
+	s32 error;
+};
+
+enum {
+	NF_BPF_FLOWTABLE_OPTS_SZ = 4,
+};
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in nf_flow_table BTF");
+
+static struct flow_offload_tuple_rhash *
+bpf_xdp_flow_tuple_lookup(struct net_device *dev,
+			  struct flow_offload_tuple *tuple, __be16 proto)
+{
+	struct flow_offload_tuple_rhash *tuplehash;
+	struct nf_flowtable *nf_flow_table;
+	struct flow_offload *nf_flow;
+
+	nf_flow_table = nf_flowtable_by_dev(dev);
+	if (!nf_flow_table)
+		return ERR_PTR(-ENOENT);
+
+	tuplehash = flow_offload_lookup(nf_flow_table, tuple);
+	if (!tuplehash)
+		return ERR_PTR(-ENOENT);
+
+	nf_flow = container_of(tuplehash, struct flow_offload,
+			       tuplehash[tuplehash->tuple.dir]);
+	flow_offload_refresh(nf_flow_table, nf_flow, false);
+
+	return tuplehash;
+}
+
+__bpf_kfunc struct flow_offload_tuple_rhash *
+bpf_xdp_flow_lookup(struct xdp_md *ctx, struct bpf_fib_lookup *fib_tuple,
+		    struct bpf_flowtable_opts *opts, u32 opts_len)
+{
+	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
+	struct flow_offload_tuple tuple = {
+		.iifidx = fib_tuple->ifindex,
+		.l3proto = fib_tuple->family,
+		.l4proto = fib_tuple->l4_protocol,
+		.src_port = fib_tuple->sport,
+		.dst_port = fib_tuple->dport,
+	};
+	struct flow_offload_tuple_rhash *tuplehash;
+	__be16 proto;
+
+	if (opts_len != NF_BPF_FLOWTABLE_OPTS_SZ) {
+		opts->error = -EINVAL;
+		return NULL;
+	}
+
+	switch (fib_tuple->family) {
+	case AF_INET:
+		tuple.src_v4.s_addr = fib_tuple->ipv4_src;
+		tuple.dst_v4.s_addr = fib_tuple->ipv4_dst;
+		proto = htons(ETH_P_IP);
+		break;
+	case AF_INET6:
+		tuple.src_v6 = *(struct in6_addr *)&fib_tuple->ipv6_src;
+		tuple.dst_v6 = *(struct in6_addr *)&fib_tuple->ipv6_dst;
+		proto = htons(ETH_P_IPV6);
+		break;
+	default:
+		opts->error = -EAFNOSUPPORT;
+		return NULL;
+	}
+
+	tuplehash = bpf_xdp_flow_tuple_lookup(xdp->rxq->dev, &tuple, proto);
+	if (IS_ERR(tuplehash)) {
+		opts->error = PTR_ERR(tuplehash);
+		return NULL;
+	}
+
+	return tuplehash;
+}
+
+__diag_pop()
+
+BTF_KFUNCS_START(nf_ft_kfunc_set)
+BTF_ID_FLAGS(func, bpf_xdp_flow_lookup, KF_TRUSTED_ARGS | KF_RET_NULL)
+BTF_KFUNCS_END(nf_ft_kfunc_set)
+
+static const struct btf_kfunc_id_set nf_flow_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &nf_ft_kfunc_set,
+};
+
+int nf_flow_register_bpf(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
+					 &nf_flow_kfunc_set);
+}
+EXPORT_SYMBOL_GPL(nf_flow_register_bpf);
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index 6eef15648b7b0..88787b45e30d6 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -98,7 +98,7 @@ static int __init nf_flow_inet_module_init(void)
 	nft_register_flowtable_type(&flowtable_ipv6);
 	nft_register_flowtable_type(&flowtable_inet);
 
-	return 0;
+	return nf_flow_register_bpf();
 }
 
 static void __exit nf_flow_inet_module_exit(void)
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 bpf-next 3/3] selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc
  2024-05-23 12:06 [PATCH v3 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
  2024-05-23 12:06 ` [PATCH v3 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
  2024-05-23 12:06 ` [PATCH v3 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
@ 2024-05-23 12:06 ` Lorenzo Bianconi
  2 siblings, 0 replies; 6+ messages in thread
From: Lorenzo Bianconi @ 2024-05-23 12:06 UTC (permalink / raw)
  To: bpf
  Cc: pablo, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

Introduce e2e selftest for bpf_xdp_flow_lookup kfunc through
xdp_flowtable utility.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 tools/testing/selftests/bpf/config            |  13 ++
 .../selftests/bpf/prog_tests/xdp_flowtable.c  | 168 ++++++++++++++++++
 .../selftests/bpf/progs/xdp_flowtable.c       | 145 +++++++++++++++
 3 files changed, 326 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
 create mode 100644 tools/testing/selftests/bpf/progs/xdp_flowtable.c

diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 2fb16da78dce8..5291e97df7494 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -83,6 +83,19 @@ CONFIG_NF_CONNTRACK_MARK=y
 CONFIG_NF_CONNTRACK_ZONES=y
 CONFIG_NF_DEFRAG_IPV4=y
 CONFIG_NF_DEFRAG_IPV6=y
+CONFIG_NF_TABLES=y
+CONFIG_NF_TABLES_INET=y
+CONFIG_NF_TABLES_NETDEV=y
+CONFIG_NF_TABLES_IPV4=y
+CONFIG_NF_TABLES_IPV6=y
+CONFIG_NETFILTER_INGRESS=y
+CONFIG_NF_FLOW_TABLE=y
+CONFIG_NF_FLOW_TABLE_INET=y
+CONFIG_NETFILTER_NETLINK=y
+CONFIG_NFT_FLOW_OFFLOAD=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_FILTER=y
 CONFIG_NF_NAT=y
 CONFIG_RC_CORE=y
 CONFIG_SECURITY=y
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c b/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
new file mode 100644
index 0000000000000..e1bf141d34015
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <bpf/btf.h>
+#include <linux/if_link.h>
+#include <linux/udp.h>
+#include <net/if.h>
+#include <unistd.h>
+
+#include "xdp_flowtable.skel.h"
+
+#define TX_NETNS_NAME	"ns0"
+#define RX_NETNS_NAME	"ns1"
+
+#define TX_NAME		"v0"
+#define FORWARD_NAME	"v1"
+#define RX_NAME		"d0"
+
+#define TX_MAC		"00:00:00:00:00:01"
+#define FORWARD_MAC	"00:00:00:00:00:02"
+#define RX_MAC		"00:00:00:00:00:03"
+#define DST_MAC		"00:00:00:00:00:04"
+
+#define TX_ADDR		"10.0.0.1"
+#define FORWARD_ADDR	"10.0.0.2"
+#define RX_ADDR		"20.0.0.1"
+#define DST_ADDR	"20.0.0.2"
+
+#define PREFIX_LEN	"8"
+#define N_PACKETS	10
+#define UDP_PORT	12345
+#define UDP_PORT_STR	"12345"
+
+static int send_udp_traffic(void)
+{
+	struct sockaddr_storage addr;
+	int i, sock;
+
+	if (make_sockaddr(AF_INET, DST_ADDR, UDP_PORT, &addr, NULL))
+		return -EINVAL;
+
+	sock = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sock < 0)
+		return sock;
+
+	for (i = 0; i < N_PACKETS; i++) {
+		unsigned char buf[] = { 0xaa, 0xbb, 0xcc };
+		int n;
+
+		n = sendto(sock, buf, sizeof(buf), MSG_NOSIGNAL | MSG_CONFIRM,
+			   (struct sockaddr *)&addr, sizeof(addr));
+		if (n != sizeof(buf)) {
+			close(sock);
+			return -EINVAL;
+		}
+
+		usleep(50000); /* 50ms */
+	}
+	close(sock);
+
+	return 0;
+}
+
+void test_xdp_flowtable(void)
+{
+	struct xdp_flowtable *skel = NULL;
+	struct nstoken *tok = NULL;
+	int iifindex, stats_fd;
+	__u32 value, key = 0;
+	struct bpf_link *link;
+
+	if (SYS_NOFAIL("nft -v")) {
+		fprintf(stdout, "Missing required nft tool\n");
+		test__skip();
+		return;
+	}
+
+	SYS(out, "ip netns add " TX_NETNS_NAME);
+	SYS(out, "ip netns add " RX_NETNS_NAME);
+
+	tok = open_netns(RX_NETNS_NAME);
+	if (!ASSERT_OK_PTR(tok, "setns"))
+		goto out;
+
+	SYS(out, "sysctl -qw net.ipv4.conf.all.forwarding=1");
+
+	SYS(out, "ip link add " TX_NAME " type veth peer " FORWARD_NAME);
+	SYS(out, "ip link set " TX_NAME " netns " TX_NETNS_NAME);
+	SYS(out, "ip link set dev " FORWARD_NAME " address " FORWARD_MAC);
+	SYS(out,
+	    "ip addr add " FORWARD_ADDR "/" PREFIX_LEN " dev " FORWARD_NAME);
+	SYS(out, "ip link set dev " FORWARD_NAME " up");
+
+	SYS(out, "ip link add " RX_NAME " type dummy");
+	SYS(out, "ip link set dev " RX_NAME " address " RX_MAC);
+	SYS(out, "ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
+	SYS(out, "ip link set dev " RX_NAME " up");
+
+	/* configure the flowtable */
+	SYS(out, "nft add table ip filter");
+	SYS(out,
+	    "nft add flowtable ip filter f { hook ingress priority 0\\; "
+	    "devices = { " FORWARD_NAME ", " RX_NAME " }\\; }");
+	SYS(out,
+	    "nft add chain ip filter forward "
+	    "{ type filter hook forward priority 0\\; }");
+	SYS(out,
+	    "nft add rule ip filter forward ip protocol udp th dport "
+	    UDP_PORT_STR " flow add @f");
+
+	/* Avoid ARP calls */
+	SYS(out,
+	    "ip -4 neigh add " DST_ADDR " lladdr " DST_MAC " dev " RX_NAME);
+
+	close_netns(tok);
+	tok = open_netns(TX_NETNS_NAME);
+	if (!ASSERT_OK_PTR(tok, "setns"))
+		goto out;
+
+	SYS(out, "ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
+	SYS(out, "ip link set dev " TX_NAME " address " TX_MAC);
+	SYS(out, "ip link set dev " TX_NAME " up");
+	SYS(out, "ip route add default via " FORWARD_ADDR);
+
+	close_netns(tok);
+	tok = open_netns(RX_NETNS_NAME);
+	if (!ASSERT_OK_PTR(tok, "setns"))
+		goto out;
+
+	iifindex = if_nametoindex(FORWARD_NAME);
+	if (!ASSERT_NEQ(iifindex, 0, "iifindex"))
+		goto out;
+
+	skel = xdp_flowtable__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel"))
+		goto out;
+
+	link = bpf_program__attach_xdp(skel->progs.xdp_flowtable_do_lookup,
+				       iifindex);
+	if (!ASSERT_OK_PTR(link, "prog_attach"))
+		goto out;
+
+	close_netns(tok);
+	tok = open_netns(TX_NETNS_NAME);
+	if (!ASSERT_OK_PTR(tok, "setns"))
+		goto out;
+
+	if (!ASSERT_OK(send_udp_traffic(), "send udp"))
+		goto out;
+
+	close_netns(tok);
+	tok = open_netns(RX_NETNS_NAME);
+	if (!ASSERT_OK_PTR(tok, "setns"))
+		goto out;
+
+	stats_fd = bpf_map__fd(skel->maps.stats);
+	if (!ASSERT_OK(bpf_map_lookup_elem(stats_fd, &key, &value),
+		       "bpf_map_update_elem stats"))
+		goto out;
+
+	ASSERT_GE(value, N_PACKETS - 2, "bpf_xdp_flow_lookup failed");
+out:
+	xdp_flowtable__destroy(skel);
+	if (tok)
+		close_netns(tok);
+	SYS_NOFAIL("ip netns del " TX_NETNS_NAME);
+	SYS_NOFAIL("ip netns del " RX_NETNS_NAME);
+}
diff --git a/tools/testing/selftests/bpf/progs/xdp_flowtable.c b/tools/testing/selftests/bpf/progs/xdp_flowtable.c
new file mode 100644
index 0000000000000..fb7f6fac57459
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/xdp_flowtable.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#define MAX_ERRNO	4095
+
+#define ETH_P_IP	0x0800
+#define ETH_P_IPV6	0x86dd
+#define IP_MF		0x2000	/* "More Fragments" */
+#define IP_OFFSET	0x1fff	/* "Fragment Offset" */
+#define AF_INET		2
+#define AF_INET6	10
+
+struct bpf_flowtable_opts___local {
+	s32 error;
+};
+
+struct flow_offload_tuple_rhash *
+bpf_xdp_flow_lookup(struct xdp_md *, struct bpf_fib_lookup *,
+		    struct bpf_flowtable_opts___local *, u32) __ksym;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, __u32);
+	__type(value, __u32);
+	__uint(max_entries, 1);
+} stats SEC(".maps");
+
+static bool xdp_flowtable_offload_check_iphdr(struct iphdr *iph)
+{
+	/* ip fragmented traffic */
+	if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET))
+		return false;
+
+	/* ip options */
+	if (iph->ihl * 4 != sizeof(*iph))
+		return false;
+
+	if (iph->ttl <= 1)
+		return false;
+
+	return true;
+}
+
+static bool xdp_flowtable_offload_check_tcp_state(void *ports, void *data_end,
+						  u8 proto)
+{
+	if (proto == IPPROTO_TCP) {
+		struct tcphdr *tcph = ports;
+
+		if (tcph + 1 > data_end)
+			return false;
+
+		if (tcph->fin || tcph->rst)
+			return false;
+	}
+
+	return true;
+}
+
+SEC("xdp.frags")
+int xdp_flowtable_do_lookup(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	struct bpf_flowtable_opts___local opts = {};
+	struct flow_offload_tuple_rhash *tuplehash;
+	struct bpf_fib_lookup tuple = {
+		.ifindex = ctx->ingress_ifindex,
+	};
+	void *data = (void *)(long)ctx->data;
+	struct ethhdr *eth = data;
+	struct flow_ports *ports;
+	__u32 *val, key = 0;
+
+	if (eth + 1 > data_end)
+		return XDP_DROP;
+
+	switch (eth->h_proto) {
+	case bpf_htons(ETH_P_IP): {
+		struct iphdr *iph = data + sizeof(*eth);
+
+		ports = (struct flow_ports *)(iph + 1);
+		if (ports + 1 > data_end)
+			return XDP_PASS;
+
+		/* sanity check on ip header */
+		if (!xdp_flowtable_offload_check_iphdr(iph))
+			return XDP_PASS;
+
+		if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
+							   iph->protocol))
+			return XDP_PASS;
+
+		tuple.family		= AF_INET;
+		tuple.tos		= iph->tos;
+		tuple.l4_protocol	= iph->protocol;
+		tuple.tot_len		= bpf_ntohs(iph->tot_len);
+		tuple.ipv4_src		= iph->saddr;
+		tuple.ipv4_dst		= iph->daddr;
+		tuple.sport		= ports->source;
+		tuple.dport		= ports->dest;
+		break;
+	}
+	case bpf_htons(ETH_P_IPV6): {
+		struct in6_addr *src = (struct in6_addr *)tuple.ipv6_src;
+		struct in6_addr *dst = (struct in6_addr *)tuple.ipv6_dst;
+		struct ipv6hdr *ip6h = data + sizeof(*eth);
+
+		ports = (struct flow_ports *)(ip6h + 1);
+		if (ports + 1 > data_end)
+			return XDP_PASS;
+
+		if (ip6h->hop_limit <= 1)
+			return XDP_PASS;
+
+		if (!xdp_flowtable_offload_check_tcp_state(ports, data_end,
+							   ip6h->nexthdr))
+			return XDP_PASS;
+
+		tuple.family		= AF_INET6;
+		tuple.l4_protocol	= ip6h->nexthdr;
+		tuple.tot_len		= bpf_ntohs(ip6h->payload_len);
+		*src			= ip6h->saddr;
+		*dst			= ip6h->daddr;
+		tuple.sport		= ports->source;
+		tuple.dport		= ports->dest;
+		break;
+	}
+	default:
+		return XDP_PASS;
+	}
+
+	tuplehash = bpf_xdp_flow_lookup(ctx, &tuple, &opts, sizeof(opts));
+	if (!tuplehash)
+		return XDP_PASS;
+
+	val = bpf_map_lookup_elem(&stats, &key);
+	if (val)
+		__sync_add_and_fetch(val, 1);
+
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload
  2024-05-23 12:06 ` [PATCH v3 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
@ 2024-05-23 15:51   ` Pablo Neira Ayuso
  2024-05-23 17:06     ` Lorenzo Bianconi
  0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2024-05-23 15:51 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

Hi Lorenzo,

On Thu, May 23, 2024 at 02:06:16PM +0200, Lorenzo Bianconi wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> This adds a small internal mapping table so that a new bpf (xdp) kfunc
> can perform lookups in a flowtable.
> 
> As-is, xdp program has access to the device pointer, but no way to do a
> lookup in a flowtable -- there is no way to obtain the needed struct
> without questionable stunts.
> 
> This allows to obtain an nf_flowtable pointer given a net_device
> structure.
> 
> In order to keep backward compatibility, the infrastructure allows the
> user to add a given device to multiple flowtables, but it will always
> return the first added mapping performing the lookup since it assumes
> the right configuration is 1:1 mapping between flowtables and net_devices.

Would it be possible to move this new code in _offload.c to
nf_flow_table_xdp.c?

The flowtable offload code is already a bit convoluted, the hardware
offload API for payload matching results in chatty with many sparse
warnings (unless I adds casting everywhere), but I remember I failed
to provide a convincing improvements on that front without requiring
changes to drivers at the time. This is of course no related to this
series.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload
  2024-05-23 15:51   ` Pablo Neira Ayuso
@ 2024-05-23 17:06     ` Lorenzo Bianconi
  0 siblings, 0 replies; 6+ messages in thread
From: Lorenzo Bianconi @ 2024-05-23 17:06 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: bpf, kadlec, davem, edumazet, kuba, pabeni, netfilter-devel,
	netdev, ast, daniel, andrii, martin.lau, eddyz87,
	lorenzo.bianconi, toke, fw, hawk, horms, donhunte, memxor

[-- Attachment #1: Type: text/plain, Size: 1362 bytes --]

> Hi Lorenzo,
> 
> On Thu, May 23, 2024 at 02:06:16PM +0200, Lorenzo Bianconi wrote:
> > From: Florian Westphal <fw@strlen.de>
> > 
> > This adds a small internal mapping table so that a new bpf (xdp) kfunc
> > can perform lookups in a flowtable.
> > 
> > As-is, xdp program has access to the device pointer, but no way to do a
> > lookup in a flowtable -- there is no way to obtain the needed struct
> > without questionable stunts.
> > 
> > This allows to obtain an nf_flowtable pointer given a net_device
> > structure.
> > 
> > In order to keep backward compatibility, the infrastructure allows the
> > user to add a given device to multiple flowtables, but it will always
> > return the first added mapping performing the lookup since it assumes
> > the right configuration is 1:1 mapping between flowtables and net_devices.
> 
> Would it be possible to move this new code in _offload.c to
> nf_flow_table_xdp.c?

ack, I will do in v4.

Regards,
Lorenzo

> 
> The flowtable offload code is already a bit convoluted, the hardware
> offload API for payload matching results in chatty with many sparse
> warnings (unless I adds casting everywhere), but I remember I failed
> to provide a convincing improvements on that front without requiring
> changes to drivers at the time. This is of course no related to this
> series.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-05-23 17:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-23 12:06 [PATCH v3 bpf-next 0/3] netfilter: Add the capability to offload flowtable in XDP layer Lorenzo Bianconi
2024-05-23 12:06 ` [PATCH v3 bpf-next 1/3] netfilter: nf_tables: add flowtable map for xdp offload Lorenzo Bianconi
2024-05-23 15:51   ` Pablo Neira Ayuso
2024-05-23 17:06     ` Lorenzo Bianconi
2024-05-23 12:06 ` [PATCH v3 bpf-next 2/3] netfilter: add bpf_xdp_flow_lookup kfunc Lorenzo Bianconi
2024-05-23 12:06 ` [PATCH v3 bpf-next 3/3] selftests/bpf: Add selftest for " Lorenzo Bianconi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).