netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/6] netfilter: updates for net-next
@ 2025-08-20 14:47 Florian Westphal
  2025-08-20 14:47 ` [PATCH net-next 1/6] netfilter: ctnetlink: remove refcounting in dying list dumping Florian Westphal
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Florian Westphal @ 2025-08-20 14:47 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

Hi,

The following patchset contains Netfilter enhancements for *net-next*:

First patch gets rid of refcounting for dying list dumping, use a
cookie value instead of keeping the object around.

Remaining patches extend nftables pipapo (concatenated ranges) set type.

Make the AVX2 optimized version available from the control plane as
well, then use it during insert.  This gives a nice speedup for large
sets. All from myself.

On PREEMPT_RT, we can't rely on local_bh_disable to protect the
access to the percpu scratch maps.  Use nested-BH locking for this,
From Sebastian Siewior.

Please, pull these changes from:
The following changes since commit 5c69e0b395c1ffb37fd6fbdbd428353fc0894005:

  Merge branch 'stmmac-stop-silently-dropping-bad-checksum-packets' (2025-08-19 18:33:09 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git tags/nf-next-25-08-20

for you to fetch changes up to 456010c8b99e65231160d4c706122ac5502fbcff:

  netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch (2025-08-20 13:52:37 +0200)

----------------------------------------------------------------
netfilter pull request nf-next-25-08-20

----------------------------------------------------------------
Florian Westphal (3):
  netfilter: ctnetlink: remove refcounting in dying list dumping
  netfilter: nft_set_pipapo_avx2: split lookup function in two parts
  netfilter: nft_set_pipapo: use avx2 algorithm for insertions too

Sebastian Andrzej Siewior (3):
  netfilter: nft_set_pipapo_avx2: Drop the comment regarding protection
  netfilter: nft_set_pipapo: Store real pointer, adjust later.
  netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch

 net/netfilter/nf_conntrack_netlink.c |  39 ++------
 net/netfilter/nft_set_pipapo.c       |  90 ++++++++++-------
 net/netfilter/nft_set_pipapo.h       |   8 +-
 net/netfilter/nft_set_pipapo_avx2.c  | 138 ++++++++++++++++-----------
 net/netfilter/nft_set_pipapo_avx2.h  |   4 +
 5 files changed, 155 insertions(+), 124 deletions(-)

-- 
2.49.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net-next 1/6] netfilter: ctnetlink: remove refcounting in dying list dumping
  2025-08-20 14:47 [PATCH net-next 0/6] netfilter: updates for net-next Florian Westphal
@ 2025-08-20 14:47 ` Florian Westphal
  2025-08-22  0:30   ` patchwork-bot+netdevbpf
  2025-08-20 14:47 ` [PATCH net-next 2/6] netfilter: nft_set_pipapo_avx2: Drop the comment regarding protection Florian Westphal
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2025-08-20 14:47 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

There is no need to keep the object alive via refcount, use a cookie and
then use that as the skip hint for dump resumption.

Unlike the two earlier, similar patches in this file, this is a cleanup
without intended side effects.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_netlink.c | 39 +++++++---------------------
 1 file changed, 10 insertions(+), 29 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 50fd6809380f..3a04665adf99 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -60,7 +60,7 @@ MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("List and change connection tracking table");
 
 struct ctnetlink_list_dump_ctx {
-	struct nf_conn *last;
+	unsigned long last_id;
 	unsigned int cpu;
 	bool done;
 };
@@ -1733,16 +1733,6 @@ static int ctnetlink_get_conntrack(struct sk_buff *skb,
 	return nfnetlink_unicast(skb2, info->net, NETLINK_CB(skb).portid);
 }
 
-static int ctnetlink_done_list(struct netlink_callback *cb)
-{
-	struct ctnetlink_list_dump_ctx *ctx = (void *)cb->ctx;
-
-	if (ctx->last)
-		nf_ct_put(ctx->last);
-
-	return 0;
-}
-
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
 static int ctnetlink_dump_one_entry(struct sk_buff *skb,
 				    struct netlink_callback *cb,
@@ -1757,11 +1747,11 @@ static int ctnetlink_dump_one_entry(struct sk_buff *skb,
 	if (l3proto && nf_ct_l3num(ct) != l3proto)
 		return 0;
 
-	if (ctx->last) {
-		if (ct != ctx->last)
+	if (ctx->last_id) {
+		if (ctnetlink_get_id(ct) != ctx->last_id)
 			return 0;
 
-		ctx->last = NULL;
+		ctx->last_id = 0;
 	}
 
 	/* We can't dump extension info for the unconfirmed
@@ -1775,12 +1765,8 @@ static int ctnetlink_dump_one_entry(struct sk_buff *skb,
 				  cb->nlh->nlmsg_seq,
 				  NFNL_MSG_TYPE(cb->nlh->nlmsg_type),
 				  ct, dying, 0);
-	if (res < 0) {
-		if (!refcount_inc_not_zero(&ct->ct_general.use))
-			return 0;
-
-		ctx->last = ct;
-	}
+	if (res < 0)
+		ctx->last_id = ctnetlink_get_id(ct);
 
 	return res;
 }
@@ -1796,10 +1782,10 @@ static int
 ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct ctnetlink_list_dump_ctx *ctx = (void *)cb->ctx;
-	struct nf_conn *last = ctx->last;
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
 	const struct net *net = sock_net(skb->sk);
 	struct nf_conntrack_net_ecache *ecache_net;
+	unsigned long last_id = ctx->last_id;
 	struct nf_conntrack_tuple_hash *h;
 	struct hlist_nulls_node *n;
 #endif
@@ -1807,7 +1793,7 @@ ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb)
 	if (ctx->done)
 		return 0;
 
-	ctx->last = NULL;
+	ctx->last_id = 0;
 
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
 	ecache_net = nf_conn_pernet_ecache(net);
@@ -1818,24 +1804,21 @@ ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb)
 		int res;
 
 		ct = nf_ct_tuplehash_to_ctrack(h);
-		if (last && last != ct)
+		if (last_id && last_id != ctnetlink_get_id(ct))
 			continue;
 
 		res = ctnetlink_dump_one_entry(skb, cb, ct, true);
 		if (res < 0) {
 			spin_unlock_bh(&ecache_net->dying_lock);
-			nf_ct_put(last);
 			return skb->len;
 		}
 
-		nf_ct_put(last);
-		last = NULL;
+		last_id = 0;
 	}
 
 	spin_unlock_bh(&ecache_net->dying_lock);
 #endif
 	ctx->done = true;
-	nf_ct_put(last);
 
 	return skb->len;
 }
@@ -1847,7 +1830,6 @@ static int ctnetlink_get_ct_dying(struct sk_buff *skb,
 	if (info->nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = ctnetlink_dump_dying,
-			.done = ctnetlink_done_list,
 		};
 		return netlink_dump_start(info->sk, skb, info->nlh, &c);
 	}
@@ -1862,7 +1844,6 @@ static int ctnetlink_get_ct_unconfirmed(struct sk_buff *skb,
 	if (info->nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = ctnetlink_dump_unconfirmed,
-			.done = ctnetlink_done_list,
 		};
 		return netlink_dump_start(info->sk, skb, info->nlh, &c);
 	}
-- 
2.49.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 2/6] netfilter: nft_set_pipapo_avx2: Drop the comment regarding protection
  2025-08-20 14:47 [PATCH net-next 0/6] netfilter: updates for net-next Florian Westphal
  2025-08-20 14:47 ` [PATCH net-next 1/6] netfilter: ctnetlink: remove refcounting in dying list dumping Florian Westphal
@ 2025-08-20 14:47 ` Florian Westphal
  2025-08-20 14:47 ` [PATCH net-next 3/6] netfilter: nft_set_pipapo_avx2: split lookup function in two parts Florian Westphal
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2025-08-20 14:47 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The comment claims that the kernel_fpu_begin_mask() below protects
access to the scratch map. This is not true because the access is only
protected by local_bh_disable() above.

Remove the misleading comment.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo_avx2.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 2f090e253caf..fc734a8545b4 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1171,9 +1171,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 
 	m = rcu_dereference(priv->match);
 
-	/* This also protects access to all data related to scratch maps.
-	 *
-	 * Note that we don't need a valid MXCSR state for any of the
+	/* Note that we don't need a valid MXCSR state for any of the
 	 * operations we use here, so pass 0 as mask and spare a LDMXCSR
 	 * instruction.
 	 */
-- 
2.49.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 3/6] netfilter: nft_set_pipapo_avx2: split lookup function in two parts
  2025-08-20 14:47 [PATCH net-next 0/6] netfilter: updates for net-next Florian Westphal
  2025-08-20 14:47 ` [PATCH net-next 1/6] netfilter: ctnetlink: remove refcounting in dying list dumping Florian Westphal
  2025-08-20 14:47 ` [PATCH net-next 2/6] netfilter: nft_set_pipapo_avx2: Drop the comment regarding protection Florian Westphal
@ 2025-08-20 14:47 ` Florian Westphal
  2025-08-20 14:47 ` [PATCH net-next 4/6] netfilter: nft_set_pipapo: use avx2 algorithm for insertions too Florian Westphal
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2025-08-20 14:47 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo, Stefano Brivio

Split the main avx2 lookup function into a helper.

This is a preparation patch: followup change will use the new helper
from the insertion path if possible.  This greatly improves insertion
performance when avx2 is supported.

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo_avx2.c | 126 +++++++++++++++++-----------
 1 file changed, 77 insertions(+), 49 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index fc734a8545b4..994a2ad2d9b1 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1133,56 +1133,35 @@ static inline void pipapo_resmap_init_avx2(const struct nft_pipapo_match *m, uns
 }
 
 /**
- * nft_pipapo_avx2_lookup() - Lookup function for AVX2 implementation
- * @net:	Network namespace
- * @set:	nftables API set representation
- * @key:	nftables API element representation containing key data
+ * pipapo_get_avx2() - Lookup function for AVX2 implementation
+ * @m:		Storage containing the set elements
+ * @data:	Key data to be matched against existing elements
+ * @genmask:	If set, check that element is active in given genmask
+ * @tstamp:	Timestamp to check for expired elements
  *
  * For more details, see DOC: Theory of Operation in nft_set_pipapo.c.
  *
  * This implementation exploits the repetitive characteristic of the algorithm
  * to provide a fast, vectorised version using the AVX2 SIMD instruction set.
  *
- * Return: true on match, false otherwise.
+ * The caller must check that the FPU is usable.
+ * This function must be called with BH disabled.
+ *
+ * Return: pointer to &struct nft_pipapo_elem on match, NULL otherwise.
  */
-const struct nft_set_ext *
-nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
-		       const u32 *key)
+static struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
+					       const u8 *data, u8 genmask,
+					       u64 tstamp)
 {
-	struct nft_pipapo *priv = nft_set_priv(set);
-	const struct nft_set_ext *ext = NULL;
 	struct nft_pipapo_scratch *scratch;
-	u8 genmask = nft_genmask_cur(net);
-	const struct nft_pipapo_match *m;
 	const struct nft_pipapo_field *f;
-	const u8 *rp = (const u8 *)key;
 	unsigned long *res, *fill;
 	bool map_index;
 	int i;
 
-	local_bh_disable();
-
-	if (unlikely(!irq_fpu_usable())) {
-		ext = nft_pipapo_lookup(net, set, key);
-
-		local_bh_enable();
-		return ext;
-	}
-
-	m = rcu_dereference(priv->match);
-
-	/* Note that we don't need a valid MXCSR state for any of the
-	 * operations we use here, so pass 0 as mask and spare a LDMXCSR
-	 * instruction.
-	 */
-	kernel_fpu_begin_mask(0);
-
 	scratch = *raw_cpu_ptr(m->scratch);
-	if (unlikely(!scratch)) {
-		kernel_fpu_end();
-		local_bh_enable();
+	if (unlikely(!scratch))
 		return NULL;
-	}
 
 	map_index = scratch->map_index;
 
@@ -1191,6 +1170,12 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 
 	pipapo_resmap_init_avx2(m, res);
 
+	/* Note that we don't need a valid MXCSR state for any of the
+	 * operations we use here, so pass 0 as mask and spare a LDMXCSR
+	 * instruction.
+	 */
+	kernel_fpu_begin_mask(0);
+
 	nft_pipapo_avx2_prepare();
 
 next_match:
@@ -1200,7 +1185,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 
 #define NFT_SET_PIPAPO_AVX2_LOOKUP(b, n)				\
 		(ret = nft_pipapo_avx2_lookup_##b##b_##n(res, fill, f,	\
-							 ret, rp,	\
+							 ret, data,	\
 							 first, last))
 
 		if (likely(f->bb == 8)) {
@@ -1216,7 +1201,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 				NFT_SET_PIPAPO_AVX2_LOOKUP(8, 16);
 			} else {
 				ret = nft_pipapo_avx2_lookup_slow(m, res, fill, f,
-								  ret, rp,
+								  ret, data,
 								  first, last);
 			}
 		} else {
@@ -1232,7 +1217,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 				NFT_SET_PIPAPO_AVX2_LOOKUP(4, 32);
 			} else {
 				ret = nft_pipapo_avx2_lookup_slow(m, res, fill, f,
-								  ret, rp,
+								  ret, data,
 								  first, last);
 			}
 		}
@@ -1240,29 +1225,72 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 
 #undef NFT_SET_PIPAPO_AVX2_LOOKUP
 
-		if (ret < 0)
-			goto out;
+		if (ret < 0) {
+			scratch->map_index = map_index;
+			kernel_fpu_end();
+			return NULL;
+		}
 
 		if (last) {
-			const struct nft_set_ext *e = &f->mt[ret].e->ext;
+			struct nft_pipapo_elem *e;
 
-			if (unlikely(nft_set_elem_expired(e) ||
-				     !nft_set_elem_active(e, genmask)))
+			e = f->mt[ret].e;
+			if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
+				     !nft_set_elem_active(&e->ext, genmask)))
 				goto next_match;
 
-			ext = e;
-			goto out;
+			scratch->map_index = map_index;
+			kernel_fpu_end();
+			return e;
 		}
 
+		map_index = !map_index;
 		swap(res, fill);
-		rp += NFT_PIPAPO_GROUPS_PADDED_SIZE(f);
+		data += NFT_PIPAPO_GROUPS_PADDED_SIZE(f);
 	}
 
-out:
-	if (i % 2)
-		scratch->map_index = !map_index;
 	kernel_fpu_end();
+	return NULL;
+}
+
+/**
+ * nft_pipapo_avx2_lookup() - Dataplane frontend for AVX2 implementation
+ * @net:	Network namespace
+ * @set:	nftables API set representation
+ * @key:	nftables API element representation containing key data
+ *
+ * This function is called from the data path.  It will search for
+ * an element matching the given key in the current active copy using
+ * the AVX2 routines if the fpu is usable or fall back to the generic
+ * implementation of the algorithm otherwise.
+ *
+ * Return: nftables API extension pointer or NULL if no match.
+ */
+const struct nft_set_ext *
+nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
+		       const u32 *key)
+{
+	struct nft_pipapo *priv = nft_set_priv(set);
+	u8 genmask = nft_genmask_cur(net);
+	const struct nft_pipapo_match *m;
+	const u8 *rp = (const u8 *)key;
+	const struct nft_pipapo_elem *e;
+
+	local_bh_disable();
+
+	if (unlikely(!irq_fpu_usable())) {
+		const struct nft_set_ext *ext;
+
+		ext = nft_pipapo_lookup(net, set, key);
+
+		local_bh_enable();
+		return ext;
+	}
+
+	m = rcu_dereference(priv->match);
+
+	e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64());
 	local_bh_enable();
 
-	return ext;
+	return e ? &e->ext : NULL;
 }
-- 
2.49.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 4/6] netfilter: nft_set_pipapo: use avx2 algorithm for insertions too
  2025-08-20 14:47 [PATCH net-next 0/6] netfilter: updates for net-next Florian Westphal
                   ` (2 preceding siblings ...)
  2025-08-20 14:47 ` [PATCH net-next 3/6] netfilter: nft_set_pipapo_avx2: split lookup function in two parts Florian Westphal
@ 2025-08-20 14:47 ` Florian Westphal
  2025-08-20 15:45   ` Stefano Brivio
  2025-08-20 14:47 ` [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later Florian Westphal
  2025-08-20 14:47 ` [PATCH net-next 6/6] netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch Florian Westphal
  5 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2025-08-20 14:47 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo, Sebastian Andrzej Siewior

Always prefer the avx2 implementation if its available.
This greatly improves insertion performance (each insertion
checks if the new element would overlap with an existing one):

time nft -f - <<EOF
table ip pipapo {
	set s {
		typeof ip saddr . tcp dport
		flags interval
		size 800000
		elements = { 10.1.1.1 - 10.1.1.4 . 3996,
[.. 800k entries elided .. ]

before:
real    1m55.993s
user    0m2.505s
sys     1m53.296s

after:
real    0m42.586s
user    0m2.554s
sys     0m39.811s

Fold patch from Sebastian:

kernel_fpu_begin_mask()/ _end() remains in pipapo_get_avx2() where it is
required.

A followup patch will add local_lock_t to struct nft_pipapo_scratch in
order to protect the map pointer. The lock can not be acquired in
preemption disabled context which is what kernel_fpu_begin*() does.

Link: https://lore.kernel.org/netfilter-devel/20250818110213.1319982-2-bigeasy@linutronix.de/
Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c      | 45 +++++++++++++++++++++++++----
 net/netfilter/nft_set_pipapo_avx2.c |  8 ++---
 net/netfilter/nft_set_pipapo_avx2.h |  4 +++
 3 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 9a10251228fd..7ed9c5f0e233 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -397,7 +397,7 @@ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules,
 }
 
 /**
- * pipapo_get() - Get matching element reference given key data
+ * pipapo_get_slow() - Get matching element reference given key data
  * @m:		storage containing the set elements
  * @data:	Key data to be matched against existing elements
  * @genmask:	If set, check that element is active in given genmask
@@ -414,9 +414,9 @@ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules,
  *
  * Return: pointer to &struct nft_pipapo_elem on match, NULL otherwise.
  */
-static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m,
-					  const u8 *data, u8 genmask,
-					  u64 tstamp)
+static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m,
+					       const u8 *data, u8 genmask,
+					       u64 tstamp)
 {
 	struct nft_pipapo_scratch *scratch;
 	unsigned long *res_map, *fill_map;
@@ -502,6 +502,41 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m,
 	return NULL;
 }
 
+/**
+ * pipapo_get() - Get matching element reference given key data
+ * @m:		Storage containing the set elements
+ * @data:	Key data to be matched against existing elements
+ * @genmask:	If set, check that element is active in given genmask
+ * @tstamp:	Timestamp to check for expired elements
+ *
+ * This is a dispatcher function, either calling out the generic C
+ * implementation or, if available, the AVX2 one.
+ * This helper is only called from the control plane, with either RCU
+ * read lock or transaction mutex held.
+ *
+ * Return: pointer to &struct nft_pipapo_elem on match, NULL otherwise.
+ */
+static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m,
+					  const u8 *data, u8 genmask,
+					  u64 tstamp)
+{
+	struct nft_pipapo_elem *e;
+
+	local_bh_disable();
+
+#if defined(CONFIG_X86_64) && !defined(CONFIG_UML)
+	if (boot_cpu_has(X86_FEATURE_AVX2) && boot_cpu_has(X86_FEATURE_AVX) &&
+	    irq_fpu_usable()) {
+		e = pipapo_get_avx2(m, data, genmask, tstamp);
+		local_bh_enable();
+		return e;
+	}
+#endif
+	e = pipapo_get_slow(m, data, genmask, tstamp);
+	local_bh_enable();
+	return e;
+}
+
 /**
  * nft_pipapo_lookup() - Dataplane fronted for main lookup function
  * @net:	Network namespace
@@ -523,7 +558,7 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
 	const struct nft_pipapo_elem *e;
 
 	m = rcu_dereference(priv->match);
-	e = pipapo_get(m, (const u8 *)key, genmask, get_jiffies_64());
+	e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64());
 
 	return e ? &e->ext : NULL;
 }
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 994a2ad2d9b1..028c11724b42 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1149,9 +1149,9 @@ static inline void pipapo_resmap_init_avx2(const struct nft_pipapo_match *m, uns
  *
  * Return: pointer to &struct nft_pipapo_elem on match, NULL otherwise.
  */
-static struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
-					       const u8 *data, u8 genmask,
-					       u64 tstamp)
+struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
+					const u8 *data, u8 genmask,
+					u64 tstamp)
 {
 	struct nft_pipapo_scratch *scratch;
 	const struct nft_pipapo_field *f;
@@ -1261,7 +1261,7 @@ static struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
  *
  * This function is called from the data path.  It will search for
  * an element matching the given key in the current active copy using
- * the AVX2 routines if the fpu is usable or fall back to the generic
+ * the AVX2 routines if the FPU is usable or fall back to the generic
  * implementation of the algorithm otherwise.
  *
  * Return: nftables API extension pointer or NULL if no match.
diff --git a/net/netfilter/nft_set_pipapo_avx2.h b/net/netfilter/nft_set_pipapo_avx2.h
index dbb6aaca8a7a..c2999b63da3f 100644
--- a/net/netfilter/nft_set_pipapo_avx2.h
+++ b/net/netfilter/nft_set_pipapo_avx2.h
@@ -5,8 +5,12 @@
 #include <asm/fpu/xstate.h>
 #define NFT_PIPAPO_ALIGN	(XSAVE_YMM_SIZE / BITS_PER_BYTE)
 
+struct nft_pipapo_match;
 bool nft_pipapo_avx2_estimate(const struct nft_set_desc *desc, u32 features,
 			      struct nft_set_estimate *est);
+struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
+					const u8 *data, u8 genmask,
+					u64 tstamp);
 #endif /* defined(CONFIG_X86_64) && !defined(CONFIG_UML) */
 
 #endif /* _NFT_SET_PIPAPO_AVX2_H */
-- 
2.49.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later.
  2025-08-20 14:47 [PATCH net-next 0/6] netfilter: updates for net-next Florian Westphal
                   ` (3 preceding siblings ...)
  2025-08-20 14:47 ` [PATCH net-next 4/6] netfilter: nft_set_pipapo: use avx2 algorithm for insertions too Florian Westphal
@ 2025-08-20 14:47 ` Florian Westphal
  2025-08-20 15:44   ` Stefano Brivio
  2025-08-20 14:47 ` [PATCH net-next 6/6] netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch Florian Westphal
  5 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2025-08-20 14:47 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The struct nft_pipapo_scratch is allocated, then aligned to the required
alignment and difference (in bytes) is then saved in align_off. The
aligned pointer is used later.
While this works, it gets complicated with all the extra checks if
all member before map are larger than the required alignment.

Instead of saving the aligned pointer, just save the returned pointer
and align the map pointer in nft_pipapo_lookup() before using it. The
alignment later on shouldn't be that expensive. With this change, the
align_off can be removed and the pointer can be passed to kfree() as is.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c      | 40 ++++++-----------------------
 net/netfilter/nft_set_pipapo.h      |  6 ++---
 net/netfilter/nft_set_pipapo_avx2.c |  8 +++---
 3 files changed, 14 insertions(+), 40 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 7ed9c5f0e233..96b7539f5506 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -418,8 +418,8 @@ static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m,
 					       const u8 *data, u8 genmask,
 					       u64 tstamp)
 {
+	unsigned long *res_map, *fill_map, *map;
 	struct nft_pipapo_scratch *scratch;
-	unsigned long *res_map, *fill_map;
 	const struct nft_pipapo_field *f;
 	bool map_index;
 	int i;
@@ -432,8 +432,9 @@ static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m,
 
 	map_index = scratch->map_index;
 
-	res_map  = scratch->map + (map_index ? m->bsize_max : 0);
-	fill_map = scratch->map + (map_index ? 0 : m->bsize_max);
+	map = NFT_PIPAPO_LT_ALIGN(&scratch->__map[0]);
+	res_map  = map + (map_index ? m->bsize_max : 0);
+	fill_map = map + (map_index ? 0 : m->bsize_max);
 
 	pipapo_resmap_init(m, res_map);
 
@@ -1171,22 +1172,17 @@ static void pipapo_map(struct nft_pipapo_match *m,
 }
 
 /**
- * pipapo_free_scratch() - Free per-CPU map at original (not aligned) address
+ * pipapo_free_scratch() - Free per-CPU map at original address
  * @m:		Matching data
  * @cpu:	CPU number
  */
 static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int cpu)
 {
 	struct nft_pipapo_scratch *s;
-	void *mem;
 
 	s = *per_cpu_ptr(m->scratch, cpu);
-	if (!s)
-		return;
 
-	mem = s;
-	mem -= s->align_off;
-	kvfree(mem);
+	kvfree(s);
 }
 
 /**
@@ -1203,11 +1199,8 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 
 	for_each_possible_cpu(i) {
 		struct nft_pipapo_scratch *scratch;
-#ifdef NFT_PIPAPO_ALIGN
-		void *scratch_aligned;
-		u32 align_off;
-#endif
-		scratch = kvzalloc_node(struct_size(scratch, map, bsize_max * 2) +
+
+		scratch = kvzalloc_node(struct_size(scratch, __map, bsize_max * 2) +
 					NFT_PIPAPO_ALIGN_HEADROOM,
 					GFP_KERNEL_ACCOUNT, cpu_to_node(i));
 		if (!scratch) {
@@ -1222,23 +1215,6 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 		}
 
 		pipapo_free_scratch(clone, i);
-
-#ifdef NFT_PIPAPO_ALIGN
-		/* Align &scratch->map (not the struct itself): the extra
-		 * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node()
-		 * above guarantee we can waste up to those bytes in order
-		 * to align the map field regardless of its offset within
-		 * the struct.
-		 */
-		BUILD_BUG_ON(offsetof(struct nft_pipapo_scratch, map) > NFT_PIPAPO_ALIGN_HEADROOM);
-
-		scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
-		scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);
-		align_off = scratch_aligned - (void *)scratch;
-
-		scratch = scratch_aligned;
-		scratch->align_off = align_off;
-#endif
 		*per_cpu_ptr(clone->scratch, i) = scratch;
 	}
 
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index 4a2ff85ce1c4..e10cdbaa65d8 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -125,13 +125,11 @@ struct nft_pipapo_field {
 /**
  * struct nft_pipapo_scratch - percpu data used for lookup and matching
  * @map_index:	Current working bitmap index, toggled between field matches
- * @align_off:	Offset to get the originally allocated address
- * @map:	store partial matching results during lookup
+ * @__map:	store partial matching results during lookup
  */
 struct nft_pipapo_scratch {
 	u8 map_index;
-	u32 align_off;
-	unsigned long map[];
+	unsigned long __map[];
 };
 
 /**
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 028c11724b42..f0d8c796d731 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1155,7 +1155,7 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
 {
 	struct nft_pipapo_scratch *scratch;
 	const struct nft_pipapo_field *f;
-	unsigned long *res, *fill;
+	unsigned long *res, *fill, *map;
 	bool map_index;
 	int i;
 
@@ -1164,9 +1164,9 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
 		return NULL;
 
 	map_index = scratch->map_index;
-
-	res  = scratch->map + (map_index ? m->bsize_max : 0);
-	fill = scratch->map + (map_index ? 0 : m->bsize_max);
+	map = NFT_PIPAPO_LT_ALIGN(&scratch->__map[0]);
+	res  = map + (map_index ? m->bsize_max : 0);
+	fill = map + (map_index ? 0 : m->bsize_max);
 
 	pipapo_resmap_init_avx2(m, res);
 
-- 
2.49.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 6/6] netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch
  2025-08-20 14:47 [PATCH net-next 0/6] netfilter: updates for net-next Florian Westphal
                   ` (4 preceding siblings ...)
  2025-08-20 14:47 ` [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later Florian Westphal
@ 2025-08-20 14:47 ` Florian Westphal
  5 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2025-08-20 14:47 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

nft_pipapo_scratch is a per-CPU variable and relies on disabled BH for
its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.

Add a local_lock_t to the data structure and use local_lock_nested_bh() for
locking. This change adds only lockdep coverage and does not alter the
functional behaviour for !PREEMPT_RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c      | 5 +++++
 net/netfilter/nft_set_pipapo.h      | 2 ++
 net/netfilter/nft_set_pipapo_avx2.c | 4 ++++
 3 files changed, 11 insertions(+)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 96b7539f5506..b385cfcf886f 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -429,6 +429,7 @@ static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m,
 	scratch = *raw_cpu_ptr(m->scratch);
 	if (unlikely(!scratch))
 		goto out;
+	__local_lock_nested_bh(&scratch->bh_lock);
 
 	map_index = scratch->map_index;
 
@@ -465,6 +466,7 @@ static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m,
 				  last);
 		if (b < 0) {
 			scratch->map_index = map_index;
+			__local_unlock_nested_bh(&scratch->bh_lock);
 			local_bh_enable();
 
 			return NULL;
@@ -484,6 +486,7 @@ static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m,
 			 * *next* bitmap (not initial) for the next packet.
 			 */
 			scratch->map_index = map_index;
+			__local_unlock_nested_bh(&scratch->bh_lock);
 			local_bh_enable();
 			return e;
 		}
@@ -498,6 +501,7 @@ static struct nft_pipapo_elem *pipapo_get_slow(const struct nft_pipapo_match *m,
 		data += NFT_PIPAPO_GROUPS_PADDING(f);
 	}
 
+	__local_unlock_nested_bh(&scratch->bh_lock);
 out:
 	local_bh_enable();
 	return NULL;
@@ -1215,6 +1219,7 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 		}
 
 		pipapo_free_scratch(clone, i);
+		local_lock_init(&scratch->bh_lock);
 		*per_cpu_ptr(clone->scratch, i) = scratch;
 	}
 
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index e10cdbaa65d8..eaab422aa56a 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -124,10 +124,12 @@ struct nft_pipapo_field {
 
 /**
  * struct nft_pipapo_scratch - percpu data used for lookup and matching
+ * @bh_lock:    PREEMPT_RT local spinlock
  * @map_index:	Current working bitmap index, toggled between field matches
  * @__map:	store partial matching results during lookup
  */
 struct nft_pipapo_scratch {
+	local_lock_t bh_lock;
 	u8 map_index;
 	unsigned long __map[];
 };
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index f0d8c796d731..29326f3fcaf3 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1163,6 +1163,7 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
 	if (unlikely(!scratch))
 		return NULL;
 
+	__local_lock_nested_bh(&scratch->bh_lock);
 	map_index = scratch->map_index;
 	map = NFT_PIPAPO_LT_ALIGN(&scratch->__map[0]);
 	res  = map + (map_index ? m->bsize_max : 0);
@@ -1228,6 +1229,7 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
 		if (ret < 0) {
 			scratch->map_index = map_index;
 			kernel_fpu_end();
+			__local_unlock_nested_bh(&scratch->bh_lock);
 			return NULL;
 		}
 
@@ -1241,6 +1243,7 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
 
 			scratch->map_index = map_index;
 			kernel_fpu_end();
+			__local_unlock_nested_bh(&scratch->bh_lock);
 			return e;
 		}
 
@@ -1250,6 +1253,7 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
 	}
 
 	kernel_fpu_end();
+	__local_unlock_nested_bh(&scratch->bh_lock);
 	return NULL;
 }
 
-- 
2.49.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later.
  2025-08-20 14:47 ` [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later Florian Westphal
@ 2025-08-20 15:44   ` Stefano Brivio
  2025-08-20 16:01     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2025-08-20 15:44 UTC (permalink / raw)
  To: Florian Westphal, Sebastian Andrzej Siewior
  Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
	Jakub Kicinski, netfilter-devel, pablo

On Wed, 20 Aug 2025 16:47:37 +0200
Florian Westphal <fw@strlen.de> wrote:

> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> The struct nft_pipapo_scratch is allocated, then aligned to the required
> alignment and difference (in bytes) is then saved in align_off. The
> aligned pointer is used later.
> While this works, it gets complicated with all the extra checks if
> all member before map are larger than the required alignment.
> 
> Instead of saving the aligned pointer, just save the returned pointer
> and align the map pointer in nft_pipapo_lookup() before using it. The
> alignment later on shouldn't be that expensive.

The cost of doing the alignment later was the very reason why I added
this whole dance in the first place though. Did you check packet
matching rates before and after this?

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 4/6] netfilter: nft_set_pipapo: use avx2 algorithm for insertions too
  2025-08-20 14:47 ` [PATCH net-next 4/6] netfilter: nft_set_pipapo: use avx2 algorithm for insertions too Florian Westphal
@ 2025-08-20 15:45   ` Stefano Brivio
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Brivio @ 2025-08-20 15:45 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
	Jakub Kicinski, netfilter-devel, pablo, Sebastian Andrzej Siewior

On Wed, 20 Aug 2025 16:47:36 +0200
Florian Westphal <fw@strlen.de> wrote:

> Always prefer the avx2 implementation if its available.
> This greatly improves insertion performance (each insertion
> checks if the new element would overlap with an existing one):
> 
> time nft -f - <<EOF
> table ip pipapo {
> 	set s {
> 		typeof ip saddr . tcp dport
> 		flags interval
> 		size 800000
> 		elements = { 10.1.1.1 - 10.1.1.4 . 3996,
> [.. 800k entries elided .. ]
> 
> before:
> real    1m55.993s
> user    0m2.505s
> sys     1m53.296s
> 
> after:
> real    0m42.586s
> user    0m2.554s
> sys     0m39.811s
> 
> Fold patch from Sebastian:
> 
> kernel_fpu_begin_mask()/ _end() remains in pipapo_get_avx2() where it is
> required.
> 
> A followup patch will add local_lock_t to struct nft_pipapo_scratch in
> order to protect the map pointer. The lock can not be acquired in
> preemption disabled context which is what kernel_fpu_begin*() does.
> 
> Link: https://lore.kernel.org/netfilter-devel/20250818110213.1319982-2-bigeasy@linutronix.de/
> Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later.
  2025-08-20 15:44   ` Stefano Brivio
@ 2025-08-20 16:01     ` Sebastian Andrzej Siewior
  2025-08-20 16:15       ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-20 16:01 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Florian Westphal, netdev, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jakub Kicinski, netfilter-devel, pablo

On 2025-08-20 17:44:01 [+0200], Stefano Brivio wrote:
> On Wed, 20 Aug 2025 16:47:37 +0200
> Florian Westphal <fw@strlen.de> wrote:
> 
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > 
> > The struct nft_pipapo_scratch is allocated, then aligned to the required
> > alignment and difference (in bytes) is then saved in align_off. The
> > aligned pointer is used later.
> > While this works, it gets complicated with all the extra checks if
> > all member before map are larger than the required alignment.
> > 
> > Instead of saving the aligned pointer, just save the returned pointer
> > and align the map pointer in nft_pipapo_lookup() before using it. The
> > alignment later on shouldn't be that expensive.
> 
> The cost of doing the alignment later was the very reason why I added
> this whole dance in the first place though. Did you check packet
> matching rates before and after this?

how? There was something under selftest which I used to ensure it still
works.
On x86 it should be two additional opcodes (and + lea) and that might be
interleaved. Do you remember a rule of thumb of your improvement?
As far as I remember the alignment code expects that the "hole" at the
begin does not exceed a certain size and the lock there exceeds it.

Sebastian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later.
  2025-08-20 16:01     ` Sebastian Andrzej Siewior
@ 2025-08-20 16:15       ` Stefano Brivio
  2025-08-20 16:29         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2025-08-20 16:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Florian Westphal, netdev, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jakub Kicinski, netfilter-devel, pablo

On Wed, 20 Aug 2025 18:01:14 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2025-08-20 17:44:01 [+0200], Stefano Brivio wrote:
> > On Wed, 20 Aug 2025 16:47:37 +0200
> > Florian Westphal <fw@strlen.de> wrote:
> >   
> > > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > 
> > > The struct nft_pipapo_scratch is allocated, then aligned to the required
> > > alignment and difference (in bytes) is then saved in align_off. The
> > > aligned pointer is used later.
> > > While this works, it gets complicated with all the extra checks if
> > > all member before map are larger than the required alignment.
> > > 
> > > Instead of saving the aligned pointer, just save the returned pointer
> > > and align the map pointer in nft_pipapo_lookup() before using it. The
> > > alignment later on shouldn't be that expensive.  
> > 
> > The cost of doing the alignment later was the very reason why I added
> > this whole dance in the first place though. Did you check packet
> > matching rates before and after this?  
> 
> how? There was something under selftest which I used to ensure it still
> works.

tools/testing/selftests/net/netfilter/nft_concat_range.sh, you should add
"performance" to $TESTS (or just do TESTS=perfomance), they are normally
skipped because they take a while.

> On x86 it should be two additional opcodes (and + lea) and that might be
> interleaved.

I think so too, but I wonder if that has a much bigger effect on
subsequent cache loads rather than just those two instructions.

> Do you remember a rule of thumb of your improvement?

I added this right away with the initial implementation of the
vectorised version, so I didn't really check the difference or record
it anywhere, but I vaguely remember having something similar to the
version with your current change in an earlier draft and it was
something like 20 cycles difference with the 'net,port' test with 1000
entries... maybe, I'm really not sure anymore.

I'm especially not sure if my old draft was equivalent to this change.
I reported the original figures (with the alignment done in advance) in
the commit message of 7400b063969b ("nft_set_pipapo: Introduce
AVX2-based lookup implementation").

> As far as I remember the alignment code expects that the "hole" at the
> begin does not exceed a certain size and the lock there exceeds it.

I think you're right. But again, the alignment itself should be fast,
that's not what I'm concerned about.

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later.
  2025-08-20 16:15       ` Stefano Brivio
@ 2025-08-20 16:29         ` Sebastian Andrzej Siewior
  2025-08-20 16:34           ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-20 16:29 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Florian Westphal, netdev, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jakub Kicinski, netfilter-devel, pablo

On 2025-08-20 18:15:36 [+0200], Stefano Brivio wrote:
> > As far as I remember the alignment code expects that the "hole" at the
> > begin does not exceed a certain size and the lock there exceeds it.
> 
> I think you're right. But again, the alignment itself should be fast,
> that's not what I'm concerned about.

Are we good are do you want me to do the performance check, that you
suggested?

Sebastian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later.
  2025-08-20 16:29         ` Sebastian Andrzej Siewior
@ 2025-08-20 16:34           ` Stefano Brivio
  2025-08-20 21:04             ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2025-08-20 16:34 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Florian Westphal, netdev, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jakub Kicinski, netfilter-devel, pablo

On Wed, 20 Aug 2025 18:29:25 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2025-08-20 18:15:36 [+0200], Stefano Brivio wrote:
> > > As far as I remember the alignment code expects that the "hole" at the
> > > begin does not exceed a certain size and the lock there exceeds it.  
> > 
> > I think you're right. But again, the alignment itself should be fast,
> > that's not what I'm concerned about.  
> 
> Are we good are do you want me to do the performance check, that you
> suggested?

I think it would be good if you could give that a try (I don't have a
stable setup to run that at hand right now, sorry). It shouldn't take
long.

That's because I'm not sure how cached accesses are affected by this
(see just above in my previous email).

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later.
  2025-08-20 16:34           ` Stefano Brivio
@ 2025-08-20 21:04             ` Stefano Brivio
  2025-08-21  6:37               ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2025-08-20 21:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Florian Westphal, netdev, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jakub Kicinski, netfilter-devel, pablo

On Wed, 20 Aug 2025 18:34:51 +0200
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Wed, 20 Aug 2025 18:29:25 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > On 2025-08-20 18:15:36 [+0200], Stefano Brivio wrote:  
> > > > As far as I remember the alignment code expects that the "hole" at the
> > > > begin does not exceed a certain size and the lock there exceeds it.    
> > > 
> > > I think you're right. But again, the alignment itself should be fast,
> > > that's not what I'm concerned about.    
> > 
> > Are we good are do you want me to do the performance check, that you
> > suggested?  
> 
> I think it would be good if you could give that a try (I don't have a
> stable setup to run that at hand right now, sorry). It shouldn't take
> long.

Never mind, I just found a moment to run that for you. Before your
change (net-next a couple of weeks old -- I didn't realise that Florian
introduced a 'nft_concat_range_perf.sh' meanwhile):

---
# ./nft_concat_range_perf.sh 
TEST: performance
  net,port                             28s                              [ OK ]
    baseline (drop from netdev hook):              26079726pps
    baseline hash (non-ranged entries):            18795587pps
    baseline rbtree (match on first field only):    9461059pps
    set with  1000 full, ranged entries:           14358957pps
  port,net                             22s                              [ OK ]
    baseline (drop from netdev hook):              26183255pps
    baseline hash (non-ranged entries):            18738336pps
    baseline rbtree (match on first field only):   12578272pps
    set with   100 full, ranged entries:           15277135pps
  net6,port                            28s                              [ OK ]
    baseline (drop from netdev hook):              25094125pps
    baseline hash (non-ranged entries):            17011489pps
    baseline rbtree (match on first field only):    6964647pps
    set with  1000 full, ranged entries:           11721714pps
  port,proto                          304s                              [ OK ]
    baseline (drop from netdev hook):              26174580pps
    baseline hash (non-ranged entries):            19252254pps
    baseline rbtree (match on first field only):    8516771pps
    set with 30000 full, ranged entries:            6064576pps
  net6,port,mac                        23s                              [ OK ]
    baseline (drop from netdev hook):              24996893pps
    baseline hash (non-ranged entries):            14526917pps
    baseline rbtree (match on first field only):   12596905pps
    set with    10 full, ranged entries:           12089867pps
  net6,port,mac,proto                  35s                              [ OK ]
    baseline (drop from netdev hook):              24874223pps
    baseline hash (non-ranged entries):            14352580pps
    baseline rbtree (match on first field only):    6884754pps
    set with  1000 full, ranged entries:            8787067pps
  net,mac                              29s                              [ OK ]
    baseline (drop from netdev hook):              25956434pps
    baseline hash (non-ranged entries):            17166976pps
    baseline rbtree (match on first field only):    9423341pps
    set with  1000 full, ranged entries:           12150579pps
---

after your change:

---
# ./nft_concat_range_perf.sh 
TEST: performance
  net,port                             27s                              [ OK ]
    baseline (drop from netdev hook):              27212033pps
    baseline hash (non-ranged entries):            19494836pps
    baseline rbtree (match on first field only):    9669798pps
    set with  1000 full, ranged entries:           14931543pps
  port,net                             23s                              [ OK ]
    baseline (drop from netdev hook):              27085267pps
    baseline hash (non-ranged entries):            19642549pps
    baseline rbtree (match on first field only):   12852031pps
    set with   100 full, ranged entries:           15882440pps
  net6,port                            29s                              [ OK ]
    baseline (drop from netdev hook):              26134468pps
    baseline hash (non-ranged entries):            17732410pps
    baseline rbtree (match on first field only):    7044812pps
    set with  1000 full, ranged entries:           11670109pps
  port,proto                          300s                              [ OK ]
    baseline (drop from netdev hook):              27227915pps
    baseline hash (non-ranged entries):            20266609pps
    baseline rbtree (match on first field only):    8662566pps
    set with 30000 full, ranged entries:            6147235pps
  net6,port,mac                        23s                              [ OK ]
    baseline (drop from netdev hook):              26001705pps
    baseline hash (non-ranged entries):            15448524pps
    baseline rbtree (match on first field only):   12867457pps
    set with    10 full, ranged entries:           12140558pps
  net6,port,mac,proto                  34s                              [ OK ]
    baseline (drop from netdev hook):              25485866pps
    baseline hash (non-ranged entries):            14794412pps
    baseline rbtree (match on first field only):    6929897pps
    set with  1000 full, ranged entries:            8754555pps
  net,mac                              28s                              [ OK ]
    baseline (drop from netdev hook):              27095870pps
    baseline hash (non-ranged entries):            17848010pps
    baseline rbtree (match on first field only):    9576292pps
    set with  1000 full, ranged entries:           12568702pps
---

it's a single run and not exactly from the same baseline (you see that
the baseline actually improved), but I'd say it's enough to be
confident that the change doesn't affect matching rate significantly,
so:

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

...thanks for making this simpler!

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later.
  2025-08-20 21:04             ` Stefano Brivio
@ 2025-08-21  6:37               ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-21  6:37 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: Florian Westphal, netdev, Paolo Abeni, David S. Miller,
	Eric Dumazet, Jakub Kicinski, netfilter-devel, pablo

On 2025-08-20 23:04:45 [+0200], Stefano Brivio wrote:
> On Wed, 20 Aug 2025 18:34:51 +0200
> 
> it's a single run and not exactly from the same baseline (you see that
> the baseline actually improved), but I'd say it's enough to be
> confident that the change doesn't affect matching rate significantly,
> so:
> 
> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> 
> ...thanks for making this simpler!

Thank you for testing in the meantime!

Sebastian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 1/6] netfilter: ctnetlink: remove refcounting in dying list dumping
  2025-08-20 14:47 ` [PATCH net-next 1/6] netfilter: ctnetlink: remove refcounting in dying list dumping Florian Westphal
@ 2025-08-22  0:30   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 16+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-08-22  0:30 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, pabeni, davem, edumazet, kuba, netfilter-devel, pablo

Hello:

This series was applied to netdev/net-next.git (main)
by Florian Westphal <fw@strlen.de>:

On Wed, 20 Aug 2025 16:47:33 +0200 you wrote:
> There is no need to keep the object alive via refcount, use a cookie and
> then use that as the skip hint for dump resumption.
> 
> Unlike the two earlier, similar patches in this file, this is a cleanup
> without intended side effects.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> 
> [...]

Here is the summary with links:
  - [net-next,1/6] netfilter: ctnetlink: remove refcounting in dying list dumping
    https://git.kernel.org/netdev/net-next/c/08d07f25fd5e
  - [net-next,2/6] netfilter: nft_set_pipapo_avx2: Drop the comment regarding protection
    https://git.kernel.org/netdev/net-next/c/d11b26402a33
  - [net-next,3/6] netfilter: nft_set_pipapo_avx2: split lookup function in two parts
    https://git.kernel.org/netdev/net-next/c/416e53e39516
  - [net-next,4/6] netfilter: nft_set_pipapo: use avx2 algorithm for insertions too
    https://git.kernel.org/netdev/net-next/c/84c1da7b38d9
  - [net-next,5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later.
    https://git.kernel.org/netdev/net-next/c/6aa67d5706f0
  - [net-next,6/6] netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch
    https://git.kernel.org/netdev/net-next/c/456010c8b99e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-08-22  0:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-20 14:47 [PATCH net-next 0/6] netfilter: updates for net-next Florian Westphal
2025-08-20 14:47 ` [PATCH net-next 1/6] netfilter: ctnetlink: remove refcounting in dying list dumping Florian Westphal
2025-08-22  0:30   ` patchwork-bot+netdevbpf
2025-08-20 14:47 ` [PATCH net-next 2/6] netfilter: nft_set_pipapo_avx2: Drop the comment regarding protection Florian Westphal
2025-08-20 14:47 ` [PATCH net-next 3/6] netfilter: nft_set_pipapo_avx2: split lookup function in two parts Florian Westphal
2025-08-20 14:47 ` [PATCH net-next 4/6] netfilter: nft_set_pipapo: use avx2 algorithm for insertions too Florian Westphal
2025-08-20 15:45   ` Stefano Brivio
2025-08-20 14:47 ` [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real pointer, adjust later Florian Westphal
2025-08-20 15:44   ` Stefano Brivio
2025-08-20 16:01     ` Sebastian Andrzej Siewior
2025-08-20 16:15       ` Stefano Brivio
2025-08-20 16:29         ` Sebastian Andrzej Siewior
2025-08-20 16:34           ` Stefano Brivio
2025-08-20 21:04             ` Stefano Brivio
2025-08-21  6:37               ` Sebastian Andrzej Siewior
2025-08-20 14:47 ` [PATCH net-next 6/6] netfilter: nft_set_pipapo: Use nested-BH locking for nft_pipapo_scratch Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).