* [PATCH net 1/4] netfilter: nf_flow_table_ip: Introduce nf_flow_vlan_push()
2026-03-04 17:29 [PATCH net 0/4] netfilter: updates for net Florian Westphal
@ 2026-03-04 17:29 ` Florian Westphal
2026-03-04 17:29 ` [PATCH net 2/4] netfilter: nf_tables: unconditionally bump set->nelems before insertion Florian Westphal
` (4 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2026-03-04 17:29 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Eric Woudstra <ericwouds@gmail.com>
With double vlan tagged packets in the fastpath, the following error is seen:
skb_vlan_push got skb with skb->data not at mac header (offset 18)
Introduce nf_flow_vlan_push(), which can correctly push the inner vlan
in the fastpath. It is closely modelled on the existing nf_flow_pppoe_push().
Fixes: c653d5a78f34 ("netfilter: flowtable: inline vlan encapsulation in xmit path")
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nf_flow_table_ip.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 3fdb10d9bf7f..e65c8148688e 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -544,6 +544,27 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
return 1;
}
+static int nf_flow_vlan_push(struct sk_buff *skb, __be16 proto, u16 id)
+{
+ if (skb_vlan_tag_present(skb)) {
+ struct vlan_hdr *vhdr;
+
+ if (skb_cow_head(skb, VLAN_HLEN))
+ return -1;
+
+ __skb_push(skb, VLAN_HLEN);
+ skb_reset_network_header(skb);
+
+ vhdr = (struct vlan_hdr *)(skb->data);
+ vhdr->h_vlan_TCI = htons(id);
+ vhdr->h_vlan_encapsulated_proto = skb->protocol;
+ skb->protocol = proto;
+ } else {
+ __vlan_hwaccel_put_tag(skb, proto, id);
+ }
+ return 0;
+}
+
static int nf_flow_pppoe_push(struct sk_buff *skb, u16 id)
{
int data_len = skb->len + sizeof(__be16);
@@ -738,8 +759,8 @@ static int nf_flow_encap_push(struct sk_buff *skb,
switch (tuple->encap[i].proto) {
case htons(ETH_P_8021Q):
case htons(ETH_P_8021AD):
- if (skb_vlan_push(skb, tuple->encap[i].proto,
- tuple->encap[i].id) < 0)
+ if (nf_flow_vlan_push(skb, tuple->encap[i].proto,
+ tuple->encap[i].id) < 0)
return -1;
break;
case htons(ETH_P_PPP_SES):
--
2.52.0
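The helper's tag handling can be sketched as a small userspace model: the first tag lands in the skb's hwaccel slot, a second one is written as an inner VLAN header in front of the payload. This is a simplified illustration (host byte order, fixed buffer, hypothetical names), not kernel code:

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_HLEN 4

/* Simplified skb: a front-growing buffer plus one out-of-band hwaccel tag. */
struct model_skb {
	uint8_t buf[64];
	int data;                 /* offset of skb->data within buf */
	uint16_t protocol;        /* skb->protocol, host byte order here */
	int tag_present;          /* skb_vlan_tag_present() */
	uint16_t tag_proto, tag_id;
};

/* Mirrors the patch's nf_flow_vlan_push() logic: first tag goes to the
 * hwaccel slot, a second one is written as an inner VLAN header
 * carrying the new id, with the old skb->protocol as encapsulated proto. */
static int model_vlan_push(struct model_skb *skb, uint16_t proto, uint16_t id)
{
	if (skb->tag_present) {
		if (skb->data < VLAN_HLEN)
			return -1;              /* like skb_cow_head() failing */
		skb->data -= VLAN_HLEN;         /* __skb_push(skb, VLAN_HLEN) */
		skb->buf[skb->data]     = id >> 8;              /* h_vlan_TCI */
		skb->buf[skb->data + 1] = id & 0xff;
		skb->buf[skb->data + 2] = skb->protocol >> 8;   /* encap proto */
		skb->buf[skb->data + 3] = skb->protocol & 0xff;
		skb->protocol = proto;
	} else {
		skb->tag_present = 1;           /* __vlan_hwaccel_put_tag() */
		skb->tag_proto = proto;
		skb->tag_id = id;
	}
	return 0;
}
```

After two pushes, the hwaccel slot still holds the first id while the packet header carries the second; this is the ordering discussed later in the thread.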
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH net 2/4] netfilter: nf_tables: unconditionally bump set->nelems before insertion
2026-03-04 17:29 [PATCH net 0/4] netfilter: updates for net Florian Westphal
2026-03-04 17:29 ` [PATCH net 1/4] netfilter: nf_flow_table_ip: Introduce nf_flow_vlan_push() Florian Westphal
@ 2026-03-04 17:29 ` Florian Westphal
2026-03-04 17:29 ` [PATCH net 3/4] netfilter: nf_tables: clone set on flush only Florian Westphal
` (3 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2026-03-04 17:29 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Pablo Neira Ayuso <pablo@netfilter.org>
If the set is full, a new element gets published and then removed
without waiting for the RCU grace period, while an RCU reader can
already be walking over it.
To address this issue, add the element transaction even if the set is full,
but toggle the set_full flag to report -ENFILE so that the abort path
safely unwinds the set to its previous state.
For element updates, decrement set->nelems to restore it.
A simpler fix is to call synchronize_rcu() in the error path.
However, with a large batch adding elements to an already maxed-out set,
this could cause a noticeable slowdown of such batches.
Fixes: 35d0ac9070ef ("netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL")
Reported-by: Inseo An <y0un9sa@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nf_tables_api.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index fd7f7e4e2a43..df67932d3e09 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7170,6 +7170,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
struct nft_data_desc desc;
enum nft_registers dreg;
struct nft_trans *trans;
+ bool set_full = false;
u64 expiration;
u64 timeout;
int err, i;
@@ -7461,10 +7462,18 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
if (err < 0)
goto err_elem_free;
+ if (!(flags & NFT_SET_ELEM_CATCHALL)) {
+ unsigned int max = nft_set_maxsize(set), nelems;
+
+ nelems = atomic_inc_return(&set->nelems);
+ if (nelems > max)
+ set_full = true;
+ }
+
trans = nft_trans_elem_alloc(ctx, NFT_MSG_NEWSETELEM, set);
if (trans == NULL) {
err = -ENOMEM;
- goto err_elem_free;
+ goto err_set_size;
}
ext->genmask = nft_genmask_cur(ctx->net);
@@ -7516,7 +7525,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
ue->priv = elem_priv;
nft_trans_commit_list_add_elem(ctx->net, trans);
- goto err_elem_free;
+ goto err_set_size;
}
}
}
@@ -7534,23 +7543,16 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
goto err_element_clash;
}
- if (!(flags & NFT_SET_ELEM_CATCHALL)) {
- unsigned int max = nft_set_maxsize(set);
-
- if (!atomic_add_unless(&set->nelems, 1, max)) {
- err = -ENFILE;
- goto err_set_full;
- }
- }
-
nft_trans_container_elem(trans)->elems[0].priv = elem.priv;
nft_trans_commit_list_add_elem(ctx->net, trans);
- return 0;
-err_set_full:
- nft_setelem_remove(ctx->net, set, elem.priv);
+ return set_full ? -ENFILE : 0;
+
err_element_clash:
kfree(trans);
+err_set_size:
+ if (!(flags & NFT_SET_ELEM_CATCHALL))
+ atomic_dec(&set->nelems);
err_elem_free:
nf_tables_set_elem_destroy(ctx, set, elem.priv);
err_parse_data:
--
2.52.0
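The bump-before-insert scheme from this patch can be modelled in plain C (userspace sketch with C11 atomics; names and return codes are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdatomic.h>

struct model_set {
	atomic_uint nelems;
	unsigned int max;
};

/* Always increment first and remember whether the limit was exceeded;
 * any later error path only has to decrement again.  The element is
 * never published and then yanked back, so no reader can walk a freed
 * entry.  'fail' simulates a later allocation failure; return value 1
 * stands in for -ENFILE. */
static int model_add_elem(struct model_set *set, int fail)
{
	int set_full = 0;

	if (atomic_fetch_add(&set->nelems, 1) + 1 > set->max)
		set_full = 1;

	if (fail) {
		atomic_fetch_sub(&set->nelems, 1);  /* err_set_size unwind */
		return -1;
	}
	return set_full ? 1 : 0;
}
```

On a full set, the caller still inserted the element and merely reports the overflow; the abort path (not modelled here) is what restores the previous state.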
* [PATCH net 3/4] netfilter: nf_tables: clone set on flush only
2026-03-04 17:29 [PATCH net 0/4] netfilter: updates for net Florian Westphal
2026-03-04 17:29 ` [PATCH net 1/4] netfilter: nf_flow_table_ip: Introduce nf_flow_vlan_push() Florian Westphal
2026-03-04 17:29 ` [PATCH net 2/4] netfilter: nf_tables: unconditionally bump set->nelems before insertion Florian Westphal
@ 2026-03-04 17:29 ` Florian Westphal
2026-03-04 17:29 ` [PATCH net 4/4] netfilter: nft_set_pipapo: split gc into unlink and reclaim phase Florian Westphal
` (2 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2026-03-04 17:29 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Pablo Neira Ayuso <pablo@netfilter.org>
Syzbot with fault injection triggered a failing GFP_KERNEL memory
allocation, which results in a WARN splat:
iter.err
WARNING: net/netfilter/nf_tables_api.c:845 at nft_map_deactivate+0x34e/0x3c0 net/netfilter/nf_tables_api.c:845, CPU#0: syz.0.17/5992
Modules linked in:
CPU: 0 UID: 0 PID: 5992 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026
RIP: 0010:nft_map_deactivate+0x34e/0x3c0 net/netfilter/nf_tables_api.c:845
Code: 8b 05 86 5a 4e 09 48 3b 84 24 a0 00 00 00 75 62 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc e8 63 6d fa f7 90 <0f> 0b 90 43
+80 7c 35 00 00 0f 85 23 fe ff ff e9 26 fe ff ff 89 d9
RSP: 0018:ffffc900045af780 EFLAGS: 00010293
RAX: ffffffff89ca45bd RBX: 00000000fffffff4 RCX: ffff888028111e40
RDX: 0000000000000000 RSI: 00000000fffffff4 RDI: 0000000000000000
RBP: ffffc900045af870 R08: 0000000000400dc0 R09: 00000000ffffffff
R10: dffffc0000000000 R11: fffffbfff1d141db R12: ffffc900045af7e0
R13: 1ffff920008b5f24 R14: dffffc0000000000 R15: ffffc900045af920
FS: 000055557a6a5500(0000) GS:ffff888125496000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb5ea271fc0 CR3: 000000003269e000 CR4: 00000000003526f0
Call Trace:
<TASK>
__nft_release_table+0xceb/0x11f0 net/netfilter/nf_tables_api.c:12115
nft_rcv_nl_event+0xc25/0xdb0 net/netfilter/nf_tables_api.c:12187
notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85
blocking_notifier_call_chain+0x6a/0x90 kernel/notifier.c:380
netlink_release+0x123b/0x1ad0 net/netlink/af_netlink.c:761
__sock_release net/socket.c:662 [inline]
sock_close+0xc3/0x240 net/socket.c:1455
Restrict the set clone to the flush set command in the preparation phase.
Add NFT_ITER_UPDATE_CLONE and use it for this purpose; update the rbtree
and pipapo backends to clone the set only when this iteration type is
used.
As for the existing NFT_ITER_UPDATE type, update the pipapo backend to
use the existing set clone if available, otherwise use the existing set
representation. After this update, there is no need to clone a set that
is being deleted, including bound anonymous sets.
An alternative approach to NFT_ITER_UPDATE_CLONE is to add a .clone
interface and call it from the flush set path.
Reported-by: syzbot+4924a0edc148e8b4b342@syzkaller.appspotmail.com
Fixes: 3f1d886cc7c3 ("netfilter: nft_set_pipapo: move cloning of match info to insert/removal path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_tables.h | 2 ++
net/netfilter/nf_tables_api.c | 10 +++++++++-
net/netfilter/nft_set_hash.c | 1 +
net/netfilter/nft_set_pipapo.c | 11 +++++++++--
net/netfilter/nft_set_rbtree.c | 8 +++++---
5 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 426534a711b0..ea6f29ad7888 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -320,11 +320,13 @@ static inline void *nft_elem_priv_cast(const struct nft_elem_priv *priv)
* @NFT_ITER_UNSPEC: unspecified, to catch errors
* @NFT_ITER_READ: read-only iteration over set elements
* @NFT_ITER_UPDATE: iteration under mutex to update set element state
+ * @NFT_ITER_UPDATE_CLONE: clone set before iteration under mutex to update element
*/
enum nft_iter_type {
NFT_ITER_UNSPEC,
NFT_ITER_READ,
NFT_ITER_UPDATE,
+ NFT_ITER_UPDATE_CLONE,
};
struct nft_set;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index df67932d3e09..058f7004cb2b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -833,6 +833,11 @@ static void nft_map_catchall_deactivate(const struct nft_ctx *ctx,
}
}
+/* Use NFT_ITER_UPDATE iterator even if this may be called from the preparation
+ * phase, the set clone might already exist from a previous command, or it might
+ * be a set that is going away and does not require a clone. The netns and
+ * netlink release paths also need to work on the live set.
+ */
static void nft_map_deactivate(const struct nft_ctx *ctx, struct nft_set *set)
{
struct nft_set_iter iter = {
@@ -7903,9 +7908,12 @@ static int nft_set_catchall_flush(const struct nft_ctx *ctx,
static int nft_set_flush(struct nft_ctx *ctx, struct nft_set *set, u8 genmask)
{
+ /* The set backend might need to clone the set, do it now from the
+ * preparation phase, use NFT_ITER_UPDATE_CLONE iterator type.
+ */
struct nft_set_iter iter = {
.genmask = genmask,
- .type = NFT_ITER_UPDATE,
+ .type = NFT_ITER_UPDATE_CLONE,
.fn = nft_setelem_flush,
};
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 739b992bde59..b0e571c8e3f3 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -374,6 +374,7 @@ static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set,
{
switch (iter->type) {
case NFT_ITER_UPDATE:
+ case NFT_ITER_UPDATE_CLONE:
/* only relevant for netlink dumps which use READ type */
WARN_ON_ONCE(iter->skip != 0);
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 7ef4b44471d3..c091898df710 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -2144,13 +2144,20 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
const struct nft_pipapo_match *m;
switch (iter->type) {
- case NFT_ITER_UPDATE:
+ case NFT_ITER_UPDATE_CLONE:
m = pipapo_maybe_clone(set);
if (!m) {
iter->err = -ENOMEM;
return;
}
-
+ nft_pipapo_do_walk(ctx, set, m, iter);
+ break;
+ case NFT_ITER_UPDATE:
+ if (priv->clone)
+ m = priv->clone;
+ else
+ m = rcu_dereference_protected(priv->match,
+ nft_pipapo_transaction_mutex_held(set));
nft_pipapo_do_walk(ctx, set, m, iter);
break;
case NFT_ITER_READ:
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 3f02e4478216..ee3d4f5b9ff7 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -861,13 +861,15 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
struct nft_rbtree *priv = nft_set_priv(set);
switch (iter->type) {
- case NFT_ITER_UPDATE:
- lockdep_assert_held(&nft_pernet(ctx->net)->commit_mutex);
-
+ case NFT_ITER_UPDATE_CLONE:
if (nft_array_may_resize(set) < 0) {
iter->err = -ENOMEM;
break;
}
+ fallthrough;
+ case NFT_ITER_UPDATE:
+ lockdep_assert_held(&nft_pernet(ctx->net)->commit_mutex);
+
nft_rbtree_do_walk(ctx, set, iter);
break;
case NFT_ITER_READ:
--
2.52.0
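The resulting walk selection can be reduced to a toy decision table (userspace sketch, hypothetical names): the clone-on-demand case only happens for the flush path, plain updates reuse an existing clone or fall back to the live set, and reads always use the live set.

```c
#include <assert.h>
#include <stdlib.h>

enum iter_type { ITER_READ, ITER_UPDATE, ITER_UPDATE_CLONE };

struct match { int dummy; };

struct set {
	struct match *live;    /* what the packet path sees */
	struct match *clone;   /* pending insertions/deletions, may be NULL */
};

/* Returns the match the walk should operate on.  ITER_UPDATE_CLONE
 * creates the clone on demand (pipapo_maybe_clone() in the patch);
 * a NULL return models -ENOMEM. */
static struct match *walk_match(struct set *s, enum iter_type t)
{
	switch (t) {
	case ITER_UPDATE_CLONE:
		if (!s->clone)
			s->clone = calloc(1, sizeof(*s->clone));
		return s->clone;
	case ITER_UPDATE:
		return s->clone ? s->clone : s->live;
	default: /* ITER_READ */
		return s->live;
	}
}
```

This mirrors the design choice of the patch: a set that is going away never needs a fresh clone, so only the flush command pays for one.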
* [PATCH net 4/4] netfilter: nft_set_pipapo: split gc into unlink and reclaim phase
2026-03-04 17:29 [PATCH net 0/4] netfilter: updates for net Florian Westphal
` (2 preceding siblings ...)
2026-03-04 17:29 ` [PATCH net 3/4] netfilter: nf_tables: clone set on flush only Florian Westphal
@ 2026-03-04 17:29 ` Florian Westphal
2026-03-04 21:57 ` [PATCH net 0/4] netfilter: updates for net Pablo Neira Ayuso
2026-03-05 12:21 ` Florian Westphal
5 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2026-03-04 17:29 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Yiming Qian reports a use-after-free in the pipapo set type:
Under a large number of expired elements, commit-time GC can run for a very
long time in a non-preemptible context, triggering soft lockup warnings and
RCU stall reports (local denial of service).
We must split GC into an unlink and a reclaim phase.
We cannot queue elements for freeing until pointers have been swapped.
Expired elements are still exposed to both the packet path and userspace
dumpers via the live copy of the data structure.
call_rcu() does not protect us: dump operations or element lookups starting
after call_rcu has fired can still observe the free'd element, unless the
commit phase has made enough progress to swap the clone and live pointers
before any new reader has picked up the old version.
This is a similar approach to the one used recently for the rbtree backend
in commit 35f83a75529a ("netfilter: nft_set_rbtree: don't gc elements on insert").
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_tables.h | 5 +++
net/netfilter/nf_tables_api.c | 5 ---
net/netfilter/nft_set_pipapo.c | 51 ++++++++++++++++++++++++++-----
net/netfilter/nft_set_pipapo.h | 2 ++
4 files changed, 50 insertions(+), 13 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index ea6f29ad7888..e2d2bfc1f989 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1863,6 +1863,11 @@ struct nft_trans_gc {
struct rcu_head rcu;
};
+static inline int nft_trans_gc_space(const struct nft_trans_gc *trans)
+{
+ return NFT_TRANS_GC_BATCHCOUNT - trans->count;
+}
+
static inline void nft_ctx_update(struct nft_ctx *ctx,
const struct nft_trans *trans)
{
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 058f7004cb2b..1862bd7fe804 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -10493,11 +10493,6 @@ static void nft_trans_gc_queue_work(struct nft_trans_gc *trans)
schedule_work(&trans_gc_work);
}
-static int nft_trans_gc_space(struct nft_trans_gc *trans)
-{
- return NFT_TRANS_GC_BATCHCOUNT - trans->count;
-}
-
struct nft_trans_gc *nft_trans_gc_queue_async(struct nft_trans_gc *gc,
unsigned int gc_seq, gfp_t gfp)
{
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index c091898df710..a34632ae6048 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1680,11 +1680,11 @@ static void nft_pipapo_gc_deactivate(struct net *net, struct nft_set *set,
}
/**
- * pipapo_gc() - Drop expired entries from set, destroy start and end elements
+ * pipapo_gc_scan() - Drop expired entries from set and link them to gc list
* @set: nftables API set representation
* @m: Matching data
*/
-static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
+static void pipapo_gc_scan(struct nft_set *set, struct nft_pipapo_match *m)
{
struct nft_pipapo *priv = nft_set_priv(set);
struct net *net = read_pnet(&set->net);
@@ -1697,6 +1697,8 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
if (!gc)
return;
+ list_add(&gc->list, &priv->gc_head);
+
while ((rules_f0 = pipapo_rules_same_key(m->f, first_rule))) {
union nft_pipapo_map_bucket rulemap[NFT_PIPAPO_MAX_FIELDS];
const struct nft_pipapo_field *f;
@@ -1724,9 +1726,13 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
* NFT_SET_ELEM_DEAD_BIT.
*/
if (__nft_set_elem_expired(&e->ext, tstamp)) {
- gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
- if (!gc)
- return;
+ if (!nft_trans_gc_space(gc)) {
+ gc = nft_trans_gc_alloc(set, 0, GFP_KERNEL);
+ if (!gc)
+ return;
+
+ list_add(&gc->list, &priv->gc_head);
+ }
nft_pipapo_gc_deactivate(net, set, e);
pipapo_drop(m, rulemap);
@@ -1740,10 +1746,30 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
}
}
- gc = nft_trans_gc_catchall_sync(gc);
+ priv->last_gc = jiffies;
+}
+
+/**
+ * pipapo_gc_queue() - Free expired elements
+ * @set: nftables API set representation
+ */
+static void pipapo_gc_queue(struct nft_set *set)
+{
+ struct nft_pipapo *priv = nft_set_priv(set);
+ struct nft_trans_gc *gc, *next;
+
+ /* always do a catchall cycle: */
+ gc = nft_trans_gc_alloc(set, 0, GFP_KERNEL);
if (gc) {
+ gc = nft_trans_gc_catchall_sync(gc);
+ if (gc)
+ nft_trans_gc_queue_sync_done(gc);
+ }
+
+ /* always purge queued gc elements. */
+ list_for_each_entry_safe(gc, next, &priv->gc_head, list) {
+ list_del(&gc->list);
nft_trans_gc_queue_sync_done(gc);
- priv->last_gc = jiffies;
}
}
@@ -1797,6 +1823,10 @@ static void pipapo_reclaim_match(struct rcu_head *rcu)
*
* We also need to create a new working copy for subsequent insertions and
* deletions.
+ *
+ * After the live copy has been replaced by the clone, we can safely queue
+ * expired elements that have been collected by pipapo_gc_scan() for
+ * memory reclaim.
*/
static void nft_pipapo_commit(struct nft_set *set)
{
@@ -1807,7 +1837,7 @@ static void nft_pipapo_commit(struct nft_set *set)
return;
if (time_after_eq(jiffies, priv->last_gc + nft_set_gc_interval(set)))
- pipapo_gc(set, priv->clone);
+ pipapo_gc_scan(set, priv->clone);
old = rcu_replace_pointer(priv->match, priv->clone,
nft_pipapo_transaction_mutex_held(set));
@@ -1815,6 +1845,8 @@ static void nft_pipapo_commit(struct nft_set *set)
if (old)
call_rcu(&old->rcu, pipapo_reclaim_match);
+
+ pipapo_gc_queue(set);
}
static void nft_pipapo_abort(const struct nft_set *set)
@@ -2279,6 +2311,7 @@ static int nft_pipapo_init(const struct nft_set *set,
f->mt = NULL;
}
+ INIT_LIST_HEAD(&priv->gc_head);
rcu_assign_pointer(priv->match, m);
return 0;
@@ -2328,6 +2361,8 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
struct nft_pipapo *priv = nft_set_priv(set);
struct nft_pipapo_match *m;
+ WARN_ON_ONCE(!list_empty(&priv->gc_head));
+
m = rcu_dereference_protected(priv->match, true);
if (priv->clone) {
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index eaab422aa56a..9aee9a9eaeb7 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -156,12 +156,14 @@ struct nft_pipapo_match {
* @clone: Copy where pending insertions and deletions are kept
* @width: Total bytes to be matched for one packet, including padding
* @last_gc: Timestamp of last garbage collection run, jiffies
+ * @gc_head: list of nft_trans_gc to queue up for mem reclaim
*/
struct nft_pipapo {
struct nft_pipapo_match __rcu *match;
struct nft_pipapo_match *clone;
int width;
unsigned long last_gc;
+ struct list_head gc_head;
};
struct nft_pipapo_elem;
--
2.52.0
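The unlink/reclaim split can be illustrated with a minimal userspace model (a singly linked list instead of the pipapo match data, hypothetical names): scan() only moves expired entries onto a pending list, and reclaim() frees them afterwards, standing in for the point after the live/clone pointer swap when no reader can still reach them.

```c
#include <assert.h>
#include <stdlib.h>

struct entry {
	int expired;
	struct entry *next;
};

struct model_set {
	struct entry *live;      /* what readers see */
	struct entry *gc_head;   /* unlinked but not yet freed */
};

/* Phase 1 (pipapo_gc_scan() analogue): unlink expired entries from the
 * live structure and park them on the pending list, without freeing. */
static void model_gc_scan(struct model_set *set)
{
	struct entry **pp = &set->live;

	while (*pp) {
		struct entry *e = *pp;

		if (e->expired) {
			*pp = e->next;             /* unlink */
			e->next = set->gc_head;    /* park for later reclaim */
			set->gc_head = e;
		} else {
			pp = &e->next;
		}
	}
}

/* Phase 2 (pipapo_gc_queue() analogue): only now reclaim the memory.
 * Returns the number of freed entries. */
static int model_gc_reclaim(struct model_set *set)
{
	int freed = 0;

	while (set->gc_head) {
		struct entry *e = set->gc_head;

		set->gc_head = e->next;
		free(e);
		freed++;
	}
	return freed;
}
```

The real code interposes the rcu_replace_pointer() swap between the two phases; the model only captures the ordering constraint, not the RCU mechanics.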
* Re: [PATCH net 0/4] netfilter: updates for net
2026-03-04 17:29 [PATCH net 0/4] netfilter: updates for net Florian Westphal
` (3 preceding siblings ...)
2026-03-04 17:29 ` [PATCH net 4/4] netfilter: nft_set_pipapo: split gc into unlink and reclaim phase Florian Westphal
@ 2026-03-04 21:57 ` Pablo Neira Ayuso
2026-03-05 9:05 ` Florian Westphal
2026-03-05 12:21 ` Florian Westphal
5 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2026-03-04 21:57 UTC (permalink / raw)
To: Florian Westphal
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
Jakub Kicinski, netfilter-devel
Hi Florian,
On Wed, Mar 04, 2026 at 06:29:36PM +0100, Florian Westphal wrote:
> Hi,
>
> The following patchset contains Netfilter fixes for *net*:
>
> 1) Fix a bug with vlan headers in the flowtable infrastructure.
> Existing code uses skb_vlan_push() helper, but that helper
> requires skb->data to point to the MAC header, which isn't the
> case for flowtables. Switch to a new helper, modeled on the
> existing PPPoE helper. From Eric Woudstra. This bug was added
> in v6.19-rc1.
In patch 1/4, why is this new function so different wrt. skb_vlan_push?
int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
{
if (skb_vlan_tag_present(skb)) {
int offset = skb->data - skb_mac_header(skb);
int err;
if (WARN_ONCE(offset,
"skb_vlan_push got skb with skb->data not at mac header (offset %d)\n",
offset)) {
return -EINVAL;
}
err = __vlan_insert_tag(skb, skb->vlan_proto,
skb_vlan_tag_get(skb));
if (err)
return err;
skb->protocol = skb->vlan_proto;
skb->network_header -= VLAN_HLEN;
skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
}
__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
In case there are two VLANs, the existing tag in hwaccel gets pushed into
the VLAN header, and the outer VLAN becomes the one that is offloaded?
Is this reversed in this patch? The first VLAN tag is offloaded, then
the next one coming is pushed as a VLAN header?
Thanks.
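The ordering question can be made concrete with a toy model that only tracks which tag id ends up in the hwaccel slot and which one in the packet (purely illustrative, not kernel code):

```c
#include <assert.h>

struct toy_skb {
	int tag_present;
	int hwaccel_id;   /* tag carried out-of-band in the skb */
	int inner_id;     /* tag written into the packet, 0 = none */
};

/* skb_vlan_push() semantics: the existing hwaccel tag is inserted into
 * the packet, the new tag takes over the hwaccel slot. */
static void toy_skb_vlan_push(struct toy_skb *s, int id)
{
	if (s->tag_present)
		s->inner_id = s->hwaccel_id;
	s->tag_present = 1;
	s->hwaccel_id = id;
}

/* nf_flow_vlan_push() semantics from the patch: the first tag stays in
 * the hwaccel slot, a second tag is written into the packet. */
static void toy_nf_flow_vlan_push(struct toy_skb *s, int id)
{
	if (s->tag_present) {
		s->inner_id = id;
	} else {
		s->tag_present = 1;
		s->hwaccel_id = id;
	}
}
```

Under skb_vlan_push() semantics the most recently pushed tag is the offloaded one; the new helper keeps the first tag offloaded and writes the second into the packet, i.e. the reverse.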
* Re: [PATCH net 0/4] netfilter: updates for net
2026-03-04 21:57 ` [PATCH net 0/4] netfilter: updates for net Pablo Neira Ayuso
@ 2026-03-05 9:05 ` Florian Westphal
2026-03-05 9:40 ` Pablo Neira Ayuso
0 siblings, 1 reply; 13+ messages in thread
From: Florian Westphal @ 2026-03-05 9:05 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
Jakub Kicinski, netfilter-devel
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Florian,
>
> On Wed, Mar 04, 2026 at 06:29:36PM +0100, Florian Westphal wrote:
> > Hi,
> >
> > The following patchset contains Netfilter fixes for *net*:
> >
> > 1) Fix a bug with vlan headers in the flowtable infrastructure.
> > Existing code uses skb_vlan_push() helper, but that helper
> > requires skb->data to point to the MAC header, which isn't the
> > case for flowtables. Switch to a new helper, modeled on the
> > existing PPPoE helper. From Eric Woudstra. This bug was added
> > in v6.19-rc1.
>
> In patch 1/4, why is this new function so different wrt. skb_vlan_push?
>
I asked that to Eric when I reviewed this, and that was his reply:
--------------------------------------------------------------------
The code here for the inner header is an almost exact copy of
nf_flow_pppoe_push(), which was also implemented at the same time.
So handling pppoe and inner-vlan header is implemented in the same
manner, which keeps it simple and uniform. If one functions
(in)correctly, then so would the other.
I've been implementing handling the inner vlan header like this for a
half year now. My version of nf_flow_encap_push() was a bit different,
but after this patch it is quite similar.
--------------------------------------------------------------------
> skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
> }
> __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
>
>
> In case there are two VLANs, the existing tag in hwaccel gets pushed into
> the VLAN header, and the outer VLAN becomes the one that is offloaded?
>
> Is this reversed in this patch? The first VLAN tag is offloaded, then
> the next one coming is pushed as a VLAN header?
Yes, it looks broken. I wonder why we have no tests for this stuff.
First a vlan push function that cannot have worked, ever, now this
seemingly reversing-headers variant:
For PPPoE, it's pushing the PPPoE header onto the packet, so we get
strict ordering: a header coming later in the stack gets placed on
top, before the older one.
Here, the first vlan push gets placed into the hw tag in the skb (which
makes sense, let HW take care of it).
But if a 2nd comes along, then that gets placed in the packet
and the hwaccel tag remains?
What to do? Should we nuke vlan offload support from the flowtable?
It appears to be an unused feature.
I have low confidence in this code.
* Re: [PATCH net 0/4] netfilter: updates for net
2026-03-05 9:05 ` Florian Westphal
@ 2026-03-05 9:40 ` Pablo Neira Ayuso
2026-03-05 12:20 ` Florian Westphal
0 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2026-03-05 9:40 UTC (permalink / raw)
To: Florian Westphal
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
Jakub Kicinski, netfilter-devel
On Thu, Mar 05, 2026 at 10:05:15AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Hi Florian,
> >
> > On Wed, Mar 04, 2026 at 06:29:36PM +0100, Florian Westphal wrote:
> > > Hi,
> > >
> > > The following patchset contains Netfilter fixes for *net*:
> > >
> > > 1) Fix a bug with vlan headers in the flowtable infrastructure.
> > > Existing code uses skb_vlan_push() helper, but that helper
> > > requires skb->data to point to the MAC header, which isn't the
> > > case for flowtables. Switch to a new helper, modeled on the
> > > existing PPPoE helper. From Eric Woudstra. This bug was added
> > > in v6.19-rc1.
> >
> > In patch 1/4, why is this new function so different wrt. skb_vlan_push?
> >
>
> I asked that to Eric when I reviewed this, and that was his reply:
> --------------------------------------------------------------------
> The code here for the inner header is an almost exact copy of
> nf_flow_pppoe_push(), which was also implemented at the same time.
> So handling pppoe and inner-vlan header is implemented in the same
> manner, which keeps it simple and uniform. If one functions
> (in)correctly, then so would the other.
>
> I've been implementing handling the inner vlan header like this for a
> half year now. My version of nf_flow_encap_push() was a bit different,
> but after this patch it is quite similar.
> --------------------------------------------------------------------
>
> > skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
> > }
> > __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
> >
> >
> > In case there are two VLANs, the existing tag in hwaccel gets pushed into
> > the VLAN header, and the outer VLAN becomes the one that is offloaded?
> >
> > Is this reversed in this patch? The first VLAN tag is offloaded, then
> > the next one coming is pushed as a VLAN header?
>
> Yes, it looks broken. I wonder why we have no tests for this stuff.
> First a vlan push function that cannot have worked, ever, now this
> seemingly reversing-headers variant:
This used to work, I just accidentally broke it when using
skb_vlan_push() in net-next.
I will post fix.
> For PPPoE, it's pushing the PPPoE header onto the packet, so we get
> strict ordering: a header coming later in the stack gets placed on
> top, before the older one.
>
> Here, the first vlan push gets placed into the hw tag in the skb (which
> makes sense, let HW take care of it).
>
> But if a 2nd comes along, then that gets placed in the packet
> and the hwaccel tag remains?
>
> What to do? Should we nuke vlan offload support from the flowtable?
> It appears to be an unused feature.
>
> I have low confidence in this code.
Could you elaborate more precisely?
Thanks.
* Re: [PATCH net 0/4] netfilter: updates for net
2026-03-05 9:40 ` Pablo Neira Ayuso
@ 2026-03-05 12:20 ` Florian Westphal
0 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2026-03-05 12:20 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
Jakub Kicinski, netfilter-devel
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Yes, it looks broken. I wonder why we have no tests for this stuff.
> > First a vlan push function that cannot have worked, ever, now this
> > seemingly reversing-headers variant:
>
> This used to work, I just accidentally broke it when using
> skb_vlan_push() in net-next.
>
> I will post fix.
Ok, thanks.
> > For PPPoE, it's pushing the PPPoE header onto the packet, so we get
> > strict ordering: a header coming later in the stack gets placed on
> > top, before the older one.
> >
> > Here, the first vlan push gets placed into the hw tag in the skb (which
> > makes sense, let HW take care of it).
> >
> > But if a 2nd comes along, then that gets placed in the packet
> > and the hwaccel tag remains?
> >
> > What to do? Should we nuke vlan offload support from the flowtable?
> > It appears to be an unused feature.
> >
> > I have low confidence in this code.
>
> Could you elaborate more precisely?
Add bug in nf_queue -> kselftest will likely barf
Add bug in nf_tables control plane -> nftables shell and/or
python tests will likely barf
Add bug in conntrack -> kselftest will likely barf
Add new bug in flowtable vlan -> nada.
I think we should refuse both new features and refactoring patches going
forward unless they come with either an update to an existing kselftest,
a new test, or a test in nftables.git.
* Re: [PATCH net 0/4] netfilter: updates for net
2026-03-04 17:29 [PATCH net 0/4] netfilter: updates for net Florian Westphal
` (4 preceding siblings ...)
2026-03-04 21:57 ` [PATCH net 0/4] netfilter: updates for net Pablo Neira Ayuso
@ 2026-03-05 12:21 ` Florian Westphal
5 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2026-03-05 12:21 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Florian Westphal <fw@strlen.de> wrote:
> 1) Fix a bug with vlan headers in the flowtable infrastructure.
> Existing code uses skb_vlan_push() helper, but that helper
> requires skb->data to point to the MAC header, which isn't the
> case for flowtables. Switch to a new helper, modeled on the
> existing PPPoE helper. From Eric Woudstra. This bug was added
> in v6.19-rc1.
Please toss this MR; I will create a new one in a few minutes,
axing this fix from the series.
^ permalink raw reply [flat|nested] 13+ messages in thread