* [PATCH nf] netfilter: nft_set_pipapo: don't leak bad clone into future transaction
@ 2026-06-16 19:19 Florian Westphal
2026-06-17 5:51 ` Stefano Brivio
0 siblings, 1 reply; 3+ messages in thread
From: Florian Westphal @ 2026-06-16 19:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal, Seesee, Stefano Brivio
On memory allocation failure the cloned nft_pipapo_match can enter a bad
state:
- some fields can have their lookup tables resized while others did
not
- bits might have been toggled
- scratch map can be undersized which also means m->bsize_max can be
lower than what is required
This means that the next insertion in the same batch can trigger
out-of-bounds writes.
Futhermore, a failure in the first can result in the bad clone to
leak into the next transaction because the abort callback is never
executed in this case (the upper layer saw an error and no attempt to
allocate a transactional request was made).
Record a state for the nft_pipapo_match structure:
- NEW (pristine clone)
- MOD (modified clone with good state)
- ERR (potentially bogus content)
Then make it so that deletes and insertions fail when the clone
entered ERR state.
In case the very first insert attempt results in an error, free the
clone right away.
Reported-by: Seesee <cjc000013@gmail.com>
Cc: Stefano Brivio <sbrivio@redhat.com>
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
For some reason reporter did not share a PoC despite claiming
there is one, so I have no fucking clue if this patch
is correct or not, its based on guesswork without
any validation.
net/netfilter/nft_set_pipapo.c | 34 +++++++++++++++++++++++++++++-----
net/netfilter/nft_set_pipapo.h | 8 ++++++++
2 files changed, 37 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 50d4a4f04309..61b8601ee377 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -342,6 +342,8 @@
#include "nft_set_pipapo_avx2.h"
#include "nft_set_pipapo.h"
+static void nft_pipapo_abort(const struct nft_set *set);
+
/**
* pipapo_refill() - For each set bit, set bits from selected mapping table item
* @map: Bitmap to be scanned for set bits
@@ -1296,7 +1298,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
const u8 *start_p, *end_p;
int i, bsize_max, err = 0;
- if (!m)
+ if (!m || m->state == NFT_PIPAPO_CLONE_ERR)
return -ENOMEM;
if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
@@ -1367,8 +1369,10 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
else
ret = pipapo_expand(f, start, end, f->groups * f->bb);
- if (ret < 0)
- return ret;
+ if (ret < 0) {
+ err = ret;
+ goto abort;
+ }
if (f->bsize > bsize_max)
bsize_max = f->bsize;
@@ -1384,7 +1388,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
err = pipapo_realloc_scratch(m, bsize_max);
if (err)
- return err;
+ goto abort;
m->bsize_max = bsize_max;
} else {
@@ -1396,7 +1400,26 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
pipapo_map(m, rulemap, e);
+ m->state = NFT_PIPAPO_CLONE_MOD;
return 0;
+abort:
+ DEBUG_NET_WARN_ON_ONCE(m->state == NFT_PIPAPO_CLONE_ERR);
+
+ /* Two rollback cases:
+ * 1) no previous changes. nft_pipapo_abort is not
+ * guaranteed to be invoked (there might be no further
+ * add/delete requests coming after this).
+ *
+ * 2) we had previous changes: there are transaction
+ * records pointing to this set. Leave the rollback to
+ * the transaction handling.
+ */
+ if (m->state == NFT_PIPAPO_CLONE_NEW)
+ nft_pipapo_abort(set); /* releases m */
+ else
+ m->state = NFT_PIPAPO_CLONE_ERR;
+
+ return err;
}
/**
@@ -1473,6 +1496,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
dst++;
}
+ new->state = NFT_PIPAPO_CLONE_NEW;
return new;
out_mt:
@@ -1896,7 +1920,7 @@ nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
/* removal must occur on priv->clone, if we are low on memory
* we have no choice and must fail the removal request.
*/
- if (!m)
+ if (!m || m->state == NFT_PIPAPO_CLONE_ERR)
return NULL;
e = pipapo_get(m, (const u8 *)elem->key.val.data,
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index b82abb03576e..a19e980d06ef 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -131,9 +131,16 @@ struct nft_pipapo_scratch {
unsigned long __map[];
};
+enum nft_pipapo_clone_state {
+ NFT_PIPAPO_CLONE_NEW,
+ NFT_PIPAPO_CLONE_MOD,
+ NFT_PIPAPO_CLONE_ERR,
+};
+
/**
* struct nft_pipapo_match - Data used for lookup and matching
* @field_count: Amount of fields in set
+ * @state: add/delete state; used from control plane
* @bsize_max: Maximum lookup table bucket size of all fields, in longs
* @scratch: Preallocated per-CPU maps for partial matching results
* @rcu: Matching data is swapped on commits
@@ -141,6 +148,7 @@ struct nft_pipapo_scratch {
*/
struct nft_pipapo_match {
u8 field_count;
+ enum nft_pipapo_clone_state state:8;
unsigned int bsize_max;
struct nft_pipapo_scratch * __percpu *scratch;
struct rcu_head rcu;
--
2.53.0
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH nf] netfilter: nft_set_pipapo: don't leak bad clone into future transaction
2026-06-16 19:19 [PATCH nf] netfilter: nft_set_pipapo: don't leak bad clone into future transaction Florian Westphal
@ 2026-06-17 5:51 ` Stefano Brivio
2026-06-17 8:02 ` Florian Westphal
0 siblings, 1 reply; 3+ messages in thread
From: Stefano Brivio @ 2026-06-17 5:51 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel, Seesee
On Tue, 16 Jun 2026 21:19:34 +0200
Florian Westphal <fw@strlen.de> wrote:
> On memory allocation failure the cloned nft_pipapo_match can enter a bad
> state:
> - some fields can have their lookup tables resized while others did
> not
> - bits might have been toggled
> - scratch map can be undersized which also means m->bsize_max can be
> lower than what is required
If I understand it correctly, this is about pipapo_realloc_scratch()
failing to allocate memory for per-CPU scratch maps but
pipapo_maybe_clone() succeeding, right?
> This means that the next insertion in the same batch can trigger
> out-of-bounds writes.
>
> Futhermore, a failure in the first can result in the bad clone to
> leak into the next transaction because the abort callback is never
> executed in this case (the upper layer saw an error and no attempt to
> allocate a transactional request was made).
>
> Record a state for the nft_pipapo_match structure:
> - NEW (pristine clone)
> - MOD (modified clone with good state)
> - ERR (potentially bogus content)
>
> Then make it so that deletes and insertions fail when the clone
> entered ERR state.
>
> In case the very first insert attempt results in an error, free the
> clone right away.
I don't see anything wrong with this approach, but I guess there might
be a more obvious alternative, even though I didn't really think this
through: undo what we did in nft_pipapo_insert() up to that point
(perhaps calling nft_pipapo_delete() with a particular argument).
I can try to get to this in the next few days (I would have some ideas
about testing, see below), but I suppose we want a fix quickly if that's
really the case so I'm actually fine with this, with one nit, also
reported below.
> Reported-by: Seesee <cjc000013@gmail.com>
> Cc: Stefano Brivio <sbrivio@redhat.com>
> Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
> Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> For some reason reporter did not share a PoC despite claiming
> there is one, so I have no fucking clue if this patch
> is correct or not, its based on guesswork without
> any validation.
Maybe we could run test scripts against a modified version of
pipapo_realloc_scratch() returning -ENOMEM every 20th time or so?
> net/netfilter/nft_set_pipapo.c | 34 +++++++++++++++++++++++++++++-----
> net/netfilter/nft_set_pipapo.h | 8 ++++++++
> 2 files changed, 37 insertions(+), 5 deletions(-)
>
> diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
> index 50d4a4f04309..61b8601ee377 100644
> --- a/net/netfilter/nft_set_pipapo.c
> +++ b/net/netfilter/nft_set_pipapo.c
> @@ -342,6 +342,8 @@
> #include "nft_set_pipapo_avx2.h"
> #include "nft_set_pipapo.h"
>
> +static void nft_pipapo_abort(const struct nft_set *set);
> +
> /**
> * pipapo_refill() - For each set bit, set bits from selected mapping table item
> * @map: Bitmap to be scanned for set bits
> @@ -1296,7 +1298,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
> const u8 *start_p, *end_p;
> int i, bsize_max, err = 0;
>
> - if (!m)
> + if (!m || m->state == NFT_PIPAPO_CLONE_ERR)
> return -ENOMEM;
>
> if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
> @@ -1367,8 +1369,10 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
> else
> ret = pipapo_expand(f, start, end, f->groups * f->bb);
>
> - if (ret < 0)
> - return ret;
> + if (ret < 0) {
> + err = ret;
> + goto abort;
> + }
>
> if (f->bsize > bsize_max)
> bsize_max = f->bsize;
> @@ -1384,7 +1388,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
>
> err = pipapo_realloc_scratch(m, bsize_max);
> if (err)
> - return err;
> + goto abort;
>
> m->bsize_max = bsize_max;
> } else {
> @@ -1396,7 +1400,26 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
>
> pipapo_map(m, rulemap, e);
>
> + m->state = NFT_PIPAPO_CLONE_MOD;
> return 0;
> +abort:
> + DEBUG_NET_WARN_ON_ONCE(m->state == NFT_PIPAPO_CLONE_ERR);
> +
> + /* Two rollback cases:
> + * 1) no previous changes. nft_pipapo_abort is not
> + * guaranteed to be invoked (there might be no further
> + * add/delete requests coming after this).
> + *
> + * 2) we had previous changes: there are transaction
> + * records pointing to this set. Leave the rollback to
> + * the transaction handling.
> + */
> + if (m->state == NFT_PIPAPO_CLONE_NEW)
> + nft_pipapo_abort(set); /* releases m */
> + else
> + m->state = NFT_PIPAPO_CLONE_ERR;
> +
> + return err;
> }
>
> /**
> @@ -1473,6 +1496,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
> dst++;
> }
>
> + new->state = NFT_PIPAPO_CLONE_NEW;
> return new;
>
> out_mt:
> @@ -1896,7 +1920,7 @@ nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
> /* removal must occur on priv->clone, if we are low on memory
> * we have no choice and must fail the removal request.
> */
> - if (!m)
> + if (!m || m->state == NFT_PIPAPO_CLONE_ERR)
> return NULL;
>
> e = pipapo_get(m, (const u8 *)elem->key.val.data,
> diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
> index b82abb03576e..a19e980d06ef 100644
> --- a/net/netfilter/nft_set_pipapo.h
> +++ b/net/netfilter/nft_set_pipapo.h
> @@ -131,9 +131,16 @@ struct nft_pipapo_scratch {
> unsigned long __map[];
> };
>
> +enum nft_pipapo_clone_state {
Nit: if would be nice to have the canonical:
/**
* enum nft_pipapo_clone_state - Validity states for clones of matching data
* @NFT_PIPAPO_CLONE_NEW Pristine clone of currently active data
* @NFT_PIPAPO_CLONE_MOD Modified clone (insertion or deletions), valid
* @NFT_PIPAPO_CLONE_ERR Some operations failed, invalid state, don't use
*/
> + NFT_PIPAPO_CLONE_NEW,
> + NFT_PIPAPO_CLONE_MOD,
> + NFT_PIPAPO_CLONE_ERR,
> +};
> +
> /**
> * struct nft_pipapo_match - Data used for lookup and matching
> * @field_count: Amount of fields in set
> + * @state: add/delete state; used from control plane
> * @bsize_max: Maximum lookup table bucket size of all fields, in longs
> * @scratch: Preallocated per-CPU maps for partial matching results
> * @rcu: Matching data is swapped on commits
> @@ -141,6 +148,7 @@ struct nft_pipapo_scratch {
> */
> struct nft_pipapo_match {
> u8 field_count;
> + enum nft_pipapo_clone_state state:8;
> unsigned int bsize_max;
> struct nft_pipapo_scratch * __percpu *scratch;
> struct rcu_head rcu;
--
Stefano
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH nf] netfilter: nft_set_pipapo: don't leak bad clone into future transaction
2026-06-17 5:51 ` Stefano Brivio
@ 2026-06-17 8:02 ` Florian Westphal
0 siblings, 0 replies; 3+ messages in thread
From: Florian Westphal @ 2026-06-17 8:02 UTC (permalink / raw)
To: Stefano Brivio; +Cc: netfilter-devel, Seesee
Stefano Brivio <sbrivio@redhat.com> wrote:
> On Tue, 16 Jun 2026 21:19:34 +0200
> Florian Westphal <fw@strlen.de> wrote:
>
> > On memory allocation failure the cloned nft_pipapo_match can enter a bad
> > state:
> > - some fields can have their lookup tables resized while others did
> > not
> > - bits might have been toggled
> > - scratch map can be undersized which also means m->bsize_max can be
> > lower than what is required
>
> If I understand it correctly, this is about pipapo_realloc_scratch()
> failing to allocate memory for per-CPU scratch maps but
> pipapo_maybe_clone() succeeding, right?
Yes.
> I don't see anything wrong with this approach, but I guess there might
> be a more obvious alternative, even though I didn't really think this
> through: undo what we did in nft_pipapo_insert() up to that point
> (perhaps calling nft_pipapo_delete() with a particular argument).
Yes, that is the laternative, return the cloned copy to a state where it
is identical to the state it had at function start.
> I can try to get to this in the next few days (I would have some ideas
> about testing, see below), but I suppose we want a fix quickly if that's
> really the case so I'm actually fine with this, with one nit, also
> reported below.
I don't mind, this can wait if you prefer to undo the state.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-17 8:02 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-16 19:19 [PATCH nf] netfilter: nft_set_pipapo: don't leak bad clone into future transaction Florian Westphal
2026-06-17 5:51 ` Stefano Brivio
2026-06-17 8:02 ` Florian Westphal
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.