From: Stefano Brivio <sbrivio@redhat.com>
To: Florian Westphal <fw@strlen.de>
Cc: <netfilter-devel@vger.kernel.org>, Seesee <cjc000013@gmail.com>
Subject: Re: [PATCH nf] netfilter: nft_set_pipapo: don't leak bad clone into future transaction
Date: Wed, 17 Jun 2026 07:51:24 +0200 (CEST) [thread overview]
Message-ID: <20260617075123.7a62e22c@elisabeth> (raw)
In-Reply-To: <20260616191938.2875-1-fw@strlen.de>
On Tue, 16 Jun 2026 21:19:34 +0200
Florian Westphal <fw@strlen.de> wrote:
> On memory allocation failure the cloned nft_pipapo_match can enter a bad
> state:
> - some fields can have their lookup tables resized while others did
> not
> - bits might have been toggled
> - scratch map can be undersized which also means m->bsize_max can be
> lower than what is required
If I understand it correctly, this is about pipapo_realloc_scratch()
failing to allocate memory for per-CPU scratch maps but
pipapo_maybe_clone() succeeding, right?
> This means that the next insertion in the same batch can trigger
> out-of-bounds writes.
>
> Futhermore, a failure in the first can result in the bad clone to
> leak into the next transaction because the abort callback is never
> executed in this case (the upper layer saw an error and no attempt to
> allocate a transactional request was made).
>
> Record a state for the nft_pipapo_match structure:
> - NEW (pristine clone)
> - MOD (modified clone with good state)
> - ERR (potentially bogus content)
>
> Then make it so that deletes and insertions fail when the clone
> entered ERR state.
>
> In case the very first insert attempt results in an error, free the
> clone right away.
I don't see anything wrong with this approach, but I guess there might
be a more obvious alternative, even though I didn't really think this
through: undo what we did in nft_pipapo_insert() up to that point
(perhaps calling nft_pipapo_delete() with a particular argument).
I can try to get to this in the next few days (I would have some ideas
about testing, see below), but I suppose we want a fix quickly if that's
really the case so I'm actually fine with this, with one nit, also
reported below.
> Reported-by: Seesee <cjc000013@gmail.com>
> Cc: Stefano Brivio <sbrivio@redhat.com>
> Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
> Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> For some reason reporter did not share a PoC despite claiming
> there is one, so I have no fucking clue if this patch
> is correct or not, its based on guesswork without
> any validation.
Maybe we could run test scripts against a modified version of
pipapo_realloc_scratch() returning -ENOMEM every 20th time or so?
> net/netfilter/nft_set_pipapo.c | 34 +++++++++++++++++++++++++++++-----
> net/netfilter/nft_set_pipapo.h | 8 ++++++++
> 2 files changed, 37 insertions(+), 5 deletions(-)
>
> diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
> index 50d4a4f04309..61b8601ee377 100644
> --- a/net/netfilter/nft_set_pipapo.c
> +++ b/net/netfilter/nft_set_pipapo.c
> @@ -342,6 +342,8 @@
> #include "nft_set_pipapo_avx2.h"
> #include "nft_set_pipapo.h"
>
> +static void nft_pipapo_abort(const struct nft_set *set);
> +
> /**
> * pipapo_refill() - For each set bit, set bits from selected mapping table item
> * @map: Bitmap to be scanned for set bits
> @@ -1296,7 +1298,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
> const u8 *start_p, *end_p;
> int i, bsize_max, err = 0;
>
> - if (!m)
> + if (!m || m->state == NFT_PIPAPO_CLONE_ERR)
> return -ENOMEM;
>
> if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
> @@ -1367,8 +1369,10 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
> else
> ret = pipapo_expand(f, start, end, f->groups * f->bb);
>
> - if (ret < 0)
> - return ret;
> + if (ret < 0) {
> + err = ret;
> + goto abort;
> + }
>
> if (f->bsize > bsize_max)
> bsize_max = f->bsize;
> @@ -1384,7 +1388,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
>
> err = pipapo_realloc_scratch(m, bsize_max);
> if (err)
> - return err;
> + goto abort;
>
> m->bsize_max = bsize_max;
> } else {
> @@ -1396,7 +1400,26 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
>
> pipapo_map(m, rulemap, e);
>
> + m->state = NFT_PIPAPO_CLONE_MOD;
> return 0;
> +abort:
> + DEBUG_NET_WARN_ON_ONCE(m->state == NFT_PIPAPO_CLONE_ERR);
> +
> + /* Two rollback cases:
> + * 1) no previous changes. nft_pipapo_abort is not
> + * guaranteed to be invoked (there might be no further
> + * add/delete requests coming after this).
> + *
> + * 2) we had previous changes: there are transaction
> + * records pointing to this set. Leave the rollback to
> + * the transaction handling.
> + */
> + if (m->state == NFT_PIPAPO_CLONE_NEW)
> + nft_pipapo_abort(set); /* releases m */
> + else
> + m->state = NFT_PIPAPO_CLONE_ERR;
> +
> + return err;
> }
>
> /**
> @@ -1473,6 +1496,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
> dst++;
> }
>
> + new->state = NFT_PIPAPO_CLONE_NEW;
> return new;
>
> out_mt:
> @@ -1896,7 +1920,7 @@ nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
> /* removal must occur on priv->clone, if we are low on memory
> * we have no choice and must fail the removal request.
> */
> - if (!m)
> + if (!m || m->state == NFT_PIPAPO_CLONE_ERR)
> return NULL;
>
> e = pipapo_get(m, (const u8 *)elem->key.val.data,
> diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
> index b82abb03576e..a19e980d06ef 100644
> --- a/net/netfilter/nft_set_pipapo.h
> +++ b/net/netfilter/nft_set_pipapo.h
> @@ -131,9 +131,16 @@ struct nft_pipapo_scratch {
> unsigned long __map[];
> };
>
> +enum nft_pipapo_clone_state {
Nit: if would be nice to have the canonical:
/**
* enum nft_pipapo_clone_state - Validity states for clones of matching data
* @NFT_PIPAPO_CLONE_NEW Pristine clone of currently active data
* @NFT_PIPAPO_CLONE_MOD Modified clone (insertion or deletions), valid
* @NFT_PIPAPO_CLONE_ERR Some operations failed, invalid state, don't use
*/
> + NFT_PIPAPO_CLONE_NEW,
> + NFT_PIPAPO_CLONE_MOD,
> + NFT_PIPAPO_CLONE_ERR,
> +};
> +
> /**
> * struct nft_pipapo_match - Data used for lookup and matching
> * @field_count: Amount of fields in set
> + * @state: add/delete state; used from control plane
> * @bsize_max: Maximum lookup table bucket size of all fields, in longs
> * @scratch: Preallocated per-CPU maps for partial matching results
> * @rcu: Matching data is swapped on commits
> @@ -141,6 +148,7 @@ struct nft_pipapo_scratch {
> */
> struct nft_pipapo_match {
> u8 field_count;
> + enum nft_pipapo_clone_state state:8;
> unsigned int bsize_max;
> struct nft_pipapo_scratch * __percpu *scratch;
> struct rcu_head rcu;
--
Stefano
next prev parent reply other threads:[~2026-06-17 5:51 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-16 19:19 [PATCH nf] netfilter: nft_set_pipapo: don't leak bad clone into future transaction Florian Westphal
2026-06-17 5:51 ` Stefano Brivio [this message]
2026-06-17 8:02 ` Florian Westphal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260617075123.7a62e22c@elisabeth \
--to=sbrivio@redhat.com \
--cc=cjc000013@gmail.com \
--cc=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.