From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Florian Westphal <fw@strlen.de>
Cc: netfilter-devel@vger.kernel.org, carges@cloudflare.com
Subject: Re: [PATCH nf,v2] netfilter: nft_set_rbtree: allocate same array size on updates
Date: Sat, 7 Mar 2026 13:59:48 +0100 [thread overview]
Message-ID: <aawhRH5SLVzNTots@chamomile> (raw)
In-Reply-To: <aavqwA_H032EaiRg@strlen.de>
On Sat, Mar 07, 2026 at 10:07:12AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > v2: fix crash with new sets, reported by Florian.
> >
> > net/netfilter/nft_set_rbtree.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
> > index 853ff30a208c..bdcea649467f 100644
> > --- a/net/netfilter/nft_set_rbtree.c
> > +++ b/net/netfilter/nft_set_rbtree.c
> > @@ -646,7 +646,12 @@ static int nft_array_may_resize(const struct nft_set *set)
> > struct nft_array *array;
> >
> > if (!priv->array_next) {
> > - array = nft_array_alloc(nelems + NFT_ARRAY_EXTRA_SIZE);
> > + if (priv->array)
> > + new_max_intervals = priv->array->max_intervals;
> > + else
> > + new_max_intervals = NFT_ARRAY_EXTRA_SIZE;
> > +
> > + array = nft_array_alloc(new_max_intervals);
> > if (!array)
> > return -ENOMEM;
> >
>
> I wonder if rbtree code should try harder to compact memory.
>
> We can't defer allocation to commit hook because commit hook can't fail.
> But its the only location where we know the exact memory size needed:
>
> 1. To-be-removed elements have been unlinked from the tree
> 2. Expired elements have been unlinked too
>
> So, after the tree walk, num_intervals is the actual needed count and
> no longer a worst-case estimate.
>
> What if this would check num_intervals vs.
> priv->array_next->num_intervals, and, if the difference is say, more
> than 10% or more than 1 page of memory, try to allocate a better size?
>
> If the allocation fails, too bad, use the existing one.
> If it works, copy the elements over, free the larger allocation and
> use the "better fit".
>
> netlink has similar approach via netlink_trim() to avoid stuffing
> large allocations into socket queues.
>
> An even better way would be to just kvrealloc() but in the vmalloc case
> this code path doesn't shrink the VM area in case new requested size
> is smaller than the existing one, so that would require work on mm side
> first.
>
> One issue is that if you have a large rbtree, and the userspace pattern
> is:
>
> "flush set t s; add element t s { $huge }" then rbtree will roughly have
> double the memory size that it would actually need: it will
> first allocate the existing element size, then continue to realloc to
> accomodate the new incoming elements, but, since the flush only takes
> effect post-commit, can't account for the reduced size post-commit.
There is also set->ndeact that provides a hint on deactivated
elements.
And NFT_ARRAY_EXTRA_SIZE and resize can be revisited to allocate
memory areas that fit into the corresponding kmalloc slab.
So yes, there are smarter things that can be done here, but as for
this patch, my initial approach in this fix series was intentionally
simple.
As for this patch, I am just targetting at fixing the current linear
growth in memory consumption of this array on each update.
next prev parent reply other threads:[~2026-03-07 12:59 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-07 0:11 [PATCH nf,v2] netfilter: nft_set_rbtree: allocate same array size on updates Pablo Neira Ayuso
2026-03-07 9:07 ` Florian Westphal
2026-03-07 12:59 ` Pablo Neira Ayuso [this message]
2026-03-07 13:06 ` Florian Westphal
2026-03-08 10:47 ` Pablo Neira Ayuso
2026-03-11 16:29 ` Chris Arges
2026-03-11 16:43 ` Pablo Neira Ayuso
2026-03-11 18:45 ` Chris Arges
-- strict thread matches above, loose matches on Subject: below --
2026-03-08 11:23 Pablo Neira Ayuso
2026-03-08 11:25 ` Pablo Neira Ayuso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aawhRH5SLVzNTots@chamomile \
--to=pablo@netfilter.org \
--cc=carges@cloudflare.com \
--cc=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox