From: Florian Westphal <fw@strlen.de>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org, carges@cloudflare.com
Subject: Re: [PATCH nf,v2] netfilter: nft_set_rbtree: allocate same array size on updates
Date: Sat, 7 Mar 2026 10:07:12 +0100 [thread overview]
Message-ID: <aavqwA_H032EaiRg@strlen.de> (raw)
In-Reply-To: <20260307001124.2897063-1-pablo@netfilter.org>
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> v2: fix crash with new sets, reported by Florian.
>
> net/netfilter/nft_set_rbtree.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
> index 853ff30a208c..bdcea649467f 100644
> --- a/net/netfilter/nft_set_rbtree.c
> +++ b/net/netfilter/nft_set_rbtree.c
> @@ -646,7 +646,12 @@ static int nft_array_may_resize(const struct nft_set *set)
> struct nft_array *array;
>
> if (!priv->array_next) {
> - array = nft_array_alloc(nelems + NFT_ARRAY_EXTRA_SIZE);
> + if (priv->array)
> + new_max_intervals = priv->array->max_intervals;
> + else
> + new_max_intervals = NFT_ARRAY_EXTRA_SIZE;
> +
> + array = nft_array_alloc(new_max_intervals);
> if (!array)
> return -ENOMEM;
>
I wonder if rbtree code should try harder to compact memory.
We can't defer allocation to commit hook because commit hook can't fail.
But its the only location where we know the exact memory size needed:
1. To-be-removed elements have been unlinked from the tree
2. Expired elements have been unlinked too
So, after the tree walk, num_intervals is the actual needed count and
no longer a worst-case estimate.
What if this would check num_intervals vs.
priv->array_next->num_intervals, and, if the difference is say, more
than 10% or more than 1 page of memory, try to allocate a better size?
If the allocation fails, too bad, use the existing one.
If it works, copy the elements over, free the larger allocation and
use the "better fit".
netlink has similar approach via netlink_trim() to avoid stuffing
large allocations into socket queues.
An even better way would be to just kvrealloc() but in the vmalloc case
this code path doesn't shrink the VM area in case new requested size
is smaller than the existing one, so that would require work on mm side
first.
One issue is that if you have a large rbtree, and the userspace pattern
is:
"flush set t s; add element t s { $huge }" then rbtree will roughly have
double the memory size that it would actually need: it will
first allocate the existing element size, then continue to realloc to
accomodate the new incoming elements, but, since the flush only takes
effect post-commit, can't account for the reduced size post-commit.
next prev parent reply other threads:[~2026-03-07 9:07 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-07 0:11 [PATCH nf,v2] netfilter: nft_set_rbtree: allocate same array size on updates Pablo Neira Ayuso
2026-03-07 9:07 ` Florian Westphal [this message]
2026-03-07 12:59 ` Pablo Neira Ayuso
2026-03-07 13:06 ` Florian Westphal
2026-03-08 10:47 ` Pablo Neira Ayuso
2026-03-11 16:29 ` Chris Arges
2026-03-11 16:43 ` Pablo Neira Ayuso
2026-03-11 18:45 ` Chris Arges
-- strict thread matches above, loose matches on Subject: below --
2026-03-08 11:23 Pablo Neira Ayuso
2026-03-08 11:25 ` Pablo Neira Ayuso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aavqwA_H032EaiRg@strlen.de \
--to=fw@strlen.de \
--cc=carges@cloudflare.com \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox