public inbox for linux-kernel@vger.kernel.org
From: Chris Arges <carges@cloudflare.com>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	lwn@lwn.net, jslaby@suse.cz, kernel-team@cloudflare.com,
	netfilter-devel@vger.kernel.org
Subject: Re: [REGRESSION] 6.18.14 netfilter/nftables consumes way more memory
Date: Thu, 5 Mar 2026 10:28:49 -0600	[thread overview]
Message-ID: <aamvQTTZu4-chpsS@20HS2G4> (raw)
In-Reply-To: <aaij0XAgYRN40QdD@chamomile>

On 2026-03-04 22:27:45, Pablo Neira Ayuso wrote:
> Resending, your Reply-To: is botched.
> 
> -o-
> 

I noticed that after I sent it; thanks for fixing.
> Hi,
> 
> On Wed, Mar 04, 2026 at 11:50:54AM -0600, Chris Arges wrote:
> > Hello,
> > 
> > We've noticed significant slab unreclaimable memory increase after upgrading
> > from 6.18.12 to 6.18.15. Other memory values look fairly close, but in my
> > testing slab unreclaimable goes from 1.7 GB to 4.9 GB on machines.
> 
> From where are you collecting these memory consumption numbers?
> 

These numbers come from the cgroup's memory.stat:
```
$ grep slab /sys/fs/cgroup/path/to/service/memory.stat
slab_reclaimable 35874232
slab_unreclaimable 5343553056
slab 5379427288
```
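
For quick comparison across hosts, the raw byte counts can be converted
inline. A small sketch using the values pasted above (on a live host you
would read the real memory.stat instead of the here-doc):
```
# Convert the slab_unreclaimable value from memory.stat (bytes) to GiB.
awk '$1 == "slab_unreclaimable" { printf "%.1f GiB\n", $2 / 1024^3 }' <<'EOF'
slab_reclaimable 35874232
slab_unreclaimable 5343553056
slab 5379427288
EOF
# → 5.0 GiB
```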

> > Our use case is having nft rules like below, but adding them to 1000s of
> > network namespaces. This is essentially running `nft -f` for all these
> > namespaces every minute.
> 
> Those numbers are for only 1000 namespaces? That is too small a number of
> entries to explain the memory increase you report.
> 

The workload I suspect (since it is inside that cgroup) has the following
characteristics:
- 1000s of namespaces
- 1000s of CIDRs in the IP list per namespace
- Everything updated frequently (interval under one minute)
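
As a rough sketch of that shape (the service name and 10.x addresses here
are made up for illustration; in production each generated ruleset is fed
to `nft -f` inside its namespace, e.g. via `ip netns exec`):
```
# Generate one service's ruleset with N interval elements, mirroring the
# pattern quoted below.
gen_ruleset() {
  svc=$1; n=$2
  printf 'table inet service_%s {\n}\ndelete table inet service_%s\n' "$svc" "$svc"
  printf 'table inet service_%s {\n\tset account.ip_list {\n\t\ttype ipv4_addr\n\t\tflags interval\n\t\tauto-merge\n\t}\n}\n' "$svc"
  elems=$(seq 1 "$n" | awk '{ printf "10.%d.%d.0/24, ", int($1 / 256), $1 % 256 }')
  printf 'add element inet service_%s account.ip_list { %s }\n' "$svc" "${elems%, }"
}
gen_ruleset 1234567 1000 | head -n 3
```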

> > ```
> > table inet service_1234567 {
> > }
> > delete table inet service_1234567
> > table inet service_1234567 {
> > 	chain input {
> > 		type filter hook prerouting priority filter; policy accept;
> > 		ip saddr @account.ip_list drop
> > 	}
> > 	set account.ip_list {
> > 		type ipv4_addr
> > 		flags interval
> > 		auto-merge
> > 	}
> > }
> > add element inet service_1234567 account.ip_list { /* add 1000s of CIDRs here */ }
> > ```
> > 
> > I suspect this is related to:
> > - 36ed9b6e3961 (upstream 7e43e0a1141deec651a60109dab3690854107298)
> > - netfilter: nft_set_rbtree: translate rbtree to array for binary search
> 
> More memory consumption is expected indeed, but not so much as you are
> reporting.
> 
> > I'm still digging into this, and plan on reverting commits and seeing if memory
> > usage goes back to nominal in production. I don't have a trivial
> > reproducer unfortunately.
> 
> The extra memory comes from the array allocation, the relevant code
> is here:
> 
> #define NFT_ARRAY_EXTRA_SIZE    10240 
>  
> /* Similar to nft_rbtree_{u,k}size to hide details to userspace, but consider
>  * packed representation coming from userspace for anonymous sets too.
>  */     
> static u32 nft_array_elems(const struct nft_set *set)
> 
> > Happy to run some additional tests, and I can easily apply patches on top of
> > linux-6.18.y to run in a test environment.
> 
> I would need more info to propose a patch; I don't know where you
> are pulling such numbers. You also mention you have no reproducer.
> 
To clarify, this issue _is_ happening in our production environment, so I can
reproduce it there. It only appeared when going from 6.18.12 to 6.18.15, with a
service inside a cgroup that mostly applies large sets of IPs via nft. I do not
yet have a simple reproducer script I can share, but I am working on one.

I'm going to try reverting the rbtree patch series locally and see if the
problem persists. I can also experiment with NFT_ARRAY_EXTRA_SIZE to see
whether it is a factor here.
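
As a back-of-the-envelope check (assuming, as a guess, roughly 5000 such sets
across all the namespaces), the fixed NFT_ARRAY_EXTRA_SIZE slack alone cannot
account for a multi-GB jump:
```
# 10240 bytes of fixed array slack per set (NFT_ARRAY_EXTRA_SIZE),
# times a hypothetical 5000 sets, expressed in MiB.
echo $(( 5000 * 10240 / 1024 / 1024 ))
# → 48
```
So the dominant cost is presumably per-element, or arrays kept alive across the
minute-by-minute reloads.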

> > We are using userspace nftables 1.1.3, but had to apply the patch mentioned
> > in this thread: https://lore.kernel.org/all/e6b43861cda6953cc7f8c259e663b890e53d7785.camel@sapience.com/
> > in order to solve the other regression we encountered.
> 
> Yes, there are plans to revert a kernel patch that went in -stable to
> address this.

Thanks.
--chris

Thread overview: 8+ messages
2026-03-04 17:50 [REGRESSION] 6.18.14 netfilter/nftables consumes way more memory Chris Arges
2026-03-04 21:26 ` Pablo Neira Ayuso
2026-03-04 21:27   ` Pablo Neira Ayuso
2026-03-05 16:28     ` Chris Arges [this message]
2026-03-06 12:22       ` Pablo Neira Ayuso
2026-03-06 12:25         ` Pablo Neira Ayuso
2026-03-06 18:20           ` Chris Arges
2026-03-07  0:15             ` Pablo Neira Ayuso
