From: Chris Arges <carges@cloudflare.com>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	lwn@lwn.net, jslaby@suse.cz, kernel-team@cloudflare.com,
	netfilter-devel@vger.kernel.org
Subject: Re: [REGRESSION] 6.18.14 netfilter/nftables consumes way more memory
Date: Thu, 5 Mar 2026 10:28:49 -0600
Message-ID: <aamvQTTZu4-chpsS@20HS2G4>
In-Reply-To: <aaij0XAgYRN40QdD@chamomile>

On 2026-03-04 22:27:45, Pablo Neira Ayuso wrote:
> Resending, your Reply-To: is botched.
> 
> -o-
> 

I noticed after I sent, thanks for fixing.

> Hi,
> 
> On Wed, Mar 04, 2026 at 11:50:54AM -0600, Chris Arges wrote:
> > Hello,
> > 
> > We've noticed significant slab unreclaimable memory increase after upgrading
> > from 6.18.12 to 6.18.15. Other memory values look fairly close, but in my
> > testing slab unreclaimable goes from 1.7 GB to 4.9 GB on machines.
> 
> From where are you collecting these memory consumption numbers?
> 

These numbers come from the cgroup's memory.stat:
```
$ cat /sys/fs/cgroup/path/to/service/memory.stat | grep slab
slab_reclaimable 35874232
slab_unreclaimable 5343553056
slab 5379427288
```
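
For comparing these counters across kernels, a minimal sketch of a parser may
help. It assumes the cgroup v2 memory.stat format shown above (one
"name value" pair per line, values in bytes); the function names
parse_slab_counters and to_gib are hypothetical and only for illustration.

```python
def parse_slab_counters(memory_stat_text):
    """Return the slab* counters (in bytes) found in memory.stat text."""
    counters = {}
    for line in memory_stat_text.splitlines():
        parts = line.split()
        # memory.stat lines are "name value"; keep only the slab counters.
        if len(parts) == 2 and parts[0].startswith("slab"):
            counters[parts[0]] = int(parts[1])
    return counters


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Sample taken from the output quoted in this mail.
SAMPLE = """\
slab_reclaimable 35874232
slab_unreclaimable 5343553056
slab 5379427288
"""

if __name__ == "__main__":
    for name, value in parse_slab_counters(SAMPLE).items():
        print(f"{name}: {value} bytes ({to_gib(value):.2f} GiB)")
```

In this sample, slab_reclaimable plus slab_unreclaimable equals slab, so the
unreclaimable part accounts for nearly all of the slab usage.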

> > Our use case is having nft rules like below, but adding them to 1000s of
> > network namespaces. This is essentially running `nft -f` for all these
> > namespaces every minute.
> 
> Those numbers for only 1000? That is too little number of entries for
> such increase in memory usage that you report.
> 

The workload I suspect (since it's in the cgroup) has the following
characteristics:
- 1000s of namespaces
- 1000s of CIDRs in ip list per namespace
- Updating everything frequently (<1m)
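
A rough sketch of how a ruleset of this shape could be generated for testing
follows. It is not the production tooling and has not been shown to trigger
the regression; the helper names make_cidrs and render_ruleset, the
10.0.0.0/8 base range, and the /24 prefix length are all assumptions made for
illustration. The ruleset layout mirrors the one quoted below.

```python
import ipaddress
from itertools import islice


def make_cidrs(count, base="10.0.0.0/8", new_prefix=24):
    """Return `count` non-adjacent CIDRs carved out of `base` (hypothetical data)."""
    subnets = ipaddress.ip_network(base).subnets(new_prefix=new_prefix)
    # Take every second subnet so auto-merge cannot coalesce neighbours.
    return [str(net) for net in islice(subnets, 0, 2 * count, 2)]


def render_ruleset(table, cidrs, set_name="account.ip_list"):
    """Render an nft script that recreates the table and fills the interval set."""
    elements = ", ".join(cidrs)
    return "\n".join([
        f"table inet {table} {{",
        "}",
        f"delete table inet {table}",
        f"table inet {table} {{",
        "\tchain input {",
        "\t\ttype filter hook prerouting priority filter; policy accept;",
        f"\t\tip saddr @{set_name} drop",
        "\t}",
        f"\tset {set_name} {{",
        "\t\ttype ipv4_addr",
        "\t\tflags interval",
        "\t\tauto-merge",
        "\t}",
        "}",
        f"add element inet {table} {set_name} {{ {elements} }}",
        "",
    ])


if __name__ == "__main__":
    print(render_ruleset("service_1234567", make_cidrs(3)))
```

The generated text could then be loaded with `nft -f` inside each namespace
(for example through `ip netns exec`) on whatever interval the test needs.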

> > ```
> > table inet service_1234567 {
> > }
> > delete table inet service_1234567
> > table inet service_1234567 {
> > 	chain input {
> > 		type filter hook prerouting priority filter; policy accept;
> > 		ip saddr @account.ip_list drop
> > 	}
> > 	set account.ip_list {
> > 		type ipv4_addr
> > 		flags interval
> > 		auto-merge
> > 	}
> > }
> > add element inet service_1234567 account.ip_list { /* add 1000s of CIDRs here */ }
> > ```
> > 
> > I suspect this is related to:
> > - 36ed9b6e3961 (upstream 7e43e0a1141deec651a60109dab3690854107298)
> > - netfilter: nft_set_rbtree: translate rbtree to array for binary search
> 
> More memory consumption is expected indeed, but not so much as you are
> reporting.
> 
> > I'm still digging into this, and plan on reverting commits and seeing if memory
> > usage goes back to nominal in production. I don't have a trivial
> > reproducer unfortunately.
> 
> The extra memory comes from the array allocation, the relevant code
> is here:
> 
> #define NFT_ARRAY_EXTRA_SIZE    10240 
>  
> /* Similar to nft_rbtree_{u,k}size to hide details to userspace, but consider
>  * packed representation coming from userspace for anonymous sets too.
>  */     
> static u32 nft_array_elems(const struct nft_set *set)
> 
> > Happy to run some additional tests, and I can easily apply patches on top of
> > linux-6.18.y to run in a test environment.
> 
> I would need more info to propose a patch, I don't know where you
> are pulling such numbers. You also mention you have no reproducer.
> 
To clarify, this issue _is_ happening in our production environments, so I can
reproduce this issue there. It only happened when going from 6.18.12 to
6.18.15, and with a service inside a cgroup that is mostly applying large sets
of IPs via nft. I do not have a simple reproducer script or something I can
easily share yet, but am working on that.

I'm going to try reverting the rbtree patch series locally and see if it still
happens. I can also play with NFT_ARRAY_EXTRA_SIZE and see if that is a factor
here as well.

> > We are using userspace nftables 1.1.3, but had to apply the patch mentioned
> > in this thread: https://lore.kernel.org/all/e6b43861cda6953cc7f8c259e663b890e53d7785.camel@sapience.com/
> > In order to solve the other regression we encountered.
> 
> Yes, there are plans to revert a kernel patch that went in -stable to
> address this.

Thanks.
--chris


Thread overview: 8+ messages
2026-03-04 17:50 [REGRESSION] 6.18.14 netfilter/nftables consumes way more memory Chris Arges
2026-03-04 21:26 ` Pablo Neira Ayuso
2026-03-04 21:27   ` Pablo Neira Ayuso
2026-03-05 16:28     ` Chris Arges [this message]
2026-03-06 12:22       ` Pablo Neira Ayuso
2026-03-06 12:25         ` Pablo Neira Ayuso
2026-03-06 18:20           ` Chris Arges
2026-03-07  0:15             ` Pablo Neira Ayuso
