netfilter-devel.vger.kernel.org archive mirror
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Florian Westphal <fw@strlen.de>
Cc: netfilter-devel <netfilter-devel@vger.kernel.org>
Subject: Re: [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure
Date: Tue, 29 Jul 2025 12:38:59 +0200
Message-ID: <aIikwxU686KFto35@calendula>
In-Reply-To: <aIfrktUYzla8f9dw@strlen.de>

On Mon, Jul 28, 2025 at 11:28:50PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Yes, u32 flush_id (or trans_id) needs to be added, then set
> > transaction id incrementally.
> 
> Not enough, unfortunately.
> 
> The key difference between flush (delete all elements) and delset
> (remove the set and all elements) is that the set itself gets detached
> from the dataplane.  Then, when elements get free'd, we can just iterate
> the set and free all elements, they are all unreachable from the
> dataplane.
> 
> But in case of a flush, that's not the case, releasing the elements will
> cause use-after-free.  Current DELSETELEM method unlinks the elements
> from the set, links them to the DELSETELEM transactional container.

DELSETELEM does not unlink elements from the set in the preparation
phase. Instead, elements are marked as inactive in the next generation
but they still remain linked to the set. These elements are removed from
the set in either the commit or the abort phase.
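
The two-phase behaviour described above can be sketched as a toy
userspace model. This is not kernel code: struct elem, cur_gen and all
function names below are made up for illustration, and the real
nf_tables helpers differ. The sketch only shows that the preparation
phase touches a generation bit while the element stays linked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT kernel code. One bit per generation; a set bit means
 * "inactive in that generation". All names here are hypothetical. */
struct elem {
	int key;
	unsigned int genmask;	/* bit 0: generation 0, bit 1: generation 1 */
	bool linked;		/* still reachable through the set */
};

static unsigned int cur_gen;	/* current generation, 0 or 1 */

static unsigned int genmask_cur(void)
{
	return 1u << cur_gen;
}

static unsigned int genmask_next(void)
{
	return 1u << (cur_gen ^ 1u);
}

static bool elem_active(const struct elem *e, unsigned int genmask)
{
	return !(e->genmask & genmask);
}

/* Preparation phase: hide the element from the next generation only.
 * It stays linked, so the current generation still sees it. */
static void prepare_delete(struct elem *e)
{
	e->genmask |= genmask_next();
}

/* Abort phase: clear the mark again; the element never left the set. */
static void abort_delete(struct elem *e)
{
	e->genmask &= ~genmask_next();
}

/* Commit phase, step 1: the next generation becomes the current one. */
static void commit_generation(void)
{
	cur_gen ^= 1u;
}

/* Commit phase, step 2: only now is the element unlinked from the set. */
static void commit_delete(struct elem *e)
{
	e->linked = false;
}
```

In this model an element passed to prepare_delete() is still active
for genmask_cur() and still linked. It only disappears from lookups
after commit_generation(), and it is only unlinked by commit_delete().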

- flush should skip elements that are already inactive
- flush should not work on deleted sets.
- flush command (elements are marked as inactive) then delete set
  skips those elements that are inactive, so the abort path can unwind
  accordingly using the transaction id marker that I am proposing.

I think the key is that no two different transactions release the same
object, hence the need for the transaction id for the flush command to
differentiate between delete set and flush set commands.
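
One possible reading of the transaction id idea, again as a toy
userspace model and not the actual patch: every element remembers
which transaction deactivated it, flush skips elements that are
already inactive, and abort only restores what the aborting
transaction itself marked. The trans_id field, the struct layout and
the function names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT kernel code. Field and function names are hypothetical. */
struct elem {
	bool inactive_next;	/* marked inactive in the next generation */
	unsigned int trans_id;	/* transaction that marked it, 0 = none */
};

/* Single element delete: refuse if another transaction already owns it. */
static bool delete_elem(struct elem *e, unsigned int trans_id)
{
	if (e->inactive_next)
		return false;
	e->inactive_next = true;
	e->trans_id = trans_id;
	return true;
}

/* Flush: mark every element that is still active and skip the ones that
 * are already inactive. Returns how many elements this flush claimed. */
static size_t flush_set(struct elem *elems, size_t n, unsigned int trans_id)
{
	size_t marked = 0;

	for (size_t i = 0; i < n; i++) {
		if (delete_elem(&elems[i], trans_id))
			marked++;
	}
	return marked;
}

/* Abort of one transaction: reactivate only the elements it marked, so
 * two transactions never unwind (or release) the same object. */
static size_t abort_trans(struct elem *elems, size_t n, unsigned int trans_id)
{
	size_t restored = 0;

	for (size_t i = 0; i < n; i++) {
		if (!elems[i].inactive_next || elems[i].trans_id != trans_id)
			continue;
		elems[i].inactive_next = false;
		elems[i].trans_id = 0;
		restored++;
	}
	return restored;
}
```

With three elements, deleting one under id 1 and then flushing under
id 2 leaves the flush owning exactly the other two. Aborting id 2
restores those two and leaves the element owned by id 1 untouched.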

I can take a look next week to see if all this is practical,
otherwise...

> Then, on abort they get re-linked to the set, or, in case of commit,
> they can be free'd after the final synchronize_rcu().
> 
> That leaves two options:
> 1.  Use the first patchset, that makes delsetelem allocations sleepable.
> 2.  Add a pointer and an id to nft_set_ext.
> 
> The drawback of 2) is the added mem cost for every set element (first
> patch series only forces it for rhashtable).
> 
> The major upside however is that DELSETELEM transaction objects are
> simplified a lot, the to-be-deleted elements could be linked to it by
> the then-always-available nft_set_ext pointer, i.e., each DELSETELEM
> transaction object can take an arbitrary number of elements.
> 
> Unless you disagree, I will go for 2).
> This will also allow removing the krealloc() compaction for DELSETELEM,
> so it should be a net code-removal patch.
> 
> Another option might be to replace a flush with delset+newset
> internally, but this will get tricky because the set/map is still being
> referenced by other rules, we'd have to fixup the ruleset internally to
> use the new/empty set while still being able to roll back.
> 
> Probably too tricky / hard to get right, but I'll check it anyway.

... if I don't find a way or I'm too slow, we can take your series in
the next merge window as is.

Thread overview: 18+ messages
2025-07-04 12:30 [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure Florian Westphal
2025-07-04 12:30 ` [nf-next 1/2] netfilter: nf_tables: allow iter callbacks to sleep Florian Westphal
2025-07-04 12:30 ` [nf-next 2/2] netfilter: nf_tables: all transaction allocations can now sleep Florian Westphal
2025-07-24 23:19 ` [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure Pablo Neira Ayuso
2025-07-25  0:24   ` Florian Westphal
2025-07-25 10:10     ` Pablo Neira Ayuso
2025-07-25 11:15       ` Florian Westphal
2025-07-25 15:03         ` Pablo Neira Ayuso
2025-07-28 21:28           ` Florian Westphal
2025-07-29  7:22             ` Jozsef Kadlecsik
2025-07-29 10:27               ` Pablo Neira Ayuso
2025-07-29 10:50                 ` Jozsef Kadlecsik
2025-07-29 10:38             ` Pablo Neira Ayuso [this message]
2025-07-29 11:37               ` Florian Westphal
2025-07-30 16:16                 ` Pablo Neira Ayuso
2025-07-30 16:35                   ` Florian Westphal
2025-08-19 19:10                     ` Florian Westphal
2025-08-19 22:23                       ` Pablo Neira Ayuso
