All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Florian Westphal <fw@strlen.de>
Cc: netfilter-devel <netfilter-devel@vger.kernel.org>
Subject: Re: [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure
Date: Fri, 25 Jul 2025 17:03:24 +0200	[thread overview]
Message-ID: <aIOcq2sdP17aYgAE@calendula> (raw)
In-Reply-To: <aINnTy_Ifu66N8dp@strlen.de>

On Fri, Jul 25, 2025 at 01:15:27PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > I think the key is to be able to identify what elements have been
> > flushed by what flush command, so abort path can just restore/undo the
> > state for the given elements.
> > 
> > Because this also is possible:
> > 
> >        flush set x + [...] + flush set x
> > 
> > And [...] includes possible new/delete elements in x.
> > 
> > It should be possible to store an flush command id in the set element
> > (this increases the memory consumption of the set element, which your
> > series already does it) to identify what flush command has deleted it.
> > This is needed because the transaction object won't be in place but I
> > think it is a fair tradeoff. The flush command id can be incremental
> > in the batch (the netlink sequence number cannot be used for this
> > purpose).
> 
> OK, that might work.  So the idea is to do the set walk as-is, but
> instead of allocating a NFT_MSG_DELSETELEM for each transaction
> object, introduce NFT_MSG_FLUSHSET transaction.

Or simply using NFT_MSG_DELSET and add a flag to note this is a flush.

> Then, for a DELSETELEM request with no elements (== flush),
> allocate *one* NFT_MSG_FLUSHSET transaction.

Yes.

> The NFT_MSG_FLUSHSET transaction holds the set being flushed
> and an id, that increments sequentially once for each flush.

Yes.

> Then, do the walk as-is:
> 
> static int nft_setelem_flush(const struct nft_ctx *ctx,
>                              struct nft_set *set,
>                              const struct nft_set_iter *iter,
>                              struct nft_elem_priv *elem_priv)
> {
>         const struct nft_set_ext *ext = nft_set_elem_ext(set, elem_priv);
>         struct nft_trans *trans;
> 
> 	/* previous delsetelem or erlier flush marked it inactive */
>         if (!nft_set_elem_active(ext, iter->genmask))
>                 return 0;
> 
> /* No allocation per set elemenet anymore */
> /* trans = nft_trans_alloc_gfp(ctx, NFT_MSG_DELSETELEM, */
> 
> 	/* trans_flush could be obtained from the tail of
> 	 * the transaction list.  Or placed in *iter.
> 	 */
> 	elem_priv->flush_id = trans_flush->flush_id
>         set->ops->flush(ctx->net, set, elem_priv);
>         set->ndeact++;
> 
>         nft_setelem_data_deactivate(ctx->net, set, elem_priv);
> 
>         return 0;

Maybe use nft_map_deactivate() ?

> }
> 
> On abort, NFT_MSG_FLUSHSET would do another walk, for all set elements,
> and then calls nft_setelem_data_activate/nft_setelem_activate in case
> elem_priv->flush_id == trans_flush->flush_id.

Exactly, maybe nft_map_activate() can help.

> Did I get that right?  I don't see any major issues with this, except
> the need to add u32 flush_id to struct nft_elem_priv.
> Or perhaps struct nft_set_ext would be a better fit as nft_elem_priv is
> just a proxy.

Yes, u32 flush_id (or trans_id) needs to be added, then set
transaction id incrementally.

> > Of course, this needs careful look, but if the set element can be used
> > to annotate the information that allows us to restore to previous
> > state before flush (given the transaction object is not used anymore),
> > then it should work. Your series is extending the set element size for
> > a different purpose, so I think the extra memory should not be an
> > issue.
> 
> Agree, it would need 4 bytes per elem which isn't much compared to the
> transaction log savings.
> 
> Will you have a look or should I have a go at this?

Please, go for it.

  reply	other threads:[~2025-07-25 15:03 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-04 12:30 [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure Florian Westphal
2025-07-04 12:30 ` [nf-next 1/2] netfilter: nf_tables: allow iter callbacks to sleep Florian Westphal
2025-07-04 12:30 ` [nf-next 2/2] netfilter: nf_tables: all transaction allocations can now sleep Florian Westphal
2025-07-24 23:19 ` [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure Pablo Neira Ayuso
2025-07-25  0:24   ` Florian Westphal
2025-07-25 10:10     ` Pablo Neira Ayuso
2025-07-25 11:15       ` Florian Westphal
2025-07-25 15:03         ` Pablo Neira Ayuso [this message]
2025-07-28 21:28           ` Florian Westphal
2025-07-29  7:22             ` Jozsef Kadlecsik
2025-07-29 10:27               ` Pablo Neira Ayuso
2025-07-29 10:50                 ` Jozsef Kadlecsik
2025-07-29 10:38             ` Pablo Neira Ayuso
2025-07-29 11:37               ` Florian Westphal
2025-07-30 16:16                 ` Pablo Neira Ayuso
2025-07-30 16:35                   ` Florian Westphal
2025-08-19 19:10                     ` Florian Westphal
2025-08-19 22:23                       ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aIOcq2sdP17aYgAE@calendula \
    --to=pablo@netfilter.org \
    --cc=fw@strlen.de \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.