From: Florian Westphal <fw@strlen.de>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel <netfilter-devel@vger.kernel.org>
Subject: Re: [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure
Date: Fri, 25 Jul 2025 13:15:27 +0200 [thread overview]
Message-ID: <aINnTy_Ifu66N8dp@strlen.de> (raw)
In-Reply-To: <aINYGACMGoNL77Ct@calendula>
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> I think the key is to be able to identify what elements have been
> flushed by what flush command, so abort path can just restore/undo the
> state for the given elements.
>
> Because this also is possible:
>
> flush set x + [...] + flush set x
>
> And [...] includes possible new/delete elements in x.
>
> It should be possible to store an flush command id in the set element
> (this increases the memory consumption of the set element, which your
> series already does it) to identify what flush command has deleted it.
> This is needed because the transaction object won't be in place but I
> think it is a fair tradeoff. The flush command id can be incremental
> in the batch (the netlink sequence number cannot be used for this
> purpose).
OK, that might work. So the idea is to do the set walk as-is, but
instead of allocating a NFT_MSG_DELSETELEM for each transaction
object, introduce NFT_MSG_FLUSHSET transaction.
Then, for a DELSETELEM request with no elements (== flush),
allocate *one* NFT_MSG_FLUSHSET transaction.
The NFT_MSG_FLUSHSET transaction holds the set being flushed
and an id, that increments sequentially once for each flush.
Then, do the walk as-is:
static int nft_setelem_flush(const struct nft_ctx *ctx,
struct nft_set *set,
const struct nft_set_iter *iter,
struct nft_elem_priv *elem_priv)
{
const struct nft_set_ext *ext = nft_set_elem_ext(set, elem_priv);
struct nft_trans *trans;
/* previous delsetelem or erlier flush marked it inactive */
if (!nft_set_elem_active(ext, iter->genmask))
return 0;
/* No allocation per set elemenet anymore */
/* trans = nft_trans_alloc_gfp(ctx, NFT_MSG_DELSETELEM, */
/* trans_flush could be obtained from the tail of
* the transaction list. Or placed in *iter.
*/
elem_priv->flush_id = trans_flush->flush_id
set->ops->flush(ctx->net, set, elem_priv);
set->ndeact++;
nft_setelem_data_deactivate(ctx->net, set, elem_priv);
return 0;
}
On abort, NFT_MSG_FLUSHSET would do another walk, for all set elements,
and then calls nft_setelem_data_activate/nft_setelem_activate in case
elem_priv->flush_id == trans_flush->flush_id.
Did I get that right? I don't see any major issues with this, except
the need to add u32 flush_id to struct nft_elem_priv.
Or perhaps struct nft_set_ext would be a better fit as nft_elem_priv is
just a proxy.
> Of course, this needs careful look, but if the set element can be used
> to annotate the information that allows us to restore to previous
> state before flush (given the transaction object is not used anymore),
> then it should work. Your series is extending the set element size for
> a different purpose, so I think the extra memory should not be an
> issue.
Agree, it would need 4 bytes per elem which isn't much compared to the
transaction log savings.
Will you have a look or should I have a go at this?
next prev parent reply other threads:[~2025-07-25 11:15 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-04 12:30 [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure Florian Westphal
2025-07-04 12:30 ` [nf-next 1/2] netfilter: nf_tables: allow iter callbacks to sleep Florian Westphal
2025-07-04 12:30 ` [nf-next 2/2] netfilter: nf_tables: all transaction allocations can now sleep Florian Westphal
2025-07-24 23:19 ` [nf-next 0/2] netfilter: nf_tables: make set flush more resistant to memory pressure Pablo Neira Ayuso
2025-07-25 0:24 ` Florian Westphal
2025-07-25 10:10 ` Pablo Neira Ayuso
2025-07-25 11:15 ` Florian Westphal [this message]
2025-07-25 15:03 ` Pablo Neira Ayuso
2025-07-28 21:28 ` Florian Westphal
2025-07-29 7:22 ` Jozsef Kadlecsik
2025-07-29 10:27 ` Pablo Neira Ayuso
2025-07-29 10:50 ` Jozsef Kadlecsik
2025-07-29 10:38 ` Pablo Neira Ayuso
2025-07-29 11:37 ` Florian Westphal
2025-07-30 16:16 ` Pablo Neira Ayuso
2025-07-30 16:35 ` Florian Westphal
2025-08-19 19:10 ` Florian Westphal
2025-08-19 22:23 ` Pablo Neira Ayuso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aINnTy_Ifu66N8dp@strlen.de \
--to=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).