From: Florian Westphal <fw@strlen.de>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Florian Westphal <fw@strlen.de>,
netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
linux-rt-devel@lists.linux.dev,
Pablo Neira Ayuso <pablo@netfilter.org>,
Jozsef Kadlecsik <kadlec@netfilter.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH net-next 1/3] netfilter: Make xt_table::private RCU protected.
Date: Mon, 17 Feb 2025 16:35:48 +0100 [thread overview]
Message-ID: <20250217153548.GB16351@breakpoint.cc> (raw)
In-Reply-To: <20250217145754.KVUio79e@linutronix.de>
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > > This can also be achieved with RCU: Each reader of the private pointer
> > > will be with in an RCU read section. The new pointer will be published
> > > with rcu_assign_pointer() and synchronize_rcu() is used to wait until
> > > each reader left its critical section.
> >
> > Note we had this before and it was reverted in
> > d3d40f237480 ("Revert "netfilter: x_tables: Switch synchronization to RCU"")
> >
> > I'm not saying its wrong, but I think you need a plan b when the same
> > complaints wrt table replace slowdown come in.
> >
> > And unfortunately I can't find a solution for this, unless we keep
> > either the existing wait-scheme for counters sake or we accept
> > that some counter update might be lost between copy to userland and
> > destruction (it would need to use rcu_work or similar, the xt
> > target/module destroy callbacks can sleep).
>
> Urgh. Is this fast & frequent update a real-world thing or a benchmark
> of some sort? I mean adding rule after rule is possible but…
No idea, its certainly bad design (iptables-restore exists for a
reason).
> I used here synchronize_rcu() and there is also
> synchronize_rcu_expedited() but I do hate it. With everything.
Agreed.
> What are the counters used in userland for? I've seen that they are
> copied but did not understood why.
> iptables-legacy -A INPUT -j ACCEPT
> ends up in xt_replace_table() but iptables-nft doesn't. Different
> interface, better design? Or I just used legacy and now it is considered
> as the only?
iptables-nft uses the nftables netlink API, I don't think it suffers
from the preempt_rt issues you're resolving in the setsockopt api.
It won't go anywhere near xt_replace_table and it doesn't use the
xtables binary format on the user/kernel api side.
> I see two invocations on iptables-legacy-restore.
>
> But the question remains: Why copy counters after replacing the rule
> set?
Good question. What I can gather from
https://git.netfilter.org/iptables/tree/libiptc/libiptc.c#n2597
After replace we get copy of the old counters, depending on mode
we can force-update counters post-replace in kernel, via
ret = setsockopt(handle->sockfd, TC_IPPROTO, SO_SET_ADD_COUNTERS,
newcounters, counterlen);
I suspect this is whats used by 'iptables -L -v -Z INPUT'.
But I don't know if anyone uses this in practice.
Maybe Pablo remembers the details, this is ancient code by kernel
standards.
I think we might get away with losing counters on the update
side, i.e. rcu_update_pointer() so new CPUs won't find the old
binary blob, copy the counters, then defer destruction via rcu_work
or similar and accept that the counters might have marginally
changed after the copy to userland and before last cpu left its
rcu critical section.
next prev parent reply other threads:[~2025-02-17 15:35 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-16 12:51 [PATCH net-next 0/3] Replace xt_recseq with u64_stats Sebastian Andrzej Siewior
2025-02-16 12:51 ` [PATCH net-next 1/3] netfilter: Make xt_table::private RCU protected Sebastian Andrzej Siewior
2025-02-17 14:05 ` Florian Westphal
2025-02-17 14:57 ` Sebastian Andrzej Siewior
2025-02-17 15:35 ` Florian Westphal [this message]
2025-02-17 15:56 ` Sebastian Andrzej Siewior
2025-02-17 16:20 ` Florian Westphal
2025-02-18 8:18 ` Sebastian Andrzej Siewior
2025-02-18 12:46 ` Florian Westphal
2025-02-16 12:51 ` [PATCH net-next 2/3] netfilter: Split the xt_counters type between kernel and user Sebastian Andrzej Siewior
2025-02-16 15:22 ` kernel test robot
2025-02-16 12:51 ` [PATCH net-next 3/3] netfilter: Use u64_stats for counters in xt_counters_k Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250217153548.GB16351@breakpoint.cc \
--to=fw@strlen.de \
--cc=bigeasy@linutronix.de \
--cc=coreteam@netfilter.org \
--cc=kadlec@netfilter.org \
--cc=linux-rt-devel@lists.linux.dev \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.