From: Florian Westphal <fw@strlen.de>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Florian Westphal <fw@strlen.de>,
netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
Eric Dumazet <edumazet@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Jozsef Kadlecsik <kadlec@netfilter.org>
Subject: Re: [netfilter-core] [Q] The usage of xt_recseq.
Date: Tue, 13 Aug 2024 20:32:02 +0200
Message-ID: <20240813183202.GA13864@breakpoint.cc>
In-Reply-To: <20240813152810.iBu4Tg20@linutronix.de>
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > > Would it work to convert the counters to u64_stats_sync? On 32bit
> > > there would be a seqcount_t with preemption disabling during the
> > > update which means the xt_write_recseq_begin()/ xt_write_recseq_end()
> > > has to be limited to the counter update only. On 64bit architectures
> > > there would be just the update. This means that the number of packets
> > > and bytes might be "off" (the one got updated, the other not "yet") but
> > > I don't think that this is a problem here.
> >
> > Unfortunately it's not only about counters; local_bh_disable() is also
> > used to prevent messing up the chain jump stack.
>
> Okay. But I could get rid of the counters/ seqcount and worry about the
> other things later on?
Unfortunately no, see xt_replace_table(). A seqcount transition is taken
as the "done" signal for releasing the old ruleset.
See d3d40f237480 ("Revert "netfilter: x_tables: Switch synchronization to RCU"")
and its history.
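To make the "done" signal concrete, here is a minimal userspace model of
the idea, not the kernel code. NR_CPUS, recseq[] and all function names
are made up for illustration, and the model ignores nested traversals and
memory ordering. A traverser makes its per-CPU counter odd while it walks
the ruleset; the replacer snapshots the counters after publishing the new
ruleset and treats a CPU as done once its counter was even or has moved.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for this model */

/* One sequence counter per CPU; odd means "a traversal is in progress". */
static unsigned int recseq[NR_CPUS];

static void traverse_begin(int cpu)
{
	assert(!(recseq[cpu] & 1)); /* the model does not handle nesting */
	recseq[cpu]++;              /* even -> odd */
}

static void traverse_end(int cpu)
{
	assert(recseq[cpu] & 1);
	recseq[cpu]++;              /* odd -> even */
}

/* Taken by the replacer after the new ruleset has been published. */
static void take_snapshot(unsigned int snap[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		snap[cpu] = recseq[cpu];
}

/*
 * A CPU can no longer be using the old ruleset if it was idle at
 * snapshot time (even count) or if its counter changed since then.
 */
static bool cpu_is_done(unsigned int snap, unsigned int now)
{
	return !(snap & 1) || now != snap;
}

static bool old_ruleset_releasable(const unsigned int snap[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!cpu_is_done(snap[cpu], recseq[cpu]))
			return false;
	return true;
}
```

If the counter goes away, the replacer has nothing left to poll, which is
why the counters cannot be dropped in isolation.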
The first step would be to revert to RCU and then replace
synchronize_rcu() with a call_rcu()-based release of the blob.
Or, just tag the x_tables traversers as incompatible with
CONFIG_PREEMPT_RT in Kconfig...
It's possible to build a kernel that supports iptables-nft
(the iptables-over-nftables API) but not classic iptables, so the
problematic table traversal code isn't built.
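For the call_rcu-based release mentioned above, the shape would be roughly
as follows. This is a userspace model only: struct cb_head stands in for
the kernel's struct rcu_head, and model_call_rcu(),
model_grace_period_end(), struct ruleset_blob and replace_ruleset() are
hypothetical names, not existing x_tables code. The point it shows is that
the replacer queues the old blob for release and returns without waiting.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct rcu_head (model only). */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Hypothetical ruleset blob with the callback head embedded. */
struct ruleset_blob {
	struct cb_head rcu;
	size_t size;
	unsigned char *entries;
};

static struct cb_head *pending;          /* callbacks awaiting a grace period */
static struct ruleset_blob *current_blob;
static unsigned int blobs_freed;

/* Model of call_rcu(): queue the callback and return immediately. */
static void model_call_rcu(struct cb_head *head,
			   void (*func)(struct cb_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Model of a grace period ending: all queued callbacks now run. */
static void model_grace_period_end(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

/* Release callback: recover the blob from its embedded head. */
static void blob_free_cb(struct cb_head *head)
{
	struct ruleset_blob *blob = (struct ruleset_blob *)
		((char *)head - offsetof(struct ruleset_blob, rcu));

	free(blob->entries);
	free(blob);
	blobs_freed++;
}

static struct ruleset_blob *blob_alloc(size_t size)
{
	struct ruleset_blob *blob = calloc(1, sizeof(*blob));

	if (!blob)
		return NULL;
	blob->entries = calloc(1, size);
	if (!blob->entries) {
		free(blob);
		return NULL;
	}
	blob->size = size;
	return blob;
}

/* Publish the new blob; defer release of the old one, no blocking wait. */
static void replace_ruleset(struct ruleset_blob *new_blob)
{
	struct ruleset_blob *old = current_blob;

	current_blob = new_blob; /* kernel code would use rcu_assign_pointer() */
	if (old)
		model_call_rcu(&old->rcu, blob_free_cb);
}
```

In the kernel the readers would sit inside rcu_read_lock() sections and
the grace period is what guarantees nobody still walks the old blob when
the callback runs.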
> So jumpstack. This is exclusively used by ipt_do_table(). Not sure how a
> timer comes here but I guess any softirq (as in NAPI) would do the job
> without actually disabling BH.
Think of e.g. the TCP retransmit timer kicking in while the kernel
executes the IP output path on behalf of a write() on some socket.
We're in process context, inside ipt_do_table(); local_bh_disable() is
needed to keep a softirq from coming in at the wrong time, because the
jumpstack area is per-CPU.
> Can this be easily transformed to the on-stack thingy that nft is using
> or is it completely different?
As a first step, blob validation needs to be changed to verify that the
jump depth can't exceed 16 (i.e. 64 bytes of extra scratch space on the
stack for storing return addresses).
Then the traverser could be updated. It will probably break some test
cases, but I don't think there are real production rulesets that would
fail to load with a chainstack limit of 16.
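A rough sketch of the two steps, using a toy rule format rather than the
real ipt_entry blob. Everything here (struct rule, the ACT_* values,
MAX_JUMP_DEPTH, both functions) is made up for illustration. It assumes
return addresses are kept as 32-bit offsets, which is one way 16 entries
come out at 64 bytes, and it assumes loops are rejected elsewhere during
validation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_JUMP_DEPTH 16 /* the limit proposed above */

enum action { ACT_CONTINUE, ACT_JUMP, ACT_RETURN, ACT_ACCEPT, ACT_DROP };

/* Toy rule: an action plus a jump target (rule index, ACT_JUMP only). */
struct rule {
	enum action act;
	uint32_t target;
};

_Static_assert(sizeof(uint32_t[MAX_JUMP_DEPTH]) == 64,
	       "return stack is 64 bytes with 32-bit offsets");

/*
 * Step 1, load-time validation: deepest jump nesting reachable from the
 * chain starting at pc. A chain is assumed to end at its ACT_RETURN.
 * Naive recursive walk that gives up as soon as the limit is exceeded.
 */
static unsigned int max_depth_from(const struct rule *rules, size_t n,
				   uint32_t pc, unsigned int depth)
{
	unsigned int max = depth;

	if (depth > MAX_JUMP_DEPTH)
		return depth;
	for (; pc < n && rules[pc].act != ACT_RETURN; pc++) {
		unsigned int d;

		if (rules[pc].act != ACT_JUMP)
			continue;
		if (rules[pc].target >= n)
			return MAX_JUMP_DEPTH + 1; /* bad target: reject */
		d = max_depth_from(rules, n, rules[pc].target, depth + 1);
		if (d > max)
			max = d;
		if (max > MAX_JUMP_DEPTH)
			return max;
	}
	return max;
}

static bool ruleset_depth_ok(const struct rule *rules, size_t n)
{
	return max_depth_from(rules, n, 0, 0) <= MAX_JUMP_DEPTH;
}

/*
 * Step 2, traversal with the return stack on the caller's stack, so no
 * per-CPU area is shared with a softirq that re-enters the traverser.
 */
static enum action traverse(const struct rule *rules, size_t n)
{
	uint32_t stack[MAX_JUMP_DEPTH];
	unsigned int depth = 0;
	uint32_t pc = 0;

	while (pc < n) {
		const struct rule *r = &rules[pc];

		switch (r->act) {
		case ACT_CONTINUE:
			pc++;
			break;
		case ACT_JUMP:
			if (depth == MAX_JUMP_DEPTH)
				return ACT_DROP; /* validation should prevent this */
			stack[depth++] = pc + 1;
			pc = r->target;
			break;
		case ACT_RETURN:
			if (depth == 0)
				return ACT_ACCEPT; /* toy base-chain policy */
			pc = stack[--depth];
			break;
		case ACT_ACCEPT:
		case ACT_DROP:
			return r->act;
		}
	}
	return ACT_DROP; /* ran off the end of the toy ruleset */
}
```

With the depth bounded at load time, the runtime check in traverse() is
only a safety net, and a ruleset nesting 17 jumps deep would be refused
at load.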
> local_lock_nested_bh() would be the easiest to not upset anyone but this
> is using hand crafted per-CPU memory instead of alloc_percpu(). Can
> stacksize get extremely huge?
Classic iptables allows as many calls as there are jumps in the ruleset,
so theoretically the stack can be huge.
Does that happen outside of test case scripts? I do not think so.