From: Florian Westphal <fw@strlen.de>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
	Eric Dumazet <edumazet@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Jozsef Kadlecsik <kadlec@netfilter.org>
Subject: Re: [netfilter-core] [Q] The usage of xt_recseq.
Date: Tue, 13 Aug 2024 16:37:19 +0200
Message-ID: <20240813143719.GA5147@breakpoint.cc>
In-Reply-To: <20240813140121.QvV8fMbm@linutronix.de>

Hi Sebastian!

Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> xt_recseq is a per-CPU sequence counter which is not entirely using the
> seqcount API.
> The writer side of the sequence counter is updating the packet and byte
> counter (64bit) while processing a packet. The reader simply retrieves
> the two counters.
> Based on the code, the writer side can be recursive which is probably
> why the "regular" write side isn't used or maybe because there is no
> "lock".

Yes, recursive entry is possible even with local_bh_disable(), as
some of the xt_FOO extensions can send a packet (REJECT and TEE come
to mind), which can re-enter into ip_tables' traverser (*_do_table).
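
As a rough userspace sketch of why a recursion-aware write side is
needed (this is a model, not the kernel code): the outermost writer
flips the sequence from even to odd, and a nested writer on the same
CPU must leave it alone. The names below are hypothetical and only
modeled on xt_write_recseq_begin()/xt_write_recseq_end(); memory
barriers and per-CPU handling are left out.

```c
#include <assert.h>

/* Userspace stand-in for the per-CPU sequence counter (one "CPU" only). */
static unsigned int recseq;

/* Returns 1 for the outermost writer, 0 for a nested (recursive) one. */
static unsigned int write_recseq_begin(void)
{
	/* Low bit set means a writer is already active on this CPU. */
	unsigned int addend = (recseq + 1) & 1;

	recseq += addend;	/* even -> odd only for the outermost writer */
	return addend;
}

static void write_recseq_end(unsigned int addend)
{
	assert(recseq & 1);	/* must be inside a write section */
	recseq += addend;	/* odd -> even only for the outermost writer */
}

/* Hypothetical helper: a reader must retry while this returns nonzero. */
static int write_in_progress(void)
{
	return recseq & 1;
}
```

A nested begin/end pair (for example a REJECT-generated packet
re-entering the traverser) gets addend 0, so the sequence stays odd
until the outermost writer finishes.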

> The seqcount is per-CPU and disabling BH is used as the "lock". On
> PREEMPT_RT, code in a local_bh_disable()ed section is preemptible and this
> means that a seqcount reader with higher priority can preempt the writer
> which leads to a deadlock.
> 
> While trying to trigger the writer side, I managed only to trigger a
> single reader and only while using iptables-legacy/ arptables-legacy
> commands. The nft did not trigger it. So it is legacy code only.

Yes, this is legacy only.
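
The deadlock from the quoted paragraph can be illustrated with a
simplified single-threaded model (a sketch under assumptions, not the
kernel implementation): the reader retries for as long as the sequence
is odd, so if it preempts a writer that sits in the middle of an update
and the writer never gets to run again, the reader never finishes.
All names are hypothetical, and the max_spins bound exists only so the
model terminates.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified single-CPU stand-in for a seqcount-protected counter pair. */
struct counters {
	unsigned int seq;	/* odd while a writer is mid-update */
	uint64_t pcnt;		/* packets */
	uint64_t bcnt;		/* bytes */
};

static void writer_begin(struct counters *c)
{
	c->seq++;		/* even -> odd */
}

static void writer_end(struct counters *c)
{
	assert(c->seq & 1);	/* must pair with writer_begin() */
	c->seq++;		/* odd -> even */
}

/*
 * Returns 1 and fills *pcnt / *bcnt on a consistent snapshot, or 0 if
 * the writer was still active after max_spins attempts. A real reader
 * has no such bound and would keep spinning.
 */
static int read_counters(const struct counters *c, uint64_t *pcnt,
			 uint64_t *bcnt, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		unsigned int start = c->seq;

		if (start & 1)
			continue;	/* writer active, retry */
		*pcnt = c->pcnt;
		*bcnt = c->bcnt;
		if (c->seq == start)
			return 1;
	}
	return 0;
}
```

Calling read_counters() between writer_begin() and writer_end() models
the higher-priority reader preempting the writer: it fails no matter
how large max_spins is, because the writer cannot make progress.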

> Would it work to convert the counters to u64_stats_sync? On 32bit
> there would be a seqcount_t with preemption disabling during the
> update which means the xt_write_recseq_begin()/ xt_write_recseq_end()
> has to be limited to the counter update only. On 64bit architectures there
> would be just the update. This means that the number of packets and bytes
> might be "off" (the one got updated, the other not "yet") but I don't
> think that this is a problem here.

Unfortunately it's not only about counters; local_bh_disable() is also
used to prevent messing up the chain jump stack.

For local hooks, this is called from process context, so
local_bh_disable() is needed to keep timers from kicking in and
re-using the jumpstack.

The chain stack is percpu in -legacy, and on-stack in nf_tables.

Then, there is also recursion via xt_TEE.c, hence this strange
        if (static_key_false(&xt_tee_enabled))

in ipt_do_table() (we switch to a shadow stack in that case).
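
The shadow-stack idea can be sketched in userspace roughly as follows.
This is a simplified model with hypothetical names (STACKSIZE,
tee_depth, select_jumpstack), assuming the per-CPU jump stack is
allocated at twice the needed depth and that a flag is set while a
duplicated packet is being sent; it is not the code from
ipt_do_table().

```c
#include <assert.h>

#define STACKSIZE 4	/* hypothetical maximum chain nesting depth */

/* One CPU's jump stack: a primary half plus a shadow half for TEE. */
static int jumpstack[2 * STACKSIZE];

/* Stand-in for a per-CPU flag: 1 while a duplicated packet is in flight. */
static int tee_depth;

/* Pick the half of the jump stack that the current traversal may use. */
static int *select_jumpstack(void)
{
	assert(tee_depth == 0 || tee_depth == 1);
	return jumpstack + STACKSIZE * tee_depth;
}
```

The original traversal continues after TEE has sent its copy, so its
saved return positions must survive the nested run; offsetting into the
shadow half keeps the two traversals apart.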
