All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Florian Westphal <fw@strlen.de>
Cc: Brian Witte <brianwitte@mailfence.com>,
	netfilter-devel@vger.kernel.org, kadlec@blackhole.kfki.hu
Subject: Re: [PATCH nf-next] netfilter: nf_tables: use dedicated mutex for reset operations
Date: Tue, 3 Feb 2026 00:01:01 +0100	[thread overview]
Message-ID: <aYEsrZpkqCb675vv@chamomile> (raw)
In-Reply-To: <aXlTpuk0Z1CeoYwT@strlen.de>

On Wed, Jan 28, 2026 at 01:09:10AM +0100, Florian Westphal wrote:
> Brian Witte <brianwitte@mailfence.com> wrote:
> > Add a dedicated reset_mutex to serialize reset operations instead of
> > reusing the commit_mutex. This fixes a circular locking dependency
> > between commit_mutex, nfnl_subsys_ipset, and nlk_cb_mutex-NETFILTER
> > that could lead to deadlock when nft reset, ipset list, and
> > iptables-nft with set match run concurrently:
> > 
> >   CPU0 (nft reset):        nlk_cb_mutex -> commit_mutex
> >   CPU1 (ipset list):       nfnl_subsys_ipset -> nlk_cb_mutex
> >   CPU2 (iptables -m set):  commit_mutex -> nfnl_subsys_ipset
> > 
> > The reset_mutex only serializes concurrent reset operations to prevent
> > counter underruns, which is all that's needed. Breaking the commit_mutex
> > dependency in the dump-reset path eliminates the circular lock chain.
> > 
> > Reported-by: syzbot+ff16b505ec9152e5f448@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=ff16b505ec9152e5f448
> > Signed-off-by: Brian Witte <brianwitte@mailfence.com>
> 
> This needs more work:
> 
> -----------------------------
> net/netfilter/nf_tables_api.c:1002 RCU-list traversed in non-reader section!!
> 
> other info that might help us debug this:
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by nft/17539:
>  #0: ffff888132018368 (&nft_net->reset_mutex){+.+.}-{4:4}, at: nf_tables_getobj_reset+0x19e/0x5a0 [nf_tables]
> 
> stack backtrace:
> CPU: 4 UID: 0 PID: 17539 Comm: nft Not tainted 6.19.0-rc6+ #9 PREEMPT(full)
> Call Trace:
>  lockdep_rcu_suspicious.cold+0x4f/0xb1
>  nft_table_lookup.part.0+0x1e7/0x220 [nf_tables]
>  nf_tables_getobj_single+0x196/0x5a0 [nf_tables]
>  nf_tables_getobj_reset+0x1b1/0x5a0 [nf_tables]
>  nfnetlink_rcv_msg+0x49e/0xf00
> 
> Please run nftables.git tests/shell/run-tests.sh with
> 
> CONFIG_PROVE_LOCKING=y
> CONFIG_PROVE_RCU=y
> CONFIG_PROVE_RCU_LIST=y
> 
> This warning is not a false positive, the list traversal was
> fine for reset case because we held the transaction mutex.
> 
> Now that we don't, we need to hold rcu_read_lock().
> 
> Maybe its worth investigating if we should instead protect
> only the reset action itself, i.e. add private reset spinlocks
> in nft_quota_do_dump() et al?

Last time we discussed this:

- There was an attempt to make reset fully atomic (for the whole
  ruleset), which is not really possible because netlink dumps for a
  large ruleset might not fit into, not worth trying.

- Still, there could be two threads resetting the counters at the same
  time, and someone mentioned underrun is possible.

  Looking at last for nft_quota, it should be possible to use
  atomic64_xchg():

diff --git a/net/netfilter/nft_quota.c b/net/netfilter/nft_quota.c
index df0798da2329..4a501cc86192 100644
--- a/net/netfilter/nft_quota.c
+++ b/net/netfilter/nft_quota.c
@@ -144,7 +144,11 @@ static int nft_quota_do_dump(struct sk_buff *skb, struct nft_quota *priv,
         * that we see, don't go over the quota boundary in what we send to
         * userspace.
         */
-       consumed = atomic64_read(priv->consumed);
+       if (reset)
+               consumed = atomic64_xchg(priv->consumed, 0);
+       else
+               consumed = atomic64_read(priv->consumed);
+
        quota = atomic64_read(&priv->quota);
        if (consumed >= quota) {
                consumed_cap = quota;
@@ -160,10 +164,9 @@ static int nft_quota_do_dump(struct sk_buff *skb, struct nft_quota *priv,
            nla_put_be32(skb, NFTA_QUOTA_FLAGS, htonl(flags)))
                goto nla_put_failure;
 
-       if (reset) {
-               atomic64_sub(consumed, priv->consumed);
+       if (reset)
                clear_bit(NFT_QUOTA_DEPLETED_BIT, &priv->flags);
-       }
+
        return 0;
 
 nla_put_failure:

Note that priv->quota could be converted to use WRITE_ONCE/READ_ONCE
instead, because updates in the quota are very rare and only happening
from userspace (atomic64 is not needed).

Then, for nft_counter, it is a bit more complicated, maybe a per-netns
spinlock for counters is sufficient, to protect this
nft_counter_do_dump() when the reset flag is true.

  parent reply	other threads:[~2026-02-02 23:01 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-27  3:06 [PATCH nf-next] netfilter: nf_tables: use dedicated mutex for reset operations Brian Witte
2026-01-28  0:09 ` Florian Westphal
2026-01-30  1:56   ` Brian Witte
2026-01-30 11:51     ` Florian Westphal
2026-02-02 23:01   ` Pablo Neira Ayuso [this message]
2026-02-02 23:06     ` Florian Westphal
2026-02-02 23:43       ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aYEsrZpkqCb675vv@chamomile \
    --to=pablo@netfilter.org \
    --cc=brianwitte@mailfence.com \
    --cc=fw@strlen.de \
    --cc=kadlec@blackhole.kfki.hu \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.