public inbox for netfilter-devel@vger.kernel.org
* [PATCH v5 nf-next 0/3] netfilter: nf_tables: fix reset request deadlock
@ 2026-02-04 20:26 Brian Witte
  2026-02-04 20:26 ` [PATCH v5 nf-next 1/3] Revert nf_tables commit_mutex in reset path Brian Witte
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Brian Witte @ 2026-02-04 20:26 UTC (permalink / raw)
  To: netfilter-devel
  Cc: pablo, fw, kadlec, syzbot+ff16b505ec9152e5f448, Brian Witte

syzbot reported a possible circular locking dependency between
commit_mutex, nfnl_subsys_ipset and nlk_cb_mutex-NETFILTER:

  WARNING: possible circular locking dependency detected
  syz.3.970/9330 is trying to acquire lock:
  ffff888012d4ccd8 (&nft_net->commit_mutex){+.+.}-{4:4}, at: nf_tables_dumpreset_obj+0x6f/0xa0

  but task is already holding lock:
  ffff88802bce36f0 (nlk_cb_mutex-NETFILTER){+.+.}-{4:4}, at: __netlink_dump_start+0x150/0x990

  Chain exists of:
    &nft_net->commit_mutex --> nfnl_subsys_ipset --> nlk_cb_mutex-NETFILTER

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(nlk_cb_mutex-NETFILTER);
                                 lock(nfnl_subsys_ipset);
                                 lock(nlk_cb_mutex-NETFILTER);
    lock(&nft_net->commit_mutex);

Link: https://syzkaller.appspot.com/bug?extid=ff16b505ec9152e5f448

The bug was introduced by commits that added commit_mutex locking to
serialize reset requests.

v5:
  - Split counter and quota changes into separate patches
  - counter: use global static spinlock wrapping fetch+reset
    atomically to prevent parallel reset underrun, instead of
    per-net spinlock taken too late (after fetch)
  - Drop struct net from counter priv (no longer needed with
    global spinlock)
  - quota: unchanged from v4, atomic64_xchg() for reset

v4:
  - Push spinlock down into nft_counter_reset() instead of holding it
    across entire dump iteration, per Florian's review
  - Store struct net in counter priv to access the per-net spinlock
    during reset, avoiding skb->sk dereference which is NULL in
    single-element GET paths such as nft_get_set_elem
  - Use atomic64_xchg() for quota reset instead of spinlock, which is
    simpler per Pablo's suggestion
  Link: https://lore.kernel.org/netfilter-devel/20260203050723.263515-1-brianwitte@mailfence.com/

v3:
  - Restructured as 2-patch series per Florian's suggestion:
    1. Revert the 3 commits that added commit_mutex locking
    2. Add spinlock-based serialization for reset requests
  Link: https://lore.kernel.org/netfilter-devel/20260201195255.532559-1-brianwitte@mailfence.com/

v2:
  - Switched to a spinlock in nft_pernet instead of mutex
  - A spinlock doesn't sleep, so we stay inside the RCU read-side critical section
  - Removes the try_module_get/module_put and rcu_read_unlock/lock dance
  Link: https://lore.kernel.org/netfilter-devel/20260201062517.263087-1-brianwitte@mailfence.com/

v1:
  - Proposed using a dedicated reset_mutex instead of commit_mutex
  Link: https://lore.kernel.org/netfilter-devel/20260127030604.39982-1-brianwitte@mailfence.com/

Brian Witte (3):
  Revert nf_tables commit_mutex in reset path
  netfilter: nft_counter: serialize reset with spinlock
  netfilter: nft_quota: use atomic64_xchg for reset

 net/netfilter/nf_tables_api.c | 248 ++++++----------------------------
 net/netfilter/nft_counter.c   |  20 ++-
 net/netfilter/nft_quota.c     |  12 +-
 3 files changed, 65 insertions(+), 215 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v5 nf-next 1/3] Revert nf_tables commit_mutex in reset path
  2026-02-04 20:26 [PATCH v5 nf-next 0/3] netfilter: nf_tables: fix reset request deadlock Brian Witte
@ 2026-02-04 20:26 ` Brian Witte
  2026-02-05 13:46   ` Florian Westphal
  2026-02-04 20:26 ` [PATCH v5 nf-next 2/3] netfilter: nft_counter: serialize reset with spinlock Brian Witte
  2026-02-04 20:26 ` [PATCH v5 nf-next 3/3] netfilter: nft_quota: use atomic64_xchg for reset Brian Witte
  2 siblings, 1 reply; 5+ messages in thread
From: Brian Witte @ 2026-02-04 20:26 UTC (permalink / raw)
  To: netfilter-devel
  Cc: pablo, fw, kadlec, syzbot+ff16b505ec9152e5f448, Brian Witte

Revert mutex-based locking for reset requests. It caused a circular
lock dependency between commit_mutex, nfnl_subsys_ipset, and
nlk_cb_mutex when nft reset, ipset list, and iptables-nft with set
match ran concurrently.

This reverts bd662c4218f9, 3d483faa6663, 3cb03edb4de3.

Reported-by: syzbot+ff16b505ec9152e5f448@syzkaller.appspotmail.com
Fixes: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests")
Fixes: 3d483faa6663 ("netfilter: nf_tables: Add locking for NFT_MSG_GETSETELEM_RESET requests")
Fixes: 3cb03edb4de3 ("netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests")
Signed-off-by: Brian Witte <brianwitte@mailfence.com>
---
 net/netfilter/nf_tables_api.c | 248 ++++++----------------------------
 1 file changed, 42 insertions(+), 206 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 9923387907e3..14a6c2f5826e 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3901,23 +3901,6 @@ static int nf_tables_dump_rules(struct sk_buff *skb,
 	return skb->len;
 }
 
-static int nf_tables_dumpreset_rules(struct sk_buff *skb,
-				     struct netlink_callback *cb)
-{
-	struct nftables_pernet *nft_net = nft_pernet(sock_net(skb->sk));
-	int ret;
-
-	/* Mutex is held is to prevent that two concurrent dump-and-reset calls
-	 * do not underrun counters and quotas. The commit_mutex is used for
-	 * the lack a better lock, this is not transaction path.
-	 */
-	mutex_lock(&nft_net->commit_mutex);
-	ret = nf_tables_dump_rules(skb, cb);
-	mutex_unlock(&nft_net->commit_mutex);
-
-	return ret;
-}
-
 static int nf_tables_dump_rules_start(struct netlink_callback *cb)
 {
 	struct nft_rule_dump_ctx *ctx = (void *)cb->ctx;
@@ -3937,16 +3920,10 @@ static int nf_tables_dump_rules_start(struct netlink_callback *cb)
 			return -ENOMEM;
 		}
 	}
-	return 0;
-}
-
-static int nf_tables_dumpreset_rules_start(struct netlink_callback *cb)
-{
-	struct nft_rule_dump_ctx *ctx = (void *)cb->ctx;
-
-	ctx->reset = true;
+	if (NFNL_MSG_TYPE(cb->nlh->nlmsg_type) == NFT_MSG_GETRULE_RESET)
+		ctx->reset = true;
 
-	return nf_tables_dump_rules_start(cb);
+	return 0;
 }
 
 static int nf_tables_dump_rules_done(struct netlink_callback *cb)
@@ -4012,6 +3989,8 @@ static int nf_tables_getrule(struct sk_buff *skb, const struct nfnl_info *info,
 	u32 portid = NETLINK_CB(skb).portid;
 	struct net *net = info->net;
 	struct sk_buff *skb2;
+	bool reset = false;
+	char *buf;
 
 	if (info->nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
@@ -4025,47 +4004,16 @@ static int nf_tables_getrule(struct sk_buff *skb, const struct nfnl_info *info,
 		return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c);
 	}
 
-	skb2 = nf_tables_getrule_single(portid, info, nla, false);
-	if (IS_ERR(skb2))
-		return PTR_ERR(skb2);
-
-	return nfnetlink_unicast(skb2, net, portid);
-}
-
-static int nf_tables_getrule_reset(struct sk_buff *skb,
-				   const struct nfnl_info *info,
-				   const struct nlattr * const nla[])
-{
-	struct nftables_pernet *nft_net = nft_pernet(info->net);
-	u32 portid = NETLINK_CB(skb).portid;
-	struct net *net = info->net;
-	struct sk_buff *skb2;
-	char *buf;
-
-	if (info->nlh->nlmsg_flags & NLM_F_DUMP) {
-		struct netlink_dump_control c = {
-			.start= nf_tables_dumpreset_rules_start,
-			.dump = nf_tables_dumpreset_rules,
-			.done = nf_tables_dump_rules_done,
-			.module = THIS_MODULE,
-			.data = (void *)nla,
-		};
-
-		return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c);
-	}
-
-	if (!try_module_get(THIS_MODULE))
-		return -EINVAL;
-	rcu_read_unlock();
-	mutex_lock(&nft_net->commit_mutex);
-	skb2 = nf_tables_getrule_single(portid, info, nla, true);
-	mutex_unlock(&nft_net->commit_mutex);
-	rcu_read_lock();
-	module_put(THIS_MODULE);
+	if (NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_GETRULE_RESET)
+		reset = true;
 
+	skb2 = nf_tables_getrule_single(portid, info, nla, reset);
 	if (IS_ERR(skb2))
 		return PTR_ERR(skb2);
 
+	if (!reset)
+		return nfnetlink_unicast(skb2, net, portid);
+
 	buf = kasprintf(GFP_ATOMIC, "%.*s:%u",
 			nla_len(nla[NFTA_RULE_TABLE]),
 			(char *)nla_data(nla[NFTA_RULE_TABLE]),
@@ -6324,6 +6272,10 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb)
 	nla_nest_end(skb, nest);
 	nlmsg_end(skb, nlh);
 
+	if (dump_ctx->reset && args.iter.count > args.iter.skip)
+		audit_log_nft_set_reset(table, cb->seq,
+					args.iter.count - args.iter.skip);
+
 	rcu_read_unlock();
 
 	if (args.iter.err && args.iter.err != -EMSGSIZE)
@@ -6339,26 +6291,6 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb)
 	return -ENOSPC;
 }
 
-static int nf_tables_dumpreset_set(struct sk_buff *skb,
-				   struct netlink_callback *cb)
-{
-	struct nftables_pernet *nft_net = nft_pernet(sock_net(skb->sk));
-	struct nft_set_dump_ctx *dump_ctx = cb->data;
-	int ret, skip = cb->args[0];
-
-	mutex_lock(&nft_net->commit_mutex);
-
-	ret = nf_tables_dump_set(skb, cb);
-
-	if (cb->args[0] > skip)
-		audit_log_nft_set_reset(dump_ctx->ctx.table, cb->seq,
-					cb->args[0] - skip);
-
-	mutex_unlock(&nft_net->commit_mutex);
-
-	return ret;
-}
-
 static int nf_tables_dump_set_start(struct netlink_callback *cb)
 {
 	struct nft_set_dump_ctx *dump_ctx = cb->data;
@@ -6602,8 +6534,13 @@ static int nf_tables_getsetelem(struct sk_buff *skb,
 {
 	struct netlink_ext_ack *extack = info->extack;
 	struct nft_set_dump_ctx dump_ctx;
+	int rem, err = 0, nelems = 0;
+	struct net *net = info->net;
 	struct nlattr *attr;
-	int rem, err = 0;
+	bool reset = false;
+
+	if (NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_GETSETELEM_RESET)
+		reset = true;
 
 	if (info->nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
@@ -6613,7 +6550,7 @@ static int nf_tables_getsetelem(struct sk_buff *skb,
 			.module = THIS_MODULE,
 		};
 
-		err = nft_set_dump_ctx_init(&dump_ctx, skb, info, nla, false);
+		err = nft_set_dump_ctx_init(&dump_ctx, skb, info, nla, reset);
 		if (err)
 			return err;
 
@@ -6624,75 +6561,21 @@ static int nf_tables_getsetelem(struct sk_buff *skb,
 	if (!nla[NFTA_SET_ELEM_LIST_ELEMENTS])
 		return -EINVAL;
 
-	err = nft_set_dump_ctx_init(&dump_ctx, skb, info, nla, false);
+	err = nft_set_dump_ctx_init(&dump_ctx, skb, info, nla, reset);
 	if (err)
 		return err;
 
 	nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) {
-		err = nft_get_set_elem(&dump_ctx.ctx, dump_ctx.set, attr, false);
-		if (err < 0) {
-			NL_SET_BAD_ATTR(extack, attr);
-			break;
-		}
-	}
-
-	return err;
-}
-
-static int nf_tables_getsetelem_reset(struct sk_buff *skb,
-				      const struct nfnl_info *info,
-				      const struct nlattr * const nla[])
-{
-	struct nftables_pernet *nft_net = nft_pernet(info->net);
-	struct netlink_ext_ack *extack = info->extack;
-	struct nft_set_dump_ctx dump_ctx;
-	int rem, err = 0, nelems = 0;
-	struct nlattr *attr;
-
-	if (info->nlh->nlmsg_flags & NLM_F_DUMP) {
-		struct netlink_dump_control c = {
-			.start = nf_tables_dump_set_start,
-			.dump = nf_tables_dumpreset_set,
-			.done = nf_tables_dump_set_done,
-			.module = THIS_MODULE,
-		};
-
-		err = nft_set_dump_ctx_init(&dump_ctx, skb, info, nla, true);
-		if (err)
-			return err;
-
-		c.data = &dump_ctx;
-		return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c);
-	}
-
-	if (!nla[NFTA_SET_ELEM_LIST_ELEMENTS])
-		return -EINVAL;
-
-	if (!try_module_get(THIS_MODULE))
-		return -EINVAL;
-	rcu_read_unlock();
-	mutex_lock(&nft_net->commit_mutex);
-	rcu_read_lock();
-
-	err = nft_set_dump_ctx_init(&dump_ctx, skb, info, nla, true);
-	if (err)
-		goto out_unlock;
-
-	nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) {
-		err = nft_get_set_elem(&dump_ctx.ctx, dump_ctx.set, attr, true);
+		err = nft_get_set_elem(&dump_ctx.ctx, dump_ctx.set, attr, reset);
 		if (err < 0) {
 			NL_SET_BAD_ATTR(extack, attr);
 			break;
 		}
 		nelems++;
 	}
-	audit_log_nft_set_reset(dump_ctx.ctx.table, nft_base_seq(info->net), nelems);
-
-out_unlock:
-	rcu_read_unlock();
-	mutex_unlock(&nft_net->commit_mutex);
-	rcu_read_lock();
-	module_put(THIS_MODULE);
+	if (reset)
+		audit_log_nft_set_reset(dump_ctx.ctx.table, nft_base_seq(net),
+					nelems);
 
 	return err;
 }
@@ -8564,19 +8447,6 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
-static int nf_tables_dumpreset_obj(struct sk_buff *skb,
-				   struct netlink_callback *cb)
-{
-	struct nftables_pernet *nft_net = nft_pernet(sock_net(skb->sk));
-	int ret;
-
-	mutex_lock(&nft_net->commit_mutex);
-	ret = nf_tables_dump_obj(skb, cb);
-	mutex_unlock(&nft_net->commit_mutex);
-
-	return ret;
-}
-
 static int nf_tables_dump_obj_start(struct netlink_callback *cb)
 {
 	struct nft_obj_dump_ctx *ctx = (void *)cb->ctx;
@@ -8593,16 +8463,10 @@ static int nf_tables_dump_obj_start(struct netlink_callback *cb)
 	if (nla[NFTA_OBJ_TYPE])
 		ctx->type = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE]));
 
-	return 0;
-}
-
-static int nf_tables_dumpreset_obj_start(struct netlink_callback *cb)
-{
-	struct nft_obj_dump_ctx *ctx = (void *)cb->ctx;
-
-	ctx->reset = true;
+	if (NFNL_MSG_TYPE(cb->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET)
+		ctx->reset = true;
 
-	return nf_tables_dump_obj_start(cb);
+	return 0;
 }
 
 static int nf_tables_dump_obj_done(struct netlink_callback *cb)
@@ -8664,42 +8528,16 @@ nf_tables_getobj_single(u32 portid, const struct nfnl_info *info,
 static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info,
 			    const struct nlattr * const nla[])
 {
-	u32 portid = NETLINK_CB(skb).portid;
-	struct sk_buff *skb2;
-
-	if (info->nlh->nlmsg_flags & NLM_F_DUMP) {
-		struct netlink_dump_control c = {
-			.start = nf_tables_dump_obj_start,
-			.dump = nf_tables_dump_obj,
-			.done = nf_tables_dump_obj_done,
-			.module = THIS_MODULE,
-			.data = (void *)nla,
-		};
-
-		return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c);
-	}
-
-	skb2 = nf_tables_getobj_single(portid, info, nla, false);
-	if (IS_ERR(skb2))
-		return PTR_ERR(skb2);
-
-	return nfnetlink_unicast(skb2, info->net, portid);
-}
-
-static int nf_tables_getobj_reset(struct sk_buff *skb,
-				  const struct nfnl_info *info,
-				  const struct nlattr * const nla[])
-{
-	struct nftables_pernet *nft_net = nft_pernet(info->net);
 	u32 portid = NETLINK_CB(skb).portid;
 	struct net *net = info->net;
 	struct sk_buff *skb2;
+	bool reset = false;
 	char *buf;
 
 	if (info->nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
-			.start = nf_tables_dumpreset_obj_start,
-			.dump = nf_tables_dumpreset_obj,
+			.start = nf_tables_dump_obj_start,
+			.dump = nf_tables_dump_obj,
 			.done = nf_tables_dump_obj_done,
 			.module = THIS_MODULE,
 			.data = (void *)nla,
@@ -8708,18 +8546,16 @@ static int nf_tables_getobj_reset(struct sk_buff *skb,
 		return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c);
 	}
 
-	if (!try_module_get(THIS_MODULE))
-		return -EINVAL;
-	rcu_read_unlock();
-	mutex_lock(&nft_net->commit_mutex);
-	skb2 = nf_tables_getobj_single(portid, info, nla, true);
-	mutex_unlock(&nft_net->commit_mutex);
-	rcu_read_lock();
-	module_put(THIS_MODULE);
+	if (NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET)
+		reset = true;
 
+	skb2 = nf_tables_getobj_single(portid, info, nla, reset);
 	if (IS_ERR(skb2))
 		return PTR_ERR(skb2);
 
+	if (!reset)
+		return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid);
+
 	buf = kasprintf(GFP_ATOMIC, "%.*s:%u",
 			nla_len(nla[NFTA_OBJ_TABLE]),
 			(char *)nla_data(nla[NFTA_OBJ_TABLE]),
@@ -10037,7 +9873,7 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
 		.policy		= nft_rule_policy,
 	},
 	[NFT_MSG_GETRULE_RESET] = {
-		.call		= nf_tables_getrule_reset,
+		.call		= nf_tables_getrule,
 		.type		= NFNL_CB_RCU,
 		.attr_count	= NFTA_RULE_MAX,
 		.policy		= nft_rule_policy,
@@ -10091,7 +9927,7 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
 		.policy		= nft_set_elem_list_policy,
 	},
 	[NFT_MSG_GETSETELEM_RESET] = {
-		.call		= nf_tables_getsetelem_reset,
+		.call		= nf_tables_getsetelem,
 		.type		= NFNL_CB_RCU,
 		.attr_count	= NFTA_SET_ELEM_LIST_MAX,
 		.policy		= nft_set_elem_list_policy,
@@ -10137,7 +9973,7 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
 		.policy		= nft_obj_policy,
 	},
 	[NFT_MSG_GETOBJ_RESET] = {
-		.call		= nf_tables_getobj_reset,
+		.call		= nf_tables_getobj,
 		.type		= NFNL_CB_RCU,
 		.attr_count	= NFTA_OBJ_MAX,
 		.policy		= nft_obj_policy,
-- 
2.47.3



* [PATCH v5 nf-next 2/3] netfilter: nft_counter: serialize reset with spinlock
  2026-02-04 20:26 [PATCH v5 nf-next 0/3] netfilter: nf_tables: fix reset request deadlock Brian Witte
  2026-02-04 20:26 ` [PATCH v5 nf-next 1/3] Revert nf_tables commit_mutex in reset path Brian Witte
@ 2026-02-04 20:26 ` Brian Witte
  2026-02-04 20:26 ` [PATCH v5 nf-next 3/3] netfilter: nft_quota: use atomic64_xchg for reset Brian Witte
  2 siblings, 0 replies; 5+ messages in thread
From: Brian Witte @ 2026-02-04 20:26 UTC (permalink / raw)
  To: netfilter-devel
  Cc: pablo, fw, kadlec, syzbot+ff16b505ec9152e5f448, Brian Witte

Add a global static spinlock to serialize counter fetch+reset
operations, preventing concurrent dump-and-reset from underrunning
values.

The lock is taken before fetching the total so that two parallel
resets cannot both read the same counter values and then both
subtract them.

A global lock is used for simplicity since resets are infrequent.
If this becomes a bottleneck, it can be replaced with a per-net
lock later.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Brian Witte <brianwitte@mailfence.com>
---
 net/netfilter/nft_counter.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
index 0d70325280cc..169ae93688bc 100644
--- a/net/netfilter/nft_counter.c
+++ b/net/netfilter/nft_counter.c
@@ -32,6 +32,9 @@ struct nft_counter_percpu_priv {
 
 static DEFINE_PER_CPU(struct u64_stats_sync, nft_counter_sync);
 
+/* control plane only: sync fetch+reset */
+static DEFINE_SPINLOCK(nft_counter_lock);
+
 static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
 				       struct nft_regs *regs,
 				       const struct nft_pktinfo *pkt)
@@ -148,13 +151,25 @@ static void nft_counter_fetch(struct nft_counter_percpu_priv *priv,
 	}
 }
 
+static void nft_counter_fetch_and_reset(struct nft_counter_percpu_priv *priv,
+					struct nft_counter_tot *total)
+{
+	spin_lock(&nft_counter_lock);
+	nft_counter_fetch(priv, total);
+	nft_counter_reset(priv, total);
+	spin_unlock(&nft_counter_lock);
+}
+
 static int nft_counter_do_dump(struct sk_buff *skb,
 			       struct nft_counter_percpu_priv *priv,
 			       bool reset)
 {
 	struct nft_counter_tot total;
 
-	nft_counter_fetch(priv, &total);
+	if (unlikely(reset))
+		nft_counter_fetch_and_reset(priv, &total);
+	else
+		nft_counter_fetch(priv, &total);
 
 	if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(total.bytes),
 			 NFTA_COUNTER_PAD) ||
@@ -162,9 +177,6 @@ static int nft_counter_do_dump(struct sk_buff *skb,
 			 NFTA_COUNTER_PAD))
 		goto nla_put_failure;
 
-	if (reset)
-		nft_counter_reset(priv, &total);
-
 	return 0;
 
 nla_put_failure:
-- 
2.47.3



* [PATCH v5 nf-next 3/3] netfilter: nft_quota: use atomic64_xchg for reset
  2026-02-04 20:26 [PATCH v5 nf-next 0/3] netfilter: nf_tables: fix reset request deadlock Brian Witte
  2026-02-04 20:26 ` [PATCH v5 nf-next 1/3] Revert nf_tables commit_mutex in reset path Brian Witte
  2026-02-04 20:26 ` [PATCH v5 nf-next 2/3] netfilter: nft_counter: serialize reset with spinlock Brian Witte
@ 2026-02-04 20:26 ` Brian Witte
  2 siblings, 0 replies; 5+ messages in thread
From: Brian Witte @ 2026-02-04 20:26 UTC (permalink / raw)
  To: netfilter-devel
  Cc: pablo, fw, kadlec, syzbot+ff16b505ec9152e5f448, Brian Witte

Use atomic64_xchg() to atomically read and zero the consumed value
on reset, which is simpler than the previous read+sub pattern and
doesn't require spinlock protection.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Brian Witte <brianwitte@mailfence.com>
---
 net/netfilter/nft_quota.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nft_quota.c b/net/netfilter/nft_quota.c
index df0798da2329..34c77c872f79 100644
--- a/net/netfilter/nft_quota.c
+++ b/net/netfilter/nft_quota.c
@@ -140,11 +140,14 @@ static int nft_quota_do_dump(struct sk_buff *skb, struct nft_quota *priv,
 	u64 consumed, consumed_cap, quota;
 	u32 flags = priv->flags;
 
-	/* Since we inconditionally increment consumed quota for each packet
+	/* Since we unconditionally increment consumed quota for each packet
 	 * that we see, don't go over the quota boundary in what we send to
 	 * userspace.
 	 */
-	consumed = atomic64_read(priv->consumed);
+	if (reset)
+		consumed = atomic64_xchg(priv->consumed, 0);
+	else
+		consumed = atomic64_read(priv->consumed);
 	quota = atomic64_read(&priv->quota);
 	if (consumed >= quota) {
 		consumed_cap = quota;
@@ -160,10 +163,9 @@ static int nft_quota_do_dump(struct sk_buff *skb, struct nft_quota *priv,
 	    nla_put_be32(skb, NFTA_QUOTA_FLAGS, htonl(flags)))
 		goto nla_put_failure;
 
-	if (reset) {
-		atomic64_sub(consumed, priv->consumed);
+	if (reset)
 		clear_bit(NFT_QUOTA_DEPLETED_BIT, &priv->flags);
-	}
+
 	return 0;
 
 nla_put_failure:
-- 
2.47.3



* Re: [PATCH v5 nf-next 1/3] Revert nf_tables commit_mutex in reset path
  2026-02-04 20:26 ` [PATCH v5 nf-next 1/3] Revert nf_tables commit_mutex in reset path Brian Witte
@ 2026-02-05 13:46   ` Florian Westphal
  0 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2026-02-05 13:46 UTC (permalink / raw)
  To: Brian Witte; +Cc: netfilter-devel, pablo, Phil Sutter

Brian Witte <brianwitte@mailfence.com> wrote:

TL;DR: I plan to queue this series up for the nf tree next week.
Not directly related to this patchset:

[ CC Phil ]

> Revert mutex-based locking for reset requests. It caused a circular
> lock dependency between commit_mutex, nfnl_subsys_ipset, and
> nlk_cb_mutex when nft reset, ipset list, and iptables-nft with set
> match ran concurrently.
> 
> This reverts bd662c4218f9, 3d483faa6663, 3cb03edb4de3.

Phil, Pablo, the reset infra is broken in the sense that it cannot
guarantee a correct dump+reset:

        nft_rule_for_each_expr(expr, next, rule) {
                if (nft_expr_dump(skb, NFTA_LIST_ELEM, expr, reset) < 0)
                        goto nla_put_failure;
        }
        nla_nest_end(skb, list);

-> when a single ->dump callback fails because the netlink skb is full,
the dump is trimmed and resumed.

But the reset side effects are already visible.
Hence, while the dump may appear complete, it can contain already-zeroed
counters without userspace ever receiving the pre-reset values.

Maybe we should add a cushion in the relevant dump callbacks to
bail out before calling counter/quota->dump() when we run low on
remaining space?  What do you think?


end of thread, other threads:[~2026-02-05 13:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-04 20:26 [PATCH v5 nf-next 0/3] netfilter: nf_tables: fix reset request deadlock Brian Witte
2026-02-04 20:26 ` [PATCH v5 nf-next 1/3] Revert nf_tables commit_mutex in reset path Brian Witte
2026-02-05 13:46   ` Florian Westphal
2026-02-04 20:26 ` [PATCH v5 nf-next 2/3] netfilter: nft_counter: serialize reset with spinlock Brian Witte
2026-02-04 20:26 ` [PATCH v5 nf-next 3/3] netfilter: nft_quota: use atomic64_xchg for reset Brian Witte

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox