Netdev List
* [PATCH net 13/14] netfilter: nf_conntrack_expect: use conntrack GC to reap expectations
From: Pablo Neira Ayuso @ 2026-06-20 22:27 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
In-Reply-To: <20260620222738.112506-1-pablo@netfilter.org>

This patch replaces the timer API with a GC worker approach for
expectations, as has already happened in many other subsystems.

Use the existing conntrack GC worker to iterate over the local list of
expectations in the master conntrack to reap expired expectations.
Check IPS_HELPER_BIT to run GC for expectations, and set it for nft_ct
expectations, which never set it. Hold the expectation spinlock while
iterating over the master conntrack expectation list to synchronize with
nf_ct_remove_expectations(). This also performs runtime packet path
garbage collection through the expectation insertion and lookup
functions while walking over one of the chains of the global expectation
hashtables. Unconfirmed conntrack entries are skipped since ct->ext can
be reallocated, and dying entries are skipped since those will be gone
soon. Set IPS_HELPER_BIT when the helper ct extension is added, so that
the new GC worker does not need to bump the ct refcount to check if the
ct->ext helper is available.

This removes the extra refcount bump for expectation timers, which
allows removing several nf_ct_expect_put() calls after the unlink.
After this update, the refcount remains at 1 while the expectation is
on the expectation hashes.

This patch implicitly addresses a race with the existing timer API
that allows an expectation to access a stale exp->master pointer which
has already been released when expectation removal loses a race with an
expiring timer, i.e. timer_delete() returning false.

Add a new NF_CT_EXPECT_DEAD flag to mark an expectation to be reaped
via GC. This is needed by nf_ct_unexpect_related(), which is called in
error paths to invalidate newly created expectations that have been
added to the hashes. These expectations cannot be immediately released,
as GC or nf_ct_remove_expectations() could race to release them. On
expectation insert, the runtime GC reaps stale expectations before
checking the expectation limit set by policy.

Set the current timestamp in nf_ct_expect_alloc(), then add the
expectation policy timeout (or a custom timeout added on top of this)
to specify the expectation lifetime.

Fixes: bffcaad9afdf ("netfilter: ctnetlink: ensure safe access to master conntrack")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_conntrack_expect.h   |  16 +-
 .../linux/netfilter/nf_conntrack_common.h     |   1 +
 net/netfilter/nf_conntrack_core.c             |  33 +++-
 net/netfilter/nf_conntrack_expect.c           | 145 +++++++++---------
 net/netfilter/nf_conntrack_h323_main.c        |   4 +-
 net/netfilter/nf_conntrack_helper.c           |  10 +-
 net/netfilter/nf_conntrack_netlink.c          |  22 ++-
 net/netfilter/nf_conntrack_sip.c              |  13 +-
 net/netfilter/nft_ct.c                        |   3 +-
 9 files changed, 139 insertions(+), 108 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_expect.h b/include/net/netfilter/nf_conntrack_expect.h
index 80f50fd0f7ad..be4a120d549e 100644
--- a/include/net/netfilter/nf_conntrack_expect.h
+++ b/include/net/netfilter/nf_conntrack_expect.h
@@ -54,8 +54,8 @@ struct nf_conntrack_expect {
 	/* The conntrack of the master connection */
 	struct nf_conn *master;
 
-	/* Timer function; deletes the expectation. */
-	struct timer_list timeout;
+	/* jiffies32 when this expectation expires */
+	u32 timeout;
 
 #if IS_ENABLED(CONFIG_NF_NAT)
 	union nf_inet_addr saved_addr;
@@ -69,6 +69,14 @@ struct nf_conntrack_expect {
 	struct rcu_head rcu;
 };
 
+static inline bool nf_ct_exp_is_expired(const struct nf_conntrack_expect *exp)
+{
+	if (READ_ONCE(exp->flags) & NF_CT_EXPECT_DEAD)
+		return true;
+
+	return (__s32)(READ_ONCE(exp->timeout) - nfct_time_stamp) <= 0;
+}
+
 static inline struct net *nf_ct_exp_net(struct nf_conntrack_expect *exp)
 {
 	return read_pnet(&exp->net);
@@ -130,7 +138,6 @@ static inline void nf_ct_unlink_expect(struct nf_conntrack_expect *exp)
 
 void nf_ct_remove_expectations(struct nf_conn *ct);
 void nf_ct_unexpect_related(struct nf_conntrack_expect *exp);
-bool nf_ct_remove_expect(struct nf_conntrack_expect *exp);
 
 void nf_ct_expect_iterate_destroy(bool (*iter)(struct nf_conntrack_expect *e, void *data), void *data);
 void nf_ct_expect_iterate_net(struct net *net,
@@ -153,5 +160,8 @@ static inline int nf_ct_expect_related(struct nf_conntrack_expect *expect,
 	return nf_ct_expect_related_report(expect, 0, 0, flags);
 }
 
+struct nf_conn_help;
+void nf_ct_expectation_gc(struct nf_conn_help *master_help);
+
 #endif /*_NF_CONNTRACK_EXPECT_H*/
 
diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h
index 56b6b60a814f..ee51045ae1d6 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_common.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_common.h
@@ -160,6 +160,7 @@ enum ip_conntrack_expect_events {
 #define NF_CT_EXPECT_USERSPACE		0x4
 
 #ifdef __KERNEL__
+#define NF_CT_EXPECT_DEAD		0x8
 #define NF_CT_EXPECT_MASK	(NF_CT_EXPECT_PERMANENT | NF_CT_EXPECT_INACTIVE | \
 				 NF_CT_EXPECT_USERSPACE)
 #endif
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 4fb3a2d18631..784bd1d7a9bf 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1471,6 +1471,31 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct)
 	return false;
 }
 
+static void nf_ct_help_gc(struct nf_conn *ct)
+{
+	struct nf_conn_help *help;
+
+	if (!refcount_inc_not_zero(&ct->ct_general.use))
+		return;
+
+	/* load ->status after refcount increase */
+	smp_acquire__after_ctrl_dep();
+
+	if (!nf_ct_is_confirmed(ct) || nf_ct_is_dying(ct)) {
+		nf_ct_put(ct);
+		return;
+	}
+
+	/* re-check helper due to SLAB_TYPESAFE_BY_RCU */
+	if (test_bit(IPS_HELPER_BIT, &ct->status)) {
+		help = nfct_help(ct);
+		if (help)
+			nf_ct_expectation_gc(help);
+	}
+
+	nf_ct_put(ct);
+}
+
 static void gc_worker(struct work_struct *work)
 {
 	unsigned int i, hashsz, nf_conntrack_max95 = 0;
@@ -1543,7 +1568,13 @@ static void gc_worker(struct work_struct *work)
 			expires = (expires - (long)next_run) / ++count;
 			next_run += expires;
 
-			if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp))
+			if (gc_worker_skip_ct(tmp))
+				continue;
+
+			if (test_bit(IPS_HELPER_BIT, &tmp->status))
+				nf_ct_help_gc(tmp);
+
+			if (nf_conntrack_max95 == 0)
 				continue;
 
 			net = nf_ct_net(tmp);
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 5c9b17835c28..49e18eda037e 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -43,6 +43,24 @@ unsigned int nf_ct_expect_max __read_mostly;
 static struct kmem_cache *nf_ct_expect_cachep __read_mostly;
 static siphash_aligned_key_t nf_ct_expect_hashrnd;
 
+void nf_ct_expectation_gc(struct nf_conn_help *master_help)
+{
+	struct nf_conntrack_expect *exp;
+	struct hlist_node *next;
+
+	if (hlist_empty(&master_help->expectations))
+		return;
+
+	spin_lock_bh(&nf_conntrack_expect_lock);
+	hlist_for_each_entry_safe(exp, next, &master_help->expectations, lnode) {
+		if (!nf_ct_exp_is_expired(exp))
+			continue;
+
+		nf_ct_unlink_expect(exp);
+	}
+	spin_unlock_bh(&nf_conntrack_expect_lock);
+}
+
 /* nf_conntrack_expect helper functions */
 void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp,
 				u32 portid, int report)
@@ -52,7 +70,6 @@ void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp,
 	struct nf_conntrack_net *cnet;
 
 	lockdep_nfct_expect_lock_held();
-	WARN_ON_ONCE(timer_pending(&exp->timeout));
 
 	hlist_del_rcu(&exp->hnode);
 
@@ -70,16 +87,6 @@ void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp,
 }
 EXPORT_SYMBOL_GPL(nf_ct_unlink_expect_report);
 
-static void nf_ct_expectation_timed_out(struct timer_list *t)
-{
-	struct nf_conntrack_expect *exp = timer_container_of(exp, t, timeout);
-
-	spin_lock_bh(&nf_conntrack_expect_lock);
-	nf_ct_unlink_expect(exp);
-	spin_unlock_bh(&nf_conntrack_expect_lock);
-	nf_ct_expect_put(exp);
-}
-
 static unsigned int nf_ct_expect_dst_hash(const struct net *n, const struct nf_conntrack_tuple *tuple)
 {
 	struct {
@@ -117,19 +124,6 @@ nf_ct_exp_equal(const struct nf_conntrack_tuple *tuple,
 	       nf_ct_exp_zone_equal_any(i, zone);
 }
 
-bool nf_ct_remove_expect(struct nf_conntrack_expect *exp)
-{
-	lockdep_nfct_expect_lock_held();
-
-	if (timer_delete(&exp->timeout)) {
-		nf_ct_unlink_expect(exp);
-		nf_ct_expect_put(exp);
-		return true;
-	}
-	return false;
-}
-EXPORT_SYMBOL_GPL(nf_ct_remove_expect);
-
 struct nf_conntrack_expect *
 __nf_ct_expect_find(struct net *net,
 		    const struct nf_conntrack_zone *zone,
@@ -144,6 +138,8 @@ __nf_ct_expect_find(struct net *net,
 
 	h = nf_ct_expect_dst_hash(net, tuple);
 	hlist_for_each_entry_rcu(i, &nf_ct_expect_hash[h], hnode) {
+		if (nf_ct_exp_is_expired(i))
+			continue;
 		if (nf_ct_exp_equal(tuple, i, zone, net))
 			return i;
 	}
@@ -178,6 +174,7 @@ nf_ct_find_expectation(struct net *net,
 {
 	struct nf_conntrack_net *cnet = nf_ct_pernet(net);
 	struct nf_conntrack_expect *i, *exp = NULL;
+	struct hlist_node *next;
 	unsigned int h;
 
 	lockdep_nfct_expect_lock_held();
@@ -186,7 +183,11 @@ nf_ct_find_expectation(struct net *net,
 		return NULL;
 
 	h = nf_ct_expect_dst_hash(net, tuple);
-	hlist_for_each_entry(i, &nf_ct_expect_hash[h], hnode) {
+	hlist_for_each_entry_safe(i, next, &nf_ct_expect_hash[h], hnode) {
+		if (nf_ct_exp_is_expired(i)) {
+			nf_ct_unlink_expect(i);
+			continue;
+		}
 		if (!(i->flags & NF_CT_EXPECT_INACTIVE) &&
 		    nf_ct_exp_equal(tuple, i, zone, net)) {
 			exp = i;
@@ -196,13 +197,16 @@ nf_ct_find_expectation(struct net *net,
 	if (!exp)
 		return NULL;
 
+	if (!refcount_inc_not_zero(&exp->use))
+		return NULL;
+
 	/* If master is not in hash table yet (ie. packet hasn't left
 	   this machine yet), how can other end know about expected?
 	   Hence these are not the droids you are looking for (if
 	   master ct never got confirmed, we'd hold a reference to it
 	   and weird things would happen to future packets). */
 	if (!nf_ct_is_confirmed(exp->master))
-		return NULL;
+		goto err_release_exp;
 
 	/* Avoid race with other CPUs, that for exp->master ct, is
 	 * about to invoke ->destroy(), or nf_ct_delete() via timeout
@@ -214,18 +218,17 @@ nf_ct_find_expectation(struct net *net,
 	 */
 	if (unlikely(nf_ct_is_dying(exp->master) ||
 		     !refcount_inc_not_zero(&exp->master->ct_general.use)))
-		return NULL;
+		goto err_release_exp;
 
-	if (exp->flags & NF_CT_EXPECT_PERMANENT || !unlink) {
-		refcount_inc(&exp->use);
-		return exp;
-	} else if (timer_delete(&exp->timeout)) {
-		nf_ct_unlink_expect(exp);
+	if (exp->flags & NF_CT_EXPECT_PERMANENT || !unlink)
 		return exp;
-	}
-	/* Undo exp->master refcnt increase, if timer_delete() failed */
-	nf_ct_put(exp->master);
 
+	nf_ct_unlink_expect(exp);
+
+	return exp;
+
+err_release_exp:
+	nf_ct_expect_put(exp);
 	return NULL;
 }
 
@@ -241,9 +244,8 @@ void nf_ct_remove_expectations(struct nf_conn *ct)
 		return;
 
 	spin_lock_bh(&nf_conntrack_expect_lock);
-	hlist_for_each_entry_safe(exp, next, &help->expectations, lnode) {
-		nf_ct_remove_expect(exp);
-	}
+	hlist_for_each_entry_safe(exp, next, &help->expectations, lnode)
+		nf_ct_unlink_expect(exp);
 	spin_unlock_bh(&nf_conntrack_expect_lock);
 }
 EXPORT_SYMBOL_GPL(nf_ct_remove_expectations);
@@ -292,7 +294,7 @@ static bool master_matches(const struct nf_conntrack_expect *a,
 void nf_ct_unexpect_related(struct nf_conntrack_expect *exp)
 {
 	spin_lock_bh(&nf_conntrack_expect_lock);
-	nf_ct_remove_expect(exp);
+	WRITE_ONCE(exp->flags, exp->flags | NF_CT_EXPECT_DEAD);
 	spin_unlock_bh(&nf_conntrack_expect_lock);
 }
 EXPORT_SYMBOL_GPL(nf_ct_unexpect_related);
@@ -308,6 +310,7 @@ struct nf_conntrack_expect *nf_ct_expect_alloc(struct nf_conn *me)
 	if (!new)
 		return NULL;
 
+	new->timeout = nfct_time_stamp;
 	new->master = me;
 	refcount_set(&new->use, 1);
 	return new;
@@ -413,17 +416,12 @@ static void nf_ct_expect_insert(struct nf_conntrack_expect *exp,
 	struct net *net = nf_ct_exp_net(exp);
 	unsigned int h = nf_ct_expect_dst_hash(net, &exp->tuple);
 
-	/* two references : one for hash insert, one for the timer */
-	refcount_add(2, &exp->use);
+	refcount_inc(&exp->use);
 
-	timer_setup(&exp->timeout, nf_ct_expectation_timed_out, 0);
 	helper = rcu_dereference_protected(master_help->helper,
 					   lockdep_is_held(&nf_conntrack_expect_lock));
-	if (helper) {
-		exp->timeout.expires = jiffies +
-			helper->expect_policy[exp->class].timeout * HZ;
-	}
-	add_timer(&exp->timeout);
+	if (helper)
+		exp->timeout += helper->expect_policy[exp->class].timeout * HZ;
 
 	hlist_add_head_rcu(&exp->lnode, &master_help->expectations);
 	master_help->expecting[exp->class]++;
@@ -435,19 +433,26 @@ static void nf_ct_expect_insert(struct nf_conntrack_expect *exp,
 	NF_CT_STAT_INC(net, expect_create);
 }
 
-/* Race with expectations being used means we could have none to find; OK. */
 static void evict_oldest_expect(struct nf_conn_help *master_help,
-				struct nf_conntrack_expect *new)
+				struct nf_conntrack_expect *new,
+				const struct nf_conntrack_expect_policy *p)
 {
 	struct nf_conntrack_expect *exp, *last = NULL;
+	struct hlist_node *next;
 
-	hlist_for_each_entry(exp, &master_help->expectations, lnode) {
+	hlist_for_each_entry_safe(exp, next, &master_help->expectations, lnode) {
+		if (nf_ct_exp_is_expired(exp)) {
+			nf_ct_unlink_expect(exp);
+			continue;
+		}
 		if (exp->class == new->class)
 			last = exp;
 	}
 
-	if (last)
-		nf_ct_remove_expect(last);
+	/* Still worth to evict oldest expectation after garbage collection? */
+	if (last &&
+	    master_help->expecting[last->class] >= p->max_expected)
+		nf_ct_unlink_expect(last);
 }
 
 static inline int __nf_ct_expect_check(struct nf_conntrack_expect *expect,
@@ -467,14 +472,18 @@ static inline int __nf_ct_expect_check(struct nf_conntrack_expect *expect,
 
 	h = nf_ct_expect_dst_hash(net, &expect->tuple);
 	hlist_for_each_entry_safe(i, next, &nf_ct_expect_hash[h], hnode) {
+		if (nf_ct_exp_is_expired(i)) {
+			nf_ct_unlink_expect(i);
+			continue;
+		}
 		if (master_matches(i, expect, flags) &&
 		    expect_matches(i, expect)) {
 			if (i->class != expect->class ||
 			    i->master != expect->master)
 				return -EALREADY;
 
-			if (nf_ct_remove_expect(i))
-				break;
+			nf_ct_unlink_expect(i);
+			break;
 		} else if (expect_clash(i, expect)) {
 			ret = -EBUSY;
 			goto out;
@@ -486,14 +495,8 @@ static inline int __nf_ct_expect_check(struct nf_conntrack_expect *expect,
 	if (helper) {
 		p = &helper->expect_policy[expect->class];
 		if (p->max_expected &&
-		    master_help->expecting[expect->class] >= p->max_expected) {
-			evict_oldest_expect(master_help, expect);
-			if (master_help->expecting[expect->class]
-						>= p->max_expected) {
-				ret = -EMFILE;
-				goto out;
-			}
-		}
+		    master_help->expecting[expect->class] >= p->max_expected)
+			evict_oldest_expect(master_help, expect, p);
 	}
 
 	cnet = nf_ct_pernet(net);
@@ -547,10 +550,8 @@ void nf_ct_expect_iterate_destroy(bool (*iter)(struct nf_conntrack_expect *e, vo
 		hlist_for_each_entry_safe(exp, next,
 					  &nf_ct_expect_hash[i],
 					  hnode) {
-			if (iter(exp, data) && timer_delete(&exp->timeout)) {
+			if (iter(exp, data))
 				nf_ct_unlink_expect(exp);
-				nf_ct_expect_put(exp);
-			}
 		}
 	}
 
@@ -577,10 +578,8 @@ void nf_ct_expect_iterate_net(struct net *net,
 			if (!net_eq(nf_ct_exp_net(exp), net))
 				continue;
 
-			if (iter(exp, data) && timer_delete(&exp->timeout)) {
+			if (iter(exp, data))
 				nf_ct_unlink_expect_report(exp, portid, report);
-				nf_ct_expect_put(exp);
-			}
 		}
 	}
 
@@ -657,17 +656,17 @@ static int exp_seq_show(struct seq_file *s, void *v)
 	struct net *net = seq_file_net(s);
 	struct hlist_node *n = v;
 	char *delim = "";
+	__s32 timeout;
 
 	expect = hlist_entry(n, struct nf_conntrack_expect, hnode);
 
 	if (!net_eq(nf_ct_exp_net(expect), net))
 		return 0;
+	if (nf_ct_exp_is_expired(expect))
+		return 0;
 
-	if (expect->timeout.function)
-		seq_printf(s, "%ld ", timer_pending(&expect->timeout)
-			   ? (long)(expect->timeout.expires - jiffies)/HZ : 0);
-	else
-		seq_puts(s, "- ");
+	timeout = (__s32)(READ_ONCE(expect->timeout) - nfct_time_stamp) / HZ;
+	seq_printf(s, "%d ", timeout > 0 ? timeout : 0);
 	seq_printf(s, "l3proto = %u proto=%u ",
 		   expect->tuple.src.l3num,
 		   expect->tuple.dst.protonum);
diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c
index 7f189dceb3c4..24931e379985 100644
--- a/net/netfilter/nf_conntrack_h323_main.c
+++ b/net/netfilter/nf_conntrack_h323_main.c
@@ -1388,8 +1388,8 @@ static int process_rcf(struct sk_buff *skb, struct nf_conn *ct,
 				 "timeout to %u seconds for",
 				 info->timeout);
 			nf_ct_dump_tuple(&exp->tuple);
-			mod_timer_pending(&exp->timeout,
-					  jiffies + info->timeout * HZ);
+			WRITE_ONCE(exp->timeout,
+				   nfct_time_stamp + (info->timeout * HZ));
 		}
 		spin_unlock_bh(&nf_conntrack_expect_lock);
 	}
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 2f35bdd0d7d7..8b94001c2430 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -181,10 +181,10 @@ nf_ct_helper_ext_add(struct nf_conn *ct, gfp_t gfp)
 	struct nf_conn_help *help;
 
 	help = nf_ct_ext_add(ct, NF_CT_EXT_HELPER, gfp);
-	if (help)
+	if (help) {
+		__set_bit(IPS_HELPER_BIT, &ct->status);
 		INIT_HLIST_HEAD(&help->expectations);
-	else
-		pr_debug("failed to add helper extension area");
+	}
 	return help;
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper_ext_add);
@@ -203,10 +203,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 		return 0;
 
 	help = nfct_help(tmpl);
-	if (help != NULL) {
+	if (help)
 		helper = rcu_dereference(help->helper);
-		set_bit(IPS_HELPER_BIT, &ct->status);
-	}
 
 	help = nfct_help(ct);
 
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index b429e648f06c..4e78d2482989 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -3014,8 +3014,8 @@ static int
 ctnetlink_exp_dump_expect(struct sk_buff *skb,
 			  const struct nf_conntrack_expect *exp)
 {
+	__s32 timeout = (__s32)(READ_ONCE(exp->timeout) - nfct_time_stamp) / HZ;
 	struct nf_conn *master = exp->master;
-	long timeout = ((long)exp->timeout.expires - (long)jiffies) / HZ;
 	struct nf_conntrack_helper *helper;
 #if IS_ENABLED(CONFIG_NF_NAT)
 	struct nlattr *nest_parms;
@@ -3178,6 +3178,9 @@ ctnetlink_exp_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 restart:
 		hlist_for_each_entry_rcu(exp, &nf_ct_expect_hash[cb->args[0]],
 					 hnode) {
+			if (nf_ct_exp_is_expired(exp))
+				continue;
+
 			if (l3proto && exp->tuple.src.l3num != l3proto)
 				continue;
 
@@ -3456,11 +3459,8 @@ static int ctnetlink_del_expect(struct sk_buff *skb,
 		}
 
 		/* after list removal, usage count == 1 */
-		if (timer_delete(&exp->timeout)) {
-			nf_ct_unlink_expect_report(exp, NETLINK_CB(skb).portid,
-						   nlmsg_report(info->nlh));
-			nf_ct_expect_put(exp);
-		}
+		nf_ct_unlink_expect_report(exp, NETLINK_CB(skb).portid,
+					   nlmsg_report(info->nlh));
 		spin_unlock_bh(&nf_conntrack_expect_lock);
 		/* have to put what we 'get' above.
 		 * after this line usage count == 0 */
@@ -3484,14 +3484,10 @@ static int
 ctnetlink_change_expect(struct nf_conntrack_expect *x,
 			const struct nlattr * const cda[])
 {
-	if (cda[CTA_EXPECT_TIMEOUT]) {
-		if (!timer_delete(&x->timeout))
-			return -ETIME;
+	if (cda[CTA_EXPECT_TIMEOUT])
+		WRITE_ONCE(x->timeout, nfct_time_stamp +
+			   ntohl(nla_get_be32(cda[CTA_EXPECT_TIMEOUT])) * HZ);
 
-		x->timeout.expires = jiffies +
-			ntohl(nla_get_be32(cda[CTA_EXPECT_TIMEOUT])) * HZ;
-		add_timer(&x->timeout);
-	}
 	return 0;
 }
 
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index c606d1f60b58..5ec3a4a4bbd7 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -897,11 +897,10 @@ static int refresh_signalling_expectation(struct nf_conn *ct,
 		    exp->tuple.dst.protonum != proto ||
 		    exp->tuple.dst.u.udp.port != port)
 			continue;
-		if (mod_timer_pending(&exp->timeout, jiffies + expires * HZ)) {
-			exp->flags &= ~NF_CT_EXPECT_INACTIVE;
-			found = 1;
-			break;
-		}
+		WRITE_ONCE(exp->timeout, nfct_time_stamp + (expires * HZ));
+		WRITE_ONCE(exp->flags, exp->flags & ~NF_CT_EXPECT_INACTIVE);
+		found = 1;
+		break;
 	}
 	spin_unlock_bh(&nf_conntrack_expect_lock);
 	return found;
@@ -920,8 +919,7 @@ static void flush_expectations(struct nf_conn *ct, bool media)
 	hlist_for_each_entry_safe(exp, next, &help->expectations, lnode) {
 		if ((exp->class != SIP_EXPECT_SIGNALLING) ^ media)
 			continue;
-		if (!nf_ct_remove_expect(exp))
-			continue;
+		nf_ct_unlink_expect(exp);
 		if (!media)
 			break;
 	}
@@ -1413,7 +1411,6 @@ static int process_register_request(struct sk_buff *skb, unsigned int protoff,
 
 	nf_ct_expect_init(exp, SIP_EXPECT_SIGNALLING, nf_ct_l3num(ct),
 			  saddr, &daddr, proto, NULL, &port);
-	exp->timeout.expires = sip_timeout * HZ;
 	rcu_assign_pointer(exp->assign_helper, helper);
 	exp->flags = NF_CT_EXPECT_PERMANENT | NF_CT_EXPECT_INACTIVE;
 
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 25934c6f01fb..958054dd2e2e 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -1145,7 +1145,6 @@ static void nft_ct_helper_obj_eval(struct nft_object *obj,
 	help = nf_ct_helper_ext_add(ct, GFP_ATOMIC);
 	if (help && refcount_inc_not_zero(&to_assign->ct_refcnt)) {
 		rcu_assign_pointer(help->helper, to_assign);
-		set_bit(IPS_HELPER_BIT, &ct->status);
 
 		if ((ct->status & IPS_NAT_MASK) && !nfct_seqadj(ct))
 			if (!nfct_seqadj_ext_add(ct))
@@ -1326,7 +1325,7 @@ static void nft_ct_expect_obj_eval(struct nft_object *obj,
 		          &ct->tuplehash[!dir].tuple.src.u3,
 		          &ct->tuplehash[!dir].tuple.dst.u3,
 		          priv->l4proto, NULL, &priv->dport);
-	exp->timeout.expires = jiffies + priv->timeout * HZ;
+	exp->timeout += priv->timeout * HZ;
 
 	if (nf_ct_expect_related(exp, 0) != 0)
 		regs->verdict.code = NF_DROP;
-- 
2.47.3
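
The expiry test that patch 13/14 adds in nf_ct_exp_is_expired() depends on
32-bit tick arithmetic that keeps working when the counter wraps. What
follows is a minimal userspace sketch of that idea, not the kernel code
itself: struct exp_model, EXP_DEAD, exp_deadline() and exp_is_expired() are
hypothetical names made up for illustration, and the current time is passed
in as a plain parameter instead of being read from nfct_time_stamp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for NF_CT_EXPECT_DEAD (same value as in the patch). */
#define EXP_DEAD 0x8u

/* Hypothetical, reduced model of an expectation: only the fields used here. */
struct exp_model {
	uint32_t flags;
	uint32_t timeout;	/* absolute deadline, in 32-bit ticks */
};

/* Deadline = creation timestamp + lifetime. Unsigned addition may wrap,
 * which is fine because the comparison below is wrap-safe.
 */
static uint32_t exp_deadline(uint32_t now, uint32_t lifetime_ticks)
{
	return now + lifetime_ticks;
}

/* Expired if flagged dead, or if the signed distance from "now" to the
 * deadline is zero or negative. The unsigned subtraction wraps modulo 2^32;
 * the cast to int32_t assumes a two's complement target, as the kernel does.
 */
static bool exp_is_expired(const struct exp_model *exp, uint32_t now)
{
	if (exp->flags & EXP_DEAD)
		return true;

	return (int32_t)(exp->timeout - now) <= 0;
}
```

With a deadline computed shortly before the counter wraps, the entry stays
live across the wrap and expires exactly when "now" reaches the deadline.
The comparison is only meaningful for lifetimes shorter than 2^31 ticks.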



* [PATCH net 14/14] netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak
From: Pablo Neira Ayuso @ 2026-06-20 22:27 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
In-Reply-To: <20260620222738.112506-1-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

This needs to test for nonzero retval.

Fixes: c54c7c685494 ("netfilter: nft_meta_bridge: add NFT_META_BRI_IIFPVID support")
Closes: https://sashiko.dev/#/patchset/20260618061631.21919-1-fw%40strlen.de
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/nft_meta_bridge.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
index 3d95f68e0906..e4c9aa1f64e2 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -44,7 +44,9 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 		if (!br_dev || !br_vlan_enabled(br_dev))
 			goto err;
 
-		br_vlan_get_pvid_rcu(in, &p_pvid);
+		if (br_vlan_get_pvid_rcu(in, &p_pvid))
+			goto err;
+
 		nft_reg_store16(dest, p_pvid);
 		return;
 	}
-- 
2.47.3
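
The bug class fixed by patch 14/14 can be shown outside the kernel: if a
lookup reports failure through its return value and leaves its output
parameter untouched, a caller that ignores the return value copies
uninitialized stack bytes to the destination. The sketch below is a
userspace illustration under that assumption; lookup_pvid() and eval_pvid()
are hypothetical names, and the actual failure behaviour of
br_vlan_get_pvid_rcu() is not shown in the patch.

```c
#include <stdint.h>

/* Hypothetical lookup: returns 0 and fills *pvid on success, or a negative
 * value WITHOUT writing *pvid on failure (assumed behaviour, see lead-in).
 */
static int lookup_pvid(int port_ok, uint16_t *pvid)
{
	if (!port_ok)
		return -1;

	*pvid = 42;
	return 0;
}

/* Stores into *dest only when the lookup succeeded. */
static int eval_pvid(int port_ok, uint16_t *dest)
{
	uint16_t p_pvid;	/* uninitialized on purpose, as in the patched code */

	if (lookup_pvid(port_ok, &p_pvid))
		return -1;	/* bail out: p_pvid was never written */

	*dest = p_pvid;
	return 0;
}
```

On failure the destination keeps its previous contents; on success it
receives the value written by the lookup.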



* Re: [PATCH net 00/16] Netfilter fixes for net
From: Pablo Neira Ayuso @ 2026-06-20 22:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
In-Reply-To: <20260619115452.93949-1-pablo@netfilter.org>

Hi,

Please scratch this v1 series.

I have posted a v2 for this series for the net tree.

Thanks.

On Fri, Jun 19, 2026 at 01:54:35PM +0200, Pablo Neira Ayuso wrote:
> Hi,
> 
> The following patchset contains Netfilter fixes for net. It contains
> fixes for a few crashes, but many of the patches are trivial/correctness
> fixes. There is also one rework of the conntrack expectation timeout
> strategy to deal with a possible race when removing an expectation.
> 
> 1) Fix the incorrect flowtable timeout extension for entries in
>    hw offload, from Adrian Bente. This is correcting a defect in
>    the functionality, no crash.
> 
> 2) Hold reference to device under the fake dst in br_netfilter,
>    from Haoze Xie. This is fixing a possible UaF if the device
>    is removed while packet is sitting in nfqueue.
> 
> 3) Reject template conntrack in xt_cluster, otherwise access to
>    uninitialized conntrack fields is possible, leading to WARN_ON
>    due to unset layer 3 protocol. From Wyatt Feng.
> 
> 4) Make sure the IPv6 tunnel header is in the linear skb data
>    area before pulling. While at it remove incomplete NEXTHDR_DEST
>    support. From Lorenzo Bianconi. This possibly leads to a crash
>    if the IPv4 header is not linear, but GRO already guarantees this,
>    so it is unlikely but still possible.
> 
> 5) Bail out immediately if ENOMEM is seen in a nfnetlink batch,
>    no further processing since this will accumulate more bogus
>    errors. From Florian Westphal. Functional improvement
>    under memory stress, no crash.
> 
> 6) Use test_bit_acquire in ipset hash set to avoid reordering
>    of subsequent memory access. This is addressing a LLM related
>    report, no crash has been observed. From Jozsef Kadlecsik.
> 
> 7) Use test_bit_acquire in ipset bitmap set too, for the same
>    reason as in the previous patch, from Jozsef Kadlecsik.
> 
> 8) Call kfree_rcu() after rcu_assign_pointer() to address a
>    possible UaF, very hard to trigger. Never observed in practice,
>    reported by LLM. Also from Jozsef Kadlecsik.
> 
> 9) Use disable_delayed_work_sync() instead of cancel_delayed_work_sync()
>    to prevent the ipset GC handler from re-queuing work, as reported by LLM.
>    From Jozsef Kadlecsik. This is for correctness.
> 
> 10) Restore the check in nft_payload for payload offsets exceeding
>     2^16. From Florian Westphal. This fixes a silent truncation,
>     not a big deal, but better be assertive and reject it.
> 
> 11) Validate NFT_META_BRI_IIFHWADDR can only run from bridge
>     prerouting. From Florian Westphal. Harmless but it could allow
>     to read bytes from skb->cb.
> 
> 12) Zero out destination hardware address during the flowtable
>     path setup, also from Florian. This is a correctness fix; an LLM
>     points out that an infoleak is possible, but the topology to
>     achieve it is not clear.
> 
> 13) Skip IPv4 options if present when building the IPV4 reject reply.
>     Otherwise bytes in the IPv4 options header can be sent back to
>     origin where the ICMP header is being expected. Again from
>     Florian Westphal.
> 
> 14) Replace the timer API for expectations with a GC worker approach.
>     This implicitly fixes a race in nf_ct_remove_expectations(),
>     which might fail to remove the expectation due to timer_delete()
>     returning false because the timer has expired and the callback is
>     being run concurrently. This fix addresses a crash that has
>     already been reported with a reproducer.
> 
> 15) Store the master tuple in the expectation, since SLAB_TYPESAFE_BY_RCU
>     does not guarantee that accessing exp->master under rcu read lock
>     refers to the right master conntrack. An LLM review of the initial
>     round of expectation fixes also found this.
> 
> 16) Check if br_vlan_get_pvid_rcu() fails to address a possible stack
>     infoleak of 4-bytes. From Florian Westphal.
> 
> This is slightly over the 15 patch limit in batches, please, allow this
> round to exceed it by one.
> 
> Please, pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-06-19
> 
> Thanks.
> 
> ----------------------------------------------------------------
> 
> The following changes since commit 96e7f9122aae0ed000ee321f324b812a447906d9:
> 
>   eth: fbnic: take netif_addr_lock_bh() around rx mode address programming (2026-06-18 18:36:26 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-26-06-19
> 
> for you to fetch changes up to 05477f7a037c127854b58441f60b34210668f5c3:
> 
>   netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak (2026-06-19 12:27:08 +0200)
> 
> ----------------------------------------------------------------
> netfilter pull request 26-06-19
> 
> ----------------------------------------------------------------
> Adrian Bente (1):
>       netfilter: flowtable: fix offloaded ct timeout never being extended
> 
> Florian Westphal (6):
>       netfilter: nfnetlink: make OOM conditions fatal
>       netfilter: nft_payload: reject offsets exceeding 65535 bytes
>       netfilter: nft_meta_bridge: add validate callback for get operations
>       netfilter: nft_flow_offload: zero device address for non-ether case
>       netfilter: nf_reject: skip iphdr options when looking for icmp header
>       netfilter: nft_meta_bridge: fix NFT_META_BRI_IIFPVID stack leak
> 
> Haoze Xie (1):
>       netfilter: nf_queue: pin bridge device while NFQUEUE holds fake dst
> 
> Jozsef Kadlecsik (4):
>       netfilter: ipset: Don't use test_bit() in lockless RCU readers in hash types
>       netfilter: ipset: Don't use test_bit() in lockless RCU readers in bitmap types
>       netfilter: ipset: fix order of kfree_rcu() and rcu_assign_pointer()
>       netfilter: ipset: make sure gc is properly stopped
> 
> Lorenzo Bianconi (1):
>       netfilter: flowtable: fix and simplify IP6IP6 tunnel handling
> 
> Pablo Neira Ayuso (2):
>       netfilter: nf_conntrack_expect: use conntrack GC to reap expectations
>       netfilter: nf_conntrack_expect: store master_tuple in expectation
> 
> Wyatt Feng (1):
>       netfilter: xt_cluster: reject template conntracks in hash match
> 
>  include/net/netfilter/nf_conntrack_expect.h        |  17 ++-
>  include/net/netfilter/nf_queue.h                   |   1 +
>  include/net/netfilter/nft_meta.h                   |   2 +
>  include/uapi/linux/netfilter/nf_conntrack_common.h |   1 +
>  net/bridge/netfilter/nft_meta_bridge.c             |  23 +++-
>  net/ipv4/netfilter/nf_reject_ipv4.c                |   2 +-
>  net/ipv6/ip6_tunnel.c                              |   7 +
>  net/netfilter/ipset/ip_set_bitmap_gen.h            |   4 +-
>  net/netfilter/ipset/ip_set_bitmap_ip.c             |   2 +-
>  net/netfilter/ipset/ip_set_bitmap_ipmac.c          |   2 +-
>  net/netfilter/ipset/ip_set_bitmap_port.c           |   2 +-
>  net/netfilter/ipset/ip_set_core.c                  |   4 +-
>  net/netfilter/ipset/ip_set_hash_gen.h              |  12 +-
>  net/netfilter/nf_conntrack_broadcast.c             |   1 +
>  net/netfilter/nf_conntrack_core.c                  |  33 ++++-
>  net/netfilter/nf_conntrack_expect.c                | 147 +++++++++++----------
>  net/netfilter/nf_conntrack_h323_main.c             |   4 +-
>  net/netfilter/nf_conntrack_helper.c                |  10 +-
>  net/netfilter/nf_conntrack_netlink.c               |  31 ++---
>  net/netfilter/nf_conntrack_sip.c                   |  13 +-
>  net/netfilter/nf_flow_table_core.c                 |  13 +-
>  net/netfilter/nf_flow_table_ip.c                   |  80 +++--------
>  net/netfilter/nf_flow_table_path.c                 |   4 +-
>  net/netfilter/nf_queue.c                           |  14 ++
>  net/netfilter/nfnetlink.c                          |   7 +
>  net/netfilter/nfnetlink_queue.c                    |   3 +
>  net/netfilter/nft_ct.c                             |   3 +-
>  net/netfilter/nft_meta.c                           |   5 +-
>  net/netfilter/nft_payload.c                        |  16 ++-
>  net/netfilter/xt_cluster.c                         |   2 +-
>  .../selftests/net/netfilter/nft_flowtable.sh       |   8 +-
>  31 files changed, 268 insertions(+), 205 deletions(-)
> 

^ permalink raw reply

* Re: Bug#1130336: [regression] Network failure beyond first connection after 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add was skipped")
From: Pablo Neira Ayuso @ 2026-06-20 22:32 UTC (permalink / raw)
  To: Salvatore Bonaccorso
  Cc: Fernando Fernandez Mancera, Thorsten Leemhuis,
	Alejandro Oliván Alvarez, 1130336, Florian Westphal,
	Phil Sutter, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, netfilter-devel, coreteam, netdev,
	linux-kernel, regressions, stable
In-Reply-To: <ajb7ugG5mYxYIPva@eldamar.lan>

On Sat, Jun 20, 2026 at 10:44:42PM +0200, Salvatore Bonaccorso wrote:
> Hi Fernando,
> 
> On Wed, Apr 22, 2026 at 12:32:34PM +0200, Fernando Fernandez Mancera wrote:
> > On 4/22/26 11:18 AM, Thorsten Leemhuis wrote:
> > > Lo! Top-posting on purpose to make this easy to process.
> > > 
> > > What happened to this regression? It looks a bit like things stalled and
> > > fell through the cracks. Or Fernando, did you post a patch like you
> > > mentioned? I looked for one referring the commit or the reporter, but
> > > could not find anything -- but maybe I missed it.
> > > 
> > 
> > Yes, it stalled and fell through the cracks. Let me prepare a fix as I
> > mentioned.
> 
> Did that happen? On a quick check, at least 7.0.13 upstream still seems
> to exhibit the problem (or would it be fair to let this use case rest?)

I still have to take a fix Fernando posted.

^ permalink raw reply

* Re: [PATCH net] tipc: restrict socket queue dumps in enqueue tracepoints
From: XIAO WU @ 2026-06-21  1:21 UTC (permalink / raw)
  To: Li Xiasong, Jon Maloy
  Cc: stable, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Ying Xue, Tuong Lien, netdev,
	tipc-discussion, yuehaibing, zhangchangzhong, weiyongjun1
In-Reply-To: <20260611135647.3666727-1-lixiasong1@huawei.com>

Hi Li Xiasong,

I see this patch was merged into net.git as commit acd7df8d9554 — thanks
for the fix.  However, a Sashiko AI code review [1] flagged that
`tipc_poll()` in the same file has the identical pre-existing issue: it
calls `trace_tipc_sk_poll()` with `TIPC_DUMP_ALL`, which triggers a dump
of all socket queues without holding the socket owner lock.  The merged
fix addressed `tipc_sk_enqueue()` but left `tipc_poll()` unchanged.

I was able to reproduce the remaining use-after-free in QEMU with KASAN
by racing `tipc_poll()` against `tipc_recvmsg()` on the same socket.

On Wed, Jun 11, 2026 at 09:56:47PM +0800, Li Xiasong wrote:
 > This commit addresses a KASAN use-after-free issue in tipc_sk_enqueue()
 > by restricting tracepoints to only dump the backlog queue
 > (TIPC_DUMP_SK_BKLGQ) instead of all queues (TIPC_DUMP_ALL).

Your fix correctly restricts the `tipc_sk_enqueue()` tracepoints, but
`tipc_poll()` still uses `TIPC_DUMP_ALL`:

```c
// net/tipc/socket.c:tipc_poll()
trace_tipc_sk_poll(sk, NULL, TIPC_DUMP_ALL, " ");
```

This triggers `tipc_sk_dump()` → `tipc_list_dump()` to walk
`sk->sk_receive_queue` without holding `sk->sk_lock.slock`. If
`tipc_recvmsg()` concurrently dequeues and frees an skb from that
queue, the tracepoint dump reads freed memory.

[Reproduction]

Two threads on the same TIPC SOCK_DGRAM socket, with the
`tipc_sk_poll` tracepoint enabled:
- Thread 1: loops on poll() → trace_tipc_sk_poll → tipc_sk_dump
- Thread 2: loops on recvfrom() → frees skbs from the receive queue
   while the tracepoint walks it

Full PoC source (poc.c):
---8<----------------------------------------------------------------
// SPDX-License-Identifier: GPL-2.0-only
/*
  * tipc_poll() tracepoint use-after-free PoC
  * gcc -static -o poc poc.c -lpthread
  */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/poll.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>

#ifndef AF_TIPC
#define AF_TIPC         30
#endif

#define TIPC_SERVICE_RANGE      1
#define TIPC_SERVICE_ADDR       2
#define TIPC_CLUSTER_SCOPE      2

struct tipc_socket_addr { uint32_t ref; uint32_t node; };
struct sockaddr_tipc {
     unsigned short family;
     unsigned char  addrtype;
     signed   char  scope;
     union {
         struct tipc_socket_addr id;
         struct { uint32_t type; uint32_t lower; uint32_t upper; } nameseq;
         struct { struct { uint32_t type; uint32_t instance; } name;
                  uint32_t domain; } name;
     } addr;
};

static int running = 1;
static int server_fd = -1;

static int enable_tracepoint(void)
{
     const char *paths[] = {
         "/sys/kernel/debug/tracing/events/tipc/tipc_sk_poll/enable",
         "/sys/kernel/tracing/events/tipc/tipc_sk_poll/enable", NULL
     };
     for (int i = 0; paths[i]; i++) {
         int fd = open(paths[i], O_WRONLY|O_TRUNC);
         if (fd >= 0) { write(fd, "1", 1); close(fd); return 0; }
     }
     return -1;
}

static void *poll_thread(void *arg)
{
     struct pollfd pfd;
     (void)arg;
     while (running) {
         pfd.fd = server_fd; pfd.events = POLLIN; pfd.revents = 0;
         poll(&pfd, 1, 0);
     }
     return NULL;
}

static void *recv_thread(void *arg)
{
     char buf[4096];
     struct sockaddr_tipc src;
     socklen_t srclen = sizeof(src);
     (void)arg;
     while (running) {
         srclen = sizeof(src);
         recvfrom(server_fd, buf, sizeof(buf), MSG_DONTWAIT,
                  (struct sockaddr *)&src, &srclen);
         usleep(100);
     }
     return NULL;
}

int main(void)
{
     pthread_t poll_tid, recv_tid;
     uint32_t svc_type = 20000 + (getpid() % 40000);

     enable_tracepoint();
     server_fd = socket(AF_TIPC, SOCK_DGRAM, 0);

     struct sockaddr_tipc srv_addr = {0};
     srv_addr.family = AF_TIPC;
     srv_addr.addrtype = TIPC_SERVICE_RANGE;
     srv_addr.scope = TIPC_CLUSTER_SCOPE;
     srv_addr.addr.nameseq.type = svc_type;
     srv_addr.addr.nameseq.lower = 1;
     srv_addr.addr.nameseq.upper = 1;
     bind(server_fd, (struct sockaddr *)&srv_addr, sizeof(srv_addr));

     int client_fd = socket(AF_TIPC, SOCK_DGRAM, 0);
     struct sockaddr_tipc dest_addr = {0};
     dest_addr.family = AF_TIPC;
     dest_addr.addrtype = TIPC_SERVICE_ADDR;
     dest_addr.scope = TIPC_CLUSTER_SCOPE;
     dest_addr.addr.name.name.type = svc_type;
     dest_addr.addr.name.name.instance = 1;

     char sendbuf[256];
     memset(sendbuf, 0x41, sizeof(sendbuf));
     for (int i = 0; i < 50; i++)
         sendto(client_fd, sendbuf, sizeof(sendbuf), 0,
                (struct sockaddr *)&dest_addr, sizeof(dest_addr));
     usleep(100000);

     pthread_create(&poll_tid, NULL, poll_thread, NULL);
     pthread_create(&recv_tid, NULL, recv_thread, NULL);

     for (int i = 0; i < 2000; i++) {
         sendto(client_fd, sendbuf, sizeof(sendbuf), 0,
                (struct sockaddr *)&dest_addr, sizeof(dest_addr));
         usleep(500);
     }

     running = 0;
     pthread_join(poll_tid, NULL);
     pthread_join(recv_tid, NULL);
     close(client_fd);
     close(server_fd);
     printf("[+] Done. Check dmesg.\n");
     return 0;
}
---8<----------------------------------------------------------------
Compile: gcc -static -o poc poc.c -lpthread

[KASAN report — kernel 7.1.0-rc6+, CONFIG_KASAN=y]

   ==================================================================
   BUG: KASAN: slab-use-after-free in tipc_skb_dump+0x12e7/0x1590
   Read of size 4 at addr ffff888033f3d8d0 by task poc/9474

   Call Trace:
    <TASK>
    tipc_skb_dump+0x12e7/0x1590
    tipc_list_dump+0x276/0x330
    tipc_sk_dump+0xb6c/0xda0
    trace_event_raw_event_tipc_sk_class+0x364/0x590
    tipc_poll+0x44a/0x6b0
    sock_poll+0x.../...
    do_sys_poll+0x.../...
    __x64_sys_poll+0x.../...
    do_syscall_64+0xcd/0xf80
    entry_SYSCALL_64_after_hwframe+0x77/0x7f

   Freed by task 9475:
    kfree_skb_reason+0x.../...
    tipc_recvmsg+0x.../...
    sock_recvmsg+0x.../...
    sock_read_iter+0x.../...
    vfs_read+0x.../...
    ksys_read+0x.../...

The fix is the same as what was already applied to `tipc_sk_enqueue()` in
commit acd7df8d9554: change `TIPC_DUMP_ALL` to `TIPC_DUMP_SK_BKLGQ` in
the `tipc_poll()` tracepoint, since poll() does not hold the socket lock
that protects the other queues.
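
To make the effect of the narrower mask concrete, here is a small user-space sketch. It is only a model under stated assumptions: the `MODEL_DUMP_*` values, `struct model_sock` and `model_sk_dump()` are hypothetical names made up for illustration, and the real `TIPC_DUMP_*` constants and `tipc_sk_dump()` may differ in detail. It shows only that a dump called with the "all" mask visits every queue, while the backlog-only mask never touches the receive queue.

```c
/* Hypothetical flag values; not the real TIPC_DUMP_* constants. */
enum {
	MODEL_DUMP_SK_SNDQ  = 1u << 0,
	MODEL_DUMP_SK_RCVQ  = 1u << 1,
	MODEL_DUMP_SK_BKLGQ = 1u << 2,
	MODEL_DUMP_ALL      = 0xffffu,
};

/* Hypothetical socket model: only the queue lengths matter here. */
struct model_sock {
	int sndq_len;
	int rcvq_len;
	int bklgq_len;
};

/* Return how many queued entries a dump with mask @dqueues would visit. */
static int model_sk_dump(const struct model_sock *sk, unsigned int dqueues)
{
	int visited = 0;

	if (dqueues & MODEL_DUMP_SK_SNDQ)
		visited += sk->sndq_len;
	if (dqueues & MODEL_DUMP_SK_RCVQ)
		visited += sk->rcvq_len;	/* the queue recvmsg frees from */
	if (dqueues & MODEL_DUMP_SK_BKLGQ)
		visited += sk->bklgq_len;

	return visited;
}
```

With `MODEL_DUMP_SK_BKLGQ` the receive-queue branch is skipped entirely, which is the property the proposed change relies on.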

[1] 
https://sashiko.dev/#/patchset/20260611135647.3666727-1-lixiasong1%40huawei.com
     (Sashiko AI code review — "Use-After-Free", Severity: High)

Thanks,
XIAOWU



^ permalink raw reply

* [PATCH] net: wwan: t7xx: destroy DMA pool on CLDMA late init failure
From: Haoxiang Li @ 2026-06-21  3:17 UTC (permalink / raw)
  To: chandrashekar.devegowda, haijun.liu, ricardo.martinez,
	loic.poulain, ryazanov.s.a, johannes, andrew+netdev, davem,
	edumazet, kuba, pabeni, ilpo.jarvinen
  Cc: netdev, linux-kernel, Haoxiang Li, stable

t7xx_cldma_late_init() creates md_ctrl->gpd_dmapool before
initializing the TX and RX rings. If any ring initialization
fails, the error path frees the already initialized rings but
leaves the DMA pool allocated.

Destroy md_ctrl->gpd_dmapool on the late-init failure path
to avoid leaking the DMA pool.
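
The unwinding order can be sketched in user space as follows. This is an illustrative model, not the driver code: `struct model_ctrl`, `model_ring_init()`, `model_ring_free()` and `model_late_init()` are hypothetical stand-ins, `malloc()`/`free()` stand in for the DMA pool and ring helpers, the TX and RX loops are folded into one, and `fail_at` is a made-up knob that forces one ring to fail.

```c
#include <stdlib.h>

#define MODEL_NR_RINGS 4

/* Hypothetical stand-in for the controller state. */
struct model_ctrl {
	void *pool;			/* models md_ctrl->gpd_dmapool */
	void *ring[MODEL_NR_RINGS];	/* models the TX/RX rings */
};

/* Allocate one ring; fail on purpose when idx == fail_at. */
static int model_ring_init(struct model_ctrl *c, int idx, int fail_at)
{
	if (idx == fail_at)
		return -1;
	c->ring[idx] = malloc(16);
	return c->ring[idx] ? 0 : -1;
}

static void model_ring_free(struct model_ctrl *c, int idx)
{
	free(c->ring[idx]);
	c->ring[idx] = NULL;
}

/* Return 0 on success; on failure release everything allocated here. */
static int model_late_init(struct model_ctrl *c, int fail_at)
{
	int i, ret;

	c->pool = malloc(64);
	if (!c->pool)
		return -1;

	for (i = 0; i < MODEL_NR_RINGS; i++) {
		ret = model_ring_init(c, i, fail_at);
		if (ret)
			goto err_free_rings;
	}
	return 0;

err_free_rings:
	/* Free only the rings that were initialized before the failure. */
	while (i--)
		model_ring_free(c, i);

	/* The step this patch adds: destroy the pool and clear the pointer. */
	free(c->pool);
	c->pool = NULL;
	return ret;
}
```

Clearing the pointer after destroying the pool means a later teardown path that checks it will not act on a stale pool.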

Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
---
 drivers/net/wwan/t7xx/t7xx_hif_cldma.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/wwan/t7xx/t7xx_hif_cldma.c b/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
index e10cb4f9104e..2917cee9b802 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
@@ -1063,6 +1063,9 @@ static int t7xx_cldma_late_init(struct cldma_ctrl *md_ctrl)
 	while (i--)
 		t7xx_cldma_ring_free(md_ctrl, &md_ctrl->tx_ring[i], DMA_TO_DEVICE);
 
+	dma_pool_destroy(md_ctrl->gpd_dmapool);
+	md_ctrl->gpd_dmapool = NULL;
+
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH net v2 0/2] net: ethernet: sunplus: spl2sw: fix of_node refcount leaks
From: 呂芳騰 @ 2026-06-21  4:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Shitalkumar Gandhi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, netdev, linux-kernel,
	Shitalkumar Gandhi
In-Reply-To: <20260618175619.671a4025@kernel.org>

Hi Jakub,

I'm sorry, but I can't test the fix.
I have left Sunplus and no longer have the relevant hardware.


Best regards,
Wells Lu

Jakub Kicinski <kuba@kernel.org> 於 2026年6月19日週五 上午8:56寫道:
>
> On Tue, 16 Jun 2026 01:20:30 +0530 Shitalkumar Gandhi wrote:
> > This series fixes of_node refcount leaks in the Sunplus SP7021 ethernet
> > driver, found by inspection. Compile-tested only; no SP7021 hardware
> > available here.
> >
> > Patch 1/2 fixes the phy_node leak in the remove path.
> > Patch 2/2 fixes multiple leaks in the probe path and depends on the
> > cleanup contract from patch 1/2.
>
> Wells Lu, please review.
> --
> mping: SUNPLUS ETHERNET DRIVER

^ permalink raw reply

* Re: [PATCH net] tipc: restrict socket queue dumps in enqueue tracepoints
From: Greg KH @ 2026-06-21  5:39 UTC (permalink / raw)
  To: XIAO WU
  Cc: Li Xiasong, Jon Maloy, stable, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Ying Xue, Tuong Lien,
	netdev, tipc-discussion, yuehaibing, zhangchangzhong, weiyongjun1
In-Reply-To: <tencent_EC8B2032C1F9358EA3B49645F0F2277B210A@qq.com>

On Sun, Jun 21, 2026 at 09:21:15AM +0800, XIAO WU wrote:
> Hi Li Xiasong,
> 
> I see this patch was merged into net.git as commit acd7df8d9554 — thanks
> for the fix.  However, a Sashiko AI code review [1] flagged that
> `tipc_poll()` in the same file has the identical pre-existing issue: it
> calls `trace_tipc_sk_poll()` with `TIPC_DUMP_ALL`, which triggers a dump
> of all socket queues without holding the socket owner lock.  The merged
> fix addressed `tipc_sk_enqueue()` but left `tipc_poll()` unchanged.
> 
> I was able to reproduce the remaining use-after-free in QEMU with KASAN
> by racing `tipc_poll()` against `tipc_recvmsg()` on the same socket.

Great, can you send a fix for this?

thanks,

greg k-h

^ permalink raw reply

* [syzbot] [nbd?] WARNING in nbd_add_socket
From: syzbot @ 2026-06-21  6:23 UTC (permalink / raw)
  To: axboe, josef, linux-block, linux-kernel, nbd, netdev,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    b85966adbf5d Merge tag 'net-next-7.2' of git://git.kernel...
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=101f6d56580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=9a9f723a32776544
dashboard link: https://syzkaller.appspot.com/bug?extid=6b85d1e39a5b8ed9a954
compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13584aae580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11fd7b7a580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/780edcc3cc37/disk-b85966ad.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/967dd18c7ecd/vmlinux-b85966ad.xz
kernel image: https://storage.googleapis.com/syzbot-assets/cf9fa92c90ff/bzImage-b85966ad.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+6b85d1e39a5b8ed9a954@syzkaller.appspotmail.com

netlink: 3936 bytes leftover after parsing attributes in process `syz.0.25'.
------------[ cut here ]------------
!sock_allow_reclassification(sk)
WARNING: drivers/block/nbd.c:1249 at nbd_reclassify_socket drivers/block/nbd.c:1249 [inline], CPU#0: syz.0.25/5992
WARNING: drivers/block/nbd.c:1249 at nbd_add_socket+0xf35/0x12c0 drivers/block/nbd.c:1293, CPU#0: syz.0.25/5992
Modules linked in:

CPU: 0 UID: 0 PID: 5992 Comm: syz.0.25 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
RIP: 0010:nbd_reclassify_socket drivers/block/nbd.c:1249 [inline]
RIP: 0010:nbd_add_socket+0xf35/0x12c0 drivers/block/nbd.c:1293
Code: f7 e8 6f b5 20 fc bf e0 01 00 00 49 03 3e 48 c7 c6 40 02 55 8c e8 2b a8 1b fb b8 f0 ff ff ff e9 b2 fd ff ff e8 ac 60 b5 fb 90 <0f> 0b 90 e9 16 f8 ff ff e8 5e 2e 97 05 44 89 e9 80 e1 07 fe c1 38
RSP: 0018:ffffc90002ef7160 EFLAGS: 00010293

RAX: ffffffff86109574 RBX: 1ffff1100651ddb9 RCX: ffff888020b68000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffffc90002ef7250 R08: ffff888035af2bdf R09: 1ffff11006b5e57b
R10: dffffc0000000000 R11: ffffed1006b5e57c R12: ffff8880328eec00
R13: 1ffff920005dee38 R14: dffffc0000000000 R15: 0000000000000001
FS:  00007fcc9d5dd6c0(0000) GS:ffff88812527c000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f81a8ab50f0 CR3: 0000000078780000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 nbd_genl_connect+0x133d/0x1c10 drivers/block/nbd.c:2254
 genl_family_rcv_msg_doit+0x233/0x340 net/netlink/genetlink.c:1114
 genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
 genl_rcv_msg+0x614/0x7a0 net/netlink/genetlink.c:1209
 netlink_rcv_skb+0x226/0x4a0 net/netlink/af_netlink.c:2556
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x7bb/0x940 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1900
 sock_sendmsg_nosec net/socket.c:775 [inline]
 __sock_sendmsg net/socket.c:790 [inline]
 ____sys_sendmsg+0x9b9/0xa20 net/socket.c:2684
 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2738
 __sys_sendmsg net/socket.c:2770 [inline]
 __do_sys_sendmsg net/socket.c:2775 [inline]
 __se_sys_sendmsg net/socket.c:2773 [inline]
 __x64_sys_sendmsg+0x1b1/0x290 net/socket.c:2773
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fcc9df9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fcc9d5dd028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fcc9e216090 RCX: 00007fcc9df9ce59
RDX: 0000000000004040 RSI: 0000200000000140 RDI: 0000000000000004
RBP: 00007fcc9e032d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fcc9e216128 R14: 00007fcc9e216090 R15: 00007ffc8f827678
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH iwl-next v2] ixgbe: Implement PCI reset handler
From: Temerkhanov, Sergey @ 2026-06-21  7:52 UTC (permalink / raw)
  To: Temerkhanov, Sergey, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, pmenzel@molgen.mpg.de
In-Reply-To: <20260618142212.310475-1-sergey.temerkhanov@intel.com>

Please disregard this; it is a broken version that was sent by mistake.

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Sergey Temerkhanov
> Sent: Thursday, June 18, 2026 4:22 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; pmenzel@molgen.mpg.de
> Subject: [Intel-wired-lan] [PATCH iwl-next v2] ixgbe: Implement PCI reset
> handler
> 
> Implement PCI device reset handler to allow the network device to get re-
> initialized and function after a PCI-level reset.
> 
> This is necessary for the adapter to avoid TX queue timeouts occurring when
> the PCI reset is initiated via sysfs during the operation
> 
> Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> 
> Previous version:
> https://lore.kernel.org/netdev/MW4PR11MB6864BC9CA84F060AF7E0248480E42@MW4PR11MB6864.namprd11.prod.outlook.com/
> v1->v2 changes: Rearranged the order of operations, switched to
> poll_timeout_us() macro
> 
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  1 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 82 +++++++++++++++++++
>  2 files changed, 83 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 594ccb28da20..c4b0c5bb89c6 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -912,6 +912,7 @@ enum ixgbe_state_t {
>  	__IXGBE_PTP_TX_IN_PROGRESS,
>  	__IXGBE_RESET_REQUESTED,
>  	__IXGBE_PHY_INIT_COMPLETE,
> +	__IXGBE_PCIE_RESET_IN_PROGRESS,
>  };
> 
>  struct ixgbe_cb {
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 2ac274c73d61..0fb64aef223e 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -12352,6 +12352,86 @@ static pci_ers_result_t ixgbe_io_slot_reset(struct pci_dev *pdev)
>  	return result;
>  }
> 
> +/* 1500 us poll interval */
> +#define IXGBE_RESET_PREP_POLL_INTERVAL_US 1500
> +/* 2 second timeout to acquire reset lock before proceeding */
> +#define IXGBE_RESET_PREP_TIMEOUT_US 2000000
> +
> +/**
> + * ixgbe_reset_prep - called before the pci bus is reset.
> + * @pdev: Pointer to PCI device
> + *
> + * Prepare the card for a reset, preventing the service task from running.
> + */
> +static void ixgbe_reset_prep(struct pci_dev *pdev)
> +{
> +	struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
> +
> +	if (!adapter)
> +		return;
> +
> +	if (poll_timeout_us(test_and_set_bit(__IXGBE_RESETTING, &adapter->state),
> +			    test_bit(__IXGBE_RESETTING, &adapter->state),
> +			    IXGBE_RESET_PREP_POLL_INTERVAL_US,
> +			    IXGBE_RESET_PREP_TIMEOUT_US, false)) {
> +		/* ixgbe_reset_done() will exit early if this happens.
> +		 * A retry will be needed
> +		 */
> +		e_err(drv, "Timed out waiting for __IXGBE_RESETTING to be released. Reset is needed\n");
> +		return;
> +	}
> +
> +	/* Sync __IXGBE_RESETTING */
> +	smp_mb__after_atomic();
> +
> +	if (test_bit(__IXGBE_SERVICE_INITED, &adapter->state)) {
> +		/* Prevent the service task from being requeued in the timer callback */
> +		timer_delete_sync(&adapter->service_timer);
> +		/* Cancel any possibly queued service task */
> +		cancel_work_sync(&adapter->service_task);
> +	}
> +
> +	pci_clear_master(pdev);
> +
> +	set_bit(__IXGBE_PCIE_RESET_IN_PROGRESS, &adapter->state);
> +}
> +
> +/**
> + * ixgbe_reset_done - called after the pci bus has been reset.
> + * @pdev: Pointer to PCI device
> + *
> + * Allow the service task to run and schedule re-initialization.
> + */
> +static void ixgbe_reset_done(struct pci_dev *pdev)
> +{
> +	struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
> +
> +	if (!adapter)
> +		return;
> +
> +	if (!test_and_clear_bit(__IXGBE_PCIE_RESET_IN_PROGRESS, &adapter->state)) {
> +		/* Should never get here */
> +		e_err(drv, "Reset done called without PCIe reset in progress\n");
> +		return;
> +	}
> +
> +	pci_set_master(pdev);
> +
> +	/* Allow the service task to run */
> +	if (!test_bit(__IXGBE_REMOVING, &adapter->state)) {
> +		clear_bit(__IXGBE_RESETTING, &adapter->state);
> +		/* Sync __IXGBE_RESETTING */
> +		smp_mb__after_atomic();
> +	}
> +
> +	/* Schedule re-initialization */
> +	if (!test_bit(__IXGBE_DOWN, &adapter->state)) {
> +		set_bit(__IXGBE_RESET_REQUESTED, &adapter->state);
> +		if (test_bit(__IXGBE_SERVICE_INITED, &adapter->state))
> +			mod_timer(&adapter->service_timer, jiffies + 1);
> +	}
> +}
> +
>  /**
>   * ixgbe_io_resume - called when traffic can start flowing again.
>   * @pdev: Pointer to PCI device
> @@ -12384,6 +12464,8 @@ static const struct pci_error_handlers ixgbe_err_handler = {
>  	.error_detected = ixgbe_io_error_detected,
>  	.slot_reset = ixgbe_io_slot_reset,
>  	.resume = ixgbe_io_resume,
> +	.reset_prepare = ixgbe_reset_prep,
> +	.reset_done = ixgbe_reset_done,
>  };
> 
>  static DEFINE_SIMPLE_DEV_PM_OPS(ixgbe_pm_ops, ixgbe_suspend, ixgbe_resume);
> --
> 2.53.0


^ permalink raw reply

* Re: [RFC net-next 3/4] net: dsa: motorcomm: Dynamically allocate port structures
From: Andrew Lunn @ 2026-06-21  9:06 UTC (permalink / raw)
  To: David Yang
  Cc: netdev, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <CAAXyoMNOwiQCta9CKVtoBUwmX5JL-1MaDjxLrbWyviADyfs=sQ@mail.gmail.com>

> > This seems pretty error prone. A missing check will result in an
> > oops. At least it will be obvious. How big is each port structure? Is
> > the memory saving worth it?
> >
> >     Andrew
> 
> It's about 1.4k per port for 5 dummy ports.

That adds up, so not allocating them makes sense.

Maybe check if the functions you are adding tests to can actually be
called for dummy ports. They should not have a netdev, so that often
means there is no path to call these functions.

      Andrew

^ permalink raw reply

* [PATCH net v5 0/2] xfrm: fix async crypto (-EINPROGRESS) handling in validate_xmit_xfrm()
From: Petr Wozniak @ 2026-06-21 10:03 UTC (permalink / raw)
  To: netdev
  Cc: sd, steffen.klassert, herbert, kuba, horms, pabeni, edumazet,
	davem, Petr Wozniak
In-Reply-To: <20260603064659.3867-1-petr.wozniak@gmail.com>

This series fixes how the async crypto path (-EINPROGRESS from ->xmit())
is handled in validate_xmit_xfrm() and its callers.

Patch 1 (previously sent on its own, v1-v4) makes validate_xmit_xfrm()
return ERR_PTR(-EINPROGRESS) instead of NULL when a packet is stolen by
async crypto, so __dev_queue_xmit() can tell it apart from a real drop
and stop reporting -ENOMEM on noqueue/bridge interfaces.  v5 also covers
the GSO segment loop, as Sabrina pointed out.

Patch 2 fixes a use-after-free found while looking at that GSO loop:
validate_xmit_xfrm() unlinks async-stolen segments but never updates the
list head ->prev, which validate_xmit_skb_list() later dereferences.

Changes in v5:
 - 1/2: also propagate ERR_PTR(-EINPROGRESS) from the GSO segment loop
   (the 2nd ->xmit() call); v4 only handled the single-skb path.  Restore
   the blank line in validate_xmit_skb_list().  Add the missing
   maintainers to Cc. (Sabrina Dubroca)
 - 2/2: new patch -- fix the stale skb->prev use-after-free (also flagged
   by Sashiko)

Changes in v4:
 - Drop bool stolen tracking and the ERR_PTR return in
   validate_xmit_skb_list(); use IS_ERR_OR_NULL() so stolen skbs are
   silently skipped (Sabrina Dubroca)
 - Drop ERR_PTR(-EINPROGRESS) handling in __dev_direct_xmit() (Sabrina Dubroca)
 - Move validate_xmit_skb() return-value comment above the function
   (Sabrina Dubroca)

Changes in v3:
 - validate_xmit_skb_list(): set stolen=true only for -EINPROGRESS
   (Sabrina Dubroca)

Changes in v2:
 - Reset rc to NET_XMIT_SUCCESS only when PTR_ERR(skb) == -EINPROGRESS
   (Sabrina Dubroca)

Petr Wozniak (2):
  xfrm: propagate -EINPROGRESS from validate_xmit_xfrm()
  xfrm: fix stale skb->prev after async crypto steals a GSO segment

 net/core/dev.c         | 10 ++++++++--
 net/xfrm/xfrm_device.c | 12 ++++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

-- 
2.51.0


^ permalink raw reply

* [PATCH net v5 1/2] xfrm: propagate -EINPROGRESS from validate_xmit_xfrm()
From: Petr Wozniak @ 2026-06-21 10:03 UTC (permalink / raw)
  To: netdev
  Cc: sd, steffen.klassert, herbert, kuba, horms, pabeni, edumazet,
	davem, Petr Wozniak
In-Reply-To: <20260621100327.40203-1-petr.wozniak@gmail.com>

validate_xmit_xfrm() returns NULL both when a packet is dropped and
when it is stolen by async crypto (-EINPROGRESS from ->xmit()).
Callers cannot distinguish the two cases.

f53c723902d1 ("net: Add asynchronous callbacks for xfrm on layer 2.")
changed the semantics of a NULL return from "dropped" to "stolen or
dropped", but __dev_queue_xmit() was not updated.  On virtual/bridge
interfaces (noqueue qdisc) __dev_queue_xmit() initialises rc=-ENOMEM
and jumps to out: when skb is NULL, returning -ENOMEM to the caller
even though the packet will be delivered correctly via xfrm_dev_resume().

Return ERR_PTR(-EINPROGRESS) from validate_xmit_xfrm() for the async
case so callers can tell it apart from a real drop.  Update
__dev_queue_xmit() to handle ERR_PTR(-EINPROGRESS) from
validate_xmit_skb() correctly.  Update validate_xmit_skb_list() to
use IS_ERR_OR_NULL() so that ERR_PTR(-EINPROGRESS) is not mistakenly
added to the transmitted list.
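
The three-way return convention can be modelled in user space roughly as below. This is a sketch under stated assumptions, not kernel code: the `model_*` helpers are hypothetical stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR_OR_NULL(), `model_validate()` stands in for validate_xmit_skb(), and `model_queue_xmit()` mimics only the rc handling of the noqueue path, with 0 standing in for NET_XMIT_SUCCESS.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's error-pointer helpers. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int model_is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

enum model_outcome { MODEL_PASS, MODEL_DROP, MODEL_STOLEN };

struct model_skb { int id; };

/* Returns the skb, NULL if dropped, or an error pointer if stolen. */
static struct model_skb *model_validate(struct model_skb *skb,
					enum model_outcome outcome)
{
	switch (outcome) {
	case MODEL_DROP:
		return NULL;
	case MODEL_STOLEN:
		return model_err_ptr(-EINPROGRESS);
	default:
		return skb;
	}
}

/* Caller: rc starts as -ENOMEM and is only cleared for the stolen case. */
static int model_queue_xmit(struct model_skb *skb, enum model_outcome outcome)
{
	int rc = -ENOMEM;

	skb = model_validate(skb, outcome);
	if (model_is_err_or_null(skb)) {
		if (model_ptr_err(skb) == -EINPROGRESS)
			rc = 0;		/* stands in for NET_XMIT_SUCCESS */
		return rc;
	}

	return 0;			/* packet handed to the device */
}
```

Note that taking the error value of a NULL pointer yields 0, so the single comparison against -EINPROGRESS is enough to separate "stolen" from "dropped" once the combined check has matched.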

Fixes: f53c723902d1 ("net: Add asynchronous callbacks for xfrm on layer 2.")
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Petr Wozniak <petr.wozniak@gmail.com>
---
 net/core/dev.c         | 10 ++++++++--
 net/xfrm/xfrm_device.c |  4 ++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5c01dfaa6c44..f7ffc4d29597 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4018,6 +4018,9 @@ static struct sk_buff *validate_xmit_unreadable_skb(struct sk_buff *skb,
 	return NULL;
 }
 
+/* Returns the skb on success, NULL if dropped, or ERR_PTR(-EINPROGRESS)
+ * if stolen by async xfrm crypto (delivered via xfrm_dev_resume()).
+ */
 static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device *dev, bool *again)
 {
 	netdev_features_t features;
@@ -4089,7 +4092,7 @@ struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *d
 		skb->prev = skb;
 
 		skb = validate_xmit_skb(skb, dev, again);
-		if (!skb)
+		if (IS_ERR_OR_NULL(skb))
 			continue;
 
 		if (!head)
@@ -4860,8 +4863,11 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 			goto recursion_alert;
 
 		skb = validate_xmit_skb(skb, dev, &again);
-		if (!skb)
+		if (IS_ERR_OR_NULL(skb)) {
+			if (PTR_ERR(skb) == -EINPROGRESS)
+				rc = NET_XMIT_SUCCESS;
 			goto out;
+		}
 
 		HARD_TX_LOCK(dev, txq, cpu);
 
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 630f3dd31cc5..19c77f09acc9 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -182,7 +182,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
 		err = x->type_offload->xmit(x, skb, esp_features);
 		if (err) {
 			if (err == -EINPROGRESS)
-				return NULL;
+				return ERR_PTR(-EINPROGRESS);
 
 			XFRM_INC_STATS(xs_net(x), LINUX_MIB_XFRMOUTSTATEPROTOERROR);
 			kfree_skb(skb);
@@ -224,7 +224,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
 		pskb = skb2;
 	}
 
-	return skb;
+	return skb ? skb : ERR_PTR(-EINPROGRESS);
 }
 EXPORT_SYMBOL_GPL(validate_xmit_xfrm);
 
-- 
2.51.0


^ permalink raw reply related

* [PATCH net v5 2/2] xfrm: fix stale skb->prev after async crypto steals a GSO segment
From: Petr Wozniak @ 2026-06-21 10:03 UTC (permalink / raw)
  To: netdev
  Cc: sd, steffen.klassert, herbert, kuba, horms, pabeni, edumazet,
	davem, Petr Wozniak
In-Reply-To: <20260621100327.40203-1-petr.wozniak@gmail.com>

skb_gso_segment() leaves the segment list head with ->prev pointing at
the last segment, an invariant validate_xmit_skb_list() relies on when
it sets its tail pointer (tail = skb->prev).

When validate_xmit_xfrm() walks a GSO list and some segments are stolen
by async crypto (->xmit() returns -EINPROGRESS), those segments are
unlinked from the list but the head ->prev is never updated.  If the
last segment is the one stolen, the returned head still has ->prev
pointing at it, even though it is now owned by the crypto engine and may
be freed.  validate_xmit_skb_list() later does tail->next = skb, writing
through that stale pointer -- a use-after-free.

Repoint skb->prev at the last retained segment before returning.
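
A user-space model of the list handling, for illustration only. `struct model_seg` and `model_filter()` are hypothetical names; the `stolen` flag stands in for ->xmit() returning -EINPROGRESS, and the walk is a simplified version of the segment loop rather than a copy of it.

```c
#include <stddef.h>

struct model_seg {
	struct model_seg *next;
	struct model_seg *prev;	/* meaningful on the head only: last segment */
	int stolen;		/* 1 models ->xmit() returning -EINPROGRESS */
};

/*
 * Unlink stolen segments and return the first retained one, or NULL if
 * every segment was stolen.  The returned head gets ->prev repointed at
 * the last retained segment.
 */
static struct model_seg *model_filter(struct model_seg *head)
{
	struct model_seg *seg, *next, *pskb = NULL;

	for (seg = head; seg; seg = next) {
		next = seg->next;

		if (seg->stolen) {
			seg->next = NULL;
			if (seg == head)
				head = next;	/* no segment retained yet */
			else
				pskb->next = next;
			continue;
		}
		pskb = seg;		/* last retained segment so far */
	}

	/* Without this, head->prev could still point at a stolen tail. */
	if (head)
		head->prev = pskb;

	return head;
}
```

For a list a -> b -> c where c is stolen, the head a ends up with ->prev pointing at b, so a caller that uses head->prev as its tail no longer chains onto c.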

Fixes: f53c723902d1 ("net: Add asynchronous callbacks for xfrm on layer 2.")
Signed-off-by: Petr Wozniak <petr.wozniak@gmail.com>
---
 net/xfrm/xfrm_device.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 19c77f09acc9..aec1e1184a71 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -224,6 +224,14 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
 		pskb = skb2;
 	}
 
+	/* skb_gso_segment() set skb->prev to the last segment, but async
+	 * crypto may have stolen it above without updating ->prev.  Repoint
+	 * it at the last retained segment so validate_xmit_skb_list() does
+	 * not chain onto a segment now owned by the crypto engine.
+	 */
+	if (skb)
+		skb->prev = pskb;
+
 	return skb ? skb : ERR_PTR(-EINPROGRESS);
 }
 EXPORT_SYMBOL_GPL(validate_xmit_xfrm);
-- 
2.51.0


^ permalink raw reply related

* Re: [PATCH net v4] xfrm: propagate -EINPROGRESS from validate_xmit_xfrm()
From: Petr Wozniak @ 2026-06-21 10:06 UTC (permalink / raw)
  To: netdev, sd
  Cc: steffen.klassert, herbert, kuba, horms, pabeni, edumazet, davem
In-Reply-To: <20260603064659.3867-1-petr.wozniak@gmail.com>

Reposting on the list, as you asked.

Apologies for missing your comment about the 2nd x->type_offload->xmit()
call across several versions -- entirely my fault.

You're right: in the skb_list_walk_safe() loop, if all GSO segments
return -EINPROGRESS, skb is advanced to NULL and the function returns
NULL instead of ERR_PTR(-EINPROGRESS).  v5 1/2 fixes it:

	-	return skb;
	+	return skb ? skb : ERR_PTR(-EINPROGRESS);

At that point NULL can only mean all segments were stolen -- the error
path (err != -EINPROGRESS) returns NULL directly from inside the loop.
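
For anyone following along, here is a minimal user-space sketch of that
return convention. It is an illustration only: ERR_PTR(), PTR_ERR() and
IS_ERR() are re-implemented here following the kernel's usual encoding
(the top 4095 addresses carry an errno), while struct seg and
finish_walk() are hypothetical stand-ins for sk_buff and the tail of the
segment walk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space re-implementation of the kernel helpers, for illustration. */
static inline void *ERR_PTR(long error)
{
	/* Only real errno values may be encoded in a pointer. */
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct sk_buff. */
struct seg {
	struct seg *next;
};

/*
 * Models the final return of the segment walk: 'head' is whatever is
 * left of the list once every stolen segment has been unlinked.  A NULL
 * head at this point means "everything is in flight", not "error".
 */
static struct seg *finish_walk(struct seg *head)
{
	return head ? head : ERR_PTR(-EINPROGRESS);
}
```

A caller can then tell the three outcomes apart: a valid pointer (send
it), IS_ERR() with -EINPROGRESS (nothing to send yet), or NULL returned
earlier from the hard-error path.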

v5 1/2 also restores the blank line in validate_xmit_skb_list() and adds
the missing maintainers to Cc.

For the use-after-free I mentioned: I confirmed it.  validate_xmit_xfrm()
unlinks async-stolen segments but never updates the list head ->prev, so
when the last segment is stolen, validate_xmit_skb_list() chains onto it
via tail->next.  v5 2/2 fixes it by repointing skb->prev at the last
retained segment.  As you suggested, the two fixes go as a series.
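
A rough user-space model of the stale ->prev, in case it helps review.
This is a sketch under assumptions, not the kernel code: struct seg, its
'stolen' flag (standing in for ->xmit() returning -EINPROGRESS) and
walk_segments() are hypothetical, and only the unlink and repoint logic
is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the fields the walk uses. */
struct seg {
	struct seg *next;
	struct seg *prev;	/* on the head: points at the last segment */
	bool stolen;		/* models ->xmit() returning -EINPROGRESS */
};

/*
 * Unlink every stolen segment and return the new head (NULL if nothing
 * is left).  'fix_prev' selects whether the head's ->prev is repointed
 * at the last retained segment.
 */
static struct seg *walk_segments(struct seg *head, bool fix_prev)
{
	struct seg *skb = head, *pskb = NULL, *cur, *nxt;

	for (cur = head; cur; cur = nxt) {
		nxt = cur->next;
		if (cur->stolen) {
			/* The crypto engine owns 'cur' from now on. */
			if (cur == skb) {
				skb = nxt;
			} else {
				assert(pskb);	/* a retained segment precedes cur */
				pskb->next = nxt;
			}
			cur->next = NULL;
			continue;
		}
		pskb = cur;	/* last segment kept so far */
	}

	if (skb && fix_prev)
		skb->prev = pskb;
	return skb;
}
```

With a three-segment list whose last segment is stolen, the unfixed walk
returns a head whose ->prev still names the stolen segment; with the
repoint, ->prev names the middle segment, which is the real tail.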

I could not confirm the head-list leak on a closer look, so I left it
out; I'll send a separate patch if I find it.

The v5 series has been sent.

Thanks,
Petr

^ permalink raw reply

* Re: [PATCH v2] net: add sock_open() with flags for socket creation
From: Alex Goltsev @ 2026-06-21 11:05 UTC (permalink / raw)
  To: davem, netdev; +Cc: linux-kernel, Al Viro
In-Reply-To: <CAEKmD4KSvAGWEod3h8mPKQ-UYhKqakxfakt4gXrsU8sWuAO77g@mail.gmail.com>

From a9316957e594708dfb4258ad968fe88666c9b736 Mon Sep 17 00:00:00 2001
From: 0-x-0-0 <sasha.goltsev777@gmail.com>
Date: Sun, 21 Jun 2026 13:24:29 +0300
Subject: [PATCH v2] net: add sock_open() with flags for socket creation

---
Changes in V2:
- Replaced the use of plain integer constants for flags with proper
enums to improve readability and type safety.
- `sock_open` is intentionally left as a regular exported symbol rather
than being moved to a header as `static inline`. This is because it
dereferences `current->nsproxy->net_ns`, which would require pulling
in heavy headers like <linux/sched.h> and <linux/nsproxy.h> into the
already widely-used <linux/net.h>, causing unnecessary header bloat
and potential circular dependencies.
- Introduced two new creation flags for specialized use cases within
kernel modules:

* SOCK_CREATE_NOLSM: This flag allows a kernel module to bypass
LSM hooks during socket creation. This
enables a micro-optimization for kernel-internal sockets where
the security check is known *a priori* to be a no-op (e.g., for
specific configurations or high-performance paths).
This is safe because the API is restricted to in-kernel (LKM)
contexts only, and does not weaken the security boundary for
user-triggered socket creation.

* SOCK_CREATE_NOWARN: This flag suppresses the standard warning
messages on creation failure. This is useful for callers in the
kernel that probe for protocol support and handle the error
gracefully, without wanting to pollute the kernel log with
misleading warnings.
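
To make the intended flag semantics concrete, here is a small
user-space sketch of the validation sock_open() performs: exactly one
of the three type bits must be set, and only the two optional flags may
accompany it. The enum values mirror the patch with BIT(n) spelled out;
SOCK_CREATE_TYPE_MASK, SOCK_CREATE_OPT_MASK and sock_open_check_flags()
are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <errno.h>

/* Values mirror the enum in the patch; BIT(n) is written as 1 << n. */
enum sock_create_flags {
	SOCK_CREATE_KERN   = 1 << 0,
	SOCK_CREATE_USER   = 1 << 1,
	SOCK_CREATE_LITE   = 1 << 2,
	SOCK_CREATE_NOLSM  = 1 << 3,
	SOCK_CREATE_NOWARN = 1 << 4,
};

/* Hypothetical helper masks, not part of the patch. */
#define SOCK_CREATE_TYPE_MASK \
	(SOCK_CREATE_KERN | SOCK_CREATE_USER | SOCK_CREATE_LITE)
#define SOCK_CREATE_OPT_MASK \
	(SOCK_CREATE_NOLSM | SOCK_CREATE_NOWARN)

/*
 * Returns the single selected type bit, or -EINVAL if the combination
 * would be rejected.
 */
static int sock_open_check_flags(int flags)
{
	int type_bits = flags & SOCK_CREATE_TYPE_MASK;
	int optional = flags & ~SOCK_CREATE_TYPE_MASK;

	/* Exactly one type bit: non-zero and a power of two. */
	if (type_bits == 0 || (type_bits & (type_bits - 1)) != 0)
		return -EINVAL;

	/* Anything outside the known optional flags is an error. */
	if (optional & ~SOCK_CREATE_OPT_MASK)
		return -EINVAL;

	return type_bits;
}
```

So SOCK_CREATE_LITE | SOCK_CREATE_NOWARN is accepted, while
SOCK_CREATE_KERN | SOCK_CREATE_USER, a bare SOCK_CREATE_NOLSM, or any
unknown bit is rejected with -EINVAL.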

Signed-off-by: Alexander Goltsev <sasha.goltsev777@gmail.com>
---
include/linux/net.h | 22 ++++++++
net/socket.c | 133 +++++++++++++++++++++++++++++++++++++-------
2 files changed, 134 insertions(+), 21 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index f268f395c..6367c00db 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -116,6 +116,22 @@ enum sock_shutdown_cmd {
SHUT_RDWR,
};
+/**
+ * enum sock_create_flags - socket creation flags
+ * @SOCK_CREATE_KERN: creates a kernel socket
+ * @SOCK_CREATE_USER: creates a regular socket
+ * @SOCK_CREATE_LITE: creates a lite socket
+ * @SOCK_CREATE_NOLSM: disables LSM
+ * @SOCK_CREATE_NOWARN: disables warning
+ */
+enum sock_create_flags {
+ SOCK_CREATE_KERN = BIT(0),
+ SOCK_CREATE_USER = BIT(1),
+ SOCK_CREATE_LITE = BIT(2),
+ SOCK_CREATE_NOLSM = BIT(3),
+ SOCK_CREATE_NOWARN = BIT(4),
+};
+
struct socket_wq {
/* Note: wait MUST be first field of socket_wq */
wait_queue_head_t wait;
@@ -275,6 +291,12 @@ void sock_unregister(int family);
bool sock_is_registered(int family);
int __sock_create(struct net *net, int family, int type, int proto,
struct socket **res, int kern);
+int __sock_create_flags(struct net *net, int family, int type, int protocol,
+ struct socket **res, int kern, int flags);
+int __sock_create_lite_flags(int family, int type, int protocol,
+ struct socket **res, int flags);
+int sock_open(struct net *net, int family,
+ int type, int protocol, struct socket **res, int flags);
int sock_create(int family, int type, int proto, struct socket **res);
int sock_create_kern(struct net *net, int family, int type, int proto,
struct socket **res);
int sock_create_lite(int family, int type, int proto, struct socket **res);
diff --git a/net/socket.c b/net/socket.c
index 63c69a0fa..2359fd5bf 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1425,26 +1425,28 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
}
/**
- * sock_create_lite - creates a socket
+ * __sock_create_lite_flags - creates a socket (with flags)
* @family: protocol family (AF_INET, ...)
* @type: communication type (SOCK_STREAM, ...)
* @protocol: protocol (0, ...)
* @res: new socket
+ * @flags: defines socket creation flags
*
* Creates a new socket and assigns it to @res, passing through LSM.
* The new socket initialization is not complete, see kernel_accept().
* Returns 0 or an error. On failure @res is set to %NULL.
* This function internally uses GFP_KERNEL.
*/
-
-int sock_create_lite(int family, int type, int protocol, struct socket **res)
+int __sock_create_lite_flags(int family, int type, int protocol,
struct socket **res, int flags)
{
int err;
struct socket *sock = NULL;
- err = security_socket_create(family, type, protocol, 1);
- if (err)
- goto out;
+ if (!(flags & SOCK_CREATE_NOLSM)) {
+ err = security_socket_create(family, type, protocol, 1);
+ if (err)
+ goto out;
+ }
sock = sock_alloc();
if (!sock) {
@@ -1453,9 +1455,11 @@ int sock_create_lite(int family, int type, int protocol, struct socket **res)
}
sock->type = type;
- err = security_socket_post_create(sock, family, type, protocol, 1);
- if (err)
- goto out_release;
+ if (!(flags & SOCK_CREATE_NOLSM)) {
+ err = security_socket_post_create(sock, family, type, protocol, 1);
+ if (err)
+ goto out_release;
+ }
out:
*res = sock;
@@ -1465,6 +1469,25 @@ int sock_create_lite(int family, int type, int protocol, struct socket **res)
sock = NULL;
goto out;
}
+EXPORT_SYMBOL(__sock_create_lite_flags);
+
+/**
+ * sock_create_lite - creates a socket
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
+ *
+ * Creates a new socket and assigns it to @res, passing through LSM.
+ * The new socket initialization is not complete, see kernel_accept().
+ * Returns 0 or an error. On failure @res is set to %NULL.
+ * This function internally uses GFP_KERNEL.
+ */
+
+int sock_create_lite(int family, int type, int protocol, struct socket **res)
+{
+ return __sock_create_lite_flags(family, type, protocol, res, 0);
+}
EXPORT_SYMBOL(sock_create_lite);
/* No kernel lock held - perfect */
@@ -1563,22 +1586,23 @@ int sock_wake_async(struct socket_wq *wq, int how, int band)
EXPORT_SYMBOL(sock_wake_async);
/**
- * __sock_create - creates a socket
+ * __sock_create_flags - creates a socket (with flags)
* @net: net namespace
* @family: protocol family (AF_INET, ...)
* @type: communication type (SOCK_STREAM, ...)
* @protocol: protocol (0, ...)
* @res: new socket
* @kern: boolean for kernel space sockets
+ * @flags: defines socket creation flags
*
* Creates a new socket and assigns it to @res, passing through LSM.
* Returns 0 or an error. On failure @res is set to %NULL. @kern must
* be set to true if the socket resides in kernel space.
* This function internally uses GFP_KERNEL.
*/
-
-int __sock_create(struct net *net, int family, int type, int protocol,
- struct socket **res, int kern)
+int __sock_create_flags(struct net *net, int family,
+ int type, int protocol, struct socket **res,
+ int kern, int flags)
{
int err;
struct socket *sock;
@@ -1598,14 +1622,18 @@ int __sock_create(struct net *net, int family, int type, int protocol,
deadlock in module load.
*/
if (family == PF_INET && type == SOCK_PACKET) {
- pr_info_once("%s uses obsolete (PF_INET,SOCK_PACKET)\n",
+ if (!(flags & SOCK_CREATE_NOWARN)) {
+ pr_info_once("%s uses obsolete (PF_INET,SOCK_PACKET)\n",
current->comm);
+ }
family = PF_PACKET;
}
- err = security_socket_create(family, type, protocol, kern);
- if (err)
- return err;
+ if (!(flags & SOCK_CREATE_NOLSM)) {
+ err = security_socket_create(family, type, protocol, kern);
+ if (err)
+ return err;
+ }
/*
* Allocate the socket and allow the family to set things up. if
@@ -1614,7 +1642,8 @@ int __sock_create(struct net *net, int family, int type, int protocol,
*/
sock = sock_alloc();
if (!sock) {
- net_warn_ratelimited("socket: no more sockets\n");
+ if (!(flags & SOCK_CREATE_NOWARN))
+ net_warn_ratelimited("socket: no more sockets\n");
return -ENFILE; /* Not exactly a match, but its the
closest posix thing */
}
@@ -1671,9 +1700,12 @@ int __sock_create(struct net *net, int family, int type, int protocol,
* module can have its refcnt decremented
*/
module_put(pf->owner);
- err = security_socket_post_create(sock, family, type, protocol, kern);
- if (err)
- goto out_sock_release;
+
+ if (!(flags & SOCK_CREATE_NOLSM)) {
+ err = security_socket_post_create(sock, family, type, protocol, kern);
+ if (err)
+ goto out_sock_release;
+ }
*res = sock;
return 0;
@@ -1691,6 +1723,28 @@ int __sock_create(struct net *net, int family, int type, int protocol,
rcu_read_unlock();
goto out_sock_release;
}
+EXPORT_SYMBOL(__sock_create_flags);
+
+/**
+ * __sock_create - creates a socket
+ * @net: net namespace
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
+ * @kern: boolean for kernel space sockets
+ *
+ * Creates a new socket and assigns it to @res, passing through LSM.
+ * Returns 0 or an error. On failure @res is set to %NULL. @kern must
+ * be set to true if the socket resides in kernel space.
+ * This function internally uses GFP_KERNEL.
+ */
+
+int __sock_create(struct net *net, int family, int type, int protocol,
+ struct socket **res, int kern)
+{
+ return __sock_create_flags(net, family, type, protocol, res, kern, 0);
+}
EXPORT_SYMBOL(__sock_create);
/**
@@ -1710,6 +1764,43 @@ int sock_create(int family, int type, int protocol, struct socket **res)
}
EXPORT_SYMBOL(sock_create);
+/**
+ * sock_open - creates a socket (with flags)
+ * @net: net namespace (may be NULL in non-SOCK_CREATE_KERN modes)
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
+ * @flags: socket creation flags
+ *
+ * Unified entry point for socket creation with flags.
+ * Returns 0 or an error. This function internally uses GFP_KERNEL.
+ */
+int sock_open(struct net *net, int family,
+ int type, int protocol, struct socket **res,
+ int flags)
+{
+ int type_bits = flags & (SOCK_CREATE_KERN | SOCK_CREATE_USER | SOCK_CREATE_LITE);
+ int optional_flags = flags & ~(SOCK_CREATE_KERN | SOCK_CREATE_USER | SOCK_CREATE_LITE);
+
+ if (type_bits == 0 || (type_bits & (type_bits - 1)) != 0)
+ return -EINVAL;
+
+ if (optional_flags & ~(SOCK_CREATE_NOLSM | SOCK_CREATE_NOWARN))
+ return -EINVAL;
+
+ switch (type_bits) {
+ case SOCK_CREATE_KERN: return __sock_create_flags(net, family, type, protocol,
+ res, 1, optional_flags);
+ case SOCK_CREATE_USER: return __sock_create_flags(current->nsproxy->net_ns,
+ family, type, protocol, res, 0, optional_flags);
+ case SOCK_CREATE_LITE: return __sock_create_lite_flags(family,
+ type, protocol, res, optional_flags);
+ default: return -EINVAL;
+ }
+}
+EXPORT_SYMBOL(sock_open);
+
/**
* sock_create_kern - creates a socket (kernel space)
* @net: net namespace
-- 
2.47.3

^ permalink raw reply related

* Fwd: HTML message rejected: qcom_ppe: direct (non-DSA) MAC port - host-bound RX dropped at EDMA (src_info_type != PORTID)?
From: perceival percy @ 2026-06-21 11:18 UTC (permalink / raw)
  Cc: netdev
In-Reply-To: <1782040471-27241-mlmmj-64bd9d5a@vger.kernel.org>

Hi Luo Jie, netdev,

I'm porting the GL.iNet GL-BE9300 (IPQ5332) to mainline OpenWrt and have run
into what looks like a gap in qcom_ppe for a direct (non-DSA) MAC port. The
question is at the bottom; I'm happy to provide the full DT, register dumps, or
test patches.

Setup
-----
SoC: IPQ5332, kernel 6.12, drivers/net/ethernet/qualcomm/ppe. Two ports:

- port@1: 10G SerDes, DSA conduit to an external RTL8372N switch (eth0). RX
works fine; the DSA tagger demuxes the switch ports.
- port@2: 2.5G, direct on-board RTL8221B PHY (phy-mode = "2500base-x"), used as
a standalone L3 "WAN" (eth1, no bridge, no DSA). RX is broken.

Symptom (port@2)
----------------
Link comes up at 2.5G/full. The MAC receives - the PPE ingress counters
advance:

PORT_RX_CNT ... 4/1(port=0002)
VPORT_RX_CNT ... 4/0(port=0002)

...but eth1 rx_packets stays 0, and the EDMA RX-ring error stat shows the drop:

/sys/kernel/debug/ppe/edma/stats/rx_ring_stats:
rxdesc[15]:src_port_inval_type = 1

So the frame reaches the EDMA RX descriptor ring, but in edma_rx_get_src_dev()
the source is not tagged as PORTID:

if ((src_info & EDMA_RXDESC_SRCINFO_TYPE_MASK) ==
EDMA_RXDESC_SRCINFO_TYPE_PORTID) {
src_port_num = src_info & EDMA_RXDESC_PORTNUM_BITS;
...
} else {
... ++rxdesc_stats->src_port_inval_type; /* port@2: type == 0 */
return NULL; /* dropped */
}

so the netdev lookup never runs.
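
To spell out the decision I am describing, here is a small user-space
model of it. Everything numeric in it is a placeholder: the mask and
type values are NOT the real EDMA_RXDESC_* encodings, MAX_PORTS, struct
rx_stats, its port_out_of_range counter and rx_src_index() are
hypothetical, and only the control flow follows the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder encodings; the real values live in the driver headers. */
#define SRCINFO_TYPE_MASK	0xF000u
#define SRCINFO_TYPE_PORTID	0x2000u
#define SRCINFO_PORTNUM_BITS	0x0FFFu
#define MAX_PORTS		2u

struct rx_stats {
	unsigned int src_port_inval_type;	/* type was not PORTID */
	unsigned int port_out_of_range;		/* hypothetical extra counter */
};

/*
 * Returns the netdev_arr-style index (port - 1), or -1 if the frame
 * would be dropped before any netdev lookup.
 */
static int rx_src_index(uint32_t src_info, struct rx_stats *stats)
{
	uint32_t port;

	assert(stats);

	if ((src_info & SRCINFO_TYPE_MASK) != SRCINFO_TYPE_PORTID) {
		stats->src_port_inval_type++;	/* what port@2 hits */
		return -1;
	}

	port = src_info & SRCINFO_PORTNUM_BITS;
	if (port == 0 || port > MAX_PORTS) {
		stats->port_out_of_range++;
		return -1;
	}

	return (int)port - 1;
}
```

In this model a descriptor with type 0 and port number 2 is dropped even
though index 1 is populated, which matches what I see for port@2.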

What I have checked
-------------------
- netdev_arr[port_id - 1] is populated for port@2 (edma_port_setup), so the
PORTID -> netdev mapping would resolve if the type were PORTID.
- port@2's L2 forwarding is the fallback path: PPE_L2_VP_PORT_TBL with
INVALID_VSI_FWD_EN = 1, DST_INFO = 0 (CPU port0) - set for every physical
port in ppe_config.c. That controls the destination; the RX descriptor
src_info type is still 0.
- Bridging eth1 (to give it a VSI) does not change it.
- port@1 (the DSA conduit) gets SRCINFO_TYPE_PORTID and is delivered correctly,
so the tagging clearly happens for that port's path.

Question
--------
What configures a port so the PPE tags its host-bound frames with
SRCINFO_TYPE_PORTID + src_port? Is there a per-port setup (an EG service code,
a source-port profile, a physical-vs-virtual-port distinction) that the
DSA-conduit path establishes but a standalone direct MAC port does not?

And more generally: is "deliver a direct MAC port's own ingress to the host
CPU" (a plain WAN/L3 port - no DSA, no bridge offload) an intended/supported
configuration today, or does the EDMA RX path assume all host-bound traffic
arrives via the DSA conduit or an offloaded bridge VSI?

Thanks,
Kamil Bienkiewicz (perceival on the OpenWrt forum)

^ permalink raw reply

* [PATCH 1/2 net-next,v1] i40e: move ATR sample rate from ring to PF level
From: mheib @ 2026-06-21 12:56 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, jiri, davem, edumazet, kuba, pabeni, horms, corbet,
	anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev,
	Mohammad Heib

From: Mohammad Heib <mheib@redhat.com>

The ATR sample rate is currently stored per-ring and initialized when each
TX ring is configured. Since the sample rate is a global policy that
applies uniformly across all rings, it makes more sense to store it at
the PF level.

Move atr_sample_rate from struct i40e_ring to struct i40e_pf and initialize
it once during i40e_sw_init(). Update i40e_atr() to reference the PF-level
field. Change atr_count from u8 to u32 to match the sample rate type.
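
One practical consequence of the width change can be shown with a small
user-space sketch. It assumes the counter is incremented once per
eligible packet and that an update happens once the count is no longer
below the rate; counter_reaches() and its parameters are hypothetical
and exist only to illustrate the wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if a counter that wraps to 0 after 'max' ever reaches
 * 'rate' within 'packets' increments.  Use UINT8_MAX to model the old
 * u8 field and UINT32_MAX to model the new u32 field.
 */
static bool counter_reaches(uint32_t rate, uint32_t max, uint32_t packets)
{
	uint32_t count = 0;

	assert(max > 0);

	for (uint32_t i = 0; i < packets; i++) {
		/* Models count++ on a type whose largest value is 'max'. */
		count = (count == max) ? 0 : count + 1;
		if (count >= rate)
			return true;
	}
	return false;
}
```

With the default rate of 20 both widths behave the same, but a rate
above 255 can never be reached by an 8-bit counter, so the wider
counter matters as soon as the rate is no longer a small constant.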

Signed-off-by: Mohammad Heib <mheib@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 9 +++------
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 4 ++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.h | 3 +--
 4 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 1b6a8fbaa648..88eb40ee45f0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -487,6 +487,7 @@ struct i40e_pf {
 	u16 rss_size_max;          /* HW defined max RSS queues */
 	u16 fdir_pf_filter_count;  /* num of guaranteed filters for this PF */
 	u16 num_alloc_vsi;         /* num VSIs this driver supports */
+	u32 atr_sample_rate;
 	bool wol_en;
 
 	struct hlist_head fdir_filter_list;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d59750c490f4..9695d160bc59 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3457,12 +3457,7 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 		ring->xsk_pool = i40e_xsk_pool(ring);
 
 	/* some ATR related tx ring init */
-	if (test_bit(I40E_FLAG_FD_ATR_ENA, vsi->back->flags)) {
-		ring->atr_sample_rate = I40E_DEFAULT_ATR_SAMPLE_RATE;
-		ring->atr_count = 0;
-	} else {
-		ring->atr_sample_rate = 0;
-	}
+	ring->atr_count = 0;
 
 	/* configure XPS */
 	i40e_config_xps_tx_ring(ring);
@@ -12745,6 +12740,8 @@ static int i40e_sw_init(struct i40e_pf *pf)
 		}
 	}
 
+	pf->atr_sample_rate = I40E_DEFAULT_ATR_SAMPLE_RATE;
+
 	if ((pf->hw.func_caps.fd_filters_guaranteed > 0) ||
 	    (pf->hw.func_caps.fd_filters_best_effort > 0)) {
 		set_bit(I40E_FLAG_FD_ATR_ENA, pf->flags);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 61525ab7d21e..da94cb2ce94d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2882,7 +2882,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
 		return;
 
 	/* if sampling is disabled do nothing */
-	if (!tx_ring->atr_sample_rate)
+	if (!pf->atr_sample_rate)
 		return;
 
 	/* Currently only IPv4/IPv6 with TCP is supported */
@@ -2934,7 +2934,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	if (!th->fin &&
 	    !th->syn &&
 	    !th->rst &&
-	    (tx_ring->atr_count < tx_ring->atr_sample_rate))
+	    (tx_ring->atr_count < pf->atr_sample_rate))
 		return;
 
 	tx_ring->atr_count = 0;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index bb741ff3e5f2..be587f804e7a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -372,8 +372,7 @@ struct i40e_ring {
 	u16 next_to_clean;
 	u16 xdp_tx_active;
 
-	u8 atr_sample_rate;
-	u8 atr_count;
+	u32 atr_count;
 
 	bool ring_active;		/* is ring online or not */
 	bool arm_wb;		/* do something to arm write back */
-- 
2.53.0


^ permalink raw reply related

* [PATCH 2/2 net-next,v2] i40e: add devlink parameter for Flow Director ATR sample rate
From: mheib @ 2026-06-21 12:56 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, jiri, davem, edumazet, kuba, pabeni, horms, corbet,
	anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev,
	Mohammad Heib
In-Reply-To: <20260621125644.253844-1-mheib@redhat.com>

From: Mohammad Heib <mheib@redhat.com>

The i40e driver uses Flow Director ATR to periodically update flow
steering information for active TCP flows. The update frequency is
currently controlled by I40E_DEFAULT_ATR_SAMPLE_RATE and is fixed at
driver build time.

On systems with a large number of queues and high-rate TCP workloads,
the default sampling interval can result in frequent Flow Director
reprogramming for long-lived flows.

The amount of TCP packet reordering observed on some systems is
sensitive to the ATR sampling interval. Increasing the interval reduces
Flow Director programming activity and can significantly reduce the
associated reordering.

Since the optimal sampling interval depends on the workload and system
configuration, a single fixed value is not suitable for all deployments.

Add a devlink parameter to allow administrators to tune the ATR sample
rate at runtime without rebuilding the driver or disabling ATR
functionality entirely.
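
As a rough illustration of what the parameter controls, the sampling
decision can be modelled in user space as below. This is a sketch under
assumptions: struct ring_model, atr_should_program() and atr_updates()
are hypothetical names, and the per-packet increment of the counter is
assumed rather than taken from the hunks in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring state: just the sample counter. */
struct ring_model {
	uint32_t atr_count;
};

/* Returns true if this packet would trigger a Flow Director update. */
static bool atr_should_program(struct ring_model *ring, uint32_t sample_rate,
			       bool syn, bool fin, bool rst)
{
	assert(ring);

	if (!sample_rate)
		return false;	/* a rate of 0 disables sampling */

	ring->atr_count++;	/* assumed: one increment per eligible packet */

	/* Plain data packets are skipped until the count reaches the rate. */
	if (!fin && !syn && !rst && ring->atr_count < sample_rate)
		return false;

	ring->atr_count = 0;
	return true;
}

/* Counts updates for 'packets' plain data packets of one flow on one ring. */
static uint32_t atr_updates(uint32_t sample_rate, uint32_t packets)
{
	struct ring_model ring = { 0 };
	uint32_t updates = 0;

	for (uint32_t i = 0; i < packets; i++)
		updates += atr_should_program(&ring, sample_rate,
					      false, false, false);
	return updates;
}
```

In this model 1000 data packets cause 50 updates at a rate of 20 and 5
updates at a rate of 200, which is the reduction in programming activity
the parameter is meant to make tunable.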

Signed-off-by: Mohammad Heib <mheib@redhat.com>
---
 Documentation/networking/devlink/i40e.rst     | 20 +++++++++++
 .../net/ethernet/intel/i40e/i40e_devlink.c    | 36 +++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/Documentation/networking/devlink/i40e.rst b/Documentation/networking/devlink/i40e.rst
index 51c887f0dc83..2cea98b631ba 100644
--- a/Documentation/networking/devlink/i40e.rst
+++ b/Documentation/networking/devlink/i40e.rst
@@ -40,6 +40,26 @@ Parameters
 
         The default value is ``0`` (internal calculation is used).
 
+.. list-table:: Driver specific parameters implemented
+    :widths: 5 5 90
+
+    * - Name
+      - Mode
+      - Description
+    * - ``atr_sample_rate``
+      - runtime
+      - Controls how frequently Flow Director ATR updates flow steering
+        information for active TCP flows.
+
+        ATR programs Flow Director entries based on sampled transmitted
+        packets. The sampling interval is specified as the number of
+        transmitted packets between ATR updates.
+
+        Lower values increase Flow Director programming activity, while
+        higher values reduce the update frequency.
+
+        Setting to ``0`` disables ATR sampling (no filters will be programmed).
+        The default value is ``20``.
 
 Info versions
 =============
diff --git a/drivers/net/ethernet/intel/i40e/i40e_devlink.c b/drivers/net/ethernet/intel/i40e/i40e_devlink.c
index 229179ccc131..cf487efdd803 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_devlink.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_devlink.c
@@ -33,12 +33,48 @@ static int i40e_max_mac_per_vf_get(struct devlink *devlink,
 	return 0;
 }
 
+static int i40e_atr_sample_rate_set(struct devlink *devlink,
+				    u32 id,
+				    struct devlink_param_gset_ctx *ctx,
+				    struct netlink_ext_ack *extack)
+{
+	struct i40e_pf *pf = devlink_priv(devlink);
+	u32 sample_rate = ctx->val.vu32;
+
+	pf->atr_sample_rate = sample_rate;
+	return 0;
+}
+
+static int i40e_atr_sample_rate_get(struct devlink *devlink,
+				    u32 id,
+				    struct devlink_param_gset_ctx *ctx,
+				    struct netlink_ext_ack *extack)
+{
+	struct i40e_pf *pf = devlink_priv(devlink);
+
+	ctx->val.vu32 = pf->atr_sample_rate;
+
+	return 0;
+}
+
+enum i40e_dl_param_id {
+	I40E_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
+	I40E_DEVLINK_PARAM_ID_ATR_SAMPLE_RATE,
+};
+
 static const struct devlink_param i40e_dl_params[] = {
 	DEVLINK_PARAM_GENERIC(MAX_MAC_PER_VF,
 			      BIT(DEVLINK_PARAM_CMODE_RUNTIME),
 			      i40e_max_mac_per_vf_get,
 			      i40e_max_mac_per_vf_set,
 			      NULL),
+	DEVLINK_PARAM_DRIVER(I40E_DEVLINK_PARAM_ID_ATR_SAMPLE_RATE,
+			     "atr_sample_rate",
+			     DEVLINK_PARAM_TYPE_U32,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     i40e_atr_sample_rate_get,
+			     i40e_atr_sample_rate_set,
+			     NULL),
 };
 
 static void i40e_info_get_dsn(struct i40e_pf *pf, char *buf, size_t len)
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH v2] net: add sock_open() with flags for socket creation
From: David Laight @ 2026-06-21 12:57 UTC (permalink / raw)
  To: Alex Goltsev; +Cc: davem, netdev, linux-kernel, Al Viro
In-Reply-To: <CAEKmD4K-v_srabQDJCfqaqA6vssk-Hg-mLHEdTsphTBLmQVjnw@mail.gmail.com>

On Sun, 21 Jun 2026 14:05:40 +0300
Alex Goltsev <sasha.goltsev777@gmail.com> wrote:

> From a9316957e594708dfb4258ad968fe88666c9b736 Mon Sep 17 00:00:00 2001
> From: 0-x-0-0 <sasha.goltsev777@gmail.com>
> Date: Sun, 21 Jun 2026 13:24:29 +0300
> Subject: [PATCH v2] net: add sock_open() with flags for socket creation

A) There is no info here.
B) You've not said why this is of any use.
C) It isn't a bug fix so would go into net-next
D) net-next is closed.

	David

> 
> ---
> Changes in V2:
> - Replaced the use of plain integer constants for flags with proper
> enums to improve readability and type safety.
> - `sock_open` is intentionally left as a regular exported symbol rather
> than being moved to a header as `static inline`. This is because it
> dereferences `current->nsproxy->net_ns`, which would require pulling
> in heavy headers like <linux/sched.h> and <linux/nsproxy.h> into the
> already widely-used <linux/net.h>, causing unnecessary header bloat
> and potential circular dependencies.
> - Introduced two new creation flags for specialized use cases within
> kernel modules:
> 
> * SOCK_CREATE_NOLSM: This flag allows a kernel module to bypass
> LSM hooks during socket creation. This
> enables a micro-optimization for kernel-internal sockets where
> the security check is known *a priori* to be a no-op (e.g., for
> specific configurations or high-performance paths).
> This is safe because the API is restricted to in-kernel (LKM)
> contexts only, and does not weaken the security boundary for
> user-triggered socket creation.
> 
> * SOCK_CREATE_NOWARN: This flag suppresses the standard warning
> messages on creation failure. This is useful for callers in the
> kernel that probe for protocol support and handle the error
> gracefully, without wanting to pollute the kernel log with
> misleading warnings.
> 
> Signed-off-by: Alexander Goltsev <sasha.goltsev777@gmail.com>
> ---
> include/linux/net.h | 22 ++++++++
> net/socket.c | 133 +++++++++++++++++++++++++++++++++++++-------
> 2 files changed, 134 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/net.h b/include/linux/net.h
> index f268f395c..6367c00db 100644
> --- a/include/linux/net.h
> +++ b/include/linux/net.h
> @@ -116,6 +116,22 @@ enum sock_shutdown_cmd {
> SHUT_RDWR,
> };
> +/**
> + * enum sock_create_flags - socket creation flags
> + * @SOCK_CREATE_KERN: creates a kernel socket
> + * @SOCK_CREATE_USER: creates a regular socket
> + * @SOCK_CREATE_LITE: creates a lite socket
> + * @SOCK_CREATE_NOLSM: disables LSM
> + * @SOCK_CREATE_NOWARN: disables warning
> + */
> +enum sock_create_flags {
> + SOCK_CREATE_KERN = BIT(0),
> + SOCK_CREATE_USER = BIT(1),
> + SOCK_CREATE_LITE = BIT(2),
> + SOCK_CREATE_NOLSM = BIT(3),
> + SOCK_CREATE_NOWARN = BIT(4),
> +};
> +
> struct socket_wq {
> /* Note: wait MUST be first field of socket_wq */
> wait_queue_head_t wait;
> @@ -275,6 +291,12 @@ void sock_unregister(int family);
> bool sock_is_registered(int family);
> int __sock_create(struct net *net, int family, int type, int proto,
> struct socket **res, int kern);
> +int __sock_create_flags(struct net *net, int family, int type, int protocol,
> + struct socket **res, int kern, int flags);
> +int __sock_create_lite_flags(int family, int type, int protocol,
> + struct socket **res, int flags);
> +int sock_open(struct net *net, int family,
> + int type, int protocol, struct socket **res, int flags);
> int sock_create(int family, int type, int proto, struct socket **res);
> int sock_create_kern(struct net *net, int family, int type, int proto,
> struct socket **res);
> int sock_create_lite(int family, int type, int proto, struct socket **res);
> diff --git a/net/socket.c b/net/socket.c
> index 63c69a0fa..2359fd5bf 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -1425,26 +1425,28 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
> }
> /**
> - * sock_create_lite - creates a socket
> + * __sock_create_lite_flags - creates a socket (with flags)
> * @family: protocol family (AF_INET, ...)
> * @type: communication type (SOCK_STREAM, ...)
> * @protocol: protocol (0, ...)
> * @res: new socket
> + * @flags: defines socket creation flags
> *
> * Creates a new socket and assigns it to @res, passing through LSM.
> * The new socket initialization is not complete, see kernel_accept().
> * Returns 0 or an error. On failure @res is set to %NULL.
> * This function internally uses GFP_KERNEL.
> */
> -
> -int sock_create_lite(int family, int type, int protocol, struct socket **res)
> +int __sock_create_lite_flags(int family, int type, int protocol,
> struct socket **res, int flags)
> {
> int err;
> struct socket *sock = NULL;
> - err = security_socket_create(family, type, protocol, 1);
> - if (err)
> - goto out;
> + if (!(flags & SOCK_CREATE_NOLSM)) {
> + err = security_socket_create(family, type, protocol, 1);
> + if (err)
> + goto out;
> + }
> sock = sock_alloc();
> if (!sock) {
> @@ -1453,9 +1455,11 @@ int sock_create_lite(int family, int type, int protocol, struct socket **res)
> }
> sock->type = type;
> - err = security_socket_post_create(sock, family, type, protocol, 1);
> - if (err)
> - goto out_release;
> + if (!(flags & SOCK_CREATE_NOLSM)) {
> + err = security_socket_post_create(sock, family, type, protocol, 1);
> + if (err)
> + goto out_release;
> + }
> out:
> *res = sock;
> @@ -1465,6 +1469,25 @@ int sock_create_lite(int family, int type, int protocol, struct socket **res)
> sock = NULL;
> goto out;
> }
> +EXPORT_SYMBOL(__sock_create_lite_flags);
> +
> +/**
> + * sock_create_lite - creates a socket
> + * @family: protocol family (AF_INET, ...)
> + * @type: communication type (SOCK_STREAM, ...)
> + * @protocol: protocol (0, ...)
> + * @res: new socket
> + *
> + * Creates a new socket and assigns it to @res, passing through LSM.
> + * The new socket initialization is not complete, see kernel_accept().
> + * Returns 0 or an error. On failure @res is set to %NULL.
> + * This function internally uses GFP_KERNEL.
> + */
> +
> +int sock_create_lite(int family, int type, int protocol, struct socket **res)
> +{
> + return __sock_create_lite_flags(family, type, protocol, res, 0);
> +}
> EXPORT_SYMBOL(sock_create_lite);
> /* No kernel lock held - perfect */
> @@ -1563,22 +1586,23 @@ int sock_wake_async(struct socket_wq *wq, int how, int band)
> EXPORT_SYMBOL(sock_wake_async);
> /**
> - * __sock_create - creates a socket
> + * __sock_create_flags - creates a socket (with flags)
> * @net: net namespace
> * @family: protocol family (AF_INET, ...)
> * @type: communication type (SOCK_STREAM, ...)
> * @protocol: protocol (0, ...)
> * @res: new socket
> * @kern: boolean for kernel space sockets
> + * @flags: defines socket creation flags
> *
> * Creates a new socket and assigns it to @res, passing through LSM.
> * Returns 0 or an error. On failure @res is set to %NULL. @kern must
> * be set to true if the socket resides in kernel space.
> * This function internally uses GFP_KERNEL.
> */
> -
> -int __sock_create(struct net *net, int family, int type, int protocol,
> - struct socket **res, int kern)
> +int __sock_create_flags(struct net *net, int family,
> + int type, int protocol, struct socket **res,
> + int kern, int flags)
> {
> int err;
> struct socket *sock;
> @@ -1598,14 +1622,18 @@ int __sock_create(struct net *net, int family, int type, int protocol,
> deadlock in module load.
> */
> if (family == PF_INET && type == SOCK_PACKET) {
> - pr_info_once("%s uses obsolete (PF_INET,SOCK_PACKET)\n",
> + if (!(flags & SOCK_CREATE_NOWARN)) {
> + pr_info_once("%s uses obsolete (PF_INET,SOCK_PACKET)\n",
> current->comm);
> + }
> family = PF_PACKET;
> }
> - err = security_socket_create(family, type, protocol, kern);
> - if (err)
> - return err;
> + if (!(flags & SOCK_CREATE_NOLSM)) {
> + err = security_socket_create(family, type, protocol, kern);
> + if (err)
> + return err;
> + }
> /*
> * Allocate the socket and allow the family to set things up. if
> @@ -1614,7 +1642,8 @@ int __sock_create(struct net *net, int family, int type, int protocol,
> */
> sock = sock_alloc();
> if (!sock) {
> - net_warn_ratelimited("socket: no more sockets\n");
> + if (!(flags & SOCK_CREATE_NOWARN))
> + net_warn_ratelimited("socket: no more sockets\n");
> return -ENFILE; /* Not exactly a match, but its the
> closest posix thing */
> }
> @@ -1671,9 +1700,12 @@ int __sock_create(struct net *net, int family, int type, int protocol,
> * module can have its refcnt decremented
> */
> module_put(pf->owner);
> - err = security_socket_post_create(sock, family, type, protocol, kern);
> - if (err)
> - goto out_sock_release;
> +
> + if (!(flags & SOCK_CREATE_NOLSM)) {
> + err = security_socket_post_create(sock, family, type, protocol, kern);
> + if (err)
> + goto out_sock_release;
> + }
> *res = sock;
> return 0;
> @@ -1691,6 +1723,28 @@ int __sock_create(struct net *net, int family,
> int type, int protocol,
> rcu_read_unlock();
> goto out_sock_release;
> }
> +EXPORT_SYMBOL(__sock_create_flags);
> +
> +/**
> + * __sock_create - creates a socket
> + * @net: net namespace
> + * @family: protocol family (AF_INET, ...)
> + * @type: communication type (SOCK_STREAM, ...)
> + * @protocol: protocol (0, ...)
> + * @res: new socket
> + * @kern: boolean for kernel space sockets
> + *
> + * Creates a new socket and assigns it to @res, passing through LSM.
> + * Returns 0 or an error. On failure @res is set to %NULL. @kern must
> + * be set to true if the socket resides in kernel space.
> + * This function internally uses GFP_KERNEL.
> + */
> +
> +int __sock_create(struct net *net, int family, int type, int protocol,
> + struct socket **res, int kern)
> +{
> + return __sock_create_flags(net, family, type, protocol, res, kern, 0);
> +}
> EXPORT_SYMBOL(__sock_create);
> /**
> @@ -1710,6 +1764,43 @@ int sock_create(int family, int type, int
> protocol, struct socket **res)
> }
> EXPORT_SYMBOL(sock_create);
> +/**
> + * sock_open - creates a socket (with flags)
> + * @net: net namespace (may be NULL in non-SOCK_CREATE_KERN modes)
> + * @family: protocol family (AF_INET, ...)
> + * @type: communication type (SOCK_STREAM, ...)
> + * @protocol: protocol (0, ...)
> + * @res: new socket
> + * @flags: socket creation flags
> + *
> + * Unified entry point for socket creation with flags.
> + * Returns 0 or an error. This function internally uses GFP_KERNEL.
> + */
> +int sock_open(struct net *net, int family,
> + int type, int protocol, struct socket **res,
> + int flags)
> +{
> + int type_bits = flags & (SOCK_CREATE_KERN | SOCK_CREATE_USER |
> SOCK_CREATE_LITE);
> + int optional_flags = flags & ~(SOCK_CREATE_KERN | SOCK_CREATE_USER |
> SOCK_CREATE_LITE);
> +
> + if (type_bits == 0 || (type_bits & (type_bits - 1)) != 0)
> + return -EINVAL;
> +
> + if (optional_flags & ~(SOCK_CREATE_NOLSM | SOCK_CREATE_NOWARN))
> + return -EINVAL;
> +
> + switch (type_bits) {
> + case SOCK_CREATE_KERN: return __sock_create_flags(net, family, type, protocol,
> + res, 1, optional_flags);
> + case SOCK_CREATE_USER: return __sock_create_flags(current->nsproxy->net_ns,
> + family, type, protocol, res, 0, optional_flags);
> + case SOCK_CREATE_LITE: return __sock_create_lite_flags(family,
> + type, protocol, res, optional_flags);
> + default: return -EINVAL;
> + }
> +}
> +EXPORT_SYMBOL(sock_open);
> +
> /**
> * sock_create_kern - creates a socket (kernel space)
> * @net: net namespace
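
For readers following the flag handling in the quoted sock_open() hunk: the check accepts exactly one of the three mode bits, plus any combination of the two optional bits. Below is a minimal standalone sketch of that validation. The numeric flag values are assumptions (their definitions are not part of the quoted hunk), and sock_open_check_flags() is a hypothetical helper name, not something the patch adds.

```c
#include <errno.h>

/* Assumed values for illustration only; the quoted hunk does not show them. */
#define SOCK_CREATE_KERN   0x01
#define SOCK_CREATE_USER   0x02
#define SOCK_CREATE_LITE   0x04
#define SOCK_CREATE_NOLSM  0x10
#define SOCK_CREATE_NOWARN 0x20

#define SOCK_CREATE_TYPE_MASK \
	(SOCK_CREATE_KERN | SOCK_CREATE_USER | SOCK_CREATE_LITE)
#define SOCK_CREATE_OPT_MASK \
	(SOCK_CREATE_NOLSM | SOCK_CREATE_NOWARN)

/*
 * Hypothetical helper mirroring the checks at the top of sock_open():
 * returns 0 when exactly one mode bit is set and every remaining bit is
 * a known optional flag, -EINVAL otherwise.
 */
int sock_open_check_flags(int flags)
{
	int type_bits = flags & SOCK_CREATE_TYPE_MASK;
	int optional_flags = flags & ~SOCK_CREATE_TYPE_MASK;

	/*
	 * x & (x - 1) clears the lowest set bit, so a nonzero result
	 * means more than one mode bit was passed.
	 */
	if (type_bits == 0 || (type_bits & (type_bits - 1)) != 0)
		return -EINVAL;

	/* Reject any bit that is neither a mode bit nor a known option. */
	if (optional_flags & ~SOCK_CREATE_OPT_MASK)
		return -EINVAL;

	return 0;
}
```

With these assumed values, SOCK_CREATE_LITE | SOCK_CREATE_NOWARN passes, while SOCK_CREATE_KERN | SOCK_CREATE_USER or a call with no mode bit at all is rejected with -EINVAL.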



* Re: [Intel-wired-lan] e1000e: Report link down after "Detected Hardware Unit Hang" ?
From: Ruinskiy, Dima @ 2026-06-21 13:22 UTC (permalink / raw)
  To: Andrew Lunn, Helge Deller, Helge Deller
  Cc: Tony Nguyen, Przemek Kitszel, intel-wired-lan, netdev
In-Reply-To: <d86c0dd8-8bd8-495a-b750-2a0036fbbee4@lunn.ch>

On 17/06/2026 0:59, Andrew Lunn wrote:
>> This does not seem like the right direction to me.
>>
>> The "Detected Hardware Unit Hang" print does not indicate that the interface
>> is dead, but that the transmitter is stalled.
>>
>> This can be due to an unusually high load, or a HW fault / race condition
>> with another component, etc.
>>
>> When a hang is detected, the transmitter is stopped with netif_stop_queue()
>> and eventually ndo_tx_timeout triggers a full reset to the device, which in
>> many cases recovers it from the hang.
> 
> Does a full reset cause the link to be negotiated again? If so, there
> is no harm in setting the carrier down. If the reset is successful,
> the carrier will be restored. However, if the reset does not recover
> the system, does the carrier stay down?
> 
>      Andrew
> 

The way it is written - a reset triggered by the Tx timeout path will go 
through e1000e_reinit_locked(), which calls e1000e_down() followed by 
e1000e_up().

e1000e_down() calls netif_carrier_off() at the start, and e1000e_reset() 
later. e1000e_up() triggers a link state recheck, which should restore 
the carrier.
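
The sequence described above can be pictured with a toy model. This is only a sketch under stated assumptions: struct toy_adapter and the toy_*() functions are hypothetical stand-ins, not the driver's code, and the real link recheck does not run synchronously inside e1000e_up() the way it does here.

```c
#include <stdbool.h>

/* Toy stand-in for the adapter state; not the driver's real structure. */
struct toy_adapter {
	bool carrier_ok;  /* what the stack sees, as in netif_carrier_ok() */
	bool phy_link_up; /* what the hardware reports once it is back up */
};

/* Models e1000e_down(): the carrier is dropped first, the reset comes later. */
void toy_down(struct toy_adapter *a)
{
	a->carrier_ok = false; /* netif_carrier_off() */
	/* hardware reset would happen here in the real driver */
}

/* Models e1000e_up(): a link state recheck restores the carrier if link is up. */
void toy_up(struct toy_adapter *a)
{
	if (a->phy_link_up)
		a->carrier_ok = true; /* netif_carrier_on() after the recheck */
}

/* Models the Tx timeout reset path: down followed by up. */
void toy_reinit_locked(struct toy_adapter *a)
{
	toy_down(a);
	toy_up(a);
}
```

In this model the carrier only comes back when the hardware reports link after the reset; if it does not, the carrier stays off.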

So if everything works as it should, the change proposed here would be 
redundant. However, we have been getting reports of these unrecoverable 
hangs from time to time, so I suspect things do not always work as they 
should.

There is one issue under investigation at present, where a persistent 
hang was reported following an aborted hibernation attempt. We are 
testing a patch against it.

I did not see anything in the original description of this report tying 
the hang to a power state change, but I will happily share the patch 
once we get preliminary positive results.

--Dima



* Re: [PATCH v2] net: add sock_open() with flags for socket creation
From: Andrew Lunn @ 2026-06-21 13:59 UTC (permalink / raw)
  To: David Laight; +Cc: Alex Goltsev, davem, netdev, linux-kernel, Al Viro
In-Reply-To: <20260621135746.067a93be@pumpkin>

On Sun, Jun 21, 2026 at 01:57:46PM +0100, David Laight wrote:
> On Sun, 21 Jun 2026 14:05:40 +0300
> Alex Goltsev <sasha.goltsev777@gmail.com> wrote:
> 
> > From a9316957e594708dfb4258ad968fe88666c9b736 Mon Sep 17 00:00:00 2001
> > From: 0-x-0-0 <sasha.goltsev777@gmail.com>
> > Date: Sun, 21 Jun 2026 13:24:29 +0300
> > Subject: [PATCH v2] net: add sock_open() with flags for socket creation
> 
> A) There is no info here.
> B) You've not said why this is of any use.
> C) It isn't a bug fix so would go into net-next
> D) net-next is closed.

E) The patch has had all its whitespace corrupted.

Please "submit" the patch to yourself and ensure you can cleanly apply
it.

   Andrew


* Re: "ip help" output is an error
From: Stephen Hemminger @ 2026-06-21 15:21 UTC (permalink / raw)
  To: Dmitri Seletski; +Cc: netdev
In-Reply-To: <62f09fe8-899c-4d22-b7a1-67e2745613df@gmail.com>

On Sat, 20 Jun 2026 10:36:31 +0100
Dmitri Seletski <drjoms@gmail.com> wrote:

> Hello iproute2 maintainers,
> 
> I am reporting an inconsistency regarding the exit status of the ip help 
> command.
> 
> Current Behavior:
> When running ip help, the command prints the help documentation to 
> stdout, but exits with a non-zero status (error). This causes issues in 
> shell scripts that rely on exit codes for control flow.
> 
> Steps to reproduce:
> bash
> 
> # This returns "FAIL" because the exit code is non-zero
> if ip help > /dev/null; then
>      echo "SUCCESS"
> else
>      echo "FAIL"
> fi
> 
> Expected Behavior:
> Since the command successfully performs the requested task (displaying 
> help information) and does not encounter a system error, it should 
> return an exit code of 0.
> 
> Context:
> This behavior breaks standard Bash logic for automation. For example:
> ip help && echo "This will not execute"
> 
> "ip help |grep br" - this will bring no result.
> 
> Current version tested: iproute2-6.19.0
> 
> Thank you for your time and for maintaining this tool.
> 
> Regards,
> Dmitri Seletski
> 
> 

Yes, iproute2 doesn't do a great job of handling exit codes
for usage vs. help. It's a bug and no one has bothered to fix it.
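
One common way to separate the two cases is sketched below. It is an assumption about what a fix could look like, not iproute2's actual code: show_usage() and its parameters are hypothetical. Help that the user asked for goes to the first stream with a zero status; a usage message printed because of bad arguments goes to the second stream with a non-zero status.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: print the usage text and return the exit status
 * the caller should use. When help was explicitly requested, write to
 * 'out' and report success; otherwise write to 'err' and report failure.
 */
int show_usage(FILE *out, FILE *err, int help_requested)
{
	FILE *fp = help_requested ? out : err;

	fputs("Usage: tool [ OPTIONS ] OBJECT { COMMAND | help }\n", fp);
	return help_requested ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A caller would do exit(show_usage(stdout, stderr, 1)) for "tool help", so that both "tool help | grep ..." and "tool help && ..." behave as the reporter expects, and pass 0 when the usage text is a reaction to invalid arguments.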


* [syzbot] [net?] WARNING in __ethtool_get_link_ksettings
From: syzbot @ 2026-06-21 15:22 UTC (permalink / raw)
  To: andrew, davem, edumazet, horms, kuba, linux-kernel, netdev,
	pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    4fa3f5fabb30 Add linux-next specific files for 20260616
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1039ffec580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6c414e1864e61ef6
dashboard link: https://syzkaller.appspot.com/bug?extid=09da62a8b78959ceb8bb
compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/bf5b803a695d/disk-4fa3f5fa.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/47871e7c589e/vmlinux-4fa3f5fa.xz
kernel image: https://storage.googleapis.com/syzbot-assets/53cd9ef32a2b/bzImage-4fa3f5fa.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+09da62a8b78959ceb8bb@syzkaller.appspotmail.com

netlink: 'syz.3.4736': attribute type 10 has an invalid length.
------------[ cut here ]------------
rtmutex deadlock detected
WARNING: kernel/locking/rtmutex.c:1698 at rt_mutex_handle_deadlock+0x21/0xb0 kernel/locking/rtmutex.c:1698, CPU#0: syz.3.4736/19861
Modules linked in:
CPU: 0 UID: 0 PID: 19861 Comm: syz.3.4736 Tainted: G             L      syzkaller #0 PREEMPT_{RT,(full)} 
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
RIP: 0010:rt_mutex_handle_deadlock+0x21/0xb0 kernel/locking/rtmutex.c:1698
Code: 90 90 90 90 90 90 90 90 90 41 57 41 56 41 55 41 54 53 83 ff dd 0f 85 81 00 00 00 48 89 f7 e8 16 40 01 00 48 8d 3d 3f 2f 5d 04 <67> 48 0f b9 3a 4c 8d 3d 00 00 00 00 65 48 8b 1d 13 2d 42 07 4c 8d
RSP: 0018:ffffc90003d760f0 EFLAGS: 00010286
RAX: 0000000080000000 RBX: 00000000ffffffdd RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8baa3a60 RDI: ffffffff8f8fd220
RBP: ffffc90003d762a0 R08: ffffffff8f8c5ff7 R09: 1ffffffff1f18bfe
R10: dffffc0000000000 R11: fffffbfff1f18bff R12: 1ffff920007aec2c
R13: ffffffff8b329d22 R14: ffff88805dbb1008 R15: dffffc0000000000
FS:  00007fd5506ae6c0(0000) GS:ffff888125ed3000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000003cd08000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 __rt_mutex_slowlock kernel/locking/rtmutex.c:1760 [inline]
 __rt_mutex_slowlock_locked kernel/locking/rtmutex.c:1787 [inline]
 rt_mutex_slowlock+0x73c/0x780 kernel/locking/rtmutex.c:1827
 __rt_mutex_lock kernel/locking/rtmutex.c:1842 [inline]
 __mutex_lock_common kernel/locking/rtmutex_api.c:560 [inline]
 mutex_lock_nested+0x168/0x1d0 kernel/locking/rtmutex_api.c:578
 netdev_lock include/linux/netdevice.h:2846 [inline]
 netdev_lock_ops include/net/netdev_lock.h:42 [inline]
 __ethtool_get_link_ksettings+0x109/0x250 net/ethtool/ioctl.c:463
 bond_update_speed_duplex drivers/net/bonding/bond_main.c:801 [inline]
 bond_slave_netdev_event drivers/net/bonding/bond_main.c:3982 [inline]
 bond_netdev_event+0x643/0xf80 drivers/net/bonding/bond_main.c:4089
 notifier_call_chain+0x1a5/0x3d0 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2289 [inline]
 call_netdevice_notifiers net/core/dev.c:2303 [inline]
 __dev_notify_flags+0x1aa/0x310 net/core/dev.c:9792
 netif_change_flags+0xde/0x1b0 net/core/dev.c:9821
 dev_change_flags+0x128/0x260 net/core/dev_api.c:68
 vlan_device_event+0x1b4e/0x1f00 net/8021q/vlan.c:494
 notifier_call_chain+0x1a5/0x3d0 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2289 [inline]
 call_netdevice_notifiers net/core/dev.c:2303 [inline]
 netif_open+0x10f/0x190 net/core/dev.c:1730
 dev_open+0x101/0x220 net/core/dev_api.c:202
 bond_enslave+0xeb2/0x3b70 drivers/net/bonding/bond_main.c:2089
 do_set_master+0x563/0x720 net/core/rtnetlink.c:3012
 do_setlink+0xe7b/0x4670 net/core/rtnetlink.c:3214
 rtnl_changelink net/core/rtnetlink.c:3841 [inline]
 __rtnl_newlink net/core/rtnetlink.c:4014 [inline]
 rtnl_newlink+0x15c2/0x1bd0 net/core/rtnetlink.c:4151
 rtnetlink_rcv_msg+0x802/0xc00 net/core/rtnetlink.c:7068
 netlink_rcv_skb+0x226/0x4a0 net/netlink/af_netlink.c:2556
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x7f5/0x990 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1900
 sock_sendmsg_nosec+0x13a/0x180 net/socket.c:785
 __sock_sendmsg net/socket.c:800 [inline]
 ____sys_sendmsg+0x565/0x870 net/socket.c:2702
 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2756
 __sys_sendmsg net/socket.c:2788 [inline]
 __do_sys_sendmsg net/socket.c:2793 [inline]
 __se_sys_sendmsg net/socket.c:2791 [inline]
 __x64_sys_sendmsg+0x1b7/0x290 net/socket.c:2791
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd55245ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fd5506ae028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fd5526d5fa0 RCX: 00007fd55245ce59
RDX: 0000000000008084 RSI: 0000200000000600 RDI: 0000000000000003
RBP: 00007fd5524f2d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fd5526d6038 R14: 00007fd5526d5fa0 R15: 00007ffdbadd34a8
 </TASK>
----------------
Code disassembly (best guess):
   0:	90                   	nop
   1:	90                   	nop
   2:	90                   	nop
   3:	90                   	nop
   4:	90                   	nop
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	90                   	nop
   9:	41 57                	push   %r15
   b:	41 56                	push   %r14
   d:	41 55                	push   %r13
   f:	41 54                	push   %r12
  11:	53                   	push   %rbx
  12:	83 ff dd             	cmp    $0xffffffdd,%edi
  15:	0f 85 81 00 00 00    	jne    0x9c
  1b:	48 89 f7             	mov    %rsi,%rdi
  1e:	e8 16 40 01 00       	call   0x14039
  23:	48 8d 3d 3f 2f 5d 04 	lea    0x45d2f3f(%rip),%rdi        # 0x45d2f69
* 2a:	67 48 0f b9 3a       	ud1    (%edx),%rdi <-- trapping instruction
  2f:	4c 8d 3d 00 00 00 00 	lea    0x0(%rip),%r15        # 0x36
  36:	65 48 8b 1d 13 2d 42 	mov    %gs:0x7422d13(%rip),%rbx        # 0x7422d51
  3d:	07
  3e:	4c                   	rex.WR
  3f:	8d                   	.byte 0x8d


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [PATCH net v4] nfc: llcp: bound SNL TLV parsing to the skb and add length checks
From: David Heidelberg @ 2026-06-21 16:34 UTC (permalink / raw)
  To: Doruk Tan Ozturk, oe-linux-nfc
  Cc: david.laight.linux, horms, netdev, linux-kernel
In-Reply-To: <20260609202543.42282-1-doruk@0sec.ai>


On Tue, 09 Jun 2026 22:25:43 +0200, Doruk Tan Ozturk wrote:
 > nfc: llcp: bound SNL TLV parsing to the skb and add length checks

Applied, thanks!

[1/1] nfc: llcp: bound SNL TLV parsing to the skb and add length checks
       commit: ed85d4cbbfaa4e630c5aa0d607348b42620d976b

Best regards,
-- 
David Heidelberg <david@ixit.cz>


