Netdev List
* Re: [PATCH net v2 1/2] iov_iter: export iov_iter_restore
From: Stefano Garzarella @ 2026-06-16 12:35 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: netdev, Alexander Viro, Andrew Morton, Arseniy Krasnov,
	David S. Miller, Eric Dumazet, Eugenio Pérez, Jakub Kicinski,
	Jason Wang, kvm, linux-block, linux-fsdevel, linux-kernel,
	Michael S. Tsirkin, Paolo Abeni, Simon Horman, Stefan Hajnoczi,
	virtualization, Xuan Zhuo
In-Reply-To: <20260613000953.467473-2-tavip@google.com>

On Sat, Jun 13, 2026 at 12:09:52AM +0000, Octavian Purdila wrote:
>Export iov_iter_restore so that it can be used by modules.
>
>This is needed by the virtio vsock transport (which can be built as a
>module) to restore the msg_iter state when transmission fails.
>
>Signed-off-by: Octavian Purdila <tavip@google.com>
>---
> lib/iov_iter.c | 1 +
> 1 file changed, 1 insertion(+)

Acked-by: Stefano Garzarella <sgarzare@redhat.com>

>
>diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>index 243662af1af73..067e745f9ef53 100644
>--- a/lib/iov_iter.c
>+++ b/lib/iov_iter.c
>@@ -1491,6 +1491,7 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
> 		i->__iov -= state->nr_segs - i->nr_segs;
> 	i->nr_segs = state->nr_segs;
> }
>+EXPORT_SYMBOL(iov_iter_restore);
>
> /*
>  * Extract a list of contiguous pages from an ITER_FOLIOQ iterator.  This does
>-- 
>2.54.0.1136.gdb2ca164c4-goog
>


^ permalink raw reply

* Re: [PATCH v2] [net] net: airoha: Fix QoS counter configuration for Tx-fwd channels
From: Lorenzo Bianconi @ 2026-06-16 12:35 UTC (permalink / raw)
  To: Wayen Yan
  Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <178161132384.2164449.18407700117859190327@gmail.com>

> In airoha_qdma_init_qos_stats(), the Tx-fwd counter was incorrectly
> using register index (i << 1) instead of ((i << 1) + 1). This caused
> the Tx-fwd configuration to overwrite the Tx-cpu configuration for
> each QoS channel, resulting in incorrect QoS statistics.
> 
> Fix by using the correct register index ((i << 1) + 1) for Tx-fwd
> counter configuration.
> 
> Fixes: 20bf7d07c956 ("net: airoha: Add sched ETS offload support")
> Signed-off-by: Wayen Yan <win847@gmail.com>

Is this a patch you already sent? IIRC I have acked it.

Regards,
Lorenzo

> ---
>  drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 31cdb11cd7..329988a840 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -1256,7 +1256,7 @@ static void airoha_qdma_init_qos_stats(struct airoha_qdma *qdma)
>  			       FIELD_PREP(CNTR_CHAN_MASK, i));
>  		/* Tx-fwd transferred count */
>  		airoha_qdma_wr(qdma, REG_CNTR_VAL((i << 1) + 1), 0);
> -		airoha_qdma_wr(qdma, REG_CNTR_CFG(i << 1),
> +		airoha_qdma_wr(qdma, REG_CNTR_CFG((i << 1) + 1),
>  			       CNTR_EN_MASK | CNTR_ALL_QUEUE_EN_MASK |
>  			       CNTR_ALL_DSCP_RING_EN_MASK |
>  			       FIELD_PREP(CNTR_SRC_MASK, 1) |
> -- 
> 2.51.0
> 
> 

^ permalink raw reply

* Re: [PATCH v2] [net] net: airoha: fix foe_check_time allocation size
From: Lorenzo Bianconi @ 2026-06-16 12:34 UTC (permalink / raw)
  To: Wayen Yan
  Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <178161119471.2163752.14373384830691569758@gmail.com>

> foe_check_time is declared as u16 pointer but was allocated with
> only ppe_num_entries bytes instead of ppe_num_entries * sizeof(u16).
> 
> When airoha_ppe_foe_verify_entry() is called with hash >= ppe_num_entries/2,
> it writes beyond the allocated buffer, causing heap buffer overflow and
> potential kernel crash.
> 
> Fixes: 6d5b601d52a2 ("net: airoha: ppe: Dynamically allocate foe_check_time array in airoha_ppe struct")
> Signed-off-by: Wayen Yan <win847@gmail.com>

Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>

> ---
>  drivers/net/ethernet/airoha/airoha_ppe.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
> index 5c9dff6bcc..8fb8ecf909 100644
> --- a/drivers/net/ethernet/airoha/airoha_ppe.c
> +++ b/drivers/net/ethernet/airoha/airoha_ppe.c
> @@ -1578,7 +1578,8 @@ int airoha_ppe_init(struct airoha_eth *eth)
>  			return -ENOMEM;
>  	}
>  
> -	ppe->foe_check_time = devm_kzalloc(eth->dev, ppe_num_entries,
> +	ppe->foe_check_time = devm_kzalloc(eth->dev,
> +					   ppe_num_entries * sizeof(*ppe->foe_check_time),
>  					   GFP_KERNEL);
>  	if (!ppe->foe_check_time)
>  		return -ENOMEM;
> -- 
> 2.51.0
> 
> 

^ permalink raw reply

* [PATCH v4] flow_dissector: check device type before reading ETH_ADDRS
From: Yun Zhou @ 2026-06-16 12:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, horms
  Cc: netdev, linux-kernel, yun.zhou, qingfang.deng

__skb_flow_dissect() unconditionally reads 12 bytes from eth_hdr(skb)
when FLOW_DISSECTOR_KEY_ETH_ADDRS is requested. This assumes the skb
has a valid Ethernet header at mac_header, which is not always the case.

The problem can be triggered by:
 1. Creating a TUN device in L3 mode (IFF_TUN, hard_header_len=0)
 2. Attaching a multiq qdisc with a flower filter matching on eth_src
 3. Sending a packet through AF_PACKET

Since TUN in L3 mode has no link-layer header, mac_header points to
the L3 data area. The flow dissector reads 12 bytes of uninitialized
skb memory, which then propagates through fl_set_masked_key() and is
used as a rhashtable lookup key in __fl_lookup(), as reported by KMSAN.

Rejecting the filter in the control path (at tc filter add time) is
not feasible because TC filter blocks can be shared between arbitrary
devices -- a filter installed on an Ethernet device may later classify
packets on a headerless device through a shared block. The device
association is not fixed at filter creation time.

Fix this by gating the memcpy on dev->type == ARPHRD_ETHER, which
ensures only true Ethernet-framed packets have their addresses read.
This is more precise than the previous hard_header_len >= 12 check,
which would incorrectly pass for non-Ethernet link types like IPoIB
(ARPHRD_INFINIBAND, hard_header_len=24) and FDDI (hard_header_len=21)
whose L2 headers are not in Ethernet format. Additionally check
skb_mac_header_was_set() to guard against the pathological case where
mac_header is the unset sentinel (~0U), which would cause eth_hdr() to
return a wild pointer.

For the act_mirred redirect case (Ethernet packet redirected to a
non-Ethernet device sharing a TC block), zeroing the key is the correct
behavior: the packet is now being classified on the target device, where
Ethernet address matching is not semantically meaningful.

Note: on non-Ethernet devices, the zeroed key will match a filter
configured with all-zero MAC addresses. This is an improvement over the
previous behavior where uninitialized memory could randomly match any
filter.

Reported-by: syzbot+fa2f5b1fb06147be5e16@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fa2f5b1fb06147be5e16
Fixes: 67a900cc0436 ("flow_dissector: introduce support for Ethernet addresses")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
v4:
 - Use dev->type == ARPHRD_ETHER instead of hard_header_len >= 12 to
   avoid false positives on non-Ethernet link types (IPoIB, FDDI)
 - Add skb_mac_header_was_set() guard against unset mac_header sentinel
 - Document act_mirred and all-zero key edge cases in commit message

v3:
 - Replace skb_tail_pointer() - skb_mac_header() length check with
    skb->dev->hard_header_len check.

v2:
 - Adjust commit message and comment.

 net/core/flow_dissector.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 2a98f5fa74eb..8aa4f9b4df81 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1173,13 +1173,21 @@ bool __skb_flow_dissect(const struct net *net,
 
 	if (dissector_uses_key(flow_dissector,
 			       FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
-		struct ethhdr *eth = eth_hdr(skb);
 		struct flow_dissector_key_eth_addrs *key_eth_addrs;
 
 		key_eth_addrs = skb_flow_dissector_target(flow_dissector,
 							  FLOW_DISSECTOR_KEY_ETH_ADDRS,
 							  target_container);
-		memcpy(key_eth_addrs, eth, sizeof(*key_eth_addrs));
+		/* TC filter blocks can be shared across devices with
+		 * different link types, so we cannot validate this
+		 * when the filter is installed -- check at dissect time.
+		 */
+		if (skb && skb->dev &&
+		    skb->dev->type == ARPHRD_ETHER &&
+		    skb_mac_header_was_set(skb))
+			memcpy(key_eth_addrs, eth_hdr(skb), sizeof(*key_eth_addrs));
+		else
+			memset(key_eth_addrs, 0, sizeof(*key_eth_addrs));
 	}
 
 	if (dissector_uses_key(flow_dissector,
-- 
2.43.0
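
The gating described in the commit message can be modelled outside the kernel. The block below is a minimal userspace sketch, not the kernel code: struct fake_pkt, LINK_ETHER, LINK_NONE and dissect_eth_addrs() are hypothetical stand-ins for the skb, ARPHRD_ETHER, a headerless link type and the dissector branch. It only illustrates "copy 12 bytes for Ethernet-framed packets, otherwise zero the key".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN   6
#define LINK_ETHER 1       /* stand-in for ARPHRD_ETHER */
#define LINK_NONE  0xFFFE  /* stand-in for a headerless (L3-only) device */

/* Hypothetical, simplified view of the packet state the check relies on. */
struct fake_pkt {
	unsigned short link_type;   /* models skb->dev->type */
	int mac_header_set;         /* models skb_mac_header_was_set() */
	const uint8_t *mac_header;  /* models eth_hdr(skb) */
};

/* Destination and source MAC, 12 bytes in total, no padding. */
struct eth_addrs_key {
	uint8_t dst[ETH_ALEN];
	uint8_t src[ETH_ALEN];
};

/* Copy the addresses only when the packet is Ethernet-framed and the
 * MAC header offset is valid; otherwise produce an all-zero key. */
static void dissect_eth_addrs(const struct fake_pkt *pkt,
			      struct eth_addrs_key *key)
{
	if (pkt && pkt->link_type == LINK_ETHER && pkt->mac_header_set)
		memcpy(key, pkt->mac_header, sizeof(*key));
	else
		memset(key, 0, sizeof(*key));
}
```

With this model, a packet on a LINK_NONE device always yields a zeroed key, which is the deterministic behaviour the commit message argues for.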


^ permalink raw reply related

* Re: [PATCH net] tipc: fix use-after-free of discoverer in tipc_disc_rcv()
From: Weiming Shi @ 2026-06-16 12:28 UTC (permalink / raw)
  To: Tung Quang Nguyen, Weiming Shi
  Cc: Simon Horman, netdev@vger.kernel.org,
	tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, Xiang Mei, Jon Maloy,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
In-Reply-To: <GV1P189MB1988A1CFCAA9214B6F009315C6182@GV1P189MB1988.EURP189.PROD.OUTLOOK.COM>

On Fri Jun 12, 2026 at 4:53 PM CST, Tung Quang Nguyen wrote:
>>Subject: [PATCH net] tipc: fix use-after-free of discoverer in tipc_disc_rcv()
>>
>>bearer_disable() frees b->disc with tipc_disc_delete()'s plain kfree(), but
>>tipc_disc_rcv() still dereferences b->disc in RX softirq under
>>rcu_read_lock() (tipc_udp_recv -> tipc_rcv -> tipc_disc_rcv).
>>
>>L2 bearers are safe thanks to the synchronize_net() in tipc_disable_l2_media(),
>>but the UDP bearer defers that call to the
>>cleanup_bearer() workqueue, so the discoverer is freed with no grace
>>period:
>>
>> BUG: KASAN: slab-use-after-free in tipc_disc_rcv (net/tipc/discover.c:149)
>> Read of size 8 at addr ffff88802348b728 by task poc_tipc/184
>> <IRQ>
>>  tipc_disc_rcv (net/tipc/discover.c:149)
>>  tipc_rcv (net/tipc/node.c:2126)
>>  tipc_udp_recv (net/tipc/udp_media.c:391)
>>  udp_rcv (net/ipv4/udp.c:2643)
>>  ip_local_deliver_finish (net/ipv4/ip_input.c:241)
>> </IRQ>
>> Freed by task 181:
>>  kfree (mm/slub.c:6565)
>>  bearer_disable (net/tipc/bearer.c:418)
>>  tipc_nl_bearer_disable (net/tipc/bearer.c:1001)
>>
>>The bearer is freed with kfree_rcu(); free the discoverer the same way.
>>Add an rcu_head to struct tipc_discoverer and free it and its skb from an RCU
>>callback.
>>
>>Reachable from an unprivileged user namespace: the TIPCv2 genl family is
>>netnsok and its bearer commands have no GENL_ADMIN_PERM. Needs
>>CONFIG_TIPC and CONFIG_TIPC_MEDIA_UDP.
>>
>>Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
>>Reported-by: Xiang Mei <xmei5@asu.edu>
>>Assisted-by: Claude:claude-opus-4-8
>>Signed-off-by: Weiming Shi <bestswngs@gmail.com>
>>---
>> net/tipc/discover.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>>diff --git a/net/tipc/discover.c b/net/tipc/discover.c
>>index 3e54d2df5683a..34dbe5ad10e09 100644
>>--- a/net/tipc/discover.c
>>+++ b/net/tipc/discover.c
>>@@ -58,6 +58,7 @@
>>  * @skb: request message to be (repeatedly) sent
>>  * @timer: timer governing period between requests
>>  * @timer_intv: current interval between requests (in ms)
>>+ * @rcu: RCU head for deferred freeing
>>  */
>> struct tipc_discoverer {
>> 	u32 bearer_id;
>>@@ -69,6 +70,7 @@ struct tipc_discoverer {
>> 	struct sk_buff *skb;
>> 	struct timer_list timer;
>> 	unsigned long timer_intv;
>>+	struct rcu_head rcu;
>> };
>>
>> /**
>>@@ -382,6 +384,14 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
>> 	return 0;
>> }
>>
>>+static void tipc_disc_free_rcu(struct rcu_head *rp)
>>+{
>>+	struct tipc_discoverer *d = container_of(rp, struct tipc_discoverer, rcu);
>
> This line is long (over 80 columns). Please break it into 2 lines (refer to linux/Documentation/process/coding-style.rst).
>
>>+
>>+	kfree_skb(d->skb);
>>+	kfree(d);
>>+}
>>+
>> /**
>>  * tipc_disc_delete - destroy object sending periodic link setup requests
>>  * @d: ptr to link dest structure
>>@@ -389,8 +399,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
>> void tipc_disc_delete(struct tipc_discoverer *d)
>> {
>> 	timer_shutdown_sync(&d->timer);
>>-	kfree_skb(d->skb);
>>-	kfree(d);
>>+	call_rcu(&d->rcu, tipc_disc_free_rcu);
>> }
>>
>> /**
>>--
>>2.43.0
>>

Hi,

I’m sorry for taking so long to respond. v2 has already been sent.


^ permalink raw reply

* [PATCH net v2] tipc: fix use-after-free of the discoverer in tipc_disc_rcv()
From: Weiming Shi @ 2026-06-16 12:22 UTC (permalink / raw)
  To: netdev, tipc-discussion, linux-kernel
  Cc: jmaloy, ying.xue, tung.quang.nguyen, edumazet, kuba, pabeni,
	horms, davem, xmei5, Weiming Shi

bearer_disable() frees b->disc with tipc_disc_delete()'s plain kfree(),
but tipc_disc_rcv() still dereferences b->disc in RX softirq under
rcu_read_lock() (tipc_udp_recv -> tipc_rcv -> tipc_disc_rcv).

L2 bearers are safe thanks to the synchronize_net() in
tipc_disable_l2_media(), but the UDP bearer defers that call to the
cleanup_bearer() workqueue, so the discoverer is freed with no grace
period:

 BUG: KASAN: slab-use-after-free in tipc_disc_rcv (net/tipc/discover.c:149)
 Read of size 8 at addr ffff88802348b728 by task poc_tipc/184
 <IRQ>
  tipc_disc_rcv (net/tipc/discover.c:149)
  tipc_rcv (net/tipc/node.c:2126)
  tipc_udp_recv (net/tipc/udp_media.c:391)
  udp_rcv (net/ipv4/udp.c:2643)
  ip_local_deliver_finish (net/ipv4/ip_input.c:241)
 </IRQ>
 Freed by task 181:
  kfree (mm/slub.c:6565)
  bearer_disable (net/tipc/bearer.c:418)
  tipc_nl_bearer_disable (net/tipc/bearer.c:1001)

The bearer is freed with kfree_rcu(); free the discoverer the same way.
Add an rcu_head to struct tipc_discoverer and free it and its skb from an
RCU callback.

Because the RCU callback (tipc_disc_free_rcu) lives in module text, a
call_rcu() that is still pending when the tipc module is unloaded would
invoke a freed function. Add an rcu_barrier() to tipc_exit() after the
bearer subsystem has been torn down, so all pending discoverer callbacks
have run before the module text goes away.

Reachable from an unprivileged user namespace: the TIPCv2 genl family is
netnsok and its bearer commands have no GENL_ADMIN_PERM. Needs CONFIG_TIPC
and CONFIG_TIPC_MEDIA_UDP.

Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
Reported-by: Xiang Mei <xmei5@asu.edu>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
v2:
 - split the over-80-column container_of() line (Tung Quang Nguyen)
 - add rcu_barrier() to tipc_exit() so a pending call_rcu() cannot fire
   into freed module text after rmmod (Eric Dumazet)

 net/tipc/core.c     |  3 +++
 net/tipc/discover.c | 14 ++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/tipc/core.c b/net/tipc/core.c
index 434e70eabe08..747328e58d30 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -218,6 +218,9 @@ static void __exit tipc_exit(void)
 	unregister_pernet_device(&tipc_net_ops);
 	tipc_unregister_sysctl();
 
+	/* Wait for tipc_disc_free_rcu() callbacks queued from module text. */
+	rcu_barrier();
+
 	pr_info("Deactivated\n");
 }
 
diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index 3e54d2df5683..696b7a8ed54d 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -58,6 +58,7 @@
  * @skb: request message to be (repeatedly) sent
  * @timer: timer governing period between requests
  * @timer_intv: current interval between requests (in ms)
+ * @rcu: RCU head for deferred freeing
  */
 struct tipc_discoverer {
 	u32 bearer_id;
@@ -69,6 +70,7 @@ struct tipc_discoverer {
 	struct sk_buff *skb;
 	struct timer_list timer;
 	unsigned long timer_intv;
+	struct rcu_head rcu;
 };
 
 /**
@@ -382,6 +384,15 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
 	return 0;
 }
 
+static void tipc_disc_free_rcu(struct rcu_head *rp)
+{
+	struct tipc_discoverer *d =
+		container_of(rp, struct tipc_discoverer, rcu);
+
+	kfree_skb(d->skb);
+	kfree(d);
+}
+
 /**
  * tipc_disc_delete - destroy object sending periodic link setup requests
  * @d: ptr to link dest structure
@@ -389,8 +400,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
 void tipc_disc_delete(struct tipc_discoverer *d)
 {
 	timer_shutdown_sync(&d->timer);
-	kfree_skb(d->skb);
-	kfree(d);
+	call_rcu(&d->rcu, tipc_disc_free_rcu);
 }
 
 /**
-- 
2.43.0
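
The deferred-free pattern in this patch (embed a callback head in the object, queue it, recover the object with container_of() in the callback) can be illustrated in userspace. The block below is a minimal sketch, not kernel RCU: struct cb_head, defer_free() and run_pending() are hypothetical stand-ins. run_pending() plays the role of both "the grace period has elapsed" and the rcu_barrier() that drains callbacks before module text goes away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head: a queued callback. */
struct cb_head {
	void (*func)(struct cb_head *head);
	struct cb_head *next;
};

/* Recover the enclosing object from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified model of struct tipc_discoverer. */
struct discoverer {
	int bearer_id;
	char *buf;           /* models the owned skb */
	struct cb_head rcu;  /* embedded head for deferred freeing */
};

static struct cb_head *pending;  /* callbacks waiting to run */
static int freed_count;

/* Models call_rcu(): queue the callback instead of freeing now. */
static void defer_free(struct cb_head *head, void (*func)(struct cb_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Models the end of a grace period (and rcu_barrier()): run everything. */
static void run_pending(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;  /* read before the callback frees it */
		head->func(head);
	}
}

static void discoverer_free_cb(struct cb_head *head)
{
	struct discoverer *d = container_of(head, struct discoverer, rcu);

	free(d->buf);
	free(d);
	freed_count++;
}

static struct discoverer *discoverer_create(int bearer_id)
{
	struct discoverer *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->buf = malloc(64);
	if (!d->buf) {
		free(d);
		return NULL;
	}
	d->bearer_id = bearer_id;
	return d;
}

/* Models tipc_disc_delete() after the patch: no immediate free. */
static void discoverer_delete(struct discoverer *d)
{
	defer_free(&d->rcu, discoverer_free_cb);
}
```

In this model the object stays readable between discoverer_delete() and run_pending(), which is the window a concurrent reader under rcu_read_lock() relies on in the real code.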


^ permalink raw reply related

* RE: [PATCH v2 net-next 1/1] tcp: Replace min_tso_segs() with tso_segs() CC callback for TCP Prague
From: Chia-Yu Chang (Nokia) @ 2026-06-16 12:23 UTC (permalink / raw)
  To: Jakub Kicinski, edumazet@google.com, ncardwell@google.com
  Cc: jolsa@kernel.org, yonghong.song@linux.dev, song@kernel.org,
	linux-kselftest@vger.kernel.org, memxor@gmail.com,
	shuah@kernel.org, martin.lau@linux.dev, ast@kernel.org,
	daniel@iogearbox.net, andrii@kernel.org, eddyz87@gmail.com,
	horms@kernel.org, dsahern@kernel.org, bpf@vger.kernel.org,
	netdev@vger.kernel.org, pabeni@redhat.com, jhs@mojatatu.com,
	stephen@networkplumber.org, davem@davemloft.net,
	andrew+netdev@lunn.ch, donald.hunter@gmail.com, kuniyu@google.com,
	ij@kernel.org, Koen De Schepper (Nokia), g.white@cablelabs.com,
	ingemar.s.johansson@ericsson.com, mirja.kuehlewind@ericsson.com,
	cheshire@apple.com, rs.ietf@gmx.at, Jason_Livingood@comcast.com,
	vidhi_goel@apple.com
In-Reply-To: <20260615191704.31be22da@kernel.org>


> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org> 
> Sent: Tuesday, June 16, 2026 4:17 AM
> To: edumazet@google.com; ncardwell@google.com
> Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; jolsa@kernel.org; yonghong.song@linux.dev; song@kernel.org; linux-kselftest@vger.kernel.org; memxor@gmail.com; shuah@kernel.org; martin.lau@linux.dev; ast@kernel.org; daniel@iogearbox.net; andrii@kernel.org; eddyz87@gmail.com; horms@kernel.org; dsahern@kernel.org; bpf@vger.kernel.org; netdev@vger.kernel.org; pabeni@redhat.com; jhs@mojatatu.com; stephen@networkplumber.org; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; kuniyu@google.com; ij@kernel.org; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
> Subject: Re: [PATCH v2 net-next 1/1] tcp: Replace min_tso_segs() with tso_segs() CC callback for TCP Prague
> 
> On Mon, 15 Jun 2026 18:51:02 -0700 Jakub Kicinski wrote:
> > On Sun, 14 Jun 2026 09:17:56 +0200 chia-yu.chang@nokia-bell-labs.com
> > wrote:
> > > This patch replaces existing min_tso_segs() with tso_segs() CC
> > > callback for CC algorithm to provide explicit tso segment number of
> > > each data burst and overrides tcp_tso_autosize().
> > >
> > > No functional change.
> >
> > Eric, Neal, looks good?
> >
> > The min rtt thing in tcp_tso_autosize() helps a bit but if the sender 
> > gets congested for a longer stretch min_rtts on new connections are 
> > high and we're back to sending small TSO, keeping the sender overloaded.
> > Which is to say - I _hope_ this also solves some of Meta's problems :)
> 
> Ugh, I didn't see the Sashiko report, it's only CCed to the author and bpf@, not to netdev :/
> 
> The zero-check sounds legit. Let's revisit this after the merge window.

Thanks for the comment; I will follow up after the merge window.

Please correct me if I am wrong: the next eligible submission window is expected to open on 30 June, right?
Thanks!

Chia-Yu

^ permalink raw reply

* RE: Ethtool : PRBS feature
From: Das, Shubham @ 2026-06-16 12:14 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev@vger.kernel.org, mkubecek@suse.cz, D H, Siddaraju,
	Chintalapalle, Balaji
In-Reply-To: <f4a3e4ad-91e2-4844-a29b-81f70151228e@lunn.ch>

Hi Andrew,

Thanks for the feedback.

Yes, for multi-lane ports we can accept the lane number as an argument like:

ethtool --phy-test eth1 lane 0 tx-prbs prbs7
ethtool --phy-test eth2 lane 0 rx-prbs prbs7

We referred to Lee Trager's session "Open-Source Tooling for PHY Management and Testing":
https://netdevconf.info/0x19/sessions/talk/open-source-tooling-for-phy-management-and-testing.html?.
We have been trying to reach Lee Trager to get more input and the latest status of the approach, and to understand whether a parallel effort is already under way so that we can collaborate.
If you can, please help me connect with Lee Trager and others who have expressed interest in Ethernet PRBS. We are happy to align and start implementation.

On standardizing across other buses such as PCIe and USB: I had a quick discussion with our internal designers, but I did not see any interest in software-level configuration knobs of this kind there.
Ethernet appears to have clear interest, and we would like to join that Ethernet PRBS effort.

Ethernet PRBS configuration and diagnostics support is well established and already widely used in existing Ethernet SERDES deployments.
We think Ethernet is the most natural starting point within netdev, as it aligns with current driver practice and existing validation workflows. 

Thanks,
Shubham D
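
For readers unfamiliar with the "prbs7" pattern named in the commands above, the block below is a minimal sketch of a PRBS7 generator, assuming the usual polynomial x^7 + x^6 + 1. The function names are hypothetical and the code is not taken from ethtool or any driver; it only shows what a pattern generator or checker in a SERDES would be expected to produce.

```c
#include <assert.h>
#include <stdint.h>

/* Advance a 7-bit LFSR for x^7 + x^6 + 1 by one step and return the
 * newly generated bit. The state must be non-zero. */
static unsigned int prbs7_next(uint8_t *state)
{
	unsigned int newbit = ((*state >> 6) ^ (*state >> 5)) & 1u;

	*state = (uint8_t)(((*state << 1) | newbit) & 0x7F);
	return newbit;
}

/* Count the steps until the LFSR returns to its starting state. */
static unsigned int prbs7_period(uint8_t seed)
{
	uint8_t start = seed & 0x7F;
	uint8_t state = start;
	unsigned int steps = 0;

	assert(start != 0);  /* the all-zero state never changes */
	do {
		prbs7_next(&state);
		steps++;
	} while (state != start);

	return steps;
}
```

A receiver-side checker would run the same generator, compare it bit by bit against the incoming stream, and count mismatches as bit errors.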


> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 11 June 2026 21:14
> To: Das, Shubham <shubham.das@intel.com>
> Cc: netdev@vger.kernel.org; mkubecek@suse.cz; D H, Siddaraju
> <siddaraju.dh@intel.com>; Chintalapalle, Balaji <balaji.chintalapalle@intel.com>
> Subject: Re: Ethtool : PRBS feature
> 
> > 2. Whether similar work has been proposed previously.
> 
> There was a presentation at netdev conf last year about this topic,
> and how you use it to configure SERDES eyes. And then a long
> discussion on the netdev mailing afterwards. You should read the
> discussion, and incorporate the ideas. There was a couple of points
> raised:
> 
> SERDES are also used for PCIe, USB, SATA, and they have similar
> capabilities to a SERDES used for networking. Do we want a networking
> specific solution, or something more generic?
> 
> You need to include lane information, since there can be 1, 2 or 4
> lanes involved, and you need to specify which lane you want to test.
> 
>       Andrew

^ permalink raw reply

* [PATCH v2] [net] net: airoha: Clean up RX queues in airoha_dev_stop
From: Wayen Yan @ 2026-06-16 10:50 UTC (permalink / raw)
  To: netdev
  Cc: lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek

When the last port is stopped, airoha_dev_stop() clears TX queues
but neglects to clean up RX queues. This can lead to:
- RX ring buffer descriptors remaining valid after device close
- Potential DMA synchronization issues on device reopen
- Risk of use-after-free if pages are freed while DMA is still active

Add cleanup loop for RX queues to mirror the TX queue cleanup,
ensuring symmetric resource management.

Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Wayen Yan <win847@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 31cdb11cd7..9ca5bbf64d 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1771,6 +1771,13 @@ static int airoha_dev_stop(struct net_device *dev)
 
 			airoha_qdma_cleanup_tx_queue(&qdma->q_tx[i]);
 		}
+
+		for (i = 0; i < ARRAY_SIZE(qdma->q_rx); i++) {
+			if (!qdma->q_rx[i].ndesc)
+				continue;
+
+			airoha_qdma_cleanup_rx_queue(&qdma->q_rx[i]);
+		}
 	}
 
 	return 0;
-- 
2.51.0



^ permalink raw reply related

* [PATCH v2] [net] net: airoha: Stop TX queues on error path in airoha_dev_open
From: Wayen Yan @ 2026-06-16 10:50 UTC (permalink / raw)
  To: netdev
  Cc: lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek

In airoha_dev_open(), if airoha_set_vip_for_gdm_port() fails after
netif_tx_start_all_queues() has been called, the TX queues remain
started while the device configuration is incomplete. This leaves
the device in an inconsistent state where packets could be
transmitted before the VIP/IFC port configuration is complete.

Add netif_tx_stop_all_queues() call on the error path to properly
roll back the TX queue state.

Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Wayen Yan <win847@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 31cdb11cd7..cf9c366907 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1715,8 +1715,10 @@ static int airoha_dev_open(struct net_device *dev)
 
 	netif_tx_start_all_queues(dev);
 	err = airoha_set_vip_for_gdm_port(port, true);
-	if (err)
+	if (err) {
+		netif_tx_stop_all_queues(dev);
 		return err;
+	}
 
 	if (netdev_uses_dsa(dev))
 		airoha_fe_set(qdma->eth, REG_GDM_INGRESS_CFG(port->id),
-- 
2.51.0
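
The rollback added by this patch follows a common open/error-path shape. The block below is a minimal userspace sketch of that shape; struct fake_netdev and the fake_* helpers are hypothetical stand-ins, and cfg_err models the return value of the port configuration step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced device state: only the TX queue status. */
struct fake_netdev {
	bool tx_running;
};

static void fake_tx_start_all(struct fake_netdev *dev)
{
	dev->tx_running = true;
}

static void fake_tx_stop_all(struct fake_netdev *dev)
{
	dev->tx_running = false;
}

/* Start the queues, then configure; undo the start if configuration
 * fails so the device is not left half-open. */
static int fake_dev_open(struct fake_netdev *dev, int cfg_err)
{
	fake_tx_start_all(dev);
	if (cfg_err) {
		fake_tx_stop_all(dev);
		return cfg_err;
	}
	return 0;
}
```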



^ permalink raw reply related

* [PATCH v2] [net] net: airoha: Fix QoS counter configuration for Tx-fwd channels
From: Wayen Yan @ 2026-06-16 10:50 UTC (permalink / raw)
  To: netdev
  Cc: lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek

In airoha_qdma_init_qos_stats(), the Tx-fwd counter was incorrectly
using register index (i << 1) instead of ((i << 1) + 1). This caused
the Tx-fwd configuration to overwrite the Tx-cpu configuration for
each QoS channel, resulting in incorrect QoS statistics.

Fix by using the correct register index ((i << 1) + 1) for Tx-fwd
counter configuration.

Fixes: 20bf7d07c956 ("net: airoha: Add sched ETS offload support")
Signed-off-by: Wayen Yan <win847@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 31cdb11cd7..329988a840 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1256,7 +1256,7 @@ static void airoha_qdma_init_qos_stats(struct airoha_qdma *qdma)
 			       FIELD_PREP(CNTR_CHAN_MASK, i));
 		/* Tx-fwd transferred count */
 		airoha_qdma_wr(qdma, REG_CNTR_VAL((i << 1) + 1), 0);
-		airoha_qdma_wr(qdma, REG_CNTR_CFG(i << 1),
+		airoha_qdma_wr(qdma, REG_CNTR_CFG((i << 1) + 1),
 			       CNTR_EN_MASK | CNTR_ALL_QUEUE_EN_MASK |
 			       CNTR_ALL_DSCP_RING_EN_MASK |
 			       FIELD_PREP(CNTR_SRC_MASK, 1) |
-- 
2.51.0
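
The indexing problem can be shown with a small model. The block below is a userspace sketch, assuming two counter slots per channel (even slot for Tx-cpu, odd slot for Tx-fwd) as the commit message describes. NUM_CHANNELS, the SRC_* labels and the cntr_cfg[] array are hypothetical and do not reflect the real register layout.

```c
#include <assert.h>

#define NUM_CHANNELS 4                 /* hypothetical channel count */
#define NUM_COUNTERS (NUM_CHANNELS * 2)

enum { SRC_UNSET = -1, SRC_TX_CPU = 0, SRC_TX_FWD = 1 };

/* Models the per-counter configuration registers. */
static int cntr_cfg[NUM_COUNTERS];

/* fixed == 0 reproduces the old indexing, fixed != 0 the patched one. */
static void init_qos_stats(int fixed)
{
	int i;

	for (i = 0; i < NUM_COUNTERS; i++)
		cntr_cfg[i] = SRC_UNSET;

	for (i = 0; i < NUM_CHANNELS; i++) {
		/* Tx-cpu counter lives in the even slot. */
		cntr_cfg[i << 1] = SRC_TX_CPU;

		/* Tx-fwd counter belongs in the odd slot. */
		if (fixed)
			cntr_cfg[(i << 1) + 1] = SRC_TX_FWD;
		else
			cntr_cfg[i << 1] = SRC_TX_FWD;  /* clobbers Tx-cpu */
	}
}
```

With the old indexing every even slot ends up configured for Tx-fwd and every odd slot is never configured, which matches the "overwrite" described above.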



^ permalink raw reply related

* [PATCH v2] [net] net: airoha: fix foe_check_time allocation size
From: Wayen Yan @ 2026-06-16 11:52 UTC (permalink / raw)
  To: netdev
  Cc: lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek

foe_check_time is declared as u16 pointer but was allocated with
only ppe_num_entries bytes instead of ppe_num_entries * sizeof(u16).

When airoha_ppe_foe_verify_entry() is called with hash >= ppe_num_entries/2,
it writes beyond the allocated buffer, causing heap buffer overflow and
potential kernel crash.

Fixes: 6d5b601d52a2 ("net: airoha: ppe: Dynamically allocate foe_check_time array in airoha_ppe struct")
Signed-off-by: Wayen Yan <win847@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_ppe.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
index 5c9dff6bcc..8fb8ecf909 100644
--- a/drivers/net/ethernet/airoha/airoha_ppe.c
+++ b/drivers/net/ethernet/airoha/airoha_ppe.c
@@ -1578,7 +1578,8 @@ int airoha_ppe_init(struct airoha_eth *eth)
 			return -ENOMEM;
 	}
 
-	ppe->foe_check_time = devm_kzalloc(eth->dev, ppe_num_entries,
+	ppe->foe_check_time = devm_kzalloc(eth->dev,
+					   ppe_num_entries * sizeof(*ppe->foe_check_time),
 					   GFP_KERNEL);
 	if (!ppe->foe_check_time)
 		return -ENOMEM;
-- 
2.51.0
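
The size arithmetic behind this fix can be checked in isolation. The block below is a minimal userspace sketch; the helper names are hypothetical. It compares the byte count requested by the old call with the patched one and tests whether a u16 write at a given index fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes requested by the old call: one byte per entry. */
static size_t foe_alloc_bytes_old(size_t num_entries)
{
	return num_entries;
}

/* Bytes requested by the patched call: one u16 per entry. */
static size_t foe_alloc_bytes_fixed(size_t num_entries)
{
	return num_entries * sizeof(uint16_t);
}

/* True if writing the u16 at index 'hash' stays inside 'alloc_bytes'. */
static int foe_index_in_bounds(size_t hash, size_t alloc_bytes)
{
	return (hash + 1) * sizeof(uint16_t) <= alloc_bytes;
}
```

For an illustrative table of 1024 entries, the old allocation covers indices 0..511 only, so index 512 (ppe_num_entries/2) is the first out-of-bounds write; the patched allocation covers all 1024 indices.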



^ permalink raw reply related

* [PATCH net-next v3] virtio-net: xsk: support tx wake up
From: Menglong Dong @ 2026-06-16 11:59 UTC (permalink / raw)
  To: xuanzhuo, eperezma
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	netdev, virtualization, linux-kernel

XDP_RING_NEED_WAKEUP is currently not handled properly by virtio-net in
the tx path: we call xsk_set_tx_need_wakeup() in virtnet_xsk_xmit(), but
never call xsk_clear_tx_need_wakeup(), which means the user has to call
send() for every packet.

Call xsk_set_tx_need_wakeup() after virtnet_xsk_xmit_batch() if sq->vq
is empty, as we can't be woken up by skb_xmit_done() in this case.
Otherwise, clear the wakeup flag.

The tx path race is taken into account: the flag is also set before
virtnet_xsk_xmit_batch() when sq->vq is empty.

Fixes: 89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
v3:
- remove the confusing comment

v2:
- add the Fixes tag
---
 drivers/net/virtio_net.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f4adcfee7a80..6e099edef6e9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1440,8 +1440,9 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_sq_free_stats stats = {};
 	struct net_device *dev = vi->dev;
+	int sent, vring_size;
+	bool need_wakeup;
 	u64 kicks = 0;
-	int sent;
 
 	/* Avoid to wakeup napi meanless, so call __free_old_xmit instead of
 	 * free_old_xmit().
@@ -1451,8 +1452,25 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 	if (stats.xsk)
 		xsk_tx_completed(sq->xsk_pool, stats.xsk);
 
+	vring_size = virtqueue_get_vring_size(sq->vq);
+	need_wakeup = xsk_uses_need_wakeup(pool);
+
+	if (need_wakeup && vring_size == sq->vq->num_free)
+		xsk_set_tx_need_wakeup(pool);
+
 	sent = virtnet_xsk_xmit_batch(sq, pool, budget, &kicks);
 
+	if (need_wakeup) {
+		if (vring_size == sq->vq->num_free)
+			/* we can't wake up by ourself, and it should be done
+			 * by the user.
+			 */
+			xsk_set_tx_need_wakeup(pool);
+		else
+			/* we can wake up from skb_xmit_done() */
+			xsk_clear_tx_need_wakeup(pool);
+	}
+
 	if (!is_xdp_raw_buffer_queue(vi, sq - vi->sq))
 		check_sq_full_and_disable(vi, vi->dev, sq);
 
@@ -1470,9 +1488,6 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 	u64_stats_add(&sq->stats.xdp_tx,  sent);
 	u64_stats_update_end(&sq->stats.syncp);
 
-	if (xsk_uses_need_wakeup(pool))
-		xsk_set_tx_need_wakeup(pool);
-
 	return sent;
 }
 
-- 
2.54.0
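
The flag handling described in the commit message can be modelled in
user space roughly as below. This is a toy sketch, not the driver code:
struct demo_txq and the demo_*() helpers are hypothetical, and "ring
idle" simply means that every descriptor is free, mirroring the
vring_size == num_free test in the patch.

```c
#include <stdbool.h>

/* Toy model of a tx virtqueue; all names are hypothetical. */
struct demo_txq {
	unsigned int ring_size;	/* total descriptors in the ring */
	unsigned int num_free;	/* descriptors not in flight */
	bool need_wakeup;	/* flag that user space checks */
};

static bool demo_ring_idle(const struct demo_txq *q)
{
	return q->num_free == q->ring_size;
}

/* Queue up to 'budget' packets; returns how many were queued. */
static unsigned int demo_xmit_batch(struct demo_txq *q, unsigned int budget)
{
	unsigned int n = budget < q->num_free ? budget : q->num_free;

	q->num_free -= n;
	return n;
}

static unsigned int demo_xsk_xmit(struct demo_txq *q, unsigned int budget)
{
	unsigned int sent;

	/* Nothing in flight yet: raise the flag before queuing. */
	if (demo_ring_idle(q))
		q->need_wakeup = true;

	sent = demo_xmit_batch(q, budget);

	/* If the ring is still idle, no completion will restart tx, so the
	 * user has to wake us up. Otherwise a completion will do it and
	 * the flag can be cleared.
	 */
	q->need_wakeup = demo_ring_idle(q);
	return sent;
}
```

With packets in flight the flag ends up cleared, so in this model user
space only needs to issue a wakeup when the ring has fully drained.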


^ permalink raw reply related

* Re: [PATCH net] tipc: free bearer discoverer via RCU to fix tipc_disc_rcv UAF
From: Tung Quang Nguyen @ 2026-06-16 11:34 UTC (permalink / raw)
  To: Sam P
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev@vger.kernel.org,
	tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org, Jon Maloy,
	bestswngs@gmail.com
In-Reply-To: <fa2e0cfb-9d60-4295-8a46-f69ce1229094@bynar.io>

> Oops, I missed that patch! I'm not sure what the etiquette
> is in this case, but I'm happy to defer to the original
> submitter (CCd) if they're working on a new patch and/or
> add any appropriate trailers to my v2.

> I've prepared a v2 to submit after the ~24h period,
> addressing your changes and taking into account Eric's
> feedback from the earlier submission as well
> (adding an rcu_barrier() in tipc_exit()).

Eric's concern is correct, but it needs to be addressed in a separate
patch because it is a pre-existing issue. It requires a separate
reproduction (loading/unloading the TIPC kernel module) and other
considerations (calling call_rcu() from a timer, etc.).
For now, I think you just need to address my comment.


^ permalink raw reply

* [PATCH v3] net: mvneta_bm: add suspend/resume support to prevent crash after resume
From: Yun Zhou @ 2026-06-16 11:25 UTC (permalink / raw)
  To: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: netdev, linux-kernel, yun.zhou

The mvneta driver uses the hardware Buffer Manager (BM) for RX buffer
allocation. During suspend, mvneta disables its clock, causing BM to
lose all buffer address state. On resume, mvneta_bm_port_init()
re-attaches the BM pool to the NIC, but BM hardware returns stale/garbage
buffer addresses. When NAPI poll processes these buffers, DMA cache
sync hits an invalid virtual address causing a kernel panic:

 Unable to handle kernel paging request at virtual address b0000080
 PC is at v7_dma_inv_range
 Call trace:
  v7_dma_inv_range from arch_sync_dma_for_cpu+0x94/0x158
  arch_sync_dma_for_cpu from __dma_sync_single_for_cpu+0xc4/0x15c
  __dma_sync_single_for_cpu from mvneta_rx_swbm+0x6c8/0xf48
  mvneta_rx_swbm from mvneta_poll+0x6fc/0x70c
  mvneta_poll from __napi_poll.constprop.0+0x2c/0x1e0
  __napi_poll.constprop.0 from net_rx_action+0x160/0x2c4
  net_rx_action from handle_softirqs+0xd8/0x2b8
  handle_softirqs from run_ksoftirqd+0x30/0x94
  run_ksoftirqd from smpboot_thread_fn+0x100/0x204
  smpboot_thread_fn from kthread+0xf4/0x110
  kthread from ret_from_fork+0x14/0x28

Fix by adding suspend/resume callbacks to the BM driver:

- suspend: drain all buffers (with DMA unmapping), free the BPPE
  regions, and reset pool state to FREE before stopping BM and gating
  the clock.

- resume: enable the clock, reinitialize BM defaults, and restore pool
  read/write pointers and size registers. Pool allocation and buffer
  refill are handled by mvneta_resume() through the normal
  mvneta_bm_port_init() path, which sees pools as FREE and performs
  full initialization identical to probe.

Add a device_link (DL_FLAG_AUTOREMOVE_CONSUMER) in mvneta_probe to
guarantee BM resumes before mvneta and suspends after mvneta.

Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
v3:
  - Restore per-pool POOL_SIZE_REG, POOL_READ_PTR_REG, and
    POOL_WRITE_PTR_REG in resume, since clock gating loses all BM
    register state.
  - Check device_link_add() return value and emit dev_warn on failure.
  - Replace SIMPLE_DEV_PM_OPS (deprecated) with
    DEFINE_SIMPLE_DEV_PM_OPS and pm_sleep_ptr(), removing the
    #ifdef CONFIG_PM_SLEEP guard.
  - Add dev_warn in suspend if not all buffers could be freed.

v2:
  - Drain buffers via mvneta_bm_bufs_free() in suspend instead of only
    stopping BM and gating the clock. This ensures proper DMA unmapping
    and avoids buffer leaks.
  - Free the BPPE DMA-coherent region in suspend so that resume takes
    the full probe-time initialization path (alloc + fill), eliminating
    the need to modify mvneta_bm_pool_create().
  - Reset pool type to MVNETA_BM_FREE in suspend so mvneta_bm_pool_use()
    correctly re-creates and refills pools on resume.
  - Check clk_prepare_enable() return value in resume.
  - Add device_link between mvneta (consumer) and mvneta_bm (supplier)
    to guarantee correct suspend/resume ordering.

 drivers/net/ethernet/marvell/mvneta.c    |  7 +++
 drivers/net/ethernet/marvell/mvneta_bm.c | 58 ++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 0c061fb0ed07..b4a845f04c05 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -5678,6 +5678,13 @@ static int mvneta_probe(struct platform_device *pdev)
 					 "use SW buffer management\n");
 				mvneta_bm_put(pp->bm_priv);
 				pp->bm_priv = NULL;
+			} else {
+				/* Ensure BM suspends after us, resumes before us */
+				if (!device_link_add(&pdev->dev,
+						     &pp->bm_priv->pdev->dev,
+						     DL_FLAG_AUTOREMOVE_CONSUMER))
+					dev_warn(&pdev->dev,
+						 "failed to create device link to BM\n");
 			}
 		}
 		/* Set RX packet offset correction for platforms, whose
diff --git a/drivers/net/ethernet/marvell/mvneta_bm.c b/drivers/net/ethernet/marvell/mvneta_bm.c
index 6bb380494919..85162a43eaf6 100644
--- a/drivers/net/ethernet/marvell/mvneta_bm.c
+++ b/drivers/net/ethernet/marvell/mvneta_bm.c
@@ -477,6 +477,63 @@ static void mvneta_bm_remove(struct platform_device *pdev)
 	clk_disable_unprepare(priv->clk);
 }
 
+static int mvneta_bm_suspend(struct device *dev)
+{
+	struct mvneta_bm *priv = dev_get_drvdata(dev);
+	int i;
+
+	/* Drain buffers and free pool resources while BM is still clocked */
+	for (i = 0; i < MVNETA_BM_POOLS_NUM; i++) {
+		struct mvneta_bm_pool *bm_pool = &priv->bm_pools[i];
+		int size_bytes;
+
+		if (bm_pool->type == MVNETA_BM_FREE)
+			continue;
+
+		mvneta_bm_bufs_free(priv, bm_pool, bm_pool->port_map);
+		if (bm_pool->hwbm_pool.buf_num)
+			dev_warn(&priv->pdev->dev,
+				 "pool %d: %d buffers not freed\n",
+				 bm_pool->id, bm_pool->hwbm_pool.buf_num);
+
+		size_bytes = sizeof(u32) * bm_pool->hwbm_pool.size;
+		dma_free_coherent(&priv->pdev->dev, size_bytes,
+				  bm_pool->virt_addr, bm_pool->phys_addr);
+		bm_pool->virt_addr = NULL;
+		bm_pool->type = MVNETA_BM_FREE;
+	}
+
+	mvneta_bm_write(priv, MVNETA_BM_COMMAND_REG, MVNETA_BM_STOP_MASK);
+	clk_disable_unprepare(priv->clk);
+	return 0;
+}
+
+static int mvneta_bm_resume(struct device *dev)
+{
+	struct mvneta_bm *priv = dev_get_drvdata(dev);
+	int i, err;
+
+	err = clk_prepare_enable(priv->clk);
+	if (err)
+		return err;
+
+	/* Reinitialize BM hardware; pools are refilled by mvneta_resume() */
+	mvneta_bm_default_set(priv);
+
+	/* Restore pool registers lost during clock gating */
+	for (i = 0; i < MVNETA_BM_POOLS_NUM; i++) {
+		mvneta_bm_write(priv, MVNETA_BM_POOL_READ_PTR_REG(i), 0);
+		mvneta_bm_write(priv, MVNETA_BM_POOL_WRITE_PTR_REG(i), 0);
+		mvneta_bm_write(priv, MVNETA_BM_POOL_SIZE_REG(i),
+				priv->bm_pools[i].hwbm_pool.size);
+	}
+
+	mvneta_bm_write(priv, MVNETA_BM_COMMAND_REG, MVNETA_BM_START_MASK);
+	return 0;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(mvneta_bm_pm_ops, mvneta_bm_suspend, mvneta_bm_resume);
+
 static const struct of_device_id mvneta_bm_match[] = {
 	{ .compatible = "marvell,armada-380-neta-bm" },
 	{ }
@@ -489,6 +546,7 @@ static struct platform_driver mvneta_bm_driver = {
 	.driver = {
 		.name = MVNETA_BM_DRIVER_NAME,
 		.of_match_table = mvneta_bm_match,
+		.pm = pm_sleep_ptr(&mvneta_bm_pm_ops),
 	},
 };
 
-- 
2.43.0
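
The suspend/resume contract described in the commit message (drain and
mark pools FREE on suspend, let the normal init path rebuild them after
resume) can be sketched as a small user-space state model. Everything
here is hypothetical: struct demo_bm, struct demo_pool and the demo_*()
functions only mimic the shape of the driver logic, and a heap
allocation stands in for the DMA-coherent BPPE region.

```c
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_POOLS_NUM 4

enum demo_pool_type { DEMO_POOL_FREE, DEMO_POOL_LONG, DEMO_POOL_SHORT };

struct demo_pool {
	enum demo_pool_type type;
	unsigned int *table;	/* stands in for the BPPE region */
	unsigned int size;	/* table capacity */
	unsigned int buf_num;	/* buffers currently in the pool */
};

struct demo_bm {
	struct demo_pool pools[DEMO_POOLS_NUM];
	bool clocked;
};

/* Probe-style setup: a FREE pool gets a new table and is refilled. */
static int demo_pool_use(struct demo_bm *bm, int id,
			 enum demo_pool_type type, unsigned int size)
{
	struct demo_pool *p = &bm->pools[id];

	if (!bm->clocked)
		return -1;
	if (p->type != DEMO_POOL_FREE)
		return 0;	/* already set up, nothing to do */

	p->table = calloc(size, sizeof(*p->table));
	if (!p->table)
		return -1;
	p->size = size;
	p->buf_num = size;
	p->type = type;
	return 0;
}

/* Drain and release every active pool while still clocked, then gate. */
static void demo_bm_suspend(struct demo_bm *bm)
{
	int i;

	for (i = 0; i < DEMO_POOLS_NUM; i++) {
		struct demo_pool *p = &bm->pools[i];

		if (p->type == DEMO_POOL_FREE)
			continue;
		p->buf_num = 0;
		free(p->table);
		p->table = NULL;
		p->type = DEMO_POOL_FREE;
	}
	bm->clocked = false;
}

/* Only re-enable the block; pools are rebuilt by demo_pool_use(). */
static void demo_bm_resume(struct demo_bm *bm)
{
	bm->clocked = true;
}
```

Because suspend leaves every pool FREE, the consumer's resume path can
call the same setup function as at probe time, which is the property the
patch relies on.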


^ permalink raw reply related

* Re: [PATCH] swiotlb: avoid double copy with swiotlb on tx socket
From: kernel test robot @ 2026-06-16 11:21 UTC (permalink / raw)
  To: Luigi Rizzo, rizzo.unipi, m.szyprowski, robin.murphy, willemb,
	kuniyu, davem, edumazet, kuba, pabeni
  Cc: oe-kbuild-all, gregkh, rafael, akpm, david, netdev, linux-mm,
	iommu, driver-core, linux-kernel
In-Reply-To: <20260615234220.3946885-1-lrizzo@google.com>

Hi Luigi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v7.1 next-20260615]
[cannot apply to driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Luigi-Rizzo/swiotlb-avoid-double-copy-with-swiotlb-on-tx-socket/20260616-074655
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260615234220.3946885-1-lrizzo%40google.com
patch subject: [PATCH] swiotlb: avoid double copy with swiotlb on tx socket
config: arm-randconfig-r122-20260616 (https://download.01.org/0day-ci/archive/20260616/202606161921.OPkgBApm-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 16.1.0
sparse: v0.6.5-rc1
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260616/202606161921.OPkgBApm-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606161921.OPkgBApm-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
   kernel/dma/swiotlb.c: note: in included file (through include/linux/dma-direct.h):
>> include/linux/swiotlb.h:229:65: sparse: sparse: Using plain integer as NULL pointer
>> include/linux/swiotlb.h:229:65: sparse: sparse: Using plain integer as NULL pointer

vim +229 include/linux/swiotlb.h

   224	
   225	static inline bool is_zerocopy_swiotlb_folio(struct page *page)
   226	{
   227		struct folio *folio = page_folio(page);
   228	
 > 229		return folio_test_zcswiotlb(folio) && folio->private != 0;
   230	}
   231	
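
One way to address this kind of sparse warning is to compare the pointer
against NULL instead of the integer 0. The stand-alone sketch below shows
the pattern only; struct demo_folio, DEMO_FLAG_ZC and demo_is_zerocopy()
are hypothetical, and it assumes that the 'private' member is a pointer,
as the warning suggests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the folio; only the fields used here. */
struct demo_folio {
	unsigned long flags;
	void *private;
};

#define DEMO_FLAG_ZC 0x1UL	/* hypothetical "zero-copy" flag bit */

static bool demo_is_zerocopy(const struct demo_folio *folio)
{
	/* Compare the pointer with NULL rather than with plain 0. */
	return (folio->flags & DEMO_FLAG_ZC) && folio->private != NULL;
}
```

Dropping the comparison and testing the pointer directly would behave
the same; which form to use is a style choice.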

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH net 4/4] net: ti: icssg: Fix XSK zero copy TX during application wakeup
From: Meghana Malladi @ 2026-06-16 11:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: diogo.ivo, haokexin, vadim.fedorenko, devnexen, horms,
	jacob.e.keller, sdf, john.fastabend, hawk, daniel, ast, pabeni,
	edumazet, davem, andrew+netdev, bpf, linux-kernel, netdev,
	linux-arm-kernel, srk, Vignesh Raghavendra, Roger Quadros,
	danishanwar
In-Reply-To: <20260615162157.3748bcda@kernel.org>

Hi Jakub,

On 6/16/26 04:51, Jakub Kicinski wrote:
> On Fri, 12 Jun 2026 00:27:44 +0530 Meghana Malladi wrote:
>> @@ -169,9 +169,6 @@ static int emac_xsk_xmit_zc(struct prueth_emac *emac,
>>   
>>   		num_tx++;
>>   	}
>> -
>> -	xsk_tx_release(tx_chn->xsk_pool);
>> -	return num_tx;
> 
> Why are you deleting this?
> 

xsk_sendmsg() also calls this, without an RCU lock, when transmitting
packets if the xmit was successful, so I assumed it was not required
here and removed it.

>>   }
>>   
>>   void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
>> @@ -279,9 +276,6 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
>>   		num_tx++;
>>   	}
>>   
>> -	if (!num_tx)
>> -		return 0;
> 
> Does something prevent us from running all this code if budget is 0?
> If budget is 0 we can complete normal Tx with skbs but we must
> not touch any AF-XDP related state.
> 

Could you elaborate? I couldn't interpret your comment here.

>>   	netif_txq = netdev_get_tx_queue(ndev, chn);
>>   	netdev_tx_completed_queue(netif_txq, num_tx, total_bytes);
>>   
>> @@ -306,7 +300,9 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
>>   
>>   		netif_txq = netdev_get_tx_queue(ndev, chn);
>>   		txq_trans_cond_update(netif_txq);
> 
> This looks misplaced, now we will hit it even if we didn't complete
> or submit any Tx.
> 

This code needs to be hit for packet transmission in zero-copy mode.
emac_xsk_xmit_zc() submits the packets to the DMA in NAPI context,
when the application wakes up the driver and triggers NAPI. Once the DMA
transfer is done, the IRQ is triggered and NAPI is called, which handles
the Tx packet completion and submits the next batch of Tx packets to the
DMA.

The if (tx_chn->xsk_pool) check ensures this is hit and runs for
zero-copy mode only. Also, the above (!num_tx) check returns early during
the application wakeup (where budget is zero), hence it is removed.

>> +		__netif_tx_lock(netif_txq, smp_processor_id());
>>   		emac_xsk_xmit_zc(emac, chn);
>> +		__netif_tx_unlock(netif_txq);
>>   	}


^ permalink raw reply

* [PATCH net 2/2] devlink: Fix parent ref leak on tc-bw failure
From: Cosmin Ratiu @ 2026-06-16 11:06 UTC (permalink / raw)
  To: netdev
  Cc: Jiri Pirko, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michal Wilczynski, Carolina Jubran,
	Cosmin Ratiu, Mark Bloch, Tariq Toukan
In-Reply-To: <20260616110633.1449432-1-cratiu@nvidia.com>

When a node is created via rate-new with tc-bw and a parent node,
devlink_nl_rate_set() executes the sequence of ops. It bails out on the
first failure and doesn't rollback anything. For most things that is
fine (setting some numbers), but the parent set can leak if there's
another failure after that.

That is precisely what happens when parent setting isn't the last block
in the function. After the referenced "Fixes" commit, when tc-bw fails
to be set the function bails out after having set the parent and
incremented its refcount.
There are two callers:
- devlink_nl_rate_set_doit() is fine, it just reports the error.
- but devlink_nl_rate_new_doit() frees the newly created node and leaks
  the parent refcnt.

Fix that by reordering the blocks so parent setting is last and adding a
comment explaining this so future modifications preserve the ordering
(hopefully).

Fixes: 566e8f108fc7 ("devlink: Extend devlink rate API with traffic classes bandwidth management")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
---
 net/devlink/rate.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/devlink/rate.c b/net/devlink/rate.c
index 210e26c6cfa0..533d21b028a7 100644
--- a/net/devlink/rate.c
+++ b/net/devlink/rate.c
@@ -486,16 +486,19 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
 		devlink_rate->tx_weight = weight;
 	}
 
-	nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
-	if (nla_parent) {
-		err = devlink_nl_rate_parent_node_set(devlink_rate, info,
-						      nla_parent);
+	if (attrs[DEVLINK_ATTR_RATE_TC_BWS]) {
+		err = devlink_nl_rate_tc_bw_set(devlink_rate, info);
 		if (err)
 			return err;
 	}
 
-	if (attrs[DEVLINK_ATTR_RATE_TC_BWS]) {
-		err = devlink_nl_rate_tc_bw_set(devlink_rate, info);
+	/* Keep parent setting last because it takes a reference. This function
+	 * has no rollback, so failing after taking the ref would leak it.
+	 */
+	nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
+	if (nla_parent) {
+		err = devlink_nl_rate_parent_node_set(devlink_rate, info,
+						      nla_parent);
 		if (err)
 			return err;
 	}
-- 
2.53.0
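
The ordering rule from the commit message (do the reference-taking step
last when there is no rollback) can be illustrated with a minimal
user-space model. The names below (struct demo_rate, demo_tc_bw_set(),
demo_parent_set(), demo_rate_set()) are hypothetical and a plain int
stands in for refcount_t.

```c
#include <stddef.h>

struct demo_rate {
	int refcnt;
	struct demo_rate *parent;
	int tc_bw;
};

/* Fallible step that leaves no side effects behind on failure. */
static int demo_tc_bw_set(struct demo_rate *r, int bw)
{
	if (bw < 0)
		return -1;
	r->tc_bw = bw;
	return 0;
}

/* Step that takes a reference on the parent. */
static int demo_parent_set(struct demo_rate *r, struct demo_rate *parent)
{
	r->parent = parent;
	parent->refcnt++;
	return 0;
}

static int demo_rate_set(struct demo_rate *r, struct demo_rate *parent,
			 int bw)
{
	int err;

	err = demo_tc_bw_set(r, bw);
	if (err)
		return err;

	/* Keep this last: it takes a reference and nothing here rolls
	 * back, so any failure after it would leak the reference.
	 */
	if (parent) {
		err = demo_parent_set(r, parent);
		if (err)
			return err;
	}

	return 0;
}
```

With this order, a caller that frees the new node on error never has to
know whether a parent reference was taken.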


^ permalink raw reply related

* [PATCH net 0/2] devlink: Fix a couple parent ref leaks
From: Cosmin Ratiu @ 2026-06-16 11:06 UTC (permalink / raw)
  To: netdev
  Cc: Jiri Pirko, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michal Wilczynski, Carolina Jubran,
	Cosmin Ratiu, Mark Bloch, Tariq Toukan

These two patches fix parent ref leaks on errors.

Cosmin Ratiu (2):
  devlink: Fix parent ref leak in devl_rate_node_create()
  devlink: Fix parent ref leak on tc-bw failure

 net/devlink/rate.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

-- 
2.53.0


^ permalink raw reply

* [PATCH net 1/2] devlink: Fix parent ref leak in devl_rate_node_create()
From: Cosmin Ratiu @ 2026-06-16 11:06 UTC (permalink / raw)
  To: netdev
  Cc: Jiri Pirko, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Michal Wilczynski, Carolina Jubran,
	Cosmin Ratiu, Mark Bloch, Tariq Toukan
In-Reply-To: <20260616110633.1449432-1-cratiu@nvidia.com>

In the original commit the function bails out on kstrdup failure,
forgetting to decrement the refcnt of the parent.

Fix that by moving the parent refcnt setting after kstrdup.

Fixes: caba177d7f4d ("devlink: Enable creation of the devlink-rate nodes from the driver")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
---
 net/devlink/rate.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/devlink/rate.c b/net/devlink/rate.c
index 41be2d6c2954..210e26c6cfa0 100644
--- a/net/devlink/rate.c
+++ b/net/devlink/rate.c
@@ -725,11 +725,6 @@ devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
 	if (!rate_node)
 		return ERR_PTR(-ENOMEM);
 
-	if (parent) {
-		rate_node->parent = parent;
-		refcount_inc(&rate_node->parent->refcnt);
-	}
-
 	rate_node->type = DEVLINK_RATE_TYPE_NODE;
 	rate_node->devlink = devlink;
 	rate_node->priv = priv;
@@ -740,6 +735,11 @@ devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
 		return ERR_PTR(-ENOMEM);
 	}
 
+	if (parent) {
+		rate_node->parent = parent;
+		refcount_inc(&rate_node->parent->refcnt);
+	}
+
 	refcount_set(&rate_node->refcnt, 1);
 	list_add(&rate_node->list, &devlink->rate_list);
 	devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
-- 
2.53.0
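
The same idea applies at creation time: take the parent reference only
after the last step that can fail. The sketch below is a user-space
model with hypothetical names (struct demo_node, demo_node_create());
the string duplication is passed in as a function pointer purely so that
an allocation failure can be simulated.

```c
#include <stdlib.h>
#include <string.h>

struct demo_node {
	int refcnt;
	struct demo_node *parent;
	char *name;
};

typedef char *(*demo_dup_fn)(const char *);

/* Working duplicator. */
static char *demo_dup_ok(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Duplicator that always fails, to exercise the error path. */
static char *demo_dup_fail(const char *s)
{
	(void)s;
	return NULL;
}

static struct demo_node *demo_node_create(const char *name,
					  struct demo_node *parent,
					  demo_dup_fn dup)
{
	struct demo_node *node = calloc(1, sizeof(*node));

	if (!node)
		return NULL;

	node->name = dup(name);
	if (!node->name) {
		/* No reference taken yet, so freeing the node is enough. */
		free(node);
		return NULL;
	}

	/* Last fallible step is done: now it is safe to take the ref. */
	if (parent) {
		node->parent = parent;
		parent->refcnt++;
	}

	node->refcnt = 1;
	return node;
}
```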


^ permalink raw reply related

* Re: [PATCH] swiotlb: avoid double copy with swiotlb on tx socket
From: Mostafa Saleh @ 2026-06-16 11:06 UTC (permalink / raw)
  To: Luigi Rizzo
  Cc: Jakub Kicinski, rizzo.unipi, m.szyprowski, robin.murphy, willemb,
	kuniyu, davem, edumazet, pabeni, gregkh, rafael, akpm, david,
	netdev, linux-mm, iommu, driver-core, linux-kernel
In-Reply-To: <CAMOZA0KAHKsvA9yRcdrjG13S+=rJhw-Cvnw2WdLjGGY0azG0kw@mail.gmail.com>

On Tue, Jun 16, 2026 at 02:33:52AM +0200, Luigi Rizzo wrote:
> On Tue, Jun 16, 2026 at 2:25 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Mon, 15 Jun 2026 23:42:20 +0000 Luigi Rizzo wrote:
> > > The use of swiotlb causes an extra data copy on I/O.  For tx sockets,
> > > especially with greedy senders, this has a high chance of happening in
> > > the softirq handler for tx network interrupts, creating a significant
> > > performance bottleneck.
> >
> > What's the use case? I associate swiotlb with debug / testing mostly,
> > so it'd be useful for people like me to explain why you care.
> 
> Ah sorry, I forgot to mention.
> swiotlb is used in guest kernels for confidential computing VMs.
> Ordinary memory pages are encrypted and the host or devices
> have no way to decrypt them, so the kernel must use
> unencrypted bounce buffers to exchange data with I/O devices.

I started looking into the same problem recently, to reduce the
bouncing in protected KVM (pKVM) confidential guests.
My first attempt was to update dma_direct_map_phys() to skip
bouncing and do inline memory decryption (for pKVM that is a hypercall
which updates the stage-2 page tables), however, that was really slow
compared to the memcpy in bouncing even for massive pages.
My conclusion was similar: we need to solve this at construction by
allocating this memory from a pre-decrypted pool (which does not have
to be part of the SWIOTLB).
My initial idea was to teach some of the kernel subsystems (SKB,
BLK, SLAB) about "CoCo allocators" that allocate decrypted memory,
as this is not a net-specific problem.

I am still looking into this, and I was planning to bring it up at the
upcoming LPC.
I will give this patch a try. However, I believe that we need a more
generalised concept for CoCo pre-decrypted allocators in the kernel.

Thanks,
Mostafa

> 
> cheers
> luigi
> 

^ permalink raw reply

* Re: [PATCH net] tipc: free bearer discoverer via RCU to fix tipc_disc_rcv UAF
From: Sam P @ 2026-06-16 11:04 UTC (permalink / raw)
  To: Tung Quang Nguyen
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev@vger.kernel.org,
	tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org, Jon Maloy,
	bestswngs
In-Reply-To: <GV1P189MB19887A9A37B5B170C112DF8EC6E52@GV1P189MB1988.EURP189.PROD.OUTLOOK.COM>

On 16/06/2026 08:50, Tung Quang Nguyen wrote: 
> A similar patch was submitted 6 days ago: https://patchwork.kernel.org/project/netdevbpf/patch/20260610153349.2546041-2-bestswngs@gmail.com/
> 
> I have not received an updated patch from the submitter yet.
> Your patch has the same coding style issue (long line, over 80 columns), see linux/Documentation/process/coding-style.rst
> 
> If you break the long line into 2 lines and submit again, I think I can acknowledge your patch.

Oops, I missed that patch! I'm not sure what the etiquette
is in this case, but I'm happy to defer to the original
submitter (CCd) if they're working on a new patch and/or
add any appropriate trailers to my v2.

I've prepared a v2 to submit after the ~24h period,
addressing your changes and taking into account Eric's
feedback from the earlier submission as well
(adding an rcu_barrier() in tipc_exit()).


^ permalink raw reply

* Re: [PATCH net-next] i40e: add devlink parameter for Flow Director ATR sample rate
From: mohammad heib @ 2026-06-16 11:03 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, jiri, davem, edumazet, kuba, pabeni, horms, corbet,
	anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev
In-Reply-To: <20260614161131.192068-1-mheib@redhat.com>



On 6/14/26 7:11 PM, mheib@redhat.com wrote:
> From: Mohammad Heib <mheib@redhat.com>
> 
> The i40e driver uses Flow Director ATR to periodically update flow
> steering information for active TCP flows. The update frequency is
> currently controlled by I40E_DEFAULT_ATR_SAMPLE_RATE and is fixed at
> driver build time.
> 
> On systems with a large number of queues and high-rate TCP workloads,
> the default sampling interval can result in frequent Flow Director
> reprogramming for long-lived flows.
> 
> The amount of TCP packet reordering observed on some systems is
> sensitive to the ATR sampling interval. Increasing the interval reduces
> Flow Director programming activity and can significantly reduce the
> associated reordering.
> 
> Since the optimal sampling interval depends on the workload and system
> configuration, a single fixed value is not suitable for all deployments.
> 
> Add a devlink parameter to allow administrators to tune the ATR sample
> rate at runtime without rebuilding the driver or disabling ATR
> functionality entirely.
> 
> Signed-off-by: Mohammad Heib <mheib@redhat.com>
> ---
>   Documentation/networking/devlink/i40e.rst     | 19 ++++++
>   drivers/net/ethernet/intel/i40e/i40e.h        |  1 +
>   .../net/ethernet/intel/i40e/i40e_devlink.c    | 65 +++++++++++++++++++
>   drivers/net/ethernet/intel/i40e/i40e_main.c   |  4 +-
>   drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  4 +-
>   5 files changed, 90 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/networking/devlink/i40e.rst b/Documentation/networking/devlink/i40e.rst
> index 51c887f0dc83..704469aa9acf 100644
> --- a/Documentation/networking/devlink/i40e.rst
> +++ b/Documentation/networking/devlink/i40e.rst
> @@ -40,6 +40,25 @@ Parameters
>   
>           The default value is ``0`` (internal calculation is used).
>   
> +.. list-table:: Driver specific parameters implemented
> +    :widths: 5 5 90
> +
> +    * - Name
> +      - Mode
> +      - Description
> +    * - ``atr_sample_rate``
> +      - runtime
> +      - Controls how frequently Flow Director ATR updates flow steering
> +        information for active TCP flows.
> +
> +        ATR programs Flow Director entries based on sampled transmitted
> +        packets. The sampling interval is specified as the number of
> +        transmitted packets between ATR updates.
> +
> +        Lower values increase Flow Director programming activity, while
> +        higher values reduce the update frequency.
> +
> +        The default value is ``20``.
>   
>   Info versions
>   =============
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> index 1b6a8fbaa648..88eb40ee45f0 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -487,6 +487,7 @@ struct i40e_pf {
>   	u16 rss_size_max;          /* HW defined max RSS queues */
>   	u16 fdir_pf_filter_count;  /* num of guaranteed filters for this PF */
>   	u16 num_alloc_vsi;         /* num VSIs this driver supports */
> +	u32 atr_sample_rate;
>   	bool wol_en;
>   
>   	struct hlist_head fdir_filter_list;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_devlink.c b/drivers/net/ethernet/intel/i40e/i40e_devlink.c
> index 229179ccc131..16e51762db45 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_devlink.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_devlink.c
> @@ -33,12 +33,77 @@ static int i40e_max_mac_per_vf_get(struct devlink *devlink,
>   	return 0;
>   }
>   
> +static int i40e_atr_sample_rate_set(struct devlink *devlink,
> +				    u32 id,
> +				    struct devlink_param_gset_ctx *ctx,
> +				    struct netlink_ext_ack *extack)
> +{
> +	struct i40e_pf *pf = devlink_priv(devlink);
> +	struct i40e_vsi *vsi;
> +	u32 sample_rate = ctx->val.vu32;
> +	int i;
> +
> +	pf->atr_sample_rate = sample_rate;
> +
> +	if (!test_bit(I40E_FLAG_FD_ATR_ENA, pf->flags))
> +		return 0;
> +
> +	vsi = i40e_pf_get_main_vsi(pf);
> +	if (!vsi)
> +		return 0;
> +
> +	for (i = 0; i < vsi->num_queue_pairs; i++) {
> +		if (!vsi->tx_rings[i])
> +			continue;
> +		vsi->tx_rings[i]->atr_sample_rate = sample_rate;
> +		vsi->tx_rings[i]->atr_count = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static int i40e_atr_sample_rate_get(struct devlink *devlink,
> +				    u32 id,
> +				    struct devlink_param_gset_ctx *ctx,
> +				    struct netlink_ext_ack *extack)
> +{
> +	struct i40e_pf *pf = devlink_priv(devlink);
> +
> +	ctx->val.vu32 = pf->atr_sample_rate;
> +
> +	return 0;
> +}
> +
> +static int i40e_atr_sample_rate_validate(struct devlink *devlink, u32 id,
> +					 union devlink_param_value val,
> +					 struct netlink_ext_ack *extack)
> +{
> +	if (!val.vu32) {
> +		NL_SET_ERR_MSG_MOD(extack,
> +				   "ATR sample rate must be greater than 0");
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +enum i40e_dl_param_id {
> +	I40E_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
> +	I40E_DEVLINK_PARAM_ID_ATR_SAMPLE_RATE,
> +};
> +
>   static const struct devlink_param i40e_dl_params[] = {
>   	DEVLINK_PARAM_GENERIC(MAX_MAC_PER_VF,
>   			      BIT(DEVLINK_PARAM_CMODE_RUNTIME),
>   			      i40e_max_mac_per_vf_get,
>   			      i40e_max_mac_per_vf_set,
>   			      NULL),
> +	DEVLINK_PARAM_DRIVER(I40E_DEVLINK_PARAM_ID_ATR_SAMPLE_RATE,
> +			     "atr_sample_rate",
> +			     DEVLINK_PARAM_TYPE_U32,
> +			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
> +			     i40e_atr_sample_rate_get,
> +			     i40e_atr_sample_rate_set,
> +			     i40e_atr_sample_rate_validate),
>   };
>   
>   static void i40e_info_get_dsn(struct i40e_pf *pf, char *buf, size_t len)
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index d59750c490f4..9c8144970a34 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -3458,7 +3458,7 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
>   
>   	/* some ATR related tx ring init */
>   	if (test_bit(I40E_FLAG_FD_ATR_ENA, vsi->back->flags)) {
> -		ring->atr_sample_rate = I40E_DEFAULT_ATR_SAMPLE_RATE;
> +		ring->atr_sample_rate = vsi->back->atr_sample_rate;
>   		ring->atr_count = 0;
>   	} else {
>   		ring->atr_sample_rate = 0;
> @@ -12745,6 +12745,8 @@ static int i40e_sw_init(struct i40e_pf *pf)
>   		}
>   	}
>   
> +	pf->atr_sample_rate = I40E_DEFAULT_ATR_SAMPLE_RATE;
> +
>   	if ((pf->hw.func_caps.fd_filters_guaranteed > 0) ||
>   	    (pf->hw.func_caps.fd_filters_best_effort > 0)) {
>   		set_bit(I40E_FLAG_FD_ATR_ENA, pf->flags);
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> index bb741ff3e5f2..7e29e9244c3a 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> @@ -372,8 +372,8 @@ struct i40e_ring {
>   	u16 next_to_clean;
>   	u16 xdp_tx_active;
>   
> -	u8 atr_sample_rate;
> -	u8 atr_count;
> +	u32 atr_sample_rate;
> +	u32 atr_count;
>   
>   	bool ring_active;		/* is ring online or not */
>   	bool arm_wb;		/* do something to arm write back */

Hi Aleksandr,

Your concern is indeed valid. I'm not 100% sure whether devlink 
callbacks are still protected by rtnl_lock after the large locking 
changes that recently went into net/core.

That said, I'm wondering whether we need to store the ATR sample rate 
per ring at all. As far as I can tell, there is no option to configure 
the sample rate independently for individual rings, so maintaining a 
copy in every ring may not be necessary.

Would it make sense to remove the per-ring copy entirely and keep the 
sample rate only at the PF level? That would avoid the need to walk the 
rings from the devlink callback and would eliminate the race you pointed 
out.
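
A rough user-space sketch of that idea, under the assumption that each
ring can reach its PF: the ring keeps only its own counter and reads the
rate from the PF at use time. struct demo_pf, struct demo_ring and
demo_atr_should_sample() are hypothetical names, and the counting logic
is a simplified model rather than the actual i40e code.

```c
#include <stdbool.h>

struct demo_pf {
	unsigned int atr_sample_rate;	/* single copy, set via devlink */
};

struct demo_ring {
	const struct demo_pf *pf;	/* back-pointer instead of a copy */
	unsigned int atr_count;		/* per-ring packet counter */
};

/* Returns true when this packet should trigger an ATR update. */
static bool demo_atr_should_sample(struct demo_ring *ring)
{
	/* Read the shared value once per packet. */
	unsigned int rate = ring->pf->atr_sample_rate;

	if (rate == 0)
		return false;	/* this sketch treats 0 as "ATR off" */

	if (++ring->atr_count < rate)
		return false;

	ring->atr_count = 0;
	return true;
}
```

In the driver, the read of the shared value would probably want
READ_ONCE() (with WRITE_ONCE() on the devlink side), but that is an
assumption on my part and not something I have tested.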

Thanks Piotr for the review. I'll address your comment in v2.


^ permalink raw reply

* Re: [PATCH net v3] ip_tunnel: drop stale dst from generated PMTU ICMP replies
From: Ido Schimmel @ 2026-06-16 11:02 UTC (permalink / raw)
  To: laikabcprice
  Cc: David Ahern, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Shuah Khan, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <20260614-master-v3-1-9f5060ba1ed1@gmail.com>

On Sun, Jun 14, 2026 at 12:13:57AM +0100, Laika Price via B4 Relay wrote:
> From: Laika Price <laikabcprice@gmail.com>
> 
> iptunnel_pmtud_build_icmp(...) and iptunnel_pmtud_build_icmpv6(...) take
> in an sk_buff, modify it to create a PMTU ICMP error reply, and return it.
> As part of these modifications, the source/destination ethernet and IP
> addresses are swapped around which makes the sk_buff's current dst invalid.
> 
> If the stale dst is left, the packet can skip input routing and be
> forwarded using the original output device. This was observed when sending
> packets to a VXLAN over a WireGuard tunnel - the ICMP reply was generated
> but it was sent over the VXLAN instead of to the WireGuard tunnel.
> 
> This patch drops the stale dst after building the PMTU reply so that the
> packet is routed using its new headers when it is reinjected.
> 
> The pmtu_ipv4_br_vxlan4_exception test generates PMTU exceptions by
> pinging an IP on the other side of a tunnel. This was incorrect as it
> would return upon the first ICMP Fragmentation Needed due to the -w flag
> being used in conjunction with || return 1.
> 
> This patch updates pmtu_ipv4_br_vxlan4_exception to be in line with how
> PMTU exceptions are generated in other tests such as in test_pmtu_ipvX
> 
>     run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1800 ${dst1}
>     run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s 1800 ${dst2}

1. Please split the selftest fix into a separate patch (patch #1) and
explain why the test currently passes and why it would break with the
subsequent code change.

2. Use the appropriate Fixes tag for each patch.

3. Go over this doc:

https://docs.kernel.org/process/maintainer-netdev.html

4. Use ingest_mdir.py to test your patches:

https://github.com/linux-netdev/nipa#running-locally

> 
> Signed-off-by: Laika Price <laikabcprice@gmail.com>
> ---
> Changes in v3:
> - Squashed the selftest update into the ip_tunnel fix so the patch remains
>   bisectable.
> - Link to v2: https://patch.msgid.link/20260613-master-v2-0-061b70fd45dd@gmail.com
> 
> Changes in v2:
> - Fixed incorrect PMTU exception generation in the selftest.
> - Link to v1: https://patch.msgid.link/20260613-master-v1-1-df796e8e2d74@gmail.com
> ---
>  net/ipv4/ip_tunnel_core.c           | 2 ++
>  tools/testing/selftests/net/pmtu.sh | 4 ++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
> index d3c677e9b..949150e43 100644
> --- a/net/ipv4/ip_tunnel_core.c
> +++ b/net/ipv4/ip_tunnel_core.c
> @@ -267,6 +267,7 @@ static int iptunnel_pmtud_build_icmp(struct sk_buff *skb, int mtu)
>  
>  	eth_header(skb, skb->dev, ntohs(eh.h_proto), eh.h_source, eh.h_dest, 0);
>  	skb_reset_mac_header(skb);
> +	skb_dst_drop(skb);

This probably needs to be:

if (skb_valid_dst(skb))
	skb_dst_drop(skb);

Both VXLAN and GENEVE use the dst after skb_tunnel_check_pmtu() when in
external mode, so you can't drop it unconditionally. Leaving the
metadata dst in place shouldn't be a problem because both IPv4 and IPv6
will resolve a new dst if the current one isn't valid (i.e., it's a dst
metadata one).
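
To illustrate the conditional drop suggested above, here is a minimal
userspace sketch. It is only a model: every model_* name, the
MODEL_DST_METADATA flag and the plain integer refcount are hypothetical
stand-ins, not the kernel's definitions. It assumes that a "valid" dst
means one that is attached and is not a metadata dst, and that dropping
releases one reference and detaches the dst from the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the kernel's metadata-dst marker. */
#define MODEL_DST_METADATA 0x1u

/* Hypothetical, heavily simplified stand-in for a dst entry. */
struct model_dst {
	unsigned int flags;
	int refcnt;
};

/* Hypothetical, heavily simplified stand-in for an sk_buff. */
struct model_skb {
	struct model_dst *dst;
};

/* Models the assumed check: a dst is attached and it is not metadata. */
static bool model_skb_valid_dst(const struct model_skb *skb)
{
	return skb->dst && !(skb->dst->flags & MODEL_DST_METADATA);
}

/* Models the assumed drop: release one reference and detach the dst. */
static void model_skb_dst_drop(struct model_skb *skb)
{
	if (skb->dst) {
		skb->dst->refcnt--;
		skb->dst = NULL;
	}
}

/*
 * Models the tail of the PMTU reply builders with the suggested change:
 * a stale routing dst is dropped, a metadata dst is left for the tunnel
 * driver to keep using, and a packet without a dst is left untouched.
 */
static void model_pmtud_reply_fixup(struct model_skb *skb)
{
	assert(skb != NULL);

	if (model_skb_valid_dst(skb))
		model_skb_dst_drop(skb);
}
```

In this model a packet carrying a regular dst ends up with no dst and
gets routed again from its new headers, while a packet carrying a
metadata dst keeps it, which is the external-mode case described above.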

>  
>  	return skb->len;
>  }
> @@ -370,6 +371,7 @@ static int iptunnel_pmtud_build_icmpv6(struct sk_buff *skb, int mtu)
>  
>  	eth_header(skb, skb->dev, ntohs(eh.h_proto), eh.h_source, eh.h_dest, 0);
>  	skb_reset_mac_header(skb);
> +	skb_dst_drop(skb);
>  
>  	return skb->len;
>  }
> diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
> index a3323c21f..9498d9f53 100755
> --- a/tools/testing/selftests/net/pmtu.sh
> +++ b/tools/testing/selftests/net/pmtu.sh
> @@ -1456,8 +1456,8 @@ test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() {
>  	mtu "${ns_a}" ${type}_a $((${ll_mtu} + 1000))
>  	mtu "${ns_b}" ${type}_b $((${ll_mtu} + 1000))
>  
> -	run_cmd ${ns_c} ${ping} -q -M want -i 0.1 -c 10 -s $((${ll_mtu} + 500)) ${dst} || return 1
> -	run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1  -s $((${ll_mtu} + 500)) ${dst} || return 1
> +	run_cmd ${ns_c} ${ping} -q -M want -i 0.1 -w 1 -s $((${ll_mtu} + 500)) ${dst}
> +	run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s $((${ll_mtu} + 500)) ${dst}
>  
>  	# Check that exceptions were created
>  	pmtu="$(route_get_dst_pmtu_from_exception "${ns_c}" ${dst})"
> 
> ---
> base-commit: 2a2974b5145cdf2f4db134be1a2157e9ca4a1cf0
> change-id: 20260613-master-b749dfae5ecc
> 
> Best regards,
> --  
> Laika Price <laikabcprice@gmail.com>
> 
> 


* [PATCH] net: airoha: Clean up RX queues in airoha_dev_stop
From: Wayen Yan @ 2026-06-16 10:50 UTC (permalink / raw)
  To: netdev
  Cc: lorenzo, horms, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek

When the last port is stopped, airoha_dev_stop() clears TX queues
but neglects to clean up RX queues. This can lead to:
- RX ring buffer descriptors remaining valid after device close
- Potential DMA synchronization issues on device reopen
- Risk of use-after-free if pages are freed while DMA is still active

Add a cleanup loop for the RX queues, mirroring the existing TX queue
cleanup, so that both directions are torn down symmetrically.

Fixes: 20bf7d07c956 ("net: airoha: add QDMA support for Airoha EN7581 Ethernet")
Signed-off-by: Wayen Yan <win847@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 31cdb11cd7..9ca5bbf64d 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1771,6 +1771,13 @@ static int airoha_dev_stop(struct net_device *dev)
 
 			airoha_qdma_cleanup_tx_queue(&qdma->q_tx[i]);
 		}
+
+		for (i = 0; i < ARRAY_SIZE(qdma->q_rx); i++) {
+			if (!qdma->q_rx[i].ndesc)
+				continue;
+
+			airoha_qdma_cleanup_rx_queue(&qdma->q_rx[i]);
+		}
 	}
 
 	return 0;
-- 
2.51.0




