Netdev List


* [PATCH net-next v7 4/5] net: wangxun: implement soft quiesce for PCIe error recovery
From: Jiawen Wu @ 2026-06-15  6:50 UTC
  To: netdev
  Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Richard Cochran, Russell King,
	Aleksandr Loktionov, Jacob Keller, Michal Swiatkowski,
	Simon Horman, Kees Cook, Greg Kroah-Hartman, Thomas Gleixner,
	Breno Leitao, Larysa Zaremba,
	Uwe Kleine-König (The Capable Hub), Fabio Baltieri,
	Jiawen Wu
In-Reply-To: <20260615065016.21672-1-jiawenwu@trustnetic.com>

wx_soft_quiesce() provides a lightweight shutdown path for PCIe error
recovery. It avoids MMIO-dependent operations while the device is in a
PCIe error state.

Waiting for the service task to complete may unnecessarily delay PCIe
error recovery, especially if the work item is already blocked by the
hardware failure that triggered AER. The service task is therefore not
explicitly cancelled in the quiesce path. To keep it from doing any
work, a check of WX_STATE_DOWN and WX_STATE_RESETTING is added at the
entry of every work item.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/wangxun/libwx/wx_lib.c   | 18 ++++++++++++++++
 drivers/net/ethernet/wangxun/libwx/wx_lib.h   |  1 +
 drivers/net/ethernet/wangxun/libwx/wx_ptp.c   | 21 +++++++++++++++++++
 drivers/net/ethernet/wangxun/libwx/wx_ptp.h   |  1 +
 .../net/ethernet/wangxun/txgbe/txgbe_main.c   |  8 +++++++
 5 files changed, 49 insertions(+)

diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
index e5a45356ba00..c10a3bf5cf02 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
@@ -3382,5 +3382,23 @@ void wx_service_timer(struct timer_list *t)
 }
 EXPORT_SYMBOL(wx_service_timer);
 
+void wx_soft_quiesce(struct wx *wx)
+{
+	if (!netif_running(wx->netdev) ||
+	    test_and_set_bit(WX_STATE_DOWN, wx->state))
+		return;
+
+	wx_ptp_quiesce(wx);
+	pci_clear_master(wx->pdev);
+	netif_tx_stop_all_queues(wx->netdev);
+	netif_carrier_off(wx->netdev);
+	netif_tx_disable(wx->netdev);
+	wx_napi_disable_all(wx);
+
+	clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags);
+	timer_delete_sync(&wx->service_timer);
+}
+EXPORT_SYMBOL(wx_soft_quiesce);
+
 MODULE_DESCRIPTION("Common library for Wangxun(R) Ethernet drivers.");
 MODULE_LICENSE("GPL");
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.h b/drivers/net/ethernet/wangxun/libwx/wx_lib.h
index aed6ea8cf0d6..11bd79985e17 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_lib.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.h
@@ -41,5 +41,6 @@ void wx_set_ring(struct wx *wx, u32 new_tx_count,
 void wx_service_event_schedule(struct wx *wx);
 void wx_service_event_complete(struct wx *wx);
 void wx_service_timer(struct timer_list *t);
+void wx_soft_quiesce(struct wx *wx);
 
 #endif /* _WX_LIB_H_ */
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_ptp.c b/drivers/net/ethernet/wangxun/libwx/wx_ptp.c
index 44f3e6505246..bb89eff3dd45 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_ptp.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_ptp.c
@@ -842,6 +842,27 @@ void wx_ptp_stop(struct wx *wx)
 }
 EXPORT_SYMBOL(wx_ptp_stop);
 
+void wx_ptp_quiesce(struct wx *wx)
+{
+	if (!test_and_clear_bit(WX_STATE_PTP_RUNNING, wx->state))
+		return;
+
+	clear_bit(WX_FLAG_PTP_PPS_ENABLED, wx->flags);
+
+	if (wx->ptp_clock) {
+		ptp_clock_unregister(wx->ptp_clock);
+		wx->ptp_clock = NULL;
+		dev_info(&wx->pdev->dev, "removed PHC on %s\n", wx->netdev->name);
+	}
+
+	if (wx->ptp_tx_skb) {
+		dev_kfree_skb_any(wx->ptp_tx_skb);
+		wx->ptp_tx_skb = NULL;
+	}
+	clear_bit_unlock(WX_STATE_PTP_TX_IN_PROGRESS, wx->state);
+}
+EXPORT_SYMBOL(wx_ptp_quiesce);
+
 /**
  * wx_ptp_rx_hwtstamp - utility function which checks for RX time stamp
  * @wx: pointer to wx struct
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_ptp.h b/drivers/net/ethernet/wangxun/libwx/wx_ptp.h
index 50db90a6e3ee..ad2f824875d5 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_ptp.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_ptp.h
@@ -10,6 +10,7 @@ void wx_ptp_reset(struct wx *wx);
 void wx_ptp_init(struct wx *wx);
 void wx_ptp_suspend(struct wx *wx);
 void wx_ptp_stop(struct wx *wx);
+void wx_ptp_quiesce(struct wx *wx);
 void wx_ptp_rx_hwtstamp(struct wx *wx, struct sk_buff *skb);
 int wx_hwtstamp_get(struct net_device *dev,
 		    struct kernel_hwtstamp_config *cfg);
diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
index 9251e7a1d416..f6e596eb9217 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
@@ -94,6 +94,10 @@ static void txgbe_module_detection_subtask(struct wx *wx)
 {
 	int err;
 
+	if (test_bit(WX_STATE_DOWN, wx->state) ||
+	    test_bit(WX_STATE_RESETTING, wx->state))
+		return;
+
 	if (!test_and_clear_bit(WX_FLAG_NEED_MODULE_RESET, wx->flags))
 		return;
 
@@ -107,6 +111,10 @@ static void txgbe_module_detection_subtask(struct wx *wx)
 
 static void txgbe_link_config_subtask(struct wx *wx)
 {
+	if (test_bit(WX_STATE_DOWN, wx->state) ||
+	    test_bit(WX_STATE_RESETTING, wx->state))
+		return;
+
 	if (!test_and_clear_bit(WX_FLAG_NEED_LINK_CONFIG, wx->flags))
 		return;
 
-- 
2.51.0
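
The guard pattern described in the changelog can be modelled in userspace.
This is only a sketch under stated assumptions: struct dev_model, the
model_* helpers and the STATE_* bit values are hypothetical stand-ins, and
C11 atomics take the place of the kernel's test_and_set_bit()/test_bit().
It shows that teardown runs at most once and that a work item entering
after quiesce returns without doing anything.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's state bits. */
enum { STATE_DOWN = 1u << 0, STATE_RESETTING = 1u << 1 };

struct dev_model {
	atomic_uint state;   /* models wx->state */
	bool running;        /* models netif_running() */
	int quiesce_count;   /* how many times teardown actually ran */
	int work_runs;       /* how many times a work item did real work */
};

/* Models test_and_set_bit(): returns true if the bit was already set. */
static bool model_test_and_set(atomic_uint *state, unsigned int bit)
{
	return (atomic_fetch_or(state, bit) & bit) != 0;
}

/* Teardown body runs at most once; later callers return early. */
static void model_soft_quiesce(struct dev_model *d)
{
	assert(d != NULL);
	if (!d->running || model_test_and_set(&d->state, STATE_DOWN))
		return;
	d->quiesce_count++;
}

/* Entry guard: a work item that starts after quiesce does nothing. */
static void model_work_item(struct dev_model *d)
{
	assert(d != NULL);
	if (atomic_load(&d->state) & (STATE_DOWN | STATE_RESETTING))
		return;
	d->work_runs++;
}
```

Calling model_soft_quiesce() twice leaves quiesce_count at 1, and a
model_work_item() call made afterwards leaves work_runs unchanged. The
model does not cover a work item that has already passed the guard when
the bit is set.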



* RE: [PATCH net v2] tipc: fix slab-use-after-free Read in tipc_aead_decrypt_done
From: Tung Quang Nguyen @ 2026-06-15  6:54 UTC
  To: Doruk Tan Ozturk
  Cc: jmaloy@redhat.com, aleksander.lobakin@intel.com,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, netdev@vger.kernel.org,
	tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <20260615002211.90694-1-doruk@0sec.ai>

>Subject: Re: [PATCH net v2] tipc: fix slab-use-after-free Read in
>tipc_aead_decrypt_done
>
>On Tue, Jun 10, 2026, Tung Quang Nguyen wrote:
>> Can you decode the stack trace (using
>> linux/scripts/decode_stacktrace.sh)
>> for more readable text?
>>
>> Is the issue reproducible on the latest net branch, or just on the old
>> v6.12.92 you mentioned?
>
>Hi Tung,
>
>Thanks for the review. Answers to both questions below.
>
>1) Decoded stack trace
>----------------------
>Decoded with scripts/decode_stacktrace.sh against the vmlinux that produced
>the splat (6.12.92, CONFIG_KASAN_INLINE + CONFIG_TIPC +
>CONFIG_TIPC_CRYPTO).
>The use-after-free read and the allocation/free sites resolve as follows:
>
>BUG: KASAN: slab-use-after-free in tipc_crypto_rcv_complete (net/tipc/crypto.c:1917)
>Read of size 8 at addr ffff888104c8c808 by task kworker/3:2/70
>Workqueue: cryptd cryptd_queue_worker
>Call Trace:
> <TASK>
> dump_stack_lvl (lib/dump_stack.c:123)
> print_report (mm/kasan/report.c:378 mm/kasan/report.c:481)
> kasan_report (mm/kasan/report.c:596)
> tipc_crypto_rcv_complete (net/tipc/crypto.c:1917)
> tipc_aead_decrypt_done (net/tipc/crypto.c:996)
> cryptd_aead_crypt (include/crypto/internal/aead.h:85 crypto/cryptd.c:772)
> cryptd_queue_worker (crypto/cryptd.c:181)
> process_one_work (kernel/workqueue.c:3264)
> worker_thread (kernel/workqueue.c:3339 kernel/workqueue.c:3426)
> kthread (kernel/kthread.c:389)
> ret_from_fork (arch/x86/kernel/process.c:152)
> ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
> </TASK>
>
>Allocated by task 1550:
> kasan_save_stack (mm/kasan/common.c:49)
> kasan_save_track (mm/kasan/common.c:61 mm/kasan/common.c:70)
> __kasan_kmalloc (mm/kasan/common.c:378 mm/kasan/common.c:395)
> tipc_crypto_start (net/tipc/crypto.c:1484)
> tipc_init_net (net/tipc/core.c:73)
> ops_init (net/core/net_namespace.c:140)
> setup_net (net/core/net_namespace.c:357)
> copy_net_ns (net/core/net_namespace.c:512)
> create_new_namespaces (kernel/nsproxy.c:110)
>
>(The captured report has the KASAN read trace and the Allocated-by track; the
>free is on the netns teardown path tipc_crypto_stop() <- tipc_exit_net() <-
>cleanup_net(), as described in the changelog.)
>
>So the freed object is the per-netns struct tipc_crypto allocated in
>tipc_crypto_start() at netns creation (crypto.c:1484), and the cryptd worker
>then reads it from the async completion: tipc_aead_decrypt_done()
>(crypto.c:996) -> tipc_crypto_rcv_complete() (crypto.c:1917). Immediately after
>the UAF read the worker also faults dereferencing the stale node pointer in
>tipc_node_put() (net/tipc/node.c:319), confirming the object is gone.
>
>2) Reproducibility on the latest net branch
>--------------------------------------------
>The bug is still present on the latest net tree. I checked out v7.1-rc7 and
>inspected net/tipc/crypto.c first: the encrypt side already carries the
>maybe_get_net() guard from commit e279024617134 ("net/tipc: fix slab-use-
>after-free Read in tipc_aead_encrypt"), but tipc_aead_decrypt() still goes
>straight from tipc_bearer_hold(b) to crypto_aead_decrypt(req) with no
>maybe_get_net(aead->crypto->net) and no matching put_net() -- i.e. the exact
>gap this patch closes. So the decrypt path is unguarded on 7.1-rc7 and the UAF
>is reachable there in the same way.
>
>I also built v7.1-rc7 (HEAD at v7.1-rc7) with KASAN_INLINE + TIPC +
>TIPC_CRYPTO and reproduced the UAF live. The workload is the same as on
>6.12.92: a UDP bearer with a cluster key is flooded with crafted encrypted
>frames from an unknown peer, taking the cluster-key (pick_tx) RX decrypt path,
>while the bearer's netns is repeatedly torn down. Decoded against the rc7
>vmlinux:
>
>BUG: KASAN: slab-use-after-free in tipc_aead_decrypt_done (net/tipc/crypto.c:999)
>Read of size 8 at addr ffff8881056258a8 by task kworker/u16:2/51
>CPU: 2 UID: 0 PID: 51 Comm: kworker/u16:2 Not tainted 7.1.0-rc7-00020-... #15
>Workqueue: events_unbound
>Call Trace:
> <TASK>
> dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
> print_report (mm/kasan/report.c:378 mm/kasan/report.c:482)
> kasan_report (mm/kasan/report.c:595)
> tipc_aead_decrypt_done (net/tipc/crypto.c:999)
> process_one_work (kernel/workqueue.c:3314)
> worker_thread (kernel/workqueue.c:3397 kernel/workqueue.c:3478)
> kthread (kernel/kthread.c:436)
> ret_from_fork (arch/x86/kernel/process.c:158)
> ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
> </TASK>
>
>Allocated by task 169:
> __kasan_kmalloc (mm/kasan/common.c:398 mm/kasan/common.c:415)
> tipc_crypto_start (net/tipc/crypto.c:1502)
> tipc_init_net (net/tipc/core.c:72)
> ops_init (net/core/net_namespace.c:137)
> setup_net (net/core/net_namespace.c:446)
> copy_net_ns (net/core/net_namespace.c:579)
> create_new_namespaces (kernel/nsproxy.c:132)
> unshare_nsproxy_namespaces (kernel/nsproxy.c:234)
> ksys_unshare (kernel/fork.c:3242)
> __x64_sys_unshare (kernel/fork.c:3316)
> do_syscall_64 (arch/x86/entry/syscall_64.c:63)
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
>
>Freed by task 8:
> kfree (mm/slub.c:6566)
> tipc_exit_net (net/tipc/core.c:119)
> ops_undo_list (net/core/net_namespace.c)
> cleanup_net (net/core/net_namespace.c:704)
> process_one_work (kernel/workqueue.c:3314)
> worker_thread (kernel/workqueue.c)
> kthread (kernel/kthread.c:436)
>
>The freed object is the per-netns struct tipc_crypto allocated in
>tipc_crypto_start() at netns creation (crypto.c:1502 on rc7); the async decrypt
>completion then reads aead->crypto->stats from it (crypto.c:999) after
>cleanup_net() -> tipc_exit_net() -> tipc_crypto_stop() has freed it -- the exact
>read/alloc/free triple this patch closes, now on 7.1-rc7 rather than 6.12.92.
>
>One note on the harness: on x86, the in-tree gcm(aes) implementation that the
>SIMD aead wrapper used to register via simd_register_aeads_compat() is, as of
>the aesni rewrite, registered directly with crypto_register_aeads() and
>decrypts synchronously, so the cryptd async window that the original 6.12.92
>splat used does not arise from the stock aesni path on rc7. To exercise the
>same async completion the changelog describes, I forced tipc_aead_decrypt()'s
>completion onto a workqueue in my test tree; the unguarded aead->crypto
>dereference in tipc_aead_decrypt_done() is what KASAN catches, and that code
>is byte-for-byte the unpatched upstream path. The source state is in any case
>unambiguous: tipc_aead_decrypt() on rc7 still lacks
>maybe_get_net(aead->crypto->net), so the completion can outlive the free on
>any config where crypto_aead_decrypt() goes async (e.g. cryptd offload).
>
>Reproduced under KASAN on both v6.12.92 and v7.1-rc7; the decrypt path
>lacks the guard on the latest net tree.

Thanks. Please submit v3 with an updated changelog (add the decoded stack trace and drop the v6.12.92 references); we always prefer the latest tree to an old one.

>
>Thanks,
>Doruk


* [PATCH] net: ehea: unwind probe_port sysfs file on failure
From: Pengpeng Hou @ 2026-06-15  7:00 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kees Cook, netdev, linux-kernel
  Cc: pengpeng

ehea_create_device_sysfs() creates probe_port and then remove_port. If
the second device_create_file() fails, the helper returns the error but
leaves probe_port installed even though probe treats the sysfs setup as
failed.

Remove probe_port on the remove_port creation failure path so the helper
leaves no partial sysfs state behind.

Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
---
 drivers/net/ethernet/ibm/ehea/ehea_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
index ff67c4fd66a3..bfc8699a05b9 100644
--- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
+++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
@@ -3216,6 +3216,8 @@ static int ehea_create_device_sysfs(struct platform_device *dev)
 		goto out;
 
 	ret = device_create_file(&dev->dev, &dev_attr_remove_port);
+	if (ret)
+		device_remove_file(&dev->dev, &dev_attr_probe_port);
 out:
 	return ret;
 }
-- 
2.50.1 (Apple Git-155)
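
The unwind described in the changelog can be illustrated with a small
userspace model. Everything here is a hypothetical stand-in: struct
sysfs_model and the model_* helpers are not kernel APIs, and the injected
-ENOMEM is just one possible failure. The sketch shows that when the second
attribute cannot be created, the first one is removed again before the
error is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device's sysfs attribute state. */
struct sysfs_model {
	bool probe_port;        /* first attribute present */
	bool remove_port;       /* second attribute present */
	bool fail_remove_port;  /* inject a failure on the second create */
};

static int model_create_probe_port(struct sysfs_model *s)
{
	s->probe_port = true;
	return 0;
}

static int model_create_remove_port(struct sysfs_model *s)
{
	if (s->fail_remove_port)
		return -ENOMEM;
	s->remove_port = true;
	return 0;
}

/* Mirrors the control flow of the patched helper. */
static int model_create_device_sysfs(struct sysfs_model *s)
{
	int ret;

	assert(s != NULL);
	ret = model_create_probe_port(s);
	if (ret)
		goto out;

	ret = model_create_remove_port(s);
	if (ret)
		s->probe_port = false;  /* unwind the first attribute */
out:
	return ret;
}
```

With fail_remove_port set, the helper returns -ENOMEM and leaves both
flags false; without it, both attributes are present and the return value
is 0.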



* net: netdev-genl: NETDEV_A_NAPI_PID is the init-ns pid, not the caller's
From: Maoyi Xie @ 2026-06-15  7:01 UTC
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Amritha Nambiar, Simon Horman, David Wei, Stanislav Fomichev,
	Samiullah Khawaja, netdev, linux-kernel, Maoyi Xie

Hi all,

I noticed something in netdev_nl_napi_fill_one() and would appreciate
your view on whether it is a real problem.

It reports the NAPI kthread pid like this:

	if (napi->thread) {
		pid = task_pid_nr(napi->thread);
		if (nla_put_u32(rsp, NETDEV_A_NAPI_PID, pid))

task_pid_nr() returns the pid in the initial pid namespace. It is put
into NETDEV_A_NAPI_PID without any translation to the caller's pid
namespace.

NETDEV_CMD_NAPI_GET has no GENL_ADMIN_PERM and the family is netnsok.
So a caller in a child pid namespace can read it. That caller then sees
the kthread's global pid. The kthread is not in that namespace, so the
value there should be 0.

This looks like the same case as commit 3799c2570982 ("io_uring/fdinfo:
translate SqThread PID through caller's pid_ns").

I checked it with a small reproducer and a fix. From a child pid
namespace the reproducer reads the kthread's global pid. With the fix it
reads 0. I am not sure how much this matters in practice. I would
appreciate it if you could let me know whether it is worth a fix. I am
happy to send the patch.

Thanks,
Maoyi
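
The difference between the two lookups can be sketched in userspace. This
is a simplified model, not kernel code: struct ns_model, struct task_model
and the model_* helpers are hypothetical, and the pid values are made up.
The lookup rule is loosely patterned on how the kernel keeps one pid number
per namespace level, with model_pid_nr() standing in for task_pid_nr() and
model_pid_nr_ns() for a namespace-aware lookup such as task_pid_nr_ns().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NS_LEVEL 4

/* Hypothetical pid namespace: only its nesting depth matters here. */
struct ns_model {
	int level;  /* 0 is the initial namespace */
};

/* A task has one pid number per namespace level it is visible in. */
struct task_model {
	int level;                               /* deepest level the task lives in */
	int nr[MAX_NS_LEVEL];                    /* nr[0] is the init-namespace pid */
	const struct ns_model *ns[MAX_NS_LEVEL]; /* namespace at each level */
};

/* Always reports the init-namespace number, whoever is asking. */
static int model_pid_nr(const struct task_model *t)
{
	return t->nr[0];
}

/* Reports the number as seen from @ns, or 0 if the task is not visible there. */
static int model_pid_nr_ns(const struct task_model *t, const struct ns_model *ns)
{
	assert(ns->level >= 0 && ns->level < MAX_NS_LEVEL);
	if (ns->level <= t->level && t->ns[ns->level] == ns)
		return t->nr[ns->level];
	return 0;
}
```

For a kernel thread that lives only in the initial namespace, a reader in
a child namespace gets the global number from model_pid_nr() but 0 from
model_pid_nr_ns(), which matches the behaviour difference reported above.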


* [PATCH] net: ethtool: mm: Increase FPE verification retry count
From: muhammad.nazim.amirul.nazle.asmade @ 2026-06-15  7:24 UTC
  To: netdev
  Cc: andrew, kuba, davem, edumazet, pabeni, horms, vladimir.oltean,
	faizal.abdul.rahim, linux-kernel

From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>

The current FPE verification retry count is set to 3. However,
the IEEE 802.3br standard does not specify a fixed value for this.
A retry count of 3 may be insufficient when the remote device is
slow to respond during link-up. Increase the retry count to 20 to
improve robustness.

Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
---
 include/linux/ethtool.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index f51346a6a686..9a1b1f5d37a4 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -23,7 +23,7 @@
 #include <uapi/linux/net_tstamp.h>
 
 #define ETHTOOL_MM_MAX_VERIFY_TIME_MS		128
-#define ETHTOOL_MM_MAX_VERIFY_RETRIES		3
+#define ETHTOOL_MM_MAX_VERIFY_RETRIES		20
 
 struct compat_ethtool_rx_flow_spec {
 	u32		flow_type;
-- 
2.43.7



* Re: [PATCH] net: ethtool: mm: Increase FPE verification retry count
From: Nazle Asmade, Muhammad Nazim Amirul @ 2026-06-15  7:26 UTC
  To: Jakub Kicinski
  Cc: netdev@vger.kernel.org, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
	mcoquelin.stm32@gmail.com, alexandre.torgue@foss.st.com,
	rmk+kernel@armlinux.org.uk, maxime.chevallier@bootlin.com,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20260609171750.7c5709ac@kernel.org>

On 10/6/2026 8:17 am, Jakub Kicinski wrote:
> On Thu,  4 Jun 2026 19:56:31 -0700
> muhammad.nazim.amirul.nazle.asmade@altera.com wrote:
>> The current FPE verification retry count is set to 3. However,
>> the IEEE 802.3br standard does not specify a fixed value for this.
>> A retry count of 3 may be insufficient when the remote device is
>> slow to respond during link-up. Increase the retry count to 20 to
>> improve robustness.
> 
> You need to CC the author / expert on this code, please repost
> with the CC fixed.
Reposted, thanks Jakub!

https://lore.kernel.org/all/20260615072436.26128-1-muhammad.nazim.amirul.nazle.asmade@altera.com/


* [syzbot] [net?] possible deadlock in __ethtool_get_link_ksettings
From: syzbot @ 2026-06-15  7:28 UTC
  To: andrew, davem, edumazet, horms, kuba, linux-kernel, netdev,
	pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    2319688890d9 geneve: Fix off-by-one comparing with GRO_LEG..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=169064ae580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a0842261b62cdea8
dashboard link: https://syzkaller.appspot.com/bug?extid=9bb8bd77f3966641f298
compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/51d0476acca3/disk-23196888.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/7b1b6c54f1f5/vmlinux-23196888.xz
kernel image: https://storage.googleapis.com/syzbot-assets/c69ec4023055/bzImage-23196888.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9bb8bd77f3966641f298@syzkaller.appspotmail.com

netlink: 'syz.1.3070': attribute type 10 has an invalid length.
netlink: 40 bytes leftover after parsing attributes in process `syz.1.3070'.
============================================
WARNING: possible recursive locking detected
syzkaller #0 Not tainted
--------------------------------------------
syz.1.3070/18201 is trying to acquire lock:
ffff888079ea6e18 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2837 [inline]
ffff888079ea6e18 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
ffff888079ea6e18 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: __ethtool_get_link_ksettings+0x109/0x250 net/ethtool/ioctl.c:463

but task is already holding lock:
ffff888079ea6e18 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2837 [inline]
ffff888079ea6e18 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
ffff888079ea6e18 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: do_setlink+0x3d4/0x4670 net/core/rtnetlink.c:3117
and the lock comparison function returns 0:

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&dev_instance_lock_key#3);
  lock(&dev_instance_lock_key#3);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by syz.1.3070/18201:
 #0: ffffffff8fdc1540 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
 #0: ffffffff8fdc1540 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
 #0: ffffffff8fdc1540 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_setlink+0x6a7/0xb20 net/core/rtnetlink.c:3527
 #1: ffff888079ea6e18 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: netdev_lock include/linux/netdevice.h:2837 [inline]
 #1: ffff888079ea6e18 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: netdev_lock_ops include/net/netdev_lock.h:42 [inline]
 #1: ffff888079ea6e18 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: do_setlink+0x3d4/0x4670 net/core/rtnetlink.c:3117

stack backtrace:
CPU: 0 UID: 0 PID: 18201 Comm: syz.1.3070 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_deadlock_bug+0x279/0x290 kernel/locking/lockdep.c:3041
 check_deadlock kernel/locking/lockdep.c:3093 [inline]
 validate_chain kernel/locking/lockdep.c:3895 [inline]
 __lock_acquire+0x24bf/0x2cd0 kernel/locking/lockdep.c:5237
 lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5868
 __mutex_lock_common kernel/locking/mutex.c:646 [inline]
 __mutex_lock+0x19d/0x1590 kernel/locking/mutex.c:820
 netdev_lock include/linux/netdevice.h:2837 [inline]
 netdev_lock_ops include/net/netdev_lock.h:42 [inline]
 __ethtool_get_link_ksettings+0x109/0x250 net/ethtool/ioctl.c:463
 __team_port_change_send+0x245/0x560 drivers/net/team/team_core.c:3095
 __team_port_change_check drivers/net/team/team_core.c:3139 [inline]
 team_port_change_check+0x82/0x1d0 drivers/net/team/team_core.c:3160
 team_device_event+0x487/0x570 drivers/net/team/team_core.c:3181
 notifier_call_chain+0x1a5/0x3d0 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2288 [inline]
 call_netdevice_notifiers net/core/dev.c:2302 [inline]
 __dev_notify_flags+0x1aa/0x310 net/core/dev.c:9791
 netif_change_flags+0xde/0x1b0 net/core/dev.c:9820
 dev_change_flags+0x128/0x260 net/core/dev_api.c:68
 vlan_device_event+0x1b4e/0x1f00 net/8021q/vlan.c:494
 notifier_call_chain+0x1a5/0x3d0 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2288 [inline]
 call_netdevice_notifiers net/core/dev.c:2302 [inline]
 __dev_notify_flags+0x1aa/0x310 net/core/dev.c:9791
 netif_change_flags+0xde/0x1b0 net/core/dev.c:9820
 do_setlink+0xdd6/0x4670 net/core/rtnetlink.c:3207
 rtnl_setlink+0x77d/0xb20 net/core/rtnetlink.c:3537
 rtnetlink_rcv_msg+0x802/0xc00 net/core/rtnetlink.c:7068
 netlink_rcv_skb+0x226/0x4a0 net/netlink/af_netlink.c:2556
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x7bb/0x940 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1900
 sock_sendmsg_nosec net/socket.c:787 [inline]
 __sock_sendmsg net/socket.c:802 [inline]
 ____sys_sendmsg+0x9b9/0xa20 net/socket.c:2699
 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2753
 __sys_sendmsg net/socket.c:2785 [inline]
 __do_sys_sendmsg net/socket.c:2790 [inline]
 __se_sys_sendmsg net/socket.c:2788 [inline]
 __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2788
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f50af59ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f50b0467028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f50af815fa0 RCX: 00007f50af59ce59
RDX: 0000000000000000 RSI: 0000200000000080 RDI: 0000000000000003
RBP: 00007f50af632d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f50af816038 R14: 00007f50af815fa0 R15: 00007ffee67f33e8
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [PATCH net-next v2 0/2] net: isolate SKB data area allocations
From: Vlastimil Babka (SUSE) @ 2026-06-15  7:28 UTC
  To: Jakub Kicinski, Pedro Falcato
  Cc: Harry Yoo, Andrew Morton, David S. Miller, Eric Dumazet,
	Paolo Abeni, linux-hardening, linux-mm, netdev, linux-kernel,
	Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin,
	Simon Horman, Jason Xing, Kuniyuki Iwashima, Kees Cook
In-Reply-To: <20260613113303.7562bc59@kernel.org>

On 6/13/26 20:33, Jakub Kicinski wrote:
> On Thu, 11 Jun 2026 13:46:40 +0100 Pedro Falcato wrote:
>> Subject: [PATCH net-next v2 0/2] net: isolate SKB data area allocations
> 
> This doesn't apply to net-next, does patch 2 not apply to mm?
> If neither tree can take both - maybe MM can take the first patch by

OK, I'll take the first patch through the slab tree, in the second PR
planned for next week.

> itself and we will queue patch 2 after the changes propagate during 
> the merge window?



* Re: [PATCH net v3 2/2] ipv6: account for fraggap on the paged allocation path
From: Ido Schimmel @ 2026-06-15  7:30 UTC
  To: Wongi Lee
  Cc: netdev, David Ahern, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, asml.silence, dhowells,
	willemb, Jungwoo Lee
In-Reply-To: <aiq5VQknt1QHcvio@DESKTOP-19IMU7U.localdomain>

On Thu, Jun 11, 2026 at 10:34:13PM +0900, Wongi Lee wrote:
> In __ip6_append_data(), when the paged-allocation branch is taken
> (MSG_MORE / NETIF_F_SG / large fraglen), alloclen and pagedlen are
> computed as
> 
> 	alloclen = fragheaderlen + transhdrlen;
> 	pagedlen = datalen - transhdrlen;
> 
> datalen already includes fraggap (datalen = length + fraggap), but
> the fraggap bytes carried over from the previous skb are copied into
> the new skb's linear area at offset transhdrlen by the subsequent
> skb_copy_and_csum_bits(). The linear area is therefore undersized by
> fraggap bytes while pagedlen is overstated by the same amount, and
> the copy writes past skb->end into the trailing skb_shared_info.

Nit: I agree with the conclusion that the linear area is undersized, but
"copied into the new skb's linear area at offset transhdrlen" is not
accurate:

If fraggap is non-zero, this means that this is not the first skb and
that the transport header length is zero. We copy the gap bytes just
past the fragment headers:

data = skb_put(...);
data += fragheaderlen;
skb_copy_and_csum_bits(..., data + transhdrlen, fraggap) =
skb_copy_and_csum_bits(..., data + 0, fraggap)

> 
> An unprivileged user can trigger this via a UDPv6 socket using
> MSG_MORE together with MSG_SPLICE_PAGES.
> 
> The bad accounting was introduced by commit 773ba4fe9104 ("ipv6:
> avoid partial copy for zc"). Before commit ce650a166335 ("udp6: Fix
> __ip6_append_data()'s handling of MSG_SPLICE_PAGES"), the negative
> copy value caused -EINVAL to be returned. That later commit allowed
> MSG_SPLICE_PAGES to proceed in this case, making the corruption
> triggerable.
> 
> The non-paged branch sets alloclen to fraglen, which already accounts
> for fraggap because datalen does. Bring the paged branch in line by
> adding fraggap to alloclen and subtracting it from pagedlen.
> 
> After this adjustment, copy no longer collapses to -fraggap on the
> paged path, so remove the stale comment describing that old arithmetic.
> 
> Fixes: 773ba4fe9104 ("ipv6: avoid partial copy for zc")
> Signed-off-by: Jungwoo Lee <jwlee2217@gmail.com>
> Signed-off-by: Wongi Lee <qw3rtyp0@gmail.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
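
The length accounting discussed above can be checked with a small worked
example. This is a sketch only: struct frag_lens and the helper names are
hypothetical, the byte counts in the usage note are made-up illustrative
values, and the real function adjusts alloclen further elsewhere. The
"after" formula follows the changelog's description (add fraggap to
alloclen, subtract it from pagedlen).

```c
#include <assert.h>

struct frag_lens {
	int alloclen;  /* bytes reserved in the linear area */
	int pagedlen;  /* bytes expected to arrive as page fragments */
};

/* Paged branch as quoted in the changelog, before the fix. */
static struct frag_lens paged_before(int fragheaderlen, int transhdrlen,
				     int datalen)
{
	struct frag_lens l = {
		.alloclen = fragheaderlen + transhdrlen,
		.pagedlen = datalen - transhdrlen,
	};
	return l;
}

/* Paged branch after the fix: fraggap moves from pagedlen to alloclen. */
static struct frag_lens paged_after(int fragheaderlen, int transhdrlen,
				    int datalen, int fraggap)
{
	struct frag_lens l;

	assert(fraggap >= 0 && fraggap <= datalen);
	l.alloclen = fragheaderlen + transhdrlen + fraggap;
	l.pagedlen = datalen - transhdrlen - fraggap;
	return l;
}

/* Linear bytes actually written: headers plus the copied gap bytes. */
static int linear_needed(int fragheaderlen, int transhdrlen, int fraggap)
{
	return fragheaderlen + transhdrlen + fraggap;
}
```

Take fragheaderlen = 48, transhdrlen = 0 (not the first skb), fraggap = 4
and length = 1000, so datalen = 1004. The old formulas give alloclen = 48
and pagedlen = 1004 while 52 linear bytes are written, which is 4 bytes
short. The new formulas give alloclen = 52 and pagedlen = 1000, and the
total of the two is unchanged.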


* Re: [PATCH net v3 1/2] ipv4: account for fraggap on the paged allocation path
From: Ido Schimmel @ 2026-06-15  7:32 UTC
  To: Wongi Lee
  Cc: netdev, David Ahern, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, asml.silence, dhowells,
	willemb, Jungwoo Lee
In-Reply-To: <aiq491nYg2Qf/f1N@DESKTOP-19IMU7U.localdomain>

On Thu, Jun 11, 2026 at 10:32:39PM +0900, Wongi Lee wrote:
> In __ip_append_data(), when the paged-allocation branch is taken,
> alloclen and pagedlen are computed as
> 
> 	alloclen = fragheaderlen + transhdrlen;
> 	pagedlen = datalen - transhdrlen;
> 
> datalen already includes fraggap, but the fraggap bytes carried over
> from the previous skb are copied into the new skb's linear area at
> offset transhdrlen by the subsequent skb_copy_and_csum_bits(). The
> linear area is therefore undersized by fraggap bytes while pagedlen is
> overstated by the same amount.
> 
> The non-paged branch sets alloclen to fraglen, which already accounts
> for fraggap because datalen does. Bring the paged branch in line by
> adding fraggap to alloclen and subtracting it from pagedlen.
> 
> After this adjustment, copy no longer collapses to -fraggap on the
> paged path, so remove the stale comment describing that old arithmetic.
> 
> Fixes: 8eb77cc73977 ("ipv4: avoid partial copy for zc")
> Signed-off-by: Jungwoo Lee <jwlee2217@gmail.com>
> Signed-off-by: Wongi Lee <qw3rtyp0@gmail.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>


* [PATCH bpf 0/2] Fix partial copy of non-linear skb test_run output
From: Sun Jian @ 2026-06-15  7:38 UTC
  To: bpf
  Cc: netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, shuah,
	paul.chaignon, Sun Jian

This series fixes BPF_PROG_TEST_RUN copy-out handling for non-linear skbs
when userspace provides a short data_out buffer.

Patch 1 fixes bpf_test_finish() to compute the skb linear head copy length
from the skb layout instead of deriving it from the clamped copy size.

Patch 2 adds a selftest covering a non-linear skb with a short data_out
buffer. The test checks that test_run returns -ENOSPC, reports the full
packet length through data_size_out, and copies the packet prefix into
data_out.

Tested with:

./test_progs -t skb_load_bytes
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

./test_progs -t skb_load_bytes -v
test_nonlinear_data_out_partial:PASS:nonlinear_partial_err
test_nonlinear_data_out_partial:PASS:nonlinear_partial_data_size_out
test_nonlinear_data_out_partial:PASS:nonlinear_partial_data_out
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

./test_progs -t skb
Summary: 14/92 PASSED, 0 SKIPPED, 0 FAILED

Sun Jian (2):
  bpf: Fix partial copy of non-linear skb test_run output
  selftests/bpf: Cover partial copy of non-linear skb test_run output

 net/bpf/test_run.c                            | 11 +++---
 .../selftests/bpf/prog_tests/skb_load_bytes.c | 35 +++++++++++++++++++
 2 files changed, 39 insertions(+), 7 deletions(-)

-- 
2.43.0


* [PATCH bpf 1/2] bpf: Fix partial copy of non-linear skb test_run output
From: Sun Jian @ 2026-06-15  7:38 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, shuah,
	paul.chaignon, Sun Jian
In-Reply-To: <20260615073856.152479-1-sun.jian.kdev@gmail.com>

For non-linear skbs, bpf_test_finish() derives the linear head copy
length from copy_size - frag_size. This only matches the skb head length
when copy_size is the full packet size.

When userspace provides a short data_out buffer, copy_size is clamped to
that buffer size. If copy_size is smaller than frag_size, the computed
length becomes negative and bpf_test_finish() returns -ENOSPC before
copying the packet prefix or updating data_size_out.

Compute the linear head length from the skb layout instead, and clamp the
head copy length to copy_size. This preserves the expected partial-copy
semantics: return -ENOSPC, copy the packet prefix that fits in data_out,
and report the full packet length through data_size_out.

Fixes: 838baa351cee ("bpf: Craft non-linear skbs in BPF_PROG_TEST_RUN")
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
---
 net/bpf/test_run.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 2bc04feadfab..976e8fa31bc9 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -453,19 +453,16 @@ static int bpf_test_finish(const union bpf_attr *kattr,
 	}

 	if (data_out) {
-		int len = sinfo ? copy_size - frag_size : copy_size;
-
-		if (len < 0) {
-			err = -ENOSPC;
-			goto out;
-		}
+		u32 head_len = size - frag_size;
+		u32 len = min(copy_size, head_len);

 		if (copy_to_user(data_out, data, len))
 			goto out;

 		if (sinfo) {
-			int i, offset = len;
+			u32 offset = len;
 			u32 data_len;
+			int i;

 			for (i = 0; i < sinfo->nr_frags; i++) {
 				skb_frag_t *frag = &sinfo->frags[i];
-- 
2.43.0
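
The arithmetic behind this change can be modelled in userspace. The
following is only a minimal sketch, not the kernel code: the helper names
old_head_copy_len() and new_head_copy_len() are hypothetical, and the
figures used in the note below (9000-byte packet, 64-byte head, 100-byte
data_out) are borrowed from the selftest in patch 2/2.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old calculation: derive the head copy length from the (possibly clamped)
 * copy size. The result is negative whenever copy_size < frag_size.
 */
static int64_t old_head_copy_len(uint32_t copy_size, uint32_t frag_size)
{
	return (int64_t)copy_size - (int64_t)frag_size;
}

/*
 * New calculation: take the head length from the packet layout
 * (size - frag_size), then clamp it to what fits in the user buffer.
 */
static uint32_t new_head_copy_len(uint32_t size, uint32_t frag_size,
				  uint32_t copy_size)
{
	uint32_t head_len;

	assert(frag_size <= size);	/* fragments are part of the packet */
	head_len = size - frag_size;
	return copy_size < head_len ? copy_size : head_len;
}
```

With size = 9000, frag_size = 8936 and a 100-byte data_out, the old
expression gives 100 - 8936, which is negative and takes the early -ENOSPC
exit, while the new one gives min(100, 64) = 64. When data_out is large
enough for the whole packet, both agree on 64.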

^ permalink raw reply related

* [PATCH bpf 2/2] selftests/bpf: Cover partial copy of non-linear skb test_run output
From: Sun Jian @ 2026-06-15  7:38 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, shuah,
	paul.chaignon, Sun Jian
In-Reply-To: <20260615073856.152479-1-sun.jian.kdev@gmail.com>

Add a test case for BPF_PROG_TEST_RUN with a non-linear skb and a short
data_out buffer.

The test verifies that test_run returns -ENOSPC, reports the full packet
length through data_size_out, and copies the packet prefix into data_out.
The test uses a 100-byte data_out buffer with a 64-byte linear head, so the
expected output spans both the skb head and the first fragment.

Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
---
 .../selftests/bpf/prog_tests/skb_load_bytes.c | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/skb_load_bytes.c b/tools/testing/selftests/bpf/prog_tests/skb_load_bytes.c
index d7f83c0a40a5..134be0ea8ed7 100644
--- a/tools/testing/selftests/bpf/prog_tests/skb_load_bytes.c
+++ b/tools/testing/selftests/bpf/prog_tests/skb_load_bytes.c
@@ -3,6 +3,39 @@
 #include <network_helpers.h>
 #include "skb_load_bytes.skel.h"
 
+#define NONLINEAR_PKT_LEN 9000
+#define NONLINEAR_HEAD_LEN 64
+#define SHORT_OUT_LEN 100
+
+static void test_nonlinear_data_out_partial(int prog_fd)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, tattr);
+	__u8 pkt[NONLINEAR_PKT_LEN];
+	__u8 out[SHORT_OUT_LEN];
+	struct __sk_buff skb = {};
+	int err, i;
+
+	for (i = 0; i < sizeof(pkt); i++)
+		pkt[i] = i & 0xff;
+
+	memset(out, 0xa5, sizeof(out));
+
+	skb.data_end = NONLINEAR_HEAD_LEN;
+
+	tattr.data_in = pkt;
+	tattr.data_size_in = sizeof(pkt);
+	tattr.data_out = out;
+	tattr.data_size_out = sizeof(out);
+	tattr.ctx_in = &skb;
+	tattr.ctx_size_in = sizeof(skb);
+
+	err = bpf_prog_test_run_opts(prog_fd, &tattr);
+
+	ASSERT_EQ(err, -ENOSPC, "nonlinear_partial_err");
+	ASSERT_EQ(tattr.data_size_out, sizeof(pkt), "nonlinear_partial_data_size_out");
+	ASSERT_OK(memcmp(out, pkt, sizeof(out)), "nonlinear_partial_data_out");
+}
+
 void test_skb_load_bytes(void)
 {
 	struct skb_load_bytes *skel;
@@ -40,6 +73,8 @@ void test_skb_load_bytes(void)
 	if (!ASSERT_EQ(test_result, 0, "offset 10"))
 		goto out;
 
+	test_nonlinear_data_out_partial(prog_fd);
+
 out:
 	skb_load_bytes__destroy(skel);
 }
-- 
2.43.0
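
To illustrate why a 100-byte data_out with a 64-byte head exercises both the
head and the first fragment, here is a small userspace sketch of a prefix
copy over a segmented packet. It is an assumption-laden model, not the
test_run code: struct seg, copy_prefix(), short_out_matches() and the
4096-byte fragment size are all hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PKT_LEN  9000	/* NONLINEAR_PKT_LEN in the selftest */
#define HEAD_LEN 64	/* NONLINEAR_HEAD_LEN in the selftest */
#define FRAG_LEN 4096	/* hypothetical fragment size, illustration only */
#define MAX_SEGS 4	/* head + ceil((9000 - 64) / 4096) fragments */

struct seg {
	const unsigned char *data;
	size_t len;
};

/* Copy at most out_len bytes from the segments, in order. */
static size_t copy_prefix(unsigned char *out, size_t out_len,
			  const struct seg *segs, size_t nsegs)
{
	size_t off = 0, i;

	for (i = 0; i < nsegs && off < out_len; i++) {
		size_t n = segs[i].len;

		if (n > out_len - off)
			n = out_len - off;	/* clamp to the remaining room */
		memcpy(out + off, segs[i].data, n);
		off += n;
	}
	return off;
}

/* Returns 1 if an out_len-byte buffer receives exactly the packet prefix. */
static int short_out_matches(size_t out_len)
{
	static unsigned char pkt[PKT_LEN], out[PKT_LEN + 1];
	struct seg segs[MAX_SEGS];
	size_t i, off = HEAD_LEN, n = 1;

	assert(out_len <= PKT_LEN);
	for (i = 0; i < PKT_LEN; i++)
		pkt[i] = i & 0xff;	/* same byte pattern as the selftest */
	memset(out, 0xa5, sizeof(out));	/* poison, as the selftest does */

	segs[0].data = pkt;		/* linear head */
	segs[0].len = HEAD_LEN;
	while (off < PKT_LEN) {		/* remaining bytes go into fragments */
		size_t len = PKT_LEN - off < FRAG_LEN ? PKT_LEN - off : FRAG_LEN;

		assert(n < MAX_SEGS);
		segs[n].data = pkt + off;
		segs[n].len = len;
		off += len;
		n++;
	}

	return copy_prefix(out, out_len, segs, n) == out_len &&
	       memcmp(out, pkt, out_len) == 0 &&
	       out[out_len] == 0xa5;	/* nothing written past out_len */
}
```

For out_len = 100 the copy takes 64 bytes from the head and the remaining 36
from the first fragment, which is the case the new selftest checks with
memcmp().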


^ permalink raw reply related

* Re: [PATCH net-next 06/15] igb: Retrieve Tx timestamp from BH workqueue
From: Kurt Kanzenbach @ 2026-06-15  7:49 UTC (permalink / raw)
  To: Jakub Kicinski, anthony.l.nguyen
  Cc: Simon Horman, davem, pabeni, edumazet, andrew+netdev, netdev, ade,
	dima.ruinskiy, jacob.e.keller, dish, avigailx.dahan
In-Reply-To: <20260613164306.567d1cc6@kernel.org>

On Sat Jun 13 2026, Jakub Kicinski wrote:
> On Fri, 12 Jun 2026 10:57:44 +0100 Simon Horman wrote:
>> >  		/* reschedule to check later */
>> > -		schedule_work(&adapter->ptp_tx_work);
>> > +		queue_work(system_bh_wq, &adapter->ptp_tx_work);  
>> 
>> [Severity: High]
>> If the hardware timestamp is not yet valid, won't this work item
>> unconditionally reschedule itself to system_bh_wq without delay? 
>
> This sounds correct, the patch is basically busy polling from a tasklet.
> BH workqueue is not just another work queue.

Indeed, thanks for the report. Please drop this patch for now. I'll get
an Intel 82576 NIC and see how this can be fixed.

Thanks,
Kurt

^ permalink raw reply

* Re: [PATCH] nbd: Reclassify sockets to avoid lockdep circular dependency
From: Eric Dumazet @ 2026-06-15  7:53 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-kernel, Jens Axboe, linux-block, nbd, Kuniyuki Iwashima,
	netdev, syzbot+607cdcf978b3e79da878
In-Reply-To: <20260613101214.1771-1-hdanton@sina.com>

On Sat, Jun 13, 2026 at 3:12 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Sat, 13 Jun 2026 04:26:19 +0000 Eric Dumazet wrote:
> > syzbot reported a possible circular locking dependency in udp_sendmsg()
> > where fs_reclaim can be triggered while holding sk_lock, and fs_reclaim
> > can eventually depend on another sk_lock (e.g., if NBD is used for swap
> > or writeback and NBD uses TLS/TCP which acquires sk_lock).
> >
> > Since the UDP socket and the NBD TCP/TLS socket are different, this is a
> > false positive. Fix this by reclassifying NBD sockets to a separate lock
> > class when they are added to the NBD device.
> >
> > This is similar to what nvme-tcp and other network block devices do.
> >
> > Fixes: ffa1e7ada456 ("block: Make request_queue lockdep splats show up earlier")
>
> Given the Fixes tag, can you specify anything wrong that commit added?

Nothing 'wrong'.

This (good) commit allowed LOCKDEP to throw a warning and eventually
panic the box.

A Fixes: tag does not imply the patch was wrong.

^ permalink raw reply

* Re: [PATCH net] igb: only strip Rx timestamp header on the first buffer of a frame
From: Kurt Kanzenbach @ 2026-06-15  7:43 UTC (permalink / raw)
  To: Tjerk Kusters, netdev@vger.kernel.org
  Cc: intel-wired-lan@lists.osuosl.org, anthony.l.nguyen@intel.com,
	przemyslaw.kitszel@intel.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, richardcochran@gmail.com, hawk@kernel.org,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <PAWPR05MB1069106D52F4E17F1EDB99C67B9182@PAWPR05MB10691.eurprd05.prod.outlook.com>

Hi,

On Fri Jun 12 2026, Tjerk Kusters wrote:
> Hi,
>
> The patch is attached (0001-igb-only-strip-Rx-timestamp-header-on-the-first-buff.patch)
> as my mail setup cannot send it inline via git send-email; apologies for the
> attachment.

b4 has a web submission endpoint. Maybe you can use that one:

https://b4.docs.kernel.org/en/latest/contributor/send.html

[snip]

> From fee3e3452dfcd7e109332369672a3e0090cadeb3 Mon Sep 17 00:00:00 2001
> From: T Kusters <tkusters@aweta.nl>
> Date: Tue, 9 Jun 2026 14:06:24 +0200
> Subject: [PATCH net] igb: only strip Rx timestamp header on the first buffer
>  of a frame
>
> When Rx hardware timestamping is enabled (e.g. ptp4l, which configures
> HWTSTAMP_FILTER_ALL), the NIC prepends a 16-byte timestamp header to the
> first Rx buffer of every received frame. igb_clean_rx_irq() strips this
> header inside its per-buffer loop:
>
> 	if (igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
> 		ts_hdr_len = igb_ptp_rx_pktstamp(rx_ring->q_vector,
> 						 pktbuf, &timestamp);
> 		pkt_offset += ts_hdr_len;
> 		size -= ts_hdr_len;
> 	}
>
> For a frame that spans more than one Rx buffer (e.g. a jumbo frame), this
> block runs once per buffer. The timestamp header only exists at the start
> of the first buffer, but igb_ptp_rx_pktstamp() is called for every buffer.
>
> On a continuation buffer the data is packet payload, not a timestamp
> header. igb_ptp_rx_pktstamp() already has two guards against acting on a
> non-header buffer: it returns 0 if PTP is disabled, and returns 0 if the
> reserved dwords (the first 8 bytes) are non-zero. Neither is sufficient
> here: PTP is enabled, and a continuation buffer whose payload happens to
> begin with 8 zero bytes passes the reserved-dword check. In that case the
> payload is mistaken for a valid timestamp header and igb_ptp_rx_pktstamp()
> returns IGB_TS_HDR_LEN, so the caller strips 16 bytes of real data from
> that buffer. A frame spanning N buffers whose continuation buffers start
> with zero bytes therefore loses 16 * (N - 1) bytes from its tail.
>
> This is easily triggered by a GigE Vision camera streaming dark frames
> (mostly 0x00 pixel data) over jumbo UDP with PTP active on the receiver:
> the all-zero frames arrive truncated while frames with non-zero content
> are fine. There is no error indication.
>
> No content-based check can reliably tell a continuation buffer that begins
> with zero bytes from a real timestamp header, because both are all zero.
> Fix it structurally instead: only attempt the strip on the first buffer of
> a frame, which is the only buffer that can contain a timestamp header. In
> igb_clean_rx_irq() skb is NULL until the first buffer has been processed,
> so guarding the strip with !skb restricts it to the first buffer
> regardless of payload content.
>
> Fixes: 5379260852b0 ("igb: Fix XDP with PTP enabled")
> Cc: stable@vger.kernel.org
> Signed-off-by: T Kusters <tkusters@aweta.nl>

Great explanation! igb_clean_rx_irq_zc() does not need the same
treatment, correct?

Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>

> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index ce91dda00ec0..abb55cd589a9 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -9061,7 +9061,8 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
>  		pktbuf = page_address(rx_buffer->page) + rx_buffer->page_offset;
>  
>  		/* pull rx packet timestamp if available and valid */
> -		if (igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
> +		if (!skb &&
> +		    igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
>  			int ts_hdr_len;
>  
>  			ts_hdr_len = igb_ptp_rx_pktstamp(rx_ring->q_vector,
> -- 
> 2.27.0
>
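
The truncation described in the quoted commit message can be reproduced with
a small userspace model. This is only a sketch under stated assumptions: the
struct rx_buf type, the helper names and the 2048-byte buffer size are
hypothetical, and ts_hdr_len() only imitates the reserved-dword check the
commit message describes (first 8 bytes zero), not the real
igb_ptp_rx_pktstamp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TS_HDR_LEN 16	/* IGB_TS_HDR_LEN in the commit message */
#define BUF_LEN    2048	/* hypothetical Rx buffer size */
#define MAX_BUFS   8	/* upper bound for this model only */

struct rx_buf {
	const unsigned char *data;
	size_t len;
};

/* Accept a "timestamp header" whenever the first 8 bytes are zero. */
static size_t ts_hdr_len(const struct rx_buf *buf)
{
	static const unsigned char zero[8];

	if (buf->len < TS_HDR_LEN || memcmp(buf->data, zero, sizeof(zero)) != 0)
		return 0;
	return TS_HDR_LEN;
}

/*
 * Sum the bytes delivered for one frame. With first_buffer_only == 0 the
 * strip is attempted on every buffer (old behaviour); with 1 it is limited
 * to buffer 0, which mirrors the !skb guard.
 */
static size_t frame_payload_len(const struct rx_buf *bufs, size_t nbufs,
				int first_buffer_only)
{
	size_t total = 0, i;

	for (i = 0; i < nbufs; i++) {
		size_t size = bufs[i].len;

		if (!first_buffer_only || i == 0)
			size -= ts_hdr_len(&bufs[i]);
		total += size;
	}
	return total;
}

/* An all-zero ("dark") frame spread over nbufs full buffers. */
static size_t dark_frame_payload_len(size_t nbufs, int first_buffer_only)
{
	static const unsigned char zeros[BUF_LEN];
	struct rx_buf bufs[MAX_BUFS];
	size_t i;

	assert(nbufs <= MAX_BUFS);
	for (i = 0; i < nbufs; i++) {
		bufs[i].data = zeros;
		bufs[i].len = BUF_LEN;
	}
	return frame_payload_len(bufs, nbufs, first_buffer_only);
}
```

For a three-buffer dark frame the unguarded loop delivers 16 * (3 - 1) = 32
bytes fewer than the guarded one, matching the 16 * (N - 1) loss stated in
the commit message.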

^ permalink raw reply

* [PATCH] nfc: fdp: reject an oversized device-reported packet length
From: Bryam Vargas via B4 Relay @ 2026-06-15  8:04 UTC (permalink / raw)
  To: David Heidelberg
  Cc: linux-kernel, Robert Dolca, netdev, oe-linux-nfc, Samuel Ortiz,
	Kang Chen

From: Bryam Vargas <hexlabsecurity@proton.me>

fdp_nci_i2c_read() reads the length of the next packet from the device
into phy->next_read_size and uses it as the i2c_master_recv() byte count
into a fixed on-stack buffer:

	u8 tmp[FDP_NCI_I2C_MAX_PAYLOAD];		/* 261 bytes */
	...
	len = phy->next_read_size;
	r = i2c_master_recv(client, tmp, len);

When a "length packet" arrives (tmp[0] == 0 && tmp[1] == 0), the next
length is taken verbatim from two device-supplied bytes:

	phy->next_read_size = (tmp[2] << 8) + tmp[3] + 3;

next_read_size is a u16, so this can be driven as high as 65535 - far
larger than the 261-byte tmp[] buffer - and it is never bounded before
the next iteration's i2c_master_recv(). A malfunctioning, malicious or
counterfeit FDP NFC controller (or an attacker tampering with the I2C
bus) that sends such a length packet makes i2c_master_recv() write up to
about 64 KB into the 261-byte on-stack buffer: a stack out-of-bounds
write that clobbers the stack canary, saved registers and the return
address.

Reject a next_read_size larger than the receive buffer the same way a
corrupted packet is already handled - drop it and force resynchronization
- so a device can never drive an over-length read.

Fixes: a06347c04c13 ("NFC: Add Intel Fields Peak NFC solution driver")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
I reproduced the out-of-bounds write with an in-kernel test that drives
the fdp_nci_i2c_read() buffer geometry verbatim under KASAN
(CONFIG_KASAN_STACK=y), modelling i2c_master_recv() delivering
next_read_size device bytes into the 261-byte tmp[] buffer:

  next_read_size = 281, no bound:
    BUG: KASAN: stack-out-of-bounds in i2c_master_recv...
    Write of size 281 ... [48, 309) 'tmp'   (the 261-byte buffer)
  with the device length bounded to <= FDP_NCI_I2C_MAX_PAYLOAD (what this
    patch enforces): no KASAN report.
  a well-formed packet (length <= 261) is unaffected, no KASAN report.

The full device range - next_read_size = 65535 (tmp[2] = 0xff,
tmp[3] = 0xfc; any larger tmp[3] wraps the u16 field), a 65535-byte write =
65274 bytes past the buffer, smashing the stack canary and the return
address - reproduces the same way under userspace AddressSanitizer on
both -m32 and -m64.
---
 drivers/nfc/fdp/i2c.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/nfc/fdp/i2c.c b/drivers/nfc/fdp/i2c.c
index c1896a1d978c..0392bb49bb4b 100644
--- a/drivers/nfc/fdp/i2c.c
+++ b/drivers/nfc/fdp/i2c.c
@@ -166,6 +166,20 @@ static int fdp_nci_i2c_read(struct fdp_i2c_phy *phy, struct sk_buff **skb)
 		/* Packet that contains a length */
 		if (tmp[0] == 0 && tmp[1] == 0) {
 			phy->next_read_size = (tmp[2] << 8) + tmp[3] + 3;
+
+			/*
+			 * next_read_size is taken from the device and is used
+			 * as the i2c_master_recv() count on the next iteration.
+			 * A value larger than the receive buffer would overflow
+			 * tmp[]; treat it like a corrupted packet and force
+			 * resynchronization.
+			 */
+			if (phy->next_read_size > FDP_NCI_I2C_MAX_PAYLOAD) {
+				dev_dbg(&client->dev, "%s: corrupted packet\n",
+					__func__);
+				phy->next_read_size = FDP_NCI_I2C_MIN_PAYLOAD;
+				goto flush;
+			}
 		} else {
 			phy->next_read_size = FDP_NCI_I2C_MIN_PAYLOAD;
 

---
base-commit: 8e65320d91cdc3b241d4b94855c88459b91abf66
change-id: 20260615-b4-disp-f42dce2d-055035ea37ba

Best regards,
-- 
Bryam Vargas <hexlabsecurity@proton.me>
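
The length computation and the bound added by this patch can be checked in
isolation. Below is a minimal userspace sketch, not the driver code: the
helper names are hypothetical, 261 is the FDP_NCI_I2C_MAX_PAYLOAD value
quoted in the patch description, and the value 5 for
FDP_NCI_I2C_MIN_PAYLOAD is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PAYLOAD 261	/* FDP_NCI_I2C_MAX_PAYLOAD, size of tmp[] */
#define MIN_PAYLOAD 5	/* FDP_NCI_I2C_MIN_PAYLOAD; value assumed here */

/* Length as stored in the u16 next_read_size field: wraps modulo 65536. */
static uint16_t next_read_size(uint8_t hi, uint8_t lo)
{
	return (uint16_t)((hi << 8) + lo + 3);
}

/* The new check: anything larger than the receive buffer is rejected. */
static int is_oversized(uint16_t len)
{
	return len > MAX_PAYLOAD;
}

/* Model of the resync path: fall back to the minimum read size. */
static uint16_t sanitized_read_size(uint8_t hi, uint8_t lo)
{
	uint16_t len = next_read_size(hi, lo);

	if (is_oversized(len))
		len = MIN_PAYLOAD;
	assert(len <= MAX_PAYLOAD);	/* must always fit the 261-byte buffer */
	return len;
}
```

For example, the bytes 0x01, 0x16 give 256 + 22 + 3 = 281, the value used in
the KASAN reproduction, and are rejected; 0x01, 0x02 give exactly 261 and
pass unchanged; 0xff, 0xfc give 65535, the largest value the field can hold.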



^ permalink raw reply related

* Re: [PATCH v3 3/4] drm/xe/ras: Add support for error threshold
From: Tauro, Riana @ 2026-06-15  8:17 UTC (permalink / raw)
  To: Raag Jadav, intel-xe, dri-devel, netdev
  Cc: simona.vetter, airlied, kuba, lijo.lazar, Hawking.Zhang, davem,
	pabeni, edumazet, dev, zachary.mckevitt, rodrigo.vivi,
	michal.wajdeczko, matthew.d.roper, mallesh.koujalagi
In-Reply-To: <20260604184849.1011985-4-raag.jadav@intel.com>


On 05-06-2026 00:16, Raag Jadav wrote:
> System controller allows getting/setting per counter threshold, which it

for correctable errors.

> uses to raise error events to the driver. Get/set it using the respective
> mailbox command.
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
> v2: Add RAS operation status codes (Riana)
> v3: Reuse status codes and uapi mapping from counter series (Riana)
>      Access request/response counter using local pointer (Riana)
>      Mark unused field as reserved (Riana)
> ---
>   drivers/gpu/drm/xe/xe_ras.c                   | 105 ++++++++++++++++++
>   drivers/gpu/drm/xe/xe_ras.h                   |   2 +
>   drivers/gpu/drm/xe/xe_ras_types.h             |  51 +++++++++
>   drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |   4 +
>   4 files changed, 162 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 7cb6fcb1254a..d6f89b429cec 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -270,6 +270,111 @@ int xe_ras_clear_counter(struct xe_device *xe, u8 severity, u8 component)
>   	return 0;
>   }
>   
> +/**
> + * xe_ras_get_threshold() - Get error counter threshold
> + * @xe: Xe device instance
> + * @severity: Error severity to be queried (&enum drm_xe_ras_error_severity)
> + * @component: Error component to be queried (&enum drm_xe_ras_error_component)
> + * @threshold: Counter threshold
> + *
> + * This function retrieves the error threshold of a specific counter based on
> + * severity and component.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int xe_ras_get_threshold(struct xe_device *xe, u8 severity, u8 component, u32 *threshold)
> +{
> +	struct xe_ras_get_threshold_response response = {};
> +	struct xe_ras_get_threshold_request request = {};
> +	struct xe_sysctrl_mailbox_command command = {};
> +	struct xe_ras_error_class *counter;
> +	size_t len;
> +	int ret;
> +
> +	counter = &request.counter;
> +	counter->common.severity = drm_to_xe_ras_severity(severity);
> +	counter->common.component = drm_to_xe_ras_component(component);
> +
> +	xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP, XE_SYSCTRL_CMD_GET_THRESHOLD,
> +				  &request, sizeof(request), &response, sizeof(response));
> +
> +	guard(xe_pm_runtime)(xe);
> +	ret = xe_sysctrl_send_command(&xe->sc, &command, &len);
> +	if (ret) {
> +		xe_err(xe, "sysctrl: failed to get threshold %d\n", ret);
> +		return ret;
> +	}
> +
> +	if (len != sizeof(response)) {
> +		xe_err(xe, "sysctrl: unexpected get threshold response length %zu (expected %zu)\n",
> +		       len, sizeof(response));
> +		return -EIO;
> +	}
> +
> +	counter = &response.counter;
> +	*threshold = response.threshold;
> +
> +	xe_dbg(xe, "[RAS]: get counter threshold %u for %s %s\n", *threshold,
> +	       comp_to_str(counter->common.component), sev_to_str(counter->common.severity));

Use "get threshold" here, to be consistent with the other prints, which
follow <operation> <value> <component> <severity>.

> +	return 0;
> +}
> +
> +/**
> + * xe_ras_set_threshold() - Set error counter threshold
> + * @xe: Xe device instance
> + * @severity: Error severity to be set (&enum drm_xe_ras_error_severity)
> + * @component: Error component to be set (&enum drm_xe_ras_error_component)
> + * @threshold: Counter threshold
> + *
> + * This function sets the error threshold of a specific counter based on
> + * severity and component.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int xe_ras_set_threshold(struct xe_device *xe, u8 severity, u8 component, u32 threshold)
> +{
> +	struct xe_ras_set_threshold_response response = {};
> +	struct xe_ras_set_threshold_request request = {};
> +	struct xe_sysctrl_mailbox_command command = {};
> +	struct xe_ras_error_class *counter;
> +	size_t len;
> +	int ret;
> +
> +	counter = &request.counter;
> +	counter->common.severity = drm_to_xe_ras_severity(severity);
> +	counter->common.component = drm_to_xe_ras_component(component);
> +	request.threshold = threshold;
> +
> +	xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP, XE_SYSCTRL_CMD_SET_THRESHOLD,
> +				  &request, sizeof(request), &response, sizeof(response));
> +
> +	guard(xe_pm_runtime)(xe);
> +	ret = xe_sysctrl_send_command(&xe->sc, &command, &len);
> +	if (ret) {
> +		xe_err(xe, "sysctrl: failed to set threshold %d\n", ret);
> +		return ret;
> +	}
> +
> +	if (len != sizeof(response)) {
> +		xe_err(xe, "sysctrl: unexpected set threshold response length %zu (expected %zu)\n",
> +		       len, sizeof(response));
> +		return -EIO;
> +	}
> +
> +	ret = ras_status_to_errno(response.status);
> +	if (ret) {
> +		xe_err(xe, "sysctrl: set threshold command failed with status %#x\n",
> +		       response.status);
> +		return ret;
> +	}
> +
> +	counter = &response.counter;
> +
> +	xe_dbg(xe, "[RAS]: set counter threshold %u for %s %s\n", response.threshold,

Same here: "set threshold".

> +	       comp_to_str(counter->common.component), sev_to_str(counter->common.severity));
> +	return 0;
> +}
> +
>   /**
>    * xe_ras_init - Initialize Xe RAS
>    * @xe: xe device instance
> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
> index ba0b0224df23..1aa43c54b710 100644
> --- a/drivers/gpu/drm/xe/xe_ras.h
> +++ b/drivers/gpu/drm/xe/xe_ras.h
> @@ -15,6 +15,8 @@ void xe_ras_counter_threshold_crossed(struct xe_device *xe,
>   				      struct xe_sysctrl_event_response *response);
>   int xe_ras_get_counter(struct xe_device *xe, u8 severity, u8 component, u32 *value);
>   int xe_ras_clear_counter(struct xe_device *xe, u8 severity, u8 component);
> +int xe_ras_get_threshold(struct xe_device *xe, u8 severity, u8 component, u32 *threshold);
> +int xe_ras_set_threshold(struct xe_device *xe, u8 severity, u8 component, u32 threshold);
>   void xe_ras_init(struct xe_device *xe);
>   
>   #endif
> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
> index c6392435d1c6..8ea817583eed 100644
> --- a/drivers/gpu/drm/xe/xe_ras_types.h
> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
> @@ -121,4 +121,55 @@ struct xe_ras_clear_counter_response {
>   	/** @reserved1: Reserved for future use */
>   	u32 reserved1[3];
>   } __packed;
> +
> +/**
> + * struct xe_ras_get_threshold_request - Request structure for get threshold
> + */
> +struct xe_ras_get_threshold_request {
> +	/** @counter: Counter to get threshold for */
> +	struct xe_ras_error_class counter;
> +	/** @reserved: Reserved for future use */
> +	u32 reserved;
> +} __packed;
> +
> +/**
> + * struct xe_ras_get_threshold_response - Response structure for get threshold
> + */
> +struct xe_ras_get_threshold_response {
> +	/** @counter: Counter ID */
> +	struct xe_ras_error_class counter;
> +	/** @threshold: Threshold value */
> +	u32 threshold;
> +	/** @reserved: Reserved for future use */
> +	u32 reserved[4];
> +} __packed;
> +
> +/**
> + * struct xe_ras_set_threshold_request - Request structure for set threshold
> + */
> +struct xe_ras_set_threshold_request {
> +	/** @counter: Counter to set threshold for */
> +	struct xe_ras_error_class counter;
> +	/** @threshold: Threshold value to set */
> +	u32 threshold;
> +	/** @reserved: Reserved for future use */
> +	u32 reserved;
> +} __packed;
> +
> +/**
> + * struct xe_ras_set_threshold_response - Response structure for set threshold
> + */
> +struct xe_ras_set_threshold_response {
> +	/** @counter: Counter ID */
> +	struct xe_ras_error_class counter;
> +	/** @reserved: Reserved */
> +	u32 reserved;
> +	/** @threshold: Updated threshold value */
> +	u32 threshold;
> +	/** @status: Set threshold operation status */

Nit: This is already part of the set threshold response, so "Operation status"
is enough.

Thanks
Riana

> +	u32 status;
> +	/** @reserved1: Reserved for future use */
> +	u32 reserved1[2];
> +} __packed;
> +
>   #endif
> diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> index 6e3753554510..10f06aa5c4b5 100644
> --- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> +++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
> @@ -24,11 +24,15 @@ enum xe_sysctrl_group {
>    *
>    * @XE_SYSCTRL_CMD_GET_COUNTER: Get error counter value
>    * @XE_SYSCTRL_CMD_CLEAR_COUNTER: Clear error counter value
> + * @XE_SYSCTRL_CMD_GET_THRESHOLD: Retrieve error threshold
> + * @XE_SYSCTRL_CMD_SET_THRESHOLD: Set error threshold
>    * @XE_SYSCTRL_CMD_GET_PENDING_EVENT: Retrieve pending event
>    */
>   enum xe_sysctrl_gfsp_cmd {
>   	XE_SYSCTRL_CMD_GET_COUNTER		= 0x03,
>   	XE_SYSCTRL_CMD_CLEAR_COUNTER		= 0x04,
> +	XE_SYSCTRL_CMD_GET_THRESHOLD		= 0x05,
> +	XE_SYSCTRL_CMD_SET_THRESHOLD		= 0x06,
>   	XE_SYSCTRL_CMD_GET_PENDING_EVENT	= 0x07,
>   };
>   

^ permalink raw reply

* Re: [PATCH net-next v3 0/2] net: dsa: realtek: rtl8365mb: add SGMII/HSGMII support for RTL8367S
From: Mieczyslaw Nalewaj @ 2026-06-15  7:23 UTC (permalink / raw)
  To: Johan Alvarado, linusw, alsi, andrew, olteanv, davem, edumazet,
	kuba, pabeni, netdev
  Cc: linux, luizluca, linux-kernel
In-Reply-To: <0100019ec34ab9b0-cd42493d-62f2-4bd7-9ace-2e4f8e41bbbd-000000@email.amazonses.com>

The fix applies not only to the RTL8367S but also to the RTL8367SB.

On 6/14/2026 1:21 AM, Johan Alvarado wrote:
> The RTL8367S is a 5+2 port switch from the same family as the
> RTL8365MB-VC already supported by this driver. Its chip info table
> entry declares SGMII and HSGMII on external interface 1, but the
> driver so far only implements RGMII, leaving boards that wire the
> switch to the CPU over the SerDes without a working CPU port.
> 
> This series implements both modes. The configuration sequence and the
> SerDes tuning parameters are derived from the GPL-licensed Realtek
> rtl8367c vendor driver, as distributed in the Mercusys MR80X GPL code
> drop, and cross-checked against the real register sequence captured at
> runtime by chainloading a custom U-Boot ahead of the stock firmware
> and logging the live SerDes accesses on hardware.
> 
> The vendor driver brings up the SerDes by loading firmware into the
> switch's embedded DW8051 microcontroller. Analysis of that firmware
> (by Luiz Angelo Daros de Luca) showed it only performs a SerDes
> data-path reset right after the SerDes reset is deasserted, and then
> runs a link-polling loop that writes the external interface force
> registers -- duplicating, and racing with, the link management phylink
> already performs. This series therefore keeps the DW8051 disabled and
> performs the one necessary action (the data-path reset via the SerDes
> BMCR register) directly in the driver, avoiding both the race and a
> dependency on a redistributable firmware blob.
> 
> Patch 1 adds the SerDes indirect access helpers and SGMII (1 Gbps)
> support. Patch 2 extends this to HSGMII (2.5 Gbps), which phylink
> represents as 2500base-x.
> 
> Tested on a Mercusys MR80X v2.20 (RTL8367S wired to the SoC over the
> SerDes), in both SGMII and HSGMII modes with a fixed-link device tree
> description: link bring-up verified across cold boots, warm reboots,
> module reloads and link down/up cycles, with sustained traffic and no
> CRC/symbol errors. The HSGMII link is confirmed running at 2.5G at the
> register level (SoC uniphy mode and gmac clocks); per-direction
> throughput could not be pushed past ~1 Gbps on this board because the
> SoC side is driven by the IPQ5018 SSDK and the user-facing PHY is 1G,
> so full 2.5G line-rate throughput remains unverified on my hardware.
> 
> Signed-off-by: Johan Alvarado <contact@c127.dev>
> ---
> v3:
>   - Drop the DW8051 firmware loading entirely. Analysis of the vendor
>     firmware showed it only duplicates the link management phylink
>     already does; the one needed action (SerDes data-path reset via
>     the BMCR register) is now performed directly in the driver, with
>     the DW8051 kept disabled. This removes the dependency on the
>     rtl8367s-sgmii.bin firmware blob, which could not be redistributed
>     via linux-firmware (the GPL vendor source ships it as a byte array
>     without the corresponding microcode source). Thanks to Luiz Angelo
>     Daros de Luca for the firmware analysis.
> v2: https://lore.kernel.org/netdev/0100019eb0b1822e-ffc5626c-1b9f-4c8a-8a1a-759a9e665f4f-000000@email.amazonses.com/
>   - No code changes; resend because the SMTP provider used for v1
>     corrupted the mails and patch 1/2 never reached the list.
> v1: https://lore.kernel.org/netdev/aebccaad-eca3-4ea4-99dd-ae7edbc8981b@smtp-relay.sendinblue.com/
> 
> Johan Alvarado (2):
>   net: dsa: realtek: rtl8365mb: add SGMII support for RTL8367S
>   net: dsa: realtek: rtl8365mb: add HSGMII support for RTL8367S
> 
>  drivers/net/dsa/realtek/rtl8365mb.c | 336 +++++++++++++++++++++++++++-
>  1 file changed, 332 insertions(+), 4 deletions(-)
> 
> 
> base-commit: 8f4695fb67b259b2cae0be1eef55859bfc559058


^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH iwl-next v5 1/4] igc: remove unused autoneg_failed field
From: Kwapulinski, Piotr @ 2026-06-15  8:22 UTC (permalink / raw)
  To: KhaiWenTan, Nguyen, Anthony L, Kitszel, Przemyslaw,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Abdul Rahim, Faizal, Looi, Hong Aun,
	Blanco Alcaine, Hector, Tan, Khai Wen, Faizal Rahim,
	Loktionov, Aleksandr
In-Reply-To: <20260507214706.309984-2-khai.wen.tan@linux.intel.com>

>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of KhaiWenTan
>Sent: Thursday, May 7, 2026 11:47 PM
>To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com
>Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Abdul Rahim, Faizal <faizal.abdul.rahim@intel.com>; Looi, Hong Aun <hong.aun.looi@intel.com>; Blanco Alcaine, Hector <hector.blanco.alcaine@intel.com>; Tan, Khai Wen <khai.wen.tan@intel.com>; Faizal Rahim <faizal.abdul.rahim@linux.intel.com>; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Khai Wen Tan <khai.wen.tan@linux.intel.com>
>Subject: [Intel-wired-lan] [PATCH iwl-next v5 1/4] igc: remove unused autoneg_failed field
>
>From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>
>autoneg_failed in struct igc_mac_info is never set in the igc driver.
>Remove the field and the dead code checking it in igc_config_fc_after_link_up().
>
>The field originates from the e1000/e1000e fiber/serdes forced-link path, where MAC-level autoneg timeout sets it to signal the flow-control code to force pause. igc supports only copper, so it never needs to set this field.
>
>Reviewed-by: Looi Hong Aun <hong.aun.looi@intel.com>
>Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>Signed-off-by: Khai Wen Tan <khai.wen.tan@linux.intel.com>
>---
> drivers/net/ethernet/intel/igc/igc_hw.h  |  1 -
> drivers/net/ethernet/intel/igc/igc_mac.c | 16 +---------------
> 2 files changed, 1 insertion(+), 16 deletions(-)
>
>diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h b/drivers/net/ethernet/intel/igc/igc_hw.h
>index be8a49a86d09..86ab8f566f44 100644
>--- a/drivers/net/ethernet/intel/igc/igc_hw.h
>+++ b/drivers/net/ethernet/intel/igc/igc_hw.h
>@@ -92,7 +92,6 @@ struct igc_mac_info {
> 	bool asf_firmware_present;
> 	bool arc_subsystem_valid;
> 
>-	bool autoneg_failed;
> 	bool get_link_status;
> };
> 
>diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/ethernet/intel/igc/igc_mac.c
>index 7ac6637f8db7..142beb9ae557 100644
>--- a/drivers/net/ethernet/intel/igc/igc_mac.c
>+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
>@@ -438,28 +438,14 @@ void igc_config_collision_dist(struct igc_hw *hw)
>  * Checks the status of auto-negotiation after link up to ensure that the
>  * speed and duplex were not forced.  If the link needed to be forced, then
>  * flow control needs to be forced also.  If auto-negotiation is enabled
>- * and did not fail, then we configure flow control based on our link
>- * partner.
>+ * then we configure flow control based on our link partner.
>  */
> s32 igc_config_fc_after_link_up(struct igc_hw *hw)
> {
> 	u16 mii_status_reg, mii_nway_adv_reg, mii_nway_lp_ability_reg;
>-	struct igc_mac_info *mac = &hw->mac;
> 	u16 speed, duplex;
> 	s32 ret_val = 0;
> 
>-	/* Check for the case where we have fiber media and auto-neg failed
>-	 * so we had to force link.  In this case, we need to force the
>-	 * configuration of the MAC to match the "fc" parameter.
>-	 */
>-	if (mac->autoneg_failed)
>-		ret_val = igc_force_mac_fc(hw);
>-
>-	if (ret_val) {
>-		hw_dbg("Error forcing flow control settings\n");
>-		goto out;
>-	}
>-
> 	/* In auto-neg, we need to check and see if Auto-Neg has completed,
> 	 * and if so, how the PHY and link partner has flow control
> 	 * configured.
>--
>2.43.0

Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>

^ permalink raw reply

* RE: [External Mail] Re: [PATCH v2 3/7] net: wwan: t9xx: Add control DMA interface
From: Wu. JackBB (GSM) @ 2026-06-15  8:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Loic Poulain, Sergey Ryazanov, Johannes Berg, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni, Wen-Zhi Huang,
	Shi-Wei Yeh, Minano Tseng, Matthias Brugger,
	AngeloGioacchino Del Regno, Simon Horman, Jonathan Corbet,
	Shuah Khan, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org, linux-doc@vger.kernel.org,
	wojackbb@gmail.com
In-Reply-To: <20260613173018.7a0581f1@kernel.org>

Hi Jakub,

Thank you for your comment.

On Wed, 10 Jun 2026 18:41:06 +0800 Jack Wu via B4 Relay wrote:
> Transient build warnings:
>
> +../drivers/net/wwan/t9xx/pcie/mtk_pci_drv_m9xx.c:52:30: warning: symbol 'mtk_dev_cfg_0900' was not declared. Should it be static?
> +../drivers/net/wwan/t9xx/pcie/mtk_ctrl_cfg_m9xx.c:19:22: warning: symbol 'mtk_ctrl_info_m9xx' was not declared. Should it be static?
> +../drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.c:33:22: warning: symbol 'mtk_cldma_regs_m9xx' was not declared. Should it be static?
> +../drivers/net/wwan/t9xx/pcie/mtk_cldma_drv_m9xx.c:166:22: warning: symbol 'cldma_drv_ops_m9xx' was not declared. Should it be static?

Will fix in v3: the extern declarations will move into shared headers,
so that each defining .c file includes its own declaration.

> please also see all the AI code comments at:
> https://sashiko.dev/#/patchset/20260610-t9xx_driver_v1-v2-3-c65addf23b3f@compal.com

Thank you. We have reviewed the AI comments. Below are our
responses for the items under this patch (patch 3):

Q1: Is the reference count for the skb incremented correctly here? If the port
is non-blocking, the caller might immediately drop the reference. Without an
extra reference taken before queueing, the skb could be freed while it still
resides in the queue, resulting in a use-after-free when the worker thread
later accesses the list.

  No use-after-free here. The skb is allocated by the port layer
  (mtk_port_send_data / mtk_port_common_write) and ownership is
  transferred to submit_skb — the caller does not free the skb after
  a successful submission. On error returns (-EINVAL, -EIO, -EAGAIN),
  the skb is NOT queued and the caller handles cleanup. The skb_queue
  operations (skb_queue_head/skb_queue_tail) do not require an extra
  reference because the queue becomes the sole owner. This is the
  standard kernel pattern — skb ownership transfers to the queue, not
  shared between caller and queue.
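
As a rough userspace illustration of that ownership rule (this is not the
driver code; struct buf, struct txq, txq_submit() and send_payload() are
hypothetical stand-ins for the skb, the TX queue, submit_skb and the
port-layer caller):

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb: a list node with a payload. */
struct buf {
	struct buf *next;
	int payload;
};

/* Hypothetical stand-in for the TX queue. */
struct txq {
	struct buf *head;
	struct buf *tail;
	size_t len;
	size_t limit;
};

/*
 * Returns 0 and takes ownership of @b on success.
 * On error @b is not queued and still belongs to the caller.
 */
static int txq_submit(struct txq *q, struct buf *b)
{
	if (!q || !b)
		return -EINVAL;
	if (q->len >= q->limit)
		return -EAGAIN;

	b->next = NULL;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
	q->len++;
	return 0;
}

/* Caller side: free the buffer only when the submit failed. */
static int send_payload(struct txq *q, int payload)
{
	struct buf *b = malloc(sizeof(*b));
	int ret;

	if (!b)
		return -ENOMEM;
	b->payload = payload;

	ret = txq_submit(q, b);
	if (ret)
		free(b);	/* not queued: caller cleans up */
	/* On success the queue is the sole owner; b is not touched again. */
	return ret;
}
```

With a queue limit of 1, the first send_payload() call queues its buffer
and the second returns -EAGAIN after freeing its own buffer, so no buffer
is ever owned by both sides.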

Q2: Could there be a race condition between setting the virtual address and the
hardware ownership flag? If the worker checks the condition between these two
writes, it might erroneously detect completion and free the buffer before the
transfer even starts.

  No race. The tx_done_work checks:
    if (!req->data_vm_addr || (gpd_flags & CLDMA_GPD_FLAG_HWO))
        break;
  It only proceeds when data_vm_addr is set AND HWO is cleared.
  In submit_tx, the write order is:
    1. req->data_vm_addr = skb->data;     (set non-NULL)
    2. wmb();
    3. gpd_flags |= CLDMA_GPD_FLAG_HWO;   (set HWO)
  After step 1, data_vm_addr is non-NULL but HWO is set (step 3), so
  tx_done_work sees HWO=1 and breaks. It will only process this req
  after the hardware clears HWO upon DMA completion. The wmb() ensures
  the metadata writes are visible before HWO is set, and the rmb() in
  tx_done_work ensures HWO is read before the metadata.

Q3: Does this permanently halt the receive queue if a memory allocation fails
during refill? If the skb allocation fails earlier, the worker exits here
without advancing the free index, which seems to leave the queue stuck
indefinitely on all subsequent executions.

  The concern is partially valid. If mtk_cldma_reload_rx_skb() fails
  (skb alloc failure), the code jumps to "out" which unmasks the
  interrupt. On the next hardware interrupt, rx_done_work runs again
  and retries from the same free_idx. If req->skb is NULL (because
  reload failed and rx_done set req->skb = NULL after dispatch), the
  code hits the "!req->skb" error check and goes to "out" — but "out"
  now correctly unmasks the interrupt (fixed in v2), so the next
  interrupt will retry. This creates a retry loop that depends on
  memory becoming available. Under sustained memory pressure, the RX
  queue stalls but recovers once memory is freed. This is acceptable
  behavior for a control plane interface with small, infrequent
  messages.

Q4: Does this loop safely handle fatal errors from the submission function? It
appears that on errors other than EAGAIN, the loop breaks without unlinking
the skb. The worker loop would then peek the exact same skb on the next
iteration, causing an infinite busy loop that locks up the processor.

  The code shown (with kref_put) is from patch 4, not patch 3.
  We will fix the issue in patch 4 v3.
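
For reference, one possible shape for such a loop, sketched in plain C
rather than taken from the series (struct pkt, drain_queue() and the demo
callbacks are hypothetical, and what v3 actually does may differ):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical queue entry and queue, standing in for the skb list. */
struct pkt {
	struct pkt *next;
	int id;
};

struct pktq {
	struct pkt *head;
};

/*
 * Drain the queue. -EAGAIN leaves the entry queued for a later retry;
 * any other result (success or fatal error) unlinks it, so the same
 * entry is never peeked twice. Returns the number of entries unlinked.
 */
static size_t drain_queue(struct pktq *q,
			  int (*submit)(struct pkt *),
			  void (*drop)(struct pkt *))
{
	size_t removed = 0;

	while (q->head) {
		struct pkt *p = q->head;	/* peek */
		int ret = submit(p);

		if (ret == -EAGAIN)
			break;			/* transient: retry on next run */

		q->head = p->next;		/* unlink in every other case */
		p->next = NULL;
		removed++;
		if (ret)
			drop(p);		/* fatal: release the entry */
	}
	return removed;
}

/* Demo callbacks: id > 0 succeeds, id == 0 is busy, id < 0 is fatal. */
static int demo_dropped;

static int demo_submit(struct pkt *p)
{
	if (p->id == 0)
		return -EAGAIN;
	return p->id < 0 ? -EIO : 0;
}

static void demo_drop(struct pkt *p)
{
	(void)p;
	demo_dropped++;
}
```

The point of the sketch is only that the unlink happens on every path
except -EAGAIN, which is what prevents the busy loop described in Q4.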

Q5: Is a read memory barrier needed after checking the hardware ownership flag?
Without a barrier, the processor might speculatively read the payload length
and data in mtk_cldma_rx_skb_adjust before confirming ownership, leading to
potential data corruption.

  The GPD descriptors are allocated from a coherent DMA pool
  (dma_pool_zalloc), so CPU cache coherency is guaranteed by the DMA
  mapping. A dma_rmb() is not needed here as coherent memory ensures
  the CPU always observes the latest values written by the device
  without reordering.

Q6: Should the return value of the queue stop function be checked before tearing
down DMA memory? If the hardware fails to stop within the timeout, proceeding
to unmap the memory might result in the active hardware writing to freed
kernel memory.

  The concern is valid in theory but not actionable in practice.
  cldma_stop_queue() polls the hardware for up to 10 iterations with
  usleep. If it times out, the hardware is in a broken state (link
  error or device hang). At this point the teardown is part of
  device removal or error recovery — there is no safe alternative to
  proceeding with cleanup. Skipping the teardown would leak DMA
  memory permanently. The synchronize_irq() + flush_work() after
  stop_queue ensures no software handlers are in-flight. If the
  hardware is truly stuck, the PCIe link is likely already down
  (LINK_ERROR_VAL check), making further MMIO access no-ops.

Q7: Does this leak the radix tree nodes and their contents if the array
allocation fails? Performing memory allocations in a teardown path might
permanently leak memory during system memory pressure.

  We will fix in v3 by replacing kcalloc +
  radix_tree_gang_lookup with radix_tree_for_each_slot() iteration,
  eliminating the allocation in the teardown path.

Regarding AI comments for other patches in the series — should we
respond to those under each respective patch thread?

Thanks.

^ permalink raw reply

* Re: [PATCH net-next v5 0/3] airoha: add the capability to configure GDM3/GDM4 as WAN/LAN on demand
From: Simon Horman @ 2026-06-15  8:32 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-arm-kernel, linux-mediatek, netdev,
	Madhur Agrawal, Alexander Lobakin
In-Reply-To: <20260611-airoha-ethtool-priv_flags-v5-0-c11de08486d1@kernel.org>

On Thu, Jun 11, 2026 at 11:55:50PM +0200, Lorenzo Bianconi wrote:
> Add the capability to configure GDM3/GDM4 as WAN/LAN on demand when QoS
> offload is created or destroyed.
> Make dev->qdma an RCU pointer so the TX path can safely dereference it
> without holding RTNL.
> Introduce airoha_qdma_start() and airoha_qdma_stop() helpers.

For the series:

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: [PATCH] net: dsa: hellcreek: replace kcalloc with struct_size
From: Kurt Kanzenbach @ 2026-06-15  8:40 UTC (permalink / raw)
  To: Rosen Penev, netdev
  Cc: Andrew Lunn, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, open list
In-Reply-To: <87v7bsnnqy.fsf@kurt.kurt.home>

On Tue Jun 09 2026, Kurt Kanzenbach wrote:
> On Sun Jun 07 2026, Rosen Penev wrote:
>> One fewer allocation for the priv struct.
>>
>> Signed-off-by: Rosen Penev <rosenp@gmail.com>
>
> Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
>
> It looks correct, but i'm currently out of office. I'll test it next
> week on hardware.

Tested on hardware. No issues found. Thanks!

^ permalink raw reply

* Re: [PATCH] xfrm: Fix xfrm state cache insertion race
From: Simon Horman @ 2026-06-15  8:43 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Steffen Klassert, netdev, Linus Torvalds, Jakub Kicinski,
	Paolo Abeni, zdi-disclosures@trendmicro.com, Willy Tarreau
In-Reply-To: <aiuSE5J-U8SuoZnk@gondor.apana.org.au>

On Fri, Jun 12, 2026 at 12:58:59PM +0800, Herbert Xu wrote:
> The xfrm input state cache insertion code checks the validity of
> the state before acquiring the global xfrm_state_lock.  Thus it's
> possible for someone else to kill the state after it passed the
> validity check, and then the insertion will add the dead state
> to the cache.
> 
> Fix this by moving the validity check inside the lock.
> 
> This entire function is called on the input path, where BH must
> be off (e.g., the caller of this function xfrm_input acquires
> its spinlocks without disabling BH).
> 
> So there is no need to disable BH here or take the RCU read lock.
> Remove both and replace them with an assertion that trips if BH
> is accidentally enabled on some future calling path.
> 
> Fixes: 81a331a0e72d ("xfrm: Add an inbound percpu state cache.")
> Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: [PATCH v5 1/9] block: partitions: of: Skip child nodes without reg property
From: Bartosz Golaszewski @ 2026-06-15  8:47 UTC (permalink / raw)
  To: Loic Poulain
  Cc: linux-mmc, devicetree, linux-kernel, linux-arm-msm, linux-block,
	linux-wireless, ath10k, linux-bluetooth, netdev, daniel,
	Ulf Hansson, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Bjorn Andersson, Konrad Dybcio, Jens Axboe, Johannes Berg,
	Jeff Johnson, Bartosz Golaszewski, Marcel Holtmann,
	Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
	Russell King, Saravana Kannan
In-Reply-To: <20260612-block-as-nvmem-v5-1-95e0b30fff90@oss.qualcomm.com>

On Fri, 12 Jun 2026 15:20:53 +0200, Loic Poulain
<loic.poulain@oss.qualcomm.com> said:
> Child nodes of a fixed-partitions node are not necessarily partition
> entries, for example an nvmem-layout node has no reg property. The
> current code passes a NULL reg pointer and uninitialized len to the
> length check, which can result in a kernel panic or silent failure to
> register any partitions.
>
> Fix validate_of_partition() to return a skip indicator when no reg
> property is present. Guard add_of_partition() with a reg property
> check for the same reason.
>
> Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
> ---

I think this warrants a Cc: stable and backporting as well as a Fixes tag.

Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

^ permalink raw reply
