linux-usb.vger.kernel.org archive mirror
* [PATCH net v2] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
@ 2025-07-23 10:25 John Ernberg
  2025-07-25 18:10 ` patchwork-bot+netdevbpf
  0 siblings, 1 reply; 3+ messages in thread
From: John Ernberg @ 2025-07-23 10:25 UTC
  To: Oliver Neukum, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Greg Kroah-Hartman, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Ernberg, stable@vger.kernel.org

The Gemalto Cinterion PLS83-W modem (cdc_ether) is emitting confusing link
up and down events when the WWAN interface is activated on the modem-side.

Interrupt URBs will in consecutive polls grab:
* Link Connected
* Link Disconnected
* Link Connected

The last Link Connected is then the stable link state.

When the system is under load this may cause the unlink_urbs() work in
__handle_link_change() to not complete before the next usbnet_link_change()
call turns the carrier on again, allowing rx_submit() to queue new SKBs.

In that event the URB queue is filled faster than it can drain, ending up
in an RCU stall:

    rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 33108 jiffies s: 201 root: 0x1/.
    rcu: blocking rcu_node structures (internal RCU debug):
    Sending NMI from CPU 1 to CPUs 0:
    NMI backtrace for cpu 0

    Call trace:
     arch_local_irq_enable+0x4/0x8
     local_bh_enable+0x18/0x20
     __netdev_alloc_skb+0x18c/0x1cc
     rx_submit+0x68/0x1f8 [usbnet]
     rx_alloc_submit+0x4c/0x74 [usbnet]
     usbnet_bh+0x1d8/0x218 [usbnet]
     usbnet_bh_tasklet+0x10/0x18 [usbnet]
     tasklet_action_common+0xa8/0x110
     tasklet_action+0x2c/0x34
     handle_softirqs+0x2cc/0x3a0
     __do_softirq+0x10/0x18
     ____do_softirq+0xc/0x14
     call_on_irq_stack+0x24/0x34
     do_softirq_own_stack+0x18/0x20
     __irq_exit_rcu+0xa8/0xb8
     irq_exit_rcu+0xc/0x30
     el1_interrupt+0x34/0x48
     el1h_64_irq_handler+0x14/0x1c
     el1h_64_irq+0x68/0x6c
     _raw_spin_unlock_irqrestore+0x38/0x48
     xhci_urb_dequeue+0x1ac/0x45c [xhci_hcd]
     unlink1+0xd4/0xdc [usbcore]
     usb_hcd_unlink_urb+0x70/0xb0 [usbcore]
     usb_unlink_urb+0x24/0x44 [usbcore]
     unlink_urbs.constprop.0.isra.0+0x64/0xa8 [usbnet]
     __handle_link_change+0x34/0x70 [usbnet]
     usbnet_deferred_kevent+0x1c0/0x320 [usbnet]
     process_scheduled_works+0x2d0/0x48c
     worker_thread+0x150/0x1dc
     kthread+0xd8/0xe8
     ret_from_fork+0x10/0x20

Work around the problem by deferring the carrier on to the scheduled work.

This needs a new flag to keep track of the necessary action.

The carrier ok check cannot be removed as it remains required for the
LINK_RESET event flow.

Fixes: 4b49f58fff00 ("usbnet: handle link change")
Cc: stable@vger.kernel.org
Signed-off-by: John Ernberg <john.ernberg@actia.se>

---

I've been testing this quite aggressively overnight, and it seems as
stable as my first approach. I'm a little concerned that the bit handling
can now race (although in a much smaller window) in the opposite
direction: a carrier off can occur between test_and_clear_bit() and the
carrier on action in the handler, leaving the carrier on when it
shouldn't be.
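
To make the handoff and the remaining window concrete, here is a minimal
userspace sketch. It is not kernel code: every model_* name is
hypothetical, C11 atomics stand in for the kernel bitops, and the carrier
ok branch in the handler is deliberately left out. It only illustrates
that the event side records a request and the worker consumes it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in usbnet. */
#define MODEL_CARRIER_ON_BIT 14 /* mirrors EVENT_LINK_CARRIER_ON */

struct model_dev {
	atomic_ulong flags; /* stands in for dev->flags */
	bool carrier;       /* stands in for the netdev carrier state */
};

static void model_init(struct model_dev *dev)
{
	atomic_init(&dev->flags, 0);
	dev->carrier = false;
}

/* Event side: models usbnet_link_change() with need_reset == false. */
static void model_link_change(struct model_dev *dev, bool link)
{
	if (link) {
		/* Record the request only; the worker raises the carrier. */
		atomic_fetch_or(&dev->flags, 1UL << MODEL_CARRIER_ON_BIT);
	} else {
		atomic_fetch_and(&dev->flags, ~(1UL << MODEL_CARRIER_ON_BIT));
		dev->carrier = false;
	}
}

/* Worker side, step 1: models test_and_clear_bit(). */
static bool model_take_request(struct model_dev *dev)
{
	unsigned long old = atomic_fetch_and(&dev->flags,
					     ~(1UL << MODEL_CARRIER_ON_BIT));
	return (old >> MODEL_CARRIER_ON_BIT) & 1UL;
}

/* Worker side, step 2: models netif_carrier_on(). */
static void model_carrier_on(struct model_dev *dev)
{
	dev->carrier = true;
}

/* Connected, Disconnected, Connected all arrive before the worker runs. */
bool model_flap_then_work(void)
{
	struct model_dev dev;

	model_init(&dev);
	model_link_change(&dev, true);
	model_link_change(&dev, false);
	model_link_change(&dev, true);
	if (model_take_request(&dev))
		model_carrier_on(&dev);
	return dev.carrier;
}

/* A Disconnected lands between the two worker steps. */
bool model_off_in_window(void)
{
	struct model_dev dev;
	bool take;

	model_init(&dev);
	model_link_change(&dev, true);
	take = model_take_request(&dev);
	assert(take); /* the worker saw the carrier-on request */
	model_link_change(&dev, false); /* carrier off inside the window */
	if (take)
		model_carrier_on(&dev);
	return dev.carrier;
}
```

In this model model_flap_then_work() ends with the carrier on, which is
the intended result for the Connected/Disconnected/Connected sequence.
model_off_in_window() also ends with the carrier on even though the last
event was a Disconnected, which is the window described above.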

v2:
 - target tree in patch description.
 - Drop Ming Lei from address list as their address bounces.
 - Rework solution based on feedback by Jakub (let me know if you want a
     Suggested-by tag, if we're keeping this direction)

v1: https://lore.kernel.org/netdev/20250710085028.1070922-1-john.ernberg@actia.se/

Tested on 6.12.20 and forward ported. Stack trace from 6.12.20.
---
 drivers/net/usb/usbnet.c   | 11 ++++++++---
 include/linux/usb/usbnet.h |  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index c04e715a4c2a..bc1d8631ffe0 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1122,6 +1122,9 @@ static void __handle_link_change(struct usbnet *dev)
 		 * tx queue is stopped by netcore after link becomes off
 		 */
 	} else {
+		if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+			netif_carrier_on(dev->net);
+
 		/* submitting URBs for reading packets */
 		tasklet_schedule(&dev->bh);
 	}
@@ -2009,10 +2012,12 @@ EXPORT_SYMBOL(usbnet_manage_power);
 void usbnet_link_change(struct usbnet *dev, bool link, bool need_reset)
 {
 	/* update link after link is reseted */
-	if (link && !need_reset)
-		netif_carrier_on(dev->net);
-	else
+	if (link && !need_reset) {
+		set_bit(EVENT_LINK_CARRIER_ON, &dev->flags);
+	} else {
+		clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags);
 		netif_carrier_off(dev->net);
+	}
 
 	if (need_reset && link)
 		usbnet_defer_kevent(dev, EVENT_LINK_RESET);
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index 0b9f1e598e3a..4bc6bb01a0eb 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -76,6 +76,7 @@ struct usbnet {
 #		define EVENT_LINK_CHANGE	11
 #		define EVENT_SET_RX_MODE	12
 #		define EVENT_NO_IP_ALIGN	13
+#		define EVENT_LINK_CARRIER_ON	14
 /* This one is special, as it indicates that the device is going away
  * there are cyclic dependencies between tasklet, timer and bh
  * that must be broken
-- 
2.49.0


* Re: [PATCH net v2] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
  2025-07-23 10:25 [PATCH net v2] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event John Ernberg
@ 2025-07-25 18:10 ` patchwork-bot+netdevbpf
  2025-08-01 18:09   ` Ammar Faizi
  0 siblings, 1 reply; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-07-25 18:10 UTC
  To: John Ernberg
  Cc: oneukum, andrew+netdev, davem, edumazet, kuba, pabeni, gregkh,
	netdev, linux-usb, linux-kernel, stable

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 23 Jul 2025 10:25:35 +0000 you wrote:
> The Gemalto Cinterion PLS83-W modem (cdc_ether) is emitting confusing link
> up and down events when the WWAN interface is activated on the modem-side.
> 
> Interrupt URBs will in consecutive polls grab:
> * Link Connected
> * Link Disconnected
> * Link Connected
> 
> [...]

Here is the summary with links:
  - [net,v2] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
    https://git.kernel.org/netdev/net/c/0d9cfc9b8cb1

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net v2] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
  2025-07-25 18:10 ` patchwork-bot+netdevbpf
@ 2025-08-01 18:09   ` Ammar Faizi
  0 siblings, 0 replies; 3+ messages in thread
From: Ammar Faizi @ 2025-08-01 18:09 UTC
  To: patchwork-bot+netdevbpf, John Ernberg
  Cc: Armando Budianto, Oliver Neukum, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Greg Kroah-Hartman,
	Linux Netdev Mailing List, Linux USB Mailing List,
	Linux Kernel Mailing List, stable, GNU/Weeb Mailing List

On 7/26/25 1:10 AM, patchwork-bot+netdevbpf@kernel.org wrote:
> Here is the summary with links:
>    - [net,v2] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
>      https://git.kernel.org/netdev/net/c/0d9cfc9b8cb1


I just got bitten by this commit after syncing with Linus' tree.

It breaks networking on my laptop: the RJ45 LAN connection can no longer
be established. git bisect points to that commit as the first bad one.

ammarfaizi2@integral2:~/p/linux-block$ git bisect log
git bisect start
# bad: [ff82265b006e468df734a2d71f9110b73bd740f2] Merge branch 'master' into af/home (sync with mainline)
git bisect bad ff82265b006e468df734a2d71f9110b73bd740f2
# good: [b0896d43221f7858491d59383f56dfe38e7fff34] Merge tag 'kvm-x86-vmx-6.17' of https://github.com/kvm-x86/linux into af/home
git bisect good b0896d43221f7858491d59383f56dfe38e7fff34
# good: [5f5c9952b33cb4e8d25c70ef29f7a45cd26b6a9b] Merge tag 'powerpc-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good 5f5c9952b33cb4e8d25c70ef29f7a45cd26b6a9b
# bad: [8be4d31cb8aaeea27bde4b7ddb26e28a89062ebf] Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad 8be4d31cb8aaeea27bde4b7ddb26e28a89062ebf
# good: [c2b93d6beca8526fb38ccc834def1c987afe24fc] eth: fbnic: Create ring buffer for firmware logs
git bisect good c2b93d6beca8526fb38ccc834def1c987afe24fc
# good: [077f7153fd2582874b0dec8c8fcd687677d0f4cc] gve: merge xdp and xsk registration
git bisect good 077f7153fd2582874b0dec8c8fcd687677d0f4cc
# good: [126d85fb040559ba6654f51c0b587d280b041abb] Merge tag 'wireless-next-2025-07-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
git bisect good 126d85fb040559ba6654f51c0b587d280b041abb
# good: [ecc383e5fe060f1aaad0e4e4ae36ad1c899e948d] Merge tag 'linux-can-next-for-6.17-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
git bisect good ecc383e5fe060f1aaad0e4e4ae36ad1c899e948d
# bad: [c58c18be8850d58fd61b0480d2355df89ce7ee59] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect bad c58c18be8850d58fd61b0480d2355df89ce7ee59
# good: [620e2392db235ba3b9e9619912aadb8cadee15e7] net: dsa: microchip: Disable PTP function of KSZ8463
git bisect good 620e2392db235ba3b9e9619912aadb8cadee15e7
# bad: [2764ab51d5f0e8c7d3b7043af426b1883e3bde1d] stmmac: xsk: fix negative overflow of budget in zerocopy mode
git bisect bad 2764ab51d5f0e8c7d3b7043af426b1883e3bde1d
# good: [4fc7885c3a98ec4450103aef874fb1d35920c7af] Merge branch 'mlx5e-misc-fixes-2025-07-23'
git bisect good 4fc7885c3a98ec4450103aef874fb1d35920c7af
# bad: [1bbb76a899486827394530916f01214d049931b3] neighbour: Fix null-ptr-deref in neigh_flush_dev().
git bisect bad 1bbb76a899486827394530916f01214d049931b3
# bad: [165a7f5db919ab68a45ae755cceb751e067273ef] net: dsa: microchip: Fix wrong rx drop MIB counter for KSZ8863
git bisect bad 165a7f5db919ab68a45ae755cceb751e067273ef
# bad: [0d9cfc9b8cb17dbc29a98792d36ec39a1cf1395f] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
git bisect bad 0d9cfc9b8cb17dbc29a98792d36ec39a1cf1395f
# first bad commit: [0d9cfc9b8cb17dbc29a98792d36ec39a1cf1395f] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
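
One possible reading of the regression, sketched as a userspace model
rather than kernel code: the diff adds the EVENT_LINK_CARRIER_ON check
inside the else branch of __handle_link_change(). Assuming that branch is
only taken when the carrier is already on (the commit message mentions a
carrier ok check there), a Link Connected event that arrives while the
carrier is off would set the flag but never raise the carrier. All
model_* names below are hypothetical, and the "check first" variant is
only there for comparison, not a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in usbnet. */
#define MODEL_CARRIER_ON_BIT 14 /* mirrors EVENT_LINK_CARRIER_ON */

struct model_dev {
	atomic_ulong flags; /* stands in for dev->flags */
	bool carrier;       /* stands in for the netdev carrier state */
};

/* Models usbnet_link_change(dev, true, false): only the flag is set. */
static void model_link_up(struct model_dev *dev)
{
	atomic_fetch_or(&dev->flags, 1UL << MODEL_CARRIER_ON_BIT);
}

/* Models test_and_clear_bit() on the carrier-on request. */
static bool model_take_request(struct model_dev *dev)
{
	unsigned long old = atomic_fetch_and(&dev->flags,
					     ~(1UL << MODEL_CARRIER_ON_BIT));
	return (old >> MODEL_CARRIER_ON_BIT) & 1UL;
}

/* Worker shaped like the diff: the request is checked in the else branch.
 * ASSUMPTION: the if condition is "carrier is off".
 */
static void model_worker_as_patched(struct model_dev *dev)
{
	if (!dev->carrier) {
		/* carrier off: the driver unlinks its RX URBs here */
	} else {
		if (model_take_request(dev))
			dev->carrier = true;
		/* carrier on: the driver schedules RX submission here */
	}
}

/* Hypothetical variant for comparison: consume the request first. */
static void model_worker_check_first(struct model_dev *dev)
{
	if (model_take_request(dev))
		dev->carrier = true;

	if (!dev->carrier) {
		/* carrier off: the driver unlinks its RX URBs here */
	} else {
		/* carrier on: the driver schedules RX submission here */
	}
}

/* Link Connected arrives while the carrier is off, then the worker runs. */
bool model_up_from_off_as_patched(void)
{
	struct model_dev dev;

	atomic_init(&dev.flags, 0);
	dev.carrier = false;
	model_link_up(&dev);
	model_worker_as_patched(&dev);
	return dev.carrier;
}

bool model_up_from_off_check_first(void)
{
	struct model_dev dev;

	atomic_init(&dev.flags, 0);
	dev.carrier = false;
	model_link_up(&dev);
	model_worker_check_first(&dev);
	return dev.carrier;
}
```

Under that assumption the as-patched shape leaves the carrier off after a
Link Connected event, while the check-first shape turns it on. That would
match a link that never comes up, but it is a model of the diff, not a
confirmed diagnosis.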

-- 
Ammar Faizi


