From: John Ernberg <john.ernberg@actia.se>
To: Jakub Kicinski <kuba@kernel.org>, Oliver Neukum <oneukum@suse.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
"David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
Ming Lei <ming.lei@canonical.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
Date: Wed, 16 Jul 2025 14:54:46 +0000 [thread overview]
Message-ID: <fbd03180-cca0-4a0f-8fd9-4daf5ff28ff5@actia.se> (raw)
In-Reply-To: <20250715065403.641e4bd7@kernel.org>
Hi Jakub, Oliver,
On 7/15/25 3:54 PM, Jakub Kicinski wrote:
> On Tue, 15 Jul 2025 07:15:51 +0000 John Ernberg wrote:
>>> I'm worried that this is still racy.
>>> Since usbnet_bh checks if carrier is ok and __handle_link_change()
>>> checks the opposite something must be out of sync if both run.
>>> Most likely something restored the carrier while we're still handling
>>> the previous carrier loss.
>>
>> There could definitely be other factors, I'll try to dig some in
>> cdc_ether and see if something there could be causing problems for the
>> usbnet core.
>> I honestly kinda stopped digging when I found unlink_urbs() being
>> wrapped with a pause/unpause at another place tied to a commit seeing a
>> similar issue.
>
> Looking at cdc_ether:
>
> static void usbnet_cdc_zte_status(struct usbnet *dev, struct urb *urb)
> [...]
> if (event->wValue &&
> netif_carrier_ok(dev->net))
> netif_carrier_off(dev->net);
>
> This looks sus. Is the Gemalto Cinterion PLS83-W ZTE based?
>
The modem isn't ZTE-based, but I did spot that function as well when
checking cdc_ether.
The modem is advertising USB High Speed with a bInterval of 5, which in
usbnet gets bumped to 7 to avoid too aggressive polling, if I read that
code right.
Through some rudimentary logging with the inlined patch on 6.12.20 (mail
client mangled):
-----------------8<----------------------
diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index a6469235d904..7b4827d2c797 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -418,8 +418,10 @@ void usbnet_cdc_status(struct usbnet *dev, struct
urb *urb)
case USB_CDC_NOTIFY_NETWORK_CONNECTION:
netif_dbg(dev, timer, dev->net, "CDC: carrier %s\n",
event->wValue ? "on" : "off");
- if (netif_carrier_ok(dev->net) != !!event->wValue)
+ if (netif_carrier_ok(dev->net) != !!event->wValue) {
+ netdev_err(dev->net, "network connection %d\n",
(int)event->wValue);
usbnet_link_change(dev, !!event->wValue, 0);
+ }
break;
case USB_CDC_NOTIFY_SPEED_CHANGE: /* tx/rx rates */
netif_dbg(dev, timer, dev->net, "CDC: speed change (len
%d)\n",
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 1e99a6ef4945..dc3e3d87b4f7 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -721,6 +721,7 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)
struct sk_buff *skb;
int count = 0;
+ netdev_err(dev->net, "unlink urb start: %d devflags=%lx\n",
skb_queue_len_lockless(q), dev->flags);
spin_lock_irqsave (&q->lock, flags);
while (!skb_queue_empty(q)) {
struct skb_data *entry;
@@ -757,6 +758,7 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)
spin_lock_irqsave(&q->lock, flags);
}
spin_unlock_irqrestore (&q->lock, flags);
+ netdev_err(dev->net, "unlink urb counted %d\n", count);
return count;
}
-----------------8<----------------------
I ended up with the following log:
[ 23.823289] cdc_ether 1-1.1:1.8 wwan0: network connection 0
[ 23.830874] cdc_ether 1-1.1:1.8 wwan0: unlink urb start: 5 devflags=1880
[ 23.840148] cdc_ether 1-1.1:1.8 wwan0: unlink urb counted 5
[ 25.356741] cdc_ether 1-1.1:1.8 wwan0: network connection 1
[ 25.364745] cdc_ether 1-1.1:1.8 wwan0: network connection 0
[ 25.371106] cdc_ether 1-1.1:1.8 wwan0: unlink urb start: 5 devflags=880
[ 25.378710] cdc_ether 1-1.1:1.8 wwan0: network connection 1
[ 51.422757] rcu: INFO: rcu_sched self-detected stall on CPU
[ 51.429081] rcu: 0-....: (6499 ticks this GP)
idle=da7c/1/0x4000000000000000 softirq=2067/2067 fqs=2668
[ 51.439717] rcu: hardirqs softirqs csw/system
[ 51.445897] rcu: number: 62096 59017 0
[ 51.452107] rcu: cputime: 0 11397 1470 ==>
12996(ms)
[ 51.459852] rcu: (t=6500 jiffies g=2397 q=663 ncpus=2)
From a USB capture where the stall didn't happen I can see:
* A bunch of CDC_NETWORK_CONNECTION events with Disconnected state (0).
* Then a CDC_NETWORK_CONNECTION event with Connected state (1) once the
WWAN interface is turned on by the modem.
* Followed by a Disconnected in the next USB INTR poll.
* Followed by a Connected in the next USB INTR poll.
(I'm not sure if I can achieve a different timing with enough captures
or a faster system)
Which makes the off and on LINK_CHANGE events race on our system (ARM64
based, iMX8QXP) as they cannot be handled fast enough. Nothing stops
usbnet_link_change() from being called while the deferred work is running.
As Oliver points out usbnet_resume_rx() causes scheduling which seems
unnecessary or maybe even inappropriate for all cases except when the
carrier was turned on during the race.
I gave the ZTE modem quirk a go anyway, despite the comment explaining a
different situation than what I am seeing, and it has no observable
effect on this RCU stall.
Currently drawing a blank on what the correct fix would be.
Best regards // John Ernberg
next prev parent reply other threads:[~2025-07-16 14:54 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-10 8:50 [PATCH] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event John Ernberg
2025-07-14 23:35 ` Jakub Kicinski
2025-07-15 7:15 ` John Ernberg
2025-07-15 13:54 ` Jakub Kicinski
2025-07-16 14:54 ` John Ernberg [this message]
2025-07-16 21:39 ` Jakub Kicinski
2025-07-18 9:07 ` John Ernberg
2025-07-18 23:18 ` Jakub Kicinski
2025-07-23 9:29 ` John Ernberg
2025-07-15 11:49 ` Oliver Neukum
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fbd03180-cca0-4a0f-8fd9-4daf5ff28ff5@actia.se \
--to=john.ernberg@actia.se \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-usb@vger.kernel.org \
--cc=ming.lei@canonical.com \
--cc=netdev@vger.kernel.org \
--cc=oneukum@suse.com \
--cc=pabeni@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox