public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed
From: John Ernberg <john.ernberg@actia.se>
To: Jakub Kicinski <kuba@kernel.org>, Oliver Neukum <oneukum@suse.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Ming Lei <ming.lei@canonical.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
Date: Wed, 16 Jul 2025 14:54:46 +0000	[thread overview]
Message-ID: <fbd03180-cca0-4a0f-8fd9-4daf5ff28ff5@actia.se> (raw)
In-Reply-To: <20250715065403.641e4bd7@kernel.org>

Hi Jakub, Oliver,

On 7/15/25 3:54 PM, Jakub Kicinski wrote:
> On Tue, 15 Jul 2025 07:15:51 +0000 John Ernberg wrote:
>>> I'm worried that this is still racy.
>>> Since usbnet_bh checks if carrier is ok and __handle_link_change()
>>> checks the opposite something must be out of sync if both run.
>>> Most likely something restored the carrier while we're still handling
>>> the previous carrier loss.
>>
>> There could definitely be other factors, I'll try to dig some in
>> cdc_ether and see if something there could be causing problems for the
>> usbnet core.
>> I honestly kinda stopped digging when I found unlink_urbs() being
>> wrapped with a pause/unpause at another place tied to a commit seeing a
>> similar issue.
> 
> Looking at cdc_ether:
> 
> static void usbnet_cdc_zte_status(struct usbnet *dev, struct urb *urb)
> [...]
>          if (event->wValue &&
>              netif_carrier_ok(dev->net))
>                  netif_carrier_off(dev->net);
> 
> This looks sus. Is the Gemalto Cinterion PLS83-W ZTE based?
> 

The modem isn't ZTE-based, but I did spot that function as well when 
checking cdc_ether.

The modem is advertising USB High Speed with a bInterval of 5, which in 
usbnet gets bumped to 7 to avoid too aggressive polling, if I read that 
code right.

Through some rudimentary logging with the inlined patch on 6.12.20 (mail 
client mangled):

-----------------8<----------------------

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index a6469235d904..7b4827d2c797 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -418,8 +418,10 @@ void usbnet_cdc_status(struct usbnet *dev, struct 
urb *urb)
         case USB_CDC_NOTIFY_NETWORK_CONNECTION:
                 netif_dbg(dev, timer, dev->net, "CDC: carrier %s\n",
                           event->wValue ? "on" : "off");
-               if (netif_carrier_ok(dev->net) != !!event->wValue)
+               if (netif_carrier_ok(dev->net) != !!event->wValue) {
+                       netdev_err(dev->net, "network connection %d\n", 
(int)event->wValue);
                         usbnet_link_change(dev, !!event->wValue, 0);
+               }
                 break;
         case USB_CDC_NOTIFY_SPEED_CHANGE:       /* tx/rx rates */
                 netif_dbg(dev, timer, dev->net, "CDC: speed change (len 
%d)\n",
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 1e99a6ef4945..dc3e3d87b4f7 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -721,6 +721,7 @@ static int unlink_urbs (struct usbnet *dev, struct 
sk_buff_head *q)
         struct sk_buff          *skb;
         int                     count = 0;

+       netdev_err(dev->net, "unlink urb start: %d devflags=%lx\n", 
skb_queue_len_lockless(q), dev->flags);
         spin_lock_irqsave (&q->lock, flags);
         while (!skb_queue_empty(q)) {
                 struct skb_data         *entry;
@@ -757,6 +758,7 @@ static int unlink_urbs (struct usbnet *dev, struct 
sk_buff_head *q)
                 spin_lock_irqsave(&q->lock, flags);
         }
         spin_unlock_irqrestore (&q->lock, flags);
+       netdev_err(dev->net, "unlink urb counted %d\n", count);
         return count;
  }

-----------------8<----------------------

I ended up with the following log:

[   23.823289] cdc_ether 1-1.1:1.8 wwan0: network connection 0
[   23.830874] cdc_ether 1-1.1:1.8 wwan0: unlink urb start: 5 devflags=1880
[   23.840148] cdc_ether 1-1.1:1.8 wwan0: unlink urb counted 5
[   25.356741] cdc_ether 1-1.1:1.8 wwan0: network connection 1
[   25.364745] cdc_ether 1-1.1:1.8 wwan0: network connection 0
[   25.371106] cdc_ether 1-1.1:1.8 wwan0: unlink urb start: 5 devflags=880
[   25.378710] cdc_ether 1-1.1:1.8 wwan0: network connection 1
[   51.422757] rcu: INFO: rcu_sched self-detected stall on CPU
[   51.429081] rcu:     0-....: (6499 ticks this GP) 
idle=da7c/1/0x4000000000000000 softirq=2067/2067 fqs=2668
[   51.439717] rcu:              hardirqs   softirqs   csw/system
[   51.445897] rcu:      number:    62096      59017            0
[   51.452107] rcu:     cputime:        0      11397         1470   ==> 
12996(ms)
[   51.459852] rcu:     (t=6500 jiffies g=2397 q=663 ncpus=2)

 From a USB capture where the stall didn't happen I can see:
* A bunch of CDC_NETWORK_CONNECTION events with Disconnected state (0).
* Then a CDC_NETWORK_CONNECTION event with Connected state (1) once the 
WWAN interface is turned on by the modem.
* Followed by a Disconnected in the next USB INTR poll.
* Followed by a Connected in the next USB INTR poll.
(I'm not sure if I can achieve a different timing with enough captures 
or a faster system)

Which makes the off and on LINK_CHANGE events race on our system (ARM64 
based, iMX8QXP) as they cannot be handled fast enough. Nothing stops 
usbnet_link_change() from being called while the deferred work is running.

As Oliver points out usbnet_resume_rx() causes scheduling which seems 
unnecessary or maybe even inappropriate for all cases except when the 
carrier was turned on during the race.

I gave the ZTE modem quirk a go anyway, despite the comment explaining a 
different situation than what I am seeing, and it has no observable 
effect on this RCU stall.

Currently drawing a blank on what the correct fix would be.

Best regards // John Ernberg

  reply	other threads:[~2025-07-16 14:54 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-10  8:50 [PATCH] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event John Ernberg
2025-07-14 23:35 ` Jakub Kicinski
2025-07-15  7:15   ` John Ernberg
2025-07-15 13:54     ` Jakub Kicinski
2025-07-16 14:54       ` John Ernberg [this message]
2025-07-16 21:39         ` Jakub Kicinski
2025-07-18  9:07           ` John Ernberg
2025-07-18 23:18             ` Jakub Kicinski
2025-07-23  9:29               ` John Ernberg
2025-07-15 11:49 ` Oliver Neukum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fbd03180-cca0-4a0f-8fd9-4daf5ff28ff5@actia.se \
    --to=john.ernberg@actia.se \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=ming.lei@canonical.com \
    --cc=netdev@vger.kernel.org \
    --cc=oneukum@suse.com \
    --cc=pabeni@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox