Linux USB
* Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5)
@ 2025-08-08 21:47 Martin Maurer
  2025-08-14 12:16 ` Michał Pecio
  2025-08-14 13:02 ` Mathias Nyman
  0 siblings, 2 replies; 8+ messages in thread
From: Martin Maurer @ 2025-08-08 21:47 UTC (permalink / raw)
  To: linux-usb

Hello,

I have been fighting a USB problem for some time.

I have a Qualcomm radio module (in my case a Quectel RM520N-GL and a
SIMCOM SIM8260G-M2) connected to a Phytec Pollux board with an NXP
i.MX8MP.

I started with Linux 6.6.23. The module communicates over USB 3.x.

I establish an internet connection through this radio module and connected
a notebook to it (via Wifi, with external hardware converting to Ethernet).

The first test setup was somewhat complicated:


Radio Module <-> USB 3.x <-> Phytec Linux Board <-> Ethernet Tunnel <-> 
Raspberry Pi CM5 <-> Wifi <-> Windows Notebook


I opened Firefox and YouTube worked well: HD video ran for several hours
without a problem.

Then I opened Microsoft Teams instead, and data transfer stalled
immediately. I had a ping running in parallel directly to the radio module,
and this stalled as well.

With more testing I found that opening Twitch.tv in Firefox also stalled
the connection.

# lsusb -t
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
     |__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=option, 5000M
     |__ Port 1: Dev 2, If 1, Class=Vendor Specific Class, Driver=option, 5000M
     |__ Port 1: Dev 2, If 2, Class=Vendor Specific Class, Driver=option, 5000M
     |__ Port 1: Dev 2, If 3, Class=Vendor Specific Class, Driver=option, 5000M
     |__ Port 1: Dev 2, If 4, Class=Vendor Specific Class, Driver=qmi_wwan, 5000M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/2p, 480M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
     |__ Port 1: Dev 2, If 0, Class=Communications, Driver=cdc_ncm, 5000M
     |__ Port 1: Dev 2, If 1, Class=CDC Data, Driver=cdc_ncm, 5000M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/2p, 480M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=dwc2/1p, 480M
#

The radio module is Bus 05, Port 1, Dev 2, with multiple interfaces.

I then tried a newer Linux version, 6.6.52. The same error occurred.

When the error occurs, I don't see anything in the system logs (e.g. dmesg).

I replaced the Quectel radio module with one from SIMCOM, and the same
problem occurred.

I added my USB 2.0 analyzer (an old Ellisys) and the problem disappeared.
Unfortunately I have no USB 3.x analyzer.

I am still waiting for an original NXP board with an i.MX8MP, on which it
seems a 6.12.x kernel can be used and tested.

I found an erratum for the i.MX8MP: ERR050714 “USB: HOST Stream IN issue if
received short packet”, but it looks like I have no Stream IN in use...?

So perhaps it is something different.

To check whether it could be something i.MX8MP related, today I took a
Raspberry Pi Compute Module 5 (CM5).

It is also ARM64, but otherwise, I assume, it has a completely different
USB 3 peripheral.

And surprise: I was able to reproduce the problem.

The Raspberry Pi uses:

Linux CM5 6.12.34+rpt-rpi-v8 #1 SMP PREEMPT Debian 
1:6.12.34-1+rpt1~bookworm (2025-06-26) aarch64 GNU/Linux


I still can't decide which side causes the problem, or whether it is just
an interop problem.


I saw the virtual channel, which is used for data transfer, uses a 
wMaxPacketSize of 1024 Bytes (IN and OUT) and a wMaxBurst of 6 (OUT) and 
2 (IN).
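
For orientation, here is a minimal sketch of what these descriptor values would imply per burst. It assumes the burst values quoted above are the raw bMaxBurst fields of the SuperSpeed Endpoint Companion descriptor as printed by lsusb -v; that field is zero-based, so 0 means one packet per burst. The function name is made up for illustration.

```python
def max_burst_bytes(max_packet_size: int, b_max_burst: int) -> int:
    """Upper bound on the bytes moved in a single SuperSpeed burst.

    Assumption: b_max_burst is the raw, zero-based descriptor field,
    so the number of packets per burst is b_max_burst + 1.
    """
    if not 0 <= b_max_burst <= 15:
        raise ValueError("bMaxBurst must be in the range 0..15")
    return max_packet_size * (b_max_burst + 1)


# Values reported for the data interface in this thread.
print("Bulk OUT:", max_burst_bytes(1024, 6), "bytes per burst")
print("Bulk IN: ", max_burst_bytes(1024, 2), "bytes per burst")
```

If the quoted numbers are already packet counts rather than raw field values, the "+ 1" would not apply.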

I have already captured traces with usbmon, which I can share.

I can also read out the USB descriptors (lsusb -v) and share them.

Qualcomm radio modules are widely used because of their high possible
throughput, so I assume other people are also using them over USB 3.x.

I am not sure yet why the error occurs on my side instead of things just
working...

Has anyone already heard of such an error? Is there perhaps a workaround?

Can I perhaps patch the wMaxPacketSize or wMaxBurst used by the kernel, at
least for test purposes?

Side note:

I can query the number of sent and received IP packets. I send pings (one
every second).

After the hang has occurred, ping no longer shows any replies.

But the radio module reports 5 sent and 5 received packets when I query the
statistics every 5 packets via QMI.

So I assume sending (Bulk OUT) is working: packets go to the server and back
to the radio module, but the answer is not sent over USB from device to host.

A second, perhaps interesting note:

It looks like the radio module is keeping/storing the received packets.

After a more or less long time (a few hours), all buffers are exhausted.
Then QMI commands are no longer answered.

When performing some action (I saw it when opening the channel for AT
commands, or when creating log files), it can happen that all stored packets
are then sent in one go, so I get the QMI packet statistics many times over,
each with an increase of 5 packets, adding up to the number of packets that
took hours to accumulate.

The radio module also seems to be running Linux. Which version, I don't know.

What can I do/test next?

Try again on an AMD x64 controller? Perhaps with main/latest of the Linux
kernel?

I thought about getting an i.MX95, but it does not seem to be available yet.
Phytec has some engineering samples (?) on boards, but the release notes
say that only USB 2 is working so far.

Can I enable traces in the USB kernel code that could help narrow down the
problem?

Sorry for the long description. I have tested a lot in the meantime...

Best regards,

Martin





* Re: Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5)
  2025-08-08 21:47 Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5) Martin Maurer
@ 2025-08-14 12:16 ` Michał Pecio
  2025-08-14 13:02 ` Mathias Nyman
  1 sibling, 0 replies; 8+ messages in thread
From: Michał Pecio @ 2025-08-14 12:16 UTC (permalink / raw)
  To: Martin Maurer; +Cc: linux-usb

Hi,

On Fri, 8 Aug 2025 23:47:07 +0200, Martin Maurer wrote:
> What can I do/test next?
> 
> Try again on a AMD x64 controller? Perhaps with main/latest of Linux Kernel?

You are describing some fairly complex setups, can you confidently
say that the problem is USB and not elsewhere? For example, you have
tcpdump running on the USB host machine and packets go out through
the USB network interface but nothing comes back?

What driver are you using with this USB device? Any errors/diagnostics
from the driver, or from xhci_hcd (I guess that's your host)?

You tried usbmon and what happened? Is the driver submitting IN URBs?
Are they coming back empty? With error status? Not completing at all?

Trying on a PC with newer kernel makes sense, debugging may be easier
that way and lower risk of:
- chasing bugs in downstream kernels that nobody here can help with
- fixing something that has already been figured out and fixed

Regards,
Michal


* Re: Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5)
  2025-08-08 21:47 Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5) Martin Maurer
  2025-08-14 12:16 ` Michał Pecio
@ 2025-08-14 13:02 ` Mathias Nyman
  2025-08-17 14:58   ` Martin Maurer
  1 sibling, 1 reply; 8+ messages in thread
From: Mathias Nyman @ 2025-08-14 13:02 UTC (permalink / raw)
  To: Martin Maurer, linux-usb

On 9.8.2025 0.47, Martin Maurer wrote:
> Hello,
> 
> since some time I am fighting against a problem with USB:
> 
...
> 
> Note beside:
> 
> I can query number of sent and received IP packets. I sent the pings (1 every second).
> 
> The ping does not display, that it receives an answer, after the hang occured.
> 
> But the radio module tells 5 sent and 5 received packets, when querying the statistics every 5 packets via QMI.
> 
> So I assume sending (Bulk OUT) is working, packet go to server and back to radio module, but answer is not sent over USB from device to host.
> 

I didn't fully understand the complex setup, but the subject and this section hint that it
could be related to missing zero-length bulk packets.

The receiving side, which on the host side is the bulk IN endpoint, usually knows a transfer
is complete when it receives the exact amount it requested, or when it receives a packet
shorter than maxPacketSize (short transfer).

But if the sender sends less than expected, and it happens to be exactly maxPacketSize (1024) bytes,
then the sender should send an additional zero-length packet to let the receiver know no more data is coming.

Otherwise the receiving side will be stuck waiting for the next packet.
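
The rule described above can be sketched as a small model. This is only an illustration under simplifying assumptions (one transfer, no overflow handling), and the function names are hypothetical, not kernel or driver APIs.

```python
def transfer_complete(packet_lengths, requested_len, max_packet_size):
    """Return True once the receiver can tell that the transfer has ended.

    The receiver stops when the requested amount has arrived, or when it
    sees a packet shorter than max_packet_size (including a zero-length one).
    """
    received = 0
    for length in packet_lengths:
        received += length
        if received >= requested_len:
            return True   # requested amount reached
        if length < max_packet_size:
            return True   # short packet terminates the transfer
    return False          # receiver keeps waiting for more data


def needs_zlp(payload_len, requested_len, max_packet_size):
    """True if the sender has to append a zero-length packet."""
    return (0 < payload_len < requested_len
            and payload_len % max_packet_size == 0)


# A 1024-byte payload into a 2048-byte request: the receiver is stuck
# unless a zero-length packet follows.
print(transfer_complete([1024], 2048, 1024))      # False
print(transfer_complete([1024, 0], 2048, 1024))   # True
print(needs_zlp(1024, 2048, 1024))                # True
```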

Thanks
Mathias


* Re: Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5)
  2025-08-14 13:02 ` Mathias Nyman
@ 2025-08-17 14:58   ` Martin Maurer
  2025-08-17 15:07     ` Martin Maurer
  2025-08-17 15:22     ` Daniele Palmas
  0 siblings, 2 replies; 8+ messages in thread
From: Martin Maurer @ 2025-08-17 14:58 UTC (permalink / raw)
  To: linux-usb, michal.pecio, mathias.nyman

Hello Michał, hello Mathias and all,

many thanks for your answers!

I tried to reproduce it with an AMD Linux PC, but unfortunately I was not
able to (the setup is a bit different, though).

So I went back to the Raspberry Pi Compute Module 5, where I essentially
just connected the radio module (Quectel RM520N-GL) via USB3 and installed
a Wifi access point. All data and all connections from the Wifi access
point are routed directly via wwan0 to the radio module.

This is currently my simplest setup in which I can reproduce the error,
mostly within a few seconds.

My knowledge of the Linux kernel and USB is unfortunately not yet sufficient
to analyze and fix it myself.

But I used the help of ChatGPT-5 to create a usbmon trace and an xhci kernel
trace (both recorded in parallel):

https://www.file-upload.net/en/download-15523936/usbmon_bus5_20250817-150158.log.html

https://www.file-upload.net/en/download-15523937/xhci_20250817-150158.trace.html

This was the last output my ping showed in a shell:

64 bytes from 8.8.8.8: icmp_seq=2323 ttl=112 time=26.0 ms
64 bytes from 8.8.8.8: icmp_seq=2324 ttl=112 time=25.0 ms
64 bytes from 8.8.8.8: icmp_seq=2325 ttl=112 time=29.1 ms
64 bytes from 8.8.8.8: icmp_seq=2326 ttl=112 time=37.8 ms

In parallel I generated more data traffic, but ping is where I first see
that the IP data connection is no longer stable.

According to ChatGPT-5 the following places contain errors:

*** USBMON ***

In your usbmon_bus5_20250817-150158.log:

First -71 (EPROTO) on the QMI Bulk-IN (Bi:5:005:14): line 2161, 
timestamp 493245744

2161: ffffff8003c8cb40 493245744 C Bi:5:005:14 -71 0

Just before that, there’s a -75 (EOVERFLOW) on the same IN EP, which is 
often the first sign of trouble: line 2159, timestamp 493245221

2159: ffffff8003c8cd80 493245221 C Bi:5:005:14 -75 1024 = ...

So the sequence is: several good completions → EOVERFLOW (-75) → then a 
stream of EPROTO (-71) errors on Bi:5:005:14, which kills further ping 
replies after your last good seq (2326).
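
As an illustration of how the two completion lines quoted above break down, here is a minimal sketch of a parser for the usbmon text format. It only handles the simple shape seen here (URB tag, timestamp in microseconds, event type, address word, integer status, length); the errno names are hard-coded Linux values, and all names in the code are made up for this example.

```python
from typing import NamedTuple, Optional

# Linux errno values that appear in this thread (hard-coded on purpose so
# the sketch does not depend on the platform it runs on).
LINUX_ERRNO_NAMES = {71: "EPROTO", 75: "EOVERFLOW", 115: "EINPROGRESS"}


class UsbmonEvent(NamedTuple):
    urb_tag: str
    timestamp_us: int
    event_type: str       # 'S' submission, 'C' callback, 'E' submission error
    transfer_type: str    # 'B' bulk, 'C' control, 'I' interrupt, 'Z' isochronous
    direction: str        # 'i' or 'o'
    bus: int
    device: int
    endpoint: int
    status: Optional[int]
    length: Optional[int]


def parse_usbmon_line(line: str) -> UsbmonEvent:
    fields = line.split()
    urb_tag, timestamp, event_type, address = fields[:4]
    type_dir, bus, device, endpoint = address.split(":")
    status = length = None
    # Bulk events carry a plain integer status followed by the data length;
    # other shapes (e.g. control setup packets) are left as None here.
    if len(fields) >= 6:
        try:
            status, length = int(fields[4]), int(fields[5])
        except ValueError:
            status = length = None
    return UsbmonEvent(urb_tag, int(timestamp), event_type, type_dir[0],
                       type_dir[1], int(bus), int(device), int(endpoint),
                       status, length)


def status_name(status: Optional[int]) -> str:
    if status is None:
        return "unknown"
    if status == 0:
        return "OK"
    return LINUX_ERRNO_NAMES.get(-status, f"errno {-status}")


event = parse_usbmon_line("ffffff8003c8cb40 493245744 C Bi:5:005:14 -71 0")
print(event.endpoint, event.direction, status_name(event.status), event.length)
```

The leading "2161:" and "2159:" in the quoted lines are line numbers within the log file, not part of the usbmon record, so they have to be stripped before parsing.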


*** XHCI TRACE ***

I found the first failure in your xHCI trace.

First error line: line 8216

Timestamp: 758267.000115

Event: xhci_handle_event … type 'Transfer Event' … 'Error' … slot 1 ep 
29 … len 1472

Why ep 29? In xHCI, the endpoint context index is ep_index = 2 * 
ep_number + (direction), where direction is 0=OUT, 1=IN.
So for Bulk IN ep 14: 2*14+1 = 29 → that’s your IN 0x8E pipe.
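
The mapping can be checked with a short sketch. It assumes that the "ep 29" printed in the trace is the xHCI Device Context Index and that the usbmon endpoint number 14 is decimal, which gives a bEndpointAddress of 0x8E for the IN direction. The function names are hypothetical.

```python
def endpoint_address_to_dci(b_endpoint_address: int) -> int:
    """Map a USB bEndpointAddress to the xHCI Device Context Index."""
    number = b_endpoint_address & 0x0F
    is_in = bool(b_endpoint_address & 0x80)
    if number == 0:
        return 1  # the bidirectional default control endpoint
    return 2 * number + (1 if is_in else 0)


def dci_to_endpoint_address(dci: int) -> int:
    """Inverse mapping: Device Context Index back to bEndpointAddress."""
    if not 1 <= dci <= 31:
        raise ValueError("endpoint DCI must be in the range 1..31")
    if dci == 1:
        return 0x00
    number = dci // 2
    return number | (0x80 if dci % 2 else 0x00)


print(endpoint_address_to_dci(0x8E))        # 29
print(hex(dci_to_endpoint_address(29)))     # 0x8e
```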

Right after that line you can see the driver react:

xhci_handle_transfer … length 1472 … (the failed TD)

xhci_queue_command: Reset Endpoint Command … ep 29 (host tries to recover)

xhci_handle_event: … 'Command Completion Event' (reset completes)

But from this point on, completions for that IN EP correspond to usbmon 
-71 (EPROTO) — matching what you saw.


Does this give a clue as to where it could be coming from?

It is 100% reproducible within a few seconds on the Raspberry Pi Compute
Module 5 (and I see the same behaviour on a different kernel on the i.MX8MP).

Could it be a hardware problem? I have already tried different radio modules
(all Qualcomm, X62/X65 and X72/X75), different cables (all the same length,
all from the same source), and a different eval board for the M.2 radio
modules (but from the same source).


Can you give me a hint on what to try next?


ChatGPT-5 suggests trying to disable LPM for USB3. Could this be a next
step? Or is it something else?


Many thanks for your help!

Best regards,

Martin



* Re: Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5)
  2025-08-17 14:58   ` Martin Maurer
@ 2025-08-17 15:07     ` Martin Maurer
  2025-08-17 15:22     ` Daniele Palmas
  1 sibling, 0 replies; 8+ messages in thread
From: Martin Maurer @ 2025-08-17 15:07 UTC (permalink / raw)
  To: linux-usb, michal.pecio, mathias.nyman

Don't use the links in the original email!

Sorry, they lead to more spam than anything useful. I did not want to send
big files to the mailing list, but this service is also shit, sorry again.

I have now tried it with WeTransfer:

https://wetransfer.com/downloads/a5ddcb347b80bc0e58413b2053d7aa5d20250817150401/7b0199?t_exp=1755702241&t_lsid=b037582b-3c5f-4b07-abbf-74bf23b4890c&t_network=link&t_rid=YXV0aDB8NjQwOGVjNThhOTFhMzAyZWI3OGU5M2M3&t_s=download_link&t_ts=1755443041&utm_campaign=TRN_TDL_12&utm_source=sendgrid&utm_medium=email&trk=TRN_TDL_12


Download only valid for 3 days.





* Re: Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5)
  2025-08-17 14:58   ` Martin Maurer
  2025-08-17 15:07     ` Martin Maurer
@ 2025-08-17 15:22     ` Daniele Palmas
  2025-08-17 17:01       ` Martin Maurer
  1 sibling, 1 reply; 8+ messages in thread
From: Daniele Palmas @ 2025-08-17 15:22 UTC (permalink / raw)
  To: Martin Maurer; +Cc: linux-usb, michal.pecio, mathias.nyman

Hello Martin,

On Sun, 17 Aug 2025 at 17:09, Martin Maurer
<martin.maurer@mmeacs.de> wrote:
>
> Hello Michał, hello Mathias at all,
>
> many thanks for your answers!
>
> I have tried if I can reproduce it with a AMD Linux PC, but
> unfortunately I was not able to reproduce (but setup is a bit different).
>
> So I went back to Raspberry Pi Compute Module 5, where I mainly
> connected the radio module (Quectel RM520N-GL) via USB3,
>
> and installed a Wifi access point. All data/all connections from Wifi
> access point are routed directly via wwan0 to radio module.
>
> This is currently my easiest setup to be able to reproduce the error.
> Mostly in a few seconds.
>
> My knowledge in area Linux Kernel + USB is unfortunately not yet enough
> to analyze and fix it by myself.
>
> But I used the help of ChatGPT-5 to create an usbmon and xhci kernel trace.
>
> I create an usbmon trace as well as a trace from xhci (both recorded in
> parallel):
>
> https://www.file-upload.net/en/download-15523936/usbmon_bus5_20250817-150158.log.html
>
> https://www.file-upload.net/en/download-15523937/xhci_20250817-150158.trace.html
>
> This was the last output, my ping in a shell has shown:
>
> 64 bytes from 8.8.8.8: icmp_seq=2323 ttl=112 time=26.0 ms
> 64 bytes from 8.8.8.8: icmp_seq=2324 ttl=112 time=25.0 ms
> 64 bytes from 8.8.8.8: icmp_seq=2325 ttl=112 time=29.1 ms
> 64 bytes from 8.8.8.8: icmp_seq=2326 ttl=112 time=37.8 ms
>
> In parallel created more data traffic, but with ping I see first when IP
> data connection does not work stable anymore.
>
> According to ChatGPT-5 the following places contain errors:
>
> *** USBMON ***
>
> In your usbmon_bus5_20250817-150158.log:
>
> First -71 (EPROTO) on the QMI Bulk-IN (Bi:5:005:14): line 2161,
> timestamp 493245744
>
> 2161: ffffff8003c8cb40 493245744 C Bi:5:005:14 -71 0
>
> Just before that, there’s a -75 (EOVERFLOW) on the same IN EP, which is
> often the first sign of trouble: line 2159, timestamp 493245221
>

I did not have the chance to look at the usbmon traces so I'm not sure
that this is really the same scenario, but you could take a look at
the whole thread at
https://www.spinics.net/lists/netdev/msg635944.html

If it is the same issue, then basically you should not face it if you
set up the data connection with QMAP.

Regards,
Daniele



* Re: Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5)
  2025-08-17 15:22     ` Daniele Palmas
@ 2025-08-17 17:01       ` Martin Maurer
  2025-08-18 19:49         ` Daniele Palmas
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Maurer @ 2025-08-17 17:01 UTC (permalink / raw)
  To: Daniele Palmas; +Cc: linux-usb, michal.pecio, mathias.nyman

Hi Daniele,

many thanks for your reply!

I can only partly open pages on

https://www.spinics.net

as they often time out...

Have I understood correctly that there is a known bug, but it has not been
fixed (from 2020 until now)?

But as a workaround, enabling qmux/qmimux could work?

Best regards,

Martin





* Re: Problem hanging Bulk IN, with USB 3.x, perhaps due to wMaxPacketSize = 1024 and wMaxBurst = 6 (OUT) and 2 (IN), tested and reproduceable with i.MX8MP and Raspberry Pi Compute Module 5 (CM5)
  2025-08-17 17:01       ` Martin Maurer
@ 2025-08-18 19:49         ` Daniele Palmas
  0 siblings, 0 replies; 8+ messages in thread
From: Daniele Palmas @ 2025-08-18 19:49 UTC (permalink / raw)
  To: Martin Maurer; +Cc: linux-usb, michal.pecio, mathias.nyman

Hi Martin,

On Sun, 17 Aug 2025 at 19:01, Martin Maurer
<martin.maurer@mmeacs.de> wrote:
>
> Hi Daniele,
>
> many thanks for your reply!
>
> I can only partly open
>
> https://www.spinics.net
>
> pages, often pages time out...
>
> Have I understood correctly, that there is a known bug, but it was not
> fixed (from 2020 till now).
>
> But as workaround enabling qmux/qmimux could work?

If the problem is the same, it should. If your kernel version supports
the passthrough sysfs file, you can also use the rmnet module (much
better than the inbox qmap implementation).

If enabling QMAP is too complicated in your setup, you can just try to
increase the rx_urb_size by increasing the mtu (at least 2048).
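
A minimal sketch of the MTU step, assuming the interface is the wwan0 mentioned earlier in the thread and that the standard iproute2 "ip link set" command is available. The helper names are made up, and the dry-run default only builds the command without touching the system. Whether a given value is accepted, and whether it changes rx_urb_size as intended, depends on the driver and should be verified on the target.

```python
import subprocess
from typing import List


def build_set_mtu_command(interface: str, mtu: int) -> List[str]:
    """Build the iproute2 command that sets the MTU of a network interface."""
    if mtu <= 0:
        raise ValueError("MTU must be a positive integer")
    return ["ip", "link", "set", "dev", interface, "mtu", str(mtu)]


def set_mtu(interface: str, mtu: int, dry_run: bool = True) -> List[str]:
    """Set the MTU, or only return the command when dry_run is True."""
    command = build_set_mtu_command(interface, mtu)
    if not dry_run:
        # Requires root; raises CalledProcessError if the kernel rejects it.
        subprocess.run(command, check=True)
    return command


# Dry run: prints the command that would be executed.
print(" ".join(set_mtu("wwan0", 2048)))
```

Calling set_mtu("wwan0", 2048, dry_run=False) as root would apply the change; a non-zero exit status from ip means the value was not accepted.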

Regards,
Daniele

