linux-usb.vger.kernel.org archive mirror
* [Bug 217242] New: CPU hard lockup related to xhci/dma
@ 2023-03-24 15:00 bugzilla-daemon
  2023-04-01 20:49 ` [Bug 217242] " bugzilla-daemon
                   ` (35 more replies)
  0 siblings, 36 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-03-24 15:00 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

            Bug ID: 217242
           Summary: CPU hard lockup related to xhci/dma
           Product: Drivers
           Version: 2.5
    Kernel Version: 6.1.14-1-lts
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: USB
          Assignee: drivers_usb@kernel-bugs.kernel.org
          Reporter: miller.hunterc@gmail.com
        Regression: No

Created attachment 304018
  --> https://bugzilla.kernel.org/attachment.cgi?id=304018&action=edit
cpu hard lockup

On Intel NUC11ATKC4 computers, CPU hard lockups occur seemingly at random.
Typically the issue arises every 1-5 days, though sometimes it does not
arise for more than a week.

Of note: These computers are constantly communicating via serial communication
(request-reply pattern) to an embedded device via USB to UART cable. Unsure if
that may contribute - will start test next week to see if issue still arises
when serial communication does not occur.

This issue has been seen as far back as linux-lts 5.15.62 (not saying that's
when the issue started, just that it is the earliest release that these
computers have run).

This is on Arch Linux.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
@ 2023-04-01 20:49 ` bugzilla-daemon
  2023-04-01 20:54 ` bugzilla-daemon
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-01 20:49 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

Austin Domino (austin.domino@hotmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |austin.domino@hotmail.com

--- Comment #1 from Austin Domino (austin.domino@hotmail.com) ---
The same problem has been seen on Intel NUC7i5BNK, ASRock NUC BOX-1260P,
LattePanda Delta 3 and other compact computers of the sort. If the number of
individual processes doing USB serial communications (ttyUSB/ttyACM) at once
matches or exceeds the number of CPU cores, the entire CPU can get locked up
until the watchdog frees things up.  Most lockups of this sort are resolved
after ~20-25 seconds, but some have lasted over 20 minutes!  There are several
ways to work around this problem. One is to set the CPU affinity so at least
1 core won't ever be used for USB serial communications. Another is to use a
semaphore to add a similar limitation.
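
The CPU-affinity workaround described above could look roughly like the
following. This is a minimal sketch, not code from the reporter: the helper
names build_serial_cpu_mask() and pin_serial_worker() are hypothetical, and it
assumes a Linux/glibc system where sched_setaffinity() and the CPU_* macros
are exposed by defining _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Fill `set` with CPUs 0..ncpus-1 except `reserved_cpu`, so that one core
 * is never used by the serial workers. Returns the number of CPUs in the set.
 * (Hypothetical helper, for illustration only.)
 */
int build_serial_cpu_mask(int ncpus, int reserved_cpu, cpu_set_t *set)
{
    assert(ncpus > 1 && ncpus <= CPU_SETSIZE);
    assert(reserved_cpu >= 0 && reserved_cpu < ncpus);

    CPU_ZERO(set);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (cpu != reserved_cpu)
            CPU_SET(cpu, set);
    }
    return CPU_COUNT(set);
}

/*
 * Restrict the calling process (pid 0 means "self") to the mask above.
 * Returns 0 on success, -1 with errno set on failure.
 */
int pin_serial_worker(int ncpus, int reserved_cpu)
{
    cpu_set_t set;

    build_serial_cpu_mask(ncpus, reserved_cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Each process that opens a ttyUSB/ttyACM device would call
pin_serial_worker() once at startup, for example with ncpus taken from
sysconf(_SC_NPROCESSORS_ONLN). Note that this only restricts where the
user-space process is scheduled.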

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
  2023-04-01 20:49 ` [Bug 217242] " bugzilla-daemon
@ 2023-04-01 20:54 ` bugzilla-daemon
  2023-04-01 20:57 ` bugzilla-daemon
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-01 20:54 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #2 from Austin Domino (austin.domino@hotmail.com) ---
Created attachment 304072
  --> https://bugzilla.kernel.org/attachment.cgi?id=304072&action=edit
Log file displaying this problem on the ASRock NUC Box-1260P running kernel
version 6.2.8

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
  2023-04-01 20:49 ` [Bug 217242] " bugzilla-daemon
  2023-04-01 20:54 ` bugzilla-daemon
@ 2023-04-01 20:57 ` bugzilla-daemon
  2023-04-01 22:11 ` bugzilla-daemon
                   ` (32 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-01 20:57 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #3 from Austin Domino (austin.domino@hotmail.com) ---
Created attachment 304073
  --> https://bugzilla.kernel.org/attachment.cgi?id=304073&action=edit
Log file displaying this problem on the Intel NUC7i5BNK running kernel version
6.2.8

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (2 preceding siblings ...)
  2023-04-01 20:57 ` bugzilla-daemon
@ 2023-04-01 22:11 ` bugzilla-daemon
  2023-04-01 22:12 ` bugzilla-daemon
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-01 22:11 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #4 from Austin Domino (austin.domino@hotmail.com) ---
The setup that created the output in the previous two attached log files is
described in more detail in related posts on:

Ubuntu Forums -> https://ubuntuforums.org/showthread.php … st14136903
and
Ubuntu's Launchpad Bug Page -> https://bugs.launchpad.net/ubuntu/+sour …
ug/2013390

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (3 preceding siblings ...)
  2023-04-01 22:11 ` bugzilla-daemon
@ 2023-04-01 22:12 ` bugzilla-daemon
  2023-04-01 22:17 ` bugzilla-daemon
                   ` (30 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-01 22:12 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #5 from Austin Domino (austin.domino@hotmail.com) ---
The setup that created this output is described in related posts on:

Ubuntu Forums ->
[url]https://ubuntuforums.org/showthread.php?t=2485480&p=14136903#post14136903[/url]
and
Ubuntu's Launchpad Bug Page ->
[url]https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2013390[/url]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (4 preceding siblings ...)
  2023-04-01 22:12 ` bugzilla-daemon
@ 2023-04-01 22:17 ` bugzilla-daemon
  2023-04-02 15:54   ` Hans Petter Selasky
  2023-04-02 15:54 ` bugzilla-daemon
                   ` (29 subsequent siblings)
  35 siblings, 1 reply; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-01 22:17 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #6 from Austin Domino (austin.domino@hotmail.com) ---
(In reply to Austin Domino from comment #4)
> The setup that created the output in the previous two attached log files is
> described in more detail in related posts on:
> 
> Ubuntu Forums -> https://ubuntuforums.org/showthread.php … st14136903
> and
> Ubuntu's Launchpad Bug Page -> https://bugs.launchpad.net/ubuntu/+sour …
> ug/2013390
Try 3 (I wish I could edit posts, but I should not have pressed "Save Changes"
so quickly. Sorry about this comment and my incompetence):

Here are the actual links:

Ubuntu Forums ->
https://ubuntuforums.org/showthread.php?t=2485480&p=14136903#post14136903
and
Ubuntu's Launchpad Bug Page ->
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2013390

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Bug 217242] CPU hard lockup related to xhci/dma
  2023-04-01 22:17 ` bugzilla-daemon
@ 2023-04-02 15:54   ` Hans Petter Selasky
  2023-04-02 17:25     ` Greg KH
  0 siblings, 1 reply; 41+ messages in thread
From: Hans Petter Selasky @ 2023-04-02 15:54 UTC (permalink / raw)
  To: bugzilla-daemon, linux-usb

On 4/2/23 00:17, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217242
> 
> --- Comment #6 from Austin Domino (austin.domino@hotmail.com) ---
> (In reply to Austin Domino from comment #4)
>> The setup that created the output in the previous two attached log files is
>> described in more detail in related posts on:
>>
>> Ubuntu Forums -> https://ubuntuforums.org/showthread.php … st14136903
>> and
>> Ubuntu's Launchpad Bug Page -> https://bugs.launchpad.net/ubuntu/+sour …
>> ug/2013390
> Try 3 (I wish I could edit posts, but I should not have pressed "Save Changes"
> so quickly. Sorry about this comment and my incompetence):
> 
> Here are the actual links:
> 
> Ubuntu Forums ->
> https://ubuntuforums.org/showthread.php?t=2485480&p=14136903#post14136903
> and
> Ubuntu's Launchpad Bug Page ->
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2013390
> 

Hi,

I don't have access to the bugzilla, but this looks like an out-of-memory
situation, and does not really point towards the USB XHCI. URBs are
typically submitted using GFP_KERNEL, which allows memory allocators to
sleep while waiting for more memory. GFP_ATOMIC does not allow sleeping.

usb_submit_urb(xxx, GFP_KERNEL);

That being said, I wish the Linux USB core would take the example
of the FreeBSD USB core, and pre-allocate all memory needed for USB
transfers, also called URBs, during device attach. Frequently going
through allocate and free cycles during operation is not just
inefficient, but also greatly degrades the ability to debug the system.
USB is still quite essential when doing remote server access. Yeah, the
serial port is great too, but one day inb() and outb() will die :-)

--HPS

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Bug 217242] CPU hard lockup related to xhci/dma
  2023-04-02 15:54   ` Hans Petter Selasky
@ 2023-04-02 17:25     ` Greg KH
  2023-04-02 18:57       ` Alan Stern
  0 siblings, 1 reply; 41+ messages in thread
From: Greg KH @ 2023-04-02 17:25 UTC (permalink / raw)
  To: Hans Petter Selasky; +Cc: bugzilla-daemon, linux-usb

On Sun, Apr 02, 2023 at 05:54:18PM +0200, Hans Petter Selasky wrote:
> While that being said, I wish the Linux USB core would take the example of
> the FreeBSD USB core, and pre-allocate all memory needed for USB transfers,
> also called URB's, during device attach.

Many drivers do that today already, which specific ones do you think
need to have this added that are not doing so?

> Frequently going through allocate
> and free cycles during operation, is not just inefficient, but also greatly
> degrades the ability to debug the system.

Based on the slow USB speeds, "inefficient" isn't anything that I've
been able to measure specifically, have you?

> USB is still quite essential when doing remote server access. Yeah,
> the serial port is great too, but one day inb() and outb() will die

That's what a USB debugging cable is for :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Bug 217242] CPU hard lockup related to xhci/dma
  2023-04-02 17:25     ` Greg KH
@ 2023-04-02 18:57       ` Alan Stern
  2023-04-05 18:15         ` Hans Petter Selasky
  0 siblings, 1 reply; 41+ messages in thread
From: Alan Stern @ 2023-04-02 18:57 UTC (permalink / raw)
  To: Greg KH, Hans Petter Selasky; +Cc: linux-usb

[Bugzilla removed from the CC: list, since this isn't relevant to the bug 
report]

On Sun, Apr 02, 2023 at 07:25:27PM +0200, Greg KH wrote:
> On Sun, Apr 02, 2023 at 05:54:18PM +0200, Hans Petter Selasky wrote:
> > While that being said, I wish the Linux USB core would take the example of
> > the FreeBSD USB core, and pre-allocate all memory needed for USB transfers,
> > also called URB's, during device attach.
> 
> Many drivers do that today already, which specific ones do you think
> need to have this added that are not doing so?

Hans is undoubtedly referring to the host controller drivers.

usb_alloc_urb() allocates memory for the URB itself.  But the routine does 
not know which device or host controller the URB will eventually be used 
with, so it doesn't know which HCD to tell to set aside adequate memory 
for handling the URB once it is submitted.  And since HCDs tend to process 
URB submissions while holding a private spinlock, when their memory 
allocation does get done it cannot use GFP_KERNEL.

I think it's fair to call this a weak point in Linux's USB stack.  
Balancing this, it should be pointed out that we can't always know in 
advance how large an URB's transfer buffer will be, and the amount of 
memory that the HCD will need can depend on this size.

> > Frequently going through allocate
> > and free cycles during operation, is not just inefficient, but also greatly

In fact, the original Slab memory allocator (in Solaris 2.4) was designed 
to make frequent allocate-and-free cycles extremely efficient.  So much so 
that people would just naturally do things that way instead of 
pre-allocating memory which would then just sit around unused a large 
fraction of the time.

I suspect the allocators in the Linux kernel don't end up being quite as 
efficient as the original Slab, however.

Alan Stern

> > degrades the ability to debug the system.
> 
> Based on the slow USB speeds, "inefficient" isn't anything that I've
> been able to measure specifically, have you?
> 
> > USB is still quite essential when doing remote server access. Yeah,
> > the serial port is great too, but one day inb() and outb() will die
> 
> That's what a USB debugging cable is for :)
> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (7 preceding siblings ...)
  2023-04-02 17:25 ` bugzilla-daemon
@ 2023-04-03 19:18 ` bugzilla-daemon
  2023-04-06 20:15 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-03 19:18 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #9 from Hunter M (miller.hunterc@gmail.com) ---
Created attachment 304082
  --> https://bugzilla.kernel.org/attachment.cgi?id=304082&action=edit
1 serial comm lockup

Update: Linux version 6.1.21-1-lts (arch linux)

Performed a test over the weekend to see if the issue is reproducible when
running only the process that communicates over UART. Logs attached.

Will build the kernel with DMA API debugging enabled and see if I am able to
get any more information.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Bug 217242] CPU hard lockup related to xhci/dma
  2023-04-02 18:57       ` Alan Stern
@ 2023-04-05 18:15         ` Hans Petter Selasky
  0 siblings, 0 replies; 41+ messages in thread
From: Hans Petter Selasky @ 2023-04-05 18:15 UTC (permalink / raw)
  To: Alan Stern, Greg KH; +Cc: linux-usb

On 4/2/23 20:57, Alan Stern wrote:
> [Bugzilla removed from the CC: list, since this isn't relevant to the bug
> report]
> 
> On Sun, Apr 02, 2023 at 07:25:27PM +0200, Greg KH wrote:
>> On Sun, Apr 02, 2023 at 05:54:18PM +0200, Hans Petter Selasky wrote:
>>> While that being said, I wish the Linux USB core would take the example of
>>> the FreeBSD USB core, and pre-allocate all memory needed for USB transfers,
>>> also called URB's, during device attach.
>>
>> Many drivers do that today already, which specific ones do you think
>> need to have this added that are not doing so?
> 
> Hans is undoubtedly referring to the host controller drivers.

Hi Alan,

Yes, I'm on the USB host side this time.

> usb_alloc_urb() allocates memory for the URB itself.  But the routine does
> not know which device or host controller the URB will eventually be used
> with, so it doesn't know which HCD to tell to set aside adequate memory
> for handling the URB once it is submitted.  And since HCDs tend to process
> URB submissions while holding a private spinlock, when their memory
> allocation does get done it cannot use GFP_KERNEL.

I remember a long time ago, when memory allocation was very slow in
FreeBSD, testing the USB control endpoint was difficult without at the
same time using 100% CPU. The reason was that user-space applications used
ioctls to do USB control endpoint requests synchronously, and that
led to the request data being allocated and freed regularly. That was
before jemalloc and per-CPU slabs. It was not the amount of data causing
problems, but the request rate, typically 1000 - 8000 requests per
second. Finding free holes in memory bitmaps due to fragmentation is
_very_ expensive!

> 
> I think it's fair to call this a weak point in Linux's USB stack.
> Balancing this, it should be pointed out that we can't always know in
> advance how large an URB's transfer buffer will be, and the amount of
> memory that the HCD will need can depend on this size.
 >

In FreeBSD you have to specify a maximum length in bytes per "urb" or
FreeBSD USB transfer, and various other static properties. Then you
don't allocate and free those URBs, so to speak, but just keep on
re-using them after the first allocation. All XHCI DMA structures are
then just pre-allocated. Because we know the PAGE_SIZE and how stuff is
laid out in memory, it's easy to compute exactly the worst and best
case for the number of hardware structures you need.
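
The page arithmetic mentioned above can be sketched as follows. This is an
illustration only, not FreeBSD code: it assumes a 4096-byte page
(SKETCH_PAGE_SIZE), assumes one hardware descriptor per page touched by the
transfer buffer, and the function names are hypothetical. The best case is a
page-aligned buffer; the worst case is a buffer that starts on the last byte
of a page.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Fewest pages a buffer of max_len bytes can touch (page-aligned start). */
size_t pages_best_case(size_t max_len)
{
    return (max_len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Most pages a buffer of max_len bytes can touch (starts at page offset
 * SKETCH_PAGE_SIZE - 1, so only one byte lands in the first page). */
size_t pages_worst_case(size_t max_len)
{
    size_t pages;

    if (max_len == 0)
        return 0;
    pages = (max_len + SKETCH_PAGE_SIZE - 2) / SKETCH_PAGE_SIZE + 1;
    assert(pages >= pages_best_case(max_len));
    return pages;
}
```

With a fixed maximum length per transfer, the worst-case count is known at
attach time, so that many descriptors can be set aside once and reused.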

This is also very useful for boot-loaders, since FreeBSD USB can either
run all single-threaded with a few fixed-size memory pools, or
multi-threaded as part of a bigger OS.

>>> Frequently going through allocate
>>> and free cycles during operation, is not just inefficient, but also greatly
> 
> In fact, the original Slab memory allocator (in Solaris 2.4) was designed
> to make frequent allocate-and-free cycles extremely efficient.  So much so
> that people would just naturally do things that way instead of
> pre-allocating memory which would then just sit around unused a large
> fraction of the time.
> 
> I suspect the allocators in the Linux kernel don't end up being quite as
> efficient as the original Slab, however.
> 

FreeBSD USB is a completely different design compared to Linux. Anyway, 
back to the topic and thanks for the chat :-)

--HPS

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (8 preceding siblings ...)
  2023-04-03 19:18 ` bugzilla-daemon
@ 2023-04-06 20:15 ` bugzilla-daemon
  2023-04-06 20:16 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-06 20:15 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #10 from Hunter M (miller.hunterc@gmail.com) ---
Created attachment 304094
  --> https://bugzilla.kernel.org/attachment.cgi?id=304094&action=edit
computer 1 dmesg CPU lockup

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (9 preceding siblings ...)
  2023-04-06 20:15 ` bugzilla-daemon
@ 2023-04-06 20:16 ` bugzilla-daemon
  2023-04-10 17:32 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-06 20:16 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #11 from Hunter M (miller.hunterc@gmail.com) ---
Created attachment 304095
  --> https://bugzilla.kernel.org/attachment.cgi?id=304095&action=edit
computer 2 dmesg CPU lockup

Update: DMA API debugging did not result in any warnings/errors from DMA. CPU
lockups have occurred on 2 of 6 computers.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (10 preceding siblings ...)
  2023-04-06 20:16 ` bugzilla-daemon
@ 2023-04-10 17:32 ` bugzilla-daemon
  2023-04-10 17:34 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-10 17:32 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #12 from Austin Domino (austin.domino@hotmail.com) ---
After some more testing, I found this bug was most likely introduced between
kernel versions 5.11 and 5.13, but more testing will need to be done to verify
this and narrow things down further.  It should be noted, though, that the
lockup that occurred on kernel version 5.13 appeared in a slightly different
manner than I've seen before.  I've uploaded a short excerpt from that kernel
log.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (11 preceding siblings ...)
  2023-04-10 17:32 ` bugzilla-daemon
@ 2023-04-10 17:34 ` bugzilla-daemon
  2023-04-11 12:54 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-10 17:34 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #13 from Austin Domino (austin.domino@hotmail.com) ---
Created attachment 304110
  --> https://bugzilla.kernel.org/attachment.cgi?id=304110&action=edit
Excerpt from kernel log on system running 5.13

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (12 preceding siblings ...)
  2023-04-10 17:34 ` bugzilla-daemon
@ 2023-04-11 12:54 ` bugzilla-daemon
  2023-04-12 19:56 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-11 12:54 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

Mathias Nyman (mathias.nyman@linux.intel.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mathias.nyman@linux.intel.c
                   |                            |om

--- Comment #14 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
Looks like it gets stuck during xhci ring expansion.

During ring expansion the xhci driver allocates memory with the spinlock held,
using dma_pool_zalloc(.., GFP_ATOMIC, ...)

This apparently never completes, so the spinlock isn't released.
Any URBs queued for xhci after this will spin forever trying to take the lock,
locking up that CPU.

The xhci ring expansion code looks broken: the calculation of the number of
new ring segments needed is incorrect, and the result may be huge.

Also I don't think we should need to expand the ring buffer in this case. There
might be some bug in how the driver keeps track of free TRBs.

I'll write a debugging patch that tracks free TRBs and expansion values.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (13 preceding siblings ...)
  2023-04-11 12:54 ` bugzilla-daemon
@ 2023-04-12 19:56 ` bugzilla-daemon
  2023-04-12 19:57 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-12 19:56 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #15 from Hunter M (miller.hunterc@gmail.com) ---
Created attachment 304127
  --> https://bugzilla.kernel.org/attachment.cgi?id=304127&action=edit
trb values

Found what (possibly) looks to be a similar issue at
https://github.com/raspberrypi/linux/is … 1241972882.

The comment I've linked has a similar scenario (in that the single serial port
communication occurs frequently - the comment stated the serial communication
frequency was every 0.2s; the communication I've implemented is every 0.15s).
Yesterday I had started the serial communication process and left it overnight
to see the trb values by performing

for d in /sys/kernel/debug/usb/xhci/0000:01:00.0/devices/*/*; do
    if [ -d "$d" ]; then cd $d; echo "${d/?*\/devices\//}: $(wc -l trbs)"; fi
done

However, when coming back this morning and attempting the above command, the
computer froze and either the software or hardware watchdog kicked in and
restarted the computer (journalctl logs did not print out before the reboot
occurred, so unfortunately I don't have any information on what happened).

I went ahead and re-performed the test today reducing the sleep time in between
serial write calls from 0.15s to 0.05s. With that, the ttyACM ring size did
expand considerably. The values I was able to obtain until a power outage
occurred which stopped my test are attached.

Further information: utilizing Microchip MCP2200 USB to UART converter instead
of FTDI (though this should not make a difference, pointing it out just in
case).

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (14 preceding siblings ...)
  2023-04-12 19:56 ` bugzilla-daemon
@ 2023-04-12 19:57 ` bugzilla-daemon
  2023-04-13  8:02 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-12 19:57 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #16 from Hunter M (miller.hunterc@gmail.com) ---
URL hyperlink got cut - here's the correct link for the issue:
https://github.com/raspberrypi/linux/issues/5088#issuecomment-1241972882

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (15 preceding siblings ...)
  2023-04-12 19:57 ` bugzilla-daemon
@ 2023-04-13  8:02 ` bugzilla-daemon
  2023-04-13 20:23 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-13  8:02 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #17 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
Thanks, one reason why ring expansion calculation is incorrect is that we try
to store a negative value in an unsigned int. 

static int prepare_ring(...)
{
        unsigned int num_trbs_needed;        
        ....
        num_trbs_needed = num_trbs - ep_ring->num_trbs_free;
}

ep_ring->num_trbs_free might be bigger than num_trbs, so we end up with a huge
and incorrect num_trbs_needed.

https://elixir.bootlin.com/linux/v6.2/source/drivers/usb/host/xhci-ring.c#L3186
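
The wraparound described above can be reproduced outside the kernel. The
following is a minimal sketch, not the driver's code: the function names are
hypothetical, and the guarded variant is only one possible way to avoid the
wrap, not necessarily what the eventual fix does.

```c
#include <limits.h>

/* Same subtraction as in the quoted prepare_ring() excerpt. Both operands
 * and the result are unsigned, so a "negative" difference wraps around
 * modulo UINT_MAX + 1 instead of going below zero. */
unsigned int trbs_needed_unsigned(unsigned int num_trbs,
                                  unsigned int num_trbs_free)
{
        return num_trbs - num_trbs_free;
}

/* Hypothetical guarded variant: nothing extra is needed when the ring
 * already has at least num_trbs free entries. */
unsigned int trbs_needed_guarded(unsigned int num_trbs,
                                 unsigned int num_trbs_free)
{
        if (num_trbs_free >= num_trbs)
                return 0;
        return num_trbs - num_trbs_free;
}
```

With num_trbs = 2 and num_trbs_free = 5, the first function returns
UINT_MAX - 2 rather than -3, while the guarded one returns 0.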

In addition to this, there is most likely a small bug in tracking
ep_ring->num_trbs_free that gradually decreases it incorrectly.
Not sure yet where that happens.

Still working on that debugging patch

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (16 preceding siblings ...)
  2023-04-13  8:02 ` bugzilla-daemon
@ 2023-04-13 20:23 ` bugzilla-daemon
  2023-04-14 14:24 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-13 20:23 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #18 from Austin Domino (austin.domino@hotmail.com) ---
I've narrowed down when this bug first appears to the 5.12 kernel release; I
ran a couple computers for a week on kernel version 5.11 and ran into 0
problems while running a program like the one above, but before doing this, I
had 1 of those computers on kernel version 5.12 while running that same program
with the same devices and it ran into this bug within 24 hours.


I looked at the number of TRBs for the computers that ran kernel version 5.11
for over a week with that program, and they were all at 512, so it's extremely
unlikely that the ring expansion problems are present in version 5.11.


This morning, out of curiosity, I took a computer running Ubuntu 18.04, went to
Ubuntu's kernel build page, https://kernel.ubuntu.com/~kernel-ppa/mainline/,
and tried a number of kernels to narrow down when the ring expansion problems
first appeared.  It seems that this problem is present in all the 5.12-rc
releases, and I know that it's present on 5.15; that was what this computer was
running before all of this, so I'm assuming that it's present from 5.12 onward.
Right now I have this computer running kernel "v5.12-rc1" from Ubuntu's kernel
page, and the ring expansion problems are present; the maximum number of TRBs for a
device the last time I checked was 8388608 after ~2.5 hours, and I'm curious if
this computer will run into a hard lockup.  I'm nearly certain that it will,
but we'll just have to wait and see to be certain.
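
To put the 8388608 figure in perspective, here is a small arithmetic sketch.
It assumes the ring grows by doubling from the 512 TRBs seen on healthy
systems (the doubling is an assumption, not something confirmed in this
thread) and uses the 16-byte TRB size from the xHCI specification. The helper
names are made up for illustration.

```c
#define INITIAL_TRBS 512UL  /* ring size reported on unaffected systems */
#define TRB_BYTES    16UL   /* one TRB is 16 bytes per the xHCI spec */

/* Number of doublings needed to grow from INITIAL_TRBS to at least trbs. */
unsigned int doublings_to_reach(unsigned long trbs)
{
        unsigned int n = 0;
        unsigned long size = INITIAL_TRBS;

        while (size < trbs) {
                size *= 2;
                n++;
        }
        return n;
}

/* Memory taken by the TRBs themselves, ignoring per-segment bookkeeping. */
unsigned long ring_bytes(unsigned long trbs)
{
        return trbs * TRB_BYTES;
}
```

Under those assumptions, 8388608 TRBs corresponds to 14 doublings of a
512-entry ring and 134217728 bytes (128 MiB) of TRBs for a single endpoint.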


Lastly, I went and looked at the changes that were done between v5.11 and
v5.12-rc2 within the "drivers/usb/host" directory and it appears that a
moderate amount of change took place (more than enough to make my head spin). 
I haven't dealt with kernel source like this before and it'd take a while to
parse through everything to understand what's going on, so I don't know how
much further I will get involved. Anyhow, I hope that this information might
help.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (17 preceding siblings ...)
  2023-04-13 20:23 ` bugzilla-daemon
@ 2023-04-14 14:24 ` bugzilla-daemon
  2023-04-14 14:32 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-14 14:24 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #19 from Mathias Nyman (mathias.nyman@linux.intel.com) ---

I suspect the offending commit is:
55f6153d8cc8 xhci: remove extra loop in interrupt context

It changes how num_trbs_free is counted for a ring. 

I'll attach a debug patch that will track both free trb and ring expansion

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (18 preceding siblings ...)
  2023-04-14 14:24 ` bugzilla-daemon
@ 2023-04-14 14:32 ` bugzilla-daemon
  2023-04-14 20:02 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-14 14:32 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #20 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
Created attachment 304134
  --> https://bugzilla.kernel.org/attachment.cgi?id=304134&action=edit
debug patch comparing free trbs

Debugging patch for ring expansion and free TRB tracking.

Patch recalculates free trbs and compares it to the old tracked value of free
trbs. Prints out a message if there is a new mismatch.

Patch also includes a new way of checking if the ring needs expansion, and by how
much. It is used only to show that result when the driver expands the ring based
on the old code.
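
The compare-and-report idea can be sketched as follows. This is an
illustration only, not the contents of attachment 304134: the struct, the
function name, and the message format are all hypothetical.

```c
#include <stdio.h>

/* Remembers the last reported difference so each new mismatch is printed
 * once instead of on every call. */
struct free_trb_check {
        long last_diff;
};

/* Compare the tracked free-TRB counter with a freshly recalculated value.
 * Returns 1 if a new mismatch was reported, 0 otherwise. */
int check_free_trbs(struct free_trb_check *st,
                    unsigned int tracked, unsigned int recalculated)
{
        long diff = (long)recalculated - (long)tracked;

        if (diff == 0 || diff == st->last_diff)
                return 0;

        fprintf(stderr, "free trb mismatch: tracked %u, recalculated %u\n",
                tracked, recalculated);
        st->last_diff = diff;
        return 1;
}
```

Starting from a zeroed struct, matching values stay silent, and a repeated
identical mismatch is reported only the first time.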

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (19 preceding siblings ...)
  2023-04-14 14:32 ` bugzilla-daemon
@ 2023-04-14 20:02 ` bugzilla-daemon
  2023-04-18 19:24 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-14 20:02 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #21 from Hunter M (miller.hunterc@gmail.com) ---
I'll get the kernel compiled with the patch and install it on some computers
next week. Will update with logs once they have run for a few days.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (20 preceding siblings ...)
  2023-04-14 20:02 ` bugzilla-daemon
@ 2023-04-18 19:24 ` bugzilla-daemon
  2023-04-18 20:17 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-18 19:24 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #22 from Hunter M (miller.hunterc@gmail.com) ---
Created attachment 304159
  --> https://bugzilla.kernel.org/attachment.cgi?id=304159&action=edit
xhci debug

Kernel compiled and initial test performed. See debug logs from journalctl
attached.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (21 preceding siblings ...)
  2023-04-18 19:24 ` bugzilla-daemon
@ 2023-04-18 20:17 ` bugzilla-daemon
  2023-04-19 15:41 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-18 20:17 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #23 from Hunter M (miller.hunterc@gmail.com) ---
Created attachment 304161
  --> https://bugzilla.kernel.org/attachment.cgi?id=304161&action=edit
xhci dynamic debug log

xhci debug with dynamic debugging ON for xhci module

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (22 preceding siblings ...)
  2023-04-18 20:17 ` bugzilla-daemon
@ 2023-04-19 15:41 ` bugzilla-daemon
  2023-04-19 15:44 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-19 15:41 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #24 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
Thanks, I see what's going on now.

Some transfers that are further ahead on the ring can simply be turned into no-op
TRBs by the driver when cancelled. These are not added back to num_trbs_free.

This is the case when several URBs are queued for an endpoint and then
cancelled in reverse order.
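
A toy model shows how such a leak makes the tracked counter drift away from
the real free space. This is a deliberately simplified sketch, not the xhci
driver: the ring layout, the function names, and the credit_noop switch are
all hypothetical.

```c
#define RING_SIZE 16u

struct toy_ring {
        unsigned int enq;            /* next slot the driver writes */
        unsigned int deq;            /* next slot still owned by hardware */
        unsigned int num_trbs_free;  /* tracked counter, as in the driver */
};

/* Free slots derived purely from the ring positions (one slot kept unused). */
unsigned int free_from_positions(const struct toy_ring *r)
{
        return (r->deq + RING_SIZE - r->enq - 1) % RING_SIZE;
}

/* Each round queues two TRBs. The first completes normally and is credited
 * back. The second is cancelled, turned into a no-op and skipped; it is
 * credited back only when credit_noop is non-zero. Returns how far the
 * tracked counter has fallen below the real free space. */
unsigned int drift_after(unsigned int rounds, int credit_noop)
{
        struct toy_ring r = { 0, 0, RING_SIZE - 1 };
        unsigned int i;

        for (i = 0; i < rounds; i++) {
                r.enq = (r.enq + 2) % RING_SIZE;  /* queue two TRBs */
                r.num_trbs_free -= 2;

                r.deq = (r.deq + 1) % RING_SIZE;  /* normal completion */
                r.num_trbs_free += 1;

                r.deq = (r.deq + 1) % RING_SIZE;  /* no-op TRB skipped */
                if (credit_noop)
                        r.num_trbs_free += 1;
        }
        return free_from_positions(&r) - r.num_trbs_free;
}
```

In this model every uncredited no-op costs one entry of tracked free space, so
the counter keeps shrinking even though the ring itself is empty after each
round.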

I have an untested fix for this that goes on top of the previous debug patch.
Can you try it out?

I'm also reworking this whole thing, but we need a small fix like this for
older stable kernels.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (23 preceding siblings ...)
  2023-04-19 15:41 ` bugzilla-daemon
@ 2023-04-19 15:44 ` bugzilla-daemon
  2023-04-19 18:41 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-19 15:44 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #25 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
Created attachment 304163
  --> https://bugzilla.kernel.org/attachment.cgi?id=304163&action=edit
Fix trb free calculation patch, goes on top debug patch comparing free trbs

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (24 preceding siblings ...)
  2023-04-19 15:44 ` bugzilla-daemon
@ 2023-04-19 18:41 ` bugzilla-daemon
  2023-04-19 18:45 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-19 18:41 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #26 from Austin Domino (austin.domino@hotmail.com) ---
I can probably do so, but I'm sure Hunter will be on this eventually too.

Also, I saw that you mentioned something about commit 55f6153d8cc8, so I
reverted that commit, rebuilt the kernel, and tested, but I forgot to apply the
previous debug patch.  The TRBs problem doesn't appear to be present anymore
and I haven't run into any hard lockups within the ~24 hour period that it has
been running for so far.  Although, I'd like to run things with that kernel
build for a while longer before saying anything definitive, and I think that
some of the changes themselves within that commit are justifiable (but what can
I really say; I'm way too new at this), so changes like those in this
newest patch appear to be a better solution to me.  I'll just build the most
recent kernel release with this patch for another device.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (25 preceding siblings ...)
  2023-04-19 18:41 ` bugzilla-daemon
@ 2023-04-19 18:45 ` bugzilla-daemon
  2023-04-21 17:54 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-19 18:45 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #27 from Hunter M (miller.hunterc@gmail.com) ---
Created attachment 304164
  --> https://bugzilla.kernel.org/attachment.cgi?id=304164&action=edit
init test with trb calculation test log

Here's the initial log with the trb free calculation patch compiled in the
kernel. 

Initial test shows that the number of trbs stays constant at 512. I'll run this
patch on multiple computers until next week and update with results. 

Let me know when the rework is done as well and I can go ahead and test that.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (26 preceding siblings ...)
  2023-04-19 18:45 ` bugzilla-daemon
@ 2023-04-21 17:54 ` bugzilla-daemon
  2023-04-25 14:35 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-21 17:54 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #28 from Austin Domino (austin.domino@hotmail.com) ---
A quick update.  I've been running a kernel with this patch on a system for
about a day and a half now. I haven't run into any issues so far, and there's
nothing notable in any of the logs.  I'll likely respond back sometime next
week with further updates, but things are looking good so far.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (27 preceding siblings ...)
  2023-04-21 17:54 ` bugzilla-daemon
@ 2023-04-25 14:35 ` bugzilla-daemon
  2023-04-25 19:01 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-25 14:35 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #29 from Hunter M (miller.hunterc@gmail.com) ---
12 computers have been running since 04-19-2023 with no CPU lockups using the
patches. TRB values constant at 512. 
The only thing I have seen on 2 of the computers is a warning log for the
following:

Apr 21 07:47:37 myuser kernel: xhci_hid 0000:00:14.0: WARN Set TR Deq Ptr cmd
failed due to incorrect slot or ep state.

However, no functionality has been lost and computers are running fine with
serial communication occurring as normal.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (28 preceding siblings ...)
  2023-04-25 14:35 ` bugzilla-daemon
@ 2023-04-25 19:01 ` bugzilla-daemon
  2023-04-26 15:18 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-25 19:01 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #30 from Austin Domino (austin.domino@hotmail.com) ---
(In reply to Hunter M from comment #29)
> 12 computers have been running since 04-19-2023 with no CPU lockups using
> the patches. TRB values constant at 512. 
> The only thing I have seen on 2 of the computers is a warning log for the
> following:
> 
> Apr 21 07:47:37 myuser kernel: xhci_hid 0000:00:14.0: WARN Set TR Deq Ptr
> cmd failed due to incorrect slot or ep state.

Hunter, I've seen the same warning in logs for computers running kernel 6.1 and
6.2 before and after these patches were applied.  Also, it appears that there's
already a bug report put together for this warning (see bug w/ id 202541).

Also, since I'm writing a quick comment, I'll give an update on testing the
kernel with these patches applied.  I haven't had any problems so far on any
computer running kernel 5.15, 6.1 or 6.2 with these patches applied over the
past 6 days, and the number of TRBs has remained at 512 on all 5 computers. 
I'm becoming more and more certain as the days go by that these patches fix
this bug, and I'm okay with running a patched kernel for the time being.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (29 preceding siblings ...)
  2023-04-25 19:01 ` bugzilla-daemon
@ 2023-04-26 15:18 ` bugzilla-daemon
  2023-04-26 15:20 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-26 15:18 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #31 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
Thanks, I cleaned up the debug patch and turned it into a real fix.
It can be found in my tree in the fix_ring_expansion branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git 

https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=fix_ring_expansion

The larger rework of ring expansion is also done, but not very well tested.
If you can try it out as well it would be appreciated.
It can be found in the same tree in the rework_ring_expansion branch

https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=rework_ring_expansion

For fear of regressions, I think I'll submit the smaller fix to 6.4 and older
stable kernels, and then rebase the larger rework on top of that, and submit it
to usb-next. (6.5 kernels and later)

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (30 preceding siblings ...)
  2023-04-26 15:18 ` bugzilla-daemon
@ 2023-04-26 15:20 ` bugzilla-daemon
  2023-04-27 20:37 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-26 15:20 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #32 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
Created attachment 304188
  --> https://bugzilla.kernel.org/attachment.cgi?id=304188&action=edit
Final free trb fix for 6.4 and stable

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (31 preceding siblings ...)
  2023-04-26 15:20 ` bugzilla-daemon
@ 2023-04-27 20:37 ` bugzilla-daemon
  2023-05-05 18:50 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-04-27 20:37 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #33 from Hunter M (miller.hunterc@gmail.com) ---
Will be running most computers with the real patch in the fix_ring_expansion
branch for a couple of weeks to verify.

I'll run a few of them with the larger rework next week and post results when I
get a chance.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (32 preceding siblings ...)
  2023-04-27 20:37 ` bugzilla-daemon
@ 2023-05-05 18:50 ` bugzilla-daemon
  2023-05-08  7:54 ` bugzilla-daemon
  2023-05-08 11:46 ` bugzilla-daemon
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-05-05 18:50 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #34 from Hunter M (miller.hunterc@gmail.com) ---
Update on larger rework - No issues found running on 2 computers for the week.
Would logs containing dynamic debug statements for the xhci_hcd module be needed? If
so, will be able to get a snippet of those logs next week.

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (33 preceding siblings ...)
  2023-05-05 18:50 ` bugzilla-daemon
@ 2023-05-08  7:54 ` bugzilla-daemon
  2023-05-08 11:46 ` bugzilla-daemon
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-05-08  7:54 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

--- Comment #35 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
Thanks for testing the larger rework, no logs needed.
Can I add "Tested-by: Miller Hunter <MillerH@hearthnhome.com>" tag to that
series?

* [Bug 217242] CPU hard lockup related to xhci/dma
  2023-03-24 15:00 [Bug 217242] New: CPU hard lockup related to xhci/dma bugzilla-daemon
                   ` (34 preceding siblings ...)
  2023-05-08  7:54 ` bugzilla-daemon
@ 2023-05-08 11:46 ` bugzilla-daemon
  35 siblings, 0 replies; 41+ messages in thread
From: bugzilla-daemon @ 2023-05-08 11:46 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=217242

Hunter M (miller.hunterc@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #36 from Hunter M (miller.hunterc@gmail.com) ---
Yes go ahead. Marking this issue as resolved.
