public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed
* [PROBLEM] usb: xhci_bus_resume cause irq lost issue
       [not found] <ea84165273814a41ae7187a008c4144b@huawei.com>
@ 2025-03-05 10:07 ` liudingyuan
  2025-03-05 15:49   ` Mathias Nyman
  0 siblings, 1 reply; 5+ messages in thread
From: liudingyuan @ 2025-03-05 10:07 UTC (permalink / raw)
  To: linux-usb@vger.kernel.org, greg@kroah.com,
	patchwork-bot@kernel.org, mricon@kernel.org
  Cc: Fangjian (Jay), Kangfenglong, yangxingui, fengsheng (A),
	lingmingqiang, liulongfang, zhonghaoquan, yanzhili (A),
	huyihua (A), Zengtao (B), shenjian (K), liuyonglong,
	Jonathan Cameron



Hi

I'm running into an issue where the enumeration of a USB2.0 device failed due to lost interrupts.

This issue appears to occur randomly and we can only reproduce it on xHCI controllers that provide both USB3.0 and USB2.0 root hubs. Additionally, it is necessary to ensure that the first-level device under this controller is a SB2.0 device.
The above scenario can be referred to in the following figure.
   ----------------------------------------------------------------------------
  |         +---------------------------------+               |
  |         |    xHCI Controller    |               |
  |         +---------------------------------+               |
  |                /       \                      |
  |               /         \                     |
  |              /           \                    |
  |  +-------------------------+      +---------------------------+   |
  |  | USB 3.0 Root Hub |      | USB 2.0 Root Hub  |   |
  |  +------------------------+      +----------------------------+   |
  --------------|-------------------------------------|-------------------------          
          |                       |
          | [NO device]             | [Device A]
          |                       |
                             
                               

The USB topology displayed in the OS looks like this:
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/ , 480M
    ID 1d6b:0002 Linux Foundation 2.0 root hub
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M

/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/ , 5000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub

This issue only occurs when the system reboot or when we insmod the xhci driver or ingister the xhci controller.
When the issue occurs, we can observe that the CPU receives fewer interrupts than what would normally be generated during the enumeration process, and there are error logs indicating command timeouts.
	[ 2040.039438]xhci_hcd 0000:8a:00.7: Command timeout, USBSTS: 0x00000018 EINT PCD
	[ 2040.039444] xhci_hcd 0000:8a:00.7: Command timeout
	[ 2040.039446] xhci_hcd 0000:8a:00.7: Abort command ring
	[ 2042.055435] xhci_hcd 0000:8a:00.7: No stop event for abort, ring start fail?
	[ 2042.055469] xhci_hcd 0000:8a:00.7: Timeout while waiting for setup device command
	[ 2042.064048] usb 15-1: hub failed to enable device, error -62
	[ 2054.343446] xhci_hcd 0000:8a:00.7: Unsuccessful disable slot 1 command, status 25
 	[ 2066.631449] xhci_hcd 0000:8a:00.7: Error while assigning device slot ID: Command Aborted
	[ 2066.640633] xhci_hcd 0000:8a:00.7: Max number of devices this xHCI host supports is 64.
	[ 2066.649713] usb usb15-port1: couldn't allocate usb_device

After verification, we can confirm that the reason for the interrupt loss is that during the enumeration of U2 device,
U3 port is in a suspend procedure and performs an operation to disable interrupts in this function:

	drivers/usb/host/xhci-hub.c
		xhci_bus_resume()
			/* delay the irqs */
			temp = readl(&xhci->op_regs->command);
			temp &= ~CMD_EIE;
			writel(temp, &xhci->op_regs->command);

we can temporarily avoid this issue by modifying parameters.
echo -1 > /sys/module/usbcore/parameters/autosuspend

I am wondering whether there is a chance of interrupt loss occurring, regardless of whether or not it belongs to the scenario mentioned above? As long as an interrupt from a controller is triggered at exactly the same time as the process of disabling the controller's interrupts?

I read the xHCI protocol version 1.2 and haven't found detailed descriptions for such special cases. So I was wondering what is the main reason for disabling interrupts in xHCI driver during the resume process?

Additionally, given that there is a conflict between interrupt handling and the resume process, does our protocol require controllers to ensure they should trigger interrupt signals after interrupt has been re-enabled, even if the driver hasn't cleared the EHB? (This should also apply in cases where edge-triggered interrupts are used.)

Or should the xHCI driver software check and handle the controller's interrupt status when an error occurs, by polling for interrupt flags?

Thanks in advance for any insights!

Devyn Liu
liudingyuan@huawei.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PROBLEM] usb: xhci_bus_resume cause irq lost issue
  2025-03-05 10:07 ` [PROBLEM] usb: xhci_bus_resume cause irq lost issue liudingyuan
@ 2025-03-05 15:49   ` Mathias Nyman
  0 siblings, 0 replies; 5+ messages in thread
From: Mathias Nyman @ 2025-03-05 15:49 UTC (permalink / raw)
  To: liudingyuan, linux-usb@vger.kernel.org, greg@kroah.com,
	patchwork-bot@kernel.org, mricon@kernel.org
  Cc: Fangjian (Jay), Kangfenglong, yangxingui, fengsheng (A),
	lingmingqiang, liulongfang, zhonghaoquan, yanzhili (A),
	huyihua (A), Zengtao (B), shenjian (K), liuyonglong,
	Jonathan Cameron

Hi

On 5.3.2025 12.07, liudingyuan wrote:
> 
> 
> Hi
> 
> I'm running into an issue where the enumeration of a USB2.0 device failed due to lost interrupts.
> 
> This issue appears to occur randomly and we can only reproduce it on xHCI controllers that provide both USB3.0 and USB2.0 root hubs. Additionally, it is necessary to ensure that the first-level device under this controller is a SB2.0 device.
> The above scenario can be referred to in the following figure.
>     ----------------------------------------------------------------------------
>    |         +---------------------------------+               |
>    |         |    xHCI Controller    |               |
>    |         +---------------------------------+               |
>    |                /       \                      |
>    |               /         \                     |
>    |              /           \                    |
>    |  +-------------------------+      +---------------------------+   |
>    |  | USB 3.0 Root Hub |      | USB 2.0 Root Hub  |   |
>    |  +------------------------+      +----------------------------+   |
>    --------------|-------------------------------------|-------------------------
>            |                       |
>            | [NO device]             | [Device A]
>            |                       |
>                               
>                                 
> 
> The USB topology displayed in the OS looks like this:
> /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/ , 480M
>      ID 1d6b:0002 Linux Foundation 2.0 root hub
>      |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
> 
> /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/ , 5000M
>      ID 1d6b:0003 Linux Foundation 3.0 root hub

Odd that the USB3 roothub is registered before USB2 roothub,
I thought xhci driver always registers USB2 hcd first .

> 
> This issue only occurs when the system reboot or when we insmod the xhci driver or ingister the xhci controller.
> When the issue occurs, we can observe that the CPU receives fewer interrupts than what would normally be generated during the enumeration process, and there are error logs indicating command timeouts.
> 	[ 2040.039438]xhci_hcd 0000:8a:00.7: Command timeout, USBSTS: 0x00000018 EINT PCD
> 	[ 2040.039444] xhci_hcd 0000:8a:00.7: Command timeout
> 	[ 2040.039446] xhci_hcd 0000:8a:00.7: Abort command ring
> 	[ 2042.055435] xhci_hcd 0000:8a:00.7: No stop event for abort, ring start fail?
> 	[ 2042.055469] xhci_hcd 0000:8a:00.7: Timeout while waiting for setup device command
> 	[ 2042.064048] usb 15-1: hub failed to enable device, error -62
> 	[ 2054.343446] xhci_hcd 0000:8a:00.7: Unsuccessful disable slot 1 command, status 25
>   	[ 2066.631449] xhci_hcd 0000:8a:00.7: Error while assigning device slot ID: Command Aborted
> 	[ 2066.640633] xhci_hcd 0000:8a:00.7: Max number of devices this xHCI host supports is 64.
> 	[ 2066.649713] usb usb15-port1: couldn't allocate usb_device
> 
> After verification, we can confirm that the reason for the interrupt loss is that during the enumeration of U2 device,
> U3 port is in a suspend procedure and performs an operation to disable interrupts in this function:
> 
> 	drivers/usb/host/xhci-hub.c
> 		xhci_bus_resume()
> 			/* delay the irqs */
> 			temp = readl(&xhci->op_regs->command);
> 			temp &= ~CMD_EIE;
> 			writel(temp, &xhci->op_regs->command);
> 
> we can temporarily avoid this issue by modifying parameters.
> echo -1 > /sys/module/usbcore/parameters/autosuspend
> 
> I am wondering whether there is a chance of interrupt loss occurring, regardless of whether or not it belongs to the scenario mentioned above? As long as an interrupt from a controller is triggered at exactly the same time as the process of disabling the controller's interrupts?
> 
> I read the xHCI protocol version 1.2 and haven't found detailed descriptions for such special cases. So I was wondering what is the main reason for disabling interrupts in xHCI driver during the resume process?
> 

This is from a time before I started maintaining the xhci driver. I guess
it was done to allow bus suspended usb2 ports to resume fully to U0 before
xhci_bus_resume() returns.

Resuming a usb2 port from  U3 suspend to U0 is a two stage process.
In 'host initiated resume' the xhci driver will first transition the port from
'U3' to 'Resume' state, then wait in Resume state for 20ms, and finally move
it to U0 state.

I assume driver disabled xHC from triggering interrupts to prevent event handler
from messing with this usb2 port resume process.

spin_lock_irqsave(xhci->lock) would normally be used to prevent interrupt
handler from interfering, but keeping the spin lock over msleep(20) was not
possible.

The device initiated usb2 resume is much better, it utilizes the event handler,
hub thread, timestamps and completions. The same should be done here, but
implementation isn't trivial.

I don't however think there is a reason to turn off interrupts while resuming
the USB3 bus. It doesn't sleep so just keeping spin_lock_irqsave() should be
enough. This should be an easy fix. Something like this:
(untested, copy-pasted diff)


diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 9693464c0520..7cf7ee84fc96 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1873,6 +1873,7 @@ int xhci_bus_resume(struct usb_hcd *hcd)
         u32 temp, portsc;
         struct xhci_hub *rhub;
         struct xhci_port **ports;
+       bool disabled_irq = false;
  
         rhub = xhci_get_rhub(hcd);
         ports = rhub->ports;
@@ -1888,17 +1889,19 @@ int xhci_bus_resume(struct usb_hcd *hcd)
                 return -ESHUTDOWN;
         }
  
-       /* delay the irqs */
-       temp = readl(&xhci->op_regs->command);
-       temp &= ~CMD_EIE;
-       writel(temp, &xhci->op_regs->command);
-
         /* bus specific resume for ports we suspended at bus_suspend */
-       if (hcd->speed >= HCD_USB3)
+       if (hcd->speed >= HCD_USB3) {
                 next_state = XDEV_U0;
-       else
+       } else {
                 next_state = XDEV_RESUME;
-
+               if (bus_state->bus_suspended) {
+                       /* delay the irqs if we need to resume usb2 ports */
+                       temp = readl(&xhci->op_regs->command);
+                       temp &= ~CMD_EIE;
+                       writel(temp, &xhci->op_regs->command);
+                       disabled_irq = true;
+               }
+       }
         port_index = max_ports;
         while (port_index--) {
                 portsc = readl(ports[port_index]->addr);
@@ -1967,10 +1970,12 @@ int xhci_bus_resume(struct usb_hcd *hcd)
  
         bus_state->next_statechange = jiffies + msecs_to_jiffies(5);
         /* re-enable irqs */
-       temp = readl(&xhci->op_regs->command);
-       temp |= CMD_EIE;
-       writel(temp, &xhci->op_regs->command);
-       temp = readl(&xhci->op_regs->command);
+       if (disabled_irq) {
+               temp = readl(&xhci->op_regs->command);
+               temp |= CMD_EIE;
+               writel(temp, &xhci->op_regs->command);
+               temp = readl(&xhci->op_regs->command);
+       }
  
         spin_unlock_irqrestore(&xhci->lock, flags);
         return 0;

Thanks
Mathias

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* re: [PROBLEM] usb: xhci_bus_resume cause irq lost issue
@ 2025-03-06 14:29 liudingyuan
  2025-03-07 16:20 ` Mathias Nyman
  0 siblings, 1 reply; 5+ messages in thread
From: liudingyuan @ 2025-03-06 14:29 UTC (permalink / raw)
  To: Mathias Nyman, linux-usb@vger.kernel.org, greg@kroah.com,
	patchwork-bot@kernel.org, mricon@kernel.org
  Cc: Fangjian (Jay), Kangfenglong, yangxingui, fengsheng (A),
	lingmingqiang, liulongfang, zhonghaoquan, yanzhili (A),
	huyihua (A), Zengtao (B), shenjian (K), liuyonglong,
	Jonathan Cameron


Hi

> > On 5.3.2025 12.07, liudingyuan wrote:
> > 
> > 
> > Hi
> > 
> > I'm running into an issue where the enumeration of a USB2.0 device failed due to lost interrupts.
> > 
> > This issue appears to occur randomly and we can only reproduce it on xHCI controllers that provide both USB3.0 and USB2.0 root hubs. Additionally, it is necessary to ensure that the first-level device under this controller is a SB2.0 device.
> > The above scenario can be referred to in the following figure.
> >     ----------------------------------------------------------------------------
> >    |         +---------------------------------+               |
> >    |         |    xHCI Controller    |               |
> >    |         +---------------------------------+               |
> >    |                /       \                      |
> >    |               /         \                     |
> >    |              /           \                    |
> >    |  +-------------------------+      +---------------------------+   |
> >    |  | USB 3.0 Root Hub |      | USB 2.0 Root Hub  |   |
> >    |  +------------------------+      +----------------------------+   |
> >    --------------|-------------------------------------|-------------------------
> >            |                       |
> >            | [NO device]             | [Device A]
> >            |                       |
> >                               
> >                                 
> > 
> > The USB topology displayed in the OS looks like this:

   /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 10000M
       ID 1d6b:0003 Linux Foundation 3.0 root hub
> > /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/ , 480M
> >      ID 1d6b:0002 Linux Foundation 2.0 root hub
> >      |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
> > 
> > /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/ , 5000M
> >      ID 1d6b:0003 Linux Foundation 3.0 root hub

> Odd that the USB3 roothub is registered before USB2 roothub, I thought xhci driver always registers USB2 hcd first .

Yes, this was my mistake. After checking, it turns out that the bus3 and bus4 are two ports belonging to the same xHCI controller.
Information about bus4 has been added as part of the modification.
The failed USB15 in the following log is a USB2 port. The resume of this port has a conflict with the USB3 port USB16.
(In the scene below, this resume was due to the time condition below not being met.)
	/drivers/usb/core/hcd.c
		hcd_bus_suspend() 
		{

			status = hcd->driver->hub_status_data(hcd, buffer)
			drivers/usb/host/xhci-hub.c
			xhci_hub_status_data() 
			{
				if (time_before(jiffies, xhci->run_graceperiod)) 
					status = 1;
			}

			if (status != 0) {
				dev_dbg(&rhdev->dev, "suspend raced with wakeup event\n");
				hcd_bus_resume(rhdev, PMSG_AUTO_RESUME);
				status = -EBUSY;
			}
		}

> > 
> > This issue only occurs when the system reboot or when we insmod the xhci driver or ingister the xhci controller.
> > When the issue occurs, we can observe that the CPU receives fewer interrupts than what would normally be generated during the enumeration process, and there are error logs indicating command timeouts.
> > 	[ 2040.039438]xhci_hcd 0000:8a:00.7: Command timeout, USBSTS: 0x00000018 EINT PCD
> > 	[ 2040.039444] xhci_hcd 0000:8a:00.7: Command timeout
> > 	[ 2040.039446] xhci_hcd 0000:8a:00.7: Abort command ring
> > 	[ 2042.055435] xhci_hcd 0000:8a:00.7: No stop event for abort, ring start fail?
> > 	[ 2042.055469] xhci_hcd 0000:8a:00.7: Timeout while waiting for setup device command
> > 	[ 2042.064048] usb 15-1: hub failed to enable device, error -62
> > 	[ 2054.343446] xhci_hcd 0000:8a:00.7: Unsuccessful disable slot 1 command, status 25
> >   	[ 2066.631449] xhci_hcd 0000:8a:00.7: Error while assigning device slot ID: Command Aborted
> > 	[ 2066.640633] xhci_hcd 0000:8a:00.7: Max number of devices this xHCI host supports is 64.
> > 	[ 2066.649713] usb usb15-port1: couldn't allocate usb_device
> > 
> > After verification, we can confirm that the reason for the interrupt 
> > loss is that during the enumeration of U2 device,
> > U3 port is in a suspend procedure and performs an operation to disable interrupts in this function:
> > 
> > 	drivers/usb/host/xhci-hub.c
> > 		xhci_bus_resume()
> > 			/* delay the irqs */
> > 			temp = readl(&xhci->op_regs->command);
> > 			temp &= ~CMD_EIE;
> > 			writel(temp, &xhci->op_regs->command);
> > 
> > we can temporarily avoid this issue by modifying parameters.
> > echo -1 > /sys/module/usbcore/parameters/autosuspend
> > 
> > I am wondering whether there is a chance of interrupt loss occurring, regardless of whether or not it belongs to the scenario mentioned above? As long as an interrupt from a controller is triggered at exactly the same time as the process of disabling the controller's interrupts?
> > 
> > I read the xHCI protocol version 1.2 and haven't found detailed descriptions for such special cases. So I was wondering what is the main reason for disabling interrupts in xHCI driver during the resume process?
> > 

> This is from a time before I started maintaining the xhci driver. I guess it was done to allow bus suspended usb2 ports to resume fully to U0 before
> xhci_bus_resume() returns.

> Resuming a usb2 port from  U3 suspend to U0 is a two stage process.
> In 'host initiated resume' the xhci driver will first transition the port from 'U3' to 'Resume' state, then wait in Resume state for 20ms, and finally move it to U0 state.

> I assume driver disabled xHC from triggering interrupts to prevent event handler from messing with this usb2 port resume process.

> spin_lock_irqsave(xhci->lock) would normally be used to prevent interrupt handler from interfering, but keeping the spin lock over msleep(20) was not possible.

> The device initiated usb2 resume is much better, it utilizes the event handler, hub thread, timestamps and completions. The same should be done here, but implementation isn't trivial.

> I don't however think there is a reason to turn off interrupts while resuming the USB3 bus. It doesn't sleep so just keeping spin_lock_irqsave() should be enough. This should be an easy fix. Something like this:
> (untested, copy-pasted diff)

Thank you very much for your detailed analysis regarding the reason for disabling interrupts during the resume operation.

I compiled a new kernel based on the fix code you provided below and conducted some preliminary tests. 
In the repeated unregister/register tests of the xHCI controller that previously caused issues,  both the driver and USB-related functionalities are now working normally. 
(Moreover, this fix code, in theory, should completely resolve the issues we encountered in our USB3-USB2 device-only scenario.)

Based on the logic mentioned in analysis, we currently may not have implemented a better solution to avoid disabling interrupts during the USB2 resume process. I would like to ask if we need to
be concerned about the issue of interrupt loss caused by disabling interrupts in other scenarios where resume and enumeration processes or transfer operations might conflict?

For example, when a user inserts a device during the USB2/USB3 port resume, or when the USB3 controller is only connected to a USB3 devices, and the USB2 port enters this resume flow due to auto-suspend?
 (However, it seems that the probability of these two scenarios is very low, as we have not yet been able to reproduce errors under these conditions.)

> diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c index 9693464c0520..7cf7ee84fc96 100644
> --- a/drivers/usb/host/xhci-hub.c
> +++ b/drivers/usb/host/xhci-hub.c
> @@ -1873,6 +1873,7 @@ int xhci_bus_resume(struct usb_hcd *hcd)
>          u32 temp, portsc;
>          struct xhci_hub *rhub;
>          struct xhci_port **ports;
> +       bool disabled_irq = false;
>   
>          rhub = xhci_get_rhub(hcd);
>          ports = rhub->ports;
> @@ -1888,17 +1889,19 @@ int xhci_bus_resume(struct usb_hcd *hcd)
>                  return -ESHUTDOWN;
>          }
>   
> -       /* delay the irqs */
> -       temp = readl(&xhci->op_regs->command);
> -       temp &= ~CMD_EIE;
> -       writel(temp, &xhci->op_regs->command);
> -
>          /* bus specific resume for ports we suspended at bus_suspend */
> -       if (hcd->speed >= HCD_USB3)
> +       if (hcd->speed >= HCD_USB3) {
>                  next_state = XDEV_U0;
> -       else
> +       } else {
>                  next_state = XDEV_RESUME;
> -
> +               if (bus_state->bus_suspended) {
> +                       /* delay the irqs if we need to resume usb2 ports */
> +                       temp = readl(&xhci->op_regs->command);
> +                       temp &= ~CMD_EIE;
> +                       writel(temp, &xhci->op_regs->command);
> +                       disabled_irq = true;
> +               }
> +       }
>          port_index = max_ports;
>          while (port_index--) {
>                  portsc = readl(ports[port_index]->addr); @@ -1967,10 +1970,12 @@ int xhci_bus_resume(struct usb_hcd *hcd)
>   
>         bus_state->next_statechange = jiffies + msecs_to_jiffies(5);
>          /* re-enable irqs */
> -       temp = readl(&xhci->op_regs->command);
> -       temp |= CMD_EIE;
> -       writel(temp, &xhci->op_regs->command);
> -       temp = readl(&xhci->op_regs->command);
> +       if (disabled_irq) {
> +               temp = readl(&xhci->op_regs->command);
> +               temp |= CMD_EIE;
> +               writel(temp, &xhci->op_regs->command);
> +               temp = readl(&xhci->op_regs->command);
> +       }
>   
>          spin_unlock_irqrestore(&xhci->lock, flags);
>          return 0;
>

This fix indeed helps us avoid the current issue, so I would like to ask if it is possible to push this modification as a patch to the mainline code?
If possible, we also plan to conduct a comprehensive test of USB functionality based on this modification to further validate it. 


Considering that the fix cannot completely avoid all possible scenarios where interrupts might be lost due to the hardware IE (Interrupt Enable) being turned off.
I would like to ask whether the hardware design is reasonable in the following case: 
when a hardware edge-triggered interrupt is lost due to IE being disabled, and the subsequent interrupts cannot be triggered because the software didn't
clear the EHB (Event Handler Busy) bit.  

Does XHCI Specificationl have any requirements regarding this? 

                     +--------------------------------+
                     |    xHCI Controller   |
                     |   (Shared Interrupt)  |
                     +--------------------------------+
                          /             \
                         /               \
                        /                 \
           +--------------------------------+   +------------------------------------+
           | USB 2.0 Roothub      |   | USB 3.0 Roothub         |
           | (Suspend in progress)  |   | (Device Enumeration)     |
           |  - IE disabled        |   |  - New device inserted    |
           |  - Events accumulating |   |  - Generates event TRBs  |
           +---------------------------------+   +------------------------------------+
                          \             /
                           \           /
                            \         /
                  +------------------------------------------+
                  | Shared Physical Interrupt Line  |
                  |   (Edge-triggered mechanism) |
                  +-------------------------------------------+
                              ||
                              ||  (No new edge occurs when IE is re-enabled)
                              \/
         [Pending events in the event ring not triggering an interrupt]


Thanks!
Best Regards
Devyn

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PROBLEM] usb: xhci_bus_resume cause irq lost issue
  2025-03-06 14:29 [PROBLEM] usb: xhci_bus_resume cause irq lost issue liudingyuan
@ 2025-03-07 16:20 ` Mathias Nyman
  2025-03-10 15:11   ` [RFT PATCH] xhci: Limit time spent with interrupts disabled during bus resume Mathias Nyman
  0 siblings, 1 reply; 5+ messages in thread
From: Mathias Nyman @ 2025-03-07 16:20 UTC (permalink / raw)
  To: liudingyuan, linux-usb@vger.kernel.org, greg@kroah.com,
	patchwork-bot@kernel.org, mricon@kernel.org
  Cc: Fangjian (Jay), Kangfenglong, yangxingui, fengsheng (A),
	lingmingqiang, liulongfang, zhonghaoquan, yanzhili (A),
	huyihua (A), Zengtao (B), shenjian (K), liuyonglong,
	Jonathan Cameron

On 6.3.2025 16.29, liudingyuan wrote:
> 
> I compiled a new kernel based on the fix code you provided below and conducted some preliminary tests.> In the repeated unregister/register tests of the xHCI controller that previously caused issues,  both the driver and USB-related functionalities are now working normally.
> (Moreover, this fix code, in theory, should completely resolve the issues we encountered in our USB3-USB2 device-only scenario.)
> 
> Based on the logic mentioned in analysis, we currently may not have implemented a better solution to avoid disabling interrupts during the USB2 resume process. I would like to ask if we need to
> be concerned about the issue of interrupt loss caused by disabling interrupts in other scenarios where resume and enumeration processes or transfer operations might conflict?
> 
> For example, when a user inserts a device during the USB2/USB3 port resume, or when the USB3 controller is only connected to a USB3 devices, and the USB2 port enters this resume flow due to auto-suspend?
>   (However, it seems that the probability of these two scenarios is very low, as we have not yet been able to reproduce errors under these conditions.)
> 
> This fix indeed helps us avoid the current issue, so I would like to ask if it is possible to push this modification as a patch to the mainline code?
> If possible, we also plan to conduct a comprehensive test of USB functionality based on this modification to further validate it.
> 
> 
> Considering that the fix cannot completely avoid all possible scenarios where interrupts might be lost due to the hardware IE (Interrupt Enable) being turned off.
> I would like to ask whether the hardware design is reasonable in the following case:
> when a hardware edge-triggered interrupt is lost due to IE being disabled, and the subsequent interrupts cannot be triggered because the software didn't
> clear the EHB (Event Handler Busy) bit.
> 

I think we can avoid this situation by disabling the primary interrupter instead of all interrupts.
Meaning we would clear the 'Interrupt enable (IE)' bit:1  in  Interrupter Management Register (IMAN)
instead of the 'Interrupter Enable' (INTE) bit:2 in USBCMD register.

This way EHB and IP shouldn't be set at all, and thus not prevent future interrupts.

In practice this just means calling xhci_disable_interrupter() xhci_enable_interrupter()
instead of clearing and setting CMD_EIE bit.

I'll write a patch for this next week
Grateful if you could run the more comprehensive test before I queue it for
upstream (mainline)

Thanks
Mathias


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFT PATCH] xhci: Limit time spent with interrupts disabled during bus resume
  2025-03-07 16:20 ` Mathias Nyman
@ 2025-03-10 15:11   ` Mathias Nyman
  0 siblings, 0 replies; 5+ messages in thread
From: Mathias Nyman @ 2025-03-10 15:11 UTC (permalink / raw)
  To: gregkh, liudingyuan, patchwork-bot, mricon, linux-usb
  Cc: f.fangjian, kangfenglong, yangxingui, fengsheng5, lingmingqiang,
	liulongfang, zhonghaoquan, yanzhili7, huyihua4, prime.zeng,
	shenjian15, liuyonglong, jonathan.cameron, Mathias Nyman

Current xhci bus resume implementation prevents xHC host from generating
interrupts during high-speed USB 2 and super-speed USB 3 bus resume.

Only reason to disable interrupts during bus resume would be to prevent
the interrupt handler from interfering with the resume process of USB 2
ports.

Host initiated resume of USB 2 ports is done in two stages.

The xhci driver first transitions the port from 'U3' to 'Resume' state,
then wait in Resume for 20ms, and finally moves port to U0 state.
xhci driver can't prevent interrupts by taking and keeping the xhci
spinlock with spin_lock_irqsave() due to this 20ms sleep.

Limit interrupt disabling to the USB 2 port resume case only.
resuming USB 2 ports in bus resume is only done in special cases where
USB 2 ports had to be forced to suspend during bus suspend.

The current way of preventing interrupts by clearing the 'Interrupt
Enable' (INTE) bit in USBCMD register won't prevent the Interrupter
registers 'Interrupt Pending' (IP), 'Event Handler Busy' (EHB) and
USBSTS register Event Interrupt (EINT) bits from being set.

New interrupts can't be issued before those bits are properly clered.

Disable interrupts by clearing the interrupter register 'Interrupt
Enable' (IE) bit instead. This way IP, EHB and INTE won't be set
before IE is enabled again and a new interrupt is triggered.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-hub.c | 30 ++++++++++++++++--------------
 drivers/usb/host/xhci.c     |  4 ++--
 drivers/usb/host/xhci.h     |  2 ++
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 9693464c0520..d11b60f705bb 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1870,9 +1870,10 @@ int xhci_bus_resume(struct usb_hcd *hcd)
 	int max_ports, port_index;
 	int sret;
 	u32 next_state;
-	u32 temp, portsc;
+	u32 portsc;
 	struct xhci_hub *rhub;
 	struct xhci_port **ports;
+	bool disabled_irq = false;
 
 	rhub = xhci_get_rhub(hcd);
 	ports = rhub->ports;
@@ -1888,17 +1889,20 @@ int xhci_bus_resume(struct usb_hcd *hcd)
 		return -ESHUTDOWN;
 	}
 
-	/* delay the irqs */
-	temp = readl(&xhci->op_regs->command);
-	temp &= ~CMD_EIE;
-	writel(temp, &xhci->op_regs->command);
-
 	/* bus specific resume for ports we suspended at bus_suspend */
-	if (hcd->speed >= HCD_USB3)
+	if (hcd->speed >= HCD_USB3) {
 		next_state = XDEV_U0;
-	else
+	} else {
 		next_state = XDEV_RESUME;
-
+		if (bus_state->bus_suspended) {
+			/*
+			 * prevent port event interrupts from interfering
+			 * with usb2 port resume process
+			 */
+			xhci_disable_interrupter(xhci->interrupters[0]);
+			disabled_irq = true;
+		}
+	}
 	port_index = max_ports;
 	while (port_index--) {
 		portsc = readl(ports[port_index]->addr);
@@ -1966,11 +1970,9 @@ int xhci_bus_resume(struct usb_hcd *hcd)
 	(void) readl(&xhci->op_regs->command);
 
 	bus_state->next_statechange = jiffies + msecs_to_jiffies(5);
-	/* re-enable irqs */
-	temp = readl(&xhci->op_regs->command);
-	temp |= CMD_EIE;
-	writel(temp, &xhci->op_regs->command);
-	temp = readl(&xhci->op_regs->command);
+	/* re-enable interrupter */
+	if (disabled_irq)
+		xhci_enable_interrupter(xhci->interrupters[0]);
 
 	spin_unlock_irqrestore(&xhci->lock, flags);
 	return 0;
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 0c22b78358b9..ad229cb9a90b 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -322,7 +322,7 @@ static void xhci_zero_64b_regs(struct xhci_hcd *xhci)
 		xhci_info(xhci, "Fault detected\n");
 }
 
-static int xhci_enable_interrupter(struct xhci_interrupter *ir)
+int xhci_enable_interrupter(struct xhci_interrupter *ir)
 {
 	u32 iman;
 
@@ -335,7 +335,7 @@ static int xhci_enable_interrupter(struct xhci_interrupter *ir)
 	return 0;
 }
 
-static int xhci_disable_interrupter(struct xhci_interrupter *ir)
+int xhci_disable_interrupter(struct xhci_interrupter *ir)
 {
 	u32 iman;
 
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 6c00062a9acc..10572336fabe 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1891,6 +1891,8 @@ int xhci_alloc_tt_info(struct xhci_hcd *xhci,
 		struct usb_tt *tt, gfp_t mem_flags);
 int xhci_set_interrupter_moderation(struct xhci_interrupter *ir,
 				    u32 imod_interval);
+int xhci_enable_interrupter(struct xhci_interrupter *ir);
+int xhci_disable_interrupter(struct xhci_interrupter *ir);
 
 /* xHCI ring, segment, TRB, and TD functions */
 dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg, union xhci_trb *trb);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-03-10 15:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-06 14:29 [PROBLEM] usb: xhci_bus_resume cause irq lost issue liudingyuan
2025-03-07 16:20 ` Mathias Nyman
2025-03-10 15:11   ` [RFT PATCH] xhci: Limit time spent with interrupts disabled during bus resume Mathias Nyman
     [not found] <ea84165273814a41ae7187a008c4144b@huawei.com>
2025-03-05 10:07 ` [PROBLEM] usb: xhci_bus_resume cause irq lost issue liudingyuan
2025-03-05 15:49   ` Mathias Nyman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox