public inbox for linux-usb@vger.kernel.org
* [PATCH] usb: xhci: add xhci_halt() for HCE Handling
@ 2026-01-27 11:04 jiangdayu
  2026-01-27 11:22 ` Greg Kroah-Hartman
  2026-01-27 12:25 ` Mathias Nyman
  0 siblings, 2 replies; 16+ messages in thread
From: jiangdayu @ 2026-01-27 11:04 UTC (permalink / raw)
  To: Mathias Nyman, Greg Kroah-Hartman
  Cc: Longfang Liu, linux-usb, linux-kernel, yudongbin, guhuinan,
	chenyu45, mahongwei3, jiangdayu

When the xHCI controller reports a Host Controller Error (HCE) status
in the interrupt handler, the driver currently only logs a warning and
continues execution. However, a Host Controller Error indicates a
critical hardware failure that requires the controller to be halted.

Add an xhci_halt(xhci) call after the HCE warning to properly halt the
controller when this error condition is detected. This ensures the
controller is in a consistent state and prevents further operations
on failed hardware. Additionally, any interrupts still unhandled at
this point may cause an interrupt storm.

The change is made in xhci_irq() function where STS_HCE status is
checked, mirroring the existing error handling pattern used for
STS_FATAL errors.

Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>
---
 drivers/usb/host/xhci-ring.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9315ba18310d..1cbefee3c4ca 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
 
 	if (status & STS_HCE) {
 		xhci_warn(xhci, "WARNING: Host Controller Error\n");
+		xhci_halt(xhci);
 		goto out;
 	}
 
-- 
2.34.1
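
The interrupt-storm argument in the changelog above can be illustrated with a
small user-space model. This is only a sketch, not driver code: struct
model_xhc and the model_* functions are hypothetical names, the bit positions
are the USBCMD/USBSTS layout as recalled from the xHCI register definitions
and should be treated as illustrative, and the assumption that the interrupt
line stays asserted while HCE is set and interrupts are enabled is a
simplification of the behaviour reported in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; check them against the spec before reuse. */
#define CMD_RUN   (1u << 0)   /* USBCMD Run/Stop */
#define CMD_EIE   (1u << 2)   /* USBCMD interrupter enable */
#define STS_HALT  (1u << 0)   /* USBSTS HCHalted */
#define STS_HCE   (1u << 12)  /* USBSTS Host Controller Error */

/* Hypothetical stand-in for the controller's operational registers. */
struct model_xhc {
	uint32_t usbcmd;
	uint32_t usbsts;
};

/* Model assumption: the IRQ line stays asserted while interrupts are
 * enabled and HCE is still set. */
bool model_irq_asserted(const struct model_xhc *hc)
{
	return (hc->usbcmd & CMD_EIE) && (hc->usbsts & STS_HCE);
}

/* Register-side effect attributed to xhci_halt() in this thread:
 * clear Run/Stop and the interrupt enable. */
void model_halt(struct model_xhc *hc)
{
	hc->usbcmd &= ~(CMD_RUN | CMD_EIE);
	hc->usbsts |= STS_HALT;
}

/* Simplified HCE branch of the interrupt handler. */
void model_irq(struct model_xhc *hc, bool halt_on_hce)
{
	if (hc->usbsts & STS_HCE) {
		/* The real handler logs a warning here. */
		if (halt_on_hce)
			model_halt(hc);
	}
}

/* Count handler invocations until the line deasserts, capped at max_irqs. */
unsigned int model_count_irqs(bool halt_on_hce, unsigned int max_irqs)
{
	struct model_xhc hc = { .usbcmd = CMD_RUN | CMD_EIE, .usbsts = STS_HCE };
	unsigned int n = 0;

	assert(max_irqs > 0);
	while (model_irq_asserted(&hc) && n < max_irqs) {
		model_irq(&hc, halt_on_hce);
		n++;
	}
	return n;
}
```

In this model, a handler that only logs never deasserts the line and runs
until the cap is reached, while a handler that halts runs exactly once.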


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-01-27 11:04 [PATCH] usb: xhci: add xhci_halt() for HCE Handling jiangdayu
@ 2026-01-27 11:22 ` Greg Kroah-Hartman
  2026-01-28  8:48   ` Dayu Jiang
  2026-01-27 12:25 ` Mathias Nyman
  1 sibling, 1 reply; 16+ messages in thread
From: Greg Kroah-Hartman @ 2026-01-27 11:22 UTC (permalink / raw)
  To: jiangdayu
  Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
	guhuinan, chenyu45, mahongwei3

On Tue, Jan 27, 2026 at 07:04:22PM +0800, jiangdayu wrote:
> When the xHCI controller reports a Host Controller Error (HCE) status
> in the interrupt handler, the driver currently only logs a warning and
> continues execution. However, a Host Controller Error indicates a
> critical hardware failure that requires the controller to be halted.
> 
> Add an xhci_halt(xhci) call after the HCE warning to properly halt the
> controller when this error condition is detected. This ensures the
> controller is in a consistent state and prevents further operations
> on failed hardware. Additionally, any interrupts still unhandled at
> this point may cause an interrupt storm.
> 
> The change is made in xhci_irq() function where STS_HCE status is
> checked, mirroring the existing error handling pattern used for
> STS_FATAL errors.
> 
> Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
> Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>

We need a full name, not an email alias, sorry.

And this isn't really "fixing" that commit, there's nothing wrong with
it as-is.  This is adding new functionality to the code.

> ---
>  drivers/usb/host/xhci-ring.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 9315ba18310d..1cbefee3c4ca 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
>  
>  	if (status & STS_HCE) {
>  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> +		xhci_halt(xhci);

What is going to start things back up again?  And as you are calling
this function, why is the warning message needed anymore?  The
tracepoint information will give you that message now, right?

And is this just papering over a hardware bug?  Should this really be
happening for any normal system?

thanks,

greg k-h


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-01-27 11:04 [PATCH] usb: xhci: add xhci_halt() for HCE Handling jiangdayu
  2026-01-27 11:22 ` Greg Kroah-Hartman
@ 2026-01-27 12:25 ` Mathias Nyman
  2026-01-28  8:53   ` Dayu Jiang
  1 sibling, 1 reply; 16+ messages in thread
From: Mathias Nyman @ 2026-01-27 12:25 UTC (permalink / raw)
  To: jiangdayu, Mathias Nyman, Greg Kroah-Hartman
  Cc: Longfang Liu, linux-usb, linux-kernel, yudongbin, guhuinan,
	chenyu45, mahongwei3

Hi

On 1/27/26 13:04, jiangdayu wrote:
> When the xHCI controller reports a Host Controller Error (HCE) status
> in the interrupt handler, the driver currently only logs a warning and
> continues execution. However, a Host Controller Error indicates a
> critical hardware failure that requires the controller to be halted.
> 

The host should cease all activity when it sets the HCE bit.

See xHCI spec 4.24.1 'Internal Errors':
"When the HCE flag is set to ‘1’ the xHC shall cease all activity.
  Software response to the assertion of HCE is to reset the
  xHC (HCRST = ‘1’) and reinitialize it."

Same is true for "Host system error" HSE (STS_FATAL), not sure
why we halt it manually in that case.

> Add an xhci_halt(xhci) call after the HCE warning to properly halt the
> controller when this error condition is detected. This ensures the
> controller is in a consistent state and prevents further operations
> on failed hardware. Additionally, any interrupts still unhandled at
> this point may cause an interrupt storm.

Is this something that has been seen on real world hardware?
If yes, and halting the host helped, then this fix is ok by me.
At least until a proper host reset solution is implemented.

Thanks
Mathias



* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-01-27 11:22 ` Greg Kroah-Hartman
@ 2026-01-28  8:48   ` Dayu Jiang
  2026-01-28  8:56     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 16+ messages in thread
From: Dayu Jiang @ 2026-01-28  8:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
	guhuinan, chenyu45, mahongwei3

On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Jan 27, 2026 at 07:04:22PM +0800, jiangdayu wrote:
> > When the xHCI controller reports a Host Controller Error (HCE) status
> > in the interrupt handler, the driver currently only logs a warning and
> > continues execution. However, a Host Controller Error indicates a
> > critical hardware failure that requires the controller to be halted.
> > 
> > Add an xhci_halt(xhci) call after the HCE warning to properly halt the
> > controller when this error condition is detected. This ensures the
> > controller is in a consistent state and prevents further operations
> > on failed hardware. Additionally, any interrupts still unhandled at
> > this point may cause an interrupt storm.
> > 
> > The change is made in xhci_irq() function where STS_HCE status is
> > checked, mirroring the existing error handling pattern used for
> > STS_FATAL errors.
> > 
> > Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
> > Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>
> 
> We need a full name, not an email alias, sorry.
Sorry for the confusion, I will use my full legal name (instead of the 
email alias) in the Signed-off-by line in the revised patch.  
> 
> And this isn't really "fixing" that commit, there's nothing wrong with
> it as-is.  This is adding new functionality to the code.
I initially used the Fixes tag because the original commit only logged
a warning for HCE and took no further action. This incomplete handling
risks interrupt storms on the SoC (since the interrupt isn’t cleared).
That’s a robustness gap I wanted to fix with this patch.
> 
> > ---
> >  drivers/usb/host/xhci-ring.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > index 9315ba18310d..1cbefee3c4ca 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
> >  
> >  	if (status & STS_HCE) {
> >  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > +		xhci_halt(xhci);
> 
> What is going to start things back up again?  And as you are calling
> this function, why is the warning message needed anymore?  The
> tracepoint information will give you that message now, right?
When HCE is triggered, it indicates a critical hardware failure. 
Aligning with the handling of HSE (STS_FATAL) by adding 
xhci_halt() here is more reasonable: without xhci_halt(), the 
USB controller may fall into an unpredictable and unstable state, 
which could exacerbate system issues.  

Retaining the warning message is necessary because it is directly 
visible in dmesg, whereas tracepoint information requires explicitly 
enabling xHCI tracepoints. Additionally, if xhci_halt() is called in 
xhci_irq() without the warning log, it would be impossible to 
distinguish whether the halt was triggered by HCE or HSE.
> 
> And is this just papering over a hardware bug?  Should this really be
> happening for any normal system?
Yes, this issue has been reproducible on real-world hardware: HCE is 
triggered in UAS Storage Device plug/unplug scenarios on Android 
devices, which enters this error branch and causes an interrupt storm, 
leading to severe system-level faults.
> 
> thanks,
> 
> greg k-h


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-01-27 12:25 ` Mathias Nyman
@ 2026-01-28  8:53   ` Dayu Jiang
  0 siblings, 0 replies; 16+ messages in thread
From: Dayu Jiang @ 2026-01-28  8:53 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Greg Kroah-Hartman, Longfang Liu, linux-usb, linux-kernel,
	yudongbin, guhuinan, chenyu45, mahongwei3, Dayu Jiang

On Tue, Jan 27, 2026 at 02:25:19PM +0200, Mathias Nyman wrote:
> Hi
> 
> On 1/27/26 13:04, jiangdayu wrote:
> > When the xHCI controller reports a Host Controller Error (HCE) status
> > in the interrupt handler, the driver currently only logs a warning and
> > continues execution. However, a Host Controller Error indicates a
> > critical hardware failure that requires the controller to be halted.
> > 
> 
> The host should cease all activity when it sets the HCE bit.
> 
> See xHCI spec 4.24.1 'Internal Errors':
> "When the HCE flag is set to ‘1’ the xHC shall cease all activity.
>  Software response to the assertion of HCE is to reset the
>  xHC (HCRST = ‘1’) and reinitialize it."
> 
> Same is true for "Host system error" HSE (STS_FATAL), not sure
> why we halt it manually in that case.
> 
> > Add an xhci_halt(xhci) call after the HCE warning to properly halt the
> > controller when this error condition is detected. This ensures the
> > controller is in a consistent state and prevents further operations
> > on failed hardware. Additionally, any interrupts still unhandled at
> > this point may cause an interrupt storm.
> 
> Is this something that has been seen on real world hardware?
> If yes, and halting the host helped, then this fix is ok by me.
> At least until a proper host reset solution is implemented.
Yes, the HCE issue (and subsequent interrupt storm) has been consistently 
observed on production Android devices during UASP device plug/unplug 
operations. Adding xhci_halt() effectively resolves the system-level interrupt 
storm issue caused by HCE.
> 
> Thanks
> Mathias
> 


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-01-28  8:48   ` Dayu Jiang
@ 2026-01-28  8:56     ` Greg Kroah-Hartman
  2026-02-26  9:27       ` Dayu Jiang
  0 siblings, 1 reply; 16+ messages in thread
From: Greg Kroah-Hartman @ 2026-01-28  8:56 UTC (permalink / raw)
  To: Dayu Jiang
  Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
	guhuinan, chenyu45, mahongwei3

On Wed, Jan 28, 2026 at 04:48:49PM +0800, Dayu Jiang wrote:
> On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> > >  	if (status & STS_HCE) {
> > >  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > > +		xhci_halt(xhci);
> > 
> > What is going to start things back up again?  And as you are calling
> > this function, why is the warning message needed anymore?  The
> > tracepoint information will give you that message now, right?
> When HCE is triggered, it indicates a critical hardware failure. 
> Aligning with the handling of HSE (STS_FATAL) by adding 
> xhci_halt() here is more reasonable: without xhci_halt(), the 
> USB controller may fall into an unpredictable and unstable state, 
> which could exacerbate system issues.  
> 
> Retaining the warning message is necessary because it is directly 
> visible in dmesg, whereas tracepoint information requires explicitly 
> enabling xHCI tracepoints. Additionally, if xhci_halt() is called in 
> xhci_irq() without the warning log, it would be impossible to 
> distinguish whether the halt was triggered by HCE or HSE.
> > 
> > And is this just papering over a hardware bug?  Should this really be
> > happening for any normal system?
> Yes, this issue has been reproducible on real-world hardware: HCE is 
> triggered in UAS Storage Device plug/unplug scenarios on Android 
> devices, which enters this error branch and causes an interrupt storm, 
> leading to severe system-level faults.

Great, please provide this information in the changelog text when you
resubmit this, thanks!

greg k-h


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-01-28  8:56     ` Greg Kroah-Hartman
@ 2026-02-26  9:27       ` Dayu Jiang
  2026-02-26 16:44         ` Mathias Nyman
  0 siblings, 1 reply; 16+ messages in thread
From: Dayu Jiang @ 2026-02-26  9:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
	guhuinan, chenyu45, mahongwei3, Dayu Jiang

Hi Greg,

I have updated the changelog text as requested and resubmitted the patch.
https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
Please kindly review it and let me know if it is acceptable now.

Thanks,
Dayu Jiang

On Wed, Jan 28, 2026 at 09:56:18AM +0100, Greg Kroah-Hartman wrote:
> On Wed, Jan 28, 2026 at 04:48:49PM +0800, Dayu Jiang wrote:
> > On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> > > >  	if (status & STS_HCE) {
> > > >  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > > > +		xhci_halt(xhci);
> > > 
> > > What is going to start things back up again?  And as you are calling
> > > this function, why is the warning message needed anymore?  The
> > > tracepoint information will give you that message now, right?
> > When HCE is triggered, it indicates a critical hardware failure. 
> > Aligning with the handling of HSE (STS_FATAL) by adding 
> > xhci_halt() here is more reasonable: without xhci_halt(), the 
> > USB controller may fall into an unpredictable and unstable state, 
> > which could exacerbate system issues.  
> > 
> > Retaining the warning message is necessary because it is directly 
> > visible in dmesg, whereas tracepoint information requires explicitly 
> > enabling xHCI tracepoints. Additionally, if xhci_halt() is called in 
> > xhci_irq() without the warning log, it would be impossible to 
> > distinguish whether the halt was triggered by HCE or HSE.
> > > 
> > > And is this just papering over a hardware bug?  Should this really be
> > > happening for any normal system?
> > Yes, this issue has been reproducible on real-world hardware: HCE is 
> > triggered in UAS Storage Device plug/unplug scenarios on Android 
> > devices, which enters this error branch and causes an interrupt storm, 
> > leading to severe system-level faults.
> 
> Great, please provide this information in the changelog text when you
> resubmit this, thanks!
> 
> greg k-h


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-02-26  9:27       ` Dayu Jiang
@ 2026-02-26 16:44         ` Mathias Nyman
  2026-02-26 18:17           ` Thinh Nguyen
  2026-02-27  7:33           ` Dayu Jiang
  0 siblings, 2 replies; 16+ messages in thread
From: Mathias Nyman @ 2026-02-26 16:44 UTC (permalink / raw)
  To: Dayu Jiang, Greg Kroah-Hartman
  Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
	guhuinan, chenyu45, mahongwei3

On 2/26/26 11:27, Dayu Jiang wrote:
> Hi Greg,
> 
> I have updated the changelog text as requested and resubmitted the patch.
> https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> Please kindly review it and let me know if it is acceptable now.

I'll send it forward, but changed the commit message.
Does this modified version still describe the case accurately:

usb: xhci: Prevent interrupt storm on host controller error (HCE)

The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
Device plug/unplug scenarios on Android devices, which is checked in
xhci_irq() function and causes an interrupt storm (since the interrupt
isn’t cleared), leading to severe system-level faults.

When the xHC controller reports HCE in the interrupt handler, the driver
only logs a warning and assumes xHC activity will stop. The interrupt storm
does however continue until driver manually disables xHC interrupt and
stops the controller by calling xhci_halt().

The change is made in xhci_irq() function where STS_HCE status is
checked, mirroring the existing error handling pattern used for
STS_FATAL errors.

This only fixes the interrupt storm. Proper HCE recovery requires resetting
and re-initializing the xHC.

Thanks
Mathias


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-02-26 16:44         ` Mathias Nyman
@ 2026-02-26 18:17           ` Thinh Nguyen
  2026-02-27  7:26             ` Dayu Jiang
  2026-02-27  9:43             ` Mathias Nyman
  2026-02-27  7:33           ` Dayu Jiang
  1 sibling, 2 replies; 16+ messages in thread
From: Thinh Nguyen @ 2026-02-26 18:17 UTC (permalink / raw)
  To: Mathias Nyman, Dayu Jiang
  Cc: Greg Kroah-Hartman, Mathias Nyman, Longfang Liu,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	yudongbin, guhuinan, chenyu45, mahongwei3

On Thu, Feb 26, 2026, Mathias Nyman wrote:
> On 2/26/26 11:27, Dayu Jiang wrote:
> > Hi Greg,
> > 
> > I have updated the changelog text as requested and resubmitted the patch.
> > https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> > Please kindly review it and let me know if it is acceptable now.
> 
> I'll send it forward, but changed the commit message.
> Does this modified version still describe the case accurately:
> 
> usb: xhci: Prevent interrupt storm on host controller error (HCE)
> 
> The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> Device plug/unplug scenarios on Android devices, which is checked in
> xhci_irq() function and causes an interrupt storm (since the interrupt
> isn’t cleared), leading to severe system-level faults.
> 
> When the xHC controller reports HCE in the interrupt handler, the driver
> only logs a warning and assumes xHC activity will stop. The interrupt storm
> does however continue until driver manually disables xHC interrupt and
> stops the controller by calling xhci_halt().
> 
> The change is made in xhci_irq() function where STS_HCE status is
> checked, mirroring the existing error handling pattern used for
> STS_FATAL errors.
> 
> This only fixes the interrupt storm. Proper HCE recovery requires resetting
> and re-initializing the xHC.
> 

The controller is halted if there's an error like HCE. It's odd to try
to "halt" it again. Not sure how this will impact for other controllers.
Even if we don't have the full HCE recovery implemented, did we try to
just do HCRST, which is the first step of the recovery?

BR,
Thinh


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-02-26 18:17           ` Thinh Nguyen
@ 2026-02-27  7:26             ` Dayu Jiang
  2026-02-28  0:22               ` Thinh Nguyen
  2026-02-27  9:43             ` Mathias Nyman
  1 sibling, 1 reply; 16+ messages in thread
From: Dayu Jiang @ 2026-02-27  7:26 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: Mathias Nyman, Greg Kroah-Hartman, Mathias Nyman, Longfang Liu,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	yudongbin, guhuinan, chenyu45, mahongwei3, Dayu Jiang

On Thu, Feb 26, 2026 at 06:17:23PM +0000, Thinh Nguyen wrote:
> On Thu, Feb 26, 2026, Mathias Nyman wrote:
> > On 2/26/26 11:27, Dayu Jiang wrote:
> > > Hi Greg,
> > > 
> > > I have updated the changelog text as requested and resubmitted the patch.
> > > https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> > > Please kindly review it and let me know if it is acceptable now.
> > 
> > I'll send it forward, but changed the commit message.
> > Does this modified version still describe the case accurately:
> > 
> > usb: xhci: Prevent interrupt storm on host controller error (HCE)
> > 
> > The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> > Device plug/unplug scenarios on Android devices, which is checked in
> > xhci_irq() function and causes an interrupt storm (since the interrupt
> > isn’t cleared), leading to severe system-level faults.
> > 
> > When the xHC controller reports HCE in the interrupt handler, the driver
> > only logs a warning and assumes xHC activity will stop. The interrupt storm
> > does however continue until driver manually disables xHC interrupt and
> > stops the controller by calling xhci_halt().
> > 
> > The change is made in xhci_irq() function where STS_HCE status is
> > checked, mirroring the existing error handling pattern used for
> > STS_FATAL errors.
> > 
> > This only fixes the interrupt storm. Proper HCE recovery requires resetting
> > and re-initializing the xHC.
> > 
> 
> The controller is halted if there's an error like HCE. It's odd to try
> to "halt" it again. Not sure how this will impact for other controllers.
> Even if we don't have the full HCE recovery implemented, did we try to
> just do HCRST, which is the first step of the recovery?
A full recovery will not be implemented here. Performing only HCRST without 
a proper recovery procedure may introduce unpredictable risks.
In the xHCI driver flow, the standard handling for exceptions is mainly 
done via xhci_died() or xhci_halt() (please refer to the existing handling 
flow for HSE as a reference).
When an HCE occurs, the controller is already halted, but the interrupts 
have not been cleared. It has been confirmed that calling xhci_halt() at this 
point can properly resolve the interrupt storm issue.
> BR,
> Thinh


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-02-26 16:44         ` Mathias Nyman
  2026-02-26 18:17           ` Thinh Nguyen
@ 2026-02-27  7:33           ` Dayu Jiang
  1 sibling, 0 replies; 16+ messages in thread
From: Dayu Jiang @ 2026-02-27  7:33 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Greg Kroah-Hartman, Mathias Nyman, Longfang Liu, linux-usb,
	linux-kernel, yudongbin, guhuinan, chenyu45, mahongwei3,
	Dayu Jiang

On Thu, Feb 26, 2026 at 06:44:02PM +0200, Mathias Nyman wrote:
> On 2/26/26 11:27, Dayu Jiang wrote:
> > Hi Greg,
> > 
> > I have updated the changelog text as requested and resubmitted the patch.
> > https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> > Please kindly review it and let me know if it is acceptable now.
> 
> I'll send it forward, but changed the commit message.
> Does this modified version still describe the case accurately:
> 
> usb: xhci: Prevent interrupt storm on host controller error (HCE)
> 
> The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> Device plug/unplug scenarios on Android devices, which is checked in
> xhci_irq() function and causes an interrupt storm (since the interrupt
> isn’t cleared), leading to severe system-level faults.
> 
> When the xHC controller reports HCE in the interrupt handler, the driver
> only logs a warning and assumes xHC activity will stop. The interrupt storm
> does however continue until driver manually disables xHC interrupt and
> stops the controller by calling xhci_halt().
> 
> The change is made in xhci_irq() function where STS_HCE status is
> checked, mirroring the existing error handling pattern used for
> STS_FATAL errors.
> 
> This only fixes the interrupt storm. Proper HCE recovery requires resetting
> and re-initializing the xHC.
The modified version looks good and accurate to me. Please feel free to merge it.

Thanks
Dayu Jiang
> 
> Thanks
> Mathias


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-02-26 18:17           ` Thinh Nguyen
  2026-02-27  7:26             ` Dayu Jiang
@ 2026-02-27  9:43             ` Mathias Nyman
  2026-02-27 11:05               ` Michal Pecio
  2026-02-28  0:18               ` Thinh Nguyen
  1 sibling, 2 replies; 16+ messages in thread
From: Mathias Nyman @ 2026-02-27  9:43 UTC (permalink / raw)
  To: Thinh Nguyen, Dayu Jiang
  Cc: Greg Kroah-Hartman, Mathias Nyman, Longfang Liu,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	yudongbin, guhuinan, chenyu45, mahongwei3, Niklas Neronin

On 2/26/26 20:17, Thinh Nguyen wrote:
> On Thu, Feb 26, 2026, Mathias Nyman wrote:
>> On 2/26/26 11:27, Dayu Jiang wrote:
>>> Hi Greg,
>>>
>>> I have updated the changelog text as requested and resubmitted the patch.
>>> https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
>>> Please kindly review it and let me know if it is acceptable now.
>>
>> I'll send it forward, but changed the commit message.
>> Does this modified version still describe the case accurately:
>>
>> usb: xhci: Prevent interrupt storm on host controller error (HCE)
>>
>> The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
>> Device plug/unplug scenarios on Android devices, which is checked in
>> xhci_irq() function and causes an interrupt storm (since the interrupt
>> isn’t cleared), leading to severe system-level faults.
>>
>> When the xHC controller reports HCE in the interrupt handler, the driver
>> only logs a warning and assumes xHC activity will stop. The interrupt storm
>> does however continue until driver manually disables xHC interrupt and
>> stops the controller by calling xhci_halt().
>>
>> The change is made in xhci_irq() function where STS_HCE status is
>> checked, mirroring the existing error handling pattern used for
>> STS_FATAL errors.
>>
>> This only fixes the interrupt storm. Proper HCE recovery requires resetting
>> and re-initializing the xHC.
>>
> 
> The controller is halted if there's an error like HCE. It's odd to try
> to "halt" it again. Not sure how this will impact for other controllers.

This is why I changed the commit message from:

"When the xHCI controller reports HCE in the interrupt handler, the driver
currently only logs a warning and continues execution. However, HCE
indicates a critical hardware failure that requires the controller to be
halted. This ensures the controller is in a consistent state and prevents
further operations on failed hardware."

to:

"When the xHC controller reports HCE in the interrupt handler, the driver
only logs a warning and assumes xHC activity will stop. The interrupt storm
does however continue until driver manually disables xHC interrupt and
stops the controller by calling xhci_halt()."

I can clarify it further by stating that .."assumes xHC activity will stop
as stated in xHCI spec. On some xHC controllers an interrupt storm continues after
HCE error, and only ceases after manually"..

The host is messed up at this point, and we are not recovering it. I don't think
there is any harm in a manual halt at this stage.

> Even if we don't have the full HCE recovery implemented, did we try to
> just do HCRST, which is the first step of the recovery?

Specs state that HCRST might re-trigger the HCE if it's due to a "hard" fault,
and driver needs to take action to prevent a HCE - HCRST recovery loop.

HCRST will clear all registers, so we need to reinitialize everything here,
write back addresses of event rings, command rings, DCBAA, scratchpads,
dequeue pointers, etc.

I support taking this fix to prevent the interrupt storm, an issue seen in real
life. And then solve proper recovery later.

Niklas is actually working on decoupling memory allocation and xHC register
initialization which will help future HCE recovery work.

Thanks
Mathias
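
The concern above about an HCE - HCRST recovery loop can be sketched as a
bounded retry. This is a hypothetical user-space model only: nothing like it
exists in the patch under discussion, and struct rec_xhc, REC_MAX_RESETS and
the rec_* functions are made-up names. It assumes a "hard" fault re-raises
HCE after every reset, and it collapses the register reinitialization (event
rings, command ring, DCBAA, scratchpads, dequeue pointers) into one flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REC_MAX_RESETS 3u  /* hypothetical retry budget */

/* Hypothetical controller model. */
struct rec_xhc {
	bool hce;              /* Host Controller Error currently set */
	bool hard_fault;       /* fault that re-triggers HCE after a reset */
	unsigned int resets;   /* number of resets issued so far */
	bool regs_programmed;  /* stands in for all reprogrammed registers */
};

/* Model of HCRST: registers are cleared; a hard fault raises HCE again. */
void rec_reset(struct rec_xhc *hc)
{
	hc->resets++;
	hc->regs_programmed = false;
	hc->hce = hc->hard_fault;
}

/* Stand-in for writing back ring addresses, DCBAA, scratchpads, etc. */
void rec_reinit(struct rec_xhc *hc)
{
	hc->regs_programmed = true;
}

/* Returns true if the controller came back, false if the retry budget
 * was exhausted and the caller should give up instead of looping. */
bool rec_recover(struct rec_xhc *hc)
{
	assert(hc != NULL);
	while (hc->hce) {
		if (hc->resets >= REC_MAX_RESETS)
			return false;
		rec_reset(hc);
	}
	rec_reinit(hc);
	return true;
}
```

The retry budget is the point of the sketch: a transient fault clears after
one reset and is followed by reinitialization, while a hard fault stops after
REC_MAX_RESETS attempts rather than resetting forever.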




* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-02-27  9:43             ` Mathias Nyman
@ 2026-02-27 11:05               ` Michal Pecio
  2026-02-28  0:06                 ` Thinh Nguyen
  2026-02-28  0:18               ` Thinh Nguyen
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Pecio @ 2026-02-27 11:05 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Thinh Nguyen, Dayu Jiang, Greg Kroah-Hartman, Mathias Nyman,
	Longfang Liu, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org, yudongbin, guhuinan, chenyu45,
	mahongwei3, Niklas Neronin

On Fri, 27 Feb 2026 11:43:45 +0200, Mathias Nyman wrote:
> On 2/26/26 20:17, Thinh Nguyen wrote:
> > The controller is halted if there's an error like HCE. It's odd to
> > try to "halt" it again. Not sure how this will impact for other
> > controllers.
> The host is messed up at this point, and we are not recovering it.
> I don't think there is any harm in a manual halt at this stage.

Specifically, calling xhci_halt() clears the USBCMD.Run flag and
all USBCMD interrupt enable flags. Seems relatively harmless. Clearing
USBCMD.Run would be the first step of resetting the HC anyway, so the
HW should expect it to happen after reporting HCE.

In case of HSE the HW should clear the Run bit by itself (4.10.2.6),
but no such requirement seems to exist for HCE (4.24.1).

The call also sets XHCI_STATE_HALTED and CMD_RING_STATE_STOPPED flags,
which helps with recovering stuck URBs. When class drivers time out
and unlink them, the URBs are given back instantly without drama.

I just tested the HSE case where xhci_halt() is already being called
and it worked for me. If I remove xhci_halt() then the driver tries to
issue Stop Endpoint commands, times out and calls hc_died(). Messy.
I suspect that the same happened with HCE before this patch.

Regards,
Michal
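
The point above about stuck URBs can be sketched as follows. This is a
simplified user-space model, not the driver's actual unlink path: struct
model_hcd, MODEL_STATE_HALTED and the model_* functions are hypothetical
names (the flag is named after XHCI_STATE_HALTED, with an illustrative
value), and the two outcomes compress the behaviour described in this
message into a single decision.

```c
#include <assert.h>
#include <stddef.h>

/* Named after XHCI_STATE_HALTED; the value here is illustrative. */
#define MODEL_STATE_HALTED (1u << 1)

enum model_unlink_result {
	MODEL_GIVEBACK_NOW,   /* URB handed back to the class driver at once */
	MODEL_QUEUE_STOP_EP   /* Stop Endpoint queued; depends on live HW */
};

/* Hypothetical stand-in for the driver's software state. */
struct model_hcd {
	unsigned int xhc_state;
};

/* Software-side effect attributed to xhci_halt() in this thread. */
void model_mark_halted(struct model_hcd *hcd)
{
	hcd->xhc_state |= MODEL_STATE_HALTED;
}

/* Decide how an unlinked URB is handled. */
enum model_unlink_result model_urb_unlink(const struct model_hcd *hcd)
{
	assert(hcd != NULL);
	if (hcd->xhc_state & MODEL_STATE_HALTED)
		return MODEL_GIVEBACK_NOW;
	/* On dead hardware this command would time out. */
	return MODEL_QUEUE_STOP_EP;
}
```

Without the halted flag, the model takes the Stop Endpoint path, which on a
dead controller corresponds to the timeout and hc_died() sequence described
above. With the flag set, the URB is given back immediately.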


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-02-27 11:05               ` Michal Pecio
@ 2026-02-28  0:06                 ` Thinh Nguyen
  0 siblings, 0 replies; 16+ messages in thread
From: Thinh Nguyen @ 2026-02-28  0:06 UTC (permalink / raw)
  To: Michal Pecio
  Cc: Mathias Nyman, Thinh Nguyen, Dayu Jiang, Greg Kroah-Hartman,
	Mathias Nyman, Longfang Liu, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org, yudongbin, guhuinan, chenyu45,
	mahongwei3, Niklas Neronin

On Fri, Feb 27, 2026, Michal Pecio wrote:
> On Fri, 27 Feb 2026 11:43:45 +0200, Mathias Nyman wrote:
> > On 2/26/26 20:17, Thinh Nguyen wrote:
> > > The controller is halted if there's an error like HCE. It's odd to
> > > try to "halt" it again. Not sure how this will impact for other
> > > controllers.
> > The host is messed up at this point, and we are not recovering it.
> > I don't think there is any harm in a manual halt at this stage.
> 
> Specifically, calling xhci_halt() clears the USBCMD.Run flag and
> all USBCMD interrupt enable flags. Seems relatively harmless. Clearing
> USBCMD.Run would be the first step of resetting the HC anyway, so the
> HW should expect it to happen after reporting HCE.
> 
> In case of HSE the HW should clear the Run bit by itself (4.10.2.6),
> but no such requirement seems to exist for HCE (4.24.1).

Check 4.21.2. The controller should clear the Run/Stop bit for both
cases.

> 
> The call also sets XHCI_STATE_HALTED and CMD_RING_STATE_STOPPED flags,

Perhaps we update these driver flags, but we should not need to clear
the Run/Stop bit. That's not the first thing we should do.

> which helps with recovering stuck URBs. When class drivers time out
> and unlink them, the URBs are given back instantly without drama.
> 
> I just tested the HSE case where xhci_halt() is already being called
> and it worked for me. If I remove xhci_halt() then the driver tries to
> issue Stop Endpoint commands, times out and calls hc_died(). Messy.
> I suspect that the same happened with HCE before this patch.
> 

That's just the xhci driver needing to update the software states to
properly handle the teardown knowing that the xHC is halted.

BR,
Thinh


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-02-27  9:43             ` Mathias Nyman
  2026-02-27 11:05               ` Michal Pecio
@ 2026-02-28  0:18               ` Thinh Nguyen
  1 sibling, 0 replies; 16+ messages in thread
From: Thinh Nguyen @ 2026-02-28  0:18 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Thinh Nguyen, Dayu Jiang, Greg Kroah-Hartman, Mathias Nyman,
	Longfang Liu, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org, yudongbin, guhuinan, chenyu45,
	mahongwei3, Niklas Neronin

On Fri, Feb 27, 2026, Mathias Nyman wrote:
> On 2/26/26 20:17, Thinh Nguyen wrote:
> > On Thu, Feb 26, 2026, Mathias Nyman wrote:
> > > On 2/26/26 11:27, Dayu Jiang wrote:
> > > > Hi Greg,
> > > > 
> > > > I have updated the changelog text as requested and resubmitted the patch.
> > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/__;!!A4F2R9G_pg!ZSJNDKyOinm26qngopLW-axiQtwDAMely4bDqtqYDGv1ErWCtS6kZ6ZamdiKoZKuCyCk0IxMQK5g625GEIxYWFzKpAEiCUq7$
> > > > Please kindly review it and let me know if it is acceptable now.
> > > 
> > > I'll send it forward, but changed the commit message.
> > > Does this modified version still describe the case accurately:
> > > 
> > > usb: xhci: Prevent interrupt storm on host controller error (HCE)
> > > 
> > > The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> > > Device plug/unplug scenarios on Android devices, which is checked in
> > > xhci_irq() function and causes an interrupt storm (since the interrupt
> > > isn’t cleared), leading to severe system-level faults.
> > > 
> > > When the xHC controller reports HCE in the interrupt handler, the driver
> > > only logs a warning and assumes xHC activity will stop. The interrupt storm
> > > does however continue until driver manually disables xHC interrupt and
> > > stops the controller by calling xhci_halt().
> > > 
> > > The change is made in xhci_irq() function where STS_HCE status is
> > > checked, mirroring the existing error handling pattern used for
> > > STS_FATAL errors.
> > > 
> > > This only fixes the interrupt storm. Proper HCE recovery requires resetting
> > > and re-initializing the xHC.
> > > 
> > 
> > The controller is halted if there's an error like HCE. It's odd to try
> > to "halt" it again. Not sure how this will impact for other controllers.
> 
> This is why I changed the commit message from:
> 
> "When the xHCI controller reports HCE in the interrupt handler, the driver
> currently only logs a warning and continues execution. However, HCE
> indicates a critical hardware failure that requires the controller to be
> halted. This ensures the controller is in a consistent state and prevents
> further operations on failed hardware."
> 
> to:
> 
> "When the xHC controller reports HCE in the interrupt handler, the driver
> only logs a warning and assumes xHC activity will stop. The interrupt storm
> does however continue until driver manually disables xHC interrupt and
> stops the controller by calling xhci_halt()."
> 
> I can clarify it further by stating that .."assumes xHC activity will stop
> as stated in xHCI spec. On some xHC controllers an interrupt storm continues after
> HCE error, and only ceases after manually"..
> 
> The host is messed up at this point, and we are not recovering it. I don't think
> there is any harm in a manual halt at this stage.

We should update the xhci driver states when there's HCE and the
controller is halted but we don't need to manually clear the Run/Stop
bit again.

> 
> > Even if we don't have the full HCE recovery implemented, did we try to
> > just do HCRST, which is the first step of the recovery?
> 
> Specs state that HCRST might re-trigger the HCE if it's due to a "hard" fault,

That's only after we re-initialize the controller as noted in the spec,
not immediately after HCRST.

> and driver needs to take action to prevent a HCE - HCRST recovery loop.
> 
> HCRST will clear all registers, so we need to reinitialize everything here,
> write back addresses of event rings, command rings, DCBAA, scratchpads
> dequeue pointers etc.
> 
> I support taking this fix to prevent the interrupt storm, an issue seen in real
> life. And then solve proper recovery later.

That's fair to me.

> 
> Niklas is actually working on decoupling memory allocation and xHC register
> initialization which will help future HCE recovery work.
> 

That's great! I'm looking forward to that.

Thanks,
Thinh


* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
  2026-02-27  7:26             ` Dayu Jiang
@ 2026-02-28  0:22               ` Thinh Nguyen
  0 siblings, 0 replies; 16+ messages in thread
From: Thinh Nguyen @ 2026-02-28  0:22 UTC (permalink / raw)
  To: Dayu Jiang
  Cc: Thinh Nguyen, Mathias Nyman, Greg Kroah-Hartman, Mathias Nyman,
	Longfang Liu, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org, yudongbin, guhuinan, chenyu45,
	mahongwei3

On Fri, Feb 27, 2026, Dayu Jiang wrote:
> On Thu, Feb 26, 2026 at 06:17:23PM +0000, Thinh Nguyen wrote:
> > On Thu, Feb 26, 2026, Mathias Nyman wrote:
> > > On 2/26/26 11:27, Dayu Jiang wrote:
> > > > Hi Greg,
> > > > 
> > > > I have updated the changelog text as requested and resubmitted the patch.
> > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/__;!!A4F2R9G_pg!ZSJNDKyOinm26qngopLW-axiQtwDAMely4bDqtqYDGv1ErWCtS6kZ6ZamdiKoZKuCyCk0IxMQK5g625GEIxYWFzKpAEiCUq7$
> > > > Please kindly review it and let me know if it is acceptable now.
> > > 
> > > I'll send it forward, but changed the commit message.
> > > Does this modified version still describe the case accurately:
> > > 
> > > usb: xhci: Prevent interrupt storm on host controller error (HCE)
> > > 
> > > The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> > > Device plug/unplug scenarios on Android devices, which is checked in
> > > xhci_irq() function and causes an interrupt storm (since the interrupt
> > > isn’t cleared), leading to severe system-level faults.
> > > 
> > > When the xHC controller reports HCE in the interrupt handler, the driver
> > > only logs a warning and assumes xHC activity will stop. The interrupt storm
> > > does however continue until driver manually disables xHC interrupt and
> > > stops the controller by calling xhci_halt().
> > > 
> > > The change is made in xhci_irq() function where STS_HCE status is
> > > checked, mirroring the existing error handling pattern used for
> > > STS_FATAL errors.
> > > 
> > > This only fixes the interrupt storm. Proper HCE recovery requires resetting
> > > and re-initializing the xHC.
> > > 
> > 
> > The controller is halted if there's an error like HCE. It's odd to try
> > to "halt" it again. Not sure how this will impact for other controllers.
> > Even if we don't have the full HCE recovery implemented, did we try to
> > just do HCRST, which is the first step of the recovery?
> A full recovery will not be implemented here. Performing only HCRST without 
> a proper recovery procedure may introduce unpredictable risks.

What risks?

> In the xHCI driver flow, the standard handling for exceptions is mainly 
> done via xhci_died() or xhci_halt() (please refer to the existing handling 
> flow for HSE as a reference).
> When an HCE occurs, the controller is already halted, but the interrupts 
> have not been cleared. It has been confirmed that calling xhci_halt() at this 
> point can properly resolve the interrupt storm issue.

As I noted in Mathias's reply, I'm OK with this change while waiting for
the proper handling of HCE to be implemented.

BR,
Thinh


end of thread, other threads:[~2026-02-28  0:22 UTC | newest]

Thread overview: 16+ messages
2026-01-27 11:04 [PATCH] usb: xhci: add xhci_halt() for HCE Handling jiangdayu
2026-01-27 11:22 ` Greg Kroah-Hartman
2026-01-28  8:48   ` Dayu Jiang
2026-01-28  8:56     ` Greg Kroah-Hartman
2026-02-26  9:27       ` Dayu Jiang
2026-02-26 16:44         ` Mathias Nyman
2026-02-26 18:17           ` Thinh Nguyen
2026-02-27  7:26             ` Dayu Jiang
2026-02-28  0:22               ` Thinh Nguyen
2026-02-27  9:43             ` Mathias Nyman
2026-02-27 11:05               ` Michal Pecio
2026-02-28  0:06                 ` Thinh Nguyen
2026-02-28  0:18               ` Thinh Nguyen
2026-02-27  7:33           ` Dayu Jiang
2026-01-27 12:25 ` Mathias Nyman
2026-01-28  8:53   ` Dayu Jiang
