* [PATCH] usb: xhci: add xhci_halt() for HCE Handling
@ 2026-01-27 11:04 jiangdayu
2026-01-27 11:22 ` Greg Kroah-Hartman
2026-01-27 12:25 ` Mathias Nyman
0 siblings, 2 replies; 16+ messages in thread
From: jiangdayu @ 2026-01-27 11:04 UTC (permalink / raw)
To: Mathias Nyman, Greg Kroah-Hartman
Cc: Longfang Liu, linux-usb, linux-kernel, yudongbin, guhuinan,
chenyu45, mahongwei3, jiangdayu
When the xHCI controller reports a Host Controller Error (HCE) status
in the interrupt handler, the driver currently only logs a warning and
continues execution. However, a Host Controller Error indicates a
critical hardware failure that requires the controller to be halted.
Add xhci_halt(xhci) call after the HCE warning to properly halt the
controller when this error condition is detected. This ensures the
controller is in a consistent state and prevents further operations
on failed hardware. Additionally, if there are still unhandled
interrupts at this point, they may cause an interrupt storm.
The change is made in xhci_irq() function where STS_HCE status is
checked, mirroring the existing error handling pattern used for
STS_FATAL errors.
Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>
---
drivers/usb/host/xhci-ring.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9315ba18310d..1cbefee3c4ca 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
if (status & STS_HCE) {
xhci_warn(xhci, "WARNING: Host Controller Error\n");
+ xhci_halt(xhci);
goto out;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-01-27 11:04 [PATCH] usb: xhci: add xhci_halt() for HCE Handling jiangdayu
@ 2026-01-27 11:22 ` Greg Kroah-Hartman
2026-01-28 8:48 ` Dayu Jiang
2026-01-27 12:25 ` Mathias Nyman
1 sibling, 1 reply; 16+ messages in thread
From: Greg Kroah-Hartman @ 2026-01-27 11:22 UTC (permalink / raw)
To: jiangdayu
Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
guhuinan, chenyu45, mahongwei3
On Tue, Jan 27, 2026 at 07:04:22PM +0800, jiangdayu wrote:
> When the xHCI controller reports a Host Controller Error (HCE) status
> in the interrupt handler, the driver currently only logs a warning and
> continues execution. However, a Host Controller Error indicates a
> critical hardware failure that requires the controller to be halted.
>
> Add xhci_halt(xhci) call after the HCE warning to properly halt the
> controller when this error condition is detected. This ensures the
> controller is in a consistent state and prevents further operations
> on failed hardware. Additionally, if there are still unhandled
> interrupts at this point, they may cause an interrupt storm.
>
> The change is made in xhci_irq() function where STS_HCE status is
> checked, mirroring the existing error handling pattern used for
> STS_FATAL errors.
>
> Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
> Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>
We need a full name, not an email alias, sorry.
And this isn't really "fixing" that commit, there's nothing wrong with
it as-is. This is adding new functionality to the code.
> ---
> drivers/usb/host/xhci-ring.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 9315ba18310d..1cbefee3c4ca 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
>
> if (status & STS_HCE) {
> xhci_warn(xhci, "WARNING: Host Controller Error\n");
> + xhci_halt(xhci);
What is going to start things back up again? And as you are calling
this function, why is the warning message needed anymore? The
tracepoint information will give you that message now, right?
And is this just papering over a hardware bug? Should this really be
happening for any normal system?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-01-27 11:04 [PATCH] usb: xhci: add xhci_halt() for HCE Handling jiangdayu
2026-01-27 11:22 ` Greg Kroah-Hartman
@ 2026-01-27 12:25 ` Mathias Nyman
2026-01-28 8:53 ` Dayu Jiang
1 sibling, 1 reply; 16+ messages in thread
From: Mathias Nyman @ 2026-01-27 12:25 UTC (permalink / raw)
To: jiangdayu, Mathias Nyman, Greg Kroah-Hartman
Cc: Longfang Liu, linux-usb, linux-kernel, yudongbin, guhuinan,
chenyu45, mahongwei3
Hi
On 1/27/26 13:04, jiangdayu wrote:
> When the xHCI controller reports a Host Controller Error (HCE) status
> in the interrupt handler, the driver currently only logs a warning and
> continues execution. However, a Host Controller Error indicates a
> critical hardware failure that requires the controller to be halted.
>
The host should cease all activity when it sets the HCE bit.
See xHCI spec 4.24.1 'Internal Errors':
"When the HCE flag is set to ‘1’ the xHC shall cease all activity.
Software response to the assertion of HCE is to reset the
xHC (HCRST = ‘1’) and reinitialize it."
Same is true for "Host system error" HSE (STS_FATAL), not sure
why we halt it manually in that case.
> Add xhci_halt(xhci) call after the HCE warning to properly halt the
> controller when this error condition is detected. This ensures the
> controller is in a consistent state and prevents further operations
> on failed hardware. Additionally, if there are still unhandled
> interrupts at this point, they may cause an interrupt storm.
Is this something that has been seen on real-world hardware?
If yes, and halting the host helped, then this fix is ok by me.
At least until a proper host reset solution is implemented.
Thanks
Mathias
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-01-27 11:22 ` Greg Kroah-Hartman
@ 2026-01-28 8:48 ` Dayu Jiang
2026-01-28 8:56 ` Greg Kroah-Hartman
0 siblings, 1 reply; 16+ messages in thread
From: Dayu Jiang @ 2026-01-28 8:48 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
guhuinan, chenyu45, mahongwei3
On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Jan 27, 2026 at 07:04:22PM +0800, jiangdayu wrote:
> > When the xHCI controller reports a Host Controller Error (HCE) status
> > in the interrupt handler, the driver currently only logs a warning and
> > continues execution. However, a Host Controller Error indicates a
> > critical hardware failure that requires the controller to be halted.
> >
> > Add xhci_halt(xhci) call after the HCE warning to properly halt the
> > controller when this error condition is detected. This ensures the
> > controller is in a consistent state and prevents further operations
> > on failed hardware. Additionally, if there are still unhandled
> > interrupts at this point, they may cause an interrupt storm.
> >
> > The change is made in xhci_irq() function where STS_HCE status is
> > checked, mirroring the existing error handling pattern used for
> > STS_FATAL errors.
> >
> > Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
> > Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>
>
> We need a full name, not an email alias, sorry.
Sorry for the confusion, I will use my full legal name (instead of the
email alias) in the Signed-off-by line in the revised patch.
>
> And this isn't really "fixing" that commit, there's nothing wrong with
> it as-is. This is adding new functionality to the code.
I initially used the Fixes tag because the original commit only logged
a warning for HCE and took no further action. This incomplete handling
risks interrupt storms on the SoC (since the interrupt isn’t cleared).
That’s a robustness gap I wanted to fix with this patch.
>
> > ---
> > drivers/usb/host/xhci-ring.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > index 9315ba18310d..1cbefee3c4ca 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
> >
> > if (status & STS_HCE) {
> > xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > + xhci_halt(xhci);
>
> What is going to start things back up again? And as you are calling
> this function, why is the warning message needed anymore? The
> tracepoint information will give you that message now, right?
When HCE is triggered, it indicates a critical hardware failure.
Aligning with the handling of HSE (STS_FATAL) by adding
xhci_halt() here is more reasonable: without xhci_halt(), the
USB controller may fall into an unpredictable and unstable state,
which could exacerbate system issues.
Retaining the warning message is necessary because it is directly
visible in dmesg, whereas tracepoint information requires explicitly
enabling xHCI tracepoints. Additionally, if xhci_halt() is called in
xhci_irq() without the warning log, it would be impossible to
distinguish whether the halt was triggered by HCE or HSE.
>
> And is this just papering over a hardware bug? Should this really be
> happening for any normal system?
Yes, this issue has been reproducible on real-world hardware: HCE is
triggered in UAS Storage Device plug/unplug scenarios on Android
devices, which enters this error branch and causes an interrupt storm,
leading to severe system-level faults.
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-01-27 12:25 ` Mathias Nyman
@ 2026-01-28 8:53 ` Dayu Jiang
0 siblings, 0 replies; 16+ messages in thread
From: Dayu Jiang @ 2026-01-28 8:53 UTC (permalink / raw)
To: Mathias Nyman
Cc: Greg Kroah-Hartman, Longfang Liu, linux-usb, linux-kernel,
yudongbin, guhuinan, chenyu45, mahongwei3, Dayu Jiang
On Tue, Jan 27, 2026 at 02:25:19PM +0200, Mathias Nyman wrote:
> Hi
>
> On 1/27/26 13:04, jiangdayu wrote:
> > When the xHCI controller reports a Host Controller Error (HCE) status
> > in the interrupt handler, the driver currently only logs a warning and
> > continues execution. However, a Host Controller Error indicates a
> > critical hardware failure that requires the controller to be halted.
> >
>
> The host should cease all activity when it sets the HCE bit.
>
> See xHCI spec 4.24.1 'Internal Errors':
> "When the HCE flag is set to ‘1’ the xHC shall cease all activity.
> Software response to the assertion of HCE is to reset the
> xHC (HCRST = ‘1’) and reinitialize it."
>
> Same is true for "Host system error" HSE (STS_FATAL), not sure
> why we halt it manually in that case.
>
> > Add xhci_halt(xhci) call after the HCE warning to properly halt the
> > controller when this error condition is detected. This ensures the
> > controller is in a consistent state and prevents further operations
> > on failed hardware. Additionally, if there are still unhandled
> > interrupts at this point, they may cause an interrupt storm.
>
> Is this something that has been seen on real-world hardware?
> If yes, and halting the host helped, then this fix is ok by me.
> At least until a proper host reset solution is implemented.
Yes, the HCE issue (and subsequent interrupt storm) has been consistently
observed on production Android devices during UASP device plug/unplug
operations. Adding xhci_halt() effectively resolves the system-level interrupt
storm issue caused by HCE.
>
> Thanks
> Mathias
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-01-28 8:48 ` Dayu Jiang
@ 2026-01-28 8:56 ` Greg Kroah-Hartman
2026-02-26 9:27 ` Dayu Jiang
0 siblings, 1 reply; 16+ messages in thread
From: Greg Kroah-Hartman @ 2026-01-28 8:56 UTC (permalink / raw)
To: Dayu Jiang
Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
guhuinan, chenyu45, mahongwei3
On Wed, Jan 28, 2026 at 04:48:49PM +0800, Dayu Jiang wrote:
> On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> > > if (status & STS_HCE) {
> > > xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > > + xhci_halt(xhci);
> >
> > What is going to start things back up again? And as you are calling
> > this function, why is the warning message needed anymore? The
> > tracepoint information will give you that message now, right?
> When HCE is triggered, it indicates a critical hardware failure.
> Aligning with the handling of HSE (STS_FATAL) by adding
> xhci_halt() here is more reasonable: without xhci_halt(), the
> USB controller may fall into an unpredictable and unstable state,
> which could exacerbate system issues.
>
> Retaining the warning message is necessary because it is directly
> visible in dmesg, whereas tracepoint information requires explicitly
> enabling xHCI tracepoints. Additionally, if xhci_halt() is called in
> xhci_irq() without the warning log, it would be impossible to
> distinguish whether the halt was triggered by HCE or HSE.
> >
> > And is this just papering over a hardware bug? Should this really be
> > happening for any normal system?
> Yes, this issue has been reproducible on real-world hardware: HCE is
> triggered in UAS Storage Device plug/unplug scenarios on Android
> devices, which enters this error branch and causes an interrupt storm,
> leading to severe system-level faults.
Great, please provide this information in the changelog text when you
resubmit this, thanks!
greg k-h
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-01-28 8:56 ` Greg Kroah-Hartman
@ 2026-02-26 9:27 ` Dayu Jiang
2026-02-26 16:44 ` Mathias Nyman
0 siblings, 1 reply; 16+ messages in thread
From: Dayu Jiang @ 2026-02-26 9:27 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
guhuinan, chenyu45, mahongwei3, Dayu Jiang
Hi Greg,
I have updated the changelog text as requested and resubmitted the patch.
https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
Please kindly review it and let me know if it is acceptable now.
Thanks,
Dayu Jiang
On Wed, Jan 28, 2026 at 09:56:18AM +0100, Greg Kroah-Hartman wrote:
> On Wed, Jan 28, 2026 at 04:48:49PM +0800, Dayu Jiang wrote:
> > On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> > > > if (status & STS_HCE) {
> > > > xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > > > + xhci_halt(xhci);
> > >
> > > What is going to start things back up again? And as you are calling
> > > this function, why is the warning message needed anymore? The
> > > tracepoint information will give you that message now, right?
> > When HCE is triggered, it indicates a critical hardware failure.
> > Aligning with the handling of HSE (STS_FATAL) by adding
> > xhci_halt() here is more reasonable: without xhci_halt(), the
> > USB controller may fall into an unpredictable and unstable state,
> > which could exacerbate system issues.
> >
> > Retaining the warning message is necessary because it is directly
> > visible in dmesg, whereas tracepoint information requires explicitly
> > enabling xHCI tracepoints. Additionally, if xhci_halt() is called in
> > xhci_irq() without the warning log, it would be impossible to
> > distinguish whether the halt was triggered by HCE or HSE.
> > >
> > > And is this just papering over a hardware bug? Should this really be
> > > happening for any normal system?
> > Yes, this issue has been reproducible on real-world hardware: HCE is
> > triggered in UAS Storage Device plug/unplug scenarios on Android
> > devices, which enters this error branch and causes an interrupt storm,
> > leading to severe system-level faults.
>
> Great, please provide this information in the changelog text when you
> resubmit this, thanks!
>
> greg k-h
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-26 9:27 ` Dayu Jiang
@ 2026-02-26 16:44 ` Mathias Nyman
2026-02-26 18:17 ` Thinh Nguyen
2026-02-27 7:33 ` Dayu Jiang
0 siblings, 2 replies; 16+ messages in thread
From: Mathias Nyman @ 2026-02-26 16:44 UTC (permalink / raw)
To: Dayu Jiang, Greg Kroah-Hartman
Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
guhuinan, chenyu45, mahongwei3
On 2/26/26 11:27, Dayu Jiang wrote:
> Hi Greg,
>
> I have updated the changelog text as requested and resubmitted the patch.
> https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> Please kindly review it and let me know if it is acceptable now.
I'll send it forward, but changed the commit message.
Does this modified version still describe the case accurately:
usb: xhci: Prevent interrupt storm on host controller error (HCE)
The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
Device plug/unplug scenarios on Android devices, which is checked in
xhci_irq() function and causes an interrupt storm (since the interrupt
isn’t cleared), leading to severe system-level faults.
When the xHC controller reports HCE in the interrupt handler, the driver
only logs a warning and assumes xHC activity will stop. The interrupt storm
does however continue until driver manually disables xHC interrupt and
stops the controller by calling xhci_halt().
The change is made in xhci_irq() function where STS_HCE status is
checked, mirroring the existing error handling pattern used for
STS_FATAL errors.
This only fixes the interrupt storm. Proper HCE recovery requires resetting
and re-initializing the xHC.
Thanks
Mathias
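
The storm described in the commit message above can be illustrated with a
small userspace model. This is only a sketch, not driver code: it assumes a
level-triggered interrupt line that stays asserted while HCE is pending and
the interrupter is enabled. The bit positions follow the xHCI USBCMD/USBSTS
layout, but struct storm_hc and all SIM_/sim_ names are hypothetical and
exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as in the xHCI register layout; names are hypothetical. */
#define SIM_CMD_RUN   (1u << 0)   /* USBCMD Run/Stop */
#define SIM_CMD_INTE  (1u << 2)   /* USBCMD Interrupter Enable */
#define SIM_STS_HCE   (1u << 12)  /* USBSTS Host Controller Error */

/* Hypothetical stand-in for the two registers involved. */
struct storm_hc {
	uint32_t usbcmd;
	uint32_t usbsts;
};

/* Assumed level-triggered line: asserted while HCE is pending and enabled. */
bool sim_irq_asserted(const struct storm_hc *hc)
{
	return (hc->usbsts & SIM_STS_HCE) && (hc->usbcmd & SIM_CMD_INTE);
}

/* Model of the HCE branch; 'halt' selects the behaviour with the patch. */
void sim_irq_hce(struct storm_hc *hc, bool halt)
{
	if (!(hc->usbsts & SIM_STS_HCE))
		return;
	/* Warning only: nothing here disarms the interrupt source. */
	if (halt)
		hc->usbcmd &= ~(SIM_CMD_RUN | SIM_CMD_INTE);
}

/* Count handler invocations, capped at 'limit' so the model terminates. */
unsigned int sim_count_irqs(struct storm_hc *hc, bool halt, unsigned int limit)
{
	unsigned int n = 0;

	while (sim_irq_asserted(hc) && n < limit) {
		sim_irq_hce(hc, halt);
		n++;
	}
	return n;
}
```

In this model, calling sim_count_irqs() with halt set to false runs until it
hits the cap, while halt set to true returns after a single invocation because
the interrupt enable bit is cleared in the handler.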
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-26 16:44 ` Mathias Nyman
@ 2026-02-26 18:17 ` Thinh Nguyen
2026-02-27 7:26 ` Dayu Jiang
2026-02-27 9:43 ` Mathias Nyman
2026-02-27 7:33 ` Dayu Jiang
1 sibling, 2 replies; 16+ messages in thread
From: Thinh Nguyen @ 2026-02-26 18:17 UTC (permalink / raw)
To: Mathias Nyman, Dayu Jiang
Cc: Greg Kroah-Hartman, Mathias Nyman, Longfang Liu,
linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
yudongbin, guhuinan, chenyu45, mahongwei3
On Thu, Feb 26, 2026, Mathias Nyman wrote:
> On 2/26/26 11:27, Dayu Jiang wrote:
> > Hi Greg,
> >
> > I have updated the changelog text as requested and resubmitted the patch.
> > https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> > Please kindly review it and let me know if it is acceptable now.
>
> I'll send it forward, but changed the commit message.
> Does this modified version still describe the case accurately:
>
> usb: xhci: Prevent interrupt storm on host controller error (HCE)
>
> The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> Device plug/unplug scenarios on Android devices, which is checked in
> xhci_irq() function and causes an interrupt storm (since the interrupt
> isn’t cleared), leading to severe system-level faults.
>
> When the xHC controller reports HCE in the interrupt handler, the driver
> only logs a warning and assumes xHC activity will stop. The interrupt storm
> does however continue until driver manually disables xHC interrupt and
> stops the controller by calling xhci_halt().
>
> The change is made in xhci_irq() function where STS_HCE status is
> checked, mirroring the existing error handling pattern used for
> STS_FATAL errors.
>
> This only fixes the interrupt storm. Proper HCE recovery requires resetting
> and re-initializing the xHC.
>
The controller is halted if there's an error like HCE. It's odd to try
to "halt" it again. Not sure how this will impact for other controllers.
Even if we don't have the full HCE recovery implemented, did we try to
just do HCRST, which is the first step of the recovery?
BR,
Thinh
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-26 18:17 ` Thinh Nguyen
@ 2026-02-27 7:26 ` Dayu Jiang
2026-02-28 0:22 ` Thinh Nguyen
2026-02-27 9:43 ` Mathias Nyman
1 sibling, 1 reply; 16+ messages in thread
From: Dayu Jiang @ 2026-02-27 7:26 UTC (permalink / raw)
To: Thinh Nguyen
Cc: Mathias Nyman, Greg Kroah-Hartman, Mathias Nyman, Longfang Liu,
linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
yudongbin, guhuinan, chenyu45, mahongwei3, Dayu Jiang
On Thu, Feb 26, 2026 at 06:17:23PM +0000, Thinh Nguyen wrote:
> On Thu, Feb 26, 2026, Mathias Nyman wrote:
> > On 2/26/26 11:27, Dayu Jiang wrote:
> > > Hi Greg,
> > >
> > > I have updated the changelog text as requested and resubmitted the patch.
> > > https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> > > Please kindly review it and let me know if it is acceptable now.
> >
> > I'll send it forward, but changed the commit message.
> > Does this modified version still describe the case accurately:
> >
> > usb: xhci: Prevent interrupt storm on host controller error (HCE)
> >
> > The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> > Device plug/unplug scenarios on Android devices, which is checked in
> > xhci_irq() function and causes an interrupt storm (since the interrupt
> > isn’t cleared), leading to severe system-level faults.
> >
> > When the xHC controller reports HCE in the interrupt handler, the driver
> > only logs a warning and assumes xHC activity will stop. The interrupt storm
> > does however continue until driver manually disables xHC interrupt and
> > stops the controller by calling xhci_halt().
> >
> > The change is made in xhci_irq() function where STS_HCE status is
> > checked, mirroring the existing error handling pattern used for
> > STS_FATAL errors.
> >
> > This only fixes the interrupt storm. Proper HCE recovery requires resetting
> > and re-initializing the xHC.
> >
>
> The controller is halted if there's an error like HCE. It's odd to try
> to "halt" it again. Not sure how this will impact for other controllers.
> Even if we don't have the full HCE recovery implemented, did we try to
> just do HCRST, which is the first step of the recovery?
A full recovery will not be implemented here. Performing only HCRST without
a proper recovery procedure may introduce unpredictable risks.
In the xHCI driver flow, the standard handling for exceptions is mainly
done via xhci_died() or xhci_halt() (please refer to the existing handling
flow for HSE as a reference).
When an HCE occurs, the controller is already halted, but the interrupts
have not been cleared. It has been confirmed that calling xhci_halt() at this
point can properly resolve the interrupt storm issue.
> BR,
> Thinh
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-26 16:44 ` Mathias Nyman
2026-02-26 18:17 ` Thinh Nguyen
@ 2026-02-27 7:33 ` Dayu Jiang
1 sibling, 0 replies; 16+ messages in thread
From: Dayu Jiang @ 2026-02-27 7:33 UTC (permalink / raw)
To: Mathias Nyman
Cc: Greg Kroah-Hartman, Mathias Nyman, Longfang Liu, linux-usb,
linux-kernel, yudongbin, guhuinan, chenyu45, mahongwei3,
Dayu Jiang
On Thu, Feb 26, 2026 at 06:44:02PM +0200, Mathias Nyman wrote:
> On 2/26/26 11:27, Dayu Jiang wrote:
> > Hi Greg,
> >
> > I have updated the changelog text as requested and resubmitted the patch.
> > https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> > Please kindly review it and let me know if it is acceptable now.
>
> I'll send it forward, but changed the commit message.
> Does this modified version still describe the case accurately:
>
> usb: xhci: Prevent interrupt storm on host controller error (HCE)
>
> The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> Device plug/unplug scenarios on Android devices, which is checked in
> xhci_irq() function and causes an interrupt storm (since the interrupt
> isn’t cleared), leading to severe system-level faults.
>
> When the xHC controller reports HCE in the interrupt handler, the driver
> only logs a warning and assumes xHC activity will stop. The interrupt storm
> does however continue until driver manually disables xHC interrupt and
> stops the controller by calling xhci_halt().
>
> The change is made in xhci_irq() function where STS_HCE status is
> checked, mirroring the existing error handling pattern used for
> STS_FATAL errors.
>
> This only fixes the interrupt storm. Proper HCE recovery requires resetting
> and re-initializing the xHC.
The modified version looks good and accurate to me. Please feel free to merge it.
Thanks
Dayu Jiang
>
> Thanks
> Mathias
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-26 18:17 ` Thinh Nguyen
2026-02-27 7:26 ` Dayu Jiang
@ 2026-02-27 9:43 ` Mathias Nyman
2026-02-27 11:05 ` Michal Pecio
2026-02-28 0:18 ` Thinh Nguyen
1 sibling, 2 replies; 16+ messages in thread
From: Mathias Nyman @ 2026-02-27 9:43 UTC (permalink / raw)
To: Thinh Nguyen, Dayu Jiang
Cc: Greg Kroah-Hartman, Mathias Nyman, Longfang Liu,
linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
yudongbin, guhuinan, chenyu45, mahongwei3, Niklas Neronin
On 2/26/26 20:17, Thinh Nguyen wrote:
> On Thu, Feb 26, 2026, Mathias Nyman wrote:
>> On 2/26/26 11:27, Dayu Jiang wrote:
>>> Hi Greg,
>>>
>>> I have updated the changelog text as requested and resubmitted the patch.
>>> https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
>>> Please kindly review it and let me know if it is acceptable now.
>>
>> I'll send it forward, but changed the commit message.
>> Does this modified version still describe the case accurately:
>>
>> usb: xhci: Prevent interrupt storm on host controller error (HCE)
>>
>> The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
>> Device plug/unplug scenarios on Android devices, which is checked in
>> xhci_irq() function and causes an interrupt storm (since the interrupt
>> isn’t cleared), leading to severe system-level faults.
>>
>> When the xHC controller reports HCE in the interrupt handler, the driver
>> only logs a warning and assumes xHC activity will stop. The interrupt storm
>> does however continue until driver manually disables xHC interrupt and
>> stops the controller by calling xhci_halt().
>>
>> The change is made in xhci_irq() function where STS_HCE status is
>> checked, mirroring the existing error handling pattern used for
>> STS_FATAL errors.
>>
>> This only fixes the interrupt storm. Proper HCE recovery requires resetting
>> and re-initializing the xHC.
>>
>
> The controller is halted if there's an error like HCE. It's odd to try
> to "halt" it again. Not sure how this will impact for other controllers.
This is why I changed the commit message from:
"When the xHCI controller reports HCE in the interrupt handler, the driver
currently only logs a warning and continues execution. However, HCE
indicates a critical hardware failure that requires the controller to be
halted. This ensures the controller is in a consistent state and prevents
further operations on failed hardware."
to:
"When the xHC controller reports HCE in the interrupt handler, the driver
only logs a warning and assumes xHC activity will stop. The interrupt storm
does however continue until driver manually disables xHC interrupt and
stops the controller by calling xhci_halt()."
I can clarify it further by stating that .."assumes xHC activity will stop
as stated in xHCI spec. On some xHC controllers an interrupt storm continues after
HCE error, and only ceases after manually"..
The host is messed up at this point, and we are not recovering it. I don't think
there is any harm in a manual halt at this stage.
> Even if we don't have the full HCE recovery implemented, did we try to
> just do HCRST, which is the first step of the recovery?
Specs state that HCRST might re-trigger the HCE if it's due to a "hard" fault,
and driver needs to take action to prevent a HCE - HCRST recovery loop.
HCRST will clear all registers, so we need to reinitialize everything here,
write back addresses of event rings, command rings, DCBAA, scratchpads
dequeue pointers etc.
I support taking this fix to prevent the interrupt storm, an issue seen in real
life. And then solve proper recovery later.
Niklas is actually working on decoupling memory allocation and xHC register
initialization which will help future HCE recovery work.
Thanks
Mathias
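
The concern about an HCE - HCRST recovery loop can be sketched as a bounded
retry. This is a hypothetical userspace model, not a proposed driver change:
struct reset_hc, its hard_fault field, and the sim_ functions are invented for
illustration, and the retry limit is an arbitrary assumption. It only shows
why a reset must be followed by full reinitialization and why the number of
attempts needs a cap.

```c
#include <stdbool.h>

/* Hypothetical controller model; a hard fault brings HCE back after reset. */
struct reset_hc {
	bool hce;            /* HCE currently reported */
	bool hard_fault;     /* fault persists across resets */
	bool initialized;    /* rings, DCBAA, scratchpads programmed */
	unsigned int resets; /* number of resets issued so far */
};

/* A reset clears register state, so prior initialization is lost. */
void sim_hcrst(struct reset_hc *hc)
{
	hc->resets++;
	hc->initialized = false;
	hc->hce = hc->hard_fault; /* a hard fault re-triggers HCE */
}

/* Stand-in for writing back ring addresses, DCBAA, scratchpads, etc. */
void sim_reinit(struct reset_hc *hc)
{
	hc->initialized = true;
}

/* Reset and reinitialize at most max_attempts times; true on recovery. */
bool sim_recover(struct reset_hc *hc, unsigned int max_attempts)
{
	for (unsigned int i = 0; i < max_attempts; i++) {
		sim_hcrst(hc);
		if (hc->hce)
			continue; /* fault came straight back */
		sim_reinit(hc);
		return true;
	}
	return false; /* give up instead of looping forever */
}
```

With a "soft" fault the model recovers after one reset; with a "hard" fault it
stops after max_attempts resets and leaves the controller uninitialized.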
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-27 9:43 ` Mathias Nyman
@ 2026-02-27 11:05 ` Michal Pecio
2026-02-28 0:06 ` Thinh Nguyen
2026-02-28 0:18 ` Thinh Nguyen
1 sibling, 1 reply; 16+ messages in thread
From: Michal Pecio @ 2026-02-27 11:05 UTC (permalink / raw)
To: Mathias Nyman
Cc: Thinh Nguyen, Dayu Jiang, Greg Kroah-Hartman, Mathias Nyman,
Longfang Liu, linux-usb@vger.kernel.org,
linux-kernel@vger.kernel.org, yudongbin, guhuinan, chenyu45,
mahongwei3, Niklas Neronin
On Fri, 27 Feb 2026 11:43:45 +0200, Mathias Nyman wrote:
> On 2/26/26 20:17, Thinh Nguyen wrote:
> > The controller is halted if there's an error like HCE. It's odd to
> > try to "halt" it again. Not sure how this will impact for other
> > controllers.
> The host is messed up at this point, and we are not recovering it.
> I don't think there is any harm in a manual halt at this stage.
Specifically, calling xhci_halt() clears the USBCMD.Run flag and
all USBCMD interrupt enable flags. Seems relatively harmless. Clearing
USBCMD.Run would be the first step of resetting the HC anyway, so the
HW should expect it to happen after reporting HCE.
In case of HSE the HW should clear the Run bit by itself (4.10.2.6),
but no such requirement seems to exist for HCE (4.24.1).
The call also sets XHCI_STATE_HALTED and CMD_RING_STATE_STOPPED flags,
which helps with recovering stuck URBs. When class drivers time out
and unlink them, the URBs are given back instantly without drama.
I just tested the HSE case where xhci_halt() is already being called
and it worked for me. If I remove xhci_halt() then the driver tries to
issue Stop Endpoint commands, times out and calls hc_died(). Messy.
I suspect that the same happened with HCE before this patch.
Regards,
Michal
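
The effects of xhci_halt() listed above can be summarized in a simplified
userspace model. This is a sketch of the description in this mail, not the
driver's implementation: struct halt_hc and every SIM_/sim_ name are
hypothetical, and only the two USBCMD enable bits named in the xHCI register
layout (INTE, HSEE) are modelled as "interrupt enable flags".

```c
#include <stdint.h>

/* USBCMD bit positions per the xHCI layout; names are hypothetical. */
#define SIM_CMD_RUN   (1u << 0)  /* Run/Stop */
#define SIM_CMD_INTE  (1u << 2)  /* Interrupter Enable */
#define SIM_CMD_HSEE  (1u << 3)  /* Host System Error Enable */

/* Stand-in for the driver's XHCI_STATE_HALTED flag. */
#define SIM_STATE_HALTED (1u << 0)

/* Stand-in for the driver's command ring state. */
enum sim_ring_state { SIM_RING_RUNNING, SIM_RING_STOPPED };

struct halt_hc {
	uint32_t usbcmd;                    /* modelled USBCMD register */
	uint32_t xhc_state;                 /* driver-side state flags */
	enum sim_ring_state cmd_ring_state; /* driver-side ring state */
};

/* Model of the halt: clear Run and interrupt enables, update driver state. */
void sim_halt(struct halt_hc *hc)
{
	hc->usbcmd &= ~(SIM_CMD_RUN | SIM_CMD_INTE | SIM_CMD_HSEE);
	hc->xhc_state |= SIM_STATE_HALTED;
	hc->cmd_ring_state = SIM_RING_STOPPED;
}

/* Nonzero if an unlinked URB could be given back without queuing commands. */
int sim_unlink_gives_back_now(const struct halt_hc *hc)
{
	return (hc->xhc_state & SIM_STATE_HALTED) != 0;
}
```

Before sim_halt() the unlink check returns 0, which corresponds to the path
that issues Stop Endpoint commands and times out; afterwards it returns 1,
matching the "given back instantly" behaviour described above.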
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-27 11:05 ` Michal Pecio
@ 2026-02-28 0:06 ` Thinh Nguyen
0 siblings, 0 replies; 16+ messages in thread
From: Thinh Nguyen @ 2026-02-28 0:06 UTC (permalink / raw)
To: Michal Pecio
Cc: Mathias Nyman, Thinh Nguyen, Dayu Jiang, Greg Kroah-Hartman,
Mathias Nyman, Longfang Liu, linux-usb@vger.kernel.org,
linux-kernel@vger.kernel.org, yudongbin, guhuinan, chenyu45,
mahongwei3, Niklas Neronin
On Fri, Feb 27, 2026, Michal Pecio wrote:
> On Fri, 27 Feb 2026 11:43:45 +0200, Mathias Nyman wrote:
> > On 2/26/26 20:17, Thinh Nguyen wrote:
> > > The controller is halted if there's an error like HCE. It's odd to
> > > try to "halt" it again. Not sure how this will impact for other
> > > controllers.
> > The host is messed up at this point, and we are not recovering it.
> > I don't think there is any harm in a manual halt at this stage.
>
> Specifically, calling xhci_halt() clears the USBCMD.Run flag and
> all USBCMD interrupt enable flags. Seems relatively harmless. Clearing
> USBCMD.Run would be the first step of resetting the HC anyway, so the
> HW should expect it to happen after reporting HCE.
>
> In case of HSE the HW should clear the Run bit by itself (4.10.2.6),
> but no such requirement seems to exist for HCE (4.24.1).
Check 4.21.2. The controller should clear the Run/Stop bit for both
cases.
>
> The call also sets XHCI_STATE_HALTED and CMD_RING_STATE_STOPPED flags,
Perhaps we update these driver flags, but we should not need to clear
the Run/Stop bit. That's not the first thing we should do.
> which helps with recovering stuck URBs. When class drivers time out
> and unlink them, the URBs are given back instantly without drama.
>
> I just tested the HSE case where xhci_halt() is already being called
> and it worked for me. If I remove xhci_halt() then the driver tries to
> issue Stop Endpoint commands, times out and calls hc_died(). Messy.
> I suspect that the same happened with HCE before this patch.
>
That's just the xhci driver needing to update the software states to
properly handle the teardown knowing that the xHC is halted.
BR,
Thinh
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-27 9:43 ` Mathias Nyman
2026-02-27 11:05 ` Michal Pecio
@ 2026-02-28 0:18 ` Thinh Nguyen
1 sibling, 0 replies; 16+ messages in thread
From: Thinh Nguyen @ 2026-02-28 0:18 UTC (permalink / raw)
To: Mathias Nyman
Cc: Thinh Nguyen, Dayu Jiang, Greg Kroah-Hartman, Mathias Nyman,
Longfang Liu, linux-usb@vger.kernel.org,
linux-kernel@vger.kernel.org, yudongbin, guhuinan, chenyu45,
mahongwei3, Niklas Neronin
On Fri, Feb 27, 2026, Mathias Nyman wrote:
> On 2/26/26 20:17, Thinh Nguyen wrote:
> > On Thu, Feb 26, 2026, Mathias Nyman wrote:
> > > On 2/26/26 11:27, Dayu Jiang wrote:
> > > > Hi Greg,
> > > >
> > > > I have updated the changelog text as requested and resubmitted the patch.
> > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/__;!!A4F2R9G_pg!ZSJNDKyOinm26qngopLW-axiQtwDAMely4bDqtqYDGv1ErWCtS6kZ6ZamdiKoZKuCyCk0IxMQK5g625GEIxYWFzKpAEiCUq7$
> > > > Please kindly review it and let me know if it is acceptable now.
> > >
> > > I'll send it forward, but changed the commit message.
> > > Does this modified version still describe the case accurately:
> > >
> > > usb: xhci: Prevent interrupt storm on host controller error (HCE)
> > >
> > > The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> > > Device plug/unplug scenarios on Android devices, which is checked in
> > > xhci_irq() function and causes an interrupt storm (since the interrupt
> > > isn’t cleared), leading to severe system-level faults.
> > >
> > > When the xHC controller reports HCE in the interrupt handler, the driver
> > > only logs a warning and assumes xHC activity will stop. The interrupt storm
> > > does however continue until driver manually disables xHC interrupt and
> > > stops the controller by calling xhci_halt().
> > >
> > > The change is made in xhci_irq() function where STS_HCE status is
> > > checked, mirroring the existing error handling pattern used for
> > > STS_FATAL errors.
> > >
> > > This only fixes the interrupt storm. Proper HCE recovery requires resetting
> > > and re-initializing the xHC.
> > >
> >
> > The controller is halted if there's an error like HCE. It's odd to try
> > to "halt" it again. Not sure how this will impact for other controllers.
>
> This is why I changed the commit message from:
>
> "When the xHCI controller reports HCE in the interrupt handler, the driver
> currently only logs a warning and continues execution. However, HCE
> indicates a critical hardware failure that requires the controller to be
> halted. This ensures the controller is in a consistent state and prevents
> further operations on failed hardware."
>
> to:
>
> "When the xHC controller reports HCE in the interrupt handler, the driver
> only logs a warning and assumes xHC activity will stop. The interrupt storm
> does however continue until driver manually disables xHC interrupt and
> stops the controller by calling xhci_halt()."
>
> I can clarify it further by stating that .."assumes xHC activity will stop
> as stated in xHCI spec. On some xHC controllers an interrupt storm continues after
> HCE error, and only ceases after manually"..
>
> The host is messed up at this point, and we are not recovering it. I don't think
> there is any harm in a manual halt at this stage.
We should update the xhci driver states when there's HCE and the
controller is halted but we don't need to manually clear the Run/Stop
bit again.
>
> > Even if we don't have the full HCE recovery implemented, did we try to
> > just do HCRST, which is the first step of the recovery?
>
> Specs state that HCRST might re-trigger the HCE if it's due to a "hard" fault,
That's only after we re-initialize the controller as noted in the spec,
not immediately after HCRST.
> and driver needs to take action to prevent a HCE - HCRST recovery loop.
>
> HCRST will clear all registers, so we need to reinitialize everything here,
> write back addresses of event rings, command rings, DCBAA, scratchpads
> dequeue pointers etc.
>
> I support taking this fix to prevent the interrupt storm, an issue seen in real
> life. And then solve proper recovery later.
That's fair to me.
>
> Niklas is actually working on decoupling memory allocation and xHC register
> initialization which will help future HCE recovery work.
>
That's great! I'm looking forward to that.
Thanks,
Thinh
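
[Illustration] The interrupt storm discussed in this message can be
sketched as a toy model. It rests on an assumption stated in the quoted
commit message: the HCE branch returns without clearing the interrupt,
so the line stays asserted until the interrupt enable is cleared. All
names here (struct mock_hc, mock_irq_asserted(), mock_irq_handler(),
mock_count_irqs(), MOCK_*) are hypothetical and exist only for this
example; real hardware and the real handler are more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock bit values for illustration only. */
#define MOCK_STS_HCE  (1u << 12)
#define MOCK_CMD_INTE (1u << 2)

struct mock_hc {
	uint32_t usbsts;   /* stand-in for USBSTS */
	uint32_t usbcmd;   /* stand-in for USBCMD */
	bool irq_pending;  /* latched interrupt condition */
};

/* The line is asserted while a condition is pending and interrupts are enabled. */
bool mock_irq_asserted(const struct mock_hc *hc)
{
	return hc->irq_pending && (hc->usbcmd & MOCK_CMD_INTE);
}

/* Toy handler: on HCE it either only "warns" or also masks interrupts. */
void mock_irq_handler(struct mock_hc *hc, bool halt_on_hce)
{
	if (hc->usbsts & MOCK_STS_HCE) {
		if (halt_on_hce)
			hc->usbcmd &= ~MOCK_CMD_INTE;
		return; /* the pending condition is not acknowledged here */
	}
	hc->irq_pending = false; /* normal path acknowledges the interrupt */
}

/* Count handler invocations until the line drops, capped at limit. */
unsigned mock_count_irqs(struct mock_hc *hc, bool halt_on_hce, unsigned limit)
{
	unsigned n = 0;

	while (mock_irq_asserted(hc) && n < limit) {
		mock_irq_handler(hc, halt_on_hce);
		n++;
	}
	return n;
}
```

With halt_on_hce false the loop only ends at the cap, which models the
storm. With halt_on_hce true the handler runs once and the line drops.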
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-27 7:26 ` Dayu Jiang
@ 2026-02-28 0:22 ` Thinh Nguyen
0 siblings, 0 replies; 16+ messages in thread
From: Thinh Nguyen @ 2026-02-28 0:22 UTC (permalink / raw)
To: Dayu Jiang
Cc: Thinh Nguyen, Mathias Nyman, Greg Kroah-Hartman, Mathias Nyman,
Longfang Liu, linux-usb@vger.kernel.org,
linux-kernel@vger.kernel.org, yudongbin, guhuinan, chenyu45,
mahongwei3
On Fri, Feb 27, 2026, Dayu Jiang wrote:
> On Thu, Feb 26, 2026 at 06:17:23PM +0000, Thinh Nguyen wrote:
> > On Thu, Feb 26, 2026, Mathias Nyman wrote:
> > > On 2/26/26 11:27, Dayu Jiang wrote:
> > > > Hi Greg,
> > > >
> > > > I have updated the changelog text as requested and resubmitted the patch.
> > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/__;!!A4F2R9G_pg!ZSJNDKyOinm26qngopLW-axiQtwDAMely4bDqtqYDGv1ErWCtS6kZ6ZamdiKoZKuCyCk0IxMQK5g625GEIxYWFzKpAEiCUq7$
> > > > Please kindly review it and let me know if it is acceptable now.
> > >
> > > I'll send it forward, but changed the commit message.
> > > Does this modified version still describe the case accurately:
> > >
> > > usb: xhci: Prevent interrupt storm on host controller error (HCE)
> > >
> > > The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> > > Device plug/unplug scenarios on Android devices, which is checked in
> > > xhci_irq() function and causes an interrupt storm (since the interrupt
> > > isn’t cleared), leading to severe system-level faults.
> > >
> > > When the xHC controller reports HCE in the interrupt handler, the driver
> > > only logs a warning and assumes xHC activity will stop. The interrupt storm
> > > does however continue until driver manually disables xHC interrupt and
> > > stops the controller by calling xhci_halt().
> > >
> > > The change is made in xhci_irq() function where STS_HCE status is
> > > checked, mirroring the existing error handling pattern used for
> > > STS_FATAL errors.
> > >
> > > This only fixes the interrupt storm. Proper HCE recovery requires resetting
> > > and re-initializing the xHC.
> > >
> >
> > The controller is halted if there's an error like HCE. It's odd to try
> > to "halt" it again. Not sure how this will impact for other controllers.
> > Even if we don't have the full HCE recovery implemented, did we try to
> > just do HCRST, which is the first step of the recovery?
> A full recovery will not be implemented here. Performing only HCRST without
> a proper recovery procedure may introduce unpredictable risks.
What risks?
> In the xHCI driver flow, the standard handling for exceptions is mainly
> done via xhci_died() or xhci_halt() (please refer to the existing handling
> flow for HSE as a reference).
> When an HCE occurs, the controller is already halted, but the interrupts
> have not been cleared. It has been confirmed that calling xhci_halt() at this
> point can properly resolve the interrupt storm issue.
As I noted in Mathias's reply, I'm OK with this change while waiting for
the proper handling of HCE to be implemented.
BR,
Thinh
Thread overview: 16+ messages
2026-01-27 11:04 [PATCH] usb: xhci: add xhci_halt() for HCE Handling jiangdayu
2026-01-27 11:22 ` Greg Kroah-Hartman
2026-01-28 8:48 ` Dayu Jiang
2026-01-28 8:56 ` Greg Kroah-Hartman
2026-02-26 9:27 ` Dayu Jiang
2026-02-26 16:44 ` Mathias Nyman
2026-02-26 18:17 ` Thinh Nguyen
2026-02-27 7:26 ` Dayu Jiang
2026-02-28 0:22 ` Thinh Nguyen
2026-02-27 9:43 ` Mathias Nyman
2026-02-27 11:05 ` Michal Pecio
2026-02-28 0:06 ` Thinh Nguyen
2026-02-28 0:18 ` Thinh Nguyen
2026-02-27 7:33 ` Dayu Jiang
2026-01-27 12:25 ` Mathias Nyman
2026-01-28 8:53 ` Dayu Jiang