* [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: jiangdayu @ 2026-01-27 11:04 UTC
To: Mathias Nyman, Greg Kroah-Hartman
Cc: Longfang Liu, linux-usb, linux-kernel, yudongbin, guhuinan,
chenyu45, mahongwei3, jiangdayu

When the xHCI controller reports a Host Controller Error (HCE) status
in the interrupt handler, the driver currently only logs a warning and
continues execution. However, a Host Controller Error indicates a
critical hardware failure that requires the controller to be halted.

Add an xhci_halt(xhci) call after the HCE warning to halt the
controller when this error condition is detected. This ensures the
controller is in a consistent state and prevents further operations
on failed hardware. Additionally, if unhandled interrupts are still
pending at this point, they may cause an interrupt storm.

The change is made in the xhci_irq() function where the STS_HCE status
is checked, mirroring the existing error handling pattern used for
STS_FATAL errors.

Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>
---
 drivers/usb/host/xhci-ring.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9315ba18310d..1cbefee3c4ca 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)

 	if (status & STS_HCE) {
 		xhci_warn(xhci, "WARNING: Host Controller Error\n");
+		xhci_halt(xhci);
 		goto out;
 	}
--
2.34.1
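
For readers following the patch, the branch order it touches can be
modelled in user space. This is only a minimal sketch, not the driver's
code: the bit positions are the USBSTS bits as I understand them from the
xHCI register layout (an assumption here), and the enum and the
classify_status() helper are hypothetical names invented for illustration.

```c
#include <stdint.h>

/* USBSTS bits; positions are an assumption based on the xHCI register layout. */
#define STS_FATAL (1u << 2)   /* Host System Error (HSE) */
#define STS_EINT  (1u << 3)   /* Event Interrupt pending */
#define STS_HCE   (1u << 12)  /* Host Controller Error */

/* Hypothetical outcome labels, not kernel identifiers. */
enum irq_action {
	ACT_NONE,          /* no event interrupt pending, nothing to do */
	ACT_HALT_ON_HCE,   /* warn, halt, bail out (the branch this patch changes) */
	ACT_HALT_ON_FATAL, /* warn, halt, bail out (the existing HSE pattern) */
	ACT_HANDLE_EVENTS, /* normal event processing */
};

/* Decide what a handler modelled on the patched xhci_irq() would do. */
enum irq_action classify_status(uint32_t status)
{
	if (!(status & STS_EINT))
		return ACT_NONE;
	if (status & STS_HCE)
		return ACT_HALT_ON_HCE;
	if (status & STS_FATAL)
		return ACT_HALT_ON_FATAL;
	return ACT_HANDLE_EVENTS;
}
```

With the patch applied, both error outcomes end in a halt. The separate
labels reflect that each branch keeps its own warning message, so the
cause stays visible in the log.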

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Greg Kroah-Hartman @ 2026-01-27 11:22 UTC
To: jiangdayu
Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
    guhuinan, chenyu45, mahongwei3

On Tue, Jan 27, 2026 at 07:04:22PM +0800, jiangdayu wrote:
> [...]
> Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
> Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>

We need a full name, not an email alias, sorry.

And this isn't really "fixing" that commit, there's nothing wrong with
it as-is.  This is adding new functionality to the code.

> [...]
>  	if (status & STS_HCE) {
>  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> +		xhci_halt(xhci);

What is going to start things back up again?  And as you are calling
this function, why is the warning message needed anymore?  The
tracepoint information will give you that message now, right?

And is this just papering over a hardware bug?  Should this really be
happening for any normal system?

thanks,

greg k-h

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Dayu Jiang @ 2026-01-28 8:48 UTC
To: Greg Kroah-Hartman
Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
    guhuinan, chenyu45, mahongwei3

On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Jan 27, 2026 at 07:04:22PM +0800, jiangdayu wrote:
> > [...]
> > Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
> > Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>
>
> We need a full name, not an email alias, sorry.

Sorry for the confusion. I will use my full legal name (instead of the
email alias) in the Signed-off-by line of the revised patch.

> And this isn't really "fixing" that commit, there's nothing wrong with
> it as-is.  This is adding new functionality to the code.

I initially used the Fixes tag because the original commit only logged
a warning for HCE with no further action. This incomplete handling risks
interrupt storms on the SoC (since the interrupt isn’t cleared). That’s
a robustness gap I wanted to fix with this patch.

> > [...]
> >  	if (status & STS_HCE) {
> >  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > +		xhci_halt(xhci);
>
> What is going to start things back up again?  And as you are calling
> this function, why is the warning message needed anymore?  The
> tracepoint information will give you that message now, right?

When HCE is triggered, it indicates a critical hardware failure.
Aligning with the handling of HSE (STS_FATAL) by adding
xhci_halt() here is more reasonable: without xhci_halt(), the
USB controller may fall into an unpredictable and unstable state,
which could exacerbate system issues.

Retaining the warning message is necessary because it is directly
visible in dmesg, whereas tracepoint information requires explicitly
enabling xHCI tracepoints. Additionally, if xhci_halt() is called in
xhci_irq() without the warning log, it would be impossible to
distinguish whether the halt was triggered by HCE or HSE.

> And is this just papering over a hardware bug?  Should this really be
> happening for any normal system?

Yes, this issue has been reproducible on real-world hardware: HCE is
triggered in UAS Storage Device plug/unplug scenarios on Android
devices, which enters this error branch and causes an interrupt storm,
leading to severe system-level faults.

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Greg Kroah-Hartman @ 2026-01-28 8:56 UTC
To: Dayu Jiang
Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
    guhuinan, chenyu45, mahongwei3

On Wed, Jan 28, 2026 at 04:48:49PM +0800, Dayu Jiang wrote:
> On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> > >  	if (status & STS_HCE) {
> > >  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > > +		xhci_halt(xhci);
> >
> > What is going to start things back up again?  And as you are calling
> > this function, why is the warning message needed anymore?  The
> > tracepoint information will give you that message now, right?
> When HCE is triggered, it indicates a critical hardware failure.
> Aligning with the handling of HSE (STS_FATAL) by adding
> xhci_halt() here is more reasonable: without xhci_halt(), the
> USB controller may fall into an unpredictable and unstable state,
> which could exacerbate system issues.
>
> Retaining the warning message is necessary because it is directly
> visible in dmesg, whereas tracepoint information requires explicitly
> enabling xHCI tracepoints. Additionally, if xhci_halt() is called in
> xhci_irq() without the warning log, it would be impossible to
> distinguish whether the halt was triggered by HCE or HSE.
> >
> > And is this just papering over a hardware bug?  Should this really be
> > happening for any normal system?
> Yes, this issue has been reproducible on real-world hardware: HCE is
> triggered in UAS Storage Device plug/unplug scenarios on Android
> devices, which enters this error branch and causes an interrupt storm,
> leading to severe system-level faults.

Great, please provide this information in the changelog text when you
resubmit this, thanks!

greg k-h

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Dayu Jiang @ 2026-02-26 9:27 UTC
To: Greg Kroah-Hartman
Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
    guhuinan, chenyu45, mahongwei3, Dayu Jiang

Hi Greg,

I have updated the changelog text as requested and resubmitted the patch.
https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
Please kindly review it and let me know if it is acceptable now.

Thanks,
Dayu Jiang

On Wed, Jan 28, 2026 at 09:56:18AM +0100, Greg Kroah-Hartman wrote:
> [...]
> Great, please provide this information in the changelog text when you
> resubmit this, thanks!
>
> greg k-h

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Mathias Nyman @ 2026-02-26 16:44 UTC
To: Dayu Jiang, Greg Kroah-Hartman
Cc: Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin,
    guhuinan, chenyu45, mahongwei3

On 2/26/26 11:27, Dayu Jiang wrote:
> Hi Greg,
>
> I have updated the changelog text as requested and resubmitted the patch.
> https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> Please kindly review it and let me know if it is acceptable now.

I'll send it forward, but changed the commit message.
Does this modified version still describe the case accurately:

usb: xhci: Prevent interrupt storm on host controller error (HCE)

The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
Device plug/unplug scenarios on Android devices, which is checked in
xhci_irq() function and causes an interrupt storm (since the interrupt
isn’t cleared), leading to severe system-level faults.

When the xHC controller reports HCE in the interrupt handler, the driver
only logs a warning and assumes xHC activity will stop. The interrupt storm
does however continue until driver manually disables xHC interrupt and
stops the controller by calling xhci_halt().

The change is made in xhci_irq() function where STS_HCE status is
checked, mirroring the existing error handling pattern used for
STS_FATAL errors.

This only fixes the interrupt storm. Proper HCE recovery requires resetting
and re-initializing the xHC.

Thanks
Mathias

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Thinh Nguyen @ 2026-02-26 18:17 UTC
To: Mathias Nyman, Dayu Jiang
Cc: Greg Kroah-Hartman, Mathias Nyman, Longfang Liu,
    linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, yudongbin,
    guhuinan, chenyu45, mahongwei3

On Thu, Feb 26, 2026, Mathias Nyman wrote:
> [...]
> When the xHC controller reports HCE in the interrupt handler, the driver
> only logs a warning and assumes xHC activity will stop. The interrupt storm
> does however continue until driver manually disables xHC interrupt and
> stops the controller by calling xhci_halt().
>
> The change is made in xhci_irq() function where STS_HCE status is
> checked, mirroring the existing error handling pattern used for
> STS_FATAL errors.
>
> This only fixes the interrupt storm. Proper HCE recovery requires resetting
> and re-initializing the xHC.
>

The controller is halted if there's an error like HCE. It's odd to try
to "halt" it again. Not sure how this will impact for other controllers.

Even if we don't have the full HCE recovery implemented, did we try to
just do HCRST, which is the first step of the recovery?

BR,
Thinh

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Dayu Jiang @ 2026-02-27 7:26 UTC
To: Thinh Nguyen
Cc: Mathias Nyman, Greg Kroah-Hartman, Mathias Nyman, Longfang Liu,
    linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, yudongbin,
    guhuinan, chenyu45, mahongwei3, Dayu Jiang

On Thu, Feb 26, 2026 at 06:17:23PM +0000, Thinh Nguyen wrote:
> On Thu, Feb 26, 2026, Mathias Nyman wrote:
> > [...]
> > This only fixes the interrupt storm. Proper HCE recovery requires resetting
> > and re-initializing the xHC.
> >
>
> The controller is halted if there's an error like HCE. It's odd to try
> to "halt" it again. Not sure how this will impact for other controllers.
> Even if we don't have the full HCE recovery implemented, did we try to
> just do HCRST, which is the first step of the recovery?

A full recovery will not be implemented here. Performing only HCRST without
a proper recovery procedure may introduce unpredictable risks.

In the xHCI driver flow, the standard handling for exceptions is mainly
done via xhci_died() or xhci_halt() (please refer to the existing handling
flow for HSE as a reference).

When an HCE occurs, the controller is already halted, but the interrupts
have not been cleared. It has been confirmed that calling xhci_halt() at this
point can properly resolve the interrupt storm issue.

> BR,
> Thinh

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Thinh Nguyen @ 2026-02-28 0:22 UTC
To: Dayu Jiang
Cc: Thinh Nguyen, Mathias Nyman, Greg Kroah-Hartman, Mathias Nyman,
    Longfang Liu, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
    yudongbin, guhuinan, chenyu45, mahongwei3

On Fri, Feb 27, 2026, Dayu Jiang wrote:
> On Thu, Feb 26, 2026 at 06:17:23PM +0000, Thinh Nguyen wrote:
> > [...]
> > The controller is halted if there's an error like HCE. It's odd to try
> > to "halt" it again. Not sure how this will impact for other controllers.
> > Even if we don't have the full HCE recovery implemented, did we try to
> > just do HCRST, which is the first step of the recovery?
> A full recovery will not be implemented here. Performing only HCRST without
> a proper recovery procedure may introduce unpredictable risks.

What risks?

> In the xHCI driver flow, the standard handling for exceptions is mainly
> done via xhci_died() or xhci_halt() (please refer to the existing handling
> flow for HSE as a reference).
> When an HCE occurs, the controller is already halted, but the interrupts
> have not been cleared. It has been confirmed that calling xhci_halt() at this
> point can properly resolve the interrupt storm issue.

As I noted in Mathias's reply, I'm OK with this change while waiting for
the proper handling of HCE to be implemented.

BR,
Thinh

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Mathias Nyman @ 2026-02-27 9:43 UTC
To: Thinh Nguyen, Dayu Jiang
Cc: Greg Kroah-Hartman, Mathias Nyman, Longfang Liu,
    linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, yudongbin,
    guhuinan, chenyu45, mahongwei3, Niklas Neronin

On 2/26/26 20:17, Thinh Nguyen wrote:
> On Thu, Feb 26, 2026, Mathias Nyman wrote:
>> [...]
>> This only fixes the interrupt storm. Proper HCE recovery requires resetting
>> and re-initializing the xHC.
>>
>
> The controller is halted if there's an error like HCE. It's odd to try
> to "halt" it again. Not sure how this will impact for other controllers.

This is why I changed the commit message from:

"When the xHCI controller reports HCE in the interrupt handler, the driver
currently only logs a warning and continues execution. However, HCE
indicates a critical hardware failure that requires the controller to be
halted. This ensures the controller is in a consistent state and prevents
further operations on failed hardware."

to:

"When the xHC controller reports HCE in the interrupt handler, the driver
only logs a warning and assumes xHC activity will stop. The interrupt storm
does however continue until driver manually disables xHC interrupt and
stops the controller by calling xhci_halt()."

I can clarify it further by stating that .."assumes xHC activity will stop
as stated in xHCI spec. On some xHC controllers an interrupt storm continues after
HCE error, and only ceases after manually"..

The host is messed up at this point, and we are not recovering it. I don't think
there is any harm in a manual halt at this stage.

> Even if we don't have the full HCE recovery implemented, did we try to
> just do HCRST, which is the first step of the recovery?

Specs state that HCRST might re-trigger the HCE if it's due to a "hard" fault,
and driver needs to take action to prevent a HCE - HCRST recovery loop.

HCRST will clear all registers, so we need to reinitialize everything here,
write back addresses of event rings, command rings, DCBAA, scratchpads
dequeue pointers etc.

I support taking this fix to prevent the interrupt storm, an issue seen in real
life. And then solve proper recovery later.

Niklas is actually working on decoupling memory allocation and xHC register
initialization which will help future HCE recovery work.

Thanks
Mathias

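
The point above about avoiding an HCE - HCRST recovery loop can be
illustrated with a bounded retry counter. This is only a sketch of one
possible guard, not something proposed in the thread: the limit of 3, the
struct, and both helper names are hypothetical, and a real recovery path
would also have to re-initialize the rings, DCBAA and scratchpads as
described above.

```c
#include <stdbool.h>

#define MAX_HCE_RESETS 3u   /* arbitrary illustrative limit (assumption) */

/* Hypothetical bookkeeping; no such structure exists in the driver. */
struct hce_guard {
	unsigned int resets;   /* reset attempts since the last healthy period */
};

/* Returns true if another reset attempt is allowed, false to give up. */
bool hce_guard_allow_reset(struct hce_guard *g)
{
	if (g->resets >= MAX_HCE_RESETS)
		return false;
	g->resets++;
	return true;
}

/* Call once the controller has run without a new HCE for long enough. */
void hce_guard_clear(struct hce_guard *g)
{
	g->resets = 0;
}
```

Once hce_guard_allow_reset() returns false, a driver using this scheme
would stop resetting and treat the controller as dead, which is how a
"hard" fault avoids cycling forever.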
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Michal Pecio @ 2026-02-27 11:05 UTC
To: Mathias Nyman
Cc: Thinh Nguyen, Dayu Jiang, Greg Kroah-Hartman, Mathias Nyman,
    Longfang Liu, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
    yudongbin, guhuinan, chenyu45, mahongwei3, Niklas Neronin

On Fri, 27 Feb 2026 11:43:45 +0200, Mathias Nyman wrote:
> On 2/26/26 20:17, Thinh Nguyen wrote:
> > The controller is halted if there's an error like HCE. It's odd to
> > try to "halt" it again. Not sure how this will impact for other
> > controllers.
> The host is messed up at this point, and we are not recovering it.
> I don't think there is any harm in a manual halt at this stage.

Specifically, calling xhci_halt() clears the USBCMD.Run flag and
all USBCMD interrupt enable flags. Seems relatively harmless. Clearing
USBCMD.Run would be the first step of resetting the HC anyway, so the
HW should expect it to happen after reporting HCE.

In case of HSE the HW should clear the Run bit by itself (4.10.2.6),
but no such requirement seems to exist for HCE (4.24.1).

The call also sets XHCI_STATE_HALTED and CMD_RING_STATE_STOPPED flags,
which helps with recovering stuck URBs. When class drivers time out
and unlink them, the URBs are given back instantly without drama.

I just tested the HSE case where xhci_halt() is already being called
and it worked for me. If I remove xhci_halt() then the driver tries to
issue Stop Endpoint commands, times out and calls hc_died(). Messy.
I suspect that the same happened with HCE before this patch.

Regards,
Michal

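
The register and flag effects described in the message above can be
modelled on a fake controller. A minimal user-space sketch, assuming the
USBCMD bit positions below (Run/Stop, INTE, HSEE, EWE as I understand the
xHCI layout); struct fake_xhc and both helpers are hypothetical names, and
the two booleans merely stand in for XHCI_STATE_HALTED and
CMD_RING_STATE_STOPPED.

```c
#include <stdbool.h>
#include <stdint.h>

/* USBCMD bits; positions are an assumption based on the xHCI register layout. */
#define CMD_RUN     (1u << 0)    /* Run/Stop */
#define CMD_EIE     (1u << 2)    /* interrupter enable */
#define CMD_HSEIE   (1u << 3)    /* host system error interrupt enable */
#define CMD_EWE     (1u << 10)   /* enable wrap event */
#define IRQ_ENABLES (CMD_EIE | CMD_HSEIE | CMD_EWE)

/* Hypothetical stand-in for the controller plus driver state. */
struct fake_xhc {
	uint32_t usbcmd;
	bool sw_halted;          /* models XHCI_STATE_HALTED */
	bool cmd_ring_stopped;   /* models CMD_RING_STATE_STOPPED */
};

/* Model of the halt: drop Run and all interrupt enables, update SW state. */
void fake_halt(struct fake_xhc *hc)
{
	hc->usbcmd &= ~(CMD_RUN | IRQ_ENABLES);
	hc->sw_halted = true;
	hc->cmd_ring_stopped = true;
}

/* In this model the controller can only interrupt while an enable is set. */
bool fake_can_interrupt(const struct fake_xhc *hc)
{
	return (hc->usbcmd & IRQ_ENABLES) != 0;
}
```

In this model the storm ends because no interrupt enable bit survives the
halt, while the two software flags are what would let pending URBs be
given back without issuing further commands.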
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
From: Thinh Nguyen @ 2026-02-28 0:06 UTC
To: Michal Pecio
Cc: Mathias Nyman, Thinh Nguyen, Dayu Jiang, Greg Kroah-Hartman,
    Mathias Nyman, Longfang Liu, linux-usb@vger.kernel.org,
    linux-kernel@vger.kernel.org, yudongbin, guhuinan, chenyu45,
    mahongwei3, Niklas Neronin

On Fri, Feb 27, 2026, Michal Pecio wrote:
> On Fri, 27 Feb 2026 11:43:45 +0200, Mathias Nyman wrote:
> > On 2/26/26 20:17, Thinh Nguyen wrote:
> > > The controller is halted if there's an error like HCE. It's odd to
> > > try to "halt" it again. Not sure how this will impact for other
> > > controllers.
> > The host is messed up at this point, and we are not recovering it.
> > I don't think there is any harm in a manual halt at this stage.
>
> Specifically, calling xhci_halt() clears the USBCMD.Run flag and
> all USBCMD interrupt enable flags. Seems relatively harmless. Clearing
> USBCMD.Run would be the first step of resetting the HC anyway, so the
> HW should expect it to happen after reporting HCE.
>
> In case of HSE the HW should clear the Run bit by itself (4.10.2.6),
> but no such requirement seems to exist for HCE (4.24.1).

Check 4.21.2. The controller should clear the Run/Stop bit for both
cases.

>
> The call also sets XHCI_STATE_HALTED and CMD_RING_STATE_STOPPED flags,

Perhaps we update these driver flags, but we should not need to clear
the Run/Stop bit. That's not the first thing we should do.

> which helps with recovering stuck URBs. When class drivers time out
> and unlink them, the URBs are given back instantly without drama.
>
> I just tested the HSE case where xhci_halt() is already being called
> and it worked for me. If I remove xhci_halt() then the driver tries to
> issue Stop Endpoint commands, times out and calls hc_died(). Messy.
> I suspect that the same happened with HCE before this patch.
>

That's just the xhci driver needing to update the software states to
properly handle the teardown knowing that the xHC is halted.

BR,
Thinh

* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-27 9:43 ` Mathias Nyman
2026-02-27 11:05 ` Michal Pecio
@ 2026-02-28 0:18 ` Thinh Nguyen
1 sibling, 0 replies; 16+ messages in thread
From: Thinh Nguyen @ 2026-02-28 0:18 UTC (permalink / raw)
To: Mathias Nyman
Cc: Thinh Nguyen, Dayu Jiang, Greg Kroah-Hartman, Mathias Nyman, Longfang Liu, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, yudongbin, guhuinan, chenyu45, mahongwei3, Niklas Neronin

On Fri, Feb 27, 2026, Mathias Nyman wrote:
> On 2/26/26 20:17, Thinh Nguyen wrote:
> > On Thu, Feb 26, 2026, Mathias Nyman wrote:
> > > On 2/26/26 11:27, Dayu Jiang wrote:
> > > > Hi Greg,
> > > >
> > > > I have updated the changelog text as requested and resubmitted the patch.
> > > > https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> > > > Please kindly review it and let me know if it is acceptable now.
> > >
> > > I'll send it forward, but changed the commit message.
> > > Does this modified version still describe the case accurately:
> > >
> > > usb: xhci: Prevent interrupt storm on host controller error (HCE)
> > >
> > > The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> > > Device plug/unplug scenarios on Android devices, which is checked in
> > > xhci_irq() function and causes an interrupt storm (since the interrupt
> > > isn’t cleared), leading to severe system-level faults.
> > >
> > > When the xHC controller reports HCE in the interrupt handler, the driver
> > > only logs a warning and assumes xHC activity will stop. The interrupt storm
> > > does however continue until driver manually disables xHC interrupt and
> > > stops the controller by calling xhci_halt().
> > >
> > > The change is made in xhci_irq() function where STS_HCE status is
> > > checked, mirroring the existing error handling pattern used for
> > > STS_FATAL errors.
> > >
> > > This only fixes the interrupt storm. Proper HCE recovery requires resetting
> > > and re-initializing the xHC.
> > >
> >
> > The controller is halted if there's an error like HCE. It's odd to try
> > to "halt" it again. Not sure how this will impact for other controllers.
>
> This is why I changed the commit message from:
>
> "When the xHCI controller reports HCE in the interrupt handler, the driver
> currently only logs a warning and continues execution. However, HCE
> indicates a critical hardware failure that requires the controller to be
> halted. This ensures the controller is in a consistent state and prevents
> further operations on failed hardware."
>
> to:
>
> "When the xHC controller reports HCE in the interrupt handler, the driver
> only logs a warning and assumes xHC activity will stop. The interrupt storm
> does however continue until driver manually disables xHC interrupt and
> stops the controller by calling xhci_halt()."
>
> I can clarify it further by stating that .."assumes xHC activity will stop
> as stated in xHCI spec. On some xHC controllers an interrupt storm continues after
> HCE error, and only ceases after manually"..
>
> The host is messed up at this point, and we are not recovering it. I don't think
> there is any harm in a manual halt at this stage.

We should update the xhci driver states when there's HCE and the
controller is halted but we don't need to manually clear the Run/Stop
bit again.

>
> > Even if we don't have the full HCE recovery implemented, did we try to
> > just do HCRST, which is the first step of the recovery?
>
> Specs state that HCRST might re-trigger the HCE if it's due to a "hard" fault,

That's only after we re-initialize the controller as noted in the spec,
not immediately after HCRST.

> and driver needs to take action to prevent a HCE - HCRST recovery loop.
>
> HCRST will clear all registers, so we need to reinitialize everything here,
> write back addresses of event rings, command rings, DCBAA, scratchpads
> dequeue pointers etc.
>
> I support taking this fix to prevent the interrupt storm, an issue seen in real
> life. And then solve proper recovery later.

That's fair to me.

>
> Niklas is actually working on decoupling memory allocation and xHC register
> initialization which will help future HCE recovery work.
>

That's great! I'm looking forward to that.

Thanks,
Thinh

^ permalink raw reply [flat|nested] 16+ messages in thread
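The recovery steps discussed above (HCRST clears the registers, software writes the ring and DCBAA addresses back, and something must stop an HCE - HCRST loop) can be sketched in user space. This is only an illustration under stated assumptions, not the driver's code: the thread itself says proper recovery is not implemented yet. The structs, the mock_* helpers, and the MAX_HCE_RESETS cap are all hypothetical; only the register names (DCBAAP, CRCR, ERSTBA, ERDP) and the HCE bit position come from the xHCI register layout.

```c
#include <stdint.h>
#include <string.h>

#define STS_HCE        (1u << 12) /* USBSTS: Host Controller Error */
#define MAX_HCE_RESETS 3          /* hypothetical cap to break an HCE - HCRST loop */

/* Hypothetical stand-in for the registers that HCRST wipes. */
struct mock_regs {
    uint32_t usbsts;
    uint64_t dcbaap;  /* Device Context Base Address Array pointer */
    uint64_t crcr;    /* Command Ring Control */
    uint64_t erstba;  /* Event Ring Segment Table base address */
    uint64_t erdp;    /* Event Ring dequeue pointer */
};

/* Hypothetical software copy of the addresses the driver allocated. */
struct mock_sw_state {
    uint64_t dcbaa_dma;
    uint64_t cmd_ring_dma;
    uint64_t erst_dma;
    uint64_t event_deq_dma;
    unsigned int hce_resets; /* how many HCE-triggered resets so far */
};

/* Model of HCRST: every register, including the HCE flag, goes back to 0. */
static void mock_hcrst(struct mock_regs *regs)
{
    memset(regs, 0, sizeof(*regs));
}

/* Write the saved addresses back, since the reset cleared them. */
static void mock_reinit(struct mock_regs *regs, const struct mock_sw_state *sw)
{
    regs->dcbaap = sw->dcbaa_dma;
    regs->crcr   = sw->cmd_ring_dma;
    regs->erstba = sw->erst_dma;
    regs->erdp   = sw->event_deq_dma;
}

/* Returns 0 after a reset plus re-init, -1 once the retry cap is reached. */
static int mock_hce_recover(struct mock_regs *regs, struct mock_sw_state *sw)
{
    if (sw->hce_resets >= MAX_HCE_RESETS)
        return -1; /* give up instead of looping on a "hard" fault */
    sw->hce_resets++;
    mock_hcrst(regs);
    mock_reinit(regs, sw);
    return 0;
}
```

A caller would invoke mock_hce_recover() each time HCE is seen and treat -1 as "leave the controller dead". The retry counter is one possible way to bound the loop; the thread does not say which mechanism the driver will eventually use.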
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-02-26 16:44 ` Mathias Nyman
2026-02-26 18:17 ` Thinh Nguyen
@ 2026-02-27 7:33 ` Dayu Jiang
1 sibling, 0 replies; 16+ messages in thread
From: Dayu Jiang @ 2026-02-27 7:33 UTC (permalink / raw)
To: Mathias Nyman
Cc: Greg Kroah-Hartman, Mathias Nyman, Longfang Liu, linux-usb, linux-kernel, yudongbin, guhuinan, chenyu45, mahongwei3, Dayu Jiang

On Thu, Feb 26, 2026 at 06:44:02PM +0200, Mathias Nyman wrote:
> On 2/26/26 11:27, Dayu Jiang wrote:
> > Hi Greg,
> >
> > I have updated the changelog text as requested and resubmitted the patch.
> > https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xiaomi.com/
> > Please kindly review it and let me know if it is acceptable now.
>
> I'll send it forward, but changed the commit message.
> Does this modified version still describe the case accurately:
>
> usb: xhci: Prevent interrupt storm on host controller error (HCE)
>
> The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
> Device plug/unplug scenarios on Android devices, which is checked in
> xhci_irq() function and causes an interrupt storm (since the interrupt
> isn’t cleared), leading to severe system-level faults.
>
> When the xHC controller reports HCE in the interrupt handler, the driver
> only logs a warning and assumes xHC activity will stop. The interrupt storm
> does however continue until driver manually disables xHC interrupt and
> stops the controller by calling xhci_halt().
>
> The change is made in xhci_irq() function where STS_HCE status is
> checked, mirroring the existing error handling pattern used for
> STS_FATAL errors.
>
> This only fixes the interrupt storm. Proper HCE recovery requires resetting
> and re-initializing the xHC.

The modified version looks good and accurate to me.
Please feel free to merge it.

Thanks
Dayu Jiang

>
> Thanks
> Mathias

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-01-27 11:04 [PATCH] usb: xhci: add xhci_halt() for HCE Handling jiangdayu
2026-01-27 11:22 ` Greg Kroah-Hartman
@ 2026-01-27 12:25 ` Mathias Nyman
2026-01-28 8:53 ` Dayu Jiang
1 sibling, 1 reply; 16+ messages in thread
From: Mathias Nyman @ 2026-01-27 12:25 UTC (permalink / raw)
To: jiangdayu, Mathias Nyman, Greg Kroah-Hartman
Cc: Longfang Liu, linux-usb, linux-kernel, yudongbin, guhuinan, chenyu45, mahongwei3

Hi

On 1/27/26 13:04, jiangdayu wrote:
> When the xHCI controller reports a Host Controller Error (HCE) status
> in the interrupt handler, the driver currently only logs a warning and
> continues execution. However, a Host Controller Error indicates a
> critical hardware failure that requires the controller to be halted.
>

The host should cease all activity when it sets the HCE bit.

See xHCI spec 4.24.1 'Internal Errors':
"When the HCE flag is set to ‘1’ the xHC shall cease all activity.
Software response to the assertion of HCE is to reset the
xHC (HCRST = ‘1’) and reinitialize it."

Same is true for "Host system error" HSE (STS_FATAL), not sure
why we halt it manually in that case.

> Add xhci_halt(xhci) call after the HCE warning to properly halt the
> controller when this error condition is detected. This ensures the
> controller is in a consistent state and prevents further operations
> on a failed hardware. Additionally, if there are still unhandled
> interrupts at this point, it may cause interrupt storm.

Is this something that has been seen on real world hardware?
If yes, and halting the host helped, then this fix is ok by me.
At least until a proper host reset solution is implemented.

Thanks
Mathias

^ permalink raw reply [flat|nested] 16+ messages in thread
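The behaviour the patch aims for (on STS_HCE, warn, then mask interrupts and stop the controller, as is already done for STS_FATAL) can be modelled in a few lines of user-space C. This is a rough sketch under stated assumptions, not the kernel's xhci_irq() or xhci_halt(): struct mock_xhc, mock_halt() and mock_irq() are hypothetical names, and the real halt path also waits for the controller to report HCHalted rather than setting the bit itself. The bit positions follow the xHCI USBCMD/USBSTS layout.

```c
#include <stdint.h>
#include <stdio.h>

/* USBCMD bits */
#define CMD_RUN   (1u << 0)   /* Run/Stop */
#define CMD_EIE   (1u << 2)   /* interrupter enable */
/* USBSTS bits */
#define STS_HALT  (1u << 0)   /* HCHalted */
#define STS_FATAL (1u << 2)   /* Host System Error (HSE) */
#define STS_HCE   (1u << 12)  /* Host Controller Error (HCE) */

/* Hypothetical stand-in for the memory-mapped operational registers. */
struct mock_xhc {
    uint32_t usbcmd;
    uint32_t usbsts;
};

/*
 * Simplified model of a manual halt: disable the interrupt, clear
 * Run/Stop, and mark the controller halted. Real hardware sets
 * HCHalted on its own; the mock sets it directly.
 */
static void mock_halt(struct mock_xhc *xhc)
{
    xhc->usbcmd &= ~(CMD_EIE | CMD_RUN);
    xhc->usbsts |= STS_HALT;
}

/* Returns 1 if an error status was seen and the controller was halted. */
static int mock_irq(struct mock_xhc *xhc)
{
    uint32_t status = xhc->usbsts;

    if (status & STS_FATAL) {
        fprintf(stderr, "WARNING: Host System Error\n");
        mock_halt(xhc);
        return 1;
    }
    if (status & STS_HCE) {
        fprintf(stderr, "WARNING: Host Controller Error\n");
        mock_halt(xhc); /* the step the patch adds for HCE */
        return 1;
    }
    return 0; /* no error: normal event handling would follow */
}
```

With the interrupt enable bit cleared, the mock can no longer raise further interrupts, which is the storm-prevention effect reported in the thread. Whether clearing Run/Stop is also needed is exactly the point debated later in the discussion.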
* Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
2026-01-27 12:25 ` Mathias Nyman
@ 2026-01-28 8:53 ` Dayu Jiang
0 siblings, 0 replies; 16+ messages in thread
From: Dayu Jiang @ 2026-01-28 8:53 UTC (permalink / raw)
To: Mathias Nyman
Cc: Greg Kroah-Hartman, Longfang Liu, linux-usb, linux-kernel, yudongbin, guhuinan, chenyu45, mahongwei3, Dayu Jiang

On Tue, Jan 27, 2026 at 02:25:19PM +0200, Mathias Nyman wrote:
> Hi
>
> On 1/27/26 13:04, jiangdayu wrote:
> > When the xHCI controller reports a Host Controller Error (HCE) status
> > in the interrupt handler, the driver currently only logs a warning and
> > continues execution. However, a Host Controller Error indicates a
> > critical hardware failure that requires the controller to be halted.
> >
>
> The host should cease all activity when it sets the HCE bit.
>
> See xHCI spec 4.24.1 'Internal Errors':
> "When the HCE flag is set to ‘1’ the xHC shall cease all activity.
> Software response to the assertion of HCE is to reset the
> xHC (HCRST = ‘1’) and reinitialize it."
>
> Same is true for "Host system error" HSE (STS_FATAL), not sure
> why we halt it manually in that case.
>
> > Add xhci_halt(xhci) call after the HCE warning to properly halt the
> > controller when this error condition is detected. This ensures the
> > controller is in a consistent state and prevents further operations
> > on a failed hardware. Additionally, if there are still unhandled
> > interrupts at this point, it may cause interrupt storm.
>
> Is this something that has been seen on real world hardware?
> If yes, and halting the host helped, then this fix is ok by me.
> At least until a proper host reset solution is implemented.

Yes, the HCE issue (and subsequent interrupt storm) has been consistently
observed on production Android devices during UASP device plug/unplug
operations. Adding xhci_halt() effectively resolves the system-level
interrupt storm issue caused by HCE.

>
> Thanks
> Mathias
>

^ permalink raw reply [flat|nested] 16+ messages in thread