linux-usb.vger.kernel.org archive mirror
* [REGRESSION 6.16] xHCI host not responding to stop endpoint command after suspend and resume
From: Michał Pecio @ 2025-08-18 21:11 UTC
  To: Mathias Nyman, linux-usb; +Cc: Niklas Neronin, regressions, Christian Heusel

Hi,

Two Arch Linux users reported that a suspend cycle makes their xHCI
controllers die since the 6.16 release.

Link:
https://bbs.archlinux.org/viewtopic.php?id=307641

Posted log snippet:
Aug 16 12:12:53 talc kernel: xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command
Aug 16 12:12:53 talc kernel: xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
Aug 16 12:12:53 talc kernel: xhci_hcd 0000:00:14.0: HC died; cleaning up
Aug 16 12:12:53 talc kernel: usb 1-8: PM: dpm_run_callback(): usb_dev_resume returns -22
Aug 16 12:12:53 talc kernel: usb 1-5: PM: dpm_run_callback(): usb_dev_resume returns -5
Aug 16 12:12:53 talc kernel: usb 1-5: PM: failed to resume async: error -5
Aug 16 12:12:53 talc kernel: usb 1-8: PM: failed to resume async: error -22
Aug 16 12:12:53 talc kernel: OOM killer enabled.
Aug 16 12:12:53 talc kernel: Restarting tasks: Starting
Aug 16 12:12:53 talc kernel: mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
Aug 16 12:12:53 talc kernel: usb 1-5: USB disconnect, device number 3
Aug 16 12:12:53 talc kernel: Restarting tasks: Done
Aug 16 12:12:53 talc kernel: random: crng reseeded on system resumption
Aug 16 12:12:53 talc kernel: usb 1-8: USB disconnect, device number 4
Aug 16 12:12:53 talc kernel: PM: suspend exit

I tried suspending my PC packed with several xHCs and none of them died,
so it may be hardware specific. The log shows this Intel PCI ID:

Aug 16 12:07:31 talc kernel: pci 0000:00:14.0: [8086:4ded] type 00 class 0x0c0330 conventional PCI endpoint


A bisect effort is ongoing and c0c9379f235d ("Merge tag 'usb-6.16-rc1'
of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb") is bad,
so that's where the bug likely came from.

Any ideas what it could be? I have none; it looks like it may be another
utterly unobvious issue from a trivial cleanup patch.

Christian (Cc) is the Arch package maintainer helping track this down.

Regards,
Michal


* Re: [REGRESSION 6.16] xHCI host not responding to stop endpoint command after suspend and resume
From: Michał Pecio @ 2025-08-19  6:41 UTC
  To: Mathias Nyman, linux-usb; +Cc: Niklas Neronin, regressions, Christian Heusel

On Mon, 18 Aug 2025 23:11:03 +0200, Michał Pecio wrote:
> A bisect effort is ongoing and c0c9379f235d ("Merge tag 'usb-6.16-rc1'
> of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb") is bad,
> so that's where the bug likely came from.

Looks like the result is in.

e1db856bd28891d70008880d7f1d3b8d1ea948fd is the first bad commit
commit e1db856bd28891d70008880d7f1d3b8d1ea948fd
Author: Niklas Neronin <niklas.neronin@linux.intel.com>
Date:   Thu May 15 16:56:14 2025 +0300

    usb: xhci: remove '0' write to write-1-to-clear register


* Re: [REGRESSION 6.16] xHCI host not responding to stop endpoint command after suspend and resume
From: Mathias Nyman @ 2025-08-19  8:56 UTC
  To: Michał Pecio, linux-usb
  Cc: Niklas Neronin, regressions, Christian Heusel

On 19.8.2025 9.41, Michał Pecio wrote:
> On Mon, 18 Aug 2025 23:11:03 +0200, Michał Pecio wrote:
>> A bisect effort is ongoing and c0c9379f235d ("Merge tag 'usb-6.16-rc1'
>> of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb") is bad,
>> so that's where the bug likely came from.
> 
> Looks like the result is in.
> 
> e1db856bd28891d70008880d7f1d3b8d1ea948fd is the first bad commit
> commit e1db856bd28891d70008880d7f1d3b8d1ea948fd
> Author: Niklas Neronin <niklas.neronin@linux.intel.com>
> Date:   Thu May 15 16:56:14 2025 +0300
> 
>      usb: xhci: remove '0' write to write-1-to-clear register

Thanks for tracking this down, I see the issue now.

We may lose interrupts due to this patch. For example:

Hardware sets IMAN_IP, BIT(0), when it needs attention.
The driver later allows the xHC interrupt by setting IMAN_IE, BIT(1), but
the driver clears IMAN_IP (RW1C) when setting IMAN_IE, so no interrupt is triggered.

Interrupts are only triggered if both IMAN_IE and IMAN_IP are set (and some other
moderation and event handling bits are correct).

We need to make sure we don't accidentally clear a pending interrupt (IMAN_IP)
in both the enable and disable case.

Thanks
Mathias
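
The lost-interrupt sequence described in the message above can be sketched
with a small user-space model. This is only an illustration, not the xhci
driver code: struct fake_iman, fake_read(), fake_write() and the two
enable_*() helpers are hypothetical names, and the register model assumes
nothing beyond what the thread states (IMAN_IP is BIT(0) and
write-1-to-clear, IMAN_IE is BIT(1)).

```c
#include <assert.h>
#include <stdint.h>

#define IMAN_IP (1u << 0) /* interrupt pending, write-1-to-clear */
#define IMAN_IE (1u << 1) /* interrupt enable, plain read/write */

/* Hypothetical stand-in for the memory-mapped IMAN register. */
struct fake_iman {
	uint32_t bits;
};

static uint32_t fake_read(const struct fake_iman *r)
{
	return r->bits;
}

/*
 * Model the register semantics: writing 1 to IP clears it, writing 0
 * leaves it unchanged; IE simply takes the written value.
 */
static void fake_write(struct fake_iman *r, uint32_t val)
{
	uint32_t ip = r->bits & IMAN_IP;

	if (val & IMAN_IP)
		ip = 0;
	r->bits = ip | (val & IMAN_IE);
}

/* Read-modify-write that echoes IP back: a pending interrupt is lost. */
static void enable_echoing_ip(struct fake_iman *r)
{
	fake_write(r, fake_read(r) | IMAN_IE);
}

/* Read-modify-write that writes 0 to IP: the pending state survives. */
static void enable_preserving_ip(struct fake_iman *r)
{
	fake_write(r, (fake_read(r) & ~IMAN_IP) | IMAN_IE);
}

/* Returns 1 if both IE and IP are set, i.e. the interrupt would fire. */
static int irq_asserted(const struct fake_iman *r)
{
	return (r->bits & (IMAN_IE | IMAN_IP)) == (IMAN_IE | IMAN_IP);
}
```

Starting from a register that holds only IMAN_IP, enable_echoing_ip()
leaves IE set and IP cleared, so irq_asserted() is false, while
enable_preserving_ip() leaves both bits set.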




* Re: [REGRESSION 6.16] xHCI host not responding to stop endpoint command after suspend and resume
From: Neronin, Niklas @ 2025-08-19  9:08 UTC
  To: Mathias Nyman, Michał Pecio, linux-usb; +Cc: regressions, Christian Heusel



On 19/08/2025 11.56, Mathias Nyman wrote:
> On 19.8.2025 9.41, Michał Pecio wrote:
>> On Mon, 18 Aug 2025 23:11:03 +0200, Michał Pecio wrote:
>>> A bisect effort is ongoing and c0c9379f235d ("Merge tag 'usb-6.16-rc1'
>>> of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb") is bad,
>>> so that's where the bug likely came from.
>>
>> Looks like the result is in.
>>
>> e1db856bd28891d70008880d7f1d3b8d1ea948fd is the first bad commit
>> commit e1db856bd28891d70008880d7f1d3b8d1ea948fd
>> Author: Niklas Neronin <niklas.neronin@linux.intel.com>
>> Date:   Thu May 15 16:56:14 2025 +0300
>>
>>      usb: xhci: remove '0' write to write-1-to-clear register
> 
> Thanks for tracking this down, I see the issue now
> 
> We may lose interrupts due to this patch, example:
> 
> Hardware sets IMAN_IP BIT(0) when it needs attention
> Driver later allows xHC interrupt by setting IMAN_IE BIT(1), but
> Driver clears IMAN_IP (RW1C) when setting IMAN_IE so no interrupt is triggered.

Apologies for my blunder.

So, there can be an interrupt pending even when the interrupt is not enabled?
But there (ideally) should not be an interrupt pending when disabling the interrupt?

I can submit a fix patch.

Best regards,
Niklas

>  
> interrupts are only triggered if both IMAN_IE and IMAN_IP are set, (and some other
> moderation and event handling bits are correct)
> 
> we need to make sure we don't accidentally clear a pending interrupt (IMAN_IP)
> in both the enable and disable case.
> 
> Thanks
> Mathias
> 
> 



* Re: [REGRESSION 6.16] xHCI host not responding to stop endpoint command after suspend and resume
From: Mathias Nyman @ 2025-08-19 10:42 UTC
  To: Neronin, Niklas, Michał Pecio, linux-usb
  Cc: regressions, Christian Heusel

On 19.8.2025 12.08, Neronin, Niklas wrote:
> 
> 
> On 19/08/2025 11.56, Mathias Nyman wrote:
>> On 19.8.2025 9.41, Michał Pecio wrote:
>>> On Mon, 18 Aug 2025 23:11:03 +0200, Michał Pecio wrote:
>>>> A bisect effort is ongoing and c0c9379f235d ("Merge tag 'usb-6.16-rc1'
>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb") is bad,
>>>> so that's where the bug likely came from.
>>>
>>> Looks like the result is in.
>>>
>>> e1db856bd28891d70008880d7f1d3b8d1ea948fd is the first bad commit
>>> commit e1db856bd28891d70008880d7f1d3b8d1ea948fd
>>> Author: Niklas Neronin <niklas.neronin@linux.intel.com>
>>> Date:   Thu May 15 16:56:14 2025 +0300
>>>
>>>       usb: xhci: remove '0' write to write-1-to-clear register
>>
>> Thanks for tracking this down, I see the issue now
>>
>> We may lose interrupts due to this patch, example:
>>
>> Hardware sets IMAN_IP BIT(0) when it needs attention
>> Driver later allows xHC interrupt by setting IMAN_IE BIT(1), but
>> Driver clears IMAN_IP (RW1C) when setting IMAN_IE so no interrupt is triggered.
> 
> Apologies for my blunder.
> 
> So, there can be an interrupt pending even when the interrupt is not enabled?

So it seems. Interrupt pending (IMAN_IP) is set if:
- event handler busy (EHB) is 0, and
- moderation counter (IMODC) reaches 0, and
- the internal IPE bit is set, meaning:
   the xHC inserted an event into the event ring (ring not empty), and
   "block event interrupt" (BEI) is 0

It does not depend on the interrupt enable (IMAN_IE) bit, which only gates whether
an interrupt is generated for this interrupter.

See xHCI specification section 4.17.5.
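
The conditions listed above can be written down as a pair of predicates.
This is a simplified sketch of the rules as stated in this message, not
code from the driver or the specification: struct interrupter_state and
both function names are hypothetical, and the "other moderation and event
handling bits" mentioned earlier in the thread are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the state that feeds IMAN_IP. */
struct interrupter_state {
	bool ehb;          /* event handler busy */
	uint16_t imodc;    /* interrupt moderation counter */
	bool event_queued; /* xHC placed an event on the event ring */
	bool bei;          /* "block event interrupt" flag of that event */
	bool ie;           /* IMAN_IE, interrupt enable */
};

/* IP is set when EHB is 0, IMODC has reached 0 and the internal IPE is set. */
static bool ip_is_set(const struct interrupter_state *s)
{
	bool ipe = s->event_queued && !s->bei;

	return !s->ehb && s->imodc == 0 && ipe;
}

/* IE does not influence IP; it only gates the interrupt itself. */
static bool irq_is_raised(const struct interrupter_state *s)
{
	return ip_is_set(s) && s->ie;
}
```

With IE cleared and an unblocked event queued, ip_is_set() is true while
irq_is_raised() is false, which is the "pending but not enabled" state
asked about above.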

> But there (ideally) should not be an interrupt pending when disabling the interrupt?

Yes, so it would be good to still print the debug message if IP is set.
But we should not clear the IP bit here; it will trigger an interrupt and get handled once
IE is enabled again.
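
A minimal sketch of the disable path suggested here, under the same
assumptions the thread gives for the register (IMAN_IP is BIT(0) and
write-1-to-clear, IMAN_IE is BIT(1)). The register model and every function
name are hypothetical and stand in for real MMIO accessors; the printed
message is illustrative, not the driver's actual debug text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define IMAN_IP (1u << 0) /* interrupt pending, write-1-to-clear */
#define IMAN_IE (1u << 1) /* interrupt enable, plain read/write */

/* Hypothetical stand-in for the memory-mapped IMAN register. */
struct fake_iman {
	uint32_t bits;
};

/* Writing 1 to IP clears it, writing 0 keeps it; IE takes the written value. */
static void fake_write(struct fake_iman *r, uint32_t val)
{
	uint32_t ip = r->bits & IMAN_IP;

	if (val & IMAN_IP)
		ip = 0;
	r->bits = ip | (val & IMAN_IE);
}

/*
 * Disable the interrupter without touching IP. Reports a still-pending
 * interrupt but leaves it in place; returns true if IP was set.
 */
static bool disable_keep_pending(struct fake_iman *r)
{
	uint32_t val = r->bits;
	bool pending = (val & IMAN_IP) != 0;

	if (pending)
		fprintf(stderr, "interrupter disabled with IP still set\n");
	fake_write(r, val & ~(IMAN_IE | IMAN_IP));
	return pending;
}

/* Enable the interrupter, again writing 0 to IP so it is preserved. */
static void enable_keep_pending(struct fake_iman *r)
{
	fake_write(r, (r->bits & ~IMAN_IP) | IMAN_IE);
}
```

After disable_keep_pending() on a register holding both bits, only IMAN_IP
remains; a later enable_keep_pending() restores IE with IP still set, so
the deferred event is not lost.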

> 
> I can submit a fix patch.

Sounds good.
Let's get this fixed as soon as possible.

Thanks
Mathias

