public inbox for linux-kernel@vger.kernel.org
From: Michal Pecio <michal.pecio@gmail.com>
To: Desnes Nunes <desnesn@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
	gregkh@linuxfoundation.org, mathias.nyman@intel.com,
	stable@vger.kernel.org
Subject: Re: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout
Date: Mon, 4 May 2026 09:31:18 +0200
Message-ID: <20260504093118.615ff480.michal.pecio@gmail.com>
In-Reply-To: <20260503213111.117db3a1.michal.pecio@gmail.com>

On Sun, 3 May 2026 21:31:11 +0200, Michal Pecio wrote:
> My first wild guess would be that HSE is caused by resetting IOMMU
> while the xHC is unaware of kexec and continuing to DMA old buffers.
> Attached patch checks for this and also tries to explicitly clear
> HSE, although resetting ought to clear it too. But HW has bugs...

Never mind, here's the smoking gun:

[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHCI Host Controller
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: new USB bus registered, assigned bus number 3
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: // Halt the HC
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Resetting HCD
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: // Reset the HC
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Wait for controller to be ready for doorbell rings
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Reset complete
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Enabling 64-bit DMA addresses.
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Calling HCD init
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Starting xhci_init
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: HCD page size set to 4K
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Device context base array address = 0x0x000000100167c000 (DMA), 00000000d042f7e3 (virt)
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Allocated command ring at 0000000016f013a6
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: First segment DMA is 0x0x000000100167d000
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Allocating primary event ring
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Allocating 34 scratchpad buffers
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Ext Cap 000000001bef6947, port offset = 1, count = 14, revision = 0x2
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:1 PSIE:2 PLT:0 PFD:0 LP:0 PSIM:12
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:2 PSIE:1 PLT:0 PFD:0 LP:0 PSIM:1500
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:3 PSIE:2 PLT:0 PFD:0 LP:0 PSIM:480
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHCI 1.0: support USB2 hardware lpm
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Ext Cap 00000000a5bcc554, port offset = 17, count = 8, revision = 0x3
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:4 PSIE:3 PLT:0 PFD:1 LP:0 PSIM:5
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:5 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:10
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:6 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:10
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:7 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:20
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Found 14 USB 2.0 ports and 8 USB 3.0 ports.
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHC can handle at most 64 device slots
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Setting Max device slots reg = 0x40
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Setting command ring address to 0x100167d001
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Doorbell array is located at offset 0x3000 from cap regs base addr
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: // Write event ring dequeue pointer, preserving EHB bit
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Finished xhci_init
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Called HCD init
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000000200009810
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Got SBRN 50
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: MWI active
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Finished xhci_pci_reinit
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: supports USB remote wakeup
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: xhci_run
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h100167e000
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Finished xhci_run for main hcd
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHCI Host Controller
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: new USB bus registered, assigned bus number 4
[Fri May  1 09:46:40 2026] xhci_hcd 0000:80:14.0: Host supports USB 3.2 Enhanced SuperSpeed
[Fri May  1 09:46:41 2026] xhci_hcd 0000:80:14.0: supports USB remote wakeup
[Fri May  1 09:46:41 2026] xhci_hcd 0000:80:14.0: Enable interrupts
[Fri May  1 09:46:41 2026] xhci_hcd 0000:80:14.0: Enable primary interrupter
[Fri May  1 09:46:41 2026] xhci_hcd 0000:80:14.0: // Turn on HC, cmd = 0x5.
[Fri May  1 09:46:41 2026] DMAR: DRHD: handling fault status reg 2
[Fri May  1 09:46:41 2026] DMAR: [DMA Read NO_PASID] Request device [80:14.0] fault addr 0x1001680000 [fault reason 0x39] SM: Present bit in Root Entry is clear

The chip triggers an IOMMU fault shortly after USBCMD.RUN is set to 1.
Such a fault is expected to cause HSE to be asserted, and it usually does.
You will probably find that HSE is already set by the time Enable Slot
is queued, even if it was clear in xhci_gen_setup().

1001680000 is close to valid addresses like 100167e000 or 100167c000:
it is page aligned and lies exactly 0x2000 past the former and 0x4000
past the latter.

Possible causes:
- xHCI or IOMMU driver bug
- HW corrupted a pointer
- HW accessed something out of bounds
- HW dereferenced a stale pointer from the original kernel

Do you happen to have more of those logs saved? Are they all like that?
Is there any chance that 1001680000 appears somewhere in the main kernel's log?

If not, I suppose we will have to log every single DMA mapping created
by the driver and see if this gives any new clues.

Regards,
Michal

Thread overview: 13+ messages
2026-04-30  1:48 [PATCH] usb: xhci: bound wait command completion to avoid kdump deadlock Desnes Nunes
2026-04-30  8:48 ` Michal Pecio
2026-04-30 17:27   ` Desnes Nunes
2026-04-30 21:54     ` Michal Pecio
2026-05-01 14:09       ` Desnes Nunes
2026-05-02  9:46         ` [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout Michal Pecio
2026-05-02 11:38           ` Desnes Nunes
2026-05-02 21:55             ` Michal Pecio
2026-05-03  3:36               ` Desnes Nunes
2026-05-03  5:17                 ` Michal Pecio
2026-05-03 16:20                   ` Desnes Nunes
2026-05-03 19:31                     ` Michal Pecio
2026-05-04  7:31                       ` Michal Pecio [this message]
