On Sun, 3 May 2026 13:20:38 -0300, Desnes Nunes wrote:
> Yes, same patched binary on the main kernel and kdump kernel.

That's not great news, because it seems that the same HSE could occur
on any kexec, not just kdump.

It's unclear why it happens. It seems that after initial boot the HC
works normally (does it?) but then kexec-ing breaks it somehow.

I don't think this has anything to do with the Battlemage, because in
the particular case which you shared, the GPU began initialization
*after* HSE had already been logged.

My first wild guess would be that HSE is caused by resetting the IOMMU
while the xHC is unaware of kexec and continues DMA to old buffers.

Attached patch checks for this and also tries to explicitly clear HSE,
although resetting ought to clear it too. But HW has bugs... So it may
not help, but maybe it will if we are lucky, or at least it may offer
some hint about when things go wrong.

> So, I confirm that this patch, which checks for HSE or HCE indeed
> fixes the bug, without having to rely to a
> wait_for_completion_timeout():
>
> # grep -i HSE -A5 kexec-dmesg.log
> [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Command timeout,
> USBSTS: 0x00000015 HCHalted HSE PCD
> [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: kill the damn thing
> [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: xHCI host controller
> not responding, assume dead
> [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: HC died; cleaning up
> [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Error while
> assigning device slot ID: Command Aborted

Thanks for testing, that's what the patch was intended to do.
There is no lockup, but of course the chip doesn't work afterwards.

Regards,
Michal
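
P.S. For anyone reading the log, the USBSTS value quoted above can be decoded by hand. Below is a minimal userspace sketch, assuming the USBSTS bit layout from the xHCI specification. The names usbsts_decode() and usbsts_is_fatal() are made up for illustration; this is neither the driver's code nor the attached patch, only a way to see which bits 0x00000015 sets and what "checks for HSE or HCE" amounts to.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* USBSTS bits, assuming the layout from the xHCI specification. */
#define USBSTS_HCH  (1u << 0)   /* HCHalted */
#define USBSTS_HSE  (1u << 2)   /* Host System Error */
#define USBSTS_EINT (1u << 3)   /* Event Interrupt */
#define USBSTS_PCD  (1u << 4)   /* Port Change Detect */
#define USBSTS_CNR  (1u << 11)  /* Controller Not Ready */
#define USBSTS_HCE  (1u << 12)  /* Host Controller Error */

/* Hypothetical helper: true if the controller reports HSE or HCE. */
static int usbsts_is_fatal(uint32_t sts)
{
    return (sts & (USBSTS_HSE | USBSTS_HCE)) != 0;
}

static const struct {
    uint32_t mask;
    const char *name;
} usbsts_flags[] = {
    { USBSTS_HCH,  "HCHalted" },
    { USBSTS_HSE,  "HSE" },
    { USBSTS_EINT, "EINT" },
    { USBSTS_PCD,  "PCD" },
    { USBSTS_CNR,  "CNR" },
    { USBSTS_HCE,  "HCE" },
};

/*
 * Hypothetical helper: write the names of the set bits into buf,
 * separated by spaces. Returns the number of characters written,
 * not counting the terminating NUL.
 */
static size_t usbsts_decode(uint32_t sts, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof usbsts_flags / sizeof usbsts_flags[0]; i++) {
        if (!(sts & usbsts_flags[i].mask))
            continue;

        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", usbsts_flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* output error or buffer too small: stop here */
        used += (size_t)n;
    }
    return used;
}
```

Feeding it 0x00000015 gives bits 0, 2 and 4, i.e. "HCHalted HSE PCD", the same flags as in the log, and usbsts_is_fatal() returns true for that value because HSE is set.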