Date: Mon, 4 May 2026 09:31:18 +0200
From: Michal Pecio
To: Desnes Nunes
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org, gregkh@linuxfoundation.org, mathias.nyman@intel.com, stable@vger.kernel.org
Subject: Re: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout
Message-ID: <20260504093118.615ff480.michal.pecio@gmail.com>
In-Reply-To: <20260503213111.117db3a1.michal.pecio@gmail.com>
References: <20260430014817.2006885-1-desnesn@redhat.com> <20260430104850.352bd946.michal.pecio@gmail.com> <20260430235453.2288c973.michal.pecio@gmail.com> <20260502114644.76e6b5a3.michal.pecio@gmail.com> <20260502235517.089ba5bf.michal.pecio@gmail.com> <20260503071749.6abda137.michal.pecio@gmail.com> <20260503213111.117db3a1.michal.pecio@gmail.com>

On Sun, 3 May 2026 21:31:11 +0200, Michal Pecio wrote:
> My first wild guess would be that HSE is caused by resetting IOMMU
> while the xHC is unaware of kexec and continuing to DMA old buffers.
> Attached patch checks for this and also tries to explicitly clear
> HSE, although resetting ought to clear it too. But HW has bugs...

Never mind, here's the smoking gun:

[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHCI Host Controller
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: new USB bus registered, assigned bus number 3
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: // Halt the HC
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Resetting HCD
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: // Reset the HC
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Wait for controller to be ready for doorbell rings
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Reset complete
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Enabling 64-bit DMA addresses.
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Calling HCD init
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Starting xhci_init
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: HCD page size set to 4K
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Device context base array address = 0x0x000000100167c000 (DMA), 00000000d042f7e3 (virt)
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Allocated command ring at 0000000016f013a6
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: First segment DMA is 0x0x000000100167d000
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Allocating primary event ring
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Allocating 34 scratchpad buffers
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Ext Cap 000000001bef6947, port offset = 1, count = 14, revision = 0x2
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:1 PSIE:2 PLT:0 PFD:0 LP:0 PSIM:12
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:2 PSIE:1 PLT:0 PFD:0 LP:0 PSIM:1500
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:3 PSIE:2 PLT:0 PFD:0 LP:0 PSIM:480
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHCI 1.0: support USB2 hardware lpm
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Ext Cap 00000000a5bcc554, port offset = 17, count = 8, revision = 0x3
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:4 PSIE:3 PLT:0 PFD:1 LP:0 PSIM:5
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:5 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:10
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:6 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:10
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:7 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:20
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Found 14 USB 2.0 ports and 8 USB 3.0 ports.
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHC can handle at most 64 device slots
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Setting Max device slots reg = 0x40
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Setting command ring address to 0x100167d001
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Doorbell array is located at offset 0x3000 from cap regs base addr
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: // Write event ring dequeue pointer, preserving EHB bit
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Finished xhci_init
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Called HCD init
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000000200009810
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Got SBRN 50
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: MWI active
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Finished xhci_pci_reinit
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: supports USB remote wakeup
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: xhci_run
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h100167e000
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Finished xhci_run for main hcd
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHCI Host Controller
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: new USB bus registered, assigned bus number 4
[Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Host supports USB 3.2 Enhanced SuperSpeed
[Fri May 1 09:46:41 2026] xhci_hcd 0000:80:14.0: supports USB remote wakeup
[Fri May 1 09:46:41 2026] xhci_hcd 0000:80:14.0: Enable interrupts
[Fri May 1 09:46:41 2026] xhci_hcd 0000:80:14.0: Enable primary interrupter
[Fri May 1 09:46:41 2026] xhci_hcd 0000:80:14.0: // Turn on HC, cmd = 0x5.
[Fri May 1 09:46:41 2026] DMAR: DRHD: handling fault status reg 2
[Fri May 1 09:46:41 2026] DMAR: [DMA Read NO_PASID] Request device [80:14.0] fault addr 0x1001680000 [fault reason 0x39] SM: Present bit in Root Entry is clear

The chip IOMMU faults shortly after setting USBCMD.RUN = 1. Such a fault
is expected to cause HSE assertion, and usually it does. You will
probably find that HSE is already set while Enable Slot is being queued,
even if it was clear in xhci_gen_setup().

1001680000 is close to valid addresses like 100167e000 (ERST dequeue,
0x2000 below) or 100167c000 (device context base array, 0x4000 below).

Possible causes:
- xHCI or IOMMU driver bug
- HW corrupted a pointer
- HW accessed something out of bounds
- HW dereferenced a stale pointer from the original kernel

Do you happen to have more of those logs saved? Are they all like that?
Does 1001680000 appear anywhere in the main kernel's log?

If not, I suppose we will have to log every single DMA mapping created
by the driver and see if this gives any new clues.

Regards,
Michal
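
P.S. In case it helps with going through the saved logs, below is a rough
Python sketch, not part of any patch. It pulls hex addresses out of debug
lines in the three spellings seen in this log ("0x...", "0x0x..." and
"64'h...") and lists the lines that mention the same 4K page as a given
fault address. The function names are made up for illustration, the 4K
page size is taken from the "HCD page size set to 4K" line, and it can
only find addresses that the driver actually prints.

```python
import re

# Hex tokens as they appear in the xhci/DMAR output quoted in this mail:
# "0x100167d001", "0x0x000000100167c000" and "64'h100167e000".
# Hashed "(virt)" pointers have no prefix and are deliberately ignored.
HEX_TOKEN = re.compile(r"(?:0x)+([0-9a-fA-F]+)|64'h([0-9a-fA-F]+)")

PAGE_MASK = ~0xFFF  # assumption: 4K pages, as reported in the log


def addresses_in(line):
    """Yield every prefixed hex value found in one log line as an int."""
    for match in HEX_TOKEN.finditer(line):
        yield int(match.group(1) or match.group(2), 16)


def lines_touching_page(log_lines, fault_addr):
    """Return the log lines that mention an address in the faulting page."""
    page = fault_addr & PAGE_MASK
    return [
        line
        for line in log_lines
        if any((addr & PAGE_MASK) == page for addr in addresses_in(line))
    ]


def offsets_from(fault_addr, known):
    """Map each named known address to its distance below the fault address."""
    return {name: fault_addr - addr for name, addr in known.items()}
```

Running lines_touching_page() over a saved dmesg of the main kernel with
0x1001680000 as the fault address would show whether that page was ever
printed there. offsets_from() just does the subtraction against the
addresses from the kexec kernel's log.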