From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5A6E8AD2C for ; Mon, 16 Mar 2026 00:39:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773621559; cv=none; b=WNC80U74Vb8gVxmwOkqmztQoBg9GXxU3gIMxkRyddve+vEawygHKP8XSdNuE0zpnLvOl90y76114AwZbX6DohtWPipvE2086DtzqD8AUECpYeZcvswLqTeHN3l6/VdKLybliTKY7LL6Sm/GAYydfu+wfwAm137ZvO4oliGob0yg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773621559; c=relaxed/simple; bh=hhMGypMkpp84RITcxptsFaKhcIrxdiwuVAeOX1Qzhfc=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: Content-Type:MIME-Version; b=UngnTsHQXXow9BrOqiVQLd9hN9xOF2fHvvPe5/B62USztm8a3HsUKZVoFsZz+g+W9dFHWlRJVm4IpgRYzCHyJWHCpTO+oDhiizgGeP6TNRww2MKqX+H92lzN7t/XltT9Er2QfTQxCDN+DhMno/cKTXoHCk2hv/rTtlghF5DuhFs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=dy/JIebL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="dy/JIebL" Received: by smtp.kernel.org (Postfix) with ESMTPS id 03E7EC2BCB0 for ; Mon, 16 Mar 2026 00:39:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1773621559; bh=hhMGypMkpp84RITcxptsFaKhcIrxdiwuVAeOX1Qzhfc=; h=From:To:Subject:Date:In-Reply-To:References:From; b=dy/JIebLf+b0OHx9syl4IrWABRx8fafE6a7y/w5HX+U0XYvgFpsI721P9CE54DCt/ D8klBs80UP2SMMWrBNmAPL/VK3nAZrt1rrvttO5N3ARe23+WOuwCSQcdJlFDxiZj21 BNIa5F8yuaapubMFS+/zJVsfM+KcqnS8LQ37UozMWGOZDR7IQ3z26+DpdwM4m57pP5 38tjt92A6svGqSQmy5YDRFm8k1DSP3LzjmUk3jF0wHGk59EBDM33EY86hBkrXkEw/p Uq0y8qC72QIe3OA/spstpNo5VG1c7ou8ToVsSADFO2pDdI/vlDSzZtIIjaWn9aQCqH JXSzh4BVJmv/g== Received: by aws-us-west-2-korg-bugzilla-1.web.codeaurora.org (Postfix, from userid 48) id F1F60C3279F; Mon, 16 Mar 2026 00:39:18 +0000 (UTC) From: bugzilla-daemon@kernel.org To: linux-usb@vger.kernel.org Subject: [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] Date: Mon, 16 Mar 2026 00:39:18 +0000 X-Bugzilla-Reason: None X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: AssignedTo drivers_usb@kernel-bugs.kernel.org X-Bugzilla-Product: Drivers X-Bugzilla-Component: USB X-Bugzilla-Version: 2.5 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: superveridical@gmail.com X-Bugzilla-Status: NEEDINFO X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: drivers_usb@kernel-bugs.kernel.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugzilla.kernel.org/ Auto-Submitted: auto-generated Precedence: bulk X-Mailing-List: linux-usb@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 https://bugzilla.kernel.org/show_bug.cgi?id=3D221073 --- Comment #40 from Alexander F (superveridical@gmail.com) --- >echo on > /sys/bus/pci/devices/0000:c4:00.4/power/control no effect. >The warning is unimportant, Yeah, I understand. I (likely mistakenly) assumed that whatever sets taint = on the kernel also flips the NonFatalError. If "DevSta:" can only be set by the hardware internally, than of course it's a different matter.=20=20 >was PCI DevSta: NonFatalErr+ ever set with the 'Forced MSI only' patch aft= er >resume? Never. I did 300 cycles with MSI-only patch with no issues and it was never= set to +. The only concerning things in dmesg during these 300 cycles were mult= iple (about 9) errors of this kind: amdgpu 0000:c4:00.0: amdgpu: Register(1) [regVPEC_QUEUE_RESET_REQ_6_1_1] fa= iled to reach value 0x00000000 !=3D 0x00000001n amdgpu 0000:c4:00.0: amdgpu: VPE queue reset failed >i.e. Does MSI-X usage on xHC trigger the DevSta: NonFatalErr+, causing xHC >interrupt handler to hot be called >is there something else causing PCI DevSta: NonFatalErr+ in resume which f= or >some reason only affects/omits MSI-X handler while MSI work and handler is >called as it should. Unfortunately, I'm not equipped to find that out. I can imagine it's possib= le to write a kernel module(or modify an existing one) that tests that, but th= at's beyond me. My understanding ends at the system call boundary. >I think all we got is just more evidence that it's a PCI or x86 architectu= re >problem, not USB. I would mail linux-pci I can probably do that, but I'm not really confident that my device is functioning properly hardware-wise, and I wouldn't be wasting everyone's ti= me. If I had access to another sample of the device, that was not self-selected= , I would at least be able to tell that it reproduces on a randomly sampled de= vice beside mine. Unfortunately the bugreport starter with access to multiple samples is MIA for some reason. ... Meanwhile I think I determined the source of instability I had during the sleep/restart actions. I had a working hypothesis that it's static zaps, an= d I happened to pretty severely zap something in the device through a (rather t= hin) keyboard key recently, severely enough to force my desktop's monitor, that = only has common connection with the Z13 through mains, to shutdown momentarily, likely due to power protection circuitry in its PSU. (There is also no grounding wire in this house) The device functioned nominally, but the mome= nt I tried to suspend it after that zap it died, and I had to longpress the power button. It means I did at least 5-7 similar level zaps, and it could have of course damaged something. All of this could mean nothing, but that makes me less confident that I have a properly functioning device. There are 4-7 people complaining of this issue on Linux, so it means at lea= st 100 users with their devices in the similar state. Not everyone reports iss= ues of course -- absolutely real bugs get 1-2 reporters on drm/amd for example,= so the number could be greater. Could it be that this number of people also za= pped their devices, and did the same kind of latent damage to the whatever machi= nery responsible for the MSI-X interrupt? Sounds kind of implausible. So if it doesn't manifest on all devices the only other reason I can think of is something to do with manufacturing. I think we need more people supplying debug data to be sure before bothering the other subsystems. But I would do as you recommend. And the issue looks = like something hardware/firmware related, i.e. beyond the level of the kernel. --=20 You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.=