From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 2 May 2026 11:46:44 +0200
From: Michal Pecio
To: Desnes Nunes
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
 gregkh@linuxfoundation.org, mathias.nyman@intel.com,
 stable@vger.kernel.org
Subject: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout
Message-ID: <20260502114644.76e6b5a3.michal.pecio@gmail.com>
References: <20260430014817.2006885-1-desnesn@redhat.com>
 <20260430104850.352bd946.michal.pecio@gmail.com>
 <20260430235453.2288c973.michal.pecio@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 1 May 2026 11:09:27 -0300, Desnes Nunes wrote:
> On Thu, Apr 30, 2026 at 6:55 PM Michal Pecio wrote:
> > When xhci_handle_command_timeout() logs USBSTS, does it help to add:
> >
> > 	if (usbsts & STS_FATAL) {
> > 		xhci_halt(xhci);
> > 		xhci_hc_died(xhci);
> > 		goto time_out_completed;
> > 	}
> > It may not be perfect solution (race conditions?) but it could hint
> > that we are on the right track, if it works.
>
> This panicked the system as soon as I hit `echo c > /proc/sysrq-trigger`:
>
> [  141.683476] sysrq: Trigger a crash
> [  141.686970] Kernel panic - not syncing: sysrq triggered crash

Damn, that sucks. Any chance it's not a problem with my proposed change
but some sort of issue on your side?

Anyway, I think the patch below might cover it. It works for me in the
sense that the bus does get killed, without ill effect.
I tested on VL805, where HSE is easily triggered by disabling
XHCI_TRB_OVERFETCH. However, the patch isn't necessary there: VL805
doesn't clear CRCR.CRR on HSE, so the normal abort path is taken, the
abort times out, and then hc_died() is called.

Can somebody serious confirm if this issue actually exists in the
first place, and whether the patch solves it? Hello Redhat, anyone
alive there? Or only stochastic parrots?

Mathias, do you remember what's the point of the "Command timeout on
stopped ring" branch? Can it happen in any case other than a dead chip?

I also wonder if it wouldn't make sense to just hc_died() on every
command timeout except Address Device. We rely on Stop Endpoint
timeouts to kill chips which go unresponsive without setting HCE/HSE,
because sooner or later somebody loses patience and unlinks a URB,
but this story (real or hallucinated, but plausible) shows that this
may not help when there are no devices created yet.

---

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e5823650850a..3041deb67b57 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1761,13 +1761,15 @@ void xhci_handle_command_timeout(struct work_struct *work)
 	/* mark this command to be cancelled */
 	xhci->current_cmd->status = COMP_COMMAND_ABORTED;
 
-	/* Make sure command ring is running before aborting it */
+	/* check for crashed or disconnected chip */
 	hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
-	if (hw_ring_state == ~(u64)0) {
+	if (hw_ring_state == ~(u64)0 || usbsts & (STS_FATAL | STS_HCE)) {
+		xhci_info(xhci, "kill the damn thing\n");
 		xhci_hc_died(xhci);
 		goto time_out_completed;
 	}
 
+	/* Make sure command ring is running before aborting it */
 	if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
 	    (hw_ring_state & CMD_RING_RUNNING)) {
 		/* Prevent new doorbell, and start command abort */
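
For anyone who wants to reason about the ordering of the checks without
hardware, here is a rough userspace model of the decision the patched
timeout handler makes. It is only a sketch: classify_timeout() and the
timeout_action enum are hypothetical names invented for illustration and
do not exist in the driver, and the bit positions are the USBSTS.HSE,
USBSTS.HCE and CRCR.CRR positions from the xHCI spec, redefined locally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions per the xHCI spec, redefined here for a standalone build. */
#define STS_FATAL        (1u << 2)   /* USBSTS.HSE: Host System Error */
#define STS_HCE          (1u << 12)  /* USBSTS.HCE: Host Controller Error */
#define CMD_RING_RUNNING (1u << 3)   /* CRCR.CRR: Command Ring Running */

/* Hypothetical outcome names, not part of the kernel. */
enum timeout_action {
	ACTION_HC_DIED,       /* declare the host dead */
	ACTION_ABORT_RING,    /* ring is running: start a command abort */
	ACTION_STOPPED_RING,  /* "Command timeout on stopped ring" branch */
};

/*
 * Model of the check order after the patch: an all-ones CRCR read
 * (chip gone) or HSE/HCE set in USBSTS wins over everything else,
 * even if the hardware still reports the command ring as running.
 */
enum timeout_action classify_timeout(uint64_t crcr, uint32_t usbsts,
				     bool sw_ring_running)
{
	if (crcr == ~(uint64_t)0 || (usbsts & (STS_FATAL | STS_HCE)))
		return ACTION_HC_DIED;

	if (sw_ring_running && (crcr & CMD_RING_RUNNING))
		return ACTION_ABORT_RING;

	return ACTION_STOPPED_RING;
}
```

In this model the VL805 behaviour described above (CRR still set, HSE
set) maps to classify_timeout(CMD_RING_RUNNING, STS_FATAL, true), which
returns ACTION_HC_DIED right away instead of taking the abort path first.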