Date: Tue, 23 Jun 2026 13:55:17 +0200
From: Michal Pecio
To: Mathias Nyman
Cc: raoxu, gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org, mathias.nyman@intel.com, stable@vger.kernel.org
Subject: Re: [PATCH v2] xhci: pci: Disable soft retry for Renesas uPD720201
Message-ID: <20260623135517.2b1f0809.michal.pecio@gmail.com>
References: <20260619124234.0a9e4670.michal.pecio@gmail.com> <237BFC17C62D63DF+20260622062117.56278-1-raoxu@uniontech.com>

Replying a little out of order here.

On Mon, 22 Jun 2026 14:31:58 +0300, Mathias Nyman wrote:
> Cancel the realtek URB we tried to soft retry earlier.
>
> > 2026-06-22T13:23:39.477082+08:00 uos-PC kernel: xhci_hcd 0000:04:00.0: 8/6 (000/3) [200cb341b0/200cb341b1/200cb341c0] xhci_urb_dequeue cancel TD at 200cb341b0 stream 0
> > 2026-06-22T13:23:39.477082+08:00 uos-PC kernel: xhci_hcd 0000:04:00.0: 8/6 (004/3) [200cb341b0/200cb341b1/200cb341c0] queue_stop_endpoint suspend 0
>
> queue stop endpoint to cancel URB for realtek device.
> Endpoint context still shows endpoint is in "stopped" state.
> Note that we restarted the endpoint 20ms earlier, endpoint context
> might not have updated yet.
This was business as usual on the uPD720200: it seems that these chips don't update the EP Context until the first scheduled service opportunity (though no later than about 30 ms; long-interval endpoints must follow different rules), and they cannot execute Stop EP until then either. Some of them complete the command with Context State Error, while others delay completion until the scheduled restart. If we wait longer and then queue Stop Endpoint, it executes instantly (in a fraction of a millisecond). It seems that the 201/202 chips still have the same limitation.

> I think there are some steps we could do to avoid soft retry,
> restart, and stopping an endpoint we know is behind a disconnected
> parent.

Yes, the existing logic can be trivially extended to cover children too. Of course, this does nothing if the device is disconnected from an external hub, or if a transaction error occurs without any disconnection.

But further experiments indicate that disconnection from the root hub is actually a necessary condition for triggering this bug. If another SuperSpeed device (even one without periodic endpoints, like a UAS device) is connected to another port, the retry causes another Transaction Error a few ms later, the pipe halts, and Stop EP completes normally with Context State Error, as expected. Then we reset the endpoint, remove the URB, and never restart this endpoint again.

The same happens if I trigger the bug and then connect either the same hub or any other device to any SuperSpeed port before the command times out.
[ +0,000009] xhci_hcd 0000:06:00.0: 6/6 (000/2) [ff8f0bd0/ff8f0bd1/ff8f0be0] queue_reset_endpoint tsp 1
[ +0,000009] xhci_hcd 0000:06:00.0: 0/-1 (fff/f) [ffffffff/ffffffff/ffffffff] xhci_ring_cmd_db cmd_ring_state 1
[ +0,000504] xhci_hcd 0000:06:00.0: 6/6 (002/3) [ff8f0bd0/ff8f0bd1/ff8f0be0] handle_cmd_completion cmd_type 14 comp_code 1
[ +0,000025] xhci_hcd 0000:06:00.0: 6/6 (000/3) [ff8f0bd0/ff8f0bd1/ff8f0be0] ring_ep_doorbell stream 0
[ +0,006627] usb 10-1: USB disconnect, device number 22
[ +0,000016] usb 10-1.4: USB disconnect, device number 23
[ +0,000005] r8152-cfgselector 10-1.4.4: USB disconnect, device number 24
[ +0,000190] xhci_hcd 0000:06:00.0: 6/6 (000/3) [ff8f0bd0/ff8f0bd1/ff8f0be0] xhci_urb_dequeue cancel TD at ff8f0bd0 stream 0
[ +0,000011] xhci_hcd 0000:06:00.0: 6/6 (004/3) [ff8f0bd0/ff8f0bd1/ff8f0be0] queue_stop_endpoint suspend 0
[ +0,000009] xhci_hcd 0000:06:00.0: 0/-1 (fff/f) [ffffffff/ffffffff/ffffffff] xhci_ring_cmd_db cmd_ring_state 1
[ +0,000655] xhci_hcd 0000:06:00.0: 6/6 (004/2) [ff8f0bd0/ff8f0bd1/ff8f0be0] handle_tx_event comp_code 4 trb_dma ff8f0bd0
[ +0,000023] xhci_hcd 0000:06:00.0: 6/6 (004/2) [ff8f0bd0/ff8f0bd1/ff8f0be0] handle_tx_event stream_id 0 trb_len 2 missing 2
[ +0,000013] xhci_hcd 0000:06:00.0: 6/6 (004/2) [ff8f0bd0/ff8f0bd1/ff8f0be0] queue_reset_endpoint tsp 1
[ +0,000008] xhci_hcd 0000:06:00.0: 0/-1 (fff/f) [ffffffff/ffffffff/ffffffff] xhci_ring_cmd_db cmd_ring_state 1
[ +0,000012] xhci_hcd 0000:06:00.0: 6/6 (006/2) [ff8f0bd0/ff8f0bd1/ff8f0be0] handle_cmd_completion cmd_type 15 comp_code 19

I would guess that when all SuperSpeed ports are disconnected, the chip turns off its SuperSpeed schedule altogether and waits for SW to stop all endpoints that aren't halted yet. But if a restart is pending, Stop EP is scheduled to complete at the next service opportunity, which never comes.
I also found that disconnecting a different affected NIC directly from the root hub triggers this bug too, but only if I disable the protection against queuing Reset Endpoint (including with TSP) to "inactive" devices. And the bug doesn't trigger every time: sometimes the unlink happens while Reset Endpoint is pending, and then its handler removes the URB without a Stop Endpoint. Also, a cable connection isn't actually necessary; I was mistaken due to the randomness of the bug.

Regards,
Michal