From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Michal Pecio <michal.pecio@gmail.com>,
Mathias Nyman <mathias.nyman@linux.intel.com>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 09/81] usb: xhci: Limit Stop Endpoint retries
Date: Mon, 6 Jan 2025 16:15:41 +0100
Message-ID: <20250106151129.791141506@linuxfoundation.org>
In-Reply-To: <20250106151129.433047073@linuxfoundation.org>
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Pecio <michal.pecio@gmail.com>
[ Upstream commit 42b7581376015c1bbcbe5831f043cd0ac119d028 ]
Some host controllers fail to atomically transition an endpoint to the
Running state on a doorbell ring and enter a hidden "Restarting" state,
which looks very much like Stopped, with the important difference that
it will spontaneously transition to Running shortly afterwards.
A Stop Endpoint command queued in the Restarting state typically fails
with Context State Error and the completion handler sees the Endpoint
Context State as either still Stopped or already Running. Even a case
of Halted was observed, when an error occurred right after the restart.
The Halted state is already recovered from by resetting the endpoint.
The Running state is handled by retrying Stop Endpoint.
The Stopped state was recognized as a problem on NEC controllers and
worked around also by retrying, because the endpoint soon restarts and
then stops for good. But there is a risk: the command may fail if the
endpoint is "stopped for good" already, and retries will fail forever.
The possibility of this was not realized at the time, but a number of
cases were discovered later and reproduced. Some proved difficult to
deal with, and it is outright impossible to predict if an endpoint may
fail to ever start at all due to a hardware bug. One such bug (albeit
on ASM3142, not on NEC) was found to be reliably triggered simply by
toggling an AX88179 NIC up/down in a tight loop for a few seconds.
An endless retry storm is quite nasty. Besides putting needless load
on the xHC and CPU, it causes URBs never to be given back, paralyzing
the device and connection/disconnection logic for the whole bus if the
device is unplugged. User processes waiting for URBs become unkillable,
drivers and kworker threads lock up and xhci_hcd cannot be reloaded.
For peace of mind, impose a timeout on Stop Endpoint retries in this
case. If they don't succeed in 100ms, consider the endpoint stopped
permanently for some reason and just give back the unlinked URBs. This
failure case is rare already and work is under way to make it rarer.
Start this work today by also handling one simple case of race with
Reset Endpoint, because it costs just two lines to implement.
Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-32-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: e21ebe51af68 ("xhci: Turn NEC specific quirk for handling Stop Endpoint errors generic")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
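Reviewer note (not part of the commit): the 100ms retry window described
above relies on a wraparound-safe tick comparison. The following is a
minimal userspace sketch of that idea, not kernel code. The names tick_t,
MODEL_HZ, model_msecs_to_ticks() and retry_window_expired() are
hypothetical stand-ins, and the tick rate of 250 is an assumption. The
signed-difference test mirrors how the kernel's time_is_before_jiffies()
is commonly understood to behave.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
typedef uint32_t tick_t;	/* models a 32-bit jiffies counter */
#define MODEL_HZ 250u		/* assumed tick rate, ticks per second */

/* Convert milliseconds to ticks, rounding up (small inputs assumed). */
static tick_t model_msecs_to_ticks(unsigned int ms)
{
	return (ms * MODEL_HZ + 999u) / 1000u;
}

/*
 * True once "now" is strictly past stop_time + 100ms.
 * The unsigned subtraction wraps modulo 2^32, and the signed cast turns
 * "deadline is behind now" into a negative value, so the check stays
 * correct when the counter wraps (two's complement conversion assumed).
 */
static bool retry_window_expired(tick_t stop_time, tick_t now)
{
	tick_t deadline = stop_time + model_msecs_to_ticks(100);

	return (int32_t)(deadline - now) < 0;
}
```

With these assumptions the window is 25 ticks long, so a stop recorded at
tick 1000 still allows retries at tick 1025 and refuses them from tick
1026 onward, including when the counter wraps in between.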
drivers/usb/host/xhci-ring.c | 28 ++++++++++++++++++++++++----
drivers/usb/host/xhci.c | 2 ++
drivers/usb/host/xhci.h | 1 +
3 files changed, 27 insertions(+), 4 deletions(-)
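Reviewer note (not part of the commit): the commit message describes how
a Stop Endpoint completion with Context State Error is handled depending
on the observed endpoint state. The sketch below is a simplified
userspace model of that decision, assuming only what the message and the
hunk state. All identifiers prefixed with model_ or MODEL_ are
hypothetical, and the real handler does considerably more work on each
path.

```c
#include <stdbool.h>

/* Hypothetical model types; these are not the driver's definitions. */
enum model_ctx_state { MODEL_EP_HALTED, MODEL_EP_STOPPED, MODEL_EP_RUNNING };

enum model_action {
	MODEL_RESET_EP,		/* recover a Halted endpoint by resetting it */
	MODEL_RETRY_STOP,	/* queue another Stop Endpoint command */
	MODEL_FINISH_CANCEL,	/* stop retrying, give back unlinked URBs */
};

/*
 * Decide what to do after Stop Endpoint failed with Context State Error.
 * halted_flag models the driver-side EP_HALTED bit, nec_quirk models the
 * XHCI_NEC_HOST quirk, and window_expired models the 100ms limit.
 */
static enum model_action model_stop_ep_ctx_error(enum model_ctx_state state,
						 bool halted_flag,
						 bool nec_quirk,
						 bool window_expired)
{
	switch (state) {
	case MODEL_EP_HALTED:
		return MODEL_RESET_EP;
	case MODEL_EP_STOPPED:
		/* Raced with Reset Endpoint: its handler runs after us. */
		if (halted_flag)
			return MODEL_FINISH_CANCEL;
		/* Retrying is only known to help on some chips. */
		if (!nec_quirk)
			return MODEL_FINISH_CANCEL;
		/* Retries did not succeed in time: assume stopped for good. */
		if (window_expired)
			return MODEL_FINISH_CANCEL;
		return MODEL_RETRY_STOP;
	case MODEL_EP_RUNNING:
		return MODEL_RETRY_STOP;
	}
	return MODEL_FINISH_CANCEL;
}
```

The model keeps the ordering of the checks in the hunk: the EP_HALTED
test comes first, then the quirk test, then the timeout, and only then
does the Stopped case share the retry path with Running.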
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index d193d5ad8789..4a3a8a3fa69d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -52,6 +52,7 @@
* endpoint rings; it generates events on the event ring for these.
*/
+#include <linux/jiffies.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/dma-mapping.h>
@@ -1143,16 +1144,35 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
return;
case EP_STATE_STOPPED:
/*
- * NEC uPD720200 sometimes sets this state and fails with
- * Context Error while continuing to process TRBs.
- * Be conservative and trust EP_CTX_STATE on other chips.
+ * Per xHCI 4.6.9, Stop Endpoint command on a Stopped
+ * EP is a Context State Error, and EP stays Stopped.
+ *
+ * But maybe it failed on Halted, and somebody ran Reset
+ * Endpoint later. EP state is now Stopped and EP_HALTED
+ * still set because Reset EP handler will run after us.
+ */
+ if (ep->ep_state & EP_HALTED)
+ break;
+ /*
+ * On some HCs EP state remains Stopped for some tens of
+ * us to a few ms or more after a doorbell ring, and any
+ * new Stop Endpoint fails without aborting the restart.
+ * This handler may run quickly enough to still see this
+ * Stopped state, but it will soon change to Running.
+ *
+ * Assume this bug on unexpected Stop Endpoint failures.
+ * Keep retrying until the EP starts and stops again, on
+ * chips where this is known to help. Wait for 100ms.
*/
if (!(xhci->quirks & XHCI_NEC_HOST))
break;
+ if (time_is_before_jiffies(ep->stop_time + msecs_to_jiffies(100)))
+ break;
fallthrough;
case EP_STATE_RUNNING:
/* Race, HW handled stop ep cmd before ep was running */
- xhci_dbg(xhci, "Stop ep completion ctx error, ep is running\n");
+ xhci_dbg(xhci, "Stop ep completion ctx error, ctx_state %d\n",
+ GET_EP_CTX_STATE(ep_ctx));
command = xhci_alloc_command(xhci, false, GFP_ATOMIC);
if (!command) {
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index b072154badf3..ae14c7ade9bc 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -8,6 +8,7 @@
* Some code borrowed from the Linux EHCI driver.
*/
+#include <linux/jiffies.h>
#include <linux/pci.h>
#include <linux/iommu.h>
#include <linux/iopoll.h>
@@ -1911,6 +1912,7 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
ret = -ENOMEM;
goto done;
}
+ ep->stop_time = jiffies;
ep->ep_state |= EP_STOP_CMD_PENDING;
xhci_queue_stop_endpoint(xhci, command, urb->dev->slot_id,
ep_index, 0);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 0b526edf636f..a75b8122538d 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -717,6 +717,7 @@ struct xhci_virt_ep {
/* Bandwidth checking storage */
struct xhci_bw_info bw_info;
struct list_head bw_endpoint_list;
+ unsigned long stop_time;
/* Isoch Frame ID checking storage */
int next_frame_id;
/* Use new Isoch TRB layout needed for extended TBC support */
--
2.39.5