All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michał Pecio" <michal.pecio@gmail.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>,
	Mathias Nyman <mathias.nyman@linux.intel.com>,
	Mathias Nyman <mathias.nyman@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: xhci: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Date: Sun, 6 Apr 2025 09:50:08 +0200	[thread overview]
Message-ID: <20250406095008.0dbfd586@foxbook> (raw)
In-Reply-To: <ade0d77a-651a-4b03-bf21-00369fdc22f8@rowland.harvard.edu>

Thanks, Alan.

On Sat, 5 Apr 2025 22:40:51 -0400, Alan Stern wrote:
> The core believes that resetting a device should erase the endpoint 
> information in the HCD.  There is a callback in hub_port_reset() to
> that effect:
> 
> 		if (hcd->driver->reset_device)
> 			hcd->driver->reset_device(hcd, udev);

Makes sense, reset_device() does indeed nuke everything besides EP0.
When the endpoints are re-added later, their will get blank state.

> So after this the EP should not be in the Running-Idle state; in fact
> it should not exist at all (unless it is ep0, but in this case I
> think it isn't).

The log shows that reset_device() is followed by a few drop_endpoint()
and add_endpoint(), so the endpoint gets enabled again before new URBs
are submitted, no problem here.

What I found suspicious is that there is also endpoint_disable() and
I'm not sure where it comes from. Looking at core code, it seems to
often be followed by endpoint_reset(), but the log doesn't show that.

> Is the implementation of the reset_device callback in xhci-hcd
> missing something?

I guess we could clear EP_STALLED when an endpoint is being re-enabled.
This would cover reset_device() and also drop_endpoint().


Paul, could you apply this patch and report back if it fixes the
warnings and reset loops and which XXX messages are printed?
BTW, if those "is now EP_STALLED" and "no longer EP_STALLED" cause
too much noise, you can remove them, they are not very important.

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 3f918fea7aea..0ebf7980780f 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1463,6 +1463,10 @@ int xhci_endpoint_init(struct xhci_hcd *xhci,
 	if (!virt_dev->eps[ep_index].new_ring)
 		return -ENOMEM;
 
+	if (virt_dev->eps[ep_index].ep_state & EP_STALLED)
+		dev_err(&udev->dev, "XXX ep %d still EP_STALLED on init, clearing\n", ep_index);
+	virt_dev->eps[ep_index].ep_state &= ~EP_STALLED;
+
 	virt_dev->eps[ep_index].skip = false;
 	ep_ring = virt_dev->eps[ep_index].new_ring;
 
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e0816d2f6b75..b674f6b4be3c 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2541,6 +2541,7 @@ static void process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	struct xhci_slot_ctx *slot_ctx;
 	u32 trb_comp_code;
 	u32 remaining, requested, ep_trb_len;
+	u32 ep_index = TRB_TO_EP_ID(le32_to_cpu(event->flags)) - 1;
 
 	slot_ctx = xhci_get_slot_ctx(xhci, ep->vdev->out_ctx);
 	trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->transfer_len));
@@ -2581,6 +2582,7 @@ static void process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		xhci_handle_halted_endpoint(xhci, ep, td, EP_SOFT_RESET);
 		return;
 	case COMP_STALL_ERROR:
+		dev_err(&td->urb->dev->dev, "XXX ep %d is now EP_STALLED\n", ep_index);
 		ep->ep_state |= EP_STALLED;
 		break;
 	default:
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 83a4adf57bae..9066e1b26ceb 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1607,7 +1607,7 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
 
 	/* Class driver might not be aware ep halted due to async URB giveback */
 	if (*ep_state & EP_STALLED)
-		dev_dbg(&urb->dev->dev, "URB %p queued before clearing halt\n",
+		dev_err(&urb->dev->dev, "XXX URB %p queued before clearing halt\n",
 			urb);
 
 	switch (usb_endpoint_type(&urb->ep->desc)) {
@@ -1772,6 +1772,11 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 
 	/* In these cases no commands are pending but the endpoint is stopped */
 	if (ep->ep_state & (EP_CLEARING_TT | EP_STALLED)) {
+		struct xhci_ep_ctx *ep_ctx = xhci_get_ep_ctx(xhci, vdev->out_ctx, ep_index);
+		int ctx_state = GET_EP_CTX_STATE(ep_ctx);
+		if (ctx_state != EP_STATE_STOPPED)
+			dev_err(&urb->dev->dev, "XXX WTF ep_state %d ctx_state %d\n", ep->ep_state, ctx_state);
+
 		/* and cancelled TDs can be given back right away */
 		xhci_dbg(xhci, "Invalidating TDs instantly on slot %d ep %d in state 0x%x\n",
 				urb->dev->slot_id, ep_index, ep->ep_state);
@@ -3211,6 +3216,8 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 
 	spin_lock_irqsave(&xhci->lock, flags);
 
+	if (ep->ep_state & EP_STALLED)
+		dev_err(&udev->dev, "XXX ep %d no longer EP_STALLED\n", ep_index);
 	ep->ep_state &= ~EP_STALLED;
 
 	/* Bail out if toggle is already being cleared by a endpoint reset */

  reply	other threads:[~2025-04-06  7:50 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-03 18:02 xhci: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state Paul Menzel
2025-04-04 14:26 ` Mathias Nyman
2025-04-04 14:29   ` Paul Menzel
2025-04-05  5:23   ` Paul Menzel
2025-04-05 22:23     ` Michał Pecio
2025-04-06  2:40       ` Alan Stern
2025-04-06  7:50         ` Michał Pecio [this message]
2025-04-06 15:50           ` Michał Pecio
2025-04-06 19:26             ` Alan Stern
2025-04-07  5:49               ` Michał Pecio
2025-04-07 16:11                 ` Alan Stern
2025-04-08 10:18                   ` [PATCH RFC RFT] usb: hcd: Add a usb_device argument to hc_driver.endpoint_reset() Michał Pecio
2025-04-08 13:55                     ` Mathias Nyman
2025-04-09 10:18                       ` Michał Pecio
2025-04-09 14:13                         ` Alan Stern
2025-04-15  8:38                           ` Michał Pecio
2025-04-09 13:03                     ` kernel test robot
2025-04-09 13:14                     ` kernel test robot
2025-04-07  7:15         ` xhci: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state Mathias Nyman
2025-04-05  6:43 ` Michał Pecio
2025-04-05  7:36   ` Paul Menzel
2025-04-05  9:49     ` Michał Pecio
2025-04-05 14:08       ` Paul Menzel
2025-04-05 18:13         ` Paul Menzel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250406095008.0dbfd586@foxbook \
    --to=michal.pecio@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@intel.com \
    --cc=mathias.nyman@linux.intel.com \
    --cc=pmenzel@molgen.mpg.de \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.