All of lore.kernel.org
 help / color / mirror / Atom feed
From: zwisler@google.com
To: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"kernel@collabora.com" <kernel@collabora.com>
Subject: Re: xhci problem -> general protection fault
Date: Thu, 1 Oct 2020 10:43:52 -0600	[thread overview]
Message-ID: <20201001164352.GA13249@google.com> (raw)
In-Reply-To: <7e38c533-6ea1-63a6-fc92-2ecef7ee1f84@linux.intel.com>

On Tue, Sep 29, 2020 at 01:35:31AM +0300, Mathias Nyman wrote:
<>
> The race I was referring to is if a driver issues a "Stop endpoint" command,
> and it races with an endpoint error/halt initiated by the xHC controller. 
> 
> The additional note in xhci 4.6.9 - Stop Endpoint, explains it:
> "Note: A Busy endpoint may asynchronously transition from the Running to the Halted
> or Error state due to error conditions detected while processing TRBs. A possible
> race condition may occur if software, thinking an endpoint is in the Running state,
> issues a Stop Endpoint Command however at the same time the xHC
> asynchronously transitions the endpoint to the Halted or Error state. In this case,
> a Context State Error may be generated for the command completion. Software
> may verify that this case occurred by inspecting the EP State for Halted or Error
> when a Stop Endpoint Command results in a Context State Error."
> 
> There are several context state errors in your trace.
> 
> Thanks
> -Mathias

Interestingly it looks like it's the actions that we take at the end of
xhci_handle_cmd_set_deq() for the broken command which break the HC.
Specifically, this line:

	dev->eps[ep_index].ep_state &= ~SET_DEQ_PENDING;

If I skip this line when I notice that ep_ctx->deq==0, the system will keep
running happily.

Here is a trace and dmesg for a run with the patch at the bottom of this mail.
I trimmed the trace a bit since it was very large, but I think I've left the
important bits intact:

https://gist.github.com/rzwisler/422e55321d9d2db5fc258d6d5b93d018

I've been able to run with this patch and survive through many "Mismatch"
occurrences, both with ep_ctx->deq set to 0 and set to some other value which
just seems to be wrong.

It seems like there are a few other places where we notice that we're in a bad
state, and we just bail, specifically these in xhci_queue_new_dequeue_state():

	addr = xhci_trb_virt_to_dma(deq_state->new_deq_seg,
				    deq_state->new_deq_ptr);
	if (addr == 0) {
		xhci_warn(xhci, "WARN Cannot submit Set TR Deq Ptr\n");
		xhci_warn(xhci, "WARN deq seg = %px, deq pt = %px\n",
			  deq_state->new_deq_seg, deq_state->new_deq_ptr);
		return;
	}
	ep = &xhci->devs[slot_id]->eps[ep_index];
	if ((ep->ep_state & SET_DEQ_PENDING)) {
		xhci_warn(xhci, "WARN Cannot submit Set TR Deq Ptr\n");
		xhci_warn(xhci, "A Set TR Deq Ptr command is pending.\n");
		return;
	}

Is noticing that the HC has given us bad data via the "Mismatch" check in
xhci_handle_cmd_set_deq() and bailing out enough, or should we figure out
exactly why the HC is getting into a bad state?

I'm happy to gather logs with more debug or run other experiments, if that
would be helpful.  As it is I don't really know how to debug the internal
state of the HC further, but hopefully the knowledge that the patch below
makes a difference will help us move forward.

--- >8 ---
From 1642f6f375d4822919fb03a56ce8c9186f58d6f7 Mon Sep 17 00:00:00 2001
From: Ross Zwisler <zwisler@google.com>
Date: Wed, 30 Sep 2020 17:45:24 -0600
Subject: [PATCH] DEBUG: avoid xhci crash

---
 drivers/usb/host/xhci-ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 2c255d0620b05..379570facbc9e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1142,6 +1142,8 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
 			xhci_warn(xhci, "Mismatch between completed Set TR Deq Ptr command & xHCI internal state.\n");
 			xhci_warn(xhci, "ep deq seg = %p, deq ptr = %p\n",
 				  ep->queued_deq_seg, ep->queued_deq_ptr);
+			xhci_warn(xhci, "calculated deq:%#llx  real deq:%#llx\n", xhci_trb_virt_to_dma(ep->queued_deq_seg, ep->queued_deq_ptr), deq);
+			return;
 		}
 	}
 
-- 
2.28.0.709.gb0816b6eb0-goog


  reply	other threads:[~2020-10-01 16:43 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-17 15:30 xhci problem -> general protection fault Andrzej Pietrasiewicz
2020-09-18 10:50 ` Mathias Nyman
2020-09-18 14:20   ` Andrzej Pietrasiewicz
2020-09-25 13:40     ` Mathias Nyman
2020-09-25 21:05       ` Ross Zwisler
2020-09-28 13:32         ` Andrzej Pietrasiewicz
2020-09-29  7:13           ` Mathias Nyman
2020-10-01 14:13             ` Andrzej Pietrasiewicz
2020-09-28 22:35         ` Mathias Nyman
2020-10-01 16:43           ` zwisler [this message]
2020-10-12 19:20             ` Mathias Nyman
2020-10-12 21:53               ` zwisler
2020-10-13  7:49                 ` Mathias Nyman
2020-10-13  8:29                   ` Andrzej Pietrasiewicz
2020-10-13 16:44                     ` zwisler
2020-11-19 16:52                   ` Ross Zwisler
2020-11-23 15:06                     ` Mathias Nyman
2020-12-02 22:59                       ` Ross Zwisler
2020-12-04 18:07                         ` Mathias Nyman
2020-12-08 17:24                           ` Ross Zwisler
2020-12-09 13:11                             ` Mathias Nyman
2020-12-09 18:54                               ` Ross Zwisler
2020-12-30 12:33                                 ` Mathias Nyman
2021-01-06 18:52                                   ` Ross Zwisler
2021-01-07  8:57                                     ` Mathias Nyman
2021-01-07 16:07                                       ` Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201001164352.GA13249@google.com \
    --to=zwisler@google.com \
    --cc=andrzej.p@collabora.com \
    --cc=kernel@collabora.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.