linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Felipe Balbi <balbi@kernel.org>
To: John Stultz <john.stultz@linaro.org>
Cc: lkml <linux-kernel@vger.kernel.org>,
	Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com>,
	Yang Fei <fei.yang@intel.com>, Thinh Nguyen <thinhn@synopsys.com>,
	Tejas Joglekar <tejas.joglekar@synopsys.com>,
	Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
	Jack Pham <jackp@codeaurora.org>, Todd Kjos <tkjos@google.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	Linux USB List <linux-usb@vger.kernel.org>,
	stable <stable@vger.kernel.org>
Subject: Re: [RFC][PATCH 1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields
Date: Fri, 24 Jan 2020 09:45:22 +0200	[thread overview]
Message-ID: <87iml147sd.fsf@kernel.org> (raw)
In-Reply-To: <CALAqxLUeBf2Jx2tLW1yzJk6JHM0RP9cJbTt7m19Qdz-rWMw2mQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 8333 bytes --]


Hi,

John Stultz <john.stultz@linaro.org> writes:

> On Wed, Jan 22, 2020 at 11:23 PM Felipe Balbi <balbi@kernel.org> wrote:
>> > From: Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com>
>> >
>> > The present code in dwc3_gadget_ep_reclaim_completed_trb() will check
>> > for IOC/LST bit in the event->status and returns if IOC/LST bit is
>> > set. This logic doesn't work if multiple TRBs are queued per
>> > request and the IOC/LST bit is set on the last TRB of that request.
>> > Consider an example where a queued request has multiple queued TRBs
>> > and IOC/LST bit is set only for the last TRB. In this case, the Core
>> > generates XferComplete/XferInProgress events only for the last TRB
>> > (since IOC/LST are set only for the last TRB). As per the logic in
>> > dwc3_gadget_ep_reclaim_completed_trb() event->status is checked for
>> > IOC/LST bit and returns on the first TRB. This makes the remaining
>> > TRBs left unhandled.
>> > To aviod this, changed the code to check for IOC/LST bits in both
>>      avoid
>>
>> > event->status & TRB->ctrl. This patch does the same.
>>
>> We don't need to check both. It's very likely that checking the TRB is
>> enough.
>
> Sorry, just to clarify, are you suggesting instead of:
> -       if (event->status & DEPEVT_STATUS_IOC)
> +       if ((event->status & DEPEVT_STATUS_IOC) &&
> +           (trb->ctrl & DWC3_TRB_CTRL_IOC))
>
>
> We do something like:
> -       if (event->status & DEPEVT_STATUS_IOC)
> +       if (trb->ctrl & DWC3_TRB_CTRL_IOC)
> +               return 1;
> +
> +       if (trb->ctrl & DWC3_TRB_CTRL_LST)
>                 return 1;
>
> ?

that's correct. In hindsight, I have no idea why I used the
event->status here since all other checks are done against the TRB
only.

>> > At a practical level, this patch resolves USB transfer stalls seen
>> > with adb on dwc3 based HiKey960 after functionfs gadget added
>> > scatter-gather support around v4.20.
>>
>> Right, I remember asking for tracepoint data showing this problem
>> happening. It's the best way to figure out what's really going on.
>>
>> Before we accept these two patches, could you collect dwc3 tracepoint
>> data and share here?
>
> Sure. Attached is trace logs and regdumps for hikey960.

Thanks

> The one gotcha with the logs is that in the working case (with this
> patch applied), I booted with the usb-c cable disconnected (as
> suggested in the dwc3.rst doc), enabled tracing and plugged in the
> device, then ran adb logcat a few times to validate no stalls.
>
> In the failure case (without this patch), I booted with the usb-c
> cable disconnected, enabled tracing and then when I plugged in the
> device, it never was detected by adb (it seems perhaps the problem had
> already struck?).

You never got a Reset Interrupt, so something else is going on. I
suggest putting a sniffer and first making sure the host *does* drive
reset signalling. Second step would be to look at your phy
configuration. Is it going in suspend for any reason? Might want to try
our snps,dis_u3_susphy_quirk and snps,dis_u2_susphy_quirk flags.

> So I generated the failure2 log by booting with USB-C plugged in,
> enabling tracing, and running adb logcat on the host to observe the
> stall.

Thank you. Here's a quick summary of what's in failure2:

There is a series of 24-byte transfers on ep1out and that's the one
which shows a problem. We can clearly see that adb is issuing one
transfer at a time, only enqueueing transfer n+1 when transfer n is
completed and given back, so we see a series of similar blocks:

- dwc3_alloc_request
- dwc3_ep_queue
- dwc3_prepare_trb
- dwc3_prepare_trb (for the chained bit)
- dwc3_gadget_ep_cmd (update transfer)
- dwc3_event (transfer in progress)
- dwc3_complete_trb
- dwc3_complete_trb (for the chained bit)
- dwc3_gadget_giveback
- dwc3_free_request

So this works for several iterations. Note, however, that the TRB
addresses don't really make sense. DWC3 allocates a contiguous block of
memory to server as TRB pool, but we see non-consecutive addresses on
these TRBs. I'm assuming there's an IOMMU in your system.

Anyway, the failing point is here:

>          adbd-461   [002] d..1    49.855992: dwc3_alloc_request: ep1out: req 000000004e6eaaba length 0/0 zsI ==> 0
>          adbd-461   [002] d..2    49.855994: dwc3_ep_queue: ep1out: req 000000004e6eaaba length 0/24 zsI ==> -115
>          adbd-461   [002] d..2    49.855996: dwc3_prepare_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 24 ctrl 0000001d (HlCS:sc:normal)
>          adbd-461   [002] d..2    49.855997: dwc3_prepare_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>          adbd-461   [002] d..2    49.856003: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
>   irq/65-dwc3-498   [000] d..1    53.902752: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
>   irq/65-dwc3-498   [000] d..1    53.902763: dwc3_complete_trb: ep1out: trb 00000000bae39b48 buf 000000009eb0b100 size 0 ctrl 0000001c (hlCS:sc:normal)
>   irq/65-dwc3-498   [000] d..1    53.902769: dwc3_complete_trb: ep1out: trb 000000009093a074 buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>   irq/65-dwc3-498   [000] d..1    53.902781: dwc3_gadget_giveback: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
> kworker/u16:0-7     [000] ....    53.903020: dwc3_free_request: ep1out: req 000000004e6eaaba length 24/24 zsI ==> 0
>          adbd-461   [002] d..1    53.903273: dwc3_alloc_request: ep1out: req 00000000c769beab length 0/0 zsI ==> 0
>          adbd-461   [002] d..2    53.903285: dwc3_ep_queue: ep1out: req 00000000c769beab length 0/24 zsI ==> -115
>          adbd-461   [002] d..2    53.903292: dwc3_prepare_trb: ep1out: trb 00000000f0ffa827 buf 000000009eb11e80 size 24 ctrl 0000001d (HlCS:sc:normal)
>          adbd-461   [002] d..2    53.903296: dwc3_prepare_trb: ep1out: trb 00000000d6a9892a buf 0000000217da8000 size 488 ctrl 00000819 (HlcS:sC:normal)
>          adbd-461   [002] d..2    53.903315: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful

Note that this transfer, after started, took 4 seconds to complete,
while all others completed within a few ms. There's no real reason for
this visible from dwc3 driver itself. What follows, is a transfer that
never completed.

The only thing I can come up with, is that we starve the TRB ring, by
continuously reclaiming a single TRB. We have 255 usable TRBs, so after
a few iterations, we would see a stall due to starved TRB ring.

There is a way to verify this by tracking trb_enqueue and trb_dequeue,
if you're willing to do that, that'll help us prove that this is really
the problem and, since current tracepoints doen't really show that
information, it may be a good idea to add this information to
dwc3_log_trb tracepoint class. Something like below should be enough,
could you re-run the test of failure2 with this patch applied?

drivers/usb/dwc3/trace.h | 9 +++++++--

modified   drivers/usb/dwc3/trace.h
@@ -227,6 +227,8 @@ DECLARE_EVENT_CLASS(dwc3_log_trb,
 		__field(u32, size)
 		__field(u32, ctrl)
 		__field(u32, type)
+		__field(u32, enqueue)
+		__field(u32, dequeue)
 	),
 	TP_fast_assign(
 		__assign_str(name, dep->name);
@@ -236,9 +238,12 @@ DECLARE_EVENT_CLASS(dwc3_log_trb,
 		__entry->size = trb->size;
 		__entry->ctrl = trb->ctrl;
 		__entry->type = usb_endpoint_type(dep->endpoint.desc);
+		__entry->enqueue = dep->trb_enqueue
+		__entry->dequeue = dep->trb_dequeue
 	),
-	TP_printk("%s: trb %p buf %08x%08x size %s%d ctrl %08x (%c%c%c%c:%c%c:%s)",
-		__get_str(name), __entry->trb, __entry->bph, __entry->bpl,
+	TP_printk("%s: trb %p (E%d:D%d) buf %08x%08x size %s%d ctrl %08x (%c%c%c%c:%c%c:%s)",
+		__get_str(name), __entry->trb, __entry->enqueue,
+		__entry->dequeue, __entry->bph, __entry->bpl,
 		({char *s;
 		int pcm = ((__entry->size >> 24) & 3) + 1;
 		switch (__entry->type) {

> Anyway, all three sets of logs are included. Let me know if you need
> me to try anything else.

Thanks for doing this

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  parent reply	other threads:[~2020-01-24  7:44 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-22 22:26 [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs John Stultz
2020-01-22 22:26 ` [RFC][PATCH 1/2] usb: dwc3: gadget: Check for IOC/LST bit in both event->status and TRB->ctrl fields John Stultz
2020-01-23  7:24   ` Felipe Balbi
2020-01-23 18:54     ` Yang, Fei
     [not found]     ` <CALAqxLUeBf2Jx2tLW1yzJk6JHM0RP9cJbTt7m19Qdz-rWMw2mQ@mail.gmail.com>
2020-01-24  7:45       ` Felipe Balbi [this message]
2020-01-24 22:10         ` John Stultz
2020-01-25 11:02           ` Felipe Balbi
2020-01-27 18:48             ` John Stultz
2020-01-22 22:26 ` [RFC][PATCH 2/2] usb: dwc3: gadget: Correct the logic for finding last SG entry John Stultz
2020-01-23  7:25   ` Felipe Balbi
2020-01-23 15:50     ` Anurag Kumar Vulisha
2020-01-23  7:18 ` [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs Felipe Balbi
2020-01-23  8:43 ` Andrzej Pietrasiewicz
2020-01-23 16:29   ` Yang, Fei
2020-01-23 17:31     ` Felipe Balbi
2020-01-23 17:37       ` Yang, Fei
2020-01-23 17:46         ` Felipe Balbi
2020-01-23 18:28           ` Yang, Fei
2020-02-05 21:03           ` John Stultz
2020-02-06  6:23             ` Felipe Balbi
2020-02-06  7:40             ` Christoph Hellwig
2020-02-06 18:29               ` Felipe Balbi
2020-02-06 18:41                 ` Christoph Hellwig
2020-02-07  6:00                   ` Felipe Balbi
2020-01-23 19:58       ` John Stultz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87iml147sd.fsf@kernel.org \
    --to=balbi@kernel.org \
    --cc=andrzej.p@collabora.com \
    --cc=anurag.kumar.vulisha@xilinx.com \
    --cc=fei.yang@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jackp@codeaurora.org \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tejas.joglekar@synopsys.com \
    --cc=thinhn@synopsys.com \
    --cc=tkjos@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).