Linux USB
 help / color / mirror / Atom feed
From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
To: Wesley Cheng <quic_wcheng@quicinc.com>,
	Thinh Nguyen <Thinh.Nguyen@synopsys.com>,
	Wesley Cheng <wcheng@codeaurora.org>,
	"balbi@kernel.org" <balbi@kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Cc: "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"quic_jackp@quicinc.com" <quic_jackp@quicinc.com>
Subject: Re: [RFC PATCH v2 1/3] usb: dwc3: Flush pending SETUP data during stop active xfers
Date: Fri, 25 Mar 2022 01:51:56 +0000	[thread overview]
Message-ID: <4b91bf76-02e5-c680-862a-1ef4ed53eea4@synopsys.com> (raw)
In-Reply-To: <6d3470af-5d75-a989-dc21-00f42c022af0@quicinc.com>

Wesley Cheng wrote:
> Hi Thinh,
> 
> On 3/23/2022 5:39 PM, Thinh Nguyen wrote:
>> Hi Wesley,
>>
>> Wesley Cheng wrote:
>>> Hi Thinh,
>>>
>>> On 3/22/2022 6:31 PM, Thinh Nguyen wrote:
>>>> Hi Wesley,
>>>>
>>>> Wesley Cheng wrote:
>>>>> Hi Thinh,
>>>>>
>>>>> On 2/15/2022 4:08 PM, Wesley Cheng wrote:
>>>>>> While running the pullup disable sequence, if there are pending SETUP
>>>>>> transfers stored in the controller, then the ENDTRANSFER commands on non
>>>>>> control eps will fail w/ ETIMEDOUT.  As a suggestion from SNPS, in order
>>>>>> to drain potentially cached SETUP packets, SW needs to issue a
>>>>>> STARTTRANSFER command.  After issuing the STARTTRANSFER, and retrying the
>>>>>> ENDTRANSFER, the command should succeed.  Else, if the endpoints are not
>>>>>> properly stopped, the controller halt sequence will fail as well.
>>>>>>
>>>>>> One limitation is that the current logic will drop the SETUP data
>>>>>> being received (ie dropping the SETUP packet), however, it should be
>>>>>> acceptable in the pullup disable case, as the device is eventually
>>>>>> going to disconnect from the host.
>>>>>>
>>>>>> Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
>>>>>> ---
>>>>> Just wondering if you had any inputs for this particular change?  I
>>>>> think on the dequeue path discussion you had some concerns with parts of
>>>>> this change?  I think the difficult part for the pullup disable path is
>>>>> that we have this API running w/ interrupts disabled, so the EP0 state
>>>>> won't be able to advance compared to the dequeue case.
>>>>
>>>> This doesn't sound right. The pullup disable (or device initiated
>>>> disconnect) should wait for the EP0 state to be EP0_SETUP_PHASE before
>>>> proceeding, which it does.
>>>>
>>> That is correct, but even though that check is passed, it doesn't
>>> prevent the host from sending another SETUP token between the pending
>>> setup packet check and run/stop clear.
>>>
>>
>> That should be fine, because we would have the TRB ready for that SETUP
>> packet.
>>
> I agree, its valid for the controller to be able to receive the next
> setup packet.
> 
>>>>>
>>>>> Ideally, if there is a way to ensure that we avoid reading further setup
>>>>> packets, that would be nice, but from our discussions with Synopsys,
>>>>> this was not possible. (controller is always armed and ready to ACK
>>>>> setup tokens)
>>>>>
>>>>> I did evaluate keeping IRQs enabled and periodically releasing/attaining
>>>>> the lock between the stop active xfer calls, but that opened another can
>>>>> of worms.  If you think this is the approach we should take, I can take
>>>>> a look at this implementation further.
>>>>>
>>>>
>>>> This patch doesn't look right to me. The change I suggested before
>>>> should address this (I believe Greg already picked it up). What other
>>>> problem do you see, I'm not clear what's the problem here. One potential
>>>> problem that I can see is that currently dwc3 driver doesn't wait for
>>>> active endpoints to complete/end before clearing the run_stop bit on
>>>> device initiated disconnect, but I'm not sure if that's what you're seeing.
>>>>
>>>> Please help clarify further. If there's trace points log, that'd help.
>>>>
>>> Main difference between the issue Greg recently pulled in and this one
>>> is that this is on the pullup disable patch (no dequeue involved).  In
>>> this situation we will also stop active transfers, so that the
>>> controller can halt properly.
>>>
>>> I attached a few files, and will give a summary of them below:
>>> 1.  pullup_disable_timeout.txt - ftrace of an instance of when we see
>>> several endxfer timeouts.  Refer to line#2016.
>>>
>>> 2.  lecroy+ftrace_snip.png - picture showing a matching bus sniffer log
>>> and a ftrace collected (difference instance to #1)
>>>
>>> #2 will show that we completed a SETUP transfer before entering into
>>> dwc3_gadget_stop_active_transfers().  In here, we then see DWC ACK
>>> another SETUP token, which leads to endxfer timeouts on all subsequent
>>> endpoints.
>>
>> Thanks for the captures. I think the problem here is because the dwc3
>> driver disables the control endpoint. It shouldn't do that.
>>
>> Can you try this:
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index ab725d2262d6..f0b9ea353620 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -1010,12 +1010,12 @@ static int __dwc3_gadget_ep_disable(struct
>> dwc3_ep *dep)
>>         if (dep->flags & DWC3_EP_STALL)
>>                 __dwc3_gadget_ep_set_halt(dep, 0, false);
>>
>> -       reg = dwc3_readl(dwc->regs, DWC3_DALEPENA);
>> -       reg &= ~DWC3_DALEPENA_EP(dep->number);
>> -       dwc3_writel(dwc->regs, DWC3_DALEPENA, reg);
>> -
>> -       /* Clear out the ep descriptors for non-ep0 */
>>         if (dep->number > 1) {
>> +               reg = dwc3_readl(dwc->regs, DWC3_DALEPENA);
>> +               reg &= ~DWC3_DALEPENA_EP(dep->number);
>> +               dwc3_writel(dwc->regs, DWC3_DALEPENA, reg);
>> +
>> +               /* Clear out the ep descriptors for non-ep0 */
>>                 dep->endpoint.comp_desc = NULL;
>>                 dep->endpoint.desc = NULL;
>>         }
>>
> I was able to reproduce the endxfer timeout w/ the above change.  I'm
> assuming you didn't want me to include any parts of my change while
> testing, right?
> 

Right, please don't add additional code. I can't review knowing what
else was changed.

> Current sequence in dwc3_gadget_pullup(0) is that we should call
> dwc3_stop_active_transfers() before we call __dwc3_gadget_stop().
> (gadget stop will call the ep disable for EP[0] and EP[1])  This might
> be why the issue would still be occurring.
> 
> The attached ftrace that captures a situation where a forced BUG will
> occur on the first instance of the timeout.
> 
> [ 1573.687689437       0x724885c22]   event (000020c0): ep0out: Transfer
> Not Ready [0] (Not Active) [Status Phase]
> [ 1573.687738707       0x724885fd3]   dbg_send_ep_cmd: ep0out: cmd
> 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
> Successful
> [ 1573.698040582       0x7248b6477]   dbg_send_ep_cmd: ep1out: cmd 'End
> Transfer' [20c08] params 00000000 00000000 00000000 --> status: Timed Out
> 
> We definitely started the STATUS phase, so host could have read it and
> sent the next SETUP packet while we were already in the
> dwc3_stop_active_transfers() loop.

Yeah, pullup() is another async call that can trigger the same problem.



[ 1573.687193134       0x7248836e8]   dbg_send_ep_cmd: ep0out: cmd 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status: Successful
[ 1573.687263291       0x724883c2b]   event (0000c040): ep0out: Transfer Complete (sIL) [Setup Phase]
[ 1573.687287145       0x724883df6]   dbg_trace_log_ctrl: Clear Interface Feature(Function Suspend)

Trace indicates that this is ClearFeature(Interface.FuncSuspend)


[ 1573.687372301       0x724884459]   dbg_send_ep_cmd: ep0in: cmd 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status: Successful
[ 1573.687400634       0x724884679]   event (000010c2): ep0in: Transfer Not Ready [0] (Not Active) [Data Phase]
[ 1573.687471155       0x724884bc2]   event (000090c2): ep0in: Transfer Not Ready [0] (Active) [Data Phase]
[ 1573.687495009       0x724884d8d]   event (0000c042): ep0in: Transfer Complete (sIL) [Data Phase]
[ 1573.687544697       0x724885146]   dbg_gadget_giveback: ep0out: req ffffff87a6768800 length 20/20 ZsI ==> 0

But there's data stage. Is this non-standard/vendor request?
We probably need to fix the tracepoint to properly print vendor
requests.


[ 1573.687593187       0x7248854e9]   dbg_ep_queue: ep5in: req ffffff87a6769000 length 0/8 zsI ==> -115
[ 1573.687619645       0x7248856e6]   dbg_prepare: ep5in: trb ffffffc05a97d1a0 (E27:D26) buf 00000000ee07f600 size 1x 8 ctrl 00000811 (Hlcs:sC:normal)
[ 1573.687660270       0x7248859f2]   dbg_send_ep_cmd: ep5in: cmd 'Update Transfer' [b0007] params 00000000 00000000 00000000 --> status: Successful
[ 1573.687689437       0x724885c22]   event (000020c0): ep0out: Transfer Not Ready [0] (Not Active) [Status Phase]
[ 1573.687738707       0x724885fd3]   dbg_send_ep_cmd: ep0out: cmd 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status: Successful

It's still in status stage, it never reached Setup phase.


[ 1573.698040582       0x7248b6477]   dbg_send_ep_cmd: ep1out: cmd 'End Transfer' [20c08] params 00000000 00000000 00000000 --> status: Timed Out

That's why it's possible to time out here. The Setup Phase should
have been prepared before we can issue End Transfer.

I can't see where the pullup() begins as this log is missing register
read/write trace events. Looks like the wait_for_completion_timeout()
is insufficient. I assume it happens right after previous Setup
phase. There's no spin_lock, so the driver can proceed with the
new control request before it reaches dwc3_stop_active_transfers().

Can you try this. (yes, the code looks ugly right now, but it's just
to confirm my suspicion)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index ab725d2262d6..3b40411556ff 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2506,6 +2506,8 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
 
        is_on = !!is_on;
        dwc->softconnect = is_on;
+
+retry:
        /*
         * Per databook, when we want to stop the gadget, if a control transfer
         * is still in process, complete it and get the core into setup phase.
@@ -2541,6 +2543,24 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
                return 0;
        }
 
+       if (!is_on) {
+               spin_lock_irqsave(&dwc->lock, flags);
+               if (dwc->ep0state != EP0_SETUP_PHASE) {
+                       spin_unlock_irqrestore(&dwc->lock, flags);
+                       goto retry;
+               }
+
+               /*
+                * In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a
+                * Section 4.1.8 Table 4-7, it states that for a device-initiated
+                * disconnect, the SW needs to ensure that it sends "a DEPENDXFER
+                * command for any active transfers" before clearing the RunStop
+                * bit.
+                */
+               dwc3_stop_active_transfers(dwc);
+               spin_unlock_irqrestore(&dwc->lock, flags);
+       }
+
        /*
         * Synchronize and disable any further event handling while controller
         * is being enabled/disabled.
@@ -2553,14 +2573,6 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
                u32 count;
 
                dwc->connected = false;
-               /*
-                * In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a
-                * Section 4.1.8 Table 4-7, it states that for a device-initiated
-                * disconnect, the SW needs to ensure that it sends "a DEPENDXFER
-                * command for any active transfers" before clearing the RunStop
-                * bit.
-                */
-               dwc3_stop_active_transfers(dwc);
                __dwc3_gadget_stop(dwc);
 
                /*

Thanks,
Thinh

  reply	other threads:[~2022-03-25  1:52 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-16  0:08 [RFC PATCH v2 0/3] Fix enumeration issues during composition switching Wesley Cheng
2022-02-16  0:08 ` [RFC PATCH v2 1/3] usb: dwc3: Flush pending SETUP data during stop active xfers Wesley Cheng
2022-03-22 18:21   ` Wesley Cheng
2022-03-23  1:31     ` Thinh Nguyen
     [not found]       ` <469213d0-526a-e26e-88c2-b34b2aa8dfe7@codeaurora.org>
2022-03-24  0:39         ` Thinh Nguyen
2022-03-24  2:19           ` Thinh Nguyen
2022-03-24 18:53           ` Wesley Cheng
2022-03-25  1:51             ` Thinh Nguyen [this message]
2022-03-25 18:35               ` Wesley Cheng
2022-04-01 19:31               ` Wesley Cheng
2022-04-04 23:32                 ` Thinh Nguyen
2022-02-16  0:08 ` [RFC PATCH v2 2/3] usb: dwc3: gadget: Wait for ep0 xfers to complete during dequeue Wesley Cheng
2022-02-16  0:08 ` [RFC PATCH v2 3/3] usb: dwc3: Issue core soft reset before enabling run/stop Wesley Cheng
2022-02-17  5:52   ` Jung Daehwan
2022-02-17  9:43     ` Jung Daehwan
2022-02-17 19:01     ` Wesley Cheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4b91bf76-02e5-c680-862a-1ef4ed53eea4@synopsys.com \
    --to=thinh.nguyen@synopsys.com \
    --cc=balbi@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=quic_jackp@quicinc.com \
    --cc=quic_wcheng@quicinc.com \
    --cc=wcheng@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox