* Mass Storage Gadget Device Falls from SuperSpeed to High Speed
@ 2019-04-02 7:48 ` Rob Weber
2019-04-02 8:13 ` Felipe Balbi
0 siblings, 1 reply; 6+ messages in thread
From: Rob Weber @ 2019-04-02 7:48 UTC (permalink / raw)
To: Felipe Balbi; +Cc: linux-usb, mathias.nyman
Hi Felipe,
On Tue, Apr 02, 2019 at 08:46:16AM +0300, Felipe Balbi wrote:
> Felipe Balbi <felipe.balbi@linux.intel.com> writes:
> >>> modphy is the USB PHY integrated in your SoC. There's no control for
> >>> that from OS side, only BIOS unfortunately. There is, however, one thing
> >>> we can try. DWC3 has several quirk flags for known quirky PHYs; perhaps
> >>> CHT needs one of those. Can you try with this patch and let me know
> >>> whether it helps?
> >>
> >> Sure thing, I will try tomorrow. Could you possibly explain what a quirk
> >> is as it relates to the kernel? I see this all over the source tree but
> >> never knew how it was used. Does the dwc3 also know about "quirks" and
> >> these particular flags? or are these flags just specific to the kernel
> >> and its functionality?
> >>
> >>> modified drivers/usb/dwc3/dwc3-pci.c
> >>> @@ -105,6 +105,8 @@ static int dwc3_byt_enable_ulpi_refclock(struct pci_dev *pci)
> >>> static const struct property_entry dwc3_pci_intel_properties[] = {
> >>> PROPERTY_ENTRY_STRING("dr_mode", "peripheral"),
> >>> PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
> >>> + PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
> >>> + PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
> >>> {}
> >>> };
> >>>
> >>> These two quirks will PHY suspend. There are other relevant quirk flags
> > ^^^^^^^^
> > will DISABLE phy suspend :-)
> >
> >> What do you mean by PHY suspend? Will it disable U2/U3 for the dwc3? I
> >
> > No, no. DWC3 can still enter U1/U2/U3, but the PHY will not enter the
> > matching P1/P2/P3 state.
> >
> >> see it modified the DWC3_GUSB2PHYCFG_SUSPHY bit in the configuration
> >> register, but I don't have access to the dwc3 databook to dig deeper
> >> into this.
> >
> > The second quirk flag (snps,dis_u2_susphy_quirk) will tell dwc3 to *NOT*
> > request USB2 PHY to enter low power state. While the first flag
> > (snps,dis_u3_susphy_quirk) will tell dwc3 to *NOT* request USB3 PHY to
> > enter low power state.
> >
> > In reality, they are a single block of HW but they _can_ be different
> > and even if they are a single block, they _can_ have separate clock
> > domains and we _may_ be able to control them separately. I haven't read
> > documentation for modphy because the OS doesn't touch that. If
> > necessary, I can try to find out more details about it, but I will
> > probably take some time to find the correct documentation.
> >
> >>> which we can try in case these two don't help. I'd like to figure out
> >>> exactly which quirk flag helps (if any). After that, we would need to
> >>> check if a similar problem happens on any CHT system or just your
> >>> design.
> >>>
> >>> If it happens on any other system, then I can make sure we add a quirk
> >>> flag to all CHTs.
> >>
> >> Sounds good!
> >>
> >> Thanks for taking the time to answer my questions! It's definitely
> >> helpful for my understanding of USB. I'm learning quite a bit of
> >> new information with each email and it's pretty awesome.
> >
> > No problems at all, happy to help.
>
> Any updates here? Hopefully the quirk flags above helped.
Thanks for following up. My team and I were doing a bunch of testing
today as well as this past weekend. We are still seeing mixed results
with the suggestion you provided to us earlier. I unfortunately do not
have any logs, traces, or analyzer captures to back this up yet, but I
hope to gather more concrete data tomorrow.
There were two main issues we observed with the quirk flags above:
* Host mode no longer worked with these quirks applied. Please keep in
mind that we backported part of a patchset [1] to support switching
the internal SS mux from the xHCI controller to the xDCI controller.
At a quick glance, do you see anything in this patchset that you think
might cause a problem with the role switch driver communicating with
the xHCI controller when the quirks are applied?
* We still saw the drop from SS to HS on a Windows laptop. Linux and Mac
seemed to be fine, but there's definitely a chance we did not test
with enough samples to confidently say this quirk is stable with those
hosts.
So tomorrow I hope to gather more concrete data to back up these
findings. Our USB protocol analyzer is also out-of-order at the moment,
so we might not be able to get analyzer data for a day or two either.
Are there some other quirks you would like us to try out?
Over the weekend, one of our team members was hacking in the dwc3 driver
to try and disable LPM at various points in the chain of events. I've
attached a patch that we believe is pretty stable. We have never seen it
drop from SS to HS. USB type A to type C connections from any host to
our device seem pretty stable. I've attached an archive called
"gnarbox-success-a-to-c.tar.xz" That contains the trace events for a
successful run where we connect the device to a Windows host and it
enumerates at SuperSpeed.
This patch isn't entirely stable as I haven't been able to get a
reliable USB C-to-C connection on any host yet. I do not have any logs
or traces for this particular case and I hope to get them to you soon.
Do you remember the "failed to enable ep0out" error I briefly
mentioned earlier? I was able to capture some trace events while this
occurs. This issue occaisonally occurs when modprobing g_mass_storage.
The exact steps are as follows:
1. switch from host mode to device mode (echo device >
/sys/class/usb_role/.../role)
2. modprobe g_mass_storage ...
3. Try and rest device mode with a host computer, but you'll observe
that device mode is not working.
4. modprobe -r g_mass_storage to remove the gadget driver.
5. switch back to host mode (echo host > /sys/class/usb_role/.../role)
6. switch back to device mode
7. modprobe g_mass_storage ...
8. Observe the error
I've attached the traces and dmesg output in an archive titled
"gnarbox-ep0out-failure.tar.xz". Please let me know if you need more
information about this particular issue.
Lastly, I started digging through the errata available for our SoC
series [2] and there are a few that stand out to me, including CHT40,
CHT47, CHT48, and CHT50. CHT47 is the most concerning because I recall
seeing an "Unknown" LFPS status coming from the device as it is trying
to leave U2. I asked the support specialists on our IPS ticket about
these errata and I'm hoping to hear back from them within the day. Do
any of these errata stand out to you as possible culprits?
Thanks for taking a look at all of this again!
Cheers,
Rob Weber
[1] https://lkml.org/lkml/2018/3/20/387
[2] https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-z8000-spec-update.pdf
From de8b9e63df1e24d1b9402ae13224edbdae80cd5c Mon Sep 17 00:00:00 2001
From: Mike Lemmon <lemmon@gnarbox.com>
Date: Sun, 31 Mar 2019 13:18:12 -0700
Subject: [PATCH] dwc3: Disable U1 and U2 link states
---
drivers/usb/dwc3/core.c | 10 ++++++++++
drivers/usb/dwc3/debug.h | 28 ++++++++++++++++++++++++++
drivers/usb/dwc3/ep0.c | 10 ++++++++++
drivers/usb/dwc3/gadget.c | 51 ++++++++++++++++++++++++++++++++++++++++++++---
4 files changed, 96 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 53b26e978d90..b869ff53ee86 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -641,6 +641,16 @@ static int dwc3_core_init(struct dwc3 *dwc)
goto err0;
}
+ printk(KERN_INFO "GBX: (dwc3) dwc->revision=%x patch=0038-disable-U1-U2\n", dwc->revision);
+
+ /*
+ // Tried overriding to old revision but it just hung on transition
+ // to mass-storage mode...
+ printk(KERN_INFO "GBX: (dwc3) Overriding dwc3 revision from %x to %x",
+ dwc->revision, DWC3_REVISION_173A);
+ dwc->revision = DWC3_REVISION_173A;
+ */
+
/*
* Write Linux Version Code to our GUID register so it's easy to figure
* out which kernel version a bug was found.
diff --git a/drivers/usb/dwc3/debug.h b/drivers/usb/dwc3/debug.h
index 33ab2a203c1b..2385c2222123 100644
--- a/drivers/usb/dwc3/debug.h
+++ b/drivers/usb/dwc3/debug.h
@@ -124,6 +124,34 @@ dwc3_gadget_link_string(enum dwc3_link_state link_state)
}
}
+static inline const char *
+dwc3_dsts_speed_string(u8 speed)
+{
+ static char str[256];
+ switch (speed) {
+ case DWC3_DSTS_SUPERSPEED_PLUS: sprintf(str, "Super-Speed-Plus"); break;
+ case DWC3_DSTS_SUPERSPEED: sprintf(str, "Super-Speed"); break;
+ case DWC3_DSTS_HIGHSPEED: sprintf(str, "High-Speed"); break;
+ case DWC3_DSTS_FULLSPEED: sprintf(str, "Full-Speed"); break;
+ case DWC3_DSTS_LOWSPEED: sprintf(str, "Low-Speed"); break;
+ default: sprintf(str, "UNKNOWN");
+ }
+ return str;
+}
+
+static inline const char *
+dwc3_dr_mode_string(enum usb_dr_mode mode)
+{
+ static char str[256];
+ switch (mode) {
+ case USB_DR_MODE_PERIPHERAL: sprintf(str, "Device-Mode"); break;
+ case USB_DR_MODE_HOST: sprintf(str, "Host-Mode"); break;
+ case USB_DR_MODE_OTG: sprintf(str, "OTG-Mode"); break;
+ default: sprintf(str, "Unsupported Mode (%d)", mode);
+ }
+ return str;
+}
+
/**
* dwc3_gadget_event_string - returns event name
* @event: the event code
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 2331469f943d..8c9bdd20ead3 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -420,12 +420,14 @@ static int dwc3_ep0_handle_feature(struct dwc3 *dwc,
switch (wValue) {
case USB_DEVICE_REMOTE_WAKEUP:
+ dwc3_trace(trace_dwc3_gadget, "GBX: USB_RECIP_DEVICE: USB_DEVICE_REMOTE_WAKEUP");
break;
/*
* 9.4.1 says only only for SS, in AddressState only for
* default control pipe
*/
case USB_DEVICE_U1_ENABLE:
+ dwc3_trace(trace_dwc3_gadget, "GBX: USB_RECIP_DEVICE: USB_DEVICE_U1_ENABLE");
if (state != USB_STATE_CONFIGURED)
return -EINVAL;
if ((dwc->speed != DWC3_DSTS_SUPERSPEED) &&
@@ -441,6 +443,7 @@ static int dwc3_ep0_handle_feature(struct dwc3 *dwc,
break;
case USB_DEVICE_U2_ENABLE:
+ dwc3_trace(trace_dwc3_gadget, "GBX: USB_RECIP_DEVICE: USB_DEVICE_U2_ENABLE");
if (state != USB_STATE_CONFIGURED)
return -EINVAL;
if ((dwc->speed != DWC3_DSTS_SUPERSPEED) &&
@@ -456,9 +459,11 @@ static int dwc3_ep0_handle_feature(struct dwc3 *dwc,
break;
case USB_DEVICE_LTM_ENABLE:
+ dwc3_trace(trace_dwc3_gadget, "GBX: USB_RECIP_DEVICE: USB_DEVICE_LTM_ENABLE");
return -EINVAL;
case USB_DEVICE_TEST_MODE:
+ dwc3_trace(trace_dwc3_gadget, "GBX: USB_RECIP_DEVICE: USB_DEVICE_TEST_MODE");
if ((wIndex & 0xff) != 0)
return -EINVAL;
if (!set)
@@ -478,6 +483,7 @@ static int dwc3_ep0_handle_feature(struct dwc3 *dwc,
}
break;
default:
+ dwc3_trace(trace_dwc3_gadget, "GBX: USB_RECIP_DEVICE: UNKNOWN FEATURE");
return -EINVAL;
}
break;
@@ -485,6 +491,7 @@ static int dwc3_ep0_handle_feature(struct dwc3 *dwc,
case USB_RECIP_INTERFACE:
switch (wValue) {
case USB_INTRF_FUNC_SUSPEND:
+ dwc3_trace(trace_dwc3_gadget, "GBX: USB_RECIP_INTERFACE: USB_INTRF_FUNC_SUSPEND");
if (wIndex & USB_INTRF_FUNC_SUSPEND_LP)
/* XXX enable Low power suspend */
;
@@ -493,6 +500,7 @@ static int dwc3_ep0_handle_feature(struct dwc3 *dwc,
;
break;
default:
+ dwc3_trace(trace_dwc3_gadget, "GBX: USB_RECIP_INTERFACE: UNKNOWN");
return -EINVAL;
}
break;
@@ -500,6 +508,7 @@ static int dwc3_ep0_handle_feature(struct dwc3 *dwc,
case USB_RECIP_ENDPOINT:
switch (wValue) {
case USB_ENDPOINT_HALT:
+ dwc3_trace(trace_dwc3_gadget, "GBX: USB_RECIP_ENDPOINT: USB_ENDPOINT_HALT");
dep = dwc3_wIndex_to_dep(dwc, ctrl->wIndex);
if (!dep)
return -EINVAL;
@@ -510,6 +519,7 @@ static int dwc3_ep0_handle_feature(struct dwc3 *dwc,
return -EINVAL;
break;
default:
+ dwc3_trace(trace_dwc3_gadget, "GBX: USB_RECIP_ENDPOINT: UNKNOWN");
return -EINVAL;
}
break;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 26efe8c7535f..baf8967ad015 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -115,6 +115,10 @@ int dwc3_gadget_set_link_state(struct dwc3 *dwc, enum dwc3_link_state state)
}
reg = dwc3_readl(dwc->regs, DWC3_DCTL);
+ dwc3_trace(trace_dwc3_gadget, "GBX: link state change requested: [%s] => [%s]",
+ dwc3_gadget_link_string(DWC3_DSTS_USBLNKST(reg)),
+ dwc3_gadget_link_string(state)
+ );
reg &= ~DWC3_DCTL_ULSTCHNGREQ_MASK;
/* set requested state */
@@ -176,6 +180,8 @@ void dwc3_gadget_giveback(struct dwc3_ep *dep, struct dwc3_request *req,
struct dwc3 *dwc = dep->dwc;
unsigned int unmap_after_complete = false;
+ dwc3_trace(trace_dwc3_gadget, "GBX: gadget_giveback: status=%d --> %d", req->request.status, status);
+
req->started = false;
list_del(&req->list);
req->trb = NULL;
@@ -1636,6 +1642,21 @@ static int __dwc3_gadget_start(struct dwc3 *dwc)
reg = dwc3_readl(dwc->regs, DWC3_DCFG);
reg &= ~(DWC3_DCFG_SPEED_MASK);
+ /* // Tried disabling U1 and U2 here but it didn't work :(
+ {
+ u32 u1u2;
+ u32 reg2;
+ dwc3_trace(trace_dwc3_gadget, "GBX: Disabling U1 and U2");
+ reg2 = dwc3_readl(dwc->regs, DWC3_DCTL);
+ u1u2 = reg2 & (DWC3_DCTL_INITU2ENA
+ | DWC3_DCTL_ACCEPTU2ENA
+ | DWC3_DCTL_INITU1ENA
+ | DWC3_DCTL_ACCEPTU1ENA);
+ reg2 &= ~u1u2;
+ dwc3_writel(dwc->regs, DWC3_DCTL, reg2);
+ }
+ */
+
/**
* WORKAROUND: DWC3 revision < 2.20a have an issue
* which would cause metastability state on Run/Stop
@@ -1871,6 +1892,8 @@ static int dwc3_gadget_init_endpoints(struct dwc3 *dwc)
{
int ret;
+ dwc3_trace(trace_dwc3_gadget, "GBX: INIT ENDPOINTS: BEGIN");
+
INIT_LIST_HEAD(&dwc->gadget.ep_list);
ret = dwc3_gadget_init_hw_endpoints(dwc, dwc->num_out_eps, 0);
@@ -1887,6 +1910,8 @@ static int dwc3_gadget_init_endpoints(struct dwc3 *dwc)
return ret;
}
+ dwc3_trace(trace_dwc3_gadget, "GBX: INIT ENDPOINTS: END");
+
return 0;
}
@@ -2113,7 +2138,9 @@ static void dwc3_endpoint_transfer_complete(struct dwc3 *dwc,
* WORKAROUND: This is the 2nd half of U1/U2 -> U0 workaround.
* See dwc3_gadget_linksts_change_interrupt() for 1st half.
*/
+ /* GBX: Only enabling first half of workaround so that we stay out of U1/U2 */
if (dwc->revision < DWC3_REVISION_183A) {
+ dwc3_trace(trace_dwc3_gadget, "GBX: 2nd half of U1/U2 -> U0 workaround");
u32 reg;
int i;
@@ -2376,6 +2403,10 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc)
{
int reg;
+ /* // Tried removing these writes to avoid going into U1/U2, but it didn't work :(
+ dwc3_trace(trace_dwc3_gadget,
+ "GBX: disconnect_interrupt: Not entering U1 and U2 even though we should");
+ */
reg = dwc3_readl(dwc->regs, DWC3_DCTL);
reg &= ~DWC3_DCTL_INITU1ENA;
dwc3_writel(dwc->regs, DWC3_DCTL, reg);
@@ -2481,9 +2512,12 @@ static void dwc3_gadget_conndone_interrupt(struct dwc3 *dwc)
reg = dwc3_readl(dwc->regs, DWC3_DSTS);
speed = reg & DWC3_DSTS_CONNECTSPD;
dwc->speed = speed;
-
dwc3_update_ram_clk_sel(dwc, speed);
+ dwc3_trace(trace_dwc3_gadget, "GBX: conndone_interrupt speed=%s",
+ dwc3_dsts_speed_string(speed),
+ dwc->revision);
+
switch (speed) {
case DWC3_DSTS_SUPERSPEED_PLUS:
dwc3_gadget_ep0_desc.wMaxPacketSize = cpu_to_le16(512);
@@ -2609,6 +2643,11 @@ static void dwc3_gadget_linksts_change_interrupt(struct dwc3 *dwc,
enum dwc3_link_state next = evtinfo & DWC3_LINK_STATE_MASK;
unsigned int pwropt;
+ dwc3_trace(trace_dwc3_gadget, "GBX: Processing Link Change interrupt: [%s] -> [%s]",
+ dwc3_gadget_link_string(dwc->link_state),
+ dwc3_gadget_link_string(next)
+ );
+
/*
* WORKAROUND: DWC3 < 2.50a have an issue when configured without
* Hibernation mode enabled which would show up when device detects
@@ -2655,8 +2694,10 @@ static void dwc3_gadget_linksts_change_interrupt(struct dwc3 *dwc,
* STAR#9000446952: RTL: Device SS : if U1/U2 ->U0 takes >128us
* core send LGO_Ux entering U0
*/
- if (dwc->revision < DWC3_REVISION_183A) {
+ // if (dwc->revision < DWC3_REVISION_183A) {
+ {
if (next == DWC3_LINK_STATE_U0) {
+ dwc3_trace(trace_dwc3_gadget, "GBX: using workaround for transition to U0");
u32 u1u2;
u32 reg;
@@ -2706,6 +2747,9 @@ static void dwc3_gadget_linksts_change_interrupt(struct dwc3 *dwc,
static void dwc3_gadget_suspend_interrupt(struct dwc3 *dwc,
unsigned int evtinfo)
{
+ dwc3_trace(trace_dwc3_gadget, "GBX: suspend_interrupt: speed=%s",
+ dwc3_dsts_speed_string(dwc->speed));
+
enum dwc3_link_state next = evtinfo & DWC3_LINK_STATE_MASK;
if (dwc->link_state != next && next == DWC3_LINK_STATE_U3)
@@ -2787,7 +2831,8 @@ static void dwc3_gadget_interrupt(struct dwc3 *dwc,
dwc3_trace(trace_dwc3_gadget, "Erratic Error");
break;
case DWC3_DEVICE_EVENT_CMD_CMPL:
- dwc3_trace(trace_dwc3_gadget, "Command Complete");
+ dwc3_trace(trace_dwc3_gadget, "GBX: Command Complete. speed=%s",
+ dwc3_dsts_speed_string(dwc->speed));
break;
case DWC3_DEVICE_EVENT_OVERFLOW:
dwc3_trace(trace_dwc3_gadget, "Overflow");
--
2.14.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Mass Storage Gadget Device Falls from SuperSpeed to High Speed
@ 2019-04-02 8:13 ` Felipe Balbi
[not found] ` <20190403070228.GA27956@coops>
0 siblings, 1 reply; 6+ messages in thread
From: Felipe Balbi @ 2019-04-02 8:13 UTC (permalink / raw)
To: Rob Weber; +Cc: linux-usb, mathias.nyman
Hi Rob,
Rob Weber <rob@gnarbox.com> writes:
> On Tue, Apr 02, 2019 at 08:46:16AM +0300, Felipe Balbi wrote:
>> Felipe Balbi <felipe.balbi@linux.intel.com> writes:
>> >>> modphy is the USB PHY integrated in your SoC. There's no control for
>> >>> that from OS side, only BIOS unfortunately. There is, however, one thing
>> >>> we can try. DWC3 has several quirk flags for known quirky PHYs; perhaps
>> >>> CHT needs one of those. Can you try with this patch and let me know
>> >>> whether it helps?
>> >>
>> >> Sure thing, I will try tomorrow. Could you possibly explain what a quirk
>> >> is as it relates to the kernel? I see this all over the source tree but
>> >> never knew how it was used. Does the dwc3 also know about "quirks" and
>> >> these particular flags? or are these flags just specific to the kernel
>> >> and its functionality?
>> >>
>> >>> modified drivers/usb/dwc3/dwc3-pci.c
>> >>> @@ -105,6 +105,8 @@ static int dwc3_byt_enable_ulpi_refclock(struct pci_dev *pci)
>> >>> static const struct property_entry dwc3_pci_intel_properties[] = {
>> >>> PROPERTY_ENTRY_STRING("dr_mode", "peripheral"),
>> >>> PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
>> >>> + PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
>> >>> + PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
>> >>> {}
>> >>> };
>> >>>
>> >>> These two quirks will PHY suspend. There are other relevant quirk flags
>> > ^^^^^^^^
>> > will DISABLE phy suspend :-)
>> >
>> >> What do you mean by PHY suspend? Will it disable U2/U3 for the dwc3? I
>> >
>> > No, no. DWC3 can still enter U1/U2/U3, but the PHY will not enter the
>> > matching P1/P2/P3 state.
>> >
>> >> see it modified the DWC3_GUSB2PHYCFG_SUSPHY bit in the configuration
>> >> register, but I don't have access to the dwc3 databook to dig deeper
>> >> into this.
>> >
>> > The second quirk flag (snps,dis_u2_susphy_quirk) will tell dwc3 to *NOT*
>> > request USB2 PHY to enter low power state. While the first flag
>> > (snps,dis_u3_susphy_quirk) will tell dwc3 to *NOT* request USB3 PHY to
>> > enter low power state.
>> >
>> > In reality, they are a single block of HW but they _can_ be different
>> > and even if they are a single block, they _can_ have separate clock
>> > domains and we _may_ be able to control them separately. I haven't read
>> > documentation for modphy because the OS doesn't touch that. If
>> > necessary, I can try to find out more details about it, but I will
>> > probably take some time to find the correct documentation.
>> >
>> >>> which we can try in case these two don't help. I'd like to figure out
>> >>> exactly which quirk flag helps (if any). After that, we would need to
>> >>> check if a similar problem happens on any CHT system or just your
>> >>> design.
>> >>>
>> >>> If it happens on any other system, then I can make sure we add a quirk
>> >>> flag to all CHTs.
>> >>
>> >> Sounds good!
>> >>
>> >> Thanks for taking the time to answer my questions! It's definitely
>> >> helpful for my understanding of USB. I'm learning quite a bit of
>> >> new information with each email and it's pretty awesome.
>> >
>> > No problems at all, happy to help.
>>
>> Any updates here? Hopefully the quirk flags above helped.
>
> Thanks for following up. My team and I were doing a bunch of testing
no problem.
> today as well as this past weekend. We are still seeing mixed results
> with the suggestion you provided to us earlier. I unfortunately do not
> have any logs, traces, or analyzer captures to back this up yet, but I
> hope to gather more concrete data tomorrow.
cool, thanks
> There were two main issues we observed with the quirk flags above:
>
> * Host mode no longer worked with these quirks applied. Please keep in
Interesting. This is rather unexpected.
> mind that we backported part of a patchset [1] to support switching
> the internal SS mux from the xHCI controller to the xDCI controller.
> At a quick glance, do you see anything in this patchset that you think
> might cause a problem with the role switch driver communicating with
> the xHCI controller when the quirks are applied?
nope, that shouldn't cause the behavior you're seeing.
> * We still saw the drop from SS to HS on a Windows laptop. Linux and Mac
> seemed to be fine, but there's definitely a chance we did not test
> with enough samples to confidently say this quirk is stable with those
> hosts.
okay, so still not good enough. Let's try to prevent the device side
from ever initiating U1/U2 entry and from ever accepting LGO_U1 and
LGO_U2 from the host. Here's a (hack) patch for that
modified drivers/usb/dwc3/ep0.c
@@ -382,7 +382,7 @@ static int dwc3_ep0_handle_u1(struct dwc3 *dwc, enum usb_device_state state,
reg = dwc3_readl(dwc->regs, DWC3_DCTL);
if (set)
- reg |= DWC3_DCTL_INITU1ENA;
+ return -EINVAL;
else
reg &= ~DWC3_DCTL_INITU1ENA;
dwc3_writel(dwc->regs, DWC3_DCTL, reg);
@@ -404,7 +404,7 @@ static int dwc3_ep0_handle_u2(struct dwc3 *dwc, enum usb_device_state state,
reg = dwc3_readl(dwc->regs, DWC3_DCTL);
if (set)
- reg |= DWC3_DCTL_INITU2ENA;
+ return -EINVAL;
else
reg &= ~DWC3_DCTL_INITU2ENA;
dwc3_writel(dwc->regs, DWC3_DCTL, reg);
@@ -625,9 +625,9 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
* Enable transition to U1/U2 state when
* nothing is pending from application.
*/
- reg = dwc3_readl(dwc->regs, DWC3_DCTL);
- reg |= (DWC3_DCTL_ACCEPTU1ENA | DWC3_DCTL_ACCEPTU2ENA);
- dwc3_writel(dwc->regs, DWC3_DCTL, reg);
+ /* reg = dwc3_readl(dwc->regs, DWC3_DCTL); */
+ /* reg |= (DWC3_DCTL_ACCEPTU1ENA | DWC3_DCTL_ACCEPTU2ENA); */
+ /* dwc3_writel(dwc->regs, DWC3_DCTL, reg); */
}
break;
I don't think it will apply cleanly to your kernel, let me know if you
need help porting it.
> So tomorrow I hope to gather more concrete data to back up these
> findings. Our USB protocol analyzer is also out-of-order at the moment,
> so we might not be able to get analyzer data for a day or two either.
>
> Are there some other quirks you would like us to try out?
See above, not a quirk flag, but just a complete removal of U1/U2 entry
from the device side.
> Over the weekend, one of our team members was hacking in the dwc3 driver
> to try and disable LPM at various points in the chain of events. I've
> attached a patch that we believe is pretty stable. We have never seen it
> drop from SS to HS. USB type A to type C connections from any host to
> our device seem pretty stable. I've attached an archive called
> "gnarbox-success-a-to-c.tar.xz" That contains the trace events for a
> successful run where we connect the device to a Windows host and it
> enumerates at SuperSpeed.
Okay, cool. I'll go over your tarballs after this email.
> This patch isn't entirely stable as I haven't been able to get a
> reliable USB C-to-C connection on any host yet. I do not have any logs
> or traces for this particular case and I hope to get them to you soon.
>
> Do you remember the "failed to enable ep0out" error I briefly
> mentioned earlier? I was able to capture some trace events while this
> occurs. This issue occaisonally occurs when modprobing g_mass_storage.
> The exact steps are as follows:
>
> 1. switch from host mode to device mode (echo device >
> /sys/class/usb_role/.../role)
> 2. modprobe g_mass_storage ...
> 3. Try and rest device mode with a host computer, but you'll observe
> that device mode is not working.
> 4. modprobe -r g_mass_storage to remove the gadget driver.
> 5. switch back to host mode (echo host > /sys/class/usb_role/.../role)
> 6. switch back to device mode
> 7. modprobe g_mass_storage ...
> 8. Observe the error
Hmm, odd. When we remove a gadget driver, every endpoint (including ep0)
should be disabled.
> I've attached the traces and dmesg output in an archive titled
> "gnarbox-ep0out-failure.tar.xz". Please let me know if you need more
> information about this particular issue.
I'll go over the traces, this could be something that has been fixed in
the past.
> Lastly, I started digging through the errata available for our SoC
> series [2] and there are a few that stand out to me, including CHT40,
> CHT47, CHT48, and CHT50. CHT47 is the most concerning because I recall
> seeing an "Unknown" LFPS status coming from the device as it is trying
Right, I saw that one too.
> to leave U2. I asked the support specialists on our IPS ticket about
> these errata and I'm hoping to hear back from them within the day. Do
> any of these errata stand out to you as possible culprits?
I've read them but they're related to the host port, not the peripheral
port. The problem you're facing is on the peripheral side. I'd say
they're unrelated.
> Thanks for taking a look at all of this again!
No problem at all. Happy to help
^ permalink raw reply [flat|nested] 6+ messages in thread
* Mass Storage Gadget Device Falls from SuperSpeed to High Speed
@ 2019-04-03 7:48 ` Felipe Balbi
[not found] ` <20190404075624.GB15850@coops>
0 siblings, 1 reply; 6+ messages in thread
From: Felipe Balbi @ 2019-04-03 7:48 UTC (permalink / raw)
To: Rob Weber; +Cc: linux-usb, mathias.nyman
Hi Rob,
Rob Weber <rob@gnarbox.com> writes:
>> > * We still saw the drop from SS to HS on a Windows laptop. Linux and Mac
>> > seemed to be fine, but there's definitely a chance we did not test
>> > with enough samples to confidently say this quirk is stable with those
>> > hosts.
>>
>> okay, so still not good enough. Let's try to prevent the device side
>> from ever initiating U1/U2 entry and from ever accepting LGO_U1 and
>> LGO_U2 from the host. Here's a (hack) patch for that
>>
>> modified drivers/usb/dwc3/ep0.c
>> @@ -382,7 +382,7 @@ static int dwc3_ep0_handle_u1(struct dwc3 *dwc, enum usb_device_state state,
>>
>> reg = dwc3_readl(dwc->regs, DWC3_DCTL);
>> if (set)
>> - reg |= DWC3_DCTL_INITU1ENA;
>> + return -EINVAL;
>> else
>> reg &= ~DWC3_DCTL_INITU1ENA;
>> dwc3_writel(dwc->regs, DWC3_DCTL, reg);
>> @@ -404,7 +404,7 @@ static int dwc3_ep0_handle_u2(struct dwc3 *dwc, enum usb_device_state state,
>>
>> reg = dwc3_readl(dwc->regs, DWC3_DCTL);
>> if (set)
>> - reg |= DWC3_DCTL_INITU2ENA;
>> + return -EINVAL;
>> else
>> reg &= ~DWC3_DCTL_INITU2ENA;
>> dwc3_writel(dwc->regs, DWC3_DCTL, reg);
>> @@ -625,9 +625,9 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>> * Enable transition to U1/U2 state when
>> * nothing is pending from application.
>> */
>> - reg = dwc3_readl(dwc->regs, DWC3_DCTL);
>> - reg |= (DWC3_DCTL_ACCEPTU1ENA | DWC3_DCTL_ACCEPTU2ENA);
>> - dwc3_writel(dwc->regs, DWC3_DCTL, reg);
>> + /* reg = dwc3_readl(dwc->regs, DWC3_DCTL); */
>> + /* reg |= (DWC3_DCTL_ACCEPTU1ENA | DWC3_DCTL_ACCEPTU2ENA); */
>> + /* dwc3_writel(dwc->regs, DWC3_DCTL, reg); */
>> }
>> break;
>>
>>
>> I don't think it will apply cleanly to your kernel, let me know if you
>> need help porting it.
>
> Thank you for the offer. It wasn't too bad after reading the ep0.c file
> for both 4.9.115 and master. I've included the patch in the attached
> tarball if you would like to see how I ported it.
Thank you. BTW, it may very well be that you only need to disable U2 and
leave U1 enabled. That may help save some power, but I don't know
whether you have requirements for this :-)
>> > So tomorrow I hope to gather more concrete data to back up these
>> > findings. Our USB protocol analyzer is also out-of-order at the moment,
>> > so we might not be able to get analyzer data for a day or two either.
>> >
>> > Are there some other quirks you would like us to try out?
>>
>> See above, not a quirk flag, but just a complete removal of U1/U2 entry
>> from the device side.
>
> Thank you for providing the above patch. It seems quite stable for a
> variety of hosts! We want to continue testing this on several samples
> with a bunch of different host computers and cables to make sure we're
> in the clear, but this patch is feeling pretty good.
Good to hear, please update me once you're happy with the results. Also,
if you notice this problem happening again in the future, don't hesitate
to contact me.
> I've attached a tarball containing the dmesg output, regdump, and
> tracepoints of a succesful session with the patch applied. I've included
> the patch in the tarball if you would like to see how I ported it.
> Could you please double-check the traces to make sure they meet your
> expectations for disabling the U1/U2_ENABLE features? I looked through
> the traces and could not find any link state changes to U1 or U2, so it
> seems like it's working.
The host, from that run, didn't enable device-initiated U1/U2 entry. So
only host-initiated entry is avaiable. Since we have removed
ACCEPTU1/ACCEPTU2 bits from DCTL, dwc3 is denying LGO_U1 and LGO_U2. You
can confirm that only with a sniffer.
> I have a couple of follow-up questions about this patch:
>
> * I added some tracepoints in the patch I'm applying but they don't seem
> to be included in the trace output. This patch applies cleanly and I
> haven't seen any compiler warnings about it. Is the
> dwc3_ep0_handle_feature function not called during
it's called from SetFeature() command. If SetFeature() is not sent by
the host, then you won't see your tracepoints. FYI, SetFeature would
have a bRequest of 3. Here we can all ctrl requests:
192: irq/23-dwc3-1765 [000] d..1 213.007591: dwc3_ctrl_req: bRequestType 00 bRequest 05 wValue 0005 wIndex 0000 wLength 0
208: irq/23-dwc3-1765 [000] d..1 213.010457: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0100 wIndex 0000 wLength 18
237: irq/23-dwc3-1765 [000] d..1 213.010871: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0304 wIndex 0409 wLength 2
266: irq/23-dwc3-1765 [000] d..1 213.011330: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0304 wIndex 0409 wLength 40
295: irq/23-dwc3-1765 [000] d..1 213.011707: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0303 wIndex 0409 wLength 2
324: irq/23-dwc3-1765 [000] d..1 213.012137: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0303 wIndex 0409 wLength 92
353: irq/23-dwc3-1765 [000] d..1 213.012548: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0305 wIndex 0409 wLength 2
382: irq/23-dwc3-1765 [000] d..1 213.012934: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0305 wIndex 0409 wLength 12
411: irq/23-dwc3-1765 [000] d..1 213.013331: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0f00 wIndex 0000 wLength 5
440: irq/23-dwc3-1765 [000] d..1 213.013748: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0f00 wIndex 0000 wLength 22
469: irq/23-dwc3-1765 [000] d..1 213.016006: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0200 wIndex 0000 wLength 9
498: irq/23-dwc3-1765 [000] d..1 213.016543: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0200 wIndex 0000 wLength 44
527: irq/23-dwc3-1765 [000] d..1 213.016951: dwc3_ctrl_req: bRequestType 00 bRequest 09 wValue 0001 wIndex 0000 wLength 0
568: irq/23-dwc3-1765 [000] d..1 213.017569: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0301 wIndex 0409 wLength 2
597: irq/23-dwc3-1765 [000] d..1 213.017926: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0301 wIndex 0409 wLength 26
626: irq/23-dwc3-1765 [000] d..1 213.018367: dwc3_ctrl_req: bRequestType 81 bRequest 00 wValue 0000 wIndex 0000 wLength 2
653: irq/23-dwc3-1765 [000] d..1 213.026404: dwc3_ctrl_req: bRequestType a1 bRequest fe wValue 0000 wIndex 0000 wLength 1
929: irq/23-dwc3-1765 [000] d..1 214.138385: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
956: irq/23-dwc3-1765 [000] d..1 214.138791: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
1031: irq/23-dwc3-1765 [000] d..1 214.140214: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
1058: irq/23-dwc3-1765 [000] d..1 214.140618: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
1133: irq/23-dwc3-1765 [000] d..1 214.142208: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
1160: irq/23-dwc3-1765 [000] d..1 214.142587: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
Note that we don't have any "bRequest 03" substring.
> * I looked through the USB 3.0 Spec to find out more information about
> this solution you've suggested and the relevant section is 9.4.9. This
> section documents the "set feature" request. It specifies that a
> device should return a STALL transaction packet when the request
> references a feature that cannot be set. So does returning -EINVAL
> in the dwc3_ep0_handle_feature function translate to a STALL packet
> that gets sent to the host? And it's totally within spec for a device
> to not accept U1/U2_ENABLE requests, right?
That's correct. A Device can STALL a request for device-iniated U1/U2 entry.
>> > I've attached the traces and dmesg output in an archive titled
>> > "gnarbox-ep0out-failure.tar.xz". Please let me know if you need more
>> > information about this particular issue.
>>
>> I'll go over the traces, this could be something that has been fixed in
>> the past.
>
> Did you get a chance to take a look at the tracepoints for this error?
> No worries if you haven't gotten a chance yet.
I did, nothing screaming wrong with them. That "can't enable ep0out"
still puzzles me a bit. What we see on tracepoints is a time out:
modprobe-1281 [002] d..1 246.504043: dwc3_gadget_ep_cmd: ep0out: cmd 'Set Endpoint Transfer Resource' [2] params 00000001 00000000 00000000 --> status: Timed Out
But I have no idea why that happens. It could be that this was fixed
long ago, but we won't know unless we run a recent kernel on your
setup. I, certainly, haven't seen that :-)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Mass Storage Gadget Device Falls from SuperSpeed to High Speed
@ 2019-04-04 9:43 ` Felipe Balbi
2019-04-10 1:48 ` Rob Weber
0 siblings, 1 reply; 6+ messages in thread
From: Felipe Balbi @ 2019-04-04 9:43 UTC (permalink / raw)
To: Rob Weber; +Cc: linux-usb, mathias.nyman
Hi,
Rob Weber <rob@gnarbox.com> writes:
> On Wed, Apr 03, 2019 at 10:48:36AM +0300, Felipe Balbi wrote:
>> >> > * We still saw the drop from SS to HS on a Windows laptop. Linux and Mac
>> >> > seemed to be fine, but there's definitely a chance we did not test
>> >> > with enough samples to confidently say this quirk is stable with those
>> >> > hosts.
>> >>
>> >> okay, so still not good enough. Let's try to prevent the device side
>> >> from ever initiating U1/U2 entry and from ever accepting LGO_U1 and
>> >> LGO_U2 from the host. Here's a (hack) patch for that
>> >>
>> >> modified drivers/usb/dwc3/ep0.c
>> >> @@ -382,7 +382,7 @@ static int dwc3_ep0_handle_u1(struct dwc3 *dwc, enum usb_device_state state,
>> >>
>> >> reg = dwc3_readl(dwc->regs, DWC3_DCTL);
>> >> if (set)
>> >> - reg |= DWC3_DCTL_INITU1ENA;
>> >> + return -EINVAL;
>> >> else
>> >> reg &= ~DWC3_DCTL_INITU1ENA;
>> >> dwc3_writel(dwc->regs, DWC3_DCTL, reg);
>> >> @@ -404,7 +404,7 @@ static int dwc3_ep0_handle_u2(struct dwc3 *dwc, enum usb_device_state state,
>> >>
>> >> reg = dwc3_readl(dwc->regs, DWC3_DCTL);
>> >> if (set)
>> >> - reg |= DWC3_DCTL_INITU2ENA;
>> >> + return -EINVAL;
>> >> else
>> >> reg &= ~DWC3_DCTL_INITU2ENA;
>> >> dwc3_writel(dwc->regs, DWC3_DCTL, reg);
>> >> @@ -625,9 +625,9 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>> >> * Enable transition to U1/U2 state when
>> >> * nothing is pending from application.
>> >> */
>> >> - reg = dwc3_readl(dwc->regs, DWC3_DCTL);
>> >> - reg |= (DWC3_DCTL_ACCEPTU1ENA | DWC3_DCTL_ACCEPTU2ENA);
>> >> - dwc3_writel(dwc->regs, DWC3_DCTL, reg);
>> >> + /* reg = dwc3_readl(dwc->regs, DWC3_DCTL); */
>> >> + /* reg |= (DWC3_DCTL_ACCEPTU1ENA | DWC3_DCTL_ACCEPTU2ENA); */
>> >> + /* dwc3_writel(dwc->regs, DWC3_DCTL, reg); */
>> >> }
>> >> break;
>> >>
>> >>
>> >> I don't think it will apply cleanly to your kernel, let me know if you
>> >> need help porting it.
>> >
>> > Thank you for the offer. It wasn't too bad after reading the ep0.c file
>> > for both 4.9.115 and master. I've included the patch in the attached
>> > tarball if you would like to see how I ported it.
>>
>> Thank you. BTW, it may very well be that you only need to disable U2 and
>> leave U1 enabled. That may help save some power, but I don't know
>> whether you have requirements for this :-)
>
> Ahh, good point. We might come back to this after we finish testing with
> the current patch. Thanks!
>
>> >> > So tomorrow I hope to gather more concrete data to back up these
>> >> > findings. Our USB protocol analyzer is also out-of-order at the moment,
>> >> > so we might not be able to get analyzer data for a day or two either.
>> >> >
>> >> > Are there some other quirks you would like us to try out?
>> >>
>> >> See above, not a quirk flag, but just a complete removal of U1/U2 entry
>> >> from the device side.
>> >
>> > Thank you for providing the above patch. It seems quite stable for a
>> > variety of hosts! We want to continue testing this on several samples
>> > with a bunch of different host computers and cables to make sure we're
>> > in the clear, but this patch is feeling pretty good.
>>
>> Good to hear, please update me once you're happy with the results. Also,
>> if you notice this problem happening again in the future, don't hesitate
>> to contact me.
>
> The results are very promising, and our team is feeling very confident
> after hundreds of connections and disconnections :)
Good. Just thought I'd mention that we use a nice tool to help testing
such long connect/disconnect cycles.
http://www.cleware-shop.de/epages/63698188.sf/en_US/?ObjectPath=/Shops/63698188/Products/1001/SubProducts/1001-1
I'm not endorsing the company in any way, but we that little tool helps
us quite a bit. I wrote a tool to use that many years ago and I still
use it. It's available as part of another project where I keep testing
tools:
https://github.com/felipebalbi/usb-tools
The name of the binary will be "cleware".
> We've identified a couple of auxiliary issues that seem to be unrelated,
> but we are trying to gather more data to be sure. One issue we are
> experiencing is particularly clear: We seem to have a race condition
> between the switching of channels on the SS mux on our platform and the
> enumeration of our device. We believe our device gets enumerated in high
> speed before we switch the channel of the SS mux. This can be clearly
> observed while watching the logs of the MCU on our platform while
> simultaneously watching dmesg. Below is an example of this test setup.
> Please note the MCU logs begin with [general][DBG]:
>
> root@gnarbox-2:~# echo device > /sys/class/usb_role/intel_xhci_usb_sw-role-switch/role
> root@gnarbox-2:~# ./bin/usb-trace 1
> echo 1 > /t/events/dwc3/enable
> echo 1 > /t/events/xhci-hcd/enable
> echo 1 > /t/events/gadget/enable
> root@gnarbox-2:~# modprobe g_mass_storage file=/dev/nvme0n1p7 iSerialNumber=90405
> root@gnarbox-2:~# stty -F /dev/ttyS2 115200 cs8 cstopb -parenb -icrnl
> root@gnarbox-2:~# cat /dev/ttyS2 &
> [1] 1733
> root@gnarbox-2:~# dmesg -w
>
> Okay, setup complete, aaand we're rolling...
>
> [general] [DBG] handle_intr: (otg) event: Rp Change Detected()
> [general] [DBG] apply_pdo: setting limit based on Rp val 0
> [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> [general] [DBG] set_current_limit: success (tried 1 times)
> [general] [DBG] handle_intr: (otg) event: Type-C Port Connect Detected()
> [general] [DBG] usb_mux_sel: (otg) polarity: 2
>
> Here's where the PD controller detects the connection. Note the polarity
> of the cable that we detect.
>
> [general] [DBG] handle_intr: (otg) Nothing to do.
> [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> [ 179.905835] g_mass_storage gadget: super-speed config #1: Linux File-Backed Storage
>
> shortly after, the device is enumerated and the link speed is set to 5Gbps
>
> [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> [general] [DBG] handle_intr: (otg) event: Source Disabled State Entered
> [general] [DBG] enable_charging: incorrect current limit(3300), setting(100)
> [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> [general] [DBG] set_current_limit: success (tried 1 times)
> ...
> ...
> [general] [DBG] handle_intr: (otg) event: Type-C Port Disconnect Detected()
> [general] [DBG] apply_pdo: setting limit based on Rp val 0
> [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> [general] [DBG] set_current_limit: success (tried 1 times)
> [general] [DBG] disable_charging:
>
> Here we disconnect the cable from our device.
>
> [ 226.759843] g_mass_storage gadget: high-speed config #1: Linux File-Backed Storage
> [general] [DBG] handle_intr: (otg) event: Type-C Port Connect Detected()
> [general] [DBG] usb_mux_sel: (otg) polarity: 0
>
> Here we reconnected our device to the host but with the cable in a
> different polarity than before. This means the SS mux is not properly
> configured for the current cable orientation and must be switched. Also
> note here that linux seems to report the enumeration in high speed
> before our MCU had a chance to react to the new connection.
>
> [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> [general] [DBG] handle_intr: (otg) event: Source Disabled State Entered
> [general] [DBG] enable_charging: incorrect current limit(3300), setting(100)
> [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> [general] [DBG] set_current_limit: success (tried 1 times)
> ...
> ...
> [general] [DBG] handle_intr: (otg) event: Type-C Port Disconnect Detected()
> [general] [DBG] apply_pdo: setting limit based on Rp val 0
> [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> [general] [DBG] set_current_limit: success (tried 1 times)
> [general] [DBG] disable_charging:
>
> Here we disconnect our device from the host by removing the cable from
> our device's type-c connector.
>
> [ 262.783318] dwc3 dwc3.1.auto: request ffff880168ae3240 was not queued to ep1out
> [ 262.920461] g_mass_storage gadget: super-speed config #1: Linux File-Backed Storage
> [general] [DBG] handle_intr: (otg) event: Type-C Port Connect Detected()
> [general] [DBG] usb_mux_sel: (otg) polarity: 0
>
> Here we re-connect our device to the host, but we made sure to leave the
> cable in the same orientation as before. That way the SS mux is already
> configured for the current cable orientation. We still see the USB
> device enumeration occur before the MCU has a chance to respond, but it
> gets enumerated in SuperSpeed because the SS mux was already configured
> for this cable orientation.
This looks like an old bug on xhci which was fixed long ago. Mathias, do
you remember that bug? Which kernel fixed that issue?
> We've come up with a few solutions that are specific to our platform
> in order to reduce the chances of this race condition occuring, but I
> wanted to ask you for your thoughts about how we might be able to work
> around this from Linux. In the event that we cannot completely eliminate
> this race condition, we would like to find some way to "upgrade" the
> connection from high speed to SuperSpeed. Is something like this
> possible?
Unfortunately not. Does this continue to happen if you change the host
machine to e.g. Windows or a more recent linux kernel?
> My naiive understanding wsas that the host would continuously
> poll the SS lines for a valid connection, but I believe I am mistaken
> here. I noticed in the USB protocol traces that there are a couple of
> Link Management Packets (LPMs) that the host sends. One of them seems to
> request the configuration descriptor, and another seems to request the
> link speed be set to 5 Gbps, to which the device acknowledges and
> complies. Is it possible to force this sort of exchange to happen again?
AFAICT, that's handled entirely at the HW level without SW intervention.
> I've attached yet another tarball containing the dwc3 traces, USB
> protocol data (from a Teledyne LeCroy analyzer, compatible with the "USB
> Protocol Suite" application), and a raw copy of the logs I annotated
> above. The timestamps in the dwc3 traces should line up with the
> timestamps of the dmesg output in the annotated logs. The USB protocol
> data is from a different session, but still contains an example of the
> first enumeration in high speed, and the second in SS, all in one
> sample. The SS enumeration happens at around 8 seconds into the
> 12-second capture.
>
> Thanks in advance for your input here!
I'll have to spend some more time with the logs, but at least from
traces, I couldn't see anything that would give a hint. I'll try looking
at CATC traces later today tomorrow.
>> > I've attached a tarball containing the dmesg output, regdump, and
>> > tracepoints of a succesful session with the patch applied. I've included
>> > the patch in the tarball if you would like to see how I ported it.
>> > Could you please double-check the traces to make sure they meet your
>> > expectations for disabling the U1/U2_ENABLE features? I looked through
>> > the traces and could not find any link state changes to U1 or U2, so it
>> > seems like it's working.
>>
>> The host, from that run, didn't enable device-initiated U1/U2 entry. So
>> only host-initiated entry is avaiable. Since we have removed
>> ACCEPTU1/ACCEPTU2 bits from DCTL, dwc3 is denying LGO_U1 and LGO_U2. You
>> can confirm that only with a sniffer.
>>
>> > I have a couple of follow-up questions about this patch:
>> >
>> > * I added some tracepoints in the patch I'm applying but they don't seem
>> > to be included in the trace output. This patch applies cleanly and I
>> > haven't seen any compiler warnings about it. Is the
>> > dwc3_ep0_handle_feature function not called during
>>
>> it's called from SetFeature() command. If SetFeature() is not sent by
>> the host, then you won't see your tracepoints. FYI, SetFeature would
>> have a bRequest of 3. Here we can all ctrl requests:
>>
>> 192: irq/23-dwc3-1765 [000] d..1 213.007591: dwc3_ctrl_req: bRequestType 00 bRequest 05 wValue 0005 wIndex 0000 wLength 0
>> 208: irq/23-dwc3-1765 [000] d..1 213.010457: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0100 wIndex 0000 wLength 18
>> 237: irq/23-dwc3-1765 [000] d..1 213.010871: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0304 wIndex 0409 wLength 2
>> 266: irq/23-dwc3-1765 [000] d..1 213.011330: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0304 wIndex 0409 wLength 40
>> 295: irq/23-dwc3-1765 [000] d..1 213.011707: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0303 wIndex 0409 wLength 2
>> 324: irq/23-dwc3-1765 [000] d..1 213.012137: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0303 wIndex 0409 wLength 92
>> 353: irq/23-dwc3-1765 [000] d..1 213.012548: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0305 wIndex 0409 wLength 2
>> 382: irq/23-dwc3-1765 [000] d..1 213.012934: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0305 wIndex 0409 wLength 12
>> 411: irq/23-dwc3-1765 [000] d..1 213.013331: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0f00 wIndex 0000 wLength 5
>> 440: irq/23-dwc3-1765 [000] d..1 213.013748: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0f00 wIndex 0000 wLength 22
>> 469: irq/23-dwc3-1765 [000] d..1 213.016006: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0200 wIndex 0000 wLength 9
>> 498: irq/23-dwc3-1765 [000] d..1 213.016543: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0200 wIndex 0000 wLength 44
>> 527: irq/23-dwc3-1765 [000] d..1 213.016951: dwc3_ctrl_req: bRequestType 00 bRequest 09 wValue 0001 wIndex 0000 wLength 0
>> 568: irq/23-dwc3-1765 [000] d..1 213.017569: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0301 wIndex 0409 wLength 2
>> 597: irq/23-dwc3-1765 [000] d..1 213.017926: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0301 wIndex 0409 wLength 26
>> 626: irq/23-dwc3-1765 [000] d..1 213.018367: dwc3_ctrl_req: bRequestType 81 bRequest 00 wValue 0000 wIndex 0000 wLength 2
>> 653: irq/23-dwc3-1765 [000] d..1 213.026404: dwc3_ctrl_req: bRequestType a1 bRequest fe wValue 0000 wIndex 0000 wLength 1
>> 929: irq/23-dwc3-1765 [000] d..1 214.138385: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
>> 956: irq/23-dwc3-1765 [000] d..1 214.138791: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
>> 1031: irq/23-dwc3-1765 [000] d..1 214.140214: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
>> 1058: irq/23-dwc3-1765 [000] d..1 214.140618: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
>> 1133: irq/23-dwc3-1765 [000] d..1 214.142208: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
>> 1160: irq/23-dwc3-1765 [000] d..1 214.142587: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
>>
>> Note that we don't have any "bRequest 03" substring.
>
> Is it normal to not see any SetFeature commands? I think I was expecting
> to see them pretty frequently.
Totally normal :-) A host is not required to allow for device-initiated
U1/U2 entry. Nor is it required to halt and endpoint, for example.
>> >> > I've attached the traces and dmesg output in an archive titled
>> >> > "gnarbox-ep0out-failure.tar.xz". Please let me know if you need more
>> >> > information about this particular issue.
>> >>
>> >> I'll go over the traces, this could be something that has been fixed in
>> >> the past.
>> >
>> > Did you get a chance to take a look at the tracepoints for this error?
>> > No worries if you haven't gotten a chance yet.
>>
>> I did, nothing screaming wrong with them. That "can't enable ep0out"
>> still puzzles me a bit. What we see on tracepoints is a time out:
>>
>> modprobe-1281 [002] d..1 246.504043: dwc3_gadget_ep_cmd: ep0out: cmd 'Set Endpoint Transfer Resource' [2] params 00000001 00000000 00000000 --> status: Timed Out
>>
>> But I have no idea why that happens. It could be that this was fixed
>> long ago, but we won't know unless we run a recent kernel on your
>> setup. I, certainly, haven't seen that :-)
>
> One of my colleagues is currently testing kernel 5.0.5 without the patch
> you suggested and it looks pretty good so far! We'll keep you updated if
> this changes. With this issue in particular, it is still unclear how to
> get our device into this state. We'll definitely keep an eye out for it
> on 5.0.5.
If 5.0.5 is working, then you could run a git bisect and find the commit
that fixed it. If we have a failing commit and a working commit, git
bisect will help finding the fix, then you could simply cherry-pick it.
Best regards
^ permalink raw reply [flat|nested] 6+ messages in thread
* Mass Storage Gadget Device Falls from SuperSpeed to High Speed
@ 2019-04-10 1:48 ` Rob Weber
2019-04-10 1:48 ` Rob Weber
0 siblings, 1 reply; 6+ messages in thread
From: Rob Weber @ 2019-04-10 1:48 UTC (permalink / raw)
To: Felipe Balbi; +Cc: linux-usb, mathias.nyman
Hi Felipe,
On Thu, Apr 04, 2019 at 12:43:01PM +0300, Felipe Balbi wrote:
> > We've identified a couple of auxiliary issues that seem to be unrelated,
> > but we are trying to gather more data to be sure. One issue we are
> > experiencing is particularly clear: We seem to have a race condition
> > between the switching of channels on the SS mux on our platform and the
> > enumeration of our device. We believe our device gets enumerated in high
> > speed before we switch the channel of the SS mux. This can be clearly
> > observed while watching the logs of the MCU on our platform while
> > simultaneously watching dmesg. Below is an example of this test setup.
> > Please note the MCU logs begin with [general][DBG]:
> >
> > root@gnarbox-2:~# echo device > /sys/class/usb_role/intel_xhci_usb_sw-role-switch/role
> > root@gnarbox-2:~# ./bin/usb-trace 1
> > echo 1 > /t/events/dwc3/enable
> > echo 1 > /t/events/xhci-hcd/enable
> > echo 1 > /t/events/gadget/enable
> > root@gnarbox-2:~# modprobe g_mass_storage file=/dev/nvme0n1p7 iSerialNumber=90405
> > root@gnarbox-2:~# stty -F /dev/ttyS2 115200 cs8 cstopb -parenb -icrnl
> > root@gnarbox-2:~# cat /dev/ttyS2 &
> > [1] 1733
> > root@gnarbox-2:~# dmesg -w
> >
> > Okay, setup complete, aaand we're rolling...
> >
> > [general] [DBG] handle_intr: (otg) event: Rp Change Detected()
> > [general] [DBG] apply_pdo: setting limit based on Rp val 0
> > [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> > [general] [DBG] set_current_limit: success (tried 1 times)
> > [general] [DBG] handle_intr: (otg) event: Type-C Port Connect Detected()
> > [general] [DBG] usb_mux_sel: (otg) polarity: 2
> >
> > Here's where the PD controller detects the connection. Note the polarity
> > of the cable that we detect.
> >
> > [general] [DBG] handle_intr: (otg) Nothing to do.
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [ 179.905835] g_mass_storage gadget: super-speed config #1: Linux File-Backed Storage
> >
> > shortly after, the device is enumerated and the link speed is set to 5Gbps
> >
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [general] [DBG] handle_intr: (otg) event: Source Disabled State Entered
> > [general] [DBG] enable_charging: incorrect current limit(3300), setting(100)
> > [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> > [general] [DBG] set_current_limit: success (tried 1 times)
> > ...
> > ...
> > [general] [DBG] handle_intr: (otg) event: Type-C Port Disconnect Detected()
> > [general] [DBG] apply_pdo: setting limit based on Rp val 0
> > [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> > [general] [DBG] set_current_limit: success (tried 1 times)
> > [general] [DBG] disable_charging:
> >
> > Here we disconnect the cable from our device.
> >
> > [ 226.759843] g_mass_storage gadget: high-speed config #1: Linux File-Backed Storage
> > [general] [DBG] handle_intr: (otg) event: Type-C Port Connect Detected()
> > [general] [DBG] usb_mux_sel: (otg) polarity: 0
> >
> > Here we reconnected our device to the host but with the cable in a
> > different polarity than before. This means the SS mux is not properly
> > configured for the current cable orientation and must be switched. Also
> > note here that linux seems to report the enumeration in high speed
> > before our MCU had a chance to react to the new connection.
> >
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [general] [DBG] handle_intr: (otg) event: Source Disabled State Entered
> > [general] [DBG] enable_charging: incorrect current limit(3300), setting(100)
> > [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> > [general] [DBG] set_current_limit: success (tried 1 times)
> > ...
> > ...
> > [general] [DBG] handle_intr: (otg) event: Type-C Port Disconnect Detected()
> > [general] [DBG] apply_pdo: setting limit based on Rp val 0
> > [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> > [general] [DBG] set_current_limit: success (tried 1 times)
> > [general] [DBG] disable_charging:
> >
> > Here we disconnect our device from the host by removing the cable from
> > our device's type-c connector.
> >
> > [ 262.783318] dwc3 dwc3.1.auto: request ffff880168ae3240 was not queued to ep1out
> > [ 262.920461] g_mass_storage gadget: super-speed config #1: Linux File-Backed Storage
> > [general] [DBG] handle_intr: (otg) event: Type-C Port Connect Detected()
> > [general] [DBG] usb_mux_sel: (otg) polarity: 0
> >
> > Here we re-connect our device to the host, but we made sure to leave the
> > cable in the same orientation as before. That way the SS mux is already
> > configured for the current cable orientation. We still see the USB
> > device enumeration occur before the MCU has a chance to respond, but it
> > gets enumerated in SuperSpeed because the SS mux was already configured
> > for this cable orientation.
>
> This looks like an old bug on xhci which was fixed long ago. Mathias, do
> you remember that bug? Which kernel fixed that issue?
>
> > We've come up with a few solutions that are specific to our platform
> > in order to reduce the chances of this race condition occuring, but I
> > wanted to ask you for your thoughts about how we might be able to work
> > around this from Linux. In the event that we cannot completely eliminate
> > this race condition, we would like to find some way to "upgrade" the
> > connection from high speed to SuperSpeed. Is something like this
> > possible?
>
> Unfortunately not. Does this continue to happen if you change the host
> machine to e.g. Windows or a more recent linux kernel?
Yes, this happens on all hosts (Windows, OSX, Linux on a range of
kernels), which leads me to believe it might be unrelated to the xhci
bug you recalled earlier.
> > My naiive understanding wsas that the host would continuously
> > poll the SS lines for a valid connection, but I believe I am mistaken
> > here. I noticed in the USB protocol traces that there are a couple of
> > Link Management Packets (LPMs) that the host sends. One of them seems to
> > request the configuration descriptor, and another seems to request the
> > link speed be set to 5 Gbps, to which the device acknowledges and
> > complies. Is it possible to force this sort of exchange to happen again?
>
> AFAICT, that's handled entirely at the HW level without SW intervention.
Hmm, okay. Might it be possible to perform some sort of soft reset of
the dwc3, or possibly ask the host to reset the device?
We're looking for any way to possibly reset the connection from HS to SS
when we hit this race condition. We are also currently exploring ways of
switching the orientation of the SS mux more efficiently.
> > I've attached yet another tarball containing the dwc3 traces, USB
> > protocol data (from a Teledyne LeCroy analyzer, compatible with the "USB
> > Protocol Suite" application), and a raw copy of the logs I annotated
> > above. The timestamps in the dwc3 traces should line up with the
> > timestamps of the dmesg output in the annotated logs. The USB protocol
> > data is from a different session, but still contains an example of the
> > first enumeration in high speed, and the second in SS, all in one
> > sample. The SS enumeration happens at around 8 seconds into the
> > 12-second capture.
> >
> > Thanks in advance for your input here!
>
> I'll have to spend some more time with the logs, but at least from
> traces, I couldn't see anything that would give a hint. I'll try looking
> at CATC traces later today tomorrow.
>
> >> > I've attached a tarball containing the dmesg output, regdump, and
> >> > tracepoints of a succesful session with the patch applied. I've included
> >> > the patch in the tarball if you would like to see how I ported it.
> >> > Could you please double-check the traces to make sure they meet your
> >> > expectations for disabling the U1/U2_ENABLE features? I looked through
> >> > the traces and could not find any link state changes to U1 or U2, so it
> >> > seems like it's working.
> >>
> >> The host, from that run, didn't enable device-initiated U1/U2 entry. So
> >> only host-initiated entry is avaiable. Since we have removed
> >> ACCEPTU1/ACCEPTU2 bits from DCTL, dwc3 is denying LGO_U1 and LGO_U2. You
> >> can confirm that only with a sniffer.
> >>
> >> > I have a couple of follow-up questions about this patch:
> >> >
> >> > * I added some tracepoints in the patch I'm applying but they don't seem
> >> > to be included in the trace output. This patch applies cleanly and I
> >> > haven't seen any compiler warnings about it. Is the
> >> > dwc3_ep0_handle_feature function not called during
> >>
> >> it's called from SetFeature() command. If SetFeature() is not sent by
> >> the host, then you won't see your tracepoints. FYI, SetFeature would
> >> have a bRequest of 3. Here we can all ctrl requests:
> >>
> >> 192: irq/23-dwc3-1765 [000] d..1 213.007591: dwc3_ctrl_req: bRequestType 00 bRequest 05 wValue 0005 wIndex 0000 wLength 0
> >> 208: irq/23-dwc3-1765 [000] d..1 213.010457: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0100 wIndex 0000 wLength 18
> >> 237: irq/23-dwc3-1765 [000] d..1 213.010871: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0304 wIndex 0409 wLength 2
> >> 266: irq/23-dwc3-1765 [000] d..1 213.011330: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0304 wIndex 0409 wLength 40
> >> 295: irq/23-dwc3-1765 [000] d..1 213.011707: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0303 wIndex 0409 wLength 2
> >> 324: irq/23-dwc3-1765 [000] d..1 213.012137: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0303 wIndex 0409 wLength 92
> >> 353: irq/23-dwc3-1765 [000] d..1 213.012548: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0305 wIndex 0409 wLength 2
> >> 382: irq/23-dwc3-1765 [000] d..1 213.012934: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0305 wIndex 0409 wLength 12
> >> 411: irq/23-dwc3-1765 [000] d..1 213.013331: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0f00 wIndex 0000 wLength 5
> >> 440: irq/23-dwc3-1765 [000] d..1 213.013748: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0f00 wIndex 0000 wLength 22
> >> 469: irq/23-dwc3-1765 [000] d..1 213.016006: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0200 wIndex 0000 wLength 9
> >> 498: irq/23-dwc3-1765 [000] d..1 213.016543: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0200 wIndex 0000 wLength 44
> >> 527: irq/23-dwc3-1765 [000] d..1 213.016951: dwc3_ctrl_req: bRequestType 00 bRequest 09 wValue 0001 wIndex 0000 wLength 0
> >> 568: irq/23-dwc3-1765 [000] d..1 213.017569: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0301 wIndex 0409 wLength 2
> >> 597: irq/23-dwc3-1765 [000] d..1 213.017926: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0301 wIndex 0409 wLength 26
> >> 626: irq/23-dwc3-1765 [000] d..1 213.018367: dwc3_ctrl_req: bRequestType 81 bRequest 00 wValue 0000 wIndex 0000 wLength 2
> >> 653: irq/23-dwc3-1765 [000] d..1 213.026404: dwc3_ctrl_req: bRequestType a1 bRequest fe wValue 0000 wIndex 0000 wLength 1
> >> 929: irq/23-dwc3-1765 [000] d..1 214.138385: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
> >> 956: irq/23-dwc3-1765 [000] d..1 214.138791: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
> >> 1031: irq/23-dwc3-1765 [000] d..1 214.140214: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
> >> 1058: irq/23-dwc3-1765 [000] d..1 214.140618: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
> >> 1133: irq/23-dwc3-1765 [000] d..1 214.142208: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
> >> 1160: irq/23-dwc3-1765 [000] d..1 214.142587: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
> >>
> >> Note that we don't have any "bRequest 03" substring.
> >
> > Is it normal to not see any SetFeature commands? I think I was expecting
> > to see them pretty frequently.
>
> Totally normal :-) A host is not required to allow for device-initiated
> U1/U2 entry. Nor is it required to halt and endpoint, for example.
>
> >> >> > I've attached the traces and dmesg output in an archive titled
> >> >> > "gnarbox-ep0out-failure.tar.xz". Please let me know if you need more
> >> >> > information about this particular issue.
> >> >>
> >> >> I'll go over the traces, this could be something that has been fixed in
> >> >> the past.
> >> >
> >> > Did you get a chance to take a look at the tracepoints for this error?
> >> > No worries if you haven't gotten a chance yet.
> >>
> >> I did, nothing screaming wrong with them. That "can't enable ep0out"
> >> still puzzles me a bit. What we see on tracepoints is a time out:
> >>
> >> modprobe-1281 [002] d..1 246.504043: dwc3_gadget_ep_cmd: ep0out: cmd 'Set Endpoint Transfer Resource' [2] params 00000001 00000000 00000000 --> status: Timed Out
> >>
> >> But I have no idea why that happens. It could be that this was fixed
> >> long ago, but we won't know unless we run a recent kernel on your
> >> setup. I, certainly, haven't seen that :-)
> >
> > One of my colleagues is currently testing kernel 5.0.5 without the patch
> > you suggested and it looks pretty good so far! We'll keep you updated if
> > this changes. With this issue in particular, it is still unclear how to
> > get our device into this state. We'll definitely keep an eye out for it
> > on 5.0.5.
>
> If 5.0.5 is working, then you could run a git bisect and find the commit
> that fixed it. If we have a failing commit and a working commit, git
> bisect will help finding the fix, then you could simply cherry-pick it.
I spoke too soon. 5.0.5 (unpatched) shows significant improvement than 4.9.115 as it
takes about 15 minutes of data transfer before the link drops from SS to
HS. We will be sticking with the patch that you provided us earlier.
It's been working very well for us on 4.9.115.
Cheers,
Rob Weber
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Mass Storage Gadget Device Falls from SuperSpeed to High Speed
2019-04-10 1:48 ` Rob Weber
@ 2019-04-10 1:48 ` Rob Weber
0 siblings, 0 replies; 6+ messages in thread
From: Rob Weber @ 2019-04-10 1:48 UTC (permalink / raw)
To: Felipe Balbi; +Cc: linux-usb, mathias.nyman
Hi Felipe,
On Thu, Apr 04, 2019 at 12:43:01PM +0300, Felipe Balbi wrote:
> > We've identified a couple of auxiliary issues that seem to be unrelated,
> > but we are trying to gather more data to be sure. One issue we are
> > experiencing is particularly clear: We seem to have a race condition
> > between the switching of channels on the SS mux on our platform and the
> > enumeration of our device. We believe our device gets enumerated in high
> > speed before we switch the channel of the SS mux. This can be clearly
> > observed while watching the logs of the MCU on our platform while
> > simultaneously watching dmesg. Below is an example of this test setup.
> > Please note the MCU logs begin with [general][DBG]:
> >
> > root@gnarbox-2:~# echo device > /sys/class/usb_role/intel_xhci_usb_sw-role-switch/role
> > root@gnarbox-2:~# ./bin/usb-trace 1
> > echo 1 > /t/events/dwc3/enable
> > echo 1 > /t/events/xhci-hcd/enable
> > echo 1 > /t/events/gadget/enable
> > root@gnarbox-2:~# modprobe g_mass_storage file=/dev/nvme0n1p7 iSerialNumber=90405
> > root@gnarbox-2:~# stty -F /dev/ttyS2 115200 cs8 cstopb -parenb -icrnl
> > root@gnarbox-2:~# cat /dev/ttyS2 &
> > [1] 1733
> > root@gnarbox-2:~# dmesg -w
> >
> > Okay, setup complete, aaand we're rolling...
> >
> > [general] [DBG] handle_intr: (otg) event: Rp Change Detected()
> > [general] [DBG] apply_pdo: setting limit based on Rp val 0
> > [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> > [general] [DBG] set_current_limit: success (tried 1 times)
> > [general] [DBG] handle_intr: (otg) event: Type-C Port Connect Detected()
> > [general] [DBG] usb_mux_sel: (otg) polarity: 2
> >
> > Here's where the PD controller detects the connection. Note the polarity
> > of the cable that we detect.
> >
> > [general] [DBG] handle_intr: (otg) Nothing to do.
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [ 179.905835] g_mass_storage gadget: super-speed config #1: Linux File-Backed Storage
> >
> > shortly after, the device is enumerated and the link speed is set to 5Gbps
> >
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [general] [DBG] handle_intr: (otg) event: Source Disabled State Entered
> > [general] [DBG] enable_charging: incorrect current limit(3300), setting(100)
> > [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> > [general] [DBG] set_current_limit: success (tried 1 times)
> > ...
> > ...
> > [general] [DBG] handle_intr: (otg) event: Type-C Port Disconnect Detected()
> > [general] [DBG] apply_pdo: setting limit based on Rp val 0
> > [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> > [general] [DBG] set_current_limit: success (tried 1 times)
> > [general] [DBG] disable_charging:
> >
> > Here we disconnect the cable from our device.
> >
> > [ 226.759843] g_mass_storage gadget: high-speed config #1: Linux File-Backed Storage
> > [general] [DBG] handle_intr: (otg) event: Type-C Port Connect Detected()
> > [general] [DBG] usb_mux_sel: (otg) polarity: 0
> >
> > Here we reconnected our device to the host but with the cable in a
> > different polarity than before. This means the SS mux is not properly
> > configured for the current cable orientation and must be switched. Also
> > note here that linux seems to report the enumeration in high speed
> > before our MCU had a chance to react to the new connection.
> >
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [general] [DBG] handle_intr: (otg) event: Hard Reset Sent to Port Partner
> > [general] [DBG] handle_intr: (otg) event: Source Disabled State Entered
> > [general] [DBG] enable_charging: incorrect current limit(3300), setting(100)
> > [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> > [general] [DBG] set_current_limit: success (tried 1 times)
> > ...
> > ...
> > [general] [DBG] handle_intr: (otg) event: Type-C Port Disconnect Detected()
> > [general] [DBG] apply_pdo: setting limit based on Rp val 0
> > [general] [DBG] set_current_limit: IIN_HOST: 100 0x0100
> > [general] [DBG] set_current_limit: success (tried 1 times)
> > [general] [DBG] disable_charging:
> >
> > Here we disconnect our device from the host by removing the cable from
> > our device's type-c connector.
> >
> > [ 262.783318] dwc3 dwc3.1.auto: request ffff880168ae3240 was not queued to ep1out
> > [ 262.920461] g_mass_storage gadget: super-speed config #1: Linux File-Backed Storage
> > [general] [DBG] handle_intr: (otg) event: Type-C Port Connect Detected()
> > [general] [DBG] usb_mux_sel: (otg) polarity: 0
> >
> > Here we re-connect our device to the host, but we made sure to leave the
> > cable in the same orientation as before. That way the SS mux is already
> > configured for the current cable orientation. We still see the USB
> > device enumeration occur before the MCU has a chance to respond, but it
> > gets enumerated in SuperSpeed because the SS mux was already configured
> > for this cable orientation.
>
> This looks like an old bug on xhci which was fixed long ago. Mathias, do
> you remember that bug? Which kernel fixed that issue?
>
> > We've come up with a few solutions that are specific to our platform
> > in order to reduce the chances of this race condition occuring, but I
> > wanted to ask you for your thoughts about how we might be able to work
> > around this from Linux. In the event that we cannot completely eliminate
> > this race condition, we would like to find some way to "upgrade" the
> > connection from high speed to SuperSpeed. Is something like this
> > possible?
>
> Unfortunately not. Does this continue to happen if you change the host
> machine to e.g. Windows or a more recent linux kernel?
Yes, this happens on all hosts (Windows, OSX, Linux on a range of
kernels), which leads me to believe it might be unrelated to the xhci
bug you recalled earlier.
> > My naiive understanding wsas that the host would continuously
> > poll the SS lines for a valid connection, but I believe I am mistaken
> > here. I noticed in the USB protocol traces that there are a couple of
> > Link Management Packets (LPMs) that the host sends. One of them seems to
> > request the configuration descriptor, and another seems to request the
> > link speed be set to 5 Gbps, to which the device acknowledges and
> > complies. Is it possible to force this sort of exchange to happen again?
>
> AFAICT, that's handled entirely at the HW level without SW intervention.
Hmm, okay. Might it be possible to perform some sort of soft reset of
the dwc3, or possibly ask the host to reset the device?
We're looking for any way to possibly reset the connection from HS to SS
when we hit this race condition. We are also currently exploring ways of
switching the orientation of the SS mux more efficiently.
> > I've attached yet another tarball containing the dwc3 traces, USB
> > protocol data (from a Teledyne LeCroy analyzer, compatible with the "USB
> > Protocol Suite" application), and a raw copy of the logs I annotated
> > above. The timestamps in the dwc3 traces should line up with the
> > timestamps of the dmesg output in the annotated logs. The USB protocol
> > data is from a different session, but still contains an example of the
> > first enumeration in high speed, and the second in SS, all in one
> > sample. The SS enumeration happens at around 8 seconds into the
> > 12-second capture.
> >
> > Thanks in advance for your input here!
>
> I'll have to spend some more time with the logs, but at least from
> traces, I couldn't see anything that would give a hint. I'll try looking
> at CATC traces later today tomorrow.
>
> >> > I've attached a tarball containing the dmesg output, regdump, and
> >> > tracepoints of a succesful session with the patch applied. I've included
> >> > the patch in the tarball if you would like to see how I ported it.
> >> > Could you please double-check the traces to make sure they meet your
> >> > expectations for disabling the U1/U2_ENABLE features? I looked through
> >> > the traces and could not find any link state changes to U1 or U2, so it
> >> > seems like it's working.
> >>
> >> The host, from that run, didn't enable device-initiated U1/U2 entry. So
> >> only host-initiated entry is avaiable. Since we have removed
> >> ACCEPTU1/ACCEPTU2 bits from DCTL, dwc3 is denying LGO_U1 and LGO_U2. You
> >> can confirm that only with a sniffer.
> >>
> >> > I have a couple of follow-up questions about this patch:
> >> >
> >> > * I added some tracepoints in the patch I'm applying but they don't seem
> >> > to be included in the trace output. This patch applies cleanly and I
> >> > haven't seen any compiler warnings about it. Is the
> >> > dwc3_ep0_handle_feature function not called during
> >>
> >> it's called from SetFeature() command. If SetFeature() is not sent by
> >> the host, then you won't see your tracepoints. FYI, SetFeature would
> >> have a bRequest of 3. Here we can all ctrl requests:
> >>
> >> 192: irq/23-dwc3-1765 [000] d..1 213.007591: dwc3_ctrl_req: bRequestType 00 bRequest 05 wValue 0005 wIndex 0000 wLength 0
> >> 208: irq/23-dwc3-1765 [000] d..1 213.010457: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0100 wIndex 0000 wLength 18
> >> 237: irq/23-dwc3-1765 [000] d..1 213.010871: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0304 wIndex 0409 wLength 2
> >> 266: irq/23-dwc3-1765 [000] d..1 213.011330: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0304 wIndex 0409 wLength 40
> >> 295: irq/23-dwc3-1765 [000] d..1 213.011707: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0303 wIndex 0409 wLength 2
> >> 324: irq/23-dwc3-1765 [000] d..1 213.012137: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0303 wIndex 0409 wLength 92
> >> 353: irq/23-dwc3-1765 [000] d..1 213.012548: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0305 wIndex 0409 wLength 2
> >> 382: irq/23-dwc3-1765 [000] d..1 213.012934: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0305 wIndex 0409 wLength 12
> >> 411: irq/23-dwc3-1765 [000] d..1 213.013331: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0f00 wIndex 0000 wLength 5
> >> 440: irq/23-dwc3-1765 [000] d..1 213.013748: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0f00 wIndex 0000 wLength 22
> >> 469: irq/23-dwc3-1765 [000] d..1 213.016006: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0200 wIndex 0000 wLength 9
> >> 498: irq/23-dwc3-1765 [000] d..1 213.016543: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0200 wIndex 0000 wLength 44
> >> 527: irq/23-dwc3-1765 [000] d..1 213.016951: dwc3_ctrl_req: bRequestType 00 bRequest 09 wValue 0001 wIndex 0000 wLength 0
> >> 568: irq/23-dwc3-1765 [000] d..1 213.017569: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0301 wIndex 0409 wLength 2
> >> 597: irq/23-dwc3-1765 [000] d..1 213.017926: dwc3_ctrl_req: bRequestType 80 bRequest 06 wValue 0301 wIndex 0409 wLength 26
> >> 626: irq/23-dwc3-1765 [000] d..1 213.018367: dwc3_ctrl_req: bRequestType 81 bRequest 00 wValue 0000 wIndex 0000 wLength 2
> >> 653: irq/23-dwc3-1765 [000] d..1 213.026404: dwc3_ctrl_req: bRequestType a1 bRequest fe wValue 0000 wIndex 0000 wLength 1
> >> 929: irq/23-dwc3-1765 [000] d..1 214.138385: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
> >> 956: irq/23-dwc3-1765 [000] d..1 214.138791: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
> >> 1031: irq/23-dwc3-1765 [000] d..1 214.140214: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
> >> 1058: irq/23-dwc3-1765 [000] d..1 214.140618: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
> >> 1133: irq/23-dwc3-1765 [000] d..1 214.142208: dwc3_ctrl_req: bRequestType 82 bRequest 00 wValue 0000 wIndex 0081 wLength 2
> >> 1160: irq/23-dwc3-1765 [000] d..1 214.142587: dwc3_ctrl_req: bRequestType 02 bRequest 01 wValue 0000 wIndex 0081 wLength 0
> >>
> >> Note that we don't have any "bRequest 03" substring.
> >
> > Is it normal to not see any SetFeature commands? I think I was expecting
> > to see them pretty frequently.
>
> Totally normal :-) A host is not required to allow for device-initiated
> U1/U2 entry. Nor is it required to halt and endpoint, for example.
>
> >> >> > I've attached the traces and dmesg output in an archive titled
> >> >> > "gnarbox-ep0out-failure.tar.xz". Please let me know if you need more
> >> >> > information about this particular issue.
> >> >>
> >> >> I'll go over the traces, this could be something that has been fixed in
> >> >> the past.
> >> >
> >> > Did you get a chance to take a look at the tracepoints for this error?
> >> > No worries if you haven't gotten a chance yet.
> >>
> >> I did, nothing screaming wrong with them. That "can't enable ep0out"
> >> still puzzles me a bit. What we see on tracepoints is a time out:
> >>
> >> modprobe-1281 [002] d..1 246.504043: dwc3_gadget_ep_cmd: ep0out: cmd 'Set Endpoint Transfer Resource' [2] params 00000001 00000000 00000000 --> status: Timed Out
> >>
> >> But I have no idea why that happens. It could be that this was fixed
> >> long ago, but we won't know unless we run a recent kernel on your
> >> setup. I, certainly, haven't seen that :-)
> >
> > One of my colleagues is currently testing kernel 5.0.5 without the patch
> > you suggested and it looks pretty good so far! We'll keep you updated if
> > this changes. With this issue in particular, it is still unclear how to
> > get our device into this state. We'll definitely keep an eye out for it
> > on 5.0.5.
>
> If 5.0.5 is working, then you could run a git bisect and find the commit
> that fixed it. If we have a failing commit and a working commit, git
> bisect will help finding the fix, then you could simply cherry-pick it.
I spoke too soon. 5.0.5 (unpatched) shows significant improvement than 4.9.115 as it
takes about 15 minutes of data transfer before the link drops from SS to
HS. We will be sticking with the patch that you provided us earlier.
It's been working very well for us on 4.9.115.
Cheers,
Rob Weber
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-04-10 1:48 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <87h8bnh5er.fsf@linux.intel.com>
[not found] ` <20190328085343.GA12754@coops>
[not found] ` <87bm1vgyby.fsf@linux.intel.com>
[not found] ` <87h8bhez87.fsf@linux.intel.com>
2019-04-02 7:48 ` Mass Storage Gadget Device Falls from SuperSpeed to High Speed Rob Weber
2019-04-02 8:13 ` Felipe Balbi
[not found] ` <20190403070228.GA27956@coops>
2019-04-03 7:48 ` Felipe Balbi
[not found] ` <20190404075624.GB15850@coops>
2019-04-04 9:43 ` Felipe Balbi
2019-04-10 1:48 ` Rob Weber
2019-04-10 1:48 ` Rob Weber
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).