* usb: dwc3: gadget performance insight
From: Grossman, Jake @ 2024-04-16 14:20 UTC (permalink / raw)
To: Thinh.Nguyen@synopsys.com
Cc: linux-usb@vger.kernel.org, Krebs, Charles, Hardee, Hayden M
Hello,
We're trying to operate a USB gadget backed by the DWC3 core on an iMX8
processor, but we are seeing issues with performance.
As a comparison, utilizing iperf3 to benchmark, we are able to see
~230Mbit/s with an RNDIS gadget, and ~900Mbit/s with a hardware
USB-to-Ethernet peripheral.
Looking at the output of perf, we are seeing that with all of the gadget
drivers (RNDIS, UVC, ACM), there is significant time spent spinning in an
IRQ context that does not occur with the hardware peripheral. This seems
like it might be related to the interrupt handler as described here:
https://docs.kernel.org/usb/dwc3.html.
1. We have not yet acquired technical documentation regarding the DWC3
module. Do you have a list of the DWC3 commands that have high latency
(~1ms)?
2. Do you believe that implementing a per endpoint IRQ framework will
resolve the large disparity in performance? If not, do you have any insight
into what the root cause might be?
Thank you for your time and insight,
Jake Grossman
* Re: usb: dwc3: gadget performance insight
From: Wesley Cheng @ 2024-04-16 22:14 UTC (permalink / raw)
To: Grossman, Jake, Thinh.Nguyen@synopsys.com
Cc: linux-usb@vger.kernel.org, Krebs, Charles, Hardee, Hayden M
Hi Jake,
On 4/16/2024 7:20 AM, Grossman, Jake wrote:
> Hello,
>
> We’re trying to operate a USB gadget backed by the DWC3 core on an iMX8
> processor, but we are seeing issues with performance.
>
> As a comparison, utilizing iperf3 to benchmark, we are able to see
> ~230Mbit/s with an RNDIS gadget, and ~900Mbit/s with a hardware
> USB-to-Ethernet peripheral.
>
It might help to mention which USB-to-Ethernet adapter you are using for
the comparison, since some vendors implement optimizations such as data
aggregation.
Also, in which direction were these numbers measured (i.e., USB IN or
OUT transfers)?
> Looking at the output of perf, we are seeing that with all of the gadget
> drivers (RNDIS, UVC, ACM), there is significant time spent spinning in
> an IRQ context that does not occur with the hardware peripheral. This
> seems like it might be related to the interrupt handler as described
> here: https://docs.kernel.org/usb/dwc3.html.
>
> 1. We have not yet acquired technical documentation regarding the DWC3
> module. Do you have a list of the DWC3 commands that have high
> latency (~1ms)?
The DWC3 gadget driver nowadays uses the updatexfer command. Ages ago it
would only queue with startxfer after every xfernotready event. That
shift definitely optimized how the SW can notify the controller when new
TRBs are submitted to the endpoint's TRB ring while a transfer is already
in progress.
> 2. Do you believe that implementing a per endpoint IRQ framework will
> resolve the large disparity in performance? If not, do you have any
> insight into what the root cause might be?
>
Honestly, based on previous throughput debugging, most of the problems
were at the function driver level, less so in the UDC. I'll echo what
Greg says about RNDIS and add that, security concerns aside, it isn't the
most optimized function for IP data transfers. In my experience the NCM
class with packet framing gives much better numbers than the default
RNDIS configuration, because data aggregation reduces the number of
interrupts per IP packet.
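To put rough numbers on the aggregation argument, here is a back-of-the-envelope sketch in Python. It assumes one completion interrupt per USB transfer and 1500-byte IP packets, neither of which comes from this thread. The function name and parameters are illustrative only, and the real interrupt rate depends on how the function driver and dwc3 queue and complete requests.

```python
def completions_per_second(throughput_bps: float,
                           frame_bytes: int,
                           frames_per_transfer: int) -> float:
    """Estimate transfer completions per second at a given throughput.

    Assumption (not from the thread): every USB transfer raises exactly
    one completion interrupt, and each transfer carries
    `frames_per_transfer` frames of `frame_bytes` bytes.
    """
    if frame_bytes <= 0 or frames_per_transfer < 1:
        raise ValueError("frame_bytes must be > 0 and frames_per_transfer >= 1")
    frames_per_second = throughput_bps / (frame_bytes * 8)
    return frames_per_second / frames_per_transfer


# Compare no aggregation with two hypothetical aggregation factors at the
# ~230 Mbit/s reported for the RNDIS gadget.
for frames in (1, 4, 16):
    rate = completions_per_second(230e6, 1500, frames)
    print(f"{frames:2d} frame(s) per transfer: ~{rate:,.0f} completions/s")
```

Under these assumptions the completion rate falls in proportion to the number of frames packed into each transfer, which is the effect described above.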
Thinh will probably have some more comments, but just sharing my two
cents :). Might be good to get some more details on the above before we
can guide you in the right direction.
Thanks
Wesley Cheng
> Thank you for your time and insight,
>
> Jake Grossman
>
* Re: usb: dwc3: gadget performance insight
From: Thinh Nguyen @ 2024-04-16 22:31 UTC (permalink / raw)
To: Grossman, Jake
Cc: Thinh Nguyen, linux-usb@vger.kernel.org, Krebs, Charles,
Hardee, Hayden M
On Tue, Apr 16, 2024, Grossman, Jake wrote:
> Hello,
>
>
>
> We’re trying to operate a USB gadget backed by the DWC3 core on an iMX8
> processor, but we are seeing issues with performance.
>
>
>
> As a comparison, utilizing iperf3 to benchmark, we are able to see ~230Mbit/s
> with an RNDIS gadget, and ~900Mbit/s with a hardware USB-to-Ethernet
> peripheral.
>
What is "a hardware USB-to-Ethernet peripheral"? Does it use the same
RNDIS function driver and the same kernel version? If not, you're
comparing 2 very different things. Also, I assume that you're testing
against the same host.
>
>
> Looking at the output of perf, we are seeing that with all of the gadget
> drivers (RNDIS, UVC, ACM), there is significant time spent spinning in an IRQ
> context that does not occur with the hardware peripheral. This seems like it
> might be related to the interrupt handler as described here:
> https://docs.kernel.org/usb/dwc3.html.
>
>
>
> 1. We have not yet acquired technical documentation regarding the DWC3
> module. Do you have a list of the DWC3 commands that have high latency
> (~1ms)?
> 2. Do you believe that implementing a per endpoint IRQ framework will resolve
> the large disparity in performance? If not, do you have any insight into
> what the root cause might be?
>
I'm not familiar with RNDIS. However, my suspicion is that RNDIS
transfers are small, and they may not take advantage of USB burst. Or
perhaps your platform doesn't set up the TxFIFO size for performance? On
a side note, isn't RNDIS getting outdated?
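As a rough illustration of the burst point, the sketch below counts how many bus packets a single transfer occupies. It assumes a SuperSpeed bulk endpoint (1024-byte max packet size, up to 16 packets per burst) and one Ethernet-sized frame per transfer. The 1500-byte figure is only an example size, not a measurement from this setup, and the helper names are made up for illustration.

```python
import math

SS_BULK_MAX_PACKET = 1024  # bytes; max packet size of a SuperSpeed bulk endpoint
SS_MAX_BURST = 16          # packets; largest burst a SuperSpeed bulk endpoint allows


def packets_per_transfer(transfer_bytes: int) -> int:
    """Number of bus packets needed for one transfer (at least one)."""
    return max(1, math.ceil(transfer_bytes / SS_BULK_MAX_PACKET))


def burst_fill(transfer_bytes: int, max_burst: int = SS_MAX_BURST) -> float:
    """Fraction of one full burst that a single transfer can occupy."""
    return min(packets_per_transfer(transfer_bytes), max_burst) / max_burst


# One example-sized frame per transfer versus a 16 KiB aggregated transfer.
for size in (1500, 16 * 1024):
    print(f"{size:6d} bytes: {packets_per_transfer(size):2d} packet(s), "
          f"{burst_fill(size):.0%} of a full burst")
```

In this simplified model a transfer holding a single frame fills only a small part of a burst, while a larger aggregated transfer can fill it completely.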
We can achieve close to theoretical USB speeds with the current dwc3
controller driver even on an FPGA platform. There are many factors
contributing to performance. You'd need to review the tracepoints and
perhaps look through the USB packets with a USB sniffer/analyzer to see
what the bottleneck is. I doubt that it's related to DWC3 commands. More
likely than not, implementing per-endpoint IRQs will make the performance
even worse. (Is your dwc3 controller even configured for multiple
interrupters? Somehow I doubt that's the case.)
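For the tracepoint step, here is a minimal sketch of switching the dwc3 trace events on and off from Python. It assumes tracefs is mounted at /sys/kernel/tracing, that the kernel was built with tracing support, and that the script runs as root. The helper name is made up for illustration. As far as I know, the dwc3 driver documentation in the kernel tree describes enabling these events through the events/dwc3/enable file used here.

```python
from pathlib import Path


def set_dwc3_tracing(enabled: bool,
                     tracefs_root: str = "/sys/kernel/tracing") -> Path:
    """Write 1 or 0 to the dwc3 event group's enable file.

    Assumptions: `tracefs_root` is a mounted tracefs instance and the
    caller has permission to write to it (normally root).
    Returns the path that was written so the caller can log it.
    """
    enable_file = Path(tracefs_root) / "events" / "dwc3" / "enable"
    enable_file.write_text("1" if enabled else "0")
    return enable_file


# Typical use on the target board (needs root, so left as a comment):
#   set_dwc3_tracing(True)
#   ... run the iperf3 test ...
#   set_dwc3_tracing(False)
```

After reproducing the slow transfer, the captured events can be read from the `trace` file in the same tracefs mount.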
BR,
Thinh
* Re: usb: dwc3: gadget performance insight
From: Greg KH @ 2024-04-17 6:42 UTC (permalink / raw)
To: Thinh Nguyen
Cc: Grossman, Jake, linux-usb@vger.kernel.org, Krebs, Charles,
Hardee, Hayden M
On Tue, Apr 16, 2024 at 10:31:19PM +0000, Thinh Nguyen wrote:
> On Tue, Apr 16, 2024, Grossman, Jake wrote:
> > Hello,
> >
> >
> >
> > We’re trying to operate a USB gadget backed by the DWC3 core on an iMX8
> > processor, but we are seeing issues with performance.
> >
> >
> >
> > As a comparison, utilizing iperf3 to benchmark, we are able to see ~230Mbit/s
> > with an RNDIS gadget, and ~900Mbit/s with a hardware USB-to-Ethernet
> > peripheral.
> >
>
> What is "a hardware USB-to-Ethernet peripheral"? Does it use the same
> RNDIS function driver and the same kernel version? If not, you're
> comparing 2 very different things. Also, I assume that you're testing
> against the same host.
>
> >
> >
> > Looking at the output of perf, we are seeing that with all of the gadget
> > drivers (RNDIS, UVC, ACM), there is significant time spent spinning in an IRQ
> > context that does not occur with the hardware peripheral. This seems like it
> > might be related to the interrupt handler as described here:
> > https://docs.kernel.org/usb/dwc3.html.
> >
> >
> >
> > 1. We have not yet acquired technical documentation regarding the DWC3
> > module. Do you have a list of the DWC3 commands that have high latency
> > (~1ms)?
> > 2. Do you believe that implementing a per endpoint IRQ framework will resolve
> > the large disparity in performance? If not, do you have any insight into
> > what the root cause might be?
> >
>
> I'm not familiar with RNDIS. However, my suspicion is that RNDIS
> transfers are small, and they may not take advantage of USB burst. Or
> perhaps your platform doesn't set up the TxFIFO size for performance? On
> a side note, isn't RNDIS getting outdated?
It's not only outdated, but incredibly insecure and should not be used
for anything unless you explicitly trust both ends of the connection.
Please never use it for anything real, including benchmarks.
I need to dust off my "delete the rndis code" patch set one of these
days...
thanks,
greg k-h