* usb: dwc3: gadget performance insight
From: Grossman, Jake @ 2024-04-16 14:20 UTC
To: Thinh.Nguyen@synopsys.com
Cc: linux-usb@vger.kernel.org, Krebs, Charles, Hardee, Hayden M

Hello,

We're trying to operate a USB gadget backed by the DWC3 core on an iMX8
processor, but we are seeing issues with performance.

As a comparison, utilizing iperf3 to benchmark, we are able to see
~230Mbit/s with an RNDIS gadget, and ~900Mbit/s with a hardware
USB-to-Ethernet peripheral.

Looking at the output of perf, we are seeing that with all of the gadget
drivers (RNDIS, UVC, ACM), there is significant time spent spinning in
an IRQ context that does not occur with the hardware peripheral. This
seems like it might be related to the interrupt handler as described
here: https://docs.kernel.org/usb/dwc3.html.

1. We have not yet acquired technical documentation regarding the DWC3
   module. Do you have a list of the DWC3 commands that have high
   latency (~1ms)?
2. Do you believe that implementing a per endpoint IRQ framework will
   resolve the large disparity in performance? If not, do you have any
   insight into what the root cause might be?

Thank you for your time and insight,

Jake Grossman

* Re: usb: dwc3: gadget performance insight
From: Wesley Cheng @ 2024-04-16 22:14 UTC
To: Grossman, Jake, Thinh.Nguyen@synopsys.com
Cc: linux-usb@vger.kernel.org, Krebs, Charles, Hardee, Hayden M

Hi Jake,

On 4/16/2024 7:20 AM, Grossman, Jake wrote:
> Hello,
>
> We’re trying to operate a USB gadget backed by the DWC3 core on an iMX8
> processor, but we are seeing issues with performance.
>
> As a comparison, utilizing iperf3 to benchmark, we are able to see
> ~230Mbit/s with an RNDIS gadget, and ~900Mbit/s with a hardware
> USB-to-Ethernet peripheral.
>

Might help to also mention the USB to Ethernet adapter that is being
used in your comparison as well, since some vendors may have some
enhanced optimizations such as data aggregation, etc...

Also, what direction are you getting these numbers in? (ie USB IN or OUT
transfers)

> Looking at the output of perf, we are seeing that with all of the gadget
> drivers (RNDIS, UVC, ACM), there is significant time spent spinning in
> an IRQ context that does not occur with the hardware peripheral. This
> seems like it might be related to the interrupt handler as described
> here: https://docs.kernel.org/usb/dwc3.html.
>
> 1. We have not yet acquired technical documentation regarding the DWC3
>    module. Do you have a list of the DWC3 commands that have high
>    latency (~1ms)?

DWC3 gadget nowadays utilizes the updatexfer command compared to ages
ago where it would only queue with startxfer after every xfernotready
event. That shift definitely optimized how the SW can update the
controller on when new TRBs are submitted to the endpoint's TRB ring if
a transfer is already in progress.

> 2. Do you believe that implementing a per endpoint IRQ framework will
>    resolve the large disparity in performance? If not, do you have any
>    insight into what the root cause might be?
>

Honestly, based on previous throughput debug, most of the problems were
at the function driver level less so from the UDC. I'll echo what Greg
says about RNDIS, and say that, along with the security concerns, it
isn't the most optimized function for IP data transfers. In my
experience the NCM class w/ packet framing will result in much better
numbers than the default RNDIS configuration, as allowing data
aggregation will lessen the number of interrupts per IP packet.

Thinh will probably have some more comments, but just sharing my two
cents :). Might be good to get some more details on the above before we
can guide you in the right direction.

Thanks
Wesley Cheng

> Thank you for your time and insight,
>
> Jake Grossman
>
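
[Editorial illustration, not part of the original message.] Wesley's point about aggregation can be put into rough numbers. The sketch below estimates how many USB transfer completions per second are needed to sustain a given throughput when each transfer carries one or several Ethernet frames. The function name, the 1514-byte frame size, and the one-completion-per-transfer model are all assumptions made for illustration; a real controller and driver may coalesce or moderate interrupts differently.

```python
def completions_per_second(throughput_mbit, frame_bytes, frames_per_transfer):
    """Rough estimate of USB transfer completions per second.

    Assumes (hypothetically) that every USB transfer raises exactly one
    completion event and carries `frames_per_transfer` Ethernet frames.
    """
    bytes_per_second = throughput_mbit * 1_000_000 / 8
    frames_per_second = bytes_per_second / frame_bytes
    return frames_per_second / frames_per_transfer


if __name__ == "__main__":
    FRAME_BYTES = 1514  # assumed full-size Ethernet frame, headers included
    for frames in (1, 4, 10):
        rate = completions_per_second(900, FRAME_BYTES, frames)
        print(f"{frames:2d} frame(s) per transfer: about {rate:,.0f} completions/s")
```

Under this simplified model the completion rate falls in direct proportion to the number of frames aggregated per transfer, which is the effect described for NCM above.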

* Re: usb: dwc3: gadget performance insight
From: Thinh Nguyen @ 2024-04-16 22:31 UTC
To: Grossman, Jake
Cc: Thinh Nguyen, linux-usb@vger.kernel.org, Krebs, Charles, Hardee, Hayden M

On Tue, Apr 16, 2024, Grossman, Jake wrote:
> Hello,
>
> We’re trying to operate a USB gadget backed by the DWC3 core on an iMX8
> processor, but we are seeing issues with performance.
>
> As a comparison, utilizing iperf3 to benchmark, we are able to see ~230Mbit/s
> with an RNDIS gadget, and ~900Mbit/s with a hardware USB-to-Ethernet
> peripheral.
>

What is "a hardware USB-to-Ethernet peripheral"? Does it use the same
RNDIS function driver and the same kernel version? If not, you're
comparing 2 very different things. Also, I assume that you're testing
against the same host.

> Looking at the output of perf, we are seeing that with all of the gadget
> drivers (RNDIS, UVC, ACM), there is significant time spent spinning in an IRQ
> context that does not occur with the hardware peripheral. This seems like it
> might be related to the interrupt handler as described here:
> https://docs.kernel.org/usb/dwc3.html.
>
> 1. We have not yet acquired technical documentation regarding the DWC3
>    module. Do you have a list of the DWC3 commands that have high latency
>    (~1ms)?
> 2. Do you believe that implementing a per endpoint IRQ framework will resolve
>    the large disparity in performance? If not, do you have any insight into
>    what the root cause might be?
>

I'm not familiar with RNDIS. However, my suspicion is that RNDIS
transfers are small, and they may not take advantage of USB burst. Or
perhaps your platform doesn't setup the TxFIFO size for performance? On
a side note, isn't RNDIS getting outdated?

We can achieve close to theoretical USB speeds with the current dwc3
controller driver even on an FPGA platform. There are many factors
contributing to performance. You'd need to review the tracepoints and
perhaps through USB packets using some USB sniffer/analyzer to see what
the bottleneck is.

I doubt that it's related to DWC3 commands. More likely than not
implementing per endpoint IRQ will make the performance even worse (is
your dwc3 controller even configured for multiple interrupters? Somehow
I doubt that's the case)

BR,
Thinh
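
[Editorial illustration, not part of the original message.] The suspicion that small transfers "may not take advantage of USB burst" can be made concrete with a little arithmetic. The sketch below assumes a SuperSpeed bulk endpoint, where the maximum packet size is 1024 bytes and a burst can carry up to 16 packets; whether the link in question runs at SuperSpeed, and what burst size its endpoints actually advertise, is not stated in the thread. The function name and the example transfer sizes are hypothetical.

```python
import math

SS_BULK_MAX_PACKET = 1024  # SuperSpeed bulk max packet size, in bytes
SS_MAX_BURST = 16          # upper limit on packets per burst at SuperSpeed


def packets_and_bursts(transfer_bytes, max_burst=SS_MAX_BURST):
    """Return (packets, bursts) needed to move one transfer of the given size.

    A zero-length transfer still occupies one (zero-length) packet.
    """
    packets = max(1, math.ceil(transfer_bytes / SS_BULK_MAX_PACKET))
    bursts = math.ceil(packets / max_burst)
    return packets, bursts


if __name__ == "__main__":
    for size in (1514, 16384, 65536):  # illustrative transfer sizes only
        packets, bursts = packets_and_bursts(size)
        # Fraction of the available burst slots the transfer actually uses.
        fill = packets / (bursts * SS_MAX_BURST)
        print(f"{size:6d} bytes: {packets:3d} packets, {bursts} burst(s), "
              f"{fill:.0%} of burst slots used")
```

A transfer holding a single Ethernet-sized frame uses only 2 of the 16 packet slots a full burst allows under these assumptions, while a 16 KiB transfer fills a burst completely.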

* Re: usb: dwc3: gadget performance insight
From: Greg KH @ 2024-04-17 6:42 UTC
To: Thinh Nguyen
Cc: Grossman, Jake, linux-usb@vger.kernel.org, Krebs, Charles, Hardee, Hayden M

On Tue, Apr 16, 2024 at 10:31:19PM +0000, Thinh Nguyen wrote:
> On Tue, Apr 16, 2024, Grossman, Jake wrote:
> > Hello,
> >
> > We’re trying to operate a USB gadget backed by the DWC3 core on an iMX8
> > processor, but we are seeing issues with performance.
> >
> > As a comparison, utilizing iperf3 to benchmark, we are able to see ~230Mbit/s
> > with an RNDIS gadget, and ~900Mbit/s with a hardware USB-to-Ethernet
> > peripheral.
> >
>
> What is "a hardware USB-to-Ethernet peripheral"? Does it use the same
> RNDIS function driver and the same kernel version? If not, you're
> comparing 2 very different things. Also, I assume that you're testing
> against the same host.
>
> > Looking at the output of perf, we are seeing that with all of the gadget
> > drivers (RNDIS, UVC, ACM), there is significant time spent spinning in an IRQ
> > context that does not occur with the hardware peripheral. This seems like it
> > might be related to the interrupt handler as described here:
> > https://docs.kernel.org/usb/dwc3.html.
> >
> > 1. We have not yet acquired technical documentation regarding the DWC3
> >    module. Do you have a list of the DWC3 commands that have high latency
> >    (~1ms)?
> > 2. Do you believe that implementing a per endpoint IRQ framework will resolve
> >    the large disparity in performance? If not, do you have any insight into
> >    what the root cause might be?
> >
>
> I'm not familiar with RNDIS. However, my suspicion is that RNDIS
> transfers are small, and they may not take advantage of USB burst. Or
> perhaps your platform doesn't setup the TxFIFO size for performance? On
> a side note, isn't RNDIS getting outdated?

It's not only outdated, but incredibly insecure and should not be used
for anything unless you explicitly trust both ends of the connection.

Please never use it for anything real, including benchmarks.

I need to dust off my "delete the rndis code" patch set one of these
days...

thanks,

greg k-h