* DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed
@ 2023-08-31 22:12 Michael Grzeschik
2023-09-01 1:35 ` Thinh Nguyen
0 siblings, 1 reply; 19+ messages in thread
From: Michael Grzeschik @ 2023-08-31 22:12 UTC (permalink / raw)
To: Thinh.Nguyen; +Cc: linux-usb, kernel
[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]
Hi Thinh!
I just stumbled over a similar issue we already solved for the High
Speed Case when streaming ISOC packages and using a multiplier higher
then one. Last time we saw some bad frame artefacts when using the
higher multiplier value. The Frames were distorted due to truncated
transfers.
Since the last case we have patch
8affe37c525d ("usb: dwc3: gadget: fix high speed multiplier setting")
that fixes the calculation of the mult PCM1 parameter when using high
speed transfers. After that no truncations were reported again.
However I came across a similar issue which is just a little less easy
to trigger and only occurs with Superspeed. Now, while the memory
bandwidth of the machine runs on higher load, the UVC frames are
similarly distorted when we use a multiplier higher then one.
I looked over the implications the multiplier has on the Superspeed case
in the dwc3 gadget driver, but could only find some TXFIFO adjustments
and no other extra bits e.g. in the transfer descriptors. Do you have
some pointers how the multiplier parameter of the endpoint is used in
hardware?
Regards,
Michael
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-08-31 22:12 DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed Michael Grzeschik @ 2023-09-01 1:35 ` Thinh Nguyen 2023-09-03 22:41 ` Michael Grzeschik 0 siblings, 1 reply; 19+ messages in thread From: Thinh Nguyen @ 2023-09-01 1:35 UTC (permalink / raw) To: Michael Grzeschik Cc: Thinh Nguyen, linux-usb@vger.kernel.org, kernel@pengutronix.de Hi Michael, On Fri, Sep 01, 2023, Michael Grzeschik wrote: > Hi Thinh! > > I just stumbled over a similar issue we already solved for the High > Speed Case when streaming ISOC packages and using a multiplier higher > then one. Last time we saw some bad frame artefacts when using the > higher multiplier value. The Frames were distorted due to truncated > transfers. > > Since the last case we have patch > > 8affe37c525d ("usb: dwc3: gadget: fix high speed multiplier setting") > > that fixes the calculation of the mult PCM1 parameter when using high > speed transfers. After that no truncations were reported again. > > However I came across a similar issue which is just a little less easy > to trigger and only occurs with Superspeed. Now, while the memory > bandwidth of the machine runs on higher load, the UVC frames are > similarly distorted when we use a multiplier higher then one. > > I looked over the implications the multiplier has on the Superspeed case > in the dwc3 gadget driver, but could only find some TXFIFO adjustments > and no other extra bits e.g. in the transfer descriptors. Do you have > some pointers how the multiplier parameter of the endpoint is used in > hardware? > As you already know, PCM1 is only for highspeed not Superspeed. What failure did the dwc3 driver reported? Missed isoc? What's the request transfer size? Perhaps you can capture some tracepoints of the problem? Thanks, Thinh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-01 1:35 ` Thinh Nguyen @ 2023-09-03 22:41 ` Michael Grzeschik 2023-09-04 0:42 ` Michael Grzeschik 2023-09-06 0:41 ` Thinh Nguyen 0 siblings, 2 replies; 19+ messages in thread From: Michael Grzeschik @ 2023-09-03 22:41 UTC (permalink / raw) To: Thinh Nguyen; +Cc: linux-usb@vger.kernel.org, kernel@pengutronix.de [-- Attachment #1: Type: text/plain, Size: 3066 bytes --] Hi Thinh On Fri, Sep 01, 2023 at 01:35:16AM +0000, Thinh Nguyen wrote: >On Fri, Sep 01, 2023, Michael Grzeschik wrote: >> I just stumbled over a similar issue we already solved for the High >> Speed Case when streaming ISOC packages and using a multiplier higher >> then one. Last time we saw some bad frame artefacts when using the >> higher multiplier value. The Frames were distorted due to truncated >> transfers. >> >> Since the last case we have patch >> >> 8affe37c525d ("usb: dwc3: gadget: fix high speed multiplier setting") >> >> that fixes the calculation of the mult PCM1 parameter when using high >> speed transfers. After that no truncations were reported again. >> >> However I came across a similar issue which is just a little less easy >> to trigger and only occurs with Superspeed. Now, while the memory >> bandwidth of the machine runs on higher load, the UVC frames are >> similarly distorted when we use a multiplier higher then one. >> >> I looked over the implications the multiplier has on the Superspeed case >> in the dwc3 gadget driver, but could only find some TXFIFO adjustments >> and no other extra bits e.g. in the transfer descriptors. Do you have >> some pointers how the multiplier parameter of the endpoint is used in >> hardware? >> > >As you already know, PCM1 is only for highspeed not Superspeed. What >failure did the dwc3 driver reported? Missed isoc? What's the request >transfer size? Yes, I see missed isoc errors. But this is just a symptom in this case. I can increase the maxburst or maxpacket parameters stepwise and on one point see the flickering appear. But when I increase the TXFIFOSIZE for the endpoint the flickering is gone again. Until I increase one of the parameters maxpacket or maxburst to much again. So due to the memory bandwidth is under pressure, it seems like the hardware is not fast enough with sending the expected data per transfer, due to the txfifo is not long enough and needs to be refilled more often. This sounds like a fifo underrun issue in the hardware. I am currently looking into the fifo_resize device tree paramter, and try to figure out how the calculation is done. From the software design point of view, having the fifo calculation parameterized is a bad idea. We probably want to analyze how many endpoints are going to be used in the active gadget config and use the finite fifo length to calculate some fair parts for every ep once and then never touch them again. Dynamic resizing should not be necessary or do I overlook something? What do you think? >Perhaps you can capture some tracepoints of the problem? IMHO tracepoints are probably not necessary here anymore. Regards, Michael -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-03 22:41 ` Michael Grzeschik @ 2023-09-04 0:42 ` Michael Grzeschik 2023-09-06 0:44 ` Thinh Nguyen 2023-09-06 0:41 ` Thinh Nguyen 1 sibling, 1 reply; 19+ messages in thread From: Michael Grzeschik @ 2023-09-04 0:42 UTC (permalink / raw) To: Thinh Nguyen Cc: linux-usb@vger.kernel.org, kernel@pengutronix.de, stanley_chang [-- Attachment #1: Type: text/plain, Size: 2797 bytes --] Cc'ing: Stanley Chang <stanley_chang@realtek.com> On Mon, Sep 04, 2023 at 12:41:33AM +0200, Michael Grzeschik wrote: >Hi Thinh > >On Fri, Sep 01, 2023 at 01:35:16AM +0000, Thinh Nguyen wrote: >>On Fri, Sep 01, 2023, Michael Grzeschik wrote: >>>I just stumbled over a similar issue we already solved for the High >>>Speed Case when streaming ISOC packages and using a multiplier higher >>>then one. Last time we saw some bad frame artefacts when using the >>>higher multiplier value. The Frames were distorted due to truncated >>>transfers. >>> >>>Since the last case we have patch >>> >>>8affe37c525d ("usb: dwc3: gadget: fix high speed multiplier setting") >>> >>>that fixes the calculation of the mult PCM1 parameter when using high >>>speed transfers. After that no truncations were reported again. >>> >>>However I came across a similar issue which is just a little less easy >>>to trigger and only occurs with Superspeed. Now, while the memory >>>bandwidth of the machine runs on higher load, the UVC frames are >>>similarly distorted when we use a multiplier higher then one. >>> >>>I looked over the implications the multiplier has on the Superspeed case >>>in the dwc3 gadget driver, but could only find some TXFIFO adjustments >>>and no other extra bits e.g. in the transfer descriptors. Do you have >>>some pointers how the multiplier parameter of the endpoint is used in >>>hardware? >>> >> >>As you already know, PCM1 is only for highspeed not Superspeed. What >>failure did the dwc3 driver reported? Missed isoc? What's the request >>transfer size? > >Yes, I see missed isoc errors. But this is just a symptom in this case. > >I can increase the maxburst or maxpacket parameters stepwise and on >one point see the flickering appear. But when I increase the TXFIFOSIZE >for the endpoint the flickering is gone again. Until I increase one of >the parameters maxpacket or maxburst to much again. > >So due to the memory bandwidth is under pressure, it seems like the >hardware is not fast enough with sending the expected data per transfer, >due to the txfifo is not long enough and needs to be refilled more >often. > >This sounds like a fifo underrun issue in the hardware. This whole issue sound like a case of stanleys patches: https://lore.kernel.org/all/20230828055212.5600-1-stanley_chang@realtek.com/ For now I was unluky while adjusting the paramaters. Are those registers anywhere documented? Michael -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-04 0:42 ` Michael Grzeschik @ 2023-09-06 0:44 ` Thinh Nguyen 2023-09-06 6:40 ` Michael Grzeschik 0 siblings, 1 reply; 19+ messages in thread From: Thinh Nguyen @ 2023-09-06 0:44 UTC (permalink / raw) To: Michael Grzeschik Cc: Thinh Nguyen, linux-usb@vger.kernel.org, kernel@pengutronix.de, stanley_chang@realtek.com On Mon, Sep 04, 2023, Michael Grzeschik wrote: > Cc'ing: Stanley Chang <stanley_chang@realtek.com> > > On Mon, Sep 04, 2023 at 12:41:33AM +0200, Michael Grzeschik wrote: > > Hi Thinh > > > > On Fri, Sep 01, 2023 at 01:35:16AM +0000, Thinh Nguyen wrote: > > > On Fri, Sep 01, 2023, Michael Grzeschik wrote: > > > > I just stumbled over a similar issue we already solved for the High > > > > Speed Case when streaming ISOC packages and using a multiplier higher > > > > then one. Last time we saw some bad frame artefacts when using the > > > > higher multiplier value. The Frames were distorted due to truncated > > > > transfers. > > > > > > > > Since the last case we have patch > > > > > > > > 8affe37c525d ("usb: dwc3: gadget: fix high speed multiplier setting") > > > > > > > > that fixes the calculation of the mult PCM1 parameter when using high > > > > speed transfers. After that no truncations were reported again. > > > > > > > > However I came across a similar issue which is just a little less easy > > > > to trigger and only occurs with Superspeed. Now, while the memory > > > > bandwidth of the machine runs on higher load, the UVC frames are > > > > similarly distorted when we use a multiplier higher then one. > > > > > > > > I looked over the implications the multiplier has on the Superspeed case > > > > in the dwc3 gadget driver, but could only find some TXFIFO adjustments > > > > and no other extra bits e.g. in the transfer descriptors. Do you have > > > > some pointers how the multiplier parameter of the endpoint is used in > > > > hardware? > > > > > > > > > > As you already know, PCM1 is only for highspeed not Superspeed. What > > > failure did the dwc3 driver reported? Missed isoc? What's the request > > > transfer size? > > > > Yes, I see missed isoc errors. But this is just a symptom in this case. > > > > I can increase the maxburst or maxpacket parameters stepwise and on > > one point see the flickering appear. But when I increase the TXFIFOSIZE > > for the endpoint the flickering is gone again. Until I increase one of > > the parameters maxpacket or maxburst to much again. > > > > So due to the memory bandwidth is under pressure, it seems like the > > hardware is not fast enough with sending the expected data per transfer, > > due to the txfifo is not long enough and needs to be refilled more > > often. > > > > This sounds like a fifo underrun issue in the hardware. > > This whole issue sound like a case of stanleys patches: > > https://lore.kernel.org/all/20230828055212.5600-1-stanley_chang@realtek.com/ > > For now I was unluky while adjusting the paramaters. Are those registers > anywhere documented? > How does his change impact isoc endpoint? His change should only affect non-periodic endpoint. Also periodic settings is only for host. Thanks, Thinh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-06 0:44 ` Thinh Nguyen @ 2023-09-06 6:40 ` Michael Grzeschik 0 siblings, 0 replies; 19+ messages in thread From: Michael Grzeschik @ 2023-09-06 6:40 UTC (permalink / raw) To: Thinh Nguyen Cc: linux-usb@vger.kernel.org, kernel@pengutronix.de, stanley_chang@realtek.com [-- Attachment #1: Type: text/plain, Size: 3590 bytes --] On Wed, Sep 06, 2023 at 12:44:02AM +0000, Thinh Nguyen wrote: >On Mon, Sep 04, 2023, Michael Grzeschik wrote: >> Cc'ing: Stanley Chang <stanley_chang@realtek.com> >> >> On Mon, Sep 04, 2023 at 12:41:33AM +0200, Michael Grzeschik wrote: >> > Hi Thinh >> > >> > On Fri, Sep 01, 2023 at 01:35:16AM +0000, Thinh Nguyen wrote: >> > > On Fri, Sep 01, 2023, Michael Grzeschik wrote: >> > > > I just stumbled over a similar issue we already solved for the High >> > > > Speed Case when streaming ISOC packages and using a multiplier higher >> > > > then one. Last time we saw some bad frame artefacts when using the >> > > > higher multiplier value. The Frames were distorted due to truncated >> > > > transfers. >> > > > >> > > > Since the last case we have patch >> > > > >> > > > 8affe37c525d ("usb: dwc3: gadget: fix high speed multiplier setting") >> > > > >> > > > that fixes the calculation of the mult PCM1 parameter when using high >> > > > speed transfers. After that no truncations were reported again. >> > > > >> > > > However I came across a similar issue which is just a little less easy >> > > > to trigger and only occurs with Superspeed. Now, while the memory >> > > > bandwidth of the machine runs on higher load, the UVC frames are >> > > > similarly distorted when we use a multiplier higher then one. >> > > > >> > > > I looked over the implications the multiplier has on the Superspeed case >> > > > in the dwc3 gadget driver, but could only find some TXFIFO adjustments >> > > > and no other extra bits e.g. in the transfer descriptors. Do you have >> > > > some pointers how the multiplier parameter of the endpoint is used in >> > > > hardware? >> > > > >> > > >> > > As you already know, PCM1 is only for highspeed not Superspeed. What >> > > failure did the dwc3 driver reported? Missed isoc? What's the request >> > > transfer size? >> > >> > Yes, I see missed isoc errors. But this is just a symptom in this case. >> > >> > I can increase the maxburst or maxpacket parameters stepwise and on >> > one point see the flickering appear. But when I increase the TXFIFOSIZE >> > for the endpoint the flickering is gone again. Until I increase one of >> > the parameters maxpacket or maxburst to much again. >> > >> > So due to the memory bandwidth is under pressure, it seems like the >> > hardware is not fast enough with sending the expected data per transfer, >> > due to the txfifo is not long enough and needs to be refilled more >> > often. >> > >> > This sounds like a fifo underrun issue in the hardware. >> >> This whole issue sound like a case of stanleys patches: >> >> https://lore.kernel.org/all/20230828055212.5600-1-stanley_chang@realtek.com/ >> >> For now I was unluky while adjusting the paramaters. Are those registers >> anywhere documented? >> > >How does his change impact isoc endpoint? His change should only affect >non-periodic endpoint. Also periodic settings is only for host. Since I did not find proper documentation of the registers I was just hoping. In the device-tree documentation stanley added he mentioned some device mode properties. As I look again it seems the RXFIFO parameters are the only one affecting device mode. So nevermind. Thanks, Michael -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-03 22:41 ` Michael Grzeschik 2023-09-04 0:42 ` Michael Grzeschik @ 2023-09-06 0:41 ` Thinh Nguyen 2023-09-06 7:18 ` Michael Grzeschik 1 sibling, 1 reply; 19+ messages in thread From: Thinh Nguyen @ 2023-09-06 0:41 UTC (permalink / raw) To: Michael Grzeschik Cc: Thinh Nguyen, linux-usb@vger.kernel.org, kernel@pengutronix.de Hi, On Mon, Sep 04, 2023, Michael Grzeschik wrote: > Hi Thinh > > On Fri, Sep 01, 2023 at 01:35:16AM +0000, Thinh Nguyen wrote: > > On Fri, Sep 01, 2023, Michael Grzeschik wrote: > > > I just stumbled over a similar issue we already solved for the High > > > Speed Case when streaming ISOC packages and using a multiplier higher > > > then one. Last time we saw some bad frame artefacts when using the > > > higher multiplier value. The Frames were distorted due to truncated > > > transfers. > > > > > > Since the last case we have patch > > > > > > 8affe37c525d ("usb: dwc3: gadget: fix high speed multiplier setting") > > > > > > that fixes the calculation of the mult PCM1 parameter when using high > > > speed transfers. After that no truncations were reported again. > > > > > > However I came across a similar issue which is just a little less easy > > > to trigger and only occurs with Superspeed. Now, while the memory > > > bandwidth of the machine runs on higher load, the UVC frames are > > > similarly distorted when we use a multiplier higher then one. > > > > > > I looked over the implications the multiplier has on the Superspeed case > > > in the dwc3 gadget driver, but could only find some TXFIFO adjustments > > > and no other extra bits e.g. in the transfer descriptors. Do you have > > > some pointers how the multiplier parameter of the endpoint is used in > > > hardware? > > > > > > > As you already know, PCM1 is only for highspeed not Superspeed. What > > failure did the dwc3 driver reported? Missed isoc? What's the request > > transfer size? > > Yes, I see missed isoc errors. But this is just a symptom in this case. > > I can increase the maxburst or maxpacket parameters stepwise and on > one point see the flickering appear. But when I increase the TXFIFOSIZE > for the endpoint the flickering is gone again. Until I increase one of > the parameters maxpacket or maxburst to much again. > > So due to the memory bandwidth is under pressure, it seems like the > hardware is not fast enough with sending the expected data per transfer, > due to the txfifo is not long enough and needs to be refilled more > often. > > This sounds like a fifo underrun issue in the hardware. > > I am currently looking into the fifo_resize device tree paramter, > and try to figure out how the calculation is done. > > From the software design point of view, having the fifo calculation > parameterized is a bad idea. We probably want to analyze how many > endpoints are going to be used in the active gadget config and use the > finite fifo length to calculate some fair parts for every ep > once and then never touch them again. Dynamic resizing should not be > necessary or do I overlook something? > > What do you think? There are multiple factors that impact the isoc performance: 1) FIFO size Typically the FIFO size is recommended to a minimum of the output of dwc3_gadget_calc_tx_fifo_size(). For isoc, depending on the request size, if it's 48KB/uframe, you probably need more than the minimum. Each physical endpoint has a pre-configured TX FIFO size. Without touching the FIFO reconfiguration parameter, then the driver will just pick the first physical endpoint that can be used, but it may not be the best fit for your purpose. 2) Burst setting I think this is self-explainatory. Large data request needs higher burst. 3) Internal cache size The controller caches X number of TRBs every time you queue a new request. If a request has a lot small TRBs, then it needs to recache multiple times just to complete the request. Typically the controller caches 8 or 16 TRBs. I notice that UVC uses SG to split up the request to small SG entries base on logical parts rather than for performance or hardware size contraints. So, there can be improvements here. 4) Host bandwidth constraint The host can limit the burst threshold and do less burst than what the device is capable of due to the host's bandwidth contraint. If there are other endpoint traffic on top of isoc endpoint, then the host would have to reserve bandwidth for other endpoints. Depending on the host controller implementation, it may reserve more or less for isoc. Also, if there are hubs or other devices in the topology, then it impacts the bandwidth too. Base on your description, looks like modifying TX FIFO size impacts the most. We already have some properties to calculate and resize the TX FIFO size, did you try to see if it improves for you? (check dwc->do_fifo_resize). > > > Perhaps you can capture some tracepoints of the problem? > > IMHO tracepoints are probably not necessary here anymore. > This helps to see how uvc setups the transfer and see what could impact the isoc transfers (for point 3 and 4 above). However, if just resizing the TX FIFO works for you, perhaps we can just focus into this. Thanks, Thinh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-06 0:41 ` Thinh Nguyen @ 2023-09-06 7:18 ` Michael Grzeschik 2023-09-06 23:05 ` Thinh Nguyen 0 siblings, 1 reply; 19+ messages in thread From: Michael Grzeschik @ 2023-09-06 7:18 UTC (permalink / raw) To: Thinh Nguyen; +Cc: linux-usb@vger.kernel.org, kernel@pengutronix.de [-- Attachment #1: Type: text/plain, Size: 6526 bytes --] Hi On Wed, Sep 06, 2023 at 12:41:58AM +0000, Thinh Nguyen wrote: >On Mon, Sep 04, 2023, Michael Grzeschik wrote: >> On Fri, Sep 01, 2023 at 01:35:16AM +0000, Thinh Nguyen wrote: >> > On Fri, Sep 01, 2023, Michael Grzeschik wrote: >> > > I just stumbled over a similar issue we already solved for the High >> > > Speed Case when streaming ISOC packages and using a multiplier higher >> > > then one. Last time we saw some bad frame artefacts when using the >> > > higher multiplier value. The Frames were distorted due to truncated >> > > transfers. >> > > >> > > Since the last case we have patch >> > > >> > > 8affe37c525d ("usb: dwc3: gadget: fix high speed multiplier setting") >> > > >> > > that fixes the calculation of the mult PCM1 parameter when using high >> > > speed transfers. After that no truncations were reported again. >> > > >> > > However I came across a similar issue which is just a little less easy >> > > to trigger and only occurs with Superspeed. Now, while the memory >> > > bandwidth of the machine runs on higher load, the UVC frames are >> > > similarly distorted when we use a multiplier higher then one. >> > > >> > > I looked over the implications the multiplier has on the Superspeed case >> > > in the dwc3 gadget driver, but could only find some TXFIFO adjustments >> > > and no other extra bits e.g. in the transfer descriptors. Do you have >> > > some pointers how the multiplier parameter of the endpoint is used in >> > > hardware? >> > > >> > >> > As you already know, PCM1 is only for highspeed not Superspeed. What >> > failure did the dwc3 driver reported? Missed isoc? What's the request >> > transfer size? >> >> Yes, I see missed isoc errors. But this is just a symptom in this case. >> >> I can increase the maxburst or maxpacket parameters stepwise and on >> one point see the flickering appear. But when I increase the TXFIFOSIZE >> for the endpoint the flickering is gone again. Until I increase one of >> the parameters maxpacket or maxburst to much again. >> >> So due to the memory bandwidth is under pressure, it seems like the >> hardware is not fast enough with sending the expected data per transfer, >> due to the txfifo is not long enough and needs to be refilled more >> often. >> >> This sounds like a fifo underrun issue in the hardware. >> >> I am currently looking into the fifo_resize device tree paramter, >> and try to figure out how the calculation is done. >> >> From the software design point of view, having the fifo calculation >> parameterized is a bad idea. We probably want to analyze how many >> endpoints are going to be used in the active gadget config and use the >> finite fifo length to calculate some fair parts for every ep >> once and then never touch them again. Dynamic resizing should not be >> necessary or do I overlook something? >> >> What do you think? > >There are multiple factors that impact the isoc performance: >1) FIFO size > Typically the FIFO size is recommended to a minimum of the > output of dwc3_gadget_calc_tx_fifo_size(). For isoc, depending > on the request size, if it's 48KB/uframe, you probably need more > than the minimum. Each physical endpoint has a pre-configured TX > FIFO size. Without touching the FIFO reconfiguration parameter, > then the driver will just pick the first physical endpoint that > can be used, but it may not be the best fit for your purpose. This has an effect but needs to be tweaked somehow. >2) Burst setting > I think this is self-explainatory. Large data request needs > higher burst. I will have to find out if the burst setting can be changed on the rk3568 somehow. This sounds very likely. > >3) Internal cache size > The controller caches X number of TRBs every time you queue a > new request. If a request has a lot small TRBs, then it needs to > recache multiple times just to complete the request. Typically > the controller caches 8 or 16 TRBs. I notice that UVC uses SG to > split up the request to small SG entries base on logical parts > rather than for performance or hardware size contraints. So, > there can be improvements here. In my particular case the issue is also seen when using the non_sg uvc path and doing memcpy instead. So I for this usecase it may be the less significant factor. >4) Host bandwidth constraint > The host can limit the burst threshold and do less burst than > what the device is capable of due to the host's bandwidth > contraint. If there are other endpoint traffic on top of isoc > endpoint, then the host would have to reserve bandwidth for > other endpoints. Depending on the host controller > implementation, it may reserve more or less for isoc. Also, if > there are hubs or other devices in the topology, then it impacts > the bandwidth too. The host has no other traffic going on other then the isoc uvc streaming. >Base on your description, looks like modifying TX FIFO size impacts the >most. We already have some properties to calculate and resize the TX >FIFO size, did you try to see if it improves for you? (check >dwc->do_fifo_resize). First, thaks for the detailed explanation! I already enabled do_fifo_resize but it did not have an effect. However when I additionally set the second parameter from dwc3_gadget_calc_tx_fifo_size in dwc3_gadget_resize_tx_fifos from 1 to 2 it improves the streaming. But it does only to another threshold. In particular without the fifo increase it was only possible to use maxburst of 2 and multilier of 2. With the increased txfifo the it was possible to use maxburst of 4. My goal is an multiplier of 2 and maxburst of at least 6. Otherwise streaming of 4k mjpeg is not guaranteed. >> > Perhaps you can capture some tracepoints of the problem? >> >> IMHO tracepoints are probably not necessary here anymore. >> > >This helps to see how uvc setups the transfer and see what could impact >the isoc transfers (for point 3 and 4 above). However, if just resizing >the TX FIFO works for you, perhaps we can just focus into this. Since 3 and 4 are currently not so significant I will focus on the first two points. Thanks, Michael -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-06 7:18 ` Michael Grzeschik @ 2023-09-06 23:05 ` Thinh Nguyen 2023-09-06 23:09 ` Thinh Nguyen 0 siblings, 1 reply; 19+ messages in thread From: Thinh Nguyen @ 2023-09-06 23:05 UTC (permalink / raw) To: Michael Grzeschik Cc: Thinh Nguyen, linux-usb@vger.kernel.org, kernel@pengutronix.de On Wed, Sep 06, 2023, Michael Grzeschik wrote: > > > 2) Burst setting > > I think this is self-explainatory. Large data request needs > > higher burst. > > I will have to find out if the burst setting can be changed on the > rk3568 somehow. This sounds very likely. > The dwc3 driver checks the endpoint descriptor from the UVC function driver to setup the burst. So just setup the max 16 bursts (or 15 in the descriptor). The dwc3 controller supports that. Whether the host would do 16 bursts is another thing. Note that there's no "mult" setting for SuperSpeed. I recall that UVC tries to pack a lot of data in a single request. All the while some intervals it would send 0-length data because of idle time. I would spread to more requests with a little less data to give the host a little more breathing room and bandwidth. BR, Thinh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-06 23:05 ` Thinh Nguyen @ 2023-09-06 23:09 ` Thinh Nguyen 2023-09-07 21:00 ` Michael Grzeschik 0 siblings, 1 reply; 19+ messages in thread From: Thinh Nguyen @ 2023-09-06 23:09 UTC (permalink / raw) To: Michael Grzeschik; +Cc: linux-usb@vger.kernel.org, kernel@pengutronix.de On Wed, Sep 06, 2023, Thinh Nguyen wrote: > On Wed, Sep 06, 2023, Michael Grzeschik wrote: > > > > > 2) Burst setting > > > I think this is self-explainatory. Large data request needs > > > higher burst. > > > > I will have to find out if the burst setting can be changed on the > > rk3568 somehow. This sounds very likely. > > > > The dwc3 driver checks the endpoint descriptor from the UVC function > driver to setup the burst. So just setup the max 16 bursts (or 15 in the > descriptor). The dwc3 controller supports that. Whether the host would > do 16 bursts is another thing. Note that there's no "mult" setting for > SuperSpeed. Clarification: no mult setting for the dwc3 controller when operate in SuperSpeed. Thinh > > I recall that UVC tries to pack a lot of data in a single request. All > the while some intervals it would send 0-length data because of idle > time. I would spread to more requests with a little less data to give > the host a little more breathing room and bandwidth. > > BR, > Thinh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-06 23:09 ` Thinh Nguyen @ 2023-09-07 21:00 ` Michael Grzeschik 2023-09-07 23:33 ` Thinh Nguyen 0 siblings, 1 reply; 19+ messages in thread From: Michael Grzeschik @ 2023-09-07 21:00 UTC (permalink / raw) To: Thinh Nguyen; +Cc: linux-usb@vger.kernel.org, kernel@pengutronix.de [-- Attachment #1: Type: text/plain, Size: 2463 bytes --] Hi Thinh Thanks for the update! On Wed, Sep 06, 2023 at 11:09:03PM +0000, Thinh Nguyen wrote: >On Wed, Sep 06, 2023, Thinh Nguyen wrote: >> On Wed, Sep 06, 2023, Michael Grzeschik wrote: >> > >> > > 2) Burst setting >> > > I think this is self-explainatory. Large data request needs >> > > higher burst. >> > >> > I will have to find out if the burst setting can be changed on the >> > rk3568 somehow. This sounds very likely. >> > >> >> The dwc3 driver checks the endpoint descriptor from the UVC function >> driver to setup the burst. So just setup the max 16 bursts (or 15 in the >> descriptor). The dwc3 controller supports that. Whether the host would >> do 16 bursts is another thing. Note that there's no "mult" setting for >> SuperSpeed. > >Clarification: no mult setting for the dwc3 controller when operate in >SuperSpeed. I was somehow mistaken by the wording "burst setting" and thought of the axi-bus burst setting to the controller instead of the usb3 maxburst setting as you ment actually. However. This is usefull input anyway. I never thought of maximizing the burst and packagesize and let the host side make the decision. But we will probably will eat up a lot of usb bandwidth by doing so. Before your note I was somehow mistaken that the maxpacket and maxburst size had to correlate with the actually transfered data we queue into the hardware per request. >> I recall that UVC tries to pack a lot of data in a single request. All >> the while some intervals it would send 0-length data because of idle >> time. I would spread to more requests with a little less data to give >> the host a little more breathing room and bandwidth. The higher load per request is due to the fact that the uvc gadget is using the multiplier, maxpacket and maxburst parameters for the size calculation of the request. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/uvc_video.c#n331 Since it is clear now that those parameters are not necessary coupled it makes total sense to split the requests into smaller chunks. Thanks, Michael -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-07 21:00 ` Michael Grzeschik @ 2023-09-07 23:33 ` Thinh Nguyen 2023-10-30 12:18 ` Michael Grzeschik 0 siblings, 1 reply; 19+ messages in thread From: Thinh Nguyen @ 2023-09-07 23:33 UTC (permalink / raw) To: Michael Grzeschik Cc: Thinh Nguyen, linux-usb@vger.kernel.org, kernel@pengutronix.de On Thu, Sep 07, 2023, Michael Grzeschik wrote: > Hi Thinh > > Thanks for the update! > > On Wed, Sep 06, 2023 at 11:09:03PM +0000, Thinh Nguyen wrote: > > On Wed, Sep 06, 2023, Thinh Nguyen wrote: > > > On Wed, Sep 06, 2023, Michael Grzeschik wrote: > > > > > > > > > 2) Burst setting > > > > > I think this is self-explainatory. Large data request needs > > > > > higher burst. > > > > > > > > I will have to find out if the burst setting can be changed on the > > > > rk3568 somehow. This sounds very likely. > > > > > > > > > > The dwc3 driver checks the endpoint descriptor from the UVC function > > > driver to setup the burst. So just setup the max 16 bursts (or 15 in the > > > descriptor). The dwc3 controller supports that. Whether the host would > > > do 16 bursts is another thing. Note that there's no "mult" setting for > > > SuperSpeed. > > > > Clarification: no mult setting for the dwc3 controller when operate in > > SuperSpeed. > > I was somehow mistaken by the wording "burst setting" and thought of the > axi-bus burst setting to the controller instead of the usb3 maxburst > setting as you ment actually. I see. You were referring to the axi-bus burst. If your platform takes a long time to DMA out the data, it will impact the performance also. You can play around with GSBUSCFG0 to enable/restrict certain burst sizes to see any improvement. However, I would expect the default coreConfiguration values should be optimal for your platform design. > > However. This is usefull input anyway. I never thought of maximizing the > burst and packagesize and let the host side make the decision. > But we will probably will eat up a lot of usb bandwidth by doing so. > > Before your note I was somehow mistaken that the maxpacket and maxburst > size had to correlate with the actually transfered data we queue into > the hardware per request. Right. The maxpacket, maxburst, and mult limit the max request data length you can send. > > > > I recall that UVC tries to pack a lot of data in a single request. All > > > the while some intervals it would send 0-length data because of idle > > > time. I would spread to more requests with a little less data to give > > > the host a little more breathing room and bandwidth. > > The higher load per request is due to the fact that the uvc gadget is > using the multiplier, maxpacket and maxburst parameters for the size > calculation of the request. > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/uvc_video.c#n331 > > Since it is clear now that those parameters are not necessary coupled > it makes total sense to split the requests into smaller chunks. > Ok. BR, Thinh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-09-07 23:33 ` Thinh Nguyen @ 2023-10-30 12:18 ` Michael Grzeschik 2023-10-31 23:18 ` Thinh Nguyen 0 siblings, 1 reply; 19+ messages in thread From: Michael Grzeschik @ 2023-10-30 12:18 UTC (permalink / raw) To: Thinh Nguyen; +Cc: linux-usb@vger.kernel.org, kernel@pengutronix.de [-- Attachment #1: Type: text/plain, Size: 4267 bytes --] Hi Thinh, On Thu, Sep 07, 2023 at 11:33:26PM +0000, Thinh Nguyen wrote: >On Thu, Sep 07, 2023, Michael Grzeschik wrote: >> On Wed, Sep 06, 2023 at 11:09:03PM +0000, Thinh Nguyen wrote: >> > On Wed, Sep 06, 2023, Thinh Nguyen wrote: >> > > On Wed, Sep 06, 2023, Michael Grzeschik wrote: >> > > > >> > > > > 2) Burst setting >> > > > > I think this is self-explainatory. Large data request needs >> > > > > higher burst. >> > > > >> > > > I will have to find out if the burst setting can be changed on the >> > > > rk3568 somehow. This sounds very likely. >> > > > >> > > >> > > The dwc3 driver checks the endpoint descriptor from the UVC function >> > > driver to setup the burst. So just setup the max 16 bursts (or 15 in the >> > > descriptor). The dwc3 controller supports that. Whether the host would >> > > do 16 bursts is another thing. Note that there's no "mult" setting for >> > > SuperSpeed. >> > >> > Clarification: no mult setting for the dwc3 controller when operate in >> > SuperSpeed. >> >> I was somehow mistaken by the wording "burst setting" and thought of the >> axi-bus burst setting to the controller instead of the usb3 maxburst >> setting as you ment actually. > >I see. You were referring to the axi-bus burst. If your platform takes a >long time to DMA out the data, it will impact the performance also. You >can play around with GSBUSCFG0 to enable/restrict certain burst sizes to >see any improvement. However, I would expect the default >coreConfiguration values should be optimal for your platform design. I was not lucky with that parameters. Under heavy memory load the system still runs into fifo underruns. >> However. This is usefull input anyway. I never thought of maximizing the >> burst and packagesize and let the host side make the decision. >> But we will probably will eat up a lot of usb bandwidth by doing so. >> >> Before your note I was somehow mistaken that the maxpacket and maxburst >> size had to correlate with the actually transfered data we queue into >> the hardware per request. > >Right. The maxpacket, maxburst, and mult limit the max request data >length you can send. >> > > I recall that UVC tries to pack a lot of data in a single request. All >> > > the while some intervals it would send 0-length data because of idle >> > > time. I would spread to more requests with a little less data to give >> > > the host a little more breathing room and bandwidth. >> >> The higher load per request is due to the fact that the uvc gadget is >> using the multiplier, maxpacket and maxburst parameters for the size >> calculation of the request. >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/uvc_video.c#n331 >> >> Since it is clear now that those parameters are not necessary coupled >> it makes total sense to split the requests into smaller chunks. >> > >Ok. So changing the req_size to smaller chunks indeed did increase the stability. The main misunderstanding here was that the that not every request corresponds with the timeslot of one interval. After reading this thread once again, it is clear that this is not the case and we still find all possible bandwidth by decreasing the size of each request. The main takeway was that with the hardware will cache several trbs into the queue. So when there are not enough trbs available because they are just to big, the fifo runs into underruns. In our case with the high memory bandwidth usage, the trbs could probably not even read out that few trbs in time and did trigger the case earlier. So with smaller requests the fifo will be filled with more smaller trbs and does not run out that fast, since more trbs are cached to begin with. One more question: Does the caching amount of the fifo directly correlate with the endpoint setting in DWC3_DEPCFG_BURST_SIZE then? Regards, Michael -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed 2023-10-30 12:18 ` Michael Grzeschik @ 2023-10-31 23:18 ` Thinh Nguyen 2023-11-09 23:33 ` [PATCH] usb: gadget: uvc: reduce the request size to increase the throughput Michael Grzeschik 0 siblings, 1 reply; 19+ messages in thread From: Thinh Nguyen @ 2023-10-31 23:18 UTC (permalink / raw) To: Michael Grzeschik Cc: Thinh Nguyen, linux-usb@vger.kernel.org, kernel@pengutronix.de Hi, On Mon, Oct 30, 2023, Michael Grzeschik wrote: > Hi Thinh, > > On Thu, Sep 07, 2023 at 11:33:26PM +0000, Thinh Nguyen wrote: > > On Thu, Sep 07, 2023, Michael Grzeschik wrote: > > > On Wed, Sep 06, 2023 at 11:09:03PM +0000, Thinh Nguyen wrote: > > > > On Wed, Sep 06, 2023, Thinh Nguyen wrote: > > > > > On Wed, Sep 06, 2023, Michael Grzeschik wrote: > > > > > > > > > > > > > 2) Burst setting > > > > > > > I think this is self-explainatory. Large data request needs > > > > > > > higher burst. > > > > > > > > > > > > I will have to find out if the burst setting can be changed on the > > > > > > rk3568 somehow. This sounds very likely. > > > > > > > > > > > > > > > > The dwc3 driver checks the endpoint descriptor from the UVC function > > > > > driver to setup the burst. So just setup the max 16 bursts (or 15 in the > > > > > descriptor). The dwc3 controller supports that. Whether the host would > > > > > do 16 bursts is another thing. Note that there's no "mult" setting for > > > > > SuperSpeed. > > > > > > > > Clarification: no mult setting for the dwc3 controller when operate in > > > > SuperSpeed. > > > > > > I was somehow mistaken by the wording "burst setting" and thought of the > > > axi-bus burst setting to the controller instead of the usb3 maxburst > > > setting as you ment actually. > > > > I see. You were referring to the axi-bus burst. If your platform takes a > > long time to DMA out the data, it will impact the performance also. You > > can play around with GSBUSCFG0 to enable/restrict certain burst sizes to > > see any improvement. However, I would expect the default > > coreConfiguration values should be optimal for your platform design. > > I was not lucky with that parameters. Under heavy memory load the > system still runs into fifo underruns. > > > > However. This is usefull input anyway. I never thought of maximizing the > > > burst and packagesize and let the host side make the decision. > > > But we will probably will eat up a lot of usb bandwidth by doing so. > > > > > > Before your note I was somehow mistaken that the maxpacket and maxburst > > > size had to correlate with the actually transfered data we queue into > > > the hardware per request. > > > > Right. The maxpacket, maxburst, and mult limit the max request data > > length you can send. > > > > > > I recall that UVC tries to pack a lot of data in a single request. All > > > > > the while some intervals it would send 0-length data because of idle > > > > > time. I would spread to more requests with a little less data to give > > > > > the host a little more breathing room and bandwidth. > > > > > > The higher load per request is due to the fact that the uvc gadget is > > > using the multiplier, maxpacket and maxburst parameters for the size > > > calculation of the request. > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/uvc_video.c#n331 > > > > > > Since it is clear now that those parameters are not necessary coupled > > > it makes total sense to split the requests into smaller chunks. > > > > > > > Ok. > > So changing the req_size to smaller chunks indeed did increase the > stability. The main misunderstanding here was that the that not every > request corresponds with the timeslot of one interval. > > After reading this thread once again, it is clear that this is not the > case and we still find all possible bandwidth by decreasing the size of > each request. That's right. > > The main takeway was that with the hardware will cache several > trbs into the queue. So when there are not enough trbs available > because they are just to big, the fifo runs into underruns. > > In our case with the high memory bandwidth usage, the trbs could > probably not even read out that few trbs in time and did trigger > the case earlier. > > So with smaller requests the fifo will be filled with more smaller > trbs and does not run out that fast, since more trbs are cached to > begin with. > > One more question: Does the caching amount of the fifo directly > correlate with the endpoint setting in DWC3_DEPCFG_BURST_SIZE then? > No. The cache I was referring to is the cache for TRBs, and the caching of TRBs is not related to DWC3_DEPCFG_BURST_SIZE. BR, Thinh ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] usb: gadget: uvc: reduce the request size to increase the throughput 2023-10-31 23:18 ` Thinh Nguyen @ 2023-11-09 23:33 ` Michael Grzeschik 2023-11-10 2:16 ` Thinh Nguyen 0 siblings, 1 reply; 19+ messages in thread From: Michael Grzeschik @ 2023-11-09 23:33 UTC (permalink / raw) To: linux-usb; +Cc: laurent.pinchart, dan.scally, gregkh, kernel One misconception of queueing request to the usb gadget controller is, that the amount of data that one usb_request is representing is the same as the hardware is able to transfer in one interval. This exact idea applies to the uvc_video gadget that assumes that the request needs to have the size as the gadgets bandwidth parameters are configured with. (maxpacket, multiplier and burst) Although it is actually no problem for the hardware to queue a big request, in the case of isochronous transfers it is impractical to queue big amount of data per request to the hardware. Especially if the necessary bandwidth was increased on purpose to maintain high amounts of data. E.g. in the dwc3-controller it has the negative impact that the endpoint FIFO will not be filled fast enough by the internal hardware engine. Since each transfer buffer (TRB) represents one big request, the hardware will need a long time to prefill the hardware FIFO. This can be avoided by queueing more smaller requests which will then lead to smaller TRBs which the hardware engine can keep up with. This patch is simply dropping the maxburst as an multiplier for the size calculation and therefor decreases the overall maximum request size. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> --- This patch is created as an result from the discussion in the following thread: https://lore.kernel.org/linux-usb/20231031231841.dbhtrgqounu2rvgn@synopsys.com/ drivers/usb/gadget/function/uvc_queue.c | 1 - drivers/usb/gadget/function/uvc_video.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c index 0aa3d7e1f3cc32..1d3c3c09ff97cb 100644 --- a/drivers/usb/gadget/function/uvc_queue.c +++ b/drivers/usb/gadget/function/uvc_queue.c @@ -55,7 +55,6 @@ static int uvc_queue_setup(struct vb2_queue *vq, sizes[0] = video->imagesize; req_size = video->ep->maxpacket - * max_t(unsigned int, video->ep->maxburst, 1) * (video->ep->mult); /* We divide by two, to increase the chance to run diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c index 91af3b1ef0d412..c6b61843bad3d7 100644 --- a/drivers/usb/gadget/function/uvc_video.c +++ b/drivers/usb/gadget/function/uvc_video.c @@ -329,7 +329,6 @@ uvc_video_alloc_requests(struct uvc_video *video) BUG_ON(video->req_size); req_size = video->ep->maxpacket - * max_t(unsigned int, video->ep->maxburst, 1) * (video->ep->mult); video->ureq = kcalloc(video->uvc_num_requests, sizeof(struct uvc_request), GFP_KERNEL); -- 2.39.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] usb: gadget: uvc: reduce the request size to increase the throughput 2023-11-09 23:33 ` [PATCH] usb: gadget: uvc: reduce the request size to increase the throughput Michael Grzeschik @ 2023-11-10 2:16 ` Thinh Nguyen 2023-11-10 22:42 ` Thinh Nguyen 2023-11-13 8:43 ` Michael Grzeschik 0 siblings, 2 replies; 19+ messages in thread From: Thinh Nguyen @ 2023-11-10 2:16 UTC (permalink / raw) To: Michael Grzeschik Cc: linux-usb@vger.kernel.org, laurent.pinchart@ideasonboard.com, dan.scally@ideasonboard.com, gregkh@linuxfoundation.org, kernel@pengutronix.de Hi, On Fri, Nov 10, 2023, Michael Grzeschik wrote: > One misconception of queueing request to the usb gadget controller is, > that the amount of data that one usb_request is representing is the same > as the hardware is able to transfer in one interval. > > This exact idea applies to the uvc_video gadget that assumes that the > request needs to have the size as the gadgets bandwidth parameters are > configured with. (maxpacket, multiplier and burst) > > Although it is actually no problem for the hardware to queue a big > request, in the case of isochronous transfers it is impractical to queue > big amount of data per request to the hardware. Especially if the > necessary bandwidth was increased on purpose to maintain high amounts of > data. > > E.g. in the dwc3-controller it has the negative impact that the endpoint > FIFO will not be filled fast enough by the internal hardware engine. > Since each transfer buffer (TRB) represents one big request, the > hardware will need a long time to prefill the hardware FIFO. This can be > avoided by queueing more smaller requests which will then lead to > smaller TRBs which the hardware engine can keep up with. Just want to clarify here to avoid any confusion, the hardware TX FIFO size is relatively small, usually can be smaller than the TRB. It should be fine whether the TRB is larger or smaller than the FIFO size. The hardware does not "prefill" the FIFO. It just fills whichever TRB it's currently processing (I think you may be mixing up with TRB cache). The performance impact from this change is to reduce the USB bandwidth usage. The current calculation in uvc function can use 48KB/uframe for each request in SuperSpeed, the max size for isoc in SuperSpeed. I know many hosts can't handle this kind of transfer rate from their hardware. (e.g. It gets worse when scheduling transfers for multiple endpoints and under multiple tier hubs). The bandwidth can be impacted by multiple factors and not just from the gadget side as noted in the discussion before. > > This patch is simply dropping the maxburst as an multiplier for the size > calculation and therefor decreases the overall maximum request size. > > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> > --- > This patch is created as an result from the discussion in the following thread: > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20231031231841.dbhtrgqounu2rvgn@synopsys.com/__;!!A4F2R9G_pg!fTaIo4tDljSbEvUY5SZLkNvKWcz0YeN0Ekzs0CPWyD73RGRmErRC2frODFgnMB1M4Nse0oKKgwxC65gePhGAtauKJq1Vnzlj$ > > drivers/usb/gadget/function/uvc_queue.c | 1 - > drivers/usb/gadget/function/uvc_video.c | 1 - > 2 files changed, 2 deletions(-) > > diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c > index 0aa3d7e1f3cc32..1d3c3c09ff97cb 100644 > --- a/drivers/usb/gadget/function/uvc_queue.c > +++ b/drivers/usb/gadget/function/uvc_queue.c > @@ -55,7 +55,6 @@ static int uvc_queue_setup(struct vb2_queue *vq, > sizes[0] = video->imagesize; > > req_size = video->ep->maxpacket > - * max_t(unsigned int, video->ep->maxburst, 1) I think you're reducing a bit too much here? Also, take advantage of burst. So, perhaps keep request size to max(16K, MPS * maxburst)? Can be more or less depending on how much video data is needed to transfer over a video frame. BR, Thinh > * (video->ep->mult); > > /* We divide by two, to increase the chance to run > diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c > index 91af3b1ef0d412..c6b61843bad3d7 100644 > --- a/drivers/usb/gadget/function/uvc_video.c > +++ b/drivers/usb/gadget/function/uvc_video.c > @@ -329,7 +329,6 @@ uvc_video_alloc_requests(struct uvc_video *video) > BUG_ON(video->req_size); > > req_size = video->ep->maxpacket > - * max_t(unsigned int, video->ep->maxburst, 1) > * (video->ep->mult); > > video->ureq = kcalloc(video->uvc_num_requests, sizeof(struct uvc_request), GFP_KERNEL); > -- > 2.39.2 > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] usb: gadget: uvc: reduce the request size to increase the throughput 2023-11-10 2:16 ` Thinh Nguyen @ 2023-11-10 22:42 ` Thinh Nguyen 2023-11-13 8:43 ` Michael Grzeschik 1 sibling, 0 replies; 19+ messages in thread From: Thinh Nguyen @ 2023-11-10 22:42 UTC (permalink / raw) To: Thinh Nguyen Cc: Michael Grzeschik, linux-usb@vger.kernel.org, laurent.pinchart@ideasonboard.com, dan.scally@ideasonboard.com, gregkh@linuxfoundation.org, kernel@pengutronix.de On Fri, Nov 10, 2023, Thinh Nguyen wrote: > Hi, > > On Fri, Nov 10, 2023, Michael Grzeschik wrote: > > One misconception of queueing request to the usb gadget controller is, > > that the amount of data that one usb_request is representing is the same > > as the hardware is able to transfer in one interval. > > > > This exact idea applies to the uvc_video gadget that assumes that the > > request needs to have the size as the gadgets bandwidth parameters are > > configured with. (maxpacket, multiplier and burst) > > > > Although it is actually no problem for the hardware to queue a big > > request, in the case of isochronous transfers it is impractical to queue > > big amount of data per request to the hardware. Especially if the > > necessary bandwidth was increased on purpose to maintain high amounts of > > data. > > > > E.g. in the dwc3-controller it has the negative impact that the endpoint > > FIFO will not be filled fast enough by the internal hardware engine. > > Since each transfer buffer (TRB) represents one big request, the > > hardware will need a long time to prefill the hardware FIFO. This can be > > avoided by queueing more smaller requests which will then lead to > > smaller TRBs which the hardware engine can keep up with. > > Just want to clarify here to avoid any confusion, the hardware TX FIFO > size is relatively small, usually can be smaller than the TRB. It should > be fine whether the TRB is larger or smaller than the FIFO size. The > hardware does not "prefill" the FIFO. It just fills whichever TRB it's > currently processing (I think you may be mixing up with TRB cache). > > The performance impact from this change is to reduce the USB bandwidth > usage. The current calculation in uvc function can use 48KB/uframe for > each request in SuperSpeed, the max size for isoc in SuperSpeed. I know > many hosts can't handle this kind of transfer rate from their hardware. > (e.g. It gets worse when scheduling transfers for multiple endpoints and > under multiple tier hubs). > > The bandwidth can be impacted by multiple factors and not just from the > gadget side as noted in the discussion before. > > > > > This patch is simply dropping the maxburst as an multiplier for the size > > calculation and therefor decreases the overall maximum request size. > > > > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> > > --- > > This patch is created as an result from the discussion in the following thread: > > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20231031231841.dbhtrgqounu2rvgn@synopsys.com/__;!!A4F2R9G_pg!fTaIo4tDljSbEvUY5SZLkNvKWcz0YeN0Ekzs0CPWyD73RGRmErRC2frODFgnMB1M4Nse0oKKgwxC65gePhGAtauKJq1Vnzlj$ > > > > drivers/usb/gadget/function/uvc_queue.c | 1 - > > drivers/usb/gadget/function/uvc_video.c | 1 - > > 2 files changed, 2 deletions(-) > > > > diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c > > index 0aa3d7e1f3cc32..1d3c3c09ff97cb 100644 > > --- a/drivers/usb/gadget/function/uvc_queue.c > > +++ b/drivers/usb/gadget/function/uvc_queue.c > > @@ -55,7 +55,6 @@ static int uvc_queue_setup(struct vb2_queue *vq, > > sizes[0] = video->imagesize; > > > > req_size = video->ep->maxpacket > > - * max_t(unsigned int, video->ep->maxburst, 1) > > I think you're reducing a bit too much here? Also, take advantage of > burst. So, perhaps keep request size to max(16K, MPS * maxburst)? I mean to potentially cap at 16K, it didn't translate correctly to code.... I meant something like this: max_size = mult * maxburst * MPS; clamp(max_size, mult * MPS, 16K); > > Can be more or less depending on how much video data is needed to > transfer over a video frame. Can the uvc gadget figure out the max video data over a video frame? Or is it not available at this layer? I figure different setup requires more video data payload. If we have this info, then you can properly cap the request size. Thanks, Thinh ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] usb: gadget: uvc: reduce the request size to increase the throughput 2023-11-10 2:16 ` Thinh Nguyen 2023-11-10 22:42 ` Thinh Nguyen @ 2023-11-13 8:43 ` Michael Grzeschik 2023-11-17 2:39 ` Thinh Nguyen 1 sibling, 1 reply; 19+ messages in thread From: Michael Grzeschik @ 2023-11-13 8:43 UTC (permalink / raw) To: Thinh Nguyen Cc: linux-usb@vger.kernel.org, laurent.pinchart@ideasonboard.com, dan.scally@ideasonboard.com, gregkh@linuxfoundation.org, kernel@pengutronix.de [-- Attachment #1: Type: text/plain, Size: 6230 bytes --] On Fri, Nov 10, 2023 at 02:16:59AM +0000, Thinh Nguyen wrote: >On Fri, Nov 10, 2023, Michael Grzeschik wrote: >> One misconception of queueing request to the usb gadget controller is, >> that the amount of data that one usb_request is representing is the same >> as the hardware is able to transfer in one interval. >> >> This exact idea applies to the uvc_video gadget that assumes that the >> request needs to have the size as the gadgets bandwidth parameters are >> configured with. (maxpacket, multiplier and burst) >> >> Although it is actually no problem for the hardware to queue a big >> request, in the case of isochronous transfers it is impractical to queue >> big amount of data per request to the hardware. Especially if the >> necessary bandwidth was increased on purpose to maintain high amounts of >> data. >> >> E.g. in the dwc3-controller it has the negative impact that the endpoint >> FIFO will not be filled fast enough by the internal hardware engine. >> Since each transfer buffer (TRB) represents one big request, the >> hardware will need a long time to prefill the hardware FIFO. This can be >> avoided by queueing more smaller requests which will then lead to >> smaller TRBs which the hardware engine can keep up with. > >Just want to clarify here to avoid any confusion, the hardware TX FIFO >size is relatively small, usually can be smaller than the TRB. It should >be fine whether the TRB is larger or smaller than the FIFO size. The >hardware does not "prefill" the FIFO. It just fills whichever TRB it's >currently processing (I think you may be mixing up with TRB cache). What I see is, that by using bigger TRBs the hardware is not able to keep up with reading from the memory when the system is under heavy memory pressure. This is the main reason for this change. Since we found out that increasing the FIFO size had an effect to how high we are able to set the hardware endpoint configuration on our gadget side (params.param0), until we saw the issue reoccur. So the Idea here was to have a tweak on how the hardware handles the data from the memory to the Hardware-FIFO which seems not to underrun with smaller TRBs. >The performance impact from this change is to reduce the USB bandwidth >usage. The current calculation in uvc function can use 48KB/uframe for >each request in SuperSpeed, the max size for isoc in SuperSpeed. I know >many hosts can't handle this kind of transfer rate from their hardware. >(e.g. It gets worse when scheduling transfers for multiple endpoints and >under multiple tier hubs). I think I don't fully understand here. We change the overall buffersize of the usb_request and therefor limit the size of possible TRBs. This should even only have most effect on the trbsize for the memcopy path, since the scatter gather requests are already split into multiple trbs which is capped to the maximum mappable memory range of PAGE_SIZE (4k). Other then that, the parameterization of the endpoint on our gadget side is not changed by this patch. The endpoint configuration is set as follows: params.param0 |= DWC3_DEPCFG_BURST_SIZE(burst - 1) | DWC3_DEPCFG_MAX_PACKET_SIZE(usb_endpoint_maxp(desc)); So by changing the request_size there should not be any other change in the gadget side hardware configuration. How is the overall bandwidth usage affected by this change then other than we have more smaller potential trbs in the queue. If the Intervallength is not coupled to the amount of to be transfered TRBs in any case, it should not have an effect to the bandwidth. If I am mistaken here, can you point me to some code? >The bandwidth can be impacted by multiple factors and not just from the >gadget side as noted in the discussion before. Right. >> This patch is simply dropping the maxburst as an multiplier for the size >> calculation and therefor decreases the overall maximum request size. >> >> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> >> --- >> This patch is created as an result from the discussion in the following thread: >> >> https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20231031231841.dbhtrgqounu2rvgn@synopsys.com/__;!!A4F2R9G_pg!fTaIo4tDljSbEvUY5SZLkNvKWcz0YeN0Ekzs0CPWyD73RGRmErRC2frODFgnMB1M4Nse0oKKgwxC65gePhGAtauKJq1Vnzlj$ >> >> drivers/usb/gadget/function/uvc_queue.c | 1 - >> drivers/usb/gadget/function/uvc_video.c | 1 - >> 2 files changed, 2 deletions(-) >> >> diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c >> index 0aa3d7e1f3cc32..1d3c3c09ff97cb 100644 >> --- a/drivers/usb/gadget/function/uvc_queue.c >> +++ b/drivers/usb/gadget/function/uvc_queue.c >> @@ -55,7 +55,6 @@ static int uvc_queue_setup(struct vb2_queue *vq, >> sizes[0] = video->imagesize; >> >> req_size = video->ep->maxpacket >> - * max_t(unsigned int, video->ep->maxburst, 1) > >I think you're reducing a bit too much here? Also, take advantage of >burst. So, perhaps keep request size to max(16K, MPS * maxburst)? > >Can be more or less depending on how much video data is needed to >transfer over a video frame. > >BR, >Thinh > >> * (video->ep->mult); >> >> /* We divide by two, to increase the chance to run >> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c >> index 91af3b1ef0d412..c6b61843bad3d7 100644 >> --- a/drivers/usb/gadget/function/uvc_video.c >> +++ b/drivers/usb/gadget/function/uvc_video.c >> @@ -329,7 +329,6 @@ uvc_video_alloc_requests(struct uvc_video *video) >> BUG_ON(video->req_size); >> >> req_size = video->ep->maxpacket >> - * max_t(unsigned int, video->ep->maxburst, 1) >> * (video->ep->mult); >> >> video->ureq = kcalloc(video->uvc_num_requests, sizeof(struct uvc_request), GFP_KERNEL); >> -- >> 2.39.2 >> >> -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] usb: gadget: uvc: reduce the request size to increase the throughput 2023-11-13 8:43 ` Michael Grzeschik @ 2023-11-17 2:39 ` Thinh Nguyen 0 siblings, 0 replies; 19+ messages in thread From: Thinh Nguyen @ 2023-11-17 2:39 UTC (permalink / raw) To: Michael Grzeschik Cc: Thinh Nguyen, linux-usb@vger.kernel.org, laurent.pinchart@ideasonboard.com, dan.scally@ideasonboard.com, gregkh@linuxfoundation.org, kernel@pengutronix.de On Mon, Nov 13, 2023, Michael Grzeschik wrote: > On Fri, Nov 10, 2023 at 02:16:59AM +0000, Thinh Nguyen wrote: > > On Fri, Nov 10, 2023, Michael Grzeschik wrote: > > > One misconception of queueing request to the usb gadget controller is, > > > that the amount of data that one usb_request is representing is the same > > > as the hardware is able to transfer in one interval. > > > > > > This exact idea applies to the uvc_video gadget that assumes that the > > > request needs to have the size as the gadgets bandwidth parameters are > > > configured with. (maxpacket, multiplier and burst) > > > > > > Although it is actually no problem for the hardware to queue a big > > > request, in the case of isochronous transfers it is impractical to queue > > > big amount of data per request to the hardware. Especially if the > > > necessary bandwidth was increased on purpose to maintain high amounts of > > > data. > > > > > > E.g. in the dwc3-controller it has the negative impact that the endpoint > > > FIFO will not be filled fast enough by the internal hardware engine. > > > Since each transfer buffer (TRB) represents one big request, the > > > hardware will need a long time to prefill the hardware FIFO. This can be > > > avoided by queueing more smaller requests which will then lead to > > > smaller TRBs which the hardware engine can keep up with. > > > > Just want to clarify here to avoid any confusion, the hardware TX FIFO > > size is relatively small, usually can be smaller than the TRB. It should > > be fine whether the TRB is larger or smaller than the FIFO size. The > > hardware does not "prefill" the FIFO. It just fills whichever TRB it's > > currently processing (I think you may be mixing up with TRB cache). > > What I see is, that by using bigger TRBs the hardware is not able > to keep up with reading from the memory when the system is under > heavy memory pressure. This is the main reason for this change. > > Since we found out that increasing the FIFO size had an effect to how > high we are able to set the hardware endpoint configuration on our > gadget side (params.param0), until we saw the issue reoccur. > > So the Idea here was to have a tweak on how the hardware handles the > data from the memory to the Hardware-FIFO which seems not to underrun > with smaller TRBs. > > > The performance impact from this change is to reduce the USB bandwidth > > usage. The current calculation in uvc function can use 48KB/uframe for > > each request in SuperSpeed, the max size for isoc in SuperSpeed. I know > > many hosts can't handle this kind of transfer rate from their hardware. > > (e.g. It gets worse when scheduling transfers for multiple endpoints and > > under multiple tier hubs). > > I think I don't fully understand here. > > We change the overall buffersize of the usb_request and therefor limit > the size of possible TRBs. This should even only have most effect on the > trbsize for the memcopy path, since the scatter gather requests are > already split into multiple trbs which is capped to the maximum mappable > memory range of PAGE_SIZE (4k). > > Other then that, the parameterization of the endpoint on our gadget side > is not changed by this patch. The endpoint configuration is set as follows: > > params.param0 |= DWC3_DEPCFG_BURST_SIZE(burst - 1) | > DWC3_DEPCFG_MAX_PACKET_SIZE(usb_endpoint_maxp(desc)); > > So by changing the request_size there should not be any other change in the > gadget side hardware configuration. > > How is the overall bandwidth usage affected by this change then other > than we have more smaller potential trbs in the queue. > > If the Intervallength is not coupled to the amount of to be transfered > TRBs in any case, it should not have an effect to the bandwidth. > > If I am mistaken here, can you point me to some code? > My point was to clarify that the reduction of request size would mean more time available for the transfer to go out within the given usb bandwidth. Both the host and the device sides can affect when and how much data can be sent within an interval. The host has it own calculation on how much usb bandwidth is allocated for the device and endpoint. Some hosts give more, some give less. The more data is transferred, the less time it has to send out all the data. Your tweaking of the FIFO size and the bus burst help improve the performance on the device side. I also want to note that the transfer size of the usb request is what matter since it accounts for transfer over an interval. If you use SG request, then you may have many smaller TRBs. For newer controllers, the performance should be equivalent to a request using a large TRB. Regarding the question earlier, can you cap the size of the request depending on how much video data is expected over a given video frame instead? Thanks, Thinh ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2023-11-17 2:39 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-08-31 22:12 DWC3-Gadget: Flickering with ISOC Streaming (UVC) while multiplier set on Superspeed Michael Grzeschik 2023-09-01 1:35 ` Thinh Nguyen 2023-09-03 22:41 ` Michael Grzeschik 2023-09-04 0:42 ` Michael Grzeschik 2023-09-06 0:44 ` Thinh Nguyen 2023-09-06 6:40 ` Michael Grzeschik 2023-09-06 0:41 ` Thinh Nguyen 2023-09-06 7:18 ` Michael Grzeschik 2023-09-06 23:05 ` Thinh Nguyen 2023-09-06 23:09 ` Thinh Nguyen 2023-09-07 21:00 ` Michael Grzeschik 2023-09-07 23:33 ` Thinh Nguyen 2023-10-30 12:18 ` Michael Grzeschik 2023-10-31 23:18 ` Thinh Nguyen 2023-11-09 23:33 ` [PATCH] usb: gadget: uvc: reduce the request size to increase the throughput Michael Grzeschik 2023-11-10 2:16 ` Thinh Nguyen 2023-11-10 22:42 ` Thinh Nguyen 2023-11-13 8:43 ` Michael Grzeschik 2023-11-17 2:39 ` Thinh Nguyen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox