On Tue, May 28, 2024 at 05:33:46PM -0700, Avichal Rakesh wrote: > > >On 5/28/24 15:43, Michael Grzeschik wrote: >> On Tue, May 28, 2024 at 02:27:34PM -0700, Avichal Rakesh wrote: >>> >>> >>> On 5/28/24 13:22, Michael Grzeschik wrote: >>>> On Tue, May 28, 2024 at 10:30:30AM -0700, Avichal Rakesh wrote: >>>>> >>>>> >>>>> On 5/22/24 10:37, Michael Grzeschik wrote: >>>>>> On Wed, May 22, 2024 at 05:17:02PM +0000, Thinh Nguyen wrote: >>>>>>> On Wed, May 22, 2024, Alan Stern wrote: >>>>>>>> On Wed, May 22, 2024 at 01:41:42AM +0000, Thinh Nguyen wrote: >>>>>>>> > On Wed, May 22, 2024, Michael Grzeschik wrote: >>>>>>>> > > On Fri, May 17, 2024 at 01:44:05AM +0000, Thinh Nguyen wrote: >>>>>>>> > > > For isoc endpoint IN, yes. If the host requests for isoc data IN while >>>>>>>> > > > no TRB is prepared, then the controller will automatically send 0-length >>>>>>>> > > > packet respond. >>>>>>>> > > >>>>>>>> > > Perfect! This will help a lot and will make active queueing of own >>>>>>>> > > zero-length requests run unnecessary. >>>>>>>> > >>>>>>>> > Yes, if we rely on the current start/stop isoc transfer scheme for UVC, >>>>>>>> > then this will work. >>>>>>>> >>>>>>>> You shouldn't rely on this behavior.  Other device controllers might not >>>>>>>> behave this way; they might send no packet at all to the host (causing a >>>>>>>> USB protocol error) instead of sending a zero-length packet. >>>>>>> >>>>>>> I agree. The dwc3 driver has this workaround to somewhat work with the >>>>>>> UVC. This behavior is not something the controller expected, and this >>>>>>> workaround should not be a common behavior for different function >>>>>>> driver/protocol. Since this behavior was added a long time ago, it will >>>>>>> remain the default behavior in dwc3 to avoid regression with UVC (at >>>>>>> least until the UVC is changed). However, it would be nice for UVC to >>>>>>> not rely on this. >>>>>> >>>>>> With "this" you mean exactly the following commit, right? >>>>>> >>>>>> (f5e46aa4 usb: dwc3: gadget: when the started list is empty stop the active xfer) >>>>>> >>>>>> When we start questioning this, then lets dig deeper here. >>>>>> >>>>>> With the fast datarate of at least usb superspeed shouldn't they not all >>>>>> completely work asynchronous with their in flight trbs? >>>>>> >>>>>> In my understanding this validates that, with at least superspeed we are >>>>>> unlikely to react fast enough to maintain a steady isoc dataflow, since >>>>>> the driver above has to react to errors in the processing context. >>>>>> >>>>>> This runs the above patch (f5e46aa4) a gadget independent solution >>>>>> which has nothing to do with uvc in particular IMHO. >>>>>> >>>>>> How do other controllers and their drivers work? >>>>>> >>>>>>> Side note, when the dwc3 driver reschedules/starts isoc transfer again, >>>>>>> the first transfer will be scheduled go out at some future interval and >>>>>>> not the next immediate microframe. For UVC, it probably won't be a >>>>>>> problem since it doesn't seem to need data going out every interval. >>>>>> >>>>>> It should not make a difference. [TM] >>>>>> >>>>> >>>>> >>>>> Sorry for being absent for a lot of this discussion. >>>>> >>>>> I want to take a step back from the details of how we're >>>>> solving the problem to what problems we're trying to solve. >>>>> >>>>> So, question(s) for Michael, because I don't see an explicit >>>>> answer in this thread (and my sincerest apologies if they've >>>>> been answered already and I missed it): >>>>> >>>>> What exactly is the bug (or bugs) we're trying to solve here? >>>>> >>>>> So far, it looks like there are two main problems being >>>>> discussed: >>>>> >>>>> 1. Reducing the bandwidth usage of individual usb_requests >>>>> 2. Error handling for when transmission over the wire fails. >>>>> >>>>> Is that correct, or are there other issues at play here? >>>> >>>> That is correct. >>>> >>>>> (1) in isolation should be relatively easy to solve: Just >>>>> smear the encoded frame across some percentage >>>>> (prefereably < 100%) of the usb_requests in one >>>>> video frame interval. >>>> >>>> Right. >>>> >>>>> (2) is more complicated, and your suggestion is to have a >>>>> sentinel request between two video frames that tells the >>>>> host if the transmission of "current" video frame was >>>>> successful or not. (Is that correct?) >>>> >>>> Right. >>>> >>>>> Assuming my understanding of (2) is correct, how should >>>>> the host behave if the transmission of the sentinel >>>>> request fails after the video frame was sent >>>>> successfully (isoc is best effort so transmission is >>>>> never guaranteed)? >>>> >>>> If we have transmitted all requests of the frame but would only miss the >>>> sentinel request to the host that includes the EOF, the host could >>>> rather show it or drop it. The drop would not be necessary since the >>>> host did see all data of the frame. The user would not see any >>>> distortion in both cases. >>>> >>>> If we have transmitted only partial data of the frame it would be >>>> preferred if the host would drop the frame that did not finish with an >>>> proper EOF tag. >>>> >>>> AFAIK the linux kernel would still show the frame if the FID of the >>>> currently handled request would change and would take this as the end of >>>> frame indication. But I am not totally sure if this is by spec or >>>> optional. >>>> >>>> One option to be totally sure would be to resend the sentinel request to >>>> be properly transmitted before starting the next frame. This resend >>>> polling would probably include some extra zero-length requests. But also >>>> if this resend keeps failing for n times, the driver should doubt there >>>> is anything sane going on with the USB connection and bail out somehow. >>>> >>>> Since we try to tackle case (1) to avoid transmit errors and also avoid >>>> creating late enqueued requests in the running isoc transfer, the over >>>> all chance to trigger missed transfers should be minimal. >>> >>> Gotcha. It seems like the UVC gadget driver implicitly assumes that EOF >>> flag will be used although the userspace application can technically >>> make it optional. >> >> That is not all. The additional UVC_STREAM_ERR tag on the sentinel >> request can be set optional by the host driver. But by spec the >> userspace application has to drop the frame when the flag was set. > >Looking at the UVC specs, the ERR bit doesn't seem to refer to actual >transmission error, only errors in frame generation (Section 4.3.1.7 >of UVC 1.5 Class Specification). Maybe "data discontinuity" can be >used but the examples given are bad media, and encoder issues, which >suggests errors at higher level than the wire. Oh! That is a new perspective I did not consider. With the definition of UVC_STREAM_ERR by spec, the uvc_video driver would in no case set this header bit for the current frame on its own? Is that correct? >> With my proposal this flag will be set, whenever we find out that >> the currently transferred frame was erroneous. >> >>> Summarizing some of the discussions above: >>> 1. UVC gadget driver should _not_ rely on the usb controller to >>>   enqueue 0-length requests on UVC gadget drivers behalf; >>> 2. However keeping up the backpressure to the controller means the >>>   EOF request will be delayed behind all the zero-length requests. >> >> Exactly, this is why we have to somehow finetune the timedelay between >> requests that trigger interrupts. And also monitor the amount of >> requests currently enqueued in the hw ringbuffer. So that our drivers >> enqueue dequeue mechanism is virtually adding only the minimum amount >> of necessary zero-length requests in the hardware. This should be >> possible. >> >> I am currently thinking through the remaining steps the pump worker has >> to do on each wakeup to maintain the minimum threshold while waiting >> with submitting requests that contain actual image payload. >> >>> Out of curiosity: What is wrong with letting the host rely on >>> FID alone? Decoding the jpeg payload _should_ fail if any of the >>> usb_requests containing the payload failed to transmit. >> >> This is not totally true. We saw partially rendered jpeg frames on the >> host stream. How the host behaves with broken data is totally undefined >> if the typical uvc flags EOF/ERR are not used as specified. Then think >> about uncompressed formats. So relying on the transferred image format >> to solve our problems is just as wrong as relying on the gadgets >> hardware behavior. > >Do you know if the partially rendered frames were valid JPEGs, or >if the host was simply making a best effort at displaying a broken >JPEG? Perhaps the fix should go to the host instead? I can fully reproduce this with linux and windows hosts. For linux machines I saw that the host was taking the FID change as a marker to see the previous frame as ready and just rendered what got through. This did not lead to garbage but only to partially displayed frames with jpeg macroblock alignment. >Following is my opinion, feel free to disagree (and correct me if >something is factually incorrect): > >The fundamental issue here is that ISOC doesn't guarantee >delivery of usb_requests or even basic data consistency upon delivery. >So the gadget driver has no way to know the state of transmitted data. >The gadget driver is notified of underruns but not of any other issues, >and ideally we should never have an underrun if the zero-length >backpressure is working as intended. > >So, UVC gadget driver can reduce the number of errors, but it'll never be >able to guarantee that the data transmitted to the host isn't somehow >corrupted or missing unless a more reliable mode of transmission >(bulk, for example) is used. > >All of this to say: The host absolutely needs to be able to handle >all sorts of invalid and broken payloads. How the host handles it >might be undefined, but the host can never rely on perfect knowledge >about the transmission state. In cases like these, where the underlying >transport is unreliable, the burden of enforcing consistency moves up >a layer, i.e. to the encoded payload in this case. So it is perfectly >fine for the host to rely on the encoding to determine if the payload >is corrupt and handle it accordingly. Right. >As for uncompressed format, you're correct that subtle corruptions >may not be caught, but outright missing usb_requests can be easily >checked by simply looking at the number of bytes in the payload. YUV >frames are all of the same (predetermined) size for a given resolution. That was also my thought about five minutes after I did send you the previous mail. So sure, this is no real issue for the host. >So my recommendation is the following: >1. Fix the bandwidth problem by splitting the encoded video frame > into more usb_requests (as your patch already does) making sure > there are enough free usb_request to encode the video frame in > one burst so we don't accidentally inflate the transmission > duration of a video frame by sneaking in zero-length requests in > the middle. Ack. This should already solve a lot of issues. For this I would still suggest to move the usb_ep_queue to be done in the pump worker again. Its a bit back and forth, but IMHO its worth the extra mile since only this way we would respect the dwc3 interrupt threads assumption to run *very* short. >2. Unless there is an unusually high rate of transmission failures > when using the UVC gadget driver, it might be worth fixing the > host side driver to handle broken frames better instead (assuming > host is linux as well). Agreed, but this needs a separate scoped undestanding of the host side behaviour over all layers. >2. Tighten up the error checking in UVC gadget driver -- We drop the > current frame whenever an EXDEV happens which is wrong. We should > only be dropping the current frame if the EXDEV corresponds to the > frame currently being encoded. What do you mean by drop? I would suggest to immediatly switch the uvc_buffer that is being enqueued and start queueing prepared requests from the next buffers prep list. As suggested, the idea is to have per uvc_buffer prep_list requests which would make this task easy. > If the frame is already fully queued to the usb controller, the host > can handle missing payload as it sees fit. Ack This roadmap sounds like a good one. Michael -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |