* [BUG] musb: broken isochronous transfer at TI AM335x platform
@ 2018-02-15 17:54 Matwey V. Kornilov
2018-02-16 16:27 ` Tony Lindgren
0 siblings, 1 reply; 3+ messages in thread
From: Matwey V. Kornilov @ 2018-02-15 17:54 UTC (permalink / raw)
To: linux-usb, Bin Liu, Greg KH, hverkuil, linux-media, linux,
Tony Lindgren
Hi all,
Almost two years ago I faced an issue related to PWC-driven V4L2
webcam attached to BeagleBone Black SBC. [1][2] The issue still
persists in 4.16-rc1. However, some research has been carried out
since then. I would like to summarize my findings below. I also would
like to receive feedback since the issue appears not to have one
single source and probably may affect setups other than mine.
Initial issue signs were the following. When Philips SPC 900 webcam
(handled by pwc kernel module) is attached to BeagleBone Black SBC
(equipped with TI AM335x Cortex-A8 CPU and Inventra MUSB Dual-Role USB
controller), no frames can be received from the camera. In other
words, the major functional purpose of the camera was broken. From
user-space point of view,
ioctl(fd, VIDIOC_DQBUF, &buf)
indefinitely waits for a frame, yet all other functionalities (i.e.
setting formats) works as usual. This happens always when "ondemand"
cpufreq policy is set and from time to time when "performance" is
used. This also happens disregarding whether DMA USB is used.
Other instances of the camera (same model) have been tested and found
to have same behavior. Then, I've found that the reason for such
behavior is that pwc module never feeds V4L2 subsystem with frames
from the module side, because frame completion criteria never
satisfied in pwc_isoc_handler(). PWC considers the "shorter" or empty
incoming USB ISO packet as end-of-frame sign. Usually, every URB has
10 packets and every packet consists of 956 payload data bytes in case
of PWC. Also, when end-of-frame is received the accumulated amount of
bytes must be equal to expected (known from frame format, i.e width *
height * pixel-depth). It appeared that it was never the case, empty
USB packet was always received in advance to expected frame end and
then every frame was discarded.
I've also bisected the following. The USB ISO transfer works well since commit
2035772010db ("usb: musb: musb_host: Enable HCD_BH flag to handle
urb return in bottom half")
and stops working after commit
f551e1352983 ("Revert "usb: musb: musb_host: Enable HCD_BH flag to
handle urb return in bottom half"")
To reliably connect these two facts, I had to assembly OpenVizsla
hardware USB monitor (sniffer). [3] Using this tool the following MUSB
Host + PWC webcam behavior patterns were found [4]:
[ ] 7.219456 d= 0.000997 [181.0 + 3.667] [ 3] IN : 4.5
[ T ] 7.219459 d= 0.000003 [181.0 + 7.083] [800] DATA0: 53 da ...
[ ] 7.220456 d= 0.000997 [182 + 3.667] [ 3] IN : 4.5
[ T ] 7.220459 d= 0.000003 [182 + 7.000] [800] DATA0: 13 36 ...
[ ] 7.222456 d= 0.001997 [184 + 3.667] [ 3] IN : 4.5
[ ] 7.222459 d= 0.000003 [184 + 7.000] [ 3] DATA0: 00 00
[ ] 7.223456 d= 0.000997 [185.0 + 3.667] [ 3] IN : 4.5
[ ] 7.223459 d= 0.000003 [185.0 + 7.000] [ 3] DATA0: 00 00
Please note, that the time moment "7.221456" has missed IN request
packet which must be sent out every 1ms in this low-speed USB case.
Then, all incoming packets became empty ones. Such moments coincide
with frame discarding in pwc driver.
Even though IN sending is usually handled by USB host hardware, it is
not fully true for MUSB. Every IN is triggered by musb kernel driver
(see MUSB_RXCSR_H_REQPKT usage in musb_host_packet_rx() and
musb_ep_program()) since auto IN is not used. Rather complicated logic
is incorporated to decide whether IN packet has to be sent. First,
musb_host_packet_rx() handles IN sending when current URB is not
completed (i.e. current URB has another packet which has to be
received next). Second, musb_advance_schedule() (via musb_start_urb())
handles the case when current URB is completed but there is another
URB pending. It seems that there is a hardware logic to fire IN packet
in a way to have exactly 1ms between two consequent INs. So,
MUSB_RXCSR_H_REQPKT is considered as IN requesting flag.
Both functions are triggered by musb_host_rx() interrupt handler. And
there are a lot of other important things between interrupt triggering
and next IN requesting. It appears that sometimes it takes 0.4-0.5 ms
between the interrupt and MUSB_RXCSR_H_REQPKT writing. This delays
also lead to missed IN (It is actually sent, but at (last)+2ms time
instead of (last)+1ms) and confusing the peripheral as shown above.
This is a case of URB-completion branch.
The most time-consuming part is urb->giveback() calling.
Unfortunately, it is performed synchronously before requesting IN.
This explains why commit
2035772010db ("usb: musb: musb_host: Enable HCD_BH flag to handle
urb return in bottom half")
masks the issue. Moving (postponing) urb->giveback() call out of
interrupt context eliminated extra delays between IN requesting.
(Un)fortunately, this commit was reverted and the issue returned back.
I proposed a patch to swap time-consuming urb->giveback() and
MUSB_RXCSR_H_REQPKT writing but the patch has not been accepted since
it complicates IN-requesting logic even more. [5]
However, there is another level in this issue, the performance of
urb->giveback (pwc_isoc_handler()) itself. It takes 350us for the
handler to be executed that seems to be too much to copy 9560 bytes of
the data from urb->transmit_buffer to internal pwc data structures.
Using trace event points I measured exact timings in two cases. [6]
The median memcpy() performance is 28 MBps. Given CPU frequency at
lowest possible state in 'ondemand' cpufreq policy, it roughly equals
to 10 CPU cycles per byte. Reverting commit
63c5b4ca7d0d ("usb: musb: do not change dev's dma_mask")
raises the rate to 280 MBps (~ 1 cycle per byte) and fully fixes the
issue when PIO in musb is used.
The reason is the following. pwc uses usb_alloc_coherent() to allocate
memory for urb->transfer_buffer. It seems to be a common practice
among webcam drivers: uvc, gspca and others do the same, though it is
not clear what is the reason to use so strict requirement as full
coherence here. Internally, usb_alloc_coherent() does
dma_alloc_coherent() or kmalloc() (when dev->dma_mask is NULL). In
case of MUSB, usb_alloc_coherent() always behaves as
dma_alloc_coherent() (mind 63c5b4ca7d0d, dma_mask is always set). At
the same time AFAIU AM335x doesn't have truly coherent DMA and it is
emulated in the kernel by pgprot nocache flags in the pages table. So,
we always have uncacheable (slow) memory for urb->transfer_buffer.
In conclusion, I think there is possibility to fix the issue (because
there are known cases when things work), though the way is not clear
to me. memcpy() is "slow" for uncacheable memory (not sure that
something can be done with it). usb_alloc_coherent() is probably
misused in pwc and a number of USB peripherial drivers. musb_host_rx()
interrupt handler performs less important things (urb->giveback) prior
to more important (writing MUSB_RXCSR_H_REQPKT).
References:
[1] https://www.spinics.net/lists/linux-usb/msg143956.html
[2] https://www.spinics.net/lists/linux-usb/msg145747.html
[3] http://openvizsla.org/
[4] https://www.spinics.net/lists/linux-usb/msg156107.html
[5] https://www.spinics.net/lists/linux-usb/msg156486.html
[6] https://github.com/matwey/linux/commit/2b36e1add5aaf552923c8c1340e50bd7c2050fde
--
With best regards,
Matwey V. Kornilov.
Sternberg Astronomical Institute, Lomonosov Moscow State University, Russia
119234, Moscow, Universitetsky pr-k 13, +7 (495) 9392382
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [BUG] musb: broken isochronous transfer at TI AM335x platform
2018-02-15 17:54 [BUG] musb: broken isochronous transfer at TI AM335x platform Matwey V. Kornilov
@ 2018-02-16 16:27 ` Tony Lindgren
2018-02-18 20:47 ` Matwey V. Kornilov
0 siblings, 1 reply; 3+ messages in thread
From: Tony Lindgren @ 2018-02-16 16:27 UTC (permalink / raw)
To: Matwey V. Kornilov
Cc: linux-usb, Bin Liu, Greg KH, hverkuil, linux-media, linux
* Matwey V. Kornilov <matwey@sai.msu.ru> [180215 17:55]:
> [ ] 7.219456 d= 0.000997 [181.0 + 3.667] [ 3] IN : 4.5
> [ T ] 7.219459 d= 0.000003 [181.0 + 7.083] [800] DATA0: 53 da ...
> [ ] 7.220456 d= 0.000997 [182 + 3.667] [ 3] IN : 4.5
> [ T ] 7.220459 d= 0.000003 [182 + 7.000] [800] DATA0: 13 36 ...
> [ ] 7.222456 d= 0.001997 [184 + 3.667] [ 3] IN : 4.5
> [ ] 7.222459 d= 0.000003 [184 + 7.000] [ 3] DATA0: 00 00
> [ ] 7.223456 d= 0.000997 [185.0 + 3.667] [ 3] IN : 4.5
> [ ] 7.223459 d= 0.000003 [185.0 + 7.000] [ 3] DATA0: 00 00
>
> Please note, that the time moment "7.221456" has missed IN request
> packet which must be sent out every 1ms in this low-speed USB case.
> Then, all incoming packets became empty ones. Such moments coincide
> with frame discarding in pwc driver.
Well sounds like you may be able to fix it since you have a test
case for it :)
> Even though IN sending is usually handled by USB host hardware, it is
> not fully true for MUSB. Every IN is triggered by musb kernel driver
> (see MUSB_RXCSR_H_REQPKT usage in musb_host_packet_rx() and
> musb_ep_program()) since auto IN is not used. Rather complicated logic
> is incorporated to decide whether IN packet has to be sent. First,
> musb_host_packet_rx() handles IN sending when current URB is not
> completed (i.e. current URB has another packet which has to be
> received next). Second, musb_advance_schedule() (via musb_start_urb())
> handles the case when current URB is completed but there is another
> URB pending. It seems that there is a hardware logic to fire IN packet
> in a way to have exactly 1ms between two consequent INs. So,
> MUSB_RXCSR_H_REQPKT is considered as IN requesting flag.
Yeah this is a known issue with musb, there is not much ISO support
currently really. The regression should be fixed though.
Sorry I don't have much ideas on how to improve things here. One
way might be to attempt to split the large musb functions into
smaller functions and see if that then allows finer grained control.
Just to try to come up with some new ideas.. Maybe there's some way
to swap the hardware EP config dynamically and have some packets
mostly preconfigured and waiting to be triggered?
Regards,
Tony
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [BUG] musb: broken isochronous transfer at TI AM335x platform
2018-02-16 16:27 ` Tony Lindgren
@ 2018-02-18 20:47 ` Matwey V. Kornilov
0 siblings, 0 replies; 3+ messages in thread
From: Matwey V. Kornilov @ 2018-02-18 20:47 UTC (permalink / raw)
To: Tony Lindgren; +Cc: linux-usb, Bin Liu, Greg KH, hverkuil, linux-media, linux
2018-02-16 19:27 GMT+03:00 Tony Lindgren <tony@atomide.com>:
> * Matwey V. Kornilov <matwey@sai.msu.ru> [180215 17:55]:
>> [ ] 7.219456 d= 0.000997 [181.0 + 3.667] [ 3] IN : 4.5
>> [ T ] 7.219459 d= 0.000003 [181.0 + 7.083] [800] DATA0: 53 da ...
>> [ ] 7.220456 d= 0.000997 [182 + 3.667] [ 3] IN : 4.5
>> [ T ] 7.220459 d= 0.000003 [182 + 7.000] [800] DATA0: 13 36 ...
>> [ ] 7.222456 d= 0.001997 [184 + 3.667] [ 3] IN : 4.5
>> [ ] 7.222459 d= 0.000003 [184 + 7.000] [ 3] DATA0: 00 00
>> [ ] 7.223456 d= 0.000997 [185.0 + 3.667] [ 3] IN : 4.5
>> [ ] 7.223459 d= 0.000003 [185.0 + 7.000] [ 3] DATA0: 00 00
>>
>> Please note, that the time moment "7.221456" has missed IN request
>> packet which must be sent out every 1ms in this low-speed USB case.
>> Then, all incoming packets became empty ones. Such moments coincide
>> with frame discarding in pwc driver.
>
> Well sounds like you may be able to fix it since you have a test
> case for it :)
Well, I am not an USB expert and I need some guidance and advice to
find acceptable solution. No doubts I could implement and test it, but
I would like to spend time for something known to be useful.
>
>> Even though IN sending is usually handled by USB host hardware, it is
>> not fully true for MUSB. Every IN is triggered by musb kernel driver
>> (see MUSB_RXCSR_H_REQPKT usage in musb_host_packet_rx() and
>> musb_ep_program()) since auto IN is not used. Rather complicated logic
>> is incorporated to decide whether IN packet has to be sent. First,
>> musb_host_packet_rx() handles IN sending when current URB is not
>> completed (i.e. current URB has another packet which has to be
>> received next). Second, musb_advance_schedule() (via musb_start_urb())
>> handles the case when current URB is completed but there is another
>> URB pending. It seems that there is a hardware logic to fire IN packet
>> in a way to have exactly 1ms between two consequent INs. So,
>> MUSB_RXCSR_H_REQPKT is considered as IN requesting flag.
>
> Yeah this is a known issue with musb, there is not much ISO support
> currently really. The regression should be fixed though.
>
> Sorry I don't have much ideas on how to improve things here. One
> way might be to attempt to split the large musb functions into
> smaller functions and see if that then allows finer grained control.
>
> Just to try to come up with some new ideas.. Maybe there's some way
> to swap the hardware EP config dynamically and have some packets
> mostly preconfigured and waiting to be triggered?
>
> Regards,
>
> Tony
>
--
With best regards,
Matwey V. Kornilov.
Sternberg Astronomical Institute, Lomonosov Moscow State University, Russia
119234, Moscow, Universitetsky pr-k 13, +7 (495) 9392382
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2018-02-18 20:47 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-02-15 17:54 [BUG] musb: broken isochronous transfer at TI AM335x platform Matwey V. Kornilov
2018-02-16 16:27 ` Tony Lindgren
2018-02-18 20:47 ` Matwey V. Kornilov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox