dma_sync_single_for_cpu takes a really long time

From: mike.looijmans@topic.nl (Mike Looijmans)
To: linux-arm-kernel@lists.infradead.org
Subject: dma_sync_single_for_cpu takes a really long time
Date: Mon, 29 Jun 2015 15:24:58 +0200
Message-ID: <5591472A.4080707@topic.nl>
In-Reply-To: <CAF6-1L7CY7HYr8RPeisW5rVGzftOcX8mRUev-Mqp1-EVB03pCQ@mail.gmail.com>

>> You're on a Zynq, and that has an ACP port. Connect through that instead of
>> an HP port (interface is almost the same), add "dma-coherent" to the
>> devicetree and also add my patch that properly maps this into userspace.
>>
>> The penalty of the ACP port is that it will write a lot slower to the memory
>> (about half the speed of the 600MB/s you get from the HP port) because of
>> all the cache administration. The good news is that all memory will be
>> cacheable once more, and all the dma_sync_... calls will turn into no-ops.
>> You don't have to change your driver and the logic also remains the same.
>
> That's a pretty big downside. 600 M/s write speed is already pretty
> low (I mean, DDR raw bw should be close to 4G/s, sure it's DDR so you
> can never reach that but still for large purely sequential access I
> expected to get closer than that).

I'm just repeating what's in the Zynq documentation. I did measure 599 MB/s
(reading and writing simultaneously at that speed), so it lives up to that.
The 600 MB/s appears to be a limitation of the HP port, not the DDR controller.

Xilinx also mentions 1200 MB/s for the ACP port in the same document, but
that only holds when the data being read or written is in the L2 cache.

> Also, doesn't that impact the ARM access performance too much to have to share ?

That I haven't tested. I don't know whether the snoop unit may become a
bottleneck here. I'd expect not, since the CPU interface is a lot faster than
what the ACP uses.

> I guess the best flags to use for this are coherent write request
> without L2 allocation.

That's the situation where you'll get about half the HP performance. It's the
ACP-DDR path that is slow.

If you want to process the data fast, use smaller chunks (32k or 64k works 
well) so that all data fits in the L2 cache. Use a bit less than 512k (the L2 
cache size) of buffer memory (for example 6x64k) and have the CPU process it 
in those small chunks as it arrives. Let the CPU "touch" all buffers so that 
they are present in the L2 cache before the logic reads or writes them.
Simply put: Process scan lines, not whole frames. That would make the data 
never hit DDR at all, and raise the processing speed by a significant factor.
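
To make the chunking idea concrete, here is a rough user-space sketch in C. It
is not taken from any driver: the names (ring, touch_ring, fake_dma_fill,
process_chunk, process_stream) are made up for illustration, the byte sum only
stands in for real processing, and the 32-byte cache line is an assumption
(it matches the Cortex-A9 in the Zynq). A plain loop plays the part of the
logic writing through the ACP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE (64u * 1024u)   /* one chunk, e.g. a handful of scan lines */
#define NUM_CHUNKS 6u              /* 6 x 64k = 384k, a bit less than the L2 */
#define L2_SIZE    (512u * 1024u)  /* L2 cache size mentioned above */
#define CACHE_LINE 32u             /* assumed cache line size */

/* All chunk buffers live in one ring that is smaller than the L2 cache. */
static uint8_t ring[NUM_CHUNKS * CHUNK_SIZE];

/* Read one byte per cache line so every line of the ring gets touched. */
static unsigned touch_ring(void)
{
    const volatile uint8_t *p = ring;
    unsigned acc = 0;

    for (size_t i = 0; i < sizeof(ring); i += CACHE_LINE)
        acc += p[i];
    return acc;
}

/* Stand-in for the logic filling a chunk; real hardware would DMA here. */
static void fake_dma_fill(uint8_t *chunk, size_t len, uint8_t value)
{
    for (size_t i = 0; i < len; i++)
        chunk[i] = value;
}

/* Hypothetical per-chunk work: a byte sum in place of real processing. */
static uint32_t process_chunk(const uint8_t *chunk, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += chunk[i];
    return sum;
}

/* Handle total_chunks chunks as they "arrive", cycling through the ring. */
static uint64_t process_stream(size_t total_chunks)
{
    uint64_t total = 0;

    assert(sizeof(ring) < L2_SIZE);   /* the whole ring must fit in L2 */
    (void)touch_ring();               /* touch the buffers before first use */

    for (size_t n = 0; n < total_chunks; n++) {
        uint8_t *slot = &ring[(n % NUM_CHUNKS) * CHUNK_SIZE];

        fake_dma_fill(slot, CHUNK_SIZE, (uint8_t)(n & 0xff));
        total += process_chunk(slot, CHUNK_SIZE);
    }
    return total;
}
```

In a real design the fill would be done by the logic and the CPU would wait
for a "chunk ready" notification per slot; the point here is only that the
working set stays at 384k no matter how large the frame is.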

>> Another approach is to make your software uncached-memory friendly. If you
>> process the frames sequentially and use NEON instructions to fetch large
>> aligned chunks for further processing, the absence of caching won't matter
>> much.
>
> Yes, that was the next thing I was going to try.
>
> Does using pre-load make any sense for uncached? I guess not.

You could do some "preloading" by interleaving fetch and process instructions, 
so the CPU has some work to do while waiting for the DDR data. I haven't 
experimented with that either.
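
As an untested illustration of what I mean by interleaving, here is a plain C
sketch (sum_interleaved is a made-up name, and the sum is just a placeholder
for the real work). It issues the loads for the next group of four words
before doing the arithmetic on the group fetched earlier. Whether this
actually helps on uncached memory is something you'd have to measure; the
compiler is free to reorder this, and a real version would probably use NEON
loads on aligned blocks instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sum 32-bit words, fetching the next group of four before consuming
 * the current one, so loads and arithmetic can overlap.
 */
static uint64_t sum_interleaved(const uint32_t *src, size_t nwords)
{
    uint64_t sum = 0;
    size_t i = 0;

    assert(src != NULL || nwords == 0);

    if (nwords >= 4) {
        /* Prime the pipeline with the first group. */
        uint32_t a = src[0], b = src[1], c = src[2], d = src[3];

        for (i = 4; i + 4 <= nwords; i += 4) {
            /* Issue the loads for the next group first ... */
            uint32_t na = src[i], nb = src[i + 1];
            uint32_t nc = src[i + 2], nd = src[i + 3];

            /* ... then work on the group that was fetched earlier. */
            sum += (uint64_t)a + b + c + d;
            a = na; b = nb; c = nc; d = nd;
        }
        /* Drain the last group that was fetched but not yet summed. */
        sum += (uint64_t)a + b + c + d;
    }

    /* Leftover words that do not fill a whole group. */
    for (; i < nwords; i++)
        sum += src[i];

    return sum;
}
```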
