Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting

public inbox for linux-media@vger.kernel.org
 help / color / mirror / Atom feed

From: Paul Cercueil <paul@crapouillou.net>
To: Jonathan Cameron <jic23@kernel.org>
Cc: "Alexandru Ardelean" <ardeleanalex@gmail.com>,
	"Lars-Peter Clausen" <lars@metafoo.de>,
	"Michael Hennerich" <Michael.Hennerich@analog.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	linux-iio@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org
Subject: Re: [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting
Date: Thu, 25 Nov 2021 17:29:58 +0000	[thread overview]
Message-ID: <YX153R.0PENWW3ING7F1@crapouillou.net> (raw)
In-Reply-To: <8WNX2R.M4XE9MQC24W22@crapouillou.net>

Hi Jonathan,

Le dim., nov. 21 2021 at 17:43:20 +0000, Paul Cercueil 
<paul@crapouillou.net> a écrit :
> Hi Jonathan,
> 
> Le dim., nov. 21 2021 at 15:00:37 +0000, Jonathan Cameron 
> <jic23@kernel.org> a écrit :
>> On Mon, 15 Nov 2021 14:19:21 +0000
>> Paul Cercueil <paul@crapouillou.net> wrote:
>> 
>>>  We can be certain that the input buffers will only be accessed by
>>>  userspace for reading, and output buffers will mostly be accessed 
>>> by
>>>  userspace for writing.
>> 
>> Mostly?  Perhaps a little more info on why that's not 'only'.
> 
> Just like with a framebuffer, it really depends on what the 
> application does. Most of the cases it will just read sequentially an 
> input buffer, or write sequentially an output buffer. But then you 
> get the exotic application that will try to do something like alpha 
> blending, which means read+write. Hence "mostly".
> 
>>> 
>>>  Therefore, it makes more sense to use only fully cached input 
>>> \x7f\x7fbuffers,
>>>  and to use the write-combine cache coherency setting for output 
>>> \x7f\x7fbuffers.
>>> 
>>>  This boosts performance, as the data written to the output buffers 
>>> \x7f\x7fdoes
>>>  not have to be sync'd for coherency. It will halve performance if 
>>> \x7f\x7fthe
>>>  userspace application tries to read from the output buffer, but 
>>> this
>>>  should never happen.
>>> 
>>>  Since we don't need to sync the cache when disabling CPU access 
>>> \x7f\x7feither
>>>  for input buffers or output buffers, the .end_cpu_access() 
>>> callback \x7f\x7fcan
>>>  be dropped completely.
>> 
>> We have an odd mix of coherent and non coherent DMA in here as you 
>> \x7fnoted,
>> but are you sure this is safe on all platforms?
> 
> The mix isn't safe, but using only coherent or only non-coherent 
> should be safe, yes.
> 
>> 
>>> 
>>>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>> 
>> Any numbers to support this patch?  The mapping types are performance
>> optimisations so nice to know how much of a difference they make.
> 
> Output buffers are definitely faster in write-combine mode. On a 
> ZedBoard with a AD9361 transceiver set to 66 MSPS, and buffer/size 
> set to 8192, I would get about 185 MiB/s before, 197 MiB/s after.
> 
> Input buffers... early results are mixed. On ARM32 it does look like 
> it is slightly faster to read from *uncached* memory than reading 
> from cached memory. The cache sync does take a long time.
> 
> Other architectures might have a different result, for instance on 
> MIPS invalidating the cache is a very fast operation, so using cached 
> buffers would be a huge win in performance.
> 
> Setups where the DMA operations are coherent also wouldn't require 
> any cache sync and this patch would give a huge win in performance.
> 
> I'll run some more tests next week to have some fresh numbers.

I think I mixed things up before, because I get different results now.

Here are some fresh benchmarks, triple-checked, using libiio's 
iio_readdev and iio_writedev tools, with 64K samples buffers at 61.44 
MSPS (max. theorical throughput: 234 MiB/s):
  iio_readdev -b 65536 cf-ad9361-lpc voltage0 voltage1 | pv > /dev/null
  pv /dev/zero | iio_writedev -b 65536 cf-ad9361-dds-core-lpc voltage0 
voltage1

Coherent mapping:
- fileio:
    read:	125 MiB/s
    write:	141 MiB/s
- dmabuf:
    read:	171 MiB/s
    write:	210 MiB/s

Coherent reads + Write-combine writes:
- fileio:
    read:	125 MiB/s
    write:	141 MiB/s
- dmabuf:
    read:	171 MiB/s
    write:	210 MiB/s

Non-coherent mapping:
- fileio:
    read:	119 MiB/s
    write:	124 MiB/s
- dmabuf:
    read:	159 MiB/s
    write:	124 MiB/s

Non-coherent reads + write-combine writes:
- fileio:
    read:	119 MiB/s
    write:	140 MiB/s
- dmabuf:
    read:	159 MiB/s
    write:	210 MiB/s

Non-coherent mapping with no cache sync:
- fileio:
    read:	156 MiB/s
    write:	123 MiB/s
- dmabuf:
    read:	234 MiB/s (capped by sample rate)
    write:	182 MiB/s

Non-coherent reads with no cache sync + write-combine writes:
- fileio:
    read:	156 MiB/s
    write:	140 MiB/s
- dmabuf:
    read:	234 MiB/s (capped by sample rate)
    write:	210 MiB/s


A few things we can deduce from this:

* Write-combine is not available on Zynq/ARM? If it was working, it 
should give a better performance than the coherent mapping, but it 
doesn't seem to do anything at all. At least it doesn't harm 
performance.

* Non-coherent + cache invalidation is definitely a good deal slower 
than using coherent mapping, at least on ARM32. However, when the cache 
sync is disabled (e.g. if the DMA operations are coherent) the reads 
are much faster.

* The new dma-buf based API is a great deal faster than the fileio API.

So in the future we could use coherent reads + write-combine writes, 
unless we know the DMA operations are coherent, and in this case use 
non-coherent reads + write-combine writes.

Regarding this patch, unfortunately I cannot prove that write-combine 
is faster, so I'll just drop this patch for now.

Cheers,
-Paul

next prev parent reply	other threads:[~2021-11-25 17:32 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-15 14:19 [PATCH 00/15] iio: buffer-dma: write() and new DMABUF based API Paul Cercueil
2021-11-15 14:19 ` [PATCH 01/15] iio: buffer-dma: Get rid of incoming/outgoing queues Paul Cercueil
2021-11-16  8:23   ` Alexandru Ardelean
2021-11-21 14:05   ` Jonathan Cameron
2021-11-21 16:23   ` Lars-Peter Clausen
2021-11-21 17:52     ` Paul Cercueil
2021-11-21 18:49       ` Lars-Peter Clausen
2021-11-21 20:08         ` Paul Cercueil
2021-11-22 15:08           ` Lars-Peter Clausen
2021-11-22 15:16             ` Paul Cercueil
2021-11-22 15:17               ` Lars-Peter Clausen
2021-11-22 15:27                 ` Paul Cercueil
2021-11-15 14:19 ` [PATCH 02/15] iio: buffer-dma: Remove unused iio_buffer_block struct Paul Cercueil
2021-11-16  8:22   ` Alexandru Ardelean
2021-11-15 14:19 ` [PATCH 03/15] iio: buffer-dma: Use round_down() instead of rounddown() Paul Cercueil
2021-11-16  8:26   ` Alexandru Ardelean
2021-11-21 14:08   ` Jonathan Cameron
2021-11-22 10:00     ` Paul Cercueil
2021-11-27 15:15       ` Jonathan Cameron
2021-11-15 14:19 ` [PATCH 04/15] iio: buffer-dma: Enable buffer write support Paul Cercueil
2021-11-16  8:52   ` Alexandru Ardelean
2021-11-21 14:20   ` Jonathan Cameron
2021-11-21 17:19     ` Paul Cercueil
2021-11-27 15:17       ` Jonathan Cameron
2021-11-15 14:19 ` [PATCH 05/15] iio: buffer-dmaengine: Support specifying buffer direction Paul Cercueil
2021-11-16  8:53   ` Alexandru Ardelean
2021-11-15 14:19 ` [PATCH 06/15] iio: buffer-dmaengine: Enable write support Paul Cercueil
2021-11-16  8:55   ` Alexandru Ardelean
2021-11-15 14:19 ` [PATCH 07/15] iio: core: Add new DMABUF interface infrastructure Paul Cercueil
2021-11-21 14:31   ` Jonathan Cameron
2021-11-15 14:19 ` [PATCH 08/15] iio: buffer-dma: split iio_dma_buffer_fileio_free() function Paul Cercueil
2021-11-16 10:59   ` Alexandru Ardelean
2021-11-21 13:49     ` Jonathan Cameron
2021-11-15 14:19 ` [PATCH 09/15] iio: buffer-dma: Use DMABUFs instead of custom solution Paul Cercueil
2021-11-15 14:19 ` [PATCH 10/15] iio: buffer-dma: Implement new DMABUF based userspace API Paul Cercueil
2021-11-15 14:19 ` [PATCH 11/15] iio: buffer-dma: Boost performance using write-combine cache setting Paul Cercueil
2021-11-18 11:45   ` Paul Cercueil
2021-11-21 15:00   ` Jonathan Cameron
2021-11-21 17:43     ` Paul Cercueil
2021-11-25 17:29       ` Paul Cercueil [this message]
2021-11-27 16:05         ` Jonathan Cameron
2021-11-28 13:25           ` Lars-Peter Clausen
2021-11-27 15:20       ` Jonathan Cameron
2021-11-15 14:22 ` [PATCH 12/15] iio: buffer-dmaengine: Support new DMABUF based userspace API Paul Cercueil
2021-11-15 14:22   ` [PATCH 13/15] iio: core: Add support for cyclic buffers Paul Cercueil
2021-11-16  9:50     ` Alexandru Ardelean
2021-11-15 14:22   ` [PATCH 14/15] iio: buffer-dmaengine: " Paul Cercueil
2021-11-16  9:50     ` Alexandru Ardelean
2021-11-15 14:22   ` [PATCH 15/15] Documentation: iio: Document high-speed DMABUF based API Paul Cercueil
2021-11-21 15:10     ` Jonathan Cameron
2021-11-21 17:46       ` Paul Cercueil
2021-11-15 14:37 ` [PATCH 00/15] iio: buffer-dma: write() and new " Daniel Vetter
2021-11-15 14:57   ` Paul Cercueil
2021-11-16 16:02     ` Daniel Vetter
2021-11-16 16:31       ` Laurent Pinchart
2021-11-17  8:48         ` Christian König
2021-11-17 12:50       ` Paul Cercueil
2021-11-17 13:42         ` Hennerich, Michael
2021-11-21 13:57 ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YX153R.0PENWW3ING7F1@crapouillou.net \
    --to=paul@crapouillou.net \
    --cc=Michael.Hennerich@analog.com \
    --cc=ardeleanalex@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jic23@kernel.org \
    --cc=lars@metafoo.de \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-iio@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox