linux-arch.vger.kernel.org archive mirror
From: Jonathan Morton <jonathan.morton@movial.com>
To: M.K.Edwards@gmail.com
Cc: Subash Patel <subashrp@gmail.com>,
	Jordan Crouse <jcrouse@codeaurora.org>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	linux-arch@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
Date: Sun, 26 Jun 2011 03:06:30 +0300
Message-ID: <BANLkTi=uNVLOy4oTTBpr8niRMX+m6wgWBg@mail.gmail.com>
In-Reply-To: <BANLkTi=y6PGMdHq0uT9QJ7aej3nU6cKW2g@mail.gmail.com>

On 25 June 2011 12:55, Michael K. Edwards <m.k.edwards@gmail.com> wrote:
> With regard to the use of NEON for data moves, I have appended a
> snippet of a conversation from the BeagleBoard list that veered off
> into a related direction.  (My response is lightly edited, since I
> made some stupid errors in the original.)  While this is somewhat
> off-topic from Marek's patch set, I think it's relevant to the
> question of whether "user-allocated" buffers are an important design
> consideration for his otherwise DMA-centric API.  (And more to the
> point, buffers allocated suitably for one or more on-chip devices, and
> also mapped as uncacheable to userland.)

As far as userspace is concerned, dealing with the memory hierarchy's
quirks is already pretty much a black art, and that's *before* you
start presenting it with uncached buffers.  The best rule of thumb
userspace can follow is to keep data in cache where it can, and to use
the widest memory-move instructions (with prefetching, if available)
where it can't.  For everything else it has to rely on the hardware to
optimise on its behalf.  Indeed, when working in C, you barely even get
*that* level of control (optimised copy routines have been known to
use double simply because it is reliably 64 bits wide and can be loaded
and stored efficiently), and most other languages are worse.
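
To illustrate the "widest move you can express in C" point, here is a
minimal sketch of a copy loop that moves 64 bits per iteration.  It is
not taken from any of the projects mentioned in this thread; the name
copy_wide() is hypothetical, and it uses uint64_t with fixed-size
memcpy() calls rather than the double trick, which is an assumption
made to stay within C's aliasing and alignment rules.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes in 64-bit chunks, finishing with a byte-wise tail.
 * Each chunk goes through a fixed-size memcpy(), which compilers
 * typically lower to a single 64-bit load and store. */
static void copy_wide(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint64_t)) {
        uint64_t chunk;
        memcpy(&chunk, s, sizeof chunk);   /* one 64-bit load  */
        memcpy(d, &chunk, sizeof chunk);   /* one 64-bit store */
        s += sizeof chunk;
        d += sizeof chunk;
        n -= sizeof chunk;
    }
    while (n--)                            /* remaining 0..7 bytes */
        *d++ = *s++;
}
```

Whether this beats the platform's own memcpy() depends entirely on the
target; on NEON-capable parts a hand-written loop with wider loads is
likely to do better still.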

Small wonder that userspace code which knows it will sometimes have to
work with uncached buffers - Pixman, for example - relies heavily on
handwritten SIMD assembler.

Video decoders are a particularly fun case, because the correct
solution is actually to DMA the output buffer to the GPU (or, better,
to map one onto the other so that zero-copy semantics result) so that
the CPU doesn't have to touch it.  But then you have to find a common
format that both VPU and GPU support, and you have to have a free DMA
channel and a way to use it.  Frankly though, this is a solution from
the 20th century (remember MPEG2 decoders sitting beside the SVGA
card?).

We *have* occasionally had to deal with hardware where no such common
format could be found, although often this has been due to inadequate
documentation or driver support (a familiar refrain).  In one case I
wrote a NEON NV12-to-RGB32 conversion routine which read directly from
the video buffer and wrote directly into a texture buffer, both of
which were of course uncached.  This halved the CPU consumption of the
video playback applet, but prefixing it with a routine which copied
the video buffer into cached memory (using 32-byte VLD1 instead of
16-byte versions) halved it again.  Profiling showed that the vast
majority of the time was spent in the prefix copy loop.  No doubt if
further savings had been required, I'd have tried using VLDM in the
copy loop.  (There weren't enough registers to widen the load stage of
the conversion routine itself.)
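
The two-stage structure described above can be sketched in portable C.
This is not the NEON routine itself, only an outline of the same idea
for a single row: first copy the source into a cached staging buffer
with the widest reads available, then convert from the staging buffer
and issue only stores to the destination.  The function name, the
MAX_WIDTH limit, the BT.601 integer coefficients and the XRGB8888
output layout (top byte set to 0xFF) are all assumptions made for the
illustration; plain memcpy() stands in for the hand-written wide copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_WIDTH 1920  /* hypothetical limit for the staging buffers */

/* Clamp a fixed-point value (8 fractional bits) to the range 0..255. */
static uint8_t clamp_shift8(int v)
{
    if (v < 0)
        return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Convert one NV12 row (luma row plus its interleaved U/V row) to
 * 32-bit pixels.  width must be even and at most MAX_WIDTH.
 * Returns 0 on success, -1 on a bad width. */
static int nv12_row_to_rgb32(uint32_t *dst, const uint8_t *y_src,
                             const uint8_t *uv_src, size_t width)
{
    uint8_t y_buf[MAX_WIDTH], uv_buf[MAX_WIDTH];

    if (width > MAX_WIDTH || (width & 1))
        return -1;

    /* Stage 1: the only reads from the (assumed uncached) source. */
    memcpy(y_buf, y_src, width);
    memcpy(uv_buf, uv_src, width);

    /* Stage 2: read cached data, write each output pixel once. */
    for (size_t x = 0; x < width; x++) {
        size_t pair = x & ~(size_t)1;       /* U/V shared by 2 pixels */
        int c = y_buf[x] - 16;
        int d = uv_buf[pair] - 128;         /* U */
        int e = uv_buf[pair + 1] - 128;     /* V */
        uint8_t r = clamp_shift8(298 * c + 409 * e + 128);
        uint8_t g = clamp_shift8(298 * c - 100 * d - 208 * e + 128);
        uint8_t b = clamp_shift8(298 * c + 516 * d + 128);

        dst[x] = 0xFF000000u | ((uint32_t)r << 16) |
                 ((uint32_t)g << 8) | (uint32_t)b;
    }
    return 0;
}
```

A real implementation would process several rows per call and share
each U/V row between two luma rows, as NV12 requires.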

The takeaway from this is that if your code has to read from uncached
memory at all, that will undoubtedly dominate its performance.  A
read-modify-write cycle is at least as bad (because the memory has to
go through at least one CAS latency and a write-to-read turnaround
before the next read can be serviced).  A pure write is, however, no
problem.
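
One way to act on this, sketched below under stated assumptions, is to
keep a cached shadow copy of whatever would otherwise be read back from
the uncached buffer.  The read-modify-write then happens entirely in
cached memory, and the uncached buffer only ever sees stores.  The
function name and the shadow-buffer arrangement are hypothetical, not
something described elsewhere in this thread, and the approach only
works if nothing else writes to the destination behind the CPU's back.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Blend src over the current contents of a span with constant alpha
 * (0 = keep destination, 255 = replace with src).
 *
 * shadow is a cached copy of the destination span.  All reads and the
 * blend arithmetic use shadow; uncached_dst receives stores only. */
static void blend_span_write_only(uint8_t *uncached_dst, uint8_t *shadow,
                                  const uint8_t *src, uint8_t alpha,
                                  size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Rounded weighted average, computed in unsigned arithmetic. */
        unsigned mixed = (src[i] * alpha +
                          shadow[i] * (255u - alpha) + 127u) / 255u;
        shadow[i] = (uint8_t)mixed;
    }

    /* Pure write to the uncached buffer; nothing is read back. */
    memcpy(uncached_dst, shadow, n);
}
```

The cost is the extra memory for the shadow copy, which may or may not
be acceptable depending on the buffer sizes involved.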

On cached memory, the L2 cache of most modern (even ARM) CPUs has an
auto-prefetcher which will help out with sequential transfers.  This
should get somewhere reasonably close to optimal performance.

 - Jonathan Morton

