[Question about DMA] Consistent memory?

From: mike.looijmans@topic.nl (Mike Looijmans)
To: linux-arm-kernel@lists.infradead.org
Subject: [Question about DMA] Consistent memory?
Date: Thu, 31 Dec 2015 18:12:52 +0100
Message-ID: <56856214.3050907@topic.nl>
In-Reply-To: <CAK7LNAS3wXfJ_syOEMoi1aEt5_maNur05figQ472_9usGQagHQ@mail.gmail.com>

On 31-12-2015 15:57, Masahiro Yamada wrote:
> Hi Alan, Mike,
>
> Thanks for your help!
>
>
> 2015-12-31 19:25 GMT+09:00 One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>:
>
>>>
>>> In a system like Fig.2, is the memory non-consistent?
>>
>> dma_alloc_coherent will always provide you with coherent memory. On a
>> machine with good cache interfaces it will provide you with normal
>> memory. On some systems it may be memory from a special window, in other
>> cases it will fall back to providing uncached memory for this.
>>
>> If the platform genuinely cannot support this (even by marking those areas
>> uncacheable) then it will fail the allocation.
>>
>> What it does mean is that you need to use non-coherent mappings when
>> accessing a lot of data. On hardware without proper cache coherency it
>> may be quite expensive to access coherent memory.
>
>
> Now, it is clearer to me.
> The following is what I understood.
> (Please point out if I am wrong.)
>
>
> I think, roughly, there are two ways for handling DMA:
> (At first, I was so confused that I was thinking about [1] and [2] mixed.)
>
>
>
> [1] DMA-coherent buffers
>
> Allocate buffers with dma_alloc_coherent()
> and just have access to the buffers without cache synchronization.
>
> There is no need to call dma_sync_single_for_*().
>
>
>
> [2] Streaming DMA
>
> Allocate buffers with kmalloc() or friends,
> and then map them for DMA with dma_map_single().
>
> The buffers are cached, so they are non-consistent
> unless there exists hardware assist such as
> Cache Coherency Interconnect.
>
> The drivers must invoke cache operations
> by calling dma_sync_single_for_*().
>
>
>
>
> Is there any guideline about which way should be used in drivers?
>
> I think, if the buffer size is small, [1] is more efficient
> because it need not invoke cache operations.
>
> If the buffer is large, [2] seems better because
> the cost of uncached memory access gets more expensive
> than that of cache operations.

Block size does not change the choice. The dma_sync functions take 
linear time (as a function of block size) to do their thing, so larger 
buffers take longer to flush.

On the Zynq (also ARM, with a choice of coherency connections) I 
measured that the dma_sync operations took only slightly less time than 
simply copying the data.

If the action taken on the buffer after the DMA completion is to copy it 
to (or from) a user buffer, you should use the coherent API 
(dma_alloc_coherent()). That's what I meant by "bounce buffers".

If you plan to DMA data straight to/from userspace, you'll need the 
dma_sync methods. (On coherent systems, the dma_sync methods become 
no-ops.)
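
To illustrate why the dma_sync calls are needed on a non-coherent
system, and why their cost grows with the buffer size, here is a toy
userspace model in C. It is only a sketch, not kernel code: struct
toy_buf, cpu_read(), device_write(), toy_sync_for_cpu() and toy_demo()
are hypothetical names made up for this illustration, and
toy_sync_for_cpu() merely stands in for what dma_sync_single_for_cpu()
achieves on real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64

/* Toy model (hypothetical): "ram" is what the device sees,
 * "cache" is what the CPU sees once it has read a byte. */
struct toy_buf {
	unsigned char ram[BUF_SIZE];
	unsigned char cache[BUF_SIZE];
	bool cached[BUF_SIZE];	/* true if the CPU holds a cached copy */
};

/* CPU read: served from the cache if a copy exists, else filled from RAM. */
static unsigned char cpu_read(struct toy_buf *b, size_t i)
{
	if (!b->cached[i]) {
		b->cache[i] = b->ram[i];
		b->cached[i] = true;
	}
	return b->cache[i];
}

/* Device write (the DMA): goes straight to RAM, never touches the cache. */
static void device_write(struct toy_buf *b, const unsigned char *src,
			 size_t len)
{
	assert(len <= BUF_SIZE);
	memcpy(b->ram, src, len);
}

/* Stand-in for dma_sync_single_for_cpu(): invalidate the cached copies.
 * Returns the number of bytes visited, to make the linear cost visible. */
static size_t toy_sync_for_cpu(struct toy_buf *b, size_t len)
{
	size_t visited = 0;

	for (size_t i = 0; i < len && i < BUF_SIZE; i++) {
		b->cached[i] = false;
		visited++;
	}
	return visited;
}

/* Returns 1 if the CPU saw stale data before the sync and fresh data after. */
static int toy_demo(void)
{
	struct toy_buf b;
	unsigned char fresh[BUF_SIZE];
	int stale, fresh_seen;

	memset(&b, 0, sizeof(b));
	memset(fresh, 0xAB, sizeof(fresh));

	(void)cpu_read(&b, 0);			/* CPU caches the old 0x00 */
	device_write(&b, fresh, sizeof(fresh));	/* DMA updates RAM only */
	stale = (cpu_read(&b, 0) == 0x00);	/* still the cached value */

	toy_sync_for_cpu(&b, sizeof(fresh));	/* invalidate, then re-read */
	fresh_seen = (cpu_read(&b, 0) == 0xAB);

	return stale && fresh_seen;
}
```

The loop in toy_sync_for_cpu() touches every byte of the buffer, which
is the linear cost mentioned above. Real hardware works per cache line
rather than per byte, but the scaling with buffer size is the same.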

> (If devices are connected to the memory controller
> via Cache Coherency Interconnect, [1] always works very well.
> But drivers should be written in a portable way, so
> such a hardware implementation should not be expected.)
>
> I am not sure about the border line between [1] and [2], though...
>
>
>
> BTW, I am studying the DMA APIs in order to write a new
> MMC host driver for my ARM SoC.
>
>
> I grepped under drivers/mmc/host, and
> I found many drivers call dma_alloc_coherent(),
> but there are also some drivers that use dma_map_single().

If I recall correctly, most MMC controllers have their own 
scatter-gather DMA controller and copy data straight to/from userspace 
buffers.

-- 
Mike Looijmans
