* About dma_sync_single_for_{cpu,device}
  From: Karl Beldan @ 2012-07-31  6:45 UTC
  To: linux-arm-kernel

Hi,

(This is an email originally addressed to the linux-kernel mailing-list.)

On our board we've got an MV78200 and a network device between which we
xfer memory chunks via the ddram with an external dma controller.
To handle these xfers we're using the dma API.
To tx a chunk of data from the SoC => network device, we:
- prepare a buffer with a leading header embedding a pattern,
- trigger the xfer and wait for an irq
  // The device updates the pattern and then triggers an irq
- upon irq we check the pattern for the xfer completion

I was expecting the following to work:
	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
	dev_send(buffer);
	// wait for irq (don't peek in the buffer) ... got irq
	dma_sync_single_for_cpu(dev, buffer, pattern_size, DMA_FROM_DEVICE);
	if (!xfer_done(buffer)) // not RAM value
		dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
	[...]

But this does not work (the buffer pattern does not reflect the ddram
value).

On the other hand, the following works:
	[...]
	// wait for irq (don't peek in the buffer) ... got irq
	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
	if (!xfer_done(buffer)) // RAM value
	[...]

Looking at dma-mapping.c:__dma_page_cpu_to_{dev,cpu}() and
proc-feroceon.S:feroceon_dma_{,un}map_area this behavior is not
surprising. The sync_for_cpu calls the unmap which just invalidates the
outer cache while the sync_for_device invalidates both inner and outer.

It seems that:
- we need to invalidate after the RAM has been updated
- we need to invalidate with sync_single_for_device rather than
  sync_single_for_cpu to check the value
Is it correct?

Maybe the following comment in dma-mapping.c explains the situation:
	/*
	 * The DMA API is built upon the notion of "buffer ownership".  A buffer
	 * is either exclusively owned by the CPU (and therefore may be accessed
	 * by it) or exclusively owned by the DMA device.  These helper functions
	 * represent the transitions between these two ownership states.
	 *
	 * Note, however, that on later ARMs, this notion does not work due to
	 * speculative prefetches.  We model our approach on the assumption that
	 * the CPU does do speculative prefetches, which means we clean caches
	 * before transfers and delay cache invalidation until transfer completion.
	 */

Thanks for your input,

Regards,
Karl
* About dma_sync_single_for_{cpu,device}
  From: Clemens Ladisch @ 2012-07-31  6:59 UTC
  To: linux-arm-kernel

Karl Beldan wrote:
> To tx a chunk of data from the SoC => network device, we:
> - prepare a buffer with a leading header embedding a pattern,
> - trigger the xfer and wait for an irq
>   // The device updates the pattern and then triggers an irq
> - upon irq we check the pattern for the xfer completion
>
> I was expecting the following to work:
> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);

If both the CPU and the device write to the buffer, you must use
DMA_BIDIRECTIONAL.

Regards,
Clemens
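(For readers following the thread: a minimal sketch of what this suggestion
amounts to, reusing the dev/buffer/size names from the original post. The
wrapper name is arbitrary, the send/poll logic is elided, and the caller is
assumed to check the returned handle with dma_mapping_error().)

	#include <linux/dma-mapping.h>

	/*
	 * Sketch only: the CPU fills the header and payload before the
	 * transfer, and the device later writes the completion pattern back
	 * into the same buffer, so the mapping has to cover both directions.
	 */
	static dma_addr_t map_tx_buffer(struct device *dev, void *buffer,
					size_t size)
	{
		return dma_map_single(dev, buffer, size, DMA_BIDIRECTIONAL);
	}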
* About dma_sync_single_for_{cpu,device}
  From: Karl Beldan @ 2012-07-31  7:27 UTC
  To: linux-arm-kernel

On 7/31/12, Clemens Ladisch <clemens@ladisch.de> wrote:
> Karl Beldan wrote:
>> To tx a chunk of data from the SoC => network device, we:
>> - prepare a buffer with a leading header embedding a pattern,
>> - trigger the xfer and wait for an irq
>>   // The device updates the pattern and then triggers an irq
>> - upon irq we check the pattern for the xfer completion
>>
>> I was expecting the following to work:
>> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
>
> If both the CPU and the device write to the buffer, you must use
> DMA_BIDIRECTIONAL.
>
Hi Clemens,

This does not work (tested): it seems to me BIDIRECTIONAL would just add
an invalidate, and invalidating before the RAM has been updated, as
stated, does not work.
In fact the immediately following sync_for_device is intended to cater
for what DMA_BIDIRECTIONAL would provide (though it is not
implementation agnostic), only invalidating a smaller address range.

Regards,
Karl
* About dma_sync_single_for_{cpu,device}
  From: Clemens Ladisch @ 2012-07-31  7:34 UTC
  To: linux-arm-kernel

Karl Beldan wrote:
> On 7/31/12, Clemens Ladisch <clemens@ladisch.de> wrote:
>> Karl Beldan wrote:
>>> To tx a chunk of data from the SoC => network device, we:
>>> - prepare a buffer with a leading header embedding a pattern,
>>> - trigger the xfer and wait for an irq
>>>   // The device updates the pattern and then triggers an irq
>>> - upon irq we check the pattern for the xfer completion
>>>
>>> I was expecting the following to work:
>>> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
>>
>> If both the CPU and the device write to the buffer, you must use
>> DMA_BIDIRECTIONAL.
>
> This does not work (tested): it seems to me BIDIRECTIONAL would just add
> an invalidate, and invalidating before the RAM has been updated, as
> stated, does not work.

Please show the exact sequence of dma_* calls, and also show when and
how the CPU and the device access the buffer.

Regards,
Clemens
* About dma_sync_single_for_{cpu,device}
  From: Karl Beldan @ 2012-07-31  8:30 UTC
  To: linux-arm-kernel

On Tue, Jul 31, 2012 at 09:34:01AM +0200, Clemens Ladisch wrote:
> Karl Beldan wrote:
> > On 7/31/12, Clemens Ladisch <clemens@ladisch.de> wrote:
> >> Karl Beldan wrote:
> >>> To tx a chunk of data from the SoC => network device, we:
> >>> - prepare a buffer with a leading header embedding a pattern,
> >>> - trigger the xfer and wait for an irq
> >>>   // The device updates the pattern and then triggers an irq
> >>> - upon irq we check the pattern for the xfer completion
> >>>
> >>> I was expecting the following to work:
> >>> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
> >>
> >> If both the CPU and the device write to the buffer, you must use
> >> DMA_BIDIRECTIONAL.
> >
> > This does not work (tested): it seems to me BIDIRECTIONAL would just add
> > an invalidate, and invalidating before the RAM has been updated, as
> > stated, does not work.
>
> Please show the exact sequence of dma_* calls, and also show when and
> how the CPU and the device access the buffer.
>
Hmm, so I just spotted a line where we peek in the buffer after
invalidating .. cannot believe I missed it .. so sorry for the noise ..
now it's working.
I felt I would find the culprit right after posting ;)

Thanks, Clemens!
* About dma_sync_single_for_{cpu,device}
  From: Russell King - ARM Linux @ 2012-07-31  9:09 UTC
  To: linux-arm-kernel

On Tue, Jul 31, 2012 at 08:45:57AM +0200, Karl Beldan wrote:
> I was expecting the following to work:
> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
> 	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> 	dev_send(buffer);
> 	// wait for irq (don't peek in the buffer) ... got irq
> 	dma_sync_single_for_cpu(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> 	if (!xfer_done(buffer)) // not RAM value
> 		dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> 	[...]

First point is that you clearly do not understand the DMA API at all.
The DMA API has the idea of buffer ownership.  Only the owner may access
the buffer:

	*CPU OWNS THE BUFFER*
dma_map_single()
	*DEVICE OWNS THE BUFFER*
dma_sync_single_for_cpu()
	*CPU OWNS THE BUFFER*
dma_sync_single_for_device()
	*DEVICE OWNS THE BUFFER*
dma_unmap_single()
	*CPU OWNS THE BUFFER*

So, there is absolutely no need whatsoever to follow dma_map_single()
with dma_sync_single_for_device().

Second point is that you should not change the 'direction' argument
while a buffer is mapped.

Thirdly, enable DMA API debugging (DMA_API_DEBUG) to make sure you're
using the DMA API correctly.

Fourthly, remember that the CPU deals with cache lines, and dirty cache
lines may be written back in their _entirety_ - which means that data
DMA'd from a device into the same cache line that you've modified via
the CPU will not work (either the data in the cache line has to be
invalidated and therefore the CPU update discarded, or the cache line
has to be flushed back to RAM and the DMA'd data is overwritten).
Hence why the buffer ownership rules are extremely important.

The solution for this fourth point is to use coherent DMA memory for
things like ring buffers and the like, which does not suffer from this.

> Maybe the following comment in dma-mapping.c explains the situation:
> 	/*
> 	 * The DMA API is built upon the notion of "buffer ownership".  A buffer
> 	 * is either exclusively owned by the CPU (and therefore may be accessed
> 	 * by it) or exclusively owned by the DMA device.  These helper functions
> 	 * represent the transitions between these two ownership states.
> 	 *
> 	 * Note, however, that on later ARMs, this notion does not work due to
> 	 * speculative prefetches.  We model our approach on the assumption that
> 	 * the CPU does do speculative prefetches, which means we clean caches
> 	 * before transfers and delay cache invalidation until transfer completion.
> 	 */

Even with that comment, the idea of buffer ownership must be preserved
by drivers, and they must follow that rule.  The fact that some ARM CPUs
do not respect the ownership entirely is worked around inside the DMA
API and is of no interest to drivers.  Feroceon is not one of the CPUs
which does this, though.
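(To make the ownership transitions concrete, below is a minimal sketch of
the sequence described above applied to the original scenario, assuming a
single bidirectional streaming mapping over the whole buffer.
prepare_buffer(), dev_send() and xfer_done() are hypothetical stand-ins
for the poster's driver-specific helpers, not a real API.)

	#include <linux/dma-mapping.h>
	#include <linux/errno.h>
	#include <linux/types.h>

	/* Hypothetical driver helpers standing in for the poster's code. */
	extern void prepare_buffer(void *buffer);
	extern void dev_send(void *buffer);
	extern bool xfer_done(void *buffer);

	static int send_and_check(struct device *dev, void *buffer, size_t size)
	{
		dma_addr_t handle;

		/* CPU owns the buffer: fill in the payload and the
		 * completion pattern before mapping. */
		prepare_buffer(buffer);

		handle = dma_map_single(dev, buffer, size, DMA_BIDIRECTIONAL);
		if (dma_mapping_error(dev, handle))
			return -ENOMEM;

		/* Device owns the buffer from here on: don't touch it. */
		dev_send(buffer);

		/* ... wait for the completion irq ... */

		/* Transfer ownership back to the CPU before reading. */
		dma_sync_single_for_cpu(dev, handle, size, DMA_BIDIRECTIONAL);

		if (!xfer_done(buffer)) {
			/* Not done yet: hand the buffer back to the device
			 * and wait for the next irq. */
			dma_sync_single_for_device(dev, handle, size,
						   DMA_BIDIRECTIONAL);
			return -EINPROGRESS;
		}

		dma_unmap_single(dev, handle, size, DMA_BIDIRECTIONAL);
		return 0;
	}

With CONFIG_DMA_API_DEBUG=y the core should also warn at runtime about
things such as syncing a region that was never mapped or using a direction
that does not match the mapping, which is what the third point above refers
to.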
* About dma_sync_single_for_{cpu,device}
  From: Karl Beldan @ 2012-07-31 19:31 UTC
  To: linux-arm-kernel

On 7/31/12, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> On Tue, Jul 31, 2012 at 08:45:57AM +0200, Karl Beldan wrote:
>> I was expecting the following to work:
>> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
>> 	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> 	dev_send(buffer);
>> 	// wait for irq (don't peek in the buffer) ... got irq
>> 	dma_sync_single_for_cpu(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> 	if (!xfer_done(buffer)) // not RAM value
>> 		dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> 	[...]
>
Hi Russell,

> First point is that you clearly do not understand the DMA API at all.
> The DMA API has the idea of buffer ownership.  Only the owner may access
> the buffer:
>
Are you saying that this scenario does not work?
We are taking some liberties with the DMA API, we're more using some
of its funcs rather than _using_ it ;).
The question was not whether this was a proper usage of the API, but
why that scenario would not lead to the expected results .. and now
I've found the culprit peek I am happy.

[...]
> So, there is absolutely no need whatsoever to follow dma_map_single()
> with dma_sync_single_for_device().
>
Cleaning a wide address range while invalidating a small one?

[...]
> Fourthly, remember that the CPU deals with cache lines, and dirty cache
> lines may be written back in their _entirety_ - which means that data
> DMA'd from a device into the same cache line that you've modified via
> the CPU will not work (either the data in the cache line has to be
> invalidated and therefore the CPU update discarded, or the cache line
> has to be flushed back to RAM and the DMA'd data is overwritten).
> Hence why the buffer ownership rules are extremely important.
>
Of course.

> The solution for this fourth point is to use coherent DMA memory for
> things like ring buffers and the like, which does not suffer from this.
>
I might use something different in a not so distant future, yet, for
the time being, this way of doing has its advantages.

[...]
> Even with that comment, the idea of buffer ownership must be preserved
> by drivers, and they must follow that rule.  The fact that some ARM CPUs
> do not respect the ownership entirely is worked around inside the DMA
> API and is of no interest to drivers.  Feroceon is not one of the CPUs
> which does this, though.
>
It was kind of a last resort explanation for a cache line filled out of
the blue .. before I spotted the culprit peek this morning.

Thanks for your input,
Karl
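(As a reference for the coherent-memory alternative mentioned above, here
is a minimal sketch in which only the completion pattern lives in
dma_alloc_coherent() memory, so no dma_sync_* calls are needed to observe
the device's update; the struct and function names are hypothetical.)

	#include <linux/dma-mapping.h>
	#include <linux/errno.h>
	#include <linux/gfp.h>
	#include <linux/types.h>

	/* Hypothetical layout: the completion word the device updates lives
	 * in coherent memory; the bulk data can stay in a streaming mapping. */
	struct xfer_desc {
		u32 pattern;
	};

	static struct xfer_desc *desc;
	static dma_addr_t desc_dma;	/* programmed into the device so it
					 * knows where to write the pattern */

	static int alloc_desc(struct device *dev)
	{
		desc = dma_alloc_coherent(dev, sizeof(*desc), &desc_dma,
					  GFP_KERNEL);
		return desc ? 0 : -ENOMEM;
	}

	static void free_desc(struct device *dev)
	{
		dma_free_coherent(dev, sizeof(*desc), desc, desc_dma);
	}

Reading desc->pattern from the irq handler then needs no explicit cache
maintenance, at the cost of the coherent mapping typically being uncached
on this class of hardware and therefore slower for bulk accesses.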
* About dma_sync_single_for_{cpu,device}
  From: Russell King - ARM Linux @ 2012-07-31 20:08 UTC
  To: linux-arm-kernel

On Tue, Jul 31, 2012 at 09:31:13PM +0200, Karl Beldan wrote:
> On 7/31/12, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> > On Tue, Jul 31, 2012 at 08:45:57AM +0200, Karl Beldan wrote:
> >> I was expecting the following to work:
> >> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
> >> 	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> >> 	dev_send(buffer);
> >> 	// wait for irq (don't peek in the buffer) ... got irq
> >> 	dma_sync_single_for_cpu(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> >> 	if (!xfer_done(buffer)) // not RAM value
> >> 		dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> >> 	[...]
> >
> Hi Russell,
>
> > First point is that you clearly do not understand the DMA API at all.
> > The DMA API has the idea of buffer ownership.  Only the owner may access
> > the buffer:
> >
> Are you saying that this scenario does not work?
> We are taking some liberties with the DMA API, we're more using some
> of its funcs rather than _using_ it ;).
> The question was not whether this was a proper usage of the API, but
> why that scenario would not lead to the expected results .. and now
> I've found the culprit peek I am happy.

If you abuse the API don't expect your stuff to work in future kernel
versions.

It seems that the overall tone of your reply is "what we have now works,
we don't care if it's correct, sod you."

Fine, I won't spend any more time on this.  Just don't ever think about
merging it into mainline, thanks.
* About dma_sync_single_for_{cpu,device}
  From: Karl Beldan @ 2012-08-01  6:50 UTC
  To: linux-arm-kernel

On 7/31/12, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> On Tue, Jul 31, 2012 at 09:31:13PM +0200, Karl Beldan wrote:
>> On 7/31/12, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
>> > On Tue, Jul 31, 2012 at 08:45:57AM +0200, Karl Beldan wrote:
>> >> I was expecting the following to work:
>> >> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
>> >> 	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> >> 	dev_send(buffer);
>> >> 	// wait for irq (don't peek in the buffer) ... got irq
>> >> 	dma_sync_single_for_cpu(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> >> 	if (!xfer_done(buffer)) // not RAM value
>> >> 		dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> >> 	[...]
>> >
>> Hi Russell,
>>
>> > First point is that you clearly do not understand the DMA API at all.
>> > The DMA API has the idea of buffer ownership.  Only the owner may
>> > access the buffer:
>> >
>> Are you saying that this scenario does not work?
>> We are taking some liberties with the DMA API, we're more using some
>> of its funcs rather than _using_ it ;).
>> The question was not whether this was a proper usage of the API, but
>> why that scenario would not lead to the expected results .. and now
>> I've found the culprit peek I am happy.
>
> If you abuse the API don't expect your stuff to work in future kernel
> versions.
>
Of course.

> It seems that the overall tone of your reply is "what we have now works,
> we don't care if it's correct, sod you."
>
Not at all:
{
On 7/31/12, Karl Beldan <karl.beldan@gmail.com> wrote:
> I might use something different in a not so distant future, yet, for
> the time being, this way of doing has its advantages.
}
and during this time I might stick to the API.
I am not at ease telling people how they should take things, especially
if I asked for their help, all the more when they have put effort into
the exchange while being experts on the matter. Yet please do not assume
I did not care for your advice, which I deem most valuable, as, needless
to say, does the community.

> Fine, I won't spend any more time on this.  Just don't ever think about
> merging it into mainline, thanks.
>
Merge submission while taking such liberties .. that I would not dare ;)
This really was a down-to-earth technical question, not the start of a
disguised merge request.
I am sure that taking such liberties and feeling out their limits before
sticking to a super safe API is not a bad thing, e.g. it might trigger
nasty hidden bugs more easily; it is often beneficial, to me at least.

Thanks for your feedback,
Karl
Thread overview: 9+ messages
  [not found] <20120730202401.GA4947@gobelin>
  2012-07-31  6:45 ` About dma_sync_single_for_{cpu,device} Karl Beldan
  2012-07-31  6:59   ` Clemens Ladisch
  2012-07-31  7:27     ` Karl Beldan
  2012-07-31  7:34       ` Clemens Ladisch
  2012-07-31  8:30         ` Karl Beldan
  2012-07-31  9:09   ` Russell King - ARM Linux
  2012-07-31 19:31     ` Karl Beldan
  2012-07-31 20:08       ` Russell King - ARM Linux
  2012-08-01  6:50         ` Karl Beldan