non barrier versions of dma_map functions

linux-arm-kernel.lists.infradead.org archive mirror

* non barrier versions of dma_map functions
From: Russell King - ARM Linux @ 2009-12-07 19:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 07, 2009 at 11:37:21AM -0800, adharmap at codeaurora.org wrote:
> We have a situation where we need to dma map multiple cached buffers for a
> single dma transaction.
> 
> The current DMA api suggests the use of dma_map_single for cache
> consistency. On ARMv7 it performs the necessary cache-operations and calls
> data sync barrier instruction (DSB). In our case we would be executing
> multiple DSB instruction before starting the dma operation - we need
> memory to be consistent only after we map the last buffer.

Is it a problem and do you have numbers to illustrate why it is a
problem, or is this just theory?

I suspect that the DSB doesn't figure in the bigger scheme of things
(such as running the cache maintenance operations).  Moreover, there are
bigger issues to be getting on with at the moment (such as the DMA cache
maintenance on ARMv7 actually *being* correct, by invalidating after the
DMA has completed), so things like what you're suggesting aren't worth
worrying about.


* non barrier versions of dma_map functions
From: adharmap at codeaurora.org @ 2009-12-07 19:37 UTC (permalink / raw)
  To: linux-arm-kernel

We have a situation where we need to DMA-map multiple cached buffers for a
single DMA transaction.

The current DMA API suggests the use of dma_map_single() for cache
consistency. On ARMv7 it performs the necessary cache operations and then
executes a data synchronization barrier (DSB) instruction. In our case we
would be executing multiple DSB instructions before starting the DMA
operation, although we need memory to be consistent only after we map the
last buffer.

I am thinking we could define "no barrier" versions of all the mapping
functions, plus a barrier function that issues a single DSB before the DMA
is started.

I would appreciate alternative ideas or code designs for getting the
desired non-barrier versions of the mapping functions.

Abhijeet Dharmapurikar
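
A minimal sketch of the proposal above, written as a userspace simulation
rather than kernel code. Every name here (sim_map_single_nobarrier,
sim_dma_barrier, the 32-byte line size) is hypothetical and is not an
existing kernel interface; the counters only stand in for the real cache
clean and DSB operations, so the sketch shows how many barriers each
approach would issue, not how long they take.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size, for illustration only. */
#define SIM_CACHE_LINE 32u

/* Counters standing in for the real hardware operations. */
struct sim_stats {
    unsigned long lines_cleaned; /* cache lines cleaned */
    unsigned long barriers;      /* DSB-equivalents issued */
};

/* Hypothetical "no barrier" mapping: clean the buffer, skip the DSB. */
static void sim_map_single_nobarrier(struct sim_stats *st, size_t size)
{
    assert(st != NULL);
    st->lines_cleaned += (size + SIM_CACHE_LINE - 1) / SIM_CACHE_LINE;
}

/* Hypothetical explicit barrier, issued once before starting the DMA. */
static void sim_dma_barrier(struct sim_stats *st)
{
    assert(st != NULL);
    st->barriers++;
}

/* Models the behaviour described in the thread: clean, then DSB. */
static void sim_map_single(struct sim_stats *st, size_t size)
{
    sim_map_single_nobarrier(st, size);
    sim_dma_barrier(st);
}

/* Map n buffers one at a time, with a barrier after each. */
static struct sim_stats sim_map_each(size_t n, size_t size)
{
    struct sim_stats st = { 0, 0 };
    for (size_t i = 0; i < n; i++)
        sim_map_single(&st, size);
    return st;
}

/* Map n buffers without barriers, then issue a single barrier. */
static struct sim_stats sim_map_batched(size_t n, size_t size)
{
    struct sim_stats st = { 0, 0 };
    for (size_t i = 0; i < n; i++)
        sim_map_single_nobarrier(&st, size);
    sim_dma_barrier(&st);
    return st;
}
```

Both paths clean the same number of lines; only the barrier count differs
(n versus 1). Whether that difference matters in practice is an empirical
question that this simulation cannot answer.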


* non barrier versions of dma_map functions
From: Abhijeet Dharmapurikar @ 2009-12-10  0:32 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Mon, Dec 07, 2009 at 11:37:21AM -0800, adharmap at codeaurora.org wrote:
>> We have a situation where we need to dma map multiple cached buffers for a
>> single dma transaction.
>>
>> The current DMA api suggests the use of dma_map_single for cache
>> consistency. On ARMv7 it performs the necessary cache-operations and calls
>> data sync barrier instruction (DSB). In our case we would be executing
>> multiple DSB instruction before starting the dma operation - we need
>> memory to be consistent only after we map the last buffer.
> 
> Is it a problem and do you have numbers to illustrate why it is a
> problem, or is this just theory?

Here are numbers from a test run on an ARMv7-based device. It kmallocs N
buffers of size 'size', dirties their cache lines by writing to them, and
calls dma_map_single(), which calls the arch-specific clean operations,
with and without the DSB. In the "without DSB" case a single DSB is
executed after the last buffer is mapped. Times are in microseconds.

size   N    map_single (us)   map_single w/o dsb (us)   delta
128    16   8                 5                         60%
512    16   9                 6                         50%
512    32   15                8                         88%
512    48   20                11                        82%
512    64   27                14                        93%
64     4    4                 3                         33%
64     8    4                 3                         33%
64     16   7                 4                         75%
64     32   12                4                         200%
64     48   17                6                         183%
64     64   21                7                         200%
1024   16   9                 7                         29%
These buffer sizes and values of N are very close to the real-world sizes
the framebuffer driver handles. Cases where N is large are the most
common.

Clearly, we could benefit from no-barrier versions of the cache
operations, and we could use them in scatter-gather mappings as well.

Abhijeet
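
The delta column in the table above appears to be the extra time taken by
the per-buffer DSB version relative to the single-DSB version, i.e.
(with - without) / without, rounded to the nearest percent. That reading is
inferred from the numbers, not stated in the message. A small sketch that
reproduces it (dsb_overhead_percent is a hypothetical helper name):

```c
#include <assert.h>

/*
 * Relative overhead of mapping with a DSB per buffer, in percent of the
 * single-DSB time, rounded to the nearest integer. Both times are in
 * microseconds, as in the table.
 */
static unsigned int dsb_overhead_percent(unsigned int with_dsb_us,
                                         unsigned int without_dsb_us)
{
    assert(without_dsb_us > 0);
    assert(with_dsb_us >= without_dsb_us);

    unsigned int diff = with_dsb_us - without_dsb_us;

    /* Add half the divisor before dividing to round to nearest. */
    return (100u * diff + without_dsb_us / 2) / without_dsb_us;
}
```

For example, the row with size 512 and N 32 gives
dsb_overhead_percent(15, 8), which evaluates to 88 and matches the table.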


* non barrier versions of dma_map functions
From: Catalin Marinas @ 2009-12-10  9:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2009-12-10 at 00:32 +0000, Abhijeet Dharmapurikar wrote:
> Russell King - ARM Linux wrote:
> > On Mon, Dec 07, 2009 at 11:37:21AM -0800, adharmap at codeaurora.org wrote:
> >> We have a situation where we need to dma map multiple cached buffers for a
> >> single dma transaction.
> >>
> >> The current DMA api suggests the use of dma_map_single for cache
> >> consistency. On ARMv7 it performs the necessary cache-operations and calls
> >> data sync barrier instruction (DSB). In our case we would be executing
> >> multiple DSB instruction before starting the dma operation - we need
> >> memory to be consistent only after we map the last buffer.
> >
> > Is it a problem and do you have numbers to illustrate why it is a
> > problem, or is this just theory?
> 
> 
> Here are numbers from a test ran on ARMv7 based device
> It kmallocs N buffers of size 'size', dirties their cache by writing
> to them and calls dma_map_single that calls the arch specific clean
> operations with and without dsb. In "without dsb" case a dsb is executed
> after the last buffer is mapped. The time is in microseconds

Interesting results but I have some additional questions:

What is the direction given to dma_map_single()?

If the direction is TO_DEVICE, has the buffer been dirtied before
calling dma_map_single()? The DSB overhead could be smaller compared to
the actual cache flushing.

-- 
Catalin
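
The direction matters because it conventionally selects which cache
operation the mapping performs. The sketch below shows the usual
correspondence (clean for TO_DEVICE, invalidate for FROM_DEVICE, clean and
invalidate for BIDIRECTIONAL) using hypothetical sim_* names; it is an
illustration of the general idea, not the actual ARM implementation
discussed in this thread.

```c
#include <assert.h>

/* Hypothetical stand-ins for the DMA directions. */
enum sim_dma_dir {
    SIM_DMA_BIDIRECTIONAL,
    SIM_DMA_TO_DEVICE,
    SIM_DMA_FROM_DEVICE
};

/* Hypothetical stand-ins for the cache maintenance operations. */
enum sim_cache_op {
    SIM_OP_CLEAN,            /* write dirty lines back to memory */
    SIM_OP_INVALIDATE,       /* discard cached lines */
    SIM_OP_CLEAN_INVALIDATE  /* write back, then discard */
};

/* Conventional choice of cache operation for a streaming mapping. */
static enum sim_cache_op sim_op_for_direction(enum sim_dma_dir dir)
{
    switch (dir) {
    case SIM_DMA_TO_DEVICE:
        return SIM_OP_CLEAN;
    case SIM_DMA_FROM_DEVICE:
        return SIM_OP_INVALIDATE;
    case SIM_DMA_BIDIRECTIONAL:
        return SIM_OP_CLEAN_INVALIDATE;
    }
    assert(!"unknown DMA direction");
    return SIM_OP_CLEAN_INVALIDATE;
}

/* Only operations that write back are sensitive to dirty lines. */
static int sim_op_writes_back(enum sim_cache_op op)
{
    return op != SIM_OP_INVALIDATE;
}
```

With a write-back operation, the cost of the clean itself depends on how
many lines are dirty, which is why the second question asks whether the
buffers were dirtied before mapping.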


* non barrier versions of dma_map functions
From: Abhijeet Dharmapurikar @ 2009-12-10 18:16 UTC (permalink / raw)
  To: linux-arm-kernel

Catalin Marinas wrote:
> On Thu, 2009-12-10 at 00:32 +0000, Abhijeet Dharmapurikar wrote:
>> Russell King - ARM Linux wrote:
>>> On Mon, Dec 07, 2009 at 11:37:21AM -0800, adharmap at codeaurora.org wrote:
>>>> We have a situation where we need to dma map multiple cached buffers for a
>>>> single dma transaction.
>>>>
>>>> The current DMA api suggests the use of dma_map_single for cache
>>>> consistency. On ARMv7 it performs the necessary cache-operations and calls
>>>> data sync barrier instruction (DSB). In our case we would be executing
>>>> multiple DSB instruction before starting the dma operation - we need
>>>> memory to be consistent only after we map the last buffer.
>>> Is it a problem and do you have numbers to illustrate why it is a
>>> problem, or is this just theory?
>>
>> Here are numbers from a test ran on ARMv7 based device
>> It kmallocs N buffers of size 'size', dirties their cache by writing
>> to them and calls dma_map_single that calls the arch specific clean
>> operations with and without dsb. In "without dsb" case a dsb is executed
>> after the last buffer is mapped. The time is in microseconds
> 
> Interesting results but I have some additional questions:
> 
> What is the direction given to dma_map_single()?

DMA_TO_DEVICE

> If the direction is TO_DEVICE, has the buffer been dirtied before
> calling dma_map_single()? The DSB overhead could be smaller compared to
> the actual cache flushing.

Yes the buffers were dirtied before calling
dma_map_single(...DMA_TO_DEVICE);


* non barrier versions of dma_map functions
From: Russell King - ARM Linux @ 2009-12-10 19:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 10, 2009 at 10:16:49AM -0800, Abhijeet Dharmapurikar wrote:
> Catalin Marinas wrote:
>> What is the direction given to dma_map_single()?
>>
> DMA_TO_DEVICE
>> If the direction is TO_DEVICE, has the buffer been dirtied before
>> calling dma_map_single()? The DSB overhead could be smaller compared to
>> the actual cache flushing.
> Yes the buffers were dirtied before calling  
> dma_map_single(...DMA_TO_DEVICE);

Well, I don't think we can extend the API like this without cross-platform
agreement.
