linux-kernel.vger.kernel.org archive mirror
* [PATCH] arch/microblaze: Added sync method support for DMA and made refinements
@ 2011-09-02  9:31 Eli Billauer
  2011-09-02 12:32 ` Michal Simek
  2011-09-02 15:15 ` [PATCH] Added sync method support for DMA and made refinements (Take II) Eli Billauer
  0 siblings, 2 replies; 6+ messages in thread
From: Eli Billauer @ 2011-09-02  9:31 UTC (permalink / raw)
  To: LKML; +Cc: michal.simek

Hi,


This was supposed to be a small patch, but as I dug in, I found that
several things needed fixing. The sync support was tested with a
hardware driver that calls these methods extensively. The scatter-gather
methods remain completely untested. General DMA operations work;
otherwise, I suppose, the system would crash.


Here's a summary of the changes:


* Added support for sync_single_for_* (a short usage sketch follows this list)

* Added support for sync_sg_for_*

* Removed the previous (static) function __dma_sync_page() in favor of
__dma_sync(), so a single function can be used for all needs. This
(inline) function was moved to dma-mapping.h, since it's used by
dma_cache_sync() there.

* The cache is now flushed in all DMA mapping functions, regardless of
direction. It's obvious for DMA_TO_DEVICE, but not so obvious for
DMA_FROM_DEVICE. The reason in the latter case is that an unflushed
cache line will be written back to memory sooner or later, possibly
overwriting DMAed data.

* Cache is never invalidated on dma mapping. It's pointless, since the 
CPU isn't supposed to touch the memory segment anyhow until unmap or sync.

* dma_cache_sync() now really syncs the cache (it was effectively a NOP before)
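
As an aside, here is roughly the streaming-DMA pattern that the new sync
methods are meant to serve. This is only an illustrative sketch: the
function and variable names are made up, and error handling is minimal.

#include <linux/dma-mapping.h>
#include <linux/errno.h>

/* "dev", "buf" and "len" are assumed to be supplied by the driver */
static int example_rx(struct device *dev, void *buf, size_t len)
{
	dma_addr_t handle;

	handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, handle))
		return -ENOMEM;

	/* ... tell the device to fill the buffer, wait for completion ... */

	/*
	 * Hand the buffer back to the CPU; with this patch, on Microblaze
	 * this invalidates the cached range.
	 */
	dma_sync_single_for_cpu(dev, handle, len, DMA_FROM_DEVICE);

	/* The CPU may inspect buf here */

	/* Hand the buffer back to the device for another transfer */
	dma_sync_single_for_device(dev, handle, len, DMA_FROM_DEVICE);

	/* ... second transfer, sync for CPU again, and so on ... */

	dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
	return 0;
}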


Patch follows.


    Eli



Signed-off-by: Eli Billauer <eli.billauer@gmail.com>
---
 arch/microblaze/include/asm/dma-mapping.h |   20 +++++-
 arch/microblaze/kernel/dma.c              |  109 ++++++++++++++++++++++-------
 2 files changed, 101 insertions(+), 28 deletions(-)

diff --git a/arch/microblaze/include/asm/dma-mapping.h b/arch/microblaze/include/asm/dma-mapping.h
index 8fbb0ec..cddeca5 100644
--- a/arch/microblaze/include/asm/dma-mapping.h
+++ b/arch/microblaze/include/asm/dma-mapping.h
@@ -28,12 +28,12 @@
 #include <linux/dma-attrs.h>
 #include <asm/io.h>
 #include <asm-generic/dma-coherent.h>
+#include <asm/cacheflush.h>
 
 #define DMA_ERROR_CODE        (~(dma_addr_t)0x0)
 
 #define __dma_alloc_coherent(dev, gfp, size, handle)    NULL
 #define __dma_free_coherent(size, addr)        ((void)0)
-#define __dma_sync(addr, size, rw)        ((void)0)
 
 static inline unsigned long device_to_mask(struct device *dev)
 {
@@ -95,6 +95,22 @@ static inline int dma_set_mask(struct device *dev, u64 dma_mask)
 
 #include <asm-generic/dma-mapping-common.h>
 
+static inline void __dma_sync(unsigned long paddr,
+                  size_t size, enum dma_data_direction direction)
+{
+    switch (direction) {
+    case DMA_TO_DEVICE:
+    case DMA_BIDIRECTIONAL:
+        flush_dcache_range(paddr, paddr + size);
+        break;
+    case DMA_FROM_DEVICE:
+        invalidate_dcache_range(paddr, paddr + size);
+        break;
+    default:
+        BUG();
+    }
+}
+
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
     struct dma_map_ops *ops = get_dma_ops(dev);
@@ -135,7 +151,7 @@ static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
         enum dma_data_direction direction)
 {
     BUG_ON(direction == DMA_NONE);
-    __dma_sync(vaddr, size, (int)direction);
+    __dma_sync(virt_to_phys(vaddr), size, (int)direction);
 }
 
 #endif    /* _ASM_MICROBLAZE_DMA_MAPPING_H */
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index 393e6b2..9fa9d82 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -11,7 +11,6 @@
 #include <linux/gfp.h>
 #include <linux/dma-debug.h>
 #include <asm/bug.h>
-#include <asm/cacheflush.h>
 
 /*
  * Generic direct DMA implementation
@@ -21,21 +20,6 @@
  * can set archdata.dma_data to an unsigned long holding the offset. By
  * default the offset is PCI_DRAM_OFFSET.
  */
-static inline void __dma_sync_page(unsigned long paddr, unsigned long offset,
-                size_t size, enum dma_data_direction direction)
-{
-    switch (direction) {
-    case DMA_TO_DEVICE:
-    case DMA_BIDIRECTIONAL:
-        flush_dcache_range(paddr + offset, paddr + offset + size);
-        break;
-    case DMA_FROM_DEVICE:
-        invalidate_dcache_range(paddr + offset, paddr + offset + size);
-        break;
-    default:
-        BUG();
-    }
-}
 
 static unsigned long get_dma_direct_offset(struct device *dev)
 {
@@ -91,17 +75,24 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
     /* FIXME this part of code is untested */
     for_each_sg(sgl, sg, nents, i) {
         sg->dma_address = sg_phys(sg) + get_dma_direct_offset(dev);
-        __dma_sync_page(page_to_phys(sg_page(sg)), sg->offset,
-                            sg->length, direction);
+        __dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
+               sg->length, DMA_TO_DEVICE);
     }
 
     return nents;
 }
 
-static void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sg,
+static void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
                 int nents, enum dma_data_direction direction,
                 struct dma_attrs *attrs)
 {
+    struct scatterlist *sg;
+    int i;
+
+    /* FIXME this part of code is untested */
+    if (direction == DMA_FROM_DEVICE)
+        for_each_sg(sgl, sg, nents, i)
+            __dma_sync(sg->dma_address, sg->length, direction);
 }
 
 static int dma_direct_dma_supported(struct device *dev, u64 mask)
@@ -116,7 +107,15 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
                          enum dma_data_direction direction,
                          struct dma_attrs *attrs)
 {
-    __dma_sync_page(page_to_phys(page), offset, size, direction);
+    /* We're before the DMA transfer, so cache invalidation makes no
+       sense in the case of DMA_FROM_DEVICE. Flushing is necessary
+       in either case, or an unflushed cache line may overwrite
+       data written by device, in the event of that line being allocated
+       for other use. Calling __dma_sync with DMA_TO_DEVICE makes this
+       flush.
+    */
+
+    __dma_sync(page_to_phys(page) + offset, size, DMA_TO_DEVICE);
     return page_to_phys(page) + offset + get_dma_direct_offset(dev);
 }
 
@@ -126,12 +125,66 @@ static inline void dma_direct_unmap_page(struct device *dev,
                      enum dma_data_direction direction,
                      struct dma_attrs *attrs)
 {
-/* There is not necessary to do cache cleanup
- *
- * phys_to_virt is here because in __dma_sync_page is __virt_to_phys and
- * dma_address is physical address
- */
-    __dma_sync_page(dma_address, 0 , size, direction);
+/* On a DMA to the device, the data has already been flushed and read by
+   the device at the point unmapping is done. No point doing anything.
+   In the other direction, unmapping may be used just before accessing the
+   data on the CPU, so cache invalidation is necessary.
+*/
+
+    if (direction == DMA_FROM_DEVICE)
+        __dma_sync(dma_address, size, direction);
+}
+
+static inline void
+dma_direct_sync_single_for_cpu(struct device *dev,
+                   dma_addr_t dma_handle, size_t size,
+                   enum dma_data_direction direction)
+{
+    /* It's pointless to invalidate the cache if the device isn't supposed
+       to write to the relevant region */
+
+    if (direction == DMA_FROM_DEVICE)
+        __dma_sync(dma_handle, size, direction);
+}
+
+static inline void
+dma_direct_sync_single_for_device(struct device *dev,
+                  dma_addr_t dma_handle, size_t size,
+                  enum dma_data_direction direction)
+{
+    /* It's pointless to flush the cache if the CPU isn't supposed
+       to write to the relevant region */
+
+    if (direction == DMA_TO_DEVICE)
+        __dma_sync(dma_handle, size, direction);
+}
+
+static inline void
+dma_direct_sync_sg_for_cpu(struct device *dev,
+               struct scatterlist *sgl, int nents,
+               enum dma_data_direction direction)
+{
+    struct scatterlist *sg;
+    int i;
+
+    /* FIXME this part of code is untested */
+    if (direction == DMA_FROM_DEVICE)
+        for_each_sg(sgl, sg, nents, i)
+            __dma_sync(sg->dma_address, sg->length, direction);
+}
+
+static inline void
+dma_direct_sync_sg_for_device(struct device *dev,
+                  struct scatterlist *sgl, int nents,
+                  enum dma_data_direction direction)
+{
+    struct scatterlist *sg;
+    int i;
+
+    /* FIXME this part of code is untested */
+    if (direction == DMA_TO_DEVICE)
+        for_each_sg(sgl, sg, nents, i)
+            __dma_sync(sg->dma_address, sg->length, direction);
 }
 
 struct dma_map_ops dma_direct_ops = {
@@ -142,6 +195,10 @@ struct dma_map_ops dma_direct_ops = {
     .dma_supported    = dma_direct_dma_supported,
     .map_page    = dma_direct_map_page,
     .unmap_page    = dma_direct_unmap_page,
+    .sync_single_for_cpu        = dma_direct_sync_single_for_cpu,
+    .sync_single_for_device        = dma_direct_sync_single_for_device,
+    .sync_sg_for_cpu        = dma_direct_sync_sg_for_cpu,
+    .sync_sg_for_device        = dma_direct_sync_sg_for_device,
 };
 EXPORT_SYMBOL(dma_direct_ops);
 
-- 
1.7.2.2



-- 
Web: http://www.billauer.co.il



* Re: [PATCH] arch/microblaze: Added sync method support for DMA and made refinements
  2011-09-02  9:31 [PATCH] arch/microblaze: Added sync method support for DMA and made refinements Eli Billauer
@ 2011-09-02 12:32 ` Michal Simek
  2011-09-02 15:12   ` Eli Billauer
  2011-09-02 15:15 ` [PATCH] Added sync method support for DMA and made refinements (Take II) Eli Billauer
  1 sibling, 1 reply; 6+ messages in thread
From: Michal Simek @ 2011-09-02 12:32 UTC (permalink / raw)
  To: Eli Billauer; +Cc: LKML

Eli Billauer wrote:
> Hi,
> 
> 
> This was supposed to be a small patch, but as I dug in, I found that 
> several things needed fixing. The sync support was tested by using a 
> hardware driver which calls these methods extensively. Scatter-gather 
> methods remain completely untested. General DMA operations work, or the 
> system would crash, I suppose.
> 
> 
> Here's a summary of the changes:
> 
> 
> * Added support for sync_single_for_*
> 
> * Added support for sync_sg_for_*
> 
> * Removed the previous (static) function __dma_sync_page() in favor of 
> __dma_sync(), so a single function can be used for all needs. This 
> (inline)  function was moved to dma_mapping.h, since it's used by 
> dma_cache_sync() there.
> 
> * Cache is now flushed on all dma mapping functions, regardless of 
> direction. It's obvious for DMA_TO_DEVICE, but not so obvious regarding 
> DMA_FROM_DEVICE. The reason in the latter case is that an unflushed 
> cache line will be written back to memory sooner or later, possibly 
> overwriting DMAed data.
> 
> * Cache is never invalidated on dma mapping. It's pointless, since the 
> CPU isn't supposed to touch the memory segment anyhow until unmap or sync.
> 
> * dma_cache_sync() now really syncs the cache (is was effectively a NOP)
> 
> 
> Patch follows.
> 
> 
>    Eli
> 
> 
> 
> Signed-off-by: Eli Billauer <eli.billauer@gmail.com>
> ---
> arch/microblaze/include/asm/dma-mapping.h |   20 +++++-
> arch/microblaze/kernel/dma.c              |  109 
> ++++++++++++++++++++++-------
> 2 files changed, 101 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/microblaze/include/asm/dma-mapping.h 
> b/arch/microblaze/include/asm/dma-mapping.h
> index 8fbb0ec..cddeca5 100644
> --- a/arch/microblaze/include/asm/dma-mapping.h
> +++ b/arch/microblaze/include/asm/dma-mapping.h
> @@ -28,12 +28,12 @@
> #include <linux/dma-attrs.h>
> #include <asm/io.h>
> #include <asm-generic/dma-coherent.h>
> +#include <asm/cacheflush.h>
> 
> #define DMA_ERROR_CODE        (~(dma_addr_t)0x0)
> 
> #define __dma_alloc_coherent(dev, gfp, size, handle)    NULL
> #define __dma_free_coherent(size, addr)        ((void)0)
> -#define __dma_sync(addr, size, rw)        ((void)0)
> 
> static inline unsigned long device_to_mask(struct device *dev)
> {
> @@ -95,6 +95,22 @@ static inline int dma_set_mask(struct device *dev, 
> u64 dma_mask)
> 
> #include <asm-generic/dma-mapping-common.h>
> 
> +static inline void __dma_sync(unsigned long paddr,
> +                  size_t size, enum dma_data_direction direction)
> +{
> +    switch (direction) {
> +    case DMA_TO_DEVICE:
> +    case DMA_BIDIRECTIONAL:
> +        flush_dcache_range(paddr, paddr + size);
> +        break;
> +    case DMA_FROM_DEVICE:
> +        invalidate_dcache_range(paddr, paddr + size);
> +        break;
> +    default:
> +        BUG();
> +    }
> +}
> +
> static inline int dma_mapping_error(struct device *dev, dma_addr_t 
> dma_addr)
> {
>     struct dma_map_ops *ops = get_dma_ops(dev);
> @@ -135,7 +151,7 @@ static inline void dma_cache_sync(struct device 
> *dev, void *vaddr, size_t size,
>         enum dma_data_direction direction)
> {
>     BUG_ON(direction == DMA_NONE);
> -    __dma_sync(vaddr, size, (int)direction);
> +    __dma_sync(virt_to_phys(vaddr), size, (int)direction);
> }
> 
> #endif    /* _ASM_MICROBLAZE_DMA_MAPPING_H */
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index 393e6b2..9fa9d82 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -11,7 +11,6 @@
> #include <linux/gfp.h>
> #include <linux/dma-debug.h>
> #include <asm/bug.h>
> -#include <asm/cacheflush.h>
> 
> /*
>  * Generic direct DMA implementation
> @@ -21,21 +20,6 @@
>  * can set archdata.dma_data to an unsigned long holding the offset. By
>  * default the offset is PCI_DRAM_OFFSET.
>  */
> -static inline void __dma_sync_page(unsigned long paddr, unsigned long 
> offset,
> -                size_t size, enum dma_data_direction direction)
> -{
> -    switch (direction) {
> -    case DMA_TO_DEVICE:
> -    case DMA_BIDIRECTIONAL:
> -        flush_dcache_range(paddr + offset, paddr + offset + size);
> -        break;
> -    case DMA_FROM_DEVICE:
> -        invalidate_dcache_range(paddr + offset, paddr + offset + size);
> -        break;
> -    default:
> -        BUG();
> -    }
> -}
> 
> static unsigned long get_dma_direct_offset(struct device *dev)
> {
> @@ -91,17 +75,24 @@ static int dma_direct_map_sg(struct device *dev, 
> struct scatterlist *sgl,
>     /* FIXME this part of code is untested */
>     for_each_sg(sgl, sg, nents, i) {
>         sg->dma_address = sg_phys(sg) + get_dma_direct_offset(dev);
> -        __dma_sync_page(page_to_phys(sg_page(sg)), sg->offset,
> -                            sg->length, direction);
> +        __dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
> +               sg->length, DMA_TO_DEVICE);
>     }
> 
>     return nents;
> }
> 
> -static void dma_direct_unmap_sg(struct device *dev, struct scatterlist 
> *sg,
> +static void dma_direct_unmap_sg(struct device *dev, struct scatterlist 
> *sgl,
>                 int nents, enum dma_data_direction direction,
>                 struct dma_attrs *attrs)
> {
> +    struct scatterlist *sg;
> +    int i;
> +
> +    /* FIXME this part of code is untested */
> +    if (direction == DMA_FROM_DEVICE)
> +        for_each_sg(sgl, sg, nents, i)
> +            __dma_sync(sg->dma_address, sg->length, direction);
> }
> 
> static int dma_direct_dma_supported(struct device *dev, u64 mask)
> @@ -116,7 +107,15 @@ static inline dma_addr_t dma_direct_map_page(struct 
> device *dev,
>                          enum dma_data_direction direction,
>                          struct dma_attrs *attrs)
> {
> -    __dma_sync_page(page_to_phys(page), offset, size, direction);
> +    /* We're before the DMA transfer, so cache invalidation makes no
> +       sense in the case of DMA_FROM_DEVICE. Flushing is necessary
> +       in either case, or an unflushed cache line may overwrite
> +       data written by device, in the event of that line being allocated
> +       for other use. Calling __dma_sync with DMA_TO_DEVICE makes this
> +       flush.
> +    */

Please fix this comment.

DMA_TO_DEVICE is fine - data has to be flushed.

For DMA_FROM_DEVICE you expect that the allocated space will contain data from the device.
Which means that if you flush it, it will be rewritten by the DMA in the next step anyway.
Which means that IMHO you can invalidate it, which is faster than flushing.
Of course, if you flush it, there shouldn't be any problem.

The question is whether this case can happen:
offset + size is not cache-line aligned, and the code invalidates data which should have been flushed.
But I think this can't happen.

Have you seen any problem with invalidation?
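
(To illustrate the alignment concern, here is a made-up example; the
32-byte cache line size and the struct are just assumptions:)

struct rx_area {
	int status;		/* CPU-owned field, possibly dirty in the cache */
	char rx_buf[100];	/* mapped with DMA_FROM_DEVICE */
};

/*
 * rx_buf neither starts nor ends on a cache-line boundary, so its first
 * line also covers "status". Invalidating that line throws away a
 * pending "status" update that should have been flushed; flushing it
 * instead preserves "status", but may write stale bytes over whatever
 * the device has already DMAed into the start of rx_buf. Either way the
 * shared line is trouble, which is why buffers handed to the streaming
 * DMA API are expected to be cache-line aligned in the first place.
 */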


> +
> +    __dma_sync(page_to_phys(page) + offset, size, DMA_TO_DEVICE);
>     return page_to_phys(page) + offset + get_dma_direct_offset(dev);
> }
> 
> @@ -126,12 +125,66 @@ static inline void dma_direct_unmap_page(struct 
> device *dev,
>                      enum dma_data_direction direction,
>                      struct dma_attrs *attrs)
> {
> -/* There is not necessary to do cache cleanup
> - *
> - * phys_to_virt is here because in __dma_sync_page is __virt_to_phys and
> - * dma_address is physical address
> - */
> -    __dma_sync_page(dma_address, 0 , size, direction);
> +/* On a DMA to the device, the data has already been flushed and read by
> +   the device at the point unmapping is done. No point doing anything.
> +   In the other direction, unmapping may be used just before accessing the
> +   data on the CPU, so cache invalidation is necessary.
> +*/

Fix this comment to Linux comment style.

> +
> +    if (direction == DMA_FROM_DEVICE)
> +        __dma_sync(dma_address, size, direction);
> +}
> +
> +static inline void
> +dma_direct_sync_single_for_cpu(struct device *dev,
> +                   dma_addr_t dma_handle, size_t size,
> +                   enum dma_data_direction direction)
> +{
> +    /* It's pointless to invalidate the cache if the device isn't supposed
> +       to write to the relevant region */

ditto

> +
> +    if (direction == DMA_FROM_DEVICE)
> +        __dma_sync(dma_handle, size, direction);
> +}
> +
> +static inline void
> +dma_direct_sync_single_for_device(struct device *dev,
> +                  dma_addr_t dma_handle, size_t size,
> +                  enum dma_data_direction direction)
> +{
> +    /* It's pointless to flush the cache if the CPU isn't supposed
> +       to write to the relevant region */

ditto

> +
> +    if (direction == DMA_TO_DEVICE)
> +        __dma_sync(dma_handle, size, direction);
> +}
> +
> +static inline void
> +dma_direct_sync_sg_for_cpu(struct device *dev,
> +               struct scatterlist *sgl, int nents,
> +               enum dma_data_direction direction)
> +{
> +    struct scatterlist *sg;
> +    int i;
> +
> +    /* FIXME this part of code is untested */
> +    if (direction == DMA_FROM_DEVICE)
> +        for_each_sg(sgl, sg, nents, i)
> +            __dma_sync(sg->dma_address, sg->length, direction);
> +}
> +
> +static inline void
> +dma_direct_sync_sg_for_device(struct device *dev,
> +                  struct scatterlist *sgl, int nents,
> +                  enum dma_data_direction direction)
> +{
> +    struct scatterlist *sg;
> +    int i;
> +
> +    /* FIXME this part of code is untested */
> +    if (direction == DMA_TO_DEVICE)
> +        for_each_sg(sgl, sg, nents, i)
> +            __dma_sync(sg->dma_address, sg->length, direction);
> }
> 
> struct dma_map_ops dma_direct_ops = {
> @@ -142,6 +195,10 @@ struct dma_map_ops dma_direct_ops = {
>     .dma_supported    = dma_direct_dma_supported,
>     .map_page    = dma_direct_map_page,
>     .unmap_page    = dma_direct_unmap_page,
> +    .sync_single_for_cpu        = dma_direct_sync_single_for_cpu,
> +    .sync_single_for_device        = dma_direct_sync_single_for_device,
> +    .sync_sg_for_cpu        = dma_direct_sync_sg_for_cpu,
> +    .sync_sg_for_device        = dma_direct_sync_sg_for_device,
> };
> EXPORT_SYMBOL(dma_direct_ops);
> 

Thanks,
Michal


-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian


* Re: [PATCH] arch/microblaze: Added sync method support for DMA  and made refinements
  2011-09-02 12:32 ` Michal Simek
@ 2011-09-02 15:12   ` Eli Billauer
  2011-09-09  9:16     ` Michal Simek
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Billauer @ 2011-09-02 15:12 UTC (permalink / raw)
  To: LKML; +Cc: monstr

Michal Simek wrote:

>
> DMA_TO_DEVICE is fine - data has to be flushed.
>
> For DMA_FROM_DEVICE you expect that allocated space will contains data 
> from device.
> Which means if you flush them, they will be rewritten by DMA in the 
> next step.
> Which means that IMHO you can invalidate them which is faster than 
> flushing.
The flushing is only necessary when a certain memory region is written
to, not flushed, and then allocated for DMA from the device. This is not
what drivers usually do, but it's nevertheless legal to do so. Maybe it
can also happen as a result of memory being freed but not flushed, and
then allocated as a DMA buffer. So this flushing prevents, at most, an
extremely rare problem. I don't expect to see something go wrong right
away in the absence of this flush.
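
A rough sketch of the (rare) scenario I'm referring to, with made-up
names and sizes:

#include <linux/dma-mapping.h>
#include <linux/slab.h>

static void example(struct device *dev)
{
	dma_addr_t handle;
	char *buf = kmalloc(4096, GFP_KERNEL);	/* error check omitted */

	buf[0] = 0x42;	/* dirties a cache line, never flushed by the CPU */

	handle = dma_map_single(dev, buf, 4096, DMA_FROM_DEVICE);

	/* ... the device DMAs fresh data into buf ... */

	/*
	 * If the dirty line is evicted while the transfer is in flight, its
	 * stale contents are written back on top of what the device just
	 * delivered. Flushing at map time leaves the line clean, so a later
	 * eviction has nothing to write back.
	 */
}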

In light of this, I can't see why flushing would be slower than 
invalidation, if the cache lines aren't expected to be dirty except for 
very rare conditions.

As for invalidation of a dirty cache line: looking at the Microblaze
reference manual's description of the wdc instruction, I'm under the
impression that each cache line has two flags, Valid and Dirty, as they
appear in the pseudocode. I'm not clear about what happens if a dirty
cache line is invalidated. It would make sense to clear both flags. It
also makes sense to write the data back to RAM in this case. But I
really don't know what's actually implemented.
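
To spell out the two possibilities in a toy model (my own sketch, not
taken from the manual):

struct cache_line {
	int valid;
	int dirty;
	unsigned long tag;
	unsigned char data[32];		/* assuming a 32-byte line */
};

static void invalidate_line(struct cache_line *line, unsigned char *ram)
{
	if (!line->valid)
		return;
	/*
	 * Option (a): silently drop the data, dirty or not.
	 * Option (b): write a dirty line back first, e.g.
	 *	if (line->dirty)
	 *		memcpy(ram + line->tag, line->data, sizeof(line->data));
	 */
	line->valid = 0;
	line->dirty = 0;
}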

Patch with the corrections soon to be posted.

   Eli

-- 
Web: http://www.billauer.co.il



* [PATCH] Added sync method support for DMA and made refinements (Take II)
  2011-09-02  9:31 [PATCH] arch/microblaze: Added sync method support for DMA and made refinements Eli Billauer
  2011-09-02 12:32 ` Michal Simek
@ 2011-09-02 15:15 ` Eli Billauer
  2011-09-09  9:19   ` Michal Simek
  1 sibling, 1 reply; 6+ messages in thread
From: Eli Billauer @ 2011-09-02 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: monstr, Eli Billauer


Signed-off-by: Eli Billauer <eli.billauer@gmail.com>
---
 arch/microblaze/include/asm/dma-mapping.h |   20 +++++-
 arch/microblaze/kernel/dma.c              |  114 ++++++++++++++++++++++------
 2 files changed, 107 insertions(+), 27 deletions(-)

diff --git a/arch/microblaze/include/asm/dma-mapping.h b/arch/microblaze/include/asm/dma-mapping.h
index 8fbb0ec..cddeca5 100644
--- a/arch/microblaze/include/asm/dma-mapping.h
+++ b/arch/microblaze/include/asm/dma-mapping.h
@@ -28,12 +28,12 @@
 #include <linux/dma-attrs.h>
 #include <asm/io.h>
 #include <asm-generic/dma-coherent.h>
+#include <asm/cacheflush.h>
 
 #define DMA_ERROR_CODE		(~(dma_addr_t)0x0)
 
 #define __dma_alloc_coherent(dev, gfp, size, handle)	NULL
 #define __dma_free_coherent(size, addr)		((void)0)
-#define __dma_sync(addr, size, rw)		((void)0)
 
 static inline unsigned long device_to_mask(struct device *dev)
 {
@@ -95,6 +95,22 @@ static inline int dma_set_mask(struct device *dev, u64 dma_mask)
 
 #include <asm-generic/dma-mapping-common.h>
 
+static inline void __dma_sync(unsigned long paddr,
+			      size_t size, enum dma_data_direction direction)
+{
+	switch (direction) {
+	case DMA_TO_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		flush_dcache_range(paddr, paddr + size);
+		break;
+	case DMA_FROM_DEVICE:
+		invalidate_dcache_range(paddr, paddr + size);
+		break;
+	default:
+		BUG();
+	}
+}
+
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
 	struct dma_map_ops *ops = get_dma_ops(dev);
@@ -135,7 +151,7 @@ static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 		enum dma_data_direction direction)
 {
 	BUG_ON(direction == DMA_NONE);
-	__dma_sync(vaddr, size, (int)direction);
+	__dma_sync(virt_to_phys(vaddr), size, (int)direction);
 }
 
 #endif	/* _ASM_MICROBLAZE_DMA_MAPPING_H */
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index 393e6b2..78d8289 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -11,7 +11,6 @@
 #include <linux/gfp.h>
 #include <linux/dma-debug.h>
 #include <asm/bug.h>
-#include <asm/cacheflush.h>
 
 /*
  * Generic direct DMA implementation
@@ -21,21 +20,6 @@
  * can set archdata.dma_data to an unsigned long holding the offset. By
  * default the offset is PCI_DRAM_OFFSET.
  */
-static inline void __dma_sync_page(unsigned long paddr, unsigned long offset,
-				size_t size, enum dma_data_direction direction)
-{
-	switch (direction) {
-	case DMA_TO_DEVICE:
-	case DMA_BIDIRECTIONAL:
-		flush_dcache_range(paddr + offset, paddr + offset + size);
-		break;
-	case DMA_FROM_DEVICE:
-		invalidate_dcache_range(paddr + offset, paddr + offset + size);
-		break;
-	default:
-		BUG();
-	}
-}
 
 static unsigned long get_dma_direct_offset(struct device *dev)
 {
@@ -91,17 +75,24 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
 	/* FIXME this part of code is untested */
 	for_each_sg(sgl, sg, nents, i) {
 		sg->dma_address = sg_phys(sg) + get_dma_direct_offset(dev);
-		__dma_sync_page(page_to_phys(sg_page(sg)), sg->offset,
-							sg->length, direction);
+		__dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
+			   sg->length, DMA_TO_DEVICE);
 	}
 
 	return nents;
 }
 
-static void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sg,
+static void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 				int nents, enum dma_data_direction direction,
 				struct dma_attrs *attrs)
 {
+	struct scatterlist *sg;
+	int i;
+	
+	/* FIXME this part of code is untested */
+	if (direction == DMA_FROM_DEVICE)
+		for_each_sg(sgl, sg, nents, i) 
+			__dma_sync(sg->dma_address, sg->length, direction);	
 }
 
 static int dma_direct_dma_supported(struct device *dev, u64 mask)
@@ -116,7 +107,16 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
 					     enum dma_data_direction direction,
 					     struct dma_attrs *attrs)
 {
-	__dma_sync_page(page_to_phys(page), offset, size, direction);
+	/*
+	 * We're before the DMA transfer, so cache invalidation makes no
+	 * sense in the case of DMA_FROM_DEVICE. Flushing is necessary
+	 * in either case, or an unflushed cache line may overwrite
+	 * data written by device, in the event of that line being allocated
+	 * for other use. Calling __dma_sync with DMA_TO_DEVICE makes this
+	 * flush.
+	 */
+
+	__dma_sync(page_to_phys(page) + offset, size, DMA_TO_DEVICE);
 	return page_to_phys(page) + offset + get_dma_direct_offset(dev);
 }
 
@@ -126,12 +126,72 @@ static inline void dma_direct_unmap_page(struct device *dev,
 					 enum dma_data_direction direction,
 					 struct dma_attrs *attrs)
 {
-/* There is not necessary to do cache cleanup
- *
- * phys_to_virt is here because in __dma_sync_page is __virt_to_phys and
- * dma_address is physical address
+
+/*
+ * On a DMA to the device, the data has already been flushed and read by
+ * the device at the point unmapping is done. No point doing anything.
+ * In the other direction, unmapping may be used just before accessing the
+ *  data on the CPU, so cache invalidation is necessary.
  */
-	__dma_sync_page(dma_address, 0 , size, direction);
+
+	if (direction == DMA_FROM_DEVICE)
+		__dma_sync(dma_address, size, direction);
+}
+
+static inline void 
+dma_direct_sync_single_for_cpu(struct device *dev,
+			       dma_addr_t dma_handle, size_t size,
+			       enum dma_data_direction direction)
+{
+	/*
+	 *  It's pointless to invalidate the cache if the device isn't
+	 *  supposed to write to the relevant region
+	 */
+
+	if (direction == DMA_FROM_DEVICE)
+		__dma_sync(dma_handle, size, direction);
+}
+
+static inline void 
+dma_direct_sync_single_for_device(struct device *dev,
+				  dma_addr_t dma_handle, size_t size,
+				  enum dma_data_direction direction)
+{
+	/*
+	 *  It's pointless to flush the cache if the CPU isn't
+	 *  supposed to write to the relevant region
+	 */
+	
+	if (direction == DMA_TO_DEVICE)
+		__dma_sync(dma_handle, size, direction);
+}
+
+static inline void
+dma_direct_sync_sg_for_cpu(struct device *dev,
+			   struct scatterlist *sgl, int nents,
+			   enum dma_data_direction direction)
+{
+	struct scatterlist *sg;
+	int i;
+	
+	/* FIXME this part of code is untested */
+	if (direction == DMA_FROM_DEVICE)
+		for_each_sg(sgl, sg, nents, i)
+			__dma_sync(sg->dma_address, sg->length, direction);
+}
+
+static inline void
+dma_direct_sync_sg_for_device(struct device *dev,
+			      struct scatterlist *sgl, int nents,
+			      enum dma_data_direction direction)
+{
+	struct scatterlist *sg;
+	int i;
+	
+	/* FIXME this part of code is untested */
+	if (direction == DMA_TO_DEVICE)
+		for_each_sg(sgl, sg, nents, i)
+			__dma_sync(sg->dma_address, sg->length, direction);
 }
 
 struct dma_map_ops dma_direct_ops = {
@@ -142,6 +202,10 @@ struct dma_map_ops dma_direct_ops = {
 	.dma_supported	= dma_direct_dma_supported,
 	.map_page	= dma_direct_map_page,
 	.unmap_page	= dma_direct_unmap_page,
+	.sync_single_for_cpu 		= dma_direct_sync_single_for_cpu,
+	.sync_single_for_device 	= dma_direct_sync_single_for_device,
+	.sync_sg_for_cpu 		= dma_direct_sync_sg_for_cpu,
+	.sync_sg_for_device 		= dma_direct_sync_sg_for_device,
 };
 EXPORT_SYMBOL(dma_direct_ops);
 
-- 
1.7.2.2



* Re: [PATCH] arch/microblaze: Added sync method support for DMA  and made refinements
  2011-09-02 15:12   ` Eli Billauer
@ 2011-09-09  9:16     ` Michal Simek
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Simek @ 2011-09-09  9:16 UTC (permalink / raw)
  To: Eli Billauer; +Cc: LKML, monstr

Eli Billauer wrote:
> Michal Simek wrote:
> 
>>
>> DMA_TO_DEVICE is fine - data has to be flushed.
>>
>> For DMA_FROM_DEVICE you expect that allocated space will contains data 
>> from device.
>> Which means if you flush them, they will be rewritten by DMA in the 
>> next step.
>> Which means that IMHO you can invalidate them which is faster than 
>> flushing.
> The flushing is only necessary when a certain memory region is written 
> to, not flushed, and then allocated for DMA from device. This is not 
> what drivers usually do, but it's nevertheless legal to do so. Maybe it 
> can also happen as a result of memory being freed but not flushed, and 
> then allocated as a DMA buffer. So this flushing prevents, at most, an 
> extremely rare problem. I don't expect to see something go wrong right 
> away in the lack of this flush.

I prefer to fix real problems. If you can't see any real problem,
please do not fix a theoretical one. I am open to accepting your comment
that it is safer to use flush. Look at my second email and add it to the 3rd patch.

> 
> In light of this, I can't see why flushing would be slower than 
> invalidation, if the cache lines aren't expected to be dirty except for 
> very rare conditions.

If they are not dirty, flushing/invalidation time should be the same.
But if they are, invalidation is faster.


> 
> As for invalidation of a dirty cache line: Looking at the Microblaze 
> reference manual's description of the wdc instruction, I'm under the 
> impression that each cache line has two flags: Valid and Dirty, as they 
> appear in the pseudocode. I'm not on the clear about what happens if a 
> dirty cache line is invalidated. It would make sense to clear both 
> flags. It also makes sense to write the data back to the RAM in this 
> case. But I really don't know what's actually implemented.

AFAIK if a dirty cache line is invalidated, the data is just dropped from the cache
and is not written back; if you flush instead, the CPU walks the cache lines and
writes the data back to memory. There could be a lot of stall cycles.

Thanks,
Michal




-- 
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663


* Re: [PATCH] Added sync method support for DMA and made refinements (Take II)
  2011-09-02 15:15 ` [PATCH] Added sync method support for DMA and made refinements (Take II) Eli Billauer
@ 2011-09-09  9:19   ` Michal Simek
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Simek @ 2011-09-09  9:19 UTC (permalink / raw)
  To: Eli Billauer; +Cc: linux-kernel

Eli Billauer wrote:
> Signed-off-by: Eli Billauer <eli.billauer@gmail.com>

Missing description. Also, add a "microblaze: " prefix to the subject.

I have looked at the patch again and I would prefer
to separate it into 3 patches.

1. Move __dma_sync to dma-mapping.h + fix parameters.
2. Introduce the new sync operations.
3. The rest of the changes in the dma code.

I have no problem applying patches 1 and 2.
If you show me the bug you see with patch 3, I will test it and keep it in my repo for a while.
As I wrote in my previous post, I am open to adding your comment to the dma code
that flushing is safer to use but slower if cache lines are dirty, but I don't like
to change the code without a real problem.
In any case, if you show me, with a program/driver/etc., that the problem is in invalidation,
I will apply it with no problem.

Thanks for your patches,
Michal

-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian
