[PATCH/RFC 0/2] ARM: DMA-mapping: new extensions for buffer sharing (part 2)

* [PATCH/RFC 0/2] ARM: DMA-mapping: new extensions for buffer sharing (part 2)
From: Marek Szyprowski @ 2012-06-06 13:17 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	linux-kernel
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann,
	Russell King - ARM Linux, Chunsang Jeong, Krishna Reddy,
	Benjamin Herrenschmidt, Konrad Rzeszutek Wilk, Hiroshi Doyu,
	Subash Patel, Sumit Semwal, Abhinav Kochhar, Tomasz Stanislawski

Hello,

This is a continuation of the dma-mapping extensions posted in the
following thread:
http://thread.gmane.org/gmane.linux.kernel.mm/78644

We noticed that advanced buffer sharing use cases usually require
creating a DMA mapping of the same memory buffer for more than one
device. Such a buffer is also usually never touched by the CPU; the data
are processed only by the devices.

From the DMA-mapping perspective, this requires calling one of the
dma_map_{page,single,sg} functions for the given memory buffer several
times, once for each device. Each dma_map_* call performs CPU cache
synchronization, which might be a time-consuming operation, especially
when the buffers are large. We would like to avoid such useless and
time-consuming operations, which is the main reason for introducing
another attribute for the DMA-mapping subsystem: DMA_ATTR_SKIP_CPU_SYNC,
which lets the dma-mapping core skip CPU cache synchronization in
certain cases.
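
To make the cost argument concrete, here is a minimal userspace sketch
that only counts cache synchronization passes. It is a model, not kernel
code: struct share_buffer, share_map() and share_all() are hypothetical
names invented for this illustration, and a "sync" is just a counter
increment standing in for a flush or invalidate over the whole buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel APIs. */
struct share_buffer {
	size_t size;            /* buffer size in bytes */
	unsigned int cpu_syncs; /* cache maintenance passes so far */
};

/* Stand-in for one dma_map_* call on a non-coherent platform. */
static void share_map(struct share_buffer *buf, bool skip_cpu_sync)
{
	assert(buf != NULL);
	if (!skip_cpu_sync)
		buf->cpu_syncs++; /* models a flush/invalidate of buf->size bytes */
}

/*
 * Map one buffer for ndev devices. When use_skip_attr is true, only the
 * first mapping performs the cache synchronization and the remaining
 * ones skip it. Returns the total number of synchronization passes.
 */
static unsigned int share_all(struct share_buffer *buf, unsigned int ndev,
			      bool use_skip_attr)
{
	unsigned int i;

	for (i = 0; i < ndev; i++)
		share_map(buf, use_skip_attr && i > 0);
	return buf->cpu_syncs;
}
```

In this model, sharing a buffer between three devices costs three
passes without the attribute and one pass with it; the saving grows
with both the number of devices and the buffer size.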

The proposed patches have been generated on top of the ARM DMA-mapping
redesign patch series on Linux v3.4-rc7. They are also available on the
following GIT branch:

git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git 3.4-rc7-arm-dma-v10-ext

with all required patches on top of the vanilla v3.4-rc7 kernel. I will
resend them rebased onto v3.5-rc1 soon.

Best regards
Marek Szyprowski
Samsung Poland R&D Center

Patch summary:

Marek Szyprowski (2):
  common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
  ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute

 Documentation/DMA-attributes.txt |   24 ++++++++++++++++++++++++
 arch/arm/mm/dma-mapping.c        |   20 +++++++++++---------
 include/linux/dma-attrs.h        |    1 +
 3 files changed, 36 insertions(+), 9 deletions(-)

-- 
1.7.1.569.g6f426

* [PATCH 1/2] common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
From: Marek Szyprowski @ 2012-06-06 13:17 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	linux-kernel
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann,
	Russell King - ARM Linux, Chunsang Jeong, Krishna Reddy,
	Benjamin Herrenschmidt, Konrad Rzeszutek Wilk, Hiroshi Doyu,
	Subash Patel, Sumit Semwal, Abhinav Kochhar, Tomasz Stanislawski

This patch adds the DMA_ATTR_SKIP_CPU_SYNC attribute to the DMA-mapping
subsystem.

By default, the dma_map_{single,page,sg} family of functions transfers a
given buffer from the CPU domain to the device domain. Some advanced use
cases might require sharing a buffer between more than one device. This
requires a mapping created separately for each device and is usually
done by calling a dma_map_{single,page,sg} function more than once for
the given buffer, with the device pointer of each device taking part in
the buffer sharing. The first call transfers the buffer from the 'CPU'
domain to the 'device' domain, which synchronizes the CPU caches for the
given region (usually this means that the cache is flushed or
invalidated, depending on the DMA direction). However, subsequent calls
to dma_map_{single,page,sg}() for the other devices will perform exactly
the same synchronization operation on the CPU cache. CPU cache
synchronization might be a time-consuming operation, especially if the
buffers are large, so it is highly recommended to avoid it if possible.
DMA_ATTR_SKIP_CPU_SYNC allows platform code to skip synchronization of
the CPU cache for the given buffer, assuming that it has already been
transferred to the 'device' domain. This attribute can also be used with
the dma_unmap_{single,page,sg} family of functions to force the buffer
to stay in the device domain after a mapping for it is released. Use
this attribute with care!
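
The domain transfers described above can be pictured as a small state
machine. The following is a userspace sketch under stated assumptions,
not the kernel implementation: enum dom_state, struct dom_buf, dom_map()
and dom_unmap() are hypothetical names, and the "domain" is reduced to a
single field that the skip flag leaves untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these types are illustrative, not kernel APIs. */
enum dom_state { DOM_CPU, DOM_DEVICE };

struct dom_buf {
	enum dom_state domain; /* which side currently owns the buffer */
	unsigned int mappings; /* number of live device mappings */
};

/* Models dma_map_*: moves the buffer to the device domain unless skipped. */
static void dom_map(struct dom_buf *b, bool skip_cpu_sync)
{
	assert(b != NULL);
	if (!skip_cpu_sync)
		b->domain = DOM_DEVICE;
	b->mappings++;
}

/* Models dma_unmap_*: returns the buffer to the CPU domain unless skipped. */
static void dom_unmap(struct dom_buf *b, bool skip_cpu_sync)
{
	assert(b != NULL && b->mappings > 0);
	b->mappings--;
	if (!skip_cpu_sync)
		b->domain = DOM_CPU;
}
```

A typical sequence in this model is dom_map(&b, false) for the first
device, dom_map(&b, true) for the second, dom_unmap(&b, true) to drop
one mapping while staying in the device domain, and dom_unmap(&b, false)
for the last one. The model also shows why care is needed: calling
dom_map(&b, true) on a buffer that is still in the CPU domain leaves it
there, which on real hardware would mean the device may see stale data.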

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 Documentation/DMA-attributes.txt |   24 ++++++++++++++++++++++++
 include/linux/dma-attrs.h        |    1 +
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index 725580d..f503090 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -67,3 +67,27 @@ set on each call.
 Since it is optional for platforms to implement
 DMA_ATTR_NO_KERNEL_MAPPING, those that do not will simply ignore the
 attribute and exhibit default behavior.
+
+DMA_ATTR_SKIP_CPU_SYNC
+----------------------
+
+By default dma_map_{single,page,sg} functions family transfer a given
+buffer from CPU domain to device domain. Some advanced use cases might
+require sharing a buffer between more than one device. This requires
+having a mapping created separately for each device and is usually
+performed by calling dma_map_{single,page,sg} function more than once
+for the given buffer with device pointer to each device taking part in
+the buffer sharing. The first call transfers a buffer from 'CPU' domain
+to 'device' domain, which synchronizes CPU caches for the given region
+(usually it means that the cache has been flushed or invalidated
+depending on the dma direction). However, next calls to
+dma_map_{single,page,sg}() for other devices will perform exactly the
+same synchronization operation on the CPU cache. CPU cache synchronization
+might be a time consuming operation, especially if the buffers are
+large, so it is highly recommended to avoid it if possible.
+DMA_ATTR_SKIP_CPU_SYNC allows platform code to skip synchronization of
+the CPU cache for the given buffer assuming that it has been already
+transferred to 'device' domain. This attribute can be also used for
+dma_unmap_{single,page,sg} functions family to force buffer to stay in
+device domain after releasing a mapping for it. Use this attribute with
+care!
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index a37c10c..f83f793 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -16,6 +16,7 @@ enum dma_attr {
 	DMA_ATTR_WRITE_COMBINE,
 	DMA_ATTR_NON_CONSISTENT,
 	DMA_ATTR_NO_KERNEL_MAPPING,
+	DMA_ATTR_SKIP_CPU_SYNC,
 	DMA_ATTR_MAX,
 };
 
-- 
1.7.1.569.g6f426

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 2/2] ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute
From: Marek Szyprowski @ 2012-06-06 13:17 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	linux-kernel
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann,
	Russell King - ARM Linux, Chunsang Jeong, Krishna Reddy,
	Benjamin Herrenschmidt, Konrad Rzeszutek Wilk, Hiroshi Doyu,
	Subash Patel, Sumit Semwal, Abhinav Kochhar, Tomasz Stanislawski

This patch adds support for the DMA_ATTR_SKIP_CPU_SYNC attribute to the
dma_(un)map_(single,page,sg) family of functions. It lets DMA-mapping
clients create a mapping of a buffer for a given device without
performing CPU cache synchronization. CPU cache synchronization can be
skipped for buffers that are known to be already in the 'device' domain
(CPU caches have already been synchronized, or there are only coherent
mappings for the buffer). For advanced users only; please use it with
care.
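
For readers who want to reason about the new condition in isolation,
the sketch below reproduces its shape in plain userspace C. All
sketch_* names are hypothetical stand-ins; in particular the single
unsigned long bitmask and the "NULL means no attributes" rule are
assumptions of this model rather than a copy of the kernel's
dma_get_attr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; names echo the patch but are NOT the kernel's code. */
enum sketch_attr { SKETCH_ATTR_SKIP_CPU_SYNC, SKETCH_ATTR_MAX };

struct sketch_attrs {
	unsigned long flags; /* one bit per attribute */
};

static void sketch_set_attr(enum sketch_attr attr, struct sketch_attrs *attrs)
{
	assert(attrs != NULL && attr < SKETCH_ATTR_MAX);
	attrs->flags |= 1UL << attr;
}

/* In this model a NULL attrs pointer means "no attributes set". */
static bool sketch_get_attr(enum sketch_attr attr,
			    const struct sketch_attrs *attrs)
{
	return attrs != NULL && (attrs->flags & (1UL << attr)) != 0;
}

/* Same shape as the condition added at each map/unmap call site. */
static bool sketch_needs_cpu_sync(bool arch_coherent,
				  const struct sketch_attrs *attrs)
{
	return !arch_coherent &&
	       !sketch_get_attr(SKETCH_ATTR_SKIP_CPU_SYNC, attrs);
}
```

The predicate is true only for a non-coherent platform with the
attribute unset, so callers that pass no attributes keep the existing
behaviour, and coherent platforms never synchronize either way.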

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm/mm/dma-mapping.c |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index b140440..62a0023 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -68,7 +68,7 @@ static dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
 	     unsigned long offset, size_t size, enum dma_data_direction dir,
 	     struct dma_attrs *attrs)
 {
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 		__dma_page_cpu_to_dev(page, offset, size, dir);
 	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
 }
@@ -91,7 +91,7 @@ static void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir,
 		struct dma_attrs *attrs)
 {
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 		__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
 				      handle & ~PAGE_MASK, size, dir);
 }
@@ -1077,7 +1077,7 @@ static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
  */
 static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 			  size_t size, dma_addr_t *handle,
-			  enum dma_data_direction dir)
+			  enum dma_data_direction dir, struct dma_attrs *attrs)
 {
 	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
 	dma_addr_t iova, iova_base;
@@ -1096,7 +1096,8 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 		phys_addr_t phys = page_to_phys(sg_page(s));
 		unsigned int len = PAGE_ALIGN(s->offset + s->length);
 
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() &&
+		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 			__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
 
 		ret = iommu_map(mapping->domain, iova, phys, len, 0);
@@ -1143,7 +1144,7 @@ int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 
 		if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) {
 			if (__map_sg_chunk(dev, start, size, &dma->dma_address,
-			    dir) < 0)
+			    dir, attrs) < 0)
 				goto bad_mapping;
 
 			dma->dma_address += offset;
@@ -1156,7 +1157,7 @@ int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		}
 		size += s->length;
 	}
-	if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir) < 0)
+	if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir, attrs) < 0)
 		goto bad_mapping;
 
 	dma->dma_address += offset;
@@ -1190,7 +1191,8 @@ void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		if (sg_dma_len(s))
 			__iommu_remove_mapping(dev, sg_dma_address(s),
 					       sg_dma_len(s));
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() &&
+		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 			__dma_page_dev_to_cpu(sg_page(s), s->offset,
 					      s->length, dir);
 	}
@@ -1252,7 +1254,7 @@ static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
 	dma_addr_t dma_addr;
 	int ret, len = PAGE_ALIGN(size + offset);
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 		__dma_page_cpu_to_dev(page, offset, size, dir);
 
 	dma_addr = __alloc_iova(mapping, len);
@@ -1291,7 +1293,7 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
 	if (!iova)
 		return;
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 		__dma_page_dev_to_cpu(page, offset, size, dir);
 
 	iommu_unmap(mapping->domain, iova, len);
-- 
1.7.1.569.g6f426

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/2] ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute
@ 2012-06-06 13:17   ` Marek Szyprowski
  0 siblings, 0 replies; 21+ messages in thread
From: Marek Szyprowski @ 2012-06-06 13:17 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for DMA_ATTR_SKIP_CPU_SYNC attribute for
dma_(un)map_(single,page,sg) functions family. It lets dma mapping clients
to create a mapping for the buffer for the given device without performing
a CPU cache synchronization. CPU cache synchronization can be skipped for
the buffers which it is known that they are already in 'device' domain (CPU
caches have been already synchronized or there are only coherent mappings
for the buffer). For advanced users only, please use it with care.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm/mm/dma-mapping.c |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index b140440..62a0023 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -68,7 +68,7 @@ static dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
 	     unsigned long offset, size_t size, enum dma_data_direction dir,
 	     struct dma_attrs *attrs)
 {
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 		__dma_page_cpu_to_dev(page, offset, size, dir);
 	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
 }
@@ -91,7 +91,7 @@ static void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir,
 		struct dma_attrs *attrs)
 {
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 		__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
 				      handle & ~PAGE_MASK, size, dir);
 }
@@ -1077,7 +1077,7 @@ static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
  */
 static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 			  size_t size, dma_addr_t *handle,
-			  enum dma_data_direction dir)
+			  enum dma_data_direction dir, struct dma_attrs *attrs)
 {
 	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
 	dma_addr_t iova, iova_base;
@@ -1096,7 +1096,8 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 		phys_addr_t phys = page_to_phys(sg_page(s));
 		unsigned int len = PAGE_ALIGN(s->offset + s->length);
 
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() &&
+		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 			__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
 
 		ret = iommu_map(mapping->domain, iova, phys, len, 0);
@@ -1143,7 +1144,7 @@ int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 
 		if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) {
 			if (__map_sg_chunk(dev, start, size, &dma->dma_address,
-			    dir) < 0)
+			    dir, attrs) < 0)
 				goto bad_mapping;
 
 			dma->dma_address += offset;
@@ -1156,7 +1157,7 @@ int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		}
 		size += s->length;
 	}
-	if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir) < 0)
+	if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir, attrs) < 0)
 		goto bad_mapping;
 
 	dma->dma_address += offset;
@@ -1190,7 +1191,8 @@ void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		if (sg_dma_len(s))
 			__iommu_remove_mapping(dev, sg_dma_address(s),
 					       sg_dma_len(s));
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() &&
+		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 			__dma_page_dev_to_cpu(sg_page(s), s->offset,
 					      s->length, dir);
 	}
@@ -1252,7 +1254,7 @@ static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
 	dma_addr_t dma_addr;
 	int ret, len = PAGE_ALIGN(size + offset);
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 		__dma_page_cpu_to_dev(page, offset, size, dir);
 
 	dma_addr = __alloc_iova(mapping, len);
@@ -1291,7 +1293,7 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
 	if (!iova)
 		return;
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
 		__dma_page_dev_to_cpu(page, offset, size, dir);
 
 	iommu_unmap(mapping->domain, iova, len);
-- 
1.7.1.569.g6f426
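
The guard that this diff adds at every call site can be modelled in user space. The following is a minimal sketch, not kernel code: `model_attrs`, `model_set_attr`, `model_get_attr` and `needs_cpu_sync` are hypothetical stand-ins for `struct dma_attrs`, `dma_set_attr()`, `dma_get_attr()` and the patched `if` condition, and the `coherent` argument stands in for `arch_is_coherent()`. Treating a NULL attribute pointer as "no attributes set" is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute identifiers; only SKIP_CPU_SYNC comes from the patch. */
enum model_attr {
	MODEL_ATTR_OTHER,
	MODEL_ATTR_SKIP_CPU_SYNC,
	MODEL_ATTR_MAX
};

/* Hypothetical stand-in for struct dma_attrs: one bit per attribute. */
struct model_attrs {
	unsigned long flags;
};

/* Set one attribute bit in the caller's attribute set. */
void model_set_attr(enum model_attr attr, struct model_attrs *attrs)
{
	assert(attrs != NULL && attr < MODEL_ATTR_MAX);
	attrs->flags |= 1UL << attr;
}

/* Query one attribute bit; a NULL set is assumed to mean "nothing set". */
bool model_get_attr(enum model_attr attr, const struct model_attrs *attrs)
{
	if (attrs == NULL)
		return false;
	return (attrs->flags >> attr) & 1UL;
}

/* Same shape as the patched condition: sync unless coherent or skipped. */
bool needs_cpu_sync(bool coherent, const struct model_attrs *attrs)
{
	return !coherent && !model_get_attr(MODEL_ATTR_SKIP_CPU_SYNC, attrs);
}
```

With this model, a caller that passes no attributes keeps the old behaviour, and only a caller that explicitly sets the skip attribute on a non-coherent system avoids the cache maintenance.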

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH/RFC 0/2] ARM: DMA-mapping: new extensions for buffer sharing (part 2)
@ 2012-06-06 13:45   ` Subash Patel
  0 siblings, 0 replies; 21+ messages in thread
From: Subash Patel @ 2012-06-06 13:45 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	linux-kernel, Kyungmin Park, Arnd Bergmann,
	Russell King - ARM Linux, Chunsang Jeong, Krishna Reddy,
	Benjamin Herrenschmidt, Konrad Rzeszutek Wilk, Hiroshi Doyu,
	Subash Patel, Sumit Semwal, Abhinav Kochhar, Tomasz Stanislawski

Hello Marek,

Thanks for the patch. We found the two challenges below when using UMM,
both related to the cache invalidate/flush performed after/before the DMA
operations:

a) When using HIGH_MEM pages, the page-table walk consumed a lot of time
to get the KVA of each page. Moreover, there was overhead from the
spinlock we acquire/release for each page.

b) One of my colleagues tried to map/unmap the buffers only once instead
of every time (which is what results in this problem), and we didn't find
a significant performance improvement. The reason is (as far as I know)
that when we give an address range to the cache controller to
invalidate/flush, the hardware operation is too fast (if there were any
cache lines associated with the pages at all) to add any overhead to the
CPU operation.

But this patch brings the logical flow of dma-mapping one step closer :) I
will adopt it as part of pulling all your new patches, and will keep you
updated on any new findings.

Regards,
Subash

On 06/06/2012 06:47 PM, Marek Szyprowski wrote:
> Hello,
>
> This is a continuation of the dma-mapping extensions posted in the
> following thread:
> http://thread.gmane.org/gmane.linux.kernel.mm/78644
>
> We noticed that some advanced buffer sharing use cases usually require
> creating a dma mapping for the same memory buffer for more than one
> device. Usually also such buffer is never touched with CPU, so the data
> are processed by the devices.
>
>  From the DMA-mapping perspective this requires to call one of the
> dma_map_{page,single,sg} function for the given memory buffer a few
> times, for each of the devices. Each dma_map_* call performs CPU cache
> synchronization, what might be a time consuming operation, especially
> when the buffers are large. We would like to avoid any useless and time
> consuming operations, so that was the main reason for introducing
> another attribute for DMA-mapping subsystem: DMA_ATTR_SKIP_CPU_SYNC,
> which lets dma-mapping core to skip CPU cache synchronization in certain
> cases.
>
> The proposed patches have been generated on top of the ARM DMA-mapping
> redesign patch series on Linux v3.4-rc7. They are also available on the
> following GIT branch:
>
> git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git 3.4-rc7-arm-dma-v10-ext
>
> with all require patches on top of vanilla v3.4-rc7 kernel. I will
> resend them rebased onto v3.5-rc1 soon.
>
> Best regards
> Marek Szyprowski
> Samsung Poland R&D Center
>
>
> Patch summary:
>
> Marek Szyprowski (2):
>    common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
>    ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute
>
>   Documentation/DMA-attributes.txt |   24 ++++++++++++++++++++++++
>   arch/arm/mm/dma-mapping.c        |   20 +++++++++++---------
>   include/linux/dma-attrs.h        |    1 +
>   3 files changed, 36 insertions(+), 9 deletions(-)
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH/RFC 0/2] ARM: DMA-mapping: new extensions for buffer sharing (part 2)
@ 2012-06-18  7:50   ` Hiroshi Doyu
  0 siblings, 0 replies; 21+ messages in thread
From: Hiroshi Doyu @ 2012-06-18  7:50 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel@lists.infradead.org,
	linaro-mm-sig@lists.linaro.org, linux-mm@kvack.org,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	Kyungmin Park, Arnd Bergmann, Russell King - ARM Linux,
	Chunsang Jeong, Krishna Reddy, Benjamin Herrenschmidt,
	Konrad Rzeszutek Wilk, Subash Patel, Sumit Semwal,
	Abhinav Kochhar, Tomasz Stanislawski

Hi Marek,

On Wed, 6 Jun 2012 15:17:35 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> Hello,
> 
> This is a continuation of the dma-mapping extensions posted in the
> following thread:
> http://thread.gmane.org/gmane.linux.kernel.mm/78644
> 
> We noticed that some advanced buffer sharing use cases usually require
> creating a dma mapping for the same memory buffer for more than one
> device. Usually also such buffer is never touched with CPU, so the data
> are processed by the devices.
> 
> From the DMA-mapping perspective this requires to call one of the
> dma_map_{page,single,sg} function for the given memory buffer a few
> times, for each of the devices. Each dma_map_* call performs CPU cache
> synchronization, what might be a time consuming operation, especially
> when the buffers are large. We would like to avoid any useless and time
> consuming operations, so that was the main reason for introducing
> another attribute for DMA-mapping subsystem: DMA_ATTR_SKIP_CPU_SYNC,
> which lets dma-mapping core to skip CPU cache synchronization in certain
> cases.

I had implemented a similar patch (*1) to optimize/skip the cache
maintenance, but we did this with "dir", not with "attr", making use of
the existing DMA_NONE to skip cache operations. I'm just interested in
why you chose attr for this purpose. Could you enlighten me as to why
attr is used here?

Anyway, this feature is necessary for us. Thank you for posting them.
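
To make the two approaches concrete, here is a small user-space sketch. It is a model under stated assumptions, not kernel code: all `model_`/`MODEL_` names and the three helper functions are hypothetical, and `device_may_write` stands for any code that still wants to know the transfer direction. The sketch only illustrates one observable difference between the approaches; whether that difference motivated the choice of attr is not stated in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror enum dma_data_direction in the kernel headers. */
enum model_dir {
	MODEL_BIDIRECTIONAL = 0,
	MODEL_TO_DEVICE = 1,
	MODEL_FROM_DEVICE = 2,
	MODEL_NONE = 3
};

/* "dir" approach: MODEL_NONE doubles as the request to skip cache work. */
bool sync_needed_by_dir(bool coherent, enum model_dir dir)
{
	return !coherent && dir != MODEL_NONE;
}

/* "attr" approach: the skip request travels separately from the direction. */
bool sync_needed_by_attr(bool coherent, bool skip_cpu_sync)
{
	return !coherent && !skip_cpu_sync;
}

/* Hypothetical consumer of the direction, e.g. code choosing permissions. */
bool device_may_write(enum model_dir dir)
{
	assert(dir <= MODEL_NONE);
	return dir == MODEL_FROM_DEVICE || dir == MODEL_BIDIRECTIONAL;
}
```

In the "dir" variant the caller has to give up the real direction in order to skip the sync, so `device_may_write(MODEL_NONE)` can no longer tell what the device will do. In the "attr" variant the direction is passed unchanged alongside the skip request.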

*1: FYI:

From 4656146d23d0a3bd02131f732b0c04e50475b8da Mon Sep 17 00:00:00 2001
From: Hiroshi DOYU <hdoyu@nvidia.com>
Date: Tue, 20 Mar 2012 15:09:30 +0200
Subject: [PATCH 1/1] ARM: dma-mapping: Allow DMA_NONE to skip cache_maint

Signed-off-by: Hiroshi DOYU <hdoyu@nvidia.com>
---
 arch/arm/mm/dma-mapping.c                |   16 ++++++++--------
 drivers/video/tegra/nvmap/nvmap.c        |    2 +-
 drivers/video/tegra/nvmap/nvmap_handle.c |    2 +-
 include/linux/dma-mapping.h              |   16 +++++++++++++---
 4 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 83f0ac6..c4b1587 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1161,7 +1161,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 		phys_addr_t phys = page_to_phys(sg_page(s));
 		unsigned int len = PAGE_ALIGN(s->offset + s->length);
 
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() && (dir != DMA_NONE))
 			__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
 
 		ret = iommu_map(mapping->domain, iova, phys, len, 0);
@@ -1254,7 +1254,7 @@ void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		if (sg_dma_len(s))
 			__iommu_remove_mapping(dev, sg_dma_address(s),
 					       sg_dma_len(s));
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() && (dir != DMA_NONE))
 			__dma_page_dev_to_cpu(sg_page(s), s->offset,
 					      s->length, dir);
 	}
@@ -1274,7 +1274,7 @@ void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i)
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() && (dir != DMA_NONE))
 			__dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
 
 }
@@ -1293,7 +1293,7 @@ void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i)
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() && (dir != DMA_NONE))
 			__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
 }
 
@@ -1305,7 +1305,7 @@ static dma_addr_t __arm_iommu_map_page_at(struct device *dev, struct page *page,
 	dma_addr_t dma_addr;
 	int ret, len = PAGE_ALIGN(size + offset);
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && (dir != DMA_NONE))
 		__dma_page_cpu_to_dev(page, offset, size, dir);
 
 	dma_addr = __alloc_iova_at(mapping, req, len);
@@ -1349,7 +1349,7 @@ dma_addr_t arm_iommu_map_page_at(struct device *dev, struct page *page,
 	unsigned int phys;
 	int ret;
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && (dir != DMA_NONE))
 		__dma_page_cpu_to_dev(page, offset, size, dir);
 
 	/* Check if iova area is reserved in advance. */
@@ -1386,7 +1386,7 @@ static void __arm_iommu_unmap_page_at(struct device *dev, dma_addr_t handle,
 	if (!iova)
 		return;
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && (dir != DMA_NONE))
 		__dma_page_dev_to_cpu(page, offset, size, dir);
 
 	iommu_unmap(mapping->domain, iova, len);
@@ -1430,7 +1430,7 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
 	if (!iova)
 		return;
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && (dir != DMA_NONE))
 		__dma_page_dev_to_cpu(page, offset, size, dir);
 }
 
diff --git a/drivers/video/tegra/nvmap/nvmap.c b/drivers/video/tegra/nvmap/nvmap.c
index 1032224..e98dd11 100644
--- a/drivers/video/tegra/nvmap/nvmap.c
+++ b/drivers/video/tegra/nvmap/nvmap.c
@@ -56,7 +56,7 @@ static void map_iovmm_area(struct nvmap_handle *h)
 		BUG_ON(!pfn_valid(page_to_pfn(h->pgalloc.pages[i])));
 
 		iova = dma_map_page_at(to_iovmm_dev(h), h->pgalloc.pages[i],
-				       va, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
+				       va, 0, PAGE_SIZE, DMA_NONE);
 		BUG_ON(iova != va);
 	}
 	h->pgalloc.dirty = false;
diff --git a/drivers/video/tegra/nvmap/nvmap_handle.c b/drivers/video/tegra/nvmap/nvmap_handle.c
index 853f87e..b2bbeb1 100644
--- a/drivers/video/tegra/nvmap/nvmap_handle.c
+++ b/drivers/video/tegra/nvmap/nvmap_handle.c
@@ -504,7 +504,7 @@ void nvmap_free_vm(struct device *dev, struct tegra_iovmm_area *area)
 		dma_addr_t iova;
 
 		iova = area->iovm_start + i * PAGE_SIZE;
-		dma_unmap_page(dev, iova, PAGE_SIZE, DMA_BIDIRECTIONAL);
+		dma_unmap_page(dev, iova, PAGE_SIZE, DMA_NONE);
 	}
 	kfree(area);
 }
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 36dfe06..cbd8d47 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -55,9 +55,19 @@ struct dma_map_ops {
 
 static inline int valid_dma_direction(int dma_direction)
 {
-	return ((dma_direction == DMA_BIDIRECTIONAL) ||
-		(dma_direction == DMA_TO_DEVICE) ||
-		(dma_direction == DMA_FROM_DEVICE));
+	int ret = 1;
+
+	switch (dma_direction) {
+	case DMA_BIDIRECTIONAL:
+	case DMA_TO_DEVICE:
+	case DMA_FROM_DEVICE:
+	case DMA_NONE:
+		break;
+	default:
+		ret = 0;
+		break;
+	} 
+	return ret;
 }
 
 static inline int is_device_dma_capable(struct device *dev)
-- 
1.7.5.4
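
For reference, the direction check touched by the last hunk can be modelled on its own. This is a sketch under the assumption that any value outside the four known directions should be rejected; `model_valid_direction` and `model_checked_direction` are hypothetical names, and the second helper only imitates the style of a `BUG_ON()` check with a plain `assert()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror enum dma_data_direction in the kernel headers. */
enum model_dir {
	MODEL_BIDIRECTIONAL = 0,
	MODEL_TO_DEVICE = 1,
	MODEL_FROM_DEVICE = 2,
	MODEL_NONE = 3
};

/* Accept the three transfer directions plus NONE; reject anything else. */
bool model_valid_direction(int dir)
{
	switch (dir) {
	case MODEL_BIDIRECTIONAL:
	case MODEL_TO_DEVICE:
	case MODEL_FROM_DEVICE:
	case MODEL_NONE:
		return true;
	default:
		return false;
	}
}

/* Abort on an invalid direction, otherwise hand it back as the enum type. */
enum model_dir model_checked_direction(int dir)
{
	assert(model_valid_direction(dir));
	return (enum model_dir)dir;
}
```

Returning directly from the `switch` avoids a result variable, so the default branch cannot accidentally report an unknown value as valid.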

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH/RFC 0/2] ARM: DMA-mapping: new extensions for buffer sharing (part 2)
@ 2012-06-18  7:50   ` Hiroshi Doyu
  0 siblings, 0 replies; 21+ messages in thread
From: Hiroshi Doyu @ 2012-06-18  7:50 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel@lists.infradead.org,
	linaro-mm-sig@lists.linaro.org, linux-mm@kvack.org,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	Kyungmin Park, Arnd Bergmann, Russell King - ARM Linux,
	Chunsang Jeong, Krishna Reddy, Benjamin Herrenschmidt,
	Konrad Rzeszutek Wilk, Subash Patel, Sumit Semwal,
	Abhinav Kochhar, Tomasz Stanislawski

Hi Marek,

On Wed, 6 Jun 2012 15:17:35 +0200
Marek Szyprowski <m.szyprowski@samsung.com> wrote:

> Hello,
> 
> This is a continuation of the dma-mapping extensions posted in the
> following thread:
> http://thread.gmane.org/gmane.linux.kernel.mm/78644
> 
> We noticed that some advanced buffer sharing use cases usually require
> creating a dma mapping for the same memory buffer for more than one
> device. Usually such a buffer is also never touched by the CPU, so the data
> are processed only by the devices.
> 
> From the DMA-mapping perspective this requires calling one of the
> dma_map_{page,single,sg} functions for the given memory buffer a few
> times, once for each of the devices. Each dma_map_* call performs CPU cache
> synchronization, which might be a time consuming operation, especially
> when the buffers are large. We would like to avoid any useless and time
> consuming operations, so that was the main reason for introducing
> another attribute for the DMA-mapping subsystem: DMA_ATTR_SKIP_CPU_SYNC,
> which lets the dma-mapping core skip CPU cache synchronization in certain
> cases.

I had implemented a similar patch(*1) to optimize/skip the cache
maintenance, but we did this with "dir", not with "attr", making use of
the existing DMA_NONE to skip cache operations. I'm just interested in
why you chose attr for this purpose. Could you enlighten me as to why
attr is used here?

Anyway, this feature is necessary for us. Thank you for posting them.

*1: FYI:

From 4656146d23d0a3bd02131f732b0c04e50475b8da Mon Sep 17 00:00:00 2001
From: Hiroshi DOYU <hdoyu@nvidia.com>
Date: Tue, 20 Mar 2012 15:09:30 +0200
Subject: [PATCH 1/1] ARM: dma-mapping: Allow DMA_NONE to skip cache_maint

Signed-off-by: Hiroshi DOYU <hdoyu@nvidia.com>
---
 arch/arm/mm/dma-mapping.c                |   16 ++++++++--------
 drivers/video/tegra/nvmap/nvmap.c        |    2 +-
 drivers/video/tegra/nvmap/nvmap_handle.c |    2 +-
 include/linux/dma-mapping.h              |   16 +++++++++++++---
 4 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 83f0ac6..c4b1587 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1161,7 +1161,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 		phys_addr_t phys = page_to_phys(sg_page(s));
 		unsigned int len = PAGE_ALIGN(s->offset + s->length);
 
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() && (dir != DMA_NONE))
 			__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
 
 		ret = iommu_map(mapping->domain, iova, phys, len, 0);
@@ -1254,7 +1254,7 @@ void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		if (sg_dma_len(s))
 			__iommu_remove_mapping(dev, sg_dma_address(s),
 					       sg_dma_len(s));
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() && (dir != DMA_NONE))
 			__dma_page_dev_to_cpu(sg_page(s), s->offset,
 					      s->length, dir);
 	}
@@ -1274,7 +1274,7 @@ void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i)
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() && (dir != DMA_NONE))
 			__dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
 
 }
@@ -1293,7 +1293,7 @@ void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i)
-		if (!arch_is_coherent())
+		if (!arch_is_coherent() && (dir != DMA_NONE))
 			__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
 }
 
@@ -1305,7 +1305,7 @@ static dma_addr_t __arm_iommu_map_page_at(struct device *dev, struct page *page,
 	dma_addr_t dma_addr;
 	int ret, len = PAGE_ALIGN(size + offset);
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && (dir != DMA_NONE))
 		__dma_page_cpu_to_dev(page, offset, size, dir);
 
 	dma_addr = __alloc_iova_at(mapping, req, len);
@@ -1349,7 +1349,7 @@ dma_addr_t arm_iommu_map_page_at(struct device *dev, struct page *page,
 	unsigned int phys;
 	int ret;
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && (dir != DMA_NONE))
 		__dma_page_cpu_to_dev(page, offset, size, dir);
 
 	/* Check if iova area is reserved in advance. */
@@ -1386,7 +1386,7 @@ static void __arm_iommu_unmap_page_at(struct device *dev, dma_addr_t handle,
 	if (!iova)
 		return;
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && (dir != DMA_NONE))
 		__dma_page_dev_to_cpu(page, offset, size, dir);
 
 	iommu_unmap(mapping->domain, iova, len);
@@ -1430,7 +1430,7 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
 	if (!iova)
 		return;
 
-	if (!arch_is_coherent())
+	if (!arch_is_coherent() && (dir != DMA_NONE))
 		__dma_page_dev_to_cpu(page, offset, size, dir);
 }
 
diff --git a/drivers/video/tegra/nvmap/nvmap.c b/drivers/video/tegra/nvmap/nvmap.c
index 1032224..e98dd11 100644
--- a/drivers/video/tegra/nvmap/nvmap.c
+++ b/drivers/video/tegra/nvmap/nvmap.c
@@ -56,7 +56,7 @@ static void map_iovmm_area(struct nvmap_handle *h)
 		BUG_ON(!pfn_valid(page_to_pfn(h->pgalloc.pages[i])));
 
 		iova = dma_map_page_at(to_iovmm_dev(h), h->pgalloc.pages[i],
-				       va, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
+				       va, 0, PAGE_SIZE, DMA_NONE);
 		BUG_ON(iova != va);
 	}
 	h->pgalloc.dirty = false;
diff --git a/drivers/video/tegra/nvmap/nvmap_handle.c b/drivers/video/tegra/nvmap/nvmap_handle.c
index 853f87e..b2bbeb1 100644
--- a/drivers/video/tegra/nvmap/nvmap_handle.c
+++ b/drivers/video/tegra/nvmap/nvmap_handle.c
@@ -504,7 +504,7 @@ void nvmap_free_vm(struct device *dev, struct tegra_iovmm_area *area)
 		dma_addr_t iova;
 
 		iova = area->iovm_start + i * PAGE_SIZE;
-		dma_unmap_page(dev, iova, PAGE_SIZE, DMA_BIDIRECTIONAL);
+		dma_unmap_page(dev, iova, PAGE_SIZE, DMA_NONE);
 	}
 	kfree(area);
 }
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 36dfe06..cbd8d47 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -55,9 +55,19 @@ struct dma_map_ops {
 
 static inline int valid_dma_direction(int dma_direction)
 {
-	return ((dma_direction == DMA_BIDIRECTIONAL) ||
-		(dma_direction == DMA_TO_DEVICE) ||
-		(dma_direction == DMA_FROM_DEVICE));
+	int ret = 1;
+
+	switch (dma_direction) {
+	case DMA_BIDIRECTIONAL:
+	case DMA_TO_DEVICE:
+	case DMA_FROM_DEVICE:
+	case DMA_NONE:
+		break;
+	default:
+		ret = 0;
+		break;
+	}
+	return ret;
 }
 
 static inline int is_device_dma_capable(struct device *dev)
-- 
1.7.5.4
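
The condition that the patch above adds at each cache-maintenance call
site can be modelled in plain userspace C. This is only a minimal sketch:
the enum mirrors the values of the kernel's enum dma_data_direction, while
model_valid_direction() and model_needs_cache_maint() are hypothetical
names invented for illustration and are not kernel functions.

```c
#include <stdbool.h>

/* Same numeric values as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/*
 * Hypothetical stand-in for valid_dma_direction() once DMA_NONE is
 * accepted: the four known directions are valid, anything else is not.
 */
static inline bool model_valid_direction(int dir)
{
	switch (dir) {
	case DMA_BIDIRECTIONAL:
	case DMA_TO_DEVICE:
	case DMA_FROM_DEVICE:
	case DMA_NONE:
		return true;
	default:
		return false;
	}
}

/*
 * Hypothetical predicate for the guard used in the patch:
 * cache maintenance runs only on a non-coherent system and only
 * when the caller did not pass DMA_NONE.
 */
static inline bool model_needs_cache_maint(bool coherent, int dir)
{
	return !coherent && dir != DMA_NONE;
}
```

With this model, a caller that passes DMA_NONE skips the cache operations,
but the mapping code no longer knows whether the device reads or writes
the buffer.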


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* RE: [PATCH/RFC 0/2] ARM: DMA-mapping: new extensions for buffer sharing (part 2)
  2012-06-18  7:50   ` Hiroshi Doyu
  (?)
@ 2012-06-18  9:03     ` Marek Szyprowski
  -1 siblings, 0 replies; 21+ messages in thread
From: Marek Szyprowski @ 2012-06-18  9:03 UTC (permalink / raw)
  To: 'Hiroshi Doyu'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	linux-kernel, 'Kyungmin Park', 'Arnd Bergmann',
	'Russell King - ARM Linux', 'Chunsang Jeong',
	'Krishna Reddy', 'Benjamin Herrenschmidt',
	'Konrad Rzeszutek Wilk', 'Subash Patel',
	'Sumit Semwal', 'Abhinav Kochhar',
	Tomasz Stanislawski

Hi,

On Monday, June 18, 2012 9:51 AM Hiroshi Doyu wrote:

> On Wed, 6 Jun 2012 15:17:35 +0200
> Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> 
> > This is a continuation of the dma-mapping extensions posted in the
> > following thread:
> > http://thread.gmane.org/gmane.linux.kernel.mm/78644
> >
> > We noticed that some advanced buffer sharing use cases usually require
> > creating a dma mapping for the same memory buffer for more than one
> > device. Usually such a buffer is also never touched by the CPU, so the data
> > are processed only by the devices.
> >
> > From the DMA-mapping perspective this requires calling one of the
> > dma_map_{page,single,sg} functions for the given memory buffer a few
> > times, once for each of the devices. Each dma_map_* call performs CPU cache
> > synchronization, which might be a time consuming operation, especially
> > when the buffers are large. We would like to avoid any useless and time
> > consuming operations, so that was the main reason for introducing
> > another attribute for the DMA-mapping subsystem: DMA_ATTR_SKIP_CPU_SYNC,
> > which lets the dma-mapping core skip CPU cache synchronization in certain
> > cases.
> 
> I had implemented a similar patch(*1) to optimize/skip the cache
> maintenance, but we did this with "dir", not with "attr", making use of
> the existing DMA_NONE to skip cache operations. I'm just interested in
> why you chose attr for this purpose. Could you enlighten me as to why
> attr is used here?

I also initially thought about adding a new dma direction for this feature,
but then I realized that there might be cases where the real direction of
the data transfer is still needed (for example to set io read/write
attributes for the mappings), and this would lead us to 3 new dma directions.
The second reason was compatibility with existing code. There are
already drivers which use the DMA_NONE type for their internal stuff. Adding
support for new dma directions requires changes in all implementations of
dma-mapping for all architectures. DMA attributes are imho a better fit for
this case. They are optional by default, so other architectures are free
to leave them unimplemented and the drivers should still work correctly.
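
The difference can be illustrated with a small userspace model. It is a
sketch under stated assumptions, not the kernel implementation: the
attribute is represented by a hypothetical flag bit
MODEL_ATTR_SKIP_CPU_SYNC, and model_map() and model_share_buffer() are
invented names. The "sync on the first mapping only" policy is likewise
an assumed usage pattern.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same numeric values as the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical flag bit standing in for DMA_ATTR_SKIP_CPU_SYNC. */
#define MODEL_ATTR_SKIP_CPU_SYNC (1UL << 0)

struct model_mapping {
	enum dma_data_direction dir;	/* real transfer direction, always kept */
	bool synced;			/* whether a simulated cache sync ran */
};

/* Number of simulated CPU cache synchronizations performed so far. */
static unsigned int model_sync_count;

/* Map a buffer: the direction is recorded even when the sync is skipped. */
static struct model_mapping model_map(enum dma_data_direction dir,
				      unsigned long attrs)
{
	struct model_mapping m = { .dir = dir, .synced = false };

	if (!(attrs & MODEL_ATTR_SKIP_CPU_SYNC)) {
		model_sync_count++;
		m.synced = true;
	}
	return m;
}

/*
 * Map one buffer for ndev devices, syncing only for the first mapping
 * (assumed policy). Returns how many simulated cache syncs were done.
 */
static unsigned int model_share_buffer(size_t ndev,
				       enum dma_data_direction dir)
{
	model_sync_count = 0;
	for (size_t i = 0; i < ndev; i++) {
		unsigned long attrs = (i == 0) ? 0 : MODEL_ATTR_SKIP_CPU_SYNC;

		(void)model_map(dir, attrs);
	}
	return model_sync_count;
}
```

In this model the skip request travels in attrs, so dir still carries the
real transfer direction for every mapping, which a DMA_NONE based scheme
cannot express without additional direction values.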
 
Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-06-18  9:03 UTC | newest]

Thread overview: 21+ messages
2012-06-06 13:17 [PATCH/RFC 0/2] ARM: DMA-mapping: new extensions for buffer sharing (part 2) Marek Szyprowski
2012-06-06 13:17 ` Marek Szyprowski
2012-06-06 13:17 ` Marek Szyprowski
2012-06-06 13:17 ` Marek Szyprowski
2012-06-06 13:17 ` [PATCH 1/2] common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute Marek Szyprowski
2012-06-06 13:17   ` Marek Szyprowski
2012-06-06 13:17   ` Marek Szyprowski
2012-06-06 13:17 ` [PATCH 2/2] ARM: dma-mapping: add support for " Marek Szyprowski
2012-06-06 13:17   ` Marek Szyprowski
2012-06-06 13:17   ` Marek Szyprowski
2012-06-06 13:45 ` [PATCH/RFC 0/2] ARM: DMA-mapping: new extensions for buffer sharing (part 2) Subash Patel
2012-06-06 13:45   ` Subash Patel
2012-06-06 13:45   ` Subash Patel
2012-06-18  7:50 ` Hiroshi Doyu
2012-06-18  7:50   ` Hiroshi Doyu
2012-06-18  7:50   ` Hiroshi Doyu
2012-06-18  7:50   ` Hiroshi Doyu
2012-06-18  7:50   ` Hiroshi Doyu
2012-06-18  9:03   ` Marek Szyprowski
2012-06-18  9:03     ` Marek Szyprowski
2012-06-18  9:03     ` Marek Szyprowski
