linux-arm-kernel.lists.infradead.org archive mirror
* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
@ 2011-05-25  7:35 Marek Szyprowski
  2011-06-04 16:13 ` Ohad Ben-Cohen
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Marek Szyprowski @ 2011-05-25  7:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

Following the discussion about the driver for the IOMMU controller on the
Samsung Exynos4 platform, and Arnd's suggestions, I've decided to start
working on a redesign of the dma-mapping implementation for the ARM
architecture. The goal is to add support for IOMMU in the way preferred
by the community :)

Some of the ideas about merging the dma-mapping API and the IOMMU API
come from the following threads:
http://www.spinics.net/lists/linux-media/msg31453.html
http://www.spinics.net/lists/arm-kernel/msg122552.html
http://www.spinics.net/lists/arm-kernel/msg124416.html

They were also discussed at the Linaro memory management meeting at UDS
(Budapest, 9-12 May).

I've finally managed to clean up my work a bit and present the initial,
very proof-of-concept version of the patches that were ready just before
the Linaro meeting.

What has been implemented:

1. Introduced arm_dma_ops

dma_map_ops from include/linux/dma-mapping.h suffers from the following
limitations:
- lack of start address for sync operations
- lack of write-combine methods
- lack of mmap to user-space methods
- lack of map_single method

For the initial version I've decided to use custom arm_dma_ops.
Extending the common interface will take time; until then I wanted to
have something that already works.

dma_{alloc,free,mmap}_{coherent,writecombine} have been consolidated
into dma_{alloc,free,mmap}_attrib, as suggested at the Linaro meeting.
A new attribute for WRITE_COMBINE memory has been introduced.

2. Moved all inline ARM dma-mapping related operations to
arch/arm/mm/dma-mapping.c and put them as methods in the generic
arm_dma_ops structure. The dma-mapping.c code definitely needs cleanup,
but this is just a first step.

3. Added very initial IOMMU support. Right now it is limited to
dma_alloc_attrib, dma_free_attrib and dma_mmap_attrib. It has been
tested with the s5p-fimc driver on the Samsung Exynos4 platform.

4. Adapted the Samsung Exynos4 IOMMU driver to make use of the introduced
iommu_dma proposal.
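The consolidation described in item 1 can be illustrated with a small
user-space model in plain C (not kernel code). Every name below
(model_alloc_attrs(), MODEL_ATTR_WRITE_COMBINE and so on) is hypothetical,
and calloc() stands in for the real page allocator; the sketch only shows
the shape of the idea: one allocator taking an attribute word, with the
old _coherent and _writecombine entry points reduced to thin wrappers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical attribute bit standing in for the WRITE_COMBINE attribute
 * the RFC adds to include/linux/dma-attrs.h; the value is illustrative. */
#define MODEL_ATTR_WRITE_COMBINE (1u << 0)

/* Simplified CPU-side memory types. This model deliberately ignores the
 * ARMv6+ case where a coherent buffer is also bufferable. */
enum model_mem_type {
	MODEL_MEM_UNCACHED,
	MODEL_MEM_WRITECOMBINE,
};

struct model_buffer {
	void *cpu_addr;
	size_t size;
	enum model_mem_type type;
};

/* Single entry point: the attrs word, not the function name, selects
 * the mapping type. */
static bool model_alloc_attrs(struct model_buffer *buf, size_t size,
			      unsigned int attrs)
{
	assert(size > 0);
	buf->cpu_addr = calloc(1, size);
	if (!buf->cpu_addr)
		return false;
	buf->size = size;
	buf->type = (attrs & MODEL_ATTR_WRITE_COMBINE) ? MODEL_MEM_WRITECOMBINE
						       : MODEL_MEM_UNCACHED;
	return true;
}

static void model_free_attrs(struct model_buffer *buf)
{
	free(buf->cpu_addr);
	buf->cpu_addr = NULL;
	buf->size = 0;
}

/* The old per-type calls reduce to wrappers around the attrs variant. */
static bool model_alloc_coherent(struct model_buffer *buf, size_t size)
{
	return model_alloc_attrs(buf, size, 0);
}

static bool model_alloc_writecombine(struct model_buffer *buf, size_t size)
{
	return model_alloc_attrs(buf, size, MODEL_ATTR_WRITE_COMBINE);
}
```

New memory types then become new attribute bits rather than new
function names, which is what keeps the ops table small.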

This patch series contains only the patches for the common dma-mapping
part. There is also a patch that adds a driver for the Samsung IOMMU
controller on the Exynos4 platform. All required patches are available on:

git://git.infradead.org/users/kmpark/linux-2.6-samsung dma-mapping branch

Git web interface:
http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/dma-mapping


Future:

1. Add all missing operations for IOMMU mappings (map_single/page/sg,
sync_*)

2. Move sync_* operations into a separate function for better code
sharing between iommu and non-iommu dma-mapping code.

3. Split the dma bounce code from the non-bounce code into a separate set
of dma methods. Right now the dma-bounce code is compiled conditionally
and spread over arch/arm/mm/dma-mapping.c and arch/arm/common/dmabounce.c.

4. Merge dma_map_single with dma_map_page. I haven't investigated
deeply why they have separate implementations on ARM. If this is a
requirement, then dma_map_ops needs to be extended with another method.

5. Fix dma_alloc to unmap the buffer from the linear mapping.

6. Convert the IO address space management code from genalloc to a
simpler bitmap-based solution.

7. Resolve issues that might arise during discussion & review.
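Item 6 above can be sketched as a first-fit bitmap allocator over the
device (IO) address space, with one bit per 4 KiB IO page. This is a
hypothetical user-space model: the base address, the 1 MiB size and all
names are assumptions, and a real implementation would also need locking
and alignment handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 256 IO pages of 4 KiB starting at 0x20000000.
 * A non-zero base lets 0 serve as the "allocation failed" value. */
#define MODEL_IOVA_BASE  0x20000000u
#define MODEL_PAGE_SHIFT 12
#define MODEL_NR_PAGES   256

struct model_iova_space {
	uint8_t used[MODEL_NR_PAGES / 8];  /* one bit per IO page */
};

static bool iova_page_used(const struct model_iova_space *s, size_t i)
{
	return (s->used[i / 8] >> (i % 8)) & 1u;
}

static void iova_mark(struct model_iova_space *s, size_t start, size_t n,
		      bool used)
{
	for (size_t i = start; i < start + n; i++) {
		if (used)
			s->used[i / 8] |= (uint8_t)(1u << (i % 8));
		else
			s->used[i / 8] &= (uint8_t)~(1u << (i % 8));
	}
}

/* First fit: scan for npages consecutive clear bits. */
static uint32_t model_iova_alloc(struct model_iova_space *s, size_t npages)
{
	size_t run = 0;

	if (npages == 0 || npages > MODEL_NR_PAGES)
		return 0;
	for (size_t i = 0; i < MODEL_NR_PAGES; i++) {
		run = iova_page_used(s, i) ? 0 : run + 1;
		if (run == npages) {
			size_t start = i + 1 - npages;

			iova_mark(s, start, npages, true);
			return MODEL_IOVA_BASE + ((uint32_t)start << MODEL_PAGE_SHIFT);
		}
	}
	return 0;
}

static void model_iova_free(struct model_iova_space *s, uint32_t iova,
			    size_t npages)
{
	size_t start;

	assert(iova >= MODEL_IOVA_BASE);
	start = (iova - MODEL_IOVA_BASE) >> MODEL_PAGE_SHIFT;
	assert(start + npages <= MODEL_NR_PAGES);
	iova_mark(s, start, npages, false);
}
```

Freed ranges are found again by the same linear scan, so no separate
free list has to be maintained.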

Please note that this is a very early version of the patches, definitely
NOT intended for merging. I just wanted to make sure that the direction
is right and share the code with others who might want to cooperate on
dma-mapping improvements.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



Patch summary:

Marek Szyprowski (2):
  ARM: Move dma related inlines into arm_dma_ops methods
  ARM: initial proof-of-concept IOMMU mapper for DMA-mapping

 arch/arm/Kconfig                   |    1 +
 arch/arm/include/asm/device.h      |    3 +
 arch/arm/include/asm/dma-iommu.h   |   30 ++
 arch/arm/include/asm/dma-mapping.h |  653 +++++++++++------------------
 arch/arm/mm/dma-mapping.c          |  817 +++++++++++++++++++++++++++++++++---
 arch/arm/mm/vmregion.h             |    2 +-
 include/linux/dma-attrs.h          |    1 +
 7 files changed, 1033 insertions(+), 474 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-iommu.h

-- 
1.7.1.569.g6f426

* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-05-25  7:35 [RFC 0/2] ARM: DMA-mapping & IOMMU integration Marek Szyprowski
@ 2011-06-04 16:13 ` Ohad Ben-Cohen
  2011-06-06  6:09   ` Marek Szyprowski
  2011-06-13 14:12 ` KyongHo Cho
  2011-06-20 14:31 ` Subash Patel
  2 siblings, 1 reply; 20+ messages in thread
From: Ohad Ben-Cohen @ 2011-06-04 16:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 25, 2011 at 10:35 AM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> I've finally managed to clean up a bit my works and present the initial,
> very proof-of-concept version of patches that were ready just before
> Linaro meeting.

Btw, none of the patches seem to have made it through to lakml (only
the cover letter did).

Marek, did you get a "Message has a suspicious header" bounce too?

* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-04 16:13 ` Ohad Ben-Cohen
@ 2011-06-06  6:09   ` Marek Szyprowski
  0 siblings, 0 replies; 20+ messages in thread
From: Marek Szyprowski @ 2011-06-06  6:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On Saturday, June 04, 2011 6:13 PM Ohad Ben-Cohen wrote:

> On Wed, May 25, 2011 at 10:35 AM, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
> > I've finally managed to clean up a bit my works and present the initial,
> > very proof-of-concept version of patches that were ready just before
> > Linaro meeting.
> 
> Btw, none of the patches seem to have made it through to lakml (only
> the cover letter did).

Hmmm, it looks like I need to resend the patches.
 
> Marek, did you get a "Message has a suspicious header" bounce too ?

Yes, I got it but I have no idea what is wrong... Do you have any suggestions?

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-05-25  7:35 [RFC 0/2] ARM: DMA-mapping & IOMMU integration Marek Szyprowski
  2011-06-04 16:13 ` Ohad Ben-Cohen
@ 2011-06-13 14:12 ` KyongHo Cho
  2011-06-13 15:07   ` Arnd Bergmann
  2011-06-20 14:31 ` Subash Patel
  2 siblings, 1 reply; 20+ messages in thread
From: KyongHo Cho @ 2011-06-13 14:12 UTC (permalink / raw)
  To: linux-arm-kernel

I don't think dma_alloc_writecombine() is useful,
because it is actually no different from dma_alloc_coherent().
Moreover, no architecture implements it except ARM and AVR32,
and 'struct dma_map_ops' in <linux/dma-mapping.h> does not cover it.

The only difference between dma_alloc_writecombine() and
dma_alloc_coherent() is whether the caller needs to use a memory barrier
after calling dma_alloc_writecombine().

Of course, the mapping created by dma_alloc_writecombine()
may be more efficient for the CPU when it updates the DMA buffer.
But I don't think the mapping created by dma_alloc_coherent() is much
of a performance bottleneck.

I think it is better to remove dma_alloc_writecombine() and replace
all of it with dma_alloc_coherent().

In addition, IMHO, mapping into the user's address space is not a duty
of dma_map_ops. dma_mmap_*() is not suitable for a system that has an
IOMMU, because a DMA address is not semantically equal to its
corresponding physical address.

I think the DMA APIs of ARM must be changed drastically to support IOMMU,
because the IOMMU API does not manage virtual address space.

I have also been concerned about the IOMMU implementation in the ARM
architecture for several months.
But I found that there are some obstacles to overcome.

Best regards.

* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 14:12 ` KyongHo Cho
@ 2011-06-13 15:07   ` Arnd Bergmann
  2011-06-13 15:30     ` KyongHo Cho
  0 siblings, 1 reply; 20+ messages in thread
From: Arnd Bergmann @ 2011-06-13 15:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 13 June 2011 16:12:05 KyongHo Cho wrote:
> Of course, the mapping created by dma_alloc_writecombine()
> may be more efficient for CPU to update the DMA buffer.
> But I think mapping with dma_alloc_coherent() is not such a
> performance bottleneck.
> 
> I think it is better to remove dma_alloc_writecombine() and replace
> all of it with dma_alloc_coherent().

I'm sure that the graphics people will disagree with you on that.
Having the frame buffer mapped in write-combine mode is rather
important when you want to efficiently output videos from your
CPU.

> In addition, IMHO, mapping to user's address is not a duty of dma_map_ops.
> dma_mmap_*() is not suitable for a system that has IOMMU
> because a DMA address does not equal to its correspondent physical
> address semantically.
> 
> I think DMA APIs of ARM must be changed drastically to support IOMMU
> because IOMMU API does not manage virtual address space.

I can understand that there are arguments why mapping a DMA buffer into
user space doesn't belong in dma_map_ops, but I don't see how the
presence of an IOMMU is one of them.

The entire purpose of dma_map_ops is to hide from the user whether
you have an IOMMU or not, so that would be the main argument for
putting it in there, not against doing so.
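To make this concrete, here is a minimal user-space model of per-device
ops dispatch in the spirit of dma_map_ops/arm_dma_ops. The reduced ops
table, the toy bump-pointer IOMMU and all names are hypothetical; the
sketch is only meant to show that the driver-facing call is identical
whether or not an IOMMU sits behind the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t model_dma_addr_t;

/* Hypothetical, heavily reduced ops table: drivers call through it and
 * never learn which backend produced the bus address. */
struct model_dma_ops {
	model_dma_addr_t (*map_page)(void *ctx, uintptr_t phys, size_t size);
};

struct model_device {
	const struct model_dma_ops *ops;
	void *ctx;  /* backend private data */
};

/* Backend 1: no IOMMU, bus address == physical address. */
static model_dma_addr_t direct_map_page(void *ctx, uintptr_t phys, size_t size)
{
	(void)ctx;
	(void)size;
	return (model_dma_addr_t)phys;
}

/* Backend 2: a toy IOMMU handing out device addresses from a bump
 * pointer; a real one would also program its page tables with phys. */
struct toy_iommu {
	model_dma_addr_t next;
};

static model_dma_addr_t toy_iommu_map_page(void *ctx, uintptr_t phys,
					   size_t size)
{
	struct toy_iommu *mmu = ctx;
	model_dma_addr_t iova = mmu->next;

	(void)phys;  /* would be written into the IOMMU page table */
	mmu->next += (model_dma_addr_t)((size + 4095) & ~(size_t)4095);
	return iova;
}

static const struct model_dma_ops direct_ops = { .map_page = direct_map_page };
static const struct model_dma_ops toy_iommu_ops = { .map_page = toy_iommu_map_page };

/* What a driver calls: the same code path for both kinds of device. */
static model_dma_addr_t model_dma_map_page(struct model_device *dev,
					   uintptr_t phys, size_t size)
{
	assert(dev->ops && dev->ops->map_page);
	return dev->ops->map_page(dev->ctx, phys, size);
}
```

An mmap method added to the same table would be dispatched the same way,
so the backend that created the mapping also decides how to expose it.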

	Arnd

* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 15:07   ` Arnd Bergmann
@ 2011-06-13 15:30     ` KyongHo Cho
  2011-06-13 15:40       ` Catalin Marinas
                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: KyongHo Cho @ 2011-06-13 15:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hi.

On Tue, Jun 14, 2011 at 12:07 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> I'm sure that the graphics people will disagree with you on that.
> Having the frame buffer mapped in write-combine mode is rather
> important when you want to efficiently output videos from your
> CPU.
>
I agree with you.
But I am talking about dma_alloc_writecombine() on ARM.
You can see that only ARM and AVR32 implement it, and there are few
drivers which use it.
No function in dma_map_ops corresponds to dma_alloc_writecombine().
That's why Marek tried to add 'alloc_writecombine' to dma_map_ops.

> I can understand that there are arguments why mapping a DMA buffer into
> user space doesn't belong into dma_map_ops, but I don't see how the
> presence of an IOMMU is one of them.
>
> The entire purpose of dma_map_ops is to hide from the user whether
> you have an IOMMU or not, so that would be the main argument for
> putting it in there, not against doing so.
>
I also understand why dma_map_ops would map a buffer into user space.
Mapping into device and user space at the same time, or in one simple
step, may look good.
But I think mapping to user space must be driver-specific.
Moreover, the kernel already provides various ways to map physical
memory to user space.
And I think that remapping a DMA address, which lives in the device
address space, to user space is not a good idea,
because a DMA address is not semantically the same as a physical
address when IOMMU features are in use.

* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 15:30     ` KyongHo Cho
@ 2011-06-13 15:40       ` Catalin Marinas
  2011-06-13 16:00         ` [Linaro-mm-sig] " KyongHo Cho
  2011-06-13 15:46       ` Arnd Bergmann
  2011-06-14  7:46       ` Marek Szyprowski
  2 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2011-06-13 15:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 14, 2011 at 12:30:44AM +0900, KyongHo Cho wrote:
> On Tue, Jun 14, 2011 at 12:07 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > I'm sure that the graphics people will disagree with you on that.
> > Having the frame buffer mapped in write-combine mode is rather
> > important when you want to efficiently output videos from your
> > CPU.
>
> I agree with you.
> But I am discussing about dma_alloc_writecombine() in ARM.
> You can see that only ARM and AVR32 implement it and there are few
> drivers which use it.
> No function in dma_map_ops corresponds to dma_alloc_writecombine().
> That's why Marek tried to add 'alloc_writecombine' to dma_map_ops.

FWIW, on ARMv6 and later hardware, the dma_alloc_coherent() provides
writecombine memory (i.e. Normal Noncacheable), so no need for
dma_alloc_writecombine(). On earlier architectures it is creating
Strongly Ordered mappings (no writecombine).

-- 
Catalin

* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 15:30     ` KyongHo Cho
  2011-06-13 15:40       ` Catalin Marinas
@ 2011-06-13 15:46       ` Arnd Bergmann
  2011-06-13 15:58         ` [Linaro-mm-sig] " KyongHo Cho
  2011-06-14  7:46       ` Marek Szyprowski
  2 siblings, 1 reply; 20+ messages in thread
From: Arnd Bergmann @ 2011-06-13 15:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday 13 June 2011 17:30:44 KyongHo Cho wrote:
> On Tue, Jun 14, 2011 at 12:07 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > I'm sure that the graphics people will disagree with you on that.
> > Having the frame buffer mapped in write-combine mode is rather
> > important when you want to efficiently output videos from your
> > CPU.
> >
> I agree with you.
> But I am discussing about dma_alloc_writecombine() in ARM.
> You can see that only ARM and AVR32 implement it and there are few
> drivers which use it.
> No function in dma_map_ops corresponds to dma_alloc_writecombine().
> That's why Marek tried to add 'alloc_writecombine' to dma_map_ops.

Yes, and I think Marek's patch is really necessary. The reason we
need dma_alloc_writecombine on ARM is because the page attributes in
the kernel need to match the ones in user space, while other
architectures either handle the writecombine flag outside of the
page table or can have multiple conflicting mappings.

I suspect the reason AVR32 needs it is to share device drivers
with ARM.

> > I can understand that there are arguments why mapping a DMA buffer into
> > user space doesn't belong into dma_map_ops, but I don't see how the
> > presence of an IOMMU is one of them.
> >
> > The entire purpose of dma_map_ops is to hide from the user whether
> > you have an IOMMU or not, so that would be the main argument for
> > putting it in there, not against doing so.
> >
> I also understand the reasons why dma_map_ops maps a buffer into user space.
> Mapping in device and user space at the same time or in a simple
> approach may look good.
> But I think mapping to user must be and driver-specific.
> Moreover, kernel already provides various ways to map physical memory
> to user space.

I believe the idea of providing dma_mmap_... is to ensure that the
page attributes are not conflicting and the DMA code is the place
that decides on the page attributes for the kernel mapping, so no
other place in the kernel can really know what it should be in user
space.

> And I think that remapping DMA address that is in device address space
> to user space is not a good idea
> because DMA address is not same to physical address semantically if
> features of IOMMU are implemented.

I'm totally not following this argument. This has nothing to do with IOMMU
or not. If you have an IOMMU, the dma code will know where the pages are
anyway, so it can always map them into user space. The dma code might
have an easier way to do it other than following the page tables.

	Arnd

* [Linaro-mm-sig] [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 15:46       ` Arnd Bergmann
@ 2011-06-13 15:58         ` KyongHo Cho
  0 siblings, 0 replies; 20+ messages in thread
From: KyongHo Cho @ 2011-06-13 15:58 UTC (permalink / raw)
  To: linux-arm-kernel

> I'm totally not following this argument. This has nothing to do with IOMMU
> or not. If you have an IOMMU, the dma code will know where the pages are
> anyway, so it can always map them into user space. The dma code might
> have an easier way to do it other than following the page tables.
>
Ah. Sorry for that. I mixed dma_alloc_* up with dma_map_*.
Now I see why mmap_* in dma_map_ops is required.
You mean that nothing but the DMA API knows which pages will be mapped
to user space.
Thanks anyway.

KyongHo.

* [Linaro-mm-sig] [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 15:40       ` Catalin Marinas
@ 2011-06-13 16:00         ` KyongHo Cho
  2011-06-13 17:55           ` Michael K. Edwards
  2011-06-13 18:01           ` Catalin Marinas
  0 siblings, 2 replies; 20+ messages in thread
From: KyongHo Cho @ 2011-06-13 16:00 UTC (permalink / raw)
  To: linux-arm-kernel

> FWIW, on ARMv6 and later hardware, the dma_alloc_coherent() provides
> writecombine memory (i.e. Normal Noncacheable), so no need for
> dma_alloc_writecombine(). On earlier architectures it is creating
> Strongly Ordered mappings (no writecombine).
>
Thanks.

Do you mean that dma_alloc_coherent() and dma_alloc_writecombine() are
no different on ARM, except for some additional features of
dma_alloc_coherent()?

* [Linaro-mm-sig] [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 16:00         ` [Linaro-mm-sig] " KyongHo Cho
@ 2011-06-13 17:55           ` Michael K. Edwards
  2011-06-13 18:54             ` Jesse Barnes
  2011-06-13 18:01           ` Catalin Marinas
  1 sibling, 1 reply; 20+ messages in thread
From: Michael K. Edwards @ 2011-06-13 17:55 UTC (permalink / raw)
  To: linux-arm-kernel

The need to allocate pages for "write combining" access goes deeper
than anything to do with DMA or IOMMUs.  Please keep "write combine"
distinct from "coherent" in the allocation/mapping APIs.

Write-combining is a special case because it's an end-to-end
requirement, usually architecturally invisible, and getting it to
happen requires a very specific combination of mappings and code.
There's a good explanation here of the requirements on some Intel
implementations of the x86 architecture:
http://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers/
.  As I understand it, similar considerations apply on at least some
ARMv7 implementations, with NEON multi-register load/store operations
taking the place of MOVNTDQ.  (See
http://www.arm.com/files/pdf/A8_Paper.pdf for instance; although I
don't think there's enough detail about the conditions under which "if
the full cache line is written, the Level-2 line is simply marked
dirty and no external memory requests are required.")

As far as I can tell, there is not yet any way to get real
cache-bypassing write-combining from userland in a mainline kernel,
for x86/x86_64 or ARM.  I have been able to do it from inside a driver
on x86, including in an ISR with some fixes to the kernel's FPU
context save/restore code (patch attached, if you're curious);
otherwise I haven't yet seen write-combining in operation on Linux.
The code that needs to bypass the cache is part of a SoC silicon
erratum workaround supplied by Intel.  It didn't work as delivered --
it oopsed the kernel -- but is now shipping inside our product, and no
problems have been reported from QA or the field.  So I'm fairly sure
that the changes I made are effective.

I am not expert in this area; I was just forced to learn something
about it in order to make a product work.  My assertion that "there's
no way to do it yet" is almost certainly wrong.  I am hoping and
expecting to be immediately contradicted, with a working code example
and benchmarks that show that cache lines are not being fetched,
clobbered, and stored again, with the latencies hidden inside the
cache architecture.  :-)  (Seriously: there are four bits in the
Cortex-A8's "L2 Cache Auxiliary Control Register" that control various
aspects of this mechanism, and if you don't have a fairly good
explanation of which bits do and don't affect your benchmark, then I
contend that the job isn't done.  I don't begin to understand the
equivalent for the multi-core A9 I'm targeting next.)

If some kind person doesn't help me see the error of my ways, I'm
going to have to figure it out for myself on ARM in the next couple of
months, this time for performance reasons rather than to work around
silicon errata.  Unfortunately, I do not expect it to be particularly
low-hanging fruit.  I expect to switch to the hard-float ABI first
(the only remaining obstacle being a couple of TI-supplied binary-only
libraries).  That might provide enough of a system-level performance
win (by allowing the compiler to reorder fetches to NEON registers
across function/method calls) to obviate the need.

Cheers,
- Michael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0011-Clean-up-task-FPU-state-thoroughly-during-exec-and-p.patch
Type: application/octet-stream
Size: 2039 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110613/7461e0e6/attachment.obj>

* [Linaro-mm-sig] [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 16:00         ` [Linaro-mm-sig] " KyongHo Cho
  2011-06-13 17:55           ` Michael K. Edwards
@ 2011-06-13 18:01           ` Catalin Marinas
  1 sibling, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2011-06-13 18:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 13, 2011 at 05:00:16PM +0100, KyongHo Cho wrote:
> > FWIW, on ARMv6 and later hardware, the dma_alloc_coherent() provides
> > writecombine memory (i.e. Normal Noncacheable), so no need for
> > dma_alloc_writecombine(). On earlier architectures it is creating
> > Strongly Ordered mappings (no writecombine).
>
> Do you mean that dma_alloc_coherent() and dma_alloc_writecombine() are
> not different
> except some additional features of dma_alloc_coherent() in ARM?
 
When CONFIG_ARM_DMA_MEM_BUFFERABLE is enabled (by default on ARMv7 and
ARMv6, with some exceptions because of hardware issues), the resulting
mapping for both coherent and writecombine is the same. In both cases the
mapping is done as L_PTE_MT_BUFFERABLE, which is what you want with
writecombine. You can check the pgprot_writecombine() and
pgprot_dmacoherent() macros in asm/pgtable.h.
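The behaviour described above can be restated as a tiny selection
function. This is a hypothetical model, not the kernel's pgprot encoding:
the enum values and function names are made up, and the boolean stands in
for the Kconfig option.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified ARM memory types; not the kernel's L_PTE_MT_* encoding. */
enum wc_model_mt {
	WCM_STRONGLY_ORDERED,  /* uncached, no write combining */
	WCM_BUFFERABLE,        /* Normal Noncacheable: writes may be combined */
};

/* Models pgprot_dmacoherent(): the result depends on whether the
 * bufferable-DMA option is enabled. */
static enum wc_model_mt wcm_dmacoherent(bool dma_mem_bufferable)
{
	return dma_mem_bufferable ? WCM_BUFFERABLE : WCM_STRONGLY_ORDERED;
}

/* Models pgprot_writecombine(): always bufferable. */
static enum wc_model_mt wcm_writecombine(void)
{
	return WCM_BUFFERABLE;
}
```

Only when the option is disabled do the two allocators produce
different mappings in this model.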

-- 
Catalin

* [Linaro-mm-sig] [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 17:55           ` Michael K. Edwards
@ 2011-06-13 18:54             ` Jesse Barnes
  2011-06-14 18:15               ` Michael K. Edwards
  0 siblings, 1 reply; 20+ messages in thread
From: Jesse Barnes @ 2011-06-13 18:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 13 Jun 2011 10:55:59 -0700
"Michael K. Edwards" <m.k.edwards@gmail.com> wrote:

> As far as I can tell, there is not yet any way to get real
> cache-bypassing write-combining from userland in a mainline kernel,
> for x86/x86_64 or ARM. 

Well only if things are really broken.  sysfs exposes _wc resource
files to allow userland drivers to map a given PCI BAR using write
combining, if the underlying platform supports it.

Similarly, userland mappings of GEM objects through the GTT are supposed
to be write-combined, though I need to verify this (we've had trouble
with it in the past).
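For reference, a userland driver would use such a sysfs file roughly as
sketched below. The function name and the example PCI address are
hypothetical; the _wc file is typically only present for prefetchable
BARs on platforms that can map them write-combined, so the sketch treats
a missing file as an ordinary failure.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first len bytes of a PCI BAR through its write-combining sysfs
 * file, e.g. a hypothetical "/sys/bus/pci/devices/0000:01:00.0/resource0_wc".
 * Returns NULL if the file is missing or mmap() fails. */
static void *map_bar_wc(const char *path, size_t len)
{
	void *p;
	int fd;

	assert(path != NULL && len > 0);
	fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);  /* the mapping stays valid after the descriptor is closed */
	return p == MAP_FAILED ? NULL : p;
}
```

The caller unmaps the region with munmap() when done.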

-- 
Jesse Barnes, Intel Open Source Technology Center

* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 15:30     ` KyongHo Cho
  2011-06-13 15:40       ` Catalin Marinas
  2011-06-13 15:46       ` Arnd Bergmann
@ 2011-06-14  7:46       ` Marek Szyprowski
  2 siblings, 0 replies; 20+ messages in thread
From: Marek Szyprowski @ 2011-06-14  7:46 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On Monday, June 13, 2011 5:31 PM KyongHo Cho wrote:

> On Tue, Jun 14, 2011 at 12:07 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > I'm sure that the graphics people will disagree with you on that.
> > Having the frame buffer mapped in write-combine mode is rather
> > important when you want to efficiently output videos from your
> > CPU.
> >
> I agree with you.
> But I am discussing about dma_alloc_writecombine() in ARM.
> You can see that only ARM and AVR32 implement it and there are few
> drivers which use it.
> No function in dma_map_ops corresponds to dma_alloc_writecombine().
> That's why Marek tried to add 'alloc_writecombine' to dma_map_ops.

I also introduced dma_alloc_attrs() to allow other combinations of
memory types and mappings in the future. For example, in the case of an
IOMMU the driver might want to call a function that allocates a buffer
that will 'work best with the hardware'. This means that the buffer might
be built from pages larger than 4KiB, aligned to particular IOMMU
requirements. Handling such requirements is definitely not a part of the
driver; only the particular implementation of dma-mapping will know them.
The driver may just provide some hints about how the memory will be used.
The ones I'm particularly thinking of are different types of caching.
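A sketch of the kind of decision that only the dma-mapping
implementation can make: splitting a buffer into the largest IOMMU pages
that fit. It assumes an IOMMU supporting 4 KiB, 64 KiB and 1 MiB pages
(the ARM short-descriptor sizes) and an IO address that starts 1 MiB
aligned; the function name is hypothetical, and physical contiguity of
each chunk is taken for granted.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed IOMMU page sizes, largest first: 1 MiB, 64 KiB, 4 KiB. */
static const size_t model_pgsizes[] = { 1u << 20, 1u << 16, 1u << 12 };
#define MODEL_NR_PGSIZES (sizeof(model_pgsizes) / sizeof(model_pgsizes[0]))

/* Greedily split a buffer of `size` bytes (a multiple of 4 KiB) into IOMMU
 * pages: at each offset take the largest page that still fits and keeps
 * the offset naturally aligned. counts[i] gets the number of pages of
 * model_pgsizes[i]; the return value is the total number of pages. */
static size_t model_split_for_iommu(size_t size, size_t counts[])
{
	size_t offset = 0, total = 0;

	assert(size % model_pgsizes[MODEL_NR_PGSIZES - 1] == 0);
	for (size_t i = 0; i < MODEL_NR_PGSIZES; i++)
		counts[i] = 0;

	while (offset < size) {
		for (size_t i = 0; i < MODEL_NR_PGSIZES; i++) {
			size_t pg = model_pgsizes[i];

			if (pg <= size - offset && offset % pg == 0) {
				counts[i]++;
				offset += pg;
				total++;
				break;
			}
		}
	}
	return total;
}
```

For a 1 MiB + 128 KiB + 8 KiB buffer this gives one 1 MiB, two 64 KiB
and two 4 KiB pages: 5 IOMMU entries instead of 290 with 4 KiB pages alone.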

> > I can understand that there are arguments why mapping a DMA buffer into
> > user space doesn't belong into dma_map_ops, but I don't see how the
> > presence of an IOMMU is one of them.
> >
> > The entire purpose of dma_map_ops is to hide from the user whether
> > you have an IOMMU or not, so that would be the main argument for
> > putting it in there, not against doing so.
>
> I also understand the reasons why dma_map_ops maps a buffer into user space.
> Mapping in device and user space at the same time or in a simple
> approach may look good.
> But I think mapping to user must be and driver-specific.
> Moreover, kernel already provides various ways to map physical memory
> to user space.
> And I think that remapping DMA address that is in device address space
> to user space is not a good idea
> because DMA address is not same to physical address semantically if
> features of IOMMU are implemented.

Mapping DMA buffers to user space is a common feature of various
APIs (framebuffer, v4l2, alsa). In most cases the kernel virtual address
is not even required by such drivers, because they just want to expose
the buffer content to user space. It would be great if dma-mapping allowed
allocating a coherent buffer without mapping it into kernel space
at all. Kernel virtual space is really limited. For some multimedia
processing (like capturing & encoding an HD movie from a camera sensor) we
might need buffers with a total size of 128MB or even more.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

* [Linaro-mm-sig] [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-13 18:54             ` Jesse Barnes
@ 2011-06-14 18:15               ` Michael K. Edwards
  2011-06-14 18:21                 ` Jesse Barnes
  0 siblings, 1 reply; 20+ messages in thread
From: Michael K. Edwards @ 2011-06-14 18:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 13, 2011 at 11:54 AM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> Well only if things are really broken.  sysfs exposes _wc resource
> files to allow userland drivers to map a given PCI BAR using write
> combining, if the underlying platform supports it.

Mmm, I hadn't spotted that; that is useful, at least as sample code.
Doesn't do me any good directly, though; I'm not on a PCI device, I'm
on a SoC.  And what I need to do is to allocate normal memory through
an uncacheable write-combining page table entry (with certainty that
it is not aliased by a cacheable entry for the same physical memory),
and use it for interchange of data (GPU assets, compressed video) with
other on-chip cores.  (Or with off-chip PCI devices which use DMA to
transfer data to/from these buffers and then interrupt the CPU to
notify it to rotate them.)

What doesn't seem to be straightforward to do from userland is to
allocate pages that are locked to physical memory and mapped for
write-combining.  The device driver shouldn't have to mediate their
allocation, just map to a physical address (or set up an IOMMU entry,
I suppose) and pass that to the hardware that needs it.  Typical
userland code that could use such a mechanism would be the Qt/OpenGL
back end (which needs to store decompressed images and other
pre-rendered assets in GPU-ready buffers) and media pipelines.

> Similarly, userland mappings of GEM objects through the GTT are supposed
> to be write combined, though I need to verify this (we've had trouble
> with it in the past).

Also a nice source of sample code; though, again, I don't want this to
be driver-specific.  I might want a stage in my media pipeline that
uses the GPU to perform, say, lens distortion correction.  I shouldn't
have to go through contortions to use the same buffers from the GPU
and the video capture device.  The two devices are likely to have
their own variants on scatter-gather DMA, with a circularly linked
list of block descriptors with ownership bits and all that jazz; but
the actual data buffers should be generic, and the userland pipeline
setup code should just allocate them (presumably as contiguous regions
in a write-combining hugepage) and feed them to the plumbing.
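The split described above, with device-specific descriptor rings on one side and generic data buffers on the other, can be sketched in plain C. This models a hypothetical ring of block descriptors whose OWN bit records whether the CPU or the device currently owns each buffer; an array-based ring stands in for the circularly linked list, and the field names and bit layout are invented for illustration and do not match any real device. The descriptors only point at the data buffers, which can come from any allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_OWN_DEVICE 0x1u   /* hypothetical: set while the device owns the buffer */
#define RING_SIZE 4

struct desc {
    uintptr_t buf;             /* generic data buffer (a bus address on real hardware) */
    uint32_t len;
    uint32_t flags;
};

struct ring {
    struct desc d[RING_SIZE];  /* wraps: slot RING_SIZE - 1 is followed by slot 0 */
    unsigned head;             /* next slot the CPU hands to the device */
    unsigned tail;             /* next slot the CPU expects back */
};

/* CPU hands a buffer to the device; fails if the next slot is still in use. */
static int ring_give(struct ring *r, void *buf, uint32_t len)
{
    struct desc *d = &r->d[r->head];
    if (d->buf != 0 || (d->flags & DESC_OWN_DEVICE))
        return -1;             /* slot not yet reclaimed: ring full */
    d->buf = (uintptr_t)buf;
    d->len = len;
    d->flags |= DESC_OWN_DEVICE;   /* on real hardware, a write barrier goes before this */
    r->head = (r->head + 1) % RING_SIZE;
    return 0;
}

/* Simulated device: completes slot i by clearing its OWN bit. */
static void device_complete(struct ring *r, unsigned i)
{
    r->d[i].flags &= ~DESC_OWN_DEVICE;
}

/* CPU reclaims the next completed buffer, or returns NULL if none is ready. */
static void *ring_take(struct ring *r)
{
    struct desc *d = &r->d[r->tail];
    if (d->buf == 0 || (d->flags & DESC_OWN_DEVICE))
        return NULL;
    void *buf = (void *)d->buf;
    d->buf = 0;
    r->tail = (r->tail + 1) % RING_SIZE;
    return buf;
}
```

Nothing in the ring logic cares where the data buffers came from, which is the property the pipeline setup code would rely on.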

Cheers,
- Michael

* [Linaro-mm-sig] [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-14 18:15               ` Michael K. Edwards
@ 2011-06-14 18:21                 ` Jesse Barnes
  2011-06-14 19:10                   ` Zach Pfeffer
  2011-06-14 20:59                   ` Michael K. Edwards
  0 siblings, 2 replies; 20+ messages in thread
From: Jesse Barnes @ 2011-06-14 18:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 14 Jun 2011 11:15:38 -0700
"Michael K. Edwards" <m.k.edwards@gmail.com> wrote:
> What doesn't seem to be straightforward to do from userland is to
> allocate pages that are locked to physical memory and mapped for
> write-combining.  The device driver shouldn't have to mediate their
> allocation, just map to a physical address (or set up an IOMMU entry,
> I suppose) and pass that to the hardware that needs it.  Typical
> userland code that could use such a mechanism would be the Qt/OpenGL
> back end (which needs to store decompressed images and other
> pre-rendered assets in GPU-ready buffers) and media pipelines.

We try to avoid allowing userspace to pin arbitrary buffers though.  So
on the gfx side, userspace can allocate buffers, but they're only
actually pinned when some operation is performed on them (e.g. they're
referenced in a command buffer or used for a mode set operation).
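Allocating freely but pinning only while an operation references the buffer amounts to reference-counted pinning. A minimal model, assuming a hypothetical buffer object with a pin count; the names are illustrative and are not the GEM API.

```c
#include <assert.h>
#include <stdbool.h>

struct bo {
    unsigned pin_count;   /* in-flight operations referencing the buffer */
    bool locked;          /* models "backing pages locked in memory" */
};

/* Called when a command buffer or mode set references the object. */
static void bo_pin(struct bo *bo)
{
    if (bo->pin_count++ == 0)
        bo->locked = true;        /* first user: lock the pages */
}

/* Called when that operation retires. */
static void bo_unpin(struct bo *bo)
{
    assert(bo->pin_count > 0);
    if (--bo->pin_count == 0)
        bo->locked = false;       /* last user gone: pages may move or be evicted */
}

/* Only unpinned objects may be evicted or migrated. */
static bool bo_can_evict(const struct bo *bo)
{
    return bo->pin_count == 0;
}
```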

Something like ION or GEM can provide the basic alloc & map API, but
the platform code still has to deal with grabbing hunks of memory,
making them uncached or write combine, and mapping them to app space
without conflicts.

> Also a nice source of sample code; though, again, I don't want this to
> be driver-specific.  I might want a stage in my media pipeline that
> uses the GPU to perform, say, lens distortion correction.  I shouldn't
> have to go through contortions to use the same buffers from the GPU
> and the video capture device.  The two devices are likely to have
> their own variants on scatter-gather DMA, with a circularly linked
> list of block descriptors with ownership bits and all that jazz; but
> the actual data buffers should be generic, and the userland pipeline
> setup code should just allocate them (presumably as contiguous regions
> in a write-combining hugepage) and feed them to the plumbing.

Totally agree.  That's one reason I don't think enhancing the DMA
mapping API in the kernel is a complete solution.  Sure, the platform
code needs to be able to map buffers to devices and use any available
IOMMUs, but we still need a userspace API for all of that, with its
associated changes to the CPU MMU handling.

-- 
Jesse Barnes, Intel Open Source Technology Center

* [Linaro-mm-sig] [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-14 18:21                 ` Jesse Barnes
@ 2011-06-14 19:10                   ` Zach Pfeffer
  2011-06-14 20:59                   ` Michael K. Edwards
  1 sibling, 0 replies; 20+ messages in thread
From: Zach Pfeffer @ 2011-06-14 19:10 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 June 2011 13:21, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> On Tue, 14 Jun 2011 11:15:38 -0700
> "Michael K. Edwards" <m.k.edwards@gmail.com> wrote:
>> What doesn't seem to be straightforward to do from userland is to
>> allocate pages that are locked to physical memory and mapped for
>> write-combining.  The device driver shouldn't have to mediate their
>> allocation, just map to a physical address (or set up an IOMMU entry,
>> I suppose) and pass that to the hardware that needs it.  Typical
>> userland code that could use such a mechanism would be the Qt/OpenGL
>> back end (which needs to store decompressed images and other
>> pre-rendered assets in GPU-ready buffers) and media pipelines.
>
> We try to avoid allowing userspace to pin arbitrary buffers though.  So
> on the gfx side, userspace can allocate buffers, but they're only
> actually pinned when some operation is performed on them (e.g. they're
> referenced in a command buffer or used for a mode set operation).
>
> Something like ION or GEM can provide the basic alloc & map API, but
> the platform code still has to deal with grabbing hunks of memory,
> making them uncached or write combine, and mapping them to app space
> without conflicts.
>
>> Also a nice source of sample code; though, again, I don't want this to
>> be driver-specific.  I might want a stage in my media pipeline that
>> uses the GPU to perform, say, lens distortion correction.  I shouldn't
>> have to go through contortions to use the same buffers from the GPU
>> and the video capture device.  The two devices are likely to have
>> their own variants on scatter-gather DMA, with a circularly linked
>> list of block descriptors with ownership bits and all that jazz; but
>> the actual data buffers should be generic, and the userland pipeline
>> setup code should just allocate them (presumably as contiguous regions
>> in a write-combining hugepage) and feed them to the plumbing.
>
> Totally agree.  That's one reason I don't think enhancing the DMA
> mapping API in the kernel is a complete solution.  Sure, the platform
> code needs to be able to map buffers to devices and use any available
> IOMMUs, but we still need a userspace API for all of that, with its
> associated changes to the CPU MMU handling.

I haven't seen all the discussions, but it sounds like designing the
correct userspace abstraction first and then looking at how the kernel
needs to change (instead of the other way around) may bring some clarity
to things.

> --
> Jesse Barnes, Intel Open Source Technology Center
>
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig at lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
>

* [Linaro-mm-sig] [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-14 18:21                 ` Jesse Barnes
  2011-06-14 19:10                   ` Zach Pfeffer
@ 2011-06-14 20:59                   ` Michael K. Edwards
  1 sibling, 0 replies; 20+ messages in thread
From: Michael K. Edwards @ 2011-06-14 20:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 14, 2011 at 11:21 AM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> We try to avoid allowing userspace to pin arbitrary buffers though.  So
> on the gfx side, userspace can allocate buffers, but they're only
> actually pinned when some operation is performed on them (e.g. they're
> referenced in a command buffer or used for a mode set operation).

The issue isn't so much pinning; I don't really care if the physical
memory moves out from under me as long as the mappings are properly
updated in all the process page tables that share it and all the
hardware units that care.  But the mapping has to have the right cache
policy from the beginning, so that I get the important part of write
combining (the fill buffer allocation -- without bothering to load
contents from DRAM that are likely to be completely clobbered -- and
the cache-line-sized flush once it's filled).  In any case, supposedly
there are weird aliasing issues if you try to take a page that is
already mapped cacheable and remap it write-combine; and in the case
of shared pages, you'd need to look up all processes that have the
page mapped and alter their page tables, even if they're currently
running on other SMP cores.  Nasty.

Besides, I don't want little 4K pages; I want a hugepage with the
right cache policy, in which I can build a malloc pool (tcmalloc,
jemalloc, something like that) and allocate buffers for a variety of
purposes.  (I also want to use this to pass whole data structures,
like priority search trees built using offset pointers, among cores
that don't share a cache hierarchy or a cache coherency protocol.)
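Passing whole data structures between cores, or between processes that map the same region at different virtual addresses, works if the structures store offsets from the region base rather than raw pointers. A minimal sketch, assuming a hypothetical region with a trivial bump allocator standing in for tcmalloc/jemalloc; an ordinary memory array stands in for the write-combining hugepage, and all names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t off_ptr;          /* offset from region base; 0 means NULL */

struct region {
    unsigned char *base;           /* must be at least 8-byte aligned */
    size_t size;
    size_t used;                   /* bump pointer */
};

/* Bump-allocate n bytes aligned to 8; returns an offset, or 0 on failure. */
static off_ptr region_alloc(struct region *r, size_t n)
{
    size_t off = (r->used + 7) & ~(size_t)7;
    if (off == 0)
        off = 8;                   /* keep offset 0 reserved for NULL */
    if (off > r->size || n > r->size - off)
        return 0;
    r->used = off + n;
    return (off_ptr)off;
}

/* Translate an offset into a pointer valid in this mapping of the region. */
static void *region_ptr(const struct region *r, off_ptr p)
{
    return p ? r->base + p : NULL;
}

/* A list node that stays valid wherever the region is mapped. */
struct node {
    int value;
    off_ptr next;
};
```

A list built in one mapping can be copied or mapped at a different address and walked there with `region_ptr()` unchanged, which raw pointers would not allow.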

Presumably the privilege of write-combine buffer allocation would be
limited to processes that have been granted the appropriate
capability; but then that process should be able to share it with
others.  I would think the natural thing would be for the special-page
allocation API to return a file descriptor, which can then be passed
over local domain sockets and mmap()ed by as many processes as
necessary.  For many usage patterns, there will be no need for a
kernel virtual mapping; hardware wants physical addresses (or IOMMU
mappings) anyway.
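The file-descriptor model suggested above maps onto existing POSIX machinery: an fd can be sent over an AF_UNIX socket in an SCM_RIGHTS control message and then mmap()ed by the receiver. In this sketch a regular, already-unlinked temporary file is a stand-in for the hypothetical special-page allocator (no write-combining or physical contiguity is involved, and /tmp is assumed writable), and the fd is sent within one process over socketpair() for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 0;                      /* at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                             /* aligned control-message buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    memset(&u, 0, sizeof u);
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns it, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Stand-in for the hypothetical allocator: an unlinked temp file of 'size' bytes. */
static int alloc_shared_fd(size_t size)
{
    char path[] = "/tmp/wcbufXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The receiver gets a new descriptor for the same object, so data written through the sender's mapping is visible through its own mmap() of that descriptor.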

> Something like ION or GEM can provide the basic alloc & map API, but
> the platform code still has to deal with grabbing hunks of memory,
> making them uncached or write combine, and mapping them to app space
> without conflicts.

Absolutely.  Much like any other hugepage allocation, right?  Not
really something ION or GEM or any other device driver needs to be
involved in.  Except for alignment issues, I suppose; I haven't given
that much thought.

The part about setting up corresponding mappings to the same physical
addresses in the device's DMA mechanics is not buffer *allocation*,
it's buffer *registration*.  That's sort of like V4L2's "user pointer
I/O" mode, in which the userspace app allocates the buffers and uses
the QBUF ioctl to register them.  I see no reason why the enforcement
of minimum alignment and cache policy couldn't be done at buffer
registration time rather than region allocation time.
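Enforcing alignment and cache policy at registration time, in the spirit of V4L2 user-pointer I/O, could look roughly like the check below. Everything here is hypothetical: the structs, the policy enum, the choice of error codes and the 64 KiB alignment used in the example are placeholders, not an existing kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum cache_policy { CACHE_WB, CACHE_WC, CACHE_UNCACHED };

/* What the (hypothetical) device requires of buffers registered with it. */
struct dev_constraints {
    size_t min_align;              /* power of two */
    enum cache_policy policy;
};

/* What userspace presents when registering a buffer it allocated itself. */
struct user_buf {
    uintptr_t addr;
    size_t len;
    enum cache_policy policy;      /* policy of the region the buffer lives in */
};

/* Returns 0 if the buffer may be registered, or a negative errno. */
static int register_buffer(const struct dev_constraints *dc, const struct user_buf *ub)
{
    assert(dc->min_align && (dc->min_align & (dc->min_align - 1)) == 0);
    if (ub->len == 0)
        return -EINVAL;
    if ((ub->addr | ub->len) & (dc->min_align - 1))
        return -EINVAL;            /* start or length not suitably aligned */
    if (ub->policy != dc->policy)
        return -EPERM;             /* hypothetical: refuse a mismatched cache policy */
    return 0;                      /* real code would now set up the device mapping */
}
```

Checking at this point leaves the allocator free to hand out generic regions, while each device still gets to reject buffers it cannot use.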

Cheers,
- Michael

* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-05-25  7:35 [RFC 0/2] ARM: DMA-mapping & IOMMU integration Marek Szyprowski
  2011-06-04 16:13 ` Ohad Ben-Cohen
  2011-06-13 14:12 ` KyongHo Cho
@ 2011-06-20 14:31 ` Subash Patel
  2011-06-20 14:59   ` Marek Szyprowski
  2 siblings, 1 reply; 20+ messages in thread
From: Subash Patel @ 2011-06-20 14:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marek,

In function: 
dma_alloc_coherent()->arm_iommu_alloc_attrs()->__iommu_alloc_buffer()

I have following questions:

a) Before we come to this point, we would have enabled the SYSMMU in a call
to arm_iommu_init(). Shouldn't the SYSMMU be enabled after the call to
__iommu_alloc_buffer(), but before __iommu_create_mapping()? If
__iommu_alloc_buffer() fails, we don't disable the SYSMMU.
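The ordering proposed in (a), allocate first, then enable the SYSMMU, then create the mapping, with each failure undoing only what already succeeded, is the usual kernel goto-unwind pattern. A userspace sketch with stubbed steps; all function names here are hypothetical and do not correspond to the functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stubbed hardware state and failure injection for the sketch. */
static bool sysmmu_on;
static bool fail_alloc, fail_map;

static void *alloc_pages_stub(size_t size) { return fail_alloc ? NULL : malloc(size); }
static void sysmmu_enable(void)  { sysmmu_on = true; }
static void sysmmu_disable(void) { sysmmu_on = false; }
static int create_mapping_stub(void *pages) { (void)pages; return fail_map ? -1 : 0; }

/* Allocate, then enable the SYSMMU, then map; unwind in reverse on failure. */
static void *iommu_alloc_sketch(size_t size)
{
    void *pages = alloc_pages_stub(size);
    if (!pages)
        return NULL;               /* nothing enabled yet, nothing to undo */

    sysmmu_enable();
    if (create_mapping_stub(pages) != 0)
        goto err_disable;

    return pages;

err_disable:
    sysmmu_disable();              /* undo steps in reverse order */
    free(pages);
    return NULL;
}
```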

b) For huge buffer sizes, the pressure on the SYSMMU would be very high.
Can't we have an option for the driver to dictate the IOMMU page size
in such cases? Should it always be the size of system pages?
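To put numbers on that pressure: with a page size of P, a buffer of size S needs S/P translation entries. For a 128 MiB buffer (the size mentioned earlier in the thread), using 64 KiB and 1 MiB purely as example larger page sizes (which sizes a given SYSMMU supports is an assumption here):

```latex
\frac{128\,\mathrm{MiB}}{4\,\mathrm{KiB}} = \frac{2^{27}}{2^{12}} = 2^{15} = 32768, \qquad
\frac{128\,\mathrm{MiB}}{64\,\mathrm{KiB}} = \frac{2^{27}}{2^{16}} = 2^{11} = 2048, \qquad
\frac{128\,\mathrm{MiB}}{1\,\mathrm{MiB}} = \frac{2^{27}}{2^{20}} = 2^{7} = 128.
```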

Regards,
Subash
SISO-SLG

On 05/25/2011 01:05 PM, Marek Szyprowski wrote:
> Hello,
>
> Following the discussion about the driver for IOMMU controller for
> Samsung Exynos4 platform and Arnd's suggestions I've decided to start
> working on redesign of dma-mapping implementation for ARM architecture.
> The goal is to add support for IOMMU in the way preferred by the
> community :)
>
> Some of the ideas about merging dma-mapping api and iommu api come from
> the following threads:
> http://www.spinics.net/lists/linux-media/msg31453.html
> http://www.spinics.net/lists/arm-kernel/msg122552.html
> http://www.spinics.net/lists/arm-kernel/msg124416.html
>
> They were also discussed on Linaro memory management meeting at UDS
> (Budapest 9-12 May).
>
> I've finally managed to clean up my work a bit and present the initial,
> very proof-of-concept version of patches that were ready just before
> Linaro meeting.
>
> What has been implemented:
>
> 1. Introduced arm_dma_ops
>
> dma_map_ops from include/linux/dma-mapping.h suffers from the following
> limitations:
> - lack of start address for sync operations
> - lack of write-combine methods
> - lack of mmap to user-space methods
> - lack of map_single method
>
> For the initial version I've decided to use custom arm_dma_ops.
> Extending the common interface will take time; until then I wanted to have
> something already working.
>
> dma_{alloc,free,mmap}_{coherent,writecombine} have been consolidated
> into dma_{alloc,free,mmap}_attrib, as was suggested at the Linaro
> meeting. A new attribute for WRITE_COMBINE memory has been introduced.
>
> 2. moved all inline ARM dma-mapping related operations to
> arch/arm/mm/dma-mapping.c and put them as methods in generic arm_dma_ops
> structure. The dma-mapping.c code definitely needs cleanup, but this is
> just a first step.
>
> 3. Added very initial IOMMU support. Right now it is limited only to
> dma_alloc_attrib, dma_free_attrib and dma_mmap_attrib. It has been
> tested with s5p-fimc driver on Samsung Exynos4 platform.
>
> 4. Adapted Samsung Exynos4 IOMMU driver to make use of the introduced
> iommu_dma proposal.
>
> This patch series contains only patches for common dma-mapping part.
> There is also a patch that adds driver for Samsung IOMMU controller on
> Exynos4 platform. All required patches are available on:
>
> git://git.infradead.org/users/kmpark/linux-2.6-samsung dma-mapping branch
>
> Git web interface:
> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/dma-mapping
>
>
> Future:
>
> 1. Add all missing operations for IOMMU mappings (map_single/page/sg,
> sync_*)
>
> 2. Move sync_* operations into separate function for better code sharing
> between iommu and non-iommu dma-mapping code
>
> 3. Splitting out dma bounce code from non-bounce into separate set of
> dma methods. Right now dma-bounce code is compiled conditionally and
> spread over arch/arm/mm/dma-mapping.c and arch/arm/common/dmabounce.c.
>
> 4. Merging dma_map_single with dma_map_page. I haven't investigated
> deeply why they have separate implementation on ARM. If this is a
> requirement then dma_map_ops need to be extended with another method.
>
> 5. Fix dma_alloc to unmap from linear mapping.
>
> 6. Convert IO address space management code from gen-alloc to some
> simpler bitmap based solution.
>
> 7. resolve issues that might arise during discussion & comments
>
> Please note that this is a very early version of patches, definitely NOT
> intended for merging. I just wanted to make sure that the direction is
> right and share the code with others that might want to cooperate on
> dma-mapping improvements.
>
> Best regards
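Item 6 in the quoted plan, replacing gen-alloc with a simpler bitmap-based manager for the IO address space, could look roughly like this first-fit allocator. It is a userspace sketch under assumed parameters (a 16 MiB IO window at a hypothetical base of 0x20000000, 4 KiB granularity, no locking); kernel code would more likely use the existing bitmap helpers such as bitmap_find_next_zero_area() than these hand-written loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IOVA_BASE     0x20000000u      /* hypothetical start of the IO window */
#define IOVA_SHIFT    12               /* 4 KiB granularity */
#define IOVA_PAGES    4096             /* 16 MiB window */
#define BITS_PER_WORD 64

static uint64_t iova_bitmap[IOVA_PAGES / BITS_PER_WORD];   /* 1 bit per page, set = used */

static int bit_test(size_t i)   { return (iova_bitmap[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1; }
static void bit_set(size_t i)   { iova_bitmap[i / BITS_PER_WORD] |= (uint64_t)1 << (i % BITS_PER_WORD); }
static void bit_clear(size_t i) { iova_bitmap[i / BITS_PER_WORD] &= ~((uint64_t)1 << (i % BITS_PER_WORD)); }

/* First fit: find 'count' consecutive free pages; returns an IO address, or 0. */
static uint32_t iova_alloc(size_t count)
{
    size_t run = 0;
    if (count == 0 || count > IOVA_PAGES)
        return 0;
    for (size_t i = 0; i < IOVA_PAGES; i++) {
        run = bit_test(i) ? 0 : run + 1;   /* length of the free run ending at i */
        if (run == count) {
            size_t start = i + 1 - count;
            for (size_t j = start; j <= i; j++)
                bit_set(j);
            return IOVA_BASE + ((uint32_t)start << IOVA_SHIFT);
        }
    }
    return 0;                              /* 0 is never a valid address here */
}

/* Release 'count' pages starting at an address returned by iova_alloc(). */
static void iova_free(uint32_t addr, size_t count)
{
    size_t start = (addr - IOVA_BASE) >> IOVA_SHIFT;
    for (size_t j = start; j < start + count; j++)
        bit_clear(j);
}
```

First fit keeps the code trivial; with a fixed-size window the bitmap is also small (512 bytes for 16 MiB at 4 KiB granularity).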

* [RFC 0/2] ARM: DMA-mapping & IOMMU integration
  2011-06-20 14:31 ` Subash Patel
@ 2011-06-20 14:59   ` Marek Szyprowski
  0 siblings, 0 replies; 20+ messages in thread
From: Marek Szyprowski @ 2011-06-20 14:59 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On Monday, June 20, 2011 4:31 PM Subash Patel wrote:

> In function:
> dma_alloc_coherent()->arm_iommu_alloc_attrs()->__iommu_alloc_buffer()
> 
> I have following questions:
> 
> a) Before we come to this point, we would have enabled SYSMMU in a call
> to arm_iommu_init(). Shouldn't the SYSMMU be enabled after call to
> __iommu_alloc_buffer(), but before __iommu_create_mapping()? If in case
> the __iommu_alloc_buffer() fails, we don't disable the SYSMMU.

I want to move enabling and disabling the SYSMMU completely to the runtime_pm
framework. As you may notice, the updated SYSMMU driver automatically
becomes a parent of the respective multimedia device and a child of the
power domain to which both belong. This means that the sysmmu will operate
only when the multimedia device is enabled, which really makes sense. The
sysmmu driver will need to be updated not to poke into the registers while
it is disabled, but this should be a really trivial change.
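The behaviour described above, where the SYSMMU is powered only while its multimedia device is active and the driver skips register access while it is off, can be modelled with a usage count on the parent. This is a simplified userspace model with hypothetical names; a real driver would rely on the kernel runtime PM core (for example pm_runtime_get_sync() and pm_runtime_put() on the child, with the core resuming the parent as needed) rather than counting by hand.

```c
#include <assert.h>
#include <stdbool.h>

struct sysmmu {
    unsigned active_children;  /* devices behind this SYSMMU that are powered */
    bool powered;
    unsigned reg_writes;       /* counts register accesses, for the model only */
};

/* Child device resumes: its parent SYSMMU is powered up first. */
static void child_resume(struct sysmmu *mmu)
{
    if (mmu->active_children++ == 0)
        mmu->powered = true;
}

/* Child device suspends: power the SYSMMU down once no child needs it. */
static void child_suspend(struct sysmmu *mmu)
{
    assert(mmu->active_children > 0);
    if (--mmu->active_children == 0)
        mmu->powered = false;
}

/* E.g. a TLB flush after unmapping: registers are not touched while off. */
static void sysmmu_flush_tlb(struct sysmmu *mmu)
{
    if (!mmu->powered)
        return;
    mmu->reg_writes++;         /* stands in for the MMIO write */
}
```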

> b) For huge buffer sizes, the pressure on SYSMMU would be very high.
> Can't we have option to dictate the page size for the IOMMU from driver
> in such cases? Should it always be the size of system pages?

This was just a first version of dma-mapping and IOMMU integration, meant
to show the direction of development and start the discussion. Of course,
support for pages larger than 4KiB is highly expected in the final
version. We can even reuse the recently posted CMA to allocate large pages
for the IOMMU, both to improve performance and to make sure that the
framework will be able to allocate such pages even after the device has
been running for a long time and memory has become fragmented by
(typically movable) pages.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

Thread overview: 20+ messages
2011-05-25  7:35 [RFC 0/2] ARM: DMA-mapping & IOMMU integration Marek Szyprowski
2011-06-04 16:13 ` Ohad Ben-Cohen
2011-06-06  6:09   ` Marek Szyprowski
2011-06-13 14:12 ` KyongHo Cho
2011-06-13 15:07   ` Arnd Bergmann
2011-06-13 15:30     ` KyongHo Cho
2011-06-13 15:40       ` Catalin Marinas
2011-06-13 16:00         ` [Linaro-mm-sig] " KyongHo Cho
2011-06-13 17:55           ` Michael K. Edwards
2011-06-13 18:54             ` Jesse Barnes
2011-06-14 18:15               ` Michael K. Edwards
2011-06-14 18:21                 ` Jesse Barnes
2011-06-14 19:10                   ` Zach Pfeffer
2011-06-14 20:59                   ` Michael K. Edwards
2011-06-13 18:01           ` Catalin Marinas
2011-06-13 15:46       ` Arnd Bergmann
2011-06-13 15:58         ` [Linaro-mm-sig] " KyongHo Cho
2011-06-14  7:46       ` Marek Szyprowski
2011-06-20 14:31 ` Subash Patel
2011-06-20 14:59   ` Marek Szyprowski
