* How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: Laurent Pinchart @ 2009-08-06 10:08 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Robin Holt, linux-kernel, v4l2_linux, linux-arm-kernel
[Resent with an updated subject, this time CC'ing linux-arm-kernel]

I've spent the last few days "playing" with get_user_pages() and mlock() and
got some interesting results. It turned out that cache coherency comes into
play at some point, making the overall problem more complex.

Here's my current setup:

- OMAP processor, based on an ARMv7 core
- MMU and IOMMU
- VIPT non-aliasing data cache
- video capture driver that transfers data to memory using DMA
- video capture application that passes userspace pointers to video buffers to
  the driver

My goal is to make sure that, upon DMA completion, the correct data will be
available to the userspace application.

The first problem was to pin pages in memory, to make sure they will not be
freed while the DMA is in progress. videobuf-dma-sg uses get_user_pages() for
that, and Hugh Dickins nicely explained to me why this is enough.

The second problem is to ensure cache coherency. As the userspace application
will read data from the video buffers, those buffers will end up being cached
in the processor's data cache. The driver does need to invalidate the cache
before starting the DMA operation (userspace could in theory write to the
buffers, but the data will be overwritten by DMA anyway, so there's no need to
clean the cache).

As the cache is of the VIPT (Virtual Index Physical Tag) type, cache
invalidation can either be done globally (in which case the cache is flushed
instead of being invalidated) or based on virtual addresses. In the latter case
the processor will need to look physical addresses up, either in the TLB or
through hardware table walk.
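The virtual-index/physical-tag split described above can be sketched in a few lines of C. This is only a minimal illustration under assumed parameters: the function names and the cache geometry used in the example (64-byte lines, 128 sets) are hypothetical and are not taken from the OMAP hardware or from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Returns non-zero if x is a power of two (hypothetical helper). */
static int is_pow2(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Set index of a VIPT cache: computed from the *virtual* address only,
 * so no address translation is needed to pick the set.
 */
static uint32_t vipt_set_index(uint32_t vaddr, uint32_t line_size,
                               uint32_t num_sets)
{
    assert(is_pow2(line_size) && is_pow2(num_sets));
    return (vaddr / line_size) % num_sets;
}

/*
 * Tag of a VIPT cache: computed from the *physical* address, so the
 * virtual address must first be translated (TLB hit or table walk).
 */
static uint32_t vipt_tag(uint32_t paddr, uint32_t line_size,
                         uint32_t num_sets)
{
    assert(is_pow2(line_size) && is_pow2(num_sets));
    return paddr / (line_size * num_sets);
}
```

With those assumed parameters, vipt_set_index(0x44e12040, 64, 128) selects set 1 without any translation, while vipt_tag() can only be evaluated once the physical address is known. That dependency is why a by-address cache maintenance operation needs a valid translation for the address it is given.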

I can see three solutions to the DMA/cache problem.

1. Flushing the whole data cache right before starting the DMA transfer.
There's no API for that in the ARM architecture, so a whole I+D cache flush is
required. This is quite costly, we're talking about around 30 flushes per
second, but it doesn't involve the MMU. That's the solution that I currently
use.

2. Invalidating only the cache lines that store video buffer data. This
requires a TLB lookup or a hardware table walk, so the userspace application
MM context needs to be available (no problem there as we're flushing in
userspace context) and all pages need to be mapped properly. This can be a
problem because, as Hugh pointed out, pages can still be unmapped from the
userspace context after get_user_pages() returns. I have experienced one oops
due to a kernel paging request failure:

Unable to handle kernel paging request at virtual address 44e12000
pgd = c8698000
[44e12000] *pgd=8a4fd031, *pte=8cfda1cd, *ppte=00000000
Internal error: Oops: 817 [#1] PREEMPT
PC is at v7_dma_inv_range+0x2c/0x44

Fixing this requires more investigation, and I'm not sure how to proceed to
find out if the page fault is really caused by pages being unmapped from the
userspace context. Help would be appreciated.

3. Mark the pages as non-cacheable. Depending on how the buffers are then used
by userspace, the additional cache misses might destroy any benefit I would
get from not flushing the cache before DMA. I'm not sure how to mark a bunch
of pages as non-cacheable though. What usually happens is that video drivers
allocate DMA-coherent memory themselves, but in this case I need to deal with
an arbitrary buffer allocated by userspace. If someone has any experience with
this, it would be appreciated.
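For option 2 above, the amount of work is proportional to the number of cache lines the buffer touches. The sketch below shows one way to compute that range in plain C. The helper names are hypothetical, the line size is a parameter rather than a value read from the hardware, and no real cache maintenance is performed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to the start of its cache line (line_size: power of two). */
static uintptr_t line_align_down(uintptr_t addr, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & ~(uintptr_t)(line_size - 1);
}

/* Number of cache lines touched by the byte range [start, start + len). */
static size_t lines_covering(uintptr_t start, size_t len, size_t line_size)
{
    uintptr_t first, last;

    if (len == 0)
        return 0;

    first = line_align_down(start, line_size);
    last = line_align_down(start + len - 1, line_size);

    /* One maintenance operation per line, from first to last inclusive. */
    return (size_t)((last - first) / line_size) + 1;
}
```

A page-aligned 4096-byte buffer with 64-byte lines covers 64 lines, whereas a 64-byte range starting 16 bytes into a line straddles two. Lines that are only partly covered also hold data that does not belong to the buffer, so simply invalidating them could discard unrelated dirty data; a real implementation has to treat those edge lines with care.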
Regards,
Laurent Pinchart
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: Ben Dooks @ 2009-08-06 11:46 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Hugh Dickins, Robin Holt, linux-kernel, v4l2_linux, linux-arm-kernel

On Thu, Aug 06, 2009 at 12:08:21PM +0200, Laurent Pinchart wrote:
> [Resent with an updated subject, this time CC'ing linux-arm-kernel]
>
> I've spent the last few days "playing" with get_user_pages() and mlock() and
> got some interesting results. It turned out that cache coherency comes into
> play at some point, making the overall problem more complex.
>
> Here's my current setup:
>
> - OMAP processor, based on an ARMv7 core
> - MMU and IOMMU
> - VIPT non-aliasing data cache
> - video capture driver that transfers data to memory using DMA
> - video capture application that passes userspace pointers to video buffers to
>   the driver
>
> My goal is to make sure that, upon DMA completion, the correct data will be
> available to the userspace application.
>
> The first problem was to pin pages to memory, to make sure they will not be
> freed when the DMA is in progress. videobuf-dma-sg uses get_user_pages() for
> that, and Hugh Dickins nicely explained to me why this is enough.
>
> The second problem is to ensure cache coherency. As the userspace application
> will read data from the video buffers, those buffers will end up being cached
> in the processor's data cache. The driver does need to invalidate the cache
> before starting the DMA operation (userspace could in theory write to the
> buffers, but the data will be overwritten by DMA anyway, so there's no need to
> clean the cache).

You'll need to clean the write buffers, otherwise the CPU may have data
queued that it has yet to write back to memory.

> As the cache is of the VIPT (Virtual Index Physical Tag) type, cache
> invalidation can either be done globally (in which case the cache is flushed
> instead of being invalidated) or based on virtual addresses. In the latter case
> the processor will need to look physical addresses up, either in the TLB or
> through hardware table walk.
>
> I can see three solutions to the DMA/cache problem.
>
> 1. Flushing the whole data cache right before starting the DMA transfer.
> There's no API for that in the ARM architecture, so a whole I+D cache flush is
> required. This is quite costly, we're talking about around 30 flushes per
> second, but it doesn't involve the MMU. That's the solution that I currently
> use.
>
> 2. Invalidating only the cache lines that store video buffer data. This
> requires a TLB lookup or a hardware table walk, so the userspace application
> MM context needs to be available (no problem there as we're flushing in
> userspace context) and all pages need to be mapped properly. This can be a
> problem because, as Hugh pointed out, pages can still be unmapped from the
> userspace context after get_user_pages() returns. I have experienced one oops
> due to a kernel paging request failure:

If you already know the virtual addresses of the buffers, why do you need
a TLB lookup (or am I being dense here?)

> Unable to handle kernel paging request at virtual address 44e12000
> pgd = c8698000
> [44e12000] *pgd=8a4fd031, *pte=8cfda1cd, *ppte=00000000
> Internal error: Oops: 817 [#1] PREEMPT
> PC is at v7_dma_inv_range+0x2c/0x44
>
> Fixing this requires more investigation, and I'm not sure how to proceed to
> find out if the page fault is really caused by pages being unmapped from the
> userspace context. Help would be appreciated.
>
> 3. Mark the pages as non-cacheable. Depending on how the buffers are then used
> by userspace, the additional cache misses might destroy any benefit I would
> get from not flushing the cache before DMA. I'm not sure how to mark a bunch
> of pages as non-cacheable though. What usually happens is that video drivers
> allocate DMA-coherent memory themselves, but in this case I need to deal with
> an arbitrary buffer allocated by userspace. If someone has any experience with
> this, it would be appreciated.
>
> Regards,
>
> Laurent Pinchart

--
Ben

Q: What's a light-year?
A: One-third less calories than a regular year.
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: Laurent Pinchart @ 2009-08-06 13:06 UTC (permalink / raw)
To: Ben Dooks
Cc: Hugh Dickins, Robin Holt, linux-kernel, v4l2_linux, linux-arm-kernel

Hi Ben,

On Thursday 06 August 2009 13:46:19 Ben Dooks wrote:
> On Thu, Aug 06, 2009 at 12:08:21PM +0200, Laurent Pinchart wrote:

[snip]

> > The second problem is to ensure cache coherency. As the userspace
> > application will read data from the video buffers, those buffers will end
> > up being cached in the processor's data cache. The driver does need to
> > invalidate the cache before starting the DMA operation (userspace could
> > in theory write to the buffers, but the data will be overwritten by DMA
> > anyway, so there's no need to clean the cache).
>
> You'll need to clean the write buffers, otherwise the CPU may have data
> queued that it has yet to write back to memory.

Good points, thanks.

> > As the cache is of the VIPT (Virtual Index Physical Tag) type, cache
> > invalidation can either be done globally (in which case the cache is
> > flushed instead of being invalidated) or based on virtual addresses. In
> > the latter case the processor will need to look physical addresses up,
> > either in the TLB or through hardware table walk.
> >
> > I can see three solutions to the DMA/cache problem.
> >
> > 1. Flushing the whole data cache right before starting the DMA transfer.
> > There's no API for that in the ARM architecture, so a whole I+D cache
> > flush is required. This is quite costly, we're talking about around 30
> > flushes per second, but it doesn't involve the MMU. That's the solution
> > that I currently use.
> >
> > 2. Invalidating only the cache lines that store video buffer data. This
> > requires a TLB lookup or a hardware table walk, so the userspace
> > application MM context needs to be available (no problem there as we're
> > flushing in userspace context) and all pages need to be mapped properly.
> > This can be a problem because, as Hugh pointed out, pages can still be
> > unmapped from the userspace context after get_user_pages() returns. I
> > have experienced one oops due to a kernel paging request failure:
>
> If you already know the virtual addresses of the buffers, why do you need
> a TLB lookup (or am I being dense here?)

The virtual address is used to compute the cache line index, and the physical
address is then used when comparing the cache line tag. So the processor (or
actually the CP15 coprocessor if I'm not wrong) does a TLB lookup to get the
physical address during cache invalidation/flushing.

> > Unable to handle kernel paging request at virtual address 44e12000
> > pgd = c8698000
> > [44e12000] *pgd=8a4fd031, *pte=8cfda1cd, *ppte=00000000
> > Internal error: Oops: 817 [#1] PREEMPT
> > PC is at v7_dma_inv_range+0x2c/0x44
> >
> > Fixing this requires more investigation, and I'm not sure how to proceed
> > to find out if the page fault is really caused by pages being unmapped
> > from the userspace context. Help would be appreciated.
> >
> > 3. Mark the pages as non-cacheable. Depending on how the buffers are then
> > used by userspace, the additional cache misses might destroy any benefit
> > I would get from not flushing the cache before DMA. I'm not sure how to
> > mark a bunch of pages as non-cacheable though. What usually happens is
> > that video drivers allocate DMA-coherent memory themselves, but in this
> > case I need to deal with an arbitrary buffer allocated by userspace. If
> > someone has any experience with this, it would be appreciated.

Regards,

Laurent Pinchart
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: David Xiao @ 2009-08-06 18:46 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Thu, 2009-08-06 at 06:06 -0700, Laurent Pinchart wrote:
> Hi Ben,
>
> On Thursday 06 August 2009 13:46:19 Ben Dooks wrote:
> > On Thu, Aug 06, 2009 at 12:08:21PM +0200, Laurent Pinchart wrote:
> [snip]
> > >
> > > The second problem is to ensure cache coherency. As the userspace
> > > application will read data from the video buffers, those buffers will end
> > > up being cached in the processor's data cache. The driver does need to
> > > invalidate the cache before starting the DMA operation (userspace could
> > > in theory write to the buffers, but the data will be overwritten by DMA
> > > anyway, so there's no need to clean the cache).
> >
> > You'll need to clean the write buffers, otherwise the CPU may have data
> > queued that it has yet to write back to memory.
>
> Good points, thanks.

I thought this should have been taken care of by the CPU specific
dma_inv_range routine. However, in arch/arm/mm/cache-v7.c,
v7_dma_inv_range does not drain the write buffer; and the
v6_dma_inv_range does that at the end of all the cache maintenance
operations.

So this is probably something Russell can clarify.

> > > As the cache is of the VIPT (Virtual Index Physical Tag) type, cache
> > > invalidation can either be done globally (in which case the cache is
> > > flushed instead of being invalidated) or based on virtual addresses. In
> > > the latter case the processor will need to look physical addresses up,
> > > either in the TLB or through hardware table walk.
> > >
> > > I can see three solutions to the DMA/cache problem.
> > >
> > > 1. Flushing the whole data cache right before starting the DMA transfer.
> > > There's no API for that in the ARM architecture, so a whole I+D cache
> > > flush is required. This is quite costly, we're talking about around 30
> > > flushes per second, but it doesn't involve the MMU. That's the solution
> > > that I currently use.
> > >
> > > 2. Invalidating only the cache lines that store video buffer data. This
> > > requires a TLB lookup or a hardware table walk, so the userspace
> > > application MM context needs to be available (no problem there as we're
> > > flushing in userspace context) and all pages need to be mapped properly.
> > > This can be a problem because, as Hugh pointed out, pages can still be
> > > unmapped from the userspace context after get_user_pages() returns. I
> > > have experienced one oops due to a kernel paging request failure:
> >
> > If you already know the virtual addresses of the buffers, why do you need
> > a TLB lookup (or am I being dense here?)
>
> The virtual address is used to compute the cache line index, and the physical
> address is then used when comparing the cache line tag. So the processor (or
> actually the CP15 coprocessor if I'm not wrong) does a TLB lookup to get the
> physical address during cache invalidation/flushing.
>
> > > Unable to handle kernel paging request at virtual address 44e12000
> > > pgd = c8698000
> > > [44e12000] *pgd=8a4fd031, *pte=8cfda1cd, *ppte=00000000
> > > Internal error: Oops: 817 [#1] PREEMPT
> > > PC is at v7_dma_inv_range+0x2c/0x44
> > >
> > > Fixing this requires more investigation, and I'm not sure how to proceed
> > > to find out if the page fault is really caused by pages being unmapped
> > > from the userspace context. Help would be appreciated.
> > >
> > > 3. Mark the pages as non-cacheable. Depending on how the buffers are then
> > > used by userspace, the additional cache misses might destroy any benefit
> > > I would get from not flushing the cache before DMA. I'm not sure how to
> > > mark a bunch of pages as non-cacheable though. What usually happens is
> > > that video drivers allocate DMA-coherent memory themselves, but in this
> > > case I need to deal with an arbitrary buffer allocated by userspace. If
> > > someone has any experience with this, it would be appreciated.

Another approach is working from a different direction: the kernel
allocates the non-cached buffer and then mmap()s it into user space. I have
done that in a similar situation to try to achieve "zero-copy".

David
* RE: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: Chetan.Loke @ 2009-08-06 19:16 UTC (permalink / raw)
To: dxiao, laurent.pinchart
Cc: ben-linux, hugh.dickins, holt, linux-kernel, linux-media, linux-arm-kernel

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of David Xiao
> Sent: Thursday, August 06, 2009 2:46 PM
> To: Laurent Pinchart
> Cc: Ben Dooks; Hugh Dickins; Robin Holt; linux-kernel@vger.kernel.org;
> v4l2_linux; linux-arm-kernel@lists.arm.linux.org.uk
> Subject: Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is
> get_user_pages() enough to prevent pages from being swapped out ?")
>
> On Thu, 2009-08-06 at 06:06 -0700, Laurent Pinchart wrote:
> > Hi Ben,
> >
> > On Thursday 06 August 2009 13:46:19 Ben Dooks wrote:
> > > On Thu, Aug 06, 2009 at 12:08:21PM +0200, Laurent Pinchart wrote:
> > [snip]
> > > >
> > > > The second problem is to ensure cache coherency. As the userspace
> > > > application will read data from the video buffers, those buffers
> > > > will end up being cached in the processor's data cache. The driver
> > > > does need to invalidate the cache before starting the DMA operation
> > > > (userspace could in theory write to the buffers, but the data will
> > > > be overwritten by DMA anyway, so there's no need to clean the cache).
> > >
> > > You'll need to clean the write buffers, otherwise the CPU may have
> > > data queued that it has yet to write back to memory.
> >
> > Good points, thanks.
>
> I thought this should have been taken care of by the CPU specific
> dma_inv_range routine. However, in arch/arm/mm/cache-v7.c,
> v7_dma_inv_range does not drain the write buffer; and the
> v6_dma_inv_range does that at the end of all the cache maintenance
> operations.
> So this is probably something Russell can clarify.

Something unrelated: I haven't used this specific API, but ARM1156 has an
issue. If you use the clean-cache-block MCR feature, it might result in
memory corruption, so be careful. I'm not sure which of these variants
(ARM1156T2-S or ARM1156T2F-S) has that erratum.

Chetan
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: Jamie Lokier @ 2009-08-06 20:15 UTC (permalink / raw)
To: David Xiao
Cc: Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

David Xiao wrote:
> Another approach is working from a different direction: the kernel
> allocates the non-cached buffer and then mmap()s it into user space. I have
> done that in a similar situation to try to achieve "zero-copy".

open(O_DIRECT) does DMA to arbitrary pages allocated by userspace, and
O_DIRECT is used by some important applications, so the problem still
needs to be solved in general.

-- Jamie
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: Russell King - ARM Linux @ 2009-08-06 22:25 UTC (permalink / raw)
To: David Xiao
Cc: Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Thu, Aug 06, 2009 at 11:46:14AM -0700, David Xiao wrote:
> On Thu, 2009-08-06 at 06:06 -0700, Laurent Pinchart wrote:
> > Hi Ben,
> >
> > On Thursday 06 August 2009 13:46:19 Ben Dooks wrote:
> > > On Thu, Aug 06, 2009 at 12:08:21PM +0200, Laurent Pinchart wrote:
> > [snip]
> > > >
> > > > The second problem is to ensure cache coherency. As the userspace
> > > > application will read data from the video buffers, those buffers will end
> > > > up being cached in the processor's data cache. The driver does need to
> > > > invalidate the cache before starting the DMA operation (userspace could
> > > > in theory write to the buffers, but the data will be overwritten by DMA
> > > > anyway, so there's no need to clean the cache).
> > >
> > > You'll need to clean the write buffers, otherwise the CPU may have data
> > > queued that it has yet to write back to memory.
> >
> > Good points, thanks.
>
> I thought this should have been taken care of by the CPU specific
> dma_inv_range routine. However, in arch/arm/mm/cache-v7.c,
> v7_dma_inv_range does not drain the write buffer; and the
> v6_dma_inv_range does that at the end of all the cache maintenance
> operations.

There's no such thing as "drain write buffer" in ARMv7. There are
barriers instead, in particular dsb, which replaces the original
"drain write buffer" instruction.

As far as userspace DMA coherency goes, the only way you could do it with
current kernel APIs is by using get_user_pages(), creating a scatterlist
from those, and then passing it to dma_map_sg(). While the device has
ownership of the SG, userspace must _not_ touch the buffer until after
DMA has completed.

However, that won't work with ARMv7's speculative prefetching. I'm
afraid with such things, DMA direct into userspace mappings becomes a
_lot_ harder, and let's face it, lots of Linux drivers just aren't going
to bother supporting this - we can't currently get agreement to have an
API to map DMA coherent pages into userspace!
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: David Xiao @ 2009-08-07 5:59 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Thu, 2009-08-06 at 15:25 -0700, Russell King - ARM Linux wrote:
> On Thu, Aug 06, 2009 at 11:46:14AM -0700, David Xiao wrote:
> > On Thu, 2009-08-06 at 06:06 -0700, Laurent Pinchart wrote:
> > > Hi Ben,
> > >
> > > On Thursday 06 August 2009 13:46:19 Ben Dooks wrote:
> > > > On Thu, Aug 06, 2009 at 12:08:21PM +0200, Laurent Pinchart wrote:
> > > [snip]
> > > > >
> > > > > The second problem is to ensure cache coherency. As the userspace
> > > > > application will read data from the video buffers, those buffers will end
> > > > > up being cached in the processor's data cache. The driver does need to
> > > > > invalidate the cache before starting the DMA operation (userspace could
> > > > > in theory write to the buffers, but the data will be overwritten by DMA
> > > > > anyway, so there's no need to clean the cache).
> > > >
> > > > You'll need to clean the write buffers, otherwise the CPU may have data
> > > > queued that it has yet to write back to memory.
> > >
> > > Good points, thanks.
> >
> > I thought this should have been taken care of by the CPU specific
> > dma_inv_range routine. However, in arch/arm/mm/cache-v7.c,
> > v7_dma_inv_range does not drain the write buffer; and the
> > v6_dma_inv_range does that at the end of all the cache maintenance
> > operations.
>
> There's no such thing as "drain write buffer" in ARMv7. There are
> barriers instead, in particular dsb, which replaces the original
> "drain write buffer" instruction.

Sorry, I overlooked the "DSB" instruction at the end; yes, it looks like
the CP15 "drain write buffer" instruction is deprecated in V7.

> As far as userspace DMA coherency goes, the only way you could do it with
> current kernel APIs is by using get_user_pages(), creating a scatterlist
> from those, and then passing it to dma_map_sg(). While the device has
> ownership of the SG, userspace must _not_ touch the buffer until after
> DMA has completed.
>
> However, that won't work with ARMv7's speculative prefetching. I'm
> afraid with such things, DMA direct into userspace mappings becomes a
> _lot_ harder, and let's face it, lots of Linux drivers just aren't going
> to bother supporting this - we can't currently get agreement to have an
> API to map DMA coherent pages into userspace!

The V7 speculative prefetching then probably affects DMA coherency in
general, for both kernel and user space DMA. Could this be addressed by
calling dma_cache_maint() inside dma_unmap_sg/single() when the direction
is DMA_FROM_DEVICE/DMA_BIDIRECTIONAL, to basically invalidate the related
cache lines in case any were filled by prefetching? This assumes
dma_unmap_sg/single() is called after each DMA operation has completed.

David
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: Laurent Pinchart @ 2009-08-07 7:58 UTC (permalink / raw)
To: David Xiao
Cc: Russell King - ARM Linux, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Friday 07 August 2009 07:59:26 David Xiao wrote:
> On Thu, 2009-08-06 at 15:25 -0700, Russell King - ARM Linux wrote:
> > As far as userspace DMA coherency goes, the only way you could do it with
> > current kernel APIs is by using get_user_pages(), creating a scatterlist
> > from those, and then passing it to dma_map_sg(). While the device has
> > ownership of the SG, userspace must _not_ touch the buffer until after
> > DMA has completed.
> >
> > However, that won't work with ARMv7's speculative prefetching. I'm
> > afraid with such things, DMA direct into userspace mappings becomes a
> > _lot_ harder, and let's face it, lots of Linux drivers just aren't going
> > to bother supporting this - we can't currently get agreement to have an
> > API to map DMA coherent pages into userspace!
>
> The V7 speculative prefetching then probably affects DMA coherency in
> general, for both kernel and user space DMA. Could this be addressed by
> calling dma_cache_maint() inside dma_unmap_sg/single() when the direction
> is DMA_FROM_DEVICE/DMA_BIDIRECTIONAL, to basically invalidate the related
> cache lines in case any were filled by prefetching? This assumes
> dma_unmap_sg/single() is called after each DMA operation has completed.

Sorry about this, but I'm not sure I understand the speculative prefetching
cache issue completely.

My understanding is that, even if userspace doesn't touch the DMA buffer
while DMA is in progress, it could still read from locations close to the
buffer, resulting in a speculative prefetch of data in the buffer. Those
data would then end up in the D-cache, and would not be coherent with what
the device transfers.

If that's correct, how do we avoid the problem in the general case of DMA
to kernel-allocated buffers ?

Regards,

Laurent Pinchart
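The scenario described above, and the invalidate-after-DMA idea raised earlier in the thread, can be illustrated with a toy model in plain C. This is only a sketch under strong assumptions: one word of memory shadowed by one cache line, a device that writes straight to memory, and hypothetical function names. It does not model real ARMv7 hardware or any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one word of RAM shadowed by a single cache line. */
struct toy_cache {
    uint32_t mem;   /* contents of RAM */
    uint32_t line;  /* cached copy of mem */
    bool valid;     /* true if the line holds a copy */
};

/* Drop the cached copy without writing it back. */
static void toy_invalidate(struct toy_cache *c)
{
    c->valid = false;
}

/* CPU read (demand or speculative): fills the line on a miss. */
static uint32_t toy_cpu_read(struct toy_cache *c)
{
    if (!c->valid) {
        c->line = c->mem;
        c->valid = true;
    }
    return c->line;
}

/* Device write: goes straight to RAM, the cache is not updated. */
static void toy_dma_write(struct toy_cache *c, uint32_t value)
{
    c->mem = value;
}

/* Returns the value the CPU observes after one simulated capture. */
static uint32_t toy_capture(uint32_t old_data, uint32_t new_data,
                            bool prefetch_during_dma,
                            bool invalidate_after_dma)
{
    struct toy_cache c = { .mem = old_data, .line = 0, .valid = false };

    toy_invalidate(&c);              /* invalidate before starting DMA */
    if (prefetch_during_dma)
        (void)toy_cpu_read(&c);      /* speculative fill with old data */
    toy_dma_write(&c, new_data);     /* device completes the transfer */
    if (invalidate_after_dma)
        toy_invalidate(&c);          /* discard any prefetched line */
    return toy_cpu_read(&c);
}
```

In this model, toy_capture(1, 2, true, false) returns the stale value 1, because the line was refilled after the first invalidation and before the device wrote. Adding the second invalidation, toy_capture(1, 2, true, true), returns 2. Whether this ordering is sufficient on real hardware is exactly the question being asked in the thread.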
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: Russell King - ARM Linux @ 2009-08-07 8:10 UTC (permalink / raw)
To: Laurent Pinchart
Cc: David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Fri, Aug 07, 2009 at 09:58:30AM +0200, Laurent Pinchart wrote:
> Sorry about this, but I'm not sure I understand the speculative prefetching
> cache issue completely.

The general case with speculative prefetching is that if memory is
accessible, it can be prefetched.

In other words, if we mapped devices without NX (non-exec) set, the
CPU can prefetch instructions from devices, causing random read
accesses. Yes, I know it sounds crazy, but that's what I'm told
_can_ happen.
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: Jamie Lokier @ 2009-08-07 9:54 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Laurent Pinchart, David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

Russell King - ARM Linux wrote:
> On Fri, Aug 07, 2009 at 09:58:30AM +0200, Laurent Pinchart wrote:
> > Sorry about this, but I'm not sure I understand the speculative prefetching
> > cache issue completely.
>
> The general case with speculative prefetching is that if memory is
> accessible, it can be prefetched.
>
> In other words, if we mapped devices without NX (non-exec) set, the
> CPU can prefetch instructions from devices, causing random read
> accesses. Yes, I know it sounds crazy, but that's what I'm told
> _can_ happen.

1. Does the architecture not prevent speculative instruction
prefetches from crossing a page boundary? It would be handy under the
circumstances.

2. Is NX available on all the CPUs with speculative prefetching
behaviour? If it is, just use that for device mappings?

-- Jamie
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
From: Russell King - ARM Linux @ 2009-08-07 9:59 UTC (permalink / raw)
To: Jamie Lokier
Cc: Laurent Pinchart, David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Fri, Aug 07, 2009 at 10:54:27AM +0100, Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
> > On Fri, Aug 07, 2009 at 09:58:30AM +0200, Laurent Pinchart wrote:
> > > Sorry about this, but I'm not sure I understand the speculative prefetching
> > > cache issue completely.
> >
> > The general case with speculative prefetching is that if memory is
> > accessible, it can be prefetched.
> >
> > In other words, if we mapped devices without NX (non-exec) set, the
> > CPU can prefetch instructions from devices, causing random read
> > accesses. Yes, I know it sounds crazy, but that's what I'm told
> > _can_ happen.
>
> 1. Does the architecture not prevent speculative instruction
> prefetches from crossing a page boundary? It would be handy under the
> circumstances.
>
> 2. Is NX available on all the CPUs with speculative prefetching
> behaviour? If it is, just use that for device mappings?

I was using it as an example. Setting NX doesn't stop _data_ speculative
prefetching to _memory_ areas (as opposed to device areas).

Getting things like the right memory attributes in place and ensuring
people don't abuse them is the first step towards getting this stuff
right. It's an ongoing project.
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-07 9:54 ` Jamie Lokier 2009-08-07 9:59 ` Russell King - ARM Linux @ 2009-08-07 12:07 ` Laurent Desnogues 2009-08-07 13:15 ` Robin Holt 1 sibling, 1 reply; 42+ messages in thread From: Laurent Desnogues @ 2009-08-07 12:07 UTC (permalink / raw) To: Jamie Lokier Cc: Russell King - ARM Linux, Laurent Pinchart, David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Fri, Aug 7, 2009 at 11:54 AM, Jamie Lokier<jamie@shareable.org> wrote: > > 1. Does the architecture not prevent speculative instruction > prefetches from crossing a page boundary? It would be handy under the > circumstances. There's no such restriction in ARMv7 architecture. Laurent ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-07 12:07 ` Laurent Desnogues @ 2009-08-07 13:15 ` Robin Holt 2009-08-07 19:01 ` Russell King - ARM Linux 0 siblings, 1 reply; 42+ messages in thread From: Robin Holt @ 2009-08-07 13:15 UTC (permalink / raw) To: Laurent Desnogues Cc: Jamie Lokier, Russell King - ARM Linux, Laurent Pinchart, David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Fri, Aug 07, 2009 at 02:07:43PM +0200, Laurent Desnogues wrote: > On Fri, Aug 7, 2009 at 11:54 AM, Jamie Lokier<jamie@shareable.org> wrote: > > > > 1. Does the architecture not prevent speculative instruction > > prefetches from crossing a page boundary? It would be handy under the > > circumstances. > > There's no such restriction in ARMv7 architecture. Doesn't it prevent them for uncached areas? I _THOUGHT_ there was an alloc_consistent (or something like that) call on ARM which gave you an uncached mapping where you could do DMA. I also thought there was a dma_* set of functions which remapped as uncached before DMA begins and remapped as normal after DMA has been completed. Sorry for the fuzzy recollection. I am dredging from 2.6.21 timeframe. Robin ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-07 13:15 ` Robin Holt
@ 2009-08-07 19:01 ` Russell King - ARM Linux
2009-08-07 20:11 ` Laurent Pinchart
0 siblings, 1 reply; 42+ messages in thread
From: Russell King - ARM Linux @ 2009-08-07 19:01 UTC (permalink / raw)
To: Robin Holt
Cc: Laurent Desnogues, Jamie Lokier, Laurent Pinchart, David Xiao, Ben Dooks, Hugh Dickins, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Fri, Aug 07, 2009 at 08:15:01AM -0500, Robin Holt wrote:
> On Fri, Aug 07, 2009 at 02:07:43PM +0200, Laurent Desnogues wrote:
> > On Fri, Aug 7, 2009 at 11:54 AM, Jamie Lokier<jamie@shareable.org> wrote:
> > >
> > > 1. Does the architecture not prevent speculative instruction
> > > prefetches from crossing a page boundary? It would be handy under the
> > > circumstances.
> >
> > There's no such restriction in ARMv7 architecture.
>
> Doesn't it prevent them for uncached areas?

"Uncached areas" is very very fuzzy. Are you talking about a non-cacheable
memory mapping, or a strongly ordered mapping?

I'm afraid that we're going to have to require more precise use of language
to describe these things - woolly statements like "uncached areas" are now
just too ambiguous.

> I _THOUGHT_ there was an
> alloc_consistent (or something like that) call on ARM which gave you
> an uncached mapping where you could do DMA.

dma_alloc_coherent() does _remap_ memory into a strongly ordered
mapping. However, the fully cached mapping remains, which means that
the CPU can still speculatively prefetch from that memory.

Since we map the fully cached mapping using section (or even supersection)
mappings for TLB efficiency, we can't change the memory type on a
per-page basis.

> I also thought there was a dma_* set of functions which remapped as
> uncached before DMA begins and remapped as normal after DMA has been
> completed.

You're talking about the deprecated DMA bounce code there. It has
basically the same problem, since it uses the dma_alloc_coherent()
interface to gain a source of DMA-able memory.

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-07 19:01 ` Russell King - ARM Linux @ 2009-08-07 20:11 ` Laurent Pinchart 2009-08-07 20:28 ` Russell King - ARM Linux 0 siblings, 1 reply; 42+ messages in thread From: Laurent Pinchart @ 2009-08-07 20:11 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Robin Holt, Laurent Desnogues, Jamie Lokier, David Xiao, Ben Dooks, Hugh Dickins, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Friday 07 August 2009 21:01:45 Russell King - ARM Linux wrote: > On Fri, Aug 07, 2009 at 08:15:01AM -0500, Robin Holt wrote: > > On Fri, Aug 07, 2009 at 02:07:43PM +0200, Laurent Desnogues wrote: > > > On Fri, Aug 7, 2009 at 11:54 AM, Jamie Lokier<jamie@shareable.org> wrote: > > > > 1. Does the architecture not prevent speculative instruction > > > > prefetches from crossing a page boundary? It would be handy under > > > > the circumstances. > > > > > > There's no such restriction in ARMv7 architecture. > > > > Doesn't it prevent them for uncached areas? > > "Uncached areas" is very very fuzzy. Are you talking about a non-cachable > memory mapping, or a strongly ordered mapping. > > I'm afraid that we're going to have to require more precise use of language > to describe these things - wolley statements like "uncached areas" are now > just too ambiguous. Ok. Maybe the kernel mapping from L_PTE_MT_UNCACHED to strongly ordered for ARMv6 and up (not sure about how it worked for previous versions) brought some confusion. I'll try to be more precise now. > > I _THOUGHT_ there was an alloc_consistent (or something like that) call on > > ARM which gave you an uncached mapping where you could do DMA. > > The dma_alloc_coherent() does _remap_ memory into a strongly ordered > mapping. However, the fully cached mapping remains, which means that > the CPU can still speculatively prefetch from that memory. 
Does that mean that, in theory, all DMA transfers in the DMA_FROM_DEVICE direction are currently broken on ARMv7 ? The ARM Architecture Reference Manual (ARM DDI 0100I) states that "• If the same memory locations are marked as having different memory types (Normal, Device, or Strongly Ordered), for example by the use of synonyms in a virtual to physical address mapping, UNPREDICTABLE behavior results. • If the same memory locations are marked as having different cacheable attributes, for example by the use of synonyms in a virtual to physical address mapping, UNPREDICTABLE behavior results." dma_alloc_coherent() ends up calling __dma_alloc(), which allocates pages using alloc_pages(), flushes the data cache for the allocated virtual range and then simply remaps the pages using PTEs previously allocated from the kernel MM. This would be broken if a fully cached Normal mapping already existed for those physical pages. You seem to imply that's the case, but I'm not sure to understand why. > Since we map the fully cached mapping using section (or even supersection) > mappings for TLB efficiency, we can't change the memory type on a > per-page basis. > > > I also thought there was a dma_* set of functions which remapped as > > uncached before DMA begins and remapped as normal after DMA has been > > completed. > > You're talking about the deprecated DMA bounce code there. It's > basically the same problem since it uses the dma_alloc_coherent() > interface to gain a source of DMA-able memory. Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-07 20:11 ` Laurent Pinchart @ 2009-08-07 20:28 ` Russell King - ARM Linux 2009-08-07 22:25 ` David Xiao 2009-08-10 13:49 ` Laurent Pinchart 0 siblings, 2 replies; 42+ messages in thread From: Russell King - ARM Linux @ 2009-08-07 20:28 UTC (permalink / raw) To: Laurent Pinchart Cc: Robin Holt, Laurent Desnogues, Jamie Lokier, David Xiao, Ben Dooks, Hugh Dickins, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Fri, Aug 07, 2009 at 10:11:40PM +0200, Laurent Pinchart wrote: > Ok. Maybe the kernel mapping from L_PTE_MT_UNCACHED to strongly ordered for > ARMv6 and up (not sure about how it worked for previous versions) brought some > confusion. I'll try to be more precise now. It's something we should correct. > Does that mean that, in theory, all DMA transfers in the DMA_FROM_DEVICE > direction are currently broken on ARMv7 ? Technically, yes. I haven't had a stream of bug reports which tends to suggest that either the speculation isn't that aggressive in current silicon, or we're just lucky so far. > The ARM Architecture Reference Manual (ARM DDI 0100I) states that Bear in mind that DDI0100 is out of date now. There's a different document number for it (I forget what it is.) > "• If the same memory locations are marked as having different memory types > (Normal, Device, or Strongly Ordered), for example by the use of synonyms in a > virtual to physical address mapping, UNPREDICTABLE behavior results. > > • If the same memory locations are marked as having different cacheable > attributes, for example by the use of synonyms in a virtual to physical > address mapping, UNPREDICTABLE behavior results." Both of these we end up doing. The current position is "yes, umm, we're not sure what we can do about that"... which also happens to be mine as well. 
Currently, my best solution is to go for minimal lowmem and maximal highmem - so _everything_ gets mapped in on an as required basis. > This would be broken if a fully cached Normal mapping already existed for > those physical pages. You seem to imply that's the case, but I'm not sure to > understand why. The kernel direct mapping maps all system (low) memory with normal memory cacheable attributes. So using vmalloc, dma_alloc_coherent, using pages in userspace all create duplicate mappings of pages. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-07 20:28 ` Russell King - ARM Linux @ 2009-08-07 22:25 ` David Xiao 2009-08-10 13:49 ` Laurent Pinchart 1 sibling, 0 replies; 42+ messages in thread From: David Xiao @ 2009-08-07 22:25 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Laurent Pinchart, Robin Holt, Laurent Desnogues, Jamie Lokier, Ben Dooks, Hugh Dickins, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Fri, 2009-08-07 at 13:28 -0700, Russell King - ARM Linux wrote: > The kernel direct mapping maps all system (low) memory with normal > memory cacheable attributes. > > So using vmalloc, dma_alloc_coherent, using pages in userspace all > create duplicate mappings of pages. > If we do want to remove all these duplicate mappings, as part of solution to deal with the speculative prefetching, probably one way is to not map all the RAM into the direct-mapped space at paging_init() time, and instead map them on-demand by different upper layer allocation functions, such as vmalloc/dma_alloc_coherent/do_brk/kmalloc/ get_free_pages/etc. But then the distinction between upper layer allocation functions and non-upper layer ones must be made clear though. I know that mapping the RAM at paging_init() time can take advantage of 1M section mapping most of the time, and thus save many 1KB L2 page tables. But a lot of memory still ends up being remapped with L2 page tables later on, and meanwhile 1KB might not be as "precious" as it used to be as well-:) David ^ permalink raw reply [flat|nested] 42+ messages in thread
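The trade-off mentioned above (1MB section mappings versus 1KB L2 page tables) can be put into numbers. The following is a rough sketch, assuming the ARM short-descriptor format where one hardware L2 table holds 256 four-byte entries and covers 1MB with 4KB pages. The function name `l2_table_overhead_bytes` is made up for illustration, and any additional per-table bookkeeping the kernel keeps is not counted.

```c
#include <assert.h>
#include <stddef.h>

#define SECTION_SIZE   (1024u * 1024u) /* one L1 section entry maps 1MB */
#define SMALL_PAGE     4096u           /* one L2 entry maps a 4KB page */
#define L2_ENTRY_BYTES 4u              /* size of one L2 descriptor */
#define L2_TABLE_BYTES 1024u           /* one hardware L2 table */

/*
 * Hypothetical helper: bytes of hardware L2 tables needed if ram_bytes of
 * memory were mapped with 4KB pages instead of 1MB sections.
 */
size_t l2_table_overhead_bytes(size_t ram_bytes)
{
    /* 256 entries of 4 bytes each make up exactly one 1KB table. */
    assert((SECTION_SIZE / SMALL_PAGE) * L2_ENTRY_BYTES == L2_TABLE_BYTES);

    /* Round up to whole 1MB regions; each one needs its own L2 table. */
    size_t regions = (ram_bytes + SECTION_SIZE - 1) / SECTION_SIZE;

    return regions * L2_TABLE_BYTES;
}
```

Under these assumptions, mapping 256MB of RAM entirely with small pages would cost 256KB of hardware L2 tables, which gives a feel for how "precious" the saved kilobytes are.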
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-07 20:28 ` Russell King - ARM Linux
2009-08-07 22:25 ` David Xiao
@ 2009-08-10 13:49 ` Laurent Pinchart
1 sibling, 0 replies; 42+ messages in thread
From: Laurent Pinchart @ 2009-08-10 13:49 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Russell King - ARM Linux, Robin Holt, Laurent Desnogues, Jamie Lokier, David Xiao, Ben Dooks, Hugh Dickins, linux-kernel@vger.kernel.org, v4l2_linux

On Friday 07 August 2009 22:28:29 Russell King - ARM Linux wrote:
> On Fri, Aug 07, 2009 at 10:11:40PM +0200, Laurent Pinchart wrote:
> > Ok. Maybe the kernel mapping from L_PTE_MT_UNCACHED to strongly ordered
> > for ARMv6 and up (not sure about how it worked for previous versions)
> > brought some confusion. I'll try to be more precise now.
>
> It's something we should correct.

Do you mean we should map L_PTE_MT_UNCACHED to Normal, non-cacheable memory on
ARMv6 and up? That looks like an easy change, but I'm scared of possible side
effects.

> > Does that mean that, in theory, all DMA transfers in the DMA_FROM_DEVICE
> > direction are currently broken on ARMv7 ?
>
> Technically, yes. I haven't had a stream of bug reports which tends to
> suggest that either the speculation isn't that aggressive in current
> silicon, or we're just lucky so far.

Current silicon probably avoids prefetching memory at random. The most
probable cause of problems would be a read in kernel virtual memory at a
location just before the buffer being written by DMA. This would result in a
few bytes being corrupted for no apparent reason. As the problem would be
quite difficult to reproduce, I don't expect many people to perform an
in-depth investigation and file a bug report.

> > The ARM Architecture Reference Manual (ARM DDI 0100I) states that
>
> Bear in mind that DDI0100 is out of date now. There's a different document
> number for it (I forget what it is.)

Are you talking about the ARM Cortex A8 TRM (ARM DDI 0344D)? I've read that
one (and I should have done so earlier, it helped me understand that the
kernel properly maps Linux PTE flags to ARM PTE flags where I thought there
was a bug).

> > "• If the same memory locations are marked as having different memory
> > types (Normal, Device, or Strongly Ordered), for example by the use of
> > synonyms in a virtual to physical address mapping, UNPREDICTABLE behavior
> > results.
> >
> > • If the same memory locations are marked as having different cacheable
> > attributes, for example by the use of synonyms in a virtual to physical
> > address mapping, UNPREDICTABLE behavior results."
>
> Both of these we end up doing. The current position is "yes, umm, we're not
> sure what we can do about that"... which also happens to be mine as well.
> Currently, my best solution is to go for minimal lowmem and maximal highmem
> - so _everything_ gets mapped in on an as required basis.

I suppose the problem will become more common in future architectures, even on
other platforms. Do we have the proper infrastructure to do so without
seriously damaging performance?

> > This would be broken if a fully cached Normal mapping already existed for
> > those physical pages. You seem to imply that's the case, but I'm not sure
> > to understand why.
>
> The kernel direct mapping maps all system (low) memory with normal
> memory cacheable attributes.
>
> So using vmalloc, dma_alloc_coherent, using pages in userspace all
> create duplicate mappings of pages.

Right.

I'm experimenting with several solutions to the initial problem (handling DMA
and cache). Of course they all theoretically break because of the aliasing
introduced by the kernel low memory mapping combined with speculative
prefetching, but as that problem is global it won't affect the relative
performance of the solutions.

1. Flushing the whole cache before giving ownership of the buffer to the
device works, but is quite costly.

2. Flushing only part of the cache might work, but I'm getting unhandled
kernel paging requests. I'm investigating that.

3. Marking the userspace mapping as non-cacheable might bring a performance
improvement, so I'd like to try that.

I'd like some help with marking the mapping as non-cacheable. As pages can be
unmapped from userspace virtual memory even though get_user_pages() prevents
them from being freed, I need to either:

a. Make sure the mapping will be non-cacheable when brought back in userspace
virtual memory after a page fault. This requires marking the whole underlying
VMA as non-cacheable (vma->vm_page_prot), possibly making much more than the
video buffers uncacheable.

My plan is to retrieve a pointer to the VMA underlying the buffer, then walk
the VMA virtual address range to mark all associated PTEs as uncacheable. If
a PTE is not present for some reason I won't need to care, as it will be
faulted in correctly using the VMA vm_page_prot the next time it is accessed.

I'm not sure how to handle young PTEs though. On at least ARMv7 a non-young
Linux PTE seems to result in an invalid ARM PTE (0x0000000). What exactly is
that for? How should I handle it?

b. Prevent the pages from being unmapped from the userspace virtual mapping,
in which case the whole VMA won't need to be marked as uncached (unless this
breaks coherency somewhere else).

I've read/heard that this can be done by using mlock() from userspace, but I
need a kernel-side solution. mlock() marks the VMA as VM_LOCKED among other
things. Would that be enough to prevent pages from being unmapped from
userspace virtual memory?

Regards,

--
Laurent Pinchart

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-07 5:59 ` David Xiao 2009-08-07 7:58 ` Laurent Pinchart @ 2009-08-07 8:08 ` Russell King - ARM Linux 2009-08-07 10:23 ` Jamie Lokier 2009-08-11 9:31 ` Catalin Marinas 3 siblings, 0 replies; 42+ messages in thread From: Russell King - ARM Linux @ 2009-08-07 8:08 UTC (permalink / raw) To: David Xiao Cc: Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Thu, Aug 06, 2009 at 10:59:26PM -0700, David Xiao wrote: > The V7 speculative prefetching will then probably apply to DMA coherency > issue in general, both kernel and user space DMAs. Could this be > addressed by inside the dma_unmap_sg/single() calling dma_cache_maint() > when the direction is DMA_FROM_DEVICE/DMA_BIDIRECTIONAL, to basically > invalidate the related cache lines in case any filled by prefetching? > Assuming dma_unmap_sg/single() is called after each DMA operation is > completed. It's something that I was going to look at, and it's probably going to have to be something I do blind - I currently have no MPCore platform, and even if my Realview EB worked, it doesn't use DMA at all. However, it's not trivial - the unmap functions don't have all the necessary information. dma_unmap_single() has the DMA address, which we can convert to the original virtual address via dma_to_virt(). However, dma_unmap_page() can't translate back to a virtual page since we're missing some information there. It bugs me that the DMA API is restrictive in the information which architectures can retain across a mapping which makes this non-trivial. 
Had I known of these issues when the DMA API was originally being discussed, I'd have suggested that we have an arch-specific dma_map struct which could contain whatever information was required, rather than requiring the driver to maintain the handle/size/direction/etc between each of the calls. That would mean we could retain the virtual address/struct page rather than having to work it back in some way. ^ permalink raw reply [flat|nested] 42+ messages in thread
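The idea of an arch-specific mapping structure can be sketched as follows. This is only a userspace illustration of the concept, not an existing kernel interface: `struct toy_dma_map`, `toy_dma_map_single()`, `toy_dma_unmap()` and the counting `toy_cache_invalidate()` stub are all hypothetical names, the bus address is an identity "translation", and the cache clean needed for the to-device direction is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_dma_dir { TOY_DMA_TO_DEVICE, TOY_DMA_FROM_DEVICE, TOY_DMA_BIDIRECTIONAL };

/* Hypothetical per-mapping state retained by the architecture code. */
struct toy_dma_map {
    void *vaddr;          /* CPU virtual address, kept for the unmap side */
    uintptr_t dma_addr;   /* address handed to the device */
    size_t size;
    enum toy_dma_dir dir;
};

/* Stand-in for real cache maintenance: only counts invocations. */
unsigned int toy_invalidate_calls;

void toy_cache_invalidate(void *vaddr, size_t size)
{
    (void)vaddr;
    (void)size;
    toy_invalidate_calls++;
}

/* Map: record everything unmap will need, then invalidate if the device writes. */
uintptr_t toy_dma_map_single(struct toy_dma_map *map, void *vaddr,
                             size_t size, enum toy_dma_dir dir)
{
    assert(map != NULL && vaddr != NULL && size > 0);
    map->vaddr = vaddr;
    map->size = size;
    map->dir = dir;
    map->dma_addr = (uintptr_t)vaddr; /* identity translation, illustration only */
    if (dir != TOY_DMA_TO_DEVICE)
        toy_cache_invalidate(vaddr, size);
    return map->dma_addr;
}

/* Unmap: the virtual address is available again without any reverse lookup. */
void toy_dma_unmap(struct toy_dma_map *map)
{
    assert(map != NULL && map->vaddr != NULL);
    if (map->dir != TOY_DMA_TO_DEVICE)
        toy_cache_invalidate(map->vaddr, map->size);
    map->vaddr = NULL;
}
```

With the state kept in the structure, the unmap side can invalidate by virtual address after the transfer without the driver passing handle, size and direction again, which is the information the current unmap calls are missing.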
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-07 5:59 ` David Xiao
2009-08-07 7:58 ` Laurent Pinchart
2009-08-07 8:08 ` Russell King - ARM Linux
@ 2009-08-07 10:23 ` Jamie Lokier
2009-08-07 19:03 ` Russell King - ARM Linux
2009-08-11 9:31 ` Catalin Marinas
3 siblings, 1 reply; 42+ messages in thread
From: Jamie Lokier @ 2009-08-07 10:23 UTC (permalink / raw)
To: David Xiao
Cc: Russell King - ARM Linux, Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

David Xiao wrote:
> > However, that won't work with ARMv7's speculative prefetching. I'm
> > afraid with such things, DMA direct into userspace mappings becomes a
> > _lot_ harder, and lets face it, lots of Linux drivers just aren't going
> > to bother supporting this - we can't currently get agreement to have an
> > API to map DMA coherent pages into userspace!
>
> The V7 speculative prefetching will then probably apply to DMA coherency
> issue in general, both kernel and user space DMAs. Could this be
> addressed by inside the dma_unmap_sg/single() calling dma_cache_maint()
> when the direction is DMA_FROM_DEVICE/DMA_BIDIRECTIONAL, to basically
> invalidate the related cache lines in case any filled by prefetching?
> Assuming dma_unmap_sg/single() is called after each DMA operation is
> completed.

If it's possible, surely it's essential because of O_DIRECT file and
block I/O?

-- Jamie

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-07 10:23 ` Jamie Lokier @ 2009-08-07 19:03 ` Russell King - ARM Linux 0 siblings, 0 replies; 42+ messages in thread From: Russell King - ARM Linux @ 2009-08-07 19:03 UTC (permalink / raw) To: Jamie Lokier Cc: David Xiao, Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Fri, Aug 07, 2009 at 11:23:39AM +0100, Jamie Lokier wrote: > David Xiao wrote: > > > However, that won't work with ARMv7's speculative prefetching. I'm > > > afraid with such things, DMA direct into userspace mappings becomes a > > > _lot_ harder, and lets face it, lots of Linux drivers just aren't going > > > to bother supporting this - we can't currently get agreement to have an > > > API to map DMA coherent pages into userspace! > > > > The V7 speculative prefetching will then probably apply to DMA coherency > > issue in general, both kernel and user space DMAs. Could this be > > addressed by inside the dma_unmap_sg/single() calling dma_cache_maint() > > when the direction is DMA_FROM_DEVICE/DMA_BIDIRECTIONAL, to basically > > invalidate the related cache lines in case any filled by prefetching? > > Assuming dma_unmap_sg/single() is called after each DMA operation is > > completed. > > If it's possible, surely its essential because of O_DIRECT file and > block I/O? The problem is that you require a _VIRTUAL_ address. The unmap functions do not have that information passed to them, so we need some way of maintaining that or calculating it. I've covered that issue in my postings this morning (please follow up there instead.) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-07 5:59 ` David Xiao ` (2 preceding siblings ...) 2009-08-07 10:23 ` Jamie Lokier @ 2009-08-11 9:31 ` Catalin Marinas 2009-08-11 18:23 ` David Xiao 3 siblings, 1 reply; 42+ messages in thread From: Catalin Marinas @ 2009-08-11 9:31 UTC (permalink / raw) To: David Xiao Cc: Russell King - ARM Linux, Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Thu, 2009-08-06 at 22:59 -0700, David Xiao wrote: > The V7 speculative prefetching will then probably apply to DMA coherency > issue in general, both kernel and user space DMAs. Could this be > addressed by inside the dma_unmap_sg/single() calling dma_cache_maint() > when the direction is DMA_FROM_DEVICE/DMA_BIDIRECTIONAL, to basically > invalidate the related cache lines in case any filled by prefetching? > Assuming dma_unmap_sg/single() is called after each DMA operation is > completed. Theoretically, with speculative prefetching on ARMv7 and the FROM_DEVICE case we need to invalidate the corresponding D-cache lines both before and after the DMA transfer, i.e. in both dma_map_sg and dma_unmap_sg, otherwise there is a risk of stale data in the cache. -- Catalin ^ permalink raw reply [flat|nested] 42+ messages in thread
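The need for an invalidate on both sides of the transfer can be illustrated with a toy model. This is a userspace sketch under strong simplifying assumptions (a single cache line covering the whole buffer, a device that writes straight to RAM, and a speculative fill that happens at the worst possible moment); all `toy_*` names are invented for the illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUF_SIZE 8

/* Toy system: RAM written by the device, plus one cache line shadowing it. */
struct toy_system {
    unsigned char ram[BUF_SIZE];
    unsigned char cache[BUF_SIZE];
    bool cache_valid;
};

/* Invalidate: drop the cached copy without writing it back. */
void toy_invalidate(struct toy_system *s)
{
    s->cache_valid = false;
}

/* Speculative prefetch: the CPU may fill the line from RAM at any time. */
void toy_speculative_prefetch(struct toy_system *s)
{
    if (!s->cache_valid) {
        memcpy(s->cache, s->ram, BUF_SIZE);
        s->cache_valid = true;
    }
}

/* Device DMA write: goes to RAM only and bypasses the cache. */
void toy_dma_write(struct toy_system *s, unsigned char value)
{
    memset(s->ram, value, BUF_SIZE);
}

/* CPU read: served from the cache, filling the line from RAM on a miss. */
unsigned char toy_cpu_read(struct toy_system *s, int i)
{
    assert(i >= 0 && i < BUF_SIZE);
    if (!s->cache_valid) {
        memcpy(s->cache, s->ram, BUF_SIZE);
        s->cache_valid = true;
    }
    return s->cache[i];
}

/* One DMA_FROM_DEVICE transfer; returns the first byte the CPU sees afterwards. */
unsigned char toy_transfer(bool invalidate_after)
{
    struct toy_system s;

    memset(&s, 0, sizeof(s));       /* RAM holds old data (0x00), cache empty */
    toy_invalidate(&s);             /* "map" side: invalidate before DMA */
    toy_speculative_prefetch(&s);   /* CPU speculatively caches the old data */
    toy_dma_write(&s, 0xAB);        /* device writes new data to RAM */
    if (invalidate_after)
        toy_invalidate(&s);         /* "unmap" side: invalidate after DMA */
    return toy_cpu_read(&s, 0);
}
```

In this model `toy_transfer(false)` returns the old byte because the speculatively filled line survives the transfer, while `toy_transfer(true)` returns the byte written by the device.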
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-11 9:31 ` Catalin Marinas
@ 2009-08-11 18:23 ` David Xiao
0 siblings, 0 replies; 42+ messages in thread
From: David Xiao @ 2009-08-11 18:23 UTC (permalink / raw)
To: Catalin Marinas
Cc: Russell King - ARM Linux, Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Tue, 2009-08-11 at 02:31 -0700, Catalin Marinas wrote:
> On Thu, 2009-08-06 at 22:59 -0700, David Xiao wrote:
> > The V7 speculative prefetching will then probably apply to DMA coherency
> > issue in general, both kernel and user space DMAs. Could this be
> > addressed by inside the dma_unmap_sg/single() calling dma_cache_maint()
> > when the direction is DMA_FROM_DEVICE/DMA_BIDIRECTIONAL, to basically
> > invalidate the related cache lines in case any filled by prefetching?
> > Assuming dma_unmap_sg/single() is called after each DMA operation is
> > completed.
>
> Theoretically, with speculative prefetching on ARMv7 and the FROM_DEVICE
> case we need to invalidate the corresponding D-cache lines both before
> and after the DMA transfer, i.e. in both dma_map_sg and dma_unmap_sg,
> otherwise there is a risk of stale data in the cache.
>
The dma_map_sg() code already calls dma_cache_maint() to invalidate
the cache lines in the DMA_FROM_DEVICE/DMA_BIDIRECTIONAL cases.
The suggestion was to do something similar in dma_unmap_sg() to deal
with the speculative prefetching on ARMv7, and Russell has other
postings discussing the details of this in terms of feasibility etc.

Furthermore, duplicate MMU mappings in the kernel bring more twists to
this problem, as explained elsewhere in this email chain, especially in
the case of DMA-coherent memory (dma_alloc_coherent()).

David

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-06 22:25 ` Russell King - ARM Linux 2009-08-07 5:59 ` David Xiao @ 2009-08-07 7:48 ` Laurent Pinchart 2009-08-25 12:53 ` Steven Walter 2 siblings, 0 replies; 42+ messages in thread From: Laurent Pinchart @ 2009-08-07 7:48 UTC (permalink / raw) To: Russell King - ARM Linux Cc: David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Friday 07 August 2009 00:25:43 Russell King - ARM Linux wrote: > > As far as userspace DMA coherency, the only way you could do it with > current kernel APIs is by using get_user_pages(), creating a scatterlist > from those, and then passing it to dma_map_sg(). While the device has > ownership of the SG, userspace must _not_ touch the buffer until after > DMA has completed. If the buffers are going to be reused again and again, would it be possible to mark the pages returned by get_user_pages() as non-cacheable instead ? Regards, Laurent Pinchart ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-06 22:25 ` Russell King - ARM Linux 2009-08-07 5:59 ` David Xiao 2009-08-07 7:48 ` Laurent Pinchart @ 2009-08-25 12:53 ` Steven Walter 2009-08-25 22:02 ` David Xiao 2009-09-01 13:28 ` Russell King - ARM Linux 2 siblings, 2 replies; 42+ messages in thread From: Steven Walter @ 2009-08-25 12:53 UTC (permalink / raw) To: Russell King - ARM Linux Cc: David Xiao, Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM Linux<linux@arm.linux.org.uk> wrote: [...] > As far as userspace DMA coherency, the only way you could do it with > current kernel APIs is by using get_user_pages(), creating a scatterlist > from those, and then passing it to dma_map_sg(). While the device has > ownership of the SG, userspace must _not_ touch the buffer until after > DMA has completed. [...] Would that work on a processor with VIVT caches? It seems not. In particular, dma_map_page uses page_address to get a virtual address to pass to map_single(). map_single() in turn uses this address to perform cache maintenance. Since page_address() returns the kernel virtual address, I don't see how any cache-lines for the userspace virtual address would get invalidated (for the DMA_FROM_DEVICE case). If that's true, then what is the correct way to allow DMA to/from a userspace buffer with a VIVT cache? If not true, what am I missing? Thanks -- -Steven Walter <stevenrwalter@gmail.com> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?") 2009-08-25 12:53 ` Steven Walter @ 2009-08-25 22:02 ` David Xiao 2009-08-25 23:17 ` Laurent Pinchart 2009-09-01 13:28 ` Russell King - ARM Linux 1 sibling, 1 reply; 42+ messages in thread From: David Xiao @ 2009-08-25 22:02 UTC (permalink / raw) To: Steven Walter Cc: Russell King - ARM Linux, Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk On Tue, 2009-08-25 at 05:53 -0700, Steven Walter wrote: > On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM > Linux<linux@arm.linux.org.uk> wrote: > [...] > > As far as userspace DMA coherency, the only way you could do it with > > current kernel APIs is by using get_user_pages(), creating a scatterlist > > from those, and then passing it to dma_map_sg(). While the device has > > ownership of the SG, userspace must _not_ touch the buffer until after > > DMA has completed. > [...] > > Would that work on a processor with VIVT caches? It seems not. In > particular, dma_map_page uses page_address to get a virtual address to > pass to map_single(). map_single() in turn uses this address to > perform cache maintenance. Since page_address() returns the kernel > virtual address, I don't see how any cache-lines for the userspace > virtual address would get invalidated (for the DMA_FROM_DEVICE case). > > If that's true, then what is the correct way to allow DMA to/from a > userspace buffer with a VIVT cache? If not true, what am I missing? page_address() is basically returning page->virtual, which records the virtual/physical mapping for both user/kernel space; and what only matters there is highmem or not. David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-25 22:02 ` David Xiao
@ 2009-08-25 23:17 ` Laurent Pinchart
2009-08-26 17:22 ` David Xiao
0 siblings, 1 reply; 42+ messages in thread
From: Laurent Pinchart @ 2009-08-25 23:17 UTC (permalink / raw)
To: David Xiao
Cc: Steven Walter, Russell King - ARM Linux, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Wednesday 26 August 2009 00:02:48 David Xiao wrote:
> On Tue, 2009-08-25 at 05:53 -0700, Steven Walter wrote:
> > On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM
> > Linux<linux@arm.linux.org.uk> wrote:
> > [...]
> >
> > > As far as userspace DMA coherency, the only way you could do it with
> > > current kernel APIs is by using get_user_pages(), creating a
> > > scatterlist from those, and then passing it to dma_map_sg(). While the
> > > device has ownership of the SG, userspace must _not_ touch the buffer
> > > until after DMA has completed.
> >
> > [...]
> >
> > Would that work on a processor with VIVT caches? It seems not. In
> > particular, dma_map_page uses page_address to get a virtual address to
> > pass to map_single(). map_single() in turn uses this address to
> > perform cache maintenance. Since page_address() returns the kernel
> > virtual address, I don't see how any cache-lines for the userspace
> > virtual address would get invalidated (for the DMA_FROM_DEVICE case).
> >
> > If that's true, then what is the correct way to allow DMA to/from a
> > userspace buffer with a VIVT cache? If not true, what am I missing?
>
> page_address() is basically returning page->virtual, which records the
> virtual/physical mapping for both user/kernel space; and what only
> matters there is highmem or not.

I'm not sure I follow. Are you implying that a physical page will then be
mapped to the same address in all contexts (kernelspace and userspace
processes)? Is that even possible? And if not, how could page->virtual store
both the initial kernel mapping and all the userspace mappings?

--
Laurent Pinchart

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-25 23:17 ` Laurent Pinchart
@ 2009-08-26 17:22 ` David Xiao
2009-09-01 13:31 ` Russell King - ARM Linux
0 siblings, 1 reply; 42+ messages in thread
From: David Xiao @ 2009-08-26 17:22 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Steven Walter, Russell King - ARM Linux, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Tue, 2009-08-25 at 16:17 -0700, Laurent Pinchart wrote:
> On Wednesday 26 August 2009 00:02:48 David Xiao wrote:
> > On Tue, 2009-08-25 at 05:53 -0700, Steven Walter wrote:
> > > On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM
> > > Linux<linux@arm.linux.org.uk> wrote:
> > > [...]
> > >
> > > > As far as userspace DMA coherency, the only way you could do it with
> > > > current kernel APIs is by using get_user_pages(), creating a
> > > > scatterlist from those, and then passing it to dma_map_sg(). While the
> > > > device has ownership of the SG, userspace must _not_ touch the buffer
> > > > until after DMA has completed.
> > >
> > > [...]
> > >
> > > Would that work on a processor with VIVT caches? It seems not. In
> > > particular, dma_map_page uses page_address to get a virtual address to
> > > pass to map_single(). map_single() in turn uses this address to
> > > perform cache maintenance. Since page_address() returns the kernel
> > > virtual address, I don't see how any cache-lines for the userspace
> > > virtual address would get invalidated (for the DMA_FROM_DEVICE case).
> > >
> > > If that's true, then what is the correct way to allow DMA to/from a
> > > userspace buffer with a VIVT cache? If not true, what am I missing?
> >
> > page_address() is basically returning page->virtual, which records the
> > virtual/physical mapping for both user/kernel space; and what only
> > matters there is highmem or not.
>
> I'm not sure to get it. Are you implying that a physical page will then be
> mapped to the same address in all contexts (kernelspace and userspace
> processes) ? Is that even possible ? And if not, how could page->virtual store
> both the initial kernel map and all the userspace mappings ?
>

Sorry for the confusion, page_address() indeed only returns kernel
virtual address; and in order to support VIVT cache maintenance for the
user space mappings, the dma_map_sg/dma_map_page() functions or even the
struct scatterlist do seem to have to be modified to pass in virtual
address, I think.

David

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-26 17:22 ` David Xiao
@ 2009-09-01 13:31 ` Russell King - ARM Linux
2009-09-01 18:08 ` David Xiao
0 siblings, 1 reply; 42+ messages in thread
From: Russell King - ARM Linux @ 2009-09-01 13:31 UTC (permalink / raw)
To: David Xiao
Cc: Laurent Pinchart, Steven Walter, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Wed, Aug 26, 2009 at 10:22:11AM -0700, David Xiao wrote:
> Sorry for the confusion, page_address() indeed only returns kernel
> virtual address; and in order to support VIVT cache maintenance for the
> user space mappings, the dma_map_sg/dma_map_page() functions or even the
> struct scatterlist do seem to have to be modified to pass in virtual
> address, I think.

That's the wrong answer. When DMA happens (and therefore these functions
are called) the userspace context could already have been switched away,
which means that any userspace address information is useless.

Adding support to the existing DMA API functions so they can be used for
userspace mapped pages is simply the wrong approach - most users of those
functions are not concerned with userspace mapped pages at all, and adding
that burden onto all those users is clearly sub-optimal.

The right answer? I don't think there is one (see my previous mail.)

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-09-01 13:31 ` Russell King - ARM Linux
@ 2009-09-01 18:08 ` David Xiao
0 siblings, 0 replies; 42+ messages in thread
From: David Xiao @ 2009-09-01 18:08 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Laurent Pinchart, Steven Walter, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Tue, 2009-09-01 at 06:31 -0700, Russell King - ARM Linux wrote:
> On Wed, Aug 26, 2009 at 10:22:11AM -0700, David Xiao wrote:
> > Sorry for the confusion, page_address() indeed only returns kernel
> > virtual address; and in order to support VIVT cache maintenance for the
> > user space mappings, the dma_map_sg/dma_map_page() functions or even the
> > struct scatterlist do seem to have to be modified to pass in virtual
> > address, I think.
>
> That's the wrong answer. When DMA happens (and therefore these functions
> are called) the userspace context could already have been switched away,
> which means that any userspace address information is useless.
>

The dma_map_sg/page() needs to be set up before starting DMA operations.
If the context switch happens before/when DMA occurs, that is okay since
in the case of VIVT cache all the necessary cache lines will be
invalidated/flushed anyway with every context switch.

My understanding is that there are basically two issues associated with
VIVT cache in an OS environment:

1. address space change. When a context switch happens, if the new
   address space is overlapping with the old one, as ARM linux does, all
   the related cache lines have to be invalidated/flushed, unless
   something like ASID is used together with VIVT cache.

2. cache-line aliasing in the same address space. In the user space DMA
   case, we are assuming that these physical pages are only mapped
   twice, once in user space and once in kernel direct-mapping. I went
   through the kernel code path and think the kernel direct-mapping was
   already flushed/invalidated before the pages were handed over to the
   user space; therefore, the proposal is to record the user space
   virtual address and do the proper cache maintenance operations.

> Adding support to the existing DMA API functions so they can be used for
> userspace mapped pages is simply the wrong approach - most users of those
> functions are not concerned with userspace mapped pages at all, and adding
> that burden onto all those users is clearly sub-optimal.
>

The kernel is already addressing the mmap() file case by putting the
mapping field into the struct page, etc.; and I personally do not think
it is too much of a change for the user space DMA case, if we agree the
application/request is valid of course.

David

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-25 12:53 ` Steven Walter
2009-08-25 22:02 ` David Xiao
@ 2009-09-01 13:28 ` Russell King - ARM Linux
2009-09-01 13:43 ` Laurent Pinchart
1 sibling, 1 reply; 42+ messages in thread
From: Russell King - ARM Linux @ 2009-09-01 13:28 UTC (permalink / raw)
To: Steven Walter
Cc: David Xiao, Laurent Pinchart, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Tue, Aug 25, 2009 at 08:53:29AM -0400, Steven Walter wrote:
> On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM
> Linux<linux@arm.linux.org.uk> wrote:
> [...]
> > As far as userspace DMA coherency, the only way you could do it with
> > current kernel APIs is by using get_user_pages(), creating a scatterlist
> > from those, and then passing it to dma_map_sg(). While the device has
> > ownership of the SG, userspace must _not_ touch the buffer until after
> > DMA has completed.
> [...]
>
> Would that work on a processor with VIVT caches? It seems not. In
> particular, dma_map_page uses page_address to get a virtual address to
> pass to map_single(). map_single() in turn uses this address to
> perform cache maintenance. Since page_address() returns the kernel
> virtual address, I don't see how any cache-lines for the userspace
> virtual address would get invalidated (for the DMA_FROM_DEVICE case).

You are correct.

> If that's true, then what is the correct way to allow DMA to/from a
> userspace buffer with a VIVT cache? If not true, what am I missing?

I don't think you read what I said (but I've also forgotten what I did
say).

To put it simply, the kernel does not support DMA direct from userspace
pages. Solutions which have been proposed in the past only work with a
sub-set of conditions (such as the one above only works with VIPT
caches.)

^ permalink raw reply [flat|nested] 42+ messages in thread
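A minimal sketch of the aliasing problem Steven describes and Russell
confirms, written as a toy Python model rather than kernel code.
Everything in it is an assumption made for illustration: the class
ToyVivtCache, the cache geometry and the two addresses are hypothetical
and do not describe any real ARM core or kernel API. It only shows that,
in a cache indexed and tagged by virtual address, invalidating through
the kernel alias leaves the lines allocated through the userspace alias
in place.

```python
LINE_SIZE = 32      # bytes per cache line (assumed)
NUM_LINES = 512     # direct-mapped, 16 KiB in total (assumed)
PAGE_SIZE = 4096

USER_VA = 0x40001000    # hypothetical userspace mapping of one page
KERNEL_VA = 0xC0201000  # hypothetical kernel direct-map address of that page


class ToyVivtCache:
    """Direct-mapped cache whose index AND tag both come from the virtual address."""

    def __init__(self):
        self.tags = [None] * NUM_LINES  # None means the line is invalid

    @staticmethod
    def _split(vaddr):
        line = vaddr // LINE_SIZE
        return line % NUM_LINES, line // NUM_LINES  # (index, tag)

    def _lines(self, vaddr, length):
        # vaddr is assumed to be aligned to LINE_SIZE
        return (self._split(va) for va in range(vaddr, vaddr + length, LINE_SIZE))

    def touch(self, vaddr, length):
        """CPU access through a virtual range: allocate the lines."""
        for index, tag in self._lines(vaddr, length):
            self.tags[index] = tag

    def invalidate(self, vaddr, length):
        """Invalidate by virtual address: only lines whose tag matches are dropped."""
        for index, tag in self._lines(vaddr, length):
            if self.tags[index] == tag:
                self.tags[index] = None

    def cached_lines(self, vaddr, length):
        """Count the lines of the range that are still valid."""
        return sum(1 for index, tag in self._lines(vaddr, length)
                   if self.tags[index] == tag)


def stale_after_invalidate(invalidate_va):
    """Userspace reads the page, then the driver invalidates via invalidate_va."""
    cache = ToyVivtCache()
    cache.touch(USER_VA, PAGE_SIZE)
    cache.invalidate(invalidate_va, PAGE_SIZE)
    return cache.cached_lines(USER_VA, PAGE_SIZE)
```

With these made-up addresses, stale_after_invalidate(KERNEL_VA) reports
all 128 lines of the page still cached under the user address, while
stale_after_invalidate(USER_VA) reports none.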
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-09-01 13:28 ` Russell King - ARM Linux
@ 2009-09-01 13:43 ` Laurent Pinchart
2009-09-01 14:18 ` Russell King - ARM Linux
2009-09-02 15:10 ` Imre Deak
0 siblings, 2 replies; 42+ messages in thread
From: Laurent Pinchart @ 2009-09-01 13:43 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Steven Walter, David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Tuesday 01 September 2009 15:28:24 Russell King - ARM Linux wrote:
> On Tue, Aug 25, 2009 at 08:53:29AM -0400, Steven Walter wrote:
> > On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM
> > Linux<linux@arm.linux.org.uk> wrote:
> > [...]
> >
> > > As far as userspace DMA coherency, the only way you could do it with
> > > current kernel APIs is by using get_user_pages(), creating a
> > > scatterlist from those, and then passing it to dma_map_sg(). While the
> > > device has ownership of the SG, userspace must _not_ touch the buffer
> > > until after DMA has completed.
> >
> > [...]
> >
> > Would that work on a processor with VIVT caches? It seems not. In
> > particular, dma_map_page uses page_address to get a virtual address to
> > pass to map_single(). map_single() in turn uses this address to
> > perform cache maintenance. Since page_address() returns the kernel
> > virtual address, I don't see how any cache-lines for the userspace
> > virtual address would get invalidated (for the DMA_FROM_DEVICE case).
>
> You are correct.
>
> > If that's true, then what is the correct way to allow DMA to/from a
> > userspace buffer with a VIVT cache? If not true, what am I missing?
>
> I don't think you read what I said (but I've also forgotten what I did
> say).
>
> To put it simply, the kernel does not support DMA direct from userspace
> pages. Solutions which have been proposed in the past only work with a
> sub-set of conditions (such as the one above only works with VIPT
> caches.)

I might be missing something obvious, but I fail to see how VIVT caches could
work at all with multiple mappings. If a kernel-allocated buffer is DMA'ed to,
we certainly want to invalidate all cache lines that store buffer data. As the
cache doesn't care about physical addresses we thus need to invalidate all
virtual mappings for the buffer. If the buffer is mmap'ed in userspace I don't
see how that would be done.

--
Laurent Pinchart

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-09-01 13:43 ` Laurent Pinchart
@ 2009-09-01 14:18 ` Russell King - ARM Linux
2009-09-01 16:53 ` Hugh Dickins
2009-09-02 15:10 ` Imre Deak
1 sibling, 1 reply; 42+ messages in thread
From: Russell King - ARM Linux @ 2009-09-01 14:18 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Steven Walter, David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Tue, Sep 01, 2009 at 03:43:48PM +0200, Laurent Pinchart wrote:
> I might be missing something obvious, but I fail to see how VIVT caches
> could work at all with multiple mappings. If a kernel-allocated buffer
> is DMA'ed to, we certainly want to invalidate all cache lines that store
> buffer data. As the cache doesn't care about physical addresses we thus
> need to invalidate all virtual mappings for the buffer. If the buffer is
> mmap'ed in userspace I don't see how that would be done.

You need to ask MM gurus about that. I don't touch the Linux MM very
often so tend to keep forgetting how it works. However, it does work
for shared mappings of files on CPUs with VIVT caches.

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-09-01 14:18 ` Russell King - ARM Linux
@ 2009-09-01 16:53 ` Hugh Dickins
0 siblings, 0 replies; 42+ messages in thread
From: Hugh Dickins @ 2009-09-01 16:53 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Laurent Pinchart, Steven Walter, David Xiao, Ben Dooks, Robin Holt, linux-kernel, linux-media, linux-arm-kernel

On Tue, 1 Sep 2009, Russell King - ARM Linux wrote:
> On Tue, Sep 01, 2009 at 03:43:48PM +0200, Laurent Pinchart wrote:
> > I might be missing something obvious, but I fail to see how VIVT caches
> > could work at all with multiple mappings. If a kernel-allocated buffer
> > is DMA'ed to, we certainly want to invalidate all cache lines that store
> > buffer data. As the cache doesn't care about physical addresses we thus
> > need to invalidate all virtual mappings for the buffer. If the buffer is
> > mmap'ed in userspace I don't see how that would be done.
>
> You need to ask MM gurus about that. I don't touch the Linux MM very
> often so tend to keep forgetting how it works. However, it does work
> for shared mappings of files on CPUs with VIVT caches.

I believe arch/arm/mm/flush.c __flush_dcache_aliases() is what does it.

Hugh

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-09-01 13:43 ` Laurent Pinchart
2009-09-01 14:18 ` Russell King - ARM Linux
@ 2009-09-02 15:10 ` Imre Deak
2009-09-03 7:31 ` Imre Deak
2009-09-03 8:36 ` Russell King - ARM Linux
1 sibling, 2 replies; 42+ messages in thread
From: Imre Deak @ 2009-09-02 15:10 UTC (permalink / raw)
To: ext Laurent Pinchart
Cc: Russell King - ARM Linux, Steven Walter, David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Tue, Sep 01, 2009 at 03:43:48PM +0200, ext Laurent Pinchart wrote:
> On Tuesday 01 September 2009 15:28:24 Russell King - ARM Linux wrote:
> > On Tue, Aug 25, 2009 at 08:53:29AM -0400, Steven Walter wrote:
> > > On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM
> > > Linux<linux@arm.linux.org.uk> wrote:
> > > [...]
> > >
> > > > As far as userspace DMA coherency, the only way you could do it with
> > > > current kernel APIs is by using get_user_pages(), creating a
> > > > scatterlist from those, and then passing it to dma_map_sg(). While the
> > > > device has ownership of the SG, userspace must _not_ touch the buffer
> > > > until after DMA has completed.
> > >
> > > [...]
> > >
> > > Would that work on a processor with VIVT caches? It seems not. In
> > > particular, dma_map_page uses page_address to get a virtual address to
> > > pass to map_single(). map_single() in turn uses this address to
> > > perform cache maintenance. Since page_address() returns the kernel
> > > virtual address, I don't see how any cache-lines for the userspace
> > > virtual address would get invalidated (for the DMA_FROM_DEVICE case).
> >
> > You are correct.
> >
> > > If that's true, then what is the correct way to allow DMA to/from a
> > > userspace buffer with a VIVT cache? If not true, what am I missing?
> >
> > I don't think you read what I said (but I've also forgotten what I did
> > say).
> >
> > To put it simply, the kernel does not support DMA direct from userspace
> > pages. Solutions which have been proposed in the past only work with a
> > sub-set of conditions (such as the one above only works with VIPT
> > caches.)
>
> I might be missing something obvious, but I fail to see how VIVT caches could
> work at all with multiple mappings. If a kernel-allocated buffer is DMA'ed to,
> we certainly want to invalidate all cache lines that store buffer data. As the
> cache doesn't care about physical addresses we thus need to invalidate all
> virtual mappings for the buffer. If the buffer is mmap'ed in userspace I don't
> see how that would be done.

To my understanding buffers returned by dma_alloc_*, kmalloc, vmalloc
are ok:

The cache lines for direct mapping are flushed in dma_alloc_* and
vmalloc. After this you are not supposed to access the buffers
through the direct mapping until you're done with the DMA.

For kmalloc you use the direct mapping in the first place, so the
flush in dma_map_* will be enough.

For user mappings I think you'd have to do an additional flush for
the direct mapping, while the user mapping is flushed in dma_map_*.

--Imre

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-09-02 15:10 ` Imre Deak
@ 2009-09-03 7:31 ` Imre Deak
2009-09-03 8:36 ` Russell King - ARM Linux
1 sibling, 0 replies; 42+ messages in thread
From: Imre Deak @ 2009-09-03 7:31 UTC (permalink / raw)
To: ext Laurent Pinchart, Russell King - ARM Linux
Cc: Steven Walter, David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Wed, Sep 02, 2009 at 05:10:44PM +0200, Deak Imre (Nokia-D/Helsinki) wrote:
> On Tue, Sep 01, 2009 at 03:43:48PM +0200, ext Laurent Pinchart wrote:
> > [...]
> > I might be missing something obvious, but I fail to see how VIVT caches could
> > work at all with multiple mappings. If a kernel-allocated buffer is DMA'ed to,
> > we certainly want to invalidate all cache lines that store buffer data. As the
> > cache doesn't care about physical addresses we thus need to invalidate all
> > virtual mappings for the buffer. If the buffer is mmap'ed in userspace I don't
> > see how that would be done.
>
> To my understanding buffers returned by dma_alloc_*, kmalloc, vmalloc
> are ok:
>
> The cache lines for direct mapping are flushed in dma_alloc_* and
> vmalloc. After this you are not supposed to access the buffers
> through the direct mapping until you're done with the DMA.
>
> For kmalloc you use the direct mapping in the first place, so the
> flush in dma_map_* will be enough.
>
> For user mappings I think you'd have to do an additional flush for
> the direct mapping, while the user mapping is flushed in dma_map_*.

Based on the discussion so far this is my understanding on how
zero-copy DMA is possible on ARM. Could you please confirm / correct
these?:

- user space passes an arbitrary buffer:
  - get_user_pages(user address range)
  - DMA(user address range)
  - user space reads from the buffer

  Problems:
  - not supported according to Russell
  - unhandled faults for cache ops on not-present PTEs, but patch
    from Laurent fixes this

- mmap a kernel buffer to user space with cacheable mapping:
  - user space writes to the buffer
  - flush cache(user address range)
  - DMA(kernel buffer)
  - user space reads from the buffer

  The additional flush cache is needed for VIVT/aliasing VIPT.

  Instead of the flush cache:
  - the mapping can be done with writethrough, non-writeallocate
    or non-cacheable mapping, or
  - for aliasing VIPT a non-aliasing user address is picked

DMA(address range) is:
- dma_map_*(address range)
- perform DMA to/from address range
- dma_unmap_*(address range)

Thanks,
Imre

^ permalink raw reply [flat|nested] 42+ messages in thread
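A small sketch of the last option in Imre's list, picking a non-aliasing
user address on an aliasing VIPT cache, as a toy Python calculation. The
assumptions are stated loudly: 4 KiB pages, a way size (cache size
divided by associativity) that is a power-of-two multiple of the page
size, and the function names cache_colour, may_alias and
pick_non_aliasing, which are hypothetical and are not kernel functions.
The idea is that two virtual mappings of one physical page land in the
same cache sets only when the index bits above the page offset (the
"colour") are equal.

```python
PAGE_SHIFT = 12  # 4 KiB pages (assumed)


def num_colours(way_size):
    """Number of page colours; 1 means the cache cannot alias."""
    return max(way_size >> PAGE_SHIFT, 1)


def cache_colour(vaddr, way_size):
    """Cache index bits that lie above the page offset."""
    return (vaddr >> PAGE_SHIFT) % num_colours(way_size)


def may_alias(va_a, va_b, way_size):
    """True if two mappings of the same physical page could use different sets."""
    return cache_colour(va_a, way_size) != cache_colour(va_b, way_size)


def pick_non_aliasing(hint, kernel_va, way_size):
    """Lowest page-aligned address >= hint with the same colour as kernel_va."""
    page = (hint + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT  # round up to a page
    colours = num_colours(way_size)
    delta = (cache_colour(kernel_va, way_size) - page % colours) % colours
    return (page + delta) << PAGE_SHIFT
```

For example, with a 16 KiB way there are four colours, so a kernel
address in colour 3 and a user address in colour 0 may alias; rounding
the user address up to the next colour-3 page removes the alias. With a
4 KiB way there is a single colour and no aliasing is possible.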
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-09-02 15:10 ` Imre Deak
2009-09-03 7:31 ` Imre Deak
@ 2009-09-03 8:36 ` Russell King - ARM Linux
2009-09-08 13:05 ` Steven Walter
1 sibling, 1 reply; 42+ messages in thread
From: Russell King - ARM Linux @ 2009-09-03 8:36 UTC (permalink / raw)
To: Imre Deak
Cc: ext Laurent Pinchart, Steven Walter, David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Wed, Sep 02, 2009 at 06:10:44PM +0300, Imre Deak wrote:
> To my understanding buffers returned by dma_alloc_*, kmalloc, vmalloc
> are ok:

For dma_map_*, the only pages/addresses which are valid to pass are
those returned by get_free_pages() or kmalloc. Everything else is
not permitted.

Use of vmalloc'd and dma_alloc_* pages with the dma_map_* APIs is invalid
use of the DMA API. See the notes in the DMA-mapping.txt document
against "dma_map_single".

> For user mappings I think you'd have to do an additional flush for
> the direct mapping, while the user mapping is flushed in dma_map_*.

I will not accept a patch which adds flushing of anything other than
the kernel direct mapping in the dma_map_* functions, so please find
a different approach.

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-09-03 8:36 ` Russell King - ARM Linux
@ 2009-09-08 13:05 ` Steven Walter
0 siblings, 0 replies; 42+ messages in thread
From: Steven Walter @ 2009-09-08 13:05 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Imre Deak, ext Laurent Pinchart, David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Thu, Sep 3, 2009 at 4:36 AM, Russell King - ARM
Linux<linux@arm.linux.org.uk> wrote:
> On Wed, Sep 02, 2009 at 06:10:44PM +0300, Imre Deak wrote:
>> To my understanding buffers returned by dma_alloc_*, kmalloc, vmalloc
>> are ok:
>
> For dma_map_*, the only pages/addresses which are valid to pass are
> those returned by get_free_pages() or kmalloc. Everything else is
> not permitted.
>
> Use of vmalloc'd and dma_alloc_* pages with the dma_map_* APIs is invalid
> use of the DMA API. See the notes in the DMA-mapping.txt document
> against "dma_map_single".

Actually, DMA-mapping.txt seems to explicitly say that it's allowed to
use pages allocated by vmalloc:

"It is possible to DMA to the _underlying_ memory mapped into a
vmalloc() area, but this requires walking page tables to get the
physical addresses, and then translating each of those pages back to a
kernel address using something like __va()."

>> For user mappings I think you'd have to do an additional flush for
>> the direct mapping, while the user mapping is flushed in dma_map_*.
>
> I will not accept a patch which adds flushing of anything other than
> the kernel direct mapping in the dma_map_* functions, so please find
> a different approach.

What's the concern here? Just the performance overhead of the checks
and additional flushes? It seems much more desirable for the dma_map_*
API to take care of potential cache aliases than to require every
driver to manage it for itself. After all, part of the purpose of the
DMA API is to manage the cache maintenance around DMAs in an
architecture-independent way.

--
-Steven Walter <stevenrwalter@gmail.com>

^ permalink raw reply [flat|nested] 42+ messages in thread
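Steven's remark that the DMA API exists to hide cache maintenance from
drivers can be pictured with a toy Python model. This is a sketch under
stated assumptions, not kernel behaviour: the ToyBuffer class and the
dma_map() helper are hypothetical, the buffer is treated as a single
cache line, and the pairing of direction with cache operation (clean for
to-device, invalidate for from-device, both for bidirectional) is the
conventional one for a non-coherent write-back cache rather than a
description of any particular kernel version.

```python
class ToyBuffer:
    """One buffer seen by a CPU with a write-back cache and by a DMA device."""

    def __init__(self, size):
        self.ram = bytearray(size)  # what the device sees
        self.cache = None           # CPU's cached copy, None if not cached
        self.dirty = False

    def _fill(self):
        if self.cache is None:
            self.cache = bytearray(self.ram)

    def cpu_read(self):
        self._fill()
        return bytes(self.cache)

    def cpu_write(self, data):
        # data is assumed to fit inside the buffer
        self._fill()
        self.cache[:len(data)] = data
        self.dirty = True

    def clean(self):
        """Write dirty cached data back to RAM."""
        if self.cache is not None and self.dirty:
            self.ram[:] = self.cache
            self.dirty = False

    def invalidate(self):
        """Drop the cached copy without writing it back."""
        self.cache = None
        self.dirty = False

    def device_write(self, data):
        self.ram[:len(data)] = data

    def device_read(self):
        return bytes(self.ram)


def dma_map(buf, direction):
    """Toy stand-in for the cache maintenance done when mapping for DMA."""
    if direction in ("to_device", "bidirectional"):
        buf.clean()
    if direction in ("from_device", "bidirectional"):
        buf.invalidate()


def capture(map_first):
    """CPU has old data cached, the device then writes a new frame."""
    buf = ToyBuffer(4)
    buf.cpu_read()
    if map_first:
        dma_map(buf, "from_device")
    buf.device_write(b"\xAA\xBB\xCC\xDD")
    return buf.cpu_read()
```

In this model capture(False) returns the stale zeros because the CPU
still hits its cached copy, while capture(True) returns the bytes the
device wrote. The disagreement in the thread is about which virtual
aliases such a map step should be responsible for, which this
single-alias model deliberately leaves out.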
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-06 18:46 ` David Xiao
` (2 preceding siblings ...)
2009-08-06 22:25 ` Russell King - ARM Linux
@ 2009-08-07 7:29 ` Laurent Pinchart
2009-08-07 8:12 ` Matthieu CASTET
3 siblings, 1 reply; 42+ messages in thread
From: Laurent Pinchart @ 2009-08-07 7:29 UTC (permalink / raw)
To: David Xiao
Cc: Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Thursday 06 August 2009 20:46:14 David Xiao wrote:

[snip]

> Another approach is working from a different direction: the kernel
> allocates the non-cached buffer and then mmap() into user space. I have
> done that in similar situation to try to achieve "zero-copy".

That's what most drivers do. While it's probably the easiest solution in many
cases, it will sometimes introduce additional memcpy() operations that I'd
like to avoid.

Think about the simple following use case. An application wants to display
video it acquires from the device to the screen using Xv. The video buffer is
allocated by Xv. Using the v4l2 user pointer streaming method, the device can
DMA directly to the Xv buffer. Using driver-allocated buffers, a memcpy() is
required between the v4l2 buffer and the Xv buffer.

Regards,

Laurent Pinchart

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-07 7:29 ` Laurent Pinchart
@ 2009-08-07 8:12 ` Matthieu CASTET
2009-08-07 10:13 ` How to efficiently handle DMA and cache on ARMv7 ? (was " Is " Laurent Pinchart
0 siblings, 1 reply; 42+ messages in thread
From: Matthieu CASTET @ 2009-08-07 8:12 UTC (permalink / raw)
To: Laurent Pinchart
Cc: David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

Laurent Pinchart wrote:
> On Thursday 06 August 2009 20:46:14 David Xiao wrote:
>
> Think about the simple following use case. An application wants to display
> video it acquires from the device to the screen using Xv. The video buffer is
> allocated by Xv. Using the v4l2 user pointer streaming method, the device can
> DMA directly to the Xv buffer. Using driver-allocated buffers, a memcpy() is
> required between the v4l2 buffer and the Xv buffer.
>

v4l2 got an API (overlay IIRC) that allows drivers to write directly in
framebuffer memory.

BTW Xv buffer is not always in video memory and the X driver can do a
memcpy.

Matthieu

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to efficiently handle DMA and cache on ARMv7 ? (was " Is get_user_pages() enough to prevent pages from being swapped out ?")
2009-08-07 8:12 ` Matthieu CASTET
@ 2009-08-07 10:13 ` Laurent Pinchart
0 siblings, 0 replies; 42+ messages in thread
From: Laurent Pinchart @ 2009-08-07 10:13 UTC (permalink / raw)
To: Matthieu CASTET
Cc: David Xiao, Ben Dooks, Hugh Dickins, Robin Holt, linux-kernel@vger.kernel.org, v4l2_linux, linux-arm-kernel@lists.arm.linux.org.uk

On Friday 07 August 2009 10:12:23 Matthieu CASTET wrote:
> Laurent Pinchart wrote:
> > On Thursday 06 August 2009 20:46:14 David Xiao wrote:
> >
> > Think about the simple following use case. An application wants to
> > display video it acquires from the device to the screen using Xv. The
> > video buffer is allocated by Xv. Using the v4l2 user pointer streaming
> > method, the device can DMA directly to the Xv buffer. Using
> > driver-allocated buffers, a memcpy() is required between the v4l2 buffer
> > and the Xv buffer.
>
> v4l2 got an API (overlay IIRC) that allows drivers to write directly in
> framebuffer memory.

That's right, but I was mostly using this as an example.

> BTW Xv buffer is not always in video memory and the X driver can do a
> memcpy.

Still, one less memcpy is better :-)

Regards,

Laurent Pinchart

^ permalink raw reply [flat|nested] 42+ messages in thread