* [PATCHv2 0/5] fix xfs by making I/O to vmap/vmalloc areas work
@ 2009-12-23 21:22 James Bottomley
2009-12-23 21:22 ` [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas James Bottomley
0 siblings, 1 reply; 27+ messages in thread
From: James Bottomley @ 2009-12-23 21:22 UTC (permalink / raw)
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt,
James Bottomley
From: James Bottomley <jejb@external.hp.com>
After looking through all the feedback, here's the next version of the
patch.
This one is based on a new API: flush/invalidate_kernel_vmap_range(),
making it clear that the fs/driver is the entity managing the vmaps
... it also drops the block inputs because coherency management is now
the responsibility of the user. The xfs interface is nicely
simplified with this approach. Could someone check this on arm and
sh?
Thanks,
James
---
James Bottomley (5):
mm: add coherence API for DMA to vmalloc/vmap areas
parisc: add mm API for DMA to vmalloc/vmap areas
arm: add mm API for DMA to vmalloc/vmap areas
sh: add mm API for DMA to vmalloc/vmap areas
xfs: fix xfs to work with Virtually Indexed architectures
Documentation/cachetlb.txt | 27 +++++++++++++++++++++++++++
arch/arm/include/asm/cacheflush.h | 10 ++++++++++
arch/parisc/include/asm/cacheflush.h | 12 ++++++++++++
arch/sh/include/asm/cacheflush.h | 8 ++++++++
fs/xfs/linux-2.6/xfs_buf.c | 20 +++++++++++++++++++-
include/linux/highmem.h | 6 ++++++
6 files changed, 82 insertions(+), 1 deletions(-)
* [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
2009-12-23 21:22 [PATCHv2 0/5] fix xfs by making I/O to vmap/vmalloc areas work James Bottomley
@ 2009-12-23 21:22 ` James Bottomley
2009-12-23 21:22 ` James Bottomley
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: James Bottomley @ 2009-12-23 21:22 UTC (permalink / raw)
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt,
James Bottomley
On Virtually Indexed architectures (which don't do automatic alias
resolution in their caches), we have to flush via the correct
virtual address to prepare pages for DMA. On some architectures
(like arm) we cannot prevent the CPU from doing data move-in along
the alias (and thus giving stale read data), so we not only have to
introduce a flush API to push dirty cache lines out, but also an
invalidate API to kill inconsistent cache lines that may have moved in
before the DMA changed the data.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
Documentation/cachetlb.txt | 27 +++++++++++++++++++++++++++
include/linux/highmem.h | 6 ++++++
2 files changed, 33 insertions(+), 0 deletions(-)
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index da42ab4..a29129f 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -377,3 +377,30 @@ maps this page at its virtual address.
All the functionality of flush_icache_page can be implemented in
flush_dcache_page and update_mmu_cache. In 2.7 the hope is to
remove this interface completely.
+
+The final category of APIs is for I/O to deliberately aliased address
+ranges inside the kernel. Such aliases are set up by use of the
+vmap/vmalloc API. Since kernel I/O goes via physical pages, the I/O
+subsystem assumes that the user mapping and kernel offset mapping are
+the only aliases. This isn't true for vmap aliases, so anything in
+the kernel trying to do I/O to vmap areas must manually manage
+coherency. It must do this by flushing the vmap range before doing
+I/O and invalidating it after the I/O returns.
+
+ void flush_kernel_vmap_range(void *vaddr, int size)
+ flushes the kernel cache for a given virtual address range in
+ the vmap area. This API makes sure that and data the kernel
+ modified in the vmap range is made visible to the physical
+ page. The design is to make this area safe to perform I/O on.
+ Note that this API does *not* also flush the offset map alias
+ of the area.
+
+ void invalidate_kernel_vmap_range(void *vaddr, int size)
+ invalidates the kernel cache for a given virtual address range
+ in the vmap area. This API is designed to make sure that while
+ I/O went on to an address range in the vmap area, the processor
+ didn't speculate cache reads and thus make the cache over the
+ virtual address stale. Its implementation may be a nop if the
+ architecture guarantees never to speculate on flushed ranges
+ during I/O.
+
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 211ff44..adfe101 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -17,6 +17,12 @@ static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page
static inline void flush_kernel_dcache_page(struct page *page)
{
}
+static inline void flush_kernel_vmap_range(void *vaddr, int size)
+{
+}
+static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
+{
+}
#endif
#include <asm/kmap_types.h>
--
1.6.5
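To make the documented calling pattern concrete, here is a minimal sketch of
a driver reading from a device into a vmalloc'ed buffer. Only the two
*_kernel_vmap_range() calls are the API added by this patch;
struct my_device, dev_start_dma_read() and dev_wait_dma() are hypothetical
stand-ins for whatever DMA machinery the driver actually uses:

#include <linux/highmem.h>
#include <linux/vmalloc.h>

static int read_into_vmalloc_buffer(struct my_device *dev, int len)
{
	void *buf = vmalloc(len);

	if (!buf)
		return -ENOMEM;

	/* push any dirty aliased lines out to the physical pages */
	flush_kernel_vmap_range(buf, len);

	dev_start_dma_read(dev, buf, len);	/* hypothetical */
	dev_wait_dma(dev);			/* hypothetical */

	/*
	 * kill any lines the CPU speculatively moved in over the vmap
	 * alias while the DMA was in flight, so subsequent reads see
	 * the freshly DMA'd data
	 */
	invalidate_kernel_vmap_range(buf, len);

	/* ... consume the data, then vfree(buf) ... */
	return 0;
}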
* [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas
2009-12-23 21:22 ` [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas James Bottomley
2009-12-23 21:22 ` James Bottomley
@ 2009-12-23 21:22 ` James Bottomley
2009-12-23 21:22 ` [PATCHv2 3/5] arm: " James Bottomley
2010-01-02 21:33 ` [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas Benjamin Herrenschmidt
2009-12-24 10:08 ` [PATCHv2 1/5] mm: add coherence " Matt Fleming
2 siblings, 2 replies; 27+ messages in thread
From: James Bottomley @ 2009-12-23 21:22 UTC (permalink / raw)
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt,
James Bottomley
We already have an API to flush a kernel page along an alias
address, so use it. The TLB purge prevents the CPU from doing
speculative move-ins on the flushed address, so we don't need to
implement an invalidate.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
arch/parisc/include/asm/cacheflush.h | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 7a73b61..4772777 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -38,6 +38,18 @@ void flush_cache_mm(struct mm_struct *mm);
#define flush_kernel_dcache_range(start,size) \
flush_kernel_dcache_range_asm((start), (start)+(size));
+/* vmap range flushes and invalidates. Architecturally, we don't need
+ * the invalidate, because the CPU should refuse to speculate once an
+ * area has been flushed, so invalidate is left empty */
+static inline void flush_kernel_vmap_range(void *vaddr, int size)
+{
+ unsigned long start = (unsigned long)vaddr;
+
+ flush_kernel_dcache_range_asm(start, start + size);
+}
+static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
+{
+}
#define flush_cache_vmap(start, end) flush_cache_all()
#define flush_cache_vunmap(start, end) flush_cache_all()
--
1.6.5
* [PATCHv2 3/5] arm: add mm API for DMA to vmalloc/vmap areas
2009-12-23 21:22 ` [PATCHv2 2/5] parisc: add mm " James Bottomley
@ 2009-12-23 21:22 ` James Bottomley
2009-12-23 21:22 ` [PATCHv2 4/5] sh: " James Bottomley
2010-01-02 21:33 ` [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas Benjamin Herrenschmidt
1 sibling, 1 reply; 27+ messages in thread
From: James Bottomley @ 2009-12-23 21:22 UTC (permalink / raw)
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt,
James Bottomley
ARM cannot prevent cache movein, so this patch implements both the
flush and invalidate pieces of the API.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
arch/arm/include/asm/cacheflush.h | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 730aefc..4ae503c 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -432,6 +432,16 @@ static inline void __flush_icache_all(void)
: "r" (0));
#endif
}
+static inline void flush_kernel_vmap_range(void *addr, int size)
+{
+ if ((cache_is_vivt() || cache_is_vipt_aliasing()))
+ __cpuc_flush_dcache_area(addr, (size_t)size);
+}
+static inline void invalidate_kernel_vmap_range(void *addr, int size)
+{
+ if ((cache_is_vivt() || cache_is_vipt_aliasing()))
+ __cpuc_flush_dcache_area(addr, (size_t)size);
+}
#define ARCH_HAS_FLUSH_ANON_PAGE
static inline void flush_anon_page(struct vm_area_struct *vma,
--
1.6.5
* [PATCHv2 4/5] sh: add mm API for DMA to vmalloc/vmap areas
2009-12-23 21:22 ` [PATCHv2 3/5] arm: " James Bottomley
@ 2009-12-23 21:22 ` James Bottomley
2009-12-23 21:22 ` James Bottomley
2009-12-23 21:22 ` [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures James Bottomley
0 siblings, 2 replies; 27+ messages in thread
From: James Bottomley @ 2009-12-23 21:22 UTC (permalink / raw)
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt,
James Bottomley
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
arch/sh/include/asm/cacheflush.h | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index dda96eb..da3ebec 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -63,6 +63,14 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
if (boot_cpu_data.dcache.n_aliases && PageAnon(page))
__flush_anon_page(page, vmaddr);
}
+static inline void flush_kernel_vmap_range(void *addr, int size)
+{
+ __flush_wback_region(addr, size);
+}
+static inline void invalidate_kernel_vmap_range(void *addr, int size)
+{
+ __flush_invalidate_region(addr, size);
+}
#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
static inline void flush_kernel_dcache_page(struct page *page)
--
1.6.5
* [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures
2009-12-23 21:22 ` [PATCHv2 4/5] sh: " James Bottomley
2009-12-23 21:22 ` James Bottomley
@ 2009-12-23 21:22 ` James Bottomley
2009-12-24 11:03 ` Christoph Hellwig
1 sibling, 1 reply; 27+ messages in thread
From: James Bottomley @ 2009-12-23 21:22 UTC (permalink / raw)
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt,
James Bottomley
xfs_buf.c includes what is essentially a hand rolled version of
blk_rq_map_kern(). In order to work properly with the vmalloc buffers
that xfs uses, this hand rolled routine must also implement the flushing
API for vmap/vmalloc areas.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
fs/xfs/linux-2.6/xfs_buf.c | 20 +++++++++++++++++++-
1 files changed, 19 insertions(+), 1 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 77b8be8..4c77d96 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -76,6 +76,19 @@ struct workqueue_struct *xfsconvertd_workqueue;
#define xfs_buf_deallocate(bp) \
kmem_zone_free(xfs_buf_zone, (bp));
+STATIC int
+xfs_bp_is_vmapped(
+ xfs_buf_t *bp)
+{
+ /* return true if the buffer is vmapped. The XBF_MAPPED flag
+ * is set if the buffer should be mapped, but the code is
+ * clever enough to know it doesn't have to map a single page,
+ * so the check has to be both for XBF_MAPPED and
+ * bp->b_page_count > 1 */
+ return (bp->b_flags & XBF_MAPPED) && bp->b_page_count > 1;
+}
+
+
/*
* Page Region interfaces.
*
@@ -314,7 +327,7 @@ xfs_buf_free(
if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) {
uint i;
- if ((bp->b_flags & XBF_MAPPED) && (bp->b_page_count > 1))
+ if (xfs_bp_is_vmapped(bp))
free_address(bp->b_addr - bp->b_offset);
for (i = 0; i < bp->b_page_count; i++) {
@@ -1107,6 +1120,9 @@ xfs_buf_bio_end_io(
xfs_buf_ioerror(bp, -error);
+ if (!error && xfs_bp_is_vmapped(bp))
+ invalidate_kernel_vmap_range(bp->b_addr, (bp->b_page_count * PAGE_SIZE) - bp->b_offset);
+
do {
struct page *page = bvec->bv_page;
@@ -1216,6 +1232,8 @@ next_chunk:
submit_io:
if (likely(bio->bi_size)) {
+ if (xfs_bp_is_vmapped(bp))
+ flush_kernel_vmap_range(bp->b_addr, (bp->b_page_count * PAGE_SIZE) - bp->b_offset);
submit_bio(rw, bio);
if (size)
goto next_chunk;
--
1.6.5
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
2009-12-23 21:22 ` [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas James Bottomley
2009-12-23 21:22 ` James Bottomley
2009-12-23 21:22 ` [PATCHv2 2/5] parisc: add mm " James Bottomley
@ 2009-12-24 10:08 ` Matt Fleming
2009-12-24 10:08 ` Matt Fleming
2009-12-24 12:39 ` Matthew Wilcox
2 siblings, 2 replies; 27+ messages in thread
From: Matt Fleming @ 2009-12-24 10:08 UTC (permalink / raw)
To: James Bottomley
Cc: linux-arch, linux-parisc, Christoph Hellwig, Russell King,
Paul Mundt
On Wed, Dec 23, 2009 at 03:22:21PM -0600, James Bottomley wrote:
> +
> + void flush_kernel_vmap_range(void *vaddr, int size)
> + flushes the kernel cache for a given virtual address range in
> + the vmap area. This API makes sure that and data the kernel
^^^ code and data?
> + void invalidate_kernel_vmap_range(void *vaddr, int size)
> + invalidates the kernel cache for a given virtual address range
> + in the vmap area. This API is designed to make sure that while
> + I/O went on to an address range in the vmap area, the processor
> + didn't speculate cache reads and thus make the cache over the
> + virtual address stale.
> +
Could this sentence be reworked a little? I find the "over the virtual
address" part a little difficult to parse.
* Re: [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures
2009-12-23 21:22 ` [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures James Bottomley
@ 2009-12-24 11:03 ` Christoph Hellwig
2009-12-24 11:03 ` Christoph Hellwig
2009-12-27 15:32 ` James Bottomley
0 siblings, 2 replies; 27+ messages in thread
From: Christoph Hellwig @ 2009-12-24 11:03 UTC (permalink / raw)
To: James Bottomley
Cc: linux-arch, linux-parisc, Christoph Hellwig, Russell King,
Paul Mundt
On Wed, Dec 23, 2009 at 03:22:25PM -0600, James Bottomley wrote:
> xfs_buf.c includes what is essentially a hand rolled version of
> blk_rq_map_kern(). In order to work properly with the vmalloc buffers
> that xfs uses, this hand rolled routine must also implement the flushing
> API for vmap/vmalloc areas.
>
> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Looks good except some minor style issues. Fix in the version below
which also adds a helper to calculate the length of the vmap area
instead of calculating it twice.
---
From: James Bottomley <James.Bottomley@suse.de>
Subject: [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures
xfs_buf.c includes what is essentially a hand rolled version of
blk_rq_map_kern(). In order to work properly with the vmalloc buffers
that xfs uses, this hand rolled routine must also implement the flushing
API for vmap/vmalloc areas.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c 2009-12-17 08:05:53.962275389 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c 2009-12-24 11:59:02.031073134 +0100
@@ -76,6 +76,27 @@ struct workqueue_struct *xfsconvertd_wor
#define xfs_buf_deallocate(bp) \
kmem_zone_free(xfs_buf_zone, (bp));
+static inline int
+xfs_buf_is_vmapped(
+ struct xfs_buf *bp)
+{
+ /*
+ * Return true if the buffer is vmapped.
+ *
+ * The XBF_MAPPED flag is set if the buffer should be mapped, but the
+ * code is clever enough to know it doesn't have to map a single page,
+ * so the check has to be both for XBF_MAPPED and bp->b_page_count > 1.
+ */
+ return (bp->b_flags & XBF_MAPPED) && bp->b_page_count > 1;
+}
+
+static inline int
+xfs_buf_vmap_len(
+ struct xfs_buf *bp)
+{
+ return (bp->b_page_count * PAGE_SIZE) - bp->b_offset;
+}
+
/*
* Page Region interfaces.
*
@@ -314,7 +335,7 @@ xfs_buf_free(
if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) {
uint i;
- if ((bp->b_flags & XBF_MAPPED) && (bp->b_page_count > 1))
+ if (xfs_buf_is_vmapped(bp))
free_address(bp->b_addr - bp->b_offset);
for (i = 0; i < bp->b_page_count; i++) {
@@ -1107,6 +1128,9 @@ xfs_buf_bio_end_io(
xfs_buf_ioerror(bp, -error);
+ if (!error && xfs_buf_is_vmapped(bp))
+ invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
+
do {
struct page *page = bvec->bv_page;
@@ -1216,6 +1240,10 @@ next_chunk:
submit_io:
if (likely(bio->bi_size)) {
+ if (xfs_buf_is_vmapped(bp)) {
+ flush_kernel_vmap_range(bp->b_addr,
+ xfs_buf_vmap_len(bp));
+ }
submit_bio(rw, bio);
if (size)
goto next_chunk;
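For a quick sense of the new helper's arithmetic (illustrative numbers,
assuming 4K pages): a buffer whose mapping starts 512 bytes into its first
page and spans three pages has b_page_count = 3 and b_offset = 512, so
xfs_buf_vmap_len() returns 3 * 4096 - 512 = 11776, i.e. the number of bytes
from bp->b_addr to the end of the last mapped page, which is exactly the
range the flush and invalidate calls have to cover.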
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
2009-12-24 10:08 ` [PATCHv2 1/5] mm: add coherence " Matt Fleming
2009-12-24 10:08 ` Matt Fleming
@ 2009-12-24 12:39 ` Matthew Wilcox
2009-12-24 12:39 ` Matthew Wilcox
` (3 more replies)
1 sibling, 4 replies; 27+ messages in thread
From: Matthew Wilcox @ 2009-12-24 12:39 UTC (permalink / raw)
To: Matt Fleming
Cc: James Bottomley, linux-arch, linux-parisc, Christoph Hellwig,
Russell King, Paul Mundt
On Thu, Dec 24, 2009 at 10:08:53AM +0000, Matt Fleming wrote:
> On Wed, Dec 23, 2009 at 03:22:21PM -0600, James Bottomley wrote:
> > +
> > + void flush_kernel_vmap_range(void *vaddr, int size)
> > + flushes the kernel cache for a given virtual address range in
> > + the vmap area. This API makes sure that and data the kernel
>
> ^^^ code and data?
I'd guess it's a typo for 'any data'.
> > + void invalidate_kernel_vmap_range(void *vaddr, int size)
> > + invalidates the kernel cache for a given virtual address range
> > + in the vmap area. This API is designed to make sure that while
> > + I/O went on to an address range in the vmap area, the processor
> > + didn't speculate cache reads and thus make the cache over the
> > + virtual address stale.
> > +
>
> Could this sentence be reworked a little? I find the "over the virtual
> address" part a little difficult to parse.
How about:
invalidates the processor cache for a given virtual address range
in the vmap area. This API addresses the problem that the processor
may have performed speculative reads into its cache of the vmapped
area while I/O was occurring to the underlying physical pages.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
2009-12-24 12:39 ` Matthew Wilcox
2009-12-24 12:39 ` Matthew Wilcox
@ 2009-12-24 13:06 ` Matt Fleming
2009-12-24 13:06 ` Matt Fleming
2009-12-27 15:37 ` James Bottomley
2010-01-02 21:27 ` Benjamin Herrenschmidt
3 siblings, 1 reply; 27+ messages in thread
From: Matt Fleming @ 2009-12-24 13:06 UTC (permalink / raw)
To: Matthew Wilcox
Cc: James Bottomley, linux-arch, linux-parisc, Christoph Hellwig,
Russell King, Paul Mundt
On Thu, Dec 24, 2009 at 05:39:13AM -0700, Matthew Wilcox wrote:
> On Thu, Dec 24, 2009 at 10:08:53AM +0000, Matt Fleming wrote:
> > On Wed, Dec 23, 2009 at 03:22:21PM -0600, James Bottomley wrote:
> > > +
> > > + void flush_kernel_vmap_range(void *vaddr, int size)
> > > + flushes the kernel cache for a given virtual address range in
> > > + the vmap area. This API makes sure that and data the kernel
> >
> > ^^^ code and data?
>
> I'd guess it's a typo for 'any data'.
>
Aha yes, that would make more sense.
> > > + void invalidate_kernel_vmap_range(void *vaddr, int size)
> > > + invalidates the kernel cache for a given virtual address range
> > > + in the vmap area. This API is designed to make sure that while
> > > + I/O went on to an address range in the vmap area, the processor
> > > + didn't speculate cache reads and thus make the cache over the
> > > + virtual address stale.
> > > +
> >
> > Could this sentence be reworked a little? I find the "over the virtual
> > address" part a little difficult to parse.
>
> How about:
>
> invalidates the processor cache for a given virtual address range
> in the vmap area. This API addresses the problem that the processor
> may have performed speculative reads into its cache of the vmapped
> area while I/O was occurring to the underlying physical pages.
>
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Sounds good to me!
* Re: [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures
2009-12-24 11:03 ` Christoph Hellwig
2009-12-24 11:03 ` Christoph Hellwig
@ 2009-12-27 15:32 ` James Bottomley
1 sibling, 0 replies; 27+ messages in thread
From: James Bottomley @ 2009-12-27 15:32 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-arch, linux-parisc, Russell King, Paul Mundt
On Thu, 2009-12-24 at 12:03 +0100, Christoph Hellwig wrote:
> On Wed, Dec 23, 2009 at 03:22:25PM -0600, James Bottomley wrote:
> > xfs_buf.c includes what is essentially a hand rolled version of
> > blk_rq_map_kern(). In order to work properly with the vmalloc buffers
> > that xfs uses, this hand rolled routine must also implement the flushing
> > API for vmap/vmalloc areas.
> >
> > Signed-off-by: James Bottomley <James.Bottomley@suse.de>
>
> Looks good except some minor style issues. Fix in the version below
> which also adds a helper to calculate the length of the vmap area
> instead of calculating it twice.
Sure ... I nearly did that ... but then there's a lot of stuff like that
repeated in the code.
I'll also add the optimisation that the invalidation is unnecessary in
the write case.
James
> ---
>
> From: James Bottomley <James.Bottomley@suse.de>
> Subject: [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures
>
> xfs_buf.c includes what is essentially a hand rolled version of
> blk_rq_map_kern(). In order to work properly with the vmalloc buffers
> that xfs uses, this hand rolled routine must also implement the flushing
> API for vmap/vmalloc areas.
>
> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
>
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c 2009-12-17 08:05:53.962275389 +0100
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c 2009-12-24 11:59:02.031073134 +0100
> @@ -76,6 +76,27 @@ struct workqueue_struct *xfsconvertd_wor
> #define xfs_buf_deallocate(bp) \
> kmem_zone_free(xfs_buf_zone, (bp));
>
> +static inline int
> +xfs_buf_is_vmapped(
> + struct xfs_buf *bp)
> +{
> + /*
> + * Return true if the buffer is vmapped.
> + *
> + * The XBF_MAPPED flag is set if the buffer should be mapped, but the
> + * code is clever enough to know it doesn't have to map a single page,
> + * so the check has to be both for XBF_MAPPED and bp->b_page_count > 1.
> + */
> + return (bp->b_flags & XBF_MAPPED) && bp->b_page_count > 1;
> +}
> +
> +static inline int
> +xfs_buf_vmap_len(
> + struct xfs_buf *bp)
> +{
> + return (bp->b_page_count * PAGE_SIZE) - bp->b_offset;
> +}
> +
> /*
> * Page Region interfaces.
> *
> @@ -314,7 +335,7 @@ xfs_buf_free(
> if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) {
> uint i;
>
> - if ((bp->b_flags & XBF_MAPPED) && (bp->b_page_count > 1))
> + if (xfs_buf_is_vmapped(bp))
> free_address(bp->b_addr - bp->b_offset);
>
> for (i = 0; i < bp->b_page_count; i++) {
> @@ -1107,6 +1128,9 @@ xfs_buf_bio_end_io(
>
> xfs_buf_ioerror(bp, -error);
>
> + if (!error && xfs_buf_is_vmapped(bp))
> + invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
> +
> do {
> struct page *page = bvec->bv_page;
>
> @@ -1216,6 +1240,10 @@ next_chunk:
>
> submit_io:
> if (likely(bio->bi_size)) {
> + if (xfs_buf_is_vmapped(bp)) {
> + flush_kernel_vmap_range(bp->b_addr,
> + xfs_buf_vmap_len(bp));
> + }
> submit_bio(rw, bio);
> if (size)
> goto next_chunk;
> --
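The optimisation James mentions would presumably just gate the
completion-side invalidate on the transfer direction, along these lines
(an illustrative sketch, assuming the buffer's read flag, XBF_READ, is the
right test):

	if (!error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
		invalidate_kernel_vmap_range(bp->b_addr,
					     xfs_buf_vmap_len(bp));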
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
2009-12-24 12:39 ` Matthew Wilcox
2009-12-24 12:39 ` Matthew Wilcox
2009-12-24 13:06 ` Matt Fleming
@ 2009-12-27 15:37 ` James Bottomley
2010-01-02 21:27 ` Benjamin Herrenschmidt
3 siblings, 0 replies; 27+ messages in thread
From: James Bottomley @ 2009-12-27 15:37 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Matt Fleming, linux-arch, linux-parisc, Christoph Hellwig,
Russell King, Paul Mundt
On Thu, 2009-12-24 at 05:39 -0700, Matthew Wilcox wrote:
> On Thu, Dec 24, 2009 at 10:08:53AM +0000, Matt Fleming wrote:
> > On Wed, Dec 23, 2009 at 03:22:21PM -0600, James Bottomley wrote:
> > > +
> > > + void flush_kernel_vmap_range(void *vaddr, int size)
> > > + flushes the kernel cache for a given virtual address range in
> > > + the vmap area. This API makes sure that and data the kernel
> >
> > ^^^ code and data?
>
> I'd guess it's a typo for 'any data'.
>
> > > + void invalidate_kernel_vmap_range(void *vaddr, int size)
> > > + invalidates the kernel cache for a given virtual address range
> > > + in the vmap area. This API is designed to make sure that while
> > > + I/O went on to an address range in the vmap area, the processor
> > > + didn't speculate cache reads and thus make the cache over the
> > > + virtual address stale.
> > > +
> >
> > Could this sentence be reworked a little? I find the "over the virtual
> > address" part a little difficult to parse.
>
> How about:
>
> invalidates the processor cache for a given virtual address range
> in the vmap area. This API addresses the problem that the processor
> may have performed speculative reads into its cache of the vmapped
> area while I/O was occurring to the underlying physical pages.
So better, I think, is:
invalidates the cache for a given virtual address range in the vmap area,
which prevents the processor from making the cache stale by
speculatively reading data while the I/O was occurring to the physical
pages.
James
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
2009-12-24 12:39 ` Matthew Wilcox
` (2 preceding siblings ...)
2009-12-27 15:37 ` James Bottomley
@ 2010-01-02 21:27 ` Benjamin Herrenschmidt
2010-01-02 21:54 ` James Bottomley
3 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2010-01-02 21:27 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Matt Fleming, James Bottomley, linux-arch, linux-parisc,
Christoph Hellwig, Russell King, Paul Mundt
On Thu, 2009-12-24 at 05:39 -0700, Matthew Wilcox wrote:
>
> invalidates the processor cache for a given virtual address range
> in the vmap area. This API addresses the problem that the processor
> may have performed speculative reads into its cache of the vmapped
> area while I/O was occurring to the underlying physical pages.
>
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Interestingly, our DMA APIs in this regard are sub-optimal as they
should provide 3 hooks, not 2.
Flush is good for ensuring dirty lines have been pushed out before an
outgoing DMA.
But for incoming DMA it would be nice to properly split the 2 calls that
may be needed on some archs, one before, one after the transfer. Sure,
invalidate twice will "work" but it will also be sub-optimal on some
platforms depending on whether the platform is known to be able or not
to speculatively load cache lines etc...
Maybe just a "before" vs. "after" argument ?
Also, the proposal goes contrary to most of our DMA APIs which don't
actually expose the details of invalidate vs. flush but instead expose
the direction of the transfer (both lack the above subtlety though).
Thoughts ?
Cheers,
Ben.
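A sketch of the split Ben is suggesting, using the hook names he floats in
his follow-up on patch 2/5 (hypothetical signatures only, nothing like this
exists in the tree at this point):

/* before an outgoing DMA: write dirty lines back to memory */
void dma_vmap_sync_outbound(void *vaddr, int size);

/*
 * before an incoming DMA: make sure no dirty line can later be
 * written back on top of the freshly DMA'd data
 */
void dma_vmap_sync_inbound_before(void *vaddr, int size);

/*
 * after an incoming DMA: discard any lines the CPU speculatively
 * loaded while the transfer was in flight (possibly a nop on cores
 * that never speculatively load)
 */
void dma_vmap_sync_inbound_after(void *vaddr, int size);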
* Re: [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas
2009-12-23 21:22 ` [PATCHv2 2/5] parisc: add mm " James Bottomley
2009-12-23 21:22 ` [PATCHv2 3/5] arm: " James Bottomley
@ 2010-01-02 21:33 ` Benjamin Herrenschmidt
2010-01-02 21:33 ` Benjamin Herrenschmidt
2010-01-02 21:53 ` James Bottomley
1 sibling, 2 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2010-01-02 21:33 UTC (permalink / raw)
To: James Bottomley
Cc: linux-arch, linux-parisc, Christoph Hellwig, Russell King,
Paul Mundt
On Wed, 2009-12-23 at 15:22 -0600, James Bottomley wrote:
> #define flush_kernel_dcache_range(start,size) \
> flush_kernel_dcache_range_asm((start), (start)+(size));
> +/* vmap range flushes and invalidates. Architecturally, we don't need
> + * the invalidate, because the CPU should refuse to speculate once an
> + * area has been flushed, so invalidate is left empty */
> +static inline void flush_kernel_vmap_range(void *vaddr, int size)
> +{
> + unsigned long start = (unsigned long)vaddr;
> +
> + flush_kernel_dcache_range_asm(start, start + size);
> +}
> +static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> +{
> +}
Do I understand correctly that for an inbound DMA you will first call
flush before starting the DMA, then invalidate at the end of the
transfer ?
See my other message on that subject but I believe this is a sub-optimal
semantic. I'd rather expose separately dma_vmap_sync_outbound vs.
dma_vmap_sync_inbound_before vs. dma_vmap_sync_inbound_after.
On quite a few archs, an invalidate is a lot faster than a flush (since
it doesn't require a writeback of potentially useless crap to memory)
and for an inbound transfer that doesn't cross cache line boundaries,
invalidate is all that's needed for both before and after. On 44x
additionally I don't need "after" since the core is too dumb to prefetch
(or rather it's disabled due to erratas).
Cheers,
Ben.
* Re: [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas
2010-01-02 21:33 ` [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas Benjamin Herrenschmidt
2010-01-02 21:33 ` Benjamin Herrenschmidt
@ 2010-01-02 21:53 ` James Bottomley
2010-01-03 20:12 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 27+ messages in thread
From: James Bottomley @ 2010-01-02 21:53 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-arch, linux-parisc, Christoph Hellwig, Russell King,
Paul Mundt
On Sun, 2010-01-03 at 08:33 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2009-12-23 at 15:22 -0600, James Bottomley wrote:
> > #define flush_kernel_dcache_range(start,size) \
> > flush_kernel_dcache_range_asm((start), (start)+(size));
> > +/* vmap range flushes and invalidates. Architecturally, we don't need
> > + * the invalidate, because the CPU should refuse to speculate once an
> > + * area has been flushed, so invalidate is left empty */
> > +static inline void flush_kernel_vmap_range(void *vaddr, int size)
> > +{
> > + unsigned long start = (unsigned long)vaddr;
> > +
> > + flush_kernel_dcache_range_asm(start, start + size);
> > +}
> > +static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> > +{
> > +}
>
> Do I understand correctly that for an inbound DMA you will first call
> flush before starting the DMA, then invalidate at the end of the
> transfer ?
>
> See my other message on that subject but I believe this is a sub-optimal
> semantic. I'd rather expose separately dma_vmap_sync_outbound vs.
> dma_vmap_sync_inbound_before vs. dma_vmap_sync_inbound_after.
Well, this is such a micro-optimisation, is it really worth it?
If I map exactly to architectural operations, it's flush (without
invalidate if possible) before an outbound DMA transfer and nothing
after. For inbound, it's invalidate before and after (the after only
needed if the architecture can do speculative move-in), but doing a flush
first instead of an invalidate on DMA inbound produces a correct result
on architectures I know about.
> On quite a few archs, an invalidate is a lot faster than a flush (since
> it doesn't require a writeback of potentially useless crap to memory)
> and for an inbound transfer that doesn't cross cache line boundaries,
> invalidate is all that's needed for both before and after. On 44x
> additionally I don't need "after" since the core is too dumb to prefetch
> (or rather it's disabled due to erratas).
Your logic assumes the cache line is dirty. If you look at the XFS
usage, it never seems to do local modifications on a read, so the line
should be clean. At least on parisc, a flush of a clean cache line is
exactly equivalent to an invalidate. Even if there's some write into
the read area in xfs I've missed, it's only a few extra cycles because
the lines are mostly clean.
James
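Written out as a sketch, the mapping James describes looks like this
(flush() and invalidate() are hypothetical stand-ins for the architecture's
own primitives; the direction values just echo the usual DMA API
convention):

#include <linux/dma-mapping.h>

/* hypothetical per-arch primitives */
extern void flush(void *vaddr, int size);
extern void invalidate(void *vaddr, int size);

/* illustrative only: per-direction cache operations as described above */
static void vmap_dma_sync(void *vaddr, int size,
			  enum dma_data_direction dir, bool before)
{
	if (dir == DMA_TO_DEVICE) {
		/* outbound: flush dirty lines before, nothing after */
		if (before)
			flush(vaddr, size);
	} else {
		/*
		 * inbound: invalidate before, and again after if the
		 * architecture can speculatively move lines in during
		 * the I/O
		 */
		invalidate(vaddr, size);
	}
}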
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
2010-01-02 21:27 ` Benjamin Herrenschmidt
@ 2010-01-02 21:54 ` James Bottomley
2010-01-03 20:14 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 27+ messages in thread
From: James Bottomley @ 2010-01-02 21:54 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Matthew Wilcox, Matt Fleming, linux-arch, linux-parisc,
Christoph Hellwig, Russell King, Paul Mundt
On Sun, 2010-01-03 at 08:27 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2009-12-24 at 05:39 -0700, Matthew Wilcox wrote:
> >
> > invalidates the processor cache for a given virtual address range
> > in the vmap area. This API addresses the problem that the processor
> > may have performed speculative reads into its cache of the vmapped
> > area while I/O was occurring to the underlying physical pages.
> >
> > Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
>
> Interestingly, our DMA APIs in this regard are sub-optimal as they
> should provide 3 hooks, not 2.
>
> Flush is good for ensuring dirty lines have been pushed out before an
> outgoing DMA.
>
> But for incoming DMA it would be nice to properly split the 2 calls that
> may be needed on some archs, one before, one after the transfer. Sure,
> invalidate twice will "work" but it will also be sub-optimal on some
> platforms depending on whether the platform is known to be able or not
> to speculatively load cache lines etc...
>
> Maybe just a "before" vs. "after" argument ?
>
> Also, the proposal goes contrary to most of our DMA APIs which don't
> actually expose the details of invalidate vs. flush but instead expose
> the direction of the transfer (both lack the above subtlety though).
>
> Thoughts ?
Well, that's the result of the email thread. XFS is poking deeply into
architectural issues by trying to do I/O on a vmap area. I thought the
consensus was that if xfs wants to do that then it takes complete
responsibility for coherence rather than trying to hide it in a block
API.
James
* Re: [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas
2010-01-02 21:53 ` James Bottomley
@ 2010-01-03 20:12 ` Benjamin Herrenschmidt
2010-01-03 20:12 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2010-01-03 20:12 UTC (permalink / raw)
To: James Bottomley
Cc: linux-arch, linux-parisc, Christoph Hellwig, Russell King,
Paul Mundt
On Sat, 2010-01-02 at 15:53 -0600, James Bottomley wrote:
>
> > See my other message on that subject but I believe this is a
> > sub-optimal
> > semantic. I'd rather expose separately dma_vmap_sync_outbound vs.
> > dma_vmap_sync_inbound_before vs. dma_vmap_sync_inbound_after.
>
> Well, this is such a micro optimisation, is it really worth it?
It's not that "micro" in the sense that flush can take significantly
longer than invalidate. Mostly a non-issue except in networking... most
non cache coherent systems are low end and pretty sensitive to that sort
of stuff, especially when used as ... routers :-) But yes, it probably
doesn't matter in your current case of XFS vmap, I was mostly trying to
push toward generically exposing finer interfaces :-) And god knows
who'll use your vmap DMA APIs in the future.
> If I map exactly to architectural operations, it's flush (without
> invalidate if possible) before an outbound DMA transfer and nothing
> after. For inbound, it's invalidate before and after (the after only
> assuming the architecture can do speculative move in), but doing a
> flush first instead of an invalidate on DMA inbound produces a correct
> result on architectures I know about.
It is correct. Just sub-optimal ;-) However it does handle another quirk,
which is the side effect on "edges". If all is well, the DMA buffer is
completely self-contained in its own cache lines but experience shows
that isn't always the case and it might share a cache line on the edges
(typical with skb's).
This is of course utterly broken except in one case (which I -think- is
the case with skb's) where the data sharing the cache line isn't
accessed during the DMA transfer.
However, for that to work, as you can see, one needs to flush and not
invalidate the "edge" cache lines before the inbound transfer. So your
approach of flush + invalidate is "safer". But I'd still rather have the
semantic exposed at a higher level so that arch can decide what to use
and allow for the micro-optimisation...
> Your logic assumes the cache line is dirty. If you look at the XFS
> usage, it never seems to do local modifications on a read, so the line
> should be clean. At least on parisc, a flush of a clean cache line is
> exactly equivalent to an invalidate. Even if there's some write into
> the read area in xfs I've missed, it's only a few extra cycles because
> the lines are mostly clean.
Right, in your case of vmap, it doesn't seem to be a big deal, but heh,
somebody is bound to use it again for something else right ? :-)
Anyways, just some thoughts I've been having about DMA APIs in general,
I don't say we must change it now, especially not the existing ones, but
for a -new- API like that I would have preferred if it exposed a richer
semantic from day 1.
Cheers,
Ben.
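The edge handling Ben describes might look like this before an inbound
transfer (a sketch only: flush_range() and inval_range() are hypothetical
per-arch primitives, and a buffer contained in a single cache line would
simply get that line flushed twice here):

#include <linux/cache.h>	/* L1_CACHE_BYTES */
#include <linux/kernel.h>	/* ALIGN() */

/* hypothetical per-arch primitives */
extern void flush_range(unsigned long start, unsigned long end);
extern void inval_range(unsigned long start, unsigned long end);

static void inbound_dma_prepare(void *vaddr, int size)
{
	unsigned long start = (unsigned long)vaddr;
	unsigned long end = start + size;
	unsigned long mid_start = ALIGN(start, L1_CACHE_BYTES);
	unsigned long mid_end = end & ~(L1_CACHE_BYTES - 1);

	/*
	 * Partial first/last lines are flushed, not invalidated, so
	 * that unrelated data sharing them isn't destroyed ...
	 */
	if (start != mid_start)
		flush_range(start & ~(L1_CACHE_BYTES - 1), mid_start);
	if (end != mid_end)
		flush_range(mid_end, ALIGN(end, L1_CACHE_BYTES));

	/* ... while the wholly-owned middle gets the cheaper invalidate */
	if (mid_start < mid_end)
		inval_range(mid_start, mid_end);
}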
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
2010-01-02 21:54 ` James Bottomley
@ 2010-01-03 20:14 ` Benjamin Herrenschmidt
2010-01-03 20:14 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2010-01-03 20:14 UTC (permalink / raw)
To: James Bottomley
Cc: Matthew Wilcox, Matt Fleming, linux-arch, linux-parisc,
Christoph Hellwig, Russell King, Paul Mundt
On Sat, 2010-01-02 at 15:54 -0600, James Bottomley wrote:
> Well, that's the result of the email thread. XFS is poking deeply into
> architectural issues by trying to do I/O on a vmap area. I thought the
> consensus was that if xfs wants to do that then it takes complete
> responsibility for coherence rather than trying to hide it in a block
> API.
Well, maybe ... Mostly my comment was about the "other" (standard) DMA
APIs, ie, the work you do for XFS just reminded me of that issue I've
been thinking about :-)
Cheers,
Ben.
Thread overview:
2009-12-23 21:22 [PATCHv2 0/5] fix xfs by making I/O to vmap/vmalloc areas work James Bottomley
2009-12-23 21:22 ` [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas James Bottomley
2009-12-23 21:22 ` [PATCHv2 2/5] parisc: add mm " James Bottomley
2009-12-23 21:22 ` [PATCHv2 3/5] arm: " James Bottomley
2009-12-23 21:22 ` [PATCHv2 4/5] sh: " James Bottomley
2009-12-23 21:22 ` [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures James Bottomley
2009-12-24 11:03 ` Christoph Hellwig
2009-12-27 15:32 ` James Bottomley
2010-01-02 21:33 ` [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas Benjamin Herrenschmidt
2010-01-02 21:53 ` James Bottomley
2010-01-03 20:12 ` Benjamin Herrenschmidt
2009-12-24 10:08 ` [PATCHv2 1/5] mm: add coherence " Matt Fleming
2009-12-24 12:39 ` Matthew Wilcox
2009-12-24 13:06 ` Matt Fleming
2009-12-27 15:37 ` James Bottomley
2010-01-02 21:27 ` Benjamin Herrenschmidt
2010-01-02 21:54 ` James Bottomley
2010-01-03 20:14 ` Benjamin Herrenschmidt