* [PATCHv2 0/5] fix xfs by making I/O to vmap/vmalloc areas work

From: James Bottomley <jejb@external.hp.com> @ 2009-12-23 21:22 UTC
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt, James Bottomley

After looking through all the feedback, here's the next version of the
patch.  This one is based on a new api: flush/invalidate_kernel_vmap_range()
making it clear that the fs/driver is the entity managing the vmaps ... it
also drops the block inputs because coherency management is now the
responsibility of the user.  The xfs interface is nicely simplified with
this approach.

Could someone check this on arm and sh?

Thanks,

James

---

James Bottomley (5):
  mm: add coherence API for DMA to vmalloc/vmap areas
  parisc: add mm API for DMA to vmalloc/vmap areas
  arm: add mm API for DMA to vmalloc/vmap areas
  sh: add mm API for DMA to vmalloc/vmap areas
  xfs: fix xfs to work with Virtually Indexed architectures

 Documentation/cachetlb.txt           |   27 +++++++++++++++++++++++++++
 arch/arm/include/asm/cacheflush.h    |   10 ++++++++++
 arch/parisc/include/asm/cacheflush.h |   12 ++++++++++++
 arch/sh/include/asm/cacheflush.h     |    8 ++++++++
 fs/xfs/linux-2.6/xfs_buf.c           |   20 +++++++++++++++++++-
 include/linux/highmem.h              |    6 ++++++
 6 files changed, 82 insertions(+), 1 deletions(-)
* [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas

From: James Bottomley @ 2009-12-23 21:22 UTC
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt, James Bottomley

On Virtually Indexed architectures (which don't do automatic alias
resolution in their caches), we have to flush via the correct virtual
address to prepare pages for DMA.  On some architectures (like arm) we
cannot prevent the CPU from doing data movein along the alias (and thus
giving stale read data), so we not only have to introduce a flush API to
push dirty cache lines out, but also an invalidate API to kill
inconsistent cache lines that may have moved in before DMA changed the
data.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 Documentation/cachetlb.txt |   27 +++++++++++++++++++++++++++
 include/linux/highmem.h    |    6 ++++++
 2 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index da42ab4..a29129f 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -377,3 +377,30 @@ maps this page at its virtual address.
 	All the functionality of flush_icache_page can be implemented in
 	flush_dcache_page and update_mmu_cache. In 2.7 the hope is to
 	remove this interface completely.
+
+The final category of APIs is for I/O to deliberately aliased address
+ranges inside the kernel.  Such aliases are set up by use of the
+vmap/vmalloc API.  Since kernel I/O goes via physical pages, the I/O
+subsystem assumes that the user mapping and kernel offset mapping are
+the only aliases.  This isn't true for vmap aliases, so anything in
+the kernel trying to do I/O to vmap areas must manually manage
+coherency.  It must do this by flushing the vmap range before doing
+I/O and invalidating it after the I/O returns.
+
+  void flush_kernel_vmap_range(void *vaddr, int size)
+	flushes the kernel cache for a given virtual address range in
+	the vmap area.  This API makes sure that and data the kernel
+	modified in the vmap range is made visible to the physical
+	page.  The design is to make this area safe to perform I/O on.
+	Note that this API does *not* also flush the offset map alias
+	of the area.
+
+  void invalidate_kernel_vmap_range(void *vaddr, int size)
+	invalidates the kernel cache for a given virtual address range
+	in the vmap area.  This API is designed to make sure that while
+	I/O went on to an address range in the vmap area, the processor
+	didn't speculate cache reads and thus make the cache over the
+	virtual address stale.  Its implementation may be a nop if the
+	architecture guarantees never to speculate on flushed ranges
+	during I/O.
+
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 211ff44..adfe101 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -17,6 +17,12 @@ static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page
 static inline void flush_kernel_dcache_page(struct page *page)
 {
 }
+static inline void flush_kernel_vmap_range(void *vaddr, int size)
+{
+}
+static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
+{
+}
 #endif
 
 #include <asm/kmap_types.h>
-- 
1.6.5
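The calling convention described by the documentation above can be
summarised in a short sketch.  This is illustrative only: the surrounding
function and the example_submit_and_wait() helper are hypothetical and not
part of the patch series; only the two *_kernel_vmap_range() calls are the
new API.

	/* Minimal sketch of a read into a vmap'd buffer, assuming a
	 * made-up example_submit_and_wait() helper performs the DMA. */
	static int example_read_into_vmap(void *vaddr, int len)
	{
		int error;

		/* push dirty lines for the vmap alias out to the
		 * physical pages before the device starts DMA */
		flush_kernel_vmap_range(vaddr, len);

		error = example_submit_and_wait(vaddr, len);

		/* the CPU may have speculatively read stale lines
		 * through the alias while DMA was in flight; discard
		 * them before the kernel reads the new data via vaddr */
		if (!error)
			invalidate_kernel_vmap_range(vaddr, len);

		return error;
	}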
* [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas

From: James Bottomley @ 2009-12-23 21:22 UTC
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt, James Bottomley

We already have an API to flush a kernel page along an alias address, so
use it.  The TLB purge prevents the CPU from doing speculative moveins on
the flushed address, so we don't need to implement an invalidate.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 arch/parisc/include/asm/cacheflush.h |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 7a73b61..4772777 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -38,6 +38,18 @@ void flush_cache_mm(struct mm_struct *mm);
 #define flush_kernel_dcache_range(start,size) \
 	flush_kernel_dcache_range_asm((start), (start)+(size));
 
+/* vmap range flushes and invalidates.  Architecturally, we don't need
+ * the invalidate, because the CPU should refuse to speculate once an
+ * area has been flushed, so invalidate is left empty */
+static inline void flush_kernel_vmap_range(void *vaddr, int size)
+{
+	unsigned long start = (unsigned long)vaddr;
+
+	flush_kernel_dcache_range_asm(start, start + size);
+}
+static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
+{
+}
 #define flush_cache_vmap(start, end)		flush_cache_all()
 #define flush_cache_vunmap(start, end)		flush_cache_all()
 
-- 
1.6.5
* [PATCHv2 3/5] arm: add mm API for DMA to vmalloc/vmap areas

From: James Bottomley @ 2009-12-23 21:22 UTC
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt, James Bottomley

ARM cannot prevent cache movein, so this patch implements both the flush
and invalidate pieces of the API.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 arch/arm/include/asm/cacheflush.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 730aefc..4ae503c 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -432,6 +432,16 @@ static inline void __flush_icache_all(void)
 	    : "r" (0));
 #endif
 }
+static inline void flush_kernel_vmap_range(void *addr, int size)
+{
+	if ((cache_is_vivt() || cache_is_vipt_aliasing()))
+		__cpuc_flush_dcache_area(addr, (size_t)size);
+}
+static inline void invalidate_kernel_vmap_range(void *addr, int size)
+{
+	if ((cache_is_vivt() || cache_is_vipt_aliasing()))
+		__cpuc_flush_dcache_area(addr, (size_t)size);
+}
 
 #define ARCH_HAS_FLUSH_ANON_PAGE
 static inline void flush_anon_page(struct vm_area_struct *vma,
-- 
1.6.5
* [PATCHv2 4/5] sh: add mm API for DMA to vmalloc/vmap areas

From: James Bottomley @ 2009-12-23 21:22 UTC
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt, James Bottomley

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 arch/sh/include/asm/cacheflush.h |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index dda96eb..da3ebec 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -63,6 +63,14 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 	if (boot_cpu_data.dcache.n_aliases && PageAnon(page))
 		__flush_anon_page(page, vmaddr);
 }
+static inline void flush_kernel_vmap_range(void *addr, int size)
+{
+	__flush_wback_region(addr, size);
+}
+static inline void invalidate_kernel_vmap_range(void *addr, int size)
+{
+	__flush_invalidate_region(addr, size);
+}
 
 #define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
 static inline void flush_kernel_dcache_page(struct page *page)
-- 
1.6.5
* [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures

From: James Bottomley @ 2009-12-23 21:22 UTC
To: linux-arch
Cc: linux-parisc, Christoph Hellwig, Russell King, Paul Mundt, James Bottomley

xfs_buf.c includes what is essentially a hand rolled version of
blk_rq_map_kern().  In order to work properly with the vmalloc buffers
that xfs uses, this hand rolled routine must also implement the flushing
API for vmap/vmalloc areas.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 fs/xfs/linux-2.6/xfs_buf.c |   20 +++++++++++++++++++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 77b8be8..4c77d96 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -76,6 +76,19 @@ struct workqueue_struct *xfsconvertd_workqueue;
 #define xfs_buf_deallocate(bp) \
 	kmem_zone_free(xfs_buf_zone, (bp));
 
+STATIC int
+xfs_bp_is_vmapped(
+	xfs_buf_t	*bp)
+{
+	/* return true if the buffer is vmapped.  The XBF_MAPPED flag
+	 * is set if the buffer should be mapped, but the code is
+	 * clever enough to know it doesn't have to map a single page,
+	 * so the check has to be both for XBF_MAPPED and
+	 * bp->b_page_count > 1 */
+	return (bp->b_flags & XBF_MAPPED) && bp->b_page_count > 1;
+}
+
+
 /*
  * Page Region interfaces.
  *
@@ -314,7 +327,7 @@ xfs_buf_free(
 	if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) {
 		uint		i;
 
-		if ((bp->b_flags & XBF_MAPPED) && (bp->b_page_count > 1))
+		if (xfs_bp_is_vmapped(bp))
 			free_address(bp->b_addr - bp->b_offset);
 
 		for (i = 0; i < bp->b_page_count; i++) {
@@ -1107,6 +1120,9 @@ xfs_buf_bio_end_io(
 
 	xfs_buf_ioerror(bp, -error);
 
+	if (!error && xfs_bp_is_vmapped(bp))
+		invalidate_kernel_vmap_range(bp->b_addr, (bp->b_page_count * PAGE_SIZE) - bp->b_offset);
+
 	do {
 		struct page	*page = bvec->bv_page;
 
@@ -1216,6 +1232,8 @@ next_chunk:
 
 submit_io:
 	if (likely(bio->bi_size)) {
+		if (xfs_bp_is_vmapped(bp))
+			flush_kernel_vmap_range(bp->b_addr, (bp->b_page_count * PAGE_SIZE) - bp->b_offset);
 		submit_bio(rw, bio);
 		if (size)
 			goto next_chunk;
-- 
1.6.5
* Re: [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures

From: Christoph Hellwig @ 2009-12-24 11:03 UTC
To: James Bottomley
Cc: linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Wed, Dec 23, 2009 at 03:22:25PM -0600, James Bottomley wrote:
> xfs_buf.c includes what is essentially a hand rolled version of
> blk_rq_map_kern().  In order to work properly with the vmalloc buffers
> that xfs uses, this hand rolled routine must also implement the flushing
> API for vmap/vmalloc areas.
>
> Signed-off-by: James Bottomley <James.Bottomley@suse.de>

Looks good except some minor style issues.  Fixed in the version below,
which also adds a helper to calculate the length of the vmap area instead
of calculating it twice.

---

From: James Bottomley <James.Bottomley@suse.de>
Subject: [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures

xfs_buf.c includes what is essentially a hand rolled version of
blk_rq_map_kern().  In order to work properly with the vmalloc buffers
that xfs uses, this hand rolled routine must also implement the flushing
API for vmap/vmalloc areas.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>

Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c	2009-12-17 08:05:53.962275389 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c	2009-12-24 11:59:02.031073134 +0100
@@ -76,6 +76,27 @@ struct workqueue_struct *xfsconvertd_wor
 #define xfs_buf_deallocate(bp) \
 	kmem_zone_free(xfs_buf_zone, (bp));
 
+static inline int
+xfs_buf_is_vmapped(
+	struct xfs_buf	*bp)
+{
+	/*
+	 * Return true if the buffer is vmapped.
+	 *
+	 * The XBF_MAPPED flag is set if the buffer should be mapped, but the
+	 * code is clever enough to know it doesn't have to map a single page,
+	 * so the check has to be both for XBF_MAPPED and bp->b_page_count > 1.
+	 */
+	return (bp->b_flags & XBF_MAPPED) && bp->b_page_count > 1;
+}
+
+static inline int
+xfs_buf_vmap_len(
+	struct xfs_buf	*bp)
+{
+	return (bp->b_page_count * PAGE_SIZE) - bp->b_offset;
+}
+
 /*
  * Page Region interfaces.
  *
@@ -314,7 +335,7 @@ xfs_buf_free(
 	if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) {
 		uint		i;
 
-		if ((bp->b_flags & XBF_MAPPED) && (bp->b_page_count > 1))
+		if (xfs_buf_is_vmapped(bp))
 			free_address(bp->b_addr - bp->b_offset);
 
 		for (i = 0; i < bp->b_page_count; i++) {
@@ -1107,6 +1128,9 @@ xfs_buf_bio_end_io(
 
 	xfs_buf_ioerror(bp, -error);
 
+	if (!error && xfs_buf_is_vmapped(bp))
+		invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
+
 	do {
 		struct page	*page = bvec->bv_page;
 
@@ -1216,6 +1240,10 @@ next_chunk:
 
 submit_io:
 	if (likely(bio->bi_size)) {
+		if (xfs_buf_is_vmapped(bp)) {
+			flush_kernel_vmap_range(bp->b_addr,
+						xfs_buf_vmap_len(bp));
+		}
 		submit_bio(rw, bio);
 		if (size)
 			goto next_chunk;
* Re: [PATCHv2 5/5] xfs: fix xfs to work with Virtually Indexed architectures

From: James Bottomley @ 2009-12-27 15:32 UTC
To: Christoph Hellwig
Cc: linux-arch, linux-parisc, Russell King, Paul Mundt

On Thu, 2009-12-24 at 12:03 +0100, Christoph Hellwig wrote:
> On Wed, Dec 23, 2009 at 03:22:25PM -0600, James Bottomley wrote:
> > xfs_buf.c includes what is essentially a hand rolled version of
> > blk_rq_map_kern().  In order to work properly with the vmalloc buffers
> > that xfs uses, this hand rolled routine must also implement the flushing
> > API for vmap/vmalloc areas.
> >
> > Signed-off-by: James Bottomley <James.Bottomley@suse.de>
>
> Looks good except some minor style issues.  Fixed in the version below,
> which also adds a helper to calculate the length of the vmap area instead
> of calculating it twice.

Sure ... I nearly did that ... but then there's a lot of stuff like that
repeated in the code.  I'll also add the optimisation that the invalidation
is unnecessary in the write case.

James
* Re: [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas

From: Benjamin Herrenschmidt @ 2010-01-02 21:33 UTC
To: James Bottomley
Cc: linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Wed, 2009-12-23 at 15:22 -0600, James Bottomley wrote:
>  #define flush_kernel_dcache_range(start,size) \
>  	flush_kernel_dcache_range_asm((start), (start)+(size));
> +/* vmap range flushes and invalidates.  Architecturally, we don't need
> + * the invalidate, because the CPU should refuse to speculate once an
> + * area has been flushed, so invalidate is left empty */
> +static inline void flush_kernel_vmap_range(void *vaddr, int size)
> +{
> +	unsigned long start = (unsigned long)vaddr;
> +
> +	flush_kernel_dcache_range_asm(start, start + size);
> +}
> +static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> +{
> +}

Do I understand correctly that for an inbound DMA you will first call
flush before starting the DMA, then invalidate at the end of the
transfer?

See my other message on that subject, but I believe this is a sub-optimal
semantic.  I'd rather expose separately dma_vmap_sync_outbound vs.
dma_vmap_sync_inbound_before vs. dma_vmap_sync_inbound_after.

On quite a few archs, an invalidate is a lot faster than a flush (since
it doesn't require a writeback of potentially useless crap to memory),
and for an inbound transfer that doesn't cross cache line boundaries,
invalidate is all that's needed for both before and after.  On 44x
additionally I don't need "after" since the core is too dumb to prefetch
(or rather it's disabled due to erratas).

Cheers,
Ben.
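For readers following the naming debate, here is a rough sketch of the
three-hook split being proposed, expressed purely hypothetically on top of
the primitives from patch 1/5.  None of these dma_vmap_sync_* functions
exist in the kernel; they only illustrate the intended semantics.

	/* Hypothetical illustration only; not a real kernel API. */
	static inline void dma_vmap_sync_outbound(void *vaddr, int size)
	{
		/* device is about to read the buffer: write back dirty lines */
		flush_kernel_vmap_range(vaddr, size);
	}

	static inline void dma_vmap_sync_inbound_before(void *vaddr, int size)
	{
		/*
		 * The safe generic choice on top of the 1/5 primitives is a
		 * writeback (it also copes with buffers whose edge cache
		 * lines are shared); an arch with a cheap writeback-free
		 * invalidate could use that instead, which is the
		 * optimisation being argued for here.
		 */
		flush_kernel_vmap_range(vaddr, size);
	}

	static inline void dma_vmap_sync_inbound_after(void *vaddr, int size)
	{
		/* drop anything speculated in during the transfer; a nop on
		 * cores that never speculate into flushed ranges */
		invalidate_kernel_vmap_range(vaddr, size);
	}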
* Re: [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas

From: James Bottomley @ 2010-01-02 21:53 UTC
To: Benjamin Herrenschmidt
Cc: linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Sun, 2010-01-03 at 08:33 +1100, Benjamin Herrenschmidt wrote:
> Do I understand correctly that for an inbound DMA you will first call
> flush before starting the DMA, then invalidate at the end of the
> transfer?
>
> See my other message on that subject, but I believe this is a sub-optimal
> semantic.  I'd rather expose separately dma_vmap_sync_outbound vs.
> dma_vmap_sync_inbound_before vs. dma_vmap_sync_inbound_after.

Well, this is such a micro optimisation, is it really worth it?

If I map exactly to architectural operations, it's flush (without
invalidate if possible) before an outbound DMA transfer and nothing
after.  For inbound, it's invalidate before and after (the after only
assuming the architecture can do speculative move in), but doing a flush
first instead of an invalidate on DMA inbound produces a correct result
on architectures I know about.

> On quite a few archs, an invalidate is a lot faster than a flush (since
> it doesn't require a writeback of potentially useless crap to memory),
> and for an inbound transfer that doesn't cross cache line boundaries,
> invalidate is all that's needed for both before and after.  On 44x
> additionally I don't need "after" since the core is too dumb to prefetch
> (or rather it's disabled due to erratas).

Your logic assumes the cache line is dirty.  If you look at the XFS
usage, it never seems to do local modifications on a read, so the line
should be clean.  At least on parisc, a flush of a clean cache line is
exactly equivalent to an invalidate.  Even if there's some write into
the read area in xfs I've missed, it's only a few extra cycles because
the lines are mostly clean.

James
* Re: [PATCHv2 2/5] parisc: add mm API for DMA to vmalloc/vmap areas

From: Benjamin Herrenschmidt @ 2010-01-03 20:12 UTC
To: James Bottomley
Cc: linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Sat, 2010-01-02 at 15:53 -0600, James Bottomley wrote:
>
> > See my other message on that subject, but I believe this is a sub-optimal
> > semantic.  I'd rather expose separately dma_vmap_sync_outbound vs.
> > dma_vmap_sync_inbound_before vs. dma_vmap_sync_inbound_after.
>
> Well, this is such a micro optimisation, is it really worth it?

It's not that "micro" in the sense that flush can take significantly
longer than invalidate.  Mostly a non issue except in networking ...
most non cache coherent systems are low end and pretty sensitive to that
sort of stuff, especially when used as ... routers :-)

But yes, it probably doesn't matter in your current case of XFS vmap; I
was mostly trying to push toward generically exposing finer interfaces :-)
And god knows who'll use your vmap DMA APIs in the future.

> If I map exactly to architectural operations, it's flush (without
> invalidate if possible) before an outbound DMA transfer and nothing
> after.  For inbound, it's invalidate before and after (the after only
> assuming the architecture can do speculative move in), but doing a flush
> first instead of an invalidate on DMA inbound produces a correct result
> on architectures I know about.

It is correct.  Just sub-optimal ;-)

However, it does handle another quirk, which is side effects on "edges".
If all is well, the DMA buffer is completely self contained in its own
cache lines, but experience shows that isn't always the case and it might
share a cache line on the edges (typical with skb's).

This is of course utterly broken except in one case (which I -think- is
the case with skb's) where the data sharing the cache line isn't accessed
during the DMA transfer.  However, for that to work, as you can see, one
needs to flush and not invalidate the "edge" cache lines before the
inbound transfer.

So your approach of flush + invalidate is "safer".  But I'd still rather
have the semantic exposed at a higher level so that the arch can decide
what to use and allow for the micro-optimisation...

> Your logic assumes the cache line is dirty.  If you look at the XFS
> usage, it never seems to do local modifications on a read, so the line
> should be clean.  At least on parisc, a flush of a clean cache line is
> exactly equivalent to an invalidate.  Even if there's some write into
> the read area in xfs I've missed, it's only a few extra cycles because
> the lines are mostly clean.

Right, in your case of vmap, it doesn't seem to be a big deal, but heh,
somebody is bound to use it again for something else, right? :-)

Anyways, just some thoughts I've been having about DMA APIs in general.
I don't say we must change it now, especially not the existing ones, but
for a -new- API like that I would have preferred it to expose a richer
semantic from day 1.

Cheers,
Ben.
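The "edges" quirk described above is easiest to see in code.  The sketch
below is hypothetical: ALIGN() and L1_CACHE_BYTES are the usual kernel
helpers, but the arch_flush_cache_line()/arch_invalidate_cache_range()
callees are invented stand-ins for whatever primitives an architecture
actually has.

	/* Hypothetical: treat partially-owned edge cache lines differently
	 * from the fully-owned interior before an inbound DMA transfer. */
	static void example_inbound_before(unsigned long start, unsigned long end)
	{
		unsigned long int_start = ALIGN(start, L1_CACHE_BYTES);
		unsigned long int_end   = end & ~(L1_CACHE_BYTES - 1);

		/* edge lines may also hold unrelated live data, so they must
		 * be flushed (written back) rather than simply discarded */
		if (start != int_start)
			arch_flush_cache_line(start & ~(L1_CACHE_BYTES - 1));
		if (end != int_end)
			arch_flush_cache_line(int_end);

		/* interior lines are wholly owned by the buffer and can just
		 * be invalidated, avoiding a pointless writeback */
		if (int_start < int_end)
			arch_invalidate_cache_range(int_start, int_end);
	}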
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas

From: Matt Fleming @ 2009-12-24 10:08 UTC
To: James Bottomley
Cc: linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Wed, Dec 23, 2009 at 03:22:21PM -0600, James Bottomley wrote:
> +
> +  void flush_kernel_vmap_range(void *vaddr, int size)
> +	flushes the kernel cache for a given virtual address range in
> +	the vmap area.  This API makes sure that and data the kernel

                                            ^^^ code and data?

> +  void invalidate_kernel_vmap_range(void *vaddr, int size)
> +	invalidates the kernel cache for a given virtual address range
> +	in the vmap area.  This API is designed to make sure that while
> +	I/O went on to an address range in the vmap area, the processor
> +	didn't speculate cache reads and thus make the cache over the
> +	virtual address stale.
> +

Could this sentence be reworked a little?  I find the "over the virtual
address" part a little difficult to parse.
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas

From: Matthew Wilcox @ 2009-12-24 12:39 UTC
To: Matt Fleming
Cc: James Bottomley, linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Thu, Dec 24, 2009 at 10:08:53AM +0000, Matt Fleming wrote:
> On Wed, Dec 23, 2009 at 03:22:21PM -0600, James Bottomley wrote:
> > +
> > +  void flush_kernel_vmap_range(void *vaddr, int size)
> > +	flushes the kernel cache for a given virtual address range in
> > +	the vmap area.  This API makes sure that and data the kernel
>
> ^^^ code and data?

I'd guess it's a typo for 'any data'.

> > +  void invalidate_kernel_vmap_range(void *vaddr, int size)
> > +	invalidates the kernel cache for a given virtual address range
> > +	in the vmap area.  This API is designed to make sure that while
> > +	I/O went on to an address range in the vmap area, the processor
> > +	didn't speculate cache reads and thus make the cache over the
> > +	virtual address stale.
> > +
>
> Could this sentence be reworked a little?  I find the "over the virtual
> address" part a little difficult to parse.

How about:

	invalidates the processor cache for a given virtual address range
	in the vmap area.  This API addresses the problem that the processor
	may have performed speculative reads into its cache of the vmapped
	area while I/O was occurring to the underlying physical pages.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas

From: Matt Fleming @ 2009-12-24 13:06 UTC
To: Matthew Wilcox
Cc: James Bottomley, linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Thu, Dec 24, 2009 at 05:39:13AM -0700, Matthew Wilcox wrote:
> On Thu, Dec 24, 2009 at 10:08:53AM +0000, Matt Fleming wrote:
> > On Wed, Dec 23, 2009 at 03:22:21PM -0600, James Bottomley wrote:
> > > +
> > > +  void flush_kernel_vmap_range(void *vaddr, int size)
> > > +	flushes the kernel cache for a given virtual address range in
> > > +	the vmap area.  This API makes sure that and data the kernel
> >
> > ^^^ code and data?
>
> I'd guess it's a typo for 'any data'.

Aha yes, that would make more sense.

> > > +  void invalidate_kernel_vmap_range(void *vaddr, int size)
> > > +	invalidates the kernel cache for a given virtual address range
> > > +	in the vmap area.  This API is designed to make sure that while
> > > +	I/O went on to an address range in the vmap area, the processor
> > > +	didn't speculate cache reads and thus make the cache over the
> > > +	virtual address stale.
> > > +
> >
> > Could this sentence be reworked a little?  I find the "over the virtual
> > address" part a little difficult to parse.
>
> How about:
>
> 	invalidates the processor cache for a given virtual address range
> 	in the vmap area.  This API addresses the problem that the processor
> 	may have performed speculative reads into its cache of the vmapped
> 	area while I/O was occurring to the underlying physical pages.
>
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Sounds good to me!
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas

From: James Bottomley @ 2009-12-27 15:37 UTC
To: Matthew Wilcox
Cc: Matt Fleming, linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Thu, 2009-12-24 at 05:39 -0700, Matthew Wilcox wrote:
> On Thu, Dec 24, 2009 at 10:08:53AM +0000, Matt Fleming wrote:
> > On Wed, Dec 23, 2009 at 03:22:21PM -0600, James Bottomley wrote:
> > > +
> > > +  void flush_kernel_vmap_range(void *vaddr, int size)
> > > +	flushes the kernel cache for a given virtual address range in
> > > +	the vmap area.  This API makes sure that and data the kernel
> >
> > ^^^ code and data?
>
> I'd guess it's a typo for 'any data'.
>
> > > +  void invalidate_kernel_vmap_range(void *vaddr, int size)
> > > +	invalidates the kernel cache for a given virtual address range
> > > +	in the vmap area.  This API is designed to make sure that while
> > > +	I/O went on to an address range in the vmap area, the processor
> > > +	didn't speculate cache reads and thus make the cache over the
> > > +	virtual address stale.
> >
> > Could this sentence be reworked a little?  I find the "over the virtual
> > address" part a little difficult to parse.
>
> How about:
>
> 	invalidates the processor cache for a given virtual address range
> 	in the vmap area.  This API addresses the problem that the processor
> 	may have performed speculative reads into its cache of the vmapped
> 	area while I/O was occurring to the underlying physical pages.

So better, I think, is:

	invalidates the cache for a given virtual address range in the
	vmap area which prevents the processor from making the cache
	stale by speculatively reading data while the I/O was occurring
	to the physical pages.

James
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas

From: Benjamin Herrenschmidt @ 2010-01-02 21:27 UTC
To: Matthew Wilcox
Cc: Matt Fleming, James Bottomley, linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Thu, 2009-12-24 at 05:39 -0700, Matthew Wilcox wrote:
>
> 	invalidates the processor cache for a given virtual address range
> 	in the vmap area.  This API addresses the problem that the processor
> 	may have performed speculative reads into its cache of the vmapped
> 	area while I/O was occurring to the underlying physical pages.
>
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Interestingly, our DMA APIs in this regard are sub-optimal, as they
should provide 3 hooks, not 2.

Flush is good for ensuring dirty lines have been pushed out before an
outgoing DMA.

But for incoming DMA it would be nice to properly split the 2 calls that
may be needed on some archs, one before, one after the transfer.  Sure,
invalidate twice will "work", but it will also be sub-optimal on some
platforms depending on whether the platform is known to be able or not
to speculatively load cache lines etc...

Maybe just a "before" vs. "after" argument?

Also, the proposal goes contrary to most of our DMA APIs, which don't
actually expose the details of invalidate vs. flush but instead expose
the direction of the transfer (both lack the above subtlety though).

Thoughts?

Cheers,
Ben.
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas

From: James Bottomley @ 2010-01-02 21:54 UTC
To: Benjamin Herrenschmidt
Cc: Matthew Wilcox, Matt Fleming, linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Sun, 2010-01-03 at 08:27 +1100, Benjamin Herrenschmidt wrote:
> Interestingly, our DMA APIs in this regard are sub-optimal, as they
> should provide 3 hooks, not 2.
>
> Flush is good for ensuring dirty lines have been pushed out before an
> outgoing DMA.
>
> But for incoming DMA it would be nice to properly split the 2 calls that
> may be needed on some archs, one before, one after the transfer.  Sure,
> invalidate twice will "work", but it will also be sub-optimal on some
> platforms depending on whether the platform is known to be able or not
> to speculatively load cache lines etc...
>
> Maybe just a "before" vs. "after" argument?
>
> Also, the proposal goes contrary to most of our DMA APIs, which don't
> actually expose the details of invalidate vs. flush but instead expose
> the direction of the transfer (both lack the above subtlety though).
>
> Thoughts?

Well, that's the result of the email thread.  XFS is poking deeply into
architectural issues by trying to do I/O on a vmap area.  I thought the
consensus was that if xfs wants to do that then it takes complete
responsibility for coherence rather than trying to hide it in a block
API.

James
* Re: [PATCHv2 1/5] mm: add coherence API for DMA to vmalloc/vmap areas

From: Benjamin Herrenschmidt @ 2010-01-03 20:14 UTC
To: James Bottomley
Cc: Matthew Wilcox, Matt Fleming, linux-arch, linux-parisc, Christoph Hellwig, Russell King, Paul Mundt

On Sat, 2010-01-02 at 15:54 -0600, James Bottomley wrote:
> Well, that's the result of the email thread.  XFS is poking deeply into
> architectural issues by trying to do I/O on a vmap area.  I thought the
> consensus was that if xfs wants to do that then it takes complete
> responsibility for coherence rather than trying to hide it in a block
> API.

Well, maybe ...  Mostly my comment was about the "other" (standard) DMA
APIs, ie, the work you do for XFS just reminded me of that issue I've
been thinking about :-)

Cheers,
Ben.