* xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas @ 2009-09-08 18:27 James Bottomley 2009-09-08 19:00 ` Russell King 2009-10-13 1:40 ` xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas Christoph Hellwig 0 siblings, 2 replies; 19+ messages in thread From: James Bottomley @ 2009-09-08 18:27 UTC (permalink / raw) To: Parisc List, Linux Filesystem Mailing List, linux-arch; +Cc: Christoph Hellwig

This bug was observed on parisc, but I would expect it to affect all architectures with virtually indexed caches.

The inception of this problem is the changes we made to block and SCSI to eliminate the special case path for kernel buffers. This change forced every I/O to go via the full scatter-gather processing. In this way we thought we'd removed the restrictions on using vmalloc/vmap areas for I/O from the kernel. XFS actually took advantage of this, hence the problems. Actually, if you look at the implementation of blk_rq_map_kern(), it still won't accept vmalloc pages on most architectures because virt_to_page() assumes an offset mapped page (x86 actually has a bug for the vmalloc case if you enable DEBUG_VIRTUAL). The only reason xfs gets away with this is because it builds the vmalloc'd bio manually, essentially open coding blk_rq_map_kern().

The problem comes because by the time we get to map scatter-gather lists, all we have is the page; we've lost the virtual address. There's a macro, sg_virt(), which claims to recover the virtual address, but all it really does is provide the offset map of the page's physical address. This means that sg_virt() returns a different address from the one the page was actually accessed through if it's in a vmalloc/vmap area (because we remapped the page within the kernel virtual address space). This means that for virtually indexed caches, we end up flushing the wrong page alias ...
and hence corrupting data, because we do DMA with a possibly dirty cache line set above the page.

The generic fix is simple: flush the potentially dirty page along the correct cache alias before feeding it into the block routines and losing the alias address information. The slight problem is that we don't have an API to handle this ... flush_kernel_dcache_page() would be the correct one, except that it only takes a page as the argument, not the virtual address. So, I propose as part of this change to introduce a new API: flush_kernel_dcache_addr(), which performs exactly the same operation as flush_kernel_dcache_page() except that it flushes through the provided virtual address (whether offset mapped or mapped via vmalloc/vmap). I'll send out the patch series as a reply to this email.

James

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas 2009-09-08 18:27 xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas James Bottomley @ 2009-09-08 19:00 ` Russell King 2009-09-08 19:11 ` James Bottomley 2009-10-13 1:40 ` xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas Christoph Hellwig 1 sibling, 1 reply; 19+ messages in thread From: Russell King @ 2009-09-08 19:00 UTC (permalink / raw) To: James Bottomley Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig

On Tue, Sep 08, 2009 at 01:27:49PM -0500, James Bottomley wrote:
> This bug was observed on parisc, but I would expect it to affect all
> architectures with virtually indexed caches.

I don't think your proposed solution will work for ARM with speculative prefetching (iow, the latest ARM CPUs). If there is a mapping present, it can be speculatively prefetched from at any time - the CPU designers have placed no bounds on the amount of speculative prefetching which may be present in a design.

What this means is that for DMA, we will need to handle cache coherency issues both before and after DMA.

If we're going to allow non-direct-mapped (offset mapped, in your parlance) block IO, it becomes impossible to handle cache coherency after DMA completion - although we can translate (via page table walks) from a virtual address to a physical, and then to a bus address for DMA, going back the other way is impossible since there could be many right answers.

What has been annoying me for a while about the current DMA API is that drivers have to carry around all sorts of information for a DMA mapping, whether the architecture needs it or not - and sometimes that information is not what the architecture wants. To this end, I've been thinking that something more like:

	struct dma_mapping map;

	err = dma2_map_single(&map, buffer, size, direction);
	if (err)
		...
	addr = dma2_addr(&map);
	/* program controller */

	/* completion */
	dma2_unmap_single(&map);

with similar style interfaces for pages and so forth (scatterlists are already arch-defined). Architectures define the contents of struct dma_mapping - but it must contain at least the DMA address.

What's the advantage of this? It means that if an architecture needs to handle cache issues after DMA on unmap via a virtual address, it can ensure that the correct address is passed through all the way to the unmap function. This approach also relieves the driver writer from having to carry around the direction, size and DMA address themselves, which means we don't need the DMA debug infrastructure to check that drivers are doing these things correctly.

I seriously doubt, though, that we can revise the DMA API...

In your (and my) case, maybe struct scatterlist also needs to contain the virtual address as well as the struct page, offset and length?

PS, ARM already does not allow anything but direct-mapped RAM addresses for dma_map_single(), since we need to be able to translate virtual addresses to physical for non-coherent L2 cache handling - L1 cache needs handling via the virtual address and L2 via the physical address.

PPS, you're not the only architecture which has problems with XFS. ARM has a long-standing issue with it too.

-- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of:

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas 2009-09-08 19:00 ` Russell King @ 2009-09-08 19:11 ` James Bottomley 2009-09-08 20:16 ` Russell King 0 siblings, 1 reply; 19+ messages in thread From: James Bottomley @ 2009-09-08 19:11 UTC (permalink / raw) To: Russell King Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig On Tue, 2009-09-08 at 20:00 +0100, Russell King wrote: > On Tue, Sep 08, 2009 at 01:27:49PM -0500, James Bottomley wrote: > > This bug was observed on parisc, but I would expect it to affect all > > architectures with virtually indexed caches. > > I don't think your proposed solution will work for ARM with speculative > prefetching (iow, the latest ARM CPUs.) If there is a mapping present, > it can be speculatively prefetched from at any time - the CPU designers > have placed no bounds on the amount of speculative prefetching which > may be present in a design. The architecturally prescribed fix for this on parisc is to purge the TLB entry as well. Without a TLB entry, the CPU is forbidden from doing speculative reads. This obviously works only as long as the kernel never touches the page during DMA, of course ... Isn't this also true for arm? > What this means that for DMA, we will need to handle cache coherency > issues both before and after DMA. > > If we're going to allow non-direct mapped (offset mapped in your parlence) > block IO, it makes it impossible to handle cache coherency after DMA > completion - although we can translate (via page table walks) from a > virtual address to a physical, and then to a bus address for DMA, going > back the other way is impossible since there could be many right answers. 
> > What has been annoying me for a while about the current DMA API is that > drivers have to carry around all sorts of information for a DMA mapping, > whether the architecture needs it or not - and sometimes that information > is not what the architecture wants. To this end, I've been thinking that > something more like: > > struct dma_mapping map; > > err = dma2_map_single(&map, buffer, size, direction); > if (err) > ... > > addr = dma2_addr(&map); > /* program controller */ > > /* completion */ > dma2_unmap_single(&map); > > with similar style interfaces for pages and so forth (scatterlists are > already arch-defined.) Architectures define the contents of > struct dma_mapping - but it must contain at least the dma address. > > What's the advantage of this? It means that if an architecture needs to > handle cache issues after DMA on unmap via a virtual address, it can > ensure that the correct address is passed through all the way to the > unmap function. This approach also relieves the driver writer from > having to carry around the direction, size and dma address themselves, > which means we don't need the DMA debug infrastructure to check that > drivers are doing these things correctly. > > I seriously doubt, though, that we can revise the DMA API... Actually, there's a more fundamental problem. I did think of doing it this way initially. However, most of the dma_map..() cases come down from block and have already lost all idea of what the virtual address was and where it came from ... so there's an awful lot more work to do to make them carry it through to dma_map...() > In your (and my) case, maybe struct scatterlist also needs to contain > the virtual address as well as the struct page, offset and length? 
> > > PS, ARM already does not allow anything but direct-mapped RAM addresses > for dma_map_single(), since we need to be able to translate virtual > addresses to physical for non-coherent L2 cache handling - L1 cache > needs handling via the virtual address and L2 via the physical address. > > > PPS, you're not the only architecture which has problems with XFS. ARM > has a long standing issue with it too. Well, the good news is that I can fix it to work on parisc. James ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas 2009-09-08 19:11 ` James Bottomley @ 2009-09-08 20:16 ` Russell King 2009-09-08 20:39 ` James Bottomley 0 siblings, 1 reply; 19+ messages in thread From: Russell King @ 2009-09-08 20:16 UTC (permalink / raw) To: James Bottomley Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig On Tue, Sep 08, 2009 at 07:11:52PM +0000, James Bottomley wrote: > On Tue, 2009-09-08 at 20:00 +0100, Russell King wrote: > > On Tue, Sep 08, 2009 at 01:27:49PM -0500, James Bottomley wrote: > > > This bug was observed on parisc, but I would expect it to affect all > > > architectures with virtually indexed caches. > > > > I don't think your proposed solution will work for ARM with speculative > > prefetching (iow, the latest ARM CPUs.) If there is a mapping present, > > it can be speculatively prefetched from at any time - the CPU designers > > have placed no bounds on the amount of speculative prefetching which > > may be present in a design. > > The architecturally prescribed fix for this on parisc is to purge the > TLB entry as well. Without a TLB entry, the CPU is forbidden from doing > speculative reads. This obviously works only as long as the kernel > never touches the page during DMA, of course ... > > Isn't this also true for arm? There appears to be nothing architected along those lines for ARM. >From the architectural point of view, any "normal memory" mapping is a candidate for speculative accesses provided access is permitted via the page permissions. In other words, if the CPU is permitted to access a memory page, it is a candidate for speculative accesses. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas 2009-09-08 20:16 ` Russell King @ 2009-09-08 20:39 ` James Bottomley 2009-09-08 21:39 ` Russell King 0 siblings, 1 reply; 19+ messages in thread From: James Bottomley @ 2009-09-08 20:39 UTC (permalink / raw) To: Russell King Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig On Tue, 2009-09-08 at 21:16 +0100, Russell King wrote: > On Tue, Sep 08, 2009 at 07:11:52PM +0000, James Bottomley wrote: > > On Tue, 2009-09-08 at 20:00 +0100, Russell King wrote: > > > On Tue, Sep 08, 2009 at 01:27:49PM -0500, James Bottomley wrote: > > > > This bug was observed on parisc, but I would expect it to affect all > > > > architectures with virtually indexed caches. > > > > > > I don't think your proposed solution will work for ARM with speculative > > > prefetching (iow, the latest ARM CPUs.) If there is a mapping present, > > > it can be speculatively prefetched from at any time - the CPU designers > > > have placed no bounds on the amount of speculative prefetching which > > > may be present in a design. > > > > The architecturally prescribed fix for this on parisc is to purge the > > TLB entry as well. Without a TLB entry, the CPU is forbidden from doing > > speculative reads. This obviously works only as long as the kernel > > never touches the page during DMA, of course ... > > > > Isn't this also true for arm? > > There appears to be nothing architected along those lines for ARM. > From the architectural point of view, any "normal memory" mapping is > a candidate for speculative accesses provided access is permitted via > the page permissions. > > In other words, if the CPU is permitted to access a memory page, it > is a candidate for speculative accesses. So the parisc architectural feature is simply a statement of fact for VI cache architectures: if you don't have a TLB entry for a page, you can't do cache operations for it. 
We have a software TLB interrupt and the CPU can't interrupt for a speculation, so it's restricted to the existing TLB entries in its cache for speculative move-ins.

So now that we know what the problem is: if arm can't operate this way, what's your suggestion for fixing this ... I take it you have a DMA coherence index like we do that flushes the cache on DMA ops?

James

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas 2009-09-08 20:39 ` James Bottomley @ 2009-09-08 21:39 ` Russell King 2009-09-09 3:14 ` James Bottomley 0 siblings, 1 reply; 19+ messages in thread From: Russell King @ 2009-09-08 21:39 UTC (permalink / raw) To: James Bottomley Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig

On Tue, Sep 08, 2009 at 08:39:12PM +0000, James Bottomley wrote:
> On Tue, 2009-09-08 at 21:16 +0100, Russell King wrote:
> > On Tue, Sep 08, 2009 at 07:11:52PM +0000, James Bottomley wrote:
> > > The architecturally prescribed fix for this on parisc is to purge the
> > > TLB entry as well. Without a TLB entry, the CPU is forbidden from doing
> > > speculative reads. This obviously works only as long as the kernel
> > > never touches the page during DMA, of course ...
> > >
> > > Isn't this also true for arm?
> >
> > There appears to be nothing architected along those lines for ARM.
> > From the architectural point of view, any "normal memory" mapping is
> > a candidate for speculative accesses provided access is permitted via
> > the page permissions.
> >
> > In other words, if the CPU is permitted to access a memory page, it
> > is a candidate for speculative accesses.
>
> So the parisc architectural feature is simply a statement of fact for VI
> cache architectures: if you don't have a TLB entry for a page, you can't
> do cache operations for it.

That is also true for ARM - you can't perform cache maintenance on a page without there being a valid page table entry...

> We have a software TLB interrupt and the
> CPU can't interrupt for a speculation, so it's restricted to the
> existing TLB entries in its cache for speculative move ins.

though this is where we differ, since the hardware walks the page tables and doesn't require any interrupts to do this.
> So now we know what the problem is, if arm can't operate this way,
> what's your suggestion for fixing this ... I take it you have a DMA
> coherence index like we do that flushes the cache on DMA ops?

DMA on ARM continues to be totally non-coherent with the caches. There is no hardware help with this. So, with these speculative accessing CPUs, we will need to do software cache maintenance both before _and_ after the DMA in the case of DMA from device.

Maintenance before the DMA is required to ensure that the cache doesn't write out dirty cache lines to the region which is being DMA'd to. Maintenance after the DMA is needed to invalidate any stale data, whether it be pre-existing or speculatively loaded.

What this means is that we need to have the correct virtual address in the unmap operation to ensure that subsequent CPU reads access the newly DMA'd data.

The alternative solution I can see is to ensure that subsystems which do DMA from (eg) vmalloc'd regions are not selectable for ARM.

It's also worth noting that we do have this restriction already in place across _all_ ARM CPUs for the DMA APIs which take virtual addresses - we only accept direct-mapped kernel addresses via those APIs since we use virt_to_phys() for L2 cache maintenance. Walking page tables, especially with high PTE support (ARM has joined those architectures supporting highmem), sounds to me very unfunny.

-- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of:

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas 2009-09-08 21:39 ` Russell King @ 2009-09-09 3:14 ` James Bottomley 2009-09-09 3:17 ` [PATCH 1/5] mm: add coherence API for DMA " James Bottomley ` (4 more replies) 0 siblings, 5 replies; 19+ messages in thread From: James Bottomley @ 2009-09-09 3:14 UTC (permalink / raw) To: Russell King Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig On Tue, 2009-09-08 at 22:39 +0100, Russell King wrote: > On Tue, Sep 08, 2009 at 08:39:12PM +0000, James Bottomley wrote: > > On Tue, 2009-09-08 at 21:16 +0100, Russell King wrote: > > > On Tue, Sep 08, 2009 at 07:11:52PM +0000, James Bottomley wrote: > > > > The architecturally prescribed fix for this on parisc is to purge the > > > > TLB entry as well. Without a TLB entry, the CPU is forbidden from doing > > > > speculative reads. This obviously works only as long as the kernel > > > > never touches the page during DMA, of course ... > > > > > > > > Isn't this also true for arm? > > > > > > There appears to be nothing architected along those lines for ARM. > > > From the architectural point of view, any "normal memory" mapping is > > > a candidate for speculative accesses provided access is permitted via > > > the page permissions. > > > > > > In other words, if the CPU is permitted to access a memory page, it > > > is a candidate for speculative accesses. > > > > So the parisc architectural feature is simply a statement of fact for VI > > cache architectures: if you don't have a TLB entry for a page, you can't > > do cache operations for it. > > That is also true for ARM - you can't perform cache maintainence on a > page without there being a valid page table entry... > > > We have a software TLB interrupt and the > > CPU can't interrupt for a speculation, so it's restricted to the > > existing TLB entries in its cache for speculative move ins. 
> > though this is where we differ, since the hardware walks the page tables > and doesn't require any interrupts to do this. > > > So now we know what the problem is, if arm can't operate this way, > > what's your suggestion for fixing this ... I take it you have a DMA > > coherence index like we do that flushes the cache on DMA ops? > > DMA on ARM continues to be totally non-coherent with the caches. There > is no hardware help with this. So, with these speculative accessing > CPUs, we will need to do software cache maintainence both before _and_ > after the DMA in the case of DMA from device. > > Maintainence before the DMA is required to ensure that the cache > doesn't write out dirty cache lines to the region which is being > DMA'd to. Maintainence after the DMA is needed to invalidate any > stale data, whether it be pre-existing or speculatively loaded. > > What this means is that we need to have the correct virtual address > in the unmap operation to ensure that subsequent CPU reads access > the newly DMA'd data. > > The alternative solution I can see is to ensure that subsystems which > do DMA from (eg) vmalloc'd regions are not selectable for ARM. > > It's also worth noting that we do have this restriction already in > place across _all_ ARM CPUs for the DMA APIs which take virtual > addresses - we only accept direct mapped kernel addresses via > those APIs since we use virt_to_phys() for L2 cache maintainence. > Walking page tables, especially with high PTE support (ARM has > joined those architectures supporting highmem), sounds to me very > unfunny. OK, so we can work with that. This series of patches introduces both a flush and an invalidate (one for pre and one for post). With this, I'm able to successfully use xfs filesystems on parisc (before, they mostly refuse to mount because of journal corruption). James ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas 2009-09-09 3:14 ` James Bottomley @ 2009-09-09 3:17 ` James Bottomley 2009-09-09 3:23 ` James Bottomley 2009-09-09 3:18 ` [PATCH 2/5] parisc: add mm " James Bottomley ` (3 subsequent siblings) 4 siblings, 1 reply; 19+ messages in thread From: James Bottomley @ 2009-09-09 3:17 UTC (permalink / raw) To: Russell King Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig

On Virtually Indexed architectures (which don't do automatic alias resolution in their caches), we have to flush via the correct virtual address to prepare pages for DMA. On some architectures (like arm) we cannot prevent the CPU from doing data movein along the alias (and thus giving stale read data), so we not only have to introduce a flush API to push dirty cache lines out, but also an invalidate API to kill inconsistent cache lines that may have moved in before DMA changed the data.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 include/linux/highmem.h | 6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 211ff44..9719952 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -17,6 +17,12 @@ static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page
 static inline void flush_kernel_dcache_page(struct page *page)
 {
 }
+static incline void flush_kernel_dcache_addr(void *vaddr)
+{
+}
+static incline void invalidate_kernel_dcache_addr(void *vaddr)
+{
+}
 #endif
 
 #include <asm/kmap_types.h>
-- 
1.6.3.3

^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas 2009-09-09 3:17 ` [PATCH 1/5] mm: add coherence API for DMA " James Bottomley @ 2009-09-09 3:23 ` James Bottomley 2009-09-09 3:35 ` Paul Mundt 0 siblings, 1 reply; 19+ messages in thread From: James Bottomley @ 2009-09-09 3:23 UTC (permalink / raw) To: Russell King Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig On Wed, 2009-09-09 at 03:17 +0000, James Bottomley wrote: > On Virtually Indexed architectures (which don't do automatic alias > resolution in their caches), we have to flush via the correct > virtual address to prepare pages for DMA. On some architectures > (like arm) we cannot prevent the CPU from doing data movein along > the alias (and thus giving stale read data), so we not only have to > introduce a flush API to push dirty cache lines out, but also an invalidate > API to kill inconsistent cache lines that may have moved in before > DMA changed the data > > Signed-off-by: James Bottomley <James.Bottomley@suse.de> > --- > include/linux/highmem.h | 6 ++++++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index 211ff44..9719952 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -17,6 +17,12 @@ static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page > static inline void flush_kernel_dcache_page(struct page *page) > { > } > +static incline void flush_kernel_dcache_addr(void *vaddr) > +{ > +} > +static incline void invalidate_kernel_dcache_addr(void *vaddr) > +{ > +} > #endif OK, so it's been pointed out to me that I didn't compile check this ... just testing to see everyone is awake ... James ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas 2009-09-09 3:23 ` James Bottomley @ 2009-09-09 3:35 ` Paul Mundt 2009-09-09 14:34 ` James Bottomley 0 siblings, 1 reply; 19+ messages in thread From: Paul Mundt @ 2009-09-09 3:35 UTC (permalink / raw) To: James Bottomley Cc: Russell King, Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig On Tue, Sep 08, 2009 at 10:23:21PM -0500, James Bottomley wrote: > On Wed, 2009-09-09 at 03:17 +0000, James Bottomley wrote: > > On Virtually Indexed architectures (which don't do automatic alias > > resolution in their caches), we have to flush via the correct > > virtual address to prepare pages for DMA. On some architectures > > (like arm) we cannot prevent the CPU from doing data movein along > > the alias (and thus giving stale read data), so we not only have to > > introduce a flush API to push dirty cache lines out, but also an invalidate > > API to kill inconsistent cache lines that may have moved in before > > DMA changed the data > > > > Signed-off-by: James Bottomley <James.Bottomley@suse.de> > > --- > > include/linux/highmem.h | 6 ++++++ > > 1 files changed, 6 insertions(+), 0 deletions(-) > > > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > > index 211ff44..9719952 100644 > > --- a/include/linux/highmem.h > > +++ b/include/linux/highmem.h > > @@ -17,6 +17,12 @@ static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page > > static inline void flush_kernel_dcache_page(struct page *page) > > { > > } > > +static incline void flush_kernel_dcache_addr(void *vaddr) > > +{ > > +} > > +static incline void invalidate_kernel_dcache_addr(void *vaddr) > > +{ > > +} > > #endif > > OK, so it's been pointed out to me that I didn't compile check this ... > just testing to see everyone is awake ... > And here I thought it was a new gcc construct along the lines of inline-if-so-inclined.. :-) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas 2009-09-09 3:35 ` Paul Mundt @ 2009-09-09 14:34 ` James Bottomley 2009-09-10 0:24 ` Paul Mundt 0 siblings, 1 reply; 19+ messages in thread From: James Bottomley @ 2009-09-09 14:34 UTC (permalink / raw) To: Paul Mundt Cc: Russell King, Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig

On Wed, 2009-09-09 at 12:35 +0900, Paul Mundt wrote:
> And here I thought it was a new gcc construct along the lines of
> inline-if-so-inclined.. :-)

Actually, I missed the fact that sh also sets ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, so you'll need implementations of these functions too. Does this look right?

James

---
diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index 4c5462d..db06611 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -48,6 +48,14 @@ static inline void flush_kernel_dcache_page(struct page *page)
 {
 	flush_dcache_page(page);
 }
+static inline void flush_kernel_dcache_addr(void *addr)
+{
+	__flush_invalidate_region(addr, PAGE_SIZE);
+}
+static inline void invalidate_kernel_dcache_addr(void *addr)
+{
+	__flush_invalidate_region(addr, PAGE_SIZE);
+}
 
 #if defined(CONFIG_CPU_SH4) && !defined(CONFIG_CACHE_OFF)
 extern void copy_to_user_page(struct vm_area_struct *vma,

^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas 2009-09-09 14:34 ` James Bottomley @ 2009-09-10 0:24 ` Paul Mundt 2009-09-10 0:30 ` James Bottomley 0 siblings, 1 reply; 19+ messages in thread From: Paul Mundt @ 2009-09-10 0:24 UTC (permalink / raw) To: James Bottomley Cc: Russell King, Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig On Wed, Sep 09, 2009 at 09:34:38AM -0500, James Bottomley wrote: > On Wed, 2009-09-09 at 12:35 +0900, Paul Mundt wrote: > > And here I thought it was a new gcc construct along the lines of > > inline-if-so-inclined.. :-) > > Actually, I missed the fact that sh also sets > ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, so you'll need implementations of > these functions too. Does this look right? > Yes, I was planning on just wiring it up later, but this looks correct. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas 2009-09-10 0:24 ` Paul Mundt @ 2009-09-10 0:30 ` James Bottomley 0 siblings, 0 replies; 19+ messages in thread From: James Bottomley @ 2009-09-10 0:30 UTC (permalink / raw) To: Paul Mundt Cc: Russell King, Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig On Thu, 2009-09-10 at 09:24 +0900, Paul Mundt wrote: > On Wed, Sep 09, 2009 at 09:34:38AM -0500, James Bottomley wrote: > > On Wed, 2009-09-09 at 12:35 +0900, Paul Mundt wrote: > > > And here I thought it was a new gcc construct along the lines of > > > inline-if-so-inclined.. :-) > > > > Actually, I missed the fact that sh also sets > > ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, so you'll need implementations of > > these functions too. Does this look right? > > > Yes, I was planning on just wiring it up later, but this looks correct. Great, thanks! Give me an ack and I'll take care of submitting it ... although it would be nice to test it out with the xfs problem case. James ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/5] parisc: add mm API for DMA to vmalloc/vmap areas 2009-09-09 3:14 ` James Bottomley 2009-09-09 3:17 ` [PATCH 1/5] mm: add coherence API for DMA " James Bottomley @ 2009-09-09 3:18 ` James Bottomley 2009-09-09 3:20 ` [PATCH 3/5] arm: " James Bottomley ` (2 subsequent siblings) 4 siblings, 0 replies; 19+ messages in thread From: James Bottomley @ 2009-09-09 3:18 UTC (permalink / raw) To: Russell King Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig

We already have an API to flush a kernel page along an alias address, so use it. The TLB purge prevents the CPU from doing speculative move-ins on the flushed address, so we don't need to implement an invalidate.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 arch/parisc/include/asm/cacheflush.h | 8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 7243951..2536a00 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -90,6 +90,14 @@ static inline void flush_kernel_dcache_page(struct page *page)
 {
 	flush_kernel_dcache_page_addr(page_address(page));
 }
+static inline void flush_kernel_dcache_addr(void *addr)
+{
+	flush_kernel_dcache_page_addr(addr);
+}
+static inline void invalidate_kernel_dcache_addr(void *addr)
+{
+	/* nop .. the flush prevents move in until the page is touched */
+}
 #ifdef CONFIG_DEBUG_RODATA
 void mark_rodata_ro(void);
-- 
1.6.3.3

^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 3/5] arm: add mm API for DMA to vmalloc/vmap areas
  2009-09-09  3:14 ` James Bottomley
  2009-09-09  3:17   ` [PATCH 1/5] mm: add coherence API for DMA " James Bottomley
  2009-09-09  3:18   ` [PATCH 2/5] parisc: add mm " James Bottomley
@ 2009-09-09  3:20   ` James Bottomley
  2009-09-09  3:21   ` [PATCH 4/5] block: permit I/O to vmalloc/vmap kernel pages James Bottomley
  2009-09-09  3:21   ` [PATCH 5/5] xfs: fix xfs to work with Virtually Indexed architectures James Bottomley
  4 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2009-09-09  3:20 UTC (permalink / raw)
To: Russell King
Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig

ARM cannot prevent cache movein, so this patch implements both the
flush and invalidate pieces of the API.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 arch/arm/include/asm/cacheflush.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 1a711ea..1104ee9 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -436,6 +436,16 @@ static inline void flush_kernel_dcache_page(struct page *page)
 	if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
 		__cpuc_flush_dcache_page(page_address(page));
 }
+static inline void flush_kernel_dcache_addr(void *addr)
+{
+	if ((cache_is_vivt() || cache_is_vipt_aliasing()))
+		__cpuc_flush_dcache_page(addr);
+}
+static inline void invalidate_kernel_dcache_addr(void *addr)
+{
+	if ((cache_is_vivt() || cache_is_vipt_aliasing()))
+		__cpuc_flush_dcache_page(addr);
+}
 
 #define flush_dcache_mmap_lock(mapping) \
 	spin_lock_irq(&(mapping)->tree_lock)
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread
* [PATCH 4/5] block: permit I/O to vmalloc/vmap kernel pages
  2009-09-09  3:14 ` James Bottomley
  ` (2 preceding siblings ...)
  2009-09-09  3:20   ` [PATCH 3/5] arm: " James Bottomley
@ 2009-09-09  3:21   ` James Bottomley
  2009-09-09  3:21   ` [PATCH 5/5] xfs: fix xfs to work with Virtually Indexed architectures James Bottomley
  4 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2009-09-09  3:21 UTC (permalink / raw)
To: Russell King
Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig

This updates bio_map_kern() to check for pages in the vmalloc address
range and call the new kernel flushing APIs if they are.  This should
allow any kernel user to pass a vmalloc/vmap area to block.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 fs/bio.c |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 7673800..ea346b4 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1120,6 +1120,13 @@ void bio_unmap_user(struct bio *bio)
 
 static void bio_map_kern_endio(struct bio *bio, int err)
 {
+	void *kaddr = bio->bi_private;
+	if (is_vmalloc_addr(kaddr)) {
+		void *addr;
+		for (addr = kaddr; addr < kaddr + bio->bi_size;
+		     addr += PAGE_SIZE)
+			invalidate_kernel_dcache_addr(addr);
+	}
 	bio_put(bio);
 }
 
@@ -1138,9 +1145,12 @@ static struct bio *__bio_map_kern(struct request_queue *q, void *data,
 	if (!bio)
 		return ERR_PTR(-ENOMEM);
 
+	bio->bi_private = data;
+
 	offset = offset_in_page(kaddr);
 	for (i = 0; i < nr_pages; i++) {
 		unsigned int bytes = PAGE_SIZE - offset;
+		struct page *page;
 
 		if (len <= 0)
 			break;
@@ -1148,8 +1158,13 @@ static struct bio *__bio_map_kern(struct request_queue *q, void *data,
 		if (bytes > len)
 			bytes = len;
 
-		if (bio_add_pc_page(q, bio, virt_to_page(data), bytes,
-				    offset) < bytes)
+		if (is_vmalloc_addr(data)) {
+			flush_kernel_dcache_addr(data);
+			page = vmalloc_to_page(data);
+		} else
+			page = virt_to_page(data);
+
+		if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes)
			break;

		data += bytes;
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread
* [PATCH 5/5] xfs: fix xfs to work with Virtually Indexed architectures
  2009-09-09  3:14 ` James Bottomley
  ` (3 preceding siblings ...)
  2009-09-09  3:21   ` [PATCH 4/5] block: permit I/O to vmalloc/vmap kernel pages James Bottomley
@ 2009-09-09  3:21   ` James Bottomley
  4 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2009-09-09  3:21 UTC (permalink / raw)
To: Russell King
Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig

xfs_buf.c includes what is essentially a hand-rolled version of
blk_rq_map_kern().  In order to work properly with the vmalloc buffers
that xfs uses, this hand-rolled routine must also implement the
flushing API for vmap/vmalloc areas.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 fs/xfs/linux-2.6/xfs_buf.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 965df12..62ae977 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -1138,6 +1138,10 @@ xfs_buf_bio_end_io(
 	do {
 		struct page	*page = bvec->bv_page;
 
+		if (is_vmalloc_addr(bp->b_addr))
+			invalidate_kernel_dcache_addr(bp->b_addr +
+						      bvec->bv_offset);
+
 		ASSERT(!PagePrivate(page));
 		if (unlikely(bp->b_error)) {
 			if (bp->b_flags & XBF_READ)
@@ -1202,6 +1206,9 @@ _xfs_buf_ioapply(
 		bio->bi_end_io = xfs_buf_bio_end_io;
 		bio->bi_private = bp;
 
+		if (is_vmalloc_addr(bp->b_addr))
+			flush_kernel_dcache_addr(bp->b_addr);
+
 		bio_add_page(bio, bp->b_pages[0], PAGE_CACHE_SIZE, 0);
 		size = 0;
 
@@ -1228,6 +1235,9 @@ next_chunk:
 	if (nbytes > size)
 		nbytes = size;
 
+	if (is_vmalloc_addr(bp->b_addr))
+		flush_kernel_dcache_addr(bp->b_addr + PAGE_SIZE*map_i);
+
 	rbytes = bio_add_page(bio, bp->b_pages[map_i], nbytes, offset);
 	if (rbytes < nbytes)
 		break;
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread
* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
  2009-09-08 18:27 xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas James Bottomley
  2009-09-08 19:00 ` Russell King
@ 2009-10-13  1:40 ` Christoph Hellwig
  2009-10-13  4:13   ` James Bottomley
  1 sibling, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2009-10-13  1:40 UTC (permalink / raw)
To: James Bottomley
Cc: Parisc List, Linux Filesystem Mailing List, linux-arch, Christoph Hellwig

So what's going to happen with this patch series?  Can we expect it to
get merged one day?

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
  2009-10-13  1:40 ` xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas Christoph Hellwig
@ 2009-10-13  4:13   ` James Bottomley
  0 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2009-10-13  4:13 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Parisc List, Linux Filesystem Mailing List, linux-arch

On Tue, 2009-10-13 at 03:40 +0200, Christoph Hellwig wrote:
> So what's going to happen with this patch series?  Can we expect it to
> get merged one day?

Well, we need it for parisc, so we can just push it through our tree.
I was sort of hoping for confirmation that the other arch bits worked
... but they can't be more broken than they are today, so I suppose the
risk is low.

James

^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2009-10-13  4:13 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2009-09-08 18:27 xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas James Bottomley
2009-09-08 19:00 ` Russell King
2009-09-08 19:11   ` James Bottomley
2009-09-08 20:16     ` Russell King
2009-09-08 20:39       ` James Bottomley
2009-09-08 21:39         ` Russell King
2009-09-09  3:14           ` James Bottomley
2009-09-09  3:17             ` [PATCH 1/5] mm: add coherence API for DMA " James Bottomley
2009-09-09  3:23               ` James Bottomley
2009-09-09  3:35                 ` Paul Mundt
2009-09-09 14:34                   ` James Bottomley
2009-09-10  0:24                     ` Paul Mundt
2009-09-10  0:30                       ` James Bottomley
2009-09-09  3:18             ` [PATCH 2/5] parisc: add mm " James Bottomley
2009-09-09  3:20             ` [PATCH 3/5] arm: " James Bottomley
2009-09-09  3:21             ` [PATCH 4/5] block: permit I/O to vmalloc/vmap kernel pages James Bottomley
2009-09-09  3:21             ` [PATCH 5/5] xfs: fix xfs to work with Virtually Indexed architectures James Bottomley
2009-10-13  1:40 ` xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas Christoph Hellwig
2009-10-13  4:13   ` James Bottomley