From: Russell King <rmk+lkml@arm.linux.org.uk>
To: James Bottomley <James.Bottomley@SteelEye.com>
Cc: David Miller <davem@davemloft.net>,
miklos@szeredi.hu, arjan@infradead.org, torvalds@osdl.org,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
akpm@osdl.org
Subject: Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
Date: Sun, 7 Jan 2007 16:30:40 +0000
Message-ID: <20070107163040.GB21133@flint.arm.linux.org.uk>
In-Reply-To: <1168186153.2792.80.camel@mulgrave.il.steeleye.com>

On Sun, Jan 07, 2007 at 10:09:13AM -0600, James Bottomley wrote:
> On Wed, 2007-01-03 at 15:09 +0000, Russell King wrote:
> > On Wed, Jan 03, 2007 at 09:00:58AM -0600, James Bottomley wrote:
> > > However, I was wondering if there might be a different way around this.
> > > We can't really walk all the user mappings because of the locks, but
> > > could we store the user flush hints in the page (or a related
> > > structure)? On parisc we don't care about the process id (called space
> > > in our architecture) because we've turned off the pieces of the cache
> > > that match on space id. Thus, all we care about is flushing with the
> > > physical address and virtual address (and only about 10 bits of this are
> > > significant for matching). We go to great lengths to ensure that every
> > > mapping in user space all has the same 10 bits of virtual address, so if
> > > we just knew what they were we could flush the whole of the user spaces
> > > for the page without having to walk any VMA lists. Could arm do this as
> > > well?
> >
> > I don't think so. The organisation of the VIVT caches in terms of
> > how the set index and tag correspond with virtual addresses are hardly
> > ever documented. When they are, they don't appear to lend themselves
> > to such an approach. For example, Xscale has:
> >
> > tag: virtual address b31-10
> > set index: b9-5
> >
> > and there's 32 ways per set. So there's nothing to be gained from
> > controlling the virtual address which individual mappings end up at
> > in this case.
>
> OK, so the bottom line we seem to have reached is that we can't manage
> the user coherency in the DMA API. Does this also mean you can't do it
> for non-DMA cases (kmap_atomic would seem to be a likely problem)?

It will only work if we can walk the VMA lists associated with the
page from IRQ context. By that I mean the address_space vma lists
as well as the anonymous memory list.

> in which case the only coherency kmap would control would be kernel
> coherency?

If that's all that kmap could do, it would solve the issues with PIO,
but not things like fuse and the other users of get_user_pages() with
the current context. All those would remain potential sources of data
corruption.
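
As an aside on the Xscale geometry quoted earlier in this message, here is
a minimal userspace sketch in C of how a virtual address splits into tag
and set index. The 32-byte line size (bits 4-0 as the byte offset) is
inferred from the set index starting at bit 5, and all function names are
hypothetical, for illustration only. It shows that two aliases of a page at
different virtual pages index the same set but never share a tag, so no
choice of mapping address makes them land in the same cache line.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, taken from the figures quoted in this thread:
 * b4-0 byte within line (inferred), b9-5 set index, b31-10 tag. */
#define LINE_SHIFT 5u
#define SET_BITS   5u
#define TAG_SHIFT  (LINE_SHIFT + SET_BITS)

/* Set selected by a virtual address (bits 9-5). */
uint32_t xscale_set_index(uint32_t vaddr)
{
	return (vaddr >> LINE_SHIFT) & ((1u << SET_BITS) - 1u);
}

/* Tag stored for a virtual address (bits 31-10). */
uint32_t xscale_tag(uint32_t vaddr)
{
	return vaddr >> TAG_SHIFT;
}

/* Two addresses hit the same line only if both set and tag match. */
int xscale_same_line(uint32_t a, uint32_t b)
{
	return xscale_set_index(a) == xscale_set_index(b) &&
	       xscale_tag(a) == xscale_tag(b);
}

/* Hypothetical aliases: same page offset, different virtual pages. */
void xscale_alias_example(void)
{
	uint32_t a = 0x40001020u;
	uint32_t b = 0x80001020u;

	assert(xscale_set_index(a) == xscale_set_index(b));
	assert(xscale_tag(a) != xscale_tag(b));
	assert(!xscale_same_line(a, b));
}
```

Because the set index lies entirely within the page offset and the tag
covers every bit above it, aligning mappings the way parisc does buys
nothing here.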

My current attempt at solving this, following David's advice to handle
anon pages in flush_dcache_page(), is below. It involves duplicating
bits of mm/rmap.c, and I remain unconvinced that it is safe from all
contexts in which flush_dcache_page() may be called.

Given where we are in the -rc cycle, I feel that my original two patches
are a far safer way to solve the problems reported so far, especially as
they've been tested. If someone can come up with a better way, then we can
look to implementing that instead once 2.6.20 has happened.
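
For readers following the patch below, the address arithmetic in
arm_vma_address() can be modelled in plain userspace C. This is only a
sketch: struct model_vma and model_vma_address() are hypothetical
stand-ins, 4 KiB pages are assumed, and pgoff is passed in directly on the
assumption that PAGE_CACHE_SHIFT equals PAGE_SHIFT.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PAGE_SHIFT 12u	/* assumption: 4 KiB pages */
#define MODEL_NO_ADDRESS ((unsigned long)-EFAULT)

/* Hypothetical stand-in for the three vm_area_struct fields used. */
struct model_vma {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* one past the last byte */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/* Where does page number pgoff of the object appear in this VMA? */
unsigned long model_vma_address(unsigned long pgoff,
				const struct model_vma *vma)
{
	unsigned long address;

	assert(vma->vm_start < vma->vm_end);

	/* Unsigned wrap-around is intended when pgoff < vm_pgoff;
	 * the range check below rejects the resulting address. */
	address = vma->vm_start +
		  ((pgoff - vma->vm_pgoff) << MODEL_PAGE_SHIFT);
	if (address < vma->vm_start || address >= vma->vm_end)
		return MODEL_NO_ADDRESS;
	return address;
}
```

With a VMA covering 0x40000000-0x40004000 at vm_pgoff 0x10, page 0x12
resolves to 0x40002000, while pages 0x0f and 0x14 fall outside the
mapping and yield the -EFAULT sentinel.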
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 628348c..e5830f6 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -10,6 +10,7 @@
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
+#include <linux/rmap.h>
#include <asm/cacheflush.h>
#include <asm/system.h>
@@ -117,6 +118,56 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
#define flush_pfn_alias(pfn,vaddr) do { } while (0)
#endif
+/*
+ * Copy of vma_address in mm/rmap.c... would be useful to have that non-static.
+ */
+static inline unsigned long
+arm_vma_address(struct page *page, struct vm_area_struct *vma)
+{
+	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	unsigned long address;
+
+	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
+		/* page should be within any vma from prio_tree_next */
+		BUG_ON(!PageAnon(page));
+		return -EFAULT;
+	}
+	return address;
+}
+
+static void flush_user_mapped_page(struct vm_area_struct *vma, struct page *page)
+{
+	unsigned long address = arm_vma_address(page, vma);
+
+	if (address != -EFAULT)
+		flush_cache_page(vma, address, page_to_pfn(page));
+}
+
+static void __flush_anon_mapping(struct page *page)
+{
+	struct mm_struct *mm = current->active_mm;
+
+	rcu_read_lock();
+	if (page_mapped(page)) {
+		unsigned long anon_mapping = (unsigned long) page->mapping;
+		struct anon_vma *anon_vma;
+		struct vm_area_struct *vma;
+
+		anon_vma = (struct anon_vma *)(anon_mapping - PAGE_MAPPING_ANON);
+		spin_lock(&anon_vma->lock);
+		rcu_read_unlock();
+
+		list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
+			if (vma->vm_mm == mm)
+				flush_user_mapped_page(vma, page);
+		}
+		spin_unlock(&anon_vma->lock);
+	} else {
+		rcu_read_unlock();
+	}
+}
+
 void __flush_dcache_page(struct address_space *mapping, struct page *page)
 {
 	/*
@@ -141,20 +192,17 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 	struct mm_struct *mm = current->active_mm;
 	struct vm_area_struct *mpnt;
 	struct prio_tree_iter iter;
-	pgoff_t pgoff;
+	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 	/*
 	 * There are possible user space mappings of this page:
 	 * - VIVT cache: we need to also write back and invalidate all user
 	 * data in the current VM view associated with this page.
 	 * - aliasing VIPT: we only need to find one mapping of this page.
+	 * (we handle this separately)
 	 */
-	pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
-
 	flush_dcache_mmap_lock(mapping);
 	vma_prio_tree_foreach(mpnt, &iter, &mapping->i_mmap, pgoff, pgoff) {
-		unsigned long offset;
-
 		/*
 		 * If this VMA is not in our MM, we can ignore it.
 		 */
@@ -162,8 +210,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 			continue;
 		if (!(mpnt->vm_flags & VM_MAYSHARE))
 			continue;
-		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
+		flush_user_mapped_page(mpnt, page);
 	}
 	flush_dcache_mmap_unlock(mapping);
 }
@@ -199,6 +246,8 @@ void flush_dcache_page(struct page *page)
 		__flush_dcache_page(mapping, page);
 		if (mapping && cache_is_vivt())
 			__flush_dcache_aliases(mapping, page);
+		if (PageAnon(page))
+			__flush_anon_mapping(page);
 	}
 }
 EXPORT_SYMBOL(flush_dcache_page);
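
The anon_vma lookup in __flush_anon_mapping() above relies on
page->mapping carrying PAGE_MAPPING_ANON in its low bit. A minimal
userspace sketch of that pointer-tagging idea, using hypothetical model_*
names and none of the kernel's locking or RCU, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAPPING_ANON 1u	/* mirrors PAGE_MAPPING_ANON: low bit set */

/* Hypothetical stand-ins for struct anon_vma and struct page. */
struct model_anon_vma {
	int lock;
};

struct model_page {
	void *mapping;	/* plain pointer, or anon_vma pointer + 1 */
};

/* Store an anon_vma in page->mapping with the low bit set as a tag. */
void model_set_anon(struct model_page *page, struct model_anon_vma *av)
{
	/* The tag only works if the pointer's low bit is otherwise clear. */
	assert(((uintptr_t)av & MODEL_MAPPING_ANON) == 0);
	page->mapping = (void *)((uintptr_t)av + MODEL_MAPPING_ANON);
}

/* Equivalent of PageAnon(): test the tag bit. */
int model_page_anon(const struct model_page *page)
{
	return ((uintptr_t)page->mapping & MODEL_MAPPING_ANON) != 0;
}

/* Recover the anon_vma by subtracting the tag, as the patch does. */
struct model_anon_vma *model_page_anon_vma(const struct model_page *page)
{
	if (!model_page_anon(page))
		return NULL;
	return (struct model_anon_vma *)
		((uintptr_t)page->mapping - MODEL_MAPPING_ANON);
}
```

The sketch checks the tag before decoding, which corresponds to the
PageAnon() test that the patch performs in flush_dcache_page() before
calling __flush_anon_mapping().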
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: