public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Russell King <rmk+lkml@arm.linux.org.uk>
To: James Bottomley <James.Bottomley@SteelEye.com>
Cc: David Miller <davem@davemloft.net>,
	miklos@szeredi.hu, arjan@infradead.org, torvalds@osdl.org,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	akpm@osdl.org
Subject: Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
Date: Sun, 7 Jan 2007 16:30:40 +0000	[thread overview]
Message-ID: <20070107163040.GB21133@flint.arm.linux.org.uk> (raw)
In-Reply-To: <1168186153.2792.80.camel@mulgrave.il.steeleye.com>

On Sun, Jan 07, 2007 at 10:09:13AM -0600, James Bottomley wrote:
> On Wed, 2007-01-03 at 15:09 +0000, Russell King wrote:
> > On Wed, Jan 03, 2007 at 09:00:58AM -0600, James Bottomley wrote:
> > > However, I was wondering if there might be a different way around this.
> > > We can't really walk all the user mappings because of the locks, but
> > > could we store the user flush hints in the page (or a related
> > > structure)?  On parisc we don't care about the process id (called space
> > > in our architecture) because we've turned off the pieces of the cache
> > > that match on space id.  Thus, all we care about is flushing with the
> > > physical address and virtual address (and only about 10 bits of this are
> > > significant for matching).  We go to great lengths to ensure that every
> > > mapping in user space all has the same 10 bits of virtual address, so if
> > > we just new what they were we could flush the whole of the user spaces
> > > for the page without having to walk any VMA lists.  Could arm do this as
> > > well?
> > 
> > I don't think so.  The organisation of the VIVT caches in terms of
> > how the set index and tag correspond with virtual addresses are hardly
> > ever documented.  When they are, they don't appear to lend themselves
> > to such an approach.  For example,  Xscale has:
> > 
> >  tag:       virtual address b31-10
> >  set index: b9-5
> > 
> > and there's 32 ways per set.  So there's nothing to be gained from
> > controlling the virtual address which individual mappings end up at
> > in this case.
> 
> OK, so the bottom line we seem to have reached is that we can't manage
> the user coherency in the DMA API.  Does this also mean you can't do it
> for non-DMA cases (kmap_atomic would seem to be a likely problem)?

It will only work if we can walk the VMA lists associated with the
page from IRQ context.  By that I mean the address_space vma lists
as well as the anonymous memory list.

> in which case the only coherency kmap would control would be kernel
> coherency?

If that's all that kmap could do, it would solve the issues with PIO,
but not things like fuse and the other users of get_user_pages() with
the current context.  All those would remain potential sources of data
corruption.

My current attempt at solving this (following David's advice for anon
pages in flush_dcache_page()) is as follows (which involves duplicating
bits of mm/rmap.c) but I remain unconvinced that this is safe from all
contexts which flush_dcache_page() may be called from.

Given where we are in the -rc cycle, I feel that my original two patches
are far safer to solve the problems reported so far - especially as they've
been tested.  If someone can come up with a better way, then we can look
to implementing that instead once 2.6.20 has happened.

diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 628348c..e5830f6 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
+#include <linux/rmap.h>
 
 #include <asm/cacheflush.h>
 #include <asm/system.h>
@@ -117,6 +118,56 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 #define flush_pfn_alias(pfn,vaddr)	do { } while (0)
 #endif
 
+/*
+ * Copy of vma_address in mm/rmap.c... would be useful to have that non-static.
+ */
+static inline unsigned long
+arm_vma_address(struct page *page, struct vm_area_struct *vma)
+{
+	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	unsigned long address;
+
+	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
+		/* page should be within any vma from prio_tree_next */
+		BUG_ON(!PageAnon(page));
+		return -EFAULT;
+	}
+	return address;
+}
+
+static void flush_user_mapped_page(struct vm_area_struct *vma, struct page *page)
+{
+	unsigned long address = arm_vma_address(page, vma);
+
+	if (address != -EFAULT)
+		flush_cache_page(vma, address, page_to_pfn(page));
+}
+
+static void __flush_anon_mapping(struct page *page)
+{
+	struct mm_struct *mm = current->active_mm;
+
+	rcu_read_lock();
+	if (page_mapped(page)) {
+		unsigned long anon_mapping = (unsigned long) page->mapping;
+		struct anon_vma *anon_vma;
+		struct vm_area_struct *vma;
+
+		anon_vma = (struct anon_vma *)(anon_mapping - PAGE_MAPPING_ANON);
+		spin_lock(&anon_vma->lock);
+		rcu_read_unlock();
+
+		list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
+			if (vma->vm_mm == mm)
+				flush_user_mapped_page(vma, page);
+		}
+		spin_unlock(&anon_vma->lock);
+	} else {
+		rcu_read_unlock();
+	}
+}
+
 void __flush_dcache_page(struct address_space *mapping, struct page *page)
 {
 	/*
@@ -141,20 +192,17 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 	struct mm_struct *mm = current->active_mm;
 	struct vm_area_struct *mpnt;
 	struct prio_tree_iter iter;
-	pgoff_t pgoff;
+	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 
 	/*
 	 * There are possible user space mappings of this page:
 	 * - VIVT cache: we need to also write back and invalidate all user
 	 *   data in the current VM view associated with this page.
 	 * - aliasing VIPT: we only need to find one mapping of this page.
+	 *   (we handle this separately)
 	 */
-	pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
-
 	flush_dcache_mmap_lock(mapping);
 	vma_prio_tree_foreach(mpnt, &iter, &mapping->i_mmap, pgoff, pgoff) {
-		unsigned long offset;
-
 		/*
 		 * If this VMA is not in our MM, we can ignore it.
 		 */
@@ -162,8 +210,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 			continue;
 		if (!(mpnt->vm_flags & VM_MAYSHARE))
 			continue;
-		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
+		flush_user_mapped_page(mpnt, page);
 	}
 	flush_dcache_mmap_unlock(mapping);
 }
@@ -199,6 +246,8 @@ void flush_dcache_page(struct page *page)
 		__flush_dcache_page(mapping, page);
 		if (mapping && cache_is_vivt())
 			__flush_dcache_aliases(mapping, page);
+		if (PageAnon(page))
+			__flush_anon_mapping(page);
 	}
 }
 EXPORT_SYMBOL(flush_dcache_page);


-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

  reply	other threads:[~2007-01-07 16:31 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-21 15:26 fuse, get_user_pages, flush_anon_page, aliasing caches and all that again Russell King
2006-12-21 15:53 ` Miklos Szeredi
2006-12-21 16:57   ` Russell King
2006-12-21 17:51     ` Miklos Szeredi
2006-12-21 21:04       ` Jan Engelhardt
2006-12-21 21:30         ` Miklos Szeredi
2007-01-01 15:04     ` James Bottomley
2006-12-21 17:17   ` Russell King
2006-12-21 17:55     ` Miklos Szeredi
2006-12-21 18:11       ` Russell King
2006-12-21 18:30         ` Miklos Szeredi
2006-12-21 18:55           ` Russell King
2006-12-21 19:05             ` Miklos Szeredi
2006-12-21 23:51               ` Randolph Chung
2006-12-22  8:43                 ` Russell King
2006-12-22 14:45                   ` Randolph Chung
2006-12-30 16:39     ` Russell King
2006-12-30 16:50       ` Russell King
2006-12-30 18:26         ` Linus Torvalds
2006-12-30 22:46           ` Russell King
2006-12-31  5:23             ` David Miller
2006-12-31  9:10               ` Miklos Szeredi
2006-12-31  9:45                 ` David Miller
2006-12-31  9:23               ` Russell King
2006-12-31  9:27                 ` Arjan van de Ven
2006-12-31  9:47                   ` David Miller
2006-12-31 10:00                     ` Russell King
2006-12-31 10:04                       ` David Miller
2006-12-31 12:24                       ` Miklos Szeredi
2006-12-31 17:37                         ` Russell King
2007-01-01 22:15                           ` Miklos Szeredi
2007-01-01 23:45                             ` Russell King
2007-01-02 19:40                               ` Dan Williams
2007-01-02 22:53                               ` James Bottomley
2007-01-02 23:19                                 ` David Miller
2007-01-02 23:34                                   ` James Bottomley
2007-01-03  0:20                                     ` David Miller
2007-01-03 14:16                                       ` Russell King
2007-01-03 15:00                                         ` James Bottomley
2007-01-03 15:09                                           ` Russell King
2007-01-07 16:09                                             ` James Bottomley
2007-01-07 16:30                                               ` Russell King [this message]
2006-12-31 20:40                         ` David Miller
2006-12-31 20:58                           ` Linus Torvalds
2006-12-31 21:12                             ` David Miller
2007-01-01 16:44                               ` James Bottomley
2007-01-01 23:04                                 ` David Miller
2007-01-01 23:23                                   ` James Bottomley
     [not found]                     ` <1167669252.5302.57.camel@mulgrave.il.steeleye.com>
2007-01-01 23:01                       ` David Miller
2007-01-01 23:17                         ` Russell King
2006-12-31  9:55                   ` Russell King
2006-12-31  9:46                 ` David Miller
2007-01-01 14:35           ` James Bottomley
2007-01-01 16:21             ` Russell King
2006-12-30 18:21       ` Linus Torvalds
2006-12-21 16:29 ` Arjan van de Ven
2006-12-21 17:35   ` Russell King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070107163040.GB21133@flint.arm.linux.org.uk \
    --to=rmk+lkml@arm.linux.org.uk \
    --cc=James.Bottomley@SteelEye.com \
    --cc=akpm@osdl.org \
    --cc=arjan@infradead.org \
    --cc=davem@davemloft.net \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox