* Potential problem with the page_mapping macro and flush_dcache_page()
@ 2006-03-19 17:15 James Bottomley
2006-03-19 20:33 ` David S. Miller
0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2006-03-19 17:15 UTC (permalink / raw)
To: linux-arch
This issue came up while investigating why fuse doesn't work on parisc.
The problem, essentially is that get_user_pages() doesn't return a
coherent view of anonymous pages to the kernel. The reason is that
get_user_pages() uses flush_dcache_page() as its engine of coherence,
and a lot of flush_dcache_page() implementations use the page_mapping()
macros to get a list of user mappings of the page. The page_mapping()
macro (include/linux/mm.h) returns NULL if the PAGE_MAPPING_ANON flag is
set, this means that flush_dcache_page() has no list of user mappings,
so nothing gets flushed in user space.
A lot of arch's don't have an incoherent view of user and kernel
mappings, so I think this problem perhaps affects only parisc and arm.
It seems to me that once get_user_pages() has executed, we should have a
guarantee that the kernel has a coherent view of all the pages so
returned, so either flush_dcache_page() should iterate over anonymous
page mappings (my preferred solution) or get_user_pages() should have
additional coherency for anonymous pages.
Although this problem first manifested with fuse, I think we would have
a similar issue doing SG_IO on memory obtained from brk, so we're lucky
it hasn't bitten us before.
James
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 17:15 Potential problem with the page_mapping macro and flush_dcache_page() James Bottomley
@ 2006-03-19 20:33 ` David S. Miller
2006-03-19 20:53 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: David S. Miller @ 2006-03-19 20:33 UTC (permalink / raw)
To: James.Bottomley; +Cc: linux-arch
From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Sun, 19 Mar 2006 11:15:58 -0600
> The problem, essentially is that get_user_pages() doesn't return a
> coherent view of anonymous pages to the kernel. The reason is that
> get_user_pages() uses flush_dcache_page() as its engine of coherence,
> and a lot of flush_dcache_page() implementations use the page_mapping()
> macros to get a list of user mappings of the page. The page_mapping()
> macro (include/linux/mm.h) returns NULL if the PAGE_MAPPING_ANON flag is
> set, this means that flush_dcache_page() has no list of user mappings,
> so nothing gets flushed in user space.
>
> A lot of arch's don't have an incoherent view of user and kernel
> mappings, so I think this problem perhaps affects only parisc and arm.
flush_dcache_page() implementations intentionally don't operate on
anonymous pages because the kernel should only ever access anonymous
pages via:
1) copy_user_highpage()/clear_user_highpage(), where you can
handle the D-cache issues there.
2) device DMA, where the L2 cache eviction done by the device DMA
will pull out the D-cache entries, or if the block I/O is via PIO
it will do cache flushing in there, as per some asm-*/ide.h
implementations of __ide_ins*()/__ide_outs*()
3) ptrace poking
You've found a case that doesn't go through any special interfaces,
and thus is not handled properly at the current time.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 20:33 ` David S. Miller
@ 2006-03-19 20:53 ` James Bottomley
2006-03-19 21:00 ` David S. Miller
0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2006-03-19 20:53 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-arch
On Sun, 2006-03-19 at 12:33 -0800, David S. Miller wrote:
> flush_dcache_page() implementations intentionally don't operate on
> anonymous pages because the kernel should only ever access anonymous
> pages via:
Yes .. I can certainly live with that one.
> 1) copy_user_highpage()/clear_user_highpage(), where you can
> handle the D-cache issues there.
>
> 2) device DMA, where the L2 cache eviction done by the device DMA
> will pull out the D-cache entries, or if the block I/O is via PIO
> it will do cache flushing in there, as per some asm-*/ide.h
> implementations of __ide_ins*()/__ide_outs*()
Actually, this is my specific worry. On parisc, the coherence index
associated with the DMA is that of the kernel mapping, not the user
mappings, so DMA only flushes the kernel cache for the line, not the
user one. If nothing happens to purge the user line, DMA to anon pages
won't work on parisc either.
> 3) ptrace poking
>
> You've found a case that doesn't go through any special interfaces,
> and thus is not handled properly at the current time.
I do think we need to begin considering it. The user space device
managers, like hal, are starting to use SG_IO to interrogate devices,
making it much more prevalent than before. It's got to be only a matter
of time before this issue bites something like hal.
Is the principle that get_user_pages() should return coherent pages OK?
If so, I think the fix is to have get_user_pages do a specific user page
flush for anonymous pages.
James
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 20:53 ` James Bottomley
@ 2006-03-19 21:00 ` David S. Miller
2006-03-19 21:04 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: David S. Miller @ 2006-03-19 21:00 UTC (permalink / raw)
To: James.Bottomley; +Cc: linux-arch
From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Sun, 19 Mar 2006 14:53:08 -0600
> Actually, this is my specific worry. On parisc, the coherence index
> associated with the DMA is that of the kernel mapping, not the user
> mappings, so DMA only flushes the kernel cache for the line, not the
> user one. If nothing happens to purge the user line, DMA to anon pages
> won't work on parisc either.
You have physically indexed caches, thus cache coherency transactions
should purge all cache lines with that same physical TAG.
If not, you have much larger problems.
> I do think we need to begin considering it. The user space device
> managers, like hal, are starting to use SG_IO to interrogate devices,
> making it much more prevalent than before. It's got to be only a matter
> of time before this issue bites something like hal.
>
> Is the principle that get_user_pages() should return coherent pages OK?
> If so, I think the fix is to have get_user_pages do a specific user page
> flush for anonymous pages.
Direct I/O is another problematic case.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 21:00 ` David S. Miller
@ 2006-03-19 21:04 ` James Bottomley
2006-03-19 21:23 ` James Bottomley
2006-03-19 21:25 ` David S. Miller
0 siblings, 2 replies; 16+ messages in thread
From: James Bottomley @ 2006-03-19 21:04 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-arch
On Sun, 2006-03-19 at 13:00 -0800, David S. Miller wrote:
> You have physically indexed caches, thus cache coherency transactions
> should purge all cache lines with that same physical TAG.
No, we have virtually indexed but physically tagged caches; this is why
we have the problem (we have to select one of the aliases to flush).
> Direct I/O is another problematic case.
OK, I confess to not ever having looked at that ... does it use
get_user_pages() as well?
James
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 21:04 ` James Bottomley
@ 2006-03-19 21:23 ` James Bottomley
2006-03-19 21:26 ` David S. Miller
2006-03-19 21:25 ` David S. Miller
1 sibling, 1 reply; 16+ messages in thread
From: James Bottomley @ 2006-03-19 21:23 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-arch
On Sun, 2006-03-19 at 15:04 -0600, James Bottomley wrote:
> OK, I confess to not ever having looked at that ... does it use
> get_user_pages() as well?
If the fix is to be in get_user_pages(), this is what works for parisc.
James
Index: mm/memory.c
===================================================================
RCS file: /var/cvs/linux-2.6/mm/memory.c,v
retrieving revision 1.46
diff -u -p -r1.46 memory.c
--- mm/memory.c 18 Feb 2006 05:40:32 -0000 1.46
+++ mm/memory.c 19 Mar 2006 21:23:19 -0000
@@ -1073,6 +1073,8 @@ int get_user_pages(struct task_struct *t
}
if (pages) {
pages[i] = page;
+ if (PageAnon(page))
+ flush_cache_page(vma, start, page_to_pfn(page));
flush_dcache_page(page);
}
if (vmas)
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 21:23 ` James Bottomley
@ 2006-03-19 21:26 ` David S. Miller
2006-03-19 23:50 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: David S. Miller @ 2006-03-19 21:26 UTC (permalink / raw)
To: James.Bottomley; +Cc: linux-arch
From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Sun, 19 Mar 2006 15:23:51 -0600
> On Sun, 2006-03-19 at 15:04 -0600, James Bottomley wrote:
> > OK, I confess to not ever having looked at that ... does it use
> > get_user_pages() as well?
>
> If the fix is to be in get_user_pages(), this is what works for parisc.
flush_cache_page() is a nop on many architectures which have
cache aliasing issues, such as sparc64.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 21:26 ` David S. Miller
@ 2006-03-19 23:50 ` James Bottomley
2006-03-20 1:15 ` James Bottomley
0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2006-03-19 23:50 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-arch
On Sun, 2006-03-19 at 13:26 -0800, David S. Miller wrote:
> flush_cache_page() is a nop on many architectures which have
> cache aliasing issues, such as sparc64.
Oh ... hmm .. I was just going by the comment in cachetlb.txt which says
After running, there will be no entries in the cache for
'vma->vm_mm' for virtual address 'addr' which translates
to 'pfn'.
which seemed it should do what I want.
It looks like there's no other API I can use for this ... do you want me
to construct another? something like flush_anon_page()?
James
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 23:50 ` James Bottomley
@ 2006-03-20 1:15 ` James Bottomley
0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2006-03-20 1:15 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-arch
On Sun, 2006-03-19 at 17:50 -0600, James Bottomley wrote:
> It looks like there's no other API I can use for this ... do you want me
> to construct another? something like flush_anon_page()?
This is my proposal for such an API, with a default nop implementation
including the parisc implementation.
James
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index 4ae4188..2627c73 100644
Index: tmp-2.6/Documentation/cachetlb.txt
===================================================================
--- tmp-2.6.orig/Documentation/cachetlb.txt 2006-03-19 19:09:30.000000000 -0600
+++ tmp-2.6/Documentation/cachetlb.txt 2006-03-19 19:09:38.000000000 -0600
@@ -362,6 +362,12 @@
likely that you will need to flush the instruction cache
for copy_to_user_page().
+ void flush_anon_page(struct page *page, unsigned long vmaddr)
+ when the kernel needs to access the contents of an anonymous
+ page, it calls this function (currently only
+ get_user_pages()). The default implementation is a nop (and
+ should remain so for all coherent architectures).
+
void flush_icache_range(unsigned long start, unsigned long end)
When the kernel stores into addresses that it will execute
out of (eg when loading modules), this function is called.
Index: tmp-2.6/include/asm-parisc/cacheflush.h
===================================================================
--- tmp-2.6.orig/include/asm-parisc/cacheflush.h 2006-03-19 19:09:30.000000000 -0600
+++ tmp-2.6/include/asm-parisc/cacheflush.h 2006-03-19 19:13:19.000000000 -0600
@@ -184,6 +184,14 @@
}
+static inline void
+flush_anon_page(struct page *page, unsigned long vmaddr)
+{
+ if (PageAnon(page))
+ flush_user_dcache_page(vmaddr);
+}
+#define ARCH_HAS_FLUSH_ANON_PAGE
+
#ifdef CONFIG_DEBUG_RODATA
void mark_rodata_ro(void);
#endif
Index: tmp-2.6/include/linux/highmem.h
===================================================================
--- tmp-2.6.orig/include/linux/highmem.h 2006-03-19 19:09:30.000000000 -0600
+++ tmp-2.6/include/linux/highmem.h 2006-03-19 19:09:38.000000000 -0600
@@ -7,6 +7,12 @@
#include <asm/cacheflush.h>
+#ifndef ARCH_HAS_FLUSH_ANON_PAGE
+static inline void flush_anon_page(struct page *page, unsigned long vmaddr)
+{
+}
+#endif
+
#ifdef CONFIG_HIGHMEM
#include <asm/highmem.h>
Index: tmp-2.6/mm/memory.c
===================================================================
--- tmp-2.6.orig/mm/memory.c 2006-03-19 19:09:30.000000000 -0600
+++ tmp-2.6/mm/memory.c 2006-03-19 19:09:38.000000000 -0600
@@ -1074,6 +1074,8 @@
}
if (pages) {
pages[i] = page;
+
+ flush_anon_page(page, start);
flush_dcache_page(page);
}
if (vmas)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 21:04 ` James Bottomley
2006-03-19 21:23 ` James Bottomley
@ 2006-03-19 21:25 ` David S. Miller
2006-03-19 21:29 ` James Bottomley
1 sibling, 1 reply; 16+ messages in thread
From: David S. Miller @ 2006-03-19 21:25 UTC (permalink / raw)
To: James.Bottomley; +Cc: linux-arch
From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Sun, 19 Mar 2006 15:04:46 -0600
> On Sun, 2006-03-19 at 13:00 -0800, David S. Miller wrote:
> > You have physically indexed caches, thus cache coherency transactions
> > should purge all cache lines with that same physical TAG.
>
> No, we have virtually indexed but physically tagged caches; this is why
> we have the problem (we have to select one of the aliases to flush).
This is exactly what Sparc64's caches are too, L1 D-cache is virtually
indexed and physically tagged. But any bus coherency transaction will
update all lines with matching physical tags, otherwise things just
won't work.
> > Direct I/O is another problematic case.
>
> OK, I confess to not ever having looked at that ... does it use
> get_user_pages() as well?
I believe so, Andrew knows...
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 21:25 ` David S. Miller
@ 2006-03-19 21:29 ` James Bottomley
2006-03-19 22:01 ` David S. Miller
2006-03-23 6:20 ` Andrew Morton
0 siblings, 2 replies; 16+ messages in thread
From: James Bottomley @ 2006-03-19 21:29 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-arch
On Sun, 2006-03-19 at 13:25 -0800, David S. Miller wrote:
> This is exactly what Sparc64's caches are too, L1 D-cache is virtually
> indexed and physically tagged. But any bus coherency transaction will
> update all lines with matching physical tags, otherwise things just
> won't work.
I'm afraid ours don't. The reason is that parisc caches are huge
(around a megabyte), so you have to specify a coherence index (which is
effectively the virtual index) to the IOMMU on DMA transactions and
that's the only line it flushes (as long as the physical tags match).
> > OK, I confess to not ever having looked at that ... does it use
> > get_user_pages() as well?
>
> I believe so, Andrew knows...
OK I'll wait to see what he says.
James
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 21:29 ` James Bottomley
@ 2006-03-19 22:01 ` David S. Miller
2006-03-19 23:39 ` James Bottomley
2006-03-23 6:20 ` Andrew Morton
1 sibling, 1 reply; 16+ messages in thread
From: David S. Miller @ 2006-03-19 22:01 UTC (permalink / raw)
To: James.Bottomley; +Cc: linux-arch
From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Sun, 19 Mar 2006 15:29:43 -0600
> On Sun, 2006-03-19 at 13:25 -0800, David S. Miller wrote:
> > This is exactly what Sparc64's caches are too, L1 D-cache is virtually
> > indexed and physically tagged. But any bus coherency transaction will
> > update all lines with matching physical tags, otherwise things just
> > won't work.
>
> I'm afraid ours don't. The reason is that parisc caches are huge
> (around a megabyte), so you have to specify a coherence index (which is
> effectively the virtual index) to the IOMMU on DMA transactions and
> that's the only line it flushes (as long as the physical tags match).
I don't dispute how the parisc cache behaves, you know that better
than me, but I do want to know what in the world does the size of the
cache have to do with deciding whether to design the hardware to flush
all the matching lines or not?
It's a CAM lookup on bus snoop. That can search all physical tags in
one CAM cycle, which will thus set the invalid bit on all matching
cache lines regardless of virtual index.
How is a virtual address getting into this equation at all? The IOMMU
sits on some other bus agent such as the PCI controller, right? So
coherency transactions, even if translated by the IOMMU from a PCI bus
virtual address to a physical main memory address, looks like a
physical address cache transation to the cpu caches on a bus snoop.
What am I missing?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 22:01 ` David S. Miller
@ 2006-03-19 23:39 ` James Bottomley
2006-03-19 23:42 ` David S. Miller
0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2006-03-19 23:39 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-arch
On Sun, 2006-03-19 at 14:01 -0800, David S. Miller wrote:
> I don't dispute how the parisc cache behaves, you know that better
> than me, but I do want to know what in the world does the size of the
> cache have to do with deciding whether to design the hardware to flush
> all the matching lines or not?
> It's a CAM lookup on bus snoop. That can search all physical tags in
> one CAM cycle, which will thus set the invalid bit on all matching
> cache lines regardless of virtual index.
Well ... CAM designs are really only appropriate to highly associative
cache designs, which the parisc cache isn't. I just mentioned the size
because the cache size divided by the page size and divided by the
associativity gives a significantly large number in the PA cache, which
is what makes implementing a CAM design for it hard (or actually not
realistic).
> How is a virtual address getting into this equation at all? The IOMMU
> sits on some other bus agent such as the PCI controller, right? So
> coherency transactions, even if translated by the IOMMU from a PCI bus
> virtual address to a physical main memory address, looks like a
> physical address cache transation to the cpu caches on a bus snoop.
as part of the IOMMU set up and tear down in dma_map/unmap_sg require
this thing called a coherence index for each page; we simply work it out
from the kernel virtual address. This is basically just a disguised
virtual index of the cache, but it's required for each iotlb entry.
When the IOMMU participates in the coherence protocol, it places the
coherence index as well as the physical address on the bus so the CPUs
know which lines to flush.
> What am I missing?
Nothing; it's always irked me that we need all these extra coherence
protocols for the PA VIPT cache because it's so large with such low
associativity. But it also makes us a good test case for the linux
API ...
James
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 23:39 ` James Bottomley
@ 2006-03-19 23:42 ` David S. Miller
0 siblings, 0 replies; 16+ messages in thread
From: David S. Miller @ 2006-03-19 23:42 UTC (permalink / raw)
To: James.Bottomley; +Cc: linux-arch
From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Sun, 19 Mar 2006 17:39:36 -0600
> > What am I missing?
>
> Nothing; it's always irked me that we need all these extra coherence
> protocols for the PA VIPT cache because it's so large with such low
> associativity. But it also makes us a good test case for the linux
> API ...
Ok, thanks for the explanation. It is horrible though :)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-19 21:29 ` James Bottomley
2006-03-19 22:01 ` David S. Miller
@ 2006-03-23 6:20 ` Andrew Morton
2006-03-23 14:13 ` James Bottomley
1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2006-03-23 6:20 UTC (permalink / raw)
To: James Bottomley; +Cc: davem, linux-arch
James Bottomley <James.Bottomley@SteelEye.com> wrote:
>
> > > OK, I confess to not ever having looked at that ... does it use
> > > get_user_pages() as well?
> >
> > I believe so, Andrew knows...
>
> OK I'll wait to see what he says.
<he'd made some procmailrc changes and had thought that linux-arch was
rather quiet lately>
Yes, direct-io does get_user_pages() and then uses submit_bio() to
read/write the pages.
So it should fall under
2) device DMA, where the L2 cache eviction done by the device DMA
will pull out the D-cache entries, or if the block I/O is via PIO
it will do cache flushing in there, as per some asm-*/ide.h
implementations of __ide_ins*()/__ide_outs*()
umm, except in the case where direct-io hit a file hole. In that case we
memset the user's page and use flush_dcache_page(). And that's most likely
an anonymous page.
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: Potential problem with the page_mapping macro and flush_dcache_page()
2006-03-23 6:20 ` Andrew Morton
@ 2006-03-23 14:13 ` James Bottomley
0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2006-03-23 14:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: davem, linux-arch
On Wed, 2006-03-22 at 22:20 -0800, Andrew Morton wrote:
> James Bottomley <James.Bottomley@SteelEye.com> wrote:
> umm, except in the case where direct-io hit a file hole. In that case we
> memset the user's page and use flush_dcache_page(). And that's most likely
> an anonymous page.
Actually, in this one case, you need to use the
flush_kernel_dcache_page() API rather than the flush_anon_page(), since
it sounds like what you've modified is the kernel view of the page, not
the user's view.
James
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2006-03-23 14:14 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-19 17:15 Potential problem with the page_mapping macro and flush_dcache_page() James Bottomley
2006-03-19 20:33 ` David S. Miller
2006-03-19 20:53 ` James Bottomley
2006-03-19 21:00 ` David S. Miller
2006-03-19 21:04 ` James Bottomley
2006-03-19 21:23 ` James Bottomley
2006-03-19 21:26 ` David S. Miller
2006-03-19 23:50 ` James Bottomley
2006-03-20 1:15 ` James Bottomley
2006-03-19 21:25 ` David S. Miller
2006-03-19 21:29 ` James Bottomley
2006-03-19 22:01 ` David S. Miller
2006-03-19 23:39 ` James Bottomley
2006-03-19 23:42 ` David S. Miller
2006-03-23 6:20 ` Andrew Morton
2006-03-23 14:13 ` James Bottomley
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox