[RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu

linux-arch.vger.kernel.org archive mirror

* [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
@ 2010-05-07 13:24 Catalin Marinas
  2010-05-10  8:06 ` FUJITA Tomonori
  0 siblings, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2010-05-07 13:24 UTC
  To: linux-arch, linux-kernel
  Cc: James Bottomley, Benjamin Herrenschmidt, David Miller,
	Russell King

Issues with D-cache aliasing and I-D cache coherency (on Harvard
architectures) caused by PIO drivers not flushing the caches have been
discussed on a few occasions:

http://thread.gmane.org/gmane.linux.usb.general/27072
http://thread.gmane.org/gmane.linux.kernel.cross-arch/5136

This patch modifies the cachetlb.txt recommendations for implementing
flush_dcache_page() and deferred cache flushing.

Basically, flush_dcache_page() is not usually called in PIO drivers for
new page cache pages after the data has been written (see the recent
driver fixes in commits db8516f6 and 2d68b7fe). A new PIO API has been
suggested, but this requires fixing too many drivers.

A solution adopted by IA-64 and PowerPC is to always consider newly
allocated page cache pages as dirty. The meaning of PG_arch_1 would
become "D-cache clean" (rather than "D-cache dirty" as on SPARC64). This
bit is checked in set_pte_at() and, if it isn't set, this function
flushes the cache. The advantage of this approach is that
flush_dcache_page() does not have to be called beforehand for a new page
cache page, which most PIO drivers do not do anyway.

It is, however, necessary for set_pte_at() to always check this flag
even if deferred cache flushing is not implemented because of PIO
drivers not calling flush_dcache_page().

There are SMP configurations where the cache maintenance operations are
not automatically broadcast to the other CPUs. One of the solutions is
to add flush_dcache_page() calls to the required PIO drivers and perform
non-deferred cache flushing. Another solution is to implement
"read-for-ownership" tricks in the architecture cache flushing function
to force the eviction of D-cache lines.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
---
 Documentation/cachetlb.txt |   34 +++++++++++++++++++++++-----------
 1 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index 2b5f823..af67140 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -100,6 +100,12 @@ changes occur:
 	translations for software managed TLB configurations.
 	The sparc64 port currently does this.
 
+	NOTE: On SMP systems with hardware TLB this function cannot be
+	      paired with flush_dcache_page() for deferring the cache
+	      flushing because a page table entry written by
+	      set_pte_at() may become visible to other CPUs before the
+	      cache flushing has taken place.
+
 6) void tlb_migrate_finish(struct mm_struct *mm)
 
 	This interface is called at the end of an explicit
@@ -278,7 +284,7 @@ maps this page at its virtual address.
 
   void flush_dcache_page(struct page *page)
 
-	Any time the kernel writes to a page cache page, _OR_
+	Any time the kernel modifies an existing page cache page, _OR_
 	the kernel is about to read from a page cache page and
 	user space shared/writable mappings of this page potentially
 	exist, this routine is called.
@@ -289,20 +295,26 @@ maps this page at its virtual address.
 	      handling vfs symlinks in the page cache need not call
 	      this interface at all.
 
+	      The kernel may not call this function on a newly allocated
+	      page cache page even though it stored data into the page.
+
 	The phrase "kernel writes to a page cache page" means,
 	specifically, that the kernel executes store instructions
 	that dirty data in that page at the page->virtual mapping
 	of that page.  It is important to flush here to handle
 	D-cache aliasing, to make sure these kernel stores are
-	visible to user space mappings of that page.
+	visible to user space mappings of that page. It is also
+	important to flush the cache on Harvard architectures where the
+	I and D caches are not coherent.
 
 	The corollary case is just as important, if there are users
 	which have shared+writable mappings of this file, we must make
 	sure that kernel reads of these pages will see the most recent
 	stores done by the user.
 
-	If D-cache aliasing is not an issue, this routine may
-	simply be defined as a nop on that architecture.
+	If D-cache aliasing is not an issue and the I and D caches are
+	unified, this routine may simply be defined as a nop on that
+	architecture.
 
         There is a bit set aside in page->flags (PG_arch_1) as
 	"architecture private".  The kernel guarantees that,
@@ -312,15 +324,15 @@ maps this page at its virtual address.
 	This allows these interfaces to be implemented much more
 	efficiently.  It allows one to "defer" (perhaps indefinitely)
 	the actual flush if there are currently no user processes
-	mapping this page.  See sparc64's flush_dcache_page and
-	update_mmu_cache implementations for an example of how to go
+	mapping this page.  See IA-64's flush_dcache_page and
+	set_pte_at implementations for an example of how to go
 	about doing this.
 
 	The idea is, first at flush_dcache_page() time, if
 	page->mapping->i_mmap is an empty tree and ->i_mmap_nonlinear
-	an empty list, just mark the architecture private page flag bit.
-	Later, in update_mmu_cache(), a check is made of this flag bit,
-	and if set the flush is done and the flag bit is cleared.
+	an empty list, just clear the architecture private page flag bit.
+	Later, in set_pte_at(), a check is made of this flag bit,
+	and if cleared, the flush is done and the flag bit is set.
 
 	IMPORTANT NOTE: It is often important, if you defer the flush,
 			that the actual flush occurs on the same CPU
@@ -375,8 +387,8 @@ maps this page at its virtual address.
 
   void flush_icache_page(struct vm_area_struct *vma, struct page *page)
 	All the functionality of flush_icache_page can be implemented in
-	flush_dcache_page and update_mmu_cache. In 2.7 the hope is to
-	remove this interface completely.
+	flush_dcache_page and set_pte_at. In 2.7 the hope is to remove
+	this interface completely.
 
 The final category of APIs is for I/O to deliberately aliased address
 ranges inside the kernel.  Such aliases are set up by use of the


* Re: [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
  2010-05-07 13:24 [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache Catalin Marinas
@ 2010-05-10  8:06 ` FUJITA Tomonori
  2010-05-10 10:16   ` Catalin Marinas
  0 siblings, 1 reply; 12+ messages in thread
From: FUJITA Tomonori @ 2010-05-10  8:06 UTC
  To: catalin.marinas
  Cc: linux-arch, linux-kernel, James.Bottomley, benh, davem, rmk

On Fri, 07 May 2010 14:24:18 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

>    void flush_dcache_page(struct page *page)
>  
> -	Any time the kernel writes to a page cache page, _OR_
> +	Any time the kernel modifies an existing page cache page, _OR_
>  	the kernel is about to read from a page cache page and
>  	user space shared/writable mappings of this page potentially
>  	exist, this routine is called.
> @@ -289,20 +295,26 @@ maps this page at its virtual address.
>  	      handling vfs symlinks in the page cache need not call
>  	      this interface at all.
>  
> +	      The kernel may not call this function on a newly allocated
> +	      page cache page even though it stored data into the page.
> +
>  	The phrase "kernel writes to a page cache page" means,
>  	specifically, that the kernel executes store instructions
>  	that dirty data in that page at the page->virtual mapping
>  	of that page.  It is important to flush here to handle
>  	D-cache aliasing, to make sure these kernel stores are
> -	visible to user space mappings of that page.
> +	visible to user space mappings of that page. It is also
> +	important to flush the cache on Harvard architectures where the
> +	I and D caches are not coherent.
>  
>  	The corollary case is just as important, if there are users
>  	which have shared+writable mappings of this file, we must make
>  	sure that kernel reads of these pages will see the most recent
>  	stores done by the user.
>  
> -	If D-cache aliasing is not an issue, this routine may
> -	simply be defined as a nop on that architecture.
> +	If D-cache aliasing is not an issue and the I and D caches are
> +	unified, this routine may simply be defined as a nop on that
> +	architecture.
>  
>          There is a bit set aside in page->flags (PG_arch_1) as
>  	"architecture private".  The kernel guarantees that,
> @@ -312,15 +324,15 @@ maps this page at its virtual address.
>  	This allows these interfaces to be implemented much more
>  	efficiently.  It allows one to "defer" (perhaps indefinitely)
>  	the actual flush if there are currently no user processes
> -	mapping this page.  See sparc64's flush_dcache_page and
> -	update_mmu_cache implementations for an example of how to go
> +	mapping this page.  See IA-64's flush_dcache_page and
> +	set_pte_at implementations for an example of how to go
>  	about doing this.

cachetlb.txt says that flush_dcache_page() is the API to solve the
D-cache aliasing issue. Using IA64 as an example for the API here
looks strange since IA64 (PIPT) doesn't have a D-cache aliasing
issue.

I don't think that just replacing sparc64 with IA64 helps much here
since we still have the problem that the whole cache handling
(architectures, subsystems, file systems) is inconsistent. I think
that we need to agree on it first.


* Re: [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
  2010-05-10  8:06 ` FUJITA Tomonori
@ 2010-05-10 10:16   ` Catalin Marinas
  2010-05-10 10:29     ` Paul Mundt
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Catalin Marinas @ 2010-05-10 10:16 UTC
  To: FUJITA Tomonori
  Cc: linux-arch, linux-kernel, James.Bottomley, benh, davem, rmk

On Mon, 2010-05-10 at 09:06 +0100, FUJITA Tomonori wrote:
> On Fri, 07 May 2010 14:24:18 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > @@ -312,15 +324,15 @@ maps this page at its virtual address.
> >       This allows these interfaces to be implemented much more
> >       efficiently.  It allows one to "defer" (perhaps indefinitely)
> >       the actual flush if there are currently no user processes
> > -     mapping this page.  See sparc64's flush_dcache_page and
> > -     update_mmu_cache implementations for an example of how to go
> > +     mapping this page.  See IA-64's flush_dcache_page and
> > +     set_pte_at implementations for an example of how to go
> >       about doing this.
> 
> cachetlb.txt says that flush_dcache_page() is the API to solve the
> D-cache aliasing issue. Using IA64 as an example for the API here
> looks strange since IA64 (PIPT) doesn't have D-cache aliasing
> issue.

Once we fix the ARM implementation, we could use it as an example :). It
has processors with both D-cache aliasing and separate I/D caches.

> I don't think that just replacing sparc64 with IA64 helps much here
> since we still have the problem that the whole cache handling
> (architectures, subsystems, file systems) is inconsistent. I think
> that we need to agree on it first.

Yes, this needs to be agreed, and hopefully this thread is a starting
point for such a discussion.

The main problem I encountered on ARM was I/D cache coherency on a PIPT
processor. IA-64 and PowerPC fixed it by combining
flush_dcache_page() with set_pte_at().

IMHO, the D-cache aliasing isn't that much different from the I/D cache
coherency. We can view the I-cache as yet another alias of the D-cache
which needs explicit flushing. As I said on a few occasions, including
in this patch, flush_dcache_page() isn't always called from PIO
drivers. Adding a PIO API didn't seem very popular as it requires a lot
of drivers to be modified.

In most situations, just doing flushing in set_pte_at() would suffice
and flush_dcache_page() can be ignored. There are two situations where I
still see flush_dcache_page() useful:

     1. SMP systems where the cache maintenance operations aren't
        automatically broadcast in hardware
     2. The kernel modifies a page cache page that is already mapped in
        user space

(1) can be worked around on some architectures (though not sure about
all of them).

Is (2) a valid scenario?

-- 
Catalin


* Re: [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
  2010-05-10 10:16   ` Catalin Marinas
@ 2010-05-10 10:29     ` Paul Mundt
  2010-05-10 14:40       ` James Bottomley
  2010-05-10 11:55     ` Matthew Wilcox
  2010-05-11 11:31     ` [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache FUJITA Tomonori
  2 siblings, 1 reply; 12+ messages in thread
From: Paul Mundt @ 2010-05-10 10:29 UTC
  To: Catalin Marinas
  Cc: FUJITA Tomonori, linux-arch, linux-kernel, James.Bottomley, benh,
	davem, rmk

On Mon, May 10, 2010 at 11:16:47AM +0100, Catalin Marinas wrote:
> In most situations, just doing flushing in set_pte_at() would suffice
> and flush_dcache_page() can be ignored. There are two situations where I
> still see flush_dcache_page() useful:
> 
>      1. SMP systems where the cache maintenance operations aren't
>         automatically broadcast in hardware
>      2. The kernel modifies a page cache page that is already mapped in
>         user space
> 
> (1) can be worked around on some architectures (though not sure about
> all of them).
> 
> Is (2) a valid scenario?
> 
get_user_pages() ?


* Re: [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
  2010-05-10 10:29     ` Paul Mundt
@ 2010-05-10 14:40       ` James Bottomley
  2010-05-10 14:40         ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2010-05-10 14:40 UTC
  To: Paul Mundt
  Cc: Catalin Marinas, FUJITA Tomonori, linux-arch, linux-kernel, benh,
	davem, rmk

On Mon, 2010-05-10 at 19:29 +0900, Paul Mundt wrote:
> On Mon, May 10, 2010 at 11:16:47AM +0100, Catalin Marinas wrote:
> > In most situations, just doing flushing in set_pte_at() would suffice
> > and flush_dcache_page() can be ignored. There are two situations where I
> > still see flush_dcache_page() useful:
> > 
> >      1. SMP systems where the cache maintenance operations aren't
> >         automatically broadcast in hardware
> >      2. The kernel modifies a page cache page that is already mapped in
> >         user space
> > 
> > (1) can be worked around on some architectures (though not sure about
> > all of them).
> > 
> > Is (2) a valid scenario?
> > 
> get_user_pages() ?

Actually, no, not really.  get_user_pages() is preparing the pages for
DMA (so it pins and possibly flushes them to make the underlying
physical page in the page cache clean).  This means it gathers the
physical pages into a scatter gather list.  It actually assumes the
kernel *won't* be touching them (without using kmap), so it doesn't give
the pages an in-kernel mapping.

James


* Re: [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
  2010-05-10 10:16   ` Catalin Marinas
  2010-05-10 10:29     ` Paul Mundt
@ 2010-05-10 11:55     ` Matthew Wilcox
  2010-05-10 14:00       ` [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache Catalin Marinas
  2010-05-11 11:31     ` [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache FUJITA Tomonori
  2 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2010-05-10 11:55 UTC
  To: Catalin Marinas
  Cc: FUJITA Tomonori, linux-arch, linux-kernel, James.Bottomley, benh,
	davem, rmk

On Mon, May 10, 2010 at 11:16:47AM +0100, Catalin Marinas wrote:
> In most situations, just doing flushing in set_pte_at() would suffice
> and flush_dcache_page() can be ignored. There are two situations where I
> still see flush_dcache_page() useful:
> 
>      1. SMP systems where the cache maintenance operations aren't
>         automatically broadcast in hardware
>      2. The kernel modifies a page cache page that is already mapped in
>         user space
> 
> (1) can be worked around on some architectures (though not sure about
> all of them).
> 
> Is (2) a valid scenario?

The kernel always calls kmap() / kunmap() around accesses to page cache
pages (thanks to x86-32's ability to support 64GB).  There are three
ways I know of that architectures use this:

1) No-ops.  These architectures don't have cache problems.
2) Flush the kernel's mapping in kunmap().  This can have bad consequences
in SMP systems with threaded programs.
3) Select an address in kmap() that will alias to the user's address.
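
As a rough illustration of option 3, the following user-space sketch
picks a kernel window address with the same cache colour as the user
address. It assumes a VIPT cache with a 16 KiB way and 4 KiB pages; all
names and constants are hypothetical and are not kernel APIs.

```c
#include <assert.h>

/* Assumed geometry, for illustration only. */
#define MODEL_PAGE_SHIFT 12UL
#define MODEL_WAY_SIZE   (16UL * 1024UL)

/* The colour is given by the index bits above the page offset. */
static unsigned long model_cache_colour(unsigned long vaddr)
{
	return (vaddr & (MODEL_WAY_SIZE - 1UL)) >> MODEL_PAGE_SHIFT;
}

/*
 * Pick an address inside a kernel mapping window that has the same
 * colour as the user address, so both mappings index the same cache
 * lines. The window base must be aligned to the way size.
 */
static unsigned long model_kmap_coloured(unsigned long window_base,
					 unsigned long user_vaddr)
{
	assert((window_base & (MODEL_WAY_SIZE - 1UL)) == 0);
	return window_base +
	       (model_cache_colour(user_vaddr) << MODEL_PAGE_SHIFT);
}
```

With these assumed values, a user address of 0x40003000 has colour 3,
so a window based at 0xff000000 yields 0xff003000.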

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
  2010-05-10 11:55     ` Matthew Wilcox
@ 2010-05-10 14:00       ` Catalin Marinas
  2010-05-10 14:03         ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2010-05-10 14:00 UTC
  To: Matthew Wilcox
  Cc: FUJITA Tomonori, linux-arch, linux-kernel, James.Bottomley, benh,
	davem, rmk

On Mon, 2010-05-10 at 12:55 +0100, Matthew Wilcox wrote:
> On Mon, May 10, 2010 at 11:16:47AM +0100, Catalin Marinas wrote:
> > In most situations, just doing flushing in set_pte_at() would suffice
> > and flush_dcache_page() can be ignored. There are two situations where I
> > still see flush_dcache_page() useful:
> >
> >      1. SMP systems where the cache maintenance operations aren't
> >         automatically broadcast in hardware
> >      2. The kernel modifies a page cache page that is already mapped in
> >         user space
> >
> > (1) can be worked around on some architectures (though not sure about
> > all of them).
> >
> > Is (2) a valid scenario?
> 
> The kernel always calls kmap() / kunmap() around accesses to page cache
> pages (thanks to x86-32's ability to support 64GB).  There are three
> ways I know of that architectures use this:

I think this was mentioned in some past discussions. There are
situations where page cache pages aren't highmem pages and kmap/kunmap
isn't used:

http://thread.gmane.org/gmane.linux.ide/44847

But yes, that's a possible solution. Ideally it would be agreed with
the other architectures and the recommendations written into
cachetlb.txt. It may, however, require updating some of the drivers in
Linux.

It would also mean that some architectures need to implement the kmap
API even if they don't need it. That's David Miller's comment on
sparc64:

http://article.gmane.org/gmane.linux.ide/44872

> 1) No-ops.  These architectures don't have cache problems.
> 2) Flush the kernel's mapping in kunmap().  This can have bad consequences
> in SMP systems with threaded programs.
> 3) Select an address in kmap() that will alias to the user's address.

The 3rd point above would help with the D-cache aliasing. Does the I/D
cache coherency need to be handled differently? On PIPT Harvard
architectures, we don't actually have D-cache aliasing but we may end up
flushing too much in kunmap() just in case such a page is later mapped
in user space with executable permission.

An alternative, a PIO API which would require fixing individual
drivers, was proposed here:

http://thread.gmane.org/gmane.linux.kernel.cross-arch/5136

-- 
Catalin


* Re: [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
  2010-05-10 14:00       ` [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache Catalin Marinas
@ 2010-05-10 14:03         ` David Miller
  2010-05-10 14:32           ` Catalin Marinas
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2010-05-10 14:03 UTC
  To: catalin.marinas
  Cc: matthew, fujita.tomonori, linux-arch, linux-kernel,
	James.Bottomley, benh, rmk

From: Catalin Marinas <catalin.marinas@arm.com>
Date: Mon, 10 May 2010 15:00:10 +0100

> 3rd point above would help with the D-cache aliasing. Does the I/D cache
> coherency need to be handled differently? On PIPT Harvard architectures,
> we don't actually have D-cache aliasing but we may end up flushing too
> much in kunmap() just in case such page would be mapped in user space
> with executable permission.

You can handle this by having an "I-cache clean" bit in the page.
When you kmap/kunmap, simply force this bit clear.

In update_mmu_cache() or set_pte_at() you'll see when a page gets
into userspace with execute permission, and if the I-cache bit
is clear you can do the flush then and set the "I-cache clean"
bit.
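
A minimal user-space sketch of that scheme, under the assumption that
one page flag is available for it. The struct and function names are
hypothetical stand-ins for the page flag, kmap()/kunmap() and
set_pte_at(); this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page with an "I-cache clean" flag. */
struct model_page {
	bool icache_clean;	/* models the per-page flag bit */
	unsigned int flushes;	/* counts simulated I-cache flushes */
};

/*
 * Models kmap()/kunmap(): the kernel may have written the page, so
 * forget that the I-cache was known to be clean.
 */
static void model_kunmap(struct model_page *page)
{
	assert(page != NULL);
	page->icache_clean = false;
}

/*
 * Models set_pte_at() or update_mmu_cache() installing a user mapping:
 * flush only when the mapping is executable and the flag is clear.
 */
static void model_map_to_user(struct model_page *page, bool executable)
{
	assert(page != NULL);
	if (executable && !page->icache_clean) {
		page->flushes++;		/* stand-in for the real flush */
		page->icache_clean = true;
	}
}
```

In this model a page that is never mapped executable is never flushed,
and repeated executable mappings flush only once until the next
model_kunmap().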


* Re: [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
  2010-05-10 14:03         ` David Miller
@ 2010-05-10 14:32           ` Catalin Marinas
  0 siblings, 0 replies; 12+ messages in thread
From: Catalin Marinas @ 2010-05-10 14:32 UTC
  To: David Miller
  Cc: matthew, fujita.tomonori, linux-arch, linux-kernel,
	James.Bottomley, benh, rmk

> From: Catalin Marinas <catalin.marinas@arm.com>
> Date: Mon, 10 May 2010 15:00:10 +0100
> 
> > 3rd point above would help with the D-cache aliasing. Does the I/D cache
> > coherency need to be handled differently? On PIPT Harvard architectures,
> > we don't actually have D-cache aliasing but we may end up flushing too
> > much in kunmap() just in case such page would be mapped in user space
> > with executable permission.
> 
> You can handle this by having an "I-cache clean" bit in the page.
> When you kmap/kunmap, simply force this bit clear.
> 
> In update_mmu_cache() or set_pte_at() you'll see when a page gets
> into userspace with execute permission, and if the I-cache bit
> is clear you can do the flush then and set the "I-cache clean"
> bit.

When kmap is called on a new page cache page that hasn't been mapped in
user space, this bit is already clear anyway. But would the kernel ever
kmap a page already mapped in user space without calling
flush_dcache_page? Ideally we shouldn't have to implement the kmap API
for architectures with highmem disabled.

What I'm trying to achieve is to get an agreement between architectures
and use the cachetlb.txt document as a central recommendation point
for how cache flushing should be handled (we've had such issues on ARM
for quite some time).

The main problems with the arch/arm/ implementation and how we probably
understood cachetlb.txt:

     1. flush_dcache_page() isn't always called on page cache pages that
        were written to (and which are subsequently mapped in user
        space)
     2. kmap/kunmap isn't always used in PIO drivers (sometimes just
        normal pages)
     3. deferring the cache flushing to update_mmu_cache() doesn't work
        on SMP systems with hardware TLB implementation (a different CPU
        could see the PTE before the cache flushing occurred). The
        set_pte_at() would be a better place for this
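
The ordering problem in point 3 can be sketched with a small user-space
model that only records the order of the two events. All names are
hypothetical and nothing here is kernel code; it assumes another CPU
with a hardware table walker can use the PTE as soon as it is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event log: logical times of the flush and PTE write. */
struct model_state {
	unsigned int clock;
	unsigned int flushed_at;
	unsigned int pte_visible_at;
};

static void model_flush(struct model_state *s)
{
	s->flushed_at = ++s->clock;
}

static void model_write_pte(struct model_state *s)
{
	s->pte_visible_at = ++s->clock;
}

/* Deferred flush done in update_mmu_cache(): the PTE is written first. */
static void model_fault_flush_in_update_mmu_cache(struct model_state *s)
{
	assert(s != NULL);
	model_write_pte(s);
	model_flush(s);
}

/* Deferred flush done in set_pte_at(): flush, then write the PTE. */
static void model_fault_flush_in_set_pte_at(struct model_state *s)
{
	assert(s != NULL);
	model_flush(s);
	model_write_pte(s);
}

/* True if another CPU could have used the PTE before the flush. */
static bool model_pte_visible_before_flush(const struct model_state *s)
{
	return s->pte_visible_at < s->flushed_at;
}
```

Only the first variant leaves a window in which the mapping is usable
while the caches have not yet been flushed.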

Thanks.

-- 
Catalin


* Re: [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
  2010-05-10 10:16   ` Catalin Marinas
  2010-05-10 10:29     ` Paul Mundt
  2010-05-10 11:55     ` Matthew Wilcox
@ 2010-05-11 11:31     ` FUJITA Tomonori
  2010-05-11 17:22       ` Catalin Marinas
  2 siblings, 1 reply; 12+ messages in thread
From: FUJITA Tomonori @ 2010-05-11 11:31 UTC
  To: catalin.marinas
  Cc: fujita.tomonori, linux-arch, linux-kernel, James.Bottomley, benh,
	davem, rmk

On Mon, 10 May 2010 11:16:47 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

> > I don't think that just replacing sparc64 with IA64 helps much here
> > since we still have the problem that the whole cache handling
> > (architectures, subsystems, file systems) is inconsistent. I think
> > that we need to agree on it first.
> 
> Yes, this needs to be agreed, and hopefully this thread is a starting
> point for such a discussion.

Hopefully, but I'm not sure that what we need to agree on is clear enough.

If we invert the meaning of PG_arch_1 (from PG_dcache_dirty to
PG_dcache_clean), the way IA64 and POWERPC use the bit to solve
I/D coherency, we can avoid calling flush_dcache_page() in low level
drivers or their subsystems (ide_* macros, libata,
bio_flush_dcache_pages, rq_flush_dcache_pages, etc). Architectures
that need to handle both D aliasing and I/D coherence need two bits,
one for each (i.e. another PG_arch_2 bit), to do flushes effectively.

Right?


* Re: [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache
  2010-05-11 11:31     ` [RFC PATCH] Update the cachetlb.txt file WRT flush_dcache_page and update_mmu_cache FUJITA Tomonori
@ 2010-05-11 17:22       ` Catalin Marinas
  0 siblings, 0 replies; 12+ messages in thread
From: Catalin Marinas @ 2010-05-11 17:22 UTC
  To: FUJITA Tomonori
  Cc: linux-arch, linux-kernel, James.Bottomley, benh, davem, rmk

On Tue, 2010-05-11 at 12:31 +0100, FUJITA Tomonori wrote:
> On Mon, 10 May 2010 11:16:47 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> > > I don't think that just replacing sparc64 with IA64 helps much here
> > > since we still have the problem that the whole cache handling
> > > (architectures, subsystems, file systems) is inconsistent. I think
> > > that we need to agree on it first.
> >
> > Yes, this needs to be agreed, and hopefully this thread is a starting
> > point for such a discussion.
> 
> Hopefully, but I'm not sure that what we need to agree on is clear enough.
> 
> If we invert the meaning of PG_arch_1 (from PG_dcache_dirty to
> PG_dcache_clean), the way IA64 and POWERPC use the bit to solve
> I/D coherency, we can avoid calling flush_dcache_page() in low level
> drivers or their subsystems (ide_* macros, libata,
> bio_flush_dcache_pages, rq_flush_dcache_pages, etc). Architectures
> that need to handle both D aliasing and I/D coherence need two bits,
> one for each (i.e. another PG_arch_2 bit), to do flushes effectively.

The two bits idea was mentioned in the previous threads on cache
coherency.

So we basically have two main options (IMHO):

1) leave things as they currently are, with PG_arch_1 meaning "dirty",
and change all low level (PIO) drivers to call flush_dcache_page() when
they dirty the D-cache.

2) change the meaning of PG_arch_1 to "clean" and maybe introduce
PG_arch_2 as an optimisation, but don't force the low level drivers to
call flush_dcache_page().

The current cachetlb.txt recommends (1) but not all low-level (PIO)
drivers call flush_dcache_page(), hence I/D cache coherency issues at
least on ARM.

Should we go for (2) as a general recommendation across all
architectures that require I/D cache maintenance? Or stick with (1) and
modify the low level drivers to call flush_dcache_page (or a PIO API
similar to kmap that was already proposed on linux-arch)?
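
To make the difference between (1) and (2) concrete, here is a minimal
user-space sketch of both meanings of the bit when a PIO driver does or
does not call flush_dcache_page(). It is a simplified model, assuming
the flush is always deferred to mapping time; the names are
hypothetical and this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum arch1_meaning { ARCH1_MEANS_DIRTY, ARCH1_MEANS_CLEAN };

/* Hypothetical model of a page cache page. */
struct model_page {
	bool pg_arch_1;			/* models the PG_arch_1 flag */
	bool cache_has_unflushed_data;	/* kernel stores not yet flushed */
};

/* A newly allocated page cache page starts with the flag cleared. */
static struct model_page model_new_page(void)
{
	struct model_page page = { false, false };
	return page;
}

/* Models a PIO driver filling the page through the kernel mapping. */
static void model_pio_write(struct model_page *page,
			    enum arch1_meaning meaning,
			    bool driver_calls_flush)
{
	assert(page != NULL);
	page->cache_has_unflushed_data = true;
	if (driver_calls_flush) {
		/* Deferred flush_dcache_page(): only record the state. */
		page->pg_arch_1 = (meaning == ARCH1_MEANS_DIRTY);
	}
}

/* Models the check made when the page is mapped into user space. */
static void model_map_to_user(struct model_page *page,
			      enum arch1_meaning meaning)
{
	bool needs_flush = (meaning == ARCH1_MEANS_DIRTY) ?
			   page->pg_arch_1 : !page->pg_arch_1;

	if (needs_flush) {
		page->cache_has_unflushed_data = false;	/* simulated flush */
		page->pg_arch_1 = (meaning == ARCH1_MEANS_CLEAN);
	}
}
```

In this model, with the "dirty" meaning a driver that skips
flush_dcache_page() leaves unflushed data behind the user mapping,
while with the "clean" meaning the new page is flushed at mapping time
either way.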

Thanks.

-- 
Catalin

