[PATCH] flush_cache_page while pte valid

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] flush_cache_page while pte valid
@ 2002-11-11 18:25 Hugh Dickins
  2002-11-11 18:35 ` Andrew Morton
  2002-11-11 23:19 ` David S. Miller
  0 siblings, 2 replies; 16+ messages in thread
From: Hugh Dickins @ 2002-11-11 18:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave McCracken, Rik van Riel, David S. Miller, linux-kernel

On some architectures (cachetlb.txt gives HyperSparc as an example)
it is essential to flush_cache_page while pte is still valid: the
rmap VM diverged from the base 2.4 VM before that fix was made,
so this error has crept back into 2.5.

Patch below applies to 2.5.47 or 2.5.47-mm1 - needs more work over
shared pagetables, but they've silently fallen out of 2.5.47-mm1:
oversight?  I'll send Alan the equivalent for 2.4-ac shortly.

(I wonder, what happens if userspace now modifies the page
after the flush_cache_page, before the pte is invalidated?)

--- 2.5.47/mm/rmap.c	Mon Oct  7 20:37:50 2002
+++ linux/mm/rmap.c	Mon Nov 11 17:01:27 2002
@@ -393,9 +393,9 @@
 	}

 	/* Nuke the page table entry. */
+	flush_cache_page(vma, address);
 	pte = ptep_get_and_clear(ptep);
 	flush_tlb_page(vma, address);
-	flush_cache_page(vma, address);

 	/* Store the swap location in the pte. See handle_pte_fault() ... */
 	if (PageSwapCache(page)) {

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-11 18:25 Hugh Dickins
@ 2002-11-11 18:35 ` Andrew Morton
  2002-11-11 23:19 ` David S. Miller
  1 sibling, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2002-11-11 18:35 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave McCracken, Rik van Riel, David S. Miller, linux-kernel

Hugh Dickins wrote:
> 
> On some architectures (cachetlb.txt gives HyperSparc as an example)
> it is essential to flush_cache_page while pte is still valid: the
> rmap VM diverged from the base 2.4 VM before that fix was made,
> so this error has crept back into 2.5.

OK, thanks.

> Patch below applies to 2.5.47 or 2.5.47-mm1 - needs more work over
> shared pagetables, but they've silently fallen out of 2.5.47-mm1:
> oversight?

Yes, oversight.  dcache-rcu was similarly lost.  Pathetic, isn't it?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
@ 2002-11-11 21:20 Manfred Spraul
  2002-11-12  0:37 ` Russell King
  0 siblings, 1 reply; 16+ messages in thread
From: Manfred Spraul @ 2002-11-11 21:20 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton; +Cc: linux-kernel

> 	/* Nuke the page table entry. */
>+	flush_cache_page(vma, address);
> 	pte = ptep_get_and_clear(ptep);
> 	flush_tlb_page(vma, address);
>-	flush_cache_page(vma, address);
> 

Is it correct that this are 3 arch hooks that must appear back to back?
What about one hook with all parameters?

	pte = ptep_get_and_clear_and_flush(ptep, vma, address);

The current implementation just asks for such errors.

Documentation:
ptep_get_and_clear_and_flush() removes the page table entry *ptep from the page tables and clears all caches (tlb, cpu cache if virtually tagged).
The return value contains the final value of the pte. The critical information is the dirty bit in the pte - on SMP one cpu can call ptep_get_and_clear_and_flush() while another cpu accesses the mmap'ing from user space.

--
	Manfred

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-11 18:25 Hugh Dickins
  2002-11-11 18:35 ` Andrew Morton
@ 2002-11-11 23:19 ` David S. Miller
  2002-11-12  6:53   ` Hugh Dickins
  1 sibling, 1 reply; 16+ messages in thread
From: David S. Miller @ 2002-11-11 23:19 UTC (permalink / raw)
  To: hugh; +Cc: akpm, dmccr, riel, linux-kernel

   From: Hugh Dickins <hugh@veritas.com>
   Date: Mon, 11 Nov 2002 18:25:25 +0000 (GMT)

   On some architectures (cachetlb.txt gives HyperSparc as an example)
   it is essential to flush_cache_page while pte is still valid: the
   rmap VM diverged from the base 2.4 VM before that fix was made,
   so this error has crept back into 2.5.
...   
   (I wonder, what happens if userspace now modifies the page
   after the flush_cache_page, before the pte is invalidated?)

Thanks for catching this.

On architectures that are affected (such as the mentioned HyperSPARC
chips), the cpu will take a trap and OOPS the kernel if the PTE is
invalidated before the cache flush is made.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-11 21:20 [PATCH] flush_cache_page while pte valid Manfred Spraul
@ 2002-11-12  0:37 ` Russell King
  0 siblings, 0 replies; 16+ messages in thread
From: Russell King @ 2002-11-12  0:37 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Hugh Dickins, Andrew Morton, linux-kernel

On Mon, Nov 11, 2002 at 10:20:30PM +0100, Manfred Spraul wrote:
> > 	/* Nuke the page table entry. */
> >+	flush_cache_page(vma, address);
> > 	pte = ptep_get_and_clear(ptep);
> > 	flush_tlb_page(vma, address);
> >-	flush_cache_page(vma, address);
> > 
> 
> Is it correct that this are 3 arch hooks that must appear back to back?
> What about one hook with all parameters?
> 
> 	pte = ptep_get_and_clear_and_flush(ptep, vma, address);
> 
> The current implementation just asks for such errors.

Its actually a very simple rule.  The sequence must be:

- flush cache for area
- change pte entries in area
- flush tlb for area

Anything else is just buggy, and may very well be racy.  Think about the
race when you flush the tlb entry before changing the pte.

Rather than creating a new interface that's only useful for 10% of the
cases, I'd prefer to keep the rule personally.  The smaller the number
of functions each with their own particular set of behaviours doing
almost the same job, the less chance of getting the wrong function.
And, IMHO, the easier it is to audit the code.

grep -4 ptep_get_and_clear mm/*.c

vs

"Is this the right function here?"

PS, I see one place where "ptep_get_and_clear_and_flush" would be useful
out of 6 uses.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
@ 2002-11-12  1:08 Ulrich Weigand
  0 siblings, 0 replies; 16+ messages in thread
From: Ulrich Weigand @ 2002-11-12  1:08 UTC (permalink / raw)
  To: manfred; +Cc: linux-kernel, uweigand, schwidefsky

Manfred Spraul wrote:

>Is it correct that this are 3 arch hooks that must appear back to back?
>What about one hook with all parameters?
>
>        pte = ptep_get_and_clear_and_flush(ptep, vma, address);

This interface would help us on s390 very much.  We have a hardware
instruction (INVALIDATE PAGE TABLE ENTRY) that implements the combination 
of 
        pte = ptep_get_and_clear(ptep);
        flush_tlb_page(vma, address);

very efficiently, but we cannot use it to implement flush_tlb_page
alone, because it requires the pte to be valid.

This is why our current in-tree flush_tlb_page is quite inefficient
(it always flushes the complete TLB) ...

If we had a combined hook, we could use our IPTE instruction.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  weigand@informatik.uni-erlangen.de

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-11 23:19 ` David S. Miller
@ 2002-11-12  6:53   ` Hugh Dickins
  2002-11-12  6:53     ` David S. Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Hugh Dickins @ 2002-11-12  6:53 UTC (permalink / raw)
  To: David S. Miller; +Cc: akpm, dmccr, riel, linux-kernel

On Mon, 11 Nov 2002, David S. Miller wrote:
>    From: Hugh Dickins <hugh@veritas.com>
>    Date: Mon, 11 Nov 2002 18:25:25 +0000 (GMT)
> 
>    On some architectures (cachetlb.txt gives HyperSparc as an example)
>    it is essential to flush_cache_page while pte is still valid: the
>    rmap VM diverged from the base 2.4 VM before that fix was made,
>    so this error has crept back into 2.5.
> ...   
>    (I wonder, what happens if userspace now modifies the page
>    after the flush_cache_page, before the pte is invalidated?)
> 
> Thanks for catching this.
> 
> On architectures that are affected (such as the mentioned HyperSPARC
> chips), the cpu will take a trap and OOPS the kernel if the PTE is
> invalidated before the cache flush is made.

Thanks for shedding light on that; but I'm still wondering if there
might be data loss if userspace modifies the page in the tiny window
between correctly positioned flush_cache_page and pte invalidation?

Hugh


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-12  6:53   ` Hugh Dickins
@ 2002-11-12  6:53     ` David S. Miller
  2002-11-12 14:49       ` Rik van Riel
  2002-11-12 17:43       ` Hugh Dickins
  0 siblings, 2 replies; 16+ messages in thread
From: David S. Miller @ 2002-11-12  6:53 UTC (permalink / raw)
  To: hugh; +Cc: akpm, dmccr, riel, linux-kernel

   From: Hugh Dickins <hugh@veritas.com>
   Date: Tue, 12 Nov 2002 06:53:04 +0000 (GMT)

   Thanks for shedding light on that; but I'm still wondering if there
   might be data loss if userspace modifies the page in the tiny window
   between correctly positioned flush_cache_page and pte invalidation?

The flush merely writes back the data, a copy-back operation, fully L2
cache coherent.  All cpus will see correct data if an intermittant
store occurs.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-12  6:53     ` David S. Miller
@ 2002-11-12 14:49       ` Rik van Riel
  2002-11-12 21:45         ` David S. Miller
  2002-11-12 17:43       ` Hugh Dickins
  1 sibling, 1 reply; 16+ messages in thread
From: Rik van Riel @ 2002-11-12 14:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: hugh, akpm, dmccr, linux-kernel

On Mon, 11 Nov 2002, David S. Miller wrote:
>    From: Hugh Dickins <hugh@veritas.com>
>    Date: Tue, 12 Nov 2002 06:53:04 +0000 (GMT)
>
>    Thanks for shedding light on that; but I'm still wondering if there
>    might be data loss if userspace modifies the page in the tiny window
>    between correctly positioned flush_cache_page and pte invalidation?
>
> The flush merely writes back the data, a copy-back operation, fully L2
> cache coherent.  All cpus will see correct data if an intermittant
> store occurs.

The CPUs will, but the harddisk might not.

We really need to get this right in the swapout path.

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Current spamtrap:  <a href=mailto:"october@surriel.com">october@surriel.com</a>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-12  6:53     ` David S. Miller
  2002-11-12 14:49       ` Rik van Riel
@ 2002-11-12 17:43       ` Hugh Dickins
  2002-11-12 21:51         ` David S. Miller
  1 sibling, 1 reply; 16+ messages in thread
From: Hugh Dickins @ 2002-11-12 17:43 UTC (permalink / raw)
  To: David S. Miller; +Cc: akpm, dmccr, riel, linux-kernel

On Mon, 11 Nov 2002, David S. Miller wrote:
>    From: Hugh Dickins <hugh@veritas.com>
>    Date: Tue, 12 Nov 2002 06:53:04 +0000 (GMT)
>    
>    Thanks for shedding light on that; but I'm still wondering if there
>    might be data loss if userspace modifies the page in the tiny window
>    between correctly positioned flush_cache_page and pte invalidation?
> 
> The flush merely writes back the data, a copy-back operation, fully L2
> cache coherent.  All cpus will see correct data if an intermittant
> store occurs.

Sorry, I still don't get it.  If the flush_cache_page is doing something
necessary, then won't a user access in between it and invalidating pte
undo what was necessary?  And if it's not necessary, why do we do it?
(For better performance would be a very good reason.)

But don't worry about me: I may well have some fundamental misconception
which emailing back and forth will fail to resolve, would just waste your
time and expose my ignorance.  (I think Andrew has sometimes protested
that "flush" can mean too many different things.)  So I'm trying to
understand it better from over here - but glad to see Rik seems
to understand my concern.

Hugh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
@ 2002-11-12 20:53 Manfred Spraul
  2002-11-12 22:01 ` David S. Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Manfred Spraul @ 2002-11-12 20:53 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, davem

>
>
>>
>> The flush merely writes back the data, a copy-back operation, fully L2
>> cache coherent.  All cpus will see correct data if an intermittant
>> store occurs.
>
>The CPUs will, but the harddisk might not.
>  
>
The lost dirty bit can only happen on cpus where the hardware maintains 
a dirty bit. I doubt that the sparc cpus do that in hardware.

But like Hugh I don't understand how the cache writeback works on SMP.

cpu1:                cpu 2
                        access a mmaping, pte loaded into TLB
try_to_unmap_one()
flush_cache_page();
                        access the mmaping again. pte either still from
                        tlb, or reloaded from pte.
ptep_get_and_clear();
                        access the mmaping again, using the tlb
flush_tlb()
                        ??? How will the cpu write back now?

If the write back happens based on the tlb, then I don't understand why 
it's needed at all.

Regarding the dirty bit:
The assumption for the dirty bit is that the i386 cpu are the only cpus 
in the world (TM) that maintain the dirty bit in hardware, and tests on 
several i386 cpus have shown that the tlb walker retests the present bit 
before setting the dirty bit. Software tlb implementations must emulate 
that.

Thus it's guaranteed that
- if the dirty bit is not set in the result of ptep_get_and_clear, then 
no write operation has happened or will happen.
- if the dirty bit is set, then write operations could happen until the 
tlb flush.
- there will be no spuriously set dirty bits in the page tables.

--
    Manfred


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
@ 2002-11-12 21:32 Ulrich Weigand
  0 siblings, 0 replies; 16+ messages in thread
From: Ulrich Weigand @ 2002-11-12 21:32 UTC (permalink / raw)
  To: manfred; +Cc: linux-kernel, uweigand

Manfred Spraul wrote:

>The assumption for the dirty bit is that the i386 cpu are the only cpus 
>in the world (TM) that maintain the dirty bit in hardware

Not true; the s390 also has a dirty bit in hardware ;-)

To make things more interesting, the s390 dirty bit does not
reside in the pte, but in a storage key associated with the
*physical* page.  The bit is set on any access to the physical
page, whether through any virtual mapping, by direct access
with dynamic address translation switched off, or by an I/O
device moving data directly to main memory.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  weigand@informatik.uni-erlangen.de

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-12 14:49       ` Rik van Riel
@ 2002-11-12 21:45         ` David S. Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David S. Miller @ 2002-11-12 21:45 UTC (permalink / raw)
  To: riel; +Cc: hugh, akpm, dmccr, linux-kernel

   From: Rik van Riel <riel@conectiva.com.br>
   Date: Tue, 12 Nov 2002 12:49:50 -0200 (BRST)

   On Mon, 11 Nov 2002, David S. Miller wrote:
   > The flush merely writes back the data, a copy-back operation, fully L2
   > cache coherent.  All cpus will see correct data if an intermittant
   > store occurs.
   
   The CPUs will, but the harddisk might not.
   
   We really need to get this right in the swapout path.

We do get it right, watch:

1) remove last mapping instance of page

   -> MM level cache flush pushes it permanently to ram
      in full view of DMA activity

2) remove last mapping, but new one appears as we swap out

   -> no problem we'll see one of several instances of the
      page and we'll need to reswap it out later anyways
      if someone currently writes to it

I know this because I tested this extensively ages ago on sparc32
where it really mattered.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-12 17:43       ` Hugh Dickins
@ 2002-11-12 21:51         ` David S. Miller
  2002-11-12 22:59           ` Hugh Dickins
  0 siblings, 1 reply; 16+ messages in thread
From: David S. Miller @ 2002-11-12 21:51 UTC (permalink / raw)
  To: hugh; +Cc: akpm, dmccr, riel, linux-kernel

   From: Hugh Dickins <hugh@veritas.com>
   Date: Tue, 12 Nov 2002 17:43:40 +0000 (GMT)

   Sorry, I still don't get it.  If the flush_cache_page is doing something
   necessary, then won't a user access in between it and invalidating pte
   undo what was necessary?  And if it's not necessary, why do we do it?
   (For better performance would be a very good reason.)

If there are other writable mappings of the page, we can't swap
it out legally.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-12 20:53 Manfred Spraul
@ 2002-11-12 22:01 ` David S. Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David S. Miller @ 2002-11-12 22:01 UTC (permalink / raw)
  To: manfred; +Cc: riel, linux-kernel

   From: Manfred Spraul <manfred@colorfullife.com>
   Date: Tue, 12 Nov 2002 21:53:14 +0100

   The lost dirty bit can only happen on cpus where the hardware maintains 
   a dirty bit. I doubt that the sparc cpus do that in hardware.
   
sun4m sparc32 chips do, exactly like x86

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] flush_cache_page while pte valid
  2002-11-12 21:51         ` David S. Miller
@ 2002-11-12 22:59           ` Hugh Dickins
  0 siblings, 0 replies; 16+ messages in thread
From: Hugh Dickins @ 2002-11-12 22:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: akpm, dmccr, riel, linux-kernel

On Tue, 12 Nov 2002, David S. Miller wrote:

>    From: Hugh Dickins <hugh@veritas.com>
>    Date: Tue, 12 Nov 2002 17:43:40 +0000 (GMT)
>    
>    Sorry, I still don't get it.  If the flush_cache_page is doing something
>    necessary, then won't a user access in between it and invalidating pte
>    undo what was necessary?  And if it's not necessary, why do we do it?
>    (For better performance would be a very good reason.)
>    
> If there are other writable mappings of the page, we can't swap
> it out legally.

But I'm worried about the case where this is the last writable mapping:
it seems userspace (on another CPU) can still write to it in between
the flush_cache_page and invalidation of the pte (on this CPU hoping
to swap out that page).

Hugh


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2002-11-12 22:52 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-11-11 21:20 [PATCH] flush_cache_page while pte valid Manfred Spraul
2002-11-12  0:37 ` Russell King
  -- strict thread matches above, loose matches on Subject: below --
2002-11-12 21:32 Ulrich Weigand
2002-11-12 20:53 Manfred Spraul
2002-11-12 22:01 ` David S. Miller
2002-11-12  1:08 Ulrich Weigand
2002-11-11 18:25 Hugh Dickins
2002-11-11 18:35 ` Andrew Morton
2002-11-11 23:19 ` David S. Miller
2002-11-12  6:53   ` Hugh Dickins
2002-11-12  6:53     ` David S. Miller
2002-11-12 14:49       ` Rik van Riel
2002-11-12 21:45         ` David S. Miller
2002-11-12 17:43       ` Hugh Dickins
2002-11-12 21:51         ` David S. Miller
2002-11-12 22:59           ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox