linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Unsafe pte_update() in do_page_fault() (4xx and Book-E)
@ 2006-03-02 20:26 Eugene Surovegin
  2006-03-03  3:43 ` Benjamin Herrenschmidt
  2006-03-28  7:55 ` [PATCH] lock PTE before updating it in 440/BookE page fault handler Eugene Surovegin
  0 siblings, 2 replies; 6+ messages in thread
From: Eugene Surovegin @ 2006-03-02 20:26 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Kumar Gala

Hi!

For the last couple of days I was debugging rare 

  swap_dup: Bad swap file entry 0x00000080

errors in my custom 2.4 kernel running on 405GPr system.

My current theory is that this error is caused by the special lazy 
dcache/icache flush handling on 4xx and BookE. Because this code in my 
2.4 was actually a backport from 2.6, I think we have a problem in 
current 2.6 as well.

Here is what I think happens. On 4xx/BookE we use execute bit to 
deffer dcache to icache flush, in do_page_fault() we flush page when 
execute trap triggers and enable _PAGE_HWEXEC bit in PTE. 

Unfortunately, we don't lock this PTE and it's possible that after 
pte_present() check but _before_ pte_update() call this particular 
page was purged from the memory, e.g. because of extreme memory 
pressure (of course, I'm assuming enabled preempt). 

If this happens, pte_update() sets _PAGE_HWEXEC bit in just cleared 
PTE. Sometime later, another page fault happens for this page, but 
because of that set bit, pte_none() test in handle_pte_fault() fails, 
and we continue along the wrong path, thinking that this PTE was 
swapped out to the swap file, and this triggers swap_dup error I 
mentioned at the beginning.

_PAGE_HWXEC is 0x200 on 405GPr, and because swap entry is PTE shifted 
2 bits to the right, we get that "0x00000080" value.

Paul, does my theory make any sense? I cannot test 2.6 on our hw. So 
far, after I added additional page_table_lock locking to my 2.4 in 
do_page_fault(), I haven't seen these errors, but it's too early to be 
100% sure :).

I'll make a patch for 2.6 if you think my analysis is correct.

-- 
Eugene

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Unsafe pte_update() in do_page_fault() (4xx and Book-E)
  2006-03-02 20:26 Unsafe pte_update() in do_page_fault() (4xx and Book-E) Eugene Surovegin
@ 2006-03-03  3:43 ` Benjamin Herrenschmidt
  2006-03-03  3:55   ` Eugene Surovegin
  2006-03-28  7:55 ` [PATCH] lock PTE before updating it in 440/BookE page fault handler Eugene Surovegin
  1 sibling, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2006-03-03  3:43 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-dev, Kumar Gala


> If this happens, pte_update() sets _PAGE_HWEXEC bit in just cleared 
> PTE. Sometime later, another page fault happens for this page, but 
> because of that set bit, pte_none() test in handle_pte_fault() fails, 
> and we continue along the wrong path, thinking that this PTE was 
> swapped out to the swap file, and this triggers swap_dup error I 
> mentioned at the beginning.

Can we preempt at that point ? As tehre is no SMP 4xx that I know of
preempt would be the only cause for such a race ...

Ben.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Unsafe pte_update() in do_page_fault() (4xx and Book-E)
  2006-03-03  3:43 ` Benjamin Herrenschmidt
@ 2006-03-03  3:55   ` Eugene Surovegin
  0 siblings, 0 replies; 6+ messages in thread
From: Eugene Surovegin @ 2006-03-03  3:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Kumar Gala

On Fri, Mar 03, 2006 at 02:43:02PM +1100, Benjamin Herrenschmidt wrote:
> 
> > If this happens, pte_update() sets _PAGE_HWEXEC bit in just cleared 
> > PTE. Sometime later, another page fault happens for this page, but 
> > because of that set bit, pte_none() test in handle_pte_fault() fails, 
> > and we continue along the wrong path, thinking that this PTE was 
> > swapped out to the swap file, and this triggers swap_dup error I 
> > mentioned at the beginning.
> 
> Can we preempt at that point ? As tehre is no SMP 4xx that I know of
> preempt would be the only cause for such a race ...

Yes, as I mentioned in the original post, I'm running preempt enabled 
kernel.

-- 
Eugene

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] lock PTE before updating it in 440/BookE page fault handler
  2006-03-02 20:26 Unsafe pte_update() in do_page_fault() (4xx and Book-E) Eugene Surovegin
  2006-03-03  3:43 ` Benjamin Herrenschmidt
@ 2006-03-28  7:55 ` Eugene Surovegin
  2006-03-28 11:10   ` Paul Mackerras
  1 sibling, 1 reply; 6+ messages in thread
From: Eugene Surovegin @ 2006-03-28  7:55 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Fix 44x and BookE page fault handler to correctly lock PTE before 
trying to pte_update() it, otherwise this PTE might be swapped out 
after pte_present() check but before pte_uptdate() call, resulting in 
corrupted PTE. This can happen with enabled preemption and low memory 
condition.

Signed-off-by: Eugene Surovegin <ebs@ebshome.net>

---
 arch/ppc/mm/fault.c       |   30 +++++++++++++++++-------------
 arch/ppc/mm/pgtable.c     |    6 ++++--
 include/asm-ppc/pgtable.h |    3 ++-
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/arch/ppc/mm/fault.c b/arch/ppc/mm/fault.c
index 0217188..8e08ca3 100644
--- a/arch/ppc/mm/fault.c
+++ b/arch/ppc/mm/fault.c
@@ -202,6 +202,7 @@ good_area:
 	/* an exec  - 4xx/Book-E allows for per-page execute permission */
 	} else if (TRAP(regs) == 0x400) {
 		pte_t *ptep;
+		pmd_t *pmdp;
 
 #if 0
 		/* It would be nice to actually enforce the VM execute
@@ -215,21 +216,24 @@ good_area:
 		/* Since 4xx/Book-E supports per-page execute permission,
 		 * we lazily flush dcache to icache. */
 		ptep = NULL;
-		if (get_pteptr(mm, address, &ptep) && pte_present(*ptep)) {
-			struct page *page = pte_page(*ptep);
-
-			if (! test_bit(PG_arch_1, &page->flags)) {
-				flush_dcache_icache_page(page);
-				set_bit(PG_arch_1, &page->flags);
+		if (get_pteptr(mm, address, &ptep, &pmdp)) {
+			spinlock_t *ptl = pte_lockptr(mm, pmdp);
+			spin_lock(ptl);
+			if (pte_present(*ptep)) {
+				struct page *page = pte_page(*ptep);
+
+				if (!test_bit(PG_arch_1, &page->flags)) {
+					flush_dcache_icache_page(page);
+					set_bit(PG_arch_1, &page->flags);
+				}
+				pte_update(ptep, 0, _PAGE_HWEXEC);
+				_tlbie(address);
+				pte_unmap_unlock(ptep, ptl);
+				up_read(&mm->mmap_sem);
+				return 0;
 			}
-			pte_update(ptep, 0, _PAGE_HWEXEC);
-			_tlbie(address);
-			pte_unmap(ptep);
-			up_read(&mm->mmap_sem);
-			return 0;
+			pte_unmap_unlock(ptep, ptl);
 		}
-		if (ptep != NULL)
-			pte_unmap(ptep);
 #endif
 	/* a read */
 	} else {
diff --git a/arch/ppc/mm/pgtable.c b/arch/ppc/mm/pgtable.c
index 6ea9185..98a83be 100644
--- a/arch/ppc/mm/pgtable.c
+++ b/arch/ppc/mm/pgtable.c
@@ -368,7 +368,7 @@ void __init io_block_mapping(unsigned lo
  * the PTE pointer is unmodified if PTE is not found.
  */
 int
-get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep)
+get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep, pmd_t **pmdp)
 {
         pgd_t	*pgd;
         pmd_t	*pmd;
@@ -383,6 +383,8 @@ get_pteptr(struct mm_struct *mm, unsigne
                         if (pte) {
 				retval = 1;
 				*ptep = pte;
+				if (pmdp)
+					*pmdp = pmd;
 				/* XXX caller needs to do pte_unmap, yuck */
                         }
                 }
@@ -420,7 +422,7 @@ unsigned long iopa(unsigned long addr)
 		mm = &init_mm;
 
 	pa = 0;
-	if (get_pteptr(mm, addr, &pte)) {
+	if (get_pteptr(mm, addr, &pte, NULL)) {
 		pa = (pte_val(*pte) & PAGE_MASK) | (addr & ~PAGE_MASK);
 		pte_unmap(pte);
 	}
diff --git a/include/asm-ppc/pgtable.h b/include/asm-ppc/pgtable.h
index e1c62da..570b355 100644
--- a/include/asm-ppc/pgtable.h
+++ b/include/asm-ppc/pgtable.h
@@ -837,7 +837,8 @@ static inline int io_remap_pfn_range(str
  */
 #define pgtable_cache_init()	do { } while (0)
 
-extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep);
+extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
+		      pmd_t **pmdp);
 
 #include <asm-generic/pgtable.h>
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] lock PTE before updating it in 440/BookE page fault handler
  2006-03-28  7:55 ` [PATCH] lock PTE before updating it in 440/BookE page fault handler Eugene Surovegin
@ 2006-03-28 11:10   ` Paul Mackerras
  2006-03-28 18:13     ` Eugene Surovegin
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Mackerras @ 2006-03-28 11:10 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-dev

Eugene Surovegin writes:

> Fix 44x and BookE page fault handler to correctly lock PTE before 
> trying to pte_update() it, otherwise this PTE might be swapped out 
> after pte_present() check but before pte_uptdate() call, resulting in 
> corrupted PTE. This can happen with enabled preemption and low memory 
> condition.

That gives me this for an ARCH=powerpc 32-bit pmac build:

/home/paulus/kernel/powerpc/arch/powerpc/mm/pgtable_32.c:376: error: conflicting types for 'get_pteptr'
/home/paulus/kernel/powerpc/include/asm-ppc/pgtable.h:841: error: previous declaration of 'get_pteptr' was here

Paul.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] lock PTE before updating it in 440/BookE page fault handler
  2006-03-28 11:10   ` Paul Mackerras
@ 2006-03-28 18:13     ` Eugene Surovegin
  0 siblings, 0 replies; 6+ messages in thread
From: Eugene Surovegin @ 2006-03-28 18:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

On Tue, Mar 28, 2006 at 10:10:39PM +1100, Paul Mackerras wrote:
> Eugene Surovegin writes:
> 
> > Fix 44x and BookE page fault handler to correctly lock PTE before 
> > trying to pte_update() it, otherwise this PTE might be swapped out 
> > after pte_present() check but before pte_uptdate() call, resulting in 
> > corrupted PTE. This can happen with enabled preemption and low memory 
> > condition.
> 
> That gives me this for an ARCH=powerpc 32-bit pmac build:
> 
> /home/paulus/kernel/powerpc/arch/powerpc/mm/pgtable_32.c:376: error: conflicting types for 'get_pteptr'
> /home/paulus/kernel/powerpc/include/asm-ppc/pgtable.h:841: error: previous declaration of 'get_pteptr' was here

Oops, sorry. Haven't noticed we have duplicated files in ppc and 
powerpc. Full patch follows.


Fix 44x and BookE page fault handler to correctly lock PTE before
trying to pte_update() it, otherwise this PTE might be swapped out
after pte_present() check but before pte_uptdate() call, resulting in
corrupted PTE. This can happen with enabled preemption and low memory
condition.

Signed-off-by: Eugene Surovegin <ebs@ebshome.net>

---
 arch/powerpc/mm/fault.c      |   30 +++++++++++++++++-------------
 arch/powerpc/mm/pgtable_32.c |    6 ++++--
 arch/ppc/mm/fault.c          |   30 +++++++++++++++++-------------
 arch/ppc/mm/pgtable.c        |    6 ++++--
 include/asm-ppc/pgtable.h    |    3 ++-
 5 files changed, 44 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index ec4adcb..5aea090 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -267,25 +267,29 @@ good_area:
 #endif
 #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
 		pte_t *ptep;
+		pmd_t *pmdp;
 
 		/* Since 4xx/Book-E supports per-page execute permission,
 		 * we lazily flush dcache to icache. */
 		ptep = NULL;
-		if (get_pteptr(mm, address, &ptep) && pte_present(*ptep)) {
-			struct page *page = pte_page(*ptep);
-
-			if (! test_bit(PG_arch_1, &page->flags)) {
-				flush_dcache_icache_page(page);
-				set_bit(PG_arch_1, &page->flags);
+		if (get_pteptr(mm, address, &ptep, &pmdp)) {
+			spinlock_t *ptl = pte_lockptr(mm, pmdp);
+			spin_lock(ptl);
+			if (pte_present(*ptep)) {
+				struct page *page = pte_page(*ptep);
+
+				if (!test_bit(PG_arch_1, &page->flags)) {
+					flush_dcache_icache_page(page);
+					set_bit(PG_arch_1, &page->flags);
+				}
+				pte_update(ptep, 0, _PAGE_HWEXEC);
+				_tlbie(address);
+				pte_unmap_unlock(ptep, ptl);
+				up_read(&mm->mmap_sem);
+				return 0;
 			}
-			pte_update(ptep, 0, _PAGE_HWEXEC);
-			_tlbie(address);
-			pte_unmap(ptep);
-			up_read(&mm->mmap_sem);
-			return 0;
+			pte_unmap_unlock(ptep, ptl);
 		}
-		if (ptep != NULL)
-			pte_unmap(ptep);
 #endif
 	/* a write */
 	} else if (is_write) {
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index d296eb6..9062860 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -372,7 +372,7 @@ void __init io_block_mapping(unsigned lo
  * the PTE pointer is unmodified if PTE is not found.
  */
 int
-get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep)
+get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep, pmd_t **pmdp)
 {
         pgd_t	*pgd;
         pmd_t	*pmd;
@@ -387,6 +387,8 @@ get_pteptr(struct mm_struct *mm, unsigne
                         if (pte) {
 				retval = 1;
 				*ptep = pte;
+				if (pmdp)
+					*pmdp = pmd;
 				/* XXX caller needs to do pte_unmap, yuck */
                         }
                 }
@@ -424,7 +426,7 @@ unsigned long iopa(unsigned long addr)
 		mm = &init_mm;
 
 	pa = 0;
-	if (get_pteptr(mm, addr, &pte)) {
+	if (get_pteptr(mm, addr, &pte, NULL)) {
 		pa = (pte_val(*pte) & PAGE_MASK) | (addr & ~PAGE_MASK);
 		pte_unmap(pte);
 	}
diff --git a/arch/ppc/mm/fault.c b/arch/ppc/mm/fault.c
index 0217188..8e08ca3 100644
--- a/arch/ppc/mm/fault.c
+++ b/arch/ppc/mm/fault.c
@@ -202,6 +202,7 @@ good_area:
 	/* an exec  - 4xx/Book-E allows for per-page execute permission */
 	} else if (TRAP(regs) == 0x400) {
 		pte_t *ptep;
+		pmd_t *pmdp;
 
 #if 0
 		/* It would be nice to actually enforce the VM execute
@@ -215,21 +216,24 @@ good_area:
 		/* Since 4xx/Book-E supports per-page execute permission,
 		 * we lazily flush dcache to icache. */
 		ptep = NULL;
-		if (get_pteptr(mm, address, &ptep) && pte_present(*ptep)) {
-			struct page *page = pte_page(*ptep);
-
-			if (! test_bit(PG_arch_1, &page->flags)) {
-				flush_dcache_icache_page(page);
-				set_bit(PG_arch_1, &page->flags);
+		if (get_pteptr(mm, address, &ptep, &pmdp)) {
+			spinlock_t *ptl = pte_lockptr(mm, pmdp);
+			spin_lock(ptl);
+			if (pte_present(*ptep)) {
+				struct page *page = pte_page(*ptep);
+
+				if (!test_bit(PG_arch_1, &page->flags)) {
+					flush_dcache_icache_page(page);
+					set_bit(PG_arch_1, &page->flags);
+				}
+				pte_update(ptep, 0, _PAGE_HWEXEC);
+				_tlbie(address);
+				pte_unmap_unlock(ptep, ptl);
+				up_read(&mm->mmap_sem);
+				return 0;
 			}
-			pte_update(ptep, 0, _PAGE_HWEXEC);
-			_tlbie(address);
-			pte_unmap(ptep);
-			up_read(&mm->mmap_sem);
-			return 0;
+			pte_unmap_unlock(ptep, ptl);
 		}
-		if (ptep != NULL)
-			pte_unmap(ptep);
 #endif
 	/* a read */
 	} else {
diff --git a/arch/ppc/mm/pgtable.c b/arch/ppc/mm/pgtable.c
index 6ea9185..98a83be 100644
--- a/arch/ppc/mm/pgtable.c
+++ b/arch/ppc/mm/pgtable.c
@@ -368,7 +368,7 @@ void __init io_block_mapping(unsigned lo
  * the PTE pointer is unmodified if PTE is not found.
  */
 int
-get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep)
+get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep, pmd_t **pmdp)
 {
         pgd_t	*pgd;
         pmd_t	*pmd;
@@ -383,6 +383,8 @@ get_pteptr(struct mm_struct *mm, unsigne
                         if (pte) {
 				retval = 1;
 				*ptep = pte;
+				if (pmdp)
+					*pmdp = pmd;
 				/* XXX caller needs to do pte_unmap, yuck */
                         }
                 }
@@ -420,7 +422,7 @@ unsigned long iopa(unsigned long addr)
 		mm = &init_mm;
 
 	pa = 0;
-	if (get_pteptr(mm, addr, &pte)) {
+	if (get_pteptr(mm, addr, &pte, NULL)) {
 		pa = (pte_val(*pte) & PAGE_MASK) | (addr & ~PAGE_MASK);
 		pte_unmap(pte);
 	}
diff --git a/include/asm-ppc/pgtable.h b/include/asm-ppc/pgtable.h
index e1c62da..570b355 100644
--- a/include/asm-ppc/pgtable.h
+++ b/include/asm-ppc/pgtable.h
@@ -837,7 +837,8 @@ static inline int io_remap_pfn_range(str
  */
 #define pgtable_cache_init()	do { } while (0)
 
-extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep);
+extern int get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
+		      pmd_t **pmdp);
 
 #include <asm-generic/pgtable.h>
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2006-03-28 18:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-02 20:26 Unsafe pte_update() in do_page_fault() (4xx and Book-E) Eugene Surovegin
2006-03-03  3:43 ` Benjamin Herrenschmidt
2006-03-03  3:55   ` Eugene Surovegin
2006-03-28  7:55 ` [PATCH] lock PTE before updating it in 440/BookE page fault handler Eugene Surovegin
2006-03-28 11:10   ` Paul Mackerras
2006-03-28 18:13     ` Eugene Surovegin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).