linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ARM: LPAE: Invalidate the TLB for module addresses during translation fault
@ 2012-02-23 17:59 Catalin Marinas
  2012-02-23 19:50 ` Russell King - ARM Linux
  0 siblings, 1 reply; 6+ messages in thread
From: Catalin Marinas @ 2012-02-23 17:59 UTC (permalink / raw)
  To: linux-arm-kernel

During the free_pgtables() call all user and modules/pkmap entries are
removed. If a translation fault for the modules/pkmap area occurs before
we switched away from the current pgd, do_translation_fault() would copy
the init_mm pud into the user pud.

There is a small window between pud_clear() and pmd_free_tlb() in
free_pmd_range() where the pud entry was cleared but the TLB has not
been invalidated yet and the CPU may have cached the original (valid)
pud entry in the TLB. A scenario like below would get stuck in
continuous prefetch abort:

1. Current process exiting. The modules pmd entries not populated
2. exit_mmap() -> ... -> pmd_free_tlb()
3. pud_clear() for the 1GB pud containing user stack and modules (no TLB
   invalidation yet)
4. Interrupt -> module interrupt routine
5. Level 2 (pmd) translation fault occurs when executing the module
   interrupt routine. The CPU previously cached (TLB) the old valid pud
   value for the modules area, so we don't get a level 1 translation
   fault
6. do_translation fault() copies the pud_k into the pud
7. Linux returns to the faulty instruction. Goes back to 5

At point 7, since the CPU still has the old pud value, it goes back to
point 5 and never gets out of this loop. With this patch, the stale pud
TLB entry is invalidated after point 6 and the subsequent prefetch abort
doesn't occur.

Reported-by: Tony Thompson <Anthony.Thompson@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/mm/fault.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index bb7eac3..d9aacf0 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -444,8 +444,16 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
 
 	if (pud_none(*pud_k))
 		goto bad_area;
-	if (!pud_present(*pud))
+	if (!pud_present(*pud)) {
 		set_pud(pud, *pud_k);
+		/*
+		 * There is a small window during free_pgtables() where the
+		 * user *pud entry is 0 but the TLB has not been invalidated
+		 * and we get a level 2 (pmd) translation fault caused by the
+		 * intermediate TLB caching of the old level 1 (pud) entry.
+		 */
+		flush_tlb_kernel_page(addr);
+	}
 
 	pmd = pmd_offset(pud, addr);
 	pmd_k = pmd_offset(pud_k, addr);
@@ -468,8 +476,9 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
 #endif
 	if (pmd_none(pmd_k[index]))
 		goto bad_area;
+	if (!pmd_present(pmd[index]))
+		copy_pmd(pmd, pmd_k);
 
-	copy_pmd(pmd, pmd_k);
 	return 0;
 
 bad_area:

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] ARM: LPAE: Invalidate the TLB for module addresses during translation fault
  2012-02-23 17:59 [PATCH] ARM: LPAE: Invalidate the TLB for module addresses during translation fault Catalin Marinas
@ 2012-02-23 19:50 ` Russell King - ARM Linux
  2012-02-23 22:13   ` Catalin Marinas
  0 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux @ 2012-02-23 19:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 23, 2012 at 05:59:41PM +0000, Catalin Marinas wrote:
> During the free_pgtables() call all user and modules/pkmap entries are
> removed.

Err, no.  free_pgtables() should only ever touch up to TASK_SIZE, which
is the maximum size of userspace.  Moreover, because TASK_SIZE does not
cover an entire pud, free_pgtables() should not be removing the pud
table.

> There is a small window between pud_clear() and pmd_free_tlb() in
> free_pmd_range() where the pud entry was cleared but the TLB has not
> been invalidated yet and the CPU may have cached the original (valid)
> pud entry in the TLB. A scenario like below would get stuck in
> continuous prefetch abort:
> 
> 1. Current process exiting. The modules pmd entries not populated
> 2. exit_mmap() -> ... -> pmd_free_tlb()
> 3. pud_clear() for the 1GB pud containing user stack and modules (no TLB
>    invalidation yet)

This is where things are wrong.  Because the request should not be
requesting that the entire pud is being cleared, it should not be
removed from the pgd.

So, there's some debugging to be done here.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] ARM: LPAE: Invalidate the TLB for module addresses during translation fault
  2012-02-23 19:50 ` Russell King - ARM Linux
@ 2012-02-23 22:13   ` Catalin Marinas
  2012-02-24  9:58     ` Catalin Marinas
  0 siblings, 1 reply; 6+ messages in thread
From: Catalin Marinas @ 2012-02-23 22:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 23, 2012 at 07:50:03PM +0000, Russell King - ARM Linux wrote:
> On Thu, Feb 23, 2012 at 05:59:41PM +0000, Catalin Marinas wrote:
> > During the free_pgtables() call all user and modules/pkmap entries are
> > removed.
> 
> Err, no.  free_pgtables() should only ever touch up to TASK_SIZE, which
> is the maximum size of userspace.  Moreover, because TASK_SIZE does not
> cover an entire pud, free_pgtables() should not be removing the pud
> table.

Are you sure? exit_mmap() calls free_pgtables() with ceiling == 0.  If
free_pmd_range() is called for a range in the top 1GB of the task
address space (which includes modules), it also calls pud_clear()
because ceiling is 0.

> > There is a small window between pud_clear() and pmd_free_tlb() in
> > free_pmd_range() where the pud entry was cleared but the TLB has not
> > been invalidated yet and the CPU may have cached the original (valid)
> > pud entry in the TLB. A scenario like below would get stuck in
> > continuous prefetch abort:
> > 
> > 1. Current process exiting. The modules pmd entries not populated
> > 2. exit_mmap() -> ... -> pmd_free_tlb()
> > 3. pud_clear() for the 1GB pud containing user stack and modules (no TLB
> >    invalidation yet)
> 
> This is where things are wrong.  Because the request should not be
> requesting that the entire pud is being cleared, it should not be
> removed from the pgd.

Actually the pgd and pud are the same. free_pgtables() removes the whole
pmd (in free_pmd_range()) and clears the higher pud entry that includes
the modules mapping.

-- 
Catalin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] ARM: LPAE: Invalidate the TLB for module addresses during translation fault
  2012-02-23 22:13   ` Catalin Marinas
@ 2012-02-24  9:58     ` Catalin Marinas
  2012-02-24 10:12       ` Russell King - ARM Linux
  0 siblings, 1 reply; 6+ messages in thread
From: Catalin Marinas @ 2012-02-24  9:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 23, 2012 at 10:13:43PM +0000, Catalin Marinas wrote:
> On Thu, Feb 23, 2012 at 07:50:03PM +0000, Russell King - ARM Linux wrote:
> > On Thu, Feb 23, 2012 at 05:59:41PM +0000, Catalin Marinas wrote:
> > > During the free_pgtables() call all user and modules/pkmap entries are
> > > removed.
> > 
> > Err, no.  free_pgtables() should only ever touch up to TASK_SIZE, which
> > is the maximum size of userspace.  Moreover, because TASK_SIZE does not
> > cover an entire pud, free_pgtables() should not be removing the pud
> > table.
> 
> Are you sure? exit_mmap() calls free_pgtables() with ceiling == 0.  If
> free_pmd_range() is called for a range in the top 1GB of the task
> address space (which includes modules), it also calls pud_clear()
> because ceiling is 0.

BTW, an alternative patch but which needs wider acknowledgement as it
touches generic code (I can post it to linux-mm):

diff --git a/mm/mmap.c b/mm/mmap.c
index 3f758c7..5e5c8a8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1866,7 +1866,7 @@ static void unmap_region(struct mm_struct *mm,
 	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
 	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
-				 next ? next->vm_start : 0);
+				 next ? next->vm_start : TASK_SIZE);
 	tlb_finish_mmu(&tlb, start, end);
 }
 
@@ -2241,7 +2241,7 @@ void exit_mmap(struct mm_struct *mm)
 	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
 
-	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
+	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, TASK_SIZE);
 	tlb_finish_mmu(&tlb, 0, end);
 
 	/*

-- 
Catalin

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] ARM: LPAE: Invalidate the TLB for module addresses during translation fault
  2012-02-24  9:58     ` Catalin Marinas
@ 2012-02-24 10:12       ` Russell King - ARM Linux
  2012-02-24 11:39         ` Catalin Marinas
  0 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux @ 2012-02-24 10:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 24, 2012 at 09:58:06AM +0000, Catalin Marinas wrote:
> On Thu, Feb 23, 2012 at 10:13:43PM +0000, Catalin Marinas wrote:
> > On Thu, Feb 23, 2012 at 07:50:03PM +0000, Russell King - ARM Linux wrote:
> > > On Thu, Feb 23, 2012 at 05:59:41PM +0000, Catalin Marinas wrote:
> > > > During the free_pgtables() call all user and modules/pkmap entries are
> > > > removed.
> > > 
> > > Err, no.  free_pgtables() should only ever touch up to TASK_SIZE, which
> > > is the maximum size of userspace.  Moreover, because TASK_SIZE does not
> > > cover an entire pud, free_pgtables() should not be removing the pud
> > > table.
> > 
> > Are you sure? exit_mmap() calls free_pgtables() with ceiling == 0.  If
> > free_pmd_range() is called for a range in the top 1GB of the task
> > address space (which includes modules), it also calls pud_clear()
> > because ceiling is 0.
> 
> BTW, an alternative patch but which needs wider acknowledgement as it
> touches generic code (I can post it to linux-mm):

I think that's the right solution.

> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3f758c7..5e5c8a8 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1866,7 +1866,7 @@ static void unmap_region(struct mm_struct *mm,
>  	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
>  	vm_unacct_memory(nr_accounted);
>  	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
> -				 next ? next->vm_start : 0);
> +				 next ? next->vm_start : TASK_SIZE);
>  	tlb_finish_mmu(&tlb, start, end);
>  }
>  
> @@ -2241,7 +2241,7 @@ void exit_mmap(struct mm_struct *mm)
>  	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
>  	vm_unacct_memory(nr_accounted);
>  
> -	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
> +	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, TASK_SIZE);
>  	tlb_finish_mmu(&tlb, 0, end);
>  
>  	/*
> 
> -- 
> Catalin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] ARM: LPAE: Invalidate the TLB for module addresses during translation fault
  2012-02-24 10:12       ` Russell King - ARM Linux
@ 2012-02-24 11:39         ` Catalin Marinas
  0 siblings, 0 replies; 6+ messages in thread
From: Catalin Marinas @ 2012-02-24 11:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 24, 2012 at 10:12:00AM +0000, Russell King - ARM Linux wrote:
> On Fri, Feb 24, 2012 at 09:58:06AM +0000, Catalin Marinas wrote:
> > On Thu, Feb 23, 2012 at 10:13:43PM +0000, Catalin Marinas wrote:
> > > On Thu, Feb 23, 2012 at 07:50:03PM +0000, Russell King - ARM Linux wrote:
> > > > On Thu, Feb 23, 2012 at 05:59:41PM +0000, Catalin Marinas wrote:
> > > > > During the free_pgtables() call all user and modules/pkmap entries are
> > > > > removed.
> > > > 
> > > > Err, no.  free_pgtables() should only ever touch up to TASK_SIZE, which
> > > > is the maximum size of userspace.  Moreover, because TASK_SIZE does not
> > > > cover an entire pud, free_pgtables() should not be removing the pud
> > > > table.
> > > 
> > > Are you sure? exit_mmap() calls free_pgtables() with ceiling == 0.  If
> > > free_pmd_range() is called for a range in the top 1GB of the task
> > > address space (which includes modules), it also calls pud_clear()
> > > because ceiling is 0.
> > 
> > BTW, an alternative patch but which needs wider acknowledgement as it
> > touches generic code (I can post it to linux-mm):
> 
> I think that's the right solution.

Something similar is needed in shift_arg_pages() when calling
free_pgd_range().

But my concern - are there other architectures that rely on
free_pgtables() to remove page tables corresponding to vmas beyond
TASK_SIZE (e.g. vsyscall)?

-- 
Catalin

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-02-24 11:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-23 17:59 [PATCH] ARM: LPAE: Invalidate the TLB for module addresses during translation fault Catalin Marinas
2012-02-23 19:50 ` Russell King - ARM Linux
2012-02-23 22:13   ` Catalin Marinas
2012-02-24  9:58     ` Catalin Marinas
2012-02-24 10:12       ` Russell King - ARM Linux
2012-02-24 11:39         ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).