linux-mm.kvack.org archive mirror
* [PATCH 0/2] pageattr fixes for pmd/pte_present
@ 2012-12-17 18:00 Andrea Arcangeli
  2012-12-17 18:00 ` [PATCH 1/2] Revert "x86, mm: Make spurious_fault check explicitly check the PRESENT bit" Andrea Arcangeli
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2012-12-17 18:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Andi Kleen, Andrew Morton, Shaohua Li, H. Peter Anvin, Mel Gorman,
	Hugh Dickins

Hi,

I got a report for a minor regression introduced by commit
027ef6c87853b0a9df53175063028edb4950d476.

The problem is that pageattr creates kernel pagetables (ptes and pmds)
that break pte_present/pmd_present, and the commit above exposed this
invariant breakage for pmd_present.

The same problem already existed for ptes and pte_present, and it was
fixed by commit 660a293ea9be709b893d371fbc0328fcca33c33a (if it weren't
for that commit, it wouldn't even be a regression). That fix made the
page fault handler avoid pte_present. I could follow through by
stopping the use of pmd_present/pmd_huge there too.

However, I think it's more robust to fix pageattr so that it clears the
PSE/GLOBAL bitflags in addition to the PRESENT bitflag. That way the
kernel page fault handler can keep using the regular
pte_present/pmd_present/pmd_huge.

The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE share the
same bit, and in the pmd case we require _PAGE_PSE to be set only in
present pmds (to facilitate the final TLB flush in split_huge_page).
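
To see why leftover bits confuse those helpers, this is roughly what
the relevant x86 checks look like in kernels of this era (paraphrased
from pgtable_types.h and pgtable.h for illustration, not a verbatim
copy of the source):

/* _PAGE_PROTNONE reuses the GLOBAL bit and is only meaningful when
 * _PAGE_PRESENT is clear */
#define _PAGE_BIT_GLOBAL	8
#define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL

static inline int pte_present(pte_t a)
{
	/* a leftover _PAGE_GLOBAL on a non present pte is misread as
	 * _PAGE_PROTNONE, so this wrongly returns true */
	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
}

static inline int pmd_present(pmd_t pmd)
{
	/* after commit 027ef6c87853 a leftover _PAGE_PSE is enough to
	 * make a non present pmd look present */
	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE |
				 _PAGE_PSE);
}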

Andrea Arcangeli (2):
  Revert "x86, mm: Make spurious_fault check explicitly check the
    PRESENT bit"
  pageattr: prevent PSE and GLOBAL leftovers from confusing
    pmd/pte_present and pmd_huge

 arch/x86/mm/fault.c    |    8 +------
 arch/x86/mm/pageattr.c |   50 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 48 insertions(+), 10 deletions(-)


* [PATCH 1/2] Revert "x86, mm: Make spurious_fault check explicitly check the PRESENT bit"
  2012-12-17 18:00 [PATCH 0/2] pageattr fixes for pmd/pte_present Andrea Arcangeli
@ 2012-12-17 18:00 ` Andrea Arcangeli
  2012-12-17 18:17   ` H. Peter Anvin
  2012-12-17 18:00 ` [PATCH 2/2] pageattr: prevent PSE and GLOBAL leftovers from confusing pmd/pte_present and pmd_huge Andrea Arcangeli
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Andrea Arcangeli @ 2012-12-17 18:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Andi Kleen, Andrew Morton, Shaohua Li, H. Peter Anvin, Mel Gorman,
	Hugh Dickins

This reverts commit 660a293ea9be709b893d371fbc0328fcca33c33a.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 arch/x86/mm/fault.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8e13ecb..4d18eb5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -944,14 +944,8 @@ spurious_fault(unsigned long error_code, unsigned long address)
 	if (pmd_large(*pmd))
 		return spurious_fault_check(error_code, (pte_t *) pmd);
 
-	/*
-	 * Note: don't use pte_present() here, since it returns true
-	 * if the _PAGE_PROTNONE bit is set.  However, this aliases the
-	 * _PAGE_GLOBAL bit, which for kernel pages give false positives
-	 * when CONFIG_DEBUG_PAGEALLOC is used.
-	 */
 	pte = pte_offset_kernel(pmd, address);
-	if (!(pte_flags(*pte) & _PAGE_PRESENT))
+	if (!pte_present(*pte))
 		return 0;
 
 	ret = spurious_fault_check(error_code, pte);
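
For context, once an entry is considered present, spurious_fault()
calls spurious_fault_check(), which in kernels of this era is roughly
the following (paraphrased for illustration, not a verbatim copy):

static int spurious_fault_check(unsigned long error_code, pte_t *pte)
{
	/* a write fault on a read-only entry is a real fault */
	if ((error_code & PF_WRITE) && !pte_write(*pte))
		return 0;
	/* an instruction fetch from a no-exec entry is a real fault */
	if ((error_code & PF_INSTR) && !pte_exec(*pte))
		return 0;

	/* otherwise assume a stale TLB entry and retry the access */
	return 1;
}

For a plain read neither PF_WRITE nor PF_INSTR is set, so an entry that
merely looks present is reported as spurious and the access is retried,
which is presumably how the leftover bits turn into the endless refault
loop described in patch 2/2. This revert is only safe together with
that patch, which stops pageattr from leaving the bits behind in the
first place.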


* [PATCH 2/2] pageattr: prevent PSE and GLOBAL leftovers from confusing pmd/pte_present and pmd_huge
  2012-12-17 18:00 [PATCH 0/2] pageattr fixes for pmd/pte_present Andrea Arcangeli
  2012-12-17 18:00 ` [PATCH 1/2] Revert "x86, mm: Make spurious_fault check explicitly check the PRESENT bit" Andrea Arcangeli
@ 2012-12-17 18:00 ` Andrea Arcangeli
  2013-01-10  7:59   ` Simon Jeons
  2013-01-06  2:59 ` [PATCH 0/2] pageattr fixes for pmd/pte_present Simon Jeons
  2013-01-10  7:42 ` Simon Jeons
  3 siblings, 1 reply; 11+ messages in thread
From: Andrea Arcangeli @ 2012-12-17 18:00 UTC (permalink / raw)
  To: linux-mm
  Cc: Andi Kleen, Andrew Morton, Shaohua Li, H. Peter Anvin, Mel Gorman,
	Hugh Dickins

Without this patch, any kernel code that reads kernel memory through
non-present kernel ptes/pmds (as set up by pageattr.c) will crash.

With this kernel code:

static struct page *crash_page;
static unsigned long *crash_address;
[..]
	crash_page = alloc_pages(GFP_KERNEL, 9);
	crash_address = page_address(crash_page);
	if (set_memory_np((unsigned long)crash_address, 1))
		printk("set_memory_np failure\n");
[..]

The kernel will crash if, from inside the "crash" tool, one tries to
read the memory at the non-present address.

crash> p crash_address
crash_address = $8 = (long unsigned int *) 0xffff88023c000000
crash> rd 0xffff88023c000000
[ *lockup* ]

The lockup happens because _PAGE_GLOBAL and _PAGE_PROTNONE share the
same bit, and pageattr leaves _PAGE_GLOBAL set on a non-present kernel
pte, which is then mistaken for _PAGE_PROTNONE (so pte_present returns
true by mistake and the kernel page fault handler then gets confused
and loops).

With THP the same can happen after commit
027ef6c87853b0a9df53175063028edb4950d476 taught pmd_present to also
check _PAGE_PROTNONE and _PAGE_PSE. THP has the same problem with
_PAGE_GLOBAL as the 4k case, but it also has a problem with _PAGE_PSE,
which must be cleared too.

After the patch is applied, copy_user correctly returns -EFAULT and
doesn't lock up anymore.

crash> p crash_address
crash_address = $9 = (long unsigned int *) 0xffff88023c000000
crash> rd 0xffff88023c000000
rd: read error: kernel virtual address: ffff88023c000000  type: "64-bit KVADDR"
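
For anyone who wants to reproduce this, a minimal self-contained test
module built around the snippet above could look like the sketch below.
Only the alloc_pages()/page_address()/set_memory_np() calls come from
the snippet; the module scaffolding and names are made up for
illustration:

#include <linux/module.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <asm/cacheflush.h>	/* set_memory_np() */

static struct page *crash_page;
static unsigned long *crash_address;

static int __init crash_test_init(void)
{
	/* order 9 = 512 pages = 2MB, so a large direct mapping is involved */
	crash_page = alloc_pages(GFP_KERNEL, 9);
	if (!crash_page)
		return -ENOMEM;
	crash_address = page_address(crash_page);
	/* clear _PAGE_PRESENT on the first 4k page of the range */
	if (set_memory_np((unsigned long)crash_address, 1))
		printk("set_memory_np failure\n");
	printk("crash_address: %p\n", crash_address);
	/* the pages are deliberately left allocated and not present */
	return 0;
}

module_init(crash_test_init);
MODULE_LICENSE("GPL");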

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 arch/x86/mm/pageattr.c |   50 +++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a718e0d..2713be4 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -445,6 +445,19 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 	pgprot_val(req_prot) |= pgprot_val(cpa->mask_set);
 
 	/*
+	 * Set the PSE and GLOBAL flags only if the PRESENT flag is
+	 * set otherwise pmd_present/pmd_huge will return true even on
+	 * a non present pmd. The canon_pgprot will clear _PAGE_GLOBAL
+	 * for the ancient hardware that doesn't support it.
+	 */
+	if (pgprot_val(new_prot) & _PAGE_PRESENT)
+		pgprot_val(new_prot) |= _PAGE_PSE | _PAGE_GLOBAL;
+	else
+		pgprot_val(new_prot) &= ~(_PAGE_PSE | _PAGE_GLOBAL);
+
+	new_prot = canon_pgprot(new_prot);
+
+	/*
 	 * old_pte points to the large page base address. So we need
 	 * to add the offset of the virtual address:
 	 */
@@ -489,7 +502,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 		 * The address is aligned and the number of pages
 		 * covers the full page.
 		 */
-		new_pte = pfn_pte(pte_pfn(old_pte), canon_pgprot(new_prot));
+		new_pte = pfn_pte(pte_pfn(old_pte), new_prot);
 		__set_pmd_pte(kpte, address, new_pte);
 		cpa->flags |= CPA_FLUSHTLB;
 		do_split = 0;
@@ -540,16 +553,35 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 #ifdef CONFIG_X86_64
 	if (level == PG_LEVEL_1G) {
 		pfninc = PMD_PAGE_SIZE >> PAGE_SHIFT;
-		pgprot_val(ref_prot) |= _PAGE_PSE;
+		/*
+		 * Set the PSE flags only if the PRESENT flag is set
+		 * otherwise pmd_present/pmd_huge will return true
+		 * even on a non present pmd.
+		 */
+		if (pgprot_val(ref_prot) & _PAGE_PRESENT)
+			pgprot_val(ref_prot) |= _PAGE_PSE;
+		else
+			pgprot_val(ref_prot) &= ~_PAGE_PSE;
 	}
 #endif
 
 	/*
+	 * Set the GLOBAL flags only if the PRESENT flag is set
+	 * otherwise pmd/pte_present will return true even on a non
+	 * present pmd/pte. The canon_pgprot will clear _PAGE_GLOBAL
+	 * for the ancient hardware that doesn't support it.
+	 */
+	if (pgprot_val(ref_prot) & _PAGE_PRESENT)
+		pgprot_val(ref_prot) |= _PAGE_GLOBAL;
+	else
+		pgprot_val(ref_prot) &= ~_PAGE_GLOBAL;
+
+	/*
 	 * Get the target pfn from the original entry:
 	 */
 	pfn = pte_pfn(*kpte);
 	for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc)
-		set_pte(&pbase[i], pfn_pte(pfn, ref_prot));
+		set_pte(&pbase[i], pfn_pte(pfn, canon_pgprot(ref_prot)));
 
 	if (address >= (unsigned long)__va(0) &&
 		address < (unsigned long)__va(max_low_pfn_mapped << PAGE_SHIFT))
@@ -660,6 +692,18 @@ repeat:
 		new_prot = static_protections(new_prot, address, pfn);
 
 		/*
+		 * Set the GLOBAL flags only if the PRESENT flag is
+		 * set otherwise pte_present will return true even on
+		 * a non present pte. The canon_pgprot will clear
+		 * _PAGE_GLOBAL for the ancient hardware that doesn't
+		 * support it.
+		 */
+		if (pgprot_val(new_prot) & _PAGE_PRESENT)
+			pgprot_val(new_prot) |= _PAGE_GLOBAL;
+		else
+			pgprot_val(new_prot) &= ~_PAGE_GLOBAL;
+
+		/*
 		 * We need to keep the pfn from the existing PTE,
 		 * after all we're only going to change it's attributes
 		 * not the memory it points to
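
The canon_pgprot() referenced in the comments above is what filters out
_PAGE_GLOBAL on CPUs without PGE; in kernels of this era it is roughly
the following (paraphrased from arch/x86/include/asm/pgtable.h, shown
only for reference):

static inline pgprotval_t massage_pgprot(pgprot_t pgprot)
{
	pgprotval_t protval = pgprot_val(pgprot);

	/* only mask present entries against the supported pte bits */
	if (protval & _PAGE_PRESENT)
		protval &= __supported_pte_mask;

	return protval;
}

#define canon_pgprot(p) __pgprot(massage_pgprot(p))

Since massage_pgprot() only masks present entries, applying
canon_pgprot() after the new PRESENT checks still drops _PAGE_GLOBAL
from present entries on hardware without PGE, while leaving the fully
cleared non present entries alone.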


* Re: [PATCH 1/2] Revert "x86, mm: Make spurious_fault check explicitly check the PRESENT bit"
  2012-12-17 18:00 ` [PATCH 1/2] Revert "x86, mm: Make spurious_fault check explicitly check the PRESENT bit" Andrea Arcangeli
@ 2012-12-17 18:17   ` H. Peter Anvin
  2012-12-17 18:35     ` Andrea Arcangeli
  0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2012-12-17 18:17 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andi Kleen, Andrew Morton, Shaohua Li, Mel Gorman,
	Hugh Dickins

On 12/17/2012 10:00 AM, Andrea Arcangeli wrote:
> This reverts commit 660a293ea9be709b893d371fbc0328fcca33c33a.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Motivation/details?

	-hpa



* Re: [PATCH 1/2] Revert "x86, mm: Make spurious_fault check explicitly check the PRESENT bit"
  2012-12-17 18:17   ` H. Peter Anvin
@ 2012-12-17 18:35     ` Andrea Arcangeli
  0 siblings, 0 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2012-12-17 18:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-mm, Andi Kleen, Andrew Morton, Shaohua Li, Mel Gorman,
	Hugh Dickins

On Mon, Dec 17, 2012 at 10:17:15AM -0800, H. Peter Anvin wrote:
> On 12/17/2012 10:00 AM, Andrea Arcangeli wrote:
> > This reverts commit 660a293ea9be709b893d371fbc0328fcca33c33a.
> > 
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> 
> Motivation/details?

It's all in 0/2 and 2/2. This one arrived first for whatever reason.


* Re: [PATCH 0/2] pageattr fixes for pmd/pte_present
  2012-12-17 18:00 [PATCH 0/2] pageattr fixes for pmd/pte_present Andrea Arcangeli
  2012-12-17 18:00 ` [PATCH 1/2] Revert "x86, mm: Make spurious_fault check explicitly check the PRESENT bit" Andrea Arcangeli
  2012-12-17 18:00 ` [PATCH 2/2] pageattr: prevent PSE and GLOBAL leftovers from confusing pmd/pte_present and pmd_huge Andrea Arcangeli
@ 2013-01-06  2:59 ` Simon Jeons
  2013-01-07 21:53   ` Andrew Morton
  2013-01-10  7:42 ` Simon Jeons
  3 siblings, 1 reply; 11+ messages in thread
From: Simon Jeons @ 2013-01-06  2:59 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andi Kleen, Andrew Morton, Shaohua Li, H. Peter Anvin,
	Mel Gorman, Hugh Dickins


What's the status of these two patches?

On Mon, 2012-12-17 at 19:00 +0100, Andrea Arcangeli wrote:
> Hi,
> 
> I got a report for a minor regression introduced by commit
> 027ef6c87853b0a9df53175063028edb4950d476.
> 
> The problem is that pageattr creates kernel pagetables (ptes and pmds)
> that break pte_present/pmd_present, and the commit above exposed this
> invariant breakage for pmd_present.
> 
> The same problem already existed for ptes and pte_present, and it was
> fixed by commit 660a293ea9be709b893d371fbc0328fcca33c33a (if it weren't
> for that commit, it wouldn't even be a regression). That fix made the
> page fault handler avoid pte_present. I could follow through by
> stopping the use of pmd_present/pmd_huge there too.
> 
> However, I think it's more robust to fix pageattr so that it clears the
> PSE/GLOBAL bitflags in addition to the PRESENT bitflag. That way the
> kernel page fault handler can keep using the regular
> pte_present/pmd_present/pmd_huge.
> 
> The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE share the
> same bit, and in the pmd case we require _PAGE_PSE to be set only in
> present pmds (to facilitate the final TLB flush in split_huge_page).
> 
> Andrea Arcangeli (2):
>   Revert "x86, mm: Make spurious_fault check explicitly check the
>     PRESENT bit"
>   pageattr: prevent PSE and GLOBAL leftovers from confusing
>     pmd/pte_present and pmd_huge
> 
>  arch/x86/mm/fault.c    |    8 +------
>  arch/x86/mm/pageattr.c |   50 +++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 48 insertions(+), 10 deletions(-)
> 

* Re: [PATCH 0/2] pageattr fixes for pmd/pte_present
  2013-01-06  2:59 ` [PATCH 0/2] pageattr fixes for pmd/pte_present Simon Jeons
@ 2013-01-07 21:53   ` Andrew Morton
  2013-01-07 21:55     ` H. Peter Anvin
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2013-01-07 21:53 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Andrea Arcangeli, linux-mm, Andi Kleen, Shaohua Li,
	H. Peter Anvin, Mel Gorman, Hugh Dickins

On Sat, 05 Jan 2013 20:59:57 -0600
Simon Jeons <simon.jeons@gmail.com> wrote:
> 

(top-posting repaired)

top-posting makes it really hard to reply to your email in a useful
fashion.  So if you want a reply, please don't top-post!

> On Mon, 2012-12-17 at 19:00 +0100, Andrea Arcangeli wrote:
> > Hi,
> > 
> > I got a report for a minor regression introduced by commit
> > 027ef6c87853b0a9df53175063028edb4950d476.
> > 
> > The problem is that pageattr creates kernel pagetables (ptes and pmds)
> > that break pte_present/pmd_present, and the commit above exposed this
> > invariant breakage for pmd_present.
> > 
> > The same problem already existed for ptes and pte_present, and it was
> > fixed by commit 660a293ea9be709b893d371fbc0328fcca33c33a (if it weren't
> > for that commit, it wouldn't even be a regression). That fix made the
> > page fault handler avoid pte_present. I could follow through by
> > stopping the use of pmd_present/pmd_huge there too.
> > 
> > However, I think it's more robust to fix pageattr so that it clears the
> > PSE/GLOBAL bitflags in addition to the PRESENT bitflag. That way the
> > kernel page fault handler can keep using the regular
> > pte_present/pmd_present/pmd_huge.
> > 
> > The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE share the
> > same bit, and in the pmd case we require _PAGE_PSE to be set only in
> > present pmds (to facilitate the final TLB flush in split_huge_page).
> > 
> 
> What's the status of these two patches?

I expect they fell through the christmas cracks.  I added them to my
(getting large) queue of x86 patches for consideration by the x86
maintainers.

Why do you ask?  It seems the bug is a pretty minor one and that we
need only fix it in 3.8 or even 3.9.  Is that supposition incorrect?


* Re: [PATCH 0/2] pageattr fixes for pmd/pte_present
  2013-01-07 21:53   ` Andrew Morton
@ 2013-01-07 21:55     ` H. Peter Anvin
  2013-01-08 12:25       ` Andrea Arcangeli
  0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2013-01-07 21:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Simon Jeons, Andrea Arcangeli, linux-mm, Andi Kleen, Shaohua Li,
	Mel Gorman, Hugh Dickins

On 01/07/2013 01:53 PM, Andrew Morton wrote:
>>
>> What's the status of these two patches?
> 
> I expect they fell through the christmas cracks.  I added them to my
> (getting large) queue of x86 patches for consideration by the x86
> maintainers.

Yes, I'm just coming back online today, and needless to say, I have a
*huge* backlog.

> Why do you ask?  It seems the bug is a pretty minor one and that we
> need only fix it in 3.8 or even 3.9.  Is that supposition incorrect?

I would like to know this as well.

	-hpa





* Re: [PATCH 0/2] pageattr fixes for pmd/pte_present
  2013-01-07 21:55     ` H. Peter Anvin
@ 2013-01-08 12:25       ` Andrea Arcangeli
  0 siblings, 0 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2013-01-08 12:25 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Simon Jeons, linux-mm, Andi Kleen, Shaohua Li,
	Mel Gorman, Hugh Dickins

Hi,

On Mon, Jan 07, 2013 at 01:55:33PM -0800, H. Peter Anvin wrote:
> On 01/07/2013 01:53 PM, Andrew Morton wrote:
> >>
> >> What's the status of these two patches?
> > 
> > I expect they fell through the christmas cracks.  I added them to my
> > (getting large) queue of x86 patches for consideration by the x86
> > maintainers.
> 
> Yes, I'm just coming back online today, and needless to say, I have a
> *huge* backlog.
> 
> > Why do you ask?  It seems the bug is a pretty minor one and that we
> > need only fix it in 3.8 or even 3.9.  Is that supposition incorrect?
> 
> I would like to know this as well.

It is a minor regression that was reported. The only way to notice it
should be to use the crash tool or other non-standard root-only
tooling. So it shouldn't be too urgent.

Thanks,
Andrea


* Re: [PATCH 0/2] pageattr fixes for pmd/pte_present
  2012-12-17 18:00 [PATCH 0/2] pageattr fixes for pmd/pte_present Andrea Arcangeli
                   ` (2 preceding siblings ...)
  2013-01-06  2:59 ` [PATCH 0/2] pageattr fixes for pmd/pte_present Simon Jeons
@ 2013-01-10  7:42 ` Simon Jeons
  3 siblings, 0 replies; 11+ messages in thread
From: Simon Jeons @ 2013-01-10  7:42 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andi Kleen, Andrew Morton, Shaohua Li, H. Peter Anvin,
	Mel Gorman, Hugh Dickins

On Mon, 2012-12-17 at 19:00 +0100, Andrea Arcangeli wrote:
> Hi,
> 
> I got a report for a minor regression introduced by commit
> 027ef6c87853b0a9df53175063028edb4950d476.
> 
> The problem is that pageattr creates kernel pagetables (ptes and pmds)
> that break pte_present/pmd_present, and the commit above exposed this
> invariant breakage for pmd_present.
> 
> The same problem already existed for ptes and pte_present, and it was
> fixed by commit 660a293ea9be709b893d371fbc0328fcca33c33a (if it weren't
> for that commit, it wouldn't even be a regression). That fix made the
> page fault handler avoid pte_present. I could follow through by
> stopping the use of pmd_present/pmd_huge there too.
> 
> However, I think it's more robust to fix pageattr so that it clears the
> PSE/GLOBAL bitflags in addition to the PRESENT bitflag. That way the
> kernel page fault handler can keep using the regular
> pte_present/pmd_present/pmd_huge.
> 
> The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE share the
> same bit, and in the pmd case we require _PAGE_PSE to be set only in
> present pmds (to facilitate the final TLB flush in split_huge_page).

Hi all,

One offline question.

Why do free pages need to be mapped at a kernel virtual address in
isolate_freepages()? If those pages are about to be used as migration
targets (compaction splits a free page, and the pages are only used to
copy data into), why does the kernel need to access them? These target
pages are isolated buddy pages that will become the migration targets
of normal processes' pages, so why does the kernel need to access them?

> 
> Andrea Arcangeli (2):
>   Revert "x86, mm: Make spurious_fault check explicitly check the
>     PRESENT bit"
>   pageattr: prevent PSE and GLOBAL leftovers from confusing
>     pmd/pte_present and pmd_huge
> 
>  arch/x86/mm/fault.c    |    8 +------
>  arch/x86/mm/pageattr.c |   50 +++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 48 insertions(+), 10 deletions(-)
> 

* Re: [PATCH 2/2] pageattr: prevent PSE and GLOBAL leftovers from confusing pmd/pte_present and pmd_huge
  2012-12-17 18:00 ` [PATCH 2/2] pageattr: prevent PSE and GLOBAL leftovers from confusing pmd/pte_present and pmd_huge Andrea Arcangeli
@ 2013-01-10  7:59   ` Simon Jeons
  0 siblings, 0 replies; 11+ messages in thread
From: Simon Jeons @ 2013-01-10  7:59 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andi Kleen, Andrew Morton, Shaohua Li, H. Peter Anvin,
	Mel Gorman, Hugh Dickins

On Mon, 2012-12-17 at 19:00 +0100, Andrea Arcangeli wrote:
> Without this patch, any kernel code that reads kernel memory through
> non-present kernel ptes/pmds (as set up by pageattr.c) will crash.
> 
> With this kernel code:
> 
> static struct page *crash_page;
> static unsigned long *crash_address;
> [..]
> 	crash_page = alloc_pages(GFP_KERNEL, 9);
> 	crash_address = page_address(crash_page);
> 	if (set_memory_np((unsigned long)crash_address, 1))
> 		printk("set_memory_np failure\n");
> [..]
> 
> The kernel will crash if, from inside the "crash" tool, one tries to
> read the memory at the non-present address.
> 
> crash> p crash_address
> crash_address = $8 = (long unsigned int *) 0xffff88023c000000
> crash> rd 0xffff88023c000000
> [ *lockup* ]
> 
> The lockup happens because _PAGE_GLOBAL and _PAGE_PROTNONE share the
> same bit, and pageattr leaves _PAGE_GLOBAL set on a non-present kernel
> pte, which is then mistaken for _PAGE_PROTNONE (so pte_present returns
> true by mistake and the kernel page fault handler then gets confused
> and loops).
> 
> With THP the same can happen after commit
> 027ef6c87853b0a9df53175063028edb4950d476 taught pmd_present to also
> check _PAGE_PROTNONE and _PAGE_PSE. THP has the same problem with
> _PAGE_GLOBAL as the 4k case, but it also has a problem with _PAGE_PSE,
> which must be cleared too.
> 
> After the patch is applied, copy_user correctly returns -EFAULT and
> doesn't lock up anymore.
> 
> crash> p crash_address
> crash_address = $9 = (long unsigned int *) 0xffff88023c000000
> crash> rd 0xffff88023c000000
> rd: read error: kernel virtual address: ffff88023c000000  type: "64-bit KVADDR"
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  arch/x86/mm/pageattr.c |   50 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 47 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index a718e0d..2713be4 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -445,6 +445,19 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,

Hi Andrea,

Since kernel_physical_mapping_init has already set up the identity
mapping at the pgd/pmd/pte levels, why does the large page need to be
preserved here?

cat /proc/meminfo

  DirectMap4k:       12280 kB
  DirectMap2M:      894976 kB

Why is DirectMap2M not equal to DirectMap4k? It seems that DirectMap2M
should consist of DirectMap4k.

>  	pgprot_val(req_prot) |= pgprot_val(cpa->mask_set);
>  
>  	/*
> +	 * Set the PSE and GLOBAL flags only if the PRESENT flag is
> +	 * set otherwise pmd_present/pmd_huge will return true even on
> +	 * a non present pmd. The canon_pgprot will clear _PAGE_GLOBAL
> +	 * for the ancient hardware that doesn't support it.
> +	 */
> +	if (pgprot_val(new_prot) & _PAGE_PRESENT)
> +		pgprot_val(new_prot) |= _PAGE_PSE | _PAGE_GLOBAL;
> +	else
> +		pgprot_val(new_prot) &= ~(_PAGE_PSE | _PAGE_GLOBAL);
> +
> +	new_prot = canon_pgprot(new_prot);
> +
> +	/*
>  	 * old_pte points to the large page base address. So we need
>  	 * to add the offset of the virtual address:
>  	 */
> @@ -489,7 +502,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
>  		 * The address is aligned and the number of pages
>  		 * covers the full page.
>  		 */
> -		new_pte = pfn_pte(pte_pfn(old_pte), canon_pgprot(new_prot));
> +		new_pte = pfn_pte(pte_pfn(old_pte), new_prot);
>  		__set_pmd_pte(kpte, address, new_pte);
>  		cpa->flags |= CPA_FLUSHTLB;
>  		do_split = 0;
> @@ -540,16 +553,35 @@ static int split_large_page(pte_t *kpte, unsigned long address)
>  #ifdef CONFIG_X86_64
>  	if (level == PG_LEVEL_1G) {
>  		pfninc = PMD_PAGE_SIZE >> PAGE_SHIFT;
> -		pgprot_val(ref_prot) |= _PAGE_PSE;
> +		/*
> +		 * Set the PSE flags only if the PRESENT flag is set
> +		 * otherwise pmd_present/pmd_huge will return true
> +		 * even on a non present pmd.
> +		 */
> +		if (pgprot_val(ref_prot) & _PAGE_PRESENT)
> +			pgprot_val(ref_prot) |= _PAGE_PSE;
> +		else
> +			pgprot_val(ref_prot) &= ~_PAGE_PSE;
>  	}
>  #endif
>  
>  	/*
> +	 * Set the GLOBAL flags only if the PRESENT flag is set
> +	 * otherwise pmd/pte_present will return true even on a non
> +	 * present pmd/pte. The canon_pgprot will clear _PAGE_GLOBAL
> +	 * for the ancient hardware that doesn't support it.
> +	 */
> +	if (pgprot_val(ref_prot) & _PAGE_PRESENT)
> +		pgprot_val(ref_prot) |= _PAGE_GLOBAL;
> +	else
> +		pgprot_val(ref_prot) &= ~_PAGE_GLOBAL;
> +
> +	/*
>  	 * Get the target pfn from the original entry:
>  	 */
>  	pfn = pte_pfn(*kpte);
>  	for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc)
> -		set_pte(&pbase[i], pfn_pte(pfn, ref_prot));
> +		set_pte(&pbase[i], pfn_pte(pfn, canon_pgprot(ref_prot)));
>  
>  	if (address >= (unsigned long)__va(0) &&
>  		address < (unsigned long)__va(max_low_pfn_mapped << PAGE_SHIFT))
> @@ -660,6 +692,18 @@ repeat:
>  		new_prot = static_protections(new_prot, address, pfn);
>  
>  		/*
> +		 * Set the GLOBAL flags only if the PRESENT flag is
> +		 * set otherwise pte_present will return true even on
> +		 * a non present pte. The canon_pgprot will clear
> +		 * _PAGE_GLOBAL for the ancient hardware that doesn't
> +		 * support it.
> +		 */
> +		if (pgprot_val(new_prot) & _PAGE_PRESENT)
> +			pgprot_val(new_prot) |= _PAGE_GLOBAL;
> +		else
> +			pgprot_val(new_prot) &= ~_PAGE_GLOBAL;
> +
> +		/*
>  		 * We need to keep the pfn from the existing PTE,
>  		 * after all we're only going to change it's attributes
>  		 * not the memory it points to
> 
