From: Simon Jeons <simon.jeons@gmail.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, Andi Kleen <andi@firstfloor.org>,
Andrew Morton <akpm@linux-foundation.org>,
Shaohua Li <shaohua.li@intel.com>,
"H. Peter Anvin" <hpa@linux.intel.com>,
Mel Gorman <mgorman@suse.de>, Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH 2/2] pageattr: prevent PSE and GLOABL leftovers to confuse pmd/pte_present and pmd_huge
Date: Thu, 10 Jan 2013 01:59:52 -0600 [thread overview]
Message-ID: <1357804792.6568.17.camel@kernel.cn.ibm.com> (raw)
In-Reply-To: <1355767224-13298-3-git-send-email-aarcange@redhat.com>
On Mon, 2012-12-17 at 19:00 +0100, Andrea Arcangeli wrote:
> Without this patch any kernel code that reads kernel memory in non
> present kernel pte/pmds (as set by pageattr.c) will crash.
>
> With this kernel code:
>
> static struct page *crash_page;
> static unsigned long *crash_address;
> [..]
> crash_page = alloc_pages(GFP_KERNEL, 9);
> crash_address = page_address(crash_page);
> if (set_memory_np((unsigned long)crash_address, 1))
> printk("set_memory_np failure\n");
> [..]
>
> The kernel will crash if inside the "crash tool" one would try to read
> the memory at the not present address.
>
> crash> p crash_address
> crash_address = $8 = (long unsigned int *) 0xffff88023c000000
> crash> rd 0xffff88023c000000
> [ *lockup* ]
>
> The lockup happens because _PAGE_GLOBAL and _PAGE_PROTNONE shares the
> same bit, and pageattr leaves _PAGE_GLOBAL set on a kernel pte which
> is then mistaken as _PAGE_PROTNONE (so pte_present returns true by
> mistake and the kernel fault then gets confused and loops).
>
> With THP the same can happen after we taught pmd_present to check
> _PAGE_PROTNONE and _PAGE_PSE in commit
> 027ef6c87853b0a9df53175063028edb4950d476. THP has the same problem
> with _PAGE_GLOBAL as the 4k pages, but it also has a problem with
> _PAGE_PSE, which must be cleared too.
>
> After the patch is applied copy_user correctly returns -EFAULT and
> doesn't lockup anymore.
>
> crash> p crash_address
> crash_address = $9 = (long unsigned int *) 0xffff88023c000000
> crash> rd 0xffff88023c000000
> rd: read error: kernel virtual address: ffff88023c000000 type: "64-bit KVADDR"
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
> arch/x86/mm/pageattr.c | 50 +++++++++++++++++++++++++++++++++++++++++++++--
> 1 files changed, 47 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index a718e0d..2713be4 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -445,6 +445,19 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
Hi Andrea,
Since function kernel_physical_mapping_init has already setup identity
mapping of pgd/pmd/pte, why need preserve large page here?
cat /proc/meminfo
> DirectMap4k: 12280 kB
> DirectMap2M: 894976 kB
Why DirectMap2M is not equal to DirectMap4k?
It seems that DirectMap2M should consist of DirectMap4K.
> pgprot_val(req_prot) |= pgprot_val(cpa->mask_set);
>
> /*
> + * Set the PSE and GLOBAL flags only if the PRESENT flag is
> + * set otherwise pmd_present/pmd_huge will return true even on
> + * a non present pmd. The canon_pgprot will clear _PAGE_GLOBAL
> + * for the ancient hardware that doesn't support it.
> + */
> + if (pgprot_val(new_prot) & _PAGE_PRESENT)
> + pgprot_val(new_prot) |= _PAGE_PSE | _PAGE_GLOBAL;
> + else
> + pgprot_val(new_prot) &= ~(_PAGE_PSE | _PAGE_GLOBAL);
> +
> + new_prot = canon_pgprot(new_prot);
> +
> + /*
> * old_pte points to the large page base address. So we need
> * to add the offset of the virtual address:
> */
> @@ -489,7 +502,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
> * The address is aligned and the number of pages
> * covers the full page.
> */
> - new_pte = pfn_pte(pte_pfn(old_pte), canon_pgprot(new_prot));
> + new_pte = pfn_pte(pte_pfn(old_pte), new_prot);
> __set_pmd_pte(kpte, address, new_pte);
> cpa->flags |= CPA_FLUSHTLB;
> do_split = 0;
> @@ -540,16 +553,35 @@ static int split_large_page(pte_t *kpte, unsigned long address)
> #ifdef CONFIG_X86_64
> if (level == PG_LEVEL_1G) {
> pfninc = PMD_PAGE_SIZE >> PAGE_SHIFT;
> - pgprot_val(ref_prot) |= _PAGE_PSE;
> + /*
> + * Set the PSE flags only if the PRESENT flag is set
> + * otherwise pmd_present/pmd_huge will return true
> + * even on a non present pmd.
> + */
> + if (pgprot_val(ref_prot) & _PAGE_PRESENT)
> + pgprot_val(ref_prot) |= _PAGE_PSE;
> + else
> + pgprot_val(ref_prot) &= ~_PAGE_PSE;
> }
> #endif
>
> /*
> + * Set the GLOBAL flags only if the PRESENT flag is set
> + * otherwise pmd/pte_present will return true even on a non
> + * present pmd/pte. The canon_pgprot will clear _PAGE_GLOBAL
> + * for the ancient hardware that doesn't support it.
> + */
> + if (pgprot_val(ref_prot) & _PAGE_PRESENT)
> + pgprot_val(ref_prot) |= _PAGE_GLOBAL;
> + else
> + pgprot_val(ref_prot) &= ~_PAGE_GLOBAL;
> +
> + /*
> * Get the target pfn from the original entry:
> */
> pfn = pte_pfn(*kpte);
> for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc)
> - set_pte(&pbase[i], pfn_pte(pfn, ref_prot));
> + set_pte(&pbase[i], pfn_pte(pfn, canon_pgprot(ref_prot)));
>
> if (address >= (unsigned long)__va(0) &&
> address < (unsigned long)__va(max_low_pfn_mapped << PAGE_SHIFT))
> @@ -660,6 +692,18 @@ repeat:
> new_prot = static_protections(new_prot, address, pfn);
>
> /*
> + * Set the GLOBAL flags only if the PRESENT flag is
> + * set otherwise pte_present will return true even on
> + * a non present pte. The canon_pgprot will clear
> + * _PAGE_GLOBAL for the ancient hardware that doesn't
> + * support it.
> + */
> + if (pgprot_val(new_prot) & _PAGE_PRESENT)
> + pgprot_val(new_prot) |= _PAGE_GLOBAL;
> + else
> + pgprot_val(new_prot) &= ~_PAGE_GLOBAL;
> +
> + /*
> * We need to keep the pfn from the existing PTE,
> * after all we're only going to change it's attributes
> * not the memory it points to
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-01-10 7:59 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-17 18:00 [PATCH 0/2] pageattr fixes for pmd/pte_present Andrea Arcangeli
2012-12-17 18:00 ` [PATCH 1/2] Revert "x86, mm: Make spurious_fault check explicitly check the PRESENT bit" Andrea Arcangeli
2012-12-17 18:17 ` H. Peter Anvin
2012-12-17 18:35 ` Andrea Arcangeli
2012-12-17 18:00 ` [PATCH 2/2] pageattr: prevent PSE and GLOABL leftovers to confuse pmd/pte_present and pmd_huge Andrea Arcangeli
2013-01-10 7:59 ` Simon Jeons [this message]
2013-01-06 2:59 ` [PATCH 0/2] pageattr fixes for pmd/pte_present Simon Jeons
2013-01-07 21:53 ` Andrew Morton
2013-01-07 21:55 ` H. Peter Anvin
2013-01-08 12:25 ` Andrea Arcangeli
2013-01-10 7:42 ` Simon Jeons
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1357804792.6568.17.camel@kernel.cn.ibm.com \
--to=simon.jeons@gmail.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=hpa@linux.intel.com \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=shaohua.li@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.