From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Sasha Levin <sasha.levin@oracle.com>,
Dave Jones <davej@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Cyrill Gorcunov <gorcunov@gmail.com>
Subject: [PATCH] x86,mm: fix pte_special versus pte_numa
Date: Tue, 12 Aug 2014 11:47:58 +0100
Message-ID: <20140812104758.GE7970@suse.de>
In-Reply-To: <53E989FB.5000904@oracle.com>

Sasha Levin has shown oopses on ffffea0003480048 and ffffea0003480008
at mm/memory.c:1132, running Trinity on different 3.16-rc-next kernels.
That line is where zap_pte_range() checks page->mapping to see if
PageAnon(page). Those addresses fit the page->mapping fields of the
struct pages for pfns d2001 and d2000, and in each dump a register or a
stack slot showed d2001730 or d2000730: pte flags 0x730 are PCD ACCESSED
PROTNONE SPECIAL IOMAP. Sasha's e820 map has a hole between cfffffff and
100000000, covering pfns d2000 and d2001, which would need special access.
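
The arithmetic above can be checked with a small standalone C sketch. It
assumes the 3.16-era x86_64 layout rather than quoting kernel headers:
flag bits PCD=4, ACCESSED=5, PROTNONE=8 (shared with GLOBAL), SPECIAL=9
(SOFTW1) and IOMAP=10 (SOFTW2), 4KiB pages, a vmemmap base of
0xffffea0000000000, a 64-byte struct page and page->mapping at offset 8.
All macro and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed x86 pte flag bits (3.16-era layout, not copied from headers). */
#define F_PRESENT   (1ULL << 0)
#define F_RW        (1ULL << 1)
#define F_USER      (1ULL << 2)
#define F_PWT       (1ULL << 3)
#define F_PCD       (1ULL << 4)
#define F_ACCESSED  (1ULL << 5)
#define F_DIRTY     (1ULL << 6)
#define F_PSE       (1ULL << 7)
#define F_PROTNONE  (1ULL << 8)		/* shares the GLOBAL bit */
#define F_SPECIAL   (1ULL << 9)		/* SOFTW1, also used as _PAGE_NUMA */
#define F_IOMAP     (1ULL << 10)	/* SOFTW2 */

/* Assumed x86_64 constants for this sketch only. */
#define ASSUMED_PAGE_SHIFT        12
#define ASSUMED_PFN_MASK          0x000ffffffffff000ULL
#define ASSUMED_VMEMMAP_BASE      0xffffea0000000000ULL
#define ASSUMED_STRUCT_PAGE_SIZE  64ULL
#define ASSUMED_MAPPING_OFFSET    8ULL	/* offsetof(struct page, mapping) */

/* Write the names of the set flag bits, lowest bit first, into buf. */
static void decode_pte_flags(uint64_t pte, char *buf, size_t len)
{
	static const struct { uint64_t bit; const char *name; } names[] = {
		{ F_PRESENT, "PRESENT" }, { F_RW, "RW" }, { F_USER, "USER" },
		{ F_PWT, "PWT" }, { F_PCD, "PCD" }, { F_ACCESSED, "ACCESSED" },
		{ F_DIRTY, "DIRTY" }, { F_PSE, "PSE" },
		{ F_PROTNONE, "PROTNONE" }, { F_SPECIAL, "SPECIAL" },
		{ F_IOMAP, "IOMAP" },
	};
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(pte & names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
}

/* Page frame number held in a pte value. */
static uint64_t pte_value_to_pfn(uint64_t pte)
{
	return (pte & ASSUMED_PFN_MASK) >> ASSUMED_PAGE_SHIFT;
}

/* Virtual address of page->mapping in the vmemmap struct page for pfn. */
static uint64_t mapping_field_addr(uint64_t pfn)
{
	return ASSUMED_VMEMMAP_BASE + pfn * ASSUMED_STRUCT_PAGE_SIZE +
	       ASSUMED_MAPPING_OFFSET;
}

/* Is a physical address strictly inside the e820 hole (cfffffff, 100000000)? */
static int in_e820_hole(uint64_t phys)
{
	return phys > 0xcfffffffULL && phys < 0x100000000ULL;
}
```

Decoding 0xd2000730 gives "PCD ACCESSED PROTNONE SPECIAL IOMAP" and pfn
0xd2000, whose page->mapping would sit at 0xffffea0003480008; pfn 0xd2001
gives 0xffffea0003480048, matching the two oops addresses.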

Commit c46a7c817e66 ("x86: define _PAGE_NUMA by reusing software bits on
the PMD and PTE levels") has broken vm_normal_page(): because pte_special()
now also requires _PAGE_PRESENT, a PROTNONE SPECIAL pte no longer passes
the pte_special() test, so zap_pte_range() goes on to try to access a
non-existent struct page.

Fix this by refining pte_special() (SPECIAL with PRESENT or PROTNONE) so
that it complements pte_numa() (SPECIAL with neither PRESENT nor PROTNONE);
on x86, _PAGE_NUMA and _PAGE_SPECIAL are the same bit.
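
A sketch of the old and new pte_special() tests next to pte_numa(),
reduced to the three flag bits that matter. It assumes the x86 encoding
in which _PAGE_NUMA and _PAGE_SPECIAL share one bit, as the comment added
to pgtable.h by this patch notes; the function names are hypothetical
models, not kernel functions. partitions_special_ptes() enumerates all
eight flag combinations and checks that every SPECIAL pte is claimed by
exactly one of the two predicates.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86 flag bits; _PAGE_NUMA reuses the SPECIAL (SOFTW1) bit. */
#define F_PRESENT   (1ULL << 0)
#define F_PROTNONE  (1ULL << 8)
#define F_SPECIAL   (1ULL << 9)
#define F_NUMA      F_SPECIAL

static_assert(F_NUMA == F_SPECIAL, "x86 shares the NUMA and SPECIAL bit");

/* Model of pte_special() before this patch: SPECIAL counts only if PRESENT. */
static int old_pte_special(uint64_t flags)
{
	return (flags & (F_PRESENT | F_SPECIAL)) == (F_PRESENT | F_SPECIAL);
}

/* Model of pte_special() after this patch: SPECIAL with PRESENT or PROTNONE. */
static int new_pte_special(uint64_t flags)
{
	return (flags & F_SPECIAL) && (flags & (F_PRESENT | F_PROTNONE));
}

/* Model of pte_numa(): NUMA with neither PRESENT nor PROTNONE. */
static int model_pte_numa(uint64_t flags)
{
	return (flags & (F_NUMA | F_PROTNONE | F_PRESENT)) == F_NUMA;
}

/*
 * Return 1 if, over all combinations of PRESENT/PROTNONE/SPECIAL, each
 * pte with SPECIAL set satisfies exactly one of special() and
 * model_pte_numa(), and no pte without SPECIAL satisfies either.
 */
static int partitions_special_ptes(int (*special)(uint64_t))
{
	static const uint64_t bits[3] = { F_PRESENT, F_PROTNONE, F_SPECIAL };
	unsigned int combo;

	for (combo = 0; combo < 8; combo++) {
		uint64_t flags = 0;
		int i, s, n;

		for (i = 0; i < 3; i++)
			if (combo & (1u << i))
				flags |= bits[i];
		s = special(flags);
		n = model_pte_numa(flags);
		if (flags & F_SPECIAL) {
			if (s + n != 1)
				return 0;
		} else if (s || n) {
			return 0;
		}
	}
	return 1;
}
```

For the oops flags 0x730, the old test returns 0, the new test returns 1
and model_pte_numa() returns 0: under the old definition that pte was
neither special nor NUMA, which is the gap vm_normal_page() fell through.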

A hint that this was a problem was that c46a7c817e66 added a pte_numa()
test to vm_normal_page(), and moved its is_zero_pfn() test from the slow
path to the fast path. This was papering over a pte_special() snag when
the zero page was encountered during zap. This patch reverts
vm_normal_page() to how it was before, relying on pte_special().
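
The effect on vm_normal_page() can be modelled very roughly. This sketch
keeps only the HAVE_PTE_SPECIAL fast-path test from the diff and the
placement of the is_zero_pfn() check, and reports whether a struct page
would be handed back to the caller (zap_pte_range() in the oops). The
VM_PFNMAP/VM_MIXEDMAP handling, highest_memmap_pfn and print_bad_pte()
are deliberately left out; ZERO_PFN_MODEL and all function names are
hypothetical, and zero-page ptes are assumed to carry SPECIAL.

```c
#include <assert.h>
#include <stdint.h>

#define F_PRESENT   (1ULL << 0)
#define F_PROTNONE  (1ULL << 8)
#define F_SPECIAL   (1ULL << 9)
#define F_NUMA      F_SPECIAL		/* assumed x86 aliasing */

#define ZERO_PFN_MODEL  0x1000ULL	/* hypothetical zero-page pfn */

static_assert(F_NUMA == F_SPECIAL, "x86 shares the NUMA and SPECIAL bit");

static int old_pte_special(uint64_t f)
{
	return (f & (F_PRESENT | F_SPECIAL)) == (F_PRESENT | F_SPECIAL);
}

static int new_pte_special(uint64_t f)
{
	return (f & F_SPECIAL) && (f & (F_PRESENT | F_PROTNONE));
}

static int model_pte_numa(uint64_t f)
{
	return (f & (F_NUMA | F_PROTNONE | F_PRESENT)) == F_NUMA;
}

/*
 * c46a7c817e66 version: non-special (or NUMA) ptes go to check_pfn,
 * where the zero page had to be filtered out.
 */
static int old_returns_page(uint64_t flags, uint64_t pfn)
{
	if (!old_pte_special(flags) || model_pte_numa(flags))
		return pfn != ZERO_PFN_MODEL;	/* check_pfn path */
	return 0;	/* special ptes never yield a struct page */
}

/*
 * Patched version: special ptes (zero page included) return NULL before
 * check_pfn, so that path no longer needs to test for the zero page.
 */
static int new_returns_page(uint64_t flags, uint64_t pfn)
{
	(void)pfn;
	return !new_pte_special(flags);	/* 1 means the check_pfn path */
}
```

In this model the old code returns a struct page for the oops pte, and
avoids doing so for a zero page in a PROT_NONE area only because of the
moved is_zero_pfn() check.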

This patch may still be incomplete: are there other places that need to
handle PROTNONE along with PRESENT? For example, pte_mknuma() clears
_PAGE_PRESENT and sets _PAGE_NUMA, but in a PROT_NONE area that would make
the pte pte_special(). This is side-stepped by the fact that NUMA hinting
faults skip PROT_NONE VMAs, and there is no case in which a NUMA hinting
fault on a PROT_NONE VMA would be interesting.
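
The pte_mknuma() caveat can be illustrated with the same flag model. The
sketch assumes pte_mknuma() simply clears PRESENT and sets NUMA, as
described above; model_pte_mknuma() and the other names are hypothetical.
A PROT_NONE pte on x86 carries PROTNONE without PRESENT, so marking it
for NUMA produces PROTNONE|SPECIAL, which the new pte_special() accepts
and pte_numa() rejects.

```c
#include <assert.h>
#include <stdint.h>

#define F_PRESENT   (1ULL << 0)
#define F_PROTNONE  (1ULL << 8)
#define F_SPECIAL   (1ULL << 9)
#define F_NUMA      F_SPECIAL		/* assumed x86 aliasing */

static_assert(F_NUMA == F_SPECIAL, "x86 reuses the SPECIAL bit for NUMA");

/* Model of the patched pte_special(). */
static int new_pte_special(uint64_t flags)
{
	return (flags & F_SPECIAL) && (flags & (F_PRESENT | F_PROTNONE));
}

/* Model of pte_numa(). */
static int model_pte_numa(uint64_t flags)
{
	return (flags & (F_NUMA | F_PROTNONE | F_PRESENT)) == F_NUMA;
}

/* Model of pte_mknuma() as described: clear PRESENT, set NUMA. */
static uint64_t model_pte_mknuma(uint64_t flags)
{
	return (flags & ~F_PRESENT) | F_NUMA;
}
```

This is why the patch leans on NUMA hinting faults skipping PROT_NONE
VMAs rather than on the pte encoding itself.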

Fixes: c46a7c817e66 ("x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels")
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: stable@vger.kernel.org [3.16]
---
arch/x86/include/asm/pgtable.h | 9 +++++++--
mm/memory.c | 7 +++----
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 0ec0560..aa97a07 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -131,8 +131,13 @@ static inline int pte_exec(pte_t pte)
 
 static inline int pte_special(pte_t pte)
 {
-	return (pte_flags(pte) & (_PAGE_PRESENT|_PAGE_SPECIAL)) ==
-		 (_PAGE_PRESENT|_PAGE_SPECIAL);
+	/*
+	 * See CONFIG_NUMA_BALANCING pte_numa in include/asm-generic/pgtable.h.
+	 * On x86 we have _PAGE_BIT_NUMA == _PAGE_BIT_GLOBAL+1 ==
+	 * __PAGE_BIT_SOFTW1 == _PAGE_BIT_SPECIAL.
+	 */
+	return (pte_flags(pte) & _PAGE_SPECIAL) &&
+		(pte_flags(pte) & (_PAGE_PRESENT|_PAGE_PROTNONE));
 }
 
 static inline unsigned long pte_pfn(pte_t pte)
diff --git a/mm/memory.c b/mm/memory.c
index 8b44f76..0a21f3d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -751,7 +751,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	unsigned long pfn = pte_pfn(pte);
 
 	if (HAVE_PTE_SPECIAL) {
-		if (likely(!pte_special(pte) || pte_numa(pte)))
+		if (likely(!pte_special(pte)))
 			goto check_pfn;
 		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
 			return NULL;
@@ -777,15 +777,14 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		}
 	}
 
+	if (is_zero_pfn(pfn))
+		return NULL;
 check_pfn:
 	if (unlikely(pfn > highest_memmap_pfn)) {
 		print_bad_pte(vma, addr, pte, NULL);
 		return NULL;
 	}
 
-	if (is_zero_pfn(pfn))
-		return NULL;
-
 	/*
 	 * NOTE! We still have PageReserved() pages in the page tables.
 	 * eg. VDSO mappings can cause them to exist.