From: Mel Gorman <mel@csn.ul.ie>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
starlight@binnacle.cx, Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, bugzilla-daemon@bugzilla.kernel.org,
bugme-daemon@bugzilla.kernel.org, Adam Litke <agl@us.ibm.com>,
Eric B Munson <ebmunson@us.ibm.com>,
riel@redhat.com, hugh.dickins@tiscali.co.uk, kenchen@google.com
Subject: Re: [Bugme-new] [Bug 13302] New: "bad pmd" on fork() of process with hugepage shared memory segments attached
Date: Fri, 22 May 2009 17:41:02 +0100
Message-ID: <20090522164101.GA9196@csn.ul.ie>
In-Reply-To: <20090521094057.63B8.A69D9226@jp.fujitsu.com>
On Thu, May 21, 2009 at 09:41:46AM +0900, KOSAKI Motohiro wrote:
> Hi
>
> > Basic and in this case, apparently the critical factor. This patch on
> > 2.6.27.7 makes the problem disappear as well by never setting VM_LOCKED on
> > hugetlb-backed VMAs. Obviously, it's a hatchet job and almost certainly the
> > wrong fix, but it indicates that the handling of VM_LOCKED && VM_HUGETLB
> > is wrong somewhere. Now I have a better idea of what to search for on
> > Friday. Thanks Lee.
> >
> > --- mm/mlock.c 2009-05-20 16:36:08.000000000 +0100
> > +++ mm/mlock-new.c 2009-05-20 16:28:17.000000000 +0100
> > @@ -64,7 +64,8 @@
> > * It's okay if try_to_unmap_one unmaps a page just after we
> > * set VM_LOCKED, make_pages_present below will bring it back.
> > */
> > - vma->vm_flags = newflags;
> > + if (!(vma->vm_flags & VM_HUGETLB))
>
> the meaning of this condition isn't obvious to me. Could you please
> consider adding a comment?
>

I should have used the helper, but anyway, the check was to see whether the
VMA was backed by hugetlbfs. This wasn't the right fix; it was only intended
to show that the problem had something to do with the VM_LOCKED flag.

The real problem has something to do with pagetable-sharing of hugetlb-backed
segments. After fork(), VM_LOCKED is cleared in the child, so when
huge_pmd_share() is called, some of the pagetables are shared and others are
not. I believe this results in pagetables being freed prematurely. I'm cc'ing
the author of the pagetable-sharing patch and those who acked it to see if
they can shed more light on whether this is the right patch. Kenneth, Hugh?

==== CUT HERE ====
x86: Ignore VM_LOCKED when determining if hugetlb-backed page tables can be shared or not

On x86 and x86-64, page tables may be shared between shared mappings backed
by hugetlbfs. As part of this, page_table_shareable() compares the vm_flags
of the two VMAs, and the page tables are only shared if the flags match. All
VMA flags are taken into account, including VM_LOCKED.

The problem is that VM_LOCKED is cleared on fork(). When a process with a
shared memory segment forks to exec() a helper, there will be shared VMAs
with different flags. The impact is that the shared segment is sometimes
considered shareable and other times not, depending on which process is
checking. A test process that forks and execs heavily can trigger a number
of "bad pmd" messages in the kernel log and leak hugepages.

I believe what happens is that the segment page tables are being shared but
the reference count is inaccurate depending on the ordering of events.

Strictly speaking, this affects mainline, but the problem is masked by the
changes made for CONFIG_UNEVICTABLE_LRU, as the kernel now never sets
VM_LOCKED on hugetlbfs-backed mappings. This does affect the stable 2.6.27
branch and distributions based on that kernel such as SLES 11.

This patch addresses the problem by comparing all flags except VM_LOCKED
when deciding whether pagetables should be shared for hugetlbfs-backed
mappings.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
arch/x86/mm/hugetlbpage.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 8f307d9..16e4bcc 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -26,12 +26,16 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 	unsigned long sbase = saddr & PUD_MASK;
 	unsigned long s_end = sbase + PUD_SIZE;
 
+	/* Allow segments to share if only one is locked */
+	unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED;
+	unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;
+
 	/*
 	 * match the virtual addresses, permission and the alignment of the
 	 * page table page.
 	 */
 	if (pmd_index(addr) != pmd_index(saddr) ||
-	    vma->vm_flags != svma->vm_flags ||
+	    vm_flags != svm_flags ||
 	    sbase < svma->vm_start || svma->vm_end < s_end)
 		return 0;
 