From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>,
Hugh Dickins <hugh@veritas.com>, Ingo Molnar <mingo@elte.hu>,
Jeff Chua <jeff.chua.linux@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Gabriel C <nix.or.die@googlemail.com>,
Arjan van de Ven <arjan@linux.intel.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] x86: fix PAE pmd_bad bootup warning
Date: Thu, 8 May 2008 10:16:57 -0700
Message-ID: <20080508171657.GO23990@us.ibm.com>
In-Reply-To: <20080508165111.GI12654@escobedo.amd.com>

On 08.05.2008 [18:51:11 +0200], Hans Rosenfeld wrote:
> On Thu, May 08, 2008 at 09:33:52AM -0700, Nishanth Aravamudan wrote:
> > So this seems to lend credence to Dave's hypothesis. Without, as you
> > were trying before, teaching pagemap all about hugepages, what are our
> > options?
> >
> > Can we just skip over the current iteration of the PMD loop (would we
> > need something similar for the PTE loop for power?) if pmd_huge(pmd)?
>
> Allowing huge pages in the page walker would affect both walk_pmd_range
> and walk_pud_range. Then either the users of the page walker need to
> know how to handle huge pages themselves (in the pmd_entry and pud_entry
> callback functions), or the page walker treats huge pages as any other
> pages (calling the pte_entry callback function).

Right, I agree *if* we allow huge pages in the walker. But AIUI, things
are broken now with hugepages in the process' address space. This is a
bug upstream and leads to hugepages leaking out of the kernel when
/proc/pid/pagemap is read. Why not, instead (as a short-term fix), skip
hugepage mappings altogether in the page-walker code?

Hrm, upon further investigation, this seems to be a pretty clear
limitation of walk_page_range(). One that is avoided in the other two
callers, i.e.

static int show_smap(struct seq_file *m, void *v)
{
	...
	if (vma->vm_mm && !is_vm_hugetlb_page(vma))
		walk_page_range(vma->vm_mm, vma->vm_start, vma->vm_end,
				&smaps_walk, &mss);
	...
}

static ssize_t clear_refs_write(struct file *file, const char __user *buf,
				size_t count, loff_t *ppos)
{
	...
	for (vma = mm->mmap; vma; vma = vma->vm_next)
		if (!is_vm_hugetlb_page(vma))
			walk_page_range(mm, vma->vm_start, vma->vm_end,
					&clear_refs_walk, vma);
	...
}

No such protection exists for

static ssize_t pagemap_read(struct file *file, char __user *buf,
			    size_t count, loff_t *ppos);

So, is there any way to add an is_vm_hugetlb_page(vma) check into
pagemap_read()? Or can we modify walk_page_range() to take a vma and
skip the walk if is_vm_hugetlb_page(vma) is true [to avoid
complications down the road until hugepage walking is fixed]? I guess
the latter isn't possible for pagemap_read(), since we are just looking
at arbitrary addresses in the process space?

Dunno, seems quite clear that the bug is in pagemap_read(), not any
hugepage code, and that the simplest fix is to make pagemap_read() do
what the other walker-callers do, and skip hugepage regions.
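
To make the "arbitrary addresses" concern concrete, here is a rough
userspace model of the idea, not kernel code and not a proposed patch.
Everything in it (struct vma_model, VM_HUGETLB_MODEL,
walk_skipping_hugetlb(), count_bytes()) is a made-up stand-in for
illustration. It only shows that an arbitrary [start, end) range can be
clamped against each vma it overlaps, with the walk callback invoked
just for the non-hugetlb pieces:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's VM_HUGETLB flag. */
#define VM_HUGETLB_MODEL 0x1UL

/* Simplified model of a vma: [vm_start, vm_end) plus flags. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	struct vma_model *vm_next;
};

/* Models is_vm_hugetlb_page(): true if the vma is hugetlb-backed. */
static int is_hugetlb_model(const struct vma_model *vma)
{
	return (vma->vm_flags & VM_HUGETLB_MODEL) != 0;
}

/* Stand-in for a page walk over [start, end). */
typedef void (*walk_fn)(unsigned long start, unsigned long end, void *priv);

/*
 * Walk the parts of [start, end) that fall inside non-hugetlb vmas.
 * Returns the number of bytes skipped because they were in hugetlb vmas.
 * Holes between vmas are simply not walked in this model.
 */
static unsigned long walk_skipping_hugetlb(const struct vma_model *vma,
					   unsigned long start,
					   unsigned long end,
					   walk_fn walk, void *priv)
{
	unsigned long skipped = 0;

	assert(start <= end);

	for (; vma; vma = vma->vm_next) {
		/* Clamp the request to this vma. */
		unsigned long lo = vma->vm_start > start ? vma->vm_start : start;
		unsigned long hi = vma->vm_end < end ? vma->vm_end : end;

		if (lo >= hi)
			continue;	/* no overlap with the request */

		if (is_hugetlb_model(vma)) {
			skipped += hi - lo;	/* skip hugepage regions */
			continue;
		}

		walk(lo, hi, priv);
	}

	return skipped;
}

/* Example callback: accumulate the number of bytes walked. */
static void count_bytes(unsigned long start, unsigned long end, void *priv)
{
	*(unsigned long *)priv += end - start;
}
```

A caller would pass the head of the vma list, the requested range, and a
callback such as count_bytes() with a pointer to an unsigned long
counter. How the real pagemap_read() should report the skipped ranges to
userspace is a separate question that this model does not answer.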
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center