From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: James Custer <jcuster@sgi.com>,
x86@kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de,
mingo@redhat.com, hpa@zytor.com,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB
Date: Wed, 26 Nov 2014 08:17:15 +0200 [thread overview]
Message-ID: <20141126061715.GA17897@node.dhcp.inet.fi> (raw)
In-Reply-To: <20141125132304.ee475f4ea432a2e52ed4f8b5@linux-foundation.org>
On Tue, Nov 25, 2014 at 01:23:04PM -0800, Andrew Morton wrote:
> On Thu, 25 Sep 2014 09:40:24 -0500 James Custer <jcuster@sgi.com> wrote:
>
> > Superpages allocated by SGI's superpages module can be backed by 1GB pages,
> > but direct i/o cannot be used. The superpages module uses _PAGE_BIT_SPECIAL
> > to disable direct i/o because some code depends on the memory being backed
> > by page structures. But, because superpages have no backing page structures
> > this causes a panic.
> >
> > This is the way direct i/o on 1GB pages fails:
> >
> > BUG: unable to handle kernel paging request at ffffea0038000000
> > [60463.203795] IP: [<ffffffff8103c93a>] gup_huge_pud+0x9a/0xe0
> > [60463.210058] PGD 83ffd3067 PUD 83ffd2067 PMD 0
> > [60463.215052] Oops: 0000 [#1] SMP
> >
> > Stack traceback for pid 77136
> > 0xffff8867a88ae300 77136 74825 1 56 R 0xffff8867a88ae970 *readdirectsp
> > [<ffffffff8103c93a>] gup_huge_pud+0x9a/0xe0
> > [<ffffffff8103cc33>] gup_pud_range+0x173/0x1b0
> > [<ffffffff8103cd57>] get_user_pages_fast+0xe7/0x1b0
> > [<ffffffff8118eac3>] dio_get_page+0x83/0x150
> > [<ffffffff8118f641>] do_direct_IO+0x81/0x420
> > [<ffffffff8118fb89>] direct_io_worker+0x1a9/0x340
> > [<ffffffffa00c5de8>] ext3_direct_IO+0xe8/0x2c0 [ext3]
> > [<ffffffff810fa527>] generic_file_aio_read+0x237/0x260
> > [<ffffffff81159878>] do_sync_read+0xc8/0x110
> > [<ffffffff8115a027>] vfs_read+0xc7/0x130
> > [<ffffffff8115a193>] sys_read+0x53/0xa0
> > [<ffffffff81466192>] system_call_fastpath+0x16/0x1b
> >
> > gup_huge_pud() is trying to find the page structure, and with superpages there
> > is none.
> >
> > With direct i/o on 2MB pages:
> >
> > static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> > int write, struct page **pages, int *nr)
> > {
> > ...
> > if (pmd_none(pmd) || pmd_trans_splitting(pmd))
> > return 0;
> >
> > and pmd_trans_splitting() is testing _PAGE_SPLITTING, which is an alias
> > for _PAGE_SPECIAL which we set on the 2MB or 1GB pages mapped in by superpages.
> >
> > But gup_pud_range() has no such check:
> >
> > static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
> > int write, struct page **pages, int *nr)
> > {
> > ...
> > if (pud_none(pud))
> > return 0;
> >
> > Therefore direct i/o on 1GB pages attempts to get a page structure and panics.
> >
> > ...
> >
> > @@ -223,7 +221,7 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
> > pud_t pud = *pudp;
> >
> > next = pud_addr_end(addr, end);
> > - if (pud_none(pud))
> > + if (pud_none(pud) || (pud_val(pud) & _PAGE_SPECIAL))
> > return 0;
> > if (unlikely(pud_large(pud))) {
> > if (!gup_huge_pud(pud, addr, next, write, pages, nr))
>
> If I'm understanding it correctly, this patch is only needed by SGI's
> superpages module, yes?
>
> That being said, it looks like a reasonable precaution and we could
> easily carry it.
Previously we used PSE + SOFTW1 in pmd_t for pmd_trans_splitting().
I don't think it's good idea to reserve a bit in page table entries for
use-case kernel by itself doesn't support. Especially, that it's a bit in
present entry.
--
Kirill A. Shutemov
next prev parent reply other threads:[~2014-11-26 6:17 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-25 14:40 [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB James Custer
2014-11-25 21:23 ` Andrew Morton
2014-11-26 6:17 ` Kirill A. Shutemov [this message]
-- strict thread matches above, loose matches on Subject: below --
2014-09-24 17:19 James Custer
2014-09-24 22:04 ` David Rientjes
2014-09-25 10:39 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141126061715.GA17897@node.dhcp.inet.fi \
--to=kirill@shutemov.name \
--cc=akpm@linux-foundation.org \
--cc=hpa@zytor.com \
--cc=jcuster@sgi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=rientjes@google.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.