From: Andrew Morton <akpm@linux-foundation.org>
To: James Custer <jcuster@sgi.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de,
mingo@redhat.com, hpa@zytor.com,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB
Date: Tue, 25 Nov 2014 13:23:04 -0800 [thread overview]
Message-ID: <20141125132304.ee475f4ea432a2e52ed4f8b5@linux-foundation.org> (raw)
In-Reply-To: <1411656024-33114-1-git-send-email-jcuster@sgi.com>
On Thu, 25 Sep 2014 09:40:24 -0500 James Custer <jcuster@sgi.com> wrote:
> Superpages allocated by SGI's superpages module can be backed by 1GB pages,
> but direct i/o cannot be used. The superpages module uses _PAGE_BIT_SPECIAL
> to disable direct i/o because some code depends on the memory being backed
> by page structures. But, because superpages have no backing page structures
> this causes a panic.
>
> This is the way direct i/o on 1GB pages fails:
>
> BUG: unable to handle kernel paging request at ffffea0038000000
> [60463.203795] IP: [<ffffffff8103c93a>] gup_huge_pud+0x9a/0xe0
> [60463.210058] PGD 83ffd3067 PUD 83ffd2067 PMD 0
> [60463.215052] Oops: 0000 [#1] SMP
>
> Stack traceback for pid 77136
> 0xffff8867a88ae300 77136 74825 1 56 R 0xffff8867a88ae970 *readdirectsp
> [<ffffffff8103c93a>] gup_huge_pud+0x9a/0xe0
> [<ffffffff8103cc33>] gup_pud_range+0x173/0x1b0
> [<ffffffff8103cd57>] get_user_pages_fast+0xe7/0x1b0
> [<ffffffff8118eac3>] dio_get_page+0x83/0x150
> [<ffffffff8118f641>] do_direct_IO+0x81/0x420
> [<ffffffff8118fb89>] direct_io_worker+0x1a9/0x340
> [<ffffffffa00c5de8>] ext3_direct_IO+0xe8/0x2c0 [ext3]
> [<ffffffff810fa527>] generic_file_aio_read+0x237/0x260
> [<ffffffff81159878>] do_sync_read+0xc8/0x110
> [<ffffffff8115a027>] vfs_read+0xc7/0x130
> [<ffffffff8115a193>] sys_read+0x53/0xa0
> [<ffffffff81466192>] system_call_fastpath+0x16/0x1b
>
> gup_huge_pud() is trying to find the page structure, and with superpages there
> is none.
>
> With direct i/o on 2MB pages:
>
> static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> int write, struct page **pages, int *nr)
> {
> ...
> if (pmd_none(pmd) || pmd_trans_splitting(pmd))
> return 0;
>
> and pmd_trans_splitting() is testing _PAGE_SPLITTING, which is an alias
> for _PAGE_SPECIAL which we set on the 2MB or 1GB pages mapped in by superpages.
>
> But gup_pud_range() has no such check:
>
> static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
> int write, struct page **pages, int *nr)
> {
> ...
> if (pud_none(pud))
> return 0;
>
> Therefore direct i/o on 1GB pages attempts to get a page structure and panics.
>
> ...
>
> @@ -223,7 +221,7 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
> pud_t pud = *pudp;
>
> next = pud_addr_end(addr, end);
> - if (pud_none(pud))
> + if (pud_none(pud) || (pud_val(pud) & _PAGE_SPECIAL))
> return 0;
> if (unlikely(pud_large(pud))) {
> if (!gup_huge_pud(pud, addr, next, write, pages, nr))
If I'm understanding it correctly, this patch is only needed by SGI's
superpages module, yes?
That being said, it looks like a reasonable precaution and we could
easily carry it.
But please, this check needs a very good code comment explaining to
readers why it is here and what it is doing. Probably that comment
should also mention that the mainline kernel doesn't strictly need the
check.
Also, open-coding the flag test looks rather inconsistent. Should we
have some (documented, please) helper macro to perform this test?
next prev parent reply other threads:[~2014-11-25 21:23 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-25 14:40 [PATCH] x86: Allow 1GB pages to be SPECIAL similar to 2MB James Custer
2014-11-25 21:23 ` Andrew Morton [this message]
2014-11-26 6:17 ` Kirill A. Shutemov
-- strict thread matches above, loose matches on Subject: below --
2014-09-24 17:19 James Custer
2014-09-24 22:04 ` David Rientjes
2014-09-25 10:39 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141125132304.ee475f4ea432a2e52ed4f8b5@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=hpa@zytor.com \
--cc=jcuster@sgi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=rientjes@google.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox