From: Nishanth Aravamudan <nacc@us.ibm.com>
To: agl@us.ibm.com, david@gibson.dropbear.id.au, ak@suse.de
Cc: linux-mm@kvack.org, discuss@x86-64.org
Subject: BUG in x86_64 hugepage support
Date: Tue, 14 Mar 2006 17:20:00 -0800 [thread overview]
Message-ID: <20060315012000.GC5526@us.ibm.com> (raw)
Hello,
While doing some testing of libhugetlbfs, I ran into the following BUGs
on my x86_64 box when checking mprotect with hugepages (running make
func in libhugetlbfs is all it took here) (distro is Ubuntu Dapper, runs
32-bit userspace).
[ 633.480724] ----------- [cut here ] --------- [please bite here ] ---------
[ 633.480733] Kernel BUG at ...rc6-mm1/arch/x86_64/mm/../../i386/mm/hugetlbpage.c:31
[ 633.480736] invalid opcode: 0000 [1] PREEMPT SMP
[ 633.480740] last sysfs file: /block/sdb/sdb1/stat
[ 633.480743] CPU 1
[ 633.480745] Modules linked in:
[ 633.480750] Pid: 7872, comm: mprotect Not tainted 2.6.16-rc6-mm1 #1
[ 633.480753] RIP: 0010:[<ffffffff80188e46>] <ffffffff80188e46>{huge_pte_alloc+230}
[ 633.480764] RSP: 0000:ffff81006675dd18 EFLAGS: 00010283
[ 633.480767] RAX: 0000000000000001 RBX: ffff8100648ce008 RCX: 0000000000000000
[ 633.480772] RDX: ffff81007c7aa560 RSI: 0000000055800000 RDI: ffff8100648e4480
[ 633.480776] RBP: ffff81006675dd38 R08: 00000000556c19fc R09: 00000000ffffd178
[ 633.480780] R10: ffff81006675c000 R11: 0000000000000246 R12: 0000000000000560
[ 633.480784] R13: ffff8100648e4480 R14: ffff8100648e4480 R15: 0000000000000000
[ 633.480788] FS: 00002ac688028e10(0000) GS:ffff81007f1b4740(0063) knlGS:00000000556c68e0
[ 633.480792] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 633.480795] CR2: 0000000055800000 CR3: 000000006593d000 CR4: 00000000000006e0
[ 633.480800] Process mprotect (pid: 7872, threadinfo ffff81006675c000, task ffff8100402fc750)
[ 633.480803] Stack: 0000000055800000 0000000000000000 0000000000000000 0000000055800000
[ 633.480809] ffff81006675dd88 ffffffff801c5383 ffff81006675de68 ffff8100734e3c60
[ 633.480817] ffff81006675dda8 ffff8100734e3c60
[ 633.480821] Call Trace: <ffffffff801c5383>{hugetlb_fault+51} <ffffffff801085ea>{__handle_mm_fault+90}
[ 633.480834] <ffffffff8010a727>{ia32_setup_sigcontext+327} <ffffffff80173fdb>{notifier_call_chain+43}
[ 633.480845] <ffffffff80173b19>{do_page_fault+1241} <ffffffff80171275>{_spin_unlock_irq+21}
[ 633.480854] <ffffffff80134cc5>{sys_rt_sigprocmask+229} <ffffffff8016c741>{error_exit+0}
[ 633.480868]
[ 633.480869] Code: 0f 0b 68 08 a6 4e 80 c2 1f 00 48 8b 5d e8 4c 8b 65 f0 48 89
[ 633.480881] RIP <ffffffff80188e46>{huge_pte_alloc+230} RSP <ffff81006675dd18>
[ 633.480888] ----------- [cut here ] --------- [please bite here ] ---------
[ 633.492589] Kernel BUG at ...rc6-mm1/arch/x86_64/mm/../../i386/mm/hugetlbpage.c:31
[ 633.492593] invalid opcode: 0000 [2] PREEMPT SMP
[ 633.492597] last sysfs file: /block/sdb/sdb1/stat
[ 633.492600] CPU 1
[ 633.492602] Modules linked in:
[ 633.492606] Pid: 7873, comm: mprotect Not tainted 2.6.16-rc6-mm1 #1
[ 633.492610] RIP: 0010:[<ffffffff80188e46>] <ffffffff80188e46>{huge_pte_alloc+230}
[ 633.492620] RSP: 0000:ffff81006675fd18 EFLAGS: 00010283
[ 633.492624] RAX: 0000000000000001 RBX: ffff810066745550 RCX: 0000000000000000
[ 633.492628] RDX: ffff81006596bab0 RSI: 00002aaaaac00000 RDI: ffff8100648e4480
[ 633.492632] RBP: ffff81006675fd38 R08: 0000000000000000 R09: 0000000000000000
[ 633.492635] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000ab0
[ 633.492639] R13: ffff8100648e4480 R14: ffff8100648e4480 R15: 0000000000000000
[ 633.492644] FS: 00002b745df2ce10(0000) GS:ffff81007f1b4740(0000) knlGS:00000000556ac6c0
[ 633.492647] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 633.492651] CR2: 00002aaaaac00000 CR3: 000000006670a000 CR4: 00000000000006e0
[ 633.492655] Process mprotect (pid: 7873, threadinfo ffff81006675e000, task ffff8100402fd530)
[ 633.492658] Stack: 00002aaaaac00000 0000000000000000 0000000000000003 00002aaaaac00000
[ 633.492665] ffff81006675fd88 ffffffff801c5383 ffff81006675fe68 ffff8100734e3af0
[ 633.492673] ffff81006675fd78 ffff8100734e3af0
[ 633.492677] Call Trace: <ffffffff801c5383>{hugetlb_fault+51} <ffffffff801085ea>{__handle_mm_fault+90}
[ 633.492690] <ffffffff80173fdb>{notifier_call_chain+43} <ffffffff80173b19>{do_page_fault+1241}
[ 633.492701] <ffffffff80123652>{sys_mprotect+1522} <ffffffff8016c741>{error_exit+0}
[ 633.492715]
[ 633.492716] Code: 0f 0b 68 08 a6 4e 80 c2 1f 00 48 8b 5d e8 4c 8b 65 f0 48 89
[ 633.492728] RIP <ffffffff80188e46>{huge_pte_alloc+230} RSP <ffff81006675fd18>
The line in question (arch/i386/mm/hugetlbpage.c:31) in 2.6.16-rc6-mm1
is:
BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
We are trying to verify that if the pte was succesfully allocated that
it is filled in and that it is a hugetlb pte.
After some discussion with Adam Litke, I added some debugging to see
what pte_val we were getting:
huge_pte_alloc failed: pte == 800000003d800027
which indicates our flags = 0x27 or 00100111.
On x86_64, pte_huge is defined to be:
#define __LARGE_PTE (_PAGE_PSE|_PAGE_PRESENT) // __LARGE_PTE = 10000001
static inline int pte_huge(pte_t pte) { return (pte_val(pte) & __LARGE_PTE) == __LARGE_PTE; }
Clearly, pte_huge() is going to return 0, as
pte_val(pte) & __LARGE_PTE == 0x1 != __LARGE_PTE
in this case.
I believe the issue occurs due to the following code path:
sys_mprotect() --> hugetlb_change_protection() --> pte_modify()
On x86_64, that last call is:
#defined _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY) // upper bits all 1, lower 11 bits = 00001100000
unsigned long __supported_pte_mask __read_mostly = ~0UL;
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
pte_val(pte) &= _PAGE_CHG_MASK;
pte_val(pte) |= pgprot_val(newprot);
pte_val(pte) &= __supported_pte_mask;
return pte;
}
So, the first &= results in the lower 11 bits of pte_val(pte) being all
0s. By my analysis, this is the problem, pte_modify() on x86_64 is
clearing the bits we check to see if a pte is a hugetlb one. To see if
this might be an accurate analysis, I modified _PAGE_CHG_MASK as
follows:
-#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE | _PAGE_PRESENT)
That is, forcing the bits we care about to get set in pte_modify(). This
removed the BUG()s I was seeing in our testing.
This obviously isn't a solution, though, but I don't know what is :) I
am hoping somebody with a bit more VM (or x86-64) experience can figure
out the right fix. I would appreciate any input, or corrections to my
analysis.
Thanks,
Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next reply other threads:[~2006-03-15 1:20 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-03-15 1:20 Nishanth Aravamudan [this message]
2006-03-15 4:03 ` BUG in x86_64 hugepage support Chen, Kenneth W
2006-03-15 4:35 ` Nishanth Aravamudan
2006-03-15 7:08 ` Chen, Kenneth W
2006-03-15 7:30 ` Nishanth Aravamudan
2006-03-15 8:50 ` [discuss] " Jan Beulich
2006-03-15 10:03 ` Chen, Kenneth W
2006-03-15 15:14 ` Nishanth Aravamudan
2006-03-15 15:56 ` Nishanth Aravamudan
2006-03-15 15:13 ` Nishanth Aravamudan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060315012000.GC5526@us.ibm.com \
--to=nacc@us.ibm.com \
--cc=agl@us.ibm.com \
--cc=ak@suse.de \
--cc=david@gibson.dropbear.id.au \
--cc=discuss@x86-64.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.