public inbox for linux-kernel@vger.kernel.org
* [PATCH 1/2] thp: Add compound tail page _mapcount when mapped
@ 2011-11-25  5:47 Youquan Song
From: Youquan Song @ 2011-11-25  5:47 UTC
  To: linux-kernel, akpm, aarcange, wli
  Cc: david.woodhouse, allen.m.kay, mtosatti, chrisw, andi,
	chaohong.guo, Youquan Song, Youquan Song

With a 3.2-rc kernel, using 2M pages with the IOMMU in KVM works. When I try
to use 1GB pages with the IOMMU in KVM, however, I hit an oops and the 1GB
pages cannot be used at all. The root cause is that 1GB pages are pinned
through gup_huge_pud() while 2M pages go through gup_huge_pmd(). When
compound pages are used and the page is a tail page, gup_huge_pmd()
increments the tail page's _mapcount to record that it is mapped, but
gup_huge_pud() does not. So when the pinned page is later released, the
kernel oopses because the tail page was never marked as mapped.
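
To make the imbalance concrete, here is a minimal userspace model, not kernel code. struct model_page, model_get_tail(), model_gup() and model_put() are hypothetical stand-ins. The model assumes that an unmapped tail page starts with _mapcount at -1 and that the release path (put_compound_page() in the trace below) BUGs when a tail page was never given a reference. Passing with_tail_ref=false mimics gup_huge_pud() before this patch; true mimics gup_huge_pmd().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct page: only the two
 * fields this model needs. Not the kernel's real layout. */
struct model_page {
	bool tail;     /* models PageTail() */
	int mapcount;  /* models _mapcount; -1 means "not mapped" */
};

/* Models get_huge_page_tail(): record a reference on a tail page. */
static void model_get_tail(struct model_page *p)
{
	p->mapcount++;
}

/* Models the pinning loop. with_tail_ref = true behaves like
 * gup_huge_pmd(); false behaves like gup_huge_pud() before the fix. */
static void model_gup(struct model_page *pages, int n, bool with_tail_ref)
{
	for (int i = 0; i < n; i++)
		if (with_tail_ref && pages[i].tail)
			model_get_tail(&pages[i]);
}

/* Models the release side. Returns false where the real kernel would
 * hit BUG(): a tail page whose reference was never recorded. */
static bool model_put(struct model_page *p)
{
	if (p->tail) {
		if (p->mapcount < 0)
			return false;
		p->mapcount--;
	}
	return true;
}
```

With with_tail_ref=true every tail page can be released once; with false, the first model_put() on a tail page fails, which corresponds to the BUG in the trace.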

This patch adds the same tail-page handling to gup_huge_pud() for 1GB huge
pages that gup_huge_pmd() already performs for 2M pages.
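
A rough sketch of the patched loop shape, using a simplified page model in which sketch_page, sketch_get_huge_page_tail() and sketch_gup_huge() are hypothetical names. It leaves out the real function's pud checks and the head-page reference taken after the loop, and it replaces the addr/end loop bound with a plain subpage count:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified page: not the kernel's struct page. */
struct sketch_page {
	bool tail;     /* stands in for PageTail() */
	int mapcount;  /* stands in for _mapcount; -1 means "not mapped" */
};

/* Stand-in for get_huge_page_tail(): bump the tail page's _mapcount. */
static void sketch_get_huge_page_tail(struct sketch_page *page)
{
	page->mapcount++;
}

/* Mirrors the shape of the patched gup_huge_pud() loop: every subpage
 * is stored in pages[], and tail subpages now get the same reference
 * that gup_huge_pmd() already takes for 2M pages.
 * nr_subpages must be >= 1 (do/while, as in the kernel loop).
 * Returns the number of subpages pinned (the kernel's "refs"). */
static int sketch_gup_huge(struct sketch_page *page, int nr_subpages,
			   struct sketch_page **pages, int *nr)
{
	int refs = 0;

	do {
		pages[*nr] = page;
		if (page->tail)        /* the two lines this patch adds */
			sketch_get_huge_page_tail(page);
		(*nr)++;
		page++;
		refs++;
	} while (refs < nr_subpages);
	return refs;
}
```

Taking the tail reference in the same loop that fills pages[] means every entry handed back to the caller already carries the reference that the release path later drops.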

To reproduce:
1. Add the grub boot option: hugepagesz=1G hugepages=8
2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
   -net none -device pci-assign,host=07:00.1
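
Step 1 only matters if the kernel actually booted with those options. As a small, hypothetical helper (count_1g_hugepages() is not part of any existing tool), the sketch below scans a kernel command line string and reports how many 1G pages it requests. It assumes the usual rule that hugepages=N applies to the hugepagesz= preceding it, and it only recognizes the literal sizes 1G and 1g.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the number of 1G huge pages requested
 * on a kernel command line, or 0 if none. A "hugepages=N" token is
 * counted only when the most recent "hugepagesz=" token was 1G. */
static long count_1g_hugepages(const char *cmdline)
{
	char buf[1024];
	long count = 0;
	int size_is_1g = 0;

	/* strtok() modifies its input, so work on a bounded copy. */
	strncpy(buf, cmdline, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (char *tok = strtok(buf, " \t\n"); tok != NULL;
	     tok = strtok(NULL, " \t\n")) {
		if (strncmp(tok, "hugepagesz=", 11) == 0)
			size_is_1g = strcmp(tok + 11, "1G") == 0 ||
				     strcmp(tok + 11, "1g") == 0;
		else if (strncmp(tok, "hugepages=", 10) == 0 && size_is_1g)
			count = strtol(tok + 10, NULL, 10);
	}
	return count;
}
```

On a running host the string can be read from /proc/cmdline; a result of 0 suggests the 1G boot option from step 1 is not in effect.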

kernel BUG at mm/swap.c:114!
invalid opcode: 0000 [#1] SMP
Call Trace:
 [<ffffffff81127482>] put_page+0x15/0x37
 [<ffffffff810067c4>] kvm_release_pfn_clean+0x31/0x36
 [<ffffffff8100b69c>] kvm_iommu_put_pages+0x94/0xb1
 [<ffffffff8100b739>] kvm_iommu_unmap_memslots+0x80/0xb6
 [<ffffffff8100b6b9>] ? kvm_iommu_put_pages+0xb1/0xb1
 [<ffffffff81425cf3>] ? intel_iommu_attach_device+0x13b/0x144
 [<ffffffff8100bc03>] kvm_assign_device+0xba/0x117
 [<ffffffff8100aec2>] kvm_vm_ioctl_assigned_device+0x301/0xa47
 [<ffffffff8100ac6d>] ? kvm_vm_ioctl_assigned_device+0xac/0xa47
 [<ffffffff8104f2a6>] ? native_sched_clock+0x32/0x6b
 [<ffffffff810b0be2>] ? sched_clock_cpu+0x45/0xd4
 [<ffffffff810bc430>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff810b0cb2>] ? local_clock+0x41/0x5a
 [<ffffffff810bc881>] ? lock_release_holdtime+0x2c/0x129
 [<ffffffff8115760d>] ? cmpxchg_double_slab+0xd0/0x12b
 [<ffffffff81248f27>] ? avc_has_perm_noaudit+0x388/0x399
 [<ffffffff8104f2a6>] ? native_sched_clock+0x32/0x6b
 [<ffffffff8104f2e8>] ? sched_clock+0x9/0xd
 [<ffffffff81007dcb>] kvm_vm_ioctl+0x36c/0x3a2
 [<ffffffff8104f2a6>] ? native_sched_clock+0x32/0x6b
 [<ffffffff8104f2e8>] ? sched_clock+0x9/0xd
 [<ffffffff81174af0>] do_vfs_ioctl+0x49e/0x4e4
 [<ffffffff81174b90>] sys_ioctl+0x5a/0x7c
 [<ffffffff81500dc2>] system_call_fastpath+0x16/0x1b
RIP  [<ffffffff811273d9>] put_compound_page+0xd4/0x168

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
---
 arch/x86/mm/gup.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index ea30585..dd74e46 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -201,6 +201,8 @@ static noinline int gup_huge_pud(pud_t pud, unsigned long addr,
 	do {
 		VM_BUG_ON(compound_head(page) != head);
 		pages[*nr] = page;
+		if (PageTail(page))
+			get_huge_page_tail(page);
 		(*nr)++;
 		page++;
 		refs++;
-- 
1.6.4.2


* [PATCH 1/2] thp: Add compound tail page _mapcount when mapped
@ 2011-11-26  3:23 Youquan Song
From: Youquan Song @ 2011-11-26  3:23 UTC
  To: linux-kernel, akpm, aarcange
  Cc: stable, david.woodhouse, allen.m.kay, mtosatti, chrisw, andi,
	chaohong.guo, Youquan Song, Youquan Song

With a 3.2-rc kernel, using 2MiB pages with the IOMMU in KVM works. When I
try to use 1GiB pages with the IOMMU in KVM, however, I hit an oops and the
1GiB pages cannot be used at all. The root cause is that 1GiB pages are
pinned through gup_huge_pud() while 2MiB pages go through gup_huge_pmd().
When compound pages are used and the page is a tail page, gup_huge_pmd()
increments the tail page's _mapcount to record that it is mapped, but
gup_huge_pud() does not. So when the pinned page is later released, the
kernel oopses because the tail page was never marked as mapped.

This patch adds the same tail-page handling to gup_huge_pud() for 1GiB huge
pages that gup_huge_pmd() already performs for 2MiB pages.

To reproduce:
1. Add the grub boot option: hugepagesz=1G hugepages=8
2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
   -net none -device pci-assign,host=07:00.1

kernel BUG at mm/swap.c:114!
invalid opcode: 0000 [#1] SMP
Call Trace:
 [<ffffffff81127482>] put_page+0x15/0x37
 [<ffffffff810067c4>] kvm_release_pfn_clean+0x31/0x36
 [<ffffffff8100b69c>] kvm_iommu_put_pages+0x94/0xb1
 [<ffffffff8100b739>] kvm_iommu_unmap_memslots+0x80/0xb6
 [<ffffffff8100b6b9>] ? kvm_iommu_put_pages+0xb1/0xb1
 [<ffffffff81425cf3>] ? intel_iommu_attach_device+0x13b/0x144
 [<ffffffff8100bc03>] kvm_assign_device+0xba/0x117
 [<ffffffff8100aec2>] kvm_vm_ioctl_assigned_device+0x301/0xa47
 [<ffffffff8100ac6d>] ? kvm_vm_ioctl_assigned_device+0xac/0xa47
 [<ffffffff8104f2a6>] ? native_sched_clock+0x32/0x6b
 [<ffffffff810b0be2>] ? sched_clock_cpu+0x45/0xd4
 [<ffffffff810bc430>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff810b0cb2>] ? local_clock+0x41/0x5a
 [<ffffffff810bc881>] ? lock_release_holdtime+0x2c/0x129
 [<ffffffff8115760d>] ? cmpxchg_double_slab+0xd0/0x12b
 [<ffffffff81248f27>] ? avc_has_perm_noaudit+0x388/0x399
 [<ffffffff8104f2a6>] ? native_sched_clock+0x32/0x6b
 [<ffffffff8104f2e8>] ? sched_clock+0x9/0xd
 [<ffffffff81007dcb>] kvm_vm_ioctl+0x36c/0x3a2
 [<ffffffff8104f2a6>] ? native_sched_clock+0x32/0x6b
 [<ffffffff8104f2e8>] ? sched_clock+0x9/0xd
 [<ffffffff81174af0>] do_vfs_ioctl+0x49e/0x4e4
 [<ffffffff81174b90>] sys_ioctl+0x5a/0x7c
 [<ffffffff81500dc2>] system_call_fastpath+0x16/0x1b
RIP  [<ffffffff811273d9>] put_compound_page+0xd4/0x168

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> # 3.0.x
Signed-off-by: Youquan Song <youquan.song@intel.com>
---
 arch/x86/mm/gup.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index ea30585..dd74e46 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -201,6 +201,8 @@ static noinline int gup_huge_pud(pud_t pud, unsigned long addr,
 	do {
 		VM_BUG_ON(compound_head(page) != head);
 		pages[*nr] = page;
+		if (PageTail(page))
+			get_huge_page_tail(page);
 		(*nr)++;
 		page++;
 		refs++;
-- 
1.6.4.2




Thread overview: 8+ messages
2011-11-25  5:47 [PATCH 1/2] thp: Add compound tail page _mapcount when mapped Youquan Song
2011-11-25  5:47 ` [PATCH 2/2] thp: Set compound tail page _count to zero Youquan Song
2011-11-25 13:12 ` [PATCH 1/2] thp: Add compound tail page _mapcount when mapped Andrea Arcangeli
2011-11-29  0:16 ` Andrew Morton
2011-11-29  0:19   ` Andi Kleen
2011-11-29  0:58     ` Andrea Arcangeli
2011-11-29 18:00       ` Youquan Song
  -- strict thread matches above, loose matches on Subject: below --
2011-11-26  3:23 Youquan Song
