From: Greg Edwards <gedwards@ddn.com>
To: kvm@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Subject: Re: BUG unpinning 1 GiB huge pages with KVM PCI assignment
Date: Tue, 29 Oct 2013 17:19:43 -0600
Message-ID: <20131029231943.GA29828@psuche>
In-Reply-To: <20131028193756.GA1653@psuche>

On Mon, Oct 28, 2013 at 12:37:56PM -0700, Greg Edwards wrote:
> Using KVM PCI assignment with 1 GiB huge pages trips a BUG in 3.12.0-rc7, e.g.
>
> # qemu-system-x86_64 \
> -m 8192 \
> -mem-path /var/lib/hugetlbfs/pagesize-1GB \
> -mem-prealloc \
> -enable-kvm \
> -device pci-assign,host=1:0.0 \
> -drive file=/var/tmp/vm.img,cache=none
>
>
> [ 287.081736] ------------[ cut here ]------------
> [ 287.086364] kernel BUG at mm/hugetlb.c:654!
> [ 287.090552] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 287.095407] Modules linked in: pci_stub autofs4 sunrpc iptable_filter ip_tables ip6table_filter ip6_tables x_tables binfmt_misc freq_table processor x86_pkg_temp_thermal kvm_intel kvm crc32_pclmul microcode serio_raw i2c_i801 evdev sg igb i2c_algo_bit i2c_core ptp pps_core mlx4_core button ext4 jbd2 mbcache crc16 usbhid sd_mod
> [ 287.124916] CPU: 15 PID: 25668 Comm: qemu-system-x86 Not tainted 3.12.0-rc7 #1
> [ 287.132140] Hardware name: DataDirect Networks SFA12KX/SFA12000, BIOS 21.0m4 06/28/2013
> [ 287.140145] task: ffff88007c732e60 ti: ffff881ff1d3a000 task.ti: ffff881ff1d3a000
> [ 287.147620] RIP: 0010:[<ffffffff811395e1>] [<ffffffff811395e1>] free_huge_page+0x1d1/0x1e0
> [ 287.155992] RSP: 0018:ffff881ff1d3ba88 EFLAGS: 00010213
> [ 287.161309] RAX: 0000000000000000 RBX: ffffffff818bcd80 RCX: 0000000000000012
> [ 287.168446] RDX: 020000000000400c RSI: 0000000000001000 RDI: 0000000040000000
> [ 287.175574] RBP: ffff881ff1d3bab8 R08: 0000000000000000 R09: 0000000000000002
> [ 287.182705] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea007c000000
> [ 287.189834] R13: 020000000000400c R14: 0000000000000000 R15: 00000000ffffffff
> [ 287.196964] FS: 00007f13722d5840(0000) GS:ffff88287f660000(0000) knlGS:0000000000000000
> [ 287.205048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 287.210790] CR2: ffffffffff600400 CR3: 0000001fee3f5000 CR4: 00000000001427e0
> [ 287.217918] Stack:
> [ 287.219931] 0000000000000001 ffffea007c000000 0000000001f00000 ffff881fe3d88500
> [ 287.227390] 00000000000e0000 00000000ffffffff ffff881ff1d3bad8 ffffffff81102f9c
> [ 287.234849] 0000000000000246 ffffea007c000000 ffff881ff1d3baf8 ffffffff811035c0
> [ 287.242308] Call Trace:
> [ 287.244762] [<ffffffff81102f9c>] __put_compound_page+0x1c/0x30
> [ 287.250680] [<ffffffff811035c0>] put_compound_page+0x80/0x200
> [ 287.256516] [<ffffffff81103d05>] put_page+0x45/0x50
> [ 287.261487] [<ffffffffa019f070>] kvm_release_pfn_clean+0x50/0x60 [kvm]
> [ 287.268098] [<ffffffffa01a62d5>] kvm_iommu_put_pages+0xb5/0xe0 [kvm]
> [ 287.274542] [<ffffffffa01a6315>] kvm_iommu_unmap_pages+0x15/0x20 [kvm]
> [ 287.281160] [<ffffffffa01a638a>] kvm_iommu_unmap_memslots+0x6a/0x90 [kvm]
> [ 287.288038] [<ffffffffa01a68b7>] kvm_assign_device+0xa7/0x140 [kvm]
> [ 287.294398] [<ffffffffa01a5e6c>] kvm_vm_ioctl_assigned_device+0x78c/0xb40 [kvm]
> [ 287.301795] [<ffffffff8113baa1>] ? alloc_pages_vma+0xb1/0x1b0
> [ 287.307632] [<ffffffffa01a089e>] kvm_vm_ioctl+0x1be/0x5b0 [kvm]
> [ 287.313645] [<ffffffff811220fd>] ? remove_vma+0x5d/0x70
> [ 287.318963] [<ffffffff8103ecec>] ? __do_page_fault+0x1fc/0x4b0
> [ 287.324886] [<ffffffffa01b49ec>] ? kvm_dev_ioctl_check_extension+0x8c/0xd0 [kvm]
> [ 287.332370] [<ffffffffa019fba6>] ? kvm_dev_ioctl+0xa6/0x460 [kvm]
> [ 287.338551] [<ffffffff8115e049>] do_vfs_ioctl+0x89/0x4c0
> [ 287.343953] [<ffffffff8115e521>] SyS_ioctl+0xa1/0xb0
> [ 287.349007] [<ffffffff814c1552>] system_call_fastpath+0x16/0x1b
> [ 287.355011] Code: e6 48 89 df 48 89 42 08 48 89 10 4d 89 54 24 20 4d 89 4c 24 28 e8 70 bc ff ff 48 83 6b 38 01 42 83 6c ab 08 01 eb 91 0f 0b eb fe <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57
> [ 287.374986] RIP [<ffffffff811395e1>] free_huge_page+0x1d1/0x1e0
> [ 287.381007] RSP <ffff881ff1d3ba88>
> [ 287.384508] ---[ end trace 82c719f97df2e524 ]---
> [ 287.389129] Kernel panic - not syncing: Fatal exception
> [ 287.394378] ------------[ cut here ]------------
>
>
> This is on an Ivy Bridge system, so it has IOMMU with snoop control, hence the
> map/unmap/map sequence on device assignment to get the cache coherency right.
> It appears we are unpinning tail pages we never pinned the first time through
> kvm_iommu_map_memslots(). This kernel does not have THP enabled, if that makes
> a difference.

The issue here is that one of the 1 GiB huge pages is partially in one
memslot (memslot 1) and fully in another one (memslot 5). When the
memslots are pinned by kvm_iommu_map_pages(), we only pin the pages
once.

When we unmap them with kvm_iommu_put_pages(), half of the huge page is
unpinned when memslot 1 is unmapped/unpinned. When memslot 5 is
unpinned next, iommu_iova_to_phys() still returns values for the gfns
that were part of the partial huge page in memslot 1 (and also in
memslot 5), so we unpin those pages a second time, plus the rest of the
huge page that was in memslot 5 only, and then trip the bug when
page->_count reaches zero.
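
To make the sequence above concrete, here is a toy user-space model in
Python. It is only a sketch of the accounting described in this message,
not kernel code: ToyDomain, map_slot(), unmap_slot(), the pfn ranges and
PAGES_PER_HUGE = 4 are all made up for illustration. The rule that a
translation survives until a slot covering the whole huge page is
unmapped is an assumption standing in for the observed
iommu_iova_to_phys() behavior.

```python
from collections import Counter

PAGES_PER_HUGE = 4  # toy value; a real 1 GiB huge page holds 262144 4 KiB pages


class ToyDomain:
    """Hypothetical stand-in for the pin and translation state."""

    def __init__(self):
        self.translated = set()  # pfns the toy lookup still resolves
        self.pins = Counter()    # huge page index -> pins held on it

    def map_slot(self, pfns):
        for pfn in pfns:
            if pfn in self.translated:
                continue  # mapped by an earlier slot, so pinned only once
            self.pins[pfn // PAGES_PER_HUGE] += 1
            self.translated.add(pfn)

    def unmap_slot(self, pfns):
        pfns = list(pfns)
        covered = set(pfns)
        for pfn in pfns:
            if pfn not in self.translated:
                continue  # lookup fails, nothing to unpin
            huge = pfn // PAGES_PER_HUGE
            if self.pins[huge] == 0:
                raise RuntimeError(
                    f"unbalanced unpin of pfn {pfn} (huge page {huge})")
            self.pins[huge] -= 1
            # Assumption: the translation only goes away when the slot
            # being unmapped covers the whole huge page.
            first = huge * PAGES_PER_HUGE
            if all(p in covered for p in range(first, first + PAGES_PER_HUGE)):
                self.translated.discard(pfn)


def demo():
    """Replay map, map, unmap, unmap and return the error text, if any."""
    dom = ToyDomain()
    slot1 = range(0, 6)   # huge page 0 plus half of huge page 1
    slot5 = range(4, 12)  # all of huge pages 1 and 2
    dom.map_slot(slot1)
    dom.map_slot(slot5)
    dom.unmap_slot(slot1)
    try:
        dom.unmap_slot(slot5)
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(demo())
```

In this model the second unmap first drops the two pins that the first
unmap already released for pfns 4 and 5, then runs out of pins while
walking the rest of huge page 1 and raises at pfn 6.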

Is it expected that the same pages might be mapped in multiple memslots?
I noticed the gfn overlap check in __kvm_set_memory_region().

It appears pfn_to_dma_pte() is behaving as expected, given half the huge
page is still mapped. Do I have that correct? If so, then we really
can't rely on iommu_iova_to_phys() alone to determine whether it's safe
to unpin a page in kvm_iommu_put_pages().

Ideas on how to best handle this condition?

Greg

Thread overview: 5+ messages
2013-10-28 19:37 BUG unpinning 1 GiB huge pages with KVM PCI assignment Greg Edwards
2013-10-29 23:19 ` Greg Edwards [this message]
2013-11-01 17:47 ` Marcelo Tosatti
2013-11-01 18:01 ` Greg Edwards
2013-11-02 1:17 ` Marcelo Tosatti