public inbox for kvm@vger.kernel.org
From: Greg Edwards <gedwards@ddn.com>
To: kvm@vger.kernel.org
Subject: BUG unpinning 1 GiB huge pages with KVM PCI assignment
Date: Mon, 28 Oct 2013 13:37:56 -0600	[thread overview]
Message-ID: <20131028193756.GA1653@psuche> (raw)

Using KVM PCI assignment with 1 GiB huge pages trips a BUG in 3.12.0-rc7, e.g.

# qemu-system-x86_64 \
	-m 8192 \
	-mem-path /var/lib/hugetlbfs/pagesize-1GB \
	-mem-prealloc \
	-enable-kvm \
	-device pci-assign,host=1:0.0 \
	-drive file=/var/tmp/vm.img,cache=none
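
For reference, the command above assumes a 1 GiB hugetlbfs pool is already reserved and mounted at /var/lib/hugetlbfs/pagesize-1GB. A minimal setup sketch (the eight-page pool matches -m 8192 above; the mount point is from the command line, everything else is an assumption):

```shell
# Reserve eight 1 GiB pages on the kernel command line -- 1 GiB pages
# generally cannot be allocated after boot once memory is fragmented:
#   hugepagesz=1G hugepages=8
# Then mount a hugetlbfs instance with a 1 GiB page size:
mkdir -p /var/lib/hugetlbfs/pagesize-1GB
mount -t hugetlbfs -o pagesize=1G none /var/lib/hugetlbfs/pagesize-1GB
```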


[  287.081736] ------------[ cut here ]------------
[  287.086364] kernel BUG at mm/hugetlb.c:654!
[  287.090552] invalid opcode: 0000 [#1] PREEMPT SMP 
[  287.095407] Modules linked in: pci_stub autofs4 sunrpc iptable_filter ip_tables ip6table_filter ip6_tables x_tables binfmt_misc freq_table processor x86_pkg_temp_thermal kvm_intel kvm crc32_pclmul microcode serio_raw i2c_i801 evdev sg igb i2c_algo_bit i2c_core ptp pps_core mlx4_core button ext4 jbd2 mbcache crc16 usbhid sd_mod
[  287.124916] CPU: 15 PID: 25668 Comm: qemu-system-x86 Not tainted 3.12.0-rc7 #1
[  287.132140] Hardware name: DataDirect Networks SFA12KX/SFA12000, BIOS 21.0m4 06/28/2013
[  287.140145] task: ffff88007c732e60 ti: ffff881ff1d3a000 task.ti: ffff881ff1d3a000
[  287.147620] RIP: 0010:[<ffffffff811395e1>]  [<ffffffff811395e1>] free_huge_page+0x1d1/0x1e0
[  287.155992] RSP: 0018:ffff881ff1d3ba88  EFLAGS: 00010213
[  287.161309] RAX: 0000000000000000 RBX: ffffffff818bcd80 RCX: 0000000000000012
[  287.168446] RDX: 020000000000400c RSI: 0000000000001000 RDI: 0000000040000000
[  287.175574] RBP: ffff881ff1d3bab8 R08: 0000000000000000 R09: 0000000000000002
[  287.182705] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea007c000000
[  287.189834] R13: 020000000000400c R14: 0000000000000000 R15: 00000000ffffffff
[  287.196964] FS:  00007f13722d5840(0000) GS:ffff88287f660000(0000) knlGS:0000000000000000
[  287.205048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  287.210790] CR2: ffffffffff600400 CR3: 0000001fee3f5000 CR4: 00000000001427e0
[  287.217918] Stack:
[  287.219931]  0000000000000001 ffffea007c000000 0000000001f00000 ffff881fe3d88500
[  287.227390]  00000000000e0000 00000000ffffffff ffff881ff1d3bad8 ffffffff81102f9c
[  287.234849]  0000000000000246 ffffea007c000000 ffff881ff1d3baf8 ffffffff811035c0
[  287.242308] Call Trace:
[  287.244762]  [<ffffffff81102f9c>] __put_compound_page+0x1c/0x30
[  287.250680]  [<ffffffff811035c0>] put_compound_page+0x80/0x200
[  287.256516]  [<ffffffff81103d05>] put_page+0x45/0x50
[  287.261487]  [<ffffffffa019f070>] kvm_release_pfn_clean+0x50/0x60 [kvm]
[  287.268098]  [<ffffffffa01a62d5>] kvm_iommu_put_pages+0xb5/0xe0 [kvm]
[  287.274542]  [<ffffffffa01a6315>] kvm_iommu_unmap_pages+0x15/0x20 [kvm]
[  287.281160]  [<ffffffffa01a638a>] kvm_iommu_unmap_memslots+0x6a/0x90 [kvm]
[  287.288038]  [<ffffffffa01a68b7>] kvm_assign_device+0xa7/0x140 [kvm]
[  287.294398]  [<ffffffffa01a5e6c>] kvm_vm_ioctl_assigned_device+0x78c/0xb40 [kvm]
[  287.301795]  [<ffffffff8113baa1>] ? alloc_pages_vma+0xb1/0x1b0
[  287.307632]  [<ffffffffa01a089e>] kvm_vm_ioctl+0x1be/0x5b0 [kvm]
[  287.313645]  [<ffffffff811220fd>] ? remove_vma+0x5d/0x70
[  287.318963]  [<ffffffff8103ecec>] ? __do_page_fault+0x1fc/0x4b0
[  287.324886]  [<ffffffffa01b49ec>] ? kvm_dev_ioctl_check_extension+0x8c/0xd0 [kvm]
[  287.332370]  [<ffffffffa019fba6>] ? kvm_dev_ioctl+0xa6/0x460 [kvm]
[  287.338551]  [<ffffffff8115e049>] do_vfs_ioctl+0x89/0x4c0
[  287.343953]  [<ffffffff8115e521>] SyS_ioctl+0xa1/0xb0
[  287.349007]  [<ffffffff814c1552>] system_call_fastpath+0x16/0x1b
[  287.355011] Code: e6 48 89 df 48 89 42 08 48 89 10 4d 89 54 24 20 4d 89 4c 24 28 e8 70 bc ff ff 48 83 6b 38 01 42 83 6c ab 08 01 eb 91 0f 0b eb fe <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 
[  287.374986] RIP  [<ffffffff811395e1>] free_huge_page+0x1d1/0x1e0
[  287.381007]  RSP <ffff881ff1d3ba88>
[  287.384508] ---[ end trace 82c719f97df2e524 ]---
[  287.389129] Kernel panic - not syncing: Fatal exception
[  287.394378] ------------[ cut here ]------------


This is on an Ivy Bridge system, so the IOMMU supports snoop control, hence the
map/unmap/map sequence on device assignment to get cache coherency right.  It
appears we are unpinning tail pages that were never pinned on the first pass
through kvm_iommu_map_memslots().  This kernel does not have THP enabled, if
that makes a difference.
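
To put a rough number on that: if the unmap side walks the slot at base-page
granularity, a single 1 GiB huge page spans a quarter-million 4 KiB gfns, so
even a small pin/unpin asymmetry blows the refcount well below zero.
Back-of-envelope only; the per-gfn-walk assumption is my reading of the
behavior above, not a verified code path:

```shell
huge=$((1 << 30))      # bytes in a 1 GiB huge page
base=$((1 << 12))      # bytes in a 4 KiB base page
gfns=$((huge / base))  # unpin candidates per huge page on a per-gfn walk
echo "${gfns} base pages per 1 GiB huge page"
```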

Interestingly, with this patch

  http://www.spinics.net/lists/kvm/msg97561.html

we no longer trip the BUG, but on qemu exit, we leak memory, as the huge pages
don't go back into the free pool.  It's likely just masking the original issue.
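
For anyone trying to reproduce, the leak shows up directly in the hugepage
counters.  A sketch of the check (the meminfo snapshot below is illustrative
for an 8-page pool, not captured from the machine above):

```shell
# Compare the 1 GiB pool before starting and after exiting qemu:
#   grep HugePages /proc/meminfo
# Illustrative post-exit snapshot when all pages have leaked:
meminfo='HugePages_Total:       8
HugePages_Free:        0'

# Every page should return to the free pool once qemu exits; any
# shortfall is pages stranded by the failed unpin path.
leaked=$(printf '%s\n' "$meminfo" | awk '
    /HugePages_Total/ { total = $2 }
    /HugePages_Free/  { free = $2 }
    END { print total - free }')
echo "leaked ${leaked} huge pages"
```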

I haven't been successful in finding the bug yet.  Ideas on where to look?

Greg


Thread overview: 5+ messages
2013-10-28 19:37 Greg Edwards [this message]
2013-10-29 23:19 ` BUG unpinning 1 GiB huge pages with KVM PCI assignment Greg Edwards
2013-11-01 17:47   ` Marcelo Tosatti
     [not found]     ` <20131101174734.GA27370-I4X2Mt4zSy4@public.gmane.org>
2013-11-01 18:01       ` Greg Edwards
2013-11-02  1:17         ` Marcelo Tosatti
