* mmapping physical memory
@ 2013-08-26 11:58 Anatoly Burakov
2013-08-26 14:19 ` Andrea Arcangeli
From: Anatoly Burakov @ 2013-08-26 11:58 UTC
To: kvm
Hi all
I am using IVSHMEM to mmap /dev/mem into the guest. The mmap works fine in
QEMU without KVM support enabled, but with KVM I get kernel errors:
***************************** (with EPT enabled)
[ 746.940720] ------------[ cut here ]------------
[ 746.948612] kernel BUG at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1257!
[ 746.949067] invalid opcode: 0000 [#1] SMP
[ 746.949393] Modules linked in: rte_kni(OF) igb_uio(OF)
ebtable_nat(F) xt_CHECKSUM(F) bridge(F) stp(F) llc(F)
nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F)
ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F)
nf_defrag_ipv6(F) bnep(F) bluetooth(F) rfkill(F) iptable_nat(F)
nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) nf_conntrack_ipv4(F)
nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F)
ebtables(F) ip6table_filter(F) ip6_tables(F) be2iscsi(F)
iscsi_boot_sysfs(F) bnx2i(F) cnic(F) uio(F) cxgb4i(F) cxgb4(F)
cxgb3i(F) cxgb3(F) libcxgbi(F) ib_iser(F) rdma_cm(F) ib_addr(F)
iw_cm(F) ib_cm(F) ib_sa(F) ib_mad(F) ib_core(F) iscsi_tcp(F)
libiscsi_tcp(F) libiscsi(F) scsi_transport_iscsi(F) iTCO_wdt(F)
iTCO_vendor_support(F) acpi_cpufreq(F) mperf(F) coretemp(F) shpchp(F)
[ 747.014963] lpc_ich(F) mfd_core(F) i2c_i801(F) ioatdma(F)
microcode(F) joydev(F) i7core_edac(F) edac_core(F) vhost_net(F) tun(F)
macvtap(F) macvlan(F) kvm_intel(F) kvm(F) uinput(F) crc32_pclmul(F)
crc32c_intel(F) ghash_clmulni_intel(F) ast(F) ixgbe(F) igb(F)
drm_kms_helper(F) e1000e(F) dca(F) ttm(F) ptp(F) drm(F)
i2c_algo_bit(F) pps_core(F) mdio(F) i2c_core(F) sunrpc(F) [last
unloaded: rte_kni]
[ 747.136764] CPU 8
[ 747.136909] Pid: 2501, comm: qemu-system-x86 Tainted: GF O 3.9.11-200.no_strict_dev_mem.fc18.x86_64 #1 Intel Corporation S5520HC/S5520HC
[ 747.228668] RIP: 0010:[<ffffffffa018c43a>] [<ffffffffa018c43a>] __gfn_to_pfn_memslot+0x36a/0x3e0 [kvm]
[ 747.259705] RSP: 0018:ffff880130d39ae8 EFLAGS: 00010246
[ 747.291580] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8801effeb000
[ 747.322598] RDX: 00000000001c3c00 RSI: 00007fd11f000000 RDI: ffffea00070f0000
[ 747.354242] RBP: ffff880130d39b58 R08: 0000000000000126 R09: ffff880130d39c2f
[ 747.385123] R10: 0000000000000000 R11: 00007fd140000000 R12: 00007fd11f000001
[ 747.415981] R13: ffff880130d39ba7 R14: ffff8801c3bcb4f0 R15: ffff8802b4538001
[ 747.447877] FS: 00007fd35c1e9700(0000) GS:ffff8801e9c80000(0000) knlGS:0000000000000000
[ 747.479010] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 747.510220] CR2: 00007fe2ffc00000 CR3: 00000001e66c4000 CR4: 00000000000027e0
[ 747.542410] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 747.573780] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 747.604759] Process qemu-system-x86 (pid: 2501, threadinfo ffff880130d38000, task ffff8801c3bcb4f0)
[ 747.637044] Stack:
[ 747.668362] ffff880130d39af8 ffffffff81083798 ffff880130d39b48 00007fd11f000000
[ 747.700654] 00000000001c3c00 00ff8802b3272a90 0000000000000380 ffff8802b3272a80
[ 747.731895] 0000000000000380 00000000000fc000 ffff880130d39c38 ffff880365fe8000
[ 747.763068] Call Trace:
[ 747.793746] [<ffffffff81083798>] ? hrtimer_start+0x18/0x20
[ 747.824435] [<ffffffffa018c530>] __gfn_to_pfn+0x60/0x70 [kvm]
[ 747.855267] [<ffffffffa018c61a>] gfn_to_pfn_async+0x1a/0x20 [kvm]
[ 747.884586] [<ffffffffa01a703a>] try_async_pf+0x4a/0x1d0 [kvm]
[ 747.914146] [<ffffffffa01aea2a>] tdp_page_fault+0xfa/0x210 [kvm]
[ 747.943000] [<ffffffffa01a89a1>] kvm_mmu_page_fault+0x31/0x100 [kvm]
[ 747.972271] [<ffffffffa02135ce>] handle_ept_violation+0x5e/0x100 [kvm_intel]
[ 748.000620] [<ffffffffa02189f6>] vmx_handle_exit+0xf6/0x7c0 [kvm_intel]
[ 748.029860] [<ffffffffa01bbe38>] ? kvm_apic_has_interrupt+0x28/0xe0 [kvm]
[ 748.058214] [<ffffffffa0210370>] ? vmx_invpcid_supported+0x20/0x20 [kvm_intel]
[ 748.086496] [<ffffffffa01a281b>] kvm_arch_vcpu_ioctl_run+0x2fb/0x11a0 [kvm]
[ 748.114711] [<ffffffffa019de67>] ? kvm_arch_vcpu_load+0x57/0x1e0 [kvm]
[ 748.142788] [<ffffffffa018a0ee>] kvm_vcpu_ioctl+0x26e/0x5f0 [kvm]
[ 748.170647] [<ffffffff810b7340>] ? do_futex+0x100/0xad0
[ 748.198558] [<ffffffff811232b4>] ? perf_event_context_sched_in+0x94/0xc0
[ 748.226194] [<ffffffff811abe07>] do_vfs_ioctl+0x97/0x580
[ 748.253809] [<ffffffff8129d027>] ? file_has_perm+0x97/0xb0
[ 748.281110] [<ffffffff811ac381>] sys_ioctl+0x91/0xb0
[ 748.307911] [<ffffffff816604d9>] system_call_fastpath+0x16/0x1b
[ 748.333388] Code: ff ff 49 29 d2 4c 89 d2 48 c1 ea 0c 48 03 90 98
00 00 00 48 89 d7 48 89 55 b0 e8 92 d6 ff ff 84 c0 48 8b 55 b0 0f 85
bf fe ff ff <0f> 0b 0f 1f 40 00 48 ba 00 00 00 00 00 00 f0 7f e9 aa fe
ff ff
[ 748.392724] RIP [<ffffffffa018c43a>] __gfn_to_pfn_memslot+0x36a/0x3e0 [kvm]
[ 748.419435] RSP <ffff880130d39ae8>
[ 748.524222] ---[ end trace 854a37c471141217 ]---
*********************************** (with EPT disabled)
[ 559.581338] ------------[ cut here ]------------
[ 559.581701] kernel BUG at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1257!
[ 559.582169] invalid opcode: 0000 [#1] SMP
[ 559.582499] Modules linked in: kvm_intel rte_kni(OF) igb_uio(OF)
ebtable_nat xt_CHECKSUM bridge stp llc nf_conntrack_netbios_ns
nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT
nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat
iptable_mangle be2iscsi iscsi_boot_sysfs nf_conntrack_ipv4 bnx2i cnic
uio cxgb4i cxgb4 nf_defrag_ipv4 xt_conntrack nf_conntrack cxgb3i cxgb3
libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ebtable_filter bnep
ib_mad bluetooth ebtables ib_core iscsi_tcp libiscsi_tcp
ip6table_filter rfkill ip6_tables libiscsi scsi_transport_iscsi
iTCO_wdt acpi_cpufreq shpchp mperf i7core_edac iTCO_vendor_support
lpc_ich i2c_i801 coretemp microcode ioatdma edac_core mfd_core joydev
vhost_net tun macvtap macvlan kvm uinput ast crc32_pclmul
drm_kms_helper crc32c_intel
[ 559.592269] ghash_clmulni_intel ttm ixgbe igb e1000e drm dca ptp
i2c_algo_bit pps_core i2c_core mdio sunrpc [last unloaded: kvm_intel]
[ 559.593392] CPU 0
[ 559.593545] Pid: 2346, comm: qemu-system-x86 Tainted: GF O 3.9.11-200.no_strict_dev_mem.fc18.x86_64 #1 Intel Corporation S5520HC/S5520HC
[ 559.629429] RIP: 0010:[<ffffffffa019243a>] [<ffffffffa019243a>] __gfn_to_pfn_memslot+0x36a/0x3e0 [kvm]
[ 559.665456] RSP: 0018:ffff880137c999a8 EFLAGS: 00010246
[ 559.700778] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8801effeb000
[ 559.734581] RDX: 00000000001c3c00 RSI: 00007f3f40000000 RDI: ffffea00070f0000
[ 559.768291] RBP: ffff880137c99a18 R08: 0000000000000126 R09: ffff880137c99b2b
[ 559.801329] R10: 0000000000000000 R11: 00007f3f80000000 R12: 00007f3f40000001
[ 559.834191] R13: ffff880137c99a67 R14: ffff8801e64ad330 R15: ffff8802a3d5c001
[ 559.866638] FS: 00007f3f4193b700(0000) GS:ffff8801e9c00000(0000) knlGS:0000000000000000
[ 559.899459] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 559.938787] CR2: 00007ff90ac00000 CR3: 00000001e524d000 CR4: 00000000000027e0
[ 559.970626] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 560.003114] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 560.034090] Process qemu-system-x86 (pid: 2346, threadinfo ffff880137c98000, task ffff8801e64ad330)
[ 560.066053] Stack:
[ 560.097033] ffff880137c999e8 ffffffffa0192bb9 0000000000001a07 00007f3f40000000
[ 560.129651] 00000000001c3c00 0000000000000000 00000000000bce20 0000000000000002
[ 560.161050] ffff880137c99a08 00000000000fe000 ffff880137c99b38 ffff8802b5cb0000
[ 560.193547] Call Trace:
[ 560.224759] [<ffffffffa0192bb9>] ? kvm_io_bus_get_first_dev+0x49/0xe0 [kvm]
[ 560.257409] [<ffffffffa0192530>] __gfn_to_pfn+0x60/0x70 [kvm]
[ 560.288929] [<ffffffffa019261a>] gfn_to_pfn_async+0x1a/0x20 [kvm]
[ 560.320276] [<ffffffffa01ad03a>] try_async_pf+0x4a/0x1d0 [kvm]
[ 560.351392] [<ffffffffa01b5444>] paging64_page_fault+0x1e4/0x960 [kvm]
[ 560.383588] [<ffffffffa01a6747>] ? emulator_read_write+0x177/0x180 [kvm]
[ 560.414567] [<ffffffffa019c169>] ? kvm_fetch_guest_virt+0x69/0x80 [kvm]
[ 560.445593] [<ffffffffa0192c89>] ? kvm_io_bus_write+0x39/0x110 [kvm]
[ 560.476368] [<ffffffffa01ae9a1>] kvm_mmu_page_fault+0x31/0x100 [kvm]
[ 560.505739] [<ffffffffa05738eb>] handle_exception+0x23b/0x3b0 [kvm_intel]
[ 560.535920] [<ffffffffa05749f6>] vmx_handle_exit+0xf6/0x7c0 [kvm_intel]
[ 560.566093] [<ffffffffa056c370>] ? vmx_invpcid_supported+0x20/0x20 [kvm_intel]
[ 560.595010] [<ffffffffa01a881b>] kvm_arch_vcpu_ioctl_run+0x2fb/0x11a0 [kvm]
[ 560.623660] [<ffffffffa01a3e67>] ? kvm_arch_vcpu_load+0x57/0x1e0 [kvm]
[ 560.652031] [<ffffffffa01900ee>] kvm_vcpu_ioctl+0x26e/0x5f0 [kvm]
[ 560.680444] [<ffffffff811232b4>] ? perf_event_context_sched_in+0x94/0xc0
[ 560.708413] [<ffffffff811abe07>] do_vfs_ioctl+0x97/0x580
[ 560.741942] [<ffffffff8129d027>] ? file_has_perm+0x97/0xb0
[ 560.772561] [<ffffffff811ac381>] sys_ioctl+0x91/0xb0
[ 560.799777] [<ffffffff81014938>] ? do_notify_resume+0x38/0xb0
[ 560.826859] [<ffffffff816604d9>] system_call_fastpath+0x16/0x1b
[ 560.853168] Code: ff ff 49 29 d2 4c 89 d2 48 c1 ea 0c 48 03 90 98
00 00 00 48 89 d7 48 89 55 b0 e8 92 d6 ff ff 84 c0 48 8b 55 b0 0f 85
bf fe ff ff <0f> 0b 0f 1f 40 00 48 ba 00 00 00 00 00 00 f0 7f e9 aa fe
ff ff
[ 560.911100] RIP [<ffffffffa019243a>] __gfn_to_pfn_memslot+0x36a/0x3e0 [kvm]
[ 560.937369] RSP <ffff880137c999a8>
[ 561.073525] ---[ end trace d60d5d8001890dc2 ]---
The mmap code is the usual generic approach: open /dev/mem to get an fd,
then mmap that fd using the target physical address as the offset and
the target size as the length.
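For reference, a generic /dev/mem mapping of that kind might look roughly like the sketch below. The helper names map_phys() and phys_page_base() are hypothetical. Because the mmap offset must be page aligned, the sketch rounds the physical address down to a page boundary and returns a pointer adjusted by the remainder. Mapping RAM through /dev/mem is normally blocked by CONFIG_STRICT_DEVMEM (the no_strict_dev_mem kernel name in the oops suggests it was disabled here), and opening /dev/mem usually requires root.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Round a physical address down to the start of its page
   (page_size must be a power of two). */
static uint64_t phys_page_base(uint64_t phys, uint64_t page_size)
{
    return phys & ~(page_size - 1);
}

/* Hypothetical helper: map `len` bytes of physical memory starting at
   `phys` through /dev/mem. On success, returns a pointer to the byte at
   `phys` and stores the raw mapping in *base_out / *maplen_out so the
   caller can munmap() it later. Returns NULL on failure. */
static void *map_phys(uint64_t phys, size_t len,
                      void **base_out, size_t *maplen_out)
{
    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0)
        return NULL;
    uint64_t base = phys_page_base(phys, (uint64_t)pg);
    size_t delta = (size_t)(phys - base);   /* offset inside first page */
    size_t maplen = len + delta;

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    /* The file offset selects the physical address; it must be page aligned. */
    void *p = mmap(NULL, maplen, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)base);
    close(fd);                              /* the mapping survives close() */
    if (p == MAP_FAILED)
        return NULL;

    *base_out = p;
    *maplen_out = maplen;
    return (char *)p + delta;
}
```

The caller accesses memory through the returned pointer and later releases it with munmap(base, maplen).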
Any help on where to look would be greatly appreciated.
--
Best regards,
Anatoly
* Re: mmapping physical memory
From: Andrea Arcangeli @ 2013-08-26 14:19 UTC
To: Anatoly Burakov; +Cc: kvm
Hi Anatoly,
On Mon, Aug 26, 2013 at 12:58:25PM +0100, Anatoly Burakov wrote:
> Hi all
>
> I am using IVSHMEM to mmap /dev/mem into guest. The mmap works fine on
> QEMU without KVM support enabled, but with KVM i get kernel errors:
>
> ***************************** (with EPT enabled)
>
> [ 746.940720] ------------[ cut here ]------------
> [ 746.948612] kernel BUG at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1257!
So the problem is that KVM cannot do put_page on a pfn coming from a
/dev/mem mapping, and it cannot handle VM_PFNMAP mappings whose pages
do not have PageReserved set. During kvm_release_page_* KVM only has
the pfn of the page, and it has to decide whether this page is
refcounted or not based solely on the pfn. So if the page is not
marked PageReserved, KVM cannot allow a mapping to be established;
otherwise, during spte teardown, put_page would later run on the
/dev/mem memory, leading to memory corruption. The above BUG_ON isn't
a false positive: it shows a limitation in the ability of the KVM page
fault path to map any kind of memory coming from the host (including
/dev/mem mappings).
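The pfn-only decision described above can be pictured with a toy model. Everything in it (struct toy_page, toy_is_mmio_pfn() and the other toy_* functions) is hypothetical and much simpler than the real kvm_main.c code. A pfn with no struct page, or with the reserved flag set, is treated as not refcounted, so the release path skips put_page(). A VM_PFNMAP fault took no reference, so mapping it is only safe when release will also skip put_page(); the false return of toy_pfnmap_fault_ok() stands for the case where the real code hits its BUG_ON.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the bits this decision looks at. */
struct toy_page {
    bool reserved;   /* PageReserved() */
    int refcount;    /* page_count() */
};

/* Given only a pfn, decide whether it is NOT refcounted. A pfn with no
   struct page (outside the array) or with the reserved flag counts as
   "mmio" and must never see put_page(). */
static bool toy_is_mmio_pfn(const struct toy_page *pages, size_t npages,
                            size_t pfn)
{
    if (pfn >= npages)
        return true;
    return pages[pfn].reserved;
}

/* Release path (think kvm_release_page_*): it only has the pfn. */
static void toy_release_pfn(struct toy_page *pages, size_t npages, size_t pfn)
{
    if (!toy_is_mmio_pfn(pages, npages, pfn))
        pages[pfn].refcount--;          /* put_page() */
}

/* Fault on a VM_PFNMAP vma: no reference was taken, so the mapping is
   only safe if the release path will skip put_page(). */
static bool toy_pfnmap_fault_ok(const struct toy_page *pages, size_t npages,
                                size_t pfn)
{
    return toy_is_mmio_pfn(pages, npages, pfn);
}
```

A /dev/mem page backed by ordinary RAM looks like pages[1] in the test: it has a struct page without the reserved flag, so releasing it would drop a reference KVM never took.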
So I'm suggesting dropping FOLL_GET in the page fault, along with the
kvm_release_page_* after the spte is established, and relying entirely
on the mmu notifier and kvm->mmu_lock. This would add a
vcpu->in_progress_fault_addr field that is set before calling gup in
hva_to_pfn, cleared by the mmu notifier code while holding
kvm->mmu_lock, and checked under kvm->mmu_lock during spte
establishment to know whether the page pointer became stale, in which
case we bail out and repeat the fault.
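A minimal sketch of the proposed protocol follows, assuming a single vCPU and modelling kvm->mmu_lock as a C11 spinlock. The struct, the NO_FAULT sentinel and the toy_* functions are hypothetical; only the ordering is meant to match the proposal: publish the address before the reference-less gup, let an invalidation clear it under the lock, and install the spte only if the address is still published once the lock is taken.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_FAULT ((uintptr_t)0)   /* hypothetical "no fault in flight" value */

struct toy_kvm {
    atomic_flag mmu_lock;                      /* stands in for kvm->mmu_lock */
    _Atomic uintptr_t in_progress_fault_addr;  /* proposed per-vcpu field */
    uintptr_t spte_hva;                        /* last address "mapped" */
};

static void toy_lock(struct toy_kvm *k)
{
    while (atomic_flag_test_and_set_explicit(&k->mmu_lock,
                                             memory_order_acquire))
        ;                                      /* spin */
}

static void toy_unlock(struct toy_kvm *k)
{
    atomic_flag_clear_explicit(&k->mmu_lock, memory_order_release);
}

/* Fault step 1: publish the address before the gup that takes no reference. */
static void toy_fault_begin(struct toy_kvm *k, uintptr_t hva)
{
    atomic_store(&k->in_progress_fault_addr, hva);
}

/* MMU notifier: invalidating [start, end) clears a matching in-flight fault
   while holding the lock (the real code would also zap sptes here). */
static void toy_invalidate_range(struct toy_kvm *k, uintptr_t start,
                                 uintptr_t end)
{
    toy_lock(k);
    uintptr_t a = atomic_load(&k->in_progress_fault_addr);
    if (a != NO_FAULT && a >= start && a < end)
        atomic_store(&k->in_progress_fault_addr, NO_FAULT);
    toy_unlock(k);
}

/* Fault step 2: under the lock, install the spte only if no invalidation
   raced with the gup. Returns false when the fault must be repeated. */
static bool toy_fault_commit(struct toy_kvm *k, uintptr_t hva)
{
    toy_lock(k);
    bool ok = atomic_load(&k->in_progress_fault_addr) == hva;
    if (ok)
        k->spte_hva = hva;
    atomic_store(&k->in_progress_fault_addr, NO_FAULT);
    toy_unlock(k);
    return ok;
}
```

The sketch uses sequentially consistent atomics for simplicity; a real implementation would need careful thought about ordering between the store and the page-table walk done by gup.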
We'll still need to use FOLL_GET and set_page_dirty in some cases,
such as after modifying the page in places like
emulator_cmpxchg_emulated. Those places cannot depend on the mmu
notifier, and the dirty bit set in the pte isn't enough, because the
page can be swapped out to disk and marked clean before kmap_atomic
runs. But 99% of the hva_to_pfn calls come from the KVM secondary MMU
page faults; those are protected by the mmu notifier and can skip the
refcounting completely, including FOLL_GET. And because we won't have
to run put_page at all anymore, the above BUG will disappear too.
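The lost-write problem behind the emulator exception can be shown with another toy model; struct toy_pg and its helpers are hypothetical. The page is written back and marked clean before the emulated write lands through the kernel mapping. Since that write does not go through the pte, only an explicit set_page_dirty() stops the page from being dropped as clean. The FOLL_GET reference, which keeps the page from being freed under the writer in the real code, is not modelled here.

```c
#include <stdbool.h>

/* Toy page, not a kernel structure. */
struct toy_pg {
    bool dirty;    /* page-level dirty flag used by writeback/reclaim */
    bool present;  /* false once reclaimed */
    int data;      /* in-memory contents */
    int backing;   /* contents on swap */
};

/* Writeback: copy dirty contents to swap and mark the page clean. */
static void toy_writeback(struct toy_pg *p)
{
    if (p->dirty) {
        p->backing = p->data;
        p->dirty = false;
    }
}

/* Reclaim: a clean page is dropped without being written again. */
static void toy_reclaim_clean(struct toy_pg *p)
{
    if (!p->dirty)
        p->present = false;
}

/* Emulated write through a kernel mapping (kmap_atomic in the real code).
   The pte dirty bit is not involved, so only an explicit set_page_dirty()
   tells writeback about the new contents. */
static void toy_emulated_write(struct toy_pg *p, int val,
                               bool call_set_page_dirty)
{
    p->data = val;
    if (call_set_page_dirty)
        p->dirty = true;
}
```

Without the set_page_dirty() step the page is reclaimed as clean and the swap copy keeps the old value; with it, the new value reaches swap on the next writeback.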
In terms of performance, I estimate the only cost will be an
"ATOMIC_ONCE(vcpu->in_progress_fault_addr) = addr" store, a lockless
write to a per-thread, cacheline-local field before calling gup in
hva_to_pfn, while the benefit will be the removal of all the
refcounting atomic_inc/dec and set_page_dirty calls from all the KVM
page faults.