* KVM: mmu_notifiers release method
From: Marcelo Tosatti @ 2008-12-10 20:23 UTC
To: Andrea Arcangeli
Cc: kvm-devel

The destructor for huge pages uses the backing inode for adjusting
hugetlbfs accounting.

Hugepage mappings are destroyed by exit_mmap, after
mmu_notifier_release, so there are no notifications through
unmap_hugepage_range at this point.

The hugetlbfs inode can be freed with pages backed by it referenced
by the shadow. When the shadow releases its reference, the huge page
destructor will access a now freed inode.

Implement the release operation for kvm mmu notifiers to release page
refs before the hugetlbfs inode is gone.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e7644b9..5bc38b5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -741,11 +741,19 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 	return young;
 }
 
+static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
+				     struct mm_struct *mm)
+{
+	struct kvm *kvm = mmu_notifier_to_kvm(mn);
+	kvm_arch_flush_shadow(kvm);
+}
+
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
 	.invalidate_page	= kvm_mmu_notifier_invalidate_page,
 	.invalidate_range_start	= kvm_mmu_notifier_invalidate_range_start,
 	.invalidate_range_end	= kvm_mmu_notifier_invalidate_range_end,
 	.clear_flush_young	= kvm_mmu_notifier_clear_flush_young,
+	.release		= kvm_mmu_notifier_release,
 };
 
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
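[Editorial illustration of the ordering the changelog describes: a condensed
paraphrase of the 2.6.28-era exit path, not the literal kernel source;
arguments and locking are elided.]

	void exit_mmap(struct mm_struct *mm)
	{
		/* Secondary MMUs are torn down first; with the patch
		 * above, this is where KVM's shadow drops its page
		 * references, via ->release -> kvm_arch_flush_shadow(). */
		mmu_notifier_release(mm);

		/* Only afterwards are the VMAs -- including hugetlbfs
		 * mappings -- unmapped, so no unmap_hugepage_range()
		 * notification reaches KVM during teardown. */
		unmap_vmas(mm /* , ... */);
	}

	/* Without a ->release handler, KVM's page references survive
	 * exit_mmap(); once the hugetlbfs file is closed and its inode
	 * freed, dropping the last page reference runs the huge-page
	 * destructor, which dereferences the freed inode for accounting. */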
* Re: KVM: mmu_notifiers release method
From: Avi Kivity @ 2008-12-24 12:50 UTC
To: Marcelo Tosatti
Cc: Andrea Arcangeli, kvm-devel

Marcelo Tosatti wrote:
> The destructor for huge pages uses the backing inode for adjusting
> hugetlbfs accounting.
>
> Hugepage mappings are destroyed by exit_mmap, after
> mmu_notifier_release, so there are no notifications through
> unmap_hugepage_range at this point.
>
> The hugetlbfs inode can be freed with pages backed by it referenced
> by the shadow. When the shadow releases its reference, the huge page
> destructor will access a now freed inode.
>
> Implement the release operation for kvm mmu notifiers to release page
> refs before the hugetlbfs inode is gone.

I see this isn't it. Andrea, comments?

> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> [patch quoted in full above -- snipped]

--
error compiling committee.c: too many arguments to function
* Re: KVM: mmu_notifiers release method
From: Andrea Arcangeli @ 2008-12-24 15:28 UTC
To: Avi Kivity
Cc: Marcelo Tosatti, kvm-devel

On Wed, Dec 24, 2008 at 02:50:57PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> [changelog quoted in full above -- snipped]
>>
>> Implement the release operation for kvm mmu notifiers to release page
>> refs before the hugetlbfs inode is gone.
>
> I see this isn't it. Andrea, comments?

Yeah, the patch looks good. I talked a bit with Marcelo about this by
PM. The issue is that it's not as straightforward as it seems:
basically, when I implemented the ->release handlers and had spte
teardown running before the files were closed (instead of waiting for
the kvm anon inode release handler to fire), I was getting bugchecks
from debug options including preempt=y (certain debug checks only
become functional with preempt enabled, unfortunately). So eventually
I removed ->release: for kvm, ->release wasn't useful, because no
guest mode can run any more by the time the mmu notifier ->release is
invoked, and removing it avoided the issues with the bugchecks.

We'll be using the mmu notifier ->release because it's always called
just before the file handles are destroyed; it's not really about
guest mode or the secondary mmu, but just an ordering issue with
hugetlbfs internals.

So in short, if no bugcheck triggers, this is fine (at least until
hugetlbfs provides a way to register some callback to invoke at the
start of the hugetlbfs ->release handler).
* __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Marcelo Tosatti @ 2008-12-29 14:58 UTC
To: Andrea Arcangeli, Nick Piggin
Cc: Avi Kivity, kvm-devel

On Wed, Dec 24, 2008 at 04:28:44PM +0100, Andrea Arcangeli wrote:
> [earlier discussion quoted in full above -- snipped]
>
> So in short, if no bugcheck triggers, this is fine (at least until
> hugetlbfs provides a way to register some callback to invoke at the
> start of the hugetlbfs ->release handler).

The only bugcheck I see, which triggers on vanilla kvm upstream with
CONFIG_DEBUG_PREEMPT=y and CONFIG_PREEMPT_RCU=y, is:

general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
ttyS1: 1 input overrun(s)
last sysfs file: /sys/class/net/tap0/address
CPU 0
Modules linked in: tun ipt_MASQUERADE iptable_nat nf_nat bridge stp llc
 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp
 ipt_REJECT iptable_filter ip_tables x_tables dm_multipath kvm_intel kvm
 scsi_wait_scan ata_piix libata dm_snapshot dm_zero dm_mirror
 dm_region_hash dm_log dm_mod shpchp pci_hotplug mptsas mptscsih mptbase
 scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd
Pid: 4768, comm: qemu-system-x86 Not tainted 2.6.28-00165-g4f27e3e-dirty #164
RIP: 0010:[<ffffffff8028a5b6>]  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
RSP: 0018:ffff88021e1f9a38  EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b RCX: 0000000000000003
RDX: ffffffff80a1dae0 RSI: ffff880028083980 RDI: 0000000000000001
RBP: ffff88021e1f9a78 R08: 0000000000000286 R09: ffffffff80a1bf50
R10: ffff880119c270f8 R11: ffff88021e1f99b8 R12: ffff88021e1f9a38
R13: ffff88021e1f9a90 R14: ffff88021e1f9a98 R15: 000000000000813a
FS:  0000000000000000(0000) GS:ffffffff8080d900(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000008d9828 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 4768, threadinfo ffff88021e1f8000, task ffff880119c270f8)
Stack:
 ffff88022bdfd840 ffff880119da11b8 ffffc20011c30000 000000000000813a
 0000000000000000 0000000000000001 ffff88022ec11c18 ffff88022f061838
 ffff88021e1f9aa8 ffffffff8028ab1d ffff88021e1f9aa8 ffffc20021976000
Call Trace:
 [<ffffffff8028ab1d>] free_unmap_vmap_area_noflush+0x69/0x70
 [<ffffffff8028ab49>] remove_vm_area+0x25/0x71
 [<ffffffff8028ac54>] __vunmap+0x3a/0xca
 [<ffffffff8028ad35>] vfree+0x29/0x2b
 [<ffffffffa00f98a3>] kvm_free_physmem_slot+0x25/0x7c [kvm]
 [<ffffffffa00f9d75>] kvm_free_physmem+0x27/0x36 [kvm]
 [<ffffffffa00fccb4>] kvm_arch_destroy_vm+0xa6/0xda [kvm]
 [<ffffffffa00f9e11>] kvm_put_kvm+0x8d/0xa7 [kvm]
 [<ffffffffa00fa0e2>] kvm_vcpu_release+0x13/0x17 [kvm]
 [<ffffffff802a1c07>] __fput+0xeb/0x1a3
 [<ffffffff802a1cd4>] fput+0x15/0x17
 [<ffffffff8029f26c>] filp_close+0x67/0x72
 [<ffffffff802378a8>] put_files_struct+0x74/0xc8
 [<ffffffff80237943>] exit_files+0x47/0x4f
 [<ffffffff80238fe5>] do_exit+0x1eb/0x7a7
 [<ffffffff80587edf>] ? _spin_unlock_irq+0x2b/0x51
 [<ffffffff80239614>] do_group_exit+0x73/0xa0
 [<ffffffff80242b10>] get_signal_to_deliver+0x30c/0x32c
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020a80f>] do_notify_resume+0x8c/0x851
 [<ffffffff8025b811>] ? do_futex+0x90/0x92a
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80587f51>] ? _spin_unlock_irqrestore+0x4c/0x68
 [<ffffffff8026be5c>] ? __rcu_read_unlock+0x92/0x9e
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8024f300>] ? getnstimeofday+0x3a/0x96
 [<ffffffff8024c4f0>] ? ktime_get_ts+0x49/0x4e
 [<ffffffff8020b4c1>] ? sysret_signal+0x5/0x29
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020b7b7>] ptregscall_common+0x67/0xb0
Code: 46 48 c7 c7 c0 d1 74 80 4c 8d 65 c0 e8 0c db 2f 00 48 8b 45 c0 48 8d 58 c0 eb 10 48 89 df e8 74 fe ff ff 48 8b 43 40 48 8d 58 c0 <48> 8b 43 40 0f 18 08 48 8d 43 40 4c 39 e0 75 e0 48 c7 c7 c0 d1
RIP  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
 RSP <ffff88021e1f9a38>
---[ end trace fde3e64ebe4bbca2 ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: qemu-system-x86/4768/0x00000003
Modules linked in: tun ipt_MASQUERADE iptable_nat nf_nat bridge stp llc
 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp
 ipt_REJECT iptable_filter ip_tables x_tables dm_multipath kvm_intel kvm
 scsi_wait_scan ata_piix libata dm_snapshot dm_zero dm_mirror
 dm_region_hash dm_log dm_mod shpchp pci_hotplug mptsas mptscsih mptbase
 scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd
Pid: 4768, comm: qemu-system-x86 Tainted: G      D    2.6.28-00165-g4f27e3e-dirty #164
Call Trace:
 [<ffffffff8025585e>] ? __debug_show_held_locks+0x1b/0x24
 [<ffffffff8023187b>] __schedule_bug+0x8c/0x95
 [<ffffffff805851e1>] schedule+0xd3/0x902
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8037a938>] ? put_io_context+0x67/0x72
 [<ffffffff80238ed4>] do_exit+0xda/0x7a7
 [<ffffffff805892c9>] oops_begin+0x0/0x90
 [<ffffffff8020e3c9>] die+0x5d/0x66
 [<ffffffff80588ff7>] do_general_protection+0x128/0x130
 [<ffffffff80588ecf>] ? do_general_protection+0x0/0x130
 [<ffffffff80588702>] error_exit+0x0/0xa9
 [<ffffffff8028a5b6>] ? __purge_vmap_area_lazy+0x12c/0x163
 [<ffffffff8028a5ae>] ? __purge_vmap_area_lazy+0x124/0x163
 [<ffffffff8028ab1d>] free_unmap_vmap_area_noflush+0x69/0x70
 [<ffffffff8028ab49>] remove_vm_area+0x25/0x71
 [<ffffffff8028ac54>] __vunmap+0x3a/0xca
 [<ffffffff8028ad35>] vfree+0x29/0x2b
 [<ffffffffa00f98a3>] kvm_free_physmem_slot+0x25/0x7c [kvm]
 [<ffffffffa00f9d75>] kvm_free_physmem+0x27/0x36 [kvm]
 [<ffffffffa00fccb4>] kvm_arch_destroy_vm+0xa6/0xda [kvm]
 [<ffffffffa00f9e11>] kvm_put_kvm+0x8d/0xa7 [kvm]
 [<ffffffffa00fa0e2>] kvm_vcpu_release+0x13/0x17 [kvm]
 [<ffffffff802a1c07>] __fput+0xeb/0x1a3
 [<ffffffff802a1cd4>] fput+0x15/0x17
 [<ffffffff8029f26c>] filp_close+0x67/0x72
 [<ffffffff802378a8>] put_files_struct+0x74/0xc8
 [<ffffffff80237943>] exit_files+0x47/0x4f
 [<ffffffff80238fe5>] do_exit+0x1eb/0x7a7
 [<ffffffff80587edf>] ? _spin_unlock_irq+0x2b/0x51
 [<ffffffff80239614>] do_group_exit+0x73/0xa0
 [<ffffffff80242b10>] get_signal_to_deliver+0x30c/0x32c
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020a80f>] do_notify_resume+0x8c/0x851
 [<ffffffff8025b811>] ? do_futex+0x90/0x92a
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80587f51>] ? _spin_unlock_irqrestore+0x4c/0x68
 [<ffffffff8026be5c>] ? __rcu_read_unlock+0x92/0x9e
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8024f300>] ? getnstimeofday+0x3a/0x96
 [<ffffffff8024c4f0>] ? ktime_get_ts+0x49/0x4e
 [<ffffffff8020b4c1>] ? sysret_signal+0x5/0x29
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020b7b7>] ptregscall_common+0x67/0xb0
ttyS1: 26 input overrun(s)

And it's not specific to the vm shutdown path. Another instance:

general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/class/net/tap0/address
CPU 5
Modules linked in: ipt_REJECT xt_state xt_tcpudp iptable_filter
 ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
 nf_defrag_ipv4 ip_tables x_tables tun kvm_intel kvm bridge stp llc
 dm_multipath scsi_wait_scan ata_piix libata dm_snapshot dm_zero
 dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug mptsas
 mptscsih mptbase scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd
 [last unloaded: x_tables]
Pid: 4440, comm: qemu-system-x86 Not tainted 2.6.28-00165-g4f27e3e-dirty #163
RIP: 0010:[<ffffffff8028a5b6>]  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
RSP: 0018:ffff88011f4c7be8  EFLAGS: 00010246
RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b RCX: 0000000000000003
RDX: ffffffff80a1dae0 RSI: ffff880028083980 RDI: 0000000000000001
RBP: ffff88011f4c7c28 R08: 0000000000000282 R09: ffffffff80a1bf50
R10: ffff88022e9dc0f8 R11: ffff88011f4c7b68 R12: ffff88011f4c7be8
R13: ffff88011f4c7c40 R14: ffff88011f4c7c48 R15: 0000000000008001
FS:  0000000040abf950(0063) GS:ffff88022f25ed18(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000229d34000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 4440, threadinfo ffff88011f4c6000, task ffff88022e9dc0f8)
Stack:
 ffff8802291a14b0 ffff880229003d58 ffffc20021000000 0000000000008001
 ffff880229526000 0000000000000000 ffff88022d073000 ffff88011f58c0c0
 ffff88011f4c7c58 ffffffff8028ab1d ffff88011f4c7c58 ffffffffa015c000
Call Trace:
 [<ffffffff8028ab1d>] free_unmap_vmap_area_noflush+0x69/0x70
 [<ffffffff8028ab49>] remove_vm_area+0x25/0x71
 [<ffffffff8028ac54>] __vunmap+0x3a/0xca
 [<ffffffff8028ad0a>] vunmap+0x26/0x28
 [<ffffffffa01be092>] pio_copy_data+0xcf/0x113 [kvm]
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffffa01be16f>] complete_pio+0x99/0x1ef [kvm]
 [<ffffffff8023fcd2>] ? sigprocmask+0xc6/0xd0
 [<ffffffffa01c0295>] kvm_arch_vcpu_ioctl_run+0x9a/0x889 [kvm]
 [<ffffffffa01b84f4>] kvm_vcpu_ioctl+0xfc/0x48b [kvm]
 [<ffffffff802ac760>] vfs_ioctl+0x2a/0x78
 [<ffffffff8026be5c>] ? __rcu_read_unlock+0x92/0x9e
 [<ffffffff802acb46>] do_vfs_ioctl+0x398/0x3c6
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff802acbb6>] sys_ioctl+0x42/0x65
 [<ffffffff8020b43b>] system_call_fastpath+0x16/0x1b
Code: 46 48 c7 c7 c0 d1 74 80 4c 8d 65 c0 e8 8c da 2f 00 48 8b 45 c0 48 8d 58 c0 eb 10 48 89 df e8 74 fe ff ff 48 8b 43 40 48 8d 58 c0 <48> 8b 43 40 0f 18 08 48 8d 43 40 4c 39 e0 75 e0 48 c7 c7 c0 d1
RIP  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
 RSP <ffff88011f4c7be8>
---[ end trace 31811279a2e983e8 ]---
note: qemu-system-x86[4440] exited with preempt_count 2

(gdb) l *(__purge_vmap_area_lazy + 0x12c)
0xffffffff80289ca2 is in __purge_vmap_area_lazy (mm/vmalloc.c:516).
511                     if (nr || force_flush)
512                             flush_tlb_kernel_range(*start, *end);
513
514                     if (nr) {
515                             spin_lock(&vmap_area_lock);
516                             list_for_each_entry(va, &valist, purge_list)
517                                     __free_vmap_area(va);
518                             spin_unlock(&vmap_area_lock);
519                     }
520                     spin_unlock(&purge_lock);

0xffffffff80289c9a <__purge_vmap_area_lazy+292>:        mov    0x40(%rbx),%rax
0xffffffff80289c9e <__purge_vmap_area_lazy+296>:        lea    -0x40(%rax),%rbx
0xffffffff80289ca2 <__purge_vmap_area_lazy+300>:        mov    0x40(%rbx),%rax
                                                        ^^^^^^^^^^^^^^^^^^^
0xffffffff80289ca6 <__purge_vmap_area_lazy+304>:        prefetcht0 (%rax)

Which vanishes once PREEMPT_RCU is disabled.

Nick? KVM does not make direct use of RCU. Same issue happens if the
entire __purge_vmap_area_lazy runs with vmap_area_lock held.
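[Editorial illustration of what the faulting loop at mm/vmalloc.c:516 is
doing; the list_for_each_entry() expansion below is approximate, and the
0x40 offsets are taken from the disassembly in the report.]

	/* list_for_each_entry(va, &valist, purge_list) walks the local
	 * purge list by loading each node's ->next pointer and
	 * converting it back to the containing vmap_area, roughly: */
	for (va = list_entry(valist.next, struct vmap_area, purge_list);
	     &va->purge_list != &valist;
	     va = list_entry(va->purge_list.next, struct vmap_area, purge_list))
		__free_vmap_area(va);

	/* In the oops, RAX = 6b6b6b6b6b6b6b6b is a ->next pointer read
	 * from memory that was already kfreed and filled with
	 * POISON_FREE (0x6b) by slab poisoning; RBX = RAX - 0x40 is the
	 * container_of() step ("lea -0x40(%rax),%rbx", purge_list at
	 * offset 0x40), and the fault is the following
	 * "mov 0x40(%rbx),%rax" / "prefetcht0 (%rax)" chasing the
	 * poisoned pointer: a vmap_area was freed while the walk was
	 * still using it. */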
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Nick Piggin @ 2008-12-30 3:53 UTC
To: Marcelo Tosatti
Cc: Andrea Arcangeli, Avi Kivity, kvm-devel

On Tuesday 30 December 2008 01:58:21 Marcelo Tosatti wrote:
> [earlier discussion and both crash dumps quoted in full -- snipped]
>
> (gdb) l *(__purge_vmap_area_lazy + 0x12c)
> 0xffffffff80289ca2 is in __purge_vmap_area_lazy (mm/vmalloc.c:516).
> 511                     if (nr || force_flush)
> 512                             flush_tlb_kernel_range(*start, *end);
> 513
> 514                     if (nr) {
> 515                             spin_lock(&vmap_area_lock);
> 516                             list_for_each_entry(va, &valist, purge_list)
> 517                                     __free_vmap_area(va);
> 518                             spin_unlock(&vmap_area_lock);
> 519                     }
> 520                     spin_unlock(&purge_lock);
>
> 0xffffffff80289c9a <__purge_vmap_area_lazy+292>:        mov    0x40(%rbx),%rax
> 0xffffffff80289c9e <__purge_vmap_area_lazy+296>:        lea    -0x40(%rax),%rbx
> 0xffffffff80289ca2 <__purge_vmap_area_lazy+300>:        mov    0x40(%rbx),%rax
>                                                         ^^^^^^^^^^^^^^^^^^^
> 0xffffffff80289ca6 <__purge_vmap_area_lazy+304>:        prefetcht0 (%rax)
>
> Which vanishes once PREEMPT_RCU is disabled.
>
> Nick? KVM does not make direct use of RCU. Same issue happens if the
> entire __purge_vmap_area_lazy runs with vmap_area_lock held.

The thing is that the valist and va->purge_list are protected by
purge_lock in that function. I can't easily see how that could get
corrupted.

Is it easy to reproduce? Can you try putting preempt_disable around
rcu_read_lock?
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Marcelo Tosatti @ 2008-12-30 15:13 UTC
To: Nick Piggin
Cc: Andrea Arcangeli, Avi Kivity, kvm-devel

On Tue, Dec 30, 2008 at 02:53:36PM +1100, Nick Piggin wrote:
> > [crash dump tail and gdb source listing snipped]
> >
> > 0xffffffff80289c9a <__purge_vmap_area_lazy+292>:        mov    0x40(%rbx),%rax
> > 0xffffffff80289c9e <__purge_vmap_area_lazy+296>:        lea    -0x40(%rax),%rbx
> > 0xffffffff80289ca2 <__purge_vmap_area_lazy+300>:        mov    0x40(%rbx),%rax
> >                                                         ^^^^^^^^^^^^^^^^^^^

Note:

RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b

> > 0xffffffff80289ca6 <__purge_vmap_area_lazy+304>:        prefetcht0 (%rax)
> >
> > Which vanishes once PREEMPT_RCU is disabled.
> >
> > Nick? KVM does not make direct use of RCU. Same issue happens if the
> > entire __purge_vmap_area_lazy runs with vmap_area_lock held.

Hum, mmu_notifiers does.

> The thing is that the valist and va->purge_list are protected by
> purge_lock in that function.

Which disables preemption...

> I can't easily see how that could get corrupted.

Perhaps the corruption happens before the freeing pass.

> Is it easy to reproduce?

Yes. Enable CONFIG_PREEMPT, CONFIG_PREEMPT_RCU, and DEBUG_PREEMPT.
Start a Linux guest, wait for boot to finish, and shut it down.
Sometimes it happens even before guest shutdown, in the complete_pio
path as reported.

> Can you try putting preempt_disable around rcu_read_lock?

Tried it before, does not help. Even tried to protect all
rcu_read_lock/unlock pairs with preempt_disable/enable in vmalloc.c.
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Avi Kivity @ 2008-12-30 15:32 UTC
To: Marcelo Tosatti
Cc: Nick Piggin, Andrea Arcangeli, kvm-devel

Marcelo Tosatti wrote:
> On Tue, Dec 30, 2008 at 02:53:36PM +1100, Nick Piggin wrote:
>> [gdb source listing and disassembly snipped]
>
> Note:
>
> RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b

Good old POISON_FREE.

--
error compiling committee.c: too many arguments to function
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Nick Piggin @ 2008-12-31 2:32 UTC
To: Avi Kivity
Cc: Marcelo Tosatti, Andrea Arcangeli, kvm-devel

On Wednesday 31 December 2008 02:32:50 Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Note:
>>
>> RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b
>
> Good old POISON_FREE.

Right, it seems like it has been kfreed while it is still accessed via
RCU. But I just can't see how the vmap_area can be freed while there is
a concurrent process traversing the vmap_area_list... __free_vmap_area
removes the entry from the list first, then does a call_rcu to kfree
it.

Hmm...
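[Editorial illustration of the removal-then-deferred-free pattern Nick
refers to, abridged from the 2.6.28-era mm/vmalloc.c; treat names and
details as approximate rather than authoritative.]

	static void __free_vmap_area(struct vmap_area *va)
	{
		/* Unlink from the lookup tree and from vmap_area_list
		 * before anything is freed... */
		rb_erase(&va->rb_node, &vmap_area_root);
		RB_CLEAR_NODE(&va->rb_node);
		list_del_rcu(&va->list);

		/* ...and defer the kfree() until all pre-existing RCU
		 * readers of vmap_area_list have finished; the
		 * rcu_free_va() callback just kfree()s va. */
		call_rcu(&va->rcu_head, rcu_free_va);
	}

Note that the RCU deferral covers readers of vmap_area_list (->list);
the faulting loop walks the local valist via ->purge_list, which relies
on purge_lock instead, which is why the poisoned node is surprising.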
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Marcelo Tosatti @ 2009-01-06 19:53 UTC
To: Nick Piggin
Cc: Avi Kivity, Andrea Arcangeli, kvm-devel

On Wed, Dec 31, 2008 at 01:32:37PM +1100, Nick Piggin wrote:
> Right, it seems like it has been kfreed while it is still accessed via
> RCU. But I just can't see how the vmap_area can be freed while there
> is a concurrent process traversing the vmap_area_list...
> __free_vmap_area removes the entry from the list first, then does a
> call_rcu to kfree it.
>
> Hmm...

Ok, the bug seems to be gone now. Avi, can you apply the kernel patch
please? I'll send a separate patch to disable hugepage usage if mmu
notifiers aren't enabled.
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Avi Kivity @ 2009-01-07 10:02 UTC
To: Marcelo Tosatti
Cc: Nick Piggin, Andrea Arcangeli, kvm-devel

Marcelo Tosatti wrote:
> Ok, the bug seems to be gone now. Avi, can you apply the kernel patch
> please?

Done.

--
error compiling committee.c: too many arguments to function
end of thread, newest: ~2009-01-07 10:02 UTC

Thread overview: 10+ messages
2008-12-10 20:23 KVM: mmu_notifiers release method -- Marcelo Tosatti
2008-12-24 12:50 ` Avi Kivity
2008-12-24 15:28   ` Andrea Arcangeli
2008-12-29 14:58     ` __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y -- Marcelo Tosatti
2008-12-30  3:53       ` Nick Piggin
2008-12-30 15:13         ` Marcelo Tosatti
2008-12-30 15:32           ` Avi Kivity
2008-12-31  2:32             ` Nick Piggin
2009-01-06 19:53               ` Marcelo Tosatti
2009-01-07 10:02                 ` Avi Kivity