* KVM: mmu_notifiers release method
From: Marcelo Tosatti @ 2008-12-10 20:23 UTC
To: Andrea Arcangeli
Cc: kvm-devel

The destructor for huge pages uses the backing inode for adjusting
hugetlbfs accounting.

Hugepage mappings are destroyed by exit_mmap, after
mmu_notifier_release, so there are no notifications through
unmap_hugepage_range at this point.

The hugetlbfs inode can be freed with pages backed by it referenced
by the shadow. When the shadow releases its reference, the huge page
destructor will access a now freed inode.

Implement the release operation for kvm mmu notifiers to release page
refs before the hugetlbfs inode is gone.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e7644b9..5bc38b5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -741,11 +741,19 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 	return young;
 }
 
+static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
+				     struct mm_struct *mm)
+{
+	struct kvm *kvm = mmu_notifier_to_kvm(mn);
+	kvm_arch_flush_shadow(kvm);
+}
+
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
 	.invalidate_page	= kvm_mmu_notifier_invalidate_page,
 	.invalidate_range_start	= kvm_mmu_notifier_invalidate_range_start,
 	.invalidate_range_end	= kvm_mmu_notifier_invalidate_range_end,
 	.clear_flush_young	= kvm_mmu_notifier_clear_flush_young,
+	.release		= kvm_mmu_notifier_release,
 };
 
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
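[Editorial illustration of the ordering the changelog describes: a condensed
paraphrase of the 2.6.28-era exit path, not the literal kernel source;
arguments and locking are elided.]

	void exit_mmap(struct mm_struct *mm)
	{
		/* Secondary MMUs are torn down first; with the patch
		 * above, this is where KVM's shadow drops its page
		 * references, via ->release -> kvm_arch_flush_shadow(). */
		mmu_notifier_release(mm);

		/* Only afterwards are the VMAs -- including hugetlbfs
		 * mappings -- unmapped, so no unmap_hugepage_range()
		 * notification reaches KVM during teardown. */
		unmap_vmas(mm /* , ... */);
	}

	/* Without a ->release handler, KVM's page references survive
	 * exit_mmap(); once the hugetlbfs file is closed and its inode
	 * freed, dropping the last page reference runs the huge-page
	 * destructor, which dereferences the freed inode for accounting. */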
* Re: KVM: mmu_notifiers release method
From: Avi Kivity @ 2008-12-24 12:50 UTC
To: Marcelo Tosatti
Cc: Andrea Arcangeli, kvm-devel

Marcelo Tosatti wrote:
> The destructor for huge pages uses the backing inode for adjusting
> hugetlbfs accounting.
>
> Hugepage mappings are destroyed by exit_mmap, after
> mmu_notifier_release, so there are no notifications through
> unmap_hugepage_range at this point.
>
> The hugetlbfs inode can be freed with pages backed by it referenced
> by the shadow. When the shadow releases its reference, the huge page
> destructor will access a now freed inode.
>
> Implement the release operation for kvm mmu notifiers to release page
> refs before the hugetlbfs inode is gone.

I see this isn't it. Andrea, comments?

> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> [patch quoted in full above -- snipped]

--
error compiling committee.c: too many arguments to function
* Re: KVM: mmu_notifiers release method
From: Andrea Arcangeli @ 2008-12-24 15:28 UTC
To: Avi Kivity
Cc: Marcelo Tosatti, kvm-devel

On Wed, Dec 24, 2008 at 02:50:57PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> [changelog quoted in full above -- snipped]
>>
>> Implement the release operation for kvm mmu notifiers to release page
>> refs before the hugetlbfs inode is gone.
>
> I see this isn't it. Andrea, comments?

Yeah, the patch looks good. I talked a bit with Marcelo about this by
PM. The issue is that it's not as straightforward as it seems:
basically, when I implemented the ->release handlers and had spte
teardown running before the files were closed (instead of waiting for
the kvm anon inode release handler to fire), I was getting bugchecks
from debug options including preempt=y (certain debug checks only
become functional with preempt enabled, unfortunately). So eventually
I removed ->release: for kvm, ->release wasn't useful, because no
guest mode can run any more by the time the mmu notifier ->release is
invoked, and removing it avoided the issues with the bugchecks.

We'll be using the mmu notifier ->release because it's always called
just before the file handles are destroyed; it's not really about
guest mode or the secondary mmu, but just an ordering issue with
hugetlbfs internals.

So in short, if no bugcheck triggers, this is fine (at least until
hugetlbfs provides a way to register some callback to invoke at the
start of the hugetlbfs ->release handler).
* __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Marcelo Tosatti @ 2008-12-29 14:58 UTC
To: Andrea Arcangeli, Nick Piggin
Cc: Avi Kivity, kvm-devel

On Wed, Dec 24, 2008 at 04:28:44PM +0100, Andrea Arcangeli wrote:
> [earlier discussion quoted in full above -- snipped]
>
> So in short, if no bugcheck triggers, this is fine (at least until
> hugetlbfs provides a way to register some callback to invoke at the
> start of the hugetlbfs ->release handler).

The only bugcheck I see, which triggers on vanilla kvm upstream with
CONFIG_DEBUG_PREEMPT=y and CONFIG_PREEMPT_RCU=y, is:

general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
ttyS1: 1 input overrun(s)
last sysfs file: /sys/class/net/tap0/address
CPU 0
Modules linked in: tun ipt_MASQUERADE iptable_nat nf_nat bridge stp llc
 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp
 ipt_REJECT iptable_filter ip_tables x_tables dm_multipath kvm_intel kvm
 scsi_wait_scan ata_piix libata dm_snapshot dm_zero dm_mirror
 dm_region_hash dm_log dm_mod shpchp pci_hotplug mptsas mptscsih mptbase
 scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd
Pid: 4768, comm: qemu-system-x86 Not tainted 2.6.28-00165-g4f27e3e-dirty #164
RIP: 0010:[<ffffffff8028a5b6>]  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
RSP: 0018:ffff88021e1f9a38  EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b RCX: 0000000000000003
RDX: ffffffff80a1dae0 RSI: ffff880028083980 RDI: 0000000000000001
RBP: ffff88021e1f9a78 R08: 0000000000000286 R09: ffffffff80a1bf50
R10: ffff880119c270f8 R11: ffff88021e1f99b8 R12: ffff88021e1f9a38
R13: ffff88021e1f9a90 R14: ffff88021e1f9a98 R15: 000000000000813a
FS:  0000000000000000(0000) GS:ffffffff8080d900(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000008d9828 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 4768, threadinfo ffff88021e1f8000, task ffff880119c270f8)
Stack:
 ffff88022bdfd840 ffff880119da11b8 ffffc20011c30000 000000000000813a
 0000000000000000 0000000000000001 ffff88022ec11c18 ffff88022f061838
 ffff88021e1f9aa8 ffffffff8028ab1d ffff88021e1f9aa8 ffffc20021976000
Call Trace:
 [<ffffffff8028ab1d>] free_unmap_vmap_area_noflush+0x69/0x70
 [<ffffffff8028ab49>] remove_vm_area+0x25/0x71
 [<ffffffff8028ac54>] __vunmap+0x3a/0xca
 [<ffffffff8028ad35>] vfree+0x29/0x2b
 [<ffffffffa00f98a3>] kvm_free_physmem_slot+0x25/0x7c [kvm]
 [<ffffffffa00f9d75>] kvm_free_physmem+0x27/0x36 [kvm]
 [<ffffffffa00fccb4>] kvm_arch_destroy_vm+0xa6/0xda [kvm]
 [<ffffffffa00f9e11>] kvm_put_kvm+0x8d/0xa7 [kvm]
 [<ffffffffa00fa0e2>] kvm_vcpu_release+0x13/0x17 [kvm]
 [<ffffffff802a1c07>] __fput+0xeb/0x1a3
 [<ffffffff802a1cd4>] fput+0x15/0x17
 [<ffffffff8029f26c>] filp_close+0x67/0x72
 [<ffffffff802378a8>] put_files_struct+0x74/0xc8
 [<ffffffff80237943>] exit_files+0x47/0x4f
 [<ffffffff80238fe5>] do_exit+0x1eb/0x7a7
 [<ffffffff80587edf>] ? _spin_unlock_irq+0x2b/0x51
 [<ffffffff80239614>] do_group_exit+0x73/0xa0
 [<ffffffff80242b10>] get_signal_to_deliver+0x30c/0x32c
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020a80f>] do_notify_resume+0x8c/0x851
 [<ffffffff8025b811>] ? do_futex+0x90/0x92a
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80587f51>] ? _spin_unlock_irqrestore+0x4c/0x68
 [<ffffffff8026be5c>] ? __rcu_read_unlock+0x92/0x9e
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8024f300>] ? getnstimeofday+0x3a/0x96
 [<ffffffff8024c4f0>] ? ktime_get_ts+0x49/0x4e
 [<ffffffff8020b4c1>] ? sysret_signal+0x5/0x29
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020b7b7>] ptregscall_common+0x67/0xb0
Code: 46 48 c7 c7 c0 d1 74 80 4c 8d 65 c0 e8 0c db 2f 00 48 8b 45 c0 48 8d 58 c0 eb 10 48 89 df e8 74 fe ff ff 48 8b 43 40 48 8d 58 c0 <48> 8b 43 40 0f 18 08 48 8d 43 40 4c 39 e0 75 e0 48 c7 c7 c0 d1
RIP  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
 RSP <ffff88021e1f9a38>
---[ end trace fde3e64ebe4bbca2 ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: qemu-system-x86/4768/0x00000003
Modules linked in: tun ipt_MASQUERADE iptable_nat nf_nat bridge stp llc
 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp
 ipt_REJECT iptable_filter ip_tables x_tables dm_multipath kvm_intel kvm
 scsi_wait_scan ata_piix libata dm_snapshot dm_zero dm_mirror
 dm_region_hash dm_log dm_mod shpchp pci_hotplug mptsas mptscsih mptbase
 scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd
Pid: 4768, comm: qemu-system-x86 Tainted: G      D    2.6.28-00165-g4f27e3e-dirty #164
Call Trace:
 [<ffffffff8025585e>] ? __debug_show_held_locks+0x1b/0x24
 [<ffffffff8023187b>] __schedule_bug+0x8c/0x95
 [<ffffffff805851e1>] schedule+0xd3/0x902
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8037a938>] ? put_io_context+0x67/0x72
 [<ffffffff80238ed4>] do_exit+0xda/0x7a7
 [<ffffffff805892c9>] oops_begin+0x0/0x90
 [<ffffffff8020e3c9>] die+0x5d/0x66
 [<ffffffff80588ff7>] do_general_protection+0x128/0x130
 [<ffffffff80588ecf>] ? do_general_protection+0x0/0x130
 [<ffffffff80588702>] error_exit+0x0/0xa9
 [<ffffffff8028a5b6>] ? __purge_vmap_area_lazy+0x12c/0x163
 [<ffffffff8028a5ae>] ? __purge_vmap_area_lazy+0x124/0x163
 [<ffffffff8028ab1d>] free_unmap_vmap_area_noflush+0x69/0x70
 [<ffffffff8028ab49>] remove_vm_area+0x25/0x71
 [<ffffffff8028ac54>] __vunmap+0x3a/0xca
 [<ffffffff8028ad35>] vfree+0x29/0x2b
 [<ffffffffa00f98a3>] kvm_free_physmem_slot+0x25/0x7c [kvm]
 [<ffffffffa00f9d75>] kvm_free_physmem+0x27/0x36 [kvm]
 [<ffffffffa00fccb4>] kvm_arch_destroy_vm+0xa6/0xda [kvm]
 [<ffffffffa00f9e11>] kvm_put_kvm+0x8d/0xa7 [kvm]
 [<ffffffffa00fa0e2>] kvm_vcpu_release+0x13/0x17 [kvm]
 [<ffffffff802a1c07>] __fput+0xeb/0x1a3
 [<ffffffff802a1cd4>] fput+0x15/0x17
 [<ffffffff8029f26c>] filp_close+0x67/0x72
 [<ffffffff802378a8>] put_files_struct+0x74/0xc8
 [<ffffffff80237943>] exit_files+0x47/0x4f
 [<ffffffff80238fe5>] do_exit+0x1eb/0x7a7
 [<ffffffff80587edf>] ? _spin_unlock_irq+0x2b/0x51
 [<ffffffff80239614>] do_group_exit+0x73/0xa0
 [<ffffffff80242b10>] get_signal_to_deliver+0x30c/0x32c
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020a80f>] do_notify_resume+0x8c/0x851
 [<ffffffff8025b811>] ? do_futex+0x90/0x92a
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80587f51>] ? _spin_unlock_irqrestore+0x4c/0x68
 [<ffffffff8026be5c>] ? __rcu_read_unlock+0x92/0x9e
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8024f300>] ? getnstimeofday+0x3a/0x96
 [<ffffffff8024c4f0>] ? ktime_get_ts+0x49/0x4e
 [<ffffffff8020b4c1>] ? sysret_signal+0x5/0x29
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020b7b7>] ptregscall_common+0x67/0xb0
ttyS1: 26 input overrun(s)

And it's not specific to the vm shutdown path. Another instance:

general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/class/net/tap0/address
CPU 5
Modules linked in: ipt_REJECT xt_state xt_tcpudp iptable_filter
 ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
 nf_defrag_ipv4 ip_tables x_tables tun kvm_intel kvm bridge stp llc
 dm_multipath scsi_wait_scan ata_piix libata dm_snapshot dm_zero
 dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug mptsas
 mptscsih mptbase scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd
 [last unloaded: x_tables]
Pid: 4440, comm: qemu-system-x86 Not tainted 2.6.28-00165-g4f27e3e-dirty #163
RIP: 0010:[<ffffffff8028a5b6>]  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
RSP: 0018:ffff88011f4c7be8  EFLAGS: 00010246
RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b RCX: 0000000000000003
RDX: ffffffff80a1dae0 RSI: ffff880028083980 RDI: 0000000000000001
RBP: ffff88011f4c7c28 R08: 0000000000000282 R09: ffffffff80a1bf50
R10: ffff88022e9dc0f8 R11: ffff88011f4c7b68 R12: ffff88011f4c7be8
R13: ffff88011f4c7c40 R14: ffff88011f4c7c48 R15: 0000000000008001
FS:  0000000040abf950(0063) GS:ffff88022f25ed18(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000229d34000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 4440, threadinfo ffff88011f4c6000, task ffff88022e9dc0f8)
Stack:
 ffff8802291a14b0 ffff880229003d58 ffffc20021000000 0000000000008001
 ffff880229526000 0000000000000000 ffff88022d073000 ffff88011f58c0c0
 ffff88011f4c7c58 ffffffff8028ab1d ffff88011f4c7c58 ffffffffa015c000
Call Trace:
 [<ffffffff8028ab1d>] free_unmap_vmap_area_noflush+0x69/0x70
 [<ffffffff8028ab49>] remove_vm_area+0x25/0x71
 [<ffffffff8028ac54>] __vunmap+0x3a/0xca
 [<ffffffff8028ad0a>] vunmap+0x26/0x28
 [<ffffffffa01be092>] pio_copy_data+0xcf/0x113 [kvm]
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffffa01be16f>] complete_pio+0x99/0x1ef [kvm]
 [<ffffffff8023fcd2>] ? sigprocmask+0xc6/0xd0
 [<ffffffffa01c0295>] kvm_arch_vcpu_ioctl_run+0x9a/0x889 [kvm]
 [<ffffffffa01b84f4>] kvm_vcpu_ioctl+0xfc/0x48b [kvm]
 [<ffffffff802ac760>] vfs_ioctl+0x2a/0x78
 [<ffffffff8026be5c>] ? __rcu_read_unlock+0x92/0x9e
 [<ffffffff802acb46>] do_vfs_ioctl+0x398/0x3c6
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff802acbb6>] sys_ioctl+0x42/0x65
 [<ffffffff8020b43b>] system_call_fastpath+0x16/0x1b
Code: 46 48 c7 c7 c0 d1 74 80 4c 8d 65 c0 e8 8c da 2f 00 48 8b 45 c0 48 8d 58 c0 eb 10 48 89 df e8 74 fe ff ff 48 8b 43 40 48 8d 58 c0 <48> 8b 43 40 0f 18 08 48 8d 43 40 4c 39 e0 75 e0 48 c7 c7 c0 d1
RIP  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
 RSP <ffff88011f4c7be8>
---[ end trace 31811279a2e983e8 ]---
note: qemu-system-x86[4440] exited with preempt_count 2

(gdb) l *(__purge_vmap_area_lazy + 0x12c)
0xffffffff80289ca2 is in __purge_vmap_area_lazy (mm/vmalloc.c:516).
511                     if (nr || force_flush)
512                             flush_tlb_kernel_range(*start, *end);
513
514                     if (nr) {
515                             spin_lock(&vmap_area_lock);
516                             list_for_each_entry(va, &valist, purge_list)
517                                     __free_vmap_area(va);
518                             spin_unlock(&vmap_area_lock);
519                     }
520                     spin_unlock(&purge_lock);

0xffffffff80289c9a <__purge_vmap_area_lazy+292>:        mov    0x40(%rbx),%rax
0xffffffff80289c9e <__purge_vmap_area_lazy+296>:        lea    -0x40(%rax),%rbx
0xffffffff80289ca2 <__purge_vmap_area_lazy+300>:        mov    0x40(%rbx),%rax
                                                        ^^^^^^^^^^^^^^^^^^^
0xffffffff80289ca6 <__purge_vmap_area_lazy+304>:        prefetcht0 (%rax)

Which vanishes once PREEMPT_RCU is disabled.

Nick? KVM does not make direct use of RCU. Same issue happens if the
entire __purge_vmap_area_lazy runs with vmap_area_lock held.
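[Editorial illustration of what the faulting loop at mm/vmalloc.c:516 is
doing; the list_for_each_entry() expansion below is approximate, and the
0x40 offsets are taken from the disassembly in the report.]

	/* list_for_each_entry(va, &valist, purge_list) walks the local
	 * purge list by loading each node's ->next pointer and
	 * converting it back to the containing vmap_area, roughly: */
	for (va = list_entry(valist.next, struct vmap_area, purge_list);
	     &va->purge_list != &valist;
	     va = list_entry(va->purge_list.next, struct vmap_area, purge_list))
		__free_vmap_area(va);

	/* In the oops, RAX = 6b6b6b6b6b6b6b6b is a ->next pointer read
	 * from memory that was already kfreed and filled with
	 * POISON_FREE (0x6b) by slab poisoning; RBX = RAX - 0x40 is the
	 * container_of() step ("lea -0x40(%rax),%rbx", purge_list at
	 * offset 0x40), and the fault is the following
	 * "mov 0x40(%rbx),%rax" / "prefetcht0 (%rax)" chasing the
	 * poisoned pointer: a vmap_area was freed while the walk was
	 * still using it. */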
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Nick Piggin @ 2008-12-30 3:53 UTC
To: Marcelo Tosatti
Cc: Andrea Arcangeli, Avi Kivity, kvm-devel

On Tuesday 30 December 2008 01:58:21 Marcelo Tosatti wrote:
> [earlier discussion and both crash dumps quoted in full -- snipped]
>
> (gdb) l *(__purge_vmap_area_lazy + 0x12c)
> 0xffffffff80289ca2 is in __purge_vmap_area_lazy (mm/vmalloc.c:516).
> 511                     if (nr || force_flush)
> 512                             flush_tlb_kernel_range(*start, *end);
> 513
> 514                     if (nr) {
> 515                             spin_lock(&vmap_area_lock);
> 516                             list_for_each_entry(va, &valist, purge_list)
> 517                                     __free_vmap_area(va);
> 518                             spin_unlock(&vmap_area_lock);
> 519                     }
> 520                     spin_unlock(&purge_lock);
>
> 0xffffffff80289c9a <__purge_vmap_area_lazy+292>:        mov    0x40(%rbx),%rax
> 0xffffffff80289c9e <__purge_vmap_area_lazy+296>:        lea    -0x40(%rax),%rbx
> 0xffffffff80289ca2 <__purge_vmap_area_lazy+300>:        mov    0x40(%rbx),%rax
>                                                         ^^^^^^^^^^^^^^^^^^^
> 0xffffffff80289ca6 <__purge_vmap_area_lazy+304>:        prefetcht0 (%rax)
>
> Which vanishes once PREEMPT_RCU is disabled.
>
> Nick? KVM does not make direct use of RCU. Same issue happens if the
> entire __purge_vmap_area_lazy runs with vmap_area_lock held.

The thing is that the valist and va->purge_list are protected by
purge_lock in that function. I can't easily see how that could get
corrupted.

Is it easy to reproduce? Can you try putting preempt_disable around
rcu_read_lock?
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Marcelo Tosatti @ 2008-12-30 15:13 UTC
To: Nick Piggin
Cc: Andrea Arcangeli, Avi Kivity, kvm-devel

On Tue, Dec 30, 2008 at 02:53:36PM +1100, Nick Piggin wrote:
> > [crash dump tail and gdb source listing snipped]
> >
> > 0xffffffff80289c9a <__purge_vmap_area_lazy+292>:        mov    0x40(%rbx),%rax
> > 0xffffffff80289c9e <__purge_vmap_area_lazy+296>:        lea    -0x40(%rax),%rbx
> > 0xffffffff80289ca2 <__purge_vmap_area_lazy+300>:        mov    0x40(%rbx),%rax
> >                                                         ^^^^^^^^^^^^^^^^^^^

Note:

RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b

> > 0xffffffff80289ca6 <__purge_vmap_area_lazy+304>:        prefetcht0 (%rax)
> >
> > Which vanishes once PREEMPT_RCU is disabled.
> >
> > Nick? KVM does not make direct use of RCU. Same issue happens if the
> > entire __purge_vmap_area_lazy runs with vmap_area_lock held.

Hum, mmu_notifiers does.

> The thing is that the valist and va->purge_list are protected by
> purge_lock in that function.

Which disables preemption...

> I can't easily see how that could get corrupted.

Perhaps the corruption happens before the freeing pass.

> Is it easy to reproduce?

Yes. Enable CONFIG_PREEMPT, CONFIG_PREEMPT_RCU, and DEBUG_PREEMPT.
Start a Linux guest, wait for boot to finish, and shut it down.
Sometimes it happens even before guest shutdown, in the complete_pio
path as reported.

> Can you try putting preempt_disable around rcu_read_lock?

Tried it before, does not help. Even tried to protect all
rcu_read_lock/unlock pairs with preempt_disable/enable in vmalloc.c.
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Avi Kivity @ 2008-12-30 15:32 UTC
To: Marcelo Tosatti
Cc: Nick Piggin, Andrea Arcangeli, kvm-devel

Marcelo Tosatti wrote:
> On Tue, Dec 30, 2008 at 02:53:36PM +1100, Nick Piggin wrote:
>> [gdb source listing and disassembly snipped]
>
> Note:
>
> RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b

Good old POISON_FREE.

--
error compiling committee.c: too many arguments to function
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Nick Piggin @ 2008-12-31 2:32 UTC
To: Avi Kivity
Cc: Marcelo Tosatti, Andrea Arcangeli, kvm-devel

On Wednesday 31 December 2008 02:32:50 Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Note:
>>
>> RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b
>
> Good old POISON_FREE.

Right, it seems like it has been kfreed while it is still accessed via
RCU. But I just can't see how the vmap_area can be freed while there is
a concurrent process traversing the vmap_area_list... __free_vmap_area
removes the entry from the list first, then does a call_rcu to kfree
it.

Hmm...
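[Editorial illustration of the removal-then-deferred-free pattern Nick
refers to, abridged from the 2.6.28-era mm/vmalloc.c; treat names and
details as approximate rather than authoritative.]

	static void __free_vmap_area(struct vmap_area *va)
	{
		/* Unlink from the lookup tree and from vmap_area_list
		 * before anything is freed... */
		rb_erase(&va->rb_node, &vmap_area_root);
		RB_CLEAR_NODE(&va->rb_node);
		list_del_rcu(&va->list);

		/* ...and defer the kfree() until all pre-existing RCU
		 * readers of vmap_area_list have finished; the
		 * rcu_free_va() callback just kfree()s va. */
		call_rcu(&va->rcu_head, rcu_free_va);
	}

Note that the RCU deferral covers readers of vmap_area_list (->list);
the faulting loop walks the local valist via ->purge_list, which relies
on purge_lock instead, which is why the poisoned node is surprising.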
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Marcelo Tosatti @ 2009-01-06 19:53 UTC
To: Nick Piggin
Cc: Avi Kivity, Andrea Arcangeli, kvm-devel

On Wed, Dec 31, 2008 at 01:32:37PM +1100, Nick Piggin wrote:
> Right, it seems like it has been kfreed while it is still accessed via
> RCU. But I just can't see how the vmap_area can be freed while there
> is a concurrent process traversing the vmap_area_list...
> __free_vmap_area removes the entry from the list first, then does a
> call_rcu to kfree it.
>
> Hmm...

Ok, the bug seems to be gone now. Avi, can you apply the kernel patch
please? I'll send a separate patch to disable hugepage usage if mmu
notifiers aren't enabled.
* Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
From: Avi Kivity @ 2009-01-07 10:02 UTC
To: Marcelo Tosatti
Cc: Nick Piggin, Andrea Arcangeli, kvm-devel

Marcelo Tosatti wrote:
> Ok, the bug seems to be gone now. Avi, can you apply the kernel patch
> please?

Done.

--
error compiling committee.c: too many arguments to function
end of thread, newest: ~2009-01-07 10:02 UTC

Thread overview: 10+ messages
2008-12-10 20:23 KVM: mmu_notifiers release method -- Marcelo Tosatti
2008-12-24 12:50 ` Avi Kivity
2008-12-24 15:28   ` Andrea Arcangeli
2008-12-29 14:58     ` __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y -- Marcelo Tosatti
2008-12-30  3:53       ` Nick Piggin
2008-12-30 15:13         ` Marcelo Tosatti
2008-12-30 15:32           ` Avi Kivity
2008-12-31  2:32             ` Nick Piggin
2009-01-06 19:53               ` Marcelo Tosatti
2009-01-07 10:02                 ` Avi Kivity