__purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y

From: Marcelo Tosatti <mtosatti@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Avi Kivity <avi@redhat.com>, kvm-devel <kvm@vger.kernel.org>
Subject: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
Date: Mon, 29 Dec 2008 12:58:21 -0200
Message-ID: <20081229145821.GA3823@amt.cnet>
In-Reply-To: <20081224152844.GE29319@random.random>

On Wed, Dec 24, 2008 at 04:28:44PM +0100, Andrea Arcangeli wrote:
> On Wed, Dec 24, 2008 at 02:50:57PM +0200, Avi Kivity wrote:
> > Marcelo Tosatti wrote:
> >> The destructor for huge pages uses the backing inode for adjusting
> >> hugetlbfs accounting.
> >>
> >> Hugepage mappings are destroyed by exit_mmap, after
> >> mmu_notifier_release, so there are no notifications through
> >> unmap_hugepage_range at this point.
> >>
> >> The hugetlbfs inode can be freed with pages backed by it referenced
> >> by the shadow. When the shadow releases its reference, the huge page
> >> destructor will access a now freed inode.
> >>
> >> Implement the release operation for kvm mmu notifiers to release page
> >> refs before the hugetlbfs inode is gone.
> >>
> >>   
> >
> > I see this isn't it.  Andrea, comments?
> 
> Yeah, the patch looks good, I talked a bit with Marcelo about this by
> PM. The issue is that it's not as straightforward as it seems,
> basically when I implemented the ->release handlers and had sptes
> teardown running before the files were closed (instead of waiting for
> the kvm anon inode release handler to fire) I was getting bugchecks from
> debug options including preempt=y (certain debug checks only become
> functional with preempt enabled, unfortunately), so eventually I
> removed ->release because for kvm ->release wasn't useful because no
> guest mode can run any more by the time mmu notifier ->release is
> invoked, and that avoided the issues with the bugchecks.
> 
> We'll be using the mmu notifiers ->release because it's always called
> just before the file handles are destroyed; it's not really about the
> guest mode or secondary mmu but just an ordering issue with hugetlbfs
> internals.
> 
> So in short if no bugcheck triggers this is fine (at least until
> hugetlbfs provides a way to register some callback to invoke at the
> start of the hugetlbfs->release handler).

The only bugcheck I see, which triggers on vanilla upstream KVM with
CONFIG_DEBUG_PREEMPT=y and CONFIG_PREEMPT_RCU=y, is:

general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
<4>ttyS1: 1 input overrun(s)

last sysfs file: /sys/class/net/tap0/address
CPU 0 
Modules linked in: tun ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables dm_multipath kvm_intel kvm scsi_wait_scan ata_piix libata dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug mptsas mptscsih mptbase scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd
Pid: 4768, comm: qemu-system-x86 Not tainted 2.6.28-00165-g4f27e3e-dirty #164
RIP: 0010:[<ffffffff8028a5b6>]  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
RSP: 0018:ffff88021e1f9a38  EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b RCX: 0000000000000003
RDX: ffffffff80a1dae0 RSI: ffff880028083980 RDI: 0000000000000001
RBP: ffff88021e1f9a78 R08: 0000000000000286 R09: ffffffff80a1bf50
R10: ffff880119c270f8 R11: ffff88021e1f99b8 R12: ffff88021e1f9a38
R13: ffff88021e1f9a90 R14: ffff88021e1f9a98 R15: 000000000000813a
FS:  0000000000000000(0000) GS:ffffffff8080d900(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000008d9828 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 4768, threadinfo ffff88021e1f8000, task ffff880119c270f8)
Stack:
 ffff88022bdfd840 ffff880119da11b8 ffffc20011c30000 000000000000813a
 0000000000000000 0000000000000001 ffff88022ec11c18 ffff88022f061838
 ffff88021e1f9aa8 ffffffff8028ab1d ffff88021e1f9aa8 ffffc20021976000
Call Trace:
 [<ffffffff8028ab1d>] free_unmap_vmap_area_noflush+0x69/0x70
 [<ffffffff8028ab49>] remove_vm_area+0x25/0x71
 [<ffffffff8028ac54>] __vunmap+0x3a/0xca
 [<ffffffff8028ad35>] vfree+0x29/0x2b
 [<ffffffffa00f98a3>] kvm_free_physmem_slot+0x25/0x7c [kvm]
 [<ffffffffa00f9d75>] kvm_free_physmem+0x27/0x36 [kvm]
 [<ffffffffa00fccb4>] kvm_arch_destroy_vm+0xa6/0xda [kvm]
 [<ffffffffa00f9e11>] kvm_put_kvm+0x8d/0xa7 [kvm]
 [<ffffffffa00fa0e2>] kvm_vcpu_release+0x13/0x17 [kvm]
 [<ffffffff802a1c07>] __fput+0xeb/0x1a3
 [<ffffffff802a1cd4>] fput+0x15/0x17
 [<ffffffff8029f26c>] filp_close+0x67/0x72
 [<ffffffff802378a8>] put_files_struct+0x74/0xc8
 [<ffffffff80237943>] exit_files+0x47/0x4f
 [<ffffffff80238fe5>] do_exit+0x1eb/0x7a7
 [<ffffffff80587edf>] ? _spin_unlock_irq+0x2b/0x51
 [<ffffffff80239614>] do_group_exit+0x73/0xa0
 [<ffffffff80242b10>] get_signal_to_deliver+0x30c/0x32c
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020a80f>] do_notify_resume+0x8c/0x851
 [<ffffffff8025b811>] ? do_futex+0x90/0x92a
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80587f51>] ? _spin_unlock_irqrestore+0x4c/0x68
 [<ffffffff8026be5c>] ? __rcu_read_unlock+0x92/0x9e
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8024f300>] ? getnstimeofday+0x3a/0x96
 [<ffffffff8024c4f0>] ? ktime_get_ts+0x49/0x4e
 [<ffffffff8020b4c1>] ? sysret_signal+0x5/0x29
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020b7b7>] ptregscall_common+0x67/0xb0
Code: 46 48 c7 c7 c0 d1 74 80 4c 8d 65 c0 e8 0c db 2f 00 48 8b 45 c0 48 8d 58 c0 eb 10 48 89 df e8 74 fe ff ff 48 8b 43 40 48 8d 58 c0 <48> 8b 43 40 0f 18 08 48 8d 43 40 4c 39 e0 75 e0 48 c7 c7 c0 d1 
RIP  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
 RSP <ffff88021e1f9a38>
---[ end trace fde3e64ebe4bbca2 ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: qemu-system-x86/4768/0x00000003
Modules linked in: tun ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables dm_multipath kvm_intel kvm scsi_wait_scan ata_piix libata dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug mptsas mptscsih mptbase scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd
Pid: 4768, comm: qemu-system-x86 Tainted: G      D    2.6.28-00165-g4f27e3e-dirty #164
Call Trace:
 [<ffffffff8025585e>] ? __debug_show_held_locks+0x1b/0x24
 [<ffffffff8023187b>] __schedule_bug+0x8c/0x95
 [<ffffffff805851e1>] schedule+0xd3/0x902
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8037a938>] ? put_io_context+0x67/0x72
 [<ffffffff80238ed4>] do_exit+0xda/0x7a7
 [<ffffffff805892c9>] oops_begin+0x0/0x90
 [<ffffffff8020e3c9>] die+0x5d/0x66
 [<ffffffff80588ff7>] do_general_protection+0x128/0x130
 [<ffffffff80588ecf>] ? do_general_protection+0x0/0x130
 [<ffffffff80588702>] error_exit+0x0/0xa9
 [<ffffffff8028a5b6>] ? __purge_vmap_area_lazy+0x12c/0x163
 [<ffffffff8028a5ae>] ? __purge_vmap_area_lazy+0x124/0x163
 [<ffffffff8028ab1d>] free_unmap_vmap_area_noflush+0x69/0x70
 [<ffffffff8028ab49>] remove_vm_area+0x25/0x71
 [<ffffffff8028ac54>] __vunmap+0x3a/0xca
 [<ffffffff8028ad35>] vfree+0x29/0x2b
 [<ffffffffa00f98a3>] kvm_free_physmem_slot+0x25/0x7c [kvm]
 [<ffffffffa00f9d75>] kvm_free_physmem+0x27/0x36 [kvm]
 [<ffffffffa00fccb4>] kvm_arch_destroy_vm+0xa6/0xda [kvm]
 [<ffffffffa00f9e11>] kvm_put_kvm+0x8d/0xa7 [kvm]
 [<ffffffffa00fa0e2>] kvm_vcpu_release+0x13/0x17 [kvm]
 [<ffffffff802a1c07>] __fput+0xeb/0x1a3
 [<ffffffff802a1cd4>] fput+0x15/0x17
 [<ffffffff8029f26c>] filp_close+0x67/0x72
 [<ffffffff802378a8>] put_files_struct+0x74/0xc8
 [<ffffffff80237943>] exit_files+0x47/0x4f
 [<ffffffff80238fe5>] do_exit+0x1eb/0x7a7
 [<ffffffff80587edf>] ? _spin_unlock_irq+0x2b/0x51
 [<ffffffff80239614>] do_group_exit+0x73/0xa0
 [<ffffffff80242b10>] get_signal_to_deliver+0x30c/0x32c
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020a80f>] do_notify_resume+0x8c/0x851
 [<ffffffff8025b811>] ? do_futex+0x90/0x92a
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80587f51>] ? _spin_unlock_irqrestore+0x4c/0x68
 [<ffffffff8026be5c>] ? __rcu_read_unlock+0x92/0x9e
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8024f300>] ? getnstimeofday+0x3a/0x96
 [<ffffffff8024c4f0>] ? ktime_get_ts+0x49/0x4e
 [<ffffffff8020b4c1>] ? sysret_signal+0x5/0x29
 [<ffffffff80256bd7>] ? trace_hardirqs_on_caller+0xf0/0x114
 [<ffffffff8020b4d5>] ? sysret_signal+0x19/0x29
 [<ffffffff8020b7b7>] ptregscall_common+0x67/0xb0
ttyS1: 26 input overrun(s)

And it's not specific to the VM shutdown path. Another instance:


general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/class/net/tap0/address
CPU 5 
Modules linked in: ipt_REJECT xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables tun kvm_intel kvm bridge stp llc dm_multipath scsi_wait_scan ata_piix libata dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod shpchp pci_hotplug mptsas mptscsih mptbase scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd [last unloaded: x_tables]
Pid: 4440, comm: qemu-system-x86 Not tainted 2.6.28-00165-g4f27e3e-dirty #163
RIP: 0010:[<ffffffff8028a5b6>]  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
RSP: 0018:ffff88011f4c7be8  EFLAGS: 00010246
RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b2b RCX: 0000000000000003
RDX: ffffffff80a1dae0 RSI: ffff880028083980 RDI: 0000000000000001
RBP: ffff88011f4c7c28 R08: 0000000000000282 R09: ffffffff80a1bf50
R10: ffff88022e9dc0f8 R11: ffff88011f4c7b68 R12: ffff88011f4c7be8
R13: ffff88011f4c7c40 R14: ffff88011f4c7c48 R15: 0000000000008001
FS:  0000000040abf950(0063) GS:ffff88022f25ed18(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000229d34000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 4440, threadinfo ffff88011f4c6000, task ffff88022e9dc0f8)
Stack:
 ffff8802291a14b0 ffff880229003d58 ffffc20021000000 0000000000008001
 ffff880229526000 0000000000000000 ffff88022d073000 ffff88011f58c0c0
 ffff88011f4c7c58 ffffffff8028ab1d ffff88011f4c7c58 ffffffffa015c000
Call Trace:
 [<ffffffff8028ab1d>] free_unmap_vmap_area_noflush+0x69/0x70
 [<ffffffff8028ab49>] remove_vm_area+0x25/0x71
 [<ffffffff8028ac54>] __vunmap+0x3a/0xca
 [<ffffffff8028ad0a>] vunmap+0x26/0x28
 [<ffffffffa01be092>] pio_copy_data+0xcf/0x113 [kvm]
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffffa01be16f>] complete_pio+0x99/0x1ef [kvm]
 [<ffffffff8023fcd2>] ? sigprocmask+0xc6/0xd0
 [<ffffffffa01c0295>] kvm_arch_vcpu_ioctl_run+0x9a/0x889 [kvm]
 [<ffffffffa01b84f4>] kvm_vcpu_ioctl+0xfc/0x48b [kvm]
 [<ffffffff802ac760>] vfs_ioctl+0x2a/0x78
 [<ffffffff8026be5c>] ? __rcu_read_unlock+0x92/0x9e
 [<ffffffff802acb46>] do_vfs_ioctl+0x398/0x3c6
 [<ffffffff80256c08>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff802acbb6>] sys_ioctl+0x42/0x65
 [<ffffffff8020b43b>] system_call_fastpath+0x16/0x1b
Code: 46 48 c7 c7 c0 d1 74 80 4c 8d 65 c0 e8 8c da 2f 00 48 8b 45 c0 48 8d 58 c0 eb 10 48 89 df e8 74 fe ff ff 48 8b 43 40 48 8d 58 c0 <48> 8b 43 40 0f 18 08 48 8d 43 40 4c 39 e0 75 e0 48 c7 c7 c0 d1 
RIP  [<ffffffff8028a5b6>] __purge_vmap_area_lazy+0x12c/0x163
 RSP <ffff88011f4c7be8>
---[ end trace 31811279a2e983e8 ]---
note: qemu-system-x86[4440] exited with preempt_count 2


(gdb) l *(__purge_vmap_area_lazy + 0x12c)
0xffffffff80289ca2 is in __purge_vmap_area_lazy (mm/vmalloc.c:516).
511     if (nr || force_flush)
512         flush_tlb_kernel_range(*start, *end);
513 
514     if (nr) {
515         spin_lock(&vmap_area_lock);
516         list_for_each_entry(va, &valist, purge_list)
517             __free_vmap_area(va);
518         spin_unlock(&vmap_area_lock);
519     }
520     spin_unlock(&purge_lock);

0xffffffff80289c9a <__purge_vmap_area_lazy+292>:    mov 0x40(%rbx),%rax
0xffffffff80289c9e <__purge_vmap_area_lazy+296>:    lea -0x40(%rax),%rbx
0xffffffff80289ca2 <__purge_vmap_area_lazy+300>:    mov 0x40(%rbx),%rax
                                                    ^^^^^^^^^^^^^^^^^^^
0xffffffff80289ca6 <__purge_vmap_area_lazy+304>:    prefetcht0 (%rax)
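
As a reading aid for the register dump: RAX holds 0x6b repeated in every byte, which is the kernel's POISON_FREE pattern for freed slab memory, and RBX is exactly RAX - 0x40. That is consistent with list_for_each_entry() loading a ->next pointer out of an already-freed entry and then applying the container_of() offset seen in the disassembly. The following is a minimal userspace sketch of that arithmetic, not kernel code: struct fake_vmap_area and both helper functions are hypothetical, and the 0x40 offset is taken from the disassembly rather than from the real struct vmap_area layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value as POISON_FREE in include/linux/poison.h. */
#define POISON_FREE 0x6b

/* Stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/*
 * Hypothetical layout: only the position of the list member matters.
 * The padding places purge_list at offset 0x40 to match the
 * "mov 0x40(%rbx),%rax" / "lea -0x40(%rax),%rbx" pair above.
 */
struct fake_vmap_area {
    unsigned char other_fields[0x40];
    struct list_head purge_list;
};

/* A 64-bit word read out of poisoned (freed) memory. */
uint64_t poisoned_word(void)
{
    uint64_t w;

    memset(&w, POISON_FREE, sizeof(w));
    return w;
}

/* What container_of(next, struct fake_vmap_area, purge_list) computes. */
uint64_t entry_from_next(uint64_t next)
{
    return next - offsetof(struct fake_vmap_area, purge_list);
}
```

With these definitions, poisoned_word() reproduces the RAX value and entry_from_next(poisoned_word()) reproduces the RBX value. The next load, 0x40(%rbx), then targets 0x6b6b6b6b6b6b6b6b, which is a non-canonical address on x86-64 and so raises a general protection fault rather than a page fault.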


The crash vanishes once CONFIG_PREEMPT_RCU is disabled.

Nick? KVM does not make direct use of RCU. The same issue happens even if
the entire __purge_vmap_area_lazy runs with vmap_area_lock held.
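
One possible reading of the loop at mm/vmalloc.c:516, offered as a hypothesis and not something this message establishes: list_for_each_entry() reads va->purge_list.next only after the loop body has run, so if __free_vmap_area() lets the entry be freed before that read (for example through an RCU callback that can fire early under preemptible RCU), the iterator walks freed memory. The usual defensive pattern is to fetch the next link before the body runs, which is what list_for_each_entry_safe() does in the kernel. The sketch below shows that pattern in plain userspace C; struct node, build_chain() and free_all_safe() are hypothetical names for illustration only.

```c
#include <stdlib.h>

/* Hypothetical singly linked node standing in for a list entry. */
struct node {
    int id;
    struct node *next;
};

/* Build a chain of n heap-allocated nodes with ids 0..n-1. */
struct node *build_chain(int n)
{
    struct node *head = NULL;

    for (int i = n - 1; i >= 0; i--) {
        struct node *nd = malloc(sizeof(*nd));

        if (!nd)
            abort();
        nd->id = i;
        nd->next = head;
        head = nd;
    }
    return head;
}

/*
 * Free every node, reading the next link BEFORE the current node is
 * released. Reading cur->next after free(cur) would be the same kind
 * of use-after-free as the iterator hazard described above.
 * Returns the number of nodes freed.
 */
int free_all_safe(struct node *head)
{
    int freed = 0;
    struct node *cur = head;
    struct node *next;

    while (cur) {
        next = cur->next;   /* save the link while cur is still valid */
        free(cur);
        freed++;
        cur = next;
    }
    return freed;
}
```

For example, free_all_safe(build_chain(3)) walks and releases all three nodes without ever touching a freed one. Whether the kernel loop needs the equivalent change depends on when __free_vmap_area() actually releases the entry, which is not shown in this message.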
