Re: [Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Greg Kurz <groug@kaod.org>,
	imammedo@redhat.com, bharata@linux.vnet.ibm.com,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on
Date: Thu, 16 Nov 2017 17:59:17 +0200
Message-ID: <20171116175713-mutt-send-email-mst@kernel.org>
In-Reply-To: <20171116085345.7b02ecdd@t450s.home>

On Thu, Nov 16, 2017 at 08:53:45AM -0700, Alex Williamson wrote:
> On Thu, 16 Nov 2017 17:28:44 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Nov 16, 2017 at 01:23:39PM +0100, Greg Kurz wrote:
> > > Hi,
> > > 
> > > I'm resurrecting a thread about a QEMU crash we're still hitting on ppc64. It
> > > was reported to the list by Bharata 2 months ago:
> > > 
> > > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg03685.html
> > > 
> > > "Hi,
> > > 
> > > QEMU hits the below assert
> > > 
> > > qemu-system-ppc64: used ring relocated for ring 2
> > > qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion `r >= 0' failed.
> > > 
> > > in the following scenario:
> > > 
> > > 1. Boot guest with vhost=on
> > >   -netdev tap,id=mynet0,script=qemu-ifup,downscript=qemu-ifdown,vhost=on -device virtio-net-pci,netdev=mynet0
> > > 2. Hot add a DIMM device 
> > > 3. Reboot
> > >    When the guest reboots, we can see
> > >    vhost_virtqueue_start:vq->used_phys getting assigned an address that
> > >    falls in the hotplugged memory range.
> > > 4. Remove the DIMM device
> > >    Guest refuses the removal as the hotplugged memory is under use.
> > > 5. Reboot
> > >    QEMU forces the removal of the DIMM device during reset and that's
> > >    when we hit the above assert.
> > > 
> > > Any pointers on why we are hitting this assert ? Shouldn't vhost be
> > > done with using the hotplugged memory when we hit reset ?
> > > 
> > > Regards,
> > > Bharata."
> > > 
> > > #0  0x00007ffff760eff0 in raise () from /lib64/libc.so.6
> > > #1  0x00007ffff761136c in abort () from /lib64/libc.so.6
> > > #2  0x00007ffff7604c44 in __assert_fail_base () from /lib64/libc.so.6
> > > #3  0x00007ffff7604d34 in __assert_fail () from /lib64/libc.so.6
> > > #4  0x0000000010161138 in vhost_commit (listener=0x11469e88) at /home/greg/Work/qemu/qemu-spapr/hw/virtio/vhost.c:650
> > > #5  0x00000000100917fc in memory_region_transaction_commit () at /home/greg/Work/qemu/qemu-spapr/memory.c:1094
> > > #6  0x0000000010096748 in memory_region_del_subregion (mr=0x1143eed0, subregion=0x116f1920) at /home/greg/Work/qemu/qemu-spapr/memory.c:2337
> > > #7  0x00000000104a9aec in pc_dimm_memory_unplug (dev=0x11445c50, hpms=0x1143eec0, mr=0x116f1920) at /home/greg/Work/qemu/qemu-spapr/hw/mem/pc-dimm.c:126
> > > #8  0x0000000010180454 in spapr_lmb_release (dev=0x11445c50) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr.c:3151
> > > #9  0x00000000101a397c in spapr_drc_release (drc=0x116b9cc0) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:401
> > > #10 0x00000000101a3ba0 in spapr_drc_reset (drc=0x116b9cc0) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:439
> > > #11 0x00000000101a3c88 in drc_reset (opaque=0x116b9cc0) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:460
> > > #12 0x0000000010447380 in qemu_devices_reset () at /home/greg/Work/qemu/qemu-spapr/hw/core/reset.c:69
> > > #13 0x000000001017ae80 in ppc_spapr_reset () at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr.c:1445
> > > #14 0x0000000010377c60 in qemu_system_reset (reason=SHUTDOWN_CAUSE_HOST_QMP) at /home/greg/Work/qemu/qemu-spapr/vl.c:1788
> > > #15 0x00000000103785ac in main_loop_should_exit () at /home/greg/Work/qemu/qemu-spapr/vl.c:1962
> > > #16 0x0000000010378708 in main_loop () at /home/greg/Work/qemu/qemu-spapr/vl.c:1999
> > > #17 0x0000000010382c54 in main (argc=21, argv=0x7ffffffff098, envp=0x7ffffffff148) at /home/greg/Work/qemu/qemu-spapr/vl.c:4897
> > > 
> > > 
> > > This basically happens because on pseries, like x86, we usually wait
> > > for the guest to eject the DIMM before actually removing it, BUT,
> > > unlike x86, we force the removal on reset. This is handled by a DRC
> > > object which registers a handler with qemu_register_reset().
> > > 
> > > At reset time, the machine calls qemu_devices_reset() but unfortunately,
> > > the DRC reset handler gets called BEFORE the VirtIONet device one. The
> > > vhost device is still active and it doesn't like the ring addresses to
> > > change while in this state.
> > > 
> > > Michael,
> > > 
> > > The assert() has been around since the beginning, at a time I believe there was
> > > no such thing as memory hot-unplug. Now that memory can go away at reset time,
> > > is it really legitimate to crash QEMU if vhost detects a ring address change ?  
> > 
> > It's just a symptom of a problem though. If memory is going away
> > while vhost backend is running, things are not going to
> > end well. Less scary for a network device, more scary for a block
> > device. VFIO probably has the same issue, it just does not
> > have an assert.
> 
> Hmm, why?  We don't have data structures living in guest RAM with
> vfio.  Guest RAM is just a mapping through the IOMMU.  So long as the
> MemoryListener is correctly doing its job, that range will be unmapped
> and vfio shouldn't care about it.  Thanks,
> 
> Alex

The range is unmapped by the listener, but the device has not been reset
yet, so it will get errors when it attempts to access guest RAM.
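
To make the ordering concrete, here is a toy model in C. It is not QEMU
code, and every name in it (struct machine, register_reset, dimm_reset,
net_reset, simulate_reset) is made up for illustration. It assumes reset
handlers run in the order they were registered, and it only counts how
often memory is pulled while a backend still has its ring there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in QEMU. */
#define MAX_HANDLERS 8

struct machine;
typedef void (*reset_fn)(struct machine *m);

struct machine {
    reset_fn handlers[MAX_HANDLERS]; /* called in registration order */
    size_t n_handlers;
    bool dimm_mapped;      /* hotplugged range still mapped */
    bool backend_running;  /* device backend not yet stopped */
    bool ring_in_dimm;     /* backend's used ring lives in the DIMM */
    int faults;            /* times memory vanished under a live backend */
};

static void register_reset(struct machine *m, reset_fn fn)
{
    assert(m->n_handlers < MAX_HANDLERS);
    m->handlers[m->n_handlers++] = fn;
}

/* Stands in for the forced DIMM removal at reset. */
static void dimm_reset(struct machine *m)
{
    if (m->backend_running && m->ring_in_dimm) {
        m->faults++; /* memory goes away under a running backend */
    }
    m->dimm_mapped = false;
}

/* Stands in for the device reset that stops the backend. */
static void net_reset(struct machine *m)
{
    m->backend_running = false;
    m->ring_in_dimm = false;
}

/* Run every registered handler, first registered first. */
static void devices_reset(struct machine *m)
{
    for (size_t i = 0; i < m->n_handlers; i++) {
        m->handlers[i](m);
    }
}

/* Returns the number of faults seen for a given registration order. */
int simulate_reset(bool dimm_handler_first)
{
    struct machine m = {
        .dimm_mapped = true,
        .backend_running = true,
        .ring_in_dimm = true,
    };

    if (dimm_handler_first) {
        register_reset(&m, dimm_reset);
        register_reset(&m, net_reset);
    } else {
        register_reset(&m, net_reset);
        register_reset(&m, dimm_reset);
    }
    devices_reset(&m);
    return m.faults;
}
```

In this model simulate_reset(true), the order Greg describes, returns 1,
and simulate_reset(false) returns 0. Whether the backend asserts or just
sees access errors, the underlying problem is the same ordering.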

-- 
MST

Thread overview: 13+ messages
2017-11-16 12:23 [Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on Greg Kurz
2017-11-16 15:28 ` Michael S. Tsirkin
2017-11-16 15:34   ` Greg Kurz
2017-11-16 15:53   ` Alex Williamson
2017-11-16 15:59     ` Michael S. Tsirkin [this message]
2017-11-16 16:04       ` Alex Williamson
2017-11-16 16:17         ` Michael S. Tsirkin
  -- strict thread matches above, loose matches on Subject: below --
2017-09-14  7:01 Bharata B Rao
2017-09-14  8:00 ` Igor Mammedov
2017-09-14  8:18   ` Bharata B Rao
2017-09-14  8:59     ` Igor Mammedov
2017-09-14 10:20       ` Bharata B Rao
2017-10-16 10:08       ` Bharata B Rao