qemu-devel.nongnu.org archive mirror
From: Igor Mammedov <imammedo@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC v3 0/8] Fix QEMU crash during memory hotplug with vhost=on
Date: Wed, 8 Jul 2015 13:41:16 +0200
Message-ID: <20150708134116.0218f4e5@nial.brq.redhat.com>
In-Reply-To: <20150708125901-mutt-send-email-mst@redhat.com>

On Wed, 8 Jul 2015 13:01:05 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Wed, Jul 08, 2015 at 11:46:40AM +0200, Igor Mammedov wrote:
> > Changelog:
> >  v2->v3:
> >    * fixed (worked around) unmapping issues:
> >      the memory subsystem now keeps track of HVA-mapped
> >      regions and doesn't allow mapping a new region
> >      at an address where a previous one has been mapped until
> >      the previous region is gone
> >    * fixed offset calculations in memory_region_find_hva_range()
> >      in 2/8
> >    * reworked MemorySection folding into HVA range for vhost:
> >      the compacted memory map is now temporary, is passed only to the
> >      vhost backend and doesn't touch the original memory map used by QEMU
> >  v1->v2:
> >    * take into account Paolo's review comments
> >      * do not overload ram_addr
> >      * ifdef linux specific code
> >    * reserve HVA using API from exec.c instead of calling
> >      mmap() directly from memory.c
> >    * support unmapping of HVA remapped region
> > 
> > When more than ~50 pc-dimm devices are hotplugged with
> > vhost enabled, QEMU will assert in vhost's vhost_commit()
> > because the backend refuses to accept that many memory ranges.
> > 
> > The series introduces a reserved-HVA MemoryRegion container
> > into which all hotplugged memory is remapped, and passes
> > the single container range to vhost instead of a separate
> > memory range for each hotplugged pc-dimm device.
> 
> Did not review yet, two procedural comments:
> 
> - this fixes qemu on current kernels, so it's a bugfix
It's easier and safer to patch current kernels with a much simpler
fix on the kernel side.
I'd call this series a workaround for the current vhost
limitation, one that hacks around QEMU's memory layer to satisfy
a broken vhost.
From my point of view, the HVA hack is worthwhile only if
we convert all RAM, including initial RAM, and replace the lookup
in the vhost region mapping with a simple computation using a single
HVA range.
So far the 'cure' in QEMU is much worse than raising the limit on the
vhost backend side.
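
To illustrate the difference between a per-region lookup and a
single-range computation, here is a minimal sketch. It is not code from
the series or from vhost: struct region and both translate_*() helpers
are hypothetical names, and the single-range variant assumes that guest
RAM is laid out inside the HVA range at the same offsets as in guest
physical address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor, for illustration only. */
struct region {
    uint64_t gpa;   /* guest physical start address */
    uint64_t size;  /* length in bytes */
    uint64_t hva;   /* host virtual address backing 'gpa' */
};

/* Table lookup: linear scan over all regions. Returns 0 if unmapped. */
static uint64_t translate_lookup(const struct region *tbl, size_t n,
                                 uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= tbl[i].gpa && gpa - tbl[i].gpa < tbl[i].size) {
            return tbl[i].hva + (gpa - tbl[i].gpa);
        }
    }
    return 0;
}

/* Single range: one bounds check and one addition. Returns 0 if outside. */
static uint64_t translate_single(const struct region *r, uint64_t gpa)
{
    assert(r != NULL);
    if (gpa < r->gpa || gpa - r->gpa >= r->size) {
        return 0;
    }
    return r->hva + (gpa - r->gpa);
}
```

With one range covering all RAM, the cost of a translation no longer
depends on how many DIMMs have been plugged.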

 
> - this changes the semantics of memory hot unplug slightly
>   so I think it's important to merge in 2.4 before we
>   release qemu with memory hot unplug, this way we
>   won't have to maintain old semantics forever
It changes memory hotplug semantics as follows:
  * hotplug at a previously used address could fail gracefully
    if the previous mapping is still in use
  * the user MUST remove the corresponding backend after removing
    the pc-dimm device
The above limitations are needed only if vhost is used and are pointless
in all other cases, so I'm not sure that we should enforce
them on vhost-less setups.
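
The first limitation can be pictured with a small sketch of a tracker
that refuses to map at an offset whose previous mapping is still alive.
This is an illustration under assumptions, not the code in the series:
struct hva_container, hva_map() and hva_release() are hypothetical
names, and the fixed-size slot array is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 8   /* arbitrary limit for the sketch */

struct hva_slot {
    uint64_t offset;  /* offset inside the reserved HVA container */
    uint64_t size;
    bool in_use;      /* stays true until the backend is really gone */
};

struct hva_container {
    struct hva_slot slots[MAX_SLOTS];
};

static bool overlaps(const struct hva_slot *s, uint64_t off, uint64_t size)
{
    return off < s->offset + s->size && s->offset < off + size;
}

/* Returns 0 on success, -EBUSY if the range is still occupied,
 * -ENOSPC if the tracker has no free slot left. */
static int hva_map(struct hva_container *c, uint64_t off, uint64_t size)
{
    struct hva_slot *free_slot = NULL;

    assert(c != NULL && size > 0);
    for (size_t i = 0; i < MAX_SLOTS; i++) {
        struct hva_slot *s = &c->slots[i];

        if (!s->in_use) {
            if (!free_slot) {
                free_slot = s;
            }
        } else if (overlaps(s, off, size)) {
            return -EBUSY;   /* fail gracefully instead of remapping */
        }
    }
    if (!free_slot) {
        return -ENOSPC;
    }
    free_slot->offset = off;
    free_slot->size = size;
    free_slot->in_use = true;
    return 0;
}

/* To be called once the backend behind the mapping has been deleted. */
static void hva_release(struct hva_container *c, uint64_t off)
{
    for (size_t i = 0; i < MAX_SLOTS; i++) {
        if (c->slots[i].in_use && c->slots[i].offset == off) {
            c->slots[i].in_use = false;
        }
    }
}
```

In this model, a second hotplug at the same offset gets -EBUSY until
hva_release() has run, which corresponds to the user having removed the
backend.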

As an additional issue:
deleting the backend doesn't actually free memory, since it's
just mmap(NORESERVE) over the allocated region. It's possible
to do madvise(MADV_DONTNEED) on the area, but that doesn't guarantee
that the kernel will free the memory. So all these mmap/remap
tricks could confuse mgmt tools if they try to limit the memory
consumed by QEMU.
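
For readers unfamiliar with the calls mentioned above, here is a
Linux-only sketch of reserving an address range, placing memory inside
it, and putting the reservation back. It is not the implementation from
the series: the hva_*() helper names are hypothetical, it uses private
anonymous memory rather than a real memory backend, and it makes no
attempt to show what the kernel does with the underlying pages.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS, MAP_NORESERVE, madvise() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve address space only: no access rights, no swap reservation. */
static void *hva_reserve(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Place usable memory at a fixed, page-aligned offset in the reservation. */
static void *hva_populate(void *base, size_t off, size_t len)
{
    assert(base != NULL);
    void *p = mmap((char *)base + off, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hint that the contents are no longer needed. Returns 0 on success. */
static int hva_discard(void *addr, size_t len)
{
    return madvise(addr, len, MADV_DONTNEED);
}

/* "Unmap" by mapping PROT_NONE/NORESERVE over the range again, so the
 * address range itself stays reserved. Returns 0 on success. */
static int hva_unpopulate(void *base, size_t off, size_t len)
{
    assert(base != NULL);
    void *p = mmap((char *)base + off, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                   -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

The offsets and lengths passed to these helpers must be multiples of the
host page size, otherwise the MAP_FIXED calls fail with EINVAL.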

The series has not been tested for cross-version migration.

> 
> 
> > It's an alternative approach to increasing backend supported
> > memory regions limit. 
> > 
> > git branch for testing:
> >   https://github.com/imammedo/qemu/commits/vhost_one_hp_range_v3
> > 
> > Igor Mammedov (8):
> >   memory: get rid of memory_region_destructor_ram_from_ptr()
> >   memory: introduce MemoryRegion container with reserved HVA range
> >   pc: reserve hotpluggable memory range with
> >     memory_region_init_hva_range()
> >   pc: fix QEMU crashing when more than ~50 memory hotplugged
> >   exec: make sure that RAMBlock descriptor won't be leaked
> >   exec: add qemu_ram_unmap_hva() API for unmapping memory from HVA area
> >   memory: extend memory_region_add_subregion() to support error
> >     reporting
> >   memory: add support for deleting HVA mapped MemoryRegion
> > 
> >  exec.c                                   |  71 ++++++++++++-------
> >  hw/acpi/core.c                           |   6 +-
> >  hw/acpi/cpu_hotplug.c                    |   2 +-
> >  hw/acpi/ich9.c                           |   8 ++-
> >  hw/acpi/memory_hotplug.c                 |   3 +-
> >  hw/acpi/pcihp.c                          |   3 +-
> >  hw/acpi/piix4.c                          |   6 +-
> >  hw/alpha/typhoon.c                       |  16 ++---
> >  hw/arm/armv7m.c                          |   2 +-
> >  hw/arm/cubieboard.c                      |   2 +-
> >  hw/arm/digic_boards.c                    |   2 +-
> >  hw/arm/exynos4210.c                      |  12 ++--
> >  hw/arm/highbank.c                        |   4 +-
> >  hw/arm/integratorcp.c                    |   5 +-
> >  hw/arm/kzm.c                             |   9 ++-
> >  hw/arm/mainstone.c                       |   2 +-
> >  hw/arm/musicpal.c                        |   5 +-
> >  hw/arm/omap1.c                           |  59 +++++++++-------
> >  hw/arm/omap2.c                           |   8 ++-
> >  hw/arm/omap_sx1.c                        |  19 +++--
> >  hw/arm/palm.c                            |  14 ++--
> >  hw/arm/pxa2xx.c                          |  30 +++++---
> >  hw/arm/realview.c                        |   8 +--
> >  hw/arm/spitz.c                           |   2 +-
> >  hw/arm/stellaris.c                       |   7 +-
> >  hw/arm/stm32f205_soc.c                   |   8 ++-
> >  hw/arm/strongarm.c                       |   2 +-
> >  hw/arm/tosa.c                            |   2 +-
> >  hw/arm/versatilepb.c                     |   2 +-
> >  hw/arm/vexpress.c                        |  15 ++--
> >  hw/arm/virt.c                            |  12 ++--
> >  hw/arm/xilinx_zynq.c                     |   5 +-
> >  hw/arm/xlnx-ep108.c                      |   3 +-
> >  hw/arm/xlnx-zynqmp.c                     |   3 +-
> >  hw/block/onenand.c                       |   2 +-
> >  hw/block/pflash_cfi02.c                  |   3 +-
> >  hw/char/debugcon.c                       |   2 +-
> >  hw/char/mcf_uart.c                       |   2 +-
> >  hw/char/omap_uart.c                      |   2 +-
> >  hw/char/parallel.c                       |   2 +-
> >  hw/char/serial-pci.c                     |   2 +-
> >  hw/char/serial.c                         |   4 +-
> >  hw/char/sh_serial.c                      |   6 +-
> >  hw/core/platform-bus.c                   |   2 +-
> >  hw/core/sysbus.c                         |   4 +-
> >  hw/cpu/a15mpcore.c                       |   6 +-
> >  hw/cpu/a9mpcore.c                        |  18 +++--
> >  hw/cpu/arm11mpcore.c                     |  15 ++--
> >  hw/cris/axis_dev88.c                     |  10 +--
> >  hw/display/cirrus_vga.c                  |  11 +--
> >  hw/display/omap_dss.c                    |   2 +-
> >  hw/display/omap_lcdc.c                   |   2 +-
> >  hw/display/pxa2xx_lcd.c                  |   2 +-
> >  hw/display/sm501.c                       |   9 +--
> >  hw/display/tc6393xb.c                    |   5 +-
> >  hw/display/vga-isa-mm.c                  |   6 +-
> >  hw/display/vga-pci.c                     |   6 +-
> >  hw/display/vga.c                         |   3 +-
> >  hw/dma/etraxfs_dma.c                     |   3 +-
> >  hw/dma/i8257.c                           |   5 +-
> >  hw/dma/omap_dma.c                        |   4 +-
> >  hw/dma/rc4030.c                          |   4 +-
> >  hw/i386/kvm/pci-assign.c                 |   6 +-
> >  hw/i386/pc.c                             |  16 +++--
> >  hw/i386/pc_sysfw.c                       |   2 +-
> >  hw/ide/cmd646.c                          |   6 +-
> >  hw/ide/piix.c                            |   6 +-
> >  hw/ide/via.c                             |   6 +-
> >  hw/input/pxa2xx_keypad.c                 |   2 +-
> >  hw/intc/apic_common.c                    |   3 +-
> >  hw/intc/armv7m_nvic.c                    |   5 +-
> >  hw/intc/exynos4210_gic.c                 |   6 +-
> >  hw/intc/openpic.c                        |   2 +-
> >  hw/intc/realview_gic.c                   |   6 +-
> >  hw/intc/sh_intc.c                        |   6 +-
> >  hw/isa/apm.c                             |   2 +-
> >  hw/isa/isa-bus.c                         |   3 +-
> >  hw/isa/vt82c686.c                        |   7 +-
> >  hw/lm32/lm32_boards.c                    |   6 +-
> >  hw/lm32/milkymist.c                      |   3 +-
> >  hw/m68k/an5206.c                         |   5 +-
> >  hw/m68k/dummy_m68k.c                     |   2 +-
> >  hw/m68k/mcf5206.c                        |   2 +-
> >  hw/m68k/mcf5208.c                        |  10 +--
> >  hw/m68k/mcf_intc.c                       |   2 +-
> >  hw/mem/pc-dimm.c                         |   6 +-
> >  hw/microblaze/petalogix_ml605_mmu.c      |   6 +-
> >  hw/microblaze/petalogix_s3adsp1800_mmu.c |   5 +-
> >  hw/mips/gt64xxx_pci.c                    |   9 +--
> >  hw/mips/mips_fulong2e.c                  |   5 +-
> >  hw/mips/mips_jazz.c                      |  30 +++++---
> >  hw/mips/mips_malta.c                     |  17 +++--
> >  hw/mips/mips_mipssim.c                   |  11 +--
> >  hw/mips/mips_r4k.c                       |  14 ++--
> >  hw/misc/debugexit.c                      |   2 +-
> >  hw/misc/ivshmem.c                        |   4 +-
> >  hw/misc/macio/macio.c                    |  24 ++++---
> >  hw/misc/omap_gpmc.c                      |   7 +-
> >  hw/misc/omap_l4.c                        |   3 +-
> >  hw/misc/omap_sdrc.c                      |   2 +-
> >  hw/misc/pc-testdev.c                     |  11 +--
> >  hw/moxie/moxiesim.c                      |   4 +-
> >  hw/net/fsl_etsec/etsec.c                 |   3 +-
> >  hw/net/mcf_fec.c                         |   2 +-
> >  hw/openrisc/openrisc_sim.c               |   6 +-
> >  hw/pci-host/apb.c                        |   3 +-
> >  hw/pci-host/grackle.c                    |   2 +-
> >  hw/pci-host/piix.c                       |   3 +-
> >  hw/pci-host/ppce500.c                    |  13 ++--
> >  hw/pci-host/prep.c                       |  24 ++++---
> >  hw/pci-host/q35.c                        |   8 ++-
> >  hw/pci-host/uninorth.c                   |   4 +-
> >  hw/pci/msix.c                            |   6 +-
> >  hw/pci/pcie_host.c                       |   3 +-
> >  hw/pci/shpc.c                            |   2 +-
> >  hw/pcmcia/pxa2xx.c                       |   6 +-
> >  hw/ppc/e500.c                            |  14 ++--
> >  hw/ppc/mac_newworld.c                    |  14 ++--
> >  hw/ppc/mac_oldworld.c                    |   6 +-
> >  hw/ppc/ppc405_boards.c                   |  12 ++--
> >  hw/ppc/ppc405_uc.c                       |  16 +++--
> >  hw/ppc/ppc440_bamboo.c                   |   3 +-
> >  hw/ppc/ppc4xx_devs.c                     |   4 +-
> >  hw/ppc/ppc4xx_pci.c                      |   9 ++-
> >  hw/ppc/prep.c                            |   4 +-
> >  hw/ppc/spapr.c                           |   4 +-
> >  hw/ppc/spapr_pci.c                       |   8 +--
> >  hw/ppc/spapr_pci_vfio.c                  |   2 +-
> >  hw/ppc/virtex_ml507.c                    |   3 +-
> >  hw/s390x/s390-virtio-ccw.c               |   2 +-
> >  hw/s390x/s390-virtio.c                   |   2 +-
> >  hw/s390x/sclp.c                          |   3 +-
> >  hw/sd/omap_mmc.c                         |   2 +-
> >  hw/sd/pxa2xx_mmci.c                      |   2 +-
> >  hw/sh4/r2d.c                             |   5 +-
> >  hw/sh4/sh7750.c                          |  21 ++++--
> >  hw/sh4/sh_pci.c                          |   6 +-
> >  hw/sh4/shix.c                            |   6 +-
> >  hw/sparc/leon3.c                         |   6 +-
> >  hw/sparc64/sun4u.c                       |   2 +-
> >  hw/timer/m48t59.c                        |   3 +-
> >  hw/timer/sh_timer.c                      |   6 +-
> >  hw/tpm/tpm_tis.c                         |   2 +-
> >  hw/tricore/tricore_testboard.c           |  12 ++--
> >  hw/unicore32/puv3.c                      |   6 +-
> >  hw/usb/hcd-ehci-sysbus.c                 |   2 +-
> >  hw/usb/hcd-ehci.c                        |   8 ++-
> >  hw/usb/hcd-xhci.c                        |  15 ++--
> >  hw/vfio/common.c                         |   2 +-
> >  hw/vfio/pci.c                            |   6 +-
> >  hw/virtio/vhost.c                        |  47 +++++++++++-
> >  hw/virtio/virtio-pci.c                   |   3 +-
> >  hw/xtensa/sim.c                          |   5 +-
> >  hw/xtensa/xtfpga.c                       |  18 ++---
> >  include/exec/cpu-common.h                |   3 +
> >  include/exec/memory.h                    |  49 ++++++++++++-
> >  include/exec/ram_addr.h                  |   1 -
> >  include/hw/virtio/vhost.h                |   1 +
> >  ioport.c                                 |   2 +-
> >  memory.c                                 | 118 ++++++++++++++++++++++++++++---
> >  numa.c                                   |   2 +-
> >  161 files changed, 867 insertions(+), 458 deletions(-)
> > 
> > -- 
> > 1.8.3.1

Thread overview: 31+ messages
2015-07-08  9:46 [Qemu-devel] [RFC v3 0/8] Fix QEMU crash during memory hotplug with vhost=on Igor Mammedov
2015-07-08  9:46 ` [Qemu-devel] [RFC v3 1/8] memory: get rid of memory_region_destructor_ram_from_ptr() Igor Mammedov
2015-07-08  9:46 ` [Qemu-devel] [RFC v3 2/8] memory: introduce MemoryRegion container with reserved HVA range Igor Mammedov
2015-07-08  9:46 ` [Qemu-devel] [RFC v3 3/8] pc: reserve hotpluggable memory range with memory_region_init_hva_range() Igor Mammedov
2015-07-08  9:46 ` [Qemu-devel] [RFC v3 4/8] pc: fix QEMU crashing when more than ~50 memory hotplugged Igor Mammedov
2015-07-08  9:46 ` [Qemu-devel] [RFC v3 5/8] exec: make sure that RAMBlock descriptor won't be leaked Igor Mammedov
2015-07-08  9:46 ` [Qemu-devel] [RFC v3 6/8] exec: add qemu_ram_unmap_hva() API for unmapping memory from HVA area Igor Mammedov
2015-07-08  9:46 ` [Qemu-devel] [RFC v3 7/8] memory: extend memory_region_add_subregion() to support error reporting Igor Mammedov
2015-07-08 11:03   ` Michael S. Tsirkin
2015-07-08 11:09   ` Peter Maydell
2015-07-08 14:58     ` Igor Mammedov
2015-07-08 17:30       ` Michael S. Tsirkin
2015-07-08 18:41         ` Igor Mammedov
2015-07-09  6:58           ` Michael S. Tsirkin
2015-07-08 17:42       ` Paolo Bonzini
2015-07-08 18:58         ` Igor Mammedov
2015-07-08  9:46 ` [Qemu-devel] [RFC v3 8/8] memory: add support for deleting HVA mapped MemoryRegion Igor Mammedov
2015-07-08  9:58   ` Michael S. Tsirkin
2015-07-08 14:43     ` Igor Mammedov
2015-07-08 14:50       ` Michael S. Tsirkin
2015-07-08 10:01 ` [Qemu-devel] [RFC v3 0/8] Fix QEMU crash during memory hotplug with vhost=on Michael S. Tsirkin
2015-07-08 11:41   ` Igor Mammedov [this message]
2015-07-08 11:45     ` Michael S. Tsirkin
2015-07-08 15:46   ` Igor Mammedov
2015-07-09 17:04     ` Andrey Korolyov
2015-07-15 15:18       ` Igor Mammedov
2015-07-15 15:26         ` Andrey Korolyov
2015-07-15 16:08           ` Michael S. Tsirkin
2015-07-15 16:46             ` Andrey Korolyov
2015-07-16 20:35               ` Andrey Korolyov
2015-07-17 20:45                 ` Andrey Korolyov
