From: Raphael Norwitz <raphael.norwitz@nutanix.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user
Date: Thu, 20 Feb 2020 02:03:08 -0500
Message-ID: <20200220070308.GA2215@localhost.localdomain>
In-Reply-To: <20200209174335.GA15050@localhost.localdomain>

Ping

On Sun, Feb 09, 2020 at 12:43:35PM -0500, Raphael Norwitz wrote:
> 
> On Thu, Feb 06, 2020 at 03:32:38AM -0500, Michael S. Tsirkin wrote:
> > 
> > On Wed, Jan 15, 2020 at 09:57:06PM -0500, Raphael Norwitz wrote:
> > > The current vhost-user implementation in QEMU imposes a limit on the
> > > maximum number of memory slots exposed to a VM using a vhost-user
> > > device. This change provides a new protocol feature
> > > VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and
> > > allows a VM with a vhost-user device to expose a configurable number of
> > > memory slots, up to the ACPI defined maximum. Existing backends which
> > > do not support this protocol feature are unaffected.
> > 
> > Hmm ACPI maximum seems to be up to 512 - is this too much to fit in a
> > single message?  So can't we just increase the number (after negotiating
> > with remote) and be done with it, instead of add/remove?  Or is there
> > another reason to prefer add/remove?
> >
> 
> As mentioned in my cover letter, we experimented with simply increasing the
> message size and it didn’t work on our setup. We debugged down to the socket
> layer and found that on the receiving end the messages were truncated at
> around 512 bytes, or around 16 memory regions. To support 512 memory regions
> we would need a message size of around 32 <bytes per region> * 512 <regions>
> + 8 <bytes for padding and region count> ~= 16 KiB, roughly 64 times larger
> than the next largest message size. We thought it would be cleaner and more
> in line with the rest of the protocol to keep the message sizes smaller. In
> particular, we thought memory regions should be treated like the rings,
> which are sent over one message at a time instead of all in one large
> message. Whether or not such a large message size could be made to work in
> our case, separate messages will always work on Linux, and most likely on
> all other UNIX platforms QEMU runs on.
> 
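> For reference, the arithmetic above follows from the wire format of the
> existing structs in hw/virtio/vhost-user.c:
>
>     typedef struct VhostUserMemoryRegion {
>         uint64_t guest_phys_addr;   /* 8 bytes */
>         uint64_t memory_size;       /* 8 bytes */
>         uint64_t userspace_addr;    /* 8 bytes */
>         uint64_t mmap_offset;       /* 8 bytes */
>     } VhostUserMemoryRegion;        /* 32 bytes per region */
>
>     typedef struct VhostUserMemory {
>         uint32_t nregions;          /* 4 bytes */
>         uint32_t padding;           /* 4 bytes */
>         VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
>     } VhostUserMemory;
>
> so 512 regions would need 512 * 32 + 8 ~= 16 KiB in a single
> VHOST_USER_SET_MEM_TABLE payload.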

> > > 
> > > This feature works by using three new messages,
> > > VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
> > > VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
> > > number of memory slots the backend is willing to accept when the
> > > backend is initialized. Then, when the memory tables are set or updated,
> > > a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
> > > are sent to transmit the regions to map and/or unmap instead of trying
> > > to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
> > > message.
> > > 
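> To illustrate the flow described above (a rough sketch only;
> vhost_user_get_u64() is an existing helper in hw/virtio/vhost-user.c,
> while the send_*/update_* helpers below are hypothetical stand-ins for
> the logic in vhost_user_set_mem_table()):
>
>     /* At init, once VHOST_USER_F_CONFIGURE_SLOTS is negotiated: */
>     uint64_t max_slots;
>     vhost_user_get_u64(dev, VHOST_USER_GET_MAX_MEM_SLOTS, &max_slots);
>
>     /* On each memory table change, one region per message: */
>     send_rem_mem_reg_for_stale_regions(dev);  /* VHOST_USER_REM_MEM_REG */
>     send_add_mem_reg_for_new_regions(dev);    /* VHOST_USER_ADD_MEM_REG */
>     update_shadow_state(dev);                 /* remember what was sent */
>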
> > > The vhost_user struct maintains a shadow state of the VM’s memory
> > > regions. When the memory tables are modified, the
> > > vhost_user_set_mem_table() function compares the new device memory state
> > > to the shadow state and only sends regions which need to be unmapped or
> > > mapped in. The regions which must be unmapped are sent first, followed
> > > by the new regions to be mapped in. After all the messages have been
> > > sent, the shadow state is set to the current virtual device state.
> > > 
> > > The current feature implementation does not work with postcopy migration
> > > and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> > > also been negotiated.
> > 
> > Hmm what would it take to lift the restrictions?
> > Conflicting features like this make it very hard for users to make
> > an informed choice about what to support.
> >
> 
> We would need a setup with a backend which supports these features (REPLY_ACK
> and postcopy migration). At first glance it looks like DPDK could work but
> I'm not sure how easy it will be to test postcopy migration with the resources
> we have.
>  

> > > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> > > Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
> > > Suggested-by: Mike Cui <cui@nutanix.com>
> > > ---
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index af83fdd..fed6d02 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -35,11 +35,29 @@
> > >  #include <linux/userfaultfd.h>
> > >  #endif
> > >  
> > > -#define VHOST_MEMORY_MAX_NREGIONS    8
> > > +#define VHOST_MEMORY_LEGACY_NREGIONS    8
> > 
> > Hardly legacy when this is intended to always be used e.g. with
> > postcopy, right?
> >
> 
> How about 'BASELINE'?

> > > +    msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
> > > +    msg->hdr.size += sizeof(VhostUserMemoryRegion);
> > > +
> > > +    /*
> > > +     * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
> > > +     * which are not found not in the device's memory state.
> > 
> > double negation - could not parse this.
> >
> 
> Apologies - typo here. It should say “Send VHOST_USER_REM_MEM_REG for memory
> regions in our shadow state which are not found in the device's memory state.” 
> i.e. send messages to remove regions in the shadow state but not in the updated
> device state. 
>  
> > > +     */
> > > +    for (i = 0; i < u->num_shadow_regions; ++i) {
> > > +        reg = dev->mem->regions;
> > > +
> > > +        for (j = 0; j < dev->mem->nregions; j++) {
> > > +            reg = dev->mem->regions + j;
> > > +
> > > +            assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> > > +            mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> > > +                                         &offset);
> > > +            fd = memory_region_get_fd(mr);
> > > +
> > > +            if (reg_equal(&u->shadow_regions[i], reg)) {
> > > +                matching = true;
> > > +                found[j] = true;
> > > +                break;
> > > +            }
> > > +        }
> > > +
> > > +        if (fd > 0 && !matching) {
> > > +            msg->hdr.request = VHOST_USER_REM_MEM_REG;
> > > +            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> > > +            msg->payload.mem_reg.region.memory_size = reg->memory_size;
> > > +            msg->payload.mem_reg.region.guest_phys_addr =
> > > +                reg->guest_phys_addr;
> > > +            msg->payload.mem_reg.region.mmap_offset = offset;
> > > +
> > > +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> > > +                return -1;
> > > +            }
> > > +        }
> > > +    }
> > > +
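>
> For completeness, a rough sketch of the add phase that follows (elided
> from the excerpt above; this mirrors the removal loop and is not the
> literal patch code):
>
>     /* Send VHOST_USER_ADD_MEM_REG for device regions with no match in
>      * the shadow state (found[j] was set during the removal pass). */
>     for (j = 0; j < dev->mem->nregions; j++) {
>         if (found[j]) {
>             continue;
>         }
>         reg = dev->mem->regions + j;
>         mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
>                                      &offset);
>         fd = memory_region_get_fd(mr);
>         if (fd > 0) {
>             msg->hdr.request = VHOST_USER_ADD_MEM_REG;
>             msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
>             msg->payload.mem_reg.region.memory_size = reg->memory_size;
>             msg->payload.mem_reg.region.guest_phys_addr =
>                 reg->guest_phys_addr;
>             msg->payload.mem_reg.region.mmap_offset = offset;
>             if (vhost_user_write(dev, msg, &fd, 1) < 0) {
>                 return -1;
>             }
>         }
>     }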


