From: Raphael Norwitz <raphael.norwitz@nutanix.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user
Date: Thu, 20 Feb 2020 02:03:08 -0500
Message-ID: <20200220070308.GA2215@localhost.localdomain>
In-Reply-To: <20200209174335.GA15050@localhost.localdomain>
Ping

On Sun, Feb 09, 2020 at 12:43:35PM -0500, Raphael Norwitz wrote:
>
> On Thu, Feb 06, 2020 at 03:32:38AM -0500, Michael S. Tsirkin wrote:
> >
> > On Wed, Jan 15, 2020 at 09:57:06PM -0500, Raphael Norwitz wrote:
> > > The current vhost-user implementation in Qemu imposes a limit on the
> > > maximum number of memory slots exposed to a VM using a vhost-user
> > > device. This change provides a new protocol feature
> > > VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and
> > > allows a VM with a vhost-user device to expose a configurable number of
> > > memory slots, up to the ACPI defined maximum. Existing backends which
> > > do not support this protocol feature are unaffected.
> >
> > Hmm ACPI maximum seems to be up to 512 - is this too much to fit in a
> > single message? So can't we just increase the number (after negotiating
> > with remote) and be done with it, instead of add/remove? Or is there
> > another reason to prefer add/remove?
> >
>
> As mentioned in my cover letter, we experimented with simply increasing the
> message size and it didn’t work on our setup. We debugged down to the socket
> layer and found that on the receiving end the messages were truncated at
> around 512 bytes, or around 16 memory regions. To support 512 memory regions
> we would need a message size of around 32 <bytes per region> * 512 <regions>
> + 8 <bytes for padding and region count> ~= 16k packet size. That would be 64
> times larger than the next largest message size. We thought it would be cleaner
> and more in line with the rest of the protocol to keep the message sizes
> smaller. In particular, we thought memory regions should be treated like the
> rings, which are sent over one message at a time instead of in one large message.
> Whether or not such a large message size can be made to work in our case,
> separate messages will always work on Linux, and most likely on all other
> UNIX platforms QEMU is used on.
>
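
To make the size arithmetic above concrete, here is a minimal, self-contained
sketch. The layouts mirror the vhost-user wire format (four u64 fields per
region; a 32-bit region count plus 32-bit padding for the table message; a u64
padding plus one region for the per-region message, matching the
mem_reg.padding/region fields in the hunk quoted further down). The names
wire_mem_region, wire_mem_table_512 and wire_mem_reg_msg are re-declared
locally for illustration only and are not the QEMU definitions themselves:

#include <stdint.h>
#include <stdio.h>

struct wire_mem_region {            /* one memory region descriptor (32 bytes) */
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
};

struct wire_mem_table_512 {         /* hypothetical 512-slot SET_MEM_TABLE payload */
    uint32_t nregions;
    uint32_t padding;
    struct wire_mem_region regions[512];
};

struct wire_mem_reg_msg {           /* per-region ADD/REM_MEM_REG payload */
    uint64_t padding;
    struct wire_mem_region region;
};

int main(void)
{
    /* ~16K (8 + 512 * 32 bytes) for a single 512-region table message ... */
    printf("512-region table payload: %zu bytes\n",
           sizeof(struct wire_mem_table_512));
    /* ... versus 40 bytes of payload per ADD/REM_MEM_REG message. */
    printf("per-region payload:       %zu bytes\n",
           sizeof(struct wire_mem_reg_msg));
    return 0;
}
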
> > >
> > > This feature works by using three new messages,
> > > VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
> > > VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
> > > number of memory slots the backend is willing to accept when the
> > > backend is initialized. Then, when the memory tables are set or updated,
> > > a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
> > > are sent to transmit the regions to map and/or unmap instead of trying
> > > to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
> > > message.
> > >
> > > The vhost_user struct maintains a shadow state of the VM’s memory
> > > regions. When the memory tables are modified, the
> > > vhost_user_set_mem_table() function compares the new device memory state
> > > to the shadow state and only sends regions which need to be unmapped or
> > > mapped in. The regions which must be unmapped are sent first, followed
> > > by the new regions to be mapped in. After all the messages have been
> > > sent, the shadow state is set to the current virtual device state.
> > >
> > > The current feature implementation does not work with postcopy migration
> > > and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> > > also been negotiated.
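
As a rough illustration of the shadow-state comparison described above:
removals go out before additions, and the shadow copy is only refreshed once
every message has been sent. This is not the code from the patch; region_desc,
region_equal(), send_rem_mem_reg() and send_add_mem_reg() are hypothetical
stand-ins for the real structures and for vhost_user_write() of the
corresponding messages.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct region_desc {                /* stand-in for a guest memory region */
    unsigned long long guest_phys_addr;
    unsigned long long memory_size;
    unsigned long long userspace_addr;
};

bool region_equal(const struct region_desc *a, const struct region_desc *b)
{
    return a->guest_phys_addr == b->guest_phys_addr &&
           a->memory_size == b->memory_size &&
           a->userspace_addr == b->userspace_addr;
}

/* Stand-ins for sending VHOST_USER_REM_MEM_REG / VHOST_USER_ADD_MEM_REG. */
int send_rem_mem_reg(const struct region_desc *r)
{
    printf("REM_MEM_REG gpa=0x%llx size=0x%llx\n",
           r->guest_phys_addr, r->memory_size);
    return 0;
}

int send_add_mem_reg(const struct region_desc *r)
{
    printf("ADD_MEM_REG gpa=0x%llx size=0x%llx\n",
           r->guest_phys_addr, r->memory_size);
    return 0;
}

int sync_mem_table(struct region_desc *shadow, size_t *nshadow,
                   const struct region_desc *dev, size_t ndev)
{
    /* Pass 1: unmap shadow regions that no longer appear in the device state. */
    for (size_t i = 0; i < *nshadow; i++) {
        bool still_present = false;
        for (size_t j = 0; j < ndev; j++) {
            if (region_equal(&shadow[i], &dev[j])) {
                still_present = true;
                break;
            }
        }
        if (!still_present && send_rem_mem_reg(&shadow[i]) < 0) {
            return -1;
        }
    }

    /* Pass 2: map in device regions that were not already in the shadow state. */
    for (size_t j = 0; j < ndev; j++) {
        bool already_mapped = false;
        for (size_t i = 0; i < *nshadow; i++) {
            if (region_equal(&dev[j], &shadow[i])) {
                already_mapped = true;
                break;
            }
        }
        if (!already_mapped && send_add_mem_reg(&dev[j]) < 0) {
            return -1;
        }
    }

    /* Only now does the shadow state become a copy of the device state. */
    memcpy(shadow, dev, ndev * sizeof(*dev));
    *nshadow = ndev;
    return 0;
}
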
> >
> > Hmm what would it take to lift the restrictions?
> > Conflicting features like this make it very hard for users to make
> > an informed choice about what to support.
> >
>
> We would need a setup with a backend which supports these features (REPLY_ACK
> and postcopy migration). At first glance it looks like DPDK could work but
> I'm not sure how easy it will be to test postcopy migration with the resources
> we have.
>
> > > Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> > > Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
> > > Suggested-by: Mike Cui <cui@nutanix.com>
> > > ---
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index af83fdd..fed6d02 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -35,11 +35,29 @@
> > > #include <linux/userfaultfd.h>
> > > #endif
> > >
> > > -#define VHOST_MEMORY_MAX_NREGIONS 8
> > > +#define VHOST_MEMORY_LEGACY_NREGIONS 8
> >
> > Hardly legacy when this is intended to always be used e.g. with
> > postcopy, right?
> >
>
> How about 'BASELINE'?
>
> > > + msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
> > > + msg->hdr.size += sizeof(VhostUserMemoryRegion);
> > > +
> > > + /*
> > > + * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
> > > + * which are not found not in the device's memory state.
> >
> > double negation - could not parse this.
> >
>
> Apologies - typo here. It should say “Send VHOST_USER_REM_MEM_REG for memory
> regions in our shadow state which are not found in the device's memory state.”
> i.e. send messages to remove regions in the shadow state but not in the updated
> device state.
>
> > > + */
> > > + for (i = 0; i < u->num_shadow_regions; ++i) {
> > > + reg = dev->mem->regions;
> > > +
> > > + for (j = 0; j < dev->mem->nregions; j++) {
> > > + reg = dev->mem->regions + j;
> > > +
> > > + assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> > > + mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> > > + &offset);
> > > + fd = memory_region_get_fd(mr);
> > > +
> > > + if (reg_equal(&u->shadow_regions[i], reg)) {
> > > + matching = true;
> > > + found[j] = true;
> > > + break;
> > > + }
> > > + }
> > > +
> > > + if (fd > 0 && !matching) {
> > > + msg->hdr.request = VHOST_USER_REM_MEM_REG;
> > > + msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> > > + msg->payload.mem_reg.region.memory_size = reg->memory_size;
> > > + msg->payload.mem_reg.region.guest_phys_addr =
> > > + reg->guest_phys_addr;
> > > + msg->payload.mem_reg.region.mmap_offset = offset;
> > > +
> > > + if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> > > + return -1;
> > > + }
> > > + }
> > > + }
> > > +
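
For context, the loop above relies on a reg_equal() helper that is not part of
this excerpt. A plausible shape for it, purely as a sketch that may differ from
the actual patch, is a field-by-field comparison of the shadow region against
the device region:

static bool reg_equal(struct vhost_memory_region *shadow_reg,
                      struct vhost_memory_region *vdev_reg)
{
    /* Two slots are considered the same mapping only if the guest-physical
     * base, the userspace address and the size all match. */
    return shadow_reg->guest_phys_addr == vdev_reg->guest_phys_addr &&
           shadow_reg->userspace_addr == vdev_reg->userspace_addr &&
           shadow_reg->memory_size == vdev_reg->memory_size;
}
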