From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, maxime.coquelin@redhat.com,
marcandre.lureau@redhat.com, peterx@redhat.com,
imammedo@redhat.com, quintela@redhat.com, aarcange@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3 15/29] vhost+postcopy: Send address back to qemu
Date: Tue, 27 Feb 2018 22:25:14 +0200
Message-ID: <20180227222336-mutt-send-email-mst@kernel.org>
In-Reply-To: <20180227195418.GK2847@work-vm>
On Tue, Feb 27, 2018 at 07:54:18PM +0000, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Fri, Feb 16, 2018 at 01:16:11PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > We need a better way, but at the moment we need the address of the
> > > mappings sent back to qemu so it can interpret the messages on the
> > > userfaultfd it reads.
> > >
> > > This is done as a 3 stage set:
> > > QEMU -> client
> > > set_mem_table
> > >
> > > mmap stuff, get addresses
> > >
> > > client -> qemu
> > > here are the addresses
> > >
> > > qemu -> client
> > > OK - now you can use them
> > >
> > > That ensures that qemu has registered the new addresses in its
> > > userfault code before the client starts accessing them.
> > >
> > > Note: We don't ask for the default 'ack' reply since we've got our own.
> > >
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > > contrib/libvhost-user/libvhost-user.c | 24 ++++++++++++-
> > > docs/interop/vhost-user.txt | 9 +++++
> > > hw/virtio/trace-events | 1 +
> > > hw/virtio/vhost-user.c | 67 +++++++++++++++++++++++++++++++++--
> > > 4 files changed, 98 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > > index a18bc74a7c..e02e5d6f46 100644
> > > --- a/contrib/libvhost-user/libvhost-user.c
> > > +++ b/contrib/libvhost-user/libvhost-user.c
> > > @@ -491,10 +491,32 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
> > > dev_region->mmap_addr);
> > > }
> > >
> > > + /* Return the address to QEMU so that it can translate the ufd
> > > + * fault addresses back.
> > > + */
> > > + msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> > > + dev_region->mmap_offset);
> > > close(vmsg->fds[i]);
> > > }
> > >
> > > - /* TODO: Get address back to QEMU */
> > > + /* Send the message back to qemu with the addresses filled in */
> > > + vmsg->fd_num = 0;
> > > + if (!vu_message_write(dev, dev->sock, vmsg)) {
> > > + vu_panic(dev, "failed to respond to set-mem-table for postcopy");
> > > + return false;
> > > + }
> > > +
> > > + /* Wait for QEMU to confirm that it's registered the handler for the
> > > + * faults.
> > > + */
> > > + if (!vu_message_read(dev, dev->sock, vmsg) ||
> > > + vmsg->size != sizeof(vmsg->payload.u64) ||
> > > + vmsg->payload.u64 != 0) {
> > > + vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
> > > + return false;
> > > + }
> > > +
> > > + /* OK, now we can go and register the memory and generate faults */
> > > for (i = 0; i < dev->nregions; i++) {
> > > VuDevRegion *dev_region = &dev->regions[i];
> > > #ifdef UFFDIO_REGISTER
> > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > > index bdec9ec0e8..5bbcab2cc4 100644
> > > --- a/docs/interop/vhost-user.txt
> > > +++ b/docs/interop/vhost-user.txt
> > > @@ -454,12 +454,21 @@ Master message types
> > > Id: 5
> > > Equivalent ioctl: VHOST_SET_MEM_TABLE
> > > Master payload: memory regions description
> > > + Slave payload: (postcopy only) memory regions description
> > >
> > > Sets the memory map regions on the slave so it can translate the vring
> > > addresses. In the ancillary data there is an array of file descriptors
> > > for each memory mapped region. The size and ordering of the fds matches
> > > the number and ordering of memory regions.
> > >
> > > + When postcopy-listening has been received,
> >
> > Which message is this?
>
> VHOST_USER_POSTCOPY_LISTEN
>
> Do you want me just to change that to, 'When VHOST_USER_POSTCOPY_LISTEN
> has been received' ?

I think it's better this way, yes.

> > > SET_MEM_TABLE replies with
> > > + the bases of the memory mapped regions to the master. It must have mmap'd
> > > + the regions but not yet accessed them and should not yet generate a userfault
> > > + event. Note NEED_REPLY_MASK is not set in this case.
> > > + QEMU will then reply back to the list of mappings with an empty
> > > + VHOST_USER_SET_MEM_TABLE as an acknowledgment; only upon reception of this
> > > + message may the guest start accessing the memory and generating faults.
> > > +
> > > * VHOST_USER_SET_LOG_BASE
> > >
> > > Id: 6
> >
> > As you say yourself, this is probably the best we can do for now,
> > but it's not ideal. So I think it's a good idea to isolate this
> > behind a separate protocol feature bit. For now it will be required
> > for postcopy; when it's fixed in the kernel we can drop it
> > cleanly.
> >
>
> While we've talked about ways of avoiding the exact addresses being
> known by the slave, I'm not sure we've talked about a way of removing
> this handshake; although it's doable if we move more of the work to the QEMU
> side.
>
> Dave

Some kernel changes might conceivably remove the need to use the
address with userfaultfd, too.

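For reference, the three-stage exchange under discussion can be sketched
roughly as below. This is only an illustration under stated assumptions:
DemoMsg, demo_send(), demo_recv() and demo_postcopy_handshake() are
hypothetical names, the message layout is not the real VhostUserMsg, both
ends run in one process over a datagram socketpair, and the "mmap" step is
replaced by addresses passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define DEMO_MAX_REGIONS 8
#define DEMO_SET_MEM_TABLE 5 /* same Id as in vhost-user.txt */

/* Hypothetical, simplified message; NOT the real VhostUserMsg layout. */
typedef struct {
    uint32_t request;
    uint32_t nregions;
    uint64_t addr[DEMO_MAX_REGIONS]; /* region bases, or addr[0] = ack value */
} DemoMsg;

static int demo_send(int fd, const DemoMsg *m)
{
    return write(fd, m, sizeof(*m)) == (ssize_t)sizeof(*m) ? 0 : -1;
}

static int demo_recv(int fd, DemoMsg *m)
{
    return read(fd, m, sizeof(*m)) == (ssize_t)sizeof(*m) ? 0 : -1;
}

/* Runs both sides of the exchange in order; returns 0 on success.
 * client_maps[] stands in for the addresses the client got from mmap. */
int demo_postcopy_handshake(const uint64_t *client_maps, uint32_t n,
                            uint64_t *bases_out)
{
    int sv[2];
    int rc = -1;
    uint32_t i;
    DemoMsg m;

    assert(client_maps != NULL && bases_out != NULL);
    /* Datagram pair keeps message boundaries; the real protocol is a stream. */
    if (n > DEMO_MAX_REGIONS || socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
        return -1;
    }
    int qemu = sv[0], client = sv[1];

    /* Stage 1, QEMU -> client: set_mem_table */
    memset(&m, 0, sizeof(m));
    m.request = DEMO_SET_MEM_TABLE;
    m.nregions = n;
    if (demo_send(qemu, &m) < 0) {
        goto out;
    }

    /* Client: map the regions (not touching them yet), send the bases back */
    if (demo_recv(client, &m) < 0 || m.request != DEMO_SET_MEM_TABLE) {
        goto out;
    }
    for (i = 0; i < m.nregions; i++) {
        m.addr[i] = client_maps[i];
    }
    if (demo_send(client, &m) < 0) {
        goto out;
    }

    /* Stage 2, QEMU: record the client addresses before acknowledging */
    if (demo_recv(qemu, &m) < 0 || m.request != DEMO_SET_MEM_TABLE ||
        m.nregions != n) {
        goto out;
    }
    for (i = 0; i < n; i++) {
        bases_out[i] = m.addr[i];
    }

    /* Stage 3, QEMU -> client: ack, 0 meaning OK */
    memset(&m, 0, sizeof(m));
    m.request = DEMO_SET_MEM_TABLE;
    if (demo_send(qemu, &m) < 0) {
        goto out;
    }

    /* Client: only after a valid ack may it access memory and fault */
    if (demo_recv(client, &m) < 0 || m.addr[0] != 0) {
        goto out;
    }
    rc = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Calling demo_postcopy_handshake() with two made-up addresses should return 0
and copy both addresses into bases_out; the point is only the ordering, i.e.
QEMU has the addresses in hand before the client is told to go ahead.
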
> > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > index 06ec03d6e7..05d18ada77 100644
> > > --- a/hw/virtio/trace-events
> > > +++ b/hw/virtio/trace-events
> > > @@ -8,6 +8,7 @@ vhost_section(const char *name, int r) "%s:%d"
> > >
> > > # hw/virtio/vhost-user.c
> > > vhost_user_postcopy_listen(void) ""
> > > +vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:0x%"PRIx64" for hva: 0x%"PRIx64" reply %d region %d"
> > >
> > > # hw/virtio/virtio.c
> > > virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index 64f4b3b3f9..a060442cb9 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -159,6 +159,7 @@ struct vhost_user {
> > > int slave_fd;
> > > NotifierWithReturn postcopy_notifier;
> > > struct PostCopyFD postcopy_fd;
> > > + uint64_t postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> > > /* True once we've entered postcopy_listen */
> > > bool postcopy_listen;
> > > };
> > > @@ -328,12 +329,15 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
> > > static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
> > > struct vhost_memory *mem)
> > > {
> > > + struct vhost_user *u = dev->opaque;
> > > int fds[VHOST_MEMORY_MAX_NREGIONS];
> > > int i, fd;
> > > size_t fd_num = 0;
> > > bool reply_supported = virtio_has_feature(dev->protocol_features,
> > > VHOST_USER_PROTOCOL_F_REPLY_ACK);
> > > - /* TODO: Add actual postcopy differences */
> > > + VhostUserMsg msg_reply;
> > > + int region_i, msg_i;
> > > +
> > > VhostUserMsg msg = {
> > > .hdr.request = VHOST_USER_SET_MEM_TABLE,
> > > .hdr.flags = VHOST_USER_VERSION,
> > > @@ -380,6 +384,64 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
> > > return -1;
> > > }
> > >
> > > + if (vhost_user_read(dev, &msg_reply) < 0) {
> > > + return -1;
> > > + }
> > > +
> > > + if (msg_reply.hdr.request != VHOST_USER_SET_MEM_TABLE) {
> > > + error_report("%s: Received unexpected msg type. "
> > > + "Expected %d received %d", __func__,
> > > + VHOST_USER_SET_MEM_TABLE, msg_reply.hdr.request);
> > > + return -1;
> > > + }
> > > + /* We're using the same structure, just reusing one of the
> > > + * fields, so it should be the same size.
> > > + */
> > > + if (msg_reply.hdr.size != msg.hdr.size) {
> > > + error_report("%s: Unexpected size for postcopy reply "
> > > + "%d vs %d", __func__, msg_reply.hdr.size, msg.hdr.size);
> > > + return -1;
> > > + }
> > > +
> > > + memset(u->postcopy_client_bases, 0,
> > > + sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> > > +
> > > + /* They're in the same order as the regions that were sent
> > > + * but some of the regions were skipped (above) if they
> > > + * didn't have fd's
> > > + */
> > > + for (msg_i = 0, region_i = 0;
> > > + region_i < dev->mem->nregions;
> > > + region_i++) {
> > > + if (msg_i < fd_num &&
> > > + msg_reply.payload.memory.regions[msg_i].guest_phys_addr ==
> > > + dev->mem->regions[region_i].guest_phys_addr) {
> > > + u->postcopy_client_bases[region_i] =
> > > + msg_reply.payload.memory.regions[msg_i].userspace_addr;
> > > + trace_vhost_user_set_mem_table_postcopy(
> > > + msg_reply.payload.memory.regions[msg_i].userspace_addr,
> > > + msg.payload.memory.regions[msg_i].userspace_addr,
> > > + msg_i, region_i);
> > > + msg_i++;
> > > + }
> > > + }
> > > + if (msg_i != fd_num) {
> > > + error_report("%s: postcopy reply not fully consumed "
> > > + "%d vs %zd",
> > > + __func__, msg_i, fd_num);
> > > + return -1;
> > > + }
> > > + /* Now we've registered this with the postcopy code, we ack to the client,
> > > + * because now we're in the position to be able to deal with any faults
> > > + * it generates.
> > > + */
> > > + /* TODO: Use this for failure cases as well with a bad value */
> > > + msg.hdr.size = sizeof(msg.payload.u64);
> > > + msg.payload.u64 = 0; /* OK */
> > > + if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > + return -1;
> > > + }
> > > +
> > > if (reply_supported) {
> > > return process_message_reply(dev, &msg);
> > > }
> > > @@ -396,7 +458,8 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
> > > size_t fd_num = 0;
> > > bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
> > > bool reply_supported = virtio_has_feature(dev->protocol_features,
> > > - VHOST_USER_PROTOCOL_F_REPLY_ACK);
> > > + VHOST_USER_PROTOCOL_F_REPLY_ACK) &&
> > > + !do_postcopy;
> > >
> > > if (do_postcopy) {
> > > /* Postcopy has enough differences that it's best done in it's own
> > > --
> > > 2.14.3
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
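
The matching loop in vhost_user_set_mem_table_postcopy quoted above pairs
each reply entry with the region it belongs to, skipping regions that were
never sent because they had no fd. A standalone sketch of that idea follows;
DemoRegion and demo_match_bases() are hypothetical names used only for
illustration, and the types are simplified stand-ins for the real
vhost_memory structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down region descriptor for illustration only. */
typedef struct {
    uint64_t guest_phys_addr;
    uint64_t userspace_addr;
} DemoRegion;

/* Walk all regions in order; a reply entry is consumed only when its
 * guest physical address matches the current region. Regions without a
 * matching reply entry keep a base of 0.
 * Returns 0 if every reply entry was consumed, -1 otherwise. */
int demo_match_bases(const DemoRegion *all, size_t n_all,
                     const DemoRegion *reply, size_t n_reply,
                     uint64_t *bases)
{
    size_t msg_i = 0;
    size_t region_i;

    assert(bases != NULL);
    for (region_i = 0; region_i < n_all; region_i++) {
        bases[region_i] = 0;
        if (msg_i < n_reply &&
            reply[msg_i].guest_phys_addr == all[region_i].guest_phys_addr) {
            bases[region_i] = reply[msg_i].userspace_addr;
            msg_i++;
        }
    }
    /* Leftover reply entries mean the reply did not line up with the table */
    return msg_i == n_reply ? 0 : -1;
}
```

With three regions at guest addresses 0x0, 0x1000 and 0x2000, and a reply
covering only 0x0 and 0x2000, the middle base stays 0 and the function
returns 0. A reply entry for an address that is not in the table is left
unconsumed, so the function returns -1, which corresponds to the
"postcopy reply not fully consumed" error path in the patch.
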