qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org, a.perevalov@samsung.com,
	marcandre.lureau@redhat.com, maxime.coquelin@redhat.com,
	quintela@redhat.com, peterx@redhat.com, lvivier@redhat.com,
	aarcange@redhat.com
Subject: Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Date: Fri, 7 Jul 2017 13:01:56 +0100
Message-ID: <20170707120155.GE2451@work-vm>
In-Reply-To: <20170703205127-mutt-send-email-mst@kernel.org>

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This is an RFC/WIP series that enables postcopy migration
> > with shared memory to a vhost-user process.
> > It's based off current-head + Juan's load_cleanup series, and
> > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > to work, but it's quite rough.
> > 
> > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> > use the new feature, since this is about the simplest
> > client around.
> > 
> > Structure:
> > 
> > The basic idea is that near the start of postcopy, the client
> > opens its own userfaultfd fd and sends that back to QEMU over
> > the socket it's already using for VHOST_USER_* commands.
> > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > areas with userfaultfd and sends the mapped addresses back to QEMU.
> > 
> > QEMU then reads the client's UFD in its fault thread and issues
> > requests back to the source as needed.
> > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > that the page has arrived and it can carry on.
> > 
> > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> > QEMU knows the client can talk postcopy.
> > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> > added to guide the process along.
> > 
> > Current known issues:
> >    I've not tested it with hugepages yet; and I suspect the madvises
> >    will need tweaking for it.
> > 
> >    The qemu gets to see the base addresses that the client has its
> >    regions mapped at; that's not great for security
> 
> Not urgent to fix.
> 
> >    Take care of deadlocking; any thread in the client that
> >    accesses a userfault protected page can stall.
> 
> And it can happen under a lock quite easily.
> What exactly is proposed here?
> Maybe we want to reuse the new channel that the IOMMU uses.

There's no fundamental reason to get deadlocks as long as you
get it right; the qemu thread that processes the userfaults
is a separate, independent thread, so once it's going the client
can do whatever it likes and it will get woken up without
intervention.
Some care is needed around the postcopy end; reception of the
message that tells you to drop the userfault enables (which
frees anything that hasn't been woken) must be allowed to happen
for postcopy to complete; we take care that QEMU's fault
thread lives on until that message is acknowledged.
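
A minimal sketch of the ordering constraint described above, written as a
toy model rather than anything taken from the series. All names here
(pc_model, pc_handle and so on) are made up for illustration, and the
assumption that the fault thread starts at 'listen' is mine. It only
shows that END releases every waiter that was never woken, and that the
fault thread may stop only once END has been acknowledged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not QEMU identifiers. */
enum pc_msg { PC_ADVISE, PC_LISTEN, PC_END };
enum pc_state { PC_IDLE, PC_ADVISED, PC_LISTENING, PC_DONE };

struct pc_model {
    enum pc_state state;
    bool fault_thread_running; /* QEMU thread servicing the client's UFD */
    bool end_acked;            /* client has acknowledged the END message */
    unsigned pending_faults;   /* client threads blocked on missing pages */
};

void pc_init(struct pc_model *m)
{
    assert(m != NULL);
    m->state = PC_IDLE;
    m->fault_thread_running = false;
    m->end_acked = false;
    m->pending_faults = 0;
}

/* Apply one message; returns false if it is out of order. */
bool pc_handle(struct pc_model *m, enum pc_msg msg)
{
    switch (msg) {
    case PC_ADVISE:            /* client opens its userfaultfd, sends it back */
        if (m->state != PC_IDLE) return false;
        m->state = PC_ADVISED;
        return true;
    case PC_LISTEN:            /* assumed point where fault handling starts */
        if (m->state != PC_ADVISED) return false;
        m->state = PC_LISTENING;
        m->fault_thread_running = true;
        return true;
    case PC_END:               /* dropping userfault frees unwoken waiters */
        if (m->state != PC_LISTENING) return false;
        m->pending_faults = 0;
        m->end_acked = true;
        m->state = PC_DONE;
        return true;
    }
    return false;
}

/* A client thread touches a page that has not arrived yet. */
bool pc_fault(struct pc_model *m)
{
    if (m->state != PC_LISTENING) return false;
    m->pending_faults++;
    return true;
}

/* The page arrived; the fault thread wakes one waiter. */
bool pc_wake(struct pc_model *m)
{
    if (!m->fault_thread_running || m->pending_faults == 0) return false;
    m->pending_faults--;
    return true;
}

/* The fault thread must outlive the END acknowledgement. */
bool pc_stop_fault_thread(struct pc_model *m)
{
    if (!m->end_acked) return false;
    m->fault_thread_running = false;
    return true;
}
```

In this model, calling pc_stop_fault_thread() before PC_END has been
handled is refused, which is the property the paragraph above relies on.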

I'm more worried about how this will work in a full packet switch,
where one vhost-user client for an incoming migration can stall
the whole switch unless care is taken in the design.
How do we figure out whether this is going to fly on a full stack?
That's my main reason for getting this WIP set out here: to
get comments.

> >    There's a nasty hack of a lock around the set_mem_table message.
> 
> Yes.
> 
> >    I've not looked at the recent IOMMU code.
> > 
> >    Some cleanup and a lot of corner cases need thinking about.
> > 
> >    There are probably plenty of unknown issues as well.
> 
> At the protocol level, I'd like to rename the feature to
> USER_PAGEFAULT. Client does not really know anything about
> copies, it's all internal to qemu.
> Spec can document that it's used by qemu for postcopy.

OK, tbh I suspect that using it for anything else would be tricky
without adding more protocol features for that other use case.
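
For illustration only, the check that such a feature bit enables could
look roughly like the sketch below. The bit position and the function
names are placeholders chosen for the example, not values from the
series or from the vhost-user spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit position, assumed for this example only. */
#define EXAMPLE_F_USER_PAGEFAULT_BIT 8

/* Test a single bit in a 64-bit protocol feature mask. */
bool example_has_feature(uint64_t features, unsigned bit)
{
    assert(bit < 64);
    return (features >> bit) & 1;
}

/*
 * Postcopy with a vhost-user client is only attempted when both
 * sides advertise the bit, i.e. when it survives negotiation.
 */
bool example_can_use_postcopy(uint64_t qemu_features,
                              uint64_t client_features)
{
    uint64_t negotiated = qemu_features & client_features;
    return example_has_feature(negotiated, EXAMPLE_F_USER_PAGEFAULT_BIT);
}
```

The client never needs to know that the fault handling is being used
for postcopy; it only needs to agree to hand over a userfaultfd.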

Dave

> > Test setup:
> >   I'm running on one host at the moment, with the guest
> >   scping a large file from the host as it migrates.
> >   The setup is based on one I found in the vhost-user setups.
> >   You'll need a recent kernel for the shared memory support
> >   in userfaultfd, and userfault isn't that happy if a process
> >   using shared memory dumps core - so make sure you have the
> >   latest fixes.
> > 
> > SESS=vhost
> > ulimit -c unlimited
> > tmux -L $SESS new-session -d
> > tmux -L $SESS set-option -g history-limit 30000
> > # Start a router using the system qemu
> > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=localhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0
> > tmux -L $SESS set-option -g set-remain-on-exit on
> > # Start source vhost bridge
> > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log"
> > sleep 0.5
> > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log"
> > # Start dest vhost bridge
> > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0.1:5556 2>dst-vub-log"
> > sleep 0.5
> > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log"
> > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on^M"
> > tmux -L $SESS send-keys -t source "migrate_set_speed 20M^M"
> > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on^M"
> > 
> > then once booted:
> > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M'
> > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M'
> > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M)
> > 
> > 
> > Dave
> > 
> > Dr. David Alan Gilbert (29):
> >   RAMBlock/migration: Add migration flags
> >   migrate: Update ram_block_discard_range for shared
> >   qemu_ram_block_host_offset
> >   migration/ram: ramblock_recv_bitmap_test_byte_offset
> >   postcopy: use UFFDIO_ZEROPAGE only when available
> >   postcopy: Add notifier chain
> >   postcopy: Add vhost-user flag for postcopy and check it
> >   vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
> >   vhub: Support sending fds back to qemu
> >   vhub: Open userfaultfd
> >   postcopy: Allow registering of fd handler
> >   vhost+postcopy: Register shared ufd with postcopy
> >   vhost+postcopy: Transmit 'listen' to client
> >   vhost+postcopy: Register new regions with the ufd
> >   vhost+postcopy: Send address back to qemu
> >   vhost+postcopy: Stash RAMBlock and offset
> >   vhost+postcopy: Send requests to source for shared pages
> >   vhost+postcopy: Resolve client address
> >   postcopy: wake shared
> >   postcopy: postcopy_notify_shared_wake
> >   vhost+postcopy: Add vhost waker
> >   vhost+postcopy: Call wakeups
> >   vub+postcopy: madvises
> >   vhost+postcopy: Lock around set_mem_table
> >   vhu: enable = false on get_vring_base
> >   vhost: Add VHOST_USER_POSTCOPY_END message
> >   vhost+postcopy: Wire up POSTCOPY_END notify
> >   postcopy: Allow shared memory
> >   vhost-user: Claim support for postcopy
> > 
> >  contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++-
> >  contrib/libvhost-user/libvhost-user.h |   8 +
> >  exec.c                                |  44 +++--
> >  hw/virtio/trace-events                |  13 ++
> >  hw/virtio/vhost-user.c                | 293 +++++++++++++++++++++++++++-
> >  include/exec/cpu-common.h             |   3 +
> >  include/exec/ram_addr.h               |   2 +
> >  migration/migration.c                 |   3 +
> >  migration/migration.h                 |   8 +
> >  migration/postcopy-ram.c              | 357 +++++++++++++++++++++++++++-------
> >  migration/postcopy-ram.h              |  69 +++++++
> >  migration/ram.c                       |   5 +
> >  migration/ram.h                       |   1 +
> >  migration/savevm.c                    |  13 ++
> >  migration/trace-events                |   6 +
> >  trace-events                          |   3 +
> >  vl.c                                  |   4 +-
> >  17 files changed, 926 insertions(+), 84 deletions(-)
> > 
> > -- 
> > 2.13.0
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
