From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org, a.perevalov@samsung.com,
marcandre.lureau@redhat.com, maxime.coquelin@redhat.com,
quintela@redhat.com, peterx@redhat.com, lvivier@redhat.com,
aarcange@redhat.com
Subject: Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Date: Fri, 7 Jul 2017 13:01:56 +0100
Message-ID: <20170707120155.GE2451@work-vm>
In-Reply-To: <20170703205127-mutt-send-email-mst@kernel.org>
* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Hi,
> > This is a RFC/WIP series that enables postcopy migration
> > with shared memory to a vhost-user process.
> > It's based off current-head + Juan's load_cleanup series, and
> > Alexey's bitmap series (v4). It's very lightly tested and seems
> > to work, but it's quite rough.
> >
> > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> > use the new feature, since this is about the simplest
> > client around.
> >
> > Structure:
> >
> > The basic idea is that near the start of postcopy, the client
> > opens its own userfaultfd fd and sends that back to QEMU over
> > the socket it's already using for VHOST_USER_* commands.
> > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > areas with userfaultfd and sends the mapped addresses back to QEMU.
> >
> > QEMU then reads the client's UFD in its fault thread and issues
> > requests back to the source as needed.
> > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > that the page has arrived and can carry on.
> >
> > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> > the QEMU knows the client can talk postcopy.
> > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> > added to guide the process along.
> >
> > Current known issues:
> > I've not tested it with hugepages yet; and I suspect the madvises
> > will need tweaking for it.
> >
> > The qemu gets to see the base addresses that the client has its
> > regions mapped at; that's not great for security
>
> Not urgent to fix.
>
> > Take care of deadlocking; any thread in the client that
> > accesses a userfault protected page can stall.
>
> And it can happen under a lock quite easily.
> What exactly is proposed here?
> Maybe we want to reuse the new channel that the IOMMU uses.
There's no fundamental reason to get deadlocks as long as you
get it right; the QEMU thread that processes the userfaults
is a separate, independent thread, so once it's going the client
can do whatever it likes and it will get woken up without
intervention.
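
To make the fault thread's job a little more concrete: for each fault it
reads from the client's UFD, it has to turn the client's virtual address
back into a region and offset (using the base addresses the client sent
back) before it can ask the source for the page. The following is only a
minimal sketch of that lookup, not code from this series; the struct, the
function name and the integer block_id standing in for a RAMBlock are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one region as mapped by the client;
 * not a structure from this series. */
struct client_region {
    uint64_t client_base;  /* address the client mapped the region at */
    uint64_t size;         /* length of the region in bytes */
    int      block_id;     /* stand-in for QEMU's RAMBlock */
};

/* Map a faulting client address to a region and a page-aligned offset
 * within it. Returns false if the address is outside every region. */
static bool resolve_client_addr(const struct client_region *regions,
                                size_t n, uint64_t fault_addr,
                                uint64_t page_size,
                                int *block_id, uint64_t *offset)
{
    /* The masking below assumes page_size is a power of two. */
    assert(page_size && (page_size & (page_size - 1)) == 0);

    for (size_t i = 0; i < n; i++) {
        const struct client_region *r = &regions[i];

        if (fault_addr >= r->client_base &&
            fault_addr - r->client_base < r->size) {
            uint64_t off = fault_addr - r->client_base;

            *block_id = r->block_id;
            *offset = off & ~(page_size - 1);  /* round down to page */
            return true;
        }
    }
    return false;
}
```

With the region and offset in hand, the fault thread can request the page
from the source and, once it has arrived, wake the client; none of that
needs the stalled client thread to do anything.
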
Some care is needed around the postcopy end; reception of the
message that tells you to drop the userfault enables (which
frees anything that hasn't been woken) must be allowed to happen
for postcopy to complete; we take care that QEMU's fault
thread lives on until that message is acknowledged.
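
The ordering constraint at the end can be written down as a tiny state
machine. This is just an illustrative sketch under my own naming (the
states, events and functions below are hypothetical and do not appear in
the series); the only point it encodes is that the fault thread may not
exit until the end message has been acknowledged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-client postcopy states, loosely following the
 * ADVISE/LISTEN/END messages; not taken from the series. */
enum pc_state {
    PC_ADVISED,    /* advise exchanged, postcopy may follow */
    PC_LISTENING,  /* regions registered, faults may arrive */
    PC_END_SENT,   /* QEMU has sent the end message */
    PC_END_ACKED,  /* client has acknowledged the end message */
};

enum pc_event { EV_LISTEN, EV_SEND_END, EV_END_ACK };

/* Advance the state; events that are not valid in the current state
 * leave it unchanged. */
static enum pc_state pc_next(enum pc_state s, enum pc_event ev)
{
    switch (s) {
    case PC_ADVISED:
        return ev == EV_LISTEN ? PC_LISTENING : s;
    case PC_LISTENING:
        return ev == EV_SEND_END ? PC_END_SENT : s;
    case PC_END_SENT:
        return ev == EV_END_ACK ? PC_END_ACKED : s;
    default:
        return s;
    }
}

/* The fault thread has to keep servicing faults until the client has
 * acknowledged the end message; exiting earlier could leave a client
 * thread stalled on a page that is never woken. */
static bool fault_thread_may_exit(enum pc_state s)
{
    return s == PC_END_ACKED;
}
```

So a client that is still in PC_END_SENT can take a fault and have it
serviced as normal while it works its way to handling the end message.
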
I'm more worried about how this will work in a full packet switch,
where one vhost-user client for an incoming migration could stall
the whole switch unless care is taken in the design.
How do we figure out whether this is going to fly on a full stack?
That's my main reason for getting this WIP set out here: to
get comments.
> > There's a nasty hack of a lock around the set_mem_table message.
>
> Yes.
>
> > I've not looked at the recent IOMMU code.
> >
> > Some cleanup and a lot of corner cases need thinking about.
> >
> > There are probably plenty of unknown issues as well.
>
> At the protocol level, I'd like to rename the feature to
> USER_PAGEFAULT. Client does not really know anything about
> copies, it's all internal to qemu.
> Spec can document that it's used by qemu for postcopy.
OK, tbh I suspect that using it for anything else would be tricky
without adding more protocol features for that other use case.
Dave
> > Test setup:
> > I'm running on one host at the moment, with the guest
> > scping a large file from the host as it migrates.
> > The setup is based on one I found in the vhost-user setups.
> > You'll need a recent kernel for the shared memory support
> > in userfaultfd, and userfault isn't that happy if a process
> > using shared memory core's - so make sure you have the
> > latest fixes.
> >
> > SESS=vhost
> > ulimit -c unlimited
> > tmux -L $SESS new-session -d
> > tmux -L $SESS set-option -g history-limit 30000
> > # Start a router using the system qemu
> > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=localhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0
> > tmux -L $SESS set-option -g set-remain-on-exit on
> > # Start source vhost bridge
> > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log"
> > sleep 0.5
> > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log "
> > # Start dest vhost bridge
> > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0.1:5556 2>dst-vub-log"
> > sleep 0.5
> > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log"
> > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on
> > tmux -L $SESS send-keys -t source "migrate_set_speed 20M
> > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on
> >
> > then once booted:
> > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M'
> > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M'
> > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M)
> >
> >
> > Dave
> >
> > Dr. David Alan Gilbert (29):
> > RAMBlock/migration: Add migration flags
> > migrate: Update ram_block_discard_range for shared
> > qemu_ram_block_host_offset
> > migration/ram: ramblock_recv_bitmap_test_byte_offset
> > postcopy: use UFFDIO_ZEROPAGE only when available
> > postcopy: Add notifier chain
> > postcopy: Add vhost-user flag for postcopy and check it
> > vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
> > vhub: Support sending fds back to qemu
> > vhub: Open userfaultfd
> > postcopy: Allow registering of fd handler
> > vhost+postcopy: Register shared ufd with postcopy
> > vhost+postcopy: Transmit 'listen' to client
> > vhost+postcopy: Register new regions with the ufd
> > vhost+postcopy: Send address back to qemu
> > vhost+postcopy: Stash RAMBlock and offset
> > vhost+postcopy: Send requests to source for shared pages
> > vhost+postcopy: Resolve client address
> > postcopy: wake shared
> > postcopy: postcopy_notify_shared_wake
> > vhost+postcopy: Add vhost waker
> > vhost+postcopy: Call wakeups
> > vub+postcopy: madvises
> > vhost+postcopy: Lock around set_mem_table
> > vhu: enable = false on get_vring_base
> > vhost: Add VHOST_USER_POSTCOPY_END message
> > vhost+postcopy: Wire up POSTCOPY_END notify
> > postcopy: Allow shared memory
> > vhost-user: Claim support for postcopy
> >
> > contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++-
> > contrib/libvhost-user/libvhost-user.h | 8 +
> > exec.c | 44 +++--
> > hw/virtio/trace-events | 13 ++
> > hw/virtio/vhost-user.c | 293 +++++++++++++++++++++++++++-
> > include/exec/cpu-common.h | 3 +
> > include/exec/ram_addr.h | 2 +
> > migration/migration.c | 3 +
> > migration/migration.h | 8 +
> > migration/postcopy-ram.c | 357 +++++++++++++++++++++++++++-------
> > migration/postcopy-ram.h | 69 +++++++
> > migration/ram.c | 5 +
> > migration/ram.h | 1 +
> > migration/savevm.c | 13 ++
> > migration/trace-events | 6 +
> > trace-events | 3 +
> > vl.c | 4 +-
> > 17 files changed, 926 insertions(+), 84 deletions(-)
> >
> > --
> > 2.13.0
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK