* [Qemu-devel] [PATCHv2 0/2] migration: stop dma while VM is stopped @ 2010-11-04 18:04 Michael S. Tsirkin 2010-11-04 18:06 ` [Qemu-devel] [PATCHv2 1/2] char: separate device and system fd handlers Michael S. Tsirkin 2010-11-04 18:06 ` [Qemu-devel] [PATCHv2 2/2] tap: mark fd handler as device Michael S. Tsirkin 0 siblings, 2 replies; 15+ messages in thread From: Michael S. Tsirkin @ 2010-11-04 18:04 UTC (permalink / raw) To: Anthony Liguori, qemu-devel Here's a patch that's much smaller: instead of trying to solve the whole problem, it fixes the specific bug I observe, that of net device traffic when vm is stopped confusing the bridge. We can then slowly convert all other devices that need to be stopped on vmstop to this API. Comments? Michael S. Tsirkin (2): char: separate device and system fd handlers tap: mark fd handler as device hw/vhost_net.c | 2 +- net/tap.c | 3 +- qemu-char.h | 6 ++- qemu-kvm.c | 2 +- qemu-tool.c | 10 +++++ vl.c | 117 ++++++++++++++++++++++++++++++++++++------------------- 6 files changed, 95 insertions(+), 45 deletions(-) -- 1.7.3.2.91.g446ac ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCHv2 1/2] char: separate device and system fd handlers 2010-11-04 18:04 [Qemu-devel] [PATCHv2 0/2] migration: stop dma while VM is stopped Michael S. Tsirkin @ 2010-11-04 18:06 ` Michael S. Tsirkin 2010-11-16 10:24 ` [Qemu-devel] " Juan Quintela 2010-11-04 18:06 ` [Qemu-devel] [PATCHv2 2/2] tap: mark fd handler as device Michael S. Tsirkin 1 sibling, 1 reply; 15+ messages in thread From: Michael S. Tsirkin @ 2010-11-04 18:06 UTC (permalink / raw) To: qemu-devel, Anthony Liguori Create separate lists for system and device fd handlers. Device handlers will not run while vm is stopped. By default all fds are assumed system so they will keep running as before. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- qemu-char.h | 6 +++- qemu-kvm.c | 2 +- qemu-tool.c | 10 +++++ vl.c | 117 ++++++++++++++++++++++++++++++++++++++--------------------- 4 files changed, 92 insertions(+), 43 deletions(-) diff --git a/qemu-char.h b/qemu-char.h index 18ad12b..ad09f56 100644 --- a/qemu-char.h +++ b/qemu-char.h @@ -101,7 +101,11 @@ CharDriverState *qemu_chr_open_eventfd(int eventfd); extern int term_escape_char; /* async I/O support */ - +int qemu_set_fd_handler3(bool device, int fd, + IOCanReadHandler *fd_read_poll, + IOHandler *fd_read, + IOHandler *fd_write, + void *opaque); int qemu_set_fd_handler2(int fd, IOCanReadHandler *fd_read_poll, IOHandler *fd_read, diff --git a/qemu-kvm.c b/qemu-kvm.c index 625f286..56924a9 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -1621,7 +1621,7 @@ int kvm_main_loop(void) fcntl(sigfd, F_SETFL, O_NONBLOCK); - qemu_set_fd_handler2(sigfd, NULL, sigfd_handler, NULL, + qemu_set_fd_handler3(true, sigfd, NULL, sigfd_handler, NULL, (void *)(unsigned long) sigfd); pthread_cond_broadcast(&qemu_system_cond); diff --git a/qemu-tool.c b/qemu-tool.c index b39af86..99d5938 100644 --- a/qemu-tool.c +++ b/qemu-tool.c @@ -111,6 +111,16 @@ int qemu_set_fd_handler2(int fd, return 0; } +int qemu_set_fd_handler3(bool device, + int fd, + IOCanReadHandler *fd_read_poll, + IOHandler *fd_read, + IOHandler *fd_write, + void *opaque) +{ + return 0; +} + int64_t qemu_get_clock(QEMUClock *clock) { qemu_timeval tv; diff --git a/vl.c b/vl.c index 42617c2..9e9e2a1 100644 --- a/vl.c +++ b/vl.c @@ -958,34 +958,39 @@ typedef struct IOHandlerRecord { QLIST_ENTRY(IOHandlerRecord) next; } IOHandlerRecord; -static QLIST_HEAD(, IOHandlerRecord) io_handlers = - QLIST_HEAD_INITIALIZER(io_handlers); +typedef QLIST_HEAD(, IOHandlerRecord) IOHandlerRecordList; +static IOHandlerRecordList device_io_handlers = + QLIST_HEAD_INITIALIZER(device_io_handlers); +static IOHandlerRecordList system_io_handlers = + QLIST_HEAD_INITIALIZER(system_io_handlers); -/* XXX: fd_read_poll should be suppressed, but an API change is - necessary in the character devices to suppress fd_can_read(). */ -int qemu_set_fd_handler2(int fd, +int qemu_set_fd_handler3(bool device, + int fd, IOCanReadHandler *fd_read_poll, IOHandler *fd_read, IOHandler *fd_write, void *opaque) { + IOHandlerRecordList *list; IOHandlerRecord *ioh; + list = device ? &device_io_handlers: &system_io_handlers; + if (!fd_read && !fd_write) { - QLIST_FOREACH(ioh, &io_handlers, next) { + QLIST_FOREACH(ioh, list, next) { if (ioh->fd == fd) { ioh->deleted = 1; break; } } } else { - QLIST_FOREACH(ioh, &io_handlers, next) { + QLIST_FOREACH(ioh, list, next) { if (ioh->fd == fd) goto found; } ioh = qemu_mallocz(sizeof(IOHandlerRecord)); - QLIST_INSERT_HEAD(&io_handlers, ioh, next); + QLIST_INSERT_HEAD(list, ioh, next); found: ioh->fd = fd; ioh->fd_read_poll = fd_read_poll; @@ -998,6 +1003,19 @@ int qemu_set_fd_handler2(int fd, return 0; } + +/* XXX: fd_read_poll should be suppressed, but an API change is + necessary in the character devices to suppress fd_can_read(). */ +int qemu_set_fd_handler2(int fd, + IOCanReadHandler *fd_read_poll, + IOHandler *fd_read, + IOHandler *fd_write, + void *opaque) +{ + return qemu_set_fd_handler3(false, fd, fd_read_poll, fd_read, fd_write, + opaque); +} + int qemu_set_fd_handler(int fd, IOHandler *fd_read, IOHandler *fd_write, @@ -1242,9 +1260,52 @@ void qemu_system_powerdown_request(void) qemu_notify_event(); } -void main_loop_wait(int nonblocking) +static inline int get_ioh_fds(IOHandlerRecordList *list, + int nfds, fd_set *rfds, fd_set *wfds) { IOHandlerRecord *ioh; + QLIST_FOREACH(ioh, list, next) { + if (ioh->deleted) + continue; + if (ioh->fd_read && + (!ioh->fd_read_poll || + ioh->fd_read_poll(ioh->opaque) != 0)) { + FD_SET(ioh->fd, rfds); + if (ioh->fd > nfds) + nfds = ioh->fd; + } + if (ioh->fd_write) { + FD_SET(ioh->fd, wfds); + if (ioh->fd > nfds) + nfds = ioh->fd; + } + } + return nfds; +} + +static inline void call_ioh_fds(IOHandlerRecordList *list, + fd_set *rfds, fd_set *wfds) +{ + IOHandlerRecord *ioh, *pioh; + + QLIST_FOREACH_SAFE(ioh, list, next, pioh) { + if (ioh->deleted) { + QLIST_REMOVE(ioh, next); + qemu_free(ioh); + continue; + } + if (ioh->fd_read && FD_ISSET(ioh->fd, rfds)) { + ioh->fd_read(ioh->opaque); + if (!(ioh->fd_read_poll && ioh->fd_read_poll(ioh->opaque))) + FD_CLR(ioh->fd, rfds); + } + if (ioh->fd_write && FD_ISSET(ioh->fd, wfds)) { + ioh->fd_write(ioh->opaque); + } + } +} +void main_loop_wait(int nonblocking) +{ fd_set rfds, wfds, xfds; int ret, nfds; struct timeval tv; @@ -1260,26 +1321,13 @@ void main_loop_wait(int nonblocking) os_host_main_loop_wait(&timeout); /* poll any events */ - /* XXX: separate device handlers from system ones */ nfds = -1; FD_ZERO(&rfds); FD_ZERO(&wfds); FD_ZERO(&xfds); - QLIST_FOREACH(ioh, &io_handlers, next) { - if (ioh->deleted) - continue; - if (ioh->fd_read && - (!ioh->fd_read_poll || - ioh->fd_read_poll(ioh->opaque) != 0)) { - FD_SET(ioh->fd, &rfds); - if (ioh->fd > nfds) - nfds = ioh->fd; - } - if (ioh->fd_write) { - FD_SET(ioh->fd, &wfds); - if (ioh->fd > nfds) - nfds = ioh->fd; - } + nfds = get_ioh_fds(&system_io_handlers, nfds, &rfds, &wfds); + if (vm_running) { + nfds = get_ioh_fds(&device_io_handlers, nfds, &rfds, &wfds); } tv.tv_sec = timeout / 1000; @@ -1291,22 +1339,9 @@ void main_loop_wait(int nonblocking) ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv); qemu_mutex_lock_iothread(); if (ret > 0) { - IOHandlerRecord *pioh; - - QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) { - if (ioh->deleted) { - QLIST_REMOVE(ioh, next); - qemu_free(ioh); - continue; - } - if (ioh->fd_read && FD_ISSET(ioh->fd, &rfds)) { - ioh->fd_read(ioh->opaque); - if (!(ioh->fd_read_poll && ioh->fd_read_poll(ioh->opaque))) - FD_CLR(ioh->fd, &rfds); - } - if (ioh->fd_write && FD_ISSET(ioh->fd, &wfds)) { - ioh->fd_write(ioh->opaque); - } + call_ioh_fds(&system_io_handlers, &rfds, &wfds); + if (vm_running) { + call_ioh_fds(&device_io_handlers, &rfds, &wfds); } } -- 1.7.3.2.91.g446ac ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] Re: [PATCHv2 1/2] char: separate device and system fd handlers 2010-11-04 18:06 ` [Qemu-devel] [PATCHv2 1/2] char: separate device and system fd handlers Michael S. Tsirkin @ 2010-11-16 10:24 ` Juan Quintela 0 siblings, 0 replies; 15+ messages in thread From: Juan Quintela @ 2010-11-16 10:24 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel "Michael S. Tsirkin" <mst@redhat.com> wrote: > Create separate lists for system and device fd handlers. > Device handlers will not run while vm is stopped. > By default all fds are assumed system so they will > keep running as before. > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > qemu-char.h | 6 +++- > qemu-kvm.c | 2 +- > qemu-tool.c | 10 +++++ > vl.c | 117 ++++++++++++++++++++++++++++++++++++++--------------------- > 4 files changed, 92 insertions(+), 43 deletions(-) > > diff --git a/qemu-char.h b/qemu-char.h > index 18ad12b..ad09f56 100644 > --- a/qemu-char.h > +++ b/qemu-char.h > @@ -101,7 +101,11 @@ CharDriverState *qemu_chr_open_eventfd(int eventfd); > extern int term_escape_char; > > /* async I/O support */ > - > +int qemu_set_fd_handler3(bool device, int fd, > + IOCanReadHandler *fd_read_poll, > + IOHandler *fd_read, > + IOHandler *fd_write, > + void *opaque); No this horror. Can't we just create: qemu_set_device_handler() qemu_set_system_handler() or whatever named you like? qemu_set_fd_handler2 is already bad enough, adding another one just make things worse in my humble opinion. > +int qemu_set_fd_handler3(bool device, > + int fd, > IOCanReadHandler *fd_read_poll, > IOHandler *fd_read, > IOHandler *fd_write, > void *opaque) > { > + IOHandlerRecordList *list; > IOHandlerRecord *ioh; > > + list = device ? &device_io_handlers: &system_io_handlers; > + If you are going to use this, passing list paramenter instead of device looks like a much better option. It would indeed make things go better. > if (!fd_read && !fd_write) { > - QLIST_FOREACH(ioh, &io_handlers, next) { > + QLIST_FOREACH(ioh, list, next) { > if (ioh->fd == fd) { > ioh->deleted = 1; > break; > } > } > } else { > - QLIST_FOREACH(ioh, &io_handlers, next) { > + QLIST_FOREACH(ioh, list, next) { > if (ioh->fd == fd) > goto found; > } > ioh = qemu_mallocz(sizeof(IOHandlerRecord)); > - QLIST_INSERT_HEAD(&io_handlers, ioh, next); > + QLIST_INSERT_HEAD(list, ioh, next); > found: > ioh->fd = fd; > ioh->fd_read_poll = fd_read_poll; > @@ -998,6 +1003,19 @@ int qemu_set_fd_handler2(int fd, > return 0; > } > > + > +/* XXX: fd_read_poll should be suppressed, but an API change is > + necessary in the character devices to suppress fd_can_read(). */ > +int qemu_set_fd_handler2(int fd, > + IOCanReadHandler *fd_read_poll, > + IOHandler *fd_read, > + IOHandler *fd_write, > + void *opaque) > +{ > + return qemu_set_fd_handler3(false, fd, fd_read_poll, fd_read, fd_write, > + opaque); > +} > + > int qemu_set_fd_handler(int fd, > IOHandler *fd_read, > IOHandler *fd_write, > @@ -1242,9 +1260,52 @@ void qemu_system_powerdown_request(void) > qemu_notify_event(); > } > > -void main_loop_wait(int nonblocking) > +static inline int get_ioh_fds(IOHandlerRecordList *list, > + int nfds, fd_set *rfds, fd_set *wfds) > { > IOHandlerRecord *ioh; > + QLIST_FOREACH(ioh, list, next) { > + if (ioh->deleted) > + continue; > + if (ioh->fd_read && > + (!ioh->fd_read_poll || > + ioh->fd_read_poll(ioh->opaque) != 0)) { > + FD_SET(ioh->fd, rfds); > + if (ioh->fd > nfds) > + nfds = ioh->fd; > + } > + if (ioh->fd_write) { > + FD_SET(ioh->fd, wfds); > + if (ioh->fd > nfds) > + nfds = ioh->fd; > + } > + } > + return nfds; > +} > + > +static inline void call_ioh_fds(IOHandlerRecordList *list, > + fd_set *rfds, fd_set *wfds) > +{ > + IOHandlerRecord *ioh, *pioh; > + > + QLIST_FOREACH_SAFE(ioh, list, next, pioh) { > + if (ioh->deleted) { > + QLIST_REMOVE(ioh, next); > + qemu_free(ioh); > + continue; > + } > + if (ioh->fd_read && FD_ISSET(ioh->fd, rfds)) { > + ioh->fd_read(ioh->opaque); > + if (!(ioh->fd_read_poll && ioh->fd_read_poll(ioh->opaque))) > + FD_CLR(ioh->fd, rfds); > + } > + if (ioh->fd_write && FD_ISSET(ioh->fd, wfds)) { > + ioh->fd_write(ioh->opaque); > + } > + } > +} > +void main_loop_wait(int nonblocking) > +{ > fd_set rfds, wfds, xfds; > int ret, nfds; > struct timeval tv; > @@ -1260,26 +1321,13 @@ void main_loop_wait(int nonblocking) > os_host_main_loop_wait(&timeout); > > /* poll any events */ > - /* XXX: separate device handlers from system ones */ > nfds = -1; > FD_ZERO(&rfds); > FD_ZERO(&wfds); > FD_ZERO(&xfds); > - QLIST_FOREACH(ioh, &io_handlers, next) { > - if (ioh->deleted) > - continue; > - if (ioh->fd_read && > - (!ioh->fd_read_poll || > - ioh->fd_read_poll(ioh->opaque) != 0)) { > - FD_SET(ioh->fd, &rfds); > - if (ioh->fd > nfds) > - nfds = ioh->fd; > - } > - if (ioh->fd_write) { > - FD_SET(ioh->fd, &wfds); > - if (ioh->fd > nfds) > - nfds = ioh->fd; > - } > + nfds = get_ioh_fds(&system_io_handlers, nfds, &rfds, &wfds); > + if (vm_running) { > + nfds = get_ioh_fds(&device_io_handlers, nfds, &rfds, &wfds); > } > > tv.tv_sec = timeout / 1000; > @@ -1291,22 +1339,9 @@ void main_loop_wait(int nonblocking) > ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv); > qemu_mutex_lock_iothread(); > if (ret > 0) { > - IOHandlerRecord *pioh; > - > - QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) { > - if (ioh->deleted) { > - QLIST_REMOVE(ioh, next); > - qemu_free(ioh); > - continue; > - } > - if (ioh->fd_read && FD_ISSET(ioh->fd, &rfds)) { > - ioh->fd_read(ioh->opaque); > - if (!(ioh->fd_read_poll && ioh->fd_read_poll(ioh->opaque))) > - FD_CLR(ioh->fd, &rfds); > - } > - if (ioh->fd_write && FD_ISSET(ioh->fd, &wfds)) { > - ioh->fd_write(ioh->opaque); > - } > + call_ioh_fds(&system_io_handlers, &rfds, &wfds); > + if (vm_running) { > + call_ioh_fds(&device_io_handlers, &rfds, &wfds); > } > } ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCHv2 2/2] tap: mark fd handler as device 2010-11-04 18:04 [Qemu-devel] [PATCHv2 0/2] migration: stop dma while VM is stopped Michael S. Tsirkin 2010-11-04 18:06 ` [Qemu-devel] [PATCHv2 1/2] char: separate device and system fd handlers Michael S. Tsirkin @ 2010-11-04 18:06 ` Michael S. Tsirkin 2010-11-15 14:52 ` [Qemu-devel] " Juan Quintela 1 sibling, 1 reply; 15+ messages in thread From: Michael S. Tsirkin @ 2010-11-04 18:06 UTC (permalink / raw) To: qemu-devel, Anthony Liguori There's no reason for tap to run when VM is stopped. If we let it, it confuses the bridge on TX and corrupts DMA memory on RX. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- hw/vhost_net.c | 2 +- net/tap.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/vhost_net.c b/hw/vhost_net.c index c068be1..7061d6c 100644 --- a/hw/vhost_net.c +++ b/hw/vhost_net.c @@ -139,7 +139,7 @@ int vhost_net_start(struct vhost_net *net, } net->vc->info->poll(net->vc, false); - qemu_set_fd_handler(net->backend, NULL, NULL, NULL); + qemu_set_fd_handler3(true, net->backend, NULL, NULL, NULL, NULL); file.fd = net->backend; for (file.index = 0; file.index < net->dev.nvqs; ++file.index) { r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file); diff --git a/net/tap.c b/net/tap.c index 4afb314..246ef01 100644 --- a/net/tap.c +++ b/net/tap.c @@ -71,7 +71,8 @@ static void tap_writable(void *opaque); static void tap_update_fd_handler(TAPState *s) { - qemu_set_fd_handler2(s->fd, + qemu_set_fd_handler3(true, + s->fd, s->read_poll ? tap_can_send : NULL, s->read_poll ? tap_send : NULL, s->write_poll ? tap_writable : NULL, -- 1.7.3.2.91.g446ac ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-04 18:06 ` [Qemu-devel] [PATCHv2 2/2] tap: mark fd handler as device Michael S. Tsirkin @ 2010-11-15 14:52 ` Juan Quintela 2010-11-15 14:55 ` Anthony Liguori 2010-11-15 15:21 ` Michael S. Tsirkin 0 siblings, 2 replies; 15+ messages in thread From: Juan Quintela @ 2010-11-15 14:52 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel "Michael S. Tsirkin" <mst@redhat.com> wrote: > There's no reason for tap to run when VM is stopped. > If we let it, it confuses the bridge on TX > and corrupts DMA memory on RX. > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> once here, what handlers make sense to run while stopped? /me can think of the normal console, non live migration, loadvm and not much more. Perhaps it is easier to just move the other way around? Later, Juan. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-15 14:52 ` [Qemu-devel] " Juan Quintela @ 2010-11-15 14:55 ` Anthony Liguori 2010-11-15 15:18 ` Michael S. Tsirkin 2010-11-15 15:21 ` Michael S. Tsirkin 1 sibling, 1 reply; 15+ messages in thread From: Anthony Liguori @ 2010-11-15 14:55 UTC (permalink / raw) To: Juan Quintela; +Cc: qemu-devel, Michael S. Tsirkin On 11/15/2010 08:52 AM, Juan Quintela wrote: > "Michael S. Tsirkin"<mst@redhat.com> wrote: > >> There's no reason for tap to run when VM is stopped. >> If we let it, it confuses the bridge on TX >> and corrupts DMA memory on RX. >> >> Signed-off-by: Michael S. Tsirkin<mst@redhat.com> >> > once here, what handlers make sense to run while stopped? > /me can think of the normal console, non live migration, loadvm and not > much more. Perhaps it is easier to just move the other way around? > I'm not sure I concur that this is really a problem. Semantically, I don't think that stop has to imply that the guest memory no longer changes. Regards, Anthony Liguori > Later, Juan. > ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-15 14:55 ` Anthony Liguori @ 2010-11-15 15:18 ` Michael S. Tsirkin 2010-11-15 16:09 ` Anthony Liguori 0 siblings, 1 reply; 15+ messages in thread From: Michael S. Tsirkin @ 2010-11-15 15:18 UTC (permalink / raw) To: Anthony Liguori; +Cc: qemu-devel, Juan Quintela On Mon, Nov 15, 2010 at 08:55:07AM -0600, Anthony Liguori wrote: > On 11/15/2010 08:52 AM, Juan Quintela wrote: > >"Michael S. Tsirkin"<mst@redhat.com> wrote: > >>There's no reason for tap to run when VM is stopped. > >>If we let it, it confuses the bridge on TX > >>and corrupts DMA memory on RX. > >> > >>Signed-off-by: Michael S. Tsirkin<mst@redhat.com> > >once here, what handlers make sense to run while stopped? > >/me can think of the normal console, non live migration, loadvm and not > >much more. Perhaps it is easier to just move the other way around? > > I'm not sure I concur that this is really a problem. > Semantically, I don't think that stop has to imply that the guest > memory no longer changes. > > Regards, > > Anthony Liguori > > >Later, Juan. Well, I do not really know about vmstop that is not for migration. For most vmstop calls are for migration. And there, the problems are very real. First, it's not just memory. At least for network transmit, sending out packets with the same MAC from two locations is a problem. See? For memory, it is much worse: any memory changes can either get discarded or not. This breaks consistency guarantees that guest relies upon. Imagine virtio index getting updated but content not being updated. See? -- MST ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-15 15:18 ` Michael S. Tsirkin @ 2010-11-15 16:09 ` Anthony Liguori 2010-11-15 20:26 ` Michael S. Tsirkin 2010-11-15 20:53 ` Stefan Hajnoczi 0 siblings, 2 replies; 15+ messages in thread From: Anthony Liguori @ 2010-11-15 16:09 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel, Juan Quintela On 11/15/2010 09:18 AM, Michael S. Tsirkin wrote: > On Mon, Nov 15, 2010 at 08:55:07AM -0600, Anthony Liguori wrote: > >> On 11/15/2010 08:52 AM, Juan Quintela wrote: >> >>> "Michael S. Tsirkin"<mst@redhat.com> wrote: >>> >>>> There's no reason for tap to run when VM is stopped. >>>> If we let it, it confuses the bridge on TX >>>> and corrupts DMA memory on RX. >>>> >>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com> >>>> >>> once here, what handlers make sense to run while stopped? >>> /me can think of the normal console, non live migration, loadvm and not >>> much more. Perhaps it is easier to just move the other way around? >>> >> I'm not sure I concur that this is really a problem. >> Semantically, I don't think that stop has to imply that the guest >> memory no longer changes. >> >> Regards, >> >> Anthony Liguori >> >> >>> Later, Juan. >>> > Well, I do not really know about vmstop that is not for migration. > They are separate. For instance, we don't rely on stop to pause pending disk I/O because we don't serialize pending disk I/O operations. Instead, we flush all pending I/O and rely on the fact that disk I/O requests are always submitted in the context of a vCPU operation. This assumption breaks down though with ioeventfd so we need to revisit it. > For most vmstop calls are for migration. And there, the problems are very > real. > > First, it's not just memory. At least for network transmit, sending out > packets with the same MAC from two locations is a problem. See? > I agree it's a problem but I'm not sure that just marking fd handlers really helps. Bottom halves can also trigger transmits. I think that if we put something in the network layer that just queued packets if the vm is stopped, it would be a more robust solution to the problem. > For memory, it is much worse: any memory changes can either get > discarded or not. This breaks consistency guarantees that guest relies > upon. Imagine virtio index getting updated but content not being > updated. See? > If you suppress any I/O then the memory changes don't matter because the same changes will happen on the destination too. I think this basic problem is the same as Kemari. We can either attempt to totally freeze a guest which means stopping all callbacks that are device related or we can prevent I/O from happening which should introduce enough determinism to fix the problem in practice. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-15 16:09 ` Anthony Liguori @ 2010-11-15 20:26 ` Michael S. Tsirkin 2010-11-15 20:37 ` Anthony Liguori 2010-11-15 20:53 ` Stefan Hajnoczi 1 sibling, 1 reply; 15+ messages in thread From: Michael S. Tsirkin @ 2010-11-15 20:26 UTC (permalink / raw) To: Anthony Liguori; +Cc: qemu-devel, Juan Quintela On Mon, Nov 15, 2010 at 10:09:29AM -0600, Anthony Liguori wrote: > On 11/15/2010 09:18 AM, Michael S. Tsirkin wrote: > >On Mon, Nov 15, 2010 at 08:55:07AM -0600, Anthony Liguori wrote: > >>On 11/15/2010 08:52 AM, Juan Quintela wrote: > >>>"Michael S. Tsirkin"<mst@redhat.com> wrote: > >>>>There's no reason for tap to run when VM is stopped. > >>>>If we let it, it confuses the bridge on TX > >>>>and corrupts DMA memory on RX. > >>>> > >>>>Signed-off-by: Michael S. Tsirkin<mst@redhat.com> > >>>once here, what handlers make sense to run while stopped? > >>>/me can think of the normal console, non live migration, loadvm and not > >>>much more. Perhaps it is easier to just move the other way around? > >>I'm not sure I concur that this is really a problem. > >>Semantically, I don't think that stop has to imply that the guest > >>memory no longer changes. > >> > >>Regards, > >> > >>Anthony Liguori > >> > >>>Later, Juan. > >Well, I do not really know about vmstop that is not for migration. > > They are separate. For instance, we don't rely on stop to pause > pending disk I/O because we don't serialize pending disk I/O > operations. Instead, we flush all pending I/O and rely on the fact > that disk I/O requests are always submitted in the context of a vCPU > operation. This assumption breaks down though with ioeventfd so we > need to revisit it. > > >For most vmstop calls are for migration. And there, the problems are very > >real. > > > >First, it's not just memory. At least for network transmit, sending out > >packets with the same MAC from two locations is a problem. See? > > I agree it's a problem but I'm not sure that just marking fd > handlers really helps. > > Bottom halves can also trigger transmits. Are there any system ones? Can we just stop processing them? > I think that if we put > something in the network layer that just queued packets if the vm is > stopped, it would be a more robust solution to the problem. Will only work for -net. The problem is for anything that can trigger activity when vm is stopped. > >For memory, it is much worse: any memory changes can either get > >discarded or not. This breaks consistency guarantees that guest relies > >upon. Imagine virtio index getting updated but content not being > >updated. See? > > If you suppress any I/O then the memory changes don't matter because > the same changes will happen on the destination too. They matter, and same changes won't happen. Example: virtio used index is in page 1, it can point at data in page 2. device writes into data, *then* into index. Order matters, but won't be preserved: migration assumes memory does not change after vmstop, and so it might send old values for data but new values for index. Result will be invalid data coming into guest. On the destination guest will pick up the index and get bad (stale) data. > I think this basic problem is the same as Kemari. We can either > attempt to totally freeze a guest which means stopping all callbacks > that are device related or we can prevent I/O from happening which > should introduce enough determinism to fix the problem in practice. > > Regards, > > Anthony Liguori See above. IMO it's a different problem. Unlike Kemari, I don't really see any drawbacks to stop all callbacks. Do you? -- MST ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-15 20:26 ` Michael S. Tsirkin @ 2010-11-15 20:37 ` Anthony Liguori 0 siblings, 0 replies; 15+ messages in thread From: Anthony Liguori @ 2010-11-15 20:37 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel, Juan Quintela On 11/15/2010 02:26 PM, Michael S. Tsirkin wrote: > Are there any system ones? > There's one in qemu-char.c. Not sure it's easy to just say for the future that there can't be BHs that get delivered during vmstop. > Can we just stop processing them? > We ran into a lot of problems trying to do this with emulating synchronous I/O in the block layer. If we can avoid the stop-dispatching handlers approach, I think we'll have far fewer problems. >> I think that if we put >> something in the network layer that just queued packets if the vm is >> stopped, it would be a more robust solution to the problem. >> > Will only work for -net. The problem is for anything > that can trigger activity when vm is stopped. > Activity or external I/O? My assertion is that if we can stop things from hitting the disk, escaping the network, or leaving a character device, we're solid. >>> For memory, it is much worse: any memory changes can either get >>> discarded or not. This breaks consistency guarantees that guest relies >>> upon. Imagine virtio index getting updated but content not being >>> updated. See? >>> >> If you suppress any I/O then the memory changes don't matter because >> the same changes will happen on the destination too. >> > They matter, and same changes won't happen. > Example: > > virtio used index is in page 1, it can point at data in page 2. > device writes into data, *then* into index. Order matters, > but won't be preserved: migration assumes memory does not > change after vmstop, and so it might send old values for > data but new values for index. Result will be invalid > data coming into guest. > No, this scenario is invalid. Migration copies data while the guest is live and whenever a change happens, updates the dirty bitmap (that's why cpu_physical_memory_rw matters). Once the VM is stopped, we never return to the main loop before completing migration so nothing else gets an opportunity to run. This means when we send a page, we guarantee it won't be made dirty against until after the migration completes. Once the migration completes, the contents of memory may change but that's okay. As long as you stop packets from being sent, if you need to resume the guest, it'll pick up where it left off. > On the destination guest will pick up the index and > get bad (stale) data. > If you're seeing this happen with vhost, you aren't updating dirty bits correctly. AFAICT, this cannot happen with userspace device models. > >> I think this basic problem is the same as Kemari. We can either >> attempt to totally freeze a guest which means stopping all callbacks >> that are device related or we can prevent I/O from happening which >> should introduce enough determinism to fix the problem in practice. >> >> Regards, >> >> Anthony Liguori >> > > See above. IMO it's a different problem. Unlike Kemari, > I don't really see any drawbacks to stop all callbacks. > Do you? > I think it's going to be extremely difficult to get right and keep right. A bunch of code needs to be converted to make us safe and then becomes brittle as it's very simple to change things in such a way that we're broken again. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-15 16:09 ` Anthony Liguori 2010-11-15 20:26 ` Michael S. Tsirkin @ 2010-11-15 20:53 ` Stefan Hajnoczi 2010-11-15 21:05 ` Anthony Liguori 1 sibling, 1 reply; 15+ messages in thread From: Stefan Hajnoczi @ 2010-11-15 20:53 UTC (permalink / raw) To: Anthony Liguori; +Cc: Juan Quintela, qemu-devel, Michael S. Tsirkin On Mon, Nov 15, 2010 at 4:09 PM, Anthony Liguori <anthony@codemonkey.ws> wrote: > On 11/15/2010 09:18 AM, Michael S. Tsirkin wrote: >> >> On Mon, Nov 15, 2010 at 08:55:07AM -0600, Anthony Liguori wrote: >> >>> >>> On 11/15/2010 08:52 AM, Juan Quintela wrote: >>> >>>> >>>> "Michael S. Tsirkin"<mst@redhat.com> wrote: >>>> >>>>> >>>>> There's no reason for tap to run when VM is stopped. >>>>> If we let it, it confuses the bridge on TX >>>>> and corrupts DMA memory on RX. >>>>> >>>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com> >>>>> >>>> >>>> once here, what handlers make sense to run while stopped? >>>> /me can think of the normal console, non live migration, loadvm and not >>>> much more. Perhaps it is easier to just move the other way around? >>>> >>> >>> I'm not sure I concur that this is really a problem. >>> Semantically, I don't think that stop has to imply that the guest >>> memory no longer changes. >>> >>> Regards, >>> >>> Anthony Liguori >>> >>> >>>> >>>> Later, Juan. >>>> >> >> Well, I do not really know about vmstop that is not for migration. >> > > They are separate. For instance, we don't rely on stop to pause pending > disk I/O because we don't serialize pending disk I/O operations. Instead, > we flush all pending I/O and rely on the fact that disk I/O requests are > always submitted in the context of a vCPU operation. This assumption breaks > down though with ioeventfd so we need to revisit it. vhost has a solution for this: register a VMChangeStateHandler function that stops ioeventfd while vm_stop(0) is in effect. Then poll to see if there is work pending. I will add equivalent functionality to virtio-ioeventfd. Stefan ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-15 20:53 ` Stefan Hajnoczi @ 2010-11-15 21:05 ` Anthony Liguori 2010-11-15 22:44 ` Michael S. Tsirkin 0 siblings, 1 reply; 15+ messages in thread From: Anthony Liguori @ 2010-11-15 21:05 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Juan Quintela, qemu-devel, Michael S. Tsirkin On 11/15/2010 02:53 PM, Stefan Hajnoczi wrote: > > vhost has a solution for this: register a VMChangeStateHandler > function that stops ioeventfd while vm_stop(0) is in effect. Then > poll to see if there is work pending. > > I will add equivalent functionality to virtio-ioeventfd. > I still think that stopping this at the net/block layer is going to be a lot more robust in the long term. All net/block traffic flows through a single set of interfaces whereas getting the devices to completely stop themselves requires touching every device and making sure that noone adds back vmstop-unfriendly behavior down the road. Regards, Anthony Liguori > Stefan > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-15 21:05 ` Anthony Liguori @ 2010-11-15 22:44 ` Michael S. Tsirkin 2010-11-15 23:46 ` Anthony Liguori 0 siblings, 1 reply; 15+ messages in thread From: Michael S. Tsirkin @ 2010-11-15 22:44 UTC (permalink / raw) To: Anthony Liguori; +Cc: Stefan Hajnoczi, qemu-devel, Juan Quintela On Mon, Nov 15, 2010 at 03:05:39PM -0600, Anthony Liguori wrote: > On 11/15/2010 02:53 PM, Stefan Hajnoczi wrote: > > > >vhost has a solution for this: register a VMChangeStateHandler > >function that stops ioeventfd while vm_stop(0) is in effect. Then > >poll to see if there is work pending. > > > >I will add equivalent functionality to virtio-ioeventfd. > > I still think that stopping this at the net/block layer is going to > be a lot more robust in the long term. > All net/block traffic flows > through a single set of interfaces whereas getting the devices to > completely stop themselves requires touching every device and making > sure that noone adds back vmstop-unfriendly behavior down the road. > > Regards, > > Anthony Liguori So I'm not sure I understand what is proposed here. Care posting a patch? -- MST ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-15 22:44 ` Michael S. Tsirkin @ 2010-11-15 23:46 ` Anthony Liguori 0 siblings, 0 replies; 15+ messages in thread From: Anthony Liguori @ 2010-11-15 23:46 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Stefan Hajnoczi, qemu-devel, Juan Quintela On 11/15/2010 04:44 PM, Michael S. Tsirkin wrote: > So I'm not sure I understand what is proposed here. > Care posting a patch? > I *think* just adding: if (vm_running == 0) { return qemu_net_queue_append(queue, sender, flags, data, size, NULL); } To qemu_net_queue_send() along with a state notifier to kick the queue on start would do the trick. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] Re: [PATCHv2 2/2] tap: mark fd handler as device 2010-11-15 14:52 ` [Qemu-devel] " Juan Quintela 2010-11-15 14:55 ` Anthony Liguori @ 2010-11-15 15:21 ` Michael S. Tsirkin 1 sibling, 0 replies; 15+ messages in thread From: Michael S. Tsirkin @ 2010-11-15 15:21 UTC (permalink / raw) To: Juan Quintela; +Cc: qemu-devel On Mon, Nov 15, 2010 at 03:52:54PM +0100, Juan Quintela wrote: > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > There's no reason for tap to run when VM is stopped. > > If we let it, it confuses the bridge on TX > > and corrupts DMA memory on RX. > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > once here, what handlers make sense to run while stopped? > /me can think of the normal console, non live migration, loadvm and not > much more. Perhaps it is easier to just move the other way around? > > Later, Juan. vnc? SDL? Yes, more devices need to be stopped than not, but I tread carefully to avoid breaking existing functionality. If you could solve it for all devices in one swoop, that'd be great. I'm not up to it. -- MST ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2010-11-16 10:24 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-04 18:04 [Qemu-devel] [PATCHv2 0/2] migration: stop dma while VM is stopped Michael S. Tsirkin 2010-11-04 18:06 ` [Qemu-devel] [PATCHv2 1/2] char: separate device and system fd handlers Michael S. Tsirkin 2010-11-16 10:24 ` [Qemu-devel] " Juan Quintela 2010-11-04 18:06 ` [Qemu-devel] [PATCHv2 2/2] tap: mark fd handler as device Michael S. Tsirkin 2010-11-15 14:52 ` [Qemu-devel] " Juan Quintela 2010-11-15 14:55 ` Anthony Liguori 2010-11-15 15:18 ` Michael S. Tsirkin 2010-11-15 16:09 ` Anthony Liguori 2010-11-15 20:26 ` Michael S. Tsirkin 2010-11-15 20:37 ` Anthony Liguori 2010-11-15 20:53 ` Stefan Hajnoczi 2010-11-15 21:05 ` Anthony Liguori 2010-11-15 22:44 ` Michael S. Tsirkin 2010-11-15 23:46 ` Anthony Liguori 2010-11-15 15:21 ` Michael S. Tsirkin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).