From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: peter.maydell@linaro.org, mst@redhat.com, quintela@redhat.com,
luonengjun@huawei.com, qemu-devel@nongnu.org,
peter.huangpeng@huawei.com, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH] net: Forbid dealing with packets when VM is not running
Date: Tue, 19 Aug 2014 09:48:37 +0100 [thread overview]
Message-ID: <20140819084837.GC2383@work-vm> (raw)
In-Reply-To: <53F2F2C2.5080904@huawei.com>
* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2014/8/18 20:27, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>For all NICs(except virtio-net) emulated by qemu,
> >>Such as e1000, rtl8139, pcnet and ne2k_pci,
> >>Qemu can still receive packets when VM is not running.
> >>If this happened in *migration's* last PAUSE VM stage,
> >>The new dirty RAM related to the packets will be missed,
> >>And this will lead serious network fault in VM.
> >
> >I'd like to understand more about when exactly this happens.
> >migration.c:migration_thread in the last step, takes the iothread, puts
> >the runstate into RUN_STATE_FINISH_MIGRATE and only then calls qemu_savevm_state_complete
> >before unlocking the iothread.
> >
>
> Hmm, Sorry, the description above may not be exact.
>
> Actually, the action of receiving packets action happens after the
> migration thread unlock the iothread-lock(when VM is stopped), but
> before the end of the migration(to be exact, before close the socket
> fd, which is used to send data to destination).
>
> I think before close the socket fd, we can not assure all the RAM of the
> VM has been copied to the memory of network card or has been send to
> the destination, we still have the chance to modify its content. (If i
> am wrong, please let me know, Thanks. ;) )
>
> If the above situation happens, it will dirty parts of the RAM which
> will be send and parts of new dirty RAM page may be missed. As a result,
> The contents of memory in the destination is not integral, And this
> will lead network fault in VM.
Interesting; the only reason I can think that would happen, is because
arch_init.c:ram_save_page uses qemu_put_buffer_async to send most of the RAM pages
and that won't make sure the write happens until later.
This fix will probably help other migration code as well; postcopy
runs the migration for a lot longer after stopping the CPUs, and I am seeing
something that might be due to this very occasionally.
Dave
> Here i have made a test:
> (1) Download the new qemu.
> (2) Extend the time window between qemu_savevm_state_complete and
> migrate_fd_cleanup(where qemu_fclose(s->file) will be called), patch
> like(this will extend the chances of receiving packets between the time
> window):
> diff --git a/migration.c b/migration.c
> index 8d675b3..597abf9 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -614,7 +614,7 @@ static void *migration_thread(void *opaque)
> qemu_savevm_state_complete(s->file);
> }
> qemu_mutex_unlock_iothread();
> -
> + sleep(2);/*extend the time window between stop vm and end
> migration*/
> if (ret < 0) {
> migrate_set_state(s, MIG_STATE_ACTIVE,
> MIG_STATE_ERROR);
> break;
> (3) Start VM (suse11sp1) and in VM ping xxx -i 0.1
> (4) Migrate the VM to another host.
>
> And the *PING* command in VM will very likely fail with message:
> 'Destination HOST Unreachable', the NIC in VM will stay unavailable
> unless you run 'service network restart'.(without step 2, you may have
> to migration lots of times between two hosts before that happens).
>
> Thanks,
> zhanghailiang
>
> >If that's true, how does this problem happen - can the net code
> >still receive packets with the iothread lock taken?
> >qemu_savevm_state_complete does a call to the RAM code, so I think this should get
> >any RAM changes that happened before the lock was taken.
> >
> >I'm gently worried about threading stuff as well - is all this net code
> >running off fd handlers on the main thread or are there separate threads?
> >
> >Dave
> >
> >>To avoid this, we forbid receiving packets in generic net code when
> >>VM is not running. Also, when the runstate changes back to running,
> >>we definitely need to flush queues to get packets flowing again.
> >>
> >>Here we implement this in the net layer:
> >>(1) Judge the vm runstate in qemu_can_send_packet
> >>(2) Add a member 'VMChangeStateEntry *vmstate' to struct NICState,
> >>Which will listen for VM runstate changes.
> >>(3) Register a handler function for VMstate change.
> >>When vm changes back to running, we flush all queues in the callback function.
> >>(4) Remove checking vm state in virtio_net_can_receive
> >>
> >>Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
> >>---
> >> hw/net/virtio-net.c | 4 ----
> >> include/net/net.h | 2 ++
> >> net/net.c | 32 ++++++++++++++++++++++++++++++++
> >> 3 files changed, 34 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>index 268eff9..287d762 100644
> >>--- a/hw/net/virtio-net.c
> >>+++ b/hw/net/virtio-net.c
> >>@@ -839,10 +839,6 @@ static int virtio_net_can_receive(NetClientState *nc)
> >> VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >> VirtIONetQueue *q = virtio_net_get_subqueue(nc);
> >>
> >>- if (!vdev->vm_running) {
> >>- return 0;
> >>- }
> >>-
> >> if (nc->queue_index>= n->curr_queues) {
> >> return 0;
> >> }
> >>diff --git a/include/net/net.h b/include/net/net.h
> >>index ed594f9..a294277 100644
> >>--- a/include/net/net.h
> >>+++ b/include/net/net.h
> >>@@ -8,6 +8,7 @@
> >> #include "net/queue.h"
> >> #include "migration/vmstate.h"
> >> #include "qapi-types.h"
> >>+#include "sysemu/sysemu.h"
> >>
> >> #define MAX_QUEUE_NUM 1024
> >>
> >>@@ -96,6 +97,7 @@ typedef struct NICState {
> >> NICConf *conf;
> >> void *opaque;
> >> bool peer_deleted;
> >>+ VMChangeStateEntry *vmstate;
> >> } NICState;
> >>
> >> NetClientState *qemu_find_netdev(const char *id);
> >>diff --git a/net/net.c b/net/net.c
> >>index 6d930ea..21f0d48 100644
> >>--- a/net/net.c
> >>+++ b/net/net.c
> >>@@ -242,6 +242,29 @@ NetClientState *qemu_new_net_client(NetClientInfo *info,
> >> return nc;
> >> }
> >>
> >>+static void nic_vmstate_change_handler(void *opaque,
> >>+ int running,
> >>+ RunState state)
> >>+{
> >>+ NICState *nic = opaque;
> >>+ NetClientState *nc;
> >>+ int i, queues;
> >>+
> >>+ if (!running) {
> >>+ return;
> >>+ }
> >>+
> >>+ queues = MAX(1, nic->conf->peers.queues);
> >>+ for (i = 0; i< queues; i++) {
> >>+ nc =&nic->ncs[i];
> >>+ if (nc->receive_disabled
> >>+ || (nc->info->can_receive&& !nc->info->can_receive(nc))) {
> >>+ continue;
> >>+ }
> >>+ qemu_flush_queued_packets(nc);
> >>+ }
> >>+}
> >>+
> >> NICState *qemu_new_nic(NetClientInfo *info,
> >> NICConf *conf,
> >> const char *model,
> >>@@ -259,6 +282,8 @@ NICState *qemu_new_nic(NetClientInfo *info,
> >> nic->ncs = (void *)nic + info->size;
> >> nic->conf = conf;
> >> nic->opaque = opaque;
> >>+ nic->vmstate = qemu_add_vm_change_state_handler(nic_vmstate_change_handler,
> >>+ nic);
> >>
> >> for (i = 0; i< queues; i++) {
> >> qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, name,
> >>@@ -379,6 +404,7 @@ void qemu_del_nic(NICState *nic)
> >> qemu_free_net_client(nc);
> >> }
> >>
> >>+ qemu_del_vm_change_state_handler(nic->vmstate);
> >> g_free(nic);
> >> }
> >>
> >>@@ -452,6 +478,12 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
> >>
> >> int qemu_can_send_packet(NetClientState *sender)
> >> {
> >>+ int vmstat = runstate_is_running();
> >>+
> >>+ if (!vmstat) {
> >>+ return 0;
> >>+ }
> >>+
> >> if (!sender->peer) {
> >> return 1;
> >> }
> >>--
> >>1.7.12.4
> >>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
>
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
prev parent reply other threads:[~2014-08-19 8:50 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-18 4:46 [Qemu-devel] [PATCH] net: Forbid dealing with packets when VM is not running zhanghailiang
2014-08-18 6:55 ` Jason Wang
2014-08-18 8:32 ` zhanghailiang
2014-08-18 9:14 ` Jason Wang
2014-08-20 1:59 ` zhanghailiang
2014-08-19 12:29 ` Stefan Hajnoczi
2014-08-20 2:19 ` zhanghailiang
2014-08-20 3:17 ` Jason Wang
2014-08-22 10:08 ` Stefan Hajnoczi
2014-08-18 12:27 ` Dr. David Alan Gilbert
2014-08-19 6:46 ` zhanghailiang
2014-08-19 8:48 ` Dr. David Alan Gilbert [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140819084837.GC2383@work-vm \
--to=dgilbert@redhat.com \
--cc=luonengjun@huawei.com \
--cc=mst@redhat.com \
--cc=peter.huangpeng@huawei.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=stefanha@redhat.com \
--cc=zhang.zhanghailiang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.