All of lore.kernel.org
 help / color / mirror / Atom feed
From: zhanghailiang <zhang.zhanghailiang@huawei.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: peter.maydell@linaro.org, mst@redhat.com, luonengjun@huawei.com,
	peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
	stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH] net: Forbid dealing with packets when VM is not running
Date: Tue, 19 Aug 2014 14:46:26 +0800	[thread overview]
Message-ID: <53F2F2C2.5080904@huawei.com> (raw)
In-Reply-To: <20140818122732.GK2388@work-vm>

On 2014/8/18 20:27, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> For all NICs(except virtio-net) emulated by qemu,
>> Such as e1000, rtl8139, pcnet and ne2k_pci,
>> Qemu can still receive packets when VM is not running.
>> If this happened in *migration's* last PAUSE VM stage,
>> The new dirty RAM related to the packets will be missed,
>> And this will lead serious network fault in VM.
>
> I'd like to understand more about when exactly this happens.
> migration.c:migration_thread  in the last step, takes the iothread, puts
> the runstate into RUN_STATE_FINISH_MIGRATE and only then calls qemu_savevm_state_complete
> before unlocking the iothread.
>

Hmm, Sorry, the description above may not be exact.

Actually, the action of receiving packets action happens after the
migration thread unlock the iothread-lock(when VM is stopped), but
before the end of the migration(to be exact, before close the socket
fd, which is used to send data to destination).

I think before close the socket fd, we can not assure all the RAM of the
VM has been copied to the memory of network card or has been send to
the destination, we still have the chance to modify its content. (If i
am wrong, please let me know, Thanks. ;) )

If the above situation happens, it will dirty parts of the RAM which
will be send and parts of new dirty RAM page may be missed. As a result,
The contents of memory in the destination is not integral, And this
will lead network fault in VM.

Here i have made a test:
(1) Download the new qemu.
(2) Extend the time window between qemu_savevm_state_complete and
migrate_fd_cleanup(where qemu_fclose(s->file) will be called), patch
like(this will extend the chances of receiving packets between the time
window):
diff --git a/migration.c b/migration.c
index 8d675b3..597abf9 100644
--- a/migration.c
+++ b/migration.c
@@ -614,7 +614,7 @@ static void *migration_thread(void *opaque)
                      qemu_savevm_state_complete(s->file);
                  }
                  qemu_mutex_unlock_iothread();
-
+                sleep(2);/*extend the time window between stop vm and 
end migration*/
                  if (ret < 0) {
                      migrate_set_state(s, MIG_STATE_ACTIVE, 
MIG_STATE_ERROR);
                      break;
(3) Start VM (suse11sp1) and in VM ping xxx -i 0.1
(4) Migrate the VM to another host.

And the *PING* command in VM will very likely fail with message:
'Destination HOST Unreachable', the NIC in VM will stay unavailable
unless you run 'service network restart'.(without step 2, you may have
to migration lots of times between two hosts before that happens).

Thanks,
zhanghailiang

> If that's true, how does this problem happen - can the net code
> still receive packets with the iothread lock taken?
> qemu_savevm_state_complete does a call to the RAM code, so I think this should get
> any RAM changes that happened before the lock was taken.
>
> I'm gently worried about threading stuff as well - is all this net code
> running off fd handlers on the main thread or are there separate threads?
>
> Dave
>
>> To avoid this, we forbid receiving packets in generic net code when
>> VM is not running. Also, when the runstate changes back to running,
>> we definitely need to flush queues to get packets flowing again.
>>
>> Here we implement this in the net layer:
>> (1) Judge the vm runstate in qemu_can_send_packet
>> (2) Add a member 'VMChangeStateEntry *vmstate' to struct NICState,
>> Which will listen for VM runstate changes.
>> (3) Register a handler function for VMstate change.
>> When vm changes back to running, we flush all queues in the callback function.
>> (4) Remove checking vm state in virtio_net_can_receive
>>
>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>> ---
>>   hw/net/virtio-net.c |  4 ----
>>   include/net/net.h   |  2 ++
>>   net/net.c           | 32 ++++++++++++++++++++++++++++++++
>>   3 files changed, 34 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 268eff9..287d762 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -839,10 +839,6 @@ static int virtio_net_can_receive(NetClientState *nc)
>>       VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>       VirtIONetQueue *q = virtio_net_get_subqueue(nc);
>>
>> -    if (!vdev->vm_running) {
>> -        return 0;
>> -    }
>> -
>>       if (nc->queue_index>= n->curr_queues) {
>>           return 0;
>>       }
>> diff --git a/include/net/net.h b/include/net/net.h
>> index ed594f9..a294277 100644
>> --- a/include/net/net.h
>> +++ b/include/net/net.h
>> @@ -8,6 +8,7 @@
>>   #include "net/queue.h"
>>   #include "migration/vmstate.h"
>>   #include "qapi-types.h"
>> +#include "sysemu/sysemu.h"
>>
>>   #define MAX_QUEUE_NUM 1024
>>
>> @@ -96,6 +97,7 @@ typedef struct NICState {
>>       NICConf *conf;
>>       void *opaque;
>>       bool peer_deleted;
>> +    VMChangeStateEntry *vmstate;
>>   } NICState;
>>
>>   NetClientState *qemu_find_netdev(const char *id);
>> diff --git a/net/net.c b/net/net.c
>> index 6d930ea..21f0d48 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -242,6 +242,29 @@ NetClientState *qemu_new_net_client(NetClientInfo *info,
>>       return nc;
>>   }
>>
>> +static void nic_vmstate_change_handler(void *opaque,
>> +                                       int running,
>> +                                       RunState state)
>> +{
>> +    NICState *nic = opaque;
>> +    NetClientState *nc;
>> +    int i, queues;
>> +
>> +    if (!running) {
>> +        return;
>> +    }
>> +
>> +    queues = MAX(1, nic->conf->peers.queues);
>> +    for (i = 0; i<  queues; i++) {
>> +        nc =&nic->ncs[i];
>> +        if (nc->receive_disabled
>> +            || (nc->info->can_receive&&  !nc->info->can_receive(nc))) {
>> +            continue;
>> +        }
>> +        qemu_flush_queued_packets(nc);
>> +    }
>> +}
>> +
>>   NICState *qemu_new_nic(NetClientInfo *info,
>>                          NICConf *conf,
>>                          const char *model,
>> @@ -259,6 +282,8 @@ NICState *qemu_new_nic(NetClientInfo *info,
>>       nic->ncs = (void *)nic + info->size;
>>       nic->conf = conf;
>>       nic->opaque = opaque;
>> +    nic->vmstate = qemu_add_vm_change_state_handler(nic_vmstate_change_handler,
>> +                                                    nic);
>>
>>       for (i = 0; i<  queues; i++) {
>>           qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, name,
>> @@ -379,6 +404,7 @@ void qemu_del_nic(NICState *nic)
>>           qemu_free_net_client(nc);
>>       }
>>
>> +    qemu_del_vm_change_state_handler(nic->vmstate);
>>       g_free(nic);
>>   }
>>
>> @@ -452,6 +478,12 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
>>
>>   int qemu_can_send_packet(NetClientState *sender)
>>   {
>> +    int vmstat = runstate_is_running();
>> +
>> +    if (!vmstat) {
>> +        return 0;
>> +    }
>> +
>>       if (!sender->peer) {
>>           return 1;
>>       }
>> --
>> 1.7.12.4
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

  reply	other threads:[~2014-08-19  6:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-18  4:46 [Qemu-devel] [PATCH] net: Forbid dealing with packets when VM is not running zhanghailiang
2014-08-18  6:55 ` Jason Wang
2014-08-18  8:32   ` zhanghailiang
2014-08-18  9:14     ` Jason Wang
2014-08-20  1:59       ` zhanghailiang
2014-08-19 12:29     ` Stefan Hajnoczi
2014-08-20  2:19       ` zhanghailiang
2014-08-20  3:17       ` Jason Wang
2014-08-22 10:08         ` Stefan Hajnoczi
2014-08-18 12:27 ` Dr. David Alan Gilbert
2014-08-19  6:46   ` zhanghailiang [this message]
2014-08-19  8:48     ` Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53F2F2C2.5080904@huawei.com \
    --to=zhang.zhanghailiang@huawei.com \
    --cc=dgilbert@redhat.com \
    --cc=luonengjun@huawei.com \
    --cc=mst@redhat.com \
    --cc=peter.huangpeng@huawei.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.