netdev.vger.kernel.org archive mirror
From: Jason Wang <jasowang@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
	"Zhangjie (HZ)" <zhangjie14@huawei.com>
Cc: netdev@vger.kernel.org, qinchuanyu@huawei.com,
	liuyongan@huawei.com, davem@davemloft.net
Subject: Re: Query: Is it possible  to lose interrupts between vhost and virtio_net during migration?
Date: Fri, 01 Aug 2014 19:14:49 +0800
Message-ID: <53DB76A9.8010305@redhat.com>
In-Reply-To: <53DB705F.2000405@redhat.com>

On 08/01/2014 06:47 PM, Jason Wang wrote:
> On 07/31/2014 10:37 PM, Michael S. Tsirkin wrote:
>> On Thu, Jul 31, 2014 at 04:31:00PM +0200, Michael S. Tsirkin wrote:
>>> On Thu, Jul 31, 2014 at 07:47:24PM +0800, Zhangjie (HZ) wrote:
>>>> [The test scenario]:
>>>>
>>>> Migrating a VM back and forth between two hosts (A->B, B->A); after about 20 rounds, the network of the VM becomes unreachable.
>>>> There are 20 other VMs on each host, and they send IPv4, IPv6 and multicast packets to each other.
>>>> Sometimes the CPU idle of the host may be 0.
>>>>
>>>> [Problem description]:
>>>>
>>>> I wonder whether missing interrupts are what makes the network unreachable.
>>>> During KVM migration, the source end has to suspend, which includes the following steps:
>>>> 1. do_vm_stop->pause_all_vcpus
>>>> 2. vm_state_notify->vhost_net_stop->set_guest_notifiers->kvm_virtio_pci_vq_vector_release
>>>> 3. vm_state_notify->vhost_net_stop->vhost_net_stop_one->VHOST_NET_SET_BACKEND->vhost_net_flush_vq->vhost_work_flush
>>>> This may cause interrupts to be lost. Suppose virtqueue_notify() is called in virtio_net,
>>>> and then the VM is paused. If the eventfd of kvm is released just before the portio write is handled,
>>>> vhost cannot sense the notify, and the tx notify is lost.
>>>> On the other side, if the eventfd of kvm is released just after vhost_notify() and before eventfd_signal(), then the rx signal from vhost is lost.
>>>
>>> Could be a bug in userspace: it should clean up notifiers
>>> after it stops vhost.
>>>
>>> Could you please send this to appropriate mailing lists?
>>> I have a policy against off-list discussions.
>> Also, Jason, could you take a look please?
>> Looks like your patch a9f98bb5ebe6fb1869321dcc58e72041ae626ad8
>> changed the order of stopping the device.
>> Previously vhost_dev_stop would disable the backend and only afterwards
>> unset guest notifiers.  You now unset guest notifiers while vhost is still
>> active. Looks like this can lose events?
> Not sure it will really cause the issue: during guest notifier
> deassign, virtio_queue_set_guest_notifier_fd_handler() tests
> the notifier and triggers the callback if it is set. Looks like this can
> guarantee the interrupt is not lost.

On further thought, it looks like there is still a window between
disabling the guest notifiers and stopping vhost_net.
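
To make the window concrete, here is a toy Python model of the two stop
orderings. It is only a sketch under stated assumptions: GuestNotifier,
Backend and stop_device() are hypothetical names invented for illustration,
not QEMU or vhost code, and the model assumes that a signal raised after the
notifier has been deassigned simply has no listener.

```python
class GuestNotifier:
    """Hypothetical stand-in for an eventfd-backed guest notifier."""

    def __init__(self):
        self.assigned = True
        self.pending = 0    # signals raised but not yet handled
        self.delivered = 0  # interrupts that reached the guest
        self.lost = 0       # signals raised with nobody listening

    def signal(self):
        # Called by the backend when it has work to report to the guest.
        if self.assigned:
            self.pending += 1
        else:
            self.lost += 1

    def deassign(self):
        # Test the notifier and trigger the callback if set, then detach.
        if self.pending:
            self.delivered += self.pending
            self.pending = 0
        self.assigned = False


class Backend:
    """Hypothetical stand-in for the vhost backend."""

    def __init__(self, notifier):
        self.notifier = notifier
        self.running = True

    def complete_rx(self):
        # A stopped backend no longer signals the guest.
        if self.running:
            self.notifier.signal()

    def stop(self):
        self.running = False


def stop_device(notifiers_first):
    """Run one stop sequence and return (delivered, lost)."""
    notifier = GuestNotifier()
    backend = Backend(notifier)
    backend.complete_rx()      # event raised before the stop sequence
    if notifiers_first:
        notifier.deassign()    # pending event is flushed here
        backend.complete_rx()  # backend still active: falls in the window
        backend.stop()
    else:
        backend.stop()
        backend.complete_rx()  # no-op: backend already stopped
        notifier.deassign()    # pending event is flushed here
    return notifier.delivered, notifier.lost
```

In this model the test-and-trigger on deassign saves the event that was
already pending in both orderings; only an event raised between the deassign
and the backend stop is lost, and stopping the backend first closes that gap.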

Zhang Jie, please test a patch that changes the order and, if it works,
send a formal patch to qemu-devel.

btw, vhost_scsi may need the fix as well since it may hit the same issue.

Thanks

