From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
Date: Sat, 11 Jan 2014 00:44:43 +1100
Message-ID: <52CFF94B.9000301@ozlabs.ru>
In-Reply-To: <20140110124148.GE10700@redhat.com>

On 01/10/2014 11:41 PM, Michael S. Tsirkin wrote:
> On Fri, Jan 10, 2014 at 04:13:34PM +1100, Alexey Kardashevskiy wrote:
>> On 01/08/2014 12:18 AM, Alexey Kardashevskiy wrote:
>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>>
>>>>>>>>>>>> I am having a problem with virtio-net + vhost on POWER7 machine - it does
>>>>>>>>>>>> not survive reboot of the guest.
>>>>>>>>>>>>
>>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at all.
>>>>>>>>>>>>
>>>>>>>>>>>> The test is:
>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>
>>>>>>>>>>>> If I run tcpdump on the host's "tap-id3" interface, it shows no traffic
>>>>>>>>>>>> coming from the guest. Comparing behaviour before and after the reboot:
>>>>>>>>>>>> before, the guest sends an ARP request for 172.20.1.23 and receives the
>>>>>>>>>>>> response; after, it sends the same request but the answer never comes.
>>>>>>>>>>>
>>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>>>
>>>>>>>>>> Tried that and added a lot more debug printks myself; it is not clear at
>>>>>>>>>> all what is happening there.
>>>>>>>>>>
>>>>>>>>>> One more hint - if I boot the guest and the guest does not bring eth0 up
>>>>>>>>>> AND wait more than 200 seconds (and less than 210 seconds), then eth0 will
>>>>>>>>>> not work at all. I.e. this script produces not-working-eth0:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>>> sleep 210
>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>
>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to reproduce.
>>>>>>>>>>
>>>>>>>>>> No "vhost" == always works. The only difference I can see here is vhost's
>>>>>>>>>> thread which may get suspended if not used for a while after the start and
>>>>>>>>>> does not wake up but this is almost a blind guess.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yet another clue - this host kernel patch seems to help with the guest
>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>>                 work->queue_seq++;
>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>>         } else {
>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>         }
>>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>>  }
>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
>>>>>>>
>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
>>>>>>> happens to cause races.
>>>>>>>
>>>>>>>
>>>>>>>> Since it's all around startup,
>>>>>>>> you can try kicking the host eventfd in
>>>>>>>> vhost_net_start.
>>>>>>>
>>>>>>>
>>>>>>> How exactly? This did not help. Thanks.
>>>>>>>
>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>> index 006576d..407ecf2 100644
>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>>>>          if (r < 0) {
>>>>>>>              goto err;
>>>>>>>          }
>>>>>>> +
>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>>> +        struct vhost_vring_file file = {
>>>>>>> +            .index = i
>>>>>>> +        };
>>>>>>> +        file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>>
>>>>>> No, this sets the notifier, it does not kick.
>>>>>> To kick you write 1 there:
>>>>>> 	uint64_t v = 1;
>>>>>> 	write(fd, &v, sizeof v);
>>>>>
>>>>>
>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
>>>>
>>>> Yes.
>>>
>>> It turns out it is not. The control device in the host kernel does not
>>> implement write(), so it always fails.
>>>
>>> This works:
>>>
>>> uint64_t v = 1;
>>> int fd = event_notifier_get_fd(&vq->host_notifier);
>>> int r = write(fd, &v, sizeof v);
>>>
>>> By "works" I mean it helps to wake the whole thing up and the guest's eth0
>>> starts working after a 3-minute delay.
>>
>>
>>
>> I checked whether virtnet_napi_enable() is called as expected, and it is.
>> Since I can see "Receiving skb proto" in the guest's receive_buf(), I believe
>> the host->guest channel works just fine but the guest is unable to send
>> anything until QEMU writes to the event notifier (the code above).
>>
>> I actually spotted the problem in the host kernel - KVM_IOEVENTFD is called
>> with a PCI bus address but kvm_io_bus_write() is called with a guest
>> physical address and these things are different on PPC64/spapr.
>>
>> I am trying to make a patch for this and post it to some list tonight, I'll
>> put you in copy.
>>
> 
> Can we fix this in qemu?
> 
> We do:
>         memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
>                                   true, n, notifier);
> 
> I think as a result, KVM_IOEVENTFD should be called with guest physical address.


I fixed this in "[PATCH] KVM: fix addr type for KVM_IOEVENTFD", you are in
Cc. Heh. I suspected something ppc64-specific as the problem does not
appear on x86, and I posted another patch for PPC64 HV KVM, but the QEMU
bug is still nice :)
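
For anyone following along, the "kick" used as a workaround earlier in
this thread can be tried outside QEMU. Below is a minimal sketch, assuming
a plain Linux eventfd created with eventfd(2) stands in for the fd that
event_notifier_get_fd() returns in QEMU. The helper names (kick_eventfd,
drain_eventfd, demo_two_kicks, short_write_is_rejected) are made up for
illustration and are not QEMU or kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Kick: add 1 to the eventfd's 64-bit counter. Returns 0 on success. */
int kick_eventfd(int fd)
{
    uint64_t v = 1; /* eventfd requires an 8-byte value */
    ssize_t n = write(fd, &v, sizeof v);
    return n == (ssize_t)sizeof v ? 0 : -1;
}

/* Read the accumulated counter; in non-semaphore mode this resets it. */
int drain_eventfd(int fd, uint64_t *count)
{
    ssize_t n = read(fd, count, sizeof *count);
    return n == (ssize_t)sizeof *count ? 0 : -1;
}

/* Hypothetical demo: kick twice, return what a listener would then read.
 * Returns 0 if any step fails. */
uint64_t demo_two_kicks(void)
{
    uint64_t count = 0;
    int fd = eventfd(0, 0); /* stand-in for the host notifier fd */
    if (fd < 0)
        return 0;
    if (kick_eventfd(fd) != 0 || kick_eventfd(fd) != 0 ||
        drain_eventfd(fd, &count) != 0)
        count = 0;
    close(fd);
    return count;
}

/* Returns 1 if a write shorter than 8 bytes is rejected by the eventfd. */
int short_write_is_rejected(void)
{
    uint32_t v = 1; /* deliberately too small */
    int fd = eventfd(0, 0);
    ssize_t n;
    if (fd < 0)
        return 0;
    n = write(fd, &v, sizeof v);
    close(fd);
    return n < 0;
}
```

The point of the second helper is that the width of the written value
matters: an eventfd only accepts 8-byte writes, so the variable has to be
a uint64_t.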




-- 
Alexey
