[Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
@ 2016-03-30 15:14 Ilya Maximets
  2016-03-30 17:01 ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Ilya Maximets @ 2016-03-30 15:14 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin; +Cc: Ilya Maximets, Jason Wang, Dyasly Sergey

Currently QEMU always crashes in following scenario (assume that
vhost-user application is Open vSwitch with 'dpdkvhostuser' port):

1. # Check that link in guest is in a normal state.
   [guest]# ip link show eth0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
        link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

2. # Kill vhost-user application (using SIGSEGV just to be sure).
   [host]# kill -11 `pgrep ovs-vswitchd`

3. # Check that guest still thinks that all is good.
   [guest]# ip link show eth0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
        link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

4. # Try to unbind virtio-pci driver and observe QEMU crash.
   [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
    qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
    qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
    qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.

    Child terminated with signal = 0xb (SIGSEGV)
    GDBserver exiting

After the applying of this patch-set:

4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
   [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
    qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
    qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.

5. # Bind virtio-pci driver back.
   [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind

6. # Check link in guest. No crashes here, link in DOWN state.
   [guest]# ip link show eth0
    7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
        link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

7. QEMU may be gracefully restarted to restore communication after restarting
   of vhost-user application.

Ilya Maximets (4):
  vhost-user: fix crash on socket disconnect.
  vhost: prevent double stop of vhost_net device.
  vhost: check for vhost_net device validity.
  net: notify about link status only if it changed.

 hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
 hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
 include/hw/virtio/virtio-net.h |  1 +
 include/net/vhost-user.h       |  1 +
 include/net/vhost_net.h        |  1 +
 net/filter.c                   |  1 +
 net/net.c                      |  7 +++---
 net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
 8 files changed, 111 insertions(+), 24 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-03-30 15:14 Ilya Maximets
@ 2016-03-30 17:01 ` Michael S. Tsirkin
  2016-03-31  6:02   ` Ilya Maximets
  0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2016-03-30 17:01 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: Jason Wang, qemu-devel, Dyasly Sergey

On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> Currently QEMU always crashes in following scenario (assume that
> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):

In fact, wouldn't the right thing to do be stopping the VM?

> 1. # Check that link in guest is in a normal state.
>    [guest]# ip link show eth0
>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> 
> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
>    [host]# kill -11 `pgrep ovs-vswitchd`
> 
> 3. # Check that guest still thinks that all is good.
>    [guest]# ip link show eth0
>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> 
> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> 
>     Child terminated with signal = 0xb (SIGSEGV)
>     GDBserver exiting
> 
> After the applying of this patch-set:
> 
> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> 
> 5. # Bind virtio-pci driver back.
>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
> 
> 6. # Check link in guest. No crashes here, link in DOWN state.
>    [guest]# ip link show eth0
>     7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> 
> 7. QEMU may be gracefully restarted to restore communication after restarting
>    of vhost-user application.
> 
> Ilya Maximets (4):
>   vhost-user: fix crash on socket disconnect.
>   vhost: prevent double stop of vhost_net device.
>   vhost: check for vhost_net device validity.
>   net: notify about link status only if it changed.
> 
>  hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
>  include/hw/virtio/virtio-net.h |  1 +
>  include/net/vhost-user.h       |  1 +
>  include/net/vhost_net.h        |  1 +
>  net/filter.c                   |  1 +
>  net/net.c                      |  7 +++---
>  net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
>  8 files changed, 111 insertions(+), 24 deletions(-)
> 
> -- 
> 2.5.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-03-30 17:01 ` Michael S. Tsirkin
@ 2016-03-31  6:02   ` Ilya Maximets
  2016-03-31  9:21     ` Michael S. Tsirkin
  2016-04-05 10:46     ` Michael S. Tsirkin
  0 siblings, 2 replies; 10+ messages in thread
From: Ilya Maximets @ 2016-03-31  6:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel, Dyasly Sergey

On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
>> Currently QEMU always crashes in following scenario (assume that
>> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> 
> In fact, wouldn't the right thing to do be stopping the VM?

I don't think that is reasonable to stop VM on failure of just one of
network interfaces. There may be still working vhost-kernel interfaces.
Even connection can still be established if vhost-user interface was in
bonding with kernel interface.
Anyway user should be able to save all his data before QEMU restart.

> 
>> 1. # Check that link in guest is in a normal state.
>>    [guest]# ip link show eth0
>>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>
>> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
>>    [host]# kill -11 `pgrep ovs-vswitchd`
>>
>> 3. # Check that guest still thinks that all is good.
>>    [guest]# ip link show eth0
>>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>
>> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>
>>     Child terminated with signal = 0xb (SIGSEGV)
>>     GDBserver exiting
>>
>> After the applying of this patch-set:
>>
>> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>
>> 5. # Bind virtio-pci driver back.
>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
>>
>> 6. # Check link in guest. No crashes here, link in DOWN state.
>>    [guest]# ip link show eth0
>>     7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>
>> 7. QEMU may be gracefully restarted to restore communication after restarting
>>    of vhost-user application.
>>
>> Ilya Maximets (4):
>>   vhost-user: fix crash on socket disconnect.
>>   vhost: prevent double stop of vhost_net device.
>>   vhost: check for vhost_net device validity.
>>   net: notify about link status only if it changed.
>>
>>  hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
>>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
>>  include/hw/virtio/virtio-net.h |  1 +
>>  include/net/vhost-user.h       |  1 +
>>  include/net/vhost_net.h        |  1 +
>>  net/filter.c                   |  1 +
>>  net/net.c                      |  7 +++---
>>  net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
>>  8 files changed, 111 insertions(+), 24 deletions(-)
>>
>> -- 
>> 2.5.0
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-03-31  6:02   ` Ilya Maximets
@ 2016-03-31  9:21     ` Michael S. Tsirkin
  2016-03-31 10:48       ` Ilya Maximets
  2016-04-05 10:46     ` Michael S. Tsirkin
  1 sibling, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2016-03-31  9:21 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: Jason Wang, qemu-devel, Dyasly Sergey

On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> >> Currently QEMU always crashes in following scenario (assume that
> >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > 
> > In fact, wouldn't the right thing to do be stopping the VM?
> 
> I don't think that is reasonable to stop VM on failure of just one of
> network interfaces. There may be still working vhost-kernel interfaces.
> Even connection can still be established if vhost-user interface was in
> bonding with kernel interface.
> Anyway user should be able to save all his data before QEMU restart.


Unfrotunately some guests are known to crash if we defer
using available buffers indefinitely.

If you see bugs let's fix them, but recovering would be
a new feature, let's defer it until after 2.6.

> > 
> >> 1. # Check that link in guest is in a normal state.
> >>    [guest]# ip link show eth0
> >>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
> >>    [host]# kill -11 `pgrep ovs-vswitchd`
> >>
> >> 3. # Check that guest still thinks that all is good.
> >>    [guest]# ip link show eth0
> >>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>
> >>     Child terminated with signal = 0xb (SIGSEGV)
> >>     GDBserver exiting
> >>
> >> After the applying of this patch-set:
> >>
> >> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>
> >> 5. # Bind virtio-pci driver back.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
> >>
> >> 6. # Check link in guest. No crashes here, link in DOWN state.
> >>    [guest]# ip link show eth0
> >>     7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 7. QEMU may be gracefully restarted to restore communication after restarting
> >>    of vhost-user application.
> >>
> >> Ilya Maximets (4):
> >>   vhost-user: fix crash on socket disconnect.
> >>   vhost: prevent double stop of vhost_net device.
> >>   vhost: check for vhost_net device validity.
> >>   net: notify about link status only if it changed.
> >>
> >>  hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
> >>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
> >>  include/hw/virtio/virtio-net.h |  1 +
> >>  include/net/vhost-user.h       |  1 +
> >>  include/net/vhost_net.h        |  1 +
> >>  net/filter.c                   |  1 +
> >>  net/net.c                      |  7 +++---
> >>  net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
> >>  8 files changed, 111 insertions(+), 24 deletions(-)
> >>
> >> -- 
> >> 2.5.0
> > 
> > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-03-31  9:21     ` Michael S. Tsirkin
@ 2016-03-31 10:48       ` Ilya Maximets
  0 siblings, 0 replies; 10+ messages in thread
From: Ilya Maximets @ 2016-03-31 10:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel, Dyasly Sergey

On 31.03.2016 12:21, Michael S. Tsirkin wrote:
> On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
>> On 30.03.2016 20:01, Michael S. Tsirkin wrote:
>>> On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
>>>> Currently QEMU always crashes in following scenario (assume that
>>>> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
>>>
>>> In fact, wouldn't the right thing to do be stopping the VM?
>>
>> I don't think that is reasonable to stop VM on failure of just one of
>> network interfaces. There may be still working vhost-kernel interfaces.
>> Even connection can still be established if vhost-user interface was in
>> bonding with kernel interface.
>> Anyway user should be able to save all his data before QEMU restart.
> 
> 
> Unfrotunately some guests are known to crash if we defer
> using available buffers indefinitely.

What exactly do you mean?

Anyway here we have 'crash of some guests' vs. '100% constant QEMU crash'.

> 
> If you see bugs let's fix them, but recovering would be
> a new feature, let's defer it until after 2.6.

The segmentation fault is a bug. This series *doesn't introduce any recovering*
features. Disconnected network interface in guest will never work again.
I just tried to avoid segmentation fault in this situation. There will
be no difference for guest between QEMU with this patches and without them.
There just will not be any crashes. 

> 
>>>
>>>> 1. # Check that link in guest is in a normal state.
>>>>    [guest]# ip link show eth0
>>>>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>>>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>>>
>>>> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
>>>>    [host]# kill -11 `pgrep ovs-vswitchd`
>>>>
>>>> 3. # Check that guest still thinks that all is good.
>>>>    [guest]# ip link show eth0
>>>>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>>>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>>>
>>>> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
>>>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>>>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>>>
>>>>     Child terminated with signal = 0xb (SIGSEGV)
>>>>     GDBserver exiting
>>>>
>>>> After the applying of this patch-set:
>>>>
>>>> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
>>>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>>>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>>>
>>>> 5. # Bind virtio-pci driver back.
>>>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
>>>>
>>>> 6. # Check link in guest. No crashes here, link in DOWN state.
>>>>    [guest]# ip link show eth0
>>>>     7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
>>>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>>>
>>>> 7. QEMU may be gracefully restarted to restore communication after restarting
>>>>    of vhost-user application.
>>>>
>>>> Ilya Maximets (4):
>>>>   vhost-user: fix crash on socket disconnect.
>>>>   vhost: prevent double stop of vhost_net device.
>>>>   vhost: check for vhost_net device validity.
>>>>   net: notify about link status only if it changed.
>>>>
>>>>  hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
>>>>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
>>>>  include/hw/virtio/virtio-net.h |  1 +
>>>>  include/net/vhost-user.h       |  1 +
>>>>  include/net/vhost_net.h        |  1 +
>>>>  net/filter.c                   |  1 +
>>>>  net/net.c                      |  7 +++---
>>>>  net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
>>>>  8 files changed, 111 insertions(+), 24 deletions(-)
>>>>
>>>> -- 
>>>> 2.5.0
>>>
>>>
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-03-31  6:02   ` Ilya Maximets
  2016-03-31  9:21     ` Michael S. Tsirkin
@ 2016-04-05 10:46     ` Michael S. Tsirkin
  1 sibling, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2016-04-05 10:46 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: Jason Wang, qemu-devel, Dyasly Sergey

On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> >> Currently QEMU always crashes in following scenario (assume that
> >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > 
> > In fact, wouldn't the right thing to do be stopping the VM?
> 
> I don't think that is reasonable to stop VM on failure of just one of
> network interfaces.

We don't start QEMU until vhost user connects, either.
Making guest run properly with a backend is a much bigger
project, let's tackle this separately.

Also, I think handling graceful disconnect is the correct first step.
Handling misbehaving clients is much harder as we have asserts on remote
obeying protocol rules all over the place.

> There may be still working vhost-kernel interfaces.
> Even connection can still be established if vhost-user interface was in
> bonding with kernel interface.

Could not parse this.

> Anyway user should be able to save all his data before QEMU restart.

So reconnect a new backend, and VM will keep going.

> > 
> >> 1. # Check that link in guest is in a normal state.
> >>    [guest]# ip link show eth0
> >>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
> >>    [host]# kill -11 `pgrep ovs-vswitchd`
> >>
> >> 3. # Check that guest still thinks that all is good.
> >>    [guest]# ip link show eth0
> >>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>
> >>     Child terminated with signal = 0xb (SIGSEGV)
> >>     GDBserver exiting
> >>
> >> After the applying of this patch-set:
> >>
> >> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>
> >> 5. # Bind virtio-pci driver back.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
> >>
> >> 6. # Check link in guest. No crashes here, link in DOWN state.


We already have interfaces to control link state.
I don't think we should necessarily tie link state to backend
state. Could be an option but probably not the default.

> >>    [guest]# ip link show eth0
> >>     7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 7. QEMU may be gracefully restarted to restore communication after restarting
> >>    of vhost-user application.
> >>
> >> Ilya Maximets (4):
> >>   vhost-user: fix crash on socket disconnect.
> >>   vhost: prevent double stop of vhost_net device.
> >>   vhost: check for vhost_net device validity.
> >>   net: notify about link status only if it changed.
> >>
> >>  hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
> >>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
> >>  include/hw/virtio/virtio-net.h |  1 +
> >>  include/net/vhost-user.h       |  1 +
> >>  include/net/vhost_net.h        |  1 +
> >>  net/filter.c                   |  1 +
> >>  net/net.c                      |  7 +++---
> >>  net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
> >>  8 files changed, 111 insertions(+), 24 deletions(-)
> >>
> >> -- 
> >> 2.5.0
> > 
> > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
@ 2016-04-06 23:52 Ilya Maximets
  2016-04-07  7:01 ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Ilya Maximets @ 2016-04-06 23:52 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel@nongnu.org, Sergey Dyasly

------- Original Message -------
Sender : Michael S. Tsirkin<mst@redhat.com>
Date : Apr 05, 2016 13:46 (GMT+03:00)
Title : Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.

> On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> > On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> > >> Currently QEMU always crashes in following scenario (assume that
> > >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > > 
> > > In fact, wouldn't the right thing to do be stopping the VM?
> > 
> > I don't think that is reasonable to stop VM on failure of just one of
> > network interfaces.
> 
> We don't start QEMU until vhost user connects, either.
> Making guest run properly with a backend is a much bigger
> project, let's tackle this separately.
> 
> Also, I think handling graceful disconnect is the correct first step.
> Handling misbehaving clients is much harder as we have asserts on remote
> obeying protocol rules all over the place.
> 
> > There may be still working vhost-kernel interfaces.
> > Even connection can still be established if vhost-user interface was in
> > bonding with kernel interface.
> 
> Could not parse this.
>
> > Anyway user should be able to save all his data before QEMU restart.
>
> So reconnect a new backend, and VM will keep going.

We cant't do this because of 2 reasons:
1. Segmentation fault of QEMU on disconnect. (fixed by this patch-set)
2. There is no reconnect functionality in current QEMU version.

So, what are you talking about?

Best regards, Ilya Maximets.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-04-06 23:52 Ilya Maximets
@ 2016-04-07  7:01 ` Michael S. Tsirkin
  0 siblings, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2016-04-07  7:01 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: Jason Wang, qemu-devel@nongnu.org, Sergey Dyasly

On Wed, Apr 06, 2016 at 11:52:56PM +0000, Ilya Maximets wrote:
> ------- Original Message -------
> Sender : Michael S. Tsirkin<mst@redhat.com>
> Date : Apr 05, 2016 13:46 (GMT+03:00)
> Title : Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
> 
> > On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> > > On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> > > >> Currently QEMU always crashes in following scenario (assume that
> > > >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > > > 
> > > > In fact, wouldn't the right thing to do be stopping the VM?
> > > 
> > > I don't think that is reasonable to stop VM on failure of just one of
> > > network interfaces.
> > 
> > We don't start QEMU until vhost user connects, either.
> > Making guest run properly with a backend is a much bigger
> > project, let's tackle this separately.
> > 
> > Also, I think handling graceful disconnect is the correct first step.
> > Handling misbehaving clients is much harder as we have asserts on remote
> > obeying protocol rules all over the place.
> > 
> > > There may be still working vhost-kernel interfaces.
> > > Even connection can still be established if vhost-user interface was in
> > > bonding with kernel interface.
> > 
> > Could not parse this.
> >
> > > Anyway user should be able to save all his data before QEMU restart.
> >
> > So reconnect a new backend, and VM will keep going.
> 
> We cant't do this because of 2 reasons:
> 1. Segmentation fault of QEMU on disconnect. (fixed by this patch-set)
> 2. There is no reconnect functionality in current QEMU version.
> 
> So, what are you talking about?
> 
> Best regards, Ilya Maximets.

One can currently disconnect vhost user clients without killing
guests using migration:
- save vm
- quit qemu
- start new qemu
- load vm

Or using hotplug
- request unplug
- wait for guest eject
- create new device

I would like to make sure we do not create an expectation that guests
keep going unconditionally with device present even with backend
disconnected. Unfortunately your patchset might create such
expectation.

-- 
MST

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
@ 2016-04-07 11:09 Ilya Maximets
  2016-04-07 12:25 ` Michael S. Tsirkin
  0 siblings, 1 reply; 10+ messages in thread
From: Ilya Maximets @ 2016-04-07 11:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel@nongnu.org, Sergey Dyasly

> ------- Original Message -------
> Sender : Michael S. Tsirkin<mst@redhat.com>
> Date : Apr 07, 2016 10:01 (GMT+03:00)
> Title : Re: Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
> 
> On Wed, Apr 06, 2016 at 11:52:56PM +0000, Ilya Maximets wrote:
> > ------- Original Message -------
> > Sender : Michael S. Tsirkin
> > Date : Apr 05, 2016 13:46 (GMT+03:00)
> > Title : Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
> > 
> > > On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> > > > On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > > > > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> > > > >> Currently QEMU always crashes in following scenario (assume that
> > > > >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > > > > 
> > > > > In fact, wouldn't the right thing to do be stopping the VM?
> > > > 
> > > > I don't think that is reasonable to stop VM on failure of just one of
> > > > network interfaces.
> > > 
> > > We don't start QEMU until vhost user connects, either.
> > > Making guest run properly with a backend is a much bigger
> > > project, let's tackle this separately.
> > > 
> > > Also, I think handling graceful disconnect is the correct first step.
> > > Handling misbehaving clients is much harder as we have asserts on remote
> > > obeying protocol rules all over the place.
> > > 
> > > > There may be still working vhost-kernel interfaces.
> > > > Even connection can still be established if vhost-user interface was in
> > > > bonding with kernel interface.
> > > 
> > > Could not parse this.
> > >
> > > > Anyway user should be able to save all his data before QEMU restart.
> > >
> > > So reconnect a new backend, and VM will keep going.
> > 
> > We cant't do this because of 2 reasons:
> > 1. Segmentation fault of QEMU on disconnect. (fixed by this patch-set)
> > 2. There is no reconnect functionality in current QEMU version.
> > 
> > So, what are you talking about?
> > 
> > Best regards, Ilya Maximets.
> 
> One can currently disconnect vhost user clients without killing
> guests using migration:
> - save vm
> - quit qemu
> - start new qemu
> - load vm
>
> Or using hotplug
> - request unplug
> - wait for guest eject
> - create new device

This may be fixes second reason mentioned above, but in my scenario (described in cover-letter)
guset will know that backend disconnected only after segmentation fault. So, there will be
no opportunity to save or unplug machine.

> I would like to make sure we do not create an expectation that guests
> keep going unconditionally with device present even with backend
>disconnected. Unfortunately your patchset might create such
> expectation.

It's already so, because guest will never know about crach of backend untill it
tries to communicate via socket.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-04-07 11:09 [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect Ilya Maximets
@ 2016-04-07 12:25 ` Michael S. Tsirkin
  0 siblings, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2016-04-07 12:25 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: Jason Wang, qemu-devel@nongnu.org, Sergey Dyasly

On Thu, Apr 07, 2016 at 11:09:48AM +0000, Ilya Maximets wrote:
> > ------- Original Message -------
> > Sender : Michael S. Tsirkin<mst@redhat.com>
> > Date : Apr 07, 2016 10:01 (GMT+03:00)
> > Title : Re: Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
> > 
> > On Wed, Apr 06, 2016 at 11:52:56PM +0000, Ilya Maximets wrote:
> > > ------- Original Message -------
> > > Sender : Michael S. Tsirkin
> > > Date : Apr 05, 2016 13:46 (GMT+03:00)
> > > Title : Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
> > > 
> > > > On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> > > > > On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > > > > > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> > > > > >> Currently QEMU always crashes in following scenario (assume that
> > > > > >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > > > > > 
> > > > > > In fact, wouldn't the right thing to do be stopping the VM?
> > > > > 
> > > > > I don't think that is reasonable to stop VM on failure of just one of
> > > > > network interfaces.
> > > > 
> > > > We don't start QEMU until vhost user connects, either.
> > > > Making guest run properly with a backend is a much bigger
> > > > project, let's tackle this separately.
> > > > 
> > > > Also, I think handling graceful disconnect is the correct first step.
> > > > Handling misbehaving clients is much harder as we have asserts on remote
> > > > obeying protocol rules all over the place.
> > > > 
> > > > > There may be still working vhost-kernel interfaces.
> > > > > Even connection can still be established if vhost-user interface was in
> > > > > bonding with kernel interface.
> > > > 
> > > > Could not parse this.
> > > >
> > > > > Anyway user should be able to save all his data before QEMU restart.
> > > >
> > > > So reconnect a new backend, and VM will keep going.
> > > 
> > > We cant't do this because of 2 reasons:
> > > 1. Segmentation fault of QEMU on disconnect. (fixed by this patch-set)
> > > 2. There is no reconnect functionality in current QEMU version.
> > > 
> > > So, what are you talking about?
> > > 
> > > Best regards, Ilya Maximets.
> > 
> > One can currently disconnect vhost user clients without killing
> > guests using migration:
> > - save vm
> > - quit qemu
> > - start new qemu
> > - load vm
> >
> > Or using hotplug
> > - request unplug
> > - wait for guest eject
> > - create new device
> 
> This may be fixes second reason mentioned above, but in my scenario (described in cover-letter)
> guset will know that backend disconnected only after segmentation fault. So, there will be
> no opportunity to save or unplug machine.

In cover letter you describe killing vhost user client.
So don't do it then.

> > I would like to make sure we do not create an expectation that guests
> > keep going unconditionally with device present even with backend
> >disconnected. Unfortunately your patchset might create such
> > expectation.
> 
> It's already so, because guest will never know about crach of backend untill it
> tries to communicate via socket.

I agree the current handling isn't good. I am just reluctant to
stick even more stuff in there before 2.6. We need a plan going
forward. I'll think it over during the weekend.

-- 
MST

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-04-07 12:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-07 11:09 [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect Ilya Maximets
2016-04-07 12:25 ` Michael S. Tsirkin
  -- strict thread matches above, loose matches on Subject: below --
2016-04-06 23:52 Ilya Maximets
2016-04-07  7:01 ` Michael S. Tsirkin
2016-03-30 15:14 Ilya Maximets
2016-03-30 17:01 ` Michael S. Tsirkin
2016-03-31  6:02   ` Ilya Maximets
2016-03-31  9:21     ` Michael S. Tsirkin
2016-03-31 10:48       ` Ilya Maximets
2016-04-05 10:46     ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).