* [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
@ 2016-03-30 15:14 Ilya Maximets
  2016-03-30 15:14 ` [Qemu-devel] [PATCH 1/4] vhost-user: fix crash on " Ilya Maximets
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Ilya Maximets @ 2016-03-30 15:14 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin; +Cc: Ilya Maximets, Jason Wang, Dyasly Sergey

Currently QEMU always crashes in following scenario (assume that
vhost-user application is Open vSwitch with 'dpdkvhostuser' port):

1. # Check that link in guest is in a normal state.
   [guest]# ip link show eth0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
        link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

2. # Kill vhost-user application (using SIGSEGV just to be sure).
   [host]# kill -11 `pgrep ovs-vswitchd`

3. # Check that guest still thinks that all is good.
   [guest]# ip link show eth0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
        link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

4. # Try to unbind virtio-pci driver and observe QEMU crash.
   [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
    qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
    qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
    qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.

    Child terminated with signal = 0xb (SIGSEGV)
    GDBserver exiting

After applying this patch-set:

4. # Try to unbind virtio-pci driver. Unbind works fine, with only a few errors.
   [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
    qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
    qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.

5. # Bind virtio-pci driver back.
   [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind

6. # Check link in guest. No crashes here, link in DOWN state.
   [guest]# ip link show eth0
    7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
        link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

7. QEMU may be gracefully restarted to restore communication after the
   vhost-user application has been restarted.
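
For reference: the "Child terminated with signal" / "GDBserver exiting" lines
in the transcripts above come from running QEMU under gdbserver. One way to
capture such backtraces (the port and binary path below are illustrative, not
taken from the setup actually used):

   # Attach gdbserver to the running QEMU instance.
   [host]# gdbserver --attach :1234 `pgrep -f qemu-system-x86_64`

   # In another terminal, connect gdb and let QEMU run until it faults.
   [host]# gdb /path/to/qemu-system-x86_64
   (gdb) target remote localhost:1234
   (gdb) continue
   # On SIGSEGV, 'bt' prints backtraces like the ones shown in the patches below.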

Ilya Maximets (4):
  vhost-user: fix crash on socket disconnect.
  vhost: prevent double stop of vhost_net device.
  vhost: check for vhost_net device validity.
  net: notify about link status only if it changed.

 hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
 hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
 include/hw/virtio/virtio-net.h |  1 +
 include/net/vhost-user.h       |  1 +
 include/net/vhost_net.h        |  1 +
 net/filter.c                   |  1 +
 net/net.c                      |  7 +++---
 net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
 8 files changed, 111 insertions(+), 24 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 1/4] vhost-user: fix crash on socket disconnect.
  2016-03-30 15:14 [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect Ilya Maximets
@ 2016-03-30 15:14 ` Ilya Maximets
  2016-03-30 15:14 ` [Qemu-devel] [PATCH 2/4] vhost: prevent double stop of vhost_net device Ilya Maximets
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Ilya Maximets @ 2016-03-30 15:14 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin; +Cc: Ilya Maximets, Jason Wang, Dyasly Sergey

After a disconnect of the vhost-user socket (e.g. a host application
crash) QEMU will crash with a segmentation fault while the virtio
driver is being unbound inside the guest.

This happens because vhost_net_cleanup() is called while stopping
virtqueue #0, triggered by the arrival of a CHR_EVENT_CLOSED event.
Stopping virtqueue #1 then crashes because the device memory has
already been cleaned up and freed:

[-------------------------------- cut -----------------------------------]
[guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.

Child terminated with signal = 0xb (SIGSEGV)
GDBserver exiting
[-------------------------------- cut -----------------------------------]
[host]# gdb
Breakpoint 1 vhost_user_cleanup(dev) at hw/virtio/vhost-user.c
(gdb) bt
#0  vhost_user_cleanup # executes 'dev->opaque = 0;'
#1  vhost_dev_cleanup
#2  vhost_net_cleanup  # After return from #1: g_free(net)
#3  vhost_user_stop
#4  net_vhost_user_event (<...>, event=5) /* CHR_EVENT_CLOSED */
#5  qemu_chr_be_event (<...>, event=event@entry=5)
#6  tcp_chr_disconnect
#7  tcp_chr_sync_read
#8  qemu_chr_fe_read_all
#9  vhost_user_read
#10 vhost_user_get_vring_base
#11 vhost_virtqueue_stop (vq=0xffe190, idx=0)
#12 vhost_dev_stop
#13 vhost_net_stop_one
#14 vhost_net_stop
#15 virtio_net_vhost_status
#16 virtio_net_set_status
#17 virtio_set_status
<...>
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
qemu_chr_fe_write_all (s=s@entry=0x0, <...>) at qemu-char.c:302
302         if (s->replay && replay_mode == REPLAY_MODE_PLAY) {
(gdb) bt
#0  qemu_chr_fe_write_all (s=s@entry=0x0, <...>)
#1  vhost_user_write
#2  vhost_user_get_vring_base
#3  vhost_virtqueue_stop (vq=0xffe190, idx=1) # device already freed here
#4  vhost_dev_stop
#5  vhost_net_stop_one
#6  vhost_net_stop
#7  virtio_net_vhost_status
#8  virtio_net_set_status
#9  virtio_set_status
<...>
[-------------------------------- cut -----------------------------------]

Fix that by introducing a reference counter for the vhost_net device
and freeing its memory only after the last reference has been dropped.
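
For illustration, here is a self-contained toy analogue of the reference
counting scheme (not QEMU code; all names in it are invented for this
sketch). It shows why the device state now survives a teardown that
happens while a caller still holds a reference:

    #include <stdio.h>
    #include <stdlib.h>

    struct net_state {
        int refcnt;
        /* ... device state ... */
    };

    /* Analogue of get_vhost_net(): take a reference before use. */
    static struct net_state *net_get(struct net_state **slot)
    {
        if (*slot) {
            (*slot)->refcnt++;
        }
        return *slot;                /* NULL if already torn down */
    }

    /* Analogue of put_vhost_net(): drop a reference, free on the last one. */
    static void net_put(struct net_state **slot)
    {
        if (*slot && --(*slot)->refcnt == 0) {
            free(*slot);             /* real code: vhost_net_cleanup() */
            *slot = NULL;
        }
    }

    int main(void)
    {
        struct net_state *backend = calloc(1, sizeof(*backend));
        struct net_state *net;

        backend->refcnt = 1;         /* reference held by the net client   */
        net = net_get(&backend);     /* a caller starts using the device   */
        net_put(&backend);           /* disconnect drops the client's ref  */
        printf("still usable: %d\n", net == backend && backend != NULL);
        net_put(&backend);           /* caller done: last ref, freed now   */
        printf("freed: %d\n", backend == NULL);
        return 0;
    }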

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
 hw/net/vhost_net.c       | 22 +++++++++++++++++++---
 hw/net/virtio-net.c      | 32 ++++++++++++++++++++++++--------
 include/net/vhost-user.h |  1 +
 include/net/vhost_net.h  |  1 +
 net/filter.c             |  1 +
 net/vhost-user.c         | 43 ++++++++++++++++++++++++++++++++++---------
 6 files changed, 80 insertions(+), 20 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6e1032f..55d5456 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -296,6 +296,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
                 dev->use_guest_notifier_mask = false;
         }
+        put_vhost_net(ncs[i].peer);
      }
 
     r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
@@ -306,7 +307,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 
     for (i = 0; i < total_queues; i++) {
         r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
-
+        put_vhost_net(ncs[i].peer);
         if (r < 0) {
             goto err_start;
         }
@@ -317,6 +318,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 err_start:
     while (--i >= 0) {
         vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
+        put_vhost_net(ncs[i].peer);
     }
     e = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
     if (e < 0) {
@@ -337,6 +339,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
 
     for (i = 0; i < total_queues; i++) {
         vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
+        put_vhost_net(ncs[i].peer);
     }
 
     r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
@@ -398,16 +401,25 @@ VHostNetState *get_vhost_net(NetClientState *nc)
     return vhost_net;
 }
 
+void put_vhost_net(NetClientState *nc)
+{
+    if (nc && nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
+        vhost_user_put_vhost_net(nc);
+    }
+}
+
 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
     VHostNetState *net = get_vhost_net(nc);
     const VhostOps *vhost_ops = net->dev.vhost_ops;
+    int res = 0;
 
     if (vhost_ops->vhost_set_vring_enable) {
-        return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
+        res = vhost_ops->vhost_set_vring_enable(&net->dev, enable);
     }
 
-    return 0;
+    put_vhost_net(nc);
+    return res;
 }
 
 #else
@@ -466,6 +478,10 @@ VHostNetState *get_vhost_net(NetClientState *nc)
     return 0;
 }
 
+void put_vhost_net(NetClientState *nc)
+{
+}
+
 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
     return 0;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5798f87..e5456ef 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -121,6 +121,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
     if (!get_vhost_net(nc->peer)) {
         return;
     }
+    put_vhost_net(nc->peer);
 
     if ((virtio_net_started(n, status) && !nc->peer->link_down) ==
         !!n->vhost_started) {
@@ -521,6 +522,8 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     NetClientState *nc = qemu_get_queue(n->nic);
+    VHostNetState *net;
+    uint64_t res;
 
     /* Firstly sync all virtio-net possible supported features */
     features |= n->host_features;
@@ -544,10 +547,15 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
         virtio_clear_feature(&features, VIRTIO_NET_F_HOST_UFO);
     }
 
-    if (!get_vhost_net(nc->peer)) {
-        return features;
+    net = get_vhost_net(nc->peer);
+    if (!net) {
+        res = features;
+    } else {
+        res = vhost_net_get_features(net, features);
+        put_vhost_net(nc->peer);
     }
-    return vhost_net_get_features(get_vhost_net(nc->peer), features);
+
+    return res;
 }
 
 static uint64_t virtio_net_bad_features(VirtIODevice *vdev)
@@ -615,11 +623,13 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
 
     for (i = 0;  i < n->max_queues; i++) {
         NetClientState *nc = qemu_get_subqueue(n->nic, i);
+        VHostNetState *net = get_vhost_net(nc->peer);
 
-        if (!get_vhost_net(nc->peer)) {
+        if (!net) {
             continue;
         }
-        vhost_net_ack_features(get_vhost_net(nc->peer), features);
+        vhost_net_ack_features(net, features);
+        put_vhost_net(nc->peer);
     }
 
     if (virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) {
@@ -1698,8 +1708,13 @@ static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
+    VHostNetState *net = get_vhost_net(nc->peer);
+    bool res;
+
     assert(n->vhost_started);
-    return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
+    res = vhost_net_virtqueue_pending(net, idx);
+    put_vhost_net(nc->peer);
+    return res;
 }
 
 static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
@@ -1707,9 +1722,10 @@ static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
+    VHostNetState *net = get_vhost_net(nc->peer);
     assert(n->vhost_started);
-    vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
-                             vdev, idx, mask);
+    vhost_net_virtqueue_mask(net, vdev, idx, mask);
+    put_vhost_net(nc->peer);
 }
 
 static void virtio_net_set_config_size(VirtIONet *n, uint64_t host_features)
diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
index 85109f6..6bdaa1a 100644
--- a/include/net/vhost-user.h
+++ b/include/net/vhost-user.h
@@ -13,5 +13,6 @@
 
 struct vhost_net;
 struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
+void vhost_user_put_vhost_net(NetClientState *nc);
 
 #endif /* VHOST_USER_H_ */
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 3389b41..577bfa8 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -29,6 +29,7 @@ void vhost_net_virtqueue_mask(VHostNetState *net, VirtIODevice *dev,
                               int idx, bool mask);
 int vhost_net_notify_migration_done(VHostNetState *net, char* mac_addr);
 VHostNetState *get_vhost_net(NetClientState *nc);
+void put_vhost_net(NetClientState *nc);
 
 int vhost_set_vring_enable(NetClientState * nc, int enable);
 #endif
diff --git a/net/filter.c b/net/filter.c
index 1c4fc5a..28403aa 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -214,6 +214,7 @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
 
     if (get_vhost_net(ncs[0])) {
         error_setg(errp, "Vhost is not supported");
+        put_vhost_net(ncs[0]);
         return;
     }
 
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 1b9e73a..9026df3 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -22,6 +22,7 @@ typedef struct VhostUserState {
     NetClientState nc;
     CharDriverState *chr;
     VHostNetState *vhost_net;
+    int refcnt;
 } VhostUserState;
 
 typedef struct VhostUserChardevProps {
@@ -29,13 +30,41 @@ typedef struct VhostUserChardevProps {
     bool is_unix;
 } VhostUserChardevProps;
 
+static void vhost_user_net_ref(VhostUserState *s)
+{
+    if (!s->vhost_net) {
+        return;
+    }
+    s->refcnt++;
+}
+
+static void vhost_user_net_unref(VhostUserState *s)
+{
+    if (!s->vhost_net) {
+        return;
+    }
+
+    if (--s->refcnt == 0) {
+        vhost_net_cleanup(s->vhost_net);
+        s->vhost_net = NULL;
+    }
+}
+
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
 {
     VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
     assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
+    vhost_user_net_ref(s);
     return s->vhost_net;
 }
 
+void vhost_user_put_vhost_net(NetClientState *nc)
+{
+    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
+    vhost_user_net_unref(s);
+}
+
 static int vhost_user_running(VhostUserState *s)
 {
     return (s->vhost_net) ? 1 : 0;
@@ -54,10 +83,7 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
             continue;
         }
 
-        if (s->vhost_net) {
-            vhost_net_cleanup(s->vhost_net);
-            s->vhost_net = NULL;
-        }
+        vhost_user_net_unref(s);
     }
 }
 
@@ -81,6 +107,7 @@ static int vhost_user_start(int queues, NetClientState *ncs[])
         options.net_backend = ncs[i];
         options.opaque      = s->chr;
         s->vhost_net = vhost_net_init(&options);
+        vhost_user_net_ref(s);
         if (!s->vhost_net) {
             error_report("failed to init vhost_net for queue %d", i);
             goto err;
@@ -119,7 +146,9 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
         /* extract guest mac address from the RARP message */
         memcpy(mac_addr, &buf[6], 6);
 
+        vhost_user_net_ref(s);
         r = vhost_net_notify_migration_done(s->vhost_net, mac_addr);
+        vhost_user_net_unref(s);
 
         if ((r != 0) && (display_rarp_failure)) {
             fprintf(stderr,
@@ -136,11 +165,7 @@ static void vhost_user_cleanup(NetClientState *nc)
 {
     VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
 
-    if (s->vhost_net) {
-        vhost_net_cleanup(s->vhost_net);
-        s->vhost_net = NULL;
-    }
-
+    vhost_user_net_unref(s);
     qemu_purge_queued_packets(nc);
 }
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 2/4] vhost: prevent double stop of vhost_net device.
  2016-03-30 15:14 [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect Ilya Maximets
  2016-03-30 15:14 ` [Qemu-devel] [PATCH 1/4] vhost-user: fix crash on " Ilya Maximets
@ 2016-03-30 15:14 ` Ilya Maximets
  2016-03-30 15:14 ` [Qemu-devel] [PATCH 3/4] vhost: check for vhost_net device validity Ilya Maximets
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Ilya Maximets @ 2016-03-30 15:14 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin; +Cc: Ilya Maximets, Jason Wang, Dyasly Sergey

There are a few control flows in vhost's implementation where
vhost_net_stop() may be re-entered several times in a row. This leads
to an assertion failure while the same resources are freed twice.

For example, if a vhost-user socket disconnection occurred:
[-------------------------------- cut -----------------------------------]
[guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: /qemu/memory.c:1852: memory_region_del_eventfd:
      Assertion `i != mr->ioeventfd_nb' failed.

Child terminated with signal = 0x6 (SIGABRT)
GDBserver exiting
[-------------------------------- cut -----------------------------------]
[host]# gdb
Breakpoint 1 in virtio_pci_set_host_notifier_internal()
#0  virtio_pci_set_host_notifier_internal
#1  vhost_dev_disable_notifiers
#2  vhost_net_stop_one
#3  vhost_net_stop (dev=dev@entry=0x13a45f8, ncs=0x1ce89f0, <...>)
#4  virtio_net_vhost_status
#5  virtio_net_set_status
#6  qmp_set_link (<...>, up=up@entry=false, <...>)
#7  net_vhost_user_event (<...>, event=5)
#8  qemu_chr_be_event
#9  tcp_chr_disconnect
#10 tcp_chr_sync_read
#11 qemu_chr_fe_read_all
#12 vhost_user_read
#13 vhost_user_get_vring_base
#14 vhost_virtqueue_stop
#15 vhost_dev_stop
#16 vhost_net_stop_one
#17 vhost_net_stop (dev=dev@entry=0x13a45f8, ncs=0x1ce89f0, <...>)
#18 virtio_net_vhost_status
#19 virtio_net_set_status
#20 virtio_set_status
<...>
[-------------------------------- cut -----------------------------------]

In the example above the assertion fails when control is brought back
to the function at frame #17 and it tries to free the 'eventfd' that
was already freed at frame #3.

Fix that by disallowing execution of vhost_net_stop() if we are
already inside it.
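
A minimal, self-contained sketch of such a re-entrancy guard (not QEMU
code; the names are invented) showing how the flag turns the nested call
into a no-op, so the resources are released exactly once:

    #include <stdbool.h>
    #include <stdio.h>

    struct dev {
        bool stopping;
        int  eventfds;               /* stands in for the ioeventfds */
    };

    static void release_resources(struct dev *d);

    static void net_stop(struct dev *d)    /* analogue of vhost_net_stop() */
    {
        if (d->stopping) {
            return;                  /* re-entered: let the outer call finish */
        }
        d->stopping = true;
        release_resources(d);        /* may indirectly re-enter net_stop() */
        d->stopping = false;
    }

    static void release_resources(struct dev *d)
    {
        d->eventfds--;               /* going below 0 = the failed assertion */
        net_stop(d);                 /* simulates CHR_EVENT_CLOSED mid-stop */
    }

    int main(void)
    {
        struct dev d = { .stopping = false, .eventfds = 1 };

        net_stop(&d);
        printf("eventfds left: %d\n", d.eventfds);   /* 0: released once */
        return 0;
    }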

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
 hw/net/vhost_net.c             | 8 ++++++++
 hw/net/virtio-net.c            | 1 +
 include/hw/virtio/virtio-net.h | 1 +
 3 files changed, 10 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 55d5456..0996e5d 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -335,8 +335,14 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
     VirtioBusState *vbus = VIRTIO_BUS(qbus);
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+    VirtIONet *n = VIRTIO_NET(dev);
     int i, r;
 
+    if (n->vhost_stopping_in_progress) {
+        return;
+    }
+    n->vhost_stopping_in_progress = 1;
+
     for (i = 0; i < total_queues; i++) {
         vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
         put_vhost_net(ncs[i].peer);
@@ -348,6 +354,8 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
         fflush(stderr);
     }
     assert(r >= 0);
+
+    n->vhost_stopping_in_progress = 0;
 }
 
 void vhost_net_cleanup(struct vhost_net *net)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e5456ef..9b1539c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -149,6 +149,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
         }
 
         n->vhost_started = 1;
+        n->vhost_stopping_in_progress = 0;
         r = vhost_net_start(vdev, n->nic->ncs, queues);
         if (r < 0) {
             error_report("unable to start vhost net: %d: "
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 0cabdb6..fc07385 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -74,6 +74,7 @@ typedef struct VirtIONet {
     uint8_t nouni;
     uint8_t nobcast;
     uint8_t vhost_started;
+    uint8_t vhost_stopping_in_progress;
     struct {
         uint32_t in_use;
         uint32_t first_multi;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 3/4] vhost: check for vhost_net device validity.
  2016-03-30 15:14 [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect Ilya Maximets
  2016-03-30 15:14 ` [Qemu-devel] [PATCH 1/4] vhost-user: fix crash on " Ilya Maximets
  2016-03-30 15:14 ` [Qemu-devel] [PATCH 2/4] vhost: prevent double stop of vhost_net device Ilya Maximets
@ 2016-03-30 15:14 ` Ilya Maximets
  2016-03-30 15:14 ` [Qemu-devel] [PATCH 4/4] net: notify about link status only if it changed Ilya Maximets
  2016-03-30 17:01 ` [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect Michael S. Tsirkin
  4 siblings, 0 replies; 14+ messages in thread
From: Ilya Maximets @ 2016-03-30 15:14 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin; +Cc: Ilya Maximets, Jason Wang, Dyasly Sergey

After the vhost-user device has been successfully destroyed (for
example, after unbinding the virtio driver following a vhost-user
socket disconnect), QEMU will crash with a segmentation fault when the
virtio driver is bound again:

[-------------------------------- cut -----------------------------------]
# After vhost-user socket disconnection.
[guest]# ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
    link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

[guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.

[guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind

Child terminated with signal = 0xb (SIGSEGV)
GDBserver exiting
[-------------------------------- cut -----------------------------------]
[host]# gdb
Program received signal SIGSEGV, Segmentation fault.
vhost_set_vring_enable (<...>) at /hw/net/vhost_net.c:425
425         if (vhost_ops->vhost_set_vring_enable) {
(gdb) bt
#0  vhost_set_vring_enable (nc=0xfff8a0, enable=enable@entry=1)
        net = 0x0 # NULL pointer to vhost_net device !
        vhost_ops = <(Cannot access memory at address 0x110)>
        res = 0
#1  peer_attach
#2  virtio_net_set_queues
#3  virtio_net_set_multiqueue
#4  virtio_net_set_features
#5  virtio_set_features_nocheck
#6  memory_region_write_accessor
#7  access_with_adjusted_size
#8  memory_region_dispatch_write
#9  address_space_write_continue
#10 address_space_write
<...>
[-------------------------------- cut -----------------------------------]

This happens because of an invalid vhost_net device pointer.
Fix that by checking this pointer in all functions before using it.

Result:
[-------------------------------- cut -----------------------------------]
[guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind

# Check link in guest. No crashes here, link in DOWN state.
[guest]# ip link show eth0
7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
    link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
[-------------------------------- cut -----------------------------------]

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
 hw/net/vhost_net.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 0996e5d..4c3363f 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -202,6 +202,10 @@ static int vhost_net_start_one(struct vhost_net *net,
     struct vhost_vring_file file = { };
     int r;
 
+    if (!net) {
+        return -1;
+    }
+
     net->dev.nvqs = 2;
     net->dev.vqs = net->vqs;
 
@@ -256,6 +260,10 @@ static void vhost_net_stop_one(struct vhost_net *net,
 {
     struct vhost_vring_file file = { .fd = -1 };
 
+    if (!net) {
+        return;
+    }
+
     if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP) {
         for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
             const VhostOps *vhost_ops = net->dev.vhost_ops;
@@ -287,6 +295,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
         struct vhost_net *net;
 
         net = get_vhost_net(ncs[i].peer);
+        if (!net) {
+            return -1;
+        }
         vhost_net_set_vq_index(net, i * 2);
 
         /* Suppress the masking guest notifiers on vhost user
@@ -419,9 +430,14 @@ void put_vhost_net(NetClientState *nc)
 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
     VHostNetState *net = get_vhost_net(nc);
-    const VhostOps *vhost_ops = net->dev.vhost_ops;
+    const VhostOps *vhost_ops;
     int res = 0;
 
+    if (!net) {
+        return 0;
+    }
+
+    vhost_ops = net->dev.vhost_ops;
     if (vhost_ops->vhost_set_vring_enable) {
         res = vhost_ops->vhost_set_vring_enable(&net->dev, enable);
     }
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 4/4] net: notify about link status only if it changed.
  2016-03-30 15:14 [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect Ilya Maximets
                   ` (2 preceding siblings ...)
  2016-03-30 15:14 ` [Qemu-devel] [PATCH 3/4] vhost: check for vhost_net device validity Ilya Maximets
@ 2016-03-30 15:14 ` Ilya Maximets
  2016-03-30 17:01 ` [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect Michael S. Tsirkin
  4 siblings, 0 replies; 14+ messages in thread
From: Ilya Maximets @ 2016-03-30 15:14 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin; +Cc: Ilya Maximets, Jason Wang, Dyasly Sergey

No need to notify nc->peer if nothing changed.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
 net/net.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/net.c b/net/net.c
index 3b5a142..6f6a8ce 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1385,9 +1385,10 @@ void qmp_set_link(const char *name, bool up, Error **errp)
             for (i = 0; i < queues; i++) {
                 ncs[i]->peer->link_down = !up;
             }
-        }
-        if (nc->peer->info->link_status_changed) {
-            nc->peer->info->link_status_changed(nc->peer);
+
+            if (nc->peer->info->link_status_changed) {
+                nc->peer->info->link_status_changed(nc->peer);
+            }
         }
     }
 }
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-03-30 15:14 [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect Ilya Maximets
                   ` (3 preceding siblings ...)
  2016-03-30 15:14 ` [Qemu-devel] [PATCH 4/4] net: notify about link status only if it changed Ilya Maximets
@ 2016-03-30 17:01 ` Michael S. Tsirkin
  2016-03-31  6:02   ` Ilya Maximets
  4 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2016-03-30 17:01 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: Jason Wang, qemu-devel, Dyasly Sergey

On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> Currently QEMU always crashes in following scenario (assume that
> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):

In fact, wouldn't the right thing to do be stopping the VM?

> 1. # Check that link in guest is in a normal state.
>    [guest]# ip link show eth0
>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> 
> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
>    [host]# kill -11 `pgrep ovs-vswitchd`
> 
> 3. # Check that guest still thinks that all is good.
>    [guest]# ip link show eth0
>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> 
> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> 
>     Child terminated with signal = 0xb (SIGSEGV)
>     GDBserver exiting
> 
> After the applying of this patch-set:
> 
> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> 
> 5. # Bind virtio-pci driver back.
>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
> 
> 6. # Check link in guest. No crashes here, link in DOWN state.
>    [guest]# ip link show eth0
>     7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> 
> 7. QEMU may be gracefully restarted to restore communication after restarting
>    of vhost-user application.
> 
> Ilya Maximets (4):
>   vhost-user: fix crash on socket disconnect.
>   vhost: prevent double stop of vhost_net device.
>   vhost: check for vhost_net device validity.
>   net: notify about link status only if it changed.
> 
>  hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
>  include/hw/virtio/virtio-net.h |  1 +
>  include/net/vhost-user.h       |  1 +
>  include/net/vhost_net.h        |  1 +
>  net/filter.c                   |  1 +
>  net/net.c                      |  7 +++---
>  net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
>  8 files changed, 111 insertions(+), 24 deletions(-)
> 
> -- 
> 2.5.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-03-30 17:01 ` [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect Michael S. Tsirkin
@ 2016-03-31  6:02   ` Ilya Maximets
  2016-03-31  9:21     ` Michael S. Tsirkin
  2016-04-05 10:46     ` Michael S. Tsirkin
  0 siblings, 2 replies; 14+ messages in thread
From: Ilya Maximets @ 2016-03-31  6:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel, Dyasly Sergey

On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
>> Currently QEMU always crashes in following scenario (assume that
>> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> 
> In fact, wouldn't the right thing to do be stopping the VM?

I don't think it is reasonable to stop the VM on the failure of just one of
its network interfaces. There may still be working vhost-kernel interfaces.
A connection may even still be possible if the vhost-user interface was
bonded with a kernel interface.
In any case, the user should be able to save all his data before QEMU is
restarted.

> 
>> 1. # Check that link in guest is in a normal state.
>>    [guest]# ip link show eth0
>>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>
>> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
>>    [host]# kill -11 `pgrep ovs-vswitchd`
>>
>> 3. # Check that guest still thinks that all is good.
>>    [guest]# ip link show eth0
>>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>
>> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>
>>     Child terminated with signal = 0xb (SIGSEGV)
>>     GDBserver exiting
>>
>> After the applying of this patch-set:
>>
>> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>
>> 5. # Bind virtio-pci driver back.
>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
>>
>> 6. # Check link in guest. No crashes here, link in DOWN state.
>>    [guest]# ip link show eth0
>>     7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>
>> 7. QEMU may be gracefully restarted to restore communication after restarting
>>    of vhost-user application.
>>
>> Ilya Maximets (4):
>>   vhost-user: fix crash on socket disconnect.
>>   vhost: prevent double stop of vhost_net device.
>>   vhost: check for vhost_net device validity.
>>   net: notify about link status only if it changed.
>>
>>  hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
>>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
>>  include/hw/virtio/virtio-net.h |  1 +
>>  include/net/vhost-user.h       |  1 +
>>  include/net/vhost_net.h        |  1 +
>>  net/filter.c                   |  1 +
>>  net/net.c                      |  7 +++---
>>  net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
>>  8 files changed, 111 insertions(+), 24 deletions(-)
>>
>> -- 
>> 2.5.0
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-03-31  6:02   ` Ilya Maximets
@ 2016-03-31  9:21     ` Michael S. Tsirkin
  2016-03-31 10:48       ` Ilya Maximets
  2016-04-05 10:46     ` Michael S. Tsirkin
  1 sibling, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2016-03-31  9:21 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: Jason Wang, qemu-devel, Dyasly Sergey

On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> >> Currently QEMU always crashes in following scenario (assume that
> >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > 
> > In fact, wouldn't the right thing to do be stopping the VM?
> 
> I don't think that is reasonable to stop VM on failure of just one of
> network interfaces. There may be still working vhost-kernel interfaces.
> Even connection can still be established if vhost-user interface was in
> bonding with kernel interface.
> Anyway user should be able to save all his data before QEMU restart.


Unfortunately some guests are known to crash if we defer
using available buffers indefinitely.

If you see bugs, let's fix them, but recovering would be
a new feature; let's defer it until after 2.6.

> > 
> >> 1. # Check that link in guest is in a normal state.
> >>    [guest]# ip link show eth0
> >>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
> >>    [host]# kill -11 `pgrep ovs-vswitchd`
> >>
> >> 3. # Check that guest still thinks that all is good.
> >>    [guest]# ip link show eth0
> >>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>
> >>     Child terminated with signal = 0xb (SIGSEGV)
> >>     GDBserver exiting
> >>
> >> After the applying of this patch-set:
> >>
> >> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>
> >> 5. # Bind virtio-pci driver back.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
> >>
> >> 6. # Check link in guest. No crashes here, link in DOWN state.
> >>    [guest]# ip link show eth0
> >>     7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 7. QEMU may be gracefully restarted to restore communication after restarting
> >>    of vhost-user application.
> >>
> >> Ilya Maximets (4):
> >>   vhost-user: fix crash on socket disconnect.
> >>   vhost: prevent double stop of vhost_net device.
> >>   vhost: check for vhost_net device validity.
> >>   net: notify about link status only if it changed.
> >>
> >>  hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
> >>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
> >>  include/hw/virtio/virtio-net.h |  1 +
> >>  include/net/vhost-user.h       |  1 +
> >>  include/net/vhost_net.h        |  1 +
> >>  net/filter.c                   |  1 +
> >>  net/net.c                      |  7 +++---
> >>  net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
> >>  8 files changed, 111 insertions(+), 24 deletions(-)
> >>
> >> -- 
> >> 2.5.0
> > 
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-03-31  9:21     ` Michael S. Tsirkin
@ 2016-03-31 10:48       ` Ilya Maximets
  0 siblings, 0 replies; 14+ messages in thread
From: Ilya Maximets @ 2016-03-31 10:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel, Dyasly Sergey

On 31.03.2016 12:21, Michael S. Tsirkin wrote:
> On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
>> On 30.03.2016 20:01, Michael S. Tsirkin wrote:
>>> On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
>>>> Currently QEMU always crashes in following scenario (assume that
>>>> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
>>>
>>> In fact, wouldn't the right thing to do be stopping the VM?
>>
>> I don't think that is reasonable to stop VM on failure of just one of
>> network interfaces. There may be still working vhost-kernel interfaces.
>> Even connection can still be established if vhost-user interface was in
>> bonding with kernel interface.
>> Anyway user should be able to save all his data before QEMU restart.
> 
> 
> Unfrotunately some guests are known to crash if we defer
> using available buffers indefinitely.

What exactly do you mean?

Anyway, here we have a 'crash of some guests' vs. a '100% reproducible QEMU crash'.

> 
> If you see bugs let's fix them, but recovering would be
> a new feature, let's defer it until after 2.6.

The segmentation fault is a bug. This series *doesn't introduce any recovery*
features. A disconnected network interface in the guest will never work again.
I just tried to avoid the segmentation fault in this situation. There will
be no difference for the guest between QEMU with these patches and without
them. There will just not be any crashes.

> 
>>>
>>>> 1. # Check that link in guest is in a normal state.
>>>>    [guest]# ip link show eth0
>>>>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>>>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>>>
>>>> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
>>>>    [host]# kill -11 `pgrep ovs-vswitchd`
>>>>
>>>> 3. # Check that guest still thinks that all is good.
>>>>    [guest]# ip link show eth0
>>>>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
>>>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>>>
>>>> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
>>>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>>>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>>>
>>>>     Child terminated with signal = 0xb (SIGSEGV)
>>>>     GDBserver exiting
>>>>
>>>> After the applying of this patch-set:
>>>>
>>>> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
>>>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
>>>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>>>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
>>>>
>>>> 5. # Bind virtio-pci driver back.
>>>>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
>>>>
>>>> 6. # Check link in guest. No crashes here, link in DOWN state.
>>>>    [guest]# ip link show eth0
>>>>     7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
>>>>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
>>>>
>>>> 7. QEMU may be gracefully restarted to restore communication after restarting
>>>>    of vhost-user application.
>>>>
>>>> Ilya Maximets (4):
>>>>   vhost-user: fix crash on socket disconnect.
>>>>   vhost: prevent double stop of vhost_net device.
>>>>   vhost: check for vhost_net device validity.
>>>>   net: notify about link status only if it changed.
>>>>
>>>>  hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
>>>>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
>>>>  include/hw/virtio/virtio-net.h |  1 +
>>>>  include/net/vhost-user.h       |  1 +
>>>>  include/net/vhost_net.h        |  1 +
>>>>  net/filter.c                   |  1 +
>>>>  net/net.c                      |  7 +++---
>>>>  net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
>>>>  8 files changed, 111 insertions(+), 24 deletions(-)
>>>>
>>>> -- 
>>>> 2.5.0
>>>
>>>
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-03-31  6:02   ` Ilya Maximets
  2016-03-31  9:21     ` Michael S. Tsirkin
@ 2016-04-05 10:46     ` Michael S. Tsirkin
  1 sibling, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2016-04-05 10:46 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: Jason Wang, qemu-devel, Dyasly Sergey

On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> >> Currently QEMU always crashes in following scenario (assume that
> >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > 
> > In fact, wouldn't the right thing to do be stopping the VM?
> 
> I don't think that is reasonable to stop VM on failure of just one of
> network interfaces.

We don't start QEMU until the vhost-user backend connects, either.
Making the guest run properly without a backend is a much bigger
project; let's tackle this separately.

Also, I think handling graceful disconnect is the correct first step.
Handling misbehaving clients is much harder, as we have asserts all over
the place that assume the remote obeys the protocol rules.

> There may be still working vhost-kernel interfaces.
> Even connection can still be established if vhost-user interface was in
> bonding with kernel interface.

Could not parse this.

> Anyway user should be able to save all his data before QEMU restart.

So reconnect a new backend, and the VM will keep going.

> > 
> >> 1. # Check that link in guest is in a normal state.
> >>    [guest]# ip link show eth0
> >>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
> >>    [host]# kill -11 `pgrep ovs-vswitchd`
> >>
> >> 3. # Check that guest still thinks that all is good.
> >>    [guest]# ip link show eth0
> >>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>
> >>     Child terminated with signal = 0xb (SIGSEGV)
> >>     GDBserver exiting
> >>
> >> After the applying of this patch-set:
> >>
> >> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>     qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
> >>
> >> 5. # Bind virtio-pci driver back.
> >>    [guest]# echo -n '0000:00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
> >>
> >> 6. # Check link in guest. No crashes here, link in DOWN state.


We already have interfaces to control link state.
I don't think we should necessarily tie link state to backend
state. Could be an option but probably not the default.
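
For example, link state can already be toggled from the monitor with the
set_link command ("net0" below is just an illustrative device name; there
is an equivalent QMP command):

    (qemu) set_link net0 off
    (qemu) set_link net0 on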

> >>    [guest]# ip link show eth0
> >>     7: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc <...>
> >>         link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> >>
> >> 7. QEMU may be gracefully restarted to restore communication after restarting
> >>    of vhost-user application.
> >>
> >> Ilya Maximets (4):
> >>   vhost-user: fix crash on socket disconnect.
> >>   vhost: prevent double stop of vhost_net device.
> >>   vhost: check for vhost_net device validity.
> >>   net: notify about link status only if it changed.
> >>
> >>  hw/net/vhost_net.c             | 48 ++++++++++++++++++++++++++++++++++++++----
> >>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++-------
> >>  include/hw/virtio/virtio-net.h |  1 +
> >>  include/net/vhost-user.h       |  1 +
> >>  include/net/vhost_net.h        |  1 +
> >>  net/filter.c                   |  1 +
> >>  net/net.c                      |  7 +++---
> >>  net/vhost-user.c               | 43 +++++++++++++++++++++++++++++--------
> >>  8 files changed, 111 insertions(+), 24 deletions(-)
> >>
> >> -- 
> >> 2.5.0
> > 
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
@ 2016-04-06 23:52 Ilya Maximets
  2016-04-07  7:01 ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Ilya Maximets @ 2016-04-06 23:52 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel@nongnu.org, Sergey Dyasly

------- Original Message -------
Sender : Michael S. Tsirkin<mst@redhat.com>
Date : Apr 05, 2016 13:46 (GMT+03:00)
Title : Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.

> On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> > On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> > >> Currently QEMU always crashes in following scenario (assume that
> > >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > > 
> > > In fact, wouldn't the right thing to do be stopping the VM?
> > 
> > I don't think that is reasonable to stop VM on failure of just one of
> > network interfaces.
> 
> We don't start QEMU until vhost user connects, either.
> Making guest run properly with a backend is a much bigger
> project, let's tackle this separately.
> 
> Also, I think handling graceful disconnect is the correct first step.
> Handling misbehaving clients is much harder as we have asserts on remote
> obeying protocol rules all over the place.
> 
> > There may be still working vhost-kernel interfaces.
> > Even connection can still be established if vhost-user interface was in
> > bonding with kernel interface.
> 
> Could not parse this.
>
> > Anyway user should be able to save all his data before QEMU restart.
>
> So reconnect a new backend, and VM will keep going.

We can't do this for two reasons:
1. QEMU crashes with a segmentation fault on disconnect (fixed by this patch-set).
2. There is no reconnect functionality in the current QEMU version.

So, what are you talking about?

Best regards, Ilya Maximets.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-04-06 23:52 Ilya Maximets
@ 2016-04-07  7:01 ` Michael S. Tsirkin
  0 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2016-04-07  7:01 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: Jason Wang, qemu-devel@nongnu.org, Sergey Dyasly

On Wed, Apr 06, 2016 at 11:52:56PM +0000, Ilya Maximets wrote:
> ------- Original Message -------
> Sender : Michael S. Tsirkin<mst@redhat.com>
> Date : Apr 05, 2016 13:46 (GMT+03:00)
> Title : Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
> 
> > On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> > > On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> > > >> Currently QEMU always crashes in following scenario (assume that
> > > >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > > > 
> > > > In fact, wouldn't the right thing to do be stopping the VM?
> > > 
> > > I don't think that is reasonable to stop VM on failure of just one of
> > > network interfaces.
> > 
> > We don't start QEMU until vhost user connects, either.
> > Making guest run properly with a backend is a much bigger
> > project, let's tackle this separately.
> > 
> > Also, I think handling graceful disconnect is the correct first step.
> > Handling misbehaving clients is much harder as we have asserts on remote
> > obeying protocol rules all over the place.
> > 
> > > There may be still working vhost-kernel interfaces.
> > > Even connection can still be established if vhost-user interface was in
> > > bonding with kernel interface.
> > 
> > Could not parse this.
> >
> > > Anyway user should be able to save all his data before QEMU restart.
> >
> > So reconnect a new backend, and VM will keep going.
> 
> We cant't do this because of 2 reasons:
> 1. Segmentation fault of QEMU on disconnect. (fixed by this patch-set)
> 2. There is no reconnect functionality in current QEMU version.
> 
> So, what are you talking about?
> 
> Best regards, Ilya Maximets.

One can currently disconnect vhost-user clients without killing
guests by using migration (see the sketch below):
- save vm
- quit qemu
- start new qemu
- load vm

Or using hotplug
- request unplug
- wait for guest eject
- create new device
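
A rough sketch of what those two flows can look like from the monitor
(the ids, command arguments and state file path are illustrative; exact
syntax depends on the QEMU version and configuration):

    # Migration-based restart:
    (qemu) migrate "exec:cat > /tmp/vm.state"
    (qemu) quit
    [host]# qemu-system-x86_64 <original args> -incoming "exec:cat /tmp/vm.state"

    # Hotplug-based replacement of the vhost-user NIC:
    (qemu) device_del vnet0
    # ... wait for the guest to eject the device ...
    (qemu) netdev_add vhost-user,id=hostnet1,chardev=charnet1
    (qemu) device_add virtio-net-pci,netdev=hostnet1,id=vnet1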

I would like to make sure we do not create an expectation that guests
keep going unconditionally with the device present even with the
backend disconnected. Unfortunately your patchset might create such an
expectation.

-- 
MST

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
@ 2016-04-07 11:09 Ilya Maximets
  2016-04-07 12:25 ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Ilya Maximets @ 2016-04-07 11:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, qemu-devel@nongnu.org, Sergey Dyasly

> ------- Original Message -------
> Sender : Michael S. Tsirkin<mst@redhat.com>
> Date : Apr 07, 2016 10:01 (GMT+03:00)
> Title : Re: Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
> 
> On Wed, Apr 06, 2016 at 11:52:56PM +0000, Ilya Maximets wrote:
> > ------- Original Message -------
> > Sender : Michael S. Tsirkin
> > Date : Apr 05, 2016 13:46 (GMT+03:00)
> > Title : Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
> > 
> > > On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> > > > On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > > > > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> > > > >> Currently QEMU always crashes in following scenario (assume that
> > > > >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > > > > 
> > > > > In fact, wouldn't the right thing to do be stopping the VM?
> > > > 
> > > > I don't think that is reasonable to stop VM on failure of just one of
> > > > network interfaces.
> > > 
> > > We don't start QEMU until vhost user connects, either.
> > > Making guest run properly with a backend is a much bigger
> > > project, let's tackle this separately.
> > > 
> > > Also, I think handling graceful disconnect is the correct first step.
> > > Handling misbehaving clients is much harder as we have asserts on remote
> > > obeying protocol rules all over the place.
> > > 
> > > > There may be still working vhost-kernel interfaces.
> > > > Even connection can still be established if vhost-user interface was in
> > > > bonding with kernel interface.
> > > 
> > > Could not parse this.
> > >
> > > > Anyway user should be able to save all his data before QEMU restart.
> > >
> > > So reconnect a new backend, and VM will keep going.
> > 
> > We cant't do this because of 2 reasons:
> > 1. Segmentation fault of QEMU on disconnect. (fixed by this patch-set)
> > 2. There is no reconnect functionality in current QEMU version.
> > 
> > So, what are you talking about?
> > 
> > Best regards, Ilya Maximets.
> 
> One can currently disconnect vhost user clients without killing
> guests using migration:
> - save vm
> - quit qemu
> - start new qemu
> - load vm
>
> Or using hotplug
> - request unplug
> - wait for guest eject
> - create new device

This may fix the second reason mentioned above, but in my scenario (described
in the cover letter) the guest will learn that the backend has disconnected
only after the segmentation fault. So, there will be no opportunity to save
the VM or to unplug the device.

> I would like to make sure we do not create an expectation that guests
> keep going unconditionally with device present even with backend
>disconnected. Unfortunately your patchset might create such
> expectation.

It's already the case, because the guest will never know about the crash of
the backend until it tries to communicate via the socket.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
  2016-04-07 11:09 Ilya Maximets
@ 2016-04-07 12:25 ` Michael S. Tsirkin
  0 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2016-04-07 12:25 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: Jason Wang, qemu-devel@nongnu.org, Sergey Dyasly

On Thu, Apr 07, 2016 at 11:09:48AM +0000, Ilya Maximets wrote:
> > ------- Original Message -------
> > Sender : Michael S. Tsirkin<mst@redhat.com>
> > Date : Apr 07, 2016 10:01 (GMT+03:00)
> > Title : Re: Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
> > 
> > On Wed, Apr 06, 2016 at 11:52:56PM +0000, Ilya Maximets wrote:
> > > ------- Original Message -------
> > > Sender : Michael S. Tsirkin
> > > Date : Apr 05, 2016 13:46 (GMT+03:00)
> > > Title : Re: [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.
> > > 
> > > > On Thu, Mar 31, 2016 at 09:02:01AM +0300, Ilya Maximets wrote:
> > > > > On 30.03.2016 20:01, Michael S. Tsirkin wrote:
> > > > > > On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> > > > > >> Currently QEMU always crashes in following scenario (assume that
> > > > > >> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):
> > > > > > 
> > > > > > In fact, wouldn't the right thing to do be stopping the VM?
> > > > > 
> > > > > I don't think that is reasonable to stop VM on failure of just one of
> > > > > network interfaces.
> > > > 
> > > > We don't start QEMU until vhost user connects, either.
> > > > Making guest run properly with a backend is a much bigger
> > > > project, let's tackle this separately.
> > > > 
> > > > Also, I think handling graceful disconnect is the correct first step.
> > > > Handling misbehaving clients is much harder as we have asserts on remote
> > > > obeying protocol rules all over the place.
> > > > 
> > > > > There may be still working vhost-kernel interfaces.
> > > > > Even connection can still be established if vhost-user interface was in
> > > > > bonding with kernel interface.
> > > > 
> > > > Could not parse this.
> > > >
> > > > > Anyway user should be able to save all his data before QEMU restart.
> > > >
> > > > So reconnect a new backend, and VM will keep going.
> > > 
> > > We cant't do this because of 2 reasons:
> > > 1. Segmentation fault of QEMU on disconnect. (fixed by this patch-set)
> > > 2. There is no reconnect functionality in current QEMU version.
> > > 
> > > So, what are you talking about?
> > > 
> > > Best regards, Ilya Maximets.
> > 
> > One can currently disconnect vhost user clients without killing
> > guests using migration:
> > - save vm
> > - quit qemu
> > - start new qemu
> > - load vm
> >
> > Or using hotplug
> > - request unplug
> > - wait for guest eject
> > - create new device
> 
> This may be fixes second reason mentioned above, but in my scenario (described in cover-letter)
> guset will know that backend disconnected only after segmentation fault. So, there will be
> no opportunity to save or unplug machine.

In the cover letter you describe killing the vhost-user client.
So don't do that, then.

> > I would like to make sure we do not create an expectation that guests
> > keep going unconditionally with device present even with backend
> >disconnected. Unfortunately your patchset might create such
> > expectation.
> 
> It's already so, because guest will never know about crach of backend untill it
> tries to communicate via socket.

I agree the current handling isn't good. I am just reluctant to
stick even more stuff in there before 2.6. We need a plan going
forward. I'll think it over during the weekend.

-- 
MST

^ permalink raw reply	[flat|nested] 14+ messages in thread

