qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH] virtio-net: Don't pass NULL peer to tap routines
@ 2010-09-22 19:52 Alex Williamson
  2010-09-23 17:43 ` Anthony Liguori
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2010-09-22 19:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, mst

During a hotplug, the netdev might be removed before the
connected virtio device.  When this happens, the guest might
be running cleanup operations that can trigger a segfault in
qemu.  Avoid one set of these by checking whether the peer
device is present before trying to do tap operations.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

 hw/virtio-net.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 0a9cae2..2c758ad 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -216,6 +216,10 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
 
     n->mergeable_rx_bufs = !!(features & (1 << VIRTIO_NET_F_MRG_RXBUF));
 
+    if (!n->nic->nc.peer ||
+        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+        return;
+    }
     if (n->has_vnet_hdr) {
         tap_set_offload(n->nic->nc.peer,
                         (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
@@ -224,10 +228,6 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
                         (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
                         (features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
     }
-    if (!n->nic->nc.peer ||
-        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
-        return;
-    }
     if (!tap_get_vhost_net(n->nic->nc.peer)) {
         return;
     }
@@ -859,7 +859,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
             return -1;
         }
 
-        if (n->has_vnet_hdr) {
+        if (n->nic->nc.peer && n->has_vnet_hdr) {
             tap_using_vnet_hdr(n->nic->nc.peer, 1);
             tap_set_offload(n->nic->nc.peer,
                     (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_CSUM) & 1,


* Re: [Qemu-devel] [PATCH] virtio-net: Don't pass NULL peer to tap routines
  2010-09-22 19:52 [Qemu-devel] [PATCH] virtio-net: Don't pass NULL peer to tap routines Alex Williamson
@ 2010-09-23 17:43 ` Anthony Liguori
  2010-09-23 18:25   ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Anthony Liguori @ 2010-09-23 17:43 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, mst

On 09/22/2010 02:52 PM, Alex Williamson wrote:
> During a hotplug, the netdev might be removed before the
> connected virtio device.  When this happens, the guest might
> be running cleanup operations that can trigger a segfault in
> qemu.  Avoid one set of these by checking whether the peer
> device is present before trying to do tap operations.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>    

Can you explain this scenario a little better?

If nc.peer is NULL when set_features is called, it would seem to me like 
we're in a pretty critical state.  I agree that we shouldn't seg fault,
but I wonder if the real bug is that this can happen at all.

Regards,

Anthony Liguori

> ---
>
>   hw/virtio-net.c |   10 +++++-----
>   1 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index 0a9cae2..2c758ad 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -216,6 +216,10 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
>
>       n->mergeable_rx_bufs = !!(features & (1 << VIRTIO_NET_F_MRG_RXBUF));
>
> +    if (!n->nic->nc.peer ||
> +        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
> +        return;
> +    }
>       if (n->has_vnet_hdr) {
>           tap_set_offload(n->nic->nc.peer,
>                           (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
> @@ -224,10 +228,6 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
>                           (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
>                           (features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
>       }
> -    if (!n->nic->nc.peer ||
> -        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
> -        return;
> -    }
>       if (!tap_get_vhost_net(n->nic->nc.peer)) {
>           return;
>       }
> @@ -859,7 +859,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>               return -1;
>           }
>
> -        if (n->has_vnet_hdr) {
> +        if (n->nic->nc.peer && n->has_vnet_hdr) {
>               tap_using_vnet_hdr(n->nic->nc.peer, 1);
>               tap_set_offload(n->nic->nc.peer,
>                       (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
>
>
>    


* Re: [Qemu-devel] [PATCH] virtio-net: Don't pass NULL peer to tap routines
  2010-09-23 17:43 ` Anthony Liguori
@ 2010-09-23 18:25   ` Alex Williamson
  2010-09-24  9:31     ` Markus Armbruster
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2010-09-23 18:25 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, mst

On Thu, 2010-09-23 at 12:43 -0500, Anthony Liguori wrote:
> On 09/22/2010 02:52 PM, Alex Williamson wrote:
> > During a hotplug, the netdev might be removed before the
> > connected virtio device.  When this happens, the guest might
> > be running cleanup operations that can trigger a segfault in
> > qemu.  Avoid one set of these by checking whether the peer
> > device is present before trying to do tap operations.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >    
> 
> Can you explain this scenario a little better?
> 
> If nc.peer is NULL when set_features is called, it would seem to me like 
> we're in a pretty critical state.  I agree that we shouldn't seg fault,
> but I wonder if the real bug is that this can happen at all.

Unfortunately that critical state happens all the time since device_del
does an asynchronous ACPI call into the guest and libvirt isn't blocked
waiting for that to complete and doesn't poll to see if the device goes
away.  So it's actually pretty common today that the netdev disappears
before the device.  We talked about this in the community call on
Tuesday, and I think Michael is trying to think of a way to solve this,
perhaps by separating the guest releasing the device from the device
removal.

In the meantime, virtio-net has this hole that seems like it can be
avoided by simply checking some pointers on a slow path.  Since the
netdev has already disappeared, attempting to set features on it seems
pointless.  The change in the load function is really just a paranoia
check since it followed the same model of calling tap_*() funcs w/o
checking the value of nc.peer.  Thanks,

Alex
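
The check-before-use pattern described above can be sketched in isolation. This is a minimal illustration, not the QEMU code: `NetClient`, `ClientType`, `fake_tap_set_offload()` and `set_features_guarded()` are hypothetical stand-ins, and the assert in the stub only models the fact that the real tap_*() routines assume a valid peer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for QEMU's net client types. */
typedef enum { CLIENT_TYPE_NIC, CLIENT_TYPE_TAP } ClientType;

typedef struct NetClient {
    ClientType type;
    struct NetClient *peer;  /* NULL once the netdev has been deleted */
    int offload_calls;       /* counts calls that reached the backend */
} NetClient;

/* Stand-in for a tap_*() routine: it assumes nc is valid, so a NULL
 * peer must be filtered out by the caller. */
static void fake_tap_set_offload(NetClient *nc)
{
    assert(nc != NULL);
    nc->offload_calls++;
}

/* Returns true if a tap peer was present, false if the update was skipped. */
static bool set_features_guarded(NetClient *nic, bool has_vnet_hdr)
{
    /* Check the peer before the first tap call, not after it. */
    if (!nic->peer || nic->peer->type != CLIENT_TYPE_TAP) {
        return false;
    }
    if (has_vnet_hdr) {
        fake_tap_set_offload(nic->peer);
    }
    return true;
}
```

With a tap peer attached the call reaches the backend; once `peer` is cleared, the same call returns early without touching it.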

> > ---
> >
> >   hw/virtio-net.c |   10 +++++-----
> >   1 files changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> > index 0a9cae2..2c758ad 100644
> > --- a/hw/virtio-net.c
> > +++ b/hw/virtio-net.c
> > @@ -216,6 +216,10 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
> >
> >       n->mergeable_rx_bufs = !!(features & (1 << VIRTIO_NET_F_MRG_RXBUF));
> >
> > +    if (!n->nic->nc.peer ||
> > +        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
> > +        return;
> > +    }
> >       if (n->has_vnet_hdr) {
> >           tap_set_offload(n->nic->nc.peer,
> >                           (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
> > @@ -224,10 +228,6 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
> >                           (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
> >                           (features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
> >       }
> > -    if (!n->nic->nc.peer ||
> > -        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
> > -        return;
> > -    }
> >       if (!tap_get_vhost_net(n->nic->nc.peer)) {
> >           return;
> >       }
> > @@ -859,7 +859,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
> >               return -1;
> >           }
> >
> > -        if (n->has_vnet_hdr) {
> > +        if (n->nic->nc.peer && n->has_vnet_hdr) {
> >               tap_using_vnet_hdr(n->nic->nc.peer, 1);
> >               tap_set_offload(n->nic->nc.peer,
> >                       (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
> >
> >
> >    
> 


* Re: [Qemu-devel] [PATCH] virtio-net: Don't pass NULL peer to tap routines
  2010-09-23 18:25   ` Alex Williamson
@ 2010-09-24  9:31     ` Markus Armbruster
  2010-09-24 14:17       ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Markus Armbruster @ 2010-09-24  9:31 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, mst

Alex Williamson <alex.williamson@redhat.com> writes:

> On Thu, 2010-09-23 at 12:43 -0500, Anthony Liguori wrote:
>> On 09/22/2010 02:52 PM, Alex Williamson wrote:
>> > During a hotplug, the netdev might be removed before the

unplug?

>> > connected virtio device.  When this happens, the guest might
>> > be running cleanup operations that can trigger a segfault in
>> > qemu.  Avoid one set of these by checking whether the peer
>> > device is present before trying to do tap operations.
>> >
>> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>> >    
>> 
>> Can you explain this scenario a little better?
>> 
>> If nc.peer is NULL when set_features is called, it would seem to me like 
>> we're in a pretty critical state.  I agree that we shouldn't seg fault,
>> but I wonder if the real bug is that this can happen at all.
>
> Unfortunately that critical state happens all the time since device_del
> does an asynchronous ACPI call into the guest and libvirt isn't blocked
> waiting for that to complete and doesn't poll to see if the device goes
> away.  So it's actually pretty common today that the netdev disappears
> before the device.  We talked about this in the community call on
> Tuesday, and I think Michael is trying to think of a way to solve this,
> perhaps by separating the guest releasing the device from the device
> removal.
>
> In the meantime, virtio-net has this hole that seems like it can be
> avoided by simply checking some pointers on a slow path.  Since the
> netdev has already disappeared, attempting to set features on it seems
> pointless.  The change in the load function is really just a paranoia
> check since it followed the same model of calling tap_*() funcs w/o
> checking the value of nc.peer.  Thanks,

I figure we should either make netdev_del fail when the netdev is in
use, or make its users cope gracefully with the netdev going away (make
it look like somebody yanked the cable).
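
The first option (refuse the delete while the netdev is in use) could look roughly like the sketch below. It is an assumption-laden illustration only: `Backend` and `backend_del_strict()` are hypothetical names, and returning `-EBUSY` is one plausible way to report the failure, not something the thread specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a netdev backend and the device attached to it. */
typedef struct Backend {
    struct Backend *peer;  /* attached device, or NULL if unused */
    int deleted;           /* set once the backend has been torn down */
} Backend;

/* Strict variant: deleting a backend that still has a peer is an error,
 * so the caller has to remove the device first. */
static int backend_del_strict(Backend *b)
{
    assert(b != NULL);
    if (b->peer) {
        return -EBUSY;  /* still in use; leave everything untouched */
    }
    b->deleted = 1;
    return 0;
}
```

The cost of this variant is that management software has to wait for the guest to release the device before it can delete the netdev.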


* Re: [Qemu-devel] [PATCH] virtio-net: Don't pass NULL peer to tap routines
  2010-09-24  9:31     ` Markus Armbruster
@ 2010-09-24 14:17       ` Alex Williamson
  2010-09-26 11:57         ` Michael S. Tsirkin
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2010-09-24 14:17 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, mst

On Fri, 2010-09-24 at 11:31 +0200, Markus Armbruster wrote:
> Alex Williamson <alex.williamson@redhat.com> writes:
> 
> > On Thu, 2010-09-23 at 12:43 -0500, Anthony Liguori wrote:
> >> On 09/22/2010 02:52 PM, Alex Williamson wrote:
> >> > During a hotplug, the netdev might be removed before the
> 
> unplug?

yep

> >> > connected virtio device.  When this happens, the guest might
> >> > be running cleanup operations that can trigger a segfault in
> >> > qemu.  Avoid one set of these by checking whether the peer
> >> > device is present before trying to do tap operations.
> >> >
> >> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >> >    
> >> 
> >> Can you explain this scenario a little better?
> >> 
> >> If nc.peer is NULL when set_features is called, it would seem to me like 
> >> we're in a pretty critical state.  I agree that we shouldn't seg fault,
> >> but I wonder if the real bug is that this can happen at all.
> >
> > Unfortunately that critical state happens all the time since device_del
> > does an asynchronous ACPI call into the guest and libvirt isn't blocked
> > waiting for that to complete and doesn't poll to see if the device goes
> > away.  So it's actually pretty common today that the netdev disappears
> > before the device.  We talked about this in the community call on
> > Tuesday, and I think Michael is trying to think of a way to solve this,
> > perhaps by separating the guest releasing the device from the device
> > removal.
> >
> > > In the meantime, virtio-net has this hole that seems like it can be
> > avoided by simply checking some pointers on a slow path.  Since the
> > netdev has already disappeared, attempting to set features on it seems
> > pointless.  The change in the load function is really just a paranoia
> > check since it followed the same model of calling tap_*() funcs w/o
> > checking the value of nc.peer.  Thanks,
> 
> I figure we should either make netdev_del fail when the netdev is in
> use, or make its users cope gracefully with the netdev going away (make
> it look like somebody yanked the cable).

I'm not sure how useful it is, but I like the idea that we can swap the
netdev from under a running guest.  I believe this is possible with the
emulated drivers since they don't try to push features into the tap
device.  Perhaps something like you suggest where a netdev going away
sets a link down on the device.  If/when a netdev gets reattached, the
link returns and features are renegotiated.  Then we could move the
guest between NAT'd bridges and transparent bridges and it'd look like
we moved the network cable from one switch to another in the guest.

Alex
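
The "yanked cable" behaviour discussed above can be sketched as follows. This is a toy model under stated assumptions: `Nic`, `Netdev`, `nic_detach()`, `nic_attach()` and `nic_can_send()` are hypothetical names, and feature renegotiation on reattach is deliberately not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a NIC front end and the netdev backend behind it. */
typedef struct Nic Nic;
typedef struct Netdev { Nic *peer; } Netdev;
struct Nic { Netdev *peer; bool link_down; };

/* Backend goes away: break both peer pointers and report carrier loss. */
static void nic_detach(Nic *nic)
{
    if (nic->peer) {
        nic->peer->peer = NULL;
        nic->peer = NULL;
    }
    nic->link_down = true;
}

/* A (possibly different) backend is attached: the link comes back. */
static void nic_attach(Nic *nic, Netdev *nd)
{
    nic->peer = nd;
    nd->peer = nic;
    nic->link_down = false;
}

/* Transmit paths consult this instead of dereferencing peer blindly. */
static bool nic_can_send(const Nic *nic)
{
    return nic->peer != NULL && !nic->link_down;
}
```

Swapping one backend for another is then a detach followed by an attach, which the guest would observe as the link dropping and returning.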


* Re: [Qemu-devel] [PATCH] virtio-net: Don't pass NULL peer to tap routines
  2010-09-24 14:17       ` Alex Williamson
@ 2010-09-26 11:57         ` Michael S. Tsirkin
  0 siblings, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2010-09-26 11:57 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Markus Armbruster, qemu-devel

On Fri, Sep 24, 2010 at 08:17:09AM -0600, Alex Williamson wrote:
> On Fri, 2010-09-24 at 11:31 +0200, Markus Armbruster wrote:
> > Alex Williamson <alex.williamson@redhat.com> writes:
> > 
> > > On Thu, 2010-09-23 at 12:43 -0500, Anthony Liguori wrote:
> > >> On 09/22/2010 02:52 PM, Alex Williamson wrote:
> > >> > During a hotplug, the netdev might be removed before the
> > 
> > unplug?
> 
> yep
> 
> > >> > connected virtio device.  When this happens, the guest might
> > >> > be running cleanup operations that can trigger a segfault in
> > >> > qemu.  Avoid one set of these by checking whether the peer
> > >> > device is present before trying to do tap operations.
> > >> >
> > >> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > >> >    
> > >> 
> > >> Can you explain this scenario a little better?
> > >> 
> > >> If nc.peer is NULL when set_features is called, it would seem to me like 
> > >> we're in a pretty critical state.  I agree that we shouldn't seg fault,
> > >> but I wonder if the real bug is that this can happen at all.
> > >
> > > Unfortunately that critical state happens all the time since device_del
> > > does an asynchronous ACPI call into the guest and libvirt isn't blocked
> > > waiting for that to complete and doesn't poll to see if the device goes
> > > away.  So it's actually pretty common today that the netdev disappears
> > > before the device.  We talked about this in the community call on
> > > Tuesday, and I think Michael is trying to think of a way to solve this,
> > > perhaps by separating the guest releasing the device from the device
> > > removal.
> > >
> > > > In the meantime, virtio-net has this hole that seems like it can be
> > > avoided by simply checking some pointers on a slow path.  Since the
> > > netdev has already disappeared, attempting to set features on it seems
> > > pointless.  The change in the load function is really just a paranoia
> > > check since it followed the same model of calling tap_*() funcs w/o
> > > checking the value of nc.peer.  Thanks,
> > 
> > I figure we should either make netdev_del fail when the netdev is in
> > use, or make its users cope gracefully with the netdev going away (make
> > it look like somebody yanked the cable).
> 
> I'm not sure how useful it is, but I like the idea that we can swap the
> netdev from under a running guest.  I believe this is possible with the
> emulated drivers since they don't try to push features into the tap
> device.  Perhaps something like you suggest where a netdev going away
> sets a link down on the device.  If/when a netdev gets reattached, the
> link returns and features are renegotiated.

Existing guests don't renegotiate the features on link change though,
do they? To renegotiate features, it's probably cleaner
to have an event for this, not reuse link state change.

> Then we could move the guest between NAT'd bridges and transparent
> bridges and it'd look like we moved the network cable from one switch
> to another in the guest.
> 
> Alex

This last might in fact be possible without feature renegotiation
since both backends are tap.

-- 
MST

