Netdev List


* Re: [PATCH] bnx2x: dont dereference tcp header on non tcp frames
From: David Miller @ 2011-04-20  8:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: dmitry, eilong, netdev
In-Reply-To: <1303284713.3186.13.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Apr 2011 09:31:53 +0200

> [PATCH] bnx2x: dont dereference tcp header on non tcp frames
> 
> bnx2x_set_pbd_csum() & bnx2x_set_pbd_csum_e2() use 
> tcp_hdrlen(skb) even for non TCP frames
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Broadcom folks please review.
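
For readers following the thread, here is a minimal sketch of the kind of guard the patch description implies. It is not the bnx2x code: `l4_header_len`, `PROTO_TCP` and `PROTO_UDP` are hypothetical names, and the raw byte access stands in for the kernel's `tcp_hdrlen(skb)`. The point is only that the TCP data-offset field is read when the frame is actually TCP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers; the enum names are illustrative. */
enum { PROTO_TCP = 6, PROTO_UDP = 17 };

/*
 * Return the transport header length in bytes.
 * l4 points at the first byte of the transport header.
 */
size_t l4_header_len(uint8_t ip_proto, const uint8_t *l4)
{
    assert(l4 != NULL);

    if (ip_proto == PROTO_TCP) {
        /* TCP data offset: upper 4 bits of byte 12, in 32-bit words. */
        return (size_t)(l4[12] >> 4) * 4;
    }
    if (ip_proto == PROTO_UDP) {
        /* A UDP header is always 8 bytes; byte 12 would be payload. */
        return 8;
    }
    /* Unknown protocol: do not guess a header length. */
    return 0;
}
```

Reading byte 12 of a UDP header would interpret payload as a TCP data offset, which is the class of problem the patch title describes.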


* Re: [PATCH] bonding: 802.3ad - fix agg_device_up
From: David Miller @ 2011-04-20  8:45 UTC (permalink / raw)
  To: fubar; +Cc: jbohac, netdev, andy, shemminger
In-Reply-To: <21326.1303232103@death>

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Tue, 19 Apr 2011 09:55:03 -0700

> Jiri Bohac <jbohac@suse.cz> wrote:
> 
>>The slave member of struct aggregator does not necessarily point
>>to a slave which is part of the aggregator. It points to the
>>slave structure containing the aggregator structure, while
>>completely different slaves (or no slaves at all) may be part of
>>the aggregator.
>>
>>The agg_device_up() function wrongly uses agg->slave to find the state of the
>>aggregator.  Use agg->lag_ports->slave instead. The bug has been
>>introduced by commit 4cd6fe1c6483cde93e2ec91f58b7af9c9eea51ad.
>>
>>Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> 
> 	One additional note: port->slave can be NULL when the slave is
> transitioning in or out of the bond, but not when the port is part of an
> aggregator, so this usage should be safe.
> 
> 	-J
> 
> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

Applied and queued up for -stable, thanks.
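
The distinction Jiri describes can be illustrated with a small sketch. The structures below are simplified, hypothetical stand-ins for the bonding driver's types (only the member names `slave` and `lag_ports` come from the patch description), and `agg_first_port_up` is an illustrative name, not the driver's function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the real bonding structures have many more fields. */
struct slave {
    bool up;                 /* stands in for "device running and carrier ok" */
};

struct port {
    struct slave *slave;     /* slave this port belongs to */
};

struct aggregator {
    struct slave *slave;     /* slave whose structure embeds this aggregator */
    struct port *lag_ports;  /* first port actually in the aggregator, or NULL */
};

/* Judge the aggregator by a member port, not by the embedding slave. */
bool agg_first_port_up(const struct aggregator *agg)
{
    assert(agg != NULL);

    const struct port *port = agg->lag_ports;
    if (port == NULL || port->slave == NULL) {
        return false;        /* no member ports: nothing can be up */
    }
    return port->slave->up;
}
```

With an aggregator embedded in a slave that is up but whose only member port belongs to a slave that is down, checking `agg->slave` would report "up" while the sketch above reports "down".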


* Re: [PATCH] ehea: Fix a DLPAR bug on ehea_rereg_mrs().
From: David Miller @ 2011-04-20  8:42 UTC (permalink / raw)
  To: leitao; +Cc: netdev
In-Reply-To: <1303241962-32285-1-git-send-email-leitao@linux.vnet.ibm.com>

From: Breno Leitao <leitao@linux.vnet.ibm.com>
Date: Tue, 19 Apr 2011 16:39:22 -0300

> We are currently continuing if ehea_restart_qps() fails, when we
> do a memory DLPAR (remove or add more memory to the system).
> 
> This patch just leaves NAPI disabled if ehea_restart_qps()
> fails.
> 
> Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>

Applied, thanks.


* Re: A patch you wrote some time ago (aka: "[patch 41/54] ICMP: Fix icmp_errors_use_inbound_ifaddr sysctl")
From: Alexander Hoogerhuis @ 2011-04-20  8:38 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Chris Wright, linux-kernel, netdev
In-Reply-To: <4DAE9824.10802@trash.net>

On 20.04.2011 10:24, Patrick McHardy wrote:
>
> That might be a possibility to fix this for your case. But I'm
> wondering why you're turning this on at all and not have routing
> decide the correct source address?

Not a whole lot of tuning, but trying to figure out why this does not work
the way any other VRRP implementation works on other routers/OSes.

My case seems to be a general problem for ICMP errors, as the IP stack 
tends to want to listen more to advice coming back with the source IP of 
the gateway, not a third party.

If you have two machines (A and B) run VRRP and share an IP (C), then 
any ICMP redirect should have the VRRP IP as source (C), and the way it 
works today (with or without sysctl_icmp_errors_use_inbound_ifaddr) is 
that it will have the source set to the primary IP of the source interface.

I suspect this holds for any other ICMP message sent back to hosts in 
the connected network as well, such as PMTU-related issues, etc.

In my case nodes in the connected subnet would get ICMP redirects from
the primary IPs, and thus not listen to them as they are arriving from
nodes not listed in the list of known gateways.

It would make more sense if, when returning ICMP messages, the source IP
were the actual IP the packet was received on, not the primary IP of the
interface.

mvh,
A
-- 
Alexander Hoogerhuis | http://no.linkedin.com/in/alexh
Boxed Solutions AS   | +47 908 21 485 - alexh@boxed.no
"Given enough eyeballs, all bugs are shallow." -Eric S. Raymond


* [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)
From: Jason Wang @ 2011-04-20  8:33 UTC (permalink / raw)
  To: krkumar2, kvm, mst, netdev, rusty, qemu-devel, anthony

Inspired by Krishna's patch (http://www.spinics.net/lists/kvm/msg52098.html) and
Michael's suggestions.  The following series adds multiqueue support for
qemu and enables it for virtio-net (both userspace and vhost).

The aim of this series is to simplify the management and achieve the same
performance with less code.

The differences between this series and Krishna's are as follows:

- Add multiqueue support for qemu and also for userspace virtio-net.
- Instead of hacking the vhost module to manipulate kthreads, this patch just
  implements userspace-based multiqueue and can thus reuse the existing vhost
  kernel-side code without any modification.
- Use a 1:1 mapping between TX/RX pairs and vhost kthreads, because the
  implementation is based in userspace.
- The CLI is also changed to make management easier: the -netdev option of qdev
  can now accept more than one id. You can start a multiqueue virtio-net device
  through:
  ./qemu-system-x86_64 -netdev tap,id=hn0,vhost=on,fd=X \
      -netdev tap,id=hn1,vhost=on,fd=Y \
      -device virtio-net-pci,netdev=hn0#hn1,queues=2 ...

The series is very primitive and still needs polishing.

Suggestions are welcome.
---

Jason Wang (2):
      net: Add multiqueue support
      virtio-net: add multiqueue support


 hw/qdev-properties.c |   37 ++++-
 hw/qdev.h            |    3 
 hw/vhost.c           |   26 ++-
 hw/vhost.h           |    1 
 hw/vhost_net.c       |    7 +
 hw/vhost_net.h       |    2 
 hw/virtio-net.c      |  409 ++++++++++++++++++++++++++++++++------------------
 hw/virtio-net.h      |    2 
 hw/virtio-pci.c      |    1 
 hw/virtio.h          |    1 
 net.c                |   34 +++-
 net.h                |   15 +-
 12 files changed, 353 insertions(+), 185 deletions(-)

-- 
Jason Wang


* Re: [PATCH] net: tun: convert to hw_features
From: David Miller @ 2011-04-20  8:32 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, rusty
In-Reply-To: <20110419161310.7508513909@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Tue, 19 Apr 2011 18:13:10 +0200 (CEST)

> This changes offload setting behaviour to what I think is correct:
>  - offloads set via ethtool mean what admin wants to use (by default
>    he wants 'em all)
>  - offloads set via ioctl() mean what userspace is expecting to get
>    (this limits which admin wishes are granted)
>  - TUN_NOCHECKSUM is ignored, as it might cause broken packets when
>    forwarded (ip_summed == CHECKSUM_UNNECESSARY means that checksum
>    was verified, not that it can be ignored)
> 
> If TUN_NOCHECKSUM is implemented, it should set skb->csum_* and
> skb->ip_summed (= CHECKSUM_PARTIAL) for known protocols and let others
> be verified by kernel when necessary.
> 
> TUN_NOCHECKSUM handling was introduced by commit
> f43798c27684ab925adde7d8acc34c78c6e50df8:
> 
>     tun: Allow GSO using virtio_net_hdr
>     
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.
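
As a reading aid, the offload policy in the quoted description (ethtool expresses what the admin wants, the ioctl() expresses what userspace accepts, and the latter limits the former) amounts to a bitwise intersection. The sketch below assumes hypothetical flag names and values; they are not the kernel's NETIF_F_* constants and this is not the tun driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offload bits, for illustration only. */
enum {
    OFF_CSUM = 1u << 0,
    OFF_TSO4 = 1u << 1,
    OFF_TSO6 = 1u << 2,
    OFF_UFO  = 1u << 3,
};

/*
 * admin_wanted: offloads requested via ethtool (by default, all of them).
 * user_accepts: offloads the userspace end declared it can handle via ioctl().
 * An offload is granted only if both sides agree.
 */
uint32_t effective_offloads(uint32_t admin_wanted, uint32_t user_accepts)
{
    uint32_t granted = admin_wanted & user_accepts;

    /* Invariant: nothing is granted that userspace did not accept. */
    assert((granted & ~user_accepts) == 0);
    return granted;
}
```

For example, an admin who wants every offload combined with a userspace program that accepts only checksum and TSO4 ends up with checksum and TSO4.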


* Re: [PATCH v2] net: qlcnic: convert to hw_features
From: David Miller @ 2011-04-20  8:32 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, amit.salecha, anirban.chakraborty, linux-driver
In-Reply-To: <20110419130357.40FA813909@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Tue, 19 Apr 2011 15:03:57 +0200 (CEST)

> Bit more than minimal conversion. There might be some issues because
> of qlcnic_set_netdev_features() if it's called after netdev init.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---
> v2: fix compilation issues (I forgot to 'stg refresh' before sending v1)

Applied.


* [RFC PATCH 2/2] virtio-net: add multiqueue support
From: Jason Wang @ 2011-04-20  8:33 UTC (permalink / raw)
  To: krkumar2, kvm, mst, netdev, rusty, qemu-devel, anthony
In-Reply-To: <20110420082706.32157.59668.stgit@dhcp-91-7.nay.redhat.com.englab.nay.redhat.com>

This patch adds multiqueue capability to virtio-net for both userspace and
vhost. With this patch, the kernel-side vhost can be reused without
modification to support multiqueue virtio-net NICs.
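
The index arithmetic the diff relies on can be sketched as follows. This is a reading aid, not code from the patch: the function names are hypothetical, and the sketch assumes each queue pair owns two consecutive virtqueues, which is what `queue_index * 2` and `nvqs = 2` in the diff suggest.

```c
#include <assert.h>

/* Assumption: one RX and one TX virtqueue per queue pair. */
enum { VQS_PER_PAIR = 2 };

/* First device-wide virtqueue index owned by a queue pair
 * (the start_idx passed to vhost_net_start() in the diff). */
int pair_start_idx(int queue_index)
{
    assert(queue_index >= 0);
    return queue_index * VQS_PER_PAIR;
}

/* Index that one vhost instance uses for a device-wide virtqueue index
 * (the "idx % dev->nvqs" expression in the diff). */
int vhost_local_idx(int device_idx, int nvqs)
{
    assert(device_idx >= 0 && nvqs > 0);
    return device_idx % nvqs;
}
```

Under these assumptions, queue pair 1 starts at device-wide index 2, and each vhost instance still sees its two rings as local indices 0 and 1, so the kernel side needs no knowledge of the other pairs.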

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/vhost.c      |   26 ++-
 hw/vhost.h      |    1 
 hw/vhost_net.c  |    7 +
 hw/vhost_net.h  |    2 
 hw/virtio-net.c |  409 +++++++++++++++++++++++++++++++++++--------------------
 hw/virtio-net.h |    2 
 hw/virtio-pci.c |    1 
 hw/virtio.h     |    1 
 8 files changed, 284 insertions(+), 165 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 14b571d..2301d53 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -450,10 +450,10 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
     target_phys_addr_t s, l, a;
     int r;
     struct vhost_vring_file file = {
-        .index = idx,
+        .index = idx % dev->nvqs,
     };
     struct vhost_vring_state state = {
-        .index = idx,
+        .index = idx % dev->nvqs,
     };
     struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
 
@@ -504,12 +504,13 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
         goto fail_alloc_ring;
     }
 
-    r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
+    r = vhost_virtqueue_set_addr(dev, vq, idx % dev->nvqs, dev->log_enabled);
     if (r < 0) {
         r = -errno;
         goto fail_alloc;
     }
     r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, true);
+
     if (r < 0) {
         fprintf(stderr, "Error binding host notifier: %d\n", -r);
         goto fail_host_notifier;
@@ -557,7 +558,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
                                     unsigned idx)
 {
     struct vhost_vring_state state = {
-        .index = idx,
+        .index = idx % dev->nvqs,
     };
     int r;
     r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, false);
@@ -648,10 +649,13 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         goto fail;
     }
 
-    r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
-    if (r < 0) {
-        fprintf(stderr, "Error binding guest notifier: %d\n", -r);
-        goto fail_notifiers;
+    for (i = 0; i < hdev->nvqs; i++) {
+        r = vdev->binding->set_guest_notifier(vdev->binding_opaque,
+                                              hdev->start_idx + i, true);
+        if (r < 0) {
+            fprintf(stderr, "Error binding guest notifier: %d\n", -r);
+            goto fail_notifiers;
+        }
     }
 
     r = vhost_dev_set_features(hdev, hdev->log_enabled);
@@ -667,7 +671,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         r = vhost_virtqueue_init(hdev,
                                  vdev,
                                  hdev->vqs + i,
-                                 i);
+                                 hdev->start_idx + i);
         if (r < 0) {
             goto fail_vq;
         }
@@ -694,7 +698,7 @@ fail_vq:
         vhost_virtqueue_cleanup(hdev,
                                 vdev,
                                 hdev->vqs + i,
-                                i);
+                                hdev->start_idx + i);
     }
 fail_mem:
 fail_features:
@@ -712,7 +716,7 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
         vhost_virtqueue_cleanup(hdev,
                                 vdev,
                                 hdev->vqs + i,
-                                i);
+                                hdev->start_idx + i);
     }
     vhost_client_sync_dirty_bitmap(&hdev->client, 0,
                                    (target_phys_addr_t)~0x0ull);
diff --git a/hw/vhost.h b/hw/vhost.h
index c8c595a..48b9478 100644
--- a/hw/vhost.h
+++ b/hw/vhost.h
@@ -31,6 +31,7 @@ struct vhost_dev {
     struct vhost_memory *mem;
     struct vhost_virtqueue *vqs;
     int nvqs;
+    int start_idx;
     unsigned long long features;
     unsigned long long acked_features;
     unsigned long long backend_features;
diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index 420e05f..7fc87f8 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -128,7 +128,8 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
 }
 
 int vhost_net_start(struct vhost_net *net,
-                    VirtIODevice *dev)
+                    VirtIODevice *dev,
+                    int start_idx)
 {
     struct vhost_vring_file file = { };
     int r;
@@ -139,6 +140,7 @@ int vhost_net_start(struct vhost_net *net,
 
     net->dev.nvqs = 2;
     net->dev.vqs = net->vqs;
+    net->dev.start_idx = start_idx;
     r = vhost_dev_start(&net->dev, dev);
     if (r < 0) {
         return r;
@@ -206,7 +208,8 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
 }
 
 int vhost_net_start(struct vhost_net *net,
-		    VirtIODevice *dev)
+                    VirtIODevice *dev,
+                    int start_idx)
 {
     return -ENOSYS;
 }
diff --git a/hw/vhost_net.h b/hw/vhost_net.h
index 91e40b1..79a4f09 100644
--- a/hw/vhost_net.h
+++ b/hw/vhost_net.h
@@ -9,7 +9,7 @@ typedef struct vhost_net VHostNetState;
 VHostNetState *vhost_net_init(VLANClientState *backend, int devfd, bool force);
 
 bool vhost_net_query(VHostNetState *net, VirtIODevice *dev);
-int vhost_net_start(VHostNetState *net, VirtIODevice *dev);
+int vhost_net_start(VHostNetState *net, VirtIODevice *dev, int start_idx);
 void vhost_net_stop(VHostNetState *net, VirtIODevice *dev);
 
 void vhost_net_cleanup(VHostNetState *net);
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 6997e02..b4430b7 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -26,26 +26,35 @@
 #define MAC_TABLE_ENTRIES    64
 #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
 
-typedef struct VirtIONet
+struct VirtIONet;
+
+typedef struct VirtIONetQueue
 {
-    VirtIODevice vdev;
-    uint8_t mac[ETH_ALEN];
-    uint16_t status;
     VirtQueue *rx_vq;
     VirtQueue *tx_vq;
-    VirtQueue *ctrl_vq;
-    NICState *nic;
     QEMUTimer *tx_timer;
     QEMUBH *tx_bh;
     uint32_t tx_timeout;
-    int32_t tx_burst;
     int tx_waiting;
-    uint32_t has_vnet_hdr;
-    uint8_t has_ufo;
     struct {
         VirtQueueElement elem;
         ssize_t len;
     } async_tx;
+    struct VirtIONet *n;
+    uint8_t vhost_started;
+} VirtIONetQueue;
+
+typedef struct VirtIONet
+{
+    VirtIODevice vdev;
+    uint8_t mac[ETH_ALEN];
+    uint16_t status;
+    VirtIONetQueue vqs[MAX_QUEUE_NUM];
+    VirtQueue *ctrl_vq;
+    NICState *nic;
+    int32_t tx_burst;
+    uint32_t has_vnet_hdr;
+    uint8_t has_ufo;
     int mergeable_rx_bufs;
     uint8_t promisc;
     uint8_t allmulti;
@@ -53,7 +62,6 @@ typedef struct VirtIONet
     uint8_t nomulti;
     uint8_t nouni;
     uint8_t nobcast;
-    uint8_t vhost_started;
     struct {
         int in_use;
         int first_multi;
@@ -63,6 +71,7 @@ typedef struct VirtIONet
     } mac_table;
     uint32_t *vlans;
     DeviceState *qdev;
+    uint32_t queues;
 } VirtIONet;
 
 /* TODO
@@ -74,12 +83,25 @@ static VirtIONet *to_virtio_net(VirtIODevice *vdev)
     return (VirtIONet *)vdev;
 }
 
+static int vq_get_pair_index(VirtIONet *n, VirtQueue *vq)
+{
+    int i;
+    for (i = 0; i < n->queues; i++) {
+        if (n->vqs[i].tx_vq == vq || n->vqs[i].rx_vq == vq) {
+            return i;
+        }
+    }
+    assert(0);
+    return -1;
+}
+
 static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VirtIONet *n = to_virtio_net(vdev);
     struct virtio_net_config netcfg;
 
     stw_p(&netcfg.status, n->status);
+    netcfg.queues = n->queues * 2;
     memcpy(netcfg.mac, n->mac, ETH_ALEN);
     memcpy(config, &netcfg, sizeof(netcfg));
 }
@@ -103,75 +125,104 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
         (n->status & VIRTIO_NET_S_LINK_UP) && n->vdev.vm_running;
 }
 
-static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
+static void nc_vhost_status(VLANClientState *nc, VirtIONet *n,
+                            uint8_t status)
 {
-    if (!n->nic->nc.peer) {
+    int queue_index = nc->queue_index;
+    VLANClientState *peer = nc->peer;
+    VirtIONetQueue *netq = &n->vqs[nc->queue_index];
+
+    if (!peer) {
         return;
     }
-    if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+    if (peer->info->type != NET_CLIENT_TYPE_TAP) {
         return;
     }
 
-    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+    if (!tap_get_vhost_net(peer)) {
         return;
     }
-    if (!!n->vhost_started == virtio_net_started(n, status) &&
-                              !n->nic->nc.peer->link_down) {
+    if (!!netq->vhost_started == virtio_net_started(n, status) &&
+                                 !peer->link_down) {
         return;
     }
-    if (!n->vhost_started) {
+    if (!netq->vhost_started) {
         int r;
-        if (!vhost_net_query(tap_get_vhost_net(n->nic->nc.peer), &n->vdev)) {
+        if (!vhost_net_query(tap_get_vhost_net(peer), &n->vdev)) {
             return;
         }
-        r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
+        r = vhost_net_start(tap_get_vhost_net(peer), &n->vdev, queue_index * 2);
         if (r < 0) {
             error_report("unable to start vhost net: %d: "
                          "falling back on userspace virtio", -r);
         } else {
-            n->vhost_started = 1;
+            netq->vhost_started = 1;
         }
     } else {
-        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
-        n->vhost_started = 0;
+        vhost_net_stop(tap_get_vhost_net(peer), &n->vdev);
+        netq->vhost_started = 0;
+    }
+}
+
+static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
+{
+    int i;
+    for (i = 0; i < n->queues; i++) {
+        nc_vhost_status(n->nic->ncs[i], n, status);
     }
 }
 
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    int i;
 
     virtio_net_vhost_status(n, status);
 
-    if (!n->tx_waiting) {
-        return;
-    }
+    for (i = 0; i < n->queues; i++) {
+        VirtIONetQueue *netq = &n->vqs[i];
+        if (!netq->tx_waiting) {
+            continue;
+        }
 
-    if (virtio_net_started(n, status) && !n->vhost_started) {
-        if (n->tx_timer) {
-            qemu_mod_timer(n->tx_timer,
-                           qemu_get_clock_ns(vm_clock) + n->tx_timeout);
+        if (virtio_net_started(n, status) && !netq->vhost_started) {
+            if (netq->tx_timer) {
+                qemu_mod_timer(netq->tx_timer,
+                               qemu_get_clock_ns(vm_clock) + netq->tx_timeout);
+            } else {
+                qemu_bh_schedule(netq->tx_bh);
+            }
         } else {
-            qemu_bh_schedule(n->tx_bh);
+            if (netq->tx_timer) {
+                qemu_del_timer(netq->tx_timer);
+            } else {
+                qemu_bh_cancel(netq->tx_bh);
+            }
         }
-    } else {
-        if (n->tx_timer) {
-            qemu_del_timer(n->tx_timer);
-        } else {
-            qemu_bh_cancel(n->tx_bh);
+    }
+}
+
+static bool virtio_net_is_link_up(VirtIONet *n)
+{
+    int i;
+    for (i = 0; i < n->queues; i++) {
+        if (n->nic->ncs[i]->link_down) {
+            return false;
         }
     }
+    return true;
 }
 
 static void virtio_net_set_link_status(VLANClientState *nc)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = ((NICState *)(nc->opaque))->opaque;
     uint16_t old_status = n->status;
 
-    if (nc->link_down)
+    if (!virtio_net_is_link_up(n)) {
         n->status &= ~VIRTIO_NET_S_LINK_UP;
-    else
+    } else {
         n->status |= VIRTIO_NET_S_LINK_UP;
+    }
 
     if (n->status != old_status)
         virtio_notify_config(&n->vdev);
@@ -202,13 +253,15 @@ static void virtio_net_reset(VirtIODevice *vdev)
 
 static int peer_has_vnet_hdr(VirtIONet *n)
 {
-    if (!n->nic->nc.peer)
+    if (!n->nic->ncs[0]->peer) {
         return 0;
+    }
 
-    if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP)
+    if (n->nic->ncs[0]->peer->info->type != NET_CLIENT_TYPE_TAP) {
         return 0;
+    }
 
-    n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->nc.peer);
+    n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->ncs[0]->peer);
 
     return n->has_vnet_hdr;
 }
@@ -218,7 +271,7 @@ static int peer_has_ufo(VirtIONet *n)
     if (!peer_has_vnet_hdr(n))
         return 0;
 
-    n->has_ufo = tap_has_ufo(n->nic->nc.peer);
+    n->has_ufo = tap_has_ufo(n->nic->ncs[0]->peer);
 
     return n->has_ufo;
 }
@@ -228,9 +281,13 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
     VirtIONet *n = to_virtio_net(vdev);
 
     features |= (1 << VIRTIO_NET_F_MAC);
+    features |= (1 << VIRTIO_NET_F_MULTIQUEUE);
 
     if (peer_has_vnet_hdr(n)) {
-        tap_using_vnet_hdr(n->nic->nc.peer, 1);
+        int i;
+        for (i = 0; i < n->queues; i++) {
+            tap_using_vnet_hdr(n->nic->ncs[i]->peer, 1);
+        }
     } else {
         features &= ~(0x1 << VIRTIO_NET_F_CSUM);
         features &= ~(0x1 << VIRTIO_NET_F_HOST_TSO4);
@@ -248,14 +305,15 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
         features &= ~(0x1 << VIRTIO_NET_F_HOST_UFO);
     }
 
-    if (!n->nic->nc.peer ||
-        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+    if (!n->nic->ncs[0]->peer ||
+        n->nic->ncs[0]->peer->info->type != NET_CLIENT_TYPE_TAP) {
         return features;
     }
-    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+    if (!tap_get_vhost_net(n->nic->ncs[0]->peer)) {
         return features;
     }
-    return vhost_net_get_features(tap_get_vhost_net(n->nic->nc.peer), features);
+    return vhost_net_get_features(tap_get_vhost_net(n->nic->ncs[0]->peer),
+                                  features);
 }
 
 static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
@@ -276,25 +334,29 @@ static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
 static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    int i;
 
     n->mergeable_rx_bufs = !!(features & (1 << VIRTIO_NET_F_MRG_RXBUF));
 
-    if (n->has_vnet_hdr) {
-        tap_set_offload(n->nic->nc.peer,
-                        (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
-                        (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
-                        (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
-                        (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
-                        (features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
-    }
-    if (!n->nic->nc.peer ||
-        n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
-        return;
-    }
-    if (!tap_get_vhost_net(n->nic->nc.peer)) {
-        return;
+    for (i = 0; i < n->queues; i++) {
+        if (n->has_vnet_hdr) {
+            tap_set_offload(n->nic->ncs[i]->peer,
+                            (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
+                            (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
+                            (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
+                            (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
+                            (features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
+        }
+        if (!n->nic->ncs[i]->peer ||
+            n->nic->ncs[i]->peer->info->type != NET_CLIENT_TYPE_TAP) {
+            continue;
+        }
+        if (!tap_get_vhost_net(n->nic->ncs[i]->peer)) {
+            continue;
+        }
+        vhost_net_ack_features(tap_get_vhost_net(n->nic->ncs[i]->peer),
+                               features);
     }
-    vhost_net_ack_features(tap_get_vhost_net(n->nic->nc.peer), features);
 }
 
 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
@@ -446,7 +508,7 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
 
-    qemu_flush_queued_packets(&n->nic->nc);
+    qemu_flush_queued_packets(n->nic->ncs[vq_get_pair_index(n, vq)]);
 
     /* We now have RX buffers, signal to the IO thread to break out of the
      * select to re-poll the tap file descriptor */
@@ -455,36 +517,36 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
 
 static int virtio_net_can_receive(VLANClientState *nc)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    int queue_index = nc->queue_index;
+    VirtIONet *n = ((NICState *)nc->opaque)->opaque;
+
     if (!n->vdev.vm_running) {
         return 0;
     }
 
-    if (!virtio_queue_ready(n->rx_vq) ||
+    if (!virtio_queue_ready(n->vqs[queue_index].rx_vq) ||
         !(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
         return 0;
 
     return 1;
 }
 
-static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
+static int virtio_net_has_buffers(VirtIONet *n, int bufsize, VirtQueue *vq)
 {
-    if (virtio_queue_empty(n->rx_vq) ||
-        (n->mergeable_rx_bufs &&
-         !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
-        virtio_queue_set_notification(n->rx_vq, 1);
+    if (virtio_queue_empty(vq) || (n->mergeable_rx_bufs &&
+        !virtqueue_avail_bytes(vq, bufsize, 0))) {
+        virtio_queue_set_notification(vq, 1);
 
         /* To avoid a race condition where the guest has made some buffers
          * available after the above check but before notification was
          * enabled, check for available buffers again.
          */
-        if (virtio_queue_empty(n->rx_vq) ||
-            (n->mergeable_rx_bufs &&
-             !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
+        if (virtio_queue_empty(vq) || (n->mergeable_rx_bufs &&
+             !virtqueue_avail_bytes(vq, bufsize, 0)))
             return 0;
     }
 
-    virtio_queue_set_notification(n->rx_vq, 0);
+    virtio_queue_set_notification(vq, 0);
     return 1;
 }
 
@@ -595,11 +657,13 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
 
 static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    int queue_index = nc->queue_index;
+    VirtIONet *n = ((NICState *)(nc->opaque))->opaque;
+    VirtQueue *vq = n->vqs[queue_index].rx_vq;
     struct virtio_net_hdr_mrg_rxbuf *mhdr = NULL;
     size_t guest_hdr_len, offset, i, host_hdr_len;
 
-    if (!virtio_net_can_receive(&n->nic->nc))
+    if (!virtio_net_can_receive(n->nic->ncs[queue_index]))
         return -1;
 
     /* hdr_len refers to the header we supply to the guest */
@@ -608,7 +672,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
 
 
     host_hdr_len = n->has_vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
-    if (!virtio_net_has_buffers(n, size + guest_hdr_len - host_hdr_len))
+    if (!virtio_net_has_buffers(n, size + guest_hdr_len - host_hdr_len, vq))
         return 0;
 
     if (!receive_filter(n, buf, size))
@@ -623,7 +687,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
 
         total = 0;
 
-        if (virtqueue_pop(n->rx_vq, &elem) == 0) {
+        if (virtqueue_pop(vq, &elem) == 0) {
             if (i == 0)
                 return -1;
             error_report("virtio-net unexpected empty queue: "
@@ -675,47 +739,50 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
         }
 
         /* signal other side */
-        virtqueue_fill(n->rx_vq, &elem, total, i++);
+        virtqueue_fill(vq, &elem, total, i++);
     }
 
     if (mhdr) {
         stw_p(&mhdr->num_buffers, i);
     }
 
-    virtqueue_flush(n->rx_vq, i);
-    virtio_notify(&n->vdev, n->rx_vq);
+    virtqueue_flush(vq, i);
+    virtio_notify(&n->vdev, vq);
 
     return size;
 }
 
-static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq);
+static int32_t virtio_net_flush_tx(VirtIONet *n, VirtIONetQueue *tvq);
 
 static void virtio_net_tx_complete(VLANClientState *nc, ssize_t len)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = ((NICState *)nc->opaque)->opaque;
+    VirtIONetQueue *netq = &n->vqs[nc->queue_index];
 
-    virtqueue_push(n->tx_vq, &n->async_tx.elem, n->async_tx.len);
-    virtio_notify(&n->vdev, n->tx_vq);
+    virtqueue_push(netq->tx_vq, &netq->async_tx.elem, netq->async_tx.len);
+    virtio_notify(&n->vdev, netq->tx_vq);
 
-    n->async_tx.elem.out_num = n->async_tx.len = 0;
+    netq->async_tx.elem.out_num = netq->async_tx.len = 0;
 
-    virtio_queue_set_notification(n->tx_vq, 1);
-    virtio_net_flush_tx(n, n->tx_vq);
+    virtio_queue_set_notification(netq->tx_vq, 1);
+    virtio_net_flush_tx(n, netq);
 }
 
 /* TX */
-static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
+static int32_t virtio_net_flush_tx(VirtIONet *n, VirtIONetQueue *netq)
 {
     VirtQueueElement elem;
     int32_t num_packets = 0;
+    VirtQueue *vq = netq->tx_vq;
+
     if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
         return num_packets;
     }
 
     assert(n->vdev.vm_running);
 
-    if (n->async_tx.elem.out_num) {
-        virtio_queue_set_notification(n->tx_vq, 0);
+    if (netq->async_tx.elem.out_num) {
+        virtio_queue_set_notification(vq, 0);
         return num_packets;
     }
 
@@ -747,12 +814,12 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
             len += hdr_len;
         }
 
-        ret = qemu_sendv_packet_async(&n->nic->nc, out_sg, out_num,
-                                      virtio_net_tx_complete);
+        ret = qemu_sendv_packet_async(n->nic->ncs[vq_get_pair_index(n, vq)],
+                                      out_sg, out_num, virtio_net_tx_complete);
         if (ret == 0) {
-            virtio_queue_set_notification(n->tx_vq, 0);
-            n->async_tx.elem = elem;
-            n->async_tx.len  = len;
+            virtio_queue_set_notification(vq, 0);
+            netq->async_tx.elem = elem;
+            netq->async_tx.len  = len;
             return -EBUSY;
         }
 
@@ -771,22 +838,23 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
 static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    VirtIONetQueue *netq = &n->vqs[vq_get_pair_index(n, vq)];
 
     /* This happens when device was stopped but VCPU wasn't. */
     if (!n->vdev.vm_running) {
-        n->tx_waiting = 1;
+        netq->tx_waiting = 1;
         return;
     }
 
-    if (n->tx_waiting) {
+    if (netq->tx_waiting) {
         virtio_queue_set_notification(vq, 1);
-        qemu_del_timer(n->tx_timer);
-        n->tx_waiting = 0;
-        virtio_net_flush_tx(n, vq);
+        qemu_del_timer(netq->tx_timer);
+        netq->tx_waiting = 0;
+        virtio_net_flush_tx(n, netq);
     } else {
-        qemu_mod_timer(n->tx_timer,
-                       qemu_get_clock_ns(vm_clock) + n->tx_timeout);
-        n->tx_waiting = 1;
+        qemu_mod_timer(netq->tx_timer,
+                       qemu_get_clock_ns(vm_clock) + netq->tx_timeout);
+        netq->tx_waiting = 1;
         virtio_queue_set_notification(vq, 0);
     }
 }
@@ -794,48 +862,53 @@ static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
 static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
+    VirtIONetQueue *netq = &n->vqs[vq_get_pair_index(n, vq)];
 
-    if (unlikely(n->tx_waiting)) {
+    if (unlikely(netq->tx_waiting)) {
         return;
     }
-    n->tx_waiting = 1;
+    netq->tx_waiting = 1;
     /* This happens when device was stopped but VCPU wasn't. */
     if (!n->vdev.vm_running) {
         return;
     }
     virtio_queue_set_notification(vq, 0);
-    qemu_bh_schedule(n->tx_bh);
+    qemu_bh_schedule(netq->tx_bh);
 }
 
 static void virtio_net_tx_timer(void *opaque)
 {
-    VirtIONet *n = opaque;
+    VirtIONetQueue *netq = opaque;
+    VirtIONet *n = netq->n;
+
     assert(n->vdev.vm_running);
 
-    n->tx_waiting = 0;
+    netq->tx_waiting = 0;
 
     /* Just in case the driver is not ready on more */
     if (!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
         return;
 
-    virtio_queue_set_notification(n->tx_vq, 1);
-    virtio_net_flush_tx(n, n->tx_vq);
+    virtio_queue_set_notification(netq->tx_vq, 1);
+    virtio_net_flush_tx(n, netq);
 }
 
 static void virtio_net_tx_bh(void *opaque)
 {
-    VirtIONet *n = opaque;
+    VirtIONetQueue *netq = opaque;
+    VirtQueue *vq = netq->tx_vq;
+    VirtIONet *n = netq->n;
     int32_t ret;
 
     assert(n->vdev.vm_running);
 
-    n->tx_waiting = 0;
+    netq->tx_waiting = 0;
 
     /* Just in case the driver is not ready on more */
     if (unlikely(!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)))
         return;
 
-    ret = virtio_net_flush_tx(n, n->tx_vq);
+    ret = virtio_net_flush_tx(n, netq);
     if (ret == -EBUSY) {
         return; /* Notification re-enable handled by tx_complete */
     }
@@ -843,33 +916,39 @@ static void virtio_net_tx_bh(void *opaque)
     /* If we flush a full burst of packets, assume there are
      * more coming and immediately reschedule */
     if (ret >= n->tx_burst) {
-        qemu_bh_schedule(n->tx_bh);
-        n->tx_waiting = 1;
+        qemu_bh_schedule(netq->tx_bh);
+        netq->tx_waiting = 1;
         return;
     }
 
     /* If less than a full burst, re-enable notification and flush
      * anything that may have come in while we weren't looking.  If
      * we find something, assume the guest is still active and reschedule */
-    virtio_queue_set_notification(n->tx_vq, 1);
-    if (virtio_net_flush_tx(n, n->tx_vq) > 0) {
-        virtio_queue_set_notification(n->tx_vq, 0);
-        qemu_bh_schedule(n->tx_bh);
-        n->tx_waiting = 1;
+    virtio_queue_set_notification(vq, 1);
+    if (virtio_net_flush_tx(n, netq) > 0) {
+        virtio_queue_set_notification(vq, 0);
+        qemu_bh_schedule(netq->tx_bh);
+        netq->tx_waiting = 1;
     }
 }
 
 static void virtio_net_save(QEMUFile *f, void *opaque)
 {
     VirtIONet *n = opaque;
+    int i;
 
     /* At this point, backend must be stopped, otherwise
      * it might keep writing to memory. */
-    assert(!n->vhost_started);
+    for (i = 0; i < n->queues; i++) {
+        assert(!n->vqs[i].vhost_started);
+    }
     virtio_save(&n->vdev, f);
 
     qemu_put_buffer(f, n->mac, ETH_ALEN);
-    qemu_put_be32(f, n->tx_waiting);
+    qemu_put_be32(f, n->queues);
+    for (i = 0; i < n->queues; i++) {
+        qemu_put_be32(f, n->vqs[i].tx_waiting);
+    }
     qemu_put_be32(f, n->mergeable_rx_bufs);
     qemu_put_be16(f, n->status);
     qemu_put_byte(f, n->promisc);
@@ -898,7 +977,10 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
     virtio_load(&n->vdev, f);
 
     qemu_get_buffer(f, n->mac, ETH_ALEN);
-    n->tx_waiting = qemu_get_be32(f);
+    n->queues = qemu_get_be32(f);
+    for (i = 0; i < n->queues; i++) {
+        n->vqs[i].tx_waiting = qemu_get_be32(f);
+    }
     n->mergeable_rx_bufs = qemu_get_be32(f);
 
     if (version_id >= 3)
@@ -926,7 +1008,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
             n->mac_table.in_use = 0;
         }
     }
- 
+
     if (version_id >= 6)
         qemu_get_buffer(f, (uint8_t *)n->vlans, MAX_VLAN >> 3);
 
@@ -937,13 +1019,16 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
         }
 
         if (n->has_vnet_hdr) {
-            tap_using_vnet_hdr(n->nic->nc.peer, 1);
-            tap_set_offload(n->nic->nc.peer,
-                    (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
-                    (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
-                    (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
-                    (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
-                    (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
+            for (i = 0; i < n->queues; i++) {
+                tap_using_vnet_hdr(n->nic->ncs[i]->peer, 1);
+                tap_set_offload(n->nic->ncs[i]->peer,
+                        (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
+                        (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
+                        (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
+                        (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
+                        (n->vdev.guest_features >> VIRTIO_NET_F_GUEST_UFO)  &
+                        1);
+            }
         }
     }
 
@@ -978,7 +1063,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 
 static void virtio_net_cleanup(VLANClientState *nc)
 {
-    VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+    VirtIONet *n = ((NICState *)nc->opaque)->opaque;
 
     n->nic = NULL;
 }
@@ -996,6 +1081,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
                               virtio_net_conf *net)
 {
     VirtIONet *n;
+    int i;
 
     n = (VirtIONet *)virtio_common_init("virtio-net", VIRTIO_ID_NET,
                                         sizeof(struct virtio_net_config),
@@ -1008,7 +1094,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
     n->vdev.bad_features = virtio_net_bad_features;
     n->vdev.reset = virtio_net_reset;
     n->vdev.set_status = virtio_net_set_status;
-    n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
 
     if (net->tx && strcmp(net->tx, "timer") && strcmp(net->tx, "bh")) {
         error_report("virtio-net: "
@@ -1017,15 +1102,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
         error_report("Defaulting to \"bh\"");
     }
 
-    if (net->tx && !strcmp(net->tx, "timer")) {
-        n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_timer);
-        n->tx_timer = qemu_new_timer_ns(vm_clock, virtio_net_tx_timer, n);
-        n->tx_timeout = net->txtimer;
-    } else {
-        n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_bh);
-        n->tx_bh = qemu_bh_new(virtio_net_tx_bh, n);
-    }
-    n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
     qemu_macaddr_default_if_unset(&conf->macaddr);
     memcpy(&n->mac[0], &conf->macaddr, sizeof(n->mac));
     n->status = VIRTIO_NET_S_LINK_UP;
@@ -1034,7 +1110,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
 
     qemu_format_nic_info_str(&n->nic->nc, conf->macaddr.a);
 
-    n->tx_waiting = 0;
     n->tx_burst = net->txburst;
     n->mergeable_rx_bufs = 0;
     n->promisc = 1; /* for compatibility */
@@ -1042,6 +1117,29 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
     n->mac_table.macs = qemu_mallocz(MAC_TABLE_ENTRIES * ETH_ALEN);
 
     n->vlans = qemu_mallocz(MAX_VLAN >> 3);
+    n->queues = conf->queues;
+
+    /* Allocate per rx/tx vq's */
+    for (i = 0; i < n->queues; i++) {
+        n->vqs[i].rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
+        if (net->tx && !strcmp(net->tx, "timer")) {
+            n->vqs[i].tx_vq = virtio_add_queue(&n->vdev, 256,
+                                               virtio_net_handle_tx_timer);
+            n->vqs[i].tx_timer = qemu_new_timer_ns(vm_clock,
+                                                   virtio_net_tx_timer,
+                                                   &n->vqs[i]);
+            n->vqs[i].tx_timeout = net->txtimer;
+        } else {
+            n->vqs[i].tx_vq = virtio_add_queue(&n->vdev, 256,
+                                               virtio_net_handle_tx_bh);
+            n->vqs[i].tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vqs[i]);
+        }
+
+        n->vqs[i].tx_waiting = 0;
+        n->vqs[i].n = n;
+    }
+
+    n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
 
     n->qdev = dev;
     register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
@@ -1055,24 +1153,33 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
 void virtio_net_exit(VirtIODevice *vdev)
 {
     VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
+    int i;
 
     /* This will stop vhost backend if appropriate. */
     virtio_net_set_status(vdev, 0);
 
-    qemu_purge_queued_packets(&n->nic->nc);
+    for (i = 0; i < n->queues; i++) {
+        qemu_purge_queued_packets(n->nic->ncs[i]);
+    }
 
     unregister_savevm(n->qdev, "virtio-net", n);
 
     qemu_free(n->mac_table.macs);
     qemu_free(n->vlans);
 
-    if (n->tx_timer) {
-        qemu_del_timer(n->tx_timer);
-        qemu_free_timer(n->tx_timer);
-    } else {
-        qemu_bh_delete(n->tx_bh);
+    for (i = 0; i < n->queues; i++) {
+        VirtIONetQueue *netq = &n->vqs[i];
+        if (netq->tx_timer) {
+            qemu_del_timer(netq->tx_timer);
+            qemu_free_timer(netq->tx_timer);
+        } else {
+            qemu_bh_delete(netq->tx_bh);
+        }
     }
 
     virtio_cleanup(&n->vdev);
-    qemu_del_vlan_client(&n->nic->nc);
+
+    for (i = 0; i < n->queues; i++) {
+        qemu_del_vlan_client(n->nic->ncs[i]);
+    }
 }
diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index 8af9a1c..479489f 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -44,6 +44,7 @@
 #define VIRTIO_NET_F_CTRL_RX    18      /* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN  19      /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
+#define VIRTIO_NET_F_MULTIQUEUE   21
 
 #define VIRTIO_NET_S_LINK_UP    1       /* Link is up */
 
@@ -72,6 +73,7 @@ struct virtio_net_config
     uint8_t mac[ETH_ALEN];
     /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
     uint16_t status;
+    uint16_t queues;
 } __attribute__((packed));
 
 /* This is the first element of the scatter-gather list.  If you don't
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 555f23f..cae311e 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -666,6 +666,7 @@ static const VirtIOBindings virtio_pci_bindings = {
     .query_guest_notifiers = virtio_pci_query_guest_notifiers,
     .set_host_notifier = virtio_pci_set_host_notifier,
     .set_guest_notifiers = virtio_pci_set_guest_notifiers,
+    .set_guest_notifier = virtio_pci_set_guest_notifier,
     .vmstate_change = virtio_pci_vmstate_change,
 };
 
diff --git a/hw/virtio.h b/hw/virtio.h
index bc72289..e939674 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -96,6 +96,7 @@ typedef struct {
     unsigned (*get_features)(void * opaque);
     bool (*query_guest_notifiers)(void * opaque);
     int (*set_guest_notifiers)(void * opaque, bool assigned);
+    int (*set_guest_notifier)(void *opaque, int n, bool assigned);
     int (*set_host_notifier)(void * opaque, int n, bool assigned);
     void (*vmstate_change)(void * opaque, bool running);
 } VirtIOBindings;
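
The virtio_net_save()/virtio_net_load() hunks above change the migration
stream to a big-endian queue count followed by one tx_waiting word per queue.
A minimal standalone sketch of that layout follows. It is not QEMU code:
save_queues(), load_queues(), put_be32() and get_be32() are illustrative
names, and the range check on the loaded count is an assumption added here
(the hunk above reads n->queues straight from the stream, and the size of
n->vqs[] is not visible in this part of the patch).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUE_NUM 32   /* assumed bound, same value as net.h in patch 1/2 */

/* Store v at p in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Read a big-endian 32-bit value from p. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write the queue count, then one tx_waiting word per queue.
 * buf must hold 4 * (queues + 1) bytes. Returns the bytes written. */
static size_t save_queues(uint8_t *buf, uint32_t queues,
                          const uint32_t *tx_waiting)
{
    uint32_t i;

    put_be32(buf, queues);
    for (i = 0; i < queues; i++) {
        put_be32(buf + 4 * ((size_t)i + 1), tx_waiting[i]);
    }
    return 4 * ((size_t)queues + 1);
}

/* Read the layout back. Returns 0 on success, or -1 if the count is
 * outside 1..MAX_QUEUE_NUM or the buffer is too short for it. */
static int load_queues(const uint8_t *buf, size_t len, uint32_t *queues,
                       uint32_t *tx_waiting)
{
    uint32_t i, q;

    if (len < 4) {
        return -1;
    }
    q = get_be32(buf);
    if (q < 1 || q > MAX_QUEUE_NUM) {
        return -1;              /* never index past the per-queue array */
    }
    if (len < 4 * ((size_t)q + 1)) {
        return -1;
    }
    for (i = 0; i < q; i++) {
        tx_waiting[i] = get_be32(buf + 4 * ((size_t)i + 1));
    }
    *queues = q;
    return 0;
}
```

With a check of this kind, a stream carrying a queue count larger than the
per-queue array would be rejected before any vqs[i] is touched.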


^ permalink raw reply related

* Re: [PATCH] net: ibmveth: convert to hw_features
From: David Miller @ 2011-04-20  8:32 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, santil
In-Reply-To: <20110419121426.18A2D13909@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Tue, 19 Apr 2011 14:14:25 +0200 (CEST)

> A minimal conversion.
> 
> ibmveth_set_csum_offload() can be folded into ibmveth_set_features()
> and adapter->rx_csum removed - left for later cleanup.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* [RFC PATCH 1/2] net: Add multiqueue support
From: Jason Wang @ 2011-04-20  8:33 UTC (permalink / raw)
  To: krkumar2, kvm, mst, netdev, rusty, qemu-devel, anthony
In-Reply-To: <20110420082706.32157.59668.stgit@dhcp-91-7.nay.redhat.com.englab.nay.redhat.com>

This patch adds multiqueue support for emulated NICs. Each VLANClientState
pair is now abstracted as a queue instead of a NIC, and multiple VLANClientState
pointers are stored in the NICState and treated as the queues of a single NIC.
The netdev option of qdev is expanded to accept more than one netdev id. A
queue_index is also introduced to let the emulated NIC know which queue a
packet came from or is sent out on. Virtio-net will be the first user.

Legacy single-queue NICs still run without modification, as compatibility is
kept.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/qdev-properties.c |   37 ++++++++++++++++++++++++++++++-------
 hw/qdev.h            |    3 ++-
 net.c                |   34 ++++++++++++++++++++++++++--------
 net.h                |   15 +++++++++++----
 4 files changed, 69 insertions(+), 20 deletions(-)
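
As the parse_netdev() hunk below shows, the netdev property takes several
ids separated by '#', up to MAX_QUEUE_NUM of them. A minimal standalone
sketch of that splitting step follows. It is not QEMU code:
split_netdev_ids(), free_netdev_ids() and MAX_IDS are illustrative names.
Unlike the hunk, the sketch frees every name it allocated on failure, and it
keeps empty fields as empty strings for the caller to reject.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_IDS 32  /* illustrative; mirrors MAX_QUEUE_NUM in this patch */

/* Free the first n names stored in out[]. */
static void free_netdev_ids(char *out[], int n)
{
    while (n > 0) {
        free(out[--n]);
    }
}

/* Split "id0#id1#..." into heap-allocated names in out[].
 * Returns the number of names, or -1 on allocation failure or when
 * more than MAX_IDS ids are given; nothing stays allocated on -1. */
static int split_netdev_ids(const char *str, char *out[MAX_IDS])
{
    const char *p = str;
    int n = 0;

    for (;;) {
        const char *sep = strchr(p, '#');
        size_t len = sep ? (size_t)(sep - p) : strlen(p);
        char *name;

        if (n == MAX_IDS) {
            free_netdev_ids(out, n);
            return -1;
        }
        name = malloc(len + 1);
        if (name == NULL) {
            free_netdev_ids(out, n);
            return -1;
        }
        memcpy(name, p, len);
        name[len] = '\0';
        out[n++] = name;

        if (sep == NULL) {
            return n;           /* last id consumed */
        }
        p = sep + 1;            /* continue after the separator */
    }
}
```

A caller would look up each returned name (qemu_find_netdev() in the patch)
and then release the names with free_netdev_ids().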

diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 1088a26..dd371e1 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -384,14 +384,37 @@ PropertyInfo qdev_prop_chr = {
 
 static int parse_netdev(DeviceState *dev, Property *prop, const char *str)
 {
-    VLANClientState **ptr = qdev_get_prop_ptr(dev, prop);
+    VLANClientState ***nc = qdev_get_prop_ptr(dev, prop);
+    const char *ptr = str;
+    int i = 0;
+    size_t len = strlen(str);
+    *nc = qemu_malloc(MAX_QUEUE_NUM * sizeof(VLANClientState *));
+
+    while (i < MAX_QUEUE_NUM && ptr < str + len) {
+        char *name = NULL;
+        char *this = strchr(ptr, '#');
+
+        if (this == NULL) {
+            name = strdup(ptr);
+        } else {
+            name = strndup(ptr, this - ptr);
+        }
 
-    *ptr = qemu_find_netdev(str);
-    if (*ptr == NULL)
-        return -ENOENT;
-    if ((*ptr)->peer) {
-        return -EEXIST;
+        (*nc)[i] = qemu_find_netdev(name);
+        if ((*nc)[i] == NULL) {
+            return -ENOENT;
+        }
+        if (((*nc)[i])->peer) {
+            return -EEXIST;
+        }
+
+        if (this == NULL) {
+            break;
+        }
+        i++;
+        ptr = this + 1;
     }
+
     return 0;
 }
 
@@ -409,7 +432,7 @@ static int print_netdev(DeviceState *dev, Property *prop, char *dest, size_t len
 PropertyInfo qdev_prop_netdev = {
     .name  = "netdev",
     .type  = PROP_TYPE_NETDEV,
-    .size  = sizeof(VLANClientState*),
+    .size  = sizeof(VLANClientState **),
     .parse = parse_netdev,
     .print = print_netdev,
 };
diff --git a/hw/qdev.h b/hw/qdev.h
index 8a13ec9..b438da0 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -257,6 +257,7 @@ extern PropertyInfo qdev_prop_pci_devfn;
         .defval    = (bool[]) { (_defval) },                     \
         }
 
+
 #define DEFINE_PROP_UINT8(_n, _s, _f, _d)                       \
     DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_uint8, uint8_t)
 #define DEFINE_PROP_UINT16(_n, _s, _f, _d)                      \
@@ -281,7 +282,7 @@ extern PropertyInfo qdev_prop_pci_devfn;
 #define DEFINE_PROP_STRING(_n, _s, _f)             \
     DEFINE_PROP(_n, _s, _f, qdev_prop_string, char*)
 #define DEFINE_PROP_NETDEV(_n, _s, _f)             \
-    DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState*)
+    DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState**)
 #define DEFINE_PROP_VLAN(_n, _s, _f)             \
     DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, VLANState*)
 #define DEFINE_PROP_DRIVE(_n, _s, _f) \
diff --git a/net.c b/net.c
index 4f777c3..a937e5d 100644
--- a/net.c
+++ b/net.c
@@ -227,16 +227,36 @@ NICState *qemu_new_nic(NetClientInfo *info,
 {
     VLANClientState *nc;
     NICState *nic;
+    int i;
 
     assert(info->type == NET_CLIENT_TYPE_NIC);
     assert(info->size >= sizeof(NICState));
 
-    nc = qemu_new_net_client(info, conf->vlan, conf->peer, model, name);
+    nc = qemu_new_net_client(info, conf->vlan, conf->peers[0], model, name);
 
     nic = DO_UPCAST(NICState, nc, nc);
     nic->conf = conf;
     nic->opaque = opaque;
 
+    /* For compatibility with single queue nic */
+    nic->ncs[0] = nc;
+    nc->opaque = nic;
+
+    for (i = 1 ; i < conf->queues; i++) {
+        VLANClientState *vc = qemu_mallocz(sizeof(*vc));
+        vc->opaque = nic;
+        nic->ncs[i] = vc;
+        vc->peer = conf->peers[i];
+        vc->info = info;
+        vc->queue_index = i;
+        vc->peer->peer = vc;
+        QTAILQ_INSERT_TAIL(&non_vlan_clients, vc, next);
+
+        vc->send_queue = qemu_new_net_queue(qemu_deliver_packet,
+                                            qemu_deliver_packet_iov,
+                                            vc);
+    }
+
     return nic;
 }
 
@@ -272,11 +292,10 @@ void qemu_del_vlan_client(VLANClientState *vc)
 {
     /* If there is a peer NIC, delete and cleanup client, but do not free. */
     if (!vc->vlan && vc->peer && vc->peer->info->type == NET_CLIENT_TYPE_NIC) {
-        NICState *nic = DO_UPCAST(NICState, nc, vc->peer);
-        if (nic->peer_deleted) {
+        if (vc->peer_deleted) {
             return;
         }
-        nic->peer_deleted = true;
+        vc->peer_deleted = true;
         /* Let NIC know peer is gone. */
         vc->peer->link_down = true;
         if (vc->peer->info->link_status_changed) {
@@ -288,8 +307,7 @@ void qemu_del_vlan_client(VLANClientState *vc)
 
     /* If this is a peer NIC and peer has already been deleted, free it now. */
     if (!vc->vlan && vc->peer && vc->info->type == NET_CLIENT_TYPE_NIC) {
-        NICState *nic = DO_UPCAST(NICState, nc, vc);
-        if (nic->peer_deleted) {
+        if (vc->peer_deleted) {
             qemu_free_vlan_client(vc->peer);
         }
     }
@@ -331,14 +349,14 @@ void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
 
     QTAILQ_FOREACH(nc, &non_vlan_clients, next) {
         if (nc->info->type == NET_CLIENT_TYPE_NIC) {
-            func(DO_UPCAST(NICState, nc, nc), opaque);
+            func((NICState *)nc->opaque, opaque);
         }
     }
 
     QTAILQ_FOREACH(vlan, &vlans, next) {
         QTAILQ_FOREACH(nc, &vlan->clients, next) {
             if (nc->info->type == NET_CLIENT_TYPE_NIC) {
-                func(DO_UPCAST(NICState, nc, nc), opaque);
+                func((NICState *)nc->opaque, opaque);
             }
         }
     }
diff --git a/net.h b/net.h
index 6ceca50..c2fbd60 100644
--- a/net.h
+++ b/net.h
@@ -11,20 +11,24 @@ struct MACAddr {
     uint8_t a[6];
 };
 
+#define MAX_QUEUE_NUM 32
+
 /* qdev nic properties */
 
 typedef struct NICConf {
     MACAddr macaddr;
     VLANState *vlan;
-    VLANClientState *peer;
+    VLANClientState **peers;
     int32_t bootindex;
+    int32_t queues;
 } NICConf;
 
 #define DEFINE_NIC_PROPERTIES(_state, _conf)                            \
     DEFINE_PROP_MACADDR("mac",   _state, _conf.macaddr),                \
     DEFINE_PROP_VLAN("vlan",     _state, _conf.vlan),                   \
-    DEFINE_PROP_NETDEV("netdev", _state, _conf.peer),                   \
-    DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1)
+    DEFINE_PROP_NETDEV("netdev", _state, _conf.peers),                   \
+    DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1),        \
+    DEFINE_PROP_INT32("queues", _state, _conf.queues, 1)
 
 /* VLANs support */
 
@@ -68,13 +72,16 @@ struct VLANClientState {
     char *name;
     char info_str[256];
     unsigned receive_disabled : 1;
+    unsigned int queue_index;
+    bool peer_deleted;
+    void *opaque;
 };
 
 typedef struct NICState {
     VLANClientState nc;
+    VLANClientState *ncs[MAX_QUEUE_NUM];
     NICConf *conf;
     void *opaque;
-    bool peer_deleted;
 } NICState;
 
 struct VLANState {
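
The switch from DO_UPCAST(NICState, nc, ...) to nc->opaque in this patch
follows from how the extra queues are allocated: only queue 0 is embedded in
NICState, so pointer arithmetic on the other VLANClientState objects cannot
recover the NIC. A minimal standalone sketch of that pattern follows. The
struct and function names (client, nic, nic_attach, nic_from_client) are
illustrative stand-ins, not the QEMU types.

```c
#include <stddef.h>

#define MAX_QUEUES 4  /* illustrative; the patch uses MAX_QUEUE_NUM (32) */

struct client {
    unsigned queue_index;
    void *opaque;                    /* back-pointer to the owning nic */
};

struct nic {
    struct client nc;                /* queue 0, embedded as before */
    struct client *ncs[MAX_QUEUES];  /* all queues; ncs[0] == &nc */
    int queues;
};

/* Same idea as DO_UPCAST: only valid for a member embedded in the struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Works for every queue, embedded or separately allocated. */
static struct nic *nic_from_client(struct client *c)
{
    return c->opaque;
}

/* Wire queue 0 plus n_extra separately allocated queues to the nic.
 * Returns 0 on success, -1 if the queues do not fit in ncs[]. */
static int nic_attach(struct nic *nic, struct client *extra, int n_extra)
{
    int i;

    if (n_extra < 0 || n_extra > MAX_QUEUES - 1) {
        return -1;
    }
    nic->nc.queue_index = 0;
    nic->nc.opaque = nic;
    nic->ncs[0] = &nic->nc;
    for (i = 0; i < n_extra; i++) {
        extra[i].queue_index = (unsigned)(i + 1);
        extra[i].opaque = nic;
        nic->ncs[i + 1] = &extra[i];
    }
    nic->queues = n_extra + 1;
    return 0;
}
```

container_of() still works on ncs[0], but on ncs[1] and later it would yield
a pointer to unrelated memory, which is why callers such as
qemu_foreach_nic() go through the opaque field instead.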


^ permalink raw reply related

* Re: [PATCH] net: pch_gbe: convert to hw_features
From: David Miller @ 2011-04-20  8:31 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, toshiharu-linux, masa-korg
In-Reply-To: <20110419115612.C83881389B@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Tue, 19 Apr 2011 13:56:12 +0200 (CEST)

> This also fixes bug in xmit path, where TX checksum offload state was used
> instead of skb->ip_summed to decide if the offload was needed.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: infiniband/ulp/ipoib: convert to hw_features
From: David Miller @ 2011-04-20  8:31 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, roland, sean.hefty, hal.rosenstock, linux-rdma
In-Reply-To: <20110419104320.7D01C13A67@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Tue, 19 Apr 2011 12:43:20 +0200 (CEST)

> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: s390: convert to hw_features
From: David Miller @ 2011-04-20  8:31 UTC (permalink / raw)
  To: mirq-linux
  Cc: netdev, ursula.braun, blaschka, linux390, schwidefsky,
	heiko.carstens, linux-s390
In-Reply-To: <20110419104320.314AB13A66@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Tue, 19 Apr 2011 12:43:20 +0200 (CEST)

> options.large_send was easy to get rid of. options.checksum_type has deeper
> roots so is left for later cleanup.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: dsa: remove ethtool_ops->set_sg
From: David Miller @ 2011-04-20  8:31 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev
In-Reply-To: <20110419104320.032B713909@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Tue, 19 Apr 2011 12:43:19 +0200 (CEST)

> Remove set_sg from DSA slave ethtool_ops. Features inheritance looks
> broken/not fully implemented anyway.
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Applied.

^ permalink raw reply

* Re: [PATCH] net: infiniband/hw/nes: convert to hw_features
From: David Miller @ 2011-04-20  8:31 UTC (permalink / raw)
  To: mirq-linux-CoA6ZxLDdyEEUmgCuDUIdw
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	faisal.latif-ral2JQCrhuEAvxtiuMwx3w,
	roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20110419104320.51E5F13A64-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>

From: Michał Mirosław <mirq-linux-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>
Date: Tue, 19 Apr 2011 12:43:20 +0200 (CEST)

> Signed-off-by: Michał Mirosław <mirq-linux-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>

Applied.

^ permalink raw reply

* Re: [PATCH v2] net: xen-netback: convert to hw_features
From: David Miller @ 2011-04-20  8:31 UTC (permalink / raw)
  To: Ian.Campbell; +Cc: mirq-linux, netdev, xen-devel
In-Reply-To: <1303286338.5997.243.camel@zakaz.uk.xensource.com>

From: Ian Campbell <Ian.Campbell@eu.citrix.com>
Date: Wed, 20 Apr 2011 08:58:58 +0100

> On Tue, 2011-04-19 at 14:39 +0100, Michał Mirosław wrote:
>> On Tue, Apr 19, 2011 at 03:35:06PM +0200, Michał Mirosław wrote:
>> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
>> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
>> 
>> David, I was too quick with this, so please wait with this patch until Ian
>> acks it again.
> 
> Thanks Michał.
> 
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: batman-adv: remove rx_csum ethtool_ops
From: David Miller @ 2011-04-20  8:31 UTC (permalink / raw)
  To: mirq-linux-CoA6ZxLDdyEEUmgCuDUIdw
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	lindner_marek-LWAfsSFWpa4, siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX
In-Reply-To: <20110419104320.0675D1389B-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>

From: Michał Mirosław <mirq-linux-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>
Date: Tue, 19 Apr 2011 12:43:20 +0200 (CEST)

> Signed-off-by: Michał Mirosław <mirq-linux-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>

Applied.

^ permalink raw reply

* Re: [PATCH v2] net: filter: Just In Time compiler
From: Hagen Paul Pfeifer @ 2011-04-20  8:27 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, avi, netdev, acme, bhutchings
In-Reply-To: <20110420.011454.226757908.davem@davemloft.net>


On Wed, 20 Apr 2011 01:14:54 -0700 (PDT), David Miller wrote:

I am fine with this patch too:

Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>

Hopefully someone digs into the BPF optimizer in libpcap, because that is
where the big optimization potential lies. Trivial filter rules like "host
192.168.1.1" cannot really be optimized; the unavoidable code path is the
"problem".

Hagen

^ permalink raw reply

* Re: [PATCHv2] usbnet: Resubmit interrupt URB more often
From: David Miller @ 2011-04-20  8:24 UTC (permalink / raw)
  To: pstew; +Cc: netdev, bhutchings
In-Reply-To: <20110419175037.AADD22052B@glenhelen.mtv.corp.google.com>

From: Paul Stewart <pstew@chromium.org>
Date: Tue, 19 Apr 2011 10:44:12 -0700

> The second class of device are those which shut down the device
> during suspend.  During a suspend-resume cycle, the device is
> re-enumerated at system resume, and for whatever reason
> usbnet_resume may be called on the device during the call-tree
> from usbnet_open-> usb_autopm_get_interface, which may cause a
> race where the first change above may cause the bh to submit the
> interrupt urb before usbnet_open() does.  As a result, I've added
> an EALREADY check and a fix to urb.c to send one.

You'll need to submit the generic urb submission change via the USB
maintainers, not networking.

^ permalink raw reply

* Re: A patch you wrote some time ago (aka: "[patch 41/54] ICMP: Fix icmp_errors_use_inbound_ifaddr sysctl")
From: Patrick McHardy @ 2011-04-20  8:24 UTC (permalink / raw)
  To: Chris Wright; +Cc: Alexander Hoogerhuis, linux-kernel, netdev
In-Reply-To: <20110419165411.GO9569@sequoia.sous-sol.org>

On 19.04.2011 18:54, Chris Wright wrote:
> * Alexander Hoogerhuis (alexh@boxed.no) wrote:
>> I hope you (or anyone else) can spare half a minute to have a quick
>> look at a patch you wrote a few years ago:
>>
>>> http://lkml.org/lkml/2007/6/8/124
> 
> I actually did not write that patch, rather added it to the -stable tree.
> Patrick (CCd) wrote it.

I actually only fixed it, it was added in 1c2fb7f9 by J. Simonetti
<jeroen@simonetti.nl>. Anyways ...

>> I've been tracking down a case of ICMP Redirects originating from
>> the wrong IPs, and as far as I can tell, your patch is the last to touch
>> this code (net/ipv4/icmp.c:507):
>>
>>> if (rt->fl.iif && net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr)
>>>        dev = dev_get_by_index_rcu(net, rt->fl.iif);
>>>
>>> if (dev)
>>>        saddr = inet_select_addr(dev, 0, RT_SCOPE_LINK);
>>> else
>>>        saddr = 0;
>>
>> In a plain world this would work, but I have come across a case that
>> seems to be not handled by this.
>>
>> I have two machines set up with VRRP to act as routers out of a
>> subnet, and they have IPs x.x.x.13/28 and x.x.x.14/28, with VRRP
>> holding on to x.x.x.1/28.
>>
>> If a node in x.x.x.0/28 needs to get a ICMP redirect from x.x.x.1/28
>> (to reach another subnet behind a  different gateway in x.x.x.0/28),
>> then the source IP on the ICMP redirect is chosen as the primary IP
>> on the interface that the packet arrived at.
>>
>> This is as far as I can tell from RFCs and colleagues fine for most
>> things after you're routed one hop or more, but in the case of ICMP
>> redirect it means that the redirect is not adhered to by the client,
>> as it will get the redirect from x.x.x.13/28, not x.x.x.1/28.
>>
>> inet_select_addr seems to be explicitly looking for the primary IP
>> in all cases (./net/ipv4/devinet.c:875), and in the case of sending
>> ICMP redirect when in a VRRP setup, that would not work well. It
>> should try to match the actual inbound IP.

From what I understand, it's explicitly meant to behave this way.
This is what the original commit stated:

    The new behaviour (when the sysctl variable is toggled on), it will send
    the message with the ip of the interface that received the packet that
    caused the icmp error. This is the behaviour network administrators will
    expect from a router. It makes debugging complicated network layouts
    much easier. Also, all 'vendor routers' I know of have the later
    behaviour.

>> Judging by the comments from your patch I am not sure if the source
>> IP that triggers the ICMP redirect is available at this point any
>> more.
>>
>> The way I understand it, it should pick the address this way:
>>
>>>  if (rt->fl.iif && net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr)
>>>         dev = dev_get_by_index_rcu(net, rt->fl.iif);
>>>
>>> if (dev == fl.iif)
>>>         saddr = iph->daddr;
>>>
>>> if (dev != fl.iif)
>>>         saddr = inet_select_addr(dev, 0, RT_SCOPE_LINK);
>>> else
>>>         saddr = 0;
>>
>> I.e. if we are replying to something that is from a local network
>> segment, then iph->daddr would be a more correct source. My C skill
>> is prehistoric so what I've written likely is far from correct, but
>> the general gist is that there is a special case for replying to
>> something local.

That might be a way to fix this for your case. But I'm wondering
why you're turning this on at all instead of letting routing
decide the correct source address?

^ permalink raw reply

* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Pekka Enberg @ 2011-04-20  8:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: casteyde.christian, Christoph Lameter, Andrew Morton, netdev,
	bugzilla-daemon, bugme-daemon, Vegard Nossum
In-Reply-To: <1303286998.3186.18.camel@edumazet-laptop>

On 4/20/11 11:09 AM, Eric Dumazet wrote:
> Le mercredi 20 avril 2011 à 10:49 +0300, Pekka Enberg a écrit :
>> On 4/20/11 10:45 AM, casteyde.christian@free.fr wrote:
>>> Indeed I can reproduce each time I boot, however I didn't managed to apply the
>>> patch to test it on -rc3 (neither -rc4), only the first hunk passed.
>>
>> That's odd. Eric?
>>
>
> I dont know, my patches were linux-2.6 based, and I pull tree very
> often ;)

Btw, the patch applies fine here on top of 2.6.39-rc4.

^ permalink raw reply

* Re: [PATCH v2] net: filter: Just In Time compiler
From: David Miller @ 2011-04-20  8:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: avi, hagen, netdev, acme, bhutchings
In-Reply-To: <1303286878.3186.17.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Apr 2011 10:07:58 +0200

> No problem, I'll wait your work on this then.

Don't, please resubmit your BPF jit code so I can apply it.

Waiting for some userspace pie-in-the-sky based solution, and blocking
this perfectly functional thing we have now meanwhile, is not
reasonable.

We're pragmatic, not perfectionists.

Thanks Eric.

^ permalink raw reply

* Re: [PATCH v2] net: filter: Just In Time compiler
From: David Miller @ 2011-04-20  8:12 UTC (permalink / raw)
  To: avi; +Cc: eric.dumazet, hagen, netdev, acme, bhutchings
In-Reply-To: <4DAE8E47.7040409@redhat.com>

From: Avi Kivity <avi@redhat.com>
Date: Wed, 20 Apr 2011 10:41:59 +0300

> To avoid getting into an infinite loop (btw, does your jit avoid
> infinite loops in the generated code?) I'll restate what I think are
> an external jit's advantages and then stop harping on the subject:

Only forward branching is allowed in BPF code.
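
To illustrate: classic BPF jump offsets are unsigned and relative to the
next instruction, so a jump can only move forward, and a filter of len
instructions executes at most len of them. What remains to check is that no
jump lands past the end of the program. A minimal standalone sketch of such a
check follows. It is not the kernel's checker: struct insn and
jumps_in_range() are illustrative names, with the field layout and opcode
constants taken from classic BPF.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as classic BPF's struct sock_filter, redeclared
 * here so the sketch compiles without kernel headers. */
struct insn {
    uint16_t code;
    uint8_t jt;     /* offset taken when the condition is true */
    uint8_t jf;     /* offset taken when the condition is false */
    uint32_t k;     /* immediate, or the offset for an unconditional jump */
};

#define INSN_CLASS(code) ((code) & 0x07)
#define INSN_OP(code)    ((code) & 0xf0)
#define CLASS_JMP        0x05
#define OP_JA            0x00

/* Return 1 if every jump target lies inside the program, else 0.
 * A jump at pc with offset off lands on pc + 1 + off. */
static int jumps_in_range(const struct insn *prog, size_t len)
{
    size_t pc;

    for (pc = 0; pc < len; pc++) {
        size_t room = len - pc - 1;   /* instructions after this one */

        if (INSN_CLASS(prog[pc].code) != CLASS_JMP) {
            continue;
        }
        if (INSN_OP(prog[pc].code) == OP_JA) {
            if (prog[pc].k >= room) {
                return 0;
            }
        } else if (prog[pc].jt >= room || prog[pc].jf >= room) {
            return 0;
        }
    }
    return 1;
}
```

Since no offset can be negative, the program counter strictly increases on
every step, which is what rules out loops in the generated code as well.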

^ permalink raw reply

* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Eric Dumazet @ 2011-04-20  8:09 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: casteyde.christian, Christoph Lameter, Andrew Morton, netdev,
	bugzilla-daemon, bugme-daemon, Vegard Nossum
In-Reply-To: <4DAE901C.2090809@cs.helsinki.fi>

Le mercredi 20 avril 2011 à 10:49 +0300, Pekka Enberg a écrit :
> On 4/20/11 10:45 AM, casteyde.christian@free.fr wrote:
> > Indeed I can reproduce each time I boot, however I didn't managed to apply the
> > patch to test it on -rc3 (neither -rc4), only the first hunk passed.
> 
> That's odd. Eric?
> 

I don't know, my patches were linux-2.6 based, and I pull tree very
often ;)

> > I simply boot on this laptop with kmemcheck activated to get the warning, with
> > vanilla -rc3. I've also tested -rc4 yesterday (without the patch therefore),
> > the warning was not the same (it's a bit later, I will post the dmesg output on
> > bugzilla tonight).
> 
> That's expected. You're unlikely to see the problem in the exact same 
> place across different kernel configs and versions.
> 

Christian, please send us your .config

Thanks



^ permalink raw reply

* Re: [PATCH v2] net: filter: Just In Time compiler
From: Eric Dumazet @ 2011-04-20  8:07 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Hagen Paul Pfeifer, David Miller, netdev,
	Arnaldo Carvalho de Melo, Ben Hutchings
In-Reply-To: <4DAE8E47.7040409@redhat.com>

Le mercredi 20 avril 2011 à 10:41 +0300, Avi Kivity a écrit :
> On 04/14/2011 07:05 PM, Eric Dumazet wrote:
> > Le jeudi 14 avril 2011 à 18:53 +0300, Avi Kivity a écrit :
> >
> > >  IMO, it will.  I'll try to have gcc optimize your example filter later.
> >
> > Sure you can JIT a C program from bpf. It should take maybe 30 minutes.
> > It is certainly easier than JITing binary/assembly code :)
> >
> > Now take a look how I call slowpath, I am not sure gcc will actually
> > generate better code because of C conventions.
> 
> Some things will be the same (like calling a function outside the jit).  
> Some things will be faster.
> 
> > Loading a filter should be fast.
> > Invoking a compiler is just too much work for BPF.
> > Remember loading a filter is available to any user.
> 
> Like I mentioned before, use the interpreter until the result of the 
> jitter is available.
> 
> > This idea would be good for netfilter stuff, because we don't load
> > iptables rules that often.
> >
> > But still, the netfilter mainloop can be converted as a kernel JIT, most
> > probably. All the complex stuff (matches, targets) must call external
> > procedures anyway.
> 
> We could convert some matches to bytecode, probably.
> 
> To avoid getting into an infinite loop (btw, does your jit avoid infinite 
> loops in the generated code?) I'll restate what I think are an external 
> jit's advantages and then stop harping on the subject:
> 
> - less effort
> - less kernel code
> - better arch support
> - better optimization
> - better profiler/debugger integration
> - multiple optimization levels (can use your jitter in userspace, or 
> gcc, or llvm)
> 

No problem, I'll wait for your work on this then.

I disagree with having gcc on production machines; I'm not sure it will
please admins...

Thanks



^ permalink raw reply
