* Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
From: Ronen Hod @ 2012-07-03 14:22 UTC (permalink / raw)
To: dlaor
Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Christoph Hellwig
In-Reply-To: <4FDF0DA7.40604@redhat.com>
On 06/18/2012 02:14 PM, Dor Laor wrote:
> On 06/18/2012 01:05 PM, Rusty Russell wrote:
>> On Mon, 18 Jun 2012 16:03:23 +0800, Asias He<asias@redhat.com> wrote:
>>> On 06/18/2012 03:46 PM, Rusty Russell wrote:
>>>> On Mon, 18 Jun 2012 14:53:10 +0800, Asias He<asias@redhat.com> wrote:
>>>>> This patch introduces bio-based IO path for virtio-blk.
>>>>
>>>> Why make it optional?
>>>
>>> request-based IO path is useful for users who do not want to bypass the
>>> IO scheduler in guest kernel, e.g. users using spinning disk. For users
>>> using fast disk device, e.g. SSD device, they can use bio-based IO path.
>>
>> Users using a spinning disk still get IO scheduling in the host though.
>> What benefit is there in doing it in the guest as well?
>
> The io scheduler waits for requests to merge and thus batch IOs together. It's not important w.r.t spinning disks since the host can do it but it causes much less vmexits which is the key issue for VMs.
Does it make sense to use the guest's I/O scheduler at all?
- It is not aware of the physical (spinning) disk layout.
- It is not aware of all the host's disk pending requests.
It does have a good side-effect - batching of requests.
Ronen.
>
>>
>> Cheers,
>> Rusty.
>> _______________________________________________
>> Virtualization mailing list
>> Virtualization@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
From: Asias He @ 2012-07-03 14:02 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin, Christoph Hellwig
In-Reply-To: <4FF2F417.2010209@redhat.com>
On 07/03/2012 09:31 PM, Paolo Bonzini wrote:
> Il 02/07/2012 08:41, Rusty Russell ha scritto:
>>> With the same workload in guest, the guest fires 200K requests to host
>>> with merges enabled in guest (echo 0 > /sys/block/vdb/queue/nomerges),
>>> while the guest fires 40000K requests to host with merges disabled in
>>> guest (echo 2 > /sys/block/vdb/queue/nomerges). This show that the merge
>>> in block layer reduces the total number of requests fire to host a lot
>>> (40000K / 200K = 20).
>>>
>
> 40000 / 200 is 200, not 20. :)
Crap, I wrote one more zero here. Actually, it is 4000K. So the factor
is still 4000K/200k = 20 ;-)
--
Asias
^ permalink raw reply
* Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
From: Paolo Bonzini @ 2012-07-03 13:31 UTC (permalink / raw)
To: Rusty Russell
Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin, Christoph Hellwig
In-Reply-To: <8762a6y89v.fsf@rustcorp.com.au>
Il 02/07/2012 08:41, Rusty Russell ha scritto:
>> With the same workload in guest, the guest fires 200K requests to host
>> with merges enabled in guest (echo 0 > /sys/block/vdb/queue/nomerges),
>> while the guest fires 40000K requests to host with merges disabled in
>> guest (echo 2 > /sys/block/vdb/queue/nomerges). This show that the merge
>> in block layer reduces the total number of requests fire to host a lot
>> (40000K / 200K = 20).
>>
40000 / 200 is 200, not 20. :)
Paolo
^ permalink raw reply
* [PATCH 2/2] virtio-blk spec: writeback cache enable improvements
From: Paolo Bonzini @ 2012-07-03 13:16 UTC (permalink / raw)
To: virtualization, kvm, rusty
In-Reply-To: <1341321412-24214-1-git-send-email-pbonzini@redhat.com>
This patch introduces two improvements to writeback cache handling
in the virtio-blk spec.
1) The VIRTIO_BLK_F_FLUSH feature is renamed to VIRTIO_BLK_F_WCE, and
QEMU's behavior is documented explicitly as part of the spec: the host
negotiates the feature only if its cache is writeback. The obvious dual
requirement is imposed on the guest: it should negotiate the feature
only if it is able to send flushes. And in order to protect against
data loss, the spec now mandates that the host operates in writethrough
mode if the guest does not negotiate VIRTIO_BLK_F_WCE (this behavior
was already _allowed_ by the spec so far). This can change with every
reset of course; typically the BIOS will run as writethrough, while the
"main" OS will run in writeback mode. This is a backwards-compatible
refinement geared towards old or limited guests, so there is no need
for a new feature bit.
2) a second feature is added, VIRTIO_BLK_F_CONFIG_WCE, that provides
the same information in the configuration. This will enable the driver
to modify the write-cache setting at runtime (via sysfs for Linux, via
MODE SELECT for Windows).
Patches for QEMU and Linux will come soonish.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
virtio-spec.lyx | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 55 insertions(+), 2 deletions(-)
diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 859dbe7..fccbd28 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -5021,9 +5021,19 @@ VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
\end_layout
\begin_layout Description
-VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
+VIRTIO_BLK_F_
+\change_deleted 1531152142 1341302299
+FLUSH
+\change_inserted 1531152142 1341302304
+WCE
+\change_unchanged
+ (9)
+\change_deleted 1531152142 1341302317
+Cache flush command support.
\change_inserted 1531152142 1341305427
-
+Device cache starts in writeback mode after reset.
+ Guests should not negotiate this feature unless they are capable of sending
+ VIRTIO_BLK_T_FLUSH commands.
\end_layout
\begin_layout Description
@@ -5032,6 +5042,13 @@ VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
VIRTIO_BLK_F_TOPOLOGY (10) Device exports information on optimal I/O alignment.
\end_layout
+\begin_layout Description
+
+\change_inserted 1531152142 1341302349
+VIRTIO_BLK_F_CONFIG_WCE (11) Device can toggle its cache between writeback
+ and writethrough modes.
+\end_layout
+
\end_deeper
\begin_layout Description
Device
@@ -5145,6 +5162,15 @@ struct virtio_blk_config {
\begin_layout Plain Layout
+\change_inserted 1531152142 1341301918
+
+ u8 writeback;
+\change_unchanged
+
+\end_layout
+
+\begin_layout Plain Layout
+
};
\end_layout
@@ -5205,6 +5231,33 @@ If the VIRTIO_BLK_F_TOPOLOGY feature is negotiated, the fields in the topology
This also does not affect the units in the protocol, only performance.
\end_layout
+\begin_layout Enumerate
+
+\change_inserted 1531152142 1341305949
+The cache mode should be read from the writeback field of the configuration
+ if the VIRTIO_BLK_F_CONFIG_WCE feature if available; the driver can also
+ write to the field in order to toggle the cache between writethrough (0)
+ and writeback (1) mode.
+ If the feature is not available, the driver can instead look at the result
+ of negotiating VIRTIO_BLK_F_WCE: the cache will be in writeback mode after
+ reset if and only if VIRTIO_BLK_F_WCE is negotiated
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341306004
+Until version 1.1, QEMU remained in writeback mode even after a guest announced
+ lack of support for VIRTIO_BLK_F_FLUSH.
+\change_unchanged
+
+\end_layout
+
+\end_inset
+
+.
+\end_layout
+
\begin_layout Section*
Device Operation
\end_layout
--
1.7.10.2
^ permalink raw reply related
* [PATCH 1/2] virtio-blk spec: document topology info
From: Paolo Bonzini @ 2012-07-03 13:16 UTC (permalink / raw)
To: virtualization, kvm, rusty
In-Reply-To: <1341321412-24214-1-git-send-email-pbonzini@redhat.com>
Current QEMU and Linux drivers can export queue parameters via the
virtio-blk configuration space. Document this, since the next patch
will have to add another configuration field after these.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
virtio-spec.lyx | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 80 insertions(+), 5 deletions(-)
diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index dd2d53b..859dbe7 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -5021,7 +5021,20 @@ VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
\end_layout
\begin_layout Description
-VIRTIO_BLK_F_FLUSH (9) Cache flush command support.Device
+VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
+\change_inserted 1531152142 1341305427
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1341301882
+VIRTIO_BLK_F_TOPOLOGY (10) Device exports information on optimal I/O alignment.
+\end_layout
+
+\end_deeper
+\begin_layout Description
+Device
\begin_inset space ~
\end_inset
@@ -5090,6 +5103,48 @@ struct virtio_blk_config {
\begin_layout Plain Layout
+\change_inserted 1531152142 1341301807
+
+ struct virtio_blk_topology {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341301810
+
+ u8 physical_block_exp;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341301817
+
+ u8 alignment_offset;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341301822
+
+ u16 min_io_size;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341301827
+
+ u32 opt_io_size;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1341301911
+
+ } topology;
+\end_layout
+
+\begin_layout Plain Layout
+
};
\end_layout
@@ -5098,7 +5153,6 @@ struct virtio_blk_config {
\end_layout
-\end_deeper
\begin_layout Section*
Device Initialization
\end_layout
@@ -5119,15 +5173,36 @@ capacity
\begin_layout Enumerate
If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, the blk_size field can
be read to determine the optimal sector size for the driver to use.
- This does not effect the units used in the protocol (always 512 bytes),
- but awareness of the correct value can effect performance.
+ This does not
+\change_deleted 1531152142 1341301967
+e
+\change_inserted 1531152142 1341301967
+a
+\change_unchanged
+ffect the units used in the protocol (always 512 bytes), but awareness of
+ the correct value can
+\change_deleted 1531152142 1341301978
+e
+\change_inserted 1531152142 1341301978
+a
+\change_unchanged
+ffect performance.
\end_layout
\begin_layout Enumerate
If the VIRTIO_BLK_F_RO feature is set by the device, any write requests
will fail.
-\change_inserted 1531152142 1341301982
+\change_inserted 1531152142 1341301920
+
+\end_layout
+\begin_layout Enumerate
+
+\change_inserted 1531152142 1341301982
+If the VIRTIO_BLK_F_TOPOLOGY feature is negotiated, the fields in the topology
+ struct can be read to determine the physical block size and optimal I/O
+ lengths for the driver to use.
+ This also does not affect the units in the protocol, only performance.
\end_layout
\begin_layout Section*
--
1.7.10.2
^ permalink raw reply related
* [PATCH 0/2] virtio-blk spec: document topology info, add WCE toggle
From: Paolo Bonzini @ 2012-07-03 13:16 UTC (permalink / raw)
To: virtualization, kvm, rusty
Hi Rusty, here are two improvements to the virtio-blk spec.
The first documents the status quo of an extension that has already
been supported in QEMU for several releases, but never made it to the
official spec.
The second adds support for toggling the cache mode between writethrough
and writeback. Two mechanisms are introduced for this. One is to not
negotiate VIRTIO_BLK_F_FLUSH; it can be done only at reset time and is
more of a refinement geared towards older or limited guests, in order to
make them safe wrt power losses. The second is via feature bits and a
new configuration field.
Paolo Bonzini (2):
virtio-blk spec: document topology info
virtio-blk spec: writeback cache enable improvements
virtio-spec.lyx | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 132 insertions(+), 4 deletions(-)
--
1.7.10.2
^ permalink raw reply
* Re: [PATCH v2] virtio-scsi: hotplug support for virtio-scsi
From: Paolo Bonzini @ 2012-07-03 11:50 UTC (permalink / raw)
To: Cong Meng
Cc: stefanha, linux-scsi, senwang, zwanp, linuxram, linux-kernel,
virtualization
In-Reply-To: <1341294085-17164-1-git-send-email-mc@linux.vnet.ibm.com>
Il 03/07/2012 07:41, Cong Meng ha scritto:
> This patch implements the hotplug support for virtio-scsi.
> When there is a device attached/detached, the virtio-scsi driver will be
> signaled via event virtual queue and it will add/remove the scsi device
> in question automatically.
>
> v2: handle no_event event
>
> Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
> Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
The SoB lines are swapped. Otherwise looks good. Since you have to
respin, please add dropped event support too, it shouldn't be hard.
Paolo
> ---
> drivers/scsi/virtio_scsi.c | 113 ++++++++++++++++++++++++++++++++++++++++++-
> include/linux/virtio_scsi.h | 9 ++++
> 2 files changed, 121 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 9fc5e67..e44b2d6 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -25,6 +25,7 @@
> #include <scsi/scsi_cmnd.h>
>
> #define VIRTIO_SCSI_MEMPOOL_SZ 64
> +#define VIRTIO_SCSI_EVENT_LEN 8
>
> /* Command queue element */
> struct virtio_scsi_cmd {
> @@ -43,6 +44,12 @@ struct virtio_scsi_cmd {
> } resp;
> } ____cacheline_aligned_in_smp;
>
> +struct virtio_scsi_event_node {
> + struct virtio_scsi *vscsi;
> + struct virtio_scsi_event event;
> + struct work_struct work;
> +};
> +
> struct virtio_scsi_vq {
> /* Protects vq */
> spinlock_t vq_lock;
> @@ -67,6 +74,9 @@ struct virtio_scsi {
> struct virtio_scsi_vq event_vq;
> struct virtio_scsi_vq req_vq;
>
> + /* Get some buffers reday for event vq */
> + struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN];
> +
> struct virtio_scsi_target_state *tgt[];
> };
>
> @@ -202,6 +212,97 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
> spin_unlock_irqrestore(&vscsi->ctrl_vq.vq_lock, flags);
> };
>
> +static int virtscsi_kick_event(struct virtio_scsi *vscsi,
> + struct virtio_scsi_event_node *event_node)
> +{
> + int ret;
> + struct scatterlist sg;
> + unsigned long flags;
> +
> + sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
> +
> + spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
> +
> + ret = virtqueue_add_buf(vscsi->event_vq.vq, &sg, 0, 1, event_node, GFP_ATOMIC);
> + if (ret >= 0)
> + virtqueue_kick(vscsi->event_vq.vq);
> +
> + spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
> +
> + return ret;
> +}
> +
> +static int virtscsi_kick_event_all(struct virtio_scsi *vscsi)
> +{
> + int i;
> +
> + for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++) {
> + vscsi->event_list[i].vscsi = vscsi;
> + virtscsi_kick_event(vscsi, &vscsi->event_list[i]);
> + }
> +
> + return 0;
> +}
> +
> +static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
> + struct virtio_scsi_event *event)
> +{
> + struct scsi_device *sdev;
> + struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
> + unsigned int target = event->lun[1];
> + unsigned int lun = (event->lun[2] << 8) | event->lun[3];
> +
> + switch (event->reason) {
> + case VIRTIO_SCSI_EVT_RESET_RESCAN:
> + scsi_add_device(shost, 0, target, lun);
> + break;
> + case VIRTIO_SCSI_EVT_RESET_REMOVED:
> + sdev = scsi_device_lookup(shost, 0, target, lun);
> + if (sdev) {
> + scsi_remove_device(sdev);
> + scsi_device_put(sdev);
> + } else {
> + pr_err("SCSI device %d 0 %d %d not found\n",
> + shost->host_no, target, lun);
> + }
> + break;
> + default:
> + pr_info("Unsupport virtio scsi event reason %x\n", event->reason);
> + }
> +}
> +
> +static void virtscsi_handle_event(struct work_struct *work)
> +{
> + struct virtio_scsi_event_node *event_node =
> + container_of(work, struct virtio_scsi_event_node, work);
> + struct virtio_scsi *vscsi = event_node->vscsi;
> + struct virtio_scsi_event *event = &event_node->event;
> +
> + if (event->event & VIRTIO_SCSI_T_EVENTS_MISSED) {
> + event->event &= (~VIRTIO_SCSI_T_EVENTS_MISSED);
> + /* FIXME: handle event missed here */
> + }
> +
> + switch (event->event) {
> + case VIRTIO_SCSI_T_NO_EVENT:
> + break;
> + case VIRTIO_SCSI_T_TRANSPORT_RESET:
> + virtscsi_handle_transport_reset(vscsi, event);
> + break;
> + default:
> + pr_err("Unsupport virtio scsi event %x\n", event->event);
> + }
> + virtscsi_kick_event(vscsi, event_node);
> +}
> +
> +static void virtscsi_complete_event(void *buf)
> +{
> + struct virtio_scsi_event_node *event_node = buf;
> +
> + INIT_WORK(&event_node->work, virtscsi_handle_event);
> + schedule_work(&event_node->work);
> +}
> +
> static void virtscsi_event_done(struct virtqueue *vq)
> {
> struct Scsi_Host *sh = virtio_scsi_host(vq->vdev);
> @@ -209,7 +310,7 @@ static void virtscsi_event_done(struct virtqueue *vq)
> unsigned long flags;
>
> spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
> - virtscsi_vq_done(vq, virtscsi_complete_free);
> + virtscsi_vq_done(vq, virtscsi_complete_event);
> spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
> };
>
> @@ -510,6 +611,10 @@ static int virtscsi_init(struct virtio_device *vdev,
> virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
> virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
>
> + if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> + virtscsi_kick_event_all(vscsi);
> + }
> +
> /* We need to know how many segments before we allocate. */
> sg_elems = virtscsi_config_get(vdev, seg_max) ?: 1;
>
> @@ -608,7 +713,13 @@ static struct virtio_device_id id_table[] = {
> { 0 },
> };
>
> +static unsigned int features[] = {
> + VIRTIO_SCSI_F_HOTPLUG
> +};
> +
> static struct virtio_driver virtio_scsi_driver = {
> + .feature_table = features,
> + .feature_table_size = ARRAY_SIZE(features),
> .driver.name = KBUILD_MODNAME,
> .driver.owner = THIS_MODULE,
> .id_table = id_table,
> diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
> index 8ddeafd..dc8d305 100644
> --- a/include/linux/virtio_scsi.h
> +++ b/include/linux/virtio_scsi.h
> @@ -69,6 +69,10 @@ struct virtio_scsi_config {
> u32 max_lun;
> } __packed;
>
> +/* Feature Bits */
> +#define VIRTIO_SCSI_F_INOUT 0
> +#define VIRTIO_SCSI_F_HOTPLUG 1
> +
> /* Response codes */
> #define VIRTIO_SCSI_S_OK 0
> #define VIRTIO_SCSI_S_OVERRUN 1
> @@ -105,6 +109,11 @@ struct virtio_scsi_config {
> #define VIRTIO_SCSI_T_TRANSPORT_RESET 1
> #define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
>
> +/* Reasons of transport reset event */
> +#define VIRTIO_SCSI_EVT_RESET_HARD 0
> +#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
> +#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
> +
> #define VIRTIO_SCSI_S_SIMPLE 0
> #define VIRTIO_SCSI_S_ORDERED 1
> #define VIRTIO_SCSI_S_HEAD 2
>
^ permalink raw reply
* [PATCH v2] virtio-scsi: hotplug support for virtio-scsi
From: Cong Meng @ 2012-07-03 5:41 UTC (permalink / raw)
To: Paolo Bonzini
Cc: stefanha, linux-scsi, senwang, zwanp, linuxram, linux-kernel,
virtualization, Cong Meng
This patch implements the hotplug support for virtio-scsi.
When there is a device attached/detached, the virtio-scsi driver will be
signaled via event virtual queue and it will add/remove the scsi device
in question automatically.
v2: handle no_event event
Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
---
drivers/scsi/virtio_scsi.c | 113 ++++++++++++++++++++++++++++++++++++++++++-
include/linux/virtio_scsi.h | 9 ++++
2 files changed, 121 insertions(+), 1 deletions(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9fc5e67..e44b2d6 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -25,6 +25,7 @@
#include <scsi/scsi_cmnd.h>
#define VIRTIO_SCSI_MEMPOOL_SZ 64
+#define VIRTIO_SCSI_EVENT_LEN 8
/* Command queue element */
struct virtio_scsi_cmd {
@@ -43,6 +44,12 @@ struct virtio_scsi_cmd {
} resp;
} ____cacheline_aligned_in_smp;
+struct virtio_scsi_event_node {
+ struct virtio_scsi *vscsi;
+ struct virtio_scsi_event event;
+ struct work_struct work;
+};
+
struct virtio_scsi_vq {
/* Protects vq */
spinlock_t vq_lock;
@@ -67,6 +74,9 @@ struct virtio_scsi {
struct virtio_scsi_vq event_vq;
struct virtio_scsi_vq req_vq;
+ /* Get some buffers reday for event vq */
+ struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN];
+
struct virtio_scsi_target_state *tgt[];
};
@@ -202,6 +212,97 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
spin_unlock_irqrestore(&vscsi->ctrl_vq.vq_lock, flags);
};
+static int virtscsi_kick_event(struct virtio_scsi *vscsi,
+ struct virtio_scsi_event_node *event_node)
+{
+ int ret;
+ struct scatterlist sg;
+ unsigned long flags;
+
+ sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
+
+ spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
+
+ ret = virtqueue_add_buf(vscsi->event_vq.vq, &sg, 0, 1, event_node, GFP_ATOMIC);
+ if (ret >= 0)
+ virtqueue_kick(vscsi->event_vq.vq);
+
+ spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
+
+ return ret;
+}
+
+static int virtscsi_kick_event_all(struct virtio_scsi *vscsi)
+{
+ int i;
+
+ for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++) {
+ vscsi->event_list[i].vscsi = vscsi;
+ virtscsi_kick_event(vscsi, &vscsi->event_list[i]);
+ }
+
+ return 0;
+}
+
+static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
+ struct virtio_scsi_event *event)
+{
+ struct scsi_device *sdev;
+ struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
+ unsigned int target = event->lun[1];
+ unsigned int lun = (event->lun[2] << 8) | event->lun[3];
+
+ switch (event->reason) {
+ case VIRTIO_SCSI_EVT_RESET_RESCAN:
+ scsi_add_device(shost, 0, target, lun);
+ break;
+ case VIRTIO_SCSI_EVT_RESET_REMOVED:
+ sdev = scsi_device_lookup(shost, 0, target, lun);
+ if (sdev) {
+ scsi_remove_device(sdev);
+ scsi_device_put(sdev);
+ } else {
+ pr_err("SCSI device %d 0 %d %d not found\n",
+ shost->host_no, target, lun);
+ }
+ break;
+ default:
+ pr_info("Unsupport virtio scsi event reason %x\n", event->reason);
+ }
+}
+
+static void virtscsi_handle_event(struct work_struct *work)
+{
+ struct virtio_scsi_event_node *event_node =
+ container_of(work, struct virtio_scsi_event_node, work);
+ struct virtio_scsi *vscsi = event_node->vscsi;
+ struct virtio_scsi_event *event = &event_node->event;
+
+ if (event->event & VIRTIO_SCSI_T_EVENTS_MISSED) {
+ event->event &= (~VIRTIO_SCSI_T_EVENTS_MISSED);
+ /* FIXME: handle event missed here */
+ }
+
+ switch (event->event) {
+ case VIRTIO_SCSI_T_NO_EVENT:
+ break;
+ case VIRTIO_SCSI_T_TRANSPORT_RESET:
+ virtscsi_handle_transport_reset(vscsi, event);
+ break;
+ default:
+ pr_err("Unsupport virtio scsi event %x\n", event->event);
+ }
+ virtscsi_kick_event(vscsi, event_node);
+}
+
+static void virtscsi_complete_event(void *buf)
+{
+ struct virtio_scsi_event_node *event_node = buf;
+
+ INIT_WORK(&event_node->work, virtscsi_handle_event);
+ schedule_work(&event_node->work);
+}
+
static void virtscsi_event_done(struct virtqueue *vq)
{
struct Scsi_Host *sh = virtio_scsi_host(vq->vdev);
@@ -209,7 +310,7 @@ static void virtscsi_event_done(struct virtqueue *vq)
unsigned long flags;
spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
- virtscsi_vq_done(vq, virtscsi_complete_free);
+ virtscsi_vq_done(vq, virtscsi_complete_event);
spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
};
@@ -510,6 +611,10 @@ static int virtscsi_init(struct virtio_device *vdev,
virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
+ if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
+ virtscsi_kick_event_all(vscsi);
+ }
+
/* We need to know how many segments before we allocate. */
sg_elems = virtscsi_config_get(vdev, seg_max) ?: 1;
@@ -608,7 +713,13 @@ static struct virtio_device_id id_table[] = {
{ 0 },
};
+static unsigned int features[] = {
+ VIRTIO_SCSI_F_HOTPLUG
+};
+
static struct virtio_driver virtio_scsi_driver = {
+ .feature_table = features,
+ .feature_table_size = ARRAY_SIZE(features),
.driver.name = KBUILD_MODNAME,
.driver.owner = THIS_MODULE,
.id_table = id_table,
diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
index 8ddeafd..dc8d305 100644
--- a/include/linux/virtio_scsi.h
+++ b/include/linux/virtio_scsi.h
@@ -69,6 +69,10 @@ struct virtio_scsi_config {
u32 max_lun;
} __packed;
+/* Feature Bits */
+#define VIRTIO_SCSI_F_INOUT 0
+#define VIRTIO_SCSI_F_HOTPLUG 1
+
/* Response codes */
#define VIRTIO_SCSI_S_OK 0
#define VIRTIO_SCSI_S_OVERRUN 1
@@ -105,6 +109,11 @@ struct virtio_scsi_config {
#define VIRTIO_SCSI_T_TRANSPORT_RESET 1
#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
+/* Reasons of transport reset event */
+#define VIRTIO_SCSI_EVT_RESET_HARD 0
+#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
+#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
+
#define VIRTIO_SCSI_S_SIMPLE 0
#define VIRTIO_SCSI_S_ORDERED 1
#define VIRTIO_SCSI_S_HEAD 2
--
1.7.7.6
^ permalink raw reply related
* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
From: Rusty Russell @ 2012-07-03 0:47 UTC (permalink / raw)
To: Rafael Aquini, Michael S. Tsirkin; +Cc: linux-kernel, kvm, virtualization
In-Reply-To: <20120702160814.GB1750@t510.redhat.com>
On Mon, 2 Jul 2012 13:08:19 -0300, Rafael Aquini <aquini@redhat.com> wrote:
> As 'locking in balloon', may I assume the approach I took for the compaction case
> is OK and aligned to address these concerns of yours? If not, do not hesitate in
> giving me your thoughts, please. I'm respinning a V3 series to address a couple
> of extra nitpicks from the compaction standpoint, and I'd love to be able to
> address any extra concern you might have on the balloon side of that work.
It's orthogonal, though looks like they clash textually :(
I'll re-spin MST's patch on top of yours, and include both in my tree,
otherwise linux-next will have to do the merge. But I'll await your
push before pushing to Linus next merge window.
Thanks,
Rusty.
^ permalink raw reply
* Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
From: Asias He @ 2012-07-03 0:39 UTC (permalink / raw)
To: Rusty Russell
Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin, Christoph Hellwig
In-Reply-To: <8762a6y89v.fsf@rustcorp.com.au>
On 07/02/2012 02:41 PM, Rusty Russell wrote:
> On Mon, 02 Jul 2012 10:45:05 +0800, Asias He <asias@redhat.com> wrote:
>> On 07/02/2012 07:54 AM, Rusty Russell wrote:
>>> Confused. So, without merging we get 6k exits (per second?) How many
>>> do we get when we use the request-based IO path?
>>
>> Sorry for the confusion. The numbers were collected from request-based
>> IO path where we can have the guest block layer merge the requests.
>>
>> With the same workload in guest, the guest fires 200K requests to host
>> with merges enabled in guest (echo 0 > /sys/block/vdb/queue/nomerges),
>> while the guest fires 40000K requests to host with merges disabled in
>> guest (echo 2 > /sys/block/vdb/queue/nomerges). This show that the merge
>> in block layer reduces the total number of requests fire to host a lot
>> (40000K / 200K = 20).
>>
>> The guest fires 200K requests to host with merges enabled in guest (echo
>> 0 > /sys/block/vdb/queue/nomerges), the host fires 6K interrupts in
>> total for the 200K requests. This show that the ratio of interrupts
>> coalesced (200K / 6K = 33).
>
> OK, got it! Guest merging cuts requests by a factor of 20. EVENT_IDX
> cuts interrupts by a factor of 33.
Yes.
>
>>> If your device is slow, then you won't be able to make many requests per
>>> second: why worry about exit costs?
>>
>> If a device is slow, the merge would merge more requests and reduce the
>> total number of requests to host. This saves exit costs, no?
>
> Sure, our guest merging might save us 100x as many exits as no merging.
> But since we're not doing many requests, does it matter?
We can still have many requests with slow devices. The number of
requests depends on the workload in guest. E.g. 512 IO threads in guest
keeping doing IO.
> Ideally we'd merge requests only if the device queue is full. But that
> sounds hard.
>
>>> If your device is fast (eg. ram),
>>> you've already shown that your patch is a win, right?
>>
>> Yes. Both on ramdisk and fast SSD device (e.g. FusionIO).
>
> Thanks,
> Rusty.
>
--
Asias
^ permalink raw reply
* Re: [PATCH 00/13] drivers: hv: kvp
From: Ben Hutchings @ 2012-07-02 19:57 UTC (permalink / raw)
To: KY Srinivasan
Cc: Olaf Hering, Greg KH, apw@canonical.com,
devel@linuxdriverproject.org, virtualization@lists.osdl.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <426367E2313C2449837CD2DE46E7EAF9155EF399@SN2PRD0310MB382.namprd03.prod.outlook.com>
On Mon, Jul 02, 2012 at 03:22:25PM +0000, KY Srinivasan wrote:
>
>
> > -----Original Message-----
> > From: Olaf Hering [mailto:olaf@aepfle.de]
> > Sent: Thursday, June 28, 2012 10:24 AM
> > To: KY Srinivasan
> > Cc: Greg KH; apw@canonical.com; devel@linuxdriverproject.org;
> > virtualization@lists.osdl.org; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH 00/13] drivers: hv: kvp
> >
> > On Tue, Jun 26, KY Srinivasan wrote:
> >
> > > > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > > > The fact that it was Red Hat specific was the main part, this should be
> > > > done in a standard way, with standard tools, right?
> > >
> > > The reason I asked this question was to make sure I address these
> > > issues in addition to whatever I am debugging now. I use the standard
> > > tools and calls to retrieve all the IP configuration. As I look at
> > > each distribution the files they keep persistent IP configuration
> > > Information is different and that is the reason I chose to start with
> > > RedHat. If there is a standard way to store the configuration, I will
> > > do that.
> >
> >
> > KY,
> >
> > instead of using system() in kvp_get_ipconfig_info and kvp_set_ip_info,
> > wouldnt it be easier to call an external helper script which does all
> > the distribution specific work? Just define some API to pass values to
> > the script, and something to read values collected by the script back
> > into the daemon.
>
> On the "Get" side I mostly use standard commands/APIs to get all the information:
>
> 1) IP address information and subnet mask: getifaddrs()
> 2) DNS information: Parsing /etc/resolv.conf
> 3) /sbin/ip command for all the routing information
If you're interested in the *current* configuration then (1) and (3)
are OK but you should really use the rtnetlink API.
However, I suspect that Hyper-V assumes that current and persistent
configuration are the same thing, which is obviously not true in
general on Linux. But if NetworkManager is running then you can
assume they are.
> 4) Parse /etc/sysconfig/network-scripts/ifcfg-ethx for boot protocol
>
> As you can see, all but the boot protocol is gathered using the "standard distro
> independent mechanisms. I was looking at NetworkManager cli and it looks
> like I could gather all the information except the boot protocol information. I am
> not sure how to gather the boot protocol information in a distro independent fashion.
>
> On the SET side, I need to persistently store the settings in an appropriate configuration
> file and flush these settings down so that the interface is appropriately configured. It is here
> that I am struggling to find a distro independent way of doing things. It would be great if I can
> use NetworkManager cli (nmcli) to accomplish this. Any help here would be greatly appreciated.
[...]
What was wrong with the NetworkManager D-Bus API I pointed you at?
I don't see how it makes sense to use nmcli as an API.
Ben.
--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus
^ permalink raw reply
* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
From: Rafael Aquini @ 2012-07-02 16:08 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, virtualization
In-Reply-To: <20120702072558.GA8268@redhat.com>
On Mon, Jul 02, 2012 at 10:25:58AM +0300, Michael S. Tsirkin wrote:
> On Mon, Jul 02, 2012 at 10:35:47AM +0930, Rusty Russell wrote:
> > On Sun, 1 Jul 2012 12:20:51 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > > > A virtio driver does virtqueue_add_buf() multiple times before finally
> > > > calling virtqueue_kick(); previously we only exposed the added buffers
> > > > in the virtqueue_kick() call. This means we don't need a memory
> > > > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > > > device (ie. host) can't see the buffers until the kick.
> > > >
> > > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > >
> > > Looking at recent mm compaction patches made me look at locking
> > > in balloon closely. And I noticed the referenced patch (commit
> > > ee7cd8981e15bcb365fc762afe3fc47b8242f630 upstream) interacts strangely
> > > with virtio balloon; balloon currently does:
> > >
> > > static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> > > {
> > > struct scatterlist sg;
> > >
> > > sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > >
> > > init_completion(&vb->acked);
> > >
> > > /* We should always be able to add one buffer to an empty queue. */
> > > if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
> > > BUG();
> > > virtqueue_kick(vq);
> > >
> > > /* When host has read buffer, this completes via balloon_ack */
> > > wait_for_completion(&vb->acked);
> > > }
> > >
> > >
> > > While vq callback does:
> > >
> > > static void balloon_ack(struct virtqueue *vq)
> > > {
> > > struct virtio_balloon *vb;
> > > unsigned int len;
> > >
> > > vb = virtqueue_get_buf(vq, &len);
> > > if (vb)
> > > complete(&vb->acked);
> > > }
> > >
> > >
> > > So virtqueue_get_buf might now run concurrently with virtqueue_kick.
> > > I audited both and this seems safe in practice but I think
> >
> > Good spotting!
> >
> > Agreed. Because there's only add_buf, we get away with it: the add_buf
> > must be almost finished by the time get_buf runs because the device has
> > seen the buffer.
> >
> > > we need to either declare this legal at the API level
> > > or add locking in driver.
> >
> > I wonder if we should just lock in the balloon driver, rather than
> > document this corner case and set a bad example.
>
> We'll need to replace &vb->acked with a waitqueue
> and do get_buf from the same thread.
> But I note that stats_request hash the same issue.
> Let's see if we can fix it.
>
> > Are there other
> > drivers which take the same shortcut?
>
> Not that I know.
>
> > > Further, is there a guarantee that we never get
> > > spurious callbacks? We currently check ring not empty
> > > but esp for non shared MSI this might not be needed.
> >
> > Yes, I think this saves us. A spurious interrupt won't trigger
> > a spurious callback.
> >
> > > If a spurious callback triggers, virtqueue_get_buf can run
> > > concurrently with virtqueue_add_buf which is known to be racy.
> > > Again I think this is currently safe as no spurious callbacks in
> > > practice but should we guarantee no spurious callbacks at the API level
> > > or add locking in driver?
> >
> > I think we should guarantee it, but is there a hole in the current
> > implementation?
> >
> > Thanks,
> > Rusty.
>
> Could be. The check for ring empty looks somewhat suspicious.
> It might be expensive to make it 100% robust - that check was
> intended as an optimization for shared interrupts.
> Whith per vq interrupts we IMO do not need the check.
> If we add locking in balloon I think there's no need
> to guarantee no spurious interrupts.
>
As 'locking in balloon', may I assume the approach I took for the compaction case
is OK and aligned to address these concerns of yours? If not, do not hesitate in
giving me your thoughts, please. I'm respinning a V3 series to address a couple
of extra nitpicks from the compaction standpoint, and I'd love to be able to
address any extra concern you might have on the balloon side of that work.
Thanks!
Rafael.
^ permalink raw reply
* RE: [PATCH 00/13] drivers: hv: kvp
From: KY Srinivasan @ 2012-07-02 15:22 UTC (permalink / raw)
To: Olaf Hering
Cc: Greg KH, apw@canonical.com, devel@linuxdriverproject.org,
virtualization@lists.osdl.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
In-Reply-To: <20120628142340.GA21537@aepfle.de>
> -----Original Message-----
> From: Olaf Hering [mailto:olaf@aepfle.de]
> Sent: Thursday, June 28, 2012 10:24 AM
> To: KY Srinivasan
> Cc: Greg KH; apw@canonical.com; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 00/13] drivers: hv: kvp
>
> On Tue, Jun 26, KY Srinivasan wrote:
>
> > > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > > The fact that it was Red Hat specific was the main part, this should be
> > > done in a standard way, with standard tools, right?
> >
> > The reason I asked this question was to make sure I address these
> > issues in addition to whatever I am debugging now. I use the standard
> > tools and calls to retrieve all the IP configuration. As I look at
> > each distribution the files they keep persistent IP configuration
> > Information is different and that is the reason I chose to start with
> > RedHat. If there is a standard way to store the configuration, I will
> > do that.
>
>
> KY,
>
> instead of using system() in kvp_get_ipconfig_info and kvp_set_ip_info,
> wouldnt it be easier to call an external helper script which does all
> the distribution specific work? Just define some API to pass values to
> the script, and something to read values collected by the script back
> into the daemon.
On the "Get" side I mostly use standard commands/APIs to get all the information:
1) IP address information and subnet mask: getifaddrs()
2) DNS information: Parsing /etc/resolv.conf
3) /sbin/ip command for all the routing information
4) Parse /etc/sysconfig/network-scripts/ifcfg-ethx for boot protocol
As you can see, all but the boot protocol is gathered using the "standard distro
independent mechanisms. I was looking at NetworkManager cli and it looks
like I could gather all the information except the boot protocol information. I am
not sure how to gather the boot protocol information in a distro independent fashion.
On the SET side, I need to persistently store the settings in an appropriate configuration
file and flush these settings down so that the interface is appropriately configured. It is here
that I am struggling to find a distro independent way of doing things. It would be great if I can
use NetworkManager cli (nmcli) to accomplish this. Any help here would be greatly appreciated.
While I toyed with your proposal, I feel it just pushes the problem out of the daemon code -
we would still need to write distro specific scripts. If this approach is something that everybody
is comfortable with, I can take a stab at implementing that.
>
> If the work is done in a script it will be much easier for an admin to
> debug and adjust it.
>
> I think there is no standard way to configure all relevant distros in
> the same way. Maybe one day NetworkManager can finally handle all
> possible ways to configure network related things. But until that
> happens the config files need to be adjusted manually.
>
>
>
> Some of the functions have deep indention levels due to 'while() {
> switch() }' usage. Perhaps such code could be moved into its own
> function so that lines dont need to be wrapped that much due to the odd
> 80 column limit.
I will take care of this. As suggested by Greg, I am adding netdev developers here to
seek their input.
Regards,
K. Y
^ permalink raw reply
* Re: [PATCH 1/2 v2] scsi bus: introduce hotplug() and hot_unplug() interfaces for SCSI bus
From: Paolo Bonzini @ 2012-07-02 8:27 UTC (permalink / raw)
To: Cong Meng
Cc: stefanha, zwanp, linuxram, senwang, qemu-devel, Anthony Liguori,
virtualization, Andreas Färber
In-Reply-To: <ad3714e594248d5a8d558b6983db3fb2da65ae9a.1340264611.git.mc@linux.vnet.ibm.com>
Il 21/06/2012 09:54, Cong Meng ha scritto:
> Add two interfaces hotplug() and hot_unplug() to scsi bus info.
> The embody scsi bus can implement these two interfaces to signal the HBA driver
> of guest kernel to add/remove the scsi device in question.
>
> Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
> Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
> ---
> hw/scsi-bus.c | 16 +++++++++++++++-
> hw/scsi.h | 2 ++
> 2 files changed, 17 insertions(+), 1 deletions(-)
>
> diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
> index dbdb99c..f08900e 100644
> --- a/hw/scsi-bus.c
> +++ b/hw/scsi-bus.c
> @@ -177,6 +177,10 @@ static int scsi_qdev_init(DeviceState *qdev)
> dev);
> }
>
> + if (bus->info->hotplug) {
> + bus->info->hotplug(bus, dev);
> + }
> +
> err:
> return rc;
> }
> @@ -1539,6 +1543,16 @@ static int get_scsi_requests(QEMUFile *f, void *pv, size_t size)
> return 0;
> }
>
> +static int scsi_qdev_unplug(DeviceState *qdev)
> +{
> + SCSIDevice *dev = SCSI_DEVICE(qdev);
> + SCSIBus *bus = scsi_bus_from_device(dev);
> +
> + if (bus->info->hot_unplug)
> + bus->info->hot_unplug(bus, dev);
Tab indentation and braces missing. I fixed this up and applied it to
the scsi-next branch.
Paolo
> + return qdev_simple_unplug_cb(qdev);
> +}
> +
> const VMStateInfo vmstate_info_scsi_requests = {
> .name = "scsi-requests",
> .get = get_scsi_requests,
> @@ -1575,7 +1589,7 @@ static void scsi_device_class_init(ObjectClass *klass, void *data)
> DeviceClass *k = DEVICE_CLASS(klass);
> k->bus_info = &scsi_bus_info;
> k->init = scsi_qdev_init;
> - k->unplug = qdev_simple_unplug_cb;
> + k->unplug = scsi_qdev_unplug;
> k->exit = scsi_qdev_exit;
> }
>
> diff --git a/hw/scsi.h b/hw/scsi.h
> index 2eb66f7..5768071 100644
> --- a/hw/scsi.h
> +++ b/hw/scsi.h
> @@ -130,6 +130,8 @@ struct SCSIBusInfo {
> void (*transfer_data)(SCSIRequest *req, uint32_t arg);
> void (*complete)(SCSIRequest *req, uint32_t arg, size_t resid);
> void (*cancel)(SCSIRequest *req);
> + void (*hotplug)(SCSIBus *bus, SCSIDevice *dev);
> + void (*hot_unplug)(SCSIBus *bus, SCSIDevice *dev);
> QEMUSGList *(*get_sg_list)(SCSIRequest *req);
>
> void (*save_request)(QEMUFile *f, SCSIRequest *req);
>
^ permalink raw reply
* Re: [PATCH] virtio-scsi: hotplug suppot for virtio-scsi
From: Paolo Bonzini @ 2012-07-02 8:11 UTC (permalink / raw)
To: mc
Cc: stefanha, linux-scsi, linux-kernel, zwanp, linuxram, senwang,
virtualization, James E.J. Bottomley, Anthony Liguori
In-Reply-To: <20120702032057.Horde.WOVSk5ir309P8UvZ8oq3rUA@imap.linux.ibm.com>
Il 02/07/2012 09:20, mc@linux.vnet.ibm.com ha scritto:
>>> +static void virtscsi_handle_event(struct work_struct *work)
>>> +{
>>> + struct virtio_scsi_event_node *event_node =
>>> + container_of(work, struct virtio_scsi_event_node, work);
>>> + struct virtio_scsi *vscsi = event_node->vscsi;
>>> + struct virtio_scsi_event *event = &event_node->event;
>>> +
>>> + switch (event->event) {
>>> + case VIRTIO_SCSI_T_TRANSPORT_RESET:
>>> + virtscsi_handle_transport_reset(vscsi, event);
>>> + break;
>>
>> Please handle VIRTIO_SCSI_T_NO_EVENT too here, and mask out
>> VIRTIO_SCSI_T_EVENTS_MISSED even if you do not handle it by triggering a
>> full rescan.
>>
> Paolo, what's the appropriate action to response VIRTIO_SCSI_T_NO_EVENT
> event? An error message?
No, just do nothing.
> I also have some confusion on VIRTIO_SCSI_T_EVENTS_MISSED:
> Should VIRTIO_SCSI_T_EVENTS_MISSED always be used with other event flag?
Yes.
> For instanse:
> (VIRTIO_SCSI_T_EVENTS_MISSED|VIRTIO_SCSI_T_TRANSPORT_RESET) is used to
> report hotplug event dropped when a buffer available.
The idea in the spec was that QEMU would report VIRTIO_SCSI_T_NO_EVENT |
VIRTIO_SCSI_T_EVENTS_MISSED as soon as a buffer becomes available, but
technically this is indeed possible. So you need to handle
VIRTIO_SCSI_T_EVENTS_MISSED first, and the other bits separately.
Paolo
^ permalink raw reply
* Re: [RFC V2 PATCH 4/4] virtio-net: add multiqueue support
From: Michael S. Tsirkin @ 2012-07-02 7:34 UTC (permalink / raw)
To: Jason Wang
Cc: krkumar2, habanero, aliguori, mashirle, kvm, qemu-devel,
virtualization, tahm, jwhan
In-Reply-To: <4FF147E0.7090903@redhat.com>
On Mon, Jul 02, 2012 at 03:04:00PM +0800, Jason Wang wrote:
> On 07/01/2012 05:43 PM, Michael S. Tsirkin wrote:
> >On Mon, Jun 25, 2012 at 06:04:49PM +0800, Jason Wang wrote:
> >>This patch let the virtio-net can transmit and recevie packets through multiuple
> >>VLANClientStates and abstract them as multiple virtqueues to guest. A new
> >>parameter 'queues' were introduced to specify the number of queue pairs.
> >>
> >>The main goal for vhost support is to let the multiqueue could be used without
> >>changes in vhost code. So each vhost_net structure were used to track a single
> >>VLANClientState and two virtqueues in the past. As multiple VLANClientState were
> >>stored in the NICState, we can infer the correspond VLANClientState from this
> >>and queue_index easily.
> >>
> >>Signed-off-by: Jason Wang<jasowang@redhat.com>
> >Can this patch be split up?
> >1. extend vhost API to allow multiqueue and minimally tweak virtio
> >2. add real multiqueue for virtio
> >
> >Hmm?
>
> Sure, do you think it's necessary to separate the vhost parts of
> multiqueue from virtio?
Separating is always nice if it is not too hard.
> >>---
> >> hw/vhost.c | 58 ++++---
> >> hw/vhost.h | 1
> >> hw/vhost_net.c | 7 +
> >> hw/vhost_net.h | 2
> >> hw/virtio-net.c | 461 +++++++++++++++++++++++++++++++++++++------------------
> >> hw/virtio-net.h | 3
> >> 6 files changed, 355 insertions(+), 177 deletions(-)
> >>
> >>diff --git a/hw/vhost.c b/hw/vhost.c
> >>index 43664e7..6318bb2 100644
> >>--- a/hw/vhost.c
> >>+++ b/hw/vhost.c
> >>@@ -620,11 +620,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
> >> {
> >> target_phys_addr_t s, l, a;
> >> int r;
> >>+ int vhost_vq_index = (idx> 2 ? idx - 1 : idx) % dev->nvqs;
> >> struct vhost_vring_file file = {
> >>- .index = idx,
> >>+ .index = vhost_vq_index
> >> };
> >> struct vhost_vring_state state = {
> >>- .index = idx,
> >>+ .index = vhost_vq_index
> >> };
> >> struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
> >>
> >>@@ -670,11 +671,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
> >> goto fail_alloc_ring;
> >> }
> >>
> >>- r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
> >>+ r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev->log_enabled);
> >> if (r< 0) {
> >> r = -errno;
> >> goto fail_alloc;
> >> }
> >>+
> >> file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq));
> >> r = ioctl(dev->control, VHOST_SET_VRING_KICK,&file);
> >> if (r) {
> >>@@ -715,7 +717,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
> >> unsigned idx)
> >> {
> >> struct vhost_vring_state state = {
> >>- .index = idx,
> >>+ .index = (idx> 2 ? idx - 1 : idx) % dev->nvqs,
> >> };
> >> int r;
> >> r = ioctl(dev->control, VHOST_GET_VRING_BASE,&state);
> >>@@ -829,7 +831,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
> >> }
> >>
> >> for (i = 0; i< hdev->nvqs; ++i) {
> >>- r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, true);
> >>+ r = vdev->binding->set_host_notifier(vdev->binding_opaque,
> >>+ hdev->start_idx + i,
> >>+ true);
> >> if (r< 0) {
> >> fprintf(stderr, "vhost VQ %d notifier binding failed: %d\n", i, -r);
> >> goto fail_vq;
> >>@@ -839,7 +843,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
> >> return 0;
> >> fail_vq:
> >> while (--i>= 0) {
> >>- r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, false);
> >>+ r = vdev->binding->set_host_notifier(vdev->binding_opaque,
> >>+ hdev->start_idx + i,
> >>+ false);
> >> if (r< 0) {
> >> fprintf(stderr, "vhost VQ %d notifier cleanup error: %d\n", i, -r);
> >> fflush(stderr);
> >>@@ -860,7 +866,9 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
> >> int i, r;
> >>
> >> for (i = 0; i< hdev->nvqs; ++i) {
> >>- r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, false);
> >>+ r = vdev->binding->set_host_notifier(vdev->binding_opaque,
> >>+ hdev->start_idx + i,
> >>+ false);
> >> if (r< 0) {
> >> fprintf(stderr, "vhost VQ %d notifier cleanup failed: %d\n", i, -r);
> >> fflush(stderr);
> >>@@ -874,15 +882,17 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> >> {
> >> int i, r;
> >> if (!vdev->binding->set_guest_notifiers) {
> >>- fprintf(stderr, "binding does not support guest notifiers\n");
> >>+ fprintf(stderr, "binding does not support guest notifier\n");
> >> r = -ENOSYS;
> >> goto fail;
> >> }
> >>
> >>- r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
> >>- if (r< 0) {
> >>- fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> >>- goto fail_notifiers;
> >>+ if (hdev->start_idx == 0) {
> >>+ r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
> >>+ if (r< 0) {
> >>+ fprintf(stderr, "Error binding guest notifier: %d\n", -r);
> >>+ goto fail_notifiers;
> >>+ }
> >> }
> >>
> >> r = vhost_dev_set_features(hdev, hdev->log_enabled);
> >>@@ -898,7 +908,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> >> r = vhost_virtqueue_init(hdev,
> >> vdev,
> >> hdev->vqs + i,
> >>- i);
> >>+ hdev->start_idx + i);
> >> if (r< 0) {
> >> goto fail_vq;
> >> }
> >>@@ -925,11 +935,13 @@ fail_vq:
> >> vhost_virtqueue_cleanup(hdev,
> >> vdev,
> >> hdev->vqs + i,
> >>- i);
> >>+ hdev->start_idx + i);
> >> }
> >>+ i = hdev->nvqs;
> >> fail_mem:
> >> fail_features:
> >>- vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
> >>+ if (hdev->start_idx == 0)
> >>+ vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
> >> fail_notifiers:
> >> fail:
> >> return r;
> >>@@ -944,18 +956,22 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
> >> vhost_virtqueue_cleanup(hdev,
> >> vdev,
> >> hdev->vqs + i,
> >>- i);
> >>+ hdev->start_idx + i);
> >> }
> >>+
> >> for (i = 0; i< hdev->n_mem_sections; ++i) {
> >> vhost_sync_dirty_bitmap(hdev,&hdev->mem_sections[i],
> >> 0, (target_phys_addr_t)~0x0ull);
> >> }
> >>- r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
> >>- if (r< 0) {
> >>- fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
> >>- fflush(stderr);
> >>+
> >>+ if (hdev->start_idx == 0) {
> >>+ r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
> >>+ if (r< 0) {
> >>+ fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
> >>+ fflush(stderr);
> >>+ }
> >>+ assert (r>= 0);
> >> }
> >>- assert (r>= 0);
> >>
> >> hdev->started = false;
> >> g_free(hdev->log);
> >>diff --git a/hw/vhost.h b/hw/vhost.h
> >>index 80e64df..fa5357a 100644
> >>--- a/hw/vhost.h
> >>+++ b/hw/vhost.h
> >>@@ -34,6 +34,7 @@ struct vhost_dev {
> >> MemoryRegionSection *mem_sections;
> >> struct vhost_virtqueue *vqs;
> >> int nvqs;
> >>+ int start_idx;
> >> unsigned long long features;
> >> unsigned long long acked_features;
> >> unsigned long long backend_features;
> >>diff --git a/hw/vhost_net.c b/hw/vhost_net.c
> >>index f672e9d..73a72bb 100644
> >>--- a/hw/vhost_net.c
> >>+++ b/hw/vhost_net.c
> >>@@ -138,13 +138,15 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
> >> }
> >>
> >> int vhost_net_start(struct vhost_net *net,
> >>- VirtIODevice *dev)
> >>+ VirtIODevice *dev,
> >>+ int start_idx)
> >> {
> >> struct vhost_vring_file file = { };
> >> int r;
> >>
> >> net->dev.nvqs = 2;
> >> net->dev.vqs = net->vqs;
> >>+ net->dev.start_idx = start_idx;
> >>
> >> r = vhost_dev_enable_notifiers(&net->dev, dev);
> >> if (r< 0) {
> >>@@ -227,7 +229,8 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
> >> }
> >>
> >> int vhost_net_start(struct vhost_net *net,
> >>- VirtIODevice *dev)
> >>+ VirtIODevice *dev,
> >>+ int start_idx)
> >> {
> >> return -ENOSYS;
> >> }
> >>diff --git a/hw/vhost_net.h b/hw/vhost_net.h
> >>index 91e40b1..79a4f09 100644
> >>--- a/hw/vhost_net.h
> >>+++ b/hw/vhost_net.h
> >>@@ -9,7 +9,7 @@ typedef struct vhost_net VHostNetState;
> >> VHostNetState *vhost_net_init(VLANClientState *backend, int devfd, bool force);
> >>
> >> bool vhost_net_query(VHostNetState *net, VirtIODevice *dev);
> >>-int vhost_net_start(VHostNetState *net, VirtIODevice *dev);
> >>+int vhost_net_start(VHostNetState *net, VirtIODevice *dev, int start_idx);
> >> void vhost_net_stop(VHostNetState *net, VirtIODevice *dev);
> >>
> >> void vhost_net_cleanup(VHostNetState *net);
> >>diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> >>index 3f190d4..d42c4cc 100644
> >>--- a/hw/virtio-net.c
> >>+++ b/hw/virtio-net.c
> >>@@ -26,34 +26,43 @@
> >> #define MAC_TABLE_ENTRIES 64
> >> #define MAX_VLAN (1<< 12) /* Per 802.1Q definition */
> >>
> >>-typedef struct VirtIONet
> >>+struct VirtIONet;
> >>+
> >>+typedef struct VirtIONetQueue
> >> {
> >>- VirtIODevice vdev;
> >>- uint8_t mac[ETH_ALEN];
> >>- uint16_t status;
> >> VirtQueue *rx_vq;
> >> VirtQueue *tx_vq;
> >>- VirtQueue *ctrl_vq;
> >>- NICState *nic;
> >> QEMUTimer *tx_timer;
> >> QEMUBH *tx_bh;
> >> uint32_t tx_timeout;
> >>- int32_t tx_burst;
> >> int tx_waiting;
> >>- uint32_t has_vnet_hdr;
> >>- uint8_t has_ufo;
> >> struct {
> >> VirtQueueElement elem;
> >> ssize_t len;
> >> } async_tx;
> >>+ struct VirtIONet *n;
> >>+ uint8_t vhost_started;
> >>+} VirtIONetQueue;
> >>+
> >>+typedef struct VirtIONet
> >>+{
> >>+ VirtIODevice vdev;
> >>+ uint8_t mac[ETH_ALEN];
> >>+ uint16_t status;
> >>+ VirtIONetQueue vqs[MAX_QUEUE_NUM];
> >>+ VirtQueue *ctrl_vq;
> >>+ NICState *nic;
> >>+ int32_t tx_burst;
> >>+ uint32_t has_vnet_hdr;
> >>+ uint8_t has_ufo;
> >> int mergeable_rx_bufs;
> >>+ int multiqueue;
> >> uint8_t promisc;
> >> uint8_t allmulti;
> >> uint8_t alluni;
> >> uint8_t nomulti;
> >> uint8_t nouni;
> >> uint8_t nobcast;
> >>- uint8_t vhost_started;
> >> struct {
> >> int in_use;
> >> int first_multi;
> >>@@ -63,6 +72,7 @@ typedef struct VirtIONet
> >> } mac_table;
> >> uint32_t *vlans;
> >> DeviceState *qdev;
> >>+ uint32_t queues;
> >> } VirtIONet;
> >>
> >> /* TODO
> >>@@ -74,12 +84,25 @@ static VirtIONet *to_virtio_net(VirtIODevice *vdev)
> >> return (VirtIONet *)vdev;
> >> }
> >>
> >>+static int vq_get_pair_index(VirtIONet *n, VirtQueue *vq)
> >>+{
> >>+ int i;
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ if (n->vqs[i].tx_vq == vq || n->vqs[i].rx_vq == vq) {
> >>+ return i;
> >>+ }
> >>+ }
> >>+ assert(1);
> >>+ return -1;
> >>+}
> >>+
> >> static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
> >> {
> >> VirtIONet *n = to_virtio_net(vdev);
> >> struct virtio_net_config netcfg;
> >>
> >> stw_p(&netcfg.status, n->status);
> >>+ netcfg.queues = n->queues * 2;
> >> memcpy(netcfg.mac, n->mac, ETH_ALEN);
> >> memcpy(config,&netcfg, sizeof(netcfg));
> >> }
> >>@@ -103,78 +126,140 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
> >> (n->status& VIRTIO_NET_S_LINK_UP)&& n->vdev.vm_running;
> >> }
> >>
> >>-static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >>+static void nc_vhost_status(VLANClientState *nc, VirtIONet *n,
> >>+ uint8_t status)
> >> {
> >>- if (!n->nic->nc.peer) {
> >>+ int queue_index = nc->queue_index;
> >>+ VLANClientState *peer = nc->peer;
> >>+ VirtIONetQueue *netq =&n->vqs[nc->queue_index];
> >>+
> >>+ if (!peer) {
> >> return;
> >> }
> >>- if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
> >>+ if (peer->info->type != NET_CLIENT_TYPE_TAP) {
> >> return;
> >> }
> >>
> >>- if (!tap_get_vhost_net(n->nic->nc.peer)) {
> >>+ if (!tap_get_vhost_net(peer)) {
> >> return;
> >> }
> >>- if (!!n->vhost_started == virtio_net_started(n, status)&&
> >>- !n->nic->nc.peer->link_down) {
> >>+ if (!!netq->vhost_started == virtio_net_started(n, status)&&
> >>+ !peer->link_down) {
> >> return;
> >> }
> >>- if (!n->vhost_started) {
> >>- int r;
> >>- if (!vhost_net_query(tap_get_vhost_net(n->nic->nc.peer),&n->vdev)) {
> >>+ if (!netq->vhost_started) {
> >>+ /* skip ctrl vq */
> >>+ int r, start_idx = queue_index == 0 ? 0 : queue_index * 2 + 1;
> >>+ if (!vhost_net_query(tap_get_vhost_net(peer),&n->vdev)) {
> >> return;
> >> }
> >>- r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer),&n->vdev);
> >>+ r = vhost_net_start(tap_get_vhost_net(peer),&n->vdev, start_idx);
> >> if (r< 0) {
> >> error_report("unable to start vhost net: %d: "
> >> "falling back on userspace virtio", -r);
> >> } else {
> >>- n->vhost_started = 1;
> >>+ netq->vhost_started = 1;
> >> }
> >> } else {
> >>- vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer),&n->vdev);
> >>- n->vhost_started = 0;
> >>+ vhost_net_stop(tap_get_vhost_net(peer),&n->vdev);
> >>+ netq->vhost_started = 0;
> >>+ }
> >>+}
> >>+
> >>+static int peer_attach(VirtIONet *n, int index)
> >>+{
> >>+ if (!n->nic->ncs[index]->peer) {
> >>+ return -1;
> >>+ }
> >>+
> >>+ if (n->nic->ncs[index]->peer->info->type != NET_CLIENT_TYPE_TAP) {
> >>+ return -1;
> >>+ }
> >>+
> >>+ return tap_attach(n->nic->ncs[index]->peer);
> >>+}
> >>+
> >>+static int peer_detach(VirtIONet *n, int index)
> >>+{
> >>+ if (!n->nic->ncs[index]->peer) {
> >>+ return -1;
> >>+ }
> >>+
> >>+ if (n->nic->ncs[index]->peer->info->type != NET_CLIENT_TYPE_TAP) {
> >>+ return -1;
> >>+ }
> >>+
> >>+ return tap_detach(n->nic->ncs[index]->peer);
> >>+}
> >>+
> >>+static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >>+{
> >>+ int i;
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ if (!n->multiqueue&& i != 0)
> >>+ status = 0;
> >>+ nc_vhost_status(n->nic->ncs[i], n, status);
> >> }
> >> }
> >>
> >> static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
> >> {
> >> VirtIONet *n = to_virtio_net(vdev);
> >>+ int i;
> >>
> >> virtio_net_vhost_status(n, status);
> >>
> >>- if (!n->tx_waiting) {
> >>- return;
> >>- }
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ VirtIONetQueue *netq =&n->vqs[i];
> >>+ if (!netq->tx_waiting) {
> >>+ continue;
> >>+ }
> >>+
> >>+ if (!n->multiqueue&& i != 0)
> >>+ status = 0;
> >>
> >>- if (virtio_net_started(n, status)&& !n->vhost_started) {
> >>- if (n->tx_timer) {
> >>- qemu_mod_timer(n->tx_timer,
> >>- qemu_get_clock_ns(vm_clock) + n->tx_timeout);
> >>+ if (virtio_net_started(n, status)&& !netq->vhost_started) {
> >>+ if (netq->tx_timer) {
> >>+ qemu_mod_timer(netq->tx_timer,
> >>+ qemu_get_clock_ns(vm_clock) + netq->tx_timeout);
> >>+ } else {
> >>+ qemu_bh_schedule(netq->tx_bh);
> >>+ }
> >> } else {
> >>- qemu_bh_schedule(n->tx_bh);
> >>+ if (netq->tx_timer) {
> >>+ qemu_del_timer(netq->tx_timer);
> >>+ } else {
> >>+ qemu_bh_cancel(netq->tx_bh);
> >>+ }
> >> }
> >>- } else {
> >>- if (n->tx_timer) {
> >>- qemu_del_timer(n->tx_timer);
> >>- } else {
> >>- qemu_bh_cancel(n->tx_bh);
> >>+ }
> >>+}
> >>+
> >>+static bool virtio_net_is_link_up(VirtIONet *n)
> >>+{
> >>+ int i;
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ if (n->nic->ncs[i]->link_down) {
> >>+ return false;
> >> }
> >> }
> >>+ return true;
> >> }
> >>
> >> static void virtio_net_set_link_status(VLANClientState *nc)
> >> {
> >>- VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
> >>+ VirtIONet *n = ((NICState *)(nc->opaque))->opaque;
> >> uint16_t old_status = n->status;
> >>
> >>- if (nc->link_down)
> >>+ if (virtio_net_is_link_up(n)) {
> >> n->status&= ~VIRTIO_NET_S_LINK_UP;
> >>- else
> >>+ } else {
> >> n->status |= VIRTIO_NET_S_LINK_UP;
> >>+ }
> >>
> >>- if (n->status != old_status)
> >>+ if (n->status != old_status) {
> >> virtio_notify_config(&n->vdev);
> >>+ }
> >>
> >> virtio_net_set_status(&n->vdev, n->vdev.status);
> >> }
> >>@@ -202,13 +287,15 @@ static void virtio_net_reset(VirtIODevice *vdev)
> >>
> >> static int peer_has_vnet_hdr(VirtIONet *n)
> >> {
> >>- if (!n->nic->nc.peer)
> >>+ if (!n->nic->ncs[0]->peer) {
> >> return 0;
> >>+ }
> >>
> >>- if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP)
> >>+ if (n->nic->ncs[0]->peer->info->type != NET_CLIENT_TYPE_TAP) {
> >> return 0;
> >>+ }
> >>
> >>- n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->nc.peer);
> >>+ n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->ncs[0]->peer);
> >>
> >> return n->has_vnet_hdr;
> >> }
> >>@@ -218,7 +305,7 @@ static int peer_has_ufo(VirtIONet *n)
> >> if (!peer_has_vnet_hdr(n))
> >> return 0;
> >>
> >>- n->has_ufo = tap_has_ufo(n->nic->nc.peer);
> >>+ n->has_ufo = tap_has_ufo(n->nic->ncs[0]->peer);
> >>
> >> return n->has_ufo;
> >> }
> >>@@ -228,9 +315,13 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
> >> VirtIONet *n = to_virtio_net(vdev);
> >>
> >> features |= (1<< VIRTIO_NET_F_MAC);
> >>+ features |= (1<< VIRTIO_NET_F_MULTIQUEUE);
> >>
> >> if (peer_has_vnet_hdr(n)) {
> >>- tap_using_vnet_hdr(n->nic->nc.peer, 1);
> >>+ int i;
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ tap_using_vnet_hdr(n->nic->ncs[i]->peer, 1);
> >>+ }
> >> } else {
> >> features&= ~(0x1<< VIRTIO_NET_F_CSUM);
> >> features&= ~(0x1<< VIRTIO_NET_F_HOST_TSO4);
> >>@@ -248,14 +339,15 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
> >> features&= ~(0x1<< VIRTIO_NET_F_HOST_UFO);
> >> }
> >>
> >>- if (!n->nic->nc.peer ||
> >>- n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
> >>+ if (!n->nic->ncs[0]->peer ||
> >>+ n->nic->ncs[0]->peer->info->type != NET_CLIENT_TYPE_TAP) {
> >> return features;
> >> }
> >>- if (!tap_get_vhost_net(n->nic->nc.peer)) {
> >>+ if (!tap_get_vhost_net(n->nic->ncs[0]->peer)) {
> >> return features;
> >> }
> >>- return vhost_net_get_features(tap_get_vhost_net(n->nic->nc.peer), features);
> >>+ return vhost_net_get_features(tap_get_vhost_net(n->nic->ncs[0]->peer),
> >>+ features);
> >> }
> >>
> >> static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
> >>@@ -276,25 +368,38 @@ static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
> >> static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
> >> {
> >> VirtIONet *n = to_virtio_net(vdev);
> >>+ int i, r;
> >>
> >> n->mergeable_rx_bufs = !!(features& (1<< VIRTIO_NET_F_MRG_RXBUF));
> >>+ n->multiqueue = !!(features& (1<< VIRTIO_NET_F_MULTIQUEUE));
> >>
> >>- if (n->has_vnet_hdr) {
> >>- tap_set_offload(n->nic->nc.peer,
> >>- (features>> VIRTIO_NET_F_GUEST_CSUM)& 1,
> >>- (features>> VIRTIO_NET_F_GUEST_TSO4)& 1,
> >>- (features>> VIRTIO_NET_F_GUEST_TSO6)& 1,
> >>- (features>> VIRTIO_NET_F_GUEST_ECN)& 1,
> >>- (features>> VIRTIO_NET_F_GUEST_UFO)& 1);
> >>- }
> >>- if (!n->nic->nc.peer ||
> >>- n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
> >>- return;
> >>- }
> >>- if (!tap_get_vhost_net(n->nic->nc.peer)) {
> >>- return;
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ if (!n->multiqueue&& i != 0) {
> >>+ r = peer_detach(n, i);
> >>+ assert(r == 0);
> >>+ } else {
> >>+ r = peer_attach(n, i);
> >>+ assert(r == 0);
> >>+
> >>+ if (n->has_vnet_hdr) {
> >>+ tap_set_offload(n->nic->ncs[i]->peer,
> >>+ (features>> VIRTIO_NET_F_GUEST_CSUM)& 1,
> >>+ (features>> VIRTIO_NET_F_GUEST_TSO4)& 1,
> >>+ (features>> VIRTIO_NET_F_GUEST_TSO6)& 1,
> >>+ (features>> VIRTIO_NET_F_GUEST_ECN)& 1,
> >>+ (features>> VIRTIO_NET_F_GUEST_UFO)& 1);
> >>+ }
> >>+ if (!n->nic->ncs[i]->peer ||
> >>+ n->nic->ncs[i]->peer->info->type != NET_CLIENT_TYPE_TAP) {
> >>+ continue;
> >>+ }
> >>+ if (!tap_get_vhost_net(n->nic->ncs[i]->peer)) {
> >>+ continue;
> >>+ }
> >>+ vhost_net_ack_features(tap_get_vhost_net(n->nic->ncs[i]->peer),
> >>+ features);
> >>+ }
> >> }
> >>- vhost_net_ack_features(tap_get_vhost_net(n->nic->nc.peer), features);
> >> }
> >>
> >> static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> >>@@ -446,7 +551,7 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
> >> {
> >> VirtIONet *n = to_virtio_net(vdev);
> >>
> >>- qemu_flush_queued_packets(&n->nic->nc);
> >>+ qemu_flush_queued_packets(n->nic->ncs[vq_get_pair_index(n, vq)]);
> >>
> >> /* We now have RX buffers, signal to the IO thread to break out of the
> >> * select to re-poll the tap file descriptor */
> >>@@ -455,36 +560,37 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
> >>
> >> static int virtio_net_can_receive(VLANClientState *nc)
> >> {
> >>- VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
> >>+ int queue_index = nc->queue_index;
> >>+ VirtIONet *n = ((NICState *)nc->opaque)->opaque;
> >>+
> >> if (!n->vdev.vm_running) {
> >> return 0;
> >> }
> >>
> >>- if (!virtio_queue_ready(n->rx_vq) ||
> >>+ if (!virtio_queue_ready(n->vqs[queue_index].rx_vq) ||
> >> !(n->vdev.status& VIRTIO_CONFIG_S_DRIVER_OK))
> >> return 0;
> >>
> >> return 1;
> >> }
> >>
> >>-static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
> >>+static int virtio_net_has_buffers(VirtIONet *n, int bufsize, VirtQueue *vq)
> >> {
> >>- if (virtio_queue_empty(n->rx_vq) ||
> >>- (n->mergeable_rx_bufs&&
> >>- !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
> >>- virtio_queue_set_notification(n->rx_vq, 1);
> >>+ if (virtio_queue_empty(vq) || (n->mergeable_rx_bufs&&
> >>+ !virtqueue_avail_bytes(vq, bufsize, 0))) {
> >>+ virtio_queue_set_notification(vq, 1);
> >>
> >> /* To avoid a race condition where the guest has made some buffers
> >> * available after the above check but before notification was
> >> * enabled, check for available buffers again.
> >> */
> >>- if (virtio_queue_empty(n->rx_vq) ||
> >>- (n->mergeable_rx_bufs&&
> >>- !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
> >>+ if (virtio_queue_empty(vq) || (n->mergeable_rx_bufs&&
> >>+ !virtqueue_avail_bytes(vq, bufsize, 0))) {
> >> return 0;
> >>+ }
> >> }
> >>
> >>- virtio_queue_set_notification(n->rx_vq, 0);
> >>+ virtio_queue_set_notification(vq, 0);
> >> return 1;
> >> }
> >>
> >>@@ -595,12 +701,15 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
> >>
> >> static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
> >> {
> >>- VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
> >>+ int queue_index = nc->queue_index;
> >>+ VirtIONet *n = ((NICState *)(nc->opaque))->opaque;
> >>+ VirtQueue *vq = n->vqs[queue_index].rx_vq;
> >> struct virtio_net_hdr_mrg_rxbuf *mhdr = NULL;
> >> size_t guest_hdr_len, offset, i, host_hdr_len;
> >>
> >>- if (!virtio_net_can_receive(&n->nic->nc))
> >>+ if (!virtio_net_can_receive(n->nic->ncs[queue_index])) {
> >> return -1;
> >>+ }
> >>
> >> /* hdr_len refers to the header we supply to the guest */
> >> guest_hdr_len = n->mergeable_rx_bufs ?
> >>@@ -608,7 +717,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
> >>
> >>
> >> host_hdr_len = n->has_vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
> >>- if (!virtio_net_has_buffers(n, size + guest_hdr_len - host_hdr_len))
> >>+ if (!virtio_net_has_buffers(n, size + guest_hdr_len - host_hdr_len, vq))
> >> return 0;
> >>
> >> if (!receive_filter(n, buf, size))
> >>@@ -623,7 +732,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
> >>
> >> total = 0;
> >>
> >>- if (virtqueue_pop(n->rx_vq,&elem) == 0) {
> >>+ if (virtqueue_pop(vq,&elem) == 0) {
> >> if (i == 0)
> >> return -1;
> >> error_report("virtio-net unexpected empty queue: "
> >>@@ -675,47 +784,50 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
> >> }
> >>
> >> /* signal other side */
> >>- virtqueue_fill(n->rx_vq,&elem, total, i++);
> >>+ virtqueue_fill(vq,&elem, total, i++);
> >> }
> >>
> >> if (mhdr) {
> >> stw_p(&mhdr->num_buffers, i);
> >> }
> >>
> >>- virtqueue_flush(n->rx_vq, i);
> >>- virtio_notify(&n->vdev, n->rx_vq);
> >>+ virtqueue_flush(vq, i);
> >>+ virtio_notify(&n->vdev, vq);
> >>
> >> return size;
> >> }
> >>
> >>-static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq);
> >>+static int32_t virtio_net_flush_tx(VirtIONet *n, VirtIONetQueue *tvq);
> >>
> >> static void virtio_net_tx_complete(VLANClientState *nc, ssize_t len)
> >> {
> >>- VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
> >>+ VirtIONet *n = ((NICState *)nc->opaque)->opaque;
> >>+ VirtIONetQueue *netq =&n->vqs[nc->queue_index];
> >>
> >>- virtqueue_push(n->tx_vq,&n->async_tx.elem, n->async_tx.len);
> >>- virtio_notify(&n->vdev, n->tx_vq);
> >>+ virtqueue_push(netq->tx_vq,&netq->async_tx.elem, netq->async_tx.len);
> >>+ virtio_notify(&n->vdev, netq->tx_vq);
> >>
> >>- n->async_tx.elem.out_num = n->async_tx.len = 0;
> >>+ netq->async_tx.elem.out_num = netq->async_tx.len;
> >>
> >>- virtio_queue_set_notification(n->tx_vq, 1);
> >>- virtio_net_flush_tx(n, n->tx_vq);
> >>+ virtio_queue_set_notification(netq->tx_vq, 1);
> >>+ virtio_net_flush_tx(n, netq);
> >> }
> >>
> >> /* TX */
> >>-static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
> >>+static int32_t virtio_net_flush_tx(VirtIONet *n, VirtIONetQueue *netq)
> >> {
> >> VirtQueueElement elem;
> >> int32_t num_packets = 0;
> >>+ VirtQueue *vq = netq->tx_vq;
> >>+
> >> if (!(n->vdev.status& VIRTIO_CONFIG_S_DRIVER_OK)) {
> >> return num_packets;
> >> }
> >>
> >> assert(n->vdev.vm_running);
> >>
> >>- if (n->async_tx.elem.out_num) {
> >>- virtio_queue_set_notification(n->tx_vq, 0);
> >>+ if (netq->async_tx.elem.out_num) {
> >>+ virtio_queue_set_notification(vq, 0);
> >> return num_packets;
> >> }
> >>
> >>@@ -747,12 +859,12 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
> >> len += hdr_len;
> >> }
> >>
> >>- ret = qemu_sendv_packet_async(&n->nic->nc, out_sg, out_num,
> >>- virtio_net_tx_complete);
> >>+ ret = qemu_sendv_packet_async(n->nic->ncs[vq_get_pair_index(n, vq)],
> >>+ out_sg, out_num, virtio_net_tx_complete);
> >> if (ret == 0) {
> >>- virtio_queue_set_notification(n->tx_vq, 0);
> >>- n->async_tx.elem = elem;
> >>- n->async_tx.len = len;
> >>+ virtio_queue_set_notification(vq, 0);
> >>+ netq->async_tx.elem = elem;
> >>+ netq->async_tx.len = len;
> >> return -EBUSY;
> >> }
> >>
> >>@@ -771,22 +883,23 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
> >> static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
> >> {
> >> VirtIONet *n = to_virtio_net(vdev);
> >>+ VirtIONetQueue *netq =&n->vqs[vq_get_pair_index(n, vq)];
> >>
> >> /* This happens when device was stopped but VCPU wasn't. */
> >> if (!n->vdev.vm_running) {
> >>- n->tx_waiting = 1;
> >>+ netq->tx_waiting = 1;
> >> return;
> >> }
> >>
> >>- if (n->tx_waiting) {
> >>+ if (netq->tx_waiting) {
> >> virtio_queue_set_notification(vq, 1);
> >>- qemu_del_timer(n->tx_timer);
> >>- n->tx_waiting = 0;
> >>- virtio_net_flush_tx(n, vq);
> >>+ qemu_del_timer(netq->tx_timer);
> >>+ netq->tx_waiting = 0;
> >>+ virtio_net_flush_tx(n, netq);
> >> } else {
> >>- qemu_mod_timer(n->tx_timer,
> >>- qemu_get_clock_ns(vm_clock) + n->tx_timeout);
> >>- n->tx_waiting = 1;
> >>+ qemu_mod_timer(netq->tx_timer,
> >>+ qemu_get_clock_ns(vm_clock) + netq->tx_timeout);
> >>+ netq->tx_waiting = 1;
> >> virtio_queue_set_notification(vq, 0);
> >> }
> >> }
> >>@@ -794,48 +907,53 @@ static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
> >> static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
> >> {
> >> VirtIONet *n = to_virtio_net(vdev);
> >>+ VirtIONetQueue *netq =&n->vqs[vq_get_pair_index(n, vq)];
> >>
> >>- if (unlikely(n->tx_waiting)) {
> >>+ if (unlikely(netq->tx_waiting)) {
> >> return;
> >> }
> >>- n->tx_waiting = 1;
> >>+ netq->tx_waiting = 1;
> >> /* This happens when device was stopped but VCPU wasn't. */
> >> if (!n->vdev.vm_running) {
> >> return;
> >> }
> >> virtio_queue_set_notification(vq, 0);
> >>- qemu_bh_schedule(n->tx_bh);
> >>+ qemu_bh_schedule(netq->tx_bh);
> >> }
> >>
> >> static void virtio_net_tx_timer(void *opaque)
> >> {
> >>- VirtIONet *n = opaque;
> >>+ VirtIONetQueue *netq = opaque;
> >>+ VirtIONet *n = netq->n;
> >>+
> >> assert(n->vdev.vm_running);
> >>
> >>- n->tx_waiting = 0;
> >>+ netq->tx_waiting = 0;
> >>
> >> /* Just in case the driver is not ready on more */
> >> if (!(n->vdev.status& VIRTIO_CONFIG_S_DRIVER_OK))
> >> return;
> >>
> >>- virtio_queue_set_notification(n->tx_vq, 1);
> >>- virtio_net_flush_tx(n, n->tx_vq);
> >>+ virtio_queue_set_notification(netq->tx_vq, 1);
> >>+ virtio_net_flush_tx(n, netq);
> >> }
> >>
> >> static void virtio_net_tx_bh(void *opaque)
> >> {
> >>- VirtIONet *n = opaque;
> >>+ VirtIONetQueue *netq = opaque;
> >>+ VirtQueue *vq = netq->tx_vq;
> >>+ VirtIONet *n = netq->n;
> >> int32_t ret;
> >>
> >> assert(n->vdev.vm_running);
> >>
> >>- n->tx_waiting = 0;
> >>+ netq->tx_waiting = 0;
> >>
> >> /* Just in case the driver is not ready on more */
> >> if (unlikely(!(n->vdev.status& VIRTIO_CONFIG_S_DRIVER_OK)))
> >> return;
> >>
> >>- ret = virtio_net_flush_tx(n, n->tx_vq);
> >>+ ret = virtio_net_flush_tx(n, netq);
> >> if (ret == -EBUSY) {
> >> return; /* Notification re-enable handled by tx_complete */
> >> }
> >>@@ -843,33 +961,39 @@ static void virtio_net_tx_bh(void *opaque)
> >> /* If we flush a full burst of packets, assume there are
> >> * more coming and immediately reschedule */
> >> if (ret>= n->tx_burst) {
> >>- qemu_bh_schedule(n->tx_bh);
> >>- n->tx_waiting = 1;
> >>+ qemu_bh_schedule(netq->tx_bh);
> >>+ netq->tx_waiting = 1;
> >> return;
> >> }
> >>
> >> /* If less than a full burst, re-enable notification and flush
> >> * anything that may have come in while we weren't looking. If
> >> * we find something, assume the guest is still active and reschedule */
> >>- virtio_queue_set_notification(n->tx_vq, 1);
> >>- if (virtio_net_flush_tx(n, n->tx_vq)> 0) {
> >>- virtio_queue_set_notification(n->tx_vq, 0);
> >>- qemu_bh_schedule(n->tx_bh);
> >>- n->tx_waiting = 1;
> >>+ virtio_queue_set_notification(vq, 1);
> >>+ if (virtio_net_flush_tx(n, netq)> 0) {
> >>+ virtio_queue_set_notification(vq, 0);
> >>+ qemu_bh_schedule(netq->tx_bh);
> >>+ netq->tx_waiting = 1;
> >> }
> >> }
> >>
> >> static void virtio_net_save(QEMUFile *f, void *opaque)
> >> {
> >> VirtIONet *n = opaque;
> >>+ int i;
> >>
> >> /* At this point, backend must be stopped, otherwise
> >> * it might keep writing to memory. */
> >>- assert(!n->vhost_started);
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ assert(!n->vqs[i].vhost_started);
> >>+ }
> >> virtio_save(&n->vdev, f);
> >>
> >> qemu_put_buffer(f, n->mac, ETH_ALEN);
> >>- qemu_put_be32(f, n->tx_waiting);
> >>+ qemu_put_be32(f, n->queues);
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ qemu_put_be32(f, n->vqs[i].tx_waiting);
> >>+ }
> >> qemu_put_be32(f, n->mergeable_rx_bufs);
> >> qemu_put_be16(f, n->status);
> >> qemu_put_byte(f, n->promisc);
> >>@@ -902,7 +1026,10 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
> >> }
> >>
> >> qemu_get_buffer(f, n->mac, ETH_ALEN);
> >>- n->tx_waiting = qemu_get_be32(f);
> >>+ n->queues = qemu_get_be32(f);
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ n->vqs[i].tx_waiting = qemu_get_be32(f);
> >>+ }
> >> n->mergeable_rx_bufs = qemu_get_be32(f);
> >>
> >> if (version_id>= 3)
> >>@@ -930,7 +1057,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
> >> n->mac_table.in_use = 0;
> >> }
> >> }
> >>-
> >>+
> >> if (version_id>= 6)
> >> qemu_get_buffer(f, (uint8_t *)n->vlans, MAX_VLAN>> 3);
> >>
> >>@@ -941,13 +1068,16 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
> >> }
> >>
> >> if (n->has_vnet_hdr) {
> >>- tap_using_vnet_hdr(n->nic->nc.peer, 1);
> >>- tap_set_offload(n->nic->nc.peer,
> >>- (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_CSUM)& 1,
> >>- (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_TSO4)& 1,
> >>- (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_TSO6)& 1,
> >>- (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_ECN)& 1,
> >>- (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_UFO)& 1);
> >>+ for(i = 0; i< n->queues; i++) {
> >>+ tap_using_vnet_hdr(n->nic->ncs[i]->peer, 1);
> >>+ tap_set_offload(n->nic->ncs[i]->peer,
> >>+ (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_CSUM)& 1,
> >>+ (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_TSO4)& 1,
> >>+ (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_TSO6)& 1,
> >>+ (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_ECN)& 1,
> >>+ (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_UFO)&
> >>+ 1);
> >>+ }
> >> }
> >> }
> >>
> >>@@ -982,7 +1112,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
> >>
> >> static void virtio_net_cleanup(VLANClientState *nc)
> >> {
> >>- VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
> >>+ VirtIONet *n = ((NICState *)nc->opaque)->opaque;
> >>
> >> n->nic = NULL;
> >> }
> >>@@ -1000,6 +1130,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
> >> virtio_net_conf *net)
> >> {
> >> VirtIONet *n;
> >>+ int i;
> >>
> >> n = (VirtIONet *)virtio_common_init("virtio-net", VIRTIO_ID_NET,
> >> sizeof(struct virtio_net_config),
> >>@@ -1012,7 +1143,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
> >> n->vdev.bad_features = virtio_net_bad_features;
> >> n->vdev.reset = virtio_net_reset;
> >> n->vdev.set_status = virtio_net_set_status;
> >>- n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
> >>
> >> if (net->tx&& strcmp(net->tx, "timer")&& strcmp(net->tx, "bh")) {
> >> error_report("virtio-net: "
> >>@@ -1021,15 +1151,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
> >> error_report("Defaulting to \"bh\"");
> >> }
> >>
> >>- if (net->tx&& !strcmp(net->tx, "timer")) {
> >>- n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_timer);
> >>- n->tx_timer = qemu_new_timer_ns(vm_clock, virtio_net_tx_timer, n);
> >>- n->tx_timeout = net->txtimer;
> >>- } else {
> >>- n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_bh);
> >>- n->tx_bh = qemu_bh_new(virtio_net_tx_bh, n);
> >>- }
> >>- n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
> >> qemu_macaddr_default_if_unset(&conf->macaddr);
> >> memcpy(&n->mac[0],&conf->macaddr, sizeof(n->mac));
> >> n->status = VIRTIO_NET_S_LINK_UP;
> >>@@ -1038,7 +1159,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
> >>
> >> qemu_format_nic_info_str(&n->nic->nc, conf->macaddr.a);
> >>
> >>- n->tx_waiting = 0;
> >> n->tx_burst = net->txburst;
> >> n->mergeable_rx_bufs = 0;
> >> n->promisc = 1; /* for compatibility */
> >>@@ -1046,6 +1166,32 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
> >> n->mac_table.macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN);
> >>
> >> n->vlans = g_malloc0(MAX_VLAN>> 3);
> >>+ n->queues = conf->queues;
> >>+
> >>+ /* Allocate per rx/tx vq's */
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ n->vqs[i].rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
> >>+ if (net->tx&& !strcmp(net->tx, "timer")) {
> >>+ n->vqs[i].tx_vq = virtio_add_queue(&n->vdev, 256,
> >>+ virtio_net_handle_tx_timer);
> >>+ n->vqs[i].tx_timer = qemu_new_timer_ns(vm_clock,
> >>+ virtio_net_tx_timer,
> >>+&n->vqs[i]);
> >>+ n->vqs[i].tx_timeout = net->txtimer;
> >>+ } else {
> >>+ n->vqs[i].tx_vq = virtio_add_queue(&n->vdev, 256,
> >>+ virtio_net_handle_tx_bh);
> >>+ n->vqs[i].tx_bh = qemu_bh_new(virtio_net_tx_bh,&n->vqs[i]);
> >>+ }
> >>+
> >>+ n->vqs[i].tx_waiting = 0;
> >>+ n->vqs[i].n = n;
> >>+
> >>+ if (i == 0) {
> >>+ /* keep compatiable with spec and old guest */
> >>+ n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
> >>+ }
> >>+ }
> >>
> >> n->qdev = dev;
> >> register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
> >>@@ -1059,24 +1205,33 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
> >> void virtio_net_exit(VirtIODevice *vdev)
> >> {
> >> VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
> >>+ int i;
> >>
> >> /* This will stop vhost backend if appropriate. */
> >> virtio_net_set_status(vdev, 0);
> >>
> >>- qemu_purge_queued_packets(&n->nic->nc);
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ qemu_purge_queued_packets(n->nic->ncs[i]);
> >>+ }
> >>
> >> unregister_savevm(n->qdev, "virtio-net", n);
> >>
> >> g_free(n->mac_table.macs);
> >> g_free(n->vlans);
> >>
> >>- if (n->tx_timer) {
> >>- qemu_del_timer(n->tx_timer);
> >>- qemu_free_timer(n->tx_timer);
> >>- } else {
> >>- qemu_bh_delete(n->tx_bh);
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ VirtIONetQueue *netq =&n->vqs[i];
> >>+ if (netq->tx_timer) {
> >>+ qemu_del_timer(netq->tx_timer);
> >>+ qemu_free_timer(netq->tx_timer);
> >>+ } else {
> >>+ qemu_bh_delete(netq->tx_bh);
> >>+ }
> >> }
> >>
> >>- qemu_del_vlan_client(&n->nic->nc);
> >> virtio_cleanup(&n->vdev);
> >>+
> >>+ for (i = 0; i< n->queues; i++) {
> >>+ qemu_del_vlan_client(n->nic->ncs[i]);
> >>+ }
> >> }
> >>diff --git a/hw/virtio-net.h b/hw/virtio-net.h
> >>index 36aa463..b35ba5d 100644
> >>--- a/hw/virtio-net.h
> >>+++ b/hw/virtio-net.h
> >>@@ -44,6 +44,7 @@
> >> #define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
> >> #define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
> >> #define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
> >>+#define VIRTIO_NET_F_MULTIQUEUE 22
> >>
> >> #define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
> >>
> >>@@ -72,6 +73,8 @@ struct virtio_net_config
> >> uint8_t mac[ETH_ALEN];
> >> /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
> >> uint16_t status;
> >>+
> >>+ uint16_t queues;
> >> } QEMU_PACKED;
> >>
> >> /* This is the first element of the scatter-gather list. If you don't
> >--
> >To unsubscribe from this list: send the line "unsubscribe kvm" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH RFC] virtio-balloon: fix add/get API use
From: Michael S. Tsirkin @ 2012-07-02 7:33 UTC (permalink / raw)
To: Rusty Russell; +Cc: virtualization, Rafael Aquini, kvm, linux-kernel
In-Reply-To: <87d34fx990.fsf@rustcorp.com.au>
In virtio balloon virtqueue_get_buf might now run concurrently with
virtqueue_kick. I audited both and this seems safe in practice but
this is not guaranteed by the API.
Additionally, a spurious interrupt might in theory make
virtqueue_get_buf run in parallel with virtqueue_add_buf, which is racy.
While we might try to protect against spurious callbacks it's
easier to fix the driver: balloon seems to be the only one
(mis)using the API like this, so let's just fix balloon.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
Warning: completely untested.
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index bfbc15c..a26eb4f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -47,7 +47,7 @@ struct virtio_balloon
struct task_struct *thread;
/* Waiting for host to ack the pages we released. */
- struct completion acked;
+ wait_queue_head_t acked;
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
@@ -89,29 +89,26 @@ static struct page *balloon_pfn_to_page(u32 pfn)
static void balloon_ack(struct virtqueue *vq)
{
- struct virtio_balloon *vb;
- unsigned int len;
+ struct virtio_balloon *vb = vq->vdev->priv;
- vb = virtqueue_get_buf(vq, &len);
- if (vb)
- complete(&vb->acked);
+ wake_up(&vb->acked);
}
static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
{
struct scatterlist sg;
+ unsigned int len;
+ void *buf;
sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
- init_completion(&vb->acked);
-
/* We should always be able to add one buffer to an empty queue. */
if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
BUG();
virtqueue_kick(vq);
/* When host has read buffer, this completes via balloon_ack */
- wait_for_completion(&vb->acked);
+ wait_event(vb->acked, virtqueue_get_buf(vq, &len));
}
static void set_page_pfns(u32 pfns[], struct page *page)
@@ -231,12 +228,8 @@ static void update_balloon_stats(struct virtio_balloon *vb)
*/
static void stats_request(struct virtqueue *vq)
{
- struct virtio_balloon *vb;
- unsigned int len;
+ struct virtio_balloon *vb = vq->vdev->priv;
- vb = virtqueue_get_buf(vq, &len);
- if (!vb)
- return;
vb->need_stats_update = 1;
wake_up(&vb->config_change);
}
@@ -245,11 +238,14 @@ static void stats_handle_request(struct virtio_balloon *vb)
{
struct virtqueue *vq;
struct scatterlist sg;
+ unsigned int len;
vb->need_stats_update = 0;
update_balloon_stats(vb);
vq = vb->stats_vq;
+ if (!virtqueue_get_buf(vq, &len))
+ return;
sg_init_one(&sg, vb->stats, sizeof(vb->stats));
if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
BUG();
@@ -358,6 +354,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
INIT_LIST_HEAD(&vb->pages);
vb->num_pages = 0;
init_waitqueue_head(&vb->config_change);
+ init_waitqueue_head(&vb->acked);
vb->vdev = vdev;
vb->need_stats_update = 0;
^ permalink raw reply related
* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
From: Michael S. Tsirkin @ 2012-07-02 7:25 UTC (permalink / raw)
To: Rusty Russell; +Cc: virtualization, Rafael Aquini, kvm, linux-kernel
In-Reply-To: <87d34fx990.fsf@rustcorp.com.au>
On Mon, Jul 02, 2012 at 10:35:47AM +0930, Rusty Russell wrote:
> On Sun, 1 Jul 2012 12:20:51 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > > A virtio driver does virtqueue_add_buf() multiple times before finally
> > > calling virtqueue_kick(); previously we only exposed the added buffers
> > > in the virtqueue_kick() call. This means we don't need a memory
> > > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > > device (ie. host) can't see the buffers until the kick.
> > >
> > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> >
> > Looking at recent mm compaction patches made me look at locking
> > in balloon closely. And I noticed the referenced patch (commit
> > ee7cd8981e15bcb365fc762afe3fc47b8242f630 upstream) interacts strangely
> > with virtio balloon; balloon currently does:
> >
> > static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> > {
> > struct scatterlist sg;
> >
> > sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> >
> > init_completion(&vb->acked);
> >
> > /* We should always be able to add one buffer to an empty queue. */
> > if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
> > BUG();
> > virtqueue_kick(vq);
> >
> > /* When host has read buffer, this completes via balloon_ack */
> > wait_for_completion(&vb->acked);
> > }
> >
> >
> > While vq callback does:
> >
> > static void balloon_ack(struct virtqueue *vq)
> > {
> > struct virtio_balloon *vb;
> > unsigned int len;
> >
> > vb = virtqueue_get_buf(vq, &len);
> > if (vb)
> > complete(&vb->acked);
> > }
> >
> >
> > So virtqueue_get_buf might now run concurrently with virtqueue_kick.
> > I audited both and this seems safe in practice but I think
>
> Good spotting!
>
> Agreed. Because there's only add_buf, we get away with it: the add_buf
> must be almost finished by the time get_buf runs because the device has
> seen the buffer.
>
> > we need to either declare this legal at the API level
> > or add locking in driver.
>
> I wonder if we should just lock in the balloon driver, rather than
> document this corner case and set a bad example.
We'll need to replace &vb->acked with a waitqueue
and do get_buf from the same thread.
But I note that stats_request hash the same issue.
Let's see if we can fix it.
> Are there other
> drivers which take the same shortcut?
Not that I know.
> > Further, is there a guarantee that we never get
> > spurious callbacks? We currently check ring not empty
> > but esp for non shared MSI this might not be needed.
>
> Yes, I think this saves us. A spurious interrupt won't trigger
> a spurious callback.
>
> > If a spurious callback triggers, virtqueue_get_buf can run
> > concurrently with virtqueue_add_buf which is known to be racy.
> > Again I think this is currently safe as no spurious callbacks in
> > practice but should we guarantee no spurious callbacks at the API level
> > or add locking in driver?
>
> I think we should guarantee it, but is there a hole in the current
> implementation?
>
> Thanks,
> Rusty.
Could be. The check for ring empty looks somewhat suspicious.
It might be expensive to make it 100% robust - that check was
intended as an optimization for shared interrupts.
Whith per vq interrupts we IMO do not need the check.
If we add locking in balloon I think there's no need
to guarantee no spurious interrupts.
--
MST
^ permalink raw reply
* Re: [PATCH] virtio-scsi: hotplug suppot for virtio-scsi
From: mc @ 2012-07-02 7:20 UTC (permalink / raw)
To: Paolo Bonzini
Cc: stefanha, linux-scsi, linux-kernel, zwanp, linuxram, senwang,
virtualization, James E.J. Bottomley, Anthony Liguori
In-Reply-To: <4FF04F79.4090304@redhat.com>
Quoting Paolo Bonzini <pbonzini@redhat.com>:
> Il 20/06/2012 08:55, Cong Meng ha scritto:
>> This patch implements the hotplug support for virtio-scsi.
>> When there is a device attached/detached, the virtio-scsi driver will be
>> signaled via event virtual queue and it will add/remove the scsi device
>> in question automatically.
>>
>> Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
>> Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
>
> Looks good (with one nit below), but please rebase over the four patches
> at http://permalink.gmane.org/gmane.linux.kernel/1312496
>
>> ---
>> drivers/scsi/virtio_scsi.c | 109
>> ++++++++++++++++++++++++++++++++++++++++++-
>> include/linux/virtio_scsi.h | 9 ++++
>> 2 files changed, 116 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
>> index 1b38431..6dad625 100644
>> --- a/drivers/scsi/virtio_scsi.c
>> +++ b/drivers/scsi/virtio_scsi.c
>> @@ -25,6 +25,7 @@
>> #include <scsi/scsi_cmnd.h>
>>
>> #define VIRTIO_SCSI_MEMPOOL_SZ 64
>> +#define VIRTIO_SCSI_EVENT_LEN 8
>>
>> /* Command queue element */
>> struct virtio_scsi_cmd {
>> @@ -43,9 +44,15 @@ struct virtio_scsi_cmd {
>> } resp;
>> } ____cacheline_aligned_in_smp;
>>
>> +struct virtio_scsi_event_node {
>> + struct virtio_scsi *vscsi;
>> + struct virtio_scsi_event event;
>> + struct work_struct work;
>> +};
>> +
>> /* Driver instance state */
>> struct virtio_scsi {
>> - /* Protects ctrl_vq, req_vq and sg[] */
>> + /* Protects ctrl_vq, event_vq, req_vq and sg[] */
>> spinlock_t vq_lock;
>>
>> struct virtio_device *vdev;
>> @@ -53,6 +60,9 @@ struct virtio_scsi {
>> struct virtqueue *event_vq;
>> struct virtqueue *req_vq;
>>
>> + /* Get some buffers reday for event vq */
>> + struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN];
>> +
>> /* For sglist construction when adding commands to the virtqueue. */
>> struct scatterlist sg[];
>> };
>> @@ -184,9 +194,93 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
>> virtscsi_vq_done(vq, virtscsi_complete_free);
>> };
>>
>> +static int virtscsi_kick_event(struct virtio_scsi *vscsi,
>> + struct virtio_scsi_event_node *event_node)
>> +{
>> + int ret;
>> + struct scatterlist sg;
>> + unsigned long flags;
>> +
>> + sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
>> +
>> + spin_lock_irqsave(&vscsi->vq_lock, flags);
>> +
>> + ret = virtqueue_add_buf(vscsi->event_vq, &sg, 0, 1, event_node,
>> GFP_ATOMIC);
>> + if (ret >= 0)
>> + virtqueue_kick(vscsi->event_vq);
>> +
>> + spin_unlock_irqrestore(&vscsi->vq_lock, flags);
>> +
>> + return ret;
>> +}
>> +
>> +static int virtscsi_kick_event_all(struct virtio_scsi *vscsi)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++) {
>> + vscsi->event_list[i].vscsi = vscsi;
>> + virtscsi_kick_event(vscsi, &vscsi->event_list[i]);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
>> + struct
>> virtio_scsi_event *event)
>> +{
>> + struct scsi_device *sdev;
>> + struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
>> + unsigned int target = event->lun[1];
>> + unsigned int lun = (event->lun[2] << 8) | event->lun[3];
>> +
>> + switch (event->reason) {
>> + case VIRTIO_SCSI_EVT_RESET_RESCAN:
>> + scsi_add_device(shost, 0, target, lun);
>> + break;
>> + case VIRTIO_SCSI_EVT_RESET_REMOVED:
>> + sdev = scsi_device_lookup(shost, 0, target, lun);
>> + if (sdev) {
>> + scsi_remove_device(sdev);
>> + scsi_device_put(sdev);
>> + } else {
>> + pr_err("SCSI device %d 0 %d %d not found\n",
>> + shost->host_no, target, lun);
>> + }
>> + break;
>> + default:
>> + pr_info("Unsupport virtio scsi event reason %x\n", event->reason);
>> + }
>> +}
>> +
>> +static void virtscsi_handle_event(struct work_struct *work)
>> +{
>> + struct virtio_scsi_event_node *event_node =
>> + container_of(work, struct virtio_scsi_event_node, work);
>> + struct virtio_scsi *vscsi = event_node->vscsi;
>> + struct virtio_scsi_event *event = &event_node->event;
>> +
>> + switch (event->event) {
>> + case VIRTIO_SCSI_T_TRANSPORT_RESET:
>> + virtscsi_handle_transport_reset(vscsi, event);
>> + break;
>
> Please handle VIRTIO_SCSI_T_NO_EVENT too here, and mask out
> VIRTIO_SCSI_T_EVENTS_MISSED even if you do not handle it by triggering a
> full rescan.
>
Paolo, what's the appropriate action to response VIRTIO_SCSI_T_NO_EVENT
event? An error message?
I also have some confusion on VIRTIO_SCSI_T_EVENTS_MISSED:
Should VIRTIO_SCSI_T_EVENTS_MISSED always be used with other event flag?
For instanse:
(VIRTIO_SCSI_T_EVENTS_MISSED|VIRTIO_SCSI_T_TRANSPORT_RESET) is used to report
hotplug event dropped when a buffer available.
Thanks.
Cong.
> Paolo
>
>> + default:
>> + pr_err("Unsupport virtio scsi event %x\n", event->event);
>
>
>> + }
>> + virtscsi_kick_event(vscsi, event_node);
>> +}
>> +
>> +static void virtscsi_complete_event(void *buf)
>> +{
>> + struct virtio_scsi_event_node *event_node = buf;
>> +
>> + INIT_WORK(&event_node->work, virtscsi_handle_event);
>> + schedule_work(&event_node->work);
>> +}
>> +
>> static void virtscsi_event_done(struct virtqueue *vq)
>> {
>> - virtscsi_vq_done(vq, virtscsi_complete_free);
>> + virtscsi_vq_done(vq, virtscsi_complete_event);
>> };
>>
>> static void virtscsi_map_sgl(struct scatterlist *sg, unsigned int *p_idx,
>> @@ -435,6 +529,11 @@ static int virtscsi_init(struct virtio_device *vdev,
>>
>> virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
>> virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
>> +
>> + if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>> + virtscsi_kick_event_all(vscsi);
>> + }
>> +
>> return 0;
>> }
>>
>> @@ -532,7 +631,13 @@ static struct virtio_device_id id_table[] = {
>> { 0 },
>> };
>>
>> +static unsigned int features[] = {
>> + VIRTIO_SCSI_F_HOTPLUG
>> +};
>> +
>> static struct virtio_driver virtio_scsi_driver = {
>> + .feature_table = features,
>> + .feature_table_size = ARRAY_SIZE(features),
>> .driver.name = KBUILD_MODNAME,
>> .driver.owner = THIS_MODULE,
>> .id_table = id_table,
>> diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
>> index 8ddeafd..dc8d305 100644
>> --- a/include/linux/virtio_scsi.h
>> +++ b/include/linux/virtio_scsi.h
>> @@ -69,6 +69,10 @@ struct virtio_scsi_config {
>> u32 max_lun;
>> } __packed;
>>
>> +/* Feature Bits */
>> +#define VIRTIO_SCSI_F_INOUT 0
>> +#define VIRTIO_SCSI_F_HOTPLUG 1
>> +
>> /* Response codes */
>> #define VIRTIO_SCSI_S_OK 0
>> #define VIRTIO_SCSI_S_OVERRUN 1
>> @@ -105,6 +109,11 @@ struct virtio_scsi_config {
>> #define VIRTIO_SCSI_T_TRANSPORT_RESET 1
>> #define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
>>
>> +/* Reasons of transport reset event */
>> +#define VIRTIO_SCSI_EVT_RESET_HARD 0
>> +#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
>> +#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
>> +
>> #define VIRTIO_SCSI_S_SIMPLE 0
>> #define VIRTIO_SCSI_S_ORDERED 1
>> #define VIRTIO_SCSI_S_HEAD 2
>>
>
> Thanks,
>
> Paolo
^ permalink raw reply
* Re: [RFC V2 PATCH 4/4] virtio-net: add multiqueue support
From: Jason Wang @ 2012-07-02 7:04 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: krkumar2, habanero, aliguori, mashirle, kvm, qemu-devel,
virtualization, tahm, jwhan
In-Reply-To: <20120701094321.GA4642@redhat.com>
On 07/01/2012 05:43 PM, Michael S. Tsirkin wrote:
> On Mon, Jun 25, 2012 at 06:04:49PM +0800, Jason Wang wrote:
>> This patch let the virtio-net can transmit and recevie packets through multiuple
>> VLANClientStates and abstract them as multiple virtqueues to guest. A new
>> parameter 'queues' were introduced to specify the number of queue pairs.
>>
>> The main goal for vhost support is to let the multiqueue could be used without
>> changes in vhost code. So each vhost_net structure were used to track a single
>> VLANClientState and two virtqueues in the past. As multiple VLANClientState were
>> stored in the NICState, we can infer the correspond VLANClientState from this
>> and queue_index easily.
>>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
> Can this patch be split up?
> 1. extend vhost API to allow multiqueue and minimally tweak virtio
> 2. add real multiqueue for virtio
>
> Hmm?
Sure, do you think it's necessary to separate the vhost parts of
multiqueue from virtio?
>> ---
>> hw/vhost.c | 58 ++++---
>> hw/vhost.h | 1
>> hw/vhost_net.c | 7 +
>> hw/vhost_net.h | 2
>> hw/virtio-net.c | 461 +++++++++++++++++++++++++++++++++++++------------------
>> hw/virtio-net.h | 3
>> 6 files changed, 355 insertions(+), 177 deletions(-)
>>
>> diff --git a/hw/vhost.c b/hw/vhost.c
>> index 43664e7..6318bb2 100644
>> --- a/hw/vhost.c
>> +++ b/hw/vhost.c
>> @@ -620,11 +620,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
>> {
>> target_phys_addr_t s, l, a;
>> int r;
>> + int vhost_vq_index = (idx> 2 ? idx - 1 : idx) % dev->nvqs;
>> struct vhost_vring_file file = {
>> - .index = idx,
>> + .index = vhost_vq_index
>> };
>> struct vhost_vring_state state = {
>> - .index = idx,
>> + .index = vhost_vq_index
>> };
>> struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
>>
>> @@ -670,11 +671,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
>> goto fail_alloc_ring;
>> }
>>
>> - r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
>> + r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev->log_enabled);
>> if (r< 0) {
>> r = -errno;
>> goto fail_alloc;
>> }
>> +
>> file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq));
>> r = ioctl(dev->control, VHOST_SET_VRING_KICK,&file);
>> if (r) {
>> @@ -715,7 +717,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
>> unsigned idx)
>> {
>> struct vhost_vring_state state = {
>> - .index = idx,
>> + .index = (idx> 2 ? idx - 1 : idx) % dev->nvqs,
>> };
>> int r;
>> r = ioctl(dev->control, VHOST_GET_VRING_BASE,&state);
>> @@ -829,7 +831,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
>> }
>>
>> for (i = 0; i< hdev->nvqs; ++i) {
>> - r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, true);
>> + r = vdev->binding->set_host_notifier(vdev->binding_opaque,
>> + hdev->start_idx + i,
>> + true);
>> if (r< 0) {
>> fprintf(stderr, "vhost VQ %d notifier binding failed: %d\n", i, -r);
>> goto fail_vq;
>> @@ -839,7 +843,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
>> return 0;
>> fail_vq:
>> while (--i>= 0) {
>> - r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, false);
>> + r = vdev->binding->set_host_notifier(vdev->binding_opaque,
>> + hdev->start_idx + i,
>> + false);
>> if (r< 0) {
>> fprintf(stderr, "vhost VQ %d notifier cleanup error: %d\n", i, -r);
>> fflush(stderr);
>> @@ -860,7 +866,9 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
>> int i, r;
>>
>> for (i = 0; i< hdev->nvqs; ++i) {
>> - r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, false);
>> + r = vdev->binding->set_host_notifier(vdev->binding_opaque,
>> + hdev->start_idx + i,
>> + false);
>> if (r< 0) {
>> fprintf(stderr, "vhost VQ %d notifier cleanup failed: %d\n", i, -r);
>> fflush(stderr);
>> @@ -874,15 +882,17 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>> {
>> int i, r;
>> if (!vdev->binding->set_guest_notifiers) {
>> - fprintf(stderr, "binding does not support guest notifiers\n");
>> + fprintf(stderr, "binding does not support guest notifier\n");
>> r = -ENOSYS;
>> goto fail;
>> }
>>
>> - r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
>> - if (r< 0) {
>> - fprintf(stderr, "Error binding guest notifier: %d\n", -r);
>> - goto fail_notifiers;
>> + if (hdev->start_idx == 0) {
>> + r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
>> + if (r< 0) {
>> + fprintf(stderr, "Error binding guest notifier: %d\n", -r);
>> + goto fail_notifiers;
>> + }
>> }
>>
>> r = vhost_dev_set_features(hdev, hdev->log_enabled);
>> @@ -898,7 +908,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>> r = vhost_virtqueue_init(hdev,
>> vdev,
>> hdev->vqs + i,
>> - i);
>> + hdev->start_idx + i);
>> if (r< 0) {
>> goto fail_vq;
>> }
>> @@ -925,11 +935,13 @@ fail_vq:
>> vhost_virtqueue_cleanup(hdev,
>> vdev,
>> hdev->vqs + i,
>> - i);
>> + hdev->start_idx + i);
>> }
>> + i = hdev->nvqs;
>> fail_mem:
>> fail_features:
>> - vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
>> + if (hdev->start_idx == 0)
>> + vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
>> fail_notifiers:
>> fail:
>> return r;
>> @@ -944,18 +956,22 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>> vhost_virtqueue_cleanup(hdev,
>> vdev,
>> hdev->vqs + i,
>> - i);
>> + hdev->start_idx + i);
>> }
>> +
>> for (i = 0; i< hdev->n_mem_sections; ++i) {
>> vhost_sync_dirty_bitmap(hdev,&hdev->mem_sections[i],
>> 0, (target_phys_addr_t)~0x0ull);
>> }
>> - r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
>> - if (r< 0) {
>> - fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
>> - fflush(stderr);
>> +
>> + if (hdev->start_idx == 0) {
>> + r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
>> + if (r< 0) {
>> + fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
>> + fflush(stderr);
>> + }
>> + assert (r>= 0);
>> }
>> - assert (r>= 0);
>>
>> hdev->started = false;
>> g_free(hdev->log);
>> diff --git a/hw/vhost.h b/hw/vhost.h
>> index 80e64df..fa5357a 100644
>> --- a/hw/vhost.h
>> +++ b/hw/vhost.h
>> @@ -34,6 +34,7 @@ struct vhost_dev {
>> MemoryRegionSection *mem_sections;
>> struct vhost_virtqueue *vqs;
>> int nvqs;
>> + int start_idx;
>> unsigned long long features;
>> unsigned long long acked_features;
>> unsigned long long backend_features;
>> diff --git a/hw/vhost_net.c b/hw/vhost_net.c
>> index f672e9d..73a72bb 100644
>> --- a/hw/vhost_net.c
>> +++ b/hw/vhost_net.c
>> @@ -138,13 +138,15 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
>> }
>>
>> int vhost_net_start(struct vhost_net *net,
>> - VirtIODevice *dev)
>> + VirtIODevice *dev,
>> + int start_idx)
>> {
>> struct vhost_vring_file file = { };
>> int r;
>>
>> net->dev.nvqs = 2;
>> net->dev.vqs = net->vqs;
>> + net->dev.start_idx = start_idx;
>>
>> r = vhost_dev_enable_notifiers(&net->dev, dev);
>> if (r< 0) {
>> @@ -227,7 +229,8 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
>> }
>>
>> int vhost_net_start(struct vhost_net *net,
>> - VirtIODevice *dev)
>> + VirtIODevice *dev,
>> + int start_idx)
>> {
>> return -ENOSYS;
>> }
>> diff --git a/hw/vhost_net.h b/hw/vhost_net.h
>> index 91e40b1..79a4f09 100644
>> --- a/hw/vhost_net.h
>> +++ b/hw/vhost_net.h
>> @@ -9,7 +9,7 @@ typedef struct vhost_net VHostNetState;
>> VHostNetState *vhost_net_init(VLANClientState *backend, int devfd, bool force);
>>
>> bool vhost_net_query(VHostNetState *net, VirtIODevice *dev);
>> -int vhost_net_start(VHostNetState *net, VirtIODevice *dev);
>> +int vhost_net_start(VHostNetState *net, VirtIODevice *dev, int start_idx);
>> void vhost_net_stop(VHostNetState *net, VirtIODevice *dev);
>>
>> void vhost_net_cleanup(VHostNetState *net);
>> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
>> index 3f190d4..d42c4cc 100644
>> --- a/hw/virtio-net.c
>> +++ b/hw/virtio-net.c
>> @@ -26,34 +26,43 @@
>> #define MAC_TABLE_ENTRIES 64
>> #define MAX_VLAN (1<< 12) /* Per 802.1Q definition */
>>
>> -typedef struct VirtIONet
>> +struct VirtIONet;
>> +
>> +typedef struct VirtIONetQueue
>> {
>> - VirtIODevice vdev;
>> - uint8_t mac[ETH_ALEN];
>> - uint16_t status;
>> VirtQueue *rx_vq;
>> VirtQueue *tx_vq;
>> - VirtQueue *ctrl_vq;
>> - NICState *nic;
>> QEMUTimer *tx_timer;
>> QEMUBH *tx_bh;
>> uint32_t tx_timeout;
>> - int32_t tx_burst;
>> int tx_waiting;
>> - uint32_t has_vnet_hdr;
>> - uint8_t has_ufo;
>> struct {
>> VirtQueueElement elem;
>> ssize_t len;
>> } async_tx;
>> + struct VirtIONet *n;
>> + uint8_t vhost_started;
>> +} VirtIONetQueue;
>> +
>> +typedef struct VirtIONet
>> +{
>> + VirtIODevice vdev;
>> + uint8_t mac[ETH_ALEN];
>> + uint16_t status;
>> + VirtIONetQueue vqs[MAX_QUEUE_NUM];
>> + VirtQueue *ctrl_vq;
>> + NICState *nic;
>> + int32_t tx_burst;
>> + uint32_t has_vnet_hdr;
>> + uint8_t has_ufo;
>> int mergeable_rx_bufs;
>> + int multiqueue;
>> uint8_t promisc;
>> uint8_t allmulti;
>> uint8_t alluni;
>> uint8_t nomulti;
>> uint8_t nouni;
>> uint8_t nobcast;
>> - uint8_t vhost_started;
>> struct {
>> int in_use;
>> int first_multi;
>> @@ -63,6 +72,7 @@ typedef struct VirtIONet
>> } mac_table;
>> uint32_t *vlans;
>> DeviceState *qdev;
>> + uint32_t queues;
>> } VirtIONet;
>>
>> /* TODO
>> @@ -74,12 +84,25 @@ static VirtIONet *to_virtio_net(VirtIODevice *vdev)
>> return (VirtIONet *)vdev;
>> }
>>
>> +static int vq_get_pair_index(VirtIONet *n, VirtQueue *vq)
>> +{
>> + int i;
>> + for (i = 0; i< n->queues; i++) {
>> + if (n->vqs[i].tx_vq == vq || n->vqs[i].rx_vq == vq) {
>> + return i;
>> + }
>> + }
>> + assert(1);
>> + return -1;
>> +}
>> +
>> static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
>> {
>> VirtIONet *n = to_virtio_net(vdev);
>> struct virtio_net_config netcfg;
>>
>> stw_p(&netcfg.status, n->status);
>> + netcfg.queues = n->queues * 2;
>> memcpy(netcfg.mac, n->mac, ETH_ALEN);
>> memcpy(config,&netcfg, sizeof(netcfg));
>> }
>> @@ -103,78 +126,140 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
>> (n->status& VIRTIO_NET_S_LINK_UP)&& n->vdev.vm_running;
>> }
>>
>> -static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>> +static void nc_vhost_status(VLANClientState *nc, VirtIONet *n,
>> + uint8_t status)
>> {
>> - if (!n->nic->nc.peer) {
>> + int queue_index = nc->queue_index;
>> + VLANClientState *peer = nc->peer;
>> + VirtIONetQueue *netq =&n->vqs[nc->queue_index];
>> +
>> + if (!peer) {
>> return;
>> }
>> - if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
>> + if (peer->info->type != NET_CLIENT_TYPE_TAP) {
>> return;
>> }
>>
>> - if (!tap_get_vhost_net(n->nic->nc.peer)) {
>> + if (!tap_get_vhost_net(peer)) {
>> return;
>> }
>> - if (!!n->vhost_started == virtio_net_started(n, status)&&
>> - !n->nic->nc.peer->link_down) {
>> + if (!!netq->vhost_started == virtio_net_started(n, status)&&
>> + !peer->link_down) {
>> return;
>> }
>> - if (!n->vhost_started) {
>> - int r;
>> - if (!vhost_net_query(tap_get_vhost_net(n->nic->nc.peer),&n->vdev)) {
>> + if (!netq->vhost_started) {
>> + /* skip ctrl vq */
>> + int r, start_idx = queue_index == 0 ? 0 : queue_index * 2 + 1;
>> + if (!vhost_net_query(tap_get_vhost_net(peer),&n->vdev)) {
>> return;
>> }
>> - r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer),&n->vdev);
>> + r = vhost_net_start(tap_get_vhost_net(peer),&n->vdev, start_idx);
>> if (r< 0) {
>> error_report("unable to start vhost net: %d: "
>> "falling back on userspace virtio", -r);
>> } else {
>> - n->vhost_started = 1;
>> + netq->vhost_started = 1;
>> }
>> } else {
>> - vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer),&n->vdev);
>> - n->vhost_started = 0;
>> + vhost_net_stop(tap_get_vhost_net(peer),&n->vdev);
>> + netq->vhost_started = 0;
>> + }
>> +}
>> +
>> +static int peer_attach(VirtIONet *n, int index)
>> +{
>> + if (!n->nic->ncs[index]->peer) {
>> + return -1;
>> + }
>> +
>> + if (n->nic->ncs[index]->peer->info->type != NET_CLIENT_TYPE_TAP) {
>> + return -1;
>> + }
>> +
>> + return tap_attach(n->nic->ncs[index]->peer);
>> +}
>> +
>> +static int peer_detach(VirtIONet *n, int index)
>> +{
>> + if (!n->nic->ncs[index]->peer) {
>> + return -1;
>> + }
>> +
>> + if (n->nic->ncs[index]->peer->info->type != NET_CLIENT_TYPE_TAP) {
>> + return -1;
>> + }
>> +
>> + return tap_detach(n->nic->ncs[index]->peer);
>> +}
>> +
>> +static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>> +{
>> + int i;
>> + for (i = 0; i< n->queues; i++) {
>> + if (!n->multiqueue&& i != 0)
>> + status = 0;
>> + nc_vhost_status(n->nic->ncs[i], n, status);
>> }
>> }
>>
>> static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>> {
>> VirtIONet *n = to_virtio_net(vdev);
>> + int i;
>>
>> virtio_net_vhost_status(n, status);
>>
>> - if (!n->tx_waiting) {
>> - return;
>> - }
>> + for (i = 0; i< n->queues; i++) {
>> + VirtIONetQueue *netq =&n->vqs[i];
>> + if (!netq->tx_waiting) {
>> + continue;
>> + }
>> +
>> + if (!n->multiqueue&& i != 0)
>> + status = 0;
>>
>> - if (virtio_net_started(n, status)&& !n->vhost_started) {
>> - if (n->tx_timer) {
>> - qemu_mod_timer(n->tx_timer,
>> - qemu_get_clock_ns(vm_clock) + n->tx_timeout);
>> + if (virtio_net_started(n, status)&& !netq->vhost_started) {
>> + if (netq->tx_timer) {
>> + qemu_mod_timer(netq->tx_timer,
>> + qemu_get_clock_ns(vm_clock) + netq->tx_timeout);
>> + } else {
>> + qemu_bh_schedule(netq->tx_bh);
>> + }
>> } else {
>> - qemu_bh_schedule(n->tx_bh);
>> + if (netq->tx_timer) {
>> + qemu_del_timer(netq->tx_timer);
>> + } else {
>> + qemu_bh_cancel(netq->tx_bh);
>> + }
>> }
>> - } else {
>> - if (n->tx_timer) {
>> - qemu_del_timer(n->tx_timer);
>> - } else {
>> - qemu_bh_cancel(n->tx_bh);
>> + }
>> +}
>> +
>> +static bool virtio_net_is_link_up(VirtIONet *n)
>> +{
>> + int i;
>> + for (i = 0; i< n->queues; i++) {
>> + if (n->nic->ncs[i]->link_down) {
>> + return false;
>> }
>> }
>> + return true;
>> }
>>
>> static void virtio_net_set_link_status(VLANClientState *nc)
>> {
>> - VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
>> + VirtIONet *n = ((NICState *)(nc->opaque))->opaque;
>> uint16_t old_status = n->status;
>>
>> - if (nc->link_down)
>> + if (virtio_net_is_link_up(n)) {
>> n->status&= ~VIRTIO_NET_S_LINK_UP;
>> - else
>> + } else {
>> n->status |= VIRTIO_NET_S_LINK_UP;
>> + }
>>
>> - if (n->status != old_status)
>> + if (n->status != old_status) {
>> virtio_notify_config(&n->vdev);
>> + }
>>
>> virtio_net_set_status(&n->vdev, n->vdev.status);
>> }
>> @@ -202,13 +287,15 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>
>> static int peer_has_vnet_hdr(VirtIONet *n)
>> {
>> - if (!n->nic->nc.peer)
>> + if (!n->nic->ncs[0]->peer) {
>> return 0;
>> + }
>>
>> - if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP)
>> + if (n->nic->ncs[0]->peer->info->type != NET_CLIENT_TYPE_TAP) {
>> return 0;
>> + }
>>
>> - n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->nc.peer);
>> + n->has_vnet_hdr = tap_has_vnet_hdr(n->nic->ncs[0]->peer);
>>
>> return n->has_vnet_hdr;
>> }
>> @@ -218,7 +305,7 @@ static int peer_has_ufo(VirtIONet *n)
>> if (!peer_has_vnet_hdr(n))
>> return 0;
>>
>> - n->has_ufo = tap_has_ufo(n->nic->nc.peer);
>> + n->has_ufo = tap_has_ufo(n->nic->ncs[0]->peer);
>>
>> return n->has_ufo;
>> }
>> @@ -228,9 +315,13 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
>> VirtIONet *n = to_virtio_net(vdev);
>>
>> features |= (1<< VIRTIO_NET_F_MAC);
>> + features |= (1<< VIRTIO_NET_F_MULTIQUEUE);
>>
>> if (peer_has_vnet_hdr(n)) {
>> - tap_using_vnet_hdr(n->nic->nc.peer, 1);
>> + int i;
>> + for (i = 0; i< n->queues; i++) {
>> + tap_using_vnet_hdr(n->nic->ncs[i]->peer, 1);
>> + }
>> } else {
>> features&= ~(0x1<< VIRTIO_NET_F_CSUM);
>> features&= ~(0x1<< VIRTIO_NET_F_HOST_TSO4);
>> @@ -248,14 +339,15 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
>> features&= ~(0x1<< VIRTIO_NET_F_HOST_UFO);
>> }
>>
>> - if (!n->nic->nc.peer ||
>> - n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
>> + if (!n->nic->ncs[0]->peer ||
>> + n->nic->ncs[0]->peer->info->type != NET_CLIENT_TYPE_TAP) {
>> return features;
>> }
>> - if (!tap_get_vhost_net(n->nic->nc.peer)) {
>> + if (!tap_get_vhost_net(n->nic->ncs[0]->peer)) {
>> return features;
>> }
>> - return vhost_net_get_features(tap_get_vhost_net(n->nic->nc.peer), features);
>> + return vhost_net_get_features(tap_get_vhost_net(n->nic->ncs[0]->peer),
>> + features);
>> }
>>
>> static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
>> @@ -276,25 +368,38 @@ static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
>> static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
>> {
>> VirtIONet *n = to_virtio_net(vdev);
>> + int i, r;
>>
>> n->mergeable_rx_bufs = !!(features& (1<< VIRTIO_NET_F_MRG_RXBUF));
>> + n->multiqueue = !!(features& (1<< VIRTIO_NET_F_MULTIQUEUE));
>>
>> - if (n->has_vnet_hdr) {
>> - tap_set_offload(n->nic->nc.peer,
>> - (features>> VIRTIO_NET_F_GUEST_CSUM)& 1,
>> - (features>> VIRTIO_NET_F_GUEST_TSO4)& 1,
>> - (features>> VIRTIO_NET_F_GUEST_TSO6)& 1,
>> - (features>> VIRTIO_NET_F_GUEST_ECN)& 1,
>> - (features>> VIRTIO_NET_F_GUEST_UFO)& 1);
>> - }
>> - if (!n->nic->nc.peer ||
>> - n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
>> - return;
>> - }
>> - if (!tap_get_vhost_net(n->nic->nc.peer)) {
>> - return;
>> + for (i = 0; i< n->queues; i++) {
>> + if (!n->multiqueue&& i != 0) {
>> + r = peer_detach(n, i);
>> + assert(r == 0);
>> + } else {
>> + r = peer_attach(n, i);
>> + assert(r == 0);
>> +
>> + if (n->has_vnet_hdr) {
>> + tap_set_offload(n->nic->ncs[i]->peer,
>> + (features>> VIRTIO_NET_F_GUEST_CSUM)& 1,
>> + (features>> VIRTIO_NET_F_GUEST_TSO4)& 1,
>> + (features>> VIRTIO_NET_F_GUEST_TSO6)& 1,
>> + (features>> VIRTIO_NET_F_GUEST_ECN)& 1,
>> + (features>> VIRTIO_NET_F_GUEST_UFO)& 1);
>> + }
>> + if (!n->nic->ncs[i]->peer ||
>> + n->nic->ncs[i]->peer->info->type != NET_CLIENT_TYPE_TAP) {
>> + continue;
>> + }
>> + if (!tap_get_vhost_net(n->nic->ncs[i]->peer)) {
>> + continue;
>> + }
>> + vhost_net_ack_features(tap_get_vhost_net(n->nic->ncs[i]->peer),
>> + features);
>> + }
>> }
>> - vhost_net_ack_features(tap_get_vhost_net(n->nic->nc.peer), features);
>> }
>>
>> static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
>> @@ -446,7 +551,7 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
>> {
>> VirtIONet *n = to_virtio_net(vdev);
>>
>> - qemu_flush_queued_packets(&n->nic->nc);
>> + qemu_flush_queued_packets(n->nic->ncs[vq_get_pair_index(n, vq)]);
>>
>> /* We now have RX buffers, signal to the IO thread to break out of the
>> * select to re-poll the tap file descriptor */
>> @@ -455,36 +560,37 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
>>
>> static int virtio_net_can_receive(VLANClientState *nc)
>> {
>> - VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
>> + int queue_index = nc->queue_index;
>> + VirtIONet *n = ((NICState *)nc->opaque)->opaque;
>> +
>> if (!n->vdev.vm_running) {
>> return 0;
>> }
>>
>> - if (!virtio_queue_ready(n->rx_vq) ||
>> + if (!virtio_queue_ready(n->vqs[queue_index].rx_vq) ||
>> !(n->vdev.status& VIRTIO_CONFIG_S_DRIVER_OK))
>> return 0;
>>
>> return 1;
>> }
>>
>> -static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
>> +static int virtio_net_has_buffers(VirtIONet *n, int bufsize, VirtQueue *vq)
>> {
>> - if (virtio_queue_empty(n->rx_vq) ||
>> - (n->mergeable_rx_bufs&&
>> - !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
>> - virtio_queue_set_notification(n->rx_vq, 1);
>> + if (virtio_queue_empty(vq) || (n->mergeable_rx_bufs&&
>> + !virtqueue_avail_bytes(vq, bufsize, 0))) {
>> + virtio_queue_set_notification(vq, 1);
>>
>> /* To avoid a race condition where the guest has made some buffers
>> * available after the above check but before notification was
>> * enabled, check for available buffers again.
>> */
>> - if (virtio_queue_empty(n->rx_vq) ||
>> - (n->mergeable_rx_bufs&&
>> - !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
>> + if (virtio_queue_empty(vq) || (n->mergeable_rx_bufs&&
>> + !virtqueue_avail_bytes(vq, bufsize, 0))) {
>> return 0;
>> + }
>> }
>>
>> - virtio_queue_set_notification(n->rx_vq, 0);
>> + virtio_queue_set_notification(vq, 0);
>> return 1;
>> }
>>
>> @@ -595,12 +701,15 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
>>
>> static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
>> {
>> - VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
>> + int queue_index = nc->queue_index;
>> + VirtIONet *n = ((NICState *)(nc->opaque))->opaque;
>> + VirtQueue *vq = n->vqs[queue_index].rx_vq;
>> struct virtio_net_hdr_mrg_rxbuf *mhdr = NULL;
>> size_t guest_hdr_len, offset, i, host_hdr_len;
>>
>> - if (!virtio_net_can_receive(&n->nic->nc))
>> + if (!virtio_net_can_receive(n->nic->ncs[queue_index])) {
>> return -1;
>> + }
>>
>> /* hdr_len refers to the header we supply to the guest */
>> guest_hdr_len = n->mergeable_rx_bufs ?
>> @@ -608,7 +717,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
>>
>>
>> host_hdr_len = n->has_vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
>> - if (!virtio_net_has_buffers(n, size + guest_hdr_len - host_hdr_len))
>> + if (!virtio_net_has_buffers(n, size + guest_hdr_len - host_hdr_len, vq))
>> return 0;
>>
>> if (!receive_filter(n, buf, size))
>> @@ -623,7 +732,7 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
>>
>> total = 0;
>>
>> - if (virtqueue_pop(n->rx_vq,&elem) == 0) {
>> + if (virtqueue_pop(vq,&elem) == 0) {
>> if (i == 0)
>> return -1;
>> error_report("virtio-net unexpected empty queue: "
>> @@ -675,47 +784,50 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
>> }
>>
>> /* signal other side */
>> - virtqueue_fill(n->rx_vq,&elem, total, i++);
>> + virtqueue_fill(vq,&elem, total, i++);
>> }
>>
>> if (mhdr) {
>> stw_p(&mhdr->num_buffers, i);
>> }
>>
>> - virtqueue_flush(n->rx_vq, i);
>> - virtio_notify(&n->vdev, n->rx_vq);
>> + virtqueue_flush(vq, i);
>> + virtio_notify(&n->vdev, vq);
>>
>> return size;
>> }
>>
>> -static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq);
>> +static int32_t virtio_net_flush_tx(VirtIONet *n, VirtIONetQueue *tvq);
>>
>> static void virtio_net_tx_complete(VLANClientState *nc, ssize_t len)
>> {
>> - VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
>> + VirtIONet *n = ((NICState *)nc->opaque)->opaque;
>> + VirtIONetQueue *netq =&n->vqs[nc->queue_index];
>>
>> - virtqueue_push(n->tx_vq,&n->async_tx.elem, n->async_tx.len);
>> - virtio_notify(&n->vdev, n->tx_vq);
>> + virtqueue_push(netq->tx_vq,&netq->async_tx.elem, netq->async_tx.len);
>> + virtio_notify(&n->vdev, netq->tx_vq);
>>
>> - n->async_tx.elem.out_num = n->async_tx.len = 0;
>> + netq->async_tx.elem.out_num = netq->async_tx.len;
>>
>> - virtio_queue_set_notification(n->tx_vq, 1);
>> - virtio_net_flush_tx(n, n->tx_vq);
>> + virtio_queue_set_notification(netq->tx_vq, 1);
>> + virtio_net_flush_tx(n, netq);
>> }
>>
>> /* TX */
>> -static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
>> +static int32_t virtio_net_flush_tx(VirtIONet *n, VirtIONetQueue *netq)
>> {
>> VirtQueueElement elem;
>> int32_t num_packets = 0;
>> + VirtQueue *vq = netq->tx_vq;
>> +
>> if (!(n->vdev.status& VIRTIO_CONFIG_S_DRIVER_OK)) {
>> return num_packets;
>> }
>>
>> assert(n->vdev.vm_running);
>>
>> - if (n->async_tx.elem.out_num) {
>> - virtio_queue_set_notification(n->tx_vq, 0);
>> + if (netq->async_tx.elem.out_num) {
>> + virtio_queue_set_notification(vq, 0);
>> return num_packets;
>> }
>>
>> @@ -747,12 +859,12 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
>> len += hdr_len;
>> }
>>
>> - ret = qemu_sendv_packet_async(&n->nic->nc, out_sg, out_num,
>> - virtio_net_tx_complete);
>> + ret = qemu_sendv_packet_async(n->nic->ncs[vq_get_pair_index(n, vq)],
>> + out_sg, out_num, virtio_net_tx_complete);
>> if (ret == 0) {
>> - virtio_queue_set_notification(n->tx_vq, 0);
>> - n->async_tx.elem = elem;
>> - n->async_tx.len = len;
>> + virtio_queue_set_notification(vq, 0);
>> + netq->async_tx.elem = elem;
>> + netq->async_tx.len = len;
>> return -EBUSY;
>> }
>>
>> @@ -771,22 +883,23 @@ static int32_t virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
>> static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
>> {
>> VirtIONet *n = to_virtio_net(vdev);
>> + VirtIONetQueue *netq =&n->vqs[vq_get_pair_index(n, vq)];
>>
>> /* This happens when device was stopped but VCPU wasn't. */
>> if (!n->vdev.vm_running) {
>> - n->tx_waiting = 1;
>> + netq->tx_waiting = 1;
>> return;
>> }
>>
>> - if (n->tx_waiting) {
>> + if (netq->tx_waiting) {
>> virtio_queue_set_notification(vq, 1);
>> - qemu_del_timer(n->tx_timer);
>> - n->tx_waiting = 0;
>> - virtio_net_flush_tx(n, vq);
>> + qemu_del_timer(netq->tx_timer);
>> + netq->tx_waiting = 0;
>> + virtio_net_flush_tx(n, netq);
>> } else {
>> - qemu_mod_timer(n->tx_timer,
>> - qemu_get_clock_ns(vm_clock) + n->tx_timeout);
>> - n->tx_waiting = 1;
>> + qemu_mod_timer(netq->tx_timer,
>> + qemu_get_clock_ns(vm_clock) + netq->tx_timeout);
>> + netq->tx_waiting = 1;
>> virtio_queue_set_notification(vq, 0);
>> }
>> }
>> @@ -794,48 +907,53 @@ static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
>> static void virtio_net_handle_tx_bh(VirtIODevice *vdev, VirtQueue *vq)
>> {
>> VirtIONet *n = to_virtio_net(vdev);
>> + VirtIONetQueue *netq =&n->vqs[vq_get_pair_index(n, vq)];
>>
>> - if (unlikely(n->tx_waiting)) {
>> + if (unlikely(netq->tx_waiting)) {
>> return;
>> }
>> - n->tx_waiting = 1;
>> + netq->tx_waiting = 1;
>> /* This happens when device was stopped but VCPU wasn't. */
>> if (!n->vdev.vm_running) {
>> return;
>> }
>> virtio_queue_set_notification(vq, 0);
>> - qemu_bh_schedule(n->tx_bh);
>> + qemu_bh_schedule(netq->tx_bh);
>> }
>>
>> static void virtio_net_tx_timer(void *opaque)
>> {
>> - VirtIONet *n = opaque;
>> + VirtIONetQueue *netq = opaque;
>> + VirtIONet *n = netq->n;
>> +
>> assert(n->vdev.vm_running);
>>
>> - n->tx_waiting = 0;
>> + netq->tx_waiting = 0;
>>
>> /* Just in case the driver is not ready on more */
>> if (!(n->vdev.status& VIRTIO_CONFIG_S_DRIVER_OK))
>> return;
>>
>> - virtio_queue_set_notification(n->tx_vq, 1);
>> - virtio_net_flush_tx(n, n->tx_vq);
>> + virtio_queue_set_notification(netq->tx_vq, 1);
>> + virtio_net_flush_tx(n, netq);
>> }
>>
>> static void virtio_net_tx_bh(void *opaque)
>> {
>> - VirtIONet *n = opaque;
>> + VirtIONetQueue *netq = opaque;
>> + VirtQueue *vq = netq->tx_vq;
>> + VirtIONet *n = netq->n;
>> int32_t ret;
>>
>> assert(n->vdev.vm_running);
>>
>> - n->tx_waiting = 0;
>> + netq->tx_waiting = 0;
>>
>> /* Just in case the driver is not ready on more */
>> if (unlikely(!(n->vdev.status& VIRTIO_CONFIG_S_DRIVER_OK)))
>> return;
>>
>> - ret = virtio_net_flush_tx(n, n->tx_vq);
>> + ret = virtio_net_flush_tx(n, netq);
>> if (ret == -EBUSY) {
>> return; /* Notification re-enable handled by tx_complete */
>> }
>> @@ -843,33 +961,39 @@ static void virtio_net_tx_bh(void *opaque)
>> /* If we flush a full burst of packets, assume there are
>> * more coming and immediately reschedule */
>> if (ret>= n->tx_burst) {
>> - qemu_bh_schedule(n->tx_bh);
>> - n->tx_waiting = 1;
>> + qemu_bh_schedule(netq->tx_bh);
>> + netq->tx_waiting = 1;
>> return;
>> }
>>
>> /* If less than a full burst, re-enable notification and flush
>> * anything that may have come in while we weren't looking. If
>> * we find something, assume the guest is still active and reschedule */
>> - virtio_queue_set_notification(n->tx_vq, 1);
>> - if (virtio_net_flush_tx(n, n->tx_vq)> 0) {
>> - virtio_queue_set_notification(n->tx_vq, 0);
>> - qemu_bh_schedule(n->tx_bh);
>> - n->tx_waiting = 1;
>> + virtio_queue_set_notification(vq, 1);
>> + if (virtio_net_flush_tx(n, netq)> 0) {
>> + virtio_queue_set_notification(vq, 0);
>> + qemu_bh_schedule(netq->tx_bh);
>> + netq->tx_waiting = 1;
>> }
>> }
>>
>> static void virtio_net_save(QEMUFile *f, void *opaque)
>> {
>> VirtIONet *n = opaque;
>> + int i;
>>
>> /* At this point, backend must be stopped, otherwise
>> * it might keep writing to memory. */
>> - assert(!n->vhost_started);
>> + for (i = 0; i< n->queues; i++) {
>> + assert(!n->vqs[i].vhost_started);
>> + }
>> virtio_save(&n->vdev, f);
>>
>> qemu_put_buffer(f, n->mac, ETH_ALEN);
>> - qemu_put_be32(f, n->tx_waiting);
>> + qemu_put_be32(f, n->queues);
>> + for (i = 0; i< n->queues; i++) {
>> + qemu_put_be32(f, n->vqs[i].tx_waiting);
>> + }
>> qemu_put_be32(f, n->mergeable_rx_bufs);
>> qemu_put_be16(f, n->status);
>> qemu_put_byte(f, n->promisc);
>> @@ -902,7 +1026,10 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>> }
>>
>> qemu_get_buffer(f, n->mac, ETH_ALEN);
>> - n->tx_waiting = qemu_get_be32(f);
>> + n->queues = qemu_get_be32(f);
>> + for (i = 0; i< n->queues; i++) {
>> + n->vqs[i].tx_waiting = qemu_get_be32(f);
>> + }
>> n->mergeable_rx_bufs = qemu_get_be32(f);
>>
>> if (version_id>= 3)
>> @@ -930,7 +1057,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>> n->mac_table.in_use = 0;
>> }
>> }
>> -
>> +
>> if (version_id>= 6)
>> qemu_get_buffer(f, (uint8_t *)n->vlans, MAX_VLAN>> 3);
>>
>> @@ -941,13 +1068,16 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>> }
>>
>> if (n->has_vnet_hdr) {
>> - tap_using_vnet_hdr(n->nic->nc.peer, 1);
>> - tap_set_offload(n->nic->nc.peer,
>> - (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_CSUM)& 1,
>> - (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_TSO4)& 1,
>> - (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_TSO6)& 1,
>> - (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_ECN)& 1,
>> - (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_UFO)& 1);
>> + for(i = 0; i< n->queues; i++) {
>> + tap_using_vnet_hdr(n->nic->ncs[i]->peer, 1);
>> + tap_set_offload(n->nic->ncs[i]->peer,
>> + (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_CSUM)& 1,
>> + (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_TSO4)& 1,
>> + (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_TSO6)& 1,
>> + (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_ECN)& 1,
>> + (n->vdev.guest_features>> VIRTIO_NET_F_GUEST_UFO)&
>> + 1);
>> + }
>> }
>> }
>>
>> @@ -982,7 +1112,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>>
>> static void virtio_net_cleanup(VLANClientState *nc)
>> {
>> - VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
>> + VirtIONet *n = ((NICState *)nc->opaque)->opaque;
>>
>> n->nic = NULL;
>> }
>> @@ -1000,6 +1130,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>> virtio_net_conf *net)
>> {
>> VirtIONet *n;
>> + int i;
>>
>> n = (VirtIONet *)virtio_common_init("virtio-net", VIRTIO_ID_NET,
>> sizeof(struct virtio_net_config),
>> @@ -1012,7 +1143,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>> n->vdev.bad_features = virtio_net_bad_features;
>> n->vdev.reset = virtio_net_reset;
>> n->vdev.set_status = virtio_net_set_status;
>> - n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
>>
>> if (net->tx&& strcmp(net->tx, "timer")&& strcmp(net->tx, "bh")) {
>> error_report("virtio-net: "
>> @@ -1021,15 +1151,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>> error_report("Defaulting to \"bh\"");
>> }
>>
>> - if (net->tx&& !strcmp(net->tx, "timer")) {
>> - n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_timer);
>> - n->tx_timer = qemu_new_timer_ns(vm_clock, virtio_net_tx_timer, n);
>> - n->tx_timeout = net->txtimer;
>> - } else {
>> - n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_bh);
>> - n->tx_bh = qemu_bh_new(virtio_net_tx_bh, n);
>> - }
>> - n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
>> qemu_macaddr_default_if_unset(&conf->macaddr);
>> memcpy(&n->mac[0],&conf->macaddr, sizeof(n->mac));
>> n->status = VIRTIO_NET_S_LINK_UP;
>> @@ -1038,7 +1159,6 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>>
>> qemu_format_nic_info_str(&n->nic->nc, conf->macaddr.a);
>>
>> - n->tx_waiting = 0;
>> n->tx_burst = net->txburst;
>> n->mergeable_rx_bufs = 0;
>> n->promisc = 1; /* for compatibility */
>> @@ -1046,6 +1166,32 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>> n->mac_table.macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN);
>>
>> n->vlans = g_malloc0(MAX_VLAN>> 3);
>> + n->queues = conf->queues;
>> +
>> + /* Allocate per rx/tx vq's */
>> + for (i = 0; i< n->queues; i++) {
>> + n->vqs[i].rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
>> + if (net->tx&& !strcmp(net->tx, "timer")) {
>> + n->vqs[i].tx_vq = virtio_add_queue(&n->vdev, 256,
>> + virtio_net_handle_tx_timer);
>> + n->vqs[i].tx_timer = qemu_new_timer_ns(vm_clock,
>> + virtio_net_tx_timer,
>> +&n->vqs[i]);
>> + n->vqs[i].tx_timeout = net->txtimer;
>> + } else {
>> + n->vqs[i].tx_vq = virtio_add_queue(&n->vdev, 256,
>> + virtio_net_handle_tx_bh);
>> + n->vqs[i].tx_bh = qemu_bh_new(virtio_net_tx_bh,&n->vqs[i]);
>> + }
>> +
>> + n->vqs[i].tx_waiting = 0;
>> + n->vqs[i].n = n;
>> +
>> + if (i == 0) {
>> + /* keep compatiable with spec and old guest */
>> + n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
>> + }
>> + }
>>
>> n->qdev = dev;
>> register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
>> @@ -1059,24 +1205,33 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
>> void virtio_net_exit(VirtIODevice *vdev)
>> {
>> VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
>> + int i;
>>
>> /* This will stop vhost backend if appropriate. */
>> virtio_net_set_status(vdev, 0);
>>
>> - qemu_purge_queued_packets(&n->nic->nc);
>> + for (i = 0; i< n->queues; i++) {
>> + qemu_purge_queued_packets(n->nic->ncs[i]);
>> + }
>>
>> unregister_savevm(n->qdev, "virtio-net", n);
>>
>> g_free(n->mac_table.macs);
>> g_free(n->vlans);
>>
>> - if (n->tx_timer) {
>> - qemu_del_timer(n->tx_timer);
>> - qemu_free_timer(n->tx_timer);
>> - } else {
>> - qemu_bh_delete(n->tx_bh);
>> + for (i = 0; i< n->queues; i++) {
>> + VirtIONetQueue *netq =&n->vqs[i];
>> + if (netq->tx_timer) {
>> + qemu_del_timer(netq->tx_timer);
>> + qemu_free_timer(netq->tx_timer);
>> + } else {
>> + qemu_bh_delete(netq->tx_bh);
>> + }
>> }
>>
>> - qemu_del_vlan_client(&n->nic->nc);
>> virtio_cleanup(&n->vdev);
>> +
>> + for (i = 0; i< n->queues; i++) {
>> + qemu_del_vlan_client(n->nic->ncs[i]);
>> + }
>> }
>> diff --git a/hw/virtio-net.h b/hw/virtio-net.h
>> index 36aa463..b35ba5d 100644
>> --- a/hw/virtio-net.h
>> +++ b/hw/virtio-net.h
>> @@ -44,6 +44,7 @@
>> #define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
>> #define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
>> #define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
>> +#define VIRTIO_NET_F_MULTIQUEUE 22
>>
>> #define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
>>
>> @@ -72,6 +73,8 @@ struct virtio_net_config
>> uint8_t mac[ETH_ALEN];
>> /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
>> uint16_t status;
>> +
>> + uint16_t queues;
>> } QEMU_PACKED;
>>
>> /* This is the first element of the scatter-gather list. If you don't
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
From: Rusty Russell @ 2012-07-02 6:41 UTC (permalink / raw)
To: Asias He
Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin, Christoph Hellwig
In-Reply-To: <4FF10B31.60609@redhat.com>
On Mon, 02 Jul 2012 10:45:05 +0800, Asias He <asias@redhat.com> wrote:
> On 07/02/2012 07:54 AM, Rusty Russell wrote:
> > Confused. So, without merging we get 6k exits (per second?) How many
> > do we get when we use the request-based IO path?
>
> Sorry for the confusion. The numbers were collected from request-based
> IO path where we can have the guest block layer merge the requests.
>
> With the same workload in guest, the guest fires 200K requests to host
> with merges enabled in guest (echo 0 > /sys/block/vdb/queue/nomerges),
> while the guest fires 40000K requests to host with merges disabled in
> guest (echo 2 > /sys/block/vdb/queue/nomerges). This show that the merge
> in block layer reduces the total number of requests fire to host a lot
> (40000K / 200K = 20).
>
> The guest fires 200K requests to host with merges enabled in guest (echo
> 0 > /sys/block/vdb/queue/nomerges), the host fires 6K interrupts in
> total for the 200K requests. This show that the ratio of interrupts
> coalesced (200K / 6K = 33).
OK, got it! Guest merging cuts requests by a factor of 20. EVENT_IDX
cuts interrupts by a factor of 33.
> > If your device is slow, then you won't be able to make many requests per
> > second: why worry about exit costs?
>
> If a device is slow, the merge would merge more requests and reduce the
> total number of requests to host. This saves exit costs, no?
Sure, our guest merging might save us 100x as many exits as no merging.
But since we're not doing many requests, does it matter?
Ideally we'd merge requests only if the device queue is full. But that
sounds hard.
> > If your device is fast (eg. ram),
> > you've already shown that your patch is a win, right?
>
> Yes. Both on ramdisk and fast SSD device (e.g. FusionIO).
Thanks,
Rusty.
^ permalink raw reply
* Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
From: Asias He @ 2012-07-02 2:45 UTC (permalink / raw)
To: Rusty Russell
Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin, Christoph Hellwig
In-Reply-To: <87r4svxcjw.fsf@rustcorp.com.au>
On 07/02/2012 07:54 AM, Rusty Russell wrote:
> On Tue, 19 Jun 2012 10:51:18 +0800, Asias He <asias@redhat.com> wrote:
>> On 06/18/2012 07:39 PM, Sasha Levin wrote:
>>> On Mon, 2012-06-18 at 14:14 +0300, Dor Laor wrote:
>>>> On 06/18/2012 01:05 PM, Rusty Russell wrote:
>>>>> On Mon, 18 Jun 2012 16:03:23 +0800, Asias He<asias@redhat.com> wrote:
>>>>>> On 06/18/2012 03:46 PM, Rusty Russell wrote:
>>>>>>> On Mon, 18 Jun 2012 14:53:10 +0800, Asias He<asias@redhat.com> wrote:
>>>>>>>> This patch introduces bio-based IO path for virtio-blk.
>>>>>>>
>>>>>>> Why make it optional?
>>>>>>
>>>>>> request-based IO path is useful for users who do not want to bypass the
>>>>>> IO scheduler in guest kernel, e.g. users using spinning disk. For users
>>>>>> using fast disk device, e.g. SSD device, they can use bio-based IO path.
>>>>>
>>>>> Users using a spinning disk still get IO scheduling in the host though.
>>>>> What benefit is there in doing it in the guest as well?
>>>>
>>>> The io scheduler waits for requests to merge and thus batch IOs
>>>> together. It's not important w.r.t spinning disks since the host can do
>>>> it but it causes much less vmexits which is the key issue for VMs.
>>>
>>> Is the amount of exits caused by virtio-blk significant at all with
>>> EVENT_IDX?
>>
>> Yes. EVENT_IDX saves the number of notify and interrupt. Let's take the
>> interrupt as an example, The guest fires 200K request to host, the
>> number of interrupt is about 6K thanks to EVENT_IDX. The ratio is 200K /
>> 6K = 33. The ratio of merging is 40000K / 200K = 20.
>
> Confused. So, without merging we get 6k exits (per second?) How many
> do we get when we use the request-based IO path?
Sorry for the confusion. The numbers were collected from request-based
IO path where we can have the guest block layer merge the requests.
With the same workload in guest, the guest fires 200K requests to host
with merges enabled in guest (echo 0 > /sys/block/vdb/queue/nomerges),
while the guest fires 40000K requests to host with merges disabled in
guest (echo 2 > /sys/block/vdb/queue/nomerges). This show that the merge
in block layer reduces the total number of requests fire to host a lot
(40000K / 200K = 20).
The guest fires 200K requests to host with merges enabled in guest (echo
0 > /sys/block/vdb/queue/nomerges), the host fires 6K interrupts in
total for the 200K requests. This show that the ratio of interrupts
coalesced (200K / 6K = 33).
>
> If your device is slow, then you won't be able to make many requests per
> second: why worry about exit costs?
If a device is slow, the merge would merge more requests and reduce the
total number of requests to host. This saves exit costs, no?
> If your device is fast (eg. ram),
> you've already shown that your patch is a win, right?
Yes. Both on ramdisk and fast SSD device (e.g. FusionIO).
--
Asias
^ permalink raw reply
* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
From: Rusty Russell @ 2012-07-02 1:05 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtualization, Rafael Aquini, kvm, linux-kernel
In-Reply-To: <20120701092051.GA4515@redhat.com>
On Sun, 1 Jul 2012 12:20:51 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Nov 03, 2011 at 06:12:53PM +1030, Rusty Russell wrote:
> > A virtio driver does virtqueue_add_buf() multiple times before finally
> > calling virtqueue_kick(); previously we only exposed the added buffers
> > in the virtqueue_kick() call. This means we don't need a memory
> > barrier in virtqueue_add_buf(), but it reduces concurrency as the
> > device (ie. host) can't see the buffers until the kick.
> >
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> Looking at recent mm compaction patches made me look at locking
> in balloon closely. And I noticed the referenced patch (commit
> ee7cd8981e15bcb365fc762afe3fc47b8242f630 upstream) interacts strangely
> with virtio balloon; balloon currently does:
>
> static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> {
> struct scatterlist sg;
>
> sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
>
> init_completion(&vb->acked);
>
> /* We should always be able to add one buffer to an empty queue. */
> if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
> BUG();
> virtqueue_kick(vq);
>
> /* When host has read buffer, this completes via balloon_ack */
> wait_for_completion(&vb->acked);
> }
>
>
> While vq callback does:
>
> static void balloon_ack(struct virtqueue *vq)
> {
> struct virtio_balloon *vb;
> unsigned int len;
>
> vb = virtqueue_get_buf(vq, &len);
> if (vb)
> complete(&vb->acked);
> }
>
>
> So virtqueue_get_buf might now run concurrently with virtqueue_kick.
> I audited both and this seems safe in practice but I think
Good spotting!
Agreed. Because there's only add_buf, we get away with it: the add_buf
must be almost finished by the time get_buf runs because the device has
seen the buffer.
> we need to either declare this legal at the API level
> or add locking in driver.
I wonder if we should just lock in the balloon driver, rather than
document this corner case and set a bad example. Are there other
drivers which take the same shortcut?
> Further, is there a guarantee that we never get
> spurious callbacks? We currently check ring not empty
> but esp for non shared MSI this might not be needed.
Yes, I think this saves us. A spurious interrupt won't trigger
a spurious callback.
> If a spurious callback triggers, virtqueue_get_buf can run
> concurrently with virtqueue_add_buf which is known to be racy.
> Again I think this is currently safe as no spurious callbacks in
> practice but should we guarantee no spurious callbacks at the API level
> or add locking in driver?
I think we should guarantee it, but is there a hole in the current
implementation?
Thanks,
Rusty.
^ permalink raw reply
* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Rusty Russell @ 2012-07-02 0:29 UTC (permalink / raw)
To: Frank Swiderski, Michael S. Tsirkin
Cc: Andrea Arcangeli, Rik van Riel, Rafael Aquini, kvm, linux-kernel,
mikew, Ying Han, virtualization
In-Reply-To: <CAK+C7kUN-kYVK9AnEhcof98p+eZN1dkt9qVyYppETOeS2n3CMQ@mail.gmail.com>
On Tue, 26 Jun 2012 16:21:58 -0700, Frank Swiderski <fes@google.com> wrote:
> On Tue, Jun 26, 2012 at 2:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Let's assume it's a feature bit: how would you
> > formulate what the feature does *from host point of view*?
>
> In this implementation, the host doesn't keep track of pages in the
> balloon, as there is no explicit deflate path. The host device for
> this implementation should merely, for example, MADV_DONTNEED on the
> pages sent in an inflate. Thus, the inflate becomes a notification
> that the guest doesn't need those pages mapped in, but that they
> should be available if the guest touches them. In that sense, it's
> not a rigid shrink of guest memory. I'm not sure what I'd call the
> feature bit though.
>
> Was that the question you were asking, or did I misread?
Hmm, the spec is unfortunately vague: !VIRTIO_BALLOON_F_MUST_TELL_HOST
implies you should tell the host (eventually). I don't know if any
implementations actually care though.
We could add a VIRTIO_BALLOON_F_NEVER_TELL_DEFLATE which would mean the
deflate vq need not be used at all.
Is it altogether impossible to know when a page is reused in your
implementation? If we could do that, we could replace our balloon with
this one.
(My deep ignorance of vm issues is hurting us here, sorry.)
Cheers,
Rusty.
^ permalink raw reply
* Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
From: Rusty Russell @ 2012-07-01 23:54 UTC (permalink / raw)
To: Asias He, Sasha Levin
Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Christoph Hellwig
In-Reply-To: <4FDFE926.7030309@redhat.com>
On Tue, 19 Jun 2012 10:51:18 +0800, Asias He <asias@redhat.com> wrote:
> On 06/18/2012 07:39 PM, Sasha Levin wrote:
> > On Mon, 2012-06-18 at 14:14 +0300, Dor Laor wrote:
> >> On 06/18/2012 01:05 PM, Rusty Russell wrote:
> >>> On Mon, 18 Jun 2012 16:03:23 +0800, Asias He<asias@redhat.com> wrote:
> >>>> On 06/18/2012 03:46 PM, Rusty Russell wrote:
> >>>>> On Mon, 18 Jun 2012 14:53:10 +0800, Asias He<asias@redhat.com> wrote:
> >>>>>> This patch introduces bio-based IO path for virtio-blk.
> >>>>>
> >>>>> Why make it optional?
> >>>>
> >>>> request-based IO path is useful for users who do not want to bypass the
> >>>> IO scheduler in guest kernel, e.g. users using spinning disk. For users
> >>>> using fast disk device, e.g. SSD device, they can use bio-based IO path.
> >>>
> >>> Users using a spinning disk still get IO scheduling in the host though.
> >>> What benefit is there in doing it in the guest as well?
> >>
> >> The io scheduler waits for requests to merge and thus batch IOs
> >> together. It's not important w.r.t spinning disks since the host can do
> >> it but it causes much less vmexits which is the key issue for VMs.
> >
> > Is the amount of exits caused by virtio-blk significant at all with
> > EVENT_IDX?
>
> Yes. EVENT_IDX saves the number of notify and interrupt. Let's take the
> interrupt as an example, The guest fires 200K request to host, the
> number of interrupt is about 6K thanks to EVENT_IDX. The ratio is 200K /
> 6K = 33. The ratio of merging is 40000K / 200K = 20.
Confused. So, without merging we get 6k exits (per second?) How many
do we get when we use the request-based IO path?
If your device is slow, then you won't be able to make many requests per
second: why worry about exit costs? If your device is fast (eg. ram),
you've already shown that your patch is a win, right?
Cheers,
Rusty.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox