* Re: [PATCH 6/6] virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs
From: Paolo Bonzini @ 2012-07-05 6:42 UTC
To: Nicholas A. Bellinger
Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
Zhi Yong Wu, Anthony Liguori, target-devel, linux-scsi, lf-virt,
Christoph Hellwig
In-Reply-To: <1341453940.23954.229.camel@haakon2.linux-iscsi.org>
On 05/07/2012 04:05, Nicholas A. Bellinger wrote:
>> > But that's what the config field is for... why can't tcm_vhost (or QEMU)
>> > set max_id to 0?
>> >
> So this patch was carried forward from Stefan's original code, which I
> thought was required due to other limitations.
>
> If that's not the case anymore I'm happy to drop it for now and look
> into a proper fix outside of virtio-scsi.
>
I think max_id did not exist in the virtio-scsi configuration at the
time Stefan was working on it.
Paolo
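A minimal sketch of the config-driven alternative suggested above, assuming the virtio-scsi config space carries a "highest target ID" field (called `max_target` here; the name is an assumption) and that the guest derives the SCSI host's exclusive `max_id` scan bound from it. The struct and function are hypothetical userspace stand-ins, not the driver's code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the part of the virtio-scsi config space
 * that describes target addressing (field name assumed). */
struct vscsi_config_subset {
    uint16_t max_target; /* highest target ID the host exposes */
};

/* The SCSI midlayer scans target IDs 0 .. max_id - 1, so max_id is an
 * exclusive bound: one more than the highest valid target ID. */
static unsigned int host_max_id_from_config(const struct vscsi_config_subset *cfg)
{
    return (unsigned int)cfg->max_target + 1;
}
```

Under these assumptions, a tcm_vhost host that exposes a single target per WWPN would advertise 0, and the guest would scan only target 0 without a guest-side special case.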
* Re: [PATCH] virtio-blk: allow toggling host cache between writeback and writethrough
From: Paolo Bonzini @ 2012-07-05 6:45 UTC
To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, virtualization
In-Reply-To: <20120704213037.GA27713@redhat.com>
On 04/07/2012 23:30, Michael S. Tsirkin wrote:
>>>>>> +static int virtblk_get_cache_mode(struct virtio_device *vdev)
>>>>>
>>>>> Why are you converting u8 to int here?
>>>>
>>>> The fact that it is a u8 is really an internal detail. Perhaps the bug
>>>> is using u8 in the callers.
>>>
>>> Make it bool then?
>>>
>>> You are using u8 in the config, so you could get any value
>>> besides 0 and 1, and you interpret all of those as 1.
>>> Is 1 always a safe value? If not, maybe it's better to fall back to
>>> a safe value when it is neither 0 nor 1, that is, when we don't know how to interpret it.
>>
>> That would be a host bug; the spec only gives meaning to 0 and 1.
>
> Yes, but if the other side does not validate values, implementations
> *will* have bugs. Why not declare bits 1-7 reserved?
Ok, that would be a different change. I thought about it yesterday, but
it seemed like a useless complication. It's not like we have been
adding configuration fields every other week. :)
But I can certainly prepare patches to both virtio-blk and the spec for
that if you prefer.
Paolo
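The two ways of handling the u8 config byte discussed above can be sketched as follows. For illustration only, this assumes 0 means writethrough, 1 means writeback, and writethrough is the conservative fallback; the enum and function names are hypothetical. The first helper accepts only the values the spec defines, as Michael suggests; the second treats bits 1-7 as reserved and looks only at bit 0.

```c
#include <stdint.h>

/* Assumed meaning of the u8 config byte, for illustration only. */
enum cache_mode {
    CACHE_WRITETHROUGH = 0,
    CACHE_WRITEBACK = 1,
};

/* Option 1: only 0 and 1 have a defined meaning; anything else falls
 * back to the conservative mode instead of being read as 1. */
static enum cache_mode cache_mode_strict(uint8_t raw)
{
    switch (raw) {
    case 0:
        return CACHE_WRITETHROUGH;
    case 1:
        return CACHE_WRITEBACK;
    default:
        return CACHE_WRITETHROUGH;
    }
}

/* Option 2: bits 1-7 are reserved and ignored; only bit 0 matters. */
static enum cache_mode cache_mode_reserved_bits(uint8_t raw)
{
    return (raw & 0x01) ? CACHE_WRITEBACK : CACHE_WRITETHROUGH;
}
```

With the strict helper, a buggy host that writes 3 gets writethrough; with the reserved-bits helper, the same value is read as writeback.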
* [PATCH v3] virtio-scsi: hotplug support for virtio-scsi
From: Cong Meng @ 2012-07-05 7:09 UTC
To: Paolo Bonzini
Cc: stefanha, linux-scsi, senwang, zwanp, linuxram, linux-kernel,
virtualization, Cong Meng
This patch implements hotplug support for virtio-scsi.
When a device is attached or detached, the virtio-scsi driver is
signaled via the event virtqueue and adds or removes the SCSI device
in question automatically.
v2: handle VIRTIO_SCSI_T_NO_EVENT events
v3: handle dropped events (VIRTIO_SCSI_T_EVENTS_MISSED), and typo fix
Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
---
drivers/scsi/virtio_scsi.c | 113 ++++++++++++++++++++++++++++++++++++++++++-
include/linux/virtio_scsi.h | 9 ++++
2 files changed, 121 insertions(+), 1 deletions(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9fc5e67..d87446b 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -25,6 +25,7 @@
#include <scsi/scsi_cmnd.h>
#define VIRTIO_SCSI_MEMPOOL_SZ 64
+#define VIRTIO_SCSI_EVENT_LEN 8
/* Command queue element */
struct virtio_scsi_cmd {
@@ -43,6 +44,12 @@ struct virtio_scsi_cmd {
} resp;
} ____cacheline_aligned_in_smp;
+struct virtio_scsi_event_node {
+ struct virtio_scsi *vscsi;
+ struct virtio_scsi_event event;
+ struct work_struct work;
+};
+
struct virtio_scsi_vq {
/* Protects vq */
spinlock_t vq_lock;
@@ -67,6 +74,9 @@ struct virtio_scsi {
struct virtio_scsi_vq event_vq;
struct virtio_scsi_vq req_vq;
+ /* Get some buffers ready for event vq */
+ struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN];
+
struct virtio_scsi_target_state *tgt[];
};
@@ -202,6 +212,97 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
spin_unlock_irqrestore(&vscsi->ctrl_vq.vq_lock, flags);
};
+static int virtscsi_kick_event(struct virtio_scsi *vscsi,
+ struct virtio_scsi_event_node *event_node)
+{
+ int ret;
+ struct scatterlist sg;
+ unsigned long flags;
+
+ sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
+
+ spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
+
+ ret = virtqueue_add_buf(vscsi->event_vq.vq, &sg, 0, 1, event_node, GFP_ATOMIC);
+ if (ret >= 0)
+ virtqueue_kick(vscsi->event_vq.vq);
+
+ spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
+
+ return ret;
+}
+
+static int virtscsi_kick_event_all(struct virtio_scsi *vscsi)
+{
+ int i;
+
+ for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++) {
+ vscsi->event_list[i].vscsi = vscsi;
+ virtscsi_kick_event(vscsi, &vscsi->event_list[i]);
+ }
+
+ return 0;
+}
+
+static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
+ struct virtio_scsi_event *event)
+{
+ struct scsi_device *sdev;
+ struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
+ unsigned int target = event->lun[1];
+ unsigned int lun = (event->lun[2] << 8) | event->lun[3];
+
+ switch (event->reason) {
+ case VIRTIO_SCSI_EVT_RESET_RESCAN:
+ scsi_add_device(shost, 0, target, lun);
+ break;
+ case VIRTIO_SCSI_EVT_RESET_REMOVED:
+ sdev = scsi_device_lookup(shost, 0, target, lun);
+ if (sdev) {
+ scsi_remove_device(sdev);
+ scsi_device_put(sdev);
+ } else {
+ pr_err("SCSI device %d 0 %d %d not found\n",
+ shost->host_no, target, lun);
+ }
+ break;
+ default:
+ pr_info("Unsupport virtio scsi event reason %x\n", event->reason);
+ }
+}
+
+static void virtscsi_handle_event(struct work_struct *work)
+{
+ struct virtio_scsi_event_node *event_node =
+ container_of(work, struct virtio_scsi_event_node, work);
+ struct virtio_scsi *vscsi = event_node->vscsi;
+ struct virtio_scsi_event *event = &event_node->event;
+
+ if (event->event & VIRTIO_SCSI_T_EVENTS_MISSED) {
+ event->event &= (~VIRTIO_SCSI_T_EVENTS_MISSED);
+ scsi_scan_host(virtio_scsi_host(vscsi->vdev));
+ }
+
+ switch (event->event) {
+ case VIRTIO_SCSI_T_NO_EVENT:
+ break;
+ case VIRTIO_SCSI_T_TRANSPORT_RESET:
+ virtscsi_handle_transport_reset(vscsi, event);
+ break;
+ default:
+ pr_err("Unsupport virtio scsi event %x\n", event->event);
+ }
+ virtscsi_kick_event(vscsi, event_node);
+}
+
+static void virtscsi_complete_event(void *buf)
+{
+ struct virtio_scsi_event_node *event_node = buf;
+
+ INIT_WORK(&event_node->work, virtscsi_handle_event);
+ schedule_work(&event_node->work);
+}
+
static void virtscsi_event_done(struct virtqueue *vq)
{
struct Scsi_Host *sh = virtio_scsi_host(vq->vdev);
@@ -209,7 +310,7 @@ static void virtscsi_event_done(struct virtqueue *vq)
unsigned long flags;
spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
- virtscsi_vq_done(vq, virtscsi_complete_free);
+ virtscsi_vq_done(vq, virtscsi_complete_event);
spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
};
@@ -510,6 +611,10 @@ static int virtscsi_init(struct virtio_device *vdev,
virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
+ if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
+ virtscsi_kick_event_all(vscsi);
+ }
+
/* We need to know how many segments before we allocate. */
sg_elems = virtscsi_config_get(vdev, seg_max) ?: 1;
@@ -608,7 +713,13 @@ static struct virtio_device_id id_table[] = {
{ 0 },
};
+static unsigned int features[] = {
+ VIRTIO_SCSI_F_HOTPLUG
+};
+
static struct virtio_driver virtio_scsi_driver = {
+ .feature_table = features,
+ .feature_table_size = ARRAY_SIZE(features),
.driver.name = KBUILD_MODNAME,
.driver.owner = THIS_MODULE,
.id_table = id_table,
diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
index 8ddeafd..dc8d305 100644
--- a/include/linux/virtio_scsi.h
+++ b/include/linux/virtio_scsi.h
@@ -69,6 +69,10 @@ struct virtio_scsi_config {
u32 max_lun;
} __packed;
+/* Feature Bits */
+#define VIRTIO_SCSI_F_INOUT 0
+#define VIRTIO_SCSI_F_HOTPLUG 1
+
/* Response codes */
#define VIRTIO_SCSI_S_OK 0
#define VIRTIO_SCSI_S_OVERRUN 1
@@ -105,6 +109,11 @@ struct virtio_scsi_config {
#define VIRTIO_SCSI_T_TRANSPORT_RESET 1
#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
+/* Reasons of transport reset event */
+#define VIRTIO_SCSI_EVT_RESET_HARD 0
+#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
+#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
+
#define VIRTIO_SCSI_S_SIMPLE 0
#define VIRTIO_SCSI_S_ORDERED 1
#define VIRTIO_SCSI_S_HEAD 2
--
1.7.7.6
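virtscsi_handle_transport_reset() takes the target from `lun[1]` and the LUN from `lun[2]` and `lun[3]`. The sketch below models that 8-byte LUN field, assuming the single-level, flat-addressing layout used for virtio-scsi requests (`lun[0] = 1`, `lun[1]` = target, `lun[2..3]` = 0x4000 | LUN) and assuming events use the same layout. Under that assumption, `(lun[2] << 8) | lun[3]` yields 0x4005 for LUN 5, and masking with 0x3fff is what strips the addressing-method bits. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 8-byte virtio-scsi LUN layout (single-level, flat addressing):
 *   lun[0] = 1, lun[1] = target, lun[2..3] = 0x4000 | LUN, lun[4..7] = 0. */
static void vscsi_encode_lun(uint8_t lun_field[8], unsigned int target, unsigned int lun)
{
    assert(target <= 0xff && lun <= 0x3fff);
    lun_field[0] = 1;
    lun_field[1] = (uint8_t)target;
    lun_field[2] = (uint8_t)((lun >> 8) | 0x40); /* 0x40: flat addressing */
    lun_field[3] = (uint8_t)(lun & 0xff);
    lun_field[4] = lun_field[5] = lun_field[6] = lun_field[7] = 0;
}

static unsigned int vscsi_decode_target(const uint8_t lun_field[8])
{
    return lun_field[1];
}

/* Drop the two addressing-method bits so only the 14-bit LUN is left. */
static unsigned int vscsi_decode_lun(const uint8_t lun_field[8])
{
    return (((unsigned int)lun_field[2] << 8) | lun_field[3]) & 0x3fff;
}
```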
* Re: [PATCH v3] virtio-scsi: hotplug support for virtio-scsi
From: Paolo Bonzini @ 2012-07-05 7:24 UTC
To: Cong Meng
Cc: stefanha, linux-scsi, senwang, zwanp, linuxram, linux-kernel,
virtualization
In-Reply-To: <1341472141-19508-1-git-send-email-mc@linux.vnet.ibm.com>
On 05/07/2012 09:09, Cong Meng wrote:
> [...]
> +static void virtscsi_handle_event(struct work_struct *work)
> +{
> + struct virtio_scsi_event_node *event_node =
> + container_of(work, struct virtio_scsi_event_node, work);
> + struct virtio_scsi *vscsi = event_node->vscsi;
> + struct virtio_scsi_event *event = &event_node->event;
> +
> + if (event->event & VIRTIO_SCSI_T_EVENTS_MISSED) {
> + event->event &= (~VIRTIO_SCSI_T_EVENTS_MISSED);
Coding style: no parentheses are needed around the right-hand side of the operator.
Also, you're missing a loop that calls cancel_work_sync in virtscsi_remove.
I believe with these changes we're good to go. Thanks again, and sorry
for missing the cancel_work_sync issue so far.
Paolo
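The event-dispatch logic under review can be modelled in plain C to check the flag handling. This is a hypothetical, simplified stand-in: the constants mirror the patch and the existing virtio_scsi.h (VIRTIO_SCSI_T_EVENTS_MISSED is taken to be the top bit, 0x80000000), and the `enum event_action` values replace the SCSI midlayer calls. Clearing the missed-events bit before the `switch` is what lets a TRANSPORT_RESET that arrives with the bit set still reach its handler.

```c
#include <stdint.h>

/* Event types and reset reasons, mirroring the patch and virtio_scsi.h. */
#define VIRTIO_SCSI_T_NO_EVENT        0u
#define VIRTIO_SCSI_T_TRANSPORT_RESET 1u
#define VIRTIO_SCSI_T_ASYNC_NOTIFY    2u
#define VIRTIO_SCSI_T_EVENTS_MISSED   0x80000000u

#define VIRTIO_SCSI_EVT_RESET_HARD    0u
#define VIRTIO_SCSI_EVT_RESET_RESCAN  1u
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2u

/* Hypothetical stand-ins for what the work handler would do. */
enum event_action {
    ACT_NONE,          /* nothing to do */
    ACT_ADD_DEVICE,    /* scsi_add_device() in the patch */
    ACT_REMOVE_DEVICE, /* lookup + scsi_remove_device() in the patch */
    ACT_UNSUPPORTED,   /* logged and ignored */
};

/* Classify one event; *rescan_host reports whether the host dropped
 * events, which the patch answers with a full scsi_scan_host(). */
static enum event_action classify_event(uint32_t event, uint32_t reason,
                                        int *rescan_host)
{
    *rescan_host = (event & VIRTIO_SCSI_T_EVENTS_MISSED) != 0;
    event &= ~VIRTIO_SCSI_T_EVENTS_MISSED;

    switch (event) {
    case VIRTIO_SCSI_T_NO_EVENT:
        return ACT_NONE;
    case VIRTIO_SCSI_T_TRANSPORT_RESET:
        if (reason == VIRTIO_SCSI_EVT_RESET_RESCAN)
            return ACT_ADD_DEVICE;
        if (reason == VIRTIO_SCSI_EVT_RESET_REMOVED)
            return ACT_REMOVE_DEVICE;
        return ACT_UNSUPPORTED; /* e.g. VIRTIO_SCSI_EVT_RESET_HARD */
    default:
        return ACT_UNSUPPORTED;
    }
}
```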
* [PATCH v4] virtio-scsi: hotplug support for virtio-scsi
From: Cong Meng @ 2012-07-05 9:06 UTC
To: Paolo Bonzini
Cc: stefanha, linux-scsi, senwang, zwanp, linuxram, linux-kernel,
virtualization, Cong Meng
This patch implements hotplug support for virtio-scsi.
When a device is attached or detached, the virtio-scsi driver is
signaled via the event virtqueue and adds or removes the SCSI device
in question automatically.
v2: handle VIRTIO_SCSI_T_NO_EVENT events
v3: handle dropped events (VIRTIO_SCSI_T_EVENTS_MISSED), and typo fix
v4: cancel pending event work in virtscsi_remove; coding style fix
Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
---
drivers/scsi/virtio_scsi.c | 127 ++++++++++++++++++++++++++++++++++++++++++-
include/linux/virtio_scsi.h | 9 +++
2 files changed, 135 insertions(+), 1 deletions(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 9fc5e67..173cb39 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -25,6 +25,7 @@
#include <scsi/scsi_cmnd.h>
#define VIRTIO_SCSI_MEMPOOL_SZ 64
+#define VIRTIO_SCSI_EVENT_LEN 8
/* Command queue element */
struct virtio_scsi_cmd {
@@ -43,6 +44,12 @@ struct virtio_scsi_cmd {
} resp;
} ____cacheline_aligned_in_smp;
+struct virtio_scsi_event_node {
+ struct virtio_scsi *vscsi;
+ struct virtio_scsi_event event;
+ struct work_struct work;
+};
+
struct virtio_scsi_vq {
/* Protects vq */
spinlock_t vq_lock;
@@ -67,6 +74,9 @@ struct virtio_scsi {
struct virtio_scsi_vq event_vq;
struct virtio_scsi_vq req_vq;
+ /* Get some buffers ready for event vq */
+ struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN];
+
struct virtio_scsi_target_state *tgt[];
};
@@ -202,6 +212,106 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
spin_unlock_irqrestore(&vscsi->ctrl_vq.vq_lock, flags);
};
+static int virtscsi_kick_event(struct virtio_scsi *vscsi,
+ struct virtio_scsi_event_node *event_node)
+{
+ int ret;
+ struct scatterlist sg;
+ unsigned long flags;
+
+ sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
+
+ spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
+
+ ret = virtqueue_add_buf(vscsi->event_vq.vq, &sg, 0, 1, event_node, GFP_ATOMIC);
+ if (ret >= 0)
+ virtqueue_kick(vscsi->event_vq.vq);
+
+ spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
+
+ return ret;
+}
+
+static int virtscsi_kick_event_all(struct virtio_scsi *vscsi)
+{
+ int i;
+
+ for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++) {
+ vscsi->event_list[i].vscsi = vscsi;
+ virtscsi_kick_event(vscsi, &vscsi->event_list[i]);
+ }
+
+ return 0;
+}
+
+static void virtscsi_cancel_event_work(struct virtio_scsi *vscsi)
+{
+ int i;
+
+ for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++) {
+ cancel_work_sync(&vscsi->event_list[i].work);
+ }
+}
+
+static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
+ struct virtio_scsi_event *event)
+{
+ struct scsi_device *sdev;
+ struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
+ unsigned int target = event->lun[1];
+ unsigned int lun = (event->lun[2] << 8) | event->lun[3];
+
+ switch (event->reason) {
+ case VIRTIO_SCSI_EVT_RESET_RESCAN:
+ scsi_add_device(shost, 0, target, lun);
+ break;
+ case VIRTIO_SCSI_EVT_RESET_REMOVED:
+ sdev = scsi_device_lookup(shost, 0, target, lun);
+ if (sdev) {
+ scsi_remove_device(sdev);
+ scsi_device_put(sdev);
+ } else {
+ pr_err("SCSI device %d 0 %d %d not found\n",
+ shost->host_no, target, lun);
+ }
+ break;
+ default:
+ pr_info("Unsupport virtio scsi event reason %x\n", event->reason);
+ }
+}
+
+static void virtscsi_handle_event(struct work_struct *work)
+{
+ struct virtio_scsi_event_node *event_node =
+ container_of(work, struct virtio_scsi_event_node, work);
+ struct virtio_scsi *vscsi = event_node->vscsi;
+ struct virtio_scsi_event *event = &event_node->event;
+
+ if (event->event & VIRTIO_SCSI_T_EVENTS_MISSED) {
+ event->event &= ~VIRTIO_SCSI_T_EVENTS_MISSED;
+ scsi_scan_host(virtio_scsi_host(vscsi->vdev));
+ }
+
+ switch (event->event) {
+ case VIRTIO_SCSI_T_NO_EVENT:
+ break;
+ case VIRTIO_SCSI_T_TRANSPORT_RESET:
+ virtscsi_handle_transport_reset(vscsi, event);
+ break;
+ default:
+ pr_err("Unsupport virtio scsi event %x\n", event->event);
+ }
+ virtscsi_kick_event(vscsi, event_node);
+}
+
+static void virtscsi_complete_event(void *buf)
+{
+ struct virtio_scsi_event_node *event_node = buf;
+
+ INIT_WORK(&event_node->work, virtscsi_handle_event);
+ schedule_work(&event_node->work);
+}
+
static void virtscsi_event_done(struct virtqueue *vq)
{
struct Scsi_Host *sh = virtio_scsi_host(vq->vdev);
@@ -209,7 +319,7 @@ static void virtscsi_event_done(struct virtqueue *vq)
unsigned long flags;
spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
- virtscsi_vq_done(vq, virtscsi_complete_free);
+ virtscsi_vq_done(vq, virtscsi_complete_event);
spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
};
@@ -510,6 +620,10 @@ static int virtscsi_init(struct virtio_device *vdev,
virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
+ if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
+ virtscsi_kick_event_all(vscsi);
+ }
+
/* We need to know how many segments before we allocate. */
sg_elems = virtscsi_config_get(vdev, seg_max) ?: 1;
@@ -580,6 +694,11 @@ virtscsi_init_failed:
static void __devexit virtscsi_remove(struct virtio_device *vdev)
{
struct Scsi_Host *shost = virtio_scsi_host(vdev);
+ struct virtio_scsi *vscsi = shost_priv(shost);
+
+ if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
+ virtscsi_cancel_event_work(vscsi);
+ }
scsi_remove_host(shost);
@@ -608,7 +727,13 @@ static struct virtio_device_id id_table[] = {
{ 0 },
};
+static unsigned int features[] = {
+ VIRTIO_SCSI_F_HOTPLUG
+};
+
static struct virtio_driver virtio_scsi_driver = {
+ .feature_table = features,
+ .feature_table_size = ARRAY_SIZE(features),
.driver.name = KBUILD_MODNAME,
.driver.owner = THIS_MODULE,
.id_table = id_table,
diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
index 8ddeafd..dc8d305 100644
--- a/include/linux/virtio_scsi.h
+++ b/include/linux/virtio_scsi.h
@@ -69,6 +69,10 @@ struct virtio_scsi_config {
u32 max_lun;
} __packed;
+/* Feature Bits */
+#define VIRTIO_SCSI_F_INOUT 0
+#define VIRTIO_SCSI_F_HOTPLUG 1
+
/* Response codes */
#define VIRTIO_SCSI_S_OK 0
#define VIRTIO_SCSI_S_OVERRUN 1
@@ -105,6 +109,11 @@ struct virtio_scsi_config {
#define VIRTIO_SCSI_T_TRANSPORT_RESET 1
#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
+/* Reasons of transport reset event */
+#define VIRTIO_SCSI_EVT_RESET_HARD 0
+#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
+#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
+
#define VIRTIO_SCSI_S_SIMPLE 0
#define VIRTIO_SCSI_S_ORDERED 1
#define VIRTIO_SCSI_S_HEAD 2
--
1.7.7.6
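The new hunks in virtscsi_init and virtscsi_remove are both gated on `virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)`, which only succeeds when the device offers the bit and the driver lists it in `feature_table`. Below is a simplified model of that negotiation, assuming a plain bitwise AND of device and driver feature bits (the real virtio core also handles transport features, omitted here). The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit numbers as defined in the patch's virtio_scsi.h hunk. */
#define VIRTIO_SCSI_F_INOUT   0
#define VIRTIO_SCSI_F_HOTPLUG 1

/* Turn a driver feature table (a list of bit numbers) into a bitmask. */
static uint32_t features_from_table(const unsigned int *table, size_t n)
{
    uint32_t mask = 0;

    for (size_t i = 0; i < n; i++)
        mask |= UINT32_C(1) << table[i];
    return mask;
}

/* Simplified negotiation: keep only bits both sides support. */
static uint32_t negotiate_features(uint32_t device_bits, uint32_t driver_bits)
{
    return device_bits & driver_bits;
}

/* Simplified analogue of virtio_has_feature() on the negotiated set. */
static bool has_feature(uint32_t negotiated, unsigned int bit)
{
    return (negotiated >> bit) & 1u;
}
```

Because the driver's table lists only VIRTIO_SCSI_F_HOTPLUG, a device offering both bits 0 and 1 ends up with just hotplug negotiated, and a device without bit 1 skips both the event-buffer setup and the cancel loop.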
* Re: [PATCH v4] virtio-scsi: hotplug support for virtio-scsi
From: Paolo Bonzini @ 2012-07-05 9:19 UTC
To: Cong Meng
Cc: stefanha, linux-scsi, senwang, zwanp, linuxram, linux-kernel,
virtualization
In-Reply-To: <1341479203-12058-1-git-send-email-mc@linux.vnet.ibm.com>
On 05/07/2012 11:06, Cong Meng wrote:
> This patch implements hotplug support for virtio-scsi.
> When a device is attached or detached, the virtio-scsi driver is
> signaled via the event virtqueue and adds or removes the SCSI device
> in question automatically.
>
> v2: handle VIRTIO_SCSI_T_NO_EVENT events
> v3: handle dropped events (VIRTIO_SCSI_T_EVENTS_MISSED), and typo fix
> v4: cancel pending event work in virtscsi_remove; coding style fix
>
> Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
> Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
From: Michael S. Tsirkin @ 2012-07-05 9:31 UTC
To: Nicholas A. Bellinger
Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, lf-virt, Anthony Liguori,
target-devel, linux-scsi, Paolo Bonzini, Zhi Yong Wu,
Christoph Hellwig
In-Reply-To: <1341453665.23954.224.camel@haakon2.linux-iscsi.org>
On Wed, Jul 04, 2012 at 07:01:05PM -0700, Nicholas A. Bellinger wrote:
> On Wed, 2012-07-04 at 18:05 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 04, 2012 at 04:52:00PM +0200, Paolo Bonzini wrote:
> > > Il 04/07/2012 16:02, Michael S. Tsirkin ha scritto:
> > > > On Wed, Jul 04, 2012 at 04:24:00AM +0000, Nicholas A. Bellinger wrote:
> > > >> From: Nicholas Bellinger <nab@linux-iscsi.org>
> > > >>
> > > >> Hi folks,
> > > >>
> > > >> This series contains patches required to update tcm_vhost <-> virtio-scsi
> > > >> connected hosts <-> guests to run on v3.5-rc2 mainline code. This series is
> > > >> available on top of target-pending/auto-next here:
> > > >>
> > > >> git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost
> > > >>
> > > >> This includes the necessary vhost changes from Stefan to get tcm_vhost
> > > >> functioning, along with a virtio-scsi LUN scanning change to address a client bug
> > > >> with tcm_vhost that I ran into. Also, the tcm_vhost driver has been merged into a single
> > > >> source + header file that now lives under /drivers/vhost/, along with the latest
> > > >> tcm_vhost changes from Zhi's tcm_vhost tree.
> > > >>
> > > >> Here are a couple of screenshots of the code in action using raw IBLOCK
> > > >> backends provided by FusionIO ioDrive Duo:
> > > >>
> > > >> http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
> > > >> http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png
> > > >>
> > > >> So the next steps on my end will be converting tcm_vhost to submit backend I/O from
> > > >> cmwq context, along with fio benchmark numbers between tcm_vhost/virtio-scsi and
> > > >> virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.
> > > >
> > > > OK so this is an RFC, not for merge yet?
> > >
> > > Patch 6 definitely looks RFCish, but patch 5 should go in anyway.
> > >
> > > Paolo
> >
> > I was talking about 4/6 first of all.
>
> So yeah, this code is still considered RFC at this point for-3.6, but
> I'd like to get it into target-pending/for-next next week for more
> feedback, and to start collecting signoffs for the necessary pieces that
> affect existing vhost code.
>
> By that time the cmwq conversion of tcm_vhost should be in place as
> well.
I'll try to give some feedback, but I think we do need
to see the qemu patches - they weren't posted yet, were they?
This driver has a userspace interface, and once
that is merged it has to be supported.
So I think we need buy-in from the qemu side in principle.
> > Anyway, it's best to split, not to mix RFCs and fixes.
> >
>
> <nod>, I'll send patch #5 separately to linux-scsi -> James and CC
> stable following Paolo's request.
>
> Thanks!
>
> --nab
^ permalink raw reply
* [PATCH] MAINTAINERS: add kvm list for virtio components
From: Paolo Bonzini @ 2012-07-05 10:07 UTC (permalink / raw)
To: linux-kernel; +Cc: amit.shah, mst, kvm, virtualization
The KVM list is followed by more people than the generic
virtualization@lists.linux-foundation.org mailing list, and is
already "de facto" the place where virtio patches are posted.
pv-ops still has no list other than virtualization@lists.linux-foundation.org.
However, pv-ops patches will likely touch Xen or KVM files too, so the
respective mailing list will usually be reached anyway.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
MAINTAINERS | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 14bc707..e265f2e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7340,6 +7340,7 @@ F: include/media/videobuf2-*
VIRTIO CONSOLE DRIVER
M: Amit Shah <amit.shah@redhat.com>
L: virtualization@lists.linux-foundation.org
+L: kvm@vger.kernel.org
S: Maintained
F: drivers/char/virtio_console.c
F: include/linux/virtio_console.h
@@ -7348,6 +7349,7 @@ VIRTIO CORE, NET AND BLOCK DRIVERS
M: Rusty Russell <rusty@rustcorp.com.au>
M: "Michael S. Tsirkin" <mst@redhat.com>
L: virtualization@lists.linux-foundation.org
+L: kvm@vger.kernel.org
S: Maintained
F: drivers/virtio/
F: drivers/net/virtio_net.c
--
1.7.1
^ permalink raw reply related
* [PATCH] virtio-blk: add back VIRTIO_BLK_F_FLUSH
From: Paolo Bonzini @ 2012-07-05 10:08 UTC (permalink / raw)
To: linux-kernel, virtualization, kvm; +Cc: levinsasha928, Michael S. Tsirkin
The old name is part of the userspace API; add it back for compatibility.
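A minimal userspace sketch of why the alias matters: code written against the old name keeps compiling and tests the same feature bit. The bit value 9 is restated here from include/linux/virtio_blk.h (where VIRTIO_BLK_F_FLUSH was renamed to VIRTIO_BLK_F_WCE), and the has_feature() helper is hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Restated for a standalone build: bit 9 was VIRTIO_BLK_F_FLUSH before
 * the rename to VIRTIO_BLK_F_WCE. */
#define VIRTIO_BLK_F_WCE 9
/* The backwards-compatibility alias restored by the patch. */
#define VIRTIO_BLK_F_FLUSH VIRTIO_BLK_F_WCE

/* Hypothetical helper: is `bit` set in a 32-bit (legacy) feature word? */
static int has_feature(uint32_t features, unsigned int bit)
{
	assert(bit < 32);
	return (int)((features >> bit) & 1u);
}
```

Old userspace code that checks has_feature(features, VIRTIO_BLK_F_FLUSH) therefore sees exactly the same bit as new code that uses VIRTIO_BLK_F_WCE.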
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/linux/virtio_blk.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/include/linux/virtio_blk.h b/include/linux/virtio_blk.h
index 18a1027..83a3116 100644
--- a/include/linux/virtio_blk.h
+++ b/include/linux/virtio_blk.h
@@ -41,6 +41,9 @@
#define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */
#define VIRTIO_BLK_F_CONFIG_WCE 11 /* Writeback mode available in config */
+/* Backwards-compatibility #defines for renamed features. */
+#define VIRTIO_BLK_F_FLUSH VIRTIO_BLK_F_WCE
+
#define VIRTIO_BLK_ID_BYTES 20 /* ID string length */
struct virtio_blk_config {
--
1.7.1
^ permalink raw reply related
* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
From: Paolo Bonzini @ 2012-07-05 10:22 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel,
Michael S. Tsirkin, lf-virt, Anthony Liguori, target-devel,
Zhi Yong Wu, Christoph Hellwig, Stefan Hajnoczi
In-Reply-To: <1341453135.23954.214.camel@haakon2.linux-iscsi.org>
Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
>
> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> ------------------------------------------------------------------------------------
> 25 Write / 75 Read | ~15K | ~45K | ~70K
> 75 Write / 25 Read | ~20K | ~55K | ~60K
This is impressive, but I think it's still not enough to justify the
inclusion of tcm_vhost. In my opinion, vhost-blk/vhost-scsi are mostly
worthwhile as drivers for improvements to QEMU performance. We want to
add more fast paths to QEMU that let us move SCSI and virtio processing
to separate threads; we have proofs of concept that this can be done,
and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
completely different devices that happen to speak the same SCSI
transport. Not only must virtio-scsi-vhost be configured outside QEMU,
without -device support; it also (obviously) presents different
inquiry/vpd/mode data than virtio-scsi-qemu, so that it is not possible
to migrate one to the other.
I don't think vhost-scsi is particularly useful for virtualization,
honestly. However, if it is useful for development, testing or
benchmarking of lio itself (does this make any sense? :)), that could by
itself be a good reason to include it.
Paolo
^ permalink raw reply
* [net-next RFC V5 0/5] Multiqueue virtio-net
From: Jason Wang @ 2012-07-05 10:29 UTC (permalink / raw)
To: mst, mashirle, krkumar2, habanero, rusty, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem
Cc: kvm, sri
Hello All:
This series is an updated version of the multiqueue virtio-net driver based on
Krishna Kumar's work to let virtio-net use multiple rx/tx queues for packet
reception and transmission. Please review and comment.
Test Environment:
- Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 8 cores, 2 NUMA nodes
- Two directly connected 82599 NICs
Test Summary:
- Highlights: huge improvements on the TCP_RR test
- Lowlights: regression on small packet transmission and higher CPU utilization
than single queue; needs further optimization
Analysis of the performance result:
- I counted the number of packets sent/received during the test, and
multiqueue shows much higher capacity in terms of packets per second.
- For the tx regression, multiqueue sends about 1-2 times more packets
than single queue, and the packets are much smaller. I suspect tcp does
less batching with multiqueue, so I hacked tcp_write_xmit() to force more
batching; with that, multiqueue works as well as single queue for both
small transmissions and throughput.
- I didn't include accelerated RFS for virtio-net in this series as it still
needs further shaping; for anyone interested, please see:
http://www.mail-archive.com/kvm@vger.kernel.org/msg64111.html
Changes from V4:
- Add ability to negotiate the number of queues through control virtqueue
- ethtool -{L|l} support, with the tx/rx queue number defaulting to 1
- Expose the API to set irq affinity instead of irq itself
Changes from V3:
- Rebase to the net-next
- Let queue 2 be the control virtqueue to obey the spec
- Provide irq affinity
- Choose txq based on processor id
References:
- V4: https://lkml.org/lkml/2012/6/25/120
- V3: http://lwn.net/Articles/467283/
Test result:
1) 1 VM with 2 vCPUs, 1q vs 2q (1 = 1q, 2 = 2q), no pinning
- Guest to External Host TCP STREAM
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
1 64 650.55 655.61 100% 24.88 24.86 99%
2 64 1446.81 1309.44 90% 30.49 27.16 89%
4 64 1430.52 1305.59 91% 30.78 26.80 87%
8 64 1450.89 1270.82 87% 30.83 25.95 84%
1 256 1699.45 1779.58 104% 56.75 59.08 104%
2 256 4902.71 3446.59 70% 98.53 62.78 63%
4 256 4803.76 2980.76 62% 97.44 54.68 56%
8 256 5128.88 3158.74 61% 104.68 58.61 55%
1 512 2837.98 2838.42 100% 89.76 90.41 100%
2 512 6742.59 5495.83 81% 155.03 99.07 63%
4 512 9193.70 5900.17 64% 202.84 106.44 52%
8 512 9287.51 7107.79 76% 202.18 129.08 63%
1 1024 4166.42 4224.98 101% 128.55 129.86 101%
2 1024 6196.94 7823.08 126% 181.80 168.81 92%
4 1024 9113.62 9219.49 101% 235.15 190.93 81%
8 1024 9324.25 9402.66 100% 239.10 179.99 75%
1 2048 7441.63 6534.04 87% 248.01 215.63 86%
2 2048 7024.61 7414.90 105% 225.79 219.62 97%
4 2048 8971.49 9269.00 103% 278.94 220.84 79%
8 2048 9314.20 9359.96 100% 268.36 192.23 71%
1 4096 8282.60 8990.08 108% 277.45 320.05 115%
2 4096 9194.80 9293.78 101% 317.02 248.76 78%
4 4096 9340.73 9313.19 99% 300.34 230.35 76%
8 4096 9148.23 9347.95 102% 279.49 199.43 71%
1 16384 8787.89 8766.31 99% 312.38 316.53 101%
2 16384 9306.35 9156.14 98% 319.53 279.83 87%
4 16384 9177.81 9307.50 101% 312.69 230.07 73%
8 16384 9035.82 9188.00 101% 298.32 199.17 66%
- TCP_RR
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
50 1 54695.41 84164.98 153% 1957.33 1901.31 97%
100 1 60141.88 88598.94 147% 2157.90 2000.45 92%
250 1 74763.56 135584.22 181% 2541.94 2628.59 103%
50 64 51628.38 82867.50 160% 1872.55 1812.16 96%
100 64 60367.73 84080.60 139% 2215.69 1867.69 84%
250 64 68502.70 124910.59 182% 2321.43 2495.76 107%
50 128 53477.08 77625.07 145% 1905.10 1870.99 98%
100 128 59697.56 74902.37 125% 2230.66 1751.03 78%
250 128 71248.74 133963.55 188% 2453.12 2711.72 110%
50 256 47663.86 67742.63 142% 1880.45 1735.30 92%
100 256 54051.84 68738.57 127% 2123.03 1778.59 83%
250 256 68250.06 124487.90 182% 2321.89 2598.60 111%
- External Host to Guest TCP STREAM
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
1 64 847.71 864.83 102% 57.99 57.93 99%
2 64 1690.82 1544.94 91% 80.13 55.09 68%
4 64 3434.98 3455.53 100% 127.17 89.00 69%
8 64 5890.19 6557.35 111% 194.70 146.52 75%
1 256 2094.04 2109.14 100% 130.73 127.14 97%
2 256 5218.13 3731.97 71% 219.15 114.02 52%
4 256 6734.51 9213.47 136% 227.87 208.31 91%
8 256 6452.86 9402.78 145% 224.83 207.77 92%
1 512 3945.07 4203.68 106% 279.72 273.30 97%
2 512 7878.96 8122.55 103% 278.25 231.71 83%
4 512 7645.89 9402.13 122% 252.10 217.42 86%
8 512 6657.06 9403.71 141% 239.81 214.89 89%
1 1024 5729.06 5111.21 89% 289.38 303.09 104%
2 1024 8097.27 8159.67 100% 269.29 242.97 90%
4 1024 7778.93 8919.02 114% 261.28 205.50 78%
8 1024 6458.02 9360.02 144% 221.26 208.09 94%
1 2048 6426.94 5195.59 80% 292.52 307.47 105%
2 2048 8221.90 9025.66 109% 283.80 242.25 85%
4 2048 7364.72 8527.79 115% 248.10 198.36 79%
8 2048 6760.63 9161.07 135% 230.53 205.12 88%
1 4096 7247.02 6874.21 94% 276.23 287.68 104%
2 4096 8346.04 8818.65 105% 281.49 254.81 90%
4 4096 6710.00 9354.59 139% 216.41 210.13 97%
8 4096 6265.69 9406.87 150% 206.69 210.92 102%
1 16384 8159.50 8048.79 98% 266.94 283.11 106%
2 16384 8525.66 8552.41 100% 294.36 239.27 81%
4 16384 6042.24 8447.86 139% 200.21 196.40 98%
8 16384 6432.63 9403.49 146% 211.48 206.13 97%
2) 1 VM with 4 vCPUs, 1q vs 4q (1 = 1q, 2 = 4q), no pinning
- Guest to External Host TCP STREAM
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
1 64 636.93 657.69 103% 23.55 24.42 103%
2 64 1457.46 1268.78 87% 30.97 26.02 84%
4 64 3062.86 2302.43 75% 41.00 29.64 72%
8 64 3107.68 2308.32 74% 41.62 29.07 69%
1 256 1743.50 1750.11 100% 59.00 56.63 95%
2 256 4582.61 2870.31 62% 92.47 51.97 56%
4 256 8440.96 4795.37 56% 135.10 56.39 41%
8 256 9240.31 6654.82 72% 144.76 74.89 51%
1 512 2918.25 2735.26 93% 91.08 86.47 94%
2 512 8978.32 5107.95 56% 200.00 94.97 47%
4 512 8850.39 6864.37 77% 190.32 101.09 53%
8 512 9270.30 8483.01 91% 193.44 118.73 61%
1 1024 4416.10 3679.70 83% 135.54 110.63 81%
2 1024 9085.20 8770.48 96% 242.23 175.59 72%
4 1024 9158.57 9011.56 98% 234.39 159.17 67%
8 1024 9345.89 9067.43 97% 233.35 138.73 59%
1 2048 8455.19 6077.94 71% 338.52 190.16 56%
2 2048 9223.32 8237.73 89% 270.00 198.27 73%
4 2048 9080.75 9257.63 101% 261.30 172.80 66%
8 2048 9177.39 8977.10 97% 256.89 147.50 57%
1 4096 8665.35 8394.78 96% 289.63 289.85 100%
2 4096 7850.73 8857.86 112% 253.33 252.62 99%
4 4096 9332.55 8508.37 91% 289.19 151.29 52%
8 4096 8482.30 9146.80 107% 255.41 156.02 61%
1 16384 8825.72 8778.26 99% 314.60 308.89 98%
2 16384 9283.85 8927.40 96% 316.48 246.98 78%
4 16384 7766.95 8708.06 112% 265.25 155.59 58%
8 16384 8945.55 8940.23 99% 298.45 151.32 50%
- TCP_RR
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
50 1 60848.70 81719.39 134% 2196.86 1551.05 70%
100 1 61886.19 81425.02 131% 2215.76 1517.52 68%
250 1 72058.41 162597.84 225% 2441.84 2278.14 93%
50 64 51646.93 74160.10 143% 1861.07 1322.22 71%
100 64 57574.86 83488.26 145% 2076.54 1479.79 71%
250 64 67583.35 138482.15 204% 2314.46 2022.83 87%
50 128 59931.51 71633.03 119% 2244.60 1309.18 58%
100 128 58329.80 73104.90 125% 2202.98 1329.52 60%
250 128 71021.55 161067.73 226% 2469.11 2205.28 89%
50 256 47509.24 64330.24 135% 1915.75 1269.90 66%
100 256 49293.03 68507.94 138% 1939.75 1263.64 65%
250 256 63169.07 138390.68 219% 2255.47 2098.13 93%
- External Host to Guest TCP STREAM
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
1 64 850.18 854.96 100% 56.94 58.25 102%
2 64 1659.12 1730.25 104% 81.65 67.57 82%
4 64 3254.70 3397.17 104% 118.57 76.21 64%
8 64 6251.97 6389.29 102% 207.68 104.21 50%
1 256 2029.14 2105.18 103% 116.45 119.69 102%
2 256 5412.02 4260.32 78% 240.87 139.73 58%
4 256 7777.28 8743.12 112% 263.20 174.65 66%
8 256 6459.51 9388.93 145% 218.94 158.37 72%
1 512 4566.31 4269.30 93% 274.74 289.83 105%
2 512 7444.52 8240.64 110% 286.24 243.74 85%
4 512 7722.29 9391.16 121% 261.96 180.36 68%
8 512 6228.50 9134.52 146% 209.17 161.00 76%
1 1024 4965.50 4953.68 99% 307.64 280.48 91%
2 1024 8270.08 7733.71 93% 288.32 197.04 68%
4 1024 7551.04 9394.58 124% 268.41 206.62 76%
8 1024 6307.78 9179.03 145% 216.67 159.63 73%
1 2048 5741.12 5948.80 103% 290.34 268.66 92%
2 2048 7932.79 8766.05 110% 262.96 215.90 82%
4 2048 6907.55 9255.97 133% 233.56 203.96 87%
8 2048 6037.22 9399.41 155% 197.14 164.09 83%
1 4096 7131.70 7535.10 105% 279.43 275.12 98%
2 4096 8109.17 9348.04 115% 274.29 211.49 77%
4 4096 6878.92 9319.13 135% 244.21 192.06 78%
8 4096 6265.92 9408.35 150% 211.85 159.26 75%
1 16384 8288.01 8596.39 103% 272.85 290.22 106%
2 16384 8166.29 9280.12 113% 277.04 236.61 85%
4 16384 6446.97 9382.22 145% 222.91 187.24 83%
8 16384 6066.98 9405.51 155% 198.98 157.09 78%
3) 2 VMs, each with 2 vCPUs, 1q vs 2q, vhost/vCPU threads pinned to the same NUMA node
- 2 Guests to External Hosts TCP STREAM
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
1 64 1442.07 1475.11 102% 30.82 31.21 101%
2 64 3124.87 2900.93 92% 40.29 35.95 89%
4 64 3166.52 2864.04 90% 40.70 35.47 87%
8 64 3141.45 2848.94 90% 40.38 35.34 87%
1 256 3628.54 3711.73 102% 68.47 70.22 102%
2 256 7806.95 7586.69 97% 111.23 84.38 75%
4 256 8823.65 7612.74 86% 132.92 85.04 63%
8 256 9194.89 9373.41 101% 135.98 119.62 87%
1 512 7106.67 7128.00 100% 124.79 124.30 99%
2 512 9190.22 9397.33 102% 180.84 149.34 82%
4 512 9401.01 9376.67 99% 173.00 140.15 81%
8 512 8572.84 9032.90 105% 150.49 127.58 84%
1 1024 9361.93 9379.24 100% 205.81 202.94 98%
2 1024 9386.69 9389.04 100% 201.78 165.75 82%
4 1024 9403.43 9378.54 99% 195.33 152.06 77%
8 1024 9213.63 9180.64 99% 178.99 141.51 79%
1 2048 9338.95 9384.67 100% 223.22 227.86 102%
2 2048 9389.28 9389.45 100% 202.37 170.08 84%
4 2048 9405.86 9388.71 99% 193.76 161.54 83%
8 2048 9352.40 9384.06 100% 189.16 157.06 83%
1 4096 9380.74 9384.90 100% 239.37 241.56 100%
2 4096 9393.47 9376.74 99% 213.84 195.61 91%
4 4096 9393.85 9381.50 99% 198.06 170.18 85%
8 4096 9400.41 9232.31 98% 192.87 163.56 84%
1 16384 9348.18 9335.55 99% 253.02 254.86 100%
2 16384 9384.97 9359.53 99% 218.56 208.59 95%
4 16384 9326.60 9382.15 100% 206.24 179.72 87%
8 16384 9355.82 9392.85 100% 198.22 172.89 87%
- TCP_RR
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
50 1 200340.33 261750.19 130% 2935.27 3018.59 102%
100 1 236141.58 266304.49 112% 3452.16 3071.74 88%
250 1 361574.59 320825.08 88% 4972.98 3705.70 74%
50 64 225748.53 242671.12 107% 3011.48 2869.07 95%
100 64 249885.37 260453.72 104% 3240.21 3063.67 94%
250 64 360341.12 310775.60 86% 4682.42 3657.91 78%
50 128 227995.27 289320.38 126% 2950.92 3479.37 117%
100 128 239491.11 291135.77 121% 3099.55 3508.75 113%
250 128 390390.68 362484.35 92% 5042.30 4368.52 86%
50 256 222604.51 317140.97 142% 3058.08 3839.39 125%
100 256 254770.92 335606.03 131% 3326.16 4046.65 121%
250 256 400584.52 436749.22 109% 5220.79 5278.86 101%
- External Host to 2 Guests TCP STREAM
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
1 64 1667.99 1684.50 100% 59.66 60.77 101%
2 64 3338.83 3379.97 101% 83.61 64.82 77%
4 64 6613.65 6619.11 100% 131.00 97.19 74%
8 64 6553.07 6418.31 97% 141.35 98.27 69%
1 256 3938.40 4068.52 103% 125.21 123.76 98%
2 256 9215.57 9210.88 99% 185.31 154.27 83%
4 256 9407.29 9008.13 95% 186.72 150.01 80%
8 256 9377.17 9385.57 100% 190.28 137.59 72%
1 512 7360.19 6984.80 94% 214.09 211.66 98%
2 512 9392.91 9401.88 100% 193.92 173.11 89%
4 512 9382.64 9394.34 100% 189.27 145.80 77%
8 512 9308.60 9094.08 97% 189.70 141.26 74%
1 1024 9153.26 9066.06 99% 223.07 219.95 98%
2 1024 9393.38 9398.43 100% 194.02 173.82 89%
4 1024 9395.92 8960.73 95% 192.61 145.82 75%
8 1024 9388.92 9399.08 100% 191.18 143.87 75%
1 2048 9355.32 9240.63 98% 221.50 223.03 100%
2 2048 9395.68 9399.62 100% 193.31 177.21 91%
4 2048 9397.67 9399.56 100% 195.25 157.53 80%
8 2048 9397.89 9401.70 100% 197.57 146.96 74%
1 4096 9375.84 9381.72 100% 223.06 225.06 100%
2 4096 9389.47 9396.00 100% 193.91 197.13 101%
4 4096 9397.45 9400.11 100% 192.33 163.60 85%
8 4096 9105.40 9415.76 103% 192.71 140.41 72%
1 16384 9381.53 9381.40 99% 223.53 225.66 100%
2 16384 9387.90 9395.44 100% 193.34 177.03 91%
4 16384 9397.92 9410.98 100% 195.04 151.14 77%
8 16384 9259.00 9419.48 101% 194.91 153.48 78%
4) Local VM to VM, 2 vCPUs, 1q vs 2q, vCPU/threads pinned to the same NUMA node
- VM to VM TCP STREAM
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
1 64 576.05 576.14 100% 12.25 12.32 100%
2 64 1266.75 1160.04 91% 19.10 16.05 84%
4 64 1267.34 1123.70 88% 19.08 15.51 81%
8 64 1230.88 1174.70 95% 18.53 15.58 84%
1 256 1311.00 1303.02 99% 25.34 25.35 100%
2 256 5400.26 2794.00 51% 75.92 36.43 47%
4 256 5200.67 2818.88 54% 72.81 33.92 46%
8 256 5234.55 2893.74 55% 73.10 34.97 47%
1 512 3244.09 3263.72 100% 56.48 56.65 100%
2 512 8172.16 4661.15 57% 119.05 67.89 57%
4 512 10567.44 7063.25 66% 147.76 77.27 52%
8 512 10477.87 8471.33 80% 145.94 102.91 70%
1 1024 5432.54 5333.99 98% 93.69 92.38 98%
2 1024 12590.24 9259.97 73% 185.37 135.28 72%
4 1024 15600.53 10731.93 68% 222.20 123.60 55%
8 1024 16222.87 10704.85 65% 227.05 113.81 50%
1 2048 6667.61 7484.37 112% 116.75 129.72 111%
2 2048 8180.43 11500.88 140% 137.84 156.64 113%
4 2048 15127.93 14416.16 95% 227.60 154.59 67%
8 2048 16381.79 14794.10 90% 244.29 158.45 64%
1 4096 7375.63 8948.90 121% 131.97 156.57 118%
2 4096 9321.16 14443.21 154% 161.24 163.74 101%
4 4096 13028.45 15984.94 122% 212.78 171.26 80%
8 4096 15611.28 18810.54 120% 245.15 198.65 81%
1 16384 15304.38 14202.08 92% 259.94 244.04 93%
2 16384 15508.97 15913.09 102% 261.30 244.26 93%
4 16384 14859.98 20164.34 135% 248.29 214.26 86%
8 16384 15594.59 19960.99 127% 253.79 211.27 83%
- TCP_RR
sessions size throughput1 throughput2 throughput2/throughput1 norm1 norm2 norm2/norm1
50 1 54972.51 69820.99 127% 1133.58 1063.58 93%
100 1 55847.16 72407.93 129% 1155.73 1024.35 88%
250 1 60066.23 108266.50 180% 1114.30 1323.55 118%
50 64 48727.63 62378.32 128% 1014.29 888.78 87%
100 64 51804.65 69250.51 133% 1077.78 986.97 91%
250 64 61278.68 100015.78 163% 1076.93 1243.18 115%
50 256 51593.29 62046.22 120% 1069.14 871.08 81%
100 256 51647.00 68197.43 132% 1071.66 958.51 89%
250 256 60433.88 99072.59 163% 1072.41 1199.10 111%
50 512 52177.79 66483.77 127% 1082.65 960.82 88%
100 512 50351.67 62537.63 124% 1041.61 876.41 84%
250 512 60510.14 103856.79 171% 1055.21 1245.17 118%
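The percentage columns in the tables above appear to be throughput2/throughput1 and norm2/norm1, expressed as an integer percentage. Judging from rows such as "1 256 1699.45 1779.58 104%" (104.7 shown as 104), the value looks truncated rather than rounded; this is an inference from the data, not something the mail states. A small C sketch that reproduces the column under that assumption (pct_of() is a hypothetical name):

```c
#include <assert.h>

/* Percentage of new_value relative to old_value, truncated toward zero.
 * Truncation is inferred from the published rows, not documented. */
static int pct_of(double old_value, double new_value)
{
	assert(old_value > 0.0);
	return (int)(100.0 * new_value / old_value);
}
```

For example, pct_of(4902.71, 3446.59) gives 70, matching the "2 256" row of the first Guest to External Host TCP STREAM table.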
Jason Wang (4):
virtio_ring: move queue_index to vring_virtqueue
virtio: introduce an API to set affinity for a virtqueue
virtio_net: multiqueue support
virtio_net: support negotiating the number of queues through ctrl vq
Krishna Kumar (1):
virtio_net: Introduce VIRTIO_NET_F_MULTIQUEUE
drivers/net/virtio_net.c | 792 +++++++++++++++++++++++++++++------------
drivers/virtio/virtio_mmio.c | 5 +-
drivers/virtio/virtio_pci.c | 58 +++-
drivers/virtio/virtio_ring.c | 17 +
include/linux/virtio.h | 4 +
include/linux/virtio_config.h | 21 ++
include/linux/virtio_net.h | 10 +
7 files changed, 677 insertions(+), 230 deletions(-)
^ permalink raw reply
* [net-next RFC V5 1/5] virtio_net: Introduce VIRTIO_NET_F_MULTIQUEUE
From: Jason Wang @ 2012-07-05 10:29 UTC (permalink / raw)
To: mst, mashirle, krkumar2, habanero, rusty, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem
Cc: kvm, sri
In-Reply-To: <1341484194-8108-1-git-send-email-jasowang@redhat.com>
From: Krishna Kumar <krkumar2@in.ibm.com>
Introduce VIRTIO_NET_F_MULTIQUEUE.
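As a rough illustration, assuming the 32-bit legacy feature word, the driver may use the multiqueue path only when bit 22 is both offered by the device and accepted by the driver. The helper multiqueue_negotiated() is hypothetical; in the kernel this check would go through virtio_has_feature().

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_MULTIQUEUE 22 /* value proposed in this patch */

static_assert(VIRTIO_NET_F_MULTIQUEUE < 32, "legacy feature word is 32 bits");

/* Hypothetical helper: a feature is in effect only if the device offers
 * it and the driver accepts it. */
static int multiqueue_negotiated(uint32_t device_features,
				 uint32_t driver_features)
{
	uint32_t bit = 1u << VIRTIO_NET_F_MULTIQUEUE;

	return (device_features & driver_features & bit) != 0;
}
```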
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
include/linux/virtio_net.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 2470f54..1bc7e30 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -51,6 +51,7 @@
#define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce device on the
* network */
+#define VIRTIO_NET_F_MULTIQUEUE 22 /* Device supports multiple TXQ/RXQ */
#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
#define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */
--
1.7.1
^ permalink raw reply related
* [net-next RFC V5 2/5] virtio_ring: move queue_index to vring_virtqueue
From: Jason Wang @ 2012-07-05 10:29 UTC (permalink / raw)
To: mst, mashirle, krkumar2, habanero, rusty, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem
Cc: kvm, sri
In-Reply-To: <1341484194-8108-1-git-send-email-jasowang@redhat.com>
Instead of storing the queue index in the per-transport vq info structures
(virtio_pci_vq_info, virtio_mmio_vq_info), this patch moves it to
vring_virtqueue and introduces helpers to set and get the value. This
simplifies management and tracing.
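A userspace sketch of the pattern this patch introduces: the index lives in the private vring_virtqueue, and callers reach it through the public struct virtqueue via container_of(). The struct definitions below are trimmed stand-ins, not the kernel's, and container_of() is a simplified version of the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-ins for the kernel types; only the fields needed here. */
struct virtqueue {
	void *priv;
};

struct vring_virtqueue {
	struct virtqueue vq;	/* embedded public part */
	int queue_index;	/* field moved here by the patch */
};

/* Simplified container_of(): recover the outer struct from the member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)

static void virtqueue_set_queue_index(struct virtqueue *_vq, int queue_index)
{
	to_vvq(_vq)->queue_index = queue_index;
}

static int virtqueue_get_queue_index(struct virtqueue *_vq)
{
	return to_vvq(_vq)->queue_index;
}
```

Transport code such as vp_notify() then calls the getter instead of reading info->queue_index, so the transports no longer need their own copy.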
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_mmio.c | 5 +----
drivers/virtio/virtio_pci.c | 12 +++++-------
drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
include/linux/virtio.h | 4 ++++
4 files changed, 27 insertions(+), 11 deletions(-)
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 453db0c..f5432b6 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -131,9 +131,6 @@ struct virtio_mmio_vq_info {
/* the number of entries in the queue */
unsigned int num;
- /* the index of the queue */
- int queue_index;
-
/* the virtual address of the ring queue */
void *queue;
@@ -324,7 +321,6 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
err = -ENOMEM;
goto error_kmalloc;
}
- info->queue_index = index;
/* Allocate pages for the queue - start with a queue as big as
* possible (limited by maximum size allowed by device), drop down
@@ -363,6 +359,7 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
goto error_new_virtqueue;
}
+ virtqueue_set_queue_index(vq, index);
vq->priv = info;
info->vq = vq;
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 2e03d41..adb24f2 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -79,9 +79,6 @@ struct virtio_pci_vq_info
/* the number of entries in the queue */
int num;
- /* the index of the queue */
- int queue_index;
-
/* the virtual address of the ring queue */
void *queue;
@@ -202,11 +199,11 @@ static void vp_reset(struct virtio_device *vdev)
static void vp_notify(struct virtqueue *vq)
{
struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
- struct virtio_pci_vq_info *info = vq->priv;
/* we write the queue's selector into the notification register to
* signal the other end */
- iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
+ iowrite16(virtqueue_get_queue_index(vq),
+ vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
}
/* Handle a configuration change: Tell driver if it wants to know. */
@@ -402,7 +399,6 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
if (!info)
return ERR_PTR(-ENOMEM);
- info->queue_index = index;
info->num = num;
info->msix_vector = msix_vec;
@@ -425,6 +421,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
goto out_activate_queue;
}
+ virtqueue_set_queue_index(vq, index);
vq->priv = info;
info->vq = vq;
@@ -467,7 +464,8 @@ static void vp_del_vq(struct virtqueue *vq)
list_del(&info->node);
spin_unlock_irqrestore(&vp_dev->lock, flags);
- iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+ iowrite16(virtqueue_get_queue_index(vq),
+ vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
if (vp_dev->msix_enabled) {
iowrite16(VIRTIO_MSI_NO_VECTOR,
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5aa43c3..9c5aeea 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -106,6 +106,9 @@ struct vring_virtqueue
/* How to notify other side. FIXME: commonalize hcalls! */
void (*notify)(struct virtqueue *vq);
+ /* Index of the queue */
+ int queue_index;
+
#ifdef DEBUG
/* They're supposed to lock for us. */
unsigned int in_use;
@@ -171,6 +174,20 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
return head;
}
+void virtqueue_set_queue_index(struct virtqueue *_vq, int queue_index)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ vq->queue_index = queue_index;
+}
+EXPORT_SYMBOL_GPL(virtqueue_set_queue_index);
+
+int virtqueue_get_queue_index(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ return vq->queue_index;
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_queue_index);
+
/**
* virtqueue_add_buf - expose buffer to other end
* @vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 8efd28a..0d8ed46 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -50,6 +50,10 @@ void *virtqueue_detach_unused_buf(struct virtqueue *vq);
unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
+void virtqueue_set_queue_index(struct virtqueue *vq, int queue_index);
+
+int virtqueue_get_queue_index(struct virtqueue *vq);
+
/**
* virtio_device - representation of a device using virtio
* @index: unique position on the virtio bus
--
1.7.1
^ permalink raw reply related
* [net-next RFC V5 3/5] virtio: introduce an API to set affinity for a virtqueue
From: Jason Wang @ 2012-07-05 10:29 UTC (permalink / raw)
To: mst, mashirle, krkumar2, habanero, rusty, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem
Cc: kvm, sri
In-Reply-To: <1341484194-8108-1-git-send-email-jasowang@redhat.com>
Sometimes, virtio devices need to configure the irq affinity hint to maximize
performance. Instead of just exposing the irq of a virtqueue, this patch
introduces an API to set the affinity for a virtqueue.
The API is best-effort: the affinity hint may not be set as expected due to
platform support, irq sharing or irq type. Currently, only the PCI method is
implemented, and we set the affinity according to:
- if the device uses INTX, we just ignore the request
- if the device has a per-vq vector, we force the affinity hint
- if the virtqueues share an MSI vector, make the affinity the OR over all
affinities requested
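A simplified model of the three rules above, using a 64-bit mask in place of struct cpumask. All names here are hypothetical, and cpu == -1 clears the hint, mirroring the NULL hint in the posted code. Note that this sketch follows the policy as described; the posted vp_set_vq_affinity() only ever sets bits in the per-vector mask, so repeated calls accumulate CPUs even for a per-vq vector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: how the interrupt for a virtqueue is delivered. */
enum irq_mode { IRQ_INTX, IRQ_MSIX_PER_VQ, IRQ_MSIX_SHARED };

struct vq_irq {
	enum irq_mode mode;
	uint64_t *hint;		/* affinity hint of the vector this vq uses */
};

static void set_vq_affinity(struct vq_irq *q, int cpu)
{
	assert(cpu >= -1 && cpu < 64);
	switch (q->mode) {
	case IRQ_INTX:
		return;		/* shared legacy interrupt: ignore the request */
	case IRQ_MSIX_PER_VQ:
		/* vector belongs to this vq alone: force the hint */
		*q->hint = (cpu == -1) ? 0 : (UINT64_C(1) << cpu);
		return;
	case IRQ_MSIX_SHARED:
		if (cpu == -1)
			*q->hint = 0;
		else
			*q->hint |= UINT64_C(1) << cpu; /* OR with earlier requests */
		return;
	}
}
```

For two virtqueues sharing one vector, requests for CPU 1 and CPU 3 leave a hint containing both CPUs; for a per-vq vector, the last request wins.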
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_pci.c | 46 +++++++++++++++++++++++++++++++++++++++++
include/linux/virtio_config.h | 21 ++++++++++++++++++
2 files changed, 67 insertions(+), 0 deletions(-)
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index adb24f2..2ff0451 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -48,6 +48,7 @@ struct virtio_pci_device
int msix_enabled;
int intx_enabled;
struct msix_entry *msix_entries;
+ cpumask_var_t *msix_affinity_masks;
/* Name strings for interrupts. This size should be enough,
* and I'm too lazy to allocate each name separately. */
char (*msix_names)[256];
@@ -276,6 +277,10 @@ static void vp_free_vectors(struct virtio_device *vdev)
for (i = 0; i < vp_dev->msix_used_vectors; ++i)
free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+ for (i = 0; i < vp_dev->msix_vectors; i++)
+ if (vp_dev->msix_affinity_masks[i])
+ free_cpumask_var(vp_dev->msix_affinity_masks[i]);
+
if (vp_dev->msix_enabled) {
/* Disable the vector used for configuration */
iowrite16(VIRTIO_MSI_NO_VECTOR,
@@ -293,6 +298,8 @@ static void vp_free_vectors(struct virtio_device *vdev)
vp_dev->msix_names = NULL;
kfree(vp_dev->msix_entries);
vp_dev->msix_entries = NULL;
+ kfree(vp_dev->msix_affinity_masks);
+ vp_dev->msix_affinity_masks = NULL;
}
static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
@@ -311,6 +318,15 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
GFP_KERNEL);
if (!vp_dev->msix_names)
goto error;
+ vp_dev->msix_affinity_masks
+ = kzalloc(nvectors * sizeof *vp_dev->msix_affinity_masks,
+ GFP_KERNEL);
+ if (!vp_dev->msix_affinity_masks)
+ goto error;
+ for (i = 0; i < nvectors; ++i)
+ if (!alloc_cpumask_var(&vp_dev->msix_affinity_masks[i],
+ GFP_KERNEL))
+ goto error;
for (i = 0; i < nvectors; ++i)
vp_dev->msix_entries[i].entry = i;
@@ -607,6 +623,35 @@ static const char *vp_bus_name(struct virtio_device *vdev)
return pci_name(vp_dev->pci_dev);
}
+/* Setup the affinity for a virtqueue:
+ * - force the affinity for per vq vector
+ * - OR over all affinities for shared MSI
+ * - ignore the affinity request if we're using INTX
+ */
+static int vp_set_vq_affinity(struct virtqueue *vq, int cpu)
+{
+ struct virtio_device *vdev = vq->vdev;
+ struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+ struct virtio_pci_vq_info *info = vq->priv;
+ struct cpumask *mask;
+ unsigned int irq;
+
+ if (!vq->callback)
+ return -EINVAL;
+
+ if (vp_dev->msix_enabled) {
+ mask = vp_dev->msix_affinity_masks[info->msix_vector];
+ irq = vp_dev->msix_entries[info->msix_vector].vector;
+ if (cpu == -1)
+ irq_set_affinity_hint(irq, NULL);
+ else {
+ cpumask_set_cpu(cpu, mask);
+ irq_set_affinity_hint(irq, mask);
+ }
+ }
+ return 0;
+}
+
static struct virtio_config_ops virtio_pci_config_ops = {
.get = vp_get,
.set = vp_set,
@@ -618,6 +663,7 @@ static struct virtio_config_ops virtio_pci_config_ops = {
.get_features = vp_get_features,
.finalize_features = vp_finalize_features,
.bus_name = vp_bus_name,
+ .set_vq_affinity = vp_set_vq_affinity,
};
static void virtio_pci_release_dev(struct device *_d)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index fc457f4..2c4a989 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -98,6 +98,7 @@
* vdev: the virtio_device
* This returns a pointer to the bus name a la pci_name from which
* the caller can then copy.
+ * @set_vq_affinity: set the affinity for a virtqueue.
*/
typedef void vq_callback_t(struct virtqueue *);
struct virtio_config_ops {
@@ -116,6 +117,7 @@ struct virtio_config_ops {
u32 (*get_features)(struct virtio_device *vdev);
void (*finalize_features)(struct virtio_device *vdev);
const char *(*bus_name)(struct virtio_device *vdev);
+ int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
};
/* If driver didn't advertise the feature, it will never appear. */
@@ -190,5 +192,24 @@ const char *virtio_bus_name(struct virtio_device *vdev)
return vdev->config->bus_name(vdev);
}
+/**
+ * virtqueue_set_affinity - setting affinity for a virtqueue
+ * @vq: the virtqueue
+ * @cpu: the cpu no.
+ *
+ * Note that this function is best-effort: the affinity hint may not be set
+ * due to config support, irq type and sharing.
+ *
+ */
+static inline
+int virtqueue_set_affinity(struct virtqueue *vq, int cpu)
+{
+ struct virtio_device *vdev = vq->vdev;
+ if (vdev->config->set_vq_affinity)
+ return vdev->config->set_vq_affinity(vq, cpu);
+ return 0;
+}
+
+
#endif /* __KERNEL__ */
#endif /* _LINUX_VIRTIO_CONFIG_H */
--
1.7.1
^ permalink raw reply related
* [net-next RFC V5 4/5] virtio_net: multiqueue support
From: Jason Wang @ 2012-07-05 10:29 UTC (permalink / raw)
To: mst, mashirle, krkumar2, habanero, rusty, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem
Cc: kvm, sri
In-Reply-To: <1341484194-8108-1-git-send-email-jasowang@redhat.com>
This patch converts virtio_net to a multi queue device. After negotiated
VIRTIO_NET_F_MULTIQUEUE feature, the virtio device has many tx/rx queue pairs,
and driver could read the number from config space.
The driver expects the number of rx/tx queue pairs to equal the number of
vcpus. To maximize performance with these per-cpu rx/tx queue pairs, some
optimizations were introduced:
- Txq selection is based on the processor id, to avoid contending a lock
whose owner may exit to the host.
- Since the rxq/txq are per-cpu, the affinity hint is set to the cpu that owns
each queue pair.
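The selection policy in virtnet_select_queue() can be sketched in isolation: prefer the recorded rx queue, otherwise use the current processor id, then fold the result into the range of real tx queues. In this sketch, select_txq is a hypothetical name, and the rx_recorded, rx_queue and cpu arguments stand in for skb_rx_queue_recorded(), skb_get_rx_queue() and smp_processor_id(); the modulo gives the same result as the patch's repeated-subtraction loop for non-negative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the txq selection policy. */
static uint16_t select_txq(bool rx_recorded, int rx_queue, int cpu,
			   unsigned int num_txq)
{
	int txq = rx_recorded ? rx_queue : cpu;

	assert(num_txq > 0 && txq >= 0);
	/* Fold into [0, num_txq), like the while (txq >= n) txq -= n loop. */
	return (uint16_t)((unsigned int)txq % num_txq);
}
```

With one pair per vcpu the fold is a no-op and each vcpu transmits on its own queue; it only matters when there are fewer tx queues than cpus.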
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 645 ++++++++++++++++++++++++++++++-------------
include/linux/virtio_net.h | 2 +
2 files changed, 452 insertions(+), 195 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1db445b..7410187 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,7 @@
#include <linux/scatterlist.h>
#include <linux/if_vlan.h>
#include <linux/slab.h>
+#include <linux/interrupt.h>
static int napi_weight = 128;
module_param(napi_weight, int, 0444);
@@ -41,6 +42,8 @@ module_param(gso, bool, 0444);
#define VIRTNET_SEND_COMMAND_SG_MAX 2
#define VIRTNET_DRIVER_VERSION "1.0.0"
+#define MAX_QUEUES 256
+
struct virtnet_stats {
struct u64_stats_sync tx_syncp;
struct u64_stats_sync rx_syncp;
@@ -51,43 +54,69 @@ struct virtnet_stats {
u64 rx_packets;
};
-struct virtnet_info {
- struct virtio_device *vdev;
- struct virtqueue *rvq, *svq, *cvq;
- struct net_device *dev;
+/* Internal representation of a send virtqueue */
+struct send_queue {
+ /* Virtqueue associated with this send_queue */
+ struct virtqueue *vq;
+
+ /* TX: fragments + linear part + virtio header */
+ struct scatterlist sg[MAX_SKB_FRAGS + 2];
+};
+
+/* Internal representation of a receive virtqueue */
+struct receive_queue {
+ /* Virtqueue associated with this receive_queue */
+ struct virtqueue *vq;
+
+ /* Back pointer to the virtnet_info */
+ struct virtnet_info *vi;
+
struct napi_struct napi;
- unsigned int status;
/* Number of input buffers, and max we've ever had. */
unsigned int num, max;
+ /* Work struct for refilling if we run low on memory. */
+ struct delayed_work refill;
+
+ /* Chain pages by the private ptr. */
+ struct page *pages;
+
+ /* RX: fragments + linear part + virtio header */
+ struct scatterlist sg[MAX_SKB_FRAGS + 2];
+};
+
+struct virtnet_info {
+ u16 num_queue_pairs; /* # of RX/TX vq pairs */
+
+ struct send_queue *sq[MAX_QUEUES] ____cacheline_aligned_in_smp;
+ struct receive_queue *rq[MAX_QUEUES] ____cacheline_aligned_in_smp;
+ struct virtqueue *cvq;
+
+ struct virtio_device *vdev;
+ struct net_device *dev;
+ unsigned int status;
+
/* I like... big packets and I cannot lie! */
bool big_packets;
/* Host will merge rx buffers for big packets (shake it! shake it!) */
bool mergeable_rx_bufs;
+ /* Has control virtqueue */
+ bool has_cvq;
+
/* enable config space updates */
bool config_enable;
/* Active statistics */
struct virtnet_stats __percpu *stats;
- /* Work struct for refilling if we run low on memory. */
- struct delayed_work refill;
-
/* Work struct for config space updates */
struct work_struct config_work;
/* Lock for config space updates */
struct mutex config_lock;
-
- /* Chain pages by the private ptr. */
- struct page *pages;
-
- /* fragments + linear part + virtio header */
- struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
- struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
};
struct skb_vnet_hdr {
@@ -108,6 +137,22 @@ struct padded_vnet_hdr {
char padding[6];
};
+static inline int txq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq)
+{
+ int ret = virtqueue_get_queue_index(vq);
+
+ /* skip ctrl vq */
+ if (vi->has_cvq)
+ return (ret - 1) / 2;
+ else
+ return ret / 2;
+}
+
+static inline int rxq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq)
+{
+ return virtqueue_get_queue_index(vq) / 2;
+}
+
static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
{
return (struct skb_vnet_hdr *)skb->cb;
@@ -117,22 +162,22 @@ static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
* private is used to chain pages for big packets, put the whole
* most recent used list in the beginning for reuse
*/
-static void give_pages(struct virtnet_info *vi, struct page *page)
+static void give_pages(struct receive_queue *rq, struct page *page)
{
struct page *end;
/* Find end of list, sew whole thing into vi->pages. */
for (end = page; end->private; end = (struct page *)end->private);
- end->private = (unsigned long)vi->pages;
- vi->pages = page;
+ end->private = (unsigned long)rq->pages;
+ rq->pages = page;
}
-static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
+static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
{
- struct page *p = vi->pages;
+ struct page *p = rq->pages;
if (p) {
- vi->pages = (struct page *)p->private;
+ rq->pages = (struct page *)p->private;
/* clear private here, it is used to chain pages */
p->private = 0;
} else
@@ -140,15 +185,15 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
return p;
}
-static void skb_xmit_done(struct virtqueue *svq)
+static void skb_xmit_done(struct virtqueue *vq)
{
- struct virtnet_info *vi = svq->vdev->priv;
+ struct virtnet_info *vi = vq->vdev->priv;
/* Suppress further interrupts. */
- virtqueue_disable_cb(svq);
+ virtqueue_disable_cb(vq);
/* We were probably waiting for more output buffers. */
- netif_wake_queue(vi->dev);
+ netif_wake_subqueue(vi->dev, txq_get_qnum(vi, vq));
}
static void set_skb_frag(struct sk_buff *skb, struct page *page,
@@ -167,9 +212,10 @@ static void set_skb_frag(struct sk_buff *skb, struct page *page,
}
/* Called from bottom half context */
-static struct sk_buff *page_to_skb(struct virtnet_info *vi,
+static struct sk_buff *page_to_skb(struct receive_queue *rq,
struct page *page, unsigned int len)
{
+ struct virtnet_info *vi = rq->vi;
struct sk_buff *skb;
struct skb_vnet_hdr *hdr;
unsigned int copy, hdr_len, offset;
@@ -225,12 +271,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
}
if (page)
- give_pages(vi, page);
+ give_pages(rq, page);
return skb;
}
-static int receive_mergeable(struct virtnet_info *vi, struct sk_buff *skb)
+static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb)
{
struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
struct page *page;
@@ -244,7 +290,7 @@ static int receive_mergeable(struct virtnet_info *vi, struct sk_buff *skb)
skb->dev->stats.rx_length_errors++;
return -EINVAL;
}
- page = virtqueue_get_buf(vi->rvq, &len);
+ page = virtqueue_get_buf(rq->vq, &len);
if (!page) {
pr_debug("%s: rx error: %d buffers missing\n",
skb->dev->name, hdr->mhdr.num_buffers);
@@ -257,13 +303,14 @@ static int receive_mergeable(struct virtnet_info *vi, struct sk_buff *skb)
set_skb_frag(skb, page, 0, &len);
- --vi->num;
+ --rq->num;
}
return 0;
}
-static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
+static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
{
+ struct net_device *dev = rq->vi->dev;
struct virtnet_info *vi = netdev_priv(dev);
struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
struct sk_buff *skb;
@@ -274,7 +321,7 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
pr_debug("%s: short packet %i\n", dev->name, len);
dev->stats.rx_length_errors++;
if (vi->mergeable_rx_bufs || vi->big_packets)
- give_pages(vi, buf);
+ give_pages(rq, buf);
else
dev_kfree_skb(buf);
return;
@@ -286,14 +333,14 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
skb_trim(skb, len);
} else {
page = buf;
- skb = page_to_skb(vi, page, len);
+ skb = page_to_skb(rq, page, len);
if (unlikely(!skb)) {
dev->stats.rx_dropped++;
- give_pages(vi, page);
+ give_pages(rq, page);
return;
}
if (vi->mergeable_rx_bufs)
- if (receive_mergeable(vi, skb)) {
+ if (receive_mergeable(rq, skb)) {
dev_kfree_skb(skb);
return;
}
@@ -363,90 +410,91 @@ frame_err:
dev_kfree_skb(skb);
}
-static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
+static int add_recvbuf_small(struct receive_queue *rq, gfp_t gfp)
{
struct sk_buff *skb;
struct skb_vnet_hdr *hdr;
int err;
- skb = __netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN, gfp);
+ skb = __netdev_alloc_skb_ip_align(rq->vi->dev, MAX_PACKET_LEN, gfp);
if (unlikely(!skb))
return -ENOMEM;
skb_put(skb, MAX_PACKET_LEN);
hdr = skb_vnet_hdr(skb);
- sg_set_buf(vi->rx_sg, &hdr->hdr, sizeof hdr->hdr);
+ sg_set_buf(rq->sg, &hdr->hdr, sizeof hdr->hdr);
+
+ skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
- skb_to_sgvec(skb, vi->rx_sg + 1, 0, skb->len);
+ err = virtqueue_add_buf(rq->vq, rq->sg, 0, 2, skb, gfp);
- err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 2, skb, gfp);
if (err < 0)
dev_kfree_skb(skb);
return err;
}
-static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
+static int add_recvbuf_big(struct receive_queue *rq, gfp_t gfp)
{
struct page *first, *list = NULL;
char *p;
int i, err, offset;
- /* page in vi->rx_sg[MAX_SKB_FRAGS + 1] is list tail */
+ /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
- first = get_a_page(vi, gfp);
+ first = get_a_page(rq, gfp);
if (!first) {
if (list)
- give_pages(vi, list);
+ give_pages(rq, list);
return -ENOMEM;
}
- sg_set_buf(&vi->rx_sg[i], page_address(first), PAGE_SIZE);
+ sg_set_buf(&rq->sg[i], page_address(first), PAGE_SIZE);
/* chain new page in list head to match sg */
first->private = (unsigned long)list;
list = first;
}
- first = get_a_page(vi, gfp);
+ first = get_a_page(rq, gfp);
if (!first) {
- give_pages(vi, list);
+ give_pages(rq, list);
return -ENOMEM;
}
p = page_address(first);
- /* vi->rx_sg[0], vi->rx_sg[1] share the same page */
- /* a separated vi->rx_sg[0] for virtio_net_hdr only due to QEMU bug */
- sg_set_buf(&vi->rx_sg[0], p, sizeof(struct virtio_net_hdr));
+ /* rq->sg[0], rq->sg[1] share the same page */
+ /* a separated rq->sg[0] for virtio_net_hdr only due to QEMU bug */
+ sg_set_buf(&rq->sg[0], p, sizeof(struct virtio_net_hdr));
- /* vi->rx_sg[1] for data packet, from offset */
+ /* rq->sg[1] for data packet, from offset */
offset = sizeof(struct padded_vnet_hdr);
- sg_set_buf(&vi->rx_sg[1], p + offset, PAGE_SIZE - offset);
+ sg_set_buf(&rq->sg[1], p + offset, PAGE_SIZE - offset);
/* chain first in list head */
first->private = (unsigned long)list;
- err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
+ err = virtqueue_add_buf(rq->vq, rq->sg, 0, MAX_SKB_FRAGS + 2,
first, gfp);
if (err < 0)
- give_pages(vi, first);
+ give_pages(rq, first);
return err;
}
-static int add_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
+static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
{
struct page *page;
int err;
- page = get_a_page(vi, gfp);
+ page = get_a_page(rq, gfp);
if (!page)
return -ENOMEM;
- sg_init_one(vi->rx_sg, page_address(page), PAGE_SIZE);
+ sg_init_one(rq->sg, page_address(page), PAGE_SIZE);
- err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 1, page, gfp);
+ err = virtqueue_add_buf(rq->vq, rq->sg, 0, 1, page, gfp);
if (err < 0)
- give_pages(vi, page);
+ give_pages(rq, page);
return err;
}
@@ -458,97 +506,104 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
* before we're receiving packets, or from refill_work which is
* careful to disable receiving (using napi_disable).
*/
-static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
+static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp)
{
+ struct virtnet_info *vi = rq->vi;
int err;
bool oom;
do {
if (vi->mergeable_rx_bufs)
- err = add_recvbuf_mergeable(vi, gfp);
+ err = add_recvbuf_mergeable(rq, gfp);
else if (vi->big_packets)
- err = add_recvbuf_big(vi, gfp);
+ err = add_recvbuf_big(rq, gfp);
else
- err = add_recvbuf_small(vi, gfp);
+ err = add_recvbuf_small(rq, gfp);
oom = err == -ENOMEM;
if (err < 0)
break;
- ++vi->num;
+ ++rq->num;
} while (err > 0);
- if (unlikely(vi->num > vi->max))
- vi->max = vi->num;
- virtqueue_kick(vi->rvq);
+ if (unlikely(rq->num > rq->max))
+ rq->max = rq->num;
+ virtqueue_kick(rq->vq);
return !oom;
}
-static void skb_recv_done(struct virtqueue *rvq)
+static void skb_recv_done(struct virtqueue *vq)
{
- struct virtnet_info *vi = rvq->vdev->priv;
+ struct virtnet_info *vi = vq->vdev->priv;
+ struct napi_struct *napi = &vi->rq[rxq_get_qnum(vi, vq)]->napi;
+
/* Schedule NAPI, Suppress further interrupts if successful. */
- if (napi_schedule_prep(&vi->napi)) {
- virtqueue_disable_cb(rvq);
- __napi_schedule(&vi->napi);
+ if (napi_schedule_prep(napi)) {
+ virtqueue_disable_cb(vq);
+ __napi_schedule(napi);
}
}
-static void virtnet_napi_enable(struct virtnet_info *vi)
+static void virtnet_napi_enable(struct receive_queue *rq)
{
- napi_enable(&vi->napi);
+ napi_enable(&rq->napi);
/* If all buffers were filled by other side before we napi_enabled, we
* won't get another interrupt, so process any outstanding packets
* now. virtnet_poll wants re-enable the queue, so we disable here.
* We synchronize against interrupts via NAPI_STATE_SCHED */
- if (napi_schedule_prep(&vi->napi)) {
- virtqueue_disable_cb(vi->rvq);
+ if (napi_schedule_prep(&rq->napi)) {
+ virtqueue_disable_cb(rq->vq);
local_bh_disable();
- __napi_schedule(&vi->napi);
+ __napi_schedule(&rq->napi);
local_bh_enable();
}
}
static void refill_work(struct work_struct *work)
{
- struct virtnet_info *vi;
+ struct napi_struct *napi;
+ struct receive_queue *rq;
bool still_empty;
- vi = container_of(work, struct virtnet_info, refill.work);
- napi_disable(&vi->napi);
- still_empty = !try_fill_recv(vi, GFP_KERNEL);
- virtnet_napi_enable(vi);
+ rq = container_of(work, struct receive_queue, refill.work);
+ napi = &rq->napi;
+
+ napi_disable(napi);
+ still_empty = !try_fill_recv(rq, GFP_KERNEL);
+ virtnet_napi_enable(rq);
/* In theory, this can happen: if we don't get any buffers in
* we will *never* try to fill again. */
if (still_empty)
- queue_delayed_work(system_nrt_wq, &vi->refill, HZ/2);
+ queue_delayed_work(system_nrt_wq, &rq->refill, HZ/2);
}
static int virtnet_poll(struct napi_struct *napi, int budget)
{
- struct virtnet_info *vi = container_of(napi, struct virtnet_info, napi);
+ struct receive_queue *rq = container_of(napi, struct receive_queue,
+ napi);
void *buf;
unsigned int len, received = 0;
again:
while (received < budget &&
- (buf = virtqueue_get_buf(vi->rvq, &len)) != NULL) {
- receive_buf(vi->dev, buf, len);
- --vi->num;
+ (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
+ receive_buf(rq, buf, len);
+ --rq->num;
received++;
}
- if (vi->num < vi->max / 2) {
- if (!try_fill_recv(vi, GFP_ATOMIC))
- queue_delayed_work(system_nrt_wq, &vi->refill, 0);
+ if (rq->num < rq->max / 2) {
+ if (!try_fill_recv(rq, GFP_ATOMIC))
+ queue_delayed_work(system_nrt_wq, &rq->refill, 0);
}
/* Out of packets? */
if (received < budget) {
napi_complete(napi);
- if (unlikely(!virtqueue_enable_cb(vi->rvq)) &&
+ if (unlikely(!virtqueue_enable_cb(rq->vq)) &&
napi_schedule_prep(napi)) {
- virtqueue_disable_cb(vi->rvq);
+ virtqueue_disable_cb(rq->vq);
__napi_schedule(napi);
goto again;
}
@@ -557,13 +612,14 @@ again:
return received;
}
-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static unsigned int free_old_xmit_skbs(struct virtnet_info *vi,
+ struct virtqueue *vq)
{
struct sk_buff *skb;
unsigned int len, tot_sgs = 0;
struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
- while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+ while ((skb = virtqueue_get_buf(vq, &len)) != NULL) {
pr_debug("Sent skb %p\n", skb);
u64_stats_update_begin(&stats->tx_syncp);
@@ -577,7 +633,8 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
return tot_sgs;
}
-static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
+static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb,
+ struct virtqueue *vq, struct scatterlist *sg)
{
struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
@@ -615,44 +672,47 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
/* Encode metadata header at front. */
if (vi->mergeable_rx_bufs)
- sg_set_buf(vi->tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
+ sg_set_buf(sg, &hdr->mhdr, sizeof hdr->mhdr);
else
- sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
+ sg_set_buf(sg, &hdr->hdr, sizeof hdr->hdr);
- hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
- return virtqueue_add_buf(vi->svq, vi->tx_sg, hdr->num_sg,
+ hdr->num_sg = skb_to_sgvec(skb, sg + 1, 0, skb->len) + 1;
+ return virtqueue_add_buf(vq, sg, hdr->num_sg,
0, skb, GFP_ATOMIC);
}
static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
+ int qnum = skb_get_queue_mapping(skb);
+ struct virtqueue *vq = vi->sq[qnum]->vq;
int capacity;
/* Free up any pending old buffers before queueing new ones. */
- free_old_xmit_skbs(vi);
+ free_old_xmit_skbs(vi, vq);
/* Try to transmit */
- capacity = xmit_skb(vi, skb);
+ capacity = xmit_skb(vi, skb, vq, vi->sq[qnum]->sg);
/* This can happen with OOM and indirect buffers. */
if (unlikely(capacity < 0)) {
if (likely(capacity == -ENOMEM)) {
if (net_ratelimit())
dev_warn(&dev->dev,
- "TX queue failure: out of memory\n");
+ "TXQ (%d) failure: out of memory\n",
+ qnum);
} else {
dev->stats.tx_fifo_errors++;
if (net_ratelimit())
dev_warn(&dev->dev,
- "Unexpected TX queue failure: %d\n",
- capacity);
+ "Unexpected TXQ (%d) failure: %d\n",
+ qnum, capacity);
}
dev->stats.tx_dropped++;
kfree_skb(skb);
return NETDEV_TX_OK;
}
- virtqueue_kick(vi->svq);
+ virtqueue_kick(vq);
/* Don't wait up for transmitted skbs to be freed. */
skb_orphan(skb);
@@ -661,13 +721,13 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
/* Apparently nice girls don't return TX_BUSY; stop the queue
* before it gets out of hand. Naturally, this wastes entries. */
if (capacity < 2+MAX_SKB_FRAGS) {
- netif_stop_queue(dev);
- if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
+ netif_stop_subqueue(dev, qnum);
+ if (unlikely(!virtqueue_enable_cb_delayed(vq))) {
/* More just got used, free them then recheck. */
- capacity += free_old_xmit_skbs(vi);
+ capacity += free_old_xmit_skbs(vi, vq);
if (capacity >= 2+MAX_SKB_FRAGS) {
- netif_start_queue(dev);
- virtqueue_disable_cb(vi->svq);
+ netif_start_subqueue(dev, qnum);
+ virtqueue_disable_cb(vq);
}
}
}
@@ -700,7 +760,8 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
unsigned int start;
for_each_possible_cpu(cpu) {
- struct virtnet_stats *stats = per_cpu_ptr(vi->stats, cpu);
+ struct virtnet_stats __percpu *stats
+ = per_cpu_ptr(vi->stats, cpu);
u64 tpackets, tbytes, rpackets, rbytes;
do {
@@ -734,20 +795,26 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
static void virtnet_netpoll(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
+ int i;
- napi_schedule(&vi->napi);
+ for (i = 0; i < vi->num_queue_pairs; i++)
+ napi_schedule(&vi->rq[i]->napi);
}
#endif
static int virtnet_open(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
+ int i;
- /* Make sure we have some buffers: if oom use wq. */
- if (!try_fill_recv(vi, GFP_KERNEL))
- queue_delayed_work(system_nrt_wq, &vi->refill, 0);
+ for (i = 0; i < vi->num_queue_pairs; i++) {
+ /* Make sure we have some buffers: if oom use wq. */
+ if (!try_fill_recv(vi->rq[i], GFP_KERNEL))
+ queue_delayed_work(system_nrt_wq,
+ &vi->rq[i]->refill, 0);
+ virtnet_napi_enable(vi->rq[i]);
+ }
- virtnet_napi_enable(vi);
return 0;
}
@@ -809,10 +876,13 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
static int virtnet_close(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
+ int i;
/* Make sure refill_work doesn't re-enable napi! */
- cancel_delayed_work_sync(&vi->refill);
- napi_disable(&vi->napi);
+ for (i = 0; i < vi->num_queue_pairs; i++) {
+ cancel_delayed_work_sync(&vi->rq[i]->refill);
+ napi_disable(&vi->rq[i]->napi);
+ }
return 0;
}
@@ -924,11 +994,10 @@ static void virtnet_get_ringparam(struct net_device *dev,
{
struct virtnet_info *vi = netdev_priv(dev);
- ring->rx_max_pending = virtqueue_get_vring_size(vi->rvq);
- ring->tx_max_pending = virtqueue_get_vring_size(vi->svq);
+ ring->rx_max_pending = virtqueue_get_vring_size(vi->rq[0]->vq);
+ ring->tx_max_pending = virtqueue_get_vring_size(vi->sq[0]->vq);
ring->rx_pending = ring->rx_max_pending;
ring->tx_pending = ring->tx_max_pending;
-
}
@@ -961,6 +1030,19 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
return 0;
}
+/* To avoid contending a lock held by a vcpu that may exit to the host, select
+ * the txq based on the processor id.
+ */
+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+ int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+ smp_processor_id();
+
+ while (unlikely(txq >= dev->real_num_tx_queues))
+ txq -= dev->real_num_tx_queues;
+ return txq;
+}
+
static const struct net_device_ops virtnet_netdev = {
.ndo_open = virtnet_open,
.ndo_stop = virtnet_close,
@@ -972,6 +1054,7 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_get_stats64 = virtnet_stats,
.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+ .ndo_select_queue = virtnet_select_queue,
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = virtnet_netpoll,
#endif
@@ -1007,10 +1090,10 @@ static void virtnet_config_changed_work(struct work_struct *work)
if (vi->status & VIRTIO_NET_S_LINK_UP) {
netif_carrier_on(vi->dev);
- netif_wake_queue(vi->dev);
+ netif_tx_wake_all_queues(vi->dev);
} else {
netif_carrier_off(vi->dev);
- netif_stop_queue(vi->dev);
+ netif_tx_stop_all_queues(vi->dev);
}
done:
mutex_unlock(&vi->config_lock);
@@ -1023,41 +1106,217 @@ static void virtnet_config_changed(struct virtio_device *vdev)
queue_work(system_nrt_wq, &vi->config_work);
}
-static int init_vqs(struct virtnet_info *vi)
+static void free_receive_bufs(struct virtnet_info *vi)
+{
+ int i;
+
+ for (i = 0; i < vi->num_queue_pairs; i++) {
+ while (vi->rq[i]->pages)
+ __free_pages(get_a_page(vi->rq[i], GFP_KERNEL), 0);
+ }
+}
+
+/* Free memory allocated for send and receive queues */
+static void virtnet_free_queues(struct virtnet_info *vi)
{
- struct virtqueue *vqs[3];
- vq_callback_t *callbacks[] = { skb_recv_done, skb_xmit_done, NULL};
- const char *names[] = { "input", "output", "control" };
- int nvqs, err;
+ int i;
- /* We expect two virtqueues, receive then send,
- * and optionally control. */
- nvqs = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ? 3 : 2;
+ for (i = 0; i < vi->num_queue_pairs; i++) {
+ kfree(vi->rq[i]);
+ vi->rq[i] = NULL;
+ kfree(vi->sq[i]);
+ vi->sq[i] = NULL;
+ }
+}
- err = vi->vdev->config->find_vqs(vi->vdev, nvqs, vqs, callbacks, names);
- if (err)
- return err;
+static void free_unused_bufs(struct virtnet_info *vi)
+{
+ void *buf;
+ int i;
+
+ for (i = 0; i < vi->num_queue_pairs; i++) {
+ struct virtqueue *vq = vi->sq[i]->vq;
+
+ while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
+ dev_kfree_skb(buf);
+ }
+
+ for (i = 0; i < vi->num_queue_pairs; i++) {
+ struct virtqueue *vq = vi->rq[i]->vq;
+
+ while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
+ if (vi->mergeable_rx_bufs || vi->big_packets)
+ give_pages(vi->rq[i], buf);
+ else
+ dev_kfree_skb(buf);
+ --vi->rq[i]->num;
+ }
+ BUG_ON(vi->rq[i]->num != 0);
+ }
+}
+
+static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
+{
+ int i;
+
+ if (vi->num_queue_pairs == 1)
+ return;
+
+ for (i = 0; i < vi->num_queue_pairs; i++) {
+ int cpu = set ? i : -1;
+ virtqueue_set_affinity(vi->rq[i]->vq, cpu);
+ virtqueue_set_affinity(vi->sq[i]->vq, cpu);
+ }
+ return;
+}
+
+static void virtnet_del_vqs(struct virtnet_info *vi)
+{
+ struct virtio_device *vdev = vi->vdev;
+
+ virtnet_set_affinity(vi, false);
+
+ vdev->config->del_vqs(vdev);
+
+ virtnet_free_queues(vi);
+}
+
+static int virtnet_find_vqs(struct virtnet_info *vi)
+{
+ vq_callback_t **callbacks;
+ struct virtqueue **vqs;
+ int ret = -ENOMEM;
+ int i, total_vqs;
+ char **names;
- vi->rvq = vqs[0];
- vi->svq = vqs[1];
+ /*
+ * We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by
+ * a possible control virtqueue, and then the same RX/TX pair
+ * 'vi->num_queue_pairs-1' more times
+ */
+ total_vqs = vi->num_queue_pairs * 2 +
+ virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ);
+
+ /* Allocate space for find_vqs parameters */
+ vqs = kmalloc(total_vqs * sizeof(*vqs), GFP_KERNEL);
+ callbacks = kmalloc(total_vqs * sizeof(*callbacks), GFP_KERNEL);
+ names = kmalloc(total_vqs * sizeof(*names), GFP_KERNEL);
+ if (!vqs || !callbacks || !names)
+ goto err;
+
+ /* Parameters for control virtqueue, if any */
+ if (vi->has_cvq) {
+ callbacks[2] = NULL;
+ names[2] = "control";
+ }
+
+ /* Allocate/initialize parameters for send/receive virtqueues */
+ for (i = 0; i < vi->num_queue_pairs * 2; i += 2) {
+ int j = (i == 0 ? i : i + vi->has_cvq);
+ callbacks[j] = skb_recv_done;
+ callbacks[j + 1] = skb_xmit_done;
+ names[j] = kasprintf(GFP_KERNEL, "input.%d", i / 2);
+ names[j + 1] = kasprintf(GFP_KERNEL, "output.%d", i / 2);
+ }
- if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
+ ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks,
+ (const char **)names);
+ if (ret)
+ goto err;
+
+ if (vi->has_cvq)
vi->cvq = vqs[2];
- if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
- vi->dev->features |= NETIF_F_HW_VLAN_FILTER;
+ for (i = 0; i < vi->num_queue_pairs * 2; i += 2) {
+ int j = i == 0 ? i : i + vi->has_cvq;
+ vi->rq[i / 2]->vq = vqs[j];
+ vi->sq[i / 2]->vq = vqs[j + 1];
}
- return 0;
+
+err:
+ if (ret && names)
+ for (i = 0; i < vi->num_queue_pairs * 2; i++)
+ kfree(names[i]);
+
+ kfree(names);
+ kfree(callbacks);
+ kfree(vqs);
+
+ return ret;
+}
+
+static int virtnet_alloc_queues(struct virtnet_info *vi)
+{
+ int ret = -ENOMEM;
+ int i;
+
+ for (i = 0; i < vi->num_queue_pairs; i++) {
+ vi->rq[i] = kzalloc(sizeof(*vi->rq[i]), GFP_KERNEL);
+ vi->sq[i] = kzalloc(sizeof(*vi->sq[i]), GFP_KERNEL);
+ if (!vi->rq[i] || !vi->sq[i])
+ goto err;
+ }
+
+ ret = 0;
+
+ /* setup initial receive and send queue parameters */
+ for (i = 0; i < vi->num_queue_pairs; i++) {
+ vi->rq[i]->vi = vi;
+ vi->rq[i]->pages = NULL;
+ INIT_DELAYED_WORK(&vi->rq[i]->refill, refill_work);
+ netif_napi_add(vi->dev, &vi->rq[i]->napi, virtnet_poll,
+ napi_weight);
+
+ sg_init_table(vi->rq[i]->sg, ARRAY_SIZE(vi->rq[i]->sg));
+ sg_init_table(vi->sq[i]->sg, ARRAY_SIZE(vi->sq[i]->sg));
+ }
+
+err:
+ if (ret)
+ virtnet_free_queues(vi);
+
+ return ret;
+}
+
+static int virtnet_setup_vqs(struct virtnet_info *vi)
+{
+ int ret;
+
+ /* Allocate send & receive queues */
+ ret = virtnet_alloc_queues(vi);
+ if (!ret) {
+ ret = virtnet_find_vqs(vi);
+ if (ret)
+ virtnet_free_queues(vi);
+ else
+ virtnet_set_affinity(vi, true);
+ }
+
+ return ret;
}
static int virtnet_probe(struct virtio_device *vdev)
{
- int err;
+ int i, err;
struct net_device *dev;
struct virtnet_info *vi;
+ u16 num_queues, num_queue_pairs;
+
+ /* Find if host supports multiqueue virtio_net device */
+ err = virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE,
+ offsetof(struct virtio_net_config,
+ num_queues), &num_queues);
+
+ /* We need at least 2 queues */
+ if (err || num_queues < 2)
+ num_queues = 2;
+ if (num_queues > MAX_QUEUES * 2)
+ num_queues = MAX_QUEUES * 2;
+
+ num_queue_pairs = num_queues / 2;
/* Allocate ourselves a network device with room for our info */
- dev = alloc_etherdev(sizeof(struct virtnet_info));
+ dev = alloc_etherdev_mq(sizeof(struct virtnet_info), num_queue_pairs);
if (!dev)
return -ENOMEM;
@@ -1103,22 +1362,18 @@ static int virtnet_probe(struct virtio_device *vdev)
/* Set up our device-specific information */
vi = netdev_priv(dev);
- netif_napi_add(dev, &vi->napi, virtnet_poll, napi_weight);
vi->dev = dev;
vi->vdev = vdev;
vdev->priv = vi;
- vi->pages = NULL;
vi->stats = alloc_percpu(struct virtnet_stats);
err = -ENOMEM;
if (vi->stats == NULL)
- goto free;
+ goto free_netdev;
- INIT_DELAYED_WORK(&vi->refill, refill_work);
mutex_init(&vi->config_lock);
vi->config_enable = true;
INIT_WORK(&vi->config_work, virtnet_config_changed_work);
- sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
- sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
+ vi->num_queue_pairs = num_queue_pairs;
/* If we can receive ANY GSO packets, we must allocate large ones. */
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
@@ -1129,9 +1384,17 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
vi->mergeable_rx_bufs = true;
- err = init_vqs(vi);
+ if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
+ vi->has_cvq = true;
+
+ /* Allocate/initialize the rx/tx queues, and invoke find_vqs */
+ err = virtnet_setup_vqs(vi);
if (err)
- goto free_stats;
+ goto free_netdev;
+
+ if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) &&
+ virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
+ dev->features |= NETIF_F_HW_VLAN_FILTER;
err = register_netdev(dev);
if (err) {
@@ -1140,12 +1403,15 @@ static int virtnet_probe(struct virtio_device *vdev)
}
/* Last of all, set up some receive buffers. */
- try_fill_recv(vi, GFP_KERNEL);
-
- /* If we didn't even get one input buffer, we're useless. */
- if (vi->num == 0) {
- err = -ENOMEM;
- goto unregister;
+ for (i = 0; i < num_queue_pairs; i++) {
+ try_fill_recv(vi->rq[i], GFP_KERNEL);
+
+ /* If we didn't even get one input buffer, we're useless. */
+ if (vi->rq[i]->num == 0) {
+ free_unused_bufs(vi);
+ err = -ENOMEM;
+ goto free_recv_bufs;
+ }
}
/* Assume link up if device can't report link status,
@@ -1158,42 +1424,25 @@ static int virtnet_probe(struct virtio_device *vdev)
netif_carrier_on(dev);
}
- pr_debug("virtnet: registered device %s\n", dev->name);
+ pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
+ dev->name, num_queue_pairs);
+
return 0;
-unregister:
+free_recv_bufs:
+ free_receive_bufs(vi);
unregister_netdev(dev);
+
free_vqs:
- vdev->config->del_vqs(vdev);
-free_stats:
- free_percpu(vi->stats);
-free:
+ for (i = 0; i < num_queue_pairs; i++)
+ cancel_delayed_work_sync(&vi->rq[i]->refill);
+ virtnet_del_vqs(vi);
+
+free_netdev:
free_netdev(dev);
return err;
}
-static void free_unused_bufs(struct virtnet_info *vi)
-{
- void *buf;
- while (1) {
- buf = virtqueue_detach_unused_buf(vi->svq);
- if (!buf)
- break;
- dev_kfree_skb(buf);
- }
- while (1) {
- buf = virtqueue_detach_unused_buf(vi->rvq);
- if (!buf)
- break;
- if (vi->mergeable_rx_bufs || vi->big_packets)
- give_pages(vi, buf);
- else
- dev_kfree_skb(buf);
- --vi->num;
- }
- BUG_ON(vi->num != 0);
-}
-
static void remove_vq_common(struct virtnet_info *vi)
{
vi->vdev->config->reset(vi->vdev);
@@ -1201,10 +1450,9 @@ static void remove_vq_common(struct virtnet_info *vi)
/* Free unused buffers in both send and recv, if any. */
free_unused_bufs(vi);
- vi->vdev->config->del_vqs(vi->vdev);
+ free_receive_bufs(vi);
- while (vi->pages)
- __free_pages(get_a_page(vi, GFP_KERNEL), 0);
+ virtnet_del_vqs(vi);
}
static void __devexit virtnet_remove(struct virtio_device *vdev)
@@ -1230,6 +1478,7 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
static int virtnet_freeze(struct virtio_device *vdev)
{
struct virtnet_info *vi = vdev->priv;
+ int i;
/* Prevent config work handler from accessing the device */
mutex_lock(&vi->config_lock);
@@ -1237,10 +1486,13 @@ static int virtnet_freeze(struct virtio_device *vdev)
mutex_unlock(&vi->config_lock);
netif_device_detach(vi->dev);
- cancel_delayed_work_sync(&vi->refill);
+ for (i = 0; i < vi->num_queue_pairs; i++)
+ cancel_delayed_work_sync(&vi->rq[i]->refill);
if (netif_running(vi->dev))
- napi_disable(&vi->napi);
+ for (i = 0; i < vi->num_queue_pairs; i++)
+ napi_disable(&vi->rq[i]->napi);
+
remove_vq_common(vi);
@@ -1252,19 +1504,22 @@ static int virtnet_freeze(struct virtio_device *vdev)
static int virtnet_restore(struct virtio_device *vdev)
{
struct virtnet_info *vi = vdev->priv;
- int err;
+ int err, i;
- err = init_vqs(vi);
+ err = virtnet_setup_vqs(vi);
if (err)
return err;
if (netif_running(vi->dev))
- virtnet_napi_enable(vi);
+ for (i = 0; i < vi->num_queue_pairs; i++)
+ virtnet_napi_enable(vi->rq[i]);
netif_device_attach(vi->dev);
- if (!try_fill_recv(vi, GFP_KERNEL))
- queue_delayed_work(system_nrt_wq, &vi->refill, 0);
+ for (i = 0; i < vi->num_queue_pairs; i++)
+ if (!try_fill_recv(vi->rq[i], GFP_KERNEL))
+ queue_delayed_work(system_nrt_wq,
+ &vi->rq[i]->refill, 0);
mutex_lock(&vi->config_lock);
vi->config_enable = true;
@@ -1287,7 +1542,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
- VIRTIO_NET_F_GUEST_ANNOUNCE,
+ VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MULTIQUEUE,
};
static struct virtio_driver virtio_net_driver = {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 1bc7e30..60f09ff 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -61,6 +61,8 @@ struct virtio_net_config {
__u8 mac[6];
/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
__u16 status;
+ /* Total number of RX/TX queues */
+ __u16 num_queues;
} __attribute__((packed));
/* This is the first element of the scatter-gather list. If you don't
--
1.7.1
^ permalink raw reply related
* [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
From: Jason Wang @ 2012-07-05 10:29 UTC (permalink / raw)
To: mst, mashirle, krkumar2, habanero, rusty, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem
Cc: kvm, sri
In-Reply-To: <1341484194-8108-1-git-send-email-jasowang@redhat.com>
This patch lets the virtio_net driver negotiate the number of queues it
wishes to use through the control virtqueue, and exports an ethtool
interface that lets the user tweak it.
Because the current multiqueue virtio-net implementation is optimized for
per-cpu virtqueues, only two modes are supported:
- single queue pair mode
- multiple queue pairs mode, where the number of queue pairs matches the number of vcpus
Single queue pair mode is currently the default, because multiqueue mode
shows regressions in some tests (especially stream tests).
Since the virtio core does not support partially deleting virtqueues, a mode
switch deletes all virtqueues and the driver then re-creates the ones it
will use.
Note that queue number negotiation is deferred to .ndo_open(), because
commands can only be sent to the control virtqueue after feature
negotiation (as it may also use event index).
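A minimal userspace sketch of the two-mode rule and the switch sequence described above. The patch implements this in virtnet_set_channels(); `struct mq_state`, `mq_check_request()` and `mq_switch()` are hypothetical names used only for illustration, and the teardown/re-create steps are reduced to updating the in-use count.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the driver state: only the fields the
 * channel-switching decision needs. */
struct mq_state {
	unsigned int num_queue_pairs;   /* pairs currently in use */
	unsigned int total_queue_pairs; /* pairs the device offers */
	bool has_cvq;                   /* control virtqueue negotiated */
};

/* Accept a request only for one of the two supported modes:
 * a single pair, or every pair the device exposes. RX and TX
 * counts must match because queues come in pairs. */
static int mq_check_request(const struct mq_state *s,
			    unsigned int rx, unsigned int tx)
{
	if (rx != tx)
		return -EINVAL;
	if (rx != 1 && rx != s->total_queue_pairs)
		return -EINVAL;
	if (!s->has_cvq)
		return -EINVAL;
	return 0;
}

/* Switch modes. Since virtqueues cannot be partially deleted, the
 * real driver stops the device, deletes every virtqueue, re-creates
 * num_queue_pairs pairs and then sends the new count through the
 * control virtqueue. Here only the resulting count is modelled. */
static int mq_switch(struct mq_state *s, unsigned int rx, unsigned int tx)
{
	int err = mq_check_request(s, rx, tx);

	if (err)
		return err;	/* state is left untouched on rejection */
	s->num_queue_pairs = rx;
	return 0;
}
```

Rejected requests leave the current mode unchanged, so a bad ethtool request cannot leave the device half-reconfigured in this model.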
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 171 ++++++++++++++++++++++++++++++++++---------
include/linux/virtio_net.h | 7 ++
2 files changed, 142 insertions(+), 36 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7410187..3339eeb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -88,6 +88,7 @@ struct receive_queue {
struct virtnet_info {
u16 num_queue_pairs; /* # of RX/TX vq pairs */
+ u16 total_queue_pairs;
struct send_queue *sq[MAX_QUEUES] ____cacheline_aligned_in_smp;
struct receive_queue *rq[MAX_QUEUES] ____cacheline_aligned_in_smp;
@@ -137,6 +138,8 @@ struct padded_vnet_hdr {
char padding[6];
};
+static const struct ethtool_ops virtnet_ethtool_ops;
+
static inline int txq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq)
{
int ret = virtqueue_get_queue_index(vq);
@@ -802,22 +805,6 @@ static void virtnet_netpoll(struct net_device *dev)
}
#endif
-static int virtnet_open(struct net_device *dev)
-{
- struct virtnet_info *vi = netdev_priv(dev);
- int i;
-
- for (i = 0; i < vi->num_queue_pairs; i++) {
- /* Make sure we have some buffers: if oom use wq. */
- if (!try_fill_recv(vi->rq[i], GFP_KERNEL))
- queue_delayed_work(system_nrt_wq,
- &vi->rq[i]->refill, 0);
- virtnet_napi_enable(vi->rq[i]);
- }
-
- return 0;
-}
-
/*
* Send command via the control virtqueue and check status. Commands
* supported by the hypervisor, as indicated by feature bits, should
@@ -873,6 +860,43 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
rtnl_unlock();
}
+static int virtnet_set_queues(struct virtnet_info *vi)
+{
+ struct scatterlist sg;
+ struct net_device *dev = vi->dev;
+ sg_init_one(&sg, &vi->num_queue_pairs, sizeof(vi->num_queue_pairs));
+
+ if (!vi->has_cvq)
+ return -EINVAL;
+
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MULTIQUEUE,
+ VIRTIO_NET_CTRL_MULTIQUEUE_QNUM, &sg, 1, 0)){
+ dev_warn(&dev->dev, "Fail to set the number of queue pairs to"
+ " %d\n", vi->num_queue_pairs);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int virtnet_open(struct net_device *dev)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ int i;
+
+ for (i = 0; i < vi->num_queue_pairs; i++) {
+ /* Make sure we have some buffers: if oom use wq. */
+ if (!try_fill_recv(vi->rq[i], GFP_KERNEL))
+ queue_delayed_work(system_nrt_wq,
+ &vi->rq[i]->refill, 0);
+ virtnet_napi_enable(vi->rq[i]);
+ }
+
+ virtnet_set_queues(vi);
+
+ return 0;
+}
+
static int virtnet_close(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
@@ -1013,12 +1037,6 @@ static void virtnet_get_drvinfo(struct net_device *dev,
}
-static const struct ethtool_ops virtnet_ethtool_ops = {
- .get_drvinfo = virtnet_get_drvinfo,
- .get_link = ethtool_op_get_link,
- .get_ringparam = virtnet_get_ringparam,
-};
-
#define MIN_MTU 68
#define MAX_MTU 65535
@@ -1235,7 +1253,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
err:
if (ret && names)
- for (i = 0; i < vi->num_queue_pairs * 2; i++)
+ for (i = 0; i < total_vqs * 2; i++)
kfree(names[i]);
kfree(names);
@@ -1373,7 +1391,6 @@ static int virtnet_probe(struct virtio_device *vdev)
mutex_init(&vi->config_lock);
vi->config_enable = true;
INIT_WORK(&vi->config_work, virtnet_config_changed_work);
- vi->num_queue_pairs = num_queue_pairs;
/* If we can receive ANY GSO packets, we must allocate large ones. */
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
@@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
vi->has_cvq = true;
+ /* Use single tx/rx queue pair as default */
+ vi->num_queue_pairs = 1;
+ vi->total_queue_pairs = num_queue_pairs;
+
/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
err = virtnet_setup_vqs(vi);
if (err)
@@ -1396,6 +1417,9 @@ static int virtnet_probe(struct virtio_device *vdev)
virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
dev->features |= NETIF_F_HW_VLAN_FILTER;
+ netif_set_real_num_tx_queues(dev, 1);
+ netif_set_real_num_rx_queues(dev, 1);
+
err = register_netdev(dev);
if (err) {
pr_debug("virtio_net: registering device failed\n");
@@ -1403,7 +1427,7 @@ static int virtnet_probe(struct virtio_device *vdev)
}
/* Last of all, set up some receive buffers. */
- for (i = 0; i < num_queue_pairs; i++) {
+ for (i = 0; i < vi->num_queue_pairs; i++) {
try_fill_recv(vi->rq[i], GFP_KERNEL);
/* If we didn't even get one input buffer, we're useless. */
@@ -1474,10 +1498,8 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
free_netdev(vi->dev);
}
-#ifdef CONFIG_PM
-static int virtnet_freeze(struct virtio_device *vdev)
+static void virtnet_stop(struct virtnet_info *vi)
{
- struct virtnet_info *vi = vdev->priv;
int i;
/* Prevent config work handler from accessing the device */
@@ -1493,17 +1515,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
for (i = 0; i < vi->num_queue_pairs; i++)
napi_disable(&vi->rq[i]->napi);
-
- remove_vq_common(vi);
-
- flush_work(&vi->config_work);
-
- return 0;
}
-static int virtnet_restore(struct virtio_device *vdev)
+static int virtnet_start(struct virtnet_info *vi)
{
- struct virtnet_info *vi = vdev->priv;
int err, i;
err = virtnet_setup_vqs(vi);
@@ -1527,6 +1542,29 @@ static int virtnet_restore(struct virtio_device *vdev)
return 0;
}
+
+#ifdef CONFIG_PM
+static int virtnet_freeze(struct virtio_device *vdev)
+{
+ struct virtnet_info *vi = vdev->priv;
+
+ virtnet_stop(vi);
+
+ remove_vq_common(vi);
+
+ flush_work(&vi->config_work);
+
+ return 0;
+}
+
+static int virtnet_restore(struct virtio_device *vdev)
+{
+ struct virtnet_info *vi = vdev->priv;
+
+ virtnet_start(vi);
+
+ return 0;
+}
#endif
static struct virtio_device_id id_table[] = {
@@ -1560,6 +1598,67 @@ static struct virtio_driver virtio_net_driver = {
#endif
};
+static int virtnet_set_channels(struct net_device *dev,
+ struct ethtool_channels *channels)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ u16 queues = channels->rx_count;
+ unsigned status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
+
+ if (channels->rx_count != channels->tx_count)
+ return -EINVAL;
+ /* Only two modes were support currently */
+ if (queues != vi->total_queue_pairs && queues != 1)
+ return -EINVAL;
+ if (!vi->has_cvq)
+ return -EINVAL;
+
+ virtnet_stop(vi);
+
+ netif_set_real_num_tx_queues(dev, queues);
+ netif_set_real_num_rx_queues(dev, queues);
+
+ remove_vq_common(vi);
+ flush_work(&vi->config_work);
+
+ vi->num_queue_pairs = queues;
+ virtnet_start(vi);
+
+ vi->vdev->config->finalize_features(vi->vdev);
+
+ if (virtnet_set_queues(vi))
+ status |= VIRTIO_CONFIG_S_FAILED;
+ else
+ status |= VIRTIO_CONFIG_S_DRIVER_OK;
+
+ vi->vdev->config->set_status(vi->vdev, status);
+
+ return 0;
+}
+
+static void virtnet_get_channels(struct net_device *dev,
+ struct ethtool_channels *channels)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+
+ channels->max_rx = vi->total_queue_pairs;
+ channels->max_tx = vi->total_queue_pairs;
+ channels->max_other = 0;
+ channels->max_combined = 0;
+ channels->rx_count = vi->num_queue_pairs;
+ channels->tx_count = vi->num_queue_pairs;
+ channels->other_count = 0;
+ channels->combined_count = 0;
+}
+
+static const struct ethtool_ops virtnet_ethtool_ops = {
+ .get_drvinfo = virtnet_get_drvinfo,
+ .get_link = ethtool_op_get_link,
+ .get_ringparam = virtnet_get_ringparam,
+ .set_channels = virtnet_set_channels,
+ .get_channels = virtnet_get_channels,
+};
+
static int __init init(void)
{
return register_virtio_driver(&virtio_net_driver);
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 60f09ff..0d21e08 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -169,4 +169,11 @@ struct virtio_net_ctrl_mac {
#define VIRTIO_NET_CTRL_ANNOUNCE 3
#define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
+/*
+ * Control multiqueue
+ *
+ */
+#define VIRTIO_NET_CTRL_MULTIQUEUE 4
+ #define VIRTIO_NET_CTRL_MULTIQUEUE_QNUM 0
+
#endif /* _LINUX_VIRTIO_NET_H */
--
1.7.1
^ permalink raw reply related
* Re: [net-next RFC V5 2/5] virtio_ring: move queue_index to vring_virtqueue
From: Sasha Levin @ 2012-07-05 11:40 UTC (permalink / raw)
To: Jason Wang
Cc: krkumar2, habanero, mashirle, kvm, mst, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341484194-8108-3-git-send-email-jasowang@redhat.com>
On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote:
> Instead of storing the queue index in virtio infos, this patch moves them to
> vring_virtqueue and introduces helpers to set and get the value. This would
> simplify the management and tracing.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch actually fails to compile:
drivers/virtio/virtio_mmio.c: In function ‘vm_notify’:
drivers/virtio/virtio_mmio.c:229:13: error: ‘struct virtio_mmio_vq_info’ has no member named ‘queue_index’
drivers/virtio/virtio_mmio.c: In function ‘vm_del_vq’:
drivers/virtio/virtio_mmio.c:278:13: error: ‘struct virtio_mmio_vq_info’ has no member named ‘queue_index’
make[2]: *** [drivers/virtio/virtio_mmio.o] Error 1
It is probably missing the following hunks:
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index f5432b6..12b6180 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -222,11 +222,10 @@ static void vm_reset(struct virtio_device *vdev)
static void vm_notify(struct virtqueue *vq)
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
- struct virtio_mmio_vq_info *info = vq->priv;
/* We write the queue's selector into the notification register to
* signal the other end */
- writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
+ writel(virtqueue_get_queue_index(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
}
/* Notify all virtqueues on an interrupt. */
@@ -275,7 +274,7 @@ static void vm_del_vq(struct virtqueue *vq)
vring_del_virtqueue(vq);
/* Select and deactivate the queue */
- writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
+ writel(virtqueue_get_queue_index(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
size = PAGE_ALIGN(vring_size(info->num, VIRTIO_MMIO_VRING_ALIGN));
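The idea behind the original patch, restated as a minimal sketch: once the index is stored in the virtqueue behind get/set helpers, every transport that used to read `info->queue_index` (here, virtio_mmio's notify and del_vq paths) has to go through the accessor instead, which is exactly what the missing hunks do. `struct vq_sketch` and the `vq_sketch_*` helpers are hypothetical stand-ins, not the kernel's vring_virtqueue or its real helpers.

```c
#include <assert.h>

/* Hypothetical stand-in for vring_virtqueue: the queue index lives in
 * the virtqueue itself rather than in each transport's private info. */
struct vq_sketch {
	unsigned int queue_index;
	void *priv;	/* transport-private data, no longer holds the index */
};

static void vq_sketch_set_queue_index(struct vq_sketch *vq, unsigned int index)
{
	vq->queue_index = index;
}

static unsigned int vq_sketch_get_queue_index(const struct vq_sketch *vq)
{
	return vq->queue_index;
}

/* Hypothetical transport notify path: the value written to the notify
 * register now comes from the accessor, not from priv. */
static unsigned int vq_sketch_notify_value(const struct vq_sketch *vq)
{
	return vq_sketch_get_queue_index(vq);
}
```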
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related
* Re: [PATCH] MAINTAINERS: add kvm list for virtio components
From: Michael S. Tsirkin @ 2012-07-05 12:22 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, linux-kernel, virtualization, avi, amit.shah
In-Reply-To: <1341482827-13195-1-git-send-email-pbonzini@redhat.com>
On Thu, Jul 05, 2012 at 12:07:07PM +0200, Paolo Bonzini wrote:
> The KVM list is followed by more people than the generic
> virtualization@lists.linux-foundation.org mailing list, and is
> already "de facto" the place where virtio patches are posted.
I have no data on the first statement (do you?) and I disagree with the
last statement, but have no objection to people adding kvm list as well.
> pv-ops still has no other lists than virtualization@lists.linux-foundation.org.
> However, pv-ops patches will likely touch Xen or KVM files as well and
> the respective mailing list will usually be reached as well.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
So please replace the 1st paragraph in the commit log with
"virtio changes are likely to affect many KVM users as virtio is
the de-facto standard for PV devices under KVM". Otherwise ok.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> MAINTAINERS | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 14bc707..e265f2e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7340,6 +7340,7 @@ F: include/media/videobuf2-*
> VIRTIO CONSOLE DRIVER
> M: Amit Shah <amit.shah@redhat.com>
> L: virtualization@lists.linux-foundation.org
> +L: kvm@vger.kernel.org
> S: Maintained
> F: drivers/char/virtio_console.c
> F: include/linux/virtio_console.h
> @@ -7348,6 +7349,7 @@ VIRTIO CORE, NET AND BLOCK DRIVERS
> M: Rusty Russell <rusty@rustcorp.com.au>
> M: "Michael S. Tsirkin" <mst@redhat.com>
> L: virtualization@lists.linux-foundation.org
> +L: kvm@vger.kernel.org
> S: Maintained
> F: drivers/virtio/
> F: drivers/net/virtio_net.c
> --
> 1.7.1
^ permalink raw reply
* Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
From: Sasha Levin @ 2012-07-05 12:51 UTC (permalink / raw)
To: Jason Wang
Cc: krkumar2, habanero, mashirle, kvm, mst, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <1341484194-8108-6-git-send-email-jasowang@redhat.com>
On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote:
> @@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> vi->has_cvq = true;
>
> + /* Use single tx/rx queue pair as default */
> + vi->num_queue_pairs = 1;
> + vi->total_queue_pairs = num_queue_pairs;
The code uses this "default" even if the number of queue pairs it
wants was specified during initialization. This effectively limits every
device to a single pair at startup.
^ permalink raw reply
* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
From: Michael S. Tsirkin @ 2012-07-05 13:53 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel, lf-virt,
Anthony Liguori, target-devel, Zhi Yong Wu, Christoph Hellwig,
Stefan Hajnoczi
In-Reply-To: <4FF56AE9.9060201@redhat.com>
On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
> >
> > fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> > ------------------------------------------------------------------------------------
> > 25 Write / 75 Read | ~15K | ~45K | ~70K
> > 75 Write / 25 Read | ~20K | ~55K | ~60K
>
> This is impressive, but I think it's still not enough to justify the
> inclusion of tcm_vhost. In my opinion, vhost-blk/vhost-scsi are mostly
> worthwhile as drivers for improvements to QEMU performance. We want to
> add more fast paths to QEMU that let us move SCSI and virtio processing
> to separate threads, we have proof of concepts that this can be done,
> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
A general rant below:
OTOH if it works, and adds value, we really should consider including code.
To me, it does not make sense to reject code just because in theory
someone could write even better code. Code walks. Time to market matters too.
Yes I realize more options increase the support burden. But downstreams can make
their own decisions on whether to support some configurations:
add a configure option to disable it and that's enough.
> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
> completely different devices that happen to speak the same SCSI
> transport. Not only virtio-scsi-vhost must be configured outside QEMU
configuration outside QEMU is OK I think - real users use
management anyway. But maybe we can have helper scripts
like we have for tun?
> and doesn't support -device;
This needs to be fixed I think.
> it (obviously) presents different
> inquiry/vpd/mode data than virtio-scsi-qemu,
Why is this obvious and can't be fixed? Userspace virtio-scsi
is pretty flexible - can't it supply matching inquiry/vpd/mode data
so that switching is transparent to the guest?
> so that it is not possible to migrate one to the other.
Migration between different backend types does not seem all that useful.
The general rule is you need identical flags on both sides to allow
migration, and it is not clear how valuable it is to relax this
somewhat.
> I don't think vhost-scsi is particularly useful for virtualization,
> honestly. However, if it is useful for development, testing or
> benchmarking of lio itself (does this make any sense? :)) that could be
> by itself a good reason to include it.
>
> Paolo
--
MST
^ permalink raw reply
* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
From: Anthony Liguori @ 2012-07-05 14:06 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jens Axboe, linux-scsi, kvm-devel, lf-virt, Anthony Liguori,
target-devel, Paolo Bonzini, Zhi Yong Wu, Christoph Hellwig,
Stefan Hajnoczi
In-Reply-To: <20120705135318.GG30572@redhat.com>
On 07/05/2012 08:53 AM, Michael S. Tsirkin wrote:
> On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
>> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
>>>
>>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
>>> ------------------------------------------------------------------------------------
>>> 25 Write / 75 Read | ~15K | ~45K | ~70K
>>> 75 Write / 25 Read | ~20K | ~55K | ~60K
>>
>> This is impressive, but I think it's still not enough to justify the
>> inclusion of tcm_vhost.
We have demonstrated better results at much higher IOP rates with virtio-blk in
userspace so while these results are nice, there's no reason to believe we can't
do this in userspace.
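For reference, the arithmetic behind the quoted fio table: using its approximate ~K IOPS values, tcm_vhost reaches roughly 64% (25 Write / 75 Read) and 92% (75 Write / 25 Read) of bare metal, about 3.00x and 2.75x virtio-scsi-raw. The sketch below only recomputes those ratios; `struct fio_row` and the helper names are hypothetical.

```c
#include <assert.h>

/* One row of the quoted fio table, in approximate thousands of IOPS. */
struct fio_row {
	unsigned int scsi_raw;   /* virtio-scsi-raw */
	unsigned int tcm_vhost;  /* virtio-scsi+tcm_vhost */
	unsigned int bare_metal; /* bare-metal raw block */
};

/* tcm_vhost as a percentage of bare metal, rounded to nearest. */
static unsigned int pct_of_bare_metal(const struct fio_row *r)
{
	assert(r->bare_metal != 0);
	return (100 * r->tcm_vhost + r->bare_metal / 2) / r->bare_metal;
}

/* tcm_vhost speedup over virtio-scsi-raw, in hundredths (300 == 3.00x). */
static unsigned int speedup_x100(const struct fio_row *r)
{
	assert(r->scsi_raw != 0);
	return 100 * r->tcm_vhost / r->scsi_raw;
}
```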
>> In my opinion, vhost-blk/vhost-scsi are mostly
>> worthwhile as drivers for improvements to QEMU performance. We want to
>> add more fast paths to QEMU that let us move SCSI and virtio processing
>> to separate threads, we have proof of concepts that this can be done,
>> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
>
> A general rant below:
>
> OTOH if it works, and adds value, we really should consider including code.
Users want something that has lots of features and performs really, really well.
They want everything.
Having one device type that is "fast" but has no features and another that is
"not fast" but has a lot of features forces the user to make a bad choice. No
one wins in the end.
virtio-scsi is brand new. It's not as if we've had any significant time to make
virtio-scsi-qemu faster. In fact, tcm_vhost existed before virtio-scsi-qemu did
if I understand correctly.
> To me, it does not make sense to reject code just because in theory
> someone could write even better code.
There is no theory. We have proof points with virtio-blk.
> Code walks. Time to market matters too.
But guest/user facing decisions cannot be easily unmade and making the wrong
technical choices because of premature concerns of "time to market" just result
in a long term mess.
There is no technical reason why tcm_vhost is going to be faster than doing it
in userspace. We can demonstrate this with virtio-blk. This isn't a
theoretical argument.
> Yes I realize more options increase the support burden. But downstreams can make
> their own decisions on whether to support some configurations:
> add a configure option to disable it and that's enough.
>
>> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
>> completely different devices that happen to speak the same SCSI
>> transport. Not only virtio-scsi-vhost must be configured outside QEMU
>
> configuration outside QEMU is OK I think - real users use
> management anyway. But maybe we can have helper scripts
> like we have for tun?
Asking a user to write a helper script is pretty awful...
>
>> and doesn't support -device;
>
> This needs to be fixed I think.
>
>> it (obviously) presents different
>> inquiry/vpd/mode data than virtio-scsi-qemu,
>
> Why is this obvious and can't be fixed?
It's an entirely different emulation path. It's not a simple packet protocol
like virtio-net. It's a complex command protocol where the backend maintains a
very large amount of state.
> Userspace virtio-scsi
> is pretty flexible - can't it supply matching inquiry/vpd/mode data
> so that switching is transparent to the guest?
Basically, the issue is that the kernel has more complete SCSI emulation than
QEMU does right now.
There are lots of ways to try to solve this--like try to reuse the kernel code
in userspace or just improving the userspace code. If we were able to make the
two paths identical, then I strongly suspect there'd be no point in having
tcm_vhost anyway.
Regards,
Anthony Liguori
>
>> so that it is not possible to migrate one to the other.
>
> Migration between different backend types does not seem all that useful.
> The general rule is you need identical flags on both sides to allow
> migration, and it is not clear how valuable it is to relax this
> somewhat.
>
>> I don't think vhost-scsi is particularly useful for virtualization,
>> honestly. However, if it is useful for development, testing or
>> benchmarking of lio itself (does this make any sense? :)) that could be
>> by itself a good reason to include it.
>>
>> Paolo
>
^ permalink raw reply
* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
From: Paolo Bonzini @ 2012-07-05 14:32 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel, lf-virt,
Anthony Liguori, target-devel, Zhi Yong Wu, Christoph Hellwig,
Stefan Hajnoczi
In-Reply-To: <20120705135318.GG30572@redhat.com>
Il 05/07/2012 15:53, Michael S. Tsirkin ha scritto:
> On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
>> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
>>>
>>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
>>> ------------------------------------------------------------------------------------
>>> 25 Write / 75 Read | ~15K | ~45K | ~70K
>>> 75 Write / 25 Read | ~20K | ~55K | ~60K
>>
>> This is impressive, but I think it's still not enough to justify the
>> inclusion of tcm_vhost. In my opinion, vhost-blk/vhost-scsi are mostly
>> worthwhile as drivers for improvements to QEMU performance. We want to
>> add more fast paths to QEMU that let us move SCSI and virtio processing
>> to separate threads, we have proof of concepts that this can be done,
>> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
>
> A general rant below:
>
> OTOH if it works, and adds value, we really should consider including code.
> To me, it does not make sense to reject code just because in theory
> someone could write even better code.
It's not about writing better code. It's about having two completely
separate SCSI/block layers with completely different feature sets.
> Code walks. Time to market matters too.
> Yes I realize more options increase the support burden. But downstreams can make
> their own decisions on whether to support some configurations:
> add a configure option to disable it and that's enough.
>
>> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
>> completely different devices that happen to speak the same SCSI
>> transport. Not only virtio-scsi-vhost must be configured outside QEMU
>
> configuration outside QEMU is OK I think - real users use
> management anyway. But maybe we can have helper scripts
> like we have for tun?
We could add hooks for vhost-scsi in the SCSI devices and let them
configure themselves. I'm not sure it is a good idea.
>> and doesn't support -device;
>
> This needs to be fixed I think.
To be clear, it supports -device for the virtio-scsi HBA itself; it
doesn't support using -drive/-device to set up the disks hanging off it.
>> it (obviously) presents different
>> inquiry/vpd/mode data than virtio-scsi-qemu,
>
> Why is this obvious and can't be fixed? Userspace virtio-scsi
> is pretty flexible - can't it supply matching inquiry/vpd/mode data
> so that switching is transparent to the guest?
It cannot support the whole feature set anyway, unless you want to port
thousands of lines from the kernel to QEMU (well, perhaps we'll get
there, but it's far off). Conversely, the in-kernel target of course does
not support qcow2 and friends, though perhaps you could imagine some hack
based on NBD.
Paolo
^ permalink raw reply
* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
From: Michael S. Tsirkin @ 2012-07-05 14:40 UTC (permalink / raw)
To: Anthony Liguori
Cc: Jens Axboe, linux-scsi, kvm-devel, lf-virt, Anthony Liguori,
target-devel, Paolo Bonzini, Zhi Yong Wu, Christoph Hellwig,
Stefan Hajnoczi
In-Reply-To: <4FF59F6B.2000101@us.ibm.com>
On Thu, Jul 05, 2012 at 09:06:35AM -0500, Anthony Liguori wrote:
> On 07/05/2012 08:53 AM, Michael S. Tsirkin wrote:
> >On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote:
> >>Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto:
> >>>
> >>>fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> >>>------------------------------------------------------------------------------------
> >>>25 Write / 75 Read | ~15K | ~45K | ~70K
> >>>75 Write / 25 Read | ~20K | ~55K | ~60K
> >>
> >>This is impressive, but I think it's still not enough to justify the
> >>inclusion of tcm_vhost.
>
> We have demonstrated better results at much higher IOP rates with
> virtio-blk in userspace so while these results are nice, there's no
> reason to believe we can't do this in userspace.
>
> >>In my opinion, vhost-blk/vhost-scsi are mostly
> >>worthwhile as drivers for improvements to QEMU performance. We want to
> >>add more fast paths to QEMU that let us move SCSI and virtio processing
> >>to separate threads, we have proof of concepts that this can be done,
> >>and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively.
> >
> >A general rant below:
> >
> >OTOH if it works, and adds value, we really should consider including code.
>
> Users want something that has lots of features and performs really,
> really well. They want everything.
>
> Having one device type that is "fast" but has no features and
> another that is "not fast" but has a lot of features forces the user
> to make a bad choice. No one wins in the end.
>
> virtio-scsi is brand new. It's not as if we've had any significant
> time to make virtio-scsi-qemu faster. In fact, tcm_vhost existed
> before virtio-scsi-qemu did if I understand correctly.
Can't the same be said about virtio-scsi - it seems to be
slower, so we force a bad choice between blk and scsi on the user?
>
> >To me, it does not make sense to reject code just because in theory
> >someone could write even better code.
>
> There is no theory. We have proof points with virtio-blk.
>
> >Code walks. Time to market matters too.
>
> But guest/user facing decisions cannot be easily unmade and making
> the wrong technical choices because of premature concerns of "time
> to market" just result in a long term mess.
>
> There is no technical reason why tcm_vhost is going to be faster
> than doing it in userspace.
But doing what in userspace exactly?
> We can demonstrate this with
> virtio-blk. This isn't a theoretical argument.
>
> >Yes I realize more options increase the support burden. But downstreams can make
> >their own decisions on whether to support some configurations:
> >add a configure option to disable it and that's enough.
> >
> >>In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two
> >>completely different devices that happen to speak the same SCSI
> >>transport. Not only virtio-scsi-vhost must be configured outside QEMU
> >
> >configuration outside QEMU is OK I think - real users use
> >management anyway. But maybe we can have helper scripts
> >like we have for tun?
>
> Asking a user to write a helper script is pretty awful...
A developer can write a helper. A user should just use management.
> >
> >>and doesn't support -device;
> >
> >This needs to be fixed I think.
> >
> >>it (obviously) presents different
> >>inquiry/vpd/mode data than virtio-scsi-qemu,
> >
> >Why is this obvious and can't be fixed?
>
> It's an entirely different emulation path. It's not a simple packet
> protocol like virtio-net. It's a complex command protocol where the
> backend maintains a very large amount of state.
>
> >Userspace virtio-scsi
> >is pretty flexible - can't it supply matching inquiry/vpd/mode data
> >so that switching is transparent to the guest?
>
> Basically, the issue is that the kernel has more complete SCSI
> emulation than QEMU does right now.
>
> There are lots of ways to try to solve this--like try to reuse the
> kernel code in userspace or just improving the userspace code. If
> we were able to make the two paths identical, then I strongly
> suspect there'd be no point in having tcm_vhost anyway.
>
> Regards,
>
> Anthony Liguori
However, a question we should ask ourselves is whether this will happen
in practice, and when.
I have no idea, I am just asking questions.
--
MST
^ permalink raw reply
* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
From: Paolo Bonzini @ 2012-07-05 14:47 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel, lf-virt,
Anthony Liguori, target-devel, Zhi Yong Wu, Christoph Hellwig,
Stefan Hajnoczi
In-Reply-To: <20120705144001.GB31257@redhat.com>
Il 05/07/2012 16:40, Michael S. Tsirkin ha scritto:
>> virtio-scsi is brand new. It's not as if we've had any significant
>> time to make virtio-scsi-qemu faster. In fact, tcm_vhost existed
>> before virtio-scsi-qemu did if I understand correctly.
Yes.
> Can't the same be said about virtio-scsi - it seems to be
> slower, so we force a bad choice between blk and scsi on the user?
virtio-scsi supports multiple devices per PCI slot (or even function),
can talk to tapes, has better passthrough support for disks, and does a
bunch of other things that virtio-blk by design doesn't do. This
applies to both tcm_vhost and virtio-scsi-qemu.
So far, all that virtio-scsi vs. virtio-blk benchmarks say is that more
benchmarking is needed. Some people see it faster, some people see it
slower. In some sense, it's consistent with the expectation that the
two should roughly be the same. :)
>> But guest/user facing decisions cannot be easily unmade and making
>> the wrong technical choices because of premature concerns of "time
>> to market" just results in a long-term mess.
>>
>> There is no technical reason why tcm_vhost is going to be faster
>> than doing it in userspace.
>
> But doing what in userspace exactly?
Processing virtqueues in separate threads, switching the block and SCSI
layer to fine-grained locking, adding some more fast paths.
>> Basically, the issue is that the kernel has more complete SCSI
>> emulation than QEMU does right now.
>>
>> There are lots of ways to try to solve this, like trying to reuse the
>> kernel code in userspace or just improving the userspace code. If
>> we were able to make the two paths identical, then I strongly
>> suspect there'd be no point in having tcm_vhost anyway.
>
> However, a question we should ask ourselves is whether this will happen
> in practice, and when.
It's already happening, but it takes a substantial amount of preparatory
work before you can actually see results.
Paolo
* Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
From: Michael S. Tsirkin @ 2012-07-05 17:26 UTC
To: Paolo Bonzini
Cc: Jens Axboe, Anthony Liguori, linux-scsi, kvm-devel, lf-virt,
Anthony Liguori, target-devel, Zhi Yong Wu, Christoph Hellwig,
Stefan Hajnoczi
In-Reply-To: <4FF5A90F.5050902@redhat.com>
On Thu, Jul 05, 2012 at 04:47:43PM +0200, Paolo Bonzini wrote:
> On 05/07/2012 16:40, Michael S. Tsirkin wrote:
> >> virtio-scsi is brand new. It's not as if we've had any significant
> >> time to make virtio-scsi-qemu faster. In fact, tcm_vhost existed
> >> before virtio-scsi-qemu did if I understand correctly.
>
> Yes.
>
> > Can't the same be said about virtio-scsi? It seems to be
> > slower, so we force a bad choice between blk and scsi on the user?
>
> virtio-scsi supports multiple devices per PCI slot (or even function),
> can talk to tapes, has better passthrough support for disks, and does a
> bunch of other things that virtio-blk by design doesn't do. This
> applies to both tcm_vhost and virtio-scsi-qemu.
>
> So far, all that virtio-scsi vs. virtio-blk benchmarks say is that more
> benchmarking is needed. Some people see it faster, some people see it
> slower. In some sense, it's consistent with the expectation that the
> two should roughly be the same. :)
Anyway, all I was saying is that new technology often lacks some features
of the old one. We are not forcing the new, inferior one on anyone, so we
can let it mature in tree.
> >> But guest/user facing decisions cannot be easily unmade and making
> >> the wrong technical choices because of premature concerns of "time
> >> to market" just results in a long-term mess.
> >>
> >> There is no technical reason why tcm_vhost is going to be faster
> >> than doing it in userspace.
> >
> > But doing what in userspace exactly?
>
> Processing virtqueues in separate threads, switching the block and SCSI
> layer to fine-grained locking, adding some more fast paths.
>
> >> Basically, the issue is that the kernel has more complete SCSI
> >> emulation than QEMU does right now.
> >>
> >> There are lots of ways to try to solve this, like trying to reuse the
> >> kernel code in userspace or just improving the userspace code. If
> >> we were able to make the two paths identical, then I strongly
> >> suspect there'd be no point in having tcm_vhost anyway.
> >
> > However, a question we should ask ourselves is whether this will happen
> > in practice, and when.
>
> It's already happening, but it takes a substantial amount of preparatory
> work before you can actually see results.
>
> Paolo