qemu-devel.nongnu.org archive mirror
* [Qemu-devel] iSCSI support for QEMU
@ 2011-04-21  8:43 Ronnie Sahlberg
  2011-04-21  8:43 ` [Qemu-devel] [PATCH] iSCSI block driver support Ronnie Sahlberg
  2011-04-21  8:50 ` [Qemu-devel] iSCSI support for QEMU Christoph Hellwig
  0 siblings, 2 replies; 16+ messages in thread
From: Ronnie Sahlberg @ 2011-04-21  8:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha

Please find attached a new version of the patch for iSCSI support.
iSCSI support is automatically detected and activated during configure/build
if the libiscsi library is available on the build host.

This library is available at https://github.com/sahlberg/libiscsi


This new version contains two changes since the previous version:
* avoid copying data in the read path and use the API to read directly
into the application buffers
* use proper task management functions to abort tasks that are cancelled
by qemu


A built-in iSCSI initiator has many benefits:
* The LUNs used are local to the guest and are not exposed to the host, nor to any of the processes running on the host.
* It provides very high out-of-the-box performance compared to open-iscsi,
which requires several special flags to match libiscsi.
* It avoids cache thrashing and keeping two copies of the same data, one in the
host's cache and a second copy in the guest. I.e. O_DIRECT-like behaviour.
* Management of LUNs is much easier for high-end or enterprise users with
very many iSCSI devices. You no longer need to expose many hundreds of devices
to the local host and have them pollute its caches.


Performance is acceptable with libiscsi.
Some basic tests that have been performed show it to be significantly faster
than an out-of-the-box open-iscsi-mounted LUN being accessed with default
QEMU I/O options.

The test booted an Ubuntu 10.04 guest from an 8 GB STGT LUN
exposed across a 1 GbE network.
Once the system had booted, the test measured how long it took to run, inside the guest, "sudo time dd if=/dev/sda of=/dev/null bs=1M",
i.e. just have the guest read the entire 8 GB system disk.

QEMU+libiscsi                   : 228 seconds
QEMU, default device options    : 273 seconds
QEMU with cache=none            : 387 seconds
QEMU with cache=none,aio=native : 218 seconds
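For scale, the elapsed times above can be converted into rough throughput figures. This is a back-of-the-envelope calculation assuming the full 8 GiB was transferred in each run:

```shell
# Approximate throughput for each configuration: 8 GiB / elapsed seconds
for t in 228 273 387 218; do
    awk -v s="$t" 'BEGIN { printf "%ds -> %.1f MiB/s\n", s, 8 * 1024 / s }'
done
# first line printed: "228s -> 35.9 MiB/s"
```

All results sit well below the ~110 MiB/s a 1 GbE link can carry, so the network itself was not the bottleneck in any of the configurations.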


Please review and consider for inclusion.

regards
ronnie sahlberg




* [Qemu-devel] [PATCH] iSCSI block driver support
  2011-04-21  8:43 [Qemu-devel] iSCSI support for QEMU Ronnie Sahlberg
@ 2011-04-21  8:43 ` Ronnie Sahlberg
  2011-04-21  8:50 ` [Qemu-devel] iSCSI support for QEMU Christoph Hellwig
  1 sibling, 0 replies; 16+ messages in thread
From: Ronnie Sahlberg @ 2011-04-21  8:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha, Ronnie Sahlberg

This patch adds a new block driver: block/iscsi.c
The driver interfaces with the multi-platform POSIX library
for iSCSI initiator/client access to iSCSI devices hosted at
git://github.com/sahlberg/libiscsi.git

The patch adds the driver to interface with the iSCSI library.
It also updates the configure script to
* by default, probe if libiscsi is available and, if so, build
  qemu against libiscsi.
* --enable-libiscsi
  Force a build against libiscsi. If libiscsi is not available
  the build will fail.
* --disable-libiscsi
  Do not link against libiscsi, even if it is available.

When linked with libiscsi, qemu gains support for accessing iSCSI resources
such as disks and CD-ROMs directly, without having to make the devices visible
to the host.

You can specify devices using an iSCSI URL of the form:
iscsi://[<username>[:<password>]@]<host>[:<port>]/<target-iqn-name>/<lun>
When using authentication, the password can optionally be set with
LIBISCSI_CHAP_PASSWORD="password" to avoid it showing up in the process list.

Example:
-drive file=iscsi://10.1.1.1:3260/iqn.ronnie.test/1

-cdrom iscsi://10.1.1.1:3260/iqn.ronnie.test/2
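Combining the URL syntax with the environment-variable password could look like the following sketch. The CHAP user name and password here are hypothetical, and the actual qemu invocation is shown commented out:

```shell
# Hypothetical CHAP credentials; portal and target IQN taken from the examples above.
export LIBISCSI_CHAP_PASSWORD='s3cret'   # keeps the password out of the process list
ISCSI_URL="iscsi://chapuser@10.1.1.1:3260/iqn.ronnie.test/1"
echo "$ISCSI_URL"
# qemu -drive file="$ISCSI_URL"
```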

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
---
 Makefile.objs |    1 +
 block/iscsi.c |  590 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 configure     |   31 +++
 trace-events  |    6 +
 4 files changed, 628 insertions(+), 0 deletions(-)
 create mode 100644 block/iscsi.c

diff --git a/Makefile.objs b/Makefile.objs
index 44ce368..8403c66 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -25,6 +25,7 @@ block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
+block-nested-$(CONFIG_LIBISCSI) += iscsi.o
 block-nested-$(CONFIG_CURL) += curl.o
 block-nested-$(CONFIG_RBD) += rbd.o
 
diff --git a/block/iscsi.c b/block/iscsi.c
new file mode 100644
index 0000000..31831ec
--- /dev/null
+++ b/block/iscsi.c
@@ -0,0 +1,590 @@
+/*
+ * QEMU Block driver for iSCSI images
+ *
+ * Copyright (c) 2010 Ronnie Sahlberg <ronniesahlberg@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+
+#include <poll.h>
+#include "sysemu.h"
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "block_int.h"
+#include "trace.h"
+
+#include <iscsi/iscsi.h>
+#include <iscsi/scsi-lowlevel.h>
+
+
+typedef struct IscsiLun {
+    struct iscsi_context *iscsi;
+    int lun;
+    int block_size;
+    unsigned long num_blocks;
+} IscsiLun;
+
+typedef struct IscsiAIOCB {
+    BlockDriverAIOCB common;
+    QEMUIOVector *qiov;
+    QEMUBH *bh;
+    IscsiLun *iscsilun;
+    struct scsi_task *task;
+    uint8_t *buf;
+    int canceled;
+    int status;
+    size_t read_size;
+    size_t read_offset;
+} IscsiAIOCB;
+
+struct IscsiTask {
+    IscsiLun *iscsilun;
+    int status;
+    int complete;
+};
+
+static void
+iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data,
+		void *private_data)
+{
+}
+
+static void
+iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    IscsiAIOCB *acb = (IscsiAIOCB *)blockacb;
+    IscsiLun *iscsilun = acb->iscsilun;
+
+    acb->status = -ECANCELED;
+    acb->common.cb(acb->common.opaque, acb->status);
+    acb->canceled = 1;
+
+    iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task,
+    				iscsi_abort_task_cb, NULL);
+}
+
+static AIOPool iscsi_aio_pool = {
+    .aiocb_size         = sizeof(IscsiAIOCB),
+    .cancel             = iscsi_aio_cancel,
+};
+
+
+static void iscsi_process_read(void *arg);
+static void iscsi_process_write(void *arg);
+
+static int iscsi_process_flush(void *arg)
+{
+    IscsiLun *iscsilun = arg;
+
+    return iscsi_queue_length(iscsilun->iscsi) > 0;
+}
+
+static void
+iscsi_set_events(IscsiLun *iscsilun)
+{
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
+                           (iscsi_which_events(iscsi) & POLLOUT)
+                           ? iscsi_process_write : NULL,
+                           iscsi_process_flush, NULL, iscsilun);
+}
+
+static void
+iscsi_process_read(void *arg)
+{
+    IscsiLun *iscsilun = arg;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    iscsi_service(iscsi, POLLIN);
+    iscsi_set_events(iscsilun);
+}
+
+static void
+iscsi_process_write(void *arg)
+{
+    IscsiLun *iscsilun = arg;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    iscsi_service(iscsi, POLLOUT);
+    iscsi_set_events(iscsilun);
+}
+
+
+static int
+iscsi_schedule_bh(QEMUBHFunc *cb, IscsiAIOCB *acb)
+{
+    acb->bh = qemu_bh_new(cb, acb);
+    if (!acb->bh) {
+        error_report("oom: could not create iscsi bh");
+        return -EIO;
+    }
+
+    qemu_bh_schedule(acb->bh);
+    return 0;
+}
+
+static void
+iscsi_readv_writev_bh_cb(void *p)
+{
+    IscsiAIOCB *acb = p;
+
+    qemu_bh_delete(acb->bh);
+
+    if (acb->status != -ECANCELED) {
+        acb->common.cb(acb->common.opaque, acb->status);
+    }
+
+    qemu_aio_release(acb);
+}
+
+
+static void
+iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status,
+                     void *command_data, void *opaque)
+{
+    IscsiAIOCB *acb = opaque;
+
+    trace_iscsi_aio_write10_cb(iscsi, status, acb, acb->canceled);
+
+    if (acb->buf != NULL) {
+        free(acb->buf);
+    }
+
+    if (acb->canceled != 0) {
+        qemu_aio_release(acb);
+	scsi_free_scsi_task(acb->task);
+	acb->task = NULL;
+        return;
+    }
+
+    acb->status = 0;
+    if (status < 0) {
+        error_report("Failed to write10 data to iSCSI lun. %s",
+                     iscsi_get_error(iscsi));
+        acb->status = -EIO;
+    }
+
+    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
+    scsi_free_scsi_task(acb->task);
+    acb->task = NULL;
+}
+
+static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
+{
+    return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
+}
+
+static BlockDriverAIOCB *
+iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
+                 QEMUIOVector *qiov, int nb_sectors,
+                 BlockDriverCompletionFunc *cb,
+                 void *opaque)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+    IscsiAIOCB *acb;
+    size_t size;
+
+    acb = qemu_aio_get(&iscsi_aio_pool, bs, cb, opaque);
+    trace_iscsi_aio_writev(iscsi, sector_num, nb_sectors, opaque, acb);
+    if (!acb) {
+        return NULL;
+    }
+
+    acb->iscsilun = iscsilun;
+    acb->qiov     = qiov;
+
+    acb->canceled   = 0;
+
+    /* XXX we should pass the iovec to write10 to avoid the extra copy */
+    /* this will allow us to get rid of 'buf' completely */
+    size = nb_sectors * BDRV_SECTOR_SIZE;
+    acb->buf = qemu_malloc(size);
+    qemu_iovec_to_buffer(acb->qiov, acb->buf);
+    acb->task = iscsi_write10_task(iscsi, iscsilun->lun, acb->buf, size,
+                              sector_qemu2lun(sector_num, iscsilun),
+                              0, 0, iscsilun->block_size,
+                              iscsi_aio_write10_cb, acb);
+    if (acb->task == NULL) {
+        error_report("iSCSI: Failed to send write10 command. %s",
+                     iscsi_get_error(iscsi));
+        qemu_free(acb->buf);
+        qemu_aio_release(acb);
+        return NULL;
+    }
+
+    iscsi_set_events(iscsilun);
+
+    return &acb->common;
+}
+
+static void
+iscsi_aio_read10_cb(struct iscsi_context *iscsi, int status,
+                    void *command_data, void *opaque)
+{
+    IscsiAIOCB *acb = opaque;
+
+    trace_iscsi_aio_read10_cb(iscsi, status, acb, acb->canceled);
+
+    if (acb->canceled != 0) {
+        qemu_aio_release(acb);
+	scsi_free_scsi_task(acb->task);
+	acb->task = NULL;
+        return;
+    }
+
+    acb->status = 0;
+    if (status != 0) {
+        error_report("Failed to read10 data from iSCSI lun. %s",
+                     iscsi_get_error(iscsi));
+        acb->status = -EIO;
+    }
+
+    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
+    scsi_free_scsi_task(acb->task);
+    acb->task = NULL;
+}
+
+static BlockDriverAIOCB *
+iscsi_aio_readv(BlockDriverState *bs, int64_t sector_num,
+                QEMUIOVector *qiov, int nb_sectors,
+                BlockDriverCompletionFunc *cb,
+                void *opaque)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+    IscsiAIOCB *acb;
+    size_t qemu_read_size, lun_read_size;
+    int i;
+
+    qemu_read_size = BDRV_SECTOR_SIZE * (size_t)nb_sectors;
+
+    acb = qemu_aio_get(&iscsi_aio_pool, bs, cb, opaque);
+    trace_iscsi_aio_readv(iscsi, sector_num, nb_sectors, opaque, acb);
+    if (!acb) {
+        return NULL;
+    }
+
+    acb->iscsilun = iscsilun;
+    acb->qiov     = qiov;
+
+    acb->canceled    = 0;
+    acb->read_size   = qemu_read_size;
+    acb->buf         = NULL;
+
+    /* If the LUN blocksize is bigger than BDRV_SECTOR_SIZE a read from QEMU
+     * may be misaligned to the LUN, so we may need to read some extra
+     * data.
+     */
+    acb->read_offset = 0;
+    if (iscsilun->block_size > BDRV_SECTOR_SIZE) {
+        uint64_t bdrv_offset = BDRV_SECTOR_SIZE * sector_num;
+
+        acb->read_offset  = bdrv_offset % iscsilun->block_size;
+    }
+
+    lun_read_size  = (qemu_read_size + iscsilun->block_size
+                     + acb->read_offset - 1)
+                     / iscsilun->block_size * iscsilun->block_size;
+    acb->task = iscsi_read10_task(iscsi, iscsilun->lun,
+                             sector_qemu2lun(sector_num, iscsilun),
+                             lun_read_size, iscsilun->block_size,
+                             iscsi_aio_read10_cb, acb);
+    if (acb->task == NULL) {
+        error_report("iSCSI: Failed to send read10 command. %s",
+                     iscsi_get_error(iscsi));
+        qemu_aio_release(acb);
+        return NULL;
+    }
+
+    for (i = 0; i < acb->qiov->niov; i++) {
+        scsi_task_add_data_in_buffer(acb->task,
+                acb->qiov->iov[i].iov_len, 
+                acb->qiov->iov[i].iov_base);
+    }
+
+    iscsi_set_events(iscsilun);
+
+    return &acb->common;
+}
+
+
+static void
+iscsi_synccache10_cb(struct iscsi_context *iscsi, int status,
+                     void *command_data, void *opaque)
+{
+    IscsiAIOCB *acb = opaque;
+
+    if (acb->canceled != 0) {
+        qemu_aio_release(acb);
+	scsi_free_scsi_task(acb->task);
+	acb->task = NULL;
+        return;
+    }
+
+    acb->status = 0;
+    if (status < 0) {
+        error_report("Failed to sync10 data on iSCSI lun. %s",
+                     iscsi_get_error(iscsi));
+        acb->status = -EIO;
+    }
+
+    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
+    scsi_free_scsi_task(acb->task);
+    acb->task = NULL;
+}
+
+static BlockDriverAIOCB *
+iscsi_aio_flush(BlockDriverState *bs,
+                BlockDriverCompletionFunc *cb, void *opaque)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+    IscsiAIOCB *acb;
+
+    acb = qemu_aio_get(&iscsi_aio_pool, bs, cb, opaque);
+    if (!acb) {
+        return NULL;
+    }
+
+    acb->iscsilun = iscsilun;
+    acb->canceled   = 0;
+
+    acb->task = iscsi_synchronizecache10_task(iscsi, iscsilun->lun,
+                                         0, 0, 0, 0,
+                                         iscsi_synccache10_cb,
+                                         acb);
+    if (acb->task == NULL) {
+        error_report("iSCSI: Failed to send synchronizecache10 command. %s",
+                     iscsi_get_error(iscsi));
+        qemu_aio_release(acb);
+        return NULL;
+    }
+
+    iscsi_set_events(iscsilun);
+
+    return &acb->common;
+}
+
+static int64_t
+iscsi_getlength(BlockDriverState *bs)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    int64_t len;
+
+    len  = iscsilun->num_blocks;
+    len *= iscsilun->block_size;
+
+    return len;
+}
+
+static void
+iscsi_readcapacity10_cb(struct iscsi_context *iscsi, int status,
+                        void *command_data, void *opaque)
+{
+    struct IscsiTask *itask = opaque;
+    struct scsi_readcapacity10 *rc10;
+    struct scsi_task *task = command_data;
+
+    if (status != 0) {
+        error_report("iSCSI: Failed to read capacity of iSCSI lun. %s",
+                     iscsi_get_error(iscsi));
+        itask->status   = 1;
+        itask->complete = 1;
+	scsi_free_scsi_task(task);
+        return;
+    }
+
+    rc10 = scsi_datain_unmarshall(task);
+    if (rc10 == NULL) {
+        error_report("iSCSI: Failed to unmarshall readcapacity10 data.");
+        itask->status   = 1;
+        itask->complete = 1;
+	scsi_free_scsi_task(task);
+        return;
+    }
+
+    itask->iscsilun->block_size = rc10->block_size;
+    itask->iscsilun->num_blocks = rc10->lba;
+
+    itask->status   = 0;
+    itask->complete = 1;
+    scsi_free_scsi_task(task);
+}
+
+
+static void
+iscsi_connect_cb(struct iscsi_context *iscsi, int status, void *command_data,
+                 void *opaque)
+{
+    struct IscsiTask *itask = opaque;
+    struct scsi_task *task;
+
+    if (status != 0) {
+        itask->status   = 1;
+        itask->complete = 1;
+        return;
+    }
+
+    task = iscsi_readcapacity10_task(iscsi, itask->iscsilun->lun, 0, 0,
+                                   iscsi_readcapacity10_cb, opaque);
+    if (task == NULL) {
+        error_report("iSCSI: failed to send readcapacity command.");
+        itask->status   = 1;
+        itask->complete = 1;
+        return;
+    }
+}
+
+/*
+ * We support iSCSI URLs of the form
+ * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
+ */
+static int iscsi_open(BlockDriverState *bs, const char *filename, int flags)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = NULL;
+    struct iscsi_url *iscsi_url = NULL;
+    struct IscsiTask task;
+    int ret;
+
+    if ((BDRV_SECTOR_SIZE % 512) != 0) {
+        error_report("iSCSI: Invalid BDRV_SECTOR_SIZE. "
+                     "BDRV_SECTOR_SIZE(%lld) is not a multiple "
+                     "of 512", BDRV_SECTOR_SIZE);
+        return -EINVAL;
+    }
+
+    memset(iscsilun, 0, sizeof(IscsiLun));
+
+    /* Should really append the KVM name after the ':' here */
+    iscsi = iscsi_create_context("iqn.2008-11.org.linux-kvm:");
+    if (iscsi == NULL) {
+        error_report("iSCSI: Failed to create iSCSI context.");
+        ret = -ENOMEM;
+        goto failed;
+    }
+
+    iscsi_url = iscsi_parse_full_url(iscsi, filename);
+    if (iscsi_url == NULL) {
+        error_report("Failed to parse URL : %s %s", filename,
+                     iscsi_get_error(iscsi));
+        ret = -ENOMEM;
+        goto failed;
+    }
+
+    if (iscsi_set_targetname(iscsi, iscsi_url->target)) {
+        error_report("iSCSI: Failed to set target name.");
+        ret = -ENOMEM;
+        goto failed;
+    }
+
+    if (iscsi_url->user != NULL) {
+        ret = iscsi_set_initiator_username_pwd(iscsi, iscsi_url->user,
+                                              iscsi_url->passwd);
+        if (ret != 0) {
+            error_report("Failed to set initiator username and password");
+            ret = -ENOMEM;
+            goto failed;
+        }
+    }
+    if (iscsi_set_session_type(iscsi, ISCSI_SESSION_NORMAL) != 0) {
+        error_report("iSCSI: Failed to set session type to normal.");
+        ret = -ENOMEM;
+        goto failed;
+    }
+
+    iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_NONE_CRC32C);
+
+    task.iscsilun = iscsilun;
+    task.status = 0;
+    task.complete = 0;
+
+    iscsilun->iscsi = iscsi;
+    iscsilun->lun   = iscsi_url->lun;
+
+    if (iscsi_full_connect_async(iscsi, iscsi_url->portal, iscsi_url->lun,
+                                 iscsi_connect_cb, &task)
+        != 0) {
+        error_report("iSCSI: Failed to start async connect.");
+        ret = -ENOMEM;
+        goto failed;
+    }
+
+    async_context_push();
+    while (!task.complete) {
+        iscsi_set_events(iscsilun);
+        qemu_aio_wait();
+    }
+    async_context_pop();
+    if (task.status != 0) {
+        error_report("iSCSI: Failed to connect to LUN : %s",
+                     iscsi_get_error(iscsi));
+        ret = -EINVAL;
+        goto failed;
+    }
+
+    return 0;
+
+failed:
+    if (iscsi_url != NULL) {
+        iscsi_destroy_url(iscsi_url);
+    }
+    if (iscsi != NULL) {
+        iscsi_destroy_context(iscsi);
+    }
+    memset(iscsilun, 0, sizeof(IscsiLun));
+    return ret;
+}
+
+static void iscsi_close(BlockDriverState *bs)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL, NULL, NULL);
+    iscsi_destroy_context(iscsi);
+    memset(iscsilun, 0, sizeof(IscsiLun));
+}
+
+static BlockDriver bdrv_iscsi = {
+    .format_name     = "iscsi",
+    .protocol_name   = "iscsi",
+
+    .instance_size   = sizeof(IscsiLun),
+    .bdrv_file_open  = iscsi_open,
+    .bdrv_close      = iscsi_close,
+
+    .bdrv_getlength  = iscsi_getlength,
+
+    .bdrv_aio_readv  = iscsi_aio_readv,
+    .bdrv_aio_writev = iscsi_aio_writev,
+    .bdrv_aio_flush  = iscsi_aio_flush,
+};
+
+static void iscsi_block_init(void)
+{
+    bdrv_register(&bdrv_iscsi);
+}
+
+block_init(iscsi_block_init);
+
diff --git a/configure b/configure
index da2da04..7a71153 100755
--- a/configure
+++ b/configure
@@ -178,6 +178,7 @@ rbd=""
 smartcard=""
 smartcard_nss=""
 opengl="no"
+libiscsi=""
 
 # parse CC options first
 for opt do
@@ -632,6 +633,10 @@ for opt do
   ;;
   --enable-spice) spice="yes"
   ;;
+  --disable-libiscsi) libiscsi="no"
+  ;;
+  --enable-libiscsi) libiscsi="yes"
+  ;;
   --enable-profiler) profiler="yes"
   ;;
   --enable-cocoa)
@@ -941,6 +946,8 @@ echo "                           Default:trace-<pid>"
 echo "  --disable-spice          disable spice"
 echo "  --enable-spice           enable spice"
 echo "  --enable-rbd             enable building the rados block device (rbd)"
+echo "  --disable-libiscsi       disable iscsi support"
+echo "  --enable-libiscsi        enable iscsi support"
 echo "  --disable-smartcard      disable smartcard support"
 echo "  --enable-smartcard       enable smartcard support"
 echo "  --disable-smartcard-nss  disable smartcard nss support"
@@ -2282,6 +2289,25 @@ if compile_prog "" "" ; then
 fi
 
 ##########################################
+# Do we have libiscsi
+if test "$libiscsi" != "no" ; then
+  cat > $TMPC << EOF
+#include <iscsi/iscsi.h>
+int main(void) { iscsi_create_context(""); return 0; }
+EOF
+  if compile_prog "-Werror" "-liscsi" ; then
+    libiscsi="yes"
+    LIBS="$LIBS -liscsi"
+  else
+    if test "$libiscsi" = "yes" ; then
+      feature_not_found "libiscsi"
+    fi
+    libiscsi="no"
+  fi
+fi
+
+
+##########################################
 # Do we need librt
 cat > $TMPC <<EOF
 #include <signal.h>
@@ -2615,6 +2641,7 @@ echo "rbd support       $rbd"
 echo "xfsctl support    $xfs"
 echo "nss used          $smartcard_nss"
 echo "OpenGL support    $opengl"
+echo "libiscsi support  $libiscsi"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2909,6 +2936,10 @@ if test "$opengl" = "yes" ; then
   echo "CONFIG_OPENGL=y" >> $config_host_mak
 fi
 
+if test "$libiscsi" = "yes" ; then
+  echo "CONFIG_LIBISCSI=y" >> $config_host_mak
+fi
+
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
   echo "CONFIG_BSD=y" >> $config_host_mak
diff --git a/trace-events b/trace-events
index 703b745..05059a0 100644
--- a/trace-events
+++ b/trace-events
@@ -360,3 +360,9 @@ disable milkymist_uart_pulse_irq_tx(void) "Pulse IRQ TX"
 # hw/milkymist-vgafb.c
 disable milkymist_vgafb_memory_read(uint32_t addr, uint32_t value) "addr %08x value %08x"
 disable milkymist_vgafb_memory_write(uint32_t addr, uint32_t value) "addr %08x value %08x"
+
+# block/iscsi.c
+disable iscsi_aio_write10_cb(void *iscsi, int status, void *acb, int canceled) "iscsi %p status %d acb %p canceled %d"
+disable iscsi_aio_writev(void *iscsi, int64_t sector_num, int nb_sectors, void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque %p acb %p"
+disable iscsi_aio_read10_cb(void *iscsi, int status, void *acb, int canceled) "iscsi %p status %d acb %p canceled %d"
+disable iscsi_aio_readv(void *iscsi, int64_t sector_num, int nb_sectors, void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque %p acb %p"
-- 
1.7.3.1


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21  8:43 [Qemu-devel] iSCSI support for QEMU Ronnie Sahlberg
  2011-04-21  8:43 ` [Qemu-devel] [PATCH] iSCSI block driver support Ronnie Sahlberg
@ 2011-04-21  8:50 ` Christoph Hellwig
  2011-04-21  8:58   ` ronnie sahlberg
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2011-04-21  8:50 UTC (permalink / raw)
  To: Ronnie Sahlberg; +Cc: qemu-devel, stefanha

On Thu, Apr 21, 2011 at 06:43:10PM +1000, Ronnie Sahlberg wrote:
> Some basic tests that have been performed show it to be significantly faster
> than an out-of-the-box open-iscsi-mounted LUN being accessed with default
> QEMU I/O options.

Which isn't a useful comparison.  qemu's default is the braindead
cache=writethrough behaviour, which forces writes out to disk, and bloats the
pagecache, while your mail sounds like you silently always implement O_DIRECT-like
semantics.

Also the implementation has data integrity issues due to this.  It does
not actually force writes out to disk in the default cache mode, despite
qemu claiming to have WCE=0 in the default mode, i.e. you never flush
the cache.  You'll need to set the FUA bits on all writes if cache=writethrough
is used, with a fallback to a real cache flush in case the target doesn't
support the FUA bit.


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21  8:50 ` [Qemu-devel] iSCSI support for QEMU Christoph Hellwig
@ 2011-04-21  8:58   ` ronnie sahlberg
  2011-04-21  9:09     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: ronnie sahlberg @ 2011-04-21  8:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel, stefanha

Please re-read my post or read the patch.

It has O_DIRECT-like behaviour in that it will not pollute the host's cache.
This is for the simple reason that the host is not aware that there is
any block I/O happening.

In my patch, there are NO data integrity issues.
Data is sent out on the wire immediately as the guest issues the write.
Once the guest issues a flush call, the flush call will not terminate
until the SYNCCACHE10 task has completed.

I do not have any cache at all in libiscsi or the patch. All I/O is
always sent to the iSCSI target.
No data is ever cached anywhere.


I am just saying that in the tests I did,  I had to use
'cache=none,aio=native' to come anywhere near the libiscsi
performance.
All other options performed significantly worse.

I have no idea what those options have for effect on data integrity or
open-iscsi.
There should not be any integrity issues in libiscsi since it does not
cache at all, ever.


regards
ronnie sahlberg


On Thu, Apr 21, 2011 at 6:50 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Apr 21, 2011 at 06:43:10PM +1000, Ronnie Sahlberg wrote:
>> Some basic tests that have been performed show it to be significantly faster
>> than an out-of-the-box open-iscsi-mounted LUN being accessed with default
>> QEMU I/O options.
>
> Which isn't a useful comparison.  qemu's default is the braindead
> cache=writethrough behaviour, which forces writes out to disk, and bloats the
> pagecache, while your mail sounds like you silently always implement O_DIRECT-like
> semantics.
>
> Also the implementation has data integrity issues due to this.  It does
> not actually force writes out to disk in the default cache mode, despite
> qemu claiming to have WCE=0 in the default mode, i.e. you never flush
> the cache.  You'll need to set the FUA bits on all writes if cache=writethrough
> is used, with a fallback to a real cache flush in case the target doesn't
> support the FUA bit.
>
>


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21  8:58   ` ronnie sahlberg
@ 2011-04-21  9:09     ` Christoph Hellwig
  2011-04-21  9:28       ` ronnie sahlberg
  2011-04-21  9:47       ` ronnie sahlberg
  0 siblings, 2 replies; 16+ messages in thread
From: Christoph Hellwig @ 2011-04-21  9:09 UTC (permalink / raw)
  To: ronnie sahlberg; +Cc: Christoph Hellwig, stefanha, qemu-devel

> In my patch, there are NO data integrity issues.
> Data is sent out on the wire immediately as the guest issues the write.
> Once the guest issues a flush call, the flush call will not terminate
> until the SYNCCACHE10 task has completed.

No guest will ever issue a cache flush, as we claim to be WCE=0 by default.
Now if your target has WCE=1 it will cache data internally, and your
iscsi initiator will never flush it out to disk.

We only claim WCE=1 to the guest if cache=writeback or cache=none are
set.  So ignoring the issue of having a cache on the initiator side
you must implement stable writes for the default cache=writethrough
behaviour by either setting the FUA bit on your writes, or doing
a cache flush after every write in case the target does not support FUA.


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21  9:09     ` Christoph Hellwig
@ 2011-04-21  9:28       ` ronnie sahlberg
  2011-04-21 10:58         ` Stefan Hajnoczi
  2011-04-21  9:47       ` ronnie sahlberg
  1 sibling, 1 reply; 16+ messages in thread
From: ronnie sahlberg @ 2011-04-21  9:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel, stefanha

On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>> In my patch, there are NO data integrity issues.
>> Data is sent out on the wire immediately as the guest issues the write.
>> Once the guest issues a flush call, the flush call will not terminate
>> until the SYNCCACHE10 task has completed.
>
> No guest will ever issue a cache flush, as we claim to be WCE=0 by default.
> Now if your target has WCE=1 it will cache data internally, and your
> iscsi initiator will never flush it out to disk.

My target does not do any caching at all.
It happily ignores both the FUA and FUA_NV bits and always destages all
data to stable storage before
sending SCSI_STATUS_GOOD back to the initiator.

I use the same target, the same LUN and the same settings when testing
both QEMU+open-iscsi-mounted LUN and QEMU+libiscsi.

I do not understand why my target would have a data integrity problem
when used with libiscsi
but not with an open-iscsi-mounted LUN.


>
> We only claim WCE=1 to the guest if cache=writeback or cache=none are
> set.  So ignoring the issue of having a cache on the initiator side
> you must implement stable writes for the default cache=writethrough
> behaviour by either setting the FUA bit on your writes, or doing
> a cache flush after every write in case the target does not support FUA.

My target right now does such flushes for writes.


I fail to see why FUA, FUA_NV or flushes have any relevance to a test
that just involves reading data off the LUN.



I think this discussion is strange. I would like a discussion about the
merits of my patch, and whether features like built-in
iSCSI support, which enterprise users find useful, are desirable in
QEMU. I do not find discussions about the semantics of
my particular iSCSI target to be meaningful for that purpose.



regards
ronnie sahlberg


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21  9:09     ` Christoph Hellwig
  2011-04-21  9:28       ` ronnie sahlberg
@ 2011-04-21  9:47       ` ronnie sahlberg
  1 sibling, 0 replies; 16+ messages in thread
From: ronnie sahlberg @ 2011-04-21  9:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel, stefanha

Christoph,

I think you misread my test.
My test is purely reading:

sudo time dd if=/dev/sda of=/dev/null bs=1M

There are no writes involved in this test at all, only a huge number
of READ10 commands being sent to the target,
or, in the QEMU+open-iscsi-mounted-LUN case, sometimes
being served out of the page cache of the host.


Since open-iscsi-mounted LUNs by default perform so poorly
compared to libiscsi, I assume that very few blocks are being served
out of the cache of the host.
This is based on the fact that a block served out of the cache would have
significantly lower access latency, by orders of magnitude,
than a block that needs to be fetched across a 1GbE network.
As open-iscsi performs so much more poorly than libiscsi in this
case, I can only speculate that very few blocks are delivered by cache
hits.




I have absolutely no idea why QEMU+open-iscsi would perform so
much better for a read-intensive workload like this when setting
cache=none,aio=native.  That is for the QEMU developers to explain.


Maybe doing READ10 through open-iscsi is very expensive? Maybe
something else in the linux kernel makes reads very expensive unless
you use "cache=none,aio=native"?
Who knows?

I have no idea, other than that without "cache=none,aio=native", QEMU
performance for read-intensive tasks is significantly slower than
QEMU doing the exact same reads using libiscsi.


I really don't care why QEMU+open-iscsi performs so badly either. That is
of very little interest to me. As long as libiscsi is not
significantly worse than open-iscsi, I care very little about why.


regards
ronnie sahlberg


On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>> In my patch, there are NO data integrity issues.
>> Data is sent out on the wire immediately as the guest issues the write.
>> Once the guest issues a flush call, the flush call will not terminate
>> until the SYNCCACHE10 task has completed.
>
> No guest will even issue a cache flush, as we claim to be WCE=0 by default.
> Now if you target has WCE=1 it will cache data internally, and your
> iscsi initiator will never flush it out to disk.
>
> We only claim WCE=1 to the guest if cache=writeback or cache=none are
> set.  So ignoring the issue of having a cache on the initiator side
> you must implement stable writes for the default cache=writethrough
> behaviour by either seeting the FUA bit on your writes, or doing
> a cache flush after every write in case the target does not support FUA.
>
>


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21  9:28       ` ronnie sahlberg
@ 2011-04-21 10:58         ` Stefan Hajnoczi
  2011-04-21 11:12           ` ronnie sahlberg
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2011-04-21 10:58 UTC (permalink / raw)
  To: ronnie sahlberg; +Cc: Christoph Hellwig, stefanha, qemu-devel

On Thu, Apr 21, 2011 at 10:28 AM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>> We only claim WCE=1 to the guest if cache=writeback or cache=none are
>> set.  So ignoring the issue of having a cache on the initiator side
>> you must implement stable writes for the default cache=writethrough
>> behaviour by either seeting the FUA bit on your writes, or doing
>> a cache flush after every write in case the target does not support FUA.
>
> My target right now does such flushes for writes.
>
>
> I fail to see why FUA, FUA_NV or flushes have any relevance to a test
> that just involves reading data off the lun.

I'll try to rephrase what Christoph has pointed out.

When QEMU is run with cache=writethrough (default), QEMU does not
report a write cache on the emulated disk.  The guest believes that
all writes are stable because there is no disk write cache.  Therefore
the guest does not need to issue synchronize cache commands ever.

In order to meet these semantics with libiscsi, we would need to set
FUA or send a synchronize cache command for every write.  (QEMU's
raw-posix.c file I/O meets these semantics by opening the image file
with O_DSYNC when cache=writethrough.)

> I do not understand why my target would have data integrity problem
> when used with libiscsi
> but not with open-iscsi mounted lun?

In the open-iscsi cache=writethrough case, QEMU's raw-posix.c opens
the file with O_DSYNC.  Open-iscsi must set the FUA bit or synchronize
cache for each write request.

How does libiscsi behave in this case?

Stefan


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21 10:58         ` Stefan Hajnoczi
@ 2011-04-21 11:12           ` ronnie sahlberg
  2011-04-21 11:21             ` Stefan Hajnoczi
  0 siblings, 1 reply; 16+ messages in thread
From: ronnie sahlberg @ 2011-04-21 11:12 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Christoph Hellwig, stefanha, qemu-devel

On Thu, Apr 21, 2011 at 8:58 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Apr 21, 2011 at 10:28 AM, ronnie sahlberg
> <ronniesahlberg@gmail.com> wrote:
>> On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>>> We only claim WCE=1 to the guest if cache=writeback or cache=none are
>>> set.  So ignoring the issue of having a cache on the initiator side
>>> you must implement stable writes for the default cache=writethrough
>>> behaviour by either seeting the FUA bit on your writes, or doing
>>> a cache flush after every write in case the target does not support FUA.
>>
>> My target right now does such flushes for writes.
>>
>>
>> I fail to see why FUA, FUA_NV or flushes have any relevance to a test
>> that just involves reading data off the lun.
>
> I'll try to rephrase what Christoph has pointed out.
>
> When QEMU is run with cache=writethrough (default), QEMU does not
> report a write cache on the emulated disk.  The guest believes that
> all writes are stable because there is no disk write cache.  Therefore
> the guest does not need to issue synchronize cache commands ever.
>
> In order to meet these semantics with libiscsi, we would need to set
> FUA or send a synchronize cache command for every write.  (QEMU's
> raw-posix.c file I/O meets these semantics by opening the image file
> with O_DSYNC when cache=writethrough.)
>
>> I do not understand why my target would have data integrity problem
>> when used with libiscsi
>> but not with open-iscsi mounted lun?
>
> In the open-iscsi cache=writethrough case, QEMU's raw-posix.c opens
> the file with O_DSYNC.  Open-iscsi must set the FUA bit or synchronize
> cache for each write request.
>
> How does libiscsi behave in this case?

libiscsi ignores the O_DSYNC flag.
It does not matter, for two reasons:
* my target always destages to disk before replying, i.e. my target
ALWAYS writes data synchronously to stable storage
* the test we are talking about is for READ10: reads, not writes.


Seriously, please explain
in what exact way write semantics, FUA bits and write destage
policy are relevant here:

sudo time dd if=/dev/sda of=/dev/null bs=1M


I seriously do not understand. Please educate me.



regards
ronnie sahlberg


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21 11:12           ` ronnie sahlberg
@ 2011-04-21 11:21             ` Stefan Hajnoczi
  2011-04-21 11:36               ` ronnie sahlberg
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2011-04-21 11:21 UTC (permalink / raw)
  To: ronnie sahlberg; +Cc: Christoph Hellwig, stefanha, qemu-devel

On Thu, Apr 21, 2011 at 12:12 PM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> On Thu, Apr 21, 2011 at 8:58 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Thu, Apr 21, 2011 at 10:28 AM, ronnie sahlberg
>> <ronniesahlberg@gmail.com> wrote:
>>> On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>>>> We only claim WCE=1 to the guest if cache=writeback or cache=none are
>>>> set.  So ignoring the issue of having a cache on the initiator side
>>>> you must implement stable writes for the default cache=writethrough
>>>> behaviour by either seeting the FUA bit on your writes, or doing
>>>> a cache flush after every write in case the target does not support FUA.
>>>
>>> My target right now does such flushes for writes.
>>>
>>>
>>> I fail to see why FUA, FUA_NV or flushes have any relevance to a test
>>> that just involves reading data off the lun.
>>
>> I'll try to rephrase what Christoph has pointed out.
>>
>> When QEMU is run with cache=writethrough (default), QEMU does not
>> report a write cache on the emulated disk.  The guest believes that
>> all writes are stable because there is no disk write cache.  Therefore
>> the guest does not need to issue synchronize cache commands ever.
>>
>> In order to meet these semantics with libiscsi, we would need to set
>> FUA or send a synchronize cache command for every write.  (QEMU's
>> raw-posix.c file I/O meets these semantics by opening the image file
>> with O_DSYNC when cache=writethrough.)
>>
>>> I do not understand why my target would have data integrity problem
>>> when used with libiscsi
>>> but not with open-iscsi mounted lun?
>>
>> In the open-iscsi cache=writethrough case, QEMU's raw-posix.c opens
>> the file with O_DSYNC.  Open-iscsi must set the FUA bit or synchronize
>> cache for each write request.
>>
>> How does libiscsi behave in this case?
>
> libiscsi ignores the O_DSYNC flag.
> It does not matter for two reasons:
> * my target always destage to disk before replying. I.e. my target
> ALWAYS write data synchronously to stable storage

Does the libiscsi initiator ensure this?  What if I use a different
target or configure it differently; will libiscsi take care to ensure
the semantics are still met?

> * this test we are talking about is for READ10,   reads, not writes.

I was not talking about a specific test.

> Serioulsly, please explain,
> in what exact way are write semantics and FUA bits and write destage
> policy relevant here :
>
> sudo time dd if=/dev/sda of=/dev/null bs=1M
>
>
> I seriously do not understand. Please educate me.

Write semantics are completely independent of this dd read example.

Stefan


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21 11:21             ` Stefan Hajnoczi
@ 2011-04-21 11:36               ` ronnie sahlberg
  2011-04-21 11:44                 ` Kevin Wolf
                                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: ronnie sahlberg @ 2011-04-21 11:36 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Christoph Hellwig, stefanha, qemu-devel

Stefan,

I understand.

Let me re-send a patch tomorrow that can optionally enable/force the FUA
bit for writes.
There are some high-volume arrays that advertise support but fail any
CDB with the FUA or FUA_NV bits set, returning a sense error, so it
needs to be made optional.


regards
ronnie sahlberg

On Thu, Apr 21, 2011 at 9:21 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Apr 21, 2011 at 12:12 PM, ronnie sahlberg
> <ronniesahlberg@gmail.com> wrote:
>> On Thu, Apr 21, 2011 at 8:58 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>> On Thu, Apr 21, 2011 at 10:28 AM, ronnie sahlberg
>>> <ronniesahlberg@gmail.com> wrote:
>>>> On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>>>>> We only claim WCE=1 to the guest if cache=writeback or cache=none are
>>>>> set.  So ignoring the issue of having a cache on the initiator side
>>>>> you must implement stable writes for the default cache=writethrough
>>>>> behaviour by either seeting the FUA bit on your writes, or doing
>>>>> a cache flush after every write in case the target does not support FUA.
>>>>
>>>> My target right now does such flushes for writes.
>>>>
>>>>
>>>> I fail to see why FUA, FUA_NV or flushes have any relevance to a test
>>>> that just involves reading data off the lun.
>>>
>>> I'll try to rephrase what Christoph has pointed out.
>>>
>>> When QEMU is run with cache=writethrough (default), QEMU does not
>>> report a write cache on the emulated disk.  The guest believes that
>>> all writes are stable because there is no disk write cache.  Therefore
>>> the guest does not need to issue synchronize cache commands ever.
>>>
>>> In order to meet these semantics with libiscsi, we would need to set
>>> FUA or send a synchronize cache command for every write.  (QEMU's
>>> raw-posix.c file I/O meets these semantics by opening the image file
>>> with O_DSYNC when cache=writethrough.)
>>>
>>>> I do not understand why my target would have data integrity problem
>>>> when used with libiscsi
>>>> but not with open-iscsi mounted lun?
>>>
>>> In the open-iscsi cache=writethrough case, QEMU's raw-posix.c opens
>>> the file with O_DSYNC.  Open-iscsi must set the FUA bit or synchronize
>>> cache for each write request.
>>>
>>> How does libiscsi behave in this case?
>>
>> libiscsi ignores the O_DSYNC flag.
>> It does not matter for two reasons:
>> * my target always destage to disk before replying. I.e. my target
>> ALWAYS write data synchronously to stable storage
>
> Does libiscsi initiator ensure this?  What if I use a different target
> or configure it differently, will libiscsi take care to ensure the
> semantics are still met?
>
>> * this test we are talking about is for READ10,   reads, not writes.
>
> I was not talking about a specific test.
>
>> Serioulsly, please explain,
>> in what exact way are write semantics and FUA bits and write destage
>> policy relevant here :
>>
>> sudo time dd if=/dev/sda of=/dev/null bs=1M
>>
>>
>> I seriously do not understand. Please educate me.
>
> Write semantics are completely independent of this dd read example.
>
> Stefan
>


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21 11:36               ` ronnie sahlberg
@ 2011-04-21 11:44                 ` Kevin Wolf
  2011-04-21 12:08                 ` Stefan Hajnoczi
  2011-04-21 12:49                 ` Christoph Hellwig
  2 siblings, 0 replies; 16+ messages in thread
From: Kevin Wolf @ 2011-04-21 11:44 UTC (permalink / raw)
  To: ronnie sahlberg; +Cc: qemu-devel, Stefan Hajnoczi, Christoph Hellwig, stefanha

On 21.04.2011 13:36, ronnie sahlberg wrote:
> Stephan,
> 
> I understand.
> 
> Let me re-send a patch tomorrow that can optionally enable/force FUA
> bits for write.
> There are some high-volume arrays that advertise support but fail any
> cdb with FUA, FUA_NV bits set with sense, so it needs to be made optional.

You should make it dependent on the cache mode, i.e. you set FUA for
cache=writethrough and leave it alone for the other cache modes.

Kevin


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21 11:36               ` ronnie sahlberg
  2011-04-21 11:44                 ` Kevin Wolf
@ 2011-04-21 12:08                 ` Stefan Hajnoczi
  2011-04-21 12:49                 ` Christoph Hellwig
  2 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2011-04-21 12:08 UTC (permalink / raw)
  To: ronnie sahlberg; +Cc: Christoph Hellwig, stefanha, qemu-devel

On Thu, Apr 21, 2011 at 12:36 PM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> I understand.
>
> Let me re-send a patch tomorrow that can optionally enable/force FUA
> bits for write.
> There are some high-volume arrays that advertise support but fail any
> cdb with FUA, FUA_NV bits set with sense, so it needs to be made optional.

Maybe we should just disable write cache on the LUN (setting WCE=0) if
cache=writethrough?  Then we don't need to worry about FUA or
synchronize cache.

Stefan


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21 11:36               ` ronnie sahlberg
  2011-04-21 11:44                 ` Kevin Wolf
  2011-04-21 12:08                 ` Stefan Hajnoczi
@ 2011-04-21 12:49                 ` Christoph Hellwig
  2011-04-21 20:25                   ` ronnie sahlberg
  2 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2011-04-21 12:49 UTC (permalink / raw)
  To: ronnie sahlberg; +Cc: Stefan Hajnoczi, Christoph Hellwig, stefanha, qemu-devel

On Thu, Apr 21, 2011 at 09:36:12PM +1000, ronnie sahlberg wrote:
> There are some high-volume arrays that advertise support but fail any
> cdb with FUA, FUA_NV bits set with sense, so it needs to be made optional.

Which one would that be?  Linux uses the FUA bit if the device advertises support
via the DPOFUA bit since 2005, and prints a warning if a FUA write fails,
and since last year even fails the request hard.  I'd be really surprised
if a common device fails FUA writes and we didn't know about it by now.

Either way you'll have to still guarantee data made it to non-volatile
storage for cache=writethrough mode, either by disabling the WCE bit
using MODE SELECT, or by flushing the cache after every write.


* Re: [Qemu-devel] iSCSI support for QEMU
  2011-04-21 12:49                 ` Christoph Hellwig
@ 2011-04-21 20:25                   ` ronnie sahlberg
  0 siblings, 0 replies; 16+ messages in thread
From: ronnie sahlberg @ 2011-04-21 20:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Stefan Hajnoczi, qemu-devel, stefanha

On Thu, Apr 21, 2011 at 10:49 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Apr 21, 2011 at 09:36:12PM +1000, ronnie sahlberg wrote:
>> There are some high-volume arrays that advertise support but fail any
>> cdb with FUA, FUA_NV bits set with sense, so it needs to be made optional.
>
> Which on would that be?  Linux uses the FUA bit if the device advertises support
> via the DPOFUA bit since 2005, and prints a warning if a FUA write fails,
> and since last year even fails the request hard.  I'd be really surprised
> if a common device fails FUA writes and we didn't know about it by now.
>
> Either way you'll have to still guarantee data made it to non-volatile
> storage for cache=writethrough mode, either by disabling the WCE bit
> using MODE SELECT, or by flushing the cache after every write.
>

Thank you for your patience.
I understand and I have updated the patch to set FUA when the _WB flag
is not set.

regards
ronnie sahlberg


* [Qemu-devel] iSCSI support for QEMU
@ 2011-06-12  2:54 ronnie sahlberg
  0 siblings, 0 replies; 16+ messages in thread
From: ronnie sahlberg @ 2011-06-12  2:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha, hch

Sorry about the missing subject line in the previous mail.
I got confused when using git-send-email :-)

regards
ronnie sahlberg

On Sun, Jun 12, 2011 at 12:47 PM, Ronnie Sahlberg
<ronniesahlberg@gmail.com> wrote:
> Please find attached a patch to add built-in support for iSCSI into QEMU.
> Please review and/or apply this patch.
>
> This is the latest version of this patch and I think I have addressed all previous concerns and suggestions.
>
>
> Using built-in iSCSI support has many advantages for certain use cases :
>
> * if you have very many iSCSI devices it may be impractical to expose
> all LUNs to the underlying host.
>
> * the devices become private to the guest and are not visible to the host.
> This automatically avoids polluting the page-cache on the host.
>
> * security, the devices are private to the guest, which prevents other guests or the host itself from accessing the devices.
>
> * security, it allows non-root users a secure way to get private and password protected access to the device by using CHAP authentication.
>
> * migration, many other virtualization systems provide built-in iscsi clients like this. Also providing this as feature in QEMU may make it easier for such users to migrate over to QEMU/KVM.
>
> * easier to maintain. For users with very many QEMU instances I think having guest-private iscsi targets and LUNs may be easier to manage than 'huge set of files in /dev/scsi/*'
>
> * easier to maintain, when copying a QEMU instance from one host to another, this offers a more self-contained solution where the QEMU command line itself contains all required storage configuration, instead of also having to coordinate this move with setting up and tearing down open-iscsi logins on the underlying hosts.
>
>
>
>
> The patch has been tested by installing and running several distributions such as RHEL6 and Ubuntu on devices, both system disk as well as the installer CDROM as being iscsi devices.
>
> This testing has also been performed by running full QEMU under valgrind.
>
> Performance is comparable to using open-iscsi devices.
>
>
>


end of thread, other threads:[~2011-06-12  2:54 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-21  8:43 [Qemu-devel] iSCSI support for QEMU Ronnie Sahlberg
2011-04-21  8:43 ` [Qemu-devel] [PATCH] iSCSI block driver support Ronnie Sahlberg
2011-04-21  8:50 ` [Qemu-devel] iSCSI support for QEMU Christoph Hellwig
2011-04-21  8:58   ` ronnie sahlberg
2011-04-21  9:09     ` Christoph Hellwig
2011-04-21  9:28       ` ronnie sahlberg
2011-04-21 10:58         ` Stefan Hajnoczi
2011-04-21 11:12           ` ronnie sahlberg
2011-04-21 11:21             ` Stefan Hajnoczi
2011-04-21 11:36               ` ronnie sahlberg
2011-04-21 11:44                 ` Kevin Wolf
2011-04-21 12:08                 ` Stefan Hajnoczi
2011-04-21 12:49                 ` Christoph Hellwig
2011-04-21 20:25                   ` ronnie sahlberg
2011-04-21  9:47       ` ronnie sahlberg
  -- strict thread matches above, loose matches on Subject: below --
2011-06-12  2:54 ronnie sahlberg
