* [Qemu-devel] iSCSI support for QEMU
From: Ronnie Sahlberg @ 2011-04-21  8:43 UTC
To: qemu-devel; +Cc: stefanha

Please find attached a new version of the patch for iSCSI support.

iSCSI support is automatically detected and activated during
configure/build if the libiscsi library is available on the build host.
This library is available at: https://github.com/sahlberg/libiscsi

This new version contains two changes since previous versions:

* avoid copying data in the read path, using the API to read directly
  into the application buffers
* use proper task-management functions to abort tasks that are
  cancelled by QEMU

A built-in iSCSI initiator has many benefits:

* The LUNs used are local to the guest and are not exposed to the host,
  nor to any of the processes running on the host.
* It provides very high performance out of the box; open-iscsi requires
  several special flags before it matches libiscsi.
* It avoids cache thrashing and keeping two copies of the same data,
  one in the host's cache and a second copy in the guest, i.e.
  O_DIRECT-like behaviour.
* Management of LUNs is much easier for high-end or enterprise users
  with very many iSCSI devices. You no longer need to expose many
  hundreds of devices to the local host and have them pollute its
  caches.

Performance is acceptable with libiscsi. Some basic tests that have
been performed show it to be significantly faster than an
out-of-the-box open-iscsi mounted LUN accessed with QEMU's default I/O
options. The test booted an Ubuntu 10.04 guest from an 8 GB STGT LUN
exposed across a 1 GbE network and then, once the system had booted,
measured how long it took to run, inside the guest,
"sudo time dd if=/dev/sda of=/dev/null bs=1M", i.e. just have the guest
read the whole 8 GB system disk.

QEMU+libiscsi                   : 228 seconds
QEMU, default device options    : 273 seconds
QEMU with cache=none            : 387 seconds
QEMU with cache=none,aio=native : 218 seconds

Please review and consider for inclusion.

regards
ronnie sahlberg
* [Qemu-devel] [PATCH] iSCSI block driver support
From: Ronnie Sahlberg @ 2011-04-21  8:43 UTC
To: qemu-devel; +Cc: stefanha, Ronnie Sahlberg

This patch adds a new block driver: block/iscsi.c

This driver interfaces with the multiplatform POSIX library for iSCSI
initiator/client access to iSCSI devices, hosted at
git://github.com/sahlberg/libiscsi.git

The patch adds the driver to interface with the iSCSI library. It also
updates the configure script to:

* by default, probe whether libiscsi is available and, if so, build
  QEMU against libiscsi.
* --enable-libiscsi
  Force a build against libiscsi. If libiscsi is not available the
  build will fail.
* --disable-libiscsi
  Do not link against libiscsi, even if it is available.

When linked with libiscsi, QEMU gains support for accessing iSCSI
resources such as disks and CD-ROMs directly, without having to make
the devices visible to the host.

You can specify devices using an iSCSI URL of the form:

iscsi://[<username>[:<password>]@]<host>[:<port>]/<target-iqn-name>/<lun>

When using authentication, the password can optionally be set with
LIBISCSI_CHAP_PASSWORD="password" to avoid it showing up in the
process list.

Example:
-drive file=iscsi://10.1.1.1:3260/iqn.ronnie.test/1
-cdrom iscsi://10.1.1.1:3260/iqn.ronnie.test/2

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
---
 Makefile.objs |    1 +
 block/iscsi.c |  590 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 configure     |   31 +++
 trace-events  |    6 +
 4 files changed, 628 insertions(+), 0 deletions(-)
 create mode 100644 block/iscsi.c

diff --git a/Makefile.objs b/Makefile.objs
index 44ce368..8403c66 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -25,6 +25,7 @@ block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
+block-nested-$(CONFIG_LIBISCSI) += iscsi.o
 block-nested-$(CONFIG_CURL) += curl.o
 block-nested-$(CONFIG_RBD) += rbd.o
 
diff --git a/block/iscsi.c b/block/iscsi.c
new file mode 100644
index 0000000..31831ec
--- /dev/null
+++ b/block/iscsi.c
@@ -0,0 +1,590 @@
+/*
+ * QEMU Block driver for iSCSI images
+ *
+ * Copyright (c) 2010 Ronnie Sahlberg <ronniesahlberg@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config-host.h"
+
+#include <poll.h>
+#include "sysemu.h"
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "block_int.h"
+#include "trace.h"
+
+#include <iscsi/iscsi.h>
+#include <iscsi/scsi-lowlevel.h>
+
+
+typedef struct IscsiLun {
+    struct iscsi_context *iscsi;
+    int lun;
+    int block_size;
+    unsigned long num_blocks;
+} IscsiLun;
+
+typedef struct IscsiAIOCB {
+    BlockDriverAIOCB common;
+    QEMUIOVector *qiov;
+    QEMUBH *bh;
+    IscsiLun *iscsilun;
+    struct scsi_task *task;
+    uint8_t *buf;
+    int canceled;
+    int status;
+    size_t read_size;
+    size_t read_offset;
+} IscsiAIOCB;
+
+struct IscsiTask {
+    IscsiLun *iscsilun;
+    int status;
+    int complete;
+};
+
+static void
+iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data,
+                    void *private_data)
+{
+}
+
+static void
+iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    IscsiAIOCB *acb = (IscsiAIOCB *)blockacb;
+    IscsiLun *iscsilun = acb->iscsilun;
+
+    acb->status = -ECANCELED;
+    acb->common.cb(acb->common.opaque, acb->status);
+    acb->canceled = 1;
+
+    iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task,
+                                     iscsi_abort_task_cb, NULL);
+}
+
+static AIOPool iscsi_aio_pool = {
+    .aiocb_size = sizeof(IscsiAIOCB),
+    .cancel     = iscsi_aio_cancel,
+};
+
+
+static void iscsi_process_read(void *arg);
+static void iscsi_process_write(void *arg);
+
+static int iscsi_process_flush(void *arg)
+{
+    IscsiLun *iscsilun = arg;
+
+    return iscsi_queue_length(iscsilun->iscsi) > 0;
+}
+
+static void
+iscsi_set_events(IscsiLun *iscsilun)
+{
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read,
+                            (iscsi_which_events(iscsi) & POLLOUT)
+                            ? iscsi_process_write : NULL,
+                            iscsi_process_flush, NULL, iscsilun);
+}
+
+static void
+iscsi_process_read(void *arg)
+{
+    IscsiLun *iscsilun = arg;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    iscsi_service(iscsi, POLLIN);
+    iscsi_set_events(iscsilun);
+}
+
+static void
+iscsi_process_write(void *arg)
+{
+    IscsiLun *iscsilun = arg;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    iscsi_service(iscsi, POLLOUT);
+    iscsi_set_events(iscsilun);
+}
+
+
+static int
+iscsi_schedule_bh(QEMUBHFunc *cb, IscsiAIOCB *acb)
+{
+    acb->bh = qemu_bh_new(cb, acb);
+    if (!acb->bh) {
+        error_report("oom: could not create iscsi bh");
+        return -EIO;
+    }
+
+    qemu_bh_schedule(acb->bh);
+    return 0;
+}
+
+static void
+iscsi_readv_writev_bh_cb(void *p)
+{
+    IscsiAIOCB *acb = p;
+
+    qemu_bh_delete(acb->bh);
+
+    if (acb->status != -ECANCELED) {
+        acb->common.cb(acb->common.opaque, acb->status);
+    }
+
+    qemu_aio_release(acb);
+}
+
+
+static void
+iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status,
+                     void *command_data, void *opaque)
+{
+    IscsiAIOCB *acb = opaque;
+
+    trace_iscsi_aio_write10_cb(iscsi, status, acb, acb->canceled);
+
+    if (acb->buf != NULL) {
+        free(acb->buf);
+    }
+
+    if (acb->canceled != 0) {
+        qemu_aio_release(acb);
+        scsi_free_scsi_task(acb->task);
+        acb->task = NULL;
+        return;
+    }
+
+    acb->status = 0;
+    if (status < 0) {
+        error_report("Failed to write10 data to iSCSI lun. %s",
+                     iscsi_get_error(iscsi));
+        acb->status = -EIO;
+    }
+
+    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
+    scsi_free_scsi_task(acb->task);
+    acb->task = NULL;
+}
+
+static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
+{
+    return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
+}
+
+static BlockDriverAIOCB *
+iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
+                 QEMUIOVector *qiov, int nb_sectors,
+                 BlockDriverCompletionFunc *cb,
+                 void *opaque)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+    IscsiAIOCB *acb;
+    size_t size;
+
+    acb = qemu_aio_get(&iscsi_aio_pool, bs, cb, opaque);
+    trace_iscsi_aio_writev(iscsi, sector_num, nb_sectors, opaque, acb);
+    if (!acb) {
+        return NULL;
+    }
+
+    acb->iscsilun = iscsilun;
+    acb->qiov     = qiov;
+
+    acb->canceled = 0;
+
+    /* XXX we should pass the iovec to write10 to avoid the extra copy */
+    /* this will allow us to get rid of 'buf' completely */
+    size = nb_sectors * BDRV_SECTOR_SIZE;
+    acb->buf = qemu_malloc(size);
+    qemu_iovec_to_buffer(acb->qiov, acb->buf);
+    acb->task = iscsi_write10_task(iscsi, iscsilun->lun, acb->buf, size,
+                                   sector_qemu2lun(sector_num, iscsilun),
+                                   0, 0, iscsilun->block_size,
+                                   iscsi_aio_write10_cb, acb);
+    if (acb->task == NULL) {
+        error_report("iSCSI: Failed to send write10 command. %s",
+                     iscsi_get_error(iscsi));
+        qemu_free(acb->buf);
+        qemu_aio_release(acb);
+        return NULL;
+    }
+
+    iscsi_set_events(iscsilun);
+
+    return &acb->common;
+}
+
+static void
+iscsi_aio_read10_cb(struct iscsi_context *iscsi, int status,
+                    void *command_data, void *opaque)
+{
+    IscsiAIOCB *acb = opaque;
+
+    trace_iscsi_aio_read10_cb(iscsi, status, acb, acb->canceled);
+
+    if (acb->canceled != 0) {
+        qemu_aio_release(acb);
+        scsi_free_scsi_task(acb->task);
+        acb->task = NULL;
+        return;
+    }
+
+    acb->status = 0;
+    if (status != 0) {
+        error_report("Failed to read10 data from iSCSI lun. %s",
+                     iscsi_get_error(iscsi));
+        acb->status = -EIO;
+    }
+
+    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
+    scsi_free_scsi_task(acb->task);
+    acb->task = NULL;
+}
+
+static BlockDriverAIOCB *
+iscsi_aio_readv(BlockDriverState *bs, int64_t sector_num,
+                QEMUIOVector *qiov, int nb_sectors,
+                BlockDriverCompletionFunc *cb,
+                void *opaque)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+    IscsiAIOCB *acb;
+    size_t qemu_read_size, lun_read_size;
+    int i;
+
+    qemu_read_size = BDRV_SECTOR_SIZE * (size_t)nb_sectors;
+
+    acb = qemu_aio_get(&iscsi_aio_pool, bs, cb, opaque);
+    trace_iscsi_aio_readv(iscsi, sector_num, nb_sectors, opaque, acb);
+    if (!acb) {
+        return NULL;
+    }
+
+    acb->iscsilun = iscsilun;
+    acb->qiov     = qiov;
+
+    acb->canceled  = 0;
+    acb->read_size = qemu_read_size;
+    acb->buf       = NULL;
+
+    /* If the LUN blocksize is bigger than BDRV_SECTOR_SIZE a read from
+     * QEMU may be misaligned to the LUN, so we may need to read some
+     * extra data.
+     */
+    acb->read_offset = 0;
+    if (iscsilun->block_size > BDRV_SECTOR_SIZE) {
+        uint64_t bdrv_offset = BDRV_SECTOR_SIZE * sector_num;
+
+        acb->read_offset = bdrv_offset % iscsilun->block_size;
+    }
+
+    lun_read_size = (qemu_read_size + iscsilun->block_size
+                     + acb->read_offset - 1)
+                    / iscsilun->block_size * iscsilun->block_size;
+    acb->task = iscsi_read10_task(iscsi, iscsilun->lun,
+                                  sector_qemu2lun(sector_num, iscsilun),
+                                  lun_read_size, iscsilun->block_size,
+                                  iscsi_aio_read10_cb, acb);
+    if (acb->task == NULL) {
+        error_report("iSCSI: Failed to send read10 command. %s",
+                     iscsi_get_error(iscsi));
+        qemu_aio_release(acb);
+        return NULL;
+    }
+
+    for (i = 0; i < acb->qiov->niov; i++) {
+        scsi_task_add_data_in_buffer(acb->task,
+                                     acb->qiov->iov[i].iov_len,
+                                     acb->qiov->iov[i].iov_base);
+    }
+
+    iscsi_set_events(iscsilun);
+
+    return &acb->common;
+}
+
+
+static void
+iscsi_synccache10_cb(struct iscsi_context *iscsi, int status,
+                     void *command_data, void *opaque)
+{
+    IscsiAIOCB *acb = opaque;
+
+    if (acb->canceled != 0) {
+        qemu_aio_release(acb);
+        scsi_free_scsi_task(acb->task);
+        acb->task = NULL;
+        return;
+    }
+
+    acb->status = 0;
+    if (status < 0) {
+        error_report("Failed to sync10 data on iSCSI lun. %s",
+                     iscsi_get_error(iscsi));
+        acb->status = -EIO;
+    }
+
+    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
+    scsi_free_scsi_task(acb->task);
+    acb->task = NULL;
+}
+
+static BlockDriverAIOCB *
+iscsi_aio_flush(BlockDriverState *bs,
+                BlockDriverCompletionFunc *cb, void *opaque)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+    IscsiAIOCB *acb;
+
+    acb = qemu_aio_get(&iscsi_aio_pool, bs, cb, opaque);
+    if (!acb) {
+        return NULL;
+    }
+
+    acb->iscsilun = iscsilun;
+    acb->canceled = 0;
+
+    acb->task = iscsi_synchronizecache10_task(iscsi, iscsilun->lun,
+                                              0, 0, 0, 0,
+                                              iscsi_synccache10_cb,
+                                              acb);
+    if (acb->task == NULL) {
+        error_report("iSCSI: Failed to send synchronizecache10 command. %s",
+                     iscsi_get_error(iscsi));
+        qemu_aio_release(acb);
+        return NULL;
+    }
+
+    iscsi_set_events(iscsilun);
+
+    return &acb->common;
+}
+
+static int64_t
+iscsi_getlength(BlockDriverState *bs)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    int64_t len;
+
+    len  = iscsilun->num_blocks;
+    len *= iscsilun->block_size;
+
+    return len;
+}
+
+static void
+iscsi_readcapacity10_cb(struct iscsi_context *iscsi, int status,
+                        void *command_data, void *opaque)
+{
+    struct IscsiTask *itask = opaque;
+    struct scsi_readcapacity10 *rc10;
+    struct scsi_task *task = command_data;
+
+    if (status != 0) {
+        error_report("iSCSI: Failed to read capacity of iSCSI lun. %s",
+                     iscsi_get_error(iscsi));
+        itask->status   = 1;
+        itask->complete = 1;
+        scsi_free_scsi_task(task);
+        return;
+    }
+
+    rc10 = scsi_datain_unmarshall(task);
+    if (rc10 == NULL) {
+        error_report("iSCSI: Failed to unmarshall readcapacity10 data.");
+        itask->status   = 1;
+        itask->complete = 1;
+        scsi_free_scsi_task(task);
+        return;
+    }
+
+    itask->iscsilun->block_size = rc10->block_size;
+    itask->iscsilun->num_blocks = rc10->lba;
+
+    itask->status   = 0;
+    itask->complete = 1;
+    scsi_free_scsi_task(task);
+}
+
+
+static void
+iscsi_connect_cb(struct iscsi_context *iscsi, int status, void *command_data,
+                 void *opaque)
+{
+    struct IscsiTask *itask = opaque;
+    struct scsi_task *task;
+
+    if (status != 0) {
+        itask->status   = 1;
+        itask->complete = 1;
+        return;
+    }
+
+    task = iscsi_readcapacity10_task(iscsi, itask->iscsilun->lun, 0, 0,
+                                     iscsi_readcapacity10_cb, opaque);
+    if (task == NULL) {
+        error_report("iSCSI: failed to send readcapacity command.");
+        itask->status   = 1;
+        itask->complete = 1;
+        return;
+    }
+}
+
+/*
+ * We support iscsi urls of the form
+ * iscsi://[<username>[%<password>]@]<host>[:<port>]/<targetname>/<lun>
+ */
+static int iscsi_open(BlockDriverState *bs, const char *filename, int flags)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = NULL;
+    struct iscsi_url *iscsi_url = NULL;
+    struct IscsiTask task;
+    int ret;
+
+    if ((BDRV_SECTOR_SIZE % 512) != 0) {
+        error_report("iSCSI: Invalid BDRV_SECTOR_SIZE. "
+                     "BDRV_SECTOR_SIZE(%lld) is not a multiple "
+                     "of 512", BDRV_SECTOR_SIZE);
+        return -EINVAL;
+    }
+
+    memset(iscsilun, 0, sizeof(IscsiLun));
+
+    /* Should really append the KVM name after the ':' here */
+    iscsi = iscsi_create_context("iqn.2008-11.org.linux-kvm:");
+    if (iscsi == NULL) {
+        error_report("iSCSI: Failed to create iSCSI context.");
+        ret = -ENOMEM;
+        goto failed;
+    }
+
+    iscsi_url = iscsi_parse_full_url(iscsi, filename);
+    if (iscsi_url == NULL) {
+        error_report("Failed to parse URL : %s %s", filename,
+                     iscsi_get_error(iscsi));
+        ret = -ENOMEM;
+        goto failed;
+    }
+
+    if (iscsi_set_targetname(iscsi, iscsi_url->target)) {
+        error_report("iSCSI: Failed to set target name.");
+        ret = -ENOMEM;
+        goto failed;
+    }
+
+    if (iscsi_url->user != NULL) {
+        ret = iscsi_set_initiator_username_pwd(iscsi, iscsi_url->user,
+                                               iscsi_url->passwd);
+        if (ret != 0) {
+            error_report("Failed to set initiator username and password");
+            ret = -ENOMEM;
+            goto failed;
+        }
+    }
+    if (iscsi_set_session_type(iscsi, ISCSI_SESSION_NORMAL) != 0) {
+        error_report("iSCSI: Failed to set session type to normal.");
+        ret = -ENOMEM;
+        goto failed;
+    }
+
+    iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_NONE_CRC32C);
+
+    task.iscsilun = iscsilun;
+    task.status   = 0;
+    task.complete = 0;
+
+    iscsilun->iscsi = iscsi;
+    iscsilun->lun   = iscsi_url->lun;
+
+    if (iscsi_full_connect_async(iscsi, iscsi_url->portal, iscsi_url->lun,
+                                 iscsi_connect_cb, &task)
+        != 0) {
+        error_report("iSCSI: Failed to start async connect.");
+        ret = -ENOMEM;
+        goto failed;
+    }
+
+    async_context_push();
+    while (!task.complete) {
+        iscsi_set_events(iscsilun);
+        qemu_aio_wait();
+    }
+    async_context_pop();
+    if (task.status != 0) {
+        error_report("iSCSI: Failed to connect to LUN : %s",
+                     iscsi_get_error(iscsi));
+        ret = -EINVAL;
+        goto failed;
+    }
+
+    return 0;
+
+failed:
+    if (iscsi_url != NULL) {
+        iscsi_destroy_url(iscsi_url);
+    }
+    if (iscsi != NULL) {
+        iscsi_destroy_context(iscsi);
+    }
+    memset(iscsilun, 0, sizeof(IscsiLun));
+    return ret;
+}
+
+static void iscsi_close(BlockDriverState *bs)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL, NULL, NULL);
+    iscsi_destroy_context(iscsi);
+    memset(iscsilun, 0, sizeof(IscsiLun));
+}
+
+static BlockDriver bdrv_iscsi = {
+    .format_name     = "iscsi",
+    .protocol_name   = "iscsi",
+
+    .instance_size   = sizeof(IscsiLun),
+    .bdrv_file_open  = iscsi_open,
+    .bdrv_close      = iscsi_close,
+
+    .bdrv_getlength  = iscsi_getlength,
+
+    .bdrv_aio_readv  = iscsi_aio_readv,
+    .bdrv_aio_writev = iscsi_aio_writev,
+    .bdrv_aio_flush  = iscsi_aio_flush,
+};
+
+static void iscsi_block_init(void)
+{
+    bdrv_register(&bdrv_iscsi);
+}
+
+block_init(iscsi_block_init);
diff --git a/configure b/configure
index da2da04..7a71153 100755
--- a/configure
+++ b/configure
@@ -178,6 +178,7 @@ rbd=""
 smartcard=""
 smartcard_nss=""
 opengl="no"
+libiscsi=""
 
 # parse CC options first
 for opt do
@@ -632,6 +633,10 @@ for opt do
   ;;
   --enable-spice) spice="yes"
   ;;
+  --disable-libiscsi) libiscsi="no"
+  ;;
+  --enable-libiscsi) libiscsi="yes"
+  ;;
   --enable-profiler) profiler="yes"
   ;;
   --enable-cocoa)
@@ -941,6 +946,8 @@ echo "                           Default:trace-<pid>"
 echo "  --disable-spice          disable spice"
 echo "  --enable-spice           enable spice"
 echo "  --enable-rbd             enable building the rados block device (rbd)"
+echo "  --disable-libiscsi       disable iscsi support"
+echo "  --enable-libiscsi        enable iscsi support"
 echo "  --disable-smartcard      disable smartcard support"
 echo "  --enable-smartcard       enable smartcard support"
 echo "  --disable-smartcard-nss  disable smartcard nss support"
@@ -2282,6 +2289,25 @@ if compile_prog "" "" ; then
 fi
 
 ##########################################
+# Do we have libiscsi
+if test "$libiscsi" != "no" ; then
+  cat > $TMPC << EOF
+#include <iscsi/iscsi.h>
+int main(void) { iscsi_create_context(""); return 0; }
+EOF
+  if compile_prog "-Werror" "-liscsi" ; then
+    libiscsi="yes"
+    LIBS="$LIBS -liscsi"
+  else
+    if test "$libiscsi" = "yes" ; then
+      feature_not_found "libiscsi"
+    fi
+    libiscsi="no"
+  fi
+fi
+
+
+##########################################
 # Do we need librt
 cat > $TMPC <<EOF
 #include <signal.h>
@@ -2615,6 +2641,7 @@ echo "rbd support       $rbd"
 echo "xfsctl support    $xfs"
 echo "nss used          $smartcard_nss"
 echo "OpenGL support    $opengl"
+echo "libiscsi support  $libiscsi"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2909,6 +2936,10 @@ if test "$opengl" = "yes" ; then
   echo "CONFIG_OPENGL=y" >> $config_host_mak
 fi
 
+if test "$libiscsi" = "yes" ; then
+  echo "CONFIG_LIBISCSI=y" >> $config_host_mak
+fi
+
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
   echo "CONFIG_BSD=y" >> $config_host_mak
diff --git a/trace-events b/trace-events
index 703b745..05059a0 100644
--- a/trace-events
+++ b/trace-events
@@ -360,3 +360,9 @@ disable milkymist_uart_pulse_irq_tx(void) "Pulse IRQ TX"
 # hw/milkymist-vgafb.c
 disable milkymist_vgafb_memory_read(uint32_t addr, uint32_t value) "addr %08x value %08x"
 disable milkymist_vgafb_memory_write(uint32_t addr, uint32_t value) "addr %08x value %08x"
+
+# block/iscsi.c
+disable iscsi_aio_write10_cb(void *iscsi, int status, void *acb, int canceled) "iscsi %p status %d acb %p canceled %d"
+disable iscsi_aio_writev(void *iscsi, int64_t sector_num, int nb_sectors, void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque %p acb %p"
+disable iscsi_aio_read10_cb(void *iscsi, int status, void *acb, int canceled) "iscsi %p status %d acb %p canceled %d"
+disable iscsi_aio_readv(void *iscsi, int64_t sector_num, int nb_sectors, void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque %p acb %p"
-- 
1.7.3.1
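As a quick usage sketch, the URL syntax and the CHAP environment
variable described in the patch could be combined like this to attach a
password-protected system disk (the portal address, target IQN,
username and password here are hypothetical, not taken from the patch):

    # CHAP password passed via the environment so it stays out of `ps`
    LIBISCSI_CHAP_PASSWORD="my-secret" \
    qemu-system-x86_64 \
        -drive file=iscsi://libiscsi-user@10.1.1.1:3260/iqn.ronnie.test/1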
* Re: [Qemu-devel] iSCSI support for QEMU
From: Christoph Hellwig @ 2011-04-21  8:50 UTC
To: Ronnie Sahlberg; +Cc: qemu-devel, stefanha

On Thu, Apr 21, 2011 at 06:43:10PM +1000, Ronnie Sahlberg wrote:
> Some basic tests that have been performed show it to be significantly
> faster than an out-of-the-box open-iscsi mounted LUN being accessed by
> default QEMU i/o options.

Which isn't a useful comparison.  qemu's default is the braindead
cache=writethrough behaviour, which forces writes out to disk and
bloats the pagecache, while your mail sounds like you silently always
implement O_DIRECT-like semantics.

Also, the implementation has data integrity issues due to this.  It
does not actually force writes out to disk in the default cache mode,
despite qemu claiming to have WCE=0 in that mode, i.e. you never flush
the cache.  You'll need to set the FUA bit on all writes if
cache=writethrough is used, with a fallback to a real cache flush in
case the target doesn't support the FUA bit.
* Re: [Qemu-devel] iSCSI support for QEMU
From: ronnie sahlberg @ 2011-04-21  8:58 UTC
To: Christoph Hellwig; +Cc: qemu-devel, stefanha

Please re-read my post, or read the patch.

It has O_DIRECT-like behaviour in that it will not pollute the host's
cache, for the simple reason that the host is not aware that there is
any block i/o happening.

In my patch, there are NO data integrity issues. Data is sent out on
the wire immediately as the guest issues the write. Once the guest
issues a flush call, the flush call will not terminate until the
SYNCCACHE10 task has completed.

I do not have any cache at all in libiscsi or the patch. All I/O is
always sent to the iSCSI target. No data is ever cached anywhere.

I am just saying that in the tests I did, I had to use
'cache=none,aio=native' to come anywhere near the libiscsi performance.
All other options performed significantly worse. I have no idea what
effect those options have on data integrity or on open-iscsi. There
should not be any integrity issues in libiscsi, since it does not cache
at all, ever.

regards
ronnie sahlberg

On Thu, Apr 21, 2011 at 6:50 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Apr 21, 2011 at 06:43:10PM +1000, Ronnie Sahlberg wrote:
>> Some basic tests that have been performed show it to be significantly
>> faster than an out-of-the-box open-iscsi mounted LUN being accessed by
>> default QEMU i/o options.
>
> Which isn't a useful comparison.  qemu's default is the braindead
> cache=writethrough behaviour, which forces writes out to disk and
> bloats the pagecache, while your mail sounds like you silently always
> implement O_DIRECT-like semantics.
>
> Also, the implementation has data integrity issues due to this.  It
> does not actually force writes out to disk in the default cache mode,
> despite qemu claiming to have WCE=0 in that mode, i.e. you never flush
> the cache.  You'll need to set the FUA bit on all writes if
> cache=writethrough is used, with a fallback to a real cache flush in
> case the target doesn't support the FUA bit.
* Re: [Qemu-devel] iSCSI support for QEMU
From: Christoph Hellwig @ 2011-04-21  9:09 UTC
To: ronnie sahlberg; +Cc: Christoph Hellwig, stefanha, qemu-devel

> In my patch, there are NO data integrity issues. Data is sent out on
> the wire immediately as the guest issues the write. Once the guest
> issues a flush call, the flush call will not terminate until the
> SYNCCACHE10 task has completed.

No guest will ever issue a cache flush, as we claim to be WCE=0 by
default.  Now if your target has WCE=1 it will cache data internally,
and your iscsi initiator will never flush it out to disk.

We only claim WCE=1 to the guest if cache=writeback or cache=none is
set.  So, ignoring the issue of having a cache on the initiator side,
you must implement stable writes for the default cache=writethrough
behaviour by either setting the FUA bit on your writes, or doing a
cache flush after every write in case the target does not support FUA.
* Re: [Qemu-devel] iSCSI support for QEMU
From: ronnie sahlberg @ 2011-04-21  9:28 UTC
To: Christoph Hellwig; +Cc: qemu-devel, stefanha

On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>> In my patch, there are NO data integrity issues. Data is sent out on
>> the wire immediately as the guest issues the write. Once the guest
>> issues a flush call, the flush call will not terminate until the
>> SYNCCACHE10 task has completed.
>
> No guest will ever issue a cache flush, as we claim to be WCE=0 by
> default.  Now if your target has WCE=1 it will cache data internally,
> and your iscsi initiator will never flush it out to disk.

My target does not do any caching at all. It happily ignores both the
FUA and FUA_NV bits and always destages all data to stable storage
before sending SCSI_STATUS_GOOD back to the initiator.

I use the same target, the same LUN and the same settings both for
testing QEMU with an open-iscsi-mounted LUN and for QEMU+libiscsi.
I do not understand why my target would have a data integrity problem
when used with libiscsi but not with an open-iscsi mounted LUN.

> We only claim WCE=1 to the guest if cache=writeback or cache=none is
> set.  So, ignoring the issue of having a cache on the initiator side,
> you must implement stable writes for the default cache=writethrough
> behaviour by either setting the FUA bit on your writes, or doing a
> cache flush after every write in case the target does not support FUA.

My target right now does such flushes for writes.

I fail to see why FUA, FUA_NV or flushes have any relevance to a test
that just involves reading data off the LUN.

I think this discussion is strange. I would like discussion about the
merits of my patch, and about whether features like built-in iscsi
support that enterprise users find useful are desirable in QEMU. I do
not find discussions about the semantics of my particular iscsi target
to be meaningful for that purpose.

regards
ronnie sahlberg
* Re: [Qemu-devel] iSCSI support for QEMU
From: Stefan Hajnoczi @ 2011-04-21 10:58 UTC
To: ronnie sahlberg; +Cc: Christoph Hellwig, stefanha, qemu-devel

On Thu, Apr 21, 2011 at 10:28 AM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>> We only claim WCE=1 to the guest if cache=writeback or cache=none is
>> set.  So, ignoring the issue of having a cache on the initiator side,
>> you must implement stable writes for the default cache=writethrough
>> behaviour by either setting the FUA bit on your writes, or doing a
>> cache flush after every write in case the target does not support FUA.
>
> My target right now does such flushes for writes.
>
> I fail to see why FUA, FUA_NV or flushes have any relevance to a test
> that just involves reading data off the LUN.

I'll try to rephrase what Christoph has pointed out.

When QEMU is run with cache=writethrough (the default), QEMU does not
report a write cache on the emulated disk. The guest believes that all
writes are stable because there is no disk write cache. Therefore the
guest never needs to issue synchronize cache commands.

In order to meet these semantics with libiscsi, we would need to set
FUA or send a synchronize cache command for every write. (QEMU's
raw-posix.c file I/O meets these semantics by opening the image file
with O_DSYNC when cache=writethrough.)

> I do not understand why my target would have a data integrity problem
> when used with libiscsi but not with an open-iscsi mounted LUN.

In the open-iscsi cache=writethrough case, QEMU's raw-posix.c opens the
file with O_DSYNC. Open-iscsi must set the FUA bit or synchronize the
cache for each write request.

How does libiscsi behave in this case?

Stefan
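For illustration only, a minimal sketch of what such a change might
look like in the patch's write path. It assumes that the two zero
arguments the posted patch passes to iscsi_write10_task() are the FUA
and FUA_NV flags; a real change would also need a SYNCHRONIZE CACHE
fallback for targets that reject FUA:

    /* Hypothetical sketch, not the posted patch: request a stable
     * write by setting the FUA flag on the WRITE(10) command, so the
     * target must commit the data to non-volatile storage before
     * completing the task. */
    acb->task = iscsi_write10_task(iscsi, iscsilun->lun, acb->buf, size,
                                   sector_qemu2lun(sector_num, iscsilun),
                                   1 /* fua */, 0 /* fua_nv */,
                                   iscsilun->block_size,
                                   iscsi_aio_write10_cb, acb);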
* Re: [Qemu-devel] iSCSI support for QEMU
From: ronnie sahlberg @ 2011-04-21 11:12 UTC
To: Stefan Hajnoczi; +Cc: Christoph Hellwig, stefanha, qemu-devel

On Thu, Apr 21, 2011 at 8:58 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Apr 21, 2011 at 10:28 AM, ronnie sahlberg
> <ronniesahlberg@gmail.com> wrote:
>> On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>>> We only claim WCE=1 to the guest if cache=writeback or cache=none is
>>> set.  So, ignoring the issue of having a cache on the initiator side,
>>> you must implement stable writes for the default cache=writethrough
>>> behaviour by either setting the FUA bit on your writes, or doing a
>>> cache flush after every write in case the target does not support FUA.
>>
>> My target right now does such flushes for writes.
>>
>> I fail to see why FUA, FUA_NV or flushes have any relevance to a test
>> that just involves reading data off the LUN.
>
> I'll try to rephrase what Christoph has pointed out.
>
> When QEMU is run with cache=writethrough (the default), QEMU does not
> report a write cache on the emulated disk. The guest believes that all
> writes are stable because there is no disk write cache. Therefore the
> guest never needs to issue synchronize cache commands.
>
> In order to meet these semantics with libiscsi, we would need to set
> FUA or send a synchronize cache command for every write. (QEMU's
> raw-posix.c file I/O meets these semantics by opening the image file
> with O_DSYNC when cache=writethrough.)
>
>> I do not understand why my target would have a data integrity problem
>> when used with libiscsi but not with an open-iscsi mounted LUN.
>
> In the open-iscsi cache=writethrough case, QEMU's raw-posix.c opens the
> file with O_DSYNC. Open-iscsi must set the FUA bit or synchronize the
> cache for each write request.
>
> How does libiscsi behave in this case?

libiscsi ignores the O_DSYNC flag. It does not matter, for two reasons:

* my target always destages to disk before replying, i.e. my target
  ALWAYS writes data synchronously to stable storage
* the test we are talking about is for READ10, i.e. reads, not writes.

Seriously, please explain in what exact way write semantics, FUA bits
and write destage policy are relevant here:

sudo time dd if=/dev/sda of=/dev/null bs=1M

I seriously do not understand. Please educate me.

regards
ronnie sahlberg
* Re: [Qemu-devel] iSCSI support for QEMU
From: Stefan Hajnoczi @ 2011-04-21 11:21 UTC
To: ronnie sahlberg; +Cc: Christoph Hellwig, stefanha, qemu-devel

On Thu, Apr 21, 2011 at 12:12 PM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> On Thu, Apr 21, 2011 at 8:58 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Thu, Apr 21, 2011 at 10:28 AM, ronnie sahlberg
>> <ronniesahlberg@gmail.com> wrote:
>>> On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>>>> We only claim WCE=1 to the guest if cache=writeback or cache=none is
>>>> set.  So, ignoring the issue of having a cache on the initiator side,
>>>> you must implement stable writes for the default cache=writethrough
>>>> behaviour by either setting the FUA bit on your writes, or doing a
>>>> cache flush after every write in case the target does not support FUA.
>>>
>>> My target right now does such flushes for writes.
>>>
>>> I fail to see why FUA, FUA_NV or flushes have any relevance to a test
>>> that just involves reading data off the LUN.
>>
>> I'll try to rephrase what Christoph has pointed out.
>>
>> When QEMU is run with cache=writethrough (the default), QEMU does not
>> report a write cache on the emulated disk. The guest believes that all
>> writes are stable because there is no disk write cache. Therefore the
>> guest never needs to issue synchronize cache commands.
>>
>> In order to meet these semantics with libiscsi, we would need to set
>> FUA or send a synchronize cache command for every write. (QEMU's
>> raw-posix.c file I/O meets these semantics by opening the image file
>> with O_DSYNC when cache=writethrough.)
>>
>>> I do not understand why my target would have a data integrity problem
>>> when used with libiscsi but not with an open-iscsi mounted LUN.
>>
>> In the open-iscsi cache=writethrough case, QEMU's raw-posix.c opens the
>> file with O_DSYNC. Open-iscsi must set the FUA bit or synchronize the
>> cache for each write request.
>>
>> How does libiscsi behave in this case?
>
> libiscsi ignores the O_DSYNC flag. It does not matter, for two reasons:
>
> * my target always destages to disk before replying, i.e. my target
>   ALWAYS writes data synchronously to stable storage

Does the libiscsi initiator ensure this? What if I use a different
target or configure it differently? Will libiscsi take care to ensure
the semantics are still met?

> * the test we are talking about is for READ10, i.e. reads, not writes.

I was not talking about a specific test.

> Seriously, please explain in what exact way write semantics, FUA bits
> and write destage policy are relevant here:
>
> sudo time dd if=/dev/sda of=/dev/null bs=1M
>
> I seriously do not understand. Please educate me.

Write semantics are completely independent of this dd read example.

Stefan
* Re: [Qemu-devel] iSCSI support for QEMU
From: ronnie sahlberg @ 2011-04-21 11:36 UTC
To: Stefan Hajnoczi; +Cc: Christoph Hellwig, stefanha, qemu-devel

Stefan,

I understand.

Let me re-send a patch tomorrow that can optionally enable/force the
FUA bit for writes. There are some high-volume arrays that advertise
support but fail with a sense error any cdb that has the FUA or FUA_NV
bits set, so it needs to be made optional.

regards
ronnie sahlberg

On Thu, Apr 21, 2011 at 9:21 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Apr 21, 2011 at 12:12 PM, ronnie sahlberg
> <ronniesahlberg@gmail.com> wrote:
>> On Thu, Apr 21, 2011 at 8:58 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>> On Thu, Apr 21, 2011 at 10:28 AM, ronnie sahlberg
>>> <ronniesahlberg@gmail.com> wrote:
>>>> On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>>>>> We only claim WCE=1 to the guest if cache=writeback or cache=none is
>>>>> set.  So, ignoring the issue of having a cache on the initiator side,
>>>>> you must implement stable writes for the default cache=writethrough
>>>>> behaviour by either setting the FUA bit on your writes, or doing a
>>>>> cache flush after every write in case the target does not support FUA.
>>>>
>>>> My target right now does such flushes for writes.
>>>>
>>>> I fail to see why FUA, FUA_NV or flushes have any relevance to a test
>>>> that just involves reading data off the LUN.
>>>
>>> I'll try to rephrase what Christoph has pointed out.
>>>
>>> When QEMU is run with cache=writethrough (the default), QEMU does not
>>> report a write cache on the emulated disk. The guest believes that all
>>> writes are stable because there is no disk write cache. Therefore the
>>> guest never needs to issue synchronize cache commands.
>>>
>>> In order to meet these semantics with libiscsi, we would need to set
>>> FUA or send a synchronize cache command for every write. (QEMU's
>>> raw-posix.c file I/O meets these semantics by opening the image file
>>> with O_DSYNC when cache=writethrough.)
>>>
>>>> I do not understand why my target would have a data integrity problem
>>>> when used with libiscsi but not with an open-iscsi mounted LUN.
>>>
>>> In the open-iscsi cache=writethrough case, QEMU's raw-posix.c opens the
>>> file with O_DSYNC. Open-iscsi must set the FUA bit or synchronize the
>>> cache for each write request.
>>>
>>> How does libiscsi behave in this case?
>>
>> libiscsi ignores the O_DSYNC flag. It does not matter, for two reasons:
>>
>> * my target always destages to disk before replying, i.e. my target
>>   ALWAYS writes data synchronously to stable storage
>
> Does the libiscsi initiator ensure this? What if I use a different
> target or configure it differently? Will libiscsi take care to ensure
> the semantics are still met?
>
>> * the test we are talking about is for READ10, i.e. reads, not writes.
>
> I was not talking about a specific test.
>
>> Seriously, please explain in what exact way write semantics, FUA bits
>> and write destage policy are relevant here:
>>
>> sudo time dd if=/dev/sda of=/dev/null bs=1M
>>
>> I seriously do not understand. Please educate me.
>
> Write semantics are completely independent of this dd read example.
>
> Stefan
* Re: [Qemu-devel] iSCSI support for QEMU
From: Kevin Wolf @ 2011-04-21 11:44 UTC
To: ronnie sahlberg; +Cc: qemu-devel, Stefan Hajnoczi, Christoph Hellwig, stefanha

On 21.04.2011 13:36, ronnie sahlberg wrote:
> Stefan,
>
> I understand.
>
> Let me re-send a patch tomorrow that can optionally enable/force the
> FUA bit for writes. There are some high-volume arrays that advertise
> support but fail with a sense error any cdb that has the FUA or FUA_NV
> bits set, so it needs to be made optional.

You should make it dependent on the cache mode, i.e. set FUA for
cache=writethrough and leave it alone for the other cache modes.

Kevin
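A sketch of what Kevin's suggestion might look like inside the patch's
iscsi_aio_writev(), assuming the driver can test the BDRV_O_CACHE_WB
open flag the way raw-posix does (hypothetical code, not part of the
posted patch):

    /* Force unit access only in the default writethrough mode; leave
     * writeback-style cache modes (cache=writeback, cache=none) alone
     * so the guest's explicit flushes govern durability instead. */
    int fua = !(bs->open_flags & BDRV_O_CACHE_WB);

    acb->task = iscsi_write10_task(iscsi, iscsilun->lun, acb->buf, size,
                                   sector_qemu2lun(sector_num, iscsilun),
                                   fua, 0 /* fua_nv */,
                                   iscsilun->block_size,
                                   iscsi_aio_write10_cb, acb);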
* Re: [Qemu-devel] iSCSI support for QEMU
From: Stefan Hajnoczi @ 2011-04-21 12:08 UTC
To: ronnie sahlberg; +Cc: Christoph Hellwig, stefanha, qemu-devel

On Thu, Apr 21, 2011 at 12:36 PM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> I understand.
>
> Let me re-send a patch tomorrow that can optionally enable/force the
> FUA bit for writes. There are some high-volume arrays that advertise
> support but fail with a sense error any cdb that has the FUA or FUA_NV
> bits set, so it needs to be made optional.

Maybe we should just disable the write cache on the LUN (setting WCE=0)
if cache=writethrough? Then we don't need to worry about FUA or
synchronize cache.

Stefan
* Re: [Qemu-devel] iSCSI support for QEMU
From: Christoph Hellwig @ 2011-04-21 12:49 UTC
To: ronnie sahlberg; +Cc: Stefan Hajnoczi, Christoph Hellwig, stefanha, qemu-devel

On Thu, Apr 21, 2011 at 09:36:12PM +1000, ronnie sahlberg wrote:
> There are some high-volume arrays that advertise support but fail with
> a sense error any cdb that has the FUA or FUA_NV bits set, so it needs
> to be made optional.

Which one would that be?  Linux has used the FUA bit since 2005 if the
device advertises support via the DPOFUA bit, prints a warning if a FUA
write fails, and since last year even fails the request hard.  I'd be
really surprised if a common device failed FUA writes and we didn't
know about it by now.

Either way, you'll still have to guarantee that data made it to
non-volatile storage for cache=writethrough mode, either by disabling
the WCE bit using MODE SELECT, or by flushing the cache after every
write.
* Re: [Qemu-devel] iSCSI support for QEMU
From: ronnie sahlberg @ 2011-04-21 20:25 UTC
To: Christoph Hellwig; +Cc: Stefan Hajnoczi, qemu-devel, stefanha

On Thu, Apr 21, 2011 at 10:49 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Apr 21, 2011 at 09:36:12PM +1000, ronnie sahlberg wrote:
>> There are some high-volume arrays that advertise support but fail with
>> a sense error any cdb that has the FUA or FUA_NV bits set, so it needs
>> to be made optional.
>
> Which one would that be?  Linux has used the FUA bit since 2005 if the
> device advertises support via the DPOFUA bit, prints a warning if a FUA
> write fails, and since last year even fails the request hard.  I'd be
> really surprised if a common device failed FUA writes and we didn't
> know about it by now.
>
> Either way, you'll still have to guarantee that data made it to
> non-volatile storage for cache=writethrough mode, either by disabling
> the WCE bit using MODE SELECT, or by flushing the cache after every
> write.

Thank you for your patience. I understand, and I have updated the patch
to set FUA when the _WB flag is not set.

regards
ronnie sahlberg
* Re: [Qemu-devel] iSCSI support for QEMU
From: ronnie sahlberg @ 2011-04-21  9:47 UTC
To: Christoph Hellwig; +Cc: qemu-devel, stefanha

Christoph,

I think you misread my test. My test is pure reading:

sudo time dd if=/dev/sda of=/dev/null bs=1M

There are no writes involved at all in this test, only a huge number of
READ10 commands being sent to the target or, in the case of QEMU with
an open-iscsi-mounted LUN, sometimes being served out of the page cache
of the host.

Since open-iscsi mounted LUNs by default perform so very poorly
compared to libiscsi, I assume that very few blocks are served out of
the cache of the host. This is based on the fact that a block served
out of cache would have a significantly (many orders of magnitude)
lower access latency than a block that needs to be fetched across a
1 GbE network. As open-iscsi performs so much worse than libiscsi in
this case, I can only speculate that very few blocks are delivered by
cache hits.

I have absolutely no idea why QEMU+open-iscsi would perform so much
better for a read-intensive workload like this when setting
cache=none,aio=native. That is for the qemu developers to explain.
Maybe doing READ10 through open-iscsi is very expensive? Maybe
something else in the linux kernel makes reads very expensive unless
you use "cache=none,aio=native"? Who knows? I have no idea, other than
that without "cache=none,aio=native", QEMU performance for
read-intensive tasks is significantly slower than QEMU doing the exact
same reads using libiscsi.

I really don't care why QEMU+open-iscsi performs so badly either. That
is of very little interest to me. As long as libiscsi is not
significantly worse than open-iscsi, I care very little about why.

regards
ronnie sahlberg

On Thu, Apr 21, 2011 at 7:09 PM, Christoph Hellwig <hch@lst.de> wrote:
>> In my patch, there are NO data integrity issues. Data is sent out on
>> the wire immediately as the guest issues the write. Once the guest
>> issues a flush call, the flush call will not terminate until the
>> SYNCCACHE10 task has completed.
>
> No guest will ever issue a cache flush, as we claim to be WCE=0 by
> default.  Now if your target has WCE=1 it will cache data internally,
> and your iscsi initiator will never flush it out to disk.
>
> We only claim WCE=1 to the guest if cache=writeback or cache=none is
> set.  So, ignoring the issue of having a cache on the initiator side,
> you must implement stable writes for the default cache=writethrough
> behaviour by either setting the FUA bit on your writes, or doing a
> cache flush after every write in case the target does not support FUA.
* [Qemu-devel] iSCSI support for QEMU
From: ronnie sahlberg @ 2011-06-12  2:54 UTC
To: qemu-devel; +Cc: kwolf, stefanha, hch
Sorry about the missing subject line in the previous mail.
I got confused when using git-send-patch :-)
regards
ronnie sahlberg
On Sun, Jun 12, 2011 at 12:47 PM, Ronnie Sahlberg
<ronniesahlberg@gmail.com> wrote:
> Please find attached a patch to add built-in support for iSCSI into QEMU.
> Please review and/or apply this patch.
>
> This is the latest version of this patch and I think I have addressed all previous concerns and suggestions.
>
>
> Using built-in iSCSI support has many advantages for certain use cases:
>
> * if you have very many iSCSI devices it may be impractical to expose
> all LUNs to the underlying host.
>
> * the devices become private to the guest and are not visible to the host.
> This automatically avoids polluting the page-cache on the host.
>
> * security, the devices are private to the guest, which prevents other guests or the host itself from accessing the devices.
>
> * security, it allows non-root users a secure way to get private and password-protected access to the device by using CHAP authentication.
>
> * migration, many other virtualization systems provide built-in iscsi clients like this. Also, providing this as a feature in QEMU may make it easier for such users to migrate over to QEMU/KVM.
>
> * easier to maintain. For users with very many QEMU instances, I think having guest-private iscsi targets and LUNs may be easier to manage than a huge set of files in /dev/scsi/*.
>
> * easier to maintain. When copying a QEMU instance from one host to another, this offers a more self-contained solution where the QEMU command line itself contains all the required storage configuration, instead of also having to coordinate the move with setting up and tearing down open-iscsi logins on the underlying hosts.
>
>
>
>
> The patch has been tested by installing and running several distributions, such as RHEL6 and Ubuntu, with both the system disk and the installer CDROM being iSCSI devices.
>
> This testing has also been performed by running the full QEMU under valgrind.
>
> Performance is comparable to using open-iscsi devices.
>
>
>