From: ronnie sahlberg <ronniesahlberg@gmail.com>
To: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: kwolf@redhat.com, fujita.tomonori@lab.ntt.co.jp,
qemu-devel@nongnu.org, hch@lst.de
Subject: Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI
Date: Wed, 21 Sep 2011 19:48:22 +1000 [thread overview]
Message-ID: <CAN05THTxTcaz8VtKOyi2eGDwigDh59SBdKQeuM_PtdKLBinWeA@mail.gmail.com> (raw)
In-Reply-To: <20110912091408.GA3465@stefanha-thinkpad.localdomain>
Stefan,
Thanks for your review.
I feel that iscsi would be beneficial for several usecases.
I have implemented all changes you pointed out, except two, and resent
an updated patch to the list.
Please see below.
regards
ronnie sahlberg
On Mon, Sep 12, 2011 at 7:14 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> On Sat, Sep 10, 2011 at 02:23:30PM +1000, Ronnie Sahlberg wrote:
>
> Looking good. I think this is worth merging because it does offer
> benefits over host iSCSI.
>
>> +static void
>> +iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data,
>> + void *private_data)
>> +{
>> +}
>> +
>> +static void
>> +iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
>> +{
>> + IscsiAIOCB *acb = (IscsiAIOCB *)blockacb;
>> + IscsiLun *iscsilun = acb->iscsilun;
>> +
>> + acb->status = -ECANCELED;
>> + acb->common.cb(acb->common.opaque, acb->status);
>> + acb->canceled = 1;
>> +
>> + iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task,
>> + iscsi_abort_task_cb, NULL);
>> +}
>
> The asynchronous abort task call looks odd. If a caller allocates a
> buffer and issues a read request, then we need to make sure that the
> request is really aborted by the time .bdrv_aio_cancel() returns.
>
> If I understand the code correctly, iscsi_aio_cancel() returns
> immediately but the read request will still be in progress. That means
> the caller could now free their data buffer and the read request will
> overwrite that unallocated memory.
You are right.
I have added a new function to libiscsi that will also release the
task structure from libiscsi as well.
So we should no longer have a window where we might have a cancelled
task in flight writing the data to an already released buffer once the
cancelled data buffers start arriving on the socket.
>
>> +static void
>> +iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status,
>> + void *command_data, void *opaque)
>> +{
>> + IscsiAIOCB *acb = opaque;
>> +
>> + trace_iscsi_aio_write10_cb(iscsi, status, acb, acb->canceled);
>> +
>> + if (acb->buf != NULL) {
>> + free(acb->buf);
>> + }
>
> Please just use g_free(acb->buf). g_free(NULL) is defined as a nop so
> the check isn't necessary. Also, this code uses free(3) when it shoulde.
> use g_free(3).
>
Done.
>> +
>> + if (acb->canceled != 0) {
>> + qemu_aio_release(acb);
>> + scsi_free_scsi_task(acb->task);
>> + acb->task = NULL;
>> + return;
>> + }
>> +
>> + acb->status = 0;
>> + if (status < 0) {
>> + error_report("Failed to write10 data to iSCSI lun. %s",
>> + iscsi_get_error(iscsi));
>> + acb->status = -EIO;
>> + }
>> +
>> + iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
>> + scsi_free_scsi_task(acb->task);
>> + acb->task = NULL;
>> +}
>> +
>> +static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
>> +{
>> + return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
>> +}
>> +
>> +static BlockDriverAIOCB *
>> +iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
>> + QEMUIOVector *qiov, int nb_sectors,
>> + BlockDriverCompletionFunc *cb,
>> + void *opaque)
>> +{
>> + IscsiLun *iscsilun = bs->opaque;
>> + struct iscsi_context *iscsi = iscsilun->iscsi;
>> + IscsiAIOCB *acb;
>> + size_t size;
>> + int fua = 0;
>> +
>> + /* set FUA on writes when cache mode is write through */
>> + if (!(bs->open_flags & (BDRV_O_CACHE_WB|BDRV_O_NOCACHE))) {
>> + fua = 1;
>> + }
>
> FUA needs to reflect the guest semantics - does this disk have an
> enabled write cache? When bs->open_flags has BDRV_O_CACHE_WB, then the
> guest knows it needs to send flushes because there is a write cache:
>
> if (!(bs->open_flags & BDRV_O_CACHE_WB)) {
> fua = 1;
> }
>
> BDRV_O_NOCACHE is just for local files and sets the O_DIRECT hint. It
> doesn't affect the cache semantics that the guest sees.
>
Done. When I first started building the patch a while ago, both flags
were needed. I missed when the second flag became obsolete upstream.
>> +/*
>> + * We support iscsi url's on the form
>> + * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
>> + */
>> +static int iscsi_open(BlockDriverState *bs, const char *filename, int flags)
>> +{
>> + IscsiLun *iscsilun = bs->opaque;
>> + struct iscsi_context *iscsi = NULL;
>> + struct iscsi_url *iscsi_url = NULL;
>> + struct IscsiTask task;
>> + int ret;
>> +
>> + if ((BDRV_SECTOR_SIZE % 512) != 0) {
>> + error_report("iSCSI: Invalid BDRV_SECTOR_SIZE. "
>> + "BDRV_SECTOR_SIZE(%lld) is not a multiple "
>> + "of 512", BDRV_SECTOR_SIZE);
>> + return -EINVAL;
>> + }
>
> Another way of saying this is:
> QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE % 512 != 0);
>
> The advantage is that the build fails instead of waiting until iscsi is
> used at runtime until the failure is detected.
>
> What will happen if BDRV_SECTOR_SIZE is not a multiple of 512?
>
I did not do this change.
If blocksize is not multiple of 512, the iscsi backend will just
refuse to work since having a read-modify-write cycle across the
network would be extremely costly.
I dont think iscsi should cause the entire build to fail.
While it is difficult to imagine a different blocksize, it is not
impossible I guess.
If someone comes up with a non-512-multiple blocksize and a good
reason for one, I dont want to be the guy that broke the build.
>> +
>> + memset(iscsilun, 0, sizeof(IscsiLun));
>> +
>> + /* Should really append the KVM name after the ':' here */
>> + iscsi = iscsi_create_context("iqn.2008-11.org.linux-kvm:");
>> + if (iscsi == NULL) {
>> + error_report("iSCSI: Failed to create iSCSI context.");
>> + ret = -ENOMEM;
>> + goto failed;
>> + }
>> +
>> + iscsi_url = iscsi_parse_full_url(iscsi, filename);
>> + if (iscsi_url == NULL) {
>> + error_report("Failed to parse URL : %s %s", filename,
>> + iscsi_get_error(iscsi));
>> + ret = -ENOMEM;
>
> -EINVAL?
Done.
>
>> +static BlockDriver bdrv_iscsi = {
>> + .format_name = "iscsi",
>> + .protocol_name = "iscsi",
>> +
>> + .instance_size = sizeof(IscsiLun),
>> + .bdrv_file_open = iscsi_open,
>> + .bdrv_close = iscsi_close,
>> +
>> + .bdrv_getlength = iscsi_getlength,
>> +
>> + .bdrv_aio_readv = iscsi_aio_readv,
>> + .bdrv_aio_writev = iscsi_aio_writev,
>> + .bdrv_aio_flush = iscsi_aio_flush,
>
> block.c does not emulate .bdrv_flush() using your .bdrv_aio_flush()
> implementation. I have sent a patch to add this emulation. This will
> turn bdrv_flush(iscsi_bs) into an actual operation instead of a nop :).
I did not implement this since I understood from the discussion that a
patch is imminent and it is required for existing backend(s?) anyway.
>
>> diff --git a/trace-events b/trace-events
>> index 3fdd60f..4e461e5 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -494,3 +494,10 @@ escc_sunkbd_event_in(int ch) "Untranslated keycode %2.2x"
>> escc_sunkbd_event_out(int ch) "Translated keycode %2.2x"
>> escc_kbd_command(int val) "Command %d"
>> escc_sunmouse_event(int dx, int dy, int buttons_state) "dx=%d dy=%d buttons=%01x"
>> +
>> +# block/iscsi.c
>> +disable iscsi_aio_write10_cb(void *iscsi, int status, void *acb, int canceled) "iscsi %p status %d acb %p canceled %d"
>> +disable iscsi_aio_writev(void *iscsi, int64_t sector_num, int nb_sectors, void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque %p acb %p"
>> +disable iscsi_aio_read10_cb(void *iscsi, int status, void *acb, int canceled) "iscsi %p status %d acb %p canceled %d"
>> +disable iscsi_aio_readv(void *iscsi, int64_t sector_num, int nb_sectors, void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque %p acb %p"
>
> It is no longer necessary to prefix trace event definitions with
> "disable".
Done.
>
> The stderr and simple backends now start up with all events disabled by
> default. The set of events can be set using the -trace
> events=<events-file> option or on the QEMU monitor using set trace-event
> <name> on.
>
> Stefan
>
next prev parent reply other threads:[~2011-09-21 9:48 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-09-10 4:23 [Qemu-devel] [PATCH] Add iSCSI support for QEMU Ronnie Sahlberg
2011-09-10 4:23 ` [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI Ronnie Sahlberg
2011-09-12 9:14 ` Stefan Hajnoczi
2011-09-14 14:36 ` Christoph Hellwig
2011-09-14 15:50 ` Stefan Hajnoczi
2011-09-16 15:53 ` Christoph Hellwig
2011-09-17 7:11 ` Stefan Hajnoczi
2011-09-14 22:51 ` ronnie sahlberg
2011-09-15 8:02 ` Daniel P. Berrange
2011-09-15 9:03 ` Kevin Wolf
2011-09-14 23:08 ` ronnie sahlberg
2011-09-15 6:04 ` Paolo Bonzini
2011-09-15 8:48 ` Dor Laor
2011-09-15 9:11 ` Paolo Bonzini
2011-09-15 11:27 ` ronnie sahlberg
2011-09-15 11:42 ` Dor Laor
2011-09-15 11:46 ` Christoph Hellwig
2011-09-15 12:01 ` Dor Laor
2011-09-15 12:04 ` Paolo Bonzini
2011-09-15 11:58 ` Paolo Bonzini
2011-09-15 12:34 ` Orit Wasserman
2011-09-15 12:58 ` Paolo Bonzini
2011-09-15 16:59 ` Orit Wasserman
2011-09-15 9:44 ` Daniel P. Berrange
2011-09-15 9:10 ` Kevin Wolf
2011-09-15 9:39 ` Paolo Bonzini
2011-09-21 9:48 ` ronnie sahlberg [this message]
2011-09-23 9:15 ` Mark Wu
2011-09-23 10:16 ` Paolo Bonzini
2011-09-12 8:56 ` [Qemu-devel] [PATCH] Add iSCSI support for QEMU Kevin Wolf
2011-09-14 12:24 ` Orit Wasserman
2011-09-14 14:33 ` Christoph Hellwig
2011-09-14 14:37 ` Christoph Hellwig
2011-09-14 15:35 ` Stefan Hajnoczi
2011-09-14 15:40 ` Christoph Hellwig
2011-09-14 15:51 ` Stefan Hajnoczi
2011-09-14 16:36 ` Orit Wasserman
2011-09-15 6:06 ` Paolo Bonzini
2011-09-15 9:52 ` Orit Wasserman
2011-09-15 9:55 ` Paolo Bonzini
2011-09-15 10:10 ` Kevin Wolf
2011-09-17 19:08 ` Laurent Vivier
2011-09-18 7:43 ` Paolo Bonzini
2011-09-14 16:37 ` Paolo Bonzini
2011-09-14 22:46 ` ronnie sahlberg
-- strict thread matches above, loose matches on Subject: below --
2011-09-21 9:37 Ronnie Sahlberg
2011-09-21 9:37 ` [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI Ronnie Sahlberg
2011-09-21 9:45 ` Paolo Bonzini
2011-09-21 9:52 ` ronnie sahlberg
2011-09-27 20:08 ` ronnie sahlberg
2011-09-28 5:54 ` Paolo Bonzini
2011-09-29 6:54 ` Stefan Hajnoczi
2011-10-09 20:46 ` ronnie sahlberg
2011-10-13 9:46 ` ronnie sahlberg
2011-10-13 9:48 ` Paolo Bonzini
2011-10-13 9:54 ` Stefan Hajnoczi
2011-10-13 10:01 ` Daniel P. Berrange
2011-10-13 10:55 ` Daniel P. Berrange
2011-10-13 10:52 ` Kevin Wolf
2011-10-24 13:33 ` Kevin Wolf
2011-10-25 8:04 ` ronnie sahlberg
2011-10-25 8:17 ` Kevin Wolf
2011-10-25 8:23 ` ronnie sahlberg
2011-10-25 8:46 ` Kevin Wolf
2011-10-28 10:46 ` Zhi Yong Wu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAN05THTxTcaz8VtKOyi2eGDwigDh59SBdKQeuM_PtdKLBinWeA@mail.gmail.com \
--to=ronniesahlberg@gmail.com \
--cc=fujita.tomonori@lab.ntt.co.jp \
--cc=hch@lst.de \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).