qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: ronnie sahlberg <ronniesahlberg@gmail.com>
To: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: kwolf@redhat.com, fujita.tomonori@lab.ntt.co.jp,
	qemu-devel@nongnu.org, hch@lst.de
Subject: Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI
Date: Thu, 15 Sep 2011 09:08:03 +1000	[thread overview]
Message-ID: <CAN05THSb51aX58AXjH_-Vdv=NFjHpw9T0at9b1jbwierhcij=w@mail.gmail.com> (raw)
In-Reply-To: <20110912091408.GA3465@stefanha-thinkpad.localdomain>

Stefan,

Thanks for your review.
I will do the changes you recommend and send an updated patch over the weekend.

Could you advice the best path for handling the .bdrv_flush  and the
blocksize questions?


I think it is reasonable to just not support iscsi at all for
blocksize that is not multiple of 512 bytes
since a read-modify-write cycle across a network would probably be
prohibitively expensive.

.bdrv_flush() I can easily add a synchronous implementation of this
unless your patch is expected to be merged
in the near future.


regards
ronnie sahlberg

On Mon, Sep 12, 2011 at 7:14 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> On Sat, Sep 10, 2011 at 02:23:30PM +1000, Ronnie Sahlberg wrote:
>
> Looking good.  I think this is worth merging because it does offer
> benefits over host iSCSI.
>
>> +static void
>> +iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data,
>> +                    void *private_data)
>> +{
>> +}
>> +
>> +static void
>> +iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
>> +{
>> +    IscsiAIOCB *acb = (IscsiAIOCB *)blockacb;
>> +    IscsiLun *iscsilun = acb->iscsilun;
>> +
>> +    acb->status = -ECANCELED;
>> +    acb->common.cb(acb->common.opaque, acb->status);
>> +    acb->canceled = 1;
>> +
>> +    iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task,
>> +                                     iscsi_abort_task_cb, NULL);
>> +}
>
> The asynchronous abort task call looks odd.  If a caller allocates a
> buffer and issues a read request, then we need to make sure that the
> request is really aborted by the time .bdrv_aio_cancel() returns.
>
> If I understand the code correctly, iscsi_aio_cancel() returns
> immediately but the read request will still be in progress.  That means
> the caller could now free their data buffer and the read request will
> overwrite that unallocated memory.
>

Right.
I will need to remove the read pdu from the local initiator side too
so that if the read is in flight and we receive data for it, that this
data is discarded
and not written into the possibly no longer valid buffers.
I will do this change in the next version of the patch.

>> +static void
>> +iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status,
>> +                     void *command_data, void *opaque)
>> +{
>> +    IscsiAIOCB *acb = opaque;
>> +
>> +    trace_iscsi_aio_write10_cb(iscsi, status, acb, acb->canceled);
>> +
>> +    if (acb->buf != NULL) {
>> +        free(acb->buf);
>> +    }
>
> Please just use g_free(acb->buf).  g_free(NULL) is defined as a nop so
> the check isn't necessary.  Also, this code uses free(3) when it should
> use g_free(3).

Will do.

>
>> +
>> +    if (acb->canceled != 0) {
>> +        qemu_aio_release(acb);
>> +        scsi_free_scsi_task(acb->task);
>> +        acb->task = NULL;
>> +        return;
>> +    }
>> +
>> +    acb->status = 0;
>> +    if (status < 0) {
>> +        error_report("Failed to write10 data to iSCSI lun. %s",
>> +                     iscsi_get_error(iscsi));
>> +        acb->status = -EIO;
>> +    }
>> +
>> +    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
>> +    scsi_free_scsi_task(acb->task);
>> +    acb->task = NULL;
>> +}
>> +
>> +static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
>> +{
>> +    return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
>> +}
>> +
>> +static BlockDriverAIOCB *
>> +iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
>> +                 QEMUIOVector *qiov, int nb_sectors,
>> +                 BlockDriverCompletionFunc *cb,
>> +                 void *opaque)
>> +{
>> +    IscsiLun *iscsilun = bs->opaque;
>> +    struct iscsi_context *iscsi = iscsilun->iscsi;
>> +    IscsiAIOCB *acb;
>> +    size_t size;
>> +    int fua = 0;
>> +
>> +    /* set FUA on writes when cache mode is write through */
>> +    if (!(bs->open_flags & (BDRV_O_CACHE_WB|BDRV_O_NOCACHE))) {
>> +        fua = 1;
>> +    }
>
> FUA needs to reflect the guest semantics - does this disk have an
> enabled write cache?  When bs->open_flags has BDRV_O_CACHE_WB, then the
> guest knows it needs to send flushes because there is a write cache:
>
> if (!(bs->open_flags & BDRV_O_CACHE_WB)) {
>    fua = 1;
> }
>
> BDRV_O_NOCACHE is just for local files and sets the O_DIRECT hint.  It
> doesn't affect the cache semantics that the guest sees.

Thanks.  I'll do this change.

>
>> +/*
>> + * We support iscsi url's on the form
>> + * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
>> + */
>> +static int iscsi_open(BlockDriverState *bs, const char *filename, int flags)
>> +{
>> +    IscsiLun *iscsilun = bs->opaque;
>> +    struct iscsi_context *iscsi = NULL;
>> +    struct iscsi_url *iscsi_url = NULL;
>> +    struct IscsiTask task;
>> +    int ret;
>> +
>> +    if ((BDRV_SECTOR_SIZE % 512) != 0) {
>> +        error_report("iSCSI: Invalid BDRV_SECTOR_SIZE. "
>> +                     "BDRV_SECTOR_SIZE(%lld) is not a multiple "
>> +                     "of 512", BDRV_SECTOR_SIZE);
>> +        return -EINVAL;
>> +    }
>
> Another way of saying this is:
> QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE % 512 != 0);
>
> The advantage is that the build fails instead of waiting until iscsi is
> used at runtime until the failure is detected.

I can add this. But is it better to fail the whole build than to
return an error at runtime if/when iscsi is used?

>
> What will happen if BDRV_SECTOR_SIZE is not a multiple of 512?

iscsi will not work.
I dont think it is worth implementing a read-modify-write cycle for iscsi.
Performance would be so poor for this case that I think it is better
to just not support it at all.

>
>> +
>> +    memset(iscsilun, 0, sizeof(IscsiLun));
>> +
>> +    /* Should really append the KVM name after the ':' here */
>> +    iscsi = iscsi_create_context("iqn.2008-11.org.linux-kvm:");
>> +    if (iscsi == NULL) {
>> +        error_report("iSCSI: Failed to create iSCSI context.");
>> +        ret = -ENOMEM;
>> +        goto failed;
>> +    }
>> +
>> +    iscsi_url = iscsi_parse_full_url(iscsi, filename);
>> +    if (iscsi_url == NULL) {
>> +        error_report("Failed to parse URL : %s %s", filename,
>> +                     iscsi_get_error(iscsi));
>> +        ret = -ENOMEM;
>
> -EINVAL?

Will do.

>
>> +static BlockDriver bdrv_iscsi = {
>> +    .format_name     = "iscsi",
>> +    .protocol_name   = "iscsi",
>> +
>> +    .instance_size   = sizeof(IscsiLun),
>> +    .bdrv_file_open  = iscsi_open,
>> +    .bdrv_close      = iscsi_close,
>> +
>> +    .bdrv_getlength  = iscsi_getlength,
>> +
>> +    .bdrv_aio_readv  = iscsi_aio_readv,
>> +    .bdrv_aio_writev = iscsi_aio_writev,
>> +    .bdrv_aio_flush  = iscsi_aio_flush,
>
> block.c does not emulate .bdrv_flush() using your .bdrv_aio_flush()
> implementation.  I have sent a patch to add this emulation.  This will
> turn bdrv_flush(iscsi_bs) into an actual operation instead of a nop :).

Should I add a synchronous .bdrv_flush ? That is very easy for me to add.
Or should I just wait for your patch to be merged ?


>
>> diff --git a/trace-events b/trace-events
>> index 3fdd60f..4e461e5 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -494,3 +494,10 @@ escc_sunkbd_event_in(int ch) "Untranslated keycode %2.2x"
>>  escc_sunkbd_event_out(int ch) "Translated keycode %2.2x"
>>  escc_kbd_command(int val) "Command %d"
>>  escc_sunmouse_event(int dx, int dy, int buttons_state) "dx=%d dy=%d buttons=%01x"
>> +
>> +# block/iscsi.c
>> +disable iscsi_aio_write10_cb(void *iscsi, int status, void *acb, int canceled) "iscsi %p status %d acb %p canceled %d"
>> +disable iscsi_aio_writev(void *iscsi, int64_t sector_num, int nb_sectors, void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque %p acb %p"
>> +disable iscsi_aio_read10_cb(void *iscsi, int status, void *acb, int canceled) "iscsi %p status %d acb %p canceled %d"
>> +disable iscsi_aio_readv(void *iscsi, int64_t sector_num, int nb_sectors, void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque %p acb %p"
>
> It is no longer necessary to prefix trace event definitions with
> "disable".

Will fix this.

>
> The stderr and simple backends now start up with all events disabled by
> default.  The set of events can be set using the -trace
> events=<events-file> option or on the QEMU monitor using set trace-event
> <name> on.
>
> Stefan
>

  parent reply	other threads:[~2011-09-14 23:08 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-10  4:23 [Qemu-devel] [PATCH] Add iSCSI support for QEMU Ronnie Sahlberg
2011-09-10  4:23 ` [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI Ronnie Sahlberg
2011-09-12  9:14   ` Stefan Hajnoczi
2011-09-14 14:36     ` Christoph Hellwig
2011-09-14 15:50       ` Stefan Hajnoczi
2011-09-16 15:53         ` Christoph Hellwig
2011-09-17  7:11           ` Stefan Hajnoczi
2011-09-14 22:51       ` ronnie sahlberg
2011-09-15  8:02         ` Daniel P. Berrange
2011-09-15  9:03         ` Kevin Wolf
2011-09-14 23:08     ` ronnie sahlberg [this message]
2011-09-15  6:04       ` Paolo Bonzini
2011-09-15  8:48         ` Dor Laor
2011-09-15  9:11           ` Paolo Bonzini
2011-09-15 11:27             ` ronnie sahlberg
2011-09-15 11:42             ` Dor Laor
2011-09-15 11:46               ` Christoph Hellwig
2011-09-15 12:01                 ` Dor Laor
2011-09-15 12:04                   ` Paolo Bonzini
2011-09-15 11:58               ` Paolo Bonzini
2011-09-15 12:34                 ` Orit Wasserman
2011-09-15 12:58                   ` Paolo Bonzini
2011-09-15 16:59                     ` Orit Wasserman
2011-09-15  9:44           ` Daniel P. Berrange
2011-09-15  9:10         ` Kevin Wolf
2011-09-15  9:39           ` Paolo Bonzini
2011-09-21  9:48     ` ronnie sahlberg
2011-09-23  9:15   ` Mark Wu
2011-09-23 10:16     ` Paolo Bonzini
2011-09-12  8:56 ` [Qemu-devel] [PATCH] Add iSCSI support for QEMU Kevin Wolf
2011-09-14 12:24   ` Orit Wasserman
2011-09-14 14:33     ` Christoph Hellwig
2011-09-14 14:37     ` Christoph Hellwig
2011-09-14 15:35       ` Stefan Hajnoczi
2011-09-14 15:40         ` Christoph Hellwig
2011-09-14 15:51           ` Stefan Hajnoczi
2011-09-14 16:36             ` Orit Wasserman
2011-09-15  6:06               ` Paolo Bonzini
2011-09-15  9:52                 ` Orit Wasserman
2011-09-15  9:55                   ` Paolo Bonzini
2011-09-15 10:10                     ` Kevin Wolf
2011-09-17 19:08                 ` Laurent Vivier
2011-09-18  7:43                   ` Paolo Bonzini
2011-09-14 16:37             ` Paolo Bonzini
2011-09-14 22:46     ` ronnie sahlberg
  -- strict thread matches above, loose matches on Subject: below --
2011-09-21  9:37 Ronnie Sahlberg
2011-09-21  9:37 ` [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI Ronnie Sahlberg
2011-09-21  9:45   ` Paolo Bonzini
2011-09-21  9:52     ` ronnie sahlberg
2011-09-27 20:08       ` ronnie sahlberg
2011-09-28  5:54         ` Paolo Bonzini
2011-09-29  6:54   ` Stefan Hajnoczi
2011-10-09 20:46     ` ronnie sahlberg
2011-10-13  9:46       ` ronnie sahlberg
2011-10-13  9:48         ` Paolo Bonzini
2011-10-13  9:54         ` Stefan Hajnoczi
2011-10-13 10:01         ` Daniel P. Berrange
2011-10-13 10:55           ` Daniel P. Berrange
2011-10-13 10:52         ` Kevin Wolf
2011-10-24 13:33   ` Kevin Wolf
2011-10-25  8:04     ` ronnie sahlberg
2011-10-25  8:17       ` Kevin Wolf
2011-10-25  8:23         ` ronnie sahlberg
2011-10-25  8:46           ` Kevin Wolf
2011-10-28 10:46   ` Zhi Yong Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAN05THSb51aX58AXjH_-Vdv=NFjHpw9T0at9b1jbwierhcij=w@mail.gmail.com' \
    --to=ronniesahlberg@gmail.com \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=hch@lst.de \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).