From: Paolo Bonzini <pbonzini@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>,
Anand Avati <aavati@redhat.com>,
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>,
Vijay Bellur <vbellur@redhat.com>,
Amar Tumballi <amarts@redhat.com>,
qemu-devel@nongnu.org, Blue Swirl <blauwirbel@gmail.com>,
bharata@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend
Date: Thu, 06 Sep 2012 13:01:02 +0200 [thread overview]
Message-ID: <5048826E.7020700@redhat.com> (raw)
In-Reply-To: <50487B0A.6010908@redhat.com>
Il 06/09/2012 12:29, Kevin Wolf ha scritto:
>> That's quite difficult. Completion of an I/O operation can trigger
>> another I/O operation on another block device, and so on until we go
>> back to the first device (think of a hypothetical RAID-5 device).
>
> You always have a tree of BDSes, and children should only ever trigger
> completion of I/O operations in their parents. Am I missing anything?
Yes, it can only ever trigger completion in the parents. So if you had
bdrv_drain in the children, it could leave other I/O pending on the
siblings, but that's okay. Very nice!! I hadn't thought of the tree.
>>> Doesn't change anything about this problem, though. So the options that
>>> we have are:
>>>
>>> 1. Delay the callback using a BH. Doing this in each driver is ugly.
>>> But is there actually more than one possible callback in today's
>>> coroutine world? I only see bdrv_co_io_em_complete(), which could
>>> reenter the coroutine from a BH.
>>
>> Easy and safe, but it feels a bit like a timebomb. Also, I'm not
>> entirely sure of _why_ the bottom half works. :)
>
> Hm, safe and time bomb is contradictory in my book. :-)
Well, safe for now. :)
> The bottom half work because we're not reentering the qcow2_create
> coroutine immediately, so the gluster AIO callback can complete all of
> its cleanup work without being interrupted by code that might wait on
> this particular request and create a deadlock this way.
Got it now. It's just (2) with a bottom half instead of simple code
reorganization.
>>> 2. Delay the callback by just calling it later when the cleanup has
>>> been completed and .io_flush() can return 0. You say that it's hard
>>> to implement for some drivers, except if the AIOCB are leaked until
>>> the end of functions like qcow2_create().
>>
>> ... which is what we do in posix-aio-compat.c; nobody screamed so far.
>
> True. Would be easy to fix in posix-aio-compat, though, or can a
> callback expect that the AIOCB is still valid?
IMO no. What would you use it for, anyway? It's opaque, all you could
do is bdrv_aio_cancel it. I checked all of the callers of
bdrv_aio_cancel. SCSI always zeroes their aiocb pointers on entry to
the AIO callback. IDE is a bit less explicit, but in the end will
always zero the aiocb as well. AHCI probably has a bug there (it does
not NULL the aiocb in ncq_cb).
virtio and Xen do not even store the aiocb, i.e. they couldn't care less.
>> Not really hard, it just has to be assessed for each driver separately.
>> We can just do it in gluster and refactor it later.
>
> Okay, so let's keep it as an option for now.
>
>>> 3. Add a delay only later in functions like bdrv_drain_all() that assume
>>> that the request has completed. Are there more of this type? AIOCBs
>>> are leaked until a bdrv_drain_all() call. Does it work with draining
>>> specific BDSes instead of everything?
>>>
>>> Unless I forgot some important point, it almost looks like option 1 is
>>> the easiest and safest.
>>
>> I agree with your opinion, but I would feel better if I understood
>> better why it works. (2) can be done easily in each driver (no
>> ugliness) and refactored later.
>
> I think option 2 must be done in each driver by design, or do you see
> even a theoretical way how to do it generically?
Yes, it has to be done in every driver. If we added something like
qemu_aio_complete(acb, ret) that does
cb = acb->cb;
opaque = acb->opaque;
qemu_aio_release(acb);
cb(opaque, ret);
by converting the driver to qemu_aio_complete you would avoid the leak.
Paolo
next prev parent reply other threads:[~2012-09-06 11:01 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-08-09 13:00 [Qemu-devel] [PATCH v6 0/2] GlusterFS support in QEMU - v6 Bharata B Rao
2012-08-09 13:01 ` [Qemu-devel] [PATCH v6 1/2] qemu: Add a config option for GlusterFS as block backend Bharata B Rao
2012-08-09 13:02 ` [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU " Bharata B Rao
2012-08-13 12:50 ` Kevin Wolf
2012-08-14 4:38 ` Bharata B Rao
2012-08-14 8:29 ` Kevin Wolf
2012-08-14 9:34 ` Bharata B Rao
2012-08-14 9:58 ` Kevin Wolf
2012-09-06 8:29 ` Avi Kivity
2012-09-06 15:40 ` Bharata B Rao
2012-09-06 15:44 ` Paolo Bonzini
2012-09-06 15:47 ` Daniel P. Berrange
2012-09-06 16:04 ` ronnie sahlberg
2012-09-06 16:06 ` Avi Kivity
2012-09-07 3:24 ` Bharata B Rao
2012-09-07 9:19 ` Daniel P. Berrange
2012-09-07 9:36 ` Paolo Bonzini
2012-09-07 9:57 ` Kevin Wolf
2012-09-12 9:22 ` Bharata B Rao
2012-09-12 9:24 ` Paolo Bonzini
2012-09-07 10:00 ` Kevin Wolf
2012-09-07 10:03 ` Daniel P. Berrange
2012-09-07 10:05 ` Paolo Bonzini
2012-08-15 5:21 ` Bharata B Rao
2012-08-15 8:00 ` Kevin Wolf
2012-08-15 9:22 ` Bharata B Rao
2012-08-15 8:51 ` Bharata B Rao
2012-09-05 7:41 ` Bharata B Rao
2012-09-05 9:57 ` Bharata B Rao
2012-09-06 7:23 ` Paolo Bonzini
2012-09-06 9:06 ` Kevin Wolf
2012-09-06 9:38 ` Paolo Bonzini
2012-09-06 10:07 ` Kevin Wolf
2012-09-06 10:18 ` Paolo Bonzini
2012-09-06 10:29 ` Kevin Wolf
2012-09-06 11:01 ` Paolo Bonzini [this message]
2012-09-07 15:06 ` Bharata B Rao
2012-09-07 15:11 ` Paolo Bonzini
2012-09-08 14:22 ` Bharata B Rao
2012-09-05 10:01 ` Kevin Wolf
2012-09-05 10:43 ` Bharata B Rao
2012-09-06 7:35 ` Paolo Bonzini
2012-09-07 5:46 ` Bharata B Rao
2012-08-13 9:49 ` [Qemu-devel] [PATCH v6 0/2] GlusterFS support in QEMU - v6 Bharata B Rao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5048826E.7020700@redhat.com \
--to=pbonzini@redhat.com \
--cc=aavati@redhat.com \
--cc=aliguori@us.ibm.com \
--cc=amarts@redhat.com \
--cc=bharata@linux.vnet.ibm.com \
--cc=blauwirbel@gmail.com \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@linux.vnet.ibm.com \
--cc=vbellur@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.