From: Anthony Liguori <anthony@codemonkey.ws>
To: Yehuda Sadeh Weinraub <yehudasa@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
ceph-devel@vger.kernel.org, qemu-devel@nongnu.org,
kvm@vger.kernel.org, Christian Brunner <chb@muc.de>
Subject: Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)
Date: Thu, 07 Oct 2010 16:55:09 -0500 [thread overview]
Message-ID: <4CAE41BD.2070508@codemonkey.ws> (raw)
In-Reply-To: <AANLkTinzdiJZFxacD1BDCPWt44ChokNaNVU6_E=GtcTD@mail.gmail.com>
On 10/07/2010 04:49 PM, Yehuda Sadeh Weinraub wrote:
> On Thu, Oct 7, 2010 at 2:04 PM, Anthony Liguori<anthony@codemonkey.ws> wrote:
>
>> On 10/07/2010 03:47 PM, Yehuda Sadeh Weinraub wrote:
>>
>>>> How is that possible? Are the callbacks delivered in the context of a
>>>> different thread? If so, don't you need locking?
>>>>
>>>>
>>> Not sure I'm completely following you. The callbacks are delivered in
>>> the context of a different thread, but won't run concurrently.
>>>
>> Concurrently to what? How do you prevent them from running concurrently
>> with qemu?
>>
> There are two types of callbacks. The first is for rados aio
> completions, and the second one is the one added later for the fd glue
> layer.
>
This is a bad architecture for something like qemu. You could create a
pipe and use the pipe to signal to qemu. Same principle as eventfd.
Ideally, you would do this in the library itself.
Regards,
Anthony Liguori
> The first callback, called by librados whenever aio completes, runs in
> the context of a single librados thread:
>
> +static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb)
> +{
> + RBDAIOCB *acb = rcb->acb;
> rcb is per a single aio. Was created before and will be destroyed
> here, whereas acb is shared between a few aios, however, it was
> generated before the first aio was created.
>
> + int64_t r;
> + uint64_t buf = 1;
> + int i;
> +
> + acb->aiocnt--;
>
> acb->aiocnt has been set before initiating all the aios, so it's ok to
> touch it now. Same goes to all acb fields.
>
> + r = rados_aio_get_return_value(c);
> + rados_aio_release(c);
> + if (acb->write) {
> + if (r< 0) {
> + acb->ret = r;
> + acb->error = 1;
> + } else if (!acb->error) {
> + acb->ret += rcb->segsize;
> + }
> + } else {
> + if (r == -ENOENT) {
> + memset(rcb->buf, 0, rcb->segsize);
> + if (!acb->error) {
> + acb->ret += rcb->segsize;
> + }
> + } else if (r< 0) {
> + acb->ret = r;
> + acb->error = 1;
> + } else if (r< rcb->segsize) {
> + memset(rcb->buf + r, 0, rcb->segsize - r);
> + if (!acb->error) {
> + acb->ret += rcb->segsize;
> + }
> + } else if (!acb->error) {
> + acb->ret += r;
> + }
> + }
> + if (write(acb->s->efd,&buf, sizeof(buf))< 0)
> This will wake up the io_read()
>
> + error_report("failed writing to acb->s->efd\n");
> + qemu_free(rcb);
> + i = 0;
> + if (!acb->aiocnt&& acb->bh) {
> + qemu_bh_schedule(acb->bh);
> This is the only qemu related call in here, seems safe to call it.
>
> + }
> +}
>
> The scheduled bh function will be called only after all aios that
> relate to this specific aio set are done, so the following seems ok,
> as there's no more acb references.
> +static void rbd_aio_bh_cb(void *opaque)
> +{
> + RBDAIOCB *acb = opaque;
> + uint64_t buf = 1;
> +
> + if (!acb->write) {
> + qemu_iovec_from_buffer(acb->qiov, acb->bounce, acb->qiov->size);
> + }
> + qemu_vfree(acb->bounce);
> + acb->common.cb(acb->common.opaque, (acb->ret> 0 ? 0 : acb->ret));
> + qemu_bh_delete(acb->bh);
> + acb->bh = NULL;
> +
> + if (write(acb->s->efd,&buf, sizeof(buf))< 0)
> + error_report("failed writing to acb->s->efd\n");
> + qemu_aio_release(acb);
> +}
>
> Now, the second ones are the io_read(), in which we have our glue fd.
> We send uint64 per each completed io
>
> +static void rbd_aio_completion_cb(void *opaque)
> +{
> + BDRVRBDState *s = opaque;
> +
> + uint64_t val;
> + ssize_t ret;
> +
> + do {
> + if ((ret = read(s->efd,&val, sizeof(val)))> 0) {
> + s->qemu_aio_count -= val;
> There is an issue here with s->qemu_aio_count which needs to be
> protected by a mutex. Other than that, it just reads from s->efd.
>
> + }
> + } while (ret< 0&& errno == EINTR);
> +
> + return;
> +}
> +
> +static int rbd_aio_flush_cb(void *opaque)
> +{
> + BDRVRBDState *s = opaque;
> +
> + return (s->qemu_aio_count> 0);
> Same here as with the previous one, needs a mutex around s->qemu_aio_count.
>
> +}
>
>
>> If you saw lock ups, I bet that's what it was from.
>>
>>
> As I explained before, before introducing the fd glue layer, the lack
> of fd associated with our block device caused that there was no way
> for qemu to check whether all aios were flushed or not, which didn't
> work well when doing migration/savevm.
>
> Thanks,
> Yehuda
>
next prev parent reply other threads:[~2010-10-07 21:55 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-08-02 19:46 [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4) Christian Brunner
2010-08-02 20:37 ` malc
2010-08-03 20:14 ` Christian Brunner
2010-09-23 2:21 ` Yehuda Sadeh Weinraub
2010-10-07 12:23 ` Kevin Wolf
2010-10-07 14:12 ` Anthony Liguori
2010-10-07 18:08 ` Yehuda Sadeh Weinraub
2010-10-07 18:38 ` Anthony Liguori
2010-10-07 18:41 ` Yehuda Sadeh Weinraub
2010-10-07 19:51 ` Anthony Liguori
2010-10-07 20:47 ` Yehuda Sadeh Weinraub
2010-10-07 21:04 ` Anthony Liguori
2010-10-07 21:49 ` Yehuda Sadeh Weinraub
2010-10-07 21:55 ` Anthony Liguori [this message]
2010-10-07 22:45 ` Sage Weil
2010-10-08 14:06 ` Anthony Liguori
2010-10-08 15:50 ` Yehuda Sadeh Weinraub
2010-10-08 16:05 ` Anthony Liguori
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4CAE41BD.2070508@codemonkey.ws \
--to=anthony@codemonkey.ws \
--cc=ceph-devel@vger.kernel.org \
--cc=chb@muc.de \
--cc=kvm@vger.kernel.org \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=yehudasa@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).