Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Bharata B Rao <bharata@linux.vnet.ibm.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>,
	Anand Avati <aavati@redhat.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>,
	Vijay Bellur <vbellur@redhat.com>,
	Amar Tumballi <amarts@redhat.com>,
	qemu-devel@nongnu.org, Blue Swirl <blauwirbel@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend
Date: Fri, 7 Sep 2012 20:36:43 +0530	[thread overview]
Message-ID: <20120907150643.GF20421@in.ibm.com> (raw)
In-Reply-To: <50487B0A.6010908@redhat.com>

On Thu, Sep 06, 2012 at 12:29:30PM +0200, Kevin Wolf wrote:
> Am 06.09.2012 12:18, schrieb Paolo Bonzini:
> > Il 06/09/2012 12:07, Kevin Wolf ha scritto:
> >>> The AIOCB is already invalid at the time the callback is entered, so we
> >>> could release it before the call.  However, not all implementation of
> >>> AIO are ready for that and I'm not really in the mood for large scale
> >>> refactoring...
> >>
> >> But the way, what I'd really want to see in the end is to get rid of
> >> qemu_aio_flush() and replace it by .bdrv_drain() callbacks in each
> >> BlockDriver. The way we're doing it today is a layering violation.
> > 
> > That's quite difficult.  Completion of an I/O operation can trigger
> > another I/O operation on another block device, and so on until we go
> > back to the first device (think of a hypothetical RAID-5 device).
> 
> You always have a tree of BDSes, and children should only ever trigger
> completion of I/O operations in their parents. Am I missing anything?
> 
> >> Doesn't change anything about this problem, though. So the options that
> >> we have are:
> >>
> >> 1. Delay the callback using a BH. Doing this in each driver is ugly.
> >>    But is there actually more than one possible callback in today's
> >>    coroutine world? I only see bdrv_co_io_em_complete(), which could
> >>    reenter the coroutine from a BH.
> > 
> > Easy and safe, but it feels a bit like a timebomb.  Also, I'm not
> > entirely sure of _why_ the bottom half works. :)
> 
> Hm, safe and time bomb is contradictory in my book. :-)
> 
> The bottom half work because we're not reentering the qcow2_create
> coroutine immediately, so the gluster AIO callback can complete all of
> its cleanup work without being interrupted by code that might wait on
> this particular request and create a deadlock this way.
> 
> >> 2. Delay the callback by just calling it later when the cleanup has
> >>    been completed and .io_flush() can return 0. You say that it's hard
> >>    to implement for some drivers, except if the AIOCB are leaked until
> >>    the end of functions like qcow2_create().
> > 
> > ... which is what we do in posix-aio-compat.c; nobody screamed so far.
> 
> True. Would be easy to fix in posix-aio-compat, though, or can a
> callback expect that the AIOCB is still valid?
> 
> > Not really hard, it just has to be assessed for each driver separately.
> >  We can just do it in gluster and refactor it later.
> 
> Okay, so let's keep it as an option for now.

I tried this approach (option 2) in gluster and I was able to go past the hang
I was seeing earlier, but this causes other problems.

Let me restate what I am doing so that you could tell me if I am indeed
following the option 2 you mention above. I am doing the cleanup first
(qemu_aio_count-- and releasing the AIOCB) before calling the callback at
the end.

static void qemu_gluster_complete_aio(GlusterAIOCB *acb, BDRVGlusterState *s)
{
    int ret;
    bool *finished = acb->finished;
    BlockDriverCompletionFunc *cb = acb->common.cb;
    void *opaque = acb->common.opaque;
    
    if (!acb->ret || acb->ret == acb->size) {
        ret = 0; /* Success */
    } else if (acb->ret < 0) {
        ret = acb->ret; /* Read/Write failed */
    } else {
        ret = -EIO; /* Partial read/write - fail it */
    }
    s->qemu_aio_count--;
    qemu_aio_release(acb);

    cb(opaque, ret);
    if (finished) {
        *finished = true;
    }
}

static void qemu_gluster_aio_event_reader(void *opaque)
{
    BDRVGlusterState *s = opaque;
    ssize_t ret;

    do {
        char *p = (char *)&s->event_acb;

        ret = read(s->fds[GLUSTER_FD_READ], p + s->event_reader_pos,
                   sizeof(s->event_acb) - s->event_reader_pos);
        if (ret > 0) {
            s->event_reader_pos += ret;
            if (s->event_reader_pos == sizeof(s->event_acb)) {
                s->event_reader_pos = 0;
                qemu_gluster_complete_aio(s->event_acb, s);
                //s->qemu_aio_count--;
            }
        }
    } while (ret < 0 && errno == EINTR);
}

qemu_gluster_aio_event_reader() is the node->io_read in qemu_aio_wait().

qemu_aio_wait() calls node->io_read() which calls qemu_gluster_complete_aio().
Before we return back to qemu_aio_wait(), many other things happen:

bdrv_close() gets called from qcow2_create2()
This closes the gluster connection, closes the pipe, does
qemu_set_fd_hander(read_pipe_fd, NULL, NULL, NULL, NULL), which results
in the AioHandler node being deleted from aio_handlers list.

Now qemu_gluster_aio_event_reader (node->io_read) which was called from
qemu_aio_wait() finally completes and goes ahead and accesses "node"
which has already been deleted. This causes segfault.

So I think the option 1 (scheduling a BH from node->io_read) would
be better for gluster.

Regards,
Bharata.

next prev parent reply	other threads:[~2012-09-07 15:05 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-09 13:00 [Qemu-devel] [PATCH v6 0/2] GlusterFS support in QEMU - v6 Bharata B Rao
2012-08-09 13:01 ` [Qemu-devel] [PATCH v6 1/2] qemu: Add a config option for GlusterFS as block backend Bharata B Rao
2012-08-09 13:02 ` [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU " Bharata B Rao
2012-08-13 12:50   ` Kevin Wolf
2012-08-14  4:38     ` Bharata B Rao
2012-08-14  8:29       ` Kevin Wolf
2012-08-14  9:34         ` Bharata B Rao
2012-08-14  9:58           ` Kevin Wolf
2012-09-06  8:29             ` Avi Kivity
2012-09-06 15:40               ` Bharata B Rao
2012-09-06 15:44                 ` Paolo Bonzini
2012-09-06 15:47                 ` Daniel P. Berrange
2012-09-06 16:04                   ` ronnie sahlberg
2012-09-06 16:06                   ` Avi Kivity
2012-09-07  3:24                   ` Bharata B Rao
2012-09-07  9:19                     ` Daniel P. Berrange
2012-09-07  9:36                     ` Paolo Bonzini
2012-09-07  9:57                       ` Kevin Wolf
2012-09-12  9:22                         ` Bharata B Rao
2012-09-12  9:24                           ` Paolo Bonzini
2012-09-07 10:00                   ` Kevin Wolf
2012-09-07 10:03                     ` Daniel P. Berrange
2012-09-07 10:05                       ` Paolo Bonzini
2012-08-15  5:21         ` Bharata B Rao
2012-08-15  8:00           ` Kevin Wolf
2012-08-15  9:22             ` Bharata B Rao
2012-08-15  8:51         ` Bharata B Rao
2012-09-05  7:41   ` Bharata B Rao
2012-09-05  9:57     ` Bharata B Rao
2012-09-06  7:23       ` Paolo Bonzini
2012-09-06  9:06         ` Kevin Wolf
2012-09-06  9:38           ` Paolo Bonzini
2012-09-06 10:07             ` Kevin Wolf
2012-09-06 10:18               ` Paolo Bonzini
2012-09-06 10:29                 ` Kevin Wolf
2012-09-06 11:01                   ` Paolo Bonzini
2012-09-07 15:06                   ` Bharata B Rao [this message]
2012-09-07 15:11                     ` Paolo Bonzini
2012-09-08 14:22                       ` Bharata B Rao
2012-09-05 10:01     ` Kevin Wolf
2012-09-05 10:43       ` Bharata B Rao
2012-09-06  7:35   ` Paolo Bonzini
2012-09-07  5:46     ` Bharata B Rao
2012-08-13  9:49 ` [Qemu-devel] [PATCH v6 0/2] GlusterFS support in QEMU - v6 Bharata B Rao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120907150643.GF20421@in.ibm.com \
    --to=bharata@linux.vnet.ibm.com \
    --cc=aavati@redhat.com \
    --cc=aliguori@us.ibm.com \
    --cc=amarts@redhat.com \
    --cc=blauwirbel@gmail.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@linux.vnet.ibm.com \
    --cc=vbellur@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.