Re: [Qemu-devel] [PATCH for-2.6 v2 3/3] block/gluster: prevent data loss after i/o error

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Jeff Cody <jcody@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
	rwheeler@redhat.com, pkarampu@redhat.com, rgowdapp@redhat.com,
	ndevos@redhat.com
Subject: Re: [Qemu-devel] [PATCH for-2.6 v2 3/3] block/gluster: prevent data loss after i/o error
Date: Tue, 19 Apr 2016 08:29:27 -0400	[thread overview]
Message-ID: <20160419122927.GC7187@localhost.localdomain> (raw)
In-Reply-To: <20160419122756.GB4498@noname.str.redhat.com>

On Tue, Apr 19, 2016 at 02:27:56PM +0200, Kevin Wolf wrote:
> Am 19.04.2016 um 14:07 hat Jeff Cody geschrieben:
> > Upon receiving an I/O error after an fsync, by default gluster will
> > dump its cache.  However, QEMU will retry the fsync, which is especially
> > useful when encountering errors such as ENOSPC when using the werror=stop
> > option.  When using caching with gluster, however, the last written data
> > will be lost upon encountering ENOSPC.  Using the write-behind-cache
> > xlator option of 'resync-failed-syncs-after-fsync' should cause gluster
> > to retain the cached data after a failed fsync, so that ENOSPC and other
> > transient errors are recoverable.
> > 
> > Unfortunately, we have no way of knowing if the
> > 'resync-failed-syncs-after-fsync' xlator option is supported, so for now
> > close the fd and set the BDS driver to NULL upon fsync error.
> > 
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > ---
> >  block/gluster.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> >  configure       |  8 ++++++++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/block/gluster.c b/block/gluster.c
> > index d9aace6..ba33488 100644
> > --- a/block/gluster.c
> > +++ b/block/gluster.c
> > @@ -314,6 +314,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> >          goto out;
> >      }
> >  
> > +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> > +    /* Without this, if fsync fails for a recoverable reason (for instance,
> > +     * ENOSPC), gluster will dump its cache, preventing retries.  This means
> > +     * almost certain data loss.  Not all gluster versions support the
> > +     * 'resync-failed-syncs-after-fsync' key value, but there is no way to
> > +     * discover during runtime if it is supported (this api returns success for
> > +     * unknown key/value pairs) */
> > +    ret = glfs_set_xlator_option(s->glfs, "*-write-behind",
> > +                                          "resync-failed-syncs-after-fsync",
> > +                                          "on");
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
> > +        ret = -errno;
> > +        goto out;
> > +    }
> > +#endif
> > +
> >      qemu_gluster_parse_flags(bdrv_flags, &open_flags);
> >  
> >      s->fd = glfs_open(s->glfs, gconf->image, open_flags);
> > @@ -366,6 +383,16 @@ static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
> >          goto exit;
> >      }
> >  
> > +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> > +    ret = glfs_set_xlator_option(reop_s->glfs, "*-write-behind",
> > +                                 "resync-failed-syncs-after-fsync", "on");
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
> > +        ret = -errno;
> > +        goto exit;
> > +    }
> > +#endif
> > +
> >      reop_s->fd = glfs_open(reop_s->glfs, gconf->image, open_flags);
> >      if (reop_s->fd == NULL) {
> >          /* reops->glfs will be cleaned up in _abort */
> > @@ -613,6 +640,21 @@ static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
> >  
> >      ret = glfs_fsync_async(s->fd, gluster_finish_aiocb, &acb);
> >      if (ret < 0) {
> > +        /* Some versions of Gluster (3.5.6 -> 3.5.8?) will not retain its
> > +         * cache after a fsync failure, so we have no way of allowing the guest
> > +         * to safely continue.  Gluster versions prior to 3.5.6 don't retain
> > +         * the cache either, but will invalidate the fd on error, so this is
> > +         * again our only option.
> > +         *
> > +         * The 'resync-failed-syncs-after-fsync' xlator option for the
> > +         * write-behind cache will cause later gluster versions to retain
> > +         * its cache after error, so long as the fd remains open.  However,
> > +         * we currently have no way of knowing if this option is supported.
> > +         *
> > +         * TODO: Once gluster provides a way for us to determine if the option
> > +         *       is supported, bypass the closure and setting drv to NULL.  */
> > +        qemu_gluster_close(bs);
> > +        bs->drv = NULL;
> >          return -errno;
> >      }
> 
> More context:
> 
>         qemu_coroutine_yield();
>         return acb.ret;
>     }
> 
> I would guess that acb.ret containing an error is the more common case.
> We should probably invalidate the BDS in both cases (immediate failure
> and callback with error code).
>

Ah yes, indeed.  I'll do that now.

next prev parent reply	other threads:[~2016-04-19 12:29 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-19 12:07 [Qemu-devel] [PATCH for-2.6 v2 0/3] Bug fixes for gluster Jeff Cody
2016-04-19 12:07 ` [Qemu-devel] [PATCH for-2.6 v2 1/3] block/gluster: return correct error value Jeff Cody
2016-04-19 12:07 ` [Qemu-devel] [PATCH for-2.6 v2 2/3] block/gluster: code movement of qemu_gluster_close() Jeff Cody
2016-04-19 12:07 ` [Qemu-devel] [PATCH for-2.6 v2 3/3] block/gluster: prevent data loss after i/o error Jeff Cody
2016-04-19 12:27   ` Kevin Wolf
2016-04-19 12:29     ` Jeff Cody [this message]
2016-04-19 12:18 ` [Qemu-devel] [PATCH for-2.6 v2 0/3] Bug fixes for gluster Ric Wheeler
2016-04-19 14:09   ` Jeff Cody
2016-04-20  1:56     ` Ric Wheeler
2016-04-20  9:24       ` Kevin Wolf
2016-04-20 10:40         ` Ric Wheeler
2016-04-20 11:46           ` Kevin Wolf
2016-04-20 18:38             ` Rik van Riel
2016-04-21  8:43               ` Kevin Wolf
2016-04-20 18:37           ` Rik van Riel
2016-04-20  5:15     ` Raghavendra Gowdappa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160419122927.GC7187@localhost.localdomain \
    --to=jcody@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=ndevos@redhat.com \
    --cc=pkarampu@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rgowdapp@redhat.com \
    --cc=rwheeler@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).