qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Cody <jcody@redhat.com>
To: Niels de Vos <ndevos@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, pkarampu@redhat.com,
	qemu-devel@nongnu.org, rwheeler@redhat.com, kdhananj@redhat.com
Subject: Re: [Qemu-devel] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error
Date: Wed, 6 Apr 2016 12:05:25 -0400	[thread overview]
Message-ID: <20160406160525.GC7329@localhost.localdomain> (raw)
In-Reply-To: <20160406144732.GM17260@ndevos-x240.usersys.redhat.com>

On Wed, Apr 06, 2016 at 04:47:32PM +0200, Niels de Vos wrote:
> On Wed, Apr 06, 2016 at 08:44:18AM -0400, Jeff Cody wrote:
> > On Wed, Apr 06, 2016 at 01:02:16PM +0200, Kevin Wolf wrote:
> > > [ Adding some CCs ]
> > > 
> > > Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> > > > Upon receiving an I/O error after an fsync, by default gluster will
> > > > dump its cache.  However, QEMU will retry the fsync, which is especially
> > > > useful when encountering errors such as ENOSPC when using the werror=stop
> > > > option.  When using caching with gluster, however, the last written data
> > > > will be lost upon encountering ENOSPC.  Using the cache xlator option of
> 
> Use "write-behind xlator" instead of "cache xlator". There are different
> caches in Gluster.
>

Thanks, I'll update the commit message.

> > > > 'resync-failed-syncs-after-fsync' should cause gluster to retain the
> > > > cached data after a failed fsync, so that ENOSPC and other transient
> > > > errors are recoverable.
> > > > 
> > > > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > > > ---
> > > >  block/gluster.c | 27 +++++++++++++++++++++++++++
> > > >  configure       |  8 ++++++++
> > > >  2 files changed, 35 insertions(+)
> > > > 
> > > > diff --git a/block/gluster.c b/block/gluster.c
> > > > index 30a827e..b1cf71b 100644
> > > > --- a/block/gluster.c
> > > > +++ b/block/gluster.c
> > > > @@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> > > >          goto out;
> > > >      }
> > > >  
> > > > +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> > > > +    /* Without this, if fsync fails for a recoverable reason (for instance,
> > > > +     * ENOSPC), gluster will dump its cache, preventing retries.  This means
> > > > +     * almost certain data loss.  Not all gluster versions support the
> > > > +     * 'resync-failed-syncs-after-fsync' key value, but there is no way to
> > > > +     * discover during runtime if it is supported (this api returns success for
> > > > +     * unknown key/value pairs) */
> > > 
> > > Honestly, this sucks. There is apparently no way to operate gluster so
> > > we can safely recover after a failed fsync. "We hope everything is fine,
> > > but depending on your gluster version, we may now corrupt your image"
> > > isn't very good.
> > > 
> > > We need to consider very carefully if this is good enough to go on after
> > > an error. I'm currently leaning towards "no". That is, we should only
> > > enable this after Gluster provides us a way to make sure that the option
> > > is really set.
> 
> Unfortunately it is also possible to disable the write-behind xlator as
> well. This would cause setting the option to fail too :-/ At the moment
> there is no real useful way to detect if write-behind is disabled (it is
> enabled by default).
> 
> > > > +    ret = glfs_set_xlator_option (s->glfs, "*-write-behind",
> > > > +                                           "resync-failed-syncs-after-fsync",
> > > > +                                           "on");
> > > > +    if (ret < 0) {
> > > > +        error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
> > > > +        ret = -errno;
> > > > +        goto out;
> > > > +    }
> > > > +#endif
> > > 
> > > We also need to consider the case without CONFIG_GLUSTERFS_XLATOR_OPT.
> > > In this case (as well as theoretically in the case that the option
> > > didn't take effect - if only we could know about it), a failed
> > > glfs_fsync_async() is fatal and we need to stop operating on the image,
> > > i.e. set bs->drv = NULL like when we detect corruption in qcow2 images.
> > > The guest will see a broken disk that fails all I/O requests, but that's
> > > better than corrupting data.
> > >
> > 
> > Gluster versions that don't support CONFIG_GLUSTERFS_XLATOR_OPT will
> > also not have the gluster patch that removes the file descriptor
> > invalidation upon error (unless that was a relatively new
> > bug/feature).  But if that is the case, every operation on the file
> > descriptor in those versions will return error.  But it is also rather
> > old versions that don't support glfs_set_xlator_option() I believe.
> 
> Indeed, glfs_set_xlator_option() was introduced with glusterfs-3.4.0. We
> are currently on glusterfs-3.7, with the oldest supported version of
> 3.5. In ~2 months we hopefully have a 3.8 release and that will cause

Is it possible for 3.8 to be able to validate key/value pairs in
glfs_set_xlator_option()?  Or is it something that by design is decoupled
enough that it isn't feasible?

This is the second instance I know of that a lack of error information from
the gluster api has made it difficult to determine if a feature is
supported (the other instance being SEEK_DATA/SEEK_HOLE).  It'd be nice if
error returns were more consistent in 3.8 in the API, is possible.

> the end-of-life of 3.5. 3.4 has been EOL for about a year now, hopefully
> all our users have upgraded, but we know that some users will stay on
> unsupported versions for a long time...
> 
> However, the "resync-failed-syncs-after-fsync" option was only
> introduced recently, with glusterfs-3.7.9. You could detect this with
> pkg-config glusterfs-api >= 7.3.7.9 if need to be.
> 

Unfortunately, compile-time detection of the specific key option isn't a
great option, since we are using gluster as a shared library. Backports for
various distributions may or may not have the
"resync-failed-syncs-after-fsync" key in it regardless of versioning, and
it isn't a new symbol that is detectable by the linker or at runtime.
Additionally, QEMU on a system with gluster version 3.7.8, for instance,
would have no way of knowing that "resync-failed-syncs-after-fsync" does
nothing.

The reason for the compile-time check for glfs_set_xlator_option() is
because it is a symbol that will be looked up when QEMU is linked (and
executed).  In contrast, if the user downgrades gluster on their system to
a version that doesn't have glfs_set_xlator_option(), QEMU won't run, so
the breakage will be visible.

> More details about the problem the option addresses are in the commit
> message on http://review.gluster.org/13057 .
>

Thanks, that is useful!

Jeff

  reply	other threads:[~2016-04-06 16:05 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-06  3:29 [Qemu-devel] [PATCH for-2.6 0/2] Bug fixes for gluster Jeff Cody
2016-04-06  3:29 ` [Qemu-devel] [PATCH for-2.6 1/2] block/gluster: return correct error value Jeff Cody
2016-04-06 15:00   ` Niels de Vos
2016-04-06  3:29 ` [Qemu-devel] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error Jeff Cody
2016-04-06 11:02   ` Kevin Wolf
2016-04-06 11:19     ` Ric Wheeler
2016-04-06 11:41       ` Kevin Wolf
2016-04-06 11:51         ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2016-04-06 13:10           ` Jeff Cody
2016-04-06 13:20             ` Kevin Wolf
2016-04-07  7:48               ` Pranith Kumar Karampuri
2016-04-11  4:26                 ` Raghavendra Gowdappa
2016-05-09  6:38                   ` Raghavendra Talur
2016-05-09  8:02                     ` Kevin Wolf
2016-04-06 12:44     ` [Qemu-devel] " Jeff Cody
2016-04-06 14:47       ` Niels de Vos
2016-04-06 16:05         ` Jeff Cody [this message]
2016-04-07 16:17     ` Jeff Cody

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160406160525.GC7329@localhost.localdomain \
    --to=jcody@redhat.com \
    --cc=kdhananj@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=ndevos@redhat.com \
    --cc=pkarampu@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rwheeler@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).