From: Jeff Cody <jcody@redhat.com>
To: Niels de Vos <ndevos@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
qemu-block@nongnu.org, pkarampu@redhat.com,
qemu-devel@nongnu.org, rwheeler@redhat.com, kdhananj@redhat.com
Subject: Re: [Qemu-devel] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error
Date: Wed, 6 Apr 2016 12:05:25 -0400 [thread overview]
Message-ID: <20160406160525.GC7329@localhost.localdomain> (raw)
In-Reply-To: <20160406144732.GM17260@ndevos-x240.usersys.redhat.com>
On Wed, Apr 06, 2016 at 04:47:32PM +0200, Niels de Vos wrote:
> On Wed, Apr 06, 2016 at 08:44:18AM -0400, Jeff Cody wrote:
> > On Wed, Apr 06, 2016 at 01:02:16PM +0200, Kevin Wolf wrote:
> > > [ Adding some CCs ]
> > >
> > > Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> > > > Upon receiving an I/O error after an fsync, by default gluster will
> > > > dump its cache. However, QEMU will retry the fsync, which is especially
> > > > useful when encountering errors such as ENOSPC when using the werror=stop
> > > > option. When using caching with gluster, however, the last written data
> > > > will be lost upon encountering ENOSPC. Using the cache xlator option of
>
> Use "write-behind xlator" instead of "cache xlator". There are different
> caches in Gluster.
>
Thanks, I'll update the commit message.
> > > > 'resync-failed-syncs-after-fsync' should cause gluster to retain the
> > > > cached data after a failed fsync, so that ENOSPC and other transient
> > > > errors are recoverable.
> > > >
> > > > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > > > ---
> > > > block/gluster.c | 27 +++++++++++++++++++++++++++
> > > > configure | 8 ++++++++
> > > > 2 files changed, 35 insertions(+)
> > > >
> > > > diff --git a/block/gluster.c b/block/gluster.c
> > > > index 30a827e..b1cf71b 100644
> > > > --- a/block/gluster.c
> > > > +++ b/block/gluster.c
> > > > @@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
> > > > goto out;
> > > > }
> > > >
> > > > +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> > > > + /* Without this, if fsync fails for a recoverable reason (for instance,
> > > > + * ENOSPC), gluster will dump its cache, preventing retries. This means
> > > > + * almost certain data loss. Not all gluster versions support the
> > > > + * 'resync-failed-syncs-after-fsync' key value, but there is no way to
> > > > + * discover during runtime if it is supported (this api returns success for
> > > > + * unknown key/value pairs) */
> > >
> > > Honestly, this sucks. There is apparently no way to operate gluster so
> > > we can safely recover after a failed fsync. "We hope everything is fine,
> > > but depending on your gluster version, we may now corrupt your image"
> > > isn't very good.
> > >
> > > We need to consider very carefully if this is good enough to go on after
> > > an error. I'm currently leaning towards "no". That is, we should only
> > > enable this after Gluster provides us a way to make sure that the option
> > > is really set.
>
> Unfortunately it is also possible to disable the write-behind xlator as
> well. This would cause setting the option to fail too :-/ At the moment
> there is no real useful way to detect if write-behind is disabled (it is
> enabled by default).
>
> > > > + ret = glfs_set_xlator_option (s->glfs, "*-write-behind",
> > > > + "resync-failed-syncs-after-fsync",
> > > > + "on");
> > > > + if (ret < 0) {
> > > > + error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
> > > > + ret = -errno;
> > > > + goto out;
> > > > + }
> > > > +#endif
> > >
> > > We also need to consider the case without CONFIG_GLUSTERFS_XLATOR_OPT.
> > > In this case (as well as theoretically in the case that the option
> > > didn't take effect - if only we could know about it), a failed
> > > glfs_fsync_async() is fatal and we need to stop operating on the image,
> > > i.e. set bs->drv = NULL like when we detect corruption in qcow2 images.
> > > The guest will see a broken disk that fails all I/O requests, but that's
> > > better than corrupting data.
> > >
> >
> > Gluster versions that don't support CONFIG_GLUSTERFS_XLATOR_OPT will
> > also not have the gluster patch that removes the file descriptor
> > invalidation upon error (unless that was a relatively new
> > bug/feature). But if that is the case, every operation on the file
> > descriptor in those versions will return error. But it is also rather
> > old versions that don't support glfs_set_xlator_option() I believe.
>
> Indeed, glfs_set_xlator_option() was introduced with glusterfs-3.4.0. We
> are currently on glusterfs-3.7, with the oldest supported version of
> 3.5. In ~2 months we hopefully have a 3.8 release and that will cause
Is it possible for 3.8 to be able to validate key/value pairs in
glfs_set_xlator_option()? Or is it something that by design is decoupled
enough that it isn't feasible?
This is the second instance I know of that a lack of error information from
the gluster api has made it difficult to determine if a feature is
supported (the other instance being SEEK_DATA/SEEK_HOLE). It'd be nice if
error returns were more consistent in 3.8 in the API, is possible.
> the end-of-life of 3.5. 3.4 has been EOL for about a year now, hopefully
> all our users have upgraded, but we know that some users will stay on
> unsupported versions for a long time...
>
> However, the "resync-failed-syncs-after-fsync" option was only
> introduced recently, with glusterfs-3.7.9. You could detect this with
> pkg-config glusterfs-api >= 7.3.7.9 if need to be.
>
Unfortunately, compile-time detection of the specific key option isn't a
great option, since we are using gluster as a shared library. Backports for
various distributions may or may not have the
"resync-failed-syncs-after-fsync" key in it regardless of versioning, and
it isn't a new symbol that is detectable by the linker or at runtime.
Additionally, QEMU on a system with gluster version 3.7.8, for instance,
would have no way of knowing that "resync-failed-syncs-after-fsync" does
nothing.
The reason for the compile-time check for glfs_set_xlator_option() is
because it is a symbol that will be looked up when QEMU is linked (and
executed). In contrast, if the user downgrades gluster on their system to
a version that doesn't have glfs_set_xlator_option(), QEMU won't run, so
the breakage will be visible.
> More details about the problem the option addresses are in the commit
> message on http://review.gluster.org/13057 .
>
Thanks, that is useful!
Jeff
next prev parent reply other threads:[~2016-04-06 16:05 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-06 3:29 [Qemu-devel] [PATCH for-2.6 0/2] Bug fixes for gluster Jeff Cody
2016-04-06 3:29 ` [Qemu-devel] [PATCH for-2.6 1/2] block/gluster: return correct error value Jeff Cody
2016-04-06 15:00 ` Niels de Vos
2016-04-06 3:29 ` [Qemu-devel] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error Jeff Cody
2016-04-06 11:02 ` Kevin Wolf
2016-04-06 11:19 ` Ric Wheeler
2016-04-06 11:41 ` Kevin Wolf
2016-04-06 11:51 ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2016-04-06 13:10 ` Jeff Cody
2016-04-06 13:20 ` Kevin Wolf
2016-04-07 7:48 ` Pranith Kumar Karampuri
2016-04-11 4:26 ` Raghavendra Gowdappa
2016-05-09 6:38 ` Raghavendra Talur
2016-05-09 8:02 ` Kevin Wolf
2016-04-06 12:44 ` [Qemu-devel] " Jeff Cody
2016-04-06 14:47 ` Niels de Vos
2016-04-06 16:05 ` Jeff Cody [this message]
2016-04-07 16:17 ` Jeff Cody
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160406160525.GC7329@localhost.localdomain \
--to=jcody@redhat.com \
--cc=kdhananj@redhat.com \
--cc=kwolf@redhat.com \
--cc=ndevos@redhat.com \
--cc=pkarampu@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=rwheeler@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.