qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Cody <jcody@redhat.com>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, kwolf@redhat.com, rwheeler@redhat.com,
	pkarampu@redhat.com, rgowdapp@redhat.com, ndevos@redhat.com
Subject: [Qemu-devel] [PATCH for-2.6 v2 3/3] block/gluster: prevent data loss after i/o error
Date: Tue, 19 Apr 2016 08:07:30 -0400	[thread overview]
Message-ID: <3d0edbc7aafd7731496db768c55a8ff3d4ac1537.1461067248.git.jcody@redhat.com> (raw)
In-Reply-To: <cover.1461067248.git.jcody@redhat.com>
In-Reply-To: <cover.1461067248.git.jcody@redhat.com>

Upon receiving an I/O error after an fsync, by default gluster will
dump its cache.  However, QEMU will retry the fsync, which is especially
useful when encountering errors such as ENOSPC when using the werror=stop
option.  When using caching with gluster, however, the last written data
will be lost upon encountering ENOSPC.  Using the write-behind-cache
xlator option of 'resync-failed-syncs-after-fsync' should cause gluster
to retain the cached data after a failed fsync, so that ENOSPC and other
transient errors are recoverable.

Unfortunately, we have no way of knowing if the
'resync-failed-syncs-after-fsync' xlator option is supported, so for now
close the fd and set the BDS driver to NULL upon fsync error.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/gluster.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 configure       |  8 ++++++++
 2 files changed, 50 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index d9aace6..ba33488 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -314,6 +314,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
         goto out;
     }
 
+#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
+    /* Without this, if fsync fails for a recoverable reason (for instance,
+     * ENOSPC), gluster will dump its cache, preventing retries.  This means
+     * almost certain data loss.  Not all gluster versions support the
+     * 'resync-failed-syncs-after-fsync' key value, but there is no way to
+     * discover during runtime if it is supported (this api returns success for
+     * unknown key/value pairs) */
+    ret = glfs_set_xlator_option(s->glfs, "*-write-behind",
+                                          "resync-failed-syncs-after-fsync",
+                                          "on");
+    if (ret < 0) {
+        error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
+        ret = -errno;
+        goto out;
+    }
+#endif
+
     qemu_gluster_parse_flags(bdrv_flags, &open_flags);
 
     s->fd = glfs_open(s->glfs, gconf->image, open_flags);
@@ -366,6 +383,16 @@ static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
         goto exit;
     }
 
+#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
+    ret = glfs_set_xlator_option(reop_s->glfs, "*-write-behind",
+                                 "resync-failed-syncs-after-fsync", "on");
+    if (ret < 0) {
+        error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
+        ret = -errno;
+        goto exit;
+    }
+#endif
+
     reop_s->fd = glfs_open(reop_s->glfs, gconf->image, open_flags);
     if (reop_s->fd == NULL) {
         /* reops->glfs will be cleaned up in _abort */
@@ -613,6 +640,21 @@ static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
 
     ret = glfs_fsync_async(s->fd, gluster_finish_aiocb, &acb);
     if (ret < 0) {
+        /* Some versions of Gluster (3.5.6 -> 3.5.8?) will not retain its
+         * cache after a fsync failure, so we have no way of allowing the guest
+         * to safely continue.  Gluster versions prior to 3.5.6 don't retain
+         * the cache either, but will invalidate the fd on error, so this is
+         * again our only option.
+         *
+         * The 'resync-failed-syncs-after-fsync' xlator option for the
+         * write-behind cache will cause later gluster versions to retain
+         * its cache after error, so long as the fd remains open.  However,
+         * we currently have no way of knowing if this option is supported.
+         *
+         * TODO: Once gluster provides a way for us to determine if the option
+         *       is supported, bypass the closure and setting drv to NULL.  */
+        qemu_gluster_close(bs);
+        bs->drv = NULL;
         return -errno;
     }
 
diff --git a/configure b/configure
index f1c307b..ab54f3c 100755
--- a/configure
+++ b/configure
@@ -298,6 +298,7 @@ coroutine=""
 coroutine_pool=""
 seccomp=""
 glusterfs=""
+glusterfs_xlator_opt="no"
 glusterfs_discard="no"
 glusterfs_zerofill="no"
 archipelago="no"
@@ -3400,6 +3401,9 @@ if test "$glusterfs" != "no" ; then
     glusterfs="yes"
     glusterfs_cflags=`$pkg_config --cflags glusterfs-api`
     glusterfs_libs=`$pkg_config --libs glusterfs-api`
+    if $pkg_config --atleast-version=4 glusterfs-api; then
+      glusterfs_xlator_opt="yes"
+    fi
     if $pkg_config --atleast-version=5 glusterfs-api; then
       glusterfs_discard="yes"
     fi
@@ -5342,6 +5346,10 @@ if test "$glusterfs" = "yes" ; then
   echo "GLUSTERFS_LIBS=$glusterfs_libs" >> $config_host_mak
 fi
 
+if test "$glusterfs_xlator_opt" = "yes" ; then
+  echo "CONFIG_GLUSTERFS_XLATOR_OPT=y" >> $config_host_mak
+fi
+
 if test "$glusterfs_discard" = "yes" ; then
   echo "CONFIG_GLUSTERFS_DISCARD=y" >> $config_host_mak
 fi
-- 
1.9.3

  parent reply	other threads:[~2016-04-19 12:07 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-19 12:07 [Qemu-devel] [PATCH for-2.6 v2 0/3] Bug fixes for gluster Jeff Cody
2016-04-19 12:07 ` [Qemu-devel] [PATCH for-2.6 v2 1/3] block/gluster: return correct error value Jeff Cody
2016-04-19 12:07 ` [Qemu-devel] [PATCH for-2.6 v2 2/3] block/gluster: code movement of qemu_gluster_close() Jeff Cody
2016-04-19 12:07 ` Jeff Cody [this message]
2016-04-19 12:27   ` [Qemu-devel] [PATCH for-2.6 v2 3/3] block/gluster: prevent data loss after i/o error Kevin Wolf
2016-04-19 12:29     ` Jeff Cody
2016-04-19 12:18 ` [Qemu-devel] [PATCH for-2.6 v2 0/3] Bug fixes for gluster Ric Wheeler
2016-04-19 14:09   ` Jeff Cody
2016-04-20  1:56     ` Ric Wheeler
2016-04-20  9:24       ` Kevin Wolf
2016-04-20 10:40         ` Ric Wheeler
2016-04-20 11:46           ` Kevin Wolf
2016-04-20 18:38             ` Rik van Riel
2016-04-21  8:43               ` Kevin Wolf
2016-04-20 18:37           ` Rik van Riel
2016-04-20  5:15     ` Raghavendra Gowdappa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d0edbc7aafd7731496db768c55a8ff3d4ac1537.1461067248.git.jcody@redhat.com \
    --to=jcody@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=ndevos@redhat.com \
    --cc=pkarampu@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rgowdapp@redhat.com \
    --cc=rwheeler@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).