From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:40511)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1asUlC-0008O8-Pq
	for qemu-devel@nongnu.org; Tue, 19 Apr 2016 08:28:10 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1asUlB-0000d9-MF
	for qemu-devel@nongnu.org; Tue, 19 Apr 2016 08:28:06 -0400
Date: Tue, 19 Apr 2016 14:27:56 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20160419122756.GB4498@noname.str.redhat.com>
References: <cover.1461067248.git.jcody@redhat.com>
	<3d0edbc7aafd7731496db768c55a8ff3d4ac1537.1461067248.git.jcody@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3d0edbc7aafd7731496db768c55a8ff3d4ac1537.1461067248.git.jcody@redhat.com>
Subject: Re: [Qemu-devel] [PATCH for-2.6 v2 3/3] block/gluster: prevent data
 loss after i/o error
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Jeff Cody <jcody@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, rwheeler@redhat.com, pkarampu@redhat.com, rgowdapp@redhat.com, ndevos@redhat.com

Am 19.04.2016 um 14:07 hat Jeff Cody geschrieben:
> Upon receiving an I/O error after an fsync, by default gluster will
> dump its cache.  However, QEMU will retry the fsync, which is especially
> useful when encountering errors such as ENOSPC when using the werror=stop
> option.  When using caching with gluster, however, the last written data
> will be lost upon encountering ENOSPC.  Using the write-behind-cache
> xlator option of 'resync-failed-syncs-after-fsync' should cause gluster
> to retain the cached data after a failed fsync, so that ENOSPC and other
> transient errors are recoverable.
> 
> Unfortunately, we have no way of knowing if the
> 'resync-failed-syncs-after-fsync' xlator option is supported, so for now
> close the fd and set the BDS driver to NULL upon fsync error.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/gluster.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  configure       |  8 ++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index d9aace6..ba33488 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -314,6 +314,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
>          goto out;
>      }
>  
> +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> +    /* Without this, if fsync fails for a recoverable reason (for instance,
> +     * ENOSPC), gluster will dump its cache, preventing retries.  This means
> +     * almost certain data loss.  Not all gluster versions support the
> +     * 'resync-failed-syncs-after-fsync' key value, but there is no way to
> +     * discover during runtime if it is supported (this api returns success for
> +     * unknown key/value pairs) */
> +    ret = glfs_set_xlator_option(s->glfs, "*-write-behind",
> +                                          "resync-failed-syncs-after-fsync",
> +                                          "on");
> +    if (ret < 0) {
> +        error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
> +        ret = -errno;
> +        goto out;
> +    }
> +#endif
> +
>      qemu_gluster_parse_flags(bdrv_flags, &open_flags);
>  
>      s->fd = glfs_open(s->glfs, gconf->image, open_flags);
> @@ -366,6 +383,16 @@ static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
>          goto exit;
>      }
>  
> +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> +    ret = glfs_set_xlator_option(reop_s->glfs, "*-write-behind",
> +                                 "resync-failed-syncs-after-fsync", "on");
> +    if (ret < 0) {
> +        error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
> +        ret = -errno;
> +        goto exit;
> +    }
> +#endif
> +
>      reop_s->fd = glfs_open(reop_s->glfs, gconf->image, open_flags);
>      if (reop_s->fd == NULL) {
>          /* reops->glfs will be cleaned up in _abort */
> @@ -613,6 +640,21 @@ static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
>  
>      ret = glfs_fsync_async(s->fd, gluster_finish_aiocb, &acb);
>      if (ret < 0) {
> +        /* Some versions of Gluster (3.5.6 -> 3.5.8?) will not retain its
> +         * cache after a fsync failure, so we have no way of allowing the guest
> +         * to safely continue.  Gluster versions prior to 3.5.6 don't retain
> +         * the cache either, but will invalidate the fd on error, so this is
> +         * again our only option.
> +         *
> +         * The 'resync-failed-syncs-after-fsync' xlator option for the
> +         * write-behind cache will cause later gluster versions to retain
> +         * its cache after error, so long as the fd remains open.  However,
> +         * we currently have no way of knowing if this option is supported.
> +         *
> +         * TODO: Once gluster provides a way for us to determine if the option
> +         *       is supported, bypass the closure and setting drv to NULL.  */
> +        qemu_gluster_close(bs);
> +        bs->drv = NULL;
>          return -errno;
>      }

More context:

        qemu_coroutine_yield();
        return acb.ret;
    }

I would guess that acb.ret containing an error is the more common case.
We should probably invalidate the BDS in both cases (immediate failure
and callback with error code).

Kevin