qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* Re: [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE
       [not found] <1457373855-8072-1-git-send-email-ndevos@redhat.com>
@ 2016-03-07 19:14 ` Eric Blake
  2016-10-06 22:09   ` Eric Blake
       [not found] ` <20160307182738.GA14127@localhost.localdomain>
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Blake @ 2016-03-07 19:14 UTC (permalink / raw)
  To: Niels de Vos, qemu-block; +Cc: qemu-devel@nongnu.org, Prasanna Kumar Kalever

[-- Attachment #1: Type: text/plain, Size: 10132 bytes --]

[adding qemu-devel; ALL patches must cc qemu-devel even when sent to
another list]

On 03/07/2016 11:04 AM, Niels de Vos wrote:
> GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> it possible to detect sparse areas in files.
> 
> Signed-off-by: Niels de Vos <ndevos@redhat.com>
> 
> --
> Tested by compiling and running "qemu-img map gluster://..." with a
> build of the current master branch of glusterfs. Using a Fedora
> cloud image (in raw format) shows many SEEK procudure calls going back
> and forth over the network. The output of "qemu map" matches the output
> when run against the image on the local filesystem.
> ---
>  block/gluster.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  configure       |  25 +++++++++
>  2 files changed, 184 insertions(+)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index 65077a0..1430010 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -677,6 +677,153 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
>      return 0;
>  }
>  
> +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> +/*
> + * Find allocation range in @bs around offset @start.
> + * May change underlying file descriptor's file offset.
> + * If @start is not in a hole, store @start in @data, and the
> + * beginning of the next hole in @hole, and return 0.
> + * If @start is in a non-trailing hole, store @start in @hole and the
> + * beginning of the next non-hole in @data, and return 0.
> + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> + * If we can't find out, return a negative errno other than -ENXIO.
> + *
> + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> + */
> +static int find_allocation(BlockDriverState *bs, off_t start,
> +                           off_t *data, off_t *hole)
> +{
> +    BDRVGlusterState *s = bs->opaque;
> +    off_t offs;
> +
> +    /*
> +     * SEEK_DATA cases:
> +     * D1. offs == start: start is in data
> +     * D2. offs > start: start is in a hole, next data at offs
> +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> +     *                              or start is beyond EOF
> +     *     If the latter happens, the file has been truncated behind
> +     *     our back since we opened it.  All bets are off then.
> +     *     Treating like a trailing hole is simplest.
> +     * D4. offs < 0, errno != ENXIO: we learned nothing
> +     */
> +    offs = glfs_lseek(s->fd, start, SEEK_DATA);
> +    if (offs < 0) {
> +        return -errno;          /* D3 or D4 */
> +    }
> +    assert(offs >= start);
> +
> +    if (offs > start) {
> +        /* D2: in hole, next data at offs */
> +        *hole = start;
> +        *data = offs;
> +        return 0;
> +    }
> +
> +    /* D1: in data, end not yet known */
> +
> +    /*
> +     * SEEK_HOLE cases:
> +     * H1. offs == start: start is in a hole
> +     *     If this happens here, a hole has been dug behind our back
> +     *     since the previous lseek().
> +     * H2. offs > start: either start is in data, next hole at offs,
> +     *                   or start is in trailing hole, EOF at offs
> +     *     Linux treats trailing holes like any other hole: offs ==
> +     *     start.  Solaris seeks to EOF instead: offs > start (blech).
> +     *     If that happens here, a hole has been dug behind our back
> +     *     since the previous lseek().
> +     * H3. offs < 0, errno = ENXIO: start is beyond EOF
> +     *     If this happens, the file has been truncated behind our
> +     *     back since we opened it.  Treat it like a trailing hole.
> +     * H4. offs < 0, errno != ENXIO: we learned nothing
> +     *     Pretend we know nothing at all, i.e. "forget" about D1.
> +     */
> +    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> +    if (offs < 0) {
> +        return -errno;          /* D1 and (H3 or H4) */
> +    }
> +    assert(offs >= start);
> +
> +    if (offs > start) {
> +        /*
> +         * D1 and H2: either in data, next hole at offs, or it was in
> +         * data but is now in a trailing hole.  In the latter case,
> +         * all bets are off.  Treating it as if it there was data all
> +         * the way to EOF is safe, so simply do that.
> +         */
> +        *data = start;
> +        *hole = offs;
> +        return 0;
> +    }
> +
> +    /* D1 and H1 */
> +    return -EBUSY;
> +}
> +
> +/*
> + * Returns the allocation status of the specified sectors.
> + *
> + * If 'sector_num' is beyond the end of the disk image the return value is 0
> + * and 'pnum' is set to 0.
> + *
> + * 'pnum' is set to the number of sectors (including and immediately following
> + * the specified sector) that are known to be in the same
> + * allocated/unallocated state.
> + *
> + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> + * beyond the end of the disk image it will be clamped.
> + *
> + * (Based on raw_co_get_block_status() from raw-posix.c.)
> + */
> +static int64_t coroutine_fn qemu_gluster_co_get_block_status(
> +        BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum)
> +{
> +    BDRVGlusterState *s = bs->opaque;
> +    off_t start, data = 0, hole = 0;
> +    int64_t total_size;
> +    int ret = -EINVAL;
> +
> +    if (!s->fd) {
> +        return ret;
> +    }
> +
> +    start = sector_num * BDRV_SECTOR_SIZE;
> +    total_size = bdrv_getlength(bs);
> +    if (total_size < 0) {
> +        return total_size;
> +    } else if (start >= total_size) {
> +        *pnum = 0;
> +        return 0;
> +    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
> +        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> +    }
> +
> +    ret = find_allocation(bs, start, &data, &hole);
> +    if (ret == -ENXIO) {
> +        /* Trailing hole */
> +        *pnum = nb_sectors;
> +        ret = BDRV_BLOCK_ZERO;
> +    } else if (ret < 0) {
> +        /* No info available, so pretend there are no holes */
> +        *pnum = nb_sectors;
> +        ret = BDRV_BLOCK_DATA;
> +    } else if (data == start) {
> +        /* On a data extent, compute sectors to the end of the extent,
> +         * possibly including a partial sector at EOF. */
> +        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
> +        ret = BDRV_BLOCK_DATA;
> +    } else {
> +        /* On a hole, compute sectors to the beginning of the next extent.  */
> +        assert(hole == start);
> +        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> +        ret = BDRV_BLOCK_ZERO;
> +    }
> +    return ret | BDRV_BLOCK_OFFSET_VALID | start;
> +}
> +#endif /* CONFIG_GLUSTERFS_SEEK_DATA */
> +
> +
>  static QemuOptsList qemu_gluster_create_opts = {
>      .name = "qemu-gluster-create-opts",
>      .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
> @@ -719,6 +866,9 @@ static BlockDriver bdrv_gluster = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> +#endif
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> @@ -746,6 +896,9 @@ static BlockDriver bdrv_gluster_tcp = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> +#endif
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> @@ -773,6 +926,9 @@ static BlockDriver bdrv_gluster_unix = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> +#endif
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> @@ -800,6 +956,9 @@ static BlockDriver bdrv_gluster_rdma = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> +#endif
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> diff --git a/configure b/configure
> index 0c0472a..ca3821c 100755
> --- a/configure
> +++ b/configure
> @@ -3351,6 +3351,9 @@ if test "$glusterfs" != "no" ; then
>      if $pkg_config --atleast-version=6 glusterfs-api; then
>        glusterfs_zerofill="yes"
>      fi
> +    if $pkg_config --atleast-version=7.3.8 glusterfs-api; then
> +      glusterfs_seek_data="yes"
> +    fi
>    else
>      if test "$glusterfs" = "yes" ; then
>        feature_not_found "GlusterFS backend support" \
> @@ -3660,6 +3663,24 @@ if compile_prog "" "" ; then
>    fiemap=yes
>  fi
>  
> +# check for SEEK_DATA and SEEK_HOLE
> +seek_data=no
> +cat > $TMPC << EOF
> +#define _GNU_SOURCE
> +#include <sys/types.h>
> +#include <unistd.h>
> +
> +int main(void)
> +{
> +    lseek(0, 0, SEEK_DATA);
> +    lseek(0, 0, SEEK_HOLE);
> +    return 0;
> +}
> +EOF
> +if compile_prog "" "" ; then
> +  seek_data=yes
> +fi
> +
>  # check for dup3
>  dup3=no
>  cat > $TMPC << EOF
> @@ -5278,6 +5299,10 @@ if test "$glusterfs_zerofill" = "yes" ; then
>    echo "CONFIG_GLUSTERFS_ZEROFILL=y" >> $config_host_mak
>  fi
>  
> +if test "$glusterfs_seek_data" = "yes" && test "$seek_data" = "yes" ; then
> +  echo "CONFIG_GLUSTERFS_SEEK_DATA=y" >> $config_host_mak
> +fi
> +
>  if test "$archipelago" = "yes" ; then
>    echo "CONFIG_ARCHIPELAGO=m" >> $config_host_mak
>    echo "ARCHIPELAGO_LIBS=$archipelago_libs" >> $config_host_mak
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE
       [not found] ` <20160307182738.GA14127@localhost.localdomain>
@ 2016-03-08  4:21   ` Niels de Vos
  2016-03-08 12:33     ` Jeff Cody
  2016-03-08 12:53     ` [Qemu-devel] [Qemu-block] [PATCH] " Kevin Wolf
  0 siblings, 2 replies; 16+ messages in thread
From: Niels de Vos @ 2016-03-08  4:21 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Prasanna Kumar Kalever, qemu-block, qemu-devel

On Mon, Mar 07, 2016 at 01:27:38PM -0500, Jeff Cody wrote:
> On Mon, Mar 07, 2016 at 07:04:15PM +0100, Niels de Vos wrote:
> > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > it possible to detect sparse areas in files.
> > 
> > Signed-off-by: Niels de Vos <ndevos@redhat.com>
> > 
> > --
> > Tested by compiling and running "qemu-img map gluster://..." with a
> > build of the current master branch of glusterfs. Using a Fedora
> > cloud image (in raw format) shows many SEEK procudure calls going back
> > and forth over the network. The output of "qemu map" matches the output
> > when run against the image on the local filesystem.
> > ---
> >  block/gluster.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  configure       |  25 +++++++++
> >  2 files changed, 184 insertions(+)
> > 
> > diff --git a/block/gluster.c b/block/gluster.c
> > index 65077a0..1430010 100644
> > --- a/block/gluster.c
> > +++ b/block/gluster.c
> > @@ -677,6 +677,153 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
> >      return 0;
> >  }
> >  
> > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> 
> Why do we need to make this a compile-time option?  Version checking
> is problematic; for instance, different distributions may have
> backported bug fixes / features, that are not reflected by the
> reported version number, etc..  Ideally, we can determine
> functionality during runtime, and behave accordingly.

This will not get backported to older Gluster versions, it required a
protocol change.

> If SEEK_DATA and SEEK_HOLE are not supported,
> qemu_gluster_co_get_block_status can return that sectors are all
> allocated (which is what happens in block/io.c anyway if the driver
> doesn't support the function).

Ok, good to know.

> As long as glfs_lseek() will return error (e.g. EINVAL) for an invalid
> whence value,  we can handle it runtime.  Does glfs_lseek() behave
> sanely?

Unfortunately older versions of libgfapi do not return EINVAL when
SEEK_DATA/HOLE is used. It is something we'll need to fix in the stable
releases. We can not assume that all users have installed a version of
the library that handles SEEK_DATA/HOLE correctly (return EINVAL) when
there is no support in the network protocol or on the server.

To be sure that we don't get some undefined behaviour, the compile time
check is needed.

Niels

> > +/*
> > + * Find allocation range in @bs around offset @start.
> > + * May change underlying file descriptor's file offset.
> > + * If @start is not in a hole, store @start in @data, and the
> > + * beginning of the next hole in @hole, and return 0.
> > + * If @start is in a non-trailing hole, store @start in @hole and the
> > + * beginning of the next non-hole in @data, and return 0.
> > + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> > + * If we can't find out, return a negative errno other than -ENXIO.
> > + *
> > + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> > + */
> > +static int find_allocation(BlockDriverState *bs, off_t start,
> > +                           off_t *data, off_t *hole)
> > +{
> > +    BDRVGlusterState *s = bs->opaque;
> > +    off_t offs;
> > +
> > +    /*
> > +     * SEEK_DATA cases:
> > +     * D1. offs == start: start is in data
> > +     * D2. offs > start: start is in a hole, next data at offs
> > +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> > +     *                              or start is beyond EOF
> > +     *     If the latter happens, the file has been truncated behind
> > +     *     our back since we opened it.  All bets are off then.
> > +     *     Treating like a trailing hole is simplest.
> > +     * D4. offs < 0, errno != ENXIO: we learned nothing
> > +     */
> > +    offs = glfs_lseek(s->fd, start, SEEK_DATA);
> > +    if (offs < 0) {
> > +        return -errno;          /* D3 or D4 */
> > +    }
> > +    assert(offs >= start);
> > +
> > +    if (offs > start) {
> > +        /* D2: in hole, next data at offs */
> > +        *hole = start;
> > +        *data = offs;
> > +        return 0;
> > +    }
> > +
> > +    /* D1: in data, end not yet known */
> > +
> > +    /*
> > +     * SEEK_HOLE cases:
> > +     * H1. offs == start: start is in a hole
> > +     *     If this happens here, a hole has been dug behind our back
> > +     *     since the previous lseek().
> > +     * H2. offs > start: either start is in data, next hole at offs,
> > +     *                   or start is in trailing hole, EOF at offs
> > +     *     Linux treats trailing holes like any other hole: offs ==
> > +     *     start.  Solaris seeks to EOF instead: offs > start (blech).
> > +     *     If that happens here, a hole has been dug behind our back
> > +     *     since the previous lseek().
> > +     * H3. offs < 0, errno = ENXIO: start is beyond EOF
> > +     *     If this happens, the file has been truncated behind our
> > +     *     back since we opened it.  Treat it like a trailing hole.
> > +     * H4. offs < 0, errno != ENXIO: we learned nothing
> > +     *     Pretend we know nothing at all, i.e. "forget" about D1.
> > +     */
> > +    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> > +    if (offs < 0) {
> > +        return -errno;          /* D1 and (H3 or H4) */
> > +    }
> > +    assert(offs >= start);
> > +
> > +    if (offs > start) {
> > +        /*
> > +         * D1 and H2: either in data, next hole at offs, or it was in
> > +         * data but is now in a trailing hole.  In the latter case,
> > +         * all bets are off.  Treating it as if it there was data all
> > +         * the way to EOF is safe, so simply do that.
> > +         */
> > +        *data = start;
> > +        *hole = offs;
> > +        return 0;
> > +    }
> > +
> > +    /* D1 and H1 */
> > +    return -EBUSY;
> > +}
> > +
> > +/*
> > + * Returns the allocation status of the specified sectors.
> > + *
> > + * If 'sector_num' is beyond the end of the disk image the return value is 0
> > + * and 'pnum' is set to 0.
> > + *
> > + * 'pnum' is set to the number of sectors (including and immediately following
> > + * the specified sector) that are known to be in the same
> > + * allocated/unallocated state.
> > + *
> > + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> > + * beyond the end of the disk image it will be clamped.
> > + *
> > + * (Based on raw_co_get_block_status() from raw-posix.c.)
> > + */
> > +static int64_t coroutine_fn qemu_gluster_co_get_block_status(
> > +        BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum)
> > +{
> > +    BDRVGlusterState *s = bs->opaque;
> > +    off_t start, data = 0, hole = 0;
> > +    int64_t total_size;
> > +    int ret = -EINVAL;
> > +
> > +    if (!s->fd) {
> > +        return ret;
> > +    }
> > +
> > +    start = sector_num * BDRV_SECTOR_SIZE;
> > +    total_size = bdrv_getlength(bs);
> > +    if (total_size < 0) {
> > +        return total_size;
> > +    } else if (start >= total_size) {
> > +        *pnum = 0;
> > +        return 0;
> > +    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
> > +        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> > +    }
> > +
> > +    ret = find_allocation(bs, start, &data, &hole);
> > +    if (ret == -ENXIO) {
> > +        /* Trailing hole */
> > +        *pnum = nb_sectors;
> > +        ret = BDRV_BLOCK_ZERO;
> > +    } else if (ret < 0) {
> > +        /* No info available, so pretend there are no holes */
> > +        *pnum = nb_sectors;
> > +        ret = BDRV_BLOCK_DATA;
> > +    } else if (data == start) {
> > +        /* On a data extent, compute sectors to the end of the extent,
> > +         * possibly including a partial sector at EOF. */
> > +        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
> > +        ret = BDRV_BLOCK_DATA;
> > +    } else {
> > +        /* On a hole, compute sectors to the beginning of the next extent.  */
> > +        assert(hole == start);
> > +        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> > +        ret = BDRV_BLOCK_ZERO;
> > +    }
> > +    return ret | BDRV_BLOCK_OFFSET_VALID | start;
> > +}
> > +#endif /* CONFIG_GLUSTERFS_SEEK_DATA */
> > +
> > +
> >  static QemuOptsList qemu_gluster_create_opts = {
> >      .name = "qemu-gluster-create-opts",
> >      .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
> > @@ -719,6 +866,9 @@ static BlockDriver bdrv_gluster = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > +#endif
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > @@ -746,6 +896,9 @@ static BlockDriver bdrv_gluster_tcp = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > +#endif
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > @@ -773,6 +926,9 @@ static BlockDriver bdrv_gluster_unix = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > +#endif
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > @@ -800,6 +956,9 @@ static BlockDriver bdrv_gluster_rdma = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > +#endif
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > diff --git a/configure b/configure
> > index 0c0472a..ca3821c 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3351,6 +3351,9 @@ if test "$glusterfs" != "no" ; then
> >      if $pkg_config --atleast-version=6 glusterfs-api; then
> >        glusterfs_zerofill="yes"
> >      fi
> > +    if $pkg_config --atleast-version=7.3.8 glusterfs-api; then
> > +      glusterfs_seek_data="yes"
> > +    fi
> >    else
> >      if test "$glusterfs" = "yes" ; then
> >        feature_not_found "GlusterFS backend support" \
> > @@ -3660,6 +3663,24 @@ if compile_prog "" "" ; then
> >    fiemap=yes
> >  fi
> >  
> > +# check for SEEK_DATA and SEEK_HOLE
> > +seek_data=no
> > +cat > $TMPC << EOF
> > +#define _GNU_SOURCE
> > +#include <sys/types.h>
> > +#include <unistd.h>
> > +
> > +int main(void)
> > +{
> > +    lseek(0, 0, SEEK_DATA);
> > +    lseek(0, 0, SEEK_HOLE);
> > +    return 0;
> > +}
> > +EOF
> > +if compile_prog "" "" ; then
> > +  seek_data=yes
> > +fi
> > +
> >  # check for dup3
> >  dup3=no
> >  cat > $TMPC << EOF
> > @@ -5278,6 +5299,10 @@ if test "$glusterfs_zerofill" = "yes" ; then
> >    echo "CONFIG_GLUSTERFS_ZEROFILL=y" >> $config_host_mak
> >  fi
> >  
> > +if test "$glusterfs_seek_data" = "yes" && test "$seek_data" = "yes" ; then
> > +  echo "CONFIG_GLUSTERFS_SEEK_DATA=y" >> $config_host_mak
> > +fi
> > +
> >  if test "$archipelago" = "yes" ; then
> >    echo "CONFIG_ARCHIPELAGO=m" >> $config_host_mak
> >    echo "ARCHIPELAGO_LIBS=$archipelago_libs" >> $config_host_mak
> > -- 
> > 2.5.0
> > 
> > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-08  4:21   ` Niels de Vos
@ 2016-03-08 12:33     ` Jeff Cody
  2016-03-08 13:14       ` Niels de Vos
  2016-03-08 12:53     ` [Qemu-devel] [Qemu-block] [PATCH] " Kevin Wolf
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2016-03-08 12:33 UTC (permalink / raw)
  To: Niels de Vos; +Cc: Prasanna Kumar Kalever, qemu-block, qemu-devel

On Tue, Mar 08, 2016 at 05:21:48AM +0100, Niels de Vos wrote:
> On Mon, Mar 07, 2016 at 01:27:38PM -0500, Jeff Cody wrote:
> > On Mon, Mar 07, 2016 at 07:04:15PM +0100, Niels de Vos wrote:
> > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > it possible to detect sparse areas in files.
> > > 
> > > Signed-off-by: Niels de Vos <ndevos@redhat.com>
> > > 
> > > --
> > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > build of the current master branch of glusterfs. Using a Fedora
> > > cloud image (in raw format) shows many SEEK procudure calls going back
> > > and forth over the network. The output of "qemu map" matches the output
> > > when run against the image on the local filesystem.
> > > ---
> > >  block/gluster.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  configure       |  25 +++++++++
> > >  2 files changed, 184 insertions(+)
> > > 
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index 65077a0..1430010 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -677,6 +677,153 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
> > >      return 0;
> > >  }
> > >  
> > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > 
> > Why do we need to make this a compile-time option?  Version checking
> > is problematic; for instance, different distributions may have
> > backported bug fixes / features, that are not reflected by the
> > reported version number, etc..  Ideally, we can determine
> > functionality during runtime, and behave accordingly.
> 
> This will not get backported to older Gluster versions, it required a
> protocol change.
> 
> > If SEEK_DATA and SEEK_HOLE are not supported,
> > qemu_gluster_co_get_block_status can return that sectors are all
> > allocated (which is what happens in block/io.c anyway if the driver
> > doesn't support the function).
> 
> Ok, good to know.
> 
> > As long as glfs_lseek() will return error (e.g. EINVAL) for an invalid
> > whence value,  we can handle it runtime.  Does glfs_lseek() behave
> > sanely?
> 
> Unfortunately older versions of libgfapi do not return EINVAL when
> SEEK_DATA/HOLE is used. It is something we'll need to fix in the stable
> releases. We can not assume that all users have installed a version of
> the library that handles SEEK_DATA/HOLE correctly (return EINVAL) when
> there is no support in the network protocol or on the server.
> 
> To be sure that we don't get some undefined behaviour, the compile time
> check is needed.
>

That's unfortunate.

However, we may still be able to work around that with some probing in
the qemu_gluster_open() function.  I peeked at the libgfapi code, and
it looks like for an unknown whence, the current offset is just
returned by glfs_lseek() - is that correct?


With the same offset input, an lseek should always return something
different for a SEEK_DATA and SEEK_HOLE whence (SEEK_HOLE will return
EOF offset if there are no further holes).

In qemu_gluster_open(), we can then probe for support.  if we call
glfs_lseek() twice with the same offset, first with SEEK_DATA and
then with SEEK_HOLE, then we can see if the libgfapi version
supports SEEK_DATA/SEEK_HOLE.  If the resulting offsets from the calls
are equal, it doesn't support it.  If the offsets differ, then
presumably it does.  We can then set and remember that in the
BDRVGlusterState struct (e.g. bool supports_seek_data).

> > > +/*
> > > + * Find allocation range in @bs around offset @start.
> > > + * May change underlying file descriptor's file offset.
> > > + * If @start is not in a hole, store @start in @data, and the
> > > + * beginning of the next hole in @hole, and return 0.
> > > + * If @start is in a non-trailing hole, store @start in @hole and the
> > > + * beginning of the next non-hole in @data, and return 0.
> > > + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> > > + * If we can't find out, return a negative errno other than -ENXIO.
> > > + *
> > > + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> > > + */
> > > +static int find_allocation(BlockDriverState *bs, off_t start,
> > > +                           off_t *data, off_t *hole)
> > > +{
> > > +    BDRVGlusterState *s = bs->opaque;
> > > +    off_t offs;
> > > +
> > > +    /*
> > > +     * SEEK_DATA cases:
> > > +     * D1. offs == start: start is in data
> > > +     * D2. offs > start: start is in a hole, next data at offs
> > > +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> > > +     *                              or start is beyond EOF
> > > +     *     If the latter happens, the file has been truncated behind
> > > +     *     our back since we opened it.  All bets are off then.
> > > +     *     Treating like a trailing hole is simplest.
> > > +     * D4. offs < 0, errno != ENXIO: we learned nothing
> > > +     */
> > > +    offs = glfs_lseek(s->fd, start, SEEK_DATA);
> > > +    if (offs < 0) {
> > > +        return -errno;          /* D3 or D4 */
> > > +    }
> > > +    assert(offs >= start);
> > > +
> > > +    if (offs > start) {
> > > +        /* D2: in hole, next data at offs */
> > > +        *hole = start;
> > > +        *data = offs;
> > > +        return 0;
> > > +    }
> > > +
> > > +    /* D1: in data, end not yet known */
> > > +
> > > +    /*
> > > +     * SEEK_HOLE cases:
> > > +     * H1. offs == start: start is in a hole
> > > +     *     If this happens here, a hole has been dug behind our back
> > > +     *     since the previous lseek().
> > > +     * H2. offs > start: either start is in data, next hole at offs,
> > > +     *                   or start is in trailing hole, EOF at offs
> > > +     *     Linux treats trailing holes like any other hole: offs ==
> > > +     *     start.  Solaris seeks to EOF instead: offs > start (blech).
> > > +     *     If that happens here, a hole has been dug behind our back
> > > +     *     since the previous lseek().
> > > +     * H3. offs < 0, errno = ENXIO: start is beyond EOF
> > > +     *     If this happens, the file has been truncated behind our
> > > +     *     back since we opened it.  Treat it like a trailing hole.
> > > +     * H4. offs < 0, errno != ENXIO: we learned nothing
> > > +     *     Pretend we know nothing at all, i.e. "forget" about D1.
> > > +     */
> > > +    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> > > +    if (offs < 0) {
> > > +        return -errno;          /* D1 and (H3 or H4) */
> > > +    }
> > > +    assert(offs >= start);
> > > +
> > > +    if (offs > start) {
> > > +        /*
> > > +         * D1 and H2: either in data, next hole at offs, or it was in
> > > +         * data but is now in a trailing hole.  In the latter case,
> > > +         * all bets are off.  Treating it as if it there was data all
> > > +         * the way to EOF is safe, so simply do that.
> > > +         */
> > > +        *data = start;
> > > +        *hole = offs;
> > > +        return 0;
> > > +    }
> > > +
> > > +    /* D1 and H1 */
> > > +    return -EBUSY;
> > > +}
> > > +
> > > +/*
> > > + * Returns the allocation status of the specified sectors.
> > > + *
> > > + * If 'sector_num' is beyond the end of the disk image the return value is 0
> > > + * and 'pnum' is set to 0.
> > > + *
> > > + * 'pnum' is set to the number of sectors (including and immediately following
> > > + * the specified sector) that are known to be in the same
> > > + * allocated/unallocated state.
> > > + *
> > > + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> > > + * beyond the end of the disk image it will be clamped.
> > > + *
> > > + * (Based on raw_co_get_block_status() from raw-posix.c.)
> > > + */
> > > +static int64_t coroutine_fn qemu_gluster_co_get_block_status(
> > > +        BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum)
> > > +{
> > > +    BDRVGlusterState *s = bs->opaque;
> > > +    off_t start, data = 0, hole = 0;
> > > +    int64_t total_size;
> > > +    int ret = -EINVAL;
> > > +
> > > +    if (!s->fd) {
> > > +        return ret;
> > > +    }
> > > +
> > > +    start = sector_num * BDRV_SECTOR_SIZE;
> > > +    total_size = bdrv_getlength(bs);
> > > +    if (total_size < 0) {
> > > +        return total_size;
> > > +    } else if (start >= total_size) {
> > > +        *pnum = 0;
> > > +        return 0;
> > > +    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
> > > +        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> > > +    }
> > > +
> > > +    ret = find_allocation(bs, start, &data, &hole);
> > > +    if (ret == -ENXIO) {
> > > +        /* Trailing hole */
> > > +        *pnum = nb_sectors;
> > > +        ret = BDRV_BLOCK_ZERO;
> > > +    } else if (ret < 0) {
> > > +        /* No info available, so pretend there are no holes */
> > > +        *pnum = nb_sectors;
> > > +        ret = BDRV_BLOCK_DATA;
> > > +    } else if (data == start) {
> > > +        /* On a data extent, compute sectors to the end of the extent,
> > > +         * possibly including a partial sector at EOF. */
> > > +        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
> > > +        ret = BDRV_BLOCK_DATA;
> > > +    } else {
> > > +        /* On a hole, compute sectors to the beginning of the next extent.  */
> > > +        assert(hole == start);
> > > +        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> > > +        ret = BDRV_BLOCK_ZERO;
> > > +    }
> > > +    return ret | BDRV_BLOCK_OFFSET_VALID | start;
> > > +}
> > > +#endif /* CONFIG_GLUSTERFS_SEEK_DATA */
> > > +
> > > +
> > >  static QemuOptsList qemu_gluster_create_opts = {
> > >      .name = "qemu-gluster-create-opts",
> > >      .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
> > > @@ -719,6 +866,9 @@ static BlockDriver bdrv_gluster = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > > +#endif
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > @@ -746,6 +896,9 @@ static BlockDriver bdrv_gluster_tcp = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > > +#endif
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > @@ -773,6 +926,9 @@ static BlockDriver bdrv_gluster_unix = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > > +#endif
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > @@ -800,6 +956,9 @@ static BlockDriver bdrv_gluster_rdma = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > > +#endif
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > diff --git a/configure b/configure
> > > index 0c0472a..ca3821c 100755
> > > --- a/configure
> > > +++ b/configure
> > > @@ -3351,6 +3351,9 @@ if test "$glusterfs" != "no" ; then
> > >      if $pkg_config --atleast-version=6 glusterfs-api; then
> > >        glusterfs_zerofill="yes"
> > >      fi
> > > +    if $pkg_config --atleast-version=7.3.8 glusterfs-api; then
> > > +      glusterfs_seek_data="yes"
> > > +    fi
> > >    else
> > >      if test "$glusterfs" = "yes" ; then
> > >        feature_not_found "GlusterFS backend support" \
> > > @@ -3660,6 +3663,24 @@ if compile_prog "" "" ; then
> > >    fiemap=yes
> > >  fi
> > >  
> > > +# check for SEEK_DATA and SEEK_HOLE
> > > +seek_data=no
> > > +cat > $TMPC << EOF
> > > +#define _GNU_SOURCE
> > > +#include <sys/types.h>
> > > +#include <unistd.h>
> > > +
> > > +int main(void)
> > > +{
> > > +    lseek(0, 0, SEEK_DATA);
> > > +    lseek(0, 0, SEEK_HOLE);
> > > +    return 0;
> > > +}
> > > +EOF
> > > +if compile_prog "" "" ; then
> > > +  seek_data=yes
> > > +fi
> > > +
> > >  # check for dup3
> > >  dup3=no
> > >  cat > $TMPC << EOF
> > > @@ -5278,6 +5299,10 @@ if test "$glusterfs_zerofill" = "yes" ; then
> > >    echo "CONFIG_GLUSTERFS_ZEROFILL=y" >> $config_host_mak
> > >  fi
> > >  
> > > +if test "$glusterfs_seek_data" = "yes" && test "$seek_data" = "yes" ; then
> > > +  echo "CONFIG_GLUSTERFS_SEEK_DATA=y" >> $config_host_mak
> > > +fi
> > > +
> > >  if test "$archipelago" = "yes" ; then
> > >    echo "CONFIG_ARCHIPELAGO=m" >> $config_host_mak
> > >    echo "ARCHIPELAGO_LIBS=$archipelago_libs" >> $config_host_mak
> > > -- 
> > > 2.5.0
> > > 
> > > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-08  4:21   ` Niels de Vos
  2016-03-08 12:33     ` Jeff Cody
@ 2016-03-08 12:53     ` Kevin Wolf
  2016-03-08 13:19       ` Niels de Vos
  1 sibling, 1 reply; 16+ messages in thread
From: Kevin Wolf @ 2016-03-08 12:53 UTC (permalink / raw)
  To: Niels de Vos; +Cc: Jeff Cody, qemu-block, Prasanna Kumar Kalever, qemu-devel

Am 08.03.2016 um 05:21 hat Niels de Vos geschrieben:
> On Mon, Mar 07, 2016 at 01:27:38PM -0500, Jeff Cody wrote:
> > On Mon, Mar 07, 2016 at 07:04:15PM +0100, Niels de Vos wrote:
> > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > it possible to detect sparse areas in files.
> > > 
> > > Signed-off-by: Niels de Vos <ndevos@redhat.com>
> > > 
> > > --
> > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > build of the current master branch of glusterfs. Using a Fedora
> > > cloud image (in raw format) shows many SEEK procudure calls going back
> > > and forth over the network. The output of "qemu map" matches the output
> > > when run against the image on the local filesystem.
> > > ---
> > >  block/gluster.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  configure       |  25 +++++++++
> > >  2 files changed, 184 insertions(+)
> > > 
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index 65077a0..1430010 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -677,6 +677,153 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
> > >      return 0;
> > >  }
> > >  
> > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > 
> > Why do we need to make this a compile-time option?  Version checking
> > is problematic; for instance, different distributions may have
> > backported bug fixes / features, that are not reflected by the
> > reported version number, etc..  Ideally, we can determine
> > functionality during runtime, and behave accordingly.
> 
> This will not get backported to older Gluster versions, it required a
> protocol change.
> 
> > If SEEK_DATA and SEEK_HOLE are not supported,
> > qemu_gluster_co_get_block_status can return that sectors are all
> > allocated (which is what happens in block/io.c anyway if the driver
> > doesn't support the function).
> 
> Ok, good to know.
> 
> > As long as glfs_lseek() will return error (e.g. EINVAL) for an invalid
> > whence value,  we can handle it runtime.  Does glfs_lseek() behave
> > sanely?
> 
> Unfortunately older versions of libgfapi do not return EINVAL when
> SEEK_DATA/HOLE is used. It is something we'll need to fix in the stable
> releases. We can not assume that all users have installed a version of
> the library that handles SEEK_DATA/HOLE correctly (return EINVAL) when
> there is no support in the network protocol or on the server.
> 
> To be sure that we don't get some undefined behaviour, the compile time
> check is needed.

The code could be compiled on a host with newer libgfapi, but run on a
different host with an older version. This is why having (only) compile
time checks is rarely a good idea.

Jeff's suggestion to probe the actual behaviour on the host we're
running on in .bdrv_open() sounds reasonable to me.

Kevin

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-08 12:33     ` Jeff Cody
@ 2016-03-08 13:14       ` Niels de Vos
  2016-03-09 12:30         ` [Qemu-devel] [PATCH v2] " Niels de Vos
  0 siblings, 1 reply; 16+ messages in thread
From: Niels de Vos @ 2016-03-08 13:14 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Prasanna Kumar Kalever, qemu-block, qemu-devel

On Tue, Mar 08, 2016 at 07:33:26AM -0500, Jeff Cody wrote:
> On Tue, Mar 08, 2016 at 05:21:48AM +0100, Niels de Vos wrote:
> > On Mon, Mar 07, 2016 at 01:27:38PM -0500, Jeff Cody wrote:
> > > On Mon, Mar 07, 2016 at 07:04:15PM +0100, Niels de Vos wrote:
> > > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > > it possible to detect sparse areas in files.
> > > > 
> > > > Signed-off-by: Niels de Vos <ndevos@redhat.com>
> > > > 
> > > > --
> > > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > > build of the current master branch of glusterfs. Using a Fedora
> > > > cloud image (in raw format) shows many SEEK procudure calls going back
> > > > and forth over the network. The output of "qemu map" matches the output
> > > > when run against the image on the local filesystem.
> > > > ---
> > > >  block/gluster.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  configure       |  25 +++++++++
> > > >  2 files changed, 184 insertions(+)
> > > > 
> > > > diff --git a/block/gluster.c b/block/gluster.c
> > > > index 65077a0..1430010 100644
> > > > --- a/block/gluster.c
> > > > +++ b/block/gluster.c
> > > > @@ -677,6 +677,153 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
> > > >      return 0;
> > > >  }
> > > >  
> > > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > 
> > > Why do we need to make this a compile-time option?  Version checking
> > > is problematic; for instance, different distributions may have
> > > backported bug fixes / features, that are not reflected by the
> > > reported version number, etc..  Ideally, we can determine
> > > functionality during runtime, and behave accordingly.
> > 
> > This will not get backported to older Gluster versions, it required a
> > protocol change.
> > 
> > > If SEEK_DATA and SEEK_HOLE are not supported,
> > > qemu_gluster_co_get_block_status can return that sectors are all
> > > allocated (which is what happens in block/io.c anyway if the driver
> > > doesn't support the function).
> > 
> > Ok, good to know.
> > 
> > > As long as glfs_lseek() will return error (e.g. EINVAL) for an invalid
> > > whence value,  we can handle it runtime.  Does glfs_lseek() behave
> > > sanely?
> > 
> > Unfortunately older versions of libgfapi do not return EINVAL when
> > SEEK_DATA/HOLE is used. It is something we'll need to fix in the stable
> > releases. We can not assume that all users have installed a version of
> > the library that handles SEEK_DATA/HOLE correctly (return EINVAL) when
> > there is no support in the network protocol or on the server.
> > 
> > To be sure that we don't get some undefined behaviour, the compile time
> > check is needed.
> >
> 
> That's unfortunate.
> 
> However, we may still be able to work around that with some probing in
> the qemu_gluster_open() function.  I peeked at the libgfapi code, and
> it looks like for an unknown whence, the current offset is just
> returned by glfs_lseek() - is that correct?

Yes, that is correct.

> With the same offset input, an lseek should always return something
> different for a SEEK_DATA and SEEK_HOLE whence (SEEK_HOLE will return
> EOF offset if there are no further holes).
> 
> In qemu_gluster_open(), we can then probe for support.  if we call
> glfs_lseek() twice with the same offset, first with SEEK_DATA and
> then with SEEK_HOLE, then we can see if the libgfapi version
> supports SEEK_DATA/SEEK_HOLE.  If the resulting offsets from the calls
> are equal, it doesn't support it.  If the offsets differ, then
> presumably it does.  We can then set and remember that in the
> BDRVGlusterState struct (e.g. bool supports_seek_data).

Hmm, yes, that should work. And it'll obviously catch the case where
glfs_lseek() returns EINVAL too.

Thanks for the idea, I'll try to get this done over the next few days.

Niels


> 
> > > > +/*
> > > > + * Find allocation range in @bs around offset @start.
> > > > + * May change underlying file descriptor's file offset.
> > > > + * If @start is not in a hole, store @start in @data, and the
> > > > + * beginning of the next hole in @hole, and return 0.
> > > > + * If @start is in a non-trailing hole, store @start in @hole and the
> > > > + * beginning of the next non-hole in @data, and return 0.
> > > > + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> > > > + * If we can't find out, return a negative errno other than -ENXIO.
> > > > + *
> > > > + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> > > > + */
> > > > +static int find_allocation(BlockDriverState *bs, off_t start,
> > > > +                           off_t *data, off_t *hole)
> > > > +{
> > > > +    BDRVGlusterState *s = bs->opaque;
> > > > +    off_t offs;
> > > > +
> > > > +    /*
> > > > +     * SEEK_DATA cases:
> > > > +     * D1. offs == start: start is in data
> > > > +     * D2. offs > start: start is in a hole, next data at offs
> > > > +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> > > > +     *                              or start is beyond EOF
> > > > +     *     If the latter happens, the file has been truncated behind
> > > > +     *     our back since we opened it.  All bets are off then.
> > > > +     *     Treating like a trailing hole is simplest.
> > > > +     * D4. offs < 0, errno != ENXIO: we learned nothing
> > > > +     */
> > > > +    offs = glfs_lseek(s->fd, start, SEEK_DATA);
> > > > +    if (offs < 0) {
> > > > +        return -errno;          /* D3 or D4 */
> > > > +    }
> > > > +    assert(offs >= start);
> > > > +
> > > > +    if (offs > start) {
> > > > +        /* D2: in hole, next data at offs */
> > > > +        *hole = start;
> > > > +        *data = offs;
> > > > +        return 0;
> > > > +    }
> > > > +
> > > > +    /* D1: in data, end not yet known */
> > > > +
> > > > +    /*
> > > > +     * SEEK_HOLE cases:
> > > > +     * H1. offs == start: start is in a hole
> > > > +     *     If this happens here, a hole has been dug behind our back
> > > > +     *     since the previous lseek().
> > > > +     * H2. offs > start: either start is in data, next hole at offs,
> > > > +     *                   or start is in trailing hole, EOF at offs
> > > > +     *     Linux treats trailing holes like any other hole: offs ==
> > > > +     *     start.  Solaris seeks to EOF instead: offs > start (blech).
> > > > +     *     If that happens here, a hole has been dug behind our back
> > > > +     *     since the previous lseek().
> > > > +     * H3. offs < 0, errno = ENXIO: start is beyond EOF
> > > > +     *     If this happens, the file has been truncated behind our
> > > > +     *     back since we opened it.  Treat it like a trailing hole.
> > > > +     * H4. offs < 0, errno != ENXIO: we learned nothing
> > > > +     *     Pretend we know nothing at all, i.e. "forget" about D1.
> > > > +     */
> > > > +    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> > > > +    if (offs < 0) {
> > > > +        return -errno;          /* D1 and (H3 or H4) */
> > > > +    }
> > > > +    assert(offs >= start);
> > > > +
> > > > +    if (offs > start) {
> > > > +        /*
> > > > +         * D1 and H2: either in data, next hole at offs, or it was in
> > > > +         * data but is now in a trailing hole.  In the latter case,
> > > > +         * all bets are off.  Treating it as if it there was data all
> > > > +         * the way to EOF is safe, so simply do that.
> > > > +         */
> > > > +        *data = start;
> > > > +        *hole = offs;
> > > > +        return 0;
> > > > +    }
> > > > +
> > > > +    /* D1 and H1 */
> > > > +    return -EBUSY;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Returns the allocation status of the specified sectors.
> > > > + *
> > > > + * If 'sector_num' is beyond the end of the disk image the return value is 0
> > > > + * and 'pnum' is set to 0.
> > > > + *
> > > > + * 'pnum' is set to the number of sectors (including and immediately following
> > > > + * the specified sector) that are known to be in the same
> > > > + * allocated/unallocated state.
> > > > + *
> > > > + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> > > > + * beyond the end of the disk image it will be clamped.
> > > > + *
> > > > + * (Based on raw_co_get_block_status() from raw-posix.c.)
> > > > + */
> > > > +static int64_t coroutine_fn qemu_gluster_co_get_block_status(
> > > > +        BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum)
> > > > +{
> > > > +    BDRVGlusterState *s = bs->opaque;
> > > > +    off_t start, data = 0, hole = 0;
> > > > +    int64_t total_size;
> > > > +    int ret = -EINVAL;
> > > > +
> > > > +    if (!s->fd) {
> > > > +        return ret;
> > > > +    }
> > > > +
> > > > +    start = sector_num * BDRV_SECTOR_SIZE;
> > > > +    total_size = bdrv_getlength(bs);
> > > > +    if (total_size < 0) {
> > > > +        return total_size;
> > > > +    } else if (start >= total_size) {
> > > > +        *pnum = 0;
> > > > +        return 0;
> > > > +    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
> > > > +        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> > > > +    }
> > > > +
> > > > +    ret = find_allocation(bs, start, &data, &hole);
> > > > +    if (ret == -ENXIO) {
> > > > +        /* Trailing hole */
> > > > +        *pnum = nb_sectors;
> > > > +        ret = BDRV_BLOCK_ZERO;
> > > > +    } else if (ret < 0) {
> > > > +        /* No info available, so pretend there are no holes */
> > > > +        *pnum = nb_sectors;
> > > > +        ret = BDRV_BLOCK_DATA;
> > > > +    } else if (data == start) {
> > > > +        /* On a data extent, compute sectors to the end of the extent,
> > > > +         * possibly including a partial sector at EOF. */
> > > > +        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
> > > > +        ret = BDRV_BLOCK_DATA;
> > > > +    } else {
> > > > +        /* On a hole, compute sectors to the beginning of the next extent.  */
> > > > +        assert(hole == start);
> > > > +        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> > > > +        ret = BDRV_BLOCK_ZERO;
> > > > +    }
> > > > +    return ret | BDRV_BLOCK_OFFSET_VALID | start;
> > > > +}
> > > > +#endif /* CONFIG_GLUSTERFS_SEEK_DATA */
> > > > +
> > > > +
> > > >  static QemuOptsList qemu_gluster_create_opts = {
> > > >      .name = "qemu-gluster-create-opts",
> > > >      .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
> > > > @@ -719,6 +866,9 @@ static BlockDriver bdrv_gluster = {
> > > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > > >  #endif
> > > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > > > +#endif
> > > >      .create_opts                  = &qemu_gluster_create_opts,
> > > >  };
> > > >  
> > > > @@ -746,6 +896,9 @@ static BlockDriver bdrv_gluster_tcp = {
> > > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > > >  #endif
> > > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > > > +#endif
> > > >      .create_opts                  = &qemu_gluster_create_opts,
> > > >  };
> > > >  
> > > > @@ -773,6 +926,9 @@ static BlockDriver bdrv_gluster_unix = {
> > > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > > >  #endif
> > > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > > > +#endif
> > > >      .create_opts                  = &qemu_gluster_create_opts,
> > > >  };
> > > >  
> > > > @@ -800,6 +956,9 @@ static BlockDriver bdrv_gluster_rdma = {
> > > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > > >  #endif
> > > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > > > +#endif
> > > >      .create_opts                  = &qemu_gluster_create_opts,
> > > >  };
> > > >  
> > > > diff --git a/configure b/configure
> > > > index 0c0472a..ca3821c 100755
> > > > --- a/configure
> > > > +++ b/configure
> > > > @@ -3351,6 +3351,9 @@ if test "$glusterfs" != "no" ; then
> > > >      if $pkg_config --atleast-version=6 glusterfs-api; then
> > > >        glusterfs_zerofill="yes"
> > > >      fi
> > > > +    if $pkg_config --atleast-version=7.3.8 glusterfs-api; then
> > > > +      glusterfs_seek_data="yes"
> > > > +    fi
> > > >    else
> > > >      if test "$glusterfs" = "yes" ; then
> > > >        feature_not_found "GlusterFS backend support" \
> > > > @@ -3660,6 +3663,24 @@ if compile_prog "" "" ; then
> > > >    fiemap=yes
> > > >  fi
> > > >  
> > > > +# check for SEEK_DATA and SEEK_HOLE
> > > > +seek_data=no
> > > > +cat > $TMPC << EOF
> > > > +#define _GNU_SOURCE
> > > > +#include <sys/types.h>
> > > > +#include <unistd.h>
> > > > +
> > > > +int main(void)
> > > > +{
> > > > +    lseek(0, 0, SEEK_DATA);
> > > > +    lseek(0, 0, SEEK_HOLE);
> > > > +    return 0;
> > > > +}
> > > > +EOF
> > > > +if compile_prog "" "" ; then
> > > > +  seek_data=yes
> > > > +fi
> > > > +
> > > >  # check for dup3
> > > >  dup3=no
> > > >  cat > $TMPC << EOF
> > > > @@ -5278,6 +5299,10 @@ if test "$glusterfs_zerofill" = "yes" ; then
> > > >    echo "CONFIG_GLUSTERFS_ZEROFILL=y" >> $config_host_mak
> > > >  fi
> > > >  
> > > > +if test "$glusterfs_seek_data" = "yes" && test "$seek_data" = "yes" ; then
> > > > +  echo "CONFIG_GLUSTERFS_SEEK_DATA=y" >> $config_host_mak
> > > > +fi
> > > > +
> > > >  if test "$archipelago" = "yes" ; then
> > > >    echo "CONFIG_ARCHIPELAGO=m" >> $config_host_mak
> > > >    echo "ARCHIPELAGO_LIBS=$archipelago_libs" >> $config_host_mak
> > > > -- 
> > > > 2.5.0
> > > > 
> > > > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-08 12:53     ` [Qemu-devel] [Qemu-block] [PATCH] " Kevin Wolf
@ 2016-03-08 13:19       ` Niels de Vos
  0 siblings, 0 replies; 16+ messages in thread
From: Niels de Vos @ 2016-03-08 13:19 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Jeff Cody, qemu-block, Prasanna Kumar Kalever, qemu-devel

On Tue, Mar 08, 2016 at 01:53:26PM +0100, Kevin Wolf wrote:
> Am 08.03.2016 um 05:21 hat Niels de Vos geschrieben:
> > On Mon, Mar 07, 2016 at 01:27:38PM -0500, Jeff Cody wrote:
> > > On Mon, Mar 07, 2016 at 07:04:15PM +0100, Niels de Vos wrote:
> > > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > > it possible to detect sparse areas in files.
> > > > 
> > > > Signed-off-by: Niels de Vos <ndevos@redhat.com>
> > > > 
> > > > --
> > > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > > build of the current master branch of glusterfs. Using a Fedora
> > > > cloud image (in raw format) shows many SEEK procudure calls going back
> > > > and forth over the network. The output of "qemu map" matches the output
> > > > when run against the image on the local filesystem.
> > > > ---
> > > >  block/gluster.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  configure       |  25 +++++++++
> > > >  2 files changed, 184 insertions(+)
> > > > 
> > > > diff --git a/block/gluster.c b/block/gluster.c
> > > > index 65077a0..1430010 100644
> > > > --- a/block/gluster.c
> > > > +++ b/block/gluster.c
> > > > @@ -677,6 +677,153 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
> > > >      return 0;
> > > >  }
> > > >  
> > > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > 
> > > Why do we need to make this a compile-time option?  Version checking
> > > is problematic; for instance, different distributions may have
> > > backported bug fixes / features, that are not reflected by the
> > > reported version number, etc..  Ideally, we can determine
> > > functionality during runtime, and behave accordingly.
> > 
> > This will not get backported to older Gluster versions, it required a
> > protocol change.
> > 
> > > If SEEK_DATA and SEEK_HOLE are not supported,
> > > qemu_gluster_co_get_block_status can return that sectors are all
> > > allocated (which is what happens in block/io.c anyway if the driver
> > > doesn't support the function).
> > 
> > Ok, good to know.
> > 
> > > As long as glfs_lseek() will return error (e.g. EINVAL) for an invalid
> > > whence value,  we can handle it runtime.  Does glfs_lseek() behave
> > > sanely?
> > 
> > Unfortunately older versions of libgfapi do not return EINVAL when
> > SEEK_DATA/HOLE is used. It is something we'll need to fix in the stable
> > releases. We can not assume that all users have installed a version of
> > the library that handles SEEK_DATA/HOLE correctly (return EINVAL) when
> > there is no support in the network protocol or on the server.
> > 
> > To be sure that we don't get some undefined behaviour, the compile time
> > check is needed.
> 
> The code could be compiled on a host with newer libgfapi, but run on a
> different host with an older version. This is why having (only) compile
> time checks is rarely a good idea.

Oh, yes, that is possible. glfs_lseek() is not a new function, so the
symbol version did not need to change.

> Jeff's suggestion to probe the actual behaviour on the host we're
> running on in .bdrv_open() sounds reasonable to me.

Yes, it sure is. I'll send a v2 patch soon.

Thanks,
Niels

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v2] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-08 13:14       ` Niels de Vos
@ 2016-03-09 12:30         ` Niels de Vos
  2016-03-09 15:46           ` Jeff Cody
  0 siblings, 1 reply; 16+ messages in thread
From: Niels de Vos @ 2016-03-09 12:30 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: Kevin Wolf, Jeff Cody, Prasanna Kumar Kalever, Niels de Vos

GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
it possible to detect sparse areas in files.

Signed-off-by: Niels de Vos <ndevos@redhat.com>

---
Tested by compiling and running "qemu-img map gluster://..." with a
build of the current master branch of glusterfs. Using a Fedora cloud
image (in raw format) shows many SEEK procudure calls going back and
forth over the network. The output of "qemu map" matches the output when
run against the image on the local filesystem.

v2 based on feedback from Jeff Cody:
- Replace compile time detection by runtime detection
- Update return pointer (new argument) for .bdrv_co_get_block_status
---
 block/gluster.c | 186 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 186 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 65077a0..b01ab52 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
 typedef struct BDRVGlusterState {
     struct glfs *glfs;
     struct glfs_fd *fd;
+    bool supports_seek_data;
 } BDRVGlusterState;
 
 typedef struct GlusterConf {
@@ -286,6 +287,33 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
     }
 }
 
+/*
+ * Do SEEK_DATA/HOLE to detect if it is functional. In order to be usable, it
+ * should return different values for the start of data and the start of a
+ * hole. There are three different cases to handle:
+ *
+ *  - the same position is returned for data/hole (indicates broken gfapi)
+ *  - an error is returned:
+ *     - ENXIO only gets returned if there is valid support on client+server
+ *     - EINVAL is returned when gfapi or the server does not support it
+ */
+static bool qemu_gluster_test_seek(struct glfs_fd *fd)
+{
+    off_t start_data, start_hole;
+    bool supports_seek_data = false;
+
+    start_data = glfs_lseek(fd, 0, SEEK_DATA);
+    if (start_data != -1) {
+        start_hole = glfs_lseek(fd, 0, SEEK_HOLE);
+        if (start_hole != -1)
+            supports_seek_data = !(start_data == start_hole);
+    } else if (errno == ENXIO) {
+        supports_seek_data = true;
+    }
+
+    return supports_seek_data;
+}
+
 static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
                              int bdrv_flags, Error **errp)
 {
@@ -320,6 +348,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
         ret = -errno;
     }
 
+    s->supports_seek_data = qemu_gluster_test_seek(s->fd);
+
 out:
     qemu_opts_del(opts);
     qemu_gluster_gconf_free(gconf);
@@ -677,6 +707,158 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
     return 0;
 }
 
+/*
+ * Find allocation range in @bs around offset @start.
+ * May change underlying file descriptor's file offset.
+ * If @start is not in a hole, store @start in @data, and the
+ * beginning of the next hole in @hole, and return 0.
+ * If @start is in a non-trailing hole, store @start in @hole and the
+ * beginning of the next non-hole in @data, and return 0.
+ * If @start is in a trailing hole or beyond EOF, return -ENXIO.
+ * If we can't find out, return a negative errno other than -ENXIO.
+ *
+ * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
+ */
+static int find_allocation(BlockDriverState *bs, off_t start,
+                           off_t *data, off_t *hole)
+{
+    BDRVGlusterState *s = bs->opaque;
+    off_t offs;
+
+    if (!s->supports_seek_data)
+        return -EINVAL;
+
+    /*
+     * SEEK_DATA cases:
+     * D1. offs == start: start is in data
+     * D2. offs > start: start is in a hole, next data at offs
+     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
+     *                              or start is beyond EOF
+     *     If the latter happens, the file has been truncated behind
+     *     our back since we opened it.  All bets are off then.
+     *     Treating like a trailing hole is simplest.
+     * D4. offs < 0, errno != ENXIO: we learned nothing
+     */
+    offs = glfs_lseek(s->fd, start, SEEK_DATA);
+    if (offs < 0) {
+        return -errno;          /* D3 or D4 */
+    }
+    assert(offs >= start);
+
+    if (offs > start) {
+        /* D2: in hole, next data at offs */
+        *hole = start;
+        *data = offs;
+        return 0;
+    }
+
+    /* D1: in data, end not yet known */
+
+    /*
+     * SEEK_HOLE cases:
+     * H1. offs == start: start is in a hole
+     *     If this happens here, a hole has been dug behind our back
+     *     since the previous lseek().
+     * H2. offs > start: either start is in data, next hole at offs,
+     *                   or start is in trailing hole, EOF at offs
+     *     Linux treats trailing holes like any other hole: offs ==
+     *     start.  Solaris seeks to EOF instead: offs > start (blech).
+     *     If that happens here, a hole has been dug behind our back
+     *     since the previous lseek().
+     * H3. offs < 0, errno = ENXIO: start is beyond EOF
+     *     If this happens, the file has been truncated behind our
+     *     back since we opened it.  Treat it like a trailing hole.
+     * H4. offs < 0, errno != ENXIO: we learned nothing
+     *     Pretend we know nothing at all, i.e. "forget" about D1.
+     */
+    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
+    if (offs < 0) {
+        return -errno;          /* D1 and (H3 or H4) */
+    }
+    assert(offs >= start);
+
+    if (offs > start) {
+        /*
+         * D1 and H2: either in data, next hole at offs, or it was in
+         * data but is now in a trailing hole.  In the latter case,
+         * all bets are off.  Treating it as if it there was data all
+         * the way to EOF is safe, so simply do that.
+         */
+        *data = start;
+        *hole = offs;
+        return 0;
+    }
+
+    /* D1 and H1 */
+    return -EBUSY;
+}
+
+/*
+ * Returns the allocation status of the specified sectors.
+ *
+ * If 'sector_num' is beyond the end of the disk image the return value is 0
+ * and 'pnum' is set to 0.
+ *
+ * 'pnum' is set to the number of sectors (including and immediately following
+ * the specified sector) that are known to be in the same
+ * allocated/unallocated state.
+ *
+ * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * beyond the end of the disk image it will be clamped.
+ *
+ * (Based on raw_co_get_block_status() from raw-posix.c.)
+ */
+static int64_t coroutine_fn qemu_gluster_co_get_block_status(
+	BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
+        BlockDriverState **file)
+{
+    BDRVGlusterState *s = bs->opaque;
+    off_t start, data = 0, hole = 0;
+    int64_t total_size;
+    int ret = -EINVAL;
+
+    if (!s->fd) {
+        return ret;
+    }
+
+    start = sector_num * BDRV_SECTOR_SIZE;
+    total_size = bdrv_getlength(bs);
+    if (total_size < 0) {
+        return total_size;
+    } else if (start >= total_size) {
+        *pnum = 0;
+        return 0;
+    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
+        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+    }
+
+    ret = find_allocation(bs, start, &data, &hole);
+    if (ret == -ENXIO) {
+        /* Trailing hole */
+        *pnum = nb_sectors;
+        ret = BDRV_BLOCK_ZERO;
+    } else if (ret < 0) {
+        /* No info available, so pretend there are no holes */
+        *pnum = nb_sectors;
+        ret = BDRV_BLOCK_DATA;
+    } else if (data == start) {
+        /* On a data extent, compute sectors to the end of the extent,
+         * possibly including a partial sector at EOF. */
+        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+        ret = BDRV_BLOCK_DATA;
+    } else {
+        /* On a hole, compute sectors to the beginning of the next extent.  */
+        assert(hole == start);
+        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+        ret = BDRV_BLOCK_ZERO;
+    }
+
+    *file = bs;
+
+    return ret | BDRV_BLOCK_OFFSET_VALID | start;
+}
+
+
 static QemuOptsList qemu_gluster_create_opts = {
     .name = "qemu-gluster-create-opts",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
@@ -719,6 +901,7 @@ static BlockDriver bdrv_gluster = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
@@ -746,6 +929,7 @@ static BlockDriver bdrv_gluster_tcp = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
@@ -773,6 +957,7 @@ static BlockDriver bdrv_gluster_unix = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
@@ -800,6 +985,7 @@ static BlockDriver bdrv_gluster_rdma = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-09 12:30         ` [Qemu-devel] [PATCH v2] " Niels de Vos
@ 2016-03-09 15:46           ` Jeff Cody
  2016-03-09 18:12             ` Niels de Vos
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2016-03-09 15:46 UTC (permalink / raw)
  To: Niels de Vos; +Cc: Kevin Wolf, qemu-devel, qemu-block, Prasanna Kumar Kalever

On Wed, Mar 09, 2016 at 01:30:14PM +0100, Niels de Vos wrote:
> GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> it possible to detect sparse areas in files.
> 
> Signed-off-by: Niels de Vos <ndevos@redhat.com>
> 
> ---
> Tested by compiling and running "qemu-img map gluster://..." with a
> build of the current master branch of glusterfs. Using a Fedora cloud
> image (in raw format) shows many SEEK procudure calls going back and
> forth over the network. The output of "qemu map" matches the output when
> run against the image on the local filesystem.
> 
> v2 based on feedback from Jeff Cody:
> - Replace compile time detection by runtime detection
> - Update return pointer (new argument) for .bdrv_co_get_block_status
> ---
>  block/gluster.c | 186 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 186 insertions(+)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index 65077a0..b01ab52 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
>  typedef struct BDRVGlusterState {
>      struct glfs *glfs;
>      struct glfs_fd *fd;
> +    bool supports_seek_data;
>  } BDRVGlusterState;
>  
>  typedef struct GlusterConf {
> @@ -286,6 +287,33 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
>      }
>  }
>  
> +/*
> + * Do SEEK_DATA/HOLE to detect if it is functional. In order to be usable, it
> + * should return different values for the start of data and the start of a
> + * hole. There are three different cases to handle:
> + *
> + *  - the same position is returned for data/hole (indicates broken gfapi)
> + *  - an error is returned:
> + *     - ENXIO only gets returned if there is valid support on client+server
> + *     - EINVAL is returned when gfapi or the server does not support it
> + */
> +static bool qemu_gluster_test_seek(struct glfs_fd *fd)
> +{
> +    off_t start_data, start_hole;
> +    bool supports_seek_data = false;
> +
> +    start_data = glfs_lseek(fd, 0, SEEK_DATA);
> +    if (start_data != -1) {

I recommend just checking if the returned value is >= 0.

> +        start_hole = glfs_lseek(fd, 0, SEEK_HOLE);
> +        if (start_hole != -1)

Minor formatting nit: per QEMU coding standard, all conditional
statements require brackets.

> +            supports_seek_data = !(start_data == start_hole);
> +    } else if (errno == ENXIO) {

This errno check for ENXIO won't catch the case if an ENXIO error
occurs in the SEEK_HOLE call.

> +        supports_seek_data = true;
> +    }
> +
> +    return supports_seek_data;
> +}
> +
>  static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
>                               int bdrv_flags, Error **errp)
>  {
> @@ -320,6 +348,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
>          ret = -errno;
>      }
>  
> +    s->supports_seek_data = qemu_gluster_test_seek(s->fd);
> +
>  out:
>      qemu_opts_del(opts);
>      qemu_gluster_gconf_free(gconf);
> @@ -677,6 +707,158 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
>      return 0;
>  }
>  
> +/*
> + * Find allocation range in @bs around offset @start.
> + * May change underlying file descriptor's file offset.
> + * If @start is not in a hole, store @start in @data, and the
> + * beginning of the next hole in @hole, and return 0.
> + * If @start is in a non-trailing hole, store @start in @hole and the
> + * beginning of the next non-hole in @data, and return 0.
> + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> + * If we can't find out, return a negative errno other than -ENXIO.
> + *
> + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> + */
> +static int find_allocation(BlockDriverState *bs, off_t start,
> +                           off_t *data, off_t *hole)
> +{
> +    BDRVGlusterState *s = bs->opaque;
> +    off_t offs;
> +
> +    if (!s->supports_seek_data)

Another formatting nit: brackets needed here as well.

> +        return -EINVAL;

-ENOTSUP would probably be a better fit here, but I don't care too
much, since the error code isn't passed along outside the gluster
driver.

> +
> +    /*
> +     * SEEK_DATA cases:
> +     * D1. offs == start: start is in data
> +     * D2. offs > start: start is in a hole, next data at offs
> +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> +     *                              or start is beyond EOF
> +     *     If the latter happens, the file has been truncated behind
> +     *     our back since we opened it.  All bets are off then.
> +     *     Treating like a trailing hole is simplest.
> +     * D4. offs < 0, errno != ENXIO: we learned nothing
> +     */
> +    offs = glfs_lseek(s->fd, start, SEEK_DATA);
> +    if (offs < 0) {
> +        return -errno;          /* D3 or D4 */
> +    }
> +    assert(offs >= start);
> +
> +    if (offs > start) {
> +        /* D2: in hole, next data at offs */
> +        *hole = start;
> +        *data = offs;
> +        return 0;
> +    }
> +
> +    /* D1: in data, end not yet known */
> +
> +    /*
> +     * SEEK_HOLE cases:
> +     * H1. offs == start: start is in a hole
> +     *     If this happens here, a hole has been dug behind our back
> +     *     since the previous lseek().
> +     * H2. offs > start: either start is in data, next hole at offs,
> +     *                   or start is in trailing hole, EOF at offs
> +     *     Linux treats trailing holes like any other hole: offs ==
> +     *     start.  Solaris seeks to EOF instead: offs > start (blech).
> +     *     If that happens here, a hole has been dug behind our back
> +     *     since the previous lseek().
> +     * H3. offs < 0, errno = ENXIO: start is beyond EOF
> +     *     If this happens, the file has been truncated behind our
> +     *     back since we opened it.  Treat it like a trailing hole.
> +     * H4. offs < 0, errno != ENXIO: we learned nothing
> +     *     Pretend we know nothing at all, i.e. "forget" about D1.
> +     */
> +    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> +    if (offs < 0) {
> +        return -errno;          /* D1 and (H3 or H4) */
> +    }
> +    assert(offs >= start);
> +
> +    if (offs > start) {
> +        /*
> +         * D1 and H2: either in data, next hole at offs, or it was in
> +         * data but is now in a trailing hole.  In the latter case,
> +         * all bets are off.  Treating it as if it there was data all
> +         * the way to EOF is safe, so simply do that.
> +         */
> +        *data = start;
> +        *hole = offs;
> +        return 0;
> +    }
> +
> +    /* D1 and H1 */
> +    return -EBUSY;
> +}
> +
> +/*
> + * Returns the allocation status of the specified sectors.
> + *
> + * If 'sector_num' is beyond the end of the disk image the return value is 0
> + * and 'pnum' is set to 0.
> + *
> + * 'pnum' is set to the number of sectors (including and immediately following
> + * the specified sector) that are known to be in the same
> + * allocated/unallocated state.
> + *
> + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> + * beyond the end of the disk image it will be clamped.
> + *
> + * (Based on raw_co_get_block_status() from raw-posix.c.)
> + */
> +static int64_t coroutine_fn qemu_gluster_co_get_block_status(
> +	BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,

Nit: a tab snuck in the above line, before "BlockDriverState" (tabs
expanded to spaces in coding guidelines as well).

> +        BlockDriverState **file)
> +{
> +    BDRVGlusterState *s = bs->opaque;
> +    off_t start, data = 0, hole = 0;
> +    int64_t total_size;
> +    int ret = -EINVAL;
> +
> +    if (!s->fd) {
> +        return ret;
> +    }
> +
> +    start = sector_num * BDRV_SECTOR_SIZE;
> +    total_size = bdrv_getlength(bs);
> +    if (total_size < 0) {
> +        return total_size;
> +    } else if (start >= total_size) {
> +        *pnum = 0;
> +        return 0;
> +    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
> +        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> +    }
> +
> +    ret = find_allocation(bs, start, &data, &hole);
> +    if (ret == -ENXIO) {
> +        /* Trailing hole */
> +        *pnum = nb_sectors;
> +        ret = BDRV_BLOCK_ZERO;
> +    } else if (ret < 0) {
> +        /* No info available, so pretend there are no holes */
> +        *pnum = nb_sectors;
> +        ret = BDRV_BLOCK_DATA;
> +    } else if (data == start) {
> +        /* On a data extent, compute sectors to the end of the extent,
> +         * possibly including a partial sector at EOF. */
> +        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
> +        ret = BDRV_BLOCK_DATA;
> +    } else {
> +        /* On a hole, compute sectors to the beginning of the next extent.  */
> +        assert(hole == start);
> +        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> +        ret = BDRV_BLOCK_ZERO;
> +    }
> +
> +    *file = bs;
> +
> +    return ret | BDRV_BLOCK_OFFSET_VALID | start;
> +}
> +
> +
>  static QemuOptsList qemu_gluster_create_opts = {
>      .name = "qemu-gluster-create-opts",
>      .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
> @@ -719,6 +901,7 @@ static BlockDriver bdrv_gluster = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> @@ -746,6 +929,7 @@ static BlockDriver bdrv_gluster_tcp = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> @@ -773,6 +957,7 @@ static BlockDriver bdrv_gluster_unix = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> @@ -800,6 +985,7 @@ static BlockDriver bdrv_gluster_rdma = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-09 15:46           ` Jeff Cody
@ 2016-03-09 18:12             ` Niels de Vos
  2016-03-09 22:19               ` Jeff Cody
  0 siblings, 1 reply; 16+ messages in thread
From: Niels de Vos @ 2016-03-09 18:12 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Kevin Wolf, qemu-devel, qemu-block, Prasanna Kumar Kalever

On Wed, Mar 09, 2016 at 10:46:02AM -0500, Jeff Cody wrote:
> On Wed, Mar 09, 2016 at 01:30:14PM +0100, Niels de Vos wrote:
> > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > it possible to detect sparse areas in files.
> > 
> > Signed-off-by: Niels de Vos <ndevos@redhat.com>
> > 
> > ---
> > Tested by compiling and running "qemu-img map gluster://..." with a
> > build of the current master branch of glusterfs. Using a Fedora cloud
> > image (in raw format) shows many SEEK procudure calls going back and
> > forth over the network. The output of "qemu map" matches the output when
> > run against the image on the local filesystem.
> > 
> > v2 based on feedback from Jeff Cody:
> > - Replace compile time detection by runtime detection
> > - Update return pointer (new argument) for .bdrv_co_get_block_status
> > ---
> >  block/gluster.c | 186 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 186 insertions(+)
> > 
> > diff --git a/block/gluster.c b/block/gluster.c
> > index 65077a0..b01ab52 100644
> > --- a/block/gluster.c
> > +++ b/block/gluster.c
> > @@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
> >  typedef struct BDRVGlusterState {
> >      struct glfs *glfs;
> >      struct glfs_fd *fd;
> > +    bool supports_seek_data;
> >  } BDRVGlusterState;
> >  
> >  typedef struct GlusterConf {
> > @@ -286,6 +287,33 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
> >      }
> >  }
> >  
> > +/*
> > + * Do SEEK_DATA/HOLE to detect if it is functional. In order to be usable, it
> > + * should return different values for the start of data and the start of a
> > + * hole. There are three different cases to handle:
> > + *
> > + *  - the same position is returned for data/hole (indicates broken gfapi)
> > + *  - an error is returned:
> > + *     - ENXIO only gets returned if there is valid support on client+server
> > + *     - EINVAL is returned when gfapi or the server does not support it
> > + */
> > +static bool qemu_gluster_test_seek(struct glfs_fd *fd)
> > +{
> > +    off_t start_data, start_hole;
> > +    bool supports_seek_data = false;
> > +
> > +    start_data = glfs_lseek(fd, 0, SEEK_DATA);
> > +    if (start_data != -1) {
> 
> I recommend just checking if the returned value is >= 0.

Ok, I can change that.

> > +        start_hole = glfs_lseek(fd, 0, SEEK_HOLE);
> > +        if (start_hole != -1)
> 
> Minor formatting nit: per QEMU coding standard, all conditional
> statements require brackets.

Ah, sure, will do.

> > +            supports_seek_data = !(start_data == start_hole);
> > +    } else if (errno == ENXIO) {
> 
> This errno check for ENXIO won't catch the case if an ENXIO error
> occurs in the SEEK_HOLE call.

I'm not sure if I'm following. lseek(SEEK_DATA) returns -1 and sets
errno to ENXIO when the position in the filedescriptor is EOF. In this
test, we check from position=0, so when ENXIO is returned, we know the
file is empty and the return value+errno has gone through the whole
Gluster stack.

EINVAL would be an other error that is expected, in case either (a
current) gfapi or the server do not support SEEK_DATA.

I do not think there is a need to check for ENXIO on lseek(SEEK_HOLE).

Do you have a preference on how I should change it?

> > +        supports_seek_data = true;
> > +    }
> > +
> > +    return supports_seek_data;
> > +}
> > +
> >  static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> >                               int bdrv_flags, Error **errp)
> >  {
> > @@ -320,6 +348,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> >          ret = -errno;
> >      }
> >  
> > +    s->supports_seek_data = qemu_gluster_test_seek(s->fd);
> > +
> >  out:
> >      qemu_opts_del(opts);
> >      qemu_gluster_gconf_free(gconf);
> > @@ -677,6 +707,158 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
> >      return 0;
> >  }
> >  
> > +/*
> > + * Find allocation range in @bs around offset @start.
> > + * May change underlying file descriptor's file offset.
> > + * If @start is not in a hole, store @start in @data, and the
> > + * beginning of the next hole in @hole, and return 0.
> > + * If @start is in a non-trailing hole, store @start in @hole and the
> > + * beginning of the next non-hole in @data, and return 0.
> > + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> > + * If we can't find out, return a negative errno other than -ENXIO.
> > + *
> > + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> > + */
> > +static int find_allocation(BlockDriverState *bs, off_t start,
> > +                           off_t *data, off_t *hole)
> > +{
> > +    BDRVGlusterState *s = bs->opaque;
> > +    off_t offs;
> > +
> > +    if (!s->supports_seek_data)
> 
> Another formatting nit: brackets needed here as well.

Ok!

> > +        return -EINVAL;
> 
> -ENOTSUP would probably be a better fit here, but I don't care too
> much, since the error code isn't passed along outside the gluster
> driver.

Yes, I agree that -ENOTSUP is nicer, will change that too.

> > +
> > +    /*
> > +     * SEEK_DATA cases:
> > +     * D1. offs == start: start is in data
> > +     * D2. offs > start: start is in a hole, next data at offs
> > +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> > +     *                              or start is beyond EOF
> > +     *     If the latter happens, the file has been truncated behind
> > +     *     our back since we opened it.  All bets are off then.
> > +     *     Treating like a trailing hole is simplest.
> > +     * D4. offs < 0, errno != ENXIO: we learned nothing
> > +     */
> > +    offs = glfs_lseek(s->fd, start, SEEK_DATA);
> > +    if (offs < 0) {
> > +        return -errno;          /* D3 or D4 */
> > +    }
> > +    assert(offs >= start);
> > +
> > +    if (offs > start) {
> > +        /* D2: in hole, next data at offs */
> > +        *hole = start;
> > +        *data = offs;
> > +        return 0;
> > +    }
> > +
> > +    /* D1: in data, end not yet known */
> > +
> > +    /*
> > +     * SEEK_HOLE cases:
> > +     * H1. offs == start: start is in a hole
> > +     *     If this happens here, a hole has been dug behind our back
> > +     *     since the previous lseek().
> > +     * H2. offs > start: either start is in data, next hole at offs,
> > +     *                   or start is in trailing hole, EOF at offs
> > +     *     Linux treats trailing holes like any other hole: offs ==
> > +     *     start.  Solaris seeks to EOF instead: offs > start (blech).
> > +     *     If that happens here, a hole has been dug behind our back
> > +     *     since the previous lseek().
> > +     * H3. offs < 0, errno = ENXIO: start is beyond EOF
> > +     *     If this happens, the file has been truncated behind our
> > +     *     back since we opened it.  Treat it like a trailing hole.
> > +     * H4. offs < 0, errno != ENXIO: we learned nothing
> > +     *     Pretend we know nothing at all, i.e. "forget" about D1.
> > +     */
> > +    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> > +    if (offs < 0) {
> > +        return -errno;          /* D1 and (H3 or H4) */
> > +    }
> > +    assert(offs >= start);
> > +
> > +    if (offs > start) {
> > +        /*
> > +         * D1 and H2: either in data, next hole at offs, or it was in
> > +         * data but is now in a trailing hole.  In the latter case,
> > +         * all bets are off.  Treating it as if it there was data all
> > +         * the way to EOF is safe, so simply do that.
> > +         */
> > +        *data = start;
> > +        *hole = offs;
> > +        return 0;
> > +    }
> > +
> > +    /* D1 and H1 */
> > +    return -EBUSY;
> > +}
> > +
> > +/*
> > + * Returns the allocation status of the specified sectors.
> > + *
> > + * If 'sector_num' is beyond the end of the disk image the return value is 0
> > + * and 'pnum' is set to 0.
> > + *
> > + * 'pnum' is set to the number of sectors (including and immediately following
> > + * the specified sector) that are known to be in the same
> > + * allocated/unallocated state.
> > + *
> > + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> > + * beyond the end of the disk image it will be clamped.
> > + *
> > + * (Based on raw_co_get_block_status() from raw-posix.c.)
> > + */
> > +static int64_t coroutine_fn qemu_gluster_co_get_block_status(
> > +	BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
> 
> Nit: a tab snuck in the above line, before "BlockDriverState" (tabs
> expanded to spaces in coding guidelines as well).

Oh, yuck :-/

Thanks for the review!
Niels


> > +        BlockDriverState **file)
> > +{
> > +    BDRVGlusterState *s = bs->opaque;
> > +    off_t start, data = 0, hole = 0;
> > +    int64_t total_size;
> > +    int ret = -EINVAL;
> > +
> > +    if (!s->fd) {
> > +        return ret;
> > +    }
> > +
> > +    start = sector_num * BDRV_SECTOR_SIZE;
> > +    total_size = bdrv_getlength(bs);
> > +    if (total_size < 0) {
> > +        return total_size;
> > +    } else if (start >= total_size) {
> > +        *pnum = 0;
> > +        return 0;
> > +    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
> > +        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> > +    }
> > +
> > +    ret = find_allocation(bs, start, &data, &hole);
> > +    if (ret == -ENXIO) {
> > +        /* Trailing hole */
> > +        *pnum = nb_sectors;
> > +        ret = BDRV_BLOCK_ZERO;
> > +    } else if (ret < 0) {
> > +        /* No info available, so pretend there are no holes */
> > +        *pnum = nb_sectors;
> > +        ret = BDRV_BLOCK_DATA;
> > +    } else if (data == start) {
> > +        /* On a data extent, compute sectors to the end of the extent,
> > +         * possibly including a partial sector at EOF. */
> > +        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
> > +        ret = BDRV_BLOCK_DATA;
> > +    } else {
> > +        /* On a hole, compute sectors to the beginning of the next extent.  */
> > +        assert(hole == start);
> > +        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> > +        ret = BDRV_BLOCK_ZERO;
> > +    }
> > +
> > +    *file = bs;
> > +
> > +    return ret | BDRV_BLOCK_OFFSET_VALID | start;
> > +}
> > +
> > +
> >  static QemuOptsList qemu_gluster_create_opts = {
> >      .name = "qemu-gluster-create-opts",
> >      .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
> > @@ -719,6 +901,7 @@ static BlockDriver bdrv_gluster = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > @@ -746,6 +929,7 @@ static BlockDriver bdrv_gluster_tcp = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > @@ -773,6 +957,7 @@ static BlockDriver bdrv_gluster_unix = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > @@ -800,6 +985,7 @@ static BlockDriver bdrv_gluster_rdma = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > -- 
> > 2.5.0
> > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-09 18:12             ` Niels de Vos
@ 2016-03-09 22:19               ` Jeff Cody
  2016-03-10 18:38                 ` [Qemu-devel] [PATCH v3] " Niels de Vos
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2016-03-09 22:19 UTC (permalink / raw)
  To: Niels de Vos; +Cc: Kevin Wolf, qemu-devel, qemu-block, Prasanna Kumar Kalever

On Wed, Mar 09, 2016 at 07:12:41PM +0100, Niels de Vos wrote:
> On Wed, Mar 09, 2016 at 10:46:02AM -0500, Jeff Cody wrote:
> > On Wed, Mar 09, 2016 at 01:30:14PM +0100, Niels de Vos wrote:
> > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > it possible to detect sparse areas in files.
> > > 
> > > Signed-off-by: Niels de Vos <ndevos@redhat.com>
> > > 
> > > ---
> > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > build of the current master branch of glusterfs. Using a Fedora cloud
> > > image (in raw format) shows many SEEK procudure calls going back and
> > > forth over the network. The output of "qemu map" matches the output when
> > > run against the image on the local filesystem.
> > > 
> > > v2 based on feedback from Jeff Cody:
> > > - Replace compile time detection by runtime detection
> > > - Update return pointer (new argument) for .bdrv_co_get_block_status
> > > ---
> > >  block/gluster.c | 186 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 186 insertions(+)
> > > 
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index 65077a0..b01ab52 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
> > >  typedef struct BDRVGlusterState {
> > >      struct glfs *glfs;
> > >      struct glfs_fd *fd;
> > > +    bool supports_seek_data;
> > >  } BDRVGlusterState;
> > >  
> > >  typedef struct GlusterConf {
> > > @@ -286,6 +287,33 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
> > >      }
> > >  }
> > >  
> > > +/*
> > > + * Do SEEK_DATA/HOLE to detect if it is functional. In order to be usable, it
> > > + * should return different values for the start of data and the start of a
> > > + * hole. There are three different cases to handle:
> > > + *
> > > + *  - the same position is returned for data/hole (indicates broken gfapi)
> > > + *  - an error is returned:
> > > + *     - ENXIO only gets returned if there is valid support on client+server
> > > + *     - EINVAL is returned when gfapi or the server does not support it
> > > + */
> > > +static bool qemu_gluster_test_seek(struct glfs_fd *fd)
> > > +{
> > > +    off_t start_data, start_hole;
> > > +    bool supports_seek_data = false;
> > > +
> > > +    start_data = glfs_lseek(fd, 0, SEEK_DATA);
> > > +    if (start_data != -1) {
> > 
> > I recommend just checking if the returned value is >= 0.
> 
> Ok, I can change that.
> 
> > > +        start_hole = glfs_lseek(fd, 0, SEEK_HOLE);
> > > +        if (start_hole != -1)
> > 
> > Minor formatting nit: per QEMU coding standard, all conditional
> > statements require brackets.
> 
> Ah, sure, will do.
> 
> > > +            supports_seek_data = !(start_data == start_hole);
> > > +    } else if (errno == ENXIO) {
> > 
> > This errno check for ENXIO won't catch the case if an ENXIO error
> > occurs in the SEEK_HOLE call.
> 
> I'm not sure if I'm following. lseek(SEEK_DATA) returns -1 and sets
> errno to ENXIO when the position in the filedescriptor is EOF. In this
> test, we check from position=0, so when ENXIO is returned, we know the
> file is empty and the return value+errno has gone through the whole
> Gluster stack.
> 
> EINVAL would be an other error that is expected, in case either (a
> current) gfapi or the server do not support SEEK_DATA.
> 
> I do not think there is a need to check for ENXIO on lseek(SEEK_HOLE).


Hmm, I think you are right - I can't think of any scenario that
SEEK_HOLE should return ENXIO that SEEK_DATA would not.

As matter of fact, if we can rely on ENXIO always indicating if
SEEK_DATA is supported by gluster, then we can make the whole
detection process much simpler, like this:

static bool qemu_gluster_test_seek(struct glfs_fd *fd)
{
    off_t ret, eof;
    eof = glfs_lseek(fd, 0, SEEK_END);
    if (eof < 0) {
        /* this shouldn't occur */
        return false;
    }
    /* this should always fail with ENXIO if SEEK_DATA is supported */
    ret = glfs_lseek(fd, eof, SEEK_DATA);
    return (ret < 0) && (errno == ENXIO);
}


> 
> Do you have a preference on how I should change it?
> 
> > > +        supports_seek_data = true;
> > > +    }
> > > +
> > > +    return supports_seek_data;
> > > +}
> > > +
> > >  static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> > >                               int bdrv_flags, Error **errp)
> > >  {
> > > @@ -320,6 +348,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> > >          ret = -errno;
> > >      }
> > >  
> > > +    s->supports_seek_data = qemu_gluster_test_seek(s->fd);
> > > +
> > >  out:
> > >      qemu_opts_del(opts);
> > >      qemu_gluster_gconf_free(gconf);
> > > @@ -677,6 +707,158 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
> > >      return 0;
> > >  }
> > >  
> > > +/*
> > > + * Find allocation range in @bs around offset @start.
> > > + * May change underlying file descriptor's file offset.
> > > + * If @start is not in a hole, store @start in @data, and the
> > > + * beginning of the next hole in @hole, and return 0.
> > > + * If @start is in a non-trailing hole, store @start in @hole and the
> > > + * beginning of the next non-hole in @data, and return 0.
> > > + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> > > + * If we can't find out, return a negative errno other than -ENXIO.
> > > + *
> > > + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> > > + */
> > > +static int find_allocation(BlockDriverState *bs, off_t start,
> > > +                           off_t *data, off_t *hole)
> > > +{
> > > +    BDRVGlusterState *s = bs->opaque;
> > > +    off_t offs;
> > > +
> > > +    if (!s->supports_seek_data)
> > 
> > Another formatting nit: brackets needed here as well.
> 
> Ok!
> 
> > > +        return -EINVAL;
> > 
> > -ENOTSUP would probably be a better fit here, but I don't care too
> > much, since the error code isn't passed along outside the gluster
> > driver.
> 
> Yes, I agree that -ENOTSUP is nicer, will change that too.
> 
> > > +
> > > +    /*
> > > +     * SEEK_DATA cases:
> > > +     * D1. offs == start: start is in data
> > > +     * D2. offs > start: start is in a hole, next data at offs
> > > +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> > > +     *                              or start is beyond EOF
> > > +     *     If the latter happens, the file has been truncated behind
> > > +     *     our back since we opened it.  All bets are off then.
> > > +     *     Treating like a trailing hole is simplest.
> > > +     * D4. offs < 0, errno != ENXIO: we learned nothing
> > > +     */
> > > +    offs = glfs_lseek(s->fd, start, SEEK_DATA);
> > > +    if (offs < 0) {
> > > +        return -errno;          /* D3 or D4 */
> > > +    }
> > > +    assert(offs >= start);
> > > +
> > > +    if (offs > start) {
> > > +        /* D2: in hole, next data at offs */
> > > +        *hole = start;
> > > +        *data = offs;
> > > +        return 0;
> > > +    }
> > > +
> > > +    /* D1: in data, end not yet known */
> > > +
> > > +    /*
> > > +     * SEEK_HOLE cases:
> > > +     * H1. offs == start: start is in a hole
> > > +     *     If this happens here, a hole has been dug behind our back
> > > +     *     since the previous lseek().
> > > +     * H2. offs > start: either start is in data, next hole at offs,
> > > +     *                   or start is in trailing hole, EOF at offs
> > > +     *     Linux treats trailing holes like any other hole: offs ==
> > > +     *     start.  Solaris seeks to EOF instead: offs > start (blech).
> > > +     *     If that happens here, a hole has been dug behind our back
> > > +     *     since the previous lseek().
> > > +     * H3. offs < 0, errno = ENXIO: start is beyond EOF
> > > +     *     If this happens, the file has been truncated behind our
> > > +     *     back since we opened it.  Treat it like a trailing hole.
> > > +     * H4. offs < 0, errno != ENXIO: we learned nothing
> > > +     *     Pretend we know nothing at all, i.e. "forget" about D1.
> > > +     */
> > > +    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> > > +    if (offs < 0) {
> > > +        return -errno;          /* D1 and (H3 or H4) */
> > > +    }
> > > +    assert(offs >= start);
> > > +
> > > +    if (offs > start) {
> > > +        /*
> > > +         * D1 and H2: either in data, next hole at offs, or it was in
> > > +         * data but is now in a trailing hole.  In the latter case,
> > > +         * all bets are off.  Treating it as if it there was data all
> > > +         * the way to EOF is safe, so simply do that.
> > > +         */
> > > +        *data = start;
> > > +        *hole = offs;
> > > +        return 0;
> > > +    }
> > > +
> > > +    /* D1 and H1 */
> > > +    return -EBUSY;
> > > +}
> > > +
> > > +/*
> > > + * Returns the allocation status of the specified sectors.
> > > + *
> > > + * If 'sector_num' is beyond the end of the disk image the return value is 0
> > > + * and 'pnum' is set to 0.
> > > + *
> > > + * 'pnum' is set to the number of sectors (including and immediately following
> > > + * the specified sector) that are known to be in the same
> > > + * allocated/unallocated state.
> > > + *
> > > + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> > > + * beyond the end of the disk image it will be clamped.
> > > + *
> > > + * (Based on raw_co_get_block_status() from raw-posix.c.)
> > > + */
> > > +static int64_t coroutine_fn qemu_gluster_co_get_block_status(
> > > +	BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
> > 
> > Nit: a tab snuck in the above line, before "BlockDriverState" (tabs
> > expanded to spaces in coding guidelines as well).
> 
> Oh, yuck :-/
> 
> Thanks for the review!
> Niels
> 
> 
> > > +        BlockDriverState **file)
> > > +{
> > > +    BDRVGlusterState *s = bs->opaque;
> > > +    off_t start, data = 0, hole = 0;
> > > +    int64_t total_size;
> > > +    int ret = -EINVAL;
> > > +
> > > +    if (!s->fd) {
> > > +        return ret;
> > > +    }
> > > +
> > > +    start = sector_num * BDRV_SECTOR_SIZE;
> > > +    total_size = bdrv_getlength(bs);
> > > +    if (total_size < 0) {
> > > +        return total_size;
> > > +    } else if (start >= total_size) {
> > > +        *pnum = 0;
> > > +        return 0;
> > > +    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
> > > +        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> > > +    }
> > > +
> > > +    ret = find_allocation(bs, start, &data, &hole);
> > > +    if (ret == -ENXIO) {
> > > +        /* Trailing hole */
> > > +        *pnum = nb_sectors;
> > > +        ret = BDRV_BLOCK_ZERO;
> > > +    } else if (ret < 0) {
> > > +        /* No info available, so pretend there are no holes */
> > > +        *pnum = nb_sectors;
> > > +        ret = BDRV_BLOCK_DATA;
> > > +    } else if (data == start) {
> > > +        /* On a data extent, compute sectors to the end of the extent,
> > > +         * possibly including a partial sector at EOF. */
> > > +        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
> > > +        ret = BDRV_BLOCK_DATA;
> > > +    } else {
> > > +        /* On a hole, compute sectors to the beginning of the next extent.  */
> > > +        assert(hole == start);
> > > +        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> > > +        ret = BDRV_BLOCK_ZERO;
> > > +    }
> > > +
> > > +    *file = bs;
> > > +
> > > +    return ret | BDRV_BLOCK_OFFSET_VALID | start;
> > > +}
> > > +
> > > +
> > >  static QemuOptsList qemu_gluster_create_opts = {
> > >      .name = "qemu-gluster-create-opts",
> > >      .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
> > > @@ -719,6 +901,7 @@ static BlockDriver bdrv_gluster = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > @@ -746,6 +929,7 @@ static BlockDriver bdrv_gluster_tcp = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > @@ -773,6 +957,7 @@ static BlockDriver bdrv_gluster_unix = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > @@ -800,6 +985,7 @@ static BlockDriver bdrv_gluster_rdma = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > -- 
> > > 2.5.0
> > > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-09 22:19               ` Jeff Cody
@ 2016-03-10 18:38                 ` Niels de Vos
  2016-03-15 19:50                   ` Jeff Cody
  0 siblings, 1 reply; 16+ messages in thread
From: Niels de Vos @ 2016-03-10 18:38 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: Kevin Wolf, Jeff Cody, Prasanna Kumar Kalever, Niels de Vos

GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
it possible to detect sparse areas in files.

Signed-off-by: Niels de Vos <ndevos@redhat.com>

---
Tested by compiling and running "qemu-img map gluster://..." with a
build of the current master branch of glusterfs. Using a Fedora cloud
image (in raw format) shows many SEEK procudure calls going back and
forth over the network. The output of "qemu map" matches the output when
run against the image on the local filesystem.

v2 based on feedback from Jeff Cody:
- Replace compile time detection by runtime detection
- Update return pointer (new argument) for .bdrv_co_get_block_status
---
 block/gluster.c | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 182 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 65077a0..a4f0628 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
 typedef struct BDRVGlusterState {
     struct glfs *glfs;
     struct glfs_fd *fd;
+    bool supports_seek_data;
 } BDRVGlusterState;
 
 typedef struct GlusterConf {
@@ -286,6 +287,28 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
     }
 }
 
+/*
+ * Do SEEK_DATA/HOLE to detect if it is functional. Older broken versions of
+ * gfapi incorrectly return the current offset when SEEK_DATA/HOLE is used.
+ * - Corrected versions return -1 and set errno to EINVAL.
+ * - Versions that support SEEK_DATA/HOLE correctly, will return -1 and set
+ *   errno to ENXIO when SEEK_DATA is called with a position of EOF.
+ */
+static bool qemu_gluster_test_seek(struct glfs_fd *fd)
+{
+    off_t ret, eof;
+
+    eof = glfs_lseek(fd, 0, SEEK_END);
+    if (eof < 0) {
+        /* this should never occur */
+        return false;
+    }
+
+    /* this should always fail with ENXIO if SEEK_DATA is supported */
+    ret = glfs_lseek(fd, eof, SEEK_DATA);
+    return (ret < 0) && (errno == ENXIO);
+}
+
 static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
                              int bdrv_flags, Error **errp)
 {
@@ -320,6 +343,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
         ret = -errno;
     }
 
+    s->supports_seek_data = qemu_gluster_test_seek(s->fd);
+
 out:
     qemu_opts_del(opts);
     qemu_gluster_gconf_free(gconf);
@@ -677,6 +702,159 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
     return 0;
 }
 
+/*
+ * Find allocation range in @bs around offset @start.
+ * May change underlying file descriptor's file offset.
+ * If @start is not in a hole, store @start in @data, and the
+ * beginning of the next hole in @hole, and return 0.
+ * If @start is in a non-trailing hole, store @start in @hole and the
+ * beginning of the next non-hole in @data, and return 0.
+ * If @start is in a trailing hole or beyond EOF, return -ENXIO.
+ * If we can't find out, return a negative errno other than -ENXIO.
+ *
+ * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
+ */
+static int find_allocation(BlockDriverState *bs, off_t start,
+                           off_t *data, off_t *hole)
+{
+    BDRVGlusterState *s = bs->opaque;
+    off_t offs;
+
+    if (!s->supports_seek_data) {
+        return -ENOTSUP;
+    }
+
+    /*
+     * SEEK_DATA cases:
+     * D1. offs == start: start is in data
+     * D2. offs > start: start is in a hole, next data at offs
+     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
+     *                              or start is beyond EOF
+     *     If the latter happens, the file has been truncated behind
+     *     our back since we opened it.  All bets are off then.
+     *     Treating like a trailing hole is simplest.
+     * D4. offs < 0, errno != ENXIO: we learned nothing
+     */
+    offs = glfs_lseek(s->fd, start, SEEK_DATA);
+    if (offs < 0) {
+        return -errno;          /* D3 or D4 */
+    }
+    assert(offs >= start);
+
+    if (offs > start) {
+        /* D2: in hole, next data at offs */
+        *hole = start;
+        *data = offs;
+        return 0;
+    }
+
+    /* D1: in data, end not yet known */
+
+    /*
+     * SEEK_HOLE cases:
+     * H1. offs == start: start is in a hole
+     *     If this happens here, a hole has been dug behind our back
+     *     since the previous lseek().
+     * H2. offs > start: either start is in data, next hole at offs,
+     *                   or start is in trailing hole, EOF at offs
+     *     Linux treats trailing holes like any other hole: offs ==
+     *     start.  Solaris seeks to EOF instead: offs > start (blech).
+     *     If that happens here, a hole has been dug behind our back
+     *     since the previous lseek().
+     * H3. offs < 0, errno = ENXIO: start is beyond EOF
+     *     If this happens, the file has been truncated behind our
+     *     back since we opened it.  Treat it like a trailing hole.
+     * H4. offs < 0, errno != ENXIO: we learned nothing
+     *     Pretend we know nothing at all, i.e. "forget" about D1.
+     */
+    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
+    if (offs < 0) {
+        return -errno;          /* D1 and (H3 or H4) */
+    }
+    assert(offs >= start);
+
+    if (offs > start) {
+        /*
+         * D1 and H2: either in data, next hole at offs, or it was in
+         * data but is now in a trailing hole.  In the latter case,
+         * all bets are off.  Treating it as if it there was data all
+         * the way to EOF is safe, so simply do that.
+         */
+        *data = start;
+        *hole = offs;
+        return 0;
+    }
+
+    /* D1 and H1 */
+    return -EBUSY;
+}
+
+/*
+ * Returns the allocation status of the specified sectors.
+ *
+ * If 'sector_num' is beyond the end of the disk image the return value is 0
+ * and 'pnum' is set to 0.
+ *
+ * 'pnum' is set to the number of sectors (including and immediately following
+ * the specified sector) that are known to be in the same
+ * allocated/unallocated state.
+ *
+ * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * beyond the end of the disk image it will be clamped.
+ *
+ * (Based on raw_co_get_block_status() from raw-posix.c.)
+ */
+static int64_t coroutine_fn qemu_gluster_co_get_block_status(
+        BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
+        BlockDriverState **file)
+{
+    BDRVGlusterState *s = bs->opaque;
+    off_t start, data = 0, hole = 0;
+    int64_t total_size;
+    int ret = -EINVAL;
+
+    if (!s->fd) {
+        return ret;
+    }
+
+    start = sector_num * BDRV_SECTOR_SIZE;
+    total_size = bdrv_getlength(bs);
+    if (total_size < 0) {
+        return total_size;
+    } else if (start >= total_size) {
+        *pnum = 0;
+        return 0;
+    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
+        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+    }
+
+    ret = find_allocation(bs, start, &data, &hole);
+    if (ret == -ENXIO) {
+        /* Trailing hole */
+        *pnum = nb_sectors;
+        ret = BDRV_BLOCK_ZERO;
+    } else if (ret < 0) {
+        /* No info available, so pretend there are no holes */
+        *pnum = nb_sectors;
+        ret = BDRV_BLOCK_DATA;
+    } else if (data == start) {
+        /* On a data extent, compute sectors to the end of the extent,
+         * possibly including a partial sector at EOF. */
+        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+        ret = BDRV_BLOCK_DATA;
+    } else {
+        /* On a hole, compute sectors to the beginning of the next extent.  */
+        assert(hole == start);
+        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+        ret = BDRV_BLOCK_ZERO;
+    }
+
+    *file = bs;
+
+    return ret | BDRV_BLOCK_OFFSET_VALID | start;
+}
+
+
 static QemuOptsList qemu_gluster_create_opts = {
     .name = "qemu-gluster-create-opts",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
@@ -719,6 +897,7 @@ static BlockDriver bdrv_gluster = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
@@ -746,6 +925,7 @@ static BlockDriver bdrv_gluster_tcp = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
@@ -773,6 +953,7 @@ static BlockDriver bdrv_gluster_unix = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
@@ -800,6 +981,7 @@ static BlockDriver bdrv_gluster_rdma = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-10 18:38                 ` [Qemu-devel] [PATCH v3] " Niels de Vos
@ 2016-03-15 19:50                   ` Jeff Cody
  2016-03-15 19:52                     ` Jeff Cody
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2016-03-15 19:50 UTC (permalink / raw)
  To: Niels de Vos; +Cc: Kevin Wolf, qemu-devel, qemu-block, Prasanna Kumar Kalever

On Thu, Mar 10, 2016 at 07:38:00PM +0100, Niels de Vos wrote:
> GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> it possible to detect sparse areas in files.
> 
> Signed-off-by: Niels de Vos <ndevos@redhat.com>
> 
> ---
> Tested by compiling and running "qemu-img map gluster://..." with a
> build of the current master branch of glusterfs. Using a Fedora cloud
> image (in raw format) shows many SEEK procudure calls going back and
> forth over the network. The output of "qemu map" matches the output when
> run against the image on the local filesystem.
> 
> v2 based on feedback from Jeff Cody:
> - Replace compile time detection by runtime detection
> - Update return pointer (new argument) for .bdrv_co_get_block_status
> ---
>  block/gluster.c | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 182 insertions(+)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index 65077a0..a4f0628 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
>  typedef struct BDRVGlusterState {
>      struct glfs *glfs;
>      struct glfs_fd *fd;
> +    bool supports_seek_data;
>  } BDRVGlusterState;
>  
>  typedef struct GlusterConf {
> @@ -286,6 +287,28 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
>      }
>  }
>  
> +/*
> + * Do SEEK_DATA/HOLE to detect if it is functional. Older broken versions of
> + * gfapi incorrectly return the current offset when SEEK_DATA/HOLE is used.
> + * - Corrected versions return -1 and set errno to EINVAL.
> + * - Versions that support SEEK_DATA/HOLE correctly, will return -1 and set
> + *   errno to ENXIO when SEEK_DATA is called with a position of EOF.
> + */
> +static bool qemu_gluster_test_seek(struct glfs_fd *fd)
> +{
> +    off_t ret, eof;
> +
> +    eof = glfs_lseek(fd, 0, SEEK_END);
> +    if (eof < 0) {
> +        /* this should never occur */
> +        return false;
> +    }
> +
> +    /* this should always fail with ENXIO if SEEK_DATA is supported */
> +    ret = glfs_lseek(fd, eof, SEEK_DATA);
> +    return (ret < 0) && (errno == ENXIO);
> +}
> +
>  static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
>                               int bdrv_flags, Error **errp)
>  {
> @@ -320,6 +343,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
>          ret = -errno;
>      }
>  
> +    s->supports_seek_data = qemu_gluster_test_seek(s->fd);
> +
>  out:
>      qemu_opts_del(opts);
>      qemu_gluster_gconf_free(gconf);
> @@ -677,6 +702,159 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
>      return 0;
>  }
>  
> +/*
> + * Find allocation range in @bs around offset @start.
> + * May change underlying file descriptor's file offset.
> + * If @start is not in a hole, store @start in @data, and the
> + * beginning of the next hole in @hole, and return 0.
> + * If @start is in a non-trailing hole, store @start in @hole and the
> + * beginning of the next non-hole in @data, and return 0.
> + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> + * If we can't find out, return a negative errno other than -ENXIO.
> + *
> + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> + */
> +static int find_allocation(BlockDriverState *bs, off_t start,
> +                           off_t *data, off_t *hole)
> +{
> +    BDRVGlusterState *s = bs->opaque;
> +    off_t offs;
> +
> +    if (!s->supports_seek_data) {
> +        return -ENOTSUP;
> +    }
> +
> +    /*
> +     * SEEK_DATA cases:
> +     * D1. offs == start: start is in data
> +     * D2. offs > start: start is in a hole, next data at offs
> +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> +     *                              or start is beyond EOF
> +     *     If the latter happens, the file has been truncated behind
> +     *     our back since we opened it.  All bets are off then.
> +     *     Treating like a trailing hole is simplest.
> +     * D4. offs < 0, errno != ENXIO: we learned nothing
> +     */
> +    offs = glfs_lseek(s->fd, start, SEEK_DATA);
> +    if (offs < 0) {
> +        return -errno;          /* D3 or D4 */
> +    }
> +    assert(offs >= start);
> +
> +    if (offs > start) {
> +        /* D2: in hole, next data at offs */
> +        *hole = start;
> +        *data = offs;
> +        return 0;
> +    }
> +
> +    /* D1: in data, end not yet known */
> +
> +    /*
> +     * SEEK_HOLE cases:
> +     * H1. offs == start: start is in a hole
> +     *     If this happens here, a hole has been dug behind our back
> +     *     since the previous lseek().
> +     * H2. offs > start: either start is in data, next hole at offs,
> +     *                   or start is in trailing hole, EOF at offs
> +     *     Linux treats trailing holes like any other hole: offs ==
> +     *     start.  Solaris seeks to EOF instead: offs > start (blech).
> +     *     If that happens here, a hole has been dug behind our back
> +     *     since the previous lseek().
> +     * H3. offs < 0, errno = ENXIO: start is beyond EOF
> +     *     If this happens, the file has been truncated behind our
> +     *     back since we opened it.  Treat it like a trailing hole.
> +     * H4. offs < 0, errno != ENXIO: we learned nothing
> +     *     Pretend we know nothing at all, i.e. "forget" about D1.
> +     */
> +    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> +    if (offs < 0) {
> +        return -errno;          /* D1 and (H3 or H4) */
> +    }
> +    assert(offs >= start);
> +
> +    if (offs > start) {
> +        /*
> +         * D1 and H2: either in data, next hole at offs, or it was in
> +         * data but is now in a trailing hole.  In the latter case,
> +         * all bets are off.  Treating it as if it there was data all
> +         * the way to EOF is safe, so simply do that.
> +         */
> +        *data = start;
> +        *hole = offs;
> +        return 0;
> +    }
> +
> +    /* D1 and H1 */
> +    return -EBUSY;
> +}
> +
> +/*
> + * Returns the allocation status of the specified sectors.
> + *
> + * If 'sector_num' is beyond the end of the disk image the return value is 0
> + * and 'pnum' is set to 0.
> + *
> + * 'pnum' is set to the number of sectors (including and immediately following
> + * the specified sector) that are known to be in the same
> + * allocated/unallocated state.
> + *
> + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> + * beyond the end of the disk image it will be clamped.
> + *
> + * (Based on raw_co_get_block_status() from raw-posix.c.)
> + */
> +static int64_t coroutine_fn qemu_gluster_co_get_block_status(
> +        BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
> +        BlockDriverState **file)
> +{
> +    BDRVGlusterState *s = bs->opaque;
> +    off_t start, data = 0, hole = 0;
> +    int64_t total_size;
> +    int ret = -EINVAL;
> +
> +    if (!s->fd) {
> +        return ret;
> +    }
> +
> +    start = sector_num * BDRV_SECTOR_SIZE;
> +    total_size = bdrv_getlength(bs);
> +    if (total_size < 0) {
> +        return total_size;
> +    } else if (start >= total_size) {
> +        *pnum = 0;
> +        return 0;
> +    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
> +        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> +    }
> +
> +    ret = find_allocation(bs, start, &data, &hole);
> +    if (ret == -ENXIO) {
> +        /* Trailing hole */
> +        *pnum = nb_sectors;
> +        ret = BDRV_BLOCK_ZERO;
> +    } else if (ret < 0) {
> +        /* No info available, so pretend there are no holes */
> +        *pnum = nb_sectors;
> +        ret = BDRV_BLOCK_DATA;
> +    } else if (data == start) {
> +        /* On a data extent, compute sectors to the end of the extent,
> +         * possibly including a partial sector at EOF. */
> +        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
> +        ret = BDRV_BLOCK_DATA;
> +    } else {
> +        /* On a hole, compute sectors to the beginning of the next extent.  */
> +        assert(hole == start);
> +        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> +        ret = BDRV_BLOCK_ZERO;
> +    }
> +
> +    *file = bs;
> +
> +    return ret | BDRV_BLOCK_OFFSET_VALID | start;
> +}
> +
> +
>  static QemuOptsList qemu_gluster_create_opts = {
>      .name = "qemu-gluster-create-opts",
>      .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
> @@ -719,6 +897,7 @@ static BlockDriver bdrv_gluster = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> @@ -746,6 +925,7 @@ static BlockDriver bdrv_gluster_tcp = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> @@ -773,6 +953,7 @@ static BlockDriver bdrv_gluster_unix = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> @@ -800,6 +981,7 @@ static BlockDriver bdrv_gluster_rdma = {
>  #ifdef CONFIG_GLUSTERFS_ZEROFILL
>      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
>  #endif
> +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
>      .create_opts                  = &qemu_gluster_create_opts,
>  };
>  
> -- 
> 2.5.0
> 

Thanks,

Applied to my block branch:

git://github.com/codyprime/qemu-kvm-jtc.git block

-Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-15 19:50                   ` Jeff Cody
@ 2016-03-15 19:52                     ` Jeff Cody
  2016-03-16  4:08                       ` Niels de Vos
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2016-03-15 19:52 UTC (permalink / raw)
  To: Niels de Vos; +Cc: Kevin Wolf, qemu-devel, qemu-block, Prasanna Kumar Kalever

On Tue, Mar 15, 2016 at 03:50:17PM -0400, Jeff Cody wrote:
> On Thu, Mar 10, 2016 at 07:38:00PM +0100, Niels de Vos wrote:
> > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > it possible to detect sparse areas in files.
> > 
> > Signed-off-by: Niels de Vos <ndevos@redhat.com>
> > 
> > ---
> > Tested by compiling and running "qemu-img map gluster://..." with a
> > build of the current master branch of glusterfs. Using a Fedora cloud
> > image (in raw format) shows many SEEK procudure calls going back and
> > forth over the network. The output of "qemu map" matches the output when
> > run against the image on the local filesystem.
> > 
> > v2 based on feedback from Jeff Cody:
> > - Replace compile time detection by runtime detection
> > - Update return pointer (new argument) for .bdrv_co_get_block_status
> > ---
> >  block/gluster.c | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 182 insertions(+)
> > 
> > diff --git a/block/gluster.c b/block/gluster.c
> > index 65077a0..a4f0628 100644
> > --- a/block/gluster.c
> > +++ b/block/gluster.c
> > @@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
> >  typedef struct BDRVGlusterState {
> >      struct glfs *glfs;
> >      struct glfs_fd *fd;
> > +    bool supports_seek_data;
> >  } BDRVGlusterState;
> >  
> >  typedef struct GlusterConf {
> > @@ -286,6 +287,28 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
> >      }
> >  }
> >  
> > +/*
> > + * Do SEEK_DATA/HOLE to detect if it is functional. Older broken versions of
> > + * gfapi incorrectly return the current offset when SEEK_DATA/HOLE is used.
> > + * - Corrected versions return -1 and set errno to EINVAL.
> > + * - Versions that support SEEK_DATA/HOLE correctly, will return -1 and set
> > + *   errno to ENXIO when SEEK_DATA is called with a position of EOF.
> > + */
> > +static bool qemu_gluster_test_seek(struct glfs_fd *fd)
> > +{
> > +    off_t ret, eof;
> > +
> > +    eof = glfs_lseek(fd, 0, SEEK_END);
> > +    if (eof < 0) {
> > +        /* this should never occur */
> > +        return false;
> > +    }
> > +
> > +    /* this should always fail with ENXIO if SEEK_DATA is supported */
> > +    ret = glfs_lseek(fd, eof, SEEK_DATA);
> > +    return (ret < 0) && (errno == ENXIO);
> > +}
> > +
> >  static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> >                               int bdrv_flags, Error **errp)
> >  {
> > @@ -320,6 +343,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> >          ret = -errno;
> >      }
> >  
> > +    s->supports_seek_data = qemu_gluster_test_seek(s->fd);
> > +
> >  out:
> >      qemu_opts_del(opts);
> >      qemu_gluster_gconf_free(gconf);
> > @@ -677,6 +702,159 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
> >      return 0;
> >  }
> >  
> > +/*
> > + * Find allocation range in @bs around offset @start.
> > + * May change underlying file descriptor's file offset.
> > + * If @start is not in a hole, store @start in @data, and the
> > + * beginning of the next hole in @hole, and return 0.
> > + * If @start is in a non-trailing hole, store @start in @hole and the
> > + * beginning of the next non-hole in @data, and return 0.
> > + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> > + * If we can't find out, return a negative errno other than -ENXIO.
> > + *
> > + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> > + */
> > +static int find_allocation(BlockDriverState *bs, off_t start,
> > +                           off_t *data, off_t *hole)
> > +{
> > +    BDRVGlusterState *s = bs->opaque;
> > +    off_t offs;
> > +
> > +    if (!s->supports_seek_data) {
> > +        return -ENOTSUP;
> > +    }
> > +
> > +    /*
> > +     * SEEK_DATA cases:
> > +     * D1. offs == start: start is in data
> > +     * D2. offs > start: start is in a hole, next data at offs
> > +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> > +     *                              or start is beyond EOF
> > +     *     If the latter happens, the file has been truncated behind
> > +     *     our back since we opened it.  All bets are off then.
> > +     *     Treating like a trailing hole is simplest.
> > +     * D4. offs < 0, errno != ENXIO: we learned nothing
> > +     */
> > +    offs = glfs_lseek(s->fd, start, SEEK_DATA);
> > +    if (offs < 0) {
> > +        return -errno;          /* D3 or D4 */
> > +    }
> > +    assert(offs >= start);
> > +
> > +    if (offs > start) {
> > +        /* D2: in hole, next data at offs */
> > +        *hole = start;
> > +        *data = offs;
> > +        return 0;
> > +    }
> > +
> > +    /* D1: in data, end not yet known */
> > +
> > +    /*
> > +     * SEEK_HOLE cases:
> > +     * H1. offs == start: start is in a hole
> > +     *     If this happens here, a hole has been dug behind our back
> > +     *     since the previous lseek().
> > +     * H2. offs > start: either start is in data, next hole at offs,
> > +     *                   or start is in trailing hole, EOF at offs
> > +     *     Linux treats trailing holes like any other hole: offs ==
> > +     *     start.  Solaris seeks to EOF instead: offs > start (blech).
> > +     *     If that happens here, a hole has been dug behind our back
> > +     *     since the previous lseek().
> > +     * H3. offs < 0, errno = ENXIO: start is beyond EOF
> > +     *     If this happens, the file has been truncated behind our
> > +     *     back since we opened it.  Treat it like a trailing hole.
> > +     * H4. offs < 0, errno != ENXIO: we learned nothing
> > +     *     Pretend we know nothing at all, i.e. "forget" about D1.
> > +     */
> > +    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> > +    if (offs < 0) {
> > +        return -errno;          /* D1 and (H3 or H4) */
> > +    }
> > +    assert(offs >= start);
> > +
> > +    if (offs > start) {
> > +        /*
> > +         * D1 and H2: either in data, next hole at offs, or it was in
> > +         * data but is now in a trailing hole.  In the latter case,
> > +         * all bets are off.  Treating it as if it there was data all
> > +         * the way to EOF is safe, so simply do that.
> > +         */
> > +        *data = start;
> > +        *hole = offs;
> > +        return 0;
> > +    }
> > +
> > +    /* D1 and H1 */
> > +    return -EBUSY;
> > +}
> > +
> > +/*
> > + * Returns the allocation status of the specified sectors.
> > + *
> > + * If 'sector_num' is beyond the end of the disk image the return value is 0
> > + * and 'pnum' is set to 0.
> > + *
> > + * 'pnum' is set to the number of sectors (including and immediately following
> > + * the specified sector) that are known to be in the same
> > + * allocated/unallocated state.
> > + *
> > + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> > + * beyond the end of the disk image it will be clamped.
> > + *
> > + * (Based on raw_co_get_block_status() from raw-posix.c.)
> > + */
> > +static int64_t coroutine_fn qemu_gluster_co_get_block_status(
> > +        BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
> > +        BlockDriverState **file)
> > +{
> > +    BDRVGlusterState *s = bs->opaque;
> > +    off_t start, data = 0, hole = 0;
> > +    int64_t total_size;
> > +    int ret = -EINVAL;
> > +
> > +    if (!s->fd) {
> > +        return ret;
> > +    }
> > +
> > +    start = sector_num * BDRV_SECTOR_SIZE;
> > +    total_size = bdrv_getlength(bs);
> > +    if (total_size < 0) {
> > +        return total_size;
> > +    } else if (start >= total_size) {
> > +        *pnum = 0;
> > +        return 0;
> > +    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
> > +        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> > +    }
> > +
> > +    ret = find_allocation(bs, start, &data, &hole);
> > +    if (ret == -ENXIO) {
> > +        /* Trailing hole */
> > +        *pnum = nb_sectors;
> > +        ret = BDRV_BLOCK_ZERO;
> > +    } else if (ret < 0) {
> > +        /* No info available, so pretend there are no holes */
> > +        *pnum = nb_sectors;
> > +        ret = BDRV_BLOCK_DATA;
> > +    } else if (data == start) {
> > +        /* On a data extent, compute sectors to the end of the extent,
> > +         * possibly including a partial sector at EOF. */
> > +        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
> > +        ret = BDRV_BLOCK_DATA;
> > +    } else {
> > +        /* On a hole, compute sectors to the beginning of the next extent.  */
> > +        assert(hole == start);
> > +        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> > +        ret = BDRV_BLOCK_ZERO;
> > +    }
> > +
> > +    *file = bs;
> > +
> > +    return ret | BDRV_BLOCK_OFFSET_VALID | start;
> > +}
> > +
> > +
> >  static QemuOptsList qemu_gluster_create_opts = {
> >      .name = "qemu-gluster-create-opts",
> >      .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
> > @@ -719,6 +897,7 @@ static BlockDriver bdrv_gluster = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > @@ -746,6 +925,7 @@ static BlockDriver bdrv_gluster_tcp = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > @@ -773,6 +953,7 @@ static BlockDriver bdrv_gluster_unix = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > @@ -800,6 +981,7 @@ static BlockDriver bdrv_gluster_rdma = {
> >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> >  #endif
> > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> >      .create_opts                  = &qemu_gluster_create_opts,
> >  };
> >  
> > -- 
> > 2.5.0
> > 
> 
> Thanks,
> 
> Applied to my block branch:
> 
> git://github.com/codyprime/qemu-kvm-jtc.git block


Correction: block-next (for 2.7), not block (for 2.6)

> 
> -Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-15 19:52                     ` Jeff Cody
@ 2016-03-16  4:08                       ` Niels de Vos
  0 siblings, 0 replies; 16+ messages in thread
From: Niels de Vos @ 2016-03-16  4:08 UTC (permalink / raw)
  To: Jeff Cody; +Cc: Kevin Wolf, qemu-devel, qemu-block, Prasanna Kumar Kalever

On Tue, Mar 15, 2016 at 03:52:02PM -0400, Jeff Cody wrote:
> On Tue, Mar 15, 2016 at 03:50:17PM -0400, Jeff Cody wrote:
> > On Thu, Mar 10, 2016 at 07:38:00PM +0100, Niels de Vos wrote:
> > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > it possible to detect sparse areas in files.
> > > 
> > > Signed-off-by: Niels de Vos <ndevos@redhat.com>
> > > 
> > > ---
> > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > build of the current master branch of glusterfs. Using a Fedora cloud
> > > image (in raw format) shows many SEEK procudure calls going back and
> > > forth over the network. The output of "qemu map" matches the output when
> > > run against the image on the local filesystem.
> > > 
> > > v2 based on feedback from Jeff Cody:
> > > - Replace compile time detection by runtime detection
> > > - Update return pointer (new argument) for .bdrv_co_get_block_status
> > > ---
> > >  block/gluster.c | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 182 insertions(+)
> > > 
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index 65077a0..a4f0628 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
> > >  typedef struct BDRVGlusterState {
> > >      struct glfs *glfs;
> > >      struct glfs_fd *fd;
> > > +    bool supports_seek_data;
> > >  } BDRVGlusterState;
> > >  
> > >  typedef struct GlusterConf {
> > > @@ -286,6 +287,28 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
> > >      }
> > >  }
> > >  
> > > +/*
> > > + * Do SEEK_DATA/HOLE to detect if it is functional. Older broken versions of
> > > + * gfapi incorrectly return the current offset when SEEK_DATA/HOLE is used.
> > > + * - Corrected versions return -1 and set errno to EINVAL.
> > > + * - Versions that support SEEK_DATA/HOLE correctly, will return -1 and set
> > > + *   errno to ENXIO when SEEK_DATA is called with a position of EOF.
> > > + */
> > > +static bool qemu_gluster_test_seek(struct glfs_fd *fd)
> > > +{
> > > +    off_t ret, eof;
> > > +
> > > +    eof = glfs_lseek(fd, 0, SEEK_END);
> > > +    if (eof < 0) {
> > > +        /* this should never occur */
> > > +        return false;
> > > +    }
> > > +
> > > +    /* this should always fail with ENXIO if SEEK_DATA is supported */
> > > +    ret = glfs_lseek(fd, eof, SEEK_DATA);
> > > +    return (ret < 0) && (errno == ENXIO);
> > > +}
> > > +
> > >  static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> > >                               int bdrv_flags, Error **errp)
> > >  {
> > > @@ -320,6 +343,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> > >          ret = -errno;
> > >      }
> > >  
> > > +    s->supports_seek_data = qemu_gluster_test_seek(s->fd);
> > > +
> > >  out:
> > >      qemu_opts_del(opts);
> > >      qemu_gluster_gconf_free(gconf);
> > > @@ -677,6 +702,159 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
> > >      return 0;
> > >  }
> > >  
> > > +/*
> > > + * Find allocation range in @bs around offset @start.
> > > + * May change underlying file descriptor's file offset.
> > > + * If @start is not in a hole, store @start in @data, and the
> > > + * beginning of the next hole in @hole, and return 0.
> > > + * If @start is in a non-trailing hole, store @start in @hole and the
> > > + * beginning of the next non-hole in @data, and return 0.
> > > + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> > > + * If we can't find out, return a negative errno other than -ENXIO.
> > > + *
> > > + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> > > + */
> > > +static int find_allocation(BlockDriverState *bs, off_t start,
> > > +                           off_t *data, off_t *hole)
> > > +{
> > > +    BDRVGlusterState *s = bs->opaque;
> > > +    off_t offs;
> > > +
> > > +    if (!s->supports_seek_data) {
> > > +        return -ENOTSUP;
> > > +    }
> > > +
> > > +    /*
> > > +     * SEEK_DATA cases:
> > > +     * D1. offs == start: start is in data
> > > +     * D2. offs > start: start is in a hole, next data at offs
> > > +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> > > +     *                              or start is beyond EOF
> > > +     *     If the latter happens, the file has been truncated behind
> > > +     *     our back since we opened it.  All bets are off then.
> > > +     *     Treating like a trailing hole is simplest.
> > > +     * D4. offs < 0, errno != ENXIO: we learned nothing
> > > +     */
> > > +    offs = glfs_lseek(s->fd, start, SEEK_DATA);
> > > +    if (offs < 0) {
> > > +        return -errno;          /* D3 or D4 */
> > > +    }
> > > +    assert(offs >= start);
> > > +
> > > +    if (offs > start) {
> > > +        /* D2: in hole, next data at offs */
> > > +        *hole = start;
> > > +        *data = offs;
> > > +        return 0;
> > > +    }
> > > +
> > > +    /* D1: in data, end not yet known */
> > > +
> > > +    /*
> > > +     * SEEK_HOLE cases:
> > > +     * H1. offs == start: start is in a hole
> > > +     *     If this happens here, a hole has been dug behind our back
> > > +     *     since the previous lseek().
> > > +     * H2. offs > start: either start is in data, next hole at offs,
> > > +     *                   or start is in trailing hole, EOF at offs
> > > +     *     Linux treats trailing holes like any other hole: offs ==
> > > +     *     start.  Solaris seeks to EOF instead: offs > start (blech).
> > > +     *     If that happens here, a hole has been dug behind our back
> > > +     *     since the previous lseek().
> > > +     * H3. offs < 0, errno = ENXIO: start is beyond EOF
> > > +     *     If this happens, the file has been truncated behind our
> > > +     *     back since we opened it.  Treat it like a trailing hole.
> > > +     * H4. offs < 0, errno != ENXIO: we learned nothing
> > > +     *     Pretend we know nothing at all, i.e. "forget" about D1.
> > > +     */
> > > +    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> > > +    if (offs < 0) {
> > > +        return -errno;          /* D1 and (H3 or H4) */
> > > +    }
> > > +    assert(offs >= start);
> > > +
> > > +    if (offs > start) {
> > > +        /*
> > > +         * D1 and H2: either in data, next hole at offs, or it was in
> > > +         * data but is now in a trailing hole.  In the latter case,
> > > +         * all bets are off.  Treating it as if it there was data all
> > > +         * the way to EOF is safe, so simply do that.
> > > +         */
> > > +        *data = start;
> > > +        *hole = offs;
> > > +        return 0;
> > > +    }
> > > +
> > > +    /* D1 and H1 */
> > > +    return -EBUSY;
> > > +}
> > > +
> > > +/*
> > > + * Returns the allocation status of the specified sectors.
> > > + *
> > > + * If 'sector_num' is beyond the end of the disk image the return value is 0
> > > + * and 'pnum' is set to 0.
> > > + *
> > > + * 'pnum' is set to the number of sectors (including and immediately following
> > > + * the specified sector) that are known to be in the same
> > > + * allocated/unallocated state.
> > > + *
> > > + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> > > + * beyond the end of the disk image it will be clamped.
> > > + *
> > > + * (Based on raw_co_get_block_status() from raw-posix.c.)
> > > + */
> > > +static int64_t coroutine_fn qemu_gluster_co_get_block_status(
> > > +        BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
> > > +        BlockDriverState **file)
> > > +{
> > > +    BDRVGlusterState *s = bs->opaque;
> > > +    off_t start, data = 0, hole = 0;
> > > +    int64_t total_size;
> > > +    int ret = -EINVAL;
> > > +
> > > +    if (!s->fd) {
> > > +        return ret;
> > > +    }
> > > +
> > > +    start = sector_num * BDRV_SECTOR_SIZE;
> > > +    total_size = bdrv_getlength(bs);
> > > +    if (total_size < 0) {
> > > +        return total_size;
> > > +    } else if (start >= total_size) {
> > > +        *pnum = 0;
> > > +        return 0;
> > > +    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
> > > +        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> > > +    }
> > > +
> > > +    ret = find_allocation(bs, start, &data, &hole);
> > > +    if (ret == -ENXIO) {
> > > +        /* Trailing hole */
> > > +        *pnum = nb_sectors;
> > > +        ret = BDRV_BLOCK_ZERO;
> > > +    } else if (ret < 0) {
> > > +        /* No info available, so pretend there are no holes */
> > > +        *pnum = nb_sectors;
> > > +        ret = BDRV_BLOCK_DATA;
> > > +    } else if (data == start) {
> > > +        /* On a data extent, compute sectors to the end of the extent,
> > > +         * possibly including a partial sector at EOF. */
> > > +        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
> > > +        ret = BDRV_BLOCK_DATA;
> > > +    } else {
> > > +        /* On a hole, compute sectors to the beginning of the next extent.  */
> > > +        assert(hole == start);
> > > +        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> > > +        ret = BDRV_BLOCK_ZERO;
> > > +    }
> > > +
> > > +    *file = bs;
> > > +
> > > +    return ret | BDRV_BLOCK_OFFSET_VALID | start;
> > > +}
> > > +
> > > +
> > >  static QemuOptsList qemu_gluster_create_opts = {
> > >      .name = "qemu-gluster-create-opts",
> > >      .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
> > > @@ -719,6 +897,7 @@ static BlockDriver bdrv_gluster = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > @@ -746,6 +925,7 @@ static BlockDriver bdrv_gluster_tcp = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > @@ -773,6 +953,7 @@ static BlockDriver bdrv_gluster_unix = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > @@ -800,6 +981,7 @@ static BlockDriver bdrv_gluster_rdma = {
> > >  #ifdef CONFIG_GLUSTERFS_ZEROFILL
> > >      .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
> > >  #endif
> > > +    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
> > >      .create_opts                  = &qemu_gluster_create_opts,
> > >  };
> > >  
> > > -- 
> > > 2.5.0
> > > 
> > 
> > Thanks,
> > 
> > Applied to my block branch:
> > 
> > git://github.com/codyprime/qemu-kvm-jtc.git block
> 
> 
> Correction: block-next (for 2.7), not block (for 2.6)

Ok, thanks!

Niels

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-03-07 19:14 ` [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE Eric Blake
@ 2016-10-06 22:09   ` Eric Blake
  2016-10-07  3:57     ` Jeff Cody
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Blake @ 2016-10-06 22:09 UTC (permalink / raw)
  To: Niels de Vos, qemu-block; +Cc: qemu-devel@nongnu.org, Prasanna Kumar Kalever

[-- Attachment #1: Type: text/plain, Size: 2900 bytes --]

On 03/07/2016 01:14 PM, Eric Blake wrote:
> [adding qemu-devel; ALL patches must cc qemu-devel even when sent to
> another list]
> 
> On 03/07/2016 11:04 AM, Niels de Vos wrote:
>> GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
>> it possible to detect sparse areas in files.
>>
>> Signed-off-by: Niels de Vos <ndevos@redhat.com>
>>
>> --
>> Tested by compiling and running "qemu-img map gluster://..." with a
>> build of the current master branch of glusterfs. Using a Fedora
>> cloud image (in raw format) shows many SEEK procudure calls going back
>> and forth over the network. The output of "qemu map" matches the output
>> when run against the image on the local filesystem.
>> ---

I hit a weird failure when trying to compile this on an older RHEL 6
box, where /usr/include/unistd.h is too old to include SEEK_DATA and
SEEK_HOLE:

block/gluster.c: In function ‘qemu_gluster_test_seek’:
block/gluster.c:684: error: ‘SEEK_DATA’ undeclared (first use in this
function)
block/gluster.c:684: error: (Each undeclared identifier is reported only
once
block/gluster.c:684: error: for each function it appears in.)
block/gluster.c: In function ‘find_allocation’:
block/gluster.c:1202: error: ‘SEEK_DATA’ undeclared (first use in this
function)
block/gluster.c:1234: error: ‘SEEK_HOLE’ undeclared (first use in this
function)

The patch has been in place for several months (which shows how seldom I
compile on that particular box), but it makes me wonder why none of the
autobuilders have hit this failure.  But since the code mentions that it
shamelessly copies from raw-posix.c, and that file in turn has #ifdef
guards to only do SEEK_HOLE optimizations if the system headers defined
SEEK_HOLE in the first place, it sounds like you need to do a followup
patch along those lines.


>> + *
>> + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
>> + */
>> +static int find_allocation(BlockDriverState *bs, off_t start,
>> +                           off_t *data, off_t *hole)
>> +{
>> +    BDRVGlusterState *s = bs->opaque;
>> +    off_t offs;
>> +
>> +    /*
>> +     * SEEK_DATA cases:
>> +     * D1. offs == start: start is in data
>> +     * D2. offs > start: start is in a hole, next data at offs
>> +     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
>> +     *                              or start is beyond EOF
>> +     *     If the latter happens, the file has been truncated behind
>> +     *     our back since we opened it.  All bets are off then.
>> +     *     Treating like a trailing hole is simplest.
>> +     * D4. offs < 0, errno != ENXIO: we learned nothing
>> +     */
>> +    offs = glfs_lseek(s->fd, start, SEEK_DATA);


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE
  2016-10-06 22:09   ` Eric Blake
@ 2016-10-07  3:57     ` Jeff Cody
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Cody @ 2016-10-07  3:57 UTC (permalink / raw)
  To: Eric Blake
  Cc: Niels de Vos, qemu-block, qemu-devel@nongnu.org,
	Prasanna Kumar Kalever

On Thu, Oct 06, 2016 at 05:09:59PM -0500, Eric Blake wrote:
> On 03/07/2016 01:14 PM, Eric Blake wrote:
> > [adding qemu-devel; ALL patches must cc qemu-devel even when sent to
> > another list]
> > 
> > On 03/07/2016 11:04 AM, Niels de Vos wrote:
> >> GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> >> it possible to detect sparse areas in files.
> >>
> >> Signed-off-by: Niels de Vos <ndevos@redhat.com>
> >>
> >> --
> >> Tested by compiling and running "qemu-img map gluster://..." with a
> >> build of the current master branch of glusterfs. Using a Fedora
> >> cloud image (in raw format) shows many SEEK procudure calls going back
> >> and forth over the network. The output of "qemu map" matches the output
> >> when run against the image on the local filesystem.
> >> ---
> 
> I hit a weird failure when trying to compile this on an older RHEL 6
> box, where /usr/include/unistd.h is too old to include SEEK_DATA and
> SEEK_HOLE:
> 
> block/gluster.c: In function ‘qemu_gluster_test_seek’:
> block/gluster.c:684: error: ‘SEEK_DATA’ undeclared (first use in this
> function)
> block/gluster.c:684: error: (Each undeclared identifier is reported only
> once
> block/gluster.c:684: error: for each function it appears in.)
> block/gluster.c: In function ‘find_allocation’:
> block/gluster.c:1202: error: ‘SEEK_DATA’ undeclared (first use in this
> function)
> block/gluster.c:1234: error: ‘SEEK_HOLE’ undeclared (first use in this
> function)
> 
> The patch has been in place for several months (which shows how seldom I
> compile on that particular box), but it makes me wonder why none of the
> autobuilders have hit this failure.  But since the code mentions that it
> shamelessly copies from raw-posix.c, and that file in turn has #ifdef
> guards to only do SEEK_HOLE optimizations if the system headers defined
> SEEK_HOLE in the first place, it sounds like you need to do a followup
> patch along those lines.
> 
> 

Ooof.

I just sent a patch for this, but I haven't been able to test it yet.  I'm
in the process of doing that, but since you have a build env already set up
to do it, would you mind trying the patch (just for compilation on RHEL6)?

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-10-07  3:57 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1457373855-8072-1-git-send-email-ndevos@redhat.com>
2016-03-07 19:14 ` [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE Eric Blake
2016-10-06 22:09   ` Eric Blake
2016-10-07  3:57     ` Jeff Cody
     [not found] ` <20160307182738.GA14127@localhost.localdomain>
2016-03-08  4:21   ` Niels de Vos
2016-03-08 12:33     ` Jeff Cody
2016-03-08 13:14       ` Niels de Vos
2016-03-09 12:30         ` [Qemu-devel] [PATCH v2] " Niels de Vos
2016-03-09 15:46           ` Jeff Cody
2016-03-09 18:12             ` Niels de Vos
2016-03-09 22:19               ` Jeff Cody
2016-03-10 18:38                 ` [Qemu-devel] [PATCH v3] " Niels de Vos
2016-03-15 19:50                   ` Jeff Cody
2016-03-15 19:52                     ` Jeff Cody
2016-03-16  4:08                       ` Niels de Vos
2016-03-08 12:53     ` [Qemu-devel] [Qemu-block] [PATCH] " Kevin Wolf
2016-03-08 13:19       ` Niels de Vos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).