qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Niels de Vos <ndevos@redhat.com>
To: qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Jeff Cody <jcody@redhat.com>,
	Prasanna Kumar Kalever <pkalever@redhat.com>,
	Niels de Vos <ndevos@redhat.com>
Subject: [Qemu-devel] [PATCH v2] block/gluster: add support for SEEK_DATA/SEEK_HOLE
Date: Wed,  9 Mar 2016 13:30:14 +0100	[thread overview]
Message-ID: <1457526614-15596-1-git-send-email-ndevos@redhat.com> (raw)
In-Reply-To: <20160308131428.GE18666@ndevos-x240.usersys.redhat.com>

GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
it possible to detect sparse areas in files.

Signed-off-by: Niels de Vos <ndevos@redhat.com>

---
Tested by compiling and running "qemu-img map gluster://..." with a
build of the current master branch of glusterfs. Using a Fedora cloud
image (in raw format) shows many SEEK procudure calls going back and
forth over the network. The output of "qemu map" matches the output when
run against the image on the local filesystem.

v2 based on feedback from Jeff Cody:
- Replace compile time detection by runtime detection
- Update return pointer (new argument) for .bdrv_co_get_block_status
---
 block/gluster.c | 186 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 186 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 65077a0..b01ab52 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
 typedef struct BDRVGlusterState {
     struct glfs *glfs;
     struct glfs_fd *fd;
+    bool supports_seek_data;
 } BDRVGlusterState;
 
 typedef struct GlusterConf {
@@ -286,6 +287,33 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
     }
 }
 
+/*
+ * Do SEEK_DATA/HOLE to detect if it is functional. In order to be usable, it
+ * should return different values for the start of data and the start of a
+ * hole. There are three different cases to handle:
+ *
+ *  - the same position is returned for data/hole (indicates broken gfapi)
+ *  - an error is returned:
+ *     - ENXIO only gets returned if there is valid support on client+server
+ *     - EINVAL is returned when gfapi or the server does not support it
+ */
+static bool qemu_gluster_test_seek(struct glfs_fd *fd)
+{
+    off_t start_data, start_hole;
+    bool supports_seek_data = false;
+
+    start_data = glfs_lseek(fd, 0, SEEK_DATA);
+    if (start_data != -1) {
+        start_hole = glfs_lseek(fd, 0, SEEK_HOLE);
+        if (start_hole != -1)
+            supports_seek_data = !(start_data == start_hole);
+    } else if (errno == ENXIO) {
+        supports_seek_data = true;
+    }
+
+    return supports_seek_data;
+}
+
 static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
                              int bdrv_flags, Error **errp)
 {
@@ -320,6 +348,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
         ret = -errno;
     }
 
+    s->supports_seek_data = qemu_gluster_test_seek(s->fd);
+
 out:
     qemu_opts_del(opts);
     qemu_gluster_gconf_free(gconf);
@@ -677,6 +707,158 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
     return 0;
 }
 
+/*
+ * Find allocation range in @bs around offset @start.
+ * May change underlying file descriptor's file offset.
+ * If @start is not in a hole, store @start in @data, and the
+ * beginning of the next hole in @hole, and return 0.
+ * If @start is in a non-trailing hole, store @start in @hole and the
+ * beginning of the next non-hole in @data, and return 0.
+ * If @start is in a trailing hole or beyond EOF, return -ENXIO.
+ * If we can't find out, return a negative errno other than -ENXIO.
+ *
+ * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
+ */
+static int find_allocation(BlockDriverState *bs, off_t start,
+                           off_t *data, off_t *hole)
+{
+    BDRVGlusterState *s = bs->opaque;
+    off_t offs;
+
+    if (!s->supports_seek_data)
+        return -EINVAL;
+
+    /*
+     * SEEK_DATA cases:
+     * D1. offs == start: start is in data
+     * D2. offs > start: start is in a hole, next data at offs
+     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
+     *                              or start is beyond EOF
+     *     If the latter happens, the file has been truncated behind
+     *     our back since we opened it.  All bets are off then.
+     *     Treating like a trailing hole is simplest.
+     * D4. offs < 0, errno != ENXIO: we learned nothing
+     */
+    offs = glfs_lseek(s->fd, start, SEEK_DATA);
+    if (offs < 0) {
+        return -errno;          /* D3 or D4 */
+    }
+    assert(offs >= start);
+
+    if (offs > start) {
+        /* D2: in hole, next data at offs */
+        *hole = start;
+        *data = offs;
+        return 0;
+    }
+
+    /* D1: in data, end not yet known */
+
+    /*
+     * SEEK_HOLE cases:
+     * H1. offs == start: start is in a hole
+     *     If this happens here, a hole has been dug behind our back
+     *     since the previous lseek().
+     * H2. offs > start: either start is in data, next hole at offs,
+     *                   or start is in trailing hole, EOF at offs
+     *     Linux treats trailing holes like any other hole: offs ==
+     *     start.  Solaris seeks to EOF instead: offs > start (blech).
+     *     If that happens here, a hole has been dug behind our back
+     *     since the previous lseek().
+     * H3. offs < 0, errno = ENXIO: start is beyond EOF
+     *     If this happens, the file has been truncated behind our
+     *     back since we opened it.  Treat it like a trailing hole.
+     * H4. offs < 0, errno != ENXIO: we learned nothing
+     *     Pretend we know nothing at all, i.e. "forget" about D1.
+     */
+    offs = glfs_lseek(s->fd, start, SEEK_HOLE);
+    if (offs < 0) {
+        return -errno;          /* D1 and (H3 or H4) */
+    }
+    assert(offs >= start);
+
+    if (offs > start) {
+        /*
+         * D1 and H2: either in data, next hole at offs, or it was in
+         * data but is now in a trailing hole.  In the latter case,
+         * all bets are off.  Treating it as if it there was data all
+         * the way to EOF is safe, so simply do that.
+         */
+        *data = start;
+        *hole = offs;
+        return 0;
+    }
+
+    /* D1 and H1 */
+    return -EBUSY;
+}
+
+/*
+ * Returns the allocation status of the specified sectors.
+ *
+ * If 'sector_num' is beyond the end of the disk image the return value is 0
+ * and 'pnum' is set to 0.
+ *
+ * 'pnum' is set to the number of sectors (including and immediately following
+ * the specified sector) that are known to be in the same
+ * allocated/unallocated state.
+ *
+ * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * beyond the end of the disk image it will be clamped.
+ *
+ * (Based on raw_co_get_block_status() from raw-posix.c.)
+ */
+static int64_t coroutine_fn qemu_gluster_co_get_block_status(
+	BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
+        BlockDriverState **file)
+{
+    BDRVGlusterState *s = bs->opaque;
+    off_t start, data = 0, hole = 0;
+    int64_t total_size;
+    int ret = -EINVAL;
+
+    if (!s->fd) {
+        return ret;
+    }
+
+    start = sector_num * BDRV_SECTOR_SIZE;
+    total_size = bdrv_getlength(bs);
+    if (total_size < 0) {
+        return total_size;
+    } else if (start >= total_size) {
+        *pnum = 0;
+        return 0;
+    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
+        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+    }
+
+    ret = find_allocation(bs, start, &data, &hole);
+    if (ret == -ENXIO) {
+        /* Trailing hole */
+        *pnum = nb_sectors;
+        ret = BDRV_BLOCK_ZERO;
+    } else if (ret < 0) {
+        /* No info available, so pretend there are no holes */
+        *pnum = nb_sectors;
+        ret = BDRV_BLOCK_DATA;
+    } else if (data == start) {
+        /* On a data extent, compute sectors to the end of the extent,
+         * possibly including a partial sector at EOF. */
+        *pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+        ret = BDRV_BLOCK_DATA;
+    } else {
+        /* On a hole, compute sectors to the beginning of the next extent.  */
+        assert(hole == start);
+        *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+        ret = BDRV_BLOCK_ZERO;
+    }
+
+    *file = bs;
+
+    return ret | BDRV_BLOCK_OFFSET_VALID | start;
+}
+
+
 static QemuOptsList qemu_gluster_create_opts = {
     .name = "qemu-gluster-create-opts",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
@@ -719,6 +901,7 @@ static BlockDriver bdrv_gluster = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
@@ -746,6 +929,7 @@ static BlockDriver bdrv_gluster_tcp = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
@@ -773,6 +957,7 @@ static BlockDriver bdrv_gluster_unix = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
@@ -800,6 +985,7 @@ static BlockDriver bdrv_gluster_rdma = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
     .bdrv_co_write_zeroes         = qemu_gluster_co_write_zeroes,
 #endif
+    .bdrv_co_get_block_status     = qemu_gluster_co_get_block_status,
     .create_opts                  = &qemu_gluster_create_opts,
 };
 
-- 
2.5.0

  reply	other threads:[~2016-03-09 12:31 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1457373855-8072-1-git-send-email-ndevos@redhat.com>
2016-03-07 19:14 ` [Qemu-devel] [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE Eric Blake
2016-10-06 22:09   ` Eric Blake
2016-10-07  3:57     ` Jeff Cody
     [not found] ` <20160307182738.GA14127@localhost.localdomain>
2016-03-08  4:21   ` Niels de Vos
2016-03-08 12:33     ` Jeff Cody
2016-03-08 13:14       ` Niels de Vos
2016-03-09 12:30         ` Niels de Vos [this message]
2016-03-09 15:46           ` [Qemu-devel] [PATCH v2] " Jeff Cody
2016-03-09 18:12             ` Niels de Vos
2016-03-09 22:19               ` Jeff Cody
2016-03-10 18:38                 ` [Qemu-devel] [PATCH v3] " Niels de Vos
2016-03-15 19:50                   ` Jeff Cody
2016-03-15 19:52                     ` Jeff Cody
2016-03-16  4:08                       ` Niels de Vos
2016-03-08 12:53     ` [Qemu-devel] [Qemu-block] [PATCH] " Kevin Wolf
2016-03-08 13:19       ` Niels de Vos

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1457526614-15596-1-git-send-email-ndevos@redhat.com \
    --to=ndevos@redhat.com \
    --cc=jcody@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pkalever@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).