qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE
@ 2014-11-17 10:18 Markus Armbruster
  2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 1/3] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Markus Armbruster @ 2014-11-17 10:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini

PATCH 1 is just a comment fix.  PATCH 2 drops FIEMAP use and explains
why it needs to go.  PATCH 3 carefully rewrites the SEEK_HOLE code.

Why 2.2?  The series fixes bugs, but the bugs are either not terribly
severe, or not particularly likely to bite.  The reason I want it
included is we've already changed the code fundamentally in 2.2, from

    Incorrect use of FIEMAP if available, else somewhat incorrect use
    of SEEK_HOLE if available, else pretend no holes

to

    Somewhat incorrect use of SEEK_HOLE if available, else correct but
    slow-as-molasses use of FIEMAP if available, else pretend no holes

Let's finish the job:

    Correct use of SEEK_HOLE if available, else pretend no holes

Less complex, more robust, and no half-broken intermediate version
released.

v3:
* PATCH 1&2 unchanged, except for a commit message typo [Eric]
* PATCH 3&4 redone as PATCH 3, very carefully [Eric, Kevin]

v2:
* PATCH 1 unchanged
* PATCH 2 revised and split up [Paolo, Fam, Eric, Max]

Markus Armbruster (3):
  raw-posix: Fix comment for raw_co_get_block_status()
  raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  raw-posix: The SEEK_HOLE code is flawed, rewrite it

 block/raw-posix.c | 167 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 86 insertions(+), 81 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 for-2.2 1/3] raw-posix: Fix comment for raw_co_get_block_status()
  2014-11-17 10:18 [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Markus Armbruster
@ 2014-11-17 10:18 ` Markus Armbruster
  2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Markus Armbruster @ 2014-11-17 10:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini

Missed in commit 705be72.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/raw-posix.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e100ae2..706d3c0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1555,9 +1555,7 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
 }
 
 /*
- * Returns true iff the specified sector is present in the disk image. Drivers
- * not implementing the functionality are assumed to not support backing files,
- * hence all their sectors are reported as allocated.
+ * Returns the allocation status of the specified sectors.
  *
  * If 'sector_num' is beyond the end of the disk image the return value is 0
  * and 'pnum' is set to 0.
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-17 10:18 [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Markus Armbruster
  2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 1/3] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
@ 2014-11-17 10:18 ` Markus Armbruster
  2014-11-17 10:30   ` Max Reitz
  2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it Markus Armbruster
  2014-11-18  8:48 ` [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Max Reitz
  3 siblings, 1 reply; 12+ messages in thread
From: Markus Armbruster @ 2014-11-17 10:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini

Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
follows:

1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl

2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()

3. Else pretend there are no holes

Later on, raw_co_is_allocated() was generalized to
raw_co_get_block_status().

Commit 4f11aa8 (May 2014) changed it to try the three methods in order
until success, because "there may be implementations which support
[SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
versa."

Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.

As you see, the obvious use of FIEMAP is wrong, and the correct use is
slow.  I guess this puts it somewhere between -7 "The obvious use is
wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
to Misuse scale[*].

"Fortunately", the FIEMAP code is used only when

* SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is

  Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
  was introduced for ext4 and btrfs) and 2012.

* SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails

  Unlikely.

Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
bugs.  Worse, bugs hiding there can theoretically bite even on a host
that has SEEK_HOLE/SEEK_DATA.

I don't want to worry about this crap, not even theoretically.  Get
rid of it.

[*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/raw-posix.c | 60 ++++---------------------------------------------------
 1 file changed, 4 insertions(+), 56 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 706d3c0..fd80d84 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,9 +60,6 @@
 #define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
 #endif
 #endif
-#ifdef CONFIG_FIEMAP
-#include <linux/fiemap.h>
-#endif
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
 #include <linux/falloc.h>
 #endif
@@ -1481,52 +1478,6 @@ out:
     return result;
 }
 
-static int try_fiemap(BlockDriverState *bs, off_t start, off_t *data,
-                      off_t *hole, int nb_sectors)
-{
-#ifdef CONFIG_FIEMAP
-    BDRVRawState *s = bs->opaque;
-    int ret = 0;
-    struct {
-        struct fiemap fm;
-        struct fiemap_extent fe;
-    } f;
-
-    if (s->skip_fiemap) {
-        return -ENOTSUP;
-    }
-
-    f.fm.fm_start = start;
-    f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
-    f.fm.fm_flags = FIEMAP_FLAG_SYNC;
-    f.fm.fm_extent_count = 1;
-    f.fm.fm_reserved = 0;
-    if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
-        s->skip_fiemap = true;
-        return -errno;
-    }
-
-    if (f.fm.fm_mapped_extents == 0) {
-        /* No extents found, data is beyond f.fm.fm_start + f.fm.fm_length.
-         * f.fm.fm_start + f.fm.fm_length must be clamped to the file size!
-         */
-        off_t length = lseek(s->fd, 0, SEEK_END);
-        *hole = f.fm.fm_start;
-        *data = MIN(f.fm.fm_start + f.fm.fm_length, length);
-    } else {
-        *data = f.fe.fe_logical;
-        *hole = f.fe.fe_logical + f.fe.fe_length;
-        if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
-            ret |= BDRV_BLOCK_ZERO;
-        }
-    }
-
-    return ret;
-#else
-    return -ENOTSUP;
-#endif
-}
-
 static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
                          off_t *hole)
 {
@@ -1593,13 +1544,10 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
 
     ret = try_seek_hole(bs, start, &data, &hole);
     if (ret < 0) {
-        ret = try_fiemap(bs, start, &data, &hole, nb_sectors);
-        if (ret < 0) {
-            /* Assume everything is allocated. */
-            data = 0;
-            hole = start + nb_sectors * BDRV_SECTOR_SIZE;
-            ret = 0;
-        }
+        /* Assume everything is allocated. */
+        data = 0;
+        hole = start + nb_sectors * BDRV_SECTOR_SIZE;
+        ret = 0;
     }
 
     assert(ret >= 0);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it
  2014-11-17 10:18 [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Markus Armbruster
  2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 1/3] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
  2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
@ 2014-11-17 10:18 ` Markus Armbruster
  2014-11-17 10:33   ` Max Reitz
  2014-11-17 16:43   ` Eric Blake
  2014-11-18  8:48 ` [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Max Reitz
  3 siblings, 2 replies; 12+ messages in thread
From: Markus Armbruster @ 2014-11-17 10:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini

On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.

Additionally, unlikely lseek() failures are treated badly:

* When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
  -ENXIO, there's in fact a trailing hole.  Can happen only when
  something truncated the file since we opened it.

* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
  then try_seek_hole() reports a trailing hole.  This is okay only
  when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
  found by SEEK_HOLE has since become trailing somehow).  For other
  failures (unlikely), it's wrong.

* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
  then try_seek_hole() reports bogus data [-1,start), which its caller
  raw_co_get_block_status() turns into zero sectors of data.  Could
  theoretically lead to infinite loops in code that attempts to scan
  data vs. hole forward.

Rewrite from scratch, with very careful comments.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 block/raw-posix.c | 111 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 85 insertions(+), 26 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index fd80d84..0b5d5a8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1478,28 +1478,86 @@ out:
     return result;
 }
 
-static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
-                         off_t *hole)
+/*
+ * Find allocation range in @bs around offset @start.
+ * May change underlying file descriptor's file offset.
+ * If @start is not in a hole, store @start in @data, and the
+ * beginning of the next hole in @hole, and return 0.
+ * If @start is in a non-trailing hole, store @start in @hole and the
+ * beginning of the next non-hole in @data, and return 0.
+ * If @start is in a trailing hole or beyond EOF, return -ENXIO.
+ * If we can't find out, return a negative errno other than -ENXIO.
+ */
+static int find_allocation(BlockDriverState *bs, off_t start,
+                           off_t *data, off_t *hole)
 {
 #if defined SEEK_HOLE && defined SEEK_DATA
     BDRVRawState *s = bs->opaque;
+    off_t offs;
 
-    *hole = lseek(s->fd, start, SEEK_HOLE);
-    if (*hole == -1) {
-        return -errno;
+    /*
+     * SEEK_DATA cases:
+     * D1. offs == start: start is in data
+     * D2. offs > start: start is in a hole, next data at offs
+     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
+     *                              or start is beyond EOF
+     *     If the latter happens, the file has been truncated behind
+     *     our back since we opened it.  All bets are off then.
+     *     Treating like a trailing hole is simplest.
+     * D4. offs < 0, errno != ENXIO: we learned nothing
+     */
+    offs = lseek(s->fd, start, SEEK_DATA);
+    if (offs < 0) {
+        return -errno;          /* D3 or D4 */
     }
+    assert(offs >= start);
 
-    if (*hole > start) {
+    if (offs > start) {
+        /* D2: in hole, next data at offs */
+        *hole = start;
+        *data = offs;
+        return 0;
+    }
+
+    /* D1: in data, end not yet known */
+
+    /*
+     * SEEK_HOLE cases:
+     * H1. offs == start: start is in a hole
+     *     If this happens here, a hole has been dug behind our back
+     *     since the previous lseek().
+     * H2. offs > start: either start is in data, next hole at offs,
+     *                   or start is in trailing hole, EOF at offs
+     *     Linux treats trailing holes like any other hole: offs ==
+     *     start.  Solaris seeks to EOF instead: offs > start (blech).
+     *     If that happens here, a hole has been dug behind our back
+     *     since the previous lseek().
+     * H3. offs < 0, errno = ENXIO: start is beyond EOF
+     *     If this happens, the file has been truncated behind our
+     *     back since we opened it.  Treat it like a trailing hole.
+     * H4. offs < 0, errno != ENXIO: we learned nothing
+     *     Pretend we know nothing at all, i.e. "forget" about D1.
+     */
+    offs = lseek(s->fd, start, SEEK_HOLE);
+    if (offs < 0) {
+        return -errno;          /* D1 and (H3 or H4) */
+    }
+    assert(offs >= start);
+
+    if (offs > start) {
+        /*
+         * D1 and H2: either in data, next hole at offs, or it was in
+         * data but is now in a trailing hole.  In the latter case,
+         * all bets are off.  Treating it as if it there was data all
+         * the way to EOF is safe, so simply do that.
+         */
         *data = start;
-    } else {
-        /* On a hole.  We need another syscall to find its end.  */
-        *data = lseek(s->fd, start, SEEK_DATA);
-        if (*data == -1) {
-            *data = lseek(s->fd, 0, SEEK_END);
-        }
+        *hole = offs;
+        return 0;
     }
 
-    return 0;
+    /* D1 and H1 */
+    return -EBUSY;
 #else
     return -ENOTSUP;
 #endif
@@ -1542,25 +1600,26 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
         nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
     }
 
-    ret = try_seek_hole(bs, start, &data, &hole);
-    if (ret < 0) {
-        /* Assume everything is allocated. */
-        data = 0;
-        hole = start + nb_sectors * BDRV_SECTOR_SIZE;
-        ret = 0;
-    }
-
-    assert(ret >= 0);
-
-    if (data <= start) {
+    ret = find_allocation(bs, start, &data, &hole);
+    if (ret == -ENXIO) {
+        /* Trailing hole */
+        *pnum = nb_sectors;
+        ret = BDRV_BLOCK_ZERO;
+    } else if (ret < 0) {
+        /* No info available, so pretend there are no holes */
+        *pnum = nb_sectors;
+        ret = BDRV_BLOCK_DATA;
+    } else if (data == start) {
         /* On a data extent, compute sectors to the end of the extent.  */
         *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
-        return ret | BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+        ret = BDRV_BLOCK_DATA;
     } else {
         /* On a hole, compute sectors to the beginning of the next extent.  */
+        assert(hole == start);
         *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
-        return ret | BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID | start;
+        ret = BDRV_BLOCK_ZERO;
     }
+    return ret | BDRV_BLOCK_OFFSET_VALID | start;
 }
 
 static coroutine_fn BlockAIOCB *raw_aio_discard(BlockDriverState *bs,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
@ 2014-11-17 10:30   ` Max Reitz
  2014-11-17 10:58     ` Markus Armbruster
  0 siblings, 1 reply; 12+ messages in thread
From: Max Reitz @ 2014-11-17 10:30 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: kwolf, famz, tony, stefanha, pbonzini

On 2014-11-17 at 11:18, Markus Armbruster wrote:
> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
> follows:
>
> 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
>
> 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
>
> 3. Else pretend there are no holes
>
> Later on, raw_co_is_allocated() was generalized to
> raw_co_get_block_status().
>
> Commit 4f11aa8 (May 2014) changed it to try the three methods in order
> until success, because "there may be implementations which support
> [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
> versa."
>
> Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
> Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
> speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.
>
> As you see, the obvious use of FIEMAP is wrong, and the correct use is
> slow.  I guess this puts it somewhere between -7 "The obvious use is
> wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
> to Misuse scale[*].
>
> "Fortunately", the FIEMAP code is used only when
>
> * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
>
>    Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
>    was introduced for ext4 and btrfs) and 2012.
>
> * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
>
>    Unlikely.
>
> Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
> bugs.  Worse, bugs hiding there can theoretically bite even on a host
> that has SEEK_HOLE/SEEK_DATA.
>
> I don't want to worry about this crap, not even theoretically.  Get
> rid of it.
>
> [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>   block/raw-posix.c | 60 ++++---------------------------------------------------
>   1 file changed, 4 insertions(+), 56 deletions(-)

I only just now realized that you're not getting rid of FIEMAP 
completely; there's still the skip_fiemap flag in BDRVRawState. I don't 
care for now, though, thus my R-b stands.

Max

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it
  2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it Markus Armbruster
@ 2014-11-17 10:33   ` Max Reitz
  2014-11-17 16:43   ` Eric Blake
  1 sibling, 0 replies; 12+ messages in thread
From: Max Reitz @ 2014-11-17 10:33 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: kwolf, famz, tony, stefanha, pbonzini

On 2014-11-17 at 11:18, Markus Armbruster wrote:
> On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
> but not Linux), try_seek_hole() reports trailing data instead.
>
> Additionally, unlikely lseek() failures are treated badly:
>
> * When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
>    -ENXIO, there's in fact a trailing hole.  Can happen only when
>    something truncated the file since we opened it.
>
> * When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
>    then try_seek_hole() reports a trailing hole.  This is okay only
>    when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
>    found by SEEK_HOLE has since become trailing somehow).  For other
>    failures (unlikely), it's wrong.
>
> * When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
>    then try_seek_hole() reports bogus data [-1,start), which its caller
>    raw_co_get_block_status() turns into zero sectors of data.  Could
>    theoretically lead to infinite loops in code that attempts to scan
>    data vs. hole forward.
>
> Rewrite from scratch, with very careful comments.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   block/raw-posix.c | 111 +++++++++++++++++++++++++++++++++++++++++-------------
>   1 file changed, 85 insertions(+), 26 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-17 10:30   ` Max Reitz
@ 2014-11-17 10:58     ` Markus Armbruster
  2014-11-17 10:59       ` Max Reitz
  2014-11-17 16:31       ` Eric Blake
  0 siblings, 2 replies; 12+ messages in thread
From: Markus Armbruster @ 2014-11-17 10:58 UTC (permalink / raw)
  To: Max Reitz; +Cc: kwolf, famz, qemu-devel, tony, stefanha, pbonzini

Max Reitz <mreitz@redhat.com> writes:

> On 2014-11-17 at 11:18, Markus Armbruster wrote:
>> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
>> follows:
>>
>> 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
>>
>> 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
>>
>> 3. Else pretend there are no holes
>>
>> Later on, raw_co_is_allocated() was generalized to
>> raw_co_get_block_status().
>>
>> Commit 4f11aa8 (May 2014) changed it to try the three methods in order
>> until success, because "there may be implementations which support
>> [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
>> versa."
>>
>> Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
>> Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
>> speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.
>>
>> As you see, the obvious use of FIEMAP is wrong, and the correct use is
>> slow.  I guess this puts it somewhere between -7 "The obvious use is
>> wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
>> to Misuse scale[*].
>>
>> "Fortunately", the FIEMAP code is used only when
>>
>> * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
>>
>>    Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
>>    was introduced for ext4 and btrfs) and 2012.
>>
>> * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
>>
>>    Unlikely.
>>
>> Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
>> bugs.  Worse, bugs hiding there can theoretically bite even on a host
>> that has SEEK_HOLE/SEEK_DATA.
>>
>> I don't want to worry about this crap, not even theoretically.  Get
>> rid of it.
>>
>> [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> ---
>>   block/raw-posix.c | 60 ++++---------------------------------------------------
>>   1 file changed, 4 insertions(+), 56 deletions(-)
>
> I only just now realized that you're not getting rid of FIEMAP
> completely; there's still the skip_fiemap flag in BDRVRawState. I
> don't care for now, though, thus my R-b stands.

You're right!  The appended patch should be squashed in, either on
commit, or in a respin.


diff --git a/block/raw-posix.c b/block/raw-posix.c
index 0b5d5a8..414e6d1 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -148,9 +148,6 @@ typedef struct BDRVRawState {
     bool has_write_zeroes:1;
     bool discard_zeroes:1;
     bool needs_alignment;
-#ifdef CONFIG_FIEMAP
-    bool skip_fiemap;
-#endif
 } BDRVRawState;
 
 typedef struct BDRVRawReopenState {

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-17 10:58     ` Markus Armbruster
@ 2014-11-17 10:59       ` Max Reitz
  2014-11-17 16:31       ` Eric Blake
  1 sibling, 0 replies; 12+ messages in thread
From: Max Reitz @ 2014-11-17 10:59 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: kwolf, famz, qemu-devel, tony, stefanha, pbonzini

On 2014-11-17 at 11:58, Markus Armbruster wrote:
> Max Reitz <mreitz@redhat.com> writes:
>
>> On 2014-11-17 at 11:18, Markus Armbruster wrote:
>>> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
>>> follows:
>>>
>>> 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
>>>
>>> 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
>>>
>>> 3. Else pretend there are no holes
>>>
>>> Later on, raw_co_is_allocated() was generalized to
>>> raw_co_get_block_status().
>>>
>>> Commit 4f11aa8 (May 2014) changed it to try the three methods in order
>>> until success, because "there may be implementations which support
>>> [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
>>> versa."
>>>
>>> Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
>>> Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
>>> speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.
>>>
>>> As you see, the obvious use of FIEMAP is wrong, and the correct use is
>>> slow.  I guess this puts it somewhere between -7 "The obvious use is
>>> wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
>>> to Misuse scale[*].
>>>
>>> "Fortunately", the FIEMAP code is used only when
>>>
>>> * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
>>>
>>>     Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
>>>     was introduced for ext4 and btrfs) and 2012.
>>>
>>> * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
>>>
>>>     Unlikely.
>>>
>>> Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
>>> bugs.  Worse, bugs hiding there can theoretically bite even on a host
>>> that has SEEK_HOLE/SEEK_DATA.
>>>
>>> I don't want to worry about this crap, not even theoretically.  Get
>>> rid of it.
>>>
>>> [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> ---
>>>    block/raw-posix.c | 60 ++++---------------------------------------------------
>>>    1 file changed, 4 insertions(+), 56 deletions(-)
>> I only just now realized that you're not getting rid of FIEMAP
>> completely; there's still the skip_fiemap flag in BDRVRawState. I
>> don't care for now, though, thus my R-b stands.
> You're right!  The appended patch should be squashed in, either on
> commit, or in a respin.
>
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 0b5d5a8..414e6d1 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -148,9 +148,6 @@ typedef struct BDRVRawState {
>       bool has_write_zeroes:1;
>       bool discard_zeroes:1;
>       bool needs_alignment;
> -#ifdef CONFIG_FIEMAP
> -    bool skip_fiemap;
> -#endif
>   } BDRVRawState;
>   
>   typedef struct BDRVRawReopenState {

With or without thank hunk (better with):

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-17 10:58     ` Markus Armbruster
  2014-11-17 10:59       ` Max Reitz
@ 2014-11-17 16:31       ` Eric Blake
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Blake @ 2014-11-17 16:31 UTC (permalink / raw)
  To: Markus Armbruster, Max Reitz
  Cc: kwolf, famz, qemu-devel, tony, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1495 bytes --]

On 11/17/2014 03:58 AM, Markus Armbruster wrote:
> Max Reitz <mreitz@redhat.com> writes:
> 
>> On 2014-11-17 at 11:18, Markus Armbruster wrote:
>>> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
>>> follows:
>>>

>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> ---
>>>   block/raw-posix.c | 60 ++++---------------------------------------------------
>>>   1 file changed, 4 insertions(+), 56 deletions(-)
>>
>> I only just now realized that you're not getting rid of FIEMAP
>> completely; there's still the skip_fiemap flag in BDRVRawState. I
>> don't care for now, though, thus my R-b stands.
> 
> You're right!  The appended patch should be squashed in, either on
> commit, or in a respin.

My R-b stands whether or not you squash this in; best situation would be
squashing this in during commit.

> 
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 0b5d5a8..414e6d1 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -148,9 +148,6 @@ typedef struct BDRVRawState {
>      bool has_write_zeroes:1;
>      bool discard_zeroes:1;
>      bool needs_alignment;
> -#ifdef CONFIG_FIEMAP
> -    bool skip_fiemap;
> -#endif
>  } BDRVRawState;
>  
>  typedef struct BDRVRawReopenState {
> 
> 
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it
  2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it Markus Armbruster
  2014-11-17 10:33   ` Max Reitz
@ 2014-11-17 16:43   ` Eric Blake
  2014-11-18  8:42     ` Max Reitz
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Blake @ 2014-11-17 16:43 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 2562 bytes --]

On 11/17/2014 03:18 AM, Markus Armbruster wrote:
> On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
> but not Linux), try_seek_hole() reports trailing data instead.

Maybe worth a comment that this is not fatal, but also not optimal.

> 
> Additionally, unlikely lseek() failures are treated badly:
> 
> * When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
>   -ENXIO, there's in fact a trailing hole.  Can happen only when
>   something truncated the file since we opened it.
> 
> * When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
>   then try_seek_hole() reports a trailing hole.  This is okay only
>   when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
>   found by SEEK_HOLE has since become trailing somehow).  For other
>   failures (unlikely), it's wrong.
> 
> * When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
>   then try_seek_hole() reports bogus data [-1,start), which its caller
>   raw_co_get_block_status() turns into zero sectors of data.  Could
>   theoretically lead to infinite loops in code that attempts to scan
>   data vs. hole forward.
> 
> Rewrite from scratch, with very careful comments.

Thanks for the careful commit message as well as the careful comments :)

> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  block/raw-posix.c | 111 +++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 85 insertions(+), 26 deletions(-)
> 

> @@ -1542,25 +1600,26 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>          nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
>      }
>  

> +    } else if (data == start) {
>          /* On a data extent, compute sectors to the end of the extent.  */
>          *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);

I think we are safe for now that no file system supports holes smaller
than 512 bytes, so (hole - start) should always be a non-zero multiple
of sectors.  Similarly for the hole case of (data - start).  Maybe it's
worth assert(*pnum > 0) to ensure that we never hit a situation where we
go into an infinite loop where we aren't progressing because pnum is
never advancing to the next sector?  But that would be okay as a
separate patch, and I don't want to delay getting _this_ patch into 2.2.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it
  2014-11-17 16:43   ` Eric Blake
@ 2014-11-18  8:42     ` Max Reitz
  0 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2014-11-18  8:42 UTC (permalink / raw)
  To: Eric Blake, Markus Armbruster, qemu-devel
  Cc: kwolf, pbonzini, famz, tony, stefanha

On 2014-11-17 at 17:43, Eric Blake wrote:
> On 11/17/2014 03:18 AM, Markus Armbruster wrote:
>> On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
>> but not Linux), try_seek_hole() reports trailing data instead.
> Maybe worth a comment that this is not fatal, but also not optimal.
>
>> Additionally, unlikely lseek() failures are treated badly:
>>
>> * When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
>>    -ENXIO, there's in fact a trailing hole.  Can happen only when
>>    something truncated the file since we opened it.
>>
>> * When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
>>    then try_seek_hole() reports a trailing hole.  This is okay only
>>    when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
>>    found by SEEK_HOLE has since become trailing somehow).  For other
>>    failures (unlikely), it's wrong.
>>
>> * When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
>>    then try_seek_hole() reports bogus data [-1,start), which its caller
>>    raw_co_get_block_status() turns into zero sectors of data.  Could
>>    theoretically lead to infinite loops in code that attempts to scan
>>    data vs. hole forward.
>>
>> Rewrite from scratch, with very careful comments.
> Thanks for the careful commit message as well as the careful comments :)
>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   block/raw-posix.c | 111 +++++++++++++++++++++++++++++++++++++++++-------------
>>   1 file changed, 85 insertions(+), 26 deletions(-)
>>
>> @@ -1542,25 +1600,26 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>>           nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
>>       }
>>   
>> +    } else if (data == start) {
>>           /* On a data extent, compute sectors to the end of the extent.  */
>>           *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
> I think we are safe for now that no file system supports holes smaller
> than 512 bytes, so (hole - start) should always be a non-zero multiple
> of sectors.  Similarly for the hole case of (data - start).  Maybe it's
> worth assert(*pnum > 0) to ensure that we never hit a situation where we
> go into an infinite loop where we aren't progressing because pnum is
> never advancing to the next sector?

That's something the callers of bdrv_get_block_status() have to take 
care of. See commit 4b25bbc4c22cf39350b75bd250d568a4d975f7c5, for example.

Max

> But that would be okay as a
> separate patch, and I don't want to delay getting _this_ patch into 2.2.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE
  2014-11-17 10:18 [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Markus Armbruster
                   ` (2 preceding siblings ...)
  2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it Markus Armbruster
@ 2014-11-18  8:48 ` Max Reitz
  3 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2014-11-18  8:48 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: kwolf, famz, tony, stefanha, pbonzini

On 2014-11-17 at 11:18, Markus Armbruster wrote:
> PATCH 1 is just a comment fix.  PATCH 2 drops FIEMAP use and explains
> why it needs to go.  PATCH 3 carefully rewrites the SEEK_HOLE code.
>
> Why 2.2?  The series fixes bugs, but the bugs are either not terribly
> severe, or not particularly likely to bite.  The reason I want it
> included is we've already changed the code fundamentally in 2.2, from
>
>      Incorrect use of FIEMAP if available, else somewhat incorrect use
>      of SEEK_HOLE if available, else pretend no holes
>
> to
>
>      Somewhat incorrect use of SEEK_HOLE if available, else correct but
>      slow-as-molasses use of FIEMAP if available, else pretend no holes
>
> Let's finish the job:
>
>      Correct use of SEEK_HOLE if available, else pretend no holes
>
> Less complex, more robust, and no half-broken intermediate version
> released.
>
> v3:
> * PATCH 1&2 unchanged, except for a commit message typo [Eric]
> * PATCH 3&4 redone as PATCH 3, very carefully [Eric, Kevin]
>
> v2:
> * PATCH 1 unchanged
> * PATCH 2 revised and split up [Paolo, Fam, Eric, Max]
>
> Markus Armbruster (3):
>    raw-posix: Fix comment for raw_co_get_block_status()
>    raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
>    raw-posix: The SEEK_HOLE code is flawed, rewrite it
>
>   block/raw-posix.c | 167 ++++++++++++++++++++++++++++--------------------------
>   1 file changed, 86 insertions(+), 81 deletions(-)

Thanks, applied to my block tree with the hunk removing skip_fiemap 
squashed into patch 2:

https://github.com/XanClic/qemu/commits/block

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-11-18  8:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-17 10:18 [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Markus Armbruster
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 1/3] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
2014-11-17 10:30   ` Max Reitz
2014-11-17 10:58     ` Markus Armbruster
2014-11-17 10:59       ` Max Reitz
2014-11-17 16:31       ` Eric Blake
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it Markus Armbruster
2014-11-17 10:33   ` Max Reitz
2014-11-17 16:43   ` Eric Blake
2014-11-18  8:42     ` Max Reitz
2014-11-18  8:48 ` [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Max Reitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).