* [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE
@ 2014-11-17 10:18 Markus Armbruster
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 1/3] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Markus Armbruster @ 2014-11-17 10:18 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini
PATCH 1 is just a comment fix. PATCH 2 drops FIEMAP use and explains
why it needs to go. PATCH 3 carefully rewrites the SEEK_HOLE code.
Why 2.2? The series fixes bugs, but the bugs are either not terribly
severe, or not particularly likely to bite. The reason I want it
included is we've already changed the code fundamentally in 2.2, from
Incorrect use of FIEMAP if available, else somewhat incorrect use
of SEEK_HOLE if available, else pretend no holes
to
Somewhat incorrect use of SEEK_HOLE if available, else correct but
slow-as-molasses use of FIEMAP if available, else pretend no holes
Let's finish the job:
Correct use of SEEK_HOLE if available, else pretend no holes
Less complex, more robust, and no half-broken intermediate version
released.
v3:
* PATCH 1&2 unchanged, except for a commit message typo [Eric]
* PATCH 3&4 redone as PATCH 3, very carefully [Eric, Kevin]
v2:
* PATCH 1 unchanged
* PATCH 2 revised and split up [Paolo, Fam, Eric, Max]
Markus Armbruster (3):
raw-posix: Fix comment for raw_co_get_block_status()
raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
raw-posix: The SEEK_HOLE code is flawed, rewrite it
block/raw-posix.c | 167 ++++++++++++++++++++++++++++--------------------------
1 file changed, 86 insertions(+), 81 deletions(-)
--
1.9.3
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH v3 for-2.2 1/3] raw-posix: Fix comment for raw_co_get_block_status()
2014-11-17 10:18 [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Markus Armbruster
@ 2014-11-17 10:18 ` Markus Armbruster
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Markus Armbruster @ 2014-11-17 10:18 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini
Missed in commit 705be72.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
block/raw-posix.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index e100ae2..706d3c0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1555,9 +1555,7 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
}
/*
- * Returns true iff the specified sector is present in the disk image. Drivers
- * not implementing the functionality are assumed to not support backing files,
- * hence all their sectors are reported as allocated.
+ * Returns the allocation status of the specified sectors.
*
* If 'sector_num' is beyond the end of the disk image the return value is 0
* and 'pnum' is set to 0.
--
1.9.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
2014-11-17 10:18 [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Markus Armbruster
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 1/3] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
@ 2014-11-17 10:18 ` Markus Armbruster
2014-11-17 10:30 ` Max Reitz
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it Markus Armbruster
2014-11-18 8:48 ` [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Max Reitz
3 siblings, 1 reply; 12+ messages in thread
From: Markus Armbruster @ 2014-11-17 10:18 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini
Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
follows:
1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
3. Else pretend there are no holes
Later on, raw_co_is_allocated() was generalized to
raw_co_get_block_status().
Commit 4f11aa8 (May 2014) changed it to try the three methods in order
until success, because "there may be implementations which support
[SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
versa."
Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
Commit 38c4d0a (Sep 2014) added it. Because that's a significant
speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.
As you see, the obvious use of FIEMAP is wrong, and the correct use is
slow. I guess this puts it somewhere between -7 "The obvious use is
wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
to Misuse scale[*].
"Fortunately", the FIEMAP code is used only when
* SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
Uncommon. SEEK_HOLE had no XFS implementation between 2011 (when it
was introduced for ext4 and btrfs) and 2012.
* SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
Unlikely.
Thus, the FIEMAP code executes rarely. Makes it a nice hidey-hole for
bugs. Worse, bugs hiding there can theoretically bite even on a host
that has SEEK_HOLE/SEEK_DATA.
I don't want to worry about this crap, not even theoretically. Get
rid of it.
[*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
block/raw-posix.c | 60 ++++---------------------------------------------------
1 file changed, 4 insertions(+), 56 deletions(-)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 706d3c0..fd80d84 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,9 +60,6 @@
#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
#endif
#endif
-#ifdef CONFIG_FIEMAP
-#include <linux/fiemap.h>
-#endif
#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
#include <linux/falloc.h>
#endif
@@ -1481,52 +1478,6 @@ out:
return result;
}
-static int try_fiemap(BlockDriverState *bs, off_t start, off_t *data,
- off_t *hole, int nb_sectors)
-{
-#ifdef CONFIG_FIEMAP
- BDRVRawState *s = bs->opaque;
- int ret = 0;
- struct {
- struct fiemap fm;
- struct fiemap_extent fe;
- } f;
-
- if (s->skip_fiemap) {
- return -ENOTSUP;
- }
-
- f.fm.fm_start = start;
- f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
- f.fm.fm_flags = FIEMAP_FLAG_SYNC;
- f.fm.fm_extent_count = 1;
- f.fm.fm_reserved = 0;
- if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
- s->skip_fiemap = true;
- return -errno;
- }
-
- if (f.fm.fm_mapped_extents == 0) {
- /* No extents found, data is beyond f.fm.fm_start + f.fm.fm_length.
- * f.fm.fm_start + f.fm.fm_length must be clamped to the file size!
- */
- off_t length = lseek(s->fd, 0, SEEK_END);
- *hole = f.fm.fm_start;
- *data = MIN(f.fm.fm_start + f.fm.fm_length, length);
- } else {
- *data = f.fe.fe_logical;
- *hole = f.fe.fe_logical + f.fe.fe_length;
- if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
- ret |= BDRV_BLOCK_ZERO;
- }
- }
-
- return ret;
-#else
- return -ENOTSUP;
-#endif
-}
-
static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
off_t *hole)
{
@@ -1593,13 +1544,10 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
ret = try_seek_hole(bs, start, &data, &hole);
if (ret < 0) {
- ret = try_fiemap(bs, start, &data, &hole, nb_sectors);
- if (ret < 0) {
- /* Assume everything is allocated. */
- data = 0;
- hole = start + nb_sectors * BDRV_SECTOR_SIZE;
- ret = 0;
- }
+ /* Assume everything is allocated. */
+ data = 0;
+ hole = start + nb_sectors * BDRV_SECTOR_SIZE;
+ ret = 0;
}
assert(ret >= 0);
--
1.9.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it
2014-11-17 10:18 [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Markus Armbruster
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 1/3] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
@ 2014-11-17 10:18 ` Markus Armbruster
2014-11-17 10:33 ` Max Reitz
2014-11-17 16:43 ` Eric Blake
2014-11-18 8:48 ` [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Max Reitz
3 siblings, 2 replies; 12+ messages in thread
From: Markus Armbruster @ 2014-11-17 10:18 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini
On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.
Additionally, unlikely lseek() failures are treated badly:
* When SEEK_HOLE fails, try_seek_hole() reports trailing data. For
-ENXIO, there's in fact a trailing hole. Can happen only when
something truncated the file since we opened it.
* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
then try_seek_hole() reports a trailing hole. This is okay only
when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
found by SEEK_HOLE has since become trailing somehow). For other
failures (unlikely), it's wrong.
* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
then try_seek_hole() reports bogus data [-1,start), which its caller
raw_co_get_block_status() turns into zero sectors of data. Could
theoretically lead to infinite loops in code that attempts to scan
data vs. hole forward.
Rewrite from scratch, with very careful comments.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
block/raw-posix.c | 111 +++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 85 insertions(+), 26 deletions(-)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index fd80d84..0b5d5a8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1478,28 +1478,86 @@ out:
return result;
}
-static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
- off_t *hole)
+/*
+ * Find allocation range in @bs around offset @start.
+ * May change underlying file descriptor's file offset.
+ * If @start is not in a hole, store @start in @data, and the
+ * beginning of the next hole in @hole, and return 0.
+ * If @start is in a non-trailing hole, store @start in @hole and the
+ * beginning of the next non-hole in @data, and return 0.
+ * If @start is in a trailing hole or beyond EOF, return -ENXIO.
+ * If we can't find out, return a negative errno other than -ENXIO.
+ */
+static int find_allocation(BlockDriverState *bs, off_t start,
+ off_t *data, off_t *hole)
{
#if defined SEEK_HOLE && defined SEEK_DATA
BDRVRawState *s = bs->opaque;
+ off_t offs;
- *hole = lseek(s->fd, start, SEEK_HOLE);
- if (*hole == -1) {
- return -errno;
+ /*
+ * SEEK_DATA cases:
+ * D1. offs == start: start is in data
+ * D2. offs > start: start is in a hole, next data at offs
+ * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
+ * or start is beyond EOF
+ * If the latter happens, the file has been truncated behind
+ * our back since we opened it. All bets are off then.
+ * Treating like a trailing hole is simplest.
+ * D4. offs < 0, errno != ENXIO: we learned nothing
+ */
+ offs = lseek(s->fd, start, SEEK_DATA);
+ if (offs < 0) {
+ return -errno; /* D3 or D4 */
}
+ assert(offs >= start);
- if (*hole > start) {
+ if (offs > start) {
+ /* D2: in hole, next data at offs */
+ *hole = start;
+ *data = offs;
+ return 0;
+ }
+
+ /* D1: in data, end not yet known */
+
+ /*
+ * SEEK_HOLE cases:
+ * H1. offs == start: start is in a hole
+ * If this happens here, a hole has been dug behind our back
+ * since the previous lseek().
+ * H2. offs > start: either start is in data, next hole at offs,
+ * or start is in trailing hole, EOF at offs
+ * Linux treats trailing holes like any other hole: offs ==
+ * start. Solaris seeks to EOF instead: offs > start (blech).
+ * If that happens here, a hole has been dug behind our back
+ * since the previous lseek().
+ * H3. offs < 0, errno = ENXIO: start is beyond EOF
+ * If this happens, the file has been truncated behind our
+ * back since we opened it. Treat it like a trailing hole.
+ * H4. offs < 0, errno != ENXIO: we learned nothing
+ * Pretend we know nothing at all, i.e. "forget" about D1.
+ */
+ offs = lseek(s->fd, start, SEEK_HOLE);
+ if (offs < 0) {
+ return -errno; /* D1 and (H3 or H4) */
+ }
+ assert(offs >= start);
+
+ if (offs > start) {
+ /*
+ * D1 and H2: either in data, next hole at offs, or it was in
+ * data but is now in a trailing hole. In the latter case,
+ * all bets are off. Treating it as if it there was data all
+ * the way to EOF is safe, so simply do that.
+ */
*data = start;
- } else {
- /* On a hole. We need another syscall to find its end. */
- *data = lseek(s->fd, start, SEEK_DATA);
- if (*data == -1) {
- *data = lseek(s->fd, 0, SEEK_END);
- }
+ *hole = offs;
+ return 0;
}
- return 0;
+ /* D1 and H1 */
+ return -EBUSY;
#else
return -ENOTSUP;
#endif
@@ -1542,25 +1600,26 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
}
- ret = try_seek_hole(bs, start, &data, &hole);
- if (ret < 0) {
- /* Assume everything is allocated. */
- data = 0;
- hole = start + nb_sectors * BDRV_SECTOR_SIZE;
- ret = 0;
- }
-
- assert(ret >= 0);
-
- if (data <= start) {
+ ret = find_allocation(bs, start, &data, &hole);
+ if (ret == -ENXIO) {
+ /* Trailing hole */
+ *pnum = nb_sectors;
+ ret = BDRV_BLOCK_ZERO;
+ } else if (ret < 0) {
+ /* No info available, so pretend there are no holes */
+ *pnum = nb_sectors;
+ ret = BDRV_BLOCK_DATA;
+ } else if (data == start) {
/* On a data extent, compute sectors to the end of the extent. */
*pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
- return ret | BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+ ret = BDRV_BLOCK_DATA;
} else {
/* On a hole, compute sectors to the beginning of the next extent. */
+ assert(hole == start);
*pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
- return ret | BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID | start;
+ ret = BDRV_BLOCK_ZERO;
}
+ return ret | BDRV_BLOCK_OFFSET_VALID | start;
}
static coroutine_fn BlockAIOCB *raw_aio_discard(BlockDriverState *bs,
--
1.9.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
@ 2014-11-17 10:30 ` Max Reitz
2014-11-17 10:58 ` Markus Armbruster
0 siblings, 1 reply; 12+ messages in thread
From: Max Reitz @ 2014-11-17 10:30 UTC (permalink / raw)
To: Markus Armbruster, qemu-devel; +Cc: kwolf, famz, tony, stefanha, pbonzini
On 2014-11-17 at 11:18, Markus Armbruster wrote:
> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
> follows:
>
> 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
>
> 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
>
> 3. Else pretend there are no holes
>
> Later on, raw_co_is_allocated() was generalized to
> raw_co_get_block_status().
>
> Commit 4f11aa8 (May 2014) changed it to try the three methods in order
> until success, because "there may be implementations which support
> [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
> versa."
>
> Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
> Commit 38c4d0a (Sep 2014) added it. Because that's a significant
> speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.
>
> As you see, the obvious use of FIEMAP is wrong, and the correct use is
> slow. I guess this puts it somewhere between -7 "The obvious use is
> wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
> to Misuse scale[*].
>
> "Fortunately", the FIEMAP code is used only when
>
> * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
>
> Uncommon. SEEK_HOLE had no XFS implementation between 2011 (when it
> was introduced for ext4 and btrfs) and 2012.
>
> * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
>
> Unlikely.
>
> Thus, the FIEMAP code executes rarely. Makes it a nice hidey-hole for
> bugs. Worse, bugs hiding there can theoretically bite even on a host
> that has SEEK_HOLE/SEEK_DATA.
>
> I don't want to worry about this crap, not even theoretically. Get
> rid of it.
>
> [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
> block/raw-posix.c | 60 ++++---------------------------------------------------
> 1 file changed, 4 insertions(+), 56 deletions(-)
I only just now realized that you're not getting rid of FIEMAP
completely; there's still the skip_fiemap flag in BDRVRawState. I don't
care for now, though, thus my R-b stands.
Max
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it Markus Armbruster
@ 2014-11-17 10:33 ` Max Reitz
2014-11-17 16:43 ` Eric Blake
1 sibling, 0 replies; 12+ messages in thread
From: Max Reitz @ 2014-11-17 10:33 UTC (permalink / raw)
To: Markus Armbruster, qemu-devel; +Cc: kwolf, famz, tony, stefanha, pbonzini
On 2014-11-17 at 11:18, Markus Armbruster wrote:
> On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
> but not Linux), try_seek_hole() reports trailing data instead.
>
> Additionally, unlikely lseek() failures are treated badly:
>
> * When SEEK_HOLE fails, try_seek_hole() reports trailing data. For
> -ENXIO, there's in fact a trailing hole. Can happen only when
> something truncated the file since we opened it.
>
> * When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
> then try_seek_hole() reports a trailing hole. This is okay only
> when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
> found by SEEK_HOLE has since become trailing somehow). For other
> failures (unlikely), it's wrong.
>
> * When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
> then try_seek_hole() reports bogus data [-1,start), which its caller
> raw_co_get_block_status() turns into zero sectors of data. Could
> theoretically lead to infinite loops in code that attempts to scan
> data vs. hole forward.
>
> Rewrite from scratch, with very careful comments.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
> block/raw-posix.c | 111 +++++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 85 insertions(+), 26 deletions(-)
Reviewed-by: Max Reitz <mreitz@redhat.com>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
2014-11-17 10:30 ` Max Reitz
@ 2014-11-17 10:58 ` Markus Armbruster
2014-11-17 10:59 ` Max Reitz
2014-11-17 16:31 ` Eric Blake
0 siblings, 2 replies; 12+ messages in thread
From: Markus Armbruster @ 2014-11-17 10:58 UTC (permalink / raw)
To: Max Reitz; +Cc: kwolf, famz, qemu-devel, tony, stefanha, pbonzini
Max Reitz <mreitz@redhat.com> writes:
> On 2014-11-17 at 11:18, Markus Armbruster wrote:
>> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
>> follows:
>>
>> 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
>>
>> 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
>>
>> 3. Else pretend there are no holes
>>
>> Later on, raw_co_is_allocated() was generalized to
>> raw_co_get_block_status().
>>
>> Commit 4f11aa8 (May 2014) changed it to try the three methods in order
>> until success, because "there may be implementations which support
>> [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
>> versa."
>>
>> Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
>> Commit 38c4d0a (Sep 2014) added it. Because that's a significant
>> speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.
>>
>> As you see, the obvious use of FIEMAP is wrong, and the correct use is
>> slow. I guess this puts it somewhere between -7 "The obvious use is
>> wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
>> to Misuse scale[*].
>>
>> "Fortunately", the FIEMAP code is used only when
>>
>> * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
>>
>> Uncommon. SEEK_HOLE had no XFS implementation between 2011 (when it
>> was introduced for ext4 and btrfs) and 2012.
>>
>> * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
>>
>> Unlikely.
>>
>> Thus, the FIEMAP code executes rarely. Makes it a nice hidey-hole for
>> bugs. Worse, bugs hiding there can theoretically bite even on a host
>> that has SEEK_HOLE/SEEK_DATA.
>>
>> I don't want to worry about this crap, not even theoretically. Get
>> rid of it.
>>
>> [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> ---
>> block/raw-posix.c | 60 ++++---------------------------------------------------
>> 1 file changed, 4 insertions(+), 56 deletions(-)
>
> I only just now realized that you're not getting rid of FIEMAP
> completely; there's still the skip_fiemap flag in BDRVRawState. I
> don't care for now, though, thus my R-b stands.
You're right! The appended patch should be squashed in, either on
commit, or in a respin.
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 0b5d5a8..414e6d1 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -148,9 +148,6 @@ typedef struct BDRVRawState {
bool has_write_zeroes:1;
bool discard_zeroes:1;
bool needs_alignment;
-#ifdef CONFIG_FIEMAP
- bool skip_fiemap;
-#endif
} BDRVRawState;
typedef struct BDRVRawReopenState {
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
2014-11-17 10:58 ` Markus Armbruster
@ 2014-11-17 10:59 ` Max Reitz
2014-11-17 16:31 ` Eric Blake
1 sibling, 0 replies; 12+ messages in thread
From: Max Reitz @ 2014-11-17 10:59 UTC (permalink / raw)
To: Markus Armbruster; +Cc: kwolf, famz, qemu-devel, tony, stefanha, pbonzini
On 2014-11-17 at 11:58, Markus Armbruster wrote:
> Max Reitz <mreitz@redhat.com> writes:
>
>> On 2014-11-17 at 11:18, Markus Armbruster wrote:
>>> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
>>> follows:
>>>
>>> 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
>>>
>>> 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
>>>
>>> 3. Else pretend there are no holes
>>>
>>> Later on, raw_co_is_allocated() was generalized to
>>> raw_co_get_block_status().
>>>
>>> Commit 4f11aa8 (May 2014) changed it to try the three methods in order
>>> until success, because "there may be implementations which support
>>> [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
>>> versa."
>>>
>>> Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
>>> Commit 38c4d0a (Sep 2014) added it. Because that's a significant
>>> speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.
>>>
>>> As you see, the obvious use of FIEMAP is wrong, and the correct use is
>>> slow. I guess this puts it somewhere between -7 "The obvious use is
>>> wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
>>> to Misuse scale[*].
>>>
>>> "Fortunately", the FIEMAP code is used only when
>>>
>>> * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
>>>
>>> Uncommon. SEEK_HOLE had no XFS implementation between 2011 (when it
>>> was introduced for ext4 and btrfs) and 2012.
>>>
>>> * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
>>>
>>> Unlikely.
>>>
>>> Thus, the FIEMAP code executes rarely. Makes it a nice hidey-hole for
>>> bugs. Worse, bugs hiding there can theoretically bite even on a host
>>> that has SEEK_HOLE/SEEK_DATA.
>>>
>>> I don't want to worry about this crap, not even theoretically. Get
>>> rid of it.
>>>
>>> [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> ---
>>> block/raw-posix.c | 60 ++++---------------------------------------------------
>>> 1 file changed, 4 insertions(+), 56 deletions(-)
>> I only just now realized that you're not getting rid of FIEMAP
>> completely; there's still the skip_fiemap flag in BDRVRawState. I
>> don't care for now, though, thus my R-b stands.
> You're right! The appended patch should be squashed in, either on
> commit, or in a respin.
>
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 0b5d5a8..414e6d1 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -148,9 +148,6 @@ typedef struct BDRVRawState {
> bool has_write_zeroes:1;
> bool discard_zeroes:1;
> bool needs_alignment;
> -#ifdef CONFIG_FIEMAP
> - bool skip_fiemap;
> -#endif
> } BDRVRawState;
>
> typedef struct BDRVRawReopenState {
With or without thank hunk (better with):
Reviewed-by: Max Reitz <mreitz@redhat.com>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
2014-11-17 10:58 ` Markus Armbruster
2014-11-17 10:59 ` Max Reitz
@ 2014-11-17 16:31 ` Eric Blake
1 sibling, 0 replies; 12+ messages in thread
From: Eric Blake @ 2014-11-17 16:31 UTC (permalink / raw)
To: Markus Armbruster, Max Reitz
Cc: kwolf, famz, qemu-devel, tony, stefanha, pbonzini
[-- Attachment #1: Type: text/plain, Size: 1495 bytes --]
On 11/17/2014 03:58 AM, Markus Armbruster wrote:
> Max Reitz <mreitz@redhat.com> writes:
>
>> On 2014-11-17 at 11:18, Markus Armbruster wrote:
>>> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
>>> follows:
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> ---
>>> block/raw-posix.c | 60 ++++---------------------------------------------------
>>> 1 file changed, 4 insertions(+), 56 deletions(-)
>>
>> I only just now realized that you're not getting rid of FIEMAP
>> completely; there's still the skip_fiemap flag in BDRVRawState. I
>> don't care for now, though, thus my R-b stands.
>
> You're right! The appended patch should be squashed in, either on
> commit, or in a respin.
My R-b stands whether or not you squash this in; best situation would be
squashing this in during commit.
>
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 0b5d5a8..414e6d1 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -148,9 +148,6 @@ typedef struct BDRVRawState {
> bool has_write_zeroes:1;
> bool discard_zeroes:1;
> bool needs_alignment;
> -#ifdef CONFIG_FIEMAP
> - bool skip_fiemap;
> -#endif
> } BDRVRawState;
>
> typedef struct BDRVRawReopenState {
>
>
>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it Markus Armbruster
2014-11-17 10:33 ` Max Reitz
@ 2014-11-17 16:43 ` Eric Blake
2014-11-18 8:42 ` Max Reitz
1 sibling, 1 reply; 12+ messages in thread
From: Eric Blake @ 2014-11-17 16:43 UTC (permalink / raw)
To: Markus Armbruster, qemu-devel
Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini
[-- Attachment #1: Type: text/plain, Size: 2562 bytes --]
On 11/17/2014 03:18 AM, Markus Armbruster wrote:
> On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
> but not Linux), try_seek_hole() reports trailing data instead.
Maybe worth a comment that this is not fatal, but also not optimal.
>
> Additionally, unlikely lseek() failures are treated badly:
>
> * When SEEK_HOLE fails, try_seek_hole() reports trailing data. For
> -ENXIO, there's in fact a trailing hole. Can happen only when
> something truncated the file since we opened it.
>
> * When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
> then try_seek_hole() reports a trailing hole. This is okay only
> when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
> found by SEEK_HOLE has since become trailing somehow). For other
> failures (unlikely), it's wrong.
>
> * When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
> then try_seek_hole() reports bogus data [-1,start), which its caller
> raw_co_get_block_status() turns into zero sectors of data. Could
> theoretically lead to infinite loops in code that attempts to scan
> data vs. hole forward.
>
> Rewrite from scratch, with very careful comments.
Thanks for the careful commit message as well as the careful comments :)
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
> block/raw-posix.c | 111 +++++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 85 insertions(+), 26 deletions(-)
>
> @@ -1542,25 +1600,26 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
> nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
> }
>
> + } else if (data == start) {
> /* On a data extent, compute sectors to the end of the extent. */
> *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
I think we are safe for now that no file system supports holes smaller
than 512 bytes, so (hole - start) should always be a non-zero multiple
of sectors. Similarly for the hole case of (data - start). Maybe it's
worth assert(*pnum > 0) to ensure that we never hit a situation where we
go into an infinite loop where we aren't progressing because pnum is
never advancing to the next sector? But that would be okay as a
separate patch, and I don't want to delay getting _this_ patch into 2.2.
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it
2014-11-17 16:43 ` Eric Blake
@ 2014-11-18 8:42 ` Max Reitz
0 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2014-11-18 8:42 UTC (permalink / raw)
To: Eric Blake, Markus Armbruster, qemu-devel
Cc: kwolf, pbonzini, famz, tony, stefanha
On 2014-11-17 at 17:43, Eric Blake wrote:
> On 11/17/2014 03:18 AM, Markus Armbruster wrote:
>> On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
>> but not Linux), try_seek_hole() reports trailing data instead.
> Maybe worth a comment that this is not fatal, but also not optimal.
>
>> Additionally, unlikely lseek() failures are treated badly:
>>
>> * When SEEK_HOLE fails, try_seek_hole() reports trailing data. For
>> -ENXIO, there's in fact a trailing hole. Can happen only when
>> something truncated the file since we opened it.
>>
>> * When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
>> then try_seek_hole() reports a trailing hole. This is okay only
>> when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
>> found by SEEK_HOLE has since become trailing somehow). For other
>> failures (unlikely), it's wrong.
>>
>> * When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
>> then try_seek_hole() reports bogus data [-1,start), which its caller
>> raw_co_get_block_status() turns into zero sectors of data. Could
>> theoretically lead to infinite loops in code that attempts to scan
>> data vs. hole forward.
>>
>> Rewrite from scratch, with very careful comments.
> Thanks for the careful commit message as well as the careful comments :)
>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>> block/raw-posix.c | 111 +++++++++++++++++++++++++++++++++++++++++-------------
>> 1 file changed, 85 insertions(+), 26 deletions(-)
>>
>> @@ -1542,25 +1600,26 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>> nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
>> }
>>
>> + } else if (data == start) {
>> /* On a data extent, compute sectors to the end of the extent. */
>> *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
> I think we are safe for now that no file system supports holes smaller
> than 512 bytes, so (hole - start) should always be a non-zero multiple
> of sectors. Similarly for the hole case of (data - start). Maybe it's
> worth assert(*pnum > 0) to ensure that we never hit a situation where we
> go into an infinite loop where we aren't progressing because pnum is
> never advancing to the next sector?
That's something the callers of bdrv_get_block_status() have to take
care of. See commit 4b25bbc4c22cf39350b75bd250d568a4d975f7c5, for example.
Max
> But that would be okay as a
> separate patch, and I don't want to delay getting _this_ patch into 2.2.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE
2014-11-17 10:18 [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Markus Armbruster
` (2 preceding siblings ...)
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it Markus Armbruster
@ 2014-11-18 8:48 ` Max Reitz
3 siblings, 0 replies; 12+ messages in thread
From: Max Reitz @ 2014-11-18 8:48 UTC (permalink / raw)
To: Markus Armbruster, qemu-devel; +Cc: kwolf, famz, tony, stefanha, pbonzini
On 2014-11-17 at 11:18, Markus Armbruster wrote:
> PATCH 1 is just a comment fix. PATCH 2 drops FIEMAP use and explains
> why it needs to go. PATCH 3 carefully rewrites the SEEK_HOLE code.
>
> Why 2.2? The series fixes bugs, but the bugs are either not terribly
> severe, or not particularly likely to bite. The reason I want it
> included is we've already changed the code fundamentally in 2.2, from
>
> Incorrect use of FIEMAP if available, else somewhat incorrect use
> of SEEK_HOLE if available, else pretend no holes
>
> to
>
> Somewhat incorrect use of SEEK_HOLE if available, else correct but
> slow-as-molasses use of FIEMAP if available, else pretend no holes
>
> Let's finish the job:
>
> Correct use of SEEK_HOLE if available, else pretend no holes
>
> Less complex, more robust, and no half-broken intermediate version
> released.
>
> v3:
> * PATCH 1&2 unchanged, except for a commit message typo [Eric]
> * PATCH 3&4 redone as PATCH 3, very carefully [Eric, Kevin]
>
> v2:
> * PATCH 1 unchanged
> * PATCH 2 revised and split up [Paolo, Fam, Eric, Max]
>
> Markus Armbruster (3):
> raw-posix: Fix comment for raw_co_get_block_status()
> raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
> raw-posix: The SEEK_HOLE code is flawed, rewrite it
>
> block/raw-posix.c | 167 ++++++++++++++++++++++++++++--------------------------
> 1 file changed, 86 insertions(+), 81 deletions(-)
Thanks, applied to my block tree with the hunk removing skip_fiemap
squashed into patch 2:
https://github.com/XanClic/qemu/commits/block
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-11-18 8:49 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-17 10:18 [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Markus Armbruster
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 1/3] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 2/3] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
2014-11-17 10:30 ` Max Reitz
2014-11-17 10:58 ` Markus Armbruster
2014-11-17 10:59 ` Max Reitz
2014-11-17 16:31 ` Eric Blake
2014-11-17 10:18 ` [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it Markus Armbruster
2014-11-17 10:33 ` Max Reitz
2014-11-17 16:43 ` Eric Blake
2014-11-18 8:42 ` Max Reitz
2014-11-18 8:48 ` [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE Max Reitz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).