qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA
@ 2014-05-05 20:01 Max Reitz
  2014-05-06 11:49 ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Max Reitz @ 2014-05-05 20:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz

The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
compiled in in this case. However, there may be implementations which
support the latter but not the former (e.g., NFSv4.2). In this case,
raw-posix should fall back to lseek with SEEK_HOLE/SEEK_DATA if FIEMAP
does not work.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
As Linux does not yet implement NFSv4.2 (I don't even know whether the
specification is complete), I have no way of testing whether this
actually works for the proposed case. But as it doesn't break any of the
existing test cases, it should be fine.
---
 block/raw-posix.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3ce026d..e523633 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -146,6 +146,9 @@ typedef struct BDRVRawState {
     bool has_discard:1;
     bool has_write_zeroes:1;
     bool discard_zeroes:1;
+#if defined CONFIG_FIEMAP && defined SEEK_HOLE && defined SEEK_DATA
+    bool use_seek_hole_data;
+#endif
 } BDRVRawState;
 
 typedef struct BDRVRawReopenState {
@@ -1291,6 +1294,7 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
                                             int64_t sector_num,
                                             int nb_sectors, int *pnum)
 {
+    BDRVRawState *s = bs->opaque;
     off_t start, data, hole;
     int64_t ret;
 
@@ -1303,22 +1307,31 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
     ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
 
 #ifdef CONFIG_FIEMAP
-
-    BDRVRawState *s = bs->opaque;
     struct {
         struct fiemap fm;
         struct fiemap_extent fe;
     } f;
 
+#if defined SEEK_HOLE && defined SEEK_DATA
+    if (s->use_seek_hole_data) {
+        goto try_seek_hole_data;
+    }
+#endif
+
     f.fm.fm_start = start;
     f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
     f.fm.fm_flags = 0;
     f.fm.fm_extent_count = 1;
     f.fm.fm_reserved = 0;
     if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
+#if defined SEEK_HOLE && defined SEEK_DATA
+        s->use_seek_hole_data = true;
+        goto try_seek_hole_data;
+#else
         /* Assume everything is allocated.  */
         *pnum = nb_sectors;
         return ret;
+#endif
     }
 
     if (f.fm.fm_mapped_extents == 0) {
@@ -1336,10 +1349,11 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
         }
     }
 
-#elif defined SEEK_HOLE && defined SEEK_DATA
-
-    BDRVRawState *s = bs->opaque;
+    goto done;
+#endif
 
+#if defined SEEK_HOLE && defined SEEK_DATA
+try_seek_hole_data:
     hole = lseek(s->fd, start, SEEK_HOLE);
     if (hole == -1) {
         /* -ENXIO indicates that sector_num was past the end of the file.
@@ -1360,11 +1374,12 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
             data = lseek(s->fd, 0, SEEK_END);
         }
     }
-#else
+#elif !defined CONFIG_FIEMAP
     data = 0;
     hole = start + nb_sectors * BDRV_SECTOR_SIZE;
 #endif
 
+done:
     if (data <= start) {
         /* On a data extent, compute sectors to the end of the extent.  */
         *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA
  2014-05-05 20:01 [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA Max Reitz
@ 2014-05-06 11:49 ` Stefan Hajnoczi
  2014-05-06 12:27   ` Eric Blake
  2014-05-06 17:45   ` Max Reitz
  0 siblings, 2 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2014-05-06 11:49 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, qemu-devel

On Mon, May 05, 2014 at 10:01:39PM +0200, Max Reitz wrote:
> The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
> FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
> compiled in in this case. However, there may be implementations which
> support the latter but not the former (e.g., NFSv4.2). In this case,
> raw-posix should fall back to lseek with SEEK_HOLE/SEEK_DATA if FIEMAP
> does not work.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> As Linux does not yet implement NFSv4.2 (I don't even know whether the
> specification is complete), I have no way of testing whether this
> actually works for the proposed case. But as it doesn't break any of the
> existing test cases, it should be fine.
> ---
>  block/raw-posix.c | 27 +++++++++++++++++++++------
>  1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 3ce026d..e523633 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -146,6 +146,9 @@ typedef struct BDRVRawState {
>      bool has_discard:1;
>      bool has_write_zeroes:1;
>      bool discard_zeroes:1;
> +#if defined CONFIG_FIEMAP && defined SEEK_HOLE && defined SEEK_DATA
> +    bool use_seek_hole_data;
> +#endif
>  } BDRVRawState;
>  
>  typedef struct BDRVRawReopenState {
> @@ -1291,6 +1294,7 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>                                              int64_t sector_num,
>                                              int nb_sectors, int *pnum)
>  {
> +    BDRVRawState *s = bs->opaque;
>      off_t start, data, hole;
>      int64_t ret;
>  
> @@ -1303,22 +1307,31 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>      ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
>  
>  #ifdef CONFIG_FIEMAP
> -
> -    BDRVRawState *s = bs->opaque;
>      struct {
>          struct fiemap fm;
>          struct fiemap_extent fe;
>      } f;
>  
> +#if defined SEEK_HOLE && defined SEEK_DATA
> +    if (s->use_seek_hole_data) {
> +        goto try_seek_hole_data;
> +    }
> +#endif

This is becoming hard to read due to the ifdefs and their relationships.

A minor simplification is to change the bool field:
#if defined CONFIG_FIEMAP
bool use_fiemap;
#endif

...

#if defined CONFIG_FIEMAP
if (s->use_fiemap) {
    ...try fiemap...
}
#endif /* CONFIG_FIEMAP */

use_fiemap is not dependent on SEEK_HOLE/SEEK_DATA.  That reduces the
amount of ifdefs needed.

A bigger cleanup is extracting the FIEMAP and SEEK_HOLE/SEEK_DATA
implementations into their own static functions.  Then
raw_co_get_block_status() becomes simpler and doesn't need ifdefs:

ret = try_fiemap(...);
if (ret < 0) {
    ret = try_seekhole(...);
}
if (ret < 0) {
    ...report every block allocated by default....
}

In other words, let normal C control flow describe the relationships
between these code paths.  Use ifdef only to nop out try_fiemap() and
try_seekhole().

What do you think?

Stefan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA
  2014-05-06 11:49 ` Stefan Hajnoczi
@ 2014-05-06 12:27   ` Eric Blake
  2014-05-06 17:46     ` Max Reitz
  2014-05-06 17:45   ` Max Reitz
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Blake @ 2014-05-06 12:27 UTC (permalink / raw)
  To: Stefan Hajnoczi, Max Reitz; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2401 bytes --]

On 05/06/2014 05:49 AM, Stefan Hajnoczi wrote:
> On Mon, May 05, 2014 at 10:01:39PM +0200, Max Reitz wrote:
>> The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
>> FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
>> compiled in in this case. However, there may be implementations which
>> support the latter but not the former (e.g., NFSv4.2). In this case,
>> raw-posix should fall back to lseek with SEEK_HOLE/SEEK_DATA if FIEMAP
>> does not work.
>>

> 
> A bigger cleanup is extracting the FIEMAP and SEEK_HOLE/SEEK_DATA
> implementations into their own static functions.  Then
> raw_co_get_block_status() becomes simpler and doesn't need ifdefs:
> 
> ret = try_fiemap(...);
> if (ret < 0) {
>     ret = try_seekhole(...);
> }
> if (ret < 0) {
>     ...report every block allocated by default....
> }
> 
> In other words, let normal C control flow describe the relationships
> between these code paths.  Use ifdef only to nop out try_fiemap() and
> try_seekhole().
> 
> What do you think?

I like the idea - separating control flow from #ifdefs (by having stubs
on the other end of the ifdef) definitely makes algorithms easier to
understand.

More things to consider: GNU Coreutils has support for both fiemap and
seek_hole, but favors seek_hole first, for a couple reasons.  First,
FIEMAP has not always been reliable: on some older kernel/filesystem
pairs, fiemap could return stale results, which led cp(1) to cause data
loss unless it did an fsync() first to get the fiemap to be stable - but
the cost of the fsync() made the operation slower than if fiemap were
never used.  Second, POSIX will be standardizing seek_hole in its next
revision [1] (still several years out, but the fact that it is an
announced intention means people are starting to implement it now).
fiemap, on the other hand, remains a Linux-only extension.  Yes, fiemap
provides more details than seek_hole (and is the ONLY way to know the
difference between a hole that has reserved space on the disk vs a hole
that will require allocation if is written to), but if all you need to
know is whether a hole exists (rather than what type of hole), then
seek_hole is MUCH simpler.

[1] http://austingroupbugs.net/view.php?id=415

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA
  2014-05-06 11:49 ` Stefan Hajnoczi
  2014-05-06 12:27   ` Eric Blake
@ 2014-05-06 17:45   ` Max Reitz
  1 sibling, 0 replies; 6+ messages in thread
From: Max Reitz @ 2014-05-06 17:45 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel

On 06.05.2014 13:49, Stefan Hajnoczi wrote:
> On Mon, May 05, 2014 at 10:01:39PM +0200, Max Reitz wrote:
>> The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
>> FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
>> compiled in in this case. However, there may be implementations which
>> support the latter but not the former (e.g., NFSv4.2). In this case,
>> raw-posix should fall back to lseek with SEEK_HOLE/SEEK_DATA if FIEMAP
>> does not work.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> As Linux does not yet implement NFSv4.2 (I don't even know whether the
>> specification is complete), I have no way of testing whether this
>> actually works for the proposed case. But as it doesn't break any of the
>> existing test cases, it should be fine.
>> ---
>>   block/raw-posix.c | 27 +++++++++++++++++++++------
>>   1 file changed, 21 insertions(+), 6 deletions(-)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 3ce026d..e523633 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -146,6 +146,9 @@ typedef struct BDRVRawState {
>>       bool has_discard:1;
>>       bool has_write_zeroes:1;
>>       bool discard_zeroes:1;
>> +#if defined CONFIG_FIEMAP && defined SEEK_HOLE && defined SEEK_DATA
>> +    bool use_seek_hole_data;
>> +#endif
>>   } BDRVRawState;
>>   
>>   typedef struct BDRVRawReopenState {
>> @@ -1291,6 +1294,7 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>>                                               int64_t sector_num,
>>                                               int nb_sectors, int *pnum)
>>   {
>> +    BDRVRawState *s = bs->opaque;
>>       off_t start, data, hole;
>>       int64_t ret;
>>   
>> @@ -1303,22 +1307,31 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>>       ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
>>   
>>   #ifdef CONFIG_FIEMAP
>> -
>> -    BDRVRawState *s = bs->opaque;
>>       struct {
>>           struct fiemap fm;
>>           struct fiemap_extent fe;
>>       } f;
>>   
>> +#if defined SEEK_HOLE && defined SEEK_DATA
>> +    if (s->use_seek_hole_data) {
>> +        goto try_seek_hole_data;
>> +    }
>> +#endif
> This is becoming hard to read due to the ifdefs and their relationships.
>
> A minor simplification is to change the bool field:
> #if defined CONFIG_FIEMAP
> bool use_fiemap;
> #endif
>
> ...
>
> #if defined CONFIG_FIEMAP
> if (s->use_fiemap) {
>      ...try fiemap...
> }
> #endif /* CONFIG_FIEMAP */
>
> use_fiemap is not dependent on SEEK_HOLE/SEEK_DATA.  That reduces the
> amount of ifdefs needed.
>
> A bigger cleanup is extracting the FIEMAP and SEEK_HOLE/SEEK_DATA
> implementations into their own static functions.  Then
> raw_co_get_block_status() becomes simpler and doesn't need ifdefs:
>
> ret = try_fiemap(...);
> if (ret < 0) {
>      ret = try_seekhole(...);
> }
> if (ret < 0) {
>      ...report every block allocated by default....
> }
>
> In other words, let normal C control flow describe the relationships
> between these code paths.  Use ifdef only to nop out try_fiemap() and
> try_seekhole().
>
> What do you think?

Great idea! I thought about using normal if's or something, but didn't 
think of this.

Max

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA
  2014-05-06 12:27   ` Eric Blake
@ 2014-05-06 17:46     ` Max Reitz
  2014-05-06 17:55       ` Eric Blake
  0 siblings, 1 reply; 6+ messages in thread
From: Max Reitz @ 2014-05-06 17:46 UTC (permalink / raw)
  To: Eric Blake, Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel

On 06.05.2014 14:27, Eric Blake wrote:
> On 05/06/2014 05:49 AM, Stefan Hajnoczi wrote:
>> On Mon, May 05, 2014 at 10:01:39PM +0200, Max Reitz wrote:
>>> The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
>>> FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
>>> compiled in in this case. However, there may be implementations which
>>> support the latter but not the former (e.g., NFSv4.2). In this case,
>>> raw-posix should fall back to lseek with SEEK_HOLE/SEEK_DATA if FIEMAP
>>> does not work.
>>>
>> A bigger cleanup is extracting the FIEMAP and SEEK_HOLE/SEEK_DATA
>> implementations into their own static functions.  Then
>> raw_co_get_block_status() becomes simpler and doesn't need ifdefs:
>>
>> ret = try_fiemap(...);
>> if (ret < 0) {
>>      ret = try_seekhole(...);
>> }
>> if (ret < 0) {
>>      ...report every block allocated by default....
>> }
>>
>> In other words, let normal C control flow describe the relationships
>> between these code paths.  Use ifdef only to nop out try_fiemap() and
>> try_seekhole().
>>
>> What do you think?
> I like the idea - separating control flow from #ifdefs (by having stubs
> on the other end of the ifdef) definitely makes algorithms easier to
> understand.
>
> More things to consider: GNU Coreutils has support for both fiemap and
> seek_hole, but favors seek_hole first, for a couple reasons.  First,
> FIEMAP has not always been reliable: on some older kernel/filesystem
> pairs, fiemap could return stale results, which led cp(1) to cause data
> loss unless it did an fsync() first to get the fiemap to be stable - but
> the cost of the fsync() made the operation slower than if fiemap were
> never used.  Second, POSIX will be standardizing seek_hole in its next
> revision [1] (still several years out, but the fact that it is an
> announced intention means people are starting to implement it now).
> fiemap, on the other hand, remains a Linux-only extension.  Yes, fiemap
> provides more details than seek_hole (and is the ONLY way to know the
> difference between a hole that has reserved space on the disk vs a hole
> that will require allocation if is written to), but if all you need to
> know is whether a hole exists (rather than what type of hole), then
> seek_hole is MUCH simpler.
>
> [1] http://austingroupbugs.net/view.php?id=415

Okay, then I'll put the functionality in own functions and reverse the 
order in v2 while keeping the fallback idea, as I think there may exist 
systems where the reverse of what this patch tries to fix is true: 
SEEK_HOLE/SEEK_DATA is not supported, but FIEMAP is.

Max

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA
  2014-05-06 17:46     ` Max Reitz
@ 2014-05-06 17:55       ` Eric Blake
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Blake @ 2014-05-06 17:55 UTC (permalink / raw)
  To: Max Reitz, Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 624 bytes --]

On 05/06/2014 11:46 AM, Max Reitz wrote:

> 
> Okay, then I'll put the functionality in own functions and reverse the
> order in v2 while keeping the fallback idea, as I think there may exist
> systems where the reverse of what this patch tries to fix is true:
> SEEK_HOLE/SEEK_DATA is not supported, but FIEMAP is.

Correct - Linux implemented FIEMAP long before SEEK_HOLE, so there is a
range of kernels where FIEMAP is all you have.  It is only non-Linux
where you will have SEEK_HOLE in isolation.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-05-06 18:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-05 20:01 [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA Max Reitz
2014-05-06 11:49 ` Stefan Hajnoczi
2014-05-06 12:27   ` Eric Blake
2014-05-06 17:46     ` Max Reitz
2014-05-06 17:55       ` Eric Blake
2014-05-06 17:45   ` Max Reitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).