* [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more @ 2014-11-13 10:16 Markus Armbruster 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 1/4] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster ` (4 more replies) 0 siblings, 5 replies; 21+ messages in thread From: Markus Armbruster @ 2014-11-13 10:16 UTC (permalink / raw) To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini See PATCH 2/4 for why FIEMAP needs to go. Minor fixes in 1+3/4, cleanup in 4/4. Would you like this included in 2.2? Maybe just the first three? v2: * PATCH 1 unchanged * PATCH 2 revised and split up [Paolo, Fam, Eric, Max] Markus Armbruster (4): raw-posix: Fix comment for raw_co_get_block_status() raw-posix: SEEK_HOLE suffices, get rid of FIEMAP raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure raw-posix: Clean up around raw_co_get_block_status() block/raw-posix.c | 117 ++++++++++++++++++------------------------------------ 1 file changed, 38 insertions(+), 79 deletions(-) -- 1.9.3 ^ permalink raw reply [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v2 1/4] raw-posix: Fix comment for raw_co_get_block_status() 2014-11-13 10:16 [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more Markus Armbruster @ 2014-11-13 10:17 ` Markus Armbruster 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 2/4] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster ` (3 subsequent siblings) 4 siblings, 0 replies; 21+ messages in thread From: Markus Armbruster @ 2014-11-13 10:17 UTC (permalink / raw) To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini Missed in commit 705be72. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> --- block/raw-posix.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index e100ae2..706d3c0 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -1555,9 +1555,7 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data, } /* - * Returns true iff the specified sector is present in the disk image. Drivers - * not implementing the functionality are assumed to not support backing files, - * hence all their sectors are reported as allocated. + * Returns the allocation status of the specified sectors. * * If 'sector_num' is beyond the end of the disk image the return value is 0 * and 'pnum' is set to 0. -- 1.9.3 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v2 2/4] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP 2014-11-13 10:16 [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more Markus Armbruster 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 1/4] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster @ 2014-11-13 10:17 ` Markus Armbruster 2014-11-13 10:19 ` Max Reitz 2014-11-13 14:09 ` Eric Blake 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure Markus Armbruster ` (2 subsequent siblings) 4 siblings, 2 replies; 21+ messages in thread From: Markus Armbruster @ 2014-11-13 10:17 UTC (permalink / raw) To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini Commit 5500316 (May 2012) implemented raw_co_is_allocated() as follows: 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek() 3. Else pretend there are no holes Later on, raw_co_is_allocated() was generalized to raw_co_get_block_status(). Commit 4f11aa8 (May 2014) changed it to try the three methods in order until success, because "there may be implementations which support [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice versa." Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC. Commit 38c4d0a (Sep 2014) added it. Because that's a significant speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first. As you see, the obvious use of FIEMAP is wrong, and the correct use is slow. I guess this puts it somewhere between -7 "The obvious use is wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard to Misuse scale[*]. "Fortunately", the FIEMAP code is used only when * SEEK_HOLE/SEEK_DATA arent't defined, but CONFIG_FIEMAP is Uncommon. SEEK_HOLE had no XFS implementation between 2011 (when it was introduced for ext4 and btrfs) and 2012. * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails Unlikely. 
Thus, the FIEMAP code executes rarely. Makes it a nice hidey-hole for bugs. Worse, bugs hiding there can theoretically bite even on a host that has SEEK_HOLE/SEEK_DATA. I don't want to worry about this crap, not even theoretically. Get rid of it. [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html Signed-off-by: Markus Armbruster <armbru@redhat.com> --- block/raw-posix.c | 60 ++++--------------------------------------------------- 1 file changed, 4 insertions(+), 56 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index 706d3c0..fd80d84 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -60,9 +60,6 @@ #define FS_NOCOW_FL 0x00800000 /* Do not cow file */ #endif #endif -#ifdef CONFIG_FIEMAP -#include <linux/fiemap.h> -#endif #ifdef CONFIG_FALLOCATE_PUNCH_HOLE #include <linux/falloc.h> #endif @@ -1481,52 +1478,6 @@ out: return result; } -static int try_fiemap(BlockDriverState *bs, off_t start, off_t *data, - off_t *hole, int nb_sectors) -{ -#ifdef CONFIG_FIEMAP - BDRVRawState *s = bs->opaque; - int ret = 0; - struct { - struct fiemap fm; - struct fiemap_extent fe; - } f; - - if (s->skip_fiemap) { - return -ENOTSUP; - } - - f.fm.fm_start = start; - f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE; - f.fm.fm_flags = FIEMAP_FLAG_SYNC; - f.fm.fm_extent_count = 1; - f.fm.fm_reserved = 0; - if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) { - s->skip_fiemap = true; - return -errno; - } - - if (f.fm.fm_mapped_extents == 0) { - /* No extents found, data is beyond f.fm.fm_start + f.fm.fm_length. - * f.fm.fm_start + f.fm.fm_length must be clamped to the file size! 
- */ - off_t length = lseek(s->fd, 0, SEEK_END); - *hole = f.fm.fm_start; - *data = MIN(f.fm.fm_start + f.fm.fm_length, length); - } else { - *data = f.fe.fe_logical; - *hole = f.fe.fe_logical + f.fe.fe_length; - if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) { - ret |= BDRV_BLOCK_ZERO; - } - } - - return ret; -#else - return -ENOTSUP; -#endif -} - static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data, off_t *hole) { @@ -1593,13 +1544,10 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs, ret = try_seek_hole(bs, start, &data, &hole); if (ret < 0) { - ret = try_fiemap(bs, start, &data, &hole, nb_sectors); - if (ret < 0) { - /* Assume everything is allocated. */ - data = 0; - hole = start + nb_sectors * BDRV_SECTOR_SIZE; - ret = 0; - } + /* Assume everything is allocated. */ + data = 0; + hole = start + nb_sectors * BDRV_SECTOR_SIZE; + ret = 0; } assert(ret >= 0); -- 1.9.3 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/4] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 2/4] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster @ 2014-11-13 10:19 ` Max Reitz 2014-11-13 14:09 ` Eric Blake 1 sibling, 0 replies; 21+ messages in thread From: Max Reitz @ 2014-11-13 10:19 UTC (permalink / raw) To: Markus Armbruster, qemu-devel; +Cc: kwolf, famz, tony, stefanha, pbonzini On 2014-11-13 at 11:17, Markus Armbruster wrote: > Commit 5500316 (May 2012) implemented raw_co_is_allocated() as > follows: > > 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl > > 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek() > > 3. Else pretend there are no holes > > Later on, raw_co_is_allocated() was generalized to > raw_co_get_block_status(). > > Commit 4f11aa8 (May 2014) changed it to try the three methods in order > until success, because "there may be implementations which support > [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice > versa." > > Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC. > Commit 38c4d0a (Sep 2014) added it. Because that's a significant > speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first. > > As you see, the obvious use of FIEMAP is wrong, and the correct use is > slow. I guess this puts it somewhere between -7 "The obvious use is > wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard > to Misuse scale[*]. > > "Fortunately", the FIEMAP code is used only when > > * SEEK_HOLE/SEEK_DATA arent't defined, but CONFIG_FIEMAP is > > Uncommon. SEEK_HOLE had no XFS implementation between 2011 (when it > was introduced for ext4 and btrfs) and 2012. > > * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails > > Unlikely. > > Thus, the FIEMAP code executes rarely. Makes it a nice hidey-hole for > bugs. Worse, bugs hiding there can theoretically bite even on a host > that has SEEK_HOLE/SEEK_DATA. 
> > I don't want to worry about this crap, not even theoretically. Get > rid of it. > > [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html > > Signed-off-by: Markus Armbruster <armbru@redhat.com> > --- > block/raw-posix.c | 60 ++++--------------------------------------------------- > 1 file changed, 4 insertions(+), 56 deletions(-) Reviewed-by: Max Reitz <mreitz@redhat.com> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/4] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 2/4] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster 2014-11-13 10:19 ` Max Reitz @ 2014-11-13 14:09 ` Eric Blake 1 sibling, 0 replies; 21+ messages in thread From: Eric Blake @ 2014-11-13 14:09 UTC (permalink / raw) To: Markus Armbruster, qemu-devel Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini [-- Attachment #1: Type: text/plain, Size: 635 bytes --] On 11/13/2014 03:17 AM, Markus Armbruster wrote: > Commit 5500316 (May 2012) implemented raw_co_is_allocated() as > follows: > > "Fortunately", the FIEMAP code is used only when > > * SEEK_HOLE/SEEK_DATA arent't defined, but CONFIG_FIEMAP is s/arent't/aren't/ > Signed-off-by: Markus Armbruster <armbru@redhat.com> > --- > block/raw-posix.c | 60 ++++--------------------------------------------------- > 1 file changed, 4 insertions(+), 56 deletions(-) Reviewed-by: Eric Blake <eblake@redhat.com> -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 539 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure 2014-11-13 10:16 [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more Markus Armbruster 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 1/4] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 2/4] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster @ 2014-11-13 10:17 ` Markus Armbruster 2014-11-13 10:22 ` Max Reitz 2014-11-13 13:03 ` Kevin Wolf 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 4/4] raw-posix: Clean up around raw_co_get_block_status() Markus Armbruster 2014-11-13 13:30 ` [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more Eric Blake 4 siblings, 2 replies; 21+ messages in thread From: Markus Armbruster @ 2014-11-13 10:17 UTC (permalink / raw) To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its end. When that fails, we pretend the hole extends to the end of file. Wrong. Except when SEEK_END fails, we screw up and claim it extends to offset -1. More wrong. Fortunately, these seeks are very unlikely to fail. Fix it anyway, by returning failure. The caller will then pretend there are no holes. Inaccurate, but safe. Signed-off-by: Markus Armbruster <armbru@redhat.com> --- block/raw-posix.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index fd80d84..2a12a50 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -1494,8 +1494,9 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data, } else { /* On a hole. We need another syscall to find its end. 
*/ *data = lseek(s->fd, start, SEEK_DATA); - if (*data == -1) { - *data = lseek(s->fd, 0, SEEK_END); + if (*data < 0) { + /* no idea where the hole ends, give up (unlikely to happen) */ + return -errno; } } -- 1.9.3 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure Markus Armbruster @ 2014-11-13 10:22 ` Max Reitz 2014-11-13 13:03 ` Kevin Wolf 1 sibling, 0 replies; 21+ messages in thread From: Max Reitz @ 2014-11-13 10:22 UTC (permalink / raw) To: Markus Armbruster, qemu-devel; +Cc: kwolf, famz, tony, stefanha, pbonzini On 2014-11-13 at 11:17, Markus Armbruster wrote: > When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its > end. When that fails, we pretend the hole extends to the end of file. > Wrong. Except when SEEK_END fails, we screw up and claim it extends > to offset -1. More wrong. > > Fortunately, these seeks are very unlikely to fail. Fix it anyway, by > returning failure. The caller will then pretend there are no holes. > Inaccurate, but safe. > > Signed-off-by: Markus Armbruster <armbru@redhat.com> > --- > block/raw-posix.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) Reviewed-by: Max Reitz <mreitz@redhat.com> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure Markus Armbruster 2014-11-13 10:22 ` Max Reitz @ 2014-11-13 13:03 ` Kevin Wolf 2014-11-13 14:52 ` Eric Blake 1 sibling, 1 reply; 21+ messages in thread From: Kevin Wolf @ 2014-11-13 13:03 UTC (permalink / raw) To: Markus Armbruster; +Cc: famz, tony, qemu-devel, stefanha, pbonzini, mreitz Am 13.11.2014 um 11:17 hat Markus Armbruster geschrieben: > When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its > end. When that fails, we pretend the hole extends to the end of file. > Wrong. Wrong only in some cases, see below. > Except when SEEK_END fails, we screw up and claim it extends > to offset -1. More wrong. > > Fortunately, these seeks are very unlikely to fail. Fix it anyway, by > returning failure. The caller will then pretend there are no holes. > Inaccurate, but safe. > > Signed-off-by: Markus Armbruster <armbru@redhat.com> > --- > block/raw-posix.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/block/raw-posix.c b/block/raw-posix.c > index fd80d84..2a12a50 100644 > --- a/block/raw-posix.c > +++ b/block/raw-posix.c > @@ -1494,8 +1494,9 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data, > } else { > /* On a hole. We need another syscall to find its end. */ > *data = lseek(s->fd, start, SEEK_DATA); > - if (*data == -1) { > - *data = lseek(s->fd, 0, SEEK_END); > + if (*data < 0) { > + /* no idea where the hole ends, give up (unlikely to happen) */ Not quite unlikely. If the file ends with a sparse area, we'll get -1/ENXIO here. lseek() with SEEK_DATA starting in a hole when there is no data until EOF is actually the part that isn't documented in the man page, but ENXIO is what I'm seeing here on RHEL 7. > + return -errno; > } > } Kevin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure 2014-11-13 13:03 ` Kevin Wolf @ 2014-11-13 14:52 ` Eric Blake 2014-11-13 15:29 ` Eric Blake 0 siblings, 1 reply; 21+ messages in thread From: Eric Blake @ 2014-11-13 14:52 UTC (permalink / raw) To: Kevin Wolf, Markus Armbruster Cc: famz, tony, qemu-devel, stefanha, pbonzini, mreitz [-- Attachment #1: Type: text/plain, Size: 1472 bytes --] On 11/13/2014 06:03 AM, Kevin Wolf wrote: > Am 13.11.2014 um 11:17 hat Markus Armbruster geschrieben: >> When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its >> end. When that fails, we pretend the hole extends to the end of file. >> Wrong. > > Wrong only in some cases, see below. > >> Except when SEEK_END fails, we screw up and claim it extends >> to offset -1. More wrong. >> +++ b/block/raw-posix.c >> @@ -1494,8 +1494,9 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data, >> } else { >> /* On a hole. We need another syscall to find its end. */ >> *data = lseek(s->fd, start, SEEK_DATA); >> - if (*data == -1) { >> - *data = lseek(s->fd, 0, SEEK_END); >> + if (*data < 0) { >> + /* no idea where the hole ends, give up (unlikely to happen) */ > > Not quite unlikely. If the file ends with a sparse area, we'll get > -1/ENXIO here. > > lseek() with SEEK_DATA starting in a hole when there is no data until > EOF is actually the part that isn't documented in the man page, but > ENXIO is what I'm seeing here on RHEL 7. Here's the (proposed) POSIX wording: http://austingroupbugs.net/view.php?id=415 And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole, so maybe we should special case it. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 539 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
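For illustration only, the ENXIO special case suggested above might look roughly like the following standalone sketch. It is not the code that went into raw-posix.c; `hole_end()` is a hypothetical helper name, and the sketch assumes the caller has already learned from SEEK_HOLE that `start` lies in a hole. The fallback definition of SEEK_DATA is an assumption for Linux builds where the header hides the macro.

```c
#define _GNU_SOURCE          /* glibc exposes SEEK_DATA only with this */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#if !defined(SEEK_DATA) && defined(__linux__)
#define SEEK_DATA 3          /* Linux ABI value, used only if the header hid it */
#endif

/*
 * Hypothetical helper: find where the hole containing 'start' ends.
 * Returns 0 and stores the end offset in *data, or -errno on failure.
 */
static int hole_end(int fd, off_t start, off_t *data)
{
    off_t ret = lseek(fd, start, SEEK_DATA);

    if (ret >= 0) {
        *data = ret;         /* data follows the hole */
        return 0;
    }
    if (errno != ENXIO) {
        return -errno;       /* genuine failure: give up */
    }

    /* ENXIO: no data between 'start' and EOF, so the hole runs to EOF */
    ret = lseek(fd, 0, SEEK_END);
    if (ret < 0) {
        return -errno;
    }
    *data = ret;
    return 0;
}
```

Only ENXIO is treated as "trailing hole"; any other errno is still reported to the caller, which can then fall back to assuming everything is allocated.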
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure 2014-11-13 14:52 ` Eric Blake @ 2014-11-13 15:29 ` Eric Blake 2014-11-13 15:44 ` Max Reitz 2014-11-13 15:47 ` Eric Blake 0 siblings, 2 replies; 21+ messages in thread From: Eric Blake @ 2014-11-13 15:29 UTC (permalink / raw) To: Kevin Wolf, Markus Armbruster Cc: famz, tony, qemu-devel, stefanha, pbonzini, mreitz [-- Attachment #1: Type: text/plain, Size: 2960 bytes --] On 11/13/2014 07:52 AM, Eric Blake wrote: > On 11/13/2014 06:03 AM, Kevin Wolf wrote: >> Am 13.11.2014 um 11:17 hat Markus Armbruster geschrieben: >>> When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its >>> end. When that fails, we pretend the hole extends to the end of file. >>> Wrong. >> >> Wrong only in some cases, see below. >> >>> Except when SEEK_END fails, we screw up and claim it extends >>> to offset -1. More wrong. > > >>> +++ b/block/raw-posix.c >>> @@ -1494,8 +1494,9 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data, >>> } else { >>> /* On a hole. We need another syscall to find its end. */ >>> *data = lseek(s->fd, start, SEEK_DATA); >>> - if (*data == -1) { >>> - *data = lseek(s->fd, 0, SEEK_END); >>> + if (*data < 0) { >>> + /* no idea where the hole ends, give up (unlikely to happen) */ >> >> Not quite unlikely. If the file ends with a sparse area, we'll get >> -1/ENXIO here. >> >> lseek() with SEEK_DATA starting in a hole when there is no data until >> EOF is actually the part that isn't documented in the man page, but >> ENXIO is what I'm seeing here on RHEL 7. > > Here's the (proposed) POSIX wording: > > http://austingroupbugs.net/view.php?id=415 > > And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole, > so maybe we should special case it. > Uggh. 
Historical practice on Solaris (and therefore the POSIX wording) says that SEEK_HOLE in a trailing hole is allowed (but not required) to seek to EOF instead of reporting the offset requested. I have no clue why this was done, but it is VERY annoying - it means that if you provide an offset within a tail hole of a file, you cannot reliably tell if the file ends in a hole or with data, without ALSO trying SEEK_DATA. For applications that are reading a file sequentially but skipping over holes, this behavior is fine (it short-circuits the hole/data search points and might shave an iteration off a loop). But for OUR purposes, where we are merely trying to ascertain whether we are in a hole, we have an inaccurate response - since SEEK_HOLE does NOT return the offset we passed in, we are prone to treat the offset as belonging to data, which is a pessimization (you never get wrong results by treating a hole as data and reading it, but it is definitely slower).

I think you HAVE to call lseek() twice, both with SEEK_HOLE and with SEEK_DATA, if you want to accurately determine whether an offset happens to live within a trailing hole.

(By the way, I really wish Solaris had implemented a variant that queried, but did NOT change the file offset - maybe Linux can add that as an extension, and give it sane semantics of not special casing trailing holes...)

-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 539 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure 2014-11-13 15:29 ` Eric Blake @ 2014-11-13 15:44 ` Max Reitz 2014-11-13 15:49 ` Eric Blake 2014-11-13 15:47 ` Eric Blake 1 sibling, 1 reply; 21+ messages in thread From: Max Reitz @ 2014-11-13 15:44 UTC (permalink / raw) To: Eric Blake, Kevin Wolf, Markus Armbruster Cc: pbonzini, famz, qemu-devel, stefanha, tony On 2014-11-13 at 16:29, Eric Blake wrote: > On 11/13/2014 07:52 AM, Eric Blake wrote: >> On 11/13/2014 06:03 AM, Kevin Wolf wrote: >>> Am 13.11.2014 um 11:17 hat Markus Armbruster geschrieben: >>>> When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its >>>> end. When that fails, we pretend the hole extends to the end of file. >>>> Wrong. >>> Wrong only in some cases, see below. >>> >>>> Except when SEEK_END fails, we screw up and claim it extends >>>> to offset -1. More wrong. >> >>>> +++ b/block/raw-posix.c >>>> @@ -1494,8 +1494,9 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data, >>>> } else { >>>> /* On a hole. We need another syscall to find its end. */ >>>> *data = lseek(s->fd, start, SEEK_DATA); >>>> - if (*data == -1) { >>>> - *data = lseek(s->fd, 0, SEEK_END); >>>> + if (*data < 0) { >>>> + /* no idea where the hole ends, give up (unlikely to happen) */ >>> Not quite unlikely. If the file ends with a sparse area, we'll get >>> -1/ENXIO here. >>> >>> lseek() with SEEK_DATA starting in a hole when there is no data until >>> EOF is actually the part that isn't documented in the man page, but >>> ENXIO is what I'm seeing here on RHEL 7. >> Here's the (proposed) POSIX wording: >> >> http://austingroupbugs.net/view.php?id=415 >> >> And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole, >> so maybe we should special case it. >> > Uggh. 
Historical practice on Solaris (and therefore the POSIX wording)
> says that SEEK_HOLE in a trailing hole is allowed (but not required) to
> seek to EOF instead of reporting the offset requested. I have no clue
> why this was done, but it is VERY annoying - it means that if you
> provide an offset within a tail hole of a file, you cannot reliably tell
> if the file ends in a hole or with data, without ALSO trying SEEK_DATA.
> For applications that are reading a file sequentially but skipping over
> holes, this behavior is fine (it short-circuits the hole/data search
> points and might shave an iteration off a loop). But for OUR purposes,
> where we are merely trying to ascertain whether we are in a hole, we
> have an inaccurate response - since SEEK_HOLE does NOT return the offset
> we passed in, we are prone to treat the offset as belonging to data,
> which is a pessimization (you never get wrong results by treating a hole
> as data and reading it, but it is definitely slower).
>
> I think you HAVE to call lseek() twice, both with SEEK_HOLE and with
> SEEK_DATA, if you want to accurately determine whether an offset happens
> to live within a trailing hole.
>
> (By the way, I really wish Solaris had implemented a variant that
> queried, but did NOT change the file offset - maybe Linux can add that
> as an extension, and give it sane semantics of not special casing
> trailing holes...)

Are you asking for fiemap? :-P

Max

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure 2014-11-13 15:44 ` Max Reitz @ 2014-11-13 15:49 ` Eric Blake 2014-11-13 15:52 ` Eric Blake 0 siblings, 1 reply; 21+ messages in thread From: Eric Blake @ 2014-11-13 15:49 UTC (permalink / raw) To: Max Reitz, Kevin Wolf, Markus Armbruster Cc: pbonzini, famz, qemu-devel, stefanha, tony [-- Attachment #1: Type: text/plain, Size: 614 bytes --] On 11/13/2014 08:44 AM, Max Reitz wrote: >> (By the way, I really wish Solaris had implemented a variant that >> queried, but did NOT change the file offset - maybe Linux can add that >> as an extension, and give it sane semantics of not special casing >> trailing holes...) > > Are you asking for fiemap? :-P Not that bulky; maybe just two more constants SEEK_PEEK_HOLE and SEEK_PEEK_DATA, which return the same values as their non-peek counterparts but without modifying the fd offset. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 539 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure 2014-11-13 15:49 ` Eric Blake @ 2014-11-13 15:52 ` Eric Blake 0 siblings, 0 replies; 21+ messages in thread From: Eric Blake @ 2014-11-13 15:52 UTC (permalink / raw) To: Max Reitz, Kevin Wolf, Markus Armbruster Cc: pbonzini, famz, qemu-devel, stefanha, tony [-- Attachment #1: Type: text/plain, Size: 767 bytes --] On 11/13/2014 08:49 AM, Eric Blake wrote: > On 11/13/2014 08:44 AM, Max Reitz wrote: > >>> (By the way, I really wish Solaris had implemented a variant that >>> queried, but did NOT change the file offset - maybe Linux can add that >>> as an extension, and give it sane semantics of not special casing >>> trailing holes...) >> >> Are you asking for fiemap? :-P > > Not that bulky; maybe just two more constants SEEK_PEEK_HOLE and > SEEK_PEEK_DATA, which return the same values as their non-peek > counterparts but without modifying the fd offset. And not the first time I've requested it. From 2011: https://lkml.org/lkml/2011/4/22/91 -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 539 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
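The SEEK_PEEK_HOLE and SEEK_PEEK_DATA constants wished for above do not exist. As a rough userspace approximation only, a caller can save the file offset, probe, and restore it. The sketch below uses a hypothetical helper name, `peek_lseek()`, and is not atomic: another thread sharing the descriptor can still observe or disturb the offset between the calls.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): behave like
 * lseek(fd, offset, whence), but put the file offset back afterwards.
 * Returns the probed offset, or -errno on failure.
 */
static off_t peek_lseek(int fd, off_t offset, int whence)
{
    /* Remember where the descriptor currently points. */
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved < 0) {
        return -errno;
    }

    /* Do the real probe and keep its errno for later. */
    off_t found = lseek(fd, offset, whence);
    int probe_errno = errno;

    /* Restore the original offset even if the probe failed. */
    if (lseek(fd, saved, SEEK_SET) < 0) {
        return -errno;
    }
    return found < 0 ? -probe_errno : found;
}
```

On systems that define them, `whence` would be SEEK_HOLE or SEEK_DATA. The cost is three system calls per probe instead of one, which is part of why a kernel-side variant was being asked for.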
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure 2014-11-13 15:29 ` Eric Blake 2014-11-13 15:44 ` Max Reitz @ 2014-11-13 15:47 ` Eric Blake 2014-11-13 16:01 ` Eric Blake 2014-11-14 13:12 ` Markus Armbruster 1 sibling, 2 replies; 21+ messages in thread From: Eric Blake @ 2014-11-13 15:47 UTC (permalink / raw) To: Kevin Wolf, Markus Armbruster Cc: famz, qemu-devel, tony, mreitz, stefanha, pbonzini [-- Attachment #1: Type: text/plain, Size: 3829 bytes --]

On 11/13/2014 08:29 AM, Eric Blake wrote:

>>> lseek() with SEEK_DATA starting in a hole when there is no data until
>>> EOF is actually the part that isn't documented in the man page, but
>>> ENXIO is what I'm seeing here on RHEL 7.
>>
>> Here's the (proposed) POSIX wording:
>>
>> http://austingroupbugs.net/view.php?id=415
>>
>> And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole,
>> so maybe we should special case it.
>>
>
> Uggh. Historical practice on Solaris (and therefore the POSIX wording)
> says that SEEK_HOLE in a trailing hole is allowed (but not required) to
> seek to EOF instead of reporting the offset requested. I have no clue
> why this was done, but it is VERY annoying - it means that if you
> provide an offset within a tail hole of a file, you cannot reliably tell
> if the file ends in a hole or with data, without ALSO trying SEEK_DATA.
> For applications that are reading a file sequentially but skipping over
> holes, this behavior is fine (it short-circuits the hole/data search
> points and might shave an iteration off a loop). But for OUR purposes,
> where we are merely trying to ascertain whether we are in a hole, we
> have an inaccurate response - since SEEK_HOLE does NOT return the offset
> we passed in, we are prone to treat the offset as belonging to data,
> which is a pessimization (you never get wrong results by treating a hole
> as data and reading it, but it is definitely slower).
>
> I think you HAVE to call lseek() twice, both with SEEK_HOLE and with
> SEEK_DATA, if you want to accurately determine whether an offset happens
> to live within a trailing hole.

Here's a table of possible situations, based solely on POSIX wording (and not on actual tests on Solaris or Linux, although it shouldn't be too hard to confirm behavior):

0-length file:
lseek(fd, 0, SEEK_HOLE) => -1 ENXIO
lseek(fd, 0, SEEK_DATA) => -1 ENXIO
conclusion: 0 is at EOF

file of any size:
lseek(fd, size_or_larger, SEEK_HOLE) => -1 ENXIO
lseek(fd, size_or_larger, SEEK_DATA) => -1 ENXIO
conclusion: size_or_larger is at or beyond EOF

file where offset is in a hole, but data appears later:
lseek(fd, offset, SEEK_HOLE) => offset
lseek(fd, offset, SEEK_DATA) => end_of_hole
conclusion: offset through end_of_hole is in a hole

file where offset is data, whether or not a hole appears later:
lseek(fd, offset, SEEK_HOLE) => end_of_data
lseek(fd, offset, SEEK_DATA) => offset
conclusion: offset through end_of_data is in data

file where offset is in a tail hole, option 1:
lseek(fd, offset, SEEK_HOLE) => offset
lseek(fd, offset, SEEK_DATA) => -1 ENXIO
conclusion: offset through EOF is in hole, but another seek needed to learn EOF

file where offset is in a tail hole, option 2:
lseek(fd, offset, SEEK_HOLE) => EOF
lseek(fd, offset, SEEK_DATA) => -1 ENXIO
conclusion: offset through EOF is in hole, no additional seek needed

The two calls are both necessary, in order to learn which extent type offset belongs to, and to tell where that extent ends; and the behaviors are distinguishable (if both lseek() succeed, we have both numbers we want; if both fail with ENXIO, we know the offset is at or beyond EOF; and if only SEEK_HOLE fails with ENXIO, we know we have a trailing hole); and we can tell at runtime what to do about a trailing hole (if the return value is offset, we need one more lseek(fd, 0, SEEK_END) to find EOF; if the return value is larger than offset, we have EOF for free).
You can optimize by calling SEEK_HOLE first (if it fails with ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is insufficient to give you all the information you need. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 539 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
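The table above can be read as a small decision procedure. The following is a minimal sketch of it as a pure function over the two lseek() results, so it can be checked without a sparse-capable filesystem. All names (`extent_kind`, `extent_info`, `classify_extent`) are hypothetical, and the trailing-hole case is taken to be the one where only SEEK_DATA fails with ENXIO, which is what the two tail-hole rows of the table show.

```c
#include <errno.h>
#include <sys/types.h>

enum extent_kind {
    EXTENT_DATA,          /* offset is in data that ends at 'end' */
    EXTENT_HOLE,          /* offset is in a hole, data resumes at 'end' */
    EXTENT_TRAILING_HOLE, /* offset is in a hole that runs to EOF */
    EXTENT_EOF,           /* offset is at or beyond EOF */
    EXTENT_UNKNOWN        /* other failure, or a pair the table does not list */
};

struct extent_info {
    enum extent_kind kind;
    off_t end;            /* -1 when not known */
    int need_eof_seek;    /* trailing hole: lseek(fd, 0, SEEK_END) still needed */
};

/*
 * hole_ret/hole_errno: result and errno of lseek(fd, offset, SEEK_HOLE)
 * data_ret/data_errno: result and errno of lseek(fd, offset, SEEK_DATA)
 */
static struct extent_info classify_extent(off_t offset,
                                          off_t hole_ret, int hole_errno,
                                          off_t data_ret, int data_errno)
{
    struct extent_info r = { EXTENT_UNKNOWN, -1, 0 };

    if (hole_ret < 0 && data_ret < 0) {
        /* Both failed: only the double-ENXIO case means "at or past EOF". */
        if (hole_errno == ENXIO && data_errno == ENXIO) {
            r.kind = EXTENT_EOF;
        }
    } else if (hole_ret >= 0 && data_ret >= 0) {
        if (data_ret == offset && hole_ret > offset) {
            r.kind = EXTENT_DATA;
            r.end = hole_ret;
        } else if (hole_ret == offset && data_ret > offset) {
            r.kind = EXTENT_HOLE;
            r.end = data_ret;
        }
    } else if (hole_ret >= offset && data_ret < 0 && data_errno == ENXIO) {
        r.kind = EXTENT_TRAILING_HOLE;
        if (hole_ret == offset) {
            r.need_eof_seek = 1;  /* option 1: EOF not yet known */
        } else {
            r.end = hole_ret;     /* option 2: SEEK_HOLE returned EOF */
        }
    }
    return r;
}
```

A caller would run both lseek() calls, capture errno after each, and pass the four values in; anything classified as EXTENT_UNKNOWN can be treated as allocated, which is inaccurate but safe.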
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure 2014-11-13 15:47 ` Eric Blake @ 2014-11-13 16:01 ` Eric Blake 2014-11-14 13:12 ` Markus Armbruster 1 sibling, 0 replies; 21+ messages in thread From: Eric Blake @ 2014-11-13 16:01 UTC (permalink / raw) To: Kevin Wolf, Markus Armbruster Cc: famz, tony, qemu-devel, mreitz, stefanha, pbonzini [-- Attachment #1: Type: text/plain, Size: 997 bytes --]

On 11/13/2014 08:47 AM, Eric Blake wrote:

> The two calls are both necessary, in order to learn which extent type
> offset belongs to, and to tell where that extent ends; and the behaviors
> are distinguishable (if both lseek() succeed, we have both numbers we
> want; if both fail with ENXIO, we know the offset is at or beyond EOF;
> and if only SEEK_HOLE fails with ENXIO, we know we have a trailing
              ^
I meant SEEK_DATA here.

> hole); and we can tell at runtime what to do about a trailing hole (if
> the return value is offset, we need one more lseek(fd, 0, SEEK_END) to
> find EOF; if the return value is larger than offset, we have EOF for
> free). You can optimize by calling SEEK_HOLE first (if it fails with
> ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is
> insufficient to give you all the information you need.
>

-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 539 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure
  2014-11-13 15:47 ` Eric Blake
  2014-11-13 16:01 ` Eric Blake
@ 2014-11-14 13:12 ` Markus Armbruster
  2014-11-15 0:47 ` Eric Blake
  1 sibling, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2014-11-14 13:12 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, famz, tony, qemu-devel, mreitz, stefanha, pbonzini

Eric Blake <eblake@redhat.com> writes:

> On 11/13/2014 08:29 AM, Eric Blake wrote:
>
>>>> lseek() with SEEK_DATA starting in a hole when there is no data until
>>>> EOF is actually the part that isn't documented in the man page, but
>>>> ENXIO is what I'm seeing here on RHEL 7.
>>>
>>> Here's the (proposed) POSIX wording:
>>>
>>> http://austingroupbugs.net/view.php?id=415
>>>
>>> And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole,
>>> so maybe we should special case it.
>>>
>>
>> Uggh. Historical practice on Solaris (and therefore the POSIX wording)
>> says that SEEK_HOLE in a trailing hole is allowed (but not required) to
>> seek to EOF instead of reporting the offset requested. I have no clue
>> why this was done, but it is VERY annoying - it means that if you
>> provide an offset within a tail hole of a file, you cannot reliably tell
>> if the file ends in a hole or with data, without ALSO trying SEEK_DATA.
>> For applications that are reading a file sequentially but skipping over
>> holes, this behavior is fine (it short-circuits the hole/data search
>> points and might shave an iteration off a loop). But for OUR purposes,
>> where we are merely trying to ascertain whether we are in a hole, we
>> have an inaccurate response - since SEEK_HOLE does NOT return the offset
>> we passed in, we are prone to treat the offset as belonging to data,
>> which is a pessimization (you never get wrong results by treating a hole
>> as data and reading it, but it is definitely slower).
>>
>> I think you HAVE to call lseek() twice, both with SEEK_HOLE and with
>> SEEK_DATA, if you want to accurately determine whether an offset happens
>> to live within a trailing hole.
>
> Here's a table of possible situations, based solely on POSIX wording
> (and not on actual tests on Solaris or Linux, although it shouldn't be
> too hard to confirm behavior):
>
> 0-length file:
> lseek(fd, 0, SEEK_HOLE) => -1 ENXIO
> lseek(fd, 0, SEEK_DATA) => -1 ENXIO
> conclusion: 0 is at EOF

Isn't this a special case of the next one?

> file of any size:
> lseek(fd, size_or_larger, SEEK_HOLE) => -1 ENXIO
> lseek(fd, size_or_larger, SEEK_DATA) => -1 ENXIO
> conclusion: size_or_larger is at or beyond EOF
>
> file where offset is in a hole, but data appears later:
> lseek(fd, offset, SEEK_HOLE) => offset
> lseek(fd, offset, SEEK_DATA) => end_of_hole
> conclusion: offset through end_of_hole is in a hole
>
> file where offset is data, whether or not a hole appears later:
> lseek(fd, offset, SEEK_HOLE) => end_of_data
> lseek(fd, offset, SEEK_DATA) => offset
> conclusion: offset through end_of_data is in data
>
> file where offset is in a tail hole, option 1:
> lseek(fd, offset, SEEK_HOLE) => offset
> lseek(fd, offset, SEEK_DATA) => -1 ENXIO
> conclusion: offset through EOF is in hole, but another seek needed to
> learn EOF
>
> file where offset is in a tail hole, option 2:
> lseek(fd, offset, SEEK_HOLE) => EOF
> lseek(fd, offset, SEEK_DATA) => -1 ENXIO
> conclusion: offset through EOF is in hole, no additional seek needed
>
> The two calls are both necessary, in order to learn which extent type
> offset belongs to, and to tell where that extent ends; and the behaviors
> are distinguishable (if both lseek() succeed, we have both numbers we
> want; if both fail with ENXIO, we know the offset is at or beyond EOF;
> and if only SEEK_HOLE fails with ENXIO, we know we have a trailing
> hole); and we can tell at runtime what to do about a trailing hole (if
> the return value is offset, we need one more lseek(fd, 0, SEEK_END) to
> find EOF; if the return value is larger than offset, we have EOF for
> free). You can optimize by calling SEEK_HOLE first (if it fails with
> ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is
> insufficient to give you all the information you need.

Not discussed: how to handle failures other than ENXIO.

The appended code still avoids a second seek in one case. Useful mostly
because it saves us from handling a second seek's contradictory
information.


/*
 * Find allocation range in @bs around offset @start.
 * May change underlying file descriptor's file offset.
 * If @start is not in a hole, store @start in @data, and the
 * beginning of the next hole in @hole, and return 0.
 * If @start is in a non-trailing hole, store @start in @hole and the
 * beginning of the next non-hole in @data, and return 0.
 * If @start is in a trailing hole or beyond EOF, return -ENXIO.
 * If we can't find out, return a negative errno other than -ENXIO.
 */
static int find_allocation(BlockDriverState *bs, off_t start,
                           off_t *data, off_t *hole)
{
#if defined SEEK_HOLE && defined SEEK_DATA
    BDRVRawState *s = bs->opaque;
    off_t offs;

    /*
     * SEEK_DATA cases:
     * D1. offs == start: start is in data
     * D2. offs > start: start is in a hole, next data at offs
     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
     *     or start is beyond EOF
     *     If the latter happens, the file has been truncated behind
     *     our back since we opened it. Best we can do is treat like
     *     a trailing hole.
     * D4. offs < 0, errno != ENXIO: we learned nothing
     */
    offs = lseek(s->fd, start, SEEK_DATA);
    if (offs < 0) {
        return -errno;          /* D3 or D4 */
    }
    assert(offs >= start);

    if (offs > start) {
        /* D2: in hole, next data at offs */
        *hole = start;
        *data = offs;
        return 0;
    }

    /* D1: in data, end not yet known */

    /*
     * SEEK_HOLE cases:
     * H1. offs == start: start is in a hole
     *     If this happens here, a hole has been dug behind our back
     *     since the previous lseek().
     * H2. offs > start: either start is in data, next hole at offs,
     *     or start is in trailing hole, EOF at offs
     *     Linux treats trailing holes like any other hole: offs ==
     *     start. Solaris seeks to EOF instead: offs > start (blech).
     *     If that happens here, a hole has been dug behind our back
     *     since the previous lseek().
     * H3. offs < 0, errno = ENXIO: start is beyond EOF
     *     If this happens, the file has been truncated behind our
     *     back since we opened it. Treat it like a trailing hole.
     * H4. offs < 0, errno != ENXIO: we learned nothing
     *     Pretend we know nothing at all, i.e. "forget" about D1.
     */
    offs = lseek(s->fd, start, SEEK_HOLE);
    if (offs < 0) {
        return -errno;          /* D1 and (H3 or H4) */
    }
    assert(offs >= start);

    if (offs > start) {
        /*
         * D1 and H2: either in data, next hole at offs, or it was in
         * data but is now in a trailing hole. Treating the latter as
         * if there was data extending to EOF is safe, so simply do
         * that.
         */
        *data = start;
        *hole = offs;
        return 0;
    }

    /* D1 and H1 */
    return -EBUSY;
#else
    return -ENOTSUP;
#endif
}

^ permalink raw reply [flat|nested] 21+ messages in thread
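[Editorial sketch, not part of the thread.] The other use case mentioned in the quoted discussion, an application that reads a file sequentially and skips over holes, can be illustrated by alternating SEEK_DATA and SEEK_HOLE. The helper below only adds up the sizes of the data extents; count_data_bytes() is a hypothetical name, the code is not taken from QEMU, and it assumes a platform with SEEK_HOLE/SEEK_DATA (the numeric fallback is the Linux value).

```c
#define _GNU_SOURCE             /* glibc: expose SEEK_DATA / SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__linux__) && !defined(SEEK_DATA)
#define SEEK_DATA 3             /* Linux values, used only if the headers */
#define SEEK_HOLE 4             /* were included without _GNU_SOURCE */
#endif

/*
 * Hypothetical helper: walk @fd extent by extent and return the number
 * of bytes that lie in data extents, or -1 on an unexpected error.
 * Changes the file offset of @fd.
 */
static off_t count_data_bytes(int fd)
{
    off_t pos = 0;
    off_t total = 0;

    for (;;) {
        off_t data, hole;

        data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            /* ENXIO: only a trailing hole (or nothing) is left */
            return errno == ENXIO ? total : -1;
        }

        hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0) {
            /* ENXIO here would mean the file shrank while we walked it */
            return errno == ENXIO ? total : -1;
        }
        if (hole <= data) {
            /* a hole was dug at @data since the previous lseek(); give up */
            return -1;
        }

        total += hole - data;   /* [data, hole) is one data extent */
        pos = hole;
    }
}
```

Because SEEK_HOLE is only ever issued from an offset that SEEK_DATA just reported as data, the Solaris-style "seek to EOF in a trailing hole" behaviour does not matter for this kind of loop, which matches the remark that the behaviour is fine for sequential readers.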
* Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure
  2014-11-14 13:12 ` Markus Armbruster
@ 2014-11-15 0:47 ` Eric Blake
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Blake @ 2014-11-15 0:47 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, famz, tony, qemu-devel, mreitz, stefanha, pbonzini

On 11/14/2014 06:12 AM, Markus Armbruster wrote:

>> 0-length file:
>> lseek(fd, 0, SEEK_HOLE) => -1 ENXIO
>> lseek(fd, 0, SEEK_DATA) => -1 ENXIO
>> conclusion: 0 is at EOF
>
> Isn't this a special case of the next one?
>
>> file of any size:
>> lseek(fd, size_or_larger, SEEK_HOLE) => -1 ENXIO
>> lseek(fd, size_or_larger, SEEK_DATA) => -1 ENXIO
>> conclusion: size_or_larger is at or beyond EOF

Yes.

>>
>> The two calls are both necessary, in order to learn which extent type
>> offset belongs to, and to tell where that extent ends; and the behaviors
>> are distinguishable (if both lseek() succeed, we have both numbers we
>> want; if both fail with ENXIO, we know the offset is at or beyond EOF;
>> and if only SEEK_HOLE fails with ENXIO, we know we have a trailing
>> hole); and we can tell at runtime what to do about a trailing hole (if
>> the return value is offset, we need one more lseek(fd, 0, SEEK_END) to
>> find EOF; if the return value is larger than offset, we have EOF for
>> free). You can optimize by calling SEEK_HOLE first (if it fails with
>> ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is
>> insufficient to give you all the information you need.
>
> Not discussed: how to handle failures other than ENXIO.
>
> The appended code still avoids a second seek in one case. Useful mostly
> because it saves us from handling a second seek's contradictory
> information.

Slick - I focused on SEEK_HOLE first, but you focused on SEEK_DATA
first. Your comments make all the difference.

>
>
> /*
>  * Find allocation range in @bs around offset @start.
>  * May change underlying file descriptor's file offset.
>  * If @start is not in a hole, store @start in @data, and the
>  * beginning of the next hole in @hole, and return 0.
>  * If @start is in a non-trailing hole, store @start in @hole and the
>  * beginning of the next non-hole in @data, and return 0.
>  * If @start is in a trailing hole or beyond EOF, return -ENXIO.

And caller can blindly and safely treat that as a trailing hole, as needed.

>  * If we can't find out, return a negative errno other than -ENXIO.
>  */
> static int find_allocation(BlockDriverState *bs, off_t start,
>                            off_t *data, off_t *hole)
> {
> #if defined SEEK_HOLE && defined SEEK_DATA

I seriously doubt you'd find a system with one but not both of these
constants defined. But it doesn't hurt to check both.

>     BDRVRawState *s = bs->opaque;
>     off_t offs;
>
>     /*
>      * SEEK_DATA cases:
>      * D1. offs == start: start is in data
>      * D2. offs > start: start is in a hole, next data at offs
>      * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
>      *     or start is beyond EOF
>      *     If the latter happens, the file has been truncated behind
>      *     our back since we opened it. Best we can do is treat like
>      *     a trailing hole.
>      * D4. offs < 0, errno != ENXIO: we learned nothing
>      */

Correct.

>     offs = lseek(s->fd, start, SEEK_DATA);
>     if (offs < 0) {
>         return -errno;          /* D3 or D4 */
>     }
>     assert(offs >= start);
>
>     if (offs > start) {
>         /* D2: in hole, next data at offs */
>         *hole = start;
>         *data = offs;
>         return 0;
>     }
>
>     /* D1: in data, end not yet known */
>
>     /*
>      * SEEK_HOLE cases:
>      * H1. offs == start: start is in a hole
>      *     If this happens here, a hole has been dug behind our back
>      *     since the previous lseek().
>      * H2. offs > start: either start is in data, next hole at offs,
>      *     or start is in trailing hole, EOF at offs
>      *     Linux treats trailing holes like any other hole: offs ==
>      *     start. Solaris seeks to EOF instead: offs > start (blech).

Correct in isolation. Coupled with the additional knowledge that we are
in state D1 (and already treated D3 as a trailing hole with early exit),...

>      *     If that happens here, a hole has been dug behind our back
>      *     since the previous lseek().

...this is further true for this function.

>      * H3. offs < 0, errno = ENXIO: start is beyond EOF
>      *     If this happens, the file has been truncated behind our
>      *     back since we opened it. Treat it like a trailing hole.
>      * H4. offs < 0, errno != ENXIO: we learned nothing
>      *     Pretend we know nothing at all, i.e. "forget" about D1.
>      */
>     offs = lseek(s->fd, start, SEEK_HOLE);
>     if (offs < 0) {
>         return -errno;          /* D1 and (H3 or H4) */
>     }
>     assert(offs >= start);
>
>     if (offs > start) {
>         /*
>          * D1 and H2: either in data, next hole at offs, or it was in
>          * data but is now in a trailing hole. Treating the latter as
>          * if there was data extending to EOF is safe, so simply do
>          * that.
>          */
>         *data = start;
>         *hole = offs;
>         return 0;
>     }

Reasonable.

>
>     /* D1 and H1 */
>     return -EBUSY;
> #else
>     return -ENOTSUP;
> #endif
> }

I like it. Maybe we could do better than -ENOTSUP (by treating the
entire file as data and the hole at EOF), but if the caller handles
ENOTSUP differently from ENXIO, you don't necessarily need to do it here.

Looking forward to this in an actual v3 patch.

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

^ permalink raw reply [flat|nested] 21+ messages in thread
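[Editorial sketch, not part of the thread.] The two caller-side remarks above (-ENXIO can be treated as a trailing hole; any other failure can fall back to "everything is data") might be combined as follows. This is an assumption about how a caller could use the proposed function, not the code that was eventually merged. find_allocation_fd() is a plain-fd adaptation of the function proposed in the thread, and range_status(), enum range_kind and MIN_OFF() are hypothetical names. The caller is assumed to have clamped the requested range to the file size already.

```c
#define _GNU_SOURCE             /* glibc: expose SEEK_DATA / SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__linux__) && !defined(SEEK_DATA)
#define SEEK_DATA 3             /* Linux values, used only if the headers */
#define SEEK_HOLE 4             /* were included without _GNU_SOURCE */
#endif

#define MIN_OFF(a, b) ((a) < (b) ? (a) : (b))

enum range_kind { RANGE_DATA, RANGE_ZERO };     /* hypothetical */

/* Plain-fd adaptation of the find_allocation() proposed in the thread */
static int find_allocation_fd(int fd, off_t start, off_t *data, off_t *hole)
{
    off_t offs;

    offs = lseek(fd, start, SEEK_DATA);
    if (offs < 0) {
        return -errno;          /* -ENXIO: trailing hole or beyond EOF */
    }
    if (offs > start) {         /* in a hole, next data at offs */
        *hole = start;
        *data = offs;
        return 0;
    }

    offs = lseek(fd, start, SEEK_HOLE);
    if (offs < 0) {
        return -errno;
    }
    if (offs > start) {         /* in data, next hole (or EOF) at offs */
        *data = start;
        *hole = offs;
        return 0;
    }
    return -EBUSY;              /* file changed between the two seeks */
}

/*
 * Hypothetical caller: classify [start, start + len) and store in
 * *count how many bytes from @start share that classification.
 */
static enum range_kind range_status(int fd, off_t start, off_t len,
                                    off_t *count)
{
    off_t data, hole;
    int ret = find_allocation_fd(fd, start, &data, &hole);

    if (ret == -ENXIO) {
        *count = len;           /* trailing hole: all of it reads as zero */
        return RANGE_ZERO;
    }
    if (ret < 0) {
        *count = len;           /* no information: data is the safe answer */
        return RANGE_DATA;
    }
    if (data == start) {
        *count = MIN_OFF(len, hole - start);
        return RANGE_DATA;
    }
    assert(hole == start);
    *count = MIN_OFF(len, data - start);
    return RANGE_ZERO;
}
```

Treating an unknown range as data costs only speed, as noted earlier in the thread, which is why the error fallback reports data rather than zeroes.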
* [Qemu-devel] [PATCH v2 4/4] raw-posix: Clean up around raw_co_get_block_status()
  2014-11-13 10:16 [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more Markus Armbruster
  ` (2 preceding siblings ...)
  2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure Markus Armbruster
@ 2014-11-13 10:17 ` Markus Armbruster
  2014-11-13 10:27 ` Max Reitz
  2014-11-13 13:30 ` [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more Eric Blake
  4 siblings, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2014-11-13 10:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini

try_seek_hole() doesn't really seek to a hole, it tries to find out
whether its argument is in a hole or not, and where the hole or
non-hole ends. Rename to find_allocation() and add a proper function
comment.

Using arguments passed by reference like local variables is a bad
habit. Only assign to them right before return.

Avoid nesting of conditionals.

When find_allocation() fails, don't make up a range that'll get mapped
to nb_sectors, simply set *pnum = nb_sectors directly.

Don't repeat BDRV_BLOCK_OFFSET_VALID | start.

Drop a pointless assertion, add some meaningful ones.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 block/raw-posix.c | 62 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2a12a50..ea5b3b8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1478,28 +1478,43 @@ out:
     return result;
 }
 
-static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
-                         off_t *hole)
+/*
+ * Find allocation range in @bs around offset @start.
+ * If @start is in a hole, store @start in @hole and the end of the
+ * hole in @data, and return 0.
+ * If @start is in a data, store @start to @data, and the end of the
+ * data to @hole, and return 0.
+ * If we can't find out, return -errno.
+ */
+static int find_allocation(BlockDriverState *bs, off_t start,
+                           off_t *data, off_t *hole)
 {
 #if defined SEEK_HOLE && defined SEEK_DATA
     BDRVRawState *s = bs->opaque;
+    off_t offs;
 
-    *hole = lseek(s->fd, start, SEEK_HOLE);
-    if (*hole == -1) {
+    offs = lseek(s->fd, start, SEEK_HOLE);
+    if (offs < 0) {
         return -errno;
     }
+    assert(offs >= start);
 
-    if (*hole > start) {
+    if (offs > start) {
+        /* in data, next hole at offs */
         *data = start;
-    } else {
-        /* On a hole. We need another syscall to find its end. */
-        *data = lseek(s->fd, start, SEEK_DATA);
-        if (*data < 0) {
-            /* no idea where the hole ends, give up (unlikely to happen) */
-            return -errno;
-        }
+        *hole = offs;
+        return 0;
     }
 
+    /* in hole, end not yet known */
+    offs = lseek(s->fd, start, SEEK_DATA);
+    if (offs < 0) {
+        /* no idea where the hole ends, give up (unlikely to happen) */
+        return -errno;
+    }
+    assert(offs > start);
+    *hole = start;
+    *data = offs;
     return 0;
 #else
     return -ENOTSUP;
@@ -1543,25 +1558,22 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
         nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
     }
 
-    ret = try_seek_hole(bs, start, &data, &hole);
-    if (ret < 0) {
-        /* Assume everything is allocated. */
-        data = 0;
-        hole = start + nb_sectors * BDRV_SECTOR_SIZE;
-        ret = 0;
-    }
-
-    assert(ret >= 0);
-
-    if (data <= start) {
+    ret = BDRV_BLOCK_OFFSET_VALID | start;
+    if (find_allocation(bs, start, &data, &hole) < 0) {
+        /* No info available, so pretend there are no holes */
+        *pnum = nb_sectors;
+        ret |= BDRV_BLOCK_DATA;
+    } else if (data == start) {
         /* On a data extent, compute sectors to the end of the extent. */
         *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
-        return ret | BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+        ret |= BDRV_BLOCK_DATA;
     } else {
         /* On a hole, compute sectors to the beginning of the next extent. */
+        assert(hole == start);
         *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
-        return ret | BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID | start;
+        ret |= BDRV_BLOCK_ZERO;
     }
+    return ret;
 }
 
 static coroutine_fn BlockAIOCB *raw_aio_discard(BlockDriverState *bs,
-- 
1.9.3

^ permalink raw reply related [flat|nested] 21+ messages in thread
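[Editorial sketch, not part of the thread.] The *pnum computation in the hunk above converts a byte range into a sector count and caps it at the number of sectors the caller asked about. A standalone illustration of that arithmetic follows; sectors_in_extent() and SECTOR_SIZE are hypothetical stand-ins (SECTOR_SIZE plays the role of QEMU's BDRV_SECTOR_SIZE, 512 bytes).

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512         /* stand-in for QEMU's BDRV_SECTOR_SIZE */

/*
 * Hypothetical helper mirroring
 *     *pnum = MIN(nb_sectors, (extent_end - start) / BDRV_SECTOR_SIZE);
 * @start is the byte offset of the first sector queried, @extent_end is
 * the byte offset where the current data extent or hole ends.
 */
static int sectors_in_extent(int64_t start, int64_t extent_end,
                             int nb_sectors)
{
    int64_t n;

    assert(extent_end >= start);
    n = (extent_end - start) / SECTOR_SIZE;     /* rounds down */
    return n < nb_sectors ? (int)n : nb_sectors;
}
```

For example, with start = 0 and an extent ending at byte 4096, a query for 100 sectors yields 8, while a query for 4 sectors yields 4. Because the division rounds down, an extent that ends in the middle of a sector only contributes its whole sectors.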
* Re: [Qemu-devel] [PATCH v2 4/4] raw-posix: Clean up around raw_co_get_block_status()
  2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 4/4] raw-posix: Clean up around raw_co_get_block_status() Markus Armbruster
@ 2014-11-13 10:27 ` Max Reitz
  2014-11-13 12:48 ` Markus Armbruster
  0 siblings, 1 reply; 21+ messages in thread
From: Max Reitz @ 2014-11-13 10:27 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: kwolf, famz, tony, stefanha, pbonzini

On 2014-11-13 at 11:17, Markus Armbruster wrote:
> try_seek_hole() doesn't really seek to a hole, it tries to find out
> whether its argument is in a hole or not, and where the hole or
> non-hole ends. Rename to find_allocation() and add a proper function
> comment.
>
> Using arguments passed by reference like local variables is a bad
> habit. Only assign to them right before return.
>
> Avoid nesting of conditionals.
>
> When find_allocation() fails, don't make up a range that'll get mapped
> to nb_sectors, simply set *pnum = nb_sectors directly.
>
> Don't repeat BDRV_BLOCK_OFFSET_VALID | start.
>
> Drop a pointless assertion, add some meaningful ones.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  block/raw-posix.c | 62 +++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 37 insertions(+), 25 deletions(-)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 2a12a50..ea5b3b8 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1478,28 +1478,43 @@ out:
>      return result;
>  }
>
> -static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
> -                         off_t *hole)
> +/*
> + * Find allocation range in @bs around offset @start.
> + * If @start is in a hole, store @start in @hole and the end of the
> + * hole in @data, and return 0.
> + * If @start is in a data, store @start to @data, and the end of the

"is in a data" sounds funny enough I'd even like to keep it. Probably
should be "data extent" or something similar.

> + * data to @hole, and return 0.

Here, too.

With or without that changed:

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 4/4] raw-posix: Clean up around raw_co_get_block_status()
  2014-11-13 10:27 ` Max Reitz
@ 2014-11-13 12:48 ` Markus Armbruster
  0 siblings, 0 replies; 21+ messages in thread
From: Markus Armbruster @ 2014-11-13 12:48 UTC (permalink / raw)
  To: Max Reitz; +Cc: kwolf, famz, qemu-devel, tony, stefanha, pbonzini

Max Reitz <mreitz@redhat.com> writes:

> On 2014-11-13 at 11:17, Markus Armbruster wrote:
>> try_seek_hole() doesn't really seek to a hole, it tries to find out
>> whether its argument is in a hole or not, and where the hole or
>> non-hole ends. Rename to find_allocation() and add a proper function
>> comment.
>>
>> Using arguments passed by reference like local variables is a bad
>> habit. Only assign to them right before return.
>>
>> Avoid nesting of conditionals.
>>
>> When find_allocation() fails, don't make up a range that'll get mapped
>> to nb_sectors, simply set *pnum = nb_sectors directly.
>>
>> Don't repeat BDRV_BLOCK_OFFSET_VALID | start.
>>
>> Drop a pointless assertion, add some meaningful ones.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  block/raw-posix.c | 62 +++++++++++++++++++++++++++++++++----------------------
>>  1 file changed, 37 insertions(+), 25 deletions(-)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 2a12a50..ea5b3b8 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -1478,28 +1478,43 @@ out:
>>      return result;
>>  }
>> -static int try_seek_hole(BlockDriverState *bs, off_t start, off_t
>> *data,
>> -                         off_t *hole)
>> +/*
>> + * Find allocation range in @bs around offset @start.
>> + * If @start is in a hole, store @start in @hole and the end of the
>> + * hole in @data, and return 0.
>> + * If @start is in a data, store @start to @data, and the end of the
>
> "is in a data" sounds funny enough I'd even like to keep it. Probably
> should be "data extent" or something similar.

Okay.

>> + * data to @hole, and return 0.
>
> Here, too.
>
> With or without that changed:
>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

Thanks!

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more
  2014-11-13 10:16 [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more Markus Armbruster
  ` (3 preceding siblings ...)
  2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 4/4] raw-posix: Clean up around raw_co_get_block_status() Markus Armbruster
@ 2014-11-13 13:30 ` Eric Blake
  4 siblings, 0 replies; 21+ messages in thread
From: Eric Blake @ 2014-11-13 13:30 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: kwolf, famz, tony, mreitz, stefanha, pbonzini

On 11/13/2014 03:16 AM, Markus Armbruster wrote:
> See PATCH 2/4 for why FIEMAP needs to go. Minor fixes in 1+3/4,
> cleanup in 4/4.
>
> Would you like this included in 2.2? Maybe just the first three?

I'm okay with removing FIEMAP in 2.2; I'm okay if the entire series goes
in, once it passes review.

>
> v2:
> * PATCH 1 unchanged
> * PATCH 2 revised and split up [Paolo, Fam, Eric, Max]
>
> Markus Armbruster (4):
>   raw-posix: Fix comment for raw_co_get_block_status()
>   raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
>   raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure
>   raw-posix: Clean up around raw_co_get_block_status()
>
>  block/raw-posix.c | 117 ++++++++++++++++++------------------------------------
>  1 file changed, 38 insertions(+), 79 deletions(-)
>

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2014-11-15 0:47 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-11-13 10:16 [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more Markus Armbruster 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 1/4] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 2/4] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster 2014-11-13 10:19 ` Max Reitz 2014-11-13 14:09 ` Eric Blake 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure Markus Armbruster 2014-11-13 10:22 ` Max Reitz 2014-11-13 13:03 ` Kevin Wolf 2014-11-13 14:52 ` Eric Blake 2014-11-13 15:29 ` Eric Blake 2014-11-13 15:44 ` Max Reitz 2014-11-13 15:49 ` Eric Blake 2014-11-13 15:52 ` Eric Blake 2014-11-13 15:47 ` Eric Blake 2014-11-13 16:01 ` Eric Blake 2014-11-14 13:12 ` Markus Armbruster 2014-11-15 0:47 ` Eric Blake 2014-11-13 10:17 ` [Qemu-devel] [PATCH v2 4/4] raw-posix: Clean up around raw_co_get_block_status() Markus Armbruster 2014-11-13 10:27 ` Max Reitz 2014-11-13 12:48 ` Markus Armbruster 2014-11-13 13:30 ` [Qemu-devel] [PATCH v2 0/4] raw-posix: Get rid of FIEMAP, and more Eric Blake