From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2] block/raw-posix: Try both FIEMAP and SEEK_HOLE
Date: Tue, 06 May 2014 23:35:27 +0200 [thread overview]
Message-ID: <5369559F.8020207@redhat.com> (raw)
In-Reply-To: <536951A9.1070200@redhat.com>
On 06.05.2014 23:18, Eric Blake wrote:
> On 05/06/2014 01:00 PM, Max Reitz wrote:
>> The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
>> FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
>> compiled in in this case. However, there may be implementations which
>> support the latter but not the former (e.g., NFSv4.2) as well as vice
>> versa.
>>
>> To cover both cases, always try SEEK_HOLE/SEEK_DATA (as this will
>> probably be covered by POSIX soon) and if that does not work, fall back
>> to FIEMAP; and if that does not work either, treat everything as
>> allocated.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> v2:
>> - reworked using static functions [Stefan]
>> - changed order of FIEMAP and SEEK_HOLE [Eric]
>> ---
>> block/raw-posix.c | 135 ++++++++++++++++++++++++++++++++++--------------------
>> 1 file changed, 85 insertions(+), 50 deletions(-)
>>
>> +++ b/block/raw-posix.c
>> @@ -146,6 +146,12 @@ typedef struct BDRVRawState {
>> bool has_discard:1;
>> bool has_write_zeroes:1;
>> bool discard_zeroes:1;
>> +#if defined SEEK_HOLE && defined SEEK_DATA
>> + bool skip_seek_hole;
> Remember, SEEK_HOLE support requires both support of the kernel and the
> filesystem - we may still have cases where there are filesystems that
> lack SEEK_HOLE but still have FIEMAP, even when compiled against kernel
> headers where SEEK_HOLE is defined - on those filesystems, the kernel
> guarantees SEEK_HOLE will report the entire file as allocated, even
> though FIEMAP might be able to find holes. On the other hand, the whole
> point of this is to optimize cases; where the fallback to treating the
> whole file as allocated may be slower but still gives correct behavior
> to the guest.
>
> I like how you have a per-struct state to track whether you have
> encountered a previous failure, to skip the known-failing probe the next
> time around. But I wonder if you need a tweak...
>
>> +static int64_t try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
>> + off_t *hole, int *pnum)
>> +{
>> +#if defined SEEK_HOLE && defined SEEK_DATA
>> BDRVRawState *s = bs->opaque;
>>
>> - hole = lseek(s->fd, start, SEEK_HOLE);
>> - if (hole == -1) {
>> + if (s->skip_seek_hole) {
>> + return -ENOTSUP;
>> + }
>> +
>> + *hole = lseek(s->fd, start, SEEK_HOLE);
>> + if (*hole == -1) {
>> /* -ENXIO indicates that sector_num was past the end of the file.
>> * There is a virtual hole there. */
>> assert(errno != -ENXIO);
>>
>> - /* Most likely EINVAL. Assume everything is allocated. */
>> - *pnum = nb_sectors;
>> - return ret;
>> + s->skip_seek_hole = true;
>> + return -errno;
>> }
> ...if you are on a file system where SEEK_HOLE triggers the kernel
> fallback of "entire file is allocated", but where FIEMAP is wired up for
> that file system, would it make sense to have try_seek_hole return -1 in
> situations where lseek(s->fd, 0, SEEK_HOLE) returns the end of the file?
> Even more, should skip_seek_hole be a tri-state?
>
> state: "unknown" - if lseek(SEEK_HOLE) returns -1, state changes to
> "skip"; if it returns something other than EOF, state changes to "use";
> if it returns EOF, state remains "unknown" (as punching a hole or
> resizing the image may work to create a hole in what is currently a
> fully-allocated image)
>
> state: "skip" - we've had a previous lseek failure, no need to try
> seeking for holes on this file
>
> state: "use" - we've had success probing a hole, so we want to always
> trust SEEK_HOLE for this file
Hm, you're probably right. I just hope it won't get too complicated…
>>
>> - if (hole > start) {
>> - data = start;
>> + if (*hole > start) {
>> + *data = start;
>> } else {
>> /* On a hole. We need another syscall to find its end. */
>> - data = lseek(s->fd, start, SEEK_DATA);
>> - if (data == -1) {
>> - data = lseek(s->fd, 0, SEEK_END);
>> + *data = lseek(s->fd, start, SEEK_DATA);
>> + if (*data == -1) {
>> + *data = lseek(s->fd, 0, SEEK_END);
>> }
> Pre-existing, and unaffected by this patch, but lseek() changes the file
> offset. Does that affect any other code that might do a read()/write(),
> or are we safe in always using pread()/pwrite() so that we don't care
> about file offset?
I see multiple lseek(fd, 0, SEEK_END) in raw-posix.c for determining the
file size, so I guess it's fine.
>> + *
>> + * 'nb_sectors' is the max value 'pnum' should be set to. If nb_sectors goes
>> + * beyond the end of the disk image it will be clamped.
>> + */
>> +static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>> + int64_t sector_num,
>> + int nb_sectors, int *pnum)
> Indentation is off.
Well, yes, this is pre-existing, but I'll fix it.
>> +{
>> + off_t start, data = 0, hole = 0;
>> + int64_t ret;
>> +
>> + ret = fd_open(bs);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> +
>> + start = sector_num * BDRV_SECTOR_SIZE;
>> +
>> + ret = try_seek_hole(bs, start, &data, &hole, pnum);
> Again, a tri-state return (-1 for skipped, 0 for unknown, 1 for hole or
> EOF found) might make more sense. That way, you hit the fallback of
> declaring the whole file as allocated only if both probes reported no
> holes. Or hide the tri-state in the helper function, but map both
> "skip" and "unknown" into -1 return, and only "use" into 0 return.
I guess I'll go for the latter.
Thanks for you review,
Max
>> + if (ret < 0) {
>> + ret = try_fiemap(bs, start, &data, &hole, nb_sectors, pnum);
>> + if (ret < 0) {
>> + /* Assume everything is allocated. */
>> + data = 0;
>> + hole = start + nb_sectors * BDRV_SECTOR_SIZE;
>> + ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
>> + }
>> + }
>>
>> if (data <= start) {
>> /* On a data extent, compute sectors to the end of the extent. */
>>
next prev parent reply other threads:[~2014-05-06 21:35 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-06 19:00 [Qemu-devel] [PATCH v2] block/raw-posix: Try both FIEMAP and SEEK_HOLE Max Reitz
2014-05-06 21:18 ` Eric Blake
2014-05-06 21:35 ` Max Reitz [this message]
2014-05-06 21:47 ` Eric Blake
2014-05-06 21:48 ` Max Reitz
2014-05-07 5:56 ` Markus Armbruster
2014-05-08 18:35 ` Max Reitz
2014-05-09 8:03 ` Paolo Bonzini
2014-05-11 17:26 ` Christoph Hellwig
2014-05-14 23:13 ` Max Reitz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5369559F.8020207@redhat.com \
--to=mreitz@redhat.com \
--cc=eblake@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.