From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35154) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Whn1X-0006Ko-8o for qemu-devel@nongnu.org; Tue, 06 May 2014 17:35:45 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Whn1Q-0004P2-Va for qemu-devel@nongnu.org; Tue, 06 May 2014 17:35:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:21898) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Whn1Q-0004Ou-FW for qemu-devel@nongnu.org; Tue, 06 May 2014 17:35:32 -0400 Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s46LZVaB030064 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Tue, 6 May 2014 17:35:31 -0400 Message-ID: <5369559F.8020207@redhat.com> Date: Tue, 06 May 2014 23:35:27 +0200 From: Max Reitz MIME-Version: 1.0 References: <1399402854-24559-1-git-send-email-mreitz@redhat.com> <536951A9.1070200@redhat.com> In-Reply-To: <536951A9.1070200@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v2] block/raw-posix: Try both FIEMAP and SEEK_HOLE List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake , qemu-devel@nongnu.org Cc: Kevin Wolf , Stefan Hajnoczi On 06.05.2014 23:18, Eric Blake wrote: > On 05/06/2014 01:00 PM, Max Reitz wrote: >> The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if >> FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even >> compiled in in this case. However, there may be implementations which >> support the latter but not the former (e.g., NFSv4.2) as well as vice >> versa. >> >> To cover both cases, always try SEEK_HOLE/SEEK_DATA (as this will >> probably be covered by POSIX soon) and if that does not work, fall bac= k >> to FIEMAP; and if that does not work either, treat everything as >> allocated. >> >> Signed-off-by: Max Reitz >> --- >> v2: >> - reworked using static functions [Stefan] >> - changed order of FIEMAP and SEEK_HOLE [Eric] >> --- >> block/raw-posix.c | 135 ++++++++++++++++++++++++++++++++++----------= ---------- >> 1 file changed, 85 insertions(+), 50 deletions(-) >> >> +++ b/block/raw-posix.c >> @@ -146,6 +146,12 @@ typedef struct BDRVRawState { >> bool has_discard:1; >> bool has_write_zeroes:1; >> bool discard_zeroes:1; >> +#if defined SEEK_HOLE && defined SEEK_DATA >> + bool skip_seek_hole; > Remember, SEEK_HOLE support requires both support of the kernel and the > filesystem - we may still have cases where there are filesystems that > lack SEEK_HOLE but still have FIEMAP, even when compiled against kernel > headers where SEEK_HOLE is defined - on those filesystems, the kernel > guarantees SEEK_HOLE will report the entire file as allocated, even > though FIEMAP might be able to find holes. On the other hand, the whol= e > point of this is to optimize cases; where the fallback to treating the > whole file as allocated may be slower but still gives correct behavior > to the guest. > > I like how you have a per-struct state to track whether you have > encountered a previous failure, to skip the known-failing probe the nex= t > time around. But I wonder if you need a tweak... > >> +static int64_t try_seek_hole(BlockDriverState *bs, off_t start, off_t= *data, >> + off_t *hole, int *pnum) >> +{ >> +#if defined SEEK_HOLE && defined SEEK_DATA >> BDRVRawState *s =3D bs->opaque; >> =20 >> - hole =3D lseek(s->fd, start, SEEK_HOLE); >> - if (hole =3D=3D -1) { >> + if (s->skip_seek_hole) { >> + return -ENOTSUP; >> + } >> + >> + *hole =3D lseek(s->fd, start, SEEK_HOLE); >> + if (*hole =3D=3D -1) { >> /* -ENXIO indicates that sector_num was past the end of the = file. >> * There is a virtual hole there. */ >> assert(errno !=3D -ENXIO); >> =20 >> - /* Most likely EINVAL. Assume everything is allocated. */ >> - *pnum =3D nb_sectors; >> - return ret; >> + s->skip_seek_hole =3D true; >> + return -errno; >> } > ...if you are on a file system where SEEK_HOLE triggers the kernel > fallback of "entire file is allocated", but where FIEMAP is wired up fo= r > that file system, would it make sense to have try_seek_hole return -1 i= n > situations where lseek(s->fd, 0, SEEK_HOLE) returns the end of the file= ? > Even more, should skip_seek_hole be a tri-state? > > state: "unknown" - if lseek(SEEK_HOLE) returns -1, state changes to > "skip"; if it returns something other than EOF, state changes to "use"; > if it returns EOF, state remains "unknown" (as punching a hole or > resizing the image may work to create a hole in what is currently a > fully-allocated image) > > state: "skip" - we've had a previous lseek failure, no need to try > seeking for holes on this file > > state: "use" - we've had success probing a hole, so we want to always > trust SEEK_HOLE for this file Hm, you're probably right. I just hope it won't get too complicated=E2=80= =A6 >> =20 >> - if (hole > start) { >> - data =3D start; >> + if (*hole > start) { >> + *data =3D start; >> } else { >> /* On a hole. We need another syscall to find its end. */ >> - data =3D lseek(s->fd, start, SEEK_DATA); >> - if (data =3D=3D -1) { >> - data =3D lseek(s->fd, 0, SEEK_END); >> + *data =3D lseek(s->fd, start, SEEK_DATA); >> + if (*data =3D=3D -1) { >> + *data =3D lseek(s->fd, 0, SEEK_END); >> } > Pre-existing, and unaffected by this patch, but lseek() changes the fil= e > offset. Does that affect any other code that might do a read()/write()= , > or are we safe in always using pread()/pwrite() so that we don't care > about file offset? I see multiple lseek(fd, 0, SEEK_END) in raw-posix.c for determining the=20 file size, so I guess it's fine. >> + * >> + * 'nb_sectors' is the max value 'pnum' should be set to. If nb_sect= ors goes >> + * beyond the end of the disk image it will be clamped. >> + */ >> +static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState = *bs, >> + int64_t sector_num, >> + int nb_sectors, int *pnum= ) > Indentation is off. Well, yes, this is pre-existing, but I'll fix it. >> +{ >> + off_t start, data =3D 0, hole =3D 0; >> + int64_t ret; >> + >> + ret =3D fd_open(bs); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + start =3D sector_num * BDRV_SECTOR_SIZE; >> + >> + ret =3D try_seek_hole(bs, start, &data, &hole, pnum); > Again, a tri-state return (-1 for skipped, 0 for unknown, 1 for hole or > EOF found) might make more sense. That way, you hit the fallback of > declaring the whole file as allocated only if both probes reported no > holes. Or hide the tri-state in the helper function, but map both > "skip" and "unknown" into -1 return, and only "use" into 0 return. I guess I'll go for the latter. Thanks for you review, Max >> + if (ret < 0) { >> + ret =3D try_fiemap(bs, start, &data, &hole, nb_sectors, pnum)= ; >> + if (ret < 0) { >> + /* Assume everything is allocated. */ >> + data =3D 0; >> + hole =3D start + nb_sectors * BDRV_SECTOR_SIZE; >> + ret =3D BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start= ; >> + } >> + } >> =20 >> if (data <=3D start) { >> /* On a data extent, compute sectors to the end of the exten= t. */ >>