From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51219) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XpGg2-0001eh-DQ for qemu-devel@nongnu.org; Fri, 14 Nov 2014 08:12:43 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XpGfr-00054H-Pe for qemu-devel@nongnu.org; Fri, 14 Nov 2014 08:12:34 -0500 Received: from mx1.redhat.com ([209.132.183.28]:46504) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XpGfr-00054A-HO for qemu-devel@nongnu.org; Fri, 14 Nov 2014 08:12:27 -0500 From: Markus Armbruster References: <1415873823-13844-1-git-send-email-armbru@redhat.com> <1415873823-13844-4-git-send-email-armbru@redhat.com> <20141113130327.GD3933@noname.redhat.com> <5464C59A.4000602@redhat.com> <5464CE42.4000809@redhat.com> <5464D291.4000303@redhat.com> Date: Fri, 14 Nov 2014 14:12:20 +0100 In-Reply-To: <5464D291.4000303@redhat.com> (Eric Blake's message of "Thu, 13 Nov 2014 08:47:29 -0700") Message-ID: <871tp62aff.fsf@blackfin.pond.sub.org> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake Cc: Kevin Wolf , famz@redhat.com, tony@bakeyournoodle.com, qemu-devel@nongnu.org, mreitz@redhat.com, stefanha@redhat.com, pbonzini@redhat.com Eric Blake writes: > On 11/13/2014 08:29 AM, Eric Blake wrote: > >>>> lseek() with SEEK_DATA starting in a hole when there is no data until >>>> EOF is actually the part that isn't documented in the man page, but >>>> ENXIO is what I'm seeing here on RHEL 7. >>> >>> Here's the (proposed) POSIX wording: >>> >>> http://austingroupbugs.net/view.php?id=415 >>> >>> And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole, >>> so maybe we should special case it. >>> >> >> Uggh. Historical practice on Solaris (and therefore the POSIX wording) >> says that SEEK_HOLE in a trailing hole is allowed (but not required) to >> seek to EOF instead of reporting the offset requested. I have no clue >> why this was done, but it is VERY annoying - it means that if you >> provide an offset within a tail hole of a file, you cannot reliably tell >> if the file ends in a hole or with data, without ALSO trying SEEK_DATA. >> For applications that are reading a file sequentially but skipping over >> holes, this behavior is fine (it short-circuits the hole/data search >> points and might shave an iteration off a lop). But for OUR purposes, >> where we are merely trying to ascertain whether we are in a hole, we >> have an inaccurate response - since SEEK_HOLE does NOT return the offset >> we passed in, we are prone to treat the offset as belonging to data, >> which is a pessimization (you never get wrong results by treating a hole >> as data and reading it, but it is definitely slower). >> >> I think you HAVE to call lseek() twice, both with SEEK_HOLE and with >> SEEK_DATA, if you want to accurately determine whether an offset happens >> to live within a trailing hole. > > Here's a table of possible situations, based solely on POSIX wording > (and not on actual tests on Solaris or Linux, although it shouldn't be > too hard to confirm behavior): > > 0-length file: > lseek(fd, 0, SEEK_HOLE) => -1 ENXIO > lseek(fd, 0, SEEK_DATA) => -1 ENXIO > conclusion: 0 is at EOF Isn't this a special case of the next one? > file of any size: > lseek(fd, size_or_larger, SEEK_HOLE) => -1 ENXIO > lseek(fd, size_or_larger, SEEK_DATA) => -1 ENXIO > conclusion: size_or_larger is at or beyond EOF > > file where offset is in a hole, but data appears later: > lseek(fd, offset, SEEK_HOLE) => offset > lseek(fd, offset, SEEK_DATA) => end_of_hole > conclusion: offset through end_of_hole is in a hole > > file where offset is data, whether or not a hole appears later: > lseek(fd, offset, SEEK_HOLE) => end_of_data > lseek(fd, offset, SEEK_DATA) => offset > conclusion: offset through end_of_data is in data > > file where offset is in a tail hole, option 1: > lseek(fd, offset, SEEK_HOLE) => offset > lseek(fd, offset, SEEK_DATA) => -1 ENXIO > conclusion: offset through EOF is in hole, but another seek needed to > learn EOF > > file where offset is in a tail hole, option 2: > lseek(fd, offset, SEEK_HOLE) => EOF > lseek(fd, offset, SEEK_DATA) => -1 ENXIO > conclusion: offset through EOF is in hole, no additional seek needed > > The two calls are both necessary, in order to learn which extant type > offset belongs to, and to tell where that extant ends; and the behaviors > are distinguishable (if both lseek() succeed, we have both numbers we > want; if both fail with ENXIO, we know the offset is at or beyond EOF; > and if only SEEK_HOLE fails with ENXIO, we know we have a trailing > hole); and we can tell at runtime what to do about a trailing hole (if > the return value is offset, we need one more lseek(fd, 0, SEEK_END) to > find EOF; if the return value is larger than offset, we have EOF for > free). You can optimize by calling SEEK_HOLE first (if it fails with > ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is > insufficient to give you all the information you need. Not discussed: how to handle failures other than ENXIO. The appended code still avoids a second seek in one case. Useful mostly because it saves us from handling a second seek's contradictory information. /* * Find allocation range in @bs around offset @start. * May change underlying file descriptor's file offset. * If @start is not in a hole, store @start in @data, and the * beginning of the next hole in @hole, and return 0. * If @start is in a non-trailing hole, store @start in @hole and the * beginning of the next non-hole in @data, and return 0. * If @start is in a trailing hole or beyond EOF, return -ENXIO. * If we can't find out, return a negative errno other than -ENXIO. */ static int find_allocation(BlockDriverState *bs, off_t start, off_t *data, off_t *hole) { #if defined SEEK_HOLE && defined SEEK_DATA BDRVRawState *s = bs->opaque; off_t offs; /* * SEEK_DATA cases: * D1. offs == start: start is in data * D2. offs > start: start is in a hole, next data at offs * D3. offs < 0, errno = ENXIO: either start is in a trailing hole * or start is beyond EOF * If the latter happens, the file has been truncated behind * our back since we opened it. Best we can do is treat like * a trailing hole. * D4. offs < 0, errno != ENXIO: we learned nothing */ offs = lseek(s->fd, start, SEEK_DATA); if (offs < 0) { return -errno; /* D3 or D4 */ } assert(offs >= start); if (offs > start) { /* D2: in hole, next data at offs */ *hole = start; *data = offs; return 0; } /* D1: in data, end not yet known */ /* * SEEK_HOLE cases: * H1. offs == start: start is in a hole * If this happens here, a hole has been dug behind our back * since the previous lseek(). * H2. offs > start: either start is in data, next hole at offs, * or start is in trailing hole, EOF at offs * Linux treats trailing holes like any other hole: offs == * start. Solaris seeks to EOF instead: offs > start (blech). * If that happens here, a hole has been dug behind our back * since the previous lseek(). * H3. offs < 0, errno = ENXIO: start is beyond EOF * If this happens, the file has been truncated behind our * back since we opened it. Treat it like a trailing hole. * H4. offs < 0, errno != ENXIO: we learned nothing * Pretend we know nothing at all, i.e. "forget" about D1. */ offs = lseek(s->fd, start, SEEK_HOLE); if (offs < 0) { return -errno; /* D1 and (H3 or H4) */ } assert(offs >= start); if (offs > start) { /* * D1 and H2: either in data, next hole at offs, or it was in * data but is now in a trailing hole. Treating the latter as * if it there was data extending to EOF is safe, so simply do * that. */ *data = start; *hole = offs; return 0; } /* D1 and H1 */ return -EBUSY; #else return -ENOTSUP; #endif }