From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
qemu-devel@nongnu.org, qemu-stable@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/2] block/file-posix: Unaligned O_DIRECT block-status
Date: Wed, 15 May 2019 06:08:33 +0200
Message-ID: <61ff8c16-e849-0aab-0bae-230128b692dc@redhat.com>
In-Reply-To: <88ab9614-e1ec-650f-8834-4a906768aedb@redhat.com>
On 14.05.19 23:50, Eric Blake wrote:
> On 5/14/19 4:42 PM, Max Reitz wrote:
>> Currently, qemu crashes whenever someone queries the block status of an
>> unaligned image tail of an O_DIRECT image:
>> $ echo > foo
>> $ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
>> Offset Length Mapped to File
>> qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
>> QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
>> failed.
>>
>> This is because bdrv_co_block_status() checks that the result returned
>> by the driver's implementation is aligned to the request_alignment, but
>> file-posix can fail to do so, which is actually mentioned in a comment
>> there: "[...] possibly including a partial sector at EOF".
>>
>> Fix this by rounding up those partial sectors.
>>
>> There are two possible alternative fixes:
>> (1) We could refuse to open unaligned image files with O_DIRECT
>> altogether. That sounds reasonable until you realize that qcow2
>> does not necessarily fill up its metadata clusters, and that nobody
>> runs qemu-img create with O_DIRECT. Therefore, unpreallocated qcow2
>> files usually have an unaligned image tail.
>
> Yep, non-starter.
>
>>
>> (2) bdrv_co_block_status() could ignore unaligned tails. It actually
>> throws away everything past the EOF already, so that sounds
>> reasonable.
>> Unfortunately, the block layer knows file lengths only with a
>> granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
>> would have to guess whether its file length information is inexact
>> or whether the driver is broken.
>
> Well, if I ever get around to my thread of making the block layer honor
> byte-accurate sizes, instead of rounding up, then there is no longer
> that inexactness. I think our mails crossed, and you missed another idea
> of mine of having block drivers (probably only file-posix, per your
> audit) set BDRV_BLOCK_EOF when returning an unaligned answer due to EOF,
> as the key for letting the block layer know whether the unaligned answer
> was due to size rounding.
Yes, that EOF change makes sense, I think. Not least because right now
the EOF detection in block/io.c has to be a bit wonky considering that
it's inexact... But to be honest, returning the EOF flag from the
drivers would have required me to modify all drivers. I felt like maybe
that's something to be left for another time. :-)

OTOH, I don't know whether returning the EOF flag from the drivers would
still make sense if we had a byte-accurate bdrv_getlength()...
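
To make the EOF-flag idea discussed above concrete, here is a minimal sketch of how a block-layer check might accept an unaligned answer only when the driver marks it as ending at EOF. Everything in it is an assumption for illustration: `SKETCH_BLOCK_EOF` merely stands in for BDRV_BLOCK_EOF, and `driver_answer_ok()` is a hypothetical helper, not a function in block/io.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit for this sketch; it stands in for BDRV_BLOCK_EOF. */
#define SKETCH_BLOCK_EOF 0x20

/*
 * Hypothetical block-layer check on a driver's block-status answer:
 * an aligned, non-empty length is always fine; an unaligned length is
 * tolerated only when the driver says the extent ends at EOF.
 */
static bool driver_answer_ok(int64_t pnum, int64_t align, int flags)
{
    if (pnum <= 0) {
        return false;
    }
    if (pnum % align == 0) {
        return true;
    }
    return (flags & SKETCH_BLOCK_EOF) != 0;
}
```

With such a flag the block layer would not need to guess whether an unaligned length comes from size rounding or from a broken driver, but every driver that can return an unaligned tail would have to set it.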
>> Fixing what raw_co_block_status() returns is the safest thing to do.
>
> Agree.
>
>>
>> There seems to be no other block driver that sets request_alignment and
>> does not make sure that it always returns aligned values.
>
> Thanks for auditing.
>
>>
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> block/file-posix.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index e09e15bbf8..f489a5420c 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -2488,6 +2488,9 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>> off_t data = 0, hole = 0;
>> int ret;
>>
>> + assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment) &&
>> + QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
>> +
>
> Can write in one line as:
>
> assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));
Ah, yeah, sure, why not.
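
For readers wondering why the one-line form is equivalent: a small sketch, assuming the alignment is a power of two (as request_alignment is expected to be) and that offset and bytes are non-negative. `SKETCH_IS_ALIGNED` is a local stand-in for QEMU_IS_ALIGNED(), and both helper functions are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for QEMU_IS_ALIGNED(); not taken from the QEMU headers. */
#define SKETCH_IS_ALIGNED(n, m) (((n) % (m)) == 0)

/* Two separate checks, as in the posted patch. */
static bool both_aligned(int64_t offset, int64_t bytes, int64_t align)
{
    return SKETCH_IS_ALIGNED(offset, align) &&
           SKETCH_IS_ALIGNED(bytes, align);
}

/*
 * Single check on the bitwise OR. For a power-of-two alignment, a value
 * is aligned exactly when its low bits are all zero, and the OR has zero
 * low bits only if both operands do.
 */
static bool or_aligned(int64_t offset, int64_t bytes, int64_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return SKETCH_IS_ALIGNED(offset | bytes, align);
}
```

The equivalence does not hold for alignments that are not powers of two, which is why the sketch asserts that precondition.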
>> ret = fd_open(bs);
>> if (ret < 0) {
>> return ret;
>> @@ -2513,6 +2516,20 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>> /* On a data extent, compute bytes to the end of the extent,
>> * possibly including a partial sector at EOF. */
>> *pnum = MIN(bytes, hole - offset);
>> +
>> + /*
>> + * We are not allowed to return partial sectors, though, so
>> + * round up if necessary.
>> + */
>> + if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
>> + int64_t file_length = raw_getlength(bs);
>> + if (file_length > 0) {
>> + /* Ignore errors, this is just a safeguard */
>> + assert(hole == file_length);
>> + }
>> + *pnum = ROUND_UP(*pnum, bs->bl.request_alignment);
>> + }
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
Thanks!
I'll send a v2 with shorter assert().
Max
> bl.request_alignment is normally 1 (making this a no-op), but is
> definitely larger for O_DIRECT images (where rounding up and treating
> the post-EOF hole the same as the rest of the sector is the same thing
> that NBD chose to do).
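
The rounding in the hunk above can be tried out in isolation. This is only a sketch of the arithmetic, not the driver code: `SKETCH_MIN` and `SKETCH_ROUND_UP` are local stand-ins for QEMU's MIN() and ROUND_UP(), `data_extent_pnum()` is a hypothetical helper, and the raw_getlength() safeguard is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for QEMU's MIN() and ROUND_UP(); assumptions of this sketch. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))
#define SKETCH_ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/*
 * Length of the data extent starting at offset, where hole is the offset
 * of the next hole (or EOF). A partial sector at EOF is rounded up to
 * the request alignment.
 */
static int64_t data_extent_pnum(int64_t offset, int64_t bytes,
                                int64_t hole, int64_t align)
{
    int64_t pnum = SKETCH_MIN(bytes, hole - offset);

    if (pnum % align != 0) {
        pnum = SKETCH_ROUND_UP(pnum, align);
    }
    return pnum;
}
```

For the one-byte file from the commit message, queried with a 512-byte alignment, `data_extent_pnum(0, 512, 1, 512)` gives 512 instead of 1. Because bytes is itself aligned, the rounded value can never exceed it.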