From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-block@nongnu.org
Cc: kwolf@redhat.com, den@openvz.org, qemu-devel@nongnu.org
Subject: Re: [PATCH 3/3] block: fail on open when file size is unaligned to request_alignment
Date: Mon, 23 Mar 2020 14:00:31 +0100 [thread overview]
Message-ID: <18af3397-be93-1cf1-189b-e8298d30ec5d@redhat.com> (raw)
In-Reply-To: <ad885deb-6384-700e-6a26-a2dd9e1e06f9@virtuozzo.com>
[-- Attachment #1.1: Type: text/plain, Size: 6223 bytes --]
On 12.03.20 13:06, Vladimir Sementsov-Ogievskiy wrote:
> 12.03.2020 14:59, Vladimir Sementsov-Ogievskiy wrote:
>> 11.03.2020 14:06, Max Reitz wrote:
>>> On 30.01.20 16:22, Vladimir Sementsov-Ogievskiy wrote:
>>>> Prior to the commit the following command lead to crash:
>>>>
>>>> ./qemu-io --image-opts -c 'write 0 512' \
>>>> driver=blkdebug,align=4096,image.driver=null-co,image.size=512
>>>>
>>>> It failes on assertion in bdrv_aligned_pwritev:
>>>> "end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE"
>>>>
>>>> The problem is obvious: 512 is aligned to 4096 and becomes larger than
>>>> file size. And the core bad thing is that file size is unaligned to
>>>> request_alignment.
>>>>
>>>> Let's catch such case on bdrv_open_driver and fail.
>>>
>>> I think we had a discussion on this before, but I can’t find it right
>>> now. (Although I think that had more to do with something in the
>>> file-posix driver, because it wasn’t limited to alignments above 512.)
>>>
>>> In any case, the file itself is totally valid. Most importantly, qcow2
>>> will regularly create files with unaligned file lengths.
>>>
>>> So let me create a qcow2 image on a 4k-aligned device:
>>>
>>> $ truncate 512M fs.img
Oops, “truncate --size=512M fs.img”, of course.
>>> $ sudo losetup -f --show -b 4096 fs.img
>>> /dev/loop0
>>> $ sudo mkfs.ext4 /dev/loop0
>>> [...]
>>> $ sudo mount /dev/loop0 /mnt/tmp
>>>
>>> $ sudo ./qemu-img create -f qcow2 /mnt/tmp/foo.qcow2 64M
>>> Formatting '/mnt/tmp/foo.qcow2', fmt=qcow2 size=67108864
>>> cluster_size=65536 lazy_refcounts=off refcount_bits=16
>>> $ sudo ./qemu-io -t none -c quit /mnt/tmp/foo.qcow2
>>> qemu-io: can't open device /mnt/tmp/foo.qcow2: File size is unaligned to
>>> request alignment
>>>
>>> Which is too bad.
>>
>> What exactly is bad?
That with this patch you can no longer create a qcow2 image on a 4k
block size medium and then open it with O_DIRECT.
>> Is it correct that create succeeded? Without new error, how would
>> qcow2 driver
>> read from unaligned tail of file-posix? It will crash, isn't it?
Well, the above test works, which is better already.
Basically part of the problem seems to be changing from unsafe caching
(during create) to O_DIRECT. If we used O_DIRECT all the time, the
image would always be aligned to the block alignment.
Now that makes me wonder how such cases are handled right now. Do we crash?
So I tried to create an image with 512 byte clusters, write to it with
normal WB caching, thus creating an image that isn’t aligned to 4k:
$ sudo ./qemu-img create -f qcow2 -o cluster_size=512 \
/mnt/tmp/foo.qcow2 64M
[...]
$ sudo ./qemu-io -c 'write -P 42 0 512' /mnt/tmp/foo.qcow2
[...]
$ printf '%x\n' $(stat -c '%s' /mnt/tmp/foo.qcow2)
4a00
(So not aligned to 4k)
$ sudo ./qemu-io -t none -c 'read -P 42 0 512' /mnt/tmp/foo.qcow2
read 512/512 bytes at offset 0
512 bytes, 1 ops; 00.00 sec (4.567 MiB/sec and 9352.6997 ops/sec)
So actually, that works just fine right now.
When I let file-posix print all pread() calls, this is what I can see:
[...]
pread(9, buf_ofs=+0, file_ofs=0x4000, size=0x1000)
-> 2560
pread(9, buf_ofs=+0xa00, file_ofs=0x4a00, size=0x600)
-> 0
[...]
So all reads are increased to the alignment (4k), and to read the data
cluster, we read all 4k from 0x4000 (even though it actually is just 512
bytes at 0x4800). This succeeds, even though just as a partial read
(2560 == 0xa00), so we try to read the rest of the 4k block, but that
doesn’t work (it’s at the EOF), and so we stop reading.
handle_aiocb_rw() then zeroes out the rest of the buffer (below the
“out” label) and everyone’s happy.
So to me it looks like it works perfectly fine right now. No crashes.
> Hmm, it crashes only on write-part if don't have RESIZE permission.. So
> for qcow2
> everything is OK.
>
> And generic read don't care about reading past-EOF because of alignment,
> read is passed
> to driver.
(And only after the test, I realize that you apparently answered the
question yourself... Oops.)
>>> So the real solution would probably... Be to align the file size up to
>>> the alignment?
>>
>> On creation, you mean?
Yes. But I suspect we’d need to do it all the time, because see my
above example: The user might access an image with WB caching at one
point, and then with O_DIRECT the next time.
But I can see two problems:
First, we don’t even know what the block alignment is. It’s hard enough
to figure it out for O_DIRECT images, I’m not sure we can reliably do so
for WB-cached images.
Second, this wouldn’t help with pre-existing images. And note that
“pre-existing” might mean an image a user creates on an FS with 512 byte
blocks and then moves it to another with 4k blocks.
So I don’t think aligning the file size up would work, actually. Except
if we decided to always align it up to 4k, but then we’d still have a
problem with older pre-existing images...
Max
>>>
>>> Max
>>>
>>>> Note, that file size and request_alignment may become out of sync
>>>> later, so this commit is not full fix of the problem, but it's better
>>>> than nothing.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>> block.c | 7 +++++++
>>>> 1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index ecd09dbbfd..4cfc6c33a2 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -1324,6 +1324,13 @@ static int bdrv_open_driver(BlockDriverState
>>>> *bs, BlockDriver *drv,
>>>> assert(bdrv_min_mem_align(bs) != 0);
>>>> assert(is_power_of_2(bs->bl.request_alignment));
>>>> + if (bs->bl.request_alignment > 512 &&
>>>> + !QEMU_IS_ALIGNED(bs->total_sectors,
>>>> bs->bl.request_alignment / 512))
>>>> + {
>>>> + error_setg(errp, "File size is unaligned to request
>>>> alignment");
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> for (i = 0; i < bs->quiesce_counter; i++) {
>>>> if (drv->bdrv_co_drain_begin) {
>>>> drv->bdrv_co_drain_begin(bs);
>>>>
>>>
>>>
>>
>>
>
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2020-03-23 13:01 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-30 15:22 [PATCH 0/3] request_alignment vs file size Vladimir Sementsov-Ogievskiy
2020-01-30 15:22 ` [PATCH 1/3] block/file-posix: add raw_getlength_fd Vladimir Sementsov-Ogievskiy
2020-03-11 10:21 ` Max Reitz
2020-01-30 15:22 ` [PATCH 2/3] block/file-posix: consider file size when fallback to max_align Vladimir Sementsov-Ogievskiy
2020-03-11 10:46 ` Max Reitz
2020-01-30 15:22 ` [PATCH 3/3] block: fail on open when file size is unaligned to request_alignment Vladimir Sementsov-Ogievskiy
2020-03-11 11:06 ` Max Reitz
2020-03-11 12:29 ` Eric Blake
2020-03-12 9:06 ` Vladimir Sementsov-Ogievskiy
2020-03-12 11:31 ` Eric Blake
2020-03-12 11:59 ` Vladimir Sementsov-Ogievskiy
2020-03-12 12:06 ` Vladimir Sementsov-Ogievskiy
2020-03-23 13:00 ` Max Reitz [this message]
2020-01-30 15:52 ` [PATCH 0/3] request_alignment vs file size no-reply
2020-01-30 16:06 ` Vladimir Sementsov-Ogievskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=18af3397-be93-1cf1-189b-e8298d30ec5d@redhat.com \
--to=mreitz@redhat.com \
--cc=den@openvz.org \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=vsementsov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).