From: Kevin Wolf <kwolf@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, jsnow@redhat.com,
vsementsov@virtuozzo.com, qemu-block@nongnu.org,
Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>,
Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 5/7] block: Fix BDRV_BLOCK_RAW status to honor alignment
Date: Wed, 3 Apr 2019 15:03:54 +0200 [thread overview]
Message-ID: <20190403130354.GA6108@localhost.localdomain> (raw)
In-Reply-To: <20190403030526.12258-6-eblake@redhat.com>
Am 03.04.2019 um 05:05 hat Eric Blake geschrieben:
> Previous patches mentioned how the blkdebug filter driver demonstrates
> a bug present in our NBD server; the bug is also present with the raw
> format driver when probing occurs. Basically, if we specify a
> particular alignment > 1, but defer the actual block status to the
> underlying file, and the underlying file has a smaller alignment, then
> the use of BDRV_BLOCK_RAW to defer to the underlying file can end up
> with status split at an alignment unacceptable to our layer. Many
> callers don't care, but NBD does - it is a violation of the NBD
> protocol to send status or read results split on an unaligned boundary
> (we've taught our client to be tolerant of such violations because of
> qemu 3.1; but we still need to fix our server for the sake of other
> stricter clients).
>
> This patch lays the groundwork - it adjusts bdrv_block_status to
> ensure that any time one layer defers to another via BDRV_BLOCK_RAW,
> the deferral is either truncated down to an aligned boundary, or
> multiple sub-aligned requests are coalesced into a single
> representative answer (using an implicit hole beyond EOF as
> needed). Iotest 241 exposes the effect (when format probing occurred,
> we don't want block status to subdivide the initial sector, and thus
> not any other sector either). Similarly, iotest 221 is a good
> candidate to amend to specifically track the effects; a change to a
> hole at EOF is not visible if an upper layer does not want alignment
> smaller than 512. However, this patch alone is not a complete fix - it
> is still possible to have an active layer with large alignment
> constraints backed by another layer with smaller constraints; so the
> next patch will complete the task.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> block/io.c | 108 +++++++++++++++++++++++++++++++++++--
> tests/qemu-iotests/221 | 10 ++++
> tests/qemu-iotests/221.out | 6 +++
> tests/qemu-iotests/241.out | 3 +-
> 4 files changed, 122 insertions(+), 5 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index dfc153b8d82..936877d3745 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2021,6 +2021,97 @@ int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
> return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> }
>
> +static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
> + bool want_zero,
> + int64_t offset, int64_t bytes,
> + int64_t *pnum, int64_t *map,
> + BlockDriverState **file);
> +
> +/*
> + * Returns an aligned allocation status of the specified sectors.
> + * Wrapper around bdrv_co_block_status() which requires the initial
> + * offset and count to be aligned, and guarantees the result will also
> + * be aligned (even if it requires piecing together multiple
> + * sub-aligned queries into an appropriate coalesced answer).
> + */
I think this comment should specify the result of the operation when the
aligned region consists of multiple subregions with different status.
Probably something like this:
- BDRV_BLOCK_DATA is set if the flag is set for at least one subregion
- BDRV_BLOCK_ZERO is set if the flag is set for all subregions
- BDRV_BLOCK_OFFSET_VALID is set if the flag is set for all subregions,
the provided offsets are contiguous and file is the same for all
subregions.
- BDRV_BLOCK_ALLOCATED is never set here (the caller sets it)
- BDRV_BLOCK_EOF is set if the last subregion sets it; assert that it's
not set for any other subregion
- BDRV_BLOCK_RAW is never set
- *map is only set if BDRV_BLOCK_OFFSET_VALID is set. It contains
the offset of the first subregion then.
- *file is also only set if BDRV_BLOCK_OFFSET_VALID is set. It contains
the *file of the subregions, which must be the same for all of them
(otherwise, we wouldn't have set BDRV_BLOCK_OFFSET_VALID).
- *pnum: The sum of all subregions
> +static int coroutine_fn bdrv_co_block_status_aligned(BlockDriverState *bs,
> + uint32_t align,
> + bool want_zero,
> + int64_t offset,
> + int64_t bytes,
> + int64_t *pnum,
> + int64_t *map,
> + BlockDriverState **file)
> +{
> + int ret;
> +
> + assert(is_power_of_2(align) && QEMU_IS_ALIGNED(offset | bytes, align));
> + ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
> + if (ret < 0) {
> + return ret;
> + }
> + if (!*pnum) {
> + assert(!bytes || ret & BDRV_BLOCK_EOF);
> + return ret;
> + }
> + if (*pnum >= align) {
> + if (!QEMU_IS_ALIGNED(*pnum, align)) {
> + ret &= ~BDRV_BLOCK_EOF;
> + *pnum = QEMU_ALIGN_DOWN(*pnum, align);
> + }
> + return ret;
> + }
So we decide to just shorten the function if we can return at least some
*pnum > 0. This means that is most cases, this function will have to be
called twice, once for the aligned part, and again for the misaligned
rest.
We do save a little if the caller isn't actually interested in mapping
the whole image, but just in a specific range before the misaligned
part.
So it depends on the use case whether this is an optimisation or a
pessimisation.
> + /* Merge in status for the rest of align */
> + while (*pnum < align) {
Okay, in any case, I can see it's a simplification. :-)
> + int ret2;
> + int64_t pnum2;
> + int64_t map2;
> + BlockDriverState *file2;
> +
> + ret2 = bdrv_co_block_status(bs, want_zero, offset + *pnum,
> + align - *pnum, &pnum2,
> + map ? &map2 : NULL, file ? &file2 : NULL);
> + if (ret2 < 0) {
> + return ret2;
> + }
> + if (ret2 & BDRV_BLOCK_EOF) {
> + ret |= BDRV_BLOCK_EOF;
> + if (!pnum2) {
> + pnum2 = align - *pnum;
> + ret2 |= BDRV_BLOCK_ZERO;
> + }
> + } else {
> + assert(pnum2);
> + }
> + if (ret2 & BDRV_BLOCK_DATA) {
> + ret |= BDRV_BLOCK_DATA;
> + }
> + if (!(ret2 & BDRV_BLOCK_ZERO)) {
> + ret &= ~BDRV_BLOCK_ZERO;
> + }
> + if ((ret & BDRV_BLOCK_OFFSET_VALID) &&
> + (!(ret2 & BDRV_BLOCK_OFFSET_VALID) ||
> + (map && *map + *pnum != map2) || (file && *file != file2))) {
> + ret &= ~BDRV_BLOCK_OFFSET_VALID;
> + if (map) {
> + *map = 0;
> + }
> + if (file) {
> + *file = NULL;
> + }
> + }
> + if (ret2 & BDRV_BLOCK_ALLOCATED) {
> + ret |= BDRV_BLOCK_ALLOCATED;
> + }
Hm, so if we have a region that consists of two subregion, one is
unallocated and the other one is a zero cluster. The result will be
DATA == false, ZERO == false, ALLOCATED == true. This looks a bit odd.
Did you check this case and come to the conclusion that it's okay? If
so, I think a comment would be good.
> + if (ret2 & BDRV_BLOCK_RAW) {
> + assert(ret & BDRV_BLOCK_RAW);
> + assert(ret & BDRV_BLOCK_OFFSET_VALID);
> + }
Doesn't RAW mean that we need to recurse rather than returning the flag?
Or actually, bdrv_co_block_status() shouldn't ever return the flag, so
can't we assert that it's clear?
> + *pnum += pnum2;
> + }
> + return ret;
> +}
Kevin
next prev parent reply other threads:[~2019-04-03 13:04 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-03 3:05 [Qemu-devel] [PATCH for-4.0? 0/7] Final round of NBD alignment fixes Eric Blake
2019-04-03 3:05 ` [Qemu-devel] [PATCH 1/7] nbd/server: Fix blockstatus trace Eric Blake
2019-04-05 14:13 ` Vladimir Sementsov-Ogievskiy
2019-04-03 3:05 ` [Qemu-devel] [PATCH 2/7] nbd/server: Trace server noncompliance on unaligned requests Eric Blake
2019-04-05 14:39 ` Vladimir Sementsov-Ogievskiy
2019-04-05 20:04 ` Eric Blake
2019-04-08 12:14 ` Vladimir Sementsov-Ogievskiy
2019-04-08 14:32 ` Eric Blake
2019-04-03 3:05 ` [Qemu-devel] [PATCH 3/7] nbd/server: Don't fail NBD_OPT_INFO for byte-aligned sources Eric Blake
2019-04-05 15:34 ` Vladimir Sementsov-Ogievskiy
2019-04-03 3:05 ` [Qemu-devel] [PATCH 4/7] iotests: Update 241 to expose backing layer fragmentation Eric Blake
2019-04-08 13:51 ` Vladimir Sementsov-Ogievskiy
2019-04-08 13:51 ` Vladimir Sementsov-Ogievskiy
2019-04-10 20:44 ` Eric Blake
2019-04-10 20:44 ` Eric Blake
2019-04-03 3:05 ` [Qemu-devel] [PATCH 5/7] block: Fix BDRV_BLOCK_RAW status to honor alignment Eric Blake
2019-04-03 13:03 ` Kevin Wolf [this message]
2019-04-03 14:02 ` Eric Blake
2019-04-03 3:05 ` [Qemu-devel] [PATCH 6/7] nbd/server: Avoid unaligned read/block_status from backing Eric Blake
2019-04-03 3:05 ` [Qemu-devel] [PATCH for-4.1 7/7] nbd/server: Avoid unaligned dirty-bitmap status Eric Blake
2019-04-04 14:52 ` [Qemu-devel] [PATCH for-4.0? 8/7] nbd/client: Fix error message for server with unusable sizing Eric Blake
2019-04-04 15:22 ` [Qemu-devel] [Qemu-block] " Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190403130354.GA6108@localhost.localdomain \
--to=kwolf@redhat.com \
--cc=eblake@redhat.com \
--cc=fam@euphon.net \
--cc=jsnow@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
--cc=vsementsov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.