From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, qemu-devel@nongnu.org
Subject: [Qemu-devel] [PULL 01/37] block: Add .bdrv_co_block_status() callback
Date: Fri, 2 Mar 2018 19:54:12 +0100
Message-ID: <20180302185448.6314-2-kwolf@redhat.com>
In-Reply-To: <20180302185448.6314-1-kwolf@redhat.com>

From: Eric Blake <eblake@redhat.com>

We are gradually moving away from sector-based interfaces, towards
byte-based ones. Now that the block layer exposes byte-based
allocation, it's time to tackle the drivers. Add a new callback that
can operate at byte granularity. Subsequent patches will then update
the individual drivers, and finally remove .bdrv_co_get_block_status().

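To make the change of units concrete, here is a minimal standalone sketch (not QEMU code) contrasting the two callback shapes. The toy_* functions and TOY_* constants are hypothetical stand-ins, and the flag values are assumptions chosen only for this illustration. The old shape counts in 512-byte sectors with an int length; the new shape counts in bytes with 64-bit lengths and reports the mapped offset through a separate map output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for this sketch only (not taken from QEMU headers). */
#define TOY_SECTOR_SIZE        512
#define TOY_BLOCK_DATA         0x01
#define TOY_BLOCK_OFFSET_VALID 0x04

/* Old shape: sector_num and nb_sectors are in 512-byte sectors, and
 * *pnum is returned in sectors as well. */
static int64_t toy_get_block_status(int64_t sector_num, int nb_sectors,
                                    int *pnum)
{
    (void)sector_num;
    *pnum = nb_sectors;              /* report the whole request as data */
    return TOY_BLOCK_DATA;
}

/* New shape: offset, bytes and *pnum are in bytes; *map carries the host
 * offset whenever TOY_BLOCK_OFFSET_VALID is returned. */
static int toy_block_status(bool want_zero, int64_t offset, int64_t bytes,
                            int64_t *pnum, int64_t *map)
{
    (void)want_zero;                 /* a toy driver may ignore the hint */
    assert(bytes > 0);
    *pnum = bytes;                   /* report the whole request as data */
    *map = offset;                   /* identity mapping in this toy driver */
    return TOY_BLOCK_DATA | TOY_BLOCK_OFFSET_VALID;
}
```

Querying 4096 bytes at offset 8192 asks the old shape for 8 sectors starting at sector 16, while the new shape takes the byte values directly.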
The new code also passes through the 'want_zero' hint, which will
allow subsequent patches to further optimize callers that only care
about how much of the image is allocated (want_zero is false),
rather than full details about runs of zeroes and which offsets the
allocation actually maps to (want_zero is true). As part of this
effort, fix another part of the documentation: the claim in commit
4c41cb4 that BDRV_BLOCK_ALLOCATED is short for 'DATA || ZERO' is
not true at the block layer (see commit e88ae2264), even though it is
how the bit is computed from the driver layer. After all, there
are intentional cases where we return ZERO but not ALLOCATED at
the block layer: when we know that a read sees zero because the
backing file is too short. Note that the driver interface thus
differs slightly from the public interface with regard to which
bits will be set and what guarantees are provided on input.

We also add an assertion that any driver using the new callback
makes progress (the only time pnum will be 0 is if the block layer
already handled an out-of-bounds request, or if there is an error);
the old driver interface did not provide this guarantee, which
could lead to infinite loops in drastic corner-case failures.

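The progress guarantee matters to any caller that walks an image through repeated queries. In this hypothetical sketch, toy_status stands in for a driver callback over a made-up layout of 4 KiB extents in which even-numbered extents are allocated, and count_allocated is a caller loop. Without the assertion, a callback that returned success with *pnum == 0 would make the loop spin forever on the same offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for this sketch: 4 KiB extents, even ones allocated. */
#define TOY_EXTENT 4096

/* Stand-in for a driver callback: reports the run from offset to the end
 * of the current extent (clamped to bytes). Returns 1 if allocated. */
static int toy_status(int64_t offset, int64_t bytes, int64_t *pnum)
{
    int64_t extent_end = (offset / TOY_EXTENT + 1) * TOY_EXTENT;
    int64_t run = extent_end - offset;

    *pnum = run < bytes ? run : bytes;
    return (offset / TOY_EXTENT) % 2 == 0;
}

/* Caller loop: sums allocated bytes in [0, size). */
static int64_t count_allocated(int64_t size)
{
    int64_t offset = 0, total = 0;

    while (offset < size) {
        int64_t pnum;
        int allocated = toy_status(offset, size - offset, &pnum);

        assert(pnum > 0);            /* the driver must make progress */
        if (allocated) {
            total += pnum;
        }
        offset += pnum;
    }
    return total;
}
```

For a 16 KiB image, count_allocated(16384) returns 8192 (extents 0 and 2).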
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/block.h | 14 +++++++-------
include/block/block_int.h | 20 +++++++++++++++-----
block/io.c | 28 +++++++++++++++++++---------
3 files changed, 41 insertions(+), 21 deletions(-)
diff --git a/include/block/block.h b/include/block/block.h
index 19b3ab9cb5..947e8876cd 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -115,19 +115,19 @@ typedef struct HDGeometry {
* BDRV_BLOCK_ZERO: offset reads as zero
* BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
* BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
- * layer (short for DATA || ZERO), set by block layer
- * BDRV_BLOCK_EOF: the returned pnum covers through end of file for this layer
+ * layer rather than any backing, set by block layer
+ * BDRV_BLOCK_EOF: the returned pnum covers through end of file for this
+ * layer, set by block layer
*
* Internal flag:
* BDRV_BLOCK_RAW: for use by passthrough drivers, such as raw, to request
* that the block layer recompute the answer from the returned
* BDS; must be accompanied by just BDRV_BLOCK_OFFSET_VALID.
*
- * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 (BDRV_BLOCK_OFFSET_MASK) of
- * the return value (old interface) or the entire map parameter (new
- * interface) represent the offset in the returned BDS that is allocated for
- * the corresponding raw data. However, whether that offset actually
- * contains data also depends on BDRV_BLOCK_DATA, as follows:
+ * If BDRV_BLOCK_OFFSET_VALID is set, the map parameter represents the
+ * host offset within the returned BDS that is allocated for the
+ * corresponding raw guest data. However, whether that offset
+ * actually contains data also depends on BDRV_BLOCK_DATA, as follows:
*
* DATA ZERO OFFSET_VALID
* t t t sectors read as zero, returned file is zero at offset
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5ea63f8fa8..c93722b43a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -202,15 +202,25 @@ struct BlockDriver {
/*
* Building block for bdrv_block_status[_above] and
* bdrv_is_allocated[_above]. The driver should answer only
- * according to the current layer, and should not set
- * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW. See block.h
- * for the meaning of _DATA, _ZERO, and _OFFSET_VALID. The block
- * layer guarantees input aligned to request_alignment, as well as
- * non-NULL pnum and file.
+ * according to the current layer, and should only need to set
+ * BDRV_BLOCK_DATA, BDRV_BLOCK_ZERO, BDRV_BLOCK_OFFSET_VALID,
+ * and/or BDRV_BLOCK_RAW; if the current layer defers to a backing
+ * layer, the result should be 0 (and not BDRV_BLOCK_ZERO). See
+ * block.h for the overall meaning of the bits. As a hint, the
+ * flag want_zero is true if the caller cares more about precise
+ * mappings (favor accurate _OFFSET_VALID/_ZERO) or false for
+ * overall allocation (favor larger *pnum, perhaps by reporting
+ * _DATA instead of _ZERO). The block layer guarantees input
+ * clamped to bdrv_getlength() and aligned to request_alignment,
+ * as well as non-NULL pnum, map, and file; in turn, the driver
+ * must return an error or set pnum to an aligned non-zero value.
*/
int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, int *pnum,
BlockDriverState **file);
+ int coroutine_fn (*bdrv_co_block_status)(BlockDriverState *bs,
+ bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
+ int64_t *map, BlockDriverState **file);
/*
* Invalidate any cached meta-data.
diff --git a/block/io.c b/block/io.c
index 89d0745e95..b00c7e2e2c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1899,10 +1899,10 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
* Drivers not implementing the functionality are assumed to not support
* backing files, hence all their sectors are reported as allocated.
*
- * If 'want_zero' is true, the caller is querying for mapping purposes,
- * and the result should include BDRV_BLOCK_OFFSET_VALID and
- * BDRV_BLOCK_ZERO where possible; otherwise, the result may omit those
- * bits particularly if it allows for a larger value in 'pnum'.
+ * If 'want_zero' is true, the caller is querying for mapping
+ * purposes, with a focus on valid BDRV_BLOCK_OFFSET_VALID, _DATA, and
+ * _ZERO where possible; otherwise, the result favors larger 'pnum',
+ * with a focus on accurate BDRV_BLOCK_ALLOCATED.
*
* If 'offset' is beyond the end of the disk image the return value is
* BDRV_BLOCK_EOF and 'pnum' is set to 0.
@@ -1959,7 +1959,7 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
/* Must be non-NULL or bdrv_getlength() would have failed */
assert(bs->drv);
- if (!bs->drv->bdrv_co_get_block_status) {
+ if (!bs->drv->bdrv_co_get_block_status && !bs->drv->bdrv_co_block_status) {
*pnum = bytes;
ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
if (offset + bytes == total_size) {
@@ -1976,13 +1976,14 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
bdrv_inc_in_flight(bs);
/* Round out to request_alignment boundaries */
- /* TODO: until we have a byte-based driver callback, we also have to
- * round out to sectors, even if that is bigger than request_alignment */
- align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
+ align = bs->bl.request_alignment;
+ if (bs->drv->bdrv_co_get_block_status && align < BDRV_SECTOR_SIZE) {
+ align = BDRV_SECTOR_SIZE;
+ }
aligned_offset = QEMU_ALIGN_DOWN(offset, align);
aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
- {
+ if (bs->drv->bdrv_co_get_block_status) {
int count; /* sectors */
int64_t longret;
@@ -2007,6 +2008,15 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
}
ret = longret & ~BDRV_BLOCK_OFFSET_MASK;
*pnum = count * BDRV_SECTOR_SIZE;
+ } else {
+ ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
+ aligned_bytes, pnum, &local_map,
+ &local_file);
+ if (ret < 0) {
+ *pnum = 0;
+ goto out;
+ }
+ assert(*pnum); /* The block driver must make progress */
}
/*
--
2.13.6
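The alignment logic in the block/io.c hunk can be exercised in isolation. This minimal sketch restates it with hypothetical names: toy_align_request, TOY_SECTOR_SIZE, and division-based rounding macros that, unlike a bitmask-based rounding helper, do not assume a power-of-two alignment. The legacy flag models a driver that still implements only the sector-based callback and therefore forces at least 512-byte alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512

/* Division-based rounding helpers; they work for any positive alignment. */
#define TOY_ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define TOY_ROUND_UP(n, m)   (((n) + (m) - 1) / (m) * (m))

struct toy_range {
    int64_t offset;
    int64_t bytes;
};

/* Widen [offset, offset + bytes) to the alignment the driver can handle. */
static struct toy_range toy_align_request(int64_t offset, int64_t bytes,
                                          int64_t request_alignment,
                                          bool legacy_sector_callback)
{
    int64_t align = request_alignment;
    struct toy_range r;

    assert(align > 0 && bytes > 0);
    if (legacy_sector_callback && align < TOY_SECTOR_SIZE) {
        align = TOY_SECTOR_SIZE;     /* old callback only understands sectors */
    }
    r.offset = TOY_ALIGN_DOWN(offset, align);
    r.bytes = TOY_ROUND_UP(offset + bytes, align) - r.offset;
    return r;
}
```

With these helpers, a one-byte query at offset 700 reaches a byte-granular driver unchanged, whereas a legacy driver sees the whole sector [512, 1024).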