From: Kevin Wolf <kwolf@redhat.com>
To: Anton Nefedov <anton.nefedov@virtuozzo.com>
Cc: qemu-devel@nongnu.org, den@virtuozzo.com, mreitz@redhat.com,
"Denis V. Lunev" <den@openvz.org>
Subject: Re: [Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk
Date: Fri, 26 May 2017 10:11:47 +0200
Message-ID: <20170526081147.GC7211@noname.str.redhat.com>
In-Reply-To: <1495186480-114192-2-git-send-email-anton.nefedov@virtuozzo.com>
On 19.05.2017 at 11:34, Anton Nefedov wrote:
> From: "Denis V. Lunev" <den@openvz.org>
>
> Currently each single write operation can result in 3 write operations
> if guest offsets are not cluster aligned. One write is performed for the
> real payload and two for COW-ed areas. Thus the data possibly lays
> non-contiguously on the host filesystem. This will reduce further
> sequential read performance significantly.
>
> The patch allocates the space in the file with cluster granularity,
> ensuring
> 1. better host offset locality
> 2. less space allocation operations
> (which can be expensive on distributed storages)
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
I don't think this is the right approach. You end up with two
write operations: One write_zeroes and then one data write. If the
backend actually supports efficient zero writes, write_zeroes won't
necessarily allocate space, but writing data will definitely split
the already existing allocation. If anything, we need a new
bdrv_allocate() or something that would call fallocate() instead of
abusing write_zeroes.
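
To illustrate the allocation-only idea, here is a minimal sketch outside of
QEMU. The helper name host_allocate_range() is hypothetical, and it uses
posix_fallocate() as a portable stand-in for fallocate(). It reserves host
space for a range without issuing a zero write; a real bdrv_allocate() would
additionally have to go through the block layer's request handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not a QEMU function): reserve host space for
 * [offset, offset + bytes) without writing any payload data.
 * Returns 0 on success or a negative errno value.
 */
static int host_allocate_range(int fd, uint64_t offset, uint64_t bytes)
{
    if (bytes == 0) {
        return 0; /* nothing to reserve */
    }

    /* posix_fallocate() returns the error number directly, not via errno */
    int err = posix_fallocate(fd, (off_t)offset, (off_t)bytes);
    return err ? -err : 0;
}
```

Unlike a zero write, this cannot be turned into a hole by a backend that
handles zeroes efficiently, which is the case the paragraph above worries
about.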
It seems much clearer to me that simply unifying the three write
requests into a single one is an improvement. And it's easy to do; I
even had a patch once to do this. The reason that I didn't send it was
that it seemed to conflict with the data cache approach.
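
A rough sketch of what unifying the three writes could look like at the
host file level, assuming plain POSIX file descriptors rather than QEMU's
QEMUIOVector machinery. The function write_cluster_merged() and its
parameters are hypothetical. It submits the COW head, the guest payload and
the COW tail as one vectored request, so the host sees a single contiguous
write.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Append a buffer to the iovec array unless it is empty. */
static void iov_add(struct iovec *iov, int *n, const void *buf, size_t len)
{
    if (len > 0) {
        iov[*n].iov_base = (void *)buf; /* pwritev() only reads from it */
        iov[*n].iov_len = len;
        (*n)++;
    }
}

/*
 * Hypothetical helper (not QEMU code): write COW head, payload and COW
 * tail back to back, starting at host_offset, with one pwritev() call.
 * Returns the number of bytes written, or -1 with errno set.
 * A short write is possible and is left to the caller to handle.
 */
static ssize_t write_cluster_merged(int fd, off_t host_offset,
                                    const void *head, size_t head_len,
                                    const void *payload, size_t payload_len,
                                    const void *tail, size_t tail_len)
{
    struct iovec iov[3];
    int n = 0;

    iov_add(iov, &n, head, head_len);
    iov_add(iov, &n, payload, payload_len);
    iov_add(iov, &n, tail, tail_len);

    if (n == 0) {
        return 0; /* nothing to write */
    }
    return pwritev(fd, iov, n, host_offset);
}
```

For a cluster-aligned guest write both COW regions are empty and the call
degenerates to an ordinary single-buffer write.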
> block/qcow2.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index a8d61f0..2e6a0ec 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1575,6 +1575,32 @@ fail:
> return ret;
> }
>
> +static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
> +{
> + BDRVQcow2State *s = bs->opaque;
> + BlockDriverState *file = bs->file->bs;
> + QCowL2Meta *m;
> + int ret;
> +
> + for (m = l2meta; m != NULL; m = m->next) {
> + uint64_t bytes = m->nb_clusters << s->cluster_bits;
> +
> + if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
> + continue;
> + }
> +
> + /* try to alloc host space in one chunk for better locality */
> + ret = file->drv->bdrv_co_pwrite_zeroes(file, m->alloc_offset, bytes, 0);
No. This is what you bypass:
* All sanity checks that the block layer does
* bdrv_inc/dec_in_flight(), which is required for drain to work
correctly. Not doing this will cause crashes.
* tracked_request_begin/end(), mark_request_serialising() and
wait_serialising_requests(), which are required for serialising
requests to work correctly
* Ensuring correct request alignment for file. This means that e.g.
qcow2 with cluster size 512 on a host with a 4k native disk will
break.
* blkdebug events
* before_write_notifiers. Not calling these will cause corrupted backups
if someone backs up file.
* Dirty bitmap updates
* Updating write_gen, wr_highest_offset and total_sectors
* Ensuring that bl.max_pwrite_zeroes and bl.pwrite_zeroes_alignment are
respected
And these are just the obvious things. I'm sure I missed some.
> + if (ret != 0) {
> + continue;
> + }
> +
> + file->total_sectors = MAX(file->total_sectors,
> + (m->alloc_offset + bytes) / BDRV_SECTOR_SIZE);
You only compensate for part of a single item in the list above, by
duplicating code from block/io.c. This is not how to do things.
As I said above, I think you don't really want write_zeroes anyway, but
if you wanted a write_zeroes "but only if it's efficient" (which I'm not
sure is a good thing to want), then a better way might be introducing a
new request flag.
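
As a toy model of such a flag, the sketch below uses made-up names
(REQ_ZERO_NO_FALLBACK, ToyBackend, toy_write_zeroes()); none of them are
QEMU API. The point it tries to show is that the decision "efficient or
fail" is made in one generic entry point, so the caller can fall back to
doing nothing instead of bypassing the block layer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request flag: fail rather than emulate zeroing with writes. */
enum { REQ_ZERO_NO_FALLBACK = 1 << 0 };

/* Hypothetical backend model with counters to observe what happened. */
typedef struct ToyBackend {
    bool has_fast_zero;          /* can zero a range without writing data */
    uint64_t fast_zero_calls;    /* number of efficient zero requests */
    uint64_t slow_bytes_written; /* bytes written by the emulated path */
} ToyBackend;

/*
 * Generic entry point: use the efficient path when available; otherwise
 * either refuse (flag set) or fall back to writing explicit zeroes.
 * Returns 0 on success or a negative errno value.
 */
static int toy_write_zeroes(ToyBackend *b, uint64_t offset, uint64_t bytes,
                            int flags)
{
    (void)offset; /* the toy model does not track positions */

    if (b->has_fast_zero) {
        b->fast_zero_calls++;
        return 0;
    }
    if (flags & REQ_ZERO_NO_FALLBACK) {
        return -ENOTSUP; /* caller asked for "only if it's efficient" */
    }
    b->slow_bytes_written += bytes; /* emulated with a real data write */
    return 0;
}
```

A caller that only wants the optimisation would pass the flag and treat
-ENOTSUP as "skip it", while ordinary zero writes keep the fallback.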
> + }
> +}
> +
> static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
> uint64_t bytes, QEMUIOVector *qiov,
> int flags)
Kevin