From: Anton Nefedov <anton.nefedov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, "Denis V. Lunev" <den@openvz.org>,
den@virtuozzo.com, mreitz@redhat.com
Subject: Re: [Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk
Date: Tue, 23 May 2017 11:28:39 +0300 [thread overview]
Message-ID: <761233e8-026d-732c-4527-a744b16eba3e@virtuozzo.com> (raw)
In-Reply-To: <f628e214-2faa-0cf2-c5d2-39b0b6505fc2@redhat.com>
On 05/22/2017 10:00 PM, Eric Blake wrote:
> On 05/19/2017 04:34 AM, Anton Nefedov wrote:
>> From: "Denis V. Lunev" <den@openvz.org>
>>
>> Currently each single write operation can result in 3 write operations
>> if guest offsets are not cluster aligned. One write is performed for the
>> real payload and two for COW-ed areas. Thus the data possibly lays
>> non-contiguously on the host filesystem. This will reduce further
>> sequential read performance significantly.
>>
>> The patch allocates the space in the file with cluster granularity,
>> ensuring
>> 1. better host offset locality
>> 2. less space allocation operations
>> (which can be expensive on distributed storages)
>
> s/storages/storage/
>
Done.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> ---
>> block/qcow2.c | 32 +++++++++++++++++++++++++++++++-
>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>
>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index a8d61f0..2e6a0ec 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -1575,6 +1575,32 @@ fail:
>> return ret;
>> }
>>
>> +static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
>> +{
>> + BDRVQcow2State *s = bs->opaque;
>> + BlockDriverState *file = bs->file->bs;
>> + QCowL2Meta *m;
>> + int ret;
>> +
>> + for (m = l2meta; m != NULL; m = m->next) {
>> + uint64_t bytes = m->nb_clusters << s->cluster_bits;
>> +
>> + if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
>> + continue;
>> + }
>> +
>> + /* try to alloc host space in one chunk for better locality */
>> + ret = file->drv->bdrv_co_pwrite_zeroes(file, m->alloc_offset, bytes, 0);
>
> Are we guaranteed that this is a fast operation? (That is, it either
> results in a hole or an error, and doesn't waste time tediously writing
> actual zeroes)
>
well, block_int.h reads:
/*
* Efficiently zero a region of the disk image. Typically an image format
* would use a compact metadata representation to implement this. This
* function pointer may be NULL or return -ENOSUP and .bdrv_co_writev()
* will be called instead.
*/
int coroutine_fn (*bdrv_co_pwrite_zeroes)(BlockDriverState *bs,
int64_t offset, int count, BdrvRequestFlags flags);
(and that's why the driver function is used directly, bypassing the
'safe' bdrv interface that would try to write zeroes no matter the cost)
As far as I checked the drivers mostly follow the idea
>> +
>> + if (ret != 0) {
>> + continue;
>> + }
>
> Supposing we are using a file system that doesn't support holes, then
> ret will not be zero, and we ended up not allocating anything after all.
> Is that a problem that we are just blindly continuing the loop as our
> reaction to the error?
>
> /reads further
>
> I guess not - you aren't reacting to any error call, but merely using
> the side effect that an allocation happened for speed when it worked,
> and ignoring failure (you get the old behavior of the write() now
> causing the allocation) when it didn't.
>
yes, exactly
>> +
>> + file->total_sectors = MAX(file->total_sectors,
>> + (m->alloc_offset + bytes) / BDRV_SECTOR_SIZE);
>> + }
>> +}
>> +
>> static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
>> uint64_t bytes, QEMUIOVector *qiov,
>> int flags)
>> @@ -1656,8 +1682,12 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
>> if (ret < 0) {
>> goto fail;
>> }
>> -
>> qemu_co_mutex_unlock(&s->lock);
>> +
>> + if (bs->file->bs->drv->bdrv_co_pwrite_zeroes != NULL) {
>> + handle_alloc_space(bs, l2meta);
>> + }
>
> Is it really a good idea to be modifying the underlying protocol image
> outside of the mutex?
>
as far as I understand, qcow2 usually modifies the underlying image
outside of the mutex? I guess it's qcow2 metadata that we wouldn't want
to touch unlocked
> At any rate, it looks like your patch is doing a best-effort write
> zeroes as an attempt to trigger consecutive allocation of the entire
> cluster in the underlying protocol right after a cluster has been
> allocated at the qcow2 format layer. Which means there are more
> syscalls now than there were previously, but now when we do three
> write() calls at offsets B, A, C, those three calls are into file space
> that was allocated earlier by the write zeroes, rather than fresh calls
> into unallocated space that is likely to trigger up to three disjoint
> allocations.
>
> As a discussion point, wouldn't we achieve the same effect of less
> fragmentation if we instead collect our data into a bounce buffer, and
> only then do a single write() (or more likely, a writev() where the iov
> is set up to reconstruct a single buffer on the syscall, but where the
> source data is still at different offsets)? We'd be avoiding the extra
> syscalls of pre-allocating the cluster, and while our write() call is
> still causing allocations, at least it is now one cluster-aligned
> write() rather than three sub-cluster out-of-order write()s.
>
I think yes we would achieve the same effect of less fragmentation;
but pre-zeroing also makes the following patch possible (to skip COW if
there is no backing data)
I have follow-up patches which merge initial data and COW padding into a
single writev(). After those it should become reasonable to skip
cluster pre-zeroing (for cases when there is backing data).
/Anton
next prev parent reply other threads:[~2017-05-23 8:28 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-19 9:34 [Qemu-devel] [PATCH v1 00/13] qcow2: space preallocation and COW improvements Anton Nefedov
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk Anton Nefedov
2017-05-22 19:00 ` Eric Blake
2017-05-23 8:28 ` Anton Nefedov [this message]
2017-05-23 9:13 ` Denis V. Lunev
2017-05-26 8:11 ` Kevin Wolf
2017-05-26 8:57 ` Denis V. Lunev
2017-05-26 10:09 ` Anton Nefedov
2017-05-26 11:16 ` Kevin Wolf
2017-05-26 10:57 ` Denis V. Lunev
2017-05-26 11:32 ` Kevin Wolf
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 02/13] qcow2: is_zero_sectors(): return true if area is outside of backing file Anton Nefedov
2017-05-22 19:12 ` Eric Blake
2017-05-22 19:14 ` Eric Blake
2017-05-23 8:35 ` Anton Nefedov
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 03/13] qcow2: do not COW the empty areas Anton Nefedov
2017-05-22 19:24 ` Eric Blake
2017-05-23 8:31 ` Anton Nefedov
2017-05-23 9:15 ` Denis V. Lunev
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 04/13] qcow2: preallocation at image expand Anton Nefedov
2017-05-22 19:29 ` Eric Blake
2017-05-24 16:57 ` Anton Nefedov
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 05/13] qcow2: set inactive flag Anton Nefedov
2017-05-26 8:11 ` Kevin Wolf
2017-05-31 16:56 ` Anton Nefedov
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 06/13] qcow2: truncate preallocated space Anton Nefedov
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 07/13] qcow2: check space leak at the end of the image Anton Nefedov
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 08/13] qcow2: handle_prealloc(): find out if area zeroed by earlier preallocation Anton Nefedov
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 09/13] qcow2: fix misleading comment about L2 linking Anton Nefedov
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 10/13] qcow2-cluster: slightly refactor handle_dependencies() Anton Nefedov
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 11/13] qcow2-cluster: make handle_dependencies() logic easier to follow Anton Nefedov
2017-05-22 19:37 ` Eric Blake
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 12/13] qcow2: allow concurrent unaligned writes to the same clusters Anton Nefedov
2017-05-19 9:34 ` [Qemu-devel] [PATCH v1 13/13] iotest 046: test simultaneous cluster write error case Anton Nefedov
2017-05-23 14:35 ` [Qemu-devel] [PATCH v1 00/13] qcow2: space preallocation and COW improvements Eric Blake
2017-05-23 14:51 ` Denis V. Lunev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=761233e8-026d-732c-4527-a744b16eba3e@virtuozzo.com \
--to=anton.nefedov@virtuozzo.com \
--cc=den@openvz.org \
--cc=den@virtuozzo.com \
--cc=eblake@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).