qemu-devel.nongnu.org archive mirror
From: "Denis V. Lunev" <den@virtuozzo.com>
To: Kevin Wolf <kwolf@redhat.com>,
	Anton Nefedov <anton.nefedov@virtuozzo.com>
Cc: qemu-devel@nongnu.org, mreitz@redhat.com,
	"Denis V. Lunev" <den@openvz.org>
Subject: Re: [Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk
Date: Fri, 26 May 2017 11:57:48 +0300
Message-ID: <2f0f5f9a-23b3-5f67-7675-ecdfe15cc187@virtuozzo.com>
In-Reply-To: <20170526081147.GC7211@noname.str.redhat.com>

On 05/26/2017 11:11 AM, Kevin Wolf wrote:
> Am 19.05.2017 um 11:34 hat Anton Nefedov geschrieben:
>> From: "Denis V. Lunev" <den@openvz.org>
>>
>> Currently each single write operation can result in 3 write operations
>> if guest offsets are not cluster aligned. One write is performed for the
>> real payload and two for COW-ed areas. Thus the data may lie
>> non-contiguously on the host filesystem, which significantly reduces
>> subsequent sequential read performance.
>>
>> The patch allocates the space in the file with cluster granularity,
>> ensuring
>>   1. better host offset locality
>>   2. fewer space allocation operations
>>      (which can be expensive on distributed storage)
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> I don't think this is the right approach. You end up with two
> write operations: One write_zeroes and then one data write. If the
> backend actually supports efficient zero writes, write_zeroes won't
> necessarily allocate space, but writing data will definitely split
> the already existing allocation. If anything, we need a new
> bdrv_allocate() or something that would call fallocate() instead of
> abusing write_zeroes.
Great idea. That would be much nicer.
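
To make the suggestion concrete, here is a minimal sketch of "allocate
instead of writing zeroes" at the plain POSIX level. It is not QEMU code
and not the proposed bdrv_allocate(); alloc_host_range() and
demo_alloc_cluster() are hypothetical names, and posix_fallocate() is
used as a stand-in for whatever the file driver would actually call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reserve host space for [offset, offset + bytes)
 * without issuing a data write. Returns 0 on success or a positive errno
 * value, exactly as posix_fallocate() does. */
static int alloc_host_range(int fd, off_t offset, off_t bytes)
{
    assert(offset >= 0 && bytes > 0);
    return posix_fallocate(fd, offset, bytes);
}

/* Demo: allocate one cluster at offset 0 of a fresh temporary file.
 * Returns the file size after the allocation, or -1 on any failure. */
static long long demo_alloc_cluster(long long cluster_size)
{
    char path[] = "/tmp/alloc-sketch-XXXXXX";
    struct stat st;
    long long size = -1;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path);   /* the file disappears once fd is closed */

    if (alloc_host_range(fd, 0, (off_t)cluster_size) == 0 &&
        fstat(fd, &st) == 0) {
        size = (long long)st.st_size;
    }
    close(fd);
    return size;
}
```

Calling demo_alloc_cluster(65536) reserves a single 64 KB range in one
call; the payload and the COW areas could then be written into space
that already exists.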

> It seems much clearer to me that simply unifying the three write
> requests into a single one is an improvement. And it's easy to do, I
> even had a patch once to do this. The reason that I didn't send it was
> that it seemed to conflict with the data cache approach
These changes help a lot with two patterns:
- writing 4 KB into the middle of a block. The bigger the block size,
  the more we gain
- a sequential write that is not sector aligned and arrives in 2 requests:
  we perform the allocation, which should be significantly faster than the
  actual write, and after that both parts of the write can be executed in
  parallel.
In my opinion, both cases are frequent and important.
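
The first pattern can be illustrated with a bit of arithmetic. The
sketch below is not taken from qcow2; CowRegions, cow_regions() and
separate_write_count() are hypothetical names, and it assumes a write
into newly allocated clusters where everything outside the payload has
to be copied.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must be copied around the guest payload when a write
 * lands in freshly allocated clusters (simplified model). */
typedef struct CowRegions {
    uint64_t head_bytes;   /* from the cluster start to the write start */
    uint64_t tail_bytes;   /* from the write end to the cluster end */
} CowRegions;

static CowRegions cow_regions(uint64_t offset, uint64_t bytes,
                              uint64_t cluster_size)
{
    CowRegions r;
    uint64_t mask = cluster_size - 1;
    uint64_t end = offset + bytes;

    assert(bytes > 0);
    /* cluster sizes are powers of two */
    assert(cluster_size != 0 && (cluster_size & mask) == 0);

    r.head_bytes = offset & mask;
    r.tail_bytes = (cluster_size - (end & mask)) & mask;
    return r;
}

/* Number of writes if head, payload and tail are issued separately. */
static int separate_write_count(CowRegions r)
{
    return 1 + (r.head_bytes != 0) + (r.tail_bytes != 0);
}
```

For a 4 KB write at offset 32 KB this gives 32 KB + 28 KB of COW with
64 KB clusters, and 32 KB + 988 KB with 1 MB clusters; in both cases
three separate writes unless the space is allocated in one chunk first.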

But OK, the code should be improved, you are absolutely right here.

>>  block/qcow2.c | 32 +++++++++++++++++++++++++++++++-
>>  1 file changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index a8d61f0..2e6a0ec 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -1575,6 +1575,32 @@ fail:
>>      return ret;
>>  }
>>  
>> +static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
>> +{
>> +    BDRVQcow2State *s = bs->opaque;
>> +    BlockDriverState *file = bs->file->bs;
>> +    QCowL2Meta *m;
>> +    int ret;
>> +
>> +    for (m = l2meta; m != NULL; m = m->next) {
>> +        uint64_t bytes = m->nb_clusters << s->cluster_bits;
>> +
>> +        if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
>> +            continue;
>> +        }
>> +
>> +        /* try to alloc host space in one chunk for better locality */
>> +        ret = file->drv->bdrv_co_pwrite_zeroes(file, m->alloc_offset, bytes, 0);
> No. This is what you bypass:
>
> * All sanity checks that the block layer does
>
> * bdrv_inc/dec_in_flight(), which is required for drain to work
>   correctly. Not doing this will cause crashes.
>
> * tracked_request_begin/end(), mark_request_serialising() and
>   wait_serialising_requests(), which are required for serialising
>   requests to work correctly
>
> * Ensuring correct request alignment for file. This means that e.g.
>   qcow2 with cluster size 512 on a host with a 4k native disk will
>   break.
>
> * blkdebug events
>
> * before_write_notifiers. Not calling these will cause corrupted backups
>   if someone backs up file.
>
> * Dirty bitmap updates
>
> * Updating write_gen, wr_highest_offset and total_sectors
>
> * Ensuring that bl.max_pwrite_zeroes and bl.pwrite_zeroes_alignment are
>   respected
>
> And these are just the obvious things. I'm sure I missed some.
>

You seem to be right. I had not thought about it from this angle.
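
For the record, a toy model of the difference between calling the
driver callback directly and going through a generic wrapper. This is
deliberately not QEMU code: ToyNode, toy_driver_zero() and
toy_pwrite_zeroes() are made-up names, only three items from the list
are modelled (alignment, in-flight accounting, total_sectors), and the
toy merely rejects misaligned requests, which is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512u

typedef struct ToyNode ToyNode;
typedef int (*ToyZeroFn)(ToyNode *node, uint64_t offset, uint64_t bytes);

struct ToyNode {
    ToyZeroFn driver_pwrite_zeroes;    /* stands in for the driver callback */
    uint32_t request_alignment;        /* e.g. 4096 on a 4k native disk */
    unsigned in_flight;                /* requests currently being processed */
    unsigned in_flight_seen_by_driver; /* value observed inside the callback */
    unsigned driver_calls;
    uint64_t total_sectors;
};

/* Toy driver callback: does no I/O, only records what it observes. */
static int toy_driver_zero(ToyNode *node, uint64_t offset, uint64_t bytes)
{
    (void)offset;
    (void)bytes;
    node->driver_calls++;
    node->in_flight_seen_by_driver = node->in_flight;
    return 0;
}

/* Toy generic wrapper: the bookkeeping that a direct call to
 * node->driver_pwrite_zeroes() would silently skip. */
static int toy_pwrite_zeroes(ToyNode *node, uint64_t offset, uint64_t bytes)
{
    uint64_t end_sector;
    int ret;

    /* 1. request alignment of the underlying node */
    if (offset % node->request_alignment || bytes % node->request_alignment) {
        return -EINVAL;
    }

    /* 2. in-flight accounting, so that a drain could wait for us */
    node->in_flight++;
    ret = node->driver_pwrite_zeroes(node, offset, bytes);

    /* 3. keep the node size in sync after a successful request */
    if (ret == 0) {
        end_sector = (offset + bytes + TOY_SECTOR_SIZE - 1) / TOY_SECTOR_SIZE;
        if (end_sector > node->total_sectors) {
            node->total_sectors = end_sector;
        }
    }
    node->in_flight--;
    return ret;
}
```

With request_alignment set to 4096, a 512-byte request never reaches
the driver in this model, while a direct call to the callback would
have passed it through unchecked and left in_flight at zero.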

>> +        if (ret != 0) {
>> +            continue;
>> +        }
>> +
>> +        file->total_sectors = MAX(file->total_sectors,
>> +                                  (m->alloc_offset + bytes) / BDRV_SECTOR_SIZE);
> You only compensate for part of a single item in the list above, by
> duplicating code with block/io.c. This is not how to do things.
>
> As I said above, I think you don't really want write_zeroes anyway, but
> if you wanted a write_zeroes "but only if it's efficient" (which I'm not
> sure is a good thing to want), then a better way might be introducing a
> new request flag.
>
>> +    }
>> +}
>> +
>>  static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
>>                                           uint64_t bytes, QEMUIOVector *qiov,
>>                                           int flags)
> Kevin

Thank you for the review and the ideas ;)

Den
