From: Kevin Wolf <kwolf@redhat.com>
To: "Denis V. Lunev" <den@openvz.org>
Cc: qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH for 2.7 1/1] qcow2: improve qcow2_co_write_zeroes()
Date: Mon, 25 Apr 2016 11:05:53 +0200 [thread overview]
Message-ID: <20160425090553.GA5293@noname.str.redhat.com> (raw)
In-Reply-To: <1461413127-2594-1-git-send-email-den@openvz.org>
Am 23.04.2016 um 14:05 hat Denis V. Lunev geschrieben:
> Unfortunately Linux kernel could send non-aligned requests to qemu-nbd
> if the caller is using O_DIRECT and does not align in-memory data to
> page. Thus qemu-nbd will call block layer with non-aligned requests.
>
> qcow2_co_write_zeroes forcibly asks the caller to supply block-aligned
> data. In the other case it rejects with ENOTSUP which is properly
> handled on the upper level. The problem is that this grows the image.
>
> This could be optimized a bit:
> - particular request could be split to block aligned part and head/tail,
> which could be handled separately
In fact, this is what bdrv_co_do_write_zeroes() is already supposed to
do. qcow2 exposes its cluster size as bs->bl.write_zeroes_alignment, so
block/io.c should split the request in three.
If you see something different happening, we may have a bug there.
> - writes could be omitted when we do know that the image already contains
> zeroes at the offsets being written
I don't think this is a valid shortcut. The semantics of a write_zeroes
operation is that the zeros (literal or as flags) are stored in this
layer and that the backing file isn't involved any more for the given
sectors. For example, a streaming operation or qemu-img rebase may
involve write_zeroes operations, and relying on the backing file would
cause corruption there (because the whole point of the operation is that
the backing file can be removed).
And to be honest, writing zero flags in memory to the cached L2 table is
an operation so quick that calling bdrv_get_block_status_above() might
actually end up slower in most cases than just setting the flag.
Kevin
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> ---
> block/qcow2.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 53 insertions(+), 5 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 470734b..9bdaa15 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2411,21 +2411,69 @@ finish:
> return ret;
> }
>
> +static int write_zeroes_chunk(BlockDriverState *bs, int64_t sector_num, int nr)
> +{
> + int ret, count;
> + BlockDriverState *file;
> + uint8_t *buf;
> + struct iovec iov;
> + QEMUIOVector local_qiov;
> +
> + ret = bdrv_get_block_status_above(bs, NULL, sector_num, nr, &count, &file);
> + if (ret > 0 && (ret & BDRV_BLOCK_ZERO) && count == nr) {
> + /* Nothing to do. The area is zeroed already.
> + Worth to check to avoid image expansion for non-aligned reqs. */
> + return 0;
> + }
> +
> + buf = qemu_blockalign0(bs, nr << BDRV_SECTOR_BITS);
> + iov = (struct iovec) {
> + .iov_base = buf,
> + .iov_len = nr << BDRV_SECTOR_BITS,
> + };
> + qemu_iovec_init_external(&local_qiov, &iov, 1);
> +
> + return qcow2_co_writev(bs, sector_num, nr, &local_qiov);
> +}
> +
> static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
> int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
> {
> int ret;
> BDRVQcow2State *s = bs->opaque;
>
> - /* Emulate misaligned zero writes */
> - if (sector_num % s->cluster_sectors || nb_sectors % s->cluster_sectors) {
> - return -ENOTSUP;
> + int nr = sector_num % s->cluster_sectors;
> + if (nr != 0) {
> + nr = MIN(s->cluster_sectors - nr, nb_sectors);
> +
> + ret = write_zeroes_chunk(bs, sector_num, nr);
> + if (ret < 0) {
> + return ret;
> + }
> +
> + sector_num += nr;
> + nb_sectors -= nr;
> + if (nb_sectors == 0) {
> + return 0;
> + }
> + }
> +
> + nr = nb_sectors % s->cluster_sectors;
> + if (nr != 0) {
> + ret = write_zeroes_chunk(bs, sector_num + nb_sectors - nr, nr);
> + if (ret < 0) {
> + return ret;
> + }
> +
> + nb_sectors -= nr;
> + if (nb_sectors == 0) {
> + return 0;
> + }
> }
>
> /* Whatever is left can use real zero clusters */
> qemu_co_mutex_lock(&s->lock);
> - ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS,
> - nb_sectors);
> + ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, nb_sectors);
> qemu_co_mutex_unlock(&s->lock);
>
> return ret;
> --
> 2.5.0
>
next prev parent reply other threads:[~2016-04-25 9:06 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-23 12:05 [Qemu-devel] [PATCH for 2.7 1/1] qcow2: improve qcow2_co_write_zeroes() Denis V. Lunev
2016-04-25 9:05 ` Kevin Wolf [this message]
2016-04-25 10:20 ` Denis V. Lunev
2016-04-25 19:35 ` Eric Blake
2016-04-25 21:00 ` Denis V. Lunev
2016-04-26 8:23 ` Kevin Wolf
2016-04-26 9:35 ` Denis V. Lunev
2016-04-26 10:19 ` Kevin Wolf
2016-04-27 7:07 ` Denis V. Lunev
2016-04-27 8:12 ` Kevin Wolf
2016-04-27 8:32 ` Denis V. Lunev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160425090553.GA5293@noname.str.redhat.com \
--to=kwolf@redhat.com \
--cc=den@openvz.org \
--cc=mreitz@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).