All of lore.kernel.org
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Fam Zheng <famz@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	pbonzini@redhat.com, Jeff Cody <jcody@redhat.com>,
	qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v6 1/2] mirror: Rewrite mirror_iteration
Date: Fri, 18 Dec 2015 17:41:45 +0100	[thread overview]
Message-ID: <56743749.2050500@redhat.com> (raw)
In-Reply-To: <1448014326-27608-2-git-send-email-famz@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 8958 bytes --]

On 20.11.2015 11:12, Fam Zheng wrote:
> The "pnum < nb_sectors" condition in deciding whether to actually copy
> data is unnecessarily strict, and the qiov initialization is
> unnecessarily for bdrv_aio_write_zeroes and bdrv_aio_discard.
> 
> Rewrite mirror_iteration to fix both flaws.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> 
> ---
> v5: Address Max's review comments:
>     - Fix parameter name of mirror_do_read().
>     - Simplify the buffer waiting loop in mirror_do_read.
>     - Don't skip next dirty chunk when collecting consective dirty
>       chunks.
>     - Check sector range when collecting consective dirty chunks.
>     - Don't misuse a negative return value of
>       bdrv_get_block_status_above.
> ---
>  block/mirror.c | 307 +++++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 187 insertions(+), 120 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 52c9abf..ff8149d 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -45,7 +45,6 @@ typedef struct MirrorBlockJob {
>      BlockdevOnError on_source_error, on_target_error;
>      bool synced;
>      bool should_complete;
> -    int64_t sector_num;
>      int64_t granularity;
>      size_t buf_size;
>      int64_t bdev_length;
> @@ -157,113 +156,76 @@ static void mirror_read_complete(void *opaque, int ret)
>                      mirror_write_complete, op);
>  }
>  
> -static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
> + * return the offset of the adjusted ending sector against
> + * sector_num + nb_sectors. */
> +static int mirror_cow_align(MirrorBlockJob *s,
> +                            int64_t *sector_num,
> +                            int *nb_sectors)
> +{
> +    bool head_need_cow, tail_need_cow;
> +    int diff = 0;
> +    int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +
> +    head_need_cow = !test_bit(*sector_num / sectors_per_chunk, s->cow_bitmap);
> +    tail_need_cow = !test_bit((*sector_num + *nb_sectors) / sectors_per_chunk,

Should this be (*sector_num + *nb_sectors - 1) so that we actually check
the last chunk of the sector range (instead of the next chunk if
*sector_num + *nb_sectors is divisible by sectors_per_chunk).

> +                             s->cow_bitmap);
> +    if (head_need_cow || tail_need_cow) {
> +        int64_t rounded_sector_num;
> +        int rounded_nb_sectors;
> +        bdrv_round_to_clusters(s->target, *sector_num, *nb_sectors,
> +                               &rounded_sector_num, &rounded_nb_sectors);
> +        assert(*sector_num >= rounded_sector_num);
> +        assert(rounded_nb_sectors >= *nb_sectors);

You could move these assertions into the following conditional blocks,
that would make more sense to me:

> +        if (tail_need_cow) {
> +            int diff = rounded_sector_num + rounded_nb_sectors
> +                        - (*sector_num + *nb_sectors);

assert(diff >= 0);

Also, I don't like shadowing of variables very much.

> +            *nb_sectors += diff;
> +        }
> +        if (head_need_cow) {
> +            int diff = *sector_num - rounded_sector_num;

assert(diff >= 0);

> +            *sector_num = rounded_sector_num;
> +            *nb_sectors += diff;
> +        }
> +    }
> +    return diff;

diff is always 0.

(This is why I don't like shadowing of variables very much.)

> +}
> +
> +/* Submit async read while handling COW.
> + * Returns: nb_sectors if no alignment is necessary, or
> + *          (new_end - sector_num) if tail is rounded up or down due to
> + *          alignment or buffer limit.
> + */
> +static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
> +                          int nb_sectors)
>  {
>      BlockDriverState *source = s->common.bs;
> -    int nb_sectors, sectors_per_chunk, nb_chunks;
> -    int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
> -    uint64_t delay_ns = 0;
> +    int sectors_per_chunk, nb_chunks;
> +    int ret = nb_sectors;
>      MirrorOp *op;
> -    int pnum;
> -    int64_t ret;
>  
> -    s->sector_num = hbitmap_iter_next(&s->hbi);
> -    if (s->sector_num < 0) {
> -        bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> -        s->sector_num = hbitmap_iter_next(&s->hbi);
> -        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
> -        assert(s->sector_num >= 0);
> -    }
> -
> -    hbitmap_next_sector = s->sector_num;
> -    sector_num = s->sector_num;
>      sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> -    end = s->bdev_length / BDRV_SECTOR_SIZE;
>  
> -    /* Extend the QEMUIOVector to include all adjacent blocks that will
> -     * be copied in this operation.
> -     *
> -     * We have to do this if we have no backing file yet in the destination,
> -     * and the cluster size is very large.  Then we need to do COW ourselves.
> -     * The first time a cluster is copied, copy it entirely.  Note that,
> -     * because both the granularity and the cluster size are powers of two,
> -     * the number of sectors to copy cannot exceed one cluster.
> -     *
> -     * We also want to extend the QEMUIOVector to include more adjacent
> -     * dirty blocks if possible, to limit the number of I/O operations and
> -     * run efficiently even with a small granularity.
> -     */
> -    nb_chunks = 0;
> -    nb_sectors = 0;
> -    next_sector = sector_num;
> -    next_chunk = sector_num / sectors_per_chunk;
> +    if (s->cow_bitmap) {
> +        ret += mirror_cow_align(s, &sector_num, &nb_sectors);

mirror_cow_align() always returns 0, but I assume it is supposed to
return the difference of nb_sectors before and after the call (in which
case this line is correct).

> +    }
> +    /* We can only handle as much as buf_size at a time. */
> +    nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
> +    assert(nb_sectors);

Maybe we should move these three lines before the mirror_cow_align()
call. I don't think it will make a difference in practice, but it seems
cleaner to me that way.

> +    /* The sector range must meet granularity because:
> +     * 1) Caller passes in aligned values;
> +     * 2) mirror_cow_align is used only when target cluster is larger. */
> +    assert(!(nb_sectors % sectors_per_chunk));
> +    assert(!(sector_num % sectors_per_chunk));
> +    nb_chunks = nb_sectors / sectors_per_chunk;

[...]

> +static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +{

[...]

> +    while (nb_chunks > 0 && sector_num < end) {
> +        int ret;
> +        int io_sectors;
> +        enum MirrorMethod {
> +            MIRROR_METHOD_COPY,
> +            MIRROR_METHOD_ZERO,
> +            MIRROR_METHOD_DISCARD
> +        } mirror_method = MIRROR_METHOD_COPY;
> +
> +        assert(!(sector_num % sectors_per_chunk));
> +        ret = bdrv_get_block_status_above(source, NULL, sector_num,
> +                                          nb_chunks * sectors_per_chunk,
> +                                          &io_sectors);
> +        if (ret < 0) {
> +            io_sectors = nb_chunks * sectors_per_chunk;
> +        }
> +
> +        io_sectors -= io_sectors % sectors_per_chunk;
> +        if (io_sectors < sectors_per_chunk) {
> +            io_sectors = sectors_per_chunk;
> +        } else if (ret > 0 && !(ret & BDRV_BLOCK_DATA)) {

Why > 0 and not >= 0?

Max

> +            int64_t target_sector_num;
> +            int target_nb_sectors;
> +            bdrv_round_to_clusters(s->target, sector_num, io_sectors,
> +                                   &target_sector_num, &target_nb_sectors);
> +            if (target_sector_num == sector_num &&
> +                target_nb_sectors == io_sectors) {
> +                mirror_method = ret & BDRV_BLOCK_ZERO ?
> +                                    MIRROR_METHOD_ZERO :
> +                                    MIRROR_METHOD_DISCARD;
> +            }
> +        }
> +
> +        switch (mirror_method) {
> +        case MIRROR_METHOD_COPY:
> +            io_sectors = mirror_do_read(s, sector_num, io_sectors);
> +            break;
> +        case MIRROR_METHOD_ZERO:
> +            mirror_do_zero_or_discard(s, sector_num, io_sectors, false);
> +            break;
> +        case MIRROR_METHOD_DISCARD:
> +            mirror_do_zero_or_discard(s, sector_num, io_sectors, true);
> +            break;
> +        default:
> +            abort();
> +        }
> +        assert(io_sectors);
> +        sector_num += io_sectors;
> +        nb_chunks -= io_sectors / sectors_per_chunk;
> +        delay_ns += ratelimit_calculate_delay(&s->limit, io_sectors);
>      }
>      return delay_ns;
>  }
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

  reply	other threads:[~2015-12-18 16:41 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-20 10:12 [Qemu-devel] [PATCH v6 0/2] mirror: Improve zero write and discard Fam Zheng
2015-11-20 10:12 ` [Qemu-devel] [PATCH v6 1/2] mirror: Rewrite mirror_iteration Fam Zheng
2015-12-18 16:41   ` Max Reitz [this message]
2015-12-23  3:53     ` Fam Zheng
2015-11-20 10:12 ` [Qemu-devel] [PATCH v6 2/2] mirror: Add mirror_wait_for_io Fam Zheng
2015-12-18 16:42   ` Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56743749.2050500@redhat.com \
    --to=mreitz@redhat.com \
    --cc=famz@redhat.com \
    --cc=jcody@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.