Re: [Qemu-devel] [PATCH 1/2] qemu-img convert: Rewrite copying logic

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>, qemu-devel@nongnu.org
Cc: stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH 1/2] qemu-img convert: Rewrite copying logic
Date: Wed, 18 Feb 2015 12:18:02 -0500	[thread overview]
Message-ID: <54E4C94A.2020404@redhat.com> (raw)
In-Reply-To: <1424272761-27554-2-git-send-email-kwolf@redhat.com>

On 2015-02-18 at 10:19, Kevin Wolf wrote:
> The implementation of qemu-img convert is (a) messy, (b) buggy, and
> (c) less efficient than possible. The changes required to beat some
> sense into it are massive enough that incremental changes would only
> make my and the reviewers' life harder. So throw it away and reimplement
> it from scratch.
>
> Let me give some examples what I mean by messy, buggy and inefficient:
>
> (a) The copying logic of qemu-img convert has two separate branches for
>      compressed and normal target images, which roughly do the same -
>      except for a little code that handles actual differences between
>      compressed and uncompressed images, and much more code that
>      implements just a different set of optimisations and bugs. This is
>      unnecessary code duplication, and makes the code for compressed
>      output (unsurprisingly) suffer from bitrot.
>
>      The code for uncompressed ouput is run twice to count the the total
>      length for the progress bar. In the first run it just takes a
>      shortcut and runs only half the loop, and when it's done, it toggles
>      a boolean, jumps out of the loop with a backwards goto and starts
>      over. Works, but pretty is something different.
>
> (b) Converting while keeping a backing file (-B option) is broken in
>      several ways. This includes not writing to the image file if the
>      input has zero clusters or data filled with zeros (ignoring that the
>      backing file will be visible instead).
>
>      It also doesn't correctly limit every iteration of the copy loop to
>      sectors of the same status so that too many sectors may be copied to
>      in the target image. For -B this gives an unexpected result, for
>      other images it just does more work than necessary.
>
>      Conversion with a compressed target completely ignores any target
>      backing file.
>
> (c) qemu-img convert skips reading and writing an area if it knows from
>      metadata that copying isn't needed (except for the bug mentioned
>      above that ignores a status change in some cases). It does, however,
>      read from the source even if it knows that it will read zeros, and
>      then search for non-zero bytes in the read buffer, if it's possible
>      that a write might be needed.
>
> This reimplementation of the copying core reorganises the code to remove
> the duplication and have a much more obvious code flow, by essentially
> splitting the copy iteration loop into three parts:
>
> 1. Find the number of contiguous sectors of the same status at the
>     current offset (This can also be called in a separate loop for the
>     copying loop in order to determine the total sectors for the progress
>     bar.)
>
> 2. Read sectors. If the status implies that there is no data there to
>     read (zero or unallocated cluster), don't do anything.
>
> 3. Write sectors depending on the status. If it's data, write it. If
>     we want the backing file to be visible (with -B), don't write it. If
>     it's zeroed, skip it if you can, otherwise use bdrv_write_zeroes() to
>     optimise the write at least where possible.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   qemu-img.c | 511 ++++++++++++++++++++++++++++++++++++-------------------------
>   1 file changed, 305 insertions(+), 206 deletions(-)

You have been very careful not to write "Rewrite img_convert()" or 
something like that in the subject, so I can't really complain that 
there are still bugs in the function (which are not related to the 
copying logic), for instance:

$ ./qemu-img create -f qcow2 i1.qcow2 64M
Formatting 'i1.qcow2', fmt=qcow2 size=67108864 encryption=off 
cluster_size=65536 lazy_refcounts=off
$ ./qemu-img create -f qcow2 i2.qcow2 64M
Formatting 'i2.qcow2', fmt=qcow2 size=67108864 encryption=off 
cluster_size=65536 lazy_refcounts=off
$ ./qemu-img snapshot -c foo i1.qcow2
$ ./qemu-img snapshot -c foo i2.qcow2
$ ./qemu-io -c 'write 0 64M' i2.qcow2
wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 0:00:01.32 (48.152 MiB/sec and 0.7524 ops/sec)
$ ./qemu-img convert -l snapshot.name=foo -O qcow2 i{1,2}.qcow2 o.qcow2
qemu-img: error while reading sector 131072: Input/output error

("No support for concatenating multiple snapshot" should be enforced for 
sn_opts != NULL)

> diff --git a/qemu-img.c b/qemu-img.c
> index e148af8..5c8386e 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1306,20 +1306,307 @@ out3:
>       return ret;
>   }
>   
> +enum ImgConvertBlockStatus {
> +    BLK_DATA,
> +    BLK_ZERO,
> +    BLK_BACKING_FILE,
> +};
> +
> +typedef struct ImgConvertState {
> +    BlockDriverState **src;
> +    int64_t *src_sectors;
> +    int src_cur, src_num;
> +    int64_t src_cur_offset;
> +    int64_t total_sectors;
> +    int64_t allocated_sectors;
> +    enum ImgConvertBlockStatus status;
> +    int64_t sector_next_status;
> +    BlockDriverState *target;
> +    bool has_zero_init;
> +    bool compressed;
> +    bool out_backing;

Maybe "out_backed" (or "out_is_backed" or "out_has_backing") would be 
better; "out_backing" to me sounds like this should describe the backing 
file.

> +    int min_sparse;
> +    size_t cluster_sectors;
> +    size_t buf_sectors;
> +} ImgConvertState;
> +
> +static void convert_select_part(ImgConvertState *s, int64_t sector_num)
> +{
> +    while (sector_num - s->src_cur_offset >= s->src_sectors[s->src_cur]) {
> +        s->src_cur_offset += s->src_sectors[s->src_cur];
> +        s->src_cur++;
> +        assert(s->src_cur < s->src_num);
> +    }
> +}
> +
> +static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
> +{
> +    int64_t ret;
> +    int n;
> +
> +    convert_select_part(s, sector_num);
> +
> +    assert(s->total_sectors > sector_num);
> +    n = MIN(s->total_sectors - sector_num, INT_MAX);

Maybe it would be better to use INT_MAX / BDRV_SECTOR_SIZE here (and in 
other places, too)... In practice, this would only be relevant to reads 
and writes, though, and those are limited by s->buf_sectors.

> +
> +    if (s->sector_next_status <= sector_num) {
> +        ret = bdrv_get_block_status(s->src[s->src_cur],
> +                                    sector_num - s->src_cur_offset,
> +                                    n, &n);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        if (ret & BDRV_BLOCK_ZERO) {
> +            s->status = BLK_ZERO;
> +        } else if (ret & BDRV_BLOCK_DATA) {
> +            s->status = BLK_DATA;
> +        } else if (!s->out_backing) {
> +            /* Without a target backing file we must copy over the contents of
> +             * the backing file as well. */
> +            /* TODO Check block status of the backing file chain to avoid
> +             * needlessly reading zeroes and limiting the iteration to the
> +             * buffer size */
> +            s->status = BLK_DATA;
> +        } else {
> +            s->status = BLK_BACKING_FILE;
> +        }
> +
> +        s->sector_next_status = sector_num + n;
> +    }
> +
> +    n = MIN(n, s->sector_next_status - sector_num);
> +    if (s->status == BLK_DATA) {
> +        n = MIN(n, s->buf_sectors);
> +    }
> +
> +    /* We need to write complete clusters for compressed images, so if an
> +     * unallocated area is shorter than that, we must consider the whole
> +     * cluster allocated. */
> +    if (s->compressed) {
> +        if (n < s->cluster_sectors) {
> +            n = MIN(s->cluster_sectors, s->total_sectors - sector_num);
> +            s->status = BLK_DATA;
> +        } else {
> +            n = QEMU_ALIGN_DOWN(n, s->cluster_sectors);
> +        }
> +    }
> +
> +    return n;
> +}
> +
> +static int convert_read(ImgConvertState *s, int64_t sector_num, int nb_sectors,
> +                        uint8_t *buf)
> +{
> +    int n;
> +    int ret;
> +
> +    if (s->status == BLK_ZERO || s->status == BLK_BACKING_FILE) {
> +        return 0;
> +    }
> +
> +    assert(nb_sectors <= s->buf_sectors);
> +    while (nb_sectors > 0) {
> +        BlockDriverState *bs;
> +        int64_t bs_sectors;
> +
> +        /* In the case of compression with multiple source files, we can get a
> +         * nb_sectors that spreads into the next part. So we must be able to
> +         * read across multiple BDSes for one convert_read() call. */
> +        convert_select_part(s, sector_num);
> +        bs = s->src[s->src_cur];
> +        bs_sectors = s->src_sectors[s->src_cur];
> +
> +        n = MIN(nb_sectors, bs_sectors - (sector_num - s->src_cur_offset));
> +        ret = bdrv_read(bs, sector_num, buf, n);

Shouldn't this be sector_num - s->src_cur_offset?

> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        sector_num += n;
> +        nb_sectors -= n;
> +        buf += n * BDRV_SECTOR_SIZE;
> +    }
> +
> +    return 0;
> +}
> +
> +static int convert_write(ImgConvertState *s, int64_t sector_num, int nb_sectors,
> +                         const uint8_t *buf)
> +{
> +    int ret;
> +
> +    while (nb_sectors > 0) {
> +

This empty line looks a bit strange to me...

> +        int n = nb_sectors;
> +
> +        switch (s->status) {
> +        case BLK_BACKING_FILE:
> +            /* If we have a backing file, leave clusters unallocated that are
> +             * unallocated in the source image, so that the backing file is
> +             * visible at the respective offset. */
> +            assert(s->out_backing);
> +            break;
> +
> +        case BLK_DATA:
> +            /* We must always write compressed clusters as a whole, so don't
> +             * try to find zeroed parts in the buffer. We can only save the
> +             * write if the buffer is completely zeroed. */

But you shouldn't be doing that if n < s->min_sparse, I think (if people 
specify -S 0, they don't want you to zero anything, and that's 
guaranteed by the man page). Considering that is_allocated_sectors_min() 
basically has a min = MIN(min, n) part, too, I'd be fine with making -S 
0 a special case here ("if (s->has_zero_init && s->min_sparse && 
buffer_is_zero(...))").

Actually, now that I'm looking at is_allocated_sectors_min(), if the 
first sectors is not allocated, it will return false and thus both the 
current qemu-img convert and the new one after this patch won't write 
data. That's a bug, I think (because it is guaranteed by the man page).

> +            if (s->compressed) {
> +                if (s->has_zero_init &&
> +                    buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))
> +                {
> +                    assert(!s->out_backing);
> +                    break;
> +                }
> +
> +                ret = bdrv_write_compressed(s->target, sector_num, buf, n);
> +                if (ret < 0) {
> +                    return ret;
> +                }
> +                break;
> +            }
> +
> +            /* If there is real non-zero data, we must write it. Otherwise we
> +             * can treat it as zero sectors. */
> +            if (is_allocated_sectors_min(buf, n, &n, s->min_sparse)) {

So I think this should be "if (!s->min_sparse || ...)".

> +                ret = bdrv_write(s->target, sector_num, buf, n);
> +                if (ret < 0) {
> +                    return ret;
> +                }
> +                break;
> +            }
> +            /* fall-through */
> +
> +        case BLK_ZERO:
> +            if (s->has_zero_init) {

And I don't really know what to do about this. Should we write zeros if 
!s->min_sparse? Or is this a special case and the user can't expect 
qemu-img convert to really write zeros if the source image contained 
unallocated/zero clusters? That would make sense, but once again, the 
man page clearly states that "If sparse_size is 0, […] the destination 
image will always be fully allocated." Maybe we could argue that "fully" 
in this case means "as fully as the source image was allocated".

> +                break;
> +            }
> +            /* TODO Should this use BDRV_REQ_MAY_UNMAP? */

Can't we remove this comment? If BDRV_REQ_MAP_UNMAP results in zeros 
read, bdrv_make_zero() below will have been invoked and thus 
s->has_zero_init = true (unless you take my suggestion and leave 
s->has_zero_init false if bdrv_make_zero() failed, but I wouldn't really 
mind that case here).

> +            ret = bdrv_write_zeroes(s->target, sector_num, n, 0);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +            break;
> +        }
> +
> +        sector_num += n;
> +        nb_sectors -= n;
> +        buf += n * BDRV_SECTOR_SIZE;
> +    }
> +
> +    return 0;
> +}
> +
> +static int convert_do_copy(ImgConvertState *s)
> +{
> +    uint8_t *buf = NULL;
> +    int64_t sector_num, allocated_done;
> +    int ret;
> +    int n;
> +
> +    /* Check whether we have zero initialisation or can get it efficiently */
> +    s->has_zero_init = s->min_sparse && !s->out_backing
> +                     ? bdrv_has_zero_init(s->target)
> +                     : false;
> +
> +    if (!s->has_zero_init && !s->out_backing &&
> +        bdrv_can_write_zeroes_with_unmap(s->target))
> +    {
> +        ret = bdrv_make_zero(s->target, BDRV_REQ_MAY_UNMAP);
> +        if (ret < 0) {
> +            goto fail;

I think just leaving s->has_zero_init false should be fine, too (instead 
of failing).

> +        }
> +        s->has_zero_init = 1;

s/1/true/

> +    }
> +
> +    /* Allocate buffer for copied data. For compressed images, only one cluster
> +     * can be copied at a time. */
> +    if (s->compressed) {
> +        if (s->cluster_sectors <= 0 || s->cluster_sectors > s->buf_sectors) {
> +            error_report("invalid cluster size");
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +        s->buf_sectors = s->cluster_sectors;
> +    }
> +    buf = qemu_blockalign(s->target, s->buf_sectors * BDRV_SECTOR_SIZE);
> +
> +    /* Calculate allocated sectors for progress */
> +    s->allocated_sectors = 0;
> +    sector_num = 0;
> +    while (sector_num < s->total_sectors) {
> +        n = convert_iteration_sectors(s, sector_num);
> +        if (n < 0) {
> +            ret = n;
> +            goto fail;
> +        }
> +        if (s->status == BLK_DATA) {
> +            s->allocated_sectors += n;
> +        }
> +        sector_num += n;
> +    }
> +
> +    /* Do the copy */
> +    s->src_cur = 0;
> +    s->src_cur_offset = 0;
> +    s->sector_next_status = 0;
> +
> +    sector_num = 0;
> +    allocated_done = 0;
> +
> +    while (sector_num < s->total_sectors) {
> +        n = convert_iteration_sectors(s, sector_num);
> +        if (n < 0) {
> +            ret = n;
> +            goto fail;
> +        }
> +        if (s->status == BLK_DATA) {
> +            allocated_done += n;
> +            qemu_progress_print(100.0 * allocated_done / s->allocated_sectors,
> +                                0);
> +        }
> +
> +        ret = convert_read(s, sector_num, n, buf);
> +        if (ret < 0) {
> +            error_report("error while reading sector %" PRId64
> +                         ": %s", sector_num, strerror(-ret));
> +            goto fail;
> +        }
> +
> +        ret = convert_write(s, sector_num, n, buf);
> +        if (ret < 0) {
> +            error_report("error while writing sector %" PRId64
> +                         ": %s", sector_num, strerror(-ret));
> +            goto fail;
> +        }
> +
> +        sector_num += n;
> +    }
> +
> +    if (s->compressed) {
> +        /* signal EOF to align */
> +        bdrv_write_compressed(s->target, 0, NULL, 0);

Is there a reason for ignoring the return value other than "That's how 
img_convert() used to do it"?


All in all, this patch looks pretty good to me and certainly makes the 
function much more clear. But as the bdrv_read() call probably needs to 
be fixed, and because I think the -S 0 behavior is buggy, I cannot give 
an R-b at this point (but I will if those are fixed, or if you tell me 
why they don't need to be).

Max

> +    }
> +
> +    ret = 0;
> +fail:
> +    qemu_vfree(buf);
> +    return ret;
> +}
> +
>   static int img_convert(int argc, char **argv)
>   {
> -    int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create;
> +    int c, bs_n, bs_i, compress, cluster_sectors, skip_create;
>       int64_t ret = 0;
>       int progress = 0, flags, src_flags;
>       const char *fmt, *out_fmt, *cache, *src_cache, *out_baseimg, *out_filename;
>       BlockDriver *drv, *proto_drv;
>       BlockBackend **blk = NULL, *out_blk = NULL;
>       BlockDriverState **bs = NULL, *out_bs = NULL;
> -    int64_t total_sectors, nb_sectors, sector_num, bs_offset;
> +    int64_t total_sectors;
>       int64_t *bs_sectors = NULL;
> -    uint8_t * buf = NULL;
>       size_t bufsectors = IO_BUF_SIZE / BDRV_SECTOR_SIZE;
> -    const uint8_t *buf1;
>       BlockDriverInfo bdi;
>       QemuOpts *opts = NULL;
>       QemuOptsList *create_opts = NULL;
> @@ -1330,6 +1617,7 @@ static int img_convert(int argc, char **argv)
>       bool quiet = false;
>       Error *local_err = NULL;
>       QemuOpts *sn_opts = NULL;
> +    ImgConvertState state;
>   
>       fmt = NULL;
>       out_fmt = "raw";
> @@ -1622,9 +1910,6 @@ static int img_convert(int argc, char **argv)
>       }
>       out_bs = blk_bs(out_blk);
>   
> -    bs_i = 0;
> -    bs_offset = 0;
> -
>       /* increase bufsectors from the default 4096 (2M) if opt_transfer_length
>        * or discard_alignment of the out_bs is greater. Limit to 32768 (16MB)
>        * as maximum. */
> @@ -1633,8 +1918,6 @@ static int img_convert(int argc, char **argv)
>                                            out_bs->bl.discard_alignment))
>                       );
>   
> -    buf = qemu_blockalign(out_bs, bufsectors * BDRV_SECTOR_SIZE);
> -
>       if (skip_create) {
>           int64_t output_sectors = bdrv_nb_sectors(out_bs);
>           if (output_sectors < 0) {
> @@ -1661,203 +1944,20 @@ static int img_convert(int argc, char **argv)
>           cluster_sectors = bdi.cluster_size / BDRV_SECTOR_SIZE;
>       }
>   
> -    if (compress) {
> -        if (cluster_sectors <= 0 || cluster_sectors > bufsectors) {
> -            error_report("invalid cluster size");
> -            ret = -1;
> -            goto out;
> -        }
> -        sector_num = 0;
> -
> -        nb_sectors = total_sectors;
> -
> -        for(;;) {
> -            int64_t bs_num;
> -            int remainder;
> -            uint8_t *buf2;
> -
> -            nb_sectors = total_sectors - sector_num;
> -            if (nb_sectors <= 0)
> -                break;
> -            if (nb_sectors >= cluster_sectors)
> -                n = cluster_sectors;
> -            else
> -                n = nb_sectors;
> -
> -            bs_num = sector_num - bs_offset;
> -            assert (bs_num >= 0);
> -            remainder = n;
> -            buf2 = buf;
> -            while (remainder > 0) {
> -                int nlow;
> -                while (bs_num == bs_sectors[bs_i]) {
> -                    bs_offset += bs_sectors[bs_i];
> -                    bs_i++;
> -                    assert (bs_i < bs_n);
> -                    bs_num = 0;
> -                    /* printf("changing part: sector_num=%" PRId64 ", "
> -                       "bs_i=%d, bs_offset=%" PRId64 ", bs_sectors=%" PRId64
> -                       "\n", sector_num, bs_i, bs_offset, bs_sectors[bs_i]); */
> -                }
> -                assert (bs_num < bs_sectors[bs_i]);
> -
> -                nlow = remainder > bs_sectors[bs_i] - bs_num
> -                    ? bs_sectors[bs_i] - bs_num : remainder;
> -
> -                ret = bdrv_read(bs[bs_i], bs_num, buf2, nlow);
> -                if (ret < 0) {
> -                    error_report("error while reading sector %" PRId64 ": %s",
> -                                 bs_num, strerror(-ret));
> -                    goto out;
> -                }
> -
> -                buf2 += nlow * 512;
> -                bs_num += nlow;
> -
> -                remainder -= nlow;
> -            }
> -            assert (remainder == 0);
> -
> -            if (!buffer_is_zero(buf, n * BDRV_SECTOR_SIZE)) {
> -                ret = bdrv_write_compressed(out_bs, sector_num, buf, n);
> -                if (ret != 0) {
> -                    error_report("error while compressing sector %" PRId64
> -                                 ": %s", sector_num, strerror(-ret));
> -                    goto out;
> -                }
> -            }
> -            sector_num += n;
> -            qemu_progress_print(100.0 * sector_num / total_sectors, 0);
> -        }
> -        /* signal EOF to align */
> -        bdrv_write_compressed(out_bs, 0, NULL, 0);
> -    } else {
> -        int64_t sectors_to_read, sectors_read, sector_num_next_status;
> -        bool count_allocated_sectors;
> -        int has_zero_init = min_sparse ? bdrv_has_zero_init(out_bs) : 0;
> -
> -        if (!has_zero_init && bdrv_can_write_zeroes_with_unmap(out_bs)) {
> -            ret = bdrv_make_zero(out_bs, BDRV_REQ_MAY_UNMAP);
> -            if (ret < 0) {
> -                goto out;
> -            }
> -            has_zero_init = 1;
> -        }
> -
> -        sectors_to_read = total_sectors;
> -        count_allocated_sectors = progress && (out_baseimg || has_zero_init);
> -restart:
> -        sector_num = 0; // total number of sectors converted so far
> -        sectors_read = 0;
> -        sector_num_next_status = 0;
> -
> -        for(;;) {
> -            nb_sectors = total_sectors - sector_num;
> -            if (nb_sectors <= 0) {
> -                if (count_allocated_sectors) {
> -                    sectors_to_read = sectors_read;
> -                    count_allocated_sectors = false;
> -                    goto restart;
> -                }
> -                ret = 0;
> -                break;
> -            }
> -
> -            while (sector_num - bs_offset >= bs_sectors[bs_i]) {
> -                bs_offset += bs_sectors[bs_i];
> -                bs_i ++;
> -                assert (bs_i < bs_n);
> -                /* printf("changing part: sector_num=%" PRId64 ", bs_i=%d, "
> -                  "bs_offset=%" PRId64 ", bs_sectors=%" PRId64 "\n",
> -                   sector_num, bs_i, bs_offset, bs_sectors[bs_i]); */
> -            }
> -
> -            if ((out_baseimg || has_zero_init) &&
> -                sector_num >= sector_num_next_status) {
> -                n = nb_sectors > INT_MAX ? INT_MAX : nb_sectors;
> -                ret = bdrv_get_block_status(bs[bs_i], sector_num - bs_offset,
> -                                            n, &n1);
> -                if (ret < 0) {
> -                    error_report("error while reading block status of sector %"
> -                                 PRId64 ": %s", sector_num - bs_offset,
> -                                 strerror(-ret));
> -                    goto out;
> -                }
> -                /* If the output image is zero initialized, we are not working
> -                 * on a shared base and the input is zero we can skip the next
> -                 * n1 sectors */
> -                if (has_zero_init && !out_baseimg && (ret & BDRV_BLOCK_ZERO)) {
> -                    sector_num += n1;
> -                    continue;
> -                }
> -                /* If the output image is being created as a copy on write
> -                 * image, assume that sectors which are unallocated in the
> -                 * input image are present in both the output's and input's
> -                 * base images (no need to copy them). */
> -                if (out_baseimg) {
> -                    if (!(ret & BDRV_BLOCK_DATA)) {
> -                        sector_num += n1;
> -                        continue;
> -                    }
> -                    /* The next 'n1' sectors are allocated in the input image.
> -                     * Copy only those as they may be followed by unallocated
> -                     * sectors. */
> -                    nb_sectors = n1;
> -                }
> -                /* avoid redundant callouts to get_block_status */
> -                sector_num_next_status = sector_num + n1;
> -            }
> -
> -            n = MIN(nb_sectors, bufsectors);
> -
> -            /* round down request length to an aligned sector, but
> -             * do not bother doing this on short requests. They happen
> -             * when we found an all-zero area, and the next sector to
> -             * write will not be sector_num + n. */
> -            if (cluster_sectors > 0 && n >= cluster_sectors) {
> -                int64_t next_aligned_sector = (sector_num + n);
> -                next_aligned_sector -= next_aligned_sector % cluster_sectors;
> -                if (sector_num + n > next_aligned_sector) {
> -                    n = next_aligned_sector - sector_num;
> -                }
> -            }
> -
> -            n = MIN(n, bs_sectors[bs_i] - (sector_num - bs_offset));
> -
> -            sectors_read += n;
> -            if (count_allocated_sectors) {
> -                sector_num += n;
> -                continue;
> -            }
> +    state = (ImgConvertState) {
> +        .src                = bs,
> +        .src_sectors        = bs_sectors,
> +        .src_num            = bs_n,
> +        .total_sectors      = total_sectors,
> +        .target             = out_bs,
> +        .compressed         = compress,
> +        .out_backing        = (bool) out_baseimg,
> +        .min_sparse         = min_sparse,
> +        .cluster_sectors    = cluster_sectors,
> +        .buf_sectors        = bufsectors,
> +    };
> +    ret = convert_do_copy(&state);
>   
> -            n1 = n;
> -            ret = bdrv_read(bs[bs_i], sector_num - bs_offset, buf, n);
> -            if (ret < 0) {
> -                error_report("error while reading sector %" PRId64 ": %s",
> -                             sector_num - bs_offset, strerror(-ret));
> -                goto out;
> -            }
> -            /* NOTE: at the same time we convert, we do not write zero
> -               sectors to have a chance to compress the image. Ideally, we
> -               should add a specific call to have the info to go faster */
> -            buf1 = buf;
> -            while (n > 0) {
> -                if (!has_zero_init ||
> -                    is_allocated_sectors_min(buf1, n, &n1, min_sparse)) {
> -                    ret = bdrv_write(out_bs, sector_num, buf1, n1);
> -                    if (ret < 0) {
> -                        error_report("error while writing sector %" PRId64
> -                                     ": %s", sector_num, strerror(-ret));
> -                        goto out;
> -                    }
> -                }
> -                sector_num += n1;
> -                n -= n1;
> -                buf1 += n1 * 512;
> -            }
> -            qemu_progress_print(100.0 * sectors_read / sectors_to_read, 0);
> -        }
> -    }
>   out:
>       if (!ret) {
>           qemu_progress_print(100, 0);
> @@ -1865,7 +1965,6 @@ out:
>       qemu_progress_end();
>       qemu_opts_del(opts);
>       qemu_opts_free(create_opts);
> -    qemu_vfree(buf);
>       qemu_opts_del(sn_opts);
>       blk_unref(out_blk);
>       g_free(bs);

next prev parent reply	other threads:[~2015-02-18 17:18 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-18 15:19 [Qemu-devel] [PATCH 0/2] qemu-img convert: Rewrite copying logic Kevin Wolf
2015-02-18 15:19 ` [Qemu-devel] [PATCH 1/2] " Kevin Wolf
2015-02-18 17:18   ` Max Reitz [this message]
2015-02-19 15:47     ` Kevin Wolf
2015-02-19 16:01       ` Max Reitz
2015-02-18 15:19 ` [Qemu-devel] [PATCH 2/2] qemu-iotests: Some qemu-img convert tests Kevin Wolf
2015-02-18 17:24   ` Max Reitz
2015-03-19 14:41   ` Max Reitz
2015-03-19 14:43     ` Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54E4C94A.2020404@redhat.com \
    --to=mreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).