qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Fam Zheng <famz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, pl@kamp.de,
	pbonzini@redhat.com, ronniesahlberg@gmail.com,
	stefanha@redhat.com, mreitz@redhat.com, kwolf@redhat.com,
	jcody@redhat.com, jsnow@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [PATCH v5 4/4] block/backup: fix fleecing scheme: use serialized writes
Date: Tue, 10 Jul 2018 10:57:43 +0800	[thread overview]
Message-ID: <20180710025743.GI17581@lemon.usersys.redhat.com> (raw)
In-Reply-To: <20180709163719.87107-5-vsementsov@virtuozzo.com>

On Mon, 07/09 19:37, Vladimir Sementsov-Ogievskiy wrote:
> Fleecing scheme works as follows: we want a kind of temporary snapshot
> of active drive A. We create temporary image B, with B->backing = A.
> Then we start backup(sync=none) from A to B. From this point, B reads
> as point-in-time snapshot of A (A continues to be active drive,
> accepting guest IO).
> 
> This scheme needs some additional synchronization between reads from B
> and backup COW operations, otherwise, the following situation is
> theoretically possible:
> 
> (assume B is qcow2, client is NBD client, reading from B)
> 
> 1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and
>    goes up to l2 table loading (assume cache miss)
> 
> 2) guest write => backup COW => qcow2 write =>
>    try to take qcow2 mutex => waiting
> 
> 3. l2 table loaded, we see that cluster is UNALLOCATED, go to
>    "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before
>    bdrv_co_preadv(bs->backing, ...)
> 
> 4) aha, mutex unlocked, backup COW continues, and we finally finish
>    guest write and change cluster in our active disk A
> 
> 5. actually, do bdrv_co_preadv(bs->backing, ...) and read
>    _new updated_ data.
> 
> To avoid this, let's make backup writes serializing, to not intersect
> with reads from B.
> 
> Note: we expand range of handled cases from (sync=none and
> B->backing = A) to just (A in backing chain of B), to finally allow
> safe reading from B during backup for all cases when A in backing chain
> of B, i.e. B formally looks like point-in-time snapshot of A.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/backup.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index f3e4e814b6..319fc922e8 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -47,6 +47,8 @@ typedef struct BackupBlockJob {
>      HBitmap *copy_bitmap;
>      bool use_copy_range;
>      int64_t copy_range_size;
> +
> +    bool serialize_target_writes;
>  } BackupBlockJob;
>  
>  static const BlockJobDriver backup_job_driver;
> @@ -102,6 +104,8 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>      QEMUIOVector qiov;
>      BlockBackend *blk = job->common.blk;
>      int nbytes;
> +    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
> +    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
>  
>      hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1);
>      nbytes = MIN(job->cluster_size, job->len - start);
> @@ -112,8 +116,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>      iov.iov_len = nbytes;
>      qemu_iovec_init_external(&qiov, &iov, 1);
>  
> -    ret = blk_co_preadv(blk, start, qiov.size, &qiov,
> -                        is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
> +    ret = blk_co_preadv(blk, start, qiov.size, &qiov, read_flags);
>      if (ret < 0) {
>          trace_backup_do_cow_read_fail(job, start, ret);
>          if (error_is_read) {
> @@ -124,11 +127,11 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>  
>      if (qemu_iovec_is_zero(&qiov)) {
>          ret = blk_co_pwrite_zeroes(job->target, start,
> -                                   qiov.size, BDRV_REQ_MAY_UNMAP);
> +                                   qiov.size, write_flags | BDRV_REQ_MAY_UNMAP);
>      } else {
>          ret = blk_co_pwritev(job->target, start,
> -                             qiov.size, &qiov,
> -                             job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
> +                             qiov.size, &qiov, write_flags |
> +                             (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));
>      }
>      if (ret < 0) {
>          trace_backup_do_cow_write_fail(job, start, ret);
> @@ -156,6 +159,8 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
>      int nr_clusters;
>      BlockBackend *blk = job->common.blk;
>      int nbytes;
> +    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
> +    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
>  
>      assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
>      nbytes = MIN(job->copy_range_size, end - start);
> @@ -163,7 +168,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
>      hbitmap_reset(job->copy_bitmap, start / job->cluster_size,
>                    nr_clusters);
>      ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
> -                            is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0);
> +                            read_flags, write_flags);
>      if (ret < 0) {
>          trace_backup_do_cow_copy_range_fail(job, start, ret);
>          hbitmap_set(job->copy_bitmap, start / job->cluster_size,
> @@ -701,6 +706,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>                         sync_bitmap : NULL;
>      job->compress = compress;
>  
> +    /* Detect image-fleecing (and similar) schemes */
> +    job->serialize_target_writes = bdrv_chain_contains(target, bs);
> +
>      /* If there is no backing file on the target, we cannot rely on COW if our
>       * backup cluster size is smaller than the target cluster size. Even for
>       * targets with a backing file, try to avoid COW if possible. */

I always thought mirror is by far the most complicated job, but now backup is
catching up quickly!

Reviewed-by: Fam Zheng <famz@redhat.com>

  reply	other threads:[~2018-07-10  2:57 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-09 16:37 [Qemu-devel] [PATCH v5 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
2018-07-09 16:37 ` [Qemu-devel] [PATCH v5 1/4] block/io: fix copy_range Vladimir Sementsov-Ogievskiy
2018-07-10  1:49   ` Fam Zheng
2018-07-09 16:37 ` [Qemu-devel] [PATCH v5 2/4] block: split flags in copy_range Vladimir Sementsov-Ogievskiy
2018-07-10  1:52   ` Fam Zheng
2018-07-09 16:37 ` [Qemu-devel] [PATCH v5 3/4] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
2018-07-10  2:51   ` Fam Zheng
2018-07-09 16:37 ` [Qemu-devel] [PATCH v5 4/4] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
2018-07-10  2:57   ` Fam Zheng [this message]
2018-07-10 11:13 ` [Qemu-devel] [PATCH v5 0/4] fix image fleecing Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180710025743.GI17581@lemon.usersys.redhat.com \
    --to=famz@redhat.com \
    --cc=den@openvz.org \
    --cc=jcody@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=ronniesahlberg@gmail.com \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).