All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fam Zheng <famz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, pl@kamp.de,
	pbonzini@redhat.com, ronniesahlberg@gmail.com,
	stefanha@redhat.com, mreitz@redhat.com, kwolf@redhat.com,
	jcody@redhat.com, jsnow@redhat.com, den@openvz.org
Subject: Re: [Qemu-devel] [PATCH v5 4/4] block/backup: fix fleecing scheme: use serialized writes
Date: Tue, 10 Jul 2018 10:57:43 +0800	[thread overview]
Message-ID: <20180710025743.GI17581@lemon.usersys.redhat.com> (raw)
In-Reply-To: <20180709163719.87107-5-vsementsov@virtuozzo.com>

On Mon, 07/09 19:37, Vladimir Sementsov-Ogievskiy wrote:
> Fleecing scheme works as follows: we want a kind of temporary snapshot
> of active drive A. We create temporary image B, with B->backing = A.
> Then we start backup(sync=none) from A to B. From this point, B reads
> as point-in-time snapshot of A (A continues to be active drive,
> accepting guest IO).
> 
> This scheme needs some additional synchronization between reads from B
> and backup COW operations, otherwise, the following situation is
> theoretically possible:
> 
> (assume B is qcow2, client is NBD client, reading from B)
> 
> 1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and
>    goes up to l2 table loading (assume cache miss)
> 
> 2) guest write => backup COW => qcow2 write =>
>    try to take qcow2 mutex => waiting
> 
> 3. l2 table loaded, we see that cluster is UNALLOCATED, go to
>    "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before
>    bdrv_co_preadv(bs->backing, ...)
> 
> 4) aha, mutex unlocked, backup COW continues, and we finally finish
>    guest write and change cluster in our active disk A
> 
> 5. actually, do bdrv_co_preadv(bs->backing, ...) and read
>    _new updated_ data.
> 
> To avoid this, let's make backup writes serializing, to not intersect
> with reads from B.
> 
> Note: we expand range of handled cases from (sync=none and
> B->backing = A) to just (A in backing chain of B), to finally allow
> safe reading from B during backup for all cases when A in backing chain
> of B, i.e. B formally looks like point-in-time snapshot of A.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/backup.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index f3e4e814b6..319fc922e8 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -47,6 +47,8 @@ typedef struct BackupBlockJob {
>      HBitmap *copy_bitmap;
>      bool use_copy_range;
>      int64_t copy_range_size;
> +
> +    bool serialize_target_writes;
>  } BackupBlockJob;
>  
>  static const BlockJobDriver backup_job_driver;
> @@ -102,6 +104,8 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>      QEMUIOVector qiov;
>      BlockBackend *blk = job->common.blk;
>      int nbytes;
> +    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
> +    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
>  
>      hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1);
>      nbytes = MIN(job->cluster_size, job->len - start);
> @@ -112,8 +116,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>      iov.iov_len = nbytes;
>      qemu_iovec_init_external(&qiov, &iov, 1);
>  
> -    ret = blk_co_preadv(blk, start, qiov.size, &qiov,
> -                        is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
> +    ret = blk_co_preadv(blk, start, qiov.size, &qiov, read_flags);
>      if (ret < 0) {
>          trace_backup_do_cow_read_fail(job, start, ret);
>          if (error_is_read) {
> @@ -124,11 +127,11 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>  
>      if (qemu_iovec_is_zero(&qiov)) {
>          ret = blk_co_pwrite_zeroes(job->target, start,
> -                                   qiov.size, BDRV_REQ_MAY_UNMAP);
> +                                   qiov.size, write_flags | BDRV_REQ_MAY_UNMAP);
>      } else {
>          ret = blk_co_pwritev(job->target, start,
> -                             qiov.size, &qiov,
> -                             job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
> +                             qiov.size, &qiov, write_flags |
> +                             (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));
>      }
>      if (ret < 0) {
>          trace_backup_do_cow_write_fail(job, start, ret);
> @@ -156,6 +159,8 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
>      int nr_clusters;
>      BlockBackend *blk = job->common.blk;
>      int nbytes;
> +    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
> +    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
>  
>      assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
>      nbytes = MIN(job->copy_range_size, end - start);
> @@ -163,7 +168,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
>      hbitmap_reset(job->copy_bitmap, start / job->cluster_size,
>                    nr_clusters);
>      ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
> -                            is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0);
> +                            read_flags, write_flags);
>      if (ret < 0) {
>          trace_backup_do_cow_copy_range_fail(job, start, ret);
>          hbitmap_set(job->copy_bitmap, start / job->cluster_size,
> @@ -701,6 +706,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>                         sync_bitmap : NULL;
>      job->compress = compress;
>  
> +    /* Detect image-fleecing (and similar) schemes */
> +    job->serialize_target_writes = bdrv_chain_contains(target, bs);
> +
>      /* If there is no backing file on the target, we cannot rely on COW if our
>       * backup cluster size is smaller than the target cluster size. Even for
>       * targets with a backing file, try to avoid COW if possible. */

I always thought mirror is by far the most complicated job, but now backup is
catching up quickly!

Reviewed-by: Fam Zheng <famz@redhat.com>

  reply	other threads:[~2018-07-10  2:57 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-09 16:37 [Qemu-devel] [PATCH v5 0/4] fix image fleecing Vladimir Sementsov-Ogievskiy
2018-07-09 16:37 ` [Qemu-devel] [PATCH v5 1/4] block/io: fix copy_range Vladimir Sementsov-Ogievskiy
2018-07-10  1:49   ` Fam Zheng
2018-07-09 16:37 ` [Qemu-devel] [PATCH v5 2/4] block: split flags in copy_range Vladimir Sementsov-Ogievskiy
2018-07-10  1:52   ` Fam Zheng
2018-07-09 16:37 ` [Qemu-devel] [PATCH v5 3/4] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
2018-07-10  2:51   ` Fam Zheng
2018-07-09 16:37 ` [Qemu-devel] [PATCH v5 4/4] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
2018-07-10  2:57   ` Fam Zheng [this message]
2018-07-10 11:13 ` [Qemu-devel] [PATCH v5 0/4] fix image fleecing Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180710025743.GI17581@lemon.usersys.redhat.com \
    --to=famz@redhat.com \
    --cc=den@openvz.org \
    --cc=jcody@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=ronniesahlberg@gmail.com \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.