From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-block@nongnu.org, qemu-devel@nongnu.org
Cc: fam@euphon.net, stefanha@redhat.com, jcody@redhat.com,
kwolf@redhat.com, den@openvz.org, eblake@redhat.com,
jsnow@redhat.com
Subject: Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
Date: Fri, 18 Jan 2019 15:56:45 +0100 [thread overview]
Message-ID: <1ea09d99-04ff-711e-9137-cdae33ca8681@redhat.com> (raw)
In-Reply-To: <20181229122027.42245-12-vsementsov@virtuozzo.com>
[-- Attachment #1: Type: text/plain, Size: 12784 bytes --]
On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Drop write notifiers and use filter node instead. Changes:
>
> 1. copy-before-writes now handled by filter node, so, drop all
> is_write_notifier arguments.
>
> 2. we don't have intersecting requests, so their handling is dropped.
> Instead, synchronization works as follows:
> when backup or backup-top starts copying of some area it firstly
> clears copy-bitmap bits, and nobody touches areas, not marked with
> dirty bits in copy-bitmap, so there is no intersection. Also, backup
> job copy operations are surrounded by bdrv region lock, which is
> actually serializing request, to not interfer with guest writes and
> not read changed data from source (before reading we clear
> corresponding bit in copy-bitmap, so, this area is not more handled by
> backup-top).
>
> 3. To sync with in-flight requests we now just drain hook node, we
> don't need rw-lock.
>
> 4. After the whole backup loop (top, full, incremental modes), we need
> to check for not copied clusters, which were under backup-top operation
> and we skipped them, but backup-top operation failed, error returned to
> the guest and dirty bits set back.
>
> 5. Don't create additional blk, use backup-top children for copy
> operations.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> block/backup.c | 285 ++++++++++++++++++++++++++-----------------------
> 1 file changed, 149 insertions(+), 136 deletions(-)
>
> diff --git a/block/backup.c b/block/backup.c
> index 88c0242b4e..e332909fb7 100644
> --- a/block/backup.c
> +++ b/block/backup.c
[...]
> @@ -300,21 +231,23 @@ static void backup_abort(Job *job)
> static void backup_clean(Job *job)
> {
> BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
> - assert(s->target);
> - blk_unref(s->target);
> +
> + /* We must clean it to not crash in backup_drain. */
> s->target = NULL;
Why not set s->source to NULL along with it? It makes sense if you're
going to drop the backup-top node because both of these are its children.
>
> if (s->copy_bitmap) {
> hbitmap_free(s->copy_bitmap);
> s->copy_bitmap = NULL;
> }
> +
> + bdrv_backup_top_drop(s->backup_top);
> }
[...]
> @@ -386,21 +319,45 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
> bool error_is_read;
> int64_t offset;
> HBitmapIter hbi;
> + void *lock = NULL;
>
> hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> - while ((offset = hbitmap_iter_next(&hbi)) != -1) {
> + while (hbitmap_count(job->copy_bitmap)) {
> + offset = hbitmap_iter_next(&hbi);
> + if (offset == -1) {
> + /*
> + * we may have skipped some clusters, which were handled by
> + * backup-top, but failed and finished by returning error to
> + * the guest and set dirty bit back.
> + */
> + hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> + offset = hbitmap_iter_next(&hbi);
> + assert(offset);
I think you want to assert "offset >= 0".
> + }
> +
> + lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
> + /*
> + * Dirty bit is set, which means that there are no in-flight
> + * write requests on this area. We must succeed.
> + */
> + assert(lock);
I'm not sure that is true right now, but more on that below in backup_run().
> +
> do {
> if (yield_and_check(job)) {
> + bdrv_co_unlock(lock);
> return 0;
> }
> - ret = backup_do_cow(job, offset,
> - job->cluster_size, &error_is_read, false);
> + ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
> if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
> BLOCK_ERROR_ACTION_REPORT)
> {
> + bdrv_co_unlock(lock);
> return ret;
> }
> } while (ret < 0);
> +
> + bdrv_co_unlock(lock);
> + lock = NULL;
This statement seems unnecessary here.
> }
>
> return 0;
[...]
> @@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
> hbitmap_set(s->copy_bitmap, 0, s->len);
> }
>
> - s->before_write.notify = backup_before_write_notify;
> - bdrv_add_before_write_notifier(bs, &s->before_write);
> -
> if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
> /* All bits are set in copy_bitmap to allow any cluster to be copied.
> * This does not actually require them to be copied. */
> while (!job_is_cancelled(job)) {
> - /* Yield until the job is cancelled. We just let our before_write
> - * notify callback service CoW requests. */
> + /*
> + * Yield until the job is cancelled. We just let our backup-top
> + * fileter driver service CbW requests.
*filter
> + */
> job_yield(job);
> }
> } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
> ret = backup_run_incremental(s);
> } else {
[...]
> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
> if (alloced < 0) {
> ret = alloced;
> } else {
> + if (!hbitmap_get(s->copy_bitmap, offset)) {
> + trace_backup_do_cow_skip(job, offset);
> + continue; /* already copied */
> + }
> + if (!lock) {
> + lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
> + /*
> + * Dirty bit is set, which means that there are no in-flight
> + * write requests on this area. We must succeed.
> + */
> + assert(lock);
What if I have a different parent node for the source that issues
concurrent writes? This used to work fine because the before_write
notifier would still work. After this patch, that would be broken
because those writes would not cause a CbW.
That's not so bad because we just have to make sure that all writes go
through the backup-top node. That would make this assertion valid
again, too. But that means we cannot share PERM_WRITE; see [1].
> + }
> ret = backup_do_cow(s, offset, s->cluster_size,
> - &error_is_read, false);
> + &error_is_read);
> }
> if (ret < 0) {
> /* Depending on error action, fail now or retry cluster */
> @@ -516,17 +497,34 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
> break;
> } else {
> offset -= s->cluster_size;
> + retry = true;
> continue;
> }
> }
> }
> + if (lock) {
> + bdrv_co_unlock(lock);
> + lock = NULL;
> + }
> + if (ret == 0 && !job_is_cancelled(job) &&
> + hbitmap_count(s->copy_bitmap))
> + {
> + /*
> + * we may have skipped some clusters, which were handled by
> + * backup-top, but failed and finished by returning error to
> + * the guest and set dirty bit back.
So it's a matter of a race?
> + */
> + goto iteration;
> + }
Why not wrap everything in a do {} while (ret == 0 && !job_is...)
instead? Because it would mean reindenting everything?
> }
>
> - notifier_with_return_remove(&s->before_write);
> + /* wait pending CBW operations in backup-top */
> + bdrv_drain(s->backup_top);
>
> - /* wait until pending backup_do_cow() calls have completed */
> - qemu_co_rwlock_wrlock(&s->flush_rwlock);
> - qemu_co_rwlock_unlock(&s->flush_rwlock);
> + backup_top_progress = bdrv_backup_top_progress(s->backup_top);
> + job_progress_update(job, ret + backup_top_progress -
Why the "ret"?
> + s->backup_top_progress);
> + s->backup_top_progress = backup_top_progress;
So the backup-top progress is ignored during basically all of the block
job until it suddenly jumps to 100 % completion? That doesn't sound ideal.
Or did you mean to put this into the for () loop of MODE_TOP/MODE_FULL?
(And the while() loop of MODE_NONE)
>
> return ret;
> }
> @@ -563,6 +561,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> int ret;
> int64_t cluster_size;
> HBitmap *copy_bitmap = NULL;
> + BlockDriverState *backup_top = NULL;
> + uint64_t all_except_resize = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> + BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD;
BLK_PERM_ALL & ~BLK_PERM_RESIZE?
[1] But we probably do not want to share either PERM_WRITE or
PERM_WRITE_UNCHANGED because during the duration of the backup,
everything should go through the backup-top filter (not sure about
PERM_WRITE_UNCHANGED right now). Or is that something that the filter
node should enforce in backup_top_child_perm()?
>
> assert(bs);
> assert(target);
> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>
> copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
>
> - /* job->len is fixed, so we can't allow resize */
> - job = block_job_create(job_id, &backup_job_driver, txn, bs,
> - BLK_PERM_CONSISTENT_READ,
> - BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> - BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
> - speed, creation_flags, cb, opaque, errp);
> - if (!job) {
> + /*
> + * bdrv_get_device_name will not help to find device name starting from
> + * @bs after backup-top append,
Why not? Since backup-top is appended, shouldn't all parents of @bs be
parents of @backup_top then? (Making bdrv_get_parent_name() return the
same result)
> so let's calculate job_id before. Do
> + * it in the same way like block_job_create
> + */
> + if (job_id == NULL && !(creation_flags & JOB_INTERNAL)) {
> + job_id = bdrv_get_device_name(bs);
> + }
> +
> + backup_top = bdrv_backup_top_append(bs, target, copy_bitmap, errp);
> + if (!backup_top) {
> goto error;
> }
>
> - /* The target must match the source in size, so no resize here either */
> - job->target = blk_new(BLK_PERM_WRITE,
> - BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> - BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
> - ret = blk_insert_bs(job->target, target, errp);
> - if (ret < 0) {
> + /* job->len is fixed, so we can't allow resize */
> + job = block_job_create(job_id, &backup_job_driver, txn, bs, 0,
Is there a reason you dropped PERM_CONSISTENT_READ here?
> + all_except_resize, speed, creation_flags,
> + cb, opaque, errp);
> + if (!job) {
> goto error;
> }
>
> + job->source = backup_top->backing;
> + job->target = ((BDRVBackupTopState *)backup_top->opaque)->target;
This looks really ugly. I think as long as the block job performs
writes itself, it should use its own BlockBackend.
Alternatively, I think it would make sense for the backup-top filter to
offer functions that this job can use to issue writes to the target.
Then the backup job would no longer need direct access to the target as
a BdrvChild.
> +
> job->on_source_error = on_source_error;
> job->on_target_error = on_target_error;
> job->sync_mode = sync_mode;
[...]
> @@ -712,6 +722,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> backup_clean(&job->common.job);
> job_early_fail(&job->common.job);
> }
> + if (backup_top) {
> + bdrv_backup_top_drop(backup_top);
> + }
While it shouldn't be a problem in practice, backup_top has a reference
to copy_bitmap, so that bitmap should be freed after backup_top is dropped.
Max
>
> return NULL;
> }
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2019-01-18 14:57 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap Vladimir Sementsov-Ogievskiy
2019-01-14 13:10 ` Max Reitz
2019-01-14 13:13 ` Max Reitz
2019-01-14 14:01 ` Vladimir Sementsov-Ogievskiy
2019-01-14 14:13 ` Max Reitz
2019-01-14 14:48 ` Vladimir Sementsov-Ogievskiy
2019-01-16 13:05 ` Max Reitz
2019-01-23 8:20 ` Vladimir Sementsov-Ogievskiy
2019-01-23 13:19 ` Max Reitz
2019-01-23 14:36 ` Eric Blake
2019-01-24 14:20 ` Vladimir Sementsov-Ogievskiy
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 02/11] block/backup: move to copy_bitmap with granularity Vladimir Sementsov-Ogievskiy
2019-01-14 14:10 ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 03/11] block: improve should_update_child Vladimir Sementsov-Ogievskiy
2019-01-14 14:32 ` Max Reitz
2019-01-14 16:13 ` Vladimir Sementsov-Ogievskiy
2019-01-16 13:17 ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 04/11] iotests: handle -f argument correctly for qemu_io_silent Vladimir Sementsov-Ogievskiy
2019-01-14 14:36 ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name Vladimir Sementsov-Ogievskiy
2019-01-14 14:46 ` Max Reitz
2019-01-14 16:06 ` Vladimir Sementsov-Ogievskiy
2019-01-16 13:11 ` Max Reitz
2019-01-23 13:22 ` Vladimir Sementsov-Ogievskiy
2019-01-23 13:31 ` Vladimir Sementsov-Ogievskiy
2019-01-23 13:33 ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 06/11] iotests: prepare 055 to graph changes during backup job Vladimir Sementsov-Ogievskiy
2019-01-16 13:48 ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver Vladimir Sementsov-Ogievskiy
2019-01-16 16:02 ` Max Reitz
2019-01-17 12:13 ` Vladimir Sementsov-Ogievskiy
2019-01-18 12:05 ` Max Reitz
2019-01-23 13:47 ` Vladimir Sementsov-Ogievskiy
2019-04-13 16:08 ` Vladimir Sementsov-Ogievskiy
2019-04-13 16:08 ` Vladimir Sementsov-Ogievskiy
2019-04-13 17:03 ` Vladimir Sementsov-Ogievskiy
2019-04-13 17:03 ` Vladimir Sementsov-Ogievskiy
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 08/11] block/io: refactor wait_serialising_requests Vladimir Sementsov-Ogievskiy
2019-01-16 16:18 ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 09/11] block: add lock/unlock range functions Vladimir Sementsov-Ogievskiy
2019-01-16 16:36 ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 10/11] block/backup: tiny refactor backup_job_create Vladimir Sementsov-Ogievskiy
2019-01-18 13:00 ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers Vladimir Sementsov-Ogievskiy
2019-01-18 14:56 ` Max Reitz [this message]
2019-01-28 11:29 ` Vladimir Sementsov-Ogievskiy
2019-01-28 15:59 ` Max Reitz
2019-01-28 16:44 ` Vladimir Sementsov-Ogievskiy
2019-01-28 16:53 ` Max Reitz
2019-01-28 17:14 ` Vladimir Sementsov-Ogievskiy
2019-01-28 17:40 ` Kevin Wolf
2019-01-28 19:00 ` Vladimir Sementsov-Ogievskiy
2019-01-23 15:26 ` [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup no-reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1ea09d99-04ff-711e-9137-cdae33ca8681@redhat.com \
--to=mreitz@redhat.com \
--cc=den@openvz.org \
--cc=eblake@redhat.com \
--cc=fam@euphon.net \
--cc=jcody@redhat.com \
--cc=jsnow@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
--cc=vsementsov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).