From: "Denis V. Lunev" <den@openvz.org>
To: qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: vsementsov@virtuozzo.com, "Denis V. Lunev" <den@openvz.org>,
    Stefan Hajnoczi <stefanha@redhat.com>,
    Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>,
    Max Reitz <mreitz@redhat.com>, Jeff Cody <jcody@redhat.com>,
    Eric Blake <eblake@redhat.com>
Subject: [Qemu-devel] [PATCH 8/9] mirror: use synch scheme for drive mirror
Date: Tue, 14 Jun 2016 18:25:15 +0300
Message-ID: <1465917916-22348-9-git-send-email-den@openvz.org>
In-Reply-To: <1465917916-22348-1-git-send-email-den@openvz.org>

Block commit of the active image to the backing store on a slow disk
may never finish. For example, a guest running the following loop

    while true; do
        dd bs=1k count=1 if=/dev/zero of=x
    done

on top of slow storage prevents the operation from completing in a
reasonable amount of time:

    virsh blockcommit rhel7 sda --active --shallow
    virsh qemu-monitor-event
    virsh qemu-monitor-command rhel7 \
        '{"execute":"block-job-complete",
          "arguments":{"device":"drive-scsi0-0-0-0"} }'
    virsh qemu-monitor-event

The completion event is never received.

This problem cannot be fixed easily within the current architecture. We
should either prohibit guest writes (which keep making the dirty bitmap
dirty) or switch to a synchronous scheme.

This patch implements the latter. It adds the mirror_before_write_notify
callback, so that all data written by the guest is synchronously written
to the mirror target. This solves the problem only partially, though: we
should also switch from bdrv_dirty_bitmap to a plain hbitmap. This will
be done in the next patch.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
---
block/mirror.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 78 insertions(+)
diff --git a/block/mirror.c b/block/mirror.c
index 7471211..086256c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -58,6 +58,9 @@ typedef struct MirrorBlockJob {
     QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
     int buf_free_count;
+    NotifierWithReturn before_write;
+    CoQueue dependent_writes;
+
     unsigned long *in_flight_bitmap;
     int in_flight;
     int sectors_in_flight;
@@ -125,6 +128,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
     g_free(op->buf);
     g_free(op);
+    qemu_co_queue_restart_all(&s->dependent_writes);
     if (s->waiting_for_io) {
         qemu_coroutine_enter(s->common.co, NULL);
     }
@@ -511,6 +515,74 @@ static void mirror_exit(BlockJob *job, void *opaque)
     bdrv_unref(src);
 }
+static int coroutine_fn mirror_before_write_notify(
+    NotifierWithReturn *notifier, void *opaque)
+{
+    MirrorBlockJob *s = container_of(notifier, MirrorBlockJob, before_write);
+    BdrvTrackedRequest *req = opaque;
+    MirrorOp *op;
+    int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+    int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
+    int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
+    int64_t end_sector = sector_num + nb_sectors;
+    int64_t aligned_start, aligned_end;
+
+    if (req->type != BDRV_TRACKED_DISCARD && req->type != BDRV_TRACKED_WRITE) {
+        /* neither a discard nor a write, nothing to do */
+        return 0;
+    }
+
+    while (1) {
+        bool waited = false;
+        int64_t sn;
+
+        for (sn = sector_num; sn < end_sector; sn += sectors_per_chunk) {
+            int64_t chunk = sn / sectors_per_chunk;
+            if (test_bit(chunk, s->in_flight_bitmap)) {
+                trace_mirror_yield_in_flight(s, chunk, s->in_flight);
+                qemu_co_queue_wait(&s->dependent_writes);
+                waited = true;
+            }
+        }
+
+        if (!waited) {
+            break;
+        }
+    }
+
+    aligned_start = QEMU_ALIGN_UP(sector_num, sectors_per_chunk);
+    aligned_end = QEMU_ALIGN_DOWN(sector_num + nb_sectors, sectors_per_chunk);
+    if (aligned_end > aligned_start) {
+        bdrv_reset_dirty_bitmap(s->dirty_bitmap, aligned_start,
+                                aligned_end - aligned_start);
+    }
+
+    if (req->type == BDRV_TRACKED_DISCARD) {
+        mirror_do_zero_or_discard(s, sector_num, nb_sectors, true);
+        return 0;
+    }
+
+    /* Allocate a MirrorOp that is used as an AIO callback. */
+    op = g_new(MirrorOp, 1);
+    op->s = s;
+    op->sector_num = sector_num;
+    op->nb_sectors = nb_sectors;
+    op->buf = qemu_try_blockalign(blk_bs(s->target), req->qiov->size);
+    if (op->buf == NULL) {
+        g_free(op);
+        return -ENOMEM;
+    }
+    qemu_iovec_init(&op->qiov, req->qiov->niov);
+    qemu_iovec_clone(&op->qiov, req->qiov, op->buf);
+
+    s->in_flight++;
+    s->sectors_in_flight += nb_sectors;
+
+    blk_aio_pwritev(s->target, req->offset, &op->qiov, 0,
+                    mirror_write_complete, op);
+    return 0;
+}
+
 static int mirror_dirty_init(MirrorBlockJob *s)
 {
     int64_t sector_num, end;
@@ -764,6 +836,8 @@ immediate_exit:
         mirror_drain(s);
     }
+    notifier_with_return_remove(&s->before_write);
+
     assert(s->in_flight == 0);
     qemu_vfree(s->buf);
     g_free(s->cow_bitmap);
@@ -905,6 +979,10 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
         return;
     }
+    qemu_co_queue_init(&s->dependent_writes);
+    s->before_write.notify = mirror_before_write_notify;
+    bdrv_add_before_write_notifier(bs, &s->before_write);
+
     bdrv_op_block_all(target, s->common.blocker);
     s->common.co = qemu_coroutine_create(mirror_run);
--
2.5.0