qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>
Cc: "open list:Network Block Dev..." <qemu-block@nongnu.org>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	jsnow@redhat.com, vsementsov@yandex-team.ru,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Hanna Reitz" <hreitz@redhat.com>,
	quintela@redhat.com, "Thomas Lamprecht" <t.lamprecht@proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
	"Wolfgang Bumiller" <w.bumiller@proxmox.com>
Subject: Re: Question regarding live-migration with drive-mirror
Date: Wed, 28 Sep 2022 19:53:48 +0100
Message-ID: <YzSYPDR0L98Nks4P@work-vm>
In-Reply-To: <1db7f571-cb7f-c293-04cc-cd856e060c3f@proxmox.com>

* Fiona Ebner (f.ebner@proxmox.com) wrote:
> Hi,
> recently one of our users provided a backtrace[0] for the following
> assertion failure during a live migration that uses drive-mirror to sync
> a local disk:
> > bdrv_co_write_req_prepare: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed
> 
> The way we do migration with a local disk is essentially:
> 1. start target instance with a suitable NBD export
> 2. start drive-mirror on the source side and wait for it to become ready
> once
> 3. issue 'migrate' QMP command
> 4. cancel drive-mirror blockjob after the migration has finished
> 
> I reproduced the issue with the following fio script running in the
> guest (to dirty lots of clusters):
> > fio --name=make-mirror-work --size=100M --direct=1 --rw=randwrite \
> >     --bs=4k --ioengine=psync --numjobs=5 --runtime=60 --time_based
> 
> AFAIU, the issue is that nothing guarantees that the drive mirror is
> ready when the migration inactivates the block drives.

I don't know the block code well enough; I don't think I'd realised
that a drive-mirror could become unready.

> Is using copy-mode=write-blocking for drive-mirror the only way to avoid
> this issue? There, the downside is that the network (used by the mirror)
> would become a bottleneck for IO in the guest, while the behavior would
> really only be needed during the final phase.

It sounds like you need a way to switch to the blocking mode.
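
For reference, a minimal sketch of how the copy mode is chosen when the
mirror job is created. It only builds the QMP 'drive-mirror' command as
JSON and does not talk to QEMU. The device name and NBD target below are
made-up placeholders, and 'sync'/'mode' are assumptions about the setup
described above, not values taken from this thread.

```python
import json
from typing import Any, Dict


def drive_mirror_cmd(device: str, target: str, write_blocking: bool) -> Dict[str, Any]:
    """Build a QMP 'drive-mirror' command with an explicit copy-mode.

    'background' is the default copy mode; 'write-blocking' makes guest
    writes wait until they have also reached the mirror target.
    """
    return {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,
            "target": target,
            # Assumed values for a pre-created NBD export on the target side.
            "sync": "full",
            "mode": "existing",
            "copy-mode": "write-blocking" if write_blocking else "background",
        },
    }


if __name__ == "__main__":
    # Hypothetical device name and NBD export, for illustration only.
    cmd = drive_mirror_cmd(
        "drive-scsi0", "nbd:192.0.2.1:10809:exportname=drive-scsi0", True
    )
    print(json.dumps(cmd, indent=2))
```

The mode is fixed here at job creation, which is the limitation being
discussed: the sketch shows no way to change it on a running job.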

> I guess the assert should be avoided in any case. Here's a few ideas
> that came to mind:
> 1. migration should fail gracefully
> 2. migration should wait for the mirror-jobs to become ready before
> inactivating the block drives - that would increase the downtime in
> these situations of course
> 2A. additionally, drive-mirror could be taken into account when
> converging the migration somehow?

Does the migration capability 'pause-before-switchover' help you here?
If enabled, it causes the VM to pause just before the call to
bdrv_inactivate_all(); you then use migrate-continue to tell it to
carry on.
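
A rough sketch of how a management layer might use that capability,
untested against a real QEMU. The 'qmp' callable is a hypothetical
transport that sends one command and returns its "return" payload.
Treating "ready and offset == len" as "mirror is synced" is my
assumption, not something the block layer promises.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical transport: takes a QMP command dict, returns its "return" value.
Qmp = Callable[[Dict[str, Any]], Any]


def mirrors_synced(jobs: List[Dict[str, Any]]) -> bool:
    """Heuristic (an assumption): all mirror jobs are ready with nothing left to copy."""
    mirrors = [j for j in jobs if j.get("type") == "mirror"]
    return all(j.get("ready") and j.get("offset") == j.get("len") for j in mirrors)


def migrate_with_pause(qmp: Qmp, uri: str, poll_interval: float = 0.1,
                       max_polls: int = 1000) -> None:
    # Ask migration to stop in 'pre-switchover' before block devices are inactivated.
    qmp({"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "pause-before-switchover", "state": True}]}})
    qmp({"execute": "migrate", "arguments": {"uri": uri}})

    # Wait until the migration reports that it is paused before switchover.
    for _ in range(max_polls):
        status = qmp({"execute": "query-migrate"}).get("status")
        if status == "pre-switchover":
            break
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"migration ended with status {status!r}")
        time.sleep(poll_interval)
    else:
        raise TimeoutError("migration never reached pre-switchover")

    # The guest is paused now, so the mirror jobs should be able to catch up.
    for _ in range(max_polls):
        if mirrors_synced(qmp({"execute": "query-block-jobs"})):
            break
        time.sleep(poll_interval)
    else:
        raise TimeoutError("mirror jobs did not catch up")

    # Let the migration carry on with the switchover.
    qmp({"execute": "migrate-continue",
         "arguments": {"state": "pre-switchover"}})
```

The cost is the same as in idea 2 above: the time spent waiting for the
mirror in the second loop is added to the downtime.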

Dave

> I noticed the following comment in the mirror implementation
> >         /* Note that even when no rate limit is applied we need to yield
> >          * periodically with no pending I/O so that bdrv_drain_all() returns.
> >          * We do so every BLOCK_JOB_SLICE_TIME nanoseconds, or when there is
> >          * an error, or when the source is clean, whichever comes first. */
> 
> 3. change draining behavior after the job was ready once, so that
> bdrv_drain_all() will only return when the job is ready again? Hope I'm
> not completely misunderstanding this.
> 
> Best Regards,
> Fiona
> 
> [0]:
> > Thread 1 (Thread 0x7f3389d4a000 (LWP 2297576) "kvm"):
> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #1  0x00007f339488d537 in __GI_abort () at abort.c:79
> > #2  0x00007f339488d40f in __assert_fail_base (fmt=0x7f3394a056a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5595f85bfd70 "!(bs->open_flags & BDRV_O_INACTIVE)", file=0x5595f85cb576 "../block/io.c", line=2026, function=<optimized out>) at assert.c:92
> > #3  0x00007f339489c662 in __GI___assert_fail (assertion=assertion@entry=0x5595f85bfd70 "!(bs->open_flags & BDRV_O_INACTIVE)", file=file@entry=0x5595f85cb576 "../block/io.c", line=line@entry=2026, function=function@entry=0x5595f85cc510 <__PRETTY_FUNCTION__.8> "bdrv_co_write_req_prepare") at assert.c:101
> > #4  0x00005595f83218f2 in bdrv_co_write_req_prepare (child=0x5595f91cab90, offset=60867018752, bytes=196608, req=0x7f324a2e9d70, flags=0) at ../block/io.c:2026
> > #5  0x00005595f8323384 in bdrv_aligned_pwritev (child=child@entry=0x5595f91cab90, req=req@entry=0x7f324a2e9d70, offset=60867018752, bytes=196608, align=align@entry=1, qiov=0x5595f9030d58, qiov_offset=0, flags=0) at ../block/io.c:2140
> > #6  0x00005595f832485a in bdrv_co_pwritev_part (child=0x5595f91cab90, offset=<optimized out>, offset@entry=60867018752, bytes=<optimized out>, bytes@entry=196608, qiov=<optimized out>, qiov@entry=0x5595f9030d58, qiov_offset=<optimized out>, qiov_offset@entry=0, flags=flags@entry=0) at ../block/io.c:2353
> > #7  0x00005595f8315a09 in blk_co_do_pwritev_part (blk=blk@entry=0x5595f91db8c0, offset=60867018752, bytes=196608, qiov=qiov@entry=0x5595f9030d58, qiov_offset=qiov_offset@entry=0, flags=flags@entry=0) at ../block/block-backend.c:1365
> > #8  0x00005595f8315bdd in blk_co_pwritev_part (flags=0, qiov_offset=0, qiov=qiov@entry=0x5595f9030d58, bytes=<optimized out>, offset=<optimized out>, blk=0x5595f91db8c0) at ../block/block-backend.c:1380
> > #9  blk_co_pwritev (blk=0x5595f91db8c0, offset=<optimized out>, bytes=<optimized out>, qiov=qiov@entry=0x5595f9030d58, flags=flags@entry=0) at ../block/block-backend.c:1391
> > #10 0x00005595f8328a59 in mirror_read_complete (ret=0, op=0x5595f9030d50) at ../block/mirror.c:260
> > #11 mirror_co_read (opaque=0x5595f9030d50) at ../block/mirror.c:400
> > #12 0x00005595f843a39b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177
> > #13 0x00007f33948b8d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > #14 0x00007f324a3ea6e0 in ?? ()
> > #15 0x0000000000000000 in ?? ()
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


