From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>
Cc: "open list:Network Block Dev..." <qemu-block@nongnu.org>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	jsnow@redhat.com, vsementsov@yandex-team.ru,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Hanna Reitz" <hreitz@redhat.com>,
	quintela@redhat.com, "Thomas Lamprecht" <t.lamprecht@proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
	"Wolfgang Bumiller" <w.bumiller@proxmox.com>
Subject: Re: Question regarding live-migration with drive-mirror
Date: Wed, 28 Sep 2022 19:53:48 +0100
Message-ID: <YzSYPDR0L98Nks4P@work-vm>
In-Reply-To: <1db7f571-cb7f-c293-04cc-cd856e060c3f@proxmox.com>

* Fiona Ebner (f.ebner@proxmox.com) wrote:
> Hi,
> recently one of our users provided a backtrace[0] for the following
> assertion failure during a live migration that uses drive-mirror to sync
> a local disk:
> > bdrv_co_write_req_prepare: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed
> 
> The way we do migration with a local disk is essentially:
> 1. start target instance with a suitable NBD export
> 2. start drive-mirror on the source side and wait for it to become ready
> once
> 3. issue 'migrate' QMP command
> 4. cancel drive-mirror blockjob after the migration has finished
> 
> I reproduced the issue with the following fio script running in the
> guest (to dirty lots of clusters):
> > fio --name=make-mirror-work --size=100M --direct=1 --rw=randwrite \
> >     --bs=4k --ioengine=psync --numjobs=5 --runtime=60 --time_based
> 
> AFAIU, the issue is that nothing guarantees that the drive mirror is
> ready when the migration inactivates the block drives.

I don't know the block code well enough; I don't think I'd realised
that a drive-mirror could become unready.

> Is using copy-mode=write-blocking for drive-mirror the only way to avoid
> this issue? There, the downside is that the network (used by the mirror)
> would become a bottleneck for IO in the guest, while the behavior would
> really only be needed during the final phase.

It sounds like you need a way to switch to the blocking mode.

> I guess the assert should be avoided in any case. Here's a few ideas
> that came to mind:
> 1. migration should fail gracefully
> 2. migration should wait for the mirror-jobs to become ready before
> inactivating the block drives - that would increase the downtime in
> these situations of course
> 2A. additionally, drive-mirror could be taken into account when
> converging the migration somehow?

Does the migration capability 'pause-before-switchover' help you here?
If enabled, it causes the VM to pause just before bdrv_inactivate_all;
you then use migrate-continue to tell it to carry on.
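
As a rough, untested sketch of how a management layer might use that:
enable the capability, start the migration, wait for the
'pre-switchover' state, check the mirror job, and only then issue
migrate-continue. The QMP command names are the real ones; the `send`
callable, the job id and the timeouts are placeholders made up for
illustration, and whether checking the job's `ready` flag at that point
is sufficient is an assumption the block folks would need to confirm.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport: takes one QMP command dict and returns the
# contents of its "return" field. Not part of QEMU; supply your own.
QmpSend = Callable[[Dict[str, Any]], Any]


def _poll(check: Callable[[], bool], what: str,
          poll_interval: float, timeout: float) -> None:
    """Call check() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not check():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for {what}")
        time.sleep(poll_interval)


def migrate_with_mirror(send: QmpSend, uri: str, mirror_job_id: str,
                        poll_interval: float = 0.5,
                        timeout: float = 300.0) -> None:
    # Ask migration to stop in 'pre-switchover' before block devices
    # are inactivated.
    send({"execute": "migrate-set-capabilities",
          "arguments": {"capabilities": [
              {"capability": "pause-before-switchover", "state": True}]}})
    send({"execute": "migrate", "arguments": {"uri": uri}})

    def in_pre_switchover() -> bool:
        status = send({"execute": "query-migrate"}).get("status")
        if status in ("failed", "cancelled", "completed"):
            raise RuntimeError(f"unexpected migration status: {status}")
        return status == "pre-switchover"

    _poll(in_pre_switchover, "pre-switchover", poll_interval, timeout)

    def mirror_ready() -> bool:
        # Assumption: the 'ready' flag is a good enough signal here.
        for job in send({"execute": "query-block-jobs"}):
            if job.get("device") == mirror_job_id:
                return bool(job.get("ready"))
        raise RuntimeError(f"block job {mirror_job_id!r} not found")

    _poll(mirror_ready, "mirror job to be ready", poll_interval, timeout)

    # Mirror has caught up; let migration inactivate the block devices.
    send({"execute": "migrate-continue",
          "arguments": {"state": "pre-switchover"}})
```

The time spent waiting for the mirror in 'pre-switchover' adds to the
downtime, so this is a variant of your idea 2 driven from the
management side rather than from inside QEMU.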

Dave

> I noticed the following comment in the mirror implementation
> >         /* Note that even when no rate limit is applied we need to yield
> >          * periodically with no pending I/O so that bdrv_drain_all() returns.
> >          * We do so every BLKOCK_JOB_SLICE_TIME nanoseconds, or when there is
> >          * an error, or when the source is clean, whichever comes first. */
> 
> 3. change draining behavior after the job was ready once, so that
> bdrv_drain_all() will only return when the job is ready again? Hope I'm
> not completely misunderstanding this.
> 
> Best Regards,
> Fiona
> 
> [0]:
> > Thread 1 (Thread 0x7f3389d4a000 (LWP 2297576) "kvm"):
> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #1  0x00007f339488d537 in __GI_abort () at abort.c:79
> > #2  0x00007f339488d40f in __assert_fail_base (fmt=0x7f3394a056a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5595f85bfd70 "!(bs->open_flags & BDRV_O_INACTIVE)", file=0x5595f85cb576 "../block/io.c", line=2026, function=<optimized out>) at assert.c:92
> > #3  0x00007f339489c662 in __GI___assert_fail (assertion=assertion@entry=0x5595f85bfd70 "!(bs->open_flags & BDRV_O_INACTIVE)", file=file@entry=0x5595f85cb576 "../block/io.c", line=line@entry=2026, function=function@entry=0x5595f85cc510 <__PRETTY_FUNCTION__.8> "bdrv_co_write_req_prepare") at assert.c:101
> > #4  0x00005595f83218f2 in bdrv_co_write_req_prepare (child=0x5595f91cab90, offset=60867018752, bytes=196608, req=0x7f324a2e9d70, flags=0) at ../block/io.c:2026
> > #5  0x00005595f8323384 in bdrv_aligned_pwritev (child=child@entry=0x5595f91cab90, req=req@entry=0x7f324a2e9d70, offset=60867018752, bytes=196608, align=align@entry=1, qiov=0x5595f9030d58, qiov_offset=0, flags=0) at ../block/io.c:2140
> > #6  0x00005595f832485a in bdrv_co_pwritev_part (child=0x5595f91cab90, offset=<optimized out>, offset@entry=60867018752, bytes=<optimized out>, bytes@entry=196608, qiov=<optimized out>, qiov@entry=0x5595f9030d58, qiov_offset=<optimized out>, qiov_offset@entry=0, flags=flags@entry=0) at ../block/io.c:2353
> > #7  0x00005595f8315a09 in blk_co_do_pwritev_part (blk=blk@entry=0x5595f91db8c0, offset=60867018752, bytes=196608, qiov=qiov@entry=0x5595f9030d58, qiov_offset=qiov_offset@entry=0, flags=flags@entry=0) at ../block/block-backend.c:1365
> > #8  0x00005595f8315bdd in blk_co_pwritev_part (flags=0, qiov_offset=0, qiov=qiov@entry=0x5595f9030d58, bytes=<optimized out>, offset=<optimized out>, blk=0x5595f91db8c0) at ../block/block-backend.c:1380
> > #9  blk_co_pwritev (blk=0x5595f91db8c0, offset=<optimized out>, bytes=<optimized out>, qiov=qiov@entry=0x5595f9030d58, flags=flags@entry=0) at ../block/block-backend.c:1391
> > #10 0x00005595f8328a59 in mirror_read_complete (ret=0, op=0x5595f9030d50) at ../block/mirror.c:260
> > #11 mirror_co_read (opaque=0x5595f9030d50) at ../block/mirror.c:400
> > #12 0x00005595f843a39b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177
> > #13 0x00007f33948b8d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > #14 0x00007f324a3ea6e0 in ?? ()
> > #15 0x0000000000000000 in ?? ()
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


