* Re: [Qemu-devel] question: I found a qemu crash about migration
@ 2018-01-13 0:31 Matthew Schumacher
2018-01-15 12:55 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 3+ messages in thread
From: Matthew Schumacher @ 2018-01-13 0:31 UTC (permalink / raw)
To: qemu-devel
On 28.09.2017 at 19:01, Dr. David Alan Gilbert wrote:
> Hi,
> This is a 'fun' bug; I had a good chat to kwolf about it earlier.
> A proper fix really needs to be done together with libvirt so that we
> can sequence:
> a) The stopping of the CPU on the source
> b) The termination of the mirroring block job
> c) The inactivation of the block devices on the source
> (bdrv_inactivate_all)
> d) The activation of the block devices on the destination
> (bdrv_invalidate_cache_all)
> e) The start of the CPU on the destination
>
>
> It looks like you're hitting a race between b/c; we've had races
> between c/d in the past and moved the bdrv_inactivate_all.
>
> During the discussion we ended up with two proposed solutions;
> both of them require one extra command and one extra migration
> capability.
>
> The block way
> -------------
> 1) Add a new migration capability pause-at-complete
> 2) Add a new migration state almost-complete
> 3) After saving devices, if pause-at-complete is set,
> transition to almost-complete
> 4) Add a new command (migration-continue) that
> causes the migration to inactivate the devices (c)
> and send the final EOF to the destination.
>
> You set pause-at-complete, wait until migrate hits almost-complete;
> cleanup the mirror job, and then do migration-continue. When it
> completes do 'cont' on the destination.
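The control flow of the proposal above can be sketched as a small state machine. This is only an illustrative model: `pause-at-complete`, `almost-complete`, and `migration-continue` are names from the proposal, not an existing QMP API, and the class below is invented for this sketch.

```python
# Sketch of the proposed "block way" sequencing; all names are from the
# proposal in this thread, nothing here is real QEMU code.
class MigrationSim:
    def __init__(self, pause_at_complete=False):
        self.pause_at_complete = pause_at_complete  # capability (1)
        self.state = "active"
        self.devices_inactive = False
        self.mirror_job_running = True

    def save_devices_complete(self):
        # After the final device save: either pause (3) or finish directly.
        if self.pause_at_complete:
            self.state = "almost-complete"  # new state (2)
        else:
            self._finish()

    def migration_continue(self):
        # New command (4): only legal in the almost-complete state.
        assert self.state == "almost-complete"
        self._finish()

    def cancel_mirror_job(self):
        self.mirror_job_running = False

    def _finish(self):
        # Inactivation (c) must not race with a live mirror job (b).
        assert not self.mirror_job_running, "b/c race: mirror still writing"
        self.devices_inactive = True
        self.state = "completed"

m = MigrationSim(pause_at_complete=True)
m.save_devices_complete()
assert m.state == "almost-complete"
m.cancel_mirror_job()       # management cleans up the mirror while paused
m.migration_continue()
assert m.state == "completed" and m.devices_inactive
```

The point of the model is the ordering guarantee: with the capability set, inactivation cannot happen until management has explicitly confirmed the mirror job is gone.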
>
> The migration way
> -----------------
> 1) Stop doing (d) when the destination is started with -S
> since it happens anyway when 'cont' is issued
> 2) Add a new migration capability ext-manage-storage
> 3) When 'ext-manage-storage' is set, we don't bother doing (c)
> 4) Add a new command 'block-inactivate' on the source
>
> You set ext-manage-storage, do the migrate and when it's finished
> clean up the block job, block-inactivate on the source, and
> then cont on the destination.
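The alternative can be sketched the same way. Again, `ext-manage-storage` and `block-inactivate` are proposal names only; the class is invented for illustration.

```python
# Sketch of the proposed "migration way": with ext-manage-storage set,
# QEMU skips source inactivation (c) and management does it explicitly
# via a new 'block-inactivate' command. Names are from the proposal.
class MigrationSim2:
    def __init__(self, ext_manage_storage=False):
        self.ext_manage_storage = ext_manage_storage  # capability (2)
        self.mirror_job_running = True
        self.devices_inactive = False
        self.state = "active"

    def complete(self):
        # Step (3): with the capability set, do NOT inactivate here.
        if not self.ext_manage_storage:
            self._inactivate()
        self.state = "completed"

    def block_inactivate(self):
        # New explicit command (4), issued after the mirror is cleaned up.
        self._inactivate()

    def cancel_mirror_job(self):
        self.mirror_job_running = False

    def _inactivate(self):
        assert not self.mirror_job_running, "b/c race: mirror still writing"
        self.devices_inactive = True

m = MigrationSim2(ext_manage_storage=True)
m.complete()                  # migration finishes without touching storage
m.cancel_mirror_job()
m.block_inactivate()          # management inactivates explicitly
assert m.devices_inactive
```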
>
>
> My worry about the 'block way' is that the point at which we
> do the pause seems pretty interesting; it probably is best
> done after the final device save but before the inactivate,
> but could be done before it. But it probably becomes API
> and something might become dependent on where we did it.
>
> I think Kevin's worry about the 'migration way' is that
> it's a bit of a block-specific fudge; which is probably right.
>
>
> I've not really thought what happens when you have a mix of shared and
> non-shared storage.
>
> Could we do any hack that isn't libvirt-visible for existing versions?
> I guess maybe hack drive-mirror so it interlocks with the migration
> code somehow to hold off on that inactivate?
>
> This code is visible probably from 2.9-ish with the new locking code;
> but really that b/c race has been there forever - there's maybe
> always the chance that the last few blocks of mirroring might have
> happened too late?
>
> Thoughts?
> What is the libvirt view on the preferred solution?
>
> Dave
Devs,
Did this issue ever get addressed? I'm looking at the history for
mirror.c at https://github.com/qemu/qemu/commits/master/block/mirror.c
and I don't see anything that leads me to believe this was fixed.
I'm still unable to live migrate storage without risking corruption on
even a moderately loaded vm.
Thanks,
schu
* Re: [Qemu-devel] question: I found a qemu crash about migration
2018-01-13 0:31 [Qemu-devel] question: I found a qemu crash about migration Matthew Schumacher
@ 2018-01-15 12:55 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 3+ messages in thread
From: Dr. David Alan Gilbert @ 2018-01-15 12:55 UTC (permalink / raw)
To: Matthew Schumacher; +Cc: qemu-devel, kwolf
* Matthew Schumacher (matt.s@aptalaska.net) wrote:
> On 28.09.2017 at 19:01, Dr. David Alan Gilbert wrote:
> > Hi,
> > This is a 'fun' bug; I had a good chat to kwolf about it earlier.
> > A proper fix really needs to be done together with libvirt so that we
> > can sequence:
> > a) The stopping of the CPU on the source
> > b) The termination of the mirroring block job
> > c) The inactivation of the block devices on the source
> > (bdrv_inactivate_all)
> > d) The activation of the block devices on the destination
> > (bdrv_invalidate_cache_all)
> > e) The start of the CPU on the destination
> >
> >
> > It looks like you're hitting a race between b/c; we've had races
> > between c/d in the past and moved the bdrv_inactivate_all.
> >
> > During the discussion we ended up with two proposed solutions;
> > both of them require one extra command and one extra migration
> > capability.
> >
> > The block way
> > -------------
> > 1) Add a new migration capability pause-at-complete
> > 2) Add a new migration state almost-complete
> > 3) After saving devices, if pause-at-complete is set,
> > transition to almost-complete
> > 4) Add a new command (migration-continue) that
> > causes the migration to inactivate the devices (c)
> > and send the final EOF to the destination.
> >
> > You set pause-at-complete, wait until migrate hits almost-complete;
> > cleanup the mirror job, and then do migration-continue. When it
> > completes do 'cont' on the destination.
> >
> > The migration way
> > -----------------
> > 1) Stop doing (d) when the destination is started with -S
> > since it happens anyway when 'cont' is issued
> > 2) Add a new migration capability ext-manage-storage
> > 3) When 'ext-manage-storage' is set, we don't bother doing (c)
> > 4) Add a new command 'block-inactivate' on the source
> >
> > You set ext-manage-storage, do the migrate and when it's finished
> > clean up the block job, block-inactivate on the source, and
> > then cont on the destination.
> >
> >
> > My worry about the 'block way' is that the point at which we
> > do the pause seems pretty interesting; it probably is best
> > done after the final device save but before the inactivate,
> > but could be done before it. But it probably becomes API
> > and something might become dependent on where we did it.
> >
> > I think Kevin's worry about the 'migration way' is that
> > it's a bit of a block-specific fudge; which is probably right.
> >
> >
> > I've not really thought what happens when you have a mix of shared and
> > non-shared storage.
> >
> > Could we do any hack that isn't libvirt-visible for existing versions?
> > I guess maybe hack drive-mirror so it interlocks with the migration
> > code somehow to hold off on that inactivate?
> >
> > This code is visible probably from 2.9-ish with the new locking code;
> > but really that b/c race has been there forever - there's maybe
> > always the chance that the last few blocks of mirroring might have
> > happened too late?
> >
> > Thoughts?
> > What is the libvirt view on the preferred solution?
> >
> > Dave
>
> Devs,
>
> Did this issue ever get addressed? I'm looking at the history for
> mirror.c at https://github.com/qemu/qemu/commits/master/block/mirror.c
> and I don't see anything that leads me to believe this was fixed.
>
> I'm still unable to live migrate storage without risking corruption on
> even a moderately loaded vm.
Yes, there's now a 'pause-before-switchover' which gives libvirt
a chance to quiesce the block devices.
That went in via 93fbd0314^..0331c8cabf6168 back in October.
I believe libvirt uses that; see libvirt commit 6addde2.
Dave
> Thanks,
> schu
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
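For reference, the management-side flow with this feature looks roughly as follows. The capability `pause-before-switchover`, the `pre-switchover` migration status, and `migrate-continue` exist in QEMU from 2.11 on; the device name `drive0`, the URI, and the socket plumbing (omitted here, only the command JSON is built) are placeholders for this sketch.

```python
import json

# Build the QMP command sequence for the pause-before-switchover flow;
# actually sending them over the QMP socket is omitted.
def qmp(cmd, **args):
    msg = {"execute": cmd}
    if args:
        msg["arguments"] = args
    return json.dumps(msg)

sequence = [
    qmp("migrate-set-capabilities",
        capabilities=[{"capability": "pause-before-switchover",
                       "state": True}]),
    qmp("migrate", uri="tcp:dest:4444"),
    # ... wait for query-migrate to report status "pre-switchover",
    # then clean up the mirror job before letting QEMU inactivate:
    qmp("block-job-cancel", device="drive0"),
    qmp("migrate-continue", state="pre-switchover"),
]
for cmd in sequence:
    print(cmd)
```

After `migrate-continue`, the source inactivates its images and the destination can be started with `cont`.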
* [Qemu-devel] question: I found a qemu crash about migration
@ 2017-09-26 13:10 wangjie (P)
0 siblings, 0 replies; 3+ messages in thread
From: wangjie (P) @ 2017-09-26 13:10 UTC (permalink / raw)
To: qemu-devel@nongnu.org, kwolf@redhat.com, pbonzini@redhat.com
Cc: wangjie (P), fuweiwei (C), eblake@redhat.com, kchamart@redhat.com,
dgilbert@redhat.com, famz@redhat.com, Wubin (H)
[-- Attachment #1: Type: text/plain, Size: 2647 bytes --]
Hi,
When I use qemuMigrationRun to migrate both memory and storage with some I/O pressure in the VM, and with iothreads configured, we triggered an error report (using the current qemu master branch):

"bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed"

I reviewed the code and ran gdb on the coredump file; I think the following case can trigger the error:

migration_thread()
  migration_completion()  --------> last iteration of memory migration
    vm_stop_force_state() --------> stops the VM and calls bdrv_drain_all; but in the core file the dirty bitmap count of drive-mirror is not 0, and there are 16 in-flight mirror I/Os
    bdrv_inactivate_all() --------> inactivates the images and sets the INACTIVE flag
-> bdrv_co_do_pwritev()   --------> mirror I/O handled after this point triggers the assertion `!(bs->open_flags & 0x0800)' and qemu crashes

As we can see from the above, migration_completion calls bdrv_inactivate_all to inactivate the images, but mirror_run is not done (it still has dirty clusters), so mirror I/O issued later triggers the error report: "bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed".

It seems that memory migration and storage mirroring are done independently, and the ordering of the two is quite random.

How can I solve this problem? Should we avoid setting the INACTIVE flag on the drive-mirror BlockDriverState?
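The failure mode above can be modeled in a few lines. This is purely schematic: 0x0800 is the flag value from the assertion message, and the functions below only mimic the ordering of the real block-layer calls.

```python
# Minimal model of the crash: the inactive flag (0x0800 in this build)
# is set by bdrv_inactivate_all(), and any mirror write that lands
# afterwards trips the assertion in bdrv_co_do_pwritev().
BDRV_O_INACTIVE = 0x0800

class BlockDriverState:
    def __init__(self):
        self.open_flags = 0

def bdrv_inactivate_all(bs_list):
    for bs in bs_list:
        bs.open_flags |= BDRV_O_INACTIVE

def bdrv_co_do_pwritev(bs, offset, nbytes):
    # the failing assertion from block/io.c
    assert not (bs.open_flags & BDRV_O_INACTIVE)

bs = BlockDriverState()
bdrv_inactivate_all([bs])       # migration completion inactivates images
try:
    bdrv_co_do_pwritev(bs, 7034896384, 65536)  # late in-flight mirror I/O
except AssertionError:
    print("assertion !(bs->open_flags & 0x0800) failed")
```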
Qemu Crash bt:
(gdb) bt
#0 0x00007f6b6e2a71d7 in raise () from /usr/lib64/libc.so.6
#1 0x00007f6b6e2a88c8 in abort () from /usr/lib64/libc.so.6
#2 0x00007f6b6e2a0146 in __assert_fail_base () from /usr/lib64/libc.so.6
#3 0x00007f6b6e2a01f2 in __assert_fail () from /usr/lib64/libc.so.6
#4 0x00000000007b9211 in bdrv_co_pwritev (child=<optimized out>, offset=offset@entry=7034896384, bytes=bytes@entry=65536,
qiov=qiov@entry=0x7f69cc09b068, flags=0) at block/io.c:1536
#5 0x00000000007a6f02 in blk_co_pwritev (blk=0x2f92750, offset=7034896384, bytes=65536, qiov=0x7f69cc09b068,
flags=<optimized out>) at block/block_backend.c:851
#6 0x00000000007a6fc1 in blk_aio_write_entry (opaque=0x301dad0) at block/block_backend.c:1043
#7 0x0000000000835e2a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine_ucontext.c:79
#8 0x00007f6b6e2b8cf0 in ?? () from /usr/lib64/libc.so.6
#9 0x00007f6a1bcfc780 in ?? ()
#10 0x0000000000000000 in ?? ()
And I can see that mirror_run is not done; the gdb info and the source VM qemu log are in the attached images.
[-- Attachment #2: image001.png --]
[-- Type: image/png, Size: 61384 bytes --]
[-- Attachment #3: image002.png --]
[-- Type: image/png, Size: 153960 bytes --]