From: Balamuruhan S <bala24@linux.vnet.ibm.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: peterx@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
Date: Fri, 6 Jul 2018 18:16:54 +0530
Message-ID: <20180706124654.GA4141@localhost.localdomain>
In-Reply-To: <20180706105658.GB2661@work-vm>

On Fri, Jul 06, 2018 at 11:56:59AM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Based-on: <20180627132246.5576-1-peterx@redhat.com>
> > >
> > > Based on the series to unbreak postcopy:
> > > Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> > > Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> > >
> > > This series introduce a new postcopy recovery test. The new test
> > > actually helped me to identify two bugs there so fix them as well
> > > before 3.0 release.
> > >
> > > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> > > found a bit confusing during debugging the problem.
> > >
> > > Patch 2-3: two bug fixes that address different issues. Please see
> > > the commit log for more information.
> > >
> > > Patch 4-9: add the postcopy recovery unit test.
> > >
> > > Please review. Thanks,
> >
> > Queued
>
> Hi Peter,
> There's a problem in there somewhere; I'm getting
> an intermittent failure of the test if I run a make check -j 8 on my
> laptop. Just running two copies of tests/migration-test in parallel
> sometimes triggers it (but not if I turn on QTEST_LOG!).
> But it's always failing with:
>
> ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover: assertion failed: (qdict_haskey(rsp, "return"))
>
> Dave
Hi Peter, Dave,

I have applied this patchset on upstream QEMU to test postcopy
pause/recovery.

I observed an error after triggering the recovery command from the source
monitor: the target is lost and the source remains in the `postcopy-paused`
state.

Please find my observations below.

Source:
# ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine \
pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
-drive file=/home/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
-monitor telnet:127.0.0.1:1234,server,nowait -net nic,model=virtio -net user \
-redir tcp:2000::22
qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
Source Monitor:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_set_parameter max-postcopy-bandwidth 4096
(qemu) migrate -d tcp:127.0.0.1:4444
(qemu) migrate_start_postcopy
(qemu) migrate_pause
(qemu) migrate -r tcp:127.0.0.1:4446
After triggering recovery, the target is lost with the error shown in the
Target section below, and the source remains in the `postcopy-paused` state:
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off \
compress: off events: off postcopy-ram: on x-colo: off release-ram: off
block: off return-path: off pause-before-switchover: off x-multifd: off \
dirty-bitmaps: off
postcopy-blocktime: off late-block-activate: off
Migration status: postcopy-recover
total time: 78818 milliseconds
expected downtime: 300 milliseconds
setup: 169 milliseconds
transferred ram: 177749 kbytes
throughput: 63.72 mbps
remaining ram: 28061376 kbytes
total ram: 67109120 kbytes
duplicate: 9742102 pages
skipped: 0 pages
normal: 22986 pages
normal bytes: 91944 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
dirty pages rate: 1273187 pages
postcopy request count: 236
Target:
# ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine \
pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
-drive file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
-monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio -net user \
-redir tcp:2001::22 -incoming tcp:127.0.0.1:4444
qemu-system-ppc64: check_section_footer: Read section footer failed: -5
qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
qemu-system-ppc64: Not a migration stream
qemu-system-ppc64: load of migration failed: Invalid argument
Target Monitor:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_recover tcp:127.0.0.1:4446
(qemu) Connection closed by foreign host.
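
For anyone scripting this rather than typing into the HMP monitors, the
commands above map roughly onto the QMP messages below. This is a minimal
sketch that only builds and prints the JSON; it does not talk to QEMU. The
`qmp` helper is hypothetical, the ports are the ones used in the runs above,
and the ordering (destination `migrate-recover` before the source's `migrate`
with `resume`) is an assumption about the intended flow, not something this
report verifies.

```python
import json


def qmp(name, **arguments):
    """Build one QMP command as a JSON string (hypothetical helper)."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


MIGRATE_URI = "tcp:127.0.0.1:4444"   # initial migration port from the report
RECOVER_URI = "tcp:127.0.0.1:4446"   # recovery port from the report

postcopy_on = qmp("migrate-set-capabilities",
                  capabilities=[{"capability": "postcopy-ram", "state": True}])

# Source: start a postcopy migration, then pause it.
source_setup = [
    postcopy_on,
    qmp("migrate-set-parameters", **{"max-postcopy-bandwidth": 4096}),
    qmp("migrate", uri=MIGRATE_URI),
    qmp("migrate-start-postcopy"),
    qmp("migrate-pause"),
]

# Destination: enable postcopy, later open a new port for the recovery stream.
dest_setup = [postcopy_on]
dest_recover = [qmp("migrate-recover", uri=RECOVER_URI)]

# Source: reconnect; "resume" corresponds to the -r flag of HMP "migrate".
source_recover = [qmp("migrate", uri=RECOVER_URI, resume=True)]

for side, commands in (("source", source_setup), ("dest", dest_setup),
                       ("dest", dest_recover), ("source", source_recover)):
    for command in commands:
        print(side, command)
```

Keeping the two recovery steps in separate lists makes the assumed ordering
explicit, so it is easy to swap them when trying to reproduce the failure.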
QTest:

I also ran the qtest. The recovery test does not complete: it waits for the
source to reach "completed", but because of this issue the source remains in
the `postcopy-paused` state. The test calls

    migrate_postcopy_complete(from, to);

and that call never returns. As it did not complete, I cancelled it forcefully:
# time QTEST_QEMU_BINARY=./ppc64-softmmu/qemu-system-ppc64 ./tests/migration-test
/ppc64/migration/deprecated: OK
/ppc64/migration/bad_dest: OK
/ppc64/migration/postcopy/unix: OK
/ppc64/migration/postcopy/recovery: ^C
real 21m55.176s
user 2m28.800s
sys 4m55.980s
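
Independently of the root cause, a status wait with a deadline would turn a
stall like the one above into a quick, descriptive failure. The following is a
minimal sketch of that idea in Python, not the code in tests/migration-test.c:
`query_status` is a hypothetical callable standing in for whatever returns the
`status` field of `query-migrate`, and the timeout and interval defaults are
arbitrary.

```python
import time


class MigrationWaitError(Exception):
    """Raised when the wanted migration status is not reached."""


def wait_for_status(query_status, goal, timeout=60.0, interval=0.1,
                    ungoals=("failed",), clock=time.monotonic,
                    sleep=time.sleep):
    """Poll query_status() until it returns goal, or give up.

    query_status is a hypothetical zero-argument callable returning the
    current migration status string. ungoals lists statuses that should
    abort the wait immediately.
    """
    deadline = clock() + timeout
    while True:
        status = query_status()
        if status == goal:
            return status
        if status in ungoals:
            raise MigrationWaitError(
                f"migration entered {status!r} while waiting for {goal!r}")
        if clock() >= deadline:
            raise MigrationWaitError(
                f"timed out after {timeout}s waiting for {goal!r}; "
                f"last status was {status!r}")
        sleep(interval)
```

With a wait like this, a source stuck in `postcopy-paused` would produce an
error naming the last observed status instead of blocking until interrupted.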
-- Bala
>
> > > Peter Xu (9):
> > > migration: simplify check to use qemu file buffer
> > > migration: loosen recovery check when load vm
> > > migration: fix incorrect bitmap size calculation
> > > tests: introduce migrate_postcopy_* helpers
> > > tests: allow migrate() to take extra flags
> > > tests: introduce migrate_query*() helpers
> > > tests: introduce wait_for_migration_status()
> > > tests: add postcopy recovery test
> > > tests: hide stderr for postcopy recovery test
> > >
> > > migration/ram.c | 21 +++--
> > > migration/savevm.c | 16 ++--
> > > tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
> > > 3 files changed, 176 insertions(+), 59 deletions(-)
> > >
> > > --
> > > 2.17.1
> > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>