From: Balamuruhan S <bala24@linux.vnet.ibm.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: peterx@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
Date: Fri, 6 Jul 2018 18:16:54 +0530
Message-ID: <20180706124654.GA4141@localhost.localdomain>
In-Reply-To: <20180706105658.GB2661@work-vm>

On Fri, Jul 06, 2018 at 11:56:59AM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Based-on: <20180627132246.5576-1-peterx@redhat.com>
> > > 
> > > Based on the series to unbreak postcopy:
> > >   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> > >   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> > > 
> > > This series introduce a new postcopy recovery test.  The new test
> > > actually helped me to identify two bugs there so fix them as well
> > > before 3.0 release.
> > > 
> > > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> > >          found a bit confusing during debugging the problem.
> > > 
> > > Patch 2-3: two bug fixes that address different issues.  Please see
> > >            the commit log for more information.
> > > 
> > > Patch 4-9: add the postcopy recovery unit test.
> > > 
> > > Please review.  Thanks,
> > 
> > Queued
> 
> Hi Peter,
>   There's a problem in there somewhere;  I'm getting
> an intermittent failure of the test if I run a make check -j 8    on my
> laptop.  Just running two copies of tests/migration-test in parallel
> sometimes triggers it (but not if I turn on QTEST_LOG!).
> But it's always failing with:
> 
>   ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover: assertion failed: (qdict_haskey(rsp, "return"))
> 
> Dave

Hi Peter, Dave,

I have applied this patchset on upstream QEMU to test postcopy
pause/recovery.

After triggering the recovery command from the source monitor, I observed
an error: the target is lost and the source remains in the
`postcopy-paused` state.

Please find my observations below:

Source:

# ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine \
pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
-drive file=/home/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
-monitor telnet:127.0.0.1:1234,server,nowait -net nic,model=virtio -net user \
-redir tcp:2000::22

qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.

Source Monitor:

(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_set_parameter max-postcopy-bandwidth 4096
(qemu) migrate -d tcp:127.0.0.1:4444
(qemu) migrate_start_postcopy
(qemu) migrate_pause
(qemu) migrate -r tcp:127.0.0.1:4446

After triggering recovery, the target is lost with the error shown in the
Target section below, and the source remains in the `postcopy-paused` state:

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off \
compress: off events: off postcopy-ram: on x-colo: off release-ram: off
block: off return-path: off pause-before-switchover: off x-multifd: off \
dirty-bitmaps: off
postcopy-blocktime: off late-block-activate: off 
Migration status: postcopy-recover
total time: 78818 milliseconds
expected downtime: 300 milliseconds
setup: 169 milliseconds
transferred ram: 177749 kbytes
throughput: 63.72 mbps
remaining ram: 28061376 kbytes
total ram: 67109120 kbytes
duplicate: 9742102 pages
skipped: 0 pages
normal: 22986 pages
normal bytes: 91944 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
dirty pages rate: 1273187 pages
postcopy request count: 236
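
For scripted checks, the status can be pulled out of the `info migrate`
text rather than read by eye. A minimal Python sketch, assuming the
"key: value" layout shown above; `parse_info_migrate` is a hypothetical
helper name, not something QEMU provides:

```python
def parse_info_migrate(text):
    """Collect 'key: value' lines from HMP `info migrate` output into a dict.

    Lines with no value after the first colon (such as 'globals:') are skipped.
    Multi-entry lines such as 'capabilities: ...' keep everything after the
    first colon as one unparsed string.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# A few lines copied from the source monitor output above.
sample = """globals:
Migration status: postcopy-recover
total time: 78818 milliseconds
postcopy request count: 236
"""

info = parse_info_migrate(sample)
print(info["Migration status"])
```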


Target:

# ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine \
pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
-drive file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
-monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio -net user \
-redir tcp:2001::22 -incoming tcp:127.0.0.1:4444


qemu-system-ppc64: check_section_footer: Read section footer failed: -5
qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
qemu-system-ppc64: Not a migration stream
qemu-system-ppc64: load of migration failed: Invalid argument


Target Monitor:

(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_recover tcp:127.0.0.1:4446
(qemu) Connection closed by foreign host.

QTest:

Regarding QTest, I have run it as well, and the recovery test does not
complete: it waits for the source to reach "completed", but because of this
issue the source remains in `postcopy-paused`. The call

`migrate_postcopy_complete(from, to);`

never returns.

As it did not complete, I cancelled it forcefully:

# time QTEST_QEMU_BINARY=./ppc64-softmmu/qemu-system-ppc64 ./tests/migration-test
/ppc64/migration/deprecated: OK
/ppc64/migration/bad_dest: OK
/ppc64/migration/postcopy/unix: OK
/ppc64/migration/postcopy/recovery: ^C

real    21m55.176s
user    2m28.800s
sys 4m55.980s
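
A wait of this kind could be bounded so that a stuck source fails quickly
instead of hanging for minutes. The following is only a Python sketch of the
idea, not how tests/migration-test.c is written; `wait_for_status` and
`query_status` are hypothetical names, and `query_status` stands in for
whatever returns the current migration status string:

```python
import time


def wait_for_status(query_status, goal, timeout_s=60.0, poll_s=0.5):
    """Poll query_status() until it returns goal; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = query_status()
        if status == goal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"still in {status!r} after {timeout_s}s, wanted {goal!r}"
            )
        time.sleep(poll_s)


# Simulated source that eventually completes (hypothetical sequence of states).
states = iter(["postcopy-active", "postcopy-recover", "completed"])
print(wait_for_status(lambda: next(states), "completed", poll_s=0.0))
```

With a source that stays in `postcopy-paused`, the same call raises
`TimeoutError` once `timeout_s` has elapsed, and the message names the state
it was stuck in.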

-- Bala
> 
> > > Peter Xu (9):
> > >   migration: simplify check to use qemu file buffer
> > >   migration: loosen recovery check when load vm
> > >   migration: fix incorrect bitmap size calculation
> > >   tests: introduce migrate_postcopy_* helpers
> > >   tests: allow migrate() to take extra flags
> > >   tests: introduce migrate_query*() helpers
> > >   tests: introduce wait_for_migration_status()
> > >   tests: add postcopy recovery test
> > >   tests: hide stderr for postcopy recovery test
> > > 
> > >  migration/ram.c        |  21 +++--
> > >  migration/savevm.c     |  16 ++--
> > >  tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
> > >  3 files changed, 176 insertions(+), 59 deletions(-)
> > > 
> > > -- 
> > > 2.17.1
> > > 
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
