From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Balamuruhan S <bala24@linux.vnet.ibm.com>
Cc: peterx@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
Date: Thu, 12 Jul 2018 09:50:32 +0100
Message-ID: <20180712085032.GA2610@work-vm>
In-Reply-To: <20180706124654.GA4141@localhost.localdomain>

* Balamuruhan S (bala24@linux.vnet.ibm.com) wrote:
> On Fri, Jul 06, 2018 at 11:56:59AM +0100, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > Based-on: <20180627132246.5576-1-peterx@redhat.com>
> > > > 
> > > > Based on the series to unbreak postcopy:
> > > >   Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> > > >   Message-Id: <20180627132246.5576-1-peterx@redhat.com>
> > > > 
> > > > This series introduces a new postcopy recovery test.  The new test
> > > > actually helped me to identify two bugs, so this series fixes them
> > > > as well before the 3.0 release.
> > > > 
> > > > Patch 1: a trivial cleanup for existing postcopy ram load, which I
> > > >          found a bit confusing during debugging the problem.
> > > > 
> > > > Patch 2-3: two bug fixes that address different issues.  Please see
> > > >            the commit log for more information.
> > > > 
> > > > Patch 4-9: add the postcopy recovery unit test.
> > > > 
> > > > Please review.  Thanks,
> > > 
> > > Queued
> > 
> > Hi Peter,
> >   There's a problem in there somewhere;  I'm getting
> > an intermittent failure of the test if I run 'make check -j 8' on my
> > laptop.  Just running two copies of tests/migration-test in parallel
> > sometimes triggers it (but not if I turn on QTEST_LOG!).
> > But it's always failing with:
> > 
> >   ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover: assertion failed: (qdict_haskey(rsp, "return"))
> > 
> > Dave
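
The assertion quoted above only says that the reply had no "return" key, not
why. A QMP command reply carries either "return" or "error", and the "error"
object has "class" and "desc" members. A minimal sketch of surfacing that text
is below. The helper name qmp_result is hypothetical and is not part of
tests/migration-test.c; it is written in Python only to keep the example
self-contained.

```python
def qmp_result(rsp):
    """Return the payload of a QMP reply, or raise with QEMU's own error text.

    `rsp` is an already-parsed reply (a dict). Hypothetical helper, not the
    test suite's code.
    """
    if "return" in rsp:
        return rsp["return"]
    # Error replies look like {"error": {"class": ..., "desc": ...}}.
    err = rsp.get("error", {})
    raise RuntimeError(f"{err.get('class', 'UnknownError')}: {err.get('desc', rsp)}")
```

Reporting the "desc" string this way would show what QEMU rejected, instead of
only that a key was missing.
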
> 
> Hi Peter, Dave,

Hi Bala,

> I have applied this patchset in upstream Qemu to test postcopy
> pause/recovery.

Are you still seeing this with the set that got merged into 3.0-rc0?
The second of your errors looks similar to problems with the race
we had before Peter fixed it; but the set that I merged passed a 'make
check' on a Power box.

Dave

> I observed an error after triggering the recovery command from the source
> monitor: the target is lost and the source remains in the `postcopy-paused`
> state.
> 
> Please find my observation below,
> 
> Source:
> 
> # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine \
> pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
> -drive file=/home/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> -monitor telnet:127.0.0.1:1234,server,nowait -net nic,model=virtio -net user \
> -redir tcp:2000::22
> 
> qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> 
> Source Monitor:
> 
> (qemu) migrate_set_capability postcopy-ram on
> (qemu) migrate_set_parameter max-postcopy-bandwidth 4096
> (qemu) migrate -d tcp:127.0.0.1:4444
> (qemu) migrate_start_postcopy
> (qemu) migrate_pause
> (qemu) migrate -r tcp:127.0.0.1:4446
> 
> After triggering recovery, the target is lost with the error shown below,
> and the source remains in the `postcopy-paused` state:
> 
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
> zero-blocks: off \
> compress: off events: off postcopy-ram: on x-colo: off release-ram: off
> block: off return-path: off pause-before-switchover: off x-multifd: off \
> dirty-bitmaps: off
> postcopy-blocktime: off late-block-activate: off 
> Migration status: postcopy-recover
> total time: 78818 milliseconds
> expected downtime: 300 milliseconds
> setup: 169 milliseconds
> transferred ram: 177749 kbytes
> throughput: 63.72 mbps
> remaining ram: 28061376 kbytes
> total ram: 67109120 kbytes
> duplicate: 9742102 pages
> skipped: 0 pages
> normal: 22986 pages
> normal bytes: 91944 kbytes
> dirty sync count: 2
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> dirty pages rate: 1273187 pages
> postcopy request count: 236
> 
> 
> Target:
> 
> # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine \
> pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \
> -drive file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> -monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio -net user \
> -redir tcp:2001::22 -incoming tcp:127.0.0.1:4444
> 
> 
> qemu-system-ppc64: check_section_footer: Read section footer failed: -5
> qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> qemu-system-ppc64: Not a migration stream
> qemu-system-ppc64: load of migration failed: Invalid argument
> 
> 
> Target Monitor:
> 
> (qemu) migrate_set_capability postcopy-ram on
> (qemu) migrate_recover tcp:127.0.0.1:4446
> (qemu) Connection closed by foreign host.
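
For reference, the monitor commands quoted above map onto QMP roughly as
sketched below. The ordering is an assumption: the destination is told to
listen on the new URI with migrate-recover before the source reconnects with
"resume": true. The report does not say in which order the two sides were
driven. The function name recovery_commands is made up for the example.

```python
import json


def recovery_commands(uri):
    """Return (monitor, QMP command) pairs for one postcopy recovery attempt.

    Assumed order: pause on the source, make the destination listen on `uri`,
    then resume from the source towards the same `uri`.
    """
    return [
        ("source", {"execute": "migrate-pause"}),
        ("destination", {"execute": "migrate-recover",
                         "arguments": {"uri": uri}}),
        ("source", {"execute": "migrate",
                    "arguments": {"uri": uri, "resume": True}}),
    ]


# Print the sequence for the recovery port used in the report.
for monitor, cmd in recovery_commands("tcp:127.0.0.1:4446"):
    print(f"{monitor}: {json.dumps(cmd)}")
```
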
> 
> QTest:
> 
> With QTest the recovery test also never completes: it waits for the source
> to report "completed", but because of this issue the source stays in
> `postcopy-paused`. The call
> 
> `migrate_postcopy_complete(from, to);`
> 
> never returns, so I cancelled the run by hand:
> 
> # time QTEST_QEMU_BINARY=./ppc64-softmmu/qemu-system-ppc64 ./tests/migration-test
> /ppc64/migration/deprecated: OK
> /ppc64/migration/bad_dest: OK
> /ppc64/migration/postcopy/unix: OK
> /ppc64/migration/postcopy/recovery: ^C
> 
> real    21m55.176s
> user    2m28.800s
> sys     4m55.980s
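
A wait that can run for 21 minutes is hard to tell apart from a slow machine.
One way to make such a hang fail quickly is to poll the migration status
against a deadline, as in the sketch below. The name wait_for_status, the
query_status callback and the default timeout are assumptions made for the
illustration; this is not how tests/migration-test.c implements its wait.

```python
import time


def wait_for_status(query_status, goal, bad=("failed", "cancelled"),
                    timeout=60.0, interval=0.1):
    """Poll query_status() until it returns `goal`.

    Raises RuntimeError if a status listed in `bad` is seen, and
    TimeoutError if `goal` has not been reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = query_status()
        if status == goal:
            return status
        if status in bad:
            raise RuntimeError(f"migration ended in {status!r}, wanted {goal!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"still {status!r} after {timeout}s, wanted {goal!r}")
        time.sleep(interval)
```

With a wait like this, a source stuck in `postcopy-paused` would raise a
TimeoutError naming the stuck state rather than needing a ^C.
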
> 
> -- Bala
> > 
> > > > Peter Xu (9):
> > > >   migration: simplify check to use qemu file buffer
> > > >   migration: loosen recovery check when load vm
> > > >   migration: fix incorrect bitmap size calculation
> > > >   tests: introduce migrate_postcopy_* helpers
> > > >   tests: allow migrate() to take extra flags
> > > >   tests: introduce migrate_query*() helpers
> > > >   tests: introduce wait_for_migration_status()
> > > >   tests: add postcopy recovery test
> > > >   tests: hide stderr for postcopy recovery test
> > > > 
> > > >  migration/ram.c        |  21 +++--
> > > >  migration/savevm.c     |  16 ++--
> > > >  tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
> > > >  3 files changed, 176 insertions(+), 59 deletions(-)
> > > > 
> > > > -- 
> > > > 2.17.1
> > > > 
> > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
