From: Isaku Yamahata <yamahata@valinux.co.jp>
To: thfbjyddx <thfbjyddx@hotmail.com>
Cc: "t.hirofuchi" <t.hirofuchi@aist.go.jp>,
qemu-devel <qemu-devel@nongnu.org>, kvm <kvm@vger.kernel.org>,
"satoshi.itoh" <satoshi.itoh@aist.go.jp>
Subject: Re: [Qemu-devel] Re: [PATCH 00/21][RFC] postcopy live migration
Date: Thu, 12 Jan 2012 17:54:41 +0900
Message-ID: <20120112085441.GA23322@valinux.co.jp>
In-Reply-To: <BLU0-SMTP365D862A04A006EA3F0683ABC9F0@phx.gbl>
On Thu, Jan 12, 2012 at 04:29:44PM +0800, thfbjyddx wrote:
> Hi, I've dug into this more over the past few days.
>
> > (qemu) migration-tcp: Attempting to start an incoming migration
> > migration-tcp: accepted migration
> > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x10870000 flags 0x4
> > 4872:4872 postcopy_incoming_ram_load:1057: done
> > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > 4872:4872 postcopy_incoming_ram_load:1037: EOS
> > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > 4872:4872 postcopy_incoming_ram_load:1037: EOS
>
> > There should be only a single EOS line. Is this just a copy & paste mistake?
>
> There must be two EOS: one comes from postcopy_outgoing_ram_save_live
> (...stage == QEMU_SAVE_LIVE_STAGE_PART) and the other from
> postcopy_outgoing_ram_save_live(...stage == QEMU_SAVE_LIVE_STAGE_END).
> I think that in postcopy the ram_save_live call in the iterate part can be
> ignored, so why are qemu_put_byte(f, QEMU_VM_SECTION_PART) and
> qemu_put_byte(f, QEMU_VM_SECTION_END) still in the procedure? Are they
> essential?
Not so essential.
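If the stream is as you describe, the incoming log decodes consistently.
Here is a small sketch in Python (a toy model, not the arch_init.c code) of
how each 64-bit record header splits into the addr and flags values printed by
the debug output. It assumes 4 KiB target pages and the RAM_SAVE_FLAG_* values
from arch_init.c; decode_ram_header() and describe() are hypothetical helper
names used only for illustration.

```python
# Toy decoder for the 64-bit record header of the RAM migration stream.
# Assumptions: 4 KiB target pages, RAM_SAVE_FLAG_* values as in arch_init.c.
TARGET_PAGE_SIZE = 4096

RAM_SAVE_FLAG_MEM_SIZE = 0x04
RAM_SAVE_FLAG_PAGE = 0x08
RAM_SAVE_FLAG_EOS = 0x10

FLAG_NAMES = {
    RAM_SAVE_FLAG_MEM_SIZE: "MEM_SIZE",
    RAM_SAVE_FLAG_PAGE: "PAGE",
    RAM_SAVE_FLAG_EOS: "EOS",
}


def decode_ram_header(word):
    """Split a header word into (addr, flags): the bits below the page size
    carry the flags, the rest is the address (or the total RAM size)."""
    flags = word & (TARGET_PAGE_SIZE - 1)
    addr = word & ~(TARGET_PAGE_SIZE - 1)
    return addr, flags


def describe(word):
    """Format a header word like the debug output, plus the flag names."""
    addr, flags = decode_ram_header(word)
    names = [name for bit, name in sorted(FLAG_NAMES.items()) if flags & bit]
    return "addr 0x%x flags 0x%x (%s)" % (addr, flags, "|".join(names) or "none")


if __name__ == "__main__":
    # Header words matching the incoming log quoted above.
    for word in (0x10870000 | RAM_SAVE_FLAG_MEM_SIZE,
                 RAM_SAVE_FLAG_EOS,
                 RAM_SAVE_FLAG_EOS):
        print(describe(word))
```

Run as a script, it prints one MEM_SIZE record followed by two EOS records,
which is the same shape as your log.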
> > Can you please track it down one more step?
> > On which line of kvm_put_msrs() did it get stuck? kvm_put_msrs() doesn't seem
> > to block. (A backtrace from the debugger would be best.)
>
> It gets to kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data) and never returns,
> so it gets stuck there.
Do you know what wchan the process was blocked at?
kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data) doesn't seem to block.
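If it helps, here is a minimal sketch in Python for collecting that
information, assuming Linux procfs. thread_wchans() is a hypothetical helper
name. It reads the state and wchan of every thread, because the blocked thread
may be a vcpu thread rather than the main thread. A state of "D"
(uninterruptible sleep) would also be consistent with the process not
responding to kill.

```python
import os
import sys


def thread_wchans(pid):
    """Return {tid: (state, wchan)} for every thread of pid, using procfs.

    Threads that cannot be read (already exited, no permission) are skipped;
    an empty dict is returned if the process does not exist."""
    result = {}
    task_dir = "/proc/%d/task" % pid
    try:
        tids = os.listdir(task_dir)
    except OSError:
        return result
    for tid in tids:
        try:
            with open(os.path.join(task_dir, tid, "wchan")) as f:
                wchan = f.read().strip()
            with open(os.path.join(task_dir, tid, "stat")) as f:
                stat = f.read()
        except OSError:
            continue
        # The state letter is the first field after the ")" that closes comm.
        fields = stat[stat.rfind(")") + 1:].split()
        state = fields[0] if fields else "?"
        result[int(tid)] = (state, wchan)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    for tid, (state, wchan) in sorted(thread_wchans(int(sys.argv[1])).items()):
        print("tid %d state %s wchan %s" % (tid, state, wchan))
```

Pass the pid of the stuck qemu-system-x86_64 process as the only argument.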
> When I was checking the EOS problem,
> I just commented out qemu_put_byte(f, QEMU_VM_SECTION_PART); and
> qemu_put_be32(f, se->section_id)
> (I think this is the wrong way to fix it, and I don't know how it gets through)
> and left just the se->save_live_state call in qemu_savevm_state_iterate.
> It didn't get stuck at kvm_put_msrs(),
> but it hit another error:
> (qemu) migration-tcp: Attempting to start an incoming migration
> migration-tcp: accepted migration
> 2126:2126 postcopy_incoming_ram_load:1018: incoming ram load
> 2126:2126 postcopy_incoming_ram_load:1031: addr 0x10870000 flags 0x4
> 2126:2126 postcopy_incoming_ram_load:1057: done
> migration: successfully loaded vm state
> 2126:2126 postcopy_incoming_fork_umemd:1069: fork
> 2126:2126 postcopy_incoming_fork_umemd:1127: qemu pid: 2126 daemon pid: 2129
> 2130:2130 postcopy_incoming_umemd:1840: daemon pid: 2130
> 2130:2130 postcopy_incoming_umemd:1875: entering umemd main loop
> Can't find block !
> 2130:2130 postcopy_incoming_umem_ram_load:1526: shmem == NULL
> 2130:2130 postcopy_incoming_umemd:1882: exiting umemd main loop
> At the same time, the destination node didn't show the EOS.
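Commenting out the section header on the sending side probably desynchronizes
the stream rather than fixing anything. As far as I understand the loader, it
reads a one-byte section type and a 32-bit section id before handing the
payload to the RAM handler, and a type byte of 0 (QEMU_VM_EOF) ends loading.
Without the header, the first byte of the big-endian EOS word (0x00) would be
taken as EOF. That is only a guess from the log, but it might explain why
"successfully loaded vm state" appears right after the first RAM section and
why no EOS line is printed. A toy model in Python under those assumptions (not
the savevm.c code; RAM_SECTION_ID and the function names are made up for
illustration):

```python
import io
import struct

QEMU_VM_EOF = 0x00
QEMU_VM_SECTION_PART = 0x02
QEMU_VM_SECTION_END = 0x03
RAM_SAVE_FLAG_EOS = 0x10
RAM_SECTION_ID = 1  # hypothetical id, for illustration only


def put_ram_section(out, section_type, with_header=True):
    """Write one RAM section whose payload is just the EOS record."""
    if with_header:
        # One-byte section type followed by a big-endian 32-bit section id.
        out.write(struct.pack(">BI", section_type, RAM_SECTION_ID))
    out.write(struct.pack(">Q", RAM_SAVE_FLAG_EOS))


def load(stream):
    """Very small stand-in for the load loop; returns what it saw."""
    events = []
    f = io.BytesIO(stream)
    while True:
        b = f.read(1)
        if not b:
            events.append("short read")
            break
        section_type = b[0]
        if section_type == QEMU_VM_EOF:
            events.append("EOF")
            break
        if section_type not in (QEMU_VM_SECTION_PART, QEMU_VM_SECTION_END):
            events.append("unknown section type 0x%x" % section_type)
            break
        (section_id,) = struct.unpack(">I", f.read(4))
        (word,) = struct.unpack(">Q", f.read(8))
        events.append("section %d: flags 0x%x" % (section_id, word & 0xfff))
    return events


def build(iterate_header):
    """Build a PART section, an END section and the EOF marker."""
    out = io.BytesIO()
    put_ram_section(out, QEMU_VM_SECTION_PART, with_header=iterate_header)
    put_ram_section(out, QEMU_VM_SECTION_END)
    out.write(bytes([QEMU_VM_EOF]))
    return out.getvalue()


if __name__ == "__main__":
    print(load(build(iterate_header=True)))
    print(load(build(iterate_header=False)))
```

With the header in place the model sees two EOS records and then EOF. With the
header of the iterate stage removed it stops at EOF immediately and never
reaches the END section.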
>
> So I still can't solve the problem of it getting stuck.
> Thanks for your help~!
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> Tommy
>
> From: Isaku Yamahata
> Date: 2012-01-11 10:45
> To: thfbjyddx
> CC: t.hirofuchi; qemu-devel; kvm; satoshi.itoh
> Subject: Re: [Qemu-devel] Re: [PATCH 00/21][RFC] postcopy live migration
> On Sat, Jan 07, 2012 at 06:29:14PM +0800, thfbjyddx wrote:
> > Hello all!
>
> Hi, thank you for the detailed report. The procedure you've tried basically
> looks good. Some comments below.
>
> > I got the QEMU base version (03ecd2c80a64d030a22fe67cc7a60f24e17ff211) and
> > patched it correctly,
> > but it still didn't work and I got the same result as before.
> > Outgoing node: Intel x86_64; incoming node: AMD x86_64. The guest image is on NFS.
> >
> > I think I should describe what I did more clearly, in the hope that somebody
> > can figure out the problem.
> >
> > ・ 1, on both the incoming and outgoing nodes, patch QEMU and boot a 3.1.7
> > kernel with umem
> >
> > ./configure --target-list=x86_64-softmmu --enable-kvm --enable-postcopy --enable-debug
> > make
> > make install
> >
> > ・ 2, outgoing qemu:
> >
> > qemu-system-x86_64 -m 256 -hda xxx -monitor stdio -vnc :2 -usbdevice tablet
> > -machine accel=kvm
> > incoming qemu:
> > qemu-system-x86_64 -m 256 -hda xxx -postcopy -incoming tcp:0:8888 -monitor
> > stdio -vnc :2 -usbdevice tablet -machine accel=kvm
> >
> > ・ 3, outgoing node:
> >
> > migrate -d -p -n tcp:(incoming node ip):8888
> >
> > result:
> >
> > ・ outgoing qemu:
> >
> > info status: VM-status: paused (finish-migrate);
> >
> > ・ incoming qemu:
> >
> > the monitor accepts no more input, and the process (qemu-system-x86) can't
> > be killed
> >
> > I enabled the debug flag in migration.c, migration-tcp.c and migration-postcopy.c:
> >
> > ・ outgoing qemu:
> >
> > (qemu) migration-tcp: connect completed
> > migration: beginning savevm
> > 4500:4500 postcopy_outgoing_ram_save_live:540: stage 1
> > migration: iterate
> > 4500:4500 postcopy_outgoing_ram_save_live:540: stage 2
> > migration: done iterating
> > 4500:4500 postcopy_outgoing_ram_save_live:540: stage 3
> > 4500:4500 postcopy_outgoing_begin:716: outgoing begin
> >
> > ・ incoming qemu:
> >
> > (qemu) migration-tcp: Attempting to start an incoming migration
> > migration-tcp: accepted migration
> > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x10870000 flags 0x4
> > 4872:4872 postcopy_incoming_ram_load:1057: done
> > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > 4872:4872 postcopy_incoming_ram_load:1037: EOS
> > 4872:4872 postcopy_incoming_ram_load:1018: incoming ram load
> > 4872:4872 postcopy_incoming_ram_load:1031: addr 0x0 flags 0x10
> > 4872:4872 postcopy_incoming_ram_load:1037: EOS
>
> There should be only a single EOS line. Is this just a copy & paste mistake?
>
>
> > From the result:
> > it didn't get to "successfully loaded vm state",
> > so it is still in qemu_loadvm_state. I found that it is in
> > cpu_synchronize_all_post_init->kvm_arch_put_registers->kvm_put_msrs and is
> > stuck there.
>
> Can you please track it down one more step?
> On which line of kvm_put_msrs() did it get stuck? kvm_put_msrs() doesn't seem
> to block. (A backtrace from the debugger would be best.)
>
> If possible, can you please test with a simpler configuration,
> i.e. drop as many devices as possible (no usbdevice, no disk, ...)?
> That will make debugging simpler.
>
> thanks,
>
> > Can anyone give some advice on the problem?
> > Thanks very much~
> >
> > ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> ━
> > Tommy
> >
> > From: Isaku Yamahata
> > Date: 2011-12-29 09:25
> > To: kvm; qemu-devel
> > CC: yamahata; t.hirofuchi; satoshi.itoh
> > Subject: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration
> > Intro
> > =====
> > This patch series implements postcopy live migration.[1]
> > As discussed at KVM Forum 2011, a dedicated character device is used for
> > distributed shared memory between the migration source and destination.
> > Now we can discuss/benchmark/compare with precopy. I believe there is
> > much room for improvement.
> >
> > [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
> >
> >
> > Usage
> > =====
> > You need to load the umem character device on the host before starting migration.
> > Postcopy can be used with both the tcg and kvm accelerators. The implementation
> > depends only on the Linux umem character device, and the driver-dependent code
> > is split out into its own file.
> > I tested only the host page size == guest page size case, but the implementation
> > allows the host page size != guest page size case.
> >
> > The following options are added with this patch series.
> > - incoming part
> >   command line options
> >   -postcopy [-postcopy-flags <flags>]
> >   where flags is for changing behavior for benchmark/debugging
> >   Currently the following flags are available
> >   0: default
> >   1: enable touching page request
> >
> >   example:
> >   qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
> >
> > - outgoing part
> >   options for migrate command
> >   migrate [-p [-n]] URI
> >   -p: indicate postcopy migration
> >   -n: disable background transferring pages: This is for benchmark/debugging
> >
> >   example:
> >   migrate -p -n tcp:<dest ip address>:4444
> >
> >
> > TODO
> > ====
> > - benchmark/evaluation. Especially how async page fault affects the result.
> > - improve/optimization
> >   At the moment at least what I'm aware of is
> >   - touching pages in incoming qemu process by fd handler seems suboptimal.
> >     creating dedicated thread?
> >   - making incoming socket non-blocking
> >   - outgoing handler seems suboptimal causing latency.
> > - catch up memory API change
> > - consider on FUSE/CUSE possibility
> > - and more...
> >
> > basic postcopy work flow
> > ========================
> > qemu on the destination
> >       |
> >       V
> > open(/dev/umem)
> >       |
> >       V
> > UMEM_DEV_CREATE_UMEM
> >       |
> >       V
> > Here we have two file descriptors to
> > umem device and shmem file
> >       |
> >       |                                       umemd
> >       |                                       daemon on the destination
> >       |
> >       V    create pipe to communicate
> > fork()----------------------------------------------,
> >       |                                             |
> >       V                                             |
> > close(socket)                                       V
> > close(shmem)                                  mmap(shmem file)
> >       |                                             |
> >       V                                             V
> > mmap(umem device) for guest RAM               close(shmem file)
> >       |                                             |
> > close(umem device)                                  |
> >       |                                             |
> >       V                                             |
> > wait for ready from daemon <-------pipe-------send ready message
> >       |                                             |
> >       |                                       Here the daemon takes over
> > send ok----------------pipe-----------------> the owner of the socket
> >       |                                       to the source
> >       V                                             |
> > entering post copy stage                            |
> > start guest execution                               |
> >       |                                             |
> >       V                                             V
> > access guest RAM                              UMEM_GET_PAGE_REQUEST
> >       |                                             |
> >       V                                             V
> > page fault ---------------------------------->page offset is returned
> > block                                               |
> >                                                     V
> >                                               pull page from the source
> >                                               write the page contents
> >                                               to the shmem.
> >                                                     |
> >                                                     V
> > unblock <-------------------------------------UMEM_MARK_PAGE_CACHED
> > the fault handler returns the page
> > page fault is resolved
> >       |
> >       |                                       pages can be sent
> >       |                                       backgroundly
> >       |                                             |
> >       |                                             V
> >       |                                       UMEM_MARK_PAGE_CACHED
> >       |                                             |
> >       V                                             V
> > The specified pages<-----------pipe-----------request to touch pages
> > are made present by                                 |
> > touching guest RAM.                                 |
> >       |                                             |
> >       V                                             V
> >     reply---------------pipe----------------> release the cached page
> >       |                                       madvise(MADV_REMOVE)
> >       |                                             |
> >       V                                             V
> >
> >           all the pages are pulled from the source
> >
> >       |                                             |
> >       V                                             V
> > the vma becomes anonymous<--------------------UMEM_MAKE_VMA_ANONYMOUS
> > (note: I'm not sure if this can be implemented or not)
> >       |                                             |
> >       V                                             V
> > migration completes                           exit()
> >
> >
> >
> > Isaku Yamahata (21):
> > arch_init: export sort_ram_list() and ram_save_block()
> > arch_init: export RAM_SAVE_xxx flags for postcopy
> > arch_init/ram_save: introduce constant for ram save version = 4
> > arch_init: refactor host_from_stream_offset()
> > arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
> > arch_init: refactor ram_save_block()
> > arch_init/ram_save_live: factor out ram_save_limit
> > arch_init/ram_load: refactor ram_load
> > exec.c: factor out qemu_get_ram_ptr()
> > exec.c: export last_ram_offset()
> > savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
> > savevm: qemu_pending_size() to return pending buffered size
> > savevm, buffered_file: introduce method to drain buffer of buffered
> > file
> > migration: export migrate_fd_completed() and migrate_fd_cleanup()
> > migration: factor out parameters into MigrationParams
> > umem.h: import Linux umem.h
> > update-linux-headers.sh: teach umem.h to update-linux-headers.sh
> > configure: add CONFIG_POSTCOPY option
> > postcopy: introduce -postcopy and -postcopy-flags option
> > postcopy outgoing: add -p and -n option to migrate command
> > postcopy: implement postcopy livemigration
> >
> > Makefile.target | 4 +
> > arch_init.c | 260 ++++---
> > arch_init.h | 20 +
> > block-migration.c | 8 +-
> > buffered_file.c | 20 +-
> > buffered_file.h | 1 +
> > configure | 12 +
> > cpu-all.h | 9 +
> > exec-obsolete.h | 1 +
> > exec.c | 75 +-
> > hmp-commands.hx | 12 +-
> > hw/hw.h | 7 +-
> > linux-headers/linux/umem.h | 83 ++
> > migration-exec.c | 8 +
> > migration-fd.c | 30 +
> > migration-postcopy-stub.c | 77 ++
> > migration-postcopy.c | 1891 +++++++++++++++++++++++++++++++++++++++
> > migration-tcp.c | 37 +-
> > migration-unix.c | 32 +-
> > migration.c | 53 +-
> > migration.h | 49 +-
> > qemu-common.h | 2 +
> > qemu-options.hx | 25 +
> > qmp-commands.hx | 10 +-
> > savevm.c | 31 +-
> > scripts/update-linux-headers.sh | 2 +-
> > sysemu.h | 4 +-
> > umem.c | 379 ++++++++
> > umem.h | 105 +++
> > vl.c | 20 +-
> > 30 files changed, 3086 insertions(+), 181 deletions(-)
> > create mode 100644 linux-headers/linux/umem.h
> > create mode 100644 migration-postcopy-stub.c
> > create mode 100644 migration-postcopy.c
> > create mode 100644 umem.c
> > create mode 100644 umem.h
> >
> >
> >
>
> --
> yamahata
>
>
--
yamahata