From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:34025) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SbYGY-00056v-VL for qemu-devel@nongnu.org; Mon, 04 Jun 2012 10:28:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SbYGS-0006bF-KL for qemu-devel@nongnu.org; Mon, 04 Jun 2012 10:28:18 -0400
Received: from g5t0008.atlanta.hp.com ([15.192.0.45]:32322) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SbYGS-0006Vj-DX for qemu-devel@nongnu.org; Mon, 04 Jun 2012 10:28:12 -0400
Message-ID: <4FCCC5CD.1070107@hp.com>
Date: Mon, 04 Jun 2012 07:27:25 -0700
From: Chegu Vinod
MIME-Version: 1.0
References: <4FCCA39A.1050300@hp.com> <20120604131325.GA4033@valinux.co.jp>
In-Reply-To: <20120604131325.GA4033@valinux.co.jp>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Fwd: [PATCH v2 00/41] postcopy live migration
Reply-To: chegu_vinod@hp.com
To: Isaku Yamahata
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org

On 6/4/2012 6:13 AM, Isaku Yamahata wrote:
> On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
>> Hello Isaku Yamahata,
> Hi.
>
>> I just saw your patches. Would it be possible to email me a tar bundle of these
>> patches (makes it easier to apply the patches to a copy of the upstream qemu.git)
> I uploaded them to github for those who are interested in it.
>
> git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
> git://github.com/yamahata/linux-umem.git linux-umem-june-04-2012
>

Thanks for the pointer...

>> BTW, I am also curious if you have considered using any kind of RDMA features for
>> optimizing the page-faults during postcopy ?
> Yes, RDMA is an interesting topic. Can we share your use case/concern/issues?
Looking at large sized guests (256GB and higher) running cpu/memory intensive
enterprise workloads. The concerns are the same, i.e. having a predictable total
migration time, minimal downtime/freeze-time and of course minimal service
degradation to the workload(s) in the VM or the co-located VMs.

How large a guest have you tested your changes with, and what kind of workloads
have you used so far?

> Thus we can collaborate.
> You may want to see Benoit's results.

Yes. I have already seen some of Benoit's results.

Hence the question about the use of RDMA techniques for postcopy.

> As far as I know, he has not published
> his code yet.

Thanks
Vinod

>
> thanks,
>
>> Thanks
>> Vinod
>>
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 4 Jun 2012 18:57:02 +0900
>> From: Isaku Yamahata
>> To: qemu-devel@nongnu.org, kvm@vger.kernel.org
>> Cc: benoit.hudzia@gmail.com, aarcange@redhat.com, aliguori@us.ibm.com,
>>     quintela@redhat.com, stefanha@gmail.com, t.hirofuchi@aist.go.jp,
>>     dlaor@redhat.com, satoshi.itoh@aist.go.jp, mdroth@linux.vnet.ibm.com,
>>     yoshikawa.takuya@oss.ntt.co.jp, owasserm@redhat.com, avi@redhat.com,
>>     pbonzini@redhat.com
>> Subject: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
>> Message-ID:
>>
>> After a long time, we have v2. This is the qemu part.
>> The linux kernel part is sent separately.
>>
>> Changes v1 -> v2:
>> - split up patches for review
>> - buffered file refactored
>> - many bug fixes
>>   Especially PV drivers can work with postcopy
>> - optimization/heuristic
>>
>> Patches
>> 1 - 30: refactoring existing code and preparation
>> 31 - 37: implement postcopy itself (essential part)
>> 38 - 41: some optimization/heuristic for postcopy
>>
>> Intro
>> =====
>> This patch series implements postcopy live migration.[1]
>> As discussed at KVM forum 2011, a dedicated character device is used for
>> distributed shared memory between migration source and destination.
>> Now we can discuss/benchmark/compare with precopy. I believe there is
>> much room for improvement.
>>
>> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>>
>>
>> Usage
>> =====
>> You need to load the umem character device on the host before starting migration.
>> Postcopy can be used with the tcg and kvm accelerators. The implementation depends
>> only on the linux umem character device. But the driver dependent code is split
>> into a file.
>> I tested only the host page size == guest page size case, but the implementation
>> allows the host page size != guest page size case.
>>
>> The following options are added with this patch series.
>> - incoming part
>>   command line options
>>   -postcopy [-postcopy-flags]
>>   where flags is for changing behavior for benchmark/debugging
>>   Currently the following flags are available
>>   0: default
>>   1: enable touching page request
>>
>>   example:
>>   qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>>
>> - outgoing part
>>   options for migrate command
>>   migrate [-p [-n] [-m]] URI [ []]
>>   -p: indicate postcopy migration
>>   -n: disable background transferring pages: This is for benchmark/debugging
>>   -m: move background transfer of postcopy mode
>>   : The number of forward pages which is sent with on-demand
>>   : The number of backward pages which is sent with
>>     on-demand
>>
>>   example:
>>   migrate -p -n tcp::4444
>>   migrate -p -n -m tcp::4444 32 0
>>
>>
>> TODO
>> ====
>> - benchmark/evaluation. Especially how async page fault affects the result.
>> - improve/optimization
>>   At the moment at least what I'm aware of is
>>   - making incoming socket non-blocking with thread
>>     As page compression is coming, it is impractical to do a non-blocking read
>>     and check if the necessary data is read.
>>   - touching pages in incoming qemu process by fd handler seems suboptimal.
>>     creating dedicated thread?
>>   - outgoing handler seems suboptimal causing latency.
>> - consider FUSE/CUSE possibility
>> - don't fork umemd, but create thread?
>>
>> basic postcopy work flow
>> ========================
>>         qemu on the destination
>>               |
>>               V
>>         open(/dev/umem)
>>               |
>>               V
>>         UMEM_INIT
>>               |
>>               V
>>         Here we have two file descriptors to
>>         umem device and shmem file
>>               |
>>               |                                  umemd
>>               |                                  daemon on the destination
>>               |
>>               V    create pipe to communicate
>>         fork()---------------------------------------,
>>               |                                      |
>>               V                                      |
>>         close(socket)                                V
>>         close(shmem)                              mmap(shmem file)
>>               |                                      |
>>               V                                      V
>>         mmap(umem device) for guest RAM           close(shmem file)
>>               |                                      |
>>         close(umem device)                           |
>>               |                                      |
>>               V                                      |
>>         wait for ready from daemon<----pipe-----send ready message
>>               |                                      |
>>               |                                 Here the daemon takes over
>>         send ok------------pipe---------------> the owner of the socket
>>               |                                 to the source
>>               V                                      |
>>         entering post copy stage                     |
>>         start guest execution                        |
>>               |                                      |
>>               V                                      V
>>         access guest RAM                          read() to get faulted pages
>>               |                                      |
>>               V                                      V
>>         page fault ------------------------------>page offset is returned
>>         block                                        |
>>                                                      V
>>                                                   pull page from the source
>>                                                   write the page contents
>>                                                   to the shmem.
>>                                                      |
>>                                                      V
>>         unblock<-----------------------------write() to tell served pages
>>         the fault handler returns the page
>>         page fault is resolved
>>               |
>>               |                                   pages can be sent
>>               |                                   in the background
>>               |                                      |
>>               |                                      V
>>               |                                   write()
>>               |                                      |
>>               V                                      V
>>         The specified pages<-----pipe------------request to touch pages
>>         are made present by                          |
>>         touching guest RAM.                          |
>>               |                                      |
>>               V                                      V
>>              reply-------------pipe-------------> release the cached page
>>               |                                   madvise(MADV_REMOVE)
>>               |                                      |
>>               V                                      V
>>
>>                  all the pages are pulled from the source
>>
>>               |                                      |
>>               V                                      V
>>         the vma becomes anonymous<----------------UMEM_MAKE_VMA_ANONYMOUS
>>        (note: I'm not sure if this can be implemented or not)
>>               |                                      |
>>               V                                      V
>>         migration completes                        exit()
>>
>>
>>
>>
>> Isaku Yamahata (41):
>>   arch_init: export sort_ram_list() and ram_save_block()
>>   arch_init: export RAM_SAVE_xxx flags for postcopy
>>   arch_init/ram_save: introduce constant for ram save version = 4
>>   arch_init: refactor host_from_stream_offset()
>>   arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
>>   arch_init: refactor ram_save_block()
>>   arch_init/ram_save_live: factor out ram_save_limit
>>   arch_init/ram_load: refactor ram_load
>>   arch_init: introduce helper function to find ram block with id string
>>   arch_init: simplify a bit by ram_find_block()
>>   arch_init: factor out counting transferred bytes
>>   arch_init: factor out setting last_block, last_offset
>>   exec.c: factor out qemu_get_ram_ptr()
>>   exec.c: export last_ram_offset()
>>   savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
>>   savevm: qemu_pending_size() to return pending buffered size
>>   savevm, buffered_file: introduce method to drain buffer of buffered
>>     file
>>   QEMUFile: add qemu_file_fd() for later use
>>   savevm/QEMUFile: drop qemu_stdio_fd
>>   savevm/QEMUFileSocket: drop duplicated member fd
>>   savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close
>>   savevm/QEMUFile: introduce qemu_fopen_fd
>>   migration.c: remove redundant line in migrate_init()
>>   migration: export migrate_fd_completed() and migrate_fd_cleanup()
>>   migration: factor out parameters into MigrationParams
>>   buffered_file: factor out buffer management logic
>>   buffered_file: Introduce QEMUFileNonblock for nonblock write
>>   buffered_file: add qemu_file to read/write to buffer in memory
>>   umem.h: import Linux umem.h
>>   update-linux-headers.sh: teach umem.h to update-linux-headers.sh
>>   configure: add CONFIG_POSTCOPY option
>>   savevm: add new section that is used by postcopy
>>   postcopy: introduce -postcopy and -postcopy-flags option
>>   postcopy outgoing: add -p and -n option to migrate command
>>   postcopy: introduce helper functions for postcopy
>>   postcopy: implement incoming part of postcopy live migration
>>   postcopy: implement outgoing part of postcopy live migration
>>   postcopy/outgoing: add forward, backward option to specify the size
>>     of prefault
>>   postcopy/outgoing: implement prefault
>>   migrate: add -m (movebg) option to migrate command
>>   migration/postcopy: add movebg mode
>>
>>  Makefile.target                 |    5 +
>>  arch_init.c                     |  298 ++++---
>>  arch_init.h                     |   20 +
>>  block-migration.c               |    8 +-
>>  buffered_file.c                 |  322 ++++++--
>>  buffered_file.h                 |   32 +
>>  configure                       |   12 +
>>  cpu-all.h                       |    9 +
>>  exec-obsolete.h                 |    1 +
>>  exec.c                          |   87 ++-
>>  hmp-commands.hx                 |   18 +-
>>  hmp.c                           |   10 +-
>>  linux-headers/linux/umem.h      |   42 +
>>  migration-exec.c                |   12 +-
>>  migration-fd.c                  |   25 +-
>>  migration-postcopy-stub.c       |   77 ++
>>  migration-postcopy.c            | 1771 +++++++++++++++++++++++++++++++++++++++
>>  migration-tcp.c                 |   25 +-
>>  migration-unix.c                |   26 +-
>>  migration.c                     |   97 ++-
>>  migration.h                     |   47 +-
>>  qapi-schema.json                |    4 +-
>>  qemu-common.h                   |    2 +
>>  qemu-file.h                     |    8 +-
>>  qemu-options.hx                 |   25 +
>>  qmp-commands.hx                 |    4 +-
>>  savevm.c                        |  177 ++++-
>>  scripts/update-linux-headers.sh |    2 +-
>>  sysemu.h                        |    4 +-
>>  umem.c                          |  364 ++++++++
>>  umem.h                          |  101 +++
>>  vl.c                            |   16 +-
>>  vmstate.h                       |    2 +-
>>  33 files changed, 3373 insertions(+), 280 deletions(-)
>>  create mode 100644 linux-headers/linux/umem.h
>>  create mode 100644 migration-postcopy-stub.c
>>  create mode 100644 migration-postcopy.c
>>  create mode 100644 umem.c
>>  create mode 100644 umem.h
>>
>>
>>
>>
>> ------------------------------
>>
>>
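
For readers following the prefault options in the quoted cover letter (the forward
and backward page counts given after the URI, e.g. "32 0"), here is a rough sketch
of how such a window around a faulted page could be computed. It is only an
illustration and is not taken from the patches: the function name prefault_window,
its parameters, and the already_sent set are all hypothetical, and the real
outgoing code may order or filter pages differently.

```python
def prefault_window(fault_page, forward, backward, total_pages,
                    already_sent=frozenset()):
    """Return the page indices to send in response to one page fault.

    Hypothetical model: the faulted page always comes first, followed by up
    to `forward` pages after it and up to `backward` pages before it, clipped
    to the guest RAM range and skipping pages that were already sent.
    """
    if not 0 <= fault_page < total_pages:
        raise ValueError("fault_page out of range")

    pages = [fault_page]  # the on-demand page itself is never skipped

    # Neighbours after the faulted page, clipped at the last page.
    last_forward = min(fault_page + forward, total_pages - 1)
    pages.extend(range(fault_page + 1, last_forward + 1))

    # Neighbours before the faulted page, nearest first, clipped at page 0.
    first_backward = max(fault_page - backward, 0)
    pages.extend(range(fault_page - 1, first_backward - 1, -1))

    return [p for p in pages if p == fault_page or p not in already_sent]


if __name__ == "__main__":
    # Mirrors the "32 0" example: 32 forward pages, no backward pages.
    print(prefault_window(fault_page=100, forward=32, backward=0,
                          total_pages=1 << 20)[:5])
```

With forward=32 and backward=0, as in the second migrate example, each fault would
pull the faulted page plus the next 32 pages, which favors sequential access
patterns at the cost of extra traffic for random ones.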