qemu-devel.nongnu.org archive mirror
* Re: [Qemu-devel] Fwd:  [PATCH v2 00/41] postcopy live migration
       [not found] <4FCCA39A.1050300@hp.com>
@ 2012-06-04 13:13 ` Isaku Yamahata
  2012-06-04 14:27   ` Chegu Vinod
  0 siblings, 1 reply; 3+ messages in thread
From: Isaku Yamahata @ 2012-06-04 13:13 UTC (permalink / raw)
  To: Chegu Vinod; +Cc: qemu-devel, kvm

On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
> Hello Isaku Yamahata,

Hi.

> I just saw your patches... Would it be possible to email me a tar bundle of these
> patches? (It makes it easier to apply them to a copy of the upstream qemu.git.)

I uploaded them to github for those who are interested:

git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
git://github.com/yamahata/linux-umem.git  linux-umem-june-04-2012 


> BTW, I am also curious whether you have considered using any kind of RDMA features
> for optimizing the page faults during postcopy?

Yes, RDMA is an interesting topic. Can you share your use cases/concerns/issues?
Then we can collaborate.
You may want to see Benoit's results. As far as I know, he has not published
his code yet.

thanks,

> Thanks
> Vinod
>
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon,  4 Jun 2012 18:57:02 +0900
> From: Isaku Yamahata<yamahata@valinux.co.jp>
> To: qemu-devel@nongnu.org, kvm@vger.kernel.org
> Cc: benoit.hudzia@gmail.com, aarcange@redhat.com, aliguori@us.ibm.com,
> 	quintela@redhat.com, stefanha@gmail.com, t.hirofuchi@aist.go.jp,
> 	dlaor@redhat.com, satoshi.itoh@aist.go.jp,	mdroth@linux.vnet.ibm.com,
> 	yoshikawa.takuya@oss.ntt.co.jp,	owasserm@redhat.com, avi@redhat.com,
> 	pbonzini@redhat.com
> Subject: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
> Message-ID:<cover.1338802190.git.yamahata@valinux.co.jp>
>
> After a long time, we have v2. This is the qemu part.
> The linux kernel part is sent separately.
>
> Changes v1 -> v2:
> - split up patches for review
> - buffered file refactored
> - many bug fixes
>   Especially, PV drivers now work with postcopy
> - optimizations/heuristics
>
> Patches
> 1 - 30: refactoring of existing code and preparation
> 31 - 37: postcopy itself (the essential part)
> 38 - 41: some optimizations/heuristics for postcopy
>
> Intro
> =====
> This patch series implements postcopy live migration.[1]
> As discussed at KVM Forum 2011, a dedicated character device is used for
> distributed shared memory between the migration source and destination.
> Now we can discuss/benchmark/compare with precopy. I believe there is
> much room for improvement.
>
> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>
>
> Usage
> =====
> You need to load the umem character device on the host before starting
> migration. Postcopy can be used with the tcg and kvm accelerators. The
> implementation depends only on the linux umem character device, and the
> driver-dependent code is split into a separate file.
> I tested only the host page size == guest page size case, but the
> implementation allows host page size != guest page size.
>
> The following options are added by this patch series.
> - incoming part
>   command line options
>   -postcopy [-postcopy-flags <flags>]
>   where <flags> changes the behavior for benchmarking/debugging.
>   Currently the following flags are available:
>   0: default
>   1: enable touching page request
>
>   example:
>   qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>
> - outgoing part
>   options for the migrate command
>   migrate [-p [-n] [-m]] URI [<prefault forward> [<prefault backward>]]
>   -p: use postcopy migration
>   -n: disable background page transfer (for benchmarking/debugging)
>   -m: movebg mode; move the background transfer of postcopy
>   <prefault forward>: the number of forward pages sent together with an
>                       on-demand page (see the sketch after the examples)
>   <prefault backward>: the number of backward pages sent together with an
>                        on-demand page
>
>   example:
>   migrate -p -n tcp:<dest ip address>:4444
>   migrate -p -n -m tcp:<dest ip address>:4444 32 0
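>
>   To make the prefault options concrete, here is a minimal sketch of how
>   an on-demand fault at page pfn could be expanded by the forward/backward
>   window. This is an illustration only, not code from the series; the
>   helper name and the clamping behavior are assumptions.
>
>     /* Hedged sketch: expand an on-demand fault at 'pfn' into the range
>      * [pfn - backward, pfn + forward], clamped to the RAM block size.
>      * Pages in [*first, *last] would then be requested from the source
>      * together with the faulting page itself. */
>     #include <stdint.h>
>
>     static void prefault_range(uint64_t pfn, uint64_t nr_pages,
>                                uint64_t forward, uint64_t backward,
>                                uint64_t *first, uint64_t *last)
>     {
>         *first = (pfn > backward) ? pfn - backward : 0;
>         *last  = pfn + forward;
>         if (*last >= nr_pages) {
>             *last = nr_pages - 1;
>         }
>     }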
>
>
> TODO
> ====
> - benchmark/evaluation, especially how async page faults affect the results.
> - improvements/optimizations
>   At the moment, at least, what I'm aware of is:
>   - making the incoming socket non-blocking with a thread
>     As page compression is coming, it is impractical to do non-blocking
>     reads and check whether the necessary data has been read.
>   - touching pages in the incoming qemu process from the fd handler seems
>     suboptimal; create a dedicated thread?
>   - the outgoing handler seems suboptimal, causing latency.
> - consider the FUSE/CUSE possibility
> - don't fork umemd, but create a thread?
>
> basic postcopy work flow
> ========================
>         qemu on the destination
>               |
>               V
>         open(/dev/umem)
>               |
>               V
>         UMEM_INIT
>               |
>               V
>         Here we have two file descriptors, to the
>         umem device and the shmem file
>               |
>               |                                  umemd
>               |                                  daemon on the destination
>               |
>               V    create pipe to communicate
>         fork()---------------------------------------,
>               |                                      |
>               V                                      |
>         close(socket)                                V
>         close(shmem)                              mmap(shmem file)
>               |                                      |
>               V                                      V
>         mmap(umem device) for guest RAM           close(shmem file)
>               |                                      |
>         close(umem device)                           |
>               |                                      |
>               V                                      |
>         wait for ready from daemon<----pipe-----send ready message
>               |                                      |
>               |                                 Here the daemon takes over
>         send ok------------pipe--------------->  ownership of the socket
>               |                                   to the source
>               V                                      |
>         entering post copy stage                     |
>         start guest execution                        |
>               |                                      |
>               V                                      V
>         access guest RAM                          read() to get faulted pages
>               |                                      |
>               V                                      V
>         page fault ------------------------------>page offset is returned
>         block                                        |
>                                                      V
>                                                   pull page from the source
>                                                   write the page contents
>                                                   to the shmem.
>                                                      |
>                                                      V
>         unblock<-----------------------------write() to tell served pages
>         the fault handler returns the page
>         page fault is resolved
>               |
>               |                                   pages can be sent
>               |                                   in the background
>               |                                      |
>               |                                      V
>               |                                   write()
>               |                                      |
>               V                                      V
>         The specified pages<-----pipe------------request to touch pages
>         are made present by                          |
>         touching guest RAM.                          |
>               |                                      |
>               V                                      V
>              reply-------------pipe------------->  release the cached page
>               |                                   madvise(MADV_REMOVE)
>               |                                      |
>               V                                      V
>
>                  all the pages are pulled from the source
>
>               |                                      |
>               V                                      V
>         the vma becomes anonymous<----------------UMEM_MAKE_VMA_ANONYMOUS
>        (note: I'm not sure if this can be implemented or not)
>               |                                      |
>               V                                      V
>         migration completes                        exit()
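>
> As a reading aid, the flow above can be sketched in C roughly as follows.
> This is a hedged illustration only: the UMEM_INIT argument layout, the
> ioctl encoding, the page-offset record format and pull_page_from_source()
> are assumptions, not the real interface of this series (see umem.h/umem.c
> for that).
>
>     #include <fcntl.h>
>     #include <stdint.h>
>     #include <sys/ioctl.h>
>     #include <sys/mman.h>
>     #include <unistd.h>
>
>     struct umem_init { uint64_t size; int shmem_fd; };   /* hypothetical */
>     #define UMEM_INIT _IOWR('u', 0, struct umem_init)    /* hypothetical */
>     void pull_page_from_source(int sock, void *cache, uint64_t pgoff);
>
>     void *destination_setup(size_t ram_size, int source_socket)
>     {
>         int umem_fd = open("/dev/umem", O_RDWR);
>         struct umem_init args = { .size = ram_size };
>         ioctl(umem_fd, UMEM_INIT, &args);  /* now we hold two fds:     */
>         int shmem_fd = args.shmem_fd;      /* umem device + shmem file */
>
>         if (fork() == 0) {                 /* child: the umemd daemon  */
>             void *cache = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
>                                MAP_SHARED, shmem_fd, 0);
>             close(shmem_fd);
>             /* "ready"/"ok" handshake over a pipe elided */
>             uint64_t pgoff;
>             while (read(umem_fd, &pgoff, sizeof(pgoff)) > 0) {
>                 /* a guest access faulted on page 'pgoff' */
>                 pull_page_from_source(source_socket, cache, pgoff);
>                 write(umem_fd, &pgoff, sizeof(pgoff)); /* unblock qemu */
>             }
>             /* background transfer, madvise(MADV_REMOVE) of served
>              * pages and UMEM_MAKE_VMA_ANONYMOUS are elided */
>             _exit(0);
>         }
>
>         /* parent: incoming qemu hands the socket over to the daemon */
>         close(source_socket);
>         close(shmem_fd);
>         void *guest_ram = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
>                                MAP_SHARED, umem_fd, 0); /* guest RAM  */
>         close(umem_fd);
>         /* wait for "ready", reply "ok", start guest execution; faults
>          * on guest_ram now block until umemd serves the page */
>         return guest_ram;
>     }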
>
>
>
>
> Isaku Yamahata (41):
>   arch_init: export sort_ram_list() and ram_save_block()
>   arch_init: export RAM_SAVE_xxx flags for postcopy
>   arch_init/ram_save: introduce constant for ram save version = 4
>   arch_init: refactor host_from_stream_offset()
>   arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case
>   arch_init: refactor ram_save_block()
>   arch_init/ram_save_live: factor out ram_save_limit
>   arch_init/ram_load: refactor ram_load
>   arch_init: introduce helper function to find ram block with id string
>   arch_init: simplify a bit by ram_find_block()
>   arch_init: factor out counting transferred bytes
>   arch_init: factor out setting last_block, last_offset
>   exec.c: factor out qemu_get_ram_ptr()
>   exec.c: export last_ram_offset()
>   savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip
>   savevm: qemu_pending_size() to return pending buffered size
>   savevm, buffered_file: introduce method to drain buffer of buffered
>     file
>   QEMUFile: add qemu_file_fd() for later use
>   savevm/QEMUFile: drop qemu_stdio_fd
>   savevm/QEMUFileSocket: drop duplicated member fd
>   savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close
>   savevm/QEMUFile: introduce qemu_fopen_fd
>   migration.c: remove redundant line in migrate_init()
>   migration: export migrate_fd_completed() and migrate_fd_cleanup()
>   migration: factor out parameters into MigrationParams
>   buffered_file: factor out buffer management logic
>   buffered_file: Introduce QEMUFileNonblock for nonblock write
>   buffered_file: add qemu_file to read/write to buffer in memory
>   umem.h: import Linux umem.h
>   update-linux-headers.sh: teach umem.h to update-linux-headers.sh
>   configure: add CONFIG_POSTCOPY option
>   savevm: add new section that is used by postcopy
>   postcopy: introduce -postcopy and -postcopy-flags option
>   postcopy outgoing: add -p and -n option to migrate command
>   postcopy: introduce helper functions for postcopy
>   postcopy: implement incoming part of postcopy live migration
>   postcopy: implement outgoing part of postcopy live migration
>   postcopy/outgoing: add forward, backward option to specify the size
>     of prefault
>   postcopy/outgoing: implement prefault
>   migrate: add -m (movebg) option to migrate command
>   migration/postcopy: add movebg mode
>
>  Makefile.target                 |    5 +
>  arch_init.c                     |  298 ++++---
>  arch_init.h                     |   20 +
>  block-migration.c               |    8 +-
>  buffered_file.c                 |  322 ++++++--
>  buffered_file.h                 |   32 +
>  configure                       |   12 +
>  cpu-all.h                       |    9 +
>  exec-obsolete.h                 |    1 +
>  exec.c                          |   87 ++-
>  hmp-commands.hx                 |   18 +-
>  hmp.c                           |   10 +-
>  linux-headers/linux/umem.h      |   42 +
>  migration-exec.c                |   12 +-
>  migration-fd.c                  |   25 +-
>  migration-postcopy-stub.c       |   77 ++
>  migration-postcopy.c            | 1771 +++++++++++++++++++++++++++++++++++++++
>  migration-tcp.c                 |   25 +-
>  migration-unix.c                |   26 +-
>  migration.c                     |   97 ++-
>  migration.h                     |   47 +-
>  qapi-schema.json                |    4 +-
>  qemu-common.h                   |    2 +
>  qemu-file.h                     |    8 +-
>  qemu-options.hx                 |   25 +
>  qmp-commands.hx                 |    4 +-
>  savevm.c                        |  177 ++++-
>  scripts/update-linux-headers.sh |    2 +-
>  sysemu.h                        |    4 +-
>  umem.c                          |  364 ++++++++
>  umem.h                          |  101 +++
>  vl.c                            |   16 +-
>  vmstate.h                       |    2 +-
>  33 files changed, 3373 insertions(+), 280 deletions(-)
>  create mode 100644 linux-headers/linux/umem.h
>  create mode 100644 migration-postcopy-stub.c
>  create mode 100644 migration-postcopy.c
>  create mode 100644 umem.c
>  create mode 100644 umem.h

-- 
yamahata


* Re: [Qemu-devel] Fwd:  [PATCH v2 00/41] postcopy live migration
  2012-06-04 13:13 ` [Qemu-devel] Fwd: [PATCH v2 00/41] postcopy live migration Isaku Yamahata
@ 2012-06-04 14:27   ` Chegu Vinod
  2012-06-04 15:13     ` Isaku Yamahata
  0 siblings, 1 reply; 3+ messages in thread
From: Chegu Vinod @ 2012-06-04 14:27 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: qemu-devel, kvm

On 6/4/2012 6:13 AM, Isaku Yamahata wrote:
> On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
>> Hello Isaku Yamahata,
> Hi.
>
>> I just saw your patches... Would it be possible to email me a tar bundle of these
>> patches? (It makes it easier to apply them to a copy of the upstream qemu.git.)
> I uploaded them to github for those who are interested:
>
> git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
> git://github.com/yamahata/linux-umem.git  linux-umem-june-04-2012
>

Thanks for the pointer...
>> BTW, I am also curious whether you have considered using any kind of RDMA features
>> for optimizing the page faults during postcopy?
> Yes, RDMA is an interesting topic. Can you share your use cases/concerns/issues?


We are looking at large guests (256GB and higher) running CPU/memory-intensive
enterprise workloads.
The concerns are the same, i.e. having a predictable total migration time,
minimal downtime/freeze time and, of course, minimal service degradation to
the workload(s) in the VM or in co-located VMs.

How large a guest have you tested your changes with, and what kind of
workloads have you used so far?

> Then we can collaborate.
> You may want to see Benoit's results.

Yes, I have already seen some of Benoit's results.

Hence the question about using RDMA techniques for postcopy.

> As far as I know, he has not published
> his code yet.

Thanks
Vinod

> [... full quote of the original cover letter trimmed; see the first message above ...]


* Re: [Qemu-devel] Fwd:  [PATCH v2 00/41] postcopy live migration
  2012-06-04 14:27   ` Chegu Vinod
@ 2012-06-04 15:13     ` Isaku Yamahata
  0 siblings, 0 replies; 3+ messages in thread
From: Isaku Yamahata @ 2012-06-04 15:13 UTC (permalink / raw)
  To: Chegu Vinod; +Cc: qemu-devel, kvm

On Mon, Jun 04, 2012 at 07:27:25AM -0700, Chegu Vinod wrote:
> On 6/4/2012 6:13 AM, Isaku Yamahata wrote:
>> On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
>>> Hello Isaku Yamahata,
>> Hi.
>>
>>> I just saw your patches... Would it be possible to email me a tar bundle of these
>>> patches? (It makes it easier to apply them to a copy of the upstream qemu.git.)
>> I uploaded them to github for those who are interested:
>>
>> git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
>> git://github.com/yamahata/linux-umem.git  linux-umem-june-04-2012
>>
>
> Thanks for the pointer...
>>> BTW, I am also curious whether you have considered using any kind of RDMA features
>>> for optimizing the page faults during postcopy?
>> Yes, RDMA is an interesting topic. Can you share your use cases/concerns/issues?
>
>
> We are looking at large guests (256GB and higher) running CPU/memory-intensive
> enterprise workloads.
> The concerns are the same, i.e. having a predictable total migration time,
> minimal downtime/freeze time and, of course, minimal service degradation to
> the workload(s) in the VM or in co-located VMs.
>
> How large a guest have you tested your changes with, and what kind of
> workloads have you used so far?

Only VMs up to several GB so far. Of course we'd like to benchmark with a
really huge VM (several hundred GB), but that is somewhat difficult.


>> Then we can collaborate.
>> You may want to see Benoit's results.
>
> Yes, I have already seen some of Benoit's results.

Great.

> Hence the question about using RDMA techniques for postcopy.

So far my implementation doesn't use RDMA.

>> As far as I know, he has not published
>> his code yet.
>
> Thanks
> Vinod
>
> [... full quote of the original cover letter trimmed; see the first message above ...]

-- 
yamahata

