* [Qemu-devel] [PATCH v8 00/54] Postcopy implementation
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
This is the 8th cut of my version of postcopy.
The userfaultfd Linux kernel code is now in the upstream kernel
tree, so 4.3-rc3 can be used without modification.
This qemu series can be found at:
https://github.com/orbitfp7/qemu.git
on the wp3-postcopy-v8 tag
Testing status:
* Tested heavily on x86
* Smoke tested on aarch64 (so it does work on different page sizes)
* Power is unhappy for me (it gets further than the htab problem
  v7 used to have, but I now get a 'kvm run failed' error)
Note that patches:
3 Init page size in qtest
10 Use RAMBlock rather than Memory Region
14,15 (splitting ram find and save block)
36 Split out end of migration code
have also been posted separately during the last month and
can be taken separately from this series.
This work has been partially funded by the EU Orbit project:
see http://www.orbitproject.eu/about/
v8
Huge page changes
The precopy phase is now allowed to keep transparent-huge-pages
enabled, although these may be split in the discard phase. A change
to the discard process now causes discards for unsent (as well as
redirtied) pages; the combination of these changes means that the
destination starts running with many of the precopy pages as huge
pages, resulting in a significant performance benefit. This change
adds one more state ('discard') which is entered by the destination
on reception of the 1st discard command.
Add global_state_store for postcopy
Moved postcopy_state back out of MigrationIncomingState
At the end of the main migration thread's reading of the stream, the
postcopy state is read to decide who should clean up; in failure
conditions there was a race between reading the state and the
freeing of the MIS.
Stop calling the iterate method for non-postcopiable devices during postcopy
Helps fix Power (thanks to Bharata for helping debug this)
Review comment fixes
rework of migration command parsing
rework of postcopy_chunk_hostpages
rework of discard code & protocol into start/length rather than start/end
rename qemu_get_buffer_less_copy -> qemu_get_buffer_in_place
split 'Postcopy end in migration thread' patch into two
split of MIG_RP_MSG_REQ_PAGES into subtype with name
Added comments documenting postcopy_state's use
lots of other minor fixups
Notes:
I kept the mlock support (users are saying they wanted migration/postcopy
with mlock)
I'm keeping the x- prefix for now, until the libvirt interface gets finalised.
There are two checkpatch errors that I don't think it is right to change:
a) a 'typedef enum' it wants split - that's the way we do all our enums
and would force a dummy name for the enum.
b) A complaint about 'postcopy_ram_discard_version = 0' on a global
   static; I could get rid of it by making my version 1, but it doesn't
   seem right to omit the '= 0' for a version constant.
Dave
Dr. David Alan Gilbert (54):
Add postcopy documentation
Provide runtime Target page information
Init page sizes in qtest
Move configuration section writing
qemu_ram_block_from_host
Rename mis->file to from_src_file
Add qemu_get_buffer_in_place to avoid copies some of the time
Add wrapper for setting blocking status on a QEMUFile
Add QEMU_MADV_NOHUGEPAGE
migration/ram.c: Use RAMBlock rather than MemoryRegion
ram_debug_dump_bitmap: Dump a migration bitmap as text
migrate_init: Call from savevm
Move dirty page search state into separate structure
ram_find_and_save_block: Split out the finding
Rename save_live_complete to save_live_complete_precopy
Return path: Open a return path on QEMUFile for sockets
Return path: socket_writev_buffer: Block even on non-blocking fd's
Migration commands
Return path: Control commands
Return path: Send responses from destination to source
Return path: Source handling of return path
Rework loadvm path for subloops
Add migration-capability boolean for postcopy-ram.
Add wrappers and handlers for sending/receiving the postcopy-ram
migration messages.
MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
Modify save_live_pending for postcopy
postcopy: OS support test
migrate_start_postcopy: Command to trigger transition to postcopy
MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
Avoid sending vmdescription during postcopy
Add qemu_savevm_state_complete_postcopy
Postcopy: Maintain sentmap and calculate discard
postcopy: Incoming initialisation
postcopy: ram_enable_notify to switch on userfault
Postcopy: Postcopy startup in migration thread
Split out end of migration code from migration_thread
Postcopy: End of iteration
Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
Page request: Process incoming page request
Page request: Consume pages off the post-copy queue
postcopy_ram.c: place_page and helpers
Postcopy: Use helpers to map pages during migration
Don't sync dirty bitmaps in postcopy
Don't iterate on precopy-only devices during postcopy
Host page!=target page: Cleanup bitmaps
postcopy: Check order of received target pages
Round up RAMBlock sizes to host page sizes
Postcopy; Handle userfault requests
Start up a postcopy/listener thread ready for incoming page data
postcopy: Wire up loadvm_postcopy_handle_ commands
Postcopy: Mark nohugepage before discard
End of migration for postcopy
Disable mlock around incoming postcopy
Inhibit ballooning during postcopy
balloon.c | 11 +
docs/migration.txt | 191 ++++++++
exec.c | 72 ++-
hmp-commands.hx | 15 +
hmp.c | 7 +
hmp.h | 1 +
hw/ppc/spapr.c | 2 +-
hw/virtio/virtio-balloon.c | 4 +-
include/exec/cpu-common.h | 3 +
include/exec/ram_addr.h | 2 -
include/migration/migration.h | 128 ++++-
include/migration/postcopy-ram.h | 99 ++++
include/migration/qemu-file.h | 10 +
include/migration/vmstate.h | 8 +-
include/qemu/osdep.h | 9 +
include/qemu/typedefs.h | 3 +
include/sysemu/balloon.h | 2 +
include/sysemu/sysemu.h | 46 +-
migration/Makefile.objs | 2 +-
migration/block.c | 9 +-
migration/migration.c | 753 +++++++++++++++++++++++++++--
migration/postcopy-ram.c | 763 ++++++++++++++++++++++++++++++
migration/qemu-file-unix.c | 111 ++++-
migration/qemu-file.c | 74 +++
migration/ram.c | 995 +++++++++++++++++++++++++++++++++++----
migration/savevm.c | 825 ++++++++++++++++++++++++++++----
qapi-schema.json | 18 +-
qmp-commands.hx | 19 +
qtest.c | 1 +
trace-events | 81 +++-
30 files changed, 3996 insertions(+), 268 deletions(-)
create mode 100644 include/migration/postcopy-ram.h
create mode 100644 migration/postcopy-ram.c
--
2.5.0
* [Qemu-devel] [PATCH v8 01/54] Add postcopy documentation
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
docs/migration.txt | 191 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 191 insertions(+)
diff --git a/docs/migration.txt b/docs/migration.txt
index f6df4be..7853709 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -291,3 +291,194 @@ save/send this state when we are in the middle of a pio operation
(that is what ide_drive_pio_state_needed() checks). If DRQ_STAT is
not enabled, the values on that fields are garbage and don't need to
be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages across).
+
+However, some uses need two-way communication; in particular the Postcopy
+destination needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+ Source side
+ Forward path - written by migration thread
+ Return path - opened by main thread, read by return-path thread
+
+ Destination side
+ Forward path - read by main thread
+ Return path - opened by main thread, written by main thread AND postcopy
+ thread (protected by rp_mutex)
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge
+(or take too long to converge). Its plus side is that there is an upper bound
+on the amount of migration traffic and time it takes; the down side is that
+during the postcopy phase, a failure of *either* side or of the network
+connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
+doesn't finish in a given time the switch is made to postcopy.
+
+=== Enabling postcopy ===
+
+To enable postcopy, issue this command on the monitor prior to the
+start of migration:
+
+migrate_set_capability x-postcopy-ram on
+
+The normal commands are then used to start a migration, which is still
+started in precopy mode. Issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on. Issuing it after the end of a migration is harmless.
+
+Note: During the postcopy phase, the bandwidth limits set using
+migrate_set_speed are ignored (to avoid delaying requested pages that
+the destination is waiting for).
+
+=== Postcopy device transfer ===
+
+Loading of device data may cause the device emulation to access guest RAM,
+which may trigger faults that have to be resolved by the source. The
+migration stream therefore has to be able to respond with page data *during*
+the device load, and hence the device data has to be read from the stream
+completely before the device load begins, to free the stream up. This is
+achieved by 'packaging' the device data into a blob that's read in one go.
+
+Source behaviour
+
+Until postcopy is entered the migration stream is identical to normal
+precopy, except for the addition of a 'postcopy advise' command at
+the beginning, to tell the destination that postcopy might happen.
+When postcopy starts the source sends the page discard data and then
+forms the 'package' containing:
+
+ Command: 'postcopy listen'
+ The device state
+ A series of sections, identical to the precopy stream's device state stream,
+ containing everything except postcopiable devices (i.e. RAM)
+ Command: 'postcopy run'
+
+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
+contents are formatted in the same way as the main migration stream.
+
+During postcopy the source scans the list of dirty pages and sends them
+to the destination without being requested (in much the same way as precopy);
+however, when a page request is received from the destination, the dirty page
+scanning restarts from the requested location. This causes requested pages
+to be sent quickly, and also causes pages directly after the requested page
+to be sent quickly in the hope that those pages are likely to be used
+by the destination soon.
+
+Destination behaviour
+
+Initially the destination looks the same as precopy, with a single thread
+reading the migration stream; the 'postcopy advise' and 'discard' commands
+are processed to change the way RAM is managed, but don't affect the stream
+processing.
+
+------------------------------------------------------------------------------
+ 1 2 3 4 5 6 7
+main -----DISCARD-CMD_PACKAGED ( LISTEN DEVICE DEVICE DEVICE RUN )
+thread | |
+ | (page request)
+ | \___
+ v \
+listen thread: --- page -- page -- page -- page -- page --
+
+ a b c
+------------------------------------------------------------------------------
+
+On receipt of CMD_PACKAGED (1)
+ All the data associated with the package - the ( ... ) section in the
+diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
+recurses into qemu_loadvm_state_main to process the contents of the package (2)
+which contains commands (3,6) and devices (4...)
+
+On receipt of 'postcopy listen' - 3 - (i.e. the 1st command in the package)
+a new thread (a) is started that takes over servicing the migration stream,
+while the main thread carries on loading the package. It loads normal
+background page data (b), but if a fault happens during a device load (5),
+the returned page (c) is loaded by the listen thread, allowing the main
+thread's device load to carry on.
+
+The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
+CPUs start running.
+At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
+and is no longer used by migration, while the listen thread carries
+on servicing page data until the end of migration.
+
+=== Postcopy states ===
+
+Postcopy moves through a series of states (see postcopy_state) from
+ADVISE->DISCARD->LISTEN->RUNNING->END
+
+ Advise: Set at the start of migration if postcopy is enabled, even
+ if it hasn't had the start command; here the destination
+ checks that its OS has the support needed for postcopy, and performs
+ setup to ensure the RAM mappings are suitable for later postcopy.
+ The destination will fail early in migration at this point if the
+ required OS support is not present.
+ (Triggered by reception of POSTCOPY_ADVISE command)
+
+ Discard: Entered on receipt of the first 'discard' command; prior to
+ the first Discard being performed, hugepages are switched off
+ (using madvise) to ensure that no new huge pages are created
+ during the postcopy phase, and to cause any huge pages that
+ have discards on them to be broken.
+
+ Listen: The first command in the package, POSTCOPY_LISTEN, switches
+ the destination state to Listen, and starts a new thread
+ (the 'listen thread') which takes over the job of receiving
+ pages off the migration stream, while the main thread carries
+ on processing the blob. With this thread able to process page
+ reception, the destination now 'sensitises' the RAM to detect
+ any access to missing pages (on Linux using the 'userfault'
+ system).
+
+ Running: POSTCOPY_RUN causes the destination to synchronise all
+ state and start the CPUs and IO devices running. The main
+ thread now finishes processing the migration package and
+ now carries on as it would for normal precopy migration
+ (although it can't do the cleanup it would do as it
+ finishes a normal migration).
+
+ End: The listen thread can now quit and perform the cleanup of migration
+ state; the migration is now complete.
+
+=== Source side page maps ===
+
+The source side keeps two bitmaps during postcopy: the 'migration bitmap'
+and the 'sent map'. The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit to indicate that a page is 'dirty' -
+i.e. needs sending. During the precopy phase this is updated as the CPU
+dirties pages; however, during postcopy the CPUs are stopped and nothing
+should dirty anything any more.
+
+The 'sent map' is used for the transition to postcopy. It is a bitmap that
+has a bit set whenever a page is sent to the destination; during
+the transition to postcopy mode it is combined with the migration bitmap
+to form a set of pages that:
+ a) Have been sent but then redirtied (which must be discarded)
+ b) Have not yet been sent - which also must be discarded to cause any
+ transparent huge pages built during precopy to be broken.
+
+Note that the contents of the sentmap are sacrificed during the calculation
+of the discard set and thus aren't valid once in postcopy. The migration
+bitmap is still valid and is used to ensure that no page is sent more than
+once. Any request for a page that has already been sent is ignored; such
+duplicate requests can happen when a page is sent at about the same time
+as the destination accesses it.
+
--
2.5.0
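As an illustration of the return-path API described in the documentation above, here is a minimal sketch (not part of this series) of how destination-side code might obtain the reverse channel; qemu_file_get_return_path is the helper named in the docs, everything else here is illustrative only:

    #include "migration/qemu-file.h"

    /* Sketch: obtain the reverse channel for an incoming migration stream */
    static QEMUFile *get_return_path_example(QEMUFile *from_src_file)
    {
        QEMUFile *to_src_file = qemu_file_get_return_path(from_src_file);

        if (!to_src_file) {
            /* The transport has no way back (e.g. a one-way pipe) */
            return NULL;
        }
        /* Destination writers must serialise on rp_mutex (see above) */
        return to_src_file;
    }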
* [Qemu-devel] [PATCH v8 02/54] Provide runtime Target page information
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
The migration code generally is built target-independent, however
there are a few places where knowing the target page size would
avoid artificially moving stuff into migration/ram.c.
Provide 'qemu_target_page_bits()' that returns TARGET_PAGE_BITS
to other bits of code so that they can stay target-independent.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
exec.c | 10 ++++++++++
include/sysemu/sysemu.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/exec.c b/exec.c
index 47ada31..1852613 100644
--- a/exec.c
+++ b/exec.c
@@ -3468,6 +3468,16 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
}
return 0;
}
+
+/*
+ * Allows code that needs to deal with migration bitmaps etc to still be built
+ * target independent.
+ */
+size_t qemu_target_page_bits(void)
+{
+ return TARGET_PAGE_BITS;
+}
+
#endif
/*
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index c439975..3e5c3d1 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -70,6 +70,7 @@ void qemu_system_killed(int signal, pid_t pid);
void qemu_devices_reset(void);
void qemu_system_reset(bool report);
void qemu_system_guest_panicked(void);
+size_t qemu_target_page_bits(void);
void qemu_add_exit_notifier(Notifier *notify);
void qemu_remove_exit_notifier(Notifier *notify);
--
2.5.0
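A hedged usage sketch (not from this patch): target-independent code, for example under migration/, could derive the target page size without including target-specific headers:

    #include "sysemu/sysemu.h"

    /* Sketch: compute the target page size from the exported bit count */
    static size_t example_target_page_size(void)
    {
        return (size_t)1 << qemu_target_page_bits();
    }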
* [Qemu-devel] [PATCH v8 03/54] Init page sizes in qtest
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
One of my patches used a loop that was based on the host page size;
it died in qtest since qtest hadn't initialised the page sizes.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
qtest.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/qtest.c b/qtest.c
index 05cefd2..8e10340 100644
--- a/qtest.c
+++ b/qtest.c
@@ -657,6 +657,7 @@ void qtest_init(const char *qtest_chrdev, const char *qtest_log, Error **errp)
inbuf = g_string_new("");
qtest_chr = chr;
+ page_size_init();
}
bool qtest_driver(void)
--
2.5.0
* [Qemu-devel] [PATCH v8 04/54] Move configuration section writing
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
The vmstate_configuration section is currently written
in 'qemu_savevm_state_begin'; move it to
'qemu_savevm_state_header' since it has a hard
requirement that it must be the 1st thing after
the header.
(In postcopy some 'command' sections get sent
early before the saving of the main sections
and hence before qemu_savevm_state_begin).
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/savevm.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index 33e55fe..d8847c4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -712,6 +712,12 @@ void qemu_savevm_state_header(QEMUFile *f)
trace_savevm_state_header();
qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+ if (!savevm_state.skip_configuration) {
+ qemu_put_byte(f, QEMU_VM_CONFIGURATION);
+ vmstate_save_state(f, &vmstate_configuration, &savevm_state, 0);
+ }
+
}
void qemu_savevm_state_begin(QEMUFile *f,
@@ -728,11 +734,6 @@ void qemu_savevm_state_begin(QEMUFile *f,
se->ops->set_params(params, se->opaque);
}
- if (!savevm_state.skip_configuration) {
- qemu_put_byte(f, QEMU_VM_CONFIGURATION);
- vmstate_save_state(f, &vmstate_configuration, &savevm_state, 0);
- }
-
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
if (!se->ops || !se->ops->save_live_setup) {
continue;
--
2.5.0
* [Qemu-devel] [PATCH v8 05/54] qemu_ram_block_from_host
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Postcopy sends RAMBlock names and offsets over the wire (since it can't
rely on the ram_addr layout being the same on both sides), and it starts
out with HVA fault addresses from the kernel.
qemu_ram_block_from_host translates a HVA into a RAMBlock, an offset
in the RAMBlock and the global ram_addr_t value.
Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.
Provide qemu_ram_get_idstr since it's the actual name text sent on the
wire.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
exec.c | 54 +++++++++++++++++++++++++++++++++++++++--------
include/exec/cpu-common.h | 3 +++
include/exec/ram_addr.h | 2 --
include/qemu/typedefs.h | 1 +
4 files changed, 49 insertions(+), 11 deletions(-)
diff --git a/exec.c b/exec.c
index 1852613..d7c50e3 100644
--- a/exec.c
+++ b/exec.c
@@ -1350,6 +1350,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
return NULL;
}
+const char *qemu_ram_get_idstr(RAMBlock *rb)
+{
+ return rb->idstr;
+}
+
/* Called with iothread lock held. */
void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
{
@@ -1845,8 +1850,16 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
}
}
-/* Some of the softmmu routines need to translate from a host pointer
- * (typically a TLB entry) back to a ram offset.
+/*
+ * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
+ * in that RAMBlock.
+ *
+ * ptr: Host pointer to look up
+ * round_offset: If true round the result offset down to a page boundary
+ * *ram_addr: set to result ram_addr
+ * *offset: set to result offset within the RAMBlock
+ *
+ * Returns: RAMBlock (or NULL if not found)
*
* By the time this function returns, the returned pointer is not protected
* by RCU anymore. If the caller is not within an RCU critical section and
@@ -1854,18 +1867,22 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
* pointer, such as a reference to the region that includes the incoming
* ram_addr_t.
*/
-MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+ ram_addr_t *ram_addr,
+ ram_addr_t *offset)
{
RAMBlock *block;
uint8_t *host = ptr;
- MemoryRegion *mr;
if (xen_enabled()) {
rcu_read_lock();
*ram_addr = xen_ram_addr_from_mapcache(ptr);
- mr = qemu_get_ram_block(*ram_addr)->mr;
+ block = qemu_get_ram_block(*ram_addr);
+ if (block) {
+ *offset = (host - block->host);
+ }
rcu_read_unlock();
- return mr;
+ return block;
}
rcu_read_lock();
@@ -1888,10 +1905,29 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
return NULL;
found:
- *ram_addr = block->offset + (host - block->host);
- mr = block->mr;
+ *offset = (host - block->host);
+ if (round_offset) {
+ *offset &= TARGET_PAGE_MASK;
+ }
+ *ram_addr = block->offset + *offset;
rcu_read_unlock();
- return mr;
+ return block;
+}
+
+/* Some of the softmmu routines need to translate from a host pointer
+ (typically a TLB entry) back to a ram offset. */
+MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+{
+ RAMBlock *block;
+ ram_addr_t offset; /* Not used */
+
+ block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset);
+
+ if (!block) {
+ return NULL;
+ }
+
+ return block->mr;
}
static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 9fb1d54..94d1f8a 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -64,8 +64,11 @@ typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr);
void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
/* This should not be used by devices. */
MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+ ram_addr_t *ram_addr, ram_addr_t *offset);
void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
void qemu_ram_unset_idstr(ram_addr_t addr);
+const char *qemu_ram_get_idstr(RAMBlock *rb);
void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
int len, int is_write);
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c400a75..76601de 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -22,8 +22,6 @@
#ifndef CONFIG_USER_ONLY
#include "hw/xen/xen.h"
-typedef struct RAMBlock RAMBlock;
-
struct RAMBlock {
struct rcu_head rcu;
struct MemoryRegion *mr;
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index ce82c64..a43e6e9 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -72,6 +72,7 @@ typedef struct QEMUSizedBuffer QEMUSizedBuffer;
typedef struct QEMUTimerListGroup QEMUTimerListGroup;
typedef struct QEMUTimer QEMUTimer;
typedef struct Range Range;
+typedef struct RAMBlock RAMBlock;
typedef struct SerialState SerialState;
typedef struct SHPCDevice SHPCDevice;
typedef struct SMBusDevice SMBusDevice;
--
2.5.0
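For illustration, a hedged sketch of the intended postcopy use of these helpers: translating a host virtual address (such as a userfault address) into the RAMBlock name and offset that go on the wire. request_page() is a made-up placeholder, not a function from this series:

    #include "exec/cpu-common.h"

    static void example_fault_to_request(void *fault_addr)
    {
        RAMBlock *rb;
        ram_addr_t ram_addr, offset;

        /* round_offset=true gives a page-aligned offset within the block */
        rb = qemu_ram_block_from_host(fault_addr, true, &ram_addr, &offset);
        if (!rb) {
            return; /* the address wasn't in guest RAM */
        }
        /* Placeholder: send (idstr, offset) back to the source */
        request_page(qemu_ram_get_idstr(rb), offset);
    }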
* [Qemu-devel] [PATCH v8 06/54] Rename mis->file to from_src_file
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
'file' becomes confusing when you have flows in each direction;
rename to make it clear.
This leaves just the main forward direction ms->file, which is used
in a lot of places and is probably not worth renaming given the churn.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
include/migration/migration.h | 2 +-
migration/migration.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8334621..83fba23 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -48,7 +48,7 @@ typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
/* State for the incoming migration */
struct MigrationIncomingState {
- QEMUFile *file;
+ QEMUFile *from_src_file;
/* See savevm.c */
LoadStateEntry_Head loadvm_handlers;
diff --git a/migration/migration.c b/migration/migration.c
index 662e77e..192e975 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -87,7 +87,7 @@ MigrationIncomingState *migration_incoming_get_current(void)
MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
{
mis_current = g_malloc0(sizeof(MigrationIncomingState));
- mis_current->file = f;
+ mis_current->from_src_file = f;
QLIST_INIT(&mis_current->loadvm_handlers);
return mis_current;
--
2.5.0
* [Qemu-devel] [PATCH v8 07/54] Add qemu_get_buffer_in_place to avoid copies some of the time
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
qemu_get_buffer always copies the data it reads to a user's buffer;
however, in many cases the file buffer inside the QEMUFile could be given
back to the caller, avoiding the copy. This isn't always possible,
depending on the size and alignment of the data.
Thus 'qemu_get_buffer_in_place' either copies the data to a supplied
buffer or updates a pointer to the internal buffer if convenient.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/qemu-file.h | 2 ++
migration/qemu-file.c | 47 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 49 insertions(+)
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index ea49f33..ca96461 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -166,6 +166,8 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
int level);
int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src);
+int qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, int size);
+
/*
* Note that you can only peek continuous bytes from where the current pointer
* is; you aren't guaranteed to be able to peak to +n bytes unless you've
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 6bb3dc1..05a41b3 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -434,6 +434,53 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
}
/*
+ * Read 'size' bytes of data from the file.
+ * 'size' can be larger than the internal buffer.
+ *
+ * The data:
+ * may be held on an internal buffer (in which case *buf is updated
+ * to point to it) that is valid until the next qemu_file operation.
+ * OR
+ * will be copied to the *buf that was passed in.
+ *
+ * The code tries to avoid the copy if possible.
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fds, which
+ * all current QEMUFiles are).
+ *
+ * Note: Since **buf may get changed, the caller should take care to
+ * keep a pointer to the original buffer if it needs to deallocate it.
+ */
+int qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, int size)
+{
+ int pending = size;
+ int done = 0;
+ bool first = true;
+
+ while (pending > 0) {
+ int res;
+ uint8_t *src;
+
+ res = qemu_peek_buffer(f, &src, MIN(pending, IO_BUF_SIZE), 0);
+ if (res == 0) {
+ return done;
+ }
+ qemu_file_skip(f, res);
+ done += res;
+ pending -= res;
+ if (first && res == size) {
+ *buf = src;
+ break;
+ }
+ first = false;
+ /* Copy this chunk into the caller's buffer at its current offset */
+ memcpy(*buf + done - res, src, res);
+ }
+ return done;
+}
+
+/*
* Peeks a single byte from the buffer; this isn't guaranteed to work if
* offset leaves a gap after the previous read/peeked data.
*/
--
2.5.0
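A hedged usage sketch showing the caveat from the comment above: keep the originally allocated pointer for freeing, since *buf may be redirected into the QEMUFile's internal buffer. consume() is a placeholder, not a real function:

    static void example_read_in_place(QEMUFile *f, int len)
    {
        uint8_t *alloc = g_malloc(len);
        uint8_t *data = alloc;   /* may end up pointing into f's buffer */

        if (qemu_get_buffer_in_place(f, &data, len) == len) {
            consume(data, len);  /* placeholder for real processing */
        }
        g_free(alloc);           /* free the original allocation, not 'data' */
    }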
* [Qemu-devel] [PATCH v8 08/54] Add wrapper for setting blocking status on a QEMUFile
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Add a wrapper to change the blocking status on a QEMUFile
rather than having to use qemu_set_block(qemu_get_fd(f));
it seems best to avoid exposing the fd since not all QEMUFiles
really have one. With this wrapper we could move the implementation
down to be different on different transports.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
include/migration/qemu-file.h | 1 +
migration/qemu-file.c | 15 +++++++++++++++
2 files changed, 16 insertions(+)
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index ca96461..865f897 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -197,6 +197,7 @@ int qemu_file_get_error(QEMUFile *f);
void qemu_file_set_error(QEMUFile *f, int ret);
int qemu_file_shutdown(QEMUFile *f);
void qemu_fflush(QEMUFile *f);
+void qemu_file_set_blocking(QEMUFile *f, bool block);
static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
{
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 05a41b3..3c64a9c 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -658,3 +658,18 @@ size_t qemu_get_counted_string(QEMUFile *f, char buf[256])
return res == len ? res : 0;
}
+
+/*
+ * Set the blocking state of the QEMUFile.
+ * Note: On some transports the OS only keeps a single blocking state for
+ * both directions, and thus changing the blocking on the main
+ * QEMUFile can also affect the return path.
+ */
+void qemu_file_set_blocking(QEMUFile *f, bool block)
+{
+ if (block) {
+ qemu_set_block(qemu_get_fd(f));
+ } else {
+ qemu_set_nonblock(qemu_get_fd(f));
+ }
+}
--
2.5.0
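A hedged example of the kind of use later patches are expected to make of this wrapper (the exact call sites may differ): temporarily switching the incoming stream to non-blocking without touching the fd directly:

    static void example_toggle_blocking(MigrationIncomingState *mis)
    {
        /* Drop to non-blocking while polling, then restore blocking reads */
        qemu_file_set_blocking(mis->from_src_file, false);
        /* ... poll / peek at the stream ... */
        qemu_file_set_blocking(mis->from_src_file, true);
    }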
* [Qemu-devel] [PATCH v8 09/54] Add QEMU_MADV_NOHUGEPAGE
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Add QEMU_MADV_NOHUGEPAGE as an OS-independent version of
MADV_NOHUGEPAGE.
We include sys/mman.h before making the test to ensure
that we pick up the system defines.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
include/qemu/osdep.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index ab3c876..7d471f6 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -135,6 +135,8 @@ void qemu_anon_ram_free(void *ptr, size_t size);
#if defined(CONFIG_MADVISE)
+#include <sys/mman.h>
+
#define QEMU_MADV_WILLNEED MADV_WILLNEED
#define QEMU_MADV_DONTNEED MADV_DONTNEED
#ifdef MADV_DONTFORK
@@ -167,6 +169,11 @@ void qemu_anon_ram_free(void *ptr, size_t size);
#else
#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
#endif
+#ifdef MADV_NOHUGEPAGE
+#define QEMU_MADV_NOHUGEPAGE MADV_NOHUGEPAGE
+#else
+#define QEMU_MADV_NOHUGEPAGE QEMU_MADV_INVALID
+#endif
#elif defined(CONFIG_POSIX_MADVISE)
@@ -178,6 +185,7 @@ void qemu_anon_ram_free(void *ptr, size_t size);
#define QEMU_MADV_DODUMP QEMU_MADV_INVALID
#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
+#define QEMU_MADV_NOHUGEPAGE QEMU_MADV_INVALID
#else /* no-op */
@@ -189,6 +197,7 @@ void qemu_anon_ram_free(void *ptr, size_t size);
#define QEMU_MADV_DODUMP QEMU_MADV_INVALID
#define QEMU_MADV_DONTDUMP QEMU_MADV_INVALID
#define QEMU_MADV_HUGEPAGE QEMU_MADV_INVALID
+#define QEMU_MADV_NOHUGEPAGE QEMU_MADV_INVALID
#endif
--
2.5.0
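As a hedged sketch of how a later patch in the series is expected to use this define (the real call site differs), disabling transparent huge pages on a RAMBlock's host mapping before the postcopy discards:

    /* Sketch: qemu_madvise() returns a negative value on failure */
    static void example_nohugepage(void *host, size_t length)
    {
        if (qemu_madvise(host, length, QEMU_MADV_NOHUGEPAGE) < 0) {
            error_report("%s: NOHUGEPAGE madvise failed: %s",
                         __func__, strerror(errno));
        }
    }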
* [Qemu-devel] [PATCH v8 10/54] migration/ram.c: Use RAMBlock rather than MemoryRegion
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
RAM migration mainly works on RAMBlocks but in a few places
uses data from MemoryRegions to access the same information that's
already held in RAMBlocks; clean it up just to avoid the
MemoryRegion use.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/ram.c | 26 +++++++++++---------------
1 file changed, 11 insertions(+), 15 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 7f007e6..7df9157 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -497,13 +497,13 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
/* Called with rcu_read_lock() to protect migration_bitmap */
static inline
-ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
+ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb,
ram_addr_t start)
{
- unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
+ unsigned long base = rb->offset >> TARGET_PAGE_BITS;
unsigned long nr = base + (start >> TARGET_PAGE_BITS);
- uint64_t mr_size = TARGET_PAGE_ALIGN(memory_region_size(mr));
- unsigned long size = base + (mr_size >> TARGET_PAGE_BITS);
+ uint64_t rb_size = rb->used_length;
+ unsigned long size = base + (rb_size >> TARGET_PAGE_BITS);
unsigned long *bitmap;
unsigned long next;
@@ -573,7 +573,7 @@ static void migration_bitmap_sync(void)
qemu_mutex_lock(&migration_bitmap_mutex);
rcu_read_lock();
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
- migration_bitmap_sync_range(block->mr->ram_addr, block->used_length);
+ migration_bitmap_sync_range(block->offset, block->used_length);
}
rcu_read_unlock();
qemu_mutex_unlock(&migration_bitmap_mutex);
@@ -668,12 +668,11 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
int pages = -1;
uint64_t bytes_xmit;
ram_addr_t current_addr;
- MemoryRegion *mr = block->mr;
uint8_t *p;
int ret;
bool send_async = true;
- p = memory_region_get_ram_ptr(mr) + offset;
+ p = block->host + offset;
/* In doubt sent page as normal */
bytes_xmit = 0;
@@ -744,7 +743,7 @@ static int do_compress_ram_page(CompressParam *param)
RAMBlock *block = param->block;
ram_addr_t offset = param->offset;
- p = memory_region_get_ram_ptr(block->mr) + (offset & TARGET_PAGE_MASK);
+ p = block->host + (offset & TARGET_PAGE_MASK);
bytes_sent = save_page_header(param->file, block, offset |
RAM_SAVE_FLAG_COMPRESS_PAGE);
@@ -852,11 +851,10 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
{
int pages = -1;
uint64_t bytes_xmit;
- MemoryRegion *mr = block->mr;
uint8_t *p;
int ret;
- p = memory_region_get_ram_ptr(mr) + offset;
+ p = block->host + offset;
bytes_xmit = 0;
ret = ram_control_save_page(f, block->offset,
@@ -929,14 +927,12 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
ram_addr_t offset = last_offset;
bool complete_round = false;
int pages = 0;
- MemoryRegion *mr;
if (!block)
block = QLIST_FIRST_RCU(&ram_list.blocks);
while (true) {
- mr = block->mr;
- offset = migration_bitmap_find_and_reset_dirty(mr, offset);
+ offset = migration_bitmap_find_and_reset_dirty(block, offset);
if (complete_round && block == last_seen_block &&
offset >= last_offset) {
break;
@@ -1344,7 +1340,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
return NULL;
}
- return memory_region_get_ram_ptr(block->mr) + offset;
+ return block->host + offset;
}
len = qemu_get_byte(f);
@@ -1354,7 +1350,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
if (!strncmp(id, block->idstr, sizeof(id)) &&
block->max_length > offset) {
- return memory_region_get_ram_ptr(block->mr) + offset;
+ return block->host + offset;
}
}
--
2.5.0
* [Qemu-devel] [PATCH v8 11/54] ram_debug_dump_bitmap: Dump a migration bitmap as text
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Useful for debugging the migration bitmap and other bitmaps
of the same format (including the sentmap in postcopy).
The bitmap is printed to stderr.
Lines that are all the expected value are excluded so the output
can be quite compact for many bitmaps.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
include/migration/migration.h | 1 +
migration/ram.c | 39 +++++++++++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 83fba23..51bc348 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -145,6 +145,7 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
double xbzrle_mig_cache_miss_rate(void);
void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
/**
* @migrate_add_blocker - prevent migration from proceeding
diff --git a/migration/ram.c b/migration/ram.c
index 7df9157..1c9d1da 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1086,6 +1086,45 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t new)
}
}
+/*
+ * 'expected' is the value you expect the bitmap mostly to be full
+ * of; it won't bother printing lines that are all this value.
+ * If 'todump' is null the migration bitmap is dumped.
+ */
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
+{
+ int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+ int64_t cur;
+ int64_t linelen = 128;
+ char linebuf[129];
+
+ if (!todump) {
+ todump = migration_bitmap;
+ }
+
+ for (cur = 0; cur < ram_pages; cur += linelen) {
+ int64_t curb;
+ bool found = false;
+ /*
+ * Last line; catch the case where the line length
+ * is longer than remaining ram
+ */
+ if (cur + linelen > ram_pages) {
+ linelen = ram_pages - cur;
+ }
+ for (curb = 0; curb < linelen; curb++) {
+ bool thisbit = test_bit(cur + curb, todump);
+ linebuf[curb] = thisbit ? '1' : '.';
+ found = found || (thisbit != expected);
+ }
+ if (found) {
+ linebuf[curb] = '\0';
+ fprintf(stderr, "0x%08" PRIx64 " : %s\n", cur, linebuf);
+ }
+ }
+}
+
/* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
* long-running RCU critical section. When rcu-reclaims in the code
* start to become numerous it will be necessary to reduce the
--
2.5.0
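A hedged usage example: while debugging the postcopy discard calculation, one might dump a mostly-set bitmap such as the sentmap (introduced later in the series) and then the migration bitmap itself by passing NULL:

    /* Sketch: 'sentmap' here stands for the sent-pages bitmap added later */
    ram_debug_dump_bitmap(sentmap, true);   /* expect mostly 1s */
    ram_debug_dump_bitmap(NULL, false);     /* dump the migration bitmap */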
* [Qemu-devel] [PATCH v8 12/54] migrate_init: Call from savevm
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Suspend to file is very much like a migration, and it makes life
easier if we have the Migration state available, so initialise it
in the savevm.c code for suspending.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 4 +---
include/qemu/typedefs.h | 1 +
migration/migration.c | 2 +-
migration/savevm.c | 2 ++
4 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 51bc348..82cc3a6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -42,10 +42,7 @@ struct MigrationParams {
bool shared;
};
-typedef struct MigrationState MigrationState;
-
typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
-
/* State for the incoming migration */
struct MigrationIncomingState {
QEMUFile *from_src_file;
@@ -116,6 +113,7 @@ int migrate_fd_close(MigrationState *s);
void add_migration_state_change_notifier(Notifier *notify);
void remove_migration_state_change_notifier(Notifier *notify);
+MigrationState *migrate_init(const MigrationParams *params);
bool migration_in_setup(MigrationState *);
bool migration_has_finished(MigrationState *);
bool migration_has_failed(MigrationState *);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index a43e6e9..0bf7967 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -41,6 +41,7 @@ typedef struct MemoryRegion MemoryRegion;
typedef struct MemoryRegionSection MemoryRegionSection;
typedef struct MigrationIncomingState MigrationIncomingState;
typedef struct MigrationParams MigrationParams;
+typedef struct MigrationState MigrationState;
typedef struct Monitor Monitor;
typedef struct MouseTransformInfo MouseTransformInfo;
typedef struct MSIMessage MSIMessage;
diff --git a/migration/migration.c b/migration/migration.c
index 192e975..9db77ae 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -632,7 +632,7 @@ bool migration_has_failed(MigrationState *s)
s->state == MIGRATION_STATUS_FAILED);
}
-static MigrationState *migrate_init(const MigrationParams *params)
+MigrationState *migrate_init(const MigrationParams *params)
{
MigrationState *s = migrate_get_current();
int64_t bandwidth_limit = s->bandwidth_limit;
diff --git a/migration/savevm.c b/migration/savevm.c
index d8847c4..8254630 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -921,6 +921,8 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
.blk = 0,
.shared = 0
};
+ MigrationState *ms = migrate_init(&params);
+ ms->file = f;
if (qemu_savevm_state_blocked(errp)) {
return -EINVAL;
--
2.5.0
* [Qemu-devel] [PATCH v8 13/54] Move dirty page search state into separate structure
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Pull the search state for one iteration of the dirty page
search into a structure.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
migration/ram.c | 55 +++++++++++++++++++++++++++++++++++--------------------
1 file changed, 35 insertions(+), 20 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 1c9d1da..04e895c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -227,6 +227,17 @@ static uint64_t migration_dirty_pages;
static uint32_t last_version;
static bool ram_bulk_stage;
+/* used by the search for pages to send */
+struct PageSearchStatus {
+ /* Current block being searched */
+ RAMBlock *block;
+ /* Current offset to search from */
+ ram_addr_t offset;
+ /* Set once we wrap around */
+ bool complete_round;
+};
+typedef struct PageSearchStatus PageSearchStatus;
+
struct CompressParam {
bool start;
bool done;
@@ -531,7 +542,6 @@ static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
cpu_physical_memory_sync_dirty_bitmap(bitmap, start, length);
}
-
/* Fix me: there are too many global variables used in migration process. */
static int64_t start_time;
static int64_t bytes_xfer_prev;
@@ -923,26 +933,30 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
uint64_t *bytes_transferred)
{
- RAMBlock *block = last_seen_block;
- ram_addr_t offset = last_offset;
- bool complete_round = false;
+ PageSearchStatus pss;
int pages = 0;
- if (!block)
- block = QLIST_FIRST_RCU(&ram_list.blocks);
+ pss.block = last_seen_block;
+ pss.offset = last_offset;
+ pss.complete_round = false;
+
+ if (!pss.block) {
+ pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
+ }
while (true) {
- offset = migration_bitmap_find_and_reset_dirty(block, offset);
- if (complete_round && block == last_seen_block &&
- offset >= last_offset) {
+ pss.offset = migration_bitmap_find_and_reset_dirty(pss.block,
+ pss.offset);
+ if (pss.complete_round && pss.block == last_seen_block &&
+ pss.offset >= last_offset) {
break;
}
- if (offset >= block->used_length) {
- offset = 0;
- block = QLIST_NEXT_RCU(block, next);
- if (!block) {
- block = QLIST_FIRST_RCU(&ram_list.blocks);
- complete_round = true;
+ if (pss.offset >= pss.block->used_length) {
+ pss.offset = 0;
+ pss.block = QLIST_NEXT_RCU(pss.block, next);
+ if (!pss.block) {
+ pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
+ pss.complete_round = true;
ram_bulk_stage = false;
if (migrate_use_xbzrle()) {
/* If xbzrle is on, stop using the data compression at this
@@ -954,23 +968,24 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
}
} else {
if (compression_switch && migrate_use_compression()) {
- pages = ram_save_compressed_page(f, block, offset, last_stage,
+ pages = ram_save_compressed_page(f, pss.block, pss.offset,
+ last_stage,
bytes_transferred);
} else {
- pages = ram_save_page(f, block, offset, last_stage,
+ pages = ram_save_page(f, pss.block, pss.offset, last_stage,
bytes_transferred);
}
/* if page is unmodified, continue to the next */
if (pages > 0) {
- last_sent_block = block;
+ last_sent_block = pss.block;
break;
}
}
}
- last_seen_block = block;
- last_offset = offset;
+ last_seen_block = pss.block;
+ last_offset = pss.offset;
return pages;
}
--
2.5.0
* [Qemu-devel] [PATCH v8 14/54] ram_find_and_save_block: Split out the finding
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Split out the finding of the dirty page and all the wrap detection
into a separate function since it was getting a bit hairy.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/ram.c | 84 ++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 59 insertions(+), 25 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 04e895c..8d0a388 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -917,6 +917,59 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
return pages;
}
+/*
+ * Find the next dirty page and update any state associated with
+ * the search process.
+ *
+ * Returns: True if a page is found
+ *
+ * @f: Current migration stream.
+ * @pss: Data about the state of the current dirty page scan.
+ * @*again: Set to false if the search has scanned the whole of RAM
+ */
+static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
+ bool *again)
+{
+ pss->offset = migration_bitmap_find_and_reset_dirty(pss->block,
+ pss->offset);
+ if (pss->complete_round && pss->block == last_seen_block &&
+ pss->offset >= last_offset) {
+ /*
+ * We've been once around the RAM and haven't found anything
+ * give up.
+ */
+ *again = false;
+ return false;
+ }
+ if (pss->offset >= pss->block->used_length) {
+ /* Didn't find anything in this RAM Block */
+ pss->offset = 0;
+ pss->block = QLIST_NEXT_RCU(pss->block, next);
+ if (!pss->block) {
+ /* Hit the end of the list */
+ pss->block = QLIST_FIRST_RCU(&ram_list.blocks);
+ /* Flag that we've looped */
+ pss->complete_round = true;
+ ram_bulk_stage = false;
+ if (migrate_use_xbzrle()) {
+ /* If xbzrle is on, stop using the data compression at this
+ * point. In theory, xbzrle can do better than compression.
+ */
+ flush_compressed_data(f);
+ compression_switch = false;
+ }
+ }
+ /* Didn't find anything this time, but try again on the new block */
+ *again = true;
+ return false;
+ } else {
+ /* Can go around again, but... */
+ *again = true;
+ /* We've found something so probably don't need to */
+ return true;
+ }
+}
+
/**
* ram_find_and_save_block: Finds a dirty page and sends it to f
*
@@ -935,6 +988,7 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
{
PageSearchStatus pss;
int pages = 0;
+ bool again, found;
pss.block = last_seen_block;
pss.offset = last_offset;
@@ -944,29 +998,10 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
}
- while (true) {
- pss.offset = migration_bitmap_find_and_reset_dirty(pss.block,
- pss.offset);
- if (pss.complete_round && pss.block == last_seen_block &&
- pss.offset >= last_offset) {
- break;
- }
- if (pss.offset >= pss.block->used_length) {
- pss.offset = 0;
- pss.block = QLIST_NEXT_RCU(pss.block, next);
- if (!pss.block) {
- pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
- pss.complete_round = true;
- ram_bulk_stage = false;
- if (migrate_use_xbzrle()) {
- /* If xbzrle is on, stop using the data compression at this
- * point. In theory, xbzrle can do better than compression.
- */
- flush_compressed_data(f);
- compression_switch = false;
- }
- }
- } else {
+ do {
+ found = find_dirty_block(f, &pss, &again);
+
+ if (found) {
if (compression_switch && migrate_use_compression()) {
pages = ram_save_compressed_page(f, pss.block, pss.offset,
last_stage,
@@ -979,10 +1014,9 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
/* if page is unmodified, continue to the next */
if (pages > 0) {
last_sent_block = pss.block;
- break;
}
}
- }
+ } while (!pages && again);
last_seen_block = pss.block;
last_offset = pss.offset;
--
2.5.0
* [Qemu-devel] [PATCH v8 15/54] Rename save_live_complete to save_live_complete_precopy
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (13 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 14/54] ram_find_and_save_block: Split out the finding Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 16/54] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
` (38 subsequent siblings)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
In postcopy we're going to need to perform the complete phase
for postcopiable devices at a different point, so start out by
renaming all of the 'complete's to make the difference obvious.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
hw/ppc/spapr.c | 2 +-
include/migration/vmstate.h | 2 +-
include/sysemu/sysemu.h | 2 +-
migration/block.c | 2 +-
migration/migration.c | 2 +-
migration/ram.c | 2 +-
migration/savevm.c | 10 +++++-----
trace-events | 2 +-
8 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7f4f196..cdf9534 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1528,7 +1528,7 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
static SaveVMHandlers savevm_htab_handlers = {
.save_live_setup = htab_save_setup,
.save_live_iterate = htab_save_iterate,
- .save_live_complete = htab_save_complete,
+ .save_live_complete_precopy = htab_save_complete,
.load_state = htab_load,
};
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 2e5a97d..d0f4451 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -40,7 +40,7 @@ typedef struct SaveVMHandlers {
SaveStateHandler *save_state;
void (*cancel)(void *opaque);
- int (*save_live_complete)(QEMUFile *f, void *opaque);
+ int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
/* This runs both outside and inside the iothread lock. */
bool (*is_active)(void *opaque);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 3e5c3d1..7fd6c73 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -89,7 +89,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
const MigrationParams *params);
void qemu_savevm_state_header(QEMUFile *f);
int qemu_savevm_state_iterate(QEMUFile *f);
-void qemu_savevm_state_complete(QEMUFile *f);
+void qemu_savevm_state_complete_precopy(QEMUFile *f);
void qemu_savevm_state_cancel(void);
uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
int qemu_loadvm_state(QEMUFile *f);
diff --git a/migration/block.c b/migration/block.c
index ed865ed..ceae0ab 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -877,7 +877,7 @@ static SaveVMHandlers savevm_block_handlers = {
.set_params = block_set_params,
.save_live_setup = block_save_setup,
.save_live_iterate = block_save_iterate,
- .save_live_complete = block_save_complete,
+ .save_live_complete_precopy = block_save_complete,
.save_live_pending = block_save_pending,
.load_state = block_load,
.cancel = block_migration_cancel,
diff --git a/migration/migration.c b/migration/migration.c
index 9db77ae..ba23a65 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -955,7 +955,7 @@ static void *migration_thread(void *opaque)
ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
if (ret >= 0) {
qemu_file_set_rate_limit(s->file, INT64_MAX);
- qemu_savevm_state_complete(s->file);
+ qemu_savevm_state_complete_precopy(s->file);
}
}
qemu_mutex_unlock_iothread();
diff --git a/migration/ram.c b/migration/ram.c
index 8d0a388..1ae8223 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1692,7 +1692,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
static SaveVMHandlers savevm_ram_handlers = {
.save_live_setup = ram_save_setup,
.save_live_iterate = ram_save_iterate,
- .save_live_complete = ram_save_complete,
+ .save_live_complete_precopy = ram_save_complete,
.save_live_pending = ram_save_pending,
.load_state = ram_load,
.cancel = ram_migration_cancel,
diff --git a/migration/savevm.c b/migration/savevm.c
index 8254630..e621afd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -806,19 +806,19 @@ static bool should_send_vmdesc(void)
return !machine->suppress_vmdesc;
}
-void qemu_savevm_state_complete(QEMUFile *f)
+void qemu_savevm_state_complete_precopy(QEMUFile *f)
{
QJSON *vmdesc;
int vmdesc_len;
SaveStateEntry *se;
int ret;
- trace_savevm_state_complete();
+ trace_savevm_state_complete_precopy();
cpu_synchronize_all_states();
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
- if (!se->ops || !se->ops->save_live_complete) {
+ if (!se->ops || !se->ops->save_live_complete_precopy) {
continue;
}
if (se->ops && se->ops->is_active) {
@@ -830,7 +830,7 @@ void qemu_savevm_state_complete(QEMUFile *f)
save_section_header(f, se, QEMU_VM_SECTION_END);
- ret = se->ops->save_live_complete(f, se->opaque);
+ ret = se->ops->save_live_complete_precopy(f, se->opaque);
trace_savevm_section_end(se->idstr, se->section_id, ret);
save_section_footer(f, se);
if (ret < 0) {
@@ -941,7 +941,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
ret = qemu_file_get_error(f);
if (ret == 0) {
- qemu_savevm_state_complete(f);
+ qemu_savevm_state_complete_precopy(f);
ret = qemu_file_get_error(f);
}
if (ret != 0) {
diff --git a/trace-events b/trace-events
index 25c53e0..cec51f6 100644
--- a/trace-events
+++ b/trace-events
@@ -1203,7 +1203,7 @@ savevm_section_skip(const char *id, unsigned int section_id) "%s, section_id %u"
savevm_state_begin(void) ""
savevm_state_header(void) ""
savevm_state_iterate(void) ""
-savevm_state_complete(void) ""
+savevm_state_complete_precopy(void) ""
savevm_state_cancel(void) ""
vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
--
2.5.0
* [Qemu-devel] [PATCH v8 16/54] Return path: Open a return path on QEMUFile for sockets
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (14 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 15/54] Rename save_live_complete to save_live_complete_precopy Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-02 15:29 ` Daniel P. Berrange
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 17/54] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
` (37 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Postcopy needs a method to send messages from the destination back to
the source; this is the 'return path'.
Wire it up for 'socket' QEMUFiles.
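A short usage sketch (illustration only, not part of the patch): how a
caller on the destination side would obtain the reverse-direction
QEMUFile. The helper name example_open_reverse_channel is invented; the
API is the one added by this patch.

    #include "migration/qemu-file.h"
    #include "qemu/error-report.h"

    /* Sketch: get a QEMUFile for the opposite direction on the same socket.
     * Returns NULL if the transport has no return path or is in error. */
    static QEMUFile *example_open_reverse_channel(QEMUFile *forward)
    {
        QEMUFile *reverse = qemu_file_get_return_path(forward);

        if (!reverse) {
            error_report("transport does not support a return path");
        }
        return reverse;
    }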
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/qemu-file.h | 7 +++++
migration/qemu-file-unix.c | 69 +++++++++++++++++++++++++++++++++++++------
migration/qemu-file.c | 12 ++++++++
3 files changed, 79 insertions(+), 9 deletions(-)
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 865f897..4c89a2c 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -89,6 +89,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
uint64_t *bytes_sent);
/*
+ * Return a QEMUFile for comms in the opposite direction
+ */
+typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
+
+/*
* Stop any read or write (depending on flags) on the underlying
* transport on the QEMUFile.
* Existing blocking reads/writes must be woken
@@ -106,6 +111,7 @@ typedef struct QEMUFileOps {
QEMURamHookFunc *after_ram_iterate;
QEMURamHookFunc *hook_ram_load;
QEMURamSaveFunc *save_page;
+ QEMURetPathFunc *get_return_path;
QEMUFileShutdownFunc *shut_down;
} QEMUFileOps;
@@ -196,6 +202,7 @@ int64_t qemu_file_get_rate_limit(QEMUFile *f);
int qemu_file_get_error(QEMUFile *f);
void qemu_file_set_error(QEMUFile *f, int ret);
int qemu_file_shutdown(QEMUFile *f);
+QEMUFile *qemu_file_get_return_path(QEMUFile *f);
void qemu_fflush(QEMUFile *f);
void qemu_file_set_blocking(QEMUFile *f, bool block);
diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
index bfbc086..dd463ff 100644
--- a/migration/qemu-file-unix.c
+++ b/migration/qemu-file-unix.c
@@ -96,6 +96,56 @@ static int socket_shutdown(void *opaque, bool rd, bool wr)
}
}
+static int socket_return_close(void *opaque)
+{
+ QEMUFileSocket *s = opaque;
+ /*
+ * Note: We don't close the socket, that should be done by the forward
+ * path.
+ */
+ g_free(s);
+ return 0;
+}
+
+static const QEMUFileOps socket_return_read_ops = {
+ .get_fd = socket_get_fd,
+ .get_buffer = socket_get_buffer,
+ .close = socket_return_close,
+ .shut_down = socket_shutdown,
+};
+
+static const QEMUFileOps socket_return_write_ops = {
+ .get_fd = socket_get_fd,
+ .writev_buffer = socket_writev_buffer,
+ .close = socket_return_close,
+ .shut_down = socket_shutdown,
+};
+
+/*
+ * Give a QEMUFile* off the same socket but data in the opposite
+ * direction.
+ */
+static QEMUFile *socket_get_return_path(void *opaque)
+{
+ QEMUFileSocket *forward = opaque;
+ QEMUFileSocket *reverse;
+
+ if (qemu_file_get_error(forward->file)) {
+ /* If the forward file is in error, don't try and open a return */
+ return NULL;
+ }
+
+ reverse = g_malloc0(sizeof(QEMUFileSocket));
+ reverse->fd = forward->fd;
+ /* I don't think there's a better way to tell which direction 'this' is */
+ if (forward->file->ops->get_buffer != NULL) {
+ /* being called from the read side, so we need to be able to write */
+ return qemu_fopen_ops(reverse, &socket_return_write_ops);
+ } else {
+ return qemu_fopen_ops(reverse, &socket_return_read_ops);
+ }
+}
+
static ssize_t unix_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
int64_t pos)
{
@@ -204,18 +254,19 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
}
static const QEMUFileOps socket_read_ops = {
- .get_fd = socket_get_fd,
- .get_buffer = socket_get_buffer,
- .close = socket_close,
- .shut_down = socket_shutdown
-
+ .get_fd = socket_get_fd,
+ .get_buffer = socket_get_buffer,
+ .close = socket_close,
+ .shut_down = socket_shutdown,
+ .get_return_path = socket_get_return_path
};
static const QEMUFileOps socket_write_ops = {
- .get_fd = socket_get_fd,
- .writev_buffer = socket_writev_buffer,
- .close = socket_close,
- .shut_down = socket_shutdown
+ .get_fd = socket_get_fd,
+ .writev_buffer = socket_writev_buffer,
+ .close = socket_close,
+ .shut_down = socket_shutdown,
+ .get_return_path = socket_get_return_path
};
QEMUFile *qemu_fopen_socket(int fd, const char *mode)
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 3c64a9c..e188b69 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -44,6 +44,18 @@ int qemu_file_shutdown(QEMUFile *f)
return f->ops->shut_down(f->opaque, true, true);
}
+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ * NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+ if (!f->ops->get_return_path) {
+ return NULL;
+ }
+ return f->ops->get_return_path(f->opaque);
+}
+
bool qemu_file_mode_is_not_valid(const char *mode)
{
if (mode == NULL ||
--
2.5.0
* [Qemu-devel] [PATCH v8 17/54] Return path: socket_writev_buffer: Block even on non-blocking fd's
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (15 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 16/54] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 18/54] Migration commands Dr. David Alan Gilbert (git)
` (36 subsequent siblings)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
The destination sets the fd to non-blocking on incoming migrations;
this also affects the return path from the destination, and thus we
need to make sure we can safely write to the return path.
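The core idea, shown here as a generic sketch outside QEMU (plain
write(2) instead of iov_send, names invented for illustration): retry
short writes and poll for writability whenever the non-blocking fd
returns EAGAIN.

    #include <errno.h>
    #include <unistd.h>
    #include <glib.h>

    /* Illustrative only: write all of buf to a non-blocking fd,
     * sleeping in g_poll() until the socket is writable again. */
    static ssize_t write_all_blocking(int fd, const char *buf, size_t size)
    {
        size_t done = 0;

        while (done < size) {
            ssize_t len = write(fd, buf + done, size - done);

            if (len > 0) {
                done += len;
                continue;
            }
            if (errno != EAGAIN && errno != EWOULDBLOCK) {
                return -errno;        /* real error: report it immediately */
            }
            /* Emulate blocking: wait until the fd is writable again */
            GPollFD pfd = { .fd = fd, .events = G_IO_OUT | G_IO_ERR };
            g_poll(&pfd, 1, -1 /* no timeout */);
        }
        return done;
    }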
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
migration/qemu-file-unix.c | 42 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 37 insertions(+), 5 deletions(-)
diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
index dd463ff..4bf050f 100644
--- a/migration/qemu-file-unix.c
+++ b/migration/qemu-file-unix.c
@@ -22,6 +22,7 @@
* THE SOFTWARE.
*/
#include "qemu-common.h"
+#include "qemu/error-report.h"
#include "qemu/iov.h"
#include "qemu/sockets.h"
#include "block/coroutine.h"
@@ -39,12 +40,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
QEMUFileSocket *s = opaque;
ssize_t len;
ssize_t size = iov_size(iov, iovcnt);
+ ssize_t offset = 0;
+ int err;
- len = iov_send(s->fd, iov, iovcnt, 0, size);
- if (len < size) {
- len = -socket_error();
- }
- return len;
+ while (size > 0) {
+ len = iov_send(s->fd, iov, iovcnt, offset, size);
+
+ if (len > 0) {
+ size -= len;
+ offset += len;
+ }
+
+ if (size > 0) {
+ err = socket_error();
+
+ if (err != EAGAIN && err != EWOULDBLOCK) {
+ error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
+ err, size, len);
+ /*
+ * If I've already sent some but only just got the error, I
+ * could return the amount validly sent so far and wait for the
+ * next call to report the error, but I'd rather flag the error
+ * immediately.
+ */
+ return -err;
+ }
+
+ /* Emulate blocking */
+ GPollFD pfd;
+
+ pfd.fd = s->fd;
+ pfd.events = G_IO_OUT | G_IO_ERR;
+ pfd.revents = 0;
+ g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
+ }
+ }
+
+ return offset;
}
static int socket_get_fd(void *opaque)
--
2.5.0
* [Qemu-devel] [PATCH v8 18/54] Migration commands
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (16 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 17/54] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-20 11:22 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 19/54] Return path: Control commands Dr. David Alan Gilbert (git)
` (35 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Create a QEMU_VM_COMMAND section type for sending commands from
the source to the destination. These commands are not intended to
convey guest state, but to control the migration process.
For use in postcopy.
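For reference, the framing added here is deliberately simple; the sketch
below is illustrative only (it is not part of the patch), documenting the
byte layout and showing how a data-less command from a later patch in the
series would be emitted with the new helper.

    /*
     * Wire layout of a QEMU_VM_COMMAND element (sketch):
     *
     *   1 byte        QEMU_VM_COMMAND (0x08)
     *   2 bytes       command, big-endian (enum qemu_vm_cmd)
     *   2 bytes       len, big-endian
     *   'len' bytes   command-specific data
     */

    /* Illustration: emitting a command that carries no data.
     * MIG_CMD_OPEN_RETURN_PATH is only introduced by a later patch. */
    static void example_send_dataless_cmd(QEMUFile *f)
    {
        qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
    }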
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 1 +
include/sysemu/sysemu.h | 8 +++++
migration/savevm.c | 70 +++++++++++++++++++++++++++++++++++++++++++
trace-events | 2 ++
4 files changed, 81 insertions(+)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 82cc3a6..0bb4383 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -35,6 +35,7 @@
#define QEMU_VM_SUBSECTION 0x05
#define QEMU_VM_VMDESCRIPTION 0x06
#define QEMU_VM_CONFIGURATION 0x07
+#define QEMU_VM_COMMAND 0x08
#define QEMU_VM_SECTION_FOOTER 0x7e
struct MigrationParams {
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 7fd6c73..07e9502 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -84,6 +84,12 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
void qemu_announce_self(void);
+/* Subcommands for QEMU_VM_COMMAND */
+enum qemu_vm_cmd {
+ MIG_CMD_INVALID = 0, /* Must be 0 */
+ MIG_CMD_MAX
+};
+
bool qemu_savevm_state_blocked(Error **errp);
void qemu_savevm_state_begin(QEMUFile *f,
const MigrationParams *params);
@@ -92,6 +98,8 @@ int qemu_savevm_state_iterate(QEMUFile *f);
void qemu_savevm_state_complete_precopy(QEMUFile *f);
void qemu_savevm_state_cancel(void);
uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
+ uint16_t len, uint8_t *data);
int qemu_loadvm_state(QEMUFile *f);
typedef enum DisplayType
diff --git a/migration/savevm.c b/migration/savevm.c
index e621afd..eb495e6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -59,6 +59,14 @@
static bool skip_section_footers;
+static struct mig_cmd_args {
+ ssize_t len; /* -1 = variable */
+ const char *name;
+} mig_cmd_args[] = {
+ [MIG_CMD_INVALID] = { .len = -1, .name = "INVALID" },
+ [MIG_CMD_MAX] = { .len = -1, .name = "MAX" },
+};
+
static int announce_self_create(uint8_t *buf,
uint8_t *mac_addr)
{
@@ -693,6 +701,28 @@ static void save_section_footer(QEMUFile *f, SaveStateEntry *se)
}
}
+/**
+ * qemu_savevm_command_send: Send a 'QEMU_VM_COMMAND' type element with the
+ * command and associated data.
+ *
+ * @f: File to send command on
+ * @command: Command type to send
+ * @len: Length of associated data
+ * @data: Data associated with command.
+ */
+void qemu_savevm_command_send(QEMUFile *f,
+ enum qemu_vm_cmd command,
+ uint16_t len,
+ uint8_t *data)
+{
+ trace_savevm_command_send(command, len);
+ qemu_put_byte(f, QEMU_VM_COMMAND);
+ qemu_put_be16(f, (uint16_t)command);
+ qemu_put_be16(f, len);
+ qemu_put_buffer(f, data, len);
+ qemu_fflush(f);
+}
+
bool qemu_savevm_state_blocked(Error **errp)
{
SaveStateEntry *se;
@@ -1003,6 +1033,40 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
return NULL;
}
+/**
+ * loadvm_process_command: Process an incoming 'QEMU_VM_COMMAND'
+ *
+ * Returns: 0 on success, negative on error (in which case it will issue an
+ * error message).
+ * @f: The stream to read the command data from.
+ */
+static int loadvm_process_command(QEMUFile *f)
+{
+ uint16_t cmd;
+ uint16_t len;
+
+ cmd = qemu_get_be16(f);
+ len = qemu_get_be16(f);
+
+ trace_loadvm_process_command(cmd, len);
+ if (cmd >= MIG_CMD_MAX || cmd == MIG_CMD_INVALID) {
+ error_report("MIG_CMD 0x%x unknown (len 0x%x)", cmd, len);
+ return -EINVAL;
+ }
+
+ if (mig_cmd_args[cmd].len != -1 && mig_cmd_args[cmd].len != len) {
+ error_report("%s received with bad length - expecting %zd, got %d",
+ mig_cmd_args[cmd].name, mig_cmd_args[cmd].len, len);
+ return -ERANGE;
+ }
+
+ switch (cmd) {
+ /* Filling added in next patch */
+ }
+
+ return 0;
+}
+
struct LoadStateEntry {
QLIST_ENTRY(LoadStateEntry) entry;
SaveStateEntry *se;
@@ -1182,6 +1246,12 @@ int qemu_loadvm_state(QEMUFile *f)
goto out;
}
break;
+ case QEMU_VM_COMMAND:
+ ret = loadvm_process_command(f);
+ if (ret < 0) {
+ goto out;
+ }
+ break;
default:
error_report("Unknown savevm section type %d", section_type);
ret = -EINVAL;
diff --git a/trace-events b/trace-events
index cec51f6..b6cdf11 100644
--- a/trace-events
+++ b/trace-events
@@ -1197,6 +1197,8 @@ virtio_gpu_fence_resp(uint64_t fence) "fence 0x%" PRIx64
qemu_loadvm_state_section(unsigned int section_type) "%d"
qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
+loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
+savevm_command_send(uint16_t command, uint16_t len) "com=0x%x len=%d"
savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
savevm_section_skip(const char *id, unsigned int section_id) "%s, section_id %u"
--
2.5.0
* [Qemu-devel] [PATCH v8 19/54] Return path: Control commands
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (17 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 18/54] Migration commands Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-20 11:27 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 20/54] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
` (34 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Add two src->dest commands:
* OPEN_RETURN_PATH - To request that the destination open the return path
* PING - Request an acknowledgement from the destination
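On the source side, a minimal sketch of the handshake these two commands
enable (the function name is invented; the real wiring lands later in the
series): open the return path, then ping it with a sequence value that the
destination echoes back as a PONG.

    /* Sketch only: request the RP and check that it is alive. */
    static void example_rp_handshake(QEMUFile *to_dst)
    {
        qemu_savevm_send_open_return_path(to_dst);
        /* The destination replies with a PONG carrying the same value */
        qemu_savevm_send_ping(to_dst, 1);
    }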
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 2 ++
include/sysemu/sysemu.h | 4 ++++
migration/savevm.c | 42 +++++++++++++++++++++++++++++++++++++++++-
trace-events | 2 ++
4 files changed, 49 insertions(+), 1 deletion(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0bb4383..98a6d07 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -48,6 +48,8 @@ typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
struct MigrationIncomingState {
QEMUFile *from_src_file;
+ QEMUFile *to_src_file;
+
/* See savevm.c */
LoadStateEntry_Head loadvm_handlers;
};
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 07e9502..c6a3a78 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,6 +87,8 @@ void qemu_announce_self(void);
/* Subcommands for QEMU_VM_COMMAND */
enum qemu_vm_cmd {
MIG_CMD_INVALID = 0, /* Must be 0 */
+ MIG_CMD_OPEN_RETURN_PATH, /* Tell the dest to open the Return path */
+ MIG_CMD_PING, /* Request a PONG on the RP */
MIG_CMD_MAX
};
@@ -100,6 +102,8 @@ void qemu_savevm_state_cancel(void);
uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
uint16_t len, uint8_t *data);
+void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
+void qemu_savevm_send_open_return_path(QEMUFile *f);
int qemu_loadvm_state(QEMUFile *f);
typedef enum DisplayType
diff --git a/migration/savevm.c b/migration/savevm.c
index eb495e6..819ab1e 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -64,6 +64,8 @@ static struct mig_cmd_args {
const char *name;
} mig_cmd_args[] = {
[MIG_CMD_INVALID] = { .len = -1, .name = "INVALID" },
+ [MIG_CMD_OPEN_RETURN_PATH] = { .len = 0, .name = "OPEN_RETURN_PATH" },
+ [MIG_CMD_PING] = { .len = sizeof(uint32_t), .name = "PING" },
[MIG_CMD_MAX] = { .len = -1, .name = "MAX" },
};
@@ -723,6 +725,20 @@ void qemu_savevm_command_send(QEMUFile *f,
qemu_fflush(f);
}
+void qemu_savevm_send_ping(QEMUFile *f, uint32_t value)
+{
+ uint32_t buf;
+
+ trace_savevm_send_ping(value);
+ buf = cpu_to_be32(value);
+ qemu_savevm_command_send(f, MIG_CMD_PING, sizeof(value), (uint8_t *)&buf);
+}
+
+void qemu_savevm_send_open_return_path(QEMUFile *f)
+{
+ qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
+}
+
bool qemu_savevm_state_blocked(Error **errp)
{
SaveStateEntry *se;
@@ -1042,8 +1058,10 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
*/
static int loadvm_process_command(QEMUFile *f)
{
+ MigrationIncomingState *mis = migration_incoming_get_current();
uint16_t cmd;
uint16_t len;
+ uint32_t tmp32;
cmd = qemu_get_be16(f);
len = qemu_get_be16(f);
@@ -1061,7 +1079,29 @@ static int loadvm_process_command(QEMUFile *f)
}
switch (cmd) {
- /* Filling added in next patch */
+ case MIG_CMD_OPEN_RETURN_PATH:
+ if (mis->to_src_file) {
+ error_report("CMD_OPEN_RETURN_PATH called when RP already open");
+ /* Not really a problem, so don't give up */
+ return 0;
+ }
+ mis->to_src_file = qemu_file_get_return_path(f);
+ if (!mis->to_src_file) {
+ error_report("CMD_OPEN_RETURN_PATH failed");
+ return -1;
+ }
+ break;
+
+ case MIG_CMD_PING:
+ tmp32 = qemu_get_be32(f);
+ trace_loadvm_process_command_ping(tmp32);
+ if (!mis->to_src_file) {
+ error_report("CMD_PING (0x%x) received with no return path",
+ tmp32);
+ return -1;
+ }
+ /* migrate_send_rp_pong(mis, tmp32); TODO: gets added later */
+ break;
}
return 0;
diff --git a/trace-events b/trace-events
index b6cdf11..4d4e9dc 100644
--- a/trace-events
+++ b/trace-events
@@ -1198,10 +1198,12 @@ qemu_loadvm_state_section(unsigned int section_type) "%d"
qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
+loadvm_process_command_ping(uint32_t val) "%x"
savevm_command_send(uint16_t command, uint16_t len) "com=0x%x len=%d"
savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
savevm_section_skip(const char *id, unsigned int section_id) "%s, section_id %u"
+savevm_send_ping(uint32_t val) "%x"
savevm_state_begin(void) ""
savevm_state_header(void) ""
savevm_state_iterate(void) ""
--
2.5.0
* [Qemu-devel] [PATCH v8 20/54] Return path: Send responses from destination to source
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (18 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 19/54] Return path: Control commands Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 21/54] Return path: Source handling of return path Dr. David Alan Gilbert (git)
` (33 subsequent siblings)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Add migrate_send_rp_message to send a message from the destination to the source along the return path.
(It uses a mutex so that it can be called from multiple threads.)
Add migrate_send_rp_shut to send a 'SHUT' message to indicate
that the destination is finished with the RP.
Add migrate_send_rp_pong to send a 'PONG' message in response to a PING,
and use it in the MIG_CMD_PING handler.
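The return-path messages use the same framing as the forward commands
(be16 type, be16 length, payload), serialised under rp_mutex. A hedged
sketch of destination-side replies using the new helpers (the function
name is invented for illustration):

    /* Sketch only: answer a PING, then close the RP down cleanly. */
    static void example_rp_replies(MigrationIncomingState *mis, uint32_t seq)
    {
        migrate_send_rp_pong(mis, seq);  /* echo the PING's sequence value */
        migrate_send_rp_shut(mis, 0);    /* 0 = clean shutdown, non-0 = error */
    }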
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 19 ++++++++++++++++++
migration/migration.c | 45 +++++++++++++++++++++++++++++++++++++++++++
migration/savevm.c | 2 +-
trace-events | 1 +
4 files changed, 66 insertions(+), 1 deletion(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 98a6d07..3ce3fda 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -43,12 +43,22 @@ struct MigrationParams {
bool shared;
};
+/* Messages sent on the return path from destination to source */
+enum mig_rp_message_type {
+ MIG_RP_MSG_INVALID = 0, /* Must be 0 */
+ MIG_RP_MSG_SHUT, /* sibling will not send any more RP messages */
+ MIG_RP_MSG_PONG, /* Response to a PING; data (seq: be32 ) */
+
+ MIG_RP_MSG_MAX
+};
+
typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
/* State for the incoming migration */
struct MigrationIncomingState {
QEMUFile *from_src_file;
QEMUFile *to_src_file;
+ QemuMutex rp_mutex; /* We send replies from multiple threads */
/* See savevm.c */
LoadStateEntry_Head loadvm_handlers;
@@ -181,6 +191,15 @@ int migrate_compress_threads(void);
int migrate_decompress_threads(void);
bool migrate_use_events(void);
+/* Sending on the return path - generic and then for each message type */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+ enum mig_rp_message_type message_type,
+ uint16_t len, void *data);
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+ uint32_t value);
+void migrate_send_rp_pong(MigrationIncomingState *mis,
+ uint32_t value);
+
void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
diff --git a/migration/migration.c b/migration/migration.c
index ba23a65..4fad6a5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -89,6 +89,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
mis_current = g_malloc0(sizeof(MigrationIncomingState));
mis_current->from_src_file = f;
QLIST_INIT(&mis_current->loadvm_handlers);
+ qemu_mutex_init(&mis_current->rp_mutex);
return mis_current;
}
@@ -325,6 +326,50 @@ void process_incoming_migration(QEMUFile *f)
qemu_coroutine_enter(co, f);
}
+/*
+ * Send a message on the return channel back to the source
+ * of the migration.
+ */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+ enum mig_rp_message_type message_type,
+ uint16_t len, void *data)
+{
+ trace_migrate_send_rp_message((int)message_type, len);
+ qemu_mutex_lock(&mis->rp_mutex);
+ qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
+ qemu_put_be16(mis->to_src_file, len);
+ qemu_put_buffer(mis->to_src_file, data, len);
+ qemu_fflush(mis->to_src_file);
+ qemu_mutex_unlock(&mis->rp_mutex);
+}
+
+/*
+ * Send a 'SHUT' message on the return channel with the given value
+ * to indicate that we've finished with the RP. Non-0 value indicates
+ * error.
+ */
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+ uint32_t value)
+{
+ uint32_t buf;
+
+ buf = cpu_to_be32(value);
+ migrate_send_rp_message(mis, MIG_RP_MSG_SHUT, sizeof(buf), &buf);
+}
+
+/*
+ * Send a 'PONG' message on the return channel with the given value
+ * (normally in response to a 'PING')
+ */
+void migrate_send_rp_pong(MigrationIncomingState *mis,
+ uint32_t value)
+{
+ uint32_t buf;
+
+ buf = cpu_to_be32(value);
+ migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
+}
+
/* amount of nanoseconds we are willing to wait for migration to be down.
* the choice of nanoseconds is because it is the maximum resolution that
* get_clock() can achieve. It is an internal measure. All user-visible
diff --git a/migration/savevm.c b/migration/savevm.c
index 819ab1e..f51cbcd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1100,7 +1100,7 @@ static int loadvm_process_command(QEMUFile *f)
tmp32);
return -1;
}
- /* migrate_send_rp_pong(mis, tmp32); TODO: gets added later */
+ migrate_send_rp_pong(mis, tmp32);
break;
}
diff --git a/trace-events b/trace-events
index 4d4e9dc..be58b47 100644
--- a/trace-events
+++ b/trace-events
@@ -1419,6 +1419,7 @@ migrate_fd_cleanup(void) ""
migrate_fd_error(void) ""
migrate_fd_cancel(void) ""
migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
+migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
migrate_state_too_big(void) ""
migrate_global_state_post_load(const char *state) "loaded state: %s"
--
2.5.0
* [Qemu-devel] [PATCH v8 21/54] Return path: Source handling of return path
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (19 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 20/54] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-20 11:33 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 22/54] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
` (32 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Open a return path, and handle messages that are received upon it.
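A rough sketch of how the migration thread is expected to use the two new
helpers once later patches hook them up (both are marked unused for now,
and both are static to migration.c, so this would live there too):

    /* Illustration only: open the RP before streaming, reap it afterwards. */
    static int example_source_rp_usage(MigrationState *ms)
    {
        if (open_return_path_on_source(ms)) {
            return -1;                  /* transport has no return path */
        }

        /* ... normal precopy/postcopy streaming happens here ... */

        /* Non-0 if the RP thread hit an error (mark_source_rp_bad) */
        return await_return_path_close_on_source(ms);
    }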
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
include/migration/migration.h | 8 ++
migration/migration.c | 172 +++++++++++++++++++++++++++++++++++++++++-
trace-events | 10 +++
3 files changed, 189 insertions(+), 1 deletion(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3ce3fda..571466b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -80,6 +80,14 @@ struct MigrationState
int state;
MigrationParams params;
+
+ /* State related to return path */
+ struct {
+ QEMUFile *from_dst_file;
+ QemuThread rp_thread;
+ bool error;
+ } rp_state;
+
double mbps;
int64_t total_time;
int64_t downtime;
diff --git a/migration/migration.c b/migration/migration.c
index 4fad6a5..26bcb25 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -421,6 +421,23 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
return params;
}
+/*
+ * Return true if we're already in the middle of a migration
+ * (i.e. any of the active or setup states)
+ */
+static bool migration_is_active(MigrationState *ms)
+{
+ switch (ms->state) {
+ case MIGRATION_STATUS_ACTIVE:
+ case MIGRATION_STATUS_SETUP:
+ return true;
+
+ default:
+ return false;
+
+ }
+}
+
static void get_xbzrle_cache_stats(MigrationInfo *info)
{
if (migrate_use_xbzrle()) {
@@ -630,6 +647,11 @@ static void migrate_fd_cancel(MigrationState *s)
QEMUFile *f = migrate_get_current()->file;
trace_migrate_fd_cancel();
+ if (s->rp_state.from_dst_file) {
+ /* shutdown the rp socket, so causing the rp thread to shutdown */
+ qemu_file_shutdown(s->rp_state.from_dst_file);
+ }
+
do {
old_state = s->state;
if (old_state != MIGRATION_STATUS_SETUP &&
@@ -958,8 +980,156 @@ int64_t migrate_xbzrle_cache_size(void)
return s->xbzrle_cache_size;
}
-/* migration thread support */
+/*
+ * Something bad happened to the RP stream, mark an error
+ * The caller shall print or trace something to indicate why
+ */
+static void mark_source_rp_bad(MigrationState *s)
+{
+ s->rp_state.error = true;
+}
+
+static struct rp_cmd_args {
+ ssize_t len; /* -1 = variable */
+ const char *name;
+} rp_cmd_args[] = {
+ [MIG_RP_MSG_INVALID] = { .len = -1, .name = "INVALID" },
+ [MIG_RP_MSG_SHUT] = { .len = 4, .name = "SHUT" },
+ [MIG_RP_MSG_PONG] = { .len = 4, .name = "PONG" },
+ [MIG_RP_MSG_MAX] = { .len = -1, .name = "MAX" },
+};
+
+/*
+ * Handles messages sent on the return path towards the source VM
+ *
+ */
+static void *source_return_path_thread(void *opaque)
+{
+ MigrationState *ms = opaque;
+ QEMUFile *rp = ms->rp_state.from_dst_file;
+ uint16_t header_len, header_type;
+ const int max_len = 512;
+ uint8_t buf[max_len];
+ uint32_t tmp32, sibling_error;
+ int res;
+
+ trace_source_return_path_thread_entry();
+ while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
+ migration_is_active(ms)) {
+ trace_source_return_path_thread_loop_top();
+ header_type = qemu_get_be16(rp);
+ header_len = qemu_get_be16(rp);
+
+ if (header_type >= MIG_RP_MSG_MAX ||
+ header_type == MIG_RP_MSG_INVALID) {
+ error_report("RP: Received invalid message 0x%04x length 0x%04x",
+ header_type, header_len);
+ mark_source_rp_bad(ms);
+ goto out;
+ }
+
+ if ((rp_cmd_args[header_type].len != -1 &&
+ header_len != rp_cmd_args[header_type].len) ||
+ header_len > max_len) {
+ error_report("RP: Received '%s' message (0x%04x) with"
+ "incorrect length %d expecting %zd",
+ rp_cmd_args[header_type].name, header_type, header_len,
+ rp_cmd_args[header_type].len);
+ mark_source_rp_bad(ms);
+ goto out;
+ }
+
+ /* We know we've got a valid header by this point */
+ res = qemu_get_buffer(rp, buf, header_len);
+ if (res != header_len) {
+ error_report("RP: Failed reading data for message 0x%04x"
+ " read %d expected %d",
+ header_type, res, header_len);
+ mark_source_rp_bad(ms);
+ goto out;
+ }
+
+ /* OK, we have the message and the data */
+ switch (header_type) {
+ case MIG_RP_MSG_SHUT:
+ sibling_error = be32_to_cpup((uint32_t *)buf);
+ trace_source_return_path_thread_shut(sibling_error);
+ if (sibling_error) {
+ error_report("RP: Sibling indicated error %d", sibling_error);
+ mark_source_rp_bad(ms);
+ }
+ /*
+ * We'll let the main thread deal with closing the RP
+ * we could do a shutdown(2) on it, but we're the only user
+ * anyway, so there's nothing gained.
+ */
+ goto out;
+
+ case MIG_RP_MSG_PONG:
+ tmp32 = be32_to_cpup((uint32_t *)buf);
+ trace_source_return_path_thread_pong(tmp32);
+ break;
+
+ default:
+ break;
+ }
+ }
+ if (rp && qemu_file_get_error(rp)) {
+ trace_source_return_path_thread_bad_end();
+ mark_source_rp_bad(ms);
+ }
+
+ trace_source_return_path_thread_end();
+out:
+ qemu_fclose(rp);
+ return NULL;
+}
+
+__attribute__ (( unused )) /* Until later in patch series */
+static int open_return_path_on_source(MigrationState *ms)
+{
+ ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->file);
+ if (!ms->rp_state.from_dst_file) {
+ return -1;
+ }
+
+ trace_open_return_path_on_source();
+ qemu_thread_create(&ms->rp_state.rp_thread, "return path",
+ source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
+
+ trace_open_return_path_on_source_continue();
+
+ return 0;
+}
+
+__attribute__ (( unused )) /* Until later in patch series */
+/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
+static int await_return_path_close_on_source(MigrationState *ms)
+{
+ /*
+ * If this is a normal exit then the destination will send a SHUT and the
+ * rp_thread will exit, however if there's an error we need to cause
+ * it to exit.
+ */
+ if (qemu_file_get_error(ms->file) && ms->rp_state.from_dst_file) {
+ /*
+ * shutdown(2), if we have it, will cause it to unblock if it's stuck
+ * waiting for the destination.
+ */
+ qemu_file_shutdown(ms->rp_state.from_dst_file);
+ mark_source_rp_bad(ms);
+ }
+ trace_await_return_path_close_on_source_joining();
+ qemu_thread_join(&ms->rp_state.rp_thread);
+ trace_await_return_path_close_on_source_close();
+ return ms->rp_state.error;
+}
+
+/*
+ * Master migration thread on the source VM.
+ * It drives the migration and pumps the data down the outgoing channel.
+ */
static void *migration_thread(void *opaque)
{
MigrationState *s = opaque;
diff --git a/trace-events b/trace-events
index be58b47..5bbfdf7 100644
--- a/trace-events
+++ b/trace-events
@@ -1414,12 +1414,22 @@ flic_no_device_api(int err) "flic: no Device Contral API support %d"
flic_reset_failed(int err) "flic: reset failed %d"
# migration.c
+await_return_path_close_on_source_close(void) ""
+await_return_path_close_on_source_joining(void) ""
migrate_set_state(int new_state) "new state %d"
migrate_fd_cleanup(void) ""
migrate_fd_error(void) ""
migrate_fd_cancel(void) ""
migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+open_return_path_on_source(void) ""
+open_return_path_on_source_continue(void) ""
+source_return_path_thread_bad_end(void) ""
+source_return_path_thread_end(void) ""
+source_return_path_thread_entry(void) ""
+source_return_path_thread_loop_top(void) ""
+source_return_path_thread_pong(uint32_t val) "%x"
+source_return_path_thread_shut(uint32_t val) "%x"
migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
migrate_state_too_big(void) ""
migrate_global_state_post_load(const char *state) "loaded state: %s"
--
2.5.0
* [Qemu-devel] [PATCH v8 22/54] Rework loadvm path for subloops
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (20 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 21/54] Return path: Source handling of return path Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 23/54] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
` (31 subsequent siblings)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Postcopy needs to have two migration streams loading concurrently:
one from memory (with the device state) and the other from the fd
with the memory transactions.
Split the core of qemu_loadvm_state out so we can use it for both.
Allow the inner loadvm loop to quit and cause the parent loops to
exit as well.
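The shape this enables, sketched ahead of the postcopy patches that
actually use it: a second thread can run qemu_loadvm_state_main() on
another stream, and a command handler returning LOADVM_QUIT unwinds the
nested loops. This is an assumption-laden sketch (qemu_loadvm_state_main
is static to savevm.c, so such a thread function would live there):

    /* Sketch only: a secondary loader loop, as the postcopy listen thread
     * added later in this series will run. Error handling trimmed. */
    static void *example_listen_thread(void *opaque)
    {
        MigrationIncomingState *mis = opaque;

        /* Returns on QEMU_VM_EOF, on error, or when a command handler
         * returns LOADVM_QUIT to break out of the nested loops. */
        qemu_loadvm_state_main(mis->from_src_file, mis);
        return NULL;
    }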
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 6 ++
migration/migration.c | 2 +
migration/savevm.c | 141 ++++++++++++++++++++++--------------------
trace-events | 4 ++
4 files changed, 86 insertions(+), 67 deletions(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 571466b..3dc95f4 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -57,6 +57,12 @@ typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
struct MigrationIncomingState {
QEMUFile *from_src_file;
+ /*
+ * Free at the start of the main state load, set as the main thread finishes
+ * loading state.
+ */
+ QemuEvent main_thread_load_event;
+
QEMUFile *to_src_file;
QemuMutex rp_mutex; /* We send replies from multiple threads */
diff --git a/migration/migration.c b/migration/migration.c
index 26bcb25..6691a28 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -90,12 +90,14 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
mis_current->from_src_file = f;
QLIST_INIT(&mis_current->loadvm_handlers);
qemu_mutex_init(&mis_current->rp_mutex);
+ qemu_event_init(&mis_current->main_thread_load_event, false);
return mis_current;
}
void migration_incoming_state_destroy(void)
{
+ qemu_event_destroy(&mis_current->main_thread_load_event);
loadvm_free_handlers(mis_current);
g_free(mis_current);
mis_current = NULL;
diff --git a/migration/savevm.c b/migration/savevm.c
index f51cbcd..f9baaa8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1049,11 +1049,18 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
return NULL;
}
+enum LoadVMExitCodes {
+ /* Allow a command to quit all layers of nested loadvm loops */
+ LOADVM_QUIT = 1,
+};
+
+static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
/**
* loadvm_process_command: Process an incoming 'QEMU_VM_COMMAND'
*
- * Returns: 0 on success, negative on error (in which case it will issue an
- * error message).
+ * Returns: 0 on just a normal return
+ * LOADVM_QUIT All good, but exit the loop
+ * <0 error (in which case it will issue an error message).
* @f: The stream to read the command data from.
*/
static int loadvm_process_command(QEMUFile *f)
@@ -1159,47 +1166,10 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
}
}
-int qemu_loadvm_state(QEMUFile *f)
+static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
{
- MigrationIncomingState *mis = migration_incoming_get_current();
- Error *local_err = NULL;
uint8_t section_type;
- unsigned int v;
int ret;
- int file_error_after_eof = -1;
-
- if (qemu_savevm_state_blocked(&local_err)) {
- error_report_err(local_err);
- return -EINVAL;
- }
-
- v = qemu_get_be32(f);
- if (v != QEMU_VM_FILE_MAGIC) {
- error_report("Not a migration stream");
- return -EINVAL;
- }
-
- v = qemu_get_be32(f);
- if (v == QEMU_VM_FILE_VERSION_COMPAT) {
- error_report("SaveVM v2 format is obsolete and don't work anymore");
- return -ENOTSUP;
- }
- if (v != QEMU_VM_FILE_VERSION) {
- error_report("Unsupported migration stream version");
- return -ENOTSUP;
- }
-
- if (!savevm_state.skip_configuration) {
- if (qemu_get_byte(f) != QEMU_VM_CONFIGURATION) {
- error_report("Configuration section missing");
- return -EINVAL;
- }
- ret = vmstate_load_state(f, &vmstate_configuration, &savevm_state, 0);
-
- if (ret) {
- return ret;
- }
- }
while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
uint32_t instance_id, version_id, section_id;
@@ -1228,16 +1198,14 @@ int qemu_loadvm_state(QEMUFile *f)
if (se == NULL) {
error_report("Unknown savevm section or instance '%s' %d",
idstr, instance_id);
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
/* Validate version */
if (version_id > se->version_id) {
error_report("savevm: unsupported version %d for '%s' v%d",
version_id, idstr, se->version_id);
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
/* Add entry */
@@ -1252,11 +1220,10 @@ int qemu_loadvm_state(QEMUFile *f)
if (ret < 0) {
error_report("error while loading state for instance 0x%x of"
" device '%s'", instance_id, idstr);
- goto out;
+ return ret;
}
if (!check_section_footer(f, le)) {
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
break;
case QEMU_VM_SECTION_PART:
@@ -1271,35 +1238,83 @@ int qemu_loadvm_state(QEMUFile *f)
}
if (le == NULL) {
error_report("Unknown savevm section %d", section_id);
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
ret = vmstate_load(f, le->se, le->version_id);
if (ret < 0) {
error_report("error while loading state section id %d(%s)",
section_id, le->se->idstr);
- goto out;
+ return ret;
}
if (!check_section_footer(f, le)) {
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}
break;
case QEMU_VM_COMMAND:
ret = loadvm_process_command(f);
- if (ret < 0) {
- goto out;
+ trace_qemu_loadvm_state_section_command(ret);
+ if ((ret < 0) || (ret & LOADVM_QUIT)) {
+ return ret;
}
break;
default:
error_report("Unknown savevm section type %d", section_type);
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+int qemu_loadvm_state(QEMUFile *f)
+{
+ MigrationIncomingState *mis = migration_incoming_get_current();
+ Error *local_err = NULL;
+ unsigned int v;
+ int ret;
+
+ if (qemu_savevm_state_blocked(&local_err)) {
+ error_report_err(local_err);
+ return -EINVAL;
+ }
+
+ v = qemu_get_be32(f);
+ if (v != QEMU_VM_FILE_MAGIC) {
+ error_report("Not a migration stream");
+ return -EINVAL;
+ }
+
+ v = qemu_get_be32(f);
+ if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+ error_report("SaveVM v2 format is obsolete and don't work anymore");
+ return -ENOTSUP;
+ }
+ if (v != QEMU_VM_FILE_VERSION) {
+ error_report("Unsupported migration stream version");
+ return -ENOTSUP;
+ }
+
+ if (!savevm_state.skip_configuration) {
+ if (qemu_get_byte(f) != QEMU_VM_CONFIGURATION) {
+ error_report("Configuration section missing");
+ return -EINVAL;
+ }
+ ret = vmstate_load_state(f, &vmstate_configuration, &savevm_state, 0);
+
+ if (ret) {
+ return ret;
}
}
- file_error_after_eof = qemu_file_get_error(f);
+ ret = qemu_loadvm_state_main(f, mis);
+ qemu_event_set(&mis->main_thread_load_event);
+
+ trace_qemu_loadvm_state_post_main(ret);
+
+ if (ret == 0) {
+ ret = qemu_file_get_error(f);
+ }
/*
* Try to read in the VMDESC section as well, so that dumping tools that
@@ -1311,10 +1326,10 @@ int qemu_loadvm_state(QEMUFile *f)
* We also mustn't read data that isn't there; some transports (RDMA)
* will stall waiting for that data when the source has already closed.
*/
- if (should_send_vmdesc()) {
+ if (ret == 0 && should_send_vmdesc()) {
uint8_t *buf;
uint32_t size;
- section_type = qemu_get_byte(f);
+ uint8_t section_type = qemu_get_byte(f);
if (section_type != QEMU_VM_VMDESCRIPTION) {
error_report("Expected vmdescription section, but got %d",
@@ -1338,14 +1353,6 @@ int qemu_loadvm_state(QEMUFile *f)
cpu_synchronize_all_post_init();
- ret = 0;
-
-out:
- if (ret == 0) {
- /* We may not have a VMDESC section, so ignore relative errors */
- ret = file_error_after_eof;
- }
-
return ret;
}
diff --git a/trace-events b/trace-events
index 5bbfdf7..228f5b6 100644
--- a/trace-events
+++ b/trace-events
@@ -1195,7 +1195,11 @@ virtio_gpu_fence_resp(uint64_t fence) "fence 0x%" PRIx64
# migration/savevm.c
qemu_loadvm_state_section(unsigned int section_type) "%d"
+qemu_loadvm_state_section_command(int ret) "%d"
qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
+qemu_loadvm_state_main(void) ""
+qemu_loadvm_state_main_quit_parent(void) ""
+qemu_loadvm_state_post_main(int ret) "%d"
qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
loadvm_process_command_ping(uint32_t val) "%x"
--
2.5.0
* [Qemu-devel] [PATCH v8 23/54] Add migration-capability boolean for postcopy-ram.
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (21 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 22/54] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-09-29 20:22 ` Eric Blake
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 24/54] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
` (30 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
The 'postcopy ram' capability allows postcopy migration of RAM;
note that the migration starts off in precopy mode until
postcopy mode is triggered (see the migrate_start_postcopy
patch later in the series).
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 1 +
migration/migration.c | 23 +++++++++++++++++++++++
qapi-schema.json | 6 +++++-
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3dc95f4..4ed7931 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -186,6 +186,7 @@ void migrate_add_blocker(Error *reason);
*/
void migrate_del_blocker(Error *reason);
+bool migrate_postcopy_ram(void);
bool migrate_zero_blocks(void);
bool migrate_auto_converge(void);
diff --git a/migration/migration.c b/migration/migration.c
index 6691a28..23bdad3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -551,6 +551,20 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
for (cap = params; cap; cap = cap->next) {
s->enabled_capabilities[cap->value->capability] = cap->value->state;
}
+
+ if (migrate_postcopy_ram()) {
+ if (migrate_use_compression()) {
+ /* The decompression threads asynchronously write into RAM
+ * rather than use the atomic copies needed to avoid
+ * userfaulting. It should be possible to fix the decompression
+ * threads for compatibility in future.
+ */
+ error_report("Postcopy is not currently compatible with "
+ "compression");
+ s->enabled_capabilities[MIGRATION_CAPABILITY_X_POSTCOPY_RAM] =
+ false;
+ }
+ }
}
void qmp_migrate_set_parameters(bool has_compress_level,
@@ -901,6 +915,15 @@ void qmp_migrate_set_downtime(double value, Error **errp)
max_downtime = (uint64_t)value;
}
+bool migrate_postcopy_ram(void)
+{
+ MigrationState *s;
+
+ s = migrate_get_current();
+
+ return s->enabled_capabilities[MIGRATION_CAPABILITY_X_POSTCOPY_RAM];
+}
+
bool migrate_auto_converge(void)
{
MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 527690d..c6f1942 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -535,11 +535,15 @@
# @auto-converge: If enabled, QEMU will automatically throttle down the guest
# to speed up convergence of RAM migration. (since 1.6)
#
+# @x-postcopy-ram: Start executing on the migration target before all of RAM has
+# been migrated, pulling the remaining pages along as needed. NOTE: If
+# the migration fails during postcopy the VM will fail. (since 2.5)
+#
# Since: 1.2
##
{ 'enum': 'MigrationCapability',
'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
- 'compress', 'events'] }
+ 'compress', 'events', 'x-postcopy-ram'] }
##
# @MigrationCapabilityStatus
--
2.5.0
* [Qemu-devel] [PATCH v8 24/54] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (22 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 23/54] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-20 11:50 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 25/54] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
` (29 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
The state of the postcopy process is managed via a series of messages:
* Add wrappers and handlers for sending/receiving these messages
* Add a state variable that tracks the current state of postcopy
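The incoming side is expected to walk the states in order
(NONE -> ADVISE -> DISCARD -> LISTENING -> RUNNING -> END), with
postcopy_state_set returning the previous state so callers can verify the
transition. A minimal sketch of how the accessors could be backed by an
atomic (an assumption for illustration, not necessarily the exact code in
this patch):

    /* Sketch: postcopy state shared between the main and listen threads. */
    static PostcopyState incoming_postcopy_state;

    PostcopyState postcopy_state_get(void)
    {
        return atomic_mb_read(&incoming_postcopy_state);
    }

    /* Returns the old state so callers can check the transition was legal */
    PostcopyState postcopy_state_set(PostcopyState new_state)
    {
        return atomic_xchg(&incoming_postcopy_state, new_state);
    }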
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 27 +++++
include/sysemu/sysemu.h | 19 ++++
migration/migration.c | 20 ++++
migration/savevm.c | 255 ++++++++++++++++++++++++++++++++++++++++++
trace-events | 10 ++
5 files changed, 331 insertions(+)
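As background for the handlers added below: the postcopy state is advanced with a "set and return the old value" primitive so that each message handler can atomically both move the state on and verify that the previous state was the expected one. A minimal standalone sketch of that pattern, using C11 <stdatomic.h> rather than QEMU's atomic_xchg/atomic_mb_read (the single ADVISE check shown is illustrative):

#include <stdatomic.h>
#include <stdio.h>

typedef enum {
    POSTCOPY_INCOMING_NONE = 0,
    POSTCOPY_INCOMING_ADVISE,
    POSTCOPY_INCOMING_DISCARD,
    POSTCOPY_INCOMING_LISTENING,
    POSTCOPY_INCOMING_RUNNING,
    POSTCOPY_INCOMING_END
} PostcopyState;

static _Atomic PostcopyState incoming_postcopy_state;

/* Set the new state and hand back the previous one in a single atomic step. */
static PostcopyState postcopy_state_set(PostcopyState new_state)
{
    return atomic_exchange(&incoming_postcopy_state, new_state);
}

/* Handler-style check: ADVISE is only legal when nothing has happened yet. */
static int handle_advise(void)
{
    PostcopyState old = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);

    if (old != POSTCOPY_INCOMING_NONE) {
        fprintf(stderr, "ADVISE in wrong postcopy state (%d)\n", old);
        return -1;
    }
    return 0;
}

int main(void)
{
    printf("first advise: %d\n", handle_advise());   /* accepted: 0 */
    printf("second advise: %d\n", handle_advise());  /* rejected: -1 */
    return 0;
}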
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4ed7931..2e9fa3c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -53,6 +53,29 @@ enum mig_rp_message_type {
};
typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
+
+/* The current postcopy state is read/set by postcopy_state_get/set
+ * which update it atomically.
+ * The state is updated as postcopy messages are received, and
+ * in general only one thread should be writing to the state at any one
+ * time, initially the main thread and then the listen thread;
+ * Corner cases are where either thread finishes early and/or errors.
+ * The state is checked as messages are received to ensure that
+ * the source is sending us messages in the correct order.
+ * The state is also used by the RAM reception code to know if it
+ * has to place pages atomically, and the cleanup code at the end of
+ * the main thread to know if it has to delay cleanup until the end
+ * of postcopy.
+ */
+typedef enum {
+ POSTCOPY_INCOMING_NONE = 0, /* Initial state - no postcopy */
+ POSTCOPY_INCOMING_ADVISE,
+ POSTCOPY_INCOMING_DISCARD,
+ POSTCOPY_INCOMING_LISTENING,
+ POSTCOPY_INCOMING_RUNNING,
+ POSTCOPY_INCOMING_END
+} PostcopyState;
+
/* State for the incoming migration */
struct MigrationIncomingState {
QEMUFile *from_src_file;
@@ -240,4 +263,8 @@ void global_state_set_optional(void);
void savevm_skip_configuration(void);
int global_state_store(void);
void global_state_store_running(void);
+
+PostcopyState postcopy_state_get(void);
+/* Set the state and return the old state */
+PostcopyState postcopy_state_set(PostcopyState new_state);
#endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index c6a3a78..204b1c3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -89,6 +89,16 @@ enum qemu_vm_cmd {
MIG_CMD_INVALID = 0, /* Must be 0 */
MIG_CMD_OPEN_RETURN_PATH, /* Tell the dest to open the Return path */
MIG_CMD_PING, /* Request a PONG on the RP */
+
+ MIG_CMD_POSTCOPY_ADVISE, /* Prior to any page transfers, just
+ warn we might want to do PC */
+ MIG_CMD_POSTCOPY_LISTEN, /* Start listening for incoming
+ pages as it's running. */
+ MIG_CMD_POSTCOPY_RUN, /* Start execution */
+
+ MIG_CMD_POSTCOPY_RAM_DISCARD, /* A list of pages to discard that
+ were previously sent during
+ precopy but are dirty. */
MIG_CMD_MAX
};
@@ -104,6 +114,15 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
uint16_t len, uint8_t *data);
void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
void qemu_savevm_send_open_return_path(QEMUFile *f);
+void qemu_savevm_send_postcopy_advise(QEMUFile *f);
+void qemu_savevm_send_postcopy_listen(QEMUFile *f);
+void qemu_savevm_send_postcopy_run(QEMUFile *f);
+
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+ uint16_t len,
+ uint64_t *start_list,
+ uint64_t *length_list);
+
int qemu_loadvm_state(QEMUFile *f);
typedef enum DisplayType
diff --git a/migration/migration.c b/migration/migration.c
index 23bdad3..fe93ec8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -53,6 +53,13 @@ static NotifierList migration_state_notifiers =
static bool deferred_incoming;
+/*
+ * Current state of incoming postcopy; note this is not part of
+ * MigrationIncomingState since its state is used during cleanup
+ * at the end as MIS is being freed.
+ */
+static PostcopyState incoming_postcopy_state;
+
/* When we add fault tolerance, we could have several
migrations at once. For now we don't need to add
dynamic creation of migration */
@@ -276,6 +283,7 @@ static void process_incoming_migration_co(void *opaque)
int ret;
migration_incoming_state_new(f);
+ postcopy_state_set(POSTCOPY_INCOMING_NONE);
migrate_generate_event(MIGRATION_STATUS_ACTIVE);
ret = qemu_loadvm_state(f);
@@ -1286,3 +1294,15 @@ void migrate_fd_connect(MigrationState *s)
qemu_thread_create(&s->thread, "migration", migration_thread, s,
QEMU_THREAD_JOINABLE);
}
+
+PostcopyState postcopy_state_get(void)
+{
+ return atomic_mb_read(&incoming_postcopy_state);
+}
+
+/* Set the state and return the old state */
+PostcopyState postcopy_state_set(PostcopyState new_state)
+{
+ return atomic_xchg(&incoming_postcopy_state, new_state);
+}
+
diff --git a/migration/savevm.c b/migration/savevm.c
index f9baaa8..7af8165 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -45,6 +45,7 @@
#include "exec/memory.h"
#include "qmp-commands.h"
#include "trace.h"
+#include "qemu/bitops.h"
#include "qemu/iov.h"
#include "block/snapshot.h"
#include "block/qapi.h"
@@ -57,6 +58,8 @@
#define ARP_PTYPE_IP 0x0800
#define ARP_OP_REQUEST_REV 0x3
+const unsigned int postcopy_ram_discard_version = 0;
+
static bool skip_section_footers;
static struct mig_cmd_args {
@@ -66,6 +69,11 @@ static struct mig_cmd_args {
[MIG_CMD_INVALID] = { .len = -1, .name = "INVALID" },
[MIG_CMD_OPEN_RETURN_PATH] = { .len = 0, .name = "OPEN_RETURN_PATH" },
[MIG_CMD_PING] = { .len = sizeof(uint32_t), .name = "PING" },
+ [MIG_CMD_POSTCOPY_ADVISE] = { .len = 16, .name = "POSTCOPY_ADVISE" },
+ [MIG_CMD_POSTCOPY_LISTEN] = { .len = 0, .name = "POSTCOPY_LISTEN" },
+ [MIG_CMD_POSTCOPY_RUN] = { .len = 0, .name = "POSTCOPY_RUN" },
+ [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
+ .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
[MIG_CMD_MAX] = { .len = -1, .name = "MAX" },
};
@@ -739,6 +747,77 @@ void qemu_savevm_send_open_return_path(QEMUFile *f)
qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
}
+/* Send prior to any postcopy transfer */
+void qemu_savevm_send_postcopy_advise(QEMUFile *f)
+{
+ uint64_t tmp[2];
+ tmp[0] = cpu_to_be64(getpagesize());
+ tmp[1] = cpu_to_be64(1ul << qemu_target_page_bits());
+
+ trace_qemu_savevm_send_postcopy_advise();
+ qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_ADVISE, 16, (uint8_t *)tmp);
+}
+
+/* Sent prior to starting the destination running in postcopy, discard pages
+ * that have already been sent but redirtied on the source.
+ * CMD_POSTCOPY_RAM_DISCARD consists of:
+ * byte version (0)
+ * byte Length of name field (not including 0)
+ * n x byte RAM block name
+ * byte 0 terminator (just for safety)
+ * n x Byte ranges within the named RAMBlock
+ * be64 Start of the range
+ * be64 Length
+ *
+ * name: RAMBlock name that these entries are part of
+ * len: Number of page entries
+ * start_list: 'len' addresses
+ * length_list: 'len' addresses
+ *
+ */
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+ uint16_t len,
+ uint64_t *start_list,
+ uint64_t *length_list)
+{
+ uint8_t *buf;
+ uint16_t tmplen;
+ uint16_t t;
+ size_t name_len = strlen(name);
+
+ trace_qemu_savevm_send_postcopy_ram_discard(name, len);
+ assert(name_len < 256);
+ buf = g_malloc0(1 + 1 + name_len + 1 + (8 + 8) * len);
+ buf[0] = postcopy_ram_discard_version;
+ buf[1] = name_len;
+ memcpy(buf + 2, name, name_len);
+ tmplen = 2 + name_len;
+ buf[tmplen++] = '\0';
+
+ for (t = 0; t < len; t++) {
+ cpu_to_be64w((uint64_t *)(buf + tmplen), start_list[t]);
+ tmplen += 8;
+ cpu_to_be64w((uint64_t *)(buf + tmplen), length_list[t]);
+ tmplen += 8;
+ }
+ qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RAM_DISCARD, tmplen, buf);
+ g_free(buf);
+}
+
+/* Get the destination into a state where it can receive postcopy data. */
+void qemu_savevm_send_postcopy_listen(QEMUFile *f)
+{
+ trace_savevm_send_postcopy_listen();
+ qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_LISTEN, 0, NULL);
+}
+
+/* Kick the destination into running */
+void qemu_savevm_send_postcopy_run(QEMUFile *f)
+{
+ trace_savevm_send_postcopy_run();
+ qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
+}
+
bool qemu_savevm_state_blocked(Error **errp)
{
SaveStateEntry *se;
@@ -1055,6 +1134,167 @@ enum LoadVMExitCodes {
};
static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
+
+/* ------ incoming postcopy messages ------ */
+/* 'advise' arrives before any transfers just to tell us that a postcopy
+ * *might* happen - it might be skipped if precopy transferred everything
+ * quickly.
+ */
+static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
+ uint64_t remote_hps,
+ uint64_t remote_tps)
+{
+ PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
+ trace_loadvm_postcopy_handle_advise();
+ if (ps != POSTCOPY_INCOMING_NONE) {
+ error_report("CMD_POSTCOPY_ADVISE in wrong postcopy state (%d)", ps);
+ return -1;
+ }
+
+ if (remote_hps != getpagesize()) {
+ /*
+ * Some combinations of mismatch are probably possible but it gets
+ * a bit more complicated. In particular we need to place whole
+ * host pages on the dest at once, and we need to ensure that we
+ * handle dirtying to make sure we never end up sending part of
+ * a hostpage on its own.
+ */
+ error_report("Postcopy needs matching host page sizes (s=%d d=%d)",
+ (int)remote_hps, getpagesize());
+ return -1;
+ }
+
+ if (remote_tps != (1ul << qemu_target_page_bits())) {
+ /*
+ * Again, some differences could be dealt with, but for now keep it
+ * simple.
+ */
+ error_report("Postcopy needs matching target page sizes (s=%d d=%d)",
+ (int)remote_tps, 1 << qemu_target_page_bits());
+ return -1;
+ }
+
+ return 0;
+}
+
+/* After postcopy we will be told to throw some pages away since they're
+ * dirty and will have to be demand fetched. Must happen before CPU is
+ * started.
+ * There can be 0..many of these messages, each encoding multiple pages.
+ */
+static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
+ uint16_t len)
+{
+ int tmp;
+ char ramid[256];
+ PostcopyState ps = postcopy_state_get();
+
+ trace_loadvm_postcopy_ram_handle_discard();
+
+ switch (ps) {
+ case POSTCOPY_INCOMING_ADVISE:
+ /* 1st discard */
+ tmp = 0; /* TODO: later patch postcopy_ram_prepare_discard(mis); */
+ if (tmp) {
+ return tmp;
+ }
+ break;
+
+ case POSTCOPY_INCOMING_DISCARD:
+ /* Expected state */
+ break;
+
+ default:
+ error_report("CMD_POSTCOPY_RAM_DISCARD in wrong postcopy state (%d)",
+ ps);
+ return -1;
+ }
+ /* We're expecting a
+ * Version (0)
+ * a RAM ID string (length byte, name, 0 term)
+ * then at least 1 16 byte chunk
+ */
+ if (len < (1 + 1 + 1 + 1 + 2 * 8)) {
+ error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+ return -1;
+ }
+
+ tmp = qemu_get_byte(mis->from_src_file);
+ if (tmp != postcopy_ram_discard_version) {
+ error_report("CMD_POSTCOPY_RAM_DISCARD invalid version (%d)", tmp);
+ return -1;
+ }
+
+ if (!qemu_get_counted_string(mis->from_src_file, ramid)) {
+ error_report("CMD_POSTCOPY_RAM_DISCARD Failed to read RAMBlock ID");
+ return -1;
+ }
+ tmp = qemu_get_byte(mis->from_src_file);
+ if (tmp != 0) {
+ error_report("CMD_POSTCOPY_RAM_DISCARD missing nil (%d)", tmp);
+ return -1;
+ }
+
+ len -= 3 + strlen(ramid);
+ if (len % 16) {
+ error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+ return -1;
+ }
+ trace_loadvm_postcopy_ram_handle_discard_header(ramid, len);
+ while (len) {
+ /* TODO - ram_discard_range gets added in a later patch
+ uint64_t start_addr, block_length;
+ start_addr = qemu_get_be64(mis->from_src_file);
+ block_length = qemu_get_be64(mis->from_src_file);
+
+ len -= 16;
+ int ret = ram_discard_range(mis, ramid, start_addr,
+ block_length);
+ if (ret) {
+ return ret;
+ }
+ */
+ }
+ trace_loadvm_postcopy_ram_handle_discard_end();
+
+ return 0;
+}
+
+/* After this message we must be able to immediately receive postcopy data */
+static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
+{
+ PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
+ trace_loadvm_postcopy_handle_listen();
+ if (ps != POSTCOPY_INCOMING_ADVISE && ps != POSTCOPY_INCOMING_DISCARD) {
+ error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
+ return -1;
+ }
+
+ /* TODO start up the postcopy listening thread */
+ return 0;
+}
+
+/* After all discards we can start running and asking for pages */
+static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
+{
+ PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
+ trace_loadvm_postcopy_handle_run();
+ if (ps != POSTCOPY_INCOMING_LISTENING) {
+ error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
+ return -1;
+ }
+
+ if (autostart) {
+ /* Hold onto your hats, starting the CPU */
+ vm_start();
+ } else {
+ /* leave it paused and let management decide when to start the CPU */
+ runstate_set(RUN_STATE_PAUSED);
+ }
+
+ return 0;
+}
+
/**
* loadvm_process_command: Process an incoming 'QEMU_VM_COMMAND'
*
@@ -1069,6 +1309,7 @@ static int loadvm_process_command(QEMUFile *f)
uint16_t cmd;
uint16_t len;
uint32_t tmp32;
+ uint64_t tmp64a, tmp64b;
cmd = qemu_get_be16(f);
len = qemu_get_be16(f);
@@ -1109,6 +1350,20 @@ static int loadvm_process_command(QEMUFile *f)
}
migrate_send_rp_pong(mis, tmp32);
break;
+
+ case MIG_CMD_POSTCOPY_ADVISE:
+ tmp64a = qemu_get_be64(f); /* hps */
+ tmp64b = qemu_get_be64(f); /* tps */
+ return loadvm_postcopy_handle_advise(mis, tmp64a, tmp64b);
+
+ case MIG_CMD_POSTCOPY_LISTEN:
+ return loadvm_postcopy_handle_listen(mis);
+
+ case MIG_CMD_POSTCOPY_RUN:
+ return loadvm_postcopy_handle_run(mis);
+
+ case MIG_CMD_POSTCOPY_RAM_DISCARD:
+ return loadvm_postcopy_ram_handle_discard(mis, len);
}
return 0;
diff --git a/trace-events b/trace-events
index 228f5b6..cc6668f 100644
--- a/trace-events
+++ b/trace-events
@@ -1201,13 +1201,23 @@ qemu_loadvm_state_main(void) ""
qemu_loadvm_state_main_quit_parent(void) ""
qemu_loadvm_state_post_main(int ret) "%d"
qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
+loadvm_postcopy_handle_advise(void) ""
+loadvm_postcopy_handle_listen(void) ""
+loadvm_postcopy_handle_run(void) ""
+loadvm_postcopy_ram_handle_discard(void) ""
+loadvm_postcopy_ram_handle_discard_end(void) ""
+loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
loadvm_process_command_ping(uint32_t val) "%x"
+qemu_savevm_send_postcopy_advise(void) ""
+qemu_savevm_send_postcopy_ram_discard(const char *id, uint16_t len) "%s: %ud"
savevm_command_send(uint16_t command, uint16_t len) "com=0x%x len=%d"
savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
savevm_section_skip(const char *id, unsigned int section_id) "%s, section_id %u"
savevm_send_ping(uint32_t val) "%x"
+savevm_send_postcopy_listen(void) ""
+savevm_send_postcopy_run(void) ""
savevm_state_begin(void) ""
savevm_state_header(void) ""
savevm_state_iterate(void) ""
--
2.5.0
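The CMD_POSTCOPY_RAM_DISCARD payload documented in this patch is: a version byte (0), a length-prefixed RAMBlock name plus a nil terminator, then a sequence of be64 (start, length) pairs. A small standalone sketch of building such a payload with plain libc, mirroring qemu_savevm_send_postcopy_ram_discard (the buffer size and example block name are illustrative only):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = v >> (56 - 8 * i);
    }
}

/* Payload: version, name length, name, nil, then (start, length) be64 pairs. */
static size_t encode_discard(uint8_t *buf, const char *name,
                             const uint64_t *starts, const uint64_t *lengths,
                             uint16_t npairs)
{
    size_t name_len = strlen(name);
    size_t off = 0;

    assert(name_len < 256);
    buf[off++] = 0;                     /* postcopy_ram_discard_version */
    buf[off++] = name_len;
    memcpy(buf + off, name, name_len);
    off += name_len;
    buf[off++] = '\0';                  /* terminator, "just for safety" */
    for (uint16_t i = 0; i < npairs; i++) {
        put_be64(buf + off, starts[i]);
        off += 8;
        put_be64(buf + off, lengths[i]);
        off += 8;
    }
    return off;                 /* this is the 'len' sent with the command */
}

int main(void)
{
    uint64_t starts[]  = { 0x1000, 0x8000 };
    uint64_t lengths[] = { 0x1000, 0x3000 };
    uint8_t buf[256];
    size_t len = encode_discard(buf, "pc.ram", starts, lengths, 2);

    printf("payload is %zu bytes\n", len);   /* 2 + 6 + 1 + 2 * 16 = 41 */
    return 0;
}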
* [Qemu-devel] [PATCH v8 25/54] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (23 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 24/54] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-20 13:25 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 26/54] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
` (28 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
stream inside a package whose length can be determined purely by reading
its header. The destination guarantees that the whole MIG_CMD_PACKAGED
blob is read off the stream before its contents are parsed.
This is used by postcopy to load device state (from the package)
while leaving the main stream free to receive memory pages.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/sysemu/sysemu.h | 4 ++
migration/savevm.c | 106 +++++++++++++++++++++++++++++++++++++++++++++---
trace-events | 4 ++
3 files changed, 109 insertions(+), 5 deletions(-)
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 204b1c3..9c78d71 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -99,9 +99,12 @@ enum qemu_vm_cmd {
MIG_CMD_POSTCOPY_RAM_DISCARD, /* A list of pages to discard that
were previously sent during
precopy but are dirty. */
+ MIG_CMD_PACKAGED, /* Send a wrapped stream within this stream */
MIG_CMD_MAX
};
+#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
+
bool qemu_savevm_state_blocked(Error **errp);
void qemu_savevm_state_begin(QEMUFile *f,
const MigrationParams *params);
@@ -114,6 +117,7 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
uint16_t len, uint8_t *data);
void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
void qemu_savevm_send_open_return_path(QEMUFile *f);
+int qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb);
void qemu_savevm_send_postcopy_advise(QEMUFile *f);
void qemu_savevm_send_postcopy_listen(QEMUFile *f);
void qemu_savevm_send_postcopy_run(QEMUFile *f);
diff --git a/migration/savevm.c b/migration/savevm.c
index 7af8165..de20b95 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -74,6 +74,7 @@ static struct mig_cmd_args {
[MIG_CMD_POSTCOPY_RUN] = { .len = 0, .name = "POSTCOPY_RUN" },
[MIG_CMD_POSTCOPY_RAM_DISCARD] = {
.len = -1, .name = "POSTCOPY_RAM_DISCARD" },
+ [MIG_CMD_PACKAGED] = { .len = 4, .name = "PACKAGED" },
[MIG_CMD_MAX] = { .len = -1, .name = "MAX" },
};
@@ -747,6 +748,50 @@ void qemu_savevm_send_open_return_path(QEMUFile *f)
qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
}
+/* We have a buffer of data to send; we don't want that all to be loaded
+ * by the command itself, so the command contains just the length of the
+ * extra buffer that we then send straight after it.
+ * TODO: Must be a better way to organise that
+ *
+ * Returns:
+ * 0 on success
+ * -ve on error
+ */
+int qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb)
+{
+ size_t cur_iov;
+ size_t len = qsb_get_length(qsb);
+ uint32_t tmp;
+
+ if (len > MAX_VM_CMD_PACKAGED_SIZE) {
+ error_report("%s: Unreasonably large packaged state: %zu",
+ __func__, len);
+ return -1;
+ }
+
+ tmp = cpu_to_be32(len);
+
+ trace_qemu_savevm_send_packaged();
+ qemu_savevm_command_send(f, MIG_CMD_PACKAGED, 4, (uint8_t *)&tmp);
+
+ /* all the data follows (concatenating the iovs) */
+ for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
+ /* The iov entries are partially filled */
+ size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
+ len :
+ qsb->iov[cur_iov].iov_len;
+ len -= towrite;
+
+ if (!towrite) {
+ break;
+ }
+
+ qemu_put_buffer(f, qsb->iov[cur_iov].iov_base, towrite);
+ }
+
+ return 0;
+}
+
/* Send prior to any postcopy transfer */
void qemu_savevm_send_postcopy_advise(QEMUFile *f)
{
@@ -1296,12 +1341,59 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
}
/**
- * loadvm_process_command: Process an incoming 'QEMU_VM_COMMAND'
+ * Immediately following this command is a blob of data containing an embedded
+ * chunk of migration stream; read it and load it.
+ *
+ * @mis: Incoming state
+ * @length: Length of packaged data to read
+ *
+ * Returns: Negative values on error
*
- * Returns: 0 on just a normal return
- * LOADVM_QUIT All good, but exit the loop
- * <0 error (in which case it will issue an error message).
- * @f: The stream to read the command data from.
+ */
+static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
+ uint32_t length)
+{
+ int ret;
+ uint8_t *buffer;
+ QEMUSizedBuffer *qsb;
+
+ trace_loadvm_handle_cmd_packaged(length);
+
+ if (length > MAX_VM_CMD_PACKAGED_SIZE) {
+ error_report("Unreasonably large packaged state: %u", length);
+ return -1;
+ }
+ buffer = g_malloc0(length);
+ ret = qemu_get_buffer(mis->from_src_file, buffer, (int)length);
+ if (ret != length) {
+ g_free(buffer);
+ error_report("CMD_PACKAGED: Buffer receive fail ret=%d length=%d\n",
+ ret, length);
+ return (ret < 0) ? ret : -EAGAIN;
+ }
+ trace_loadvm_handle_cmd_packaged_received(ret);
+
+ /* Setup a dummy QEMUFile that actually reads from the buffer */
+ qsb = qsb_create(buffer, length);
+ g_free(buffer); /* Because qsb_create copies */
+ if (!qsb) {
+ error_report("Unable to create qsb");
+ return -1;
+ }
+ QEMUFile *packf = qemu_bufopen("r", qsb);
+
+ ret = qemu_loadvm_state_main(packf, mis);
+ trace_loadvm_handle_cmd_packaged_main(ret);
+ qemu_fclose(packf);
+ qsb_free(qsb);
+
+ return ret;
+}
+
+/*
+ * Process an incoming 'QEMU_VM_COMMAND'
+ * 0 just a normal return
+ * LOADVM_QUIT All good, but exit the loop
+ * <0 Error
*/
static int loadvm_process_command(QEMUFile *f)
{
@@ -1351,6 +1443,10 @@ static int loadvm_process_command(QEMUFile *f)
migrate_send_rp_pong(mis, tmp32);
break;
+ case MIG_CMD_PACKAGED:
+ tmp32 = qemu_get_be32(f);
+ return loadvm_handle_cmd_packaged(mis, tmp32);
+
case MIG_CMD_POSTCOPY_ADVISE:
tmp64a = qemu_get_be64(f); /* hps */
tmp64b = qemu_get_be64(f); /* tps */
diff --git a/trace-events b/trace-events
index cc6668f..4bc05fd 100644
--- a/trace-events
+++ b/trace-events
@@ -1201,6 +1201,10 @@ qemu_loadvm_state_main(void) ""
qemu_loadvm_state_main_quit_parent(void) ""
qemu_loadvm_state_post_main(int ret) "%d"
qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
+qemu_savevm_send_packaged(void) ""
+loadvm_handle_cmd_packaged(unsigned int length) "%u"
+loadvm_handle_cmd_packaged_main(int ret) "%d"
+loadvm_handle_cmd_packaged_received(int ret) "%d"
loadvm_postcopy_handle_advise(void) ""
loadvm_postcopy_handle_listen(void) ""
loadvm_postcopy_handle_run(void) ""
--
2.5.0
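The framing MIG_CMD_PACKAGED adds is deliberately simple: the command carries a be32 byte count, and exactly that many bytes of nested migration stream follow immediately on the wire. A standalone read-side sketch of that framing over a plain file descriptor (the helper names and the use of stdin are illustrative; QEMU's loadvm_handle_cmd_packaged works on a QEMUFile instead):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_PACKAGED_SIZE (1u << 24)    /* mirrors MAX_VM_CMD_PACKAGED_SIZE */

static int read_full(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len) {
        ssize_t r = read(fd, p, len);
        if (r <= 0) {
            return -1;
        }
        p += r;
        len -= r;
    }
    return 0;
}

/* Read one packaged blob: be32 length, then the blob itself. */
static uint8_t *read_packaged(int fd, uint32_t *out_len)
{
    uint8_t hdr[4];
    uint32_t len;
    uint8_t *blob;

    if (read_full(fd, hdr, 4) < 0) {
        return NULL;
    }
    len = (uint32_t)hdr[0] << 24 | hdr[1] << 16 | hdr[2] << 8 | hdr[3];
    if (len > MAX_PACKAGED_SIZE) {
        fprintf(stderr, "unreasonably large package: %u\n", len);
        return NULL;
    }
    blob = malloc(len);
    if (!blob || read_full(fd, blob, len) < 0) {
        free(blob);
        return NULL;
    }
    *out_len = len;
    return blob;    /* caller parses this as a nested stream, then frees it */
}

int main(void)
{
    uint32_t len;
    uint8_t *blob = read_packaged(0, &len);   /* read from stdin for demo */

    if (blob) {
        printf("got %u packaged bytes\n", len);
        free(blob);
    }
    return 0;
}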
* [Qemu-devel] [PATCH v8 26/54] Modify save_live_pending for postcopy
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (24 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 25/54] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-28 11:03 ` Amit Shah
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 27/54] postcopy: OS support test Dr. David Alan Gilbert (git)
` (27 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Modify save_live_pending to return separate postcopiable and
non-postcopiable counts.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
include/migration/vmstate.h | 5 +++--
include/sysemu/sysemu.h | 4 +++-
migration/block.c | 7 +++++--
migration/migration.c | 9 +++++++--
migration/ram.c | 8 ++++++--
migration/savevm.c | 17 +++++++++++++----
trace-events | 2 +-
7 files changed, 38 insertions(+), 14 deletions(-)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index d0f4451..6635bac 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -54,8 +54,9 @@ typedef struct SaveVMHandlers {
/* This runs outside the iothread lock! */
int (*save_live_setup)(QEMUFile *f, void *opaque);
- uint64_t (*save_live_pending)(QEMUFile *f, void *opaque, uint64_t max_size);
-
+ void (*save_live_pending)(QEMUFile *f, void *opaque, uint64_t max_size,
+ uint64_t *non_postcopiable_pending,
+ uint64_t *postcopiable_pending);
LoadStateHandler *load_state;
} SaveVMHandlers;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 9c78d71..75fc79e 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -112,7 +112,9 @@ void qemu_savevm_state_header(QEMUFile *f);
int qemu_savevm_state_iterate(QEMUFile *f);
void qemu_savevm_state_complete_precopy(QEMUFile *f);
void qemu_savevm_state_cancel(void);
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+ uint64_t *res_non_postcopiable,
+ uint64_t *res_postcopiable);
void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
uint16_t len, uint8_t *data);
void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
diff --git a/migration/block.c b/migration/block.c
index ceae0ab..449ef7f 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -754,7 +754,9 @@ static int block_save_complete(QEMUFile *f, void *opaque)
return 0;
}
-static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
+static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+ uint64_t *non_postcopiable_pending,
+ uint64_t *postcopiable_pending)
{
/* Estimate pending number of bytes to send */
uint64_t pending;
@@ -773,7 +775,8 @@ static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
qemu_mutex_unlock_iothread();
DPRINTF("Enter save live pending %" PRIu64 "\n", pending);
- return pending;
+ /* We don't do postcopy */
+ *non_postcopiable_pending += pending;
}
static int block_load(QEMUFile *f, void *opaque, int version_id)
diff --git a/migration/migration.c b/migration/migration.c
index fe93ec8..6989e21 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1186,8 +1186,13 @@ static void *migration_thread(void *opaque)
uint64_t pending_size;
if (!qemu_file_rate_limit(s->file)) {
- pending_size = qemu_savevm_state_pending(s->file, max_size);
- trace_migrate_pending(pending_size, max_size);
+ uint64_t pend_post, pend_nonpost;
+
+ qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
+ &pend_post);
+ pending_size = pend_nonpost + pend_post;
+ trace_migrate_pending(pending_size, max_size,
+ pend_post, pend_nonpost);
if (pending_size && pending_size >= max_size) {
qemu_savevm_state_iterate(s->file);
} else {
diff --git a/migration/ram.c b/migration/ram.c
index 1ae8223..16eb119 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1359,7 +1359,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
return 0;
}
-static uint64_t ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
+static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+ uint64_t *non_postcopiable_pending,
+ uint64_t *postcopiable_pending)
{
uint64_t remaining_size;
@@ -1373,7 +1375,9 @@ static uint64_t ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
qemu_mutex_unlock_iothread();
remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
}
- return remaining_size;
+
+ /* We can do postcopy, and all the data is postcopiable */
+ *postcopiable_pending += remaining_size;
}
static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
diff --git a/migration/savevm.c b/migration/savevm.c
index de20b95..3f919e0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1053,10 +1053,19 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
qemu_fflush(f);
}
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
+/* Give an estimate of the amount left to be transferred,
+ * the result is split into the amount for units that can and
+ * for units that can't do postcopy.
+ */
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+ uint64_t *res_non_postcopiable,
+ uint64_t *res_postcopiable)
{
SaveStateEntry *se;
- uint64_t ret = 0;
+
+ *res_non_postcopiable = 0;
+ *res_postcopiable = 0;
+
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
if (!se->ops || !se->ops->save_live_pending) {
@@ -1067,9 +1076,9 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
continue;
}
}
- ret += se->ops->save_live_pending(f, se->opaque, max_size);
+ se->ops->save_live_pending(f, se->opaque, max_size,
+ res_non_postcopiable, res_postcopiable);
}
- return ret;
}
void qemu_savevm_state_cancel(void)
diff --git a/trace-events b/trace-events
index 4bc05fd..4fff040 100644
--- a/trace-events
+++ b/trace-events
@@ -1438,7 +1438,7 @@ migrate_set_state(int new_state) "new state %d"
migrate_fd_cleanup(void) ""
migrate_fd_error(void) ""
migrate_fd_cancel(void) ""
-migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
+migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
open_return_path_on_source(void) ""
open_return_path_on_source_continue(void) ""
--
2.5.0
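The effect of the new save_live_pending signature is easiest to see in isolation: each handler adds its estimate to whichever of the two accumulators applies, and the caller works with the sum. A toy standalone sketch with invented handlers (the figures and names are made up; only the shape of the interface follows the patch):

#include <stdint.h>
#include <stdio.h>

typedef void (*pending_fn)(uint64_t max_size,
                           uint64_t *non_postcopiable, uint64_t *postcopiable);

/* RAM-like handler: everything it has left can be postcopied. */
static void ram_pending(uint64_t max_size,
                        uint64_t *non_postcopiable, uint64_t *postcopiable)
{
    (void)max_size;
    (void)non_postcopiable;
    *postcopiable += 512 * 4096;         /* pretend 512 dirty pages remain */
}

/* Block-like handler: it does not support postcopy at all. */
static void block_pending(uint64_t max_size,
                          uint64_t *non_postcopiable, uint64_t *postcopiable)
{
    (void)max_size;
    (void)postcopiable;
    *non_postcopiable += 8 * 1024 * 1024;
}

int main(void)
{
    pending_fn handlers[] = { ram_pending, block_pending };
    uint64_t nonpost = 0, post = 0;
    uint64_t max_size = 1 * 1024 * 1024;

    for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
        handlers[i](max_size, &nonpost, &post);
    }
    /* The migration thread looks at the sum to decide iterate vs complete. */
    printf("pending: nonpost=%llu post=%llu total=%llu\n",
           (unsigned long long)nonpost, (unsigned long long)post,
           (unsigned long long)(nonpost + post));
    return 0;
}

Later patches in the series use the non-postcopiable part of the split to judge when switching to postcopy is worthwhile.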
* [Qemu-devel] [PATCH v8 27/54] postcopy: OS support test
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (25 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 26/54] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-20 13:31 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
` (26 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Provide a check to see if the OS we're running on has all the bits
needed for postcopy.
Create postcopy-ram.c, which will gain most of the other postcopy helpers we need.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/postcopy-ram.h | 19 +++++
migration/Makefile.objs | 2 +-
migration/postcopy-ram.c | 157 +++++++++++++++++++++++++++++++++++++++
migration/savevm.c | 5 ++
4 files changed, 182 insertions(+), 1 deletion(-)
create mode 100644 include/migration/postcopy-ram.h
create mode 100644 migration/postcopy-ram.c
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
new file mode 100644
index 0000000..d81934f
--- /dev/null
+++ b/include/migration/postcopy-ram.h
@@ -0,0 +1,19 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ * Dave Gilbert <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_POSTCOPY_RAM_H
+#define QEMU_POSTCOPY_RAM_H
+
+/* Return true if the host supports everything we need to do postcopy-ram */
+bool postcopy_ram_supported_by_host(void);
+
+#endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d929e96..0cac6d7 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,7 +1,7 @@
common-obj-y += migration.o tcp.o
common-obj-y += vmstate.o
common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
-common-obj-y += xbzrle.o
+common-obj-y += xbzrle.o postcopy-ram.o
common-obj-$(CONFIG_RDMA) += rdma.o
common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
new file mode 100644
index 0000000..cdd0168
--- /dev/null
+++ b/migration/postcopy-ram.c
@@ -0,0 +1,157 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013-2015 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ * Dave Gilbert <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/*
+ * Postcopy is a migration technique where the execution flips from the
+ * source to the destination before all the data has been copied.
+ */
+
+#include <glib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "qemu-common.h"
+#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
+#include "sysemu/sysemu.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+
+/* Postcopy needs to detect accesses to pages that haven't yet been copied
+ * across, and efficiently map new pages in, the techniques for doing this
+ * are target OS specific.
+ */
+#if defined(__linux__)
+
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <asm/types.h> /* for __u64 */
+#endif
+
+#if defined(__linux__) && defined(__NR_userfaultfd)
+#include <linux/userfaultfd.h>
+
+static bool ufd_version_check(int ufd)
+{
+ struct uffdio_api api_struct;
+ uint64_t ioctl_mask;
+
+ api_struct.api = UFFD_API;
+ api_struct.features = 0;
+ if (ioctl(ufd, UFFDIO_API, &api_struct)) {
+ error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
+ strerror(errno));
+ return false;
+ }
+
+ ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
+ (__u64)1 << _UFFDIO_UNREGISTER;
+ if ((api_struct.ioctls & ioctl_mask) != ioctl_mask) {
+ error_report("Missing userfault features: %" PRIx64,
+ (uint64_t)(~api_struct.ioctls & ioctl_mask));
+ return false;
+ }
+
+ return true;
+}
+
+bool postcopy_ram_supported_by_host(void)
+{
+ long pagesize = getpagesize();
+ int ufd = -1;
+ bool ret = false; /* Error unless we change it */
+ void *testarea = NULL;
+ struct uffdio_register reg_struct;
+ struct uffdio_range range_struct;
+ uint64_t feature_mask;
+
+ if ((1ul << qemu_target_page_bits()) > pagesize) {
+ error_report("Target page size bigger than host page size");
+ goto out;
+ }
+
+ ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+ if (ufd == -1) {
+ error_report("%s: userfaultfd not available: %s", __func__,
+ strerror(errno));
+ goto out;
+ }
+
+ /* Version and features check */
+ if (!ufd_version_check(ufd)) {
+ goto out;
+ }
+
+ /*
+ * We need to check that the ops we need are supported on anon memory
+ * To do that we need to register a chunk and see the flags that
+ * are returned.
+ */
+ testarea = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+ MAP_ANONYMOUS, -1, 0);
+ if (testarea == MAP_FAILED) {
+ error_report("%s: Failed to map test area: %s", __func__,
+ strerror(errno));
+ goto out;
+ }
+ g_assert(((size_t)testarea & (pagesize-1)) == 0);
+
+ reg_struct.range.start = (uintptr_t)testarea;
+ reg_struct.range.len = pagesize;
+ reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+ if (ioctl(ufd, UFFDIO_REGISTER, &reg_struct)) {
+ error_report("%s userfault register: %s", __func__, strerror(errno));
+ goto out;
+ }
+
+ range_struct.start = (uintptr_t)testarea;
+ range_struct.len = pagesize;
+ if (ioctl(ufd, UFFDIO_UNREGISTER, &range_struct)) {
+ error_report("%s userfault unregister: %s", __func__, strerror(errno));
+ goto out;
+ }
+
+ feature_mask = (__u64)1 << _UFFDIO_WAKE |
+ (__u64)1 << _UFFDIO_COPY |
+ (__u64)1 << _UFFDIO_ZEROPAGE;
+ if ((reg_struct.ioctls & feature_mask) != feature_mask) {
+ error_report("Missing userfault map features: %" PRIx64,
+ (uint64_t)(~reg_struct.ioctls & feature_mask));
+ goto out;
+ }
+
+ /* Success! */
+ ret = true;
+out:
+ if (testarea) {
+ munmap(testarea, pagesize);
+ }
+ if (ufd != -1) {
+ close(ufd);
+ }
+ return ret;
+}
+
+#else
+/* No target OS support, stubs just fail */
+bool postcopy_ram_supported_by_host(void)
+{
+ error_report("%s: No OS support", __func__);
+ return false;
+}
+
+#endif
+
diff --git a/migration/savevm.c b/migration/savevm.c
index 3f919e0..c065ae8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -37,6 +37,7 @@
#include "qemu/timer.h"
#include "audio/audio.h"
#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
#include "qapi/qmp/qerror.h"
#include "qemu/error-report.h"
#include "qemu/sockets.h"
@@ -1205,6 +1206,10 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
return -1;
}
+ if (!postcopy_ram_supported_by_host()) {
+ return -1;
+ }
+
if (remote_hps != getpagesize()) {
/*
* Some combinations of mismatch are probably possible but it gets
--
2.5.0
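For checking a kernel outside of QEMU, roughly the same probe can be reduced to a tiny standalone program doing the UFFDIO_API handshake (Linux 4.3+ headers assumed; this is a sketch only and deliberately skips the register/unregister test that postcopy_ram_supported_by_host also performs):

#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    struct uffdio_api api;
    int ufd = syscall(__NR_userfaultfd, O_CLOEXEC);

    if (ufd == -1) {
        fprintf(stderr, "userfaultfd not available: %s\n", strerror(errno));
        return 1;
    }

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;                 /* version/feature handshake */
    if (ioctl(ufd, UFFDIO_API, &api)) {
        fprintf(stderr, "UFFDIO_API failed: %s\n", strerror(errno));
        close(ufd);
        return 1;
    }

    printf("userfaultfd OK, ioctls mask 0x%llx\n",
           (unsigned long long)api.ioctls);
    close(ufd);
    return 0;
}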
* [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (26 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 27/54] postcopy: OS support test Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-09-30 16:25 ` Eric Blake
` (2 more replies)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 29/54] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
` (25 subsequent siblings)
53 siblings, 3 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Once postcopy is enabled (with migrate_set_capability), the migration
will still start in precopy mode. To trigger the transition into postcopy
the:
migrate_start_postcopy
command must be issued. Postcopy will start sometime after this
(when it's next checked in the migration loop).
Issuing the command before migration has started is an error,
and issuing it after migration has finished is ignored.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
hmp-commands.hx | 15 +++++++++++++++
hmp.c | 7 +++++++
hmp.h | 1 +
include/migration/migration.h | 3 +++
migration/migration.c | 22 ++++++++++++++++++++++
qapi-schema.json | 8 ++++++++
qmp-commands.hx | 19 +++++++++++++++++++
7 files changed, 75 insertions(+)
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3a4ae39..8939b98 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1008,6 +1008,21 @@ Set the parameter @var{parameter} for migration.
ETEXI
{
+ .name = "migrate_start_postcopy",
+ .args_type = "",
+ .params = "",
+ .help = "Switch migration to postcopy mode",
+ .mhandler.cmd = hmp_migrate_start_postcopy,
+ },
+
+STEXI
+@item migrate_start_postcopy
+@findex migrate_start_postcopy
+Switch in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+ETEXI
+
+ {
.name = "client_migrate_info",
.args_type = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
.params = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 3f807b7..38dd8f7 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1272,6 +1272,13 @@ void hmp_client_migrate_info(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, &err);
}
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
+{
+ Error *err = NULL;
+ qmp_migrate_start_postcopy(&err);
+ hmp_handle_error(mon, &err);
+}
+
void hmp_set_password(Monitor *mon, const QDict *qdict)
{
const char *protocol = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 81656c3..a8c5b5a 100644
--- a/hmp.h
+++ b/hmp.h
@@ -69,6 +69,7 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
void hmp_set_password(Monitor *mon, const QDict *qdict);
void hmp_expire_password(Monitor *mon, const QDict *qdict);
void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2e9fa3c..2176666 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -127,6 +127,9 @@ struct MigrationState
int64_t xbzrle_cache_size;
int64_t setup_time;
int64_t dirty_sync_count;
+
+ /* Flag set once the migration has been asked to enter postcopy */
+ bool start_postcopy;
};
void process_incoming_migration(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index 6989e21..5ee2c11 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -616,6 +616,28 @@ void qmp_migrate_set_parameters(bool has_compress_level,
}
}
+void qmp_migrate_start_postcopy(Error **errp)
+{
+ MigrationState *s = migrate_get_current();
+
+ if (!migrate_postcopy_ram()) {
+ error_setg(errp, "Enable postcopy with migration_set_capability before"
+ " the start of migration");
+ return;
+ }
+
+ if (s->state == MIGRATION_STATUS_NONE) {
+ error_setg(errp, "Postcopy must be started after migration has been"
+ " started");
+ return;
+ }
+ /*
+ * we don't error if migration has finished since that would be racy
+ * with issuing this command.
+ */
+ atomic_set(&s->start_postcopy, true);
+}
+
/* shared migration helpers */
static void migrate_set_state(MigrationState *s, int old_state, int new_state)
diff --git a/qapi-schema.json b/qapi-schema.json
index c6f1942..48644f5 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -670,6 +670,14 @@
'*tls-port': 'int', '*cert-subject': 'str' } }
##
+# @migrate-start-postcopy
+#
+# Switch migration to postcopy mode
+#
+# Since: 2.5
+{ 'command': 'migrate-start-postcopy' }
+
+##
# @MouseInfo:
#
# Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index d2ba800..6d8547a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -718,6 +718,25 @@ Example:
EQMP
{
+ .name = "migrate-start-postcopy",
+ .args_type = "",
+ .mhandler.cmd_new = qmp_marshal_migrate_start_postcopy,
+ },
+
+SQMP
+migrate-start-postcopy
+----------------------
+
+Switch an in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+
+Example:
+-> { "execute": "migrate-start-postcopy" }
+<- { "return": {} }
+
+EQMP
+
+ {
.name = "query-migrate-cache-size",
.args_type = "",
.mhandler.cmd_new = qmp_marshal_query_migrate_cache_size,
--
2.5.0
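The command itself only records a request; the migration thread notices it at a safe point in its loop. Stripped of the QEMU specifics, the pattern is a control thread flipping an atomic flag that a worker polls. A standalone sketch with C11 atomics and pthreads (sleep intervals and iteration counts are arbitrary):

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_bool start_postcopy;

/* Stand-in for the migration thread's main loop. */
static void *migration_thread(void *opaque)
{
    (void)opaque;
    for (int iteration = 0; iteration < 50; iteration++) {
        /* ... send another chunk of precopy RAM here ... */
        usleep(10 * 1000);
        if (atomic_load(&start_postcopy)) {
            printf("iteration %d: switching to postcopy\n", iteration);
            break;
        }
    }
    return NULL;
}

/* Stand-in for the QMP/HMP command handler. */
static void migrate_start_postcopy(void)
{
    atomic_store(&start_postcopy, true);
}

int main(void)
{
    pthread_t th;

    pthread_create(&th, NULL, migration_thread, NULL);
    usleep(100 * 1000);          /* let precopy "run" for a while */
    migrate_start_postcopy();    /* operator decides it is time */
    pthread_join(th, NULL);
    return 0;
}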
* [Qemu-devel] [PATCH v8 29/54] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (27 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-20 13:35 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 30/54] Avoid sending vmdescription during postcopy Dr. David Alan Gilbert (git)
` (24 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
'migration_in_postcopy' is provided for other sections to know if
they're in postcopy.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 2 ++
migration/migration.c | 56 ++++++++++++++++++++++++++++++++++++-------
qapi-schema.json | 4 +++-
trace-events | 1 +
4 files changed, 54 insertions(+), 9 deletions(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2176666..219032d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -170,6 +170,8 @@ MigrationState *migrate_init(const MigrationParams *params);
bool migration_in_setup(MigrationState *);
bool migration_has_finished(MigrationState *);
bool migration_has_failed(MigrationState *);
+/* True if outgoing migration has entered postcopy phase */
+bool migration_in_postcopy(MigrationState *);
MigrationState *migrate_get_current(void);
void migrate_compress_threads_create(void);
diff --git a/migration/migration.c b/migration/migration.c
index 5ee2c11..2ae5909 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -439,6 +439,7 @@ static bool migration_is_active(MigrationState *ms)
{
switch (ms->state) {
case MIGRATION_STATUS_ACTIVE:
+ case MIGRATION_STATUS_POSTCOPY_ACTIVE:
case MIGRATION_STATUS_SETUP:
return true;
@@ -509,6 +510,39 @@ MigrationInfo *qmp_query_migrate(Error **errp)
get_xbzrle_cache_stats(info);
break;
+ case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+ /* Mostly the same as active; TODO add some postcopy stats */
+ info->has_status = true;
+ info->has_total_time = true;
+ info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+ - s->total_time;
+ info->has_expected_downtime = true;
+ info->expected_downtime = s->expected_downtime;
+ info->has_setup_time = true;
+ info->setup_time = s->setup_time;
+
+ info->has_ram = true;
+ info->ram = g_malloc0(sizeof(*info->ram));
+ info->ram->transferred = ram_bytes_transferred();
+ info->ram->remaining = ram_bytes_remaining();
+ info->ram->total = ram_bytes_total();
+ info->ram->duplicate = dup_mig_pages_transferred();
+ info->ram->skipped = skipped_mig_pages_transferred();
+ info->ram->normal = norm_mig_pages_transferred();
+ info->ram->normal_bytes = norm_mig_bytes_transferred();
+ info->ram->dirty_pages_rate = s->dirty_pages_rate;
+ info->ram->mbps = s->mbps;
+
+ if (blk_mig_active()) {
+ info->has_disk = true;
+ info->disk = g_malloc0(sizeof(*info->disk));
+ info->disk->transferred = blk_mig_bytes_transferred();
+ info->disk->remaining = blk_mig_bytes_remaining();
+ info->disk->total = blk_mig_bytes_total();
+ }
+
+ get_xbzrle_cache_stats(info);
+ break;
case MIGRATION_STATUS_COMPLETED:
get_xbzrle_cache_stats(info);
@@ -550,8 +584,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
MigrationState *s = migrate_get_current();
MigrationCapabilityStatusList *cap;
- if (s->state == MIGRATION_STATUS_ACTIVE ||
- s->state == MIGRATION_STATUS_SETUP) {
+ if (migration_is_active(s)) {
error_setg(errp, QERR_MIGRATION_ACTIVE);
return;
}
@@ -666,7 +699,8 @@ static void migrate_fd_cleanup(void *opaque)
s->file = NULL;
}
- assert(s->state != MIGRATION_STATUS_ACTIVE);
+ assert((s->state != MIGRATION_STATUS_ACTIVE) &&
+ (s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE));
if (s->state != MIGRATION_STATUS_COMPLETED) {
qemu_savevm_state_cancel();
@@ -700,8 +734,7 @@ static void migrate_fd_cancel(MigrationState *s)
do {
old_state = s->state;
- if (old_state != MIGRATION_STATUS_SETUP &&
- old_state != MIGRATION_STATUS_ACTIVE) {
+ if (!migration_is_active(s)) {
break;
}
migrate_set_state(s, old_state, MIGRATION_STATUS_CANCELLING);
@@ -745,6 +778,11 @@ bool migration_has_failed(MigrationState *s)
s->state == MIGRATION_STATUS_FAILED);
}
+bool migration_in_postcopy(MigrationState *s)
+{
+ return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+}
+
MigrationState *migrate_init(const MigrationParams *params)
{
MigrationState *s = migrate_get_current();
@@ -825,8 +863,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
params.blk = has_blk && blk;
params.shared = has_inc && inc;
- if (s->state == MIGRATION_STATUS_ACTIVE ||
- s->state == MIGRATION_STATUS_SETUP ||
+ if (migration_is_active(s) ||
s->state == MIGRATION_STATUS_CANCELLING) {
error_setg(errp, QERR_MIGRATION_ACTIVE);
return;
@@ -1203,7 +1240,10 @@ static void *migration_thread(void *opaque)
s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
- while (s->state == MIGRATION_STATUS_ACTIVE) {
+ trace_migration_thread_setup_complete();
+
+ while (s->state == MIGRATION_STATUS_ACTIVE ||
+ s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
int64_t current_time;
uint64_t pending_size;
diff --git a/qapi-schema.json b/qapi-schema.json
index 48644f5..f814174 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -430,6 +430,8 @@
#
# @active: in the process of doing migration.
#
+# @postcopy-active: like active, but now in postcopy mode. (since 2.5)
+#
# @completed: migration is finished.
#
# @failed: some error occurred during migration process.
@@ -439,7 +441,7 @@
##
{ 'enum': 'MigrationStatus',
'data': [ 'none', 'setup', 'cancelling', 'cancelled',
- 'active', 'completed', 'failed' ] }
+ 'active', 'postcopy-active', 'completed', 'failed' ] }
##
# @MigrationInfo
diff --git a/trace-events b/trace-events
index 4fff040..e68e69d 100644
--- a/trace-events
+++ b/trace-events
@@ -1440,6 +1440,7 @@ migrate_fd_error(void) ""
migrate_fd_cancel(void) ""
migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migration_thread_setup_complete(void) ""
open_return_path_on_source(void) ""
open_return_path_on_source_continue(void) ""
source_return_path_thread_bad_end(void) ""
--
2.5.0
* [Qemu-devel] [PATCH v8 30/54] Avoid sending vmdescription during postcopy
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (28 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 29/54] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-20 13:35 ` Juan Quintela
2015-10-28 11:19 ` Amit Shah
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 31/54] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
` (23 subsequent siblings)
53 siblings, 2 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
The VMDescription section is normally sent at the end, after all
of the devices; however for postcopy that point is not the end of
the stream, so don't send it when in postcopy mode.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/savevm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index c065ae8..5a98bb4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -974,7 +974,8 @@ int qemu_savevm_state_iterate(QEMUFile *f)
static bool should_send_vmdesc(void)
{
MachineState *machine = MACHINE(qdev_get_machine());
- return !machine->suppress_vmdesc;
+ bool in_postcopy = migration_in_postcopy(migrate_get_current());
+ return !machine->suppress_vmdesc && !in_postcopy;
}
void qemu_savevm_state_complete_precopy(QEMUFile *f)
--
2.5.0
* [Qemu-devel] [PATCH v8 31/54] Add qemu_savevm_state_complete_postcopy
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (29 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 30/54] Avoid sending vmdescription during postcopy Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 32/54] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
` (22 subsequent siblings)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Add qemu_savevm_state_complete_postcopy to complement
qemu_savevm_state_complete_precopy, together with a new
save_live_complete_postcopy method on devices.
The save_live_complete_precopy method is called on
all devices during a precopy migration, and on all non-postcopiable
devices at the transition during a postcopy migration.
The save_live_complete_postcopy method is called at
the end of postcopy for all postcopiable devices.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/vmstate.h | 1 +
include/sysemu/sysemu.h | 1 +
migration/ram.c | 1 +
migration/savevm.c | 49 +++++++++++++++++++++++++++++++++++++++++++--
4 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 6635bac..bfe71a8 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -40,6 +40,7 @@ typedef struct SaveVMHandlers {
SaveStateHandler *save_state;
void (*cancel)(void *opaque);
+ int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
/* This runs both outside and inside the iothread lock. */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 75fc79e..9a0d0b5 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -110,6 +110,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
const MigrationParams *params);
void qemu_savevm_state_header(QEMUFile *f);
int qemu_savevm_state_iterate(QEMUFile *f);
+void qemu_savevm_state_complete_postcopy(QEMUFile *f);
void qemu_savevm_state_complete_precopy(QEMUFile *f);
void qemu_savevm_state_cancel(void);
void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
diff --git a/migration/ram.c b/migration/ram.c
index 16eb119..8644675 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1696,6 +1696,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
static SaveVMHandlers savevm_ram_handlers = {
.save_live_setup = ram_save_setup,
.save_live_iterate = ram_save_iterate,
+ .save_live_complete_postcopy = ram_save_complete,
.save_live_complete_precopy = ram_save_complete,
.save_live_pending = ram_save_pending,
.load_state = ram_load,
diff --git a/migration/savevm.c b/migration/savevm.c
index 5a98bb4..52fca3c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -978,19 +978,61 @@ static bool should_send_vmdesc(void)
return !machine->suppress_vmdesc && !in_postcopy;
}
+/*
+ * Calls the save_live_complete_postcopy methods
+ * causing the last few pages to be sent immediately and doing any associated
+ * cleanup.
+ * Note postcopy also calls qemu_savevm_state_complete_precopy to complete
+ * all the other devices, but that happens at the point we switch to postcopy.
+ */
+void qemu_savevm_state_complete_postcopy(QEMUFile *f)
+{
+ SaveStateEntry *se;
+ int ret;
+
+ QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+ if (!se->ops || !se->ops->save_live_complete_postcopy) {
+ continue;
+ }
+ if (se->ops && se->ops->is_active) {
+ if (!se->ops->is_active(se->opaque)) {
+ continue;
+ }
+ }
+ trace_savevm_section_start(se->idstr, se->section_id);
+ /* Section type */
+ qemu_put_byte(f, QEMU_VM_SECTION_END);
+ qemu_put_be32(f, se->section_id);
+
+ ret = se->ops->save_live_complete_postcopy(f, se->opaque);
+ trace_savevm_section_end(se->idstr, se->section_id, ret);
+ save_section_footer(f, se);
+ if (ret < 0) {
+ qemu_file_set_error(f, ret);
+ return;
+ }
+ }
+
+ qemu_put_byte(f, QEMU_VM_EOF);
+ qemu_fflush(f);
+}
+
void qemu_savevm_state_complete_precopy(QEMUFile *f)
{
QJSON *vmdesc;
int vmdesc_len;
SaveStateEntry *se;
int ret;
+ bool in_postcopy = migration_in_postcopy(migrate_get_current());
trace_savevm_state_complete_precopy();
cpu_synchronize_all_states();
QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
- if (!se->ops || !se->ops->save_live_complete_precopy) {
+ if (!se->ops ||
+ (in_postcopy && se->ops->save_live_complete_postcopy) ||
+ !se->ops->save_live_complete_precopy) {
continue;
}
if (se->ops && se->ops->is_active) {
@@ -1039,7 +1081,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
save_section_footer(f, se);
}
- qemu_put_byte(f, QEMU_VM_EOF);
+ if (!in_postcopy) {
+ /* Postcopy stream will still be going */
+ qemu_put_byte(f, QEMU_VM_EOF);
+ }
json_end_array(vmdesc);
qjson_finish(vmdesc);
--
2.5.0
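The subtle part of this patch is the selection in qemu_savevm_state_complete_precopy: at the precopy-to-postcopy transition, any device that offers save_live_complete_postcopy is skipped there and finished later. A reduced standalone sketch of that dispatch over a couple of invented handler structs (names and output are illustrative only):

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char *name;
    void (*complete_precopy)(void);
    void (*complete_postcopy)(void);   /* NULL if the device can't postcopy */
} Handler;

static void ram_complete(void)  { printf("  RAM: send remaining pages\n"); }
static void dev_complete(void)  { printf("  device: send full state\n"); }

static Handler handlers[] = {
    { "ram",    ram_complete, ram_complete },  /* same fn for both phases */
    { "device", dev_complete, NULL },          /* precopy-only */
};

#define N (sizeof(handlers) / sizeof(handlers[0]))

/* At the precopy->postcopy transition: complete only non-postcopiable devices.
 * At the end of a pure precopy migration (in_postcopy false): complete all. */
static void complete_precopy(bool in_postcopy)
{
    printf("complete_precopy(in_postcopy=%d)\n", in_postcopy);
    for (size_t i = 0; i < N; i++) {
        if (in_postcopy && handlers[i].complete_postcopy) {
            continue;   /* this device finishes later, in postcopy */
        }
        handlers[i].complete_precopy();
    }
}

/* At the very end of postcopy: complete the devices skipped above. */
static void complete_postcopy(void)
{
    printf("complete_postcopy()\n");
    for (size_t i = 0; i < N; i++) {
        if (handlers[i].complete_postcopy) {
            handlers[i].complete_postcopy();
        }
    }
}

int main(void)
{
    complete_precopy(true);    /* switching into postcopy */
    complete_postcopy();       /* end of postcopy */
    return 0;
}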
* [Qemu-devel] [PATCH v8 32/54] Postcopy: Maintain sentmap and calculate discard
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (30 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 31/54] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-21 11:17 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
` (21 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Where postcopy is preceded by a period of precopy, the destination will
have received pages that may have been dirtied on the source after the
page was sent. The destination must throw these pages away before
starting its CPUs.
Maintain a 'sentmap' of pages that have already been sent.
Calculate the list of sent & dirty pages.
Provide helpers on the destination side to discard these.
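To make that combination concrete, here is a self-contained sketch (not the
patch code; the bitmap size and page numbers are invented) of how the discard
set is formed from the sentmap and the dirty bitmap:
#include <stdio.h>
#define NWORDS 2                                   /* tiny bitmap, illustration only */
int main(void)
{
    unsigned int bpw = 8 * sizeof(unsigned long);  /* bits (pages) per word */
    /* bit N set in sentmap => page N went out during precopy
     * bit N set in dirty   => page N was modified after it was sent */
    unsigned long sentmap[NWORDS] = { 0x0fUL, 0UL }; /* pages 0-3 sent      */
    unsigned long dirty[NWORDS]   = { 0x05UL, 0UL }; /* pages 0 and 2 dirty */
    unsigned long discard[NWORDS];
    unsigned int i, p;
    for (i = 0; i < NWORDS; i++) {
        /* the same combination the patch performs with bitmap_complement()
         * and bitmap_or(): discard = ~sentmap | dirty */
        discard[i] = ~sentmap[i] | dirty[i];
    }
    for (p = 0; p < NWORDS * bpw; p++) {
        if (discard[p / bpw] & (1UL << (p % bpw))) {
            printf("discard page %u\n", p);
        }
    }
    return 0;
}
Only pages 1 and 3 (sent and still clean) are kept by the destination;
everything else is discarded and re-fetched on demand during postcopy.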
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 12 +++
include/migration/postcopy-ram.h | 35 +++++++
include/qemu/typedefs.h | 1 +
migration/migration.c | 1 +
migration/postcopy-ram.c | 129 +++++++++++++++++++++++
migration/ram.c | 218 ++++++++++++++++++++++++++++++++++++++-
migration/savevm.c | 2 -
trace-events | 5 +
8 files changed, 396 insertions(+), 7 deletions(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 219032d..4904d00 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -130,6 +130,13 @@ struct MigrationState
/* Flag set once the migration has been asked to enter postcopy */
bool start_postcopy;
+
+ /* bitmap of pages that have been sent at least once
+ * only maintained and used in postcopy at the moment
+ * where it's used to send the dirtymap at the start
+ * of the postcopy phase
+ */
+ unsigned long *sentmap;
};
void process_incoming_migration(QEMUFile *f);
@@ -199,6 +206,11 @@ double xbzrle_mig_cache_miss_rate(void);
void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
+/* For outgoing discard bitmap */
+int ram_postcopy_send_discard_bitmap(MigrationState *ms);
+/* For incoming postcopy discard */
+int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
+ uint64_t start, size_t length);
/**
* @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index d81934f..80ed2d9 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -16,4 +16,39 @@
/* Return true if the host supports everything we need to do postcopy-ram */
bool postcopy_ram_supported_by_host(void);
+/*
+ * Discard the contents of 'length' bytes from 'start'
+ * We can assume that if we've been called, postcopy_ram_supported_by_host returned true
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+ size_t length);
+
+
+/*
+ * Called at the start of each RAMBlock by the bitmap code.
+ * 'offset' is the bitmap offset of the named RAMBlock in the migration
+ * bitmap.
+ * Returns a new PDS
+ */
+PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
+ unsigned long offset,
+ const char *name);
+
+/*
+ * Called by the bitmap code for each chunk to discard.
+ * May send a discard message, may just leave it queued to
+ * be sent later.
+ * 'start' and 'end' describe an inclusive range of pages in the
+ * migration bitmap in the RAM block passed to postcopy_discard_send_init.
+ */
+void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
+ unsigned long start, unsigned long end);
+
+/*
+ * Called at the end of each RAMBlock by the bitmap code.
+ * Sends any outstanding discard messages, frees the PDS.
+ */
+void postcopy_discard_send_finish(MigrationState *ms,
+ PostcopyDiscardState *pds);
+
#endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 0bf7967..32332b8 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -63,6 +63,7 @@ typedef struct PCMachineState PCMachineState;
typedef struct PCMachineClass PCMachineClass;
typedef struct PCMCIACardState PCMCIACardState;
typedef struct PixelFormat PixelFormat;
+typedef struct PostcopyDiscardState PostcopyDiscardState;
typedef struct PropertyInfo PropertyInfo;
typedef struct Property Property;
typedef struct QEMUBH QEMUBH;
diff --git a/migration/migration.c b/migration/migration.c
index 2ae5909..b57a0e6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -24,6 +24,7 @@
#include "qemu/sockets.h"
#include "qemu/rcu.h"
#include "migration/block.h"
+#include "migration/postcopy-ram.h"
#include "qemu/thread.h"
#include "qmp-commands.h"
#include "trace.h"
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index cdd0168..10c9cab 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -27,6 +27,24 @@
#include "qemu/error-report.h"
#include "trace.h"
+/* Arbitrary limit on size of each discard command,
+ * keeps them around ~200 bytes
+ */
+#define MAX_DISCARDS_PER_COMMAND 12
+
+struct PostcopyDiscardState {
+ const char *name;
+ uint64_t offset; /* Bitmap entry for the 1st bit of this RAMBlock */
+ uint16_t cur_entry;
+ /*
+ * Start and length of a discard range (bytes)
+ */
+ uint64_t start_list[MAX_DISCARDS_PER_COMMAND];
+ uint64_t length_list[MAX_DISCARDS_PER_COMMAND];
+ unsigned int nsentwords;
+ unsigned int nsentcmds;
+};
+
/* Postcopy needs to detect accesses to pages that haven't yet been copied
* across, and efficiently map new pages in, the techniques for doing this
* are target OS specific.
@@ -145,6 +163,27 @@ out:
return ret;
}
+/**
+ * postcopy_ram_discard_range: Discard a range of memory.
+ * We can assume that if we've been called, postcopy_ram_supported_by_host returned true.
+ *
+ * @mis: Current incoming migration state.
+ * @start, @length: range of memory to discard.
+ *
+ * returns: 0 on success.
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+ size_t length)
+{
+ trace_postcopy_ram_discard_range(start, length);
+ if (madvise(start, length, MADV_DONTNEED)) {
+ error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));
+ return -1;
+ }
+
+ return 0;
+}
+
#else
/* No target OS support, stubs just fail */
bool postcopy_ram_supported_by_host(void)
@@ -153,5 +192,95 @@ bool postcopy_ram_supported_by_host(void)
return false;
}
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+ size_t length)
+{
+ assert(0);
+}
#endif
+/* ------------------------------------------------------------------------- */
+
+/**
+ * postcopy_discard_send_init: Called at the start of each RAMBlock before
+ * asking to discard individual ranges.
+ *
+ * @ms: The current migration state.
+ * @offset: the bitmap offset of the named RAMBlock in the migration
+ * bitmap.
+ * @name: RAMBlock that discards will operate on.
+ *
+ * returns: a new PDS.
+ */
+PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
+ unsigned long offset,
+ const char *name)
+{
+ PostcopyDiscardState *res = g_try_malloc(sizeof(PostcopyDiscardState));
+
+ if (res) {
+ res->name = name;
+ res->cur_entry = 0;
+ res->nsentwords = 0;
+ res->nsentcmds = 0;
+ res->offset = offset;
+ }
+
+ return res;
+}
+
+/**
+ * postcopy_discard_send_range: Called by the bitmap code for each chunk to
+ * discard. May send a discard message, may just leave it queued to
+ * be sent later.
+ *
+ * @ms: Current migration state.
+ * @pds: Structure initialised by postcopy_discard_send_init().
+ * @start,@end: an inclusive range of pages in the migration bitmap in the
+ * RAM block passed to postcopy_discard_send_init().
+ */
+void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
+ unsigned long start, unsigned long end)
+{
+ size_t tp_bits = qemu_target_page_bits();
+ /* Convert to byte offsets within the RAM block */
+ pds->start_list[pds->cur_entry] = (start - pds->offset) << tp_bits;
+ pds->length_list[pds->cur_entry] = ((1 + end - pds->offset) << tp_bits) -
+ pds->start_list[pds->cur_entry];
+ pds->cur_entry++;
+ pds->nsentwords++;
+
+ if (pds->cur_entry == MAX_DISCARDS_PER_COMMAND) {
+ /* Full set, ship it! */
+ qemu_savevm_send_postcopy_ram_discard(ms->file, pds->name,
+ pds->cur_entry,
+ pds->start_list,
+ pds->length_list);
+ pds->nsentcmds++;
+ pds->cur_entry = 0;
+ }
+}
+
+/**
+ * postcopy_discard_send_finish: Called at the end of each RAMBlock by the
+ * bitmap code. Sends any outstanding discard messages, frees the PDS
+ *
+ * @ms: Current migration state.
+ * @pds: Structure initialised by postcopy_discard_send_init().
+ */
+void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
+{
+ /* Anything unsent? */
+ if (pds->cur_entry) {
+ qemu_savevm_send_postcopy_ram_discard(ms->file, pds->name,
+ pds->cur_entry,
+ pds->start_list,
+ pds->length_list);
+ pds->nsentcmds++;
+ }
+
+ trace_postcopy_discard_send_finish(pds->name, pds->nsentwords,
+ pds->nsentcmds);
+
+ g_free(pds);
+}
diff --git a/migration/ram.c b/migration/ram.c
index 8644675..e1c6c4a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -32,6 +32,7 @@
#include "qemu/timer.h"
#include "qemu/main-loop.h"
#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
#include "exec/address-spaces.h"
#include "migration/page_cache.h"
#include "qemu/error-report.h"
@@ -506,10 +507,18 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
return 1;
}
-/* Called with rcu_read_lock() to protect migration_bitmap */
+/* Called with rcu_read_lock() to protect migration_bitmap
+ * rb: The RAMBlock to search for dirty pages in
+ * start: Start address (typically so we can continue from previous page)
+ * ram_addr_abs: Pointer into which to store the address of the dirty page
+ * within the global ram_addr space
+ *
+ * Returns: byte offset within memory region of the start of a dirty page
+ */
static inline
ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb,
- ram_addr_t start)
+ ram_addr_t start,
+ ram_addr_t *ram_addr_abs)
{
unsigned long base = rb->offset >> TARGET_PAGE_BITS;
unsigned long nr = base + (start >> TARGET_PAGE_BITS);
@@ -530,6 +539,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb,
clear_bit(next, bitmap);
migration_dirty_pages--;
}
+ *ram_addr_abs = next << TARGET_PAGE_BITS;
return (next - base) << TARGET_PAGE_BITS;
}
@@ -662,6 +672,24 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
}
/**
+ * ram_find_block_by_id: Find a ramblock by name.
+ *
+ * Returns: The RAMBlock with matching ID, or NULL.
+ */
+static RAMBlock *ram_find_block_by_id(const char *id)
+{
+ RAMBlock *block;
+
+ QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+ if (!strcmp(id, block->idstr)) {
+ return block;
+ }
+ }
+
+ return NULL;
+}
+
+/**
* ram_save_page: Send the given page to the stream
*
* Returns: Number of pages written.
@@ -926,12 +954,15 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
* @f: Current migration stream.
* @pss: Data about the state of the current dirty page scan.
* @*again: Set to false if the search has scanned the whole of RAM
+ * *ram_addr_abs: Pointer into which to store the address of the dirty page
+ * within the global ram_addr space
*/
static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
- bool *again)
+ bool *again, ram_addr_t *ram_addr_abs)
{
pss->offset = migration_bitmap_find_and_reset_dirty(pss->block,
- pss->offset);
+ pss->offset,
+ ram_addr_abs);
if (pss->complete_round && pss->block == last_seen_block &&
pss->offset >= last_offset) {
/*
@@ -989,6 +1020,8 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
PageSearchStatus pss;
int pages = 0;
bool again, found;
+ ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
+ ram_addr_t space */
pss.block = last_seen_block;
pss.offset = last_offset;
@@ -999,7 +1032,7 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
}
do {
- found = find_dirty_block(f, &pss, &again);
+ found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);
if (found) {
if (compression_switch && migrate_use_compression()) {
@@ -1013,7 +1046,11 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
/* if page is unmodified, continue to the next */
if (pages > 0) {
+ MigrationState *ms = migrate_get_current();
last_sent_block = pss.block;
+ if (ms->sentmap) {
+ set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
+ }
}
}
} while (!pages && again);
@@ -1071,6 +1108,8 @@ void free_xbzrle_decoded_buf(void)
static void migration_end(void)
{
+ MigrationState *s = migrate_get_current();
+
/* caller have hold iothread lock or is in a bh, so there is
* no writing race against this migration_bitmap
*/
@@ -1082,6 +1121,9 @@ static void migration_end(void)
g_free(bitmap);
}
+ g_free(s->sentmap);
+ s->sentmap = NULL;
+
XBZRLE_cache_lock();
if (XBZRLE.cache) {
cache_fini(XBZRLE.cache);
@@ -1174,6 +1216,166 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
}
}
+/* **** functions for postcopy ***** */
+
+/*
+ * Callback from postcopy_each_ram_send_discard for each RAMBlock
+ * start,end: Indexes into the bitmap for the first and last bit
+ * representing the named block
+ */
+static int postcopy_send_discard_bm_ram(MigrationState *ms,
+ PostcopyDiscardState *pds,
+ unsigned long start, unsigned long end)
+{
+ unsigned long current;
+
+ for (current = start; current <= end; ) {
+ unsigned long set = find_next_bit(ms->sentmap, end + 1, current);
+
+ if (set <= end) {
+ unsigned long zero = find_next_zero_bit(ms->sentmap,
+ end + 1, set + 1);
+
+ if (zero > end) {
+ zero = end + 1;
+ }
+ postcopy_discard_send_range(ms, pds, set, zero - 1);
+ current = zero + 1;
+ } else {
+ current = set;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Utility for the outgoing postcopy code.
+ * Calls postcopy_send_discard_bm_ram for each RAMBlock
+ * passing it bitmap indexes and name.
+ * Returns: 0 on success
+ * (qemu_ram_foreach_block ends up passing unscaled lengths
+ * which would mean postcopy code would have to deal with target page)
+ */
+static int postcopy_each_ram_send_discard(MigrationState *ms)
+{
+ struct RAMBlock *block;
+ int ret;
+
+ QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+ unsigned long first = block->offset >> TARGET_PAGE_BITS;
+ unsigned long last = (block->offset + (block->used_length - 1))
+ >> TARGET_PAGE_BITS;
+ PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
+ first,
+ block->idstr);
+
+ /*
+ * Postcopy sends chunks of bitmap over the wire, but it
+ * just needs indexes at this point, avoids it having
+ * target page specific code.
+ */
+ ret = postcopy_send_discard_bm_ram(ms, pds, first, last);
+ postcopy_discard_send_finish(ms, pds);
+ if (ret) {
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Transmit the set of pages to be discarded after precopy to the target
+ * these are pages that:
+ * a) Have been previously transmitted but are now dirty again
+ * b) Pages that have never been transmitted, this ensures that
+ * any pages on the destination that have been mapped by background
+ * tasks get discarded (transparent huge pages is the specific concern)
+ * Hopefully this is pretty sparse
+ */
+int ram_postcopy_send_discard_bitmap(MigrationState *ms)
+{
+ int ret;
+
+ rcu_read_lock();
+
+ /* This should be our last sync, the src is now paused */
+ migration_bitmap_sync();
+
+ /*
+ * Update the sentmap to be sentmap = ~sentmap | dirty
+ */
+ bitmap_complement(ms->sentmap, ms->sentmap,
+ last_ram_offset() >> TARGET_PAGE_BITS);
+
+ bitmap_or(ms->sentmap, ms->sentmap, migration_bitmap,
+ last_ram_offset() >> TARGET_PAGE_BITS);
+
+
+ trace_ram_postcopy_send_discard_bitmap();
+#ifdef DEBUG_POSTCOPY
+ ram_debug_dump_bitmap(ms->sentmap, false);
+#endif
+
+ ret = postcopy_each_ram_send_discard(ms);
+ rcu_read_unlock();
+
+ return ret;
+}
+
+/*
+ * At the start of the postcopy phase of migration, any now-dirty
+ * precopied pages are discarded.
+ *
+ * start, length describe a byte address range within the RAMBlock
+ *
+ * Returns 0 on success.
+ */
+int ram_discard_range(MigrationIncomingState *mis,
+ const char *block_name,
+ uint64_t start, size_t length)
+{
+ int ret = -1;
+
+ rcu_read_lock();
+ RAMBlock *rb = ram_find_block_by_id(block_name);
+
+ if (!rb) {
+ error_report("ram_discard_range: Failed to find block '%s'",
+ block_name);
+ goto err;
+ }
+
+ uint8_t *host_startaddr = rb->host + start;
+
+ if ((uintptr_t)host_startaddr & (qemu_host_page_size - 1)) {
+ error_report("ram_discard_range: Unaligned start address: %p",
+ host_startaddr);
+ goto err;
+ }
+
+ if ((start + length) <= rb->used_length) {
+ uint8_t *host_endaddr = host_startaddr + length;
+ if ((uintptr_t)host_endaddr & (qemu_host_page_size - 1)) {
+ error_report("ram_discard_range: Unaligned end address: %p",
+ host_endaddr);
+ goto err;
+ }
+ ret = postcopy_ram_discard_range(mis, host_startaddr, length);
+ } else {
+ error_report("ram_discard_range: Overrun block '%s' (%" PRIu64
+ "/%zu/%zu)",
+ block_name, start, length, rb->used_length);
+ }
+
+err:
+ rcu_read_unlock();
+
+ return ret;
+}
+
+
/* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
* long-running RCU critical section. When rcu-reclaims in the code
* start to become numerous it will be necessary to reduce the
@@ -1232,6 +1434,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
migration_bitmap = bitmap_new(ram_bitmap_pages);
bitmap_set(migration_bitmap, 0, ram_bitmap_pages);
+ if (migrate_postcopy_ram()) {
+ MigrationState *s = migrate_get_current();
+ s->sentmap = bitmap_new(ram_bitmap_pages);
+ bitmap_clear(s->sentmap, 0, ram_bitmap_pages);
+ }
+
/*
* Count the total number of pages used by ram blocks not including any
* gaps due to alignment or unplugs.
diff --git a/migration/savevm.c b/migration/savevm.c
index 52fca3c..85462b1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1347,7 +1347,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
}
trace_loadvm_postcopy_ram_handle_discard_header(ramid, len);
while (len) {
- /* TODO - ram_discard_range gets added in a later patch
uint64_t start_addr, block_length;
start_addr = qemu_get_be64(mis->from_src_file);
block_length = qemu_get_be64(mis->from_src_file);
@@ -1358,7 +1357,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
if (ret) {
return ret;
}
- */
}
trace_loadvm_postcopy_ram_handle_discard_end();
diff --git a/trace-events b/trace-events
index e68e69d..aa2d2e7 100644
--- a/trace-events
+++ b/trace-events
@@ -1247,6 +1247,7 @@ qemu_file_fclose(void) ""
migration_bitmap_sync_start(void) ""
migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
migration_throttle(void) ""
+ram_postcopy_send_discard_bitmap(void) ""
# hw/display/qxl.c
disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
@@ -1518,6 +1519,10 @@ rdma_start_incoming_migration_after_rdma_listen(void) ""
rdma_start_outgoing_migration_after_rdma_connect(void) ""
rdma_start_outgoing_migration_after_rdma_source_init(void) ""
+# migration/postcopy-ram.c
+postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s mask words sent=%d in %d commands"
+postcopy_ram_discard_range(void *start, size_t length) "%p,+%zx"
+
# kvm-all.c
kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
--
2.5.0
* [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (31 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 32/54] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-21 8:35 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 34/54] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
` (20 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 3 ++
include/migration/postcopy-ram.h | 12 +++++
migration/postcopy-ram.c | 102 +++++++++++++++++++++++++++++++++++++++
migration/ram.c | 11 +++++
migration/savevm.c | 6 +++
trace-events | 2 +
6 files changed, 136 insertions(+)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4904d00..321ad1e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -86,6 +86,8 @@ struct MigrationIncomingState {
*/
QemuEvent main_thread_load_event;
+ /* For the kernel to send us notifications */
+ int userfault_fd;
QEMUFile *to_src_file;
QemuMutex rp_mutex; /* We send replies from multiple threads */
@@ -211,6 +213,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms);
/* For incoming postcopy discard */
int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
uint64_t start, size_t length);
+int ram_postcopy_incoming_init(MigrationIncomingState *mis);
/**
* @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 80ed2d9..9d98f7a 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -17,6 +17,18 @@
bool postcopy_ram_supported_by_host(void);
/*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * called from ram.c's similarly named ram_postcopy_incoming_init
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages);
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis);
+
+/*
* Discard the contents of 'length' bytes from 'start'
* We can assume that if we've been called, postcopy_ram_supported_by_host returned true
*/
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 10c9cab..15ac820 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -184,6 +184,97 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
return 0;
}
+/*
+ * Setup an area of RAM so that it *can* be used for postcopy later; this
+ * must be done right at the start prior to pre-copy.
+ * opaque should be the MIS.
+ */
+static int init_range(const char *block_name, void *host_addr,
+ ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+ MigrationIncomingState *mis = opaque;
+
+ trace_postcopy_init_range(block_name, host_addr, offset, length);
+
+ /*
+ * We need the whole of RAM to be truly empty for postcopy, so things
+ * like ROMs and any data tables built during init must be zero'd
+ * - we're going to get the copy from the source anyway.
+ * (Precopy will just overwrite this data, so doesn't need the discard)
+ */
+ if (postcopy_ram_discard_range(mis, host_addr, length)) {
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * At the end of migration, undo the effects of init_range
+ * opaque should be the MIS.
+ */
+static int cleanup_range(const char *block_name, void *host_addr,
+ ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+ MigrationIncomingState *mis = opaque;
+ struct uffdio_range range_struct;
+ trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
+
+ /*
+ * We turned off hugepage for the precopy stage with postcopy enabled
+ * we can turn it back on now.
+ */
+#ifdef MADV_HUGEPAGE
+ if (madvise(host_addr, length, MADV_HUGEPAGE)) {
+ error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
+ return -1;
+ }
+#endif
+
+ /*
+ * We can also turn off userfault now since we should have all the
+ * pages. It can be useful to leave it on to debug postcopy
+ * if you're not sure it's always getting every page.
+ */
+ range_struct.start = (uintptr_t)host_addr;
+ range_struct.len = length;
+
+ if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
+ error_report("%s: userfault unregister %s", __func__, strerror(errno));
+
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * called from ram.c's similarly named ram_postcopy_incoming_init
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+ if (qemu_ram_foreach_block(init_range, mis)) {
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+ /* TODO: Join the fault thread once we're sure it will exit */
+ if (qemu_ram_foreach_block(cleanup_range, mis)) {
+ return -1;
+ }
+
+ return 0;
+}
+
#else
/* No target OS support, stubs just fail */
bool postcopy_ram_supported_by_host(void)
@@ -192,6 +283,17 @@ bool postcopy_ram_supported_by_host(void)
return false;
}
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+ error_report("postcopy_ram_incoming_init: No OS support");
+ return -1;
+}
+
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+ assert(0);
+}
+
int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
size_t length)
{
diff --git a/migration/ram.c b/migration/ram.c
index e1c6c4a..d005aca 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1762,6 +1762,17 @@ static void decompress_data_with_multi_threads(uint8_t *compbuf,
}
}
+/*
+ * Allocate data structures etc needed by incoming migration with postcopy-ram
+ * postcopy-ram's similarly named postcopy_ram_incoming_init does the work
+ */
+int ram_postcopy_incoming_init(MigrationIncomingState *mis)
+{
+ size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+ return postcopy_ram_incoming_init(mis, ram_pages);
+}
+
static int ram_load(QEMUFile *f, void *opaque, int version_id)
{
int flags = 0, ret = 0;
diff --git a/migration/savevm.c b/migration/savevm.c
index 85462b1..11bf172 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1279,6 +1279,12 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
return -1;
}
+ if (ram_postcopy_incoming_init(mis)) {
+ return -1;
+ }
+
+ postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
+
return 0;
}
diff --git a/trace-events b/trace-events
index aa2d2e7..aa65d3d 100644
--- a/trace-events
+++ b/trace-events
@@ -1522,6 +1522,8 @@ rdma_start_outgoing_migration_after_rdma_source_init(void) ""
# migration/postcopy-ram.c
postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s mask words sent=%d in %d commands"
postcopy_ram_discard_range(void *start, size_t length) "%p,+%zx"
+postcopy_cleanup_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
+postcopy_init_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
# kvm-all.c
kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
--
2.5.0
* [Qemu-devel] [PATCH v8 34/54] postcopy: ram_enable_notify to switch on userfault
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (32 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-28 11:40 ` Amit Shah
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 35/54] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
` (19 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Mark the area of RAM as 'userfault'.
Start up a fault thread to handle any userfaults we might receive
from it (to be filled in by later patches).
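For readers unfamiliar with the kernel interface being wired up here, a
standalone sketch of registering one anonymous mapping for missing-page
notification follows (this is not the QEMU code; it assumes a 4.3+ kernel
with <linux/userfaultfd.h>, and the mapping size is arbitrary):
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>
int main(void)
{
    size_t len = 2 * 1024 * 1024;
    void *area = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int uffd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (area == MAP_FAILED || uffd < 0) {
        perror("setup");
        return 1;
    }
    struct uffdio_api api = { .api = UFFD_API };
    if (ioctl(uffd, UFFDIO_API, &api)) {        /* API handshake must precede register */
        perror("UFFDIO_API");
        return 1;
    }
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)area, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,  /* notify on not-yet-populated pages */
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg)) {
        perror("UFFDIO_REGISTER");
        return 1;
    }
    /* From here, the first touch of any unpopulated page in the range shows
     * up as an event on uffd instead of being silently zero-filled; the
     * fault thread added by this patch is what will read those events. */
    printf("registered %zu bytes for missing-page notification\n", len);
    return 0;
}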
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
include/migration/migration.h | 3 ++
include/migration/postcopy-ram.h | 6 ++++
migration/postcopy-ram.c | 68 ++++++++++++++++++++++++++++++++++++++++
migration/savevm.c | 9 ++++++
4 files changed, 86 insertions(+)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 321ad1e..aecf284 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -86,6 +86,9 @@ struct MigrationIncomingState {
*/
QemuEvent main_thread_load_event;
+ QemuThread fault_thread;
+ QemuSemaphore fault_thread_sem;
+
/* For the kernel to send us notifications */
int userfault_fd;
QEMUFile *to_src_file;
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 9d98f7a..9d037ff 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -17,6 +17,12 @@
bool postcopy_ram_supported_by_host(void);
/*
+ * Make all of RAM sensitive to accesses to areas that haven't yet been written
+ * and wire up anything necessary to deal with it.
+ */
+int postcopy_ram_enable_notify(MigrationIncomingState *mis);
+
+/*
* Initialise postcopy-ram, setting the RAM to a state where we can go into
* postcopy later; must be called prior to any precopy.
* called from ram.c's similarly named ram_postcopy_incoming_init
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 15ac820..e89a99e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -275,6 +275,69 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
return 0;
}
+/*
+ * Mark the given area of RAM as requiring notification of accesses to unwritten areas
+ * Used as a callback on qemu_ram_foreach_block.
+ * host_addr: Base of area to mark
+ * offset: Offset in the whole ram arena
+ * length: Length of the section
+ * opaque: MigrationIncomingState pointer
+ * Returns 0 on success
+ */
+static int ram_block_enable_notify(const char *block_name, void *host_addr,
+ ram_addr_t offset, ram_addr_t length,
+ void *opaque)
+{
+ MigrationIncomingState *mis = opaque;
+ struct uffdio_register reg_struct;
+
+ reg_struct.range.start = (uintptr_t)host_addr;
+ reg_struct.range.len = length;
+ reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+ /* Now tell our userfault_fd that it's responsible for this area */
+ if (ioctl(mis->userfault_fd, UFFDIO_REGISTER, &reg_struct)) {
+ error_report("%s userfault register: %s", __func__, strerror(errno));
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * Handle faults detected by the USERFAULT markings
+ */
+static void *postcopy_ram_fault_thread(void *opaque)
+{
+ MigrationIncomingState *mis = opaque;
+
+ fprintf(stderr, "postcopy_ram_fault_thread\n");
+ /* TODO: In later patch */
+ qemu_sem_post(&mis->fault_thread_sem);
+ while (1) {
+ /* TODO: In later patch */
+ }
+
+ return NULL;
+}
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+ /* Create the fault handler thread and wait for it to be ready */
+ qemu_sem_init(&mis->fault_thread_sem, 0);
+ qemu_thread_create(&mis->fault_thread, "postcopy/fault",
+ postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
+ qemu_sem_wait(&mis->fault_thread_sem);
+ qemu_sem_destroy(&mis->fault_thread_sem);
+
+ /* Mark so that we get notified of accesses to unwritten areas */
+ if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
+ return -1;
+ }
+
+ return 0;
+}
+
#else
/* No target OS support, stubs just fail */
bool postcopy_ram_supported_by_host(void)
@@ -299,6 +362,11 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
{
assert(0);
}
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+ assert(0);
+}
#endif
/* ------------------------------------------------------------------------- */
diff --git a/migration/savevm.c b/migration/savevm.c
index 11bf172..4072912 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1379,6 +1379,15 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
return -1;
}
+ /*
+ * Sensitise RAM - can now generate requests for blocks that don't exist
+ * However, at this point the CPU shouldn't be running, and the IO
+ * shouldn't be doing anything yet so don't actually expect requests
+ */
+ if (postcopy_ram_enable_notify(mis)) {
+ return -1;
+ }
+
/* TODO start up the postcopy listening thread */
return 0;
}
--
2.5.0
* [Qemu-devel] [PATCH v8 35/54] Postcopy: Postcopy startup in migration thread
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (33 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 34/54] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
@ 2015-09-29 8:37 ` Dr. David Alan Gilbert (git)
2015-10-21 8:57 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 36/54] Split out end of migration code from migration_thread Dr. David Alan Gilbert (git)
` (18 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:37 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Rework the migration thread to set up and start postcopy.
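The core of the rework is the check, made on every pass round the iteration
loop, of whether to flip over to postcopy. A simplified standalone
restatement of that condition (the parameter names here are invented for the
sketch; the real code tests MigrationState fields with atomic_read()):
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
/* Postcopy only starts when it has been enabled, we are not already in it,
 * the user has issued migrate-start-postcopy, and the state that cannot be
 * postcopied (devices) is small enough to send in one go. */
static bool should_start_postcopy(bool postcopy_enabled,
                                  bool already_postcopy_active,
                                  bool user_requested_start,
                                  uint64_t pending_non_postcopiable,
                                  uint64_t max_downtime_bytes)
{
    return postcopy_enabled &&
           !already_postcopy_active &&
           user_requested_start &&
           pending_non_postcopiable <= max_downtime_bytes;
}
int main(void)
{
    printf("%d\n", should_start_postcopy(true, false, true, 4096, 1 << 20));
    return 0;
}
Once the check passes, postcopy_start() stops the guest, sends the discard
bitmap, packages up the remaining device state, and the loop carries on
streaming only the postcopiable RAM.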
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 3 +
migration/migration.c | 172 ++++++++++++++++++++++++++++++++++++++++--
trace-events | 4 +
3 files changed, 173 insertions(+), 6 deletions(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index aecf284..0586f8c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -136,6 +136,9 @@ struct MigrationState
/* Flag set once the migration has been asked to enter postcopy */
bool start_postcopy;
+ /* Flag set once the migration thread is running (and needs joining) */
+ bool migration_thread_started;
+
/* bitmap of pages that have been sent at least once
* only maintained and used in postcopy at the moment
* where it's used to send the dirtymap at the start
diff --git a/migration/migration.c b/migration/migration.c
index b57a0e6..6c662e6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -692,7 +692,10 @@ static void migrate_fd_cleanup(void *opaque)
if (s->file) {
trace_migrate_fd_cleanup();
qemu_mutex_unlock_iothread();
- qemu_thread_join(&s->thread);
+ if (s->migration_thread_started) {
+ qemu_thread_join(&s->thread);
+ s->migration_thread_started = false;
+ }
qemu_mutex_lock_iothread();
migrate_compress_threads_join();
@@ -1178,7 +1181,6 @@ out:
return NULL;
}
-__attribute__ (( unused )) /* Until later in patch series */
static int open_return_path_on_source(MigrationState *ms)
{
@@ -1220,25 +1222,149 @@ static int await_return_path_close_on_source(MigrationState *ms)
}
/*
+ * Switch from normal iteration to postcopy
+ * Returns non-0 on error
+ */
+static int postcopy_start(MigrationState *ms, bool *old_vm_running)
+{
+ int ret;
+ const QEMUSizedBuffer *qsb;
+ int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ migrate_set_state(ms, MIGRATION_STATUS_ACTIVE,
+ MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
+ trace_postcopy_start();
+ qemu_mutex_lock_iothread();
+ trace_postcopy_start_set_run();
+
+ qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+ *old_vm_running = runstate_is_running();
+ global_state_store();
+ ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+
+ if (ret < 0) {
+ goto fail;
+ }
+
+ /*
+ * in Finish migrate and with the io-lock held everything should
+ * be quiet, but we've potentially still got dirty pages and we
+ * need to tell the destination to throw any pages it's already received
+ * that are dirty
+ */
+ if (ram_postcopy_send_discard_bitmap(ms)) {
+ error_report("postcopy send discard bitmap failed");
+ goto fail;
+ }
+
+ /*
+ * send rest of state - note things that are doing postcopy
+ * will notice we're in POSTCOPY_ACTIVE and not actually
+ * wrap their state up here
+ */
+ qemu_file_set_rate_limit(ms->file, INT64_MAX);
+ /* Ping just for debugging, helps line traces up */
+ qemu_savevm_send_ping(ms->file, 2);
+
+ /*
+ * While loading the device state we may trigger page transfer
+ * requests and the fd must be free to process those, and thus
+ * the destination must read the whole device state off the fd before
+ * it starts processing it. Unfortunately the ad-hoc migration format
+ * doesn't allow the destination to know the size to read without fully
+ * parsing it through each devices load-state code (especially the open
+ * coded devices that use get/put).
+ * So we wrap the device state up in a package with a length at the start;
+ * to do this we use a qemu_buf to hold the whole of the device state.
+ */
+ QEMUFile *fb = qemu_bufopen("w", NULL);
+ if (!fb) {
+ error_report("Failed to create buffered file");
+ goto fail;
+ }
+
+ qemu_savevm_state_complete_precopy(fb);
+ qemu_savevm_send_ping(fb, 3);
+
+ qemu_savevm_send_postcopy_run(fb);
+
+ /* <><> end of stuff going into the package */
+ qsb = qemu_buf_get(fb);
+
+ /* Now send that blob */
+ if (qemu_savevm_send_packaged(ms->file, qsb)) {
+ goto fail_closefb;
+ }
+ qemu_fclose(fb);
+ ms->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
+
+ qemu_mutex_unlock_iothread();
+
+ /*
+ * Although this ping is just for debug, it could potentially be
+ * used for getting a better measurement of downtime at the source.
+ */
+ qemu_savevm_send_ping(ms->file, 4);
+
+ ret = qemu_file_get_error(ms->file);
+ if (ret) {
+ error_report("postcopy_start: Migration stream errored");
+ migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ MIGRATION_STATUS_FAILED);
+ }
+
+ return ret;
+
+fail_closefb:
+ qemu_fclose(fb);
+fail:
+ migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ MIGRATION_STATUS_FAILED);
+ qemu_mutex_unlock_iothread();
+ return -1;
+}
+
+/*
* Master migration thread on the source VM.
* It drives the migration and pumps the data down the outgoing channel.
*/
static void *migration_thread(void *opaque)
{
MigrationState *s = opaque;
+ /* Used by the bandwidth calcs, updated later */
int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
int64_t initial_bytes = 0;
int64_t max_size = 0;
int64_t start_time = initial_time;
bool old_vm_running = false;
+ bool entered_postcopy = false;
+ /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
+ enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
rcu_register_thread();
qemu_savevm_state_header(s->file);
+
+ if (migrate_postcopy_ram()) {
+ /* Now tell the dest that it should open its end so it can reply */
+ qemu_savevm_send_open_return_path(s->file);
+
+ /* And do a ping that will make stuff easier to debug */
+ qemu_savevm_send_ping(s->file, 1);
+
+ /*
+ * Tell the destination that we *might* want to do postcopy later;
+ * if the other end can't do postcopy it should fail now, nice and
+ * early.
+ */
+ qemu_savevm_send_postcopy_advise(s->file);
+ }
+
qemu_savevm_state_begin(s->file, &s->params);
s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+ current_active_state = MIGRATION_STATUS_ACTIVE;
migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
trace_migration_thread_setup_complete();
@@ -1257,6 +1383,22 @@ static void *migration_thread(void *opaque)
trace_migrate_pending(pending_size, max_size,
pend_post, pend_nonpost);
if (pending_size && pending_size >= max_size) {
+ /* Still a significant amount to transfer */
+
+ current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ if (migrate_postcopy_ram() &&
+ s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE &&
+ pend_nonpost <= max_size &&
+ atomic_read(&s->start_postcopy)) {
+
+ if (!postcopy_start(s, &old_vm_running)) {
+ current_active_state = MIGRATION_STATUS_POSTCOPY_ACTIVE;
+ entered_postcopy = true;
+ }
+
+ continue;
+ }
+ /* Just another iteration step */
qemu_savevm_state_iterate(s->file);
} else {
int ret;
@@ -1291,8 +1433,8 @@ static void *migration_thread(void *opaque)
}
if (qemu_file_get_error(s->file)) {
- migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
- MIGRATION_STATUS_FAILED);
+ migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
+ trace_migration_thread_file_err();
break;
}
current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -1323,19 +1465,22 @@ static void *migration_thread(void *opaque)
}
}
+ trace_migration_thread_after_loop();
qemu_mutex_lock_iothread();
if (s->state == MIGRATION_STATUS_COMPLETED) {
int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
uint64_t transferred_bytes = qemu_ftell(s->file);
s->total_time = end_time - s->total_time;
- s->downtime = end_time - start_time;
+ if (!entered_postcopy) {
+ s->downtime = end_time - start_time;
+ }
if (s->total_time) {
s->mbps = (((double) transferred_bytes * 8.0) /
((double) s->total_time)) / 1000;
}
runstate_set(RUN_STATE_POSTMIGRATE);
} else {
- if (old_vm_running) {
+ if (old_vm_running && !entered_postcopy) {
vm_start();
}
}
@@ -1358,9 +1503,24 @@ void migrate_fd_connect(MigrationState *s)
/* Notify before starting migration thread */
notifier_list_notify(&migration_state_notifiers, s);
+ /*
+ * Open the return path; currently for postcopy but other things might
+ * also want it.
+ */
+ if (migrate_postcopy_ram()) {
+ if (open_return_path_on_source(s)) {
+ error_report("Unable to open return-path for postcopy");
+ migrate_set_state(s, MIGRATION_STATUS_SETUP,
+ MIGRATION_STATUS_FAILED);
+ migrate_fd_cleanup(s);
+ return;
+ }
+ }
+
migrate_compress_threads_create();
qemu_thread_create(&s->thread, "migration", migration_thread, s,
QEMU_THREAD_JOINABLE);
+ s->migration_thread_started = true;
}
PostcopyState postcopy_state_get(void)
diff --git a/trace-events b/trace-events
index aa65d3d..b0e7c20 100644
--- a/trace-events
+++ b/trace-events
@@ -1441,9 +1441,13 @@ migrate_fd_error(void) ""
migrate_fd_cancel(void) ""
migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migration_thread_after_loop(void) ""
+migration_thread_file_err(void) ""
migration_thread_setup_complete(void) ""
open_return_path_on_source(void) ""
open_return_path_on_source_continue(void) ""
+postcopy_start(void) ""
+postcopy_start_set_run(void) ""
source_return_path_thread_bad_end(void) ""
source_return_path_thread_end(void) ""
source_return_path_thread_entry(void) ""
--
2.5.0
* [Qemu-devel] [PATCH v8 36/54] Split out end of migration code from migration_thread
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (34 preceding siblings ...)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 35/54] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-21 9:11 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 37/54] Postcopy: End of iteration Dr. David Alan Gilbert (git)
` (17 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
The code that gets run at the end of the migration process
is getting large, and is about to have a chunk added for postcopy.
Split it into a separate function.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/migration.c | 75 ++++++++++++++++++++++++++++++++-------------------
trace-events | 2 ++
2 files changed, 50 insertions(+), 27 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 6c662e6..1b32625 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1324,6 +1324,50 @@ fail:
return -1;
}
+/**
+ * migration_completion: Used by migration_thread when there's not much left
+ * pending. The caller 'breaks' the loop when this returns.
+ *
+ * @s: Current migration state
+ * @current_active_state: The migration state we expect to be in
+ * @*old_vm_running: Pointer to old_vm_running flag
+ * @*start_time: Pointer to time to update
+ */
+static void migration_completion(MigrationState *s, int current_active_state,
+ bool *old_vm_running, int64_t *start_time)
+{
+ int ret;
+ qemu_mutex_lock_iothread();
+ *start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+ *old_vm_running = runstate_is_running();
+ ret = global_state_store();
+
+ if (!ret) {
+ ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+ if (ret >= 0) {
+ qemu_file_set_rate_limit(s->file, INT64_MAX);
+ qemu_savevm_state_complete_precopy(s->file);
+ }
+ }
+ qemu_mutex_unlock_iothread();
+
+ if (ret < 0) {
+ goto fail;
+ }
+
+ if (qemu_file_get_error(s->file)) {
+ trace_migration_completion_file_err();
+ goto fail;
+ }
+
+ migrate_set_state(s, current_active_state, MIGRATION_STATUS_COMPLETED);
+ return;
+
+fail:
+ migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
+}
+
/*
* Master migration thread on the source VM.
* It drives the migration and pumps the data down the outgoing channel.
@@ -1401,34 +1445,11 @@ static void *migration_thread(void *opaque)
/* Just another iteration step */
qemu_savevm_state_iterate(s->file);
} else {
- int ret;
-
- qemu_mutex_lock_iothread();
- start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
- qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
- old_vm_running = runstate_is_running();
-
- ret = global_state_store();
- if (!ret) {
- ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
- if (ret >= 0) {
- qemu_file_set_rate_limit(s->file, INT64_MAX);
- qemu_savevm_state_complete_precopy(s->file);
- }
- }
- qemu_mutex_unlock_iothread();
+ trace_migration_thread_low_pending(pending_size);
- if (ret < 0) {
- migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
- MIGRATION_STATUS_FAILED);
- break;
- }
-
- if (!qemu_file_get_error(s->file)) {
- migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
- MIGRATION_STATUS_COMPLETED);
- break;
- }
+ migration_completion(s, current_active_state, &old_vm_running,
+ &start_time);
+ break;
}
}
diff --git a/trace-events b/trace-events
index b0e7c20..875d9ef 100644
--- a/trace-events
+++ b/trace-events
@@ -1441,9 +1441,11 @@ migrate_fd_error(void) ""
migrate_fd_cancel(void) ""
migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migration_completion_file_err(void) ""
migration_thread_after_loop(void) ""
migration_thread_file_err(void) ""
migration_thread_setup_complete(void) ""
+migration_thread_low_pending(uint64_t pending) "%" PRIu64
open_return_path_on_source(void) ""
open_return_path_on_source_continue(void) ""
postcopy_start(void) ""
--
2.5.0
* [Qemu-devel] [PATCH v8 37/54] Postcopy: End of iteration
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (35 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 36/54] Split out end of migration code from migration_thread Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-21 9:16 ` Juan Quintela
2015-10-29 5:10 ` Amit Shah
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 38/54] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
` (16 subsequent siblings)
53 siblings, 2 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
The end of migration in postcopy is a bit different since some of
the things normally done at the end of migration have already been
done on the transition to postcopy.
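A rough sketch of the two completion paths that exist after this change
(illustrative only; the real logic is the migration_completion() branch in
the diff below, and the strings just stand in for the actual work):
#include <stdio.h>
enum mig_state { STATE_ACTIVE, STATE_POSTCOPY_ACTIVE };
static void complete_migration(enum mig_state state)
{
    if (state == STATE_ACTIVE) {
        /* classic precopy end: stop the VM, flush remaining RAM and all
         * device state, then send EOF */
        puts("precopy completion: stop VM, send everything, EOF");
    } else if (state == STATE_POSTCOPY_ACTIVE) {
        /* devices were already sent when postcopy started; only the
         * postcopy-capable handlers (RAM) need a final flush */
        puts("postcopy completion: flush last RAM pages only");
    }
}
int main(void)
{
    complete_migration(STATE_ACTIVE);
    complete_migration(STATE_POSTCOPY_ACTIVE);
    return 0;
}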
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/migration.c | 51 +++++++++++++++++++++++++++++++++++++--------------
trace-events | 4 ++++
2 files changed, 41 insertions(+), 14 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 1b32625..4f8ef6f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1198,7 +1198,6 @@ static int open_return_path_on_source(MigrationState *ms)
return 0;
}
-__attribute__ (( unused )) /* Until later in patch series */
/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
static int await_return_path_close_on_source(MigrationState *ms)
{
@@ -1337,23 +1336,47 @@ static void migration_completion(MigrationState *s, int current_active_state,
bool *old_vm_running, int64_t *start_time)
{
int ret;
- qemu_mutex_lock_iothread();
- *start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
- qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
- *old_vm_running = runstate_is_running();
- ret = global_state_store();
+ if (s->state == MIGRATION_STATUS_ACTIVE) {
+ qemu_mutex_lock_iothread();
+ *start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+ *old_vm_running = runstate_is_running();
+ ret = global_state_store();
+
+ if (!ret) {
+ ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+ if (ret >= 0) {
+ qemu_file_set_rate_limit(s->file, INT64_MAX);
+ qemu_savevm_state_complete_precopy(s->file);
+ }
+ }
+ qemu_mutex_unlock_iothread();
- if (!ret) {
- ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
- if (ret >= 0) {
- qemu_file_set_rate_limit(s->file, INT64_MAX);
- qemu_savevm_state_complete_precopy(s->file);
+ if (ret < 0) {
+ goto fail;
}
+ } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+ trace_migration_completion_postcopy_end();
+
+ qemu_savevm_state_complete_postcopy(s->file);
+ trace_migration_completion_postcopy_end_after_complete();
}
- qemu_mutex_unlock_iothread();
- if (ret < 0) {
- goto fail;
+ /*
+ * If rp was opened we must clean up the thread before
+ * cleaning everything else up (since if there are no failures
+ * it will wait for the destination to send its status in
+ * a SHUT command).
+ * Postcopy opens rp if enabled (even if it's not activated)
+ */
+ if (migrate_postcopy_ram()) {
+ int rp_error;
+ trace_migration_completion_postcopy_end_before_rp();
+ rp_error = await_return_path_close_on_source(s);
+ trace_migration_completion_postcopy_end_after_rp(rp_error);
+ if (rp_error) {
+ goto fail;
+ }
}
if (qemu_file_get_error(s->file)) {
diff --git a/trace-events b/trace-events
index 875d9ef..dec2ae1 100644
--- a/trace-events
+++ b/trace-events
@@ -1442,6 +1442,10 @@ migrate_fd_cancel(void) ""
migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
migration_completion_file_err(void) ""
+migration_completion_postcopy_end(void) ""
+migration_completion_postcopy_end_after_complete(void) ""
+migration_completion_postcopy_end_before_rp(void) ""
+migration_completion_postcopy_end_after_rp(int rp_error) "%d"
migration_thread_after_loop(void) ""
migration_thread_file_err(void) ""
migration_thread_setup_complete(void) ""
--
2.5.0
* [Qemu-devel] [PATCH v8 38/54] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (36 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 37/54] Postcopy: End of iteration Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-21 11:12 ` Juan Quintela
2015-10-29 5:17 ` Amit Shah
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 39/54] Page request: Process incoming page request Dr. David Alan Gilbert (git)
` (15 subsequent siblings)
53 siblings, 2 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Add the MIG_RP_MSG_REQ_PAGES command on the return path for the postcopy
destination to request a page from the source.
Two versions exist:
MIG_RP_MSG_REQ_PAGES_ID that includes a RAMBlock name and start/len
MIG_RP_MSG_REQ_PAGES that just has start/len for use with the same
RAMBlock as a previous MIG_RP_MSG_REQ_PAGES_ID
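The message body layout is easiest to see in a standalone sketch (this
mirrors the description above rather than reproducing the QEMU code; the
RAMBlock name is just an example):
#include <stdint.h>
#include <stdio.h>
#include <string.h>
static size_t build_req_pages_id(uint8_t *buf, uint64_t start, uint32_t len,
                                 const char *rbname)
{
    size_t namelen = strlen(rbname);            /* must be < 256             */
    for (int i = 0; i < 8; i++) {               /* start: 8 bytes big-endian */
        buf[i] = (uint8_t)(start >> (56 - 8 * i));
    }
    for (int i = 0; i < 4; i++) {               /* len: 4 bytes big-endian   */
        buf[8 + i] = (uint8_t)(len >> (24 - 8 * i));
    }
    buf[12] = (uint8_t)namelen;                 /* 1-byte RAMBlock name length */
    memcpy(buf + 13, rbname, namelen);          /* name, not NUL-terminated    */
    return 13 + namelen;
}
int main(void)
{
    uint8_t buf[12 + 1 + 255];
    size_t n = build_req_pages_id(buf, 0x200000, 4096, "pc.ram");
    printf("REQ_PAGES_ID body is %zu bytes\n", n);
    return 0;
}
The plain MIG_RP_MSG_REQ_PAGES variant is only the first 12 bytes and reuses
the RAMBlock named by the most recent _ID message.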
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
include/migration/migration.h | 5 ++++
migration/migration.c | 70 +++++++++++++++++++++++++++++++++++++++++++
trace-events | 1 +
3 files changed, 76 insertions(+)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0586f8c..9be08c8 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -49,6 +49,9 @@ enum mig_rp_message_type {
MIG_RP_MSG_SHUT, /* sibling will not send any more RP messages */
MIG_RP_MSG_PONG, /* Response to a PING; data (seq: be32 ) */
+ MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
+ MIG_RP_MSG_REQ_PAGES, /* data (start: be64, len: be32) */
+
MIG_RP_MSG_MAX
};
@@ -263,6 +266,8 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
uint32_t value);
void migrate_send_rp_pong(MigrationIncomingState *mis,
uint32_t value);
+void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
+ ram_addr_t start, size_t len);
void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration/migration.c b/migration/migration.c
index 4f8ef6f..e994164 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -251,6 +251,35 @@ static void deferred_incoming_migration(Error **errp)
deferred_incoming = true;
}
+/* Request a range of pages from the source VM at the given
+ * start address.
+ * rbname: Name of the RAMBlock to request the page in, if NULL it's the same
+ * as the last request (a name must have been given previously)
+ * Start: Address offset within the RB
+ * Len: Length in bytes required - must be a multiple of pagesize
+ */
+void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
+ ram_addr_t start, size_t len)
+{
+ uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname up to 256 */
+ size_t msglen = 12; /* start + len */
+
+ *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
+ *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
+
+ if (rbname) {
+ int rbname_len = strlen(rbname);
+ assert(rbname_len < 256);
+
+ bufc[msglen++] = rbname_len;
+ memcpy(bufc + msglen, rbname, rbname_len);
+ msglen += rbname_len;
+ migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES_ID, msglen, bufc);
+ } else {
+ migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES, msglen, bufc);
+ }
+}
+
void qemu_start_incoming_migration(const char *uri, Error **errp)
{
const char *p;
@@ -1092,10 +1121,23 @@ static struct rp_cmd_args {
[MIG_RP_MSG_INVALID] = { .len = -1, .name = "INVALID" },
[MIG_RP_MSG_SHUT] = { .len = 4, .name = "SHUT" },
[MIG_RP_MSG_PONG] = { .len = 4, .name = "PONG" },
+ [MIG_RP_MSG_REQ_PAGES] = { .len = 12, .name = "REQ_PAGES" },
+ [MIG_RP_MSG_REQ_PAGES_ID] = { .len = -1, .name = "REQ_PAGES_ID" },
[MIG_RP_MSG_MAX] = { .len = -1, .name = "MAX" },
};
/*
+ * Process a request for pages received on the return path,
+ * We're allowed to send more than requested (e.g. to round to our page size)
+ * and we don't need to send pages that have already been sent.
+ */
+static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
+ ram_addr_t start, size_t len)
+{
+ trace_migrate_handle_rp_req_pages(rbname, start, len);
+}
+
+/*
* Handles messages sent on the return path towards the source VM
*
*/
@@ -1107,6 +1149,8 @@ static void *source_return_path_thread(void *opaque)
const int max_len = 512;
uint8_t buf[max_len];
uint32_t tmp32, sibling_error;
+ ram_addr_t start = 0; /* =0 to silence warning */
+ size_t len = 0, expected_len;
int res;
trace_source_return_path_thread_entry();
@@ -1166,6 +1210,32 @@ static void *source_return_path_thread(void *opaque)
trace_source_return_path_thread_pong(tmp32);
break;
+ case MIG_RP_MSG_REQ_PAGES:
+ start = be64_to_cpup((uint64_t *)buf);
+ len = be32_to_cpup((uint32_t *)(buf + 8));
+ migrate_handle_rp_req_pages(ms, NULL, start, len);
+ break;
+
+ case MIG_RP_MSG_REQ_PAGES_ID:
+ expected_len = 12 + 1; /* header + termination */
+
+ if (header_len >= expected_len) {
+ start = be64_to_cpup((uint64_t *)buf);
+ len = be32_to_cpup((uint32_t *)(buf + 8));
+ /* Now we expect an idstr */
+ tmp32 = buf[12]; /* Length of the following idstr */
+ buf[13 + tmp32] = '\0';
+ expected_len += tmp32;
+ }
+ if (header_len != expected_len) {
+ error_report("RP: Req_Page_id with length %d expecting %zd",
+ header_len, expected_len);
+ mark_source_rp_bad(ms);
+ goto out;
+ }
+ migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
+ break;
+
default:
break;
}
diff --git a/trace-events b/trace-events
index dec2ae1..b58077f 100644
--- a/trace-events
+++ b/trace-events
@@ -1439,6 +1439,7 @@ migrate_set_state(int new_state) "new state %d"
migrate_fd_cleanup(void) ""
migrate_fd_error(void) ""
migrate_fd_cancel(void) ""
+migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at %zx len %zx"
migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
migration_completion_file_err(void) ""
--
2.5.0
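For reference, the payload layout of the two return-path messages added above is: start (be64), len (be32), and, for REQ_PAGES_ID only, a one-byte id length followed by the RAMBlock id string (plain REQ_PAGES stops after the 12-byte header). A standalone sketch of that encoding follows; encode_req_pages_id is an illustrative name, not a function in this series, and it uses memcpy rather than the patch's casts to avoid alignment assumptions.

    #include <endian.h>
    #include <stdint.h>
    #include <string.h>

    /* Sketch only: encode a REQ_PAGES_ID payload as the patch lays it out */
    static size_t encode_req_pages_id(uint8_t *buf, uint64_t start,
                                      uint32_t len, const char *idstr)
    {
        size_t idlen = strlen(idstr);        /* the patch asserts this is < 256 */
        uint64_t be_start = htobe64(start);
        uint32_t be_len = htobe32(len);

        memcpy(buf, &be_start, 8);           /* start: be64 */
        memcpy(buf + 8, &be_len, 4);         /* len:   be32 */
        buf[12] = (uint8_t)idlen;            /* length of the RAMBlock id */
        memcpy(buf + 13, idstr, idlen);      /* id bytes, no NUL on the wire */
        return 13 + idlen;                   /* total payload size */
    }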
* [Qemu-devel] [PATCH v8 39/54] Page request: Process incoming page request
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (37 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 38/54] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-21 11:17 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 40/54] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
` (14 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
On receiving a MIG_RP_MSG_REQ_PAGES (or REQ_PAGES_ID) request, look up the
address within the named RAMBlock and queue the page for transmission.
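As a rough, standalone sketch of what "queue the page" means on the source side (illustrative names and types; the QEMU code below uses a QSIMPLEQ protected by a QemuMutex and holds a reference on the RAMBlock's MemoryRegion):

    #include <pthread.h>
    #include <stdint.h>
    #include <stdlib.h>

    struct page_req {
        uint64_t offset;              /* offset within the requested RAMBlock */
        uint64_t len;                 /* length in bytes, whole host pages */
        struct page_req *next;
    };

    static struct page_req *req_head;
    static struct page_req **req_tail = &req_head;
    static pthread_mutex_t req_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Called from the return-path thread when the destination asks for pages */
    static void queue_page_request(uint64_t offset, uint64_t len)
    {
        struct page_req *r = calloc(1, sizeof(*r));

        r->offset = offset;
        r->len = len;
        pthread_mutex_lock(&req_lock);
        *req_tail = r;                /* append at the tail: requests stay FIFO */
        req_tail = &r->next;
        pthread_mutex_unlock(&req_lock);
    }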
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 22 +++++++++++
migration/migration.c | 31 +++++++++++++++-
migration/ram.c | 85 +++++++++++++++++++++++++++++++++++++++++++
trace-events | 1 +
4 files changed, 138 insertions(+), 1 deletion(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 9be08c8..fbf3b99 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -105,6 +105,18 @@ MigrationIncomingState *migration_incoming_get_current(void);
MigrationIncomingState *migration_incoming_state_new(QEMUFile *f);
void migration_incoming_state_destroy(void);
+/*
+ * An outstanding page request, on the source, having been received
+ * and queued
+ */
+struct MigrationSrcPageRequest {
+ RAMBlock *rb;
+ hwaddr offset;
+ hwaddr len;
+
+ QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
+};
+
struct MigrationState
{
int64_t bandwidth_limit;
@@ -148,6 +160,12 @@ struct MigrationState
* of the postcopy phase
*/
unsigned long *sentmap;
+
+ /* Queue of outstanding page requests from the destination */
+ QemuMutex src_page_req_mutex;
+ QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
+ /* The RAMBlock used in the last src_page_request */
+ RAMBlock *last_req_rb;
};
void process_incoming_migration(QEMUFile *f);
@@ -295,6 +313,10 @@ void savevm_skip_configuration(void);
int global_state_store(void);
void global_state_store_running(void);
+void flush_page_queue(MigrationState *ms);
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+ ram_addr_t start, ram_addr_t len);
+
PostcopyState postcopy_state_get(void);
/* Set the state and return the old state */
PostcopyState postcopy_state_set(PostcopyState new_state);
diff --git a/migration/migration.c b/migration/migration.c
index e994164..6160259 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -21,6 +21,7 @@
#include "sysemu/sysemu.h"
#include "block/block.h"
#include "qapi/qmp/qerror.h"
+#include "qapi/util.h"
#include "qemu/sockets.h"
#include "qemu/rcu.h"
#include "migration/block.h"
@@ -28,8 +29,9 @@
#include "qemu/thread.h"
#include "qmp-commands.h"
#include "trace.h"
-#include "qapi/util.h"
#include "qapi-event.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
#define MAX_THROTTLE (32 << 20) /* Migration speed throttling */
@@ -68,6 +70,7 @@ static PostcopyState incoming_postcopy_state;
/* For outgoing */
MigrationState *migrate_get_current(void)
{
+ static bool once;
static MigrationState current_migration = {
.state = MIGRATION_STATUS_NONE,
.bandwidth_limit = MAX_THROTTLE,
@@ -81,6 +84,10 @@ MigrationState *migrate_get_current(void)
DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
};
+ if (!once) {
+ qemu_mutex_init(¤t_migration.src_page_req_mutex);
+ once = true;
+ }
return ¤t_migration;
}
@@ -718,6 +725,8 @@ static void migrate_fd_cleanup(void *opaque)
qemu_bh_delete(s->cleanup_bh);
s->cleanup_bh = NULL;
+ flush_page_queue(s);
+
if (s->file) {
trace_migrate_fd_cleanup();
qemu_mutex_unlock_iothread();
@@ -845,6 +854,8 @@ MigrationState *migrate_init(const MigrationParams *params)
s->bandwidth_limit = bandwidth_limit;
migrate_set_state(s, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
+ QSIMPLEQ_INIT(&s->src_page_requests);
+
s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
return s;
}
@@ -1134,7 +1145,25 @@ static struct rp_cmd_args {
static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
ram_addr_t start, size_t len)
{
+ long our_host_ps = getpagesize();
+
trace_migrate_handle_rp_req_pages(rbname, start, len);
+
+ /*
+ * Since we currently insist on matching page sizes, just sanity check
+ * we're being asked for whole host pages.
+ */
+ if (start & (our_host_ps-1) ||
+ (len & (our_host_ps-1))) {
+ error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT
+ " len: " RAM_ADDR_FMT, __func__, start, len);
+ mark_source_rp_bad(ms);
+ return;
+ }
+
+ if (ram_save_queue_pages(ms, rbname, start, len)) {
+ mark_source_rp_bad(ms);
+ }
}
/*
diff --git a/migration/ram.c b/migration/ram.c
index d005aca..5771983 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1002,6 +1002,91 @@ static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
}
/**
+ * flush_page_queue: Flush any remaining pages in the ram request queue
+ * it should be empty at the end anyway, but in error cases there may be
+ * some left.
+ *
+ * ms: MigrationState
+ */
+void flush_page_queue(MigrationState *ms)
+{
+ struct MigrationSrcPageRequest *mspr, *next_mspr;
+ /* This queue generally should be empty - but in the case of a failed
+ * migration might have some droppings in.
+ */
+ rcu_read_lock();
+ QSIMPLEQ_FOREACH_SAFE(mspr, &ms->src_page_requests, next_req, next_mspr) {
+ memory_region_unref(mspr->rb->mr);
+ QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
+ g_free(mspr);
+ }
+ rcu_read_unlock();
+}
+
+/**
+ * Queue the pages for transmission, e.g. a request from postcopy destination
+ * ms: MigrationStatus in which the queue is held
+ * rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
+ * start: Offset from the start of the RAMBlock
+ * len: Length (in bytes) to send
+ * Return: 0 on success
+ */
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+ ram_addr_t start, ram_addr_t len)
+{
+ RAMBlock *ramblock;
+
+ rcu_read_lock();
+ if (!rbname) {
+ /* Reuse last RAMBlock */
+ ramblock = ms->last_req_rb;
+
+ if (!ramblock) {
+ /*
+ * Shouldn't happen, we can't reuse the last RAMBlock if
+ * it's the 1st request.
+ */
+ error_report("ram_save_queue_pages no previous block");
+ goto err;
+ }
+ } else {
+ ramblock = ram_find_block_by_id(rbname);
+
+ if (!ramblock) {
+ /* We shouldn't be asked for a non-existent RAMBlock */
+ error_report("ram_save_queue_pages no block '%s'", rbname);
+ goto err;
+ }
+ ms->last_req_rb = ramblock;
+ }
+ trace_ram_save_queue_pages(ramblock->idstr, start, len);
+ if (start+len > ramblock->used_length) {
+ error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
+ __func__, start, len, ramblock->used_length);
+ goto err;
+ }
+
+ struct MigrationSrcPageRequest *new_entry =
+ g_malloc0(sizeof(struct MigrationSrcPageRequest));
+ new_entry->rb = ramblock;
+ new_entry->offset = start;
+ new_entry->len = len;
+
+ memory_region_ref(ramblock->mr);
+ qemu_mutex_lock(&ms->src_page_req_mutex);
+ QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
+ qemu_mutex_unlock(&ms->src_page_req_mutex);
+ rcu_read_unlock();
+
+ return 0;
+
+err:
+ rcu_read_unlock();
+ return -1;
+}
+
+
+/**
* ram_find_and_save_block: Finds a dirty page and sends it to f
*
* Called within an RCU critical section.
diff --git a/trace-events b/trace-events
index b58077f..e40f00e 100644
--- a/trace-events
+++ b/trace-events
@@ -1248,6 +1248,7 @@ migration_bitmap_sync_start(void) ""
migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
migration_throttle(void) ""
ram_postcopy_send_discard_bitmap(void) ""
+ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
# hw/display/qxl.c
disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
--
2.5.0
* [Qemu-devel] [PATCH v8 40/54] Page request: Consume pages off the post-copy queue
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (38 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 39/54] Page request: Process incoming page request Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-26 16:32 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 41/54] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
` (13 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
When transmitting RAM pages, consume pages that have been queued by
MIG_RP_MSG_REQ_PAGES commands and send them ahead of the normal page scan.
Note:
a) After a queued page the linear walk carries on from just after the
unqueued page; there is a reasonable chance that the destination was
about to ask for other nearby pages anyway.
b) We have to be careful of any assumptions that the page-walking
code makes; in particular it takes some shortcuts on its first linear
walk that break as soon as we service a queued page.
c) We have to be careful not to break up host-page sized chunks, since
that makes it harder to place the pages on the destination; a sketch of
this host-page walk follows these notes.
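A runnable sketch of the host-page walk behind note (c), with illustrative sizes (4K target pages inside a 64K host page); the real loop in ram_save_host_page() additionally checks the dirty bitmap and chooses plain vs compressed sending:

    #include <stdio.h>

    int main(void)
    {
        const unsigned long target_page = 4096;     /* illustrative sizes */
        const unsigned long host_page = 65536;
        unsigned long offset = 3 * target_page;     /* may start mid host page */

        /* Send target pages until we reach the next host-page boundary */
        do {
            printf("send target page at offset 0x%lx\n", offset);
            offset += target_page;
        } while (offset & (host_page - 1));         /* 0 exactly at the boundary */
        return 0;
    }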
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/ram.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++---------
trace-events | 2 +
2 files changed, 168 insertions(+), 29 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 5771983..487e838 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -516,9 +516,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
* Returns: byte offset within memory region of the start of a dirty page
*/
static inline
-ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb,
- ram_addr_t start,
- ram_addr_t *ram_addr_abs)
+ram_addr_t migration_bitmap_find_dirty(RAMBlock *rb,
+ ram_addr_t start,
+ ram_addr_t *ram_addr_abs)
{
unsigned long base = rb->offset >> TARGET_PAGE_BITS;
unsigned long nr = base + (start >> TARGET_PAGE_BITS);
@@ -535,15 +535,24 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb,
next = find_next_bit(bitmap, size, nr);
}
- if (next < size) {
- clear_bit(next, bitmap);
- migration_dirty_pages--;
- }
*ram_addr_abs = next << TARGET_PAGE_BITS;
return (next - base) << TARGET_PAGE_BITS;
}
-/* Called with rcu_read_lock() to protect migration_bitmap */
+static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
+{
+ bool ret;
+ int nr = addr >> TARGET_PAGE_BITS;
+ unsigned long *bitmap = atomic_rcu_read(&migration_bitmap);
+
+ ret = test_and_clear_bit(nr, bitmap);
+
+ if (ret) {
+ migration_dirty_pages--;
+ }
+ return ret;
+}
+
static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
{
unsigned long *bitmap;
@@ -960,9 +969,8 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
bool *again, ram_addr_t *ram_addr_abs)
{
- pss->offset = migration_bitmap_find_and_reset_dirty(pss->block,
- pss->offset,
- ram_addr_abs);
+ pss->offset = migration_bitmap_find_dirty(pss->block, pss->offset,
+ ram_addr_abs);
if (pss->complete_round && pss->block == last_seen_block &&
pss->offset >= last_offset) {
/*
@@ -1001,6 +1009,88 @@ static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
}
}
+/*
+ * Unqueue a page from the queue fed by postcopy page requests; skips pages
+ * that are already sent (!dirty)
+ *
+ * Returns: true if a queued page is found
+ * ms: MigrationState in
+ * pss: PageSearchStatus structure updated with found block/offset
+ * ram_addr_abs: global offset in the dirty/sent bitmaps
+ */
+static bool get_queued_page(MigrationState *ms, PageSearchStatus *pss,
+ ram_addr_t *ram_addr_abs)
+{
+ RAMBlock *block;
+ ram_addr_t offset;
+ bool dirty;
+
+ do {
+ block = NULL;
+ qemu_mutex_lock(&ms->src_page_req_mutex);
+ if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
+ struct MigrationSrcPageRequest *entry =
+ QSIMPLEQ_FIRST(&ms->src_page_requests);
+ block = entry->rb;
+ offset = entry->offset;
+ *ram_addr_abs = (entry->offset + entry->rb->offset) &
+ TARGET_PAGE_MASK;
+
+ if (entry->len > TARGET_PAGE_SIZE) {
+ entry->len -= TARGET_PAGE_SIZE;
+ entry->offset += TARGET_PAGE_SIZE;
+ } else {
+ memory_region_unref(block->mr);
+ QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
+ g_free(entry);
+ }
+ }
+ qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+ /*
+ * We're sending this page, and since it's postcopy nothing else
+ * will dirty it, and we must make sure it doesn't get sent again
+ * even if this queue request was received after the background
+ * search already sent it.
+ */
+ if (block) {
+ dirty = test_bit(*ram_addr_abs >> TARGET_PAGE_BITS,
+ migration_bitmap);
+ if (!dirty) {
+ trace_get_queued_page_not_dirty(
+ block->idstr, (uint64_t)offset,
+ (uint64_t)*ram_addr_abs,
+ test_bit(*ram_addr_abs >> TARGET_PAGE_BITS, ms->sentmap));
+ } else {
+ trace_get_queued_page(block->idstr,
+ (uint64_t)offset,
+ (uint64_t)*ram_addr_abs);
+ }
+ }
+
+ } while (block && !dirty);
+
+ if (block) {
+ /*
+ * As soon as we start servicing pages out of order, then we have
+ * to kill the bulk stage, since the bulk stage assumes
+ * in (migration_bitmap_find_and_reset_dirty) that every page is
+ * dirty, that's no longer true.
+ */
+ ram_bulk_stage = false;
+
+ /*
+ * We want the background search to continue from the queued page
+ * since the guest is likely to want other pages near to the page
+ * it just requested.
+ */
+ pss->block = block;
+ pss->offset = offset;
+ }
+
+ return !!block;
+}
+
/**
* flush_page_queue: Flush any remaining pages in the ram request queue
* it should be empty at the end anyway, but in error cases there may be
@@ -1087,6 +1177,57 @@ err:
/**
+ * ram_save_host_page: Starting at *offset send pages upto the end
+ * of the current host page. It's valid for the initial
+ * offset to point into the middle of a host page
+ * in which case the remainder of the hostpage is sent.
+ * Only dirty target pages are sent.
+ *
+ * Returns: Number of pages written.
+ *
+ * @f: QEMUFile where to send the data
+ * @block: pointer to block that contains the page we want to send
+ * @offset: offset inside the block for the page; updated to last target page
+ * sent
+ * @last_stage: if we are at the completion stage
+ * @bytes_transferred: increase it with the number of transferred bytes
+ */
+static int ram_save_host_page(MigrationState *ms, QEMUFile *f, RAMBlock* block,
+ ram_addr_t *offset, bool last_stage,
+ uint64_t *bytes_transferred,
+ ram_addr_t dirty_ram_abs)
+{
+ int tmppages, pages = 0;
+ do {
+ /* Check the pages is dirty and if it is send it */
+ if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
+ if (compression_switch && migrate_use_compression()) {
+ tmppages = ram_save_compressed_page(f, block, *offset,
+ last_stage,
+ bytes_transferred);
+ } else {
+ tmppages = ram_save_page(f, block, *offset, last_stage,
+ bytes_transferred);
+ }
+
+ if (tmppages < 0) {
+ return tmppages;
+ }
+ if (ms->sentmap) {
+ set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
+ }
+ pages += tmppages;
+ }
+ *offset += TARGET_PAGE_SIZE;
+ dirty_ram_abs += TARGET_PAGE_SIZE;
+ } while (*offset & (qemu_host_page_size - 1));
+
+ /* The offset we leave with is the last one we looked at */
+ *offset -= TARGET_PAGE_SIZE;
+ return pages;
+}
+
+/**
* ram_find_and_save_block: Finds a dirty page and sends it to f
*
* Called within an RCU critical section.
@@ -1097,12 +1238,16 @@ err:
* @f: QEMUFile where to send the data
* @last_stage: if we are at the completion stage
* @bytes_transferred: increase it with the number of transferred bytes
+ *
+ * On systems where host-page-size > target-page-size it will send all the
+ * pages in a host page that are dirty.
*/
static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
uint64_t *bytes_transferred)
{
PageSearchStatus pss;
+ MigrationState *ms = migrate_get_current();
int pages = 0;
bool again, found;
ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
@@ -1117,26 +1262,18 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
}
do {
- found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);
+ again = true;
+ found = get_queued_page(ms, &pss, &dirty_ram_abs);
- if (found) {
- if (compression_switch && migrate_use_compression()) {
- pages = ram_save_compressed_page(f, pss.block, pss.offset,
- last_stage,
- bytes_transferred);
- } else {
- pages = ram_save_page(f, pss.block, pss.offset, last_stage,
- bytes_transferred);
- }
+ if (!found) {
+ /* priority queue empty, so just search for something dirty */
+ found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);
+ }
- /* if page is unmodified, continue to the next */
- if (pages > 0) {
- MigrationState *ms = migrate_get_current();
- last_sent_block = pss.block;
- if (ms->sentmap) {
- set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
- }
- }
+ if (found) {
+ pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
+ last_stage, bytes_transferred,
+ dirty_ram_abs);
}
} while (!pages && again);
diff --git a/trace-events b/trace-events
index e40f00e..9e4206b 100644
--- a/trace-events
+++ b/trace-events
@@ -1244,6 +1244,8 @@ vmstate_subsection_load_good(const char *parent) "%s"
qemu_file_fclose(void) ""
# migration/ram.c
+get_queued_page(const char *block_name, uint64_t tmp_offset, uint64_t ram_addr) "%s/%" PRIx64 " ram_addr=%" PRIx64
+get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, uint64_t ram_addr, int sent) "%s/%" PRIx64 " ram_addr=%" PRIx64 " (sent=%d)"
migration_bitmap_sync_start(void) ""
migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
migration_throttle(void) ""
--
2.5.0
* [Qemu-devel] [PATCH v8 41/54] postcopy_ram.c: place_page and helpers
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (39 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 40/54] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-28 10:28 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 42/54] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
` (12 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
postcopy_place_page (etc) provide a way for postcopy to place a page
into the guest's memory atomically (using the UFFDIO_COPY ioctl on the
userfault fd).
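The atomic placement itself boils down to a single UFFDIO_COPY ioctl; a minimal sketch (error handling trimmed, 'ufd' is assumed to be a userfaultfd with the destination range already registered):

    #include <errno.h>
    #include <linux/userfaultfd.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    static int place_page(int ufd, void *host, void *from)
    {
        struct uffdio_copy copy = {
            .dst = (uint64_t)(uintptr_t)host,   /* faulting page in guest RAM */
            .src = (uint64_t)(uintptr_t)from,   /* staging page holding the data */
            .len = (uint64_t)getpagesize(),
            .mode = 0,                          /* copy and wake waiters in one go */
        };

        return ioctl(ufd, UFFDIO_COPY, &copy) ? -errno : 0;
    }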
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 1 +
include/migration/postcopy-ram.h | 21 +++++++++
migration/postcopy-ram.c | 97 ++++++++++++++++++++++++++++++++++++++++
trace-events | 2 +
4 files changed, 121 insertions(+)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index fbf3b99..218b2ca 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -96,6 +96,7 @@ struct MigrationIncomingState {
int userfault_fd;
QEMUFile *to_src_file;
QemuMutex rp_mutex; /* We send replies from multiple threads */
+ void *postcopy_tmp_page;
/* See savevm.c */
LoadStateEntry_Head loadvm_handlers;
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 9d037ff..50c1ce5 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -69,4 +69,25 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
void postcopy_discard_send_finish(MigrationState *ms,
PostcopyDiscardState *pds);
+/*
+ * Place a page (from) at (host) efficiently
+ * There are restrictions on how 'from' must be mapped, in general best
+ * to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from);
+
+/*
+ * Place a zero page at (host) atomically
+ * returns 0 on success
+ */
+int postcopy_place_page_zero(MigrationIncomingState *mis, void *host);
+
+/*
+ * Allocate a page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * Returns: Pointer to allocated page
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis);
+
#endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index e89a99e..09a1349 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -272,6 +272,10 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
return -1;
}
+ if (mis->postcopy_tmp_page) {
+ munmap(mis->postcopy_tmp_page, getpagesize());
+ mis->postcopy_tmp_page = NULL;
+ }
return 0;
}
@@ -338,6 +342,83 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
return 0;
}
+/*
+ * Place a host page (from) at (host) atomically
+ * returns 0 on success
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from)
+{
+ struct uffdio_copy copy_struct;
+
+ copy_struct.dst = (uint64_t)(uintptr_t)host;
+ copy_struct.src = (uint64_t)(uintptr_t)from;
+ copy_struct.len = getpagesize();
+ copy_struct.mode = 0;
+
+ /* copy also acks to the kernel waking the stalled thread up
+ * TODO: We can inhibit that ack and only do it if it was requested
+ * which would be slightly cheaper, but we'd have to be careful
+ * of the order of updating our page state.
+ */
+ if (ioctl(mis->userfault_fd, UFFDIO_COPY, ©_struct)) {
+ int e = errno;
+ error_report("%s: %s copy host: %p from: %p",
+ __func__, strerror(e), host, from);
+
+ return -e;
+ }
+
+ trace_postcopy_place_page(host);
+ return 0;
+}
+
+/*
+ * Place a zero page at (host) atomically
+ * returns 0 on success
+ */
+int postcopy_place_page_zero(MigrationIncomingState *mis, void *host)
+{
+ struct uffdio_zeropage zero_struct;
+
+ zero_struct.range.start = (uint64_t)(uintptr_t)host;
+ zero_struct.range.len = getpagesize();
+ zero_struct.mode = 0;
+
+ if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
+ int e = errno;
+ error_report("%s: %s zero host: %p",
+ __func__, strerror(e), host);
+
+ return -e;
+ }
+
+ trace_postcopy_place_page_zero(host);
+ return 0;
+}
+
+/*
+ * Returns a target page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * The same address is used repeatedly, postcopy_place_page just takes the
+ * backing page away.
+ * Returns: Pointer to allocated page
+ *
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+ if (!mis->postcopy_tmp_page) {
+ mis->postcopy_tmp_page = mmap(NULL, getpagesize(),
+ PROT_READ | PROT_WRITE, MAP_PRIVATE |
+ MAP_ANONYMOUS, -1, 0);
+ if (!mis->postcopy_tmp_page) {
+ error_report("%s: %s", __func__, strerror(errno));
+ return NULL;
+ }
+ }
+
+ return mis->postcopy_tmp_page;
+}
+
#else
/* No target OS support, stubs just fail */
bool postcopy_ram_supported_by_host(void)
@@ -367,6 +448,22 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
{
assert(0);
}
+
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from)
+{
+ assert(0);
+}
+
+int postcopy_place_page_zero(MigrationIncomingState *mis, void *host)
+{
+ assert(0);
+}
+
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+ assert(0);
+}
+
#endif
/* ------------------------------------------------------------------------- */
diff --git a/trace-events b/trace-events
index 9e4206b..2f27385 100644
--- a/trace-events
+++ b/trace-events
@@ -1538,6 +1538,8 @@ postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s ma
postcopy_ram_discard_range(void *start, size_t length) "%p,+%zx"
postcopy_cleanup_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
postcopy_init_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
+postcopy_place_page(void *host_addr) "host=%p"
+postcopy_place_page_zero(void *host_addr) "host=%p"
# kvm-all.c
kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
--
2.5.0
* [Qemu-devel] [PATCH v8 42/54] Postcopy: Use helpers to map pages during migration
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (40 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 41/54] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-28 10:58 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 43/54] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
` (11 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
In postcopy, the destination guest is running at the same time
as it's receiving pages; as we receive new pages we must put
them into the guest's address space atomically, to avoid a running
CPU accessing a partially written page.
Use the helpers in postcopy-ram.c to map these pages.
qemu_get_buffer_in_place is used to avoid a copy out of qemu_file
in the case where postcopy is going to do a copy into place anyway.
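A standalone sketch of the staging arithmetic used here (function names are illustrative; QEMU expresses the same thing via qemu_host_page_mask, which is ~(host page size - 1)):

    #include <stdbool.h>
    #include <stdint.h>

    /* Where an incoming target page lands inside the temporary host page */
    static void *slot_in_temp_page(void *tmp_page, uintptr_t host,
                                   uintptr_t host_page_size)
    {
        return (char *)tmp_page + (host & (host_page_size - 1));
    }

    /* The host page can be placed once its final target page has arrived */
    static bool is_last_target_page(uintptr_t host, uintptr_t target_page_size,
                                    uintptr_t host_page_size)
    {
        return ((host + target_page_size) & (host_page_size - 1)) == 0;
    }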
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/ram.c | 128 +++++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 103 insertions(+), 25 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 487e838..6d9cfb5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1848,7 +1848,17 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
/* Must be called from within a rcu critical section.
* Returns a pointer from within the RCU-protected ram_list.
*/
+/*
+ * Read a RAMBlock ID from the stream f, find the host address of the
+ * start of that block and add on 'offset'
+ *
+ * f: Stream to read from
+ * mis: MigrationIncomingState
+ * offset: Offset within the block
+ * flags: Page flags (mostly to see if it's a continuation of previous block)
+ */
static inline void *host_from_stream_offset(QEMUFile *f,
+ MigrationIncomingState *mis,
ram_addr_t offset,
int flags)
{
@@ -2000,6 +2010,15 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
int flags = 0, ret = 0;
static uint64_t seq_iter;
int len = 0;
+ MigrationIncomingState *mis = migration_incoming_get_current();
+ /*
+ * If system is running in postcopy mode, page inserts to host memory must
+ * be atomic
+ */
+ bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
+ void *postcopy_host_page = NULL;
+ bool postcopy_place_needed = false;
+ bool matching_page_sizes = qemu_host_page_size == TARGET_PAGE_SIZE;
seq_iter++;
@@ -2015,13 +2034,55 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
rcu_read_lock();
while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
ram_addr_t addr, total_ram_bytes;
- void *host;
+ void *host = NULL;
+ void *page_buffer = NULL;
+ void *postcopy_place_source = NULL;
uint8_t ch;
+ bool all_zero = false;
addr = qemu_get_be64(f);
flags = addr & ~TARGET_PAGE_MASK;
addr &= TARGET_PAGE_MASK;
+ postcopy_place_needed = false;
+ if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
+ RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
+ host = host_from_stream_offset(f, mis, addr, flags);
+ if (!host) {
+ error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
+ ret = -EINVAL;
+ break;
+ }
+ page_buffer = host;
+ if (postcopy_running) {
+ /*
+ * Postcopy requires that we place whole host pages atomically.
+ * To make it atomic, the data is read into a temporary page
+ * that's moved into place later.
+ * The migration protocol uses, possibly smaller, target-pages
+ * however the source ensures it always sends all the components
+ * of a host page in order.
+ */
+ if (!postcopy_host_page) {
+ postcopy_host_page = postcopy_get_tmp_page(mis);
+ }
+ page_buffer = postcopy_host_page +
+ ((uintptr_t)host & ~qemu_host_page_mask);
+ /* If all TP are zero then we can optimise the place */
+ if (!((uintptr_t)host & ~qemu_host_page_mask)) {
+ all_zero = true;
+ }
+
+ /*
+ * If it's the last part of a host page then we place the host
+ * page
+ */
+ postcopy_place_needed = (((uintptr_t)host + TARGET_PAGE_SIZE) &
+ ~qemu_host_page_mask) == 0;
+ postcopy_place_source = postcopy_host_page;
+ }
+ }
+
switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
case RAM_SAVE_FLAG_MEM_SIZE:
/* Synchronize RAM block list */
@@ -2062,32 +2123,36 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
}
break;
case RAM_SAVE_FLAG_COMPRESS:
- host = host_from_stream_offset(f, addr, flags);
- if (!host) {
- error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
- ret = -EINVAL;
- break;
- }
ch = qemu_get_byte(f);
- ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+ if (!postcopy_running) {
+ ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+ } else {
+ memset(page_buffer, ch, TARGET_PAGE_SIZE);
+ if (ch) {
+ all_zero = false;
+ }
+ }
break;
+
case RAM_SAVE_FLAG_PAGE:
- host = host_from_stream_offset(f, addr, flags);
- if (!host) {
- error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
- ret = -EINVAL;
- break;
+ all_zero = false;
+ if (!postcopy_place_needed || !matching_page_sizes) {
+ qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
+ } else {
+ /* Avoids the qemu_file copy during postcopy, which is
+ * going to do a copy later; can only do it when we
+ * do this read in one go (matching page sizes)
+ */
+ qemu_get_buffer_in_place(f, (uint8_t **)&postcopy_place_source,
+ TARGET_PAGE_SIZE);
}
- qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
break;
case RAM_SAVE_FLAG_COMPRESS_PAGE:
- host = host_from_stream_offset(f, addr, flags);
- if (!host) {
- error_report("Invalid RAM offset " RAM_ADDR_FMT, addr);
- ret = -EINVAL;
- break;
+ all_zero = false;
+ if (postcopy_running) {
+ error_report("Compressed RAM in postcopy mode @%zx\n", addr);
+ return -EINVAL;
}
-
len = qemu_get_be32(f);
if (len < 0 || len > compressBound(TARGET_PAGE_SIZE)) {
error_report("Invalid compressed data length: %d", len);
@@ -2097,12 +2162,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
qemu_get_buffer(f, compressed_data_buf, len);
decompress_data_with_multi_threads(compressed_data_buf, host, len);
break;
+
case RAM_SAVE_FLAG_XBZRLE:
- host = host_from_stream_offset(f, addr, flags);
- if (!host) {
- error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
- ret = -EINVAL;
- break;
+ all_zero = false;
+ if (postcopy_running) {
+ error_report("XBZRLE RAM block in postcopy mode @%zx\n", addr);
+ return -EINVAL;
}
if (load_xbzrle(f, addr, host) < 0) {
error_report("Failed to decompress XBZRLE page at "
@@ -2123,6 +2188,19 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
ret = -EINVAL;
}
}
+
+ if (postcopy_place_needed) {
+ /* This gets called at the last target page in the host page */
+ if (!all_zero) {
+ ret = postcopy_place_page(mis, host + TARGET_PAGE_SIZE -
+ qemu_host_page_size,
+ postcopy_place_source);
+ } else {
+ ret = postcopy_place_page_zero(mis,
+ host + TARGET_PAGE_SIZE -
+ qemu_host_page_size);
+ }
+ }
if (!ret) {
ret = qemu_file_get_error(f);
}
--
2.5.0
* [Qemu-devel] [PATCH v8 43/54] Don't sync dirty bitmaps in postcopy
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (41 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 42/54] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 44/54] Don't iterate on precopy-only devices during postcopy Dr. David Alan Gilbert (git)
` (10 subsequent siblings)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Once we're in postcopy the source's CPUs are stopped and memory
shouldn't change any more, so there's no need to look at the dirty
map.
There are two notes to this:
1) If we did resync and a page had changed, it would get sent again,
which the destination wouldn't allow (since it might have modified
the page itself by then)
2) Before disabling this I'd seen very rare cases where a page had been
marked dirty although the memory contents were apparently identical
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
migration/ram.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 6d9cfb5..437b937 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1761,7 +1761,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
{
rcu_read_lock();
- migration_bitmap_sync();
+ if (!migration_in_postcopy(migrate_get_current())) {
+ migration_bitmap_sync();
+ }
ram_control_before_iterate(f, RAM_CONTROL_FINISH);
@@ -1797,7 +1799,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
- if (remaining_size < max_size) {
+ if (!migration_in_postcopy(migrate_get_current()) &&
+ remaining_size < max_size) {
qemu_mutex_lock_iothread();
rcu_read_lock();
migration_bitmap_sync();
--
2.5.0
* [Qemu-devel] [PATCH v8 44/54] Don't iterate on precopy-only devices during postcopy
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (42 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 43/54] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-28 11:01 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 45/54] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
` (9 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
During the postcopy phase we must not call the iterate method on
precopy-only devices, since they may have done some cleanup during
the _complete call at the end of the precopy phase.
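A standalone sketch of the rule this adds (SaveHandler is an illustrative stand-in for QEMU's SaveStateEntry/SaveVMHandlers):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct SaveHandler {
        int (*save_live_iterate)(void *opaque);
        int (*save_live_complete_postcopy)(void *opaque);
        void *opaque;
    } SaveHandler;

    static bool should_iterate(const SaveHandler *h, bool in_postcopy)
    {
        if (!h->save_live_iterate) {
            return false;     /* nothing to iterate for this device */
        }
        /* Precopy-only devices completed their state at the end of precopy */
        return !in_postcopy || h->save_live_complete_postcopy != NULL;
    }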
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
include/sysemu/sysemu.h | 2 +-
migration/migration.c | 2 +-
migration/savevm.c | 13 +++++++++++--
3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 9a0d0b5..e2353a5 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -109,7 +109,7 @@ bool qemu_savevm_state_blocked(Error **errp);
void qemu_savevm_state_begin(QEMUFile *f,
const MigrationParams *params);
void qemu_savevm_state_header(QEMUFile *f);
-int qemu_savevm_state_iterate(QEMUFile *f);
+int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
void qemu_savevm_state_complete_postcopy(QEMUFile *f);
void qemu_savevm_state_complete_precopy(QEMUFile *f);
void qemu_savevm_state_cancel(void);
diff --git a/migration/migration.c b/migration/migration.c
index 6160259..bb7b683 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1565,7 +1565,7 @@ static void *migration_thread(void *opaque)
continue;
}
/* Just another iteration step */
- qemu_savevm_state_iterate(s->file);
+ qemu_savevm_state_iterate(s->file, entered_postcopy);
} else {
trace_migration_thread_low_pending(pending_size);
diff --git a/migration/savevm.c b/migration/savevm.c
index 4072912..63b2c30 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -931,7 +931,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
* 0 : We haven't finished, caller have to go again
* 1 : We have finished, we can go to complete phase
*/
-int qemu_savevm_state_iterate(QEMUFile *f)
+int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
{
SaveStateEntry *se;
int ret = 1;
@@ -946,6 +946,15 @@ int qemu_savevm_state_iterate(QEMUFile *f)
continue;
}
}
+ /*
+ * In the postcopy phase, any device that doesn't know how to
+ * do postcopy should have saved it's state in the _complete
+ * call that's already run, it might get confused if we call
+ * iterate afterwards.
+ */
+ if (postcopy && !se->ops->save_live_complete_postcopy) {
+ return 0;
+ }
if (qemu_file_rate_limit(f)) {
return 0;
}
@@ -1160,7 +1169,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
qemu_mutex_lock_iothread();
while (qemu_file_get_error(f) == 0) {
- if (qemu_savevm_state_iterate(f) > 0) {
+ if (qemu_savevm_state_iterate(f, false) > 0) {
break;
}
}
--
2.5.0
* [Qemu-devel] [PATCH v8 45/54] Host page!=target page: Cleanup bitmaps
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (43 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 44/54] Don't iterate on precopy-only devices during postcopy Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-28 11:24 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 46/54] postcopy: Check order of received target pages Dr. David Alan Gilbert (git)
` (8 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Prior to the start of postcopy, ensure that everything that will
be transferred later consists of whole host pages.
This is accomplished by discarding partially sent host pages
and marking any host page that is partially dirty as fully dirty.
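A runnable sketch of the first of those two steps, with an illustrative sentmap and host_ratio (host page size / target page size); the actual code walks runs of bits in the sentmap and dirty bitmap rather than scanning page by page:

    #include <stdio.h>

    int main(void)
    {
        const int host_ratio = 4;    /* e.g. 16K host pages / 4K target pages */
        /* one flag per target page: 1 = already sent during precopy */
        const int sentmap[] = {1, 1, 1, 1,  1, 1, 0, 0,  0, 0, 0, 0,  0, 1, 1, 1};
        const int pages = sizeof(sentmap) / sizeof(sentmap[0]);

        for (int hp = 0; hp < pages; hp += host_ratio) {
            int sent = 0;
            for (int tp = hp; tp < hp + host_ratio; tp++) {
                sent += sentmap[tp];
            }
            if (sent && sent != host_ratio) {
                printf("host page %d partially sent -> discard and redirty\n",
                       hp / host_ratio);
            }
        }
        return 0;
    }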
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/ram.c | 190 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 190 insertions(+)
diff --git a/migration/ram.c b/migration/ram.c
index 437b937..d6437be 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1508,6 +1508,189 @@ static int postcopy_each_ram_send_discard(MigrationState *ms)
}
/*
+ * Utility for the outgoing postcopy code.
+ *
+ * Discard any partially sent host-page size chunks, mark any partially
+ * dirty host-page size chunks as all dirty.
+ *
+ * Returns: 0 on success
+ */
+static int postcopy_chunk_hostpages(MigrationState *ms)
+{
+ struct RAMBlock *block;
+ unsigned int host_ratio = qemu_host_page_size / TARGET_PAGE_SIZE;
+
+ if (qemu_host_page_size == TARGET_PAGE_SIZE) {
+ /* Easy case - TPS==HPS - nothing to be done */
+ return 0;
+ }
+
+ /* Easiest way to make sure we don't resume in the middle of a host-page */
+ last_seen_block = NULL;
+ last_sent_block = NULL;
+ last_offset = 0;
+
+ QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+ unsigned long first = block->offset >> TARGET_PAGE_BITS;
+ unsigned long len = block->used_length >> TARGET_PAGE_BITS;
+ unsigned long last = first + (len - 1);
+ unsigned long found_set;
+ unsigned long search_start;
+
+ PostcopyDiscardState *pds =
+ postcopy_discard_send_init(ms, first, block->idstr);
+
+ /* First pass: Discard all partially sent host pages */
+ found_set = find_next_bit(ms->sentmap, last + 1, first);
+ while (found_set <= last) {
+ bool do_discard = false;
+ unsigned long discard_start_addr;
+ /*
+ * If the start of this run of pages is in the middle of a host
+ * page, then we need to discard this host page.
+ */
+ if (found_set % host_ratio) {
+ do_discard = true;
+ found_set -= found_set % host_ratio;
+ discard_start_addr = found_set;
+ search_start = found_set + host_ratio;
+ } else {
+ /* Find the end of this run */
+ unsigned long found_zero;
+ found_zero = find_next_zero_bit(ms->sentmap, last + 1,
+ found_set + 1);
+ /*
+ * If the 0 isn't at the start of a host page, then the
+ * run of 1's doesn't finish at the end of a host page
+ * and we need to discard.
+ */
+ if (found_zero % host_ratio) {
+ do_discard = true;
+ discard_start_addr = found_zero - (found_zero % host_ratio);
+ /*
+ * This host page has gone, the next loop iteration starts
+ * from the next page with a 1 bit
+ */
+ search_start = discard_start_addr + host_ratio;
+ } else {
+ /*
+ * No discards on this iteration, next loop starts from
+ * next 1 bit
+ */
+ search_start = found_zero + 1;
+ }
+ }
+ /* Find the next 1 for the next iteration */
+ found_set = find_next_bit(ms->sentmap, last + 1, search_start);
+
+ if (do_discard) {
+ unsigned long page;
+
+ /* Tell the destination to discard this page */
+ postcopy_discard_send_range(ms, pds, discard_start_addr,
+ discard_start_addr + host_ratio - 1);
+ /* Clean up the bitmap */
+ for (page = discard_start_addr;
+ page < discard_start_addr + host_ratio; page++) {
+ /* All pages in this host page are now not sent */
+ clear_bit(page, ms->sentmap);
+
+ /*
+ * Remark them as dirty, updating the count for any pages
+ * that weren't previously dirty.
+ */
+ migration_dirty_pages += !test_and_set_bit(page,
+ migration_bitmap);
+ }
+ }
+ }
+
+ /*
+ * Second pass: Ensure that all partially dirty host pages are made
+ * fully dirty.
+ */
+ found_set = find_next_bit(migration_bitmap, last + 1, first);
+ while (found_set <= last) {
+ bool do_dirty = false;
+ unsigned long dirty_start_addr;
+ /*
+ * If the start of this run of pages is in the middle of a host
+ * page, then we need to mark the whole of this host page dirty
+ */
+ if (found_set % host_ratio) {
+ do_dirty = true;
+ found_set -= found_set % host_ratio;
+ dirty_start_addr = found_set;
+ search_start = found_set + host_ratio;
+ } else {
+ /* Find the end of this run */
+ unsigned long found_zero;
+ found_zero = find_next_zero_bit(migration_bitmap, last + 1,
+ found_set + 1);
+ /*
+ * If the 0 isn't at the start of a host page, then the
+ * run of 1's doesn't finish at the end of a host page
+ * and we need to discard.
+ */
+ if (found_zero % host_ratio) {
+ do_dirty = true;
+ dirty_start_addr = found_zero - (found_zero % host_ratio);
+ /*
+ * This host page has gone, the next loop iteration starts
+ * from the next page with a 1 bit
+ */
+ search_start = dirty_start_addr + host_ratio;
+ } else {
+ /*
+ * No discards on this iteration, next loop starts from
+ * next 1 bit
+ */
+ search_start = found_zero + 1;
+ }
+ }
+
+ /* Find the next 1 for the next iteration */
+ found_set = find_next_bit(migration_bitmap, last + 1, search_start);
+
+ if (do_dirty) {
+ unsigned long page;
+
+ if (test_bit(dirty_start_addr, ms->sentmap)) {
+ /*
+ * If the page being redirtied is marked as sent, then it
+ * must have been fully sent (otherwise it would have been
+ * discarded by the previous pass.)
+ * Discard it now.
+ */
+ postcopy_discard_send_range(ms, pds, dirty_start_addr,
+ dirty_start_addr +
+ host_ratio - 1);
+ }
+
+ /* Clean up the bitmap */
+ for (page = dirty_start_addr;
+ page < dirty_start_addr + host_ratio; page++) {
+
+ /* Clear the sentmap bits for the discard case above */
+ clear_bit(page, ms->sentmap);
+
+ /*
+ * Mark them as dirty, updating the count for any pages
+ * that weren't previously dirty.
+ */
+ migration_dirty_pages += !test_and_set_bit(page,
+ migration_bitmap);
+ }
+ }
+ }
+ postcopy_discard_send_finish(ms, pds);
+
+ } /* ram_list loop */
+
+ return 0;
+}
+
+/*
* Transmit the set of pages to be discarded after precopy to the target
* these are pages that:
* a) Have been previously transmitted but are now dirty again
@@ -1525,6 +1708,13 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
/* This should be our last sync, the src is now paused */
migration_bitmap_sync();
+ /* Deal with TPS != HPS */
+ ret = postcopy_chunk_hostpages(ms);
+ if (ret) {
+ rcu_read_unlock();
+ return ret;
+ }
+
/*
* Update the sentmap to be sentmap = ~sentmap | dirty
*/
--
2.5.0
* [Qemu-devel] [PATCH v8 46/54] postcopy: Check order of received target pages
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (44 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 45/54] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-28 11:26 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 47/54] Round up RAMBlock sizes to host page sizes Dr. David Alan Gilbert (git)
` (7 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Ensure that target pages received within a host page arrive in order.
This check shouldn't trigger, but if the sender goes wrong and sends
pages out of order the result is corruption that's really nasty to
debug.
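A minimal, standalone form of the check (sizes and names are illustrative; the patch compares the mapped host addresses directly):

    #include <stdbool.h>
    #include <stdint.h>

    /* A target page that is not the first of its host page must directly
     * follow the previously received target page. */
    static bool target_page_in_order(uintptr_t host, uintptr_t last_host,
                                     uintptr_t target_page_size,
                                     uintptr_t host_page_size)
    {
        if ((host & (host_page_size - 1)) == 0) {
            return true;      /* first target page of a host page */
        }
        return host == last_host + target_page_size;
    }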
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/ram.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/migration/ram.c b/migration/ram.c
index d6437be..8b1570d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2212,6 +2212,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
void *postcopy_host_page = NULL;
bool postcopy_place_needed = false;
bool matching_page_sizes = qemu_host_page_size == TARGET_PAGE_SIZE;
+ void *last_host = NULL;
seq_iter++;
@@ -2264,6 +2265,14 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
/* If all TP are zero then we can optimise the place */
if (!((uintptr_t)host & ~qemu_host_page_mask)) {
all_zero = true;
+ } else {
+ /* not the 1st TP within the HP */
+ if (host != (last_host + TARGET_PAGE_SIZE)) {
+ error_report("Non-sequential target page %p/%p\n",
+ host, last_host);
+ ret = -EINVAL;
+ break;
+ }
}
/*
@@ -2274,6 +2283,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
~qemu_host_page_mask) == 0;
postcopy_place_source = postcopy_host_page;
}
+ last_host = host;
}
switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
--
2.5.0
* [Qemu-devel] [PATCH v8 47/54] Round up RAMBlock sizes to host page sizes
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (45 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 46/54] postcopy: Check order of received target pages Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-28 11:28 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 48/54] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
` (6 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
RAMBlocks that are not a multiple of host pages in length
cause problems for postcopy (I've seen an ACPI table on aarch64
be 5k in length - i.e. 5x target-page), so round RAMBlock sizes
up to a host-page.
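A runnable sketch of the effect, using the 5k example above (that note implies 1k target pages; the 4k host page size here is an assumption for illustration):

    #include <stdio.h>

    int main(void)
    {
        const unsigned long target_page = 1024;   /* from "5k == 5x target-page" */
        const unsigned long host_page = 4096;     /* assumed host page size */
        unsigned long size = 5 * 1024;            /* the ACPI table example */

        unsigned long tp_aligned = (size + target_page - 1) & ~(target_page - 1);
        unsigned long hp_aligned = (size + host_page - 1) & ~(host_page - 1);

        printf("TARGET_PAGE_ALIGN: %lu bytes, HOST_PAGE_ALIGN: %lu bytes\n",
               tp_aligned, hp_aligned);           /* 5120 vs 8192 */
        return 0;
    }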
This potentially breaks migration compatibility due to changes
in RAMBlock sizes; however:
1) x86 and s390 I think always have host=target page size
2) When I've tried on Power the block sizes already seem aligned.
3) I don't think there's anything else that maintains per-version
machine-types for compatibility.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
exec.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/exec.c b/exec.c
index d7c50e3..f746409 100644
--- a/exec.c
+++ b/exec.c
@@ -1425,7 +1425,7 @@ int qemu_ram_resize(ram_addr_t base, ram_addr_t newsize, Error **errp)
assert(block);
- newsize = TARGET_PAGE_ALIGN(newsize);
+ newsize = HOST_PAGE_ALIGN(newsize);
if (block->used_length == newsize) {
return 0;
@@ -1569,7 +1569,7 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
return -1;
}
- size = TARGET_PAGE_ALIGN(size);
+ size = HOST_PAGE_ALIGN(size);
new_block = g_malloc0(sizeof(*new_block));
new_block->mr = mr;
new_block->used_length = size;
@@ -1604,8 +1604,8 @@ ram_addr_t qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
ram_addr_t addr;
Error *local_err = NULL;
- size = TARGET_PAGE_ALIGN(size);
- max_size = TARGET_PAGE_ALIGN(max_size);
+ size = HOST_PAGE_ALIGN(size);
+ max_size = HOST_PAGE_ALIGN(max_size);
new_block = g_malloc0(sizeof(*new_block));
new_block->mr = mr;
new_block->resized = resized;
--
2.5.0
* [Qemu-devel] [PATCH v8 48/54] Postcopy; Handle userfault requests
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (46 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 47/54] Round up RAMBlock sizes to host page sizes Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 49/54] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
` (5 subsequent siblings)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
userfaultfd is a Linux syscall that gives an fd which delivers a stream
of notifications of accesses to pages registered with it, and lets
the program resolve those stalls and tell the accessing thread to
carry on.
We convert the requests from the kernel into messages sent back to the
source asking for the pages.
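A condensed sketch of the fault-handling loop this patch adds: poll the userfault fd alongside a 'quit' eventfd, read one uffd_msg per fault, and turn the faulting address into a page request. request_page() stands in for the RAMBlock lookup plus migrate_send_rp_req_pages; error handling is trimmed.

    #include <linux/userfaultfd.h>
    #include <poll.h>
    #include <stdint.h>
    #include <unistd.h>

    extern void request_page(uint64_t fault_addr);   /* illustrative stand-in */

    static void fault_loop(int userfault_fd, int quit_fd)
    {
        struct uffd_msg msg;
        struct pollfd pfd[2] = {
            { .fd = userfault_fd, .events = POLLIN },
            { .fd = quit_fd,      .events = POLLIN },  /* eventfd: tells us to exit */
        };

        for (;;) {
            if (poll(pfd, 2, -1) == -1 || pfd[1].revents) {
                return;                                  /* error or asked to quit */
            }
            if (read(userfault_fd, &msg, sizeof(msg)) != sizeof(msg)) {
                continue;                                /* spurious wakeup (EAGAIN) */
            }
            if (msg.event == UFFD_EVENT_PAGEFAULT) {
                request_page(msg.arg.pagefault.address); /* ask the source for it */
            }
        }
    }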
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/migration/migration.h | 3 +
migration/postcopy-ram.c | 155 +++++++++++++++++++++++++++++++++++++++---
trace-events | 9 +++
3 files changed, 158 insertions(+), 9 deletions(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 218b2ca..65dfe04 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -89,11 +89,14 @@ struct MigrationIncomingState {
*/
QemuEvent main_thread_load_event;
+ bool have_fault_thread;
QemuThread fault_thread;
QemuSemaphore fault_thread_sem;
/* For the kernel to send us notifications */
int userfault_fd;
+ /* To tell the fault_thread to quit */
+ int userfault_quit_fd;
QEMUFile *to_src_file;
QemuMutex rp_mutex; /* We send replies from multiple threads */
void *postcopy_tmp_page;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 09a1349..0b021ca 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -51,6 +51,8 @@ struct PostcopyDiscardState {
*/
#if defined(__linux__)
+#include <poll.h>
+#include <sys/eventfd.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
@@ -267,15 +269,41 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
*/
int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
{
- /* TODO: Join the fault thread once we're sure it will exit */
- if (qemu_ram_foreach_block(cleanup_range, mis)) {
- return -1;
+ trace_postcopy_ram_incoming_cleanup_entry();
+
+ if (mis->have_fault_thread) {
+ uint64_t tmp64;
+
+ if (qemu_ram_foreach_block(cleanup_range, mis)) {
+ return -1;
+ }
+ /*
+ * Tell the fault_thread to exit, it's an eventfd that should
+ * currently be at 0, we're going to increment it to 1
+ */
+ tmp64 = 1;
+ if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
+ trace_postcopy_ram_incoming_cleanup_join();
+ qemu_thread_join(&mis->fault_thread);
+ } else {
+ /* Not much we can do here, but may as well report it */
+ error_report("%s: incrementing userfault_quit_fd: %s", __func__,
+ strerror(errno));
+ }
+ trace_postcopy_ram_incoming_cleanup_closeuf();
+ close(mis->userfault_fd);
+ close(mis->userfault_quit_fd);
+ mis->have_fault_thread = false;
}
+ postcopy_state_set(POSTCOPY_INCOMING_END);
+ migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
+
if (mis->postcopy_tmp_page) {
munmap(mis->postcopy_tmp_page, getpagesize());
mis->postcopy_tmp_page = NULL;
}
+ trace_postcopy_ram_incoming_cleanup_exit();
return 0;
}
@@ -314,31 +342,140 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
static void *postcopy_ram_fault_thread(void *opaque)
{
MigrationIncomingState *mis = opaque;
+ struct uffd_msg msg;
+ int ret;
+ size_t hostpagesize = getpagesize();
+ RAMBlock *rb = NULL;
+ RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
- fprintf(stderr, "postcopy_ram_fault_thread\n");
- /* TODO: In later patch */
+ trace_postcopy_ram_fault_thread_entry();
qemu_sem_post(&mis->fault_thread_sem);
- while (1) {
- /* TODO: In later patch */
- }
+ while (true) {
+ ram_addr_t rb_offset;
+ ram_addr_t in_raspace;
+ struct pollfd pfd[2];
+
+ /*
+ * We're mainly waiting for the kernel to give us a faulting HVA,
+ * however we can be told to quit via userfault_quit_fd which is
+ * an eventfd
+ */
+ pfd[0].fd = mis->userfault_fd;
+ pfd[0].events = POLLIN;
+ pfd[0].revents = 0;
+ pfd[1].fd = mis->userfault_quit_fd;
+ pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
+ pfd[1].revents = 0;
+
+ if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
+ error_report("%s: userfault poll: %s", __func__, strerror(errno));
+ break;
+ }
+
+ if (pfd[1].revents) {
+ trace_postcopy_ram_fault_thread_quit();
+ break;
+ }
+
+ ret = read(mis->userfault_fd, &msg, sizeof(msg));
+ if (ret != sizeof(msg)) {
+ if (errno == EAGAIN) {
+ /*
+ * if a wake up happens on the other thread just after
+ * the poll, there is nothing to read.
+ */
+ continue;
+ }
+ if (ret < 0) {
+ error_report("%s: Failed to read full userfault message: %s",
+ __func__, strerror(errno));
+ break;
+ } else {
+ error_report("%s: Read %d bytes from userfaultfd expected %zd",
+ __func__, ret, sizeof(msg));
+ break; /* Lost alignment, don't know what we'd read next */
+ }
+ }
+ if (msg.event != UFFD_EVENT_PAGEFAULT) {
+ error_report("%s: Read unexpected event %ud from userfaultfd",
+ __func__, msg.event);
+ continue; /* It's not a page fault, shouldn't happen */
+ }
+
+ rb = qemu_ram_block_from_host(
+ (void *)(uintptr_t)msg.arg.pagefault.address,
+ true, &in_raspace, &rb_offset);
+ if (!rb) {
+ error_report("postcopy_ram_fault_thread: Fault outside guest: %"
+ PRIx64, (uint64_t)msg.arg.pagefault.address);
+ break;
+ }
+
+ rb_offset &= ~(hostpagesize - 1);
+ trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
+ qemu_ram_get_idstr(rb),
+ rb_offset);
+
+ /*
+ * Send the request to the source - we want to request one
+ * of our host page sizes (which is >= TPS)
+ */
+ if (rb != last_rb) {
+ last_rb = rb;
+ migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+ rb_offset, hostpagesize);
+ } else {
+ /* Save some space */
+ migrate_send_rp_req_pages(mis, NULL,
+ rb_offset, hostpagesize);
+ }
+ }
+ trace_postcopy_ram_fault_thread_exit();
return NULL;
}
int postcopy_ram_enable_notify(MigrationIncomingState *mis)
{
- /* Create the fault handler thread and wait for it to be ready */
+ /* Open the fd for the kernel to give us userfaults */
+ mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+ if (mis->userfault_fd == -1) {
+ error_report("%s: Failed to open userfault fd: %s", __func__,
+ strerror(errno));
+ return -1;
+ }
+
+ /*
+ * Although the host check already tested the API, we need to
+ * do the check again as an ABI handshake on the new fd.
+ */
+ if (!ufd_version_check(mis->userfault_fd)) {
+ return -1;
+ }
+
+ /* Now an eventfd we use to tell the fault-thread to quit */
+ mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
+ if (mis->userfault_quit_fd == -1) {
+ error_report("%s: Opening userfault_quit_fd: %s", __func__,
+ strerror(errno));
+ close(mis->userfault_fd);
+ return -1;
+ }
+
qemu_sem_init(&mis->fault_thread_sem, 0);
qemu_thread_create(&mis->fault_thread, "postcopy/fault",
postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
qemu_sem_wait(&mis->fault_thread_sem);
qemu_sem_destroy(&mis->fault_thread_sem);
+ mis->have_fault_thread = true;
/* Mark so that we get notified of accesses to unwritten areas */
if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
return -1;
}
+ trace_postcopy_ram_enable_notify();
+
return 0;
}
diff --git a/trace-events b/trace-events
index 2f27385..8ca3518 100644
--- a/trace-events
+++ b/trace-events
@@ -1540,6 +1540,15 @@ postcopy_cleanup_range(const char *ramblock, void *host_addr, size_t offset, siz
postcopy_init_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
postcopy_place_page(void *host_addr) "host=%p"
postcopy_place_page_zero(void *host_addr) "host=%p"
+postcopy_ram_enable_notify(void) ""
+postcopy_ram_fault_thread_entry(void) ""
+postcopy_ram_fault_thread_exit(void) ""
+postcopy_ram_fault_thread_quit(void) ""
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
+postcopy_ram_incoming_cleanup_closeuf(void) ""
+postcopy_ram_incoming_cleanup_entry(void) ""
+postcopy_ram_incoming_cleanup_exit(void) ""
+postcopy_ram_incoming_cleanup_join(void) ""
# kvm-all.c
kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
--
2.5.0
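For readers unfamiliar with the fault-thread pattern used in the hunk above, here is a
minimal standalone sketch of polling a userfaultfd together with a quit eventfd. The
kernel interfaces (userfaultfd, UFFDIO_API, eventfd, poll) are real Linux APIs (4.3+);
the structure, names and error handling are illustrative only and are not QEMU code.
In QEMU the loop runs in its own thread and the eventfd write comes from
postcopy_ram_incoming_cleanup().

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static void fault_loop(int ufd, int quit_fd)
{
    struct uffd_msg msg;

    for (;;) {
        struct pollfd pfd[2] = {
            { .fd = ufd,     .events = POLLIN },
            { .fd = quit_fd, .events = POLLIN },    /* quit request */
        };

        if (poll(pfd, 2, -1) == -1) {
            perror("poll");
            return;
        }
        if (pfd[1].revents) {
            return;                       /* eventfd went positive: quit */
        }
        ssize_t n = read(ufd, &msg, sizeof(msg));
        if (n != (ssize_t)sizeof(msg)) {
            if (n < 0 && errno == EAGAIN) {
                continue;                 /* raced with a wakeup, nothing to read */
            }
            perror("read userfaultfd");
            return;
        }
        if (msg.event == UFFD_EVENT_PAGEFAULT) {
            /* QEMU would send a page request back to the source here */
            printf("fault at 0x%llx\n",
                   (unsigned long long)msg.arg.pagefault.address);
        }
    }
}

int main(void)
{
    struct uffdio_api api = { .api = UFFD_API };
    int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    int quit_fd = eventfd(0, EFD_CLOEXEC);
    uint64_t one = 1;

    if (ufd == -1 || quit_fd == -1 || ioctl(ufd, UFFDIO_API, &api) == -1) {
        perror("setup");
        return 1;
    }
    /* No ranges are registered here, so ask the loop to quit straight away */
    if (write(quit_fd, &one, sizeof(one)) != sizeof(one)) {
        perror("write eventfd");
    }
    fault_loop(ufd, quit_fd);
    close(quit_fd);
    close(ufd);
    return 0;
}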
^ permalink raw reply related [flat|nested] 119+ messages in thread
* [Qemu-devel] [PATCH v8 49/54] Start up a postcopy/listener thread ready for incoming page data
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (47 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 48/54] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 50/54] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
` (4 subsequent siblings)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
The loading of a device state (during postcopy) may access guest
memory that's still on the source machine and thus might need
a page fill; split off a separate thread that handles the incoming
page data so that the original incoming migration code can finish
off the device data.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
include/migration/migration.h | 4 +++
migration/migration.c | 6 ++++
migration/savevm.c | 79 ++++++++++++++++++++++++++++++++++++++++++-
trace-events | 2 ++
4 files changed, 90 insertions(+), 1 deletion(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 65dfe04..e8bed7d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -93,6 +93,10 @@ struct MigrationIncomingState {
QemuThread fault_thread;
QemuSemaphore fault_thread_sem;
+ bool have_listen_thread;
+ QemuThread listen_thread;
+ QemuSemaphore listen_thread_sem;
+
/* For the kernel to send us notifications */
int userfault_fd;
/* To tell the fault_thread to quit */
diff --git a/migration/migration.c b/migration/migration.c
index bb7b683..379fadc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1381,6 +1381,12 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
goto fail;
}
+ /*
+ * Make sure the receiver can get incoming pages before we send the rest
+ * of the state
+ */
+ qemu_savevm_send_postcopy_listen(fb);
+
qemu_savevm_state_complete_precopy(fb);
qemu_savevm_send_ping(fb, 3);
diff --git a/migration/savevm.c b/migration/savevm.c
index 63b2c30..fe683d6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1378,6 +1378,65 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
return 0;
}
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ * (TODO:This could do with being in a postcopy file - but there again it's
+ * just another input loop, not that postcopy specific)
+ */
+static void *postcopy_ram_listen_thread(void *opaque)
+{
+ QEMUFile *f = opaque;
+ MigrationIncomingState *mis = migration_incoming_get_current();
+ int load_res;
+
+ qemu_sem_post(&mis->listen_thread_sem);
+ trace_postcopy_ram_listen_thread_start();
+
+ /*
+ * Because we're a thread and not a coroutine we can't yield
+ * in qemu_file, and thus we must be blocking now.
+ */
+ qemu_file_set_blocking(f, true);
+ load_res = qemu_loadvm_state_main(f, mis);
+ /* And non-blocking again so we don't block in any cleanup */
+ qemu_file_set_blocking(f, false);
+
+ trace_postcopy_ram_listen_thread_exit();
+ if (load_res < 0) {
+ error_report("%s: loadvm failed: %d", __func__, load_res);
+ qemu_file_set_error(f, load_res);
+ } else {
+ /*
+ * This looks good, but it's possible that the device loading in the
+ * main thread hasn't finished yet, and so we might not be in 'RUN'
+ * state yet; wait for the end of the main thread.
+ */
+ qemu_event_wait(&mis->main_thread_load_event);
+ }
+ postcopy_ram_incoming_cleanup(mis);
+ /*
+ * If everything has worked fine, then the main thread has waited
+ * for us to start, and we're the last use of the mis.
+ * (If something broke then qemu will have to exit anyway since it's
+ * got a bad migration state).
+ */
+ migration_incoming_state_destroy();
+
+ if (load_res < 0) {
+ /*
+ * If something went wrong then we have a bad state so exit;
+ * depending how far we got it might be possible at this point
+ * to leave the guest running and fire MCEs for pages that never
+ * arrived as a desperate recovery step.
+ */
+ exit(EXIT_FAILURE);
+ }
+
+ return NULL;
+}
+
/* After this message we must be able to immediately receive postcopy data */
static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
{
@@ -1397,7 +1456,20 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
return -1;
}
- /* TODO start up the postcopy listening thread */
+ if (mis->have_listen_thread) {
+ error_report("CMD_POSTCOPY_RAM_LISTEN already has a listen thread");
+ return -1;
+ }
+
+ mis->have_listen_thread = true;
+ /* Start up the listening thread and wait for it to signal ready */
+ qemu_sem_init(&mis->listen_thread_sem, 0);
+ qemu_thread_create(&mis->listen_thread, "postcopy/listen",
+ postcopy_ram_listen_thread, mis->from_src_file,
+ QEMU_THREAD_JOINABLE);
+ qemu_sem_wait(&mis->listen_thread_sem);
+ qemu_sem_destroy(&mis->listen_thread_sem);
+
return 0;
}
@@ -1745,6 +1817,11 @@ int qemu_loadvm_state(QEMUFile *f)
trace_qemu_loadvm_state_post_main(ret);
+ if (mis->have_listen_thread) {
+ /* Listen thread still going, can't clean up yet */
+ return ret;
+ }
+
if (ret == 0) {
ret = qemu_file_get_error(f);
}
diff --git a/trace-events b/trace-events
index 8ca3518..cb956e3 100644
--- a/trace-events
+++ b/trace-events
@@ -1213,6 +1213,8 @@ loadvm_postcopy_ram_handle_discard_end(void) ""
loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
loadvm_process_command_ping(uint32_t val) "%x"
+postcopy_ram_listen_thread_exit(void) ""
+postcopy_ram_listen_thread_start(void) ""
qemu_savevm_send_postcopy_advise(void) ""
qemu_savevm_send_postcopy_ram_discard(const char *id, uint16_t len) "%s: %ud"
savevm_command_send(uint16_t command, uint16_t len) "com=0x%x len=%d"
--
2.5.0
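The handoff done by loadvm_postcopy_handle_listen() above follows a common pattern:
spawn a worker that takes over reading a stream, and block until it signals it is
running. The sketch below shows the same shape with plain POSIX threads and
semaphores; the names, the pipe and the payload are illustrative and not QEMU code.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

struct listen_ctx {
    int fd;            /* stream the worker takes over */
    sem_t ready;       /* posted once the worker is running */
};

static void *listen_thread(void *opaque)
{
    struct listen_ctx *ctx = opaque;
    char buf[256];
    ssize_t n;

    sem_post(&ctx->ready);                 /* unblock the spawner */
    /* From here on only this thread reads ctx->fd (page data, in QEMU) */
    while ((n = read(ctx->fd, buf, sizeof(buf))) > 0) {
        printf("listener got %zd bytes\n", n);
    }
    return NULL;
}

int main(void)
{
    int pipefd[2];
    struct listen_ctx ctx;
    pthread_t tid;

    if (pipe(pipefd)) {
        return 1;
    }
    ctx.fd = pipefd[0];
    sem_init(&ctx.ready, 0, 0);
    pthread_create(&tid, NULL, listen_thread, &ctx);
    sem_wait(&ctx.ready);                  /* worker owns the stream now */

    /* The spawning thread is free to do other work (device state, in QEMU) */
    if (write(pipefd[1], "page data", 9) != 9) {
        perror("write");
    }
    close(pipefd[1]);                      /* EOF lets the worker finish */

    pthread_join(tid, NULL);
    sem_destroy(&ctx.ready);
    close(pipefd[0]);
    return 0;
}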
^ permalink raw reply related [flat|nested] 119+ messages in thread
* [Qemu-devel] [PATCH v8 50/54] postcopy: Wire up loadvm_postcopy_handle_ commands
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (48 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 49/54] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 51/54] Postcopy: Mark nohugepage before discard Dr. David Alan Gilbert (git)
` (3 subsequent siblings)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Wire up more of the handlers for the commands on the destination side,
in particular loadvm_postcopy_handle_run now has enough to start the
guest running.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
migration/savevm.c | 28 +++++++++++++++++++++++++++-
trace-events | 2 ++
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index fe683d6..7c5b9d1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1477,12 +1477,33 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
{
PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
+ Error *local_err = NULL;
+
trace_loadvm_postcopy_handle_run();
if (ps != POSTCOPY_INCOMING_LISTENING) {
error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
return -1;
}
+ /* TODO we should move all of this lot into postcopy_ram.c or a shared code
+ * in migration.c
+ */
+ cpu_synchronize_all_post_init();
+
+ qemu_announce_self();
+
+ /* Make sure all file formats flush their mutable metadata */
+ bdrv_invalidate_cache_all(&local_err);
+ if (local_err) {
+ error_report_err(local_err);
+ return -1;
+ }
+
+ trace_loadvm_postcopy_handle_run_cpu_sync();
+ cpu_synchronize_all_post_init();
+
+ trace_loadvm_postcopy_handle_run_vmstart();
+
if (autostart) {
/* Hold onto your hats, starting the CPU */
vm_start();
@@ -1491,7 +1512,12 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
runstate_set(RUN_STATE_PAUSED);
}
- return 0;
+ /* We need to finish reading the stream from the package
+ * and also stop reading anything more from the stream that loaded the
+ * package (since it's now being read by the listener thread).
+ * LOADVM_QUIT will quit all the layers of nested loadvm loops.
+ */
+ return LOADVM_QUIT;
}
/**
diff --git a/trace-events b/trace-events
index cb956e3..42a7577 100644
--- a/trace-events
+++ b/trace-events
@@ -1208,6 +1208,8 @@ loadvm_handle_cmd_packaged_received(int ret) "%d"
loadvm_postcopy_handle_advise(void) ""
loadvm_postcopy_handle_listen(void) ""
loadvm_postcopy_handle_run(void) ""
+loadvm_postcopy_handle_run_cpu_sync(void) ""
+loadvm_postcopy_handle_run_vmstart(void) ""
loadvm_postcopy_ram_handle_discard(void) ""
loadvm_postcopy_ram_handle_discard_end(void) ""
loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
--
2.5.0
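The LOADVM_QUIT return value above is a sentinel that unwinds nested command loops
without being treated as an error. A minimal sketch of that control-flow idea,
with invented command names and values (not QEMU's):

#include <stdio.h>

enum { CMD_NOP, CMD_RUN, CMD_EOF };
#define LOADVM_QUIT 1   /* positive: stop reading this stream, but not an error */

static int handle_run(void)
{
    /* ...start the VM...; the caller must stop reading this stream */
    return LOADVM_QUIT;
}

static int load_state_main(const int *cmds)
{
    for (int i = 0; cmds[i] != CMD_EOF; i++) {
        int ret = (cmds[i] == CMD_RUN) ? handle_run() : 0;
        if (ret != 0) {
            return ret;     /* negative: error; positive: quit cleanly */
        }
    }
    return 0;
}

int main(void)
{
    const int cmds[] = { CMD_NOP, CMD_RUN, CMD_NOP, CMD_EOF };
    printf("load_state_main -> %d\n", load_state_main(cmds));
    return 0;
}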
^ permalink raw reply related [flat|nested] 119+ messages in thread
* [Qemu-devel] [PATCH v8 51/54] Postcopy: Mark nohugepage before discard
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (49 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 50/54] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-28 14:02 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 52/54] End of migration for postcopy Dr. David Alan Gilbert (git)
` (2 subsequent siblings)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Prior to servicing userfault requests we must ensure we've not got
huge pages in the area that might include non-transferred memory,
since a huge page would make its whole extent appear present, including
parts that have not yet been transferred.
We mark the area as non-huge page (nhp) just before we perform
discards; the discard code now tells us to discard any areas
that haven't been sent (as well as any that are redirtied);
any already formed transparent-huge-pages get fragmented
by this discard process if they contain any discards.

Transparent huge pages that have been entirely transferred
and don't contain any discards are not broken by this mechanism;
they stay as huge pages.
By starting postcopy after a full precopy pass, many of the pages
then stay as huge pages; this is important for maintaining performance
after the end of the migration.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
include/migration/postcopy-ram.h | 6 ++++++
migration/postcopy-ram.c | 46 +++++++++++++++++++++++++++++++++++++---
migration/savevm.c | 9 +++++++-
trace-events | 1 +
4 files changed, 58 insertions(+), 4 deletions(-)
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 50c1ce5..60ac5b1 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -41,6 +41,12 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis);
int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
size_t length);
+/*
+ * Userfault requires us to mark RAM as NOHUGEPAGE prior to discard
+ * however leaving it until after precopy means that most of the precopy
+ * data is still THPd
+ */
+int postcopy_ram_prepare_discard(MigrationIncomingState *mis);
/*
* Called at the start of each RAMBlock by the bitmap code.
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 0b021ca..10e5a5b 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -226,12 +226,10 @@ static int cleanup_range(const char *block_name, void *host_addr,
* We turned off hugepage for the precopy stage with postcopy enabled
* we can turn it back on now.
*/
-#ifdef MADV_HUGEPAGE
- if (madvise(host_addr, length, MADV_HUGEPAGE)) {
+ if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) {
error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
return -1;
}
-#endif
/*
* We can also turn off userfault now since we should have all the
@@ -308,6 +306,43 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
}
/*
+ * Disable huge pages on an area
+ */
+static int nhp_range(const char *block_name, void *host_addr,
+ ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+ trace_postcopy_nhp_range(block_name, host_addr, offset, length);
+
+ /*
+ * Before we do discards we need to ensure those discards really
+ * do delete areas of the page, even if THP thinks a hugepage would
+ * be a good idea, so force hugepages off.
+ */
+ if (qemu_madvise(host_addr, length, QEMU_MADV_NOHUGEPAGE)) {
+ error_report("%s: NOHUGEPAGE: %s", __func__, strerror(errno));
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * Userfault requires us to mark RAM as NOHUGEPAGE prior to discard
+ * however leaving it until after precopy means that most of the precopy
+ * data is still THPd
+ */
+int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
+{
+ if (qemu_ram_foreach_block(nhp_range, mis)) {
+ return -1;
+ }
+
+ postcopy_state_set(POSTCOPY_INCOMING_DISCARD);
+
+ return 0;
+}
+
+/*
* Mark the given area of RAM as requiring notification to unwritten areas
* Used as a callback on qemu_ram_foreach_block.
* host_addr: Base of area to mark
@@ -581,6 +616,11 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
assert(0);
}
+int postcopy_ram_prepare_discard(MigrationIncomingState *mis)
+{
+ assert(0);
+}
+
int postcopy_ram_enable_notify(MigrationIncomingState *mis)
{
assert(0);
diff --git a/migration/savevm.c b/migration/savevm.c
index 7c5b9d1..894e085 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1314,7 +1314,7 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
switch (ps) {
case POSTCOPY_INCOMING_ADVISE:
/* 1st discard */
- tmp = 0; /* TODO: later patch postcopy_ram_prepare_discard(mis); */
+ tmp = postcopy_ram_prepare_discard(mis);
if (tmp) {
return tmp;
}
@@ -1446,6 +1446,13 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
return -1;
}
+ if (ps == POSTCOPY_INCOMING_ADVISE) {
+ /*
+ * A rare case, we entered listen without having to do any discards,
+ * so do the setup that's normally done at the time of the 1st discard.
+ */
+ postcopy_ram_prepare_discard(mis);
+ }
/*
* Sensitise RAM - can now generate requests for blocks that don't exist
diff --git a/trace-events b/trace-events
index 42a7577..81f9ca7 100644
--- a/trace-events
+++ b/trace-events
@@ -1542,6 +1542,7 @@ postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s ma
postcopy_ram_discard_range(void *start, size_t length) "%p,+%zx"
postcopy_cleanup_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
postcopy_init_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
+postcopy_nhp_range(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
postcopy_place_page(void *host_addr) "host=%p"
postcopy_place_page_zero(void *host_addr) "host=%p"
postcopy_ram_enable_notify(void) ""
--
2.5.0
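The madvise sequence used by this patch can be shown in isolation: force NOHUGEPAGE
on a region before punching holes in it with MADV_DONTNEED, then allow huge pages
again once everything has arrived. The madvise flags are real Linux interfaces; the
region size and offsets below are arbitrary, and this is not QEMU code.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4 * 1024 * 1024;
    unsigned char *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(ram, 1, len);                          /* may be backed by THP */

    /* 1. stop THP re-forming huge pages across the holes we punch */
    if (madvise(ram, len, MADV_NOHUGEPAGE)) {
        perror("MADV_NOHUGEPAGE");
    }
    /* 2. discard a (host-page-aligned) range that wasn't transferred */
    if (madvise(ram + 64 * 1024, 128 * 1024, MADV_DONTNEED)) {
        perror("MADV_DONTNEED");
    }
    /* 3. once all pages have arrived, allow huge pages again */
    if (madvise(ram, len, MADV_HUGEPAGE)) {
        perror("MADV_HUGEPAGE");
    }
    munmap(ram, len);
    return 0;
}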
^ permalink raw reply related [flat|nested] 119+ messages in thread
* [Qemu-devel] [PATCH v8 52/54] End of migration for postcopy
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (50 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 51/54] Postcopy: Mark nohugepage before discard Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 53/54] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 54/54] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Tweak the end of migration cleanup; we don't want to close stuff down
at the end of the main stream, since the postcopy is still sending pages
on the other thread.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
migration/migration.c | 26 +++++++++++++++++++++++++-
trace-events | 6 ++++--
2 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 379fadc..571ce1f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -317,13 +317,37 @@ static void process_incoming_migration_co(void *opaque)
{
QEMUFile *f = opaque;
Error *local_err = NULL;
+ MigrationIncomingState *mis;
+ PostcopyState ps;
int ret;
- migration_incoming_state_new(f);
+ mis = migration_incoming_state_new(f);
postcopy_state_set(POSTCOPY_INCOMING_NONE);
migrate_generate_event(MIGRATION_STATUS_ACTIVE);
+
ret = qemu_loadvm_state(f);
+ ps = postcopy_state_get();
+ trace_process_incoming_migration_co_end(ret, ps);
+ if (ps != POSTCOPY_INCOMING_NONE) {
+ if (ps == POSTCOPY_INCOMING_ADVISE) {
+ /*
+ * Where a migration had postcopy enabled (and thus went to advise)
+ * but managed to complete within the precopy period, we can use
+ * the normal exit.
+ */
+ postcopy_ram_incoming_cleanup(mis);
+ } else if (ret >= 0) {
+ /*
+ * Postcopy was started, cleanup should happen at the end of the
+ * postcopy thread.
+ */
+ trace_process_incoming_migration_co_postcopy_end_main();
+ return;
+ }
+ /* Else if something went wrong then just fall out of the normal exit */
+ }
+
qemu_fclose(f);
free_xbzrle_decoded_buf();
migration_incoming_state_destroy();
diff --git a/trace-events b/trace-events
index 81f9ca7..3d299d5 100644
--- a/trace-events
+++ b/trace-events
@@ -1468,10 +1468,12 @@ source_return_path_thread_entry(void) ""
source_return_path_thread_loop_top(void) ""
source_return_path_thread_pong(uint32_t val) "%x"
source_return_path_thread_shut(uint32_t val) "%x"
-migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
-migrate_state_too_big(void) ""
migrate_global_state_post_load(const char *state) "loaded state: %s"
migrate_global_state_pre_save(const char *state) "saved state: %s"
+migrate_state_too_big(void) ""
+migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
+process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
+process_incoming_migration_co_postcopy_end_main(void) ""
# migration/rdma.c
qemu_rdma_accept_incoming_migration(void) ""
--
2.5.0
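The decision made in process_incoming_migration_co() above boils down to a small
state check: who cleans up depends on how far postcopy got and whether loading
succeeded. A toy sketch of that decision, with state names mirroring the patch but
everything else invented:

#include <stdbool.h>
#include <stdio.h>

enum postcopy_state { PC_NONE, PC_ADVISE, PC_DISCARD, PC_LISTENING, PC_RUNNING, PC_END };

/* Returns true if the caller (the main incoming path) should clean up */
static bool main_thread_should_cleanup(enum postcopy_state ps, int load_ret)
{
    if (ps == PC_NONE || ps == PC_ADVISE) {
        return true;            /* postcopy never really started */
    }
    if (load_ret >= 0) {
        return false;           /* the listen thread owns the cleanup */
    }
    return true;                /* error: fall back to the normal exit */
}

int main(void)
{
    printf("%d %d %d\n",
           main_thread_should_cleanup(PC_ADVISE, 0),
           main_thread_should_cleanup(PC_RUNNING, 0),
           main_thread_should_cleanup(PC_RUNNING, -1));
    return 0;
}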
^ permalink raw reply related [flat|nested] 119+ messages in thread
* [Qemu-devel] [PATCH v8 53/54] Disable mlock around incoming postcopy
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (51 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 52/54] End of migration for postcopy Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
2015-10-21 9:17 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 54/54] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
53 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Userfault doesn't work with mlock; mlock is designed to nail down pages
so they don't move, userfault is designed to tell you when they're not
there.
munlock the pages we userfault protect before postcopy.
mlock everything again at the end if mlock is enabled.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
include/sysemu/sysemu.h | 1 +
migration/postcopy-ram.c | 24 ++++++++++++++++++++++++
2 files changed, 25 insertions(+)
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index e2353a5..592ceb9 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -172,6 +172,7 @@ extern int boot_menu;
extern bool boot_strict;
extern uint8_t *boot_splash_filedata;
extern size_t boot_splash_filedata_size;
+extern bool enable_mlock;
extern uint8_t qemu_extra_params_fw[2];
extern QEMUClockType rtc_clock;
extern const char *mem_path;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 10e5a5b..ec649e1 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -87,6 +87,11 @@ static bool ufd_version_check(int ufd)
return true;
}
+/*
+ * Note: This has the side effect of munlock'ing all of RAM, that's
+ * normally fine since if the postcopy succeeds it gets turned back on at the
+ * end.
+ */
bool postcopy_ram_supported_by_host(void)
{
long pagesize = getpagesize();
@@ -115,6 +120,15 @@ bool postcopy_ram_supported_by_host(void)
}
/*
+ * userfault and mlock don't go together; we'll put it back later if
+ * it was enabled.
+ */
+ if (munlockall()) {
+ error_report("%s: munlockall: %s", __func__, strerror(errno));
+ return -1;
+ }
+
+ /*
* We need to check that the ops we need are supported on anon memory
* To do that we need to register a chunk and see the flags that
* are returned.
@@ -294,6 +308,16 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
mis->have_fault_thread = false;
}
+ if (enable_mlock) {
+ if (os_mlock() < 0) {
+ error_report("mlock: %s", strerror(errno));
+ /*
+ * It doesn't feel right to fail at this point, we have a valid
+ * VM state.
+ */
+ }
+ }
+
postcopy_state_set(POSTCOPY_INCOMING_END);
migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
--
2.5.0
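The mlock interplay described above can be illustrated with a short standalone
sketch: drop any memory locks before relying on userfault, and re-lock at the end
if locking was requested. munlockall() and mlockall() are real calls; the function
names and flow here are illustrative, not QEMU's.

#include <stdbool.h>
#include <stdio.h>
#include <sys/mman.h>

static int prepare_for_userfault(void)
{
    /* userfault needs pages to be absent; mlock pins them present */
    if (munlockall()) {
        perror("munlockall");
        return -1;
    }
    return 0;
}

static void postcopy_finished(bool want_mlock)
{
    if (want_mlock && mlockall(MCL_CURRENT | MCL_FUTURE)) {
        /* The VM state is still valid, so just report the failure */
        perror("mlockall");
    }
}

int main(void)
{
    prepare_for_userfault();
    /* ...postcopy runs here... */
    postcopy_finished(true);
    return 0;
}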
^ permalink raw reply related [flat|nested] 119+ messages in thread
* [Qemu-devel] [PATCH v8 54/54] Inhibit ballooning during postcopy
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
` (52 preceding siblings ...)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 53/54] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
@ 2015-09-29 8:38 ` Dr. David Alan Gilbert (git)
53 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-09-29 8:38 UTC (permalink / raw)
To: qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Postcopy detects accesses to pages that haven't been transferred yet
using userfaultfd, and it causes exceptions on pages that are 'not
present'.
Ballooning also causes pages to be marked as 'not present' when the
guest inflates the balloon.
Potentially a balloon could be inflated to discard pages that are
currently inflight during postcopy and that may be arriving at about
the same time.
To avoid this confusion, disable ballooning during postcopy.
When disabled we drop balloon requests from the guest. Since ballooning
is generally initiated by the host, the management system should avoid
initiating any balloon instructions to the guest during migration,
although it's not possible to know how long it would take a guest to
process a request made prior to the start of migration.
Guest initiated ballooning will not know if it's really freed a page
of host memory or not.
Queueing the requests until after migration would be nice, but is
non-trivial, since the set of inflate/deflate requests have to
be compared with the state of the page to know what the final
outcome is allowed to be.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
balloon.c | 11 +++++++++++
hw/virtio/virtio-balloon.c | 4 +++-
include/sysemu/balloon.h | 2 ++
migration/postcopy-ram.c | 9 +++++++++
4 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/balloon.c b/balloon.c
index 5d69e8a..0f45d1b 100644
--- a/balloon.c
+++ b/balloon.c
@@ -36,6 +36,17 @@
static QEMUBalloonEvent *balloon_event_fn;
static QEMUBalloonStatus *balloon_stat_fn;
static void *balloon_opaque;
+static bool balloon_inhibited;
+
+bool qemu_balloon_is_inhibited(void)
+{
+ return balloon_inhibited;
+}
+
+void qemu_balloon_inhibit(bool state)
+{
+ balloon_inhibited = state;
+}
static bool have_balloon(Error **errp)
{
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index c419b17..9671635 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -37,9 +37,11 @@
static void balloon_page(void *addr, int deflate)
{
#if defined(__linux__)
- if (!kvm_enabled() || kvm_has_sync_mmu())
+ if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
+ kvm_has_sync_mmu())) {
qemu_madvise(addr, TARGET_PAGE_SIZE,
deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
+ }
#endif
}
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 17fe300..3f976b4 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -22,5 +22,7 @@ typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
QEMUBalloonStatus *stat_func, void *opaque);
void qemu_remove_balloon_handler(void *opaque);
+bool qemu_balloon_is_inhibited(void);
+void qemu_balloon_inhibit(bool state);
#endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index ec649e1..a6fdefe 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -24,6 +24,7 @@
#include "migration/migration.h"
#include "migration/postcopy-ram.h"
#include "sysemu/sysemu.h"
+#include "sysemu/balloon.h"
#include "qemu/error-report.h"
#include "trace.h"
@@ -308,6 +309,8 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
mis->have_fault_thread = false;
}
+ qemu_balloon_inhibit(false);
+
if (enable_mlock) {
if (os_mlock() < 0) {
error_report("mlock: %s", strerror(errno));
@@ -533,6 +536,12 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
return -1;
}
+ /*
+ * Ballooning can mark pages as absent while we're postcopying
+ * that would cause false userfaults.
+ */
+ qemu_balloon_inhibit(true);
+
trace_postcopy_ram_enable_notify();
return 0;
--
2.5.0
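The inhibit flag added above works roughly like a global latch that is set before
userfault registration and cleared in cleanup, and is checked before any
page-discarding madvise. A minimal standalone sketch of that idea; the names, the
page handling and main() are illustrative only.

#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

static bool balloon_inhibited;

void balloon_inhibit(bool state)  { balloon_inhibited = state; }
bool balloon_is_inhibited(void)   { return balloon_inhibited; }

/* Called when the guest inflates the balloon over 'addr' */
static void balloon_page(void *addr, size_t pagesize)
{
    if (balloon_is_inhibited()) {
        return;                        /* request dropped during postcopy */
    }
    madvise(addr, pagesize, MADV_DONTNEED);
}

int main(void)
{
    size_t ps = 4096;
    void *page = mmap(NULL, ps, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) {
        return 1;
    }
    balloon_inhibit(true);             /* postcopy starting */
    balloon_page(page, ps);            /* no-op while inhibited */
    balloon_inhibit(false);            /* postcopy finished */
    balloon_page(page, ps);            /* now the page really is discarded */
    munmap(page, ps);
    return 0;
}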
^ permalink raw reply related [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 06/54] Rename mis->file to from_src_file
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 06/54] Rename mis->file to from_src_file Dr. David Alan Gilbert (git)
@ 2015-09-29 10:41 ` Amit Shah
0 siblings, 0 replies; 119+ messages in thread
From: Amit Shah @ 2015-09-29 10:41 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
On (Tue) 29 Sep 2015 [09:37:30], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> 'file' becomes confusing when you have flows in each direction;
> rename to make it clear.
> This leaves just the main forward direction ms->file, which is used
> in a lot of places and is probably not worth renaming given the churn.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
I guess we could drop the '_file' in from_src_file too w/o loss of
info.
Also, if you have to respin, description has a typo: ms->file instead
of mis->file.
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 23/54] Add migration-capability boolean for postcopy-ram.
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 23/54] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2015-09-29 20:22 ` Eric Blake
2015-09-30 7:00 ` Amit Shah
0 siblings, 1 reply; 119+ messages in thread
From: Eric Blake @ 2015-09-29 20:22 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git), qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
On 09/29/2015 02:37 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The 'postcopy ram' capability allows postcopy migration of RAM;
> note that the migration starts off in precopy mode until
> postcopy mode is triggered (see the migrate_start_postcopy
> patch later in the series).
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
> ---
> include/migration/migration.h | 1 +
> migration/migration.c | 23 +++++++++++++++++++++++
> qapi-schema.json | 6 +++++-
> 3 files changed, 29 insertions(+), 1 deletion(-)
Reviewed-by: Eric Blake <eblake@redhat.com>
I'm guessing the plan is to keep this experimental until a bit more
experience is gained, to make sure we aren't missing anything essential
in the use of postcopy.
> { 'enum': 'MigrationCapability',
> 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
> - 'compress', 'events'] }
> + 'compress', 'events', 'x-postcopy-ram'] }
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 23/54] Add migration-capability boolean for postcopy-ram.
2015-09-29 20:22 ` Eric Blake
@ 2015-09-30 7:00 ` Amit Shah
2015-09-30 12:44 ` Eric Blake
0 siblings, 1 reply; 119+ messages in thread
From: Amit Shah @ 2015-09-30 7:00 UTC (permalink / raw)
To: Eric Blake
Cc: aarcange, quintela, liang.z.li, Dr. David Alan Gilbert (git),
qemu-devel, luis, bharata, pbonzini
On (Tue) 29 Sep 2015 [14:22:17], Eric Blake wrote:
> On 09/29/2015 02:37 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > The 'postcopy ram' capability allows postcopy migration of RAM;
> > note that the migration starts off in precopy mode until
> > postcopy mode is triggered (see the migrate_start_postcopy
> > patch later in the series).
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
> > ---
> > include/migration/migration.h | 1 +
> > migration/migration.c | 23 +++++++++++++++++++++++
> > qapi-schema.json | 6 +++++-
> > 3 files changed, 29 insertions(+), 1 deletion(-)
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> I'm guessing the plan is to keep this experimental until a bit more
> experience is gained, to make sure we aren't missing anything essential
> in the use of postcopy.
From the cover letter:
I'm keeping the x- for now, until the libvirt interface gets finalised.
I expect, though, that we'll merge this series in 2.5, and remove the
x- before the 2.5 release. My main concern of the Linux interface
being not released in a stable release will be satisfied with the 4.3
kernel release.
Any concerns from the libvirt side?
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 23/54] Add migration-capability boolean for postcopy-ram.
2015-09-30 7:00 ` Amit Shah
@ 2015-09-30 12:44 ` Eric Blake
0 siblings, 0 replies; 119+ messages in thread
From: Eric Blake @ 2015-09-30 12:44 UTC (permalink / raw)
To: Amit Shah
Cc: aarcange, quintela, liang.z.li, Dr. David Alan Gilbert (git),
qemu-devel, luis, bharata, pbonzini
On 09/30/2015 01:00 AM, Amit Shah wrote:
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>
>> I'm guessing the plan is to keep this experimental until a bit more
>> experience is gained, to make sure we aren't missing anything essential
>> in the use of postcopy.
>
> From the cover letter:
>
> I'm keeping the x- for now, until the libvirt interface gets finalised.
>
> I expect, though, that we'll merge this series in 2.5, and remove the
> x- before the 2.5 release. My main concern of the Linux interface
> being not released in a stable release will be satisfied with the 4.3
> kernel release.
>
> Any concerns from the libvirt side?
No, that should be fine. The libvirt side won't push the commit until
the x- is gone, but there's nothing stopping us from developing the
interface in parallel while x- is still present to prove that the design
will work.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
@ 2015-09-30 16:25 ` Eric Blake
2015-09-30 16:30 ` Dr. David Alan Gilbert
2015-10-20 13:33 ` Juan Quintela
2015-10-28 11:17 ` Amit Shah
2 siblings, 1 reply; 119+ messages in thread
From: Eric Blake @ 2015-09-30 16:25 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git), qemu-devel, quintela, amit.shah
Cc: aarcange, pbonzini, liang.z.li, luis, bharata
On 09/29/2015 02:37 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Once postcopy is enabled (with migrate_set_capability), the migration
> will still start on precopy mode. To cause a transition into postcopy
> the:
>
> migrate_start_postcopy
>
> command must be issued. Postcopy will start sometime after this
> (when it's next checked in the migration loop).
>
> Issuing the command before migration has started will error,
> and issuing after it has finished is ignored.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
> +++ b/qapi-schema.json
> @@ -670,6 +670,14 @@
> '*tls-port': 'int', '*cert-subject': 'str' } }
>
> ##
> +# @migrate-start-postcopy
> +#
> +# Switch migration to postcopy mode
> +#
> +# Since: 2.4
2.5, now.
> +{ 'command': 'migrate-start-postcopy' }
> +
> +##
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy
2015-09-30 16:25 ` Eric Blake
@ 2015-09-30 16:30 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-30 16:30 UTC (permalink / raw)
To: Eric Blake
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
amit.shah, pbonzini
* Eric Blake (eblake@redhat.com) wrote:
> On 09/29/2015 02:37 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Once postcopy is enabled (with migrate_set_capability), the migration
> > will still start on precopy mode. To cause a transition into postcopy
> > the:
> >
> > migrate_start_postcopy
> >
> > command must be issued. Postcopy will start sometime after this
> > (when it's next checked in the migration loop).
> >
> > Issuing the command before migration has started will error,
> > and issuing after it has finished is ignored.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> > ---
>
> > +++ b/qapi-schema.json
> > @@ -670,6 +670,14 @@
> > '*tls-port': 'int', '*cert-subject': 'str' } }
> >
> > ##
> > +# @migrate-start-postcopy
> > +#
> > +# Switch migration to postcopy mode
> > +#
> > +# Since: 2.4
>
> 2.5, now.
Fixed, thanks.
Dave
>
> > +{ 'command': 'migrate-start-postcopy' }
> > +
> > +##
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 16/54] Return path: Open a return path on QEMUFile for sockets
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 16/54] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2015-10-02 15:29 ` Daniel P. Berrange
2015-10-02 16:32 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Daniel P. Berrange @ 2015-10-02 15:29 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
amit.shah, pbonzini
On Tue, Sep 29, 2015 at 09:37:40AM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Postcopy needs a method to send messages from the destination back to
> the source, this is the 'return path'.
>
> Wire it up for 'socket' QEMUFile's.
I find this to be a pretty weird approach to the problem. The underlying
transport is bi-directional, so I would expect to have a single QEMUFile
object that allowed bi-directional I/O on it, rather than creating a
second QEMUFile for the back channel, which was forbidden from closing
the shared FD.
I can understand why you've done this though - since we only have a
single buffer embedded in QEMUFile. I wonder though if we'd be better
off changing QEMUFile to have a 'inbuf' and 'outbuf' intead of just
'buf' and likewise iniov & outiov. Then we can allow bi-directional
I/O on the single QEMUFile object which is a more natural fit.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 16/54] Return path: Open a return path on QEMUFile for sockets
2015-10-02 15:29 ` Daniel P. Berrange
@ 2015-10-02 16:32 ` Dr. David Alan Gilbert
2015-10-02 17:03 ` Daniel P. Berrange
0 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-02 16:32 UTC (permalink / raw)
To: Daniel P. Berrange
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
amit.shah, pbonzini
* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Tue, Sep 29, 2015 at 09:37:40AM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Postcopy needs a method to send messages from the destination back to
> > the source, this is the 'return path'.
> >
> > Wire it up for 'socket' QEMUFile's.
>
> I find this to be a pretty weird approach to the problem. The underlying
> transport is bi-directional, so I would expect to have a single QEMUFile
> object that allowed bi-directional I/O on it, rather than creating a
> second QEMUFile for the back channel, which was forbidden from closing
> the shared FD.
>
> I can understand why you've done this though - since we only have a
> single buffer embedded in QEMUFile. I wonder though if we'd be better
> off changing QEMUFile to have an 'inbuf' and 'outbuf' instead of just
> 'buf' and likewise iniov & outiov. Then we can allow bi-directional
> I/O on the single QEMUFile object which is a more natural fit.
The 'c' FILE* is one directional, and I just took it that the QEMUFile* is
like that; i.e. a buffered layer on top of an underlying one directional
transport. stdin,stdout are two separate FILE*'s.
Your iniov, outiov would be basically the same, so you'd end up duplicating
code for the in and out parts; whereas what you really have is two of the same
thing wired up in opposite directions.
Having said that, for things like RDMA, they have to do special stuff for
each direction and the QEMUFile is really a shim on top of that.
Dave
>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 16/54] Return path: Open a return path on QEMUFile for sockets
2015-10-02 16:32 ` Dr. David Alan Gilbert
@ 2015-10-02 17:03 ` Daniel P. Berrange
0 siblings, 0 replies; 119+ messages in thread
From: Daniel P. Berrange @ 2015-10-02 17:03 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
amit.shah, pbonzini
On Fri, Oct 02, 2015 at 05:32:18PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Tue, Sep 29, 2015 at 09:37:40AM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > Postcopy needs a method to send messages from the destination back to
> > > the source, this is the 'return path'.
> > >
> > > Wire it up for 'socket' QEMUFile's.
> >
> > I find this to be a pretty weird approach to the problem. The underlying
> > transport is bi-directional, so I would expect to have a single QEMUFile
> > object that allowed bi-directional I/O on it, rather than creating a
> > second QEMUFile for the back channel, which was forbidden from closing
> > the shared FD.
> >
> > I can understand why you've done this though - since we only have a
> > single buffer embedded in QEMUFile. I wonder though if we'd be better
> > off changing QEMUFile to have an 'inbuf' and 'outbuf' instead of just
> > 'buf' and likewise iniov & outiov. Then we can allow bi-directional
> > I/O on the single QEMUFile object which is a more natural fit.
>
> The 'c' FILE* is one directional, and I just took it that the QEMUFile* is
> like that; i.e. a buffered layer on top of an underlying one directional
> transport. stdin,stdout are two separate FILE*'s.
Yep, QEMUFile was really designed as a FILE* alternative, so makes sense
from that POV.
> Your iniov, outiov would be basically the same, so you'd end up duplicating
> code for the in and out parts; whereas what you really have is two of the same
> thing wired up in opposite directions.
I don't think it'd actually end up duplicating any code - mostly just
updating which variable was accessed in each existing method, depending
on whether it was a read or write related method.
> Having said that, for things like RDMA, they have to do special stuff for
> each direction and the QEMUFile is really a shim on top of that.
Similarly when we add TLS into the mix, there is a single shared TLS
session context that is used by both I/O directionals. Now this would
not be visible to the QEMUFile regardless, since its hidden in the
QIOChannel object I'm defining, so its not a show stopper either but
I guess my general thought is that there is a mixture of state that
we maintain some different for read vs write and some shared. You
workaround the fact that the FD is shared by having a comment saying
we should not call close() on the FD kept by the QEMUFile for the
return path.
All that said, I don't think it is too critical to change this right
now. It would be fine to leave it to a later date, unless there's a
more pressing reason.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 119+ messages in thread
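To make the inbuf/outbuf alternative discussed in this thread concrete, here is one
possible shape for a buffered bidirectional channel over a single fd. This is purely
illustrative and is not QEMU's QEMUFile or QIOChannel API; all names and buffer
sizes are invented for the sketch, and chan_put() assumes writes no larger than its
buffer (a real implementation would loop).

#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define CHAN_BUF 4096

struct chan {
    int fd;                         /* one bidirectional socket */
    unsigned char inbuf[CHAN_BUF];
    size_t in_len, in_pos;          /* buffered, not-yet-consumed input */
    unsigned char outbuf[CHAN_BUF];
    size_t out_len;                 /* buffered, not-yet-flushed output */
};

static int chan_flush(struct chan *c)
{
    size_t done = 0;
    while (done < c->out_len) {
        ssize_t n = write(c->fd, c->outbuf + done, c->out_len - done);
        if (n <= 0) {
            return -1;
        }
        done += n;
    }
    c->out_len = 0;
    return 0;
}

static int chan_put(struct chan *c, const void *buf, size_t len)
{
    /* assumes len <= CHAN_BUF */
    if (c->out_len + len > CHAN_BUF && chan_flush(c) < 0) {
        return -1;
    }
    memcpy(c->outbuf + c->out_len, buf, len);
    c->out_len += len;
    return 0;
}

static ssize_t chan_get(struct chan *c, void *buf, size_t len)
{
    if (c->in_pos == c->in_len) {          /* refill the read buffer */
        ssize_t n = read(c->fd, c->inbuf, CHAN_BUF);
        if (n <= 0) {
            return n;
        }
        c->in_len = n;
        c->in_pos = 0;
    }
    if (len > c->in_len - c->in_pos) {
        len = c->in_len - c->in_pos;
    }
    memcpy(buf, c->inbuf + c->in_pos, len);
    c->in_pos += len;
    return (ssize_t)len;
}

int main(void)
{
    int sv[2];
    struct chan a = { 0 }, b = { 0 };
    char got[8] = "";

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv)) {
        return 1;
    }
    a.fd = sv[0];
    b.fd = sv[1];
    chan_put(&a, "ping", 5);
    chan_flush(&a);
    chan_get(&b, got, sizeof(got));
    printf("received: %s\n", got);
    return 0;
}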
* Re: [Qemu-devel] [PATCH v8 04/54] Move configuration section writing
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 04/54] Move configuration section writing Dr. David Alan Gilbert (git)
@ 2015-10-05 6:44 ` Amit Shah
2015-10-30 12:47 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Amit Shah @ 2015-10-05 6:44 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
On (Tue) 29 Sep 2015 [09:37:28], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The vmstate_configuration is currently written
> in 'qemu_savevm_state_begin', move it to
> 'qemu_savevm_state_header' since it's got a hard
> requirement that it must be the 1st thing after
> the header.
> (In postcopy some 'command' sections get sent
> early before the saving of the main sections
> and hence before qemu_savevm_state_begin).
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
The function name 'savevm_state_header()' isn't accurate anymore. Not
serious for this series.
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 18/54] Migration commands
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 18/54] Migration commands Dr. David Alan Gilbert (git)
@ 2015-10-20 11:22 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-20 11:22 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Create QEMU_VM_COMMAND section type for sending commands from
> source to destination. These commands are not intended to convey
> guest state but to control the migration process.
>
> For use in postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 19/54] Return path: Control commands
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 19/54] Return path: Control commands Dr. David Alan Gilbert (git)
@ 2015-10-20 11:27 ` Juan Quintela
2015-10-26 11:42 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-20 11:27 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Add two src->dest commands:
> * OPEN_RETURN_PATH - To request that the destination open the return path
> * PING - Request an acknowledge from the destination
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
> +void qemu_savevm_send_open_return_path(QEMUFile *f)
> +{
> + qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
For consistency, I would have put a
trace_savevm_send_open_return_path(....) here
The send in the loadvm path
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 21/54] Return path: Source handling of return path
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 21/54] Return path: Source handling of return path Dr. David Alan Gilbert (git)
@ 2015-10-20 11:33 ` Juan Quintela
2015-10-26 12:06 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-20 11:33 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Open a return path, and handle messages that are received upon it.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
> +/*
> + * Return true if we're already in the middle of a migration
> + * (i.e. any of the active or setup states)
> + */
> +static bool migration_is_active(MigrationState *ms)
> +{
> + switch (ms->state) {
> + case MIGRATION_STATUS_ACTIVE:
> + case MIGRATION_STATUS_SETUP:
> + return true;
> +
> + default:
> + return false;
> +
> + }
> +}
> +
If you have to resend, you can split this bit, and update users around.
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 24/54] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 24/54] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
@ 2015-10-20 11:50 ` Juan Quintela
2015-10-26 12:22 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-20 11:50 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The state of the postcopy process is managed via a series of messages;
> * Add wrappers and handlers for sending/receiving these messages
> * Add state variable that track the current state of postcopy
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
> + tmp[0] = cpu_to_be64(getpagesize());
> + tmp[1] = cpu_to_be64(1ul << qemu_target_page_bits());
we don't have a qemu_target_pagesize()?
#fail
> + qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_LISTEN, 0, NULL);
Should we create a macro
qemu_savevm_command_noargs_send(f, MIG_CMD_POSTCOPY_LISTEN);
It is a "bit" clear, but saves a "whole" byte. Not convinced one way or
other :-p
> +
> + case MIG_CMD_POSTCOPY_ADVISE:
> + tmp64a = qemu_get_be64(f); /* hps */
> + tmp64b = qemu_get_be64(f); /* tps */
> + return loadvm_postcopy_handle_advise(mis, tmp64a, tmp64b);
In the rest of the commands, you read the arguments inside the
loadvm_postcopy_handle_*(); I think you should do the same here.
Later, Juan.
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 10/54] migration/ram.c: Use RAMBlock rather than MemoryRegion
[not found] ` <87zizdvm9m.fsf@neno.neno>
@ 2015-10-20 11:58 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-20 11:58 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git); +Cc: QEMU Developer
Juan Quintela <quintela@redhat.com> wrote:
Post proper list
Remove cc'd
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
>> RAM migration mainly works on RAMBlocks but in a few places
>> uses data from MemoryRegions to access the same information that's
>> already held in RAMBlocks; clean it up just to avoid the
>> MemoryRegion use.
>>
>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
This was a leftover from when I tried to convert migration to use Memory
regions; yes, it didn't go too well
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 25/54] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 25/54] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
@ 2015-10-20 13:25 ` Juan Quintela
2015-10-26 16:21 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-20 13:25 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
> stream inside a package whose length can be determined purely by reading
> its header. The destination guarantees that the whole MIG_CMD_PACKAGED
> is read off the stream prior to parsing the contents.
>
> This is used by postcopy to load device state (from the package)
> while leaving the main stream free to receive memory pages.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
But I propose the change below
> + size_t len = qsb_get_length(qsb);
....
> + /* all the data follows (concatenating the iov's) */
> + for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
> + /* The iov entries are partially filled */
> + size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
> + len :
> + qsb->iov[cur_iov].iov_len;
Either something has gone very wrong here, or qsb->iov[cur_iov].iov_len can
never be > len. So this should be the same as:
size_t towrite = MIN(qsb->iov[cur_iov].iov_len, len);
right?
> + len -= towrite;
> +
> + if (!towrite) {
> + break;
> + }
This should never happen, right? And if we want to be extra safe,
> + QEMUFile *packf = qemu_bufopen("r", qsb);
> +
> + ret = qemu_loadvm_state_main(packf, mis);
> + trace_loadvm_handle_cmd_packaged_main(ret);
> + qemu_fclose(packf);
> + qsb_free(qsb);
Migration code is re-entrant!!!!! Who would have guessed O:-)
Later, Juan.
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 27/54] postcopy: OS support test
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 27/54] postcopy: OS support test Dr. David Alan Gilbert (git)
@ 2015-10-20 13:31 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-20 13:31 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Provide a check to see if the OS we're running on has all the bits
> needed for postcopy.
>
> Creates postcopy-ram.c which will get most of the other helpers we need.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
I still think that this bit should be in utils/*.
An obvious candidate would be
utils/osdep.c?
> +/*
> + * Postcopy migration for RAM
> + *
> + * Copyright 2013 Red Hat, Inc. and/or its affiliates
^^^^
Ouch .... it has taken some time ...
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
2015-09-30 16:25 ` Eric Blake
@ 2015-10-20 13:33 ` Juan Quintela
2015-10-28 11:17 ` Amit Shah
2 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-20 13:33 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Once postcopy is enabled (with migrate_set_capability), the migration
> will still start on precopy mode. To cause a transition into postcopy
> the:
>
> migrate_start_postcopy
>
> command must be issued. Postcopy will start sometime after this
> (when it's next checked in the migration loop).
>
> Issuing the command before migration has started will error,
> and issuing after it has finished is ignored.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 29/54] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 29/54] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2015-10-20 13:35 ` Juan Quintela
2015-10-30 18:19 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-20 13:35 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> 'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
>
> 'migration_in_postcopy' is provided for other sections to know if
> they're in postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
Just wondering
> diff --git a/migration/migration.c b/migration/migration.c
> index 5ee2c11..2ae5909 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -439,6 +439,7 @@ static bool migration_is_active(MigrationState *ms)
> {
> switch (ms->state) {
> case MIGRATION_STATUS_ACTIVE:
> + case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> case MIGRATION_STATUS_SETUP:
> return true;
>
> @@ -509,6 +510,39 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>
> get_xbzrle_cache_stats(info);
> break;
> + case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> + /* Mostly the same as active; TODO add some postcopy stats */
> + info->has_status = true;
> + info->has_total_time = true;
> + info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
> + - s->total_time;
> + info->has_expected_downtime = true;
> + info->expected_downtime = s->expected_downtime;
> + info->has_setup_time = true;
> + info->setup_time = s->setup_time;
> +
> + info->has_ram = true;
> + info->ram = g_malloc0(sizeof(*info->ram));
> + info->ram->transferred = ram_bytes_transferred();
> + info->ram->remaining = ram_bytes_remaining();
> + info->ram->total = ram_bytes_total();
> + info->ram->duplicate = dup_mig_pages_transferred();
> + info->ram->skipped = skipped_mig_pages_transferred();
> + info->ram->normal = norm_mig_pages_transferred();
> + info->ram->normal_bytes = norm_mig_bytes_transferred();
> + info->ram->dirty_pages_rate = s->dirty_pages_rate;
> + info->ram->mbps = s->mbps;
> +
> + if (blk_mig_active()) {
> + info->has_disk = true;
> + info->disk = g_malloc0(sizeof(*info->disk));
> + info->disk->transferred = blk_mig_bytes_transferred();
> + info->disk->remaining = blk_mig_bytes_remaining();
> + info->disk->total = blk_mig_bytes_total();
> + }
Are we sure that disk migration works with postcopy? I would expect not ...
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 30/54] Avoid sending vmdescription during postcopy
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 30/54] Avoid sending vmdescription during postcopy Dr. David Alan Gilbert (git)
@ 2015-10-20 13:35 ` Juan Quintela
2015-10-28 11:19 ` Amit Shah
1 sibling, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-20 13:35 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> VMDescription is normally sent at the end, after all
> of the devices; however that's not the end for postcopy,
> so just don't send it when in postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
@ 2015-10-21 8:35 ` Juan Quintela
2015-11-03 17:59 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-21 8:35 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
> +/*
> + * At the end of migration, undo the effects of init_range
> + * opaque should be the MIS.
> + */
> +static int cleanup_range(const char *block_name, void *host_addr,
> + ram_addr_t offset, ram_addr_t length, void *opaque)
> +{
> + MigrationIncomingState *mis = opaque;
> + struct uffdio_range range_struct;
> + trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
> +
> + /*
> + * We turned off hugepage for the precopy stage with postcopy enabled
> + * we can turn it back on now.
> + */
> +#ifdef MADV_HUGEPAGE
> + if (madvise(host_addr, length, MADV_HUGEPAGE)) {
> + error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
> + return -1;
> + }
> +#endif
this should be the same as:
qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE);
The only problem I can see is that there is no way to differentiate between
madvise() returning an error and MADV_HUGEPAGE not being defined.
If we really want that:
    if (QEMU_MADV_HUGEPAGE != QEMU_MADV_INVALID) {
        if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) {
            error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
            return -1;
        }
    }
But I am not sure if we want it.
> +
> + /*
> + * We can also turn off userfault now since we should have all the
> + * pages. It can be useful to leave it on to debug postcopy
> + * if you're not sure it's always getting every page.
> + */
> + range_struct.start = (uintptr_t)host_addr;
> + range_struct.len = length;
> +
> + if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
> + error_report("%s: userfault unregister %s", __func__, strerror(errno));
> +
> + return -1;
> + }
> +
> + return 0;
> +}
I still think that exposing the userfault API all around is a bad idea,
and that it would be easier to just export:
    qemu_userfault_register_range(addr, length);
    qemu_userfault_unregister_range(addr, length);
and hide the details behind a header file.
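Roughly like this (just a sketch; the function names and exact signatures are my invention, the ioctl details are the same ones already used in the patch):

    /* Hypothetical wrappers: keep the uffdio_* structs in one place */
    static int qemu_userfault_register_range(int userfault_fd, void *host_addr,
                                             size_t length)
    {
        struct uffdio_register reg_struct;

        reg_struct.range.start = (uintptr_t)host_addr;
        reg_struct.range.len = length;
        reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;

        if (ioctl(userfault_fd, UFFDIO_REGISTER, &reg_struct)) {
            error_report("%s: userfault register: %s", __func__, strerror(errno));
            return -1;
        }
        return 0;
    }

    static int qemu_userfault_unregister_range(int userfault_fd, void *host_addr,
                                               size_t length)
    {
        struct uffdio_range range_struct;

        range_struct.start = (uintptr_t)host_addr;
        range_struct.len = length;

        if (ioctl(userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
            error_report("%s: userfault unregister: %s", __func__, strerror(errno));
            return -1;
        }
        return 0;
    }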
Later, Juan.
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 35/54] Postcopy: Postcopy startup in migration thread
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 35/54] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
@ 2015-10-21 8:57 ` Juan Quintela
2015-10-26 17:12 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-21 8:57 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Rework the migration thread to setup and start postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index aecf284..0586f8c 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -136,6 +136,9 @@ struct MigrationState
> /* Flag set once the migration has been asked to enter postcopy */
> bool start_postcopy;
>
> + /* Flag set once the migration thread is running (and needs joining) */
> + bool migration_thread_started;
> +
> /* bitmap of pages that have been sent at least once
> * only maintained and used in postcopy at the moment
> * where it's used to send the dirtymap at the start
If you split it, we can already integrate the migration_thread_started part.
I would suggest changing the name to migration_thread_running, but you
are the native speaker here O:-)
If you don't want to add this variable, I *think* we could use
MIGRATION_STATE_NONE with just some small rearrangements of the code.
On the other hand, it could be racy :-(
Reviewed-by: Juan Quintela <quintela@redhat.com>
Split it out or rename the variable only if you consider it convenient.
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 36/54] Split out end of migration code from migration_thread
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 36/54] Split out end of migration code from migration_thread Dr. David Alan Gilbert (git)
@ 2015-10-21 9:11 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-21 9:11 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The code that gets run at the end of the migration process
> is getting large, and is about to have a chunk added for postcopy.
> Split it into a separate function.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
(but already upstream, so ...)
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 37/54] Postcopy: End of iteration
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 37/54] Postcopy: End of iteration Dr. David Alan Gilbert (git)
@ 2015-10-21 9:16 ` Juan Quintela
2015-10-29 5:10 ` Amit Shah
1 sibling, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-21 9:16 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The end of migration in postcopy is a bit different since some of
> the things normally done at the end of migration have already been
> done on the transition to postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 53/54] Disable mlock around incoming postcopy
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 53/54] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
@ 2015-10-21 9:17 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-21 9:17 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Userfault doesn't work with mlock; mlock is designed to nail down pages
> so they don't move, userfault is designed to tell you when they're not
> there.
>
> munlock the pages we userfault protect before postcopy.
> mlock everything again at the end if mlock is enabled.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 38/54] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 38/54] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
@ 2015-10-21 11:12 ` Juan Quintela
2015-10-26 16:58 ` Dr. David Alan Gilbert
2015-10-29 5:17 ` Amit Shah
1 sibling, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-21 11:12 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Add MIG_RP_MSG_REQ_PAGES command on Return path for the postcopy
> destination to request a page from the source.
>
> Two versions exist:
> MIG_RP_MSG_REQ_PAGES_ID that includes a RAMBlock name and start/len
> MIG_RP_MSG_REQ_PAGES that just has start/len for use with the same
> RAMBlock as a previous MIG_RP_MSG_REQ_PAGES_ID
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
> diff --git a/migration/migration.c b/migration/migration.c
> index 4f8ef6f..e994164 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -251,6 +251,35 @@ static void deferred_incoming_migration(Error **errp)
> deferred_incoming = true;
> }
>
> +/* Request a range of pages from the source VM at the given
> + * start address.
> + * rbname: Name of the RAMBlock to request the page in, if NULL it's the same
> + * as the last request (a name must have been given previously)
> + * Start: Address offset within the RB
> + * Len: Length in bytes required - must be a multiple of pagesize
> + */
> +void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
> + ram_addr_t start, size_t len)
> +{
> + uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname upto 256 */
> + size_t msglen = 12; /* start + len */
> +
> + *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
> + *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
struct foo {
uint64_t start;
uint32_t len;
char msg[];
}
As we are supposed to have the same qemu on both sides, this should
work, no?
Reviewed-by anyway, because I am not sure that the proposed solution is
(even) better than the current code.
Later, Juan.
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 39/54] Page request: Process incoming page request
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 39/54] Page request: Process incoming page request Dr. David Alan Gilbert (git)
@ 2015-10-21 11:17 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-21 11:17 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> On receiving MIG_RPCOMM_REQ_PAGES look up the address and
> queue the page.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 32/54] Postcopy: Maintain sentmap and calculate discard
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 32/54] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
@ 2015-10-21 11:17 ` Juan Quintela
2015-10-30 18:43 ` Dr. David Alan Gilbert
` (3 more replies)
0 siblings, 4 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-21 11:17 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Where postcopy is preceeded by a period of precopy, the destination will
> have received pages that may have been dirtied on the source after the
> page was sent. The destination must throw these pages away before
> starting it's CPUs.
>
> Maintain a 'sentmap' of pages that have already been sent.
> Calculate list of sent & dirty pages
> Provide helpers on the destination side to discard these.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
Hi
> /* Flag set once the migration has been asked to enter postcopy */
> bool start_postcopy;
This is from a previous patch, but ....
Change the name of the variable or the comment? From the comment it
should be "in_postcopy", no?
> +
> + /* bitmap of pages that have been sent at least once
> + * only maintained and used in postcopy at the moment
> + * where it's used to send the dirtymap at the start
> + * of the postcopy phase
> + */
> + unsigned long *sentmap;
> };
I *think* the patch would be easier if you put this one inside
migration_bitmap_rcu. If you put it there, you could do the
> + if (ms->sentmap) {
> + set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
> + }
And you wouldn't have to change all the callers to take a ram_addr_abs
address parameter, right?
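Something like this, purely as a sketch of what I mean (struct and field names are guesses, not the actual code): hang the sentmap off the same RCU-protected struct as the dirty bitmap, so the set_bit can happen right where the dirty bit is cleared and the absolute page number is already known.

    /* Hypothetical layout: both bitmaps behind the one RCU pointer */
    struct BitmapRcu {
        struct rcu_head rcu;
        unsigned long *bmap;     /* dirty bitmap */
        unsigned long *sentmap;  /* pages sent at least once (postcopy only) */
    };

    /* marking a page sent reuses the page number we already have */
    static void migration_mark_sent(struct BitmapRcu *b, unsigned long page)
    {
        if (b->sentmap) {
            set_bit(page, b->sentmap);
        }
    }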
> +struct PostcopyDiscardState {
> + const char *name;
It is not obvious to me what 'name' means here. I assume it's the RAM block
name; change it to block_name or ramblock?
> + * returns: 0 on success.
> + */
> +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
> + size_t length)
> +{
> + trace_postcopy_ram_discard_range(start, length);
> + if (madvise(start, length, MADV_DONTNEED)) {
> + error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> #else
> /* No target OS support, stubs just fail */
> bool postcopy_ram_supported_by_host(void)
> @@ -153,5 +192,95 @@ bool postcopy_ram_supported_by_host(void)
> return false;
> }
>
> +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
> + size_t length)
> +{
> + assert(0);
I will assume that just returning -1 would work here.
But yes, I understand that this code shouldn't be reached ...
> +}
> #endif
>
> +/* ------------------------------------------------------------------------- */
> +
> +/**
> + * postcopy_discard_send_init: Called at the start of each RAMBlock before
> + * asking to discard individual ranges.
> + *
> + * @ms: The current migration state.
> + * @offset: the bitmap offset of the named RAMBlock in the migration
> + * bitmap.
> + * @name: RAMBlock that discards will operate on.
> + *
> + * returns: a new PDS.
> + */
> +PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
> + unsigned long offset,
> + const char *name)
> +{
> + PostcopyDiscardState *res = g_try_malloc(sizeof(PostcopyDiscardState));
Why are we using g_try_malloc() here instead of g_malloc(), or even
g_malloc0()?
Especially when we don't check whether res is NULL on return. Please change.
> +
> + if (res) {
> + res->name = name;
> + res->cur_entry = 0;
> + res->nsentwords = 0;
> + res->nsentcmds = 0;
With the zeroing variant, these three can be removed.
> + res->offset = offset;
> + }
> +
> + return res;
> +}
> -/* Called with rcu_read_lock() to protect migration_bitmap */
> +/* Called with rcu_read_lock() to protect migration_bitmap
> + * mr: The region to search for dirty pages in
Haha, you forgot to update the comment when you moved the function to
use ram blocks O:-)
> @@ -662,6 +672,24 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
> }
>
> /**
> + * ram_find_block_by_id: Find a ramblock by name.
> + *
> + * Returns: The RAMBlock with matching ID, or NULL.
> + */
> +static RAMBlock *ram_find_block_by_id(const char *id)
> +{
> + RAMBlock *block;
> +
> + QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> + if (!strcmp(id, block->idstr)) {
> + return block;
> + }
> + }
> +
> + return NULL;
> +}
So we don't have this function already .....
While we're here, could we split it into its own patch and use it in ram_load?
    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
        if (!strncmp(id, block->idstr, sizeof(id))) {
            if (length != block->used_length) {
                Error *local_err = NULL;
                ret = qemu_ram_resize(block->offset, length, &local_err);
                if (local_err) {
                    error_report_err(local_err);
                }
            }
            ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
                                  block->idstr);
            break;
        }
    }

    if (!block) {
        error_report("Unknown ramblock \"%s\", cannot "
                     "accept migration", id);
        ret = -EINVAL;
    }
We could also use it in:
host_from_stream_offset
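With it, the loop above could collapse to something like this (sketch only, keeping the current error handling and the surrounding ret/f variables from ram_load):

    RAMBlock *block = ram_find_block_by_id(id);

    if (!block) {
        error_report("Unknown ramblock \"%s\", cannot accept migration", id);
        ret = -EINVAL;
    } else {
        if (length != block->used_length) {
            Error *local_err = NULL;

            ret = qemu_ram_resize(block->offset, length, &local_err);
            if (local_err) {
                error_report_err(local_err);
            }
        }
        ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG, block->idstr);
    }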
> +/* **** functions for postcopy ***** */
> +
> +/*
> + * Callback from postcopy_each_ram_send_discard for each RAMBlock
> + * start,end: Indexes into the bitmap for the first and last bit
> + * representing the named block
> + */
> +static int postcopy_send_discard_bm_ram(MigrationState *ms,
> + PostcopyDiscardState *pds,
> + unsigned long start, unsigned long end)
> +{
> + unsigned long current;
> +
> + for (current = start; current <= end; ) {
> + unsigned long set = find_next_bit(ms->sentmap, end + 1, current);
> +
> + if (set <= end) {
> + unsigned long zero = find_next_zero_bit(ms->sentmap,
> + end + 1, set + 1);
> +
> + if (zero > end) {
> + zero = end + 1;
> + }
> + postcopy_discard_send_range(ms, pds, set, zero - 1);
> + current = zero + 1;
> + } else {
> + current = set;
> + }
> + }
I think I understood the logic here in the end....
But if we change the meaning of postcopy_discard_send_range() from
(begin, end) to (begin, length), I think everything becomes clearer, no?
    if (set <= end) {
        unsigned long zero = find_next_zero_bit(ms->sentmap,
                                                end + 1, set + 1);
        unsigned long length;

        if (zero > end) {
            length = end - set;
        } else {
            length = zero - 1 - set;
            current = zero + 1;
        }
        postcopy_discard_send_range(ms, pds, set, length);
    } else {
        current = set;
    }
}
I would claim that if we call one 'zero', the other should be called 'one'.
Or change them to set/unset, but that is just me. Yes, I haven't tested it,
and it is possible that there is an off-by-one somewhere...
Looking at postcopy_each_ram_send_discard, I even think it would be a good
idea to pass a length to this function as well.
> +/*
> + * Transmit the set of pages to be discarded after precopy to the target
> + * these are pages that:
> + * a) Have been previously transmitted but are now dirty again
> + * b) Pages that have never been transmitted, this ensures that
> + * any pages on the destination that have been mapped by background
> + * tasks get discarded (transparent huge pages is the specific concern)
> + * Hopefully this is pretty sparse
> + */
> +int ram_postcopy_send_discard_bitmap(MigrationState *ms)
> +{
> + int ret;
> +
> + rcu_read_lock();
> +
> + /* This should be our last sync, the src is now paused */
> + migration_bitmap_sync();
> +
> + /*
> + * Update the sentmap to be sentmap = ~sentmap | dirty
> + */
> + bitmap_complement(ms->sentmap, ms->sentmap,
> + last_ram_offset() >> TARGET_PAGE_BITS);
> +
> + bitmap_or(ms->sentmap, ms->sentmap, migration_bitmap,
> + last_ram_offset() >> TARGET_PAGE_BITS);
These bitmaps are really big; I don't know how long these operations take
here, but I think we can avoid (at least) the bitmap_complement. We could
rename the bitmap to notsentbitmap, initialise it to all ones and clear a
bit each time we send a page, no?
We could also do the bitmap_or() at migration_bitmap_sync() time; at that
point we shouldn't be on the critical path.
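A sketch of what I mean (notsentmap is a made-up name, and the hook points are only approximate):

    /* at setup: every page starts out "not sent" */
    ms->notsentmap = bitmap_new(last_ram_offset() >> TARGET_PAGE_BITS);
    bitmap_fill(ms->notsentmap, last_ram_offset() >> TARGET_PAGE_BITS);

    /* whenever a page actually goes on the wire */
    clear_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->notsentmap);

    /* at discard time: pages to throw away = never sent OR dirtied again */
    bitmap_or(ms->notsentmap, ms->notsentmap, migration_bitmap,
              last_ram_offset() >> TARGET_PAGE_BITS);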
Later, Juan.
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 19/54] Return path: Control commands
2015-10-20 11:27 ` Juan Quintela
@ 2015-10-26 11:42 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-26 11:42 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Add two src->dest commands:
> > * OPEN_RETURN_PATH - To request that the destination open the return path
> > * PING - Request an acknowledge from the destination
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
>
> > +void qemu_savevm_send_open_return_path(QEMUFile *f)
> > +{
> > + qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
>
>
> For consistency, I would have put a
>
> trace_savevm_send_open_return_path(....) here
>
> The send in the loadvm path
Done.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 21/54] Return path: Source handling of return path
2015-10-20 11:33 ` Juan Quintela
@ 2015-10-26 12:06 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-26 12:06 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Open a return path, and handle messages that are received upon it.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> > +/*
> > + * Return true if we're already in the middle of a migration
> > + * (i.e. any of the active or setup states)
> > + */
> > +static bool migration_is_active(MigrationState *ms)
> > +{
> > + switch (ms->state) {
> > + case MIGRATION_STATUS_ACTIVE:
> > + case MIGRATION_STATUS_SETUP:
> > + return true;
> > +
> > + default:
> > + return false;
> > +
> > + }
> > +}
> > +
>
>
> If you have to resend, you can split this bit out and update the users accordingly.
Done; and renamed to migration_is_setup_or_active.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 24/54] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
2015-10-20 11:50 ` Juan Quintela
@ 2015-10-26 12:22 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-26 12:22 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > The state of the postcopy process is managed via a series of messages;
> > * Add wrappers and handlers for sending/receiving these messages
> > * Add state variable that track the current state of postcopy
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
>
> > + tmp[0] = cpu_to_be64(getpagesize());
> > + tmp[1] = cpu_to_be64(1ul << qemu_target_page_bits());
>
> we don't have a qemu_target_pagesize()?
>
> #fail
Well, we didn't even have qemu_target_page_bits() until patch 1 - I
could add a page-size helper as well if you prefer?
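It would only be a thin wrapper on top of patch 1, something like (hypothetical name):

    size_t qemu_target_page_size(void)
    {
        return 1ul << qemu_target_page_bits();
    }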
> > + qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_LISTEN, 0, NULL);
>
> Should we create a macro
>
> qemu_savevm_command_noargs_send(f, MIG_CMD_POSTCOPY_LISTEN);
>
> It is a "bit" clearer, but saves a "whole" byte. Not convinced one way or
> the other :-p
Doesn't seem worth it to me.
> > +
> > + case MIG_CMD_POSTCOPY_ADVISE:
> > + tmp64a = qemu_get_be64(f); /* hps */
> > + tmp64b = qemu_get_be64(f); /* tps */
> > + return loadvm_postcopy_handle_advise(mis, tmp64a, tmp64b);
>
> For the rest of the commands you read the arguments inside the
> loadvm_postcopy_handle_*() functions; I think you should do the same here.
Hmm; actually for most of them I don't do it in the handle_ function;
only for the ones that were dynamically sized do I read inside the handler.
However it is neater doing it that way, so I'll change all the places in
that switch to read the arguments in the handler.
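i.e. roughly this shape (just a sketch, not the final code; it assumes the handler can get at the stream via mis->from_src_file):

    case MIG_CMD_POSTCOPY_ADVISE:
        /* the handler reads its own arguments off the stream now */
        return loadvm_postcopy_handle_advise(mis);

    static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
    {
        uint64_t remote_hps = qemu_get_be64(mis->from_src_file); /* host page size */
        uint64_t remote_tps = qemu_get_be64(mis->from_src_file); /* target page size */

        /* ...existing checks against the local page sizes... */
        return 0;
    }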
Dave
>
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 25/54] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
2015-10-20 13:25 ` Juan Quintela
@ 2015-10-26 16:21 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-26 16:21 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
> > stream inside a package whose length can be determined purely by reading
> > its header. The destination guarantees that the whole MIG_CMD_PACKAGED
> > is read off the stream prior to parsing the contents.
> >
> > This is used by postcopy to load device state (from the package)
> > while leaving the main stream free to receive memory pages.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> But I propose the change below
>
>
> > + size_t len = qsb_get_length(qsb);
>
> ....
>
> > + /* all the data follows (concatinating the iov's) */
> > + for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
> > + /* The iov entries are partially filled */
> > + size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
> > + len :
> > + qsb->iov[cur_iov].iov_len;
>
> Either something has gone very wrong here, or qsb->iov[cur_iov].iov_len can
> never be > len. So this should be the same as:
>
> size_t towrite = MIN(qsb->iov[cur_iov].iov_len, len);
>
> right?
Done.
> > + len -= towrite;
> > +
> > + if (!towrite) {
> > + break;
> > + }
>
> This should never happen, right? And if we want to be extra safe,
qsb_get_length() returns the amount of data in the qsb, not the
amount of allocated space; so it's legal for the qsb to have
allocated an iov entry but not actually put any data in it yet.
Will it have done that in our case? I don't think so, but no reason
to make assumptions.
> > + QEMUFile *packf = qemu_bufopen("r", qsb);
> > +
> > + ret = qemu_loadvm_state_main(packf, mis);
> > + trace_loadvm_handle_cmd_packaged_main(ret);
> > + qemu_fclose(packf);
> > + qsb_free(qsb);
>
> Migration code is re-entrant!!!!! Who would have guessed O:-)
To a very limited degree; there's global state shotgunned around everywhere
(e.g. in the RAM code).
Dave
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 40/54] Page request: Consume pages off the post-copy queue
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 40/54] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
@ 2015-10-26 16:32 ` Juan Quintela
2015-11-03 11:52 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-26 16:32 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> When transmitting RAM pages, consume pages that have been queued by
> MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
>
> Note:
> a) After a queued page the linear walk carries on from after the
> unqueued page; there is a reasonable chance that the destination
> was about to ask for other closeby pages anyway.
>
> b) We have to be careful of any assumptions that the page walking
> code makes, in particular it does some short cuts on its first linear
> walk that break as soon as we do a queued page.
>
> c) We have to be careful to not break up host-page size chunks, since
> this makes it harder to place the pages on the destination.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/ram.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++---------
> trace-events | 2 +
> 2 files changed, 168 insertions(+), 29 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 5771983..487e838 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -516,9 +516,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
> * Returns: byte offset within memory region of the start of a dirty page
> */
> static inline
> -ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb,
> - ram_addr_t start,
> - ram_addr_t *ram_addr_abs)
> +ram_addr_t migration_bitmap_find_dirty(RAMBlock *rb,
> + ram_addr_t start,
> + ram_addr_t *ram_addr_abs)
> {
> unsigned long base = rb->offset >> TARGET_PAGE_BITS;
> unsigned long nr = base + (start >> TARGET_PAGE_BITS);
> @@ -535,15 +535,24 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb,
> next = find_next_bit(bitmap, size, nr);
> }
>
> - if (next < size) {
> - clear_bit(next, bitmap);
> - migration_dirty_pages--;
> - }
> *ram_addr_abs = next << TARGET_PAGE_BITS;
> return (next - base) << TARGET_PAGE_BITS;
> }
>
> -/* Called with rcu_read_lock() to protect migration_bitmap */
> +static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
> +{
> + bool ret;
> + int nr = addr >> TARGET_PAGE_BITS;
> + unsigned long *bitmap = atomic_rcu_read(&migration_bitmap);
> +
> + ret = test_and_clear_bit(nr, bitmap);
> +
> + if (ret) {
> + migration_dirty_pages--;
> + }
> + return ret;
> +}
> +
> static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
> {
> unsigned long *bitmap;
> @@ -960,9 +969,8 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
> static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
> bool *again, ram_addr_t *ram_addr_abs)
> {
> - pss->offset = migration_bitmap_find_and_reset_dirty(pss->block,
> - pss->offset,
> - ram_addr_abs);
> + pss->offset = migration_bitmap_find_dirty(pss->block, pss->offset,
> + ram_addr_abs);
> if (pss->complete_round && pss->block == last_seen_block &&
> pss->offset >= last_offset) {
> /*
> @@ -1001,6 +1009,88 @@ static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
> }
> }
>
> +/*
> + * Unqueue a page from the queue fed by postcopy page requests; skips pages
> + * that are already sent (!dirty)
> + *
> + * Returns: true if a queued page is found
> + * ms: MigrationState in
> + * pss: PageSearchStatus structure updated with found block/offset
> + * ram_addr_abs: global offset in the dirty/sent bitmaps
> + */
> +static bool get_queued_page(MigrationState *ms, PageSearchStatus *pss,
> + ram_addr_t *ram_addr_abs)
> +{
> + RAMBlock *block;
> + ram_addr_t offset;
> + bool dirty;
> +
> + do {
> + block = NULL;
> + qemu_mutex_lock(&ms->src_page_req_mutex);
> + if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
> + struct MigrationSrcPageRequest *entry =
> + QSIMPLEQ_FIRST(&ms->src_page_requests);
> + block = entry->rb;
> + offset = entry->offset;
> + *ram_addr_abs = (entry->offset + entry->rb->offset) &
> + TARGET_PAGE_MASK;
> +
> + if (entry->len > TARGET_PAGE_SIZE) {
> + entry->len -= TARGET_PAGE_SIZE;
> + entry->offset += TARGET_PAGE_SIZE;
> + } else {
> + memory_region_unref(block->mr);
> + QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> + g_free(entry);
> + }
> + }
> + qemu_mutex_unlock(&ms->src_page_req_mutex);
Can we split this chunk out into a function with a name like:
    it_is_complicated_to_get_the_first_queued_page(&ms, &block, &offset, ram_addr_abs)
or something like that?
Yes, we can improve on the naming here.
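For instance, a sketch of the split-out helper, built from the chunk quoted above (the name unqueue_page is just a placeholder):

    /* Pop one page request; returns the block, or NULL if the queue is empty */
    static RAMBlock *unqueue_page(MigrationState *ms, ram_addr_t *offset,
                                  ram_addr_t *ram_addr_abs)
    {
        RAMBlock *block = NULL;

        qemu_mutex_lock(&ms->src_page_req_mutex);
        if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
            struct MigrationSrcPageRequest *entry =
                QSIMPLEQ_FIRST(&ms->src_page_requests);

            block = entry->rb;
            *offset = entry->offset;
            *ram_addr_abs = (entry->offset + entry->rb->offset) &
                            TARGET_PAGE_MASK;

            if (entry->len > TARGET_PAGE_SIZE) {
                entry->len -= TARGET_PAGE_SIZE;
                entry->offset += TARGET_PAGE_SIZE;
            } else {
                memory_region_unref(block->mr);
                QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
                g_free(entry);
            }
        }
        qemu_mutex_unlock(&ms->src_page_req_mutex);

        return block;
    }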
> +
> + /*
> + * We're sending this page, and since it's postcopy nothing else
> + * will dirty it, and we must make sure it doesn't get sent again
> + * even if this queue request was received after the background
> + * search already sent it.
> + */
> + if (block) {
> + dirty = test_bit(*ram_addr_abs >> TARGET_PAGE_BITS,
> + migration_bitmap);
You need to do the atomic_rcu_read(&migration_bitmap) dance here, no?
Why don't you do a test_and_clear_bit() here? Then you wouldn't have to
change migration_bitmap_find_and_reset_dirty().
All our migration code works with ram addresses, but almost everywhere we
need page numbers. I am not sure whether things would get clearer or more
complicated if we changed the convention to use page numbers instead of
ram_addr_abs. But that is completely independent of this patch.
> + if (!dirty) {
> + trace_get_queued_page_not_dirty(
> + block->idstr, (uint64_t)offset,
> + (uint64_t)*ram_addr_abs,
> + test_bit(*ram_addr_abs >> TARGET_PAGE_BITS, ms->sentmap));
> + } else {
> + trace_get_queued_page(block->idstr,
> + (uint64_t)offset,
> + (uint64_t)*ram_addr_abs);
> + }
> + }
> +
> + } while (block && !dirty);
> +
> + if (block) {
> + /*
> + * As soon as we start servicing pages out of order, then we have
> + * to kill the bulk stage, since the bulk stage assumes
> + * in (migration_bitmap_find_and_reset_dirty) that every page is
> + * dirty, that's no longer true.
> + */
> + ram_bulk_stage = false;
> +
> + /*
> + * We want the background search to continue from the queued page
> + * since the guest is likely to want other pages near to the page
> + * it just requested.
> + */
> + pss->block = block;
> + pss->offset = offset;
> + }
> +
> + return !!block;
> +}
> +
> /**
> * flush_page_queue: Flush any remaining pages in the ram request queue
> * it should be empty at the end anyway, but in error cases there may be
> @@ -1087,6 +1177,57 @@ err:
>
>
> /**
> + * ram_save_host_page: Starting at *offset send pages upto the end
> + * of the current host page. It's valid for the initial
> + * offset to point into the middle of a host page
> + * in which case the remainder of the hostpage is sent.
> + * Only dirty target pages are sent.
> + *
> + * Returns: Number of pages written.
> + *
> + * @f: QEMUFile where to send the data
> + * @block: pointer to block that contains the page we want to send
> + * @offset: offset inside the block for the page; updated to last target page
> + * sent
> + * @last_stage: if we are at the completion stage
> + * @bytes_transferred: increase it with the number of transferred bytes
> + */
> +static int ram_save_host_page(MigrationState *ms, QEMUFile *f, RAMBlock* block,
> + ram_addr_t *offset, bool last_stage,
> + uint64_t *bytes_transferred,
> + ram_addr_t dirty_ram_abs)
> +{
> + int tmppages, pages = 0;
> + do {
> + /* Check the pages is dirty and if it is send it */
> + if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
> + if (compression_switch && migrate_use_compression()) {
> + tmppages = ram_save_compressed_page(f, block, *offset,
> + last_stage,
> + bytes_transferred);
> + } else {
> + tmppages = ram_save_page(f, block, *offset, last_stage,
> + bytes_transferred);
> + }
> +
> + if (tmppages < 0) {
> + return tmppages;
> + }
> + if (ms->sentmap) {
> + set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
> + }
> + pages += tmppages;
> + }
> + *offset += TARGET_PAGE_SIZE;
> + dirty_ram_abs += TARGET_PAGE_SIZE;
> + } while (*offset & (qemu_host_page_size - 1));
> +
> + /* The offset we leave with is the last one we looked at */
> + *offset -= TARGET_PAGE_SIZE;
> + return pages;
> +}
Split this function first to make the changes easier to grasp?
We are doing (at least) two quite different things here.
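For instance (a sketch only; ram_save_target_page is a made-up name for the per-target-page half, carved out of the code quoted above):

    /* Send one target page if it is still dirty; returns pages written or <0 */
    static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
                                    RAMBlock *block, ram_addr_t offset,
                                    bool last_stage,
                                    uint64_t *bytes_transferred,
                                    ram_addr_t dirty_ram_abs)
    {
        int pages = 0;

        if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
            if (compression_switch && migrate_use_compression()) {
                pages = ram_save_compressed_page(f, block, offset, last_stage,
                                                 bytes_transferred);
            } else {
                pages = ram_save_page(f, block, offset, last_stage,
                                      bytes_transferred);
            }
            if (pages < 0) {
                return pages;
            }
            if (ms->sentmap) {
                set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
            }
        }

        return pages;
    }

ram_save_host_page() would then just be the thin loop over the host page calling this.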
> +
> +/**
> * ram_find_and_save_block: Finds a dirty page and sends it to f
> *
> * Called within an RCU critical section.
> @@ -1097,12 +1238,16 @@ err:
> * @f: QEMUFile where to send the data
> * @last_stage: if we are at the completion stage
> * @bytes_transferred: increase it with the number of transferred bytes
> + *
> + * On systems where host-page-size > target-page-size it will send all the
> + * pages in a host page that are dirty.
> */
>
> static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
> uint64_t *bytes_transferred)
> {
> PageSearchStatus pss;
> + MigrationState *ms = migrate_get_current();
> int pages = 0;
> bool again, found;
> ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
> @@ -1117,26 +1262,18 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
> }
>
> do {
> - found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);
> + again = true;
> + found = get_queued_page(ms, &pss, &dirty_ram_abs);
>
> - if (found) {
> - if (compression_switch && migrate_use_compression()) {
> - pages = ram_save_compressed_page(f, pss.block, pss.offset,
> - last_stage,
> - bytes_transferred);
> - } else {
> - pages = ram_save_page(f, pss.block, pss.offset, last_stage,
> - bytes_transferred);
> - }
> + if (!found) {
> + /* priority queue empty, so just search for something dirty */
> + found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);
> + }
>
> - /* if page is unmodified, continue to the next */
> - if (pages > 0) {
> - MigrationState *ms = migrate_get_current();
> - last_sent_block = pss.block;
> - if (ms->sentmap) {
> - set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
> - }
> - }
> + if (found) {
> + pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
> + last_stage, bytes_transferred,
> + dirty_ram_abs);
> }
> } while (!pages && again);
What about using two loops here?
This is the code after your changes:
    do {
        again = true;
        found = get_queued_page(ms, &pss, &dirty_ram_abs);

        if (!found) {
            /* priority queue empty, so just search for something dirty */
            found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);
        }

        if (found) {
            pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
                                       last_stage, bytes_transferred,
                                       dirty_ram_abs);
        }
    } while (!pages && again);

And this is what I propose instead:

    while (get_queued_page(ms, &pss, &dirty_ram_abs)) {
        pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
                                   last_stage, bytes_transferred,
                                   dirty_ram_abs);
    }

    do {
        /* priority queue empty, so just search for something dirty */
        found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);

        if (found) {
            pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
                                       last_stage, bytes_transferred,
                                       dirty_ram_abs);
        }
    } while (!pages && again);
We repeat the ram_save_host_page() call, but IMHO it is easier to see
what we are doing, and especially how we get out of the loop.
Later, Juan.
>
> diff --git a/trace-events b/trace-events
> index e40f00e..9e4206b 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1244,6 +1244,8 @@ vmstate_subsection_load_good(const char *parent) "%s"
> qemu_file_fclose(void) ""
>
> # migration/ram.c
> +get_queued_page(const char *block_name, uint64_t tmp_offset, uint64_t ram_addr) "%s/%" PRIx64 " ram_addr=%" PRIx64
> +get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, uint64_t ram_addr, int sent) "%s/%" PRIx64 " ram_addr=%" PRIx64 " (sent=%d)"
> migration_bitmap_sync_start(void) ""
> migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
> migration_throttle(void) ""
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 38/54] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
2015-10-21 11:12 ` Juan Quintela
@ 2015-10-26 16:58 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-26 16:58 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Add MIG_RP_MSG_REQ_PAGES command on Return path for the postcopy
> > destination to request a page from the source.
> >
> > Two versions exist:
> > MIG_RP_MSG_REQ_PAGES_ID that includes a RAMBlock name and start/len
> > MIG_RP_MSG_REQ_PAGES that just has start/len for use with the same
> > RAMBlock as a previous MIG_RP_MSG_REQ_PAGES_ID
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
>
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 4f8ef6f..e994164 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -251,6 +251,35 @@ static void deferred_incoming_migration(Error **errp)
> > deferred_incoming = true;
> > }
> >
> > +/* Request a range of pages from the source VM at the given
> > + * start address.
> > + * rbname: Name of the RAMBlock to request the page in, if NULL it's the same
> > + * as the last request (a name must have been given previously)
> > + * Start: Address offset within the RB
> > + * Len: Length in bytes required - must be a multiple of pagesize
> > + */
> > +void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
> > + ram_addr_t start, size_t len)
> > +{
> > + uint8_t bufc[12 + 1 + 255]; /* start (8), len (4), rbname upto 256 */
> > + size_t msglen = 12; /* start + len */
> > +
> > + *(uint64_t *)bufc = cpu_to_be64((uint64_t)start);
> > + *(uint32_t *)(bufc + 8) = cpu_to_be32((uint32_t)len);
>
> struct foo {
> uint64_t start;
> uint32_t len;
> char msg[];
> }
>
> As we are supposed to have the same qemu on both sides, this should
> work, no?
In principle I think it should work between opposite-endian hosts,
if they are capable of running the same-endian guest (untested);
similarly there's no requirement for the two qemus to be built
with the same compiler or, I think, the same host word size.
Using structs on the wire always makes me worry about what
the compiler would do to the layout; I'd assume it would at least
need a 'packed' to be sure a particularly entertaining compiler
doesn't do something odd.
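Something like this, for instance (purely hypothetical, names made up, not proposing it for this series):

    /* Wire layout for the page-request message; packed so no padding creeps in */
    typedef struct QEMU_PACKED {
        uint64_t start;    /* still needs cpu_to_be64() before sending */
        uint32_t len;      /* still needs cpu_to_be32() */
        char     rbname[]; /* only present for the _ID variant */
    } RPReqPagesMsg;

    QEMU_BUILD_BUG_ON(offsetof(RPReqPagesMsg, rbname) != 12);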
> Reviewed-by anyway, because I am not sure that the proposed solution is
> (even) better than the current code.
Dave
>
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 35/54] Postcopy: Postcopy startup in migration thread
2015-10-21 8:57 ` Juan Quintela
@ 2015-10-26 17:12 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-26 17:12 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Rework the migration thread to setup and start postcopy.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
>
>
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index aecf284..0586f8c 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -136,6 +136,9 @@ struct MigrationState
> > /* Flag set once the migration has been asked to enter postcopy */
> > bool start_postcopy;
> >
> > + /* Flag set once the migration thread is running (and needs joining) */
> > + bool migration_thread_started;
> > +
> > /* bitmap of pages that have been sent at least once
> > * only maintained and used in postcopy at the moment
> > * where it's used to send the dirtymap at the start
>
> If you split it, we can already integrate the migration_thread_started part.
>
> I would suggest changing the name to migration_thread_running, but you
> are the native speaker here O:-)
Changed.
> If you don't want to add this variable, I *think* we could use
> MIGRATION_STATE_NONE with just some small rearrangements of the code.
> On the other hand, it could be racy :-(
Yeh, it seemed simpler to have a nice simple bool.
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
>
> Split it out or rename the variable only if you consider it convenient.
Renamed, not split.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 41/54] postcopy_ram.c: place_page and helpers
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 41/54] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
@ 2015-10-28 10:28 ` Juan Quintela
2015-10-28 13:11 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-28 10:28 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> postcopy_place_page (etc) provide a way for postcopy to place a page
> into guests memory atomically (using the copy ioctl on the ufd).
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
> +int postcopy_place_page_zero(MigrationIncomingState *mis, void *host)
> +{
> + struct uffdio_zeropage zero_struct;
> +
> + zero_struct.range.start = (uint64_t)(uintptr_t)host;
> + zero_struct.range.len = getpagesize();
> + zero_struct.mode = 0;
> +
> + if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
> + int e = errno;
> + error_report("%s: %s zero host: %p",
> + __func__, strerror(e), host);
> +
> + return -e;
> + }
> +
> + trace_postcopy_place_page_zero(host);
> + return 0;
> +}
Would this be faster than the normal precopy way of just copying a zero page?
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 09/54] Add QEMU_MADV_NOHUGEPAGE
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 09/54] Add QEMU_MADV_NOHUGEPAGE Dr. David Alan Gilbert (git)
@ 2015-10-28 10:35 ` Amit Shah
0 siblings, 0 replies; 119+ messages in thread
From: Amit Shah @ 2015-10-28 10:35 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
On (Tue) 29 Sep 2015 [09:37:33], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Add QEMU_MADV_NOHUGEPAGE as an OS-independent version of
> MADV_NOHUGEPAGE.
>
> We include sys/mman.h before making the test to ensure
> that we pick up the system defines.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 10/54] migration/ram.c: Use RAMBlock rather than MemoryRegion
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 10/54] migration/ram.c: Use RAMBlock rather than MemoryRegion Dr. David Alan Gilbert (git)
@ 2015-10-28 10:36 ` Amit Shah
0 siblings, 0 replies; 119+ messages in thread
From: Amit Shah @ 2015-10-28 10:36 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
On (Tue) 29 Sep 2015 [09:37:34], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> RAM migration mainly works on RAMBlocks but in a few places
> uses data from MemoryRegions to access the same information that's
> already held in RAMBlocks; clean it up just to avoid the
> MemoryRegion use.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 42/54] Postcopy: Use helpers to map pages during migration
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 42/54] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
@ 2015-10-28 10:58 ` Juan Quintela
2015-10-30 12:59 ` Dr. David Alan Gilbert
2015-10-30 16:35 ` Dr. David Alan Gilbert
0 siblings, 2 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-28 10:58 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> In postcopy, the destination guest is running at the same time
> as it's receiving pages; as we receive new pages we must put
> them into the guests address space atomically to avoid a running
> CPU accessing a partially written page.
>
> Use the helpers in postcopy-ram.c to map these pages.
>
> qemu_get_buffer_in_place is used to avoid a copy out of qemu_file
> in the case that postcopy is going to do a copy anyway.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/ram.c | 128 +++++++++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 103 insertions(+), 25 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 487e838..6d9cfb5 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1848,7 +1848,17 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
> /* Must be called from within a rcu critical section.
> * Returns a pointer from within the RCU-protected ram_list.
> */
> +/*
> + * Read a RAMBlock ID from the stream f, find the host address of the
> + * start of that block and add on 'offset'
> + *
> + * f: Stream to read from
> + * mis: MigrationIncomingState
> + * offset: Offset within the block
> + * flags: Page flags (mostly to see if it's a continuation of previous block)
> + */
> static inline void *host_from_stream_offset(QEMUFile *f,
> + MigrationIncomingState *mis,
> ram_addr_t offset,
> int flags)
> {
Uh, oh, we change the prototype of host_from_stream_offset() but not the
function itself? Strange, no?
> + postcopy_place_needed = false;
> + if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
> + RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
> + host = host_from_stream_offset(f, mis, addr, flags);
> + if (!host) {
> + error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> + ret = -EINVAL;
> + break;
> + }
> + page_buffer = host;
You could move this bit of code in a separate patch; that makes review easier.
all_zero can also go in that patch.
> + if (postcopy_running) {
As discussed on irc, I still think that having a RAM_SAVE_HOST_PAGE makes
everything much, much clearer and easier, but I agree that is not
trivial with the current code.
You are reusing ram_load, but have lots and lots of
if (postcopy_running) {
} else {
}
I think that it would be easier to just have:
if (postcopy_running) {
ram_load_postcopy()
} else {
ram_load_precopy{}
}
You duplicate a bit of code, but remove lots of ifs from the equation,
not sure which one is really easier. I just hate bits like the
following one.
> @@ -2062,32 +2123,36 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> }
> break;
> case RAM_SAVE_FLAG_COMPRESS:
> ch = qemu_get_byte(f);
> - ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> + if (!postcopy_running) {
> + ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> + } else {
> + memset(page_buffer, ch, TARGET_PAGE_SIZE);
> + if (ch) {
> + all_zero = false;
> + }
> + }
> @@ -2123,6 +2188,19 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> ret = -EINVAL;
> }
> }
> +
> + if (postcopy_place_needed) {
> + /* This gets called at the last target page in the host page */
> + if (!all_zero) {
> + ret = postcopy_place_page(mis, host + TARGET_PAGE_SIZE -
> + qemu_host_page_size,
> + postcopy_place_source);
> + } else {
> + ret = postcopy_place_page_zero(mis,
> + host + TARGET_PAGE_SIZE -
> + qemu_host_page_size);
> + }
> + }
Hahahaha, just change the if or the variable name.
having a
if (!cond) {
f1();
} else {
f2();
}
makes no sense, better to have
if (cond) {
f2()
} else {
f1()
}
no?
The patch itself is ok.
Thanks, Juan.
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 44/54] Don't iterate on precopy-only devices during postcopy
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 44/54] Don't iterate on precopy-only devices during postcopy Dr. David Alan Gilbert (git)
@ 2015-10-28 11:01 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-28 11:01 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> During the postcopy phase we must not call the iterate method on
> precopy-only devices, since they may have done some cleanup during
> the _complete call at the end of the precopy phase.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 26/54] Modify save_live_pending for postcopy
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 26/54] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
@ 2015-10-28 11:03 ` Amit Shah
0 siblings, 0 replies; 119+ messages in thread
From: Amit Shah @ 2015-10-28 11:03 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
On (Tue) 29 Sep 2015 [09:37:50], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Modify save_live_pending to return separate postcopiable and
> non-postcopiable counts.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
(I had R-b'ed v7 too)
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
2015-09-30 16:25 ` Eric Blake
2015-10-20 13:33 ` Juan Quintela
@ 2015-10-28 11:17 ` Amit Shah
2 siblings, 0 replies; 119+ messages in thread
From: Amit Shah @ 2015-10-28 11:17 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
On (Tue) 29 Sep 2015 [09:37:52], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Once postcopy is enabled (with migrate_set_capability), the migration
> will still start on precopy mode. To cause a transition into postcopy
> the:
>
> migrate_start_postcopy
>
> command must be issued. Postcopy will start sometime after this
> (when it's next checked in the migration loop).
>
> Issuing the command before migration has started will error,
> and issuing after it has finished is ignored.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 30/54] Avoid sending vmdescription during postcopy
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 30/54] Avoid sending vmdescription during postcopy Dr. David Alan Gilbert (git)
2015-10-20 13:35 ` Juan Quintela
@ 2015-10-28 11:19 ` Amit Shah
1 sibling, 0 replies; 119+ messages in thread
From: Amit Shah @ 2015-10-28 11:19 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
On (Tue) 29 Sep 2015 [09:37:54], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> VMDescription is normally sent at the end, after all
> of the devices; however that's not the end for postcopy,
> so just don't send it when in postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 45/54] Host page!=target page: Cleanup bitmaps
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 45/54] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
@ 2015-10-28 11:24 ` Juan Quintela
2015-11-03 17:32 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 119+ messages in thread
From: Juan Quintela @ 2015-10-28 11:24 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Prior to the start of postcopy, ensure that everything that will
> be transferred later is a whole host-page in size.
>
> This is accomplished by discarding partially transferred host pages
> and marking any that are partially dirty as fully dirty.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> + struct RAMBlock *block;
> + unsigned int host_ratio = qemu_host_page_size / TARGET_PAGE_SIZE;
> +
> + if (qemu_host_page_size == TARGET_PAGE_SIZE) {
> + /* Easy case - TPS==HPS - nothing to be done */
> + return 0;
> + }
> +
> + /* Easiest way to make sure we don't resume in the middle of a host-page */
> + last_seen_block = NULL;
> + last_sent_block = NULL;
> + last_offset = 0;
It should be enough with the last one, right? If you put
last_seen/sent_block to NULL, you will restart from the beginning each
time that you do a migration bitmap sync, penalizing the pages at the
beginning of the cycle. Even better than:
last_offset = 0 is doing a:
last_offset &= HOST_PAGE_MASK
or whatever the constant is, no?
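Something along these lines (an illustrative sketch; the exact mask spelling is
whatever the series actually uses):

    /* resume the next search at the start of the current host page,
     * rather than from the start of RAM */
    last_offset &= ~((ram_addr_t)qemu_host_page_size - 1);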
> +
> + QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> + unsigned long first = block->offset >> TARGET_PAGE_BITS;
> + unsigned long len = block->used_length >> TARGET_PAGE_BITS;
> + unsigned long last = first + (len - 1);
> + unsigned long found_set;
> + unsigned long search_start;
next_search? search_next?
> +
> + PostcopyDiscardState *pds =
> + postcopy_discard_send_init(ms, first, block->idstr);
> +
> + /* First pass: Discard all partially sent host pages */
> + found_set = find_next_bit(ms->sentmap, last + 1, first);
> + while (found_set <= last) {
> + bool do_discard = false;
> + unsigned long discard_start_addr;
> + /*
> + * If the start of this run of pages is in the middle of a host
> + * page, then we need to discard this host page.
> + */
> + if (found_set % host_ratio) {
> + do_discard = true;
> + found_set -= found_set % host_ratio;
please, create a PAGE_HOST_ALIGN() macro, or whatever you want to call it?
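A minimal sketch of such a macro (exact form illustrative only; assumes
host_ratio is in scope):

    /* round a target-page index down to the start of its host page */
    #define PAGE_HOST_ALIGN(idx) ((idx) - ((idx) % host_ratio))

which would replace the open-coded '% host_ratio' arithmetic in both passes.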
> + discard_start_addr = found_set;
> + search_start = found_set + host_ratio;
> + } else {
> + /* Find the end of this run */
> + unsigned long found_zero;
> + found_zero = find_next_zero_bit(ms->sentmap, last + 1,
> + found_set + 1);
> + /*
> + * If the 0 isn't at the start of a host page, then the
> + * run of 1's doesn't finish at the end of a host page
> + * and we need to discard.
> + */
> + if (found_zero % host_ratio) {
> + do_discard = true;
> + discard_start_addr = found_zero - (found_zero % host_ratio);
> + /*
> + * This host page has gone, the next loop iteration starts
> + * from the next page with a 1 bit
> + */
> + search_start = discard_start_addr + host_ratio;
> + } else {
> + /*
> + * No discards on this iteration, next loop starts from
> + * next 1 bit
> + */
> + search_start = found_zero + 1;
change for this
found_set = found_zero + 1;
> + }
> + }
> + /* Find the next 1 for the next iteration */
> + found_set = find_next_bit(ms->sentmap, last + 1, search_start);
and move previous line to:
> + if (do_discard) {
> + unsigned long page;
> +
> + /* Tell the destination to discard this page */
> + postcopy_discard_send_range(ms, pds, discard_start_addr,
> + discard_start_addr + host_ratio - 1);
> + /* Clean up the bitmap */
> + for (page = discard_start_addr;
> + page < discard_start_addr + host_ratio; page++) {
> + /* All pages in this host page are now not sent */
> + clear_bit(page, ms->sentmap);
> +
> + /*
> + * Remark them as dirty, updating the count for any pages
> + * that weren't previously dirty.
> + */
> + migration_dirty_pages += !test_and_set_bit(page,
> + migration_bitmap);
> + }
to here
/* Find the next 1 for the next iteration */
found_set = find_next_bit(ms->sentmap, last + 1, search_start);
}
> + }
?
> +
> + /*
> + * Second pass: Ensure that all partially dirty host pages are made
> + * fully dirty.
> + */
> + found_set = find_next_bit(migration_bitmap, last + 1, first);
> + while (found_set <= last) {
> + bool do_dirty = false;
> + unsigned long dirty_start_addr;
> + /*
> + * If the start of this run of pages is in the middle of a host
> + * page, then we need to mark the whole of this host page dirty
> + */
> + if (found_set % host_ratio) {
> + do_dirty = true;
> + found_set -= found_set % host_ratio;
> + dirty_start_addr = found_set;
> + search_start = found_set + host_ratio;
> + } else {
> + /* Find the end of this run */
> + unsigned long found_zero;
> + found_zero = find_next_zero_bit(migration_bitmap, last + 1,
> + found_set + 1);
> + /*
> + * If the 0 isn't at the start of a host page, then the
> + * run of 1's doesn't finish at the end of a host page
> + * and we need to discard.
> + */
> + if (found_zero % host_ratio) {
> + do_dirty = true;
> + dirty_start_addr = found_zero - (found_zero % host_ratio);
> + /*
> + * This host page has gone, the next loop iteration starts
> + * from the next page with a 1 bit
> + */
> + search_start = dirty_start_addr + host_ratio;
> + } else {
> + /*
> + * No discards on this iteration, next loop starts from
> + * next 1 bit
> + */
> + search_start = found_zero + 1;
> + }
> + }
> +
> + /* Find the next 1 for the next iteration */
> + found_set = find_next_bit(migration_bitmap, last + 1, search_start);
> +
> + if (do_dirty) {
> + unsigned long page;
> +
> + if (test_bit(dirty_start_addr, ms->sentmap)) {
> + /*
> + * If the page being redirtied is marked as sent, then it
> + * must have been fully sent (otherwise it would have been
> + * discarded by the previous pass.)
> + * Discard it now.
> + */
> + postcopy_discard_send_range(ms, pds, dirty_start_addr,
> + dirty_start_addr +
> + host_ratio - 1);
> + }
> +
> + /* Clean up the bitmap */
> + for (page = dirty_start_addr;
> + page < dirty_start_addr + host_ratio; page++) {
> +
> + /* Clear the sentmap bits for the discard case above */
> + clear_bit(page, ms->sentmap);
> +
> + /*
> + * Mark them as dirty, updating the count for any pages
> + * that weren't previously dirty.
> + */
> + migration_dirty_pages += !test_and_set_bit(page,
> + migration_bitmap);
> + }
> + }
> + }
This is exactly the same code as the previous half of the function;
you just need to factor it out into a function?
walk_bitmap_host_page_chunks or whatever, and pass the two bits that
change: the bitmap, and what to do with the ranges that are not there?
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 46/54] postcopy: Check order of received target pages
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 46/54] postcopy: Check order of received target pages Dr. David Alan Gilbert (git)
@ 2015-10-28 11:26 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-28 11:26 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Ensure that target pages received within a host page are in order.
> This shouldn't trigger, but in the cases where the sender goes
> wrong and sends stuff out of order it produces a corruption that's
> really nasty to debug.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 47/54] Round up RAMBlock sizes to host page sizes
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 47/54] Round up RAMBlock sizes to host page sizes Dr. David Alan Gilbert (git)
@ 2015-10-28 11:28 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-28 11:28 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> RAMBlocks that are not a multiple of host pages in length
> cause problems for postcopy (I've seen an ACPI table on aarch64
> be 5k in length - i.e. 5x target-page), so round RAMBlock sizes
> up to a host-page.
>
> This potentially breaks migration compatibility due to changes
> in RAMBlock sizes; however:
> 1) x86 and s390 I think always have host=target page size
> 2) When I've tried on Power the block sizes already seem aligned.
> 3) I don't think there's anything else that maintains per-version
> machine-types for compatibility.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
We had this problem in the past when we moved the machines to be
megabyte-rounded sizes; some machines were not. But in this particular
case, I will claim that having a size that is _not_ a multiple of the
host page size is just asking for trouble.
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 34/54] postcopy: ram_enable_notify to switch on userfault
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 34/54] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
@ 2015-10-28 11:40 ` Amit Shah
0 siblings, 0 replies; 119+ messages in thread
From: Amit Shah @ 2015-10-28 11:40 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
On (Tue) 29 Sep 2015 [09:37:58], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Mark the area of RAM as 'userfault'
> Start up a fault-thread to handle any userfaults we might receive
> from it (to be filled in later)
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
(I'd also reviewed v7)
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 41/54] postcopy_ram.c: place_page and helpers
2015-10-28 10:28 ` Juan Quintela
@ 2015-10-28 13:11 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-28 13:11 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > postcopy_place_page (etc) provide a way for postcopy to place a page
> > into guests memory atomically (using the copy ioctl on the ufd).
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> > +int postcopy_place_page_zero(MigrationIncomingState *mis, void *host)
> > +{
> > + struct uffdio_zeropage zero_struct;
> > +
> > + zero_struct.range.start = (uint64_t)(uintptr_t)host;
> > + zero_struct.range.len = getpagesize();
> > + zero_struct.mode = 0;
> > +
> > + if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
> > + int e = errno;
> > + error_report("%s: %s zero host: %p",
> > + __func__, strerror(e), host);
> > +
> > + return -e;
> > + }
> > +
> > + trace_postcopy_place_page_zero(host);
> > + return 0;
> > +}
>
> Would this be faster than normal precopy way of just copying a zero page?
For postcopy we have to do an ioctl anyway (to release any paused tasks
waiting on the page), and we can't just write to the page because it's not
mapped yet. We could do a UFFDIO_COPY of a zero page but that would
take a copy; here the kernel maps the zero page and releases the paused task
without needing a zero page to copy from.
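For comparison, the UFFDIO_COPY route would look roughly like this (an
illustrative sketch assuming a page-sized buffer of zeros, 'zero_buf'; not
code from the series):

    struct uffdio_copy copy_struct;

    copy_struct.dst = (uint64_t)(uintptr_t)host;
    copy_struct.src = (uint64_t)(uintptr_t)zero_buf;
    copy_struct.len = getpagesize();
    copy_struct.mode = 0;
    if (ioctl(mis->userfault_fd, UFFDIO_COPY, &copy_struct)) {
        /* handle errno much as above */
    }

That still atomically maps the page and wakes the faulting task, but pays for
copying a page of zeros, which UFFDIO_ZEROPAGE avoids.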
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 51/54] Postcopy: Mark nohugepage before discard
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 51/54] Postcopy: Mark nohugepage before discard Dr. David Alan Gilbert (git)
@ 2015-10-28 14:02 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-10-28 14:02 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Prior to servicing userfault requests we must ensure we've not got
> huge pages in the area that might include non-transferred memory,
> since a hugepage could incorrectly mark the whole huge page as present.
>
> We mark the area as non-huge page (nhp) just before we perform
> discards; the discard code now tells us to discard any areas
> that haven't been sent (as well as any that are redirtied);
> any already formed transparent-huge-pages get fragmented
> by this discard process if they contain any discards.
>
> Transparent huge pages that have been entirely transferred
> and don't contain any discards are not broken by this mechanism;
> they stay as huge pages.
>
> By starting postcopy after a full precopy pass, many of the pages
> then stay as huge pages; this is important for maintaining performance
> after the end of the migration.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 37/54] Postcopy: End of iteration
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 37/54] Postcopy: End of iteration Dr. David Alan Gilbert (git)
2015-10-21 9:16 ` Juan Quintela
@ 2015-10-29 5:10 ` Amit Shah
1 sibling, 0 replies; 119+ messages in thread
From: Amit Shah @ 2015-10-29 5:10 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
On (Tue) 29 Sep 2015 [09:38:01], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The end of migration in postcopy is a bit different since some of
> the things normally done at the end of migration have already been
> done on the transition to postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 38/54] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 38/54] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
2015-10-21 11:12 ` Juan Quintela
@ 2015-10-29 5:17 ` Amit Shah
1 sibling, 0 replies; 119+ messages in thread
From: Amit Shah @ 2015-10-29 5:17 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
On (Tue) 29 Sep 2015 [09:38:02], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Add MIG_RP_MSG_REQ_PAGES command on Return path for the postcopy
> destination to request a page from the source.
>
> Two versions exist:
> MIG_RP_MSG_REQ_PAGES_ID that includes a RAMBlock name and start/len
> MIG_RP_MSG_REQ_PAGES that just has start/len for use with the same
> RAMBlock as a previous MIG_RP_MSG_REQ_PAGES_ID
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Amit
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 04/54] Move configuration section writing
2015-10-05 6:44 ` Amit Shah
@ 2015-10-30 12:47 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-30 12:47 UTC (permalink / raw)
To: Amit Shah
Cc: aarcange, quintela, liang.z.li, qemu-devel, luis, bharata,
pbonzini
* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 29 Sep 2015 [09:37:28], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > The vmstate_configuration is currently written
> > in 'qemu_savevm_state_begin', move it to
> > 'qemu_savevm_state_header' since it's got a hard
> > requirement that it must be the 1st thing after
> > the header.
> > (In postcopy some 'command' sections get sent
> > early before the saving of the main sections
> > and hence before qemu_savevm_state_begin).
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
>
> The function name 'savevm_state_header()' isn't accurate anymore. Not
> serious for this series.
Well, it does still write the header; but if you have a simple
better name, I'd be happy to change it.
Dave
>
> Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 42/54] Postcopy: Use helpers to map pages during migration
2015-10-28 10:58 ` Juan Quintela
@ 2015-10-30 12:59 ` Dr. David Alan Gilbert
2015-10-30 16:35 ` Dr. David Alan Gilbert
1 sibling, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-30 12:59 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > In postcopy, the destination guest is running at the same time
> > as it's receiving pages; as we receive new pages we must put
> > them into the guest's address space atomically to avoid a running
> > CPU accessing a partially written page.
> >
> > Use the helpers in postcopy-ram.c to map these pages.
> >
> > qemu_get_buffer_in_place is used to avoid a copy out of qemu_file
> > in the case that postcopy is going to do a copy anyway.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> > migration/ram.c | 128 +++++++++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 103 insertions(+), 25 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 487e838..6d9cfb5 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1848,7 +1848,17 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
> > /* Must be called from within a rcu critical section.
> > * Returns a pointer from within the RCU-protected ram_list.
> > */
> > +/*
> > + * Read a RAMBlock ID from the stream f, find the host address of the
> > + * start of that block and add on 'offset'
> > + *
> > + * f: Stream to read from
> > + * mis: MigrationIncomingState
> > + * offset: Offset within the block
> > + * flags: Page flags (mostly to see if it's a continuation of previous block)
> > + */
> > static inline void *host_from_stream_offset(QEMUFile *f,
> > + MigrationIncomingState *mis,
> > ram_addr_t offset,
> > int flags)
> > {
>
>
> Uh, oh, we change the prototype of host_from_stream_offset() but not the
> function itself? Strange, no?
Ah, that's a straggler from an old version of the patches that needed mis; gone.
<snip - I'll take the other refactoring in a different reply>
> Hahahaha, just change the if or the variable name.
>
> having a
>
> if (!cond) {
> f1();
> } else {
> f2();
> }
>
> makes no sense, better to have
>
> if (cond) {
> f2()
> } else {
> f1()
> }
> no?
Done.
Dave
>
>
>
> The patch itself is ok.
>
> Thanks, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 42/54] Postcopy: Use helpers to map pages during migration
2015-10-28 10:58 ` Juan Quintela
2015-10-30 12:59 ` Dr. David Alan Gilbert
@ 2015-10-30 16:35 ` Dr. David Alan Gilbert
1 sibling, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-30 16:35 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > In postcopy, the destination guest is running at the same time
> > as it's receiving pages; as we receive new pages we must put
> > them into the guest's address space atomically to avoid a running
> > CPU accessing a partially written page.
> >
> > Use the helpers in postcopy-ram.c to map these pages.
> >
> > qemu_get_buffer_in_place is used to avoid a copy out of qemu_file
> > in the case that postcopy is going to do a copy anyway.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> > migration/ram.c | 128 +++++++++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 103 insertions(+), 25 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > + postcopy_place_needed = false;
> > + if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
> > + RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
> > + host = host_from_stream_offset(f, mis, addr, flags);
> > + if (!host) {
> > + error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> > + ret = -EINVAL;
> > + break;
> > + }
> > + page_buffer = host;
>
> You can move this bit of code here in a different patch, makes review easier.
> all_zero can also be on that patch.
Done; this is now 'ram_load: Factor out host_from_stream_offset call and check'
>
> You are reusing ram_load, but have lots and lots of
>
> if (postcopy_running) {
>
> } else {
>
> }
>
> I think that it would be easier to just have:
>
> if (postcopy_running) {
> ram_load_postcopy()
> } else {
> ram_load_precopy{}
> }
>
> You duplicate a bit of code, but remove lots of ifs from the equation,
> not sure which one is really easier. I just hate bits like the
> following one.
>
> > @@ -2062,32 +2123,36 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> > }
> > break;
> > case RAM_SAVE_FLAG_COMPRESS:
> > ch = qemu_get_byte(f);
> > - ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> > + if (!postcopy_running) {
> > + ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
> > + } else {
> > + memset(page_buffer, ch, TARGET_PAGE_SIZE);
> > + if (ch) {
> > + all_zero = false;
> > + }
> > + }
>
Yeah, I've split that out now into ram_load_postcopy (called from just
before the main loop in ram_load); as you say it is a bit bigger,
but clearer.
> > + if (postcopy_running) {
>
>
> As discussed on IRC, I still think that having a RAM_SAVE_HOST_PAGE would make
> everything much, much clearer and easier, but I agree that it is not
> trivial with the current code.
(I've moved this comment down a bit in this reply).
Actually, now that the postcopy load code is in a separate routine, it might
be possible to reorder things a bit since we know all of these pages are
host-page-sized.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 29/54] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
2015-10-20 13:35 ` Juan Quintela
@ 2015-10-30 18:19 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-30 18:19 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > 'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
> >
> > 'migration_in_postcopy' is provided for other sections to know if
> > they're in postcopy.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
>
> Just wondering
>
>
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 5ee2c11..2ae5909 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -439,6 +439,7 @@ static bool migration_is_active(MigrationState *ms)
> > {
> > switch (ms->state) {
> > case MIGRATION_STATUS_ACTIVE:
> > + case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> > case MIGRATION_STATUS_SETUP:
> > return true;
> >
> > @@ -509,6 +510,39 @@ MigrationInfo *qmp_query_migrate(Error **errp)
> >
> > get_xbzrle_cache_stats(info);
> > break;
> > + case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> > + /* Mostly the same as active; TODO add some postcopy stats */
> > + info->has_status = true;
> > + info->has_total_time = true;
> > + info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
> > + - s->total_time;
> > + info->has_expected_downtime = true;
> > + info->expected_downtime = s->expected_downtime;
> > + info->has_setup_time = true;
> > + info->setup_time = s->setup_time;
> > +
> > + info->has_ram = true;
> > + info->ram = g_malloc0(sizeof(*info->ram));
> > + info->ram->transferred = ram_bytes_transferred();
> > + info->ram->remaining = ram_bytes_remaining();
> > + info->ram->total = ram_bytes_total();
> > + info->ram->duplicate = dup_mig_pages_transferred();
> > + info->ram->skipped = skipped_mig_pages_transferred();
> > + info->ram->normal = norm_mig_pages_transferred();
> > + info->ram->normal_bytes = norm_mig_bytes_transferred();
> > + info->ram->dirty_pages_rate = s->dirty_pages_rate;
> > + info->ram->mbps = s->mbps;
> > +
> > + if (blk_mig_active()) {
> > + info->has_disk = true;
> > + info->disk = g_malloc0(sizeof(*info->disk));
> > + info->disk->transferred = blk_mig_bytes_transferred();
> > + info->disk->remaining = blk_mig_bytes_remaining();
> > + info->disk->total = blk_mig_bytes_total();
> > + }
>
> Are we sure that disk migration works with postcopy? I would expect no ...
Well, the theory goes that it should; although probably not a good
idea.
What should happen is that since the block migration will reply
with a non-postcopiable pending size > 0, it will stay in precopy
mode until the block migration is done; once that happens
(and the block code returns a pending_size < bandwidth*downtime)
it should flip into postcopy.
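Roughly, the decision point in the migration thread looks like this (a
simplified sketch, not the exact code in the series):

    uint64_t pend_nonpost = 0, pend_post = 0;

    qemu_savevm_state_pending(s->file, max_size, &pend_nonpost, &pend_post);
    pending_size = pend_nonpost + pend_post;
    if (pending_size && pending_size >= max_size) {
        /* Not ready to complete; flip to postcopy only once everything
         * that's still over the budget is postcopiable, i.e. the block
         * migration has finished its non-postcopiable data. */
        if (migrate_postcopy_ram() && s->start_postcopy &&
            pend_nonpost <= max_size) {
            /* ... enter postcopy (postcopy_start) here ... */
        }
        /* otherwise keep iterating in precopy */
    }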
It's a bad idea because:
a) While we're still in precopy mode the RAM migration will also be
trying to transfer pages, which wastes bandwidth fighting with the
block IO (we could probably quiesce that once postcopy is turned on).
b) I'm a bit worried about how much RAM the block migration might try
to use when it does its save_complete into the postcopy temporary
buffer.
It should be possible to implement postcopy for block as well,
running concurrently with the postcopying of RAM; you'd have
to make sure you scheduled the requests that came back to the source so
that slow disk reads didn't block RAM requests.
(Having said that, the test I just did of it didn't work; I'll
try to have a look at that.)
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 32/54] Postcopy: Maintain sentmap and calculate discard
2015-10-21 11:17 ` Juan Quintela
@ 2015-10-30 18:43 ` Dr. David Alan Gilbert
2015-11-02 17:31 ` Dr. David Alan Gilbert
` (2 subsequent siblings)
3 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-10-30 18:43 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Where postcopy is preceded by a period of precopy, the destination will
> > have received pages that may have been dirtied on the source after the
> > page was sent. The destination must throw these pages away before
> > starting its CPUs.
> >
> > Maintain a 'sentmap' of pages that have already been sent.
> > Calculate list of sent & dirty pages
> > Provide helpers on the destination side to discard these.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
>
>
> Hi
(I'm going to reply to this mail in a few separate mails as I get
to them)
> > /* Flag set once the migration has been asked to enter postcopy */
> > bool start_postcopy;
>
>
> This is from a previous patch, but ....
>
> Change the name of the variable or the comment? From the comment it
> should be "in_postcopy", no?
We have to be careful to differentiate between two separate things:
1) The user has issued 'migrate_start_postcopy'
- that sets this 'start_postcopy' flag
2) The non-postcopiable data has dropped below the limit and we've
now been able to take notice of 'start_postcopy' and actually
enter postcopy.
I think 'in_postcopy' would imply (2); while 'start_postcopy'
matches the command that's been issued.
> > +struct PostcopyDiscardState {
> > + const char *name;
>
> It is not obvious to me what 'name' means here. I assume it's the RAM block
> name; change it to block_name or ramblock?
Now ramblock_name.
>
> > + * returns: 0 on success.
> > + */
> > +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
> > + size_t length)
> > +{
> > + trace_postcopy_ram_discard_range(start, length);
> > + if (madvise(start, length, MADV_DONTNEED)) {
> > + error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));
> > + return -1;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > #else
> > /* No target OS support, stubs just fail */
> > bool postcopy_ram_supported_by_host(void)
> > @@ -153,5 +192,95 @@ bool postcopy_ram_supported_by_host(void)
> > return false;
> > }
> >
> > +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
> > + size_t length)
> > +{
> > + assert(0);
>
> I will assume that just returning -1 would work here.
>
> But yes, I understand that this code shouldn't be reached ...
Yes, it really shouldn't happen if the previous code that says
postcopy isn't supported has been obeyed; I'm happy to change
it if you want.
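If you prefer, the stub could just fail cleanly instead, along these lines
(illustrative only, not the code in the series):

    int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
                                   size_t length)
    {
        error_report("postcopy_ram_discard_range: no OS support");
        return -1;
    }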
> > +}
> > #endif
> >
> > +/* ------------------------------------------------------------------------- */
> > +
> > +/**
> > + * postcopy_discard_send_init: Called at the start of each RAMBlock before
> > + * asking to discard individual ranges.
> > + *
> > + * @ms: The current migration state.
> > + * @offset: the bitmap offset of the named RAMBlock in the migration
> > + * bitmap.
> > + * @name: RAMBlock that discards will operate on.
> > + *
> > + * returns: a new PDS.
> > + */
> > +PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
> > + unsigned long offset,
> > + const char *name)
> > +{
> > + PostcopyDiscardState *res = g_try_malloc(sizeof(PostcopyDiscardState));
>
> Why are we using g_try_malloc() here instead of g_malloc()? Or even
> g_malloc0()?
>
> Specially when we don't check if res is NULL on return. Please change.
Eek yes; I've gone with malloc0.
>
>
> > +
> > + if (res) {
> > + res->name = name;
> > + res->cur_entry = 0;
> > + res->nsentwords = 0;
> > + res->nsentcmds = 0;
>
> With the zero variant, these three can be removed.
Done.
>
> > + res->offset = offset;
> > + }
> > +
> > + return res;
> > +}
>
> > -/* Called with rcu_read_lock() to protect migration_bitmap */
> > +/* Called with rcu_read_lock() to protect migration_bitmap
> > + * mr: The region to search for dirty pages in
>
> Haha, you forgot to update the comment when you moved the function to
> use ram blocks O:-)
Oops, fixed :-)
(Rest of the patch another time)
Dave
> > @@ -662,6 +672,24 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
> > }
> >
> > /**
> > + * ram_find_block_by_id: Find a ramblock by name.
> > + *
> > + * Returns: The RAMBlock with matching ID, or NULL.
> > + */
> > +static RAMBlock *ram_find_block_by_id(const char *id)
> > +{
> > + RAMBlock *block;
> > +
> > + QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> > + if (!strcmp(id, block->idstr)) {
> > + return block;
> > + }
> > + }
> > +
> > + return NULL;
> > +}
>
> We don't have this function already.....
>
> Once it's here, could we split it out into its own patch and use it in ram_load?
>
>
> QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> if (!strncmp(id, block->idstr, sizeof(id))) {
> if (length != block->used_length) {
> Error *local_err = NULL;
>
> ret = qemu_ram_resize(block->offset, length, &local_err);
> if (local_err) {
> error_report_err(local_err);
> }
> }
> ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
> block->idstr);
> break;
> }
> }
>
> if (!block) {
> error_report("Unknown ramblock \"%s\", cannot "
> "accept migration", id);
> ret = -EINVAL;
> }
>
>
> We could also use it in:
>
> host_from_stream_offset
>
>
> > +/* **** functions for postcopy ***** */
> > +
> > +/*
> > + * Callback from postcopy_each_ram_send_discard for each RAMBlock
> > + * start,end: Indexes into the bitmap for the first and last bit
> > + * representing the named block
> > + */
> > +static int postcopy_send_discard_bm_ram(MigrationState *ms,
> > + PostcopyDiscardState *pds,
> > + unsigned long start, unsigned long end)
> > +{
> > + unsigned long current;
> > +
> > + for (current = start; current <= end; ) {
> > + unsigned long set = find_next_bit(ms->sentmap, end + 1, current);
> > +
> > + if (set <= end) {
> > + unsigned long zero = find_next_zero_bit(ms->sentmap,
> > + end + 1, set + 1);
> > +
> > + if (zero > end) {
> > + zero = end + 1;
> > + }
> > + postcopy_discard_send_range(ms, pds, set, zero - 1);
> > + current = zero + 1;
> > + } else {
> > + current = set;
> > + }
> > + }
>
> I think I understood the logic here at the end....
>
> But if we change the meaning of postcopy_discard_send_range() from
> (begin, end) to (begin, length), I think everything becomes clearer, no?
>
> if (set <= end) {
>     unsigned long zero = find_next_zero_bit(ms->sentmap,
>                                             end + 1, set + 1);
>     unsigned long length;
>
>     if (zero > end) {
>         length = end - set;
>     } else {
>         length = zero - 1 - set;
>         current = zero + 1;
>     }
>     postcopy_discard_send_range(ms, pds, set, length);
> } else {
>     current = set;
> }
> }
>
> I would claim that if we call one 'zero', the other should be called 'one'.
> Or change to set/unset, but that is just me. Yes, I haven't tested it, and
> it is possible that there is an off-by-one somewhere...
>
>
> Looking at postcopy_each_ram_send_discard, I even think that it would be
> a good idea to pass length to this function.
>
> > +/*
> > + * Transmit the set of pages to be discarded after precopy to the target
> > + * these are pages that:
> > + * a) Have been previously transmitted but are now dirty again
> > + * b) Pages that have never been transmitted, this ensures that
> > + * any pages on the destination that have been mapped by background
> > + * tasks get discarded (transparent huge pages is the specific concern)
> > + * Hopefully this is pretty sparse
> > + */
> > +int ram_postcopy_send_discard_bitmap(MigrationState *ms)
> > +{
> > + int ret;
> > +
> > + rcu_read_lock();
> > +
> > + /* This should be our last sync, the src is now paused */
> > + migration_bitmap_sync();
> > +
> > + /*
> > + * Update the sentmap to be sentmap = ~sentmap | dirty
> > + */
> > + bitmap_complement(ms->sentmap, ms->sentmap,
> > + last_ram_offset() >> TARGET_PAGE_BITS);
> > +
> > + bitmap_or(ms->sentmap, ms->sentmap, migration_bitmap,
> > + last_ram_offset() >> TARGET_PAGE_BITS);
>
> These bitmaps are really big; I don't know how long these operations would
> take here, but I think that we can avoid (at least) the bitmap_complement.
> We can change the bitmap name to notsentbitmap, init it to all ones and
> clear a bit each time that we send a page, no?
>
> We can also do the bitmap_or() at migration_sync_bitmap() time, at that
> point, we shouldn't be on the critical path?
>
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 32/54] Postcopy: Maintain sentmap and calculate discard
2015-10-21 11:17 ` Juan Quintela
2015-10-30 18:43 ` Dr. David Alan Gilbert
@ 2015-11-02 17:31 ` Dr. David Alan Gilbert
2015-11-02 18:19 ` Dr. David Alan Gilbert
2015-11-02 20:14 ` Dr. David Alan Gilbert
3 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-02 17:31 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> > +/*
> > + * Transmit the set of pages to be discarded after precopy to the target
> > + * these are pages that:
> > + * a) Have been previously transmitted but are now dirty again
> > + * b) Pages that have never been transmitted, this ensures that
> > + * any pages on the destination that have been mapped by background
> > + * tasks get discarded (transparent huge pages is the specific concern)
> > + * Hopefully this is pretty sparse
> > + */
> > +int ram_postcopy_send_discard_bitmap(MigrationState *ms)
> > +{
> > + int ret;
> > +
> > + rcu_read_lock();
> > +
> > + /* This should be our last sync, the src is now paused */
> > + migration_bitmap_sync();
> > +
> > + /*
> > + * Update the sentmap to be sentmap = ~sentmap | dirty
> > + */
> > + bitmap_complement(ms->sentmap, ms->sentmap,
> > + last_ram_offset() >> TARGET_PAGE_BITS);
> > +
> > + bitmap_or(ms->sentmap, ms->sentmap, migration_bitmap,
> > + last_ram_offset() >> TARGET_PAGE_BITS);
>
> These bitmaps are really big; I don't know how long these operations would
> take here, but I think that we can avoid (at least) the bitmap_complement.
> We can change the bitmap name to notsentbitmap, init it to all ones and
> clear a bit each time that we send a page, no?
Done, it's now 'unsentmap' - although I suspect the complement step is
probably one of the simpler steps in the process; I'm not sure it's a vast
saving.
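Roughly, the unsentmap idea looks like this (a sketch of the shape rather
than the exact code):

    /* at setup time: every page starts off marked as unsent */
    bitmap_fill(ms->unsentmap, last_ram_offset() >> TARGET_PAGE_BITS);
    ...
    /* whenever a page is actually transmitted */
    clear_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->unsentmap);

so the discard calculation only needs the bitmap_or() with the dirty bitmap.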
> We can also do the bitmap_or() at migration_sync_bitmap() time, at that
> point, we shouldn't be on the critical path?
Given that we're doing the bitmap_sync immediately before the OR, I don't
understand that line; are you talking about a modified migration_bitmap_sync?
Dave
>
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 32/54] Postcopy: Maintain sentmap and calculate discard
2015-10-21 11:17 ` Juan Quintela
2015-10-30 18:43 ` Dr. David Alan Gilbert
2015-11-02 17:31 ` Dr. David Alan Gilbert
@ 2015-11-02 18:19 ` Dr. David Alan Gilbert
2015-11-02 20:14 ` Dr. David Alan Gilbert
3 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-02 18:19 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> > @@ -662,6 +672,24 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
> > }
> >
> > /**
> > + * ram_find_block_by_id: Find a ramblock by name.
> > + *
> > + * Returns: The RAMBlock with matching ID, or NULL.
> > + */
> > +static RAMBlock *ram_find_block_by_id(const char *id)
> > +{
> > + RAMBlock *block;
> > +
> > + QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> > + if (!strcmp(id, block->idstr)) {
> > + return block;
> > + }
> > + }
> > +
> > + return NULL;
> > +}
>
> We don't have this function already.....
>
> Once it's here, could we split it out into its own patch and use it in ram_load?
>
>
> QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> if (!strncmp(id, block->idstr, sizeof(id))) {
> if (length != block->used_length) {
> Error *local_err = NULL;
>
> ret = qemu_ram_resize(block->offset, length, &local_err);
> if (local_err) {
> error_report_err(local_err);
> }
> }
> ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
> block->idstr);
> break;
> }
> }
>
> if (!block) {
> error_report("Unknown ramblock \"%s\", cannot "
> "accept migration", id);
> ret = -EINVAL;
> }
>
>
> We could also use it in:
>
> host_from_stream_offset
Done; replaced both uses and it's now called 'qemu_ram_block_by_name'
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 32/54] Postcopy: Maintain sentmap and calculate discard
2015-10-21 11:17 ` Juan Quintela
` (2 preceding siblings ...)
2015-11-02 18:19 ` Dr. David Alan Gilbert
@ 2015-11-02 20:14 ` Dr. David Alan Gilbert
3 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-02 20:14 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> > +/* **** functions for postcopy ***** */
> > +
> > +/*
> > + * Callback from postcopy_each_ram_send_discard for each RAMBlock
> > + * start,end: Indexes into the bitmap for the first and last bit
> > + * representing the named block
> > + */
> > +static int postcopy_send_discard_bm_ram(MigrationState *ms,
> > + PostcopyDiscardState *pds,
> > + unsigned long start, unsigned long end)
> > +{
> > + unsigned long current;
> > +
> > + for (current = start; current <= end; ) {
> > + unsigned long set = find_next_bit(ms->sentmap, end + 1, current);
> > +
> > + if (set <= end) {
> > + unsigned long zero = find_next_zero_bit(ms->sentmap,
> > + end + 1, set + 1);
> > +
> > + if (zero > end) {
> > + zero = end + 1;
> > + }
> > + postcopy_discard_send_range(ms, pds, set, zero - 1);
> > + current = zero + 1;
> > + } else {
> > + current = set;
> > + }
> > + }
>
> I think I understood the logic here at the end....
>
> But if we change the meaning of postcopy_discard_send_range() from
> (begin, end) to (begin, length), I think everything becomes clearer, no?
>
> if (set <= end) {
>     unsigned long zero = find_next_zero_bit(ms->sentmap,
>                                             end + 1, set + 1);
>     unsigned long length;
>
>     if (zero > end) {
>         length = end - set;
>     } else {
>         length = zero - 1 - set;
>         current = zero + 1;
>     }
>     postcopy_discard_send_range(ms, pds, set, length);
> } else {
>     current = set;
> }
> }
>
> I would claim that if we call one 'zero', the other should be called 'one'.
> Or change to set/unset, but that is just me. Yes, I haven't tested it, and
> it is possible that there is an off-by-one somewhere...
>
> Looking at postcopy_each_ram_send_discard, I even think that it would be
> a good idea to pass length to this function.
OK, so I've ended up with (build tested only so far):
/*
 * Callback from postcopy_each_ram_send_discard for each RAMBlock
 * Note: At this point the 'unsentmap' is the processed bitmap combined
 *       with the dirtymap; so a '1' means it's either dirty or unsent.
 * start,length: Indexes into the bitmap for the first bit
 *               representing the named block and length in target-pages
 */
static int postcopy_send_discard_bm_ram(MigrationState *ms,
                                        PostcopyDiscardState *pds,
                                        unsigned long start,
                                        unsigned long length)
{
    unsigned long end = start + length; /* one after the end */
    unsigned long current;

    for (current = start; current < end; ) {
        unsigned long one = find_next_bit(ms->unsentmap, end, current);

        if (one <= end) {
            unsigned long zero = find_next_zero_bit(ms->unsentmap,
                                                    end, one + 1);
            unsigned long discard_length;

            if (zero >= end) {
                discard_length = end - one;
            } else {
                discard_length = zero - one;
            }
            postcopy_discard_send_range(ms, pds, one, discard_length);
            current = one + discard_length;
        } else {
            current = one;
        }
    }

    return 0;
}
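A quick worked example of the arithmetic: with start=0 and length=8, and
unsentmap bits 1-2 and 5-7 set, the first iteration finds one=1 and zero=3,
so it calls postcopy_discard_send_range(ms, pds, 1, 2) and advances current
to 3; the second finds one=5 and zero=end, so it sends (5, 3) and the loop
terminates.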
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 119+ messages in thread
* Re: [Qemu-devel] [PATCH v8 40/54] Page request: Consume pages off the post-copy queue
2015-10-26 16:32 ` Juan Quintela
@ 2015-11-03 11:52 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-03 11:52 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > When transmitting RAM pages, consume pages that have been queued by
> > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
> >
> > Note:
> > a) After a queued page the linear walk carries on from after the
> > unqueued page; there is a reasonable chance that the destination
> > was about to ask for other nearby pages anyway.
> >
> > b) We have to be careful of any assumptions that the page walking
> > code makes, in particular it does some short cuts on its first linear
> > walk that break as soon as we do a queued page.
> >
> > c) We have to be careful to not break up host-page size chunks, since
> > this makes it harder to place the pages on the destination.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> > migration/ram.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++---------
> > trace-events | 2 +
> > 2 files changed, 168 insertions(+), 29 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 5771983..487e838 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -516,9 +516,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
> > * Returns: byte offset within memory region of the start of a dirty page
> > */
> > static inline
> > -ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb,
> > - ram_addr_t start,
> > - ram_addr_t *ram_addr_abs)
> > +ram_addr_t migration_bitmap_find_dirty(RAMBlock *rb,
> > + ram_addr_t start,
> > + ram_addr_t *ram_addr_abs)
> > {
> > unsigned long base = rb->offset >> TARGET_PAGE_BITS;
> > unsigned long nr = base + (start >> TARGET_PAGE_BITS);
> > @@ -535,15 +535,24 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(RAMBlock *rb,
> > next = find_next_bit(bitmap, size, nr);
> > }
> >
> > - if (next < size) {
> > - clear_bit(next, bitmap);
> > - migration_dirty_pages--;
> > - }
> > *ram_addr_abs = next << TARGET_PAGE_BITS;
> > return (next - base) << TARGET_PAGE_BITS;
> > }
> >
> > -/* Called with rcu_read_lock() to protect migration_bitmap */
> > +static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
> > +{
> > + bool ret;
> > + int nr = addr >> TARGET_PAGE_BITS;
> > + unsigned long *bitmap = atomic_rcu_read(&migration_bitmap);
> > +
> > + ret = test_and_clear_bit(nr, bitmap);
> > +
> > + if (ret) {
> > + migration_dirty_pages--;
> > + }
> > + return ret;
> > +}
> > +
> > static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
> > {
> > unsigned long *bitmap;
> > @@ -960,9 +969,8 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
> > static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
> > bool *again, ram_addr_t *ram_addr_abs)
> > {
> > - pss->offset = migration_bitmap_find_and_reset_dirty(pss->block,
> > - pss->offset,
> > - ram_addr_abs);
> > + pss->offset = migration_bitmap_find_dirty(pss->block, pss->offset,
> > + ram_addr_abs);
> > if (pss->complete_round && pss->block == last_seen_block &&
> > pss->offset >= last_offset) {
> > /*
> > @@ -1001,6 +1009,88 @@ static bool find_dirty_block(QEMUFile *f, PageSearchStatus *pss,
> > }
> > }
> >
> > +/*
> > + * Unqueue a page from the queue fed by postcopy page requests; skips pages
> > + * that are already sent (!dirty)
> > + *
> > + * Returns: true if a queued page is found
> > + * ms: MigrationState in
> > + * pss: PageSearchStatus structure updated with found block/offset
> > + * ram_addr_abs: global offset in the dirty/sent bitmaps
> > + */
> > +static bool get_queued_page(MigrationState *ms, PageSearchStatus *pss,
> > + ram_addr_t *ram_addr_abs)
> > +{
> > + RAMBlock *block;
> > + ram_addr_t offset;
> > + bool dirty;
> > +
> > + do {
> > + block = NULL;
> > + qemu_mutex_lock(&ms->src_page_req_mutex);
> > + if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
> > + struct MigrationSrcPageRequest *entry =
> > + QSIMPLEQ_FIRST(&ms->src_page_requests);
> > + block = entry->rb;
> > + offset = entry->offset;
> > + *ram_addr_abs = (entry->offset + entry->rb->offset) &
> > + TARGET_PAGE_MASK;
> > +
> > + if (entry->len > TARGET_PAGE_SIZE) {
> > + entry->len -= TARGET_PAGE_SIZE;
> > + entry->offset += TARGET_PAGE_SIZE;
> > + } else {
> > + memory_region_unref(block->mr);
> > + QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> > + g_free(entry);
> > + }
> > + }
> > + qemu_mutex_unlock(&ms->src_page_req_mutex);
>
> Can we split this chunk out with a name like:
>
> it_is_complicated_to_get_the_first_queued_page(&ms, &block, &offset,
> ram_addr_abs) or something like that?
>
> Yes, we can improve naming here.
Done; we still have get_queued_page, and it calls 'unqueue_page', which
does the first half of that.
> > +
> > + /*
> > + * We're sending this page, and since it's postcopy nothing else
> > + * will dirty it, and we must make sure it doesn't get sent again
> > + * even if this queue request was received after the background
> > + * search already sent it.
> > + */
> > + if (block) {
> > + dirty = test_bit(*ram_addr_abs >> TARGET_PAGE_BITS,
> > + migration_bitmap);
>
>
> You need to do the atomic_rcu_read(&migration_bitmap) dance, no?
Done.
> Why don't you do a test_and_clear_bit() here? Then you don't have to
> change migration_bitmap_find_and_reset_dirty().
Because it gets messy with the host pages; we're only 'finding' the first
target-page in a host page, and leaving the host-page code to do the work
on the whole of the host page, so I want to leave the dirty bits for it
to deal with.
> All our migration code works with ram addresses, but we basically need
> page numbers everywhere. I am not sure if things will get clearer or more
> complicated if we changed the conventions to use page_number instead of
> ram_addr_abs. But this one is completely independent of this patch.
Yes; it's VERY confusing.
> > + if (!dirty) {
> > + trace_get_queued_page_not_dirty(
> > + block->idstr, (uint64_t)offset,
> > + (uint64_t)*ram_addr_abs,
> > + test_bit(*ram_addr_abs >> TARGET_PAGE_BITS, ms->sentmap));
> > + } else {
> > + trace_get_queued_page(block->idstr,
> > + (uint64_t)offset,
> > + (uint64_t)*ram_addr_abs);
> > + }
> > + }
> > +
> > + } while (block && !dirty);
> > +
> > + if (block) {
> > + /*
> > + * As soon as we start servicing pages out of order, then we have
> > + * to kill the bulk stage, since the bulk stage assumes
> > + * in (migration_bitmap_find_and_reset_dirty) that every page is
> > + * dirty, that's no longer true.
> > + */
> > + ram_bulk_stage = false;
> > +
> > + /*
> > + * We want the background search to continue from the queued page
> > + * since the guest is likely to want other pages near to the page
> > + * it just requested.
> > + */
> > + pss->block = block;
> > + pss->offset = offset;
> > + }
> > +
> > + return !!block;
> > +}
> > +
> > /**
> > * flush_page_queue: Flush any remaining pages in the ram request queue
> > * it should be empty at the end anyway, but in error cases there may be
> > @@ -1087,6 +1177,57 @@ err:
> >
> >
> > /**
> > + * ram_save_host_page: Starting at *offset send pages up to the end
> > + * of the current host page. It's valid for the initial
> > + * offset to point into the middle of a host page,
> > + * in which case the remainder of the host page is sent.
> > + * Only dirty target pages are sent.
> > + *
> > + * Returns: Number of pages written.
> > + *
> > + * @f: QEMUFile where to send the data
> > + * @block: pointer to block that contains the page we want to send
> > + * @offset: offset inside the block for the page; updated to last target page
> > + * sent
> > + * @last_stage: if we are at the completion stage
> > + * @bytes_transferred: increase it with the number of transferred bytes
> > + */
> > +static int ram_save_host_page(MigrationState *ms, QEMUFile *f, RAMBlock* block,
> > + ram_addr_t *offset, bool last_stage,
> > + uint64_t *bytes_transferred,
> > + ram_addr_t dirty_ram_abs)
> > +{
> > + int tmppages, pages = 0;
> > + do {
> > + /* Check the page is dirty and if it is, send it */
> > + if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
> > + if (compression_switch && migrate_use_compression()) {
> > + tmppages = ram_save_compressed_page(f, block, *offset,
> > + last_stage,
> > + bytes_transferred);
> > + } else {
> > + tmppages = ram_save_page(f, block, *offset, last_stage,
> > + bytes_transferred);
> > + }
> > +
> > + if (tmppages < 0) {
> > + return tmppages;
> > + }
> > + if (ms->sentmap) {
> > + set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
> > + }
> > + pages += tmppages;
> > + }
> > + *offset += TARGET_PAGE_SIZE;
> > + dirty_ram_abs += TARGET_PAGE_SIZE;
> > + } while (*offset & (qemu_host_page_size - 1));
> > +
> > + /* The offset we leave with is the last one we looked at */
> > + *offset -= TARGET_PAGE_SIZE;
> > + return pages;
> > +}
>
> Split this function first to make changes easier to grasp?
>
> We are doing (at least) two quite different things here.
Done; ram_save_host_page now calls ram_save_target_page to do the meat
of it.
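i.e. the body of the if() above becomes something like (a sketch only; the
real split may differ slightly):

/*
 * Send a single dirty target page; returns the number of pages written
 * (0 if the page turned out not to be dirty), or negative on error.
 */
static int ram_save_target_page(MigrationState *ms, QEMUFile *f,
                                RAMBlock *block, ram_addr_t offset,
                                bool last_stage,
                                uint64_t *bytes_transferred,
                                ram_addr_t dirty_ram_abs)
{
    int res = 0;

    /* Check the page is dirty and if it is, send it */
    if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
        if (compression_switch && migrate_use_compression()) {
            res = ram_save_compressed_page(f, block, offset, last_stage,
                                           bytes_transferred);
        } else {
            res = ram_save_page(f, block, offset, last_stage,
                                bytes_transferred);
        }
        if (res < 0) {
            return res;
        }
        if (ms->sentmap) {
            set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
        }
    }

    return res;
}

leaving ram_save_host_page as just the loop over the target pages within
the host page.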
> > +
> > +/**
> > * ram_find_and_save_block: Finds a dirty page and sends it to f
> > *
> > * Called within an RCU critical section.
> > @@ -1097,12 +1238,16 @@ err:
> > * @f: QEMUFile where to send the data
> > * @last_stage: if we are at the completion stage
> > * @bytes_transferred: increase it with the number of transferred bytes
> > + *
> > + * On systems where host-page-size > target-page-size it will send all the
> > + * pages in a host page that are dirty.
> > */
> >
> > static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
> > uint64_t *bytes_transferred)
> > {
> > PageSearchStatus pss;
> > + MigrationState *ms = migrate_get_current();
> > int pages = 0;
> > bool again, found;
> > ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
> > @@ -1117,26 +1262,18 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
> > }
> >
> > do {
> > - found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);
> > + again = true;
> > + found = get_queued_page(ms, &pss, &dirty_ram_abs);
> >
> > - if (found) {
> > - if (compression_switch && migrate_use_compression()) {
> > - pages = ram_save_compressed_page(f, pss.block, pss.offset,
> > - last_stage,
> > - bytes_transferred);
> > - } else {
> > - pages = ram_save_page(f, pss.block, pss.offset, last_stage,
> > - bytes_transferred);
> > - }
> > + if (!found) {
> > + /* priority queue empty, so just search for something dirty */
> > + found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);
> > + }
> >
> > - /* if page is unmodified, continue to the next */
> > - if (pages > 0) {
> > - MigrationState *ms = migrate_get_current();
> > - last_sent_block = pss.block;
> > - if (ms->sentmap) {
> > - set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
> > - }
> > - }
> > + if (found) {
> > + pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
> > + last_stage, bytes_transferred,
> > + dirty_ram_abs);
> > }
> > } while (!pages && again);
>
>
> Using two loops here?
> This is the code after your changes:
>
>
> do {
> again = true;
> found = get_queued_page(ms, &pss, &dirty_ram_abs);
>
> if (!found) {
> /* priority queue empty, so just search for something dirty */
> found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);
> }
>
> if (found) {
> pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
> last_stage, bytes_transferred,
> dirty_ram_abs);
> }
> } while (!pages && again);
>
>
> while (get_queued_page(ms, &pss, &dirty_ram_abs)) {
> pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
> last_stage, bytes_transferred,
> dirty_ram_abs);
> }
>
>
>
> do {
> /* priority queue empty, so just search for something dirty */
> found = find_dirty_block(f, &pss, &again, &dirty_ram_abs);
>
> if (found) {
> pages = ram_save_host_page(ms, f, pss.block, &pss.offset,
> last_stage, bytes_transferred,
> dirty_ram_abs);
> }
> } while (!pages && again);
>
>
> We repeat the ram_save_host_page() call, but IMHO, it is easier to see
> what we are doing, and especially how we get out of the loop.
Note you've changed the behaviour there; my loop sends only one (host) page
(preferably from the queue, but failing that finds a dirty one); yours
sends *all* the queued pages and then tries to find a dirty one. That
might not be a bad change, and I don't think it will break anything higher
up, but it's not the same behaviour.
Dave
>
> Later, Juan.
>
>
> >
> > diff --git a/trace-events b/trace-events
> > index e40f00e..9e4206b 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -1244,6 +1244,8 @@ vmstate_subsection_load_good(const char *parent) "%s"
> > qemu_file_fclose(void) ""
> >
> > # migration/ram.c
> > +get_queued_page(const char *block_name, uint64_t tmp_offset, uint64_t ram_addr) "%s/%" PRIx64 " ram_addr=%" PRIx64
> > +get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, uint64_t ram_addr, int sent) "%s/%" PRIx64 " ram_addr=%" PRIx64 " (sent=%d)"
> > migration_bitmap_sync_start(void) ""
> > migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
> > migration_throttle(void) ""
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH v8 45/54] Host page!=target page: Cleanup bitmaps
2015-10-28 11:24 ` Juan Quintela
@ 2015-11-03 17:32 ` Dr. David Alan Gilbert
2015-11-03 18:30 ` Juan Quintela
0 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-03 17:32 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Prior to the start of postcopy, ensure that everything that will
> > be transferred later is a whole host-page in size.
> >
> > This is accomplished by discarding partially transferred host pages
> > and marking any that are partially dirty as fully dirty.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > + struct RAMBlock *block;
> > + unsigned int host_ratio = qemu_host_page_size / TARGET_PAGE_SIZE;
> > +
> > + if (qemu_host_page_size == TARGET_PAGE_SIZE) {
> > + /* Easy case - TPS==HPS - nothing to be done */
> > + return 0;
> > + }
> > +
> > + /* Easiest way to make sure we don't resume in the middle of a host-page */
> > + last_seen_block = NULL;
> > + last_sent_block = NULL;
> > + last_offset = 0;
>
>
> It should be enough with the last one, right? If you put
> last_seen/sent_block to NULL, you will restart from the beginning each
> time that you do a migration bitmap sync, penalizing the pages at the
> beginning of the cycle. Even better than:
>
> last_offset = 0
>
> would be doing a:
>
> last_offset &= HOST_PAGE_MASK
>
> or whatever the constant is, no?
These only happen once at the transition to postcopy; so I just
make sure by resetting the 3 associated variables; you're probably
right you could just do it by setting last_offset or last_seen_block;
but resetting all 3 gives you something that's obviously consistent.
> > +
> > + QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> > + unsigned long first = block->offset >> TARGET_PAGE_BITS;
> > + unsigned long len = block->used_length >> TARGET_PAGE_BITS;
> > + unsigned long last = first + (len - 1);
> > + unsigned long found_set;
> > + unsigned long search_start;
>
> next_search? search_next?
Based on your comments below, it's gone.
> > +
> > + PostcopyDiscardState *pds =
> > + postcopy_discard_send_init(ms, first, block->idstr);
> > +
> > + /* First pass: Discard all partially sent host pages */
> > + found_set = find_next_bit(ms->sentmap, last + 1, first);
> > + while (found_set <= last) {
> > + bool do_discard = false;
> > + unsigned long discard_start_addr;
> > + /*
> > + * If the start of this run of pages is in the middle of a host
> > + * page, then we need to discard this host page.
> > + */
> > + if (found_set % host_ratio) {
> > + do_discard = true;
> > + found_set -= found_set % host_ratio;
>
> please, create a PAGE_HOST_ALIGN() macro, or whatever you want to call it?
Note this is aligning by bit rather than page; so I think the
uses in here are the only case; however I've redone it as:
host_offset = found_set % host_ratio;
if (host_offset) {
do_discard = true;
found_set -= host_offset;
}
so no repetition.
> > + * next 1 bit
> > + */
> > + search_start = found_zero + 1;
>
> change for this
>
> found_set = found_zero + 1;
>
> > + }
> > + }
> > + /* Find the next 1 for the next iteration */
> > + found_set = find_next_bit(ms->sentmap, last + 1, search_start);
>
>
> and move previous line to:
>
> > + if (do_discard) {
> > + unsigned long page;
> > +
> > + /* Tell the destination to discard this page */
> > + postcopy_discard_send_range(ms, pds, discard_start_addr,
> > + discard_start_addr + host_ratio - 1);
> > + /* Clean up the bitmap */
> > + for (page = discard_start_addr;
> > + page < discard_start_addr + host_ratio; page++) {
> > + /* All pages in this host page are now not sent */
> > + clear_bit(page, ms->sentmap);
> > +
> > + /*
> > + * Remark them as dirty, updating the count for any pages
> > + * that weren't previously dirty.
> > + */
> > + migration_dirty_pages += !test_and_set_bit(page,
> > + migration_bitmap);
> > + }
>
>
> to here
> /* Find the next 1 for the next iteration */
> found_set = find_next_bit(ms->sentmap, last + 1, search_start);
> }
OK, see the version below; I think I've done it as suggested; I've got rid of the
'search_start' and just reused the found_set (that's now run_start).
> > + }
>
> ?
>
>
> > +
> > + /*
> > + * Second pass: Ensure that all partially dirty host pages are made
> > + * fully dirty.
> > + */
> > + found_set = find_next_bit(migration_bitmap, last + 1, first);
> > + while (found_set <= last) {
> > + bool do_dirty = false;
> > + unsigned long dirty_start_addr;
> > + /*
> > + * If the start of this run of pages is in the middle of a host
> > + * page, then we need to mark the whole of this host page dirty
> > + */
> > + if (found_set % host_ratio) {
> > + do_dirty = true;
> > + found_set -= found_set % host_ratio;
> > + dirty_start_addr = found_set;
> > + search_start = found_set + host_ratio;
> > + } else {
> > + /* Find the end of this run */
> > + unsigned long found_zero;
> > + found_zero = find_next_zero_bit(migration_bitmap, last + 1,
> > + found_set + 1);
> > + /*
> > + * If the 0 isn't at the start of a host page, then the
> > + * run of 1's doesn't finish at the end of a host page
> > + * and we need to discard.
> > + */
> > + if (found_zero % host_ratio) {
> > + do_dirty = true;
> > + dirty_start_addr = found_zero - (found_zero % host_ratio);
> > + /*
> > + * This host page has gone, the next loop iteration starts
> > + * from the next page with a 1 bit
> > + */
> > + search_start = dirty_start_addr + host_ratio;
> > + } else {
> > + /*
> > + * No discards on this iteration, next loop starts from
> > + * next 1 bit
> > + */
> > + search_start = found_zero + 1;
> > + }
> > + }
> > +
> > + /* Find the next 1 for the next iteration */
> > + found_set = find_next_bit(migration_bitmap, last + 1, search_start);
> > +
> > + if (do_dirty) {
> > + unsigned long page;
> > +
> > + if (test_bit(dirty_start_addr, ms->sentmap)) {
> > + /*
> > + * If the page being redirtied is marked as sent, then it
> > + * must have been fully sent (otherwise it would have been
> > + * discarded by the previous pass.)
> > + * Discard it now.
> > + */
> > + postcopy_discard_send_range(ms, pds, dirty_start_addr,
> > + dirty_start_addr +
> > + host_ratio - 1);
> > + }
> > +
> > + /* Clean up the bitmap */
> > + for (page = dirty_start_addr;
> > + page < dirty_start_addr + host_ratio; page++) {
> > +
> > + /* Clear the sentmap bits for the discard case above */
> > + clear_bit(page, ms->sentmap);
> > +
> > + /*
> > + * Mark them as dirty, updating the count for any pages
> > + * that weren't previously dirty.
> > + */
> > + migration_dirty_pages += !test_and_set_bit(page,
> > + migration_bitmap);
> > + }
> > + }
> > + }
>
>
> This is exactly the same code as the previous half of the function;
> you just need to factor it out into a function?
>
> walk_bitmap_host_page_chunks or whatever, and pass the two bits that
> change? The bitmap, and what to do with the ranges that are not there?
Split out; see below - it gets a little bit more hairy since sentmap is
now unsentmap, so we need a few if's; but it's still lost the duplication:
(build tested only so far):
Dave
commit 15003123520ee5c358b2233c0bc30635aa90eb75
Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
Date: Fri Sep 26 15:15:14 2014 +0100
Host page!=target page: Cleanup bitmaps
Prior to the start of postcopy, ensure that everything that will
be transferred later is a whole host-page in size.
This is accomplished by discarding partially transferred host pages
and marking any that are partially dirty as fully dirty.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
diff --git a/migration/ram.c b/migration/ram.c
index fe782e7..e30ed2b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1576,6 +1576,167 @@ static int postcopy_each_ram_send_discard(MigrationState *ms)
}
/*
+ * Helper for postcopy_chunk_hostpages; it's called twice to cleanup
+ * the two bitmaps, that are similar, but one is inverted.
+ *
+ * We search for runs of target-pages that don't start or end on a
+ * host page boundary;
+ * unsent_pass=true: Cleans up partially unsent host pages by searching
+ * the unsentmap
+ * unsent_pass=false: Cleans up partially dirty host pages by searching
+ * the main migration bitmap
+ *
+ */
+static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
+ RAMBlock *block,
+ PostcopyDiscardState *pds)
+{
+ unsigned long *bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
+ unsigned int host_ratio = qemu_host_page_size / TARGET_PAGE_SIZE;
+ unsigned long first = block->offset >> TARGET_PAGE_BITS;
+ unsigned long len = block->used_length >> TARGET_PAGE_BITS;
+ unsigned long last = first + (len - 1);
+ unsigned long run_start;
+
+ if (unsent_pass) {
+ /* Find a sent page */
+ run_start = find_next_zero_bit(ms->unsentmap, last + 1, first);
+ } else {
+ /* Find a dirty page */
+ run_start = find_next_bit(bitmap, last + 1, first);
+ }
+
+ while (run_start <= last) {
+ bool do_fixup = false;
+ unsigned long fixup_start_addr;
+ unsigned long host_offset;
+
+ /*
+ * If the start of this run of pages is in the middle of a host
+ * page, then we need to fixup this host page.
+ */
+ host_offset = run_start % host_ratio;
+ if (host_offset) {
+ do_fixup = true;
+ run_start -= host_offset;
+ fixup_start_addr = run_start;
+ /* For the next pass */
+ run_start = run_start + host_ratio;
+ } else {
+ /* Find the end of this run */
+ unsigned long run_end;
+ if (unsent_pass) {
+ run_end = find_next_bit(ms->unsentmap, last + 1, run_start + 1);
+ } else {
+ run_end = find_next_zero_bit(bitmap, last + 1, run_start + 1);
+ }
+ /*
+ * If the end isn't at the start of a host page, then the
+ * run doesn't finish at the end of a host page
+ * and we need to discard.
+ */
+ host_offset = run_end % host_ratio;
+ if (host_offset) {
+ do_fixup = true;
+ fixup_start_addr = run_end - host_offset;
+ /*
+ * This host page has gone, the next loop iteration starts
+ * from after the fixup
+ */
+ run_start = fixup_start_addr + host_ratio;
+ } else {
+ /*
+ * No discards on this iteration, next loop starts from
+ * next sent/dirty page
+ */
+ run_start = run_end + 1;
+ }
+ }
+
+ if (do_fixup) {
+ unsigned long page;
+
+ /* Tell the destination to discard this page */
+ if (unsent_pass || !test_bit(fixup_start_addr, ms->unsentmap)) {
+ /* For the unsent_pass we:
+ * discard partially sent pages
+ * For the !unsent_pass (dirty) we:
+ * discard partially dirty pages that were sent
+ * (any partially sent pages were already discarded
+ * by the previous unsent_pass)
+ */
+ postcopy_discard_send_range(ms, pds, fixup_start_addr,
+ host_ratio);
+ }
+
+ /* Clean up the bitmap */
+ for (page = fixup_start_addr;
+ page < fixup_start_addr + host_ratio; page++) {
+ /* All pages in this host page are now not sent */
+ set_bit(page, ms->unsentmap);
+
+ /*
+ * Remark them as dirty, updating the count for any pages
+ * that weren't previously dirty.
+ */
+ migration_dirty_pages += !test_and_set_bit(page, bitmap);
+ }
+ }
+
+ if (unsent_pass) {
+ /* Find the next sent page for the next iteration */
+ run_start = find_next_zero_bit(ms->unsentmap, last + 1,
+ run_start);
+ } else {
+ /* Find the next dirty page for the next iteration */
+ run_start = find_next_bit(bitmap, last + 1, run_start);
+ }
+ }
+}
+
+/*
+ * Utility for the outgoing postcopy code.
+ *
+ * Discard any partially sent host-page size chunks, mark any partially
+ * dirty host-page size chunks as all dirty.
+ *
+ * Returns: 0 on success
+ */
+static int postcopy_chunk_hostpages(MigrationState *ms)
+{
+ struct RAMBlock *block;
+
+ if (qemu_host_page_size == TARGET_PAGE_SIZE) {
+ /* Easy case - TPS==HPS - nothing to be done */
+ return 0;
+ }
+
+ /* Easiest way to make sure we don't resume in the middle of a host-page */
+ last_seen_block = NULL;
+ last_sent_block = NULL;
+ last_offset = 0;
+
+ QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+ unsigned long first = block->offset >> TARGET_PAGE_BITS;
+
+ PostcopyDiscardState *pds =
+ postcopy_discard_send_init(ms, first, block->idstr);
+
+ /* First pass: Discard all partially sent host pages */
+ postcopy_chunk_hostpages_pass(ms, true, block, pds);
+ /*
+ * Second pass: Ensure that all partially dirty host pages are made
+ * fully dirty.
+ */
+ postcopy_chunk_hostpages_pass(ms, false, block, pds);
+
+ postcopy_discard_send_finish(ms, pds);
+ } /* ram_list loop */
+
+ return 0;
+}
+
+/*
* Transmit the set of pages to be discarded after precopy to the target
* these are pages that:
* a) Have been previously transmitted but are now dirty again
@@ -1594,6 +1755,13 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
/* This should be our last sync, the src is now paused */
migration_bitmap_sync();
+ /* Deal with TPS != HPS */
+ ret = postcopy_chunk_hostpages(ms);
+ if (ret) {
+ rcu_read_unlock();
+ return ret;
+ }
+
/*
* Update the unsentmap to be unsentmap = unsentmap | dirty
*/
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation
2015-10-21 8:35 ` Juan Quintela
@ 2015-11-03 17:59 ` Dr. David Alan Gilbert
2015-11-03 18:32 ` Juan Quintela
0 siblings, 1 reply; 119+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-03 17:59 UTC (permalink / raw)
To: Juan Quintela
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
>
> > +/*
> > + * At the end of migration, undo the effects of init_range
> > + * opaque should be the MIS.
> > + */
> > +static int cleanup_range(const char *block_name, void *host_addr,
> > + ram_addr_t offset, ram_addr_t length, void *opaque)
> > +{
> > + MigrationIncomingState *mis = opaque;
> > + struct uffdio_range range_struct;
> > + trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
> > +
> > + /*
> > + * We turned off hugepage for the precopy stage with postcopy enabled
> > + * we can turn it back on now.
> > + */
> > +#ifdef MADV_HUGEPAGE
> > + if (madvise(host_addr, length, MADV_HUGEPAGE)) {
> > + error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
> > + return -1;
> > + }
> > +#endif
>
> this should be the same as:
>
> qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE);
>
> Only problem I can see, is that there is no way to differentiate that
> madvise() has given one error or that MADV_HUGEPAGE is not defined.
>
> If we really want that:
>
> if (QEMU_MADV_HUGEPAGE != QEMU_MADV_INVALID) {
> if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) {
> error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
> return -1;
> }
>
> But I am not sure if we want it.
Yes, so what I've currently got is:
if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) {
error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
return -1;
}
I'm tempted to add that if check, but the other similar case
is where you have headers that define HUGEPAGE, but a kernel built without
it, and in that case the madvise fails, which is a shame, since if the
kernel hasn't actually got transparent hugepages, then we don't
care if it failed to turn them on/off - but there doesn't seem
to be a good way to tell that.
> > + /*
> > + * We can also turn off userfault now since we should have all the
> > + * pages. It can be useful to leave it on to debug postcopy
> > + * if you're not sure it's always getting every page.
> > + */
> > + range_struct.start = (uintptr_t)host_addr;
> > + range_struct.len = length;
> > +
> > + if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
> > + error_report("%s: userfault unregister %s", __func__, strerror(errno));
> > +
> > + return -1;
> > + }
> > +
> > + return 0;
> > +}
>
>
> I still think that exposing the userfault API all around is a bad idea,
> that it would be easier to just export:
>
> qemu_userfault_register_range(addr, length);
> qemu_userfault_unregister_range(addr, length);
>
> And hide the details in a header file.
That only hides a tiny bit of the detail;
for example the ioctls for UFFDIO_COPY and UFFDIO_ZEROPAGE have semantics
associated with them (that they also wake the waiting process, for example);
it's not obvious that another OS would implement it in a similar way
or what the constraints on it would be. Indeed the previous kernel API
we had meant I had to do a lot more work with a similar set of calls
in userspace. Most of the places where we pull this out into separate
headers/libraries are where we have an interface that's the same across
a bunch of different OSs but the detail is different. Currently we only
have one interface and no idea what the commonality would be, or how
much of the semantics that's in postcopy-ram.c would need to move with
that interface as well.
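To make that concrete, the unregister wrapper you're suggesting would be
little more than this (a hypothetical sketch, reusing the names from your
mail and the code from the patch above):

static int qemu_userfault_unregister_range(MigrationIncomingState *mis,
                                           void *host_addr, size_t length)
{
    struct uffdio_range range_struct;

    range_struct.start = (uintptr_t)host_addr;
    range_struct.len = length;

    if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
        error_report("%s: userfault unregister %s", __func__,
                     strerror(errno));
        return -1;
    }

    return 0;
}

which hides the struct and the ioctl name, but all the semantics of when
it's safe to call it still live in postcopy-ram.c.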
Dave
>
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH v8 45/54] Host page!=target page: Cleanup bitmaps
2015-11-03 17:32 ` Dr. David Alan Gilbert
@ 2015-11-03 18:30 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-11-03 18:30 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
>> This is exactly the same code as the previous half of the function;
>> you just need to factor it out into a function?
>>
>> walk_bitmap_host_page_chunks or whatever, and pass the two bits that
>> change? The bitmap, and what to do with the ranges that are not there?
>
> Split out; see below - it gets a little bit more hairy since sentmap is
> now unsentmap, so we need a few if's; but it's still lost the duplication:
> (build tested only so far):
>
> Dave
>
> commit 15003123520ee5c358b2233c0bc30635aa90eb75
> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Date: Fri Sep 26 15:15:14 2014 +0100
>
> Host page!=target page: Cleanup bitmaps
>
> Prior to the start of postcopy, ensure that everything that will
> be transferred later is a whole host-page in size.
>
> This is accomplished by discarding partially transferred host pages
> and marking any that are partially dirty as fully dirty.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
See, when we changed the sentmap to the unsentmap, we made this code
more complex, but I still think that the new code is easier to understand.
Thanks.
Reviewed-by: Juan Quintela <quintela@redhat.com>
> diff --git a/migration/ram.c b/migration/ram.c
> index fe782e7..e30ed2b 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1576,6 +1576,167 @@ static int postcopy_each_ram_send_discard(MigrationState *ms)
> }
>
> /*
> + * Helper for postcopy_chunk_hostpages; it's called twice to cleanup
> + * the two bitmaps, that are similar, but one is inverted.
> + *
> + * We search for runs of target-pages that don't start or end on a
> + * host page boundary;
> + * unsent_pass=true: Cleans up partially unsent host pages by searching
> + * the unsentmap
> + * unsent_pass=false: Cleans up partially dirty host pages by searching
> + * the main migration bitmap
> + *
> + */
> +static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
> + RAMBlock *block,
> + PostcopyDiscardState *pds)
> +{
> + unsigned long *bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
> + unsigned int host_ratio = qemu_host_page_size / TARGET_PAGE_SIZE;
> + unsigned long first = block->offset >> TARGET_PAGE_BITS;
> + unsigned long len = block->used_length >> TARGET_PAGE_BITS;
> + unsigned long last = first + (len - 1);
> + unsigned long run_start;
> +
> + if (unsent_pass) {
> + /* Find a sent page */
> + run_start = find_next_zero_bit(ms->unsentmap, last + 1, first);
> + } else {
> + /* Find a dirty page */
> + run_start = find_next_bit(bitmap, last + 1, first);
> + }
> +
> + while (run_start <= last) {
> + bool do_fixup = false;
> + unsigned long fixup_start_addr;
> + unsigned long host_offset;
> +
> + /*
> + * If the start of this run of pages is in the middle of a host
> + * page, then we need to fixup this host page.
> + */
> + host_offset = run_start % host_ratio;
> + if (host_offset) {
> + do_fixup = true;
> + run_start -= host_offset;
> + fixup_start_addr = run_start;
> + /* For the next pass */
> + run_start = run_start + host_ratio;
> + } else {
> + /* Find the end of this run */
> + unsigned long run_end;
> + if (unsent_pass) {
> + run_end = find_next_bit(ms->unsentmap, last + 1, run_start + 1);
> + } else {
> + run_end = find_next_zero_bit(bitmap, last + 1, run_start + 1);
> + }
> + /*
> + * If the end isn't at the start of a host page, then the
> + * run doesn't finish at the end of a host page
> + * and we need to discard.
> + */
> + host_offset = run_end % host_ratio;
> + if (host_offset) {
> + do_fixup = true;
> + fixup_start_addr = run_end - host_offset;
> + /*
> + * This host page has gone, the next loop iteration starts
> + * from after the fixup
> + */
> + run_start = fixup_start_addr + host_ratio;
> + } else {
> + /*
> + * No discards on this iteration, next loop starts from
> + * next sent/dirty page
> + */
> + run_start = run_end + 1;
> + }
> + }
> +
> + if (do_fixup) {
> + unsigned long page;
> +
> + /* Tell the destination to discard this page */
> + if (unsent_pass || !test_bit(fixup_start_addr, ms->unsentmap)) {
> + /* For the unsent_pass we:
> + * discard partially sent pages
> + * For the !unsent_pass (dirty) we:
> + * discard partially dirty pages that were sent
> + * (any partially sent pages were already discarded
> + * by the previous unsent_pass)
> + */
> + postcopy_discard_send_range(ms, pds, fixup_start_addr,
> + host_ratio);
> + }
> +
> + /* Clean up the bitmap */
> + for (page = fixup_start_addr;
> + page < fixup_start_addr + host_ratio; page++) {
> + /* All pages in this host page are now not sent */
> + set_bit(page, ms->unsentmap);
> +
> + /*
> + * Remark them as dirty, updating the count for any pages
> + * that weren't previously dirty.
> + */
> + migration_dirty_pages += !test_and_set_bit(page, bitmap);
> + }
> + }
> +
> + if (unsent_pass) {
> + /* Find the next sent page for the next iteration */
> + run_start = find_next_zero_bit(ms->unsentmap, last + 1,
> + run_start);
> + } else {
> + /* Find the next dirty page for the next iteration */
> + run_start = find_next_bit(bitmap, last + 1, run_start);
> + }
> + }
> +}
> +
> +/*
> + * Utility for the outgoing postcopy code.
> + *
> + * Discard any partially sent host-page size chunks, mark any partially
> + * dirty host-page size chunks as all dirty.
> + *
> + * Returns: 0 on success
> + */
> +static int postcopy_chunk_hostpages(MigrationState *ms)
> +{
> + struct RAMBlock *block;
> +
> + if (qemu_host_page_size == TARGET_PAGE_SIZE) {
> + /* Easy case - TPS==HPS - nothing to be done */
> + return 0;
> + }
> +
> + /* Easiest way to make sure we don't resume in the middle of a host-page */
> + last_seen_block = NULL;
> + last_sent_block = NULL;
> + last_offset = 0;
> +
> + QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> + unsigned long first = block->offset >> TARGET_PAGE_BITS;
> +
> + PostcopyDiscardState *pds =
> + postcopy_discard_send_init(ms, first, block->idstr);
> +
> + /* First pass: Discard all partially sent host pages */
> + postcopy_chunk_hostpages_pass(ms, true, block, pds);
> + /*
> + * Second pass: Ensure that all partially dirty host pages are made
> + * fully dirty.
> + */
> + postcopy_chunk_hostpages_pass(ms, false, block, pds);
> +
> + postcopy_discard_send_finish(ms, pds);
> + } /* ram_list loop */
> +
> + return 0;
> +}
> +
> +/*
> * Transmit the set of pages to be discarded after precopy to the target
> * these are pages that:
> * a) Have been previously transmitted but are now dirty again
> @@ -1594,6 +1755,13 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
> /* This should be our last sync, the src is now paused */
> migration_bitmap_sync();
>
> + /* Deal with TPS != HPS */
> + ret = postcopy_chunk_hostpages(ms);
> + if (ret) {
> + rcu_read_unlock();
> + return ret;
> + }
> +
> /*
> * Update the unsentmap to be unsentmap = unsentmap | dirty
> */
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation
2015-11-03 17:59 ` Dr. David Alan Gilbert
@ 2015-11-03 18:32 ` Juan Quintela
0 siblings, 0 replies; 119+ messages in thread
From: Juan Quintela @ 2015-11-03 18:32 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: aarcange, liang.z.li, qemu-devel, luis, bharata, amit.shah,
pbonzini
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> >
>> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
>>
>> > +/*
>> > + * At the end of migration, undo the effects of init_range
>> > + * opaque should be the MIS.
>> > + */
>> > +static int cleanup_range(const char *block_name, void *host_addr,
>> > + ram_addr_t offset, ram_addr_t length, void *opaque)
>> > +{
>> > + MigrationIncomingState *mis = opaque;
>> > + struct uffdio_range range_struct;
>> > + trace_postcopy_cleanup_range(block_name, host_addr, offset, length);
>> > +
>> > + /*
>> > + * We turned off hugepage for the precopy stage with postcopy enabled
>> > + * we can turn it back on now.
>> > + */
>> > +#ifdef MADV_HUGEPAGE
>> > + if (madvise(host_addr, length, MADV_HUGEPAGE)) {
>> > + error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
>> > + return -1;
>> > + }
>> > +#endif
>>
>> this should be the same as:
>>
>> qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE);
>>
>> Only problem I can see, is that there is no way to differentiate that
>> madvise() has given one error or that MADV_HUGEPAGE is not defined.
>>
>> If we really want that:
>>
>> if (QEMU_MADV_HUGEPAGE != QEMU_MADV_INVALID) {
>> if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) {
>> error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
>> return -1;
>> }
>>
>> But I am not sure if we want it.
>
> Yes, so what I've currently got is:
>
> if (qemu_madvise(host_addr, length, QEMU_MADV_HUGEPAGE)) {
> error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
> return -1;
> }
>
> I'm tempted to add that if check, but the other similar case
> is where you have headers that define HUGEPAGE, but a kernel built without
> it, and in that case the madvise fails, which is a shame, since if the
> kernel hasn't actually got transparent hugepages, then we don't
> care if it failed to turn them on/off - but there doesn't seem
> to be a good way to tell that.
>
>> > + /*
>> > + * We can also turn off userfault now since we should have all the
>> > + * pages. It can be useful to leave it on to debug postcopy
>> > + * if you're not sure it's always getting every page.
>> > + */
>> > + range_struct.start = (uintptr_t)host_addr;
>> > + range_struct.len = length;
>> > +
>> > + if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
>> > + error_report("%s: userfault unregister %s", __func__,
>> > strerror(errno));
>> > +
>> > + return -1;
>> > + }
>> > +
>> > + return 0;
>> > +}
>>
>>
>> I still think that exposing the userfault API all around is a bad idea,
>> that it would be easier to just export:
>>
>> qemu_userfault_register_range(addr, length);
>> qemu_userfault_unregister_range(addr, length);
>>
>> And hide the details in a header file.
>
> That only hides a tiny bit of the detail;
> for example the ioctls for UFFDIO_COPY and UFFDIO_ZEROPAGE have semantics
> associated with them (that they also wake the waiting process, for example);
> it's not obvious that another OS would implement it in a similar way
> or what the constraints on it would be. Indeed the previous kernel API
> we had meant I had to do a lot more work with a similar set of calls
> in userspace. Most of the places where we pull this out into separate
> headers/libraries are where we have an interface that's the same across
> a bunch of different OSs but the detail is different. Currently we only
> have one interface and no idea what the commonality would be, or how
> much of the semantics that's in postcopy-ram.c would need to move with
> that interface as well.
OK, if it is too difficult (I didn't know about the associated
semantics), we will wait until something else implements a similar
interface.
Thanks, Juan.
Thread overview: 119+ messages
2015-09-29 8:37 [Qemu-devel] [PATCH v8 00/54] Postcopy implementation Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 01/54] Add postcopy documentation Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 02/54] Provide runtime Target page information Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 03/54] Init page sizes in qtest Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 04/54] Move configuration section writing Dr. David Alan Gilbert (git)
2015-10-05 6:44 ` Amit Shah
2015-10-30 12:47 ` Dr. David Alan Gilbert
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 05/54] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 06/54] Rename mis->file to from_src_file Dr. David Alan Gilbert (git)
2015-09-29 10:41 ` Amit Shah
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 07/54] Add qemu_get_buffer_in_place to avoid copies some of the time Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 08/54] Add wrapper for setting blocking status on a QEMUFile Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 09/54] Add QEMU_MADV_NOHUGEPAGE Dr. David Alan Gilbert (git)
2015-10-28 10:35 ` Amit Shah
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 10/54] migration/ram.c: Use RAMBlock rather than MemoryRegion Dr. David Alan Gilbert (git)
2015-10-28 10:36 ` Amit Shah
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 11/54] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 12/54] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 13/54] Move dirty page search state into separate structure Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 14/54] ram_find_and_save_block: Split out the finding Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 15/54] Rename save_live_complete to save_live_complete_precopy Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 16/54] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
2015-10-02 15:29 ` Daniel P. Berrange
2015-10-02 16:32 ` Dr. David Alan Gilbert
2015-10-02 17:03 ` Daniel P. Berrange
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 17/54] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 18/54] Migration commands Dr. David Alan Gilbert (git)
2015-10-20 11:22 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 19/54] Return path: Control commands Dr. David Alan Gilbert (git)
2015-10-20 11:27 ` Juan Quintela
2015-10-26 11:42 ` Dr. David Alan Gilbert
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 20/54] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 21/54] Return path: Source handling of return path Dr. David Alan Gilbert (git)
2015-10-20 11:33 ` Juan Quintela
2015-10-26 12:06 ` Dr. David Alan Gilbert
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 22/54] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 23/54] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
2015-09-29 20:22 ` Eric Blake
2015-09-30 7:00 ` Amit Shah
2015-09-30 12:44 ` Eric Blake
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 24/54] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
2015-10-20 11:50 ` Juan Quintela
2015-10-26 12:22 ` Dr. David Alan Gilbert
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 25/54] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
2015-10-20 13:25 ` Juan Quintela
2015-10-26 16:21 ` Dr. David Alan Gilbert
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 26/54] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
2015-10-28 11:03 ` Amit Shah
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 27/54] postcopy: OS support test Dr. David Alan Gilbert (git)
2015-10-20 13:31 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 28/54] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
2015-09-30 16:25 ` Eric Blake
2015-09-30 16:30 ` Dr. David Alan Gilbert
2015-10-20 13:33 ` Juan Quintela
2015-10-28 11:17 ` Amit Shah
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 29/54] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
2015-10-20 13:35 ` Juan Quintela
2015-10-30 18:19 ` Dr. David Alan Gilbert
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 30/54] Avoid sending vmdescription during postcopy Dr. David Alan Gilbert (git)
2015-10-20 13:35 ` Juan Quintela
2015-10-28 11:19 ` Amit Shah
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 31/54] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 32/54] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
2015-10-21 11:17 ` Juan Quintela
2015-10-30 18:43 ` Dr. David Alan Gilbert
2015-11-02 17:31 ` Dr. David Alan Gilbert
2015-11-02 18:19 ` Dr. David Alan Gilbert
2015-11-02 20:14 ` Dr. David Alan Gilbert
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 33/54] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
2015-10-21 8:35 ` Juan Quintela
2015-11-03 17:59 ` Dr. David Alan Gilbert
2015-11-03 18:32 ` Juan Quintela
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 34/54] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
2015-10-28 11:40 ` Amit Shah
2015-09-29 8:37 ` [Qemu-devel] [PATCH v8 35/54] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
2015-10-21 8:57 ` Juan Quintela
2015-10-26 17:12 ` Dr. David Alan Gilbert
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 36/54] Split out end of migration code from migration_thread Dr. David Alan Gilbert (git)
2015-10-21 9:11 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 37/54] Postcopy: End of iteration Dr. David Alan Gilbert (git)
2015-10-21 9:16 ` Juan Quintela
2015-10-29 5:10 ` Amit Shah
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 38/54] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
2015-10-21 11:12 ` Juan Quintela
2015-10-26 16:58 ` Dr. David Alan Gilbert
2015-10-29 5:17 ` Amit Shah
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 39/54] Page request: Process incoming page request Dr. David Alan Gilbert (git)
2015-10-21 11:17 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 40/54] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
2015-10-26 16:32 ` Juan Quintela
2015-11-03 11:52 ` Dr. David Alan Gilbert
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 41/54] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
2015-10-28 10:28 ` Juan Quintela
2015-10-28 13:11 ` Dr. David Alan Gilbert
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 42/54] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
2015-10-28 10:58 ` Juan Quintela
2015-10-30 12:59 ` Dr. David Alan Gilbert
2015-10-30 16:35 ` Dr. David Alan Gilbert
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 43/54] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 44/54] Don't iterate on precopy-only devices during postcopy Dr. David Alan Gilbert (git)
2015-10-28 11:01 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 45/54] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
2015-10-28 11:24 ` Juan Quintela
2015-11-03 17:32 ` Dr. David Alan Gilbert
2015-11-03 18:30 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 46/54] postcopy: Check order of received target pages Dr. David Alan Gilbert (git)
2015-10-28 11:26 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 47/54] Round up RAMBlock sizes to host page sizes Dr. David Alan Gilbert (git)
2015-10-28 11:28 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 48/54] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 49/54] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 50/54] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 51/54] Postcopy: Mark nohugepage before discard Dr. David Alan Gilbert (git)
2015-10-28 14:02 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 52/54] End of migration for postcopy Dr. David Alan Gilbert (git)
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 53/54] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
2015-10-21 9:17 ` Juan Quintela
2015-09-29 8:38 ` [Qemu-devel] [PATCH v8 54/54] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
[not found] <1443459153-10965-1-git-send-email-dgilbert@redhat.com>
[not found] ` <1443459153-10965-11-git-send-email-dgilbert@redhat.com>
[not found] ` <87zizdvm9m.fsf@neno.neno>
2015-10-20 11:58 ` [Qemu-devel] [PATCH v8 10/54] migration/ram.c: Use RAMBlock rather than MemoryRegion Juan Quintela