From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44301)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <yanghy@cn.fujitsu.com>) id 1Z8Ora-00037K-7X
	for qemu-devel@nongnu.org; Fri, 26 Jun 2015 04:19:56 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <yanghy@cn.fujitsu.com>) id 1Z8OrW-0000IP-3R
	for qemu-devel@nongnu.org; Fri, 26 Jun 2015 04:19:54 -0400
Received: from [59.151.112.132] (port=51338 helo=heian.cn.fujitsu.com)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <yanghy@cn.fujitsu.com>) id 1Z8OrR-0000EH-FX
	for qemu-devel@nongnu.org; Fri, 26 Jun 2015 04:19:50 -0400
Message-ID: <558D0B18.4050008@cn.fujitsu.com>
Date: Fri, 26 Jun 2015 16:19:36 +0800
From: Yang Hongyang <yanghy@cn.fujitsu.com>
MIME-Version: 1.0
References: <1434450415-11339-1-git-send-email-dgilbert@redhat.com>
	<1434450415-11339-2-git-send-email-dgilbert@redhat.com>
	<558CF559.9060208@cn.fujitsu.com> <558D04F0.5050904@huawei.com>
	<558D0681.9060904@cn.fujitsu.com> <20150626081004.GA2186@work-vm>
In-Reply-To: <20150626081004.GA2186@work-vm>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy
 works.
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, zhanghailiang <zhang.zhanghailiang@huawei.com>, quintela@redhat.com, liang.z.li@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, luis@cs.umu.se, amit.shah@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au


On 06/26/2015 04:10 PM, Dr. David Alan Gilbert wrote:
> * Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
>>
>>
>> On 06/26/2015 03:53 PM, zhanghailiang wrote:
>>> On 2015/6/26 14:46, Yang Hongyang wrote:
>>>> Hi Dave,
>>>>
>>>> On 06/16/2015 06:26 PM, Dr. David Alan Gilbert (git) wrote:
>>>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>>
>>>> [...]
>>>>> += Postcopy =
>>>>> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
>>>>> +its plus side is that there is an upper bound on the amount of migration traffic
>>>>> +and time it takes, the down side is that during the postcopy phase, a failure of
>>>>> +*either* side or the network connection causes the guest to be lost.
>>>>> +
>>>>> +In postcopy the destination CPUs are started before all the memory has been
>>>>> +transferred, and accesses to pages that are yet to be transferred cause
>>>>> +a fault that's translated by QEMU into a request to the source QEMU.
>>>>
>>>> I have an immature idea:
>>>> Can we keep a source RAM cache on the destination QEMU, instead of sending
>>>> requests to the source QEMU? That is:
>>>>   - When start_postcopy is issued, the source is paused and __opens another
>>>>     socket (maybe with another migration thread)__ to send the remaining dirty
>>>>     pages to the destination; at the same time, the destination starts and
>>>>     caches the remaining pages.
>>>
>>> Er, it seems that the current implementation is just like what you described,
>>> except for the RAM cache:
>>> After the switch to postcopy mode, the source side keeps sending the remaining
>>> dirty pages, as in precopy.
>>> It does not need any cache at all; it just places the dirty pages where they
>>> will be accessed.
>
> Yes, zhanghailiang is correct; the source keeps sending other pages without
> being asked, however when asked it sends requested pages immediately, and the
> 'cache' is just the main memory from which the destination is working.
>
> However, the idea of using a separate socket is one that we have been thinking
> about; one of the problems is that the urgent requested pages get delayed behind
> the background page transfer and that increases the latency; a separate socket
> should fix that.

That would be better.

>
>> I haven't looked into the implementation in detail, but if that is the case, I
>> think it should be documented here, or in the [Source behaviour] section below.
>
> Yes, I can add to the documentation; I've added the following text:
>
>    During postcopy the source scans the list of dirty pages and sends them
>    to the destination without being requested (in much the same way as precopy),
>    however when a page request is received from the destination the dirty page
>    scanning restarts from the requested location.  This causes requested pages
>    to be sent quickly, and also causes pages directly after the requested page
>    to be sent quickly in the hope that those pages are likely to be requested
>    by the destination soon.

Looks clearer to me now :)
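
To check my reading of that paragraph, here is a minimal sketch of the send
order it implies. It is only a model, not QEMU code: the function name, the
page indices and the "request arrives after N pages have been sent" encoding
are all assumptions made for illustration.

```python
def postcopy_send_order(dirty, requests):
    """Return the order in which a hypothetical source would send pages.

    dirty    -- iterable of page indices that still need sending
    requests -- dict mapping "number of pages already sent" to the page
                index requested by the destination at that moment
                (hypothetical encoding, for illustration only)
    """
    dirty = set(dirty)
    npages = max(dirty) + 1 if dirty else 0
    order = []
    cursor = 0
    while dirty:
        requested = requests.get(len(order))
        if requested is not None and requested in dirty:
            # A page request restarts the scan at the requested location.
            cursor = requested
        # Requests for pages that were already sent are simply ignored.
        # Advance (with wrap-around) to the next page still marked dirty.
        while cursor not in dirty:
            cursor = (cursor + 1) % npages
        order.append(cursor)
        dirty.discard(cursor)
        cursor = (cursor + 1) % npages
    return order
```

With eight dirty pages and a request for page 5 arriving after two pages have
been sent, postcopy_send_order(range(8), {2: 5}) gives
[0, 1, 5, 6, 7, 2, 3, 4]: the requested page goes out next, followed by its
neighbours, and the scan then wraps around to the pages it skipped.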

>
> Dave
>
>>>
>>>>   - When a page fault occurs, first look up the page in the CACHE; if it has
>>>>     not been received yet, request it from the source QEMU.
>>>>   - Once the remaining dirty pages are transferred, the source QEMU can exit.
>>>>
>>>> The existing postcopy mechanism does not need to be changed; we would just add
>>>> the remaining-page transfer mechanism and the RAM cache.
>>>>
>>>> I don't know whether it is feasible or whether it would improve postcopy.
>>>> What do you think?
>>>>
>>>>> +
>>>>> +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
>>>>> +doesn't finish in a given time the switch is made to postcopy.
>>>>> +
>>>>> +=== Enabling postcopy ===
>>>>> +
>>>>> +To enable postcopy (prior to the start of migration):
>>>>> +
>>>>> +migrate_set_capability x-postcopy-ram on
>>>>> +
>>>>> +The migration will still start in precopy mode, however issuing:
>>>>> +
>>>>> +migrate_start_postcopy
>>>>> +
>>>>> +will now cause the transition from precopy to postcopy.
>>>>> +It can be issued immediately after migration is started or any
>>>>> +time later on.  Issuing it after the end of a migration is harmless.
>>>>> +
>>>>> +=== Postcopy device transfer ===
>>>>> +
>>>>> +Loading of device data may cause the device emulation to access guest RAM,
>>>>> +which may trigger faults that have to be resolved by the source.  The
>>>>> +migration stream therefore has to be able to deliver page data *during* the
>>>>> +device load, so the device data has to be read from the stream completely
>>>>> +before the device load begins, to free the stream up.  This is achieved by
>>>>> +'packaging' the device data into a blob that's read in one go.
>>>>> +
>>>>> +Source behaviour
>>>>> +
>>>>> +Until postcopy is entered the migration stream is identical to normal
>>>>> +precopy, except for the addition of a 'postcopy advise' command at
>>>>> +the beginning, to tell the destination that postcopy might happen.
>>>>> +When postcopy starts the source sends the page discard data and then
>>>>> +forms the 'package' containing:
>>>>> +
>>>>> +   Command: 'postcopy listen'
>>>>> +   The device state
>>>>> +      A series of sections, identical to the precopy stream's device state stream,
>>>>> +      containing everything except postcopiable devices (i.e. RAM)
>>>>> +   Command: 'postcopy run'
>>>>> +
>>>>> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
>>>>> +contents are formatted in the same way as the main migration stream.
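
The nesting described above can be sketched like this. Note that the command
codes and the "1-byte type + 4-byte length" record layout are made up for the
example; they are NOT QEMU's wire format, only a way to show a package whose
contents are formatted like the outer stream.

```python
import struct

# Hypothetical record types, for illustration only (not QEMU's values).
CMD_PACKAGED, CMD_LISTEN, CMD_RUN, SECTION_DEVICE = 1, 2, 3, 4

HEADER = ">BI"  # 1-byte record type, 4-byte big-endian payload length
HEADER_SIZE = struct.calcsize(HEADER)


def record(kind, payload=b""):
    """Encode one record as header + payload."""
    return struct.pack(HEADER, kind, len(payload)) + payload


def build_package(device_states):
    """Wrap 'listen', the device sections and 'run' in one packaged record."""
    inner = record(CMD_LISTEN)
    for state in device_states:
        inner += record(SECTION_DEVICE, state)
    inner += record(CMD_RUN)
    return record(CMD_PACKAGED, inner)


def parse_records(buf):
    """Decode a byte string into a list of (kind, payload) tuples."""
    records = []
    offset = 0
    while offset < len(buf):
        kind, length = struct.unpack_from(HEADER, buf, offset)
        offset += HEADER_SIZE
        records.append((kind, buf[offset:offset + length]))
        offset += length
    return records
```

Because the package is length-prefixed, the reader can pull the whole blob off
the stream in one go and then run the same parser over its payload, which is
the property the text relies on to free the stream up before device load.
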
>>>>> +
>>>>> +Destination behaviour
>>>>> +
>>>>> +Initially the destination looks the same as precopy, with a single thread
>>>>> +reading the migration stream; the 'postcopy advise' and 'discard' commands
>>>>> +are processed to change the way RAM is managed, but don't affect the stream
>>>>> +processing.
>>>>> +
>>>>> +------------------------------------------------------------------------------
>>>>> +                        1      2   3     4 5                      6   7
>>>>> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
>>>>> +thread                             |       |
>>>>> +                                   |     (page request)
>>>>> +                                   |        \___
>>>>> +                                   v            \
>>>>> +listen thread:                     --- page -- page -- page -- page -- page --
>>>>> +
>>>>> +                                   a   b        c
>>>>> +------------------------------------------------------------------------------
>>>>> +
>>>>> +On receipt of CMD_PACKAGED (1)
>>>>> +   All the data associated with the package - the ( ... ) section in the
>>>>> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
>>>>> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
>>>>> +which contains commands (3,6) and devices (4...)
>>>>> +
>>>>> +On receipt of 'postcopy listen' (3) (i.e. the 1st command in the package)
>>>>> +a new thread (a) is started that takes over servicing the migration stream,
>>>>> +while the main thread carries on loading the package.  The listen thread loads
>>>>> +normal background page data (b), but if a fault happens during a device load (5)
>>>>> +the returned page (c) is loaded by the listen thread, allowing the main thread's
>>>>> +device load to carry on.
>>>>> +
>>>>> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
>>>>> +CPUs start running.
>>>>> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
>>>>> +and is no longer used by migration, while the listen thread carries
>>>>> +on servicing page data until the end of migration.
>>>>> +
>>>>> +=== Postcopy states ===
>>>>> +
>>>>> +Postcopy moves through a series of states (see postcopy_state) from
>>>>> +ADVISE->LISTEN->RUNNING->END
>>>>> +
>>>>> +  Advise: Set at the start of migration if postcopy is enabled, even
>>>>> +          if it hasn't had the start command; here the destination
>>>>> +          checks that its OS has the support needed for postcopy, and performs
>>>>> +          setup to ensure the RAM mappings are suitable for later postcopy.
>>>>> +          (Triggered by reception of POSTCOPY_ADVISE command)
>>>>> +
>>>>> +  Listen: The first command in the package, POSTCOPY_LISTEN, switches
>>>>> +          the destination state to Listen, and starts a new thread
>>>>> +          (the 'listen thread') which takes over the job of receiving
>>>>> +          pages off the migration stream, while the main thread carries
>>>>> +          on processing the blob.  With this thread able to process page
>>>>> +          reception, the destination now 'sensitises' the RAM to detect
>>>>> +          any access to missing pages (on Linux using the 'userfault'
>>>>> +          system).
>>>>> +
>>>>> +  Running: POSTCOPY_RUN causes the destination to synchronise all
>>>>> +          state and start the CPUs and IO devices running.  The main
>>>>> +          thread now finishes processing the migration package and
>>>>> +          now carries on as it would for normal precopy migration
>>>>> +          (although it can't do the cleanup it would do as it
>>>>> +          finishes a normal migration).
>>>>> +
>>>>> +  End: The listen thread can now perform the cleanup of migration state
>>>>> +          and quit; the migration is now complete.
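
The state sequence above reads like a small state machine, roughly as below.
The NONE start state and the END_OF_MIGRATION event name are my own
assumptions for the sketch; only the ADVISE->LISTEN->RUNNING->END order and
the POSTCOPY_* command names come from the text.

```python
from enum import Enum


class PostcopyState(Enum):
    NONE = 0      # assumed initial state, not named in the text
    ADVISE = 1
    LISTEN = 2
    RUNNING = 3
    END = 4


# (current state, event) -> next state.  "END_OF_MIGRATION" is a
# hypothetical event name standing in for "the listen thread finishes".
TRANSITIONS = {
    (PostcopyState.NONE, "POSTCOPY_ADVISE"): PostcopyState.ADVISE,
    (PostcopyState.ADVISE, "POSTCOPY_LISTEN"): PostcopyState.LISTEN,
    (PostcopyState.LISTEN, "POSTCOPY_RUN"): PostcopyState.RUNNING,
    (PostcopyState.RUNNING, "END_OF_MIGRATION"): PostcopyState.END,
}


def advance(state, event):
    """Return the next state, or raise ValueError on an out-of-order event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event} is not valid in state {state.name}") from None
```

Rejecting out-of-order events (for example POSTCOPY_RUN before
POSTCOPY_LISTEN) is a choice made for this sketch, not something the text
states.
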
>>>>> +
>>>>> +=== Source side page maps ===
>>>>> +
>>>>> +The source side keeps two bitmaps during postcopy: the 'migration bitmap'
>>>>> +and the 'sent map'.  The 'migration bitmap' is basically the same as in
>>>>> +the precopy case, and holds a bit per page to indicate that the page is
>>>>> +'dirty', i.e. needs sending.  During the precopy phase this is updated as the
>>>>> +CPU dirties pages, however during postcopy the CPUs are stopped and nothing
>>>>> +should dirty anything any more.
>>>>> +
>>>>> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
>>>>> +has a bit set whenever a page is sent to the destination, however during
>>>>> +the transition to postcopy mode it is masked against the migration bitmap
>>>>> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
>>>>> +have previously been sent but are now dirty again.  This masked
>>>>> +sentmap is sent to the destination which discards those now dirty pages
>>>>> +before starting the CPUs.
>>>>> +
>>>>> +Note that the contents of the sentmap are sacrificed during the calculation
>>>>> +of the discard set and thus aren't valid once in postcopy.  The migration bitmap
>>>>> +is still valid and is used to ensure that no page is sent more than once.  Any
>>>>> +request for a page that has already been sent is ignored.  Duplicate requests
>>>>> +such as this can happen as a page is sent at about the same time the
>>>>> +destination accesses it.
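
A small worked example of the two bitmaps, as I understand them. Python
integers stand in for the bitmaps (bit i represents page i), and the function
names are hypothetical; this is a sketch of the bookkeeping, not the actual
implementation.

```python
def page_list(bitmap):
    """Return the page indices whose bit is set (bit i represents page i)."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def discard_set(sentmap, migration_bitmap):
    """Pages that were sent earlier but are dirty again: sentmap & dirty."""
    return sentmap & migration_bitmap


def handle_request(page, migration_bitmap):
    """Decide whether a requested page still needs sending.

    Returns (send, new_migration_bitmap).  A request for a page whose dirty
    bit is already clear is ignored, so no page is sent more than once.
    """
    bit = 1 << page
    if not migration_bitmap & bit:
        return False, migration_bitmap
    return True, migration_bitmap & ~bit
```

For example, if pages 0-3 were sent during precopy (sentmap 0b001111) and
pages 2, 3 and 5 are dirty at the switch (migration bitmap 0b101100), the
discard set is pages [2, 3]. A later request for page 5 is served once, and a
duplicate request for it is ignored.
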
>>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>> --
>> Thanks,
>> Yang.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
>

-- 
Thanks,
Yang.