From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <558D0681.9060904@cn.fujitsu.com>
Date: Fri, 26 Jun 2015 16:00:01 +0800
From: Yang Hongyang
MIME-Version: 1.0
References: <1434450415-11339-1-git-send-email-dgilbert@redhat.com>
 <1434450415-11339-2-git-send-email-dgilbert@redhat.com>
 <558CF559.9060208@cn.fujitsu.com> <558D04F0.5050904@huawei.com>
In-Reply-To: <558D04F0.5050904@huawei.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
To: zhanghailiang, "Dr. David Alan Gilbert (git)", qemu-devel@nongnu.org
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com,
 liang.z.li@intel.com, peter.huangpeng@huawei.com, luis@cs.umu.se,
 amit.shah@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au

On 06/26/2015 03:53 PM, zhanghailiang wrote:
> On 2015/6/26 14:46, Yang Hongyang wrote:
>> Hi Dave,
>>
>> On 06/16/2015 06:26 PM, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert"
>>>
>> [...]
>>> += Postcopy =
>>> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
>>> +its plus side is that there is an upper bound on the amount of migration traffic
>>> +and the time it takes; the downside is that during the postcopy phase, a failure
>>> +of *either* side or of the network connection causes the guest to be lost.
>>> +
>>> +In postcopy the destination CPUs are started before all the memory has been
>>> +transferred, and accesses to pages that are yet to be transferred cause
>>> +a fault that's translated by QEMU into a request to the source QEMU.
>>
>> I have an immature idea.
>> Can we keep a cache of the source RAM on the destination QEMU, instead of
>> requesting it from the source QEMU? That is:
>> - When start_postcopy is issued, the source will pause and __open another socket
>>   (maybe another migration thread)__ to send the remaining dirty pages to the
>>   destination; at the same time, the destination will start, and cache the
>>   remaining pages.
>
> Er, it seems that the current implementation is just like what you described,
> except for the RAM cache:
> After switching to postcopy mode, the source side will send the remaining dirty
> pages as in precopy.
> Here it does not need any cache at all; it just places the dirty pages where
> they will be accessed.

I haven't looked into the implementation in detail, but if that is the case,
I think it should be documented here... or in the [Source behaviour] section below.

>
>> - When a page fault occurs, first look up the page in the CACHE; if it has
>>   not yet been received, request it from the source QEMU.
>> - Once the remaining dirty pages are transferred, the source QEMU can go away.
>>
>> The existing postcopy mechanism does not need to be changed; just add the
>> remaining-page transfer mechanism and the RAM cache.
>>
>> I don't know if it is feasible, or whether it would bring any improvement to
>> postcopy. What do you think?
>>
>>> +
>>> +Postcopy can be combined with precopy (i.e.
>>> normal migration) so that if precopy
>>> +doesn't finish in a given time, the switch is made to postcopy.
>>> +
>>> +=== Enabling postcopy ===
>>> +
>>> +To enable postcopy (prior to the start of migration):
>>> +
>>> +migrate_set_capability x-postcopy-ram on
>>> +
>>> +The migration will still start in precopy mode; however, issuing:
>>> +
>>> +migrate_start_postcopy
>>> +
>>> +will now cause the transition from precopy to postcopy.
>>> +It can be issued immediately after migration is started or any
>>> +time later on. Issuing it after the end of a migration is harmless.
>>> +
>>> +=== Postcopy device transfer ===
>>> +
>>> +Loading of device data may cause the device emulation to access guest RAM
>>> +in a way that triggers faults that have to be resolved by the source; as such,
>>> +the migration stream has to be able to respond with page data *during* the
>>> +device load. Hence the device data has to be read from the stream completely
>>> +before the device load begins, to free the stream up. This is achieved by
>>> +'packaging' the device data into a blob that's read in one go.
>>> +
>>> +Source behaviour
>>> +
>>> +Until postcopy is entered, the migration stream is identical to normal
>>> +precopy, except for the addition of a 'postcopy advise' command at
>>> +the beginning, to tell the destination that postcopy might happen.
>>> +When postcopy starts, the source sends the page discard data and then
>>> +forms the 'package' containing:
>>> +
>>> +   Command: 'postcopy listen'
>>> +   The device state
>>> +      A series of sections, identical to the precopy stream's device state
>>> +      stream, containing everything except postcopiable devices (i.e. RAM)
>>> +   Command: 'postcopy run'
>>> +
>>> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
>>> +contents are formatted in the same way as the main migration stream.
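The packaging described above can be sketched roughly as follows. This is a toy Python sketch: the command tags and the length-prefixed framing are invented here for illustration, not QEMU's actual wire format.

```python
import io
import struct

# Toy command tags -- QEMU's real migration stream uses different encodings.
CMD_LISTEN, CMD_RUN, CMD_PACKAGED = 0x10, 0x11, 0x12

def write_cmd(stream, cmd, payload=b""):
    """Write one length-prefixed command record: 1-byte tag, 4-byte length."""
    stream.write(struct.pack(">BI", cmd, len(payload)))
    stream.write(payload)

def build_package(device_sections):
    """Form the 'package': LISTEN, then device state, then RUN."""
    inner = io.BytesIO()
    write_cmd(inner, CMD_LISTEN)
    for section in device_sections:   # everything except postcopiable RAM
        inner.write(section)
    write_cmd(inner, CMD_RUN)
    # The whole blob becomes the data part of a single CMD_PACKAGED, so the
    # destination can read it in one go and free the stream for page data.
    outer = io.BytesIO()
    write_cmd(outer, CMD_PACKAGED, inner.getvalue())
    return outer.getvalue()
```

The point of the outer wrapper is that the destination knows the full package length up front, so the main stream is released before any device load begins.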
>>> +
>>> +Destination behaviour
>>> +
>>> +Initially the destination looks the same as precopy, with a single thread
>>> +reading the migration stream; the 'postcopy advise' and 'discard' commands
>>> +are processed to change the way RAM is managed, but don't affect the stream
>>> +processing.
>>> +
>>> +------------------------------------------------------------------------------
>>> +                                 1      2      3      4 5             6   7
>>> +main -----DISCARD-CMD_PACKAGED ( LISTEN DEVICE DEVICE DEVICE      RUN )
>>> +thread                             |       |
>>> +                                   |     (page request)
>>> +                                   |        \___
>>> +                                   v            \
>>> +listen thread:                     --- page -- page -- page -- page -- page --
>>> +
>>> +                                   a   b        c
>>> +------------------------------------------------------------------------------
>>> +
>>> +On receipt of CMD_PACKAGED (1),
>>> +all the data associated with the package - the ( ... ) section in the
>>> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
>>> +recurses into qemu_loadvm_state_main to process the contents of the package
>>> +(2), which contains commands (3,6) and devices (4...).
>>> +
>>> +On receipt of 'postcopy listen' (3) (i.e. the 1st command in the package),
>>> +a new thread (a) is started that takes over servicing the migration stream,
>>> +while the main thread carries on loading the package. It loads normal
>>> +background page data (b), but if a fault happens during a device load (5),
>>> +the returned page (c) is loaded by the listen thread, allowing the main
>>> +thread's device load to carry on.
>>> +
>>> +The last thing in the CMD_PACKAGED is a 'RUN' command (6), letting the
>>> +destination CPUs start running.
>>> +At the end of the CMD_PACKAGED (7), the main thread returns to normal running
>>> +behaviour and is no longer used by migration, while the listen thread carries
>>> +on servicing page data until the end of migration.
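The two-thread destination structure above can be sketched with ordinary Python threading. This is a toy model, not QEMU code: the command list, the page queue, and the `ram` dict are stand-ins for the real migration stream, listen thread, and guest RAM.

```python
import threading
import queue

def listen_thread_fn(page_q, ram, done):
    """Listen thread (a): keeps servicing page data off the stream until
    migration is over and no pages remain queued."""
    while not done.is_set() or not page_q.empty():
        try:
            addr, data = page_q.get(timeout=0.1)
        except queue.Empty:
            continue
        ram[addr] = data            # place the page where it will be accessed

def load_package(package_cmds, page_q, ram):
    """Main thread: processes the package contents (LISTEN ... DEVICE ... RUN).
    On 'listen' it hands stream servicing to a new thread and carries on."""
    done = threading.Event()
    listener = None
    for cmd, _arg in package_cmds:
        if cmd == "listen":         # (3) start the listen thread
            listener = threading.Thread(target=listen_thread_fn,
                                        args=(page_q, ram, done))
            listener.start()
        elif cmd == "device":       # (4..) device loads; any fault would be
            pass                    # resolved by pages the listener places
        elif cmd == "run":          # (6) destination CPUs may start
            pass
    done.set()                      # (7) end of migration in this toy model
    if listener:
        listener.join()
```

The essential property mirrored here is that device loading and page reception proceed concurrently, so a device-load fault can be satisfied without the main thread touching the stream.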
>>> +
>>> +=== Postcopy states ===
>>> +
>>> +Postcopy moves through a series of states (see postcopy_state) from
>>> +ADVISE->LISTEN->RUNNING->END.
>>> +
>>> +   Advise: Set at the start of migration if postcopy is enabled, even
>>> +      if it hasn't had the start command; here the destination
>>> +      checks that its OS has the support needed for postcopy, and performs
>>> +      setup to ensure the RAM mappings are suitable for later postcopy.
>>> +      (Triggered by reception of the POSTCOPY_ADVISE command)
>>> +
>>> +   Listen: The first command in the package, POSTCOPY_LISTEN, switches
>>> +      the destination state to Listen and starts a new thread
>>> +      (the 'listen thread') which takes over the job of receiving
>>> +      pages off the migration stream, while the main thread carries
>>> +      on processing the blob. With this thread able to process page
>>> +      reception, the destination now 'sensitises' the RAM to detect
>>> +      any access to missing pages (on Linux using the 'userfault'
>>> +      system).
>>> +
>>> +   Running: POSTCOPY_RUN causes the destination to synchronise all
>>> +      state and start the CPUs and IO devices running. The main
>>> +      thread now finishes processing the migration package and
>>> +      carries on as it would for normal precopy migration
>>> +      (although it can't do the cleanup it would do as it
>>> +      finishes a normal migration).
>>> +
>>> +   End: The listen thread can now quit and perform the cleanup of migration
>>> +      state; the migration is now complete.
>>> +
>>> +=== Source side page maps ===
>>> +
>>> +The source side keeps two bitmaps during postcopy: the 'migration bitmap'
>>> +and the 'sent map'. The 'migration bitmap' is basically the same as in
>>> +the precopy case, and holds a bit to indicate that a page is 'dirty' -
>>> +i.e. needs sending. During the precopy phase this is updated as the CPU
>>> +dirties pages; however, during postcopy the CPUs are stopped and nothing
>>> +should dirty anything any more.
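The ADVISE->LISTEN->RUNNING->END progression is strictly forward, which can be captured in a small state machine. A toy Python sketch (the class and transition table are illustrative, not QEMU's postcopy_state code):

```python
from enum import Enum, auto

class PostcopyState(Enum):
    ADVISE = auto()    # set at start of migration if postcopy is enabled
    LISTEN = auto()    # listen thread started, RAM sensitised to faults
    RUNNING = auto()   # destination CPUs and IO devices running
    END = auto()       # listen thread done, migration complete

# Legal forward transitions, per the states described above.
_NEXT = {
    PostcopyState.ADVISE:  {PostcopyState.LISTEN},
    PostcopyState.LISTEN:  {PostcopyState.RUNNING},
    PostcopyState.RUNNING: {PostcopyState.END},
    PostcopyState.END:     set(),
}

class PostcopyStateMachine:
    def __init__(self):
        self.state = PostcopyState.ADVISE

    def advance(self, new):
        if new not in _NEXT[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new}")
        self.state = new
```

Encoding the transitions this way makes the "no skipping, no going back" property of the section explicit: e.g. jumping straight from ADVISE to RUNNING is rejected.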
>>> +
>>> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
>>> +has a bit set whenever a page is sent to the destination; during
>>> +the transition to postcopy mode it is masked against the migration bitmap
>>> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
>>> +have previously been sent but are now dirty again. This masked
>>> +sentmap is sent to the destination, which discards those now-dirty pages
>>> +before starting the CPUs.
>>> +
>>> +Note that the contents of the sentmap are sacrificed during the calculation
>>> +of the discard set and thus aren't valid once in postcopy. The dirtymap
>>> +is still valid and is used to ensure that no page is sent more than once. Any
>>> +request for a page that has already been sent is ignored. Duplicate requests
>>> +such as these can happen as a page is sent at about the same time the
>>> +destination accesses it.
>>>
>>
>
> .
>

--
Thanks,
Yang.
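The masking step described in the quoted text (sentmap &= migrationbitmap) amounts to intersecting "previously sent" with "dirty again". A toy sketch with plain bit lists standing in for QEMU's real bitmaps:

```python
def discard_set(sentmap, migration_bitmap):
    """Intersect the sent map with the migration (dirty) bitmap.

    A page must be discarded on the destination exactly when it was
    already sent (sentmap bit set) but has been dirtied again since
    (migration bitmap bit set). Returns the page indices to discard.
    """
    masked = [s & d for s, d in zip(sentmap, migration_bitmap)]
    return [page for page, bit in enumerate(masked) if bit]
```

For example, with pages 0, 1 and 3 sent and pages 1, 2 and 3 dirty, only pages 1 and 3 need discarding: page 0 was sent and is still clean, and page 2 was never sent in the first place.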