From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44301)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1Z8Ora-00037K-7X for qemu-devel@nongnu.org;
    Fri, 26 Jun 2015 04:19:56 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1Z8OrW-0000IP-3R for qemu-devel@nongnu.org;
    Fri, 26 Jun 2015 04:19:54 -0400
Received: from [59.151.112.132] (port=51338 helo=heian.cn.fujitsu.com)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1Z8OrR-0000EH-FX for qemu-devel@nongnu.org;
    Fri, 26 Jun 2015 04:19:50 -0400
Message-ID: <558D0B18.4050008@cn.fujitsu.com>
Date: Fri, 26 Jun 2015 16:19:36 +0800
From: Yang Hongyang
MIME-Version: 1.0
References: <1434450415-11339-1-git-send-email-dgilbert@redhat.com>
    <1434450415-11339-2-git-send-email-dgilbert@redhat.com>
    <558CF559.9060208@cn.fujitsu.com> <558D04F0.5050904@huawei.com>
    <558D0681.9060904@cn.fujitsu.com> <20150626081004.GA2186@work-vm>
In-Reply-To: <20150626081004.GA2186@work-vm>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
To: "Dr. David Alan Gilbert"
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, zhanghailiang,
    quintela@redhat.com, liang.z.li@intel.com, peter.huangpeng@huawei.com,
    qemu-devel@nongnu.org, luis@cs.umu.se, amit.shah@redhat.com,
    pbonzini@redhat.com, david@gibson.dropbear.id.au

On 06/26/2015 04:10 PM, Dr. David Alan Gilbert wrote:
> * Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
>>
>>
>> On 06/26/2015 03:53 PM, zhanghailiang wrote:
>>> On 2015/6/26 14:46, Yang Hongyang wrote:
>>>> Hi Dave,
>>>>
>>>> On 06/16/2015 06:26 PM, Dr. David Alan Gilbert (git) wrote:
>>>>> From: "Dr. David Alan Gilbert"
>>>>>
>>>> [...]
>>>>> += Postcopy =
>>>>> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
>>>>> +its plus side is that there is an upper bound on the amount of migration traffic
>>>>> +and time it takes, the down side is that during the postcopy phase, a failure of
>>>>> +*either* side or the network connection causes the guest to be lost.
>>>>> +
>>>>> +In postcopy the destination CPUs are started before all the memory has been
>>>>> +transferred, and accesses to pages that are yet to be transferred cause
>>>>> +a fault that's translated by QEMU into a request to the source QEMU.
>>>>
>>>> I have an immature idea:
>>>> Can we keep a source RAM cache on the destination QEMU, instead of sending
>>>> requests to the source QEMU? That is:
>>>>  - When start_postcopy is issued, the source will pause, and __open another
>>>>    socket (maybe another migration thread)__ to send the remaining dirty
>>>>    pages to the destination; at the same time, the destination will start,
>>>>    and cache the remaining pages.
>>>
>>> Er, it seems that the current implementation is just like what you described,
>>> except for the RAM cache:
>>> After switching to post-copy mode, the source side will send the remaining
>>> dirty pages as in pre-copy.
>>> Here it does not need any cache at all; it just places the dirty pages where
>>> they will be accessed.
>
> Yes, zhanghailiang is correct; the source keeps sending other pages without being asked,
> however when asked it sends requested pages immediately, and the 'cache' is just
> the main memory from which the destination is working.
>
> However, the idea of using a separate socket is one that we have been thinking
> about; one of the problems is that the urgent requested pages get delayed behind
> the background page transfer and that increases the latency; a separate socket
> should fix that.

That would be better.

>
>> I haven't looked into the implementation in detail, but if that is the case,
>> I think it should be documented here...or in the section below
>> [Source behaviour]
>
> Yes, I can add to the documentation; I've added the following text:
>
>   During postcopy the source scans the list of dirty pages and sends them
>   to the destination without being requested (in much the same way as precopy),
>   however when a page request is received from the destination the dirty page
>   scanning restarts from the requested location.  This causes requested pages
>   to be sent quickly, and also causes pages directly after the requested page
>   to be sent quickly in the hope that those pages are likely to be requested
>   by the destination soon.

Looks clearer to me now :)

>
> Dave
>
>>>
>>>>  - When a page fault occurs, first look up the page in the CACHE; if it has
>>>>    not yet been received, request it from the source QEMU.
>>>>  - Once the remaining dirty pages are transferred, the source QEMU can exit.
>>>>
>>>> The existing postcopy mechanism does not need to be changed; just add the
>>>> remaining-page transfer mechanism and the RAM cache.
>>>>
>>>> I don't know if it is feasible or whether it will bring any improvement to
>>>> postcopy. What do you think?
>>>>
>>>>> +
>>>>> +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
>>>>> +doesn't finish in a given time the switch is made to postcopy.
>>>>> +
>>>>> +=== Enabling postcopy ===
>>>>> +
>>>>> +To enable postcopy (prior to the start of migration):
>>>>> +
>>>>> +migrate_set_capability x-postcopy-ram on
>>>>> +
>>>>> +The migration will still start in precopy mode, however issuing:
>>>>> +
>>>>> +migrate_start_postcopy
>>>>> +
>>>>> +will now cause the transition from precopy to postcopy.
>>>>> +It can be issued immediately after migration is started or any
>>>>> +time later on.  Issuing it after the end of a migration is harmless.
>>>>> +
>>>>> +=== Postcopy device transfer ===
>>>>> +
>>>>> +Loading of device data may cause the device emulation to access guest RAM
>>>>> +that may trigger faults that have to be resolved by the source, as such
>>>>> +the migration stream has to be able to respond with page data *during* the
>>>>> +device load, and hence the device data has to be read from the stream completely
>>>>> +before the device load begins to free the stream up.  This is achieved by
>>>>> +'packaging' the device data into a blob that's read in one go.
>>>>> +
>>>>> +Source behaviour
>>>>> +
>>>>> +Until postcopy is entered the migration stream is identical to normal
>>>>> +precopy, except for the addition of a 'postcopy advise' command at
>>>>> +the beginning, to tell the destination that postcopy might happen.
>>>>> +When postcopy starts the source sends the page discard data and then
>>>>> +forms the 'package' containing:
>>>>> +
>>>>> +   Command: 'postcopy listen'
>>>>> +   The device state
>>>>> +      A series of sections, identical to the precopy stream's device state stream
>>>>> +      containing everything except postcopiable devices (i.e. RAM)
>>>>> +   Command: 'postcopy run'
>>>>> +
>>>>> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
>>>>> +contents are formatted in the same way as the main migration stream.
>>>>> +
>>>>> +Destination behaviour
>>>>> +
>>>>> +Initially the destination looks the same as precopy, with a single thread
>>>>> +reading the migration stream; the 'postcopy advise' and 'discard' commands
>>>>> +are processed to change the way RAM is managed, but don't affect the stream
>>>>> +processing.
>>>>> +
>>>>> +------------------------------------------------------------------------------
>>>>> +                        1      2     3     4 5                      6   7
>>>>> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE  DEVICE RUN  )
>>>>> +thread                             |       |
>>>>> +                                   |     (page request)
>>>>> +                                   |        \___
>>>>> +                                   v            \
>>>>> +listen thread:                     --- page -- page -- page -- page -- page --
>>>>> +
>>>>> +                                   a   b        c
>>>>> +------------------------------------------------------------------------------
>>>>> +
>>>>> +On receipt of CMD_PACKAGED (1)
>>>>> +   All the data associated with the package - the ( ... ) section in the
>>>>> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
>>>>> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
>>>>> +which contains commands (3,6) and devices (4...)
>>>>> +
>>>>> +On receipt of 'postcopy listen' - 3 - (i.e. the 1st command in the package)
>>>>> +a new thread (a) is started that takes over servicing the migration stream,
>>>>> +while the main thread carries on loading the package.  It loads normal
>>>>> +background page data (b) but if during a device load a fault happens (5) the
>>>>> +returned page (c) is loaded by the listen thread allowing the main thread's
>>>>> +device load to carry on.
>>>>> +
>>>>> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
>>>>> +CPUs start running.
>>>>> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
>>>>> +and is no longer used by migration, while the listen thread carries
>>>>> +on servicing page data until the end of migration.
>>>>> +
>>>>> +=== Postcopy states ===
>>>>> +
>>>>> +Postcopy moves through a series of states (see postcopy_state) from
>>>>> +ADVISE->LISTEN->RUNNING->END
>>>>> +
>>>>> +  Advise: Set at the start of migration if postcopy is enabled, even
>>>>> +          if it hasn't had the start command; here the destination
>>>>> +          checks that its OS has the support needed for postcopy, and performs
>>>>> +          setup to ensure the RAM mappings are suitable for later postcopy.
>>>>> +          (Triggered by reception of POSTCOPY_ADVISE command)
>>>>> +
>>>>> +  Listen: The first command in the package, POSTCOPY_LISTEN, switches
>>>>> +          the destination state to Listen, and starts a new thread
>>>>> +          (the 'listen thread') which takes over the job of receiving
>>>>> +          pages off the migration stream, while the main thread carries
>>>>> +          on processing the blob.  With this thread able to process page
>>>>> +          reception, the destination now 'sensitises' the RAM to detect
>>>>> +          any access to missing pages (on Linux using the 'userfault'
>>>>> +          system).
>>>>> +
>>>>> +  Running: POSTCOPY_RUN causes the destination to synchronise all
>>>>> +          state and start the CPUs and IO devices running.  The main
>>>>> +          thread now finishes processing the migration package and
>>>>> +          now carries on as it would for normal precopy migration
>>>>> +          (although it can't do the cleanup it would do as it
>>>>> +          finishes a normal migration).
>>>>> +
>>>>> +  End: The listen thread can now quit, and perform the cleanup of migration
>>>>> +          state, the migration is now complete.
>>>>> +
>>>>> +=== Source side page maps ===
>>>>> +
>>>>> +The source side keeps two bitmaps during postcopy; 'the migration bitmap'
>>>>> +and 'sent map'.  The 'migration bitmap' is basically the same as in
>>>>> +the precopy case, and holds a bit to indicate that page is 'dirty' -
>>>>> +i.e. needs sending.
>>>>> +During the precopy phase this is updated as the CPU
>>>>> +dirties pages, however during postcopy the CPUs are stopped and nothing
>>>>> +should dirty anything any more.
>>>>> +
>>>>> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
>>>>> +has a bit set whenever a page is sent to the destination, however during
>>>>> +the transition to postcopy mode it is masked against the migration bitmap
>>>>> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
>>>>> +have previously been sent but are now dirty again.  This masked
>>>>> +sentmap is sent to the destination which discards those now dirty pages
>>>>> +before starting the CPUs.
>>>>> +
>>>>> +Note that the contents of the sentmap are sacrificed during the calculation
>>>>> +of the discard set and thus aren't valid once in postcopy.  The dirtymap
>>>>> +is still valid and is used to ensure that no page is sent more than once.  Any
>>>>> +request for a page that has already been sent is ignored.  Duplicate requests
>>>>> +such as this can happen as a page is sent at about the same time the
>>>>> +destination accesses it.
>>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>> --
>> Thanks,
>> Yang.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
>

-- 
Thanks,
Yang.
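
For reference, a minimal Python sketch of the source-side bookkeeping described in the quoted 'Source side page maps' text and in Dave's added paragraph on dirty page scanning. It is a toy model and not QEMU code: the names discard_set, PostcopySource, request and send_next are hypothetical, pages are plain integers, and the bitmaps are modelled as Python sets. The wrap-around of the scan is also an assumption; the thread only says that scanning restarts from the requested location.

```python
def discard_set(sent_map, migration_bitmap):
    """Pages already sent that are dirty again (sentmap &= migrationbitmap)."""
    return sent_map & migration_bitmap


class PostcopySource:
    """Hypothetical model of the source during the postcopy phase."""

    def __init__(self, num_pages, dirty):
        self.num_pages = num_pages
        self.dirty = set(dirty)  # 'migration bitmap': pages still to send
        self.cursor = 0          # where the background scan continues

    def request(self, page):
        """Handle a page request from the destination.

        A request for a page that is no longer dirty (already sent) is
        ignored; otherwise the scan restarts at the requested page.
        """
        if page not in self.dirty:
            return False
        self.cursor = page
        return True

    def send_next(self):
        """Send the next dirty page at or after the cursor (assumed to wrap)."""
        for offset in range(self.num_pages):
            page = (self.cursor + offset) % self.num_pages
            if page in self.dirty:
                self.dirty.discard(page)  # cleared, so never sent twice
                self.cursor = (page + 1) % self.num_pages
                return page
        return None  # nothing left to send
```

With sent pages {0, 1, 2, 3} and dirty pages {2, 3, 5, 6, 7}, the discard set is {2, 3}. Calling request(6) after the first background page moves the scan so that 6 and then 7 go out next, and a second request(6) is ignored because the page is no longer marked dirty.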