Message-ID: <540E7673.1030902@cn.fujitsu.com>
Date: Tue, 9 Sep 2014 11:39:31 +0800
From: Hongyang Yang
References: <1409238244-31720-1-git-send-email-dgilbert@redhat.com> <1409238244-31720-4-git-send-email-dgilbert@redhat.com>
In-Reply-To: <1409238244-31720-4-git-send-email-dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works.
To: "Dr. David Alan Gilbert (git)", qemu-devel@nongnu.org
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com, amit.shah@redhat.com, lilei@linux.vnet.ibm.com

On 08/28/2014 11:03 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert"
>
> Signed-off-by: Dr. David Alan Gilbert
> ---
>  docs/migration.txt | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 188 insertions(+)
>
> diff --git a/docs/migration.txt b/docs/migration.txt
> index 0492a45..7f0fdc4 100644
> --- a/docs/migration.txt
> +++ b/docs/migration.txt
> @@ -294,3 +294,191 @@ save/send this state when we are in the middle of a pio operation
>  (that is what ide_drive_pio_state_needed() checks).
>  If DRQ_STAT is
>  not enabled, the values on that fields are garbage and don't need to
>  be sent.
> +
> += Return path =
> +
> +In most migration scenarios there is only a single data path that runs
> +from the source VM to the destination, typically along a single fd (although
> +possibly with another fd or similar for some fast way of throwing pages across).
> +
> +However, some uses need two way communication; in particular the Postcopy destination
> +needs to be able to request pages on demand from the source.
> +
> +For these scenarios there is a 'return path' from the destination to the source;
> +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
> +path.
> +
> +  Source side
> +     Forward path - written by migration thread
> +     Return path  - opened by main thread, read by return-path thread
> +
> +  Destination side
> +     Forward path - read by main thread
> +     Return path  - opened by main thread, written by main thread AND postcopy
> +                    thread (protected by rp_mutex)
> +
> += Postcopy =
> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
> +its plus side is that there is an upper bound on the amount of migration traffic
> +and time it takes, the down side is that during the postcopy phase, a failure of
> +*either* side or the network connection causes the guest to be lost.
> +
> +In postcopy the destination CPUs are started before all the memory has been
> +transferred, and accesses to pages that are yet to be transferred cause
> +a fault that's translated by QEMU into a request to the source QEMU.
> +
> +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
> +doesn't finish in a given time the switch is automatically made to precopy.

I think you mean "automatically made to postcopy" here?

> +
> +=== Enabling postcopy ===
> +
> +To enable postcopy (prior to the start of migration):
> +
> +migrate_set_capability x-postcopy-ram on
> +
> +The migration will still start in precopy mode, however issuing:
> +
> +migrate_start_postcopy
> +
> +will now cause the transition from precopy to postcopy.
> +It can be issued immediately after migration is started or any
> +time later on.  Issuing it after the end of a migration is harmless.
> +
> +=== Postcopy device transfer ===
> +
> +Loading of device data may cause the device emulation to access guest RAM
> +that may trigger faults that have to be resolved by the source, as such
> +the migration stream has to be able to respond with page data *during* the
> +device load, and hence the device data has to be read from the stream completely
> +before the device load begins to free the stream up.  This is achieved by
> +'packaging' the device data into a blob that's read in one go.
> +
> +Source behaviour
> +
> +Until postcopy is entered the migration stream is identical to normal postcopy,
> +except for the addition of a 'postcopy advise' command at the beginning to
> +let the destination know that postcopy might happen.  When postcopy starts

A comma here?

> +the source sends the page discard data and then forms the 'package' containing:
> +
> +   Command: 'postcopy ram listen'
> +   The device state
> +      A series of sections, identical to the precopy streams device state stream
> +      containing everything except postcopiable devices (i.e. RAM)
> +   Command: 'postcopy ram run'
> +
> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
> +contents are formatted in the same way as the main migration stream.
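
To check my understanding of the package layout described above, here is a toy
Python sketch of how the source might nest the inner stream inside a single
CMD_PACKAGED record. The framing (1-byte type, 4-byte big-endian length) and
all the numeric constants are invented for illustration; the patch does not
specify the wire format here, and none of these names are QEMU APIs.

```python
import struct

# Hypothetical record types; the values are invented for this sketch.
CMD_POSTCOPY_RAM_LISTEN = 1
CMD_POSTCOPY_RAM_RUN = 2
SECTION_DEVICE = 3
CMD_PACKAGED = 4

# Assumed framing: one type byte followed by a 32-bit payload length.
HEADER = struct.Struct(">BI")


def record(kind, payload=b""):
    """Frame one record as header + payload."""
    return HEADER.pack(kind, len(payload)) + payload


def build_package(device_sections):
    """Wrap LISTEN, the device sections and RUN in one CMD_PACKAGED record."""
    inner = record(CMD_POSTCOPY_RAM_LISTEN)
    for section in device_sections:
        inner += record(SECTION_DEVICE, section)
    inner += record(CMD_POSTCOPY_RAM_RUN)
    return record(CMD_PACKAGED, inner)


def read_records(buf):
    """Split a buffer into (kind, payload) pairs using the same framing."""
    records = []
    offset = 0
    while offset < len(buf):
        kind, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        records.append((kind, buf[offset:offset + length]))
        offset += length
    return records
```

Because the outer record carries the length of the whole blob, the destination
in this sketch can pull the package off the stream in one read and then parse
the inner records from memory with the same parser, which is how I read "the
contents are formatted in the same way as the main migration stream".
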
> +
> +Destination behaviour
> +
> +Initially the destination looks the same as precopy, with a single thread
> +reading the migration stream; the 'postcopy advise' and 'discard' commands
> +are processed to change the way RAM is managed, but don't affect the stream
> +processing.
> +
> +------------------------------------------------------------------------------
> +                        1      2   3     4 5                      6   7
> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN  )
> +thread                             |       |
> +                                   |     (page request)
> +                                   |        \___
> +                                   v            \
> +listen thread:                     --- page -- page -- page -- page -- page --
> +
> +                                   a   b        c
> +------------------------------------------------------------------------------
> +
> +On receipt of CMD_PACKAGED (1)
> +   All the data associated with the package - the ( ... ) section in the
> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
> +which contains commands (3,6) and devices (4...)
> +
> +On receipt of 'postcopy ram listen' - 3 -(i.e. the 1st command in the package)
> +a new thread (a) is started that takes over servicing the migration stream,
> +while the main thread carries on loading the package.   It loads normal
> +background page data (b) but if during a device load a fault happens (5) the
> +returned page (c) is loaded by the listen thread allowing the main threads
> +device load to carry on.
> +
> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
> +CPUs start running.
> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
> +and is no longer used by migration, while the listen thread carries
> +on servicing page data until the end of migration.
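
For anyone else following the diagram, this is a small Python model of how I
read steps 1 to 7 on the destination. It is only a sketch under my own
assumptions: the package is a list of tuples already held in memory, the
"source" answers a page request immediately, and load_package, request_page
and the string tags are names I made up, not anything from the patch.

```python
import queue
import threading


def load_package(package, background_pages, timeout=5.0):
    """Toy model of the destination main thread processing a package.

    package: list of (kind, payload) tuples already read into memory.
    background_pages: page numbers the source sends without being asked.
    Returns (log of main-thread steps, set of pages received).
    """
    stream = queue.Queue()  # stands in for the forward migration stream
    for page in background_pages:
        stream.put(page)

    received = set()
    cond = threading.Condition()
    log = []

    def listen_thread():
        # Services page data from the stream until the end marker (None).
        while True:
            page = stream.get()
            if page is None:
                return
            with cond:
                received.add(page)
                cond.notify_all()

    def request_page(page):
        # Stand-in for a return-path request; the "source" answers at once
        # by putting the page on the forward stream.
        stream.put(page)

    listener = None
    for kind, payload in package:
        if kind == "LISTEN":
            # (3)/(a): a new thread takes over servicing the stream.
            listener = threading.Thread(target=listen_thread, daemon=True)
            listener.start()
            log.append("listen")
        elif kind == "DEVICE":
            name, page_touched = payload
            if page_touched is not None:
                # (5): the device load touches guest RAM and may fault.
                with cond:
                    if page_touched not in received:
                        request_page(page_touched)
                    arrived = cond.wait_for(
                        lambda: page_touched in received, timeout=timeout)
                if not arrived:
                    raise RuntimeError("page %d never arrived" % page_touched)
            log.append("device:" + name)
        elif kind == "RUN":
            # (6): the destination CPUs would start here.
            log.append("run")

    stream.put(None)  # end of migration
    if listener is not None:
        listener.join()
    return log, received
```

In this model a device load that faults before LISTEN has been processed just
times out, because nothing is reading the stream. That seems to be the reason
the whole package has to be in memory before the device load begins.
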
> +
> +=== Postcopy states ===
> +
> +Postcopy moves through a series of states (see postcopy_ram_state)
> +from ADVISE->LISTEN->RUNNING->END
> +
> +  Advise: Set at the start of migration if postcopy is enabled, even
> +          if it hasn't had the start command; here the destination
> +          checks that its OS has the support needed for postcopy, and performs
> +          setup to ensure the RAM mappings are suitable for later postcopy.
> +          (Triggered by reception of POSTCOPY_RAM_ADVISE command)
> +
> +  Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches
> +          the destination state to Listen, and starts a new thread
> +          (the 'listen thread') which takes over the job of receiving
> +          pages off the migration stream, while the main thread carries
> +          on processing the blob.  With this thread able to process page
> +          reception, the destination now 'sensitises' the RAM to detect
> +          any access to missing pages (on Linux using the 'userfault'
> +          system).
> +
> +  Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
> +          state and start the CPUs and IO devices running.  The main
> +          thread now finishes processing the migration package and
> +          now carries on as it would for normal precopy migration
> +          (although it can't do the cleanup it would do as it
> +          finishes a normal migration).
> +
> +  End: The listen thread can now quit, and perform the cleanup of migration
> +          state, the migration is now complete.
> +
> +=== Source side page maps ===
> +
> +The source side keeps two bitmaps during postcopy; 'the migration bitmap'
> +and 'sent map'.  The 'migration bitmap' is basically the same as in
> +the precopy case, and holds a bit to indicate that page is 'dirty' -
> +i.e. needs sending.  During the precopy phase this is updated as the CPU
> +dirties pages, however during postcopy the CPUs are stopped and nothing
> +should dirty anything any more.
> +
> +The 'sent map' is used for the transition to postcopy.
> +It is a bitmap that
> +has a bit set whenever a page is sent to the destination, however during
> +the transition to postcopy mode it is masked against the migration bitmap
> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
> +have been previously been sent but are now dirty again.  This masked
> +sentmap is sent to the destination which discards those now dirty pages
> +before starting the CPUs.
> +
> +Note that once in postcopy mode, the sent map is still updated; however,
> +its contents are not necessarily consistent with the pages already sent
> +due to the masking with the migration bitmap.
> +
> +=== Destination side page maps ===
> +
> +(Needs to be changed so we can update both easily - at the moment updates are done
> + with a lock)
> +The destination keeps a 'requested map' and a 'received map'.
> +Both maps are initially 0, as pages are received the bits are set in 'received map'.
> +Incoming requests from the kernel cause the bit to be set in the 'requested map'.
> +When a page is received that is marked as 'requested' the kernel is notified.
> +If the kernel requests a page that has already been 'received' the kernel is notified
> +without re-requesting.
> +
> +This leads to three valid page states:
> +page states:
> +    missing (!rc,!rq)  - page not yet received or requested
> +    received (rc,!rq)  - Page received
> +    requested (!rc,rq) - page requested but not yet received
> +
> +state transitions:
> +      received -> missing   (only during setup/discard)
> +
> +      missing -> received   (normal incoming page)
> +      requested -> received (incoming page previously requested)
> +      missing -> requested  (userfault request)
> +
>

-- 
Thanks,
Yang.
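
P.S. To check my reading of the two page-map sections, I wrote the small
Python model below. It uses plain integers as bitmaps. The function and class
names are mine, not from the patch, and the handling of a second fault on an
already requested page (no second request is sent) is my assumption, since the
text does not say.

```python
def pages_to_discard(sentmap, migration_bitmap):
    """Source side: pages already sent that are dirty again.

    Mirrors the 'sentmap &= migrationbitmap' step quoted above.
    """
    return sentmap & migration_bitmap


class PageMaps:
    """Destination side 'received' (rc) and 'requested' (rq) maps."""

    def __init__(self):
        self.received = 0
        self.requested = 0

    def state(self, page):
        bit = 1 << page
        if self.received & bit:
            return "received"
        if self.requested & bit:
            return "requested"
        return "missing"

    def on_fault(self, page):
        """Kernel asks for a page; True means a request goes to the source."""
        bit = 1 << page
        if self.received & bit:
            return False  # already here: notify the kernel, no re-request
        already_requested = bool(self.requested & bit)
        self.requested |= bit  # missing -> requested
        return not already_requested  # assumption: never request twice

    def on_page(self, page):
        """A page arrives; True means the kernel was waiting for it."""
        bit = 1 << page
        was_requested = bool(self.requested & bit)
        self.received |= bit
        self.requested &= ~bit  # keeps (rc,rq) out of the invalid state
        return was_requested

    def discard(self, mask):
        """Setup/discard only: received -> missing for each page in mask."""
        self.received &= ~mask
```

With sentmap 0b1110 and migration bitmap 0b0110, pages 1 and 2 are the ones the
destination would be told to discard. On the destination side the model never
reaches (rc,rq), which matches the three valid states listed in the patch.
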