Message-ID: <53BAB466.8060609@redhat.com>
Date: Mon, 07 Jul 2014 16:53:26 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <1404495717-4239-1-git-send-email-dgilbert@redhat.com>
	<1404495717-4239-16-git-send-email-dgilbert@redhat.com>
	<53B7D2B8.7000002@redhat.com> <20140707143539.GB3443@work-vm>
In-Reply-To: <20140707143539.GB3443@work-vm>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com, qemu-devel@nongnu.org, lilei@linux.vnet.ibm.com

On 07/07/2014 16:35, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> On 04/07/2014 19:41, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>>> Postcopy needs to have two migration streams loading concurrently;
>>> one from memory (with the device state) and the other from the fd
>>> with the memory transactions.
>>
>> Can you explain this?
>>
>> I would have thought the order is:
>>
>>     precopy RAM and everything
>>     prepare postcopy RAM ("sent && dirty" bitmap)
>>     finish precopy non-RAM
>>     finish devices
>>     postcopy RAM
>>
>> Why do you need to have all the packaging stuff and a separate memory-based
>> migration stream for devices?  I'm sure I'm missing something. :)
>
> The thing you're missing is the details of 'finish devices'.
> The device emulation may access guest memory as part of loading its
> state, so you can't successfully complete 'finish devices' without
> having the 'postcopy RAM' available to provide pages.

I see.  Can you document the flow (preferably as a reply to this email
_and_ in docs/ when you send v2 of the code :))?

From my cursory read of the code, it is something like this on the source:

     finish precopy non-RAM
     start RAM postcopy
     for each device
         pack up data
         send it to destination
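
A minimal C sketch of the "pack up data / send it to destination" step, assuming a simple framing of a 4-byte big-endian length followed by the serialized device state. The state is first written into an in-memory buffer so its size is known before anything goes down the fd. `MemBuf`, `membuf_put`, `write_full` and `send_packet` are hypothetical names for illustration, not QEMU's API, and the series' real framing may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical growable in-memory buffer: device state is serialized
 * here first, so its size is known before anything is written to the fd. */
typedef struct {
    uint8_t *data;
    size_t len;
    size_t cap;
} MemBuf;

/* Append n bytes, doubling capacity as needed. Returns 0, or -1 on OOM. */
static int membuf_put(MemBuf *b, const void *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t ncap = b->cap ? b->cap : 64;
        while (ncap < b->len + n) {
            ncap *= 2;
        }
        uint8_t *p = realloc(b->data, ncap);
        if (!p) {
            return -1;
        }
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Write all n bytes, retrying on short writes and EINTR. */
static int write_full(int fd, const void *buf, size_t n)
{
    const uint8_t *p = buf;
    while (n > 0) {
        ssize_t r = write(fd, p, n);
        if (r < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Send one package: 4-byte big-endian length, then the buffered state
 * (hypothetical framing, chosen only for this sketch). */
static int send_packet(int fd, const MemBuf *b)
{
    assert(b->len <= UINT32_MAX);
    uint8_t hdr[4] = {
        (uint8_t)(b->len >> 24), (uint8_t)(b->len >> 16),
        (uint8_t)(b->len >> 8), (uint8_t)b->len,
    };
    if (write_full(fd, hdr, sizeof hdr) < 0) {
        return -1;
    }
    return write_full(fd, b->data, b->len);
}
```

Serializing into memory first is what makes the length prefix possible; the price is one extra copy of each device's state.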

and on the destination:

     while source sends packet
         pick up packet atomically
         pass the packet to device loader
             (while the loader works, userfaultfd does background magic)
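
On the destination, "pick up packet atomically" can be sketched as reading the whole length-prefixed package off the fd before any device code runs, so the loader parses from memory rather than consuming the stream itself. This assumes the same hypothetical 4-byte length framing; `read_full` and `receive_packet` are illustrative names, not QEMU functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Read exactly n bytes from fd. Returns 0, or -1 on error or early EOF. */
static int read_full(int fd, void *buf, size_t n)
{
    uint8_t *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0 && errno == EINTR) {
            continue;
        }
        if (r <= 0) {
            return -1;
        }
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Pick up one package in full (hypothetical framing: 4-byte big-endian
 * length, then payload). Returns a malloc'd copy of the payload and stores
 * its size in *out_len, or NULL on error/EOF. The caller then hands the
 * buffer to the device loader, which never reads from fd itself. */
static uint8_t *receive_packet(int fd, size_t *out_len)
{
    assert(out_len != NULL);
    uint8_t hdr[4];
    if (read_full(fd, hdr, sizeof hdr) < 0) {
        return NULL;
    }
    size_t len = ((size_t)hdr[0] << 24) | ((size_t)hdr[1] << 16) |
                 ((size_t)hdr[2] << 8) | (size_t)hdr[3];
    uint8_t *data = malloc(len ? len : 1);
    if (!data) {
        return NULL;
    }
    if (read_full(fd, data, len) < 0) {
        free(data);
        return NULL;
    }
    *out_len = len;
    return data;
}
```

This only separates reading the package from loading it; whatever serves page requests while the loader runs is outside the sketch.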

But something is still missing: either some kind of ack is needed
between device data sends, or userfaultfd needs to be able to process
device data packets.

Paolo

> Thus you need to be able to start up 'postcopy RAM' before 'finish devices'
> has completed, and you can't do that if 'finish devices' is still stuffing
> data down the fd.
>
> Now, if hypothetically you had:
>   1) A migration format that let you separate out device state so that you
> could load all the state of the device off the fd without calling the device
> IO code.
>   2) All devices were good and didn't touch guest memory while loading their
> state.
>
> then you could avoid this complexity.  However, if you look at how Stefan's
> BER code tried to do (1) (which my approach doesn't attempt), it did so by
> using the same trick of stuffing the device data into a dummy memory file
> to find out the size of the data.  And I'm not convinced (2) will happen
> this century.
>
>> Paolo
>
> Dave
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>