From: Paolo Bonzini
Date: Thu, 13 Jun 2013 16:06:38 -0400
Message-ID: <51BA264E.4010701@redhat.com>
In-Reply-To: <51B9DD5C.1030409@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH v7 00/12] rdma: migration support
To: "Michael R. Hines"
Cc: Juan Jose Quintela Carreira, "qemu-devel@nongnu.org", Bulent Abali, "mrhines@us.ibm.com", Anthony Liguori, Chegu Vinod

On 13/06/2013 10:55, Michael R. Hines wrote:
> On 06/13/2013 10:26 AM, Chegu Vinod wrote:
>>>
>>> 1. start QEMU with the lock option *first*
>>> 2. Then enable x-rdma-pin-all
>>> 3. Then perform the migration
>>>
>>> What happens here? Does pinning "in advance" help you?
>>
>> Yes, it does help by avoiding the freeze time at the start of the
>> pin-all migration.
>>
>> I already mentioned this in my earlier responses as an option to
>> consider for larger guests
>> (https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00435.html).
>>
>> But pinning all of guest memory has a few drawbacks... as you may
>> already know.
>>
>> Just to be sure, I double-checked it again with your v7 bits.
>> I started a 64GB/10-VCPU guest (started qemu with the "-realtime
>> mlock=on" option) and, as expected, the guest startup took about 20
>> seconds longer (i.e. the time taken to mlock the 64GB of guest RAM),
>> but the pin-all migration started fine, i.e. I didn't observe any
>> freezes at the start of the migration.
>>
> (CC-ing qemu-devel.)
>
> OK, that's good to know. This means that we need to bring up the
> mlock() problem as a "larger" issue in the Linux community instead of
> the QEMU community.
>
> In the meantime, how about I update the RDMA patch to do one of the
> following?
>
> 1. Solution #1 (see the sketch below):
>    If the user requests "x-rdma-pin-all", then
>        if QEMU was started with "-realtime mlock=on",
>            allow the capability;
>        else
>            disallow the capability.
>
> 2. Solution #2: Create a NEW qemu monitor command which locks memory
>    *in advance*, before the migrate command is issued, to clearly
>    indicate to the user that the cost of locking memory must be paid
>    before the migration starts.
>
> Which solution do you prefer? Or do you have an alternative idea?
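A minimal sketch of the gating that Solution #1 describes. This is illustrative only: memory_is_locked, set_pin_all_capability, and the error path are assumed names for this sketch, not QEMU's actual code.

    /* Sketch only -- not QEMU source.  Gate the x-rdma-pin-all
     * capability on guest RAM already being mlock()ed at startup, so
     * the pinning cost is paid before migration, not during it. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool memory_is_locked;     /* would mirror -realtime mlock=on */
    static bool cap_x_rdma_pin_all;

    static bool set_pin_all_capability(bool value)
    {
        if (value && !memory_is_locked) {
            fprintf(stderr,
                    "x-rdma-pin-all requires -realtime mlock=on\n");
            return false;             /* disallow the capability */
        }
        cap_x_rdma_pin_all = value;   /* allow the capability */
        return true;
    }

    int main(void)
    {
        memory_is_locked = false;     /* started without mlock=on */
        /* The request is refused, so exit with status 1. */
        return set_pin_all_capability(true) ? 0 : 1;
    }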
Let's just document it in the release notes; there's time to fix it.

Regarding the timestamp problem, it should be fixed in the RDMA code.
You did find a bug, but xyz_start_outgoing_migration should be
asynchronous, and the pinning should happen in the setup phase. The
setup phase already runs outside the big QEMU lock, so the guest would
not be frozen.

I think the patches are ready for merging, because incremental work
makes it easier to discuss the changes (*), but you really need to do
two things before 1.6, or I would rather revert them (both are sketched
below):

(1) move the pinning to the setup phase;

(2) add a debug mode where every pass unpins all the memory and
restarts. Speed doesn't matter; this is so that the protocol supports
it from the beginning, and any caching heuristics need to be done on
the source side. Like all debug modes, it will be somewhat prone to
bitrot, but at least there is a reference implementation for anyone who
later wants to add caching.

I think (2) is very important so that, for example, during fault
tolerance you can reduce the pinned size a bit for smaller workloads,
even without ballooning.

(*) For example, why the introduction of acct_update_position? Is it a
fix for a bug that always existed, or was it driven by some other
changes?

Paolo
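A minimal sketch of points (1) and (2), assuming a pthread-based migration thread. migration_thread, rdma_pin_all_ram, rdma_unpin_all_ram, and debug_unpin_per_pass are hypothetical placeholders for this sketch, not QEMU's real identifiers.

    /* Sketch only -- not QEMU source. */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stddef.h>

    static pthread_mutex_t big_qemu_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool debug_unpin_per_pass;  /* hypothetical knob for point (2) */

    static void rdma_pin_all_ram(void)
    {
        /* would register every RAM block with ibv_reg_mr();
         * this can take many seconds on a large guest */
    }

    static void rdma_unpin_all_ram(void)
    {
        /* would deregister every RAM block with ibv_dereg_mr() */
    }

    static void *migration_thread(void *opaque)
    {
        (void)opaque;

        /* (1) Pin in the setup phase: it already runs outside the big
         * QEMU lock, so a multi-second pin-all no longer freezes the
         * guest. */
        rdma_pin_all_ram();

        bool done = false;
        while (!done) {
            /* Take the lock only for the short sections that touch
             * guest state. */
            pthread_mutex_lock(&big_qemu_lock);
            /* ... scan and send dirty pages; set done on convergence ... */
            done = true;
            pthread_mutex_unlock(&big_qemu_lock);

            /* (2) Debug mode: unpin everything and re-pin on the next
             * pass, proving the protocol tolerates dynamic
             * (de)registration from the beginning. */
            if (!done && debug_unpin_per_pass) {
                rdma_unpin_all_ram();
                rdma_pin_all_ram();
            }
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, migration_thread, NULL);
        pthread_join(t, NULL);
        return 0;
    }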