From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <533E293E.5090305@linux.vnet.ibm.com>
Date: Fri, 04 Apr 2014 11:38:38 +0800
From: "Michael R. Hines"
MIME-Version: 1.0
References: <1392713429-18201-1-git-send-email-mrhines@linux.vnet.ibm.com>
 <1392713429-18201-12-git-send-email-mrhines@linux.vnet.ibm.com>
 <531F86D6.4080204@redhat.com>
In-Reply-To: <531F86D6.4080204@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH v2 11/12] mc: introduce new capabilities to control micro-checkpointing
To: Eric Blake, qemu-devel@nongnu.org
Cc: GILR@il.ibm.com, SADEKJ@il.ibm.com, pbonzini@redhat.com,
 quintela@redhat.com, abali@us.ibm.com, EREZH@il.ibm.com,
 owasserm@redhat.com, onom@us.ibm.com, hinesmr@cn.ibm.com,
 isaku.yamahata@gmail.com, gokul@us.ibm.com, dbulkow@gmail.com,
 junqing.wang@cs2c.com.cn, BIRAN@il.ibm.com, lig.fnst@cn.fujitsu.com,
 "Michael R. Hines"

On 03/12/2014 05:57 AM, Eric Blake wrote:
> ---
> qapi-schema.json | 36 +++++++++++++++++++++++++++++++++++-
> 1 file changed, 35 insertions(+), 1 deletion(-)
>
>> +# Only for performance testing. (Since 2.x)
>> +#
>> +# @mc-rdma-copy: MC requires creating a local-memory checkpoint before
>> +# transmission to the destination. This requires heavy use of
>> +# memcpy() which dominates the processor pipeline. This option
>> +# makes use of *local* RDMA to perform the copy instead of the CPU.
>> +# Enabled by default only if the migration transport is RDMA.
>> +# Disabled by default otherwise. (Since 2.x)
> How does that work? If I query migration capabilities before requesting
> a migration, what state am I going to read? Is there coupling where I
> would observe the state of this flag change merely because I did some
> other action? And if so, then how do I know that explicitly setting
> this flag won't be undone by similar coupling?
>
> It sounds like you are describing a tri-state option (unspecified so
> default to migration transport, explicitly disabled, explicitly
> enabled); but that doesn't work for something that only lists boolean
> capabilities. The only way around that is to have 2 separate
> capabilities (one on whether to base decision on transport or to honor
> override, and the other to provide the override value which is ignored
> when defaulting by transport).

Yes, now that I think about it, this 'tri-state' possibility is indeed
confusing to the management software. I'll stop this behavior and instead
require that it be manually enabled when needed.

>> +#
>> +# @rdma-keepalive: RDMA connections do not timeout by themselves if a peer
>> +# has disconnected prematurely or failed. User-level keepalives
>> +# allow the migration to abort cleanly if there is a problem with the
>> +# destination host. For debugging, this can be problematic as
>> +# the keepalive may cause the peer to abort prematurely if we are
>> +# at a GDB breakpoint, for example.
>> +# Enabled by default. (Since 2.x)
> Enabled-by-default is an interesting choice, but I suppose it is okay.

I'll rename the command to "rdma-disable-keepalive" and change the default
to "disabled".

- Michael
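
For readers following the tri-state discussion, here is a minimal sketch of the two-capability workaround Eric describes: one boolean says whether to honor an override, the other carries the override value. This is not QEMU code. The names `Capabilities`, `rdma_copy_override`, `rdma_copy_value`, and `effective_rdma_copy` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    """Hypothetical pair of boolean capabilities (illustrative names only)."""
    rdma_copy_override: bool = False  # False: follow the transport default
    rdma_copy_value: bool = False     # consulted only when the override is set


def effective_rdma_copy(caps: Capabilities, transport_is_rdma: bool) -> bool:
    """Resolve the effective setting under the two-capability scheme."""
    if caps.rdma_copy_override:
        # Explicitly enabled or disabled: the transport no longer matters.
        return caps.rdma_copy_value
    # Unspecified: fall back to the transport-dependent default.
    return transport_is_rdma


if __name__ == "__main__":
    # Unspecified, so the result follows the transport.
    print(effective_rdma_copy(Capabilities(), transport_is_rdma=True))
    # Explicitly disabled, even though the transport is RDMA.
    print(effective_rdma_copy(Capabilities(True, False), transport_is_rdma=True))
```

The simpler route chosen in the reply avoids the second boolean entirely: the capability stays a single flag that is off unless the user turns it on, so a capability query returns the same answer regardless of the transport.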