Date: Fri, 4 Jul 2014 09:35:46 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20140704083546.GC2425@work-vm>
References: <53A8DD80.7070905@cn.fujitsu.com> <20140701121248.GH2394@work-vm> <53B4D133.4060903@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [RFC] COLO HA Project proposal
To: "Dong, Eddie"
Cc: FNST-Gui Jianfeng, Hongyang Yang, "qemu-devel@nongnu.org", "kvm@vger.kernel.org"

* Dong, Eddie (eddie.dong@intel.com) wrote:
> > >
> > > I didn't quite understand a couple of things though, perhaps you can
> > > explain:
> > > 1) If we ignore the TCP sequence number problem, in an SMP machine
> > > don't we get other randomnesses - e.g. which core completes something
> > > first, or who wins a lock contention, so the output stream might not
> > > be identical - so do those normal bits of randomness cause the
> > > machines to flag as out-of-sync?
> >
> > It's about COLO agent, CCing Congyang, he can give the detailed
> > explanation.
> >
>
> Let me clarify on this issue.
> COLO didn't ignore the TCP sequence number, but uses a
> new implementation to make the sequence number to be best effort identical
> between the primary VM (PVM) and secondary VM (SVM). Likely, VMM has to synchronize
> the emulation of randomization number generation mechanism between the
> PVM and SVM, like the lock-stepping mechanism does.
>
> Furthermore, for long TCP connection, we can rely on the (on-demand) VM checkpoint to get the
> identical Sequence number both in PVM and SVM.

That wasn't really my question; I was worrying about other forms of randomness,
such as winners of lock contention, and other SMP non-determinisms, and I'm
also worried by what proportion of time the system can't recover
from a failure due to being unable to distinguish an SVM failure from
a randomness issue.

Dave

>
>
> Thanks, Eddie
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK