Date: Fri, 8 May 2015 10:34:10 +0100
From: "Dr. David Alan Gilbert"
To: Stefan Hajnoczi
Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, armbru@redhat.com, jcody@redhat.com, Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, Gonglei, Paolo Bonzini, Yang Hongyang, zhanghailiang
Subject: Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description
Message-ID: <20150508093409.GA2126@work-vm>
In-Reply-To: <20150508084250.GA11717@stefanha-thinkpad.redhat.com>

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote:
> > > >
> > > >
> > > > On 24/04/2015 11:38, Wen Congyang wrote:
> > > > >>
> > > > > >> > That can be done with drive-mirror. But I think it's too early for that.
> > > > > Do you mean use drive-mirror instead of quorum?
> > > >
> > > > Only before starting up a new secondary.
> > > > Basically you do a migration with non-shared storage, and then start
> > > > the secondary in colo mode.
> > > >
> > > > But it's only for the failover case. Quorum (or a new block/colo.c
> > > > driver or filter) is fine for normal colo operation.
> > >
> > > Perhaps this patch series should mirror the Secondary's disk to a Backup
> > > Secondary so that the system can be protected very quickly after
> > > failover.
> > >
> > > I think anyone serious about fault tolerance would deploy a Backup
> > > Secondary, otherwise the system cannot survive two failures unless a
> > > human administrator is lucky/fast enough to set up a new Secondary.
> >
> > I'd assumed that a higher level management layer would do the allocation
> > of a new secondary after the first failover, so no human need be involved.
>
> That doesn't help, after the first failover is too late even if it's
> done by a program. There should be no window during which the VM is
> unprotected.
>
> People who want fault tolerance care about 9s of availability. The VM
> must be protected on the new Primary as soon as the failover occurs,
> otherwise this isn't a serious fault tolerance solution.

I'm not aware of any other system that manages that, so I don't think
that's fair. You gain a lot more availability going from a single
system to the 1+1 system that COLO (or any of the checkpointing systems)
proposes; I can't say how many 9s it gets you. It's true that having
multiple secondaries would get you a bit more on top of that, but you're
still a lot better off with just the one secondary than with none.

I had thought that supporting more than one secondary would be a nice
addition, but it's a big change everywhere else (e.g. maintaining
multiple migration streams and dealing with miscompares from multiple
hosts).

Dave

>
> Stefan

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK