Date: Fri, 8 May 2015 10:55:16 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20150508095515.GC2126@work-vm>
References: <5539F4F6.10507@redhat.com> <5539F702.7040708@cn.fujitsu.com>
 <20150424085841.GC2139@work-vm> <553A070E.8030909@redhat.com>
 <553A0F18.4010501@cn.fujitsu.com> <553A0EA3.9080400@redhat.com>
 <20150427093741.GA15658@stefanha-thinkpad.redhat.com>
 <20150505152355.GN2126@work-vm>
 <20150508084250.GA11717@stefanha-thinkpad.redhat.com>
 <20150508093951.GA4318@noname.redhat.com>
In-Reply-To: <20150508093951.GA4318@noname.redhat.com>
Subject: Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description
To: Kevin Wolf
Cc: Fam Zheng, qemu block, armbru@redhat.com, jcody@redhat.com,
 Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, zhanghailiang,
 Gonglei, Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang, Lai Jiangshan

* Kevin Wolf (kwolf@redhat.com) wrote:
> On 08.05.2015 at 10:42, Stefan Hajnoczi wrote:
> > On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote:
> > > > >
> > > > >
> > > > > On 24/04/2015 11:38, Wen Congyang wrote:
> > > > > >> >
> > > > > >> > That can be done with drive-mirror. But I think it's too early for that.
> > > > > > Do you mean use drive-mirror instead of quorum?
> > > > >
> > > > > Only before starting up a new secondary. Basically you do a migration
> > > > > with non-shared storage, and then start the secondary in colo mode.
> > > > >
> > > > > But it's only for the failover case. Quorum (or a new block/colo.c
> > > > > driver or filter) is fine for normal colo operation.
> > > >
> > > > Perhaps this patch series should mirror the Secondary's disk to a Backup
> > > > Secondary so that the system can be protected very quickly after
> > > > failover.
> > > >
> > > > I think anyone serious about fault tolerance would deploy a Backup
> > > > Secondary, otherwise the system cannot survive two failures unless a
> > > > human administrator is lucky/fast enough to set up a new Secondary.
> > >
> > > I'd assumed that a higher level management layer would do the allocation
> > > of a new secondary after the first failover, so no human need be involved.
> >
> > That doesn't help, after the first failover is too late even if it's
> > done by a program. There should be no window during which the VM is
> > unprotected.
> >
> > People who want fault tolerance care about 9s of availability. The VM
> > must be protected on the new Primary as soon as the failover occurs,
> > otherwise this isn't a serious fault tolerance solution.
>
> If you're worried about two failures in a row, why wouldn't you be
> worried about three in a row? I think if you really want more than one
> backup to be ready, you shouldn't go to two, but to n.

Agreed, if you did multiple secondaries you'd do 'n'.

But 1+2 does satisfy all but the most paranoid; and in particular it does
mean that if you want to take a host down for some maintenance you can do
it without worrying.

But, as I said in my reply to Stefan, doing more than 1+1 gets really hairy;
the combinations of failovers are much more complicated.

Dave

  1) It means that
  1) As Stefan mentions you get worried about the lack of protection after
     the first failover;

> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK