From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39785)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Yqf0E-0004Et-RI
	for qemu-devel@nongnu.org; Fri, 08 May 2015 05:55:31 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Yqf0D-0008PX-Qq
	for qemu-devel@nongnu.org; Fri, 08 May 2015 05:55:30 -0400
Date: Fri, 8 May 2015 10:55:16 +0100
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20150508095515.GC2126@work-vm>
References: <5539F4F6.10507@redhat.com> <5539F702.7040708@cn.fujitsu.com>
	<20150424085841.GC2139@work-vm> <553A070E.8030909@redhat.com>
	<553A0F18.4010501@cn.fujitsu.com> <553A0EA3.9080400@redhat.com>
	<20150427093741.GA15658@stefanha-thinkpad.redhat.com>
	<20150505152355.GN2126@work-vm>
	<20150508084250.GA11717@stefanha-thinkpad.redhat.com>
	<20150508093951.GA4318@noname.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150508093951.GA4318@noname.redhat.com>
Subject: Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's
	description
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Fam Zheng <famz@redhat.com>, qemu block <qemu-block@nongnu.org>, armbru@redhat.com, jcody@redhat.com, Jiang Yunhong <yunhong.jiang@intel.com>, Dong Eddie <eddie.dong@intel.com>, qemu devel <qemu-devel@nongnu.org>, Max Reitz <mreitz@redhat.com>, zhanghailiang <zhang.zhanghailiang@huawei.com>, Gonglei <arei.gonglei@huawei.com>, Stefan Hajnoczi <stefanha@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Yang Hongyang <yanghy@cn.fujitsu.com>, Lai Jiangshan <laijs@cn.fujitsu.com>

* Kevin Wolf (kwolf@redhat.com) wrote:
> Am 08.05.2015 um 10:42 hat Stefan Hajnoczi geschrieben:
> > On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote:
> > > > > 
> > > > > 
> > > > > On 24/04/2015 11:38, Wen Congyang wrote:
> > > > > >> > 
> > > > > >> > That can be done with drive-mirror.  But I think it's too early for that.
> > > > > > Do you mean use drive-mirror instead of quorum?
> > > > > 
> > > > > Only before starting up a new secondary.  Basically you do a migration
> > > > > with non-shared storage, and then start the secondary in colo mode.
> > > > > 
> > > > > But it's only for the failover case.  Quorum (or a new block/colo.c
> > > > > driver or filter) is fine for normal colo operation.
> > > > 
> > > > Perhaps this patch series should mirror the Secondary's disk to a Backup
> > > > Secondary so that the system can be protected very quickly after
> > > > failover.
> > > > 
> > > > I think anyone serious about fault tolerance would deploy a Backup
> > > > Secondary, otherwise the system cannot survive two failures unless a
> > > > human administrator is lucky/fast enough to set up a new Secondary.
> > > 
> > > I'd assumed that a higher level management layer would do the allocation
> > > of a new secondary after the first failover, so no human need be involved.
> > 
> > That doesn't help, after the first failover is too late even if it's
> > done by a program.  There should be no window during which the VM is
> > unprotected.
> > 
> > People who want fault tolerance care about 9s of availability.  The VM
> > must be protected on the new Primary as soon as the failover occurs,
> > otherwise this isn't a serious fault tolerance solution.
> 
> If you're worried about two failures in a row, why wouldn't you be
> worried about three in a row? I think if you really want more than one
> backup to be ready, you shouldn't go to two, but to n.

Agreed, if you did multiple secondaries you'd do 'n'.

But 1+2 does satisfy all but the most paranoid; and in particular it does
mean that if you want to take a host down for some maintenance you can
do it without worrying.

But, as I said in my reply to Stefan, doing more than 1+1 gets really hairy;
the combinations of failovers are much more complicated.

Dave
  1) As Stefan mentions, you get worried about the lack of protection after
the first failover;
> Kevin


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK