From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52956)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1YlG70-0001fn-EX for qemu-devel@nongnu.org;
    Thu, 23 Apr 2015 08:20:14 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1YlG6z-00015m-HO for qemu-devel@nongnu.org;
    Thu, 23 Apr 2015 08:20:10 -0400
Date: Thu, 23 Apr 2015 13:19:53 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20150423121953.GG2177@work-vm>
References: <5538B813.5090506@cn.fujitsu.com> <5538C3CC.9030902@redhat.com>
    <20150423101716.GF5289@noname.redhat.com> <5538CA77.4030708@redhat.com>
    <20150423104045.GG5289@noname.redhat.com> <5538CD0F.1060100@redhat.com>
    <20150423113631.GH5289@noname.redhat.com> <5538DD52.3020101@redhat.com>
    <20150423120533.GF2177@work-vm> <5538E174.9020201@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5538E174.9020201@redhat.com>
Subject: Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description
To: Paolo Bonzini
Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, armbru@redhat.com,
    jcody@redhat.com, Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz,
    Gonglei, Stefan Hajnoczi, Yang Hongyang, zhanghailiang

* Paolo Bonzini (pbonzini@redhat.com) wrote:
>
>
> On 23/04/2015 14:05, Dr. David Alan Gilbert wrote:
> > As presented, I don't see that there's any dynamic reconfiguration
> > on the primary side at the moment
>
> So that means the bdrv_start_replication and bdrv_stop_replication
> callbacks are more or less redundant, at least on the primary?
>
> In fact, who calls them?  Certainly nothing in this patch set...
> :)

In the main COLO set (I'm looking at the February version) there
are calls to them; 'stop_replication' is called at failover time.

Here is, I think, the later version:
http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg05391.html

Dave

>
> Paolo
>
> > - it starts up in the configuration with
> > the quorum(disk, NBD), and that's the way it stays throughout the fault-tolerant
> > setup; the primary doesn't start running until the secondary is connected.
> >
> > Similarly the secondary starts up in the configuration and stays that way;
> > the interesting question to me is what happens after a failure.
> >
> > If the secondary fails, then your primary is still quorum(disk, NBD) but
> > the NBD side is dead - so I don't think you need to do anything there
> > immediately.
> >
> > If the primary fails, and the secondary takes over, then a lot of the
> > stuff on the secondary now becomes redundant; does that stay the same
> > and just operate in some form of passthrough - or does it need to
> > change configuration?
> >
> > The hard part to me is how to bring it back into fault-tolerance now;
> > after a primary failure, the secondary now needs to morph into something
> > like a primary, and somehow you need to bring up a new secondary
> > and get that new secondary an image of the primary's current disk.

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
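
To make the failure cases quoted above a little more concrete, here is a rough toy model in Python. It is only a sketch of the behaviour described in this thread, not QEMU code: the class names, fields and methods (Primary, Secondary, secondary_connected, secondary_failed, failover) are all hypothetical and made up for illustration. The only things taken from the discussion are that the primary stays as quorum(disk, NBD) throughout, that it does not start running until the secondary is connected, and that 'stop_replication' is called at failover time.

```python
from dataclasses import dataclass


@dataclass
class Primary:
    """Toy model of the primary side (hypothetical, not QEMU code)."""
    # quorum(disk, NBD): fixed at startup, never reconfigured in this model
    children: tuple = ("disk", "NBD")
    nbd_alive: bool = True
    running: bool = False

    def secondary_connected(self) -> None:
        # The primary doesn't start running until the secondary is connected.
        self.running = True

    def secondary_failed(self) -> None:
        # Still quorum(disk, NBD); the NBD side is simply dead.
        # Nothing else needs to happen immediately.
        self.nbd_alive = False


@dataclass
class Secondary:
    """Toy model of the secondary side (hypothetical, not QEMU code)."""
    replicating: bool = True
    active: bool = False

    def failover(self) -> None:
        # Stands in for 'stop_replication' being called at failover time.
        # Whether the block configuration must also change afterwards is
        # the open question in the thread, so it is not modelled here.
        self.replicating = False
        self.active = True
```

The model deliberately stops at failover: how the secondary morphs into a primary and how a new secondary gets an image of the disk are left open, as they are in the discussion.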