From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51352)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1YlFz9-0007Sv-Cs for qemu-devel@nongnu.org;
	Thu, 23 Apr 2015 08:12:09 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1YlFz4-0006mu-Ez for qemu-devel@nongnu.org;
	Thu, 23 Apr 2015 08:12:03 -0400
Message-ID: <5538E174.9020201@redhat.com>
Date: Thu, 23 Apr 2015 14:11:32 +0200
From: Paolo Bonzini
MIME-Version: 1.0
References: <5537742A.90504@redhat.com>
	<20150423090031.GB5289@noname.redhat.com>
	<5538B813.5090506@cn.fujitsu.com>
	<5538C3CC.9030902@redhat.com>
	<20150423101716.GF5289@noname.redhat.com>
	<5538CA77.4030708@redhat.com>
	<20150423104045.GG5289@noname.redhat.com>
	<5538CD0F.1060100@redhat.com>
	<20150423113631.GH5289@noname.redhat.com>
	<5538DD52.3020101@redhat.com>
	<20150423120533.GF2177@work-vm>
In-Reply-To: <20150423120533.GF2177@work-vm>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description
To: "Dr. David Alan Gilbert"
Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, armbru@redhat.com,
	jcody@redhat.com, Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz,
	Gonglei, Stefan Hajnoczi, Yang Hongyang, zhanghailiang

On 23/04/2015 14:05, Dr. David Alan Gilbert wrote:
> As presented at the moment, I don't see there's any dynamic reconfiguration
> on the primary side at the moment

So that means the bdrv_start_replication and bdrv_stop_replication
callbacks are more or less redundant, at least on the primary?

In fact, who calls them?  Certainly nothing in this patch set...
:)

Paolo

> - it starts up in the configuration with
> the quorum(disk, NBD), and that's the way it stays throughout the fault-tolerant
> setup; the primary doesn't start running until the secondary is connected.
>
> Similarly the secondary starts up in the configuration and stays that way;
> the interesting question to me is what happens after a failure.
>
> If the secondary fails, then your primary is still quorum(disk, NBD) but
> the NBD side is dead - so I don't think you need to do anything there
> immediately.
>
> If the primary fails, and the secondary takes over, then a lot of the
> stuff on the secondary now becomes redundant; does that stay the same
> and just operate in some form of passthrough - or does it need to
> change configuration?
>
> The hard part to me is how to bring it back into fault-tolerance now;
> after a primary failure, the secondary now needs to morph into something
> like a primary, and somehow you need to bring up a new secondary
> and get that new secondary an image of the primary's current disk.
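
For reference, the failure cases quoted above could be modelled roughly as
below. This is only a toy sketch to make the states explicit, not QEMU code:
the Pair class and all of its fields and methods are hypothetical names
invented for illustration, and the primary-failure case deliberately returns
the open question rather than an answer.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical model of a primary/secondary pair (not a QEMU structure)."""
    primary_alive: bool = True
    secondary_alive: bool = True
    nbd_connected: bool = False
    primary_running: bool = False

    def connect_secondary(self) -> None:
        # The primary doesn't start running until the secondary is connected.
        if self.primary_alive and self.secondary_alive:
            self.nbd_connected = True
            self.primary_running = True

    def fail_secondary(self) -> str:
        # Primary stays quorum(disk, NBD); only the NBD side is dead.
        self.secondary_alive = False
        self.nbd_connected = False
        return "primary keeps quorum(disk, NBD); nothing to do immediately"

    def fail_primary(self) -> str:
        # Secondary takes over; whether it reconfigures is the open question.
        self.primary_alive = False
        self.primary_running = False
        self.nbd_connected = False
        return "secondary takes over; passthrough or reconfigure?"

    def fault_tolerant(self) -> bool:
        # Fault tolerance needs both sides alive and the NBD link up.
        return self.primary_alive and self.secondary_alive and self.nbd_connected


pair = Pair()
pair.connect_secondary()
print(pair.fault_tolerant())
print(pair.fail_secondary())
print(pair.primary_running, pair.fault_tolerant())
```

In this model neither failure path changes the static configuration on the
surviving side; what it does not cover is the hard part raised above, namely
how to get back to a fault-tolerant pair afterwards.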