Date: Fri, 3 Aug 2012 21:11:54 +0200
From: Benoît Canet
Message-ID: <20120803191154.GA2069@irqsave.net>
References: <1343902604-13981-1-git-send-email-benoit@irqsave.net>
Subject: Re: [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency
To: Blue Swirl
Cc: kwolf@redhat.com, pbonzini@redhat.com, Benoît Canet, qemu-devel@nongnu.org, stefanha@linux.vnet.ibm.com

On Friday 03 Aug 2012 at 16:14:51 (+0000), Blue Swirl wrote:
> On Thu, Aug 2, 2012 at 10:16 AM, Benoît Canet wrote:
> > This patchset creates a block driver implementing a quorum using three qemu disk
> > images. Writes are mirrored on the three files.
> > For reads, the three files are read at the same time and a vote is
> > held to determine which qiov version is in the majority. This
> > majority version is then returned to the upper layers.
> > When three different versions of the data are returned by the lower layer, the
> > quorum is broken and the read returns -EIO.
>
> It would be pretty easy to make the number of nodes and quorum
> threshold values for both read and write selectable. Then you could
> have for example 100 nodes and write quorum at 51 (for example, 49
> nodes offline). Obviously writing the same data 100 times sequentially
> would not give very high performance but it's a start.

For now the number of disks is hardcoded to 3, but most of the code is
written with a variable number of disks in mind: only quorum_open and
quorum_vote would need to be rewritten, plus a few mechanical changes
across the rest of the code.
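
To make the generalisation concrete, here is a rough sketch of a read vote
over a variable number of children with a selectable threshold. It is written
in Python for brevity and is not the C code from the patchset: the names
vote_on_read and majority_threshold, and the use of None for a child that did
not answer, are assumptions made purely for illustration. The real quorum_vote
compares qiovs rather than byte strings.

```python
import errno
from collections import Counter
from typing import Optional, Sequence


def majority_threshold(num_children: int) -> int:
    """Smallest number of agreeing children that forms a strict majority."""
    return num_children // 2 + 1


def vote_on_read(versions: Sequence[Optional[bytes]], threshold: int) -> bytes:
    """Return the data version reported by at least `threshold` children.

    `versions` holds one entry per child; None marks a child that is
    offline or whose read failed (an assumption of this sketch).
    Raises OSError(EIO) when no version reaches the threshold, which
    mirrors the "quorum is broken" case described above.
    """
    answers = [v for v in versions if v is not None]
    if not answers:
        raise OSError(errno.EIO, "no child returned data")
    winner, count = Counter(answers).most_common(1)[0]
    if count < threshold:
        raise OSError(errno.EIO, "quorum is broken")
    return winner
```

With three children and a threshold of majority_threshold(3), i.e. 2, this
behaves like the current hardcoded case: two matching reads win, and three
different versions give EIO. With 100 children and a threshold of 51, the
read still succeeds when 49 children are offline, as long as the remaining 51
agree.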