From: Benoît Canet
Date: Wed, 2 Jan 2013 19:16:35 +0100
To: Eric Blake
Cc: Benoît Canet, kwolf@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
Message-ID: <20130102181635.GA30225@irqsave.net>
In-Reply-To: <50E475E0.2080103@redhat.com>
References: <1357143393-29832-1-git-send-email-benoit@irqsave.net>
 <20130102171057.GP19472@us.grid.coop>
 <20130102173324.GB29742@irqsave.net>
 <50E475E0.2080103@redhat.com>

I think I can easily add a "verify" option at image creation.
With it, the code would read the cluster already on disk and compare it with
the cluster to write.
If they differ, it would print a debug message and return -EIO to the upper
layers.

> On Wednesday 02 Jan 2013 at 11:01:04 (-0700), Eric Blake wrote:
> On 01/02/2013 10:33 AM, Benoît Canet wrote:
> >> How does this code handle hash collisions, and do you have some regression
> >> tests that purposefully create a dedup hash collision, and verify that the
> >> 'right thing' happens?
> >
> > The two hash functions that can be used are cryptographic and not yet broken.
> > So nobody knows how to generate a collision.
>
> I can understand that it is hard to write a test for two distinct data
> sectors hashing to the same value, but perhaps it's worth including a
> debug-only hash algorithm that intentionally generates collisions, just
> to prove that you handle them correctly. De-duplicating collided data,
> while unlikely, is still a case of data loss that not everyone is happy
> to risk.
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
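
To make the "verify" idea at the top of this message concrete, here is a minimal sketch. It is not code from the patch series: DedupStore, debug_hash() and dedup_write() are hypothetical names, the "image" is a small in-memory array, and the hash is a deliberately weak debug-only one (first byte only), in the spirit of the suggestion quoted above, so that collisions are easy to provoke. It only illustrates the intended behaviour: on a hash match, read back the stored cluster, compare, and return -EIO on mismatch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CLUSTER_SIZE 16   /* tiny on purpose; real clusters are far larger */
#define MAX_CLUSTERS 8

/* Hypothetical in-memory stand-in for the image and its dedup table. */
typedef struct {
    uint8_t  data[MAX_CLUSTERS][CLUSTER_SIZE];
    uint32_t hash[MAX_CLUSTERS];
    int      used;
} DedupStore;

/* Debug-only hash that collides by design: it looks at the first byte only. */
static uint32_t debug_hash(const uint8_t *buf)
{
    return buf[0];
}

/*
 * Write one cluster with deduplication.
 * Returns the index of the cluster that holds the data, -EIO when verify is
 * enabled and a cluster with the same hash has different contents, or
 * -ENOSPC when the store is full.
 */
static int dedup_write(DedupStore *s, const uint8_t *buf, int verify)
{
    assert(s != NULL && buf != NULL);
    uint32_t h = debug_hash(buf);

    for (int i = 0; i < s->used; i++) {
        if (s->hash[i] != h) {
            continue;
        }
        /* Hash match: read back the stored cluster and compare the bytes. */
        if (verify && memcmp(s->data[i], buf, CLUSTER_SIZE) != 0) {
            fprintf(stderr, "dedup: hash collision on cluster %d\n", i);
            return -EIO;
        }
        return i;   /* treated as a duplicate, nothing new is stored */
    }

    if (s->used == MAX_CLUSTERS) {
        return -ENOSPC;
    }
    memcpy(s->data[s->used], buf, CLUSTER_SIZE);
    s->hash[s->used] = h;
    return s->used++;
}
```

With verify disabled, two different clusters that share a first byte are silently merged in this sketch, which is exactly the data-loss case raised above. With verify enabled, the second write fails with -EIO instead.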