Date: Wed, 2 Jan 2013 19:55:26 +0100
From: Benoît Canet
Message-ID: <20130102185526.GC30225@irqsave.net>
References: <1357143393-29832-1-git-send-email-benoit@irqsave.net> <20130102171057.GP19472@us.grid.coop> <20130102173324.GB29742@irqsave.net> <20130102182637.GR19472@us.grid.coop> <20130102184052.GB30225@irqsave.net>
Subject: Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
To: ronnie sahlberg
Cc: Benoît Canet, kwolf@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com

On Wednesday 02 Jan 2013 at 10:47:48 (-0800), ronnie sahlberg wrote:
> Do you really need to resolve the conflicts?
> It might be easier and sufficient to just flag those hashes where a
> conflict has been detected as: "don't dedup this hash anymore,
> collisions have been seen."

True, that's more elegant.
The user would still need to specify the verify option at creation, and it
would require doing a read before the verify, but it would not make the qcow2
format uglier.
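
For illustration only, the "flag the hash, stop deduplicating it" idea could be sketched as below. This is a toy in-memory model, not the qcow2 on-disk format or the patchset's code: `DedupStore`, `no_dedup`, and the injectable `hash_fn` are hypothetical names, and the byte comparison stands in for the read-before-verify step.

```python
import hashlib


class DedupStore:
    """Toy in-memory cluster store; not the qcow2 on-disk format."""

    def __init__(self, hash_fn=None):
        # hash_fn is injectable so a test can force collisions with a weak hash.
        self.hash_fn = hash_fn or (lambda data: hashlib.sha256(data).digest())
        self.clusters = []      # physical clusters, indexed by offset
        self.by_hash = {}       # hash -> offset of the first cluster seen
        self.no_dedup = set()   # hashes flagged after a detected collision

    def _append(self, data):
        self.clusters.append(data)
        return len(self.clusters) - 1

    def write(self, data):
        """Store one cluster and return the physical offset it maps to."""
        h = self.hash_fn(data)
        if h in self.no_dedup:
            # A collision was seen on this hash: never deduplicate it again.
            return self._append(data)
        if h not in self.by_hash:
            offset = self._append(data)
            self.by_hash[h] = offset
            return offset
        offset = self.by_hash[h]
        # Verify step: read the existing cluster back and compare the bytes.
        if self.clusters[offset] == data:
            return offset
        # Same hash, different content: flag the hash and store the data as-is.
        self.no_dedup.add(h)
        return self._append(data)
```

Passing a deliberately weak `hash_fn` (for example one that keeps only the first byte) is one cheap way to exercise the collision path in a regression test without having to break sha256.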
>
>
> On Wed, Jan 2, 2013 at 10:40 AM, Benoît Canet wrote:
> > On Wednesday 02 Jan 2013 at 12:26:37 (-0600), Troy Benjegerdes wrote:
> >> The probability may be 'low' but it is not zero. Just because it's
> >> hard to calculate the hash doesn't mean you can't do it. If your
> >> input data is not random the probability of a hash collision is
> >> going to get skewed.
> >>
> >> Read about how Bitcoin uses hashes.
> >>
> >> I need a budget of around $10,000 or so for some FPGAs and/or GPU cards,
> >> and I can make a regression test that will create deduplication hash
> >> collisions on purpose.
> >
> > It's not a problem: as Eric pointed out while reviewing the previous patchset,
> > there is a small area left as zeroes in the deduplication block.
> > A bit could be set in it when a collision is detected, and an offset could point
> > to a cluster used to resolve collisions.
> >
> >>
> >>
> >> On Wed, Jan 02, 2013 at 06:33:24PM +0100, Benoît Canet wrote:
> >> > > How does this code handle hash collisions, and do you have some regression
> >> > > tests that purposefully create a dedup hash collision, and verify that the
> >> > > 'right thing' happens?
> >> >
> >> > The two hash functions that can be used are cryptographic and not broken yet,
> >> > so nobody knows how to generate a collision.
> >> >
> >> > You can do the math to calculate the probability of a collision using a 256-bit
> >> > hash while processing 1 EiB of data: the result is so low that you can consider it
> >> > won't happen.
> >> > The sha256 ZFS deduplication works the same way regarding collisions.
> >> >
> >> > I currently use qemu-io-test for testing purposes and iozone with the -w flag in
> >> > the guest.
> >> > I would like to find a good deduplication stress test to run in a guest.
> >> >
> >> > Regards
> >> >
> >> > Benoît
> >> >
> >> > > It's great that this almost works, but it seems rather dangerous to put
> >> > > something like this into the mainline code without some regression tests.
> >> > >
> >> > > (I'm also suspecting the regression test will be a great way to find
> >> > > flaky hardware)
> >> > >
> >> > > --------------------------------------------------------------------------
> >> > > Troy Benjegerdes 'da hozer' hozer@hozed.org
> >> > >
> >> > > Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
> >> > > software & hardware (http://q3u.be) stuff and not get a real job.
> >> > > Charles Schulz had the best answer:
> >> > >
> >> > > "Why do musicians compose symphonies and poets write poems? They do it
> >> > > because life wouldn't have any meaning for them if they didn't. That's why
> >> > > I draw cartoons. It's my life." -- Charles Schulz
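
As a rough check of the "do the math" claim quoted above (256-bit hash, 1 EiB of data), here is a minimal sketch of the birthday-style union bound. It assumes the hash behaves like an ideal random function and assumes a 4 KiB cluster size, which is an illustrative choice and not a figure taken from this thread; `log2_collision_bound` is a hypothetical helper name.

```python
import math


def log2_collision_bound(total_bytes, cluster_size, hash_bits):
    """Return log2 of the union bound on P(at least one hash collision)."""
    n = total_bytes // cluster_size   # number of hashed clusters
    pairs = n * (n - 1) // 2          # distinct pairs of clusters
    # For an ideal hash each pair collides with probability 2**-hash_bits,
    # so P(any collision) <= pairs / 2**hash_bits.
    return math.log2(pairs) - hash_bits


if __name__ == "__main__":
    EIB = 2 ** 60          # 1 EiB in bytes
    CLUSTER_SIZE = 4096    # assumed cluster size, for illustration only
    print(log2_collision_bound(EIB, CLUSTER_SIZE, 256))
```

Under these assumptions there are 2^48 clusters and about 2^95 pairs, so the bound is about 2^-161. This only covers accidental collisions; it says nothing about an attacker who can construct colliding inputs, which is the case the verify option and the collision flag are meant to handle.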