Date: Fri, 4 Jan 2013 10:49:22 +0100
From: Stefan Hajnoczi
Message-ID: <20130104094922.GB14426@stefanha-thinkpad.redhat.com>
References: <1357143393-29832-1-git-send-email-benoit@irqsave.net>
 <20130102171057.GP19472@us.grid.coop>
 <20130102173324.GB29742@irqsave.net>
 <20130102182637.GR19472@us.grid.coop>
 <20130103123948.GF6976@stefanha-thinkpad.muc.redhat.com>
 <20130103195102.GX19472@us.grid.coop>
In-Reply-To: <20130103195102.GX19472@us.grid.coop>
Subject: Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
To: Troy Benjegerdes
Cc: Benoît Canet, kwolf@redhat.com, qemu-devel@nongnu.org, pbonzini@redhat.com

On Thu, Jan 03, 2013 at 01:51:02PM -0600, Troy Benjegerdes wrote:
> On Thu, Jan 03, 2013 at 01:39:48PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jan 02, 2013 at 12:26:37PM -0600, Troy Benjegerdes wrote:
> > > The probability may be 'low' but it is not zero. Just because it's
> > > hard to calculate the hash doesn't mean you can't do it. If your
> > > input data is not random the probability of a hash collision is
> > > going to get skewed.
> >
> > The cost of catching hash collisions is an extra read for every write.
> > It's possible to reduce this with a 2nd hash function and/or caching.
> >
> > I'm not sure it's worth it given the extremely low probability of a hash
> > collision.
> >
> > Venti is an example of an existing system where hash collisions were
> > ignored because the probability is so low. See 3.1. Choice of Hash
> > Function section:
> >
> > http://plan9.bell-labs.com/sys/doc/venti/venti.html
>
>
> If you believe that it's 'extremely low', then please provide either:
>
> * experimental evidence to prove your claim
> * an insurance underwriter who will pay-out if data is lost due to
> a hash collision.

Read the paper. The point is that if the probability of a collision is
that low, it's not worth worrying about, because other failure modes
(e.g. cosmic rays) are much more likely.

The TCP/IP checksums are weak and not comparable to what Benoit is using.

Stefan
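
For a rough sense of scale, here is a minimal sketch of the usual
birthday-bound estimate behind this kind of argument. It assumes hash
outputs are uniformly distributed, which is exactly the assumption Troy
is questioning, so treat it as an illustration and not as proof. The
function name and the block counts are hypothetical; the 160-bit case
approximates the exabyte-in-8-KB-blocks example from the Venti paper,
and the 256-bit case is only an assumed stand-in for a stronger hash,
not a statement about what the patch series uses.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among num_blocks hashes.

    Birthday-bound approximation: p ~= 1 - exp(-n * (n - 1) / 2^(bits + 1)),
    valid only if hash outputs are uniformly distributed (an assumption).
    """
    exponent = -num_blocks * (num_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


# Roughly 10^14 blocks (about an exabyte stored as 8 KB blocks).
venti_like = collision_probability(10**14, 160)

# Same block count with an assumed 256-bit hash.
stronger_hash = collision_probability(10**14, 256)

print(f"160-bit hash: {venti_like:.3e}")
print(f"256-bit hash: {stronger_hash:.3e}")
```

Under those assumptions the 160-bit estimate comes out below 1e-20,
and a wider hash pushes it many orders of magnitude lower still.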