From: Troy Benjegerdes <hozer@hozed.org>
To: ronnie sahlberg <ronniesahlberg@gmail.com>
Cc: Benoît Canet <benoit.canet@irqsave.net>,
kwolf@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com,
pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
Date: Wed, 2 Jan 2013 13:18:59 -0600
Message-ID: <20130102191859.GT19472@us.grid.coop>
In-Reply-To: <CAN05THTRuUAv7337kH-z+etbS7Esw6=u97d9Acr1dju9-9ZO5Q@mail.gmail.com>
If you do get a hash collision, it's a rather exceptional event, so I'd
say every effort should be made to log the event and the data that created
it in multiple places.
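To put a rough number on "exceptional": the sketch below applies the
standard birthday approximation to the figures quoted further down in
this thread (a 256 bit hash, 1EiB of data). The 64KB cluster size and
the function name are assumptions made for illustration, and the bound
only holds for inputs that behave like random data, not for collisions
constructed on purpose.

```python
import math


def log2_collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Return log2 of the approximate chance of at least one collision.

    Birthday approximation: p ~= n * (n - 1) / 2**(hash_bits + 1),
    which is only valid while p is much smaller than 1.
    """
    return math.log2(num_blocks) + math.log2(num_blocks - 1) - (hash_bits + 1)


TOTAL_BYTES = 2**60        # 1 EiB, the figure used in the thread
CLUSTER_BYTES = 64 * 1024  # assumed dedup cluster size
NUM_CLUSTERS = TOTAL_BYTES // CLUSTER_BYTES  # 2**44 clusters

exponent = log2_collision_probability(NUM_CLUSTERS, 256)
print(f"collision probability is roughly 2^{exponent:.1f}")
```

Under these assumptions the exponent comes out close to -169, which is
why an observed collision is more likely to point at hardware or
software than at the data.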
There are three questions I'd ask on a hash collision:
1) was it the data?
2) was it the hardware?
3) was it a software bug?
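A minimal sketch of what I mean, as a toy in-memory model in Python and
not the QCOW2 code. The class name, return strings and logger name are
all hypothetical. It compares the bytes whenever two hashes match,
keeps both payloads as evidence in a second place besides the log, and
stops deduplicating against that hash, as suggested in the mail quoted
below.

```python
import hashlib
import logging

log = logging.getLogger("dedup")  # hypothetical logger name


class DedupStore:
    """Toy cluster store keyed by hash digest (illustration only)."""

    def __init__(self, hash_fn=lambda data: hashlib.sha256(data).digest()):
        self.hash_fn = hash_fn
        self.clusters = {}    # digest -> first payload seen with that digest
        self.no_dedup = set() # digests on which a collision has been seen
        self.collisions = []  # (digest, stored payload, new payload)

    def write(self, data: bytes) -> str:
        digest = self.hash_fn(data)
        if digest in self.no_dedup:
            # A real store would write the cluster without deduplication here.
            return "not deduplicated"
        stored = self.clusters.get(digest)
        if stored is None:
            self.clusters[digest] = data
            return "stored"
        if stored == data:
            return "deduplicated"
        # Same digest, different bytes: record the evidence in two places.
        self.no_dedup.add(digest)
        self.collisions.append((digest, stored, data))
        log.critical("dedup hash collision on digest %s", digest.hex())
        return "collision"
```

Keeping both payloads lets you re-hash them later on another machine:
if the digests still match it was the data, and if they do not, the
hardware or the software is the suspect. A regression test can force
the collision path cheaply by passing a deliberately weak hash_fn, for
example one that returns only the first byte of the cluster.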
On Wed, Jan 02, 2013 at 10:47:48AM -0800, ronnie sahlberg wrote:
> Do you really need to resolve the conflicts?
> It might be easier and sufficient to just flag those hashes where a
> conflict has been detected as: "don't dedup this hash anymore,
> collisions have been seen."
>
>
> On Wed, Jan 2, 2013 at 10:40 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
> > On Wednesday 02 Jan 2013 at 12:26:37 (-0600), Troy Benjegerdes wrote:
> >> The probability may be 'low' but it is not zero. Just because it's
> >> hard to calculate the hash doesn't mean you can't do it. If your
> >> input data is not random the probability of a hash collision is
> >> going to get skewed.
> >>
> >> Read about how Bitcoin uses hashes.
> >>
> >> I need a budget of around $10,000 or so for some FPGAs and/or GPU cards,
> >> and I can make a regression test that will create deduplication hash
> >> collisions on purpose.
> >
> > It's not a problem: as Eric pointed out while reviewing the previous patchset,
> > there is a small space left filled with zeroes in the deduplication block.
> > A bit could be set on it when a collision is detected and an offset could point
> > to a cluster used to resolve collisions.
> >
> >>
> >>
> >> On Wed, Jan 02, 2013 at 06:33:24PM +0100, Benoît Canet wrote:
> >> > > How does this code handle hash collisions, and do you have some regression
> >> > > tests that purposefully create a dedup hash collision, and verify that the
> >> > > 'right thing' happens?
> >> >
> >> > The two hash functions that can be used are cryptographic and not broken yet.
> >> > So nobody knows how to generate a collision.
> >> >
> >> > You can do the math to calculate the probability of a collision using a 256 bit
> >> > hash while processing 1EiB of data; the result is so low you can consider it
> >> > won't happen.
> >> > The sha256 ZFS deduplication works the same way regarding collisions.
> >> >
> >> > I currently use qemu-io-test for testing purposes and iozone with the -w flag in
> >> > the guest.
> >> > I would like to find a good deduplication stress test to run in a guest.
> >> >
> >> > Regards
> >> >
> >> > Benoît
> >> >
> >> > > It's great that this almost works, but it seems rather dangerous to put
> >> > > something like this into the mainline code without some regression tests.
> >> > >
> >> > > (I'm also suspecting the regression test will be a great way to find
> >> > > flakey hardware)
> >> > >
> >> > > --------------------------------------------------------------------------
> >> > > Troy Benjegerdes 'da hozer' hozer@hozed.org
> >> > >
> >> > > Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
> >> > > software & hardware (http://q3u.be) stuff and not get a real job.
> >> > > Charles Schulz had the best answer:
> >> > >
> >> > > "Why do musicians compose symphonies and poets write poems? They do it
> >> > > because life wouldn't have any meaning for them if they didn't. That's why
> >> > > I draw cartoons. It's my life." -- Charles Schulz
> >>
> >
--
--------------------------------------------------------------------------
Troy Benjegerdes 'da hozer' hozer@hozed.org
Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software & hardware (http://q3u.be) stuff and not get a real job.
Charles Schulz had the best answer:
"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's why
I draw cartoons. It's my life." -- Charles Schulz