Re: Offline Deduplication for Btrfs


From: Gordan Bobic <gordan@bobich.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Offline Deduplication for Btrfs
Date: Thu, 06 Jan 2011 14:49:39 +0000
Message-ID: <4D25D683.3050609@bobich.net>
In-Reply-To: <20110106143019.GB14674@mother>

Tomasz Torcz wrote:
> On Thu, Jan 06, 2011 at 02:19:04AM +0100, Spelic wrote:
>>> CPU can handle considerably more than 250 block hashings per
>>> second. You could argue that this changes in cases of sequential
>>> I/O on big files, but a 1.86GHz Core2 can churn through
>>> 111MB/s of SHA256, which even SSDs will struggle to keep up with.
>> A normal 1TB disk with platters can do 130MB/sec sequential, no problems.
>> An SSD can do more like 200MB/sec write and 280MB/sec read, sequential
>> or random, and is actually limited only by SATA 3.0Gbit/sec, but soon
>> enough they will have SATA/SAS 6.0Gbit/sec.
>
>   By “soon enough” you really meant “a year ago”, I think:
> http://www.anandtech.com/show/3812/the-ssd-diaries-crucials-realssd-c300
> Current 6Gbps SSDs are doing 415 MB/s sequential:
> http://www.anandtech.com/show/4086/microns-realssd-c400-uses-25nm-nand-at-161gb-offers-415mbs-reads
> or even claim 550MB/s:
> http://www.anandtech.com/show/4100/ocz-vertex-pro-3-demo-worlds-first-sandforce-sf2000
> (funny bit: Sandforce SSD controllers dedup internally).
>
>   Anyway, 6Gbps is not a future tale, but something long available.
> And these are not the fastest kids on the block: filesystems being built
> today must deal with storage providing many gigabytes per second.  Think
> of massive disk arrays or stuff like Oracle F5100, claiming
> 12.8GB/sec read and ~10GB/s write (in one rack unit).

Sequential figures look nice and impressive, but we all know they are
meaningless for most real-world workloads. IOPS are where it's at. And
maybe you can get 100,000 IOPS out of an SSD. But that still means
100,000 SHA256 hashes/second. That's 3.2MB/s of SHA256 hashes (100,000
32-byte digests), or about 2% of what a modern x64 CPU will do, assuming
it doesn't have a suitable hardware crypto accelerator for that
algorithm. So on a reasonably recent quad-core CPU you would probably be
able to comfortably handle about 200x that before it starts becoming an
issue. If you're that concerned about space requirements, doing LZO
compression will still be much more expensive.
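
The figures above can be checked with a short sketch. The 100,000 IOPS
and 111MB/s numbers come from this thread, and the 32-byte digest size is
a property of SHA256. The 4KiB block size is an assumption (no block size
is stated in this message), and the function names are hypothetical. The
sketch reports two different rates: the bytes of digest produced per
second, and the bytes of block data that would have to be fed through
SHA256 per second.

```python
import hashlib

IOPS = 100_000            # from the message: optimistic SSD IOPS
SHA256_MB_PER_S = 111     # from the thread: one 1.86GHz Core2 core
DIGEST_BYTES = hashlib.sha256().digest_size   # 32 bytes for SHA256
BLOCK_BYTES = 4096        # ASSUMPTION: block size is not stated in the message


def digest_output_mb_per_s(iops: int) -> float:
    """MB/s of hash values produced if every I/O is hashed once."""
    return iops * DIGEST_BYTES / 1e6


def hashed_input_mb_per_s(iops: int, block_bytes: int) -> float:
    """MB/s of block data fed into the hash if every I/O is one block."""
    return iops * block_bytes / 1e6


if __name__ == "__main__":
    out_rate = digest_output_mb_per_s(IOPS)
    in_rate = hashed_input_mb_per_s(IOPS, BLOCK_BYTES)
    print(f"digest output: {out_rate:.1f} MB/s "
          f"({100 * out_rate / SHA256_MB_PER_S:.1f}% of {SHA256_MB_PER_S} MB/s)")
    print(f"hashed input at {BLOCK_BYTES} B/block: {in_rate:.1f} MB/s "
          f"({in_rate / SHA256_MB_PER_S:.2f}x of {SHA256_MB_PER_S} MB/s)")
```

Which of the two rates matters depends on what is being compared: the
digest-side rate is the volume of hash values to store and look up, while
the input-side rate is the work the CPU does, and it scales with whatever
block size is actually in use.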

And that's only for writes - on reads we don't need to do any hashing
(although it's useful to do for the disk error checking reasons
explained earlier).
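
The write/read asymmetry can be illustrated with a toy sketch. This is
not the Btrfs implementation or the ioctl proposed in this thread; it is
a hypothetical in-memory block store (the DedupStore class, its method
names and the 4KiB default block size are all assumptions) that hashes
every block on write to find duplicates, and only hashes on read when
verification is explicitly requested.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy in-memory block store keyed by SHA256 digest (hypothetical)."""

    def __init__(self) -> None:
        self.blocks: Dict[bytes, bytes] = {}       # digest -> block data
        self.files: Dict[str, List[bytes]] = {}    # name -> ordered digests

    def write(self, name: str, data: bytes, block_size: int = 4096) -> int:
        """Hash each block on write; return how many blocks were new."""
        digests: List[bytes] = []
        new_blocks = 0
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in self.blocks:          # unseen content: store it
                self.blocks[digest] = block
                new_blocks += 1
            digests.append(digest)                 # duplicates just add a reference
        self.files[name] = digests
        return new_blocks

    def read(self, name: str, verify: bool = False) -> bytes:
        """Plain reads do no hashing; verify=True re-hashes each block."""
        parts: List[bytes] = []
        for digest in self.files[name]:
            block = self.blocks[digest]
            if verify and hashlib.sha256(block).digest() != digest:
                raise IOError("checksum mismatch in block of " + name)
            parts.append(block)
        return b"".join(parts)


if __name__ == "__main__":
    store = DedupStore()
    store.write("a", b"x" * 8192)                  # two identical blocks
    store.write("b", b"x" * 4096 + b"y" * 4096)    # one shared, one new
    print(len(store.blocks), "unique blocks stored")
    print(len(store.read("b", verify=True)), "bytes read back and verified")
```

A real filesystem would also have to deal with hash collisions,
persistence and reference counting, none of which is attempted here.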

Gordan


