linux-btrfs.vger.kernel.org archive mirror
* btrfs vs data deduplication
@ 2011-07-09  6:19 Paweł Brodacki
  2011-09-18 20:15 ` Hubert Kario
  2011-09-18 21:01 ` Maciej Marcin Piechotka
  0 siblings, 2 replies; 4+ messages in thread
From: Paweł Brodacki @ 2011-07-09  6:19 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I've stumbled upon this article:
http://storagemojo.com/2011/06/27/de-dup-too-much-of-good-thing/

Reportedly, the SandForce SF1200 SSD controller performs block-level
data de-duplication internally. This effectively removes the additional
protection given by writing multiple metadata copies. Manufacturers of
other drives may already use this technique, or may adopt it in the
future.

I would like to ask whether the metadata copies written to a btrfs
file system with metadata mirroring enabled are identical, or whether
something makes them unique on disk and so prevents their
de-duplication. I tried googling for the answer, but didn't find
anything that would answer my question.

If the metadata copies are identical, I'd like to ask if it would be
possible to change this without major disruption? I know that changes
to on-disk format aren't a thing made lightly, but I'd be grateful for
any comments.

The increase of the risk of file system corruption introduced by data
de-duplication on Sandforce controllers was down-played in the
vendor's reply included in the article, but still, what's the point of
duplicating metadata on file system level, if storage below can remove
that redundancy?

Regards,
Paweł


* Re: btrfs vs data deduplication
  2011-07-09  6:19 btrfs vs data deduplication Paweł Brodacki
@ 2011-09-18 20:15 ` Hubert Kario
  2011-09-18 22:14   ` Chris Samuel
  2011-09-18 21:01 ` Maciej Marcin Piechotka
  1 sibling, 1 reply; 4+ messages in thread
From: Hubert Kario @ 2011-09-18 20:15 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Paweł Brodacki


On Saturday 09 of July 2011 08:19:30 Paweł Brodacki wrote:
> Hello,
>
> I've stumbled upon this article:
> http://storagemojo.com/2011/06/27/de-dup-too-much-of-good-thing/
>
> Reportedly, the SandForce SF1200 SSD controller performs block-level
> data de-duplication internally. This effectively removes the additional
> protection given by writing multiple metadata copies. Manufacturers of
> other drives may already use this technique, or may adopt it in the
> future.

That is only a problem in a single-disk installation.

> I would like to ask whether the metadata copies written to a btrfs
> file system with metadata mirroring enabled are identical, or whether
> something makes them unique on disk and so prevents their
> de-duplication. I tried googling for the answer, but didn't find
> anything that would answer my question.

There is a difference between the root inode copies, but I don't think
there's any difference between the metadata tree copies. I'm quite certain
they are bit-for-bit identical.

> If the metadata copies are identical, I'd like to ask if it would be
> possible to change this without major disruption? I know that changes
> to on-disk format aren't a thing made lightly, but I'd be grateful for
> any comments.

That would be a big change for little to no benefit.

> The increase of the risk of file system corruption introduced by data
> de-duplication on Sandforce controllers was down-played in the
> vendor's reply included in the article, but still, what's the point of
> duplicating metadata on file system level, if storage below can remove
> that redundancy?

You shouldn't depend on a single drive; metadata RAID is there to protect
against single bad blocks, not a disk crash.

If you want redundancy, use multiple disks, either HDD or SSD. And have
readable backups.

Regards,
Hubert




* Re: btrfs vs data deduplication
  2011-07-09  6:19 btrfs vs data deduplication Paweł Brodacki
  2011-09-18 20:15 ` Hubert Kario
@ 2011-09-18 21:01 ` Maciej Marcin Piechotka
  1 sibling, 0 replies; 4+ messages in thread
From: Maciej Marcin Piechotka @ 2011-09-18 21:01 UTC (permalink / raw)
  To: linux-btrfs


Hello,

Sorry to add my $0.03. It is possible to work around it by using
encryption. If a mode other than ECB is used, blocks that are identical
when unencrypted are stored as different data on the drive.

The drawbacks:

 - Encryption overhead (you may want to use a non-secure mode, as you're
not interested in security)
 - There is an avalanche effect (a whole [encryption] block gets corrupted
even if only one bit of it is corrupted).
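
The point about ECB versus other modes can be sketched roughly as
follows. This is only a toy illustration, not dm-crypt or any real
cipher: the `keystream` and `encrypt_sector` helpers are hypothetical
names made up for this example, and the HMAC-based XOR stream merely
stands in for a block cipher whose output either ignores or depends on
the sector number. It does not model the avalanche drawback.

```python
import hashlib
import hmac


def keystream(key, tweak, length):
    """Derive `length` pseudo-random bytes from key and tweak (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        msg = tweak + counter.to_bytes(8, "big")
        out += hmac.new(key, msg, digestmod=hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_sector(key, sector, data, use_sector_tweak):
    """XOR data with a keystream; applying it twice decrypts.

    With use_sector_tweak=False the output depends only on key and data,
    like ECB: identical plaintext gives identical ciphertext everywhere.
    With use_sector_tweak=True the sector number is mixed in, so the same
    plaintext stored at two locations produces different ciphertext.
    """
    tweak = sector.to_bytes(8, "big") if use_sector_tweak else b""
    ks = keystream(key, tweak, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


key = b"demo key"                      # assumption: any fixed key
block = b"identical metadata block"    # the same content written twice

# ECB-like: both copies look the same to a deduplicating controller.
untweaked_a = encrypt_sector(key, 100, block, use_sector_tweak=False)
untweaked_b = encrypt_sector(key, 200, block, use_sector_tweak=False)

# Sector-dependent: the two copies differ on the drive.
tweaked_a = encrypt_sector(key, 100, block, use_sector_tweak=True)
tweaked_b = encrypt_sector(key, 200, block, use_sector_tweak=True)
```

Under these assumptions `untweaked_a` and `untweaked_b` are equal, while
`tweaked_a` and `tweaked_b` are not, so only the second pair would survive
content-based de-duplication as two separate blocks.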

Regards



* Re: btrfs vs data deduplication
  2011-09-18 20:15 ` Hubert Kario
@ 2011-09-18 22:14   ` Chris Samuel
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Samuel @ 2011-09-18 22:14 UTC (permalink / raw)
  To: linux-btrfs

On Mon, 19 Sep 2011, 06:15:51 EST, Hubert Kario <hubert@kario.pl> wrote:

> You shouldn't depend on a single drive; metadata
> RAID is there to protect against single bad
> blocks, not a disk crash.

I guess the issue here is you no longer even
have that protection with this sort of dedup.
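
A minimal sketch of that failure mode, assuming a controller that
de-duplicates by content hash. The `DedupStore` class and its methods
are hypothetical and model no real firmware; the block addresses and
payload are made up for illustration.

```python
import hashlib


class DedupStore:
    """Toy model of a drive that stores each distinct block content once."""

    def __init__(self):
        self.physical = {}   # content hash -> bytearray (the single stored copy)
        self.logical = {}    # logical block address -> content hash

    def write(self, lba, data):
        digest = hashlib.sha256(data).hexdigest()
        # Identical content is kept once, however many LBAs point at it.
        self.physical.setdefault(digest, bytearray(data))
        self.logical[lba] = digest

    def read(self, lba):
        return bytes(self.physical[self.logical[lba]])

    def corrupt(self, lba):
        # Flip one bit in the physical block backing this LBA.
        self.physical[self.logical[lba]][0] ^= 0x01


store = DedupStore()
metadata = b"btrfs metadata block"
store.write(100, metadata)   # first metadata copy
store.write(200, metadata)   # mirrored copy, bit-for-bit identical

store.corrupt(100)           # a single bad physical block

physical_blocks = len(store.physical)
copies_intact = [store.read(lba) == metadata for lba in (100, 200)]
```

In this model `physical_blocks` is 1 and both entries of `copies_intact`
are `False`: one bad physical block damages both logical copies, so the
mirror no longer protects against it.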

cheers,
Chris
-- 
Chris Samuel - http://www.csamuel.org/

