Re: raid0 vs single, and should we allow -mdup by default on SSDs?


From: Marc MERLIN <marc@merlins.org>
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: raid0 vs single, and should we allow -mdup by default on SSDs?
Date: Wed, 7 May 2014 01:18:40 -0700
Message-ID: <20140507081840.GM10159@merlins.org>
In-Reply-To: <pan$43f87$1b04f612$26876f7$5d505a6d@cox.net> <pan$31100$5edd829$14fe1b2f$4e1f4bd7@cox.net> <pan$a6ac4$b6e79aa$56ddd950$5b7b75c2@cox.net>

Hi Chris and other devs,

Does it really make sense to turn off -mdup on SSDs? I would argue that
it does not. In my case dmcrypt protected me from that, so I'm happy, but
even if I didn't use it, I'd want the protection of -mdup, even if the
protection might only be partial.

On Tue, May 06, 2014 at 05:16:08PM +0000, Duncan wrote:
> Single only stripes in such extremely large (1 GiB data, quarter-GiB 
> metadata, per strip) chunks that it doesn't matter for speed, and then 
> only as a result of its chunk allocation policy.  If one can define such 
> large strips as striping, which it is in a way, but not really in the 
> practical sense.

Oh good, I didn't know it was that big.

> The effect of a lost device, then, is more or less random, tho for single 
> metadata the effect is likely to be quite large up to total loss, due to 
> the damage to the tree.  It's not out of thin air that the multi-device 

Yes. I totally use either -mdup or -mraid1.

> That contrasts with raid0, where the striping is at sizes well under a 
> chunk (memory page size or 4 KiB on x86/amd64 data I believe, tho the 
> fact that files under the 16 KiB node size may actually be entirely 
> folded into metadata and not have a data extent allocation at all skews 
> things for up to the 16 KiB metadata node size), so the definition of 
> "small file likely to be recovered" is **MUCH** smaller on raid0, than on 
> single.

Great to know, I'll use -m raid1 -d single next time.

> Effectively, raid0 data you're only (relatively) likely to recover files 
> smaller than 16 KiB, while single data, it's files smaller than 1 GiB.

Thanks much for that.
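
To make the recoverability argument concrete, here is a rough sketch. It
assumes a deliberately simplified model in which data is placed round-robin
across devices in fixed-size units; that is not how the real btrfs allocator
works, and the function names, the 64 KiB stripe unit and the 10 MiB file
size are illustrative assumptions only. The point is just that the larger
the placement unit, the larger a file can be while still living on a single
device.

```python
# Simplified model: data is laid out round-robin across devices in
# fixed-size units. This is NOT btrfs's real allocator; all names and
# sizes here are illustrative assumptions.

KIB = 1024
GIB = 1024 ** 3


def devices_touched(offset, size, unit, n_devices):
    """Return the set of device indexes holding any byte of the file."""
    if size <= 0:
        return set()
    first_unit = offset // unit
    last_unit = (offset + size - 1) // unit
    # Once a file spans n_devices units, it is present on every device.
    span = min(last_unit - first_unit + 1, n_devices)
    return {(first_unit + i) % n_devices for i in range(span)}


def survives_loss(offset, size, unit, n_devices, lost_device):
    """True if no part of the file lived on the lost device."""
    return lost_device not in devices_touched(offset, size, unit, n_devices)


ten_mib = 10 * 1024 * KIB
# Small stripe unit (raid0-like): the file is spread over both devices.
print(devices_touched(0, ten_mib, 64 * KIB, 2))
# 1 GiB unit (single-like data chunk): the file sits on one device.
print(devices_touched(0, ten_mib, 1 * GIB, 2))
```

In this model the 10 MiB file is lost with either device under the small
unit, but survives the loss of the second device under the 1 GiB unit.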

On Tue, May 06, 2014 at 07:05:52PM +0000, Duncan wrote:
> 1) In order to do that, btrfs (I guess mkfs.btrfs in this case) must be 
> able to detect that the device *IS* ssd.  Depending on the SSD, the 
> kernel version, and whether the btrfs is being created direct on bare-
> metal device or on some device layered (lvm or dmcrypt or whatever) on 
> top of the bare metal, btrfs may or may not successfully detect that.
> 
> Obviously in your case[1] the ssd wasn't detected.

Indeed.  I also found out why my SSD has -mdup: it's on top of dmcrypt,
so btrfs failed to see it was an SSD and gave me -mdup. Good, that's
what I wanted anyway :)

> I believe I've seen you mention using dmcrypt or the like, however, which 
> probably doesn't pass whatever is used for ssd protection on thru, thus 
> explaining btrfs not seeing it and having to specify it yourself, if you 
> wish.

You guessed correctly, congrats.
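
For anyone who wants to check what their own block devices report, here is
a minimal sketch. It assumes that SSD detection is based on the kernel's
per-device "rotational" flag in sysfs (I believe that is what mkfs.btrfs
looks at, but treat that as an assumption), and the helper name
is_rotational and its sysfs_root parameter are made up for this example.

```python
from pathlib import Path


def is_rotational(dev_name, sysfs_root="/sys/block"):
    """Return True/False from the sysfs rotational flag, or None if unknown."""
    flag = Path(sysfs_root) / dev_name / "queue" / "rotational"
    try:
        value = flag.read_text().strip()
    except OSError:
        # Device missing, or the flag is not exposed for this device.
        return None
    if value == "0":
        return False  # non-rotational, i.e. reported as SSD-like
    if value == "1":
        return True   # rotational, i.e. reported as a spinning disk
    return None
```

Calling is_rotational("sda") and then the same for the stacked device (for
example is_rotational("dm-0"), if that is where the dmcrypt mapping lives)
shows whether the two layers agree; what the stacked device reports may
depend on the kernel version.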

> 2) The only reason I happen to know about the SSD metadata single-device 
> single mode default exception (where metadata otherwise defaults to dup 
> mode on single-device, and to raid1 mode on multi-device regardless of 
> the media), is as a result of I believe Chris Mason commenting on it in 
> an on-list reply.
> The reasoning given in that reply was not the erase-block reason I've 
> seen someone else mention here (and which doesn't quite make sense to me, 
> since I don't know why that would make a difference), but rather:

Yes. I personally don't think it's a good idea. Basically, when having 2
copies, they could still end up on the same erase block, making them
less redundant.
My answer to that is 'so what?'
There are plenty of other times where dup would be useful on an SSD. I
really don't see the point of trying to turn it off by default just
because maybe in one case it would not offer extra protection.

> Some SSD firmware does automatic deduplication and compression.  On these 
> devices, DUP-mode would almost certainly be stored as a single internal 
> data block with two external address references anyway, so it would 
> actually be single in any case, and defaulting to single (a) doesn't hide 
> that fact, and (b) reduces overhead that's justified for safety 
> otherwise, but if the firmware is doing an end run around that safety 
> anyway, might as well just shortcut the overhead as well.

If some SSDs do this, let's not punish those who have SSDs that don't.

> However, while the btrfs default will apply to all (detected) ssds, not 
> all ssds have firmware that does this internal deduplication!

Exactly.
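
For reference, the default metadata profile rule as Duncan describes it in
this thread can be written down as the small sketch below. It only restates
the behaviour described here (as of this discussion); the function name and
arguments are hypothetical and this is not the actual mkfs.btrfs code.

```python
def default_metadata_profile(n_devices, ssd_detected):
    """Default metadata profile as described in this thread (illustrative)."""
    if n_devices < 1:
        raise ValueError("need at least one device")
    if n_devices > 1:
        return "raid1"   # multi-device: raid1 regardless of the media
    if ssd_detected:
        return "single"  # the single-device SSD exception being debated
    return "dup"         # single device, not detected as an SSD


# A dmcrypt-backed SSD that is not detected as an SSD gets dup:
print(default_metadata_profile(1, ssd_detected=False))
```

The whole question above is whether the middle branch should exist at all,
given that detection is unreliable and not all SSDs deduplicate internally.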

On Tue, May 06, 2014 at 07:39:12PM +0000, Duncan wrote:
> Well, assuming that by -d linear you meant -d single. Btrfs doesn't call 
> it linear, tho at the data safety level, btrfs single is actually quite 
> comparable to mdadm linear.  =:^)  

Yes, I meant single, sorry :)
(aka linear for mdadm)

> > At the time I used -m raid1 -d raid0, but it sounds like, for slightly
> > extra recoverability, I should have used -m raid1 -d linear (and yes, I
> > understand that one should not consider a -d linear recoverable when a
> > drive went missing).
> 
> That appears to be a very good use of either -d raid0 or -d single, yes.  
> And since you're apparently not streaming such high resolution video that 
> you NEED the raid0, single does indeed give you a somewhat better chance 
> at recovery.

zoneminder saves 'video' as a stream of independent small jpegs, so I'm
good. Actually, come to think of it, they're so small that they probably
all ended up in the raid1 metadata. That also means that I'm not getting
twice the storage space like I planned to. Oh well...

Thanks for all the answers.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

