Re: Is metadata redundant over more than one drive with raid0 too?

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Is metadata redundant over more than one drive with raid0 too?
Date: Tue, 6 May 2014 19:05:52 +0000 (UTC)
Message-ID: <pan$31100$5edd829$14fe1b2f$4e1f4bd7@cox.net>
In-Reply-To: 20140505012719.GD10159@merlins.org

Marc MERLIN posted on Sun, 04 May 2014 18:27:19 -0700 as excerpted:

> On Sun, May 04, 2014 at 09:44:41AM +0200, Brendan Hide wrote:
>> >Ah, I see the man page now "This is because SSDs can remap blocks
>> >internally so duplicate blocks could end up in the same erase block
>> >which negates the benefits of doing metadata duplication."
>> 
>> You can force dup but, per the man page, whether or not that is
>> beneficial is questionable.
> 
> So the reason I was confused originally was this:
> legolas:~# btrfs fi df /mnt/btrfs_pool1
> Data, single: total=734.01GiB, used=435.39GiB
> System, DUP: total=8.00MiB, used=96.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=8.50GiB, used=6.74GiB
> Metadata, single: total=8.00MiB, used=0.00
> 
> This is on my laptop with an SSD. Clearly btrfs is using duplicate
> metadata on an SSD, and I did not ask it to do so.
> Note that I'm still generally happy with the idea of duplicate metadata
> on an SSD even if it's not bulletproof.

In regard to metadata defaulting to single rather than the (otherwise) dup 
on single-device ssd:

1) In order to do that, btrfs (I guess mkfs.btrfs in this case) must be 
able to detect that the device *IS* ssd.  Depending on the SSD, the 
kernel version, and whether the btrfs is being created directly on the 
bare-metal device or on some device layered (lvm or dmcrypt or whatever) 
on top of the bare metal, btrfs may or may not successfully detect that.

Obviously in your case[1] the ssd wasn't detected.
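
FWIW, the profiles in use can be read off the btrfs fi df output quoted 
above mechanically.  A minimal Python sketch, assuming the 
"Type, profile: total=..., used=..." line layout seen in that quote; the 
function name is made up for illustration and is not part of any btrfs 
tool:

```python
import re
from collections import defaultdict

# Matches lines such as "Metadata, DUP: total=8.50GiB, used=6.74GiB".
# The layout is assumed from the output quoted in this thread.
_LINE = re.compile(r"^\s*([^,]+),\s*(\w+):\s*total=(\S+),\s*used=(\S+)\s*$")


def profiles_by_type(fi_df_output):
    """Map each block group type to a list of (profile, used) pairs."""
    profiles = defaultdict(list)
    for line in fi_df_output.splitlines():
        match = _LINE.match(line)
        if match:
            group_type, profile, _total, used = match.groups()
            profiles[group_type].append((profile, used))
    return dict(profiles)
```

Fed the output quoted above, the Metadata entry lists both DUP (in use) 
and single (allocated but empty), which is the giveaway that dup was 
chosen at mkfs time.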

Question:  Does btrfs detect ssd and automatically add it to the mount 
options for that btrfs?  I suspect not, which would be consistent with 
mkfs.btrfs not having detected the SSD either.  FWIW, it is detected 
here.  I've never specifically added ssd to any of my btrfs mount 
options, but it's always there in /proc/self/mounts when I check.[2]
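
That check is easy enough to script.  A rough Python sketch, assuming 
the usual whitespace-separated /proc/self/mounts layout (device, mount 
point, fstype, options, ...); the function names are hypothetical:

```python
def btrfs_mount_options(mounts_text):
    """Return {mount_point: [options]} for every btrfs entry in the text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fstype, comma-separated options, ...
        if len(fields) >= 4 and fields[2] == "btrfs":
            result[fields[1]] = fields[3].split(",")
    return result


def has_ssd_option(options):
    """True only for the exact 'ssd' option, not e.g. 'ssd_spread'."""
    return "ssd" in options

# Typical use on a Linux box:
#   with open("/proc/self/mounts") as f:
#       for mount_point, opts in btrfs_mount_options(f.read()).items():
#           print(mount_point, has_ssd_option(opts))
```

Splitting the option string on commas first avoids a substring match 
mistaking some other option for ssd.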

I believe I've seen you mention using dmcrypt or the like, however, which 
probably doesn't pass whatever is used for ssd detection on thru, thus 
explaining btrfs not seeing it and you having to specify it yourself, if 
you wish.

While I'm not sure, I /think/ btrfs may use the sysfs rotational file (or 
rather, the same information that the kernel exports to that file) for 
this detection.  For my bare-metal devices that's:

/sys/block/sdX/queue/rotational

For my ssds that file contains "0" while for spinning rust, it contains 
"1".
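
Reading that file from a script is straightforward.  A minimal Python 
sketch; the function name and the sysfs_root parameter (there only so it 
can be pointed at a test directory) are my own invention, and whether 
btrfs itself consults this exact file remains the guess stated above:

```python
from pathlib import Path
from typing import Optional


def is_rotational(device: str, sysfs_root: str = "/sys/block") -> Optional[bool]:
    """True for spinning media, False for non-rotational, None if unknown.

    `device` is a bare block device name such as "sda", not a /dev path.
    """
    flag = Path(sysfs_root) / device / "queue" / "rotational"
    try:
        value = flag.read_text().strip()
    except OSError:
        # No such device, or the file is not readable.
        return None
    if value == "0":
        return False
    if value == "1":
        return True
    return None
```

For a layered device (dmcrypt, lvm) it is worth checking both the dm 
device and the bare-metal device underneath, since the two need not 
agree.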

The contents of that file are derived in turn from the information 
exported by the device.  I believe the same information can be seen with 
hdparm -I, in the Configuration section, as Nominal Media Rotation Rate.

For my spinning rust that returns an RPM value such as 7200.  For my ssds 
it returns "Solid State Device".

The same information can be seen with smartctl -i, which has much shorter 
output so it's easier to find.  Look for Rotation Rate.

Again, my ssds report "Solid State Device", while my spinning rust 
reports a value such as "7200 rpm".
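
If you'd rather not eyeball it, the Rotation Rate line can be picked out 
of saved smartctl -i output.  A Python sketch, assuming only the two 
value forms mentioned above ("Solid State Device" and "<N> rpm"); other 
devices may report something else or omit the line entirely, and the 
function names are hypothetical:

```python
import re
from typing import Optional

_RATE = re.compile(r"^\s*Rotation Rate:\s*(.+?)\s*$")


def rotation_rate(smartctl_info: str) -> Optional[str]:
    """Return the text after 'Rotation Rate:', or None if the line is absent."""
    for line in smartctl_info.splitlines():
        match = _RATE.match(line)
        if match:
            return match.group(1)
    return None


def reports_ssd(smartctl_info: str) -> Optional[bool]:
    """True if the device reports itself as solid state, None if unknown."""
    rate = rotation_rate(smartctl_info)
    if rate is None:
        return None
    return rate == "Solid State Device"
```

The text to feed it would come from something like 
smartctl -i /dev/sdX > info.txt, run as root.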

2) The only reason I happen to know about the single-device SSD 
exception, where metadata defaults to single mode (it otherwise defaults 
to dup mode on single-device, and to raid1 mode on multi-device 
regardless of the media), is that I believe Chris Mason commented on it 
in an on-list reply.
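
Spelled out as a decision table, the defaults as I've described them 
look like this.  This is a Python model of my description only, not code 
from mkfs.btrfs, and the function name is made up:

```python
def default_metadata_profile(num_devices: int, detected_ssd: bool) -> str:
    """Metadata profile mkfs would pick by default, per the description above."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    if num_devices > 1:
        # Multi-device: raid1 regardless of the media.
        return "raid1"
    # Single device: dup, unless the device was detected as an ssd.
    return "single" if detected_ssd else "dup"
```

Note the second argument is "detected as ssd", not "is an ssd".  An ssd 
that goes undetected lands in the dup branch, which matches the DUP 
metadata in the output quoted at the top.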

The reasoning given in that reply was not the erase-block reason I've 
seen someone else mention here (and which doesn't quite make sense to me, 
since I don't know why that would make a difference), but rather:

Some SSD firmware does automatic deduplication and compression.  On these 
devices, DUP-mode would almost certainly be stored as a single internal 
data block with two external address references anyway, so it would 
actually be single in any case, and defaulting to single (a) doesn't hide 
that fact, and (b) reduces overhead that's justified for safety 
otherwise, but if the firmware is doing an end run around that safety 
anyway, might as well just shortcut the overhead as well.
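
To make that concrete, here's a toy Python model of a deduplicating 
device.  It is purely illustrative: real firmware is a black box, and the 
class and its content-hash scheme are assumptions of mine, not a 
description of any actual SSD.

```python
import hashlib


class DedupingDevice:
    """Toy block device that stores identical blocks only once."""

    def __init__(self):
        self.logical = {}   # logical block address -> content digest
        self.physical = {}  # content digest -> stored bytes

    def write(self, lba, data):
        digest = hashlib.sha256(data).hexdigest()
        # Identical content is stored once; later writes just add a reference.
        self.physical.setdefault(digest, data)
        self.logical[lba] = digest

    def read(self, lba):
        return self.physical[self.logical[lba]]

    def physical_copies(self):
        return len(self.physical)

    def corrupt(self, lba):
        """Simulate the physical block behind `lba` going bad."""
        digest = self.logical[lba]
        self.physical[digest] = b"\x00" * len(self.physical[digest])
```

Write the same metadata block to two addresses, as dup mode would, and 
physical_copies() still reports 1.  Corrupt either address and a read of 
the other returns the damaged data too, so the second copy bought 
nothing.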

However, while the btrfs default will apply to all (detected) ssds, not 
all ssds have firmware that does this internal deduplication!

In fact, the documentation for my ssds sells its LACK of such compression 
and deduplication as a feature, pointing out that such features tend to 
make the behavior of a device far less predictable[3], tho they do 
increase maximum speed and capacity.

Which is why I've chosen to specify dup mode on my single-device btrfs 
here, even on ssds.[4]  While it'd be the wrong choice on ssds that do 
compression and deduplication, on mine, it's still the right choice. =:^)

If your SSDs don't do firmware-based dedup/compression, then dup metadata 
is still arguably the best choice on ssd.  But if they do, the single 
metadata default does indeed make more sense, even if that's not the 
default you're getting due to lack of ssd detection.

---
[1] Obviously ssd not detected: Assuming you didn't specify metadata 
level, probably a safe assumption or we'd not be having the discussion.  
Personally, I always make a point of specifying both data and metadata 
level here when doing a mkfs.btrfs, just to be sure.

[2] My btrfs are all on SSD.  I'm still using legacy reiserfs on my 
legacy spinning rust, but reiserfs' journaling behavior isn't appropriate 
for ssd, so where I've upgraded to ssd I use btrfs.  Which works out 
great since the spinning rust is backup for the ssds, and the very mature 
reiserfs is backup for the still under heavy development btrfs. =:^)

[3] Compression/deduplication performance:  Indeed, both the speed and 
capacity of devices with compression and deduplication vary greatly 
depending on the compressibility of the data.  Maximum speed and capacity 
are certainly greater, but neither is easily predictable.

[4] Most of my btrfs are raid1 mode across two devices, each with a 
second raid1 btrfs as primary backup on a second set of partitions across 
the same two devices.  I do have a couple single-device btrfs, however: 
/boot, and its backup on the other device.  /boot is the exception 
because it's a lot easier to tell the BIOS to boot from the other device 
and thus select the backup /that/ way, than it is to tell grub to use a 
different /boot!  Altho with grub2, it's actually possible to have it 
select the /boot too, but the BIOS selector method is still easier.

Of course being /boot and its backup, those single-device btrfs are both 
quite small, 256 MiB each, and I use mixed-bg (-M in mkfs.btrfs) mode for 
them as a result.  That means I dup both data and metadata at the same 
time, since they're mixed together, which in turn means the effective 
filesystem capacity is half the filesystem size, 128 MiB instead of 256.  
But 128 MiB is fine for /boot.  I just have to track the number of 
kernels (with attached initramfs on each one, dramatically increasing the 
individual kernel size) I have available a bit closer and delete them 
sooner than I might otherwise, plus watch the btrfs fi show output a bit 
more closely and do a balance when unallocated gets too low.  But I still 
have room to track a couple stable kernels, plus a dozen or so pre-
releases when I'm bisecting a kernel bug, before I have to start deleting.
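
The halving is just division by the number of copies.  A trivial Python 
sketch that ignores filesystem overhead and chunk allocation 
granularity, so treat the result as an upper bound; the function name is 
hypothetical:

```python
def usable_capacity_mib(filesystem_mib: int, copies: int) -> int:
    """Upper bound on usable space when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return filesystem_mib // copies
```

With mixed-bg dup on a 256 MiB filesystem, usable_capacity_mib(256, 2) 
gives the 128 MiB figure above.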

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
