From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: mkfs.btrfs(8) --data outdated?
Date: Mon, 28 Sep 2015 02:19:32 +0000 (UTC)
Message-ID: <pan$d27fc$1a42a006$3814459b$d2b2ea95@cox.net>
In-Reply-To: mu9jpv$l25$1@ger.gmane.org

Juan Francisco Cantero Hurtado posted on Sun, 27 Sep 2015 22:33:01 +0200
as excerpted:

> kernel: 4.1.7
> btrfs-progs: 4.2
> 
> mkfs.btrfs(8) man page says:

> -d|--data <type>
>      Specify how the data must be spanned across the devices specified.
>      Valid values are raid0, raid1, raid5, raid6, raid10 or single.
> 
> $ sudo mkfs.btrfs -d dup -m dup -M -O no-holes -f /dev/sdb1
> SMALL VOLUME: forcing mixed metadata/data groups
> btrfs-progs v4.2
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               29de8c3e-92da-4fe1-aa31-830c1068f532
> Node size:          4096
> Sector size:        4096
> Filesystem size:    3.00GiB
> Block group profiles:
>    Data+Metadata:    DUP             161.56MiB
>    System:           DUP             12.00MiB
> SSD detected:       no
> Incompat features:  mixed-bg, extref, skinny-metadata, no-holes
> Number of devices:  1
> Devices:
>     ID        SIZE  PATH
>      1     3.00GiB  /dev/sdb1
> 
> 
> The man page doesn't mention the option "-d dup". Am I missing something
> or the man page is outdated?. The partition will contain two copies of
> the data blocks?

The manpage isn't exactly outdated (for that, but see below), you just 
missed the implications of what it said about another option, -M|--mixed, 
and the SMALL VOLUME notation.

The...

SMALL VOLUME: forcing mixed metadata/data groups

... explicitly tells you it's enabling mixed mode (which is the default 
for 1 GiB and smaller filesystems, tho the manpage just says recommended, 
not mentioning that it's the default, so it's arguably outdated in that 
regard, tho the mkfs.btrfs output is explicit if there's any doubt).

But while you have a 3.00 GiB filesystem that wouldn't default to mixed-
mode, you specifically enabled it with the -M option.

In fact, there has been some discussion about upping the default breakover 
between separate data/metadata block groups and mixed-mode, to something 
like 32 GiB, but AFAIK, nobody has ever actually submitted a patch to do 
so, so the default breakover remains 1 GiB, despite it being apparently 
generally agreed on-list that mixed-mode would be a better default to at 
least 16 if not 32 GiB.  (Tho with btrfs now automatically cleaning empty 
chunks, the need for this is less than it was back before that change, 
around 3.17, IIRC, but it's arguably still a good idea.)

So at 3 GiB, your choice of mixed-mode was arguably wise. =:^)
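
To make the breakover concrete, here is a minimal sketch of the default as described above.  The function name default_bg_layout is made up for illustration, and the 1 GiB threshold is simply the figure from this discussion; the real mkfs.btrfs logic may differ between btrfs-progs versions.

```shell
#!/bin/sh
# Hypothetical helper: reports which block-group layout mkfs.btrfs would
# pick by default, ASSUMING the 1 GiB breakover described above.
# Argument: filesystem size in MiB.
default_bg_layout() {
    size_mib=$1
    threshold_mib=1024   # 1 GiB and smaller defaults to mixed mode
    if [ "$size_mib" -le "$threshold_mib" ]; then
        echo mixed
    else
        echo separate
    fi
}

default_bg_layout 256    # prints: mixed
default_bg_layout 3072   # prints: separate (the 3 GiB case, unless -M is given)
```

So the 3 GiB filesystem in the question only ends up mixed because -M was passed explicitly.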

But that you specifically chose mixed-mode with the -M option, and yet 
are still posting this question, indicates that while you might well know 
it's recommended at the 3 GiB filesystem level, you remain confused as to 
what it actually does.

What mixed-mode does is this.  Instead of creating separate data and 
metadata chunks (aka block-groups), data and metadata are combined into a 
single, mixed-chunk, format.  While slightly less efficient performance-
wise, this allows maximum flexibility in terms of space usage, allowing 
you to fill the filesystem much closer to 100% without running out of 
space in either data or metadata chunks while still having plenty of room 
in the other.  As such, with data chunks nominally 1 GiB (tho they're 
smaller as space gets tight and initial data chunks can be bigger on TiB-
scale filesystems) and metadata chunks nominally 256 MiB, the extra 
flexibility of being able to use that last bit of space on a small 
filesystem can really make a difference.

OTOH, metadata defaults to dup mode on single-device filesystems (except 
on SSD) for reliability reasons, with data defaulting to single mode.  
And mixed mode takes that default from the metadata side, so it too 
defaults to dup, for data as well as metadata since they're mixed in the 
same block-groups -- on the smallest filesystems, where dup mode data 
*REALLY* puts a crimp in how much room you actually have to store stuff!


So mixed-bg mode is definitely a tradeoff, since with mixed-bg you can't 
set data and metadata replication modes separately because they're 
combined in the same chunks/bgs, meaning you have to choose either the 
riskier single mode for metadata as well as data, or the duplicating and 
thus space gobbling dup mode for data as well as metadata... on a small 
filesystem that's likely already tight on space!
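
A rough sketch of what that choice costs in space.  The helper name usable_mib is hypothetical, and the result is only an upper bound: it ignores the reserved system chunks and any gain from compression.

```shell
#!/bin/sh
# Hypothetical helper: upper bound on usable space on a single-device
# mixed-bg filesystem.  Ignores system chunks and compression.
# Arguments: raw size in MiB, profile (single or dup).
usable_mib() {
    raw_mib=$1
    profile=$2
    case "$profile" in
        single) echo "$raw_mib" ;;            # one copy of everything
        dup)    echo $((raw_mib / 2)) ;;      # two copies of everything
        *)      echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}

usable_mib 3072 single   # prints: 3072
usable_mib 3072 dup      # prints: 1536
```

On the 3 GiB filesystem from the question, dup for data as well as metadata leaves at most about 1.5 GiB for actual files.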

On the bright side, with dup mode mixed-bgs, the increased reliability of 
dup mode metadata now applies to data too, so damage to the device that 
doesn't take out the entire device, just some parts of it, is less likely 
to trigger loss of access to the file, as there's a second copy available 
that with a bit of luck hasn't been damaged. =:^)

In fact, because separate data/metadata doesn't even give you the option 
of dup mode for data, only for metadata, some people actually choose to 
use mixed-mode on much larger filesystems as well, even at the 
performance cost mixed mode implies, simply to be able to get the extra 
protection dup mode gives them. =:^)


Meanwhile, while we're on the topic, it may be useful to note another 
difference of mixed-bg mode that doesn't affect normal operation, but 
that you'll run into if you try to use btrfs balance filters.  Balance 
filters are normally applied using the -d[<filters>] -m[<filters>] 
options, for data and metadata chunks, respectively.  As such, there's no 
explicit balance-filters support for mixed-bg mode, but they can still be 
applied, provided both the -d and -m options are given, and the filters 
for both match exactly.  So for instance you can run...

btrfs balance start -dusage=5 -musage=5 /

... and on a mixed-bg /, it'll work, because the identical usage=5 filter 
has been given for both data and metadata.  But...

btrfs balance start -dusage=5 /

... without the similar -musage=5, will not run, instead giving you an 
error.  Similarly...

btrfs balance start -dusage=5 -musage=2 /

... will fail with an error, because it's a usage filter in both cases, 
but the values are different.
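
One way to avoid mismatched filters is to build the command from a single filter string.  This is only a sketch: mixed_balance_cmd is a made-up wrapper name, and it prints the command rather than running it, so you can check it before use.

```shell
#!/bin/sh
# Hypothetical wrapper: builds a balance command line for a mixed-bg
# filesystem by reusing ONE filter string for both -d and -m, so the
# two can't drift apart.  It only prints the command, it doesn't run it.
mixed_balance_cmd() {
    filter=$1       # e.g. usage=5
    mountpoint=$2   # e.g. /
    printf 'btrfs balance start -d%s -m%s %s\n' "$filter" "$filter" "$mountpoint"
}

mixed_balance_cmd usage=5 /   # prints: btrfs balance start -dusage=5 -musage=5 /
```

Once the printed command looks right, run it as root.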


(The rest is simply personal experience, with explanation of how I came 
to have a 256 MiB mixed-bg dup mode btrfs in the first place.  Skip it if 
you are pressed for time or your curiosity isn't as active as mine. =:^)

I have two ssds, partitioned identically, with various mostly btrfs raid1 
filesystems set up on the parallel partitions on each ssd.  My normal 
backup method is to set up two partitions of the same size on each device, 
one pair for the working copy filesystem and the other pair for its 
backup, each filesystem still raid1 across both devices.  However, /boot 
is an exception.  That method won't work for it, because grub can only 
point at one /boot.  (Tho granted, grub2, with its rescue mode, is more 
flexible in this regard than was grub1.  In theory, I could use grub 
rescue mode to point grub at a backup boot partition if I needed to.  
However, grub's rescue mode is much more limited than regular mode, and 
worth avoiding if possible.  And since I had developed this technique 
back on grub1 which didn't have a rescue mode, I could simply continue to 
use it with grub2, thereby avoiding rescue mode unless things get REALLY 
bad. =:^)

So instead of creating /boot as a btrfs raid1, with a second btrfs raid1 
on other partitions on the same devices, I set up only one boot partition 
and filesystem on each device, with grub set up on each device to point at 
its own boot partition.  Since I can select the boot device from BIOS, 
with the BIOS then loading grub as the boot manager from that device, and 
grub in turn pointing at its /boot on the same device, that gives me a 
working copy /boot on one device, and its backup on the other one. =:^)

But without the normal protection of the btrfs raid1 on both the working 
and backup copy that I have for other partitions, I at least wanted btrfs 
dup mode, and it was the default anyway, since the filesystems were well 
under a gig.

The problem is, when I originally did my partition setup, I only set up 
256 MiB partitions for /boot and its backup, thinking that should be 
plenty, as like my other filesystems, it was the same or bigger than the 
partitions I had used previously, when I was using reiserfs.  And I mount 
with compress=lzo on all my btrfs, so in most cases, they actually ended 
up taking less space than the reiserfs version, so the same or more space 
was fine.

But what I forgot to account for was the extra space required for the 
single-device dup-mode data, on the 256 MiB /boot partition on the one 
ssd and its backup on the other.  So instead of extra room due to the 
compression as I had on the other btrfs, on /boot and its backup, I only 
ended up with 128 MiB due to the dup, with the reserved system chunks 
taking several MiB of that! =:^(

So my /boot and its backup ended up with under 128 MiB capacity due to 
the dup, half what I had planned, and as a result, they run *MUCH* closer 
to full than I had intended.  What with the grub2 files as well, I can't 
fit nearly as many kernels on /boot as I would have liked, and 
while I can still fit several, generally a tested stable, plus a few pre-
release rc kernels, I have to delete those pre-releases a whole lot 
faster than I was used to back with the old reiserfs /boot and its 
backup, where I could generally keep a whole kernel cycle's worth of 
testing kernels, handy when I'm doing a bisect, for instance.

I rebalance occasionally too, trying to keep at least /some/ unallocated 
space on the filesystem.  Which is why I know about having to keep the
-mfilter and -dfilter exactly the same on mixed-bg mode btrfs.

Anyway, I've adapted to it, but they're still smaller than I'd like.  I 
try to fit all my tiny partitions in the first gig, arranging sizes so 
the last one ends at the 1 GiB boundary and all further filesystems are 
in multiples of a GiB and thus GiB aligned.  But when I next repartition, 
I'll probably create /boot and its backup as 384 MiB instead of 256, 
giving it another 128 MiB and thus another 64 MiB capacity with dup.  
Since I'll still be trying to keep the sub-GiB filesystems to the first 
GiB, that means something else will be 128 MiB smaller, probably
/var/log, 512 MiB instead of its current 640 MiB.  (Like the others 
excepting boot and the reserved GPT BIOS and EFI partitions, log is btrfs 
raid1, so while it's tiny and thus mixed-bg too, the dup mode problem 
doesn't apply.)
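
The arithmetic behind that plan, as a quick sanity check.  The variable names are just for illustration; the sizes are the ones given above, per device.

```shell
#!/bin/sh
# Sizes in MiB, taken from the plan above.
old_boot=256; old_log=640
new_boot=384; new_log=512

# The two partitions together should take the same space before and
# after, so everything still fits in the first GiB.
echo "old total: $((old_boot + old_log)) MiB"   # prints: old total: 896 MiB
echo "new total: $((new_boot + new_log)) MiB"   # prints: new total: 896 MiB

# /boot is dup, so usable capacity grows by at most half the added size.
echo "boot capacity gain: $(( (new_boot - old_boot) / 2 )) MiB"   # prints: boot capacity gain: 64 MiB
```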

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


