From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from plane.gmane.org ([80.91.229.3]:36634 "EHLO plane.gmane.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1753373AbbI1VxV (ORCPT ); Mon, 28 Sep 2015 17:53:21 -0400
Received: from list by plane.gmane.org with local (Exim 4.69)
        (envelope-from ) id 1ZggMH-00038O-Ks
        for linux-btrfs@vger.kernel.org; Mon, 28 Sep 2015 23:53:17 +0200
Received: from 84.76.233.227 ([84.76.233.227]) by main.gmane.org with esmtp
        (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ;
        Mon, 28 Sep 2015 23:53:17 +0200
Received: from iam by 84.76.233.227 with local (Gmexim 0.1 (Debian))
        id 1AlnuQ-0007hv-00 for ; Mon, 28 Sep 2015 23:53:17 +0200
To: linux-btrfs@vger.kernel.org
From: Juan Francisco Cantero Hurtado
Subject: Re: mkfs.btrfs(8) --data outdated?
Date: Mon, 28 Sep 2015 23:53:10 +0200
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
In-Reply-To:
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

Thanks for the help (and the story :P ). The partition was just a test.

On 09/28/2015 04:19 AM, Duncan wrote:
> Juan Francisco Cantero Hurtado posted on Sun, 27 Sep 2015 22:33:01 +0200
> as excerpted:
>
>> kernel: 4.1.7
>> btrfs-progs: 4.2
>>
>> mkfs.btrfs(8) man page says:
>
>> -d|--data
>>     Specify how the data must be spanned across the devices specified.
>>     Valid values are raid0, raid1, raid5, raid6, raid10 or single.
>>
>> $ sudo mkfs.btrfs -d dup -m dup -M -O no-holes -f /dev/sdb1
>> SMALL VOLUME: forcing mixed metadata/data groups
>> btrfs-progs v4.2
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               29de8c3e-92da-4fe1-aa31-830c1068f532
>> Node size:          4096
>> Sector size:        4096
>> Filesystem size:    3.00GiB
>> Block group profiles:
>>   Data+Metadata:    DUP       161.56MiB
>>   System:           DUP        12.00MiB
>> SSD detected:       no
>> Incompat features:  mixed-bg, extref, skinny-metadata, no-holes
>> Number of devices:  1
>> Devices:
>>    ID     SIZE  PATH
>>     1  3.00GiB  /dev/sdb1
>>
>>
>> The man page doesn't mention the option "-d dup". Am I missing something
>> or the man page is outdated? The partition will contain two copies of
>> the data blocks?
>
> The manpage isn't exactly outdated (for that, but see below), you just
> missed the implications of what it said about another option, -M|--mixed,
> and the SMALL VOLUME notation.
>
> The...
>
> SMALL VOLUME: forcing mixed metadata/data groups
>
> ... explicitly tells you it's enabling mixed mode (which is the default
> for 1 GiB and smaller filesystems, tho the manpage just says recommended,
> not mentioning that it's the default, so it's arguably outdated in that
> regard, tho the mkfs.btrfs output is explicit if there's any doubt).
>
> But while you have a 3.00 GiB filesystem that wouldn't default to
> mixed-mode, you specifically enabled it with the -M option.
>
> In fact, there has been some discussion about upping the default breakover
> between separate data/metadata block groups and mixed-mode, to something
> like 32 GiB, but AFAIK, nobody has ever actually submitted a patch to do
> so, so the default breakover remains 1 GiB, despite it being apparently
> generally agreed on-list that mixed-mode would be a better default to at
> least 16 if not 32 GiB. (Tho with btrfs now automatically cleaning empty
> chunks, the need for this is less than it was back before that change,
> around 3.17, IIRC, but it's arguably still a good idea.)
>
> So at 3 GiB, your choice of mixed-mode was arguably wise.
> =:^)
>
> But that you specifically chose mixed-mode with the -M option, and yet
> are still posting this question, indicates that while you might well know
> it's recommended at the 3 GiB filesystem level, you remain confused as to
> what it actually does.
>
> What mixed-mode does is this. Instead of creating separate data and
> metadata chunks (aka block-groups), data and metadata are combined into a
> single, mixed-chunk, format. While slightly less efficient
> performance-wise, this allows maximum flexibility in terms of space usage,
> allowing you to fill the filesystem much closer to 100% without running
> out of space in either data or metadata chunks while still having plenty
> of room in the other. As such, with data chunks nominally 1 GiB (tho
> they're smaller as space gets tight and initial data chunks can be bigger
> on TiB-scale filesystems) and metadata chunks nominally 256 MiB, the extra
> flexibility of being able to use that last bit of space on a small
> filesystem can really make a difference.
>
> OTOH, metadata defaults to dup mode on single-device filesystems (except
> on SSD) for reliability reasons, with data defaulting to single mode.
> And mixed mode takes that default from the metadata side, so it too
> defaults to dup, for data as well as metadata since they're mixed in the
> same block-groups -- on the smallest filesystems, where dup mode data
> *REALLY* puts a crimp in how much room you actually have to store stuff!
>
>
> So mixed-bg mode is definitely a tradeoff, since with mixed-bg you can't
> set data and metadata replication modes separately because they're
> combined in the same chunks/bgs, meaning you have to choose either the
> riskier single mode for metadata as well as data, or the duplicating and
> thus space gobbling dup mode for data as well as metadata... on a small
> filesystem that's likely already tight on space!
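
As a rough illustration of the space cost described in the quoted text, here
is a minimal shell sketch that estimates usable capacity on a single-device
filesystem under the single and dup profiles. The function name usable_mib
is hypothetical, and the arithmetic is a simplifying assumption: it ignores
system chunks and other overhead, so real figures will be somewhat lower.
The 3072 MiB input is the 3.00 GiB test partition from the mkfs.btrfs output.

```shell
# Hypothetical helper: rough usable capacity in MiB for a given profile.
# Assumption: overhead (system chunks, reserved space) is ignored.
# Usage: usable_mib RAW_MIB PROFILE   (PROFILE is "single" or "dup")
usable_mib() {
    raw_mib=$1
    profile=$2
    case "$profile" in
        single) copies=1 ;;   # one copy of every block
        dup)    copies=2 ;;   # two copies on the same device
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
    echo $(( raw_mib / copies ))
}

usable_mib 3072 single   # prints 3072
usable_mib 3072 dup      # prints 1536
```

So with mixed-bg dup, roughly half of the raw space is available for files,
which is the tradeoff against the extra copy of the data.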
>
> On the bright side, with dup mode mixed-bgs, the increased reliability of
> dup mode metadata now applies to data too, so damage to the device that
> doesn't take out the entire device, just some parts of it, is less likely
> to trigger loss of access to the file, as there's a second copy available
> that with a bit of luck hasn't been damaged. =:^)
>
> In fact, because separate data/metadata doesn't even give you the option
> of dup mode for data, only for metadata, some people actually choose to
> use mixed-mode on much larger filesystems as well, even at the expense of
> performance mixed mode implies, simply to be able to get the extra
> protection dup mode gives them. =:^)
>
>
> Meanwhile, while we're on the topic, it may be useful to note another
> difference of mixed-bg mode that doesn't affect normal operation, but
> that you'll run into if you try to use btrfs balance filters. Balance
> filters are normally applied using the -d[<filters>] -m[<filters>]
> options, for data and metadata chunks, respectively. As such, there's no
> explicit balance-filters support for mixed-bg mode, but they can still be
> applied, provided both the -d and -m options are given, and the filters
> element for both exactly match. So for instance you can run...
>
> btrfs balance start -dusage=5 -musage=5 /
>
> ... and on a mixed-bg /, it'll work, because the identical usage=5 filter
> has been given for both data and metadata. But...
>
> btrfs balance start -dusage=5
>
> ... without the similar -musage=5, will not run, instead giving you an
> error. Similarly...
>
> btrfs balance start -dusage=5 -musage=2
>
> ... will fail with an error, because it's a usage filter in both cases,
> but the values are different.
>
>
> (The rest is simply personal experience, with explanation of how I came
> to have a 256 MiB mixed-bg dup mode btrfs in the first place. Skip it if
> you are pressed for time or your curiosity isn't as active as mine.
> =:^)
>
> I have two ssds, partitioned identically, with various mostly btrfs raid1
> filesystems setup on the parallel partitions on each ssd. However, my
> /boot partition is an exception, since grub can only point at one /boot,
> so my normal backup method of setting up two partitions of the same size
> on each device, each still raid1 but here two separate copies of the
> filesystem on the same device, one for the working copy filesystem, the
> other for the backup, won't work for /boot, because grub can only point
> at one of them. (Tho granted, grub2, with its rescue mode, is more
> flexible in this regard than was grub1. In theory, I could use grub
> rescue mode to point grub at a backup boot partition if I needed to.
> However, grub's rescue mode is much more limited than regular mode, and
> worth avoiding if possible. And since I had developed this technique
> back on grub1 which didn't have a rescue mode, I could simply continue to
> use it with grub2, thereby avoiding rescue mode unless things get REALLY
> bad. =:^)
>
> So instead of creating /boot as a btrfs raid1, with a second btrfs raid1
> on other partitions on the same devices, I setup only one boot partition
> and filesystem on each device, with grub setup on each device to point at
> its own boot partition. Since I can select the boot device from BIOS,
> with the BIOS then loading grub as the boot manager from that device, and
> grub in turn pointing at its /boot on the same device, that gives me a
> working copy /boot on one device, and its backup on the other one. =:^)
>
> But without the normal protection of the btrfs raid1 on both the working
> and backup copy that I have for other partitions, I at least wanted btrfs
> dup mode, and it was the default anyway, since the filesystems were well
> under a gig.
>
> The problem is, when I originally did my partition setup, I only setup
> 256 MiB partitions for /boot and its backup, thinking that should be
> plenty, as like my other filesystems, it was the same or bigger than the
> partitions I had used previously, when I was using reiserfs. And I mount
> with compress=lzo on all my btrfs, so in most cases, they actually ended
> up taking less space than the reiserfs version, so the same or more space
> was fine.
>
> But what I forgot to account for was the extra space required for the
> single-device dup-mode data, on the 256 MiB /boot partition on the one
> ssd and its backup on the other. So instead of extra room due to the
> compression as I had on the other btrfs, on /boot and its backup, I only
> ended up with 128 MiB due to the dup, with the reserved system chunks
> taking several MiB of that! =:^(
>
> So my /boot and its backup ended up with under 128 MiB capacity due to
> the dup, half what I had planned, and as a result, they run *MUCH* closer
> to full than I had intended. What with the grub2 files as well, I can't
> fit nearly the number of kernels on /boot as I would have liked, and
> while I can still fit several, generally a tested stable, plus a few
> pre-release rc kernels, I have to delete those pre-releases a whole lot
> faster than I was used to back with the old reiserfs /boot and its
> backup, where I could generally keep a whole kernel cycle's worth of
> testing kernels, handy when I'm doing a bisect, for instance.
>
> I rebalance occasionally too, trying to keep at least /some/ unallocated
> space on the filesystem. Which is why I know about having to keep the
> -mfilter and -dfilter exactly the same on mixed-bg mode btrfs.
>
> Anyway, I've adapted to it, but they're still smaller than I'd like. I
> try to fit all my tiny partitions in the first gig, arranging sizes so
> the last one ends at the 1 GiB boundary and all further filesystems are
> in multiples of a GiB and thus GiB aligned.
> But when I next repartition,
> I'll probably create /boot and its backup as 384 MiB instead of 256,
> giving it another 128 MiB and thus another 64 MiB capacity with dup.
> Since I'll still be trying to keep the sub-GiB filesystems to the first
> GiB, that means something else will be 128 MiB smaller, probably
> /var/log, 512 MiB instead of its current 640 MiB. (Like the others
> excepting boot and the reserved GPT BIOS and EFI partitions, log is btrfs
> raid1, so while it's tiny and thus mixed-bg too, the dup mode problem
> doesn't apply.)
>
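
The balance-filter rule for mixed-bg filesystems described in the quoted
message (the same filter has to be given to both -d and -m) can be sketched
as a small shell helper. The name mixed_balance_cmd is hypothetical. It only
prints the command line rather than running it, and it assumes a single
filter string such as usage=5, as in the quoted example.

```shell
# Hypothetical helper: build a balance command for a mixed-bg filesystem.
# The same filter string is passed to both -d and -m so the two match.
# Usage: mixed_balance_cmd FILTER MOUNTPOINT
mixed_balance_cmd() {
    filter=$1
    mountpoint=$2
    if [ -z "$filter" ] || [ -z "$mountpoint" ]; then
        echo "usage: mixed_balance_cmd FILTER MOUNTPOINT" >&2
        return 1
    fi
    # Print only; the caller decides whether to run the result.
    echo "btrfs balance start -d$filter -m$filter $mountpoint"
}

mixed_balance_cmd usage=5 /   # prints: btrfs balance start -dusage=5 -musage=5 /
```

Generating both options from one variable makes it hard to end up with the
mismatched -dusage=5 -musage=2 case that the quoted message says fails.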