btrfs raid-1 uuid-fstab

linux-btrfs.vger.kernel.org archive mirror

* btrfs raid-1 uuid-fstab
From: James @ 2015-02-14  2:31 UTC (permalink / raw)
  To: linux-btrfs

Ok,

I have two identical 2T drives, formatted btrfs + ext4. I want to use
UUIDs in the fstab.  No swap for now (each system has 32G); if I need
swap later, can I just set up a file and use swapon? Usually I set up
"boot, root and swap", but it seems I am confused about how to do
that correctly with btrfs in a raid 1. What I want is that if a drive
fails, I can just replace it, or pull one drive out and replace it with
a second blank, new 2T drive, then move the removed drive into a second
(identical) system to build a cloned workstation. From what I've read,
UUIDs are supposed to be used with fstab + btrfs; PARTUUID is still
flaky. But the UUIDs do not appear unique (due to raid-1)? Do they only
get listed once in fstab?

So I'm finishing up a new install of btrfs-raid1  on a gentoo system.
The machine is in a chroot right now:

gdisk /dev/sdb

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
Found valid GPT with protective MBR; using GPT.

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            8191   3.0 MiB     EF02  grub2biosboot
   2            8192         1024000   496.0 MiB   8300  boot
   3         1026048      3907029134   1.8 TiB     8300  root

gdisk /dev/sda

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
Found valid GPT with protective MBR; using GPT.

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            8191   3.0 MiB     EF02  grub2biosboot
   2            8192         1024000   496.0 MiB   8300  boot
   3         1026048      3907029134   1.8 TiB     8300  root

#   blkid
/dev/loop0: TYPE="squashfs" 
/dev/sda1: UUID="85cd9d86-4f4d-4113-b14e-cf5339373e20" TYPE="ext4"
PARTLABEL="grub2biosboot" PARTUUID="f88a8259-a4e4-4db8-86df-e709d135fe47" 
/dev/sda2: LABEL="BOOT" UUID="d67a8d19-64bc-4ee1-bebf-48c935b039fa"
UUID_SUB="3eb62dd8-3f07-440f-8606-0c6d99362f6e" TYPE="btrfs"
PARTLABEL="boot" PARTUUID="8a6f7b5f-28a8-4f87-938f-386a93ebe07f" 
/dev/sda3: LABEL="BTROOT" UUID="b7753366-a9a9-4074-8e0e-3beea50fee56"
UUID_SUB="e546ce31-098f-4897-bffd-6c5628f6b62e" TYPE="btrfs"
PARTLABEL="root" PARTUUID="6a8fa54b-3d58-4ac5-8784-6d540f2e65fc" 
/dev/sdb2: LABEL="BOOT" UUID="d67a8d19-64bc-4ee1-bebf-48c935b039fa"
UUID_SUB="02034edf-c537-4fc6-9375-1599e8af2737" TYPE="btrfs"
PARTLABEL="boot" PARTUUID="b7b88ea7-b59a-4a4d-b857-4f55a1be3830" 
/dev/sdb3: LABEL="BTROOT" UUID="b7753366-a9a9-4074-8e0e-3beea50fee56"
UUID_SUB="8a76be85-6106-47ea-90ae-756fb8c37bf1" TYPE="btrfs"
PARTLABEL="root" PARTUUID="3c2c6f88-a1da-40de-83be-21af71a5ce26" 
/dev/sr0: UUID="2014-08-28-06-08-20-22" LABEL="Gentoo Linux amd64 20140828"
TYPE="iso9660" PTUUID="1047d058" PTTYPE="dos" 
/dev/sdb1: PARTLABEL="grub2biosboot"
PARTUUID="3c7a0935-57d4-4bff-a492-aaa261e62212" 

UUID=d67a8d19-64bc-4ee1-bebf-48c935b039fa  /boot  btrfs noauto,noatime  1 2
UUID=b7753366-a9a9-4074-8e0e-3beea50fee56  /      btrfs defaults,noatime,compress=lzo,space_cache  0 0
UUID=3c2c6f88-a1da-40de-83be-21af71a5ce26  /boot
UUID=d67a8d19-64bc-4ee1-bebf-48c935b039fa  /      btrfs
UUID=b7753366-a9a9-4074-8e0e-3beea50fee56
UUID=85cd9d86-4f4d-4113-b14e-cf5339373e20  /grub2biosboot  ext4  1 2
PARTUUID=3c7a0935-57d4-4bff-a492-aaa261e62212"  /grub2biosboot  ???  0 0

First, I notice the last partition (sdb1) seems to be missing the ext4
file system. I guess when I exit the chroot I can just fix that to match
sda1. So my fstab should look like this?

You know, it's obvious to me that I have no idea how to create
the fstab for this installation. Any help or guidance would be keen,
to help salvage the installation and get a few partitions installed
with btrfs. Maybe I can somehow migrate to a raid-1 configuration
under btrfs.

James


* Re: btrfs raid-1 uuid-fstab
From: Chris Murphy @ 2015-02-14 11:52 UTC (permalink / raw)
  To: James; +Cc: Btrfs BTRFS

On Fri, Feb 13, 2015 at 7:31 PM, James <wireless@tampabay.rr.com> wrote:

> No swap for now (each system has 32G); if I need
> swap later, can I just set up a file and use swapon?

No. You should read the wiki.
https://btrfs.wiki.kernel.org/index.php/FAQ#Does_btrfs_support_swap_files.3F

> What I want is that if a drive fails,
> I can just replace it, or pull one drive out and replace it with a second
> blank, new 2T drive, then move the removed drive into a second (identical)
> system to build a cloned workstation. From what I've read, UUIDs
> are supposed to be used with fstab + btrfs; PARTUUID is still flaky. But the
> UUIDs do not appear unique (due to raid-1)? Do they only get listed once
> in fstab?

Once is enough. Kernel code will find both devices.

For degraded use, this gets tricky: you have to use the boot parameter
rootflags=degraded to get it to mount, otherwise the mount fails and
you'll be dropped to a pre-mount shell in the initramfs. Also, there's
a nasty little gotcha: there is no equivalent of the mdadm bitmap. So
once one member drive is mounted degraded+rw, it's changed, and
there's no way to "catch up" the other drive. If you reconnect it, it
might seem things are OK, but there's a good chance of corruption in
such a case. You have to make sure you wipe the "lost" drive (the
one with the older version). wipefs -a should be sufficient; then use
'btrfs device add' and 'btrfs device delete missing' to rebuild it.
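
As a rough sketch of that rebuild sequence, the Python snippet below only
assembles the three command lines and prints them; it does not run
anything. The device /dev/sdb3 and the mount point /mnt are hypothetical
placeholders, and the filesystem is assumed to be mounted degraded at
that mount point already.

```python
# Sketch only: builds the rebuild command lines as argv lists, runs nothing.
# stale_dev and mountpoint are hypothetical placeholders.


def rebuild_commands(stale_dev, mountpoint):
    """Return the rebuild steps, in the order described in the text."""
    return [
        # 1. Erase the old signatures so the stale copy is never picked up.
        ["wipefs", "-a", stale_dev],
        # 2. Add the now-blank device to the degraded, mounted filesystem.
        ["btrfs", "device", "add", stale_dev, mountpoint],
        # 3. Remove the record of the missing member.
        ["btrfs", "device", "delete", "missing", mountpoint],
    ]


if __name__ == "__main__":
    for argv in rebuild_commands("/dev/sdb3", "/mnt"):
        print(" ".join(argv))
```

Printing the commands first makes it easy to double-check the device
name before anything destructive is typed by hand.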

This (the grub2biosboot partition, sda1) should not be formatted ext4;
it's strictly for GRUB and doesn't get a file system. You should use
wipefs -a on it.

This fstab has lots of problems. Based on your partition scheme it
should only have two entries total. A btrfs /boot UUID="d67a... and a
btrfs / UUID="b7753... There is no mountpoint for biosboot, it's used
by GRUB and is never formatted or mounted.
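
To illustrate why two entries are enough, here is a small Python sketch
that collapses the btrfs records from the blkid output above into one
fstab line per filesystem UUID. The UUIDs and labels are copied from the
original post; the mount options are the ones from James's draft and are
kept only for illustration, not as a recommendation. The function name
fstab_lines is made up for this example.

```python
# Sketch: one fstab line per btrfs filesystem UUID, not one per device.
# Records are copied from the blkid output in the original post.

BLKID = {
    "/dev/sda2": {"LABEL": "BOOT", "TYPE": "btrfs",
                  "UUID": "d67a8d19-64bc-4ee1-bebf-48c935b039fa"},
    "/dev/sda3": {"LABEL": "BTROOT", "TYPE": "btrfs",
                  "UUID": "b7753366-a9a9-4074-8e0e-3beea50fee56"},
    "/dev/sdb2": {"LABEL": "BOOT", "TYPE": "btrfs",
                  "UUID": "d67a8d19-64bc-4ee1-bebf-48c935b039fa"},
    "/dev/sdb3": {"LABEL": "BTROOT", "TYPE": "btrfs",
                  "UUID": "b7753366-a9a9-4074-8e0e-3beea50fee56"},
}

# label -> (mountpoint, options, dump, pass); options taken from the draft.
MOUNTS = {
    "BOOT": ("/boot", "noauto,noatime", 1, 2),
    "BTROOT": ("/", "defaults,noatime,compress=lzo,space_cache", 0, 0),
}


def fstab_lines(blkid, mounts):
    """Emit one line per distinct btrfs UUID, skipping the raid1 twin."""
    seen = set()
    lines = []
    for dev in sorted(blkid):
        rec = blkid[dev]
        if rec["TYPE"] != "btrfs" or rec["UUID"] in seen:
            continue
        seen.add(rec["UUID"])
        mountpoint, opts, dump, passno = mounts[rec["LABEL"]]
        lines.append(
            f"UUID={rec['UUID']}  {mountpoint}  btrfs  {opts}  {dump} {passno}"
        )
    return lines


if __name__ == "__main__":
    for line in fstab_lines(BLKID, MOUNTS):
        print(line)
```

Both members of each raid1 pair report the same UUID (only UUID_SUB
differs), so four partitions reduce to two lines.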

> First, I notice the last partition (sdb1) seems to be missing the ext4
> file system. I guess when I exit the chroot I can just fix that to match
> sda1.

No, the problem is that sda1 is wrongly formatted as ext4; you should use
wipefs -a on it.

> Any help or guidance would be keen,
> to help salvage the installation and get a few partitions installed
> with btrfs. Maybe I can somehow migrate to a raid-1 configuration
> under btrfs.

Good luck. Make backups often. Btrfs raid1 is not a backup. Btrfs
snapshots are not a backup. And use recent kernels. Recent on this
list means 3.18.3 or newer, which is listed as unstable in Gentoo's
package list:
http://packages.gentoo.org/package/sys-kernel/gentoo-sources Based on
the kernel.org change log, you'd probably be fine running 3.14.31, but
if you have problems and ask about it on this list, there's a decent
chance the first question will be "can you reproduce the problem on a
current kernel?"

Anyway, I suggest reading the entire btrfs wiki.


* Re: btrfs raid-1 uuid-fstab
From: Duncan @ 2015-02-15  6:28 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Sat, 14 Feb 2015 04:52:12 -0700 as excerpted:

> On Fri, Feb 13, 2015 at 7:31 PM, James <wireless@tampabay.rr.com> wrote:

>> What I want is that if a drive fails,
>> I can just replace it, or pull one drive out and replace it with a
>> second blank, new 2T drive, then move the removed drive into a second
>> (identical) system to build a cloned workstation. From what I've read,
>> UUIDs are supposed to be used with fstab + btrfs; PARTUUID is still
>> flaky. But the UUIDs do not appear unique (due to raid-1)? Do they
>> only get listed once in fstab?
> 
> Once is enough. Kernel code will find both devices.

[Preliminary note. FWIW, gentooer here too, running a btrfs raid1 root, 
altho I strongly prefer several smaller filesystems over a single large 
filesystem, so all my data eggs aren't in the same filesystem basket if 
the proverbial bottom drops out of it.  So /home is a separate 
filesystem, as is /var/log, as is my updates stuff (gentoo and other 
repos, including kernel, sources, binpkgs, ccache, everything I use to 
update the system on a single filesystem, kept unmounted unless I'm 
updating), as is my media partition, and of course /tmp, which is tmpfs.  
But of interest here is that I'm running a btrfs raid1 root.]

CM is correct. =:^)

But in addition, for a btrfs raid1 root (or any multi-device btrfs root,
for that matter), you *WILL* need an initr*, because normally a userspace
btrfs device scan has to run (from the initr*, before root is mounted)
before the kernel can actually assemble a multi-device btrfs properly.  As
I don't believe Chris is a gentooer, I'm guessing he's used to an initr*
and thus forgot about this requirement, which can be a big one for a
gentooer, since we build our own kernels and often build in at least the
modules required to mount root, thus in many cases making an initr*
unnecessary.  Unfortunately, for a multi-device btrfs root, it's
necessary. =:^(

While in theory btrfs has the device= mount option, and the kernel has 
rootflags= to tell it what mount options to use, at least last I checked 
a few kernel cycles ago (I'd say last summer, so 3-5 kernel cycles ago), 
for some reason rootflags=device= doesn't appear to work correctly.  My 
theory is that the kernel commandline parser breaks at the second/last = 
instead of the first, so instead of seeing settings for the rootflags 
parameter, it sees settings for the rootflags=device parameter, which of 
course makes no sense to the kernel and is ignored.  But that's just my 
best theory.  All I know for sure is that the subject has come up a 
number of times here and has been acknowledged by the btrfs devs, I had 
to set up an initr* to get a raid1 btrfs root to mount when I originally 
set it up here, and some time later when I decided to try an initr*-less 
rootflags= boot again and see if the problem had been fixed, it still 
didn't work.

So for a multi-device btrfs root, plan on that initr*.  If you'd never 
really learned how to set one up, as was the case here, you will probably 
either have to learn, or skip the idea of a multi-device btrfs root until 
the problem is, eventually/hopefully, fixed.

FWIW, I use dracut to create my initr* here, and have the kernel options 
set such that the dracut-pre-created initr* is attached to each kernel I 
build as an initramfs, so I don't have to have an initr* setting in grub2 
-- each kernel image has its own, attached.

And FWIW, when I first setup the btrfs root (and dracut-based initr*), I 
was running openrc (and thus using sysv-init as my init).  I've since 
switched to systemd and activated the appropriate dracut systemd module.  
So I know from personal experience, a dracut-based initr* can be setup to 
boot either openrc/sysvinit, or systemd.  Both work. =:^)

> For degraded use, this gets tricky, you have to use boot param
> rootflags=degraded to get it to mount, otherwise mount fails and you'll
> be dropped to a pre-mount shell in the initramfs.

See, assumed initr*. =:^\

But while on the topic of rootflags=degraded, in my experimentation, 
without an initr* with its pre-mount btrfs device scan, since it /was/ a 
two-device btrfs raid1 both data and metadata, thus with copies of 
everything on each device, the only way to boot without an initr* was to 
set rootflags=degraded, since the kernel would only know about the root= 
device in that case.

And that worked, so the kernel certainly could parse rootflags= and pass 
the mount options to btrfs as it should.  It simply broke when device= 
was passed in those rootflags.  Thus my theory about the parser breaking 
at the wrong =.
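
To make the theory concrete, here is a small Python sketch of the two
splitting behaviours. It is not the kernel's parser, only an illustration
of what "breaking at the first =" versus "breaking at the last =" would
produce; the device path is a made-up example.

```python
# Illustration of the theory only; this is not how the kernel is known
# to parse its command line.


def split_first(arg):
    """Split at the first '=': parameter name, then everything after it."""
    key, _, value = arg.partition("=")
    return key, value


def split_last(arg):
    """Split at the last '=': the suspected faulty behaviour."""
    key, _, value = arg.rpartition("=")
    return key, value


if __name__ == "__main__":
    arg = "rootflags=device=/dev/sda3"  # hypothetical device path
    print(split_first(arg))  # ('rootflags', 'device=/dev/sda3')
    print(split_last(arg))   # ('rootflags=device', '/dev/sda3')
```

With a last-= split, rootflags=degraded would still parse correctly
because it contains only one =, which matches what I saw.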

> Also, there's a nasty
> little gotcha, there is no equivalent for mdadm bitmap. So once one
> member drive is mounted degraded+rw, it's changed, and there's no way to
> "catch up" the other drive - if you reconnect, it might seem things are
> OK but there's a good chance of corruption in such a case. You have to
> make sure you wipe the "lost" drive (the older version one). wipefs -a
> should be sufficient, then use 'device add' and 'device delete missing'
> to rebuild it.

I caught this in my initial btrfs experimentation, before I set it up 
permanently.  It's worth repeating for emphasis, with a bit more 
information as well.

*** If you break up a btrfs raid1 and attempt to recombine afterward, be 
*SURE* you *ONLY* mount the one side writable after that.  As long as 
ONLY one side is written to, that one side will consistently have a later 
generation than the device that was dropped out, and you can add the 
dropped device back in, with the caveat that you should then immediately 
run a btrfs scrub, which will scan both the updated devices and the 
behind one, and catch up the behind one.

Never, ever, separately mount both devices writable, and then try to 
recombine them, without first wiping the one.

Because at least in theory (that is, barring bugs), if one device had
more transactions and is thus at a later transaction generation (an
integral part of btrfs and tracked in the superblock), the filesystem
should pick the later generation and a scrub will update the older one as
necessary.  That generation comparison is how btrfs picks which side to
consider valid, whether only one side was written to or both were.
However, if the two sides were both written to separately, and the
generation happens to be the same on both, the filesystem will consider
them both valid even tho they differ, and "bad things can happen."

The best way to avoid those "bad things" is to avoid splitting and 
recombining where possible.  If it must be done, be sure btrfs only sees 
one side updated since the split, either by only mounting the one side 
writable and doing a scrub after recombine to update the other one, or if 
for some reason they were both mounted writable, wipe the one before 
reattaching it, so btrfs never sees the diverged writes and there's never 
a chance of corruption as a result.
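
The rule above can be written down as a tiny decision helper. This is a
deliberately simplified Python sketch of the reasoning, not anything
btrfs itself runs; the function name and the "a"/"b" side labels are
made up for the example.

```python
# Simplified model of the split-and-recombine rule; not btrfs code.


def recombine_plan(written_since_split):
    """written_since_split: set of sides ("a", "b") that were mounted
    writable after the raid1 was split."""
    if len(written_since_split) >= 2:
        # Both sides diverged: btrfs must never see both histories.
        return "wipe one side, then add it back as a new device"
    # At most one side changed: its later generation wins.
    return "reattach the other side, then run a scrub"


if __name__ == "__main__":
    print(recombine_plan({"a"}))
    print(recombine_plan({"a", "b"}))
```

The case where neither side was written to is treated like the one-sided
case here, since nothing has diverged; that is my assumption.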

> This should not be formatted ext4, it's strictly for GRUB, it doesn't
> get a file system. You should use wipefs -a on this.

"This" referring of course to the grub2 bios boot.

What grub2 actually uses this for is to store the grub-core, with the 
various modules it needs to read /boot builtin.  This was what grub1 
called stage-1.5.

On a BIOS system, the firmware reads and loads the boot sector, but 
that's only 512 bytes, far too small to contain the main grub binary.  
All it has room for is a small stub and a pointer to a larger core.

On the simplest /boot filesystems, this pointer can be directly to the 
binary on /boot, but that only works as long as the filesystem doesn't 
move that binary around (defrag or for btrfs, balance), and as long as 
that binary was stored serially, in terms of device LBA addressing.  In 
the grub1 era, these filesystems were the ones that didn't require a 
stage-1.5, with the grub binary on /boot being the stage2.

With now legacy mbr-based partitioning, the only place grub could put a 
stage-1.5, if needed to read the stage-2 on /boot, was in the clear space 
many partitioners left at the beginning of the partition.

With grub2 and gpt partitioning, as long as there's a grub2biosboot
partition reserved, that's where grub2 now places this core, formerly
stage-1.5.  Before writing the core to this reserved partition, grub2
dynamically adds any grub modules (for gpt, the filesystem, raid, lvm,
etc) necessary to access /boot.

But the gpt reserved biosboot partition should not have a filesystem and 
is never mounted -- grub2 writes the core-plus-necessary-modules binary 
directly to the reserved partition without a filesystem, in LBA address 
order so it can be read serially by the very simple code that's still 
held in that 512-byte boot sector.

In fact, that very simple 512-byte boot-sector code knows nothing about 
gpt, it simply knows how to read the pointer that points to the LBA 
address of the first grubcore sector, and starts reading from there until 
it hits the magic sequence that tells it to stop.  Only after it has read 
and loaded that grub2-core code, does grub as we know it start to execute.

And in fact, as long as the grub2-core code can be read and loaded, even 
if grub can't find and load its config file and the other modules on 
/boot for some reason, you'll still get a rescue shell, and with a bit of 
grub knowledge, can point grub either at its /boot config and additional 
modules manually, or at a backup /boot, possibly on another device, and 
load normal mode and hopefully be able to continue booting normally, from 
there.

What's nice about gpt is that it has a dedicated bios-boot reserved 
partition for grub2, or other boot loader, to use.  This is far more 
reliable than hoping the partitioner and filesystem left enough room at 
the beginning of the partition to store the stage-1.5, as grub1 used to 
have to do, and as grub2 still has to do on legacy mbr-formatted systems.

> This fstab has lots of problems. Based on your partition scheme it
> should only have two entries total. A btrfs /boot UUID="d67a... and a
> btrfs / UUID="b7753... There is no mountpoint for biosboot, it's used by
> GRUB and is never formatted or mounted.

Spot on.

>> First, I notice the last partition (sdb1) seems to be missing the ext4
>> file system. I guess when I exit the chroot I can just fix that to match
>> sda1.
> 
> No, the problem is that sda1 is wrongly formatted as ext4; you should use
> wipefs -a on it.

Spot on.

>> Any help or guidance would be keen,
>> to help salvage the installation and get a few partitions installed
>> with btrfs. Maybe I can somehow migrate to a raid-1 configuration under
>> btrfs.
> 
> Good luck. Make backups often. Btrfs raid1 is not a backup. Btrfs
> snapshots are not a backup. And use recent kernels. Recent on this list
> means 3.18.3 or newer, and is listed unstable on this list
> http://packages.gentoo.org/package/sys-kernel/gentoo-sources Based on
> the kernel.org change log, you'd probably be fine running 3.14.31, but
> if you have problems and ask about it on this list, there's a decent
> chance the first question will be "can you reproduce the problem on a
> current kernel?"
> 
> Anyway, I suggest reading the entire btrfs wiki.

Absolutely.  Well, the entire user documentation section, anyway.  If
you're not a dev, you can skip the developer documentation unless you're
curious.

Just as reading the rest of the gentoo handbook, not just the install 
section, can save you a lot of needlessly wasted time and headaches on 
gentoo, so reading the entire user documentation section on the btrfs 
wiki can save you lots of wasted time and headaches, and since it's a 
filesystem on which you're placing data presumably of some value, very 
possibly needlessly lost data, as well.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: btrfs raid-1 uuid-fstab
From: Kai Krakow @ 2015-02-15 11:11 UTC (permalink / raw)
  To: linux-btrfs

Duncan <1i5t5.duncan@cox.net> wrote:

> While in theory btrfs has the device= mount option, and the kernel has
> rootflags= to tell it what mount options to use, at least last I checked
> a few kernel cycles ago (I'd say last summer, so 3-5 kernel cycles ago),
> for some reason rootflags=device= doesn't appear to work correctly.  My
> theory is that the kernel commandline parser breaks at the second/last =
> instead of the first, so instead of seeing settings for the rootflags
> parameter, it sees settings for the rootflags=device parameter, which of
> course makes no sense to the kernel and is ignored.  But that's just my
> best theory.

Gentoo here, too. And I tried to fiddle around with the exact same issue
some kernel versions back and didn't get it to work, so I went with dracut,
which works pretty well for me. Combined with grub2, multi-device detection
works pretty well, tho you sometimes need rootdelay={1,2,3} to wait up to
three seconds for btrfs to figure out its setup. It looks like btrfs devices
are assembled with a delay by the kernel, and if you try to mount one of the
compound devices too early, the kernel code cannot yet find all the other
devices of the set. Maybe "rootwait" would also do, tho I haven't tried that
yet (it probably won't, as the root device is the initrd initially). It may
be a side-effect of the kernel doing async SCSI device detection. It may be
worth trying to turn that option off.

But about your theory: I don't think the cmdline parser works incorrectly,
because rootflags=subvol=something works. It's probably just a flaw that
btrfs device composition comes up later and the kernel tries too early to
mount root. "rootwait" probably won't help here, either. But "rootdelay" may
help in that case, tho I myself don't have the ambition to experiment with
it. My dracut initrd setup works fine and has some benefits, like an early
debug shell to investigate problems without resorting to rescue systems or
bootable USB sticks.

-- 
Replies to list only preferred.


* Re: btrfs raid-1 uuid-fstab
From: Chris Murphy @ 2015-02-16  5:29 UTC (permalink / raw)
  To: Btrfs BTRFS

On Sat, Feb 14, 2015 at 11:28 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> Chris Murphy posted on Sat, 14 Feb 2015 04:52:12 -0700 as excerpted:

>> Also, there's a nasty
>> little gotcha, there is no equivalent for mdadm bitmap. So once one
>> member drive is mounted degraded+rw, it's changed, and there's no way to
>> "catch up" the other drive - if you reconnect, it might seem things are
>> OK but there's a good chance of corruption in such a case. You have to
>> make sure you wipe the "lost" drive (the older version one). wipefs -a
>> should be sufficient, then use 'device add' and 'device delete missing'
>> to rebuild it.
>
> I caught this in my initial btrfs experimentation, before I set it up
> permanently.  It's worth repeating for emphasis, with a bit more
> information as well.
>
> *** If you break up a btrfs raid1 and attempt to recombine afterward, be
> *SURE* you *ONLY* mount the one side writable after that.  As long as
> ONLY one side is written to, that one side will consistently have a later
> generation than the device that was dropped out, and you can add the
> dropped device back in,

Right. I left out the distinguishing factor in whether or not it
corrupts. I'm uncertain how bad this corruption is, I've never tried
reproducing it.



-- 
Chris Murphy


* Re: btrfs raid-1 uuid-fstab
From: Duncan @ 2015-02-16  5:50 UTC (permalink / raw)
  To: linux-btrfs

Kai Krakow posted on Sun, 15 Feb 2015 12:11:56 +0100 as excerpted:

> Duncan <1i5t5.duncan@cox.net> wrote:
> 
> Gentoo here, too. And I tried to fiddle around with the exact same issue
> some kernel versions back and didn't get it to work, so I went with
> dracut, which works pretty well for me. Combined with grub2,
> multi-device detection works pretty well, tho you sometimes need
> rootdelay={1,2,3} to wait up to three seconds for btrfs to figure out
> its setup. It looks like btrfs devices are assembled with a delay by the
> kernel, and if you try to mount one of the compound devices too early,
> the kernel code cannot yet find all the other devices of the set. Maybe
> "rootwait" would also do, tho I haven't tried that yet (it probably
> won't, as the root device is the initrd initially). It may be a
> side-effect of the kernel doing async SCSI device detection. It may be
> worth trying to turn that option off.

Interesting.  I had forgotten I had rootwait set as a builtin kernel 
commandline-option, and was about to reply that I had SCSI_ASYNC_SCAN 
turned on and had never seen problems, but then I remembered having to 
turn on rootwait.

Actually, I had tried rootdelay=N some years ago, perhaps before rootwait
actually became a kernel commandline option, certainly before I knew of
it.  I used it with mdraid (initr*-less) too.  But eventually I got tired
of having to play with rootdelay timeouts, and when I came across rootwait
I decided to try it, and that solved my timeouts issue once and for all.

So I can confirm that rootwait seems to work for multi-device btrfs as 
well, which of course requires an initr*.  But that actually might be 
dracut reading the kernel commandline and applying the same option at the 
initr* level, and thus not work with other initr*-generators, if they 
don't do the same thing.  I'm actually not sure.

What I can say, however, is that after I set rootwait here, I've had no 
more block-device-detection-timing issues.  It has "just worked" in terms 
of timing.

And what's nice is that rootwait actually appears to go into a loop, 
checking for a mountable root, as well, and will continue immediately 
upon finding it.  So the delay is exactly as long as it needs to be, and 
no longer.  (I don't remember whether rootdelay=N could terminate the 
delay early if it found all necessary devices, or not, but certainly, 
rootwait does.)
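
The difference between the two options can be sketched in a few lines of
Python. This is a toy model, not kernel code: root_delay pauses for a
fixed time before the mount attempt, while root_wait polls until the root
device shows up and returns immediately once it does. The probe function,
the poll interval and all names here are made up for the illustration.

```python
import time

# Toy model of the two boot options; not how the kernel implements them.


def root_delay(seconds, sleep=time.sleep):
    """Fixed pause before trying to mount root, whether needed or not."""
    sleep(seconds)
    return seconds


def root_wait(probe, interval=0.1, sleep=time.sleep):
    """Poll until probe() reports the root device; return the poll count."""
    polls = 0
    while not probe():
        sleep(interval)
        polls += 1
    return polls


def make_probe(ready_after):
    """Hypothetical device probe that succeeds after ready_after failures."""
    calls = {"n": 0}

    def probe():
        calls["n"] += 1
        return calls["n"] > ready_after

    return probe


if __name__ == "__main__":
    # The device appears after three failed polls, so the wait is exactly
    # three intervals and no longer.
    print(root_wait(make_probe(3), interval=0.01))
```

With a fixed delay you have to guess the worst case up front; with the
polling loop the wait adapts to however long detection actually takes.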

> But about your theory: I don't think the cmdline parser works incorrectly,
> because rootflags=subvol=something works.

Well, so much for /that/ theory, then.  I /thought/ the kernel devs were 
too smart to have let a bug that simple, especially where it was likely 
to be triggered by other = options as well, remain for as long as this 
has.  But that was what I came up with as a possible explanation.  I 
think your theory below makes more sense.

> It's probably just a flaw that
> btrfs device composition comes up later and the kernel tries too early to
> mount root. "rootwait" probably won't help here, either. But "rootdelay"
> may help in that case, tho I myself don't have the ambition to experiment
> with it. My dracut initrd setup works fine and has some benefits, like an
> early debug shell to investigate problems without resorting to rescue
> systems or bootable USB sticks.

FWIW, my root backup and rescue solution are one and the same, an 
occasional (every few kernel cycles) "snapshot" copy (not btrfs snapshot, 
a full copy) of my root filesystem, made when things seem reasonably 
stable and have been working for awhile, to an identically sized "backup 
root filesystem" located elsewhere.  That way, I have effectively a fully 
operational system "snapshot" copy, taken when the system was known to be 
operational, complete with everything I normally use, X, KDE, firefox, 
media players, games, everything, and of course tested to boot and run as 
normal.  No crippled semi-functional rescue media for me! =:^)

With a root filesystem of 8 GiB, that's easy enough, and I keep several
backup copies available: the first is another pair of 8 GiB partitions,
one on each device, forming a two-device btrfs raid1 on the same physical
pair of SSDs, with a second and third 8 GiB root backup on reiserfs on
spinning rust, in case the pair of SSD physical devices fail, or if btrfs
itself gets majorly bugged out, such that booting to the first backup
kills it just like it did the working copy.

And I have my grub2 menu setup with the root= boot option assigned a 
variable, and menu options to set that variable to point to any of the 
backups as necessary.  So to boot a particular backup, I just select the 
option to set the pointer variable appropriately, and then select boot.  
Similarly with other kernel commandline options, including the kernel 
choice and init=.  They're all loaded into pointer variables, and if I 
want to choose a different one, I simply select the menu option that sets 
the pointer variable appropriately, and then select boot.

Very flexible, this grub2 is! =:^)

Meanwhile, grub2 is setup on both ssds (which have identical partition 
layouts) and on the spinning rust, with each one having its own /boot, 
thus giving me backup /boots as well, and of course I can select any of 
them from the BIOS to boot, so I'm pretty well set as long as I don't 
lose all three devices at once.

If I lose all three devices at once, I figure it's quite likely I'm 
dealing with a rather larger disaster, say a fire or flood or the like, 
and will probably have my hands full just surviving for awhile.  When I 
do get back to worrying about the computer, likely after replacing what I 
lost in the disaster, it won't be that big a deal to start over 
downloading a live image and doing a new install from the stage-3 
starter.  After all, the *REAL* important backup is in my head, and if I 
lose that, I guess I won't be worrying much about computers any more, 
even if I'm still "alive" in some facility somewhere.  Tho I /do/ have 
some stuff backed up on USB thumb drive and the like as well.  But I 
don't put much priority in it, because I figure if I'm having to restore 
from that backup in the first place, I'm pretty much screwed in any case, 
and the /last/ thing I'm likely to be worried about is having to start 
over with a new computer install.



* Re: btrfs raid-1 uuid-fstab
From: Kai Krakow @ 2015-02-16 23:15 UTC (permalink / raw)
  To: linux-btrfs

Duncan <1i5t5.duncan@cox.net> wrote:

>> It's probably just a flaw that
>> btrfs device composition comes up later and the kernel tries too early to
>> mount root. "rootwait" probably won't help here, either. But "rootdelay"
>> may help in that case, tho I myself don't have the ambition to experiment
>> with it. My dracut initrd setup works fine and has some benefits, like an
>> early debug shell to investigate problems without resorting to rescue
>> systems or bootable USB sticks.
> 
> FWIW, my root backup and rescue solution are one and the same, an
> occasional (every few kernel cycles) "snapshot" copy (not btrfs snapshot,
> a full copy) of my root filesystem, made when things seem reasonably
> stable and have been working for awhile, to an identically sized "backup
> root filesystem" located elsewhere.  That way, I have effectively a fully
> operational system "snapshot" copy, taken when the system was known to be
> operational, complete with everything I normally use, X, KDE, firefox,
> media players, games, everything, and of course tested to boot and run as
> normal.  No crippled semi-functional rescue media for me! =:^)

I accidentally forced myself into using my USB3 backup drive as my rootfs
by fiddling around with dracut build options without thinking about it too
much while waiting for my btrfs device add/del disk jockeying to migrate to
bcache. Long story short: I managed to strip dracut down to too few modules,
and it lost its ability to mount anything and could not even spawn a shell.
*gnarf

And when that wasn't fun enough, my BIOS decided to no longer initialize USB,
so I could get into neither the BIOS nor the Grub shell. I don't know when that
problem started. It had probably been that way for a while and I never noticed.
The only visible change was that it went through the BIOS a lot slower after I
managed to convince it to initialize USB again (by opening the case and
shorting the reset jumper).

The next fun part was: My backup was incomplete in a special way: It had no
directories dev, proc, run, sys and friends... Don't ask me how I solved
that, probably by "init=/bin/bash". It happened because I used "rsync" with
the option to exclude those dirs. But well: In the end my backup was tested
bootable. :-)
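
For anyone hitting the same rsync exclude problem, here is a minimal sketch
of recreating the missing mount points on a backup root. The function name
and the paths are made up for illustration, and counting tmp among the
"friends" is an assumption.

```shell
#!/bin/sh
# Recreate the empty mount-point directories that a root-filesystem copy
# needs at boot but that an rsync exclude list may have skipped.
# Usage: ensure_mountpoints /path/to/backup-root   (hypothetical path)
ensure_mountpoints() {
    backup_root=$1
    for d in dev proc run sys tmp; do
        mkdir -p "$backup_root/$d"
    done
    # /tmp is normally world-writable with the sticky bit set
    chmod 1777 "$backup_root/tmp"
}

# Alternative at copy time: exclude the *contents* of those directories
# rather than the directories themselves, so the empty mount points are
# still copied. /mnt/backup/ is a hypothetical destination:
#   rsync -aAXH --exclude='/dev/*' --exclude='/proc/*' \
#         --exclude='/sys/*' --exclude='/run/*' --exclude='/tmp/*' \
#         / /mnt/backup/
```

The second form avoids the problem in the first place, because a pattern
such as '/dev/*' skips what is inside the directory but not the directory.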

I fixed my dracut setup and in the same procedure also fixed a long-standing
issue with "btrfs check" telling me nlink errors. Luckily, this newer
version could tell me the paths and I just deleted those files in the chrome
profile and var/lib/bluetooth directory. I wonder if those errors were
what caused chrome to freeze the PC and bluetooth to stop working
sometimes.

And BTW: bcache is pretty fast, booting to graphical.target within 3-8 
seconds (mostly around 5). Now I wonder what I need the resume swap for 
which I created in the process: It takes longer to resume from swap than 
just booting to complete KDE desktop. Well, without the benefit of having a 
fully running session at least.

> Very flexible, this grub2 is! =:^)

I waited a long time before making the switch, but I had to use it when I
migrated from legacy to UEFI boot mode. Although every configuration bit
looked confusing and cumbersome, everything worked automatically out of the
box. Very surprising it is. :-)

> If I lose all three devices at once, I figure it's quite likely I'm
> dealing with a rather larger disaster, say a fire or flood or the like,
> and will probably have my hands full just surviving for a while.  When I
> do get back to worrying about the computer, likely after replacing what I
> lost in the disaster, it won't be that big a deal to start over
> downloading a live image and doing a new install from the stage-3
> starter.  After all, the *REAL* important backup is in my head, and if I
> lose that, I guess I won't be worrying much about computers any more,
> even if I'm still "alive" in some facility somewhere.  Tho I /do/ have
> some stuff backed up on USB thumb drive and the like as well.  But I
> don't put much priority in it, because I figure if I'm having to restore
> from that backup in the first place, I'm pretty much screwed in any case,
> and the /last/ thing I'm likely to be worried about is having to start
> over with a new computer install.

From my own experience, the head is not a very good backup. There are
things you simply cannot remember well enough to rebuild, and there are other
things which, when rebuilt from scratch, probably turn out better and more
well thought out, but are very frustrating to rebuild and thus never reach the
same stage of completeness again. So, no: Not a good backup. It's no fun even
when I had no other stuff to deal with...

But to get back to the multi-device btrfs booting issue: Thanks for 
recommending "rootwait", I will try that. I had thought it would have no 
effect if booting from initrd. Let's see if dracut+systemd with rootwait 
will work for me, too.
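
A minimal sketch of what that test could look like, assuming the kernel
command line is set through /etc/default/grub (which grub-mkconfig reads as
shell); other boot setups would set it elsewhere:

```shell
# /etc/default/grub fragment
# rootwait:    wait indefinitely for the root device to show up.
# rootdelay=N: alternatively, pause N seconds before trying to mount root.
GRUB_CMDLINE_LINUX="rootwait"
# Regenerate the config afterwards, for example:
#   grub-mkconfig -o /boot/grub/grub.cfg
```

Whether the parameter changes anything when an initrd does the actual
mounting is exactly the open question here.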

-- 
Replies to list only preferred.

* Re: btrfs raid-1 uuid-fstab
  2015-02-16 23:15         ` Kai Krakow
@ 2015-02-18  1:12           ` Duncan
  0 siblings, 0 replies; 8+ messages in thread
From: Duncan @ 2015-02-18  1:12 UTC (permalink / raw)
  To: linux-btrfs

Kai Krakow posted on Tue, 17 Feb 2015 00:15:50 +0100 as excerpted:

> Long story short: I managed to strip dracut down to
> too few modules and it lost its ability to mount anything and even could
> not spawn a shell. *gnarf

Ouch!

FWIW, that's why I use a kernel built-in initramfs.  If I upgrade dracut 
or change its config and it fails to work, just as if the new kernel the 
initramfs is appended to fails to work, I simply boot an older kernel... 
with a known-working dracut-created initramfs.

Tho I /did/ have trouble with an older dracut locking to a particular 
default-root UUID at one point, so it would boot any root= I pointed it 
at, but *ONLY* as long as that particular UUID continued to exist!

Which is pretty hard to test for, since until you actually mkfs the 
existing default-root, its UUID will continue to exist, and you'll never 
know that your boot to the backup root using root= is working now, but 
will fail as soon as the default-root ceases to exist, until you're 
actually in the situation and can't boot, using any kernel/dracut 
combination!

That did drop me to the dracut/initramfs shell, but I was new enough with 
dracut at the time that I didn't really know how to fix it from there, 
nor could I properly edit a file or even view an entire file (cat worked, 
but that only let me see the last N lines and I didn't have a pager in 
the initramfs), to try to read documentation and fix the issue.

What I finally did to get out of that hole was manually ln -s the
/dev/disk/by-uuid/* symlink that the dracut/initramfs scripts were looking
for, based on the error, pointing it at an existing /dev/sdXN.  It didn't
have to point at the root device; it could point at any block-device file,
as long as that block-device file actually existed.
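
A minimal sketch of that workaround, for anyone stuck in the same initramfs
shell. The function name and its arguments are illustrative only and are not
part of dracut; the UUID is whatever the error message names.

```shell
#!/bin/sh
# Create a stand-in by-uuid symlink so that scripts waiting for a vanished
# UUID can proceed. All three arguments are supplied by the caller.
#   $1  directory holding the symlinks (normally /dev/disk/by-uuid)
#   $2  the UUID named in the error message
#   $3  any existing block-device node, e.g. /dev/sdXN
fake_uuid_link() {
    link_dir=$1
    wanted_uuid=$2
    target=$3
    # Refuse to create a dangling link: the target has to exist.
    [ -e "$target" ] || { echo "no such device node: $target" >&2; return 1; }
    mkdir -p "$link_dir"
    ln -s "$target" "$link_dir/$wanted_uuid"
}
```

Typed by hand in the rescue shell this is just the single ln -s line; the
function only adds the check that the target exists.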

I didn't originally file a bug on that as the host-only option 
documentation warned about it being host-specific, so I figured it was 
/designed/ to do that.  Only later, when host-only was being discussed as 
the gentoo-recommended default on gentoo-dev and I explained that it 
wasn't always suitable as it broke if/when you blew away your default-
root and recreated it with a new UUID, and the gentoo dracut maintainer 
asked why I hadn't filed a bug, did I figure out it /was/ a bug, not a 
"confusingly documented feature".  So I filed a bug and the gentoo 
maintainer filed one upstream as well, and it was apparently fixed.  But 
of course by then I had long since worked around the problem with more 
specific dracut-module include and exclude statements in the config, 
instead of using host-only, and that was working and continues to work, 
so I've never had reason to go back and test the more loosely specified 
host-only mode, and thus have never confirmed whether the bug was 
actually fixed or not, since I don't use that mode any more.
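
As a rough sketch of that kind of config (not the actual file; the file
name and the module choices below are assumptions picked for illustration),
a dracut config fragment with explicit module statements instead of
host-only might look like this:

```shell
# /etc/dracut.conf.d/local.conf (hypothetical file name)
# Do not tailor the image to the devices and UUIDs present at build time.
hostonly="no"
# Pull in what this setup needs beyond the defaults (example choice).
# dracut's own documentation writes this as add_dracutmodules+=" btrfs ";
# the plain-sh form below is equivalent. Keep the leading/trailing spaces.
add_dracutmodules="${add_dracutmodules:-} btrfs "
# Leave out modules this machine never uses (example choices).
omit_dracutmodules="${omit_dracutmodules:-} lvm mdraid nfs plymouth "
```

Listing modules explicitly is more work than host-only, but the resulting
image does not depend on which filesystems happened to exist when it was
built.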

> And when that wasn't fun enough, my BIOS decided to no longer initialize
> USB, so I could get into neither the BIOS nor the Grub shell. I don't
> know when that problem started. It had probably been that way for a
> while and I never noticed. The only visible change was that it went
> through the BIOS a lot slower after I managed to convince it to
> initialize USB again (by opening the case and shorting the reset jumper).

Ouch.  FWIW my mobo has dual-bios, which is nice, but I've been down the 
bios-reset road before, several times.

I even had a BIOS update go bad once (due to bad RAM), screwed up the 
last-ditch bios-rescue it offered as I didn't know what I was doing, and 
had to use my netbook to set up a webmail account (didn't have the
passwords to my normal email as I don't normally keep anything private on 
the netbook at all, in case I lose it, and couldn't access my other disks 
without a device to convert them to external/USB) and order a new BIOS 
shipped to me.

That is of course the big reason my new machine is dual-bios! =:^)  Tho 
it's not an absolute cure-all, as once it successfully boots from the 
main BIOS it auto-overwrites the second one, if different.  I'd actually 
rather make the auto-overwrite bit manual, so I could update it only when 
I was sufficiently sure it worked _reliably_, but oh, well, better than 
not having a backup BIOS at all, as I learned from experience!

> The next fun part was: My backup was incomplete in a special way: It had
> no directories dev, proc, run, sys and friends... Don't ask me how I
> solved that, probably by "init=/bin/bash".

init=/bin/bash is indeed a very handy tool to have as a sysadmin. =:^)

I think I mentioned that setting that (via grub var) is actually one of 
my grub2 menu options, in the backup menu, FWIW.

> It happened because I used
> "rsync" with the option to exclude those dirs. But well: In the end my
> backup was tested bootable. :-)
> 
> I fixed my dracut setup and in the same procedure also fixed a
> long-standing issue with "btrfs check" telling me nlink errors. Luckily,
> this newer version could tell me the paths and I just deleted those files
> in the chrome profile and var/lib/bluetooth directory. I wonder if those
> errors were what caused chrome to freeze the PC and bluetooth to stop
> working sometimes.

Likely.

With all my filesystems being rather small, and having (tested) backup 
versions of them available, I'd probably just ensure that I had a current 
backup, blow the filesystem away and recreate it fresh, restoring from 
backup, at the first sign of nlink errors or the like.

> And BTW: bcache is pretty fast, booting to graphical.target within 3-8
> seconds (mostly around 5). Now I wonder what I need the resume swap for
> which I created in the process: It takes longer to resume from swap than
> just booting to complete KDE desktop. Well, without the benefit of
> having a fully running session at least.

Since my main filesystems are all on ssd, I get that too.  Tho I can say
I was rather surprised at how much faster systemd was than even
parallelized openrc.  Systemd's demand-based socket activation probably has
a lot to do with that: systemd opens the socket itself, so services that
depend on that socket can start before the daemon behind it has finished
initializing.  Openrc can parallelize, but if a daemon must be up and the
socket it creates usable before another service depending on it can start,
then that service has to wait, as openrc doesn't have demand-based socket
activation available to shortcut that.  That makes more difference than I
expected.

As for resume-swap, back when my system was still on spinning rust, yes, 
resume took longer, dramatically longer if I made the resume image big 
enough to save and restore everything in cache, but it was still worth 
it, because dropping cache as one did on reboot was EXPENSIVE, due to 
having to reload all that stuff from slow spinning-rust over time.

But now that most of the system's on ssd (basically everything but the 
media partition, with media being primarily serially accessed big files 
anyway, such that speed doesn't make so much difference as long as it's 
faster than the play-rate consumption, as even spinning-rust is for most 
media... tho full 4k video may change that) cache is still faster, but 
more like an order of magnitude faster instead of about three orders of 
magnitude faster.

So dropping and having to reread gigabytes of cached files off of ssd 
isn't the big deal it was when it was gigabytes of cached files off of 
spinning rust.

But bcache will get most of that benefit, too.  I went full ssd instead, 
both because I did it a bit earlier, before bcache was mature, and 
because ssds became cheap enough for my limited non-media requirements 
that I decided it was worth throwing the required money at it, to avoid 
the hassle of ssd-cache-of-still-spinning-rust vs. just ssd pretty much 
everything.

But bigger ssds have continued to drop in price, and even my media files 
requirements aren't /that/ big, particularly as local storage 
increasingly is simply cache for media otherwise streamed from the net 
anyway (they're advertising "gigablast" here, now, tho my own connection 
remains under single-digit MiByte/sec, in-single-digit Mbit/sec), so I'll 
probably just go full ssd, even for media files, at some point, if only 
to be able to kill off the constant incremental power draw and noise of 
spinning-rust, entirely.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

Thread overview: 8+ messages
2015-02-14  2:31 btrfs raid-1 uuid-fstab James
2015-02-14 11:52 ` Chris Murphy
2015-02-15  6:28   ` Duncan
2015-02-15 11:11     ` Kai Krakow
2015-02-16  5:50       ` Duncan
2015-02-16 23:15         ` Kai Krakow
2015-02-18  1:12           ` Duncan
2015-02-16  5:29     ` Chris Murphy
