* btrfs-raid questions I couldn't find an answer to on the wiki
@ 2012-01-26 15:41 Duncan
2012-01-28 12:08 ` Martin Steigerwald
2012-01-29 11:23 ` Goffredo Baroncelli
0 siblings, 2 replies; 14+ messages in thread
From: Duncan @ 2012-01-26 15:41 UTC (permalink / raw)
To: linux-btrfs
I'm currently researching an upgrade to (raid1-ed) btrfs from mostly
reiserfs (which I've found quite reliable (even thru a period of bad ram
and resulting system crashes) since data=ordered went in with 2.6.16 or
whatever it was. (Thanks, Chris! =:^)) on multiple md/raid-1s. I have
some questions that don't appear to be addressed well on the wiki, yet,
or where the wiki info might be dated.
Device hardware is 4 now aging 300-gig disks with identical gpt-
partitioning on all four disks, using multiple 4-way md/raid-1s for most
of the system. I'm running gentoo/~amd64 with the linus mainline kernel
from git, kernel generally updated 1-2X/wk except during the merge
window, so I stay reasonably current. I have btrfs-progs-9999, aka the
live-git build, kernel.org mason tree, installed.
The current layout has a total of 16 physical disk partitions on each of
the four drives, most of which are 4-disk md/raid1, but with a couple
md/raid1s for local cache of redownloadables, etc, thrown in. Some of
the mds are further partitioned (mdp), some not. A couple are only 2-
disk md/raid1 instead of the usual 4-disk. Most mds have a working and
backup copy of exactly the same partitioned size, thus explaining the
multitude of partitions, since most of them come in pairs. No lvm as I'm
not running an initrd which meant it couldn't handle root, and I wasn't
confident in my ability to recover the system in an emergency with lvm
either, so I was best off without it.
Note that my current plan is to keep the backup sets as reiserfs on md/
raid1 for the time being, probably until btrfs comes out of experimental/
testing or at least until it further stabilizes, so I'm not too worried
about btrfs as long as it's not going to go scribbling outside the
partitions established for it. For the worst-case I have boot-tested
external-drive backup.
Three questions:
1) My /boot partition and its backup (which I do want to keep separate
from root) are only 128 MB each. The wiki recommends 1 gig sizes
minimum, but there's some indication that's dated info due to mixed data/
metadata mode in recent kernels.
Is a 128 MB btrfs reasonable? What's the mixed-mode minimum recommended
and what is overhead going to look like?
2) The wiki indicates that btrfs-raid1 and raid-10 only mirror data 2-
way, regardless of the number of devices. On my now aging disks, I
really do NOT like the idea of only 2-copy redundancy. I'm far happier
with the 4-way redundancy, twice for the important stuff since it's in
both working and backup mds altho they're on the same 4-disk set (tho I
do have an external drive backup as well, but it's not kept as current).
If true that's a real disappointment, as I was looking forward to btrfs-
raid1 with checksummed integrity management.
Is there really NO way to do more than 2-way btrfs-raid1? If not,
presumably layering it on md/raid1 is possible, but is two-way-btrfs-
raid1-on-2-way-md-raid1 or btrfs-on-single-4-way-md-raid1 (presumably
still-duped btrfs metadata) recommended? Or perhaps the recommendations
for performance and reliability differ in that scenario?
3) How does btrfs space overhead (and ENOSPC issues) compare to reiserfs
with its (default) journal and tail-packing? My existing filesystems are
128 MB and 4 GB at the low end, and 90 GB and 16 GB at the high end. At
the same size, can I expect to fit more or less data on them? Do the
compression options change that by much "IRL"? Given that I'm using same-
sized partitions for my raid-1s, I guess at least /that/ angle of it's
covered.
Thanks. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-01-26 15:41 btrfs-raid questions I couldn't find an answer to on the wiki Duncan
@ 2012-01-28 12:08 ` Martin Steigerwald
2012-01-29 5:40 ` Duncan
2012-01-29 11:23 ` Goffredo Baroncelli
1 sibling, 1 reply; 14+ messages in thread
From: Martin Steigerwald @ 2012-01-28 12:08 UTC (permalink / raw)
To: linux-btrfs
On Thursday, 26 January 2012, Duncan wrote:
> I'm currently researching an upgrade to (raid1-ed) btrfs from mostly
> reiserfs (which I've found quite reliable (even thru a period of bad
> ram and resulting system crashes) since data=ordered went in with
> 2.6.16 or whatever it was. (Thanks, Chris! =:^)) on multiple
> md/raid-1s. I have some questions that don't appear to be addressed
> well on the wiki, yet, or where the wiki info might be dated.
>
> Device hardware is 4 now aging 300-gig disks with identical gpt-
> partitioning on all four disks, using multiple 4-way md/raid-1s for
> most of the system. I'm running gentoo/~amd64 with the linus mainline
> kernel from git, kernel generally updated 1-2X/wk except during the
> merge window, so I stay reasonably current. I have btrfs-progs-9999,
> aka the live-git build, kernel.org mason tree, installed.
>
> The current layout has a total of 16 physical disk partitions on each
> of the four drives, mostly of which are 4-disk md/raid1, but with a
> couple md/raid1s for local cache of redownloadables, etc, thrown in.
> Some of the mds are further partitioned (mdp), some not. A couple are
> only 2- disk md/raid1 instead of the usual 4-disk. Most mds have a
> working and backup copy of exactly the same partitioned size, thus
> explaining the multitude of partitions, since most of them come in
> pairs. No lvm as I'm not running an initrd which meant it couldn't
> handle root, and I wasn't confident in my ability to recover the
> system in an emergency with lvm either, so I was best off without it.
Sounds like a quite complex setup.
> Three questions:
>
> 1) My /boot partition and its backup (which I do want to keep separate
> from root) are only 128 MB each. The wiki recommends 1 gig sizes
> minimum, but there's some indication that's dated info due to mixed
> data/ metadata mode in recent kernels.
>
> Is a 128 MB btrfs reasonable? What's the mixed-mode minumum
> recommended and what is overhead going to look like?
I don't know.
You could try with a loop device. Just create one and mkfs.btrfs on it,
mount it and copy your stuff from /boot over to see whether that works
and how much space is left.
On BTRFS I recommend using btrfs filesystem df for more exact figures of
space utilization than df would return.
Likewise for RAID 1, just create 2 or 4 BTRFS image files.
You may try with:
-M, --mixed
Mix data and metadata chunks together for more
efficient space utilization. This feature incurs
a performance penalty in larger filesystems. It
is recommended for use with filesystems of 1 GiB
or smaller.
for smaller partitions (see manpage of mkfs.btrfs).
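Putting those pieces together, the whole experiment could look roughly
like this - just a sketch, with made-up file names, sizes and mount
points, not something I have tested here:
    # single small mixed-mode filesystem on a loop file
    truncate -s 128M /tmp/boot-test.img
    mkfs.btrfs -M /tmp/boot-test.img
    mkdir -p /mnt/boot-test
    mount -o loop /tmp/boot-test.img /mnt/boot-test
    cp -a /boot/. /mnt/boot-test/
    btrfs filesystem df /mnt/boot-test   # per-chunk-type usage
    df -h /mnt/boot-test                 # for comparison
    # two-image variant to simulate btrfs RAID-1
    truncate -s 256M /tmp/r1-a.img /tmp/r1-b.img
    losetup /dev/loop1 /tmp/r1-a.img
    losetup /dev/loop2 /tmp/r1-b.img
    mkfs.btrfs -M -m raid1 -d raid1 /dev/loop1 /dev/loop2
    btrfs device scan /dev/loop1 /dev/loop2
    mount /dev/loop1 /mnt/boot-test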
> 2) The wiki indicates that btrfs-raid1 and raid-10 only mirror data 2-
> way, regardless of the number of devices. On my now aging disks, I
> really do NOT like the idea of only 2-copy redundancy. I'm far happier
> with the 4-way redundancy, twice for the important stuff since it's in
> both working and backup mds altho they're on the same 4-disk set (tho I
> do have an external drive backup as well, but it's not kept as
> current).
>
> If true that's a real disappointment, as I was looking forward to
> btrfs- raid1 with checksummed integrity management.
I didn't see anything like this.
Would be nice to be able to adapt the redundancy degree where possible.
An idea might be splitting into a delayed synchronisation mirror:
Have two BTRFS RAID-1 - original and backup - and have a cronjob with
rsync mirroring files every hour or so. Later this might be replaced by
btrfs send/receive - or by RAID-1 with higher redundancy.
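Such a cronjob could be as small as the following sketch (the paths and
the rsync options are only illustrative assumptions):
    # e.g. /etc/cron.hourly/mirror-to-backup
    #!/bin/sh
    # mirror the working btrfs RAID-1 onto the backup one;
    # --delete keeps the backup an exact copy of the source
    rsync -aHAX --delete /mnt/work/ /mnt/backup/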
> 3) How does btrfs space overhead (and ENOSPC issues) compare to
> reiserfs with its (default) journal and tail-packing? My existing
> filesystems are 128 MB and 4 GB at the low end, and 90 GB and 16 GB at
> the high end. At the same size, can I expect to fit more or less data
> on them? Do the compression options change that by much "IRL"? Given
> that I'm using same- sized partitions for my raid-1s, I guess at least
> /that/ angle of it's covered.
The efficiency of the compression options depends highly on the kind of
data you want to store.
I tried lzo on an external disk with movies, music files, images and
software archives. The effect was minimal, about 3% or so. But for
unpacked source trees, lots of clear text files, and likely also virtual
machine image files or other nicely compressible data, the effect should
be better.
Although BTRFS has received a lot of fixes for ENOSPC issues, I would be
a bit reluctant with very small filesystems. But that is just a gut
feeling, so I do not know whether the option -M from above is widely
tested. I doubt it.
Maybe someone with more in-depth knowledge can shed some light on this.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-01-28 12:08 ` Martin Steigerwald
@ 2012-01-29 5:40 ` Duncan
2012-01-29 7:55 ` Martin Steigerwald
0 siblings, 1 reply; 14+ messages in thread
From: Duncan @ 2012-01-29 5:40 UTC (permalink / raw)
To: linux-btrfs
Martin Steigerwald posted on Sat, 28 Jan 2012 13:08:52 +0100 as excerpted:
> On Thursday, 26 January 2012, Duncan wrote:
>> The current layout has a total of 16 physical disk partitions on each
>> of the four drives, mostly of which are 4-disk md/raid1, but with a
>> couple md/raid1s for local cache of redownloadables, etc, thrown in.
>> Some of the mds are further partitioned (mdp), some not. A couple are
>> only 2- disk md/raid1 instead of the usual 4-disk. Most mds have a
>> working and backup copy of exactly the same partitioned size, thus
>> explaining the multitude of partitions, since most of them come in
>> pairs. No lvm as I'm not running an initrd which meant it couldn't
>> handle root, and I wasn't confident in my ability to recover the system
>> in an emergency with lvm either, so I was best off without it.
>
> Sounds like a quite complex setup.
It is. I was actually writing a rather more detailed description, but
decided few would care and it'd turn into a tl;dr. It was I think the
4th rewrite that finally got it down to something reasonable while still
hopefully conveying any details that might be corner-cases someone knows
something about.
>> Three questions:
>>
>> 1) My /boot partition and its backup (which I do want to keep separate
>> from root) are only 128 MB each. The wiki recommends 1 gig sizes
>> minimum, but there's some indication that's dated info due to mixed
>> data/ metadata mode in recent kernels.
>>
>> Is a 128 MB btrfs reasonable? What's the mixed-mode minumum
>> recommended and what is overhead going to look like?
>
> I don't know.
>
> You could try with a loop device. Just create one and mkfs.btrfs on it,
> mount it and copy your stuff from /boot over to see whether that works
> and how much space is left.
The loop device is a really good idea that hadn't occurred to me. Thanks!
> On BTRFS I recommend using btrfs filesystem df for more exact figures of
> space utilization than df would return.
Yes. I've read about the various space reports on the wiki so have the
general idea, but will of course need to review it again after I get
something set up so I can actually type in the commands and see for
myself. Still, thanks for the reinforcement. It certainly won't hurt,
and of course it's quite possible that others will end up reading this
too, so it could end up being a benefit to many people, not just me. =:^)
> You may try with:
>
> -M, --mixed
> Mix data and metadata chunks together for more
> efficient space utilization. This feature incurs a
> performance penalty in larger filesystems. It is
> recommended for use with filesystems of 1 GiB or
> smaller.
>
> for smaller partitions (see manpage of mkfs.btrfs).
I had actually seen that too, but as it's newer there are significantly
fewer mentions of it out there, so the reinforcement is DEFINITELY
valued! I like to have a rather good general sysadmin's idea of what's
going on and how everything fits together, as opposed to simply following
instructions by rote, before I'm really comfortable with something as
critical as filesystem maintenance (keeping in mind that when one really
tends to need that knowledge is in an already stressful recovery
situation, very possibly without all the usual documentation/net-
resources available), and repetition of the basics helps getting
comfortable with it, so I'm very happy for it even if it isn't "new" to
me. =:^) (As mentioned, that was a big reason behind my ultimate
rejection of LVM: I simply couldn't get comfortable enough with it to be
confident of my ability to recover it in an emergency recovery situation.)
>> 2) The wiki indicates that btrfs-raid1 and raid-10 only mirror data 2-
>> way, regardless of the number of devices. On my now aging disks, I
>> really do NOT like the idea of only 2-copy redundancy. I'm far happier
>> with the 4-way redundancy, twice for the important stuff since it's in
>> both working and backup mds altho they're on the same 4-disk set (tho I
>> do have an external drive backup as well, but it's not kept as
>> current).
>>
>> If true that's a real disappointment, as I was looking forward to
>> btrfs- raid1 with checksummed integrity management.
>
> I didn't see anything like this.
>
> Would be nice to be able to adapt the redundancy degree where possible.
I posted the wiki reference in reply to someone else recently. Let's see
if I can find it again...
Here it is. This is from the bottom of the RAID and data replication
section (immediately above "Balancing") on the SysadminGuide page:
>>>>>
With RAID-1 and RAID-10, only two copies of each byte of data are
written, regardless of how many block devices are actually in use on the
filesystem.
<<<<<
But that's one of the bits that I hoped was stale, and that it allowed
setting the number of copies for both data and metadata, now. However, I
don't see any options along that line to feed to mkfs.btrfs or btrfs *
either one, so it would seem it's not there yet, at least not in btrfs-
tools as built just a couple days ago from the official/mason tree on
kernel.org. I haven't tried the integration tree (aka Hugo Mills' aka
darksatanic.net tree). So I guess that wiki quote is still correct. Oh,
well... maybe later-this-year/in-a-few-kernel-cycles.
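For reference, what mkfs.btrfs does accept today is a raid1 profile
across any number of devices, but the result still holds exactly two
copies of each chunk (device names below are only placeholders):
    mkfs.btrfs -m raid1 -d raid1 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7
    btrfs device scan
    mount /dev/sda7 /mnt
    btrfs filesystem df /mnt   # shows Data, RAID1 / Metadata, RAID1:
                               # two copies, however many devices are in the pool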
> An idea might be splitting into a delayed synchronisation mirror:
>
> Have two BTRFS RAID-1 - original and backup - and have a cronjob with
> rsync mirroring files every hour or so. Later this might be replaced by
> btrfs send/receive - or by RAID-1 with higher redundancy.
That's an interesting idea. However, as I run git kernels and don't
accumulate a lot of uptime in any case, what I'd probably do is set up
the rsync to be run after a successful boot or mount of the filesystem in
question. That way, if it ever failed to boot/mount for whatever reason,
I could be relatively confident that the backup version remained intact
and usable.
That's actually /quite/ an interesting idea. While I have working and
backup partitions for most stuff now, the process remains a manual one,
done when I think the system is stable enough and enough time has passed
since the last one, so the backup tends to be weeks or months old as
opposed to days or hours. This idea, modified to do it once per boot or
mount or whatever, would keep the backups far more current and be much
less hassle than the manual method I'm using now. So even if I don't
immediately switch to btrfs as I had thought I might, I can implement
those scripts on the current system now, and then they'll be ready and
tested, needing little modification when I switch to btrfs, later.
Thanks for the ideas! =:^)
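As a sketch of that run-it-after-a-successful-boot variation, on an
openrc/gentoo box it could live in a local.d script instead of a cronjob
(script name and mount points are hypothetical):
    # /etc/local.d/sync-backup.start -- run once near the end of each boot
    #!/bin/sh
    # only sync if the working filesystem actually mounted this boot
    if mountpoint -q /mnt/work; then
        rsync -aHAX --delete /mnt/work/ /mnt/backup/
    fi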
>> 3) How does btrfs space overhead (and ENOSPC issues) compare to
>> reiserfs with its (default) journal and tail-packing? My existing
>> filesystems are 128 MB and 4 GB at the low end, and 90 GB and 16 GB at
>> the high end. At the same size, can I expect to fit more or less data
>> on them? Do the compression options change that by much "IRL"? Given
>> that I'm using same- sized partitions for my raid-1s, I guess at least
>> /that/ angle of it's covered.
>
> The efficiency of the compression options depend highly of the kind of
> data you want to store.
>
> I tried lzo on a external disk with movies, music files, images and
> software archives. The effect has been minimal, about 3% or so. But for
> unpacked source trees, lots of clear text files, likely also virtual
> machine image files or other nicely compressible data the effect should
> be better.
Back in the day, MS-DOS 6.2 on a 130 MB hard drive, I used to run MS
Drivespace (which I guess they partnered with Stacker to get the tech
for, then dropped the Stacker partnership like a hot potato after they'd
sucked out all the tech they wanted, killing Stacker in the process...),
so I'm familiar with the idea of filesystem or lower integrated
compression and realize that it's definitely variable. I was just
wondering what the real-life usage scenarios had come up with, realizing
even as I wrote it that the question wasn't one that could be answered in
anything but general terms.
But I run Gentoo and thus deal with a lot of build scripts, etc, plus the
usual *ix style plain text config files, etc, so I expect compression
will be pretty good for that. Rather less so on the media and bzip-
tarballed binpkgs partitions, certainly, with the home partition likely
intermediate since it has a lot of plain text /and/ a lot of pre-
compressed data.
Meanwhile, even without a specific answer, just the discussion is helping
to clarify my understanding and expectations regarding compression, so
thanks.
> Although BTRFS received a lot of fixes for ENOSPC issues I would be a
> bit reluctant with very small filesystems. But that is just a gut
> feeling. So I do not know whether the option -M from above is tested
> widely. I doubt it.
The only real small filesystem/raid I have is /boot, the 128 MB
mentioned. But in thinking it over a bit more since I wrote the initial
post, I realized that given the 9-ish gigs of unallocated freespace at
the end of the drives and the fact that most of the partitions are at a
quarter-gig offset due to the 128 MB /boot and the combined 128 MB BIOS
and UEFI reserved partitions, I have room to expand both by several
times, and making the total of all 3 (plus the initial few sectors of
unpartitioned boot area) at the beginning of the drive an even 1 gig
would give me even gig offsets for all the other partitions/raids as well.
So I'll almost certainly expand /boot from 1/8 gig to 1/4 gig, and maybe
to half or even 3/4 gig, just so the offsets for everything else end up
at even half or full gig boundaries, instead of the quarter-gig I have
now. Between that and mixed-mode, I think the potential sizing issue of
/boot pretty much disappears. One less problem to worry about. =:^)
So the big sticking point now is two-copy-only data on btrfs-raid1,
regardless of the number of drives. Sticking that on top of md/raid is a
workaround, tho obviously I'd much rather have a btrfs that could mirror
both data and metadata an arbitrary number of ways instead of just two.
(There's some hints that metadata at least gets mirrored to all drives in
a btrfs-raid1, tho nothing clearly states it one way or another. But
without data mirrored to all drives as well, I'm just not comfortable.)
But while not ideal, the data integrity checking of two-way btrfs-raid1
on two-way md/raid1 should at least be better than entirely unverified
4-way md/raid1, and I expect the rest will come over time, so I could
simply upgrade anyway.
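That layered workaround would look something like the following (device
names are placeholders); the btrfs checksums can then detect a bad copy
and fall back to the extent stored on the other md pair:
    # two 2-disk md/raid1 pairs...
    mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/sda8 /dev/sdb8
    mdadm --create /dev/md11 --level=1 --raid-devices=2 /dev/sdc8 /dev/sdd8
    # ...with a two-way btrfs raid1 on top, for four physical copies total
    mkfs.btrfs -m raid1 -d raid1 /dev/md10 /dev/md11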
OTOH, in general as I've looked closer, I've found btrfs to be rather
farther away from exiting experimental than the prominent adoption by
various distros had led me to believe, and without N-way mirroring raid,
one of the two big features that I was looking forward to (the other
being the data integrity checking) just vaporized in front of my eyes, so
I may well hold off on upgrading until, potentially, late this year
instead of early this year, even if there are workarounds. I'm just not
sure it's worth the cost of dealing with the still experimental aspects.
Either way, however, this little foray into previously unexplored
territory leaves me with a MUCH firmer grasp of btrfs. It's no longer
simply a vague filesystem with some vague features out there.
And now that I'm here, I'll probably stay on the list as well, as I've
already answered a number of questions posted by others, based on the
material in the wiki and manpages, so I think I have something to
contribute, and keeping up with developments will be far easier if I stay
involved.
Meanwhile, again and overall, thanks for the answer. I did have most of
the bits of info I needed there floating around, but having someone to
discuss my questions with has definitely helped solidify the concepts,
and you've given me at least two very good suggestions that were entirely
new to me and that would have certainly taken me quite some time to come
up with on my own, if I'd been able to do so at all, so thanks, indeed!
=:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-01-29 5:40 ` Duncan
@ 2012-01-29 7:55 ` Martin Steigerwald
0 siblings, 0 replies; 14+ messages in thread
From: Martin Steigerwald @ 2012-01-29 7:55 UTC (permalink / raw)
To: linux-btrfs; +Cc: Duncan
On Sunday, 29 January 2012, Duncan wrote:
> Martin Steigerwald posted on Sat, 28 Jan 2012 13:08:52 +0100 as excerpted:
> > On Thursday, 26 January 2012, Duncan wrote:
[…]
> >> 2) The wiki indicates that btrfs-raid1 and raid-10 only mirror data
> >> 2- way, regardless of the number of devices. On my now aging
> >> disks, I really do NOT like the idea of only 2-copy redundancy.
> >> I'm far happier with the 4-way redundancy, twice for the important
> >> stuff since it's in both working and backup mds altho they're on
> >> the same 4-disk set (tho I do have an external drive backup as
> >> well, but it's not kept as current).
> >>
> >> If true that's a real disappointment, as I was looking forward to
> >> btrfs- raid1 with checksummed integrity management.
> >
> > I didn't see anything like this.
> >
> > Would be nice to be able to adapt the redundancy degree where
> > possible.
>
> I posted the wiki reference in reply to someone else recently. Let's
> see if I can find it again...
>
> Here it is. This is from the bottom of the RAID and data replication
> section (immediately above "Balancing") on the SysadminGuide page:
>
> With RAID-1 and RAID-10, only two copies of each byte of data are
> written, regardless of how many block devices are actually in use on
> the filesystem.
> <<<<<
Yes, I have seen that too some time ago. What I meant by "I didn't see
anything like this" is that I didn't see an option to set the number of
copies anywhere yet - just like you.
> > An idea might be splitting into a delayed synchronisation mirror:
> >
> > Have two BTRFS RAID-1 - original and backup - and have a cronjob with
> > rsync mirroring files every hour or so. Later this might be replaced
> > by btrfs send/receive - or by RAID-1 with higher redundancy.
>
> That's an interesting idea. However, as I run git kernels and don't
> accumulate a lot of uptime in any case, what I'd probably do is set up
> the rsync to be run after a successful boot or mount of the filesystem
> in question. That way, if it ever failed to boot/mount for whatever
> reason, I could be relatively confident that the backup version
> remained intact and usable.
>
> That's actually /quite/ an interesting idea. While I have working and
> backup partitions for most stuff now, the process remains a manual one,
> when I think the system is stable enough and enough time has passed
> since the last one, so the backup tends to be weeks or months old as
> opposed to days or hours. This idea, modified to do it once per boot
> or mount or whatever, would keep the backups far more current and be
> much less hassle than the manual method I'm using now. So even if I
> don't immediately switch to btrfs as I had thought I might, I can
> implement those scripts on the current system now, and then they'll be
> ready and tested, needing little modification when I switch to btrfs,
> later.
>
> Thanks for the ideas! =:^)
Well, you may even throw in a snapshot in between.
During boot, before the backup, first snapshot the source device - or do
it just after mount, before applications / services are started. That
should give you a fairly consistent backup source. Then do the rsync
backup. Then snapshot the backup drive.
This way you can access older backups in case the original has gone bad
and has been backed up nonetheless.
I suggest a cronjob that deletes old snapshots again after some time, in
order to save space.
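A rough sketch of that snapshot / rsync / snapshot sequence, assuming the
source is the top-level subvolume and the backup filesystem keeps its
live copy in a subvolume named "current" (all paths are made up):
    # snapshot the source so the backup sees a consistent state
    btrfs subvolume snapshot /mnt/work /mnt/work/.backup-snap
    # back up from the snapshot rather than from the live tree
    rsync -aHAX --delete /mnt/work/.backup-snap/ /mnt/backup/current/
    # keep a dated snapshot on the backup side so older states remain
    btrfs subvolume snapshot /mnt/backup/current /mnt/backup/snap-$(date +%F)
    # drop the source-side snapshot; a cronjob can prune old snap-* later
    btrfs subvolume delete /mnt/work/.backup-snap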
I want to replace my backup with something like this. There is also
rsnapshot for this case, but I find its error reporting suboptimal (no
rsync error messages are included unless you run it on the command line
with option -v) and it uses hardlinks. Maybe it could be adapted to use
snapshots?
> > Although BTRFS received a lot of fixes for ENOSPC issues I would be a
> > bit reluctant with very small filesystems. But that is just a gut
> > feeling. So I do not know whether the option -M from above is tested
> > widely. I doubt it.
>
> The only real small filesystem/raid I have is /boot, the 128 MB
> mentioned. But in thinking it over a bit more since I wrote the
> initial post, I realized that given the 9-ish gigs of unallocated
> freespace at the end of the drives and the fact that most of the
> partitions are at a quarter-gig offset due to the 128 MB /boot and the
> combined 128 MB BIOS and UEFI reserved partitions, I have room to
> expand both by several times, and making the total of all 3 (plus the
> initial few sectors of unpartitioned boot area) at the beginning of
> the drive an even 1 gig would give me even gig offsets for all the
> other partitions/raids as well.
>
> So I'll almost certainly expand /boot from 1/8 gig to 1/4 gig, and
> maybe to half or even 3/4 gig, just so the offsets for everything else
> end up at even half or full gig boundaries, instead of the quarter-gig
> I have now. Between that and mixed-mode, I think the potential sizing
> issue of /boot pretty much disappears. One less problem to worry
> about. =:^)
About /boot: I do not see any specific need to convert /boot to BTRFS as
well. Since kernels have version numbers attached to them and can be
installed side by side, snapshotting /boot does not appear that important
to me.
So you can just use Ext3 - or, with GRUB 2 or a patched GRUB 1 (some
distros do it), Ext4 - for /boot in case BTRFS does not work out.
> So the big sticking point now is two-copy-only data on btrfs-raid1,
> regardless of the number of drives, and sticking that on top of
> md/raid's a workaround, tho obviously I'd much rather a btrfs that
> could mirror both data and metadata an arbtrary number of ways instead
> of just two. (There's some hints that metadata at least gets mirrored
> to all drives in a btrfs-raid1, tho nothing clearly states it one way
> or another. But without data mirrored to all drives as well, I'm just
> not comfortable.)
I am with you there. Would be a nice feature.
The distributed filesystem Ceph, which likes to be based on BTRFS
volumes, has something like that, but Ceph might be overdoing it for
your case ;).
> OTOH, in general as I've looked closer, I've found btrfs to be rather
> farther away from exiting experimental than the prominent adoption by
> various distros had led me to believe, and without N-way mirroring
> raid, one of the two big features that I was looking forward to (the
> other being the data integrity checking) just vaporized in front of m=
y
> eyes, so I may well hold off on upgrading until, potentially, late
> this year instead of early this year, even if there are workarounds.=20
> I'm just not sure it's worth the cost of dealing with the still
> experimental aspects.
I decided for a partial approach.
My Amarok machine - an old ThinkPad T23 - is fully upgraded. On my main
laptop - a ThinkPad T520 with an Intel SSD 320 - I have BTRFS as /, and
/home still sits on Ext4.
I like this approach, because I can gain experience with BTRFS while not
putting too important data at risk. I can afford to lose /, since I have
a backup. But even with a backup of /home, I'd rather not lose it, since
I only back it up every 2-3 weeks because it's a manual thing for me at
the moment.
At work I have a scratch data partition on BTRFS for Debian package
development, compiling stuff, and other stuff I do not want to do within
the NFS export - and I back that up to an Ext4 partition.
> And now that I'm here, I'll probably stay on the list as well, as I've
> already answered a number of questions posted by others, based on the
> material in the wiki and manpages, so I think I have something to
> contribute, and keeping up with developments will be far easier if I
> stay involved.
I encourage you to start by putting something you can afford to lose on
BTRFS, to gather practical experience.
> Meanwhile, again and overall, thanks for the answer. I did have most
You are welcome.
I do not know a definitive answer to the number-of-copies question, but
I believe it's not yet possible to set it.
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-01-26 15:41 btrfs-raid questions I couldn't find an answer to on the wiki Duncan
2012-01-28 12:08 ` Martin Steigerwald
@ 2012-01-29 11:23 ` Goffredo Baroncelli
2012-01-30 5:49 ` Li Zefan
2012-01-30 14:58 ` Kyle Gates
1 sibling, 2 replies; 14+ messages in thread
From: Goffredo Baroncelli @ 2012-01-29 11:23 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On Thursday, 26 January, 2012 16:41:32 Duncan wrote:
> 1) My /boot partition and its backup (which I do want to keep separate
> from root) are only 128 MB each. The wiki recommends 1 gig sizes
> minimum, but there's some indication that's dated info due to mixed data/
> metadata mode in recent kernels.
>
> Is a 128 MB btrfs reasonable? What's the mixed-mode minumum recommended
> and what is overhead going to look like?
IIRC, the minimum size should be 256MB. Anyway, if you want/allow a separate
partition for /boot, I suggest using a classic filesystem like ext3.
BR
G.Baroncelli
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-01-29 11:23 ` Goffredo Baroncelli
@ 2012-01-30 5:49 ` Li Zefan
2012-01-30 14:58 ` Kyle Gates
1 sibling, 0 replies; 14+ messages in thread
From: Li Zefan @ 2012-01-30 5:49 UTC (permalink / raw)
To: kreijack; +Cc: Duncan, linux-btrfs
Goffredo Baroncelli wrote:
> On Thursday, 26 January, 2012 16:41:32 Duncan wrote:
>> 1) My /boot partition and its backup (which I do want to keep separate
>> from root) are only 128 MB each. The wiki recommends 1 gig sizes
>> minimum, but there's some indication that's dated info due to mixed data/
>> metadata mode in recent kernels.
>>
>> Is a 128 MB btrfs reasonable? What's the mixed-mode minumum recommended
>> and what is overhead going to look like?
>
> IIRC, the minimum size should be 256MB. Anyway, if you want/allow a separate
> partition for /boot I suggest to use a classic filesystem like ext3.
>
The 256MB limitation has been removed.
* RE: btrfs-raid questions I couldn't find an answer to on the wiki
2012-01-29 11:23 ` Goffredo Baroncelli
2012-01-30 5:49 ` Li Zefan
@ 2012-01-30 14:58 ` Kyle Gates
2012-01-31 5:55 ` Duncan
1 sibling, 1 reply; 14+ messages in thread
From: Kyle Gates @ 2012-01-30 14:58 UTC (permalink / raw)
To: kreijack, 1i5t5.duncan; +Cc: linux-btrfs
I've been having good luck with my /boot on a separate 1GB RAID1 btrfs
filesystem using grub2 (2 disks only! I wouldn't try it with 3). I should
note, however, that I'm NOT using compression on this volume because if I
remember correctly it may not play well with grub (maybe that was just
lzo though) and I'm also not using subvolumes either for the same reason.
Kyle
----------------------------------------
> From: kreijack@inwind.it
> To: 1i5t5.duncan@cox.net
> Subject: Re: btrfs-raid questions I couldn't find an answer to on the wiki
> Date: Sun, 29 Jan 2012 12:23:39 +0100
> CC: linux-btrfs@vger.kernel.org
>
> On Thursday, 26 January, 2012 16:41:32 Duncan wrote:
> > 1) My /boot partition and its backup (which I do want to keep separate
> > from root) are only 128 MB each. The wiki recommends 1 gig sizes
> > minimum, but there's some indication that's dated info due to mixed data/
> > metadata mode in recent kernels.
> >
> > Is a 128 MB btrfs reasonable? What's the mixed-mode minumum recommended
> > and what is overhead going to look like?
>
> IIRC, the minimum size should be 256MB. Anyway, if you want/allow a separate
> partition for /boot I suggest to use a classic filesystem like ext3.
>
> BR
> G.Baroncelli
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-01-30 14:58 ` Kyle Gates
@ 2012-01-31 5:55 ` Duncan
2012-02-01 0:22 ` Kyle Gates
2012-02-10 19:45 ` Phillip Susi
0 siblings, 2 replies; 14+ messages in thread
From: Duncan @ 2012-01-31 5:55 UTC (permalink / raw)
To: linux-btrfs
Kyle Gates posted on Mon, 30 Jan 2012 08:58:41 -0600 as excerpted:
> I've been having good luck with my /boot on a separate 1GB RAID1 btrfs
> filesystem using grub2 (2 disks only! I wouldn't try it with 3). I
> should note, however, that I'm NOT using compression on this volume
> because if I remember correctly it may not play well with grub (maybe
> that was just lzo though) and I'm also not using subvolumes either for
> the same reason.
Thanks! I'm on grub2 as well. It's still masked on gentoo, but I
recently unmasked and upgraded to it, taking advantage of the fact that I
have two two-spindle md/raid-1s for /boot and its backup to test and
upgrade one of them first, then the other only when I was satisfied with
the results on the first set. I'll be using a similar strategy for the
btrfs upgrades, only most of my md/raid-1s are 4-spindle, with two sets,
working and backup, and I'll upgrade one set first.
I'm going to keep /boot a pair of two-spindle raid-1s, but intend to make
them btrfs-raid1s instead of md/raid-1s, and will upgrade one two-spindle
set at a time.
More on the status of grub2 btrfs-compression support based on my
research. There is support for btrfs/gzip-compression in at least grub
trunk. AFAIK, it's gzip-compression in grub-1.99-release and
lzo-compression in trunk only, but I may be misremembering and it's gzip
in trunk only and only uncompressed in grub-1.99-release.
In any event, since I'm running 128 MB /boot md/raid-1s without
compression now, and intend to increase the size to at least a quarter
gig to better align the following partitions, /boot is the one set of
btrfs partitions I do NOT intend to enable compression on, so that won't
be an issue for me here. And since for /boot I'm running a pair of
two-spindle raid1s instead of my usual quad-spindle raid1s, you've
confirmed that works as well. =:^)
As a side note, since I only recently did the grub2 upgrade, I've been
enjoying its ability to load and read md/raid and my current reiserfs
directly, thus giving me the ability to look up info in at least text-
based main system config and notes files directly from grub2, without
booting into Linux, if for some reason the above-grub boot is hosed or
inconvenient at that moment. I just realized that if I want to maintain
that direct-from-grub access, I'll need to ensure that the grub2 I'm
running groks the btrfs compression scheme I'm using on any filesystem I
want grub2 to be able to read.
Hmm... that brings up another question: You mention a 1-gig btrfs-raid1
/boot, but do NOT mention whether you installed it before or after mixed-
chunk (data/metadata) support made it into btrfs and became the default
for <= 1 gig filesystems.
Can you confirm one way or the other whether you're running mixed-chunk
on that 1-gig? I'm not sure whether grub2's btrfs module groks mixed-
chunk or not, or whether that even matters to it.
Also, could you confirm mbr-bios vs gpt-bios vs uefi-gpt partitions? I'm
using gpt-bios partitioning here, with the special gpt-bios-reserved
partition, so grub2-install can build the modules necessary for /boot
access directly into its core-image and install that in the gpt-bios-
reserved partition. It occurs to me that either uefi-gpt or gpt-bios
with the appropriate reserved partition won't have quite the same issues
with grub2 reading a btrfs /boot that either mbr-bios or gpt-bios without
a reserved bios partition would. If you're running gpt-bios with a
reserved bios partition, that confirms yet another aspect of your setup,
compared to mine. If you're running uefi-gpt, not so much as at least in
theory, that's best-case. If you're running either mbr-bios or gpt-bios
without a reserved bios partition, that's a worst-case, so if it works,
then the others should definitely work.
Meanwhile, you're right about subvolumes. I'd not try them on a btrfs
/boot, either. (I don't really see the use case for it, for a separate
/boot, tho there's certainly a case for a /boot subvolume on a btrfs
root, for people doing that.)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* RE: btrfs-raid questions I couldn't find an answer to on the wiki
2012-01-31 5:55 ` Duncan
@ 2012-02-01 0:22 ` Kyle Gates
2012-02-01 6:59 ` Duncan
2012-02-10 19:45 ` Phillip Susi
1 sibling, 1 reply; 14+ messages in thread
From: Kyle Gates @ 2012-02-01 0:22 UTC (permalink / raw)
To: 1i5t5.duncan, linux-btrfs
>> I've been having good luck with my /boot on a separate 1GB RAID1 btrfs
>> filesystem using grub2 (2 disks only! I wouldn't try it with 3). I
>> should note, however, that I'm NOT using compression on this volume
>> because if I remember correctly it may not play well with grub (maybe
>> that was just lzo though) and I'm also not using subvolumes either for
>> the same reason.
>
> Thanks! I'm on grub2 as well. It's is still masked on gentoo, but I
> recently unmasked and upgraded to it, taking advantage of the fact that I
> have two two-spindle md/raid-1s for /boot and its backup to test and
> upgrade one of them first, then the other only when I was satisfied with
> the results on the first set. I'll be using a similar strategy for the
> btrfs upgrades, only most of my md/raid-1s are 4-spindle, with two sets,
> working and backup, and I'll upgrade one set first.
>
> I'm going to keep /boot a pair of two-spindle raid-1s, but intend to make
> them btrfs-raid1s instead of md/raid-1s, and will upgrade one two-spindle
> set at a time.
>
> More on the status of grub2 btrfs-compression support based on my
> research. There is support for btrfs/gzip-compression in at least grub
> trunk. AFAIK, it's gzip-compression in grub-1.99-release and
> lzo-compression in trunk only, but I may be misremembering and it's gzip
> in trunk only and only uncompressed in grub-1.99-release.
I believe you are correct that btrfs zlib support is included in grub2
version 1.99 and lzo is in trunk.
I'll try compressing the files on /boot for one installed kernel with the
defrag -czlib option and see how it goes.
Result: Seemed to work just fine.
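For the record, that per-file recompression step looks roughly like this
(the kernel file names are just examples):
    # rewrite the given files zlib-compressed while defragmenting them
    btrfs filesystem defragment -czlib /boot/vmlinuz-3.2.2 /boot/System.map-3.2.2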
> In any event, since I'm running 128 MB /boot md/raid-1s without
> compression now, and intend to increase the size to at least a quarter
> gig to better align the following partitions, /boot is the one set of
> btrfs partitions I do NOT intend to enable compression on, so that won't
> be an issue for me here. And since for /boot I'm running a pair of
> two-spindle raid1s instead of my usual quad-spindle raid1s, you've
> confirmed that works as well. =:^)
>
> As a side note, since I only recently did the grub2 upgrade, I've been
> enjoying its ability to load and read md/raid and my current reiserfs
> directly, thus giving me the ability to look up info in at least text-
> based main system config and notes files directly from grub2, without
> booting into Linux, if for some reason the above-grub boot is hosed or
> inconvenient at that moment. I just realized that if I want to maintain
> that direct-from-grub access, I'll need to ensure that the grub2 I'm
> running groks the btrfs compression scheme I'm using on any filesystem I
> want grub2 to be able to read.
>
> Hmm... that brings up another question: You mention a 1-gig btrfs-raid1 /
> boot, but do NOT mention whether you installed it before or after mixed-
> chunk (data/metadata) support made it into btrfs and became the default
> for <= 1 gig filesystems.
I don't think I specifically enabled mixed chunk support when I created this
filesystem. It was done on a 2.6 kernel sometime in the middle of 2011 iirc.
> Can you confirm one way or the other whether you're running mixed-chunk
> on that 1-gig? I'm not sure whether grub2's btrfs module groks mixed-
> chunk or not, or whether that even matters to it.
>
> Also, could you confirm mbr-bios vs gpt-bios vs uefi-gpt partitions? I'm
> using gpt-bios partitioning here, with the special gpt-bios-reserved
> partition, so grub2-install can build the modules necessary for /boot
> access directly into its core-image and install that in the gpt-bios-
> reserved partition. It occurs to me that either uefi-gpt or gpt-bios
> with the appropriate reserved partition won't have quite the same issues
> with grub2 reading a btrfs /boot that either mbr-bios or gpt-bios without
> a reserved bios partition would. If you're running gpt-bios with a
> reserved bios partition, that confirms yet another aspect of your setup,
> compared to mine. If you're running uefi-gpt, not so much as at least in
> theory, that's best-case. If you're running either mbr-bios or gpt-bios
> without a reserved bios partition, that's a worst-case, so if it works,
> then the others should definitely work.
Same here, gpt-bios, 1MB partition with bios_grub flag set (gdisk code EF02)
for grub to reside on.
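For completeness, that reserved partition can be created along these
lines (disk and partition numbers are placeholders):
    # ~1MiB GPT partition of type EF02 (BIOS boot) for grub2's core image
    sgdisk --new=1:2048:4095 --typecode=1:EF02 /dev/sda
    # equivalent with parted: parted /dev/sda set 1 bios_grub on
    grub2-install /dev/sda   # embeds core.img into the bios_grub partition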
> Meanwhile, you're right about subvolumes. I'd not try them on a btrfs
> /boot, either. (I don't really see the use case for it, for a separate
> /boot, tho there's certainly a case for a /boot subvolume on a btrfs
> root, for people doing that.)
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-02-01 0:22 ` Kyle Gates
@ 2012-02-01 6:59 ` Duncan
0 siblings, 0 replies; 14+ messages in thread
From: Duncan @ 2012-02-01 6:59 UTC (permalink / raw)
To: linux-btrfs
Kyle Gates posted on Tue, 31 Jan 2012 18:22:51 -0600 as excerpted:
> I don't think I specifically enabled mixed chunk support when I created
> this filesystem. It was done on a 2.6 kernel sometime in the middle of
> 2011 iirc.
Yeah, I'd guess that was before mixed-chunk, or at least before it became
the default for <=1GiB filesystems, so even if it was supported it
wouldn't have been the default.
Meaning there's still an open question as to whether grub-1.99 supports
mixed-chunk.
It looks like I might get more time to play with it this coming week than
I had this past week. I might try some of my own experiments... and
whether grub groks mixed-chunk will certainly be among them if I do.
As for those recommending something other than btrfs for /boot, yes,
that's a possibility, but I strongly prefer to standardize on a single
filesystem type. Right now, that's reiserfs for everything except flash-
based USB and legacy floppies (both of which I use ext4 without
journaling for, except for the floppies I used to update my BIOS, before
my 2003 era mainboard got EOLed; those were freedos images), and
ultimately, I hope it'll be btrfs for everything including flash-based
(tho perhaps not for legacy floppies, but it has been awhile since I used
one of them for anything, after that last BIOS update...).
Of course I'm going to keep reiserfs on my backups, even if I use btrfs
for my working system, for the time being since btrfs is still in heavy
development, but ultimately, I want to go all btrfs just as I'm all
reiserfs now, and that would include both /boot 2-spindle raid-1s.
Tho if btrfs doesn't work well for that ATM, I can keep /boot as reiserfs
for the time being, since I'm already keeping it for the backups, for the
time being.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-01-31 5:55 ` Duncan
2012-02-01 0:22 ` Kyle Gates
@ 2012-02-10 19:45 ` Phillip Susi
2012-02-11 5:48 ` Duncan
1 sibling, 1 reply; 14+ messages in thread
From: Phillip Susi @ 2012-02-10 19:45 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On 1/31/2012 12:55 AM, Duncan wrote:
> Thanks! I'm on grub2 as well. It's is still masked on gentoo, but
> I recently unmasked and upgraded to it, taking advantage of the
> fact that I have two two-spindle md/raid-1s for /boot and its
> backup to test and upgrade one of them first, then the other only
> when I was satisfied with the results on the first set. I'll be
> using a similar strategy for the btrfs upgrades, only most of my
> md/raid-1s are 4-spindle, with two sets, working and backup, and
> I'll upgrade one set first.
Why do you want to have a separate /boot partition? Unless you can't
boot without it, having one just makes things more
complex/problematic. If you do have one, I agree that it is best to
keep it ext4 not btrfs.
> Meanwhile, you're right about subvolumes. I'd not try them on a
> btrfs /boot, either. (I don't really see the use case for it, for
> a separate /boot, tho there's certainly a case for a /boot
> subvolume on a btrfs root, for people doing that.)
The Ubuntu installer creates two subvolumes by default when you
install on btrfs: one named @, mounted on /, and one named @home,
mounted on /home. Grub2 handles this well since the subvols have
names in the default root, so grub just refers to /@/boot instead of
/boot, and so on. The apt-btrfs-snapshot package makes apt
automatically snapshot the root subvol so you can revert after an
upgrade. This seamlessly causes grub to go back to the old boot menu
without the new kernels too, since it goes back to reading the old
grub.cfg in the reverted root subvol.
I have a radically different suggestion you might consider rebuilding
your system using. Partition each disk into only two partitions: one
for bios_grub, and one for everything else ( or just use MBR and skip
the bios_grub partition ). Give the second partitions to mdadm to
make a raid10 array out of. If you use a 2x far and 2x offset instead
of the default near layout, you will have an array that can still
handle any 2 of the 4 drives failing, will have twice the capacity of
a 4 way mirror, almost the same sequential read throughput of a 4 way
raid0, and about twice the write throughput of a 4 way mirror.
Partition that array up and put your filesystems on it.
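A sketch of that layout with mdadm (device names assumed; f2 is the
two-copy "far" layout, o2 the offset variant mentioned above):
    # one raid10 across the big second partition of each disk
    mdadm --create /dev/md0 --level=10 --raid-devices=4 --layout=f2 \
          /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
    # then partition /dev/md0 (or put LVM on it) and create filesystems as usual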
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-02-10 19:45 ` Phillip Susi
@ 2012-02-11 5:48 ` Duncan
2012-02-12 0:04 ` Phillip Susi
0 siblings, 1 reply; 14+ messages in thread
From: Duncan @ 2012-02-11 5:48 UTC (permalink / raw)
To: linux-btrfs
Phillip Susi posted on Fri, 10 Feb 2012 14:45:43 -0500 as excerpted:
> On 1/31/2012 12:55 AM, Duncan wrote:
>> Thanks! I'm on grub2 as well. It's is still masked on gentoo, but I
>> recently unmasked and upgraded to it, taking advantage of the fact that
>> I have two two-spindle md/raid-1s for /boot and its backup to test and
>> upgrade one of them first, then the other only when I was satisfied
>> with the results on the first set. I'll be using a similar strategy
>> for the btrfs upgrades, only most of my md/raid-1s are 4-spindle, with
>> two sets, working and backup, and I'll upgrade one set first.
>
> Why do you want to have a separate /boot partition? Unless you can't
> boot without it, having one just makes things more complex/problematic.
> If you do have one, I agree that it is best to keep it ext4 not btrfs.
For a proper picture of the situation, understand that I don't have an
initr*, I build everything I need into the kernel and have module loading
disabled, and I keep /boot unmounted except when I'm actually installing
an upgrade or reconfiguring.
Having a separate /boot means that I can keep it unmounted and thus free
from possible random corruption or accidental partial /boot tree
overwrite or deletion, most of the time. It also means that I can emerge
(build from sources using the gentoo ebuild script provided for the
purpose, and install to the live system) a new grub without fear of
corrupting what I actually boot from -- the grub system installation and
boot installation remain separate.
A separate /boot is also more robust in terms of file system corruption
-- if something goes wrong with my rootfs, I can simply boot its backup,
from a separate /boot that will not have been corrupted. Similarly, if
something goes wrong with /boot (or the bios partition), I can switch
drives in the BIOS and boot from the backup /boot, then load my usual
rootfs.
Since I'm working with four drives, and both the working /boot and
backup /boot are two-spindle md/raid1, one on one pair, one on the other,
I have both hardware redundancy via the second spindle of the raid1, and
admin-fatfinger redundancy via the backup. However, the rootfs and its
backup are both on quad-spindle md/raid1s, thus giving me four separate
physical copies each of rootfs and its backup. Because the disk points
at a single bootloader, if /boot is on rootfs, all four would point to
either the working rootfs or the backup rootfs, and would update
together, so I'd lose the ability to fall back to the backup /boot.
(Note that I developed the backup /boot policy and solution back on
legacy-grub. Grub2 is rather more flexible, particularly with a
reasonably roomy GPT BIOS partition, and since each BIOS partition is
installed individually, in theory, if a grub2 update failed, I could
point the BIOS at a disk I hadn't installed the BIOS partition update to
yet, boot to the limited grub rescue-mode-shell, and point it at the
/boot in the backup rootfs to load the normal-mode-shell, menu, and
additional grub2 modules as necessary. However, being able to access a
full normal-mode-shell grub2 on the backup /boot instead of having to
resort to the grub2 rescue-mode-shell to reach the backup rootfs, does
have its benefits.)
One of the nice things about grub2 normal-mode is that it allows
(directory and plain text file) browsing of pretty much anything it has a
module for, anywhere on the system. That's a nice thing to be able to
do, but it too is much more robust if /boot isn't part of rootfs, and
thus, isn't likely to be damaged if the rootfs is. The ability to boot
to grub2 and retrieve vital information (even if limited to plain-text
file storage) from a system without a working rootfs is a very nice
ability to have!
So you see, a separate /boot really does have its uses. =:^)
>> Meanwhile, you're right about subvolumes. I'd not try them on a btrfs
>> /boot, either. (I don't really see the use case for it, for a separate
>> /boot, tho there's certainly a case for a /boot subvolume on a btrfs
>> root, for people doing that.)
>
> The Ubuntu installer creates two subvolumes by default when you install
> on btrfs: one named @, mounted on /, and one named @home, mounted on
> /home. Grub2 handles this well since the subvols have names in the
> default root, so grub just refers to /@/boot instead of /boot, and so
> on. The apt-btrfs-snapshot package makes apt automatically snapshot the
> root subvol so you can revert after an upgrade. This seamlessly causes
> grub to go back to the old boot menu without the new kernels too, since
> it goes back to reading the old grub.cfg in the reverted root subvol.
Thanks for that "real world" example. Subvolumes and particularly
snapshots can indeed be quite useful, but I'd be rather leery of having
all that on the same master filesystem. Lose it and you've lost
everything, snapshots or no snapshots, if there's not bootable backups
somewhere.
Two experiences inform my partitioning and layout judgment here. The
first one was back before the turn of the century when I still did MS.
In fact, at the time I was running an MSIE public beta for either MSIE 4
or 5, both of which I ran but IDR which it was that this happened with.
MS made a change to the MSIE cache indexing, keeping the index file disk
location in memory and direct-writing to it for performance reasons,
rather than going the usual filesystem access route. The only problem
was, whoever made that change didn't think about MSIE and MS (filesystem)
Explorer being effectively merged, and that it ran all the time as it was
the shell.
So then it comes time for the regularly scheduled system defrag, and
defrag moves the index files out from under MSIE. Then MSIE updates the
index, writing to the old location, in the process overwriting whatever's
there, causing all sorts of crosslinked files and other destruction.
A number of folks running that beta had un-backed-up data destroyed by
that bug (which MS fixed in the release by simply marking the MSIE index
files with the system attribute, so defrag wouldn't move them), but all
it did to me was screw up a few files on my separate TMP partition,
because I HAD a separate TMP partition, and because that's where I had
put the IE cache, reasoning that it was temporary data and thus belonged
on the TMP partition. That decision saved my bacon!
Both before and after that, I had a number of similar but rather more
minor incidents where a strict partitioning policy saved me trouble, as
well. But that one was all it took to keep me using a strict separate
partitioning system to this day.
The second experience was when the AC failed here, in the hot Phoenix
summer (routinely 45-48C highs). I had left the system on and gone
somewhere. When the AC failed, the outside-in-the-shade-temperature was
45C+, inside room temperature was EASILY 60C+, and the drive temperature
was very likely 90C+!
The drive of course failed due to physical head-crash on the still-
spinning platters (I could see the grooves when I took it apart, later).
When I came home of course the system was frozen, and I turned it off.
The CPUs survived, and surprisingly, so did much of the disk. It was
only where the physical head crash grooves were that the data was gone.
I didn't have off-disk backups at that time (for sure I do now!), but I
had duplicate backup partitions for anything valuable. Since they
weren't mounted, I was able to recover and even continue using the backup
rootfs, /usr, etc, for a couple months, until I could buy a new disk and
transfer everything over.
Again, what saved me was the fact that I had everything partitioned off.
The partitions that weren't actually mounted were pretty much undamaged,
save for a few single scratches due to head seeking from one mounted
partition to another before the system itself crashed. Unlike the
grooves worn in the mounted partitions, the disk's own error correction
caught most of that. An fsck fixed things up pretty good, tho I lost a
few files.
I hate to think about what would have happened if instead of separate
partitions, each with its own intact metadata, etc, those "unmounted"
partitions had been simply subvolumes on a single master filesystem!
True, btrfs has duplicated metadata and both data and metadata
checksumming, and I'm *DEFINITELY* looking forward to the additional
protection from that (tho only two-way mirroring even on a 4-spindle
so-called raid1 btrfs was a big disappointment, altho an article I read
somewhere says multi-redundancy is scheduled for kernel 3.4 or 3.5).
But the plan, at least here, is for
that to be ADDITIONAL protection, NOT AN EXCUSE TO BE SLOPPY! It's for
that reason that I intend to keep proper partitions and probably won't
make a lot of use of the subvolume functionality, except as it's used by
the snapshot functionality, which I expect I WILL use, for exactly the
type of rollback functionality you describe above.
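From my reading of the wiki, the snapshot-before-upgrade workflow I have in
mind would look roughly like this; the paths are only illustrative and I've
obviously not run any of it on a real system yet:
  # snapshot the root subvolume before an upgrade
  btrfs subvolume snapshot / /snapshots/root-pre-upgrade
  # if the upgrade goes bad, find the snapshot's subvolume id...
  btrfs subvolume list /
  # ...and make it the default for the next mount/boot
  btrfs subvolume set-default <subvolid> /
(Or mount with -o subvol=/snapshots/root-pre-upgrade and copy things back,
whichever turns out to be less fat-finger-prone in practice.)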
> I have a radically different suggestion you might consider rebuilding
> your system using. Partition each disk into only two partitions: one
> for bios_grub, and one for everything else ( or just use MBR and skip
> the bios_grub partition ). Give the second partitions to mdadm to make
> a raid10 array out of. If you use a 2x far and 2x offset instead of the
> default near layout, you will have an array that can still handle any 2
> of the 4 drives failing, will have twice the capacity of a 4 way mirror,
> almost the same sequential read throughput of a 4 way raid0, and about
> twice the write throughput of a 4 way mirror. Partition that array up
> and put your filesystems on it.
I like the raid-10 idea and will have to research it some more. I
understand the idea behind "near" and "far" on raid10, but having never
used raid-10, I don't "grok" it; I didn't understand it well enough to
have appreciated the lose-any-two possibility before you suggested it.
And I'm only running 300 gig disks and given that I'm running a working
and a backup copy of most of those raids/partitions, it's more like 180
or 200 gig of actual storage, with the free-space fragmented due to the
multiple partitions/raids, so I /am/ running a bit low on free-space and
could definitely use the doubled space at this point!
But I believe I'll keep multiple raids for much the same reason I keep
multiple partitions, it's a FAR more robust solution than having all
one's eggs in one RAID basket.
Besides, I actually did try a single partitioned RAID (well, two, one for
all the working copies, one for the backups) when I first setup md/raid,
and came to the conclusion that the recovery time on that big a raid is
rather longer than I'd like to deal with. Multiple raids, with
the ones I'm not using ATM offline, means I don't have to worry about
recovering the entire thing, only the raids that were online and actually
dirty at the time of crash or whatever. And of course write-intent
bitmaps means even shorter recovery time in most cases, so between
multiple raids and write-intent-bitmaps, a recovery that would take 2-3
hours with my original all-in-one raid setup, now often takes < 5
minutes! =:^) Even with write-intent-bitmaps, I'd hate to go back to big
all-in-one raids, for recovery reasons alone, and between that and the
additional robustness of multiple raids, I just don't see myself doing
that any time soon.
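(For anyone following along who hasn't turned them on yet, a write-intent
bitmap is a one-liner to add to an existing array; the md device and
partition names here are just examples, so check the manpage before trusting
the exact syntax:
  # add an internal write-intent bitmap to an existing array
  mdadm --grow --bitmap=internal /dev/md0
  # or enable it at creation time
  mdadm --create /dev/md0 --level=1 --raid-devices=4 \
        --bitmap=internal /dev/sd[abcd]5
Well worth it for the recovery-time savings alone.)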
But the 2x far, 2x offset raid10 idea, to let me lose any two of the
four, is something I will very possibly use, especially now that I've
seen that btrfs isn't as close to ready with multi-redundancy as I had
hoped, so it'll probably be mid-year at the earliest before I can
reasonably play with that. Thanks again, as that's a very practical
suggestion indeed! =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-02-11 5:48 ` Duncan
@ 2012-02-12 0:04 ` Phillip Susi
2012-02-12 22:31 ` Duncan
0 siblings, 1 reply; 14+ messages in thread
From: Phillip Susi @ 2012-02-12 0:04 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On 02/11/2012 12:48 AM, Duncan wrote:
> So you see, a separate /boot really does have its uses. =:^)
True, but booting from removable media is easy too, and a full livecd gives
many more recovery options than the grub shell. It is the corrupted root
fs that is of much more concern than /boot.
> I like the raid-10 idea and will have to research it some more. I
> understand the idea behind "near" and "far" on raid10, but having never
> used raid-10, I don't "grok" it; I didn't understand it well enough to
> have appreciated the lose-any-two possibility before you suggested it.
To grok the other layouts, it helps to think of the simple two disk case.
A far layout is like having a raid0 across the first half of both disks, then
mirroring each disk's first half onto the second half of the other
disk. Offset has the mirror on the next stripe so each stripe is interleaved
with a mirror stripe, rather than having all original, then all mirrors after.
It looks like mdadm won't let you use both at once, so you'd have to go with
a 3 way far or offset. Also I was wrong about the additional space. You
would only get about 33% more space, since you still have 3 copies of all data so
you get 4/3 times the space, but you will get much better throughput since
it is striped across all 4 disks. Far gives better sequential read since it
reads just like a raid0, but writes have to seek all the way across the disk
to write the backup. Offset requires seeks between each stripe on read, but
the writes don't have to seek to write the backup.
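To put that in concrete terms, creating the 3-copy variants over 4 devices
would look something like this ( device names are just examples ):
  # 3 copies, far layout, across 4 devices
  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        --layout=f3 /dev/sd[abcd]2
  # or the offset variant
  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        --layout=o3 /dev/sd[abcd]2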
You also could do a raid6 and get the double failure tolerance, and two disks
worth of capacity, but not as much read throughput as raid10.
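For completeness, that would be just ( again, example device names ):
  mdadm --create /dev/md1 --level=6 --raid-devices=4 /dev/sd[abcd]3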
> But I believe I'll keep multiple raids for much the same reason I keep
> multiple partitions, it's a FAR more robust solution than having all
> one's eggs in one RAID basket.
True.
> Besides, I actually did try a single partitioned RAID (well, two, one for
> all the working copies, one for the backups) when I first setup md/raid,
> and came to the conclusion that the recovery time on that big a raid is
> rather longer than I like to be dealing with it. Multiple raids, with
> the ones I'm not using ATM offline, means I don't have to worry about
> recovering the entire thing, only the raids that were online and actually
> dirty at the time of crash or whatever. And of course write-intent
> bitmaps means even shorter recovery time in most cases, so between
> multiple raids and write-intent-bitmaps, a recovery that would take 2-3
> hours with my original all-in-one raid setup, now often takes < 5
> minutes! =:^) Even with write-intent-bitmaps, I'd hate to go back to big
> all-in-one raids, for recovery reasons alone, and between that and the
> additional robustness of multiple raids, I just don't see myself doing
> that any time soon.
Depends on what you mean by recovery. Re-adding a drive that you removed
will be faster with multiple raids ( though write-intent bitmaps also take
care of that ), but if you actually have a failed disk and have to replace
it with a new one, you still have to do a rebuild on all of the raids
so it ends up taking the same total time.
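In mdadm terms, the two cases look like this ( device names again just
examples ):
  # drive was only temporarily kicked/removed: quick catch-up resync,
  # especially with a write-intent bitmap
  mdadm /dev/md0 --re-add /dev/sdb2
  # drive actually died and was replaced: full rebuild onto the new member,
  # repeated for every array that lived on the dead disk
  mdadm /dev/md0 --remove /dev/sdb2
  mdadm /dev/md0 --add /dev/sde2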
* Re: btrfs-raid questions I couldn't find an answer to on the wiki
2012-02-12 0:04 ` Phillip Susi
@ 2012-02-12 22:31 ` Duncan
0 siblings, 0 replies; 14+ messages in thread
From: Duncan @ 2012-02-12 22:31 UTC (permalink / raw)
To: linux-btrfs
Phillip Susi posted on Sat, 11 Feb 2012 19:04:41 -0500 as excerpted:
> On 02/11/2012 12:48 AM, Duncan wrote:
>> So you see, a separate /boot really does have its uses. =:^)
>
> True, but booting from removable media is easy too, and a full livecd
> gives many more recovery options than the grub shell.
And a rootfs backup that's simply a copy of rootfs at the time it was
taken is even MORE flexible, especially when rootfs is arranged to
contain all packages installed by the package manager. That's what I
use. If misfortune comes my way right in the middle of a critical
project and rootfs dies, I simply point root= on the kernel command line
at the grub prompt to the backup root, and assuming that critical project is on
another filesystem (such as home), I can normally simply continue where I
left off. Full X and desktop, browser, movie players, document editors
and viewers, presentation software, all the software I had on the system
at the time I made the backup, directly bootable without futzing around
with data restores, etc. =:^)
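At the grub prompt that amounts to nothing more than editing the kernel
line, something along these lines (the md numbers are purely illustrative,
not my actual layout):
  # normal boot, working rootfs
  linux /vmlinuz root=/dev/md3 ro
  # same kernel, pointed at the backup rootfs instead
  linux /vmlinuz root=/dev/md4 ro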
> It is the corrupted root fs that is of much more concern than /boot.
Yes, but to the extent that /boot is the gateway to both the rootfs and
its backup... and digging out the removable media is at least a /bit/
more hassle than simply altering the root= (and mdX=) on the kernel
command line...
(Incidentally, I've thought for quite some time that I really should have
had two such backups, such that if I'm just doing the backup when
misfortune strikes and takes out both the working rootfs and its backup,
the backup being mounted and actively written at the time of the
misfortune, I could always boot to the second backup. But I hadn't
considered that when I did the current layout. Given that rootfs with
the full installed system's only 4.75 gigs (with a quarter gig /usr/local
on the same 5 gig partitioned md/raid), it shouldn't be /too/ difficult
to fit that in at my next rearrange, especially if I do the 4/3 raid10s
as you suggested (for another ~100 gig since I'm running 300 gig disks).)
>> I don't "grok" [raid10]
>
> To grok the other layouts, it helps to think of the simple two disk
> case.
> A far layout is like having a raid0 across the first half of both disks,
> then mirroring each disk's first half onto the second half of the
> other disk. Offset has the mirror on the next stripe so each stripe
> is interleaved with a mirror stripe, rather than having all original,
> then all mirrors after.
>
> It looks like mdadm won't let you use both at once, so you'd have to go
> with a 3 way far or offset. Also I was wrong about the additional
> space. You would only get about 33% more space, since you still have 3 copies
> of all data so you get 4/3 times the space, but you will get much better
> throughput since it is striped across all 4 disks. Far gives better
> sequential read since it reads just like a raid0, but writes have to
> seek all the way across the disk to write the backup. Offset requires
> seeks between each stripe on read, but the writes don't have to seek to
> write the backup.
Thanks. That's reasonably clear. Beyond that, I just have to DO IT, to
get comfortable enough with it to be confident in my restoration
abilities under the stress of an emergency recovery. (That's the reason
I ditched the lvm2 layer I had tried: the additional complexity of that
one more layer was simply too much for me to be confident in my ability
to manage it without fat-fingering under the stress of an emergency
recovery situation.)
> You also could do a raid6 and get the double failure tolerance, and two
> disks worth of capacity, but not as much read throughput as raid10.
Ugh! That's what I tried as my first raid layout, when I was young and
foolish, raid-wise! Raid5/6's read-modify-write cycle, needed to get
the parity data written, was simply too much! Combine that with the
parallel job read boost of raid1, and raid1 was a FAR better choice for
me than raid6!
Actually, since much of my reading /is/ parallel jobs and the kernel i/o
scheduler and md do such a good job of taking advantage of raid1's
parallel-read characteristics, it has seemed I do better with that than
with raid0! I do still have one raid0, for gentoo's package tree, the
kernel tree, etc, since redundancy doesn't matter for it and the 4X space
it gives me for that is nice, but given bigger storage, I'd have it all raid1
(or now raid10) and not have to worry about other levels.
Counterintuitively, even write seems more responsive with raid1 than
raid0, in actual use. The only explanation I've come up with for that is
that in practice, any large scale writes tend to be reads from elsewhere
as well, and the md scheduler is evidently smart enough to read from one
spindle and write to the others, then switch off to catch up writing on
the formerly read-spindle, such that there's rather less head seeking
between read and write than there'd be otherwise. Since raid0 only has
the single copy, the data MUST be read from whatever spindle it resides
on, thus eliminating the kernel/md's ability to smart-schedule, favoring
one spindle at a time for reads to eliminate seeks.
For that reason, I've always thought that if I went to raid10, I'd try to
do it with at least triple spindle at the raid1 level, thus hoping to get
both the additional redundancy and parallel scheduling of raid1, while
also getting the thruput speed and size of the stripes.
Now you've pointed out that I can do essentially that with a triple
mirror on quad spindle raid10, and I'm seeing new possibilities open up...
>> Multiple
>> raids, with the ones I'm not using ATM offline, means I don't have to
>> worry about recovering the entire thing, only the raids that were
>> online and actually dirty at the time of crash or whatever.
>
> Depends on what you mean by recovery. Re-adding a drive that you
> removed will be faster with multiple raids ( though write-intent bitmaps
> also take care of that ), but if you actually have a failed disk and
> have to replace it with a new one, you still have to do a rebuild on all
> of the raids so it ends up taking the same total time.
Very good point. I was talking about re-adding. For various reasons
including hardware power-on stability latency (these particular disks
apparently take a bit to stabilize after power on and suspend-to-disk
often kicks a disk on resume due to ID-match-failure, which then appears
as say sde instead of sdb; I've solved that problem by simply leaving
the system on or shutting it down instead of using suspend-to-disk), faulty
memory at one point causing kernel panics, and the fact that I run live-
git kernels, I've had rather more experience with re-add than I would
have liked. But that has made me QUITE confident in my ability to
recover from either that or a dead drive, since I've had rather more
practice than I anticipated.
But all my experience has been with re-add, so that's what I was thinking
about when I said recovery. Thanks for pointing out what I omitted to
mention; I was really quite oblivious to the distinction. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Thread overview (14+ messages):
2012-01-26 15:41 btrfs-raid questions I couldn't find an answer to on the wiki Duncan
2012-01-28 12:08 ` Martin Steigerwald
2012-01-29 5:40 ` Duncan
2012-01-29 7:55 ` Martin Steigerwald
2012-01-29 11:23 ` Goffredo Baroncelli
2012-01-30 5:49 ` Li Zefan
2012-01-30 14:58 ` Kyle Gates
2012-01-31 5:55 ` Duncan
2012-02-01 0:22 ` Kyle Gates
2012-02-01 6:59 ` Duncan
2012-02-10 19:45 ` Phillip Susi
2012-02-11 5:48 ` Duncan
2012-02-12 0:04 ` Phillip Susi
2012-02-12 22:31 ` Duncan