* Planning out new fs. Am I missing anything?
@ 2020-05-25  1:13 Justin Engwer

From: Justin Engwer
To: linux-btrfs

Hi, I'm the guy who lost all his VMs due to a massive configuration
oversight.

I'm looking to implement the remaining 4 x 3 TB drives into a new fs
and just want someone to look over things. I'm intending to use them
for backup storage (Veeam).

CentOS 7, kernel 5.5.2-1.el7.elrepo.x86_64
btrfs-progs v4.9.1

mkfs.btrfs -m raid1c4 -d raid1 /dev/disk/by-id/ata-ST3000*-part1
echo "UUID=whatever /mnt/btrfs/ btrfs defaults,space_cache=v2 0 2" >> /etc/fstab
mount /mnt/btrfs

RAID1 data over 4 disks and RAID1C4 metadata, mounting with
space_cache=v2. Any other mount switches or btrfs creation switches I
should be aware of? Should I consider RAID5/6 instead? 6 TB should be
sufficient, so it's not like I'd get anything out of RAID5, but RAID6 I
suppose could provide a little more safety in the case of multiple
drive failures at once.

Cheers,
Justin
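For reference, a minimal sketch of the sequence above. Note that
raid1c4 requires kernel 5.5+ and btrfs-progs 5.5+, so the v4.9.1 progs
listed would need updating first; the device glob and UUID placeholder
below are assumptions, not verified values:

    # create the filesystem: raid1 data, raid1c4 metadata (needs progs >= 5.5)
    mkfs.btrfs -m raid1c4 -d raid1 /dev/disk/by-id/ata-ST3000*-part1

    # read back the new filesystem UUID from the member devices
    blkid -s UUID /dev/disk/by-id/ata-ST3000*-part1

    # add the fstab entry and mount (substitute the real UUID)
    mkdir -p /mnt/btrfs
    echo "UUID=<uuid> /mnt/btrfs btrfs defaults,space_cache=v2 0 2" >> /etc/fstab
    mount /mnt/btrfs

    # confirm the data/metadata profiles came out as intended
    btrfs filesystem usage /mnt/btrfs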
* Re: Planning out new fs. Am I missing anything?
@ 2020-05-26 12:47 Neal Gompa

From: Neal Gompa
To: Justin Engwer
Cc: Btrfs BTRFS

On Sun, May 24, 2020 at 9:35 PM Justin Engwer <justin@mautobu.com> wrote:
>
> Hi, I'm the guy who lost all his VMs due to a massive configuration
> oversight.
>
> I'm looking to implement the remaining 4 x 3 TB drives into a new fs
> and just want someone to look over things. I'm intending to use them
> for backup storage (Veeam).
>
> CentOS 7, kernel 5.5.2-1.el7.elrepo.x86_64
> btrfs-progs v4.9.1
>
> mkfs.btrfs -m raid1c4 -d raid1 /dev/disk/by-id/ata-ST3000*-part1
> echo "UUID=whatever /mnt/btrfs/ btrfs defaults,space_cache=v2 0 2" >> /etc/fstab
> mount /mnt/btrfs
>
> RAID1 data over 4 disks and RAID1C4 metadata, mounting with
> space_cache=v2. Any other mount switches or btrfs creation switches I
> should be aware of? Should I consider RAID5/6 instead? 6 TB should be
> sufficient, so it's not like I'd get anything out of RAID5, but RAID6 I
> suppose could provide a little more safety in the case of multiple
> drive failures at once.

In general, this looks fine, but I'd suggest that you switch to CentOS 8.
There's a COPR for btrfs-progs for EL8 that keeps in sync with Fedora:
https://copr.fedorainfracloud.org/coprs/ngompa/btrfs-progs-el8/

For CentOS 8, you should continue to plan to use ELRepo.org kernels. :)

--
真実はいつも一つ! / Always, there's only one truth!
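A sketch of the suggested EL8 setup, assuming the dnf copr plugin and
the 2020-era ELRepo release package URL (both are assumptions worth
verifying against current documentation):

    # enable the Fedora-synced btrfs-progs COPR on CentOS 8
    dnf install -y dnf-plugins-core
    dnf copr enable ngompa/btrfs-progs-el8
    dnf install -y btrfs-progs

    # btrfs kernel support on EL8 still comes from ELRepo (kernel-ml)
    dnf install -y https://www.elrepo.org/elrepo-release-8.el8.elrepo.noarch.rpm
    dnf --enablerepo=elrepo-kernel install -y kernel-ml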
* Re: Planning out new fs. Am I missing anything?
@ 2020-05-27  2:20 Chris Murphy

From: Chris Murphy
To: Justin Engwer
Cc: Btrfs BTRFS

On Sun, May 24, 2020 at 7:13 PM Justin Engwer <justin@mautobu.com> wrote:
>
> Hi, I'm the guy who lost all his VMs due to a massive configuration
> oversight.
>
> I'm looking to implement the remaining 4 x 3 TB drives into a new fs
> and just want someone to look over things. I'm intending to use them
> for backup storage (Veeam).
>
> CentOS 7, kernel 5.5.2-1.el7.elrepo.x86_64
> btrfs-progs v4.9.1

I suggest updating the btrfs-progs; that's old.

> mkfs.btrfs -m raid1c4 -d raid1 /dev/disk/by-id/ata-ST3000*-part1
> echo "UUID=whatever /mnt/btrfs/ btrfs defaults,space_cache=v2 0 2" >> /etc/fstab
> mount /mnt/btrfs

Add noatime. https://lwn.net/Articles/499293/

I don't recommend space_cache=v2 in fstab. Use it once manually with
clear_cache,space_cache=v2, and a feature flag will be set to use it
from that point on. Soon v2 will be the default and you won't have to
worry about this at all.

fs_passno should be 0 for btrfs. See man fsck.btrfs - it's a no-op and
not designed for unattended use during startup. XFS is the same.

> RAID1 data over 4 disks and RAID1C4 metadata, mounting with
> space_cache=v2. Any other mount switches or btrfs creation switches I
> should be aware of? Should I consider RAID5/6 instead? 6 TB should be
> sufficient, so it's not like I'd get anything out of RAID5, but RAID6 I
> suppose could provide a little more safety in the case of multiple
> drive failures at once.

single, dup, raid0, raid1 (all), and raid10 are safe and stable. raid56
has caveats, and you need to take precautions that kinda amount to hand
holding. If there is a crash or power failure on raid56, you need to do
a full filesystem scrub; that's a good idea with other profiles too,
but not strictly necessary. If you mount raid56 degraded, you seriously
need to consider not doing writes, or be very skeptical of depending on
those writes, because there's some evidence of degraded writes being
corrupted. You can check the archives for more information from Zygo
about raid56 pitfalls.

raid56 is stable on stable storage. But the point of any raid is to
withstand a non-stable situation like a device failure, and there's
still work needed on raid56 to get to that point without handholding.
If you need raid5, you might consider mdadm for the raid5 and then
format it with btrfs using defaults, which will get you DUP metadata
and single-copy data. You'll get cheap snapshots, faster scrubs, and
warnings for any corruption of metadata or data.

Also consider mkfs.btrfs --checksum=xxhash, but you definitely need
btrfs-progs 5.5 or newer and kernel 5.6 or newer. If those are too new
for your use case, skip it. crc32c is fine, but it is intended for
detection of casual incidental corruption and can't be used for dedup.
xxhash64 is about as fast, but has much better collision resistance.

--
Chris Murphy
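Pulling that advice together, a hedged sketch of the one-time
space_cache=v2 conversion and the resulting fstab entry (the UUID and
mount point are placeholders, not values from the thread):

    # one-time mount converts the free space cache and sets a persistent
    # feature flag, so the option doesn't need to live in fstab
    mount -o clear_cache,space_cache=v2 UUID=<uuid> /mnt/btrfs
    umount /mnt/btrfs

    # fstab entry with noatime, and fs_passno 0 since fsck.btrfs is a no-op
    echo "UUID=<uuid> /mnt/btrfs btrfs defaults,noatime 0 0" >> /etc/fstab
    mount /mnt/btrfs

    # optional, only with btrfs-progs 5.5+ / kernel 5.6+ as noted above:
    # mkfs.btrfs --csum xxhash -m raid1c4 -d raid1 <devices>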
* Re: Planning out new fs. Am I missing anything?
@ 2020-05-27  5:22 Andrei Borzenkov

From: Andrei Borzenkov
To: Chris Murphy, Justin Engwer
Cc: Btrfs BTRFS

On 27.05.2020 05:20, Chris Murphy wrote:
>
> single, dup, raid0, raid1 (all), and raid10 are safe and stable.

Until btrfs can reliably detect and automatically handle outdated
devices, I would not call any multi-device profiles "safe", at least
not unconditionally.
* Re: Planning out new fs. Am I missing anything?
@ 2020-05-27  6:25 Chris Murphy

From: Chris Murphy
To: Andrei Borzenkov
Cc: Justin Engwer, Btrfs BTRFS

On Tue, May 26, 2020 at 11:22 PM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>
> On 27.05.2020 05:20, Chris Murphy wrote:
> >
> > single, dup, raid0, raid1 (all), and raid10 are safe and stable.
>
> Until btrfs can reliably detect and automatically handle outdated
> devices, I would not call any multi-device profiles "safe", at least
> not unconditionally.

I agree.

--
Chris Murphy
* Re: Planning out new fs. Am I missing anything?
@ 2020-05-27 16:23 Goffredo Baroncelli

From: Goffredo Baroncelli
To: Chris Murphy, Andrei Borzenkov
Cc: Justin Engwer, Btrfs BTRFS

Hi All,

On 5/27/20 8:25 AM, Chris Murphy wrote:
> On Tue, May 26, 2020 at 11:22 PM Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>>
>> Until btrfs can reliably detect and automatically handle outdated
>> devices, I would not call any multi-device profiles "safe", at least
>> not unconditionally.
>
> I agree.

Checking the generation of each device should be sufficient to detect
"outdated" devices. Why is this check not performed? Maybe I am missing
something?

Of course this only solves the "detection" part; the handling of
outdated devices is another story.

BR
G.Baroncelli

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
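For what it's worth, the per-device generation is already visible from
userspace; a minimal sketch of a manual check, assuming an unmounted
filesystem and hypothetical device names:

    # print the superblock generation of each member device;
    # a lower number than its peers suggests an outdated member
    for dev in /dev/sda1 /dev/sdb1; do
        printf '%s: ' "$dev"
        btrfs inspect-internal dump-super "$dev" | grep '^generation'
    done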
* Re: Planning out new fs. Am I missing anything?
@ 2020-05-27 18:40 Chris Murphy

From: Chris Murphy
To: Goffredo Baroncelli
Cc: Andrei Borzenkov, Justin Engwer, Btrfs BTRFS

On Wed, May 27, 2020 at 10:23 AM Goffredo Baroncelli <kreijack@libero.it> wrote:
>
> Checking the generation of each device should be sufficient to detect
> "outdated" devices. Why is this check not performed? Maybe I am missing
> something?

But transid isn't unique enough except in isolation. Degraded volumes
are treated completely independently. So if I take a 2x raid1, mount
each device degraded on a separate computer, modify them, and then join
them back together, how can Btrfs resolve the differences? It's a mess.
Yes, that is obviously a kind of sabotage. But while not literal
sabotage, the effect is the same if you have alternating degraded
drives in successive boots.

So you just cannot use degraded in either fstab or rootflags. It's bad
advice no matter who gives it, and we need to be vigilant about
recommending against it. Maybe the man 5 btrfs page should expressly
say not to include degraded in fstab, or at least warn that there are
consequences.

--
Chris Murphy
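To illustrate the intended one-shot use of degraded (device names and
devid below are hypothetical): mount it manually once for recovery,
replace the failed disk, and never persist the option:

    # one-time manual recovery mount; do NOT put 'degraded' in fstab
    mount -o degraded /dev/sdb1 /mnt/btrfs

    # replace the missing device (devid 1 here) with a fresh disk
    btrfs replace start 1 /dev/sdc1 /mnt/btrfs
    btrfs replace status /mnt/btrfs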
* Re: Planning out new fs. Am I missing anything?
@ 2020-05-27 19:51 Goffredo Baroncelli

From: Goffredo Baroncelli
To: Chris Murphy
Cc: Andrei Borzenkov, Justin Engwer, Btrfs BTRFS

On 5/27/20 8:40 PM, Chris Murphy wrote:
> But transid isn't unique enough except in isolation. Degraded volumes
> are treated completely independently. So if I take a 2x raid1, mount
> each device degraded on a separate computer, modify them, and then join
> them back together, how can Btrfs resolve the differences? It's a mess.
> Yes, that is obviously a kind of sabotage. But while not literal
> sabotage, the effect is the same if you have alternating degraded
> drives in successive boots.

Even though we can't close all the holes, we can reduce the likelihood
of this issue.

Anyway, mounting a filesystem with mismatched generation numbers is
wrong. And the fact that we can't prevent every kind of mismatch
doesn't mean we shouldn't do anything.

I am thinking about adding an "opt in" check: if a mismatch happens,
btrfs should raise a warning. If a flag is passed at mount (like
mount -o prevent-generation-mismatch) and the generations don't match,
the mount fails.

Then, on the basis of the feedback returned, in the future we can
change the flag from "opt in" to "opt out" (mount -o
no-prevent-generation-mismatch).

> So you just cannot use degraded in either fstab or rootflags. It's bad
> advice no matter who gives it, and we need to be vigilant about
> recommending against it. Maybe the man 5 btrfs page should expressly
> say not to include degraded in fstab, or at least warn that there are
> consequences.

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
* Re: Planning out new fs. Am I missing anything?
@ 2020-05-28  2:14 Chris Murphy

From: Chris Murphy
To: Goffredo Baroncelli
Cc: Andrei Borzenkov, Justin Engwer, Btrfs BTRFS

On Wed, May 27, 2020 at 1:51 PM Goffredo Baroncelli <kreijack@inwind.it> wrote:
>
> Even though we can't close all the holes, we can reduce the likelihood
> of this issue.
>
> Anyway, mounting a filesystem with mismatched generation numbers is
> wrong. And the fact that we can't prevent every kind of mismatch
> doesn't mean we shouldn't do anything.

Yep. You're right.

> I am thinking about adding an "opt in" check: if a mismatch happens,
> btrfs should raise a warning. If a flag is passed at mount (like
> mount -o prevent-generation-mismatch) and the generations don't match,
> the mount fails.

I wonder about using a compat_flag to mark a device as having been
mounted degraded. The next time a mount happens, all devices with the
degraded compat_flag set should have identical transids, or we know
something is screwy. If there is a device that does not have the
degraded flag and has an older transid, there could be some kind of
sanity check to make sure the last 1-3 transactions are the same (?),
and if so: (a) allow a non-degraded mount, (b) warn, or (c) "replay"
the transactions between stale and current so that all devices are
caught up, similar to the partial rebuild mdadm does using the
write-intent bitmap as the hint for what needs to be caught up.

--
Chris Murphy