* question about Data=single on multi-device fs
From: Marc Lehmann @ 2020-04-26 10:04 UTC (permalink / raw)
To: linux-btrfs
Hi!
I have a question about a possible behaviour change. I have a multi-device
btrfs filesystem with metadata profile raid1 and data profile single.
With my current kernel (5.4.28), it seems btrfs is balancing writes across the
devices, which is nice for performance (it's kind of a best-effort raid0),
but not so nice for data recovery (files get spread out over all kinds of
disks, which increases data loss on device failure).
I remember (maybe wrongly!) that this behaviour was different with older
kernels (4.9, possibly 4.19), in that btrfs seemed to mostly write to
a single disk until it was more or less full before switching to another
disk, which is worse for performance, but much better for data recovery.
The reason I chose data=single was specifically to help in case of device
loss at the cost of performance.
So my question is: did the behaviour change (possibly I misinterpreted
what I saw with older kernels), and is there a way to get the behaviour
I thought it had before, where it mostly stayed with one disk without
balancing writes?
(I can simulate this for my case using either btrfs resize or incremental
btrfs device adds.)
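For example, something along these lines (untested sketch; device ids,
paths and sizes are made up):

    # shrink the other devices so new chunks can only go to the first disk
    btrfs filesystem resize 2:10G /mnt/pool
    btrfs filesystem resize 3:10G /mnt/pool
    # once the first disk is nearly full, grow the next one again
    btrfs filesystem resize 2:max /mnt/pool
    # or, alternatively, only add further devices when the existing ones fill up
    btrfs device add /dev/sdd /mnt/pool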
Thanks a lot for any insights!
--
The choice of a Deliantra, the free code+content MORPG
-----==- _GNU_ http://www.deliantra.net
----==-- _ generation
---==---(_)__ __ ____ __ Marc Lehmann
--==---/ / _ \/ // /\ \/ / schmorp@schmorp.de
-=====/_/_//_/\_,_/ /_/\_\
* Re: question about Data=single on multi-device fs
From: Hugo Mills @ 2020-04-26 10:25 UTC (permalink / raw)
To: Marc Lehmann; +Cc: linux-btrfs
On Sun, Apr 26, 2020 at 12:04:05PM +0200, Marc Lehmann wrote:
> Hi!
>
> I have a question about a possible behaviour change. I have a multi-device
> btrfs filesystem with metadata profile raid1 and data profile single.
>
> With my current kernel (5.4.28), it seems btrfs is balancing writes across the
> devices, which is nice for performance (it's kind of a best-effort raid0),
> but not so nice for data recovery (files get spread out over all kinds of
> disks, which increases data loss on device failure).
>
> I remember (maybe wrongly!) that this behaviour was different with older
> kernels (4.9, possibly 4.19), in that btrfs seemed to mostly write to
> a single disk until it was more or less full before switching to another
> disk, which is worse for performance, but much better for data recovery.
>
> The reason I chose data=single was specifically to help in case of device
> loss at the cost of performance.
Make backups. That's the only way to be sure about this sort of thing.
> So my question is: did the behaviour change (possibly I misinterpreted
> what I saw with older kernels), and is there a way to get the behaviour
> I thought it had before, where it mostly stayed with one disk without
> balancing writes?
As far as I'm aware, the behaviour hasn't changed.
With single data, *chunk allocation* will go to the device with the
largest amount of unallocated space. If your data is WORM
(write-once-read-many), then you'll get a gigabyte of contiguous space
on one device, and then it'll switch to a different one.
If you are also deleting or modifying files, then the FS may be
placing newly-written extents within free space in chunks that have
already been allocated. Those chunks could be anywhere within the FS,
and so could be on any device.
There's no way to control this behaviour (either the chunk
allocation or the extent allocation).
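You can watch where chunks end up with something like this (the mount
point is a placeholder):

    # per-device view of allocated vs. unallocated space
    btrfs device usage /mnt
    # overall view, as a table broken down by device and chunk type
    btrfs filesystem usage -T /mnt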
Hugo.
> (I can simulate this for my case using either btrfs resize or incremental
> btrfs device adds.)
>
> Thanks a lot for any insights!
>
--
Hugo Mills | Someone's been throwing dead sheep down my Fun Well
hugo@... carfax.org.uk |
http://carfax.org.uk/ |
PGP: E2AB1DE4 | Nick Gibbins
* Re: question about Data=single on multi-device fs
From: Marc Lehmann @ 2020-04-27 11:29 UTC (permalink / raw)
To: Hugo Mills, linux-btrfs
On Sun, Apr 26, 2020 at 11:25:47AM +0100, Hugo Mills <hugo@carfax.org.uk> wrote:
> > The reason I chose data=single was specifically to help in case of device
> > loss at the cost of performance.
>
> Make backups. That's the only way to be sure about this sort of thing.
I think you are unthinkingly repeating a wrong (and slightly dangerous)
claim - backups cannot actually do that sort of thing: a raid will protect
against (some amount of) disk failures with no data loss, but backups
cannot. Backups can protect against complete data loss, but they cannot
completely protect against data loss.
> With single data, *chunk allocation* will go to the device with the
> largest amount of unallocated space. If your data is WORM
That is definitely not the case with 5.4 - I added two disks to an
existing filesystem and copied 8TB onto it with btrfs receive, resulting
in about 3800G used on both new disks, with Data=single. I repeated it by
creating a 5-disk fs and copying 1TB of data, and got a pretty even
distribution.
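Roughly what I did for the second test (paths are made up; untested
transcript from memory):

    # 5-disk test fs with the same profiles as my real one
    mkfs.btrfs -m raid1 -d single /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
    mount /dev/sdb /mnt/test
    # ... copy ~1TB of data, then look at how it was spread:
    btrfs device usage /mnt/test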
--
The choice of a Deliantra, the free code+content MORPG
-----==- _GNU_ http://www.deliantra.net
----==-- _ generation
---==---(_)__ __ ____ __ Marc Lehmann
--==---/ / _ \/ // /\ \/ / schmorp@schmorp.de
-=====/_/_//_/\_,_/ /_/\_\
* Re: question about Data=single on multi-device fs
From: Roman Mamedov @ 2020-04-27 11:44 UTC (permalink / raw)
To: Marc Lehmann; +Cc: Hugo Mills, linux-btrfs
On Mon, 27 Apr 2020 13:29:47 +0200
Marc Lehmann <schmorp@schmorp.de> wrote:
> On Sun, Apr 26, 2020 at 11:25:47AM +0100, Hugo Mills <hugo@carfax.org.uk> wrote:
> > > The reason I chose data=single was specifically to help in case of device
> > > loss at the cost of performance.
> >
> > Make backups. That's the only way to be sure about this sort of thing.
>
> I think you are unthinkingly repeating a wrong (and slightly dangerous)
> claim - backups cannot actually do that sort of thing: a raid will protect
> against (some amount of) disk failures with no data loss, but backups
> cannot. Backups can protect against complete data loss, but they cannot
> completely protect against data loss.
With backups it is at least clear to everyone that only the data that has
been backed up will be recoverable from the backup.
On the other hand you follow a much more dangerous theory, that a low-level
JBOD-style merging of disks can be of any significant "help" in case of a
device failure. That's often heard applied to LVM LVs spanned across multiple
devices, or MD Linear, or in this case Btrfs "single". In all of those cases I
have to wonder how keeping a few chunks of what was once a filesystem, or
in your case *random pieces of random files* that happen to survive, will
be of any help or alleviate the need to restore from backups.
If you really want JBOD-style storage merged into a single pool, with device
failures having an impact limited only to that device, you are better off
looking into FUSE file-level overlay filesystems such as MergerFS and
MHDDFS. At least with those you are guaranteed to have whole files intact
on the still-running devices. Exactly what Btrfs doesn't guarantee you now
(seemingly even more so), and, most importantly, never did, not even on
any prior kernel version.
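Roughly like this (paths and options are only an example, not a tested
setup):

    # pool two already-mounted filesystems into one mount point;
    # category.create=mfs places each new file on the branch with the
    # most free space
    mergerfs -o allow_other,category.create=mfs /mnt/disk1:/mnt/disk2 /mnt/pool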
--
With respect,
Roman
* Re: question about Data=single on multi-device fs
From: Marc Lehmann @ 2020-04-27 12:32 UTC (permalink / raw)
To: Roman Mamedov; +Cc: Hugo Mills, linux-btrfs
On Mon, Apr 27, 2020 at 04:44:36PM +0500, Roman Mamedov <rm@romanrm.net> wrote:
> With backups it is at least clear to everyone that only the data that has
> been backed up will be recoverable from the backup.
>
> On the other hand you follow a much more dangerous theory, that a low-level
> JBOD-style merging of disks can be of any significant "help" in case of a
> device failure.
I'm not sure why you are trying to derail this discussion - in any case, I
am not sure what you mean by dangerous, or even theory: it's a trivial
fact that losing half of every file is obviously a bigger data loss than
losing half your files, for practically all scenarios (but admittedly
not all).
> devices, or MD Linear, or in this case Btrfs "single". In all of those cases I
> have to wonder how keeping a few chunks of what was once a filesystem, or
> in your case *random pieces of random files* that happen to survive, will
> be of any help or alleviate the need to restore from backups.
Well, to give you a practical example, I once had to rescue an extremely
damaged reiserfs filesystem, given per-chunk md4 checksums and whole-file
md5 checksums of all files. This allowed me to recover practically all
files, except a few big ones that were probably too fragmented.
Here is another practical example which shows your assumptions are simply
wrong: restoring 100GB from backup takes a very long time hereabouts. If
btrfs behaves as it apparently traditionally did with Data=single, you
can stay online even after losing one or more disks (just with fewer
files), repair the metadata, delete the broken files, restore only those
much more quickly, and be online practically all the time.
So with traditional Data=single behaviour, you can potentially save a lot
of time - for example, in a multi-device fs with 10x10TB, you only have to
restore the roughly 10TB that sat on the failed disk rather than all
100TB, which can make a 10x difference in downtime. That is significant,
especially if your storage allows a certain amount of downtime (not being
raided in the first place).
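Back-of-the-envelope, with a made-up restore throughput:

    # full restore of 100TB at ~200MB/s : ~500000s, i.e. almost 6 days
    # restoring one disk's ~10TB instead: ~50000s, i.e. about 14 hours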
> If you really want JBOD-style storage merged into a single pool, with device
> failures having an impact limited only to that device, you are better off
> looking into FUSE file-level overlay filesystems such as MergerFS and MHDDFS.
Funnily enough, I actually did look into mergerfs; unfortunately, it is
extremely buggy (as in, crashes, memory leaks and simply wrong behaviour).
Btrfs is absolutely the better alternative at the moment :)
> At least with those you are guaranteed to have whole files intact on
> the still-running devices. Exactly what Btrfs doesn't guarantee you now
> (seemingly even more so), and, most importantly, never did, not even on
> any prior kernel version.
I haven't asked/requested/expected any guarantees, but since making wrong
assumptions about backups is so common here, let me give another use
case: power saving - you can save power by limiting activity to fewer
disks (and also reduce latency from disk spin-up), at the cost of the
performance you would get from striping the data.
--
The choice of a Deliantra, the free code+content MORPG
-----==- _GNU_ http://www.deliantra.net
----==-- _ generation
---==---(_)__ __ ____ __ Marc Lehmann
--==---/ / _ \/ // /\ \/ / schmorp@schmorp.de
-=====/_/_//_/\_,_/ /_/\_\