linux-btrfs.vger.kernel.org archive mirror
From: Marc Lehmann <schmorp@schmorp.de>
To: Roman Mamedov <rm@romanrm.net>
Cc: Hugo Mills <hugo@carfax.org.uk>, linux-btrfs@vger.kernel.org
Subject: Re: questoin about Data=single on multi-device fs
Date: Mon, 27 Apr 2020 14:32:35 +0200
Message-ID: <20200427123235.GA8243@schmorp.de>
In-Reply-To: <20200427164436.05c5c257@natsu>

On Mon, Apr 27, 2020 at 04:44:36PM +0500, Roman Mamedov <rm@romanrm.net> wrote:
> With backups it is at least clear enough to anyone that only the data that has
> been backed up will be recoverable from the backup;
> 
> On the other hand you follow a much more dangerous theory, that a low-level
> JBOD-style merging of disks can be of any significant "help" in case of a
> device failure.

I'm not sure why you are trying to derail this discussion - in any case, I
am not sure what you mean by "dangerous", or even "theory": it's a trivial
fact that losing half of every file is obviously a bigger data loss than
losing half your files, for practically all scenarios (but admittedly
not all).
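
As a rough illustration of that comparison (not part of the original
message), the sketch below counts how many files are damaged when one
device fails. It assumes a deliberately simplified model: under "single"
each file lives entirely on one randomly chosen device, and under striping
every file has a piece on every device. Real btrfs allocates data in
chunks, so large files can span devices even with Data=single; the
function name and parameters are hypothetical.

```python
import random


def damaged_fraction(num_files, num_devices, striped, failed_device=0, seed=0):
    """Fraction of files that lose at least some data when one device fails.

    Simplified model (an assumption, not btrfs's real allocator):
    - striped: every file has a piece on every device
    - not striped ("single"): each file sits wholly on one random device
    """
    rng = random.Random(seed)
    damaged = 0
    for _ in range(num_files):
        if striped:
            devices = set(range(num_devices))
        else:
            devices = {rng.randrange(num_devices)}
        if failed_device in devices:
            damaged += 1
    return damaged / num_files


if __name__ == "__main__":
    for striped in (False, True):
        label = "striped" if striped else "single"
        frac = damaged_fraction(10_000, num_devices=2, striped=striped)
        print(f"{label}: {frac:.1%} of files damaged")
```

With two devices, this model damages every file in the striped case and
roughly half of the files in the single case, and the surviving files are
intact as whole files.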

> devices, or MD Linear, or in this case Btrfs "single". In all of those cases I
> have to wonder how getting to keep a few chunks of what some time ago was a
> filesystem, or in your case, *random pieces of random files* being luckily
> intact, will be of any help and alleviate the need to restore from backups.

Well, to give you a practical example, I once had to rescue an extremely
damaged reiserfs filesystem, given chunk-md4 checksums of all files, and
md5 checksums of all files. This allowed me to recover practically all
files, except a few big ones that were probably too fragmented.

Here is another practical example which shows your assumptions are simply
wrong: Restoring 100GB from backup takes a very long time hereabouts. If
btrfs behaves as it apparently traditionally did with Data=single, you
can stay online even after losing one or more disks (with fewer
files), repair the metadata, delete the broken files, restore those much
more quickly, and be online practically all the time.

So with traditional Data=single behaviour, you can potentially save a lot
of time - for example, in a multi-device fs with 10x10TB, this can make a
10x difference in downtime, which is significant, especially if your
storage allows a certain amount of downtime (not being raided in the first
place).
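
The 10x figure can be checked with a back-of-the-envelope calculation (not
part of the original message). The sketch assumes restore time scales
linearly with the amount of data and that only the contents of one failed
10TB disk need restoring; the 100 MB/s restore throughput is a made-up
illustrative value, and the ratio does not depend on it.

```python
def restore_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to restore data_tb terabytes at throughput_mb_s MB/s."""
    seconds = data_tb * 1_000_000 / throughput_mb_s  # 1 TB = 1,000,000 MB
    return seconds / 3600


DISKS = 10               # from the example: 10 devices
DISK_TB = 10.0           # from the example: 10TB each
THROUGHPUT_MB_S = 100.0  # assumption: illustrative restore speed

# Restore everything (whole filesystem lost) vs. one disk's worth of files.
full = restore_hours(DISKS * DISK_TB, THROUGHPUT_MB_S)
partial = restore_hours(DISK_TB, THROUGHPUT_MB_S)
ratio = full / partial

print(f"full restore:    {full:.1f} h")
print(f"one-disk restore: {partial:.1f} h")
print(f"ratio:           {ratio:.1f}x")
```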

> If you really want a JBOD-style storage merged into a single pool, with device
> failures having impact limited only to that device, better look into FUSE
> file-level overlay filesystems, such as MergerFS and MHDDFS.

Funnily enough, I actually did look into mergerfs. Unfortunately, it is
extremely buggy (as in, crashes, memory leaks and simply wrong behaviour).
Btrfs is absolutely the better alternative at the moment :)

> At least with those you are guaranteed to have whole files intact on
> still running devices.  Exactly what Btrfs doesn't guarantee you now
> (seemingly even more so), but most importantly never did, not even on
> any prior kernel version.

I haven't asked/requested/expected any guarantees, but since making wrong
assumptions about backups is so common here, let's give it another use
case, power saving - you can save power by limiting activity to fewer
disks (and also reduce latency due to disk spin-up), at the cost of
performance by not striping data.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\
