From: "Ellis H. Wilson III" <ellisw@panasas.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Status of FST and mount times
Date: Thu, 15 Feb 2018 11:45:42 -0500
Message-ID: <c9f84d14-7e9c-c5c6-bf40-dbd17f747f9e@panasas.com>
In-Reply-To: <CAJCQCtQn-rOA7MPP-613mEKvnfJW+0US9ZJ8eCNxkOYZbtQFsw@mail.gmail.com>

On 02/15/2018 01:14 AM, Chris Murphy wrote:
> On Wed, Feb 14, 2018 at 9:00 AM, Ellis H. Wilson III <ellisw@panasas.com> wrote:
> 
>> Frame-of-reference here: RAID0.  Around 70TB raw capacity.  No compression.
>> No quotas enabled.  Many (potentially tens to hundreds) of subvolumes, each
>> with tens of snapshots.
> 
> Even if losing such a file system is non-catastrophic, it's big
> enough that setting it up again would be tedious and slow. I think
> it's worth considering one of two alternatives:
> 
> a. metadata raid1, data single: you lose the striping performance of
> raid0, and if it's not randomly filled you'll end up with some disk
> contention for reads and writes *but* if you lose a drive you will not
> lose the file system. Any missing files on the dead drive will result
> in EIO (and I think also a kernel message with path to file), and so
> you could just run a script to delete those files and replace them
> with backup copies.

This option is on our roadmap for future releases of our parallel file
system, but we do not presently have the time to implement the plumbing
that would let the manager of that btrfs filesystem report to the pfs
manager which files have gone missing.  We will absolutely revisit it
in early 2019, as replacing just one disk instead of N is highly
attractive.  Waiting for EIO as you suggest in (b) is a non-starter for
us: we work at scales large enough that we don't want somebody to
stumble over a partially degraded file before we learn about it.
Proactive reporting is what's needed, and we'll implement that Real
Soon Now.  A rough sketch of what such a scan-and-report pass could
look like follows below.
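
The following is only an illustration: the mount point, the backup
layout, and the restore step are all made up for the example, and
re-reading tens of TB this way is of course far too slow to be the
real mechanism, but it is roughly the detect/report/replace loop that
(a) describes:

  # Illustration only: MOUNT and BACKUP are hypothetical paths, and a
  # full re-read of the tree is too slow to be the real mechanism.
  import errno
  import os
  import shutil
  import sys

  MOUNT = "/mnt/btrfs"    # data "single", metadata "raid1"
  BACKUP = "/mnt/backup"  # assumed mirror of the same tree

  def hits_eio(path):
      """True if reading the file fails with EIO, i.e. some of its
      extents lived on the missing drive."""
      try:
          with open(path, "rb") as f:
              while f.read(1 << 20):
                  pass
          return False
      except OSError as e:
          return e.errno == errno.EIO

  for root, _dirs, files in os.walk(MOUNT):
      for name in files:
          path = os.path.join(root, name)
          if not hits_eio(path):
              continue
          rel = os.path.relpath(path, MOUNT)
          print("EIO:", rel, file=sys.stderr)   # proactive report
          src = os.path.join(BACKUP, rel)
          if os.path.exists(src):
              os.unlink(path)           # drop the damaged copy
              shutil.copy2(src, path)   # restore from backup

In practice the report would go to the pfs manager rather than stderr,
and the replacement copy would come from our own replication, but the
split into detect, report, and replace is the same.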

> b. Variation on the above would be to put it behind glusterfs
> replicated volume. Gluster getting EIO from a brick should cause it to
> get a copy from another brick and then fix up the bad one
> automatically. Or in your raid0 case, the whole volume is lost, and
> glusterfs helps do the full rebuild over 3-7 days while you're still
> able to access those 70TB of data normally. Of course, this option
> requires having two 70TB storage bricks available.

See my email address, which may help explain why GlusterFS is a
non-starter for us.  Nevertheless, the idea is a sound one, and we will
have something similar going on, but at higher RAID levels and
typically across a dozen or more such bricks.
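
Just to make the read-then-heal mechanism in (b) concrete, here is a
toy sketch of it.  It is emphatically not how gluster (or our stack)
is driven: the brick paths are invented and the "heal" is a plain file
copy, but it shows the behaviour the replicated-volume option relies
on:

  # Conceptual toy: two hypothetical replica roots, heal-on-read.
  import errno
  import os
  import shutil

  BRICKS = ["/bricks/a", "/bricks/b"]   # invented replica roots

  def read_with_heal(rel_path):
      """Read from the first healthy replica; copy its contents over
      any replica that failed with EIO along the way."""
      data, good, bad = None, None, []
      for brick in BRICKS:
          path = os.path.join(brick, rel_path)
          try:
              with open(path, "rb") as f:
                  data = f.read()
              good = path
              break
          except OSError as e:
              if e.errno != errno.EIO:
                  raise
              bad.append(path)
      if data is None:
          raise OSError(errno.EIO, "all replicas unreadable", rel_path)
      for path in bad:                  # "self-heal" the damaged copy
          shutil.copyfile(good, path)
      return data

We'll be doing the analogous repair across many more bricks at higher
RAID levels, but the repair idea is the same.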

Best,

ellis

Thread overview: 32+ messages
2018-02-14 16:00 Status of FST and mount times Ellis H. Wilson III
2018-02-14 17:08 ` Nikolay Borisov
2018-02-14 17:21   ` Ellis H. Wilson III
2018-02-15  1:42   ` Qu Wenruo
2018-02-15  2:15     ` Duncan
2018-02-15  3:49       ` Qu Wenruo
2018-02-15 11:12     ` Hans van Kranenburg
2018-02-15 16:30       ` Ellis H. Wilson III
2018-02-16  1:55         ` Qu Wenruo
2018-02-16 14:12           ` Ellis H. Wilson III
2018-02-16 14:20             ` Hans van Kranenburg
2018-02-16 14:42               ` Ellis H. Wilson III
2018-02-16 14:55                 ` Ellis H. Wilson III
2018-02-17  0:59             ` Qu Wenruo
2018-02-20 14:59               ` Ellis H. Wilson III
2018-02-20 15:41                 ` Austin S. Hemmelgarn
2018-02-21  1:49                   ` Qu Wenruo
2018-02-21 14:49                     ` Ellis H. Wilson III
2018-02-21 15:03                       ` Hans van Kranenburg
2018-02-21 15:19                         ` Ellis H. Wilson III
2018-02-21 15:56                           ` Hans van Kranenburg
2018-02-22 12:41                             ` Austin S. Hemmelgarn
2018-02-21 21:27                       ` E V
2018-02-22  0:53                       ` Qu Wenruo
2018-02-15  5:54   ` Chris Murphy
2018-02-14 23:24 ` Duncan
2018-02-15 15:42   ` Ellis H. Wilson III
2018-02-15 16:51     ` Austin S. Hemmelgarn
2018-02-15 16:58       ` Ellis H. Wilson III
2018-02-15 17:57         ` Austin S. Hemmelgarn
2018-02-15  6:14 ` Chris Murphy
2018-02-15 16:45   ` Ellis H. Wilson III [this message]
