From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: System completely unresponsive after `btrfs balance start -dconvert=raid0 /` and `btrfs fi show /`
Date: Wed, 14 Oct 2015 05:08:17 +0000 (UTC)
Message-ID: <pan$39697$bacd3e0c$3e39ab12$ed8c21a7@cox.net>
In-Reply-To: C1BFF62A-9C2E-4A5D-86F9-7F01DDDF8BF6@paolino.me

Carmine Paolino posted on Tue, 13 Oct 2015 23:21:49 +0200 as excerpted:

> I have an home server with 3 hard drives that I added to the same btrfs
> filesystem. Several hours ago I run `btrfs balance start -dconvert=raid0
> /` and as soon as I run `btrfs fi show /` I lost my ssh connection to
> the machine. The machine is still on, but it doesn’t even respond to
> ping[. ...]
>
> (I have a 250gb internal hard drive, a 120gb usb 2.0 one and a 2TB usb
> 2.0 one so the transfer speeds are pretty low)

I won't attempt to answer the primary question[1] directly, but can point
out that in many cases, USB-connected devices simply don't have a stable
enough connection to work reliably in a multi-device btrfs. There are
several possible causes of failure, including flaky connections (sometimes
assisted by cats or kids), unstable USB host port drivers, and unstable
USB/ATA translators. A number of folks have reported problems with
multi-device filesystems on USB-connected devices that simply disappear
when they connect the exact same devices directly to a proper SATA port.
The problem seems to be /dramatically/ worse with USB-connected devices
than it is with, for instance, PCIE-based SATA expansion cards.

Single-device btrfs on a USB-attached device seems to work rather better,
because at least in that case, if the connection is flaky, the entire
filesystem appears and disappears at once, and btrfs' COW, atomic-commit
and data-integrity features kick in to help deal with the connection's
instability.

Arguably, a two-device raid1 (for both data and metadata, with metadata
including system) should work reasonably well too, because in that case
all data appears on both devices, as long as a scrub is done after
reconnection whenever there's trouble with one of the pair. Single and
raid0 modes, however, are likely to have severe issues in that sort of
environment, because even temporary disconnection of a single device
means loss of access to some data/metadata on the filesystem. Raid10,
raid1 on three or more devices, and raid5/6 are more complex situations.
They should survive loss of at least one device, but keeping the
filesystem healthy in the presence of unstable connections is... complex
enough that I'd hate to be the one having to deal with it, which means I
can't recommend it to others, either.
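
As a rough illustration of the reasoning above, the following Python
sketch tabulates how many simultaneous device dropouts each profile is
nominally designed to tolerate, and checks whether a filesystem stays
fully readable when a device vanishes. The table values are the commonly
stated design tolerances (an assumption for illustration, not taken from
btrfs code), and the names GUARANTEED_LOSS_TOLERANCE and survives_dropout
are hypothetical. It says nothing about how hard it is to bring a
filesystem back in sync after a flaky reconnect, which is the harder
problem described above.

```python
# Illustrative only: nominal number of missing devices each profile is
# designed to tolerate. Assumed values, not read from any btrfs tool.
GUARANTEED_LOSS_TOLERANCE = {
    "single": 0,  # one copy; any missing device hides some chunks
    "raid0": 0,   # striped, no redundancy
    "raid1": 1,   # two copies, each on a different device
    "raid10": 1,  # striped mirrors; one loss is always survivable
    "raid5": 1,   # single parity
    "raid6": 2,   # double parity
}


def survives_dropout(data_profile, metadata_profile, devices_missing=1):
    """Return True if data AND metadata both remain fully readable.

    The filesystem is only as robust as its weakest profile, so the
    smaller of the two tolerances decides the outcome.
    """
    weakest = min(
        GUARANTEED_LOSS_TOLERANCE[data_profile],
        GUARANTEED_LOSS_TOLERANCE[metadata_profile],
    )
    return devices_missing <= weakest


if __name__ == "__main__":
    # raid0 data (as in the balance above) versus raid1 and single.
    for data, meta in [("raid0", "raid1"), ("raid1", "raid1"),
                       ("single", "single")]:
        print(data, meta, survives_dropout(data, meta))
```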

So I'd recommend connecting all devices internally if possible, or, if
that isn't possible, setting up the USB-connected devices with separate
filesystems.
---
[1] Sysadmin's rule of backups: If the data isn't backed up, then by
definition it is of less value than the resource and hassle cost of a
backup. No exceptions -- post-loss claims to the contrary are belied by
the actions, which spoke louder than words and defined the cost of the
backup as higher than the value of the data that would have been backed
up. The worst case is then loss of data that was by definition worth
less than the cost of the backup, while the more valuable resource and
hassle cost of the backup was avoided, so the comparatively low-value
data loss is no big deal.

So in a case like this, I'd simply power down and take my chances of
filesystem loss, strictly limiting the time and resources I'd devote to
any further attempt at recovery. The data is by definition either backed
up, or of such low value that a backup was considered too expensive to
do. Either way, there's a very real possibility of spending more time on
a recovery attempt that's iffy at best than the data on the filesystem is
actually worth.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman