From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs mounts RO, kernel oops on RW
Date: Sun, 28 May 2017 05:56:34 +0000 (UTC)
Message-ID: <pan$3ce02$7e425613$62194897$37882c38@cox.net>
In-Reply-To: CAHPjZW7Af2KzRr_HTaCNGqRAM0SsV8+gJWSRis6eCjGeXNBK7Q@mail.gmail.com
Bill Williamson posted on Sun, 28 May 2017 12:46:00 +1000 as excerpted:
> Version details:
> btrfs-progs v4.9.1
> Linux bigserver 4.10.0-22-generic #24-Ubuntu SMP Mon May 22
> 17:43:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> Array Details:
> root@bigserver:~# btrfs fi df /mnt/storage
> Data, RAID1: total=12.48TiB, used=12.25TiB
> System, RAID1: total=32.00MiB, used=2.11MiB
> Metadata, RAID1: total=14.00GiB, used=13.31GiB
> GlobalReserve, single: total=512.00MiB, used=0.00
>
>
> root@bigserver:~# btrfs fi show /mnt/storage Label: none uuid:
> c792d033-b0a6-44a0-bd37-9825de7eeb8b
> Total devices 10 FS bytes used 12.27TiB
> devid 1 size 2.73TiB used 2.71TiB path /dev/sde
> devid 2 size 3.64TiB used 3.62TiB path /dev/sdh
> devid 5 size 1.82TiB used 1.80TiB path /dev/sdg
> devid 6 size 1.82TiB used 1.80TiB path /dev/sdc
> devid 8 size 1.36TiB used 1.35TiB path /dev/sdb
> devid 9 size 3.64TiB used 3.62TiB path /dev/sdf
> devid 12 size 1.82TiB used 1.80TiB path /dev/sdd
> devid 13 size 4.55TiB used 4.53TiB path /dev/sdk
> devid 14 size 3.64TiB used 3.62TiB path /dev/sdi
> devid 15 size 3.64TiB used 134.00GiB path /dev/sdj
Only one device has any significant unallocated space. That can be an
issue for raid1, which stores each chunk on two different devices, so
free space on a single device is of little use by itself. But you were
working on that and it doesn't seem to be your current issue...
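To put rough numbers on that, here is a minimal sketch that subtracts
"used" from "size" for each device line of the btrfs fi show output
quoted above. The unallocated() helper is my own naming, not a btrfs
tool, and the figures are rounded in the original output, so treat the
results as approximate.

```shell
#!/bin/sh
# Per-device unallocated space (size minus used), computed from the
# device lines of "btrfs fi show". Helper name is hypothetical.
unallocated() {
    awk '
    # Convert a value such as 2.73TiB or 134.00GiB to GiB.
    function gib(v) {
        if (v ~ /TiB$/) return (v + 0) * 1024
        if (v ~ /GiB$/) return (v + 0)
        return 0
    }
    $1 == "devid" {
        printf "%-9s %5.0f GiB unallocated\n", $8, gib($4) - gib($6)
    }'
}

# Figures copied from the quoted output (rounded there, approximate here).
unallocated <<'EOF'
devid 1 size 2.73TiB used 2.71TiB path /dev/sde
devid 2 size 3.64TiB used 3.62TiB path /dev/sdh
devid 5 size 1.82TiB used 1.80TiB path /dev/sdg
devid 6 size 1.82TiB used 1.80TiB path /dev/sdc
devid 8 size 1.36TiB used 1.35TiB path /dev/sdb
devid 9 size 3.64TiB used 3.62TiB path /dev/sdf
devid 12 size 1.82TiB used 1.80TiB path /dev/sdd
devid 13 size 4.55TiB used 4.53TiB path /dev/sdk
devid 14 size 3.64TiB used 3.62TiB path /dev/sdi
devid 15 size 3.64TiB used 134.00GiB path /dev/sdj
EOF
```

By this estimate /dev/sdj has roughly 3593 GiB unallocated and every
other device only about 10 to 20 GiB. On a mounted filesystem,
btrfs filesystem usage should report the unallocated figures directly.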
> Issue:
> I can mount my btrfs readonly (recovery option not necessary).
> Attempting to mount it readwrite results in a kernel null pointer
> exception.
>
> Background:
> I have a home server with a bunch of disks running btrfs raid 1. When
> it starts to fill up I add another disk and re-balance.
> I added a new 4TB disk and began the re-balance. After a while I needed
> to shut down the server, and did so gracefully with a shutdown -h now.
> Upon rebooting the array wouldn't mount, so I put "noauto"
> into fstab to allow a graceful bootup and diagnose from there.
So far, so good.
> At first I got the failed to read log tree error, so I ran
> btrfs-zero-log. It walked back 3-4 transactions but now seems okay.
>
> After that fix:
> - btrfs check shows no errors.
> - mounting the filesystem RO works great, I can read files.
> - mounting the filesystem RW results in a huge kernel exception and a
> hang, centering around can_overcommit and
> btrfs_async_reclaim_metadata_space
Try using the skip_balance mount option. See the btrfs(5) manpage (you
must specify the 5, or you'll get the section 8 general btrfs command
manpage).
If that works, you can resume or cancel the balance once the filesystem
is mounted writable.
But the filesystem is clearly not healthy, and that won't make it
healthy, just eliminate the current heart-attack trigger. I'd observe
the sysadmin's rules of backups below before trying anything else,
including the skip_balance mount option.
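Once the backups are taken care of, the skip_balance attempt might look
something like the sketch below. The device and mount point are taken
from the quoted output; any member device of the filesystem should do.
The run() wrapper is my own addition, not part of btrfs-progs: it only
prints the commands unless DRY_RUN=0 is set, so nothing touches the
filesystem by accident.

```shell
#!/bin/sh
# Sketch only. DEV and MNT come from the quoted output; adjust as needed.
DEV=/dev/sde
MNT=/mnt/storage

# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# The option itself is documented in: man 5 btrfs
run mount -o skip_balance "$DEV" "$MNT"  # writable mount, balance not resumed
run btrfs balance status "$MNT"          # should show the balance as paused
run btrfs balance cancel "$MNT"          # abandon the interrupted balance, or:
# run btrfs balance resume "$MNT"        # let it continue instead
```

Review the printed commands first, then rerun with DRY_RUN=0 as root.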
> My "you're screwed, it's dead" backup plan is to build another server
> and buy 2x8TB drives, and then copy the data I care about over, but I'd
> much rather save myself the trouble and $$$ and repair the array if
> possible.
The sysadmin's first rule of backups: The value of your data is defined
by the number and currency of your backups. With no backups, you are
defining your data as being of only trivial value, worth less than the
time, trouble and resources necessary to make those backups.
(In)Actions speak louder than words, so the definition holds regardless
of any after-the-fact protests to the contrary.
Put differently, if you don't /already/ have backups, then by definition,
you /don't/ care about any data on those drives and need not bother
copying it over as that would be as much of a hassle as making the backup
in the first place and you've already demonstrated you don't value the
data enough to do that.
Put yet differently, if the potential loss of that data has changed your
mind about its value, better make that backup **NOW**, preferably before
any further attempts to mount writable, with or without skip_balance,
while you have the chance and before further inaction tempts fate by
continuing to define the data as throw-away value. Next time you might
not get that chance!
(The second rule of backups is that a would-be backup isn't a backup
until you've tested it restorable/usable. Until then, it's only a
would-be backup, as the backup simply isn't complete until it has been
tested.)
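As a rough sketch of both rules in practice, assuming the read-only
mount keeps working: copy the data off, then compare the copy against
the source. The SRC and DEST paths are hypothetical placeholders, and
run() is my own dry-run wrapper that only prints the commands unless
DRY_RUN=0 is set. A checksum comparison is not a full restore test, but
it is a start.

```shell
#!/bin/sh
# Sketch only. DEV and MNT come from the quoted output.
DEV=/dev/sde
MNT=/mnt/storage
SRC="$MNT/important"          # hypothetical: the data you care about
DEST=/mnt/backup/important    # hypothetical: a separate, healthy disk

# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

run mount -o ro "$DEV" "$MNT"       # read-only, as already known to work
run rsync -aHAX "$SRC/" "$DEST/"    # copy, keeping hard links, ACLs, xattrs

# Verification pass: -n changes nothing, --checksum compares file
# contents, and any file listed in the output differs between the two.
run rsync -aHAXn --checksum --itemize-changes "$SRC/" "$DEST/"
```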
After that, assuming skip_balance works, I'd try a scrub. Given that
both data and metadata are raid1, that should ensure everything matches
checksum and eliminate any
wrote-one-mirror-then-crashed-while-writing-the-other type errors. Of
course if the filesystem is corrupted enough, it might crash when scrub
hits something it can't fix, but at least here, I've found scrub pretty
reliable at fixing problems such as bad shutdowns.
If scrub finds errors and they're all correctable, you're likely healthy
again, but it might be worth running a read-only btrfs check to be sure.
Run the check as well if scrub finds some uncorrectable errors. If the
check reports errors, post them here and see what the experts say before
actually trying to fix them (I'm not a dev, just another user, and that
sort of thing is normally beyond me).
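For reference, the scrub and the follow-up check might be run roughly
as below. Device and mount point are again taken from the quoted
output, and run() is my own dry-run wrapper, not a btrfs tool: it only
prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. DEV and MNT come from the quoted output.
DEV=/dev/sde
MNT=/mnt/storage

# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

run btrfs scrub start "$MNT"        # runs in the background on a mounted fs
run btrfs scrub status "$MNT"       # poll for progress and error counts

# btrfs check wants the filesystem unmounted; --readonly changes nothing.
run umount "$MNT"
run btrfs check --readonly "$DEV"
```

On an array this size the scrub will likely take many hours, so expect
to poll the status more than once.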
Meanwhile, turning the topic a bit, toward your suggested 8 TB drives.
Be aware that many of those are archive-targeted drives and aren't
designed for normal use. Linux (generally, not just btrfs) originally
had problems with them but they've been fixed for a few kernel cycles
now. However, unless you really /are/ going to use them for archiving,
that is, write once and shelve them, btrfs, or any other COW-based
filesystem, isn't going to be your best choice of filesystem on them, as
COW is a worst case for the technology they use. A more conventional
filesystem should work better, although ordinary-usage performance still
isn't going to be great. They're not /designed/ for that sort of usage,
but rather for mostly write once and archive, or alternatively, for a
cycle of write, save, whole-drive (firmware command level) secure-erase,
and reuse.
So if you're going for the really large drives, do be aware of that and
buy archive-targeted drives or conventional ones based on what you
actually plan to do with them.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman