From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: [3.2.1] BUG at fs/btrfs/inode.c:1588
Date: Sun, 5 Feb 2012 05:02:47 +0000 (UTC)
Message-ID: <pan.2012.02.05.05.02.46@cox.net>
In-Reply-To: <mkirv8-uns.ln1@hurikhan.ath.cx>
Kai Krakow posted on Fri, 03 Feb 2012 00:25:51 +0100 as excerpted:
> Duncan <1i5t5.duncan@cox.net> schrieb:
>
>> I had hoped someone else better qualified would answer, and they may
>> still do so, but in the meantime, a couple notes...
>
> Still I think you gained good insight by reading all those posts. I'm
> using btrfs for a few weeks now and it is pretty solid since 3.2. I've
> been reading the list a few weeks before starting btrfs but only looked
> at articles about corruption and data loss. I started using btrfs when
> rescue tools became available and the most annoying corruption bugs had
> been fixed.
> But I've been hit by corruption and freezes a few times so I decided to
> have that big usb3 disk which I rsync every now and then using snapshots
> for rollback.
Agreed on the insight from reading, and good to read that it has been
pretty solid for you since 3.2.
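Just for reference, the way I picture that rsync-then-snapshot flow is
something like the sketch below (Python, purely for illustration; the
paths, subvolume names and rsync options are placeholders I made up,
not your actual setup):

  #!/usr/bin/env python3
  # Rough sketch of one rsync-then-snapshot backup run.  All paths and
  # subvolume names here are made-up placeholders.
  import datetime
  import subprocess

  SOURCE = "/home/"                   # hypothetical source tree
  BACKUP_SUBVOL = "/mnt/usb3/backup"  # hypothetical subvolume on the usb3 disk
  SNAP_DIR = "/mnt/usb3/snapshots"    # hypothetical snapshot directory

  def run(cmd):
      # Echo and run a command, raising if it exits non-zero so a
      # killed rsync doesn't silently get snapshotted as a good backup.
      print("+", " ".join(cmd))
      subprocess.run(cmd, check=True)

  def backup():
      # Mirror the source into the backup subvolume, preserving
      # hardlinks, ACLs and xattrs, deleting what was deleted at the
      # source.
      run(["rsync", "-aHAX", "--delete", SOURCE, BACKUP_SUBVOL + "/"])
      # Then take a read-only snapshot of the backup subvolume so this
      # run can be rolled back to later.
      stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H%M")
      run(["btrfs", "subvolume", "snapshot", "-r",
           BACKUP_SUBVOL, SNAP_DIR + "/backup-" + stamp])

  if __name__ == "__main__":
      backup()

Rolling back is then just a matter of pulling files back out of (or
re-snapshotting from) whichever dated snapshot you want.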
What I seem to be seeing is that the normal single-disk/dual-metadata
setup is reasonably stable: a few weird reports here and there,
including the ENOSPC stuff, but nothing huge.
But my primary interest is raid1 for both data and metadata with more
than two copies (3-4), and >2-copy mirroring just doesn't appear to be
available yet (but see the article linked below, which suggests a 3.4
timeframe). Even the two-copy setup looks like it still has major
problems, including a recent thread reporting an inability to allocate
further chunks when in degraded mode, so as soon as the currently
in-use chunks fill up (maybe a gig or so instead of 20 in that test),
ENOSPC.
So I think I'll wait another kernel cycle or two... but now that I'm
here, I'm going to continue tracking the list, so I'll be ready to go
when the time comes.
>> 1) "phantom ENOSPC bug"
> On my first thought this was my suspicion, too. But otoh there was no
> ENOSPC message, neither in dmesg nor in rsync. Rsync just froze, I was
> able to kill it and my script continued to create a snapshot afterwards
> and unmount. I tried to mount again after btrfsck, it worked fine, I
> unmounted, system hung. I rebooted, scrubbed my two-disk array, no
> problems, I mounted the backup disk again, rsync'ed it, went fine,
> unmounted. But btrfsck still shows the same errors for this disk. *sigh
Good point. It looks similar to the ENOSPC bug, but without the ENOSPC.
But keep in mind that they're apparently just throttling as a
near-term workaround and haven't fully traced the bug yet. Given the
otherwise similar trigger and symptoms, your reported problem could be
a variant of the same bug that happened to freeze rsync instead of
erroring out with ENOSPC. If so, when they do finally nail that one,
it may well fix yours too, or at least make it easier to trace.
> I think btrfs should try to fix such corruptions online while using it.
> From what I've learned here this is the long-term target and a working
> btrfsck should just be a helper tool. And the reason for the long
> delayed btrfsck is that Chris wants to have proper online fixing in
> place first.
I had seen articles pointing out that the mount-time and online fixing
tools were indeed taking up some of the slack, but this is the first
time I've seen that described as a deliberate strategy, rather than
the problems they've been seeing simply turning out to be easy enough
to fix online once they were tracked down well enough to fix at all,
online or off. It does make a lot of sense to do what you can online,
though, and since I wasn't following btrfs closely until the last
couple of weeks I could easily have missed that it was deliberate, so
it may well /be/ a deliberate strategy. I believe you're correct.
> At least I can tell this corruption was introduced by bad logic in the
> kernel, and not by some crash. The usb3 disk is solely mounted for the
> purpose of rsync'ing and unmounted all the other times.
That's a good point, as well.
>> 2) Just a couple days ago I read an article that claimed Oracle has a
>> Feb 16 deadline for a working btrfsck as that's the deadline for
>> getting it in their next shipping Unbreakable Linux release. I won't
>> claim to know if the article is correct or not, but if so, a reasonably
>> working btrfsck should be available within two weeks. =:^) Of course
>> it may continue to improve after that...
>
> Sounds good. I wonder if Chris could tell anything on that point. ;-)
>
>> Meanwhile, there's a tool already available that should allow
>> retrieving the undamaged data off of unmountable filesystems, at least,
>> and there's another tool that allows rollback to an earlier root node
>> if necessary
> The tools are btrfs-rescue and btrfs-repair from Josef's btrfs-progs
> available from github.
Thanks.
> But if you could provide a link for the Feb 16 deadline I'd be eager to
> read the article.
It was a couple of days before I could go looking, thus the delay in
this post, and it might be Feb 14 rather than Feb 16, but...
The basis seems to be Chris Mason's talk at SCALE 10x in LA, so there
should be independent coverage on various Linux news sites. Here's the
one I googled up first, on Phoronix (it gives Feb 14, not the Feb 16
date I had been searching on, which might explain why I had trouble
googling it):
http://www.phoronix.com/scan.php?page=news_item&px=MTA0ODU
That might have been the one I read originally. (I subscribe to
lxer.com's feed, which covers Linux and Android stories from around
the net, including Phoronix, and would have clicked that if it had
come up, but I don't remember for sure whether that was it or whether
there was another.)
There's a bit more technical detail, including the tidbit about
multi-way mirroring that I mentioned above, in a different article:
http://www.phoronix.com/scan.php?page=news_item&px=MTA0Njk
I had discovered, much to my dismay, that so-called btrfs-raid1 only
does dual-copy, not full raid1 across an arbitrary number of copies.
My current disks are old enough that I really don't want to risk
two-copy-only, especially since I'm currently running 4-spindle
md/raid1 for most of my system, so I already have the disks. I
originally installed md/raid6 for most of my data, thus the quad
spindles, but after running it for a while I decided raid1 fit my
needs better. If I'd known at purchase and setup time what I know now
about raid5/6, I'd probably have gone 2-spindle raid1 then, with a
third as a hot spare, and saved myself the money on the 4th one.
Two-way would have been fine for current btrfs, but given that I'm
running 4-way raid1 now and the disks are about mid-life in operating
hours according to SMART, I simply don't want to risk switching to
two-way-only mirroring, only to have both mirrors on what are, after
all, aging disks die at once.
So I've been debating whether btrfs DUP mode (duplicated metadata on
the same device, single data) on a 4-way md/raid1 would be better, or
btrfs-raid1 (two-way-mirrored data and metadata both, since two-way is
all that's possible ATM) layered on pairs of 2-way md/raid1s. The
latter would play to btrfs' ability to recover data from a different
mirror when necessary. But I already run a dozen md/raids on
partitions across the same four physical devices, and that would
double it to two dozen. At some point it's no longer a workable
solution...
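For concreteness, the two layouts I'm weighing would look roughly like
this at mkfs time (a sketch only; the /dev/mdN names are placeholders
for the md arrays, not my actual partition layout):

  #!/usr/bin/env python3
  # Sketch of the two candidate btrfs-on-md layouts.  Device names are
  # placeholders, and nothing here touches a real disk unless the
  # subprocess call is uncommented.
  import subprocess

  def mkfs(args):
      cmd = ["mkfs.btrfs"] + args
      print("would run:", " ".join(cmd))
      # subprocess.run(cmd, check=True)  # uncomment to actually format

  # Option A: DUP metadata / single data on one 4-way md/raid1,
  # leaving all the redundancy to md.
  mkfs(["-m", "dup", "-d", "single", "/dev/md0"])

  # Option B: btrfs raid1 (two-way-mirrored data and metadata) layered
  # on a pair of 2-way md/raid1s, so btrfs can fall back to the other
  # mirror when a checksum doesn't match.
  mkfs(["-m", "raid1", "-d", "raid1", "/dev/md1", "/dev/md2"])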
But if triple-way mirroring (and one assumes N-way mirroring can't be
far behind, if that's not what was meant) shows up in 3.4 or 3.5, as
that article suggests, and with the writing (repairing) fsck having
been out for a while by then, assuming it ships later this month, that
might well be my upgrade time.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman