From: David Arendt <admin@prnet.org>
To: Duncan <1i5t5.duncan@cox.net>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs random filesystem corruption in kernel 3.17
Date: Tue, 14 Oct 2014 19:00:10 +0200
Message-ID: <543D569A.8000103@prnet.org>
In-Reply-To: <pan$e02d1$5d8cd0cc$87f3cfc$b14a0d51@cox.net>
The corruption seems to be worse than expected. Under kernel 3.16.5 I
cannot mount this filesystem read/write.
I'm in the process of doing a tar - mkfs.btrfs - untar recovery and am
staying on 3.16.5 for now.
[ 55.465584] parent transid verify failed on 51150848 wanted 272368 found 276401
[ 55.468415] parent transid verify failed on 918274048 wanted 273135 found 274590
[ 55.470915] parent transid verify failed on 508444672 wanted 274054 found 276617
[ 55.473758] parent transid verify failed on 18317623296 wanted 275876 found 278431
[ 55.476240] parent transid verify failed on 127254528 wanted 276488 found 276490
[ 55.479494] ------------[ cut here ]------------
[ 55.479499] WARNING: CPU: 1 PID: 1723 at fs/btrfs/extent-tree.c:876
btrfs_lookup_extent_info+0x44c/0x490()
[ 55.479500] Modules linked in:
[ 55.479502] CPU: 1 PID: 1723 Comm: ls Not tainted 3.16.5 #1
[ 55.479502] Hardware name: ASUS All Series/H87M-PRO, BIOS 2101 07/21/2014
[ 55.479503] 0000000000000000 0000000000000009 ffffffff816ff873
0000000000000000
[ 55.479504] ffffffff81078261 ffff8807f7084770 ffff8807ed8ca000
000000003dcf4000
[ 55.479506] ffff8807f7133de0 0000000000000000 ffffffff812be9bc
0000000000004000
[ 55.479507] Call Trace:
[ 55.479511] [<ffffffff816ff873>] ? dump_stack+0x41/0x51
[ 55.479514] [<ffffffff81078261>] ? warn_slowpath_common+0x81/0xb0
[ 55.479515] [<ffffffff812be9bc>] ? btrfs_lookup_extent_info+0x44c/0x490
[ 55.479516] [<ffffffff812c4998>] ? btrfs_alloc_free_block+0x2c8/0x450
[ 55.479519] [<ffffffff812af7df>] ? update_ref_for_cow+0x1ff/0x3f0
[ 55.479520] [<ffffffff812afc0a>] ? __btrfs_cow_block+0x23a/0x5a0
[ 55.479522] [<ffffffff812d14fd>] ? btrfs_buffer_uptodate+0x6d/0x80
[ 55.479524] [<ffffffff812b0136>] ? btrfs_cow_block+0x126/0x190
[ 55.479525] [<ffffffff812b43bd>] ? btrfs_search_slot+0x1fd/0xaa0
[ 55.479527] [<ffffffff812e07a3>] ?
btrfs_truncate_inode_items+0x123/0x8e0
[ 55.479529] [<ffffffff812e204a>] ? btrfs_evict_inode+0x32a/0x490
[ 55.479532] [<ffffffff8112e02a>] ? unlock_new_inode+0x3a/0x60
[ 55.479533] [<ffffffff8113abb5>] ? __inode_wait_for_writeback+0x65/0xb0
[ 55.479536] [<ffffffff810a8f70>] ? wake_atomic_t_function+0x30/0x30
[ 55.479537] [<ffffffff8112f276>] ? evict+0xa6/0x160
[ 55.479539] [<ffffffff812e2c2d>] ? btrfs_orphan_cleanup+0x1ed/0x430
[ 55.479540] [<ffffffff812e31c8>] ? btrfs_lookup_dentry+0x358/0x4c0
[ 55.479542] [<ffffffff812e3339>] ? btrfs_lookup+0x9/0x30
[ 55.479543] [<ffffffff8111f6c4>] ? lookup_real+0x14/0x50
[ 55.479545] [<ffffffff81120292>] ? __lookup_hash+0x32/0x50
[ 55.479546] [<ffffffff81120938>] ? lookup_slow+0x48/0xc0
[ 55.479547] [<ffffffff811227bc>] ? path_lookupat+0x73c/0x770
[ 55.479550] [<ffffffff81164860>] ? posix_acl_xattr_get+0x40/0xb0
[ 55.479551] [<ffffffff81137a80>] ? generic_getxattr+0x50/0x80
[ 55.479552] [<ffffffff8112281e>] ? filename_lookup.isra.51+0x2e/0x90
[ 55.479554] [<ffffffff8112553f>] ? user_path_at_empty+0x5f/0xb0
[ 55.479555] [<ffffffff81125549>] ? user_path_at_empty+0x69/0xb0
[ 55.479556] [<ffffffff8111b690>] ? vfs_fstatat+0x40/0x90
[ 55.479557] [<ffffffff8111b862>] ? SyS_newlstat+0x12/0x30
[ 55.479559] [<ffffffff8111f89d>] ? path_put+0xd/0x20
[ 55.479560] [<ffffffff81138ab7>] ? SyS_getxattr+0x57/0x80
[ 55.479562] [<ffffffff817053d2>] ? system_call_fastpath+0x16/0x1b
[ 55.479563] ---[ end trace a8ad56fd476f7474 ]---
[ 55.479564] BTRFS: error (device sda2) in update_ref_for_cow:1018:
errno=-30 Readonly filesystem
[ 55.479565] BTRFS info (device sda2): forced readonly
[ 55.479565] ------------[ cut here ]------------
[ 55.479567] WARNING: CPU: 1 PID: 1723 at fs/btrfs/super.c:259
__btrfs_abort_transaction+0x5a/0x140()
[ 55.479567] BTRFS: Transaction aborted (error -30)
[ 55.479568] Modules linked in:
[ 55.479569] CPU: 1 PID: 1723 Comm: ls Tainted: G W 3.16.5 #1
[ 55.479569] Hardware name: ASUS All Series/H87M-PRO, BIOS 2101 07/21/2014
[ 55.479570] 0000000000000000 0000000000000009 ffffffff816ff873
ffff8807f2dcf788
[ 55.479571] ffffffff81078261 00000000ffffffe2 ffff8807ed8ca000
ffff8807f7133de0
[ 55.479572] ffffffff8184d800 0000000000000488 ffffffff81078345
ffffffff8197afd8
[ 55.479573] Call Trace:
[ 55.479574] [<ffffffff816ff873>] ? dump_stack+0x41/0x51
[ 55.479576] [<ffffffff81078261>] ? warn_slowpath_common+0x81/0xb0
[ 55.479578] [<ffffffff81078345>] ? warn_slowpath_fmt+0x45/0x50
[ 55.479579] [<ffffffff812aa41a>] ? __btrfs_abort_transaction+0x5a/0x140
[ 55.479580] [<ffffffff812afe02>] ? __btrfs_cow_block+0x432/0x5a0
[ 55.479582] [<ffffffff812d14fd>] ? btrfs_buffer_uptodate+0x6d/0x80
[ 55.479583] [<ffffffff812b0136>] ? btrfs_cow_block+0x126/0x190
[ 55.479584] [<ffffffff812b43bd>] ? btrfs_search_slot+0x1fd/0xaa0
[ 55.479586] [<ffffffff812e07a3>] ?
btrfs_truncate_inode_items+0x123/0x8e0
[ 55.479587] [<ffffffff812e204a>] ? btrfs_evict_inode+0x32a/0x490
[ 55.479588] [<ffffffff8112e02a>] ? unlock_new_inode+0x3a/0x60
[ 55.479590] [<ffffffff8113abb5>] ? __inode_wait_for_writeback+0x65/0xb0
[ 55.479591] [<ffffffff810a8f70>] ? wake_atomic_t_function+0x30/0x30
[ 55.479592] [<ffffffff8112f276>] ? evict+0xa6/0x160
[ 55.479594] [<ffffffff812e2c2d>] ? btrfs_orphan_cleanup+0x1ed/0x430
[ 55.479595] [<ffffffff812e31c8>] ? btrfs_lookup_dentry+0x358/0x4c0
[ 55.479596] [<ffffffff812e3339>] ? btrfs_lookup+0x9/0x30
[ 55.479598] [<ffffffff8111f6c4>] ? lookup_real+0x14/0x50
[ 55.479599] [<ffffffff81120292>] ? __lookup_hash+0x32/0x50
[ 55.479600] [<ffffffff81120938>] ? lookup_slow+0x48/0xc0
[ 55.479601] [<ffffffff811227bc>] ? path_lookupat+0x73c/0x770
[ 55.479603] [<ffffffff81164860>] ? posix_acl_xattr_get+0x40/0xb0
[ 55.479605] [<ffffffff81137a80>] ? generic_getxattr+0x50/0x80
[ 55.479606] [<ffffffff8112281e>] ? filename_lookup.isra.51+0x2e/0x90
[ 55.479607] [<ffffffff8112553f>] ? user_path_at_empty+0x5f/0xb0
[ 55.479608] [<ffffffff81125549>] ? user_path_at_empty+0x69/0xb0
[ 55.479609] [<ffffffff8111b690>] ? vfs_fstatat+0x40/0x90
[ 55.479610] [<ffffffff8111b862>] ? SyS_newlstat+0x12/0x30
[ 55.479611] [<ffffffff8111f89d>] ? path_put+0xd/0x20
[ 55.479613] [<ffffffff81138ab7>] ? SyS_getxattr+0x57/0x80
[ 55.479614] [<ffffffff817053d2>] ? system_call_fastpath+0x16/0x1b
[ 55.479615] ---[ end trace a8ad56fd476f7475 ]---
[ 55.479620] BTRFS error (device sda2): Error removing orphan entry,
stopping orphan cleanup
[ 55.479621] BTRFS critical (device sda2): could not do orphan cleanup -22
[ 83.454294] parent transid verify failed on 51150848 wanted 272368 found 276401
[ 83.454945] parent transid verify failed on 918274048 wanted 273135 found 274590
[ 83.455601] parent transid verify failed on 508444672 wanted 274054 found 276617
[ 83.456251] parent transid verify failed on 18317623296 wanted 275876 found 278431
[ 83.456897] parent transid verify failed on 127254528 wanted 276488 found 276490
[ 84.647964] parent transid verify failed on 51150848 wanted 272368 found 276401
[ 84.648612] parent transid verify failed on 918274048 wanted 273135 found 274590
[ 84.649267] parent transid verify failed on 508444672 wanted 274054 found 276617
[ 84.649913] parent transid verify failed on 18317623296 wanted 275876 found 278431
[ 84.650557] parent transid verify failed on 127254528 wanted 276488 found 276490
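
In case it helps anyone comparing logs: the same five blocks repeat in
every batch above. A minimal Python sketch, assuming dmesg output in the
format shown here, that collects the unique blocks and the gap between
the wanted and found transids. The function name transid_gaps and the
regular expression are my own illustration, not part of btrfs-progs or
the kernel, and the sample contains only a few of the lines above.

```python
import re

# Matches the kernel message format seen in the log above. \s+ tolerates
# the line wrapping introduced by the mail client.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+)\s+wanted (\d+)\s+found (\d+)"
)


def transid_gaps(log_text):
    """Map each unique block address to (wanted, found, found - wanted)."""
    gaps = {}
    for bytenr, wanted, found in TRANSID_RE.findall(log_text):
        w, f = int(wanted), int(found)
        # Repeated reports of the same block simply overwrite each other.
        gaps[int(bytenr)] = (w, f, f - w)
    return gaps


# A few lines copied from the log above, still wrapped as in the mail.
sample = """\
[ 55.465584] parent transid verify failed on 51150848 wanted 272368
found 276401
[ 55.476240] parent transid verify failed on 127254528 wanted 276488
found 276490
[ 83.454294] parent transid verify failed on 51150848 wanted 272368
found 276401
"""

for bytenr, (wanted, found, gap) in sorted(transid_gaps(sample).items()):
    print(f"block {bytenr}: wanted {wanted}, found {found}, gap {gap}")
```

Feeding it the full dmesg text instead of the sample should list each
affected block once, which makes it easier to see whether new blocks
show up between mount attempts.
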
On 10/14/14 12:36 AM, Duncan wrote:
> Rich Freeman posted on Mon, 13 Oct 2014 16:42:14 -0400 as excerpted:
>
>> On Mon, Oct 13, 2014 at 4:27 PM, David Arendt <admin@prnet.org> wrote:
>>> From my own experience and based on what other people are saying, I
>>> think there is a random btrfs filesystem corruption problem in kernel
>>> 3.17 at least related to snapshots, therefore I decided to post using
>>> another subject to draw attention from people not concerned about btrfs
>>> send to it. More information can be found in the btrfs send posts.
>>>
>>> Did the filesystem you tried to balance contain snapshots ? Read only
>>> ones ?
>> The filesystem contains numerous subvolumes and snapshots, many of which
>> are read-only. I'm managing many with snapper.
>>
>> The similarity of the transid verify errors made me think this issue is
>> related, and the root cause may have nothing to do with btrfs send.
>>
>> As far as I can tell these errors aren't having any effect on my data -
>> hopefully the system is catching the problems before there are actual
>> disk writes/etc.
> Summarizing what I've seen on the threads...
>
> 1) The bug seems to be read-only snapshot related. The connection to
> send is that send creates read-only snapshots, but people creating read-
> only snapshots for other purposes are now reporting the same problem, so
> it's not send, it's the read-only snapshots.
>
> 2) Writable snapshots haven't been implicated yet, and the working set
> from which the snapshots are taken doesn't seem to be affected, either.
> So in that sense it's not affecting ordinary usage, only the read-only
> snapshots themselves.
>
> 3) More problematic, however, is the fact that these apparently corrupted
> read-only snapshots often are not listed properly and can't be deleted,
> tho I'm not sure if that's /all/ the corrupted snapshots or only part of
> them. So while it may not affect ordinary operation in the short term,
> over time until there's a fix, people routinely doing read-only snapshots
> are going to be getting more and more of these undeletable snapshots, and
> depending on whether the eventual patch only prevents more or can
> actually fix the bad ones (possibly via btrfs check or the like),
> affected filesystems may ultimately have to be blown away and recreated
> with a fresh mkfs, in order to kill the currently undeletable snapshots.
>
> So the first thing to do would be to shut off whatever's making read-only
> snapshots, so you don't make the problem worse while it's being
> investigated. For those who can do that without too big an interruption
> to their normal routine (who don't depend on send/receive, for instance),
> just keep it off for the time being. For those who depend on read-only
> snapshots (send-receive for backup and the data is too valuable to not do
> the backups for a few days), consider switching back to 3.16-stable --
> from 3.16.3 at least, the patch for the compress bug is there, so that
> shouldn't be a problem.
>
> And if you're affected, be aware that until we have a fix, we don't know
> if it'll be possible to remove the affected and currently undeletable
> snapshots. If it's not, at some point you'll need to do a fresh
> mkfs.btrfs, to get rid of the damage. Since the bug doesn't appear to
> affect writable snapshots or the "head" from which snapshots are made,
> it's not urgent, and a full fix is likely to include a patch to detect
> and fix the problem as well, but until we know what the problem is we
> can't be sure of that, so be prepared to do that mkfs at some point, as
> at this point it's possible that's the only way you'll be able to kill
> the corrupted snapshots.
>
> 4) Total speculation on my part, but given the wanted transid (aka
> generation, in different contexts) is significantly lower than the found
> transid, and the fact that the problem appears to be limited to
> /read-only/ snapshots, my first suspicion is that something's getting
> updated that would normally apply to all snapshots, but the read-only
> nature of the snapshots is preventing the full update there. The transid
> of the block is updated, but the snapshot being read-only is preventing
> update of the pointer in that snapshot accordingly.
>
> What I do /not/ know is whether the bug is that something's getting
> updated that should NOT be, and it's simply the read-only snapshots
> letting us know about it since the writable snapshots are fully updated,
> even if that breaks the snapshot (breaking writable snapshots in a
> different and currently undetected way), or if instead, it's a legitimate
> update, like a balance simply moving the snapshot around but not
> affecting it otherwise, and the bug is that the read-only snapshots
> aren't allowing the legitimate update.
>
> Either way, this more or less developed over the weekend, and it's Monday
> now, so the devs should be on it. If it's anything like the 3.15/3.16
> compression bug, it'll take some time for them to properly trace it, and
> then to figure out an appropriate fix, but they will. Chances are we'll
> have at least some decent progress on a trace by Friday, and maybe even a
> good-to-go patch. =:^)
>