* Hung I/O, Kernel BUG with corrupt leaf (bad key order)
@ 2012-08-14 18:20 Peter Marheine
2012-08-15 1:29 ` Peter Marheine
2012-08-22 15:01 ` David Sterba
0 siblings, 2 replies; 3+ messages in thread
From: Peter Marheine @ 2012-08-14 18:20 UTC
To: linux-btrfs
Hi all,
I'm running btrfs in a 3-disk RAID1 configuration. After a hard
power-off, I'm seeing a lot of hung I/O tasks on this volume,
apparently due to a corrupt leaf. I first noticed the problem on
kernel 3.4.7, and it's persisted with 3.4.8. Relevant parts of the
kernel log follow.
[ 85.179621] block group 38684065792 has an wrong amount of free space
[ 85.179667] btrfs: failed to load free space cache for block group
38684065792
[ 136.969477] btrfs: corrupt leaf, bad key order:
block=1478255230976,root=1, slot=26
[ 136.998953] btrfs: corrupt leaf, bad key order:
block=1478255230976,root=1, slot=26
[ 137.000492] btrfs: corrupt leaf, bad key order:
block=1478255230976,root=1, slot=26
[ 137.000708] btrfs: corrupt leaf, bad key order:
block=1478255230976,root=1, slot=26
[ 153.912922] btrfs: corrupt leaf, bad key order:
block=1478255230976,root=1, slot=26
[ 153.913020] ------------[ cut here ]------------
[ 153.913055] kernel BUG at fs/btrfs/inode.c:828!
[ 153.913087] invalid opcode: 0000 [#1] PREEMPT SMP
[ 153.913142] CPU 1
[ 153.913155] Modules linked in: nfsd exportfs arc4 snd_hda_codec_idt
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm ath5k ath microcode i915
video i2c_algo_bit acpi_cpufreq drm_kms_helper mperf mac80211 cfg80211
i2c_i801 rfkill serio_raw drm processor evdev snd_page_alloc snd_timer
snd coretemp soundcore mei(C) psmouse pcspkr e1000e iTCO_wdt i2c_core
button iTCO_vendor_support intel_agp intel_gtt nfs nfs_acl lockd
auth_rpcgss sunrpc fscache dm_mod floppy btrfs crc32c libcrc32c
zlib_deflate ext4 crc16 jbd2 mbcache uhci_hcd ehci_hcd usbcore
usb_common sd_mod ahci libahci pata_marvell libata scsi_mod
[ 153.913685]
[ 153.913698] Pid: 325, comm: btrfs-transacti Tainted: G C
3.4.8-1-ARCH #1 /DG33TL
[ 153.913767] RIP: 0010:[<ffffffffa0197cd0>] [<ffffffffa0197cd0>]
cow_file_range+0x3d0/0x4b0 [btrfs]
[ 153.913841] RSP: 0018:ffff8801a1fb1580 EFLAGS: 00010246
[ 153.913873] RAX: ffff88019cd38000 RBX: ffff8801a1fb18e8 RCX: 000000000000ffff
[ 153.913911] RDX: ffff88019d8bb800 RSI: ffffea00060d0040 RDI: ffff88017dff47f0
[ 153.913951] RBP: ffff8801a1fb1640 R08: ffff8801a1fb18d4 R09: ffff8801a1fb18e8
[ 153.913990] R10: 0000000000010000 R11: 0000000000000001 R12: 0000000000000000
[ 153.914029] R13: 0000000000000000 R14: 0000000000001000 R15: ffff88017dff47f0
[ 153.914068] FS: 0000000000000000(0000) GS:ffff8801abc80000(0000)
knlGS:0000000000000000
[ 153.914112] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 153.914144] CR2: 00007f085106b000 CR3: 0000000198736000 CR4: 00000000000007e0
[ 153.914182] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 153.914221] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 153.914261] Process btrfs-transacti (pid: 325, threadinfo
ffff8801a1fb0000, task ffff88019cd7b790)
[ 153.914308] Stack:
[ 153.914322] 0000000000000000 ffff880162624b60 0000000000000286
0000000000000003
[ 153.914377] 000000000000ffff ffff88017dff4620 ffff8801a1fb15f0
ffffea00060d0040
[ 153.914431] ffff8801a1fb15f0 ffff88019d8bb800 ffff8801a09ad360
ffff8801a1fb18d4
[ 153.914485] Call Trace:
[ 153.914516] [<ffffffffa01b687f>] ? free_extent_buffer+0x2f/0x70 [btrfs]
[ 153.914565] [<ffffffffa0198173>] run_delalloc_nocow+0x3c3/0x950 [btrfs]
[ 153.914615] [<ffffffffa0198a31>] run_delalloc_range+0x331/0x3a0 [btrfs]
[ 153.914665] [<ffffffffa01b52f1>] __extent_writepage+0x341/0x7c0 [btrfs]
[ 153.914715] [<ffffffffa01b5a52>]
extent_write_cache_pages.isra.26.constprop.44+0x2e2/0x3e0 [btrfs]
[ 153.914775] [<ffffffffa01b5da5>] extent_writepages+0x45/0x60 [btrfs]
[ 153.914823] [<ffffffffa0194330>] ? btrfs_writepage+0x70/0x70 [btrfs]
[ 153.914871] [<ffffffffa01b191e>] ? free_extent_state+0x1e/0x30 [btrfs]
[ 153.914919] [<ffffffffa0193338>] btrfs_writepages+0x28/0x30 [btrfs]
[ 153.916201] [<ffffffff81118082>] do_writepages+0x22/0x50
[ 153.916315] [<ffffffff8110d5fb>] __filemap_fdatawrite_range+0x5b/0x60
[ 153.916315] [<ffffffff8110d61f>] filemap_fdatawrite+0x1f/0x30
[ 153.920013] [<ffffffff8110d665>] filemap_write_and_wait+0x35/0x60
[ 153.920013] [<ffffffffa01cf622>] __btrfs_write_out_cache+0x792/0x9a0 [btrfs]
[ 153.920013] [<ffffffffa0175b25>] ? __find_space_info+0x85/0xa0 [btrfs]
[ 153.920013] [<ffffffffa017f28b>] ?
btrfs_run_delayed_refs+0x1cb/0x450 [btrfs]
[ 153.920013] [<ffffffffa01cf8c5>] btrfs_write_out_cache+0x95/0xf0 [btrfs]
[ 153.920013] [<ffffffffa017fa2f>]
btrfs_write_dirty_block_groups+0x51f/0x5f0 [btrfs]
[ 153.920013] [<ffffffffa01e9b2a>] commit_cowonly_roots+0xec/0x1c6 [btrfs]
[ 153.920013] [<ffffffffa0190895>]
btrfs_commit_transaction+0x575/0xaa0 [btrfs]
[ 153.920013] [<ffffffff81073b50>] ? abort_exclusive_wait+0xb0/0xb0
[ 153.920013] [<ffffffffa0188e15>] transaction_kthread+0x235/0x2b0 [btrfs]
[ 153.920013] [<ffffffffa0188be0>] ? btrfs_alloc_root+0x50/0x50 [btrfs]
[ 153.920013] [<ffffffff810731c3>] kthread+0x93/0xa0
[ 153.920013] [<ffffffff8146bfa4>] kernel_thread_helper+0x4/0x10
[ 153.920013] [<ffffffff81073130>] ? kthread_freezable_should_stop+0x70/0x70
[ 153.920013] [<ffffffff8146bfa0>] ? gs_change+0x13/0x13
[ 153.920013] Code: ff 48 8b 75 88 48 8b 7d 80 41 89 c0 b9 a3 03 00
00 48 c7 c2 63 10 1f a0 41 89 c6 e8 ab 3e fd ff eb 2a 66 0f 1f 84 00
00 00 00 00 <0f> 0b 48 8b 75 88 48 8b 7d 80 41 89 c0 b9 7d 03 00 00 48
c7 c2
[ 153.920013] RIP [<ffffffffa0197cd0>] cow_file_range+0x3d0/0x4b0 [btrfs]
[ 153.920013] RSP <ffff8801a1fb1580>
[ 153.920330] ---[ end trace 462486d382b33cae ]---
Btrfsck on this volume prints a lot of messages about incorrect
backrefs, and eventually fails out due to bad key ordering:
backpointer mismatch on [823847440384 1204224]
owner ref check failed [823847440384 1204224]
ref mismatch on [823848644608 1269760] extent item 1, found 0
Incorrect local backref count on 823848644608 root 5 owner 136598
offset 0 found 0 wanted 1 back 0xa6cc9a0
backpointer mismatch on [823848644608 1269760]
owner ref check failed [823848644608 1269760]
ref mismatch on [823849914368 1662976] extent item 1, found 0
Incorrect local backref count on 823849914368 root 5 owner 136599
offset 0 found 0 wanted 1 back 0xa6ccc00
backpointer mismatch on [823849914368 1662976]
owner ref check failed [823849914368 1662976]
ref mismatch on [823851577344 1585152] extent item 1, found 0
Incorrect local backref count on 823851577344 root 5 owner 136600
offset 0 found 0 wanted 1 back 0xa6cd0c0
backpointer mismatch on [823851577344 1585152]
owner ref check failed [823851577344 1585152]
ref mismatch on [823853162496 1585152] extent item 1, found 0
Incorrect local backref count on 823853162496 root 5 owner 136601
offset 0 found 0 wanted 1 back 0xa6cd580
backpointer mismatch on [823853162496 1585152]
owner ref check failed [823853162496 1585152]
ref mismatch on [823854747648 1777664] extent item 1, found 0
Incorrect local backref count on 823854747648 root 5 owner 136602
offset 0 found 0 wanted 1 back 0xa6cd450
backpointer mismatch on [823854747648 1777664]
owner ref check failed [823854747648 1777664]
owner ref check failed [1478255230976 4096]
Errors found in extent allocation tree
checking fs roots
bad key ordering 26 27
btrfsck: btrfsck.c:873: count_csum_range: Assertion `!(ret < 0)' failed.
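For readers unfamiliar with the message: a btrfs leaf stores its items sorted by key, where a key is the triple (objectid, type, offset). "bad key ordering 26 27" presumably means that the key in slot 26 does not sort before the key in slot 27, which would match the slot=26 reported in the kernel log. Below is a minimal Python sketch of that kind of check, assuming keys must be strictly increasing. The function name and the sample keys are made up for illustration; this is not the kernel's or btrfsck's actual code.

```python
from typing import List, Optional, Tuple

# A btrfs key is the triple (objectid, type, offset). Keys compare field
# by field in that order, which is also how Python compares tuples.
Key = Tuple[int, int, int]


def first_bad_slot(keys: List[Key]) -> Optional[int]:
    """Return the first slot whose key is not strictly less than the key
    in the next slot, or None if the whole leaf is correctly ordered."""
    for slot in range(len(keys) - 1):
        if keys[slot] >= keys[slot + 1]:
            return slot
    return None


# Made-up keys for illustration only; not taken from the damaged volume.
good_leaf = [(256, 1, 0), (256, 12, 256), (257, 1, 0)]
bad_leaf = [(256, 1, 0), (257, 1, 0), (256, 12, 256)]

print(first_bad_slot(good_leaf))  # None
print(first_bad_slot(bad_leaf))   # 1
```

A check like this only detects the damage. Deciding which of the two keys is wrong, and repairing the leaf, is a separate problem.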
Is there some way to fix this corruption? I noticed what looks like
the same problem in an earlier message on the list ("btrfs unmountable
after failed suspend", February 7), but with no resolution. I have
offline backups, but recovering those in their entirety will take some
time, so a solution that doesn't require wiping the entire FS would be
preferred.
--
Peter Marheine
* Re: Hung I/O, Kernel BUG with corrupt leaf (bad key order)
From: Peter Marheine @ 2012-08-15 1:29 UTC
To: linux-btrfs
> Is there some way to fix this corruption? I noticed what looks like
> the same problem in an earlier message on the list ("btrfs unmountable
> after failed suspend", February 7), but with no resolution. I have
> offline backups, but recovering those in their entirety will take some
> time, so a solution that doesn't require wiping the entire FS would be
> preferred.
I did some further investigation into the problem, and I have
determined the problematic directory (by seeing where `ls -R` hangs).
If I skip the corrupt directory, everything works properly, but
attempting to list its contents causes the entire volume to stop
responding.
At this point I'd like to simply unlink the corrupt directory (without
enumerating it). Is that possible, or should I just image the volume
minus the corrupt directory and recreate my fs?
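If it comes to copying everything except the corrupt directory, the key point is that the copy must never list that directory. A rough Python sketch of such a copy is below, assuming the bad directory's path relative to the mount point is already known. The function name and the demo paths are hypothetical. It copies plain files and directories only and does not preserve symlinks, ownership or special files, so treat it as an illustration rather than a backup tool.

```python
import os
import shutil
import tempfile


def copy_tree_skipping(src: str, dst: str, skip_dirs: set) -> None:
    """Copy src to dst, never descending into any directory whose path
    relative to src is listed in skip_dirs."""
    for root, dirs, files in os.walk(src, topdown=True):
        rel_root = os.path.relpath(root, src)
        # Pruning dirs in place stops os.walk from ever listing the
        # skipped directories.
        dirs[:] = [
            d for d in dirs
            if os.path.normpath(os.path.join(rel_root, d)) not in skip_dirs
        ]
        target = os.path.normpath(os.path.join(dst, rel_root))
        os.makedirs(target, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(root, name), os.path.join(target, name))


# Demo on a throwaway tree; "corrupt" stands in for the bad directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "volume")
    dst = os.path.join(tmp, "image")
    os.makedirs(os.path.join(src, "good"))
    os.makedirs(os.path.join(src, "corrupt"))
    with open(os.path.join(src, "good", "a.txt"), "w") as f:
        f.write("data")
    with open(os.path.join(src, "corrupt", "b.txt"), "w") as f:
        f.write("data")
    copy_tree_skipping(src, dst, {"corrupt"})
    copied = sorted(os.listdir(dst))
    print(copied)  # ['good']
```

The parent directory is still listed, so this only helps if reading the parent's entries works and the hang is confined to the corrupt directory itself, as described above.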
* Re: Hung I/O, Kernel BUG with corrupt leaf (bad key order)
From: David Sterba @ 2012-08-22 15:01 UTC
To: Peter Marheine; +Cc: linux-btrfs
On Tue, Aug 14, 2012 at 01:20:36PM -0500, Peter Marheine wrote:
> Hi all,
>
> I'm running btrfs in a 3-disk RAID1 configuration. After a hard
> power-off, I'm seeing a lot of hung I/O tasks on this volume,
> apparently due to a corrupt leaf. I first noticed the problem on
> kernel 3.4.7, and it's persisted with 3.4.8. Relevant parts of the
> kernel log follow.
What was the filesystem activity when the power-off happened?
>
> [ 85.179621] block group 38684065792 has an wrong amount of free space
> [ 85.179667] btrfs: failed to load free space cache for block group
> 38684065792
> [ 136.969477] btrfs: corrupt leaf, bad key order:
> block=1478255230976,root=1, slot=26
> [ 136.998953] btrfs: corrupt leaf, bad key order:
> block=1478255230976,root=1, slot=26
> [ 137.000492] btrfs: corrupt leaf, bad key order:
> block=1478255230976,root=1, slot=26
> [ 137.000708] btrfs: corrupt leaf, bad key order:
> block=1478255230976,root=1, slot=26
> [ 153.912922] btrfs: corrupt leaf, bad key order:
> block=1478255230976,root=1, slot=26
> [ 153.913020] ------------[ cut here ]------------
> [ 153.913055] kernel BUG at fs/btrfs/inode.c:828!
809 static noinline int cow_file_range(struct inode *inode,
810 struct page *locked_page,
811 u64 start, u64 end, int *page_started,
812 unsigned long *nr_written,
813 int unlock)
814 {
[...]
828 BUG_ON(btrfs_is_free_space_inode(root, inode));
Together with the 'block group' warning above, this seems to be the bug that
Liu Bo fixed with the patches
Btrfs: fix a bug of writting free space cache with nodatacow option
Btrfs: fix a bug of writting free space cache during balance
Btrfs: fix btrfs_is_free_space_inode to recognize btree inode
that should appear in 3.6.
You can try to mount with 'nospace_cache' or 'clear_cache' to see whether
redoing the space cache from scratch makes a difference, but I'm afraid the
bad keys will remain and would have to be removed via offline fsck.
david
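As a concrete form of the suggestion above, the two mount attempts could look like the commands printed by this small Python sketch. The device /dev/sdX and the mount point /mnt/data are placeholders, not taken from the thread, and the helper name is hypothetical. It only builds the command line and does not run it.

```python
from typing import List

# Placeholders: substitute one member device of the volume and the real
# mount point.
DEVICE = "/dev/sdX"
MOUNTPOINT = "/mnt/data"

# clear_cache discards the existing free space cache so it is rebuilt;
# nospace_cache mounts without using the free space cache.
SPACE_CACHE_OPTIONS = {"clear_cache", "nospace_cache"}


def btrfs_mount_command(device: str, mountpoint: str, option: str) -> List[str]:
    """Build (but do not execute) a mount command using one of the two
    space cache options mentioned in the reply."""
    if option not in SPACE_CACHE_OPTIONS:
        raise ValueError(f"unexpected mount option: {option}")
    return ["mount", "-t", "btrfs", "-o", option, device, mountpoint]


for option in ("clear_cache", "nospace_cache"):
    print(" ".join(btrfs_mount_command(DEVICE, MOUNTPOINT, option)))
# mount -t btrfs -o clear_cache /dev/sdX /mnt/data
# mount -t btrfs -o nospace_cache /dev/sdX /mnt/data
```

Either option affects only the free space cache. Neither touches the leaf with the bad key order.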