linux-btrfs.vger.kernel.org archive mirror
* kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
From: Karl Mardoff Kittilsen @ 2011-11-29  1:39 UTC
  To: linux-btrfs

Hi!

Sending a mail on this issue, as advised on IRC.

My /home file system fails to mount and the kernel seems to freeze, and I
need to do the Alt+SysRq RSNEIUB routine to reboot it safely.
The corruption happened on a 3.2-rc<something> kernel and Ubuntu 11.10,
but I am now running on Ubuntu 12.04 with the 3.2.0-2-generic kernel to
see if that helped; it did not.
btrfsck from the latest btrfs-tools returns:

karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
ref mismatch on [2176962560 8192] extent item 480, found 1
Incorrect local backref count on 2176970752 root 5 owner 2101705 offset 
368640 found 1 wanted 3925868545
backpointer mismatch on [2176970752 4096]
found 1322579566593 bytes used err is 1
total csum bytes: 1288573748
total tree bytes: 3057922048
total fs tree bytes: 862068736
btree space waste bytes: 704584583
file data blocks allocated: 18991122972672
  referenced 1361205268480
Btrfs Btrfs v0.19-dirty

The file system is on an md raid1 device. The only thing I have done
recently that might be related is that I wrote a script to run through
all my files and defragment and compress them.
That completed without any errors and I gained about 10% of space :)
This was about 5 days ago; after that I used it like normal without any
problems.
Mount options are "defaults,compression=zlib"

This is the trace from dmesg when I try to mount it:

Nov 29 01:17:30 karl-precise kernel: [  100.963449] ------------[ cut 
here ]------------
Nov 29 01:17:30 karl-precise kernel: [  100.963478] kernel BUG at 
/build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
Nov 29 01:17:30 karl-precise kernel: [  100.963516] invalid opcode: 0000 
[#1] SMP
Nov 29 01:17:30 karl-precise kernel: [  100.963534] CPU 3
Nov 29 01:17:30 karl-precise kernel: [  100.963543] Modules linked in: 
nls_iso8859_1 nls_cp437 vfat fat rfcomm bnep bluetooth parport_pc ppdev 
binfmt_misc snd_hda_codec_hdmi arc4 rt2500usb rt2x00usb rt2x00lib 
mac80211 snd_hda_codec_realtek cfg80211 snd_hda_intel snd_hda_codec 
snd_hwdep snd_pcm snd_seq_midi radeon snd_rawmidi snd_seq_midi_event 
snd_seq psmouse snd_timer snd_seq_device snd ttm sp5100_tco 
drm_kms_helper drm soundcore snd_page_alloc i2c_algo_bit i2c_piix4 
edac_core wmi asus_atk0110 k10temp serio_raw edac_mce_amd lp parport 
raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov 
usb_storage uas usbhid hid raid6_pq async_tx raid0 multipath raid1 
linear pata_atiixp btrfs zlib_deflate firewire_ohci firewire_core 
crc_itu_t r8169 libcrc32c
Nov 29 01:17:30 karl-precise kernel: [  100.963855]
Nov 29 01:17:30 karl-precise kernel: [  100.963862] Pid: 2184, comm: 
mount Not tainted 3.2.0-2-generic #4-Ubuntu System manufacturer System 
Product Name/M4A79T Deluxe
Nov 29 01:17:30 karl-precise kernel: [  100.963908] RIP: 
0010:[<ffffffffa0060ef7>]  [<ffffffffa0060ef7>] 
__btrfs_free_extent+0x617/0x650 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.963958] RSP: 
0018:ffff880404ec9778  EFLAGS: 00010207
Nov 29 01:17:30 karl-precise kernel: [  100.963979] RAX: 
00000000ea000001 RBX: ffff8803e23ce000 RCX: 0000000000000000
Nov 29 01:17:30 karl-precise kernel: [  100.964006] RDX: 
ffff880000000000 RSI: 00000000000007ad RDI: ffff8803e23d0280
Nov 29 01:17:30 karl-precise kernel: [  100.964046] RBP: 
ffff880404ec9838 R08: 00000000000007b1 R09: 0000000000000000
Nov 29 01:17:30 karl-precise kernel: [  100.964078] R10: 
000000000000000d R11: ffff8803dac09840 R12: 000000000000002c
Nov 29 01:17:30 karl-precise kernel: [  100.964109] R13: 
0000000081c1f000 R14: 0000000000001000 R15: 0000000000000000
Nov 29 01:17:30 karl-precise kernel: [  100.964141] FS: 
00007f2290850820(0000) GS:ffff88042fcc0000(0000) knlGS:0000000000000000
Nov 29 01:17:30 karl-precise kernel: [  100.964177] CS:  0010 DS: 0000 
ES: 0000 CR0: 000000008005003b
Nov 29 01:17:30 karl-precise kernel: [  100.964203] CR2: 
00007f641727a000 CR3: 00000003ea2cf000 CR4: 00000000000006e0
Nov 29 01:17:30 karl-precise kernel: [  100.964235] DR0: 
0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 29 01:17:30 karl-precise kernel: [  100.964266] DR3: 
0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 29 01:17:30 karl-precise kernel: [  100.964298] Process mount (pid: 
2184, threadinfo ffff880404ec8000, task ffff8803ea29c530)
Nov 29 01:17:30 karl-precise kernel: [  100.964334] Stack:
Nov 29 01:17:30 karl-precise kernel: [  100.964344]  0000000000000000 
0000000000000005 00000000002011c9 000000000005a000
Nov 29 01:17:30 karl-precise kernel: [  100.964386]  ffff880400000035 
ffff880414f52000 0000000100000001 ffff8803e7a0e800
Nov 29 01:17:30 karl-precise kernel: [  100.964417]  ffff8803e7a0fc00 
ffff8803e23cf000 000000000000077c ffff8803e23d0280
Nov 29 01:17:30 karl-precise kernel: [  100.964449] Call Trace:
Nov 29 01:17:30 karl-precise kernel: [  100.964467] 
[<ffffffffa0061180>] run_delayed_data_ref+0xb0/0x1a0 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.964496] 
[<ffffffff8116087f>] ? kmem_cache_free+0x2f/0x110
Nov 29 01:17:30 karl-precise kernel: [  100.965751] 
[<ffffffffa0064b3e>] run_one_delayed_ref+0x8e/0xf0 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.966996] 
[<ffffffffa0064c74>] run_clustered_refs+0xd4/0x240 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffffa0064eaa>] btrfs_run_delayed_refs+0xca/0x220 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff8165135d>] ? mutex_lock+0x1d/0x50
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffffa008ede6>] ? btrfs_run_ordered_operations+0x1d6/0x1f0 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffffa0074f53>] btrfs_commit_transaction+0x93/0x840 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff81089c50>] ? add_wait_queue+0x60/0x60
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff8116087f>] ? kmem_cache_free+0x2f/0x110
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffffa00a8982>] btrfs_recover_log_trees+0x2d2/0x300 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffffa00a75e0>] ? fixup_inode_link_counts+0x150/0x150 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffffa0073411>] open_ctree+0x1471/0x1920 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff81311d74>] ? snprintf+0x34/0x40
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffffa00c2582>] btrfs_fill_super.isra.38+0x72/0x12c [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff811e1d7a>] ? disk_name+0xba/0xc0
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff8130f397>] ? strlcpy+0x47/0x60
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffffa0052807>] btrfs_mount+0x497/0x4e0 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff81179b43>] mount_fs+0x43/0x1b0
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff811941ba>] vfs_kern_mount+0x6a/0xc0
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff81195664>] do_kern_mount+0x54/0x110
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff811971b4>] do_mount+0x1a4/0x260
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff81197690>] sys_mount+0x90/0xe0
Nov 29 01:17:30 karl-precise kernel: [  100.967397] 
[<ffffffff8165ad02>] system_call_fastpath+0x16/0x1b
Nov 29 01:17:30 karl-precise kernel: [  100.967397] Code: 0f 85 94 fa ff 
ff 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 48 8b 55 c8 48 8b 3b 48 8d 73 40 
e8 98 17 06 00 39 45 20 0f 84 e9 fd ff ff <0f> 0b 0f 0b 89 c6 4c 89 ea 
31 c0 48 c7 c7 48 9d 0c a0 e8 7b 93
Nov 29 01:17:30 karl-precise kernel: [  100.967397] RIP 
[<ffffffffa0060ef7>] __btrfs_free_extent+0x617/0x650 [btrfs]
Nov 29 01:17:30 karl-precise kernel: [  100.967397]  RSP <ffff880404ec9778>
Nov 29 01:17:30 karl-precise kernel: [  101.005914] ---[ end trace 
ae54b272e480df0f ]---
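
A note for reading the trace above: on the Code: line, the byte in angle brackets is the one RIP pointed at. The Python sketch below pulls out that byte and the one after it. The helper name faulting_bytes is made up for this illustration, and the string is the Code: line from the oops with the syslog prefix and line wraps removed.

```python
import re

# Code: line from the 3.2.0-2-generic oops, syslog prefix and wraps removed.
CODE_LINE = (
    "0f 85 94 fa ff ff 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 48 8b 55 c8 48 8b 3b "
    "48 8d 73 40 e8 98 17 06 00 39 45 20 0f 84 e9 fd ff ff <0f> 0b 0f 0b 89 c6 "
    "4c 89 ea 31 c0 48 c7 c7 48 9d 0c a0 e8 7b 93"
)


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the byte marked <xx> (hypothetical helper)."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        marked = re.fullmatch(r"<([0-9a-f]{2})>", tok)
        if marked:
            following = tokens[i + 1 : i + count]
            return bytes.fromhex(marked.group(1) + "".join(following))
    raise ValueError("no <xx> marker found in Code: line")


opcode = faulting_bytes(CODE_LINE)
print(opcode.hex())
```

The two bytes are 0f 0b, the x86 UD2 instruction. That is what BUG() compiles to on x86, which is why the oops is reported as "invalid opcode".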

--------------- After digging through some log files I found the first 
occurrence of this error, with some new log lines -----------

These lines occurred just before the first time the partition became 
unmountable:

Nov 27 23:45:47 karl-workstation kernel: [211390.634303] btrfs csum 
failed ino 3738022 off 1819189248 csum 318166411 private 1787547189
Nov 27 23:45:54 karl-workstation kernel: [211398.556254] btrfs csum 
failed ino 3738022 off 1819189248 csum 2203380165 private 1787547189
Nov 27 23:45:55 karl-workstation kernel: [211398.676454] btrfs csum 
failed ino 3738022 off 1819189248 csum 2203380165 private 1787547189
Nov 27 23:45:55 karl-workstation kernel: [211398.679193] btrfs csum 
failed ino 3738022 off 1819189248 csum 2203380165 private 1787547189
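
The four messages above can be compared mechanically. The sketch below is a rough illustration, not a diagnostic tool: it groups the reported values by inode and offset. It assumes that "csum" is the checksum computed from the data just read and "private" is the stored checksum it was compared against; that reading of the fields is an assumption, not something stated in this thread.

```python
import re

# Message text from the four log lines, syslog prefix removed.
LOG = """\
btrfs csum failed ino 3738022 off 1819189248 csum 318166411 private 1787547189
btrfs csum failed ino 3738022 off 1819189248 csum 2203380165 private 1787547189
btrfs csum failed ino 3738022 off 1819189248 csum 2203380165 private 1787547189
btrfs csum failed ino 3738022 off 1819189248 csum 2203380165 private 1787547189
"""

PATTERN = re.compile(
    r"csum failed ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>\d+) private (?P<private>\d+)"
)


def summarize(log):
    """Collect the distinct csum/private values seen per (inode, offset)."""
    seen = {}
    for m in PATTERN.finditer(log):
        key = (int(m["ino"]), int(m["off"]))
        entry = seen.setdefault(key, {"csum": set(), "private": set()})
        entry["csum"].add(int(m["csum"]))
        entry["private"].add(int(m["private"]))
    return seen


summary = summarize(LOG)
for (ino, off), values in summary.items():
    print(ino, off, sorted(values["csum"]), sorted(values["private"]))
```

All four messages are for the same inode and offset. The "private" value is identical each time, while "csum" takes two different values across the reads.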

And then this:

Nov 28 00:11:14 karl-workstation kernel: [212918.235045] ------------[ 
cut here ]------------
Nov 28 00:11:14 karl-workstation kernel: [212918.235050] kernel BUG at 
/home/apw/COD/linux/fs/btrfs/extent-tree.c:4775!
Nov 28 00:11:14 karl-workstation kernel: [212918.235052] invalid opcode: 
0000 [#1] SMP
Nov 28 00:11:14 karl-workstation kernel: [212918.235054] CPU 0
Nov 28 00:11:14 karl-workstation kernel: [212918.235056] Modules linked 
in: nls_iso8859_1 nls_cp437 vfat fat bnep rfcomm bluetooth 
ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT 
xt_CHECKSUM iptable_mangle xt_tcpudp nfsd iptable_filter lockd ip_tables 
nfs_acl x_tables auth_rpcgss sunrpc bridge stp kvm_amd kvm ppdev 
binfmt_misc arc4 rt2500usb rt2x00usb rt2x00lib mac80211 cfg80211 
snd_hda_codec_hdmi snd_hda_codec_realtek fglrx(P) snd_hda_intel psmouse 
snd_seq_midi snd_hda_codec snd_rawmidi snd_hwdep snd_seq_midi_event 
snd_pcm snd_seq edac_core serio_raw edac_mce_amd k10temp sp5100_tco 
snd_seq_device i2c_piix4 snd_timer asus_atk0110 snd soundcore 
snd_page_alloc wmi lp parport raid10 raid456 async_pq async_xor xor 
async_memcpy async_raid6_recov usb_storage uas usbhid hid raid6_pq 
async_tx raid1 pata_atiixp raid0 firewire_ohci ahci libahci multipath 
firewire_core crc_itu_t linear btrfs r8169 zlib_deflate libcrc32c [last 
unloaded: parport_pc]
Nov 28 00:11:14 karl-workstation kernel: [212918.235092]
Nov 28 00:11:14 karl-workstation kernel: [212918.235094] Pid: 6962, 
comm: btrfs-endio-wri Tainted: P           O 3.2.0-999-generic 
#201111220410 System manufacturer System Product Name/M4A79T Deluxe
Nov 28 00:11:14 karl-workstation kernel: [212918.235098] RIP: 
0010:[<ffffffffa002b910>]  [<ffffffffa002b910>] 
__btrfs_free_extent+0x6c0/0x700 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235117] RSP: 
0018:ffff880380173990  EFLAGS: 00010207
Nov 28 00:11:14 karl-workstation kernel: [212918.235118] RAX: 
00000000ea000001 RBX: ffff880412c3ab40 RCX: ffff880380173900
Nov 28 00:11:14 karl-workstation kernel: [212918.235120] RDX: 
ffff880000000000 RSI: 00000000000007ad RDI: ffff88027db9a8c0
Nov 28 00:11:14 karl-workstation kernel: [212918.235121] RBP: 
ffff880380173a80 R08: 00000000000007b1 R09: ffff8803801738f0
Nov 28 00:11:14 karl-workstation kernel: [212918.235123] R10: 
0000000000000000 R11: 0000000000000000 R12: 000000000000002c
Nov 28 00:11:14 karl-workstation kernel: [212918.235124] R13: 
0000000081c1f000 R14: 0000000000000001 R15: 0000000000000001
Nov 28 00:11:14 karl-workstation kernel: [212918.235126] FS: 
00007fd5b95399c0(0000) GS:ffff88042fc00000(0000) knlGS:00000000f67d8880
Nov 28 00:11:14 karl-workstation kernel: [212918.235127] CS:  0010 DS: 
0000 ES: 0000 CR0: 000000008005003b
Nov 28 00:11:14 karl-workstation kernel: [212918.235129] CR2: 
00007f3a8bbd7000 CR3: 00000003452e1000 CR4: 00000000000006f0
Nov 28 00:11:14 karl-workstation kernel: [212918.235130] DR0: 
0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 28 00:11:14 karl-workstation kernel: [212918.235132] DR3: 
0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 28 00:11:14 karl-workstation kernel: [212918.235133] Process 
btrfs-endio-wri (pid: 6962, threadinfo ffff880380172000, task 
ffff8803f47d16f0)
Nov 28 00:11:14 karl-workstation kernel: [212918.235135] Stack:
Nov 28 00:11:14 karl-workstation kernel: [212918.235136] 
0000000000000000 0000000000000005 00000000002011c9 000000000005a000
Nov 28 00:11:14 karl-workstation kernel: [212918.235138] 
0000160000000000 0000000000000000 0000000200000033 ffff880000000035
Nov 28 00:11:14 karl-workstation kernel: [212918.235140] 
0000000112f78030 ffff8804146ee000 0000000100001000 ffff88041194a000
Nov 28 00:11:14 karl-workstation kernel: [212918.235143] Call Trace:
Nov 28 00:11:14 karl-workstation kernel: [212918.235153] 
[<ffffffffa002bc04>] run_delayed_data_ref+0x154/0x160 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235162] 
[<ffffffffa001a203>] ? leaf_space_used+0xc3/0xf0 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235171] 
[<ffffffffa002bcba>] run_one_delayed_ref+0xaa/0xc0 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235180] 
[<ffffffffa002bd90>] run_clustered_refs+0xc0/0x220 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235189] 
[<ffffffffa002bfba>] btrfs_run_delayed_refs+0xca/0x220 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235193] 
[<ffffffff8160f27e>] ? _raw_spin_lock+0xe/0x20
Nov 28 00:11:14 karl-workstation kernel: [212918.235203] 
[<ffffffffa003b08f>] __btrfs_end_transaction+0xbf/0x250 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235213] 
[<ffffffffa003b295>] btrfs_end_transaction+0x15/0x20 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235223] 
[<ffffffffa00403cb>] btrfs_finish_ordered_io+0x16b/0x340 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235233] 
[<ffffffffa00405f1>] btrfs_writepage_end_io_hook+0x51/0xa0 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235244] 
[<ffffffffa0056c8b>] end_bio_extent_writepage+0x13b/0x180 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235247] 
[<ffffffff8160d66b>] ? schedule_timeout+0x18b/0x2e0
Nov 28 00:11:14 karl-workstation kernel: [212918.235250] 
[<ffffffff811ab9dd>] bio_endio+0x1d/0x40
Nov 28 00:11:14 karl-workstation kernel: [212918.235259] 
[<ffffffffa0034ef4>] end_workqueue_fn+0xf4/0x130 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235269] 
[<ffffffffa0063f8c>] worker_loop+0x15c/0x4c0 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235279] 
[<ffffffffa0063e30>] ? check_pending_worker_creates+0xd0/0xd0 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235283] 
[<ffffffff81088536>] kthread+0x96/0xa0
Nov 28 00:11:14 karl-workstation kernel: [212918.235285] 
[<ffffffff816197f4>] kernel_thread_helper+0x4/0x10
Nov 28 00:11:14 karl-workstation kernel: [212918.235288] 
[<ffffffff810884a0>] ? kthread_worker_fn+0x190/0x190
Nov 28 00:11:14 karl-workstation kernel: [212918.235290] 
[<ffffffff816197f0>] ? gs_change+0x13/0x13
Nov 28 00:11:14 karl-workstation kernel: [212918.235291] Code: 8b bd 70 
ff ff ff e8 00 22 00 00 0f 0b eb fe 48 8b 55 c8 48 8b bd 68 ff ff ff 48 
89 de e8 49 b5 ff ff 39 45 20 0f 84 78 fd ff ff <0f> 0b eb fe 0f 0b eb 
fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe be
Nov 28 00:11:14 karl-workstation kernel: [212918.235309] RIP 
[<ffffffffa002b910>] __btrfs_free_extent+0x6c0/0x700 [btrfs]
Nov 28 00:11:14 karl-workstation kernel: [212918.235317]  RSP 
<ffff880380173990>
Nov 28 00:11:14 karl-workstation kernel: [212918.235320] ---[ end trace 
7c26e4285890c533 ]---

And then I had to reboot the system as it became unresponsive.
If you need any more info I will be more than happy to help out.

Karl M. Kittilsen

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
From: Chris Mason @ 2011-11-29 15:12 UTC
  To: Karl Mardoff Kittilsen; +Cc: linux-btrfs

On Tue, Nov 29, 2011 at 02:39:26AM +0100, Karl Mardoff Kittilsen wrote:
> Hi!
> 
> Sending a mail on this issue, as advised on IRC.
> 
> My /home file system fails to mount and the kernel seems to freeze
> and I need to do the Alt+SysRq RSNEIUB routine to reboot it safely.
> The corruption happened on a 3.2-rc<something> kernel and Ubuntu
> 11.10, but I am now running on Ubuntu 12.04 with the 3.2.0-2-generic
> kernel to see if that helped; it did not.
> btrfsck from the latest btrfs-tools returns:
> 
> karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> ref mismatch on [2176962560 8192] extent item 480, found 1
> Incorrect local backref count on 2176970752 root 5 owner 2101705
> offset 368640 found 1 wanted 3925868545
> backpointer mismatch on [2176970752 4096]

So the crashes below were because we tried to free one of these extents.
You have two extents whose reference counts are way off.

Unfortunately this is stored on disk, so different kernels aren't going
to fix it (yet).  One of the extents is in a file with inode number
2101705, and the other is in a btree block (2176962560).

I'll be able to fix this soon, but we can also make a patch that changes
those BUG_ONs to just deal with the mismatch.  The worst case here would
be leaking those two extents, about 12K of data.

-chris
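
The "about 12K" figure can be checked against the bracketed [bytenr num_bytes] pairs that btrfsck printed. A minimal sketch, with the two relevant lines copied from the report and a made-up helper name (extents):

```python
import re

# The two btrfsck lines that carry a "[bytenr num_bytes]" pair.
BTRFSCK_OUTPUT = """\
ref mismatch on [2176962560 8192] extent item 480, found 1
backpointer mismatch on [2176970752 4096]
"""


def extents(output):
    """Yield (bytenr, num_bytes) for each bracketed pair (hypothetical helper)."""
    for m in re.finditer(r"\[(\d+) (\d+)\]", output):
        yield int(m.group(1)), int(m.group(2))


found = list(extents(BTRFSCK_OUTPUT))
total = sum(size for _, size in found)
print(found, total, total // 1024)
```

The sizes add up to 8192 + 4096 = 12288 bytes, i.e. 12 KiB. The second extent also starts exactly where the first one ends (2176962560 + 8192 = 2176970752).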

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
From: Karl Mardoff Kittilsen @ 2011-11-29 15:29 UTC
  To: Chris Mason; +Cc: linux-btrfs

On 29 Nov 2011 16:12, Chris Mason wrote:
> On Tue, Nov 29, 2011 at 02:39:26AM +0100, Karl Mardoff Kittilsen wrote:
>> Hi!
>>
>> Sending a mail on this issue, as advised on IRC.
>>
>> My /home file system fails to mount and the kernel seems to freeze
>> and I need to do the Alt+SysRq RSNEIUB routine to reboot it safely.
>> The corruption happened on a 3.2-rc<something> kernel and Ubuntu
>> 11.10, but I am now running on Ubuntu 12.04 with the 3.2.0-2-generic
>> kernel to see if that helped; it did not.
>> btrfsck from the latest btrfs-tools returns:
>>
>> karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
>> ref mismatch on [2176962560 8192] extent item 480, found 1
>> Incorrect local backref count on 2176970752 root 5 owner 2101705
>> offset 368640 found 1 wanted 3925868545
>> backpointer mismatch on [2176970752 4096]
>
> So the crashes below were because we tried to free one of these extents.
> You have two extents whose reference counts are way off.
>
> Unfortunately this is stored on disk, so different kernels aren't going
> to fix it (yet).  One of the extents is in a file with inode number
> 2101705, and the other is in a btree block (2176962560).
>
> I'll be able to fix this soon, but we can also make a patch that changes
> those BUG_ONs to just deal with the mismatch.  The worst case here would
> be leaking those two extents, about 12K of data.
>
> -chris

Thank you for looking into it; that does sound really promising. I
am available to test any patches you want tested. Is there anything else
that I can do to help get this issue fixed?

Karl

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
From: Chris Mason @ 2011-11-29 15:49 UTC
  To: Karl Mardoff Kittilsen; +Cc: linux-btrfs

On Tue, Nov 29, 2011 at 04:29:54PM +0100, Karl Mardoff Kittilsen wrote:
> On 29 Nov 2011 16:12, Chris Mason wrote:
> >On Tue, Nov 29, 2011 at 02:39:26AM +0100, Karl Mardoff Kittilsen wrote:
> >>Hi!
> >>
> >>Sending a mail on this issue, as advised on IRC.
> >>
> >>My /home file system fails to mount and the kernel seems to freeze
> >>and I need to do the Alt+SysRq RSNEIUB routine to reboot it safely.
> >>The corruption happened on a 3.2-rc<something> kernel and Ubuntu
> >>11.10, but I am now running on Ubuntu 12.04 with the 3.2.0-2-generic
> >>kernel to see if that helped; it did not.
> >>btrfsck from the latest btrfs-tools returns:
> >>
> >>karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> >>ref mismatch on [2176962560 8192] extent item 480, found 1
> >>Incorrect local backref count on 2176970752 root 5 owner 2101705
> >>offset 368640 found 1 wanted 3925868545
> >>backpointer mismatch on [2176970752 4096]
> >
> >So the crashes below were because we tried to free one of these extents.
> >You have two extents whose reference counts are way off.
> >
> >Unfortunately this is stored on disk, so different kernels aren't going
> >to fix it (yet).  One of the extents is in a file with inode number
> >2101705, and the other is in a btree block (2176962560).
> >
> >I'll be able to fix this soon, but we can also make a patch that changes
> >those BUG_ONs to just deal with the mismatch.  The worst case here would
> >be leaking those two extents, about 12K of data.
> >
> >-chris
> 
> Thank you for looking into it; that does sound really
> promising. I am available to test any patches you want tested. Is
> there anything else that I can do to help get this issue fixed?

The good news about this one is that it is very clear cut.  The hard
part is figuring out where these bogus link counts came from.

I'd suggest that you spend some time running memtest on the machine.

-chris


* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
From: David Sterba @ 2011-11-29 16:47 UTC
  To: Chris Mason, Karl Mardoff Kittilsen, linux-btrfs

On Tue, Nov 29, 2011 at 10:49:13AM -0500, Chris Mason wrote:
> The good news about this one is that it is very clear cut.  The hard
> part is figuring out where these bogus link counts came from.
> 
> I'd suggest that you spend some time running memtest on the machine.

Just to add some evidence from the log:

Nov 28 00:11:14 karl-workstation kernel: [212918.235050] kernel BUG at
/home/apw/COD/linux/fs/btrfs/extent-tree.c:4775!
Nov 28 00:11:14 karl-workstation kernel: [212918.235118] RAX:
00000000ea000001 RBX: ffff880412c3ab40 RCX: ffff880380173900
^^^^^^^^^^^^^^^^

4765                         ret = btrfs_search_slot(trans, extent_root,
4766                                                 &key, path, -1, 1);
4767                         if (ret) {
4768                                 printk(KERN_ERR "umm, got %d back from search"
4769                                        ", was looking for %llu\n", ret,
4770                                        (unsigned long long)bytenr);
4771                                 if (ret > 0)
4772                                         btrfs_print_leaf(extent_root,
4773                                                          path->nodes[0]);
4774                         }
4775                         BUG_ON(ret);

the ret value comes from btrfs_search_slot, returning " < 0" or 1, but
RAX has some extra bits set, this could really be a RAM failure.
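
To make the "extra bits" point concrete, here is a small Python sketch. It assumes that ret was held in the low 32 bits of RAX when the BUG fired; that is an assumption, since the compiler may have kept it elsewhere. It reinterprets the register as a C int and checks whether the result is 0, 1, or a negative errno. MAX_ERRNO = 4095 is the kernel's bound for error returns; the function names are made up for this illustration.

```python
MAX_ERRNO = 4095  # kernel convention: error returns lie in -MAX_ERRNO..-1


def as_int32(value):
    """Reinterpret the low 32 bits of `value` as a signed C int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def plausible_search_slot_result(ret):
    """True for 0 (found), 1 (not found), or a negative errno (hypothetical check)."""
    return ret in (0, 1) or -MAX_ERRNO <= ret < 0


rax = 0x00000000EA000001  # RAX from the oops
ret = as_int32(rax)
print(ret, plausible_search_slot_result(ret))
```

Read as an int, the value is -369098751: nonzero, so BUG_ON(ret) would fire, but far outside the errno range.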


david

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
From: Chris Mason @ 2011-11-29 18:12 UTC
  To: Karl Mardoff Kittilsen, linux-btrfs

On Tue, Nov 29, 2011 at 05:47:46PM +0100, David Sterba wrote:
> On Tue, Nov 29, 2011 at 10:49:13AM -0500, Chris Mason wrote:
> > The good news about this one is that it is very clear cut.  The hard
> > part is figuring out where these bogus link counts came from.
> > 
> > I'd suggest that you spend some time running memtest on the machine.
> 
> Just to add some evidence from the log:
> 
> Nov 28 00:11:14 karl-workstation kernel: [212918.235050] kernel BUG at
> /home/apw/COD/linux/fs/btrfs/extent-tree.c:4775!
> Nov 28 00:11:14 karl-workstation kernel: [212918.235118] RAX:
> 00000000ea000001 RBX: ffff880412c3ab40 RCX: ffff880380173900
> ^^^^^^^^^^^^^^^^
> 
> 4765                         ret = btrfs_search_slot(trans, extent_root,
> 4766                                                 &key, path, -1, 1);
> 4767                         if (ret) {
> 4768                                 printk(KERN_ERR "umm, got %d back from search"
> 4769                                        ", was looking for %llu\n", ret,
> 4770                                        (unsigned long long)bytenr);
> 4771                                 if (ret > 0)
> 4772                                         btrfs_print_leaf(extent_root,
> 4773                                                          path->nodes[0]);
> 4774                         }
> 4775                         BUG_ON(ret);
> 
> the ret value comes from btrfs_search_slot, returning " < 0" or 1, but
> RAX has some extra bits set, this could really be a RAM failure.
> 
> 
> david

Interesting, look at this:

> karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> ref mismatch on [2176962560 8192] extent item 480, found 1
> Incorrect local backref count on 2176970752 root 5 owner 2101705
> offset 368640 found 1 wanted 3925868545
> backpointer mismatch on [2176970752 4096]

3925868545 == EA000001

Are you sure this is the BUG_ON he was triggering?

-chris
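
The same conversion can be applied to a few other hex values in the 3.2.0-2-generic oops. The sketch below compares RAX, R13, R14 and the third and fourth words of the first stack line against the decimal numbers in the btrfsck report. The pairing is only a numeric comparison; which variable each register actually held is not established here, and the dictionary labels are made up for this illustration.

```python
# Hex values as printed in the 3.2.0-2-generic oops.
oops_values = {
    "RAX": 0xEA000001,
    "R13": 0x81C1F000,
    "R14": 0x1000,
    "stack word 3": 0x2011C9,
    "stack word 4": 0x5A000,
}

# Decimal values as printed by btrfsck for extent 2176970752.
btrfsck_values = {
    "wanted": 3925868545,
    "bytenr": 2176970752,
    "num_bytes": 4096,
    "owner": 2101705,
    "offset": 368640,
}

# Pair the two lists in order and report whether each pair is equal.
matches = {}
for (reg, hex_val), (field, dec_val) in zip(oops_values.items(),
                                            btrfsck_values.items()):
    matches[(reg, field)] = hex_val == dec_val
    print(f"{reg:>12} = {hex_val:#x} = {hex_val:<12} {field}: {dec_val}")

print(all(matches.values()))
```

Every pair is equal, so at the point of the BUG the register file and stack held the extent that btrfsck flags (bytenr, size, owner, offset) and its bogus count.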


* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!
From: David Sterba @ 2011-12-15  0:01 UTC
  To: Chris Mason, Karl Mardoff Kittilsen, linux-btrfs

On Tue, Nov 29, 2011 at 01:12:14PM -0500, Chris Mason wrote:
> > Nov 28 00:11:14 karl-workstation kernel: [212918.235050] kernel BUG at
> > /home/apw/COD/linux/fs/btrfs/extent-tree.c:4775!
> > Nov 28 00:11:14 karl-workstation kernel: [212918.235118] RAX:
> > 00000000ea000001 RBX: ffff880412c3ab40 RCX: ffff880380173900
> > ^^^^^^^^^^^^^^^^
> > 
> > 4765                         ret = btrfs_search_slot(trans, extent_root,
> > 4766                                                 &key, path, -1, 1);
> > 4767                         if (ret) {
> > 4768                                 printk(KERN_ERR "umm, got %d back from search"
> > 4769                                        ", was looking for %llu\n", ret,
> > 4770                                        (unsigned long long)bytenr);
> > 4771                                 if (ret > 0)
> > 4772                                         btrfs_print_leaf(extent_root,
> > 4773                                                          path->nodes[0]);
> > 4774                         }
> > 4775                         BUG_ON(ret);
> > 
> > the ret value comes from btrfs_search_slot, returning " < 0" or 1, but
> > RAX has some extra bits set, this could really be a RAM failure.
> 
> Interesting, look at this:
> 
> > karl@karl-precise:~/git/btrfs-progs$ sudo ./btrfsck /dev/md0
> > ref mismatch on [2176962560 8192] extent item 480, found 1
> > Incorrect local backref count on 2176970752 root 5 owner 2101705
> > offset 368640 found 1 wanted 3925868545
> > backpointer mismatch on [2176970752 4096]
> 
> 3925868545 == EA000001

I applied the usual first analysis steps (source line, registers, call
chain). btrfs_search_slot could return 1, and with a memory failure on
top of that the value looks possible, though the bit count of 'EA' is 5,
which seems too high.
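
The bit count can be checked by XOR-ing the value a clean run would have produced (1) with the value observed (0xEA000001) and listing the positions that differ. A minimal sketch; flipped_bits is a made-up helper name:

```python
def flipped_bits(expected, observed):
    """Return the bit positions where `observed` differs from `expected`."""
    diff = expected ^ observed
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


bits = flipped_bits(1, 0xEA000001)
print(len(bits), bits)

# Clearing the top byte of the 32-bit value leaves the expected count.
print(0xEA000001 & 0x00FFFFFF)
```

The differing bits are 25, 27, 29, 30 and 31: five bits, all within the top byte of the 32-bit value, and clearing that byte leaves 1.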

> Are you sure this is the BUG_ON he was triggering?

This was referring to the second BUG_ON in the logs. I checked the first
BUG_ON again and see:

kernel: [  100.963478] kernel BUG at 
/build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:4816!

RAX: 00000000ea000001

4815                 if (iref) {
4816                         BUG_ON(!found_extent);
4817                 } else {
4818                         btrfs_set_extent_refs(leaf, ei, refs);
4819                         btrfs_mark_buffer_dirty(leaf);
4820                 }

found_extent is an int, and it is modified at

4686         int found_extent = 0;

and

4712                         if (key.type == BTRFS_EXTENT_ITEM_KEY &&
4713                             key.offset == num_bytes) {
4714                                 found_extent = 1;
4715                                 break;
4716                         }


This looks like crappy memory as well.

> > offset 368640 found 1 wanted 3925868545
> 3925868545 == EA000001

"found 1 wanted 1"


david
