crash while trying to access corrupt fs

linux-btrfs.vger.kernel.org archive mirror

* crash while trying to access corrupt fs
From: tubalcane @ 2012-08-26 20:07 UTC
  To: linux-btrfs

I'm primarily interested in the block-level checksums of files and in the scrubbing
feature for detecting corrupt files. Currently I use ext4 and create and keep
md5sums of everything, which is tedious, but I care about my data (quadruple
backups, including offsite).

I decided to experiment by copying 7 large video files (900MB total) to a btrfs
test drive and purposely corrupting the 4th file using the instructions here:

https://blogs.oracle.com/wim/entry/btrfs_scrub_go_fix_corruptions

I unmounted, remounted, and ran md5sum on the files; the entire machine locked up when
accessing the 4th file. I rebooted, ran btrfs scrub, and waited for it to finish.
It detected the corruption, but I'm not using RAID, so it couldn't fix it. Then I
tried to access the 4th file again and got another crash. I rebooted again and
reproduced the crash a third time, just to be sure.

I'm running Fedora 17 with kernel 3.5.2; crash info is below. I saved the
btrfs-debug-tree output and can email it if someone wants it (only 21K gzipped).


Aug 25 11:37:24 bubblegum kernel: [ 1183.786267] btrfs csum failed ino 260 off 0 csum 3029581555 private 3057259415
Aug 25 11:37:24 bubblegum kernel: [ 1183.786273] unable to find logical 0 len 0
Aug 25 11:37:24 bubblegum kernel: [ 1183.786297] ------------[ cut here ]------------
Aug 25 11:37:24 bubblegum kernel: [ 1183.787326] kernel BUG at fs/btrfs/volumes.c:3762!
Aug 25 11:37:24 bubblegum kernel: [ 1183.789085] invalid opcode: 0000 [#1] SMP
Aug 25 11:37:24 bubblegum kernel: [ 1183.792003] CPU 6
Aug 25 11:37:24 bubblegum kernel: [ 1183.792008] Modules linked in: btrfs libcrc32c zlib_deflate fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc tpm_bios vhost_net tun macvtap macvlan nfsd coretemp kvm_intel kvm snd_hda_codec_realtek nfs_acl auth_rpcgss lockd snd_hda_intel snd_hda_codec sunrpc lpc_ich mfd_core i7core_edac edac_core i2c_i801 snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore microcode r8169 uinput mii binfmt_misc ata_generic pata_acpi crc32c_intel usb_storage pata_jmicron sata_mv hid_logitech_dj nouveau mxm_wmi wmi video i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
Aug 25 11:37:24 bubblegum kernel: [ 1183.802920]
Aug 25 11:37:24 bubblegum kernel: [ 1183.805100] Pid: 1783, comm: btrfs-endio-1 Not tainted 3.5.2-1.fc17.x86_64 #1 Gigabyte Technology Co., Ltd. P55M-UD2/P55M-UD2
Aug 25 11:37:24 bubblegum kernel: [ 1183.809165] RIP: 0010:[<ffffffffa04ce1e8>]  [<ffffffffa04ce1e8>] __btrfs_map_block+0x678/0x690 [btrfs]
Aug 25 11:37:24 bubblegum kernel: [ 1183.813476] RSP: 0018:ffff8803e5c3fc60  EFLAGS: 00010282
Aug 25 11:37:24 bubblegum kernel: [ 1183.815061] RAX: 000000000000001e RBX: 0000000000000000 RCX: 00000000000000c4
Aug 25 11:37:24 bubblegum kernel: [ 1183.816203] RDX: 000000000000004a RSI: 0000000000000046 RDI: 0000000000000246
Aug 25 11:37:24 bubblegum kernel: [ 1183.817347] RBP: ffff8803e5c3fd00 R08: 0000000000000449 R09: 0000000000000000
Aug 25 11:37:24 bubblegum kernel: [ 1183.818748] R10: 0000000000000000 R11: 0000000000040000 R12: ffff88040109e108
Aug 25 11:37:24 bubblegum kernel: [ 1183.819904] R13: ffff8803f4e54010 R14: 0000000000000fff R15: ffff8803e5c3fd10
Aug 25 11:37:24 bubblegum kernel: [ 1183.821067] FS:  0000000000000000(0000) GS:ffff88041fd80000(0000) knlGS:0000000000000000
Aug 25 11:37:24 bubblegum kernel: [ 1183.822236] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 25 11:37:24 bubblegum kernel: [ 1183.823411] CR2: 0000003b66b47090 CR3: 0000000001c0b000 CR4: 00000000000007e0
Aug 25 11:37:24 bubblegum kernel: [ 1183.824594] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 25 11:37:24 bubblegum kernel: [ 1183.825783] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 25 11:37:24 bubblegum kernel: [ 1183.826974] Process btrfs-endio-1 (pid: 1783, threadinfo ffff8803e5c3e000, task ffff8803e5c10000)
Aug 25 11:37:24 bubblegum kernel: [ 1183.828168] Stack:
Aug 25 11:37:24 bubblegum kernel: [ 1183.829363]  ffff8803f37c2c00 0000000000001000 ffff8803e5c3fcd0 ffffffff81602828
Aug 25 11:37:24 bubblegum kernel: [ 1183.830579]  ffff8803e5c3fcc0 0000000000000028 ffff8803e5c3fce0 ffff8803e5c3fca0
Aug 25 11:37:24 bubblegum kernel: [ 1183.831796]  ffff88034b6c410c ffff8803e5c3fd18 0000000000000000 00000000b493bef3
Aug 25 11:37:24 bubblegum kernel: [ 1183.833013] Call Trace:
Aug 25 11:37:24 bubblegum kernel: [ 1183.834243]  [<ffffffff81602828>] ? printk+0x61/0x63
Aug 25 11:37:24 bubblegum kernel: [ 1183.835479]  [<ffffffffa04d344a>] btrfs_find_device_for_logical+0x4a/0xa0 [btrfs]
Aug 25 11:37:24 bubblegum kernel: [ 1183.836717]  [<ffffffffa04c6955>] end_bio_extent_readpage+0x105/0xa80 [btrfs]
Aug 25 11:37:24 bubblegum kernel: [ 1183.837938]  [<ffffffff81173569>] ? kfree+0x139/0x160
Aug 25 11:37:24 bubblegum kernel: [ 1183.839157]  [<ffffffff811baaad>] bio_endio+0x1d/0x40
Aug 25 11:37:24 bubblegum kernel: [ 1183.840395]  [<ffffffffa049be81>] end_workqueue_fn+0x41/0x50 [btrfs]
Aug 25 11:37:24 bubblegum kernel: [ 1183.841635]  [<ffffffffa04d4d46>] worker_loop+0x136/0x580 [btrfs]
Aug 25 11:37:24 bubblegum kernel: [ 1183.842876]  [<ffffffffa04d4c10>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
Aug 25 11:37:24 bubblegum kernel: [ 1183.844093]  [<ffffffff8107b4e3>] kthread+0x93/0xa0
Aug 25 11:37:24 bubblegum kernel: [ 1183.845309]  [<ffffffff81615be4>] kernel_thread_helper+0x4/0x10
Aug 25 11:37:24 bubblegum kernel: [ 1183.846522]  [<ffffffff8107b450>] ? flush_kthread_worker+0x80/0x80
Aug 25 11:37:24 bubblegum kernel: [ 1183.847741]  [<ffffffff81615be0>] ? gs_change+0x13/0x13
Aug 25 11:37:24 bubblegum kernel: [ 1183.848952] Code: e6 89 c7 eb a4 0f 0b c7 45 c4 01 00 00 00 31 db e9 06 fd ff ff 0f 0b 49 8b 17 48 89 de 48 c7 c7 e8 92 50 a0 31 c0 e8 df 45 13 e1 <0f> 0b 0f 0b 89 df e9 73 ff ff ff 66 66 66 66 2e 0f 1f 84 00 00
Aug 25 11:37:24 bubblegum kernel: [ 1183.850358] RIP  [<ffffffffa04ce1e8>] __btrfs_map_block+0x678/0x690 [btrfs]
Aug 25 11:37:24 bubblegum kernel: [ 1183.851677]  RSP <ffff8803e5c3fc60>
Aug 25 11:37:24 bubblegum kernel: [ 1183.890781] ---[ end trace afd1a418cb384dde ]---
Aug 25 11:37:25 bubblegum sh[676]: abrt-dump-oops: Found oopses: 1
Aug 25 11:37:25 bubblegum sh[676]: abrt-dump-oops: Creating dump directories
Aug 25 11:37:25 bubblegum abrtd: Directory 'oops-2012-08-25-11:37:25-1856-0' creation detected
Aug 25 11:37:25 bubblegum abrt-dump-oops: Reported 1 kernel oopses to Abrt
Aug 25 11:37:25 bubblegum abrtd: Can't open file '/var/spool/abrt/oops-2012-08-25-11:37:25-1856-0/uid': No such file or directory
Aug 25 11:37:25 bubblegum abrtd: New problem directory /var/spool/abrt/oops-2012-08-25-11:37:25-1856-0, processing
Aug 25 11:37:25 bubblegum abrtd: Can't open file '/var/spool/abrt/oops-2012-08-25-11:37:25-1856-0/uid': No such file or directory
Aug 25 11:37:46 bubblegum dbus-daemon[791]: ** Message: No devices in use, exit



* Re: crash while trying to access corrupt fs
From: Stefan Behrens @ 2012-08-27 11:12 UTC
  To: tubalcane; +Cc: linux-btrfs

On Sun, 26 Aug 2012 16:07:33 -0400 (EDT), tubalcane wrote:
> I'm primarily interested in the block level checksums of files and the
> scrubbing feature to detect corrupt files.  Currently I use ext4 and
> create and keep md5sums of everything which is tedious but I care about
> my data (quadruple backups including offsite)
> 
[...]
> Aug 25 11:37:24 bubblegum kernel: [ 1183.786267] btrfs csum failed ino 260 off 0 csum 3029581555 private 3057259415
> Aug 25 11:37:24 bubblegum kernel: [ 1183.786273] unable to find logical 0 len 0
> Aug 25 11:37:24 bubblegum kernel: [ 1183.786297] ------------[ cut here ]------------
> Aug 25 11:37:24 bubblegum kernel: [ 1183.787326] kernel BUG at fs/btrfs/volumes.c:3762!
[...]
> Aug 25 11:37:24 bubblegum kernel: [ 1183.833013] Call Trace:
> Aug 25 11:37:24 bubblegum kernel: [ 1183.834243]  [<ffffffff81602828>] ? printk+0x61/0x63
> Aug 25 11:37:24 bubblegum kernel: [ 1183.835479]  [<ffffffffa04d344a>] btrfs_find_device_for_logical+0x4a/0xa0 [btrfs]
> Aug 25 11:37:24 bubblegum kernel: [ 1183.836717]  [<ffffffffa04c6955>] end_bio_extent_readpage+0x105/0xa80 [btrfs]
> Aug 25 11:37:24 bubblegum kernel: [ 1183.837938]  [<ffffffff81173569>] ? kfree+0x139/0x160
> Aug 25 11:37:24 bubblegum kernel: [ 1183.839157]  [<ffffffff811baaad>] bio_endio+0x1d/0x40
> Aug 25 11:37:24 bubblegum kernel: [ 1183.840395]  [<ffffffffa049be81>] end_workqueue_fn+0x41/0x50 [btrfs]
> Aug 25 11:37:24 bubblegum kernel: [ 1183.841635]  [<ffffffffa04d4d46>] worker_loop+0x136/0x580 [btrfs]

That crash is a bug which I have introduced with the IO error stats. It can happen after checksum errors are detected.
I'll send a patch to (temporarily) remove the counting for checksum errors in the IO error stats.


* Re: crash while trying to access corrupt fs
From: Liu Bo @ 2012-08-27 15:31 UTC
  To: Stefan Behrens; +Cc: tubalcane, linux-btrfs

On 08/27/2012 07:12 PM, Stefan Behrens wrote:
> On Sun, 26 Aug 2012 16:07:33 -0400 (EDT), tubalcane wrote:
>> I'm primarily interested in the block level checksums of files and the
>> scrubbing feature to detect corrupt files.  Currently I use ext4 and
>> create and keep md5sums of everything which is tedious but I care about
>> my data (quadruple backups including offsite)
>>
[...]
>> Aug 25 11:37:24 bubblegum kernel: [ 1183.835479]  [<ffffffffa04d344a>] btrfs_find_device_for_logical+0x4a/0xa0 [btrfs]
>> Aug 25 11:37:24 bubblegum kernel: [ 1183.836717]  [<ffffffffa04c6955>] end_bio_extent_readpage+0x105/0xa80 [btrfs]
>> Aug 25 11:37:24 bubblegum kernel: [ 1183.837938]  [<ffffffff81173569>] ? kfree+0x139/0x160
>> Aug 25 11:37:24 bubblegum kernel: [ 1183.839157]  [<ffffffff811baaad>] bio_endio+0x1d/0x40
>> Aug 25 11:37:24 bubblegum kernel: [ 1183.840395]  [<ffffffffa049be81>] end_workqueue_fn+0x41/0x50 [btrfs]
>> Aug 25 11:37:24 bubblegum kernel: [ 1183.841635]  [<ffffffffa04d4d46>] worker_loop+0x136/0x580 [btrfs]
> 
> That crash is a bug which I have introduced with the IO error stats. It can happen after checksum errors are detected.
> I'll send a patch to (temporarily) remove the counting for checksum errors in the IO error stats.

Just out of curiosity, isn't it fixable due to your design, Stefan?
Why not try to fix the bug?

thanks,
liubo


* Re: crash while trying to access corrupt fs
From: Stefan Behrens @ 2012-08-27 16:12 UTC
  To: Liu Bo; +Cc: tubalcane, linux-btrfs

On Mon, 27 Aug 2012 23:31:41 +0800, Liu Bo wrote:
> On 08/27/2012 07:12 PM, Stefan Behrens wrote:
>> On Sun, 26 Aug 2012 16:07:33 -0400 (EDT), tubalcane wrote:
>>> I'm primarily interested in the block level checksums of files and the
>>> scrubbing feature to detect corrupt files.  Currently I use ext4 and
>>> create and keep md5sums of everything which is tedious but I care about
>>> my data (quadruple backups including offsite)
>>>
> [...]
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.835479]  [<ffffffffa04d344a>] btrfs_find_device_for_logical+0x4a/0xa0 [btrfs]
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.836717]  [<ffffffffa04c6955>] end_bio_extent_readpage+0x105/0xa80 [btrfs]
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.837938]  [<ffffffff81173569>] ? kfree+0x139/0x160
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.839157]  [<ffffffff811baaad>] bio_endio+0x1d/0x40
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.840395]  [<ffffffffa049be81>] end_workqueue_fn+0x41/0x50 [btrfs]
>>> Aug 25 11:37:24 bubblegum kernel: [ 1183.841635]  [<ffffffffa04d4d46>] worker_loop+0x136/0x580 [btrfs]
>>
>> That crash is a bug which I have introduced with the IO error stats. It can happen after checksum errors are detected.
>> I'll send a patch to (temporarily) remove the counting for checksum errors in the IO error stats.
> 
> Just out of curiosity, isn't it fixable due to your design, Stefan?
> Why not try to fix the bug?

Yes, it is fixable. But it is complicated (and a potential source of new bugs),
and I wanted to quickly prevent any further harm caused by this bug. People
who hit that bug get a kernel crash whenever they access the corrupted
part of the filesystem.

The right btrfs_device pointer is needed in order to find the statistics
counters to increment. One would need to take some code from
bio_readpage_error() and some code from repair_io_failure() to retrieve
the btrfs_device pointer, and that would be a rather large amount of
additional code. But maybe I am just not seeing the simple way to do it.
Any simple solution would be appreciated.
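Stefan's point is that an error can only be counted once the owning btrfs_device is known. The minimal C sketch below models that dependency with hypothetical stand-in types (sketch_device, sketch_chunk) and a hypothetical lookup, sketch_find_device(); it is not the btrfs API and not the patch Stefan proposes. It shows one defensive option under those assumptions: when a logical address cannot be mapped to a device, skip the statistic and log it, rather than treating the failed lookup as fatal, which is what the crash log in the report suggests happened ("unable to find logical 0 len 0" followed by "kernel BUG at fs/btrfs/volumes.c:3762!").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins; NOT the real btrfs
 * structures (struct btrfs_device, extent maps, etc.). */
struct sketch_device {
    const char *name;
    unsigned long corruption_errs; /* per-device error statistic */
};

struct sketch_chunk {
    uint64_t logical_start; /* start of the logical range */
    uint64_t length;        /* length of the logical range */
    struct sketch_device *dev;
};

/* Map a logical address to the device holding it, or NULL if no chunk
 * covers it. Judging by the log, the crashing kernel path had no such
 * NULL case: it printed "unable to find logical ..." and hit BUG(). */
static struct sketch_device *sketch_find_device(const struct sketch_chunk *map,
                                                size_t n, uint64_t logical)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (logical >= map[i].logical_start &&
            logical - map[i].logical_start < map[i].length)
            return map[i].dev;
    return NULL;
}

/* Count a checksum error against the owning device when it can be
 * resolved; otherwise skip the statistic instead of crashing. */
static bool sketch_count_csum_error(const struct sketch_chunk *map, size_t n,
                                    uint64_t logical)
{
    struct sketch_device *dev = sketch_find_device(map, n, logical);
    if (dev == NULL) {
        fprintf(stderr, "cannot map logical %llu; error not counted\n",
                (unsigned long long)logical);
        return false;
    }
    dev->corruption_errs++;
    return true;
}
```

The trade-off: an error whose address cannot be mapped simply goes uncounted, and the sketch does nothing about why the lookup was handed logical 0 in the first place. Retrieving the correct device pointer, as Stefan describes, would still be the proper fix.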



Thread overview: 4+ messages
2012-08-26 20:07 crash while trying to access corrupt fs tubalcane
2012-08-27 11:12 ` Stefan Behrens
2012-08-27 15:31   ` Liu Bo
2012-08-27 16:12     ` Stefan Behrens
