* btrfs volume corrupt. btrfs-progs bug or need to rebuild volume?
From: Rosen Penev @ 2018-01-19 21:45 UTC
To: linux-btrfs

v2: Add proper subject

I've been playing around with a specific kernel on a specific device
trying to figure out why btrfs keeps throwing csum errors after ~15
hours. I've almost nailed it down to some specific CONFIG option in
the kernel, possibly related to IRQs.

Anyway, I managed to get my btrfs RAID5 array corrupted to the point
where it will just mount to read-only mode. btrfs check doesn't seem
to work either. Here's some output.

root@LEDE:~# btrfs check /dev/sda
Checking filesystem on /dev/sda
UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
Csum didn't match
ERROR: failed to repair root items: I/O error

root@LEDE:~# btrfs check --init-extent-tree /dev/sda
Checking filesystem on /dev/sda
UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
Creating a new extent tree
Failed to find [3174144425984, 168, 16384]
btrfs unable to find ref byte nr 3174347603968 parent 0 root 1 owner 1 offset 0
Failed to find [3174144475136, 168, 16384]
btrfs unable to find ref byte nr 3174444449792 parent 0 root 1 owner 0 offset 1
Failed to find [3174144507904, 168, 16384]
btrfs unable to find ref byte nr 3174631505920 parent 0 root 1 owner 0 offset 1
checking extents
cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
Aborted

root@LEDE:~# btrfs check --init-csum-tree /dev/sda
Creating a new CRC tree
Checking filesystem on /dev/sda
UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
Reinitialize checksum tree
Fixed 0 roots.
checking extents
cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
Aborted

This is with version 4.14 of btrfs-progs. Do I need a newer version or
should I just reinitialize my array and copy everything back?

Log on mount attached below:

Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.739242] BTRFS info (device sda): disk space caching is enabled
Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.752038] BTRFS info (device sda): has skinny extents
Fri Jan 19 14:26:04 2018 kern.info kernel: [168380.493600] BTRFS info (device sda): continuing balance
Fri Jan 19 14:26:07 2018 kern.info kernel: [168382.691771] BTRFS info (device sda): relocating block group 3295510790144 flags 129
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.028958] BTRFS warning (device sda): sda checksum verify failed on 3174631424000 wanted 2658452A found 6F04F3FC level 0
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.052699] BTRFS warning (device sda): sda checksum verify failed on 3174631424000 wanted 2658452A found 6F04F3FC level 0
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.087279] BTRFS warning (device sda): sda checksum verify failed on 3174631424000 wanted 2658452A found 6F04F3FC level 0
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.110017] ------------[ cut here ]------------
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.119950] WARNING: CPU: 0 PID: 2496 at fs/btrfs/extent-tree.c:6958 btrfs_lookup_block_group+0x1438/0x1f74 [btrfs]
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.120096] BTRFS warning (device sda): sda checksum verify failed on 3174631424000 wanted 2658452A found 6F04F3FC level 0
Fri Jan 19 14:26:07 2018 kern.crit kernel: [168383.120189] BTRFS: error (device sda) in __btrfs_free_extent:6958: errno=-5 IO failure
Fri Jan 19 14:26:07 2018 kern.info kernel: [168383.120197] BTRFS info (device sda): forced readonly
Fri Jan 19 14:26:07 2018 kern.crit kernel: [168383.120214] BTRFS: error (device sda) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
Fri Jan 19 14:26:07 2018 kern.debug kernel: [168383.207466] BTRFS: Transaction aborted (error -5)
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.217230] Modules linked in: snd_usb_audio nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG snd_usbmidi_lib nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c iptable_mangle iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables snd_compress snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_rawmidi snd_seq_device snd_hwdep snd soundcore cifs sha256_generic md5 md4 hmac ecb des_generic usb_storage leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd ahci libahci libata sd_mod
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.361711] scsi_mod gpio_button_hotplug btrfs xor raid6_pq usbcore nls_base usb_common crc32c_generic
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.378239] CPU: 0 PID: 2496 Comm: kworker/u8:2 Tainted: G W 4.9.75 #0
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.394206] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.408183] Stack : 8b3b8200 804c0000 8045bc04 8f7d359c 00000009 00001b2e 8ed29270 00000000
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.425374] 8f673800 8006b9c8 8045bc04 00000000 000009c0 80523824 8045bb70 8c6b3b24
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.442564] 804c0000 800a8670 00000001 80520000 804c9ec4 804c9ec8 80460810 8c6b3b24
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.459753] 804c0000 8004334c 8ed29270 8c6b3b5c 000005ae 00000000 00000006 006b3b44
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.476942] 8f7777ac 8fe2e400 8fe2eb00 66727462 78652d73 746e6574 6665722d 00000073
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.494132] ...
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.499272] Call Trace:
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.504435] [<8000f814>] show_stack+0x54/0x88
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.513472] [<801da9cc>] dump_stack+0x8c/0xd0
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.522505] [<8002bdc4>] __warn+0xe4/0x118
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.531005] [<8002be28>] warn_slowpath_fmt+0x30/0x3c
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.541343] [<8f716adc>] btrfs_lookup_block_group+0x1438/0x1f74 [btrfs]
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.555109] ---[ end trace d625fb7e6ea3d882 ]---
Fri Jan 19 14:26:08 2018 kern.crit kernel: [168383.564700] BTRFS: error (device sda) in __btrfs_free_extent:6958: errno=-5 IO failure
Fri Jan 19 14:26:08 2018 kern.crit kernel: [168383.581024] BTRFS: error (device sda) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
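All of the "checksum verify failed" lines in the check output and in the kernel log name the same tree block. As a reading aid, the sketch below tallies such failures per block. It assumes only the two message layouts seen in this thread ("found ... wanted ..." from btrfs-progs, "wanted ... found ..." from the kernel), and the function name summarize_csum_failures is made up for illustration; it is not part of btrfs-progs.

```shell
#!/bin/sh
# Tally "checksum verify failed" messages per tree block.
# Reads btrfs check output or kernel log text on stdin.
# summarize_csum_failures is a hypothetical helper name.
summarize_csum_failures() {
    awk '
    /checksum verify failed on/ {
        block = ""; found = ""; wanted = ""
        # Each value follows its keyword ("on", "found", "wanted"),
        # whichever order the message prints them in.
        for (i = 1; i < NF; i++) {
            if ($i == "on")     block  = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
            if ($i == "wanted") wanted = $(i + 1)
        }
        count[block " found=" found " wanted=" wanted]++
    }
    END {
        # One line per distinct failure: <count> <block> found=<csum> wanted=<csum>
        for (key in count) print count[key], key
    }'
}
```

Something like "btrfs check /dev/sda 2>&1 | summarize_csum_failures" would feed it the check output. The 2>&1 is there on the assumption that some of the messages go to stderr.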
* Re: btrfs volume corrupt. btrfs-progs bug or need to rebuild volume?
From: Duncan @ 2018-01-20 6:32 UTC
To: linux-btrfs

Rosen Penev posted on Fri, 19 Jan 2018 13:45:35 -0800 as excerpted:

> v2: Add proper subject

=:^)

> I've been playing around with a specific kernel on a specific device
> trying to figure out why btrfs keeps throwing csum errors after ~15
> hours. I've almost nailed it down to some specific CONFIG option in the
> kernel, possibly related to IRQs.
>
> Anyway, I managed to get my btrfs RAID5 array corrupted to the point
> where it will just mount to read-only mode.

[...]

> This is with version 4.14 of btrfs-progs. Do I need a newer version or
> should I just reinitialize my array and copy everything back?
>
> Log on mount attached below:

[...]

> Fri Jan 19 14:26:08 2018 kern.warn kernel:
> [168383.378239] CPU: 0 PID:
> 2496 Comm: kworker/u8:2 Tainted: G W 4.9.75 #0

Tho 4.9, as the penultimate LTS kernel series, is still on the btrfs-list
supported list in general, it still had known btrfs raid56 mode issues
and is strongly recommended against for use with raid56 mode. Those
weren't fixed until 4.12, which /finally/ brought raid56 mode into a
generally working state that is no longer recommended against.

While, as an LTS, 4.9 would get applicable general btrfs bug fixes
backported, raid56 mode had never worked /well/ at that point, so I'm
not sure those fixes were backported.

So for btrfs raid56 mode you really need kernel 4.12+, presumably the
LTS 4.14 series since you're on the LTS 4.9 series now. If you plan on
staying with the 4.9 LTS, don't use raid56 mode, as it still had severe
known issues back then and I haven't seen on-list confirmation that the
4.12 btrfs raid56 mode fixes were backported to 4.9-LTS.
If you need/choose to stick with 4.9 and dump raid56 mode, the
recommended alternative depends on the number of devices in the
filesystem.

For a small number of devices, btrfs raid1 is effectively as stable as
the still stabilizing and maturing btrfs itself is at this point, and is
recommended.

For a larger number of devices, btrfs raid1 is still a good choice
because it /is/ the most mature. btrfs raid10 is /reasonably/ stable
too, tho IMO not quite as stable as raid1. Or, for better performance
(btrfs raid10 isn't read-optimized yet) while keeping btrfs checksumming
and error repair from the second copy when available, consider a layered
approach, with btrfs raid1 on top of a pair of mdraid0s (or dmraid0s, or
hardware raid0s).

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
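The 4.12 cutoff named in this message can be checked mechanically. A minimal sketch, assuming a release string of the form major.minor[.patch][-suffix] as printed by uname -r. The helper name kernel_at_least is hypothetical, and 4.12 is simply the version given above, not something verified here.

```shell
#!/bin/sh
# Succeed if kernel release $1 is at least major.minor $2.
# kernel_at_least is a hypothetical helper name.
kernel_at_least() {
    have_major=${1%%.*}            # "4.9.75"  -> "4"
    rest=${1#*.}                   # "4.9.75"  -> "9.75"
    have_minor=${rest%%[!0-9]*}    # "9.75"    -> "9" (stop at first non-digit)
    want_major=${2%%.*}            # "4.12"    -> "4"
    want_minor=${2#*.}             # "4.12"    -> "12"
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] &&
          [ "$have_minor" -ge "$want_minor" ]; }
}

# Example: warn when the running kernel predates 4.12.
if ! kernel_at_least "$(uname -r)" 4.12; then
    echo "kernel $(uname -r) predates 4.12: raid56 mode is advised against here"
fi
```

Comparing the numeric fields one by one avoids relying on sort -V, which a small embedded userland may not provide.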
* Re: btrfs volume corrupt. btrfs-progs bug or need to rebuild volume?
From: Qu Wenruo @ 2018-01-21 9:53 UTC
To: Rosen Penev, linux-btrfs

On 2018-01-20 05:45, Rosen Penev wrote:
> v2: Add proper subject
>
> I've been playing around with a specific kernel on a specific device
> trying to figure out why btrfs keeps throwing csum errors after ~15
> hours. I've almost nailed it down to some specific CONFIG option in
> the kernel, possibly related to IRQs.

According to the hostname, it seems to be LEDE (or should be called
OpenWRT soon?).
Using btrfs in an embedded environment is really interesting to see.

>
> Anyway, I managed to get my btrfs RAID5 array corrupted to the point
> where it will just mount to read-only mode. btrfs check doesn't seem
> to work either. Here's some output.

So it's not fatally corrupted. If the data matters, mount it RO and grab
whatever you can get.

>
> root@LEDE:~# btrfs check /dev/sda
> Checking filesystem on /dev/sda
> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
> Csum didn't match
> ERROR: failed to repair root items: I/O error

IIRC btrfs-progs doesn't handle RAID5/6 repair well, so if something
goes wrong btrfs-progs just gives up.

So don't expect too much when using btrfs-progs with RAID5/6.

>
> root@LEDE:~# btrfs check --init-extent-tree /dev/sda
> Checking filesystem on /dev/sda
> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
> Creating a new extent tree
> Failed to find [3174144425984, 168, 16384]
> btrfs unable to find ref byte nr 3174347603968 parent 0 root 1 owner 1 offset 0
> Failed to find [3174144475136, 168, 16384]
> btrfs unable to find ref byte nr 3174444449792 parent 0 root 1 owner 0 offset 1
> Failed to find [3174144507904, 168, 16384]
> btrfs unable to find ref byte nr 3174631505920 parent 0 root 1 owner 0 offset 1
> checking extents
> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
> Aborted

You're calling one of the most dangerous operations.
It's fortunate that it just aborted before causing more damage.

>
> root@LEDE:~# btrfs check --init-csum-tree /dev/sda
> Creating a new CRC tree
> Checking filesystem on /dev/sda
> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
> Reinitialize checksum tree
> Fixed 0 roots.
> checking extents
> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
> Aborted
>
> This is with version 4.14 of btrfs-progs. Do I need a newer version or
> should I just reinitialize my array and copy everything back?
>
> Log on mount attached below:
>
> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.739242] BTRFS info
> (device sda): disk space caching is enabled
> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.752038] BTRFS info
> (device sda): has skinny extents
> Fri Jan 19 14:26:04 2018 kern.info kernel: [168380.493600] BTRFS info
> (device sda): continuing balance

It seems to be a problem relocating the chunk.

Try the 'skip_balance' mount option to see if it allows you to mount it
RW.

If it doesn't work, and since btrfs-progs won't help much in such a
case, rebuilding seems to be your only option.
Thanks,
Qu

[...]
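The two suggestions in this reply (try skip_balance, otherwise mount RO and salvage) translate into mount invocations roughly as follows. This is a sketch only: skip_balance and ro are documented mount options, but the helper names run and mount_without_balance, the DRY_RUN switch and the /mnt mount point in the usage note are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Try a read-write mount that does not resume the interrupted balance.
# If that fails, fall back to a read-only mount for salvaging data.
mount_without_balance() {
    dev=$1
    mnt=$2
    run mount -o skip_balance "$dev" "$mnt" ||
        run mount -o ro "$dev" "$mnt"
}
```

With DRY_RUN=1 set, "mount_without_balance /dev/sda /mnt" only prints the first mount command, which makes it easy to review before running it for real. If the read-write mount succeeds, the paused balance is still recorded on the filesystem; "btrfs balance cancel" on the mount point is the documented way to drop it.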
* Re: btrfs volume corrupt. btrfs-progs bug or need to rebuild volume?
From: Rosen Penev @ 2018-01-21 20:33 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Sun, Jan 21, 2018 at 1:53 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2018-01-20 05:45, Rosen Penev wrote:
>> v2: Add proper subject
>>
>> I've been playing around with a specific kernel on a specific device
>> trying to figure out why btrfs keeps throwing csum errors after ~15
>> hours. I've almost nailed it down to some specific CONFIG option in
>> the kernel, possibly related to IRQs.
>
> According to the hostname, it seems to be LEDE (or should be called
> OpenWRT soon?).
> Using btrfs in an embedded environment is really interesting to see.
>
The issue that was causing the corruption seems to have been fixed in
4.9.75. The particular device is using router hardware (mt7621), except
that instead of using the PCIe lanes for wireless controllers, it has
ASMedia SATA controllers. Slow, but it seems to work.
>>
>> Anyway, I managed to get my btrfs RAID5 array corrupted to the point
>> where it will just mount to read-only mode. btrfs check doesn't seem
>> to work either. Here's some output.
>
> So it's not fatally corrupted. If the data matters, mount it RO and grab
> whatever you can get.
>
Funny story about that. On access, it locks up the entire shell, making
me unable to do anything. However, Samba actually works. A lot of the
data that was on the array was corrupted, but I did manage to grab some
stuff.
>>
>> root@LEDE:~# btrfs check /dev/sda
>> Checking filesystem on /dev/sda
>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>> Csum didn't match
>> ERROR: failed to repair root items: I/O error
>
> IIRC btrfs-progs doesn't handle RAID5/6 repair well, so if something
> goes wrong btrfs-progs just gives up.
>
> So don't expect too much when using btrfs-progs with RAID5/6.
>
Duly noted. I/O error is strange since the hardware is fine...
>>
>> root@LEDE:~# btrfs check --init-extent-tree /dev/sda
>> Checking filesystem on /dev/sda
>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>> Creating a new extent tree
>> Failed to find [3174144425984, 168, 16384]
>> btrfs unable to find ref byte nr 3174347603968 parent 0 root 1 owner 1 offset 0
>> Failed to find [3174144475136, 168, 16384]
>> btrfs unable to find ref byte nr 3174444449792 parent 0 root 1 owner 0 offset 1
>> Failed to find [3174144507904, 168, 16384]
>> btrfs unable to find ref byte nr 3174631505920 parent 0 root 1 owner 0 offset 1
>> checking extents
>> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
>> Aborted
>
> You're calling one of the most dangerous operations.
> It's fortunate that it just aborted before causing more damage.
>
Didn't realize this option was dangerous. Guess I should have read the
man pages...
>>
>> root@LEDE:~# btrfs check --init-csum-tree /dev/sda
>> Creating a new CRC tree
>> Checking filesystem on /dev/sda
>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>> Reinitialize checksum tree
>> Fixed 0 roots.
>> checking extents
>> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
>> Aborted
>>
>> This is with version 4.14 of btrfs-progs. Do I need a newer version or
>> should I just reinitialize my array and copy everything back?
>>
>> Log on mount attached below:
>>
>> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.739242] BTRFS info
>> (device sda): disk space caching is enabled
>> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.752038] BTRFS info
>> (device sda): has skinny extents
>> Fri Jan 19 14:26:04 2018 kern.info kernel: [168380.493600] BTRFS info
>> (device sda): continuing balance
>
> It seems to be a problem relocating the chunk.
>
> Try the 'skip_balance' mount option to see if it allows you to mount it
> RW.
>
> If it doesn't work, and since btrfs-progs won't help much in such a
> case, rebuilding seems to be your only option.
>

Ended up rebuilding. It seems userspace (and maybe kernel?) is getting
proper data now from the drives so btrfs is not detecting silent data
corruption and trying to deal with it.

> Thanks,
> Qu

[...]
* Re: btrfs volume corrupt. btrfs-progs bug or need to rebuild volume?
From: Qu Wenruo @ 2018-01-22 0:41 UTC
To: Rosen Penev; +Cc: linux-btrfs

On 2018-01-22 04:33, Rosen Penev wrote:
> On Sun, Jan 21, 2018 at 1:53 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2018-01-20 05:45, Rosen Penev wrote:
>>> v2: Add proper subject
>>>
>>> I've been playing around with a specific kernel on a specific device
>>> trying to figure out why btrfs keeps throwing csum errors after ~15
>>> hours. I've almost nailed it down to some specific CONFIG option in
>>> the kernel, possibly related to IRQs.
>>
>> According to the hostname, it seems to be LEDE (or should be called
>> OpenWRT soon?).
>> Using btrfs in an embedded environment is really interesting to see.
>>
> The issue that was causing the corruption seems to have been fixed in
> 4.9.75. The particular device is using router hardware (mt7621), except
> that instead of using the PCIe lanes for wireless controllers, it has
> ASMedia SATA controllers. Slow, but it seems to work.
>>>
>>> Anyway, I managed to get my btrfs RAID5 array corrupted to the point
>>> where it will just mount to read-only mode. btrfs check doesn't seem
>>> to work either. Here's some output.
>>
>> So it's not fatally corrupted. If the data matters, mount it RO and grab
>> whatever you can get.
>>
> Funny story about that. On access, it locks up the entire shell, making
> me unable to do anything. However, Samba actually works. A lot of the
> data that was on the array was corrupted, but I did manage to grab some
> stuff.
>>>
>>> root@LEDE:~# btrfs check /dev/sda
>>> Checking filesystem on /dev/sda
>>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>>> Csum didn't match
>>> ERROR: failed to repair root items: I/O error
>>
>> IIRC btrfs-progs doesn't handle RAID5/6 repair well, so if something
>> goes wrong btrfs-progs just gives up.
>>
>> So don't expect too much when using btrfs-progs with RAID5/6.
>>
> Duly noted. I/O error is strange since the hardware is fine...

Well, most EIO in btrfs just means a csum error.

So your hardware is most likely in good shape, unless there is some
extra error message from your device driver or block layer.

Thanks,
Qu

>>>
>>> root@LEDE:~# btrfs check --init-extent-tree /dev/sda
>>> Checking filesystem on /dev/sda
>>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>>> Creating a new extent tree
>>> Failed to find [3174144425984, 168, 16384]
>>> btrfs unable to find ref byte nr 3174347603968 parent 0 root 1 owner 1 offset 0
>>> Failed to find [3174144475136, 168, 16384]
>>> btrfs unable to find ref byte nr 3174444449792 parent 0 root 1 owner 0 offset 1
>>> Failed to find [3174144507904, 168, 16384]
>>> btrfs unable to find ref byte nr 3174631505920 parent 0 root 1 owner 0 offset 1
>>> checking extents
>>> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
>>> Aborted
>>
>> You're calling one of the most dangerous operations.
>> It's fortunate that it just aborted before causing more damage.
>>
> Didn't realize this option was dangerous. Guess I should have read the
> man pages...
>>>
>>> root@LEDE:~# btrfs check --init-csum-tree /dev/sda
>>> Creating a new CRC tree
>>> Checking filesystem on /dev/sda
>>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>>> Reinitialize checksum tree
>>> Fixed 0 roots.
>>> checking extents
>>> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
>>> Aborted
>>>
>>> This is with version 4.14 of btrfs-progs. Do I need a newer version or
>>> should I just reinitialize my array and copy everything back?
>>>
>>> Log on mount attached below:
>>>
>>> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.739242] BTRFS info
>>> (device sda): disk space caching is enabled
>>> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.752038] BTRFS info
>>> (device sda): has skinny extents
>>> Fri Jan 19 14:26:04 2018 kern.info kernel: [168380.493600] BTRFS info
>>> (device sda): continuing balance
>>
>> It seems to be a problem relocating the chunk.
>>
>> Try the 'skip_balance' mount option to see if it allows you to mount it
>> RW.
>>
>> If it doesn't work, and since btrfs-progs won't help much in such a
>> case, rebuilding seems to be your only option.
>>
>
> Ended up rebuilding. It seems userspace (and maybe kernel?) is getting
> proper data now from the drives so btrfs is not detecting silent data
> corruption and trying to deal with it.
>
>> Thanks,
>> Qu
>>
>>> Fri Jan 19 14:26:07 2018 kern.info kernel: [168382.691771] BTRFS info
>>> (device sda): relocating block group 3295510790144 flags 129
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.028958] BTRFS
>>> warning (device sda): sda checksum verify failed on 3174631424000
>>> wanted 2658452A found 6F04F3FC level 0
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.052699] BTRFS
>>> warning (device sda): sda checksum verify failed on 3174631424000
>>> wanted 2658452A found 6F04F3FC level 0
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.087279] BTRFS
>>> warning (device sda): sda checksum verify failed on 3174631424000
>>> wanted 2658452A found 6F04F3FC level 0
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.110017]
>>> ------------[ cut here ]------------
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.119950] WARNING:
>>> CPU: 0 PID: 2496 at fs/btrfs/extent-tree.c:6958
>>> btrfs_lookup_block_group+0x1438/0x1f74 [btrfs]
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.120096] BTRFS
>>> warning (device sda): sda checksum verify failed on 3174631424000
>>> wanted 2658452A found 6F04F3FC level 0
>>> Fri Jan 19 14:26:07 2018 kern.crit kernel: [168383.120189] BTRFS:
>>> error (device sda) in __btrfs_free_extent:6958: errno=-5 IO failure
>>> Fri Jan 19 14:26:07 2018 kern.info kernel: [168383.120197] BTRFS info
>>> (device sda): forced readonly
>>> Fri Jan 19 14:26:07 2018 kern.crit kernel: [168383.120214] BTRFS:
>>> error (device sda) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
>>> Fri Jan 19 14:26:07 2018 kern.debug kernel: [168383.207466] BTRFS:
>>> Transaction aborted (error -5)
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.217230] Modules
>>> linked in: snd_usb_audio nf_conntrack_ipv6 iptable_nat ipt_REJECT
>>> ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark
>>> xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG
>>> snd_usbmidi_lib nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4
>>> nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6
>>> nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c
>>> iptable_mangle iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6
>>> nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables
>>> x_tables snd_compress snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
>>> snd_rawmidi snd_seq_device snd_hwdep snd soundcore cifs sha256_generic
>>> md5 md4 hmac ecb des_generic usb_storage leds_gpio xhci_mtk
>>> xhci_plat_hcd xhci_pci xhci_hcd ahci libahci libata sd_mod
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.361711] scsi_mod
>>> gpio_button_hotplug btrfs xor raid6_pq usbcore nls_base usb_common
>>> crc32c_generic
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.378239] CPU: 0 PID:
>>> 2496 Comm: kworker/u8:2 Tainted: G W 4.9.75 #0
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.394206] Workqueue:
>>> btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.408183] Stack :
>>> 8b3b8200 804c0000 8045bc04 8f7d359c 00000009 00001b2e 8ed29270
>>> 00000000
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.425374]
>>> 8f673800 8006b9c8 8045bc04 00000000 000009c0 80523824 8045bb70
>>> 8c6b3b24
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.442564]
>>> 804c0000 800a8670 00000001 80520000 804c9ec4 804c9ec8 80460810
>>> 8c6b3b24
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.459753]
>>> 804c0000 8004334c 8ed29270 8c6b3b5c 000005ae 00000000 00000006
>>> 006b3b44
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.476942]
>>> 8f7777ac 8fe2e400 8fe2eb00 66727462 78652d73 746e6574 6665722d
>>> 00000073
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.494132] ...
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.499272] Call Trace:
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.504435]
>>> [<8000f814>] show_stack+0x54/0x88
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.513472]
>>> [<801da9cc>] dump_stack+0x8c/0xd0
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.522505]
>>> [<8002bdc4>] __warn+0xe4/0x118
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.531005]
>>> [<8002be28>] warn_slowpath_fmt+0x30/0x3c
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.541343]
>>> [<8f716adc>] btrfs_lookup_block_group+0x1438/0x1f74 [btrfs]
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.555109] ---[ end
>>> trace d625fb7e6ea3d882 ]---
>>> Fri Jan 19 14:26:08 2018 kern.crit kernel: [168383.564700] BTRFS:
>>> error (device sda) in __btrfs_free_extent:6958: errno=-5 IO failure
>>> Fri Jan 19 14:26:08 2018 kern.crit kernel: [168383.581024] BTRFS:
>>> error (device sda) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
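Qu's point above, that most EIO from btrfs is a checksum failure and not necessarily a failing disk, can be checked against a kernel log by separating btrfs csum warnings from errors reported by lower layers. What follows is only a minimal Python sketch under stated assumptions: it expects unwrapped log lines (one kernel message per line), it matches the warning wording seen in this thread's 4.9 kernel log, and the lower-layer patterns are a small, non-exhaustive guess at common block-layer and libata messages. The names `triage`, `CSUM_RE` and `LOWER_LAYER_RE` are hypothetical and not part of btrfs-progs or the kernel.

```python
import re
from collections import Counter

# btrfs metadata checksum warning, as worded in the 4.9 log quoted in this
# thread. Other kernel versions may phrase it differently (assumption).
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>\w+)\): \w+ checksum verify failed on "
    r"(?P<bytenr>\d+) wanted (?P<wanted>[0-9A-Fa-f]+) found (?P<found>[0-9A-Fa-f]+)"
)

# Assumed, non-exhaustive markers of trouble below the filesystem
# (block layer or libata). Extend for the driver actually in use.
LOWER_LAYER_RE = re.compile(
    r"blk_update_request: I/O error|\bata\d+(?:\.\d+)?: (?:exception|failed command)"
)


def triage(lines):
    """Split log lines into btrfs csum failures and lower-layer errors.

    Returns (csum, lower): csum is a Counter keyed by
    (device, logical byte number, wanted csum, found csum); lower is the
    list of lines that matched a lower-layer error pattern.
    """
    csum = Counter()
    lower = []
    for line in lines:
        m = CSUM_RE.search(line)
        if m:
            key = (m["dev"], int(m["bytenr"]), m["wanted"], m["found"])
            csum[key] += 1
        elif LOWER_LAYER_RE.search(line):
            lower.append(line.rstrip())
    return csum, lower


# Three messages taken from the log in this thread, unwrapped to one per line.
SAMPLE = [
    "Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.028958] BTRFS warning "
    "(device sda): sda checksum verify failed on 3174631424000 wanted 2658452A "
    "found 6F04F3FC level 0",
    "Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.052699] BTRFS warning "
    "(device sda): sda checksum verify failed on 3174631424000 wanted 2658452A "
    "found 6F04F3FC level 0",
    "Fri Jan 19 14:26:07 2018 kern.info kernel: [168383.120197] BTRFS info "
    "(device sda): forced readonly",
]

csum, lower = triage(SAMPLE)
for (dev, bytenr, wanted, found), count in csum.items():
    print(f"{dev}: block {bytenr} failed {count}x (wanted {wanted}, found {found})")
print(f"lower-layer error lines: {len(lower)}")
```

If every failure maps to the same logical block and no lower-layer lines turn up, as in the sample, that is consistent with a single corrupted metadata block and not with a drive that is returning read errors.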
Thread overview: 5+ messages
2018-01-19 21:45 btrfs volume corrupt. btrfs-progs bug or need to rebuild volume? Rosen Penev
2018-01-20  6:32 ` Duncan
2018-01-21  9:53 ` Qu Wenruo
2018-01-21 20:33   ` Rosen Penev
2018-01-22  0:41     ` Qu Wenruo