* btrfs volume corrupt. btrfs-progs bug or need to rebuild volume?
@ 2018-01-19 21:45 Rosen Penev
2018-01-20 6:32 ` Duncan
2018-01-21 9:53 ` Qu Wenruo
0 siblings, 2 replies; 5+ messages in thread
From: Rosen Penev @ 2018-01-19 21:45 UTC
To: linux-btrfs
v2: Add proper subject
I've been playing around with a specific kernel on a specific device
trying to figure out why btrfs keeps throwing csum errors after ~15
hours. I've almost nailed it down to some specific CONFIG option in
the kernel, possibly related to IRQs.
Anyway, I managed to get my btrfs RAID5 array corrupted to the point
where it will just mount to read-only mode. btrfs check doesn't seem
to work either. Here's some output.
root@LEDE:~# btrfs check /dev/sda
Checking filesystem on /dev/sda
UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
Csum didn't match
ERROR: failed to repair root items: I/O error
root@LEDE:~# btrfs check --init-extent-tree /dev/sda
Checking filesystem on /dev/sda
UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
Creating a new extent tree
Failed to find [3174144425984, 168, 16384]
btrfs unable to find ref byte nr 3174347603968 parent 0 root 1 owner 1 offset 0
Failed to find [3174144475136, 168, 16384]
btrfs unable to find ref byte nr 3174444449792 parent 0 root 1 owner 0 offset 1
Failed to find [3174144507904, 168, 16384]
btrfs unable to find ref byte nr 3174631505920 parent 0 root 1 owner 0 offset 1
checking extents
cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
Aborted
root@LEDE:~# btrfs check --init-csum-tree /dev/sda
Creating a new CRC tree
Checking filesystem on /dev/sda
UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
Reinitialize checksum tree
Fixed 0 roots.
checking extents
cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
Aborted
This is with version 4.14 of btrfs-progs. Do I need a newer version or
should I just reinitialize my array and copy everything back?
Log on mount attached below:
Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.739242] BTRFS info
(device sda): disk space caching is enabled
Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.752038] BTRFS info
(device sda): has skinny extents
Fri Jan 19 14:26:04 2018 kern.info kernel: [168380.493600] BTRFS info
(device sda): continuing balance
Fri Jan 19 14:26:07 2018 kern.info kernel: [168382.691771] BTRFS info
(device sda): relocating block group 3295510790144 flags 129
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.028958] BTRFS
warning (device sda): sda checksum verify failed on 3174631424000
wanted 2658452A found 6F04F3FC level 0
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.052699] BTRFS
warning (device sda): sda checksum verify failed on 3174631424000
wanted 2658452A found 6F04F3FC level 0
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.087279] BTRFS
warning (device sda): sda checksum verify failed on 3174631424000
wanted 2658452A found 6F04F3FC level 0
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.110017]
------------[ cut here ]------------
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.119950] WARNING:
CPU: 0 PID: 2496 at fs/btrfs/extent-tree.c:6958
btrfs_lookup_block_group+0x1438/0x1f74 [btrfs]
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.120096] BTRFS
warning (device sda): sda checksum verify failed on 3174631424000
wanted 2658452A found 6F04F3FC level 0
Fri Jan 19 14:26:07 2018 kern.crit kernel: [168383.120189] BTRFS:
error (device sda) in __btrfs_free_extent:6958: errno=-5 IO failure
Fri Jan 19 14:26:07 2018 kern.info kernel: [168383.120197] BTRFS info
(device sda): forced readonly
Fri Jan 19 14:26:07 2018 kern.crit kernel: [168383.120214] BTRFS:
error (device sda) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
Fri Jan 19 14:26:07 2018 kern.debug kernel: [168383.207466] BTRFS:
Transaction aborted (error -5)
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.217230] Modules
linked in: snd_usb_audio nf_conntrack_ipv6 iptable_nat ipt_REJECT
ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark
xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG
snd_usbmidi_lib nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4
nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6
nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c
iptable_mangle iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6
nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables
x_tables snd_compress snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd_rawmidi snd_seq_device snd_hwdep snd soundcore cifs sha256_generic
md5 md4 hmac ecb des_generic usb_storage leds_gpio xhci_mtk
xhci_plat_hcd xhci_pci xhci_hcd ahci libahci libata sd_mod
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.361711] scsi_mod
gpio_button_hotplug btrfs xor raid6_pq usbcore nls_base usb_common
crc32c_generic
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.378239] CPU: 0 PID:
2496 Comm: kworker/u8:2 Tainted: G W 4.9.75 #0
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.394206] Workqueue:
btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.408183] Stack :
8b3b8200 804c0000 8045bc04 8f7d359c 00000009 00001b2e 8ed29270
00000000
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.425374]
8f673800 8006b9c8 8045bc04 00000000 000009c0 80523824 8045bb70
8c6b3b24
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.442564]
804c0000 800a8670 00000001 80520000 804c9ec4 804c9ec8 80460810
8c6b3b24
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.459753]
804c0000 8004334c 8ed29270 8c6b3b5c 000005ae 00000000 00000006
006b3b44
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.476942]
8f7777ac 8fe2e400 8fe2eb00 66727462 78652d73 746e6574 6665722d
00000073
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.494132] ...
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.499272] Call Trace:
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.504435]
[<8000f814>] show_stack+0x54/0x88
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.513472]
[<801da9cc>] dump_stack+0x8c/0xd0
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.522505]
[<8002bdc4>] __warn+0xe4/0x118
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.531005]
[<8002be28>] warn_slowpath_fmt+0x30/0x3c
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.541343]
[<8f716adc>] btrfs_lookup_block_group+0x1438/0x1f74 [btrfs]
Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.555109] ---[ end
trace d625fb7e6ea3d882 ]---
Fri Jan 19 14:26:08 2018 kern.crit kernel: [168383.564700] BTRFS:
error (device sda) in __btrfs_free_extent:6958: errno=-5 IO failure
Fri Jan 19 14:26:08 2018 kern.crit kernel: [168383.581024] BTRFS:
error (device sda) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
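The kernel messages above can be tallied to see how many distinct blocks are failing verification. What follows is a minimal Python sketch, not part of the original report: the function name summarize_csum_failures is made up for illustration, and the regular expression assumes the kernel's "wanted ... found ..." word order seen in this log (btrfs check prints the two values in the opposite order). Whitespace is collapsed first because the mail client wrapped each log line.

```python
import re
from collections import Counter

# Excerpt of the kernel log above, still wrapped the way it appears in the mail.
LOG = """\
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.028958] BTRFS
warning (device sda): sda checksum verify failed on 3174631424000
wanted 2658452A found 6F04F3FC level 0
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.052699] BTRFS
warning (device sda): sda checksum verify failed on 3174631424000
wanted 2658452A found 6F04F3FC level 0
Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.087279] BTRFS
warning (device sda): sda checksum verify failed on 3174631424000
wanted 2658452A found 6F04F3FC level 0
"""

# Kernel word order: "wanted <csum> found <csum>".
PATTERN = re.compile(
    r"checksum verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>[0-9A-Fa-f]+) found (?P<found>[0-9A-Fa-f]+)"
)


def summarize_csum_failures(text):
    """Count kernel csum failures per (bytenr, wanted, found) triple."""
    # Collapse all whitespace so line wrapping cannot split a message.
    flat = " ".join(text.split())
    return Counter(
        (int(m["bytenr"]), m["wanted"], m["found"])
        for m in PATTERN.finditer(flat)
    )


summary = summarize_csum_failures(LOG)
for (bytenr, wanted, found), count in summary.items():
    print(f"bytenr {bytenr}: {count} failures, wanted {wanted}, found {found}")
```

In this excerpt every failure points at the same bytenr with the same checksum pair, which suggests one damaged block being retried rather than many separate bad blocks.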
* Re: btrfs volume corrupt. btrfs-progs bug or need to rebuild volume?
2018-01-19 21:45 btrfs volume corrupt. btrfs-progs bug or need to rebuild volume? Rosen Penev
@ 2018-01-20 6:32 ` Duncan
2018-01-21 9:53 ` Qu Wenruo
1 sibling, 0 replies; 5+ messages in thread
From: Duncan @ 2018-01-20 6:32 UTC
To: linux-btrfs
Rosen Penev posted on Fri, 19 Jan 2018 13:45:35 -0800 as excerpted:
> v2: Add proper subject
=:^)
> I've been playing around with a specific kernel on a specific device
> trying to figure out why btrfs keeps throwing csum errors after ~15
> hours. I've almost nailed it down to some specific CONFIG option in the
> kernel, possibly related to IRQs.
>
> Anyway, I managed to get my btrfs RAID5 array corrupted to the point
> where it will just mount to read-only mode.
[...]
> This is with version 4.14 of btrfs-progs. Do I need a newer version or
> should I just reinitialize my array and copy everything back?
>
> Log on mount attached below:
[...]
> Fri Jan 19 14:26:08 2018 kern.warn kernel:
> [168383.378239] CPU: 0 PID:
> 2496 Comm: kworker/u8:2 Tainted: G W 4.9.75 #0
As the penultimate LTS kernel series, 4.9 is still generally supported
on the btrfs list. However, 4.9 still had known btrfs raid56 mode issues
and is strongly recommended against for use with btrfs raid56 mode.
Those issues weren't fixed until 4.12, which /finally/ brought raid56
mode into a generally working state that is no longer recommended
against.
Applicable general btrfs bug fixes would normally be backported to 4.9
because it is an LTS series, but because raid56 mode had never worked
/well/ at that point, I'm not sure those particular fixes were
backported.
So for btrfs raid56 mode you really need kernel 4.12+, presumably the
LTS 4.14 series since you're on the LTS 4.9 series now. Otherwise, don't
use raid56 mode if you plan on staying with the 4.9 LTS: it still had
severe known issues, and I haven't seen on-list confirmation that the
4.12 btrfs raid56 mode fixes were backported to 4.9-LTS.
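The version requirement above can be checked mechanically. This is a rough Python sketch that takes the 4.12 threshold from this message at face value (it is not verified independently here); the helper names kernel_version and meets_minimum are hypothetical.

```python
import re


def kernel_version(release):
    """Return the leading numeric part of a release string as (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (e.g. "4.14") is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(release, minimum=(4, 12, 0)):
    """True if the release is at least the minimum (4.12 is assumed from the thread)."""
    return kernel_version(release) >= minimum


for release in ("4.9.75", "4.14"):
    print(release, meets_minimum(release))
```

On a running Linux system the release string could be taken from platform.release() in the standard library. Note that a distribution kernel may carry backports, so the version number alone is only a first approximation.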
If you need/choose to stick with 4.9 and dump raid56 mode, the
recommended alternative depends on the number of devices in the
filesystem.
For a small number of devices in the filesystem, btrfs raid1 is
effectively as stable as the still stabilizing and maturing btrfs itself
is at this point and is recommended.
For a larger number of devices, btrfs raid1 is still a good choice
because it /is/ the most mature. Btrfs raid10 is /reasonably/ stable,
tho IMO not quite as stable as raid1. Or, for better performance (btrfs
raid10 is not read-optimized yet) while keeping btrfs checksumming and
error repair from the second copy when available, consider a layered
approach, with btrfs raid1 on top of a pair of mdraid0s (or dmraid0s, or
hardware raid0s).
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfs volume corrupt. btrfs-progs bug or need to rebuild volume?
2018-01-19 21:45 btrfs volume corrupt. btrfs-progs bug or need to rebuild volume? Rosen Penev
2018-01-20 6:32 ` Duncan
@ 2018-01-21 9:53 ` Qu Wenruo
2018-01-21 20:33 ` Rosen Penev
1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2018-01-21 9:53 UTC
To: Rosen Penev, linux-btrfs
On 2018-01-20 05:45, Rosen Penev wrote:
> v2: Add proper subject
>
> I've been playing around with a specific kernel on a specific device
> trying to figure out why btrfs keeps throwing csum errors after ~15
> hours. I've almost nailed it down to some specific CONFIG option in
> the kernel, possibly related to IRQs.
Judging by the hostname, this seems to be LEDE (or should it be called
OpenWRT soon?).
It's really interesting to see btrfs used in an embedded environment.
>
> Anyway, I managed to get my btrfs RAID5 array corrupted to the point
> where it will just mount to read-only mode. btrfs check doesn't seem
> to work either. Here's some output.
So it's not fatally corrupted. If the data matters, mount it RO and grab
whatever you can.
>
> root@LEDE:~# btrfs check /dev/sda
> Checking filesystem on /dev/sda
> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
> Csum didn't match
> ERROR: failed to repair root items: I/O error
IIRC btrfs-progs doesn't handle RAID5/6 repair well, so if something
goes wrong, btrfs-progs just gives up.
So don't expect too much when using btrfs-progs with RAID5/6.
>
> root@LEDE:~# btrfs check --init-extent-tree /dev/sda
> Checking filesystem on /dev/sda
> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
> Creating a new extent tree
> Failed to find [3174144425984, 168, 16384]
> btrfs unable to find ref byte nr 3174347603968 parent 0 root 1 owner 1 offset 0
> Failed to find [3174144475136, 168, 16384]
> btrfs unable to find ref byte nr 3174444449792 parent 0 root 1 owner 0 offset 1
> Failed to find [3174144507904, 168, 16384]
> btrfs unable to find ref byte nr 3174631505920 parent 0 root 1 owner 0 offset 1
> checking extents
> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
> Aborted
You're calling one of the most dangerous operations.
It's fortunate that it just aborted before causing more damage.
>
> root@LEDE:~# btrfs check --init-csum-tree /dev/sda
> Creating a new CRC tree
> Checking filesystem on /dev/sda
> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
> Reinitialize checksum tree
> Fixed 0 roots.
> checking extents
> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
> Aborted
>
> This is with version 4.14 of btrfs-progs. Do I need a newer version or
> should I just reinitialize my array and copy everything back?
>
> Log on mount attached below:
>
> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.739242] BTRFS info
> (device sda): disk space caching is enabled
> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.752038] BTRFS info
> (device sda): has skinny extents
> Fri Jan 19 14:26:04 2018 kern.info kernel: [168380.493600] BTRFS info
> (device sda): continuing balance
It seems to be a problem relocating the chunk.
Try the 'skip_balance' mount option to see if it allows you to mount it RW.
If that doesn't work, then since btrfs-progs won't help much in such a
case, rebuilding seems to be your only option.
Thanks,
Qu
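The "flags 129" in the "relocating block group" line of the mount log from the original post is a bitmask. Below is a small Python sketch for decoding it, assuming the BTRFS_BLOCK_GROUP_* bit values from the kernel's btrfs on-disk format headers; the helper name decode_block_group_flags is made up for illustration.

```python
# Block group type/profile bits (BTRFS_BLOCK_GROUP_* in the kernel headers).
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
    1 << 7: "RAID5",
    1 << 8: "RAID6",
}


def decode_block_group_flags(flags):
    """Return the names of all known flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in BLOCK_GROUP_FLAGS.items() if flags & bit]


# Value taken from the log line "relocating block group 3295510790144 flags 129".
print(decode_block_group_flags(129))
```

Under these assumptions 129 is 0x81, that is DATA plus RAID5, so the resumed balance appears to have been relocating a RAID5 data block group when it ran into the bad checksum.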
> [...]
* Re: btrfs volume corrupt. btrfs-progs bug or need to rebuild volume?
2018-01-21 9:53 ` Qu Wenruo
@ 2018-01-21 20:33 ` Rosen Penev
2018-01-22 0:41 ` Qu Wenruo
0 siblings, 1 reply; 5+ messages in thread
From: Rosen Penev @ 2018-01-21 20:33 UTC
To: Qu Wenruo; +Cc: linux-btrfs
On Sun, Jan 21, 2018 at 1:53 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2018-01-20 05:45, Rosen Penev wrote:
>> v2: Add proper subject
>>
>> I've been playing around with a specific kernel on a specific device
>> trying to figure out why btrfs keeps throwing csum errors after ~15
>> hours. I've almost nailed it down to some specific CONFIG option in
>> the kernel, possibly related to IRQs.
>
> Judging by the hostname, this seems to be LEDE (or should it be called
> OpenWRT soon?).
> It's really interesting to see btrfs used in an embedded environment.
>
The issue that was causing the corruption seems to have been fixed in
4.9.75. The device uses router hardware (mt7621), except that instead
of using the PCIe lanes for wireless controllers, it has Asmedia SATA
controllers. Slow, but it seems to work.
>>
>> Anyway, I managed to get my btrfs RAID5 array corrupted to the point
>> where it will just mount to read-only mode. btrfs check doesn't seem
>> to work either. Here's some output.
>
> So it's not fatally corrupted. If the data matters, mount it RO and grab
> whatever you can.
>
Funny story about that. On access, it locks up the entire shell making
me unable to do anything. However, Samba actually works. A lot of the
data that was on the array was corrupted but I did manage to grab some
stuff.
>>
>> root@LEDE:~# btrfs check /dev/sda
>> Checking filesystem on /dev/sda
>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>> Csum didn't match
>> ERROR: failed to repair root items: I/O error
>
> IIRC btrfs-progs doesn't handle RAID5/6 repair well, so if something
> goes wrong, btrfs-progs just gives up.
>
> So don't expect too much when using btrfs-progs with RAID5/6.
>
Duly noted. I/O error is strange since the hardware is fine...
>>
>> root@LEDE:~# btrfs check --init-extent-tree /dev/sda
>> Checking filesystem on /dev/sda
>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>> Creating a new extent tree
>> Failed to find [3174144425984, 168, 16384]
>> btrfs unable to find ref byte nr 3174347603968 parent 0 root 1 owner 1 offset 0
>> Failed to find [3174144475136, 168, 16384]
>> btrfs unable to find ref byte nr 3174444449792 parent 0 root 1 owner 0 offset 1
>> Failed to find [3174144507904, 168, 16384]
>> btrfs unable to find ref byte nr 3174631505920 parent 0 root 1 owner 0 offset 1
>> checking extents
>> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
>> Aborted
>
> You're calling one of the most dangerous operations.
> It's fortunate that it just aborted before causing more damage.
>
Didn't realize this option was dangerous. Guess I should have read man pages...
>>
>> root@LEDE:~# btrfs check --init-csum-tree /dev/sda
>> Creating a new CRC tree
>> Checking filesystem on /dev/sda
>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>> Reinitialize checksum tree
>> Fixed 0 roots.
>> checking extents
>> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
>> Aborted
>>
>> This is with version 4.14 of btrfs-progs. Do I need a newer version or
>> should I just reinitialize my array and copy everything back?
>>
>> Log on mount attached below:
>>
>> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.739242] BTRFS info
>> (device sda): disk space caching is enabled
>> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.752038] BTRFS info
>> (device sda): has skinny extents
>> Fri Jan 19 14:26:04 2018 kern.info kernel: [168380.493600] BTRFS info
>> (device sda): continuing balance
>
> It seems to be a problem relocating the chunk.
>
> Try the 'skip_balance' mount option to see if it allows you to mount it RW.
>
> If that doesn't work, then since btrfs-progs won't help much in such a
> case, rebuilding seems to be your only option.
>
Ended up rebuilding. It seems userspace (and maybe the kernel?) is now
getting proper data from the drives, so btrfs is no longer detecting
silent data corruption and trying to deal with it.
> Thanks,
> Qu
>
>> [...]
* Re: btrfs volume corrupt. btrfs-progs bug or need to rebuild volume?
2018-01-21 20:33 ` Rosen Penev
@ 2018-01-22 0:41 ` Qu Wenruo
0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2018-01-22 0:41 UTC
To: Rosen Penev; +Cc: linux-btrfs
On 2018-01-22 04:33, Rosen Penev wrote:
> On Sun, Jan 21, 2018 at 1:53 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2018-01-20 05:45, Rosen Penev wrote:
>>> v2: Add proper subject
>>>
>>> I've been playing around with a specific kernel on a specific device
>>> trying to figure out why btrfs keeps throwing csum errors after ~15
>>> hours. I've almost nailed it down to some specific CONFIG option in
>>> the kernel, possibly related to IRQs.
>>
>> Judging by the hostname, this seems to be LEDE (or should it be called
>> OpenWRT soon?).
>> It's really interesting to see btrfs used in an embedded environment.
>>
> The issue that was causing the corruption seems to have been fixed in
> 4.9.75. The device uses router hardware (mt7621), except that instead
> of using the PCIe lanes for wireless controllers, it has Asmedia SATA
> controllers. Slow, but it seems to work.
>>>
>>> Anyway, I managed to get my btrfs RAID5 array corrupted to the point
>>> where it will just mount to read-only mode. btrfs check doesn't seem
>>> to work either. Here's some output.
>>
>> So it's not fatally corrupted. If the data matters, mount it RO and grab
>> whatever you can.
>>
> Funny story about that. On access, it locks up the entire shell making
> me unable to do anything. However, Samba actually works. A lot of the
> data that was on the array was corrupted but I did manage to grab some
> stuff.
>>>
>>> root@LEDE:~# btrfs check /dev/sda
>>> Checking filesystem on /dev/sda
>>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>>> checksum verify failed on 3174631424000 found 6F04F3FC wanted 2658452A
>>> Csum didn't match
>>> ERROR: failed to repair root items: I/O error
>>
>> IIRC btrfs-progs doesn't handle RAID5/6 repair well, so if something
>> goes wrong, btrfs-progs just gives up.
>>
>> So don't expect too much when using btrfs-progs with RAID5/6.
>>
> Duly noted. I/O error is strange since the hardware is fine...
Well, most EIO errors in btrfs just mean a csum error.
So your hardware is probably in good shape, unless there are extra
error messages from your device driver or block layer.
Thanks,
Qu
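The errno=-5 in the kernel messages is the EIO referred to above. Here is a short Python sketch for mapping such a value to its symbolic name, using only the standard errno and os modules; describe_errno is a hypothetical helper, and the kernel reports the value negated, hence the abs().

```python
import errno
import os


def describe_errno(value):
    """Map a (possibly negated) errno value to its symbolic name and message."""
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Value taken from "errno=-5 IO failure" in the log.
name, message = describe_errno(-5)
print(f"{name}: {message}")
```

The symbolic name only says that an I/O-level failure was reported to the caller. As noted above, in btrfs that is usually a checksum mismatch, so driver or block-layer messages are what would point at the hardware itself.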
>>>
>>> root@LEDE:~# btrfs check --init-extent-tree /dev/sda
>>> Checking filesystem on /dev/sda
>>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>>> Creating a new extent tree
>>> Failed to find [3174144425984, 168, 16384]
>>> btrfs unable to find ref byte nr 3174347603968 parent 0 root 1 owner 1 offset 0
>>> Failed to find [3174144475136, 168, 16384]
>>> btrfs unable to find ref byte nr 3174444449792 parent 0 root 1 owner 0 offset 1
>>> Failed to find [3174144507904, 168, 16384]
>>> btrfs unable to find ref byte nr 3174631505920 parent 0 root 1 owner 0 offset 1
>>> checking extents
>>> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
>>> Aborted
>>
>> You're calling one of the most dangerous operations.
>> It's fortunate that it just aborted before causing more damage.
>>
> Didn't realize this option was dangerous. Guess I should have read man pages...
>>>
>>> root@LEDE:~# btrfs check --init-csum-tree /dev/sda
>>> Creating a new CRC tree
>>> Checking filesystem on /dev/sda
>>> UUID: 22d612d9-b7b6-4c4c-95cd-64f5056d420b
>>> Reinitialize checksum tree
>>> Fixed 0 roots.
>>> checking extents
>>> cmds-check.c:7866: add_data_backref: BUG_ON `!back` triggered, value 1
>>> Aborted
>>>
>>> This is with version 4.14 of btrfs-progs. Do I need a newer version or
>>> should I just reinitialize my array and copy everything back?
>>>
>>> Log on mount attached below:
>>>
>>> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.739242] BTRFS info
>>> (device sda): disk space caching is enabled
>>> Fri Jan 19 14:26:01 2018 kern.info kernel: [168376.752038] BTRFS info
>>> (device sda): has skinny extents
>>> Fri Jan 19 14:26:04 2018 kern.info kernel: [168380.493600] BTRFS info
>>> (device sda): continuing balance
>>
>> It seems to be a problem relocating the chunk.
>>
>> Try the 'skip_balance' mount option to see if it allows you to mount it RW.
>>
>> If that doesn't work, and since btrfs-progs won't help much in such a case,
>> rebuilding seems to be your only option.
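
A minimal sketch of that attempt, assuming /dev/sda from the report and a
hypothetical mount point /mnt/array. skip_balance is an existing mount option
that leaves the interrupted balance paused instead of resuming it, and
"btrfs balance cancel" drops a running or paused balance. The wrapper function
try_skip_balance and its DRY_RUN switch are illustrative only.

```shell
#!/bin/sh
# try_skip_balance DEV MNT: mount without resuming the interrupted balance,
# then cancel the paused balance so it does not restart on the next mount.
# With DRY_RUN=1 the commands are printed instead of executed.
try_skip_balance() {
    dev=$1; mnt=$2
    run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }
    run mount -o skip_balance "$dev" "$mnt" &&
    run btrfs balance cancel "$mnt"
}

# Example (prints the commands only):
# DRY_RUN=1 try_skip_balance /dev/sda /mnt/array
```

If the filesystem is forced read-only again right after mounting, the cancel
step will fail, which would point back to the rebuild option.
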
>>
>
> Ended up rebuilding. It seems userspace (and maybe kernel?) is getting
> proper data from the drives now, so btrfs is no longer detecting silent
> data corruption and trying to deal with it.
>
>> Thanks,
>> Qu
>>
>>> Fri Jan 19 14:26:07 2018 kern.info kernel: [168382.691771] BTRFS info
>>> (device sda): relocating block group 3295510790144 flags 129
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.028958] BTRFS
>>> warning (device sda): sda checksum verify failed on 3174631424000
>>> wanted 2658452A found 6F04F3FC level 0
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.052699] BTRFS
>>> warning (device sda): sda checksum verify failed on 3174631424000
>>> wanted 2658452A found 6F04F3FC level 0
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.087279] BTRFS
>>> warning (device sda): sda checksum verify failed on 3174631424000
>>> wanted 2658452A found 6F04F3FC level 0
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.110017]
>>> ------------[ cut here ]------------
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.119950] WARNING:
>>> CPU: 0 PID: 2496 at fs/btrfs/extent-tree.c:6958
>>> btrfs_lookup_block_group+0x1438/0x1f74 [btrfs]
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.120096] BTRFS
>>> warning (device sda): sda checksum verify failed on 3174631424000
>>> wanted 2658452A found 6F04F3FC level 0
>>> Fri Jan 19 14:26:07 2018 kern.crit kernel: [168383.120189] BTRFS:
>>> error (device sda) in __btrfs_free_extent:6958: errno=-5 IO failure
>>> Fri Jan 19 14:26:07 2018 kern.info kernel: [168383.120197] BTRFS info
>>> (device sda): forced readonly
>>> Fri Jan 19 14:26:07 2018 kern.crit kernel: [168383.120214] BTRFS:
>>> error (device sda) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
>>> Fri Jan 19 14:26:07 2018 kern.debug kernel: [168383.207466] BTRFS:
>>> Transaction aborted (error -5)
>>> Fri Jan 19 14:26:07 2018 kern.warn kernel: [168383.217230] Modules
>>> linked in: snd_usb_audio nf_conntrack_ipv6 iptable_nat ipt_REJECT
>>> ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark
>>> xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG
>>> snd_usbmidi_lib nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4
>>> nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6
>>> nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack libcrc32c
>>> iptable_mangle iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6
>>> nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables
>>> x_tables snd_compress snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
>>> snd_rawmidi snd_seq_device snd_hwdep snd soundcore cifs sha256_generic
>>> md5 md4 hmac ecb des_generic usb_storage leds_gpio xhci_mtk
>>> xhci_plat_hcd xhci_pci xhci_hcd ahci libahci libata sd_mod
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.361711] scsi_mod
>>> gpio_button_hotplug btrfs xor raid6_pq usbcore nls_base usb_common
>>> crc32c_generic
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.378239] CPU: 0 PID:
>>> 2496 Comm: kworker/u8:2 Tainted: G W 4.9.75 #0
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.394206] Workqueue:
>>> btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.408183] Stack :
>>> 8b3b8200 804c0000 8045bc04 8f7d359c 00000009 00001b2e 8ed29270
>>> 00000000
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.425374]
>>> 8f673800 8006b9c8 8045bc04 00000000 000009c0 80523824 8045bb70
>>> 8c6b3b24
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.442564]
>>> 804c0000 800a8670 00000001 80520000 804c9ec4 804c9ec8 80460810
>>> 8c6b3b24
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.459753]
>>> 804c0000 8004334c 8ed29270 8c6b3b5c 000005ae 00000000 00000006
>>> 006b3b44
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.476942]
>>> 8f7777ac 8fe2e400 8fe2eb00 66727462 78652d73 746e6574 6665722d
>>> 00000073
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.494132] ...
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.499272] Call Trace:
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.504435]
>>> [<8000f814>] show_stack+0x54/0x88
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.513472]
>>> [<801da9cc>] dump_stack+0x8c/0xd0
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.522505]
>>> [<8002bdc4>] __warn+0xe4/0x118
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.531005]
>>> [<8002be28>] warn_slowpath_fmt+0x30/0x3c
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.541343]
>>> [<8f716adc>] btrfs_lookup_block_group+0x1438/0x1f74 [btrfs]
>>> Fri Jan 19 14:26:08 2018 kern.warn kernel: [168383.555109] ---[ end
>>> trace d625fb7e6ea3d882 ]---
>>> Fri Jan 19 14:26:08 2018 kern.crit kernel: [168383.564700] BTRFS:
>>> error (device sda) in __btrfs_free_extent:6958: errno=-5 IO failure
>>> Fri Jan 19 14:26:08 2018 kern.crit kernel: [168383.581024] BTRFS:
>>> error (device sda) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
Thread overview: 5+ messages
2018-01-19 21:45 btrfs volume corrupt. btrfs-progs bug or need to rebuild volume? Rosen Penev
2018-01-20 6:32 ` Duncan
2018-01-21 9:53 ` Qu Wenruo
2018-01-21 20:33 ` Rosen Penev
2018-01-22 0:41 ` Qu Wenruo