* Transaction aborted (error -17) after crash
@ 2015-11-17 5:12 Mordechay Kaganer
2015-11-18 3:02 ` Qu Wenruo
0 siblings, 1 reply; 6+ messages in thread
From: Mordechay Kaganer @ 2015-11-17 5:12 UTC
To: Btrfs BTRFS
B.H.
Hello.
I have a btrfs volume used for backups. The configuration is as follows:
# uname -a
Linux yemot-4u 4.2.5-040205-generic #201510270124 SMP Tue Oct 27
01:25:49 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.2.3
The volume is on top of 2 MD RAID10 arrays, 12TB each.
Recently the server crashed with a weird kernel panic while rsyncing
data to the backup volume.
After the reboot, btrfs check passed without errors, and a scrub also
finished with 0 errors.
Yet, after resuming rsync, the following errors appeared:
[ 836.026606] BTRFS warning (device md1): block group 12969790406656
has wrong amount of free space
[ 836.026610] BTRFS warning (device md1): failed to load free space
cache for block group 12969790406656, rebuild it now
[ 1033.619798] BTRFS warning (device md1): block group 15322358743040
has wrong amount of free space
[ 1033.619801] BTRFS warning (device md1): failed to load free space
cache for block group 15322358743040, rebuild it now
[ 2052.843713] ------------[ cut here ]------------
[ 2052.843756] WARNING: CPU: 2 PID: 1725 at
/home/kernel/COD/linux/fs/btrfs/extent-tree.c:2781
btrfs_run_delayed_refs.part.73+0x242/0x270 [btrfs]()
[ 2052.843758] BTRFS: Transaction aborted (error -17)
[ 2052.843760] Modules linked in: ipmi_ssif x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd
snd_hda_codec_realtek snd_hda_codec_generic serio_raw ast sb_edac ttm
snd_hda_intel edac_core joydev drm_kms_helper snd_hda_codec drm
input_leds snd_hda_core snd_hwdep syscopyarea sysfillrect mei_me
snd_pcm sysimgblt lpc_ich mei snd_timer snd soundcore ipmi_si wmi
8250_fintek ipmi_msghandler shpchp lp mac_hid parport btrfs ses
enclosure raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq raid1 hid_generic igb i2c_algo_bit
isci dca firewire_ohci usbhid ptp firewire_core raid0 libsas ahci hid
psmouse multipath pps_core crc_itu_t libahci scsi_transport_sas
aacraid linear
[ 2052.843827] CPU: 2 PID: 1725 Comm: btrfs-transacti Not tainted
4.2.5-040205-generic #201510270124
[ 2052.843829] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./EPC602D8A, BIOS P1.20 04/16/2014
[ 2052.843832] 0000000000000000 00000000df907816 ffff8808414dfcb8
ffffffff817d8d6d
[ 2052.843836] 0000000000000000 ffff8808414dfd10 ffff8808414dfcf8
ffffffff8107b3c6
[ 2052.843839] 0000000000001a0c ffff88049c5fe8a0 ffff88085577d800
ffff88082932cb80
[ 2052.843843] Call Trace:
[ 2052.843852] [<ffffffff817d8d6d>] dump_stack+0x45/0x57
[ 2052.843858] [<ffffffff8107b3c6>] warn_slowpath_common+0x86/0xc0
[ 2052.843862] [<ffffffff8107b455>] warn_slowpath_fmt+0x55/0x70
[ 2052.843878] [<ffffffffc022ecf2>]
btrfs_run_delayed_refs.part.73+0x242/0x270 [btrfs]
[ 2052.843882] [<ffffffff810e54bc>] ? del_timer_sync+0x4c/0x60
[ 2052.843897] [<ffffffffc022ed35>] btrfs_run_delayed_refs+0x15/0x20 [btrfs]
[ 2052.843915] [<ffffffffc0243756>] btrfs_commit_transaction+0x56/0xb20 [btrfs]
[ 2052.843931] [<ffffffffc023ee19>] transaction_kthread+0x229/0x240 [btrfs]
[ 2052.843945] [<ffffffffc023ebf0>] ?
btrfs_cleanup_transaction+0x550/0x550 [btrfs]
[ 2052.843949] [<ffffffff8109a798>] kthread+0xd8/0xf0
[ 2052.843953] [<ffffffff8109a6c0>] ? kthread_create_on_node+0x1b0/0x1b0
[ 2052.843957] [<ffffffff817dff9f>] ret_from_fork+0x3f/0x70
[ 2052.843960] [<ffffffff8109a6c0>] ? kthread_create_on_node+0x1b0/0x1b0
[ 2052.843962] ---[ end trace 6575cf272a151e61 ]---
[ 2052.843966] BTRFS: error (device md1) in
btrfs_run_delayed_refs:2781: errno=-17 Object already exists
[ 2052.844024] BTRFS info (device md1): forced readonly
[ 2052.848397] pending csums is 7327744
Then, during unmount:
[25209.834767] BTRFS error (device md1): cleaner transaction attach returned -30
Any suggestions?
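For reference, the numeric codes in the log above (-17 in "Transaction aborted" and -30 in "cleaner transaction attach returned") are negated Linux errno values. The following is a minimal sketch, assuming a Linux host, that maps such codes to their symbolic names; the helper name decode_kernel_errno is made up for illustration and is not part of any btrfs tool.

```python
import errno
import os


def decode_kernel_errno(code: int) -> str:
    """Return 'NAME (description)' for a kernel-style (usually negative) error code."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


# The two codes that appear in the log excerpts of this thread.
for code in (-17, -30):
    print(code, decode_kernel_errno(code))
```

On Linux, 17 is EEXIST, which matches the "Object already exists" text printed by btrfs, and 30 is EROFS, which is consistent with the filesystem having been forced read-only.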
--
"And Jacob went on his way..."
Moshiach NOW!
Moshiach is coming very soon, prepare yourself!
Long live our Master, our Teacher and our Rabbi, King Moshiach, forever and ever!
* Re: Transaction aborted (error -17) after crash
2015-11-17 5:12 Transaction aborted (error -17) after crash Mordechay Kaganer
@ 2015-11-18 3:02 ` Qu Wenruo
2015-11-18 5:18 ` Mordechay Kaganer
0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2015-11-18 3:02 UTC
To: Mordechay Kaganer, Btrfs BTRFS
On 2015-11-17 13:12, Mordechay Kaganer wrote:
> B.H.
>
> Hello.
>
> I have a btrfs volume used for backups. The configuration is as follows:
>
> # uname -a
> Linux yemot-4u 4.2.5-040205-generic #201510270124 SMP Tue Oct 27
> 01:25:49 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> # btrfs --version
> btrfs-progs v4.2.3
>
> The volume is on top of 2 MD RAID10 arrays, 12TB each.
>
> Recently the server crashed with a weird kernel panic while rsyncing
> data to the backup volume.
>
> After the reboot, btrfs check passed without errors, and a scrub also
> finished with 0 errors.
>
> Yet, after resuming rsync, the following errors appeared:
>
> [ 836.026606] BTRFS warning (device md1): block group 12969790406656
> has wrong amount of free space
> [ 836.026610] BTRFS warning (device md1): failed to load free space
> cache for block group 12969790406656, rebuild it now
> [ 1033.619798] BTRFS warning (device md1): block group 15322358743040
> has wrong amount of free space
> [ 1033.619801] BTRFS warning (device md1): failed to load free space
> cache for block group 15322358743040, rebuild it now
The block group seems to be corrupted, and other parts of the extent tree
may be corrupted as well.
Personally, I don't think this is a huge problem; it can be fixed with
btrfsck --init-extent-tree --repair.
But you must first make sure the other trees are not corrupted, by running
btrfsck without any options to check the filesystem.
If the fs tree/subvolume tree check reports no errors, then it should be
OK to use the above "--init-extent-tree --repair" to fix it.
That said, a backup is highly recommended before running --init-extent-tree.
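The order of operations discussed in this thread (a read-only check first, then the extent tree rebuild, then another read-only check as suggested later in the thread) can be sketched as below. This is only an illustration under stated assumptions: the function name plan_extent_tree_rebuild is hypothetical, the device path is the one from this report, the filesystem is assumed to be unmounted, and the sketch only prints the commands and never executes them. "btrfs check" is the current spelling of btrfsck.

```python
from typing import List


def plan_extent_tree_rebuild(device: str) -> List[List[str]]:
    """Return the commands in the order suggested in this thread.

    Nothing is executed here; whether the first check is clean enough
    to continue still has to be judged by reading its output.
    """
    return [
        # 1. Read-only check: look for errors in the fs/subvolume trees.
        ["btrfs", "check", device],
        # 2. Rebuild the extent tree (destructive; back up first).
        ["btrfs", "check", "--init-extent-tree", "--repair", device],
        # 3. Read-only check again to see whether the rebuild worked.
        ["btrfs", "check", device],
    ]


# Dry run: print the plan for the device named in this report.
for argv in plan_extent_tree_rebuild("/dev/md1"):
    print(" ".join(argv))
```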
Thanks,
Qu
> [ 2052.843713] ------------[ cut here ]------------
> [ 2052.843756] WARNING: CPU: 2 PID: 1725 at
> /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2781
> btrfs_run_delayed_refs.part.73+0x242/0x270 [btrfs]()
> [ 2052.843758] BTRFS: Transaction aborted (error -17)
> [ 2052.843760] Modules linked in: ipmi_ssif x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul
> aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd
> snd_hda_codec_realtek snd_hda_codec_generic serio_raw ast sb_edac ttm
> snd_hda_intel edac_core joydev drm_kms_helper snd_hda_codec drm
> input_leds snd_hda_core snd_hwdep syscopyarea sysfillrect mei_me
> snd_pcm sysimgblt lpc_ich mei snd_timer snd soundcore ipmi_si wmi
> 8250_fintek ipmi_msghandler shpchp lp mac_hid parport btrfs ses
> enclosure raid10 raid456 async_raid6_recov async_memcpy async_pq
> async_xor async_tx xor raid6_pq raid1 hid_generic igb i2c_algo_bit
> isci dca firewire_ohci usbhid ptp firewire_core raid0 libsas ahci hid
> psmouse multipath pps_core crc_itu_t libahci scsi_transport_sas
> aacraid linear
> [ 2052.843827] CPU: 2 PID: 1725 Comm: btrfs-transacti Not tainted
> 4.2.5-040205-generic #201510270124
> [ 2052.843829] Hardware name: To Be Filled By O.E.M. To Be Filled By
> O.E.M./EPC602D8A, BIOS P1.20 04/16/2014
> [ 2052.843832] 0000000000000000 00000000df907816 ffff8808414dfcb8
> ffffffff817d8d6d
> [ 2052.843836] 0000000000000000 ffff8808414dfd10 ffff8808414dfcf8
> ffffffff8107b3c6
> [ 2052.843839] 0000000000001a0c ffff88049c5fe8a0 ffff88085577d800
> ffff88082932cb80
> [ 2052.843843] Call Trace:
> [ 2052.843852] [<ffffffff817d8d6d>] dump_stack+0x45/0x57
> [ 2052.843858] [<ffffffff8107b3c6>] warn_slowpath_common+0x86/0xc0
> [ 2052.843862] [<ffffffff8107b455>] warn_slowpath_fmt+0x55/0x70
> [ 2052.843878] [<ffffffffc022ecf2>]
> btrfs_run_delayed_refs.part.73+0x242/0x270 [btrfs]
> [ 2052.843882] [<ffffffff810e54bc>] ? del_timer_sync+0x4c/0x60
> [ 2052.843897] [<ffffffffc022ed35>] btrfs_run_delayed_refs+0x15/0x20 [btrfs]
> [ 2052.843915] [<ffffffffc0243756>] btrfs_commit_transaction+0x56/0xb20 [btrfs]
> [ 2052.843931] [<ffffffffc023ee19>] transaction_kthread+0x229/0x240 [btrfs]
> [ 2052.843945] [<ffffffffc023ebf0>] ?
> btrfs_cleanup_transaction+0x550/0x550 [btrfs]
> [ 2052.843949] [<ffffffff8109a798>] kthread+0xd8/0xf0
> [ 2052.843953] [<ffffffff8109a6c0>] ? kthread_create_on_node+0x1b0/0x1b0
> [ 2052.843957] [<ffffffff817dff9f>] ret_from_fork+0x3f/0x70
> [ 2052.843960] [<ffffffff8109a6c0>] ? kthread_create_on_node+0x1b0/0x1b0
> [ 2052.843962] ---[ end trace 6575cf272a151e61 ]---
> [ 2052.843966] BTRFS: error (device md1) in
> btrfs_run_delayed_refs:2781: errno=-17 Object already exists
> [ 2052.844024] BTRFS info (device md1): forced readonly
> [ 2052.848397] pending csums is 7327744
>
> Then, during unmount:
>
> [25209.834767] BTRFS error (device md1): cleaner transaction attach returned -30
>
>
> Any suggestions?
>
* Re: Transaction aborted (error -17) after crash
2015-11-18 3:02 ` Qu Wenruo
@ 2015-11-18 5:18 ` Mordechay Kaganer
2015-11-18 5:31 ` Qu Wenruo
0 siblings, 1 reply; 6+ messages in thread
From: Mordechay Kaganer @ 2015-11-18 5:18 UTC
To: Qu Wenruo; +Cc: Btrfs BTRFS
B.H.
On Wed, Nov 18, 2015 at 5:02 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2015-11-17 13:12, Mordechay Kaganer wrote:
>>
>> B.H.
>>
>>
>> [ 836.026606] BTRFS warning (device md1): block group 12969790406656
>> has wrong amount of free space
>> [ 836.026610] BTRFS warning (device md1): failed to load free space
>> cache for block group 12969790406656, rebuild it now
>> [ 1033.619798] BTRFS warning (device md1): block group 15322358743040
>> has wrong amount of free space
>> [ 1033.619801] BTRFS warning (device md1): failed to load free space
>> cache for block group 15322358743040, rebuild it now
>
>
> The block group seems to be corrupted, and other parts of the extent tree
> may be corrupted as well.
>
> Personally, I don't think this is a huge problem; it can be fixed with
> btrfsck --init-extent-tree --repair.
Thanks. This is what I get so far (it takes several hours to complete
because the partition is huge):
# btrfsck --init-extent-tree --repair /dev/md1
enabling repair mode
Checking filesystem on /dev/md1
UUID: e16cc571-d897-473a-91d3-7673ea91d4cc
Creating a new extent tree
Failed to find [12968716681216, 168, 16384]
btrfs unable to find ref byte nr 12968927035392 parent 0 root 1 owner
1 offset 0
Failed to find [12968716730368, 168, 16384]
btrfs unable to find ref byte nr 12969060089856 parent 0 root 1 owner
0 offset 1
parent transid verify failed on 1103806464 wanted 19 found 13821
Ignoring transid failure
Failed to find [12968716779520, 168, 16384]
btrfs unable to find ref byte nr 12968933851136 parent 0 root 1 owner
0 offset 1
Does it mean something is screwed up and unrecoverable?
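As background for the "parent transid verify failed" line in the output above: it reports a block address together with the generation the parent pointer expected ("wanted") and the generation actually found in the block. A small sketch, assuming the message format shown in this thread, pulls those three numbers out so they can be compared; the function name and the dictionary keys are chosen here for illustration only.

```python
import re
from typing import Dict, Optional

# Matches the message format as it appears in the btrfsck output above.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def parse_transid_failure(line: str) -> Optional[Dict[str, int]]:
    """Extract block address and generations from one output line, or None."""
    match = TRANSID_RE.search(line)
    if match is None:
        return None
    bytenr, wanted, found = (int(group) for group in match.groups())
    return {"bytenr": bytenr, "wanted": wanted, "found": found}


sample = "parent transid verify failed on 1103806464 wanted 19 found 13821"
print(parse_transid_failure(sample))
```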
>
> But you must first make sure the other trees are not corrupted, by running
> btrfsck without any options to check the filesystem.
I did run btrfs check --repair without other options. It finished without
problems as far as I understand the output. I also tried mounting with -o
clear_cache. None of this seems to help: I tried rsync after that
and the FS went read-only with the same error message.
* Re: Transaction aborted (error -17) after crash
2015-11-18 5:18 ` Mordechay Kaganer
@ 2015-11-18 5:31 ` Qu Wenruo
2015-11-18 12:02 ` Mordechay Kaganer
0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2015-11-18 5:31 UTC
To: Mordechay Kaganer, Qu Wenruo; +Cc: Btrfs BTRFS
Mordechay Kaganer wrote on 2015/11/18 07:18 +0200:
> B.H.
>
> On Wed, Nov 18, 2015 at 5:02 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2015-11-17 13:12, Mordechay Kaganer wrote:
>>>
>>> B.H.
>>>
>>>
>>> [ 836.026606] BTRFS warning (device md1): block group 12969790406656
>>> has wrong amount of free space
>>> [ 836.026610] BTRFS warning (device md1): failed to load free space
>>> cache for block group 12969790406656, rebuild it now
>>> [ 1033.619798] BTRFS warning (device md1): block group 15322358743040
>>> has wrong amount of free space
>>> [ 1033.619801] BTRFS warning (device md1): failed to load free space
>>> cache for block group 15322358743040, rebuild it now
>>
>>
>> The block group seems to be corrupted, and other parts of the extent tree
>> may be corrupted as well.
>>
>> Personally, I don't think this is a huge problem; it can be fixed with
>> btrfsck --init-extent-tree --repair.
>
> Thanks. This is what I get so far (it takes several hours to complete
> because the partition is huge):
>
> # btrfsck --init-extent-tree --repair /dev/md1
> enabling repair mode
> Checking filesystem on /dev/md1
> UUID: e16cc571-d897-473a-91d3-7673ea91d4cc
> Creating a new extent tree
> Failed to find [12968716681216, 168, 16384]
> btrfs unable to find ref byte nr 12968927035392 parent 0 root 1 owner
> 1 offset 0
> Failed to find [12968716730368, 168, 16384]
> btrfs unable to find ref byte nr 12969060089856 parent 0 root 1 owner
> 0 offset 1
> parent transid verify failed on 1103806464 wanted 19 found 13821
> Ignoring transid failure
> Failed to find [12968716779520, 168, 16384]
> btrfs unable to find ref byte nr 12968933851136 parent 0 root 1 owner
> 0 offset 1
>
> Does it mean something is screwed up and unrecoverable?
Hard to say yet.
Is that the only error?
And did you try btrfsck --init-extent-tree alone?
IIRC --init-extent-tree implies --repair, but in any case it seems the
extent tree was not rebuilt correctly.
>
>>
>> But you must first make sure the other trees are not corrupted, by running
>> btrfsck without any options to check the filesystem.
>
> I did run btrfs check --repair without other options.
Oh, I meant running "btrfs check" without any other options,
and in particular without "--repair"...
> It finished without
> problems as far as I understand the output.
That's good news; at least it didn't make the problem worse.
Also, it would be quite helpful if you could post all the output (for both
btrfsck --init-extent-tree and plain btrfsck), to give us a better view of
what's going wrong.
Thanks,
Qu
> I also tried mounting with -o
> clear_cache. None of this seems to help: I tried rsync after that
> and the FS went read-only with the same error message.
>
* Re: Transaction aborted (error -17) after crash
2015-11-18 5:31 ` Qu Wenruo
@ 2015-11-18 12:02 ` Mordechay Kaganer
2015-11-18 12:28 ` Qu Wenruo
0 siblings, 1 reply; 6+ messages in thread
From: Mordechay Kaganer @ 2015-11-18 12:02 UTC
To: Qu Wenruo; +Cc: Qu Wenruo, Btrfs BTRFS
B.H.
On Wed, Nov 18, 2015 at 7:31 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
> Hard to say yet.
>
> Is that the only error?
>
> And did you try btrfsck --init-extent-tree alone?
> IIRC --init-extent-tree implies --repair, but in any case it seems the
> extent tree was not rebuilt correctly.
btrfs check --init-extent-tree --repair gives tons of messages like this:
ref mismatch on [1628438528 16384] extent item 0, found 1
adding new tree backref on start 1628438528 len 16384 parent 0 root 258
Backref 1628438528 parent 258 root 258 not found in extent tree
backpointer mismatch on [1628438528 16384]
ref mismatch on [1628454912 16384] extent item 0, found 1
adding new tree backref on start 1628454912 len 16384 parent 0 root 7
Backref 1628454912 parent 7 root 7 not found in extent tree
backpointer mismatch on [1628454912 16384]
ref mismatch on [1628471296 16384] extent item 0, found 1
adding new tree backref on start 1628471296 len 16384 parent 0 root 7
Backref 1628471296 parent 7 root 7 not found in extent tree
backpointer mismatch on [1628471296 16384]
ref mismatch on [1628487680 16384] extent item 0, found 1
adding new tree backref on start 1628487680 len 16384 parent 0 root 7
Backref 1628487680 parent 7 root 7 not found in extent tree
backpointer mismatch on [1628487680 16384]
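With tons of messages like these, a per-root tally is easier to read than the raw stream. The sketch below, which assumes the "adding new tree backref" line format shown above and uses a made-up helper name, counts how many backrefs were re-added for each root id. As far as I know, root 7 is the checksum tree and ids from 256 upward are subvolumes, but treat that reading as an assumption.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "adding new tree backref" lines as printed in this thread.
BACKREF_RE = re.compile(
    r"adding new tree backref on start (\d+) len (\d+) parent (\d+) root (\d+)"
)


def count_backrefs_by_root(lines: Iterable[str]) -> Counter:
    """Count re-added tree backrefs per root id; other lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = BACKREF_RE.search(line)
        if match:
            counts[int(match.group(4))] += 1
    return counts


# A subset of the output quoted in this message.
sample = """\
ref mismatch on [1628438528 16384] extent item 0, found 1
adding new tree backref on start 1628438528 len 16384 parent 0 root 258
adding new tree backref on start 1628454912 len 16384 parent 0 root 7
adding new tree backref on start 1628471296 len 16384 parent 0 root 7
adding new tree backref on start 1628487680 len 16384 parent 0 root 7
backpointer mismatch on [1628487680 16384]
"""
print(count_backrefs_by_root(sample.splitlines()))
```

To use it on a real run, the saved btrfs check output could be fed in line by line instead of the embedded sample.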
Anyway, that's only a backup, so I think it's better to rebuild it from
scratch, right?
* Re: Transaction aborted (error -17) after crash
2015-11-18 12:02 ` Mordechay Kaganer
@ 2015-11-18 12:28 ` Qu Wenruo
0 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2015-11-18 12:28 UTC
To: Mordechay Kaganer, Qu Wenruo; +Cc: Btrfs BTRFS
On 2015-11-18 20:02, Mordechay Kaganer wrote:
> B.H.
>
> On Wed, Nov 18, 2015 at 7:31 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>> Hard to say yet.
>>
>> Is that the only error?
>>
>> And did you try btrfsck --init-extent-tree alone?
>> IIRC --init-extent-tree implies --repair, but in any case it seems the
>> extent tree was not rebuilt correctly.
>
> btrfs check --init-extent-tree --repair gives tons of messages like this:
>
> ref mismatch on [1628438528 16384] extent item 0, found 1
> adding new tree backref on start 1628438528 len 16384 parent 0 root 258
> Backref 1628438528 parent 258 root 258 not found in extent tree
> backpointer mismatch on [1628438528 16384]
> ref mismatch on [1628454912 16384] extent item 0, found 1
> adding new tree backref on start 1628454912 len 16384 parent 0 root 7
> Backref 1628454912 parent 7 root 7 not found in extent tree
> backpointer mismatch on [1628454912 16384]
> ref mismatch on [1628471296 16384] extent item 0, found 1
> adding new tree backref on start 1628471296 len 16384 parent 0 root 7
> Backref 1628471296 parent 7 root 7 not found in extent tree
> backpointer mismatch on [1628471296 16384]
> ref mismatch on [1628487680 16384] extent item 0, found 1
> adding new tree backref on start 1628487680 len 16384 parent 0 root 7
> Backref 1628487680 parent 7 root 7 not found in extent tree
> backpointer mismatch on [1628487680 16384]
>
> Anyway, that's only a backup, so I think it's better to rebuild it from
> scratch, right?
>
If that's OK for you, then it's OK.
BTW, did you run btrfsck *after* --init-extent-tree, *without* --repair?
I think it would be much better to do that final btrfsck.
And if that final run reports no errors and btrfs still hits the same
problem, I think it's time to rebuild the backup.
Thanks,
Qu