* system hangs when running btrfs balance
@ 2020-08-10 9:22 Johannes Rohr
2020-08-10 9:59 ` Qu Wenruo
0 siblings, 1 reply; 5+ messages in thread
From: Johannes Rohr @ 2020-08-10 9:22 UTC (permalink / raw)
To: linux-btrfs
Dear devs,
since I upgraded our system from Ubuntu 18.04 LTS to 20.04 LTS, the file
system completely freezes when I run a btrfs balance on it. The only way
to get a usable system for the time being is with the mount option
"skip_balance".
The server has a raid1 with 4 SSDs with 500 GB each.
Here is a backtrace of what I am seeing:
[Sun Aug 9 12:21:35 2020] ------------[ cut here ]------------
[Sun Aug 9 12:21:35 2020] kernel BUG at fs/btrfs/relocation.c:2626!
[Sun Aug 9 12:21:35 2020] invalid opcode: 0000 [#1] SMP PTI
[Sun Aug 9 12:21:35 2020] CPU: 1 PID: 4537 Comm: btrfs-balance Tainted: G O 5.4.47 #1
[Sun Aug 9 12:21:35 2020] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.14.0 for D3401-H1x 06/09/2016
[Sun Aug 9 12:21:35 2020] RIP: 0010:select_reloc_root+0x5b/0x19f [btrfs]
[Sun Aug 9 12:21:35 2020] Code: c0 c7 44 24 04 00 00 00 00 e8 8b 9d 17 e1 48 89 df 4c 89 f6 48 8d 54 24 04 e8 9c e6 ff ff 48 8b 58 60 48 89 c5 48 85 db 75 02 <0f> 0b 48 8b 43 20 a8 02 75 02 0f 0b 48 83 bb df 01 00 00 f8 75 45
[Sun Aug 9 12:21:35 2020] RSP: 0018:ffff8887e0b0bb20 EFLAGS: 00010246
[Sun Aug 9 12:21:35 2020] RAX: ffff8887dfab5280 RBX: 0000000000000000 RCX: 0000000000000000
[Sun Aug 9 12:21:35 2020] RDX: ffff8887e0b0bb24 RSI: ffff8887e0b0bc10 RDI: ffff8887dfab52c0
[Sun Aug 9 12:21:35 2020] RBP: ffff8887dfab5280 R08: ffff8887dfab52c0 R09: ffffffffa0491e7e
[Sun Aug 9 12:21:35 2020] R10: ffff8887f4ba7e70 R11: ffff8888090ed158 R12: ffff8887dfab5280
[Sun Aug 9 12:21:35 2020] R13: ffff8887fd330800 R14: ffff8887e0b0bc10 R15: ffff8887e7fa66e8
[Sun Aug 9 12:21:35 2020] FS: 0000000000000000(0000) GS:ffff88880e240000(0000) knlGS:0000000000000000
[Sun Aug 9 12:21:35 2020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Aug 9 12:21:35 2020] CR2: 000055b4d5b7cfe0 CR3: 000000000200a004 CR4: 00000000003606e0
[Sun Aug 9 12:21:35 2020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Sun Aug 9 12:21:35 2020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Sun Aug 9 12:21:35 2020] Call Trace:
[Sun Aug 9 12:21:35 2020] do_relocation+0xb6/0x4c2 [btrfs]
[Sun Aug 9 12:21:35 2020] ? calcu_metadata_size.isra.36.constprop.42+0x9e/0xc4 [btrfs]
[Sun Aug 9 12:21:35 2020] ? do_raw_spin_lock+0x2f/0x5a
[Sun Aug 9 12:21:35 2020] ? btrfs_block_rsv_refill+0x4b/0x8b [btrfs]
[Sun Aug 9 12:21:35 2020] relocate_tree_blocks+0x301/0x427 [btrfs]
[Sun Aug 9 12:21:35 2020] ? tree_insert+0x49/0x4e [btrfs]
[Sun Aug 9 12:21:35 2020] ? add_tree_block.isra.38+0x11e/0x144 [btrfs]
[Sun Aug 9 12:21:35 2020] relocate_block_group+0x279/0x49e [btrfs]
[Sun Aug 9 12:21:35 2020] btrfs_relocate_block_group+0x15e/0x23d [btrfs]
[Sun Aug 9 12:21:35 2020] btrfs_relocate_chunk+0x25/0x8c [btrfs]
[Sun Aug 9 12:21:35 2020] btrfs_balance+0xaf0/0xd45 [btrfs]
[Sun Aug 9 12:21:35 2020] ? btrfs_balance+0xd45/0xd45 [btrfs]
[Sun Aug 9 12:21:35 2020] balance_kthread+0x32/0x46 [btrfs]
[Sun Aug 9 12:21:35 2020] kthread+0xf5/0xfa
[Sun Aug 9 12:21:35 2020] ? kthread_associate_blkcg+0x86/0x86
[Sun Aug 9 12:21:35 2020] ret_from_fork+0x3a/0x50
[Sun Aug 9 12:21:35 2020] Modules linked in: btrfs xor zstd_decompress
zstd_compress lzo_compress lzo_decompress zlib_deflate raid6_pq
libcrc32c sd_mod ipmi_devintf ipmi_msghandler sg x86_pkg_temp_thermal
intel_powerclamp kvm_intel kvm irqbypass crc32_pclmul crc32c_intel
iTCO_wdt ghash_clmulni_intel aesni_intel crypto_simd psmouse ahci cryptd
libahci i2c_i801 serio_raw glue_helper intel_pch_thermal evdev video
thermal acpi_pad button fan jc42 ftsteutates nct6775 hwmon_vid coretemp
ip_tables x_tables autofs4 e1000e
[Sun Aug 9 12:21:36 2020] ---[ end trace 442b443de6cecc6e ]---
[Sun Aug 9 12:21:36 2020] RIP: 0010:select_reloc_root+0x5b/0x19f [btrfs]
[Sun Aug 9 12:21:36 2020] Code: c0 c7 44 24 04 00 00 00 00 e8 8b 9d 17 e1 48 89 df 4c 89 f6 48 8d 54 24 04 e8 9c e6 ff ff 48 8b 58 60 48 89 c5 48 85 db 75 02 <0f> 0b 48 8b 43 20 a8 02 75 02 0f 0b 48 83 bb df 01 00 00 f8 75 45
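As a reading aid for the trace above, here is a minimal Python sketch (not part of the kernel output) that splits a "symbol+0xOFFSET/0xSIZE [module]" entry into its parts and picks out the byte marked with <...> in a "Code:" line. The helper names parse_frame and faulting_bytes are made up for illustration. On x86-64 the byte pair 0f 0b encodes the ud2 instruction, which is what BUG()/BUG_ON() compiles to, so the marked bytes are consistent with the "kernel BUG at" and "invalid opcode" lines.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE [module]" form seen in the oops above.
# The module part is optional (core kernel symbols have none).
SYMBOL_RE = re.compile(
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(text):
    """Return symbol, offset, size and module for one trace entry, or None."""
    m = SYMBOL_RE.search(text)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),
    }


def faulting_bytes(code_line):
    """Return the byte marked with <..> and the byte following it, or None."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            following = tokens[i + 1] if i + 1 < len(tokens) else None
            return tok[1:-1], following
    return None


frame = parse_frame("RIP: 0010:select_reloc_root+0x5b/0x19f [btrfs]")
print(frame)

marked = faulting_bytes("48 85 db 75 02 <0f> 0b 48 8b 43 20")
print(marked, "-> ud2" if marked == ("0f", "0b") else "-> something else")
```

The offset and size only say how far into select_reloc_root the faulting instruction sits; mapping that back to a source line still needs the matching kernel build with debug info.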
There has been a related bug report at kernel.org for a year,
https://bugzilla.kernel.org/show_bug.cgi?id=203405 and I have found
similar reports here and there, some pertaining to quite old kernel
versions, but we have only been hit since moving to kernel 5.4. Once the
problem had occurred, though, I had no better luck with older kernels
(4.x from Debian buster, also 4.x from Ubuntu 18.04).
Apart from fixing the underlying issue, would there be any workaround
for it? Currently the balance for the fs is in suspended status. Since
there are quite a few people who depend on this server, I can't just play
around with it at random. That's why I am asking for advice here...
Thanks so much for any suggestions you might have!
Johannes
* Re: system hangs when running btrfs balance
2020-08-10 9:22 system hangs when running btrfs balance Johannes Rohr
@ 2020-08-10 9:59 ` Qu Wenruo
2020-08-10 20:23 ` Johannes Rohr
0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2020-08-10 9:59 UTC (permalink / raw)
To: Johannes Rohr, linux-btrfs
On 2020/8/10 5:22 PM, Johannes Rohr wrote:
> Dear devs,
>
> since I upgraded our system from Ubuntu 18.04 LTS to 20.04 LTS, the file
> system completely freezes when I run a btrfs balance on it. The only way
> to get a usable system for the time being is with the mount option
> "skip_balance".
>
> The server has a raid1 with 4 SSDs with 500 GB each.
> [Sun Aug 9 12:21:35 2020] CPU: 1 PID: 4537 Comm: btrfs-balance Tainted: G O 5.4.47 #1
A quick glance at git log shows that some reloc-tree-related fixes
haven't landed in v5.4.47.
E.g. (commits are upstream commits, not stable tree commits):
1dae7e0e58b484eaa43d530f211098fdeeb0f404 btrfs: reloc: clear
DEAD_RELOC_TREE bit for orphan roots to prevent runaway balance
51415b6c1b117e223bc083e30af675cb5c5498f3 btrfs: reloc: fix reloc root
leak and NULL pointer dereference.
The above fixes only landed in v5.4.54, so I guess you have to update
your kernel anyway.
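For illustration only (a hedged sketch, not from the original message): a small Python check of whether a 5.4.y release string is at or above v5.4.54, the stable release named above. The function names are hypothetical. It only makes a statement about the 5.4.y series and returns None for anything else, and it assumes the string carries the upstream stable version; distribution kernels such as Ubuntu's often report their own ABI number in `uname -r` instead (on Ubuntu, /proc/version_signature shows the upstream base version).

```python
import re


def parse_kernel_version(release):
    """Turn a release string such as '5.4.47' into a tuple of three ints."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def has_reloc_fixes(release, minimum=(5, 4, 54)):
    """True/False for 5.4.y releases, None for any other series.

    The threshold (5, 4, 54) comes from the message above; other stable
    series picked up the fixes at different points, so no claim is made
    about them here.
    """
    version = parse_kernel_version(release)
    if version[:2] != minimum[:2]:
        return None
    return version >= minimum


for release in ("5.4.47", "5.4.54", "5.6.19"):
    print(release, has_reloc_fixes(release))
```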
> There has been a related bug report at kernel.org for a year,
> https://bugzilla.kernel.org/show_bug.cgi?id=203405 and I have found
> similar reports here and there, some pertaining to quite old kernel
> versions, but we have only been hit since moving to kernel 5.4. Once the
> problem had occurred, though, I had no better luck with older kernels
> (4.x from Debian buster, also 4.x from Ubuntu 18.04).
No, the one you mention is a different bug. We have some clues about it,
but need some time to solve it.
(It's mostly related to some special timing when canceling, which leaves
partly dropped trees.)
>
> Apart from fixing the underlying issue, would there be any workaround
> for it?
Update your kernel to at least v5.4.54, then mount with skip_balance and
finally run "btrfs balance cancel <mnt>".
After that, doing whatever you like should be fine.
I would also run a btrfs check on the unmounted (or at least read-only
mounted) fs first, to make sure your fs is sane to begin with.
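The order of operations described above could be sketched roughly as follows. This is an illustrative dry run in Python that only prints the commands and runs nothing; the device /dev/sdX1 and mount point /mnt/data are placeholders, the function name recovery_commands is made up, and it assumes the kernel has already been updated to at least v5.4.54.

```python
import shlex  # shlex.join needs Python 3.8 or newer


def recovery_commands(device, mountpoint):
    """Build the command sequence as argv lists; nothing is executed."""
    return [
        # 1. Check the filesystem while it is still unmounted.
        ["btrfs", "check", "--readonly", device],
        # 2. Mount without resuming the interrupted balance.
        ["mount", "-o", "skip_balance", device, mountpoint],
        # 3. Cancel the paused balance for good.
        ["btrfs", "balance", "cancel", mountpoint],
    ]


# Placeholder device and mount point; substitute the real ones.
for argv in recovery_commands("/dev/sdX1", "/mnt/data"):
    print(shlex.join(argv))
```

Printing rather than executing is deliberate here: on a production server each step is better reviewed and run by hand.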
Thanks,
Qu
> Currently the balance for the fs is in suspended status. Since
> there are quite a few people who depend on this server, I can't just play
> around with it at random. That's why I am asking for advice here...
>
> Thanks so much for any suggestions you might have!
>
> Johannes
>
* Re: system hangs when running btrfs balance
2020-08-10 9:59 ` Qu Wenruo
@ 2020-08-10 20:23 ` Johannes Rohr
2020-08-11 16:43 ` Johannes Rohr
0 siblings, 1 reply; 5+ messages in thread
From: Johannes Rohr @ 2020-08-10 20:23 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
Thanks so much, Qu, for your advice and the background.
We have had some issues with btrfs, but I definitely want to continue
using it. So the incredible responsiveness of btrfs devs like you is
definitely on the plus side.
Apparently, Ubuntu is preparing a kernel update to v5.4.54
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1889669 so our
problem should be solved soon!
Cheers,
Johannes
* Re: system hangs when running btrfs balance
2020-08-10 20:23 ` Johannes Rohr
@ 2020-08-11 16:43 ` Johannes Rohr
2020-08-11 21:30 ` Lukas Tribus
0 siblings, 1 reply; 5+ messages in thread
From: Johannes Rohr @ 2020-08-11 16:43 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
Dear Qu, dear all,
Has the fix also been backported to the 5.6 kernel series? If so,
starting with which version?
Cheers,
Johannes
* Re: system hangs when running btrfs balance
2020-08-11 16:43 ` Johannes Rohr
@ 2020-08-11 21:30 ` Lukas Tribus
0 siblings, 0 replies; 5+ messages in thread
From: Lukas Tribus @ 2020-08-11 21:30 UTC (permalink / raw)
To: Johannes Rohr; +Cc: Qu Wenruo, linux-btrfs
On Tue, 11 Aug 2020 at 18:43, Johannes Rohr <jorohr@gmail.com> wrote:
>
> Dear Qu, dear all,
>
> Has the fix also been backported to the 5.6 kernel series? If so,
> starting with which version?
No, 5.6 is EOL.
http://lkml.iu.edu/hypermail/linux/kernel/2006.2/02484.html
Lukas