From: David Sterba <dsterba@suse.cz>
To: Naohiro Aota <naohiro.aota@wdc.com>
Cc: linux-btrfs@vger.kernel.org,
Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Subject: Re: [PATCH] btrfs: avoid possible parallel list adding
Date: Mon, 1 Jul 2024 15:12:44 +0200 [thread overview]
Message-ID: <20240701131244.GB21023@twin.jikos.cz> (raw)
In-Reply-To: <58e8574ccd70645c613e6bc7d328e34c2f52421a.1719549099.git.naohiro.aota@wdc.com>
On Fri, Jun 28, 2024 at 01:32:24PM +0900, Naohiro Aota wrote:
> There is a potential parallel list adding for retrying in
> btrfs_reclaim_bgs_work and adding to the unused list. Since the block
> group is removed from the reclaim list and it is on a relocation work,
> it can be added into the unused list in parallel. When that happens,
> adding it to the reclaim list will corrupt the list head and trigger
> list corruption like below.
>
> Fix it by taking fs_info->unused_bgs_lock.
>
> [27177.504101][T2585409] BTRFS error (device nullb1): error relocating ch= unk 2415919104
> [27177.514722][T2585409] list_del corruption. next->prev should be ff1100= 0344b119c0, but was ff11000377e87c70. (next=3Dff110002390cd9c0)
> [27177.529789][T2585409] ------------[ cut here ]------------
> [27177.537690][T2585409] kernel BUG at lib/list_debug.c:65!
> [27177.545548][T2585409] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [27177.555466][T2585409] CPU: 9 PID: 2585409 Comm: kworker/u128:2 Tainted: G W 6.10.0-rc5-kts #1
> [27177.568502][T2585409] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2 02/14/2022
> [27177.579784][T2585409] Workqueue: events_unbound btrfs_reclaim_bgs_work[btrfs]
> [27177.589946][T2585409] RIP: 0010:__list_del_entry_valid_or_report.cold+0x70/0x72
> [27177.600088][T2585409] Code: fa ff 0f 0b 4c 89 e2 48 89 de 48 c7 c7 c0
> 8c 3b 93 e8 ab 8e fa ff 0f 0b 4c 89 e1 48 89 de 48 c7 c7 00 8e 3b 93 e8
> 97 8e fa ff <0f> 0b 48 63 d1 4c 89 f6 48 c7 c7 e0 56 67 94 89 4c 24 04
> e8 ff f2
> [27177.624173][T2585409] RSP: 0018:ff11000377e87a70 EFLAGS: 00010286
> [27177.633052][T2585409] RAX: 000000000000006d RBX: ff11000344b119c0 RCX:0000000000000000
> [27177.644059][T2585409] RDX: 000000000000006d RSI: 0000000000000008 RDI:ffe21c006efd0f40
> [27177.655030][T2585409] RBP: ff110002e0509f78 R08: 0000000000000001 R09:ffe21c006efd0f08
> [27177.665992][T2585409] R10: ff11000377e87847 R11: 0000000000000000 R12:ff110002390cd9c0
> [27177.676964][T2585409] R13: ff11000344b119c0 R14: ff110002e0508000 R15:dffffc0000000000
> [27177.687938][T2585409] FS: 0000000000000000(0000) GS:ff11000fec880000(0000) knlGS:0000000000000000
> [27177.700006][T2585409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [27177.709431][T2585409] CR2: 00007f06bc7b1978 CR3: 0000001021e86005 CR4:0000000000771ef0
> [27177.720402][T2585409] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
> [27177.731337][T2585409] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
> [27177.742252][T2585409] PKRU: 55555554
> [27177.748207][T2585409] Call Trace:
> [27177.753850][T2585409] <TASK>
> [27177.759103][T2585409] ? __die_body.cold+0x19/0x27
> [27177.766405][T2585409] ? die+0x2e/0x50
> [27177.772498][T2585409] ? do_trap+0x1ea/0x2d0
> [27177.779139][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
> [27177.788518][T2585409] ? do_error_trap+0xa3/0x160
> [27177.795649][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
> [27177.805045][T2585409] ? handle_invalid_op+0x2c/0x40
> [27177.812022][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
> [27177.820947][T2585409] ? exc_invalid_op+0x2d/0x40
> [27177.827608][T2585409] ? asm_exc_invalid_op+0x1a/0x20
> [27177.834637][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
> [27177.843519][T2585409] btrfs_delete_unused_bgs+0x3d9/0x14c0 [btrfs]
>
> There is a similar retry_list code in btrfs_delete_unused_bgs(), but it is
> safe, AFAIC. Since the block group was in the unused list, the used bytes
> should be 0 when it was added to the unused list. Then, it checks
> block_group->{used,resereved,pinned} are still 0 under the
> block_group->lock. So, they should be still eligible for the unused list,
> not the reclaim list.
>
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Suggested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
> Fixes: 4eb4e85c4f81 ("btrfs: retry block group reclaim without infinite loop")
This commit has landed in several stable trees so we need to get this
fixup out quickly. How hard is to hit the problem? It's caught by list
corruption detection but this is most likely not enabled on distro
kernels so it might go unnoticed.
next prev parent reply other threads:[~2024-07-01 13:12 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-28 4:32 [PATCH] btrfs: avoid possible parallel list adding Naohiro Aota
2024-06-28 5:37 ` Qu Wenruo
2024-06-28 9:56 ` Filipe Manana
2024-06-28 6:02 ` Johannes Thumshirn
2024-06-28 9:45 ` Filipe Manana
2024-07-01 7:14 ` Naohiro Aota
2024-07-01 9:14 ` Filipe Manana
2024-06-28 10:03 ` Filipe Manana
2024-07-01 7:17 ` Naohiro Aota
2024-07-01 13:12 ` David Sterba [this message]
2024-07-01 13:35 ` Naohiro Aota
2024-07-01 13:50 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240701131244.GB21023@twin.jikos.cz \
--to=dsterba@suse.cz \
--cc=Johannes.Thumshirn@wdc.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=naohiro.aota@wdc.com \
--cc=shinichiro.kawasaki@wdc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox