Linux Btrfs filesystem development
 help / color / mirror / Atom feed
* [PATCH] btrfs: avoid possible parallel list adding
@ 2024-06-28  4:32 Naohiro Aota
  2024-06-28  5:37 ` Qu Wenruo
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Naohiro Aota @ 2024-06-28  4:32 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Naohiro Aota, Shinichiro Kawasaki, Johannes Thumshirn

There is a potential parallel list adding for retrying in
btrfs_reclaim_bgs_work and adding to the unused list. Since the block
group is removed from the reclaim list and it is on a relocation work,
it can be added into the unused list in parallel. When that happens,
adding it to the reclaim list will corrupt the list head and trigger
list corruption like below.

Fix it by taking fs_info->unused_bgs_lock.

[27177.504101][T2585409] BTRFS error (device nullb1): error relocating ch= unk 2415919104
[27177.514722][T2585409] list_del corruption. next->prev should be ff1100= 0344b119c0, but was ff11000377e87c70. (next=3Dff110002390cd9c0)
[27177.529789][T2585409] ------------[ cut here ]------------
[27177.537690][T2585409] kernel BUG at lib/list_debug.c:65!
[27177.545548][T2585409] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[27177.555466][T2585409] CPU: 9 PID: 2585409 Comm: kworker/u128:2 Tainted: G        W          6.10.0-rc5-kts #1
[27177.568502][T2585409] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2 02/14/2022
[27177.579784][T2585409] Workqueue: events_unbound btrfs_reclaim_bgs_work[btrfs]
[27177.589946][T2585409] RIP: 0010:__list_del_entry_valid_or_report.cold+0x70/0x72
[27177.600088][T2585409] Code: fa ff 0f 0b 4c 89 e2 48 89 de 48 c7 c7 c0
8c 3b 93 e8 ab 8e fa ff 0f 0b 4c 89 e1 48 89 de 48 c7 c7 00 8e 3b 93 e8
97 8e fa ff <0f> 0b 48 63 d1 4c 89 f6 48 c7 c7 e0 56 67 94 89 4c 24 04
e8 ff f2
[27177.624173][T2585409] RSP: 0018:ff11000377e87a70 EFLAGS: 00010286
[27177.633052][T2585409] RAX: 000000000000006d RBX: ff11000344b119c0 RCX:0000000000000000
[27177.644059][T2585409] RDX: 000000000000006d RSI: 0000000000000008 RDI:ffe21c006efd0f40
[27177.655030][T2585409] RBP: ff110002e0509f78 R08: 0000000000000001 R09:ffe21c006efd0f08
[27177.665992][T2585409] R10: ff11000377e87847 R11: 0000000000000000 R12:ff110002390cd9c0
[27177.676964][T2585409] R13: ff11000344b119c0 R14: ff110002e0508000 R15:dffffc0000000000
[27177.687938][T2585409] FS:  0000000000000000(0000) GS:ff11000fec880000(0000) knlGS:0000000000000000
[27177.700006][T2585409] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27177.709431][T2585409] CR2: 00007f06bc7b1978 CR3: 0000001021e86005 CR4:0000000000771ef0
[27177.720402][T2585409] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
[27177.731337][T2585409] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
[27177.742252][T2585409] PKRU: 55555554
[27177.748207][T2585409] Call Trace:
[27177.753850][T2585409]  <TASK>
[27177.759103][T2585409]  ? __die_body.cold+0x19/0x27
[27177.766405][T2585409]  ? die+0x2e/0x50
[27177.772498][T2585409]  ? do_trap+0x1ea/0x2d0
[27177.779139][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
[27177.788518][T2585409]  ? do_error_trap+0xa3/0x160
[27177.795649][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
[27177.805045][T2585409]  ? handle_invalid_op+0x2c/0x40
[27177.812022][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
[27177.820947][T2585409]  ? exc_invalid_op+0x2d/0x40
[27177.827608][T2585409]  ? asm_exc_invalid_op+0x1a/0x20
[27177.834637][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
[27177.843519][T2585409]  btrfs_delete_unused_bgs+0x3d9/0x14c0 [btrfs]

There is a similar retry_list code in btrfs_delete_unused_bgs(), but it is
safe, AFAIC. Since the block group was in the unused list, the used bytes
should be 0 when it was added to the unused list. Then, it checks
block_group->{used,resereved,pinned} are still 0 under the
block_group->lock. So, they should be still eligible for the unused list,
not the reclaim list.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Suggested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Fixes: 4eb4e85c4f81 ("btrfs: retry block group reclaim without infinite loop")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 7cde40fe6a50..498442d0c216 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1945,8 +1945,17 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 next:
 		if (ret && !READ_ONCE(space_info->periodic_reclaim)) {
 			/* Refcount held by the reclaim_bgs list after splice. */
-			btrfs_get_block_group(bg);
-			list_add_tail(&bg->bg_list, &retry_list);
+			spin_lock(&fs_info->unused_bgs_lock);
+			/*
+			 * This block group might be added to the unused list
+			 * during the above process. Move it back to the
+			 * reclaim list otherwise.
+			 */
+			if (list_empty(&bg->bg_list)) {
+				btrfs_get_block_group(bg);
+				list_add_tail(&bg->bg_list, &retry_list);
+			}
+			spin_unlock(&fs_info->unused_bgs_lock);
 		}
 		btrfs_put_block_group(bg);
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-07-01 13:50 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-28  4:32 [PATCH] btrfs: avoid possible parallel list adding Naohiro Aota
2024-06-28  5:37 ` Qu Wenruo
2024-06-28  9:56   ` Filipe Manana
2024-06-28  6:02 ` Johannes Thumshirn
2024-06-28  9:45 ` Filipe Manana
2024-07-01  7:14   ` Naohiro Aota
2024-07-01  9:14     ` Filipe Manana
2024-06-28 10:03 ` Filipe Manana
2024-07-01  7:17   ` Naohiro Aota
2024-07-01 13:12 ` David Sterba
2024-07-01 13:35   ` Naohiro Aota
2024-07-01 13:50     ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox