From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Naohiro Aota <naohiro.aota@wdc.com>, linux-btrfs@vger.kernel.org
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Subject: Re: [PATCH] btrfs: avoid possible parallel list adding
Date: Fri, 28 Jun 2024 15:07:04 +0930 [thread overview]
Message-ID: <89357644-24e6-4894-86b6-fbe5d369dd0a@gmx.com> (raw)
In-Reply-To: <58e8574ccd70645c613e6bc7d328e34c2f52421a.1719549099.git.naohiro.aota@wdc.com>
On 2024/6/28 14:02, Naohiro Aota wrote:
> There is a potential race between re-adding a block group to the retry
> list in btrfs_reclaim_bgs_work() and adding it to the unused list.
> Since the block group is removed from the reclaim list while relocation
> work runs on it, it can be added to the unused list in parallel. When
> that happens, adding it back to the reclaim list corrupts the list head
> and triggers list corruption like below.
>
> Fix it by taking fs_info->unused_bgs_lock.
>
> [27177.504101][T2585409] BTRFS error (device nullb1): error relocating chunk 2415919104
> [27177.514722][T2585409] list_del corruption. next->prev should be ff11000344b119c0, but was ff11000377e87c70. (next=ff110002390cd9c0)
> [27177.529789][T2585409] ------------[ cut here ]------------
> [27177.537690][T2585409] kernel BUG at lib/list_debug.c:65!
> [27177.545548][T2585409] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [27177.555466][T2585409] CPU: 9 PID: 2585409 Comm: kworker/u128:2 Tainted: G W 6.10.0-rc5-kts #1
> [27177.568502][T2585409] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2 02/14/2022
> [27177.579784][T2585409] Workqueue: events_unbound btrfs_reclaim_bgs_work[btrfs]
> [27177.589946][T2585409] RIP: 0010:__list_del_entry_valid_or_report.cold+0x70/0x72
> [27177.600088][T2585409] Code: fa ff 0f 0b 4c 89 e2 48 89 de 48 c7 c7 c0
> 8c 3b 93 e8 ab 8e fa ff 0f 0b 4c 89 e1 48 89 de 48 c7 c7 00 8e 3b 93 e8
> 97 8e fa ff <0f> 0b 48 63 d1 4c 89 f6 48 c7 c7 e0 56 67 94 89 4c 24 04
> e8 ff f2
> [27177.624173][T2585409] RSP: 0018:ff11000377e87a70 EFLAGS: 00010286
> [27177.633052][T2585409] RAX: 000000000000006d RBX: ff11000344b119c0 RCX: 0000000000000000
> [27177.644059][T2585409] RDX: 000000000000006d RSI: 0000000000000008 RDI: ffe21c006efd0f40
> [27177.655030][T2585409] RBP: ff110002e0509f78 R08: 0000000000000001 R09: ffe21c006efd0f08
> [27177.665992][T2585409] R10: ff11000377e87847 R11: 0000000000000000 R12: ff110002390cd9c0
> [27177.676964][T2585409] R13: ff11000344b119c0 R14: ff110002e0508000 R15: dffffc0000000000
> [27177.687938][T2585409] FS: 0000000000000000(0000) GS:ff11000fec880000(0000) knlGS:0000000000000000
> [27177.700006][T2585409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [27177.709431][T2585409] CR2: 00007f06bc7b1978 CR3: 0000001021e86005 CR4: 0000000000771ef0
> [27177.720402][T2585409] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [27177.731337][T2585409] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [27177.742252][T2585409] PKRU: 55555554
> [27177.748207][T2585409] Call Trace:
> [27177.753850][T2585409] <TASK>
> [27177.759103][T2585409] ? __die_body.cold+0x19/0x27
> [27177.766405][T2585409] ? die+0x2e/0x50
> [27177.772498][T2585409] ? do_trap+0x1ea/0x2d0
> [27177.779139][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
> [27177.788518][T2585409] ? do_error_trap+0xa3/0x160
> [27177.795649][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
> [27177.805045][T2585409] ? handle_invalid_op+0x2c/0x40
> [27177.812022][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
> [27177.820947][T2585409] ? exc_invalid_op+0x2d/0x40
> [27177.827608][T2585409] ? asm_exc_invalid_op+0x1a/0x20
> [27177.834637][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
> [27177.843519][T2585409] btrfs_delete_unused_bgs+0x3d9/0x14c0 [btrfs]
>
> There is similar retry_list code in btrfs_delete_unused_bgs(), but it is
> safe as far as I can tell. Since the block group was on the unused list,
> its used bytes should have been 0 when it was added there. Then, it
> checks that block_group->{used,reserved,pinned} are still 0 under the
> block_group->lock. So, it is still eligible for the unused list, not the
> reclaim list.
>
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Suggested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
> Fixes: 4eb4e85c4f81 ("btrfs: retry block group reclaim without infinite loop")
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Just to mention, I found another location which may cause problems:

- btrfs_make_block_group()
  We call list_add_tail() to add the current bg to trans->new_bgs
  without any extra locking.

Not sure if it's racy, as it doesn't look possible to call
btrfs_make_block_group() concurrently on the same trans handle, but
maybe we want to make sure it's safe.
Thanks,
Qu
> ---
> fs/btrfs/block-group.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 7cde40fe6a50..498442d0c216 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1945,8 +1945,17 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
> next:
> if (ret && !READ_ONCE(space_info->periodic_reclaim)) {
> /* Refcount held by the reclaim_bgs list after splice. */
> - btrfs_get_block_group(bg);
> - list_add_tail(&bg->bg_list, &retry_list);
> + spin_lock(&fs_info->unused_bgs_lock);
> + /*
> + * This block group might be added to the unused list
> + * during the above process. Move it back to the
> + * reclaim list otherwise.
> + */
> + if (list_empty(&bg->bg_list)) {
> + btrfs_get_block_group(bg);
> + list_add_tail(&bg->bg_list, &retry_list);
> + }
> + spin_unlock(&fs_info->unused_bgs_lock);
> }
> btrfs_put_block_group(bg);
>