From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
WenRuo Qu <wqu@suse.com>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] btrfs: fix the wrong rcu_str_deref() usage
Date: Fri, 4 Oct 2024 19:38:49 +0930 [thread overview]
Message-ID: <e5ddc5d0-cfb7-4a86-bc6d-d88cfa93c103@gmx.com> (raw)
In-Reply-To: <ace405db-4089-4fe0-b9dc-ef7bacb08f09@wdc.com>
在 2024/10/4 19:33, Johannes Thumshirn 写道:
> On 02.10.24 23:35, Qu Wenruo wrote:
>> For rcu_str_deref(), it should be called with rcu read lock hold.
>> But inside the function is_same_device(), the rcu string is accessed
>> without holding the rcu read lock, causing rcu warnings.
>>
>> Fix it by holding the rcu read lock during the path resolution of the
>> existing device.
>>
>> This will be folded into the offending patch "btrfs: avoid unnecessary
>> device path update for the same device"
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/volumes.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 35c4d151b5b0..3867d3c18be5 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path)
>> if (!device->name)
>> goto out;
>>
>> + rcu_read_lock();
>> old_path = rcu_str_deref(device->name);
>> ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
>> + rcu_read_unlock();
>> if (ret)
>> goto out;
>> ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
>
>
> With that patch applied I get the following in btrfs/006 with Qemu
> emulated NVMe drives:
> [ 150.421039] run fstests btrfs/006 at 2024-10-04 09:50:53
> [ 151.322158] BTRFS: device fsid e9489dbd-0346-4c23-9b8b-9a0221e235f8
> devid 1 transid 8 /dev/nvme0n1 (259:1) scanned by mount (29282)
> [ 151.322779] BTRFS info (device nvme0n1): first mount of filesystem
> e9489dbd-0346-4c23-9b8b-9a0221e235f8
> [ 151.322789] BTRFS info (device nvme0n1): using crc32c (crc32c-intel)
> checksum algorithm
> [ 151.322794] BTRFS info (device nvme0n1): using free-space-tree
> [ 152.831300] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 1 transid 6 /dev/nvme1n1 (259:0) scanned by mkfs.btrfs (29450)
> [ 152.831712] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 2 transid 6 /dev/nvme2n1 (259:3) scanned by mkfs.btrfs (29450)
> [ 152.832292] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 3 transid 6 /dev/nvme3n1 (259:2) scanned by mkfs.btrfs (29450)
> [ 152.832747] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 4 transid 6 /dev/nvme4n1 (259:4) scanned by mkfs.btrfs (29450)
> [ 152.833189] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 5 transid 6 /dev/nvme5n1 (259:5) scanned by mkfs.btrfs (29450)
> [ 152.872989] BUG: sleeping function called from invalid context at
> include/linux/sched/mm.h:337
My bad, I forgot that the kern_path() can sleep, thus it can not be
utilized inside RCU context.
Instead I should dump the rcu string into a local buffer instead.
And of course I should enable the sleep in atomic for my aarch64 VM too.
Sorry for the RCU related regression again and again.
Qu
> [ 152.872995] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid:
> 29459, name: (udev-worker)
> [ 152.872997] preempt_count: 0, expected: 0
> [ 152.872999] RCU nest depth: 1, expected: 0
> [ 152.873001] 3 locks held by (udev-worker)/29459:
> [ 152.873003] #0: ffffffff8253e968 (uuid_mutex){....}-{3:3}, at:
> btrfs_control_ioctl+0xcb/0x1a0
> [ 152.873016] #1: ffff8881105c18d8
> (&fs_devs->device_list_mutex){....}-{3:3}, at:
> device_list_add.constprop.0+0x126/0x670
> [ 152.873027] #2: ffffffff824cdb60 (rcu_read_lock){....}-{1:2}, at:
> device_list_add.constprop.0+0x193/0x670
> [ 152.873036] CPU: 0 UID: 0 PID: 29459 Comm: (udev-worker) Tainted: G
> W 6.11.0-rc7+ #800
> [ 152.873040] Tainted: [W]=WARN
> [ 152.873042] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 152.873044] Call Trace:
> [ 152.873047] <TASK>
> [ 152.873050] dump_stack_lvl+0x4b/0x70
> [ 152.873056] __might_resched.cold+0xbf/0xcf
> [ 152.873061] kmem_cache_alloc_noprof+0x1c5/0x2e0
> [ 152.873067] ? getname_kernel+0x29/0x150
> [ 152.873073] getname_kernel+0x29/0x150
> [ 152.873077] kern_path+0x17/0x50
> [ 152.873081] device_list_add.constprop.0+0x1c1/0x670
> [ 152.873085] ? device_list_add.constprop.0+0x193/0x670
> [ 152.873089] ? device_list_add.constprop.0+0x193/0x670
> [ 152.873092] ? btrfs_scan_one_device+0x1bf/0x410
> [ 152.873101] btrfs_scan_one_device+0x239/0x410
> [ 152.873112] btrfs_control_ioctl+0xdb/0x1a0
> [ 152.873119] __x64_sys_ioctl+0x96/0xc0
> [ 152.873127] do_syscall_64+0x54/0x110
> [ 152.873132] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 152.873139] RIP: 0033:0x7fa8daf99f9b
> [ 152.873143] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10
> 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f
> 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> [ 152.873146] RSP: 002b:00007ffd34cd5cf0 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 152.873150] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007fa8daf99f9b
> [ 152.873152] RDX: 00007ffd34cd5d70 RSI: 0000000090009427 RDI:
> 0000000000000005
> [ 152.873154] RBP: 00007ffd34cd5d70 R08: 0000000000000000 R09:
> 00007ffd34cd6cd0
> [ 152.873156] R10: 0000000000000000 R11: 0000000000000246 R12:
> 00005639a2ce2b60
> [ 152.873158] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000005
> [ 152.873166] </TASK>
> [ 153.117462] BTRFS info (device nvme1n1): first mount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> [ 153.117475] BTRFS info (device nvme1n1): using crc32c (crc32c-intel)
> checksum algorithm
> [ 153.117480] BTRFS info (device nvme1n1): using free-space-tree
> [ 153.120689] BTRFS info (device nvme1n1): checking UUID tree
> [ 153.632137] BTRFS info (device nvme0n1): last unmount of filesystem
> e9489dbd-0346-4c23-9b8b-9a0221e235f8
> [ 153.715256] BTRFS info (device nvme1n1): last unmount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> [ 153.818188] BTRFS info (device nvme1n1): first mount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> [ 153.818203] BTRFS info (device nvme1n1): using crc32c (crc32c-intel)
> checksum algorithm
> [ 153.818209] BTRFS info (device nvme1n1): using free-space-tree
> [ 153.884093] BTRFS info (device nvme1n1): last unmount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
prev parent reply other threads:[~2024-10-04 10:08 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-02 21:29 [PATCH] btrfs: fix the wrong rcu_str_deref() usage Qu Wenruo
2024-10-03 10:10 ` Filipe Manana
2024-10-04 10:03 ` Johannes Thumshirn
2024-10-04 10:08 ` Qu Wenruo [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e5ddc5d0-cfb7-4a86-bc6d-d88cfa93c103@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=Johannes.Thumshirn@wdc.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).