* [PATCH] btrfs: fix the wrong rcu_str_deref() usage
@ 2024-10-02 21:29 Qu Wenruo
2024-10-03 10:10 ` Filipe Manana
2024-10-04 10:03 ` Johannes Thumshirn
0 siblings, 2 replies; 4+ messages in thread
From: Qu Wenruo @ 2024-10-02 21:29 UTC (permalink / raw)
To: linux-btrfs
For rcu_str_deref(), it should be called with rcu read lock hold.
But inside the function is_same_device(), the rcu string is accessed
without holding the rcu read lock, causing rcu warnings.
Fix it by holding the rcu read lock during the path resolution of the
existing device.
This will be folded into the offending patch "btrfs: avoid unnecessary
device path update for the same device"
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/volumes.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 35c4d151b5b0..3867d3c18be5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path)
if (!device->name)
goto out;
+ rcu_read_lock();
old_path = rcu_str_deref(device->name);
ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
+ rcu_read_unlock();
if (ret)
goto out;
ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
--
2.46.2
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] btrfs: fix the wrong rcu_str_deref() usage
2024-10-02 21:29 [PATCH] btrfs: fix the wrong rcu_str_deref() usage Qu Wenruo
@ 2024-10-03 10:10 ` Filipe Manana
2024-10-04 10:03 ` Johannes Thumshirn
1 sibling, 0 replies; 4+ messages in thread
From: Filipe Manana @ 2024-10-03 10:10 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Oct 2, 2024 at 10:39 PM Qu Wenruo <wqu@suse.com> wrote:
>
> For rcu_str_deref(), it should be called with rcu read lock hold.
> But inside the function is_same_device(), the rcu string is accessed
> without holding the rcu read lock, causing rcu warnings.
>
> Fix it by holding the rcu read lock during the path resolution of the
> existing device.
>
> This will be folded into the offending patch "btrfs: avoid unnecessary
> device path update for the same device"
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Thanks.
> ---
> fs/btrfs/volumes.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 35c4d151b5b0..3867d3c18be5 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path)
> if (!device->name)
> goto out;
>
> + rcu_read_lock();
> old_path = rcu_str_deref(device->name);
> ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
> + rcu_read_unlock();
> if (ret)
> goto out;
> ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
> --
> 2.46.2
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] btrfs: fix the wrong rcu_str_deref() usage
2024-10-02 21:29 [PATCH] btrfs: fix the wrong rcu_str_deref() usage Qu Wenruo
2024-10-03 10:10 ` Filipe Manana
@ 2024-10-04 10:03 ` Johannes Thumshirn
2024-10-04 10:08 ` Qu Wenruo
1 sibling, 1 reply; 4+ messages in thread
From: Johannes Thumshirn @ 2024-10-04 10:03 UTC (permalink / raw)
To: WenRuo Qu, linux-btrfs@vger.kernel.org
On 02.10.24 23:35, Qu Wenruo wrote:
> For rcu_str_deref(), it should be called with rcu read lock hold.
> But inside the function is_same_device(), the rcu string is accessed
> without holding the rcu read lock, causing rcu warnings.
>
> Fix it by holding the rcu read lock during the path resolution of the
> existing device.
>
> This will be folded into the offending patch "btrfs: avoid unnecessary
> device path update for the same device"
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/volumes.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 35c4d151b5b0..3867d3c18be5 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path)
> if (!device->name)
> goto out;
>
> + rcu_read_lock();
> old_path = rcu_str_deref(device->name);
> ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
> + rcu_read_unlock();
> if (ret)
> goto out;
> ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
With that patch applied I get the following in btrfs/006 with Qemu
emulated NVMe drives:
[ 150.421039] run fstests btrfs/006 at 2024-10-04 09:50:53
[ 151.322158] BTRFS: device fsid e9489dbd-0346-4c23-9b8b-9a0221e235f8
devid 1 transid 8 /dev/nvme0n1 (259:1) scanned by mount (29282)
[ 151.322779] BTRFS info (device nvme0n1): first mount of filesystem
e9489dbd-0346-4c23-9b8b-9a0221e235f8
[ 151.322789] BTRFS info (device nvme0n1): using crc32c (crc32c-intel)
checksum algorithm
[ 151.322794] BTRFS info (device nvme0n1): using free-space-tree
[ 152.831300] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
devid 1 transid 6 /dev/nvme1n1 (259:0) scanned by mkfs.btrfs (29450)
[ 152.831712] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
devid 2 transid 6 /dev/nvme2n1 (259:3) scanned by mkfs.btrfs (29450)
[ 152.832292] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
devid 3 transid 6 /dev/nvme3n1 (259:2) scanned by mkfs.btrfs (29450)
[ 152.832747] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
devid 4 transid 6 /dev/nvme4n1 (259:4) scanned by mkfs.btrfs (29450)
[ 152.833189] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
devid 5 transid 6 /dev/nvme5n1 (259:5) scanned by mkfs.btrfs (29450)
[ 152.872989] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:337
[ 152.872995] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid:
29459, name: (udev-worker)
[ 152.872997] preempt_count: 0, expected: 0
[ 152.872999] RCU nest depth: 1, expected: 0
[ 152.873001] 3 locks held by (udev-worker)/29459:
[ 152.873003] #0: ffffffff8253e968 (uuid_mutex){....}-{3:3}, at:
btrfs_control_ioctl+0xcb/0x1a0
[ 152.873016] #1: ffff8881105c18d8
(&fs_devs->device_list_mutex){....}-{3:3}, at:
device_list_add.constprop.0+0x126/0x670
[ 152.873027] #2: ffffffff824cdb60 (rcu_read_lock){....}-{1:2}, at:
device_list_add.constprop.0+0x193/0x670
[ 152.873036] CPU: 0 UID: 0 PID: 29459 Comm: (udev-worker) Tainted: G
W 6.11.0-rc7+ #800
[ 152.873040] Tainted: [W]=WARN
[ 152.873042] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 152.873044] Call Trace:
[ 152.873047] <TASK>
[ 152.873050] dump_stack_lvl+0x4b/0x70
[ 152.873056] __might_resched.cold+0xbf/0xcf
[ 152.873061] kmem_cache_alloc_noprof+0x1c5/0x2e0
[ 152.873067] ? getname_kernel+0x29/0x150
[ 152.873073] getname_kernel+0x29/0x150
[ 152.873077] kern_path+0x17/0x50
[ 152.873081] device_list_add.constprop.0+0x1c1/0x670
[ 152.873085] ? device_list_add.constprop.0+0x193/0x670
[ 152.873089] ? device_list_add.constprop.0+0x193/0x670
[ 152.873092] ? btrfs_scan_one_device+0x1bf/0x410
[ 152.873101] btrfs_scan_one_device+0x239/0x410
[ 152.873112] btrfs_control_ioctl+0xdb/0x1a0
[ 152.873119] __x64_sys_ioctl+0x96/0xc0
[ 152.873127] do_syscall_64+0x54/0x110
[ 152.873132] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 152.873139] RIP: 0033:0x7fa8daf99f9b
[ 152.873143] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10
00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f
05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 152.873146] RSP: 002b:00007ffd34cd5cf0 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 152.873150] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
00007fa8daf99f9b
[ 152.873152] RDX: 00007ffd34cd5d70 RSI: 0000000090009427 RDI:
0000000000000005
[ 152.873154] RBP: 00007ffd34cd5d70 R08: 0000000000000000 R09:
00007ffd34cd6cd0
[ 152.873156] R10: 0000000000000000 R11: 0000000000000246 R12:
00005639a2ce2b60
[ 152.873158] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000005
[ 152.873166] </TASK>
[ 153.117462] BTRFS info (device nvme1n1): first mount of filesystem
05148ba4-7ee1-480f-af62-7aa0b80b35f2
[ 153.117475] BTRFS info (device nvme1n1): using crc32c (crc32c-intel)
checksum algorithm
[ 153.117480] BTRFS info (device nvme1n1): using free-space-tree
[ 153.120689] BTRFS info (device nvme1n1): checking UUID tree
[ 153.632137] BTRFS info (device nvme0n1): last unmount of filesystem
e9489dbd-0346-4c23-9b8b-9a0221e235f8
[ 153.715256] BTRFS info (device nvme1n1): last unmount of filesystem
05148ba4-7ee1-480f-af62-7aa0b80b35f2
[ 153.818188] BTRFS info (device nvme1n1): first mount of filesystem
05148ba4-7ee1-480f-af62-7aa0b80b35f2
[ 153.818203] BTRFS info (device nvme1n1): using crc32c (crc32c-intel)
checksum algorithm
[ 153.818209] BTRFS info (device nvme1n1): using free-space-tree
[ 153.884093] BTRFS info (device nvme1n1): last unmount of filesystem
05148ba4-7ee1-480f-af62-7aa0b80b35f2
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] btrfs: fix the wrong rcu_str_deref() usage
2024-10-04 10:03 ` Johannes Thumshirn
@ 2024-10-04 10:08 ` Qu Wenruo
0 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2024-10-04 10:08 UTC (permalink / raw)
To: Johannes Thumshirn, WenRuo Qu, linux-btrfs@vger.kernel.org
在 2024/10/4 19:33, Johannes Thumshirn 写道:
> On 02.10.24 23:35, Qu Wenruo wrote:
>> For rcu_str_deref(), it should be called with rcu read lock hold.
>> But inside the function is_same_device(), the rcu string is accessed
>> without holding the rcu read lock, causing rcu warnings.
>>
>> Fix it by holding the rcu read lock during the path resolution of the
>> existing device.
>>
>> This will be folded into the offending patch "btrfs: avoid unnecessary
>> device path update for the same device"
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/volumes.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 35c4d151b5b0..3867d3c18be5 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path)
>> if (!device->name)
>> goto out;
>>
>> + rcu_read_lock();
>> old_path = rcu_str_deref(device->name);
>> ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
>> + rcu_read_unlock();
>> if (ret)
>> goto out;
>> ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
>
>
> With that patch applied I get the following in btrfs/006 with Qemu
> emulated NVMe drives:
> [ 150.421039] run fstests btrfs/006 at 2024-10-04 09:50:53
> [ 151.322158] BTRFS: device fsid e9489dbd-0346-4c23-9b8b-9a0221e235f8
> devid 1 transid 8 /dev/nvme0n1 (259:1) scanned by mount (29282)
> [ 151.322779] BTRFS info (device nvme0n1): first mount of filesystem
> e9489dbd-0346-4c23-9b8b-9a0221e235f8
> [ 151.322789] BTRFS info (device nvme0n1): using crc32c (crc32c-intel)
> checksum algorithm
> [ 151.322794] BTRFS info (device nvme0n1): using free-space-tree
> [ 152.831300] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 1 transid 6 /dev/nvme1n1 (259:0) scanned by mkfs.btrfs (29450)
> [ 152.831712] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 2 transid 6 /dev/nvme2n1 (259:3) scanned by mkfs.btrfs (29450)
> [ 152.832292] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 3 transid 6 /dev/nvme3n1 (259:2) scanned by mkfs.btrfs (29450)
> [ 152.832747] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 4 transid 6 /dev/nvme4n1 (259:4) scanned by mkfs.btrfs (29450)
> [ 152.833189] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 5 transid 6 /dev/nvme5n1 (259:5) scanned by mkfs.btrfs (29450)
> [ 152.872989] BUG: sleeping function called from invalid context at
> include/linux/sched/mm.h:337
My bad, I forgot that the kern_path() can sleep, thus it can not be
utilized inside RCU context.
Instead I should dump the rcu string into a local buffer instead.
And of course I should enable the sleep in atomic for my aarch64 VM too.
Sorry for the RCU related regression again and again.
Qu
> [ 152.872995] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid:
> 29459, name: (udev-worker)
> [ 152.872997] preempt_count: 0, expected: 0
> [ 152.872999] RCU nest depth: 1, expected: 0
> [ 152.873001] 3 locks held by (udev-worker)/29459:
> [ 152.873003] #0: ffffffff8253e968 (uuid_mutex){....}-{3:3}, at:
> btrfs_control_ioctl+0xcb/0x1a0
> [ 152.873016] #1: ffff8881105c18d8
> (&fs_devs->device_list_mutex){....}-{3:3}, at:
> device_list_add.constprop.0+0x126/0x670
> [ 152.873027] #2: ffffffff824cdb60 (rcu_read_lock){....}-{1:2}, at:
> device_list_add.constprop.0+0x193/0x670
> [ 152.873036] CPU: 0 UID: 0 PID: 29459 Comm: (udev-worker) Tainted: G
> W 6.11.0-rc7+ #800
> [ 152.873040] Tainted: [W]=WARN
> [ 152.873042] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 152.873044] Call Trace:
> [ 152.873047] <TASK>
> [ 152.873050] dump_stack_lvl+0x4b/0x70
> [ 152.873056] __might_resched.cold+0xbf/0xcf
> [ 152.873061] kmem_cache_alloc_noprof+0x1c5/0x2e0
> [ 152.873067] ? getname_kernel+0x29/0x150
> [ 152.873073] getname_kernel+0x29/0x150
> [ 152.873077] kern_path+0x17/0x50
> [ 152.873081] device_list_add.constprop.0+0x1c1/0x670
> [ 152.873085] ? device_list_add.constprop.0+0x193/0x670
> [ 152.873089] ? device_list_add.constprop.0+0x193/0x670
> [ 152.873092] ? btrfs_scan_one_device+0x1bf/0x410
> [ 152.873101] btrfs_scan_one_device+0x239/0x410
> [ 152.873112] btrfs_control_ioctl+0xdb/0x1a0
> [ 152.873119] __x64_sys_ioctl+0x96/0xc0
> [ 152.873127] do_syscall_64+0x54/0x110
> [ 152.873132] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 152.873139] RIP: 0033:0x7fa8daf99f9b
> [ 152.873143] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10
> 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f
> 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> [ 152.873146] RSP: 002b:00007ffd34cd5cf0 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 152.873150] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007fa8daf99f9b
> [ 152.873152] RDX: 00007ffd34cd5d70 RSI: 0000000090009427 RDI:
> 0000000000000005
> [ 152.873154] RBP: 00007ffd34cd5d70 R08: 0000000000000000 R09:
> 00007ffd34cd6cd0
> [ 152.873156] R10: 0000000000000000 R11: 0000000000000246 R12:
> 00005639a2ce2b60
> [ 152.873158] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000005
> [ 152.873166] </TASK>
> [ 153.117462] BTRFS info (device nvme1n1): first mount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> [ 153.117475] BTRFS info (device nvme1n1): using crc32c (crc32c-intel)
> checksum algorithm
> [ 153.117480] BTRFS info (device nvme1n1): using free-space-tree
> [ 153.120689] BTRFS info (device nvme1n1): checking UUID tree
> [ 153.632137] BTRFS info (device nvme0n1): last unmount of filesystem
> e9489dbd-0346-4c23-9b8b-9a0221e235f8
> [ 153.715256] BTRFS info (device nvme1n1): last unmount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> [ 153.818188] BTRFS info (device nvme1n1): first mount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> [ 153.818203] BTRFS info (device nvme1n1): using crc32c (crc32c-intel)
> checksum algorithm
> [ 153.818209] BTRFS info (device nvme1n1): using free-space-tree
> [ 153.884093] BTRFS info (device nvme1n1): last unmount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-10-04 10:08 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-02 21:29 [PATCH] btrfs: fix the wrong rcu_str_deref() usage Qu Wenruo
2024-10-03 10:10 ` Filipe Manana
2024-10-04 10:03 ` Johannes Thumshirn
2024-10-04 10:08 ` Qu Wenruo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).