* [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
@ 2025-12-13 20:09 Kai Krakow
2025-12-13 20:48 ` Qu Wenruo
0 siblings, 1 reply; 9+ messages in thread
From: Kai Krakow @ 2025-12-13 20:09 UTC (permalink / raw)
To: linux-btrfs; +Cc: Kai Krakow, Eli Venter
During mount, the global block reserve might not have its space_info
initialized yet. If we try to reserve bytes in this state (e.g. via
early sysfs writes), we must not crash.
This happened while developing patches which allow modification of the
devinfo.type field via sysfs. If this write access is executed by the
user before the mount finished, the kernel crashed with a NULL pointer
dereference:
> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
> while mount is still ongoing. My btrfs is big so the mount takes
> 20-30 minutes. Reboot and wait until mount is complete and this
> worked fine.
BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 0 P4D 0
Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
RIP: 0010:_raw_spin_lock+0x17/0x30
Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
Call Trace:
__reserve_bytes+0x70/0x720 [btrfs]
? get_page_from_freelist+0x343/0x1570
btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
btrfs_use_block_rsv+0x153/0x220 [btrfs]
btrfs_alloc_tree_block+0x83/0x580 [btrfs]
btrfs_force_cow_block+0x129/0x620 [btrfs]
btrfs_cow_block+0xcd/0x230 [btrfs]
btrfs_search_slot+0x566/0xd60 [btrfs]
? kmem_cache_alloc_noprof+0x106/0x2f0
btrfs_update_device+0x91/0x1d0 [btrfs]
btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
kernfs_fop_write_iter+0x14c/0x200
vfs_write+0x289/0x440
ksys_write+0x6d/0xf0
trace_clock_x86_tsc+0x20/0x20
? do_wp_page+0x838/0xf90
? __do_sys_newfstat+0x68/0x70
? __pte_offset_map+0x1b/0xf0
? __handle_mm_fault+0xa6c/0x10f0
? __count_memcg_events+0x53/0xf0
? handle_mm_fault+0x1c4/0x2d0
? do_user_addr_fault+0x334/0x620
? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fd6e27a1687
Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate s>
intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:_raw_spin_lock+0x17/0x30
Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
Reported-by: Eli Venter <eli@genedx.com>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
fs/btrfs/space-info.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 97452fb5d29b..cbb6c4924850 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
}
+ /*
+ * During mount, the global block reserve might not have its space_info
+ * initialized yet. If we try to reserve bytes in this state (e.g. via
+ * early sysfs writes), we must not crash.
+ */
+ if (unlikely(!space_info))
+ return -EBUSY;
+
if (flush == BTRFS_RESERVE_FLUSH_DATA)
async_work = &fs_info->async_data_reclaim_work;
else
--
2.51.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
2025-12-13 20:09 [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL Kai Krakow
@ 2025-12-13 20:48 ` Qu Wenruo
2025-12-13 21:04 ` Qu Wenruo
2025-12-13 21:10 ` Kai Krakow
0 siblings, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2025-12-13 20:48 UTC (permalink / raw)
To: Kai Krakow, linux-btrfs; +Cc: Eli Venter
在 2025/12/14 06:39, Kai Krakow 写道:
> During mount, the global block reserve might not have its space_info
> initialized yet. If we try to reserve bytes in this state (e.g. via
> early sysfs writes), we must not crash.
>
> This happened while developing patches which allow modification of the
> devinfo.type field via sysfs. If this write access is executed by the
> user before the mount finished, the kernel crashed with a NULL pointer
> dereference:
I'd say the modification through sysfs itself is a dangerous idea, it
will need to hold the proper locks and if not properly checked can
easily introduce unexpected races.
Furthermore currently there is no RW support for devinfo related member.
So this means your patch is fixing something that is only affecting your
out-of-tree development branch, which is not bringing much usefulness to
upstream.
Thanks,
Qu
>
>> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
>> while mount is still ongoing. My btrfs is big so the mount takes
>> 20-30 minutes. Reboot and wait until mount is complete and this
>> worked fine.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> PGD 0 P4D 0
> Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
> CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
> Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
> RIP: 0010:_raw_spin_lock+0x17/0x30
> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
> Call Trace:
>
> __reserve_bytes+0x70/0x720 [btrfs]
> ? get_page_from_freelist+0x343/0x1570
> btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
> btrfs_use_block_rsv+0x153/0x220 [btrfs]
> btrfs_alloc_tree_block+0x83/0x580 [btrfs]
> btrfs_force_cow_block+0x129/0x620 [btrfs]
> btrfs_cow_block+0xcd/0x230 [btrfs]
> btrfs_search_slot+0x566/0xd60 [btrfs]
> ? kmem_cache_alloc_noprof+0x106/0x2f0
> btrfs_update_device+0x91/0x1d0 [btrfs]
> btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
> kernfs_fop_write_iter+0x14c/0x200
> vfs_write+0x289/0x440
> ksys_write+0x6d/0xf0
> trace_clock_x86_tsc+0x20/0x20
> ? do_wp_page+0x838/0xf90
> ? __do_sys_newfstat+0x68/0x70
> ? __pte_offset_map+0x1b/0xf0
> ? __handle_mm_fault+0xa6c/0x10f0
> ? __count_memcg_events+0x53/0xf0
> ? handle_mm_fault+0x1c4/0x2d0
> ? do_user_addr_fault+0x334/0x620
> ? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7fd6e27a1687
> Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
> RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
> RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
> RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
> R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
>
> Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate s>
> intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
> CR2: 0000000000000008
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:_raw_spin_lock+0x17/0x30
> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>
> Reported-by: Eli Venter <eli@genedx.com>
> Signed-off-by: Kai Krakow <kai@kaishome.de>
> ---
> fs/btrfs/space-info.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 97452fb5d29b..cbb6c4924850 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
> ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
> }
>
> + /*
> + * During mount, the global block reserve might not have its space_info
> + * initialized yet. If we try to reserve bytes in this state (e.g. via
> + * early sysfs writes), we must not crash.
> + */
> + if (unlikely(!space_info))
> + return -EBUSY;
> +
> if (flush == BTRFS_RESERVE_FLUSH_DATA)
> async_work = &fs_info->async_data_reclaim_work;
> else
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
2025-12-13 20:48 ` Qu Wenruo
@ 2025-12-13 21:04 ` Qu Wenruo
2025-12-13 21:14 ` Kai Krakow
2025-12-13 21:10 ` Kai Krakow
1 sibling, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2025-12-13 21:04 UTC (permalink / raw)
To: Kai Krakow, linux-btrfs; +Cc: Eli Venter
在 2025/12/14 07:18, Qu Wenruo 写道:
>
>
> 在 2025/12/14 06:39, Kai Krakow 写道:
>> During mount, the global block reserve might not have its space_info
>> initialized yet. If we try to reserve bytes in this state (e.g. via
>> early sysfs writes), we must not crash.
>>
>> This happened while developing patches which allow modification of the
>> devinfo.type field via sysfs. If this write access is executed by the
>> user before the mount finished, the kernel crashed with a NULL pointer
>> dereference:
>
>
> I'd say the modification through sysfs itself is a dangerous idea, it
> will need to hold the proper locks and if not properly checked can
> easily introduce unexpected races.
>
>
> Furthermore currently there is no RW support for devinfo related member.
Correction, there are RW members but that are only runtime ones, e.g.
scrub throughput and iops limits.
Nothing is going to trigger any metadata writes.
And that's true for all RW members, no matter if it's
space_info->chunk_size or reclaim control.
Even for the label modification through sysfs, we do not trigger any
metadata changes, but only modify in-memory structures.
If you're going to use sysfs to trigger a transaction, I'd just say, DON'T.
Thanks,
Qu
>
> So this means your patch is fixing something that is only affecting your
> out-of-tree development branch, which is not bringing much usefulness to
> upstream.
>
> Thanks,
> Qu
>
>>
>>> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
>>> while mount is still ongoing. My btrfs is big so the mount takes
>>> 20-30 minutes. Reboot and wait until mount is complete and this
>>> worked fine.
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>> PGD 0 P4D 0
>> Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
>> CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
>> Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15
>> 06/25/2018
>> RIP: 0010:_raw_spin_lock+0x17/0x30
>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f
>> 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1
>> 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>> Call Trace:
>>
>> __reserve_bytes+0x70/0x720 [btrfs]
>> ? get_page_from_freelist+0x343/0x1570
>> btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
>> btrfs_use_block_rsv+0x153/0x220 [btrfs]
>> btrfs_alloc_tree_block+0x83/0x580 [btrfs]
>> btrfs_force_cow_block+0x129/0x620 [btrfs]
>> btrfs_cow_block+0xcd/0x230 [btrfs]
>> btrfs_search_slot+0x566/0xd60 [btrfs]
>> ? kmem_cache_alloc_noprof+0x106/0x2f0
>> btrfs_update_device+0x91/0x1d0 [btrfs]
>> btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
>> kernfs_fop_write_iter+0x14c/0x200
>> vfs_write+0x289/0x440
>> ksys_write+0x6d/0xf0
>> trace_clock_x86_tsc+0x20/0x20
>> ? do_wp_page+0x838/0xf90
>> ? __do_sys_newfstat+0x68/0x70
>> ? __pte_offset_map+0x1b/0xf0
>> ? __handle_mm_fault+0xa6c/0x10f0
>> ? __count_memcg_events+0x53/0xf0
>> ? handle_mm_fault+0x1c4/0x2d0
>> ? do_user_addr_fault+0x334/0x620
>> ? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> RIP: 0033:0x7fd6e27a1687
>> Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83
>> f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3
>> 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
>> RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>> RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
>> RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
>> RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
>> R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
>>
>> Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs
>> zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif
>> intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace
>> intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common
>> sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl
>> intel_cstate s>
>> intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul
>> libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres
>> libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod
>> wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
>> CR2: 0000000000000008
>> ---[ end trace 0000000000000000 ]---
>> RIP: 0010:_raw_spin_lock+0x17/0x30
>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f
>> 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1
>> 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>>
>> Reported-by: Eli Venter <eli@genedx.com>
>> Signed-off-by: Kai Krakow <kai@kaishome.de>
>> ---
>> fs/btrfs/space-info.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>> index 97452fb5d29b..cbb6c4924850 100644
>> --- a/fs/btrfs/space-info.c
>> +++ b/fs/btrfs/space-info.c
>> @@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info
>> *fs_info,
>> ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
>> }
>> + /*
>> + * During mount, the global block reserve might not have its
>> space_info
>> + * initialized yet. If we try to reserve bytes in this state
>> (e.g. via
>> + * early sysfs writes), we must not crash.
>> + */
>> + if (unlikely(!space_info))
>> + return -EBUSY;
>> +
>> if (flush == BTRFS_RESERVE_FLUSH_DATA)
>> async_work = &fs_info->async_data_reclaim_work;
>> else
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
2025-12-13 20:48 ` Qu Wenruo
2025-12-13 21:04 ` Qu Wenruo
@ 2025-12-13 21:10 ` Kai Krakow
2025-12-13 21:15 ` Qu Wenruo
1 sibling, 1 reply; 9+ messages in thread
From: Kai Krakow @ 2025-12-13 21:10 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Eli Venter
Hello Qu!
Am Sa., 13. Dez. 2025 um 21:48 Uhr schrieb Qu Wenruo <wqu@suse.com>:
>
>
>
> 在 2025/12/14 06:39, Kai Krakow 写道:
> > During mount, the global block reserve might not have its space_info
> > initialized yet. If we try to reserve bytes in this state (e.g. via
> > early sysfs writes), we must not crash.
> >
> > This happened while developing patches which allow modification of the
> > devinfo.type field via sysfs. If this write access is executed by the
> > user before the mount finished, the kernel crashed with a NULL pointer
> > dereference:
>
>
> I'd say the modification through sysfs itself is a dangerous idea, it
> will need to hold the proper locks and if not properly checked can
> easily introduce unexpected races.
>
>
> Furthermore currently there is no RW support for devinfo related member.
>
> So this means your patch is fixing something that is only affecting your
> out-of-tree development branch, which is not bringing much usefulness to
> upstream.
>
> Thanks,
> Qu
Okay, thanks. I understand your argumentation. I almost expected that
this won't be accepted because it is triggered by out-of-tree code.
In case, you'd like to see the code causing this:
https://gist.github.com/kakra/8ccdcb96ca8426b95bcd86c7e0b5115e
It's part of Goffredo's "allocator hint" patches which I rebased to 6.18.
As you may see, I already guarded the call with:
if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags)) return -EBUSY;
So I should be safe there even without this patch.
Thanks,
Kai
> >> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
> >> while mount is still ongoing. My btrfs is big so the mount takes
> >> 20-30 minutes. Reboot and wait until mount is complete and this
> >> worked fine.
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000008
> > PGD 0 P4D 0
> > Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
> > CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
> > Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
> > RIP: 0010:_raw_spin_lock+0x17/0x30
> > Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
> > RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> > RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
> > R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
> > R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
> > FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
> > Call Trace:
> >
> > __reserve_bytes+0x70/0x720 [btrfs]
> > ? get_page_from_freelist+0x343/0x1570
> > btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
> > btrfs_use_block_rsv+0x153/0x220 [btrfs]
> > btrfs_alloc_tree_block+0x83/0x580 [btrfs]
> > btrfs_force_cow_block+0x129/0x620 [btrfs]
> > btrfs_cow_block+0xcd/0x230 [btrfs]
> > btrfs_search_slot+0x566/0xd60 [btrfs]
> > ? kmem_cache_alloc_noprof+0x106/0x2f0
> > btrfs_update_device+0x91/0x1d0 [btrfs]
> > btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
> > kernfs_fop_write_iter+0x14c/0x200
> > vfs_write+0x289/0x440
> > ksys_write+0x6d/0xf0
> > trace_clock_x86_tsc+0x20/0x20
> > ? do_wp_page+0x838/0xf90
> > ? __do_sys_newfstat+0x68/0x70
> > ? __pte_offset_map+0x1b/0xf0
> > ? __handle_mm_fault+0xa6c/0x10f0
> > ? __count_memcg_events+0x53/0xf0
> > ? handle_mm_fault+0x1c4/0x2d0
> > ? do_user_addr_fault+0x334/0x620
> > ? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
> > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > RIP: 0033:0x7fd6e27a1687
> > Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
> > RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> > RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
> > RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
> > RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
> > R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
> >
> > Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate s>
> > intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
> > CR2: 0000000000000008
> > ---[ end trace 0000000000000000 ]---
> > RIP: 0010:_raw_spin_lock+0x17/0x30
> > Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
> > RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> > RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
> > R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
> > R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
> > FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
> >
> > Reported-by: Eli Venter <eli@genedx.com>
> > Signed-off-by: Kai Krakow <kai@kaishome.de>
> > ---
> > fs/btrfs/space-info.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> > index 97452fb5d29b..cbb6c4924850 100644
> > --- a/fs/btrfs/space-info.c
> > +++ b/fs/btrfs/space-info.c
> > @@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
> > ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
> > }
> >
> > + /*
> > + * During mount, the global block reserve might not have its space_info
> > + * initialized yet. If we try to reserve bytes in this state (e.g. via
> > + * early sysfs writes), we must not crash.
> > + */
> > + if (unlikely(!space_info))
> > + return -EBUSY;
> > +
> > if (flush == BTRFS_RESERVE_FLUSH_DATA)
> > async_work = &fs_info->async_data_reclaim_work;
> > else
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
2025-12-13 21:04 ` Qu Wenruo
@ 2025-12-13 21:14 ` Kai Krakow
0 siblings, 0 replies; 9+ messages in thread
From: Kai Krakow @ 2025-12-13 21:14 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Eli Venter
Am Sa., 13. Dez. 2025 um 22:04 Uhr schrieb Qu Wenruo <wqu@suse.com>:
>
>
>
> 在 2025/12/14 07:18, Qu Wenruo 写道:
> >
> >
> > 在 2025/12/14 06:39, Kai Krakow 写道:
> >> During mount, the global block reserve might not have its space_info
> >> initialized yet. If we try to reserve bytes in this state (e.g. via
> >> early sysfs writes), we must not crash.
> >>
> >> This happened while developing patches which allow modification of the
> >> devinfo.type field via sysfs. If this write access is executed by the
> >> user before the mount finished, the kernel crashed with a NULL pointer
> >> dereference:
> >
> >
> > I'd say the modification through sysfs itself is a dangerous idea, it
> > will need to hold the proper locks and if not properly checked can
> > easily introduce unexpected races.
> >
> >
> > Furthermore currently there is no RW support for devinfo related member.
>
> Correction, there are RW members but that are only runtime ones, e.g.
> scrub throughput and iops limits.
>
> Nothing is going to trigger any metadata writes.
> And that's true for all RW members, no matter if it's
> space_info->chunk_size or reclaim control.
>
> Even for the label modification through sysfs, we do not trigger any
> metadata changes, but only modify in-memory structures.
>
> If you're going to use sysfs to trigger a transaction, I'd just say, DON'T.
Yes, I'm already thinking about another solution. But for now, I'll
rebase this approach.
Thanks for suggesting not to trigger transactions from within sysfs
code paths. I'll be taking that into account for future versions.
Thanks,
Kai
> Thanks,
> Qu
>
> >
> > So this means your patch is fixing something that is only affecting your
> > out-of-tree development branch, which is not bringing much usefulness to
> > upstream.
> >
> > Thanks,
> > Qu
> >
> >>
> >>> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
> >>> while mount is still ongoing. My btrfs is big so the mount takes
> >>> 20-30 minutes. Reboot and wait until mount is complete and this
> >>> worked fine.
> >>
> >> BUG: kernel NULL pointer dereference, address: 0000000000000008
> >> PGD 0 P4D 0
> >> Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
> >> CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
> >> Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15
> >> 06/25/2018
> >> RIP: 0010:_raw_spin_lock+0x17/0x30
> >> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f
> >> 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1
> >> 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
> >> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
> >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> >> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> >> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
> >> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
> >> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
> >> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000)
> >> knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
> >> Call Trace:
> >>
> >> __reserve_bytes+0x70/0x720 [btrfs]
> >> ? get_page_from_freelist+0x343/0x1570
> >> btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
> >> btrfs_use_block_rsv+0x153/0x220 [btrfs]
> >> btrfs_alloc_tree_block+0x83/0x580 [btrfs]
> >> btrfs_force_cow_block+0x129/0x620 [btrfs]
> >> btrfs_cow_block+0xcd/0x230 [btrfs]
> >> btrfs_search_slot+0x566/0xd60 [btrfs]
> >> ? kmem_cache_alloc_noprof+0x106/0x2f0
> >> btrfs_update_device+0x91/0x1d0 [btrfs]
> >> btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
> >> kernfs_fop_write_iter+0x14c/0x200
> >> vfs_write+0x289/0x440
> >> ksys_write+0x6d/0xf0
> >> trace_clock_x86_tsc+0x20/0x20
> >> ? do_wp_page+0x838/0xf90
> >> ? __do_sys_newfstat+0x68/0x70
> >> ? __pte_offset_map+0x1b/0xf0
> >> ? __handle_mm_fault+0xa6c/0x10f0
> >> ? __count_memcg_events+0x53/0xf0
> >> ? handle_mm_fault+0x1c4/0x2d0
> >> ? do_user_addr_fault+0x334/0x620
> >> ? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
> >> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >> RIP: 0033:0x7fd6e27a1687
> >> Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83
> >> f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3
> >> 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
> >> RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> >> RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
> >> RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
> >> RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
> >> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
> >> R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
> >>
> >> Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs
> >> zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif
> >> intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace
> >> intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common
> >> sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl
> >> intel_cstate s>
> >> intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul
> >> libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres
> >> libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod
> >> wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
> >> CR2: 0000000000000008
> >> ---[ end trace 0000000000000000 ]---
> >> RIP: 0010:_raw_spin_lock+0x17/0x30
> >> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f
> >> 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1
> >> 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
> >> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
> >> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> >> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> >> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
> >> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
> >> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
> >> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000)
> >> knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
> >>
> >> Reported-by: Eli Venter <eli@genedx.com>
> >> Signed-off-by: Kai Krakow <kai@kaishome.de>
> >> ---
> >> fs/btrfs/space-info.c | 8 ++++++++
> >> 1 file changed, 8 insertions(+)
> >>
> >> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> >> index 97452fb5d29b..cbb6c4924850 100644
> >> --- a/fs/btrfs/space-info.c
> >> +++ b/fs/btrfs/space-info.c
> >> @@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info
> >> *fs_info,
> >> ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
> >> }
> >> + /*
> >> + * During mount, the global block reserve might not have its
> >> space_info
> >> + * initialized yet. If we try to reserve bytes in this state
> >> (e.g. via
> >> + * early sysfs writes), we must not crash.
> >> + */
> >> + if (unlikely(!space_info))
> >> + return -EBUSY;
> >> +
> >> if (flush == BTRFS_RESERVE_FLUSH_DATA)
> >> async_work = &fs_info->async_data_reclaim_work;
> >> else
> >
> >
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
2025-12-13 21:10 ` Kai Krakow
@ 2025-12-13 21:15 ` Qu Wenruo
2025-12-13 21:43 ` Kai Krakow
0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2025-12-13 21:15 UTC (permalink / raw)
To: Kai Krakow; +Cc: linux-btrfs, Eli Venter
在 2025/12/14 07:40, Kai Krakow 写道:
> Hello Qu!
>
> Am Sa., 13. Dez. 2025 um 21:48 Uhr schrieb Qu Wenruo <wqu@suse.com>:
>>
>>
>>
>> 在 2025/12/14 06:39, Kai Krakow 写道:
>>> During mount, the global block reserve might not have its space_info
>>> initialized yet. If we try to reserve bytes in this state (e.g. via
>>> early sysfs writes), we must not crash.
>>>
>>> This happened while developing patches which allow modification of the
>>> devinfo.type field via sysfs. If this write access is executed by the
>>> user before the mount finished, the kernel crashed with a NULL pointer
>>> dereference:
>>
>>
>> I'd say the modification through sysfs itself is a dangerous idea, it
>> will need to hold the proper locks and if not properly checked can
>> easily introduce unexpected races.
>>
>>
>> Furthermore currently there is no RW support for devinfo related member.
>>
>> So this means your patch is fixing something that is only affecting your
>> out-of-tree development branch, which is not bringing much usefulness to
>> upstream.
>>
>> Thanks,
>> Qu
>
> Okay, thanks. I understand your argumentation. I almost expected that
> this won't be accepted because it is triggered by out-of-tree code.
>
> In case, you'd like to see the code causing this:
> https://gist.github.com/kakra/8ccdcb96ca8426b95bcd86c7e0b5115e
>
> It's part of Goffredo's "allocator hint" patches which I rebased to 6.18.
>
> As you may see, I already guarded the call with:
>
> if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags)) return -EBUSY;
Currently the RW sysfs interfaces follows the pattern that no
transaction is triggered from sysfs write context.
I strongly doubt if it's a good idea to trigger a transaction without
any VFS checks.
So if you really want to change a member, do a proper in-memory update
only, and find a way (e.g. a new dirty dev list) to tell the fs to
update the device item at commit time.
It's much safer and avoid a new and untested direct way to trigger
metadata modification.
Thanks,
Qu
>
> So I should be safe there even without this patch.
>
> Thanks,
> Kai
>
>>>> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
>>>> while mount is still ongoing. My btrfs is big so the mount takes
>>>> 20-30 minutes. Reboot and wait until mount is complete and this
>>>> worked fine.
>>>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> PGD 0 P4D 0
>>> Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
>>> CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
>>> Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
>>> RIP: 0010:_raw_spin_lock+0x17/0x30
>>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
>>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
>>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
>>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
>>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
>>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>>> Call Trace:
>>>
>>> __reserve_bytes+0x70/0x720 [btrfs]
>>> ? get_page_from_freelist+0x343/0x1570
>>> btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
>>> btrfs_use_block_rsv+0x153/0x220 [btrfs]
>>> btrfs_alloc_tree_block+0x83/0x580 [btrfs]
>>> btrfs_force_cow_block+0x129/0x620 [btrfs]
>>> btrfs_cow_block+0xcd/0x230 [btrfs]
>>> btrfs_search_slot+0x566/0xd60 [btrfs]
>>> ? kmem_cache_alloc_noprof+0x106/0x2f0
>>> btrfs_update_device+0x91/0x1d0 [btrfs]
>>> btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
>>> kernfs_fop_write_iter+0x14c/0x200
>>> vfs_write+0x289/0x440
>>> ksys_write+0x6d/0xf0
>>> trace_clock_x86_tsc+0x20/0x20
>>> ? do_wp_page+0x838/0xf90
>>> ? __do_sys_newfstat+0x68/0x70
>>> ? __pte_offset_map+0x1b/0xf0
>>> ? __handle_mm_fault+0xa6c/0x10f0
>>> ? __count_memcg_events+0x53/0xf0
>>> ? handle_mm_fault+0x1c4/0x2d0
>>> ? do_user_addr_fault+0x334/0x620
>>> ? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>> RIP: 0033:0x7fd6e27a1687
>>> Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
>>> RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>> RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
>>> RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
>>> RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
>>> R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
>>>
>>> Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate s>
>>> intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
>>> CR2: 0000000000000008
>>> ---[ end trace 0000000000000000 ]---
>>> RIP: 0010:_raw_spin_lock+0x17/0x30
>>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
>>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
>>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
>>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
>>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
>>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>>>
>>> Reported-by: Eli Venter <eli@genedx.com>
>>> Signed-off-by: Kai Krakow <kai@kaishome.de>
>>> ---
>>> fs/btrfs/space-info.c | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>>> index 97452fb5d29b..cbb6c4924850 100644
>>> --- a/fs/btrfs/space-info.c
>>> +++ b/fs/btrfs/space-info.c
>>> @@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
>>> ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
>>> }
>>>
>>> + /*
>>> + * During mount, the global block reserve might not have its space_info
>>> + * initialized yet. If we try to reserve bytes in this state (e.g. via
>>> + * early sysfs writes), we must not crash.
>>> + */
>>> + if (unlikely(!space_info))
>>> + return -EBUSY;
>>> +
>>> if (flush == BTRFS_RESERVE_FLUSH_DATA)
>>> async_work = &fs_info->async_data_reclaim_work;
>>> else
>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
2025-12-13 21:15 ` Qu Wenruo
@ 2025-12-13 21:43 ` Kai Krakow
2025-12-13 22:31 ` Qu Wenruo
0 siblings, 1 reply; 9+ messages in thread
From: Kai Krakow @ 2025-12-13 21:43 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Eli Venter
Am Sa., 13. Dez. 2025 um 22:15 Uhr schrieb Qu Wenruo <wqu@suse.com>:
>
>
>
> 在 2025/12/14 07:40, Kai Krakow 写道:
> > Hello Qu!
> >
> > Am Sa., 13. Dez. 2025 um 21:48 Uhr schrieb Qu Wenruo <wqu@suse.com>:
> >>
> >>
> >>
> >> 在 2025/12/14 06:39, Kai Krakow 写道:
> >>> During mount, the global block reserve might not have its space_info
> >>> initialized yet. If we try to reserve bytes in this state (e.g. via
> >>> early sysfs writes), we must not crash.
> >>>
> >>> This happened while developing patches which allow modification of the
> >>> devinfo.type field via sysfs. If this write access is executed by the
> >>> user before the mount finished, the kernel crashed with a NULL pointer
> >>> dereference:
> >>
> >>
> >> I'd say the modification through sysfs itself is a dangerous idea, it
> >> will need to hold the proper locks and if not properly checked can
> >> easily introduce unexpected races.
> >>
> >>
> >> Furthermore currently there is no RW support for devinfo related member.
> >>
> >> So this means your patch is fixing something that is only affecting your
> >> out-of-tree development branch, which is not bringing much usefulness to
> >> upstream.
> >>
> >> Thanks,
> >> Qu
> >
> > Okay, thanks. I understand your argumentation. I almost expected that
> > this won't be accepted because it is triggered by out-of-tree code.
> >
> > In case, you'd like to see the code causing this:
> > https://gist.github.com/kakra/8ccdcb96ca8426b95bcd86c7e0b5115e
> >
> > It's part of Goffredo's "allocator hint" patches which I rebased to 6.18.
> >
> > As you may see, I already guarded the call with:
> >
> > if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags)) return -EBUSY;
>
> Currently the RW sysfs interfaces follows the pattern that no
> transaction is triggered from sysfs write context.
>
> I strongly doubt if it's a good idea to trigger a transaction without
> any VFS checks.
>
> So if you really want to change a member, do a proper in-memory update
> only, and find a way (e.g. a new dirty dev list) to tell the fs to
> update the device item at commit time.
I don't want to steal your time, but is this a better approach?
https://gist.github.com/kakra/aa3eb8473cc05c1d3dd000160a5ee481
Thanks,
Kai
> It's much safer and avoid a new and untested direct way to trigger
> metadata modification.
>
> Thanks,
> Qu
>
> >
> > So I should be safe there even without this patch.
> >
> > Thanks,
> > Kai
> >
> >>>> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
> >>>> while mount is still ongoing. My btrfs is big so the mount takes
> >>>> 20-30 minutes. Reboot and wait until mount is complete and this
> >>>> worked fine.
> >>>
> >>> BUG: kernel NULL pointer dereference, address: 0000000000000008
> >>> PGD 0 P4D 0
> >>> Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
> >>> CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
> >>> Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
> >>> RIP: 0010:_raw_spin_lock+0x17/0x30
> >>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
> >>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
> >>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> >>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> >>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
> >>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
> >>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
> >>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
> >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
> >>> Call Trace:
> >>>
> >>> __reserve_bytes+0x70/0x720 [btrfs]
> >>> ? get_page_from_freelist+0x343/0x1570
> >>> btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
> >>> btrfs_use_block_rsv+0x153/0x220 [btrfs]
> >>> btrfs_alloc_tree_block+0x83/0x580 [btrfs]
> >>> btrfs_force_cow_block+0x129/0x620 [btrfs]
> >>> btrfs_cow_block+0xcd/0x230 [btrfs]
> >>> btrfs_search_slot+0x566/0xd60 [btrfs]
> >>> ? kmem_cache_alloc_noprof+0x106/0x2f0
> >>> btrfs_update_device+0x91/0x1d0 [btrfs]
> >>> btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
> >>> kernfs_fop_write_iter+0x14c/0x200
> >>> vfs_write+0x289/0x440
> >>> ksys_write+0x6d/0xf0
> >>> trace_clock_x86_tsc+0x20/0x20
> >>> ? do_wp_page+0x838/0xf90
> >>> ? __do_sys_newfstat+0x68/0x70
> >>> ? __pte_offset_map+0x1b/0xf0
> >>> ? __handle_mm_fault+0xa6c/0x10f0
> >>> ? __count_memcg_events+0x53/0xf0
> >>> ? handle_mm_fault+0x1c4/0x2d0
> >>> ? do_user_addr_fault+0x334/0x620
> >>> ? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
> >>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >>> RIP: 0033:0x7fd6e27a1687
> >>> Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
> >>> RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> >>> RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
> >>> RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
> >>> RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
> >>> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
> >>> R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
> >>>
> >>> Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate s>
> >>> intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
> >>> CR2: 0000000000000008
> >>> ---[ end trace 0000000000000000 ]---
> >>> RIP: 0010:_raw_spin_lock+0x17/0x30
> >>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
> >>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
> >>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> >>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> >>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
> >>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
> >>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
> >>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
> >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
> >>>
> >>> Reported-by: Eli Venter <eli@genedx.com>
> >>> Signed-off-by: Kai Krakow <kai@kaishome.de>
> >>> ---
> >>> fs/btrfs/space-info.c | 8 ++++++++
> >>> 1 file changed, 8 insertions(+)
> >>>
> >>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> >>> index 97452fb5d29b..cbb6c4924850 100644
> >>> --- a/fs/btrfs/space-info.c
> >>> +++ b/fs/btrfs/space-info.c
> >>> @@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
> >>> ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
> >>> }
> >>>
> >>> + /*
> >>> + * During mount, the global block reserve might not have its space_info
> >>> + * initialized yet. If we try to reserve bytes in this state (e.g. via
> >>> + * early sysfs writes), we must not crash.
> >>> + */
> >>> + if (unlikely(!space_info))
> >>> + return -EBUSY;
> >>> +
> >>> if (flush == BTRFS_RESERVE_FLUSH_DATA)
> >>> async_work = &fs_info->async_data_reclaim_work;
> >>> else
> >>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
2025-12-13 21:43 ` Kai Krakow
@ 2025-12-13 22:31 ` Qu Wenruo
2025-12-15 19:28 ` David Sterba
0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2025-12-13 22:31 UTC (permalink / raw)
To: Kai Krakow; +Cc: linux-btrfs, Eli Venter
在 2025/12/14 08:13, Kai Krakow 写道:
> Am Sa., 13. Dez. 2025 um 22:15 Uhr schrieb Qu Wenruo <wqu@suse.com>:
[...]
>>
>> So if you really want to change a member, do a proper in-memory update
>> only, and find a way (e.g. a new dirty dev list) to tell the fs to
>> update the device item at commit time.
>
> I don't want to steal your time,
I'm totally fine discussing the implementation details here.
> but is this a better approach?
> https://gist.github.com/kakra/aa3eb8473cc05c1d3dd000160a5ee481
Unfortunately as long as if you're trying to do any metadata
modification, e.g. calling btrfs_update_device(), it will be a huge change.
My idea would be something like this:
btrfs_dev_info_type_store()
{
btrfs_device *device = container_of();
/* Do the proper locking. */
WRITE_ONCE(device->type, type);
if (!list_empty(&dev->dirty_list))
list_add_tail(&fs_info->dirty_dev_list, &dev->dirty_list);
return len;
}
Then inside btrfs_commit_transaction(), I do not yet have a good idea on
the timing, but I guess it can done before btrfs_start_delalloc_flush().
Do something like this to write those dirty devices to chunk tree:
btrfs_commit_transaction()
{
list_for_each_entry(dev, &fs_info->dirty_dev_list, dirty_list) {
ret = btrfs_update_device(dev);
}
/* The remaining code. */
ret = btrfs_start_delalloc_flush();
}
>
> Thanks,
> Kai
>
>> It's much safer and avoid a new and untested direct way to trigger
>> metadata modification.
>>
>> Thanks,
>> Qu
>>
>>>
>>> So I should be safe there even without this patch.
>>>
>>> Thanks,
>>> Kai
>>>
>>>>>> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
>>>>>> while mount is still ongoing. My btrfs is big so the mount takes
>>>>>> 20-30 minutes. Reboot and wait until mount is complete and this
>>>>>> worked fine.
>>>>>
>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>>> PGD 0 P4D 0
>>>>> Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
>>>>> CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
>>>>> Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
>>>>> RIP: 0010:_raw_spin_lock+0x17/0x30
>>>>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
>>>>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
>>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
>>>>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
>>>>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
>>>>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
>>>>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>>>>> Call Trace:
>>>>>
>>>>> __reserve_bytes+0x70/0x720 [btrfs]
>>>>> ? get_page_from_freelist+0x343/0x1570
>>>>> btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
>>>>> btrfs_use_block_rsv+0x153/0x220 [btrfs]
>>>>> btrfs_alloc_tree_block+0x83/0x580 [btrfs]
>>>>> btrfs_force_cow_block+0x129/0x620 [btrfs]
>>>>> btrfs_cow_block+0xcd/0x230 [btrfs]
>>>>> btrfs_search_slot+0x566/0xd60 [btrfs]
>>>>> ? kmem_cache_alloc_noprof+0x106/0x2f0
>>>>> btrfs_update_device+0x91/0x1d0 [btrfs]
>>>>> btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
>>>>> kernfs_fop_write_iter+0x14c/0x200
>>>>> vfs_write+0x289/0x440
>>>>> ksys_write+0x6d/0xf0
>>>>> trace_clock_x86_tsc+0x20/0x20
>>>>> ? do_wp_page+0x838/0xf90
>>>>> ? __do_sys_newfstat+0x68/0x70
>>>>> ? __pte_offset_map+0x1b/0xf0
>>>>> ? __handle_mm_fault+0xa6c/0x10f0
>>>>> ? __count_memcg_events+0x53/0xf0
>>>>> ? handle_mm_fault+0x1c4/0x2d0
>>>>> ? do_user_addr_fault+0x334/0x620
>>>>> ? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
>>>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>> RIP: 0033:0x7fd6e27a1687
>>>>> Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
>>>>> RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>>>> RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
>>>>> RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
>>>>> RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
>>>>> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
>>>>> R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
>>>>>
>>>>> Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate s>
>>>>> intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
>>>>> CR2: 0000000000000008
>>>>> ---[ end trace 0000000000000000 ]---
>>>>> RIP: 0010:_raw_spin_lock+0x17/0x30
>>>>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
>>>>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
>>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
>>>>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
>>>>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
>>>>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
>>>>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>>>>>
>>>>> Reported-by: Eli Venter <eli@genedx.com>
>>>>> Signed-off-by: Kai Krakow <kai@kaishome.de>
>>>>> ---
>>>>> fs/btrfs/space-info.c | 8 ++++++++
>>>>> 1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>>>>> index 97452fb5d29b..cbb6c4924850 100644
>>>>> --- a/fs/btrfs/space-info.c
>>>>> +++ b/fs/btrfs/space-info.c
>>>>> @@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
>>>>> ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
>>>>> }
>>>>>
>>>>> + /*
>>>>> + * During mount, the global block reserve might not have its space_info
>>>>> + * initialized yet. If we try to reserve bytes in this state (e.g. via
>>>>> + * early sysfs writes), we must not crash.
>>>>> + */
>>>>> + if (unlikely(!space_info))
>>>>> + return -EBUSY;
>>>>> +
>>>>> if (flush == BTRFS_RESERVE_FLUSH_DATA)
>>>>> async_work = &fs_info->async_data_reclaim_work;
>>>>> else
>>>>
>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
2025-12-13 22:31 ` Qu Wenruo
@ 2025-12-15 19:28 ` David Sterba
0 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2025-12-15 19:28 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Kai Krakow, linux-btrfs, Eli Venter
On Sun, Dec 14, 2025 at 09:01:41AM +1030, Qu Wenruo wrote:
>
>
> 在 2025/12/14 08:13, Kai Krakow 写道:
> > Am Sa., 13. Dez. 2025 um 22:15 Uhr schrieb Qu Wenruo <wqu@suse.com>:
> [...]
> >>
> >> So if you really want to change a member, do a proper in-memory update
> >> only, and find a way (e.g. a new dirty dev list) to tell the fs to
> >> update the device item at commit time.
> >
> > I don't want to steal your time,
>
> I'm totally fine discussing the implementation details here.
>
> > but is this a better approach?
> > https://gist.github.com/kakra/aa3eb8473cc05c1d3dd000160a5ee481
>
> Unfortunately as long as if you're trying to do any metadata
> modification, e.g. calling btrfs_update_device(), it will be a huge change.
>
>
> My idea would be something like this:
>
> btrfs_dev_info_type_store()
> {
> btrfs_device *device = container_of();
>
> /* Do the proper locking. */
>
> WRITE_ONCE(device->type, type);
> if (!list_empty(&dev->dirty_list))
> list_add_tail(&fs_info->dirty_dev_list, &dev->dirty_list);
set_bit(BTRFS_FS_NEED_TRANS_COMMIT, &fs_info->flags);
wake_up_process(fs_info->transaction_kthread);
I don't think it would be wise to wait until the transaction is
committed before returning from the store handler. We don't do that for
other similar changes either so the write becomes permanent after a
sync, using eg. 'btrfs fi sync'.
> return len;
> }
>
>
> Then inside btrfs_commit_transaction(), I do not yet have a good idea on
> the timing, but I guess it can done before btrfs_start_delalloc_flush().
>
> Do something like this to write those dirty devices to chunk tree:
>
> btrfs_commit_transaction()
> {
> list_for_each_entry(dev, &fs_info->dirty_dev_list, dirty_list) {
> ret = btrfs_update_device(dev);
> }
>
> /* The remaining code. */
> ret = btrfs_start_delalloc_flush();
> }
This should work. The placement depends on what is changed and how it's
related to the current transaction.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-12-15 19:28 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-13 20:09 [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL Kai Krakow
2025-12-13 20:48 ` Qu Wenruo
2025-12-13 21:04 ` Qu Wenruo
2025-12-13 21:14 ` Kai Krakow
2025-12-13 21:10 ` Kai Krakow
2025-12-13 21:15 ` Qu Wenruo
2025-12-13 21:43 ` Kai Krakow
2025-12-13 22:31 ` Qu Wenruo
2025-12-15 19:28 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox