* [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
@ 2025-12-13 20:09 Kai Krakow
  2025-12-13 20:48 ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Kai Krakow @ 2025-12-13 20:09 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Kai Krakow, Eli Venter

During mount, the global block reserve might not have its space_info
initialized yet. If we try to reserve bytes in this state (e.g. via
early sysfs writes), we must not crash.

This happened while developing patches which allow modification of the
devinfo.type field via sysfs. If this write access is executed by the
user before the mount finished, the kernel crashed with a NULL pointer
dereference:

> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
> while mount is still ongoing. My btrfs is big so the mount takes
> 20-30 minutes. Reboot and wait until mount is complete and this
> worked fine.

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 0 P4D 0
Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
RIP: 0010:_raw_spin_lock+0x17/0x30
Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
Call Trace:
 <TASK>
 __reserve_bytes+0x70/0x720 [btrfs]
 ? get_page_from_freelist+0x343/0x1570
 btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
 btrfs_use_block_rsv+0x153/0x220 [btrfs]
 btrfs_alloc_tree_block+0x83/0x580 [btrfs]
 btrfs_force_cow_block+0x129/0x620 [btrfs]
 btrfs_cow_block+0xcd/0x230 [btrfs]
 btrfs_search_slot+0x566/0xd60 [btrfs]
 ? kmem_cache_alloc_noprof+0x106/0x2f0
 btrfs_update_device+0x91/0x1d0 [btrfs]
 btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
 kernfs_fop_write_iter+0x14c/0x200
 vfs_write+0x289/0x440
 ksys_write+0x6d/0xf0
 trace_clock_x86_tsc+0x20/0x20
 ? do_wp_page+0x838/0xf90
 ? __do_sys_newfstat+0x68/0x70
 ? __pte_offset_map+0x1b/0xf0
 ? __handle_mm_fault+0xa6c/0x10f0
 ? __count_memcg_events+0x53/0xf0
 ? handle_mm_fault+0x1c4/0x2d0
 ? do_user_addr_fault+0x334/0x620
 ? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fd6e27a1687
Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
 </TASK>
Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate s>
intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:_raw_spin_lock+0x17/0x30
Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0

Reported-by: Eli Venter <eli@genedx.com>
Signed-off-by: Kai Krakow <kai@kaishome.de>
---
 fs/btrfs/space-info.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 97452fb5d29b..cbb6c4924850 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
 		ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
 	}
 
+	/*
+	 * During mount, the global block reserve might not have its space_info
+	 * initialized yet. If we try to reserve bytes in this state (e.g. via
+	 * early sysfs writes), we must not crash.
+	 */
+	if (unlikely(!space_info))
+		return -EBUSY;
+
 	if (flush == BTRFS_RESERVE_FLUSH_DATA)
 		async_work = &fs_info->async_data_reclaim_work;
 	else
-- 
2.51.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread
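
A note on the patch above: the fault address (0x8, also the value in RDI
when _raw_spin_lock() was entered) would be consistent with taking the
address of a lock member through a NULL space_info pointer, if that
member sits 8 bytes into the structure. That layout is an assumption
here; the trace does not state it. The following is a minimal userspace
sketch of the guard the patch adds, not the kernel code. The names
demo_space_info and demo_reserve_bytes are hypothetical stand-ins, and
the int lock merely takes the place of a spinlock_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct btrfs_space_info. The member order
 * (one pointer, then the lock) is an assumption made for illustration.
 */
struct demo_space_info {
	void *fs_info;          /* pointer-sized first member */
	int lock;               /* placeholder for spinlock_t */
	uint64_t bytes_may_use; /* bytes already reserved */
	uint64_t total_bytes;   /* capacity of this space_info */
};

/*
 * Userspace model of the guard: refuse the reservation with -EBUSY
 * when space_info has not been set up yet, instead of dereferencing it.
 */
static int demo_reserve_bytes(struct demo_space_info *space_info,
			      uint64_t bytes)
{
	if (!space_info)
		return -EBUSY;

	/* Invariant of this model: never more reserved than available. */
	assert(space_info->bytes_may_use <= space_info->total_bytes);

	if (bytes > space_info->total_bytes - space_info->bytes_may_use)
		return -ENOSPC;

	space_info->bytes_may_use += bytes;
	return 0;
}
```

With this model, demo_reserve_bytes(NULL, 4096) returns -EBUSY rather
than faulting, and offsetof(struct demo_space_info, lock) equals
sizeof(void *), which is 8 on an LP64 build.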
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
  2025-12-13 20:09 [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL Kai Krakow
@ 2025-12-13 20:48 ` Qu Wenruo
  2025-12-13 21:04   ` Qu Wenruo
  2025-12-13 21:10   ` Kai Krakow
  0 siblings, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2025-12-13 20:48 UTC (permalink / raw)
  To: Kai Krakow, linux-btrfs; +Cc: Eli Venter

On 2025/12/14 06:39, Kai Krakow wrote:
> During mount, the global block reserve might not have its space_info
> initialized yet. If we try to reserve bytes in this state (e.g. via
> early sysfs writes), we must not crash.
>
> This happened while developing patches which allow modification of the
> devinfo.type field via sysfs. If this write access is executed by the
> user before the mount finished, the kernel crashed with a NULL pointer
> dereference:

I'd say modification through sysfs is itself a dangerous idea: it needs
to hold the proper locks and, if not properly checked, can easily
introduce unexpected races.

Furthermore, there is currently no RW support for any devinfo-related
member.

So your patch fixes something that only affects your out-of-tree
development branch, which is of little use to upstream.

Thanks,
Qu

>
>> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
>> while mount is still ongoing. My btrfs is big so the mount takes
>> 20-30 minutes. Reboot and wait until mount is complete and this
>> worked fine.

[...]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
  2025-12-13 20:48 ` Qu Wenruo
@ 2025-12-13 21:04   ` Qu Wenruo
  2025-12-13 21:14     ` Kai Krakow
  2025-12-13 21:10   ` Kai Krakow
  1 sibling, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2025-12-13 21:04 UTC (permalink / raw)
  To: Kai Krakow, linux-btrfs; +Cc: Eli Venter

On 2025/12/14 07:18, Qu Wenruo wrote:
>
>
> On 2025/12/14 06:39, Kai Krakow wrote:
>> During mount, the global block reserve might not have its space_info
>> initialized yet. If we try to reserve bytes in this state (e.g. via
>> early sysfs writes), we must not crash.
>>
>> This happened while developing patches which allow modification of the
>> devinfo.type field via sysfs. If this write access is executed by the
>> user before the mount finished, the kernel crashed with a NULL pointer
>> dereference:
>
>
> I'd say the modification through sysfs itself is a dangerous idea, it
> will need to hold the proper locks and if not properly checked can
> easily introduce unexpected races.
>
>
> Furthermore currently there is no RW support for devinfo related member.

Correction: there are RW members, but they are runtime-only ones, e.g.
the scrub throughput and iops limits.

None of them triggers any metadata writes.
And that's true for all RW members, whether it's
space_info->chunk_size or reclaim control.

Even label modification through sysfs does not trigger any metadata
changes; it only modifies in-memory structures.

If you're going to use sysfs to trigger a transaction, I'd just say, DON'T.

Thanks,
Qu

>
> So this means your patch is fixing something that is only affecting your
> out-of-tree development branch, which is not bringing much usefulness to
> upstream.
>
> Thanks,
> Qu

[...]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
  2025-12-13 21:04   ` Qu Wenruo
@ 2025-12-13 21:14     ` Kai Krakow
  0 siblings, 0 replies; 9+ messages in thread
From: Kai Krakow @ 2025-12-13 21:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Eli Venter

On Sat, 13 Dec 2025 at 22:04, Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2025/12/14 07:18, Qu Wenruo wrote:
> >
> >
> > On 2025/12/14 06:39, Kai Krakow wrote:
> >> During mount, the global block reserve might not have its space_info
> >> initialized yet. If we try to reserve bytes in this state (e.g. via
> >> early sysfs writes), we must not crash.
> >>
> >> This happened while developing patches which allow modification of the
> >> devinfo.type field via sysfs. If this write access is executed by the
> >> user before the mount finished, the kernel crashed with a NULL pointer
> >> dereference:
> >
> >
> > I'd say the modification through sysfs itself is a dangerous idea, it
> > will need to hold the proper locks and if not properly checked can
> > easily introduce unexpected races.
> >
> >
> > Furthermore currently there is no RW support for devinfo related member.
>
> Correction, there are RW members but that are only runtime ones, e.g.
> scrub throughput and iops limits.
>
> Nothing is going to trigger any metadata writes.
> And that's true for all RW members, no matter if it's
> space_info->chunk_size or reclaim control.
>
> Even for the label modification through sysfs, we do not trigger any
> metadata changes, but only modify in-memory structures.
>
> If you're going to use sysfs to trigger a transaction, I'd just say, DON'T.

Yes, I'm already thinking about another solution. But for now, I'll
rebase this approach.

Thanks for suggesting not to trigger transactions from within sysfs
code paths. I'll take that into account for future versions.

Thanks,
Kai

> Thanks,
> Qu
>
> >
> > So this means your patch is fixing something that is only affecting your
> > out-of-tree development branch, which is not bringing much usefulness to
> > upstream.
> >
> > Thanks,
> > Qu

[...]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
  2025-12-13 20:48 ` Qu Wenruo
  2025-12-13 21:04   ` Qu Wenruo
@ 2025-12-13 21:10   ` Kai Krakow
  2025-12-13 21:15     ` Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: Kai Krakow @ 2025-12-13 21:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Eli Venter

Hello Qu!

On Sat, 13 Dec 2025 at 21:48, Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2025/12/14 06:39, Kai Krakow wrote:
> > During mount, the global block reserve might not have its space_info
> > initialized yet. If we try to reserve bytes in this state (e.g. via
> > early sysfs writes), we must not crash.
> >
> > This happened while developing patches which allow modification of the
> > devinfo.type field via sysfs. If this write access is executed by the
> > user before the mount finished, the kernel crashed with a NULL pointer
> > dereference:
>
>
> I'd say the modification through sysfs itself is a dangerous idea, it
> will need to hold the proper locks and if not properly checked can
> easily introduce unexpected races.
>
>
> Furthermore currently there is no RW support for devinfo related member.
>
> So this means your patch is fixing something that is only affecting your
> out-of-tree development branch, which is not bringing much usefulness to
> upstream.
>
> Thanks,
> Qu

Okay, thanks. I understand your reasoning. I half expected that this
would not be accepted because it is triggered by out-of-tree code.

In case you'd like to see the code causing this:
https://gist.github.com/kakra/8ccdcb96ca8426b95bcd86c7e0b5115e

It's part of Goffredo's "allocator hint" patches, which I rebased to 6.18.

As you can see, I have already guarded the call with:

	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
		return -EBUSY;

So I should be safe there even without this patch.

Thanks,
Kai

> >> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
> >> while mount is still ongoing. My btrfs is big so the mount takes
> >> 20-30 minutes. Reboot and wait until mount is complete and this
> >> worked fine.

[...]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
  2025-12-13 21:10   ` Kai Krakow
@ 2025-12-13 21:15     ` Qu Wenruo
  2025-12-13 21:43       ` Kai Krakow
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2025-12-13 21:15 UTC
To: Kai Krakow; +Cc: linux-btrfs, Eli Venter

On 2025/12/14 07:40, Kai Krakow wrote:
> Hello Qu!
>
> On Sat, 13 Dec 2025 at 21:48, Qu Wenruo <wqu@suse.com> wrote:
>>
>> On 2025/12/14 06:39, Kai Krakow wrote:
>>> During mount, the global block reserve might not have its space_info
>>> initialized yet. If we try to reserve bytes in this state (e.g. via
>>> early sysfs writes), we must not crash.
>>>
>>> This happened while developing patches which allow modification of the
>>> devinfo.type field via sysfs. If this write access is executed by the
>>> user before the mount finished, the kernel crashed with a NULL pointer
>>> dereference:
>>
>> I'd say the modification through sysfs itself is a dangerous idea, it
>> will need to hold the proper locks and if not properly checked can
>> easily introduce unexpected races.
>>
>> Furthermore currently there is no RW support for devinfo related member.
>>
>> So this means your patch is fixing something that is only affecting your
>> out-of-tree development branch, which is not bringing much usefulness to
>> upstream.
>>
>> Thanks,
>> Qu
>
> Okay, thanks. I understand your argumentation. I almost expected that
> this won't be accepted because it is triggered by out-of-tree code.
>
> In case, you'd like to see the code causing this:
> https://gist.github.com/kakra/8ccdcb96ca8426b95bcd86c7e0b5115e
>
> It's part of Goffredo's "allocator hint" patches which I rebased to 6.18.
>
> As you may see, I already guarded the call with:
>
> if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags)) return -EBUSY;

Currently the RW sysfs interfaces follow the pattern that no
transaction is triggered from sysfs write context.

I strongly doubt it's a good idea to trigger a transaction without
any VFS checks.

So if you really want to change a member, do a proper in-memory update
only, and find a way (e.g. a new dirty dev list) to tell the fs to
update the device item at commit time.

It's much safer and avoids a new and untested direct way to trigger
metadata modification.

Thanks,
Qu

>
> So I should be safe there even without this patch.
>
> Thanks,
> Kai
>
>>> [...]
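The guard Kai quotes above (refuse the sysfs write until the filesystem is fully open) can be illustrated with a small userspace model. This is only a sketch, not btrfs code: `fs_state`, `FS_OPEN_BIT`, `fs_test_bit()`, `fs_set_bit()` and `type_store()` are hypothetical stand-ins for `fs_info`, `BTRFS_FS_OPEN`, the kernel's `test_bit()`/`set_bit()` and the sysfs store handler, and C11 atomics are assumed in place of the kernel bit operations.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the BTRFS_FS_OPEN flag bit. */
enum { FS_OPEN_BIT = 0 };

/* Hypothetical, heavily reduced model of fs_info. */
struct fs_state {
	atomic_ulong flags;          /* state bits, set once mount completes */
	unsigned long long dev_type; /* the value the sysfs file would change */
};

/* Userspace analogue of test_bit(): non-zero if 'bit' is set. */
static int fs_test_bit(int bit, struct fs_state *fs)
{
	return (int)((atomic_load(&fs->flags) >> bit) & 1UL);
}

/* Userspace analogue of set_bit(). */
static void fs_set_bit(int bit, struct fs_state *fs)
{
	atomic_fetch_or(&fs->flags, 1UL << bit);
}

/*
 * Model of a sysfs store handler: returns the number of bytes consumed
 * on success or a negative errno. Writes are refused with -EBUSY while
 * the filesystem is not yet marked open.
 */
static long type_store(struct fs_state *fs, unsigned long long value, size_t len)
{
	assert(fs != NULL);

	if (!fs_test_bit(FS_OPEN_BIT, fs))
		return -EBUSY;

	fs->dev_type = value;
	return (long)len;
}
```

Such a guard only covers the mount window. Qu's objection about triggering a transaction from sysfs write context is a separate concern that the guard does not touch.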
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
  2025-12-13 21:15     ` Qu Wenruo
@ 2025-12-13 21:43       ` Kai Krakow
  2025-12-13 22:31         ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Kai Krakow @ 2025-12-13 21:43 UTC
To: Qu Wenruo; +Cc: linux-btrfs, Eli Venter

On Sat, 13 Dec 2025 at 22:15, Qu Wenruo <wqu@suse.com> wrote:
> [...]
> Currently the RW sysfs interfaces follow the pattern that no
> transaction is triggered from sysfs write context.
>
> I strongly doubt it's a good idea to trigger a transaction without
> any VFS checks.
>
> So if you really want to change a member, do a proper in-memory update
> only, and find a way (e.g. a new dirty dev list) to tell the fs to
> update the device item at commit time.

I don't want to steal your time, but is this a better approach?

https://gist.github.com/kakra/aa3eb8473cc05c1d3dd000160a5ee481

Thanks,
Kai

> It's much safer and avoids a new and untested direct way to trigger
> metadata modification.
>
> Thanks,
> Qu
> [...]
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
  2025-12-13 21:43       ` Kai Krakow
@ 2025-12-13 22:31         ` Qu Wenruo
  2025-12-15 19:28           ` David Sterba
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2025-12-13 22:31 UTC
To: Kai Krakow; +Cc: linux-btrfs, Eli Venter

On 2025/12/14 08:13, Kai Krakow wrote:
> On Sat, 13 Dec 2025 at 22:15, Qu Wenruo <wqu@suse.com> wrote:
[...]
>>
>> So if you really want to change a member, do a proper in-memory update
>> only, and find a way (e.g. a new dirty dev list) to tell the fs to
>> update the device item at commit time.
>
> I don't want to steal your time,

I'm totally fine discussing the implementation details here.

> but is this a better approach?
> https://gist.github.com/kakra/aa3eb8473cc05c1d3dd000160a5ee481

Unfortunately, as long as you're trying to do any metadata
modification, e.g. calling btrfs_update_device(), it will be a huge change.

My idea would be something like this:

btrfs_dev_info_type_store()
{
	struct btrfs_device *device = container_of();

	/* Do the proper locking. */

	WRITE_ONCE(device->type, type);
	/* Queue the device only if it is not already on the dirty list. */
	if (list_empty(&device->dirty_list))
		list_add_tail(&device->dirty_list, &fs_info->dirty_dev_list);
	return len;
}

Then inside btrfs_commit_transaction(), I do not yet have a good idea on
the timing, but I guess it can be done before btrfs_start_delalloc_flush().

Do something like this to write those dirty devices to the chunk tree:

btrfs_commit_transaction()
{
	list_for_each_entry(dev, &fs_info->dirty_dev_list, dirty_list) {
		ret = btrfs_update_device(dev);
	}

	/* The remaining code. */
	ret = btrfs_start_delalloc_flush();
}

>
> Thanks,
> Kai
>
>> It's much safer and avoids a new and untested direct way to trigger
>> metadata modification.
>>
>> Thanks,
>> Qu
>> [...]
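The deferred-update idea in the message above (change the value in memory, queue the device, write it back at commit time) can be modelled in plain userspace C. The following is a minimal sketch under stated assumptions, not kernel code: `struct device`, `struct fs`, `type_store()` and `commit()` are hypothetical stand-ins, the list helpers are simplified re-implementations rather than the kernel's `<linux/list.h>`, and all locking is omitted. It also assumes that commit takes each device off the dirty list after writing it back, a detail the pseudocode leaves open.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert 'entry' just before 'head', i.e. at the tail of the list. */
static void list_add_to_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink 'entry' and reinitialise it so it reads as "not queued" again. */
static void list_del_reinit(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

/* Hypothetical stand-in for btrfs_device. */
struct device {
	unsigned long long type;     /* in-memory value */
	unsigned long long on_disk;  /* simulated on-disk device item */
	struct list_head dirty_list; /* empty while the device is not queued */
};

/* Hypothetical stand-in for btrfs_fs_info. */
struct fs {
	struct list_head dirty_devs;
};

static void fs_init(struct fs *fs)
{
	list_init(&fs->dirty_devs);
}

static void device_init(struct device *dev)
{
	dev->type = 0;
	dev->on_disk = 0;
	list_init(&dev->dirty_list);
}

/* Store handler: touch memory only and queue the device at most once. */
static void type_store(struct fs *fs, struct device *dev, unsigned long long type)
{
	assert(fs != NULL && dev != NULL);

	dev->type = type;
	if (list_is_empty(&dev->dirty_list))
		list_add_to_tail(&dev->dirty_list, &fs->dirty_devs);
}

/* Commit: write back every queued device; returns how many were flushed. */
static int commit(struct fs *fs)
{
	int flushed = 0;

	while (!list_is_empty(&fs->dirty_devs)) {
		struct list_head *entry = fs->dirty_devs.next;
		struct device *dev = (struct device *)
			((char *)entry - offsetof(struct device, dirty_list));

		dev->on_disk = dev->type; /* stands in for the device item update */
		list_del_reinit(entry);
		flushed++;
	}
	return flushed;
}
```

Two stores to the same device before a commit queue it only once, and the commit writes the latest value. In a real implementation the list and the field would need the locking that the pseudocode marks with "Do the proper locking".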
* Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
  2025-12-13 22:31         ` Qu Wenruo
@ 2025-12-15 19:28           ` David Sterba
  0 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2025-12-15 19:28 UTC
To: Qu Wenruo; +Cc: Kai Krakow, linux-btrfs, Eli Venter

On Sun, Dec 14, 2025 at 09:01:41AM +1030, Qu Wenruo wrote:
>
> On 2025/12/14 08:13, Kai Krakow wrote:
> > On Sat, 13 Dec 2025 at 22:15, Qu Wenruo <wqu@suse.com> wrote:
> [...]
> >>
> >> So if you really want to change a member, do a proper in-memory update
> >> only, and find a way (e.g. a new dirty dev list) to tell the fs to
> >> update the device item at commit time.
> >
> > I don't want to steal your time,
>
> I'm totally fine discussing the implementation details here.
>
> > but is this a better approach?
> > https://gist.github.com/kakra/aa3eb8473cc05c1d3dd000160a5ee481
>
> Unfortunately, as long as you're trying to do any metadata
> modification, e.g. calling btrfs_update_device(), it will be a huge change.
>
> My idea would be something like this:
>
> btrfs_dev_info_type_store()
> {
> 	struct btrfs_device *device = container_of();
>
> 	/* Do the proper locking. */
>
> 	WRITE_ONCE(device->type, type);
> 	/* Queue the device only if it is not already on the dirty list. */
> 	if (list_empty(&device->dirty_list))
> 		list_add_tail(&device->dirty_list, &fs_info->dirty_dev_list);

	set_bit(BTRFS_FS_NEED_TRANS_COMMIT, &fs_info->flags);
	wake_up_process(fs_info->transaction_kthread);

I don't think it would be wise to wait until the transaction is
committed before returning from the store handler. We don't do that for
other similar changes either, so the write becomes permanent after a
sync, e.g. using 'btrfs fi sync'.

> 	return len;
> }
>
> Then inside btrfs_commit_transaction(), I do not yet have a good idea on
> the timing, but I guess it can be done before btrfs_start_delalloc_flush().
>
> Do something like this to write those dirty devices to the chunk tree:
>
> btrfs_commit_transaction()
> {
> 	list_for_each_entry(dev, &fs_info->dirty_dev_list, dirty_list) {
> 		ret = btrfs_update_device(dev);
> 	}
>
> 	/* The remaining code. */
> 	ret = btrfs_start_delalloc_flush();
> }

This should work. The placement depends on what is changed and how it's
related to the current transaction.
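David's addition (flag that a commit is needed, wake the committer, and return from the store handler without waiting) can be modelled as follows. This is a single-threaded userspace sketch under stated assumptions: `fs_state`, `NEED_COMMIT_BIT`, `store_type()` and `committer_tick()` are hypothetical names, C11 atomics stand in for the kernel's `set_bit()`, and the wake-up of the transaction kthread is reduced to a comment. `committer_tick()` models one pass of the committer, not the real `btrfs_commit_transaction()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical analogue of the BTRFS_FS_NEED_TRANS_COMMIT flag bit. */
enum { NEED_COMMIT_BIT = 0 };

/* Hypothetical, heavily reduced model of fs_info. */
struct fs_state {
	atomic_ulong flags;        /* bit set when a commit has been requested */
	unsigned int pending_type; /* in-memory value, visible immediately */
	unsigned int durable_type; /* value as of the last commit */
	unsigned int commits;      /* number of commits performed */
};

static void fs_state_init(struct fs_state *fs)
{
	atomic_init(&fs->flags, 0);
	fs->pending_type = 0;
	fs->durable_type = 0;
	fs->commits = 0;
}

/*
 * Store handler: update memory, request a commit, return at once.
 * The caller does not wait for the value to become durable.
 */
static void store_type(struct fs_state *fs, unsigned int type)
{
	assert(fs != NULL);

	fs->pending_type = type;
	atomic_fetch_or(&fs->flags, 1UL << NEED_COMMIT_BIT);
	/* The kernel version would wake the transaction kthread here. */
}

/*
 * One pass of the committer: atomically clear the request bit and, if it
 * was set, make the pending value durable. Returns true if it committed.
 */
static bool committer_tick(struct fs_state *fs)
{
	unsigned long mask = 1UL << NEED_COMMIT_BIT;
	unsigned long old = atomic_fetch_and(&fs->flags, ~mask);

	if (!(old & mask))
		return false;

	fs->durable_type = fs->pending_type;
	fs->commits++;
	return true;
}
```

Several stores before the next pass collapse into a single commit, and the store handler never blocks on it, which matches the behaviour David describes: the change becomes permanent only after the next commit or an explicit sync.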
end of thread, other threads:[~2025-12-15 19:28 UTC | newest]

Thread overview: 9+ messages
2025-12-13 20:09 [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL Kai Krakow
2025-12-13 20:48 ` Qu Wenruo
2025-12-13 21:04 ` Qu Wenruo
2025-12-13 21:14 ` Kai Krakow
2025-12-13 21:10 ` Kai Krakow
2025-12-13 21:15 ` Qu Wenruo
2025-12-13 21:43 ` Kai Krakow
2025-12-13 22:31 ` Qu Wenruo
2025-12-15 19:28 ` David Sterba