Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: Kai Krakow <kai@kaishome.de>
Cc: linux-btrfs@vger.kernel.org, Eli Venter <eli@genedx.com>
Subject: Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
Date: Sun, 14 Dec 2025 07:45:30 +1030	[thread overview]
Message-ID: <4a4c04ff-3855-467c-af75-77c6ddf27098@suse.com> (raw)
In-Reply-To: <CAC2ZOYtgvzThSVX7tsajF=czm3JaYzpKjCsJB72Tw3_35Notzw@mail.gmail.com>



在 2025/12/14 07:40, Kai Krakow 写道:
> Hello Qu!
> 
> Am Sa., 13. Dez. 2025 um 21:48 Uhr schrieb Qu Wenruo <wqu@suse.com>:
>>
>>
>>
>> 在 2025/12/14 06:39, Kai Krakow 写道:
>>> During mount, the global block reserve might not have its space_info
>>> initialized yet. If we try to reserve bytes in this state (e.g. via
>>> early sysfs writes), we must not crash.
>>>
>>> This happened while developing patches which allow modification of the
>>> devinfo.type field via sysfs. If this write access is executed by the
>>> user before the mount finished, the kernel crashed with a NULL pointer
>>> dereference:
>>
>>
>> I'd say the modification through sysfs itself is a dangerous idea, it
>> will need to hold the proper locks and if not properly checked can
>> easily introduce unexpected races.
>>
>>
>> Furthermore currently there is no RW support for devinfo related member.
>>
>> So this means your patch is fixing something that is only affecting your
>> out-of-tree development branch, which is not bringing much usefulness to
>> upstream.
>>
>> Thanks,
>> Qu
> 
> Okay, thanks. I understand your argumentation. I almost expected that
> this won't be accepted because it is triggered by out-of-tree code.
> 
> In case, you'd like to see the code causing this:
> https://gist.github.com/kakra/8ccdcb96ca8426b95bcd86c7e0b5115e
> 
> It's part of Goffredo's "allocator hint" patches which I rebased to 6.18.
> 
> As you may see, I already guarded the call with:
> 
> if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags)) return -EBUSY;

Currently the RW sysfs interfaces follows the pattern that no 
transaction is triggered from sysfs write context.

I strongly doubt if it's a good idea to trigger a transaction without 
any VFS checks.

So if you really want to change a member, do a proper in-memory update 
only, and find a way (e.g. a new dirty dev list) to tell the fs to 
update the device item at commit time.

It's much safer and avoid a new and untested direct way to trigger 
metadata modification.

Thanks,
Qu

> 
> So I should be safe there even without this patch.
> 
> Thanks,
> Kai
> 
>>>> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
>>>> while mount is still ongoing. My btrfs is big so the mount takes
>>>> 20-30 minutes. Reboot and wait until mount is complete and this
>>>> worked fine.
>>>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> PGD 0 P4D 0
>>> Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
>>> CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
>>> Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
>>> RIP: 0010:_raw_spin_lock+0x17/0x30
>>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
>>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
>>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
>>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
>>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
>>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>>> Call Trace:
>>>
>>> __reserve_bytes+0x70/0x720 [btrfs]
>>> ? get_page_from_freelist+0x343/0x1570
>>> btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
>>> btrfs_use_block_rsv+0x153/0x220 [btrfs]
>>> btrfs_alloc_tree_block+0x83/0x580 [btrfs]
>>> btrfs_force_cow_block+0x129/0x620 [btrfs]
>>> btrfs_cow_block+0xcd/0x230 [btrfs]
>>> btrfs_search_slot+0x566/0xd60 [btrfs]
>>> ? kmem_cache_alloc_noprof+0x106/0x2f0
>>> btrfs_update_device+0x91/0x1d0 [btrfs]
>>> btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
>>> kernfs_fop_write_iter+0x14c/0x200
>>> vfs_write+0x289/0x440
>>> ksys_write+0x6d/0xf0
>>> trace_clock_x86_tsc+0x20/0x20
>>> ? do_wp_page+0x838/0xf90
>>> ? __do_sys_newfstat+0x68/0x70
>>> ? __pte_offset_map+0x1b/0xf0
>>> ? __handle_mm_fault+0xa6c/0x10f0
>>> ? __count_memcg_events+0x53/0xf0
>>> ? handle_mm_fault+0x1c4/0x2d0
>>> ? do_user_addr_fault+0x334/0x620
>>> ? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>> RIP: 0033:0x7fd6e27a1687
>>> Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
>>> RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>> RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
>>> RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
>>> RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
>>> R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
>>>
>>> Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate s>
>>> intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
>>> CR2: 0000000000000008
>>> ---[ end trace 0000000000000000 ]---
>>> RIP: 0010:_raw_spin_lock+0x17/0x30
>>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
>>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
>>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
>>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
>>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
>>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>>>
>>> Reported-by: Eli Venter <eli@genedx.com>
>>> Signed-off-by: Kai Krakow <kai@kaishome.de>
>>> ---
>>>    fs/btrfs/space-info.c | 8 ++++++++
>>>    1 file changed, 8 insertions(+)
>>>
>>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>>> index 97452fb5d29b..cbb6c4924850 100644
>>> --- a/fs/btrfs/space-info.c
>>> +++ b/fs/btrfs/space-info.c
>>> @@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
>>>                ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
>>>        }
>>>
>>> +     /*
>>> +      * During mount, the global block reserve might not have its space_info
>>> +      * initialized yet. If we try to reserve bytes in this state (e.g. via
>>> +      * early sysfs writes), we must not crash.
>>> +      */
>>> +     if (unlikely(!space_info))
>>> +             return -EBUSY;
>>> +
>>>        if (flush == BTRFS_RESERVE_FLUSH_DATA)
>>>                async_work = &fs_info->async_data_reclaim_work;
>>>        else
>>


  reply	other threads:[~2025-12-13 21:15 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-13 20:09 [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL Kai Krakow
2025-12-13 20:48 ` Qu Wenruo
2025-12-13 21:04   ` Qu Wenruo
2025-12-13 21:14     ` Kai Krakow
2025-12-13 21:10   ` Kai Krakow
2025-12-13 21:15     ` Qu Wenruo [this message]
2025-12-13 21:43       ` Kai Krakow
2025-12-13 22:31         ` Qu Wenruo
2025-12-15 19:28           ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a4c04ff-3855-467c-af75-77c6ddf27098@suse.com \
    --to=wqu@suse.com \
    --cc=eli@genedx.com \
    --cc=kai@kaishome.de \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox