From: Qu Wenruo <wqu@suse.com>
To: Kai Krakow <kai@kaishome.de>
Cc: linux-btrfs@vger.kernel.org, Eli Venter <eli@genedx.com>
Subject: Re: [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL
Date: Sun, 14 Dec 2025 09:01:41 +1030 [thread overview]
Message-ID: <684a6f99-0df8-428a-8e57-294a38b8788e@suse.com> (raw)
In-Reply-To: <CAC2ZOYvY1x-vx9rUs-tZ2J_oHjX-pj7C1muFwvnH5NHSHn0ntw@mail.gmail.com>
在 2025/12/14 08:13, Kai Krakow 写道:
> Am Sa., 13. Dez. 2025 um 22:15 Uhr schrieb Qu Wenruo <wqu@suse.com>:
[...]
>>
>> So if you really want to change a member, do a proper in-memory update
>> only, and find a way (e.g. a new dirty dev list) to tell the fs to
>> update the device item at commit time.
>
> I don't want to steal your time,
I'm totally fine discussing the implementation details here.
> but is this a better approach?
> https://gist.github.com/kakra/aa3eb8473cc05c1d3dd000160a5ee481
Unfortunately as long as if you're trying to do any metadata
modification, e.g. calling btrfs_update_device(), it will be a huge change.
My idea would be something like this:
btrfs_dev_info_type_store()
{
btrfs_device *device = container_of();
/* Do the proper locking. */
WRITE_ONCE(device->type, type);
if (!list_empty(&dev->dirty_list))
list_add_tail(&fs_info->dirty_dev_list, &dev->dirty_list);
return len;
}
Then inside btrfs_commit_transaction(), I do not yet have a good idea on
the timing, but I guess it can done before btrfs_start_delalloc_flush().
Do something like this to write those dirty devices to chunk tree:
btrfs_commit_transaction()
{
list_for_each_entry(dev, &fs_info->dirty_dev_list, dirty_list) {
ret = btrfs_update_device(dev);
}
/* The remaining code. */
ret = btrfs_start_delalloc_flush();
}
>
> Thanks,
> Kai
>
>> It's much safer and avoid a new and untested direct way to trigger
>> metadata modification.
>>
>> Thanks,
>> Qu
>>
>>>
>>> So I should be safe there even without this patch.
>>>
>>> Thanks,
>>> Kai
>>>
>>>>>> Noticed an oops with these patches when doing echo 1 >devinfo/2/type
>>>>>> while mount is still ongoing. My btrfs is big so the mount takes
>>>>>> 20-30 minutes. Reboot and wait until mount is complete and this
>>>>>> worked fine.
>>>>>
>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>>> PGD 0 P4D 0
>>>>> Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
>>>>> CPU: 4 UID: 0 PID: 3520 Comm: bash Not tainted 6.12.52-dirty #2
>>>>> Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
>>>>> RIP: 0010:_raw_spin_lock+0x17/0x30
>>>>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
>>>>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
>>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
>>>>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
>>>>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
>>>>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
>>>>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>>>>> Call Trace:
>>>>>
>>>>> __reserve_bytes+0x70/0x720 [btrfs]
>>>>> ? get_page_from_freelist+0x343/0x1570
>>>>> btrfs_reserve_metadata_bytes+0x1d/0xd0 [btrfs]
>>>>> btrfs_use_block_rsv+0x153/0x220 [btrfs]
>>>>> btrfs_alloc_tree_block+0x83/0x580 [btrfs]
>>>>> btrfs_force_cow_block+0x129/0x620 [btrfs]
>>>>> btrfs_cow_block+0xcd/0x230 [btrfs]
>>>>> btrfs_search_slot+0x566/0xd60 [btrfs]
>>>>> ? kmem_cache_alloc_noprof+0x106/0x2f0
>>>>> btrfs_update_device+0x91/0x1d0 [btrfs]
>>>>> btrfs_devinfo_type_store+0xb8/0x140 [btrfs]
>>>>> kernfs_fop_write_iter+0x14c/0x200
>>>>> vfs_write+0x289/0x440
>>>>> ksys_write+0x6d/0xf0
>>>>> trace_clock_x86_tsc+0x20/0x20
>>>>> ? do_wp_page+0x838/0xf90
>>>>> ? __do_sys_newfstat+0x68/0x70
>>>>> ? __pte_offset_map+0x1b/0xf0
>>>>> ? __handle_mm_fault+0xa6c/0x10f0
>>>>> ? __count_memcg_events+0x53/0xf0
>>>>> ? handle_mm_fault+0x1c4/0x2d0
>>>>> ? do_user_addr_fault+0x334/0x620
>>>>> ? arch_exit_to_user_mode_prepare.isra.0+0x11/0x90
>>>>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>> RIP: 0033:0x7fd6e27a1687
>>>>> Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
>>>>> RSP: 002b:00007ffecb401260 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>>>> RAX: ffffffffffffffda RBX: 00007fd6e270f740 RCX: 00007fd6e27a1687
>>>>> RDX: 0000000000000002 RSI: 0000557a2c38ad20 RDI: 0000000000000001
>>>>> RBP: 0000557a2c38ad20 R08: 0000000000000000 R09: 0000000000000000
>>>>> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
>>>>> R13: 00007fd6e28fa5c0 R14: 00007fd6e28f7e80 R15: 0000000000000000
>>>>>
>>>>> Modules linked in: rpcsec_gss_krb5 nfsv3 nfsv4 dns_resolver nfs netfs zram lz4hc_compress lz4_compress dm_crypt bonding tls ipmi_ssif intel_rapl_msr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rapl intel_cstate s>
>>>>> intel_pmc_bxt ixgbe ehci_pci iTCO_vendor_support xfrm_algo gf128mul libata mpt3sas xhci_hcd ehci_hcd watchdog crypto_simd mdio_devres libphy cryptd raid_class usbcore scsi_transport_sas mdio igb scsi_mod wmi usb_common i2c_i801 lpc_ich scsi_common i2c_smbus i2c_algo_bit dca
>>>>> CR2: 0000000000000008
>>>>> ---[ end trace 0000000000000000 ]---
>>>>> RIP: 0010:_raw_spin_lock+0x17/0x30
>>>>> Code: 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 e8 c0 d8 5e 31 c0 ba 01 00 00 00 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e9 97 01 00 00 0f 1f 80 00
>>>>> RSP: 0018:ffffbc13a95837c8 EFLAGS: 00010246
>>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>>>> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
>>>>> RBP: 0000000000000008 R08: ffffbc13a9583a07 R09: 0000000000000001
>>>>> R10: d800000000000000 R11: 0000000000000001 R12: ffff9bee913db000
>>>>> R13: 0000000000000000 R14: 00000000fffffffb R15: ffff9bee913db000
>>>>> FS: 00007fd6e270f740(0000) GS:ffff9bfddfc00000(0000) knlGS:0000000000000000
>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> CR2: 0000000000000008 CR3: 00000008d9986004 CR4: 00000000003706f0
>>>>>
>>>>> Reported-by: Eli Venter <eli@genedx.com>
>>>>> Signed-off-by: Kai Krakow <kai@kaishome.de>
>>>>> ---
>>>>> fs/btrfs/space-info.c | 8 ++++++++
>>>>> 1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
>>>>> index 97452fb5d29b..cbb6c4924850 100644
>>>>> --- a/fs/btrfs/space-info.c
>>>>> +++ b/fs/btrfs/space-info.c
>>>>> @@ -1752,6 +1752,14 @@ static int __reserve_bytes(struct btrfs_fs_info *fs_info,
>>>>> ASSERT(flush != BTRFS_RESERVE_FLUSH_EVICT);
>>>>> }
>>>>>
>>>>> + /*
>>>>> + * During mount, the global block reserve might not have its space_info
>>>>> + * initialized yet. If we try to reserve bytes in this state (e.g. via
>>>>> + * early sysfs writes), we must not crash.
>>>>> + */
>>>>> + if (unlikely(!space_info))
>>>>> + return -EBUSY;
>>>>> +
>>>>> if (flush == BTRFS_RESERVE_FLUSH_DATA)
>>>>> async_work = &fs_info->async_data_reclaim_work;
>>>>> else
>>>>
>>
next prev parent reply other threads:[~2025-12-13 22:31 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-13 20:09 [PATCH] btrfs: harden __reserve_bytes() with space_info==NULL Kai Krakow
2025-12-13 20:48 ` Qu Wenruo
2025-12-13 21:04 ` Qu Wenruo
2025-12-13 21:14 ` Kai Krakow
2025-12-13 21:10 ` Kai Krakow
2025-12-13 21:15 ` Qu Wenruo
2025-12-13 21:43 ` Kai Krakow
2025-12-13 22:31 ` Qu Wenruo [this message]
2025-12-15 19:28 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=684a6f99-0df8-428a-8e57-294a38b8788e@suse.com \
--to=wqu@suse.com \
--cc=eli@genedx.com \
--cc=kai@kaishome.de \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox