From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: "glaubitz@physik.fu-berlin.de" <glaubitz@physik.fu-berlin.de>,
"yang.chenzhi@vivo.com" <yang.chenzhi@vivo.com>,
"slava@dubeyko.com" <slava@dubeyko.com>,
"frank.li@vivo.com" <frank.li@vivo.com>
Cc: "syzbot+005d2a9ecd9fbf525f6a@syzkaller.appspotmail.com"
<syzbot+005d2a9ecd9fbf525f6a@syzkaller.appspotmail.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH] hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create
Date: Tue, 2 Sep 2025 21:57:50 +0000 [thread overview]
Message-ID: <7c4e1a9e21d3a5538d448344d1bba6fcc93e9ef0.camel@ibm.com> (raw)
In-Reply-To: <e2b42d21-97b5-4e25-a3b0-1d3ec23bd104@vivo.com>
On Mon, 2025-09-01 at 14:21 +0000, Chenzhi Yang wrote:
> Thanks for reviewing.
>
Thanks a lot for the detailed explanation. It is helpful.
> The syzbot C reproducer creates five processes, each of which repeatedly
> performs the following sequence of operations in a loop device: mount,
> hard link creation, sync, unmount, and loop device reset[1]. In certain
> cases, this can result in situations where, for example, Thread A is
> executing sync while Thread B is concurrently performing a link operation.
>
> > It's clear why link() could try to create a b-tree node. But it's not
> > completely clear why sync() should try to create a b-tree node? Usually,
> > sync() should try to flush already existing inodes. Especially, if we
> > imply any system inodes that are created mostly during mount operation.
> > How is it possible that sync() could try to create the node?
>
> ```
> if (!cnid && tree->cnid == 4)
> 	dump_stack();
> ```
>
> Add the code above to __hfs_bnode_create(). Call sync() immediately
> after hfsplus is mounted successfully. The call trace shows that sync()
> can create a bnode (the header node, id 0).
>
> > Moreover, node_id 0 (root node) should be already
> > created if we created the b-tree.
> >
>
> node_id 0 is the header node, which contains a bitmap, and node_id 1 is
> the root node (the root node can change if the tree height increases,
> but the initial root node ID of the catalog tree is always 1).
>
> In fact, the root node (ID 1) is the bnode we create when initializing
> the catalog btree. However, during the creation of the catalog tree, we
> do not create the header node (ID 0).
>
> hfsplus_fill_super()
> -> hfsplus_iget(ROOT_CNID)
> -> hfs_find_init(catalog_tree)
> -> hfsplus_find_cat
> -> hfs_brec_read
> -> hfs_brec_find(tree->root)
> -> hfs_bnode_find(tree->root)
> -> __hfs_bnode_create(tree->root)
>
> From my perspective, the header node is somewhat special. There are two
> cases where we need to use it. First, when searching the bitmap to
> allocate or free a node. Second, when reading or writing metadata in the
> header record. In hfs_btree_open(), we do read data from the header
> record; however, it reads this data directly from the page rather than
> creating a header node.
>
> ```
> mapping = tree->inode->i_mapping;
> page = read_mapping_page(mapping, 0, NULL);
> if (IS_ERR(page))
> 	goto free_inode;
>
> /* Load the header */
> head = (struct hfs_btree_header_rec *)(kmap_local_page(page) +
> 	sizeof(struct hfs_bnode_desc));
> tree->root = be32_to_cpu(head->root);
>
> ```
>
> So we lose the opportunity to initialize the header node during the
> mounting process. We can only wait until the next time we need to use
> the header node to create it. In this scenario, it could be sync() ->
> hfs_btree_write to write metadata back to disk, or link() ->
> hfs_bnode_split to use the bitmap to allocate a new node.
>
> Besides this, I found hfs_bnode_find() to be somewhat strange.
> hfs_bnode_find() does not record the node it creates in the bitmap
> (e.g. the header node). The recording is handled by hfs_bmap_alloc(),
> meaning that recording and creation are separate operations, and the
> two functions do not have a one-to-one relationship. The caller must
> ensure that the node it wants to find is already in the bitmap.
> However, we do not currently have such checks in place, or perhaps
> there is some mechanism that guarantees every node passed through it
> is already in the bitmap?
>
I assume that we have quite a few bugs in this logic. Currently, I am
investigating an issue in HFS related to the node bitmap management logic,
and I still don't have the complete picture of the issue:
[ 34.496034][ T9845] hfs: new node 0 already hashed?
[ 34.496526][ T9845] ------------[ cut here ]------------
[ 34.497010][ T9845] WARNING: CPU: 2 PID: 9845 at fs/hfs/bnode.c:549 hfs_bnode_create+0x461/0x4f0
[ 34.497805][ T9845] Modules linked in:
[ 34.498186][ T9845] CPU: 2 UID: 0 PID: 9845 Comm: repro Not tainted 6.17.0-rc1-00116-gd7ee5bdce789-dirty #32 PREEMPT(full)
[ 34.499175][ T9845] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 34.500144][ T9845] RIP: 0010:hfs_bnode_create+0x461/0x4f0
[ 34.500650][ T9845] Code: c1 8b 89 ee e8 e0 8a 93 fe e9 cf fc ff ff e8 66 49 24 ff 4c 89 ef e8 9e ad f4 08 48 c7 c7 20 cb cf
[ 34.502377][ T9845] RSP: 0018:ffffc9000d036f20 EFLAGS: 00010246
[ 34.502918][ T9845] RAX: 000000000000001f RBX: ffff888022b28000 RCX: f65fb9590b3bdd00
[ 34.503625][ T9845] RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
[ 34.504423][ T9845] RBP: 0000000000000000 R08: ffff888062924253 R09: 1ffff1100c52484a
[ 34.505281][ T9845] R10: dffffc0000000000 R11: ffffed100c52484b R12: 0000000000000000
[ 34.506257][ T9845] R13: ffff888022b280e0 R14: ffff888021a88a00 R15: dffffc0000000000
[ 34.507072][ T9845] FS:  00007f8a41e5c540(0000) GS:ffff8880cfce5000(0000) knlGS:0000000000000000
[ 34.508102][ T9845] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 34.508861][ T9845] CR2: 00007f19ac0470f8 CR3: 00000001101d8000 CR4: 00000000000006f0
[ 34.509696][ T9845] Call Trace:
[ 34.510067][ T9845] <TASK>
[ 34.510360][ T9845] ? do_raw_spin_unlock+0x4d/0x210
[ 34.510890][ T9845] hfs_bmap_alloc+0x83b/0x8f0
[ 34.511419][ T9845] ? __pfx_hfs_bnode_find+0x10/0x10
[ 34.511886][ T9845] ? __pfx_hfs_bmap_alloc+0x10/0x10
[ 34.512452][ T9845] hfs_btree_inc_height+0xf6/0xac0
[ 34.512937][ T9845] ? _atomic_dec_and_lock+0xbd/0x140
[ 34.513475][ T9845] ? __pfx_folio_mark_accessed+0x10/0x10
[ 34.514117][ T9845] ? __pfx_hfs_btree_inc_height+0x10/0x10
[ 34.514703][ T9845] ? do_raw_spin_unlock+0x4d/0x210
[ 34.515217][ T9845] hfs_brec_insert+0x891/0xd40
[ 34.515707][ T9845] ? __pfx_hfs_brec_insert+0x10/0x10
[ 34.516286][ T9845] ? __kmalloc_noprof+0x2a9/0x510
[ 34.516795][ T9845] __hfs_ext_cache_extent+0x2c7/0xdc0
[ 34.517394][ T9845] hfs_ext_read_extent+0x288/0x300
[ 34.517922][ T9845] ? __pfx_hfs_ext_read_extent+0x10/0x10
[ 34.518523][ T9845] ? __pfx_clean_bdev_aliases+0x10/0x10
[ 34.519071][ T9845] hfs_extend_file+0x14e/0xee0
[ 34.519570][ T9845] ? hfs_bmap_reserve+0x3e9/0x430
[ 34.520045][ T9845] ? __pfx_hfs_extend_file+0x10/0x10
[ 34.520593][ T9845] hfs_get_block+0x3a9/0x9d0
[ 34.521062][ T9845] __block_write_begin_int+0x6c2/0x1990
[ 34.521617][ T9845] ? __pfx___folio_batch_add_and_move+0x10/0x10
[ 34.522242][ T9845] ? __pfx_hfs_get_block+0x10/0x10
[ 34.522758][ T9845] ? __pfx___block_write_begin_int+0x10/0x10
[ 34.523387][ T9845] cont_write_begin+0x6d5/0xa50
[ 34.523879][ T9845] ? __pfx_cont_write_begin+0x10/0x10
[ 34.524440][ T9845] ? __pfx___might_resched+0x10/0x10
[ 34.524976][ T9845] ? __mark_inode_dirty+0x3db/0xeb0
[ 34.525490][ T9845] ? folio_unlock+0x10c/0x170
[ 34.525981][ T9845] hfs_write_begin+0x66/0xb0
[ 34.526489][ T9845] ? __pfx_hfs_get_block+0x10/0x10
[ 34.527044][ T9845] generic_perform_write+0x2c5/0x900
[ 34.527569][ T9845] ? __pfx_generic_perform_write+0x10/0x10
[ 34.528161][ T9845] ? file_update_time+0x2ce/0x470
[ 34.528733][ T9845] ? __generic_file_write_iter+0xf9/0x230
[ 34.529270][ T9845] ? generic_file_write_iter+0x103/0x550
[ 34.529907][ T9845] generic_file_write_iter+0x117/0x550
[ 34.530445][ T9845] ? __pfx_generic_file_write_iter+0x10/0x10
[ 34.531076][ T9845] ? stack_depot_save_flags+0x40/0x890
[ 34.531600][ T9845] ? __pfx_aa_file_perm+0x10/0x10
[ 34.532108][ T9845] ? __lock_acquire+0xaec/0xd80
[ 34.532619][ T9845] ? rcu_read_lock_any_held+0xb3/0x120
[ 34.533138][ T9845] ? __pfx_rcu_read_lock_any_held+0x10/0x10
[ 34.533742][ T9845] vfs_write+0x60a/0xb80
[ 34.534189][ T9845] ? __pfx_generic_file_write_iter+0x10/0x10
[ 34.534769][ T9845] ? __pfx_vfs_write+0x10/0x10
[ 34.535259][ T9845] ? do_sys_openat2+0x154/0x1c0
[ 34.535781][ T9845] ? kmem_cache_free+0x18c/0x400
[ 34.536288][ T9845] ksys_write+0x14a/0x250
[ 34.536748][ T9845] ? __pfx_ksys_write+0x10/0x10
[ 34.537207][ T9845] ? do_syscall_64+0xb7/0x3a0
[ 34.537661][ T9845] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 34.538268][ T9845] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 34.538886][ T9845] do_syscall_64+0xf3/0x3a0
[ 34.539380][ T9845] ? exc_page_fault+0x9f/0xf0
[ 34.539871][ T9845] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 34.540472][ T9845] RIP: 0033:0x7f8a41d7bfc9
[ 34.540915][ T9845] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[ 34.542872][ T9845] RSP: 002b:00007ffeb34ea6e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 34.543642][ T9845] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8a41d7bfc9
[ 34.544324][ T9845] RDX: 000000000208e24b RSI: 0000200000000080 RDI: 0000000000000005
[ 34.545145][ T9845] RBP: 00007ffeb34ea700 R08: 00007ffeb34ea700 R09: 00007ffeb34ea700
[ 34.545885][ T9845] R10: 00007ffeb34ea700 R11: 0000000000000202 R12: 000055a027173260
[ 34.546641][ T9845] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 34.547329][ T9845] </TASK>
And I suspect that HFS+ could have the same issue.
> Interestingly, since hfsplus_fill_super() needs to initialize the root
> directory “/”, this operation must rely on the catalog_tree. As a
> result, the first byte of the catalog_tree bitmap is 0xC0 (i.e.,
> 0b11000000), meaning that both the header node and the root node are
> pre-marked in the bitmap so that hfs_bnode_find can later be used to
> create the header node (although in practice, the root node is the first
> one created).
>
> ```
> /* Load the root directory */
> root = hfsplus_iget(sb, HFSPLUS_ROOT_CNID); <- use catalog tree
> if (IS_ERR(root)) {
> 	pr_err("failed to load root directory\n");
> 	err = PTR_ERR(root);
> 	goto out_put_alloc_file;
> }
>
> ```
>
> In contrast, the extents tree is not used at all during the mounting
> process. As a result, the first byte of the extents tree bitmap is 0x80
> (i.e., 0b10000000), so its root node can be created later, when
> hfs_brec_insert() performs the first insertion. At that point, the root
> node is marked in the bitmap through hfs_bmap_alloc().
>
> In short, during the mounting process, the extents tree initializes
> neither the header node nor the root node.
>
> ```
> if (!fd->bnode) {
> 	if (!tree->root)
> 		hfs_btree_inc_height(tree); <- hfs_bmap_alloc
> 	node = hfs_bnode_find(tree, tree->leaf_head);
> 	if (IS_ERR(node))
> 		return PTR_ERR(node);
> 	fd->bnode = node;
> 	fd->record = -1;
> }
>
> ```
>
> > So, if you are fixing the syzbot reported issue, then it could be good to have
> > call trace in the comment to understand the fix environment. And it could be
> > more clear in what environment sync() is trying to create the b-tree node.
>
> OK, I will put the call trace in the commit message in future development.
>
> syzbot reported call traces: [2][3]
>
> Call Trace:
> <TASK>
> hfsplus_btree_write+0x379/0x7b0 fs/hfsplus/btree.c:309
> hfsplus_system_write_inode fs/hfsplus/super.c:137 [inline]
> hfsplus_write_inode+0x4c9/0x5f0 fs/hfsplus/super.c:163
> write_inode fs/fs-writeback.c:1525 [inline]
> __writeback_single_inode+0x6f4/0x1000 fs/fs-writeback.c:1745
> writeback_sb_inodes+0x6b7/0xf60 fs/fs-writeback.c:1976
> wb_writeback+0x43b/0xaf0 fs/fs-writeback.c:2156
> wb_do_writeback fs/fs-writeback.c:2303 [inline]
> wb_workfn+0x40e/0xf00 fs/fs-writeback.c:2343
> process_one_work kernel/workqueue.c:3236 [inline]
> process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3319
> worker_thread+0x8a0/0xda0 kernel/workqueue.c:3400
> kthread+0x711/0x8a0 kernel/kthread.c:463
> ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>
> Call Trace:
> <TASK>
> hfsplus_bmap_alloc+0x5a5/0x640 fs/hfsplus/btree.c:414
> hfs_bnode_split+0xcc/0xef0 fs/hfsplus/brec.c:245
> hfsplus_brec_insert+0x38f/0xcc0 fs/hfsplus/brec.c:100
> hfsplus_rename_cat+0x51c/0xde0 fs/hfsplus/catalog.c:491
> hfsplus_link+0x359/0x6a0 fs/hfsplus/dir.c:323
> vfs_link+0x4ea/0x6e0 fs/namei.c:4854
> do_linkat+0x272/0x560 fs/namei.c:4924
> __do_sys_link fs/namei.c:4958 [inline]
> __se_sys_link fs/namei.c:4956 [inline]
> __x64_sys_link+0x82/0x90 fs/namei.c:4956
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> There are two different call traces, depending on which path acquires
> the node first.
>
> Finally, I believe it is necessary to add the missing hfs_bnode_get in
> __hfs_bnode_create. This ensures that each find call holds a proper
> reference, aligning the reference counting design with that of VFS
> inodes, and effectively preventing miscounting in concurrent scenarios.
>
I agree with the fix. The patch simply needs to be more clearly explained.
> If you would like, I can rewrite the commit message and resend the
> patch. Please let me know.
>
Yes, please, the commit message needs to be improved.
Thanks,
Slava.
Thread overview: 4+ messages
2025-08-29 9:39 [PATCH] hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create Chenzhi Yang
2025-08-29 21:53 ` Viacheslav Dubeyko
2025-09-01 14:21   ` Chenzhi Yang
2025-09-02 21:57 ` Viacheslav Dubeyko [this message]