linux-fsdevel.vger.kernel.org archive mirror
From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: "glaubitz@physik.fu-berlin.de" <glaubitz@physik.fu-berlin.de>,
	"yang.chenzhi@vivo.com" <yang.chenzhi@vivo.com>,
	"slava@dubeyko.com" <slava@dubeyko.com>,
	"frank.li@vivo.com" <frank.li@vivo.com>
Cc: "syzbot+005d2a9ecd9fbf525f6a@syzkaller.appspotmail.com"
	<syzbot+005d2a9ecd9fbf525f6a@syzkaller.appspotmail.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH] hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create
Date: Tue, 2 Sep 2025 21:57:50 +0000	[thread overview]
Message-ID: <7c4e1a9e21d3a5538d448344d1bba6fcc93e9ef0.camel@ibm.com> (raw)
In-Reply-To: <e2b42d21-97b5-4e25-a3b0-1d3ec23bd104@vivo.com>

On Mon, 2025-09-01 at 14:21 +0000, 杨晨志 wrote:
> Thanks for reviewing.
> 

Thanks a lot for the detailed explanation. It is helpful.

> The syzbot C reproducer creates five processes, each of which repeatedly 
> performs the following sequence of operations on a loop device: mount, 
> hard link creation, sync, unmount, and loop device reset [1]. In certain 
> cases this can result in situations where, for example, Thread A is 
> executing sync() while Thread B is concurrently performing a link operation.
> 
> > It's clear why link() could try to create a b-tree node. But it's not completely
> > clear why sync() should try to create a b-tree node? Usually, sync() should try
> > to flush already existing inodes. Especially, if we imply any system inodes that
> > are created mostly during mount operation. How is it possible that sync() could
> > try to create the node?
> 
> ```
> if (!cnid && tree->cnid == 4)
> 	dump_stack();
> ```
> 
> Add the code above to __hfs_bnode_create(), then call sync() immediately 
> after hfsplus is mounted successfully. The call trace shows that sync() 
> can create a bnode (it creates the header node, id 0).
> 
> > Moreover, node_id 0 (root node) should be already
> > created if we created the b-tree.
> > 
> 
> node_id 0 is the header node, which contains a bitmap, and node_id 1 is 
> the root node (the root node can change if the tree height increases, 
> but the initial root node ID of the catalog tree is always 1).
> 
> In fact, the root node (ID 1) is the bnode we create when initializing 
> the catalog btree. However, during the creation of the catalog tree, we 
> do not create the header node (ID 0).
> 
> hfsplus_fill_super()
>    -> hfsplus_iget(ROOT_CNID)
>      -> hfs_find_init(catalog_tree)
>        -> hfsplus_find_cat
>          -> hfs_brec_read
>            -> hfs_brec_find(tree->root)
>              -> hfs_bnode_find(tree->root)
>                -> __hfs_bnode_create(tree->root)
> 
> From my perspective, the header node is somewhat special. There are two 
> cases where we need to use it. First, when searching the bitmap to 
> allocate or free a node. Second, when reading or writing metadata in the 
> header record. In hfs_btree_open(), we do read data from the header 
> record; however, it reads this data directly from the page rather than 
> creating a header node.
> 
> ```
> 	mapping = tree->inode->i_mapping;
> 	page = read_mapping_page(mapping, 0, NULL);
> 	if (IS_ERR(page))
> 		goto free_inode;
> 
> 	/* Load the header */
> 	head = (struct hfs_btree_header_rec *)(kmap_local_page(page) +
> 		sizeof(struct hfs_bnode_desc));
> 	tree->root = be32_to_cpu(head->root);
> 
> ```
> 
> So we lose the opportunity to initialize the header node during the 
> mounting process. We can only wait until the next time we need to use 
> the header node to create it. In this scenario, it could be sync() -> 
> hfs_btree_write to write metadata back to disk, or link() -> 
> hfs_bnode_split to use the bitmap to allocate a new node.
> 
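By the way, the hfs_btree_open() path you quote can be modeled in userspace
roughly like this (the struct layouts below are simplified stand-ins for the
on-disk format, not the kernel definitions), which makes the point concrete:
the root node id is read straight out of the first page, and no bnode object
for node 0 is ever instantiated at mount time:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl() for big-endian conversion */

/* Illustrative userspace model of the header read in hfs_btree_open();
 * simplified stand-ins for the on-disk structures, not the kernel code. */
struct bnode_desc {          /* 14-byte node descriptor */
	uint32_t next, prev;
	int8_t   type;
	uint8_t  height;
	uint16_t num_recs;
	uint16_t reserved;
} __attribute__((packed));

struct btree_header_rec {    /* first fields of the header record */
	uint16_t depth;
	uint32_t root;           /* root node id, big-endian on disk */
} __attribute__((packed));

/* Read the root node id straight out of the first node's page, skipping
 * the node descriptor. No bnode object for node 0 is created here. */
uint32_t read_root_from_page(const uint8_t *page)
{
	struct btree_header_rec head;

	memcpy(&head, page + sizeof(struct bnode_desc), sizeof(head));
	return ntohl(head.root);
}
```
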
> Besides this, I found hfs_bnode_find() to be somewhat strange. 
> hfs_bnode_find() does not record the node it creates in the bitmap (e.g., 
> the header node). The recording is handled by hfs_bmap_alloc(), meaning 
> that recording and creation are separate operations, and the two functions 
> do not have a one-to-one relationship. The caller must ensure that the 
> node it wants to find is already in the bitmap. However, we do not 
> currently have such checks in place - or perhaps there is some mechanism 
> that guarantees every node passed through it is already in the bitmap?
> 

I assume that we have plenty of bugs in this logic. Currently, I am investigating
an issue in HFS related to the node bitmap management logic, and I still don't
have the complete picture of the issue:

[   34.496034][ T9845] hfs: new node 0 already hashed?
[   34.496526][ T9845] ------------[ cut here ]------------
[   34.497010][ T9845] WARNING: CPU: 2 PID: 9845 at fs/hfs/bnode.c:549
hfs_bnode_create+0x461/0x4f0
[   34.497805][ T9845] Modules linked in:
[   34.498186][ T9845] CPU: 2 UID: 0 PID: 9845 Comm: repro Not tainted 6.17.0-
rc1-00116-gd7ee5bdce789-dirty #32 PREEMPT(full) 
[   34.499175][ T9845] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX,
1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   34.500144][ T9845] RIP: 0010:hfs_bnode_create+0x461/0x4f0
[   34.500650][ T9845] Code: c1 8b 89 ee e8 e0 8a 93 fe e9 cf fc ff ff e8 66 49
24 ff 4c 89 ef e8 9e ad f4 08 48 c7 c7 20 cb cf
[   34.502377][ T9845] RSP: 0018:ffffc9000d036f20 EFLAGS: 00010246
[   34.502918][ T9845] RAX: 000000000000001f RBX: ffff888022b28000 RCX:
f65fb9590b3bdd00
[   34.503625][ T9845] RDX: 0000000000000000 RSI: 0000000080000000 RDI:
0000000000000000
[   34.504423][ T9845] RBP: 0000000000000000 R08: ffff888062924253 R09:
1ffff1100c52484a
[   34.505281][ T9845] R10: dffffc0000000000 R11: ffffed100c52484b R12:
0000000000000000
[   34.506257][ T9845] R13: ffff888022b280e0 R14: ffff888021a88a00 R15:
dffffc0000000000
[   34.507072][ T9845] FS:  00007f8a41e5c540(0000) GS:ffff8880cfce5000(0000)
knlGS:0000000000000000
[   34.508102][ T9845] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.508861][ T9845] CR2: 00007f19ac0470f8 CR3: 00000001101d8000 CR4:
00000000000006f0
[   34.509696][ T9845] Call Trace:
[   34.510067][ T9845]  <TASK>
[   34.510360][ T9845]  ? do_raw_spin_unlock+0x4d/0x210
[   34.510890][ T9845]  hfs_bmap_alloc+0x83b/0x8f0
[   34.511419][ T9845]  ? __pfx_hfs_bnode_find+0x10/0x10
[   34.511886][ T9845]  ? __pfx_hfs_bmap_alloc+0x10/0x10
[   34.512452][ T9845]  hfs_btree_inc_height+0xf6/0xac0
[   34.512937][ T9845]  ? _atomic_dec_and_lock+0xbd/0x140
[   34.513475][ T9845]  ? __pfx_folio_mark_accessed+0x10/0x10
[   34.514117][ T9845]  ? __pfx_hfs_btree_inc_height+0x10/0x10
[   34.514703][ T9845]  ? do_raw_spin_unlock+0x4d/0x210
[   34.515217][ T9845]  hfs_brec_insert+0x891/0xd40
[   34.515707][ T9845]  ? __pfx_hfs_brec_insert+0x10/0x10
[   34.516286][ T9845]  ? __kmalloc_noprof+0x2a9/0x510
[   34.516795][ T9845]  __hfs_ext_cache_extent+0x2c7/0xdc0
[   34.517394][ T9845]  hfs_ext_read_extent+0x288/0x300
[   34.517922][ T9845]  ? __pfx_hfs_ext_read_extent+0x10/0x10
[   34.518523][ T9845]  ? __pfx_clean_bdev_aliases+0x10/0x10
[   34.519071][ T9845]  hfs_extend_file+0x14e/0xee0
[   34.519570][ T9845]  ? hfs_bmap_reserve+0x3e9/0x430
[   34.520045][ T9845]  ? __pfx_hfs_extend_file+0x10/0x10
[   34.520593][ T9845]  hfs_get_block+0x3a9/0x9d0
[   34.521062][ T9845]  __block_write_begin_int+0x6c2/0x1990
[   34.521617][ T9845]  ? __pfx___folio_batch_add_and_move+0x10/0x10
[   34.522242][ T9845]  ? __pfx_hfs_get_block+0x10/0x10
[   34.522758][ T9845]  ? __pfx___block_write_begin_int+0x10/0x10
[   34.523387][ T9845]  cont_write_begin+0x6d5/0xa50
[   34.523879][ T9845]  ? __pfx_cont_write_begin+0x10/0x10
[   34.524440][ T9845]  ? __pfx___might_resched+0x10/0x10
[   34.524976][ T9845]  ? __mark_inode_dirty+0x3db/0xeb0
[   34.525490][ T9845]  ? folio_unlock+0x10c/0x170
[   34.525981][ T9845]  hfs_write_begin+0x66/0xb0
[   34.526489][ T9845]  ? __pfx_hfs_get_block+0x10/0x10
[   34.527044][ T9845]  generic_perform_write+0x2c5/0x900
[   34.527569][ T9845]  ? __pfx_generic_perform_write+0x10/0x10
[   34.528161][ T9845]  ? file_update_time+0x2ce/0x470
[   34.528733][ T9845]  ? __generic_file_write_iter+0xf9/0x230
[   34.529270][ T9845]  ? generic_file_write_iter+0x103/0x550
[   34.529907][ T9845]  generic_file_write_iter+0x117/0x550
[   34.530445][ T9845]  ? __pfx_generic_file_write_iter+0x10/0x10
[   34.531076][ T9845]  ? stack_depot_save_flags+0x40/0x890
[   34.531600][ T9845]  ? __pfx_aa_file_perm+0x10/0x10
[   34.532108][ T9845]  ? __lock_acquire+0xaec/0xd80
[   34.532619][ T9845]  ? rcu_read_lock_any_held+0xb3/0x120
[   34.533138][ T9845]  ? __pfx_rcu_read_lock_any_held+0x10/0x10
[   34.533742][ T9845]  vfs_write+0x60a/0xb80
[   34.534189][ T9845]  ? __pfx_generic_file_write_iter+0x10/0x10
[   34.534769][ T9845]  ? __pfx_vfs_write+0x10/0x10
[   34.535259][ T9845]  ? do_sys_openat2+0x154/0x1c0
[   34.535781][ T9845]  ? kmem_cache_free+0x18c/0x400
[   34.536288][ T9845]  ksys_write+0x14a/0x250
[   34.536748][ T9845]  ? __pfx_ksys_write+0x10/0x10
[   34.537207][ T9845]  ? do_syscall_64+0xb7/0x3a0
[   34.537661][ T9845]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   34.538268][ T9845]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   34.538886][ T9845]  do_syscall_64+0xf3/0x3a0
[   34.539380][ T9845]  ? exc_page_fault+0x9f/0xf0
[   34.539871][ T9845]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   34.540472][ T9845] RIP: 0033:0x7f8a41d7bfc9
[   34.540915][ T9845] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[   34.542872][ T9845] RSP: 002b:00007ffeb34ea6e8 EFLAGS: 00000202 ORIG_RAX:
0000000000000001
[   34.543642][ T9845] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
00007f8a41d7bfc9
[   34.544324][ T9845] RDX: 000000000208e24b RSI: 0000200000000080 RDI:
0000000000000005
[   34.545145][ T9845] RBP: 00007ffeb34ea700 R08: 00007ffeb34ea700 R09:
00007ffeb34ea700
[   34.545885][ T9845] R10: 00007ffeb34ea700 R11: 0000000000000202 R12:
000055a027173260
[   34.546641][ T9845] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000000
[   34.547329][ T9845]  </TASK>

And I suspect that HFS+ could have the same issue.
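
For reference, the allocation scan itself is simple to model in userspace;
the fragile part is keeping the bitmap consistent with the node hash. Below
is a sketch of the MSB-first first-clear-bit scan (an illustration only, not
the kernel code; the real hfs_bmap_alloc() also walks chained map nodes):

```c
#include <assert.h>

/* Userspace sketch of the MSB-first bitmap scan used to allocate b-tree
 * node ids. The real hfs_bmap_alloc() walks chained map nodes; this
 * collapses everything into one flat buffer for illustration. */
int bmap_alloc_bit(unsigned char *map, int bytes)
{
	for (int i = 0; i < bytes * 8; i++) {
		unsigned char bit = 0x80 >> (i % 8);

		if (!(map[i / 8] & bit)) {
			map[i / 8] |= bit;
			return i;	/* node id = bit position */
		}
	}
	return -1;			/* bitmap full */
}
```

With a first byte of 0x80 (only the header node marked), the first allocation
hands out node id 1; if the node hash already holds a node that the bitmap
claims is free, you end up in exactly the "new node X already hashed"
situation from the log above.
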

> Interestingly, since hfsplus_fill_super() needs to initialize the root 
> directory “/”, this operation must rely on the catalog_tree. As a 
> result, the first byte of the catalog_tree bitmap is 0xC0 (i.e., 
> 0b11000000), meaning that both the header node and the root node are 
> pre-marked in the bitmap so that hfs_bnode_find can later be used to 
> create the header node (although in practice, the root node is the first 
> one created).
> 
> ```
> 	/* Load the root directory */
> 	root = hfsplus_iget(sb, HFSPLUS_ROOT_CNID);  <- use catalog tree
> 	if (IS_ERR(root)) {
> 		pr_err("failed to load root directory\n");
> 		err = PTR_ERR(root);
> 		goto out_put_alloc_file;
> 	}
> 
> ```
> 
> In contrast, the extents tree is not used at all during the mounting 
> process. As a result, the first byte of the extents tree bitmap is 0x80 
> (i.e., 0b10000000), so its root node can be created later, when 
> hfs_brec_insert() performs the first insertion. At that point, the root 
> node is marked in the bitmap through hfs_bmap_alloc().
> 
> In short, during the mounting process, the extents tree initializes 
> neither the header node nor the root node.
> 
> ```
> 	if (!fd->bnode) {
> 		if (!tree->root)
> 			hfs_btree_inc_height(tree);  <- hfs_bmap_alloc
> 		node = hfs_bnode_find(tree, tree->leaf_head);
> 		if (IS_ERR(node))
> 			return PTR_ERR(node);
> 		fd->bnode = node;
> 		fd->record = -1;
> 	}
> 
> ```
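
Just to double-check the byte values above with a userspace one-liner (bits
are MSB-first: bit 0 is node 0, the header node; bit 1 is node 1, the
initial root):

```c
#include <assert.h>

/* Does the first bitmap byte mark the given node id as allocated?
 * MSB-first bit order: bit 0 = node 0 (header), bit 1 = node 1 (root). */
int node_is_marked(unsigned char byte0, int node_id)
{
	return (byte0 & (0x80 >> node_id)) != 0;
}
```

So 0xC0 marks nodes 0 and 1 (the catalog tree case), while 0x80 marks only
node 0 (the extents tree case).
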
> 
> > So, if you are fixing the syzbot reported issue, then it could be good to have
> > call trace in the comment to understand the fix environment. And it could be
> > more clear in what environment sync() is trying to create the b-tree node.
> 
> OK, I will include the call trace in the commit message in future revisions.
> 
> syzbot report call trace:[2][3]
> 
> Call Trace:
>   <TASK>
>   hfsplus_btree_write+0x379/0x7b0 fs/hfsplus/btree.c:309
>   hfsplus_system_write_inode fs/hfsplus/super.c:137 [inline]
>   hfsplus_write_inode+0x4c9/0x5f0 fs/hfsplus/super.c:163
>   write_inode fs/fs-writeback.c:1525 [inline]
>   __writeback_single_inode+0x6f4/0x1000 fs/fs-writeback.c:1745
>   writeback_sb_inodes+0x6b7/0xf60 fs/fs-writeback.c:1976
>   wb_writeback+0x43b/0xaf0 fs/fs-writeback.c:2156
>   wb_do_writeback fs/fs-writeback.c:2303 [inline]
>   wb_workfn+0x40e/0xf00 fs/fs-writeback.c:2343
>   process_one_work kernel/workqueue.c:3236 [inline]
>   process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3319
>   worker_thread+0x8a0/0xda0 kernel/workqueue.c:3400
>   kthread+0x711/0x8a0 kernel/kthread.c:463
>   ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
>   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> 
> Call Trace:
>   <TASK>
>   hfsplus_bmap_alloc+0x5a5/0x640 fs/hfsplus/btree.c:414
>   hfs_bnode_split+0xcc/0xef0 fs/hfsplus/brec.c:245
>   hfsplus_brec_insert+0x38f/0xcc0 fs/hfsplus/brec.c:100
>   hfsplus_rename_cat+0x51c/0xde0 fs/hfsplus/catalog.c:491
>   hfsplus_link+0x359/0x6a0 fs/hfsplus/dir.c:323
>   vfs_link+0x4ea/0x6e0 fs/namei.c:4854
>   do_linkat+0x272/0x560 fs/namei.c:4924
>   __do_sys_link fs/namei.c:4958 [inline]
>   __se_sys_link fs/namei.c:4956 [inline]
>   __x64_sys_link+0x82/0x90 fs/namei.c:4956
>   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>   do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> There are two different call traces, depending on which path acquires 
> the node first.
> 
> Finally, I believe it is necessary to add the missing hfs_bnode_get() in 
> __hfs_bnode_create(). This ensures that each find call holds a proper 
> reference, aligning the reference-counting design with that of VFS 
> inodes, and effectively preventing miscounting in concurrent scenarios.
> 

I agree with the fix. The patch simply needs to be more clearly explained.
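
To make the invariant concrete, here is a toy userspace model of the
lifecycle (the names only echo the kernel's; this is not the fs/hfsplus
code): every lookup that hands a bnode to a caller must also hand it a
reference, including the lookup that creates the node.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the reference-counting invariant the fix restores.
 * Illustrative only; not the fs/hfsplus implementation. */
struct bnode {
	int refcnt;
};

/* Creation path: the node enters the hash with one reference already
 * held on behalf of the creating caller. */
struct bnode *bnode_create(void)
{
	struct bnode *n = calloc(1, sizeof(*n));

	n->refcnt = 1;	/* the reference the missing hfs_bnode_get() provides */
	return n;
}

/* Lookup path: a second finder takes its own reference. */
void bnode_get(struct bnode *n)
{
	n->refcnt++;
}

/* Returns 1 when the final reference is dropped and the node is freed. */
int bnode_put(struct bnode *n)
{
	if (--n->refcnt == 0) {
		free(n);
		return 1;
	}
	return 0;
}
```

Thread A creates the node and holds one reference; thread B finds it in the
hash and takes a second via bnode_get(); each side drops its own reference
and the node is freed exactly once. Without the creator's reference, the
second put would free the node while the first user still holds it.
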

> If you would like, I can rewrite the commit message and resend the 
> patch. Please let me know.
> 

Yes, please, the commit message needs to be improved.


Thanks,
Slava.


Thread overview: 4+ messages
2025-08-29  9:39 [PATCH] hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create Chenzhi Yang
2025-08-29 21:53 ` Viacheslav Dubeyko
2025-09-01 14:21   ` 杨晨志
2025-09-02 21:57     ` Viacheslav Dubeyko [this message]
