The Linux Kernel Mailing List
From: Joseph Qi <joseph.qi@linux.alibaba.com>
To: ZhengYuan Huang <gality369@gmail.com>
Cc: ocfs2-devel@lists.linux.dev, linux-kernel@vger.kernel.org,
	baijiaju1990@gmail.com, r33s3n6@gmail.com, zzzccc427@gmail.com,
	Mark Fasheh <mark@fasheh.com>, Joel Becker <jlbec@evilplan.org>
Subject: Re: [PATCH] ocfs2: revalidate the journal dinode before toggling dirty
Date: Mon, 11 May 2026 14:15:04 +0800
Message-ID: <80f6acd3-f80a-4b7a-abdb-6723df92d867@linux.alibaba.com>
In-Reply-To: <CAOmEq9Wv+v_rb+FRKYJxvr4V6e9nj49zN-4tM+82=_Ei82PN2w@mail.gmail.com>



On 5/11/26 10:58 AM, ZhengYuan Huang wrote:
> On Sun, May 10, 2026 at 12:02 PM Joseph Qi <joseph.qi@linux.alibaba.com> wrote:
>>
>>
>>
>> On 5/9/26 9:52 PM, ZhengYuan Huang wrote:
>>> [BUG]
>>> A fuzzed OCFS2 image can corrupt the current slot journal dinode while
>>> mount is still in progress. The mount path first reports the invalid
>>> journal block and then crashes in shutdown:
>>>
>>> kernel BUG at fs/ocfs2/journal.c:1034!
>>> Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
>>> RIP: 0010:ocfs2_journal_toggle_dirty+0x2d6/0x340 fs/ocfs2/journal.c:1034
>>> Call Trace:
>>>  ocfs2_journal_shutdown+0x414/0xc30 fs/ocfs2/journal.c:1116
>>>  ocfs2_mount_volume fs/ocfs2/super.c:1785 [inline]
>>>  ocfs2_fill_super+0x30a9/0x3cd0 fs/ocfs2/super.c:1083
>>>  get_tree_bdev_flags+0x38b/0x640 fs/super.c:1698
>>>  get_tree_bdev+0x24/0x40 fs/super.c:1721
>>>  ocfs2_get_tree+0x21/0x30 fs/ocfs2/super.c:1184
>>>  vfs_get_tree+0x9a/0x370 fs/super.c:1758
>>>  fc_mount fs/namespace.c:1199 [inline]
>>>  do_new_mount_fc fs/namespace.c:3642 [inline]
>>>  do_new_mount fs/namespace.c:3718 [inline]
>>>  path_mount+0x5b8/0x1ea0 fs/namespace.c:4028
>>>  do_mount fs/namespace.c:4041 [inline]
>>>  __do_sys_mount fs/namespace.c:4229 [inline]
>>>  __se_sys_mount fs/namespace.c:4206 [inline]
>>>  __x64_sys_mount+0x282/0x320 fs/namespace.c:4206
>>>  ...
>>>
>>>
>>> [CAUSE]
>>> ocfs2_journal_toggle_dirty() assumes journal->j_bh still contains the
>>> same validated dinode that ocfs2_journal_init() locked earlier, and it
>>> uses BUG_ON() when the buffer no longer looks like a dinode. That
>>> assumption is too strong. The mount path can force the same current-slot
>>> journal inode block back in from disk through
>>> ocfs2_read_journal_inode(..., OCFS2_BH_IGNORE_CACHE) while
>>> ocfs2_mark_dead_nodes() scans the journal slots. If that reread finds
>>> corrupted metadata, mount unwinds through ocfs2_journal_shutdown(),
>>> which reuses journal->j_bh and turns the metadata corruption into a
>>> kernel BUG.
>>>
>>
>> I'm a bit confused.
>> The journal dinode is validated first, which means the image has been
>> checked. With the mount still in progress, how can it become corrupted
>> at runtime?
>>
>> Thanks,
>> Joseph
> 
> Thanks for taking a look.
> 
> Yes, the journal dinode is validated when it is first initialized. My
> concern is that later in the mount path, the same journal inode block
> may be read again from disk with OCFS2_BH_IGNORE_CACHE, so the buffer
> used by ocfs2_journal_shutdown() may no longer be the same validated
> contents.
> 
After the validation in ocfs2_journal_init(), the in-memory copy won't
spontaneously become invalid.

And if it is broken by a re-write (e.g. recovery), this is a bug in the
re-write flow, and we have to fix that flow itself.


> This does not mean the filesystem itself corrupts the block during
> mount. Rather, after the initial validation and before the later use,
> the block contents may change due to unexpected disk corruption, I/O
> problems, or a forced reread of corrupted on-disk metadata. In that
> case, ocfs2_journal_toggle_dirty() should not rely only on the earlier
> validation.
> 
ocfs2_validate_inode_block() is a bit heavy. So if we want to avoid a
BUG_ON in case of unexpected disk corruption (which is still a bit
strange: it is fine in init and then suddenly goes bad...), a simpler
alternative would be to just replace the BUG_ON with WARN_ON and return
an error.
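
For illustration only, the WARN_ON-and-return-error pattern could look
roughly like the userspace sketch below. This is not the ocfs2 code and
not a proposed patch: every sketch_* / SKETCH_* name is a hypothetical
stand-in, the SKETCH_WARN_ON macro only imitates the kernel's WARN_ON()
by printing to stderr, and -EIO is an assumed error code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the real ocfs2 definitions. */
#define SKETCH_INODE_SIGNATURE	"INODE01"
#define SKETCH_JOURNAL_DIRTY_FL	0x1u

struct sketch_dinode {
	char signature[8];
	unsigned int flags;
};

/*
 * Userspace imitation of WARN_ON(): report the failed condition and
 * evaluate to 1 so the caller can bail out, instead of aborting the
 * way BUG_ON() would.
 */
#define SKETCH_WARN_ON(cond) \
	((cond) ? (fprintf(stderr, "WARNING: %s at %s:%d\n", \
			   #cond, __FILE__, __LINE__), 1) : 0)

/* Cheap signature-only check, much lighter than a full block validation. */
static int sketch_is_valid_dinode(const struct sketch_dinode *di)
{
	return strncmp(di->signature, SKETCH_INODE_SIGNATURE,
		       sizeof(di->signature)) == 0;
}

/*
 * Set or clear the dirty flag. If the buffer no longer looks like a
 * dinode, warn and return an error (assumed here to be -EIO) without
 * touching the flags.
 */
static int sketch_toggle_dirty(struct sketch_dinode *di, int dirty)
{
	if (SKETCH_WARN_ON(!sketch_is_valid_dinode(di)))
		return -EIO;

	if (dirty)
		di->flags |= SKETCH_JOURNAL_DIRTY_FL;
	else
		di->flags &= ~SKETCH_JOURNAL_DIRTY_FL;

	return 0;
}
```

With this shape, a caller on the shutdown path would see a negative
return value for a bad buffer and could log it and continue unwinding,
rather than hitting a kernel BUG.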

Thanks,
Joseph

> Since this is a cold mount/shutdown error path, adding this extra
> validation should not have a meaningful performance impact. I see it
> as a small robustness improvement to avoid turning bad metadata into a
> kernel BUG.
> 


Thread overview: 5+ messages
2026-05-09 13:52 [PATCH] ocfs2: revalidate the journal dinode before toggling dirty ZhengYuan Huang
2026-05-10  4:02 ` Joseph Qi
2026-05-11  2:58   ` ZhengYuan Huang
2026-05-11  6:15     ` Joseph Qi [this message]
2026-05-12  2:45       ` ZhengYuan Huang
