Re: [syzbot] [ext4?] [ocfs2?] kernel BUG in jbd2_cleanup_journal_tail

public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed

From: Julian Sun <sunjunchao2870@gmail.com>
To: Joel Becker <jlbec@evilplan.org>, Theodore Ts'o <tytso@mit.edu>
Cc: Vinicius Peixoto <vpeixoto@lkcamp.dev>,
	 syzbot+8512f3dbd96253ffbe27@syzkaller.appspotmail.com,
	jack@suse.com,  joseph.qi@linux.alibaba.com,
	linux-ext4@vger.kernel.org,  linux-kernel@vger.kernel.org,
	mark@fasheh.com, ocfs2-devel@lists.linux.dev,
	syzkaller-bugs@googlegroups.com, ~lkcamp/discussion@lists.sr.ht
Subject: Re: [syzbot] [ext4?] [ocfs2?] kernel BUG in jbd2_cleanup_journal_tail
Date: Wed, 28 Aug 2024 20:28:32 +0800	[thread overview]
Message-ID: <80ad8de73bc8bea5d77f0802e65e970d5993c65a.camel@gmail.com> (raw)
In-Reply-To: <ZszY3SHWTp7XfS3z@google.com>

On Mon, 2024-08-26 at 12:34 -0700, Joel Becker wrote:
> On Mon, Aug 26, 2024 at 09:32:08AM -0400, Theodore Ts'o wrote:
> > On Mon, Aug 26, 2024 at 01:22:54AM -0300, Vinicius Peixoto wrote:
> > > Since the disk data is bogus, journal_reset fails with -EINVAL
> > > ("JBD2:
> > > Journal too short (blocks 2-1024)"); this leaves journal->j_head
> > > == NULL.
> > > However, jbd2_journal_load clears the JBD2_ABORT flag right
> > > before calling
> > > journal_reset. This leads to a problem later when
> > > ofcs2_mount_volume tries
> > > to flush the journal as part of the cleanup when aborting the
> > > mount
> > > operation:
> > > 
> > >   -> ofcs2_mount_volume (error; goto out_system_inodes)
> > >     -> ofcs2_journal_shutdown
> > >       -> jbd2_journal_flush
> > >         -> jbd2_cleanup_journal_tail (J_ASSERT fails)
> > > ...
> > The reason why this isn't an issue with ext4 is because the normal
> > "right" way to do this is if jbd2_journal_load() returns an error,
> > is
> > to call jbd2_journal_destroy() to release the data structures with
> > the
> > journal --- and then don't try to use the journal afterwards.
> > 
> > The weird thing is that there are two code paths in ocfs2 that
> > calls
> > jbd2_journal_load().  One is in ocfs2_replay_journal() which does
> > what
> > ext4 does.  The other is ocfs2_load_journal() which does *not* do
> > this, and this is the one which you saw in the syzkaller
> > reproducer.
> > It looks like one codepath is used to replay the ocfs2 for some
> > other
> > node, and the is to load (and presumably later, replay) the journal
> > for the mounting node.
> 
> You are correct, Ted, that one path is for the local journal and the
> other is to recover remote journals for other nodes that may have
> crashed.
> 
> I think the big ordering issue is that we set
> osb->journal->j_state=OCFS2_JOURNAL_LOADED in ocfs2_journal_init(),
> before we've attempted any replay.  Later in
> ocfs2_journal_shutdown(),
> we check this state and decide to perform cleanup.
> 
> Instead, we should not set OCFS2_JOURNAL_LOADED until
> ocfs2_journal_load() has called jbd2_journal_load().  Only then do we
> actually know we have loaded a valid journal.

Yeah, it's right. We should not set OCFS2_JOURNAL_LOADED in
ocfs2_journal_load(). Instead, we should set other flag like
OCFS2_JOURNAL_INIT, to indicate that resources have been allocated.
This way, we can clean them up properly in ocfs2_journal_shutdown(). We
should distinguish between these two states to ensure the correct exit
procedure when an error occurs, just like this patch[1] does.

[1]:
https://lore.kernel.org/ocfs2-devel/20240823083150.17590-1-sunjunchao2870@gmail.com/
> 
> Something like:
> 
> ```
>         status = jbd2_journal_load(journal->j_journal);
>         if (status < 0) {
>                 mlog(ML_ERROR, "Failed to load journal!\n");
>                 BUG_ON(!igrab(journal->j_inode));
>                 jbd2_journal_destroy(journal->j_journal);
>                 iput(journal->j_inode)
>                 goto done;
>         }
>         journal->j_state = OCFS2_JOURNAL_LOADED;
> ```
> 
> With code like this, when jbd2_journal_load() fails, the future
> ocfs2_journal_shutdown() will exit early, because
> !OCFS2_JOURNAL_LOADED.
> 
> I think this is the right spot; a quick audit of the paths (it has
> been
> a while) doesn't find any other outstanding state; the rest of
> journal
> startup, such as the commit thread etc, only happen after this.
> 
> > jbd2_journal_destroy().  It would seem like the *right* thing to do
> > is
> > to bump the refcount in ocfs2_journal_init(), and if for some
> > reason
> > the igrab fails, it can just return an error early, instead of
> > having
> > to resort to BUG_ON() later, and if you don't realize that you have
> > to
> > do this weird igrab() before calling jbd2_journal_destroy(), you'll
> > end up leaking the journal inode.
> 
> There are interactions of journal inodes for nodes we don't own, and
> also connections to cluster locks for our own journal (don't replay
> ourselves while another node has locked it and is recovering us).  So
> we
> do have some state to keep track of.  But it's been so long that I
> don't
> recall if there was a specific reason we do this late via igrab(), or
> if
> it's just that we should have done as you describe and missed it.
> You'll note that I copied the igrab/iput game in my snippet above.
> 
> Should someone try to audit the igrab/iput thing later?  Yes.  But
> it's
> not a necessary part of this fix.
> 
> Thanks,
> Joel
> 

Thanks,
-- 
Julian Sun <sunjunchao2870@gmail.com>

next prev parent reply	other threads:[~2024-08-28 12:28 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-23  0:00 [syzbot] [ext4?] [ocfs2?] kernel BUG in jbd2_cleanup_journal_tail syzbot
2024-08-26  4:22 ` Vinicius Peixoto
2024-08-26 13:32   ` Theodore Ts'o
2024-08-26 19:34     ` Joel Becker
2024-08-28 12:28       ` Julian Sun [this message]
2024-08-28 10:54     ` Jan Kara
2024-08-28 11:55       ` Julian Sun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=80ad8de73bc8bea5d77f0802e65e970d5993c65a.camel@gmail.com \
    --to=sunjunchao2870@gmail.com \
    --cc=jack@suse.com \
    --cc=jlbec@evilplan.org \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark@fasheh.com \
    --cc=ocfs2-devel@lists.linux.dev \
    --cc=syzbot+8512f3dbd96253ffbe27@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tytso@mit.edu \
    --cc=vpeixoto@lkcamp.dev \
    --cc=~lkcamp/discussion@lists.sr.ht \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox