Re: JBD2: journal transaction 6943 on loop0-8 is corrupt.

public inbox for linux-ext4@vger.kernel.org

From: "Theodore Ts'o" <tytso@mit.edu>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org, 1vier1@web.de
Subject: Re: JBD2: journal transaction 6943 on loop0-8 is corrupt.
Date: Wed, 29 Dec 2021 20:37:21 -0500
Message-ID: <Yc0NUYyRhLdtapq+@mit.edu>
In-Reply-To: <baa3101d-e2f7-823e-040f-8739ab610419@colorfullife.com>

On Tue, Dec 28, 2021 at 09:36:22PM +0100, Manfred Spraul wrote:
> Hi,
> 
> with simulated power failures, I see a corrupted journal
> 
> [39056.200845] JBD2: journal transaction 6943 on loop0-8 is corrupt.
> [39056.200851] EXT4-fs (loop0): error loading journal

This means that the journal replay found a commit which was *not* the
last commit, and which contained a CRC error.  If it's the last commit
(i.e., there is no valid subsequent commit block), then it's possible
that the journal commit was never completed before the system crashed
--- i.e., it was an interrupted commit.

Your test is aborting the commit at various points in the write I/O
stream, so it should be simulating an interrupted commit (assuming
that it's not corrupting any I/O).  So the jbd2 layer should have
understood it was the last commit in the journal, and been OK with the
checksum failure.

But what can happen is that if there is a commit block in the right
place at the end of the transaction, left over from the previous
journalling session, this can confuse the jbd2 layer into thinking
that it is *not* the last transaction, and then it will make the
"journal transaction is corrupt" report.

How does the jbd2 layer determine whether there is a valid "subsequent
commit"?  It treats the subsequent commit block as valid if it meets
the following two criteria:

	* the commit id is the correct, expected one (n+1, where n is
	  the previous commit id).
	* the commit time (seconds since January 1, 1970) in the
	  commit block is greater than the commit time in the previous
	  commit block.
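
As a rough illustration of those two criteria, here is a minimal
Python sketch.  It is a simplified model of the description above, not
the kernel's recovery code; the CommitBlock type, its field names, and
the function name are hypothetical, and the comparison follows the
wording above rather than the exact test in the jbd2 source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitBlock:
    """Hypothetical, simplified stand-in for an on-disk commit block."""
    commit_id: int   # transaction sequence number
    commit_sec: int  # commit time, seconds since January 1, 1970


def looks_like_subsequent_commit(prev: CommitBlock, cand: CommitBlock) -> bool:
    """Apply the two criteria described above to a candidate block."""
    id_is_expected = cand.commit_id == prev.commit_id + 1
    time_moves_forward = cand.commit_sec > prev.commit_sec
    return id_is_expected and time_moves_forward


if __name__ == "__main__":
    prev = CommitBlock(commit_id=6942, commit_sec=1_640_000_000)
    # Expected id and a later time: accepted as a subsequent commit.
    print(looks_like_subsequent_commit(prev, CommitBlock(6943, 1_640_000_005)))
    # Expected id but an older time: rejected.
    print(looks_like_subsequent_commit(prev, CommitBlock(6943, 1_639_999_000)))
    # Later time but the wrong id: rejected.
    print(looks_like_subsequent_commit(prev, CommitBlock(6950, 1_640_000_005)))
```

Both conditions have to hold, so a stale block is normally rejected by
one test or the other.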

So if your test setup doesn't correctly set the time (say, it always
boots with the clock set to January 1, 1970), and the workload is
extremely regular, it's possible that the replay interrupted a journal
commit, but there was a left-over commit block that *looked* valid, and
it triggered the failure.
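
To make that scenario concrete, here is a small Python sketch with
invented numbers.  It only models the reasoning in this message: the
function name, its parameters, and the "corrupt"/"interrupted" labels
are hypothetical, and none of this is taken from the jbd2 source.

```python
from typing import Optional, Tuple


def classify_checksum_failure(
    failed_id: int,
    prev_commit_sec: int,
    following: Optional[Tuple[int, int]],
) -> str:
    """Classify a checksum failure in transaction `failed_id`.

    `following` is the (commit_id, commit_sec) pair found where the next
    commit block would be, or None if nothing usable is there.
    """
    if following is not None:
        next_id, next_sec = following
        if next_id == failed_id + 1 and next_sec > prev_commit_sec:
            # A valid-looking later commit exists, so the failed
            # transaction is not treated as the last one.
            return "corrupt"
    # No valid later commit: treat it as an interrupted final commit.
    return "interrupted"


if __name__ == "__main__":
    # Clock restarts at the epoch on every boot (all values invented).
    # The current session committed at 40 seconds; the left-over block
    # from the previous session was written at 95 seconds.
    print(classify_checksum_failure(6943, 40, (6944, 95)))

    # With a working clock, the left-over block is older than the
    # current session's commits, so it is ignored.
    print(classify_checksum_failure(
        6943, 1_640_000_040, (6944, 1_639_990_095)))

    # Nothing where the next commit block would be.
    print(classify_checksum_failure(6943, 40, None))
```

In the first case the left-over block satisfies both criteria only
because the two sessions' timestamps overlap.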

If this is what happened, it's not a disaster --- the journal replay
will have correctly stopped where it should have, but it thought it
was an exceptional abort, as opposed to a normal journal replay
completion.  So the "file system is corrupted" flag will be set,
forcing an fsck, but the fsck shouldn't find any problems with the
file system.

Does this explanation seem to fit with how your test setup is
arranged?

     	  	      	      	       - Ted

Thread overview: 3+ messages
2021-12-28 20:36 JBD2: journal transaction 6943 on loop0-8 is corrupt Manfred Spraul
2021-12-30  1:37 ` Theodore Ts'o [this message]
2021-12-30  8:16   ` Manfred Spraul
