From: "Theodore Ts'o" <tytso@mit.edu>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org, 1vier1@web.de
Subject: Re: JBD2: journal transaction 6943 on loop0-8 is corrupt.
Date: Wed, 29 Dec 2021 20:37:21 -0500
Message-ID: <Yc0NUYyRhLdtapq+@mit.edu>
In-Reply-To: <baa3101d-e2f7-823e-040f-8739ab610419@colorfullife.com>
On Tue, Dec 28, 2021 at 09:36:22PM +0100, Manfred Spraul wrote:
> Hi,
>
> with simulated power failures, I see a corrupted journal
>
> [39056.200845] JBD2: journal transaction 6943 on loop0-8 is corrupt.
> [39056.200851] EXT4-fs (loop0): error loading journal
This means that the journal replay found a commit which was *not* the
last commit, and which contained a CRC error. If it had been the last
commit (i.e., there is no valid subsequent commit block), then it's
possible that the journal commit was never completed before the system
crashed --- i.e., it was an interrupted commit.
Your test is aborting the commit at various points in the write I/O
stream, so it should be simulating an interrupted commit (assuming
that it's not corrupting any I/O). So the jbd2 layer should have
understood it was the last commit in the journal, and been OK with the
checksum failure.
But what can happen is that if there is a commit block in the right
place at the end of the transaction, left over from the previous
journalling session, this can confuse the jbd2 layer into thinking
that it is *not* the last transaction, and then it will make the
"journal transaction is corrupt" report.
How does the jbd2 layer determine whether there is a valid "subsequent
commit"? It treats the subsequent commit block as valid if it meets
the following two criteria:
* the commit id is the correct, expected one (the previous
  commit id plus one).
* the commit time (seconds since January 1, 1970) in the
  commit block is greater than the commit time in the previous
  commit block.
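As a rough illustration of the two criteria above, here is a minimal
Python sketch. It is a model of the check as described in this message,
not the kernel's code; the CommitBlock class, its field names, and the
function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CommitBlock:
    """Hypothetical stand-in for the two commit block fields used here."""
    commit_id: int    # transaction sequence number
    commit_sec: int   # commit time, seconds since January 1, 1970


def looks_like_valid_next_commit(prev: CommitBlock, cand: CommitBlock) -> bool:
    """Apply the two criteria from the list above to a candidate block."""
    id_matches = cand.commit_id == prev.commit_id + 1
    time_advances = cand.commit_sec > prev.commit_sec
    return id_matches and time_advances
```

With prev = CommitBlock(6943, 1640741782), a candidate
CommitBlock(6944, 1640741790) passes both tests, while a candidate with
the right id but an older or equal commit time, or with a newer time
but the wrong id, is rejected.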
So if your test setup doesn't correctly set the time (say, it always
leaves the boot-up time at January 1, 1970), and the workload is
extremely regular, it's possible that the crash interrupted a journal
commit, but there was a left-over commit block that *looked* valid, and
it triggered the failure.
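To make the clock scenario concrete, the following Python sketch
compares commit timestamps from two journalling sessions. It only
models the timestamp test; the function names, the 5 second commit
interval, and the boot times are assumptions chosen for illustration,
not values taken from the test setup.

```python
def session_commit_times(boot_sec, interval_sec, count):
    """Commit timestamps for a session that commits every interval_sec."""
    return [boot_sec + interval_sec * (i + 1) for i in range(count)]


def stale_time_passes(prev_sec, stale_sec):
    """True if a left-over block's commit time looks newer than prev_sec."""
    return stale_sec > prev_sec


# Case 1: the clock restarts at the epoch on every boot.
earlier_epoch = session_commit_times(0, 5, 4)   # earlier session, 4 commits
current_epoch = session_commit_times(0, 5, 2)   # interrupted after 2 commits
epoch_clock_confused = stale_time_passes(current_epoch[-1], earlier_epoch[2])

# Case 2: a working clock; the second boot is an hour after the first.
first_boot = 1_640_000_000
earlier_real = session_commit_times(first_boot, 5, 4)
current_real = session_commit_times(first_boot + 3600, 5, 2)
real_clock_confused = stale_time_passes(current_real[-1], earlier_real[2])
```

In case 1 the left-over block from the earlier session carries a larger
timestamp than the last commit of the current session, so it passes the
time test. In case 2 every timestamp from the earlier session is older
than anything written after the later boot, so the left-over block is
rejected.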
If this is what happened, it's not a disaster --- the journal replay
will have correctly stopped where it should have, but it thought it
was an exceptional abort, as opposed to a normal journal replay
completion. So the "file system is corrupted" flag will be set,
forcing an fsck, but the fsck shouldn't find any problems with the
file system.
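The point that replay stops in the same place either way, and only the
error flag differs, can be sketched as follows. This is a simplified
Python model of the behaviour described above, not the jbd2 replay
code; the Txn class, its valid_commit_follows field (standing in for
the result of the subsequent-commit check), and the replay function are
hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Txn:
    """Hypothetical summary of one journal transaction seen during replay."""
    tid: int
    crc_ok: bool
    valid_commit_follows: bool = False  # a later commit block looked valid


def replay(txns: List[Txn]) -> Tuple[List[int], bool]:
    """Return (replayed transaction ids, whether an fsck is forced)."""
    replayed = []
    for txn in txns:
        if txn.crc_ok:
            replayed.append(txn.tid)
            continue
        # Checksum failure: replay stops here in both cases. The failure
        # is reported as corruption only if a valid commit seems to follow.
        return replayed, txn.valid_commit_follows
    return replayed, False
```

For a journal holding transactions 6941 and 6942 with good checksums
followed by a failing 6943, replay returns [6941, 6942] in both cases;
the second element of the result is True only when
valid_commit_follows is set on 6943.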
Does this explanation seem to fit with how your test setup is
arranged?
- Ted