public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>, Ye Bin <yebin10@huawei.com>,
	adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] jbd2: avoid mount failed when commit block is partial submitted
Date: Wed, 3 Apr 2024 12:11:22 +0200	[thread overview]
Message-ID: <20240403101122.rmffivvvf4a33qis@quack3> (raw)
In-Reply-To: <20240403033742.GE1189142@mit.edu>

On Tue 02-04-24 23:37:42, Theodore Ts'o wrote:
> On Tue, Apr 02, 2024 at 03:42:40PM +0200, Jan Kara wrote:
> > On Tue 02-04-24 17:09:51, Ye Bin wrote:
> > > We encountered a problem that the file system could not be mounted in
> > > the power-off scenario. The analysis of the file system mirror shows that
> > > only part of the data is written to the last commit block.
> > > To solve above issue, if commit block checksum is incorrect, check the next
> > > block if has valid magic and transaction ID. If next block hasn't valid
> > > magic or transaction ID then just drop the last transaction ignore checksum
> > > error. Theoretically, the transaction ID maybe occur loopback, which may cause
> > > the mounting failure.
> > > 
> > > Signed-off-by: Ye Bin <yebin10@huawei.com>
> > 
> > So this is curious. The commit block data is fully within one sector and
> > the expectation of the journaling is that either full sector or nothing is
> > written. So what kind of storage were you using that it breaks these
> > expectations?
> 
> I suppose if the physical sector size is 512 bytes, and the file
> system block is 4k, I suppose it's possible that on a crash, that part
> of the 4k commit block could be written.

I was thinking about that as well but the commit block looks like:

truct commit_header {
        __be32          h_magic;
        __be32          h_blocktype;
        __be32          h_sequence;
        unsigned char   h_chksum_type;
        unsigned char   h_chksum_size;
        unsigned char   h_padding[2];
        __be32          h_chksum[JBD2_CHECKSUM_BYTES];
        __be64          h_commit_sec;
        __be32          h_commit_nsec;
};

where JBD2_CHECKSUM_BYTES is 8. So all the data in the commit block
including the checksum is in the first 60 bytes. Hence I would be really
surprised if some storage can tear that...

Hence either Ye Bin is running on some really exotic storage or the storage
/ CPU in fact flipped bits somewhere so that the checksum didn't match or
the commit block was in fact not written now but it was a leftover from
previous journal use and h_sequence happened to match. Very unlikely but
depending on how exactly they do their powerfail testing I can imagine it
would be possible with enough tries...

> In *practice* though, this
> is super rare.  That's because on many modern HDD's, the physical
> sector size is 4k (because the ECC overhead is much lower), even if
> the logical sector size is 512 byte (for Windows 98 compatibility).
> And even on HDD's where the physical sector size is really 512 bytes,
> the way the sectors are laid out in a serpentine fashion, it is
> *highly* likely that 4k write won't get torn.
> 
> And while this is *possible*, it's also possible that some kind of I/O
> transfer error --- such as some bit flips which breaks the checksum on
> the commit block, but also trashes the tid of the subsequent block,
> such that your patch gets tricked into thinking that this is the
> partial last commit, when in fact it's not the last commit, thus
> causing the journal replay abort early.  If that's case, it's much
> safer to force fsck to be run to detect any inconsistency that might
> result.

Yeah, I agree in these cases of a corrupted journal it seems dangerous to
just try to continue without fsck based on some heuristics.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2024-04-03 10:11 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-02  9:09 [PATCH] jbd2: avoid mount failed when commit block is partial submitted Ye Bin
2024-04-02 13:42 ` Jan Kara
2024-04-03  3:37   ` Theodore Ts'o
2024-04-03 10:11     ` Jan Kara [this message]
2024-04-07  1:37       ` yebin (H)
2024-04-11 13:37         ` Jan Kara
2024-04-11 14:55           ` Theodore Ts'o
2024-04-12  1:27             ` yebin (H)
2024-04-12  3:55               ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240403101122.rmffivvvf4a33qis@quack3 \
    --to=jack@suse.cz \
    --cc=adilger.kernel@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=yebin10@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox