Re: [PATCH] jbd2: avoid mount failed when commit block is partial submitted

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "yebin (H)" <yebin10@huawei.com>
To: Jan Kara <jack@suse.cz>, Theodore Ts'o <tytso@mit.edu>
Cc: <adilger.kernel@dilger.ca>, <linux-ext4@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] jbd2: avoid mount failed when commit block is partial submitted
Date: Sun, 7 Apr 2024 09:37:25 +0800	[thread overview]
Message-ID: <6611F8D5.3030403@huawei.com> (raw)
In-Reply-To: <20240403101122.rmffivvvf4a33qis@quack3>



On 2024/4/3 18:11, Jan Kara wrote:
> On Tue 02-04-24 23:37:42, Theodore Ts'o wrote:
>> On Tue, Apr 02, 2024 at 03:42:40PM +0200, Jan Kara wrote:
>>> On Tue 02-04-24 17:09:51, Ye Bin wrote:
>>>> We encountered a problem that the file system could not be mounted in
>>>> the power-off scenario. The analysis of the file system mirror shows that
>>>> only part of the data is written to the last commit block.
>>>> To solve above issue, if commit block checksum is incorrect, check the next
>>>> block if has valid magic and transaction ID. If next block hasn't valid
>>>> magic or transaction ID then just drop the last transaction ignore checksum
>>>> error. Theoretically, the transaction ID maybe occur loopback, which may cause
>>>> the mounting failure.
>>>>
>>>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>>> So this is curious. The commit block data is fully within one sector and
>>> the expectation of the journaling is that either full sector or nothing is
>>> written. So what kind of storage were you using that it breaks these
>>> expectations?
>> I suppose if the physical sector size is 512 bytes, and the file
>> system block is 4k, I suppose it's possible that on a crash, that part
>> of the 4k commit block could be written.
> I was thinking about that as well but the commit block looks like:
>
> truct commit_header {
>          __be32          h_magic;
>          __be32          h_blocktype;
>          __be32          h_sequence;
>          unsigned char   h_chksum_type;
>          unsigned char   h_chksum_size;
>          unsigned char   h_padding[2];
>          __be32          h_chksum[JBD2_CHECKSUM_BYTES];
>          __be64          h_commit_sec;
>          __be32          h_commit_nsec;
> };
>
> where JBD2_CHECKSUM_BYTES is 8. So all the data in the commit block
> including the checksum is in the first 60 bytes. Hence I would be really
> surprised if some storage can tear that...
This issue has been encountered a few times in the context of eMMC 
devices. The vendor
has confirmed that only 512-byte atomicity can be ensured in the firmware.
Although the valid data is only 60 bytes, the entire commit block is 
used for calculating
the checksum.
jbd2_commit_block_csum_verify:
...
calculated = jbd2_chksum(j, j->j_csum_seed, buf, j->j_blocksize);
...
>
> Hence either Ye Bin is running on some really exotic storage or the storage
> / CPU in fact flipped bits somewhere so that the checksum didn't match or
> the commit block was in fact not written now but it was a leftover from
> previous journal use and h_sequence happened to match. Very unlikely but
> depending on how exactly they do their powerfail testing I can imagine it
> would be possible with enough tries...
>
>> In *practice* though, this
>> is super rare.  That's because on many modern HDD's, the physical
>> sector size is 4k (because the ECC overhead is much lower), even if
>> the logical sector size is 512 byte (for Windows 98 compatibility).
>> And even on HDD's where the physical sector size is really 512 bytes,
>> the way the sectors are laid out in a serpentine fashion, it is
>> *highly* likely that 4k write won't get torn.
>>
>> And while this is *possible*, it's also possible that some kind of I/O
>> transfer error --- such as some bit flips which breaks the checksum on
>> the commit block, but also trashes the tid of the subsequent block,
>> such that your patch gets tricked into thinking that this is the
>> partial last commit, when in fact it's not the last commit, thus
>> causing the journal replay abort early.  If that's case, it's much
>> safer to force fsck to be run to detect any inconsistency that might
>> result.
> Yeah, I agree in these cases of a corrupted journal it seems dangerous to
> just try to continue without fsck based on some heuristics.
>
> 								Honza

next prev parent reply	other threads:[~2024-04-07  1:37 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-02  9:09 [PATCH] jbd2: avoid mount failed when commit block is partial submitted Ye Bin
2024-04-02 13:42 ` Jan Kara
2024-04-03  3:37   ` Theodore Ts'o
2024-04-03 10:11     ` Jan Kara
2024-04-07  1:37       ` yebin (H) [this message]
2024-04-11 13:37         ` Jan Kara
2024-04-11 14:55           ` Theodore Ts'o
2024-04-12  1:27             ` yebin (H)
2024-04-12  3:55               ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6611F8D5.3030403@huawei.com \
    --to=yebin10@huawei.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.