All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@sun.com>
To: Girish Shilamkar <Girish.Shilamkar@sun.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>, linux-ext4@vger.kernel.org
Subject: Re: What to do when the journal checksum is incorrect
Date: Tue, 03 Jun 2008 15:27:40 -0600	[thread overview]
Message-ID: <20080603212740.GI2961@webber.adilger.int> (raw)
In-Reply-To: <1212488533.3272.23.camel@alpha.linsyssoft.com>

On Jun 03, 2008  15:52 +0530, Girish Shilamkar wrote:
> I went through the code and also re-ran the e2fsprogs tests which we had
> sent upstream for journal checksum. And found that if the transaction is
> bad it is marked as info->end_transaction which indicates a bad
> transaction and is not replayed.
> 
> if (chksum_err) {
>      info->end_transaction = next_commit_ID;
> 
> The end_transaction is set to transaction ID which is found to be
> corrupt. So #4 will be set in end_transaction and in PASS_REPLAY the
> last transaction to be replayed will be #3 due to this:
> ----------------------------------------------------------------
> if (tid_geq(next_commit_ID, info->end_transaction))
>                                 break;
> -----------------------------------------------------------------
> 
>      if (!JFS_HAS_COMPAT_FEATURE(journal,
>                                  JFS_FEATURE_INCOMPAT_ASYNC_COMMIT)){
>            printk(KERN_ERR "JBD: Transaction %u found to be corrupt.\n",
>                 next_commit_ID);
>    brelse(bh);                                 
>    break;
>      }
> } 

Girish, thanks for following up on this.  It definitely eases my
concerns about the patch we are currently using.

> > Worse yet, no indication of any problems is
> > sent back to the ext4 filesystem code.
>
> This definitely is not present and needs to be incorporated. 

Yes, it seems that skipping later transactions definitely has some
drawbacks.

We had discussed on the ext4 concall a change to the format of the journal
checksum code, having it store a small per-block checksum in addition to
the per-transaction checksum, so that in case of transaction corruption
the good blocks can be recovered and the bad ones skipped.

The proposal was to do an adler32 checksum of the block (using
the transaction number and filesystem block number at the start of
the checksum data), and then store the high (or low?) 16 bits of the
checksum in the high 16 bits of the journal_block_tag_s.t_flags field.
The full 32-bit per-block checksum would also be checksummed in order
to generate the 32-bit transaction checksum.

While a 16-bit checksum is itself is not very strong, we have the full
32-bit transaction checksum to verify the whole transaction, and we
only care about 16-bit per-block checksum in the rare case when the
transaction checksum is bad.  Having the transaction number "seed" each
of the per-block checksums avoids the risk of having an old journal tag
block in the journal with "good" checksums, and having the filesystem
block number in the checksum avoids the risk of single-bit errors in
the tag block resulting in the "good checksum" block being written to
the wrong part of the filesystem.

This new per-block checksum would be recorded as a new checksum type
in the journal superblock.  The reason for using adler32 instead of
the CRC32 we are using now is that adler32 is significantly faster
when implemented in software, and is equally robust for the blocksizes
we are using for ext3 (adler32 is not as strong with very small buffer
sizes like 128 bytes or less).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


  reply	other threads:[~2008-06-03 21:27 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-24 22:34 What to do when the journal checksum is incorrect Theodore Ts'o
2008-05-25  6:30 ` Andreas Dilger
2008-05-25 11:38   ` Theodore Tso
2008-05-26 14:54     ` Theodore Tso
2008-05-26 18:24     ` Andreas Dilger
2008-05-26 21:28       ` Ric Wheeler
2008-06-03 10:22 ` Girish Shilamkar
2008-06-03 21:27   ` Andreas Dilger [this message]
2008-06-04 23:40   ` Theodore Tso
2008-06-04 23:56     ` [PATCH] jbd2: Fix memory leak when verifying checksums in the journal Theodore Ts'o
2008-06-04 23:56       ` [PATCH] jbd2: If a journal checksum error is detected, propagate the error to ext4 Theodore Ts'o
2008-06-05  3:17         ` Andreas Dilger
2008-06-05 16:21           ` Theodore Tso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080603212740.GI2961@webber.adilger.int \
    --to=adilger@sun.com \
    --cc=Girish.Shilamkar@sun.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.