Re: RFC: log record CRC validation

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Michael Nishimoto <miken@agami.com>
To: David Chinner <dgc@sgi.com>
Cc: markgw@sgi.com, xfs-dev <xfs-dev@sgi.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: RFC: log record CRC validation
Date: Tue, 31 Jul 2007 17:49:18 -0700	[thread overview]
Message-ID: <46AFD88E.9070403@agami.com> (raw)
In-Reply-To: <20070727065930.GT12413810@sgi.com>

David Chinner wrote:
> On Thu, Jul 26, 2007 at 06:24:51PM -0700, Michael Nishimoto wrote:
>  > The log checksum code has not been used since the
>  > development phase of xfs.  It did work at one point because I
>  > remember using it and then decided to disable it and use just
>  > the current cycle stamping technique.  The checksum code was
>  > just advisory, so I could see if it ever occurred during
>  > development.
>  >
>  > When a CRC error is found, your suggestion is correct.  Recovery
>  > should backup and process only completely good log records.  The code
>  > backs up in this same fashion when it encounters a region of
>  > missing sector updates because of the async nature of log
>  > writes and disk caches.
> 
> Yes, but that's usually only in the last 8 log-buffers worth of the
> the log that the hole exists in (i.e. 256k by default). However, if
> the tail block has a CRC error, we've got to through away the entire
> log and that, like zeroing a dirty log from xfs_repair, generally
> results in a corrupted filesystem image.
> 
> An example of where this could be a problem is reusing a just-freed
> extent. Before reusing it we force the log to get the transaction on
> disk and rely on log replay to ensure that the block is freed in the
> event of a crash. We then go and write over the contents of the
> block. If that log transaction is not replayed (that freed the
> extent) then we've overwritten the previous contents of that extent
> and so the "current" contents of the extent after log replay are wrong.
> 
> IOWs, I think that if we come across a bad CRC in a log record we
> can replay up to that point but we've still got to abort the mount
> and ask the user to run repair....
> 
>  > At this point, I'm not convinced that xfs needs to do CRCs on
>  > the xfs log because the size of an xfs log is relatively small.
> 
> Sure, but the same argument can be made about the superblock,
> or an AGF and a directory block. That doesn't mean that they'll
> never have an error.
> 
> Statistically speaking, the log contains that blocks in the
> filesystem we most frequently do I/O to, so it's the most likely
> region to see an I/O path induced bit error. If we see one
> on recovery......
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group

You are correct.  I didn't need the example -- just a reminder that
CRC errors can occur anywhere, not just in the last 8 log-buffers
worth of data. ;-)

And yes, repair is necessary now because we are no longer replaying
a transaction which we thought was committed.

I have a request.  Can this be made a mkfs.xfs option, so it can be
disabled?

What are your plans for adding CRCs to other metadata objects?

     Michael

next prev parent reply	other threads:[~2007-08-01  0:49 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20070725092445.GT12413810@sgi.com>
2007-07-25 10:14 ` RFC: log record CRC validation Mark Goodwin
2007-07-26  5:55   ` David Chinner
2007-07-26 23:01     ` Andi Kleen
2007-07-26 23:50       ` David Chinner
2007-07-26 17:53   ` Michael Nishimoto
2007-07-26 23:31     ` David Chinner
2007-07-27  1:24       ` Michael Nishimoto
2007-07-27  6:59         ` David Chinner
2007-08-01  0:49           ` Michael Nishimoto [this message]
2007-08-01  2:24             ` David Chinner
2007-08-01  2:36               ` Barry Naujok
2007-08-01  2:43                 ` David Chinner
2007-08-01 12:11               ` Andi Kleen
2007-07-28  2:00       ` William J. Earl
2007-07-28 14:03         ` Andi Kleen
2007-07-31  5:30         ` David Chinner
2007-08-01  1:32           ` William J. Earl
2007-08-01 10:02             ` David Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46AFD88E.9070403@agami.com \
    --to=miken@agami.com \
    --cc=dgc@sgi.com \
    --cc=markgw@sgi.com \
    --cc=xfs-dev@sgi.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox