RFC: adding a crc field to xfs_buf_log_format

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* RFC: adding a crc field to xfs_buf_log_format_t
@ 2008-09-23 17:28 Christoph Hellwig
  2008-09-24  1:05 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2008-09-23 17:28 UTC (permalink / raw)
  To: xfs

With adding CRC to xfs metadata structures we face an interesting
problem.  As we want all the CRCs logged we always have to log the CRC.
But this means for many modification that previous just required one
or two chunks of a buffer logged (e.g. superblock counters or changing a
single btree records) we would now need a second iovec.  One idea to
avoid this would be to put the crc into a modified version of
xfs_buf_log_format_t.  Any opinions wether this is worth the trouble?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: adding a crc field to xfs_buf_log_format_t
  2008-09-23 17:28 RFC: adding a crc field to xfs_buf_log_format_t Christoph Hellwig
@ 2008-09-24  1:05 ` Dave Chinner
  2008-09-24  7:03   ` Timothy Shimmin
  2008-09-24 17:29   ` Christoph Hellwig
  0 siblings, 2 replies; 4+ messages in thread
From: Dave Chinner @ 2008-09-24  1:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Sep 23, 2008 at 01:28:00PM -0400, Christoph Hellwig wrote:
> With adding CRC to xfs metadata structures we face an interesting
> problem.  As we want all the CRCs logged we always have to log the CRC.

What version of the CRC are you wanting to log? The one that is
currently in the buffer (i.e. the one we last wrote to disk), or a
new CRC that covers the changes we just made to the buffer?

In the first case, I can't see how having that CRC in the
transaction helps in recovery at all. Algorithmically, if all
buffers have a crc32c in them, then the buffers should CRC to zero
when you include the CRC value in the calculation. Hence during log
recovery when we read a buffer in for the first time, we simply need
to check that the buffer CRC is zero. Hence we can verify that we've
read an uncorrupted buffer regardless of it's type or location of
the crc value in the buffer.

In the second case, that means every transaction commit has to
recalculate the CRC for every buffer modified to insert them into
the transaction. That means we need to peak into the buffer type
during transaction commit to determine where the CRC is and
extract that. There's a *lot* of CPU overhead there, especially
for heavily re-logged buffers, and once again I don't think it
buys us anything because we still need to verify the CRC is
correct before we write the buffer to disk at the end of log
replay...

I note that from your previous patch set you make these comments:

>> Note that we currently do not log the crc of the block, but
>> re-created it during log recovery.  With the pending patch to
>> also checksum the log this should be safe against filesystem
>> corruption but doesn't really follow the end to end argument.

The CRC is protecting what is on disk, not what is being changed in
memory. The model for protection is "write-IO to read-IO", not
"in-memory change to in-memory change".  That is, the CRC is not
protecting every single change that is made - it is simply there to
validate what is on disk is *what we wrote*, and with the current
re-logging model of the transaction subsystem that means each update
of the CRC is an "aggregate change" of the object.

Hence I think that CRC'd log transactions are more than sufficient
to protect against corruption of the delta changes that get applied
to CRC protected objects.....

>> Also poking into the buffer to find out whether this is a btree
>> buffer during log recovery is not a very clean way to implement
>> this.

Add the type of buffer to the buffer format structure, that way we
can poke the buffer to _verify_ it's type rather than having to rely
on what came off disk. Recording that type will also enable us to
easily set up the buffer correctly for calculating the CRC at
writeback at the end of log replay....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: adding a crc field to xfs_buf_log_format_t
  2008-09-24  1:05 ` Dave Chinner
@ 2008-09-24  7:03   ` Timothy Shimmin
  2008-09-24 17:29   ` Christoph Hellwig
  1 sibling, 0 replies; 4+ messages in thread
From: Timothy Shimmin @ 2008-09-24  7:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

Dave Chinner wrote:
> On Tue, Sep 23, 2008 at 01:28:00PM -0400, Christoph Hellwig wrote:
>> With adding CRC to xfs metadata structures we face an interesting
>> problem.  As we want all the CRCs logged we always have to log the CRC.
> 
> What version of the CRC are you wanting to log? The one that is
> currently in the buffer (i.e. the one we last wrote to disk), or a
> new CRC that covers the changes we just made to the buffer?
> 
> In the first case, I can't see how having that CRC in the
> transaction helps in recovery at all. Algorithmically, if all
> buffers have a crc32c in them, then the buffers should CRC to zero
> when you include the CRC value in the calculation. Hence during log
> recovery when we read a buffer in for the first time, we simply need
> to check that the buffer CRC is zero. Hence we can verify that we've
> read an uncorrupted buffer regardless of it's type or location of
> the crc value in the buffer.
> 
> In the second case, that means every transaction commit has to
> recalculate the CRC for every buffer modified to insert them into
> the transaction. That means we need to peak into the buffer type
> during transaction commit to determine where the CRC is and
> extract that. There's a *lot* of CPU overhead there, especially
> for heavily re-logged buffers, and once again I don't think it
> buys us anything because we still need to verify the CRC is
> correct before we write the buffer to disk at the end of log
> replay...
> 
> I note that from your previous patch set you make these comments:
> 
>>> Note that we currently do not log the crc of the block, but
>>> re-created it during log recovery.  With the pending patch to
>>> also checksum the log this should be safe against filesystem
>>> corruption but doesn't really follow the end to end argument.
> 
> The CRC is protecting what is on disk, not what is being changed in
> memory. The model for protection is "write-IO to read-IO", not
> "in-memory change to in-memory change".  That is, the CRC is not
> protecting every single change that is made - it is simply there to
> validate what is on disk is *what we wrote*, and with the current
> re-logging model of the transaction subsystem that means each update
> of the CRC is an "aggregate change" of the object.
> 
> Hence I think that CRC'd log transactions are more than sufficient
> to protect against corruption of the delta changes that get applied
> to CRC protected objects.....
> 
Thanks for the clarification.
I haven't looked at the CRC of the transactions yet - need to find
that patch.
But it seems to make sense to just apply CRC's to metadata or log data
that is going to disk and keep things simple - as we are targetting
corruption of on-disk meta data by outside things.

--Tim

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: adding a crc field to xfs_buf_log_format_t
  2008-09-24  1:05 ` Dave Chinner
  2008-09-24  7:03   ` Timothy Shimmin
@ 2008-09-24 17:29   ` Christoph Hellwig
  1 sibling, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2008-09-24 17:29 UTC (permalink / raw)
  To: Christoph Hellwig, xfs

On Wed, Sep 24, 2008 at 11:05:53AM +1000, Dave Chinner wrote:
> On Tue, Sep 23, 2008 at 01:28:00PM -0400, Christoph Hellwig wrote:
> > With adding CRC to xfs metadata structures we face an interesting
> > problem.  As we want all the CRCs logged we always have to log the CRC.
> 
> What version of the CRC are you wanting to log? The one that is
> currently in the buffer (i.e. the one we last wrote to disk), or a
> new CRC that covers the changes we just made to the buffer?

The CRC of the block after the just applied changes.

> >> Note that we currently do not log the crc of the block, but
> >> re-created it during log recovery.  With the pending patch to
> >> also checksum the log this should be safe against filesystem
> >> corruption but doesn't really follow the end to end argument.
> 
> The CRC is protecting what is on disk, not what is being changed in
> memory. The model for protection is "write-IO to read-IO", not
> "in-memory change to in-memory change".  That is, the CRC is not
> protecting every single change that is made - it is simply there to
> validate what is on disk is *what we wrote*, and with the current
> re-logging model of the transaction subsystem that means each update
> of the CRC is an "aggregate change" of the object.
> 
> Hence I think that CRC'd log transactions are more than sufficient
> to protect against corruption of the delta changes that get applied
> to CRC protected objects.....

Still feeling a little un-easy about it, but I guess you're right.
Having one proper checksum for the log buffer should be enough, and
we should not worry about the end-to-end argument.  Certainly makes the
implementation a lot simpler, and the operations faster.

> >> Also poking into the buffer to find out whether this is a btree
> >> buffer during log recovery is not a very clean way to implement
> >> this.
> 
> Add the type of buffer to the buffer format structure, that way we
> can poke the buffer to _verify_ it's type rather than having to rely
> on what came off disk. Recording that type will also enable us to
> easily set up the buffer correctly for calculating the CRC at
> writeback at the end of log replay....

We can easily use blf_type for the type of buffer, that's what I had
changed the btree patch to after posting it.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-09-24 17:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-23 17:28 RFC: adding a crc field to xfs_buf_log_format_t Christoph Hellwig
2008-09-24  1:05 ` Dave Chinner
2008-09-24  7:03   ` Timothy Shimmin
2008-09-24 17:29   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox