public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 09/21] xfs: add version 3 inode format with CRCs
Date: Wed, 3 Apr 2013 15:08:45 +1100
Message-ID: <20130403040845.GU17758@dastard>
In-Reply-To: <20130402224433.GZ22182@sgi.com>

On Tue, Apr 02, 2013 at 05:44:33PM -0500, Ben Myers wrote:
> On Wed, Mar 27, 2013 at 12:48:28PM +1100, Dave Chinner wrote:
> > On Tue, Mar 26, 2013 at 07:53:07PM -0500, Ben Myers wrote:
> > > On Wed, Mar 27, 2013 at 09:56:00AM +1100, Dave Chinner wrote:
> > > > On Fri, Mar 15, 2013 at 12:11:04PM +1100, Dave Chinner wrote:
> > > > Ben, FYI: I've taken the easy way out for this - log the entire
> > > > inode buffer rather than just the inode core. The CRC means we are
> > > > dependent on having all the inode logged so that seems to be the
> > > > simplest way to deal with this problem overall, even though it
> > > > increases the amount of metadata logged for inode creates
> > > > substantially.
> > > > 
> > > > I'll address this potential performance issue in future with new
> > > > inode create and unlink transactions that allow us to avoid logging
> > > > buffers for all inode modifications. There are other good reasons
> > > > for doing this as well (e.g. avoid the subtly broken special
> > > > handling of physical inode buffer logging vs logical inode logging
> > > > in log recovery), so I think it is best to just take the simple
> > > > option here....
> > > 
> > > It seems like this is a more general problem with fresh on-disk
> > > structures.  When we calculate crc and log only part of a buffer we are
> > > prone to the crc being incorrect after log replay because the unlogged
> > > portions of the buffer are still undefined.  They aren't the 0s we
> > > calculated crcs with.
> > 
> > But it doesn't matter for all other metadata as we don't log CRC
> > fields except in the inode/dquot at allocation. It is the exception
> > rather than the rule.
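
[A minimal sketch of the failure being discussed, not XFS code. The 512-byte
inode, the 96-byte "core", the CRC offset and the 0xa5 stale-disk fill are all
assumptions made up for illustration, and zlib.crc32 stands in for the CRC32c
that XFS uses. It shows that replaying only the logged core over undefined
disk contents breaks the CRC, while replaying the whole buffer does not.]

```python
import zlib

BUF_SIZE = 512    # hypothetical size of one on-disk inode
CORE_SIZE = 96    # hypothetical size of the logged "core" region
CRC_OFF = 4       # hypothetical offset of the 4-byte CRC field in the core


def checksum(buf):
    """CRC over the whole buffer with the CRC field treated as zero."""
    tmp = bytearray(buf)
    tmp[CRC_OFF:CRC_OFF + 4] = bytes(4)
    return zlib.crc32(bytes(tmp))


def verify(buf):
    """Compare the stored CRC against a freshly computed one."""
    stored = int.from_bytes(buf[CRC_OFF:CRC_OFF + 4], "little")
    return stored == checksum(buf)


def make_inode():
    """Initialise a zeroed in-memory buffer and stamp its CRC."""
    buf = bytearray(BUF_SIZE)
    buf[0:2] = b"IN"
    buf[CRC_OFF:CRC_OFF + 4] = checksum(buf).to_bytes(4, "little")
    return buf


def replay(disk, logged, length):
    """Log recovery model: copy only the logged prefix over the disk copy."""
    out = bytearray(disk)
    out[:length] = logged[:length]
    return out


stale_disk = bytearray(b"\xa5" * BUF_SIZE)   # never written, undefined bytes
inode = make_inode()

core_only = replay(stale_disk, inode, CORE_SIZE)   # partial logging
whole_buf = replay(stale_disk, inode, BUF_SIZE)    # log the entire buffer

print("core only :", verify(core_only))
print("whole buf :", verify(whole_buf))
```

[The core-only replay fails verification because the CRC was computed against
zeros that never reached the disk. The whole-buffer replay passes, at the cost
of logging BUF_SIZE bytes instead of CORE_SIZE.]
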
.....
> > > 2) Create a new transaction to write a known pattern over the
> > > entire buffer, then initialize the buffer with that pattern,
> > > calculate the crc, and still log only the parts of the buffer
> > > which were modified.  In the non-crash case we still need to
> > > arrange for the buffer to be patterned after the log wraps, but it
> > > has the advantage of not having to log large structures just to
> > > zero them.
> > 
> > We need to ensure we log the entire object if we are logging the CRC
> > of the object.
> 
> We don't need to log the entire object if we can arrange for the contents of
> the buffer to be a known pattern after recovery and then calculate the CRC
> against that.  It's just the initialization that is problematic.  The rest of
> the time the contents are already cached anyway.  

Right, but...

> > In this case, the initialisation and calculation of
> > the CRC needs to be atomic, so it needs to be a single transaction.
> 
> I agree that the initialisation of the block and the calculation of the crc
> must be in the same transaction.  It would need to be a new log item type that
> specifies a pattern (normally zero) and a length to be written to the buffer.
> I used the wrong terminology, as usual.
> 
> > That's what logging the entire buffer does.
> 
> Yep.  I'm just pointing out that if logging the entire structure becomes an
> issue we have some other options.

.... to do that we need a new transaction type, new flags/fields in
the xfs_buf_log_item, new handling of unlogged buffer contents that
are still tracked in the AIL, new reservations, new transaction
nesting as there are now 3 transactions needed for inode allocation,
etc. It's pretty messy, and it doesn't change the fact that we then
immediately have to relog the buffer with the initialised inode
cores. It doesn't simplify log recovery, either, and that already
has issues with buffer based inode allocation vs logical inode
logging....
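
[For reference, a toy model of the "pattern + length" log item proposed in the
quoted text. It is a sketch under stated assumptions, not an existing XFS log
format: the item tuples, the sizes and the CRC offset are hypothetical, and
zlib.crc32 stands in for CRC32c. It models only the recovery-side result and
none of the transaction, reservation or AIL handling listed above.]

```python
import zlib

BUF_SIZE, CORE_SIZE, CRC_OFF = 512, 96, 4   # hypothetical layout


def checksum(buf):
    """CRC over the whole buffer with the CRC field treated as zero."""
    tmp = bytearray(buf)
    tmp[CRC_OFF:CRC_OFF + 4] = bytes(4)
    return zlib.crc32(bytes(tmp))


def build_inode(pattern):
    """Fill the buffer with the pattern, write the core, stamp the CRC."""
    buf = bytearray([pattern]) * BUF_SIZE
    buf[:CORE_SIZE] = bytes(CORE_SIZE)
    buf[0:2] = b"IN"
    buf[CRC_OFF:CRC_OFF + 4] = checksum(buf).to_bytes(4, "little")
    return buf


def recover(disk, log):
    """Replay hypothetical log items in order over the on-disk buffer."""
    out = bytearray(disk)
    for item in log:
        if item[0] == "pattern":        # ("pattern", byte, length)
            _, byte, length = item
            out[:length] = bytes([byte]) * length
        elif item[0] == "region":       # ("region", offset, data)
            _, offset, data = item
            out[offset:offset + len(data)] = data
    return out


inode = build_inode(0x00)
log = [
    ("pattern", 0x00, BUF_SIZE),            # describes the fill, carries no payload
    ("region", 0, bytes(inode[:CORE_SIZE])),  # only the modified core is logged
]

stale_disk = bytearray(b"\xa5" * BUF_SIZE)
recovered = recover(stale_disk, log)

stored_crc = int.from_bytes(recovered[CRC_OFF:CRC_OFF + 4], "little")
payload = sum(len(item[2]) for item in log if item[0] == "region")
print("crc ok:", stored_crc == checksum(recovered), "payload bytes:", payload)
```

[Because the pattern item and the core region are replayed together, the
recovered buffer matches what the CRC was calculated against while the logged
payload stays at CORE_SIZE bytes. The in-memory and on-disk buffer still has
to hold the same pattern in the non-crash case, which this sketch does not
address.]
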

But, as I mentioned, I already have a patchset that basically does
all this for inode allocation. It doesn't initialise buffers to a
byte pattern - it initialises a contiguous extent to contain inodes,
and introduces an "ordered buffer" that is not logged but is still
tracked in the AIL to ensure that the correct behaviour occurs. That
patchset has been around for a while - the original series I wrote:

$ ls -l src/kern/patches/icreate
total 64
-rw------- 1 dave dave 10423 Dec  3  2009 xfs-icreate-factor-inode-stamping
-rw------- 1 dave dave 12304 Dec  3  2009 xfs-icreate-item
-rw------- 1 dave dave  6347 Dec  3  2009 xfs-icreate-ordered-buf-item
-rw------- 1 dave dave  2679 Dec  3  2009 xfs-icreate-remove-log-di
-rw------- 1 dave dave  4713 Dec  3  2009 xfs-icreate-use-xact
-rw------- 1 dave dave  7924 Dec  3  2009 xfs-icreate-xact-recovery
-rw------- 1 dave dave  6159 Dec  3  2009 xfs-icreate-xact-resv

That will solve the perf problem of inode initialisation and CRCs,
as well as a bunch of other problems limiting inode create
performance. It will also avoid having to log buffers for inode
creation and hence remove all the recovery coherency problems that
causes...
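
[An illustrative sketch of why a logical inode-create record avoids logging
buffers. This is not the icreate patchset: the record fields, the sizes and
the stamping layout are hypothetical, zlib.crc32 stands in for CRC32c, and
the ordered buffer and AIL tracking are not modelled at all. The point is
only that recovery can rebuild the whole extent from a small fixed-size
record.]

```python
import struct
import zlib
from dataclasses import dataclass

CRC_OFF = 4   # hypothetical offset of the CRC field in each inode


@dataclass(frozen=True)
class InodeCreateRecord:
    """Hypothetical logical log record describing an extent of new inodes."""
    start_block: int
    count: int
    inode_size: int
    generation: int

    def encode(self):
        # Fixed-size encoding: what would be written to the log.
        return struct.pack("<QIII", self.start_block, self.count,
                           self.inode_size, self.generation)


def stamp_inodes(rec):
    """Build the initial extent contents from the record alone."""
    extent = bytearray()
    for _ in range(rec.count):
        ino = bytearray(rec.inode_size)          # zeroed, CRC field included
        ino[0:2] = b"IN"
        ino[8:12] = rec.generation.to_bytes(4, "little")
        ino[CRC_OFF:CRC_OFF + 4] = zlib.crc32(bytes(ino)).to_bytes(4, "little")
        extent += ino
    return bytes(extent)


def recover(stale_extent, logged):
    """Recovery model: decode the record and re-stamp the whole extent."""
    rec = InodeCreateRecord(*struct.unpack("<QIII", logged))
    fresh = stamp_inodes(rec)
    assert len(fresh) == len(stale_extent)
    return fresh


rec = InodeCreateRecord(start_block=1024, count=64, inode_size=512,
                        generation=7)
at_runtime = stamp_inodes(rec)                 # written via the unlogged buffer
logged = rec.encode()                          # all that goes into the log
after_crash = recover(b"\xa5" * len(at_runtime), logged)

print("log bytes:", len(logged), "extent bytes:", len(at_runtime))
```

[In this model the log carries a 20-byte record instead of the 32768-byte
extent, and the recovered extent is byte-for-byte what was stamped at
runtime, CRCs included.]
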

> This could be useful for other reasons too,
> e.g. to prevent stale data exposure after a crash.

That can't actually happen as they are metadata buffers and hence
the unreferenced contents of the buffers cannot escape to
userspace....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
