From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 16/22] xfs: add CRCs to attr leaf blocks
Date: Wed, 24 Apr 2013 11:17:21 +1000 [thread overview]
Message-ID: <20130424011721.GP10481@dastard> (raw)
In-Reply-To: <20130423230245.GC29359@sgi.com>
On Tue, Apr 23, 2013 at 06:02:45PM -0500, Ben Myers wrote:
> Hi Dave,
>
> On Wed, Apr 03, 2013 at 04:11:26PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
> Comments below.
....
> > + if (ichdr.count == 0)
> > + return false;
> > +
> > + /* XXX: need to range check rest of attr header values */
> > + /* XXX: hash order check? */
>
> Seems reasonable for debug code. Is that coming along in a subsequent patch?
Same as for the last patch - not in this patch set, but in future
work.
> > int
> > -xfs_attr_leaf_to_shortform(
> > - struct xfs_buf *bp,
> > - xfs_da_args_t *args,
> > - int forkoff)
> > +xfs_attr3_leaf_to_shortform(
> > + struct xfs_buf *bp,
> > + struct xfs_da_args *args,
> > + int forkoff)
> > {
> > - xfs_attr_leafblock_t *leaf;
> > - xfs_attr_leaf_entry_t *entry;
> > - xfs_attr_leaf_name_local_t *name_loc;
> > - xfs_da_args_t nargs;
> > - xfs_inode_t *dp;
> > - char *tmpbuffer;
> > - int error, i;
> > + struct xfs_attr_leafblock *leaf;
> > + struct xfs_attr3_icleaf_hdr ichdr;
> > + struct xfs_attr_leaf_entry *entry;
> > + struct xfs_attr_leaf_name_local *name_loc;
> > + struct xfs_da_args nargs;
> > + struct xfs_inode *dp = args->dp;
> > + char *tmpbuffer;
> > + int error;
> > + int i;
> >
> > trace_xfs_attr_leaf_to_sf(args);
> >
> > - dp = args->dp;
> > tmpbuffer = kmem_alloc(XFS_LBSIZE(dp->i_mount), KM_SLEEP);
> > - ASSERT(tmpbuffer != NULL);
> > + if (!tmpbuffer)
> > + return ENOMEM;
> >
> > - ASSERT(bp != NULL);
> > memcpy(tmpbuffer, bp->b_addr, XFS_LBSIZE(dp->i_mount));
> > +
> > leaf = (xfs_attr_leafblock_t *)tmpbuffer;
> > - ASSERT(leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC));
> > + xfs_attr3_leaf_hdr_from_disk(&ichdr, leaf);
> > + entry = xfs_attr3_leaf_entryp(leaf);
> > +
> > + /* XXX (dgc): buffer is about to be marked stale - why zero it? */
> > memset(bp->b_addr, 0, XFS_LBSIZE(dp->i_mount));
>
> Good point. Why do we need a temporary buffer here? It seems like we could
> keep the leaf around long enough to copy from it directly into shortform.
Right. I suspect that this dates back to a long time ago when we
didn't have busy extents preventing stale metadata exposure to
userspace and attributes used sync IO to avoid problems. There are
other grotty bits in the attribute code (e.g. remote attrs are
written synchronously rather than logged), so I'd say this is where
it came from. It's another thing that we can clean up later...
> > @@ -963,52 +1120,62 @@ out:
> > * or a leaf in a node attribute list.
> > */
> > STATIC int
> > -xfs_attr_leaf_create(
> > - xfs_da_args_t *args,
> > - xfs_dablk_t blkno,
> > - struct xfs_buf **bpp)
> > +xfs_attr3_leaf_create(
> > + struct xfs_da_args *args,
> > + xfs_dablk_t blkno,
> > + struct xfs_buf **bpp)
> > {
> > - xfs_attr_leafblock_t *leaf;
> > - xfs_attr_leaf_hdr_t *hdr;
> > - xfs_inode_t *dp;
> > - struct xfs_buf *bp;
> > - int error;
> > + struct xfs_attr_leafblock *leaf;
> > + struct xfs_attr3_icleaf_hdr ichdr;
> > + struct xfs_inode *dp = args->dp;
> > + struct xfs_mount *mp = dp->i_mount;
> > + struct xfs_buf *bp;
> > + int error;
> >
> > trace_xfs_attr_leaf_create(args);
> >
> > - dp = args->dp;
> > - ASSERT(dp != NULL);
> > error = xfs_da_get_buf(args->trans, args->dp, blkno, -1, &bp,
> > XFS_ATTR_FORK);
> > if (error)
> > - return(error);
> > - bp->b_ops = &xfs_attr_leaf_buf_ops;
> > + return error;
> > + bp->b_ops = &xfs_attr3_leaf_buf_ops;
> > leaf = bp->b_addr;
> > - memset((char *)leaf, 0, XFS_LBSIZE(dp->i_mount));
> > - hdr = &leaf->hdr;
> > - hdr->info.magic = cpu_to_be16(XFS_ATTR_LEAF_MAGIC);
> > - hdr->firstused = cpu_to_be16(XFS_LBSIZE(dp->i_mount));
> > - if (!hdr->firstused) {
> > - hdr->firstused = cpu_to_be16(
> > - XFS_LBSIZE(dp->i_mount) - XFS_ATTR_LEAF_NAME_ALIGN);
> > - }
>
> Saw this several times in review. I don't understand why that code would be
> there, given that we just set firstused to be XFS_LBSIZE, the blocksize of the
> filesystem. It's dead code now, so it's fine to remove... There must be some
> history there. Any idea?
It's in the initial commit for the attribute code way back in 1995.
It didn't make sense then, either, because there's code right around
it such as memory allocations that use XFS_LBSIZE(dp->i_mount) as
the size of the buffer to allocate. Hence I think it's been dead
code ever since it was written...
> > @@ -1113,82 +1279,90 @@ xfs_attr_leaf_add(
> > * and we don't have enough freespace, then compaction will do us
> > * no good and we should just give up.
> > */
> > - if (!hdr->holes && (sum < entsize))
> > - return(XFS_ERROR(ENOSPC));
> > + if (!ichdr.holes && sum < entsize)
> > + return XFS_ERROR(ENOSPC);
> >
> > /*
> > * Compact the entries to coalesce free space.
> > * This may change the hdr->count via dropping INCOMPLETE entries.
> > */
> > - xfs_attr_leaf_compact(args, bp);
> > + xfs_attr3_leaf_compact(args, &ichdr, bp);
> >
> > /*
> > * After compaction, the block is guaranteed to have only one
> > * free region, in freemap[0]. If it is not big enough, give up.
> > */
> > - if (be16_to_cpu(hdr->freemap[0].size)
> > - < (entsize + sizeof(xfs_attr_leaf_entry_t)))
> > - return(XFS_ERROR(ENOSPC));
> > + if (ichdr.freemap[0].size < (entsize + sizeof(xfs_attr_leaf_entry_t))) {
> > + tmp = ENOSPC;
> > + goto out_log_hdr;
>
> That represents a change in behavior that I'm not sure about... Before if we
> had ENOSPC here we would simply return and xfs_addr3_leaf_add_work would not
> have a chance to log the header. Now...
The difference is that hte header used to be logged in
xfs_attr_leaf_compact(). Now it is not logged in
xfs_attr3_leaf_compact() because the in-core header is modified.
Hence on an enospc error we now have to write back and log the
header. Hence there is no change in behaviour here...
>
> > + }
> > +
> > + tmp = xfs_attr3_leaf_add_work(bp, &ichdr, args, 0);
> >
> > - return(xfs_attr_leaf_add_work(bp, args, 0));
> > +out_log_hdr:
> > + xfs_attr3_leaf_hdr_to_disk(leaf, &ichdr);
> > + xfs_trans_log_buf(args->trans, bp,
> > + XFS_DA_LOGRANGE(leaf, &leaf->hdr,
> > + xfs_attr3_leaf_hdr_size(leaf)));
>
> We're logging the header here instead of in xfs_attr_leaf_add_work
Right, for the same reason as the above enospc branch - we passed
the ichdr to xfs_attr3_leaf_compact() so the header modifications
are made to the ichdr, not the buffer. The same is done here,
because it means we don't repeatedly convert to/from ichdr and
on-disk header and log it every time. Instead, we read it from disk
once in this function and pass the ichdr around to collect all the
modifications and then write it back to the disk buffer once and log
it only once....
> > + return tmp;
> > }
>
> and then returning. I think this would be better with a second goto so that we
> don't log the header in keeping with the old behavior.
The way it is done with these modifications gives an identical dirty
range on the buffer as the old code; it is just more efficient as we
do a lot less endian swapping and logging calls...
> > for (i = 0; i < XFS_ATTR_LEAF_MAPSIZE; map++, i++) {
>
> Looks like you can get rid of map.
Good catch. Fixed.
> > @@ -1286,34 +1457,69 @@ xfs_attr_leaf_compact(
> > */
> > leaf_s = (xfs_attr_leafblock_t *)tmpbuffer;
> > leaf_d = bp->b_addr;
> > - hdr_s = &leaf_s->hdr;
> > - hdr_d = &leaf_d->hdr;
> > - hdr_d->info = hdr_s->info; /* struct copy */
> > - hdr_d->firstused = cpu_to_be16(XFS_LBSIZE(mp));
> > - /* handle truncation gracefully */
> > - if (!hdr_d->firstused) {
> > - hdr_d->firstused = cpu_to_be16(
> > - XFS_LBSIZE(mp) - XFS_ATTR_LEAF_NAME_ALIGN);
> > - }
>
> Another one. Weird.
>
> > - hdr_d->usedbytes = 0;
> > - hdr_d->count = 0;
> > - hdr_d->holes = 0;
> > - hdr_d->freemap[0].base = cpu_to_be16(sizeof(xfs_attr_leaf_hdr_t));
> > - hdr_d->freemap[0].size = cpu_to_be16(be16_to_cpu(hdr_d->firstused) -
> > - sizeof(xfs_attr_leaf_hdr_t));
> > + ichdr_s = *ichdr_d; /* struct copy */
>
> ichdr_s should be initialized from the copy we have in the temporary buffer
> rather than from a struct copy of ichdr_d. I understand that what you have
> here probably works fine, but it is not very obvious, or clear that it was
> an intentional deviation from the struct copy up above.
I disagree: the canonical source of the ichdr information being
passed in is in ichdr_d, not in the on disk buffer. That is the
reason ichdr_d is passed into the function - because modifications
to the ichdr may have been made and not written back to the backing
buffer before the function was called.
If you check the code above and the various explanations of it,
you'll see that this is the that is the exact situation in which
xfs_attr3_leaf_compact() is called. Hence reading ichdr_s out of the
on-disk buffer is the wrong thing to do and would likely result in
corruptions caused by compacting blocks...
> > + xfs_attr3_leaf_moveents(leaf_s, &ichdr_s, 0, leaf_d, ichdr_d, 0,
> > + ichdr_s.count, mp);
> > + /*
> > + * this logs the entire buffer, but the caller must write the header
> > + * back to the buffer when it is finished modifying it.
> > + */
> > xfs_trans_log_buf(trans, bp, 0, XFS_LBSIZE(mp) - 1);
>
> Maybe better to just log the entries here, someday.
*nod*
>
> > @@ -2201,45 +2425,41 @@ xfs_attr_leaf_moveents(xfs_attr_leafblock_t *leaf_s, int start_s,
> > /*
> > * Set up environment.
> > */
> > - ASSERT(leaf_s->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC));
> > - ASSERT(leaf_d->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC));
> > - hdr_s = &leaf_s->hdr;
> > - hdr_d = &leaf_d->hdr;
> > - ASSERT((be16_to_cpu(hdr_s->count) > 0) &&
> > - (be16_to_cpu(hdr_s->count) < (XFS_LBSIZE(mp)/8)));
> > - ASSERT(be16_to_cpu(hdr_s->firstused) >=
> > - ((be16_to_cpu(hdr_s->count)
> > - * sizeof(*entry_s))+sizeof(*hdr_s)));
> > - ASSERT(be16_to_cpu(hdr_d->count) < (XFS_LBSIZE(mp)/8));
> > - ASSERT(be16_to_cpu(hdr_d->firstused) >=
> > - ((be16_to_cpu(hdr_d->count)
> > - * sizeof(*entry_d))+sizeof(*hdr_d)));
> > -
> > - ASSERT(start_s < be16_to_cpu(hdr_s->count));
> > - ASSERT(start_d <= be16_to_cpu(hdr_d->count));
> > - ASSERT(count <= be16_to_cpu(hdr_s->count));
> > + ASSERT(ichdr_s->magic == XFS_ATTR_LEAF_MAGIC ||
> > + ichdr_s->magic == XFS_ATTR3_LEAF_MAGIC);
>
> This assert might be unnecessary due to the asserts in hdr_from_disk.
might be, but given the number of interesting callers of this
function I decided that specifically ensuring the ichdr magic
numbers were valid was a good idea....
>
> > @@ -2286,71 +2504,40 @@ xfs_attr_leaf_moveents(xfs_attr_leafblock_t *leaf_s, int start_s,
> > /*
> > * Zero out the entries we just copied.
> > */
> > - if (start_s == be16_to_cpu(hdr_s->count)) {
> > + if (start_s == ichdr_s->count) {
> > tmp = count * sizeof(xfs_attr_leaf_entry_t);
> > - entry_s = &leaf_s->entries[start_s];
> > + entry_s = &xfs_attr3_leaf_entryp(leaf_s)[start_s];
> > ASSERT(((char *)entry_s + tmp) <=
> > ((char *)leaf_s + XFS_LBSIZE(mp)));
> > - memset((char *)entry_s, 0, tmp);
> > + memset(entry_s, 0, tmp);
> > } else {
> > /*
> > * Move the remaining entries down to fill the hole,
> > * then zero the entries at the top.
> > */
> > - tmp = be16_to_cpu(hdr_s->count) - count;
> > - tmp *= sizeof(xfs_attr_leaf_entry_t);
> > - entry_s = &leaf_s->entries[start_s + count];
> > - entry_d = &leaf_s->entries[start_s];
> > - memmove((char *)entry_d, (char *)entry_s, tmp);
> > + tmp = (ichdr_s->count - count) - sizeof(xfs_attr_leaf_entry_t);
> *
> Multiply.
Fine catch. Fixed.
>
> > @@ -2379,20 +2567,21 @@ xfs_attr_leaf_lasthash(
> > STATIC int
> > xfs_attr_leaf_entsize(xfs_attr_leafblock_t *leaf, int index)
> > {
> > + struct xfs_attr_leaf_entry *entries;
> > xfs_attr_leaf_name_local_t *name_loc;
> > xfs_attr_leaf_name_remote_t *name_rmt;
> > int size;
> >
> > - ASSERT(leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC));
>
> I don't see a replacement for this assert.
It's only ever called from places that have already validated the
magic numbers.
> > @@ -2786,38 +3003,44 @@ xfs_attr_root_inactive(xfs_trans_t **trans, xfs_inode_t *dp)
> > */
> > error = xfs_da3_node_read(*trans, dp, 0, -1, &bp, XFS_ATTR_FORK);
> > if (error)
> > - return(error);
> > - blkno = XFS_BUF_ADDR(bp);
> > + return error;
> > + blkno = bp->b_bn;
>
> I'm a little concerned about the switch from XFS_BUF_ADDR to bp->b_bn here, but
> have not looked into it yet. If you have a quick explanation, that's great.
> Otherwise I'll come back to it.
XFS_BUF_ADDR should die. bp->b_bn is the block number of the buffer
that is returned, and the attribute code does not deal with
discontiguous buffers so it should always be the correct block
number for further operations...
> > @@ -1479,7 +1480,9 @@ xfs_da3_node_lookup_int(
> > curr = blk->bp->b_addr;
> > blk->magic = be16_to_cpu(curr->magic);
> >
> > - if (blk->magic == XFS_ATTR_LEAF_MAGIC) {
> > + if (blk->magic == XFS_ATTR_LEAF_MAGIC ||
> > + blk->magic == XFS_ATTR3_LEAF_MAGIC) {
> > + blk->magic = XFS_ATTR_LEAF_MAGIC;
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> I'm still a little confused by these assignments. I understand you're
> squashing them down from a comment, and that the correct magic is still in the
> buffer... I need to find who is looking at blk->magic. Any hints?
It's basically an entry in the cursor passed through all the
xfs_da_btree.c code - node/leaf/attr/dir types matter to the code
that uses it, not the physical encoding of the block on disk....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2013-04-24 1:18 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-03 5:11 [PATCH 00/22] xfs: metadata CRCs, fourth version Dave Chinner
2013-04-03 5:11 ` [PATCH 01/22] xfs: increase hexdump output in xfs_corruption_error Dave Chinner
2013-04-03 5:11 ` [PATCH 02/22] xfs: add support for large btree blocks Dave Chinner
2013-04-03 5:11 ` [PATCH 03/22] xfs: add CRC checks to the AGF Dave Chinner
2013-04-03 5:11 ` [PATCH 04/22] xfs: add CRC checks to the AGFL Dave Chinner
2013-04-03 5:11 ` [PATCH 05/22] xfs: add CRC checks to the AGI Dave Chinner
2013-04-03 5:11 ` [PATCH 06/22] xfs: add CRC checks for quota blocks Dave Chinner
2013-04-03 5:11 ` [PATCH 07/22] xfs: add version 3 inode format with CRCs Dave Chinner
2013-04-03 5:11 ` [PATCH 08/22] xfs: split out symlink code into it's own file Dave Chinner
2013-04-03 5:11 ` [PATCH 09/22] xfs: add CRC checks to remote symlinks Dave Chinner
2013-04-03 5:11 ` [PATCH 10/22] xfs: add CRC checks to block format directory blocks Dave Chinner
2013-04-03 5:11 ` [PATCH 11/22] xfs: add CRC checking to dir2 free blocks Dave Chinner
2013-04-03 5:11 ` [PATCH 12/22] xfs: add CRC checking to dir2 data blocks Dave Chinner
2013-04-03 5:11 ` [PATCH 13/22] xfs: add CRC checking to dir2 leaf blocks Dave Chinner
2013-04-10 17:46 ` Ben Myers
2013-04-11 2:06 ` Dave Chinner
2013-04-11 16:16 ` Ben Myers
2013-04-11 21:30 ` [PATCH V2 " Dave Chinner
2013-04-03 5:11 ` [PATCH 14/22] xfs: shortform directory offsets change for dir3 format Dave Chinner
2013-04-10 19:52 ` Ben Myers
2013-04-03 5:11 ` [PATCH 15/22] xfs: add CRCs to dir2/da node blocks Dave Chinner
2013-04-22 18:55 ` Ben Myers
2013-04-24 0:33 ` Dave Chinner
2013-04-24 8:58 ` [PATCH V2 " Dave Chinner
2013-04-03 5:11 ` [PATCH 16/22] xfs: add CRCs to attr leaf blocks Dave Chinner
2013-04-23 23:02 ` Ben Myers
2013-04-24 1:17 ` Dave Chinner [this message]
2013-04-24 8:58 ` [PATCH V2 " Dave Chinner
2013-04-03 5:11 ` [PATCH 17/22] xfs: split remote attribute code out Dave Chinner
2013-04-24 19:13 ` Ben Myers
2013-04-03 5:11 ` [PATCH 18/22] xfs: add CRC protection to remote attributes Dave Chinner
2013-04-25 18:56 ` Ben Myers
2013-04-30 7:20 ` Dave Chinner
2013-04-03 5:11 ` [PATCH 19/22] xfs: add buffer types to directory and attribute buffers Dave Chinner
2013-04-26 19:09 ` Ben Myers
2013-04-30 7:28 ` Dave Chinner
2013-04-03 5:11 ` [PATCH 20/22] xfs: buffer type overruns blf_flags field Dave Chinner
2013-04-03 5:11 ` [PATCH 21/22] xfs: add CRC checks to the superblock Dave Chinner
2013-04-03 5:11 ` [PATCH 22/22] xfs: implement extended feature masks Dave Chinner
2013-04-05 6:55 ` [PATCH 00/22] xfs: metadata CRCs, fourth version Dave Chinner
2013-04-05 7:00 ` [PATCH 23/22] xfs: add metadata CRC documentation Dave Chinner
2013-04-05 10:45 ` Hans-Peter Jansen
2013-04-05 11:20 ` Dave Howorth
2013-04-07 23:06 ` Dave Chinner
2013-04-05 11:35 ` Brian Foster
2013-04-07 23:08 ` Dave Chinner
2013-04-09 6:49 ` [PATCH V2 " Dave Chinner
2013-04-09 7:33 ` [PATCH 24/22] xfs: Teach dquot recovery about CONFIG_XFS_QUOTA Dave Chinner
2013-04-27 20:44 ` Ben Myers
2013-04-30 6:18 ` Dave Chinner
2013-04-27 20:42 ` [PATCH 00/22] xfs: metadata CRCs, fourth version Ben Myers
2013-04-28 23:25 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130424011721.GP10481@dastard \
--to=david@fromorbit.com \
--cc=bpm@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox