From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 09/11] xfs: rework remote attr CRCs
Date: Fri, 24 May 2013 09:35:01 +1000 [thread overview]
Message-ID: <20130523233501.GE24543@dastard> (raw)
In-Reply-To: <20130523215405.GT20028@sgi.com>
On Thu, May 23, 2013 at 04:54:05PM -0500, Ben Myers wrote:
> Hi Dave,
>
> On Tue, May 21, 2013 at 06:02:08PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Note: this changes the on-disk remote attribute format. I assert
> > that this is OK to do as CRCs are marked experimental and the first
> > kernel it is included in has not yet reached release yet. Further,
> > the userspace utilities are still evolving and so anyone using this
> > stuff right now is a developer or tester using volatile filesystems
> > for testing this feature. Hence changing the format right now to
> > save longer term pain is the right thing to do.
>
> I have no objection to changing the on-disk format for an experimental feature
> even after it has released. People know what they're signing up for when
> they're turning on experimental features. Or they soon will. We should
> probably be nice and use feature bits though. I'm thinking about the unlinked
> list modifications...
>
> > The fundamental change is to move from a header per extent in the
> > attribute to a header per filesytem block in the attribute. This
> > means there are more header blocks and the parsing of the attribute
> > data is slightly more complex, but it has the advantage that we
> > always know the size of the attribute on disk based on the length of
> > the data it contains.
> >
> > This is where the header-per-extent method has problems. We don't
> > know the size of the attribute on disk without first knowing how
> > many extents are used to hold it. And we can't tell from a
> > mapping lookup, either, because remote attributes can be allocated
> > contiguously with other attribute blocks and so there is no obvious
> > way of determining the actual size of the atribute on disk short of
> > walking and mapping buffers.
>
> You can't tell exactly how many extents will be required to store the attribute
> until after they've been allocated. At least... not until you're down to
> trying to allocate the last block.
>
> > The problem with this approach is that if we map a buffer
> > incorrectly (e.g. we make the last buffer for the attribute data too
> > long), we then get buffer cache lookup failure when we map it
> > correctly. i.e. we get a size mismatch on lookup.
>
> Ah. The resulting block count of a fragmented remote attribute is not
> reflected in 'valuelen' in the remote attribute entry, so the number of headers
> to account for is unknown.
>
> typedef struct xfs_attr_leaf_name_remote {
> __be32 valueblk; /* block number of value bytes */
> __be32 valuelen; /* number of bytes in value */
> __u8 namelen; /* length of name bytes */
> __u8 name[1]; /* name bytes */
> } xfs_attr_leaf_name_remote_t;
>
> Other options:
>
> valuelen could be changed to a block count. Now that you have offset and
> payload length in the header you can get valuelen from there on the last
> extent.
Which is an on-disk format change.
If valuelen is stored a block count, how do you know how long the
attribute data is in bytes? You don't - it's not stored on disk
anymore in the the attribute entry. How do you return the length
of the attribute to userspace on a lookup without now having to read
the data blocks directly?
> valuelen could be split in two, half is the block count and the other half is
> the payload length in the last extent.
Also an on disk format change. And it doesn't solve the problem, as
you still can't work out the attribute data length because you don't
know how many extents (and therefore headers) make up that block
count.
And both of these mean that all the remote attr code needs to handle
the remote attribute data length information differently. i.e.
Handle one type of indexing for existing filesystems and a different
type of indexing for v5 superblocks.
It just adds more complexity to what is already complex and fragile.
> > This is not
> > necessarily fatal, but it's a cache coherency problem that can lead
> > to returning the wrong data to userspace or writing the wrong data
> > to disk. And debug kernels will assert fail if this occurs.
>
> What data was being exposed? The data after the attribute? I thought testing
> for valuelen would take care of that.
Stale data. e.g. we do an attribute write and get a buffer
length of 16 blocks at bno X. we then do an attribute read, and get
a length of 8 blocks at bno X because we've failed to calculate the
length correctly.
A debug kernel will assert fail in _xfs_buf_find here because of the
length mismatch. A production kernel may fail to find a match, and
if it does fail, then we'll read from bno X to fill the buffer. So
what is returned is not attribute data but whatever stale data is on
disk at bno X....
> > + ASSERT(len >= XFS_LBSIZE(mp));
> > +
> > + while (len > 0 && *valuelen > 0) {
> > + int hdr_size;
> > + int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, XFS_LBSIZE(mp));
> > +
> > + byte_cnt = min(*valuelen, byte_cnt);
> ^^^
>
> You used 'min_t(int,...)' above and min() here. Any reason?
Both are of the same type, so min is appropriate. I'll fix it up.
> > - xfs_mount_t *mp;
> > - xfs_bmbt_irec_t map;
> > - xfs_buf_t *bp;
> > - xfs_daddr_t dblkno;
> > - xfs_dablk_t lblkno;
> > - int valuelen, blkcnt, nmap, error, done, committed;
> > + struct xfs_mount *mp = args->dp->i_mount;
> > + xfs_dablk_t lblkno;
> > + int blkcnt;
> > + int error;
> > + int done;
> >
> > trace_xfs_attr_rmtval_remove(args);
> >
> > - mp = args->dp->i_mount;
> > -
> > /*
> > * Roll through the "value", invalidating the attribute value's blocks.
> > * Note that args->rmtblkcnt is the minimum number of data blocks we'll
>
> I believe this comment is now out of date and needs to be updated.
Yup, good catch. Will fix.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2013-05-23 23:35 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-21 8:01 [PATCH 00/11] xfs: fixes for 3.10-rc3 Dave Chinner
2013-05-21 8:02 ` [PATCH 01/11] xfs: avoid nesting transactions in xfs_qm_scall_setqlim() Dave Chinner
2013-05-21 18:35 ` Ben Myers
2013-05-21 8:02 ` [PATCH 02/11] xfs: remote attribute allocation may be contiguous Dave Chinner
2013-05-21 8:02 ` [PATCH 03/11] xfs: remote attribute read too short Dave Chinner
2013-05-21 20:59 ` Ben Myers
2013-05-21 22:53 ` Dave Chinner
2013-05-21 8:02 ` [PATCH 04/11] xfs: remote attribute tail zeroing does too much Dave Chinner
2013-05-21 22:31 ` Ben Myers
2013-05-21 22:55 ` Dave Chinner
2013-05-21 8:02 ` [PATCH 05/11] xfs: correctly map remote attr buffers during removal Dave Chinner
2013-05-22 17:01 ` Ben Myers
2013-05-21 8:02 ` [PATCH 06/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance Dave Chinner
2013-05-22 20:50 ` Ben Myers
2013-05-22 23:54 ` Dave Chinner
2013-05-23 22:51 ` Ben Myers
2013-05-21 8:02 ` [PATCH 07/11] xfs: fully initialise temp leaf in xfs_attr3_leaf_compact Dave Chinner
2013-05-22 21:59 ` Ben Myers
2013-05-22 23:58 ` Dave Chinner
2013-05-23 22:51 ` Ben Myers
2013-05-21 8:02 ` [PATCH 08/11] xfs: don't emit v5 superblock warnings on write Dave Chinner
2013-05-22 22:26 ` Ben Myers
2013-05-23 0:03 ` Dave Chinner
2013-05-23 15:23 ` Ben Myers
2013-05-23 23:13 ` Dave Chinner
2013-05-21 8:02 ` [PATCH 09/11] xfs: rework remote attr CRCs Dave Chinner
2013-05-23 21:54 ` Ben Myers
2013-05-23 23:35 ` Dave Chinner [this message]
2013-05-21 8:02 ` [PATCH 10/11] xfs: fix incorrect remote symlink block count Dave Chinner
2013-05-24 20:36 ` Ben Myers
2013-05-24 20:39 ` Ben Myers
2013-05-24 23:41 ` Dave Chinner
2013-05-25 15:16 ` Ben Myers
2013-05-21 8:02 ` [PATCH 11/11] xfs: increase number of ACL entries for V5 superblocks Dave Chinner
2013-05-21 14:00 ` Brian Foster
2013-05-21 20:27 ` Dave Chinner
2013-05-21 16:26 ` [PATCH 00/11] xfs: fixes for 3.10-rc3 Ben Myers
2013-05-21 20:24 ` Dave Chinner
2013-05-21 20:52 ` Ben Myers
2013-05-21 21:27 ` Dave Chinner
2013-05-23 12:30 ` [PATCH 0/2] xfs: more " Dave Chinner
2013-05-23 12:30 ` [PATCH 1/2] xfs: rework dquot CRCs Dave Chinner
2013-05-23 12:30 ` [PATCH 2/2] xfs: fix split buffer vector log recovery support Dave Chinner
2013-05-24 8:58 ` [patch 0/2] xfs: yet more fixes for 3.10-rc3 Dave Chinner
2013-05-24 8:58 ` [PATCH 1/2] xfs: disable swap extents ioctl on CRC enabled filesystems Dave Chinner
2013-05-24 8:58 ` [PATCH 2/2] xfs: kill suid/sgid through the truncate path Dave Chinner
2013-05-24 10:02 ` Christoph Hellwig
2013-05-24 10:07 ` Christoph Hellwig
2013-05-24 18:29 ` [PATCH 00/11] xfs: fixes for 3.10-rc3 Ben Myers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130523233501.GE24543@dastard \
--to=david@fromorbit.com \
--cc=bpm@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox