From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfsdocs: capture some information about dirs vs. attrs and how they use dabtrees
Date: Mon, 13 Apr 2020 12:37:09 -0700
Message-ID: <20200413193709.GH6749@magnolia>
In-Reply-To: <20200409001608.GR24067@dread.disaster.area>
On Thu, Apr 09, 2020 at 10:16:08AM +1000, Dave Chinner wrote:
> On Wed, Apr 08, 2020 at 04:27:53PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > Dave and I had a short discussion about whether or not xattr trees
> > needed to have the same free space tracking that directories have, and
> > a comparison of how each of the two metadata types interact with
> > dabtrees resulted. I've reworked this a bit to make it flow better as a
> > book chapter, so here we go.
> >
> > Original-mail: https://lore.kernel.org/linux-xfs/20200404085203.1908-1-chandanrlinux@gmail.com/T/#mdd12ad06cf5d635772cc38946fc5b22e349e136f
> > Originally-from: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>
> Couple of things.
>
> We are talking about btrees and where the record data is being
> stored (internal or external). Hence I think it makes sense to refer
> to "attribute records" and "directory records" (or "dirent records")
> rather than "attributes" and "directory entries"...
Ok, I'll clean that up
> "leaves" -> "leaf nodes"
Fixed.
> > ---
> > .../extended_attributes.asciidoc | 49 ++++++++++++++++++++
> > 1 file changed, 49 insertions(+)
> >
> > diff --git a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > index 99f7b35..d61c649 100644
> > --- a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > +++ b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > @@ -910,3 +910,52 @@ Log sequence number of the last write to this block.
> >
> > Filesystems formatted prior to v5 do not have this header in the remote block.
> > Value data begins immediately at offset zero.
> > +
> > +== Key Differences Between Directories and Extended Attributes
> > +
> > +Though directories and extended attributes can take advantage of the same
> > +variable length record btree structures (i.e. the dabtree) to map name hashes
> > +to disk blocks, there are major differences in the ways that each of those
> > +users embed the btree within the information that they are storing.
> > +
> > +Directory blocks require external free space tracking because the directory
> > +blocks are not part of the dabtree itself. The dabtree leaves for a directory
> > +map name hashes to external directory data blocks. Extended attributes, on
>
> "The dabtree leaves for ...." implies it is going somewhere, not
> that you are talking about leaf nodes. :) Perhaps:
>
> "The directory dabtree leaf nodes contain a mapping between name
> hash and the location of the dirent record in the external directory
> data blocks."
<nod>
> > +the other hand, store all of the attributes in the leaves of the dabtree.
>
> "... store the attribute records directly in the dabtree leaf
> nodes."
<nod>
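A toy Python model may help picture the difference. It is not XFS code. The class names are hypothetical, and CRC32 stands in for the real XFS name hash. The directory leaf node holds only (hash, address) pairs that point into a separate data segment. The attribute leaf node holds the records themselves.

```python
import bisect
import zlib
from dataclasses import dataclass, field


def name_hash(name: bytes) -> int:
    # Stand-in hash; XFS uses its own dabtree name hash, not CRC32.
    return zlib.crc32(name)


@dataclass
class DirLeafNode:
    # (hash, address) pairs sorted by hash; the dirent records live elsewhere.
    entries: list = field(default_factory=list)

    def add(self, name: bytes, address: int) -> None:
        bisect.insort(self.entries, (name_hash(name), address))

    def lookup(self, name: bytes) -> list:
        # Return every data-segment address with a matching hash; the caller
        # must still read the external data block to compare the full name.
        h = name_hash(name)
        i = bisect.bisect_left(self.entries, (h, -1))
        out = []
        while i < len(self.entries) and self.entries[i][0] == h:
            out.append(self.entries[i][1])
            i += 1
        return out


@dataclass
class AttrLeafNode:
    # (hash, name, value) records sorted by hash; the records live right here.
    records: list = field(default_factory=list)

    def add(self, name: bytes, value: bytes) -> None:
        bisect.insort(self.records, (name_hash(name), name, value))

    def lookup(self, name: bytes):
        h = name_hash(name)
        i = bisect.bisect_left(self.records, (h,))
        while i < len(self.records) and self.records[i][0] == h:
            if self.records[i][1] == name:
                return self.records[i][2]
            i += 1
        return None
```

A directory lookup ends with an address that still has to be followed into a data block. An attribute lookup ends with the value itself.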
> > +
> > +When we add or remove an extended attribute in the dabtree, we split or merge
> > +leaves of the tree based on where the name hash index tells us a leaf needs to
> > +be inserted into or removed. In other words, we make space available or
> > +collapse sparse leaves of the dabtree as a side effect of inserting or
> > +removing attributes.
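Here is a toy model of the attribute leaf level. It assumes a hypothetical capacity of four records per leaf and ignores the interior dabtree nodes. It shows that free space falls out of splits and merges, without a separate index:

```python
import bisect

LEAF_CAPACITY = 4  # hypothetical; real capacity depends on block and record size


class AttrLeafLevel:
    """Toy model of the leaf level of an attr dabtree: hash-ordered leaves
    that split when they overflow and merge when they become sparse."""

    def __init__(self):
        self.leaves = [[]]  # each leaf is a sorted list of (hash, name)

    def _find_leaf(self, h: int) -> int:
        # First non-empty leaf whose highest hash is >= h, else the last leaf.
        for i, leaf in enumerate(self.leaves):
            if leaf and leaf[-1][0] >= h:
                return i
        return len(self.leaves) - 1

    def insert(self, h: int, name: str) -> None:
        i = self._find_leaf(h)
        leaf = self.leaves[i]
        bisect.insort(leaf, (h, name))
        if len(leaf) > LEAF_CAPACITY:
            # Split in half: room for new records comes from the split itself.
            mid = len(leaf) // 2
            self.leaves[i:i + 1] = [leaf[:mid], leaf[mid:]]

    def remove(self, h: int, name: str) -> None:
        # The record must exist; equal hashes may straddle a split point.
        i = self._find_leaf(h)
        while (h, name) not in self.leaves[i]:
            i += 1
        self.leaves[i].remove((h, name))
        self._maybe_merge(i)

    def _maybe_merge(self, i: int) -> None:
        # Collapse a sparse leaf into its right, then left, neighbour if they fit.
        for left in (i, i - 1):
            right = left + 1
            if left >= 0 and right < len(self.leaves) and \
                    len(self.leaves[left]) + len(self.leaves[right]) <= LEAF_CAPACITY:
                self.leaves[left:right + 1] = [self.leaves[left] + self.leaves[right]]
                return

    def hashes(self) -> list:
        return [h for leaf in self.leaves for h, _ in leaf]
```

Inserting five records forces one split. Removing one afterwards lets the two leaves collapse back into one, and no record ever needs a "best free slot" search.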
> > +
> > +The directory structure is very different. Directory entries cannot change
> > +location because each entry's logical offset into the directory data segment
> > +is used as the readdir/seekdir/telldir cookie, and the cookie is required to
> > +be stable for the life of the entry. Therefore, we cannot store directory
> > +entries in the leaves of a dabtree (which is indexed in hash order) because
>
> The userspace readdir/seekdir/telldir directory cookie API places a
> requirement on the directory structure that dirent record cookie
> cannot change for the life of the dirent record. We use the dirent
> record's logical offset into the directory data segment for that
> cookie, and hence the dirent record cannot change location.
> Therefore, we cannot store directory records in the leaf nodes of
> the dabtree....
Ok, I'll massage that in. :)
> > +the offset into the tree would change as other entries are inserted and
> > +removed. Hence when we remove directory entries, we must leave holes in the
> > +data segment so the rest of the entries do not move.
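This sketch shows why the holes are needed. It assumes hypothetical fixed-size dirent slots and models a directory data segment in which each dirent's offset doubles as its readdir cookie:

```python
class DirDataSegment:
    """Toy directory data segment: each dirent's byte offset is its
    readdir/telldir cookie, so a dirent never moves once written."""

    def __init__(self):
        self.entries = {}   # offset -> name
        self.next_off = 0

    def add(self, name: str, size: int = 16) -> int:
        # Hypothetical: fixed-size slots appended at the end. Real XFS uses
        # variable-length dirents and reuses holes via the free space index.
        off = self.next_off
        self.entries[off] = name
        self.next_off += size
        return off

    def remove(self, name: str) -> None:
        # Leave a hole: no other entry changes offset.
        off = next(o for o, n in self.entries.items() if n == name)
        del self.entries[off]

    def readdir(self, cookie: int = 0):
        # Resume from a cookie previously handed out via telldir.
        for off in sorted(self.entries):
            if off >= cookie:
                yield off, self.entries[off]
```

Removing `a` leaves a hole at offset 0, but `c` keeps cookie 32, so a reader that saved cookie 16 still resumes at `b`. Had the dirents been packed in hash order, removing one would shift the position of every later entry.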
> > +
> > +The directory name hash index (the dabtree bit) is held in the second
> > +directory segment. Because the dabtree only stores pointers to directory
> > +entries in the (first) data segment, there is no need to leave holes in the
> > +dabtree itself. The dabtree merges or splits leaves as required as pointers
> > +to the directory data segment are added or removed. The dabtree itself needs
> > +no free space tracking.
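For reference, the segments are fixed regions of the directory's logical address space. The sketch below mirrors the kernel's `XFS_DIR2_SPACE_SIZE` and `XFS_DIR2_DATA_ALIGN_LOG` from `xfs_da_format.h`, with the data, leaf and free segments in that order. The helper `dir_segment` is hypothetical.

```python
# Each directory segment spans 2^(32 + XFS_DIR2_DATA_ALIGN_LOG) bytes
# (32 GiB) of logical space, mirroring XFS_DIR2_SPACE_SIZE.
DIR2_DATA_ALIGN_LOG = 3
DIR2_SPACE_SIZE = 1 << (32 + DIR2_DATA_ALIGN_LOG)

SEGMENTS = ("data", "leaf/dabtree", "free")


def dir_segment(logical_offset: int) -> str:
    """Name the directory segment that a logical byte offset falls in."""
    index = logical_offset // DIR2_SPACE_SIZE
    if index >= len(SEGMENTS):
        raise ValueError("offset beyond the free index segment")
    return SEGMENTS[index]
```

Because the dabtree lives in its own segment and only stores pointers into the first one, its blocks can split and merge freely without disturbing any dirent cookie.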
> > +
> > +When we go to add a directory entry, we need to find the best-fitting free
>
> s/go to//
Fixed.
> > +space in the directory data segment to turn into the new entry. This requires
> > +a free space index for the directory data segment. The free space index is
> > +held in the third directory segment. Once we've used the free space index to
> > +find the block with that best free space, we modify the directory data block
> > +and update the dabtree to point the name hash at the new entry.
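The free space lookup can be sketched as follows. This is a simplified model. Each data block's free index entry records the length of its longest free region. The first block whose entry fits is chosen, and the smallest fitting region inside that block is used. `DataBlock` and `place_dirent` are hypothetical, and the kernel's actual selection policy may differ.

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    # Free regions as (offset, length) pairs within this hypothetical block.
    free: list = field(default_factory=list)

    def longest_free(self) -> int:
        return max((length for _, length in self.free), default=0)


def build_free_index(blocks):
    # One entry per data block: the length of its longest free region.
    return [b.longest_free() for b in blocks]


def place_dirent(blocks, free_index, size):
    """Pick a block via the free index, then best-fit a region inside it.

    Returns (block_number, offset), or None if no block has room; a real
    filesystem would then allocate a new data block.
    """
    for bno, best in enumerate(free_index):
        if best >= size:
            block = blocks[bno]
            # Smallest free region in this block that still fits the dirent.
            i, (off, length) = min(
                ((i, r) for i, r in enumerate(block.free) if r[1] >= size),
                key=lambda item: item[1][1])
            if length == size:
                del block.free[i]
            else:
                block.free[i] = (off + size, length - size)
            # Keep the free index in sync with the modified data block.
            free_index[bno] = block.longest_free()
            return bno, off
    return None
```

After placing the dirent, the caller would insert a (hash, address) pair for it into the dabtree. That is the third update an attribute insert never needs.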
> > +
> > +In other words, the requirement for a free space map in the directory
> > +structure results from storing the directory entry data externally to the
> > +dabtree. Extended attributes are stored directly in the leaves of the
>
> dabtree leaf nodes
Fixed.
> > +dabtree (except for remote attributes which can be anywhere in the attr fork
> > +address space) and do not need external free space tracking to determine where
> > +to best insert them. As a result, extended attributes exhibit nearly perfect
> > +scaling until we run out of memory.
>
> Thanks for doing this, Darrick!
NP. v2 is on its way.
--D
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
Thread overview: 3+ messages
2020-04-08 23:27 [PATCH] xfsdocs: capture some information about dirs vs. attrs and how they use dabtrees Darrick J. Wong
2020-04-09 0:16 ` Dave Chinner
2020-04-13 19:37 ` Darrick J. Wong [this message]