From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfsdocs: capture some information about dirs vs. attrs and how they use dabtrees
Date: Mon, 13 Apr 2020 12:37:09 -0700
Message-ID: <20200413193709.GH6749@magnolia>
In-Reply-To: <20200409001608.GR24067@dread.disaster.area>

On Thu, Apr 09, 2020 at 10:16:08AM +1000, Dave Chinner wrote:
> On Wed, Apr 08, 2020 at 04:27:53PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Dave and I had a short discussion about whether or not xattr trees
> > needed to have the same free space tracking that directories have, and
> > a comparison of how each of the two metadata types interact with
> > dabtrees resulted.  I've reworked this a bit to make it flow better as a
> > book chapter, so here we go.
> > 
> > Original-mail: https://lore.kernel.org/linux-xfs/20200404085203.1908-1-chandanrlinux@gmail.com/T/#mdd12ad06cf5d635772cc38946fc5b22e349e136f
> > Originally-from: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Couple of things.
> 
> We are talking about btrees and where the record data is being
> stored (internal or external). Hence I think it makes sense to refer
> to "attribute records" and "directory records" (or "dirent records")
> rather than "attributes" and "directory entries"...

Ok, I'll clean that up.
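
To make the internal vs. external distinction concrete, here is a toy
Python sketch.  It is not XFS code and does not reflect the on-disk
format; DirLeafEntry, AttrLeafEntry, dir_lookup and attr_lookup are
names made up for illustration.  A directory leaf entry only points at
a dirent record stored elsewhere, while an attribute leaf entry carries
the attribute record itself.

```python
from typing import NamedTuple


class DirLeafEntry(NamedTuple):
    """Toy directory leaf entry: the record lives outside the tree."""
    hashval: int
    address: int  # location of the dirent record in the data segment


class AttrLeafEntry(NamedTuple):
    """Toy attribute leaf entry: the record lives in the leaf node."""
    hashval: int
    name: bytes
    value: bytes


def dir_lookup(leaf, data_segment, hashval):
    # Two steps: the leaf yields an address, the data segment yields the record.
    return [data_segment[e.address] for e in leaf if e.hashval == hashval]


def attr_lookup(leaf, hashval):
    # One step: the leaf entry is the record.
    return [(e.name, e.value) for e in leaf if e.hashval == hashval]
```

Here data_segment is just a dict keyed by address, standing in for the
external directory data blocks.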

> "leaves" -> "leaf nodes"

Fixed.

> > ---
> >  .../extended_attributes.asciidoc                   |   49 ++++++++++++++++++++
> >  1 file changed, 49 insertions(+)
> > 
> > diff --git a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > index 99f7b35..d61c649 100644
> > --- a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > +++ b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > @@ -910,3 +910,52 @@ Log sequence number of the last write to this block.
> >  
> >  Filesystems formatted prior to v5 do not have this header in the remote block.
> >  Value data begins immediately at offset zero.
> > +
> > +== Key Differences Between Directories and Extended Attributes
> > +
> > +Though directories and extended attributes can take advantage of the same
> > +variable length record btree structures (i.e. the dabtree) to map name hashes
> > +to disk blocks, there are major differences in the ways that each of those
> > +users embed the btree within the information that they are storing.
> > +
> > +Directory blocks require external free space tracking because the directory
> > +blocks are not part of the dabtree itself.  The dabtree leaves for a directory
> > +map name hashes to external directory data blocks.  Extended attributes, on
> 
> "The dabtree leaves for ...." implies it is going somewhere, not
> that you are talking about leaf nodes. :) Perhaps:
> 
> "The directory dabtree leaf nodes contain a mapping between name
> hash and the location of the dirent record in the external directory
> data blocks."

<nod>

> > +the other hand, store all of the attributes in the leaves of the dabtree.
> 
> "... store the attribute records directly in the dabtree leaf
> nodes."

<nod>

> > +
> > +When we add or remove an extended attribute in the dabtree, we split or merge
> > +leaves of the tree based on where the name hash index tells us a leaf needs to
> > +be inserted into or removed.  In other words, we make space available or
> > +collapse sparse leaves of the dabtree as a side effect of inserting or
> > +removing attributes.
> > +
> > +The directory structure is very different.  Directory entries cannot change
> > +location because each entry's logical offset into the directory data segment
> > +is used as the readdir/seekdir/telldir cookie, and the cookie is required to
> > +be stable for the life of the entry.  Therefore, we cannot store directory
> > +entries in the leaves of a dabtree (which is indexed in hash order) because
> 
> The userspace readdir/seekdir/telldir directory cookie API places a
> requirement on the directory structure that dirent record cookie
> cannot change for the life of the dirent record. We use the dirent
> record's logical offset into the directory data segment for that
> cookie, and hence the dirent record cannot change location.
> Therefore, we cannot store directory records in the leaf nodes of
> the dabtree....

Ok, I'll massage that in. :)
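
For anyone following along, a toy Python model of the cookie
requirement (an illustration only, not XFS code; ToyDirData and its
methods are hypothetical).  Each slot index stands in for a dirent
record's logical offset, which doubles as its readdir cookie.  Removing
a record leaves a hole instead of shifting the records after it.

```python
class ToyDirData:
    """Toy directory data segment; slot index == readdir cookie."""

    def __init__(self):
        self.slots = []  # None marks a hole left by a removed record

    def add(self, name):
        # Reuse the first hole if there is one, otherwise append.
        for off, slot in enumerate(self.slots):
            if slot is None:
                self.slots[off] = name
                return off
        self.slots.append(name)
        return len(self.slots) - 1

    def remove(self, off):
        # Leave a hole so every other record keeps its offset.
        self.slots[off] = None

    def readdir(self, cookie=0):
        # Resume from a cookie and skip holes.
        for off in range(cookie, len(self.slots)):
            if self.slots[off] is not None:
                yield off, self.slots[off]
```

If the records were kept packed in hash order instead, adding or
removing one record would shift the position of every record sorted
after it, and a cookie saved by a reader would no longer name the same
record.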

> > +the offset into the tree would change as other entries are inserted and
> > +removed.  Hence when we remove directory entries, we must leave holes in the
> > +data segment so the rest of the entries do not move.
> > +
> > +The directory name hash index (the dabtree bit) is held in the second
> > +directory segment.  Because the dabtree only stores pointers to directory
> > +entries in the (first) data segment, there is no need to leave holes in the
> > +dabtree itself.  The dabtree merges or splits leaves as required as pointers
> > +to the directory data segment are added or removed.  The dabtree itself needs
> > +no free space tracking.
> > +
> > +When we go to add a directory entry, we need to find the best-fitting free
> 
> s/go to//

Fixed.

> > +space in the directory data segment to turn into the new entry.  This requires
> > +a free space index for the directory data segment.  The free space index is
> > +held in the third directory segment.  Once we've used the free space index to
> > +find the block with that best free space, we modify the directory data block
> > +and update the dabtree to point the name hash at the new entry.
> > +
> > +In other words, the requirement for a free space map in the directory
> > +structure results from storing the directory entry data externally to the
> > +dabtree.  Extended attributes are stored directly in the leaves of the
> 
> dabtree leaf nodes

Fixed.

> > +dabtree (except for remote attributes which can be anywhere in the attr fork
> > +address space) and do not need external free space tracking to determine where
> > +to best insert them.  As a result, extended attributes exhibit nearly perfect
> > +scaling until we run out of memory.
> 
> Thanks for doing this, Darrick!

NP.  v2 is on its way.
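
As a rough sketch of the add path described in the patch (a toy model
under stated assumptions, not the XFS implementation): toy_hash is a
hypothetical stand-in for the real name hash, record size is modelled
as the name length in bytes, and ToyDirectory keeps the three directory
segments as plain Python lists.

```python
import bisect


def toy_hash(name):
    """Hypothetical stand-in for a name hash; not the XFS hash function."""
    return sum(name.encode()) & 0xFFFFFFFF


class ToyDirectory:
    def __init__(self, nblocks, block_size):
        self.blocks = [[] for _ in range(nblocks)]  # data segment
        self.index = []                             # hash index, kept sorted
        self.free = [block_size] * nblocks          # free space index

    def add(self, name):
        need = len(name.encode())
        # Best fit: the block with the least free space that still fits.
        fits = [(free, blk) for blk, free in enumerate(self.free) if free >= need]
        if not fits:
            raise OSError("no space in toy directory")
        _, blk = min(fits)
        # Modify the data block, then point the name hash at the new record.
        self.blocks[blk].append(name)
        self.free[blk] -= need
        slot = len(self.blocks[blk]) - 1
        bisect.insort(self.index, (toy_hash(name), blk, slot))
        return blk, slot

    def lookup(self, name):
        h = toy_hash(name)
        i = bisect.bisect_left(self.index, (h,))
        # Walk all index entries with a matching hash (collisions are possible).
        while i < len(self.index) and self.index[i][0] == h:
            _, blk, slot = self.index[i]
            if self.blocks[blk][slot] == name:
                return blk, slot
            i += 1
        return None
```

Only the data segment needs the free list here; the hash index simply
grows and shrinks as entries are inserted.  An attribute-style
structure would keep the records in the sorted index itself and have no
free list at all.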

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
