From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Eric Sandeen <sandeen@sandeen.net>,
Allison Henderson <allison.henderson@oracle.com>,
cmaiolino@redhat.com, linux-xfs@vger.kernel.org
Subject: Re: Parent pointers
Date: Fri, 14 Jul 2017 15:46:58 -0700 [thread overview]
Message-ID: <20170714224658.GB4224@magnolia> (raw)
In-Reply-To: <20170714190740.GA43494@bfoster.bfoster>
On Fri, Jul 14, 2017 at 03:07:41PM -0400, Brian Foster wrote:
> On Fri, Jul 14, 2017 at 01:46:27PM -0500, Eric Sandeen wrote:
> > On 07/14/2017 12:44 PM, Allison Henderson wrote:
> > > On 7/14/2017 7:04 AM, Eric Sandeen wrote:
> > >>
> > >>
> > >> On 07/14/2017 03:50 AM, Carlos Maiolino wrote:
> > >>> Hi,
> > >>>
> > >>> On Thu, Jul 13, 2017 at 04:25:25PM -0700, Allison Henderson wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> I've been doing some digging on adding parent pointers to xfs and wanted to
> > >>>> send a note to folks here to get peoples opinions on it.
> > >>>
> > >>> Are you talking about parent pointers in the BTrees?
> > >>>
> > >>
> > >> No, see [RFC 00/17] RFC parent inode pointers. for example, from long
> > >> ago.
> > >>
> > >> "Parent inode support allow XFS to quickly derive a file name and
> > >> path from the mount point. This can aid in directory/path policies
> > >> and can help relocate items during filesystem shrink."
> > >>
> > >> It has a long and ... difficult history.
> > >>
> > >> -Eric
Woot, time to dive in. :)
> > > Right, so to expand on Eric's answer, it looks like Dave and Brian
> > > had been working on some improvements based on that set, but it's
> > > not quite finished yet. The idea is that we add an extended
> > > attribute to keep track of the parents inode and generation, and
Well, a lot of attributes, in the case of multiply hardlinked files.
One concern I have is that since a file can be hardlinked 2^32 times and
path components have a maximum length of 255 bytes, we'll have to be
careful about not overflowing di_anextents as a part of making a
potentially huge attribute tree.
> > > also the child entries offset, and filename. So in this solution
> > > the EA is name={parent inode #, parent inode generation, dirent
> > > offset}, value={dirent filename}.
This (the disk format) is probably the most important part to get
settled. FWIW, I've recaptured three of the previous discussions of
parent pointers -- Lachlan McIlroy's code dump in 2009[1], Mark
Tinguely's 2013 attempt[2] and 2014 attempt[3] at discussing patches.
For those of you following along at home, the gmane link mentioned in
[3] is the link at [1].
[1] http://oss.sgi.com/archives/xfs/2009-01/msg01068.html
[2] http://oss.sgi.com/archives/xfs/2013-04/msg00214.html
[3] https://www.spinics.net/lists/xfs/msg25175.html
The latest round of parent pointer patches is (I think) in the form of
an off-list patchset that Brian (and now Allison) are cleaning up for
reposting, which hopefully happens soon because I feel rather ill at
ease discussing off-list code that lacks S-o-b's.
Anyway, the current proposal is to create a new xattr namespace
(ATTR_PARENT) and to fill it with (parent_ino, parent_gen,
dirent_offset) => (dirent name) attributes:
struct xfs_parent_name_rec {
__be64 p_ino;
__be32 p_gen;
__be32 p_diroffset;
};
p_diroffset is really a directory datapointer, not a raw byte offset.
We also have an incore data structure of:
struct xfs_parent_name_irec {
__uint64_t p_ino;
__uint32_t p_gen;
__uint32_t p_diroffset;
const char *p_name;
__uint32_t p_namelen;
};
I think p_ino ought to be xfs_ino_t and not uint64_t, and p_diroffset
ought to be xfs_dir2_dataptr_t instead of uint32_t.
> > > My goal at the moment is just to get it compiling again and finish
> > > out some of the sub routines that maintain it. It looks like it
> > > hasn't had much attention in a while, so I wanted to let people
> > > know the direction I'm planning to move in before I get too far
> > > in.
<nod>
> > If you're forward-porting that 17-patch set from Mark, I'd suggest
> > first reading Dave's response to it - IIRC it amounted to a firm
> > NAK. it also highlights the complexity of this undertaking, and may
> > explain why nobody has gotten it done (yet) :)
> >
>
> The patches that came from me (to Allison) were last sent to me directly
> from Dave. I know he had some objections to the original design, but my
> understanding was that he had incorporated fixes for those issues in his
> modifications to Mark's original series (such as the xattr format and
>From what I can tell, he seems to have taken Mark's patches to pass dir
offsets back from the dirent manipulation functions and then started
writing a simple implementation of stuffing the attrs into the attr
tree.
> whatnot), but the series wasn't completely done yet in terms of
> supporting all possible directory operations. (It's probably a good idea
> to review that old thread anyways just to confirm, unless Dave catches
> this and is able to chime in one way or the other.)
(Or jump in the fray out once he's back from sabbatical.)
> I basically just forwarded ported Dave's patches to more recent kernels
> and added a couple minor fixes as I had combed through the existing
> code. I never really got to adding anything of substance before Allison
> volunteered to take over the series.
>
> FWIW, I think there was some mention of porting some of the operations
> over to the deferred ops infrastructure, but it's not clear to me off
> the top of my head how important (or appropriate) that is.
Most of the users of xfs_trans_roll seem to be buried in the xattr code.
For example, _attr_set can end up rolling a transaction after converting
an attr fork from short format to leaf format before retrying the attr
add operation, but we don't use log redo items (er, defer_ops items) to
make sure log recovery will restart the attr add operation if we crash
before the final _trans_commit after calling _attr_{leaf,node}_addname.
In the case of a regular xattr set operation this wouldn't have been a
big deal because the fsetxattr call wouldn't have returned, so all the
user could possibly see is an inode with a perhaps unnecessarily large
xattr fork. Now that we need to set xattrs in the same transaction
chain as a directory operation, it becomes very important that log
recovery gain the ability to resume an xattr add operation no matter
where the log stopped. Creating a directory can become this longwinded
pile of updates:
- Allocate directory block, map into data fork, chain rmap update, add
directory entry, chain pptr update;
- Add rmap entry for new directory block (more chaining to fix
freelist), finish first rmap update;
- Allocate xattr block for short->leaf conversion, map into attr fork,
chain rmap update; chain xattr add operation;
- Add rmap entry for new xattr block (more chaining to fix freelist),
finish second rmap update;
- Add xattr entry to file, finish xattr add operation;
- Finish pptr update.
Basically, I think someone's going to have to go audit all the uses of
xfs_trans_roll in the attr code to figure out which operations need redo
items, and how to cram everything toegether into a single xfs_defer_ops,
rather than sprinkling them around the attr code like we do now, because
redo items cannot be deferred from one defer_ops to another.
> It's something to keep in mind, in any event. IIRC there were also
> missing Signed-off-by's required for some of Mark's original patches.
That's also a problem; unless someone can get Mark to supply them, we
probably have to get someone to rewrite them. At a bare minimum I think
we explicitly have to pass back a xfs_dir2_dataptr_t, not a bare
uint32_t.
> IMO, the best next step might be to just finish off the implementation
> as-is such that we could have a fairly functional RFC to put on the
> list and hash out whether there are in fact any remaining design
> hurdles, but others might have a different opinion on that. :)
Agreed.
> Brian
>
> > -Eric
> >
> > > Allison
> > >>
> > >>>> I got in touch
> > >>>> with Brian Foster not too long ago and he had some code partially done from
> > >>>> about a year or so ago (looks like it has patches from Dave Chinner and Mark
> > >>>> Tinguely as well). So I am hoping to be able to use what we have so far to
> > >>>> create something updated and finished out. I am still pretty new to the xfs
> > >>>> code, so at the moment I am still just going through old discussion threads,
> > >>>> and reviewing the patches. But for the most part I just wanted to see what
> > >>>> people thought and get everyone on board with the idea. Suggestions and
> > >>>> feedback are much appreciated. Thank you!!
> > >>>>
> > >>>
> > >>> It will be better if you can describe in more details if you have any specific
> > >>> goal with this, and/or what kind of improvement you expect to have with it,
> > >>> adding something new without a reason, is usually not well received.
> > >>>
> > >>> Cheers
> > >>>
> > >> --
> > >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > >> the body of a message to majordomo@vger.kernel.org
> > >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >>
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-07-14 22:47 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-13 23:25 Parent pointers Allison Henderson
2017-07-14 8:50 ` Carlos Maiolino
2017-07-14 14:04 ` Eric Sandeen
2017-07-14 17:44 ` Allison Henderson
2017-07-14 18:46 ` Eric Sandeen
2017-07-14 19:07 ` Brian Foster
2017-07-14 19:14 ` Eric Sandeen
2017-07-14 22:46 ` Darrick J. Wong [this message]
2017-07-15 16:36 ` Tinguely, Mark
2017-07-17 14:49 ` Brian Foster
2017-07-17 15:33 ` Mark Tinguely
2017-07-17 22:53 ` Darrick J. Wong
2017-07-17 14:48 ` Brian Foster
2017-07-17 22:14 ` Dave Chinner
2017-07-17 23:10 ` Darrick J. Wong
2017-07-18 0:10 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2017-07-14 4:54 Allison Henderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170714224658.GB4224@magnolia \
--to=darrick.wong@oracle.com \
--cc=allison.henderson@oracle.com \
--cc=bfoster@redhat.com \
--cc=cmaiolino@redhat.com \
--cc=linux-xfs@vger.kernel.org \
--cc=sandeen@sandeen.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox