From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Brian Foster <bfoster@redhat.com>,
Eric Sandeen <sandeen@sandeen.net>,
Allison Henderson <allison.henderson@oracle.com>,
cmaiolino@redhat.com, linux-xfs@vger.kernel.org
Subject: Re: Parent pointers
Date: Mon, 17 Jul 2017 16:10:03 -0700 [thread overview]
Message-ID: <20170717231003.GH4224@magnolia> (raw)
In-Reply-To: <20170717221407.GJ17762@dastard>
On Tue, Jul 18, 2017 at 08:14:07AM +1000, Dave Chinner wrote:
> On Mon, Jul 17, 2017 at 10:48:16AM -0400, Brian Foster wrote:
> > On Fri, Jul 14, 2017 at 03:46:58PM -0700, Darrick J. Wong wrote:
> > > On Fri, Jul 14, 2017 at 03:07:41PM -0400, Brian Foster wrote:
> > > From what I can tell, he seems to have taken Mark's patches to pass dir
> > > offsets back from the dirent manipulation functions and then started
> > > writing a simple implementation of stuffing the attrs into the attr
> > > tree.
>
> Yup, that's the idea.
>
> > > > FWIW, I think there was some mention of porting some of the operations
> > > > over to the deferred ops infrastructure, but it's not clear to me off
> > > > the top of my head how important (or appropriate) that is.
> > >
> > > Most of the users of xfs_trans_roll seem to be buried in the xattr code.
> > > For example, _attr_set can end up rolling a transaction after converting
> > > an attr fork from short format to leaf format before retrying the attr
> > > add operation, but we don't use log redo items (er, defer_ops items) to
> > > make sure log recovery will restart the attr add operation if we crash
> > > before the final _trans_commit after calling _attr_{leaf,node}_addname.
> > >
> > > In the case of a regular xattr set operation this wouldn't have been a
> > > big deal because the fsetxattr call wouldn't have returned, so all the
> > > user could possibly see is an inode with a perhaps unnecessarily large
> > > xattr fork. Now that we need to set xattrs in the same transaction
> > > chain as a directory operation, it becomes very important that log
> > > recovery gain the ability to resume an xattr add operation no matter
> > > where the log stopped. Creating a directory can become this longwinded
> > > pile of updates:
> > >
> >
> > Ok, thanks for the summary. So we have some transaction ordering down in
> > the xattr code that might no longer be sufficient to provide atomicity
> > in the context of parent pointers (via directory operations).
>
> We already have xattr operations that are not atomic with directory
> entry creation - security xattrs (default ACL, etc). This can mean
> that after a crash and recovery, we have a newly created file
> without any security xattr on it.
>
> This is a long standing problem - it bites SGI's HSM (DMF) all the
> time, so rather than fixing the problem 10+ years ago, they added a
> post-boot inode scan (via bulkstat) to find inodes that didn't have
> a DMF xattr.
<nod>
That reminds me, Ted was asking what _check_xfs_test_fs (in xfstests)
does. Judging from the nonexistent xfs_repair_ipaths command, I'm
guessing that program fixed up pptr xattrs if they were
broken/missing/whatever?
> The original parent pointer patchset also had this problem the
> moment xattrs were required (i.e. first hardlink) - the creation of
> the pp xattr was not atomic with the dirent creation. To solve this
> problem, I started by moving the xattr creation into the transaction
> context of the dirent.
>
> > I see that
> > the current patch series increases the transaction size to accommodate
> > parent pointers, but that doesn't help us deal with those situations
> > where the transaction is rolled into a new one.
>
> Right, I hadn't got that far. The plan effective was to get a proof
> of concept running with the xattr creation within the directory
> creation xact context and get that working, knowing that it still
> wasn't properly atomic. This was done before we had deferred ops,
> and my thinking at the time was to add an intent/done pair of ops
> to tell recovery it needed to add an xattr to the dirent once the
> xattr was inside the dirent context. i.e.
>
> - move xattr inside dirent ctx
> - add pp intent
> - roll transaction
> - add pp xattr
> - roll final xattr transaction rather than commit
> - add pp done
> - commit dir ctx transaction.
>
> This doesn't help us with security context xattrs, though, so
> when deferred ops came along, it made much more sense to me to
> make the adding of an xattr a deferred op so we could use the same
> mechanism for adding all the required xattrs to the inode atomically
> and so have a mechanism for recovery to replay them appropriately.
<nod>
> > > - Allocate directory block, map into data fork, chain rmap update, add
> > > directory entry, chain pptr update;
> > > - Add rmap entry for new directory block (more chaining to fix
> > > freelist), finish first rmap update;
> > > - Allocate xattr block for short->leaf conversion, map into attr fork,
> > > chain rmap update; chain xattr add operation;
> > > - Add rmap entry for new xattr block (more chaining to fix freelist),
> > > finish second rmap update;
> > > - Add xattr entry to file, finish xattr add operation;
> > > - Finish pptr update.
> > >
> > > Basically, I think someone's going to have to go audit all the uses of
> > > xfs_trans_roll in the attr code to figure out which operations need redo
> > > items, and how to cram everything toegether into a single xfs_defer_ops,
> > > rather than sprinkling them around the attr code like we do now, because
> > > redo items cannot be deferred from one defer_ops to another.
>
> Probably should handle that when we dup the transaction just before
> the roll, right?
Something like that. Ideally, I think I'd like to have two xattr redo
items: "add key => value" and "remove key", where "add key" would first
ensure xattr space, roll, and then actually add the attr & value. A big
problem with that scheme is that we then have to log the value, which
could be ... tough.
--D
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-07-17 23:10 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-13 23:25 Parent pointers Allison Henderson
2017-07-14 8:50 ` Carlos Maiolino
2017-07-14 14:04 ` Eric Sandeen
2017-07-14 17:44 ` Allison Henderson
2017-07-14 18:46 ` Eric Sandeen
2017-07-14 19:07 ` Brian Foster
2017-07-14 19:14 ` Eric Sandeen
2017-07-14 22:46 ` Darrick J. Wong
2017-07-15 16:36 ` Tinguely, Mark
2017-07-17 14:49 ` Brian Foster
2017-07-17 15:33 ` Mark Tinguely
2017-07-17 22:53 ` Darrick J. Wong
2017-07-17 14:48 ` Brian Foster
2017-07-17 22:14 ` Dave Chinner
2017-07-17 23:10 ` Darrick J. Wong [this message]
2017-07-18 0:10 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2017-07-14 4:54 Allison Henderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170717231003.GH4224@magnolia \
--to=darrick.wong@oracle.com \
--cc=allison.henderson@oracle.com \
--cc=bfoster@redhat.com \
--cc=cmaiolino@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
--cc=sandeen@sandeen.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox