From: Dave Chinner <david@fromorbit.com>
To: Alli <allison.henderson@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 17/17] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework
Date: Mon, 9 May 2022 07:50:23 +1000 [thread overview]
Message-ID: <20220508215023.GL1098723@dread.disaster.area> (raw)
In-Reply-To: <c2159284a8b1c575140b7bbd3190fd38428b0a4d.camel@oracle.com>
On Fri, May 06, 2022 at 11:01:23PM -0700, Alli wrote:
> On Fri, 2022-05-06 at 19:45 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > We can't use the same algorithm for replacing an existing attribute
> > when logging attributes. The existing algorithm is essentially:
> >
> > 1. create new attr w/ INCOMPLETE
> > 2. atomically flip INCOMPLETE flags between old + new attribute
> > 3. remove old attr which is marked w/ INCOMPLETE
> >
> > This algorithm guarantees that we see either the old or new
> > attribute, and if we fail after the atomic flag flip, we don't have
> > to recover the removal of the old attr because we never see
> > INCOMPLETE attributes in lookups.
> >
> > For logged attributes, however, this does not work. The logged
> > attribute intents do not track the work that has been done as the
> > transaction rolls, and hence the only recovery mechanism we have is
> > "run the replace operation from scratch".
> >
> > This is further exacerbated by the attempt to avoid needing the
> > INCOMPLETE flag to create an atomic swap. This means we can create
> > a second active attribute of the same name before we remove the
> > original. If we fail at any point after the create but before the
> > removal has completed, we end up with duplicate attributes in
> > the attr btree and recovery only tries to replace one of them.
> >
> > There are several other failure modes where we can leave partially
> > allocated remote attributes that expose stale data, partially free
> > remote attributes that enable UAF based stale data exposure, etc.
> >
> > TO fix this, we need a different algorithm for replace operations
> > when LARP is enabled. Luckily, it's not that complex if we take the
> > right first step. That is, the first thing we log is the attri
> > intent with the new name/value pair and mark the old attr as
> > INCOMPLETE in the same transaction.
> >
> > From there, we then remove the old attr and keep relogging the
> > new name/value in the intent, such that we always know that we have
> > to create the new attr in recovery. Once the old attr is removed,
> > we then run a normal ATTR_CREATE operation relogging the intent as
> > we go. If the new attr is local, then it gets created in a single
> > atomic transaction that also logs the final intent done. If the new
> > attr is remote, the we set INCOMPLETE on the new attr while we
> > allocate and set the remote value, and then we clear the INCOMPLETE
> > flag at in the last transaction taht logs the final intent done.
> >
> > If we fail at any point in this algorithm, log recovery will always
> > see the same state on disk: the new name/value in the intent, and
> > either an INCOMPLETE attr or no attr in the attr btree. If we find
> > an INCOMPLETE attr, we run the full replace starting with removing
> > the INCOMPLETE attr. If we don't find it, then we simply create the
> > new attr.
> >
> > Notably, recovery of a failed create that has an INCOMPLETE flag set
> > is now the same - we start with the lookup of the INCOMPLETE attr,
> > and if that exists then we do the full replace recovery process,
> > otherwise we just create the new attr.
> >
> > Hence changing the way we do the replace operation when LARP is
> > enabled allows us to use the same log recovery algorithm for both
> > the ATTR_CREATE and ATTR_REPLACE operations. This is also the same
> > algorithm we use for runtime ATTR_REPLACE operations (except for the
> > step setting up the initial conditions).
> >
> > The result is that:
> >
> > - ATTR_CREATE uses the same algorithm regardless of whether LARP is
> > enabled or not
> > - ATTR_REPLACE with larp=0 is identical to the old algorithm
> > - ATTR_REPLACE with larp=1 runs an unmodified attr removal algorithm
> > from the larp=0 code and then runs the unmodified ATTR_CREATE
> > code.
> > - log recovery when larp=1 runs the same ATTR_REPLACE algorithm as
> > it uses at runtime.
> >
> > Because the state machine is now quite clean, changing the algorithm
> > is really just a case of changing the initial state and how the
> > states link together for the ATTR_REPLACE case. Hence it's not a
> > huge amoutn of code for what is a fairly substantial rework
> > of the attr logging and recovery algorithm....
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> This looks mostly good, though when I run it with the new tests, I seem
> to run into a failed filesystem check at the end. "bad attribute count
> 0 in attr block 0". I suspect we still dont have the removal of the
> fork quite right. It sounds like you're still working with things
> though, I'll keep looking too.
Yeah, that's a strange one. I didn't get it with the based branch
testing (based on 5.18-rc+for-next) but over the weekend where it
got merged with 5.18-rc5 it appears that the error has manifested
in several test runs. I'll dig into it this morning.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2022-05-08 21:51 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-06 9:45 Dave Chinner
2022-05-06 9:45 ` [PATCH 01/17] xfs: avoid empty xattr transaction when attrs are inline Dave Chinner
2022-05-06 9:45 ` [PATCH 02/17] xfs: initialise attrd item to zero Dave Chinner
2022-05-06 9:45 ` [PATCH 03/17] xfs: make xattri_leaf_bp more useful Dave Chinner
2022-05-06 9:45 ` [PATCH 04/17] xfs: rework deferred attribute operation setup Dave Chinner
2022-05-06 23:43 ` Alli
2022-05-06 9:45 ` [PATCH 05/17] xfs: separate out initial attr_set states Dave Chinner
2022-05-06 23:44 ` Alli
2022-05-06 9:45 ` [PATCH 06/17] xfs: kill XFS_DAC_LEAF_ADDNAME_INIT Dave Chinner
2022-05-06 9:45 ` [PATCH 07/17] xfs: consolidate leaf/node states in xfs_attr_set_iter Dave Chinner
2022-05-06 9:45 ` [PATCH 08/17] xfs: split remote attr setting out from replace path Dave Chinner
2022-05-06 9:45 ` [PATCH 09/17] xfs: XFS_DAS_LEAF_REPLACE state only needed if !LARP Dave Chinner
2022-05-06 23:44 ` Alli
2022-05-06 9:45 ` [PATCH 10/17] xfs: remote xattr removal in xfs_attr_set_iter() is conditional Dave Chinner
2022-05-06 9:45 ` [PATCH 11/17] xfs: clean up final attr removal in xfs_attr_set_iter Dave Chinner
2022-05-06 9:45 ` [PATCH 12/17] xfs: xfs_attr_set_iter() does not need to return EAGAIN Dave Chinner
2022-05-06 9:45 ` [PATCH 13/17] xfs: introduce attr remove initial states into xfs_attr_set_iter Dave Chinner
2022-05-06 9:45 ` [PATCH 14/17] xfs: switch attr remove to xfs_attri_set_iter Dave Chinner
2022-05-06 23:44 ` Alli
2022-05-06 9:45 ` [PATCH 15/17] xfs: remove xfs_attri_remove_iter Dave Chinner
2022-05-06 9:45 ` [PATCH 16/17] xfs: use XFS_DA_OP flags in deferred attr ops Dave Chinner
2022-05-09 8:17 ` Dave Chinner
2022-05-10 22:31 ` Alli
2022-05-06 9:45 ` [PATCH 17/17] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework Dave Chinner
2022-05-07 6:01 ` Alli
2022-05-08 21:50 ` Dave Chinner [this message]
2022-05-08 23:24 ` Dave Chinner
2022-05-06 22:35 ` [PATCH 00/17 V3] XFS: LARP state machine and recovery rework Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220508215023.GL1098723@dread.disaster.area \
--to=david@fromorbit.com \
--cc=allison.henderson@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox