From: "Darrick J. Wong" <djwong@kernel.org>
To: Alli <allison.henderson@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
Date: Tue, 23 Aug 2022 08:07:30 -0700 [thread overview]
Message-ID: <YwTtMqOCtkxyox3x@magnolia> (raw)
In-Reply-To: <82cc6ff775832d34f32cdbfe9bd125487ec22226.camel@oracle.com>
On Thu, Aug 18, 2022 at 06:05:54PM -0700, Alli wrote:
> On Tue, 2022-08-16 at 13:41 -0700, Alli wrote:
> > On Mon, 2022-08-15 at 22:07 -0700, Darrick J. Wong wrote:
> > > On Tue, Aug 16, 2022 at 10:54:38AM +1000, Dave Chinner wrote:
> > > > On Thu, Aug 11, 2022 at 06:55:16PM -0700, Alli wrote:
> > > > > On Wed, 2022-08-10 at 16:12 +1000, Dave Chinner wrote:
> > > > > > On Tue, Aug 09, 2022 at 10:01:49PM -0700, Alli wrote:
> > > > > > > On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> > > > > > > > On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong
> > > > > > > > wrote:
> > > > > > > > > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison
> > > > > > > > > Henderson
> > > > > > > > > wrote:
> > > > > > > > > > Recent parent pointer testing has exposed a bug in
> > > > > > > > > > the
> > > > > > > > > > underlying
> > > > > > > > > > attr replay. A multi transaction replay currently
> > > > > > > > > > performs a
> > > > > > > > > > single step of the replay, then deferrs the rest if
> > > > > > > > > > there is
> > > > > > > > > > more
> > > > > > > > > > to do.
> > > > > > > >
> > > > > > > > Yup.
> > > > > > > >
> > > > > > > > > > This causes race conditions with other attr replays
> > > > > > > > > > that
> > > > > > > > > > might be recovered before the remaining deferred work
> > > > > > > > > > has had
> > > > > > > > > > a
> > > > > > > > > > chance to finish.
> > > > > > > >
> > > > > > > > What other attr replays are we racing against? There can
> > > > > > > > only be
> > > > > > > > one incomplete attr item intent/done chain per inode
> > > > > > > > present in
> > > > > > > > log
> > > > > > > > recovery, right?
> > > > > > > No, a rename queues up a set and remove before committing
> > > > > > > the
> > > > > > > transaction. One for the new parent pointer, and another
> > > > > > > to
> > > > > > > remove
> > > > > > > the
> > > > > > > old one.
> > > > > >
> > > > > > Ah. That really needs to be described in the commit message -
> > > > > > changing from "single intent chain per object" to "multiple
> > > > > > concurrent independent and unserialised intent chains per
> > > > > > object" is
> > > > > > a pretty important design rule change...
> > > > > >
> > > > > > The whole point of intents is to allow complex, multi-stage
> > > > > > operations on a single object to be sequenced in a tightly
> > > > > > controlled manner. They weren't intended to be run as
> > > > > > concurrent
> > > > > > lines of modification on single items; if you need to do two
> > > > > > modifications on an object, the intent chain ties the two
> > > > > > modifications together into a single whole.
> > > > > >
> > > > > > One of the reasons I rewrote the attr state machine for LARP
> > > > > > was to
> > > > > > enable new multiple attr operation chains to be easily build
> > > > > > from
> > > > > > the entry points the state machien provides. Parent attr
> > > > > > rename
> > > > > > needs a new intent chain to be built, not run multiple
> > > > > > independent
> > > > > > intent chains for each modification.
> > > > > >
> > > > > > > It cant be an attr replace because technically the names
> > > > > > > are
> > > > > > > different.
> > > > > >
> > > > > > I disagree - we have all the pieces we need in the state
> > > > > > machine
> > > > > > already, we just need to define separate attr names for the
> > > > > > remove and insert steps in the attr intent.
> > > > > >
> > > > > > That is, the "replace" operation we execute when an attr set
> > > > > > overwrites the value is "technically" a "replace value"
> > > > > > operation,
> > > > > > but we actually implement it as a "replace entire attribute"
> > > > > > operation.
> > > > > >
> > > > > > Without LARP, we do that overwrite in independent steps via
> > > > > > an
> > > > > > intermediate INCOMPLETE state to allow two xattrs of the same
> > > > > > name
> > > > > > to exist in the attr tree at the same time. IOWs, the attr
> > > > > > value
> > > > > > overwrite is effectively a "set-swap-remove" operation on two
> > > > > > entirely independent xattrs, ensuring that if we crash we
> > > > > > always
> > > > > > have either the old or new xattr visible.
> > > > > >
> > > > > > With LARP, we can remove the original attr first, thereby
> > > > > > avoiding
> > > > > > the need for two versions of the xattr to exist in the tree
> > > > > > in
> > > > > > the
> > > > > > first place. However, we have to do these two operations as a
> > > > > > pair
> > > > > > of linked independent operations. The intent chain provides
> > > > > > the
> > > > > > linking, and requires us to log the name and the value of the
> > > > > > attr
> > > > > > that we are overwriting in the intent. Hence we can always
> > > > > > recover
> > > > > > the modification to completion no matter where in the
> > > > > > operation
> > > > > > we
> > > > > > fail.
> > > > > >
> > > > > > When it comes to a parent attr rename operation, we are
> > > > > > effectively
> > > > > > doing two linked operations - remove the old attr, set the
> > > > > > new
> > > > > > attr
> > > > > > - on different attributes. Implementation wise, it is exactly
> > > > > > the
> > > > > > same sequence as a "replace value" operation, except for the
> > > > > > fact
> > > > > > that the new attr we add has a different name.
> > > > > >
> > > > > > Hence the only real difference between the existing "attr
> > > > > > replace"
> > > > > > and the intent chain we need for "parent attr rename" is that
> > > > > > we
> > > > > > have to log two attr names instead of one.
> > > > >
> > > > > To be clear, this would imply expanding xfs_attri_log_format to
> > > > > have
> > > > > another alfi_new_name_len feild and another iovec for the attr
> > > > > intent
> > > > > right? Does that cause issues to change the on disk log layout
> > > > > after
> > > > > the original has merged? Or is that ok for things that are
> > > > > still
> > > > > experimental? Thanks!
> > > >
> > > > I think we can get away with this quite easily without breaking
> > > > the
> > > > existing experimental code.
> > > >
> > > > struct xfs_attri_log_format {
> > > > uint16_t alfi_type; /* attri log item type */
> > > > uint16_t alfi_size; /* size of this item */
> > > > uint32_t __pad; /* pad to 64 bit aligned
> > > > */
> > > > uint64_t alfi_id; /* attri identifier */
> > > > uint64_t alfi_ino; /* the inode for this
> > > > attr
> > > > operation */
> > > > uint32_t alfi_op_flags; /* marks the op as a set
> > > > or
> > > > remove */
> > > > uint32_t alfi_name_len; /* attr name length */
> > > > uint32_t alfi_value_len; /* attr value length */
> > > > uint32_t alfi_attr_filter;/* attr filter flags */
> > > > };
> > > >
> > > > We have a padding field in there that is currently all zeros.
> > > > Let's
> > > > make that a count of the number of {name, value} tuples that are
> > > > appended to the format. i.e.
> > > >
> > > > struct xfs_attri_log_name {
> > > > uint32_t alfi_op_flags; /* marks the op as a set
> > > > or
> > > > remove */
> > > > uint32_t alfi_name_len; /* attr name length */
> > > > uint32_t alfi_value_len; /* attr value length */
> > > > uint32_t alfi_attr_filter;/* attr filter flags */
> > > > };
> > > >
> > > > struct xfs_attri_log_format {
> > > > uint16_t alfi_type; /* attri log item type */
> > > > uint16_t alfi_size; /* size of this item */
> > > > uint8_t alfi_attr_cnt; /* count of name/val
> > > > pairs
> > > > */
> > > > uint8_t __pad1; /* pad to 64 bit
> > > > aligned */
> > > > uint16_t __pad2; /* pad to 64 bit aligned */
> > > > uint64_t alfi_id; /* attri identifier */
> > > > uint64_t alfi_ino; /* the inode for this
> > > > attr
> > > > operation */
> > > > struct xfs_attri_log_name alfi_attr[]; /* attrs to operate on
> > > > */
> > > > };
> > > >
> > > > Basically, the size and shape of the structure has not changed,
> > > > and
> > > > if alfi_attr_cnt == 0 we just treat it as if alfi_attr_cnt == 1
> > > > as
> > > > the backwards compat code for the existing code.
> > > >
> > > > And then we just have as many followup regions for name/val pairs
> > > > as are defined by the alfi_attr_cnt and alfi_attr[] parts of the
> > > > structure. Each attr can have a different operation performed on
> > > > them, and they can have different filters applied so they can
> > > > exist
> > > > in different namespaces, too.
> > > >
> > > > SO I don't think we need a new on-disk feature bit for this
> > > > enhancement - it definitely comes under the heading of "this
> > > > stuff
> > > > is experimental, this is the sort of early structure revision
> > > > that
> > > > EXPERIMENTAL is supposed to cover....
> > >
> > > You might even callit "alfi_extra_names" to avoid the "0 means 1"
> > > stuff.
> > > ;)
> > >
> > > --D
> >
> > Oh, I just noticed these comments this morning when I sent out the
> > new
> > attri/d patch. I'll add this changes to v2. Please let me know if
> > there's anything else you'd like me to change from the v1. Thx!
> >
> > Allison
>
> Ok, so I am part way through coding this up, and I'm getting this
> feeling like this is not going to work out very well due to the size
> checks for the log formats:
>
> root@garnet:/home/achender/work_area/xfs-linux# git diff
> fs/xfs/libxfs/xfs_log_format.h fs/xfs/xfs_ondisk.h
> diff --git a/fs/xfs/libxfs/xfs_log_format.h
> b/fs/xfs/libxfs/xfs_log_format.h
> index f1ff52ebb982..5a4e700f32fc 100644
> --- a/fs/xfs/libxfs/xfs_log_format.h
> +++ b/fs/xfs/libxfs/xfs_log_format.h
> @@ -922,6 +922,13 @@ struct xfs_icreate_log {
> XFS_ATTR_PARENT | \
> XFS_ATTR_INCOMPLETE)
>
> +struct xfs_attri_log_name {
> + uint32_t alfi_op_flags; /* marks the op as a set or
> remove */
> + uint32_t alfi_name_len; /* attr name length */
> + uint32_t alfi_value_len; /* attr value length */
> + uint32_t alfi_attr_filter;/* attr filter flags */
> +};
> +
> /*
> * This is the structure used to lay out an attr log item in the
> * log.
> @@ -929,14 +936,12 @@ struct xfs_icreate_log {
> struct xfs_attri_log_format {
> uint16_t alfi_type; /* attri log item type */
> uint16_t alfi_size; /* size of this item */
> - uint32_t __pad; /* pad to 64 bit aligned */
> + uint8_t alfi_extra_names;/* count of name/val pairs */
> + uint8_t __pad1; /* pad to 64 bit aligned */
> + uint16_t __pad2; /* pad to 64 bit aligned */
> uint64_t alfi_id; /* attri identifier */
> uint64_t alfi_ino; /* the inode for this attr
> operation */
> - uint32_t alfi_op_flags; /* marks the op as a set or
> remove */
> - uint32_t alfi_name_len; /* attr name length */
> - uint32_t alfi_value_len; /* attr value length */
> - uint32_t alfi_attr_filter;/* attr filter flags */
> + struct xfs_attri_log_name alfi_attr[]; /* attrs to operate on
What's the length of this VLA? 1 for a normal SET or REPLACE
operation, and 2 for the "rename and replace value" operation?
If so, why do we need two xfs_attri_log_name structures? The old value
is unimportant, so we only need one alfi_value_len per operation. Each
xfs_attri_log_format only describes one change, so it only needs one
alfi_op_flags per op.
For now I also don't think attributes should be able to jump namespaces,
so we'd only need one alfi_attr_filter per op as well.
*lightbulb comes on* Oops, I think I led you astray with my unfortunate
comment. :(
IOWs, the only change to struct xfs_attri_log_format is:
- uint32_t __pad; /* pad to 64 bit aligned */
+ uint32_t alfi_new_namelen;/* new attr name length */
and the rest of the changes in "[PATCH] xfs: Add new name to attri/d"
are more or less fine as is.
I'll go reply to that before I get back to Dave's log accounting stuff.
--D
> */
> };
>
> struct xfs_attrd_log_format {
> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> index 3e7f7eaa5b96..c040eeb88def 100644
> --- a/fs/xfs/xfs_ondisk.h
> +++ b/fs/xfs/xfs_ondisk.h
> @@ -132,7 +132,7 @@ xfs_check_ondisk_structs(void)
> XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format, 56);
> XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat, 20);
> XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header, 16);
> - XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format, 48);
> + XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format, 24);
> XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format, 16);
>
> /* parent pointer ioctls */
> root@garnet:/home/achender/work_area/xfs-linux#
>
>
>
> If the on disk size check thinks the format is 24 bytes, and then we
> surprise pack an array of structs after it, isnt that going to run over
> the next item? I think anything dynamic like this has to be an nvec.
> Maybe we leave the existing alfi_* as they are so the size doesnt
> change, and then if we have a value in alfi_extra_names, then we have
> an extra nvec that has the array in it. I think that would work.
>
> FWIW, an alternate solution would be to use the pad for a second name
> length, and then we get a patch that's very similar to the one I sent
> out last Tues, but backward compatible. Though it does eat the
> remaining pad and wouldn't be as flexible, I cant think of an attr op
> that would need more than two names either?
>
> Let me know what people think. Thanks!
> Allison
>
>
> > > > Cheers,
> > > >
> > > > Dave.
> > > > --
> > > > Dave Chinner
> > > > david@fromorbit.com
>
next prev parent reply other threads:[~2022-08-23 17:32 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
2022-08-04 19:39 ` [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay Allison Henderson
2022-08-09 16:52 ` Darrick J. Wong
2022-08-10 1:58 ` Dave Chinner
2022-08-10 5:01 ` Alli
2022-08-10 6:12 ` Dave Chinner
2022-08-10 15:52 ` Darrick J. Wong
2022-08-10 19:28 ` Alli
2022-08-12 1:55 ` Alli
2022-08-12 3:05 ` Darrick J. Wong
2022-08-16 0:54 ` Dave Chinner
2022-08-16 5:07 ` Darrick J. Wong
2022-08-16 20:41 ` Alli
2022-08-19 1:05 ` Alli
2022-08-23 15:07 ` Darrick J. Wong [this message]
2022-08-24 18:47 ` Alli
2022-08-10 3:08 ` Alli
2022-08-04 19:39 ` [PATCH RESEND v2 02/18] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Allison Henderson
2022-08-09 16:38 ` Darrick J. Wong
2022-08-10 3:07 ` Alli
2022-08-04 19:39 ` [PATCH RESEND v2 03/18] xfs: Hold inode locks in xfs_ialloc Allison Henderson
2022-08-04 19:39 ` [PATCH RESEND v2 04/18] xfs: Hold inode locks in xfs_trans_alloc_dir Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 05/18] xfs: get directory offset when adding directory name Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 06/18] xfs: get directory offset when removing " Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 07/18] xfs: get directory offset when replacing a " Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 08/18] xfs: add parent pointer support to attribute code Allison Henderson
2022-08-09 16:54 ` Darrick J. Wong
2022-08-10 3:08 ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 09/18] xfs: define parent pointer xattr format Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 10/18] xfs: Add xfs_verify_pptr Allison Henderson
2022-08-09 16:59 ` Darrick J. Wong
2022-08-10 3:08 ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 11/18] xfs: extend transaction reservations for parent attributes Allison Henderson
2022-08-09 17:48 ` Darrick J. Wong
2022-08-10 3:08 ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 12/18] xfs: parent pointer attribute creation Allison Henderson
2022-08-09 18:01 ` Darrick J. Wong
2022-08-09 18:13 ` Darrick J. Wong
2022-08-10 3:09 ` Alli
2022-08-10 3:08 ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 13/18] xfs: add parent attributes to link Allison Henderson
2022-08-09 18:43 ` Darrick J. Wong
2022-08-10 3:09 ` Alli
2022-09-23 20:25 ` Darrick J. Wong
2022-08-04 19:40 ` [PATCH RESEND v2 14/18] xfs: remove parent pointers in unlink Allison Henderson
2022-08-09 18:45 ` Darrick J. Wong
2022-08-10 3:09 ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 15/18] xfs: Add parent pointers to rename Allison Henderson
2022-08-09 18:49 ` Darrick J. Wong
2022-08-10 3:09 ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 16/18] xfs: Add the parent pointer support to the superblock version 5 Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 17/18] xfs: Add helper function xfs_attr_list_context_init Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 18/18] xfs: Add parent pointer ioctl Allison Henderson
2022-08-09 19:26 ` Darrick J. Wong
2022-08-10 3:09 ` Alli
2022-09-24 0:01 ` Darrick J. Wong
2022-08-09 22:55 ` [RFC PATCH 19/18] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
2022-08-09 22:56 ` [RFC PATCH 20/18] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YwTtMqOCtkxyox3x@magnolia \
--to=djwong@kernel.org \
--cc=allison.henderson@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox