* [PATCH 1/3] design: update group quota inode information for v5 filesystems
2023-01-18 0:42 [PATCHSET 0/3] xfs-documentation: updates for 6.1 Darrick J. Wong
@ 2023-01-18 0:44 ` Darrick J. Wong
2023-01-24 5:29 ` Chandan Babu R
2023-01-18 0:45 ` [PATCH 2/3] design: document the large extent count ondisk format changes Darrick J. Wong
2023-01-18 0:45 ` [PATCH 3/3] design: document extended attribute log item changes Darrick J. Wong
2 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2023-01-18 0:44 UTC
To: djwong, darrick.wong; +Cc: linux-xfs, chandan.babu, allison.henderson
From: Darrick J. Wong <djwong@kernel.org>
Fix a few out of date statements about the group quota inode field on v5
filesystems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
.../allocation_groups.asciidoc | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index 0e48b4bf..7ee5d561 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -262,11 +262,12 @@ maintained in the first superblock.
*sb_uquotino*::
Inode for user quotas. This and the following two quota fields only apply if
+XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to
-xref:Quota_Inodes[quota inodes] for more information
+xref:Quota_Inodes[quota inodes] for more information.
*sb_gquotino*::
-Inode for group or project quotas. Group and Project quotas cannot be used at
-the same time.
+Inode for group or project quotas. Group and project quotas cannot be used at
+the same time on v4 filesystems. On a v5 filesystem, this inode always stores
+group quota information.
*sb_qflags*::
Quota flags. It can be a combination of the following flags:
* Re: [PATCH 1/3] design: update group quota inode information for v5 filesystems
2023-01-18 0:44 ` [PATCH 1/3] design: update group quota inode information for v5 filesystems Darrick J. Wong
@ 2023-01-24 5:29 ` Chandan Babu R
0 siblings, 0 replies; 8+ messages in thread
From: Chandan Babu R @ 2023-01-24 5:29 UTC
To: Darrick J. Wong; +Cc: darrick.wong, linux-xfs, allison.henderson
On Tue, Jan 17, 2023 at 04:44:49 PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Fix a few out of date statements about the group quota inode field on v5
> filesystems.
>
Looks good to me.
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
--
chandan
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> .../allocation_groups.asciidoc | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
>
> diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> index 0e48b4bf..7ee5d561 100644
> --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> @@ -262,11 +262,12 @@ maintained in the first superblock.
> *sb_uquotino*::
> Inode for user quotas. This and the following two quota fields only apply if
> +XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to
> -xref:Quota_Inodes[quota inodes] for more information
> +xref:Quota_Inodes[quota inodes] for more information.
>
> *sb_gquotino*::
> -Inode for group or project quotas. Group and Project quotas cannot be used at
> -the same time.
> +Inode for group or project quotas. Group and project quotas cannot be used at
> +the same time on v4 filesystems. On a v5 filesystem, this inode always stores
> +group quota information.
>
> *sb_qflags*::
> Quota flags. It can be a combination of the following flags:
* [PATCH 2/3] design: document the large extent count ondisk format changes
2023-01-18 0:42 [PATCHSET 0/3] xfs-documentation: updates for 6.1 Darrick J. Wong
2023-01-18 0:44 ` [PATCH 1/3] design: update group quota inode information for v5 filesystems Darrick J. Wong
@ 2023-01-18 0:45 ` Darrick J. Wong
2023-01-24 5:30 ` Chandan Babu R
2023-01-18 0:45 ` [PATCH 3/3] design: document extended attribute log item changes Darrick J. Wong
2 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2023-01-18 0:45 UTC
To: djwong, darrick.wong; +Cc: linux-xfs, chandan.babu, allison.henderson
From: Darrick J. Wong <djwong@kernel.org>
Update the ondisk format documentation to discuss the larger maximum
extent counts that were added in 2022.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
.../allocation_groups.asciidoc | 4 +
.../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 61 ++++++++++++++++++--
2 files changed, 58 insertions(+), 7 deletions(-)
diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index 7ee5d561..c64b4fad 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -454,6 +454,10 @@ xref:Timestamps[timestamps] for more information.
The filesystem is not in operable condition, and must be run through
xfs_repair before it can be mounted.
+| +XFS_SB_FEAT_INCOMPAT_NREXT64+ |
+Large file fork extent counts. This greatly expands the maximum number of
+space mappings allowed in data and extended attribute file forks.
+
|=====
*sb_features_log_incompat*::
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index 1922954e..34c06487 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -84,14 +84,41 @@ struct xfs_dinode_core {
__uint32_t di_nlink;
__uint16_t di_projid;
__uint16_t di_projid_hi;
- __uint8_t di_pad[6];
- __uint16_t di_flushiter;
+ union {
+ /* Number of data fork extents if NREXT64 is set */
+ __be64 di_big_nextents;
+
+ /* Padding for V3 inodes without NREXT64 set. */
+ __be64 di_v3_pad;
+
+ /* Padding and inode flush counter for V2 inodes. */
+ struct {
+ __u8 di_v2_pad[6];
+ __be16 di_flushiter;
+ };
+ };
xfs_timestamp_t di_atime;
xfs_timestamp_t di_mtime;
xfs_timestamp_t di_ctime;
xfs_fsize_t di_size;
xfs_rfsblock_t di_nblocks;
xfs_extlen_t di_extsize;
+ union {
+ /*
+ * For V2 inodes and V3 inodes without NREXT64 set, this
+ * is the number of data and attr fork extents.
+ */
+ struct {
+ __be32 di_nextents;
+ __be16 di_anextents;
+ } __packed;
+
+ /* Number of attr fork extents if NREXT64 is set. */
+ struct {
+ __be32 di_big_anextents;
+ __be16 di_nrext64_pad;
+ } __packed;
+ } __packed;
xfs_extnum_t di_nextents;
xfs_aextnum_t di_anextents;
__uint8_t di_forkoff;
@@ -162,7 +189,7 @@ When the number exceeds 65535, the inode is converted to v2 and the link count
is stored in +di_nlink+.
*di_uid*::
-Specifies the owner's UID of the inode.
+Specifies the owner's UID of the inode.
*di_gid*::
Specifies the owner's GID of the inode.
@@ -181,10 +208,17 @@ Specifies the high 16 bits of the owner's project ID in v2 inodes, if the
+XFS_SB_VERSION2_PROJID32BIT+ feature is set; and zero otherwise.
*di_pad[6]*::
-Reserved, must be zero.
+Reserved, must be zero. Only exists for v2 inodes.
*di_flushiter*::
-Incremented on flush.
+Incremented on flush. Only exists for v2 inodes.
+
+*di_v3_pad*::
+Must be zero for v3 inodes without the NREXT64 flag set.
+
+*di_big_nextents*::
+Specifies the number of data extents associated with this inode if the NREXT64
+flag is set. This allows for up to 2^48^ - 1 extent mappings.
*di_atime*::
@@ -231,10 +265,19 @@ file is written to beyond allocated space, XFS will attempt to allocate
additional disk space based on this value.
*di_nextents*::
-Specifies the number of data extents associated with this inode.
+Specifies the number of data extents associated with this inode if the NREXT64
+flag is not set. Supports up to 2^31^ - 1 extents.
*di_anextents*::
-Specifies the number of extended attribute extents associated with this inode.
+Specifies the number of extended attribute extents associated with this inode
+if the NREXT64 flag is not set. Supports up to 2^15^ - 1 extents.
+
+*di_big_anextents*::
+Specifies the number of extended attribute extents associated with this inode
+if the NREXT64 flag is set. Supports up to 2^32^ - 1 extents.
+
+*di_nrext64_pad*::
+Must be zero if the NREXT64 flag is set.
*di_forkoff*::
Specifies the offset into the inode's literal area where the extended attribute
@@ -336,6 +379,10 @@ This inode shares (or has shared) data blocks with another inode.
For files, this is the extent size hint for copy on write operations; see
+di_cowextsize+ for details. For directories, the value in +di_cowextsize+
will be copied to all newly created files and directories.
+| +XFS_DIFLAG2_NREXT64+ |
+Files with this flag set may have up to (2^48^ - 1) extents mapped to the data
+fork and up to (2^32^ - 1) extents mapped to the attribute fork. This flag
+requires the +XFS_SB_FEAT_INCOMPAT_NREXT64+ feature to be enabled.
|=====
*di_cowextsize*::
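
To make the extent counter descriptions in this patch concrete, here is a
minimal sketch (not the kernel's implementation) of how a reader of the ondisk
inode might pick the right counters depending on whether the NREXT64 flag is
set. Everything prefixed with sketch_ is a hypothetical name introduced for
illustration; the two byte arrays stand in for the two unions shown in the
struct above, and the caller is assumed to have already tested
XFS_DIFLAG2_NREXT64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two unions in the inode core.
 * The bytes are kept raw because the ondisk fields are big-endian.
 */
struct sketch_extent_fields {
	/* di_big_nextents, or di_v3_pad, or di_v2_pad[6] + di_flushiter */
	uint8_t big_or_pad[8];
	/* di_nextents + di_anextents, or di_big_anextents + di_nrext64_pad */
	uint8_t counts[6];
};

/* Byte arrays have no alignment padding, so the sizes must add up. */
static_assert(sizeof(struct sketch_extent_fields) == 14, "unexpected padding");

/* Decoded counters, in host byte order. */
struct sketch_extent_counts {
	uint64_t data;	/* data fork extents */
	uint32_t attr;	/* attr fork extents */
};

/* Assemble a big-endian integer of len bytes (len <= 8). */
static uint64_t sketch_be_to_cpu(const uint8_t *p, unsigned int len)
{
	uint64_t v = 0;

	for (unsigned int i = 0; i < len; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Select the counters according to the NREXT64 flag. */
static struct sketch_extent_counts
sketch_read_extent_counts(const struct sketch_extent_fields *f, bool nrext64)
{
	struct sketch_extent_counts c;

	if (nrext64) {
		/* 64-bit data fork count lives where the v2 padding was. */
		c.data = sketch_be_to_cpu(f->big_or_pad, 8);
		/* 32-bit attr fork count, followed by 16 bits of padding. */
		c.attr = (uint32_t)sketch_be_to_cpu(f->counts, 4);
	} else {
		/* Legacy layout: 32-bit data count, 16-bit attr count. */
		c.data = sketch_be_to_cpu(f->counts, 4);
		c.attr = (uint32_t)sketch_be_to_cpu(f->counts + 4, 2);
	}
	return c;
}
```

Decoding byte by byte keeps the sketch independent of host endianness. It does
not check the padding fields (+di_v3_pad+, +di_nrext64_pad+) or the documented
maximum counts; a real verifier would presumably need to do both.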
* Re: [PATCH 2/3] design: document the large extent count ondisk format changes
2023-01-18 0:45 ` [PATCH 2/3] design: document the large extent count ondisk format changes Darrick J. Wong
@ 2023-01-24 5:30 ` Chandan Babu R
0 siblings, 0 replies; 8+ messages in thread
From: Chandan Babu R @ 2023-01-24 5:30 UTC
To: Darrick J. Wong; +Cc: darrick.wong, linux-xfs, allison.henderson
On Tue, Jan 17, 2023 at 04:45:05 PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Update the ondisk format documentation to discuss the larger maximum
> extent counts that were added in 2022.
>
Looks good to me.
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
--
chandan
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> .../allocation_groups.asciidoc | 4 +
> .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 61 ++++++++++++++++++--
> 2 files changed, 58 insertions(+), 7 deletions(-)
>
>
> diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> index 7ee5d561..c64b4fad 100644
> --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> @@ -454,6 +454,10 @@ xref:Timestamps[timestamps] for more information.
> The filesystem is not in operable condition, and must be run through
> xfs_repair before it can be mounted.
>
> +| +XFS_SB_FEAT_INCOMPAT_NREXT64+ |
> +Large file fork extent counts. This greatly expands the maximum number of
> +space mappings allowed in data and extended attribute file forks.
> +
> |=====
>
> *sb_features_log_incompat*::
> diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
> index 1922954e..34c06487 100644
> --- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
> +++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
> @@ -84,14 +84,41 @@ struct xfs_dinode_core {
> __uint32_t di_nlink;
> __uint16_t di_projid;
> __uint16_t di_projid_hi;
> - __uint8_t di_pad[6];
> - __uint16_t di_flushiter;
> + union {
> + /* Number of data fork extents if NREXT64 is set */
> + __be64 di_big_nextents;
> +
> + /* Padding for V3 inodes without NREXT64 set. */
> + __be64 di_v3_pad;
> +
> + /* Padding and inode flush counter for V2 inodes. */
> + struct {
> + __u8 di_v2_pad[6];
> + __be16 di_flushiter;
> + };
> + };
> xfs_timestamp_t di_atime;
> xfs_timestamp_t di_mtime;
> xfs_timestamp_t di_ctime;
> xfs_fsize_t di_size;
> xfs_rfsblock_t di_nblocks;
> xfs_extlen_t di_extsize;
> + union {
> + /*
> + * For V2 inodes and V3 inodes without NREXT64 set, this
> + * is the number of data and attr fork extents.
> + */
> + struct {
> + __be32 di_nextents;
> + __be16 di_anextents;
> + } __packed;
> +
> + /* Number of attr fork extents if NREXT64 is set. */
> + struct {
> + __be32 di_big_anextents;
> + __be16 di_nrext64_pad;
> + } __packed;
> + } __packed;
> xfs_extnum_t di_nextents;
> xfs_aextnum_t di_anextents;
> __uint8_t di_forkoff;
> @@ -162,7 +189,7 @@ When the number exceeds 65535, the inode is converted to v2 and the link count
> is stored in +di_nlink+.
>
> *di_uid*::
> -Specifies the owner's UID of the inode.
> +Specifies the owner's UID of the inode.
>
> *di_gid*::
> Specifies the owner's GID of the inode.
> @@ -181,10 +208,17 @@ Specifies the high 16 bits of the owner's project ID in v2 inodes, if the
> +XFS_SB_VERSION2_PROJID32BIT+ feature is set; and zero otherwise.
>
> *di_pad[6]*::
> -Reserved, must be zero.
> +Reserved, must be zero. Only exists for v2 inodes.
>
> *di_flushiter*::
> -Incremented on flush.
> +Incremented on flush. Only exists for v2 inodes.
> +
> +*di_v3_pad*::
> +Must be zero for v3 inodes without the NREXT64 flag set.
> +
> +*di_big_nextents*::
> +Specifies the number of data extents associated with this inode if the NREXT64
> +flag is set. This allows for up to 2^48^ - 1 extent mappings.
>
> *di_atime*::
>
> @@ -231,10 +265,19 @@ file is written to beyond allocated space, XFS will attempt to allocate
> additional disk space based on this value.
>
> *di_nextents*::
> -Specifies the number of data extents associated with this inode.
> +Specifies the number of data extents associated with this inode if the NREXT64
> +flag is not set. Supports up to 2^31^ - 1 extents.
>
> *di_anextents*::
> -Specifies the number of extended attribute extents associated with this inode.
> +Specifies the number of extended attribute extents associated with this inode
> +if the NREXT64 flag is not set. Supports up to 2^15^ - 1 extents.
> +
> +*di_big_anextents*::
> +Specifies the number of extended attribute extents associated with this inode
> +if the NREXT64 flag is set. Supports up to 2^32^ - 1 extents.
> +
> +*di_nrext64_pad*::
> +Must be zero if the NREXT64 flag is set.
>
> *di_forkoff*::
> Specifies the offset into the inode's literal area where the extended attribute
> @@ -336,6 +379,10 @@ This inode shares (or has shared) data blocks with another inode.
> For files, this is the extent size hint for copy on write operations; see
> +di_cowextsize+ for details. For directories, the value in +di_cowextsize+
> will be copied to all newly created files and directories.
> +| +XFS_DIFLAG2_NREXT64+ |
> +Files with this flag set may have up to (2^48^ - 1) extents mapped to the data
> +fork and up to (2^32^ - 1) extents mapped to the attribute fork. This flag
> +requires the +XFS_SB_FEAT_INCOMPAT_NREXT64+ feature to be enabled.
> |=====
>
> *di_cowextsize*::
* [PATCH 3/3] design: document extended attribute log item changes
2023-01-18 0:42 [PATCHSET 0/3] xfs-documentation: updates for 6.1 Darrick J. Wong
2023-01-18 0:44 ` [PATCH 1/3] design: update group quota inode information for v5 filesystems Darrick J. Wong
2023-01-18 0:45 ` [PATCH 2/3] design: document the large extent count ondisk format changes Darrick J. Wong
@ 2023-01-18 0:45 ` Darrick J. Wong
2023-01-24 5:30 ` Chandan Babu R
2 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2023-01-18 0:45 UTC
To: djwong, darrick.wong; +Cc: linux-xfs, chandan.babu, allison.henderson
From: Darrick J. Wong <djwong@kernel.org>
Describe the changes to the ondisk log format that are required to
support atomic updates to extended attributes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
.../allocation_groups.asciidoc | 14 ++-
.../journaling_log.asciidoc | 109 ++++++++++++++++++++
design/XFS_Filesystem_Structure/magic.asciidoc | 2
3 files changed, 122 insertions(+), 3 deletions(-)
diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index c64b4fad..c0ba16a8 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -461,9 +461,17 @@ space mappings allowed in data and extended attribute file forks.
|=====
*sb_features_log_incompat*::
-Read-write incompatible feature flags for the log. The kernel cannot read or
-write this FS log if it doesn't understand the flag. Currently, no flags are
-defined.
+Read-write incompatible feature flags for the log. The kernel cannot recover
+the FS log if it doesn't understand the flag.
+
+.Extended Version 5 Superblock Log incompatibility flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+ |
+Extended attribute updates have been committed to the ondisk log.
+
+|=====
*sb_crc*::
Superblock checksum.
diff --git a/design/XFS_Filesystem_Structure/journaling_log.asciidoc b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
index ddcb87f4..f36dd352 100644
--- a/design/XFS_Filesystem_Structure/journaling_log.asciidoc
+++ b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
@@ -215,6 +215,8 @@ magic number to distinguish themselves. Buffer data items only appear after
| +XFS_LI_CUD+ | 0x1243 | xref:CUD_Log_Item[Reference Count Update Done]
| +XFS_LI_BUI+ | 0x1244 | xref:BUI_Log_Item[File Block Mapping Update Intent]
| +XFS_LI_BUD+ | 0x1245 | xref:BUD_Log_Item[File Block Mapping Update Done]
+| +XFS_LI_ATTRI+ | 0x1246 | xref:ATTRI_Log_Item[Extended Attribute Update Intent]
+| +XFS_LI_ATTRD+ | 0x1247 | xref:ATTRD_Log_Item[Extended Attribute Update Done]
|=====
Note that all log items (except for transaction headers) MUST start with
@@ -712,6 +714,113 @@ Size of this log item. Should be 1.
*bud_bui_id*::
A 64-bit number that binds the corresponding BUI log item to this BUD log item.
+[[ATTRI_Log_Item]]
+=== Extended Attribute Update Intent
+
+The next two operation types work together to handle atomic extended attribute
+updates.
+
+The lower byte of the +alfi_op_flags+ field is a type code indicating what sort
+of file block mapping operation we want.
+
+.Extended attribute update log intent types
+[options="header"]
+|=====
+| Value | Description
+| +XFS_ATTRI_OP_FLAGS_SET+ | Set a key/value pair.
+| +XFS_ATTRI_OP_FLAGS_REMOVE+ | Remove a key/value pair.
+| +XFS_ATTRI_OP_FLAGS_REPLACE+ | Replace one key/value pair with another.
+|=====
+
+The ``extended attribute update intent'' operation comes first; it tells the
+log that XFS wants to update one of a file's extended attributes. This record
+is crucial for correct log recovery because it enables us to spread a complex
+metadata update across multiple transactions while ensuring that a crash midway
+through the complex update will be replayed fully during log recovery.
+
+[source, c]
+----
+struct xfs_attri_log_format {
+ uint16_t alfi_type;
+ uint16_t alfi_size;
+ uint32_t __pad;
+ uint64_t alfi_id;
+ uint64_t alfi_ino;
+ uint32_t alfi_op_flags;
+ uint32_t alfi_name_len;
+ uint32_t alfi_value_len;
+ uint32_t alfi_attr_filter;
+};
+----
+
+*alfi_type*::
+The signature of an ATTRI operation, 0x1246. This value is in host-endian
+order, not big-endian like the rest of XFS.
+
+*alfi_size*::
+Size of this log item. Should be 1.
+
+*alfi_id*::
+A 64-bit number that binds the corresponding ATTRD log item to this ATTRI log
+item.
+
+*alfi_ino*::
+Inode number of the file being updated.
+
+*alfi_op_flags*::
+The operation being performed. The lower byte must be one of the
++XFS_ATTRI_OP_FLAGS_*+ flags defined above. The upper bytes must be zero.
+
+*alfi_name_len*::
+Length of the name of the extended attribute. This must not be zero.
+The attribute name itself is captured in the next log item.
+
+*alfi_value_len*::
+Length of the value of the extended attribute. This must be zero for remove
+operations, and nonzero for set and replace operations. The attribute value
+itself is captured in the log item immediately after the item containing the
+name.
+
+*alfi_attr_filter*::
+Attribute namespace filter flags. This must be one of +ATTR_ROOT+,
++ATTR_SECURE+, or +ATTR_INCOMPLETE+.
+
+[[ATTRD_Log_Item]]
+=== Completion of Extended Attribute Updates
+
+The ``extended attribute update done'' operation complements the ``extended
+attribute update intent'' operation. This second operation indicates that the
+update actually happened, so that log recovery needn't replay the update. The
+ATTRD and the actual updates are typically found in a new transaction following
+the transaction in which the ATTRI was logged.
+
+[source, c]
+----
+struct xfs_attrd_log_format {
+ __uint16_t alfd_type;
+ __uint16_t alfd_size;
+ __uint32_t __pad;
+ __uint64_t alfd_alf_id;
+};
+----
+
+*alfd_type*::
+The signature of an ATTRD operation, 0x1247. This value is in host-endian
+order, not big-endian like the rest of XFS.
+
+*alfd_size*::
+Size of this log item. Should be 1.
+
+*alfd_bui_id*::
+A 64-bit number that binds the corresponding ATTRI log item to this ATTRD log
+item.
+
+=== Extended Attribute Name and Value
+
+These regions contain the name and value components of the extended attribute
+being updated, as needed. There are no magic numbers; each region contains the
+data and nothing else.
+
[[Inode_Log_Item]]
=== Inode Updates
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index 9be26f82..a343271a 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -71,6 +71,8 @@ are not aligned to blocks.
| +XFS_LI_CUD+ | 0x1243 | | xref:CUD_Log_Item[Reference Count Update Done]
| +XFS_LI_BUI+ | 0x1244 | | xref:BUI_Log_Item[File Block Mapping Update Intent]
| +XFS_LI_BUD+ | 0x1245 | | xref:BUD_Log_Item[File Block Mapping Update Done]
+| +XFS_LI_ATTRI+ | 0x1246 | | xref:ATTRI_Log_Item[Extended Attribute Update Intent]
+| +XFS_LI_ATTRD+ | 0x1247 | | xref:ATTRD_Log_Item[Extended Attribute Update Done]
|=====
= Theoretical Limits
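
The constraints on the ATTRI fields described in this patch can be sketched as
a small validation routine. This is an illustration under stated assumptions,
not the kernel's log recovery code: the sketch_ and SKETCH_ names are
hypothetical, the numeric values of the three operation codes are assumed for
the example (the patch does not give them), and the checks simply mirror the
field descriptions above. The +alfi_attr_filter+ check is left out because the
flag values are not given here either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signature of an ATTRI item, from the log item table in this patch. */
#define SKETCH_LI_ATTRI			0x1246

/* Operation codes: the numeric values are ASSUMED for this example. */
#define SKETCH_ATTRI_OP_SET		1
#define SKETCH_ATTRI_OP_REMOVE		2
#define SKETCH_ATTRI_OP_REPLACE		3
#define SKETCH_ATTRI_OP_TYPE_MASK	0xFFu	/* lower byte holds the type */

/* Same field order as struct xfs_attri_log_format in this patch. */
struct sketch_attri_log_format {
	uint16_t alfi_type;
	uint16_t alfi_size;
	uint32_t pad;
	uint64_t alfi_id;
	uint64_t alfi_ino;
	uint32_t alfi_op_flags;
	uint32_t alfi_name_len;
	uint32_t alfi_value_len;
	uint32_t alfi_attr_filter;
};

static_assert(sizeof(struct sketch_attri_log_format) == 40,
	      "layout should be 40 bytes with no implicit padding");

/* Return true if the item obeys the rules stated in the field list. */
static bool sketch_attri_is_valid(const struct sketch_attri_log_format *a)
{
	uint32_t op = a->alfi_op_flags & SKETCH_ATTRI_OP_TYPE_MASK;

	if (a->alfi_type != SKETCH_LI_ATTRI)
		return false;

	/* The upper bytes of alfi_op_flags must be zero. */
	if (a->alfi_op_flags & ~SKETCH_ATTRI_OP_TYPE_MASK)
		return false;

	/* Every operation needs an attribute name. */
	if (a->alfi_name_len == 0)
		return false;

	switch (op) {
	case SKETCH_ATTRI_OP_SET:
	case SKETCH_ATTRI_OP_REPLACE:
		/* Set and replace carry a value. */
		return a->alfi_value_len != 0;
	case SKETCH_ATTRI_OP_REMOVE:
		/* Remove must not carry a value. */
		return a->alfi_value_len == 0;
	default:
		return false;
	}
}
```

A recovery path would presumably run a check like this before reading the name
and value regions that follow the ATTRI item, since their lengths come from
+alfi_name_len+ and +alfi_value_len+.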
* Re: [PATCH 3/3] design: document extended attribute log item changes
2023-01-18 0:45 ` [PATCH 3/3] design: document extended attribute log item changes Darrick J. Wong
@ 2023-01-24 5:30 ` Chandan Babu R
2023-01-25 1:20 ` Darrick J. Wong
0 siblings, 1 reply; 8+ messages in thread
From: Chandan Babu R @ 2023-01-24 5:30 UTC
To: Darrick J. Wong; +Cc: darrick.wong, linux-xfs, allison.henderson
On Tue, Jan 17, 2023 at 04:45:20 PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Describe the changes to the ondisk log format that are required to
> support atomic updates to extended attributes.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> .../allocation_groups.asciidoc | 14 ++-
> .../journaling_log.asciidoc | 109 ++++++++++++++++++++
> design/XFS_Filesystem_Structure/magic.asciidoc | 2
> 3 files changed, 122 insertions(+), 3 deletions(-)
>
>
> diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> index c64b4fad..c0ba16a8 100644
> --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> @@ -461,9 +461,17 @@ space mappings allowed in data and extended attribute file forks.
> |=====
>
> *sb_features_log_incompat*::
> -Read-write incompatible feature flags for the log. The kernel cannot read or
> -write this FS log if it doesn't understand the flag. Currently, no flags are
> -defined.
> +Read-write incompatible feature flags for the log. The kernel cannot recover
> +the FS log if it doesn't understand the flag.
> +
> +.Extended Version 5 Superblock Log incompatibility flags
> +[options="header"]
> +|=====
> +| Flag | Description
> +| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+ |
> +Extended attribute updates have been committed to the ondisk log.
> +
> +|=====
>
> *sb_crc*::
> Superblock checksum.
> diff --git a/design/XFS_Filesystem_Structure/journaling_log.asciidoc b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
> index ddcb87f4..f36dd352 100644
> --- a/design/XFS_Filesystem_Structure/journaling_log.asciidoc
> +++ b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
> @@ -215,6 +215,8 @@ magic number to distinguish themselves. Buffer data items only appear after
> | +XFS_LI_CUD+ | 0x1243 | xref:CUD_Log_Item[Reference Count Update Done]
> | +XFS_LI_BUI+ | 0x1244 | xref:BUI_Log_Item[File Block Mapping Update Intent]
> | +XFS_LI_BUD+ | 0x1245 | xref:BUD_Log_Item[File Block Mapping Update Done]
> +| +XFS_LI_ATTRI+ | 0x1246 | xref:ATTRI_Log_Item[Extended Attribute Update Intent]
> +| +XFS_LI_ATTRD+ | 0x1247 | xref:ATTRD_Log_Item[Extended Attribute Update Done]
> |=====
>
> Note that all log items (except for transaction headers) MUST start with
> @@ -712,6 +714,113 @@ Size of this log item. Should be 1.
> *bud_bui_id*::
> A 64-bit number that binds the corresponding BUI log item to this BUD log item.
>
> +[[ATTRI_Log_Item]]
> +=== Extended Attribute Update Intent
> +
> +The next two operation types work together to handle atomic extended attribute
> +updates.
> +
> +The lower byte of the +alfi_op_flags+ field is a type code indicating what sort
> +of file block mapping operation we want.
> +
> +.Extended attribute update log intent types
> +[options="header"]
> +|=====
> +| Value | Description
> +| +XFS_ATTRI_OP_FLAGS_SET+ | Set a key/value pair.
> +| +XFS_ATTRI_OP_FLAGS_REMOVE+ | Remove a key/value pair.
> +| +XFS_ATTRI_OP_FLAGS_REPLACE+ | Replace one key/value pair with another.
> +|=====
> +
> +The ``extended attribute update intent'' operation comes first; it tells the
> +log that XFS wants to update one of a file's extended attributes. This record
> +is crucial for correct log recovery because it enables us to spread a complex
> +metadata update across multiple transactions while ensuring that a crash midway
> +through the complex update will be replayed fully during log recovery.
> +
> +[source, c]
> +----
> +struct xfs_attri_log_format {
> + uint16_t alfi_type;
> + uint16_t alfi_size;
> + uint32_t __pad;
> + uint64_t alfi_id;
> + uint64_t alfi_ino;
> + uint32_t alfi_op_flags;
> + uint32_t alfi_name_len;
> + uint32_t alfi_value_len;
> + uint32_t alfi_attr_filter;
> +};
> +----
> +
> +*alfi_type*::
> +The signature of an ATTRI operation, 0x1246. This value is in host-endian
> +order, not big-endian like the rest of XFS.
> +
> +*alfi_size*::
> +Size of this log item. Should be 1.
> +
> +*alfi_id*::
> +A 64-bit number that binds the corresponding ATTRD log item to this ATTRI log
> +item.
> +
> +*alfi_ino*::
> +Inode number of the file being updated.
> +
> +*alfi_op_flags*::
> +The operation being performed. The lower byte must be one of the
> ++XFS_ATTRI_OP_FLAGS_*+ flags defined above. The upper bytes must be zero.
> +
> +*alfi_name_len*::
> +Length of the name of the extended attribute. This must not be zero.
> +The attribute name itself is captured in the next log item.
> +
> +*alfi_value_len*::
> +Length of the value of the extended attribute. This must be zero for remove
> +operations, and nonzero for set and replace operations. The attribute value
> +itself is captured in the log item immediately after the item containing the
> +name.
> +
> +*alfi_attr_filter*::
> +Attribute namespace filter flags. This must be one of +ATTR_ROOT+,
> ++ATTR_SECURE+, or +ATTR_INCOMPLETE+.
> +
> +[[ATTRD_Log_Item]]
> +=== Completion of Extended Attribute Updates
> +
> +The ``extended attribute update done'' operation complements the ``extended
> +attribute update intent'' operation. This second operation indicates that the
> +update actually happened, so that log recovery needn't replay the update. The
> +ATTRD and the actual updates are typically found in a new transaction following
> +the transaction in which the ATTRI was logged.
> +
> +[source, c]
> +----
> +struct xfs_attrd_log_format {
> + __uint16_t alfd_type;
> + __uint16_t alfd_size;
> + __uint32_t __pad;
> + __uint64_t alfd_alf_id;
> +};
> +----
> +
> +*alfd_type*::
> +The signature of an ATTRD operation, 0x1247. This value is in host-endian
> +order, not big-endian like the rest of XFS.
> +
> +*alfd_size*::
> +Size of this log item. Should be 1.
> +
> +*alfd_bui_id*::
The above should be "alfd_alf_id". Apart from that, the remaining
changes appear to be correct.
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
--
chandan
> +A 64-bit number that binds the corresponding ATTRI log item to this ATTRD log
> +item.
> +
> +=== Extended Attribute Name and Value
> +
> +These regions contain the name and value components of the extended attribute
> +being updated, as needed. There are no magic numbers; each region contains the
> +data and nothing else.
> +
> [[Inode_Log_Item]]
> === Inode Updates
>
> diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
> index 9be26f82..a343271a 100644
> --- a/design/XFS_Filesystem_Structure/magic.asciidoc
> +++ b/design/XFS_Filesystem_Structure/magic.asciidoc
> @@ -71,6 +71,8 @@ are not aligned to blocks.
> | +XFS_LI_CUD+ | 0x1243 | | xref:CUD_Log_Item[Reference Count Update Done]
> | +XFS_LI_BUI+ | 0x1244 | | xref:BUI_Log_Item[File Block Mapping Update Intent]
> | +XFS_LI_BUD+ | 0x1245 | | xref:BUD_Log_Item[File Block Mapping Update Done]
> +| +XFS_LI_ATTRI+ | 0x1246 | | xref:ATTRI_Log_Item[Extended Attribute Update Intent]
> +| +XFS_LI_ATTRD+ | 0x1247 | | xref:ATTRD_Log_Item[Extended Attribute Update Done]
> |=====
>
> = Theoretical Limits
* Re: [PATCH 3/3] design: document extended attribute log item changes
2023-01-24 5:30 ` Chandan Babu R
@ 2023-01-25 1:20 ` Darrick J. Wong
0 siblings, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2023-01-25 1:20 UTC
To: Chandan Babu R; +Cc: darrick.wong, linux-xfs, allison.henderson
On Tue, Jan 24, 2023 at 11:00:56AM +0530, Chandan Babu R wrote:
> On Tue, Jan 17, 2023 at 04:45:20 PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Describe the changes to the ondisk log format that are required to
> > support atomic updates to extended attributes.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > .../allocation_groups.asciidoc | 14 ++-
> > .../journaling_log.asciidoc | 109 ++++++++++++++++++++
> > design/XFS_Filesystem_Structure/magic.asciidoc | 2
> > 3 files changed, 122 insertions(+), 3 deletions(-)
> >
> >
> > diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> > index c64b4fad..c0ba16a8 100644
> > --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> > +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> > @@ -461,9 +461,17 @@ space mappings allowed in data and extended attribute file forks.
> > |=====
> >
> > *sb_features_log_incompat*::
> > -Read-write incompatible feature flags for the log. The kernel cannot read or
> > -write this FS log if it doesn't understand the flag. Currently, no flags are
> > -defined.
> > +Read-write incompatible feature flags for the log. The kernel cannot recover
> > +the FS log if it doesn't understand the flag.
> > +
> > +.Extended Version 5 Superblock Log incompatibility flags
> > +[options="header"]
> > +|=====
> > +| Flag | Description
> > +| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+ |
> > +Extended attribute updates have been committed to the ondisk log.
> > +
> > +|=====
> >
> > *sb_crc*::
> > Superblock checksum.
> > diff --git a/design/XFS_Filesystem_Structure/journaling_log.asciidoc b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
> > index ddcb87f4..f36dd352 100644
> > --- a/design/XFS_Filesystem_Structure/journaling_log.asciidoc
> > +++ b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
> > @@ -215,6 +215,8 @@ magic number to distinguish themselves. Buffer data items only appear after
> > | +XFS_LI_CUD+ | 0x1243 | xref:CUD_Log_Item[Reference Count Update Done]
> > | +XFS_LI_BUI+ | 0x1244 | xref:BUI_Log_Item[File Block Mapping Update Intent]
> > | +XFS_LI_BUD+ | 0x1245 | xref:BUD_Log_Item[File Block Mapping Update Done]
> > +| +XFS_LI_ATTRI+ | 0x1246 | xref:ATTRI_Log_Item[Extended Attribute Update Intent]
> > +| +XFS_LI_ATTRD+ | 0x1247 | xref:ATTRD_Log_Item[Extended Attribute Update Done]
> > |=====
> >
> > Note that all log items (except for transaction headers) MUST start with
> > @@ -712,6 +714,113 @@ Size of this log item. Should be 1.
> > *bud_bui_id*::
> > A 64-bit number that binds the corresponding BUI log item to this BUD log item.
> >
> > +[[ATTRI_Log_Item]]
> > +=== Extended Attribute Update Intent
> > +
> > +The next two operation types work together to handle atomic extended attribute
> > +updates.
> > +
> > +The lower byte of the +alfi_op_flags+ field is a type code indicating what sort
> > +of extended attribute operation we want.
> > +
> > +.Extended attribute update log intent types
> > +[options="header"]
> > +|=====
> > +| Value | Description
> > +| +XFS_ATTRI_OP_FLAGS_SET+ | Set a key/value pair.
> > +| +XFS_ATTRI_OP_FLAGS_REMOVE+ | Remove a key/value pair.
> > +| +XFS_ATTRI_OP_FLAGS_REPLACE+ | Replace one key/value pair with another.
> > +|=====
> > +
> > +The ``extended attribute update intent'' operation comes first; it tells the
> > +log that XFS wants to update one of a file's extended attributes. This record
> > +is crucial for correct log recovery because it enables us to spread a complex
> > +metadata update across multiple transactions while ensuring that an update
> > +interrupted midway by a crash will be replayed fully during log recovery.
> > +
> > +[source, c]
> > +----
> > +struct xfs_attri_log_format {
> > + uint16_t alfi_type;
> > + uint16_t alfi_size;
> > + uint32_t __pad;
> > + uint64_t alfi_id;
> > + uint64_t alfi_ino;
> > + uint32_t alfi_op_flags;
> > + uint32_t alfi_name_len;
> > + uint32_t alfi_value_len;
> > + uint32_t alfi_attr_filter;
> > +};
> > +----
> > +
> > +*alfi_type*::
> > +The signature of an ATTRI operation, 0x1246. This value is in host-endian
> > +order, not big-endian like the rest of XFS.
> > +
> > +*alfi_size*::
> > +Size of this log item. Should be 1.
> > +
> > +*alfi_id*::
> > +A 64-bit number that binds the corresponding ATTRD log item to this ATTRI log
> > +item.
> > +
> > +*alfi_ino*::
> > +Inode number of the file being updated.
> > +
> > +*alfi_op_flags*::
> > +The operation being performed. The lower byte must be one of the
> > ++XFS_ATTRI_OP_FLAGS_*+ flags defined above. The upper bytes must be zero.
> > +
> > +*alfi_name_len*::
> > +Length of the name of the extended attribute. This must not be zero.
> > +The attribute name itself is captured in the next log item.
> > +
> > +*alfi_value_len*::
> > +Length of the value of the extended attribute. This must be zero for remove
> > +operations, and nonzero for set and replace operations. The attribute value
> > +itself is captured in the log item immediately after the item containing the
> > +name.
> > +
> > +*alfi_attr_filter*::
> > +Attribute namespace filter flags. This must be one of +ATTR_ROOT+,
> > ++ATTR_SECURE+, or +ATTR_INCOMPLETE+.
> > +
> > +[[ATTRD_Log_Item]]
> > +=== Completion of Extended Attribute Updates
> > +
> > +The ``extended attribute update done'' operation complements the ``extended
> > +attribute update intent'' operation. This second operation indicates that the
> > +update actually happened, so that log recovery needn't replay the update. The
> > +ATTRD and the actual updates are typically found in a new transaction following
> > +the transaction in which the ATTRI was logged.
> > +
> > +[source, c]
> > +----
> > +struct xfs_attrd_log_format {
> > + __uint16_t alfd_type;
> > + __uint16_t alfd_size;
> > + __uint32_t __pad;
> > + __uint64_t alfd_alf_id;
> > +};
> > +----
> > +
> > +*alfd_type*::
> > +The signature of an ATTRD operation, 0x1247. This value is in host-endian
> > +order, not big-endian like the rest of XFS.
> > +
> > +*alfd_size*::
> > +Size of this log item. Should be 1.
> > +
> > +*alfd_bui_id*::
>
> The above should be "alfd_alf_id". Apart from that, the remaining
> changes appear to be correct.
>
> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
I'll fix that. Thanks for the review!
--D
> --
> chandan
>
> > +A 64-bit number that binds the corresponding ATTRI log item to this ATTRD log
> > +item.
> > +
> > +=== Extended Attribute Name and Value
> > +
> > +These regions contain the name and value components of the extended attribute
> > +being updated, as needed. There are no magic numbers; each region contains the
> > +data and nothing else.
> > +
> > [[Inode_Log_Item]]
> > === Inode Updates
> >
> > diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
> > index 9be26f82..a343271a 100644
> > --- a/design/XFS_Filesystem_Structure/magic.asciidoc
> > +++ b/design/XFS_Filesystem_Structure/magic.asciidoc
> > @@ -71,6 +71,8 @@ are not aligned to blocks.
> > | +XFS_LI_CUD+ | 0x1243 | | xref:CUD_Log_Item[Reference Count Update Done]
> > | +XFS_LI_BUI+ | 0x1244 | | xref:BUI_Log_Item[File Block Mapping Update Intent]
> > | +XFS_LI_BUD+ | 0x1245 | | xref:BUD_Log_Item[File Block Mapping Update Done]
> > +| +XFS_LI_ATTRI+ | 0x1246 | | xref:ATTRI_Log_Item[Extended Attribute Update Intent]
> > +| +XFS_LI_ATTRD+ | 0x1247 | | xref:ATTRD_Log_Item[Extended Attribute Update Done]
> > |=====
> >
> > = Theoretical Limits