* [PATCHBOMB] xfs-documentation: updates for 6.13
From: Darrick J. Wong @ 2024-11-27 0:16 UTC
To: xfs
Hi all,
Here are all the changes I have staged for the ondisk documentation for
Linux 6.13. I'm reposting the patches along with a pull request and
will do a release immediately afterwards.
--D
* [PATCHSET] xfs-documentation: updates for 6.13
From: Darrick J. Wong @ 2024-11-27 0:18 UTC
To: djwong; +Cc: hch, hch, cem, linux-xfs
Hi all,
Here's a pile of updates detailing the changes made during 6.12 and 6.13.
If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.
Comments and questions are, as always, welcome.
xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=xfsdocs-6.13-updates
---
Commits in this patchset:
* design: update metadata reconstruction chapter
* design: document filesystem properties
* design: move superblock documentation to a separate file
* design: document the actual ondisk superblock
* design: document the changes required to handle metadata directories
* design: move discussion of realtime volumes to a separate section
* design: document realtime groups
* design: document metadata directory tree quota changes
* design: update metadump v2 format to reflect rt dumps
* xfs-documentation: release for 6.1[23]
---
.../allocation_groups.asciidoc | 570 --------------------
.../XFS_Filesystem_Structure/common_types.asciidoc | 4
design/XFS_Filesystem_Structure/docinfo.xml | 19 +
.../fs_properties.asciidoc | 28 +
.../internal_inodes.asciidoc | 154 ++++-
design/XFS_Filesystem_Structure/magic.asciidoc | 3
design/XFS_Filesystem_Structure/metadump.asciidoc | 12
.../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 27 +
design/XFS_Filesystem_Structure/realtime.asciidoc | 394 ++++++++++++++
.../reconstruction.asciidoc | 17 -
.../XFS_Filesystem_Structure/superblock.asciidoc | 574 ++++++++++++++++++++
.../xfs_filesystem_structure.asciidoc | 4
12 files changed, 1192 insertions(+), 614 deletions(-)
create mode 100644 design/XFS_Filesystem_Structure/fs_properties.asciidoc
create mode 100644 design/XFS_Filesystem_Structure/realtime.asciidoc
create mode 100644 design/XFS_Filesystem_Structure/superblock.asciidoc
* [PATCH 01/10] design: update metadata reconstruction chapter
From: Darrick J. Wong @ 2024-11-27 0:18 UTC
To: djwong; +Cc: hch, hch, cem, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
We've landed online repair and full backrefs in the filesystem, so
update the links to the new sections and transform future tense to
present tense.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
.../reconstruction.asciidoc | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/design/XFS_Filesystem_Structure/reconstruction.asciidoc b/design/XFS_Filesystem_Structure/reconstruction.asciidoc
index f172e0f8161656..f4c10217910b6c 100644
--- a/design/XFS_Filesystem_Structure/reconstruction.asciidoc
+++ b/design/XFS_Filesystem_Structure/reconstruction.asciidoc
@@ -1,10 +1,6 @@
[[Reconstruction]]
= Metadata Reconstruction
-[NOTE]
-This is a theoretical discussion of how reconstruction could work; none of this
-is implemented as of 2015.
-
A simple UNIX filesystem can be thought of in terms of a directed acyclic graph.
To a first approximation, there exists a root directory node, which points to
other nodes. Those other nodes can themselves be directories or they can be
@@ -45,9 +41,14 @@ The xref:Reverse_Mapping_Btree[reverse-mapping B+tree] fills in part of the
puzzle. Since it contains copies of every entry in each inode’s data and
attribute forks, we can fix a corrupted block map with these records.
Furthermore, if the inode B+trees become corrupt, it is possible to visit all
-inode chunks using the reverse-mapping data. Should XFS ever gain the ability
-to store parent directory information in each inode, it also becomes possible
+inode chunks using the reverse-mapping data. xref:Parent_Pointers[Directory
+parent pointers] fill in the rest of the puzzle by mirroring the directory tree
+structure with parent directory information in each inode. It is now possible
to resurrect damaged directory trees, which should reduce the complaints about
inodes ending up in +/lost+found+. Everything else in the per-AG primary
-metadata can already be reconstructed via +xfs_repair+. Hopefully,
-reconstruction will not turn out to be a fool's errand.
+metadata can already be reconstructed via +xfs_repair+.
+
+See the
+https://docs.kernel.org/filesystems/xfs/xfs-online-fsck-design.html[design
+document] for online repair for a more thorough discussion of how this metadata
+is put to use.
* [PATCH 02/10] design: document filesystem properties
From: Darrick J. Wong @ 2024-11-27 0:18 UTC
To: djwong; +Cc: hch, hch, cem, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
Now that xfsprogs utilities can set properties to coordinate the
behavior of other xfsprogs utilities, record them in the ondisk format
documentation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
.../fs_properties.asciidoc | 28 ++++++++++++++++++++
.../xfs_filesystem_structure.asciidoc | 2 +
2 files changed, 30 insertions(+)
create mode 100644 design/XFS_Filesystem_Structure/fs_properties.asciidoc
diff --git a/design/XFS_Filesystem_Structure/fs_properties.asciidoc b/design/XFS_Filesystem_Structure/fs_properties.asciidoc
new file mode 100644
index 00000000000000..b639aec9ab6366
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/fs_properties.asciidoc
@@ -0,0 +1,28 @@
+[[Filesystem_Properties]]
+= Filesystem Properties
+
+System administrators can set filesystem-wide properties to coordinate the
+behavior of userspace XFS administration tools. These properties are recorded
+as extended attributes of the +ATTR_ROOT+ namespace that are set on the root
+directory.
+
+[options="header"]
+|=====
+| Property | Description
+| +xfs:autofsck+ | Online fsck background scanning behavior
+|=====
+
+*xfs:autofsck*::
+This property controls the behavior of background online fsck.
+Unrecognized values are treated as if the property was not set.
+Check the +xfs_scrub+ manual page for more information.
+
+.autofsck property values
+[options="header"]
+|=====
+| Value | Description
+| +none+ | Do not perform background scans.
+| +check+ | Only check metadata.
+| +optimize+ | Check and optimize metadata.
+| +repair+ | Check, repair, or optimize metadata.
+|=====
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index a95a5806172a0c..689e2a874c13e9 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -84,6 +84,8 @@ include::journaling_log.asciidoc[]
include::internal_inodes.asciidoc[]
+include::fs_properties.asciidoc[]
+
:leveloffset: 0
Dynamically Allocated Structures
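Reader's note on the fs_properties.asciidoc text in this patch: the rule that unrecognized +xfs:autofsck+ values behave as if the property was not set can be sketched in C as below. This is only an illustration, not code from xfsprogs. The enum, its constants, and the function name parse_autofsck are hypothetical; only the four value strings come from the table in the patch. The value is passed with an explicit length on the assumption that extended attribute values are not NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; only the value strings are taken from the patch. */
enum autofsck_mode {
	AUTOFSCK_UNSET = 0,	/* property absent or value not recognized */
	AUTOFSCK_NONE,		/* "none": do not perform background scans */
	AUTOFSCK_CHECK,		/* "check": only check metadata */
	AUTOFSCK_OPTIMIZE,	/* "optimize": check and optimize metadata */
	AUTOFSCK_REPAIR,	/* "repair": check, repair, or optimize */
};

/*
 * Map a raw property value of 'len' bytes to a mode.  Anything that is
 * not an exact match for a documented value is reported as unset.
 */
enum autofsck_mode
parse_autofsck(const char *value, size_t len)
{
	static const struct {
		const char		*name;
		enum autofsck_mode	mode;
	} known[] = {
		{ "none",	AUTOFSCK_NONE },
		{ "check",	AUTOFSCK_CHECK },
		{ "optimize",	AUTOFSCK_OPTIMIZE },
		{ "repair",	AUTOFSCK_REPAIR },
	};
	size_t i;

	if (value == NULL)
		return AUTOFSCK_UNSET;

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
		if (strlen(known[i].name) == len &&
		    memcmp(value, known[i].name, len) == 0)
			return known[i].mode;
	}
	return AUTOFSCK_UNSET;
}
```

A caller would fetch the attribute from the root directory first and pass the returned bytes and length to parse_autofsck; a trailing newline or a misspelled value falls through to AUTOFSCK_UNSET rather than failing.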
* [PATCH 03/10] design: move superblock documentation to a separate file
From: Darrick J. Wong @ 2024-11-27 0:18 UTC
To: djwong; +Cc: hch, hch, cem, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
Move the ondisk superblock docs to a separate file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
.../allocation_groups.asciidoc | 550 --------------------
.../XFS_Filesystem_Structure/superblock.asciidoc | 548 ++++++++++++++++++++
2 files changed, 549 insertions(+), 549 deletions(-)
create mode 100644 design/XFS_Filesystem_Structure/superblock.asciidoc
diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index d7fd63ea20a646..e2cdaab5e03d3f 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -31,555 +31,7 @@ image::images/6.png[]
Each of these structures are expanded upon in the following sections.
-[[Superblocks]]
-== Superblocks
-
-Each AG starts with a superblock. The first one, in AG 0, is the primary
-superblock which stores aggregate AG information. Secondary superblocks are
-only used by xfs_repair when the primary superblock has been corrupted. A
-superblock is one sector in length.
-
-The superblock is defined by the following structure. The description of each
-field follows.
-
-[source, c]
-----
-struct xfs_sb
-{
- __uint32_t sb_magicnum;
- __uint32_t sb_blocksize;
- xfs_rfsblock_t sb_dblocks;
- xfs_rfsblock_t sb_rblocks;
- xfs_rtblock_t sb_rextents;
- uuid_t sb_uuid;
- xfs_fsblock_t sb_logstart;
- xfs_ino_t sb_rootino;
- xfs_ino_t sb_rbmino;
- xfs_ino_t sb_rsumino;
- xfs_agblock_t sb_rextsize;
- xfs_agblock_t sb_agblocks;
- xfs_agnumber_t sb_agcount;
- xfs_extlen_t sb_rbmblocks;
- xfs_extlen_t sb_logblocks;
- __uint16_t sb_versionnum;
- __uint16_t sb_sectsize;
- __uint16_t sb_inodesize;
- __uint16_t sb_inopblock;
- char sb_fname[12];
- __uint8_t sb_blocklog;
- __uint8_t sb_sectlog;
- __uint8_t sb_inodelog;
- __uint8_t sb_inopblog;
- __uint8_t sb_agblklog;
- __uint8_t sb_rextslog;
- __uint8_t sb_inprogress;
- __uint8_t sb_imax_pct;
- __uint64_t sb_icount;
- __uint64_t sb_ifree;
- __uint64_t sb_fdblocks;
- __uint64_t sb_frextents;
- xfs_ino_t sb_uquotino;
- xfs_ino_t sb_gquotino;
- __uint16_t sb_qflags;
- __uint8_t sb_flags;
- __uint8_t sb_shared_vn;
- xfs_extlen_t sb_inoalignmt;
- __uint32_t sb_unit;
- __uint32_t sb_width;
- __uint8_t sb_dirblklog;
- __uint8_t sb_logsectlog;
- __uint16_t sb_logsectsize;
- __uint32_t sb_logsunit;
- __uint32_t sb_features2;
- __uint32_t sb_bad_features2;
-
- /* version 5 superblock fields start here */
- __uint32_t sb_features_compat;
- __uint32_t sb_features_ro_compat;
- __uint32_t sb_features_incompat;
- __uint32_t sb_features_log_incompat;
-
- __uint32_t sb_crc;
- xfs_extlen_t sb_spino_align;
-
- xfs_ino_t sb_pquotino;
- xfs_lsn_t sb_lsn;
- uuid_t sb_meta_uuid;
- xfs_ino_t sb_rrmapino;
-};
-----
-*sb_magicnum*::
-Identifies the filesystem. Its value is +XFS_SB_MAGIC+ ``XFSB'' (0x58465342).
-
-*sb_blocksize*::
-The size of a basic unit of space allocation in bytes. Typically, this is 4096
-(4KB) but can range from 512 to 65536 bytes.
-
-*sb_dblocks*::
-Total number of blocks available for data and metadata on the filesystem.
-
-*sb_rblocks*::
-Number blocks in the real-time disk device. Refer to
-xref:Real-time_Devices[real-time sub-volumes] for more information.
-
-*sb_rextents*::
-Number of extents on the real-time device.
-
-*sb_uuid*::
-UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by
-the UUID instead of device name.
-
-*sb_logstart*::
-First block number for the journaling log if the log is internal (ie. not on a
-separate disk device). For an external log device, this will be zero (the log
-will also start on the first block on the log device). The identity of the log
-devices is not recorded in the filesystem, but the UUIDs of the filesystem and
-the log device are compared to prevent corruption.
-
-*sb_rootino*::
-Root inode number for the filesystem. Normally, the root inode is at the
-start of the first possible inode chunk in AG 0. This is 128 when using a 4KB
-block size.
-
-*sb_rbmino*::
-Bitmap inode for real-time extents.
-
-*sb_rsumino*::
-Summary inode for real-time bitmap.
-
-*sb_rextsize*::
-Realtime extent size in blocks.
-
-*sb_agblocks*::
-Size of each AG in blocks. For the actual size of the last AG, refer to the
-xref:AG_Free_Space_Management[free space] +agf_length+ value.
-
-*sb_agcount*::
-Number of AGs in the filesystem.
-
-*sb_rbmblocks*::
-Number of real-time bitmap blocks.
-
-*sb_logblocks*::
-Number of blocks for the journaling log.
-
-*sb_versionnum*::
-Filesystem version number. This is a bitmask specifying the features enabled
-when creating the filesystem. Any disk checking tools or drivers that do not
-recognize any set bits must not operate upon the filesystem. Most of the flags
-indicate features introduced over time. If the value of the lower nibble is >=
-4, the higher bits indicate feature flags as follows:
-
-.Version 4 Superblock version flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SB_VERSION_ATTRBIT+ |
-Set if any inode have extended attributes. If this bit is set; the
-+XFS_SB_VERSION2_ATTR2BIT+ is not set; and the +attr2+ mount flag is not
-specified, the +di_forkoff+ inode field will not be dynamically adjusted.
-See the section about xref:Extended_Attribute_Versions[extended attribute
-versions] for more information.
-
-| +XFS_SB_VERSION_NLINKBIT+ | Set if any inodes use 32-bit di_nlink values.
-| +XFS_SB_VERSION_QUOTABIT+ |
-Quotas are enabled on the filesystem. This
-also brings in the various quota fields in the superblock.
-
-| +XFS_SB_VERSION_ALIGNBIT+ | Set if sb_inoalignmt is used.
-| +XFS_SB_VERSION_DALIGNBIT+ | Set if sb_unit and sb_width are used.
-| +XFS_SB_VERSION_SHAREDBIT+ | Set if sb_shared_vn is used.
-| +XFS_SB_VERSION_LOGV2BIT+ | Version 2 journaling logs are used.
-| +XFS_SB_VERSION_SECTORBIT+ | Set if sb_sectsize is not 512.
-| +XFS_SB_VERSION_EXTFLGBIT+ | Unwritten extents are used. This is always set.
-| +XFS_SB_VERSION_DIRV2BIT+ |
-Version 2 directories are used. This is always set.
-
-| +XFS_SB_VERSION_MOREBITSBIT+ |
-Set if the sb_features2 field in the superblock contains more flags.
-|=====
-
-If the lower nibble of this value is 5, then this is a v5 filesystem; the
-+XFS_SB_VERSION2_CRCBIT+ feature must be set in +sb_features2+.
-
-*sb_sectsize*::
-Specifies the underlying disk sector size in bytes. Typically this is 512 or
-4096 bytes. This determines the minimum I/O alignment, especially for direct I/O.
-
-*sb_inodesize*::
-Size of the inode in bytes. The default is 256 (2 inodes per standard sector)
-but can be made as large as 2048 bytes when creating the filesystem. On a v5
-filesystem, the default and minimum inode size are both 512 bytes.
-
-*sb_inopblock*::
-Number of inodes per block. This is equivalent to +sb_blocksize / sb_inodesize+.
-
-*sb_fname[12]*::
-Name for the filesystem. This value can be used in the mount command.
-
-*sb_blocklog*::
-log~2~ value of +sb_blocksize+. In other terms, +sb_blocksize = 2^sb_blocklog^+.
-
-*sb_sectlog*::
-log~2~ value of +sb_sectsize+.
-
-*sb_inodelog*::
-log~2~ value of +sb_inodesize+.
-
-*sb_inopblog*::
-log~2~ value of +sb_inopblock+.
-
-*sb_agblklog*::
-log~2~ value of +sb_agblocks+ (rounded up). This value is used to generate inode
-numbers and absolute block numbers defined in extent maps.
-
-*sb_rextslog*::
-log~2~ value of +sb_rextents+.
-
-*sb_inprogress*::
-Flag specifying that the filesystem is being created.
-
-*sb_imax_pct*::
-Maximum percentage of filesystem space that can be used for inodes. The default
-value is 5%.
-
-*sb_icount*::
-Global count for number inodes allocated on the filesystem. This is only
-maintained in the first superblock.
-
-*sb_ifree*::
-Global count of free inodes on the filesystem. This is only maintained in the
-first superblock.
-
-*sb_fdblocks*::
-Global count of free data blocks on the filesystem. This is only maintained in
-the first superblock.
-
-*sb_frextents*::
-Global count of free real-time extents on the filesystem. This is only
-maintained in the first superblock.
-
-*sb_uquotino*::
-Inode for user quotas. This and the following two quota fields only apply if
-+XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to
-xref:Quota_Inodes[quota inodes] for more information.
-
-*sb_gquotino*::
-Inode for group or project quotas. Group and project quotas cannot be used at
-the same time on v4 filesystems. On a v5 filesystem, this inode always stores
-group quota information.
-
-*sb_qflags*::
-Quota flags. It can be a combination of the following flags:
-
-.Superblock quota flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_UQUOTA_ACCT+ | User quota accounting is enabled.
-| +XFS_UQUOTA_ENFD+ | User quotas are enforced.
-| +XFS_UQUOTA_CHKD+ | User quotas have been checked.
-| +XFS_PQUOTA_ACCT+ | Project quota accounting is enabled.
-| +XFS_OQUOTA_ENFD+ | Other (group/project) quotas are enforced.
-| +XFS_OQUOTA_CHKD+ | Other (group/project) quotas have been checked.
-| +XFS_GQUOTA_ACCT+ | Group quota accounting is enabled.
-| +XFS_GQUOTA_ENFD+ | Group quotas are enforced.
-| +XFS_GQUOTA_CHKD+ | Group quotas have been checked.
-| +XFS_PQUOTA_ENFD+ | Project quotas are enforced.
-| +XFS_PQUOTA_CHKD+ | Project quotas have been checked.
-|=====
-
-*sb_flags*::
-Miscellaneous flags.
-
-.Superblock flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SBF_READONLY+ | Only read-only mounts allowed.
-|=====
-
-*sb_shared_vn*::
-Reserved and must be zero (``vn'' stands for version number).
-
-*sb_inoalignmt*::
-Inode chunk alignment in fsblocks. Prior to v5, the default value provided for
-inode chunks to have an 8KiB alignment. Starting with v5, the default value
-scales with the multiple of the inode size over 256 bytes. Concretely, this
-means an alignment of 16KiB for 512-byte inodes, 32KiB for 1024-byte inodes,
-etc. If sparse inodes are enabled, the +ir_startino+ field of each inode
-B+tree record must be aligned to this block granularity, even if the inode
-given by +ir_startino+ itself is sparse.
-
-*sb_unit*::
-Underlying stripe or raid unit in blocks.
-
-*sb_width*::
-Underlying stripe or raid width in blocks.
-
-*sb_dirblklog*::
-log~2~ multiplier that determines the granularity of directory block allocations
-in fsblocks.
-
-*sb_logsectlog*::
-log~2~ value of the log subvolume's sector size. This is only used if the
-journaling log is on a separate disk device (i.e. not internal).
-
-*sb_logsectsize*::
-The log's sector size in bytes if the filesystem uses an external log device.
-
-*sb_logsunit*::
-The log device's stripe or raid unit size. This only applies to version 2 logs
-+XFS_SB_VERSION_LOGV2BIT+ is set in +sb_versionnum+.
-
-*sb_features2*::
-Additional version flags if +XFS_SB_VERSION_MOREBITSBIT+ is set in
-+sb_versionnum+. The currently defined additional features include:
-
-.Extended Version 4 Superblock flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SB_VERSION2_LAZYSBCOUNTBIT+ |
-Lazy global counters. Making a filesystem with this bit set can improve
-performance. The global free space and inode counts are only updated in the
-primary superblock when the filesystem is cleanly unmounted.
-
-| +XFS_SB_VERSION2_ATTR2BIT+ |
-Extended attributes version 2. Making a filesystem with this optimises the
-inode layout of extended attributes. If this bit is set and the +noattr2+
-mount flag is not specified, the +di_forkoff+ inode field will be dynamically
-adjusted. See the section about xref:Extended_Attribute_Versions[extended
-attribute versions] for more information.
-
-| +XFS_SB_VERSION2_PARENTBIT+ |
-Parent pointers. All inodes must have an extended attribute that points back to
-its parent inode. The primary purpose for this information is in backup systems.
-
-| +XFS_SB_VERSION2_PROJID32BIT+ |
-32-bit Project ID. Inodes can be associated with a project ID number, which
-can be used to enforce disk space usage quotas for a particular group of
-directories. This flag indicates that project IDs can be 32 bits in size.
-
-| +XFS_SB_VERSION2_CRCBIT+ |
-Metadata checksumming. All metadata blocks have an extended header containing
-the block checksum, a copy of the metadata UUID, the log sequence number of the
-last update to prevent stale replays, and a back pointer to the owner of the
-block. This feature must be and can only be set if the lowest nibble of
-+sb_versionnum+ is set to 5.
-
-| +XFS_SB_VERSION2_FTYPE+ |
-Directory file type. Each directory entry records the type of the inode to
-which the entry points. This speeds up directory iteration by removing the
-need to load every inode into memory.
-|=====
-
-*sb_bad_features2*::
-This field mirrors +sb_features2+, due to past 64-bit alignment errors.
-
-*sb_features_compat*::
-Read-write compatible feature flags. The kernel can still read and write this
-FS even if it doesn't understand the flag. Currently, there are no valid
-flags.
-
-*sb_features_ro_compat*::
-Read-only compatible feature flags. The kernel can still read this FS even if
-it doesn't understand the flag.
-
-.Extended Version 5 Superblock Read-Only compatibility flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SB_FEAT_RO_COMPAT_FINOBT+ |
-Free inode B+tree. Each allocation group contains a B+tree to track inode chunks
-containing free inodes. This is a performance optimization to reduce the time
-required to allocate inodes.
-
-| +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ |
-Reverse mapping B+tree. Each allocation group contains a B+tree containing
-records mapping AG blocks to their owners. See the section about
-xref:Reconstruction[reconstruction] for more details.
-
-| +XFS_SB_FEAT_RO_COMPAT_REFLINK+ |
-Reference count B+tree. Each allocation group contains a B+tree to track the
-reference counts of AG blocks. This enables files to share data blocks safely.
-See the section about xref:Reflink_Deduplication[reflink and deduplication] for
-more details.
-
-| +XFS_SB_FEAT_RO_COMPAT_INOBTCNT+ |
-Inode B+tree block counters. Each allocation group's inode (AGI) header
-tracks the number of blocks in each of the inode B+trees. This allows us
-to have a slightly higher level of redundancy over the shape of the inode
-btrees, and decreases the amount of time to compute the metadata B+tree
-preallocations at mount time.
-
-|=====
-
-*sb_features_incompat*::
-Read-write incompatible feature flags. The kernel cannot read or write this
-FS if it doesn't understand the flag.
-
-.Extended Version 5 Superblock Read-Write incompatibility flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SB_FEAT_INCOMPAT_FTYPE+ |
-Directory file type. Each directory entry tracks the type of the inode to
-which the entry points. This is a performance optimization to remove the need
-to load every inode into memory to iterate a directory.
-
-| +XFS_SB_FEAT_INCOMPAT_SPINODES+ |
-Sparse inodes. This feature relaxes the requirement to allocate inodes in
-chunks of 64. When the free space is heavily fragmented, there might exist
-plenty of free space but not enough contiguous free space to allocate a new
-inode chunk. With this feature, the user can continue to create files until
-all free space is exhausted.
-
-Unused space in the inode B+tree records are used to track which parts of the
-inode chunk are not inodes.
-
-See the chapter on xref:Sparse_Inodes[Sparse Inodes] for more information.
-
-| +XFS_SB_FEAT_INCOMPAT_META_UUID+ |
-Metadata UUID. The UUID stamped into each metadata block must match the value
-in +sb_meta_uuid+. This enables the administrator to change +sb_uuid+ at will
-without having to rewrite the entire filesystem.
-
-| +XFS_SB_FEAT_INCOMPAT_BIGTIME+ |
-Large timestamps. Inode timestamps and quota expiration timers are extended to
-support times through the year 2486. See the section on
-xref:Timestamps[timestamps] for more information.
-
-| +XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR+ |
-The filesystem is not in operable condition, and must be run through
-xfs_repair before it can be mounted.
-
-| +XFS_SB_FEAT_INCOMPAT_NREXT64+ |
-Large file fork extent counts. This greatly expands the maximum number of
-space mappings allowed in data and extended attribute file forks.
-
-| +XFS_SB_FEAT_INCOMPAT_EXCHRANGE+ |
-Atomic file mapping exchanges. The filesystem is capable of exchanging a range
-of mappings between two arbitrary ranges of a file's fork by using log intent
-items to track the progress of the high level exchange operation. In other
-words, the exchange operation can be restarted if the system goes down, which
-is necessary for userspace to commit of new file contents atomically. This
-flag has user-visible impacts, which is why it is a permanent incompat flag.
-See the section about xref:XMI_Log_Item[mapping exchange log intents] for more
-information.
-
-| +XFS_SB_FEAT_INCOMPAT_PARENT+ |
-Directory parent pointers. See the section about xref:Parent_Pointers[parent
-pointers] for more information.
-
-|=====
-
-*sb_features_log_incompat*::
-Read-write incompatible feature flags for the log. The kernel cannot recover
-the FS log if it doesn't understand the flag.
-
-.Extended Version 5 Superblock Log incompatibility flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+ |
-Extended attribute updates have been committed to the ondisk log.
-
-|=====
-
-*sb_crc*::
-Superblock checksum.
-
-*sb_spino_align*::
-Sparse inode alignment, in fsblocks. Each chunk of inodes referenced by a
-sparse inode B+tree record must be aligned to this block granularity.
-
-*sb_pquotino*::
-Project quota inode.
-
-*sb_lsn*::
-Log sequence number of the last superblock update.
-
-*sb_meta_uuid*::
-If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
-all metadata blocks must match this UUID. If not, the block header UUID field
-must match +sb_uuid+.
-
-*sb_rrmapino*::
-If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time
-device is present (+sb_rblocks+ > 0), this field points to an inode
-that contains the root to the
-xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree].
-This field is zero otherwise.
-
-=== xfs_db Superblock Example
-
-A filesystem is made on a single disk with the following command:
-
-----
-# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7
-meta-data=/dev/sda7 isize=256 agcount=16, agsize=3923122 blks
- = sectsz=512 attr=2
-data = bsize=4096 blocks=62769952, imaxpct=25
- = sunit=0 swidth=0 blks, unwritten=1
-naming =version 2 bsize=16384
-log =internal log bsize=4096 blocks=30649, version=1
- = sectsz=512 sunit=0 blks
-realtime =none extsz=65536 blocks=0, rtextents=0
-----
-
-And in xfs_db, inspecting the superblock:
-
-----
-xfs_db> sb
-xfs_db> p
-magicnum = 0x58465342
-blocksize = 4096
-dblocks = 62769952
-rblocks = 0
-rextents = 0
-uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5
-logstart = 33554436
-rootino = 128
-rbmino = 129
-rsumino = 130
-rextsize = 16
-agblocks = 3923122
-agcount = 16
-rbmblocks = 0
-logblocks = 30649
-versionnum = 0xb084
-sectsize = 512
-inodesize = 256
-inopblock = 16
-fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
-blocklog = 12
-sectlog = 9
-inodelog = 8
-inopblog = 4
-agblklog = 22
-rextslog = 0
-inprogress = 0
-imax_pct = 25
-icount = 64
-ifree = 61
-fdblocks = 62739235
-frextents = 0
-uquotino = 0
-gquotino = 0
-qflags = 0
-flags = 0
-shared_vn = 0
-inoalignmt = 2
-unit = 0
-width = 0
-dirblklog = 2
-logsectlog = 0
-logsectsize = 0
-logsunit = 0
-features2 = 8
-----
-
+include::superblock.asciidoc[]
[[AG_Free_Space_Management]]
== AG Free Space Management
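Reader's note on the superblock text being moved by this patch: the relationships it describes between the size fields and their log2 counterparts, and the version number held in the low nibble of +sb_versionnum+, can be checked with a small C sketch. It is an illustration only. The struct sb_geom subset, the helper names, and the assumption that fields have already been converted from their big-endian ondisk form are all hypothetical. The numeric flag values are not given in the text; they are quoted from the kernel's xfs_format.h as best recalled and should be verified there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFS_SB_MAGIC			0x58465342U	/* "XFSB" */
#define XFS_SB_VERSION_NUMBITS		0x000fU	/* low nibble: version */
/* Flag values below are assumptions to be checked against xfs_format.h. */
#define XFS_SB_VERSION_ALIGNBIT		0x0080U
#define XFS_SB_VERSION_LOGV2BIT		0x0400U
#define XFS_SB_VERSION_MOREBITSBIT	0x8000U

/* Hypothetical host-endian subset of struct xfs_sb. */
struct sb_geom {
	uint32_t	magicnum;
	uint32_t	blocksize;
	uint32_t	agblocks;
	uint16_t	versionnum;
	uint16_t	sectsize;
	uint16_t	inodesize;
	uint16_t	inopblock;
	uint8_t		blocklog;
	uint8_t		sectlog;
	uint8_t		inodelog;
	uint8_t		inopblog;
	uint8_t		agblklog;
};

/* True if size == 2^log; the bound on log avoids an undefined shift. */
bool
is_pow2_log(uint32_t size, uint8_t log)
{
	return log < 32 && size == (UINT32_C(1) << log);
}

/* Smallest l such that 2^l >= v, matching "log2 (rounded up)". */
unsigned int
log2_roundup(uint32_t v)
{
	unsigned int l = 0;

	while ((UINT64_C(1) << l) < v)
		l++;
	return l;
}

/* Version number stored in the low nibble of sb_versionnum. */
unsigned int
sb_version_num(uint16_t versionnum)
{
	return versionnum & XFS_SB_VERSION_NUMBITS;
}

/* Cross-check the redundant geometry fields against each other. */
bool
sb_geom_consistent(const struct sb_geom *sb)
{
	if (sb->magicnum != XFS_SB_MAGIC)
		return false;
	if (!is_pow2_log(sb->blocksize, sb->blocklog) ||
	    !is_pow2_log(sb->sectsize, sb->sectlog) ||
	    !is_pow2_log(sb->inodesize, sb->inodelog) ||
	    !is_pow2_log(sb->inopblock, sb->inopblog))
		return false;
	/* inodesize is a nonzero power of two here, so division is safe */
	if (sb->inopblock != sb->blocksize / sb->inodesize)
		return false;
	return sb->agblklog == log2_roundup(sb->agblocks);
}
```

Fed the values from the xfs_db example in the moved text (blocksize 4096 with blocklog 12, inodesize 256 with inodelog 8, inopblock 16, agblocks 3923122 with agblklog 22, versionnum 0xb084), the check passes and the low nibble decodes to version 4.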
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
new file mode 100644
index 00000000000000..16c31116ffafd4
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -0,0 +1,548 @@
+[[Superblocks]]
+== Superblocks
+
+Each AG starts with a superblock. The first one, in AG 0, is the primary
+superblock which stores aggregate AG information. Secondary superblocks are
+only used by xfs_repair when the primary superblock has been corrupted. A
+superblock is one sector in length.
+
+The superblock is defined by the following structure. The description of each
+field follows.
+
+[source, c]
+----
+struct xfs_sb
+{
+ __uint32_t sb_magicnum;
+ __uint32_t sb_blocksize;
+ xfs_rfsblock_t sb_dblocks;
+ xfs_rfsblock_t sb_rblocks;
+ xfs_rtblock_t sb_rextents;
+ uuid_t sb_uuid;
+ xfs_fsblock_t sb_logstart;
+ xfs_ino_t sb_rootino;
+ xfs_ino_t sb_rbmino;
+ xfs_ino_t sb_rsumino;
+ xfs_agblock_t sb_rextsize;
+ xfs_agblock_t sb_agblocks;
+ xfs_agnumber_t sb_agcount;
+ xfs_extlen_t sb_rbmblocks;
+ xfs_extlen_t sb_logblocks;
+ __uint16_t sb_versionnum;
+ __uint16_t sb_sectsize;
+ __uint16_t sb_inodesize;
+ __uint16_t sb_inopblock;
+ char sb_fname[12];
+ __uint8_t sb_blocklog;
+ __uint8_t sb_sectlog;
+ __uint8_t sb_inodelog;
+ __uint8_t sb_inopblog;
+ __uint8_t sb_agblklog;
+ __uint8_t sb_rextslog;
+ __uint8_t sb_inprogress;
+ __uint8_t sb_imax_pct;
+ __uint64_t sb_icount;
+ __uint64_t sb_ifree;
+ __uint64_t sb_fdblocks;
+ __uint64_t sb_frextents;
+ xfs_ino_t sb_uquotino;
+ xfs_ino_t sb_gquotino;
+ __uint16_t sb_qflags;
+ __uint8_t sb_flags;
+ __uint8_t sb_shared_vn;
+ xfs_extlen_t sb_inoalignmt;
+ __uint32_t sb_unit;
+ __uint32_t sb_width;
+ __uint8_t sb_dirblklog;
+ __uint8_t sb_logsectlog;
+ __uint16_t sb_logsectsize;
+ __uint32_t sb_logsunit;
+ __uint32_t sb_features2;
+ __uint32_t sb_bad_features2;
+
+ /* version 5 superblock fields start here */
+ __uint32_t sb_features_compat;
+ __uint32_t sb_features_ro_compat;
+ __uint32_t sb_features_incompat;
+ __uint32_t sb_features_log_incompat;
+
+ __uint32_t sb_crc;
+ xfs_extlen_t sb_spino_align;
+
+ xfs_ino_t sb_pquotino;
+ xfs_lsn_t sb_lsn;
+ uuid_t sb_meta_uuid;
+ xfs_ino_t sb_rrmapino;
+};
+----
+*sb_magicnum*::
+Identifies the filesystem. Its value is +XFS_SB_MAGIC+ ``XFSB'' (0x58465342).
+
+*sb_blocksize*::
+The size of a basic unit of space allocation in bytes. Typically, this is 4096
+(4KB) but can range from 512 to 65536 bytes.
+
+*sb_dblocks*::
+Total number of blocks available for data and metadata on the filesystem.
+
+*sb_rblocks*::
+Number of blocks in the real-time disk device. Refer to
+xref:Real-time_Devices[real-time sub-volumes] for more information.
+
+*sb_rextents*::
+Number of extents on the real-time device.
+
+*sb_uuid*::
+UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by
+the UUID instead of device name.
+
+*sb_logstart*::
+First block number for the journaling log if the log is internal (i.e. not on a
+separate disk device). For an external log device, this will be zero (the log
+will also start on the first block on the log device). The identity of the log
+devices is not recorded in the filesystem, but the UUIDs of the filesystem and
+the log device are compared to prevent corruption.
+
+*sb_rootino*::
+Root inode number for the filesystem. Normally, the root inode is at the
+start of the first possible inode chunk in AG 0. This is 128 when using a 4KB
+block size.
+
+*sb_rbmino*::
+Bitmap inode for real-time extents.
+
+*sb_rsumino*::
+Summary inode for real-time bitmap.
+
+*sb_rextsize*::
+Realtime extent size in blocks.
+
+*sb_agblocks*::
+Size of each AG in blocks. For the actual size of the last AG, refer to the
+xref:AG_Free_Space_Management[free space] +agf_length+ value.
+
+*sb_agcount*::
+Number of AGs in the filesystem.
+
+*sb_rbmblocks*::
+Number of real-time bitmap blocks.
+
+*sb_logblocks*::
+Number of blocks for the journaling log.
+
+*sb_versionnum*::
+Filesystem version number. This is a bitmask specifying the features enabled
+when creating the filesystem. Any disk checking tools or drivers that do not
+recognize any set bits must not operate upon the filesystem. Most of the flags
+indicate features introduced over time. If the value of the lower nibble is >=
+4, the higher bits indicate feature flags as follows:
+
+.Version 4 Superblock version flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_SB_VERSION_ATTRBIT+ |
+Set if any inode has extended attributes. If this bit is set, the
++XFS_SB_VERSION2_ATTR2BIT+ is not set, and the +attr2+ mount flag is not
+specified, the +di_forkoff+ inode field will not be dynamically adjusted.
+See the section about xref:Extended_Attribute_Versions[extended attribute
+versions] for more information.
+
+| +XFS_SB_VERSION_NLINKBIT+ | Set if any inodes use 32-bit di_nlink values.
+| +XFS_SB_VERSION_QUOTABIT+ |
+Quotas are enabled on the filesystem. This
+also brings in the various quota fields in the superblock.
+
+| +XFS_SB_VERSION_ALIGNBIT+ | Set if sb_inoalignmt is used.
+| +XFS_SB_VERSION_DALIGNBIT+ | Set if sb_unit and sb_width are used.
+| +XFS_SB_VERSION_SHAREDBIT+ | Set if sb_shared_vn is used.
+| +XFS_SB_VERSION_LOGV2BIT+ | Version 2 journaling logs are used.
+| +XFS_SB_VERSION_SECTORBIT+ | Set if sb_sectsize is not 512.
+| +XFS_SB_VERSION_EXTFLGBIT+ | Unwritten extents are used. This is always set.
+| +XFS_SB_VERSION_DIRV2BIT+ |
+Version 2 directories are used. This is always set.
+
+| +XFS_SB_VERSION_MOREBITSBIT+ |
+Set if the sb_features2 field in the superblock contains more flags.
+|=====
+
+If the lower nibble of this value is 5, then this is a v5 filesystem; the
++XFS_SB_VERSION2_CRCBIT+ feature must be set in +sb_features2+.
+
+*sb_sectsize*::
+Specifies the underlying disk sector size in bytes. Typically this is 512 or
+4096 bytes. This determines the minimum I/O alignment, especially for direct I/O.
+
+*sb_inodesize*::
+Size of the inode in bytes. The default is 256 (2 inodes per standard sector)
+but can be made as large as 2048 bytes when creating the filesystem. On a v5
+filesystem, the default and minimum inode size are both 512 bytes.
+
+*sb_inopblock*::
+Number of inodes per block. This is equivalent to +sb_blocksize / sb_inodesize+.
+
+*sb_fname[12]*::
+Name for the filesystem. This value can be used in the mount command.
+
+*sb_blocklog*::
+log~2~ value of +sb_blocksize+. In other terms, +sb_blocksize = 2^sb_blocklog^+.
+
+*sb_sectlog*::
+log~2~ value of +sb_sectsize+.
+
+*sb_inodelog*::
+log~2~ value of +sb_inodesize+.
+
+*sb_inopblog*::
+log~2~ value of +sb_inopblock+.
+
+*sb_agblklog*::
+log~2~ value of +sb_agblocks+ (rounded up). This value is used to generate inode
+numbers and absolute block numbers defined in extent maps.
+
+*sb_rextslog*::
+log~2~ value of +sb_rextents+.
+
+*sb_inprogress*::
+Flag specifying that the filesystem is being created.
+
+*sb_imax_pct*::
+Maximum percentage of filesystem space that can be used for inodes. The default
+value is 5%.
+
+*sb_icount*::
+Global count of inodes allocated on the filesystem. This is only
+maintained in the first superblock.
+
+*sb_ifree*::
+Global count of free inodes on the filesystem. This is only maintained in the
+first superblock.
+
+*sb_fdblocks*::
+Global count of free data blocks on the filesystem. This is only maintained in
+the first superblock.
+
+*sb_frextents*::
+Global count of free real-time extents on the filesystem. This is only
+maintained in the first superblock.
+
+*sb_uquotino*::
+Inode for user quotas. This and the following two quota fields only apply if the
++XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to
+xref:Quota_Inodes[quota inodes] for more information.
+
+*sb_gquotino*::
+Inode for group or project quotas. Group and project quotas cannot be used at
+the same time on v4 filesystems. On a v5 filesystem, this inode always stores
+group quota information.
+
+*sb_qflags*::
+Quota flags. It can be a combination of the following flags:
+
+.Superblock quota flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_UQUOTA_ACCT+ | User quota accounting is enabled.
+| +XFS_UQUOTA_ENFD+ | User quotas are enforced.
+| +XFS_UQUOTA_CHKD+ | User quotas have been checked.
+| +XFS_PQUOTA_ACCT+ | Project quota accounting is enabled.
+| +XFS_OQUOTA_ENFD+ | Other (group/project) quotas are enforced.
+| +XFS_OQUOTA_CHKD+ | Other (group/project) quotas have been checked.
+| +XFS_GQUOTA_ACCT+ | Group quota accounting is enabled.
+| +XFS_GQUOTA_ENFD+ | Group quotas are enforced.
+| +XFS_GQUOTA_CHKD+ | Group quotas have been checked.
+| +XFS_PQUOTA_ENFD+ | Project quotas are enforced.
+| +XFS_PQUOTA_CHKD+ | Project quotas have been checked.
+|=====
+
+*sb_flags*::
+Miscellaneous flags.
+
+.Superblock flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_SBF_READONLY+ | Only read-only mounts allowed.
+|=====
+
+*sb_shared_vn*::
+Reserved and must be zero (``vn'' stands for version number).
+
+*sb_inoalignmt*::
+Inode chunk alignment in fsblocks. Prior to v5, the default value gave inode
+chunks an 8KiB alignment. Starting with v5, the default value
+scales with the multiple of the inode size over 256 bytes. Concretely, this
+means an alignment of 16KiB for 512-byte inodes, 32KiB for 1024-byte inodes,
+etc. If sparse inodes are enabled, the +ir_startino+ field of each inode
+B+tree record must be aligned to this block granularity, even if the inode
+given by +ir_startino+ itself is sparse.
+
+*sb_unit*::
+Underlying stripe or raid unit in blocks.
+
+*sb_width*::
+Underlying stripe or raid width in blocks.
+
+*sb_dirblklog*::
+log~2~ multiplier that determines the granularity of directory block allocations
+in fsblocks.
+
+*sb_logsectlog*::
+log~2~ value of the log subvolume's sector size. This is only used if the
+journaling log is on a separate disk device (i.e. not internal).
+
+*sb_logsectsize*::
+The log's sector size in bytes if the filesystem uses an external log device.
+
+*sb_logsunit*::
+The log device's stripe or raid unit size. This only applies to version 2 logs,
+for which +XFS_SB_VERSION_LOGV2BIT+ is set in +sb_versionnum+.
+
+*sb_features2*::
+Additional version flags if +XFS_SB_VERSION_MOREBITSBIT+ is set in
++sb_versionnum+. The currently defined additional features include:
+
+.Extended Version 4 Superblock flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_SB_VERSION2_LAZYSBCOUNTBIT+ |
+Lazy global counters. Making a filesystem with this bit set can improve
+performance. The global free space and inode counts are only updated in the
+primary superblock when the filesystem is cleanly unmounted.
+
+| +XFS_SB_VERSION2_ATTR2BIT+ |
+Extended attributes version 2. Making a filesystem with this optimises the
+inode layout of extended attributes. If this bit is set and the +noattr2+
+mount flag is not specified, the +di_forkoff+ inode field will be dynamically
+adjusted. See the section about xref:Extended_Attribute_Versions[extended
+attribute versions] for more information.
+
+| +XFS_SB_VERSION2_PARENTBIT+ |
+Parent pointers. Each inode must have an extended attribute that points back to
+its parent inode. The primary purpose for this information is in backup systems.
+
+| +XFS_SB_VERSION2_PROJID32BIT+ |
+32-bit Project ID. Inodes can be associated with a project ID number, which
+can be used to enforce disk space usage quotas for a particular group of
+directories. This flag indicates that project IDs can be 32 bits in size.
+
+| +XFS_SB_VERSION2_CRCBIT+ |
+Metadata checksumming. All metadata blocks have an extended header containing
+the block checksum, a copy of the metadata UUID, the log sequence number of the
+last update to prevent stale replays, and a back pointer to the owner of the
+block. This feature must be and can only be set if the lowest nibble of
++sb_versionnum+ is set to 5.
+
+| +XFS_SB_VERSION2_FTYPE+ |
+Directory file type. Each directory entry records the type of the inode to
+which the entry points. This speeds up directory iteration by removing the
+need to load every inode into memory.
+|=====
+
+*sb_bad_features2*::
+This field mirrors +sb_features2+, due to past 64-bit alignment errors.
+
+*sb_features_compat*::
+Read-write compatible feature flags. The kernel can still read and write this
+FS even if it doesn't understand the flag. Currently, there are no valid
+flags.
+
+*sb_features_ro_compat*::
+Read-only compatible feature flags. The kernel can still read this FS even if
+it doesn't understand the flag.
+
+.Extended Version 5 Superblock Read-Only compatibility flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_SB_FEAT_RO_COMPAT_FINOBT+ |
+Free inode B+tree. Each allocation group contains a B+tree to track inode chunks
+containing free inodes. This is a performance optimization to reduce the time
+required to allocate inodes.
+
+| +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ |
+Reverse mapping B+tree. Each allocation group contains a B+tree containing
+records mapping AG blocks to their owners. See the section about
+xref:Reconstruction[reconstruction] for more details.
+
+| +XFS_SB_FEAT_RO_COMPAT_REFLINK+ |
+Reference count B+tree. Each allocation group contains a B+tree to track the
+reference counts of AG blocks. This enables files to share data blocks safely.
+See the section about xref:Reflink_Deduplication[reflink and deduplication] for
+more details.
+
+| +XFS_SB_FEAT_RO_COMPAT_INOBTCNT+ |
+Inode B+tree block counters. Each allocation group's inode (AGI) header
+tracks the number of blocks in each of the inode B+trees. This allows us
+to have a slightly higher level of redundancy over the shape of the inode
+btrees, and decreases the amount of time to compute the metadata B+tree
+preallocations at mount time.
+
+|=====
+
+*sb_features_incompat*::
+Read-write incompatible feature flags. The kernel cannot read or write this
+FS if it doesn't understand the flag.
+
+.Extended Version 5 Superblock Read-Write incompatibility flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_SB_FEAT_INCOMPAT_FTYPE+ |
+Directory file type. Each directory entry tracks the type of the inode to
+which the entry points. This is a performance optimization to remove the need
+to load every inode into memory to iterate a directory.
+
+| +XFS_SB_FEAT_INCOMPAT_SPINODES+ |
+Sparse inodes. This feature relaxes the requirement to allocate inodes in
+chunks of 64. When the free space is heavily fragmented, there might exist
+plenty of free space but not enough contiguous free space to allocate a new
+inode chunk. With this feature, the user can continue to create files until
+all free space is exhausted.
+
+Unused space in the inode B+tree records is used to track which parts of the
+inode chunk are not inodes.
+
+See the chapter on xref:Sparse_Inodes[Sparse Inodes] for more information.
+
+| +XFS_SB_FEAT_INCOMPAT_META_UUID+ |
+Metadata UUID. The UUID stamped into each metadata block must match the value
+in +sb_meta_uuid+. This enables the administrator to change +sb_uuid+ at will
+without having to rewrite the entire filesystem.
+
+| +XFS_SB_FEAT_INCOMPAT_BIGTIME+ |
+Large timestamps. Inode timestamps and quota expiration timers are extended to
+support times through the year 2486. See the section on
+xref:Timestamps[timestamps] for more information.
+
+| +XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR+ |
+The filesystem is not in operable condition, and must be run through
+xfs_repair before it can be mounted.
+
+| +XFS_SB_FEAT_INCOMPAT_NREXT64+ |
+Large file fork extent counts. This greatly expands the maximum number of
+space mappings allowed in data and extended attribute file forks.
+
+| +XFS_SB_FEAT_INCOMPAT_EXCHRANGE+ |
+Atomic file mapping exchanges. The filesystem is capable of exchanging a range
+of mappings between two arbitrary ranges of a file's fork by using log intent
+items to track the progress of the high level exchange operation. In other
+words, the exchange operation can be restarted if the system goes down, which
+is necessary for userspace to commit new file contents atomically. This
+flag has user-visible impacts, which is why it is a permanent incompat flag.
+See the section about xref:XMI_Log_Item[mapping exchange log intents] for more
+information.
+
+| +XFS_SB_FEAT_INCOMPAT_PARENT+ |
+Directory parent pointers. See the section about xref:Parent_Pointers[parent
+pointers] for more information.
+
+|=====
+
+*sb_features_log_incompat*::
+Read-write incompatible feature flags for the log. The kernel cannot recover
+the FS log if it doesn't understand the flag.
+
+.Extended Version 5 Superblock Log incompatibility flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+ |
+Extended attribute updates have been committed to the ondisk log.
+
+|=====
+
+*sb_crc*::
+Superblock checksum.
+
+*sb_spino_align*::
+Sparse inode alignment, in fsblocks. Each chunk of inodes referenced by a
+sparse inode B+tree record must be aligned to this block granularity.
+
+*sb_pquotino*::
+Project quota inode.
+
+*sb_lsn*::
+Log sequence number of the last superblock update.
+
+*sb_meta_uuid*::
+If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
+all metadata blocks must match this UUID. If not, the block header UUID field
+must match +sb_uuid+.
+
+*sb_rrmapino*::
+If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time
+device is present (+sb_rblocks+ > 0), this field points to an inode
+that contains the root to the
+xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree].
+This field is zero otherwise.
+
+=== xfs_db Superblock Example
+
+A filesystem is made on a single disk with the following command:
+
+----
+# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7
+meta-data=/dev/sda7 isize=256 agcount=16, agsize=3923122 blks
+ = sectsz=512 attr=2
+data = bsize=4096 blocks=62769952, imaxpct=25
+ = sunit=0 swidth=0 blks, unwritten=1
+naming =version 2 bsize=16384
+log =internal log bsize=4096 blocks=30649, version=1
+ = sectsz=512 sunit=0 blks
+realtime =none extsz=65536 blocks=0, rtextents=0
+----
+
+And in xfs_db, inspecting the superblock:
+
+----
+xfs_db> sb
+xfs_db> p
+magicnum = 0x58465342
+blocksize = 4096
+dblocks = 62769952
+rblocks = 0
+rextents = 0
+uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5
+logstart = 33554436
+rootino = 128
+rbmino = 129
+rsumino = 130
+rextsize = 16
+agblocks = 3923122
+agcount = 16
+rbmblocks = 0
+logblocks = 30649
+versionnum = 0xb084
+sectsize = 512
+inodesize = 256
+inopblock = 16
+fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
+blocklog = 12
+sectlog = 9
+inodelog = 8
+inopblog = 4
+agblklog = 22
+rextslog = 0
+inprogress = 0
+imax_pct = 25
+icount = 64
+ifree = 61
+fdblocks = 62739235
+frextents = 0
+uquotino = 0
+gquotino = 0
+qflags = 0
+flags = 0
+shared_vn = 0
+inoalignmt = 2
+unit = 0
+width = 0
+dirblklog = 2
+logsectlog = 0
+logsectsize = 0
+logsunit = 0
+features2 = 8
+----
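The xfs_db listing above can be cross-checked against the field descriptions. The following is a minimal sketch, not code from xfsprogs or the kernel: the helper names (+sb_version_num+, +sb_version_has+, +sb_has_features2+, +log2_roundup+) are hypothetical, and the numeric flag values are quoted from memory of the kernel's xfs_format.h, so verify them before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * sb_versionnum flag bits. ASSUMPTION: numeric values are quoted from
 * memory of the kernel's xfs_format.h; check them against the header.
 */
#define XFS_SB_VERSION_NUMBITS		0x000f
#define XFS_SB_VERSION_ATTRBIT		0x0010
#define XFS_SB_VERSION_NLINKBIT		0x0020
#define XFS_SB_VERSION_QUOTABIT		0x0040
#define XFS_SB_VERSION_ALIGNBIT		0x0080
#define XFS_SB_VERSION_DALIGNBIT	0x0100
#define XFS_SB_VERSION_SHAREDBIT	0x0200
#define XFS_SB_VERSION_LOGV2BIT		0x0400
#define XFS_SB_VERSION_SECTORBIT	0x0800
#define XFS_SB_VERSION_EXTFLGBIT	0x1000
#define XFS_SB_VERSION_DIRV2BIT		0x2000
#define XFS_SB_VERSION_MOREBITSBIT	0x8000

/* sb_features2 bit (same caveat as above). */
#define XFS_SB_VERSION2_ATTR2BIT	0x00000008

/* Hypothetical helper: the low nibble of sb_versionnum is the version. */
static unsigned int sb_version_num(uint16_t versionnum)
{
	return versionnum & XFS_SB_VERSION_NUMBITS;
}

/* Hypothetical helper: feature bits only have meaning for version >= 4. */
static int sb_version_has(uint16_t versionnum, uint16_t flag)
{
	return sb_version_num(versionnum) >= 4 && (versionnum & flag) != 0;
}

/* Hypothetical helper: sb_features2 is only valid with MOREBITSBIT set. */
static int sb_has_features2(uint16_t versionnum, uint32_t features2,
			    uint32_t flag)
{
	return sb_version_has(versionnum, XFS_SB_VERSION_MOREBITSBIT) &&
	       (features2 & flag) != 0;
}

/* ceil(log2(v)), the rounding that the text describes for sb_agblklog. */
static unsigned int log2_roundup(uint32_t v)
{
	unsigned int log = 0;

	assert(v > 0);
	while (((uint64_t)1 << log) < v)
		log++;
	return log;
}
```

Fed the values from the listing, +sb_version_num(0xb084)+ yields 4, the ALIGN, EXTFLG, DIRV2 and MOREBITS bits test as set, LOGV2 tests as clear (matching "version=1" in the mkfs output), +features2 = 8+ decodes to ATTR2, and +log2_roundup(3923122)+ gives the 22 shown for +agblklog+.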
* [PATCH 04/10] design: document the actual ondisk superblock
2024-11-27 0:18 ` [PATCHSET] " Darrick J. Wong
` (2 preceding siblings ...)
2024-11-27 0:18 ` [PATCH 03/10] design: move superblock documentation to a separate file Darrick J. Wong
@ 2024-11-27 0:19 ` Darrick J. Wong
2024-11-27 0:19 ` [PATCH 05/10] design: document the changes required to handle metadata directories Darrick J. Wong
` (5 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27 0:19 UTC (permalink / raw)
To: djwong; +Cc: hch, hch, cem, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
struct xfs_dsb is the ondisk superblock, not struct xfs_sb. Replace the
struct definition with the one for the ondisk superblock.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
.../XFS_Filesystem_Structure/superblock.asciidoc | 117 ++++++++++----------
1 file changed, 58 insertions(+), 59 deletions(-)
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
index 16c31116ffafd4..79e8c30dc93e79 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -11,68 +11,67 @@ field follows.
[source, c]
----
-struct xfs_sb
-{
- __uint32_t sb_magicnum;
- __uint32_t sb_blocksize;
- xfs_rfsblock_t sb_dblocks;
- xfs_rfsblock_t sb_rblocks;
- xfs_rtblock_t sb_rextents;
- uuid_t sb_uuid;
- xfs_fsblock_t sb_logstart;
- xfs_ino_t sb_rootino;
- xfs_ino_t sb_rbmino;
- xfs_ino_t sb_rsumino;
- xfs_agblock_t sb_rextsize;
- xfs_agblock_t sb_agblocks;
- xfs_agnumber_t sb_agcount;
- xfs_extlen_t sb_rbmblocks;
- xfs_extlen_t sb_logblocks;
- __uint16_t sb_versionnum;
- __uint16_t sb_sectsize;
- __uint16_t sb_inodesize;
- __uint16_t sb_inopblock;
- char sb_fname[12];
- __uint8_t sb_blocklog;
- __uint8_t sb_sectlog;
- __uint8_t sb_inodelog;
- __uint8_t sb_inopblog;
- __uint8_t sb_agblklog;
- __uint8_t sb_rextslog;
- __uint8_t sb_inprogress;
- __uint8_t sb_imax_pct;
- __uint64_t sb_icount;
- __uint64_t sb_ifree;
- __uint64_t sb_fdblocks;
- __uint64_t sb_frextents;
- xfs_ino_t sb_uquotino;
- xfs_ino_t sb_gquotino;
- __uint16_t sb_qflags;
- __uint8_t sb_flags;
- __uint8_t sb_shared_vn;
- xfs_extlen_t sb_inoalignmt;
- __uint32_t sb_unit;
- __uint32_t sb_width;
- __uint8_t sb_dirblklog;
- __uint8_t sb_logsectlog;
- __uint16_t sb_logsectsize;
- __uint32_t sb_logsunit;
- __uint32_t sb_features2;
- __uint32_t sb_bad_features2;
+struct xfs_dsb {
+ __be32 sb_magicnum;
+ __be32 sb_blocksize;
+ __be64 sb_dblocks;
+ __be64 sb_rblocks;
+ __be64 sb_rextents;
+ uuid_t sb_uuid;
+ __be64 sb_logstart;
+ __be64 sb_rootino;
+ __be64 sb_rbmino;
+ __be64 sb_rsumino;
+ __be32 sb_rextsize;
+ __be32 sb_agblocks;
+ __be32 sb_agcount;
+ __be32 sb_rbmblocks;
+ __be32 sb_logblocks;
+ __be16 sb_versionnum;
+ __be16 sb_sectsize;
+ __be16 sb_inodesize;
+ __be16 sb_inopblock;
+ char sb_fname[XFSLABEL_MAX];
+ __u8 sb_blocklog;
+ __u8 sb_sectlog;
+ __u8 sb_inodelog;
+ __u8 sb_inopblog;
+ __u8 sb_agblklog;
+ __u8 sb_rextslog;
+ __u8 sb_inprogress;
+ __u8 sb_imax_pct;
+ __be64 sb_icount;
+ __be64 sb_ifree;
+ __be64 sb_fdblocks;
+ __be64 sb_frextents;
+ __be64 sb_uquotino;
+ __be64 sb_gquotino;
+ __be16 sb_qflags;
+ __u8 sb_flags;
+ __u8 sb_shared_vn;
+ __be32 sb_inoalignmt;
+ __be32 sb_unit;
+ __be32 sb_width;
+ __u8 sb_dirblklog;
+ __u8 sb_logsectlog;
+ __be16 sb_logsectsize;
+ __be32 sb_logsunit;
+ __be32 sb_features2;
+ __be32 sb_bad_features2;
/* version 5 superblock fields start here */
- __uint32_t sb_features_compat;
- __uint32_t sb_features_ro_compat;
- __uint32_t sb_features_incompat;
- __uint32_t sb_features_log_incompat;
+ __be32 sb_features_compat;
+ __be32 sb_features_ro_compat;
+ __be32 sb_features_incompat;
+ __be32 sb_features_log_incompat;
+ __le32 sb_crc;
+ __be32 sb_spino_align;
+ __be64 sb_pquotino;
+ __be64 sb_lsn;
+ uuid_t sb_meta_uuid;
+ __be64 sb_rrmapino;
- __uint32_t sb_crc;
- xfs_extlen_t sb_spino_align;
-
- xfs_ino_t sb_pquotino;
- xfs_lsn_t sb_lsn;
- uuid_t sb_meta_uuid;
- xfs_ino_t sb_rrmapino;
+ /* must be padded to 64 bit alignment */
};
----
*sb_magicnum*::
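The +__be32+/+__be64+ annotations in +struct xfs_dsb+ mean that a reader has to convert each field from big-endian byte order before using it. A minimal sketch of that conversion for the first three fields follows. The helper names and +struct sb_head+ are hypothetical, and the byte offsets (0, 4, 8) are derived from the field order and widths in the struct above rather than taken from any XFS header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFS_SB_MAGIC	0x58465342u	/* "XFSB" */

/* Assemble a big-endian 32-bit value from four bytes. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assemble a big-endian 64-bit value from eight bytes. */
static uint64_t get_be64(const unsigned char *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Hypothetical host-endian view of the first three superblock fields. */
struct sb_head {
	uint32_t magicnum;	/* sb_magicnum, offset 0 */
	uint32_t blocksize;	/* sb_blocksize, offset 4 */
	uint64_t dblocks;	/* sb_dblocks, offset 8 */
};

/*
 * Decode the start of a raw superblock buffer.
 * Returns 0 on success, -1 if the buffer is too short or the magic
 * number does not match.
 */
static int decode_sb_head(const unsigned char *buf, size_t len,
			  struct sb_head *out)
{
	assert(buf != NULL && out != NULL);
	if (len < 16)
		return -1;

	out->magicnum = get_be32(buf);
	if (out->magicnum != XFS_SB_MAGIC)
		return -1;
	out->blocksize = get_be32(buf + 4);
	out->dblocks = get_be64(buf + 8);
	return 0;
}
```

Decoding byte by byte keeps the sketch independent of host endianness and alignment. Note that +sb_crc+ is the one +__le32+ field in the struct, so it would need a little-endian counterpart of +get_be32+.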
* [PATCH 05/10] design: document the changes required to handle metadata directories
2024-11-27 0:18 ` [PATCHSET] " Darrick J. Wong
` (3 preceding siblings ...)
2024-11-27 0:19 ` [PATCH 04/10] design: document the actual ondisk superblock Darrick J. Wong
@ 2024-11-27 0:19 ` Darrick J. Wong
2024-11-27 0:19 ` [PATCH 06/10] design: move discussion of realtime volumes to a separate section Darrick J. Wong
` (4 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27 0:19 UTC (permalink / raw)
To: djwong; +Cc: hch, hch, cem, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
Document the ondisk format changes for metadata directories.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
.../internal_inodes.asciidoc | 113 ++++++++++++++++++++
.../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 22 ++++
.../XFS_Filesystem_Structure/superblock.asciidoc | 14 +-
3 files changed, 142 insertions(+), 7 deletions(-)
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index 84e4cb969ce392..eaa0a50aa848f3 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -5,6 +5,119 @@ XFS allocates several inodes when a filesystem is created. These are internal
and not accessible from the standard directory structure. These inodes are only
accessible from the superblock.
+[[Metadata_Directories]]
+== Metadata Directory Tree
+
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +sb_metadirino+
+field in the superblock points to the root of a directory tree containing
+metadata files. This directory tree is completely internal to the filesystem
+and must not be exposed to user programs.
+
+When this feature is enabled, metadata files should be found by walking the
+metadata directory tree. The superblock fields that formerly pointed to some
+of those inodes have been deallocated and may be reused by future features.
+
+.Metadata Directory Paths
+[options="header"]
+|=====
+| Metadata File | Location
+|=====
+
+Metadata files are flagged by the +XFS_DIFLAG2_METADATA+ flag in the
++di_flags2+ field. Metadata files must have the following properties:
+
+* Must be either a directory or a regular file.
+* chmod 0000
+* User and group IDs set to zero.
+* The +XFS_DIFLAG_IMMUTABLE+, +XFS_DIFLAG_SYNC+, +XFS_DIFLAG_NOATIME+, +XFS_DIFLAG_NODUMP+, and +XFS_DIFLAG_NODEFRAG+ flags must all be set in +di_flags+.
+* For a directory, the +XFS_DIFLAG_NOSYMLINKS+ flag must also be set.
+* The +XFS_DIFLAG2_METADATA+ flag must be set in +di_flags2+.
+* The +XFS_DIFLAG2_DAX+ flag must not be set.
+
+=== Metadata Directory Example
+
+This example shows a metadata directory from a freshly formatted root
+filesystem:
+
+----
+xfs_db> sb 0
+xfs_db> p
+magicnum = 0x58465342
+blocksize = 4096
+dblocks = 5192704
+rblocks = 0
+rextents = 0
+uuid = cbf2ceef-658e-46b0-8f96-785661c37976
+logstart = 4194311
+rootino = 128
+rbmino = 130
+rsumino = 131
+...
+meta_uuid = 00000000-0000-0000-0000-000000000000
+metadirino = 129
+...
+----
+
+Notice how the listing includes the root of the metadata directory tree
+(+metadirino+).
+
+----
+xfs_db> path -m /
+xfs_db> ls
+8 129 directory 0x0000002e 1 . (good)
+10 129 directory 0x0000172e 2 .. (good)
+12 33685632 directory 0x2d18ab4c 8 rtgroups (good)
+----
+
+Here we use the +path+ and +ls+ commands to display the root directory of
+the metadata directory tree. We can navigate the directory the old way, too:
+
+----
+xfs_db> p
+core.magic = 0x494e
+core.mode = 040000
+core.version = 3
+core.format = 1 (local)
+core.onlink = 0
+core.uid = 0
+core.gid = 0
+...
+v3.flags2 = 0x8000000000000018
+v3.cowextsize = 0
+v3.crtime.sec = Wed Aug 7 10:22:36 2024
+v3.crtime.nsec = 273744000
+v3.inumber = 129
+v3.uuid = 7e55b909-8728-4d69-a1fa-891427314eea
+v3.reflink = 0
+v3.cowextsz = 0
+v3.dax = 0
+v3.bigtime = 1
+v3.nrext64 = 1
+v3.metadata = 1
+u3.sfdir3.hdr.count = 1
+u3.sfdir3.hdr.i8count = 0
+u3.sfdir3.hdr.parent.i4 = 129
+u3.sfdir3.list[0].namelen = 8
+u3.sfdir3.list[0].offset = 0x60
+u3.sfdir3.list[0].name = "rtgroups"
+u3.sfdir3.list[0].inumber.i4 = 33685632
+u3.sfdir3.list[0].filetype = 2
+----
+
+The root of the metadata directory is a short format directory, and looks just
+like any other directory. The only difference is that the metadata flag is
+set, and the directory can only be viewed in the XFS debugger.
+
+----
+xfs_db> path -m /rtgroups/0.rmap
+xfs_db> btdump
+u3.rtrmapbt.recs[1] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
+1:[0,1,-3,0,0,0,0]
+----
+
+Observe that we can use the xfs_db +path+ command to navigate the metadata
+directory tree to the rtgroup 0 reverse mapping file and display its contents.
+
[[Quota_Inodes]]
== Quota Inodes
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index 34c064871cb255..02ec0d12bb57e5 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -78,7 +78,10 @@ struct xfs_dinode_core {
__uint16_t di_mode;
__int8_t di_version;
__int8_t di_format;
- __uint16_t di_onlink;
+ union {
+ __uint16_t di_onlink;
+ __uint16_t di_metatype;
+ };
__uint32_t di_uid;
__uint32_t di_gid;
__uint32_t di_nlink;
@@ -188,6 +191,17 @@ In v1 inodes, this specifies the number of links to the inode from directories.
When the number exceeds 65535, the inode is converted to v2 and the link count
is stored in +di_nlink+.
+*di_metatype*::
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +di_onlink+ field
+is redefined to declare the intended contents of files in the metadata
+directory tree.
+
+[source, c]
+----
+enum xfs_metafile_type {
+};
+----
+
*di_uid*::
Specifies the owner's UID of the inode.
@@ -383,6 +397,12 @@ will be copied to all newly created files and directories.
Files with this flag set may have up to (2^48^ - 1) extents mapped to the data
fork and up to (2^32^ - 1) extents mapped to the attribute fork. This flag
requires the +XFS_SB_FEAT_INCOMPAT_NREXT64+ feature to be enabled.
+| +XFS_DIFLAG2_METADATA+ |
+This file contains filesystem metadata. This feature requires the
++XFS_SB_FEAT_INCOMPAT_METADIR+ feature to be enabled. See the section about
+xref:Metadata_Directories[metadata directories] for more information on
+metadata inode properties. Only directories and regular files can have this
+flag set.
|=====
*di_cowextsize*::
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
index 79e8c30dc93e79..56877615ae81bf 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -69,7 +69,7 @@ struct xfs_dsb {
__be64 sb_pquotino;
__be64 sb_lsn;
uuid_t sb_meta_uuid;
- __be64 sb_rrmapino;
+ __be64 sb_metadirino;
/* must be padded to 64 bit alignment */
};
@@ -438,6 +438,10 @@ information.
Directory parent pointers. See the section about xref:Parent_Pointers[parent
pointers] for more information.
+| +XFS_SB_FEAT_INCOMPAT_METADIR+ |
+Metadata directory tree. See the section about the xref:Metadata_Directories[
+metadata directory tree] for more information.
+
|=====
*sb_features_log_incompat*::
@@ -471,11 +475,9 @@ If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
all metadata blocks must match this UUID. If not, the block header UUID field
must match +sb_uuid+.
-*sb_rrmapino*::
-If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time
-device is present (+sb_rblocks+ > 0), this field points to an inode
-that contains the root to the
-xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree].
+*sb_metadirino*::
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is set, this field points to
+the inode of the root directory of the metadata directory tree.
This field is zero otherwise.
=== xfs_db Superblock Example
* [PATCH 06/10] design: move discussion of realtime volumes to a separate section
2024-11-27 0:18 ` [PATCHSET] " Darrick J. Wong
` (4 preceding siblings ...)
2024-11-27 0:19 ` [PATCH 05/10] design: document the changes required to handle metadata directories Darrick J. Wong
@ 2024-11-27 0:19 ` Darrick J. Wong
2024-11-27 0:19 ` [PATCH 07/10] design: document realtime groups Darrick J. Wong
` (3 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27 0:19 UTC (permalink / raw)
To: djwong; +Cc: hch, hch, cem, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
In preparation for documenting the realtime modernization project, move
the discussions of the realtime-related ondisk metadata to a separate
file. Since realtime reverse mapping btrees haven't been added to the
filesystem yet, stop including them in the final output.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
.../allocation_groups.asciidoc | 20 --------
.../internal_inodes.asciidoc | 36 +-------------
design/XFS_Filesystem_Structure/realtime.asciidoc | 50 ++++++++++++++++++++
.../xfs_filesystem_structure.asciidoc | 2 +
4 files changed, 54 insertions(+), 54 deletions(-)
create mode 100644 design/XFS_Filesystem_Structure/realtime.asciidoc
diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index e2cdaab5e03d3f..c746a92ca47dd6 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -772,23 +772,3 @@ core.magic = 0x494e
The chunk record also indicates that this chunk has 32 inodes, and that the
missing inodes are also ``free''.
-
-[[Real-time_Devices]]
-== Real-time Devices
-
-The performance of the standard XFS allocator varies depending on the internal
-state of the various metadata indices enabled on the filesystem. For
-applications which need to minimize the jitter of allocation latency, XFS
-supports the notion of a ``real-time device''. This is a special device
-separate from the regular filesystem where extent allocations are tracked with
-a bitmap and free space is indexed with a two-dimensional array. If an inode
-is flagged with +XFS_DIFLAG_REALTIME+, its data will live on the real time
-device. The metadata for real time devices is discussed in the section about
-xref:Real-time_Inodes[real time inodes].
-
-By placing the real time device (and the journal) on separate high-performance
-storage devices, it is possible to reduce most of the unpredictability in I/O
-response times that come from metadata operations.
-
-None of the XFS per-AG B+trees are involved with real time files. It is not
-possible for real time files to share data blocks.
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index eaa0a50aa848f3..68c86d30ff8206 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -287,41 +287,9 @@ Log sequence number of the last DQ block write.
*dd_crc*::
Checksum of the DQ block.
-
[[Real-time_Inodes]]
== Real-time Inodes
There are two inodes allocated to managing the real-time device's space, the
-Bitmap Inode and the Summary Inode.
-
-[[Real-Time_Bitmap_Inode]]
-=== Real-Time Bitmap Inode
-
-The real time bitmap inode, +sb_rbmino+, tracks the used/free space in the
-real-time device using an old-style bitmap. One bit is allocated per real-time
-extent. The size of an extent is specified by the superblock's +sb_rextsize+
-value.
-
-The number of blocks used by the bitmap inode is equal to the number of
-real-time extents (+sb_rextents+) divided by the block size (+sb_blocksize+)
-and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and
-extent array for the inode should match this. Each real time block gets its
-own bit in the bitmap.
-
-[[Real-Time_Summary_Inode]]
-=== Real-Time Summary Inode
-
-The real time summary inode, +sb_rsumino+, tracks the used and free space
-accounting information for the real-time device. This file indexes the
-approximate location of each free extent on the real-time device first by
-log2(extent size) and then by the real-time bitmap block number. The size of
-the summary inode file is equal to +sb_rbmblocks+ × log2(realtime device size)
-× sizeof(+xfs_suminfo_t+). The entry for a given log2(extent size) and
-rtbitmap block number is 0 if there is no free extents of that size at that
-rtbitmap location, and positive if there are any.
-
-This data structure is not particularly space efficient, however it is a very
-fast way to provide the same data as the two free space B+trees for regular
-files since the space is preallocated and metadata maintenance is minimal.
-
-include::rtrmapbt.asciidoc[]
+xref:Real-Time_Bitmap_Inode[Bitmap Inode] and the
+xref:Real-Time_Summary_Inode[Summary Inode].
diff --git a/design/XFS_Filesystem_Structure/realtime.asciidoc b/design/XFS_Filesystem_Structure/realtime.asciidoc
new file mode 100644
index 00000000000000..11426e8fdb632d
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/realtime.asciidoc
@@ -0,0 +1,50 @@
+[[Real-time_Devices]]
+= Real-time Devices
+
+The performance of the standard XFS allocator varies depending on the internal
+state of the various metadata indices enabled on the filesystem. For
+applications which need to minimize the jitter of allocation latency, XFS
+supports the notion of a ``real-time device''. This is a special device
+separate from the regular filesystem where extent allocations are tracked with
+a bitmap and free space is indexed with a two-dimensional array. If an inode
+is flagged with +XFS_DIFLAG_REALTIME+, its data will live on the real time
+device.
+
+By placing the real time device (and the journal) on separate high-performance
+storage devices, it is possible to reduce most of the unpredictability in I/O
+response times that come from metadata operations.
+
+None of the XFS per-AG B+trees are involved with real time files. It is not
+possible for real time files to share data blocks.
+
+[[Real-Time_Bitmap_Inode]]
+== Free Space Bitmap Inode
+
+The real time bitmap inode, +sb_rbmino+, tracks the used/free space in the
+real-time device using an old-style bitmap. One bit is allocated per real-time
+extent. The size of an extent is specified by the superblock's +sb_rextsize+
+value.
+
+The number of blocks used by the bitmap inode is equal to the number of
+real-time extents (+sb_rextents+) divided by the block size (+sb_blocksize+)
+and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and
+extent array for the inode should match this. Each real time block gets its
+own bit in the bitmap.
+
+[[Real-Time_Summary_Inode]]
+== Free Space Summary Inode
+
+The real time summary inode, +sb_rsumino+, tracks the used and free space
+accounting information for the real-time device. This file indexes the
+approximate location of each free extent on the real-time device first by
+log2(extent size) and then by the real-time bitmap block number. The size of
+the summary inode file is equal to +sb_rbmblocks+ × log2(realtime device size)
+× sizeof(+xfs_suminfo_t+). The entry for a given log2(extent size) and
+rtbitmap block number is 0 if there are no free extents of that size at that
+rtbitmap location, and positive if there are any.
+
+This data structure is not particularly space efficient, however it is a very
+fast way to provide the same data as the two free space B+trees for regular
+files since the space is preallocated and metadata maintenance is minimal.
+
+include::rtrmapbt.asciidoc[]
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index 689e2a874c13e9..a643d18add6094 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -84,6 +84,8 @@ include::journaling_log.asciidoc[]
include::internal_inodes.asciidoc[]
+include::realtime.asciidoc[]
+
include::fs_properties.asciidoc[]
:leveloffset: 0
* [PATCH 07/10] design: document realtime groups
2024-11-27 0:18 ` [PATCHSET] " Darrick J. Wong
` (5 preceding siblings ...)
2024-11-27 0:19 ` [PATCH 06/10] design: move discussion of realtime volumes to a separate section Darrick J. Wong
@ 2024-11-27 0:19 ` Darrick J. Wong
2024-11-27 0:20 ` [PATCH 08/10] design: document metadata directory tree quota changes Darrick J. Wong
` (2 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27 0:19 UTC (permalink / raw)
To: djwong; +Cc: hch, hch, cem, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
Document the ondisk changes for realtime allocation groups.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
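As a rough illustration of the arithmetic behind the xfs_db examples in this
patch (a sketch, not code from xfsprogs or the kernel): the block header below
is 48 bytes, so a 4096-byte block holds 1012 be32 words, which matches the
rtwords[0-1011] and suminfo[0-1011] ranges in the dumps. The struct and
function names are hypothetical stand-ins, host-endian integers replace the
__be32/__be64 fields, and the summary indexing (log2 size first, then bitmap
block) is assumed from the prose description of the summary file.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mirror of struct xfs_rtbuf_blkinfo. Plain integers stand in
 * for the big-endian fields and a 16-byte array stands in for uuid_t.
 */
struct rtbuf_blkinfo_sketch {
	uint32_t rt_magic;
	uint32_t rt_crc;
	uint64_t rt_owner;
	uint64_t rt_blkno;
	uint64_t rt_lsn;
	uint8_t  rt_uuid[16];
};

_Static_assert(sizeof(struct rtbuf_blkinfo_sketch) == 48,
	       "block header is expected to occupy 48 bytes");

/* Number of 32-bit words that fit after the header in one file block. */
static unsigned int rt_words_per_block(unsigned int blocksize)
{
	return (unsigned int)((blocksize - sizeof(struct rtbuf_blkinfo_sketch)) /
			      sizeof(uint32_t));
}

/* Position of one summary counter: log2(extent size) and bitmap block. */
struct rtsum_pos_sketch {
	unsigned int log2_len;
	unsigned int bitmap_block;
};

/*
 * Assumption: counters are laid out as log2_len * bitmap_blocks +
 * bitmap_block, i.e. indexed first by size class, then by bitmap block.
 */
static struct rtsum_pos_sketch rtsum_decode(unsigned int index,
					    unsigned int bitmap_blocks)
{
	struct rtsum_pos_sketch pos = {
		index / bitmap_blocks,
		index % bitmap_blocks,
	};
	return pos;
}
```

Under those assumptions, the nonzero counter at index 627 in the summary dump,
with the 33 bitmap blocks reported by core.nblocks, decodes to size class 19
in bitmap block 0.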
.../XFS_Filesystem_Structure/common_types.asciidoc | 4
.../internal_inodes.asciidoc | 2
design/XFS_Filesystem_Structure/magic.asciidoc | 3
.../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 2
design/XFS_Filesystem_Structure/realtime.asciidoc | 344 ++++++++++++++++++++
.../XFS_Filesystem_Structure/superblock.asciidoc | 22 +
6 files changed, 376 insertions(+), 1 deletion(-)
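The segmented xfs_rtblock_t encoding and sb_rgblklog described in this patch
can be sketched as follows. This is only an illustration under stated
assumptions: the rtgroup number is assumed to sit in the bits above
sb_rgblklog with the block offset below it, and every name here
(rtgroup_geom_sketch, rtb_to_rgno, and so on) is hypothetical rather than
taken from the kernel or xfsprogs.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(log2(v)) for v >= 1. */
static unsigned int log2_roundup(uint64_t v)
{
	unsigned int log = 0;

	while (log < 63 && (UINT64_C(1) << log) < v)
		log++;
	return log;
}

/* Hypothetical in-memory summary of the rtgroup geometry fields. */
struct rtgroup_geom_sketch {
	uint64_t rgblocks;	/* sb_rgextents * sb_rextsize */
	uint8_t  rgblklog;	/* stand-in for sb_rgblklog */
};

static struct rtgroup_geom_sketch rtgroup_geom(uint32_t rgextents,
					       uint32_t rextsize)
{
	struct rtgroup_geom_sketch g;

	g.rgblocks = (uint64_t)rgextents * rextsize;
	g.rgblklog = (uint8_t)log2_roundup(g.rgblocks);
	return g;
}

/* Assumed layout: group number in the high bits of the rt block number. */
static uint32_t rtb_to_rgno(const struct rtgroup_geom_sketch *g, uint64_t rtbno)
{
	return (uint32_t)(rtbno >> g->rgblklog);
}

/* Assumed layout: block offset within the group in the low bits. */
static uint64_t rtb_to_rgoff(const struct rtgroup_geom_sketch *g, uint64_t rtbno)
{
	return rtbno & ((UINT64_C(1) << g->rgblklog) - 1);
}

/* Linear block number on the realtime device, counting groups back to back. */
static uint64_t rtb_to_linear(const struct rtgroup_geom_sketch *g, uint64_t rtbno)
{
	return (uint64_t)rtb_to_rgno(g, rtbno) * g->rgblocks +
	       rtb_to_rgoff(g, rtbno);
}
```

When the group size is a power of two the segmented and linear values
coincide; with a group of 1000000 single-block extents, sb_rgblklog would be
20 and the two numbering schemes diverge from group 1 onward.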
diff --git a/design/XFS_Filesystem_Structure/common_types.asciidoc b/design/XFS_Filesystem_Structure/common_types.asciidoc
index 51909be384e273..34cdfdaeccf848 100644
--- a/design/XFS_Filesystem_Structure/common_types.asciidoc
+++ b/design/XFS_Filesystem_Structure/common_types.asciidoc
@@ -43,7 +43,9 @@ Unsigned 64 bit raw filesystem block number.
*xfs_rtblock_t*::
Unsigned 64 bit extent number in the xref:Real-time_Devices[real-time]
-sub-volume.
+sub-volume. If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, these
+values combine an xref:Realtime_Groups[rtgroup number] and block offset into
+the realtime group.
*xfs_fileoff_t*::
Unsigned 64 bit block offset into a file.
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index 68c86d30ff8206..5f4d62201cbd67 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -21,6 +21,8 @@ of those inodes have been deallocated and may be reused by future features.
[options="header"]
|=====
| Metadata File | Location
+| xref:Real-Time_Bitmap_Inode[Realtime Bitmap] | /rtgroups/*.bitmap
+| xref:Real-Time_Summary_Inode[Realtime Summary] | /rtgroups/*.summary
|=====
Metadata files are flagged by the +XFS_DIFLAG2_METADATA+ flag in the
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index 60952aeb876ff5..5da29b9ef9f3a8 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -45,9 +45,12 @@ relevant chapters. Magic numbers tend to have consistent locations:
| +XFS_ATTR3_LEAF_MAGIC+ | 0x3bee | | xref:Leaf_Attributes[Leaf Attribute], v5 only
| +XFS_ATTR3_RMT_MAGIC+ | 0x5841524d | XARM | xref:Remote_Values[Remote Attribute Value], v5 only
| +XFS_RMAP_CRC_MAGIC+ | 0x524d4233 | RMB3 | xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
+| +XFS_RTBITMAP_MAGIC+ | 0x424D505A | BMPZ | xref:Real-Time_Bitmap_Inode[Real-Time Bitmap], metadir only
+| +XFS_RTSUMMARY_MAGIC+ | 0x53554D59 | SUMY | xref:Real-Time_Summary_Inode[Real-Time Summary], metadir only
| +XFS_RTRMAP_CRC_MAGIC+ | 0x4d415052 | MAPR | xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree], v5 only
| +XFS_REFC_CRC_MAGIC+ | 0x52334643 | R3FC | xref:Reference_Count_Btree[Reference Count B+tree], v5 only
| +XFS_MD_MAGIC+ | 0x5846534d | XFSM | xref:Metadata_Dumps[Metadata Dumps]
+| +XFS_RTSB_MAGIC+ | 0x46726F67 | Frog | xref:Realtime_Groups[Realtime Groups]
|=====
The magic numbers for log items are at offset zero in each log item, but items
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index 02ec0d12bb57e5..e28929907147b7 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -199,6 +199,8 @@ directory tree.
[source, c]
----
enum xfs_metafile_type {
+ XFS_METAFILE_RTBITMAP,
+ XFS_METAFILE_RTSUMMARY,
};
----
diff --git a/design/XFS_Filesystem_Structure/realtime.asciidoc b/design/XFS_Filesystem_Structure/realtime.asciidoc
index 11426e8fdb632d..3a72eb5175ad89 100644
--- a/design/XFS_Filesystem_Structure/realtime.asciidoc
+++ b/design/XFS_Filesystem_Structure/realtime.asciidoc
@@ -31,6 +31,146 @@ and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and
extent array for the inode should match this. Each real time block gets its
own bit in the bitmap.
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, each block of the
+realtime bitmap file has a header of the following format:
+
+[source, c]
+----
+struct xfs_rtbuf_blkinfo {
+ __be32 rt_magic;
+ __be32 rt_crc;
+ __be64 rt_owner;
+ __be64 rt_blkno;
+ __be64 rt_lsn;
+ uuid_t rt_uuid;
+};
+----
+
+*rt_magic*::
+Specifies the magic number for the rtbitmap block: ``BMPZ'' (0x424D505A).
+
+*rt_crc*::
+Checksum of the block.
+
+*rt_owner*::
+Specifies the inode number for the file that owns this block.
+
+*rt_blkno*::
+Disk address of this block.
+
+*rt_lsn*::
+Log sequence number of the last write to this block.
+
+*rt_uuid*::
+The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+
+depending on which features are set.
+
+After the block header, the bitmap data are encoded as be32 word values.
+
+=== xfs_db rtbitmap Example
+
+This example shows a real-time bitmap file from a freshly populated filesystem:
+
+----
+xfs_db> path -m /rtgroups/3.bitmap
+xfs_db> p
+core.magic = 0x494e
+core.mode = 0100000
+core.version = 3
+core.format = 2 (extents)
+core.metatype = 5 (rtbitmap)
+core.uid = 0
+core.gid = 0
+core.nlinkv2 = 1
+core.projid_lo = 3
+core.projid_hi = 0
+core.nextents = 1
+core.atime.sec = Tue Oct 15 16:04:02 2024
+core.atime.nsec = 769675000
+core.mtime.sec = Tue Oct 15 16:04:02 2024
+core.mtime.nsec = 769675000
+core.ctime.sec = Tue Oct 15 16:04:02 2024
+core.ctime.nsec = 769681000
+core.size = 135168
+core.nblocks = 33
+core.extsize = 0
+core.naextents = 0
+core.forkoff = 24
+core.aformat = 1 (local)
+core.dmevmask = 0
+core.dmstate = 0
+core.newrtbm = 0
+core.prealloc = 0
+core.realtime = 0
+core.immutable = 1
+core.append = 0
+core.sync = 1
+core.noatime = 1
+core.nodump = 1
+core.rtinherit = 0
+core.projinherit = 0
+core.nosymlinks = 0
+core.extsz = 0
+core.extszinherit = 0
+core.nodefrag = 1
+core.filestream = 0
+core.gen = 2653591217
+next_unlinked = null
+v3.crc = 0x34a17119 (correct)
+v3.change_count = 3
+v3.lsn = 0
+v3.flags2 = 0x38
+v3.cowextsize = 0
+v3.crtime.sec = Tue Oct 15 16:04:02 2024
+v3.crtime.nsec = 769675000
+v3.inumber = 33685633
+v3.uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+v3.reflink = 0
+v3.cowextsz = 0
+v3.dax = 0
+v3.bigtime = 1
+v3.nrext64 = 1
+v3.metadata = 1
+u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
+0:[0,4210712,33,0]
+a.sfattr.hdr.totsize = 27
+a.sfattr.hdr.count = 1
+a.sfattr.list[0].namelen = 8
+a.sfattr.list[0].valuelen = 12
+a.sfattr.list[0].root = 0
+a.sfattr.list[0].secure = 0
+a.sfattr.list[0].parent = 1
+a.sfattr.list[0].name = "0.bitmap"
+a.sfattr.list[0].parent_dir.inumber = 33685632
+a.sfattr.list[0].parent_dir.gen = 142228546
+xfs_db> dblock 0
+xfs_db> p
+magicnum = 0x424d505a
+crc = 0xc8b10abf (correct)
+owner = 33685633
+bno = 20902080
+lsn = 0x100007696
+uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+rtwords[0-1011] = 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0
+14:0 15:0 16:0 17:0 18:0 19:0 20:0 21:0xfffff800 22:0xffffffff 23:0xffffffff
+24:0xffffffff 25:0xffffffff 26:0xffffffff 27:0xffffffff 28:0xffffffff
+29:0xffffffff 30:0xffffffff 31:0xffffffff 32:0xffffffff
+...
+979:0xffffffff 980:0xffffffff 981:0xffffffff 982:0xffffffff 983:0xffffffff
+984:0xffffffff 985:0xffffffff 986:0xffffffff 987:0xffffffff 988:0xffffffff
+989:0xffffffff 990:0xffffffff 991:0xffffffff 992:0xffffffff 993:0xffffffff
+994:0xffffffff 995:0xffffffff 996:0xffffffff 997:0xffffffff 998:0xffffffff
+999:0xffffffff 1000:0xffffffff 1001:0xffffffff 1002:0xffffffff 1003:0xffffffff
+1004:0xffffffff 1005:0xffffffff 1006:0xffffffff 1007:0xffffffff 1008:0xffffffff
+1009:0xffffffff 1010:0xffffffff 1011:0xffffffff
+----
+
+From this example, we can see that this is a bitmap file in the
+metadata directory tree, and that it is the bitmap file for rtgroup 3. The
+first block in the bitmap file carries the new block header, and its leading
+zero bits (words 0-20 and the low 11 bits of word 21) show that the first 683
+extents are allocated. The bitmap words were excerpted for brevity.
+
[[Real-Time_Summary_Inode]]
== Free Space Summary Inode
@@ -47,4 +187,208 @@ This data structure is not particularly space efficient, however it is a very
fast way to provide the same data as the two free space B+trees for regular
files since the space is preallocated and metadata maintenance is minimal.
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, each block of the
+realtime summary file has the same header as rtbitmap file blocks. However,
+the magic number will be ``SUMY'' (0x53554D59). After the block header, the
+summary counts are encoded as be32 integers.
+
+=== xfs_db rtsummary Example
+
+This example shows a real-time summary file from a freshly populated filesystem:
+
+----
+xfs_db> path -m /rtgroups/3.summary
+xfs_db> p
+core.magic = 0x494e
+core.mode = 0100000
+core.version = 3
+core.format = 2 (extents)
+core.metatype = 6 (rtsummary)
+core.uid = 0
+core.gid = 0
+core.nlinkv2 = 1
+core.projid_lo = 3
+core.projid_hi = 0
+core.nextents = 1
+core.atime.sec = Tue Oct 15 16:04:02 2024
+core.atime.nsec = 769694000
+core.mtime.sec = Tue Oct 15 16:04:02 2024
+core.mtime.nsec = 769694000
+core.ctime.sec = Tue Oct 15 16:04:02 2024
+core.ctime.nsec = 769699000
+core.size = 4096
+core.nblocks = 1
+core.extsize = 0
+core.naextents = 0
+core.forkoff = 24
+core.aformat = 1 (local)
+core.dmevmask = 0
+core.dmstate = 0
+core.newrtbm = 0
+core.prealloc = 0
+core.realtime = 0
+core.immutable = 1
+core.append = 0
+core.sync = 1
+core.noatime = 1
+core.nodump = 1
+core.rtinherit = 0
+core.projinherit = 0
+core.nosymlinks = 0
+core.extsz = 0
+core.extszinherit = 0
+core.nodefrag = 1
+core.filestream = 0
+core.gen = 519466891
+next_unlinked = null
+v3.crc = 0x54fc58d0 (correct)
+v3.change_count = 3
+v3.lsn = 0
+v3.flags2 = 0x38
+v3.cowextsize = 0
+v3.crtime.sec = Tue Oct 15 16:04:02 2024
+v3.crtime.nsec = 769694000
+v3.inumber = 33685634
+v3.uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+v3.reflink = 0
+v3.cowextsz = 0
+v3.dax = 0
+v3.bigtime = 1
+v3.nrext64 = 1
+v3.metadata = 1
+u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
+0:[0,4210703,1,0]
+a.sfattr.hdr.totsize = 28
+a.sfattr.hdr.count = 1
+a.sfattr.list[0].namelen = 9
+a.sfattr.list[0].valuelen = 12
+a.sfattr.list[0].root = 0
+a.sfattr.list[0].secure = 0
+a.sfattr.list[0].parent = 1
+a.sfattr.list[0].name = "0.summary"
+a.sfattr.list[0].parent_dir.inumber = 33685632
+a.sfattr.list[0].parent_dir.gen = 142228546
+xfs_db> dblock 0
+xfs_db> p
+magicnum = 0x53554d59
+crc = 0x473340a8 (correct)
+owner = 33685634
+bno = 20902008
+lsn = 0x100007696
+uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+suminfo[0-1011] = 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0
+14:0 15:0 16:0 17:0 18:0 19:0 20:0 21:0 22:0 23:0 24:0 25:0 26:0 27:0 28:0 29:0
+30:0 31:0 32:0
+...
+618:0 619:0 620:0 621:0 622:0 623:0 624:0 625:0 626:0 627:1 628:0 629:0 630:0
+...
+979:0 980:0 981:0 982:0 983:0 984:0 985:0 986:0 987:0 988:0 989:0 990:0 991:0
+992:0 993:0 994:0 995:0 996:0 997:0 998:0 999:0 1000:0 1001:0 1002:0 1003:0
+1004:0 1005:0 1006:0 1007:0 1008:0 1009:0 1010:0 1011:0
+----
+
+From this example, we can see that this is a summary file in the
+metadata directory tree, and that it is the summary file for rtgroup 3. The
+first block in the summary file carries the new block header, and the single
+nonzero counter records the one large free extent in this group.
+The summary counts were excerpted for brevity.
+
+[[Realtime_Groups]]
+== Realtime Groups
+
+To reduce metadata contention for space allocation and remapping activities
+being applied to realtime files, the realtime volume can be split into
+allocation groups, just like the data volume. The free space information for
+each group is contained in that group's own bitmap and summary files.
+
+Each realtime allocation group can contain up to (2^31^ - 1) filesystem blocks,
+regardless of the underlying realtime extent size.
+
+Each realtime group has the following characteristics:
+
+ * Group 0 has a super block describing overall filesystem info
+ * Free space bitmap
+ * Summary of free space
+
+The free space metadata are the same as described in the previous sections,
+except that their scope covers only a single rtgroup. The other structures are
+expanded upon in the following sections.
+
+[[Realtime_Group_Superblocks]]
+=== Superblocks
+
+The first block of realtime group 0 contains a superblock. Its fields
+must match their counterparts in the filesystem superblock on the data device.
+
+[source, c]
+----
+struct xfs_rtsb {
+ __be32 rsb_magicnum;
+ __le32 rsb_crc;
+
+ __be32 rsb_pad;
+ unsigned char rsb_fname[XFSLABEL_MAX];
+
+ uuid_t rsb_uuid;
+ uuid_t rsb_meta_uuid;
+
+ /* must be padded to 64 bit alignment */
+};
+----
+
+*rsb_magicnum*::
+Identifies the filesystem. Its value is +XFS_RTSB_MAGIC+ ``Frog'' (0x46726F67).
+
+*rsb_crc*::
+Superblock checksum.
+
+*rsb_pad*::
+Must be zero.
+
+*rsb_fname[12]*::
+Name for the filesystem. This matches +sb_fname+ in the primary superblock.
+
+*rsb_uuid*::
+UUID (Universally Unique ID) for the filesystem. This matches +sb_uuid+ in the
+primary superblock.
+
+*rsb_meta_uuid*::
+Metadata UUID for the filesystem. This matches +sb_meta_uuid+ in the primary
+superblock.
+
+==== xfs_db rtgroup Superblock Example
+
+A filesystem is made on a multidisk system with the following command:
+
+----
+# mkfs.xfs -r rtgroups=1,rgcount=4,rtdev=/dev/sdb /dev/sda -f
+meta-data=/dev/sda isize=512 agcount=4, agsize=1298176 blks
+ = sectsz=512 attr=2, projid32bit=1
+ = crc=1 finobt=1, sparse=1, rmapbt=1
+ = reflink=1 bigtime=1 inobtcount=1 nrext64=1
+ = metadir=1
+data = bsize=4096 blocks=5192704, imaxpct=25
+ = sunit=0 swidth=0 blks
+naming =version 2 bsize=4096 ascii-ci=0, ftype=1
+log =internal log bsize=4096 blocks=16384, version=2
+ = sectsz=512 sunit=0 blks, lazy-count=1
+realtime =/dev/sdb extsz=4096 blocks=5192704, rtextents=5192704
+ = rgcount=5 rgsize=1048576 extents
+----
+
+And in xfs_db, inspecting the realtime group superblock on the realtime
+device:
+
+----
+# xfs_db -R /dev/sdb /dev/sda
+xfs_db> rtsb
+xfs_db> print
+magicnum = 0x46726f67
+crc = 0x759a62d4 (correct)
+pad = 0
+fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
+uuid = 7e55b909-8728-4d69-a1fa-891427314eea
+meta_uuid = 7e55b909-8728-4d69-a1fa-891427314eea
+----
+
include::rtrmapbt.asciidoc[]
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
index 56877615ae81bf..bffb1659d0ba38 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -70,6 +70,10 @@ struct xfs_dsb {
__be64 sb_lsn;
uuid_t sb_meta_uuid;
__be64 sb_metadirino;
+ __be32 sb_rgcount;
+ __be32 sb_rgextents;
+ __u8 sb_rgblklog;
+ __u8 sb_pad[7];
/* must be padded to 64 bit alignment */
};
@@ -480,6 +484,24 @@ If the +XFS_SB_FEAT_RO_INCOMPAT_METADIR+ feature is set, this field points to
the inode of the root directory of the metadata directory tree.
This field is zero otherwise.
+*sb_rgcount*::
+Count of realtime groups in the filesystem, if the
++XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled. If no realtime subvolume
+exists, this value will be zero.
+
+*sb_rgextents*::
+Maximum number of realtime extents that can be contained within a realtime
+group, if the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled.
+
+*sb_rgblklog*::
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, this is the log~2~
+value of +sb_rgextents+ * +sb_rextsize+ (rounded up). This value is used to
+generate absolute block numbers defined in extent maps from the segmented
++xfs_rtblock_t+ values.
+
+*sb_pad[7]*::
+Zeroes, if the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled.
+
=== xfs_db Superblock Example
A filesystem is made on a single disk with the following command:
* [PATCH 08/10] design: document metadata directory tree quota changes
2024-11-27 0:18 ` [PATCHSET] " Darrick J. Wong
` (6 preceding siblings ...)
2024-11-27 0:19 ` [PATCH 07/10] design: document realtime groups Darrick J. Wong
@ 2024-11-27 0:20 ` Darrick J. Wong
2024-11-27 0:20 ` [PATCH 09/10] design: update metadump v2 format to reflect rt dumps Darrick J. Wong
2024-11-27 0:20 ` [PATCH 10/10] xfs-documentation: release for 6.1[23] Darrick J. Wong
9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27 0:20 UTC (permalink / raw)
To: djwong; +Cc: hch, hch, cem, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
Document the changes to the ondisk quota metadata that came in with
metadata directory trees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
.../internal_inodes.asciidoc | 3 +++
.../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 3 +++
.../XFS_Filesystem_Structure/superblock.asciidoc | 3 +++
3 files changed, 9 insertions(+)
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index 5f4d62201cbd67..40eb57233ce7c0 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -21,6 +21,9 @@ of those inodes have been deallocated and may be reused by future features.
[options="header"]
|=====
| Metadata File | Location
+| xref:Quota_Inodes[User Quota] | /quota/user
+| xref:Quota_Inodes[Group Quota] | /quota/group
+| xref:Quota_Inodes[Project Quota] | /quota/project
| xref:Real-Time_Bitmap_Inode[Realtime Bitmap] | /rtgroups/*.bitmap
| xref:Real-Time_Summary_Inode[Realtime Summary] | /rtgroups/*.summary
|=====
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index e28929907147b7..6e52e5fd3d6c1e 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -199,6 +199,9 @@ directory tree.
[source, c]
----
enum xfs_metafile_type {
+ XFS_METAFILE_USRQUOTA,
+ XFS_METAFILE_GRPQUOTA,
+ XFS_METAFILE_PRJQUOTA,
XFS_METAFILE_RTBITMAP,
XFS_METAFILE_RTSUMMARY,
};
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
index bffb1659d0ba38..f0455304635737 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -259,6 +259,9 @@ Quota flags. It can be a combination of the following flags:
| +XFS_PQUOTA_CHKD+ | Project quotas have been checked.
|=====
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +sb_qflags+ field
+will persist across mounts if no quota mount options are provided.
+
*sb_flags*::
Miscellaneous flags.
* [PATCH 09/10] design: update metadump v2 format to reflect rt dumps
2024-11-27 0:18 ` [PATCHSET] " Darrick J. Wong
` (7 preceding siblings ...)
2024-11-27 0:20 ` [PATCH 08/10] design: document metadata directory tree quota changes Darrick J. Wong
@ 2024-11-27 0:20 ` Darrick J. Wong
2024-11-27 0:20 ` [PATCH 10/10] xfs-documentation: release for 6.1[23] Darrick J. Wong
9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27 0:20 UTC (permalink / raw)
To: djwong; +Cc: hch, hch, cem, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
Update the metadump v2 format documentation to add realtime device
dumps.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
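To make the device selector in the metadump v2 extent address concrete, here
is a small decoding sketch. It assumes the 64-bit address has already been
converted to host byte order, that "the lower 54 bits" means bits 0 through
53, and that the two-bit device selector sits immediately above them; the
macro, enum, and function names are hypothetical and are not taken from
xfsprogs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the shift follows "the lower 54 bits" in the text. */
#define MD_ADDR_DEVICE_SHIFT	54
#define MD_ADDR_DEVICE_MASK	UINT64_C(0x3)
#define MD_ADDR_OFFSET_MASK	((UINT64_C(1) << MD_ADDR_DEVICE_SHIFT) - 1)

/* Device values from the table in the metadump chapter. */
enum md_device_sketch {
	MD_DEV_DATA	= 0,	/* Data device */
	MD_DEV_LOG	= 1,	/* External log */
	MD_DEV_RT	= 2,	/* Realtime device */
};

/* Extract the device selector from a host-endian extent address. */
static enum md_device_sketch md_addr_device(uint64_t addr)
{
	return (enum md_device_sketch)((addr >> MD_ADDR_DEVICE_SHIFT) &
				       MD_ADDR_DEVICE_MASK);
}

/* Extract the device address from the low 54 bits. */
static uint64_t md_addr_offset(uint64_t addr)
{
	return addr & MD_ADDR_OFFSET_MASK;
}

/* Combine a device selector and a device address into one value. */
static uint64_t md_addr_pack(enum md_device_sketch dev, uint64_t offset)
{
	return ((uint64_t)dev << MD_ADDR_DEVICE_SHIFT) |
	       (offset & MD_ADDR_OFFSET_MASK);
}
```

A restore tool written along these lines would presumably refuse an address
whose device value is 2 unless the header also sets
+XFS_MD2_INCOMPAT_RTDEVICE+.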
design/XFS_Filesystem_Structure/metadump.asciidoc | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/design/XFS_Filesystem_Structure/metadump.asciidoc b/design/XFS_Filesystem_Structure/metadump.asciidoc
index a32d6423ea6e75..226622c0d2f20e 100644
--- a/design/XFS_Filesystem_Structure/metadump.asciidoc
+++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
@@ -119,7 +119,16 @@ Dump contains external log contents.
|=====
*xmh_incompat_flags*::
-Must be zero.
+A combination of the following flags:
+
+.Metadump v2 incompat flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_MD2_INCOMPAT_RTDEVICE+ |
+Dump contains realtime device contents.
+
+|=====
*xmh_reserved*::
Must be zero.
@@ -143,6 +152,7 @@ Bits 55-56 determine the device from which the metadata dump data was extracted.
| Value | Description
| 0 | Data device
| 1 | External log
+| 2 | Realtime device
|=====
The lower 54 bits determine the device address from which the dump data was
* [PATCH 10/10] xfs-documentation: release for 6.1[23]
2024-11-27 0:18 ` [PATCHSET] " Darrick J. Wong
` (8 preceding siblings ...)
2024-11-27 0:20 ` [PATCH 09/10] design: update metadump v2 format to reflect rt dumps Darrick J. Wong
@ 2024-11-27 0:20 ` Darrick J. Wong
9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27 0:20 UTC (permalink / raw)
To: djwong; +Cc: hch, cem, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
Make a new release since we've just landed ondisk format changes for
6.12 and 6.13.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
design/XFS_Filesystem_Structure/docinfo.xml | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index 1eddb1f42f11a1..3aadb6637070d2 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -230,4 +230,23 @@
</simplelist>
</revdescription>
</revision>
+ <revision>
+ <revnumber>3.1415926535</revnumber>
+ <date>November 2024</date>
+ <author>
+ <firstname>Darrick</firstname>
+ <surname>Wong</surname>
+ <email>djwong@kernel.org</email>
+ </author>
+ <revdescription>
+ <simplelist>
+ <member>update online fsck docs</member>
+ <member>filesystem properties</member>
+ <member>metadata directory tree</member>
+ <member>realtime groups</member>
+ <member>metadir and quota </member>
+ <member>realtime sb metadump</member>
+ </simplelist>
+ </revdescription>
+ </revision>
</revhistory>
* [GIT PULL] xfs-documentation: updates for 6.13
2024-11-27 0:16 [PATCHBOMB] xfs-documentation: updates for 6.13 Darrick J. Wong
2024-11-27 0:18 ` [PATCHSET] " Darrick J. Wong
@ 2024-11-27 0:20 ` Darrick J. Wong
1 sibling, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27 0:20 UTC (permalink / raw)
To: djwong; +Cc: cem, hch, linux-xfs
Hi Darrick,
Please pull this branch with changes.
As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts. Please let me know if you
encounter any problems.
The following changes since commit 661d339d50b8e504456d6435ae25246057d21a21:
Merge tag 'xfsdocs-6.10-updates_2024-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-documentation into mainn (2024-08-22 17:05:15 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-documentation.git tags/xfsdocs-6.13-updates_2024-11-26
for you to fetch changes up to 368784fa00f920518ac686638c163852a477937c:
xfs-documentation: release for 6.1[23] (2024-11-26 15:57:07 -0800)
----------------------------------------------------------------
xfs-documentation: updates for 6.13 [1/3]
Here's a pile of updates detailing the changes made during 6.12 and 6.13.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
----------------------------------------------------------------
Darrick J. Wong (10):
design: update metadata reconstruction chapter
design: document filesystem properties
design: move superblock documentation to a separate file
design: document the actual ondisk superblock
design: document the changes required to handle metadata directories
design: move discussion of realtime volumes to a separate section
design: document realtime groups
design: document metadata directory tree quota changes
design: update metadump v2 format to reflect rt dumps
xfs-documentation: release for 6.1[23]
.../allocation_groups.asciidoc | 570 +-------------------
.../XFS_Filesystem_Structure/common_types.asciidoc | 4 +-
design/XFS_Filesystem_Structure/docinfo.xml | 19 +
.../fs_properties.asciidoc | 28 +
.../internal_inodes.asciidoc | 154 ++++--
design/XFS_Filesystem_Structure/magic.asciidoc | 3 +
design/XFS_Filesystem_Structure/metadump.asciidoc | 12 +-
.../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 27 +-
design/XFS_Filesystem_Structure/realtime.asciidoc | 394 ++++++++++++++
.../reconstruction.asciidoc | 17 +-
.../XFS_Filesystem_Structure/superblock.asciidoc | 574 +++++++++++++++++++++
.../xfs_filesystem_structure.asciidoc | 4 +
12 files changed, 1192 insertions(+), 614 deletions(-)
create mode 100644 design/XFS_Filesystem_Structure/fs_properties.asciidoc
create mode 100644 design/XFS_Filesystem_Structure/realtime.asciidoc
create mode 100644 design/XFS_Filesystem_Structure/superblock.asciidoc