* [PATCHBOMB] xfs-documentation: updates for 6.13
@ 2024-11-27  0:16 Darrick J. Wong
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
  2024-11-27  0:20 ` [GIT PULL] xfs-documentation: updates for 6.13 Darrick J. Wong
  0 siblings, 2 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:16 UTC
  To: xfs

Hi all,

Here are all the changes I have staged for the ondisk documentation for
Linux 6.13.  I'm reposting the patches along with a pull request and
will do a release immediately afterwards.

--D


* [PATCHSET] xfs-documentation: updates for 6.13
  2024-11-27  0:16 [PATCHBOMB] xfs-documentation: updates for 6.13 Darrick J. Wong
@ 2024-11-27  0:18 ` Darrick J. Wong
  2024-11-27  0:18   ` [PATCH 01/10] design: update metadata reconstruction chapter Darrick J. Wong
                     ` (9 more replies)
  2024-11-27  0:20 ` [GIT PULL] xfs-documentation: updates for 6.13 Darrick J. Wong
  1 sibling, 10 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:18 UTC
  To: djwong; +Cc: hch, hch, cem, linux-xfs

Hi all,

Here's a pile of updates detailing the changes made during 6.12 and 6.13.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=xfsdocs-6.13-updates
---
Commits in this patchset:
 * design: update metadata reconstruction chapter
 * design: document filesystem properties
 * design: move superblock documentation to a separate file
 * design: document the actual ondisk superblock
 * design: document the changes required to handle metadata directories
 * design: move discussion of realtime volumes to a separate section
 * design: document realtime groups
 * design: document metadata directory tree quota changes
 * design: update metadump v2 format to reflect rt dumps
 * xfs-documentation: release for 6.1[23]
---
 .../allocation_groups.asciidoc                     |  570 --------------------
 .../XFS_Filesystem_Structure/common_types.asciidoc |    4 
 design/XFS_Filesystem_Structure/docinfo.xml        |   19 +
 .../fs_properties.asciidoc                         |   28 +
 .../internal_inodes.asciidoc                       |  154 ++++-
 design/XFS_Filesystem_Structure/magic.asciidoc     |    3 
 design/XFS_Filesystem_Structure/metadump.asciidoc  |   12 
 .../XFS_Filesystem_Structure/ondisk_inode.asciidoc |   27 +
 design/XFS_Filesystem_Structure/realtime.asciidoc  |  394 ++++++++++++++
 .../reconstruction.asciidoc                        |   17 -
 .../XFS_Filesystem_Structure/superblock.asciidoc   |  574 ++++++++++++++++++++
 .../xfs_filesystem_structure.asciidoc              |    4 
 12 files changed, 1192 insertions(+), 614 deletions(-)
 create mode 100644 design/XFS_Filesystem_Structure/fs_properties.asciidoc
 create mode 100644 design/XFS_Filesystem_Structure/realtime.asciidoc
 create mode 100644 design/XFS_Filesystem_Structure/superblock.asciidoc



* [PATCH 01/10] design: update metadata reconstruction chapter
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
@ 2024-11-27  0:18   ` Darrick J. Wong
  2024-11-27  0:18   ` [PATCH 02/10] design: document filesystem properties Darrick J. Wong
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:18 UTC
  To: djwong; +Cc: hch, hch, cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We've landed online repair and full backrefs in the filesystem, so
update the links to the new sections and transform future tense to
present tense.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 .../reconstruction.asciidoc                        |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)


diff --git a/design/XFS_Filesystem_Structure/reconstruction.asciidoc b/design/XFS_Filesystem_Structure/reconstruction.asciidoc
index f172e0f8161656..f4c10217910b6c 100644
--- a/design/XFS_Filesystem_Structure/reconstruction.asciidoc
+++ b/design/XFS_Filesystem_Structure/reconstruction.asciidoc
@@ -1,10 +1,6 @@
 [[Reconstruction]]
 = Metadata Reconstruction
 
-[NOTE]
-This is a theoretical discussion of how reconstruction could work; none of this
-is implemented as of 2015.
-
 A simple UNIX filesystem can be thought of in terms of a directed acyclic graph.
 To a first approximation, there exists a root directory node, which points to
 other nodes.  Those other nodes can themselves be directories or they can be
@@ -45,9 +41,14 @@ The xref:Reverse_Mapping_Btree[reverse-mapping B+tree] fills in part of the
 puzzle.  Since it contains copies of every entry in each inode’s data and
 attribute forks, we can fix a corrupted block map with these records.
 Furthermore, if the inode B+trees become corrupt, it is possible to visit all
-inode chunks using the reverse-mapping data.  Should XFS ever gain the ability
-to store parent directory information in each inode, it also becomes possible
+inode chunks using the reverse-mapping data.  xref:Parent_Pointers[Directory
+parent pointers] fill in the rest of the puzzle by mirroring the directory tree
+structure with parent directory information in each inode.  It is now possible
 to resurrect damaged directory trees, which should reduce the complaints about
 inodes ending up in +/lost+found+.  Everything else in the per-AG primary
-metadata can already be reconstructed via +xfs_repair+.  Hopefully,
-reconstruction will not turn out to be a fool's errand.
+metadata can already be reconstructed via +xfs_repair+.
+
+See the
+https://docs.kernel.org/filesystems/xfs/xfs-online-fsck-design.html[design
+document] for online repair for a more thorough discussion of how this metadata
+is put to use.



* [PATCH 02/10] design: document filesystem properties
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
  2024-11-27  0:18   ` [PATCH 01/10] design: update metadata reconstruction chapter Darrick J. Wong
@ 2024-11-27  0:18   ` Darrick J. Wong
  2024-11-27  0:18   ` [PATCH 03/10] design: move superblock documentation to a separate file Darrick J. Wong
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:18 UTC
  To: djwong; +Cc: hch, hch, cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that xfsprogs utilities can set properties to coordinate the
behavior of other xfsprogs utilities, record them in the ondisk format
documentation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 .../fs_properties.asciidoc                         |   28 ++++++++++++++++++++
 .../xfs_filesystem_structure.asciidoc              |    2 +
 2 files changed, 30 insertions(+)
 create mode 100644 design/XFS_Filesystem_Structure/fs_properties.asciidoc
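
Editorial note, not part of the commit: the new chapter says that
unrecognized xfs:autofsck values are treated as if the property was not
set.  A minimal sketch of that rule follows.  The enum and function
names are made up for illustration and are not taken from xfsprogs.  The
value is matched by length because extended attribute values are not
NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; they do not come from xfsprogs. */
enum autofsck_mode {
	AUTOFSCK_UNSET = 0,	/* property missing or value not recognized */
	AUTOFSCK_NONE,		/* do not perform background scans */
	AUTOFSCK_CHECK,		/* only check metadata */
	AUTOFSCK_OPTIMIZE,	/* check and optimize metadata */
	AUTOFSCK_REPAIR,	/* check, repair, or optimize metadata */
};

/*
 * Map a raw property value to a mode.  Anything that is not one of the
 * four documented strings is reported as AUTOFSCK_UNSET, mirroring the
 * "treated as if the property was not set" rule.
 */
enum autofsck_mode
autofsck_parse(const char *value, size_t len)
{
	static const struct {
		const char		*name;
		enum autofsck_mode	mode;
	} table[] = {
		{ "none",	AUTOFSCK_NONE },
		{ "check",	AUTOFSCK_CHECK },
		{ "optimize",	AUTOFSCK_OPTIMIZE },
		{ "repair",	AUTOFSCK_REPAIR },
	};
	size_t i;

	if (value == NULL)
		return AUTOFSCK_UNSET;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		/* Exact match only: same length and same bytes. */
		if (strlen(table[i].name) == len &&
		    memcmp(value, table[i].name, len) == 0)
			return table[i].mode;
	}
	return AUTOFSCK_UNSET;
}
```

A caller would first have to fetch the attribute from the root
directory.  On Linux, ATTR_ROOT attributes are normally reached through
the "trusted." xattr namespace, so reading "trusted.xfs:autofsck" with
getxattr(2) is one plausible way to do that; treat that attribute name
as an assumption and check the xfs_scrub manual page.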


diff --git a/design/XFS_Filesystem_Structure/fs_properties.asciidoc b/design/XFS_Filesystem_Structure/fs_properties.asciidoc
new file mode 100644
index 00000000000000..b639aec9ab6366
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/fs_properties.asciidoc
@@ -0,0 +1,28 @@
+[[Filesystem_Properties]]
+= Filesystem Properties
+
+System administrators can set filesystem-wide properties to coordinate the
+behavior of userspace XFS administration tools.  These properties are recorded
+as extended attributes of the +ATTR_ROOT+ namespace that are set on the root
+directory.
+
+[options="header"]
+|=====
+| Property			| Description
+| +xfs:autofsck+		| Online fsck background scanning behavior
+|=====
+
+*xfs:autofsck*::
+This property controls the behavior of background online fsck.
+Unrecognized values are treated as if the property was not set.
+Check the +xfs_scrub+ manual page for more information.
+
+.autofsck property values
+[options="header"]
+|=====
+| Value				| Description
+| +none+			| Do not perform background scans.
+| +check+			| Only check metadata.
+| +optimize+			| Check and optimize metadata.
+| +repair+			| Check, repair, or optimize metadata.
+|=====
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index a95a5806172a0c..689e2a874c13e9 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -84,6 +84,8 @@ include::journaling_log.asciidoc[]
 
 include::internal_inodes.asciidoc[]
 
+include::fs_properties.asciidoc[]
+
 :leveloffset: 0
 
 Dynamically Allocated Structures



* [PATCH 03/10] design: move superblock documentation to a separate file
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
  2024-11-27  0:18   ` [PATCH 01/10] design: update metadata reconstruction chapter Darrick J. Wong
  2024-11-27  0:18   ` [PATCH 02/10] design: document filesystem properties Darrick J. Wong
@ 2024-11-27  0:18   ` Darrick J. Wong
  2024-11-27  0:19   ` [PATCH 04/10] design: document the actual ondisk superblock Darrick J. Wong
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:18 UTC
  To: djwong; +Cc: hch, hch, cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move the ondisk superblock docs to a separate file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 .../allocation_groups.asciidoc                     |  550 --------------------
 .../XFS_Filesystem_Structure/superblock.asciidoc   |  548 ++++++++++++++++++++
 2 files changed, 549 insertions(+), 549 deletions(-)
 create mode 100644 design/XFS_Filesystem_Structure/superblock.asciidoc
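
Editorial note, not part of the commit: the superblock text moved by
this patch states several relationships between fields, namely the
"XFSB" magic number, sb_blocksize = 2^sb_blocklog, sb_inopblock =
sb_blocksize / sb_inodesize, and the version number living in the low
nibble of sb_versionnum.  The sketch below checks those relationships
on a hypothetical host-endian subset of the fields.  The struct and
function names are invented for illustration and are not the kernel's.
XFS stores ondisk integers in big-endian order, hence the decode helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define SB_MAGIC	0x58465342U	/* "XFSB" */

/* Decode a big-endian 32-bit ondisk value into host byte order. */
uint32_t
be32_decode(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical host-endian subset of struct xfs_sb for this sketch. */
struct sb_geometry {
	uint32_t	magicnum;
	uint32_t	blocksize;
	uint16_t	versionnum;
	uint16_t	inodesize;
	uint16_t	inopblock;
	uint8_t		blocklog;
	uint8_t		inodelog;
	uint8_t		inopblog;
};

/* The superblock version number is the low nibble of sb_versionnum. */
unsigned int
sb_version(uint16_t versionnum)
{
	return versionnum & 0xf;
}

/* Check only the relationships spelled out in the field descriptions. */
bool
sb_geometry_consistent(const struct sb_geometry *sb)
{
	if (sb->magicnum != SB_MAGIC)
		return false;
	/* Block size ranges from 512 to 65536 bytes. */
	if (sb->blocksize < 512 || sb->blocksize > 65536)
		return false;
	/* sb_blocksize = 2^sb_blocklog */
	if (sb->blocklog > 16 ||
	    sb->blocksize != (UINT32_C(1) << sb->blocklog))
		return false;
	/* sb_inodelog is the log2 value of sb_inodesize. */
	if (sb->inodesize == 0 || sb->inodelog > 15 ||
	    sb->inodesize != (1U << sb->inodelog))
		return false;
	/* sb_inopblock = sb_blocksize / sb_inodesize */
	if (sb->inopblock != sb->blocksize / sb->inodesize)
		return false;
	/* sb_inopblog is the log2 value of sb_inopblock. */
	if (sb->inopblog > 15 || sb->inopblock != (1U << sb->inopblog))
		return false;
	return true;
}
```

With the values from the xfs_db example in the moved text (blocksize
4096, inodesize 256, inopblock 16, blocklog 12, inodelog 8, inopblog 4),
the check passes, and versionnum 0xb084 has a low nibble of 4.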


diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index d7fd63ea20a646..e2cdaab5e03d3f 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -31,555 +31,7 @@ image::images/6.png[]
 
 Each of these structures are expanded upon in the following sections.
 
-[[Superblocks]]
-== Superblocks
-
-Each AG starts with a superblock. The first one, in AG 0, is the primary
-superblock which stores aggregate AG information. Secondary superblocks are
-only used by xfs_repair when the primary superblock has been corrupted.  A
-superblock is one sector in length.
-
-The superblock is defined by the following structure. The description of each
-field follows.
-
-[source, c]
-----
-struct xfs_sb
-{
-	__uint32_t		sb_magicnum;
-	__uint32_t		sb_blocksize;
-	xfs_rfsblock_t		sb_dblocks;
-	xfs_rfsblock_t		sb_rblocks;
-	xfs_rtblock_t		sb_rextents;
-	uuid_t			sb_uuid;
-	xfs_fsblock_t		sb_logstart;
-	xfs_ino_t		sb_rootino;
-	xfs_ino_t		sb_rbmino;
-	xfs_ino_t		sb_rsumino;
-	xfs_agblock_t		sb_rextsize;
-	xfs_agblock_t		sb_agblocks;
-	xfs_agnumber_t		sb_agcount;
-	xfs_extlen_t		sb_rbmblocks;
-	xfs_extlen_t		sb_logblocks;
-	__uint16_t		sb_versionnum;
-	__uint16_t		sb_sectsize;
-	__uint16_t		sb_inodesize;
-	__uint16_t		sb_inopblock;
-	char			sb_fname[12];
-	__uint8_t		sb_blocklog;
-	__uint8_t		sb_sectlog;
-	__uint8_t		sb_inodelog;
-	__uint8_t		sb_inopblog;
-	__uint8_t		sb_agblklog;
-	__uint8_t		sb_rextslog;
-	__uint8_t		sb_inprogress;
-	__uint8_t		sb_imax_pct;
-	__uint64_t		sb_icount;
-	__uint64_t		sb_ifree;
-	__uint64_t		sb_fdblocks;
-	__uint64_t		sb_frextents;
-	xfs_ino_t		sb_uquotino;
-	xfs_ino_t		sb_gquotino;
-	__uint16_t		sb_qflags;
-	__uint8_t		sb_flags;
-	__uint8_t		sb_shared_vn;
-	xfs_extlen_t		sb_inoalignmt;
-	__uint32_t		sb_unit;
-	__uint32_t		sb_width;
-	__uint8_t		sb_dirblklog;
-	__uint8_t		sb_logsectlog;
-	__uint16_t		sb_logsectsize;
-	__uint32_t		sb_logsunit;
-	__uint32_t		sb_features2;
-	__uint32_t		sb_bad_features2;
-
-	/* version 5 superblock fields start here */
-	__uint32_t		sb_features_compat;
-	__uint32_t		sb_features_ro_compat;
-	__uint32_t		sb_features_incompat;
-	__uint32_t		sb_features_log_incompat;
-
-	__uint32_t		sb_crc;
-	xfs_extlen_t		sb_spino_align;
-
-	xfs_ino_t		sb_pquotino;
-	xfs_lsn_t		sb_lsn;
-	uuid_t			sb_meta_uuid;
-	xfs_ino_t		sb_rrmapino;
-};
-----
-*sb_magicnum*::
-Identifies the filesystem. Its value is +XFS_SB_MAGIC+ ``XFSB'' (0x58465342).
-
-*sb_blocksize*::
-The size of a basic unit of space allocation in bytes. Typically, this is 4096
-(4KB) but can range from 512 to 65536 bytes.
-
-*sb_dblocks*::
-Total number of blocks available for data and metadata on the filesystem.
-
-*sb_rblocks*::
-Number blocks in the real-time disk device. Refer to
-xref:Real-time_Devices[real-time sub-volumes] for more information.
-
-*sb_rextents*::
-Number of extents on the real-time device.
-
-*sb_uuid*::
-UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by
-the UUID instead of device name.
-
-*sb_logstart*::
-First block number for the journaling log if the log is internal (ie. not on a
-separate disk device). For an external log device, this will be zero (the log
-will also start on the first block on the log device).  The identity of the log
-devices is not recorded in the filesystem, but the UUIDs of the filesystem and
-the log device are compared to prevent corruption.
-
-*sb_rootino*::
-Root inode number for the filesystem.  Normally, the root inode is at the
-start of the first possible inode chunk in AG 0.  This is 128 when using a 4KB
-block size.
-
-*sb_rbmino*::
-Bitmap inode for real-time extents.
-
-*sb_rsumino*::
-Summary inode for real-time bitmap.
-
-*sb_rextsize*::
-Realtime extent size in blocks.
-
-*sb_agblocks*::
-Size of each AG in blocks. For the actual size of the last AG, refer to the
-xref:AG_Free_Space_Management[free space] +agf_length+ value.
-
-*sb_agcount*::
-Number of AGs in the filesystem.
-
-*sb_rbmblocks*::
-Number of real-time bitmap blocks.
-
-*sb_logblocks*::
-Number of blocks for the journaling log.
-
-*sb_versionnum*::
-Filesystem version number. This is a bitmask specifying the features enabled
-when creating the filesystem. Any disk checking tools or drivers that do not
-recognize any set bits must not operate upon the filesystem. Most of the flags
-indicate features introduced over time. If the value of the lower nibble is >=
-4, the higher bits indicate feature flags as follows:
-
-.Version 4 Superblock version flags
-[options="header"]
-|=====
-| Flag				| Description
-| +XFS_SB_VERSION_ATTRBIT+	|
-Set if any inode have extended attributes.  If this bit is set; the
-+XFS_SB_VERSION2_ATTR2BIT+ is not set; and the +attr2+ mount flag is not
-specified, the +di_forkoff+ inode field will not be dynamically adjusted.
-See the section about xref:Extended_Attribute_Versions[extended attribute
-versions] for more information.
-
-| +XFS_SB_VERSION_NLINKBIT+	| Set if any inodes use 32-bit di_nlink values.
-| +XFS_SB_VERSION_QUOTABIT+	|
-Quotas are enabled on the filesystem. This
-also brings in the various quota fields in the superblock.
-
-| +XFS_SB_VERSION_ALIGNBIT+	| Set if sb_inoalignmt is used.
-| +XFS_SB_VERSION_DALIGNBIT+	| Set if sb_unit and sb_width are used.
-| +XFS_SB_VERSION_SHAREDBIT+	| Set if sb_shared_vn is used.
-| +XFS_SB_VERSION_LOGV2BIT+	| Version 2 journaling logs are used.
-| +XFS_SB_VERSION_SECTORBIT+	| Set if sb_sectsize is not 512.
-| +XFS_SB_VERSION_EXTFLGBIT+	| Unwritten extents are used. This is always set.
-| +XFS_SB_VERSION_DIRV2BIT+	|
-Version 2 directories are used. This is always set.
-
-| +XFS_SB_VERSION_MOREBITSBIT+	|
-Set if the sb_features2 field in the superblock contains more flags.
-|=====
-
-If the lower nibble of this value is 5, then this is a v5 filesystem; the
-+XFS_SB_VERSION2_CRCBIT+ feature must be set in +sb_features2+.
-
-*sb_sectsize*::
-Specifies the underlying disk sector size in bytes.  Typically this is 512 or
-4096 bytes. This determines the minimum I/O alignment, especially for direct I/O.
-
-*sb_inodesize*::
-Size of the inode in bytes. The default is 256 (2 inodes per standard sector)
-but can be made as large as 2048 bytes when creating the filesystem.  On a v5
-filesystem, the default and minimum inode size are both 512 bytes.
-
-*sb_inopblock*::
-Number of inodes per block. This is equivalent to +sb_blocksize / sb_inodesize+.
-
-*sb_fname[12]*::
-Name for the filesystem. This value can be used in the mount command.
-
-*sb_blocklog*::
-log~2~ value of +sb_blocksize+. In other terms, +sb_blocksize = 2^sb_blocklog^+.
-
-*sb_sectlog*::
-log~2~ value of +sb_sectsize+.
-
-*sb_inodelog*::
-log~2~ value of +sb_inodesize+.
-
-*sb_inopblog*::
-log~2~ value of +sb_inopblock+.
-
-*sb_agblklog*::
-log~2~ value of +sb_agblocks+ (rounded up). This value is used to generate inode
-numbers and absolute block numbers defined in extent maps.
-
-*sb_rextslog*::
-log~2~ value of +sb_rextents+.
-
-*sb_inprogress*::
-Flag specifying that the filesystem is being created.
-
-*sb_imax_pct*::
-Maximum percentage of filesystem space that can be used for inodes. The default
-value is 5%.
-
-*sb_icount*::
-Global count for number inodes allocated on the filesystem. This is only
-maintained in the first superblock.
-
-*sb_ifree*::
-Global count of free inodes on the filesystem. This is only maintained in the
-first superblock.
-
-*sb_fdblocks*::
-Global count of free data blocks on the filesystem. This is only maintained in
-the first superblock.
-
-*sb_frextents*::
-Global count of free real-time extents on the filesystem. This is only
-maintained in the first superblock.
-
-*sb_uquotino*::
-Inode for user quotas. This and the following two quota fields only apply if
-+XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to
-xref:Quota_Inodes[quota inodes] for more information.
-
-*sb_gquotino*::
-Inode for group or project quotas. Group and project quotas cannot be used at
-the same time on v4 filesystems.  On a v5 filesystem, this inode always stores
-group quota information.
-
-*sb_qflags*::
-Quota flags. It can be a combination of the following flags:
-
-.Superblock quota flags
-[options="header"]
-|=====
-| Flag				| Description
-| +XFS_UQUOTA_ACCT+		| User quota accounting is enabled.
-| +XFS_UQUOTA_ENFD+		| User quotas are enforced.
-| +XFS_UQUOTA_CHKD+		| User quotas have been checked.
-| +XFS_PQUOTA_ACCT+		| Project quota accounting is enabled.
-| +XFS_OQUOTA_ENFD+		| Other (group/project) quotas are enforced.
-| +XFS_OQUOTA_CHKD+		| Other (group/project) quotas have been checked.
-| +XFS_GQUOTA_ACCT+		| Group quota accounting is enabled.
-| +XFS_GQUOTA_ENFD+		| Group quotas are enforced.
-| +XFS_GQUOTA_CHKD+		| Group quotas have been checked.
-| +XFS_PQUOTA_ENFD+		| Project quotas are enforced.
-| +XFS_PQUOTA_CHKD+		| Project quotas have been checked.
-|=====
-
-*sb_flags*::
-Miscellaneous flags.
-
-.Superblock flags
-[options="header"]
-|=====
-| Flag                          | Description
-| +XFS_SBF_READONLY+            | Only read-only mounts allowed.
-|=====
-
-*sb_shared_vn*::
-Reserved and must be zero (``vn'' stands for version number).
-
-*sb_inoalignmt*::
-Inode chunk alignment in fsblocks.  Prior to v5, the default value provided for
-inode chunks to have an 8KiB alignment.  Starting with v5, the default value
-scales with the multiple of the inode size over 256 bytes.  Concretely, this
-means an alignment of 16KiB for 512-byte inodes, 32KiB for 1024-byte inodes,
-etc.  If sparse inodes are enabled, the +ir_startino+ field of each inode
-B+tree record must be aligned to this block granularity, even if the inode
-given by +ir_startino+ itself is sparse.
-
-*sb_unit*::
-Underlying stripe or raid unit in blocks.
-
-*sb_width*::
-Underlying stripe or raid width in blocks.
-
-*sb_dirblklog*::
-log~2~ multiplier that determines the granularity of directory block allocations
-in fsblocks.
-
-*sb_logsectlog*::
-log~2~ value of the log subvolume's sector size. This is only used if the
-journaling log is on a separate disk device (i.e. not internal).
-
-*sb_logsectsize*::
-The log's sector size in bytes if the filesystem uses an external log device.
-
-*sb_logsunit*::
-The log device's stripe or raid unit size. This only applies to version 2 logs
-+XFS_SB_VERSION_LOGV2BIT+ is set in +sb_versionnum+.
-
-*sb_features2*::
-Additional version flags if +XFS_SB_VERSION_MOREBITSBIT+ is set in
-+sb_versionnum+. The currently defined additional features include:
-
-.Extended Version 4 Superblock flags
-[options="header"]
-|=====
-| Flag				| Description
-| +XFS_SB_VERSION2_LAZYSBCOUNTBIT+ |
-Lazy global counters. Making a filesystem with this bit set can improve
-performance. The global free space and inode counts are only updated in the
-primary superblock when the filesystem is cleanly unmounted.
-
-| +XFS_SB_VERSION2_ATTR2BIT+	|
-Extended attributes version 2. Making a filesystem with this optimises the
-inode layout of extended attributes.  If this bit is set and the +noattr2+
-mount flag is not specified, the +di_forkoff+ inode field will be dynamically
-adjusted.  See the section about xref:Extended_Attribute_Versions[extended
-attribute versions] for more information.
-
-| +XFS_SB_VERSION2_PARENTBIT+	|
-Parent pointers. All inodes must have an extended attribute that points back to
-its parent inode. The primary purpose for this information is in backup systems.
-
-| +XFS_SB_VERSION2_PROJID32BIT+	|
-32-bit Project ID.  Inodes can be associated with a project ID number, which
-can be used to enforce disk space usage quotas for a particular group of
-directories.  This flag indicates that project IDs can be 32 bits in size.
-
-| +XFS_SB_VERSION2_CRCBIT+	|
-Metadata checksumming.  All metadata blocks have an extended header containing
-the block checksum, a copy of the metadata UUID, the log sequence number of the
-last update to prevent stale replays, and a back pointer to the owner of the
-block.  This feature must be and can only be set if the lowest nibble of
-+sb_versionnum+ is set to 5.
-
-| +XFS_SB_VERSION2_FTYPE+	|
-Directory file type.  Each directory entry records the type of the inode to
-which the entry points.  This speeds up directory iteration by removing the
-need to load every inode into memory.
-|=====
-
-*sb_bad_features2*::
-This field mirrors +sb_features2+, due to past 64-bit alignment errors.
-
-*sb_features_compat*::
-Read-write compatible feature flags.  The kernel can still read and write this
-FS even if it doesn't understand the flag.  Currently, there are no valid
-flags.
-
-*sb_features_ro_compat*::
-Read-only compatible feature flags.  The kernel can still read this FS even if
-it doesn't understand the flag.
-
-.Extended Version 5 Superblock Read-Only compatibility flags
-[options="header"]
-|=====
-| Flag				| Description
-| +XFS_SB_FEAT_RO_COMPAT_FINOBT+ |
-Free inode B+tree.  Each allocation group contains a B+tree to track inode chunks
-containing free inodes.  This is a performance optimization to reduce the time
-required to allocate inodes.
-
-| +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ |
-Reverse mapping B+tree.  Each allocation group contains a B+tree containing
-records mapping AG blocks to their owners.  See the section about
-xref:Reconstruction[reconstruction] for more details.
-
-| +XFS_SB_FEAT_RO_COMPAT_REFLINK+ |
-Reference count B+tree.  Each allocation group contains a B+tree to track the
-reference counts of AG blocks.  This enables files to share data blocks safely.
-See the section about xref:Reflink_Deduplication[reflink and deduplication] for
-more details.
-
-| +XFS_SB_FEAT_RO_COMPAT_INOBTCNT+ |
-Inode B+tree block counters.  Each allocation group's inode (AGI) header
-tracks the number of blocks in each of the inode B+trees.  This allows us
-to have a slightly higher level of redundancy over the shape of the inode
-btrees, and decreases the amount of time to compute the metadata B+tree
-preallocations at mount time.
-
-|=====
-
-*sb_features_incompat*::
-Read-write incompatible feature flags.  The kernel cannot read or write this
-FS if it doesn't understand the flag.
-
-.Extended Version 5 Superblock Read-Write incompatibility flags
-[options="header"]
-|=====
-| Flag				| Description
-| +XFS_SB_FEAT_INCOMPAT_FTYPE+ |
-Directory file type.  Each directory entry tracks the type of the inode to
-which the entry points.  This is a performance optimization to remove the need
-to load every inode into memory to iterate a directory.
-
-| +XFS_SB_FEAT_INCOMPAT_SPINODES+ |
-Sparse inodes.  This feature relaxes the requirement to allocate inodes in
-chunks of 64.  When the free space is heavily fragmented, there might exist
-plenty of free space but not enough contiguous free space to allocate a new
-inode chunk.  With this feature, the user can continue to create files until
-all free space is exhausted.
-
-Unused space in the inode B+tree records are used to track which parts of the
-inode chunk are not inodes.
-
-See the chapter on xref:Sparse_Inodes[Sparse Inodes] for more information.
-
-| +XFS_SB_FEAT_INCOMPAT_META_UUID+ |
-Metadata UUID.  The UUID stamped into each metadata block must match the value
-in +sb_meta_uuid+.  This enables the administrator to change +sb_uuid+ at will
-without having to rewrite the entire filesystem.
-
-| +XFS_SB_FEAT_INCOMPAT_BIGTIME+ |
-Large timestamps.  Inode timestamps and quota expiration timers are extended to
-support times through the year 2486.  See the section on
-xref:Timestamps[timestamps] for more information.
-
-| +XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR+ |
-The filesystem is not in operable condition, and must be run through
-xfs_repair before it can be mounted.
-
-| +XFS_SB_FEAT_INCOMPAT_NREXT64+ |
-Large file fork extent counts.  This greatly expands the maximum number of
-space mappings allowed in data and extended attribute file forks.
-
-| +XFS_SB_FEAT_INCOMPAT_EXCHRANGE+ |
-Atomic file mapping exchanges.  The filesystem is capable of exchanging a range
-of mappings between two arbitrary ranges of a file's fork by using log intent
-items to track the progress of the high level exchange operation.  In other
-words, the exchange operation can be restarted if the system goes down, which
-is necessary for userspace to commit of new file contents atomically.  This
-flag has user-visible impacts, which is why it is a permanent incompat flag.
-See the section about xref:XMI_Log_Item[mapping exchange log intents] for more
-information.
-
-| +XFS_SB_FEAT_INCOMPAT_PARENT+ |
-Directory parent pointers.  See the section about xref:Parent_Pointers[parent
-pointers] for more information.
-
-|=====
-
-*sb_features_log_incompat*::
-Read-write incompatible feature flags for the log.  The kernel cannot recover
-the FS log if it doesn't understand the flag.
-
-.Extended Version 5 Superblock Log incompatibility flags
-[options="header"]
-|=====
-| Flag					| Description
-| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+	|
-Extended attribute updates have been committed to the ondisk log.
-
-|=====
-
-*sb_crc*::
-Superblock checksum.
-
-*sb_spino_align*::
-Sparse inode alignment, in fsblocks.  Each chunk of inodes referenced by a
-sparse inode B+tree record must be aligned to this block granularity.
-
-*sb_pquotino*::
-Project quota inode.
-
-*sb_lsn*::
-Log sequence number of the last superblock update.
-
-*sb_meta_uuid*::
-If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
-all metadata blocks must match this UUID.  If not, the block header UUID field
-must match +sb_uuid+.
-
-*sb_rrmapino*::
-If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time
-device is present (+sb_rblocks+ > 0), this field points to an inode
-that contains the root to the
-xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree].
-This field is zero otherwise.
-
-=== xfs_db Superblock Example
-
-A filesystem is made on a single disk with the following command:
-
-----
-# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7
-meta-data=/dev/sda7              isize=256    agcount=16, agsize=3923122 blks
-         =                       sectsz=512   attr=2
-data     =                       bsize=4096   blocks=62769952, imaxpct=25
-         =                       sunit=0      swidth=0 blks, unwritten=1
-naming   =version 2              bsize=16384
-log      =internal log           bsize=4096   blocks=30649, version=1
-         =                       sectsz=512   sunit=0 blks
-realtime =none                   extsz=65536  blocks=0, rtextents=0
-----
-
-And in xfs_db, inspecting the superblock:
-
-----
-xfs_db> sb
-xfs_db> p
-magicnum = 0x58465342
-blocksize = 4096
-dblocks = 62769952
-rblocks = 0
-rextents = 0
-uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5
-logstart = 33554436
-rootino = 128
-rbmino = 129
-rsumino = 130
-rextsize = 16
-agblocks = 3923122
-agcount = 16
-rbmblocks = 0
-logblocks = 30649
-versionnum = 0xb084
-sectsize = 512
-inodesize = 256
-inopblock = 16
-fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
-blocklog = 12
-sectlog = 9
-inodelog = 8
-inopblog = 4
-agblklog = 22
-rextslog = 0
-inprogress = 0
-imax_pct = 25
-icount = 64
-ifree = 61
-fdblocks = 62739235
-frextents = 0
-uquotino = 0
-gquotino = 0
-qflags = 0
-flags = 0
-shared_vn = 0
-inoalignmt = 2
-unit = 0
-width = 0
-dirblklog = 2
-logsectlog = 0
-logsectsize = 0
-logsunit = 0
-features2 = 8
-----
-
+include::superblock.asciidoc[]
 
 [[AG_Free_Space_Management]]
 == AG Free Space Management
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
new file mode 100644
index 00000000000000..16c31116ffafd4
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -0,0 +1,548 @@
+[[Superblocks]]
+== Superblocks
+
+Each AG starts with a superblock. The first one, in AG 0, is the primary
+superblock which stores aggregate AG information. Secondary superblocks are
+only used by xfs_repair when the primary superblock has been corrupted.  A
+superblock is one sector in length.
+
+The superblock is defined by the following structure. The description of each
+field follows.
+
+[source, c]
+----
+struct xfs_sb
+{
+	__uint32_t		sb_magicnum;
+	__uint32_t		sb_blocksize;
+	xfs_rfsblock_t		sb_dblocks;
+	xfs_rfsblock_t		sb_rblocks;
+	xfs_rtblock_t		sb_rextents;
+	uuid_t			sb_uuid;
+	xfs_fsblock_t		sb_logstart;
+	xfs_ino_t		sb_rootino;
+	xfs_ino_t		sb_rbmino;
+	xfs_ino_t		sb_rsumino;
+	xfs_agblock_t		sb_rextsize;
+	xfs_agblock_t		sb_agblocks;
+	xfs_agnumber_t		sb_agcount;
+	xfs_extlen_t		sb_rbmblocks;
+	xfs_extlen_t		sb_logblocks;
+	__uint16_t		sb_versionnum;
+	__uint16_t		sb_sectsize;
+	__uint16_t		sb_inodesize;
+	__uint16_t		sb_inopblock;
+	char			sb_fname[12];
+	__uint8_t		sb_blocklog;
+	__uint8_t		sb_sectlog;
+	__uint8_t		sb_inodelog;
+	__uint8_t		sb_inopblog;
+	__uint8_t		sb_agblklog;
+	__uint8_t		sb_rextslog;
+	__uint8_t		sb_inprogress;
+	__uint8_t		sb_imax_pct;
+	__uint64_t		sb_icount;
+	__uint64_t		sb_ifree;
+	__uint64_t		sb_fdblocks;
+	__uint64_t		sb_frextents;
+	xfs_ino_t		sb_uquotino;
+	xfs_ino_t		sb_gquotino;
+	__uint16_t		sb_qflags;
+	__uint8_t		sb_flags;
+	__uint8_t		sb_shared_vn;
+	xfs_extlen_t		sb_inoalignmt;
+	__uint32_t		sb_unit;
+	__uint32_t		sb_width;
+	__uint8_t		sb_dirblklog;
+	__uint8_t		sb_logsectlog;
+	__uint16_t		sb_logsectsize;
+	__uint32_t		sb_logsunit;
+	__uint32_t		sb_features2;
+	__uint32_t		sb_bad_features2;
+
+	/* version 5 superblock fields start here */
+	__uint32_t		sb_features_compat;
+	__uint32_t		sb_features_ro_compat;
+	__uint32_t		sb_features_incompat;
+	__uint32_t		sb_features_log_incompat;
+
+	__uint32_t		sb_crc;
+	xfs_extlen_t		sb_spino_align;
+
+	xfs_ino_t		sb_pquotino;
+	xfs_lsn_t		sb_lsn;
+	uuid_t			sb_meta_uuid;
+	xfs_ino_t		sb_rrmapino;
+};
+----
+*sb_magicnum*::
+Identifies the filesystem. Its value is +XFS_SB_MAGIC+ ``XFSB'' (0x58465342).
+
+*sb_blocksize*::
+The size of a basic unit of space allocation in bytes. Typically, this is 4096
+(4KB) but can range from 512 to 65536 bytes.
+
+*sb_dblocks*::
+Total number of blocks available for data and metadata on the filesystem.
+
+*sb_rblocks*::
+Number of blocks in the real-time disk device. Refer to
+xref:Real-time_Devices[real-time sub-volumes] for more information.
+
+*sb_rextents*::
+Number of extents on the real-time device.
+
+*sb_uuid*::
+UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by
+the UUID instead of device name.
+
+*sb_logstart*::
+First block number for the journaling log if the log is internal (i.e. not on a
+separate disk device). For an external log device, this will be zero (the log
+will also start on the first block on the log device).  The identity of the log
+devices is not recorded in the filesystem, but the UUIDs of the filesystem and
+the log device are compared to prevent corruption.
+
+*sb_rootino*::
+Root inode number for the filesystem.  Normally, the root inode is at the
+start of the first possible inode chunk in AG 0.  This is 128 when using a 4KB
+block size.
+
+*sb_rbmino*::
+Bitmap inode for real-time extents.
+
+*sb_rsumino*::
+Summary inode for real-time bitmap.
+
+*sb_rextsize*::
+Realtime extent size in blocks.
+
+*sb_agblocks*::
+Size of each AG in blocks. For the actual size of the last AG, refer to the
+xref:AG_Free_Space_Management[free space] +agf_length+ value.
+
+*sb_agcount*::
+Number of AGs in the filesystem.
+
+*sb_rbmblocks*::
+Number of real-time bitmap blocks.
+
+*sb_logblocks*::
+Number of blocks for the journaling log.
+
+*sb_versionnum*::
+Filesystem version number. This is a bitmask specifying the features enabled
+when creating the filesystem. Any disk checking tools or drivers that do not
+recognize any set bits must not operate upon the filesystem. Most of the flags
+indicate features introduced over time. If the value of the lower nibble is >=
+4, the higher bits indicate feature flags as follows:
+
+.Version 4 Superblock version flags
+[options="header"]
+|=====
+| Flag				| Description
+| +XFS_SB_VERSION_ATTRBIT+	|
+Set if any inodes have extended attributes.  If this bit is set,
++XFS_SB_VERSION2_ATTR2BIT+ is not set, and the +attr2+ mount flag is not
+specified, then the +di_forkoff+ inode field will not be dynamically adjusted.
+See the section about xref:Extended_Attribute_Versions[extended attribute
+versions] for more information.
+
+| +XFS_SB_VERSION_NLINKBIT+	| Set if any inodes use 32-bit di_nlink values.
+| +XFS_SB_VERSION_QUOTABIT+	|
+Quotas are enabled on the filesystem. This
+also brings in the various quota fields in the superblock.
+
+| +XFS_SB_VERSION_ALIGNBIT+	| Set if sb_inoalignmt is used.
+| +XFS_SB_VERSION_DALIGNBIT+	| Set if sb_unit and sb_width are used.
+| +XFS_SB_VERSION_SHAREDBIT+	| Set if sb_shared_vn is used.
+| +XFS_SB_VERSION_LOGV2BIT+	| Version 2 journaling logs are used.
+| +XFS_SB_VERSION_SECTORBIT+	| Set if sb_sectsize is not 512.
+| +XFS_SB_VERSION_EXTFLGBIT+	| Unwritten extents are used. This is always set.
+| +XFS_SB_VERSION_DIRV2BIT+	|
+Version 2 directories are used. This is always set.
+
+| +XFS_SB_VERSION_MOREBITSBIT+	|
+Set if the sb_features2 field in the superblock contains more flags.
+|=====
+
+If the lower nibble of this value is 5, then this is a v5 filesystem; the
++XFS_SB_VERSION2_CRCBIT+ feature must be set in +sb_features2+.
+
+*sb_sectsize*::
+Specifies the underlying disk sector size in bytes.  Typically this is 512 or
+4096 bytes. This determines the minimum I/O alignment, especially for direct I/O.
+
+*sb_inodesize*::
+Size of the inode in bytes. The default is 256 (2 inodes per standard sector)
+but can be made as large as 2048 bytes when creating the filesystem.  On a v5
+filesystem, the default and minimum inode size are both 512 bytes.
+
+*sb_inopblock*::
+Number of inodes per block. This is equivalent to +sb_blocksize / sb_inodesize+.
+
+*sb_fname[12]*::
+Name for the filesystem. This value can be used in the mount command.
+
+*sb_blocklog*::
+log~2~ value of +sb_blocksize+. In other terms, +sb_blocksize = 2^sb_blocklog^+.
+
+*sb_sectlog*::
+log~2~ value of +sb_sectsize+.
+
+*sb_inodelog*::
+log~2~ value of +sb_inodesize+.
+
+*sb_inopblog*::
+log~2~ value of +sb_inopblock+.
+
+*sb_agblklog*::
+log~2~ value of +sb_agblocks+ (rounded up). This value is used to generate inode
+numbers and absolute block numbers defined in extent maps.
+
+*sb_rextslog*::
+log~2~ value of +sb_rextents+.
+
+*sb_inprogress*::
+Flag specifying that the filesystem is being created.
+
+*sb_imax_pct*::
+Maximum percentage of filesystem space that can be used for inodes. The default
+value is 5%.
+
+*sb_icount*::
+Global count of inodes allocated on the filesystem. This is only
+maintained in the first superblock.
+
+*sb_ifree*::
+Global count of free inodes on the filesystem. This is only maintained in the
+first superblock.
+
+*sb_fdblocks*::
+Global count of free data blocks on the filesystem. This is only maintained in
+the first superblock.
+
+*sb_frextents*::
+Global count of free real-time extents on the filesystem. This is only
+maintained in the first superblock.
+
+*sb_uquotino*::
+Inode for user quotas. This and the following two quota fields only apply if
+the +XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to
+xref:Quota_Inodes[quota inodes] for more information.
+
+*sb_gquotino*::
+Inode for group or project quotas. Group and project quotas cannot be used at
+the same time on v4 filesystems.  On a v5 filesystem, this inode always stores
+group quota information.
+
+*sb_qflags*::
+Quota flags. It can be a combination of the following flags:
+
+.Superblock quota flags
+[options="header"]
+|=====
+| Flag				| Description
+| +XFS_UQUOTA_ACCT+		| User quota accounting is enabled.
+| +XFS_UQUOTA_ENFD+		| User quotas are enforced.
+| +XFS_UQUOTA_CHKD+		| User quotas have been checked.
+| +XFS_PQUOTA_ACCT+		| Project quota accounting is enabled.
+| +XFS_OQUOTA_ENFD+		| Other (group/project) quotas are enforced.
+| +XFS_OQUOTA_CHKD+		| Other (group/project) quotas have been checked.
+| +XFS_GQUOTA_ACCT+		| Group quota accounting is enabled.
+| +XFS_GQUOTA_ENFD+		| Group quotas are enforced.
+| +XFS_GQUOTA_CHKD+		| Group quotas have been checked.
+| +XFS_PQUOTA_ENFD+		| Project quotas are enforced.
+| +XFS_PQUOTA_CHKD+		| Project quotas have been checked.
+|=====
+
+*sb_flags*::
+Miscellaneous flags.
+
+.Superblock flags
+[options="header"]
+|=====
+| Flag                          | Description
+| +XFS_SBF_READONLY+            | Only read-only mounts allowed.
+|=====
+
+*sb_shared_vn*::
+Reserved and must be zero (``vn'' stands for version number).
+
+*sb_inoalignmt*::
+Inode chunk alignment in fsblocks.  Prior to v5, the default value gave inode
+chunks an 8KiB alignment.  Starting with v5, the default value
+scales with the multiple of the inode size over 256 bytes.  Concretely, this
+means an alignment of 16KiB for 512-byte inodes, 32KiB for 1024-byte inodes,
+etc.  If sparse inodes are enabled, the +ir_startino+ field of each inode
+B+tree record must be aligned to this block granularity, even if the inode
+given by +ir_startino+ itself is sparse.
+
+*sb_unit*::
+Underlying stripe or raid unit in blocks.
+
+*sb_width*::
+Underlying stripe or raid width in blocks.
+
+*sb_dirblklog*::
+log~2~ multiplier that determines the granularity of directory block allocations
+in fsblocks.
+
+*sb_logsectlog*::
+log~2~ value of the log subvolume's sector size. This is only used if the
+journaling log is on a separate disk device (i.e. not internal).
+
+*sb_logsectsize*::
+The log's sector size in bytes if the filesystem uses an external log device.
+
+*sb_logsunit*::
+The log device's stripe or raid unit size. This only applies to version 2 logs,
+i.e. when +XFS_SB_VERSION_LOGV2BIT+ is set in +sb_versionnum+.
+
+*sb_features2*::
+Additional version flags if +XFS_SB_VERSION_MOREBITSBIT+ is set in
++sb_versionnum+. The currently defined additional features include:
+
+.Extended Version 4 Superblock flags
+[options="header"]
+|=====
+| Flag				| Description
+| +XFS_SB_VERSION2_LAZYSBCOUNTBIT+ |
+Lazy global counters. Making a filesystem with this bit set can improve
+performance. The global free space and inode counts are only updated in the
+primary superblock when the filesystem is cleanly unmounted.
+
+| +XFS_SB_VERSION2_ATTR2BIT+	|
+Extended attributes version 2. Making a filesystem with this optimises the
+inode layout of extended attributes.  If this bit is set and the +noattr2+
+mount flag is not specified, the +di_forkoff+ inode field will be dynamically
+adjusted.  See the section about xref:Extended_Attribute_Versions[extended
+attribute versions] for more information.
+
+| +XFS_SB_VERSION2_PARENTBIT+	|
+Parent pointers. Each inode must have an extended attribute that points back to
+its parent inode. The primary purpose for this information is in backup systems.
+
+| +XFS_SB_VERSION2_PROJID32BIT+	|
+32-bit Project ID.  Inodes can be associated with a project ID number, which
+can be used to enforce disk space usage quotas for a particular group of
+directories.  This flag indicates that project IDs can be 32 bits in size.
+
+| +XFS_SB_VERSION2_CRCBIT+	|
+Metadata checksumming.  All metadata blocks have an extended header containing
+the block checksum, a copy of the metadata UUID, the log sequence number of the
+last update to prevent stale replays, and a back pointer to the owner of the
+block.  This feature must be and can only be set if the lowest nibble of
++sb_versionnum+ is set to 5.
+
+| +XFS_SB_VERSION2_FTYPE+	|
+Directory file type.  Each directory entry records the type of the inode to
+which the entry points.  This speeds up directory iteration by removing the
+need to load every inode into memory.
+|=====
+
+*sb_bad_features2*::
+This field mirrors +sb_features2+, due to past 64-bit alignment errors.
+
+*sb_features_compat*::
+Read-write compatible feature flags.  The kernel can still read and write this
+FS even if it doesn't understand the flag.  Currently, there are no valid
+flags.
+
+*sb_features_ro_compat*::
+Read-only compatible feature flags.  The kernel can still read this FS even if
+it doesn't understand the flag.
+
+.Extended Version 5 Superblock Read-Only compatibility flags
+[options="header"]
+|=====
+| Flag				| Description
+| +XFS_SB_FEAT_RO_COMPAT_FINOBT+ |
+Free inode B+tree.  Each allocation group contains a B+tree to track inode chunks
+containing free inodes.  This is a performance optimization to reduce the time
+required to allocate inodes.
+
+| +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ |
+Reverse mapping B+tree.  Each allocation group contains a B+tree containing
+records mapping AG blocks to their owners.  See the section about
+xref:Reconstruction[reconstruction] for more details.
+
+| +XFS_SB_FEAT_RO_COMPAT_REFLINK+ |
+Reference count B+tree.  Each allocation group contains a B+tree to track the
+reference counts of AG blocks.  This enables files to share data blocks safely.
+See the section about xref:Reflink_Deduplication[reflink and deduplication] for
+more details.
+
+| +XFS_SB_FEAT_RO_COMPAT_INOBTCNT+ |
+Inode B+tree block counters.  Each allocation group's inode (AGI) header
+tracks the number of blocks in each of the inode B+trees.  This allows us
+to have a slightly higher level of redundancy over the shape of the inode
+btrees, and decreases the amount of time to compute the metadata B+tree
+preallocations at mount time.
+
+|=====
+
+*sb_features_incompat*::
+Read-write incompatible feature flags.  The kernel cannot read or write this
+FS if it doesn't understand the flag.
+
+.Extended Version 5 Superblock Read-Write incompatibility flags
+[options="header"]
+|=====
+| Flag				| Description
+| +XFS_SB_FEAT_INCOMPAT_FTYPE+ |
+Directory file type.  Each directory entry tracks the type of the inode to
+which the entry points.  This is a performance optimization to remove the need
+to load every inode into memory to iterate a directory.
+
+| +XFS_SB_FEAT_INCOMPAT_SPINODES+ |
+Sparse inodes.  This feature relaxes the requirement to allocate inodes in
+chunks of 64.  When the free space is heavily fragmented, there might exist
+plenty of free space but not enough contiguous free space to allocate a new
+inode chunk.  With this feature, the user can continue to create files until
+all free space is exhausted.
+
+Unused space in the inode B+tree records is used to track which parts of the
+inode chunk are not inodes.
+
+See the chapter on xref:Sparse_Inodes[Sparse Inodes] for more information.
+
+| +XFS_SB_FEAT_INCOMPAT_META_UUID+ |
+Metadata UUID.  The UUID stamped into each metadata block must match the value
+in +sb_meta_uuid+.  This enables the administrator to change +sb_uuid+ at will
+without having to rewrite the entire filesystem.
+
+| +XFS_SB_FEAT_INCOMPAT_BIGTIME+ |
+Large timestamps.  Inode timestamps and quota expiration timers are extended to
+support times through the year 2486.  See the section on
+xref:Timestamps[timestamps] for more information.
+
+| +XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR+ |
+The filesystem is not in operable condition, and must be run through
+xfs_repair before it can be mounted.
+
+| +XFS_SB_FEAT_INCOMPAT_NREXT64+ |
+Large file fork extent counts.  This greatly expands the maximum number of
+space mappings allowed in data and extended attribute file forks.
+
+| +XFS_SB_FEAT_INCOMPAT_EXCHRANGE+ |
+Atomic file mapping exchanges.  The filesystem is capable of exchanging a range
+of mappings between two arbitrary ranges of a file's fork by using log intent
+items to track the progress of the high level exchange operation.  In other
+words, the exchange operation can be restarted if the system goes down, which
+is necessary for userspace to commit new file contents atomically.  This
+flag has user-visible impacts, which is why it is a permanent incompat flag.
+See the section about xref:XMI_Log_Item[mapping exchange log intents] for more
+information.
+
+| +XFS_SB_FEAT_INCOMPAT_PARENT+ |
+Directory parent pointers.  See the section about xref:Parent_Pointers[parent
+pointers] for more information.
+
+|=====
+
+*sb_features_log_incompat*::
+Read-write incompatible feature flags for the log.  The kernel cannot recover
+the FS log if it doesn't understand the flag.
+
+.Extended Version 5 Superblock Log incompatibility flags
+[options="header"]
+|=====
+| Flag					| Description
+| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+	|
+Extended attribute updates have been committed to the ondisk log.
+
+|=====
+
+*sb_crc*::
+Superblock checksum.
+
+*sb_spino_align*::
+Sparse inode alignment, in fsblocks.  Each chunk of inodes referenced by a
+sparse inode B+tree record must be aligned to this block granularity.
+
+*sb_pquotino*::
+Project quota inode.
+
+*sb_lsn*::
+Log sequence number of the last superblock update.
+
+*sb_meta_uuid*::
+If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
+all metadata blocks must match this UUID.  If not, the block header UUID field
+must match +sb_uuid+.
+
+*sb_rrmapino*::
+If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time
+device is present (+sb_rblocks+ > 0), this field points to an inode
+that contains the root to the
+xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree].
+This field is zero otherwise.
+
+=== xfs_db Superblock Example
+
+A filesystem is made on a single disk with the following command:
+
+----
+# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7
+meta-data=/dev/sda7              isize=256    agcount=16, agsize=3923122 blks
+         =                       sectsz=512   attr=2
+data     =                       bsize=4096   blocks=62769952, imaxpct=25
+         =                       sunit=0      swidth=0 blks, unwritten=1
+naming   =version 2              bsize=16384
+log      =internal log           bsize=4096   blocks=30649, version=1
+         =                       sectsz=512   sunit=0 blks
+realtime =none                   extsz=65536  blocks=0, rtextents=0
+----
+
+And in xfs_db, inspecting the superblock:
+
+----
+xfs_db> sb
+xfs_db> p
+magicnum = 0x58465342
+blocksize = 4096
+dblocks = 62769952
+rblocks = 0
+rextents = 0
+uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5
+logstart = 33554436
+rootino = 128
+rbmino = 129
+rsumino = 130
+rextsize = 16
+agblocks = 3923122
+agcount = 16
+rbmblocks = 0
+logblocks = 30649
+versionnum = 0xb084
+sectsize = 512
+inodesize = 256
+inopblock = 16
+fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
+blocklog = 12
+sectlog = 9
+inodelog = 8
+inopblog = 4
+agblklog = 22
+rextslog = 0
+inprogress = 0
+imax_pct = 25
+icount = 64
+ifree = 61
+fdblocks = 62739235
+frextents = 0
+uquotino = 0
+gquotino = 0
+qflags = 0
+flags = 0
+shared_vn = 0
+inoalignmt = 2
+unit = 0
+width = 0
+dirblklog = 2
+logsectlog = 0
+logsectsize = 0
+logsunit = 0
+features2 = 8
+----


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 04/10] design: document the actual ondisk superblock
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-11-27  0:18   ` [PATCH 03/10] design: move superblock documentation to a separate file Darrick J. Wong
@ 2024-11-27  0:19   ` Darrick J. Wong
  2024-11-27  0:19   ` [PATCH 05/10] design: document the changes required to handle metadata directories Darrick J. Wong
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:19 UTC (permalink / raw)
  To: djwong; +Cc: hch, hch, cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

struct xfs_dsb is the ondisk superblock, not struct xfs_sb.  Replace the
struct definition with the one for the ondisk superblock.
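
Illustration only, not part of this patch: because every multi-byte field
in struct xfs_dsb is stored big-endian, userspace code reading a raw
superblock sector has to convert each field explicitly.  The sketch below
assumes the fields are laid out back to back with no padding, which puts
sb_magicnum at byte 0, sb_blocksize at byte 4, sb_dblocks at byte 8 and
sb_versionnum at byte 100.  The helper names (get_be16, get_be32, get_be64,
struct sb_summary, decode_sb_summary) are hypothetical and do not exist in
the kernel or xfsprogs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: read big-endian integers from a byte buffer. */
static uint16_t get_be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const unsigned char *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Hypothetical CPU-endian summary of a few ondisk superblock fields. */
struct sb_summary {
	uint32_t magicnum;
	uint32_t blocksize;
	uint64_t dblocks;
	uint16_t versionnum;
};

/*
 * Decode a few fields from a raw superblock sector.  The byte offsets are
 * an assumption derived from the field order and sizes in struct xfs_dsb.
 * Returns 0 on success, -1 if the buffer is too short or the magic number
 * is not "XFSB" (0x58465342).
 */
int decode_sb_summary(const unsigned char *buf, size_t len,
		      struct sb_summary *out)
{
	assert(buf != NULL && out != NULL);

	if (len < 102)
		return -1;

	out->magicnum = get_be32(buf + 0);
	out->blocksize = get_be32(buf + 4);
	out->dblocks = get_be64(buf + 8);
	out->versionnum = get_be16(buf + 100);

	return out->magicnum == 0x58465342 ? 0 : -1;
}
```

Fed the values from the xfs_db example in superblock.asciidoc, this should
report blocksize 4096, dblocks 62769952 and versionnum 0xb084, whose low
nibble (4) marks a v4 filesystem.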

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 .../XFS_Filesystem_Structure/superblock.asciidoc   |  117 ++++++++++----------
 1 file changed, 58 insertions(+), 59 deletions(-)


diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
index 16c31116ffafd4..79e8c30dc93e79 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -11,68 +11,67 @@ field follows.
 
 [source, c]
 ----
-struct xfs_sb
-{
-	__uint32_t		sb_magicnum;
-	__uint32_t		sb_blocksize;
-	xfs_rfsblock_t		sb_dblocks;
-	xfs_rfsblock_t		sb_rblocks;
-	xfs_rtblock_t		sb_rextents;
-	uuid_t			sb_uuid;
-	xfs_fsblock_t		sb_logstart;
-	xfs_ino_t		sb_rootino;
-	xfs_ino_t		sb_rbmino;
-	xfs_ino_t		sb_rsumino;
-	xfs_agblock_t		sb_rextsize;
-	xfs_agblock_t		sb_agblocks;
-	xfs_agnumber_t		sb_agcount;
-	xfs_extlen_t		sb_rbmblocks;
-	xfs_extlen_t		sb_logblocks;
-	__uint16_t		sb_versionnum;
-	__uint16_t		sb_sectsize;
-	__uint16_t		sb_inodesize;
-	__uint16_t		sb_inopblock;
-	char			sb_fname[12];
-	__uint8_t		sb_blocklog;
-	__uint8_t		sb_sectlog;
-	__uint8_t		sb_inodelog;
-	__uint8_t		sb_inopblog;
-	__uint8_t		sb_agblklog;
-	__uint8_t		sb_rextslog;
-	__uint8_t		sb_inprogress;
-	__uint8_t		sb_imax_pct;
-	__uint64_t		sb_icount;
-	__uint64_t		sb_ifree;
-	__uint64_t		sb_fdblocks;
-	__uint64_t		sb_frextents;
-	xfs_ino_t		sb_uquotino;
-	xfs_ino_t		sb_gquotino;
-	__uint16_t		sb_qflags;
-	__uint8_t		sb_flags;
-	__uint8_t		sb_shared_vn;
-	xfs_extlen_t		sb_inoalignmt;
-	__uint32_t		sb_unit;
-	__uint32_t		sb_width;
-	__uint8_t		sb_dirblklog;
-	__uint8_t		sb_logsectlog;
-	__uint16_t		sb_logsectsize;
-	__uint32_t		sb_logsunit;
-	__uint32_t		sb_features2;
-	__uint32_t		sb_bad_features2;
+struct xfs_dsb {
+	__be32		sb_magicnum;
+	__be32		sb_blocksize;
+	__be64		sb_dblocks;
+	__be64		sb_rblocks;
+	__be64		sb_rextents;
+	uuid_t		sb_uuid;
+	__be64		sb_logstart;
+	__be64		sb_rootino;
+	__be64		sb_rbmino;
+	__be64		sb_rsumino;
+	__be32		sb_rextsize;
+	__be32		sb_agblocks;
+	__be32		sb_agcount;
+	__be32		sb_rbmblocks;
+	__be32		sb_logblocks;
+	__be16		sb_versionnum;
+	__be16		sb_sectsize;
+	__be16		sb_inodesize;
+	__be16		sb_inopblock;
+	char		sb_fname[XFSLABEL_MAX];
+	__u8		sb_blocklog;
+	__u8		sb_sectlog;
+	__u8		sb_inodelog;
+	__u8		sb_inopblog;
+	__u8		sb_agblklog;
+	__u8		sb_rextslog;
+	__u8		sb_inprogress;
+	__u8		sb_imax_pct;
+	__be64		sb_icount;
+	__be64		sb_ifree;
+	__be64		sb_fdblocks;
+	__be64		sb_frextents;
+	__be64		sb_uquotino;
+	__be64		sb_gquotino;
+	__be16		sb_qflags;
+	__u8		sb_flags;
+	__u8		sb_shared_vn;
+	__be32		sb_inoalignmt;
+	__be32		sb_unit;
+	__be32		sb_width;
+	__u8		sb_dirblklog;
+	__u8		sb_logsectlog;
+	__be16		sb_logsectsize;
+	__be32		sb_logsunit;
+	__be32		sb_features2;
+	__be32		sb_bad_features2;
 
 	/* version 5 superblock fields start here */
-	__uint32_t		sb_features_compat;
-	__uint32_t		sb_features_ro_compat;
-	__uint32_t		sb_features_incompat;
-	__uint32_t		sb_features_log_incompat;
+	__be32		sb_features_compat;
+	__be32		sb_features_ro_compat;
+	__be32		sb_features_incompat;
+	__be32		sb_features_log_incompat;
+	__le32		sb_crc;
+	__be32		sb_spino_align;
+	__be64		sb_pquotino;
+	__be64		sb_lsn;
+	uuid_t		sb_meta_uuid;
+	__be64		sb_rrmapino;
 
-	__uint32_t		sb_crc;
-	xfs_extlen_t		sb_spino_align;
-
-	xfs_ino_t		sb_pquotino;
-	xfs_lsn_t		sb_lsn;
-	uuid_t			sb_meta_uuid;
-	xfs_ino_t		sb_rrmapino;
+	/* must be padded to 64 bit alignment */
 };
 ----
 *sb_magicnum*::


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 05/10] design: document the changes required to handle metadata directories
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-11-27  0:19   ` [PATCH 04/10] design: document the actual ondisk superblock Darrick J. Wong
@ 2024-11-27  0:19   ` Darrick J. Wong
  2024-11-27  0:19   ` [PATCH 06/10] design: move discussion of realtime volumes to a separate section Darrick J. Wong
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:19 UTC (permalink / raw)
  To: djwong; +Cc: hch, hch, cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Document the ondisk format changes for metadata directories.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 .../internal_inodes.asciidoc                       |  113 ++++++++++++++++++++
 .../XFS_Filesystem_Structure/ondisk_inode.asciidoc |   22 ++++
 .../XFS_Filesystem_Structure/superblock.asciidoc   |   14 +-
 3 files changed, 142 insertions(+), 7 deletions(-)


diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index 84e4cb969ce392..eaa0a50aa848f3 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -5,6 +5,119 @@ XFS allocates several inodes when a filesystem is created. These are internal
 and not accessible from the standard directory structure. These inodes are only
 accessible from the superblock.
 
+[[Metadata_Directories]]
+== Metadata Directory Tree
+
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +sb_metadirino+
+field in the superblock points to the root of a directory tree containing
+metadata files.  This directory tree is completely internal to the filesystem
+and must not be exposed to user programs.
+
+When this feature is enabled, metadata files should be found by walking the
+metadata directory tree.  The superblock fields that formerly pointed to (some
+of) those inodes have been deallocated and may be reused by future features.
+
+.Metadata Directory Paths
+[options="header"]
+|=====
+| Metadata File                                  | Location
+|=====
+
+Metadata files are flagged by the +XFS_DIFLAG2_METADATA+ flag in the
++di_flags2+ field.  Metadata files must have the following properties:
+
+* Must be either a directory or a regular file.
+* Permission bits must all be zero (i.e. chmod 0000).
+* User and group IDs set to zero.
+* The +XFS_DIFLAG_IMMUTABLE+, +XFS_DIFLAG_SYNC+, +XFS_DIFLAG_NOATIME+, +XFS_DIFLAG_NODUMP+, and +XFS_DIFLAG_NODEFRAG+ flags must all be set in +di_flags+.
+* For a directory, the +XFS_DIFLAG_NOSYMLINKS+ flag must also be set.
+* The +XFS_DIFLAG2_METADATA+ flag must be set in +di_flags2+.
+* The +XFS_DIFLAG2_DAX+ flag must not be set.
+
+=== Metadata Directory Example
+
+This example shows a metadata directory from a freshly formatted root
+filesystem:
+
+----
+xfs_db> sb 0
+xfs_db> p
+magicnum = 0x58465342
+blocksize = 4096
+dblocks = 5192704
+rblocks = 0
+rextents = 0
+uuid = cbf2ceef-658e-46b0-8f96-785661c37976
+logstart = 4194311
+rootino = 128
+rbmino = 130
+rsumino = 131
+...
+meta_uuid = 00000000-0000-0000-0000-000000000000
+metadirino = 129
+...
+----
+
+Notice how the listing includes the root of the metadata directory tree
+(+metadirino+).
+
+----
+xfs_db> path -m /
+xfs_db> ls
+8          129                directory      0x0000002e   1 . (good)
+10         129                directory      0x0000172e   2 .. (good)
+12         33685632           directory      0x2d18ab4c   8 rtgroups (good)
+----
+
+Here we use the +path+ and +ls+ commands to display the root directory of
+the metadata directory tree.  We can navigate the directory the old way, too:
+
+----
+xfs_db> p
+core.magic = 0x494e
+core.mode = 040000
+core.version = 3
+core.format = 1 (local)
+core.onlink = 0
+core.uid = 0
+core.gid = 0
+...
+v3.flags2 = 0x8000000000000018
+v3.cowextsize = 0
+v3.crtime.sec = Wed Aug  7 10:22:36 2024
+v3.crtime.nsec = 273744000
+v3.inumber = 129
+v3.uuid = 7e55b909-8728-4d69-a1fa-891427314eea
+v3.reflink = 0
+v3.cowextsz = 0
+v3.dax = 0
+v3.bigtime = 1
+v3.nrext64 = 1
+v3.metadata = 1
+u3.sfdir3.hdr.count = 1
+u3.sfdir3.hdr.i8count = 0
+u3.sfdir3.hdr.parent.i4 = 129
+u3.sfdir3.list[0].namelen = 8
+u3.sfdir3.list[0].offset = 0x60
+u3.sfdir3.list[0].name = "rtgroups"
+u3.sfdir3.list[0].inumber.i4 = 33685632
+u3.sfdir3.list[0].filetype = 2
+----
+
+The root of the metadata directory is a short format directory, and looks just
+like any other directory.  The only difference is that the metadata flag is
+set, and the directory can only be viewed in the XFS debugger.
+
+----
+xfs_db> path -m /rtgroups/0.rmap
+xfs_db> btdump
+u3.rtrmapbt.recs[1] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
+1:[0,1,-3,0,0,0,0]
+----
+
+Observe that we can use the xfs_db +path+ command to navigate the metadata
+directory tree to a realtime group's rmap file and display its contents.
+
 [[Quota_Inodes]]
 == Quota Inodes
 
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index 34c064871cb255..02ec0d12bb57e5 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -78,7 +78,10 @@ struct xfs_dinode_core {
      __uint16_t                di_mode;
      __int8_t                  di_version;
      __int8_t                  di_format;
-     __uint16_t                di_onlink;
+     union {
+          __uint16_t           di_onlink;
+          __uint16_t           di_metatype;
+     };
      __uint32_t                di_uid;
      __uint32_t                di_gid;
      __uint32_t                di_nlink;
@@ -188,6 +191,17 @@ In v1 inodes, this specifies the number of links to the inode from directories.
 When the number exceeds 65535, the inode is converted to v2 and the link count
 is stored in +di_nlink+.
 
+*di_metatype*::
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +di_onlink+ field
+is redefined to declare the intended contents of files in the metadata
+directory tree.
+
+[source, c]
+----
+enum xfs_metafile_type {
+};
+----
+
 *di_uid*::
 Specifies the owner's UID of the inode.
 
@@ -383,6 +397,12 @@ will be copied to all newly created files and directories.
 Files with this flag set may have up to (2^48^ - 1) extents mapped to the data
 fork and up to (2^32^ - 1) extents mapped to the attribute fork.  This flag
 requires the +XFS_SB_FEAT_INCOMPAT_NREXT64+ feature to be enabled.
+| +XFS_DIFLAG2_METADATA+	|
+This file contains filesystem metadata.  This feature requires the
++XFS_SB_FEAT_INCOMPAT_METADIR+ feature to be enabled.  See the section about
+xref:Metadata_Directories[metadata directories] for more information on
+metadata inode properties.  Only directories and regular files can have this
+flag set.
 |=====
 
 *di_cowextsize*::
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
index 79e8c30dc93e79..56877615ae81bf 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -69,7 +69,7 @@ struct xfs_dsb {
 	__be64		sb_pquotino;
 	__be64		sb_lsn;
 	uuid_t		sb_meta_uuid;
-	__be64		sb_rrmapino;
+	__be64		sb_metadirino;
 
 	/* must be padded to 64 bit alignment */
 };
@@ -438,6 +438,10 @@ information.
 Directory parent pointers.  See the section about xref:Parent_Pointers[parent
 pointers] for more information.
 
+| +XFS_SB_FEAT_INCOMPAT_METADIR+ |
+Metadata directory tree.  See the section about the xref:Metadata_Directories[
+metadata directory tree] for more information.
+
 |=====
 
 *sb_features_log_incompat*::
@@ -471,11 +475,9 @@ If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
 all metadata blocks must match this UUID.  If not, the block header UUID field
 must match +sb_uuid+.
 
-*sb_rrmapino*::
-If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time
-device is present (+sb_rblocks+ > 0), this field points to an inode
-that contains the root to the
-xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree].
+*sb_metadirino*::
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is set, this field points to
+the inode of the root directory of the metadata directory tree.
 This field is zero otherwise.
 
 === xfs_db Superblock Example


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 06/10] design: move discussion of realtime volumes to a separate section
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-11-27  0:19   ` [PATCH 05/10] design: document the changes required to handle metadata directories Darrick J. Wong
@ 2024-11-27  0:19   ` Darrick J. Wong
  2024-11-27  0:19   ` [PATCH 07/10] design: document realtime groups Darrick J. Wong
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:19 UTC (permalink / raw)
  To: djwong; +Cc: hch, hch, cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In preparation for documenting the realtime modernization project, move
the discussions of the realtime-related ondisk metadata to a separate
file.  Since realtime reverse mapping btrees haven't been added to the
filesystem yet, stop including them in the final output.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 .../allocation_groups.asciidoc                     |   20 --------
 .../internal_inodes.asciidoc                       |   36 +-------------
 design/XFS_Filesystem_Structure/realtime.asciidoc  |   50 ++++++++++++++++++++
 .../xfs_filesystem_structure.asciidoc              |    2 +
 4 files changed, 54 insertions(+), 54 deletions(-)
 create mode 100644 design/XFS_Filesystem_Structure/realtime.asciidoc


diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index e2cdaab5e03d3f..c746a92ca47dd6 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -772,23 +772,3 @@ core.magic = 0x494e
 
 The chunk record also indicates that this chunk has 32 inodes, and that the
 missing inodes are also ``free''.
-
-[[Real-time_Devices]]
-== Real-time Devices
-
-The performance of the standard XFS allocator varies depending on the internal
-state of the various metadata indices enabled on the filesystem.  For
-applications which need to minimize the jitter of allocation latency, XFS
-supports the notion of a ``real-time device''.  This is a special device
-separate from the regular filesystem where extent allocations are tracked with
-a bitmap and free space is indexed with a two-dimensional array.  If an inode
-is flagged with +XFS_DIFLAG_REALTIME+, its data will live on the real time
-device.  The metadata for real time devices is discussed in the section about
-xref:Real-time_Inodes[real time inodes].
-
-By placing the real time device (and the journal) on separate high-performance
-storage devices, it is possible to reduce most of the unpredictability in I/O
-response times that come from metadata operations.
-
-None of the XFS per-AG B+trees are involved with real time files.  It is not
-possible for real time files to share data blocks.
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index eaa0a50aa848f3..68c86d30ff8206 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -287,41 +287,9 @@ Log sequence number of the last DQ block write.
 *dd_crc*::
 Checksum of the DQ block.
 
-
 [[Real-time_Inodes]]
 == Real-time Inodes
 
 There are two inodes allocated to managing the real-time device's space, the
-Bitmap Inode and the Summary Inode.
-
-[[Real-Time_Bitmap_Inode]]
-=== Real-Time Bitmap Inode
-
-The real time bitmap inode, +sb_rbmino+, tracks the used/free space in the
-real-time device using an old-style bitmap. One bit is allocated per real-time
-extent. The size of an extent is specified by the superblock's +sb_rextsize+
-value.
-
-The number of blocks used by the bitmap inode is equal to the number of
-real-time extents (+sb_rextents+) divided by the block size (+sb_blocksize+)
-and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and
-extent array for the inode should match this.  Each real time block gets its
-own bit in the bitmap.
-
-[[Real-Time_Summary_Inode]]
-=== Real-Time Summary Inode
-
-The real time summary inode, +sb_rsumino+, tracks the used and free space
-accounting information for the real-time device.  This file indexes the
-approximate location of each free extent on the real-time device first by
-log2(extent size) and then by the real-time bitmap block number.  The size of
-the summary inode file is equal to +sb_rbmblocks+ × log2(realtime device size)
-× sizeof(+xfs_suminfo_t+).  The entry for a given log2(extent size) and
-rtbitmap block number is 0 if there is no free extents of that size at that
-rtbitmap location, and positive if there are any.
-
-This data structure is not particularly space efficient, however it is a very
-fast way to provide the same data as the two free space B+trees for regular
-files since the space is preallocated and metadata maintenance is minimal.
-
-include::rtrmapbt.asciidoc[]
+xref:Real-Time_Bitmap_Inode[Bitmap Inode] and the
+xref:Real-Time_Summary_Inode[Summary Inode].
diff --git a/design/XFS_Filesystem_Structure/realtime.asciidoc b/design/XFS_Filesystem_Structure/realtime.asciidoc
new file mode 100644
index 00000000000000..11426e8fdb632d
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/realtime.asciidoc
@@ -0,0 +1,50 @@
+[[Real-time_Devices]]
+= Real-time Devices
+
+The performance of the standard XFS allocator varies depending on the internal
+state of the various metadata indices enabled on the filesystem.  For
+applications which need to minimize the jitter of allocation latency, XFS
+supports the notion of a ``real-time device''.  This is a special device
+separate from the regular filesystem where extent allocations are tracked with
+a bitmap and free space is indexed with a two-dimensional array.  If an inode
+is flagged with +XFS_DIFLAG_REALTIME+, its data will live on the real time
+device.
+
+By placing the real time device (and the journal) on separate high-performance
+storage devices, it is possible to reduce most of the unpredictability in I/O
+response times that come from metadata operations.
+
+None of the XFS per-AG B+trees are involved with real time files.  It is not
+possible for real time files to share data blocks.
+
+[[Real-Time_Bitmap_Inode]]
+== Free Space Bitmap Inode
+
+The real time bitmap inode, +sb_rbmino+, tracks the used/free space in the
+real-time device using an old-style bitmap. One bit is allocated per real-time
+extent. The size of an extent is specified by the superblock's +sb_rextsize+
+value.
+
+The number of blocks used by the bitmap inode is equal to the number of
+real-time extents (+sb_rextents+) divided by the block size (+sb_blocksize+)
+and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and
+extent array for the inode should match this.  Each real time block gets its
+own bit in the bitmap.
+
+[[Real-Time_Summary_Inode]]
+== Free Space Summary Inode
+
+The real time summary inode, +sb_rsumino+, tracks the used and free space
+accounting information for the real-time device.  This file indexes the
+approximate location of each free extent on the real-time device first by
+log2(extent size) and then by the real-time bitmap block number.  The size of
+the summary inode file is equal to +sb_rbmblocks+ × log2(realtime device size)
+× sizeof(+xfs_suminfo_t+).  The entry for a given log2(extent size) and
+rtbitmap block number is 0 if there is no free extents of that size at that
+rtbitmap location, and positive if there are any.
+
+This data structure is not particularly space efficient, however it is a very
+fast way to provide the same data as the two free space B+trees for regular
+files since the space is preallocated and metadata maintenance is minimal.
+
+include::rtrmapbt.asciidoc[]
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index 689e2a874c13e9..a643d18add6094 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -84,6 +84,8 @@ include::journaling_log.asciidoc[]
 
 include::internal_inodes.asciidoc[]
 
+include::realtime.asciidoc[]
+
 include::fs_properties.asciidoc[]
 
 :leveloffset: 0
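
The sizing rules quoted in the moved text (one bit per realtime extent for the bitmap, and sb_rbmblocks x levels x sizeof(xfs_suminfo_t) for the summary) can be sketched as below.  This is only an illustration under stated assumptions: the function names are hypothetical, the bitmap blocks are assumed to carry no per-block header, the division is assumed to round up, and the summary counter is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not kernel code.  They assume bitmap blocks hold
 * nothing but bitmap bits (no per-block header).
 */

/* Number of bitmap blocks needed to hold one bit per realtime extent. */
static uint64_t rtbitmap_blocks(uint64_t rextents, uint32_t blocksize)
{
	uint64_t bits_per_block;

	assert(blocksize > 0);
	bits_per_block = (uint64_t)blocksize * 8;

	/* Assumption: a partially used final block still counts as a block. */
	return (rextents + bits_per_block - 1) / bits_per_block;
}

/*
 * Summary file size in bytes: one counter per (size level, bitmap block)
 * pair.  Assumption: each counter is a 32-bit integer.
 */
static uint64_t rtsummary_bytes(uint64_t rbmblocks, uint32_t levels)
{
	return rbmblocks * levels * sizeof(uint32_t);
}
```

For instance, 5192704 extents with a 4096-byte block size need 159 bitmap blocks under these assumptions, because each block covers 32768 extents.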



* [PATCH 07/10] design: document realtime groups
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-11-27  0:19   ` [PATCH 06/10] design: move discussion of realtime volumes to a separate section Darrick J. Wong
@ 2024-11-27  0:19   ` Darrick J. Wong
  2024-11-27  0:20   ` [PATCH 08/10] design: document metadata directory tree quota changes Darrick J. Wong
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:19 UTC (permalink / raw)
  To: djwong; +Cc: hch, hch, cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Document the ondisk changes for realtime allocation groups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 .../XFS_Filesystem_Structure/common_types.asciidoc |    4 
 .../internal_inodes.asciidoc                       |    2 
 design/XFS_Filesystem_Structure/magic.asciidoc     |    3 
 .../XFS_Filesystem_Structure/ondisk_inode.asciidoc |    2 
 design/XFS_Filesystem_Structure/realtime.asciidoc  |  344 ++++++++++++++++++++
 .../XFS_Filesystem_Structure/superblock.asciidoc   |   22 +
 6 files changed, 376 insertions(+), 1 deletion(-)


diff --git a/design/XFS_Filesystem_Structure/common_types.asciidoc b/design/XFS_Filesystem_Structure/common_types.asciidoc
index 51909be384e273..34cdfdaeccf848 100644
--- a/design/XFS_Filesystem_Structure/common_types.asciidoc
+++ b/design/XFS_Filesystem_Structure/common_types.asciidoc
@@ -43,7 +43,9 @@ Unsigned 64 bit raw filesystem block number.
 
 *xfs_rtblock_t*::
 Unsigned 64 bit extent number in the xref:Real-time_Devices[real-time]
-sub-volume.
+sub-volume.  If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, these
+values combine an xref:Realtime_Groups[rtgroup number] and block offset into
+the realtime group.
 
 *xfs_fileoff_t*::
 Unsigned 64 bit block offset into a file.
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index 68c86d30ff8206..5f4d62201cbd67 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -21,6 +21,8 @@ of those inodes have been deallocated and may be reused by future features.
 [options="header"]
 |=====
 | Metadata File                                  | Location
+| xref:Real-Time_Bitmap_Inode[Realtime Bitmap]   | /rtgroups/*.bitmap
+| xref:Real-Time_Summary_Inode[Realtime Summary] | /rtgroups/*.summary
 |=====
 
 Metadata files are flagged by the +XFS_DIFLAG2_METADATA+ flag in the
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index 60952aeb876ff5..5da29b9ef9f3a8 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -45,9 +45,12 @@ relevant chapters.  Magic numbers tend to have consistent locations:
 | +XFS_ATTR3_LEAF_MAGIC+	| 0x3bee	|     	| xref:Leaf_Attributes[Leaf Attribute], v5 only
 | +XFS_ATTR3_RMT_MAGIC+		| 0x5841524d	| XARM	| xref:Remote_Values[Remote Attribute Value], v5 only
 | +XFS_RMAP_CRC_MAGIC+		| 0x524d4233	| RMB3	| xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
+| +XFS_RTBITMAP_MAGIC+		| 0x424D505A	| BMPZ	| xref:Real-Time_Bitmap_Inode[Real-Time Bitmap], metadir only
+| +XFS_RTSUMMARY_MAGIC+		| 0x53554D59	| SUMY	| xref:Real-Time_Summary_Inode[Real-Time Summary], metadir only
 | +XFS_RTRMAP_CRC_MAGIC+	| 0x4d415052	| MAPR	| xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree], v5 only
 | +XFS_REFC_CRC_MAGIC+		| 0x52334643	| R3FC	| xref:Reference_Count_Btree[Reference Count B+tree], v5 only
 | +XFS_MD_MAGIC+		| 0x5846534d	| XFSM	| xref:Metadata_Dumps[Metadata Dumps]
+| +XFS_RTSB_MAGIC+		| 0x46726F67	| Frog	| xref:Realtime_Groups[Realtime Groups]
 |=====
 
 The magic numbers for log items are at offset zero in each log item, but items
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index 02ec0d12bb57e5..e28929907147b7 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -199,6 +199,8 @@ directory tree.
 [source, c]
 ----
 enum xfs_metafile_type {
+     XFS_METAFILE_RTBITMAP,
+     XFS_METAFILE_RTSUMMARY,
 };
 ----
 
diff --git a/design/XFS_Filesystem_Structure/realtime.asciidoc b/design/XFS_Filesystem_Structure/realtime.asciidoc
index 11426e8fdb632d..3a72eb5175ad89 100644
--- a/design/XFS_Filesystem_Structure/realtime.asciidoc
+++ b/design/XFS_Filesystem_Structure/realtime.asciidoc
@@ -31,6 +31,146 @@ and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and
 extent array for the inode should match this.  Each real time block gets its
 own bit in the bitmap.
 
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, each block of the
+realtime bitmap file has a header of the following format:
+
+[source, c]
+----
+struct xfs_rtbuf_blkinfo {
+	__be32		rt_magic;
+	__be32		rt_crc;
+	__be64		rt_owner;
+	__be64		rt_blkno;
+	__be64		rt_lsn;
+	uuid_t		rt_uuid;
+};
+----
+
+*rt_magic*::
+Specifies the magic number for the rtbitmap block: ``BMPZ'' (0x424D505A).
+
+*rt_crc*::
+Checksum of the block.
+
+*rt_owner*::
+Specifies the inode number for the file that owns this block.
+
+*rt_blkno*::
+Disk address of this block.
+
+*rt_lsn*::
+Log sequence number of the last write to this block.
+
+*rt_uuid*::
+The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+
+depending on which features are set.
+
+After the block header, the bitmap data are encoded as be32 word values.
+
+=== xfs_db rtbitmap Example
+
+This example shows a real-time bitmap file from a freshly populated filesystem:
+
+----
+xfs_db> path -m /rtgroups/3.bitmap
+xfs_db> p
+core.magic = 0x494e
+core.mode = 0100000
+core.version = 3
+core.format = 2 (extents)
+core.metatype = 5 (rtbitmap)
+core.uid = 0
+core.gid = 0
+core.nlinkv2 = 1
+core.projid_lo = 3
+core.projid_hi = 0
+core.nextents = 1
+core.atime.sec = Tue Oct 15 16:04:02 2024
+core.atime.nsec = 769675000
+core.mtime.sec = Tue Oct 15 16:04:02 2024
+core.mtime.nsec = 769675000
+core.ctime.sec = Tue Oct 15 16:04:02 2024
+core.ctime.nsec = 769681000
+core.size = 135168
+core.nblocks = 33
+core.extsize = 0
+core.naextents = 0
+core.forkoff = 24
+core.aformat = 1 (local)
+core.dmevmask = 0
+core.dmstate = 0
+core.newrtbm = 0
+core.prealloc = 0
+core.realtime = 0
+core.immutable = 1
+core.append = 0
+core.sync = 1
+core.noatime = 1
+core.nodump = 1
+core.rtinherit = 0
+core.projinherit = 0
+core.nosymlinks = 0
+core.extsz = 0
+core.extszinherit = 0
+core.nodefrag = 1
+core.filestream = 0
+core.gen = 2653591217
+next_unlinked = null
+v3.crc = 0x34a17119 (correct)
+v3.change_count = 3
+v3.lsn = 0
+v3.flags2 = 0x38
+v3.cowextsize = 0
+v3.crtime.sec = Tue Oct 15 16:04:02 2024
+v3.crtime.nsec = 769675000
+v3.inumber = 33685633
+v3.uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+v3.reflink = 0
+v3.cowextsz = 0
+v3.dax = 0
+v3.bigtime = 1
+v3.nrext64 = 1
+v3.metadata = 1
+u3.bmx[0] = [startoff,startblock,blockcount,extentflag] 
+0:[0,4210712,33,0]
+a.sfattr.hdr.totsize = 27
+a.sfattr.hdr.count = 1
+a.sfattr.list[0].namelen = 8
+a.sfattr.list[0].valuelen = 12
+a.sfattr.list[0].root = 0
+a.sfattr.list[0].secure = 0
+a.sfattr.list[0].parent = 1
+a.sfattr.list[0].name = "0.bitmap"
+a.sfattr.list[0].parent_dir.inumber = 33685632
+a.sfattr.list[0].parent_dir.gen = 142228546
+xfs_db> dblock 0
+xfs_db> p
+magicnum = 0x424d505a
+crc = 0xc8b10abf (correct)
+owner = 33685633
+bno = 20902080
+lsn = 0x100007696
+uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+rtwords[0-1011] = 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0
+14:0 15:0 16:0 17:0 18:0 19:0 20:0 21:0xfffff800 22:0xffffffff 23:0xffffffff
+24:0xffffffff 25:0xffffffff 26:0xffffffff 27:0xffffffff 28:0xffffffff
+29:0xffffffff 30:0xffffffff 31:0xffffffff 32:0xffffffff
+...
+979:0xffffffff 980:0xffffffff 981:0xffffffff 982:0xffffffff 983:0xffffffff
+984:0xffffffff 985:0xffffffff 986:0xffffffff 987:0xffffffff 988:0xffffffff
+989:0xffffffff 990:0xffffffff 991:0xffffffff 992:0xffffffff 993:0xffffffff
+994:0xffffffff 995:0xffffffff 996:0xffffffff 997:0xffffffff 998:0xffffffff
+999:0xffffffff 1000:0xffffffff 1001:0xffffffff 1002:0xffffffff 1003:0xffffffff
+1004:0xffffffff 1005:0xffffffff 1006:0xffffffff 1007:0xffffffff 1008:0xffffffff
+1009:0xffffffff 1010:0xffffffff 1011:0xffffffff
+----
+
+From this example, we can see that this is a bitmap file in the metadata
+directory tree, and that it is the bitmap file for rtgroup 3.  When we access
+the first block in the bitmap file, we can see the new block header and that
+the first 683 extents are allocated (words 0-20 are zero, as are the low 11
+bits of word 21).  The bitmap words were excerpted for brevity.
+
 [[Real-Time_Summary_Inode]]
 == Free Space Summary Inode
 
@@ -47,4 +187,208 @@ This data structure is not particularly space efficient, however it is a very
 fast way to provide the same data as the two free space B+trees for regular
 files since the space is preallocated and metadata maintenance is minimal.
 
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, each block of the
+realtime summary file has the same header as rtbitmap file blocks.  However,
+the magic number will be ``SUMY'' (0x53554D59).  After the block header, the
+summary counts are encoded as be32 integers.
+
+=== xfs_db rtsummary Example
+
+This example shows a real-time summary file from a freshly populated filesystem:
+
+----
+xfs_db> path -m /rtgroups/3.summary
+xfs_db> p
+core.magic = 0x494e
+core.mode = 0100000
+core.version = 3
+core.format = 2 (extents)
+core.metatype = 6 (rtsummary)
+core.uid = 0
+core.gid = 0
+core.nlinkv2 = 1
+core.projid_lo = 3
+core.projid_hi = 0
+core.nextents = 1
+core.atime.sec = Tue Oct 15 16:04:02 2024
+core.atime.nsec = 769694000
+core.mtime.sec = Tue Oct 15 16:04:02 2024
+core.mtime.nsec = 769694000
+core.ctime.sec = Tue Oct 15 16:04:02 2024
+core.ctime.nsec = 769699000
+core.size = 4096
+core.nblocks = 1
+core.extsize = 0
+core.naextents = 0
+core.forkoff = 24
+core.aformat = 1 (local)
+core.dmevmask = 0
+core.dmstate = 0
+core.newrtbm = 0
+core.prealloc = 0
+core.realtime = 0
+core.immutable = 1
+core.append = 0
+core.sync = 1
+core.noatime = 1
+core.nodump = 1
+core.rtinherit = 0
+core.projinherit = 0
+core.nosymlinks = 0
+core.extsz = 0
+core.extszinherit = 0
+core.nodefrag = 1
+core.filestream = 0
+core.gen = 519466891
+next_unlinked = null
+v3.crc = 0x54fc58d0 (correct)
+v3.change_count = 3
+v3.lsn = 0
+v3.flags2 = 0x38
+v3.cowextsize = 0
+v3.crtime.sec = Tue Oct 15 16:04:02 2024
+v3.crtime.nsec = 769694000
+v3.inumber = 33685634
+v3.uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+v3.reflink = 0
+v3.cowextsz = 0
+v3.dax = 0
+v3.bigtime = 1
+v3.nrext64 = 1
+v3.metadata = 1
+u3.bmx[0] = [startoff,startblock,blockcount,extentflag] 
+0:[0,4210703,1,0]
+a.sfattr.hdr.totsize = 28
+a.sfattr.hdr.count = 1
+a.sfattr.list[0].namelen = 9
+a.sfattr.list[0].valuelen = 12
+a.sfattr.list[0].root = 0
+a.sfattr.list[0].secure = 0
+a.sfattr.list[0].parent = 1
+a.sfattr.list[0].name = "0.summary"
+a.sfattr.list[0].parent_dir.inumber = 33685632
+a.sfattr.list[0].parent_dir.gen = 142228546
+xfs_db> dblock 0
+xfs_db> p
+magicnum = 0x53554d59
+crc = 0x473340a8 (correct)
+owner = 33685634
+bno = 20902008
+lsn = 0x100007696
+uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+suminfo[0-1011] = 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0
+14:0 15:0 16:0 17:0 18:0 19:0 20:0 21:0 22:0 23:0 24:0 25:0 26:0 27:0 28:0 29:0
+30:0 31:0 32:0
+...
+618:0 619:0 620:0 621:0 622:0 623:0 624:0 625:0 626:0 627:1 628:0 629:0 630:0
+...
+979:0 980:0 981:0 982:0 983:0 984:0 985:0 986:0 987:0 988:0 989:0 990:0 991:0
+992:0 993:0 994:0 995:0 996:0 997:0 998:0 999:0 1000:0 1001:0 1002:0 1003:0
+1004:0 1005:0 1006:0 1007:0 1008:0 1009:0 1010:0 1011:0
+----
+
+From this example, we can clearly see that this is a summary file in the
+metadata directory tree, and that it is the summary file for rtgroup 3.  When
+we access the first block in the summary file, we can clearly see the new block
+header and the nonzero counter for the one large free extent in this group.
+The summary counts were excerpted for brevity.
+
+[[Realtime_Groups]]
+== Realtime Groups
+
+To reduce metadata contention for space allocation and remapping activities
+being applied to realtime files, the realtime volume can be split into
+allocation groups, just like the data volume.  Each group has its own free
+space bitmap and summary files.
+
+Each realtime allocation group can contain up to (2^31^ - 1) filesystem blocks,
+regardless of the underlying realtime extent size.
+
+Each realtime group has the following characteristics:
+
+         * Group 0 has a super block describing overall filesystem info
+         * Free space bitmap
+         * Summary of free space
+
+The free space metadata are the same as described in the previous sections,
+except that their scope covers only a single rtgroup.  The other structures are
+expanded upon in the following sections.
+
+[[Realtime_Group_Superblocks]]
+=== Superblocks
+
+The first block of realtime group 0 contains a superblock.  These fields
+must match their counterparts in the filesystem superblock on the data device.
+
+[source, c]
+----
+struct xfs_rtsb {
+	__be32		rsb_magicnum;
+	__le32		rsb_crc;
+
+	__be32		rsb_pad;
+	unsigned char	rsb_fname[XFSLABEL_MAX];
+
+	uuid_t		rsb_uuid;
+	uuid_t		rsb_meta_uuid;
+
+	/* must be padded to 64 bit alignment */
+};
+----
+
+*rsb_magicnum*::
+Identifies the filesystem. Its value is +XFS_RTSB_MAGIC+ ``Frog'' (0x46726F67).
+
+*rsb_crc*::
+Superblock checksum.
+
+*rsb_pad*::
+Must be zero.
+
+*rsb_fname[12]*::
+Name for the filesystem.  This matches +sb_fname+ in the primary superblock.
+
+*rsb_uuid*::
+UUID (Universally Unique ID) for the filesystem.  This matches +sb_uuid+ in the
+primary superblock.
+
+*rsb_meta_uuid*::
+Metadata UUID for the filesystem.  This matches +sb_meta_uuid+ in the primary
+superblock.
+
+==== xfs_db rtgroup Superblock Example
+
+A filesystem is made on two disks with the following command:
+
+----
+# mkfs.xfs -r rtgroups=1,rgcount=4,rtdev=/dev/sdb /dev/sda -f
+meta-data=/dev/sda               isize=512    agcount=4, agsize=1298176 blks
+         =                       sectsz=512   attr=2, projid32bit=1
+         =                       crc=1        finobt=1, sparse=1, rmapbt=1
+         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
+         =                       metadir=1
+data     =                       bsize=4096   blocks=5192704, imaxpct=25
+         =                       sunit=0      swidth=0 blks
+naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
+log      =internal log           bsize=4096   blocks=16384, version=2
+         =                       sectsz=512   sunit=0 blks, lazy-count=1
+realtime =/dev/sdb               extsz=4096   blocks=5192704, rtextents=5192704
+         =                       rgcount=5    rgsize=1048576 extents
+----
+
+And in xfs_db, inspecting the realtime group superblock on the realtime
+device:
+
+----
+# xfs_db -R /dev/sdb /dev/sda
+xfs_db> rtsb
+xfs_db> print
+magicnum = 0x46726f67
+crc = 0x759a62d4 (correct)
+pad = 0
+fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
+uuid = 7e55b909-8728-4d69-a1fa-891427314eea
+meta_uuid = 7e55b909-8728-4d69-a1fa-891427314eea
+----
+
 include::rtrmapbt.asciidoc[]
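
To show how the rtwords dump in the rtbitmap example can be read, here is a small sketch that counts how many extents at the start of a bitmap block are allocated.  It assumes that a zero bit means "allocated" and a one bit means "free", that bit 0 of each word is the lowest-numbered extent, and that the words have already been converted from be32 to host order.  The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the run of allocated extents at the start of a bitmap.
 * Assumptions: 0 = allocated, 1 = free; bit 0 is the lowest extent in a
 * word; words are in host byte order.
 */
static uint64_t rtbitmap_leading_allocated(const uint32_t *words, size_t nwords)
{
	uint64_t count = 0;
	size_t i;

	assert(words != NULL || nwords == 0);

	for (i = 0; i < nwords; i++) {
		unsigned int bit;

		for (bit = 0; bit < 32; bit++) {
			/* The first set bit marks the first free extent. */
			if (words[i] & (UINT32_C(1) << bit))
				return count;
			count++;
		}
	}
	return count;
}
```

Under these assumptions, a block whose words 0 through 20 are zero and whose word 21 is 0xfffff800 starts with 21 * 32 + 11 = 683 allocated extents.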
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
index 56877615ae81bf..bffb1659d0ba38 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -70,6 +70,10 @@ struct xfs_dsb {
 	__be64		sb_lsn;
 	uuid_t		sb_meta_uuid;
 	__be64		sb_metadirino;
+	__be32		sb_rgcount;
+	__be32		sb_rgextents;
+	__u8		sb_rgblklog;
+	__u8		sb_pad[7];
 
 	/* must be padded to 64 bit alignment */
 };
@@ -480,6 +484,24 @@ If the +XFS_SB_FEAT_RO_INCOMPAT_METADIR+ feature is set, this field points to
 the inode of the root directory of the metadata directory tree.
 This field is zero otherwise.
 
+*sb_rgcount*::
+Count of realtime groups in the filesystem, if the
++XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled.  If no realtime subvolume
+exists, this value will be zero.
+
+*sb_rgextents*::
+Maximum number of realtime extents that can be contained within a realtime
+group, if the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled.
+
+*sb_rgblklog*::
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, this is the log~2~
+value of +sb_rgextents+ * +sb_rextsize+ (rounded up). This value is used to
+generate absolute block numbers defined in extent maps from the segmented
++xfs_rtblock_t+ values.
+
+*sb_pad[7]*::
+Zeroes, if the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled.
+
 === xfs_db Superblock Example
 
 A filesystem is made on a single disk with the following command:
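
As an illustration of the segmented xfs_rtblock_t values described for sb_rgblklog, the following sketch splits a segmented block number into a group number and a block offset, then turns it into a linear block number on the realtime volume.  The struct and function names are hypothetical, and the layout (group number in the bits above sb_rgblklog, offset in the bits below) is an assumption modelled on how data device block numbers are segmented by allocation group.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a decoded realtime block number. */
struct rt_location {
	uint32_t rgno;	/* realtime group number */
	uint64_t rgbno;	/* block offset within that group */
};

/*
 * Split a segmented block number.  Assumption: the group number lives in
 * the bits above rgblklog and the offset in the bits below it.
 */
static struct rt_location rtblock_split(uint64_t rtbno, uint8_t rgblklog)
{
	struct rt_location loc;

	assert(rgblklog > 0 && rgblklog < 64);
	loc.rgno = (uint32_t)(rtbno >> rgblklog);
	loc.rgbno = rtbno & ((UINT64_C(1) << rgblklog) - 1);
	return loc;
}

/*
 * Convert to a linear block number, where rgblocks is the group size in
 * blocks (sb_rgextents * sb_rextsize).
 */
static uint64_t rtblock_to_linear(uint64_t rtbno, uint8_t rgblklog,
				  uint64_t rgblocks)
{
	struct rt_location loc = rtblock_split(rtbno, rgblklog);

	return (uint64_t)loc.rgno * rgblocks + loc.rgbno;
}
```

The two steps are kept separate because the group size need not be a power of two: with 1000 blocks per group, rgblklog is 10, and the segmented value (2 << 10) | 7 maps to linear block 2007 rather than 2055.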



* [PATCH 08/10] design: document metadata directory tree quota changes
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-11-27  0:19   ` [PATCH 07/10] design: document realtime groups Darrick J. Wong
@ 2024-11-27  0:20   ` Darrick J. Wong
  2024-11-27  0:20   ` [PATCH 09/10] design: update metadump v2 format to reflect rt dumps Darrick J. Wong
  2024-11-27  0:20   ` [PATCH 10/10] xfs-documentation: release for 6.1[23] Darrick J. Wong
  9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:20 UTC (permalink / raw)
  To: djwong; +Cc: hch, hch, cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Document the changes to the ondisk quota metadata that came in with
metadata directory trees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 .../internal_inodes.asciidoc                       |    3 +++
 .../XFS_Filesystem_Structure/ondisk_inode.asciidoc |    3 +++
 .../XFS_Filesystem_Structure/superblock.asciidoc   |    3 +++
 3 files changed, 9 insertions(+)


diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index 5f4d62201cbd67..40eb57233ce7c0 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -21,6 +21,9 @@ of those inodes have been deallocated and may be reused by future features.
 [options="header"]
 |=====
 | Metadata File                                  | Location
+| xref:Quota_Inodes[User Quota]                  | /quota/user
+| xref:Quota_Inodes[Group Quota]                 | /quota/group
+| xref:Quota_Inodes[Project Quota]               | /quota/project
 | xref:Real-Time_Bitmap_Inode[Realtime Bitmap]   | /rtgroups/*.bitmap
 | xref:Real-Time_Summary_Inode[Realtime Summary] | /rtgroups/*.summary
 |=====
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index e28929907147b7..6e52e5fd3d6c1e 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -199,6 +199,9 @@ directory tree.
 [source, c]
 ----
 enum xfs_metafile_type {
+     XFS_METAFILE_USRQUOTA,
+     XFS_METAFILE_GRPQUOTA,
+     XFS_METAFILE_PRJQUOTA,
      XFS_METAFILE_RTBITMAP,
      XFS_METAFILE_RTSUMMARY,
 };
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
index bffb1659d0ba38..f0455304635737 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -259,6 +259,9 @@ Quota flags. It can be a combination of the following flags:
 | +XFS_PQUOTA_CHKD+		| Project quotas have been checked.
 |=====
 
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +sb_qflags+ field
+will persist across mounts if no quota mount options are provided.
+
 *sb_flags*::
 Miscellaneous flags.
 



* [PATCH 09/10] design: update metadump v2 format to reflect rt dumps
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-11-27  0:20   ` [PATCH 08/10] design: document metadata directory tree quota changes Darrick J. Wong
@ 2024-11-27  0:20   ` Darrick J. Wong
  2024-11-27  0:20   ` [PATCH 10/10] xfs-documentation: release for 6.1[23] Darrick J. Wong
  9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:20 UTC (permalink / raw)
  To: djwong; +Cc: hch, hch, cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Update the metadump v2 format documentation to add realtime device
dumps.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 design/XFS_Filesystem_Structure/metadump.asciidoc |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)


diff --git a/design/XFS_Filesystem_Structure/metadump.asciidoc b/design/XFS_Filesystem_Structure/metadump.asciidoc
index a32d6423ea6e75..226622c0d2f20e 100644
--- a/design/XFS_Filesystem_Structure/metadump.asciidoc
+++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
@@ -119,7 +119,16 @@ Dump contains external log contents.
 |=====
 
 *xmh_incompat_flags*::
-Must be zero.
+A combination of the following flags:
+
+.Metadump v2 incompat flags
+[options="header"]
+|=====
+| Flag				| Description
+| +XFS_MD2_INCOMPAT_RTDEVICE+	|
+Dump contains realtime device contents.
+
+|=====
 
 *xmh_reserved*::
 Must be zero.
@@ -143,6 +152,7 @@ Bits 55-56 determine the device from which the metadata dump data was extracted.
 | Value		| Description
 | 0		| Data device
 | 1		| External log
+| 2		| Realtime device
 |=====
 
 The lower 54 bits determine the device address from which the dump data was
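
The address layout touched by this hunk (a device selector above a 54-bit device address) could be decoded roughly as follows.  All names here are hypothetical, and the shift of 54 is an assumption: it takes the selector to be the two bits directly above the 54-bit address, which matches "bits 55-56" only if bits are counted from one.  The value is assumed to be in host byte order already.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants; the shift is an assumption, see above. */
#define MD_ADDR_DEVICE_SHIFT	54
#define MD_ADDR_DEVICE_MASK	UINT64_C(3)
#define MD_ADDR_DADDR_MASK	((UINT64_C(1) << MD_ADDR_DEVICE_SHIFT) - 1)

/* Device selector values from the table in the patch. */
enum md_device {
	MD_DEV_DATA = 0,	/* data device */
	MD_DEV_LOG = 1,		/* external log */
	MD_DEV_RT = 2,		/* realtime device */
};

/* Combine a device selector and a device address into one value. */
static uint64_t md_addr_pack(enum md_device dev, uint64_t daddr)
{
	assert((daddr & ~MD_ADDR_DADDR_MASK) == 0);
	return ((uint64_t)dev << MD_ADDR_DEVICE_SHIFT) | daddr;
}

/* Extract the device selector. */
static enum md_device md_addr_device(uint64_t addr)
{
	return (enum md_device)((addr >> MD_ADDR_DEVICE_SHIFT) &
				MD_ADDR_DEVICE_MASK);
}

/* Extract the device address held in the lower 54 bits. */
static uint64_t md_addr_daddr(uint64_t addr)
{
	return addr & MD_ADDR_DADDR_MASK;
}
```

A reader would check md_addr_device() first to choose the target device, then use md_addr_daddr() to find where the dumped blocks belong on it.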



* [PATCH 10/10] xfs-documentation: release for 6.1[23]
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-11-27  0:20   ` [PATCH 09/10] design: update metadump v2 format to reflect rt dumps Darrick J. Wong
@ 2024-11-27  0:20   ` Darrick J. Wong
  9 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:20 UTC (permalink / raw)
  To: djwong; +Cc: hch, cem, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make a new release since we've just landed ondisk format changes for
6.12 and 6.13.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 design/XFS_Filesystem_Structure/docinfo.xml |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)


diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index 1eddb1f42f11a1..3aadb6637070d2 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -230,4 +230,23 @@
 			</simplelist>
 		</revdescription>
 	</revision>
+	<revision>
+		<revnumber>3.1415926535</revnumber>
+		<date>November 2024</date>
+		<author>
+			<firstname>Darrick</firstname>
+			<surname>Wong</surname>
+			<email>djwong@kernel.org</email>
+		</author>
+		<revdescription>
+			<simplelist>
+				<member>update online fsck docs</member>
+				<member>filesystem properties</member>
+				<member>metadata directory tree</member>
+				<member>realtime groups</member>
+				<member>metadir and quota</member>
+				<member>realtime sb metadump</member>
+			</simplelist>
+		</revdescription>
+	</revision>
 </revhistory>



* [GIT PULL] xfs-documentation: updates for 6.13
  2024-11-27  0:16 [PATCHBOMB] xfs-documentation: updates for 6.13 Darrick J. Wong
  2024-11-27  0:18 ` [PATCHSET] " Darrick J. Wong
@ 2024-11-27  0:20 ` Darrick J. Wong
  1 sibling, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-11-27  0:20 UTC (permalink / raw)
  To: djwong; +Cc: cem, hch, linux-xfs

Hi Darrick,

Please pull this branch with changes.

As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts.  Please let me know if you
encounter any problems.

The following changes since commit 661d339d50b8e504456d6435ae25246057d21a21:

Merge tag 'xfsdocs-6.10-updates_2024-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-documentation into main (2024-08-22 17:05:15 -0700)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-documentation.git tags/xfsdocs-6.13-updates_2024-11-26

for you to fetch changes up to 368784fa00f920518ac686638c163852a477937c:

xfs-documentation: release for 6.1[23] (2024-11-26 15:57:07 -0800)

----------------------------------------------------------------
xfs-documentation: updates for 6.13 [1/3]

Here's a pile of updates detailing the changes made during 6.12 and 6.13.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

----------------------------------------------------------------
Darrick J. Wong (10):
design: update metadata reconstruction chapter
design: document filesystem properties
design: move superblock documentation to a separate file
design: document the actual ondisk superblock
design: document the changes required to handle metadata directories
design: move discussion of realtime volumes to a separate section
design: document realtime groups
design: document metadata directory tree quota changes
design: update metadump v2 format to reflect rt dumps
xfs-documentation: release for 6.1[23]

.../allocation_groups.asciidoc                     | 570 +-------------------
.../XFS_Filesystem_Structure/common_types.asciidoc |   4 +-
design/XFS_Filesystem_Structure/docinfo.xml        |  19 +
.../fs_properties.asciidoc                         |  28 +
.../internal_inodes.asciidoc                       | 154 ++++--
design/XFS_Filesystem_Structure/magic.asciidoc     |   3 +
design/XFS_Filesystem_Structure/metadump.asciidoc  |  12 +-
.../XFS_Filesystem_Structure/ondisk_inode.asciidoc |  27 +-
design/XFS_Filesystem_Structure/realtime.asciidoc  | 394 ++++++++++++++
.../reconstruction.asciidoc                        |  17 +-
.../XFS_Filesystem_Structure/superblock.asciidoc   | 574 +++++++++++++++++++++
.../xfs_filesystem_structure.asciidoc              |   4 +
12 files changed, 1192 insertions(+), 614 deletions(-)
create mode 100644 design/XFS_Filesystem_Structure/fs_properties.asciidoc
create mode 100644 design/XFS_Filesystem_Structure/realtime.asciidoc
create mode 100644 design/XFS_Filesystem_Structure/superblock.asciidoc


^ permalink raw reply	[flat|nested] 13+ messages in thread
