* [PATCHBOMB 6.13 v5.6] xfs-docs: metadata directories and realtime groups
@ 2024-11-07 23:21 Darrick J. Wong
From: Darrick J. Wong @ 2024-11-07 23:21 UTC
To: Christoph Hellwig; +Cc: linux-xfs

Hi everyone,

Here's a minor revision of yesterday's metadir/rtgroups ondisk format
documentation updates. It moves the superblock docs to a separate file
and drops a few feature flag bits that were removed before the final
release. The last bits of online fsck haven't changed, so I've left
them out for brevity.

Nobody likes asciidoc, so if you want an easy-to-read version of the
ondisk documentation, try either:
http://djwong.org/docs/xfs_filesystem_structure.pdf
http://djwong.org/docs/xfs_filesystem_structure.html

Please have a look at the git tree links for code changes:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-groups_2024-11-07

--D
* Re: [PATCHBOMB 6.13 v5.6] xfs-docs: metadata directories and realtime groups
From: Darrick J. Wong @ 2024-11-07 23:24 UTC
To: Christoph Hellwig; +Cc: linux-xfs

On Thu, Nov 07, 2024 at 03:21:31PM -0800, Darrick J. Wong wrote:
> Hi everyone,
>
> [...]
>
> Please have a look at the git tree links for code changes:
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-groups_2024-11-07

Forgot to mention that these are the unreviewed patches:

[PATCHSET v5.6 1/2] xfs-documentation: document metadata directories
[PATCH 1/3] design: move superblock documentation to a separate file
[PATCH 2/3] design: document the actual ondisk superblock

[PATCHSET v5.6 2/2] xfs-documentation: shard the realtime section
[PATCH 2/4] design: document realtime groups

--D
* [PATCHSET v5.6 1/2] xfs-documentation: document metadata directories
From: Darrick J. Wong @ 2024-11-07 23:24 UTC
To: djwong; +Cc: hch, linux-xfs

Hi all,

This patch documents the metadata directory tree feature.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=metadir

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=metadir

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=metadir

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=metadir
---
Commits in this patchset:
 * design: move superblock documentation to a separate file
 * design: document the actual ondisk superblock
 * design: document the changes required to handle metadata directories
---
 .../allocation_groups.asciidoc                     | 550 --------------------
 .../internal_inodes.asciidoc                       | 113 ++++
 .../XFS_Filesystem_Structure/ondisk_inode.asciidoc |  22 +
 .../XFS_Filesystem_Structure/superblock.asciidoc   | 549 ++++++++++++++++++++
 4 files changed, 684 insertions(+), 550 deletions(-)
 create mode 100644 design/XFS_Filesystem_Structure/superblock.asciidoc
* [PATCH 1/3] design: move superblock documentation to a separate file
From: Darrick J. Wong @ 2024-11-07 23:25 UTC
To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move the ondisk superblock docs to a separate file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 .../allocation_groups.asciidoc                   | 550 --------------------
 .../XFS_Filesystem_Structure/superblock.asciidoc | 548 ++++++++++++++++++++
 2 files changed, 549 insertions(+), 549 deletions(-)
 create mode 100644 design/XFS_Filesystem_Structure/superblock.asciidoc

diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index d7fd63ea20a646..e2cdaab5e03d3f 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -31,555 +31,7 @@ image::images/6.png[]
 
 Each of these structures are expanded upon in the following sections.
 
-[[Superblocks]]
-== Superblocks
-
-Each AG starts with a superblock. The first one, in AG 0, is the primary
-superblock which stores aggregate AG information. Secondary superblocks are
-only used by xfs_repair when the primary superblock has been corrupted. A
-superblock is one sector in length.
-
-The superblock is defined by the following structure. The description of each
-field follows.
-
-[source, c]
-----
-struct xfs_sb
-{
-	__uint32_t	sb_magicnum;
-	__uint32_t	sb_blocksize;
-	xfs_rfsblock_t	sb_dblocks;
-	xfs_rfsblock_t	sb_rblocks;
-	xfs_rtblock_t	sb_rextents;
-	uuid_t		sb_uuid;
-	xfs_fsblock_t	sb_logstart;
-	xfs_ino_t	sb_rootino;
-	xfs_ino_t	sb_rbmino;
-	xfs_ino_t	sb_rsumino;
-	xfs_agblock_t	sb_rextsize;
-	xfs_agblock_t	sb_agblocks;
-	xfs_agnumber_t	sb_agcount;
-	xfs_extlen_t	sb_rbmblocks;
-	xfs_extlen_t	sb_logblocks;
-	__uint16_t	sb_versionnum;
-	__uint16_t	sb_sectsize;
-	__uint16_t	sb_inodesize;
-	__uint16_t	sb_inopblock;
-	char		sb_fname[12];
-	__uint8_t	sb_blocklog;
-	__uint8_t	sb_sectlog;
-	__uint8_t	sb_inodelog;
-	__uint8_t	sb_inopblog;
-	__uint8_t	sb_agblklog;
-	__uint8_t	sb_rextslog;
-	__uint8_t	sb_inprogress;
-	__uint8_t	sb_imax_pct;
-	__uint64_t	sb_icount;
-	__uint64_t	sb_ifree;
-	__uint64_t	sb_fdblocks;
-	__uint64_t	sb_frextents;
-	xfs_ino_t	sb_uquotino;
-	xfs_ino_t	sb_gquotino;
-	__uint16_t	sb_qflags;
-	__uint8_t	sb_flags;
-	__uint8_t	sb_shared_vn;
-	xfs_extlen_t	sb_inoalignmt;
-	__uint32_t	sb_unit;
-	__uint32_t	sb_width;
-	__uint8_t	sb_dirblklog;
-	__uint8_t	sb_logsectlog;
-	__uint16_t	sb_logsectsize;
-	__uint32_t	sb_logsunit;
-	__uint32_t	sb_features2;
-	__uint32_t	sb_bad_features2;
-
-	/* version 5 superblock fields start here */
-	__uint32_t	sb_features_compat;
-	__uint32_t	sb_features_ro_compat;
-	__uint32_t	sb_features_incompat;
-	__uint32_t	sb_features_log_incompat;
-
-	__uint32_t	sb_crc;
-	xfs_extlen_t	sb_spino_align;
-
-	xfs_ino_t	sb_pquotino;
-	xfs_lsn_t	sb_lsn;
-	uuid_t		sb_meta_uuid;
-	xfs_ino_t	sb_rrmapino;
-};
-----
-*sb_magicnum*::
-Identifies the filesystem. Its value is +XFS_SB_MAGIC+ ``XFSB'' (0x58465342).
-
-*sb_blocksize*::
-The size of a basic unit of space allocation in bytes. Typically, this is 4096
-(4KB) but can range from 512 to 65536 bytes.
-
-*sb_dblocks*::
-Total number of blocks available for data and metadata on the filesystem.
-
-*sb_rblocks*::
-Number blocks in the real-time disk device. Refer to
-xref:Real-time_Devices[real-time sub-volumes] for more information.
-
-*sb_rextents*::
-Number of extents on the real-time device.
-
-*sb_uuid*::
-UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by
-the UUID instead of device name.
-
-*sb_logstart*::
-First block number for the journaling log if the log is internal (ie. not on a
-separate disk device). For an external log device, this will be zero (the log
-will also start on the first block on the log device). The identity of the log
-devices is not recorded in the filesystem, but the UUIDs of the filesystem and
-the log device are compared to prevent corruption.
-
-*sb_rootino*::
-Root inode number for the filesystem. Normally, the root inode is at the
-start of the first possible inode chunk in AG 0. This is 128 when using a 4KB
-block size.
-
-*sb_rbmino*::
-Bitmap inode for real-time extents.
-
-*sb_rsumino*::
-Summary inode for real-time bitmap.
-
-*sb_rextsize*::
-Realtime extent size in blocks.
-
-*sb_agblocks*::
-Size of each AG in blocks. For the actual size of the last AG, refer to the
-xref:AG_Free_Space_Management[free space] +agf_length+ value.
-
-*sb_agcount*::
-Number of AGs in the filesystem.
-
-*sb_rbmblocks*::
-Number of real-time bitmap blocks.
-
-*sb_logblocks*::
-Number of blocks for the journaling log.
-
-*sb_versionnum*::
-Filesystem version number. This is a bitmask specifying the features enabled
-when creating the filesystem. Any disk checking tools or drivers that do not
-recognize any set bits must not operate upon the filesystem. Most of the flags
-indicate features introduced over time. If the value of the lower nibble is >=
-4, the higher bits indicate feature flags as follows:
-
-.Version 4 Superblock version flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SB_VERSION_ATTRBIT+ |
-Set if any inode have extended attributes. If this bit is set; the
-+XFS_SB_VERSION2_ATTR2BIT+ is not set; and the +attr2+ mount flag is not
-specified, the +di_forkoff+ inode field will not be dynamically adjusted.
-See the section about xref:Extended_Attribute_Versions[extended attribute
-versions] for more information.
-
-| +XFS_SB_VERSION_NLINKBIT+ | Set if any inodes use 32-bit di_nlink values.
-| +XFS_SB_VERSION_QUOTABIT+ |
-Quotas are enabled on the filesystem. This
-also brings in the various quota fields in the superblock.
-
-| +XFS_SB_VERSION_ALIGNBIT+ | Set if sb_inoalignmt is used.
-| +XFS_SB_VERSION_DALIGNBIT+ | Set if sb_unit and sb_width are used.
-| +XFS_SB_VERSION_SHAREDBIT+ | Set if sb_shared_vn is used.
-| +XFS_SB_VERSION_LOGV2BIT+ | Version 2 journaling logs are used.
-| +XFS_SB_VERSION_SECTORBIT+ | Set if sb_sectsize is not 512.
-| +XFS_SB_VERSION_EXTFLGBIT+ | Unwritten extents are used. This is always set.
-| +XFS_SB_VERSION_DIRV2BIT+ |
-Version 2 directories are used. This is always set.
-
-| +XFS_SB_VERSION_MOREBITSBIT+ |
-Set if the sb_features2 field in the superblock contains more flags.
-|=====
-
-If the lower nibble of this value is 5, then this is a v5 filesystem; the
-+XFS_SB_VERSION2_CRCBIT+ feature must be set in +sb_features2+.
-
-*sb_sectsize*::
-Specifies the underlying disk sector size in bytes. Typically this is 512 or
-4096 bytes. This determines the minimum I/O alignment, especially for direct I/O.
-
-*sb_inodesize*::
-Size of the inode in bytes. The default is 256 (2 inodes per standard sector)
-but can be made as large as 2048 bytes when creating the filesystem. On a v5
-filesystem, the default and minimum inode size are both 512 bytes.
-
-*sb_inopblock*::
-Number of inodes per block. This is equivalent to +sb_blocksize / sb_inodesize+.
-
-*sb_fname[12]*::
-Name for the filesystem. This value can be used in the mount command.
-
-*sb_blocklog*::
-log~2~ value of +sb_blocksize+. In other terms, +sb_blocksize = 2^sb_blocklog^+.
-
-*sb_sectlog*::
-log~2~ value of +sb_sectsize+.
-
-*sb_inodelog*::
-log~2~ value of +sb_inodesize+.
-
-*sb_inopblog*::
-log~2~ value of +sb_inopblock+.
-
-*sb_agblklog*::
-log~2~ value of +sb_agblocks+ (rounded up). This value is used to generate inode
-numbers and absolute block numbers defined in extent maps.
-
-*sb_rextslog*::
-log~2~ value of +sb_rextents+.
-
-*sb_inprogress*::
-Flag specifying that the filesystem is being created.
-
-*sb_imax_pct*::
-Maximum percentage of filesystem space that can be used for inodes. The default
-value is 5%.
-
-*sb_icount*::
-Global count for number inodes allocated on the filesystem. This is only
-maintained in the first superblock.
-
-*sb_ifree*::
-Global count of free inodes on the filesystem. This is only maintained in the
-first superblock.
-
-*sb_fdblocks*::
-Global count of free data blocks on the filesystem. This is only maintained in
-the first superblock.
-
-*sb_frextents*::
-Global count of free real-time extents on the filesystem. This is only
-maintained in the first superblock.
-
-*sb_uquotino*::
-Inode for user quotas. This and the following two quota fields only apply if
-+XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to
-xref:Quota_Inodes[quota inodes] for more information.
-
-*sb_gquotino*::
-Inode for group or project quotas. Group and project quotas cannot be used at
-the same time on v4 filesystems. On a v5 filesystem, this inode always stores
-group quota information.
-
-*sb_qflags*::
-Quota flags. It can be a combination of the following flags:
-
-.Superblock quota flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_UQUOTA_ACCT+ | User quota accounting is enabled.
-| +XFS_UQUOTA_ENFD+ | User quotas are enforced.
-| +XFS_UQUOTA_CHKD+ | User quotas have been checked.
-| +XFS_PQUOTA_ACCT+ | Project quota accounting is enabled.
-| +XFS_OQUOTA_ENFD+ | Other (group/project) quotas are enforced.
-| +XFS_OQUOTA_CHKD+ | Other (group/project) quotas have been checked.
-| +XFS_GQUOTA_ACCT+ | Group quota accounting is enabled.
-| +XFS_GQUOTA_ENFD+ | Group quotas are enforced.
-| +XFS_GQUOTA_CHKD+ | Group quotas have been checked.
-| +XFS_PQUOTA_ENFD+ | Project quotas are enforced.
-| +XFS_PQUOTA_CHKD+ | Project quotas have been checked.
-|=====
-
-*sb_flags*::
-Miscellaneous flags.
-
-.Superblock flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SBF_READONLY+ | Only read-only mounts allowed.
-|=====
-
-*sb_shared_vn*::
-Reserved and must be zero (``vn'' stands for version number).
-
-*sb_inoalignmt*::
-Inode chunk alignment in fsblocks. Prior to v5, the default value provided for
-inode chunks to have an 8KiB alignment. Starting with v5, the default value
-scales with the multiple of the inode size over 256 bytes. Concretely, this
-means an alignment of 16KiB for 512-byte inodes, 32KiB for 1024-byte inodes,
-etc. If sparse inodes are enabled, the +ir_startino+ field of each inode
-B+tree record must be aligned to this block granularity, even if the inode
-given by +ir_startino+ itself is sparse.
-
-*sb_unit*::
-Underlying stripe or raid unit in blocks.
-
-*sb_width*::
-Underlying stripe or raid width in blocks.
-
-*sb_dirblklog*::
-log~2~ multiplier that determines the granularity of directory block allocations
-in fsblocks.
-
-*sb_logsectlog*::
-log~2~ value of the log subvolume's sector size. This is only used if the
-journaling log is on a separate disk device (i.e. not internal).
-
-*sb_logsectsize*::
-The log's sector size in bytes if the filesystem uses an external log device.
-
-*sb_logsunit*::
-The log device's stripe or raid unit size. This only applies to version 2 logs
-+XFS_SB_VERSION_LOGV2BIT+ is set in +sb_versionnum+.
-
-*sb_features2*::
-Additional version flags if +XFS_SB_VERSION_MOREBITSBIT+ is set in
-+sb_versionnum+. The currently defined additional features include:
-
-.Extended Version 4 Superblock flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SB_VERSION2_LAZYSBCOUNTBIT+ |
-Lazy global counters. Making a filesystem with this bit set can improve
-performance. The global free space and inode counts are only updated in the
-primary superblock when the filesystem is cleanly unmounted.
-
-| +XFS_SB_VERSION2_ATTR2BIT+ |
-Extended attributes version 2. Making a filesystem with this optimises the
-inode layout of extended attributes. If this bit is set and the +noattr2+
-mount flag is not specified, the +di_forkoff+ inode field will be dynamically
-adjusted. See the section about xref:Extended_Attribute_Versions[extended
-attribute versions] for more information.
-
-| +XFS_SB_VERSION2_PARENTBIT+ |
-Parent pointers. All inodes must have an extended attribute that points back to
-its parent inode. The primary purpose for this information is in backup systems.
-
-| +XFS_SB_VERSION2_PROJID32BIT+ |
-32-bit Project ID. Inodes can be associated with a project ID number, which
-can be used to enforce disk space usage quotas for a particular group of
-directories. This flag indicates that project IDs can be 32 bits in size.
-
-| +XFS_SB_VERSION2_CRCBIT+ |
-Metadata checksumming. All metadata blocks have an extended header containing
-the block checksum, a copy of the metadata UUID, the log sequence number of the
-last update to prevent stale replays, and a back pointer to the owner of the
-block. This feature must be and can only be set if the lowest nibble of
-+sb_versionnum+ is set to 5.
-
-| +XFS_SB_VERSION2_FTYPE+ |
-Directory file type. Each directory entry records the type of the inode to
-which the entry points. This speeds up directory iteration by removing the
-need to load every inode into memory.
-|=====
-
-*sb_bad_features2*::
-This field mirrors +sb_features2+, due to past 64-bit alignment errors.
-
-*sb_features_compat*::
-Read-write compatible feature flags. The kernel can still read and write this
-FS even if it doesn't understand the flag. Currently, there are no valid
-flags.
-
-*sb_features_ro_compat*::
-Read-only compatible feature flags. The kernel can still read this FS even if
-it doesn't understand the flag.
-
-.Extended Version 5 Superblock Read-Only compatibility flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SB_FEAT_RO_COMPAT_FINOBT+ |
-Free inode B+tree. Each allocation group contains a B+tree to track inode chunks
-containing free inodes. This is a performance optimization to reduce the time
-required to allocate inodes.
-
-| +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ |
-Reverse mapping B+tree. Each allocation group contains a B+tree containing
-records mapping AG blocks to their owners. See the section about
-xref:Reconstruction[reconstruction] for more details.
-
-| +XFS_SB_FEAT_RO_COMPAT_REFLINK+ |
-Reference count B+tree. Each allocation group contains a B+tree to track the
-reference counts of AG blocks. This enables files to share data blocks safely.
-See the section about xref:Reflink_Deduplication[reflink and deduplication] for
-more details.
-
-| +XFS_SB_FEAT_RO_COMPAT_INOBTCNT+ |
-Inode B+tree block counters. Each allocation group's inode (AGI) header
-tracks the number of blocks in each of the inode B+trees. This allows us
-to have a slightly higher level of redundancy over the shape of the inode
-btrees, and decreases the amount of time to compute the metadata B+tree
-preallocations at mount time.
-
-|=====
-
-*sb_features_incompat*::
-Read-write incompatible feature flags. The kernel cannot read or write this
-FS if it doesn't understand the flag.
-
-.Extended Version 5 Superblock Read-Write incompatibility flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SB_FEAT_INCOMPAT_FTYPE+ |
-Directory file type. Each directory entry tracks the type of the inode to
-which the entry points. This is a performance optimization to remove the need
-to load every inode into memory to iterate a directory.
-
-| +XFS_SB_FEAT_INCOMPAT_SPINODES+ |
-Sparse inodes. This feature relaxes the requirement to allocate inodes in
-chunks of 64. When the free space is heavily fragmented, there might exist
-plenty of free space but not enough contiguous free space to allocate a new
-inode chunk. With this feature, the user can continue to create files until
-all free space is exhausted.
-
-Unused space in the inode B+tree records are used to track which parts of the
-inode chunk are not inodes.
-
-See the chapter on xref:Sparse_Inodes[Sparse Inodes] for more information.
-
-| +XFS_SB_FEAT_INCOMPAT_META_UUID+ |
-Metadata UUID. The UUID stamped into each metadata block must match the value
-in +sb_meta_uuid+. This enables the administrator to change +sb_uuid+ at will
-without having to rewrite the entire filesystem.
-
-| +XFS_SB_FEAT_INCOMPAT_BIGTIME+ |
-Large timestamps. Inode timestamps and quota expiration timers are extended to
-support times through the year 2486. See the section on
-xref:Timestamps[timestamps] for more information.
-
-| +XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR+ |
-The filesystem is not in operable condition, and must be run through
-xfs_repair before it can be mounted.
-
-| +XFS_SB_FEAT_INCOMPAT_NREXT64+ |
-Large file fork extent counts. This greatly expands the maximum number of
-space mappings allowed in data and extended attribute file forks.
-
-| +XFS_SB_FEAT_INCOMPAT_EXCHRANGE+ |
-Atomic file mapping exchanges. The filesystem is capable of exchanging a range
-of mappings between two arbitrary ranges of a file's fork by using log intent
-items to track the progress of the high level exchange operation. In other
-words, the exchange operation can be restarted if the system goes down, which
-is necessary for userspace to commit of new file contents atomically. This
-flag has user-visible impacts, which is why it is a permanent incompat flag.
-See the section about xref:XMI_Log_Item[mapping exchange log intents] for more
-information.
-
-| +XFS_SB_FEAT_INCOMPAT_PARENT+ |
-Directory parent pointers. See the section about xref:Parent_Pointers[parent
-pointers] for more information.
-
-|=====
-
-*sb_features_log_incompat*::
-Read-write incompatible feature flags for the log. The kernel cannot recover
-the FS log if it doesn't understand the flag.
-
-.Extended Version 5 Superblock Log incompatibility flags
-[options="header"]
-|=====
-| Flag | Description
-| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+ |
-Extended attribute updates have been committed to the ondisk log.
-
-|=====
-
-*sb_crc*::
-Superblock checksum.
-
-*sb_spino_align*::
-Sparse inode alignment, in fsblocks. Each chunk of inodes referenced by a
-sparse inode B+tree record must be aligned to this block granularity.
-
-*sb_pquotino*::
-Project quota inode.
-
-*sb_lsn*::
-Log sequence number of the last superblock update.
-
-*sb_meta_uuid*::
-If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
-all metadata blocks must match this UUID. If not, the block header UUID field
-must match +sb_uuid+.
-
-*sb_rrmapino*::
-If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time
-device is present (+sb_rblocks+ > 0), this field points to an inode
-that contains the root to the
-xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree].
-This field is zero otherwise.
-
-=== xfs_db Superblock Example
-
-A filesystem is made on a single disk with the following command:
-
-----
-# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7
-meta-data=/dev/sda7              isize=256    agcount=16, agsize=3923122 blks
-         =                       sectsz=512   attr=2
-data     =                       bsize=4096   blocks=62769952, imaxpct=25
-         =                       sunit=0      swidth=0 blks, unwritten=1
-naming   =version 2              bsize=16384
-log      =internal log           bsize=4096   blocks=30649, version=1
-         =                       sectsz=512   sunit=0 blks
-realtime =none                   extsz=65536  blocks=0, rtextents=0
-----
-
-And in xfs_db, inspecting the superblock:
-
-----
-xfs_db> sb
-xfs_db> p
-magicnum = 0x58465342
-blocksize = 4096
-dblocks = 62769952
-rblocks = 0
-rextents = 0
-uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5
-logstart = 33554436
-rootino = 128
-rbmino = 129
-rsumino = 130
-rextsize = 16
-agblocks = 3923122
-agcount = 16
-rbmblocks = 0
-logblocks = 30649
-versionnum = 0xb084
-sectsize = 512
-inodesize = 256
-inopblock = 16
-fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
-blocklog = 12
-sectlog = 9
-inodelog = 8
-inopblog = 4
-agblklog = 22
-rextslog = 0
-inprogress = 0
-imax_pct = 25
-icount = 64
-ifree = 61
-fdblocks = 62739235
-frextents = 0
-uquotino = 0
-gquotino = 0
-qflags = 0
-flags = 0
-shared_vn = 0
-inoalignmt = 2
-unit = 0
-width = 0
-dirblklog = 2
-logsectlog = 0
-logsectsize = 0
-logsunit = 0
-features2 = 8
-----
-
+include::superblock.asciidoc[]
 
 [[AG_Free_Space_Management]]
 == AG Free Space Management
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
new file mode 100644
index 00000000000000..16c31116ffafd4
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -0,0 +1,548 @@
+[[Superblocks]]
+== Superblocks
+
+Each AG starts with a superblock. The first one, in AG 0, is the primary
+superblock which stores aggregate AG information. Secondary superblocks are
+only used by xfs_repair when the primary superblock has been corrupted. A
A +superblock is one sector in length. + +The superblock is defined by the following structure. The description of each +field follows. + +[source, c] +---- +struct xfs_sb +{ + __uint32_t sb_magicnum; + __uint32_t sb_blocksize; + xfs_rfsblock_t sb_dblocks; + xfs_rfsblock_t sb_rblocks; + xfs_rtblock_t sb_rextents; + uuid_t sb_uuid; + xfs_fsblock_t sb_logstart; + xfs_ino_t sb_rootino; + xfs_ino_t sb_rbmino; + xfs_ino_t sb_rsumino; + xfs_agblock_t sb_rextsize; + xfs_agblock_t sb_agblocks; + xfs_agnumber_t sb_agcount; + xfs_extlen_t sb_rbmblocks; + xfs_extlen_t sb_logblocks; + __uint16_t sb_versionnum; + __uint16_t sb_sectsize; + __uint16_t sb_inodesize; + __uint16_t sb_inopblock; + char sb_fname[12]; + __uint8_t sb_blocklog; + __uint8_t sb_sectlog; + __uint8_t sb_inodelog; + __uint8_t sb_inopblog; + __uint8_t sb_agblklog; + __uint8_t sb_rextslog; + __uint8_t sb_inprogress; + __uint8_t sb_imax_pct; + __uint64_t sb_icount; + __uint64_t sb_ifree; + __uint64_t sb_fdblocks; + __uint64_t sb_frextents; + xfs_ino_t sb_uquotino; + xfs_ino_t sb_gquotino; + __uint16_t sb_qflags; + __uint8_t sb_flags; + __uint8_t sb_shared_vn; + xfs_extlen_t sb_inoalignmt; + __uint32_t sb_unit; + __uint32_t sb_width; + __uint8_t sb_dirblklog; + __uint8_t sb_logsectlog; + __uint16_t sb_logsectsize; + __uint32_t sb_logsunit; + __uint32_t sb_features2; + __uint32_t sb_bad_features2; + + /* version 5 superblock fields start here */ + __uint32_t sb_features_compat; + __uint32_t sb_features_ro_compat; + __uint32_t sb_features_incompat; + __uint32_t sb_features_log_incompat; + + __uint32_t sb_crc; + xfs_extlen_t sb_spino_align; + + xfs_ino_t sb_pquotino; + xfs_lsn_t sb_lsn; + uuid_t sb_meta_uuid; + xfs_ino_t sb_rrmapino; +}; +---- +*sb_magicnum*:: +Identifies the filesystem. Its value is +XFS_SB_MAGIC+ ``XFSB'' (0x58465342). + +*sb_blocksize*:: +The size of a basic unit of space allocation in bytes. Typically, this is 4096 +(4KB) but can range from 512 to 65536 bytes. 
+
+*sb_dblocks*::
+Total number of blocks available for data and metadata on the filesystem.
+
+*sb_rblocks*::
+Number blocks in the real-time disk device. Refer to
+xref:Real-time_Devices[real-time sub-volumes] for more information.
+
+*sb_rextents*::
+Number of extents on the real-time device.
+
+*sb_uuid*::
+UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by
+the UUID instead of device name.
+
+*sb_logstart*::
+First block number for the journaling log if the log is internal (ie. not on a
+separate disk device). For an external log device, this will be zero (the log
+will also start on the first block on the log device). The identity of the log
+devices is not recorded in the filesystem, but the UUIDs of the filesystem and
+the log device are compared to prevent corruption.
+
+*sb_rootino*::
+Root inode number for the filesystem. Normally, the root inode is at the
+start of the first possible inode chunk in AG 0. This is 128 when using a 4KB
+block size.
+
+*sb_rbmino*::
+Bitmap inode for real-time extents.
+
+*sb_rsumino*::
+Summary inode for real-time bitmap.
+
+*sb_rextsize*::
+Realtime extent size in blocks.
+
+*sb_agblocks*::
+Size of each AG in blocks. For the actual size of the last AG, refer to the
+xref:AG_Free_Space_Management[free space] +agf_length+ value.
+
+*sb_agcount*::
+Number of AGs in the filesystem.
+
+*sb_rbmblocks*::
+Number of real-time bitmap blocks.
+
+*sb_logblocks*::
+Number of blocks for the journaling log.
+
+*sb_versionnum*::
+Filesystem version number. This is a bitmask specifying the features enabled
+when creating the filesystem. Any disk checking tools or drivers that do not
+recognize any set bits must not operate upon the filesystem. Most of the flags
+indicate features introduced over time. If the value of the lower nibble is >=
+4, the higher bits indicate feature flags as follows:
+
+.Version 4 Superblock version flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_SB_VERSION_ATTRBIT+ |
+Set if any inode have extended attributes. If this bit is set; the
++XFS_SB_VERSION2_ATTR2BIT+ is not set; and the +attr2+ mount flag is not
+specified, the +di_forkoff+ inode field will not be dynamically adjusted.
+See the section about xref:Extended_Attribute_Versions[extended attribute
+versions] for more information.
+
+| +XFS_SB_VERSION_NLINKBIT+ | Set if any inodes use 32-bit di_nlink values.
+| +XFS_SB_VERSION_QUOTABIT+ |
+Quotas are enabled on the filesystem. This
+also brings in the various quota fields in the superblock.
+
+| +XFS_SB_VERSION_ALIGNBIT+ | Set if sb_inoalignmt is used.
+| +XFS_SB_VERSION_DALIGNBIT+ | Set if sb_unit and sb_width are used.
+| +XFS_SB_VERSION_SHAREDBIT+ | Set if sb_shared_vn is used.
+| +XFS_SB_VERSION_LOGV2BIT+ | Version 2 journaling logs are used.
+| +XFS_SB_VERSION_SECTORBIT+ | Set if sb_sectsize is not 512.
+| +XFS_SB_VERSION_EXTFLGBIT+ | Unwritten extents are used. This is always set.
+| +XFS_SB_VERSION_DIRV2BIT+ |
+Version 2 directories are used. This is always set.
+
+| +XFS_SB_VERSION_MOREBITSBIT+ |
+Set if the sb_features2 field in the superblock contains more flags.
+|=====
+
+If the lower nibble of this value is 5, then this is a v5 filesystem; the
++XFS_SB_VERSION2_CRCBIT+ feature must be set in +sb_features2+.
+
+*sb_sectsize*::
+Specifies the underlying disk sector size in bytes. Typically this is 512 or
+4096 bytes. This determines the minimum I/O alignment, especially for direct I/O.
+
+*sb_inodesize*::
+Size of the inode in bytes. The default is 256 (2 inodes per standard sector)
+but can be made as large as 2048 bytes when creating the filesystem. On a v5
+filesystem, the default and minimum inode size are both 512 bytes.
+
+*sb_inopblock*::
+Number of inodes per block. This is equivalent to +sb_blocksize / sb_inodesize+.
+
+*sb_fname[12]*::
+Name for the filesystem. This value can be used in the mount command.
+
+*sb_blocklog*::
+log~2~ value of +sb_blocksize+. In other terms, +sb_blocksize = 2^sb_blocklog^+.
+
+*sb_sectlog*::
+log~2~ value of +sb_sectsize+.
+
+*sb_inodelog*::
+log~2~ value of +sb_inodesize+.
+
+*sb_inopblog*::
+log~2~ value of +sb_inopblock+.
+
+*sb_agblklog*::
+log~2~ value of +sb_agblocks+ (rounded up). This value is used to generate inode
+numbers and absolute block numbers defined in extent maps.
+
+*sb_rextslog*::
+log~2~ value of +sb_rextents+.
+
+*sb_inprogress*::
+Flag specifying that the filesystem is being created.
+
+*sb_imax_pct*::
+Maximum percentage of filesystem space that can be used for inodes. The default
+value is 5%.
+
+*sb_icount*::
+Global count for number inodes allocated on the filesystem. This is only
+maintained in the first superblock.
+
+*sb_ifree*::
+Global count of free inodes on the filesystem. This is only maintained in the
+first superblock.
+
+*sb_fdblocks*::
+Global count of free data blocks on the filesystem. This is only maintained in
+the first superblock.
+
+*sb_frextents*::
+Global count of free real-time extents on the filesystem. This is only
+maintained in the first superblock.
+
+*sb_uquotino*::
+Inode for user quotas. This and the following two quota fields only apply if
++XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to
+xref:Quota_Inodes[quota inodes] for more information.
+
+*sb_gquotino*::
+Inode for group or project quotas. Group and project quotas cannot be used at
+the same time on v4 filesystems. On a v5 filesystem, this inode always stores
+group quota information.
+
+*sb_qflags*::
+Quota flags. It can be a combination of the following flags:
+
+.Superblock quota flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_UQUOTA_ACCT+ | User quota accounting is enabled.
+| +XFS_UQUOTA_ENFD+ | User quotas are enforced.
+| +XFS_UQUOTA_CHKD+ | User quotas have been checked. +| +XFS_PQUOTA_ACCT+ | Project quota accounting is enabled. +| +XFS_OQUOTA_ENFD+ | Other (group/project) quotas are enforced. +| +XFS_OQUOTA_CHKD+ | Other (group/project) quotas have been checked. +| +XFS_GQUOTA_ACCT+ | Group quota accounting is enabled. +| +XFS_GQUOTA_ENFD+ | Group quotas are enforced. +| +XFS_GQUOTA_CHKD+ | Group quotas have been checked. +| +XFS_PQUOTA_ENFD+ | Project quotas are enforced. +| +XFS_PQUOTA_CHKD+ | Project quotas have been checked. +|===== + +*sb_flags*:: +Miscellaneous flags. + +.Superblock flags +[options="header"] +|===== +| Flag | Description +| +XFS_SBF_READONLY+ | Only read-only mounts are allowed. +|===== + +*sb_shared_vn*:: +Reserved and must be zero (``vn'' stands for version number). + +*sb_inoalignmt*:: +Inode chunk alignment in fsblocks. Prior to v5, the default value gave +inode chunks an 8KiB alignment. Starting with v5, the default value +scales with the ratio of the inode size to 256 bytes. Concretely, this +means an alignment of 16KiB for 512-byte inodes, 32KiB for 1024-byte inodes, +etc. If sparse inodes are enabled, the +ir_startino+ field of each inode +B+tree record must be aligned to this block granularity, even if the inode +given by +ir_startino+ itself is sparse. + +*sb_unit*:: +Underlying stripe or raid unit in blocks. + +*sb_width*:: +Underlying stripe or raid width in blocks. + +*sb_dirblklog*:: +log~2~ multiplier that determines the granularity of directory block allocations +in fsblocks. + +*sb_logsectlog*:: +log~2~ value of the log subvolume's sector size. This is only used if the +journaling log is on a separate disk device (i.e. not internal). + +*sb_logsectsize*:: +The log's sector size in bytes if the filesystem uses an external log device. + +*sb_logsunit*:: +The log device's stripe or raid unit size. This only applies to version 2 logs, i.e. when ++XFS_SB_VERSION_LOGV2BIT+ is set in +sb_versionnum+.
+ +*sb_features2*:: +Additional version flags if +XFS_SB_VERSION_MOREBITSBIT+ is set in ++sb_versionnum+. The currently defined additional features include: + +.Extended Version 4 Superblock flags +[options="header"] +|===== +| Flag | Description +| +XFS_SB_VERSION2_LAZYSBCOUNTBIT+ | +Lazy global counters. Making a filesystem with this bit set can improve +performance. The global free space and inode counts are only updated in the +primary superblock when the filesystem is cleanly unmounted. + +| +XFS_SB_VERSION2_ATTR2BIT+ | +Extended attributes version 2. Making a filesystem with this bit set optimises the +inode layout of extended attributes. If this bit is set and the +noattr2+ +mount flag is not specified, the +di_forkoff+ inode field will be dynamically +adjusted. See the section about xref:Extended_Attribute_Versions[extended +attribute versions] for more information. + +| +XFS_SB_VERSION2_PARENTBIT+ | +Parent pointers. Every inode must have an extended attribute that points back to +its parent inode. This information is primarily intended for backup systems. + +| +XFS_SB_VERSION2_PROJID32BIT+ | +32-bit Project ID. Inodes can be associated with a project ID number, which +can be used to enforce disk space usage quotas for a particular group of +directories. This flag indicates that project IDs can be 32 bits in size. + +| +XFS_SB_VERSION2_CRCBIT+ | +Metadata checksumming. All metadata blocks have an extended header containing +the block checksum, a copy of the metadata UUID, the log sequence number of the +last update to prevent stale replays, and a back pointer to the owner of the +block. This feature must be set if, and only if, the lowest nibble of ++sb_versionnum+ is set to 5. + +| +XFS_SB_VERSION2_FTYPE+ | +Directory file type. Each directory entry records the type of the inode to +which the entry points. This speeds up directory iteration by removing the +need to load every inode into memory.
+|===== + +*sb_bad_features2*:: +This field mirrors +sb_features2+, due to past 64-bit alignment errors. + +*sb_features_compat*:: +Read-write compatible feature flags. The kernel can still read and write this +FS even if it doesn't understand the flag. Currently, there are no valid +flags. + +*sb_features_ro_compat*:: +Read-only compatible feature flags. The kernel can still read this FS even if +it doesn't understand the flag. + +.Extended Version 5 Superblock Read-Only compatibility flags +[options="header"] +|===== +| Flag | Description +| +XFS_SB_FEAT_RO_COMPAT_FINOBT+ | +Free inode B+tree. Each allocation group contains a B+tree to track inode chunks +containing free inodes. This is a performance optimization to reduce the time +required to allocate inodes. + +| +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ | +Reverse mapping B+tree. Each allocation group contains a B+tree containing +records mapping AG blocks to their owners. See the section about +xref:Reconstruction[reconstruction] for more details. + +| +XFS_SB_FEAT_RO_COMPAT_REFLINK+ | +Reference count B+tree. Each allocation group contains a B+tree to track the +reference counts of AG blocks. This enables files to share data blocks safely. +See the section about xref:Reflink_Deduplication[reflink and deduplication] for +more details. + +| +XFS_SB_FEAT_RO_COMPAT_INOBTCNT+ | +Inode B+tree block counters. Each allocation group's inode (AGI) header +tracks the number of blocks in each of the inode B+trees. This provides +slightly more redundancy for checking the shape of the inode +btrees and reduces the time needed to compute the metadata B+tree +preallocations at mount time. + +|===== + +*sb_features_incompat*:: +Read-write incompatible feature flags. The kernel cannot read or write this +FS if it doesn't understand the flag. + +.Extended Version 5 Superblock Read-Write incompatibility flags +[options="header"] +|===== +| Flag | Description +| +XFS_SB_FEAT_INCOMPAT_FTYPE+ | +Directory file type.
Each directory entry tracks the type of the inode to +which the entry points. This is a performance optimization to remove the need +to load every inode into memory to iterate a directory. + +| +XFS_SB_FEAT_INCOMPAT_SPINODES+ | +Sparse inodes. This feature relaxes the requirement to allocate inodes in +chunks of 64. When the free space is heavily fragmented, there might exist +plenty of free space but not enough contiguous free space to allocate a new +inode chunk. With this feature, the user can continue to create files until +all free space is exhausted. + +Unused space in the inode B+tree records is used to track which parts of the +inode chunk are not inodes. + +See the chapter on xref:Sparse_Inodes[Sparse Inodes] for more information. + +| +XFS_SB_FEAT_INCOMPAT_META_UUID+ | +Metadata UUID. The UUID stamped into each metadata block must match the value +in +sb_meta_uuid+. This enables the administrator to change +sb_uuid+ at will +without having to rewrite the entire filesystem. + +| +XFS_SB_FEAT_INCOMPAT_BIGTIME+ | +Large timestamps. Inode timestamps and quota expiration timers are extended to +support times through the year 2486. See the section on +xref:Timestamps[timestamps] for more information. + +| +XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR+ | +The filesystem is not in an operable condition and must be run through +xfs_repair before it can be mounted. + +| +XFS_SB_FEAT_INCOMPAT_NREXT64+ | +Large file fork extent counts. This greatly expands the maximum number of +space mappings allowed in data and extended attribute file forks. + +| +XFS_SB_FEAT_INCOMPAT_EXCHRANGE+ | +Atomic file mapping exchanges. The filesystem is capable of exchanging a range +of mappings between two arbitrary ranges of a file's fork by using log intent +items to track the progress of the high-level exchange operation. In other +words, the exchange operation can be restarted if the system goes down, which +is necessary for userspace to commit new file contents atomically.
This +flag has user-visible impacts, which is why it is a permanent incompat flag. +See the section about xref:XMI_Log_Item[mapping exchange log intents] for more +information. + +| +XFS_SB_FEAT_INCOMPAT_PARENT+ | +Directory parent pointers. See the section about xref:Parent_Pointers[parent +pointers] for more information. + +|===== + +*sb_features_log_incompat*:: +Read-write incompatible feature flags for the log. The kernel cannot recover +the FS log if it doesn't understand the flag. + +.Extended Version 5 Superblock Log incompatibility flags +[options="header"] +|===== +| Flag | Description +| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+ | +Extended attribute updates have been committed to the ondisk log. + +|===== + +*sb_crc*:: +Superblock checksum. + +*sb_spino_align*:: +Sparse inode alignment, in fsblocks. Each chunk of inodes referenced by a +sparse inode B+tree record must be aligned to this block granularity. + +*sb_pquotino*:: +Project quota inode. + +*sb_lsn*:: +Log sequence number of the last superblock update. + +*sb_meta_uuid*:: +If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in +all metadata blocks must match this UUID. If not, the block header UUID field +must match +sb_uuid+. + +*sb_rrmapino*:: +If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time +device is present (+sb_rblocks+ > 0), this field points to an inode +that contains the root to the +xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree]. +This field is zero otherwise. 
+ +=== xfs_db Superblock Example + +A filesystem is made on a single disk with the following command: + +---- +# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7 +meta-data=/dev/sda7 isize=256 agcount=16, agsize=3923122 blks + = sectsz=512 attr=2 +data = bsize=4096 blocks=62769952, imaxpct=25 + = sunit=0 swidth=0 blks, unwritten=1 +naming =version 2 bsize=16384 +log =internal log bsize=4096 blocks=30649, version=1 + = sectsz=512 sunit=0 blks +realtime =none extsz=65536 blocks=0, rtextents=0 +---- + +And in xfs_db, inspecting the superblock: + +---- +xfs_db> sb +xfs_db> p +magicnum = 0x58465342 +blocksize = 4096 +dblocks = 62769952 +rblocks = 0 +rextents = 0 +uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5 +logstart = 33554436 +rootino = 128 +rbmino = 129 +rsumino = 130 +rextsize = 16 +agblocks = 3923122 +agcount = 16 +rbmblocks = 0 +logblocks = 30649 +versionnum = 0xb084 +sectsize = 512 +inodesize = 256 +inopblock = 16 +fname = "\000\000\000\000\000\000\000\000\000\000\000\000" +blocklog = 12 +sectlog = 9 +inodelog = 8 +inopblog = 4 +agblklog = 22 +rextslog = 0 +inprogress = 0 +imax_pct = 25 +icount = 64 +ifree = 61 +fdblocks = 62739235 +frextents = 0 +uquotino = 0 +gquotino = 0 +qflags = 0 +flags = 0 +shared_vn = 0 +inoalignmt = 2 +unit = 0 +width = 0 +dirblklog = 2 +logsectlog = 0 +logsectsize = 0 +logsunit = 0 +features2 = 8 +---- ^ permalink raw reply related [flat|nested] 17+ messages in thread
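The version rules above reduce to a few bit masks. The C sketch below is only an illustration: this document does not give the numeric flag values, so they are assumed here to match the Linux kernel's xfs_format.h, and the helper names (sb_version, has_flags, sb_crc_rule_ok) are hypothetical.

```c
#include <stdint.h>

/*
 * Flag values ASSUMED to match the Linux kernel's xfs_format.h; this
 * document does not list them, so check your own headers before reuse.
 */
#define XFS_SB_VERSION_NUMBITS     0x000f /* lower nibble: version 4 or 5 */
#define XFS_SB_VERSION_ATTRBIT     0x0010
#define XFS_SB_VERSION_NLINKBIT    0x0020
#define XFS_SB_VERSION_QUOTABIT    0x0040
#define XFS_SB_VERSION_ALIGNBIT    0x0080
#define XFS_SB_VERSION_DALIGNBIT   0x0100
#define XFS_SB_VERSION_SHAREDBIT   0x0200
#define XFS_SB_VERSION_LOGV2BIT    0x0400
#define XFS_SB_VERSION_SECTORBIT   0x0800
#define XFS_SB_VERSION_EXTFLGBIT   0x1000
#define XFS_SB_VERSION_DIRV2BIT    0x2000
#define XFS_SB_VERSION_MOREBITSBIT 0x8000

#define XFS_SB_VERSION2_LAZYSBCOUNTBIT 0x0002
#define XFS_SB_VERSION2_ATTR2BIT       0x0008
#define XFS_SB_VERSION2_CRCBIT         0x0100

/* The lower nibble of sb_versionnum is the superblock version. */
static unsigned int sb_version(uint16_t versionnum)
{
	return versionnum & XFS_SB_VERSION_NUMBITS;
}

/* Return nonzero if every bit in @mask is set in @flags. */
static int has_flags(uint32_t flags, uint32_t mask)
{
	return (flags & mask) == mask;
}

/*
 * The rule stated above: a v5 superblock must have the CRC bit set in
 * sb_features2.  Superblocks of other versions are not constrained by it.
 */
static int sb_crc_rule_ok(uint16_t versionnum, uint32_t features2)
{
	if (sb_version(versionnum) != 5)
		return 1;
	return has_flags(features2, XFS_SB_VERSION2_CRCBIT);
}
```

Under these assumed values, the xfs_db example above decodes as a v4 superblock (versionnum = 0xb084) with ALIGNBIT, EXTFLGBIT, DIRV2BIT and MOREBITSBIT set, while features2 = 8 corresponds to XFS_SB_VERSION2_ATTR2BIT, which is consistent with the attr=2 option passed to mkfs.xfs.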
* Re: [PATCH 1/3] design: move superblock documentation to a separate file 2024-11-07 23:25 ` [PATCH 1/3] design: move superblock documentation to a separate file Darrick J. Wong @ 2024-11-08 14:16 ` Christoph Hellwig 0 siblings, 0 replies; 17+ messages in thread From: Christoph Hellwig @ 2024-11-08 14:16 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 2/3] design: document the actual ondisk superblock 2024-11-07 23:24 ` [PATCHSET v5.6 1/2] xfs-documentation: document metadata directories Darrick J. Wong 2024-11-07 23:25 ` [PATCH 1/3] design: move superblock documentation to a separate file Darrick J. Wong @ 2024-11-07 23:25 ` Darrick J. Wong 2024-11-08 14:16 ` Christoph Hellwig 2024-11-07 23:26 ` [PATCH 3/3] design: document the changes required to handle metadata directories Darrick J. Wong 2 siblings, 1 reply; 17+ messages in thread From: Darrick J. Wong @ 2024-11-07 23:25 UTC (permalink / raw) To: djwong; +Cc: hch, linux-xfs From: Darrick J. Wong <djwong@kernel.org> struct xfs_dsb is the ondisk superblock, not struct xfs_sb. Replace the struct definition with the one for the ondisk superblock. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- .../XFS_Filesystem_Structure/superblock.asciidoc | 117 ++++++++++---------- 1 file changed, 58 insertions(+), 59 deletions(-) diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc index 16c31116ffafd4..79e8c30dc93e79 100644 --- a/design/XFS_Filesystem_Structure/superblock.asciidoc +++ b/design/XFS_Filesystem_Structure/superblock.asciidoc @@ -11,68 +11,67 @@ field follows.
[source, c] ---- -struct xfs_sb -{ - __uint32_t sb_magicnum; - __uint32_t sb_blocksize; - xfs_rfsblock_t sb_dblocks; - xfs_rfsblock_t sb_rblocks; - xfs_rtblock_t sb_rextents; - uuid_t sb_uuid; - xfs_fsblock_t sb_logstart; - xfs_ino_t sb_rootino; - xfs_ino_t sb_rbmino; - xfs_ino_t sb_rsumino; - xfs_agblock_t sb_rextsize; - xfs_agblock_t sb_agblocks; - xfs_agnumber_t sb_agcount; - xfs_extlen_t sb_rbmblocks; - xfs_extlen_t sb_logblocks; - __uint16_t sb_versionnum; - __uint16_t sb_sectsize; - __uint16_t sb_inodesize; - __uint16_t sb_inopblock; - char sb_fname[12]; - __uint8_t sb_blocklog; - __uint8_t sb_sectlog; - __uint8_t sb_inodelog; - __uint8_t sb_inopblog; - __uint8_t sb_agblklog; - __uint8_t sb_rextslog; - __uint8_t sb_inprogress; - __uint8_t sb_imax_pct; - __uint64_t sb_icount; - __uint64_t sb_ifree; - __uint64_t sb_fdblocks; - __uint64_t sb_frextents; - xfs_ino_t sb_uquotino; - xfs_ino_t sb_gquotino; - __uint16_t sb_qflags; - __uint8_t sb_flags; - __uint8_t sb_shared_vn; - xfs_extlen_t sb_inoalignmt; - __uint32_t sb_unit; - __uint32_t sb_width; - __uint8_t sb_dirblklog; - __uint8_t sb_logsectlog; - __uint16_t sb_logsectsize; - __uint32_t sb_logsunit; - __uint32_t sb_features2; - __uint32_t sb_bad_features2; +struct xfs_dsb { + __be32 sb_magicnum; + __be32 sb_blocksize; + __be64 sb_dblocks; + __be64 sb_rblocks; + __be64 sb_rextents; + uuid_t sb_uuid; + __be64 sb_logstart; + __be64 sb_rootino; + __be64 sb_rbmino; + __be64 sb_rsumino; + __be32 sb_rextsize; + __be32 sb_agblocks; + __be32 sb_agcount; + __be32 sb_rbmblocks; + __be32 sb_logblocks; + __be16 sb_versionnum; + __be16 sb_sectsize; + __be16 sb_inodesize; + __be16 sb_inopblock; + char sb_fname[XFSLABEL_MAX]; + __u8 sb_blocklog; + __u8 sb_sectlog; + __u8 sb_inodelog; + __u8 sb_inopblog; + __u8 sb_agblklog; + __u8 sb_rextslog; + __u8 sb_inprogress; + __u8 sb_imax_pct; + __be64 sb_icount; + __be64 sb_ifree; + __be64 sb_fdblocks; + __be64 sb_frextents; + __be64 sb_uquotino; + __be64 sb_gquotino; + __be16 
sb_qflags; + __u8 sb_flags; + __u8 sb_shared_vn; + __be32 sb_inoalignmt; + __be32 sb_unit; + __be32 sb_width; + __u8 sb_dirblklog; + __u8 sb_logsectlog; + __be16 sb_logsectsize; + __be32 sb_logsunit; + __be32 sb_features2; + __be32 sb_bad_features2; /* version 5 superblock fields start here */ - __uint32_t sb_features_compat; - __uint32_t sb_features_ro_compat; - __uint32_t sb_features_incompat; - __uint32_t sb_features_log_incompat; + __be32 sb_features_compat; + __be32 sb_features_ro_compat; + __be32 sb_features_incompat; + __be32 sb_features_log_incompat; + __le32 sb_crc; + __be32 sb_spino_align; + __be64 sb_pquotino; + __be64 sb_lsn; + uuid_t sb_meta_uuid; + __be64 sb_rrmapino; - __uint32_t sb_crc; - xfs_extlen_t sb_spino_align; - - xfs_ino_t sb_pquotino; - xfs_lsn_t sb_lsn; - uuid_t sb_meta_uuid; - xfs_ino_t sb_rrmapino; + /* must be padded to 64 bit alignment */ }; ---- *sb_magicnum*:: ^ permalink raw reply related [flat|nested] 17+ messages in thread
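Because every multi-byte integer in struct xfs_dsb is big-endian (__be16/__be32/__be64), except the little-endian sb_crc, a reader on a little-endian host has to byte-swap each field. The sketch below decodes the first three fields from a raw superblock buffer with explicit shifts, which works regardless of host byte order. The byte offsets 0, 4 and 8 follow from the field order in the struct above; the helper and struct names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define XFS_SB_MAGIC 0x58465342u /* "XFSB", as printed by xfs_db */

/* Decode a big-endian 32-bit value starting at p. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a big-endian 64-bit value starting at p. */
static uint64_t get_be64(const unsigned char *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Decode a little-endian 32-bit value, as needed for sb_crc (__le32). */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical host-endian copy of the first three superblock fields. */
struct sb_head {
	uint32_t magicnum;
	uint32_t blocksize;
	uint64_t dblocks;
};

/*
 * Parse the first 16 bytes of an ondisk superblock.  Returns 0 on
 * success, or -1 if the buffer is too short or the magic number is wrong.
 */
static int parse_sb_head(const unsigned char *buf, size_t len,
			 struct sb_head *out)
{
	if (len < 16)
		return -1;
	out->magicnum  = get_be32(buf + 0); /* sb_magicnum */
	out->blocksize = get_be32(buf + 4); /* sb_blocksize */
	out->dblocks   = get_be64(buf + 8); /* sb_dblocks */
	return out->magicnum == XFS_SB_MAGIC ? 0 : -1;
}
```

Fed the leading bytes of the superblock from the earlier xfs_db example, parse_sb_head would report a blocksize of 4096 and dblocks of 62769952.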
* Re: [PATCH 2/3] design: document the actual ondisk superblock 2024-11-07 23:25 ` [PATCH 2/3] design: document the actual ondisk superblock Darrick J. Wong @ 2024-11-08 14:16 ` Christoph Hellwig 0 siblings, 0 replies; 17+ messages in thread From: Christoph Hellwig @ 2024-11-08 14:16 UTC (permalink / raw) To: Darrick J. Wong; +Cc: hch, linux-xfs Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 3/3] design: document the changes required to handle metadata directories 2024-11-07 23:24 ` [PATCHSET v5.6 1/2] xfs-documentation: document metadata directories Darrick J. Wong 2024-11-07 23:25 ` [PATCH 1/3] design: move superblock documentation to a separate file Darrick J. Wong 2024-11-07 23:25 ` [PATCH 2/3] design: document the actual ondisk superblock Darrick J. Wong @ 2024-11-07 23:26 ` Darrick J. Wong 2 siblings, 0 replies; 17+ messages in thread From: Darrick J. Wong @ 2024-11-07 23:26 UTC (permalink / raw) To: djwong; +Cc: hch, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Document the ondisk format changes for metadata directories. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> --- .../internal_inodes.asciidoc | 113 ++++++++++++++++++++ .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 22 ++++ .../XFS_Filesystem_Structure/superblock.asciidoc | 14 +- 3 files changed, 142 insertions(+), 7 deletions(-) diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc index 84e4cb969ce392..eaa0a50aa848f3 100644 --- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc +++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc @@ -5,6 +5,119 @@ XFS allocates several inodes when a filesystem is created. These are internal and not accessible from the standard directory structure. These inodes are only accessible from the superblock. +[[Metadata_Directories]] +== Metadata Directory Tree + +If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +sb_metadirino+ +field in the superblock points to the root of a directory tree containing +metadata files. This directory tree is completely internal to the filesystem +and must not be exposed to user programs. + +When this feature is enabled, metadata files should be found by walking the +metadata directory tree. 
The superblock fields that formerly pointed to (some) +of those inodes are no longer used and may be reused by future features. + +.Metadata Directory Paths +[options="header"] +|===== +| Metadata File | Location +|===== + +Metadata files are flagged by the +XFS_DIFLAG2_METADATA+ flag in the ++di_flags2+ field. Metadata files must have the following properties: + +* Must be either a directory or a regular file. +* Access mode 0000 (as if set with +chmod 0000+). +* User and group IDs set to zero. +* The +XFS_DIFLAG_IMMUTABLE+, +XFS_DIFLAG_SYNC+, +XFS_DIFLAG_NOATIME+, +XFS_DIFLAG_NODUMP+, and +XFS_DIFLAG_NODEFRAG+ flags must all be set in +di_flags+. +* For a directory, the +XFS_DIFLAG_NOSYMLINKS+ flag must also be set. +* The +XFS_DIFLAG2_METADATA+ flag must be set in +di_flags2+. +* The +XFS_DIFLAG2_DAX+ flag must not be set. + +=== Metadata Directory Example + +This example shows a metadata directory from a freshly formatted root +filesystem: + +---- +xfs_db> sb 0 +xfs_db> p +magicnum = 0x58465342 +blocksize = 4096 +dblocks = 5192704 +rblocks = 0 +rextents = 0 +uuid = cbf2ceef-658e-46b0-8f96-785661c37976 +logstart = 4194311 +rootino = 128 +rbmino = 130 +rsumino = 131 +... +meta_uuid = 00000000-0000-0000-0000-000000000000 +metadirino = 129 +... +---- + +Notice how the listing includes the root of the metadata directory tree +(+metadirino+). + +---- +xfs_db> path -m / +xfs_db> ls +8 129 directory 0x0000002e 1 . (good) +10 129 directory 0x0000172e 2 .. (good) +12 33685632 directory 0x2d18ab4c 8 rtgroups (good) +---- + +Here we use the +path+ and +ls+ commands to display the root directory of +the metadata directory tree. We can also display the directory inode the old way: + +---- +xfs_db> p +core.magic = 0x494e +core.mode = 040000 +core.version = 3 +core.format = 1 (local) +core.onlink = 0 +core.uid = 0 +core.gid = 0 +...
+v3.flags2 = 0x8000000000000018 +v3.cowextsize = 0 +v3.crtime.sec = Wed Aug 7 10:22:36 2024 +v3.crtime.nsec = 273744000 +v3.inumber = 129 +v3.uuid = 7e55b909-8728-4d69-a1fa-891427314eea +v3.reflink = 0 +v3.cowextsz = 0 +v3.dax = 0 +v3.bigtime = 1 +v3.nrext64 = 1 +v3.metadata = 1 +u3.sfdir3.hdr.count = 1 +u3.sfdir3.hdr.i8count = 0 +u3.sfdir3.hdr.parent.i4 = 129 +u3.sfdir3.list[0].namelen = 8 +u3.sfdir3.list[0].offset = 0x60 +u3.sfdir3.list[0].name = "rtgroups" +u3.sfdir3.list[0].inumber.i4 = 33685632 +u3.sfdir3.list[0].filetype = 2 +---- + +The root of the metadata directory tree is a short-form directory and looks just +like any other directory. The only differences are that the metadata flag is +set and that the directory can only be viewed in the XFS debugger. + +---- +xfs_db> path -m /rtgroups/0.rmap +xfs_db> btdump +u3.rtrmapbt.recs[1] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock] +1:[0,1,-3,0,0,0,0] +---- + +Observe that we can use the xfs_db +path+ command to navigate the metadata +directory tree to the realtime reverse mapping B+tree file of realtime group 0 and display its contents. + +[[Quota_Inodes]] == Quota Inodes diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc index 34c064871cb255..02ec0d12bb57e5 100644 --- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc +++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc @@ -78,7 +78,10 @@ struct xfs_dinode_core { __uint16_t di_mode; __int8_t di_version; __int8_t di_format; - __uint16_t di_onlink; + union { + __uint16_t di_onlink; + __uint16_t di_metatype; + }; __uint32_t di_uid; __uint32_t di_gid; __uint32_t di_nlink; @@ -188,6 +191,17 @@ In v1 inodes, this specifies the number of links to the inode from directories. When the number exceeds 65535, the inode is converted to v2 and the link count is stored in +di_nlink+.
+*di_metatype*:: +If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +di_onlink+ field +is redefined to declare the intended contents of files in the metadata +directory tree. + +[source, c] +---- +enum xfs_metafile_type { +}; +---- + *di_uid*:: Specifies the owner's UID of the inode. @@ -383,6 +397,12 @@ will be copied to all newly created files and directories. Files with this flag set may have up to (2^48^ - 1) extents mapped to the data fork and up to (2^32^ - 1) extents mapped to the attribute fork. This flag requires the +XFS_SB_FEAT_INCOMPAT_NREXT64+ feature to be enabled. +| +XFS_DIFLAG2_METADATA+ | +This file contains filesystem metadata. This flag requires the ++XFS_SB_FEAT_INCOMPAT_METADIR+ feature to be enabled. See the section about +xref:Metadata_Directories[metadata directories] for more information on +metadata inode properties. Only directories and regular files can have this +flag set. |===== *di_cowextsize*:: diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc index 79e8c30dc93e79..56877615ae81bf 100644 --- a/design/XFS_Filesystem_Structure/superblock.asciidoc +++ b/design/XFS_Filesystem_Structure/superblock.asciidoc @@ -69,7 +69,7 @@ struct xfs_dsb { __be64 sb_pquotino; __be64 sb_lsn; uuid_t sb_meta_uuid; - __be64 sb_rrmapino; + __be64 sb_metadirino; /* must be padded to 64 bit alignment */ }; @@ -438,6 +438,10 @@ information. Directory parent pointers. See the section about xref:Parent_Pointers[parent pointers] for more information. +| +XFS_SB_FEAT_INCOMPAT_METADIR+ | +Metadata directory tree. See the section about the xref:Metadata_Directories[ +metadata directory tree] for more information. + |===== *sb_features_log_incompat*:: @@ -471,11 +475,9 @@ If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in all metadata blocks must match this UUID. If not, the block header UUID field must match +sb_uuid+.
-*sb_rrmapino*:: -If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time -device is present (+sb_rblocks+ > 0), this field points to an inode -that contains the root to the -xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree]. +*sb_metadirino*:: +If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is set, this field points to +the inode of the root directory of the metadata directory tree. This field is zero otherwise. === xfs_db Superblock Example ^ permalink raw reply related [flat|nested] 17+ messages in thread
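The metadata file rules listed in this patch can be expressed as a single predicate. The C sketch below assumes the inode flag values from the Linux kernel's xfs_format.h (they are not given in this document) and uses a hypothetical struct meta_inode holding already-decoded inode fields. Under those values, the example's v3.flags2 = 0x8000000000000018 decodes as XFS_DIFLAG2_METADATA plus the BIGTIME (1 << 3) and NREXT64 (1 << 4) bits, which agrees with the v3.bigtime, v3.nrext64 and v3.metadata lines of the xfs_db output.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* Inode flag bits, ASSUMED to match the kernel's xfs_format.h. */
#define XFS_DIFLAG_IMMUTABLE  (1u << 3)
#define XFS_DIFLAG_SYNC       (1u << 5)
#define XFS_DIFLAG_NOATIME    (1u << 6)
#define XFS_DIFLAG_NODUMP     (1u << 7)
#define XFS_DIFLAG_NOSYMLINKS (1u << 10)
#define XFS_DIFLAG_NODEFRAG   (1u << 13)

#define XFS_DIFLAG2_DAX       (1ULL << 0)
#define XFS_DIFLAG2_METADATA  (1ULL << 63)

/* di_flags bits that every metadata file must carry. */
#define METAFILE_DIFLAGS (XFS_DIFLAG_IMMUTABLE | XFS_DIFLAG_SYNC | \
			  XFS_DIFLAG_NOATIME | XFS_DIFLAG_NODUMP | \
			  XFS_DIFLAG_NODEFRAG)

/* Hypothetical host-endian view of the inode fields the rules examine. */
struct meta_inode {
	uint16_t mode;   /* di_mode */
	uint32_t uid;    /* di_uid */
	uint32_t gid;    /* di_gid */
	uint16_t flags;  /* di_flags */
	uint64_t flags2; /* di_flags2 */
};

/* Return true if @ip satisfies the metadata file rules listed above. */
static bool metafile_ok(const struct meta_inode *ip)
{
	bool is_dir = S_ISDIR(ip->mode);

	if (!is_dir && !S_ISREG(ip->mode))
		return false; /* only directories and regular files */
	if (ip->mode & 07777)
		return false; /* permission bits must be 0000 */
	if (ip->uid != 0 || ip->gid != 0)
		return false; /* owned by uid 0 and gid 0 */
	if ((ip->flags & METAFILE_DIFLAGS) != METAFILE_DIFLAGS)
		return false; /* immutable, sync, noatime, nodump, nodefrag */
	if (is_dir && !(ip->flags & XFS_DIFLAG_NOSYMLINKS))
		return false; /* directories also need nosymlinks */
	if (!(ip->flags2 & XFS_DIFLAG2_METADATA))
		return false; /* must be flagged as metadata */
	if (ip->flags2 & XFS_DIFLAG2_DAX)
		return false; /* DAX is not allowed */
	return true;
}
```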
* [PATCHSET v5.6 2/2] xfs-documentation: shard the realtime section 2024-11-07 23:21 [PATCHBOMB 6.13 v5.6] xfs-docs: metadata directories and realtime groups Darrick J. Wong 2024-11-07 23:24 ` Darrick J. Wong 2024-11-07 23:24 ` [PATCHSET v5.6 1/2] xfs-documentation: document metadata directories Darrick J. Wong @ 2024-11-07 23:25 ` Darrick J. Wong 2024-11-07 23:26 ` [PATCH 1/4] design: move discussion of realtime volumes to a separate section Darrick J. Wong ` (3 more replies) 2 siblings, 4 replies; 17+ messages in thread From: Darrick J. Wong @ 2024-11-07 23:25 UTC (permalink / raw) To: djwong; +Cc: hch, linux-xfs Hi all, Document changes to the ondisk format for realtime groups. If you're going to start using this code, I strongly recommend pulling from my git trees, which are linked below. Comments and questions are, as always, welcome. kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-groups xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-groups fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-groups xfsdocs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-groups --- Commits in this patchset: * design: move discussion of realtime volumes to a separate section * design: document realtime groups * design: document metadata directory tree quota changes * design: update metadump v2 format to reflect rt dumps --- .../allocation_groups.asciidoc | 20 - .../XFS_Filesystem_Structure/common_types.asciidoc | 4 .../internal_inodes.asciidoc | 41 -- design/XFS_Filesystem_Structure/magic.asciidoc | 3 design/XFS_Filesystem_Structure/metadump.asciidoc | 12 + .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 5 design/XFS_Filesystem_Structure/realtime.asciidoc | 394 ++++++++++++++++++++ .../XFS_Filesystem_Structure/superblock.asciidoc | 25 + 
.../xfs_filesystem_structure.asciidoc | 2 9 files changed, 450 insertions(+), 56 deletions(-) create mode 100644 design/XFS_Filesystem_Structure/realtime.asciidoc ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/4] design: move discussion of realtime volumes to a separate section 2024-11-07 23:25 ` [PATCHSET v5.6 2/2] xfs-documentation: shard the realtime section Darrick J. Wong @ 2024-11-07 23:26 ` Darrick J. Wong 2024-11-07 23:26 ` [PATCH 2/4] design: document realtime groups Darrick J. Wong ` (2 subsequent siblings) 3 siblings, 0 replies; 17+ messages in thread From: Darrick J. Wong @ 2024-11-07 23:26 UTC (permalink / raw) To: djwong; +Cc: hch, linux-xfs From: Darrick J. Wong <djwong@kernel.org> In preparation for documenting the realtime modernization project, move the discussions of the realtime-related ondisk metadata to a separate file. Since realtime reverse mapping btrees haven't been added to the filesystem yet, stop including them in the final output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> --- .../allocation_groups.asciidoc | 20 -------- .../internal_inodes.asciidoc | 36 +------------- design/XFS_Filesystem_Structure/realtime.asciidoc | 50 ++++++++++++++++++++ .../xfs_filesystem_structure.asciidoc | 2 + 4 files changed, 54 insertions(+), 54 deletions(-) create mode 100644 design/XFS_Filesystem_Structure/realtime.asciidoc diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc index e2cdaab5e03d3f..c746a92ca47dd6 100644 --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc @@ -772,23 +772,3 @@ core.magic = 0x494e The chunk record also indicates that this chunk has 32 inodes, and that the missing inodes are also ``free''. - -[[Real-time_Devices]] -== Real-time Devices - -The performance of the standard XFS allocator varies depending on the internal -state of the various metadata indices enabled on the filesystem. For -applications which need to minimize the jitter of allocation latency, XFS -supports the notion of a ``real-time device''.
This is a special device -separate from the regular filesystem where extent allocations are tracked with -a bitmap and free space is indexed with a two-dimensional array. If an inode -is flagged with +XFS_DIFLAG_REALTIME+, its data will live on the real time -device. The metadata for real time devices is discussed in the section about -xref:Real-time_Inodes[real time inodes]. - -By placing the real time device (and the journal) on separate high-performance -storage devices, it is possible to reduce most of the unpredictability in I/O -response times that come from metadata operations. - -None of the XFS per-AG B+trees are involved with real time files. It is not -possible for real time files to share data blocks. diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc index eaa0a50aa848f3..68c86d30ff8206 100644 --- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc +++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc @@ -287,41 +287,9 @@ Log sequence number of the last DQ block write. *dd_crc*:: Checksum of the DQ block. - [[Real-time_Inodes]] == Real-time Inodes There are two inodes allocated to managing the real-time device's space, the -Bitmap Inode and the Summary Inode. - -[[Real-Time_Bitmap_Inode]] -=== Real-Time Bitmap Inode - -The real time bitmap inode, +sb_rbmino+, tracks the used/free space in the -real-time device using an old-style bitmap. One bit is allocated per real-time -extent. The size of an extent is specified by the superblock's +sb_rextsize+ -value. - -The number of blocks used by the bitmap inode is equal to the number of -real-time extents (+sb_rextents+) divided by the block size (+sb_blocksize+) -and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and -extent array for the inode should match this. Each real time block gets its -own bit in the bitmap. 
- -[[Real-Time_Summary_Inode]] -=== Real-Time Summary Inode - -The real time summary inode, +sb_rsumino+, tracks the used and free space -accounting information for the real-time device. This file indexes the -approximate location of each free extent on the real-time device first by -log2(extent size) and then by the real-time bitmap block number. The size of -the summary inode file is equal to +sb_rbmblocks+ × log2(realtime device size) -× sizeof(+xfs_suminfo_t+). The entry for a given log2(extent size) and -rtbitmap block number is 0 if there is no free extents of that size at that -rtbitmap location, and positive if there are any. - -This data structure is not particularly space efficient, however it is a very -fast way to provide the same data as the two free space B+trees for regular -files since the space is preallocated and metadata maintenance is minimal. - -include::rtrmapbt.asciidoc[] +xref:Real-Time_Bitmap_Inode[Bitmap Inode] and the +xref:Real-Time_Summary_Inode[Summary Inode]. diff --git a/design/XFS_Filesystem_Structure/realtime.asciidoc b/design/XFS_Filesystem_Structure/realtime.asciidoc new file mode 100644 index 00000000000000..11426e8fdb632d --- /dev/null +++ b/design/XFS_Filesystem_Structure/realtime.asciidoc @@ -0,0 +1,50 @@ +[[Real-time_Devices]] += Real-time Devices + +The performance of the standard XFS allocator varies depending on the internal +state of the various metadata indices enabled on the filesystem. For +applications which need to minimize the jitter of allocation latency, XFS +supports the notion of a ``real-time device''. This is a special device +separate from the regular filesystem where extent allocations are tracked with +a bitmap and free space is indexed with a two-dimensional array. If an inode +is flagged with +XFS_DIFLAG_REALTIME+, its data will live on the real time +device. 
+ +By placing the real time device (and the journal) on separate high-performance +storage devices, it is possible to reduce most of the unpredictability in I/O +response times that come from metadata operations. + +None of the XFS per-AG B+trees are involved with real time files. It is not +possible for real time files to share data blocks. + +[[Real-Time_Bitmap_Inode]] +== Free Space Bitmap Inode + +The real time bitmap inode, +sb_rbmino+, tracks the used/free space in the +real-time device using an old-style bitmap. One bit is allocated per real-time +extent. The size of an extent is specified by the superblock's +sb_rextsize+ +value. + +The number of blocks used by the bitmap inode is equal to the number of +real-time extents (+sb_rextents+) divided by the block size (+sb_blocksize+) +and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and +extent array for the inode should match this. Each real time block gets its +own bit in the bitmap. + +[[Real-Time_Summary_Inode]] +== Free Space Summary Inode + +The real time summary inode, +sb_rsumino+, tracks the used and free space +accounting information for the real-time device. This file indexes the +approximate location of each free extent on the real-time device first by +log2(extent size) and then by the real-time bitmap block number. The size of +the summary inode file is equal to +sb_rbmblocks+ × log2(realtime device size) +× sizeof(+xfs_suminfo_t+). The entry for a given log2(extent size) and +rtbitmap block number is 0 if there is no free extents of that size at that +rtbitmap location, and positive if there are any. + +This data structure is not particularly space efficient, however it is a very +fast way to provide the same data as the two free space B+trees for regular +files since the space is preallocated and metadata maintenance is minimal. 
+
+include::rtrmapbt.asciidoc[]
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index 689e2a874c13e9..a643d18add6094 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -84,6 +84,8 @@ include::journaling_log.asciidoc[]
 
 include::internal_inodes.asciidoc[]
 
+include::realtime.asciidoc[]
+
 include::fs_properties.asciidoc[]
 
 :leveloffset: 0
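The classic bitmap sizing rule and the two-level summary indexing described in realtime.asciidoc can be sketched in C. This is a minimal sketch under stated assumptions: the division by (block size × bits per byte) is taken to round up, there is no per-block header (true only without metadir), and the summary counters are assumed to be laid out size-class-major, which is one reading of "first by log2(extent size) and then by the real-time bitmap block number". The helper names are hypothetical and are not xfsprogs or kernel functions.

```c
#include <stdint.h>

/* Bits per byte (called NBBY in the XFS sources). */
#define BITS_PER_BYTE 8

/*
 * Hypothetical helper: rtbitmap blocks needed for one bit per realtime
 * extent in the classic layout (no block header), rounding up so that a
 * partly used final block is still counted.
 */
static uint64_t rtbitmap_blocks(uint64_t rextents, uint32_t blocksize)
{
	uint64_t bits_per_block = (uint64_t)blocksize * BITS_PER_BYTE;

	return (rextents + bits_per_block - 1) / bits_per_block;
}

/*
 * Hypothetical helper: position of the summary counter for free extents
 * whose length falls in size class 'log' (log2 of the length) and which
 * start in rtbitmap block 'bbno'.  Assumes counters are grouped first by
 * size class, then by bitmap block.
 */
static uint64_t rtsummary_index(unsigned int log, uint64_t bbno,
				uint64_t rbmblocks)
{
	return (uint64_t)log * rbmblocks + bbno;
}
```

With 4096-byte blocks each bitmap block covers 32768 extents, so a device with 5192704 one-block extents needs rtbitmap_blocks(5192704, 4096) = 159 bitmap blocks, and the counter for size class 2 in bitmap block 5 of that bitmap would sit at index 2 * 159 + 5 = 323.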
* [PATCH 2/4] design: document realtime groups 2024-11-07 23:25 ` [PATCHSET v5.6 2/2] xfs-documentation: shard the realtime section Darrick J. Wong 2024-11-07 23:26 ` [PATCH 1/4] design: move discussion of realtime volumes to a separate section Darrick J. Wong @ 2024-11-07 23:26 ` Darrick J. Wong 2024-11-08 14:17 ` Christoph Hellwig 2024-11-07 23:26 ` [PATCH 3/4] design: document metadata directory tree quota changes Darrick J. Wong 2024-11-07 23:27 ` [PATCH 4/4] design: update metadump v2 format to reflect rt dumps Darrick J. Wong 3 siblings, 1 reply; 17+ messages in thread From: Darrick J. Wong @ 2024-11-07 23:26 UTC (permalink / raw) To: djwong; +Cc: hch, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Document the ondisk changes for realtime allocation groups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- .../XFS_Filesystem_Structure/common_types.asciidoc | 4 .../internal_inodes.asciidoc | 2 design/XFS_Filesystem_Structure/magic.asciidoc | 3 .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 2 design/XFS_Filesystem_Structure/realtime.asciidoc | 344 ++++++++++++++++++++ .../XFS_Filesystem_Structure/superblock.asciidoc | 22 + 6 files changed, 376 insertions(+), 1 deletion(-) diff --git a/design/XFS_Filesystem_Structure/common_types.asciidoc b/design/XFS_Filesystem_Structure/common_types.asciidoc index 51909be384e273..34cdfdaeccf848 100644 --- a/design/XFS_Filesystem_Structure/common_types.asciidoc +++ b/design/XFS_Filesystem_Structure/common_types.asciidoc @@ -43,7 +43,9 @@ Unsigned 64 bit raw filesystem block number. *xfs_rtblock_t*:: Unsigned 64 bit extent number in the xref:Real-time_Devices[real-time] -sub-volume. +sub-volume. If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, these +values combine an xref:Realtime_Groups[rtgroup number] and block offset into +the realtime group. *xfs_fileoff_t*:: Unsigned 64 bit block offset into a file. 
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc index 68c86d30ff8206..5f4d62201cbd67 100644 --- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc +++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc @@ -21,6 +21,8 @@ of those inodes have been deallocated and may be reused by future features. [options="header"] |===== | Metadata File | Location +| xref:Real-Time_Bitmap_Inode[Realtime Bitmap] | /rtgroups/*.bitmap +| xref:Real-Time_Summary_Inode[Realtime Summary] | /rtgroups/*.summary |===== Metadata files are flagged by the +XFS_DIFLAG2_METADATA+ flag in the diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc index 60952aeb876ff5..5da29b9ef9f3a8 100644 --- a/design/XFS_Filesystem_Structure/magic.asciidoc +++ b/design/XFS_Filesystem_Structure/magic.asciidoc @@ -45,9 +45,12 @@ relevant chapters. Magic numbers tend to have consistent locations: | +XFS_ATTR3_LEAF_MAGIC+ | 0x3bee | | xref:Leaf_Attributes[Leaf Attribute], v5 only | +XFS_ATTR3_RMT_MAGIC+ | 0x5841524d | XARM | xref:Remote_Values[Remote Attribute Value], v5 only | +XFS_RMAP_CRC_MAGIC+ | 0x524d4233 | RMB3 | xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only +| +XFS_RTBITMAP_MAGIC+ | 0x424D505A | BMPZ | xref:Real-Time_Bitmap_Inode[Real-Time Bitmap], metadir only +| +XFS_RTSUMMARY_MAGIC+ | 0x53554D59 | SUMY | xref:Real-Time_Summary_Inode[Real-Time Summary], metadir only | +XFS_RTRMAP_CRC_MAGIC+ | 0x4d415052 | MAPR | xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree], v5 only | +XFS_REFC_CRC_MAGIC+ | 0x52334643 | R3FC | xref:Reference_Count_Btree[Reference Count B+tree], v5 only | +XFS_MD_MAGIC+ | 0x5846534d | XFSM | xref:Metadata_Dumps[Metadata Dumps] +| +XFS_RTSB_MAGIC+ | 0x46726F67 | Frog | xref:Realtime_Groups[Realtime Groups] |===== The magic numbers for log items are at offset zero in each log item, but items diff --git 
a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc index 02ec0d12bb57e5..e28929907147b7 100644 --- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc +++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc @@ -199,6 +199,8 @@ directory tree. [source, c] ---- enum xfs_metafile_type { + XFS_METAFILE_RTBITMAP, + XFS_METAFILE_RTSUMMARY, }; ---- diff --git a/design/XFS_Filesystem_Structure/realtime.asciidoc b/design/XFS_Filesystem_Structure/realtime.asciidoc index 11426e8fdb632d..3a72eb5175ad89 100644 --- a/design/XFS_Filesystem_Structure/realtime.asciidoc +++ b/design/XFS_Filesystem_Structure/realtime.asciidoc @@ -31,6 +31,146 @@ and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and extent array for the inode should match this. Each real time block gets its own bit in the bitmap. +If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, each block of the +realtime bitmap file has a header of the following format: + +[source, c] +---- +struct xfs_rtbuf_blkinfo { + __be32 rt_magic; + __be32 rt_crc; + __be64 rt_owner; + __be64 rt_blkno; + __be64 rt_lsn; + uuid_t rt_uuid; +}; +---- + +*rt_magic*:: +Specifies the magic number for the rtbitmap block: ``BMPZ'' (0x424D505A). + +*rt_crc*:: +Checksum of the block. + +*rt_owner*:: +Specifies the inode number for the file that owns this block. + +*rt_blkno*:: +Disk address of this block. + +*rt_lsn*:: +Log sequence number of the last write to this block. + +*rt_uuid*:: +The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+ +depending on which features are set. + +After the block header, the bitmap data are encoded as be32 word values. 
+ +=== xfs_db rtbitmap Example + +This example shows a real-time bitmap file from a freshly populated filesystem: + +---- +xfs_db> path -m /rtgroups/3.bitmap +xfs_db> p +core.magic = 0x494e +core.mode = 0100000 +core.version = 3 +core.format = 2 (extents) +core.metatype = 5 (rtbitmap) +core.uid = 0 +core.gid = 0 +core.nlinkv2 = 1 +core.projid_lo = 3 +core.projid_hi = 0 +core.nextents = 1 +core.atime.sec = Tue Oct 15 16:04:02 2024 +core.atime.nsec = 769675000 +core.mtime.sec = Tue Oct 15 16:04:02 2024 +core.mtime.nsec = 769675000 +core.ctime.sec = Tue Oct 15 16:04:02 2024 +core.ctime.nsec = 769681000 +core.size = 135168 +core.nblocks = 33 +core.extsize = 0 +core.naextents = 0 +core.forkoff = 24 +core.aformat = 1 (local) +core.dmevmask = 0 +core.dmstate = 0 +core.newrtbm = 0 +core.prealloc = 0 +core.realtime = 0 +core.immutable = 1 +core.append = 0 +core.sync = 1 +core.noatime = 1 +core.nodump = 1 +core.rtinherit = 0 +core.projinherit = 0 +core.nosymlinks = 0 +core.extsz = 0 +core.extszinherit = 0 +core.nodefrag = 1 +core.filestream = 0 +core.gen = 2653591217 +next_unlinked = null +v3.crc = 0x34a17119 (correct) +v3.change_count = 3 +v3.lsn = 0 +v3.flags2 = 0x38 +v3.cowextsize = 0 +v3.crtime.sec = Tue Oct 15 16:04:02 2024 +v3.crtime.nsec = 769675000 +v3.inumber = 33685633 +v3.uuid = a6575f59-1514-445e-883e-211b2c5a0f05 +v3.reflink = 0 +v3.cowextsz = 0 +v3.dax = 0 +v3.bigtime = 1 +v3.nrext64 = 1 +v3.metadata = 1 +u3.bmx[0] = [startoff,startblock,blockcount,extentflag] +0:[0,4210712,33,0] +a.sfattr.hdr.totsize = 27 +a.sfattr.hdr.count = 1 +a.sfattr.list[0].namelen = 8 +a.sfattr.list[0].valuelen = 12 +a.sfattr.list[0].root = 0 +a.sfattr.list[0].secure = 0 +a.sfattr.list[0].parent = 1 +a.sfattr.list[0].name = "0.bitmap" +a.sfattr.list[0].parent_dir.inumber = 33685632 +a.sfattr.list[0].parent_dir.gen = 142228546 +xfs_db> dblock 0 +xfs_db> p +magicnum = 0x424d505a +crc = 0xc8b10abf (correct) +owner = 33685633 +bno = 20902080 +lsn = 0x100007696 +uuid = 
a6575f59-1514-445e-883e-211b2c5a0f05 +rtwords[0-1011] = 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0 +14:0 15:0 16:0 17:0 18:0 19:0 20:0 21:0xfffff800 22:0xffffffff 23:0xffffffff +24:0xffffffff 25:0xffffffff 26:0xffffffff 27:0xffffffff 28:0xffffffff +29:0xffffffff 30:0xffffffff 31:0xffffffff 32:0xffffffff +... +979:0xffffffff 980:0xffffffff 981:0xffffffff 982:0xffffffff 983:0xffffffff +984:0xffffffff 985:0xffffffff 986:0xffffffff 987:0xffffffff 988:0xffffffff +989:0xffffffff 990:0xffffffff 991:0xffffffff 992:0xffffffff 993:0xffffffff +994:0xffffffff 995:0xffffffff 996:0xffffffff 997:0xffffffff 998:0xffffffff +999:0xffffffff 1000:0xffffffff 1001:0xffffffff 1002:0xffffffff 1003:0xffffffff +1004:0xffffffff 1005:0xffffffff 1006:0xffffffff 1007:0xffffffff 1008:0xffffffff +1009:0xffffffff 1010:0xffffffff 1011:0xffffffff +---- + +From this example, we can clearly see that this is a bitmap file in the +metadata directory tree, and that it is the bitmap file for rtgroup 3. When we +access the first block in the bitmap file, we can clearly see the new block +header and that the first 179 extents are allocated. The bitmap words were +excerpted for brevity. + [[Real-Time_Summary_Inode]] == Free Space Summary Inode @@ -47,4 +187,208 @@ This data structure is not particularly space efficient, however it is a very fast way to provide the same data as the two free space B+trees for regular files since the space is preallocated and metadata maintenance is minimal. +If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, each block of the +realtime summary file has the same header as rtbitmap file blocks. However, +the magic number will be ``SUMY'' (0x53554D59). After the block header, the +summary counts are encoded as be32 integers. 
+ +=== xfs_db rtsummary Example + +This example shows a real-time summary file from a freshly populated filesystem: + +---- +xfs_db> path -m /rtgroups/3.summary +xfs_db> p +core.magic = 0x494e +core.mode = 0100000 +core.version = 3 +core.format = 2 (extents) +core.metatype = 6 (rtsummary) +core.uid = 0 +core.gid = 0 +core.nlinkv2 = 1 +core.projid_lo = 3 +core.projid_hi = 0 +core.nextents = 1 +core.atime.sec = Tue Oct 15 16:04:02 2024 +core.atime.nsec = 769694000 +core.mtime.sec = Tue Oct 15 16:04:02 2024 +core.mtime.nsec = 769694000 +core.ctime.sec = Tue Oct 15 16:04:02 2024 +core.ctime.nsec = 769699000 +core.size = 4096 +core.nblocks = 1 +core.extsize = 0 +core.naextents = 0 +core.forkoff = 24 +core.aformat = 1 (local) +core.dmevmask = 0 +core.dmstate = 0 +core.newrtbm = 0 +core.prealloc = 0 +core.realtime = 0 +core.immutable = 1 +core.append = 0 +core.sync = 1 +core.noatime = 1 +core.nodump = 1 +core.rtinherit = 0 +core.projinherit = 0 +core.nosymlinks = 0 +core.extsz = 0 +core.extszinherit = 0 +core.nodefrag = 1 +core.filestream = 0 +core.gen = 519466891 +next_unlinked = null +v3.crc = 0x54fc58d0 (correct) +v3.change_count = 3 +v3.lsn = 0 +v3.flags2 = 0x38 +v3.cowextsize = 0 +v3.crtime.sec = Tue Oct 15 16:04:02 2024 +v3.crtime.nsec = 769694000 +v3.inumber = 33685634 +v3.uuid = a6575f59-1514-445e-883e-211b2c5a0f05 +v3.reflink = 0 +v3.cowextsz = 0 +v3.dax = 0 +v3.bigtime = 1 +v3.nrext64 = 1 +v3.metadata = 1 +u3.bmx[0] = [startoff,startblock,blockcount,extentflag] +0:[0,4210703,1,0] +a.sfattr.hdr.totsize = 28 +a.sfattr.hdr.count = 1 +a.sfattr.list[0].namelen = 9 +a.sfattr.list[0].valuelen = 12 +a.sfattr.list[0].root = 0 +a.sfattr.list[0].secure = 0 +a.sfattr.list[0].parent = 1 +a.sfattr.list[0].name = "0.summary" +a.sfattr.list[0].parent_dir.inumber = 33685632 +a.sfattr.list[0].parent_dir.gen = 142228546 +xfs_db> dblock 0 +xfs_db> p +magicnum = 0x53554d59 +crc = 0x473340a8 (correct) +owner = 33685634 +bno = 20902008 +lsn = 0x100007696 +uuid = 
a6575f59-1514-445e-883e-211b2c5a0f05 +suminfo[0-1011] = 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0 +14:0 15:0 16:0 17:0 18:0 19:0 20:0 21:0 22:0 23:0 24:0 25:0 26:0 27:0 28:0 29:0 +30:0 31:0 32:0 +... +618:0 619:0 620:0 621:0 622:0 623:0 624:0 625:0 626:0 627:1 628:0 629:0 630:0 +... +979:0 980:0 981:0 982:0 983:0 984:0 985:0 986:0 987:0 988:0 989:0 990:0 991:0 +992:0 993:0 994:0 995:0 996:0 997:0 998:0 999:0 1000:0 1001:0 1002:0 1003:0 +1004:0 1005:0 1006:0 1007:0 1008:0 1009:0 1010:0 1011:0 +---- + +From this example, we can clearly see that this is a summary file in the +metadata directory tree, and that it is the summary file for rtgroup 3. When +we access the first block in the summary file, we can clearly see the new block +header and the nonzero counter for the one large free extent in this group. +The summary counts were excerpted for brevity. + +[[Realtime_Groups]] +== Realtime Groups + +To reduce metadata contention for space allocation and remapping activities +being applied to realtime files, the realtime volume can be split into +allocation groups, just like the data volume. The free space information is +still contained in a single file that applies to the entire volume. + +Each realtime allocation group can contain up to (2^31^ - 1) filesystem blocks, +regardless of the underlying realtime extent size. + +Each realtime group has the following characteristics: + + * Group 0 has a super block describing overall filesystem info + * Free space bitmap + * Summary of free space + +The free space metadata are the same as described in the previous sections, +except that their scope covers only a single rtgroup. The other structures are +expanded upon in the following sections. + +[[Realtime_Group_Superblocks]] +=== Superblocks + +The first block of each realtime group contains a superblock. These fields +must match their counterparts in the filesystem superblock on the data device. 
+ +[source, c] +---- +struct xfs_rtsb { + __be32 rsb_magicnum; + __le32 rsb_crc; + + __be32 rsb_pad; + unsigned char rsb_fname[XFSLABEL_MAX]; + + uuid_t rsb_uuid; + uuid_t rsb_meta_uuid; + + /* must be padded to 64 bit alignment */ +}; +---- + +*rsb_magicnum*:: +Identifies the filesystem. Its value is +XFS_RTSB_MAGIC+ ``Frog'' (0x46726F67). + +*rsb_crc*:: +Superblock checksum. + +*rsb_pad*:: +Must be zero. + +*rsb_fname[12]*:: +Name for the filesystem. This matches +sb_fname+ in the primary superblock. + +*rsb_uuid*:: +UUID (Universally Unique ID) for the filesystem. This matches +sb_uuid+ in the +primary superblock. + +*rsb_meta_uuid*:: +Metadata UUID for the filesystem. This matches +sb_meta_uuid+ in the primary +superblock. + +==== xfs_db rtgroup Superblock Example + +A filesystem is made on a multidisk filesystem with the following command: + +---- +# mkfs.xfs -r rtgroups=1,rgcount=4,rtdev=/dev/sdb /dev/sda -f +meta-data=/dev/sda isize=512 agcount=4, agsize=1298176 blks + = sectsz=512 attr=2, projid32bit=1 + = crc=1 finobt=1, sparse=1, rmapbt=1 + = reflink=1 bigtime=1 inobtcount=1 nrext64=1 + = metadir=1 +data = bsize=4096 blocks=5192704, imaxpct=25 + = sunit=0 swidth=0 blks +naming =version 2 bsize=4096 ascii-ci=0, ftype=1 +log =internal log bsize=4096 blocks=16384, version=2 + = sectsz=512 sunit=0 blks, lazy-count=1 +realtime =/dev/sdb extsz=4096 blocks=5192704, rtextents=5192704 + = rgcount=5 rgsize=1048576 extents +---- + +And in xfs_db, inspecting the realtime group superblock and then the regular +superblock: + +---- +# xfs_db -R /dev/sdb /dev/sda +xfs_db> rtsb +xfs_db> print +magicnum = 0x46726f67 +crc = 0x759a62d4 (correct) +pad = 0 +fname = "\000\000\000\000\000\000\000\000\000\000\000\000" +uuid = 7e55b909-8728-4d69-a1fa-891427314eea +meta_uuid = 7e55b909-8728-4d69-a1fa-891427314eea +---- + include::rtrmapbt.asciidoc[] diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc index 
56877615ae81bf..bffb1659d0ba38 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -70,6 +70,10 @@ struct xfs_dsb {
 	__be64		sb_lsn;
 	uuid_t		sb_meta_uuid;
 	__be64		sb_metadirino;
+	__be32		sb_rgcount;
+	__be32		sb_rgextents;
+	__u8		sb_rgblklog;
+	__u8		sb_pad[7];
 
 	/* must be padded to 64 bit alignment */
 };
@@ -480,6 +484,24 @@ If the +XFS_SB_FEAT_RO_INCOMPAT_METADIR+ feature is set, this field points to
 the inode of the root directory of the metadata directory tree.
 This field is zero otherwise.
 
+*sb_rgcount*::
+Count of realtime groups in the filesystem, if the
++XFS_SB_FEAT_RO_INCOMPAT_METADIR+ feature is enabled. If no realtime subvolume
+exists, this value will be zero.
+
+*sb_rgextents*::
+Maximum number of realtime extents that can be contained within a realtime
+group, if the +XFS_SB_FEAT_RO_INCOMPAT_METADIR+ feature is enabled.
+
+*sb_rgblklog*::
+If the +XFS_SB_FEAT_RO_INCOMPAT_METADIR+ feature is enabled, this is the log~2~
+value of +sb_rgextents+ * +sb_rextsize+ (rounded up). This value is used to
+generate absolute block numbers defined in extent maps from the segmented
++xfs_rtblock_t+ values.
+
+*sb_pad[7]*::
+Zeroes, if the +XFS_SB_FEAT_RO_INCOMPAT_METADIR+ feature is enabled.
+
 === xfs_db Superblock Example
 
 A filesystem is made on a single disk with the following command:
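The numbers in the xfs_db examples of this patch can be cross-checked with a little arithmetic. A 48-byte xfs_rtbuf_blkinfo header leaves 1012 32-bit words in a 4096-byte block, which matches rtwords[0-1011] and suminfo[0-1011]. A group of 1048576 one-block extents (the rgsize reported by mkfs, with extsz=4096) then needs 33 bitmap blocks, matching core.nblocks = 33 and core.size = 135168 for the bitmap file. The sketch below assumes the header has no padding, that each bitmap word covers 32 extents, and that the realtime extent size is one block; the struct and helper names are hypothetical, and the fields use host-order integers only for size arithmetic.

```c
#include <stdint.h>

/*
 * Host-side mirror of struct xfs_rtbuf_blkinfo for size arithmetic only.
 * On disk the integer fields are big-endian (__be32/__be64); uuid_t is
 * modelled here as 16 raw bytes.
 */
struct rtbuf_blkinfo_sketch {
	uint32_t      rt_magic;
	uint32_t      rt_crc;
	uint64_t      rt_owner;
	uint64_t      rt_blkno;
	uint64_t      rt_lsn;
	unsigned char rt_uuid[16];
};

_Static_assert(sizeof(struct rtbuf_blkinfo_sketch) == 48,
	       "expected a 48-byte block header with no padding");

/* Build a 32-bit magic number from its four ASCII characters. */
static uint32_t magic4(const char *s)
{
	return ((uint32_t)(unsigned char)s[0] << 24) |
	       ((uint32_t)(unsigned char)s[1] << 16) |
	       ((uint32_t)(unsigned char)s[2] << 8) |
	       (uint32_t)(unsigned char)s[3];
}

/* 32-bit bitmap words (or summary counters) that fit after the header. */
static uint32_t rtwords_per_block(uint32_t blocksize)
{
	return (blocksize - (uint32_t)sizeof(struct rtbuf_blkinfo_sketch)) /
	       (uint32_t)sizeof(uint32_t);
}

/* Bitmap blocks for one rtgroup, one bit per extent, rounded up. */
static uint64_t rtgroup_bitmap_blocks(uint64_t rgextents, uint32_t blocksize)
{
	uint64_t bits_per_block = (uint64_t)rtwords_per_block(blocksize) * 32;

	return (rgextents + bits_per_block - 1) / bits_per_block;
}
```

magic4() also confirms that the ASCII spellings in the magic number table agree with the hex values: "BMPZ" is 0x424D505A, "SUMY" is 0x53554D59, and "Frog" is 0x46726F67.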
* Re: [PATCH 2/4] design: document realtime groups
From: Christoph Hellwig @ 2024-11-08 14:17 UTC
To: Darrick J. Wong; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
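The sb_rgblklog description in this patch can be made concrete with a small sketch. Beyond what the text states, it assumes that a segmented xfs_rtblock_t keeps the rtgroup number in the bits above sb_rgblklog and the block offset within the group in the bits below it, and that groups are packed back to back on the device when converting to a linear block offset. All function names are hypothetical.

```c
#include <stdint.h>

/* Smallest l with (1 << l) >= v, i.e. log2(v) rounded up (0 for v <= 1). */
static unsigned int log2_roundup(uint64_t v)
{
	unsigned int l = 0;

	while (l < 64 && (UINT64_C(1) << l) < v)
		l++;
	return l;
}

/* sb_rgblklog as described: log2(sb_rgextents * sb_rextsize), rounded up. */
static unsigned int rgblklog(uint32_t rgextents, uint32_t rextsize)
{
	return log2_roundup((uint64_t)rgextents * rextsize);
}

/* Hypothetical: pack an rtgroup number and a block offset within it. */
static uint64_t rtb_from_group(uint32_t rgno, uint64_t rgbno,
			       unsigned int blklog)
{
	return ((uint64_t)rgno << blklog) | rgbno;
}

/* Hypothetical: recover the rtgroup number from a segmented value. */
static uint32_t rtb_to_rgno(uint64_t rtb, unsigned int blklog)
{
	return (uint32_t)(rtb >> blklog);
}

/* Hypothetical: recover the block offset within the rtgroup. */
static uint64_t rtb_to_rgbno(uint64_t rtb, unsigned int blklog)
{
	return rtb & ((UINT64_C(1) << blklog) - 1);
}

/*
 * Hypothetical: linear block offset on the realtime device, assuming the
 * groups are packed back to back, each 'rgblocks' blocks long.
 */
static uint64_t rtb_to_linear(uint64_t rtb, unsigned int blklog,
			      uint64_t rgblocks)
{
	return (uint64_t)rtb_to_rgno(rtb, blklog) * rgblocks +
	       rtb_to_rgbno(rtb, blklog);
}
```

With a group size of 1000000 blocks, sb_rgblklog rounds up to 20, so block 100 of group 3 is encoded as 3145828 (3 << 20 | 100) but would live at linear block 3000100 on the device. The two numbering schemes only coincide when the group size is a power of two, which is why the conversion multiplies by the real group size instead of shifting.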
* [PATCH 3/4] design: document metadata directory tree quota changes
From: Darrick J. Wong @ 2024-11-07 23:26 UTC
To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Document the changes to the ondisk quota metadata that came in with
metadata directory trees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 .../internal_inodes.asciidoc | 3 +++
 .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 3 +++
 .../XFS_Filesystem_Structure/superblock.asciidoc | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index 5f4d62201cbd67..40eb57233ce7c0 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -21,6 +21,9 @@ of those inodes have been deallocated and may be reused by future features.
 [options="header"]
 |=====
 | Metadata File | Location
+| xref:Quota_Inodes[User Quota] | /quota/user
+| xref:Quota_Inodes[Group Quota] | /quota/group
+| xref:Quota_Inodes[Project Quota] | /quota/project
 | xref:Real-Time_Bitmap_Inode[Realtime Bitmap] | /rtgroups/*.bitmap
 | xref:Real-Time_Summary_Inode[Realtime Summary] | /rtgroups/*.summary
 |=====
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index e28929907147b7..6e52e5fd3d6c1e 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -199,6 +199,9 @@ directory tree.
 [source, c]
 ----
 enum xfs_metafile_type {
+	XFS_METAFILE_USRQUOTA,
+	XFS_METAFILE_GRPQUOTA,
+	XFS_METAFILE_PRJQUOTA,
 	XFS_METAFILE_RTBITMAP,
 	XFS_METAFILE_RTSUMMARY,
 };
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
index bffb1659d0ba38..f0455304635737 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -259,6 +259,9 @@ Quota flags. It can be a combination of the following flags:
 | +XFS_PQUOTA_CHKD+ | Project quotas have been checked.
 |=====
 
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +sb_qflags+ field
+will persist across mounts if no quota mount options are provided.
+
 *sb_flags*::
 Miscellaneous flags.
 
* [PATCH 4/4] design: update metadump v2 format to reflect rt dumps
From: Darrick J. Wong @ 2024-11-07 23:27 UTC
To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Update the metadump v2 format documentation to add realtime device
dumps.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 design/XFS_Filesystem_Structure/metadump.asciidoc | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/design/XFS_Filesystem_Structure/metadump.asciidoc b/design/XFS_Filesystem_Structure/metadump.asciidoc
index a32d6423ea6e75..226622c0d2f20e 100644
--- a/design/XFS_Filesystem_Structure/metadump.asciidoc
+++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
@@ -119,7 +119,16 @@ Dump contains external log contents.
 |=====
 
 *xmh_incompat_flags*::
-Must be zero.
+A combination of the following flags:
+
+.Metadump v2 incompat flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_MD2_INCOMPAT_RTDEVICE+ |
+Dump contains realtime device contents.
+
+|=====
 
 *xmh_reserved*::
 Must be zero.
@@ -143,6 +152,7 @@ Bits 55-56 determine the device from which the metadata dump data was extracted.
 | Value | Description
 | 0 | Data device
 | 1 | External log
+| 2 | Realtime device
 |=====
 
 The lower 54 bits determine the device address from which the dump data was
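The device selector and device address described in the metadump v2 patch can be decoded with shifts and masks. This is a minimal sketch under two assumptions: that "Bits 55-56" counts from 1, so the selector occupies the two bits directly above the low 54 address bits (bits 54 and 55 when counting from 0), and that the 64-bit value has already been converted to host byte order. The macro, enum, and function names are hypothetical, not the xfsprogs definitions.

```c
#include <stdint.h>

/* Assumed layout: low 54 bits = device address, next 2 bits = device. */
#define MD2_ADDR_BITS 54
#define MD2_ADDR_MASK ((UINT64_C(1) << MD2_ADDR_BITS) - 1)
#define MD2_DEV_MASK  UINT64_C(0x3)

/* Device selector values from the table in the patch. */
enum md2_device {
	MD2_DATA_DEVICE = 0,	/* data device */
	MD2_LOG_DEVICE  = 1,	/* external log */
	MD2_RT_DEVICE   = 2,	/* realtime device */
};

/* Pack a device selector and a device address into one 64-bit value. */
static uint64_t md2_encode(enum md2_device dev, uint64_t daddr)
{
	return ((uint64_t)dev << MD2_ADDR_BITS) | (daddr & MD2_ADDR_MASK);
}

/* Extract the device selector. */
static enum md2_device md2_get_device(uint64_t addr)
{
	return (enum md2_device)((addr >> MD2_ADDR_BITS) & MD2_DEV_MASK);
}

/* Extract the device address. */
static uint64_t md2_daddr(uint64_t addr)
{
	return addr & MD2_ADDR_MASK;
}
```

For example, md2_encode(MD2_RT_DEVICE, 12345) sets the selector to 2, and md2_get_device() and md2_daddr() recover 2 and 12345 from it. Selector value 3 is not assigned in the table above, so a reader would presumably treat it as invalid.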
* [PATCHSET v5.5 3/3] xfs-documentation: shard the realtime section
From: Darrick J. Wong @ 2024-11-06 19:17 UTC
To: djwong; +Cc: linux-xfs, hch

Hi all,

Document changes to the ondisk format for realtime groups.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

Comments and questions are, as always, welcome.

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-groups

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-groups

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-groups

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-groups
---
Commits in this patchset:
 * design: move discussion of realtime volumes to a separate section
 * design: document realtime groups
 * design: document metadata directory tree quota changes
 * design: update metadump v2 format to reflect rt dumps
---
 .../allocation_groups.asciidoc | 54 ++-
 .../XFS_Filesystem_Structure/common_types.asciidoc | 4
 .../internal_inodes.asciidoc | 41 --
 design/XFS_Filesystem_Structure/magic.asciidoc | 3
 design/XFS_Filesystem_Structure/metadump.asciidoc | 12 +
 .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 5
 design/XFS_Filesystem_Structure/realtime.asciidoc | 394 ++++++++++++++++++++
 .../xfs_filesystem_structure.asciidoc | 2
 8 files changed, 459 insertions(+), 56 deletions(-)
 create mode 100644 design/XFS_Filesystem_Structure/realtime.asciidoc
* [PATCH 2/4] design: document realtime groups 2024-11-06 19:17 [PATCHSET v5.5 3/3] xfs-documentation: shard the realtime section Darrick J. Wong @ 2024-11-06 19:19 ` Darrick J. Wong 2024-11-07 7:33 ` Christoph Hellwig 0 siblings, 1 reply; 17+ messages in thread From: Darrick J. Wong @ 2024-11-06 19:19 UTC (permalink / raw) To: djwong; +Cc: linux-xfs, hch From: Darrick J. Wong <djwong@kernel.org> Document the ondisk changes for realtime allocation groups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- .../allocation_groups.asciidoc | 31 ++ .../XFS_Filesystem_Structure/common_types.asciidoc | 4 .../internal_inodes.asciidoc | 2 design/XFS_Filesystem_Structure/magic.asciidoc | 3 .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 2 design/XFS_Filesystem_Structure/realtime.asciidoc | 344 ++++++++++++++++++++ 6 files changed, 385 insertions(+), 1 deletion(-) diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc index 5d560073d146ff..9f92be49a7a095 100644 --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc @@ -106,6 +106,10 @@ struct xfs_sb xfs_lsn_t sb_lsn; uuid_t sb_meta_uuid; xfs_ino_t sb_metadirino; + xfs_rgnumber_t sb_rgcount; + xfs_rgblock_t sb_rgextents; + uint8_t sb_rgblklog; + uint8_t sb_pad[7]; }; ---- *sb_magicnum*:: @@ -413,6 +417,12 @@ to have a slightly higher level of redundancy over the shape of the inode btrees, and decreases the amount of time to compute the metadata B+tree preallocations at mount time. +| +XFS_SB_FEAT_RO_COMPAT_RTSB+ | +Realtime superblock. The first rt extent in rt group zero contains a superblock +header that can be used to identify the realtime device. See the section about +the xref:Realtime_Group_Superblocks[realtime group superblocks] for more +information. + |===== *sb_features_incompat*:: @@ -476,6 +486,10 @@ pointers] for more information. Metadata directory tree. 
See the section about the xref:Metadata_Directories[ metadata directory tree] for more information. +| +XFS_SB_FEAT_INCOMPAT_RTGROUPS+ | +Realtime allocation groups. See the section about the xref:Realtime_Groups[ +realtime groups] for more information. + |===== *sb_features_log_incompat*:: @@ -514,6 +528,23 @@ If the +XFS_SB_FEAT_RO_INCOMPAT_METADIR+ feature is set, this field points to the inode of the root directory of the metadata directory tree. This field is zero otherwise. +*sb_rgcount*:: +Count of realtime groups in the filesystem, if the ++XFS_SB_FEAT_INCOMPAT_RTGROUPS+ feature is enabled. + +*sb_rgextents*:: +Maximum number of realtime extents that can be contained within a realtime +group, if the +XFS_SB_FEAT_INCOMPAT_RTGROUPS+ feature is enabled. + +*sb_rgblklog*:: +If the +XFS_SB_FEAT_INCOMPAT_RTGROUPS+ feature is enabled, this is the log~2~ +value of +sb_rgextents+ * +sb_rextsize+ (rounded up). This value is used to +generate absolute block numbers defined in extent maps from the segmented ++xfs_rtblock_t+ values. + +*sb_pad[7]*:: +Zeroes, if the +XFS_SB_FEAT_INCOMPAT_RTGROUPS+ feature is enabled. + === xfs_db Superblock Example A filesystem is made on a single disk with the following command: diff --git a/design/XFS_Filesystem_Structure/common_types.asciidoc b/design/XFS_Filesystem_Structure/common_types.asciidoc index 51909be384e273..34cdfdaeccf848 100644 --- a/design/XFS_Filesystem_Structure/common_types.asciidoc +++ b/design/XFS_Filesystem_Structure/common_types.asciidoc @@ -43,7 +43,9 @@ Unsigned 64 bit raw filesystem block number. *xfs_rtblock_t*:: Unsigned 64 bit extent number in the xref:Real-time_Devices[real-time] -sub-volume. +sub-volume. If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, these +values combine an xref:Realtime_Groups[rtgroup number] and block offset into +the realtime group. *xfs_fileoff_t*:: Unsigned 64 bit block offset into a file. 
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index 68c86d30ff8206..5f4d62201cbd67 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -21,6 +21,8 @@ of those inodes have been deallocated and may be reused by future features.
 [options="header"]
 |=====
 | Metadata File | Location
+| xref:Real-Time_Bitmap_Inode[Realtime Bitmap] | /rtgroups/*.bitmap
+| xref:Real-Time_Summary_Inode[Realtime Summary] | /rtgroups/*.summary
 |=====

 Metadata files are flagged by the +XFS_DIFLAG2_METADATA+ flag in the
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index 60952aeb876ff5..5da29b9ef9f3a8 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -45,9 +45,12 @@ relevant chapters. Magic numbers tend to have consistent locations:
 | +XFS_ATTR3_LEAF_MAGIC+ | 0x3bee | | xref:Leaf_Attributes[Leaf Attribute], v5 only
 | +XFS_ATTR3_RMT_MAGIC+ | 0x5841524d | XARM | xref:Remote_Values[Remote Attribute Value], v5 only
 | +XFS_RMAP_CRC_MAGIC+ | 0x524d4233 | RMB3 | xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
+| +XFS_RTBITMAP_MAGIC+ | 0x424D505A | BMPZ | xref:Real-Time_Bitmap_Inode[Real-Time Bitmap], metadir only
+| +XFS_RTSUMMARY_MAGIC+ | 0x53554D59 | SUMY | xref:Real-Time_Summary_Inode[Real-Time Summary], metadir only
 | +XFS_RTRMAP_CRC_MAGIC+ | 0x4d415052 | MAPR | xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree], v5 only
 | +XFS_REFC_CRC_MAGIC+ | 0x52334643 | R3FC | xref:Reference_Count_Btree[Reference Count B+tree], v5 only
 | +XFS_MD_MAGIC+ | 0x5846534d | XFSM | xref:Metadata_Dumps[Metadata Dumps]
+| +XFS_RTSB_MAGIC+ | 0x46726F67 | Frog | xref:Realtime_Groups[Realtime Groups]
 |=====

 The magic numbers for log items are at offset zero in each log item, but items
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index 02ec0d12bb57e5..e28929907147b7 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -199,6 +199,8 @@ directory tree.
 [source, c]
 ----
 enum xfs_metafile_type {
+	XFS_METAFILE_RTBITMAP,
+	XFS_METAFILE_RTSUMMARY,
 };
 ----

diff --git a/design/XFS_Filesystem_Structure/realtime.asciidoc b/design/XFS_Filesystem_Structure/realtime.asciidoc
index 11426e8fdb632d..3a72eb5175ad89 100644
--- a/design/XFS_Filesystem_Structure/realtime.asciidoc
+++ b/design/XFS_Filesystem_Structure/realtime.asciidoc
@@ -31,6 +31,146 @@ and bits per byte. This value is stored in +sb_rbmblocks+.
 The nblocks and extent array for the inode should match this.
 Each real time block gets its own bit in the bitmap.

+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, each block of the
+realtime bitmap file has a header of the following format:
+
+[source, c]
+----
+struct xfs_rtbuf_blkinfo {
+	__be32		rt_magic;
+	__be32		rt_crc;
+	__be64		rt_owner;
+	__be64		rt_blkno;
+	__be64		rt_lsn;
+	uuid_t		rt_uuid;
+};
+----
+
+*rt_magic*::
+Specifies the magic number for the rtbitmap block: ``BMPZ'' (0x424D505A).
+
+*rt_crc*::
+Checksum of the block.
+
+*rt_owner*::
+Specifies the inode number for the file that owns this block.
+
+*rt_blkno*::
+Disk address of this block.
+
+*rt_lsn*::
+Log sequence number of the last write to this block.
+
+*rt_uuid*::
+The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+
+depending on which features are set.
+
+After the block header, the bitmap data are encoded as be32 word values.
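A C sketch of the block header layout above follows. It mirrors struct xfs_rtbuf_blkinfo with fixed-width types (a 16-byte array stands in for uuid_t), confirms at compile time that the header is 48 bytes with no padding, and checks a raw block's magic number. The sketch_* names are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RTBITMAP_MAGIC	0x424D505AU	/* "BMPZ" */
#define SKETCH_RTSUMMARY_MAGIC	0x53554D59U	/* "SUMY" */

/*
 * Fixed-width mirror of struct xfs_rtbuf_blkinfo. On disk the integer
 * fields are big-endian, so this struct is only used for layout checks.
 */
struct sketch_rtbuf_blkinfo {
	uint32_t	rt_magic;
	uint32_t	rt_crc;
	uint64_t	rt_owner;
	uint64_t	rt_blkno;
	uint64_t	rt_lsn;
	unsigned char	rt_uuid[16];
};

static_assert(offsetof(struct sketch_rtbuf_blkinfo, rt_owner) == 8, "rt_owner offset");
static_assert(offsetof(struct sketch_rtbuf_blkinfo, rt_uuid) == 32, "rt_uuid offset");
static_assert(sizeof(struct sketch_rtbuf_blkinfo) == 48, "header is 48 bytes");

/* Decode a big-endian 32-bit value from raw bytes, independent of host order. */
static uint32_t sketch_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Number of be32 payload words that follow the header in one block. */
static uint32_t sketch_rtbuf_words_per_block(uint32_t blocksize)
{
	return (blocksize - (uint32_t)sizeof(struct sketch_rtbuf_blkinfo)) / 4;
}

/* Nonzero if the raw block begins with the expected magic number. */
static int sketch_rtbuf_magic_ok(const unsigned char *block, uint32_t want)
{
	return sketch_get_be32(block) == want;
}
```

With a 4096-byte block, (4096 - 48) / 4 = 1012 be32 words follow the header, which agrees with the rtwords[0-1011] and suminfo[0-1011] ranges in the xfs_db examples.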
+
+=== xfs_db rtbitmap Example
+
+This example shows a real-time bitmap file from a freshly populated filesystem:
+
+----
+xfs_db> path -m /rtgroups/3.bitmap
+xfs_db> p
+core.magic = 0x494e
+core.mode = 0100000
+core.version = 3
+core.format = 2 (extents)
+core.metatype = 5 (rtbitmap)
+core.uid = 0
+core.gid = 0
+core.nlinkv2 = 1
+core.projid_lo = 3
+core.projid_hi = 0
+core.nextents = 1
+core.atime.sec = Tue Oct 15 16:04:02 2024
+core.atime.nsec = 769675000
+core.mtime.sec = Tue Oct 15 16:04:02 2024
+core.mtime.nsec = 769675000
+core.ctime.sec = Tue Oct 15 16:04:02 2024
+core.ctime.nsec = 769681000
+core.size = 135168
+core.nblocks = 33
+core.extsize = 0
+core.naextents = 0
+core.forkoff = 24
+core.aformat = 1 (local)
+core.dmevmask = 0
+core.dmstate = 0
+core.newrtbm = 0
+core.prealloc = 0
+core.realtime = 0
+core.immutable = 1
+core.append = 0
+core.sync = 1
+core.noatime = 1
+core.nodump = 1
+core.rtinherit = 0
+core.projinherit = 0
+core.nosymlinks = 0
+core.extsz = 0
+core.extszinherit = 0
+core.nodefrag = 1
+core.filestream = 0
+core.gen = 2653591217
+next_unlinked = null
+v3.crc = 0x34a17119 (correct)
+v3.change_count = 3
+v3.lsn = 0
+v3.flags2 = 0x38
+v3.cowextsize = 0
+v3.crtime.sec = Tue Oct 15 16:04:02 2024
+v3.crtime.nsec = 769675000
+v3.inumber = 33685633
+v3.uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+v3.reflink = 0
+v3.cowextsz = 0
+v3.dax = 0
+v3.bigtime = 1
+v3.nrext64 = 1
+v3.metadata = 1
+u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
+0:[0,4210712,33,0]
+a.sfattr.hdr.totsize = 27
+a.sfattr.hdr.count = 1
+a.sfattr.list[0].namelen = 8
+a.sfattr.list[0].valuelen = 12
+a.sfattr.list[0].root = 0
+a.sfattr.list[0].secure = 0
+a.sfattr.list[0].parent = 1
+a.sfattr.list[0].name = "0.bitmap"
+a.sfattr.list[0].parent_dir.inumber = 33685632
+a.sfattr.list[0].parent_dir.gen = 142228546
+xfs_db> dblock 0
+xfs_db> p
+magicnum = 0x424d505a
+crc = 0xc8b10abf (correct)
+owner = 33685633
+bno = 20902080
+lsn = 0x100007696
+uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+rtwords[0-1011] = 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0
+14:0 15:0 16:0 17:0 18:0 19:0 20:0 21:0xfffff800 22:0xffffffff 23:0xffffffff
+24:0xffffffff 25:0xffffffff 26:0xffffffff 27:0xffffffff 28:0xffffffff
+29:0xffffffff 30:0xffffffff 31:0xffffffff 32:0xffffffff
+...
+979:0xffffffff 980:0xffffffff 981:0xffffffff 982:0xffffffff 983:0xffffffff
+984:0xffffffff 985:0xffffffff 986:0xffffffff 987:0xffffffff 988:0xffffffff
+989:0xffffffff 990:0xffffffff 991:0xffffffff 992:0xffffffff 993:0xffffffff
+994:0xffffffff 995:0xffffffff 996:0xffffffff 997:0xffffffff 998:0xffffffff
+999:0xffffffff 1000:0xffffffff 1001:0xffffffff 1002:0xffffffff 1003:0xffffffff
+1004:0xffffffff 1005:0xffffffff 1006:0xffffffff 1007:0xffffffff 1008:0xffffffff
+1009:0xffffffff 1010:0xffffffff 1011:0xffffffff
+----
+
+From this example, we can clearly see that this is a bitmap file in the
+metadata directory tree, and that it is the bitmap file for rtgroup 3. When we
+access the first block in the bitmap file, we can see the new block header and
+that the first 683 extents (21 zeroed words plus the 11 clear low bits of word
+21) are allocated. The bitmap words were excerpted for brevity.
+
 [[Real-Time_Summary_Inode]]
 == Free Space Summary Inode

@@ -47,4 +187,208 @@ This data structure is not particularly space efficient, however it is a very
 fast way to provide the same data as the two free space B+trees for regular
 files since the space is preallocated and metadata maintenance is minimal.

+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, each block of the
+realtime summary file has the same header as rtbitmap file blocks. However,
+the magic number will be ``SUMY'' (0x53554D59). After the block header, the
+summary counts are encoded as be32 integers.
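The be32 words in the rtbitmap and rtsummary dumps can be interpreted with a few lines of C. This sketch assumes, following the Linux kernel's realtime bitmap code, that a set bit marks a free rt extent and that extent i of a bitmap block maps to bit (i % 32) of word (i / 32), counting from the least significant bit. The summary slot formula (log2 of the free extent length, times the number of bitmap blocks, plus the bitmap block number) is modeled on the kernel's xfs_rtsumoffs() helper and should likewise be read as an assumption. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the allocated rt extents at the start of a bitmap block, assuming
 * a set bit means "free" and bits are numbered from the least significant
 * bit of each word (words already converted from big-endian).
 */
static uint32_t sketch_leading_allocated(const uint32_t *words, uint32_t nwords)
{
	uint32_t count = 0;

	for (uint32_t w = 0; w < nwords; w++) {
		uint32_t word = words[w];

		if (word == 0) {
			count += 32;	/* every extent in this word is allocated */
			continue;
		}
		/* Count clear bits below the lowest set (free) bit, then stop. */
		while (!(word & 1)) {
			count++;
			word >>= 1;
		}
		break;
	}
	return count;
}

/* Floor of log2(len); len must be nonzero. */
static uint32_t sketch_log2_floor(uint64_t len)
{
	uint32_t log = 0;

	while (len > 1) {
		len >>= 1;
		log++;
	}
	return log;
}

/*
 * Summary slot counting free extents of length [2^log2_len, 2^(log2_len+1))
 * that start in bitmap block bbno.
 */
static uint32_t sketch_rtsummary_slot(uint32_t log2_len, uint32_t rbmblocks,
				      uint32_t bbno)
{
	return log2_len * rbmblocks + bbno;
}
```

Under those assumptions, feeding words 0 through 22 of the rtbitmap dump to sketch_leading_allocated() gives the length of the allocated run at the start of the group. If the rest of the group is one free extent, and the group holds 1048576 extents as in the mkfs example later in this patch, that extent has a log2 length of 19; with 33 bitmap blocks (core.nblocks of the bitmap file) it lands in summary slot 19 * 33 + 0 = 627, which matches the single nonzero counter in the rtsummary dump.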
+
+=== xfs_db rtsummary Example
+
+This example shows a real-time summary file from a freshly populated filesystem:
+
+----
+xfs_db> path -m /rtgroups/3.summary
+xfs_db> p
+core.magic = 0x494e
+core.mode = 0100000
+core.version = 3
+core.format = 2 (extents)
+core.metatype = 6 (rtsummary)
+core.uid = 0
+core.gid = 0
+core.nlinkv2 = 1
+core.projid_lo = 3
+core.projid_hi = 0
+core.nextents = 1
+core.atime.sec = Tue Oct 15 16:04:02 2024
+core.atime.nsec = 769694000
+core.mtime.sec = Tue Oct 15 16:04:02 2024
+core.mtime.nsec = 769694000
+core.ctime.sec = Tue Oct 15 16:04:02 2024
+core.ctime.nsec = 769699000
+core.size = 4096
+core.nblocks = 1
+core.extsize = 0
+core.naextents = 0
+core.forkoff = 24
+core.aformat = 1 (local)
+core.dmevmask = 0
+core.dmstate = 0
+core.newrtbm = 0
+core.prealloc = 0
+core.realtime = 0
+core.immutable = 1
+core.append = 0
+core.sync = 1
+core.noatime = 1
+core.nodump = 1
+core.rtinherit = 0
+core.projinherit = 0
+core.nosymlinks = 0
+core.extsz = 0
+core.extszinherit = 0
+core.nodefrag = 1
+core.filestream = 0
+core.gen = 519466891
+next_unlinked = null
+v3.crc = 0x54fc58d0 (correct)
+v3.change_count = 3
+v3.lsn = 0
+v3.flags2 = 0x38
+v3.cowextsize = 0
+v3.crtime.sec = Tue Oct 15 16:04:02 2024
+v3.crtime.nsec = 769694000
+v3.inumber = 33685634
+v3.uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+v3.reflink = 0
+v3.cowextsz = 0
+v3.dax = 0
+v3.bigtime = 1
+v3.nrext64 = 1
+v3.metadata = 1
+u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
+0:[0,4210703,1,0]
+a.sfattr.hdr.totsize = 28
+a.sfattr.hdr.count = 1
+a.sfattr.list[0].namelen = 9
+a.sfattr.list[0].valuelen = 12
+a.sfattr.list[0].root = 0
+a.sfattr.list[0].secure = 0
+a.sfattr.list[0].parent = 1
+a.sfattr.list[0].name = "0.summary"
+a.sfattr.list[0].parent_dir.inumber = 33685632
+a.sfattr.list[0].parent_dir.gen = 142228546
+xfs_db> dblock 0
+xfs_db> p
+magicnum = 0x53554d59
+crc = 0x473340a8 (correct)
+owner = 33685634
+bno = 20902008
+lsn = 0x100007696
+uuid = a6575f59-1514-445e-883e-211b2c5a0f05
+suminfo[0-1011] = 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0
+14:0 15:0 16:0 17:0 18:0 19:0 20:0 21:0 22:0 23:0 24:0 25:0 26:0 27:0 28:0 29:0
+30:0 31:0 32:0
+...
+618:0 619:0 620:0 621:0 622:0 623:0 624:0 625:0 626:0 627:1 628:0 629:0 630:0
+...
+979:0 980:0 981:0 982:0 983:0 984:0 985:0 986:0 987:0 988:0 989:0 990:0 991:0
+992:0 993:0 994:0 995:0 996:0 997:0 998:0 999:0 1000:0 1001:0 1002:0 1003:0
+1004:0 1005:0 1006:0 1007:0 1008:0 1009:0 1010:0 1011:0
+----
+
+From this example, we can clearly see that this is a summary file in the
+metadata directory tree, and that it is the summary file for rtgroup 3. When
+we access the first block in the summary file, we can see the new block header
+and the nonzero counter for the one large free extent in this group.
+The summary counts were excerpted for brevity.
+
+[[Realtime_Groups]]
+== Realtime Groups
+
+To reduce metadata contention for space allocation and remapping activities
+being applied to realtime files, the realtime volume can be split into
+allocation groups, just like the data volume. Each realtime group tracks its
+own free space.
+
+Each realtime allocation group can contain up to (2^31^ - 1) filesystem blocks,
+regardless of the underlying realtime extent size.
+
+Each realtime group has the following characteristics:
+
+ * Group 0 has a superblock describing overall filesystem info
+ * Free space bitmap
+ * Summary of free space
+
+The free space metadata are the same as described in the previous sections,
+except that their scope covers only a single rtgroup. The other structures are
+expanded upon in the following sections.
+
+[[Realtime_Group_Superblocks]]
+=== Superblocks
+
+The first block of each realtime group contains a superblock. These fields
+must match their counterparts in the filesystem superblock on the data device.
+
+[source, c]
+----
+struct xfs_rtsb {
+	__be32		rsb_magicnum;
+	__le32		rsb_crc;
+
+	__be32		rsb_pad;
+	unsigned char	rsb_fname[XFSLABEL_MAX];
+
+	uuid_t		rsb_uuid;
+	uuid_t		rsb_meta_uuid;
+
+	/* must be padded to 64 bit alignment */
+};
+----
+
+*rsb_magicnum*::
+Identifies the filesystem. Its value is +XFS_RTSB_MAGIC+ ``Frog'' (0x46726F67).
+
+*rsb_crc*::
+Superblock checksum.
+
+*rsb_pad*::
+Must be zero.
+
+*rsb_fname[12]*::
+Name for the filesystem. This matches +sb_fname+ in the primary superblock.
+
+*rsb_uuid*::
+UUID (Universally Unique ID) for the filesystem. This matches +sb_uuid+ in the
+primary superblock.
+
+*rsb_meta_uuid*::
+Metadata UUID for the filesystem. This matches +sb_meta_uuid+ in the primary
+superblock.
+
+==== xfs_db rtgroup Superblock Example
+
+A filesystem is made across two disks, with the data device on /dev/sda and
+the realtime device on /dev/sdb, using the following command:
+
+----
+# mkfs.xfs -r rtgroups=1,rgcount=4,rtdev=/dev/sdb /dev/sda -f
+meta-data=/dev/sda               isize=512    agcount=4, agsize=1298176 blks
+         =                       sectsz=512   attr=2, projid32bit=1
+         =                       crc=1        finobt=1, sparse=1, rmapbt=1
+         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
+         =                       metadir=1
+data     =                       bsize=4096   blocks=5192704, imaxpct=25
+         =                       sunit=0      swidth=0 blks
+naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
+log      =internal log           bsize=4096   blocks=16384, version=2
+         =                       sectsz=512   sunit=0 blks, lazy-count=1
+realtime =/dev/sdb               extsz=4096   blocks=5192704, rtextents=5192704
+         =                       rgcount=5    rgsize=1048576 extents
+----
+
+And in xfs_db, inspecting the realtime group superblock:
+
+----
+# xfs_db -R /dev/sdb /dev/sda
+xfs_db> rtsb
+xfs_db> print
+magicnum = 0x46726f67
+crc = 0x759a62d4 (correct)
+pad = 0
+fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
+uuid = 7e55b909-8728-4d69-a1fa-891427314eea
+meta_uuid = 7e55b909-8728-4d69-a1fa-891427314eea
+----
+
 include::rtrmapbt.asciidoc[]
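For readers who want to check an rt superblock by hand, this C sketch mirrors struct xfs_rtsb with fixed-width types, asserts the field offsets implied by the declaration (56 bytes in total, which satisfies the 64-bit padding comment), and compares a raw block against the primary superblock's name and UUIDs. It assumes XFSLABEL_MAX is 12, as the rsb_fname[12] description indicates, and it deliberately skips CRC verification; the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_XFSLABEL_MAX	12		/* assumed, per rsb_fname[12] */
#define SKETCH_RTSB_MAGIC	0x46726F67U	/* "Frog" */

/* Fixed-width mirror of struct xfs_rtsb, used only for offsets. */
struct sketch_rtsb {
	uint32_t	rsb_magicnum;	/* big-endian on disk */
	uint32_t	rsb_crc;	/* little-endian on disk */
	uint32_t	rsb_pad;
	unsigned char	rsb_fname[SKETCH_XFSLABEL_MAX];
	unsigned char	rsb_uuid[16];
	unsigned char	rsb_meta_uuid[16];
};

static_assert(offsetof(struct sketch_rtsb, rsb_fname) == 12, "fname offset");
static_assert(offsetof(struct sketch_rtsb, rsb_uuid) == 24, "uuid offset");
static_assert(offsetof(struct sketch_rtsb, rsb_meta_uuid) == 40, "meta_uuid offset");
static_assert(sizeof(struct sketch_rtsb) == 56, "56 bytes, a multiple of 8");

/* Decode a big-endian 32-bit value from raw bytes. */
static uint32_t sketch_rtsb_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Nonzero if the raw block carries the rt superblock magic, a zero pad,
 * and the same name and UUIDs as the primary superblock. No CRC check.
 */
static int sketch_rtsb_matches(const unsigned char *blk,
			       const unsigned char fname[SKETCH_XFSLABEL_MAX],
			       const unsigned char uuid[16],
			       const unsigned char meta_uuid[16])
{
	if (sketch_rtsb_be32(blk + offsetof(struct sketch_rtsb, rsb_magicnum)) !=
	    SKETCH_RTSB_MAGIC)
		return 0;
	if (sketch_rtsb_be32(blk + offsetof(struct sketch_rtsb, rsb_pad)) != 0)
		return 0;
	return memcmp(blk + offsetof(struct sketch_rtsb, rsb_fname), fname,
		      SKETCH_XFSLABEL_MAX) == 0 &&
	       memcmp(blk + offsetof(struct sketch_rtsb, rsb_uuid), uuid, 16) == 0 &&
	       memcmp(blk + offsetof(struct sketch_rtsb, rsb_meta_uuid),
		      meta_uuid, 16) == 0;
}
```

Reading rsb_magicnum byte by byte as big-endian avoids depending on host byte order; because rsb_crc is declared __le32, a CRC check added to this sketch would need to read that field in the opposite byte order.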
* Re: [PATCH 2/4] design: document realtime groups
2024-11-06 19:19 ` [PATCH 2/4] design: document realtime groups Darrick J. Wong
@ 2024-11-07 7:33 ` Christoph Hellwig
2024-11-07 15:58 ` Darrick J. Wong
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2024-11-07 7:33 UTC
To: Darrick J. Wong; +Cc: linux-xfs, hch

> --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> @@ -106,6 +106,10 @@ struct xfs_sb
>  	xfs_lsn_t		sb_lsn;
>  	uuid_t			sb_meta_uuid;
>  	xfs_ino_t		sb_metadirino;
> +	xfs_rgnumber_t		sb_rgcount;
> +	xfs_rgblock_t		sb_rgextents;
> +	uint8_t			sb_rgblklog;
> +	uint8_t			sb_pad[7];

And following on with my ranting about existing bits theme from the
previous review: why are we documenting the in-memory xfs_sb here
and not the on-disk xfs_dsb?

> +| +XFS_SB_FEAT_RO_COMPAT_RTSB+ |
> +Realtime superblock. The first rt extent in rt group zero contains a superblock
> +header that can be used to identify the realtime device. See the section about
> +the xref:Realtime_Group_Superblocks[realtime group superblocks] for more
> +information.

This is actually gone now.

> +*sb_rgcount*::
> +Count of realtime groups in the filesystem, if the
> ++XFS_SB_FEAT_INCOMPAT_RTGROUPS+ feature is enabled.

... will be zero if XFS_SB_FEAT_INCOMPAT_RTGROUPS is set, but no
realtime subvolume exists

?
* Re: [PATCH 2/4] design: document realtime groups
2024-11-07 7:33 ` Christoph Hellwig
@ 2024-11-07 15:58 ` Darrick J. Wong
0 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2024-11-07 15:58 UTC
To: Christoph Hellwig; +Cc: linux-xfs

On Thu, Nov 07, 2024 at 08:33:18AM +0100, Christoph Hellwig wrote:
> > --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> > +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> > @@ -106,6 +106,10 @@ struct xfs_sb
> >  	xfs_lsn_t		sb_lsn;
> >  	uuid_t			sb_meta_uuid;
> >  	xfs_ino_t		sb_metadirino;
> > +	xfs_rgnumber_t		sb_rgcount;
> > +	xfs_rgblock_t		sb_rgextents;
> > +	uint8_t			sb_rgblklog;
> > +	uint8_t			sb_pad[7];
>
> And following on with my ranting about existing bits theme from the
> previous review: why are we documenting the in-memory xfs_sb here
> and not the on-disk xfs_dsb?

<nod> will clean that one up too.

> > +| +XFS_SB_FEAT_RO_COMPAT_RTSB+ |
> > +Realtime superblock. The first rt extent in rt group zero contains a superblock
> > +header that can be used to identify the realtime device. See the section about
> > +the xref:Realtime_Group_Superblocks[realtime group superblocks] for more
> > +information.
>
> This is actually gone now.

Oops, will remove that one.

> > +*sb_rgcount*::
> > +Count of realtime groups in the filesystem, if the
> > ++XFS_SB_FEAT_INCOMPAT_RTGROUPS+ feature is enabled.
>
> ... will be zero if XFS_SB_FEAT_INCOMPAT_RTGROUPS is set, but no
> realtime subvolume exists
>
> ?

Yeah, that's a good thing to note. I'll also s/RTGROUPS/METADIR/ since the
rtgroups feature bit is also gone yet I seem to have missed this one. :(

--D
Thread overview: 17+ messages
2024-11-07 23:21 [PATCHBOMB 6.13 v5.6] xfs-docs: metadata directories and realtime groups Darrick J. Wong
2024-11-07 23:24 ` Darrick J. Wong
2024-11-07 23:24 ` [PATCHSET v5.6 1/2] xfs-documentation: document metadata directories Darrick J. Wong
2024-11-07 23:25   ` [PATCH 1/3] design: move superblock documentation to a separate file Darrick J. Wong
2024-11-08 14:16     ` Christoph Hellwig
2024-11-07 23:25   ` [PATCH 2/3] design: document the actual ondisk superblock Darrick J. Wong
2024-11-08 14:16     ` Christoph Hellwig
2024-11-07 23:26   ` [PATCH 3/3] design: document the changes required to handle metadata directories Darrick J. Wong
2024-11-07 23:25 ` [PATCHSET v5.6 2/2] xfs-documentation: shard the realtime section Darrick J. Wong
2024-11-07 23:26   ` [PATCH 1/4] design: move discussion of realtime volumes to a separate section Darrick J. Wong
2024-11-07 23:26   ` [PATCH 2/4] design: document realtime groups Darrick J. Wong
2024-11-08 14:17     ` Christoph Hellwig
2024-11-07 23:26   ` [PATCH 3/4] design: document metadata directory tree quota changes Darrick J. Wong
2024-11-07 23:27   ` [PATCH 4/4] design: update metadump v2 format to reflect rt dumps Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2024-11-06 19:17 [PATCHSET v5.5 3/3] xfs-documentation: shard the realtime section Darrick J. Wong
2024-11-06 19:19 ` [PATCH 2/4] design: document realtime groups Darrick J. Wong
2024-11-07 7:33   ` Christoph Hellwig
2024-11-07 15:58     ` Darrick J. Wong