[PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups

public inbox for linux-xfs@vger.kernel.org

* [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups
@ 2024-12-23 21:29 Darrick J. Wong
  2024-12-23 21:34 ` [PATCHSET 1/8] xfsprogs: bug fixes for 6.12 Darrick J. Wong
                   ` (15 more replies)
  0 siblings, 16 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:29 UTC
  To: Andrey Albershteyn; +Cc: linux-xfs, Christoph Hellwig

Hi Andrey,

This patchbomb completes the review of userspace support for the
metadata directory tree and realtime allocation group code that was
merged into kernel 6.13-rc1.  I now submit this for xfsprogs 6.13.

As with last time, I'm presenting the libxfs sync for 6.13 a bit
differently than I've done traditionally.  Whereas I usually do all the
kernel sync and only then go to work on the surrounding utilities, I
noticed that the conversion of xfs_perag to be a subclass of generic
xfs_group objects and the introduction of xfs_rtgroup objects causes a
lot of changes in the utilities.

As a result, I decided that it'd be easier to take care of the libxfs
sync for only the metadata directory tree code and the utility changes
*before* moving on to the allocation groups restructuring.  That's why
there are eight patchsets instead of five.  I hope that makes it easier
to tackle.
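
For readers who have not followed the kernel side, here is a rough
sketch of what "xfs_perag as a subclass of xfs_group" means in C terms:
the generic structure is embedded in each specific one, and generic code
only ever sees the embedded part.  All demo_* names and fields below are
made up for illustration; they are not the real libxfs definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real xfs_group/xfs_perag layouts. */
enum demo_group_type { DEMO_GROUP_AG, DEMO_GROUP_RTG };

struct demo_group {
	enum demo_group_type	type;
	uint32_t		index;		/* group number */
	uint32_t		block_count;	/* blocks in this group */
};

struct demo_perag {
	struct demo_group	group;		/* embedded "base class" */
	uint32_t		free_inodes;
};

struct demo_rtgroup {
	struct demo_group	group;		/* same base, different subclass */
	uint32_t		extent_count;
};

/* Recover the enclosing structure from a pointer to its member. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Downcast a generic group to a perag, or NULL if it is not an AG. */
static inline struct demo_perag *demo_to_perag(struct demo_group *g)
{
	if (g->type != DEMO_GROUP_AG)
		return NULL;
	return demo_container_of(g, struct demo_perag, group);
}

/* Generic code works on any group type through the embedded base. */
static inline uint32_t demo_group_last_block(const struct demo_group *g)
{
	return g->block_count - 1;
}
```

A helper written against the embedded base serves both AGs and rtgroups,
which is why callers in the utilities change once the base is factored
out.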

Note that this is /not/ the last code I intend to submit for xfsprogs
6.13 -- I found a few more bugs while examining Christoph's zoned
storage support code and will submit those when I return from vacation.
This patchbomb merely covers everything that is ready for merge.

As always, you can browse / pull the branch from here:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-quotas_2024-12-23

--D


* [PATCHSET 1/8] xfsprogs: bug fixes for 6.12
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
@ 2024-12-23 21:34 ` Darrick J. Wong
  2024-12-23 21:36   ` [PATCH 1/3] xfs_repair: fix maximum file offset comparison Darrick J. Wong
                     ` (2 more replies)
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                   ` (14 subsequent siblings)
  15 siblings, 3 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:34 UTC
  To: djwong, aalbersh; +Cc: jpalus, hch, linux-xfs

Hi all,

Bug fixes for 6.12.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=xfs-6.12-fixes

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=xfs-6.12-fixes
---
Commits in this patchset:
 * xfs_repair: fix maximum file offset comparison
 * man: fix ioctl_xfs_commit_range man page install
 * man: document the -n parent mkfs option
---
 man/man2/ioctl_xfs_commit_range.2 |    2 +-
 man/man8/mkfs.xfs.8.in            |   12 ++++++++++++
 repair/dinode.c                   |    2 +-
 repair/globals.c                  |    1 -
 repair/globals.h                  |    1 -
 repair/prefetch.c                 |    2 +-
 repair/versions.c                 |    7 +------
 7 files changed, 16 insertions(+), 11 deletions(-)



* [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
  2024-12-23 21:34 ` [PATCHSET 1/8] xfsprogs: bug fixes for 6.12 Darrick J. Wong
@ 2024-12-23 21:34 ` Darrick J. Wong
  2024-12-23 21:37   ` [PATCH 01/36] xfs: remove the redundant xfs_alloc_log_agf Darrick J. Wong
                     ` (35 more replies)
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                   ` (13 subsequent siblings)
  15 siblings, 36 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:34 UTC
  To: djwong, aalbersh; +Cc: cem, dchinner, leo.lilong, hch, linux-xfs

Hi all,

Synchronize libxfs with the kernel.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=metadata-directory-tree-sync
---
Commits in this patchset:
 * xfs: remove the redundant xfs_alloc_log_agf
 * xfs: sb_spino_align is not verified
 * xfs: remove the unused pagb_count field in struct xfs_perag
 * xfs: remove the unused pag_active_wq field in struct xfs_perag
 * xfs: pass a pag to xfs_difree_inode_chunk
 * xfs: remove the agno argument to xfs_free_ag_extent
 * xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers
 * xfs: add a xfs_agino_to_ino helper
 * xfs: pass a pag to xfs_extent_busy_{search,reuse}
 * xfs: pass a perag structure to the xfs_ag_resv_init_error trace point
 * xfs: pass objects to the xfs_irec_merge_{pre,post} trace points
 * xfs: convert remaining trace points to pass pag structures
 * xfs: split xfs_initialize_perag
 * xfs: insert the pag structures into the xarray later
 * xfs: factor out a generic xfs_group structure
 * xfs: add a xfs_group_next_range helper
 * xfs: switch perag iteration from the for_each macros to a while based iterator
 * xfs: move metadata health tracking to the generic group structure
 * xfs: move draining of deferred operations to the generic group structure
 * xfs: move the online repair rmap hooks to the generic group structure
 * xfs: convert busy extent tracking to the generic group structure
 * xfs: add a generic group pointer to the btree cursor
 * xfs: add group based bno conversion helpers
 * xfs: store a generic group structure in the intents
 * xfs: constify the xfs_sb predicates
 * xfs: rename metadata inode predicates
 * xfs: define the on-disk format for the metadir feature
 * xfs: iget for metadata inodes
 * xfs: enforce metadata inode flag
 * xfs: read and write metadata inode directory tree
 * xfs: disable the agi rotor for metadata inodes
 * xfs: advertise metadata directory feature
 * xfs: allow bulkstat to return metadata directories
 * xfs: adjust xfs_bmap_add_attrfork for metadir
 * xfs: record health problems with the metadata directory
 * xfs: check metadata directory file path connectivity
---
 db/check.c                  |    2 
 db/fsmap.c                  |   10 -
 db/info.c                   |    7 -
 db/inode.c                  |    4 
 db/iunlink.c                |    6 -
 include/libxfs.h            |    2 
 include/xfs_inode.h         |   11 +
 include/xfs_mount.h         |   45 +++-
 include/xfs_trace.h         |   31 ++-
 include/xfs_trans.h         |    3 
 libfrog/radix-tree.h        |    9 +
 libxfs/Makefile             |    6 +
 libxfs/defer_item.c         |   35 ++-
 libxfs/init.c               |    8 -
 libxfs/inode.c              |   55 +++++
 libxfs/iunlink.c            |   11 +
 libxfs/libxfs_api_defs.h    |   15 +
 libxfs/libxfs_priv.h        |   10 +
 libxfs/trans.c              |   39 +++
 libxfs/util.c               |    4 
 libxfs/xfs_ag.c             |  246 ++++++++--------------
 libxfs/xfs_ag.h             |  189 ++++++++++-------
 libxfs/xfs_ag_resv.c        |   22 +-
 libxfs/xfs_alloc.c          |  104 +++++----
 libxfs/xfs_alloc.h          |    7 -
 libxfs/xfs_alloc_btree.c    |   30 +--
 libxfs/xfs_attr.c           |    5 
 libxfs/xfs_bmap.c           |    7 -
 libxfs/xfs_bmap.h           |    2 
 libxfs/xfs_btree.c          |   38 +--
 libxfs/xfs_btree.h          |    3 
 libxfs/xfs_btree_mem.c      |    6 -
 libxfs/xfs_format.h         |  121 +++++++++--
 libxfs/xfs_fs.h             |   25 ++
 libxfs/xfs_group.c          |  223 ++++++++++++++++++++
 libxfs/xfs_group.h          |  131 ++++++++++++
 libxfs/xfs_health.h         |   51 ++---
 libxfs/xfs_ialloc.c         |  175 +++++++++-------
 libxfs/xfs_ialloc_btree.c   |   29 +--
 libxfs/xfs_inode_buf.c      |   90 ++++++++
 libxfs/xfs_inode_buf.h      |    3 
 libxfs/xfs_inode_util.c     |    6 -
 libxfs/xfs_log_format.h     |    2 
 libxfs/xfs_metadir.c        |  480 +++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_metadir.h        |   47 ++++
 libxfs/xfs_metafile.c       |   52 +++++
 libxfs/xfs_metafile.h       |   31 +++
 libxfs/xfs_ondisk.h         |    2 
 libxfs/xfs_refcount.c       |   33 +--
 libxfs/xfs_refcount.h       |    2 
 libxfs/xfs_refcount_btree.c |   17 +-
 libxfs/xfs_rmap.c           |   42 ++--
 libxfs/xfs_rmap.h           |    6 -
 libxfs/xfs_rmap_btree.c     |   28 +--
 libxfs/xfs_sb.c             |   54 ++++-
 libxfs/xfs_types.c          |    9 -
 libxfs/xfs_types.h          |   10 +
 repair/agbtree.c            |   27 +-
 repair/bmap_repair.c        |   11 -
 repair/bulkload.c           |    9 -
 repair/dino_chunks.c        |    2 
 repair/dinode.c             |   12 +
 repair/phase2.c             |   17 +-
 repair/phase5.c             |    6 -
 repair/rmap.c               |   12 +
 65 files changed, 2032 insertions(+), 705 deletions(-)
 create mode 100644 libxfs/xfs_group.c
 create mode 100644 libxfs/xfs_group.h
 create mode 100644 libxfs/xfs_metadir.c
 create mode 100644 libxfs/xfs_metadir.h
 create mode 100644 libxfs/xfs_metafile.c
 create mode 100644 libxfs/xfs_metafile.h



* [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
  2024-12-23 21:34 ` [PATCHSET 1/8] xfsprogs: bug fixes for 6.12 Darrick J. Wong
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
@ 2024-12-23 21:35 ` Darrick J. Wong
  2024-12-23 21:46   ` [PATCH 01/41] libxfs: constify the xfs_inode predicates Darrick J. Wong
                     ` (40 more replies)
  2024-12-23 21:35 ` [PATCHSET v6.2 4/8] mkfs: make protofiles less janky Darrick J. Wong
                   ` (12 subsequent siblings)
  15 siblings, 41 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:35 UTC
  To: djwong, aalbersh; +Cc: hch, linux-xfs

Hi all,

This series delivers a new feature -- metadata inode directories.  This
is a separate directory tree (rooted in the superblock) that contains
only inodes that contain filesystem metadata.  Different metadata
objects can be looked up with regular paths.

We start by creating xfs_imeta_* functions to mediate access to metadata
inode pointers.  This enables the imeta code to abstract inode pointers,
whether they're the classic five in the superblock, or the much more
complex directory tree.  All current users of metadata inodes (rt+quota)
are converted to use the boilerplate code.

Next, we define the metadir on-disk format, which consists of marking
inodes with a new iflag that says they're metadata.  This we use to
prevent bulkstat and friends from ever getting their hands on fs
metadata.
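
As a rough illustration of the flag-based filtering described above, a
bulkstat-style walk could skip any inode carrying the metadata flag.
This is only a sketch: the flag bit, structure, and function names are
invented for the example and are not the on-disk format or the real
bulkstat code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real on-disk value is not shown here. */
#define DEMO_IFLAG_METADATA	(UINT64_C(1) << 0)

struct demo_inode {
	uint64_t	ino;
	uint64_t	flags;
};

static inline bool demo_is_metadata(const struct demo_inode *ip)
{
	return (ip->flags & DEMO_IFLAG_METADATA) != 0;
}

/*
 * Copy the inode numbers of all non-metadata inodes into out[], which
 * must have room for count entries.  Returns how many were copied.
 */
static inline size_t demo_bulkstat(const struct demo_inode *inodes,
				   size_t count, uint64_t *out)
{
	size_t n = 0;

	for (size_t i = 0; i < count; i++) {
		if (demo_is_metadata(&inodes[i]))
			continue;	/* never hand fs metadata to callers */
		out[n++] = inodes[i].ino;
	}
	return n;
}
```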

Finally, we implement metadir operations so that clients can create,
delete, zap, and look up metadata inodes by path.  Beware that much of
this code is only lightly used, because the five current users of
metadata inodes don't tend to change them very often.  This is likely to
change if and when the subvolume and multiple-rt-volume features get
written/merged/etc.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=metadata-directory-tree

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=metadata-directory-tree
---
Commits in this patchset:
 * libxfs: constify the xfs_inode predicates
 * libxfs: load metadata directory root at mount time
 * libxfs: enforce metadata inode flag
 * man2: document metadata directory flag in fsgeom ioctl
 * man: update scrub ioctl documentation for metadir
 * libfrog: report metadata directories in the geometry report
 * libfrog: allow METADIR in xfrog_bulkstat_single5
 * xfs_io: support scrubbing metadata directory paths
 * xfs_db: disable xfs_check when metadir is enabled
 * xfs_db: report metadir support for version command
 * xfs_db: don't obfuscate metadata directories and attributes
 * xfs_db: support metadata directories in the path command
 * xfs_db: show the metadata root directory when dumping superblocks
 * xfs_db: display di_metatype
 * xfs_db: drop the metadata checking code from blockget
 * xfs_io: support flag for limited bulkstat of the metadata directory
 * xfs_io: support scrubbing metadata directory paths
 * xfs_spaceman: report health of metadir inodes too
 * xfs_scrub: tread zero-length read verify as an IO error
 * xfs_scrub: scan metadata directories during phase 3
 * xfs_scrub: re-run metafile scrubbers during phase 5
 * xfs_repair: handle sb_metadirino correctly when zeroing supers
 * xfs_repair: dont check metadata directory dirent inumbers
 * xfs_repair: refactor fixing dotdot
 * xfs_repair: refactor marking of metadata inodes
 * xfs_repair: refactor root directory initialization
 * xfs_repair: refactor grabbing realtime metadata inodes
 * xfs_repair: check metadata inode flag
 * xfs_repair: use libxfs_metafile_iget for quota/rt inodes
 * xfs_repair: rebuild the metadata directory
 * xfs_repair: don't let metadata and regular files mix
 * xfs_repair: update incore metadata state whenever we create new files
 * xfs_repair: pass private data pointer to scan_lbtree
 * xfs_repair: mark space used by metadata files
 * xfs_repair: adjust keep_fsinos to handle metadata directories
 * xfs_repair: metadata dirs are never plausible root dirs
 * xfs_repair: drop all the metadata directory files during pass 4
 * xfs_repair: truncate and unmark orphaned metadata inodes
 * xfs_repair: do not count metadata directory files when doing quotacheck
 * xfs_repair: refactor generate_rtinfo
 * mkfs.xfs: enable metadata directories
---
 db/check.c                          |  290 -------------------
 db/field.c                          |    2 
 db/field.h                          |    1 
 db/inode.c                          |   86 +++++-
 db/inode.h                          |    2 
 db/metadump.c                       |  385 ++++++++++++-------------
 db/namei.c                          |   71 ++++-
 db/sb.c                             |   16 +
 include/xfs_inode.h                 |   12 -
 include/xfs_mount.h                 |    1 
 io/bulkstat.c                       |   16 +
 io/scrub.c                          |   62 ++++
 libfrog/bulkstat.c                  |    3 
 libfrog/fsgeom.c                    |    6 
 libfrog/scrub.c                     |   14 +
 libfrog/scrub.h                     |    2 
 libxfs/init.c                       |   26 ++
 libxfs/inode.c                      |    9 +
 libxfs/libxfs_api_defs.h            |    4 
 man/man2/ioctl_xfs_fsgeometry.2     |    3 
 man/man2/ioctl_xfs_scrub_metadata.2 |   44 +++
 man/man8/mkfs.xfs.8.in              |   11 +
 man/man8/xfs_db.8                   |   35 +-
 man/man8/xfs_io.8                   |   13 +
 mkfs/lts_4.19.conf                  |    1 
 mkfs/lts_5.10.conf                  |    1 
 mkfs/lts_5.15.conf                  |    1 
 mkfs/lts_5.4.conf                   |    1 
 mkfs/lts_6.1.conf                   |    1 
 mkfs/lts_6.12.conf                  |    1 
 mkfs/lts_6.6.conf                   |    1 
 mkfs/proto.c                        |   68 ++++
 mkfs/xfs_mkfs.c                     |   33 ++
 repair/agheader.c                   |   11 +
 repair/dino_chunks.c                |   43 +++
 repair/dinode.c                     |  196 +++++++++++--
 repair/dinode.h                     |    6 
 repair/dir2.c                       |   51 +++
 repair/globals.c                    |    8 -
 repair/globals.h                    |    8 -
 repair/incore.h                     |   63 +++-
 repair/incore_ino.c                 |    1 
 repair/phase1.c                     |    2 
 repair/phase2.c                     |   58 +++-
 repair/phase4.c                     |   18 +
 repair/phase5.c                     |   12 +
 repair/phase6.c                     |  541 +++++++++++++++++++++++------------
 repair/pptr.c                       |   94 ++++++
 repair/pptr.h                       |    2 
 repair/quotacheck.c                 |   22 +
 repair/rt.c                         |  189 ++++++++----
 repair/rt.h                         |   12 -
 repair/sb.c                         |    3 
 repair/scan.c                       |   43 ++-
 repair/scan.h                       |    7 
 repair/xfs_repair.c                 |   56 ++++
 scrub/inodes.c                      |   11 +
 scrub/inodes.h                      |    5 
 scrub/phase3.c                      |    7 
 scrub/phase5.c                      |  102 ++++++-
 scrub/phase6.c                      |   24 +-
 scrub/read_verify.c                 |    8 +
 scrub/scrub.c                       |   18 +
 scrub/scrub.h                       |    7 
 spaceman/health.c                   |    2 
 65 files changed, 1949 insertions(+), 903 deletions(-)



* [PATCHSET v6.2 4/8] mkfs: make protofiles less janky
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (2 preceding siblings ...)
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
@ 2024-12-23 21:35 ` Darrick J. Wong
  2024-12-23 21:57   ` [PATCH 1/4] libxfs: resync libxfs_alloc_file_space interface with the kernel Darrick J. Wong
                     ` (3 more replies)
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                   ` (11 subsequent siblings)
  15 siblings, 4 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:35 UTC
  To: djwong, aalbersh; +Cc: hch, linux-xfs

Hi all,

Here are some minor improvements to the protofile code in mkfs.  First,
we replace the file copy-in code with a SEEK_{DATA,HOLE} loop that
copies through a fixed-size buffer, which avoids allocating gigantic
heap buffers.  Next, we add the ability to pull in extended attributes.
Finally, we add a program to generate a protofile from a directory
tree.
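
To make the copy-in approach concrete, here is a minimal sketch of a
SEEK_DATA/SEEK_HOLE loop with a fixed-size buffer.  It copies between
two plain file descriptors as a stand-in; the real mkfs code writes into
an inode of the new filesystem instead.  The demo_* names and the 64k
buffer size are assumptions made for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for SEEK_DATA and SEEK_HOLE on glibc */
#endif
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_COPY_BUFSZ	(64 * 1024)	/* hypothetical fixed buffer size */

/* Write all of buf at offset pos, retrying after short writes. */
static int demo_write_all(int fd, const char *buf, size_t len, off_t pos)
{
	while (len > 0) {
		ssize_t done = pwrite(fd, buf, len, pos);

		if (done < 0 && errno == EINTR)
			continue;
		if (done <= 0) {
			if (done == 0)
				errno = EIO;
			return -1;
		}
		buf += done;
		len -= done;
		pos += done;
	}
	return 0;
}

/*
 * Copy only the data regions of src_fd to the same offsets in dst_fd,
 * then set the destination size so that a trailing hole is preserved.
 * Returns 0 on success or -1 with errno set.
 */
int demo_copy_sparse(int src_fd, int dst_fd)
{
	char buf[DEMO_COPY_BUFSZ];
	struct stat st;
	off_t data = 0, hole;

	if (fstat(src_fd, &st) < 0)
		return -1;

	while (data < st.st_size) {
		/* Find the next data region at or after the current offset. */
		data = lseek(src_fd, data, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)
				break;	/* nothing but a hole remains */
			return -1;
		}
		/* Find where that data region ends. */
		hole = lseek(src_fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		while (data < hole) {
			off_t left = hole - data;
			size_t want = left > (off_t)sizeof(buf) ?
					sizeof(buf) : (size_t)left;
			ssize_t got = pread(src_fd, buf, want, data);

			if (got < 0 && errno == EINTR)
				continue;
			if (got < 0)
				return -1;
			if (got == 0)
				break;	/* source shrank underneath us */
			if (demo_write_all(dst_fd, buf, got, data) < 0)
				return -1;
			data += got;
		}
	}
	return ftruncate(dst_fd, st.st_size);
}
```

Memory use stays bounded by the buffer size no matter how large the
source file is, and holes are never read or written.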

Originally this code refactoring was intended to support more efficient
construction of rt bitmap and summary files, but that was subsequently
refactored into libxfs.  However, the cleanups still make the protofile
code less disgusting, so they're being submitted anyway.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=protofiles

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=protofiles
---
Commits in this patchset:
 * libxfs: resync libxfs_alloc_file_space interface with the kernel
 * mkfs: support copying in large or sparse files
 * mkfs: support copying in xattrs
 * mkfs: add a utility to generate protofiles
---
 include/libxfs.h         |    6 +
 libxfs/util.c            |  207 +++++++++++++++++++++++++++--------
 man/man8/xfs_protofile.8 |   33 ++++++
 mkfs/Makefile            |   10 ++
 mkfs/proto.c             |  269 ++++++++++++++++++++++++++++++++++------------
 mkfs/xfs_protofile.in    |  152 ++++++++++++++++++++++++++
 6 files changed, 557 insertions(+), 120 deletions(-)
 create mode 100644 man/man8/xfs_protofile.8
 create mode 100644 mkfs/xfs_protofile.in



* [PATCHSET 5/8] xfsprogs: new code for 6.13
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (3 preceding siblings ...)
  2024-12-23 21:35 ` [PATCHSET v6.2 4/8] mkfs: make protofiles less janky Darrick J. Wong
@ 2024-12-23 21:35 ` Darrick J. Wong
  2024-12-23 21:58   ` [PATCH 01/52] xfs: create incore realtime group structures Darrick J. Wong
                     ` (51 more replies)
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                   ` (10 subsequent siblings)
  15 siblings, 52 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:35 UTC
  To: djwong, aalbersh
  Cc: cem, brauner, jlayton, rdunlap, stable, dan.carpenter, hch,
	leo.lilong, josef, dchinner, linux-xfs

Hi all,

New code for 6.13.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=xfs-6.13-merge

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=xfs-6.13-merge
---
Commits in this patchset:
 * xfs: create incore realtime group structures
 * xfs: define locking primitives for realtime groups
 * xfs: add a lockdep class key for rtgroup inodes
 * xfs: support caching rtgroup metadata inodes
 * xfs: add a xfs_bmap_free_rtblocks helper
 * xfs: move RT bitmap and summary information to the rtgroup
 * xfs: support creating per-RTG files in growfs
 * xfs: refactor xfs_rtbitmap_blockcount
 * xfs: refactor xfs_rtsummary_blockcount
 * xfs: make RT extent numbers relative to the rtgroup
 * libfrog: add memchr_inv
 * xfs: define the format of rt groups
 * xfs: update realtime super every time we update the primary fs super
 * xfs: export realtime group geometry via XFS_FSOP_GEOM
 * xfs: check that rtblock extents do not break rtsupers or rtgroups
 * xfs: add a helper to prevent bmap merges across rtgroup boundaries
 * xfs: add frextents to the lazysbcounters when rtgroups enabled
 * xfs: record rt group metadata errors in the health system
 * xfs: export the geometry of realtime groups to userspace
 * xfs: add block headers to realtime bitmap and summary blocks
 * xfs: encode the rtbitmap in big endian format
 * xfs: encode the rtsummary in big endian format
 * xfs: grow the realtime section when realtime groups are enabled
 * xfs: support logging EFIs for realtime extents
 * xfs: support error injection when freeing rt extents
 * xfs: use realtime EFI to free extents when rtgroups are enabled
 * xfs: don't merge ioends across RTGs
 * xfs: make the RT allocator rtgroup aware
 * xfs: scrub the realtime group superblock
 * xfs: scrub metadir paths for rtgroup metadata
 * xfs: mask off the rtbitmap and summary inodes when metadir in use
 * xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries
 * xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries
 * xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t
 * xfs: adjust min_block usage in xfs_verify_agbno
 * xfs: move the min and max group block numbers to xfs_group
 * xfs: implement busy extent tracking for rtgroups
 * xfs: use metadir for quota inodes
 * xfs: scrub quota file metapaths
 * xfs: enable metadata directory feature
 * xfs: convert struct typedefs in xfs_ondisk.h
 * xfs: separate space btree structures in xfs_ondisk.h
 * xfs: port ondisk structure checks from xfs/122 to the kernel
 * xfs: remove unknown compat feature check in superblock write validation
 * xfs: fix sparse inode limits on runt AG
 * xfs: switch to multigrain timestamps
 * xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay
 * xfs: return a 64-bit block count from xfs_btree_count_blocks
 * xfs: fix error bailout in xfs_rtginode_create
 * xfs: update btree keys correctly when _insrec splits an inode root block
 * xfs: fix sb_spino_align checks for large fsblock sizes
 * xfs: return from xfs_symlink_verify early on V4 filesystems
---
 db/block.c                  |    2 
 db/block.h                  |   16 -
 db/convert.c                |    1 
 db/faddr.c                  |    1 
 include/libxfs.h            |    2 
 include/platform_defs.h     |   33 ++
 include/xfs_mount.h         |   30 +-
 include/xfs_trace.h         |    7 
 include/xfs_trans.h         |    1 
 libfrog/util.c              |   14 +
 libfrog/util.h              |    4 
 libxfs/Makefile             |    2 
 libxfs/init.c               |   35 ++
 libxfs/libxfs_api_defs.h    |   16 +
 libxfs/libxfs_io.h          |    1 
 libxfs/libxfs_priv.h        |   34 --
 libxfs/rdwr.c               |   17 +
 libxfs/trans.c              |   29 ++
 libxfs/util.c               |    8 
 libxfs/xfs_ag.c             |   22 +
 libxfs/xfs_ag.h             |   16 -
 libxfs/xfs_alloc.c          |   15 +
 libxfs/xfs_alloc.h          |   12 +
 libxfs/xfs_bmap.c           |  124 ++++++--
 libxfs/xfs_btree.c          |   33 ++
 libxfs/xfs_btree.h          |    2 
 libxfs/xfs_defer.c          |    6 
 libxfs/xfs_defer.h          |    1 
 libxfs/xfs_dquot_buf.c      |  190 ++++++++++++
 libxfs/xfs_format.h         |   80 +++++
 libxfs/xfs_fs.h             |   32 ++
 libxfs/xfs_group.h          |   33 ++
 libxfs/xfs_health.h         |   42 ++-
 libxfs/xfs_ialloc.c         |   16 +
 libxfs/xfs_ialloc_btree.c   |    6 
 libxfs/xfs_log_format.h     |    6 
 libxfs/xfs_ondisk.h         |  186 +++++++++---
 libxfs/xfs_quota_defs.h     |   43 +++
 libxfs/xfs_rtbitmap.c       |  405 +++++++++++++++++--------
 libxfs/xfs_rtbitmap.h       |  247 ++++++++++-----
 libxfs/xfs_rtgroup.c        |  694 +++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtgroup.h        |  284 ++++++++++++++++++
 libxfs/xfs_sb.c             |  246 ++++++++++++++-
 libxfs/xfs_sb.h             |    6 
 libxfs/xfs_shared.h         |    4 
 libxfs/xfs_symlink_remote.c |    4 
 libxfs/xfs_trans_inode.c    |    6 
 libxfs/xfs_trans_resv.c     |    2 
 libxfs/xfs_types.c          |   35 ++
 libxfs/xfs_types.h          |    8 
 mkfs/proto.c                |   33 +-
 mkfs/xfs_mkfs.c             |    8 
 repair/dinode.c             |    4 
 repair/phase6.c             |  203 ++++++-------
 repair/rt.c                 |   34 --
 repair/rt.h                 |    4 
 56 files changed, 2728 insertions(+), 617 deletions(-)
 create mode 100644 libxfs/xfs_rtgroup.c
 create mode 100644 libxfs/xfs_rtgroup.h



* [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (4 preceding siblings ...)
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
@ 2024-12-23 21:35 ` Darrick J. Wong
  2024-12-23 22:12   ` [PATCH 01/51] libxfs: remove XFS_ILOCK_RT* Darrick J. Wong
                     ` (50 more replies)
  2024-12-23 21:36 ` [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
                   ` (9 subsequent siblings)
  15 siblings, 51 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:35 UTC
  To: djwong, aalbersh; +Cc: hch, linux-xfs

Hi all,

Right now, the realtime section uses a single pair of metadata inodes to
store the free space information.  This presents a scalability problem
since every thread trying to allocate or free rt extents has to lock
these files.  Solve this problem by sharding the realtime section into
separate realtime allocation groups.
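
One way to picture the sharding, in the spirit of the "segmented
address" patch title in this series: a realtime block number splits into
a group number in the high bits and a block offset within that group in
the low bits, so a thread only needs the metadata of the group it lands
in.  The sketch below is illustrative only; the structure, field names,
and the example shift are assumptions, not the real geometry code.

```c
#include <stdint.h>

/* Hypothetical geometry; blklog must be less than 64. */
struct demo_rt_geom {
	uint32_t	group_count;	/* number of realtime groups */
	uint8_t		blklog;		/* log2 of per-group address space */
};

/* Which realtime group does this segmented block number belong to? */
static inline uint32_t
demo_rtb_to_group(const struct demo_rt_geom *g, uint64_t rtb)
{
	return (uint32_t)(rtb >> g->blklog);
}

/* Block offset within that group. */
static inline uint64_t
demo_rtb_to_offset(const struct demo_rt_geom *g, uint64_t rtb)
{
	return rtb & ((UINT64_C(1) << g->blklog) - 1);
}

/* Build a segmented block number from a group and an offset. */
static inline uint64_t
demo_group_to_rtb(const struct demo_rt_geom *g, uint32_t group,
		  uint64_t offset)
{
	return ((uint64_t)group << g->blklog) | offset;
}
```

With a per-group lock keyed on the group number, two threads allocating
in different groups no longer contend on a single pair of files.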

While we're at it, define a superblock to be stamped into the start of
the rt section.  This enables utilities such as blkid to identify block
devices containing realtime sections, and avoids the situation where
anything written into block 0 of the realtime extent can be
misinterpreted as file data.

The best advantage for rtgroups will become evident later when we get to
adding rmap and reflink to the realtime volume, since the geometry
constraints are the same for rt groups and AGs.  Hence we can reuse all
that code directly.

This is a very large patchset, but it catches us up with 20 years of
technical debt that have accumulated.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-groups

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-groups

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-groups

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-groups
---
Commits in this patchset:
 * libxfs: remove XFS_ILOCK_RT*
 * libxfs: adjust xfs_fsb_to_db to handle segmented rtblocks
 * xfs_repair,mkfs: port to libxfs_rt{bitmap,summary}_create
 * libxfs: use correct rtx count to block count conversion
 * libfrog: scrub the realtime group superblock
 * man: document the rt group geometry ioctl
 * man: document rgextents geom field
 * libxfs: port userspace deferred log item to handle rtgroups
 * libxfs: implement some sanity checking for enormous rgcount
 * libfrog: support scrubbing rtgroup metadata paths
 * libfrog: report rt groups in output
 * libfrog: add bitmap_clear
 * xfs_logprint: report realtime EFIs
 * xfs_repair: adjust rtbitmap/rtsummary word updates to handle big endian values
 * xfs_repair: refactor phase4
 * xfs_repair: refactor offsetof+sizeof to offsetofend
 * xfs_repair: improve rtbitmap discrepancy reporting
 * xfs_repair: simplify rt_lock handling
 * xfs_repair: add a real per-AG bitmap abstraction
 * xfs_repair: support realtime groups
 * xfs_repair: find and clobber rtgroup bitmap and summary files
 * xfs_repair: support realtime superblocks
 * xfs_repair: repair rtbitmap and rtsummary block headers
 * xfs_db: enable the rtblock and rtextent commands for segmented rt block numbers
 * xfs_db: enable rtconvert to handle segmented rtblocks
 * xfs_db: listify the definition of enum typnm
 * xfs_db: support dumping realtime group data and superblocks
 * xfs_db: support changing the label and uuid of rt superblocks
 * xfs_db: enable conversion of rt space units
 * xfs_db: metadump metadir rt bitmap and summary files
 * xfs_db: metadump realtime devices
 * xfs_db: dump rt bitmap blocks
 * xfs_db: dump rt summary blocks
 * xfs_db: report rt group and block number in the bmap command
 * xfs_io: support scrubbing rtgroup metadata
 * xfs_io: support scrubbing rtgroup metadata paths
 * xfs_io: add a command to display allocation group information
 * xfs_io: add a command to display realtime group information
 * xfs_io: display rt group in verbose bmap output
 * xfs_io: display rt group in verbose fsmap output
 * xfs_mdrestore: refactor open-coded fd/is_file into a structure
 * xfs_mdrestore: restore rt group superblocks to realtime device
 * xfs_spaceman: report on realtime group health
 * xfs_scrub: scrub realtime allocation group metadata
 * xfs_scrub: check rtgroup metadata directory connections
 * xfs_scrub: cleanup fsmap keys initialization
 * xfs_scrub: call GETFSMAP for each rt group in parallel
 * xfs_scrub: trim realtime volumes too
 * xfs_scrub: use histograms to speed up phase 8 on the realtime volume
 * mkfs: add headers to realtime bitmap blocks
 * mkfs: format realtime groups
---
 db/Makefile                           |    1 
 db/block.c                            |   34 ++
 db/bmap.c                             |   56 +++-
 db/command.c                          |    2 
 db/convert.c                          |  118 +++++++-
 db/field.c                            |   20 +
 db/field.h                            |   10 +
 db/inode.c                            |   36 ++
 db/metadump.c                         |   59 ++++
 db/rtgroup.c                          |  154 ++++++++++
 db/rtgroup.h                          |   21 +
 db/sb.c                               |  117 +++++++-
 db/type.c                             |   16 +
 db/type.h                             |   32 ++
 db/xfs_metadump.sh                    |    5 
 include/xfs.h                         |   15 +
 include/xfs_metadump.h                |    8 +
 io/Makefile                           |    1 
 io/aginfo.c                           |  215 ++++++++++++++
 io/bmap.c                             |   27 +-
 io/fsmap.c                            |   22 +
 io/init.c                             |    1 
 io/io.h                               |    1 
 io/scrub.c                            |   81 +++++
 libfrog/bitmap.c                      |   25 +-
 libfrog/bitmap.h                      |    1 
 libfrog/div64.h                       |    6 
 libfrog/fsgeom.c                      |   24 +-
 libfrog/fsgeom.h                      |   16 +
 libfrog/scrub.c                       |   24 +-
 libfrog/scrub.h                       |    1 
 libfrog/util.c                        |   12 +
 libfrog/util.h                        |    1 
 libxfs/defer_item.c                   |   73 +++--
 libxfs/init.c                         |   46 +++
 libxfs/libxfs_api_defs.h              |    5 
 libxfs/libxfs_priv.h                  |    8 -
 libxfs/topology.c                     |   42 +++
 libxfs/topology.h                     |    3 
 libxfs/trans.c                        |    2 
 libxfs/util.c                         |    2 
 logprint/log_misc.c                   |    2 
 logprint/log_print_all.c              |    8 +
 logprint/log_redo.c                   |   57 +++-
 man/man2/ioctl_xfs_fsgeometry.2       |    6 
 man/man2/ioctl_xfs_rtgroup_geometry.2 |   99 ++++++
 man/man2/ioctl_xfs_scrub_metadata.2   |    9 +
 man/man8/mkfs.xfs.8.in                |   31 ++
 man/man8/xfs_db.8                     |   34 ++
 man/man8/xfs_io.8                     |   29 ++
 man/man8/xfs_mdrestore.8              |   10 +
 man/man8/xfs_metadump.8               |   11 +
 man/man8/xfs_spaceman.8               |    5 
 mdrestore/xfs_mdrestore.c             |  163 ++++++-----
 mkfs/proto.c                          |  139 ++++++---
 mkfs/xfs_mkfs.c                       |  281 ++++++++++++++++++
 repair/agheader.c                     |   27 --
 repair/agheader.h                     |   10 +
 repair/dino_chunks.c                  |   58 +++-
 repair/dinode.c                       |  162 +++++++----
 repair/dir2.c                         |   13 +
 repair/globals.c                      |    3 
 repair/globals.h                      |    6 
 repair/incore.c                       |  235 +++++++++++----
 repair/incore.h                       |   36 ++
 repair/incore_ext.c                   |    3 
 repair/phase2.c                       |   51 ++-
 repair/phase3.c                       |    4 
 repair/phase4.c                       |  221 ++++++++------
 repair/phase5.c                       |    2 
 repair/phase6.c                       |  180 +++++++++++-
 repair/rmap.c                         |    4 
 repair/rt.c                           |  506 +++++++++++++++++++++++++++++----
 repair/rt.h                           |   23 ++
 repair/sb.c                           |   41 +++
 repair/scan.c                         |   36 +-
 repair/xfs_repair.c                   |   19 +
 scrub/phase2.c                        |  124 ++++++--
 scrub/phase5.c                        |   24 +-
 scrub/phase6.c                        |   17 +
 scrub/phase7.c                        |    7 
 scrub/phase8.c                        |   36 ++
 scrub/repair.c                        |    1 
 scrub/scrub.c                         |    7 
 scrub/scrub.h                         |   13 +
 scrub/spacemap.c                      |  102 +++++--
 scrub/xfs_scrub.c                     |    2 
 scrub/xfs_scrub.h                     |    1 
 spaceman/health.c                     |   63 ++++
 89 files changed, 3545 insertions(+), 719 deletions(-)
 create mode 100644 db/rtgroup.c
 create mode 100644 db/rtgroup.h
 create mode 100644 io/aginfo.c
 create mode 100644 man/man2/ioctl_xfs_rtgroup_geometry.2



* [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (5 preceding siblings ...)
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
@ 2024-12-23 21:36 ` Darrick J. Wong
  2024-12-23 22:25   ` [PATCH 1/7] libfrog: scrub quota file metapaths Darrick J. Wong
                     ` (6 more replies)
  2024-12-23 21:36 ` [PATCHSET v6.2 8/8] xfsprogs: enable quota for realtime volumes Darrick J. Wong
                   ` (8 subsequent siblings)
  15 siblings, 7 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:36 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

Hi all,

Store the quota files in the metadata directory tree instead of the
superblock.  Since we're introducing a new incompat feature flag, let's
also make the mount process bring up quotas in whatever state they were
in when the filesystem was last unmounted, instead of requiring
sysadmins to remember that themselves.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=metadir-quotas

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=metadir-quotas

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=metadir-quotas
---
Commits in this patchset:
 * libfrog: scrub quota file metapaths
 * xfs_db: support metadir quotas
 * xfs_repair: refactor quota inumber handling
 * xfs_repair: hoist the secondary sb qflags handling
 * xfs_repair: support quota inodes in the metadata directory
 * xfs_repair: try not to trash qflags on metadir filesystems
 * mkfs: add quota flags when setting up filesystem
---
 db/dquot.c               |   59 +++++++++++++----
 libfrog/scrub.c          |   20 ++++++
 libxfs/libxfs_api_defs.h |    6 ++
 man/man8/mkfs.xfs.8.in   |   48 ++++++++++++++
 mkfs/xfs_mkfs.c          |  113 ++++++++++++++++++++++++++++++++
 repair/agheader.c        |  161 +++++++++++++++++++++++++---------------------
 repair/dinode.c          |   18 +++--
 repair/dir2.c            |   12 ++-
 repair/globals.c         |  111 ++++++++++++++++++++++++++++++--
 repair/globals.h         |   15 +++-
 repair/phase2.c          |    3 +
 repair/phase4.c          |  116 +++++++++++++++++----------------
 repair/phase6.c          |  128 +++++++++++++++++++++++++++++++++++--
 repair/quotacheck.c      |  118 ++++++++++++++++++++++++++++++----
 repair/quotacheck.h      |    3 +
 repair/sb.c              |    3 +
 repair/versions.c        |    9 +--
 repair/xfs_repair.c      |   13 +++-
 18 files changed, 753 insertions(+), 203 deletions(-)



* [PATCHSET v6.2 8/8] xfsprogs: enable quota for realtime volumes
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (6 preceding siblings ...)
  2024-12-23 21:36 ` [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
@ 2024-12-23 21:36 ` Darrick J. Wong
  2024-12-23 22:27   ` [PATCH 1/2] xfs_quota: report warning limits for realtime space quotas Darrick J. Wong
  2024-12-23 22:27   ` [PATCH 2/2] mkfs: enable rt quota options Darrick J. Wong
  2024-12-23 22:27 ` [GIT PULL 1/8] xfsprogs: bug fixes for 6.12 Darrick J. Wong
                   ` (7 subsequent siblings)
  15 siblings, 2 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:36 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

Hi all,

At some point, I realized that I've refactored enough of the quota code
in XFS that I should evaluate whether or not quota actually works on
realtime volumes.  It turns out that it nearly works: the only broken
pieces are chown, delayed allocation, and the reporting of project
quotas in the statvfs output for projinherit+rtinherit directories.

Fix these things and we can have realtime quotas again after 20 years.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-quotas

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-quotas

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-quotas
---
Commits in this patchset:
 * xfs_quota: report warning limits for realtime space quotas
 * mkfs: enable rt quota options
---
 include/xqm.h   |    5 ++++-
 mkfs/xfs_mkfs.c |    6 ------
 quota/state.c   |    1 +
 3 files changed, 5 insertions(+), 7 deletions(-)



* [PATCH 1/3] xfs_repair: fix maximum file offset comparison
  2024-12-23 21:34 ` [PATCHSET 1/8] xfsprogs: bug fixes for 6.12 Darrick J. Wong
@ 2024-12-23 21:36   ` Darrick J. Wong
  2024-12-23 21:36   ` [PATCH 2/3] man: fix ioctl_xfs_commit_range man page install Darrick J. Wong
  2024-12-23 21:37   ` [PATCH 3/3] man: document the -n parent mkfs option Darrick J. Wong
  2 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:36 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When run with rtinherit=1 and rextsize=28k, generic/525 trips over the
following block mapping:

data offset 2251799813685247 startblock 7 (0/7) count 1 flag 0
data offset 2251799813685248 startblock 8 (0/8) count 6 flag 1

with this error:

inode 155 - extent exceeds max offset - start 2251799813685248, count 6,
physical block 8

This is due to an incorrect check in xfs_repair, which tries to validate
that a block mapping cannot exceed what it thinks is the maximum file
offset.  Unfortunately, the check is wrong, because only br_startoff is
subject to the 2^52-1 limit -- not br_startoff + br_blockcount.

Nowadays libxfs provides a symbol XFS_MAX_FILEOFF for the maximum
allowable file block offset that can be mapped into a file.  Use this
instead of the open-coded logic in versions.c and correct all the other
checks.  Note that this problem only surfaced when rtgroups were enabled
because hch changed xfs_repair to use the same tree-based block state
data structure that we use for AGs when rtgroups are enabled.
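
As a rough illustration of why the old limit rejected the mapping above,
the sketch below reproduces the removed open-coded calculation from
versions.c and the old form of the dinode.c comparison.  It assumes a
4 KiB block size (sb_blocklog = 12), which is inferred from the offsets
in the report rather than stated anywhere; the function names are
hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Old open-coded limit removed from repair/versions.c, in fs blocks. */
static int64_t old_max_file_offset(unsigned int blocklog)
{
	return 0x7fffffffffffffffLL >> blocklog;
}

/*
 * Old form of the dinode.c test: nonzero if the last block of the
 * mapping lies beyond the open-coded limit.
 */
static int old_check_rejects(uint64_t startoff, uint64_t blockcount,
			     unsigned int blocklog)
{
	return startoff + blockcount - 1 >
			(uint64_t)old_max_file_offset(blocklog);
}
```

Under that assumption, old_max_file_offset(12) is 2251799813685247, so
the first mapping (offset 2251799813685247, count 1) passes and the
second (offset 2251799813685248, count 6) is flagged.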

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/dinode.c   |    2 +-
 repair/globals.c  |    1 -
 repair/globals.h  |    1 -
 repair/prefetch.c |    2 +-
 repair/versions.c |    7 +------
 5 files changed, 3 insertions(+), 10 deletions(-)


diff --git a/repair/dinode.c b/repair/dinode.c
index ac81c487a20b8a..2c9d9acfa10be5 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -491,7 +491,7 @@ _("inode %" PRIu64 " - bad extent overflows - start %" PRIu64 ", "
 		}
 		/* Ensure this extent does not extend beyond the max offset */
 		if (irec.br_startoff + irec.br_blockcount - 1 >
-							fs_max_file_offset) {
+							XFS_MAX_FILEOFF) {
 			do_warn(
 _("inode %" PRIu64 " - extent exceeds max offset - start %" PRIu64 ", "
   "count %" PRIu64 ", physical block %" PRIu64 "\n"),
diff --git a/repair/globals.c b/repair/globals.c
index c0c45df51d562a..7388090a7d39f3 100644
--- a/repair/globals.c
+++ b/repair/globals.c
@@ -83,7 +83,6 @@ int		inodes_per_block;
 unsigned int	glob_agcount;
 int		chunks_pblock;	/* # of 64-ino chunks per allocation */
 int		max_symlink_blocks;
-int64_t		fs_max_file_offset;
 
 /* realtime info */
 
diff --git a/repair/globals.h b/repair/globals.h
index 1eadfdbf9ae4f2..fa53502f98bbcd 100644
--- a/repair/globals.h
+++ b/repair/globals.h
@@ -124,7 +124,6 @@ extern int		inodes_per_block;
 extern unsigned int	glob_agcount;
 extern int		chunks_pblock;	/* # of 64-ino chunks per allocation */
 extern int		max_symlink_blocks;
-extern int64_t		fs_max_file_offset;
 
 /* realtime info */
 
diff --git a/repair/prefetch.c b/repair/prefetch.c
index 0772ecef9d73eb..5ecf19ae9cb111 100644
--- a/repair/prefetch.c
+++ b/repair/prefetch.c
@@ -185,7 +185,7 @@ pf_read_bmbt_reclist(
 
 		if (((i > 0) && (op + cp > irec.br_startoff)) ||
 				(irec.br_blockcount == 0) ||
-				(irec.br_startoff >= fs_max_file_offset))
+				(irec.br_startoff >= XFS_MAX_FILEOFF))
 			goto out_free;
 
 		if (!libxfs_verify_fsbno(mp, irec.br_startblock) ||
diff --git a/repair/versions.c b/repair/versions.c
index b24965b263a183..7dc91b4597eece 100644
--- a/repair/versions.c
+++ b/repair/versions.c
@@ -180,10 +180,5 @@ _("WARNING: you have a V1 inode filesystem. It would be converted to a\n"
 		fs_ino_alignment = mp->m_sb.sb_inoalignmt;
 	}
 
-	/*
-	 * calculate maximum file offset for this geometry
-	 */
-	fs_max_file_offset = 0x7fffffffffffffffLL >> mp->m_sb.sb_blocklog;
-
-	return(0);
+	return 0;
 }



* [PATCH 2/3] man: fix ioctl_xfs_commit_range man page install
  2024-12-23 21:34 ` [PATCHSET 1/8] xfsprogs: bug fixes for 6.12 Darrick J. Wong
  2024-12-23 21:36   ` [PATCH 1/3] xfs_repair: fix maximum file offset comparison Darrick J. Wong
@ 2024-12-23 21:36   ` Darrick J. Wong
  2024-12-23 21:37   ` [PATCH 3/3] man: document the -n parent mkfs option Darrick J. Wong
  2 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:36 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: jpalus, linux-xfs

From: Jan Palus <jpalus@fastmail.com>

INSTALL_MAN uses the first symbol in the .SH NAME section for both the
source and destination filenames, so it must match the current filename.
Since ioctl_xfs_commit_range.2 documents both ioctl_xfs_start_commit and
ioctl_xfs_commit_range, ensure they are listed in the order INSTALL_MAN
expects.

Signed-off-by: Jan Palus <jpalus@fastmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 man/man2/ioctl_xfs_commit_range.2 |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/man/man2/ioctl_xfs_commit_range.2 b/man/man2/ioctl_xfs_commit_range.2
index 3244e52c3e0946..4cd074ed00c6f2 100644
--- a/man/man2/ioctl_xfs_commit_range.2
+++ b/man/man2/ioctl_xfs_commit_range.2
@@ -22,8 +22,8 @@
 .\" %%%LICENSE_END
 .TH IOCTL-XFS-COMMIT-RANGE 2  2024-02-18 "XFS"
 .SH NAME
-ioctl_xfs_start_commit \- prepare to exchange the contents of two files
 ioctl_xfs_commit_range \- conditionally exchange the contents of parts of two files
+ioctl_xfs_start_commit \- prepare to exchange the contents of two files
 .SH SYNOPSIS
 .br
 .B #include <sys/ioctl.h>



* [PATCH 3/3] man: document the -n parent mkfs option
  2024-12-23 21:34 ` [PATCHSET 1/8] xfsprogs: bug fixes for 6.12 Darrick J. Wong
  2024-12-23 21:36   ` [PATCH 1/3] xfs_repair: fix maximum file offset comparison Darrick J. Wong
  2024-12-23 21:36   ` [PATCH 2/3] man: fix ioctl_xfs_commit_range man page install Darrick J. Wong
@ 2024-12-23 21:37   ` Darrick J. Wong
  2 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:37 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Document the -n parent option to mkfs.xfs so that users will actually
know how to turn on directory parent pointers.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 man/man8/mkfs.xfs.8.in |   12 ++++++++++++
 1 file changed, 12 insertions(+)


diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index a854b0e87cb1a2..e56c8f31a52c78 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -902,6 +902,18 @@ .SH OPTIONS
 enabled, and cannot be turned off.
 .IP
 In other words, this option is only tunable on the deprecated V4 format.
+.TP
+.BI parent= value
+This feature creates back references from child files to parent directories
+so that online repair can reconstruct broken directory files.
+The value is either 0 to disable the feature, or 1 to create parent pointers.
+
+By default,
+.B mkfs.xfs
+will not create parent pointers.
+This feature is only available for filesystems created with the (default)
+.B \-m crc=1
+option set.
 .RE
 .PP
 .PD 0



* [PATCH 01/36] xfs: remove the redundant xfs_alloc_log_agf
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
@ 2024-12-23 21:37   ` Darrick J. Wong
  2024-12-23 21:37   ` [PATCH 02/36] xfs: sb_spino_align is not verified Darrick J. Wong
                     ` (34 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:37 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: leo.lilong, dchinner, cem, hch, linux-xfs

From: Long Li <leo.lilong@huawei.com>

Source kernel commit: 8b9b261594d8ef218ef4d0e732dad153f82aab49

There are two invocations of xfs_alloc_log_agf in xfs_alloc_put_freelist.
The AGF does not change between the two calls. Although this does not pose
any practical problems, it seems like a small mistake. Therefore, fix it
by removing the first xfs_alloc_log_agf invocation.

Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_alloc.c |    2 --
 1 file changed, 2 deletions(-)


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index f0635b17f18548..355f8ef1a872d3 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -3152,8 +3152,6 @@ xfs_alloc_put_freelist(
 		logflags |= XFS_AGF_BTREEBLKS;
 	}
 
-	xfs_alloc_log_agf(tp, agbp, logflags);
-
 	ASSERT(be32_to_cpu(agf->agf_flcount) <= xfs_agfl_size(mp));
 
 	agfl_bno = xfs_buf_to_agfl_bno(agflbp);



* [PATCH 02/36] xfs: sb_spino_align is not verified
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
  2024-12-23 21:37   ` [PATCH 01/36] xfs: remove the redundant xfs_alloc_log_agf Darrick J. Wong
@ 2024-12-23 21:37   ` Darrick J. Wong
  2024-12-23 21:37   ` [PATCH 03/36] xfs: remove the unused pagb_count field in struct xfs_perag Darrick J. Wong
                     ` (33 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:37 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: dchinner, cem, hch, linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Source kernel commit: 59e43f5479cce106d71c0b91a297c7ad1913176c

It's just read in from the superblock and used without doing any
validity checks at all on the value.
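
The rule that the new verifier code enforces can be restated as a small
standalone predicate.  This is only a sketch: the function and parameter
names are hypothetical, and the "sparse" flag stands in for the enclosing
condition, which is not visible in the hunk and is assumed here to be the
sparse inodes feature test.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if spino_align is acceptable.  With sparse inodes, it
 * must be nonzero, no larger than inoalignmt, and an exact divisor of
 * inoalignmt.  Without sparse inodes, it must be zero.
 */
static bool spino_align_ok(bool sparse, uint32_t inoalignmt,
			   uint32_t spino_align)
{
	if (!sparse)
		return spino_align == 0;

	return spino_align != 0 &&
	       spino_align <= inoalignmt &&
	       inoalignmt % spino_align == 0;
}
```

For example, an inode alignment of 8 with a sparse alignment of 4 is
accepted, whereas sparse alignments of 0, 3, or 16 are not.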

Fixes: fb4f2b4e5a82 ("xfs: add sparse inode chunk alignment superblock field")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_sb.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)


diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 0603e5087f2e46..0d98b8a344209e 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -395,6 +395,20 @@ xfs_validate_sb_common(
 					 sbp->sb_inoalignmt, align);
 				return -EINVAL;
 			}
+
+			if (!sbp->sb_spino_align ||
+			    sbp->sb_spino_align > sbp->sb_inoalignmt ||
+			    (sbp->sb_inoalignmt % sbp->sb_spino_align) != 0) {
+				xfs_warn(mp,
+				"Sparse inode alignment (%u) is invalid.",
+					sbp->sb_spino_align);
+				return -EINVAL;
+			}
+		} else if (sbp->sb_spino_align) {
+			xfs_warn(mp,
+				"Sparse inode alignment (%u) should be zero.",
+				sbp->sb_spino_align);
+			return -EINVAL;
 		}
 	} else if (sbp->sb_qflags & (XFS_PQUOTA_ENFD | XFS_GQUOTA_ENFD |
 				XFS_PQUOTA_CHKD | XFS_GQUOTA_CHKD)) {



* [PATCH 03/36] xfs: remove the unused pagb_count field in struct xfs_perag
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
  2024-12-23 21:37   ` [PATCH 01/36] xfs: remove the redundant xfs_alloc_log_agf Darrick J. Wong
  2024-12-23 21:37   ` [PATCH 02/36] xfs: sb_spino_align is not verified Darrick J. Wong
@ 2024-12-23 21:37   ` Darrick J. Wong
  2024-12-23 21:38   ` [PATCH 04/36] xfs: remove the unused pag_active_wq " Darrick J. Wong
                     ` (32 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:37 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 4e071d79e477189a6c318f598634799e50921994

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag.c |    1 -
 libxfs/xfs_ag.h |    1 -
 2 files changed, 2 deletions(-)


diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index 79ee483b42695a..975b139ca497a2 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -325,7 +325,6 @@ xfs_initialize_perag(
 		xfs_defer_drain_init(&pag->pag_intents_drain);
 		init_waitqueue_head(&pag->pagb_wait);
 		init_waitqueue_head(&pag->pag_active_wq);
-		pag->pagb_count = 0;
 		pag->pagb_tree = RB_ROOT;
 		xfs_hooks_init(&pag->pag_rmap_update_hooks);
 #endif /* __KERNEL__ */
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 9edfe0e9643964..79149a5ec44e9a 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -55,7 +55,6 @@ struct xfs_perag {
 	xfs_agino_t	pagl_leftrec;
 	xfs_agino_t	pagl_rightrec;
 
-	int		pagb_count;	/* pagb slots in use */
 	uint8_t		pagf_refcount_level; /* recount btree height */
 
 	/* Blocks reserved for all kinds of metadata. */



* [PATCH 04/36] xfs: remove the unused pag_active_wq field in struct xfs_perag
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-12-23 21:37   ` [PATCH 03/36] xfs: remove the unused pagb_count field in struct xfs_perag Darrick J. Wong
@ 2024-12-23 21:38   ` Darrick J. Wong
  2024-12-23 21:38   ` [PATCH 05/36] xfs: pass a pag to xfs_difree_inode_chunk Darrick J. Wong
                     ` (31 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:38 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 9943b45732905a70496fc44368ab85b230c70db4

pag_active_wq is only woken, but never waited for.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag.c |    4 +---
 libxfs/xfs_ag.h |    1 -
 2 files changed, 1 insertion(+), 4 deletions(-)


diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index 975b139ca497a2..62fc21fe7109b9 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -105,8 +105,7 @@ xfs_perag_rele(
 	struct xfs_perag	*pag)
 {
 	trace_xfs_perag_rele(pag, _RET_IP_);
-	if (atomic_dec_and_test(&pag->pag_active_ref))
-		wake_up(&pag->pag_active_wq);
+	atomic_dec(&pag->pag_active_ref);
 }
 
 /*
@@ -324,7 +323,6 @@ xfs_initialize_perag(
 		INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
 		xfs_defer_drain_init(&pag->pag_intents_drain);
 		init_waitqueue_head(&pag->pagb_wait);
-		init_waitqueue_head(&pag->pag_active_wq);
 		pag->pagb_tree = RB_ROOT;
 		xfs_hooks_init(&pag->pag_rmap_update_hooks);
 #endif /* __KERNEL__ */
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 79149a5ec44e9a..958ca82524292f 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -34,7 +34,6 @@ struct xfs_perag {
 	xfs_agnumber_t	pag_agno;	/* AG this structure belongs to */
 	atomic_t	pag_ref;	/* passive reference count */
 	atomic_t	pag_active_ref;	/* active reference count */
-	wait_queue_head_t pag_active_wq;/* woken active_ref falls to zero */
 	unsigned long	pag_opstate;
 	uint8_t		pagf_bno_level;	/* # of levels in bno btree */
 	uint8_t		pagf_cnt_level;	/* # of levels in cnt btree */



* [PATCH 05/36] xfs: pass a pag to xfs_difree_inode_chunk
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-12-23 21:38   ` [PATCH 04/36] xfs: remove the unused pag_active_wq " Darrick J. Wong
@ 2024-12-23 21:38   ` Darrick J. Wong
  2024-12-23 21:38   ` [PATCH 06/36] xfs: remove the agno argument to xfs_free_ag_extent Darrick J. Wong
                     ` (30 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:38 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 67ce5ba575354da1542e0579fb8c7a871cbf57b3

We'll want to use more than just the agno field in a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ialloc.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 43af698fa90903..10d88eb0b5bc32 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -1969,10 +1969,11 @@ xfs_dialloc(
 static int
 xfs_difree_inode_chunk(
 	struct xfs_trans		*tp,
-	xfs_agnumber_t			agno,
+	struct xfs_perag		*pag,
 	struct xfs_inobt_rec_incore	*rec)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
+	xfs_agnumber_t			agno = pag->pag_agno;
 	xfs_agblock_t			sagbno = XFS_AGINO_TO_AGBNO(mp,
 							rec->ir_startino);
 	int				startidx, endidx;
@@ -2143,7 +2144,7 @@ xfs_difree_inobt(
 			goto error0;
 		}
 
-		error = xfs_difree_inode_chunk(tp, pag->pag_agno, &rec);
+		error = xfs_difree_inode_chunk(tp, pag, &rec);
 		if (error)
 			goto error0;
 	} else {



* [PATCH 06/36] xfs: remove the agno argument to xfs_free_ag_extent
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-12-23 21:38   ` [PATCH 05/36] xfs: pass a pag to xfs_difree_inode_chunk Darrick J. Wong
@ 2024-12-23 21:38   ` Darrick J. Wong
  2024-12-23 21:38   ` [PATCH 07/36] xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers Darrick J. Wong
                     ` (29 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:38 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: db129fa01113f767d5b7a6fd339114a962023464

xfs_free_ag_extent already has a pointer to the pag structure through
the agf buffer.  Use that instead of passing the redundant argument,
and do the same for the tracepoint.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_trace.h |    2 +-
 libxfs/defer_item.c |    4 ++--
 libxfs/xfs_alloc.c  |   10 ++++------
 libxfs/xfs_alloc.h  |    5 ++---
 4 files changed, 9 insertions(+), 12 deletions(-)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index f6d6a6ea1af9e1..ba51419b3df3d3 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -79,7 +79,7 @@
 #define trace_xfs_btree_free_block(...)		((void) 0)
 #define trace_xfs_btree_alloc_block(...)	((void) 0)
 
-#define trace_xfs_free_extent(a,b,c,d,e,f,g)	((void) 0)
+#define trace_xfs_free_extent(...)		((void) 0)
 #define trace_xfs_agf(a,b,c,d)			((void) 0)
 #define trace_xfs_read_agf(a,b)			((void) 0)
 #define trace_xfs_alloc_read_agf(a,b)		((void) 0)
diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index 2b48ed14d67bcb..d5e075362ababe 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -181,8 +181,8 @@ xfs_agfl_free_finish_item(
 
 	error = xfs_alloc_read_agf(xefi->xefi_pag, tp, 0, &agbp);
 	if (!error)
-		error = xfs_free_ag_extent(tp, agbp, xefi->xefi_pag->pag_agno,
-				agbno, 1, &oinfo, XFS_AG_RESV_AGFL);
+		error = xfs_free_ag_extent(tp, agbp, agbno, 1, &oinfo,
+				XFS_AG_RESV_AGFL);
 
 	xfs_extent_free_cancel_item(item);
 	return error;
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 355f8ef1a872d3..1f4740cced73a1 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2033,7 +2033,6 @@ int
 xfs_free_ag_extent(
 	struct xfs_trans		*tp,
 	struct xfs_buf			*agbp,
-	xfs_agnumber_t			agno,
 	xfs_agblock_t			bno,
 	xfs_extlen_t			len,
 	const struct xfs_owner_info	*oinfo,
@@ -2354,19 +2353,19 @@ xfs_free_ag_extent(
 	 * Update the freespace totals in the ag and superblock.
 	 */
 	error = xfs_alloc_update_counters(tp, agbp, len);
-	xfs_ag_resv_free_extent(agbp->b_pag, type, tp, len);
+	xfs_ag_resv_free_extent(pag, type, tp, len);
 	if (error)
 		goto error0;
 
 	XFS_STATS_INC(mp, xs_freex);
 	XFS_STATS_ADD(mp, xs_freeb, len);
 
-	trace_xfs_free_extent(mp, agno, bno, len, type, haveleft, haveright);
+	trace_xfs_free_extent(pag, bno, len, type, haveleft, haveright);
 
 	return 0;
 
  error0:
-	trace_xfs_free_extent(mp, agno, bno, len, type, -1, -1);
+	trace_xfs_free_extent(pag, bno, len, type, -1, -1);
 	if (bno_cur)
 		xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
 	if (cnt_cur)
@@ -4006,8 +4005,7 @@ __xfs_free_extent(
 		goto err_release;
 	}
 
-	error = xfs_free_ag_extent(tp, agbp, pag->pag_agno, agbno, len, oinfo,
-			type);
+	error = xfs_free_ag_extent(tp, agbp, agbno, len, oinfo, type);
 	if (error)
 		goto err_release;
 
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 0165452e7cd055..88fbce5001185f 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -79,9 +79,8 @@ int xfs_alloc_put_freelist(struct xfs_perag *pag, struct xfs_trans *tp,
 		struct xfs_buf *agfbp, struct xfs_buf *agflbp,
 		xfs_agblock_t bno, int btreeblk);
 int xfs_free_ag_extent(struct xfs_trans *tp, struct xfs_buf *agbp,
-		xfs_agnumber_t agno, xfs_agblock_t bno,
-		xfs_extlen_t len, const struct xfs_owner_info *oinfo,
-		enum xfs_ag_resv_type type);
+		xfs_agblock_t bno, xfs_extlen_t len,
+		const struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type);
 
 /*
  * Compute and fill in value of m_alloc_maxlevels.


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 07/36] xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-12-23 21:38   ` [PATCH 06/36] xfs: remove the agno argument to xfs_free_ag_extent Darrick J. Wong
@ 2024-12-23 21:38   ` Darrick J. Wong
  2024-12-23 21:39   ` [PATCH 08/36] xfs: add a xfs_agino_to_ino helper Darrick J. Wong
                     ` (28 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:38 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 856a920ac2bbb2352ef6aa9e1e052f2e80677df7

Add helpers to convert an agbno to a daddr or fsbno based on a pag
structure.

This provides a simpler conversion and better type safety compared to the
existing code that passes the mount structure and the agno separately.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag.c             |    2 +-
 libxfs/xfs_ag.h             |   16 ++++++++++++++++
 libxfs/xfs_alloc.c          |   14 ++++++--------
 libxfs/xfs_btree.c          |    7 +++----
 libxfs/xfs_ialloc.c         |   29 +++++++++++++----------------
 libxfs/xfs_ialloc_btree.c   |    2 +-
 libxfs/xfs_refcount.c       |   11 ++++-------
 libxfs/xfs_refcount_btree.c |    3 +--
 8 files changed, 45 insertions(+), 39 deletions(-)


diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index 62fc21fe7109b9..a6b5a7d71bbf80 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -867,7 +867,7 @@ xfs_ag_shrink_space(
 
 	/* internal log shouldn't also show up in the free space btrees */
 	error = xfs_alloc_vextent_exact_bno(&args,
-			XFS_AGB_TO_FSB(mp, pag->pag_agno, aglen - delta));
+			xfs_agbno_to_fsb(pag, aglen - delta));
 	if (!error && args.agbno == NULLAGBLOCK)
 		error = -ENOSPC;
 
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 958ca82524292f..c0a30141ddc330 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -330,4 +330,20 @@ int xfs_ag_extend_space(struct xfs_perag *pag, struct xfs_trans *tp,
 			xfs_extlen_t len);
 int xfs_ag_get_geometry(struct xfs_perag *pag, struct xfs_ag_geometry *ageo);
 
+static inline xfs_fsblock_t
+xfs_agbno_to_fsb(
+	struct xfs_perag	*pag,
+	xfs_agblock_t		agbno)
+{
+	return XFS_AGB_TO_FSB(pag->pag_mount, pag->pag_agno, agbno);
+}
+
+static inline xfs_daddr_t
+xfs_agbno_to_daddr(
+	struct xfs_perag	*pag,
+	xfs_agblock_t		agbno)
+{
+	return XFS_AGB_TO_DADDR(pag->pag_mount, pag->pag_agno, agbno);
+}
+
 #endif /* __LIBXFS_AG_H */
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 1f4740cced73a1..19b38eaf45dd07 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -1255,7 +1255,7 @@ xfs_alloc_ag_vextent_small(
 		struct xfs_buf	*bp;
 
 		error = xfs_trans_get_buf(args->tp, args->mp->m_ddev_targp,
-				XFS_AGB_TO_DADDR(args->mp, args->agno, fbno),
+				xfs_agbno_to_daddr(args->pag, fbno),
 				args->mp->m_bsize, 0, &bp);
 		if (error)
 			goto error;
@@ -2929,9 +2929,8 @@ xfs_alloc_fix_freelist(
 		 * Deferring the free disconnects freeing up the AGFL slot from
 		 * freeing the block.
 		 */
-		error = xfs_free_extent_later(tp,
-				XFS_AGB_TO_FSB(mp, args->agno, bno), 1,
-				&targs.oinfo, XFS_AG_RESV_AGFL, 0);
+		error = xfs_free_extent_later(tp, xfs_agbno_to_fsb(pag, bno),
+				1, &targs.oinfo, XFS_AG_RESV_AGFL, 0);
 		if (error)
 			goto out_agbp_relse;
 	}
@@ -3590,7 +3589,7 @@ xfs_alloc_vextent_finish(
 		goto out_drop_perag;
 	}
 
-	args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args->agbno);
+	args->fsbno = xfs_agbno_to_fsb(args->pag, args->agbno);
 
 	ASSERT(args->len >= args->minlen);
 	ASSERT(args->len <= args->maxlen);
@@ -3642,7 +3641,6 @@ xfs_alloc_vextent_this_ag(
 	struct xfs_alloc_arg	*args,
 	xfs_agnumber_t		agno)
 {
-	struct xfs_mount	*mp = args->mp;
 	xfs_agnumber_t		minimum_agno;
 	uint32_t		alloc_flags = 0;
 	int			error;
@@ -3655,8 +3653,8 @@ xfs_alloc_vextent_this_ag(
 
 	trace_xfs_alloc_vextent_this_ag(args);
 
-	error = xfs_alloc_vextent_check_args(args, XFS_AGB_TO_FSB(mp, agno, 0),
-			&minimum_agno);
+	error = xfs_alloc_vextent_check_args(args,
+			xfs_agbno_to_fsb(args->pag, 0), &minimum_agno);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index bb53b6d7af22f6..4f04a92f6513bf 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -1015,21 +1015,20 @@ xfs_btree_readahead_agblock(
 	struct xfs_btree_block	*block)
 {
 	struct xfs_mount	*mp = cur->bc_mp;
-	xfs_agnumber_t		agno = cur->bc_ag.pag->pag_agno;
 	xfs_agblock_t		left = be32_to_cpu(block->bb_u.s.bb_leftsib);
 	xfs_agblock_t		right = be32_to_cpu(block->bb_u.s.bb_rightsib);
 	int			rval = 0;
 
 	if ((lr & XFS_BTCUR_LEFTRA) && left != NULLAGBLOCK) {
 		xfs_buf_readahead(mp->m_ddev_targp,
-				XFS_AGB_TO_DADDR(mp, agno, left),
+				xfs_agbno_to_daddr(cur->bc_ag.pag, left),
 				mp->m_bsize, cur->bc_ops->buf_ops);
 		rval++;
 	}
 
 	if ((lr & XFS_BTCUR_RIGHTRA) && right != NULLAGBLOCK) {
 		xfs_buf_readahead(mp->m_ddev_targp,
-				XFS_AGB_TO_DADDR(mp, agno, right),
+				xfs_agbno_to_daddr(cur->bc_ag.pag, right),
 				mp->m_bsize, cur->bc_ops->buf_ops);
 		rval++;
 	}
@@ -1089,7 +1088,7 @@ xfs_btree_ptr_to_daddr(
 
 	switch (cur->bc_ops->type) {
 	case XFS_BTREE_TYPE_AG:
-		*daddr = XFS_AGB_TO_DADDR(cur->bc_mp, cur->bc_ag.pag->pag_agno,
+		*daddr = xfs_agbno_to_daddr(cur->bc_ag.pag,
 				be32_to_cpu(ptr->s));
 		break;
 	case XFS_BTREE_TYPE_INODE:
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 10d88eb0b5bc32..6694ee2370411a 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -763,8 +763,7 @@ xfs_ialloc_ag_alloc(
 		/* Allow space for the inode btree to split. */
 		args.minleft = igeo->inobt_maxlevels;
 		error = xfs_alloc_vextent_exact_bno(&args,
-				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
-						args.agbno));
+				xfs_agbno_to_fsb(pag, args.agbno));
 		if (error)
 			return error;
 
@@ -806,8 +805,8 @@ xfs_ialloc_ag_alloc(
 		 */
 		args.minleft = igeo->inobt_maxlevels;
 		error = xfs_alloc_vextent_near_bno(&args,
-				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
-						be32_to_cpu(agi->agi_root)));
+				xfs_agbno_to_fsb(pag,
+					be32_to_cpu(agi->agi_root)));
 		if (error)
 			return error;
 	}
@@ -819,8 +818,8 @@ xfs_ialloc_ag_alloc(
 	if (isaligned && args.fsbno == NULLFSBLOCK) {
 		args.alignment = igeo->cluster_align;
 		error = xfs_alloc_vextent_near_bno(&args,
-				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
-						be32_to_cpu(agi->agi_root)));
+				xfs_agbno_to_fsb(pag,
+					be32_to_cpu(agi->agi_root)));
 		if (error)
 			return error;
 	}
@@ -855,8 +854,8 @@ xfs_ialloc_ag_alloc(
 				 igeo->ialloc_blks;
 
 		error = xfs_alloc_vextent_near_bno(&args,
-				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
-						be32_to_cpu(agi->agi_root)));
+				xfs_agbno_to_fsb(pag,
+					be32_to_cpu(agi->agi_root)));
 		if (error)
 			return error;
 
@@ -1973,7 +1972,6 @@ xfs_difree_inode_chunk(
 	struct xfs_inobt_rec_incore	*rec)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
-	xfs_agnumber_t			agno = pag->pag_agno;
 	xfs_agblock_t			sagbno = XFS_AGINO_TO_AGBNO(mp,
 							rec->ir_startino);
 	int				startidx, endidx;
@@ -1984,8 +1982,7 @@ xfs_difree_inode_chunk(
 
 	if (!xfs_inobt_issparse(rec->ir_holemask)) {
 		/* not sparse, calculate extent info directly */
-		return xfs_free_extent_later(tp,
-				XFS_AGB_TO_FSB(mp, agno, sagbno),
+		return xfs_free_extent_later(tp, xfs_agbno_to_fsb(pag, sagbno),
 				M_IGEO(mp)->ialloc_blks, &XFS_RMAP_OINFO_INODES,
 				XFS_AG_RESV_NONE, 0);
 	}
@@ -2031,9 +2028,9 @@ xfs_difree_inode_chunk(
 
 		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
 		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
-		error = xfs_free_extent_later(tp,
-				XFS_AGB_TO_FSB(mp, agno, agbno), contigblk,
-				&XFS_RMAP_OINFO_INODES, XFS_AG_RESV_NONE, 0);
+		error = xfs_free_extent_later(tp, xfs_agbno_to_fsb(pag, agbno),
+				contigblk, &XFS_RMAP_OINFO_INODES,
+				XFS_AG_RESV_NONE, 0);
 		if (error)
 			return error;
 
@@ -2503,7 +2500,7 @@ xfs_imap(
 		offset = XFS_INO_TO_OFFSET(mp, ino);
 		ASSERT(offset < mp->m_sb.sb_inopblock);
 
-		imap->im_blkno = XFS_AGB_TO_DADDR(mp, pag->pag_agno, agbno);
+		imap->im_blkno = xfs_agbno_to_daddr(pag, agbno);
 		imap->im_len = XFS_FSB_TO_BB(mp, 1);
 		imap->im_boffset = (unsigned short)(offset <<
 							mp->m_sb.sb_inodelog);
@@ -2533,7 +2530,7 @@ xfs_imap(
 	offset = ((agbno - cluster_agbno) * mp->m_sb.sb_inopblock) +
 		XFS_INO_TO_OFFSET(mp, ino);
 
-	imap->im_blkno = XFS_AGB_TO_DADDR(mp, pag->pag_agno, cluster_agbno);
+	imap->im_blkno = xfs_agbno_to_daddr(pag, cluster_agbno);
 	imap->im_len = XFS_FSB_TO_BB(mp, M_IGEO(mp)->blocks_per_cluster);
 	imap->im_boffset = (unsigned short)(offset << mp->m_sb.sb_inodelog);
 
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index ffca4a80219d6d..f80368b4d5fa5f 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -119,7 +119,7 @@ __xfs_inobt_alloc_block(
 	args.resv = resv;
 
 	error = xfs_alloc_vextent_near_bno(&args,
-			XFS_AGB_TO_FSB(args.mp, args.pag->pag_agno, sbno));
+			xfs_agbno_to_fsb(args.pag, sbno));
 	if (error)
 		return error;
 
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 22f8afb27ece1c..eeddec68a08539 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -1153,8 +1153,7 @@ xfs_refcount_adjust_extents(
 					goto out_error;
 				}
 			} else {
-				fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
-						cur->bc_ag.pag->pag_agno,
+				fsbno = xfs_agbno_to_fsb(cur->bc_ag.pag,
 						tmp.rc_startblock);
 				error = xfs_free_extent_later(cur->bc_tp, fsbno,
 						  tmp.rc_blockcount, NULL,
@@ -1216,8 +1215,7 @@ xfs_refcount_adjust_extents(
 			}
 			goto advloop;
 		} else {
-			fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
-					cur->bc_ag.pag->pag_agno,
+			fsbno = xfs_agbno_to_fsb(cur->bc_ag.pag,
 					ext.rc_startblock);
 			error = xfs_free_extent_later(cur->bc_tp, fsbno,
 					ext.rc_blockcount, NULL,
@@ -1319,7 +1317,7 @@ xfs_refcount_continue_op(
 		return -EFSCORRUPTED;
 	}
 
-	ri->ri_startblock = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno);
+	ri->ri_startblock = xfs_agbno_to_fsb(pag, new_agbno);
 
 	ASSERT(xfs_verify_fsbext(mp, ri->ri_startblock, ri->ri_blockcount));
 	ASSERT(pag->pag_agno == XFS_FSB_TO_AGNO(mp, ri->ri_startblock));
@@ -1955,8 +1953,7 @@ xfs_refcount_recover_cow_leftovers(
 			goto out_free;
 
 		/* Free the orphan record */
-		fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno,
-				rr->rr_rrec.rc_startblock);
+		fsb = xfs_agbno_to_fsb(pag, rr->rr_rrec.rc_startblock);
 		xfs_refcount_free_cow_extent(tp, fsb,
 				rr->rr_rrec.rc_blockcount);
 
diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c
index 9028dea06b0c07..5913f4176889ea 100644
--- a/libxfs/xfs_refcount_btree.c
+++ b/libxfs/xfs_refcount_btree.c
@@ -73,8 +73,7 @@ xfs_refcountbt_alloc_block(
 	args.resv = XFS_AG_RESV_METADATA;
 
 	error = xfs_alloc_vextent_near_bno(&args,
-			XFS_AGB_TO_FSB(args.mp, args.pag->pag_agno,
-					xfs_refc_block(args.mp)));
+			xfs_agbno_to_fsb(args.pag, xfs_refc_block(args.mp)));
 	if (error)
 		goto out_error;
 	if (args.fsbno == NULLFSBLOCK) {


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 08/36] xfs: add a xfs_agino_to_ino helper
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-12-23 21:38   ` [PATCH 07/36] xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers Darrick J. Wong
@ 2024-12-23 21:39   ` Darrick J. Wong
  2024-12-23 21:39   ` [PATCH 09/36] xfs: pass a pag to xfs_extent_busy_{search,reuse} Darrick J. Wong
                     ` (27 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:39 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 6abd82ab6ea48430c13caebaad436ca6b5f2c34d

Add a helper to convert an agino to an ino based on a pag structure.

This provides a simpler conversion and better type safety compared to the
existing code that passes the mount structure and the agno separately.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag.h     |    8 ++++++++
 libxfs/xfs_ialloc.c |   24 +++++++++++-------------
 2 files changed, 19 insertions(+), 13 deletions(-)


diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index c0a30141ddc330..e0f567d90debee 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -346,4 +346,12 @@ xfs_agbno_to_daddr(
 	return XFS_AGB_TO_DADDR(pag->pag_mount, pag->pag_agno, agbno);
 }
 
+static inline xfs_ino_t
+xfs_agino_to_ino(
+	struct xfs_perag	*pag,
+	xfs_agino_t		agino)
+{
+	return XFS_AGINO_TO_INO(pag->pag_mount, pag->pag_agno, agino);
+}
+
 #endif /* __LIBXFS_AG_H */
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 6694ee2370411a..01b2e2d8c27c22 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -909,8 +909,7 @@ xfs_ialloc_ag_alloc(
 		if (error == -EFSCORRUPTED) {
 			xfs_alert(args.mp,
 	"invalid sparse inode record: ino 0x%llx holemask 0x%x count %u",
-				  XFS_AGINO_TO_INO(args.mp, pag->pag_agno,
-						   rec.ir_startino),
+				  xfs_agino_to_ino(pag, rec.ir_startino),
 				  rec.ir_holemask, rec.ir_count);
 			xfs_force_shutdown(args.mp, SHUTDOWN_CORRUPT_INCORE);
 		}
@@ -1329,7 +1328,7 @@ xfs_dialloc_ag_inobt(
 	ASSERT(offset < XFS_INODES_PER_CHUNK);
 	ASSERT((XFS_AGINO_TO_OFFSET(mp, rec.ir_startino) %
 				   XFS_INODES_PER_CHUNK) == 0);
-	ino = XFS_AGINO_TO_INO(mp, pag->pag_agno, rec.ir_startino + offset);
+	ino = xfs_agino_to_ino(pag, rec.ir_startino + offset);
 
 	if (xfs_ag_has_sickness(pag, XFS_SICK_AG_INODES)) {
 		error = xfs_dialloc_check_ino(pag, tp, ino);
@@ -1610,7 +1609,7 @@ xfs_dialloc_ag(
 	ASSERT(offset < XFS_INODES_PER_CHUNK);
 	ASSERT((XFS_AGINO_TO_OFFSET(mp, rec.ir_startino) %
 				   XFS_INODES_PER_CHUNK) == 0);
-	ino = XFS_AGINO_TO_INO(mp, pag->pag_agno, rec.ir_startino + offset);
+	ino = xfs_agino_to_ino(pag, rec.ir_startino + offset);
 
 	if (xfs_ag_has_sickness(pag, XFS_SICK_AG_INODES)) {
 		error = xfs_dialloc_check_ino(pag, tp, ino);
@@ -2117,8 +2116,7 @@ xfs_difree_inobt(
 	if (!xfs_has_ikeep(mp) && rec.ir_free == XFS_INOBT_ALL_FREE &&
 	    mp->m_sb.sb_inopblock <= XFS_INODES_PER_CHUNK) {
 		xic->deleted = true;
-		xic->first_ino = XFS_AGINO_TO_INO(mp, pag->pag_agno,
-				rec.ir_startino);
+		xic->first_ino = xfs_agino_to_ino(pag, rec.ir_startino);
 		xic->alloc = xfs_inobt_irec_to_allocmask(&rec);
 
 		/*
@@ -2317,10 +2315,10 @@ xfs_difree(
 		return -EINVAL;
 	}
 	agino = XFS_INO_TO_AGINO(mp, inode);
-	if (inode != XFS_AGINO_TO_INO(mp, pag->pag_agno, agino))  {
-		xfs_warn(mp, "%s: inode != XFS_AGINO_TO_INO() (%llu != %llu).",
+	if (inode != xfs_agino_to_ino(pag, agino))  {
+		xfs_warn(mp, "%s: inode != xfs_agino_to_ino() (%llu != %llu).",
 			__func__, (unsigned long long)inode,
-			(unsigned long long)XFS_AGINO_TO_INO(mp, pag->pag_agno, agino));
+			(unsigned long long)xfs_agino_to_ino(pag, agino));
 		ASSERT(0);
 		return -EINVAL;
 	}
@@ -2451,7 +2449,7 @@ xfs_imap(
 	agino = XFS_INO_TO_AGINO(mp, ino);
 	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
 	if (agbno >= mp->m_sb.sb_agblocks ||
-	    ino != XFS_AGINO_TO_INO(mp, pag->pag_agno, agino)) {
+	    ino != xfs_agino_to_ino(pag, agino)) {
 		error = -EINVAL;
 #ifdef DEBUG
 		/*
@@ -2466,11 +2464,11 @@ xfs_imap(
 				__func__, (unsigned long long)agbno,
 				(unsigned long)mp->m_sb.sb_agblocks);
 		}
-		if (ino != XFS_AGINO_TO_INO(mp, pag->pag_agno, agino)) {
+		if (ino != xfs_agino_to_ino(pag, agino)) {
 			xfs_alert(mp,
-		"%s: ino (0x%llx) != XFS_AGINO_TO_INO() (0x%llx)",
+		"%s: ino (0x%llx) != xfs_agino_to_ino() (0x%llx)",
 				__func__, ino,
-				XFS_AGINO_TO_INO(mp, pag->pag_agno, agino));
+				xfs_agino_to_ino(pag, agino));
 		}
 		xfs_stack_trace();
 #endif /* DEBUG */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 09/36] xfs: pass a pag to xfs_extent_busy_{search,reuse}
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-12-23 21:39   ` [PATCH 08/36] xfs: add a xfs_agino_to_ino helper Darrick J. Wong
@ 2024-12-23 21:39   ` Darrick J. Wong
  2024-12-23 21:39   ` [PATCH 10/36] xfs: pass a perag structure to the xfs_ag_resv_init_error trace point Darrick J. Wong
                     ` (26 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:39 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: b6dc8c6dd2d3f230e1a554f869d6df4568a2dfbb

Replace the [mp,agno] tuple with the perag structure, which will become
more useful later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/libxfs_priv.h     |    2 +-
 libxfs/xfs_alloc.c       |    4 ++--
 libxfs/xfs_alloc_btree.c |    2 +-
 libxfs/xfs_rmap_btree.c  |    2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)


diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 97f5003ea53862..9505806131bc42 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -465,7 +465,7 @@ xfs_buf_readahead(
 #define XFS_EXTENT_BUSY_DISCARDED	0x01	/* undergoing a discard op. */
 #define XFS_EXTENT_BUSY_SKIP_DISCARD	0x02	/* do not discard */
 
-#define xfs_extent_busy_reuse(mp,ag,bno,len,user)	((void) 0)
+#define xfs_extent_busy_reuse(...)			((void) 0)
 /* avoid unused variable warning */
 #define xfs_extent_busy_insert(tp,pag,bno,len,flags)({ 	\
 	struct xfs_perag *__foo = pag;			\
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 19b38eaf45dd07..ed04e40856740b 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -1248,7 +1248,7 @@ xfs_alloc_ag_vextent_small(
 	if (fbno == NULLAGBLOCK)
 		goto out;
 
-	xfs_extent_busy_reuse(args->mp, args->pag, fbno, 1,
+	xfs_extent_busy_reuse(args->pag, fbno, 1,
 			      (args->datatype & XFS_ALLOC_NOBUSY));
 
 	if (args->datatype & XFS_ALLOC_USERDATA) {
@@ -3610,7 +3610,7 @@ xfs_alloc_vextent_finish(
 		if (error)
 			goto out_drop_perag;
 
-		ASSERT(!xfs_extent_busy_search(mp, args->pag, args->agbno,
+		ASSERT(!xfs_extent_busy_search(args->pag, args->agbno,
 				args->len));
 	}
 
diff --git a/libxfs/xfs_alloc_btree.c b/libxfs/xfs_alloc_btree.c
index 4a711f2463cd30..949cd18ab16a99 100644
--- a/libxfs/xfs_alloc_btree.c
+++ b/libxfs/xfs_alloc_btree.c
@@ -84,7 +84,7 @@ xfs_allocbt_alloc_block(
 	}
 
 	atomic64_inc(&cur->bc_mp->m_allocbt_blks);
-	xfs_extent_busy_reuse(cur->bc_mp, cur->bc_ag.pag, bno, 1, false);
+	xfs_extent_busy_reuse(cur->bc_ag.pag, bno, 1, false);
 
 	new->s = cpu_to_be32(bno);
 
diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
index ada58e92645020..c261d6eae3bc3b 100644
--- a/libxfs/xfs_rmap_btree.c
+++ b/libxfs/xfs_rmap_btree.c
@@ -101,7 +101,7 @@ xfs_rmapbt_alloc_block(
 		return 0;
 	}
 
-	xfs_extent_busy_reuse(cur->bc_mp, pag, bno, 1, false);
+	xfs_extent_busy_reuse(pag, bno, 1, false);
 
 	new->s = cpu_to_be32(bno);
 	be32_add_cpu(&agf->agf_rmap_blocks, 1);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 10/36] xfs: pass a perag structure to the xfs_ag_resv_init_error trace point
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-12-23 21:39   ` [PATCH 09/36] xfs: pass a pag to xfs_extent_busy_{search,reuse} Darrick J. Wong
@ 2024-12-23 21:39   ` Darrick J. Wong
  2024-12-23 21:39   ` [PATCH 11/36] xfs: pass objects to the xfs_irec_merge_{pre,post} trace points Darrick J. Wong
                     ` (25 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:39 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 835ddb592fab75ed96828ee3f12ea44496882d6b

Pass the perag structure to the xfs_ag_resv_init_error tracepoint and
remove the single-instance event class indirection for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag_resv.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)


diff --git a/libxfs/xfs_ag_resv.c b/libxfs/xfs_ag_resv.c
index 7e5cbe0cb6afd3..d1657d6c636546 100644
--- a/libxfs/xfs_ag_resv.c
+++ b/libxfs/xfs_ag_resv.c
@@ -205,8 +205,7 @@ __xfs_ag_resv_init(
 	else
 		error = xfs_dec_fdblocks(mp, hidden_space, true);
 	if (error) {
-		trace_xfs_ag_resv_init_error(pag->pag_mount, pag->pag_agno,
-				error, _RET_IP_);
+		trace_xfs_ag_resv_init_error(pag, error, _RET_IP_);
 		xfs_warn(mp,
 "Per-AG reservation for AG %u failed.  Filesystem may run out of space.",
 				pag->pag_agno);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 11/36] xfs: pass objects to the xfs_irec_merge_{pre,post} trace points
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-12-23 21:39   ` [PATCH 10/36] xfs: pass a perag structure to the xfs_ag_resv_init_error trace point Darrick J. Wong
@ 2024-12-23 21:39   ` Darrick J. Wong
  2024-12-23 21:40   ` [PATCH 12/36] xfs: convert remaining trace points to pass pag structures Darrick J. Wong
                     ` (24 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:39 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 487092ceaa72448ca3a82ea9fb89768c88f6abec

Pass the perag structure and the irec to these tracepoints so that the
decoding is only done when tracing is actually enabled and the call sites
look a lot neater.
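
In xfsprogs the tracepoints are stubbed out as no-op macros, so the
userspace side of this change is just the switch to variadic stubs.  A
small sketch (not part of the patch; trace_sketch_stub, decode_field and
decode_calls are hypothetical names) of why a "(...)" stub both survives
signature changes and never evaluates its arguments:

```c
#include <assert.h>

/* Counts how often the (hypothetical) decode helper actually runs. */
static int decode_calls;

static int
decode_field(int raw)
{
	decode_calls++;
	return raw;
}

/* Variadic no-op stub in the style of the include/xfs_trace.h macros. */
#define trace_sketch_stub(...)	((void) 0)

/* Hit the stub with two different argument counts; return the call count. */
static int
sketch_call_site(void)
{
	decode_calls = 0;

	/* The preprocessor drops the arguments, so decode_field never runs. */
	trace_sketch_stub(decode_field(7), 1, 2);
	trace_sketch_stub(decode_field(7));

	assert(decode_calls == 0);
	return decode_calls;
}
```

Because the stub accepts any argument list, later changes to a
tracepoint's kernel signature do not require touching the stub again.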

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_trace.h |    4 ++--
 libxfs/xfs_ialloc.c |    7 ++-----
 2 files changed, 4 insertions(+), 7 deletions(-)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index ba51419b3df3d3..0986e1621437d4 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -85,8 +85,8 @@
 #define trace_xfs_alloc_read_agf(a,b)		((void) 0)
 #define trace_xfs_read_agi(a,b)			((void) 0)
 #define trace_xfs_ialloc_read_agi(a,b)		((void) 0)
-#define trace_xfs_irec_merge_pre(a,b,c,d,e,f)	((void) 0)
-#define trace_xfs_irec_merge_post(a,b,c,d)	((void) 0)
+#define trace_xfs_irec_merge_pre(...)		((void) 0)
+#define trace_xfs_irec_merge_post(...)		((void) 0)
 
 #define trace_xfs_iext_insert(a,b,c,d)		((void) 0)
 #define trace_xfs_iext_remove(a,b,c,d)		((void) 0)
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 01b2e2d8c27c22..b3d6f7f4212588 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -601,15 +601,12 @@ xfs_inobt_insert_sprec(
 		goto error;
 	}
 
-	trace_xfs_irec_merge_pre(mp, pag->pag_agno, rec.ir_startino,
-				 rec.ir_holemask, nrec->ir_startino,
-				 nrec->ir_holemask);
+	trace_xfs_irec_merge_pre(pag, &rec, nrec);
 
 	/* merge to nrec to output the updated record */
 	__xfs_inobt_rec_merge(nrec, &rec);
 
-	trace_xfs_irec_merge_post(mp, pag->pag_agno, nrec->ir_startino,
-				  nrec->ir_holemask);
+	trace_xfs_irec_merge_post(pag, nrec);
 
 	error = xfs_inobt_rec_check_count(mp, nrec);
 	if (error)


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 12/36] xfs: convert remaining trace points to pass pag structures
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-12-23 21:39   ` [PATCH 11/36] xfs: pass objects to the xfs_irec_merge_{pre,post} trace points Darrick J. Wong
@ 2024-12-23 21:40   ` Darrick J. Wong
  2024-12-23 21:40   ` [PATCH 13/36] xfs: split xfs_initialize_perag Darrick J. Wong
                     ` (23 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:40 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: c4ae021bcb6bf8bbb329ce8ef947a43009bc2fe4

Convert all tracepoints that take [mp,agno] tuples to take a pag argument
instead so that decoding only happens when tracepoints are enabled and to
clean up the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_trace.h     |    8 ++++----
 libxfs/xfs_alloc.c      |    4 ++--
 libxfs/xfs_ialloc.c     |    4 ++--
 libxfs/xfs_inode_util.c |    4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)


diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 0986e1621437d4..012e0018cb8367 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -81,10 +81,10 @@
 
 #define trace_xfs_free_extent(...)		((void) 0)
 #define trace_xfs_agf(a,b,c,d)			((void) 0)
-#define trace_xfs_read_agf(a,b)			((void) 0)
-#define trace_xfs_alloc_read_agf(a,b)		((void) 0)
-#define trace_xfs_read_agi(a,b)			((void) 0)
-#define trace_xfs_ialloc_read_agi(a,b)		((void) 0)
+#define trace_xfs_read_agf(...)			((void) 0)
+#define trace_xfs_alloc_read_agf(...)		((void) 0)
+#define trace_xfs_read_agi(...)			((void) 0)
+#define trace_xfs_ialloc_read_agi(...)		((void) 0)
 #define trace_xfs_irec_merge_pre(...)		((void) 0)
 #define trace_xfs_irec_merge_post(...)		((void) 0)
 
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index ed04e40856740b..bd39bcde0ea224 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -3354,7 +3354,7 @@ xfs_read_agf(
 	struct xfs_mount	*mp = pag->pag_mount;
 	int			error;
 
-	trace_xfs_read_agf(pag->pag_mount, pag->pag_agno);
+	trace_xfs_read_agf(pag);
 
 	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
 			XFS_AG_DADDR(mp, pag->pag_agno, XFS_AGF_DADDR(mp)),
@@ -3385,7 +3385,7 @@ xfs_alloc_read_agf(
 	int			error;
 	int			allocbt_blks;
 
-	trace_xfs_alloc_read_agf(pag->pag_mount, pag->pag_agno);
+	trace_xfs_alloc_read_agf(pag);
 
 	/* We don't support trylock when freeing. */
 	ASSERT((flags & (XFS_ALLOC_FLAG_FREEING | XFS_ALLOC_FLAG_TRYLOCK)) !=
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index b3d6f7f4212588..4f087e6b074081 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -2724,7 +2724,7 @@ xfs_read_agi(
 	struct xfs_mount	*mp = pag->pag_mount;
 	int			error;
 
-	trace_xfs_read_agi(pag->pag_mount, pag->pag_agno);
+	trace_xfs_read_agi(pag);
 
 	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
 			XFS_AG_DADDR(mp, pag->pag_agno, XFS_AGI_DADDR(mp)),
@@ -2755,7 +2755,7 @@ xfs_ialloc_read_agi(
 	struct xfs_agi		*agi;
 	int			error;
 
-	trace_xfs_ialloc_read_agi(pag->pag_mount, pag->pag_agno);
+	trace_xfs_ialloc_read_agi(pag);
 
 	error = xfs_read_agi(pag, tp,
 			(flags & XFS_IALLOC_FLAG_TRYLOCK) ? XBF_TRYLOCK : 0,
diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c
index 92bfdf0715f02e..f9f16c7e2d0788 100644
--- a/libxfs/xfs_inode_util.c
+++ b/libxfs/xfs_inode_util.c
@@ -439,8 +439,8 @@ xfs_iunlink_update_bucket(
 	ASSERT(xfs_verify_agino_or_null(pag, new_agino));
 
 	old_value = be32_to_cpu(agi->agi_unlinked[bucket_index]);
-	trace_xfs_iunlink_update_bucket(tp->t_mountp, pag->pag_agno, bucket_index,
-			old_value, new_agino);
+	trace_xfs_iunlink_update_bucket(pag, bucket_index, old_value,
+			new_agino);
 
 	/*
 	 * We should never find the head of the list already set to the value


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 13/36] xfs: split xfs_initialize_perag
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-12-23 21:40   ` [PATCH 12/36] xfs: convert remaining trace points to pass pag structures Darrick J. Wong
@ 2024-12-23 21:40   ` Darrick J. Wong
  2024-12-23 21:40   ` [PATCH 14/36] xfs: insert the pag structures into the xarray later Darrick J. Wong
                     ` (22 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:40 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 201c5fa342af75adaf762fd6c63380bb8001762d

Factor out a xfs_perag_alloc helper that allocates a single perag
structure.
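
The resulting shape is a per-object allocation helper plus a caller loop
that unwinds on failure.  Here is a self-contained sketch of that
pattern (not the libxfs code; the sketch_* names and the fixed-size
table standing in for the xarray are assumptions made for illustration):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_MAX_GROUPS	8

struct sketch_group {
	unsigned int		index;
};

/* Fixed-size table standing in for the per-mount lookup structure. */
static struct sketch_group	*sketch_groups[SKETCH_MAX_GROUPS];

/* Allocate and publish exactly one group; clean up after itself on error. */
static int
sketch_group_alloc(unsigned int index)
{
	struct sketch_group	*g;

	if (index >= SKETCH_MAX_GROUPS)
		return -EINVAL;

	g = calloc(1, sizeof(*g));
	if (!g)
		return -ENOMEM;

	g->index = index;
	sketch_groups[index] = g;
	return 0;
}

/* Free groups in [first, end). */
static void
sketch_group_free_range(unsigned int first, unsigned int end)
{
	unsigned int		i;

	assert(end <= SKETCH_MAX_GROUPS);
	for (i = first; i < end; i++) {
		free(sketch_groups[i]);
		sketch_groups[i] = NULL;
	}
}

/* Grow from orig_count to new_count groups, undoing partial work on error. */
static int
sketch_groups_init(unsigned int orig_count, unsigned int new_count)
{
	unsigned int		index;
	int			error;

	if (orig_count >= new_count)
		return 0;

	for (index = orig_count; index < new_count; index++) {
		error = sketch_group_alloc(index);
		if (error)
			goto out_unwind;
	}
	return 0;

out_unwind:
	/* Only the groups added by this call are torn down. */
	sketch_group_free_range(orig_count, index);
	return error;
}
```

Keeping the single-object setup in its own function means the loop only
has to deal with one failure label, and groups that existed before the
call are left untouched when a later allocation fails.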

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag.c |  121 ++++++++++++++++++++++++++++++++-----------------------
 libxfs/xfs_ag.h |    4 +-
 2 files changed, 72 insertions(+), 53 deletions(-)


diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index a6b5a7d71bbf80..8809d76d496ae9 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -270,6 +270,10 @@ xfs_agino_range(
 	return __xfs_agino_range(mp, xfs_ag_block_count(mp, agno), first, last);
 }
 
+/*
+ * Update the perag of the previous tail AG if it has been changed during
+ * recovery (i.e. recovery of a growfs).
+ */
 int
 xfs_update_last_ag_size(
 	struct xfs_mount	*mp,
@@ -287,69 +291,57 @@ xfs_update_last_ag_size(
 	return 0;
 }
 
-int
-xfs_initialize_perag(
+static int
+xfs_perag_alloc(
 	struct xfs_mount	*mp,
-	xfs_agnumber_t		old_agcount,
-	xfs_agnumber_t		new_agcount,
-	xfs_rfsblock_t		dblocks,
-	xfs_agnumber_t		*maxagi)
+	xfs_agnumber_t		index,
+	xfs_agnumber_t		agcount,
+	xfs_rfsblock_t		dblocks)
 {
 	struct xfs_perag	*pag;
-	xfs_agnumber_t		index;
 	int			error;
 
-	for (index = old_agcount; index < new_agcount; index++) {
-		pag = kzalloc(sizeof(*pag), GFP_KERNEL);
-		if (!pag) {
-			error = -ENOMEM;
-			goto out_unwind_new_pags;
-		}
-		pag->pag_agno = index;
-		pag->pag_mount = mp;
+	pag = kzalloc(sizeof(*pag), GFP_KERNEL);
+	if (!pag)
+		return -ENOMEM;
 
-		error = xa_insert(&mp->m_perags, index, pag, GFP_KERNEL);
-		if (error) {
-			WARN_ON_ONCE(error == -EBUSY);
-			goto out_free_pag;
-		}
+	pag->pag_agno = index;
+	pag->pag_mount = mp;
+
+	error = xa_insert(&mp->m_perags, index, pag, GFP_KERNEL);
+	if (error) {
+		WARN_ON_ONCE(error == -EBUSY);
+		goto out_free_pag;
+	}
 
 #ifdef __KERNEL__
-		/* Place kernel structure only init below this point. */
-		spin_lock_init(&pag->pag_ici_lock);
-		spin_lock_init(&pag->pagb_lock);
-		spin_lock_init(&pag->pag_state_lock);
-		INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker);
-		INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
-		xfs_defer_drain_init(&pag->pag_intents_drain);
-		init_waitqueue_head(&pag->pagb_wait);
-		pag->pagb_tree = RB_ROOT;
-		xfs_hooks_init(&pag->pag_rmap_update_hooks);
+	/* Place kernel structure only init below this point. */
+	spin_lock_init(&pag->pag_ici_lock);
+	spin_lock_init(&pag->pagb_lock);
+	spin_lock_init(&pag->pag_state_lock);
+	INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker);
+	INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
+	xfs_defer_drain_init(&pag->pag_intents_drain);
+	init_waitqueue_head(&pag->pagb_wait);
+	pag->pagb_tree = RB_ROOT;
+	xfs_hooks_init(&pag->pag_rmap_update_hooks);
 #endif /* __KERNEL__ */
 
-		error = xfs_buf_cache_init(&pag->pag_bcache);
-		if (error)
-			goto out_remove_pag;
+	error = xfs_buf_cache_init(&pag->pag_bcache);
+	if (error)
+		goto out_remove_pag;
 
-		/* Active ref owned by mount indicates AG is online. */
-		atomic_set(&pag->pag_active_ref, 1);
+	/* Active ref owned by mount indicates AG is online. */
+	atomic_set(&pag->pag_active_ref, 1);
 
-		/*
-		 * Pre-calculated geometry
-		 */
-		pag->block_count = __xfs_ag_block_count(mp, index, new_agcount,
-				dblocks);
-		pag->min_block = XFS_AGFL_BLOCK(mp);
-		__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
-				&pag->agino_max);
-	}
+	/*
+	 * Pre-calculated geometry
+	 */
+	pag->block_count = __xfs_ag_block_count(mp, index, agcount, dblocks);
+	pag->min_block = XFS_AGFL_BLOCK(mp);
+	__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
+			&pag->agino_max);
 
-	index = xfs_set_inode_alloc(mp, new_agcount);
-
-	if (maxagi)
-		*maxagi = index;
-
-	mp->m_ag_prealloc_blocks = xfs_prealloc_blocks(mp);
 	return 0;
 
 out_remove_pag:
@@ -357,8 +349,35 @@ xfs_initialize_perag(
 	pag = xa_erase(&mp->m_perags, index);
 out_free_pag:
 	kfree(pag);
+	return error;
+}
+
+int
+xfs_initialize_perag(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		orig_agcount,
+	xfs_agnumber_t		new_agcount,
+	xfs_rfsblock_t		dblocks,
+	xfs_agnumber_t		*maxagi)
+{
+	xfs_agnumber_t		index;
+	int			error;
+
+	if (orig_agcount >= new_agcount)
+		return 0;
+
+	for (index = orig_agcount; index < new_agcount; index++) {
+		error = xfs_perag_alloc(mp, index, new_agcount, dblocks);
+		if (error)
+			goto out_unwind_new_pags;
+	}
+
+	*maxagi = xfs_set_inode_alloc(mp, new_agcount);
+	mp->m_ag_prealloc_blocks = xfs_prealloc_blocks(mp);
+	return 0;
+
 out_unwind_new_pags:
-	xfs_free_perag_range(mp, old_agcount, index);
+	xfs_free_perag_range(mp, orig_agcount, index);
 	return error;
 }
 
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index e0f567d90debee..8787823ae37f9f 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -142,8 +142,8 @@ __XFS_AG_OPSTATE(prefers_metadata, PREFERS_METADATA)
 __XFS_AG_OPSTATE(allows_inodes, ALLOWS_INODES)
 __XFS_AG_OPSTATE(agfl_needs_reset, AGFL_NEEDS_RESET)
 
-int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t old_agcount,
-		xfs_agnumber_t agcount, xfs_rfsblock_t dcount,
+int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t orig_agcount,
+		xfs_agnumber_t new_agcount, xfs_rfsblock_t dcount,
 		xfs_agnumber_t *maxagi);
 void xfs_free_perag_range(struct xfs_mount *mp, xfs_agnumber_t first_agno,
 		xfs_agnumber_t end_agno);



* [PATCH 14/36] xfs: insert the pag structures into the xarray later
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (12 preceding siblings ...)
  2024-12-23 21:40   ` [PATCH 13/36] xfs: split xfs_initialize_perag Darrick J. Wong
@ 2024-12-23 21:40   ` Darrick J. Wong
  2024-12-23 21:41   ` [PATCH 15/36] xfs: factor out a generic xfs_group structure Darrick J. Wong
                     ` (21 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:40 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: d66496578b2a099ea453f56782f1cd2bf63a8029

Cleaning up is much easier if a structure can't be looked up yet, so only
insert the pag once it is fully set up.
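
As a rough illustration of the ordering described above (not the libxfs
code itself), the sketch below sets up an object completely and only then
publishes it in a lookup table.  All names here (demo_group, demo_table,
demo_insert, demo_group_alloc) are hypothetical stand-ins, and a plain
array is assumed in place of the xarray.  Because nothing can find the
object before the final insert, the error path only unwinds what was
initialized and never has to erase a table entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_MAX_GROUPS 8

/* Hypothetical stand-in for a perag; not the real struct xfs_perag. */
struct demo_group {
	unsigned int	index;
	int		active_ref;
	void		*cache;		/* stand-in for a sub-resource */
};

/* Hypothetical lookup table standing in for the xarray. */
static struct demo_group *demo_table[DEMO_MAX_GROUPS];

/* Publish the object; fails with -EBUSY if the slot is already taken. */
static int demo_insert(unsigned int index, struct demo_group *g)
{
	if (index >= DEMO_MAX_GROUPS)
		return -EINVAL;
	if (demo_table[index])
		return -EBUSY;
	demo_table[index] = g;
	return 0;
}

int demo_group_alloc(unsigned int index)
{
	struct demo_group *g;
	int error;

	g = calloc(1, sizeof(*g));
	if (!g)
		return -ENOMEM;

	/* Set up sub-resources while the object is still private. */
	g->cache = malloc(64);
	if (!g->cache) {
		error = -ENOMEM;
		goto out_free;
	}

	g->index = index;
	g->active_ref = 1;

	/* Last step: only from here on can a lookup find the object. */
	error = demo_insert(index, g);
	if (error)
		goto out_cache_destroy;
	return 0;

out_cache_destroy:
	free(g->cache);
out_free:
	free(g);
	return error;
}
```

A failed insert (for example a duplicate index) leaves the table exactly
as it was, so the caller only has to free groups that were fully
published earlier.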

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag.c |   31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)


diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index 8809d76d496ae9..e22ce4e83f4a62 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -305,15 +305,6 @@ xfs_perag_alloc(
 	if (!pag)
 		return -ENOMEM;
 
-	pag->pag_agno = index;
-	pag->pag_mount = mp;
-
-	error = xa_insert(&mp->m_perags, index, pag, GFP_KERNEL);
-	if (error) {
-		WARN_ON_ONCE(error == -EBUSY);
-		goto out_free_pag;
-	}
-
 #ifdef __KERNEL__
 	/* Place kernel structure only init below this point. */
 	spin_lock_init(&pag->pag_ici_lock);
@@ -329,10 +320,7 @@ xfs_perag_alloc(
 
 	error = xfs_buf_cache_init(&pag->pag_bcache);
 	if (error)
-		goto out_remove_pag;
-
-	/* Active ref owned by mount indicates AG is online. */
-	atomic_set(&pag->pag_active_ref, 1);
+		goto out_defer_drain_free;
 
 	/*
 	 * Pre-calculated geometry
@@ -342,12 +330,23 @@ xfs_perag_alloc(
 	__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
 			&pag->agino_max);
 
+	pag->pag_agno = index;
+	pag->pag_mount = mp;
+	/* Active ref owned by mount indicates AG is online. */
+	atomic_set(&pag->pag_active_ref, 1);
+
+	error = xa_insert(&mp->m_perags, index, pag, GFP_KERNEL);
+	if (error) {
+		WARN_ON_ONCE(error == -EBUSY);
+		goto out_buf_cache_destroy;
+	}
+
 	return 0;
 
-out_remove_pag:
+out_buf_cache_destroy:
+	xfs_buf_cache_destroy(&pag->pag_bcache);
+out_defer_drain_free:
 	xfs_defer_drain_free(&pag->pag_intents_drain);
-	pag = xa_erase(&mp->m_perags, index);
-out_free_pag:
 	kfree(pag);
 	return error;
 }



* [PATCH 15/36] xfs: factor out a generic xfs_group structure
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (13 preceding siblings ...)
  2024-12-23 21:40   ` [PATCH 14/36] xfs: insert the pag structures into the xarray later Darrick J. Wong
@ 2024-12-23 21:41   ` Darrick J. Wong
  2024-12-23 21:41   ` [PATCH 16/36] xfs: add a xfs_group_next_range helper Darrick J. Wong
                     ` (20 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:41 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: e9c4d8bfb26c13c41b73fdf4183d3df2d392101e

Split the lookup and refcount handling of struct xfs_perag into an
embedded xfs_group structure that can be reused for the upcoming
realtime groups.

It will be extended with more features later.

Note that the xg_type field will only need a single bit even with
realtime group support.  For now it fills a hole, but it might be
worth folding it into another field if we can use this space better.
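
The embedding can be pictured with the minimal userspace sketch below.
It is an illustration under assumptions, not the contents of
xfs_group.h: the demo_* types and functions are hypothetical, only the
xg_mount and xg_gno field names are taken from the accessors in this
patch, and the "ref" and "active_ref" members are placeholder names for
the passive and active reference counts.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_mount {
	int			id;
};

/* Generic part shared by every group type. */
struct demo_xfs_group {
	struct demo_mount	*xg_mount;
	unsigned int		xg_gno;
	int			ref;		/* placeholder: passive refs */
	int			active_ref;	/* placeholder: active refs */
};

/* Type-specific wrapper that embeds the generic group. */
struct demo_perag {
	struct demo_xfs_group	pag_group;
	unsigned long		pag_opstate;
};

/* Convert from the embedded generic group back to the wrapper. */
static inline struct demo_perag *demo_to_perag(struct demo_xfs_group *xg)
{
	return demo_container_of(xg, struct demo_perag, pag_group);
}

static inline struct demo_xfs_group *demo_pag_group(struct demo_perag *pag)
{
	return &pag->pag_group;
}

static inline unsigned int demo_pag_agno(const struct demo_perag *pag)
{
	return pag->pag_group.xg_gno;
}

/* Generic helper: take a passive reference on any group. */
static inline struct demo_xfs_group *
demo_group_hold(struct demo_xfs_group *xg)
{
	xg->ref++;
	return xg;
}

/* Typed wrapper: same operation, expressed in terms of the perag. */
static inline struct demo_perag *demo_perag_hold(struct demo_perag *pag)
{
	return demo_to_perag(demo_group_hold(demo_pag_group(pag)));
}
```

The typed wrappers stay one-liners, which is why the patch can turn the
xfs_perag_get/hold/put/grab/rele functions into static inlines.  In the
patch pag_group is the first member of struct xfs_perag, so a NULL
result from a group lookup would presumably convert to a NULL perag;
the sketch does not exercise that case.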

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/fsmap.c                  |    6 +-
 db/info.c                   |    2 -
 db/iunlink.c                |    2 -
 include/xfs_mount.h         |    6 +-
 include/xfs_trace.h         |    7 ++
 libfrog/radix-tree.h        |    9 ++
 libxfs/Makefile             |    2 +
 libxfs/defer_item.c         |    6 +-
 libxfs/init.c               |    8 +-
 libxfs/iunlink.c            |   11 ++-
 libxfs/xfs_ag.c             |  139 +++++++-----------------------------
 libxfs/xfs_ag.h             |   81 +++++++++++++++++----
 libxfs/xfs_ag_resv.c        |   19 +++--
 libxfs/xfs_alloc.c          |   39 +++++-----
 libxfs/xfs_alloc_btree.c    |    2 -
 libxfs/xfs_bmap.c           |    2 -
 libxfs/xfs_btree.c          |    6 +-
 libxfs/xfs_group.c          |  168 +++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_group.h          |   41 ++++++++++
 libxfs/xfs_ialloc.c         |   40 +++++-----
 libxfs/xfs_ialloc_btree.c   |   16 ++--
 libxfs/xfs_refcount.c       |    4 +
 libxfs/xfs_refcount_btree.c |    8 +-
 libxfs/xfs_rmap.c           |    4 +
 libxfs/xfs_rmap_btree.c     |    7 +-
 libxfs/xfs_sb.c             |    6 +-
 libxfs/xfs_types.h          |    8 ++
 repair/agbtree.c            |   27 +++----
 repair/bmap_repair.c        |    4 -
 repair/bulkload.c           |    9 +-
 repair/phase2.c             |   10 +--
 repair/phase5.c             |    2 -
 repair/rmap.c               |   10 +--
 33 files changed, 455 insertions(+), 256 deletions(-)
 create mode 100644 libxfs/xfs_group.c
 create mode 100644 libxfs/xfs_group.h


diff --git a/db/fsmap.c b/db/fsmap.c
index 7fd42df2a1c854..923d7568b9d977 100644
--- a/db/fsmap.c
+++ b/db/fsmap.c
@@ -64,7 +64,7 @@ fsmap(
 
 	info.nr = 0;
 	for_each_perag_range(mp, start_ag, end_ag, pag) {
-		if (pag->pag_agno == end_ag)
+		if (pag_agno(pag) == end_ag)
 			high.rm_startblock = XFS_FSB_TO_AGBNO(mp, end_fsb);
 
 		error = -libxfs_alloc_read_agf(pag, NULL, 0, &agbp);
@@ -82,7 +82,7 @@ fsmap(
 			return;
 		}
 
-		info.agno = pag->pag_agno;
+		info.agno = pag_agno(pag);
 		error = -libxfs_rmap_query_range(bt_cur, &low, &high,
 				fsmap_fn, &info);
 		if (error) {
@@ -97,7 +97,7 @@ fsmap(
 		libxfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR);
 		libxfs_buf_relse(agbp);
 
-		if (pag->pag_agno == start_ag)
+		if (pag_agno(pag) == start_ag)
 			low.rm_startblock = 0;
 	}
 }
diff --git a/db/info.c b/db/info.c
index 9c6203f029d41c..6a8765ec761a49 100644
--- a/db/info.c
+++ b/db/info.c
@@ -66,7 +66,7 @@ print_agresv_info(
 {
 	struct xfs_buf	*bp;
 	struct xfs_agf	*agf;
-	xfs_agnumber_t	agno = pag->pag_agno;
+	xfs_agnumber_t	agno = pag_agno(pag);
 	xfs_extlen_t	ask = 0;
 	xfs_extlen_t	used = 0;
 	xfs_extlen_t	free = 0;
diff --git a/db/iunlink.c b/db/iunlink.c
index 55ba5af5a3c563..c9977a859e2842 100644
--- a/db/iunlink.c
+++ b/db/iunlink.c
@@ -117,7 +117,7 @@ dump_unlinked(
 	bool			verbose)
 {
 	struct xfs_buf		*agi_bp;
-	xfs_agnumber_t		agno = pag->pag_agno;
+	xfs_agnumber_t		agno = pag_agno(pag);
 	int			error;
 
 	error = -libxfs_ialloc_read_agi(pag, NULL, 0, &agi_bp);
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index e2add8a648f887..2102009aa8df73 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -24,6 +24,10 @@ enum {
 	XFS_LOWSP_MAX,
 };
 
+struct xfs_groups {
+	struct xarray		xa;
+};
+
 /*
  * Define a user-level mount structure with all we need
  * in order to make use of the numerous XFS_* macros.
@@ -91,7 +95,7 @@ typedef struct xfs_mount {
 	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
 	uint			m_alloc_set_aside; /* space we can't use */
 	uint			m_ag_max_usable; /* max space per AG */
-	struct xarray		m_perags;
+	struct xfs_groups	m_groups[XG_TYPE_MAX];
 	uint64_t		m_features;	/* active filesystem features */
 	uint64_t		m_low_space[XFS_LOWSP_MAX];
 	uint64_t		m_rtxblkmask;	/* rt extent block mask */
diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index 012e0018cb8367..de435a9d1a2a64 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -366,4 +366,11 @@
 #define trace_xfs_iunlink_reload_next(...)	((void) 0)
 #define trace_xfs_iunlink_remove(...)		((void) 0)
 
+#define trace_xfs_group_get(...)		((void) 0)
+#define trace_xfs_group_grab_next_tag(...)	((void) 0)
+#define trace_xfs_group_grab(...)		((void) 0)
+#define trace_xfs_group_hold(...)		((void) 0)
+#define trace_xfs_group_put(...)		((void) 0)
+#define trace_xfs_group_rele(...)		((void) 0)
+
 #endif /* __TRACE_H__ */
diff --git a/libfrog/radix-tree.h b/libfrog/radix-tree.h
index fe896134eeb283..0a4e3bb4f9defc 100644
--- a/libfrog/radix-tree.h
+++ b/libfrog/radix-tree.h
@@ -72,6 +72,8 @@ struct xarray {
 	struct radix_tree_root	r;
 };
 
+typedef unsigned xa_mark_t;
+
 static inline void xa_init(struct xarray *xa)
 {
 	INIT_RADIX_TREE(&xa->r, GFP_KERNEL);
@@ -98,4 +100,11 @@ static inline int xa_insert(struct xarray *xa, unsigned long index, void *entry,
 	return error;
 }
 
+static inline void *xa_find(struct xarray *xa, unsigned long *indexp,
+			unsigned long max, xa_mark_t filter)
+{
+	/* not implemented */
+	return NULL;
+}
+
 #endif /* __LIBFROG_RADIX_TREE_H__ */
diff --git a/libxfs/Makefile b/libxfs/Makefile
index aca28440adac08..470583006de69a 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -49,6 +49,7 @@ HFILES = \
 	xfs_dir2.h \
 	xfs_errortag.h \
 	xfs_exchmaps.h \
+	xfs_group.h \
 	xfs_ialloc.h \
 	xfs_ialloc_btree.h \
 	xfs_inode_buf.h \
@@ -105,6 +106,7 @@ CFILES = buf_mem.c \
 	xfs_dir2_sf.c \
 	xfs_dquot_buf.c \
 	xfs_exchmaps.c \
+	xfs_group.c \
 	xfs_ialloc.c \
 	xfs_iext_tree.c \
 	xfs_inode_buf.c \
diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index d5e075362ababe..f0f35361a0ff97 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -48,7 +48,7 @@ xfs_extent_free_diff_items(
 	struct xfs_extent_free_item	*ra = xefi_entry(a);
 	struct xfs_extent_free_item	*rb = xefi_entry(b);
 
-	return ra->xefi_pag->pag_agno - rb->xefi_pag->pag_agno;
+	return pag_agno(ra->xefi_pag) - pag_agno(rb->xefi_pag);
 }
 
 /* Get an EFI. */
@@ -215,7 +215,7 @@ xfs_rmap_update_diff_items(
 	struct xfs_rmap_intent		*ra = ri_entry(a);
 	struct xfs_rmap_intent		*rb = ri_entry(b);
 
-	return ra->ri_pag->pag_agno - rb->ri_pag->pag_agno;
+	return pag_agno(ra->ri_pag) - pag_agno(rb->ri_pag);
 }
 
 /* Get an RUI. */
@@ -336,7 +336,7 @@ xfs_refcount_update_diff_items(
 	struct xfs_refcount_intent	*ra = ci_entry(a);
 	struct xfs_refcount_intent	*rb = ci_entry(b);
 
-	return ra->ri_pag->pag_agno - rb->ri_pag->pag_agno;
+	return pag_agno(ra->ri_pag) - pag_agno(rb->ri_pag);
 }
 
 /* Get an CUI. */
diff --git a/libxfs/init.c b/libxfs/init.c
index 483cd99546052f..beb58706629d23 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -360,7 +360,7 @@ xfs_set_inode_alloc_perag(
 	xfs_ino_t		ino,
 	xfs_agnumber_t		max_metadata)
 {
-	if (!xfs_is_inode32(pag->pag_mount)) {
+	if (!xfs_is_inode32(pag_mount(pag))) {
 		set_bit(XFS_AGSTATE_ALLOWS_INODES, &pag->pag_opstate);
 		clear_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
 		return false;
@@ -373,7 +373,7 @@ xfs_set_inode_alloc_perag(
 	}
 
 	set_bit(XFS_AGSTATE_ALLOWS_INODES, &pag->pag_opstate);
-	if (pag->pag_agno < max_metadata)
+	if (pag_agno(pag) < max_metadata)
 		set_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
 	else
 		clear_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
@@ -650,6 +650,7 @@ libxfs_mount(
 	struct xfs_buf		*bp;
 	struct xfs_sb		*sbp;
 	xfs_daddr_t		d;
+	int			i;
 	int			error;
 
 	mp->m_features = xfs_sb_version_to_features(sb);
@@ -667,7 +668,8 @@ libxfs_mount(
 	mp->m_finobt_nores = true;
 	xfs_set_inode32(mp);
 	mp->m_sb = *sb;
-	xa_init(&mp->m_perags);
+	for (i = 0; i < XG_TYPE_MAX; i++)
+		xa_init(&mp->m_groups[i].xa);
 	sbp = &mp->m_sb;
 	spin_lock_init(&mp->m_sb_lock);
 	spin_lock_init(&mp->m_agirotor_lock);
diff --git a/libxfs/iunlink.c b/libxfs/iunlink.c
index 6d0554535994c9..53e36cdc3439b2 100644
--- a/libxfs/iunlink.c
+++ b/libxfs/iunlink.c
@@ -65,9 +65,10 @@ xfs_iunlink_log_dinode(
 		goto out;
 	}
 
-	trace_xfs_iunlink_update_dinode(mp, iup->pag->pag_agno,
-			XFS_INO_TO_AGINO(mp, ip->i_ino),
-			be32_to_cpu(dip->di_next_unlinked), iup->next_agino);
+	trace_xfs_iunlink_update_dinode(mp, pag_agno(iup->pag),
+					XFS_INO_TO_AGINO(mp, ip->i_ino),
+					be32_to_cpu(dip->di_next_unlinked),
+					iup->next_agino);
 
 	dip->di_next_unlinked = cpu_to_be32(iup->next_agino);
 	offset = ip->i_imap.im_boffset +
@@ -137,14 +138,14 @@ xfs_iunlink_reload_next(
 	xfs_agino_t		next_agino)
 {
 	struct xfs_perag	*pag = agibp->b_pag;
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_inode	*next_ip = NULL;
 	xfs_ino_t		ino;
 	int			error;
 
 	ASSERT(next_agino != NULLAGINO);
 
-	ino = XFS_AGINO_TO_INO(mp, pag->pag_agno, next_agino);
+	ino = XFS_AGINO_TO_INO(mp, pag_agno(pag), next_agino);
 	error = libxfs_iget(mp, tp, ino, XFS_IGET_UNTRUSTED, &next_ip);
 	if (error)
 		return error;
diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index e22ce4e83f4a62..15d4ac5a99f0e7 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -28,85 +28,7 @@
 #include "xfs_trans.h"
 #include "xfs_trace.h"
 #include "xfs_inode.h"
-
-
-/*
- * Passive reference counting access wrappers to the perag structures.  If the
- * per-ag structure is to be freed, the freeing code is responsible for cleaning
- * up objects with passive references before freeing the structure. This is
- * things like cached buffers.
- */
-struct xfs_perag *
-xfs_perag_get(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno)
-{
-	struct xfs_perag	*pag;
-
-	rcu_read_lock();
-	pag = xa_load(&mp->m_perags, agno);
-	if (pag) {
-		trace_xfs_perag_get(pag, _RET_IP_);
-		ASSERT(atomic_read(&pag->pag_ref) >= 0);
-		atomic_inc(&pag->pag_ref);
-	}
-	rcu_read_unlock();
-	return pag;
-}
-
-/* Get a passive reference to the given perag. */
-struct xfs_perag *
-xfs_perag_hold(
-	struct xfs_perag	*pag)
-{
-	ASSERT(atomic_read(&pag->pag_ref) > 0 ||
-	       atomic_read(&pag->pag_active_ref) > 0);
-
-	trace_xfs_perag_hold(pag, _RET_IP_);
-	atomic_inc(&pag->pag_ref);
-	return pag;
-}
-
-void
-xfs_perag_put(
-	struct xfs_perag	*pag)
-{
-	trace_xfs_perag_put(pag, _RET_IP_);
-	ASSERT(atomic_read(&pag->pag_ref) > 0);
-	atomic_dec(&pag->pag_ref);
-}
-
-/*
- * Active references for perag structures. This is for short term access to the
- * per ag structures for walking trees or accessing state. If an AG is being
- * shrunk or is offline, then this will fail to find that AG and return NULL
- * instead.
- */
-struct xfs_perag *
-xfs_perag_grab(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno)
-{
-	struct xfs_perag	*pag;
-
-	rcu_read_lock();
-	pag = xa_load(&mp->m_perags, agno);
-	if (pag) {
-		trace_xfs_perag_grab(pag, _RET_IP_);
-		if (!atomic_inc_not_zero(&pag->pag_active_ref))
-			pag = NULL;
-	}
-	rcu_read_unlock();
-	return pag;
-}
-
-void
-xfs_perag_rele(
-	struct xfs_perag	*pag)
-{
-	trace_xfs_perag_rele(pag, _RET_IP_);
-	atomic_dec(&pag->pag_active_ref);
-}
+#include "xfs_group.h"
 
 /*
  * xfs_initialize_perag_data
@@ -181,6 +103,19 @@ xfs_initialize_perag_data(
 	return error;
 }
 
+static void
+xfs_perag_uninit(
+	struct xfs_group	*xg)
+{
+#ifdef __KERNEL__
+	struct xfs_perag	*pag = to_perag(xg);
+
+	xfs_defer_drain_free(&pag->pag_intents_drain);
+	cancel_delayed_work_sync(&pag->pag_blockgc_work);
+	xfs_buf_cache_destroy(&pag->pag_bcache);
+#endif
+}
+
 /*
  * Free up the per-ag resources  within the specified AG range.
  */
@@ -193,22 +128,8 @@ xfs_free_perag_range(
 {
 	xfs_agnumber_t		agno;
 
-	for (agno = first_agno; agno < end_agno; agno++) {
-		struct xfs_perag	*pag = xa_erase(&mp->m_perags, agno);
-
-		ASSERT(pag);
-		XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
-		xfs_defer_drain_free(&pag->pag_intents_drain);
-
-		cancel_delayed_work_sync(&pag->pag_blockgc_work);
-		xfs_buf_cache_destroy(&pag->pag_bcache);
-
-		/* drop the mount's active reference */
-		xfs_perag_rele(pag);
-		XFS_IS_CORRUPT(pag->pag_mount,
-				atomic_read(&pag->pag_active_ref) != 0);
-		kfree_rcu_mightsleep(pag);
-	}
+	for (agno = first_agno; agno < end_agno; agno++)
+		xfs_group_free(mp, agno, XG_TYPE_AG, xfs_perag_uninit);
 }
 
 /* Find the size of the AG, in blocks. */
@@ -330,16 +251,9 @@ xfs_perag_alloc(
 	__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
 			&pag->agino_max);
 
-	pag->pag_agno = index;
-	pag->pag_mount = mp;
-	/* Active ref owned by mount indicates AG is online. */
-	atomic_set(&pag->pag_active_ref, 1);
-
-	error = xa_insert(&mp->m_perags, index, pag, GFP_KERNEL);
-	if (error) {
-		WARN_ON_ONCE(error == -EBUSY);
+	error = xfs_group_insert(mp, pag_group(pag), index, XG_TYPE_AG);
+	if (error)
 		goto out_buf_cache_destroy;
-	}
 
 	return 0;
 
@@ -831,7 +745,7 @@ xfs_ag_shrink_space(
 	struct xfs_trans	**tpp,
 	xfs_extlen_t		delta)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_alloc_arg	args = {
 		.tp	= *tpp,
 		.mp	= mp,
@@ -848,7 +762,7 @@ xfs_ag_shrink_space(
 	xfs_agblock_t		aglen;
 	int			error, err2;
 
-	ASSERT(pag->pag_agno == mp->m_sb.sb_agcount - 1);
+	ASSERT(pag_agno(pag) == mp->m_sb.sb_agcount - 1);
 	error = xfs_ialloc_read_agi(pag, *tpp, 0, &agibp);
 	if (error)
 		return error;
@@ -945,8 +859,8 @@ xfs_ag_shrink_space(
 
 	/* Update perag geometry */
 	pag->block_count -= delta;
-	__xfs_agino_range(pag->pag_mount, pag->block_count, &pag->agino_min,
-				&pag->agino_max);
+	__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
+			&pag->agino_max);
 
 	xfs_ialloc_log_agi(*tpp, agibp, XFS_AGI_LENGTH);
 	xfs_alloc_log_agf(*tpp, agfbp, XFS_AGF_LENGTH);
@@ -971,12 +885,13 @@ xfs_ag_extend_space(
 	struct xfs_trans	*tp,
 	xfs_extlen_t		len)
 {
+	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_buf		*bp;
 	struct xfs_agi		*agi;
 	struct xfs_agf		*agf;
 	int			error;
 
-	ASSERT(pag->pag_agno == pag->pag_mount->m_sb.sb_agcount - 1);
+	ASSERT(pag_agno(pag) == mp->m_sb.sb_agcount - 1);
 
 	error = xfs_ialloc_read_agi(pag, tp, 0, &bp);
 	if (error)
@@ -1016,8 +931,8 @@ xfs_ag_extend_space(
 
 	/* Update perag geometry */
 	pag->block_count = be32_to_cpu(agf->agf_length);
-	__xfs_agino_range(pag->pag_mount, pag->block_count, &pag->agino_min,
-				&pag->agino_max);
+	__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
+			&pag->agino_max);
 	return 0;
 }
 
@@ -1044,7 +959,7 @@ xfs_ag_get_geometry(
 
 	/* Fill out form. */
 	memset(ageo, 0, sizeof(*ageo));
-	ageo->ag_number = pag->pag_agno;
+	ageo->ag_number = pag_agno(pag);
 
 	agi = agi_bp->b_addr;
 	ageo->ag_icount = be32_to_cpu(agi->agi_count);
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 8787823ae37f9f..69b934ad2c4aad 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -7,6 +7,8 @@
 #ifndef __LIBXFS_AG_H
 #define __LIBXFS_AG_H 1
 
+#include "xfs_group.h"
+
 struct xfs_mount;
 struct xfs_trans;
 struct xfs_perag;
@@ -30,10 +32,7 @@ struct xfs_ag_resv {
  * performance of allocation group selection.
  */
 struct xfs_perag {
-	struct xfs_mount *pag_mount;	/* owner filesystem */
-	xfs_agnumber_t	pag_agno;	/* AG this structure belongs to */
-	atomic_t	pag_ref;	/* passive reference count */
-	atomic_t	pag_active_ref;	/* active reference count */
+	struct xfs_group pag_group;
 	unsigned long	pag_opstate;
 	uint8_t		pagf_bno_level;	/* # of levels in bno btree */
 	uint8_t		pagf_cnt_level;	/* # of levels in cnt btree */
@@ -121,6 +120,26 @@ struct xfs_perag {
 #endif /* __KERNEL__ */
 };
 
+static inline struct xfs_perag *to_perag(struct xfs_group *xg)
+{
+	return container_of(xg, struct xfs_perag, pag_group);
+}
+
+static inline struct xfs_group *pag_group(struct xfs_perag *pag)
+{
+	return &pag->pag_group;
+}
+
+static inline struct xfs_mount *pag_mount(const struct xfs_perag *pag)
+{
+	return pag->pag_group.xg_mount;
+}
+
+static inline xfs_agnumber_t pag_agno(const struct xfs_perag *pag)
+{
+	return pag->pag_group.xg_gno;
+}
+
 /*
  * Per-AG operational state. These are atomic flag bits.
  */
@@ -151,13 +170,43 @@ int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno);
 int xfs_update_last_ag_size(struct xfs_mount *mp, xfs_agnumber_t prev_agcount);
 
 /* Passive AG references */
-struct xfs_perag *xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno);
-struct xfs_perag *xfs_perag_hold(struct xfs_perag *pag);
-void xfs_perag_put(struct xfs_perag *pag);
+static inline struct xfs_perag *
+xfs_perag_get(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	return to_perag(xfs_group_get(mp, agno, XG_TYPE_AG));
+}
+
+static inline struct xfs_perag *
+xfs_perag_hold(
+	struct xfs_perag	*pag)
+{
+	return to_perag(xfs_group_hold(pag_group(pag)));
+}
+
+static inline void
+xfs_perag_put(
+	struct xfs_perag	*pag)
+{
+	xfs_group_put(pag_group(pag));
+}
 
 /* Active AG references */
-struct xfs_perag *xfs_perag_grab(struct xfs_mount *, xfs_agnumber_t);
-void xfs_perag_rele(struct xfs_perag *pag);
+static inline struct xfs_perag *
+xfs_perag_grab(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	return to_perag(xfs_group_grab(mp, agno, XG_TYPE_AG));
+}
+
+static inline void
+xfs_perag_rele(
+	struct xfs_perag	*pag)
+{
+	xfs_group_rele(pag_group(pag));
+}
 
 /*
  * Per-ag geometry infomation and validation
@@ -233,9 +282,9 @@ xfs_perag_next(
 	xfs_agnumber_t		*agno,
 	xfs_agnumber_t		end_agno)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 
-	*agno = pag->pag_agno + 1;
+	*agno = pag_agno(pag) + 1;
 	xfs_perag_rele(pag);
 	while (*agno <= end_agno) {
 		pag = xfs_perag_grab(mp, *agno);
@@ -266,9 +315,9 @@ xfs_perag_next_wrap(
 	xfs_agnumber_t		restart_agno,
 	xfs_agnumber_t		wrap_agno)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 
-	*agno = pag->pag_agno + 1;
+	*agno = pag_agno(pag) + 1;
 	xfs_perag_rele(pag);
 	while (*agno != stop_agno) {
 		if (*agno >= wrap_agno) {
@@ -335,7 +384,7 @@ xfs_agbno_to_fsb(
 	struct xfs_perag	*pag,
 	xfs_agblock_t		agbno)
 {
-	return XFS_AGB_TO_FSB(pag->pag_mount, pag->pag_agno, agbno);
+	return XFS_AGB_TO_FSB(pag_mount(pag), pag_agno(pag), agbno);
 }
 
 static inline xfs_daddr_t
@@ -343,7 +392,7 @@ xfs_agbno_to_daddr(
 	struct xfs_perag	*pag,
 	xfs_agblock_t		agbno)
 {
-	return XFS_AGB_TO_DADDR(pag->pag_mount, pag->pag_agno, agbno);
+	return XFS_AGB_TO_DADDR(pag_mount(pag), pag_agno(pag), agbno);
 }
 
 static inline xfs_ino_t
@@ -351,7 +400,7 @@ xfs_agino_to_ino(
 	struct xfs_perag	*pag,
 	xfs_agino_t		agino)
 {
-	return XFS_AGINO_TO_INO(pag->pag_mount, pag->pag_agno, agino);
+	return XFS_AGINO_TO_INO(pag_mount(pag), pag_agno(pag), agino);
 }
 
 #endif /* __LIBXFS_AG_H */
diff --git a/libxfs/xfs_ag_resv.c b/libxfs/xfs_ag_resv.c
index d1657d6c636546..f5cbaa94664f22 100644
--- a/libxfs/xfs_ag_resv.c
+++ b/libxfs/xfs_ag_resv.c
@@ -69,6 +69,7 @@ xfs_ag_resv_critical(
 	struct xfs_perag		*pag,
 	enum xfs_ag_resv_type		type)
 {
+	struct xfs_mount		*mp = pag_mount(pag);
 	xfs_extlen_t			avail;
 	xfs_extlen_t			orig;
 
@@ -91,8 +92,8 @@ xfs_ag_resv_critical(
 
 	/* Critically low if less than 10% or max btree height remains. */
 	return XFS_TEST_ERROR(avail < orig / 10 ||
-			      avail < pag->pag_mount->m_agbtree_maxlevels,
-			pag->pag_mount, XFS_ERRTAG_AG_RESV_CRITICAL);
+			      avail < mp->m_agbtree_maxlevels,
+			mp, XFS_ERRTAG_AG_RESV_CRITICAL);
 }
 
 /*
@@ -136,8 +137,8 @@ __xfs_ag_resv_free(
 	trace_xfs_ag_resv_free(pag, type, 0);
 
 	resv = xfs_perag_resv(pag, type);
-	if (pag->pag_agno == 0)
-		pag->pag_mount->m_ag_max_usable += resv->ar_asked;
+	if (pag_agno(pag) == 0)
+		pag_mount(pag)->m_ag_max_usable += resv->ar_asked;
 	/*
 	 * RMAPBT blocks come from the AGFL and AGFL blocks are always
 	 * considered "free", so whatever was reserved at mount time must be
@@ -147,7 +148,7 @@ __xfs_ag_resv_free(
 		oldresv = resv->ar_orig_reserved;
 	else
 		oldresv = resv->ar_reserved;
-	xfs_add_fdblocks(pag->pag_mount, oldresv);
+	xfs_add_fdblocks(pag_mount(pag), oldresv);
 	resv->ar_reserved = 0;
 	resv->ar_asked = 0;
 	resv->ar_orig_reserved = 0;
@@ -169,7 +170,7 @@ __xfs_ag_resv_init(
 	xfs_extlen_t			ask,
 	xfs_extlen_t			used)
 {
-	struct xfs_mount		*mp = pag->pag_mount;
+	struct xfs_mount		*mp = pag_mount(pag);
 	struct xfs_ag_resv		*resv;
 	int				error;
 	xfs_extlen_t			hidden_space;
@@ -208,7 +209,7 @@ __xfs_ag_resv_init(
 		trace_xfs_ag_resv_init_error(pag, error, _RET_IP_);
 		xfs_warn(mp,
 "Per-AG reservation for AG %u failed.  Filesystem may run out of space.",
-				pag->pag_agno);
+				pag_agno(pag));
 		return error;
 	}
 
@@ -218,7 +219,7 @@ __xfs_ag_resv_init(
 	 * counter, we only make the adjustment for AG 0.  This assumes that
 	 * there aren't any AGs hungrier for per-AG reservation than AG 0.
 	 */
-	if (pag->pag_agno == 0)
+	if (pag_agno(pag) == 0)
 		mp->m_ag_max_usable -= ask;
 
 	resv = xfs_perag_resv(pag, type);
@@ -236,7 +237,7 @@ xfs_ag_resv_init(
 	struct xfs_perag		*pag,
 	struct xfs_trans		*tp)
 {
-	struct xfs_mount		*mp = pag->pag_mount;
+	struct xfs_mount		*mp = pag_mount(pag);
 	xfs_extlen_t			ask;
 	xfs_extlen_t			used;
 	int				error = 0, error2;
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index bd39bcde0ea224..39e1961078ae3a 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -271,7 +271,7 @@ xfs_alloc_complain_bad_rec(
 
 	xfs_warn(mp,
 		"%sbt record corruption in AG %d detected at %pS!",
-		cur->bc_ops->name, cur->bc_ag.pag->pag_agno, fa);
+		cur->bc_ops->name, pag_agno(cur->bc_ag.pag), fa);
 	xfs_warn(mp,
 		"start block 0x%x block count 0x%x", irec->ar_startblock,
 		irec->ar_blockcount);
@@ -795,7 +795,7 @@ xfs_agfl_verify(
 	 * use it by using uncached buffers that don't have the perag attached
 	 * so we can detect and avoid this problem.
 	 */
-	if (bp->b_pag && be32_to_cpu(agfl->agfl_seqno) != bp->b_pag->pag_agno)
+	if (bp->b_pag && be32_to_cpu(agfl->agfl_seqno) != pag_agno((bp->b_pag)))
 		return __this_address;
 
 	for (i = 0; i < xfs_agfl_size(mp); i++) {
@@ -875,13 +875,12 @@ xfs_alloc_read_agfl(
 	struct xfs_trans	*tp,
 	struct xfs_buf		**bpp)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_buf		*bp;
 	int			error;
 
-	error = xfs_trans_read_buf(
-			mp, tp, mp->m_ddev_targp,
-			XFS_AG_DADDR(mp, pag->pag_agno, XFS_AGFL_DADDR(mp)),
+	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
+			XFS_AG_DADDR(mp, pag_agno(pag), XFS_AGFL_DADDR(mp)),
 			XFS_FSS_TO_BB(mp, 1), 0, &bp, &xfs_agfl_buf_ops);
 	if (xfs_metadata_is_sick(error))
 		xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL);
@@ -2424,7 +2423,7 @@ xfs_alloc_longest_free_extent(
 	 * reservations and AGFL rules in place, we can return this extent.
 	 */
 	if (pag->pagf_longest > delta)
-		return min_t(xfs_extlen_t, pag->pag_mount->m_ag_max_usable,
+		return min_t(xfs_extlen_t, pag_mount(pag)->m_ag_max_usable,
 				pag->pagf_longest - delta);
 
 	/* Otherwise, let the caller try for 1 block if there's space. */
@@ -2607,7 +2606,7 @@ xfs_agfl_reset(
 	xfs_warn(mp,
 	       "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. "
 	       "Please unmount and run xfs_repair.",
-	         pag->pag_agno, pag->pagf_flcount);
+		pag_agno(pag), pag->pagf_flcount);
 
 	agf->agf_flfirst = 0;
 	agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1);
@@ -3182,7 +3181,7 @@ xfs_validate_ag_length(
 	 * use it by using uncached buffers that don't have the perag attached
 	 * so we can detect and avoid this problem.
 	 */
-	if (bp->b_pag && seqno != bp->b_pag->pag_agno)
+	if (bp->b_pag && seqno != pag_agno(bp->b_pag))
 		return __this_address;
 
 	/*
@@ -3351,13 +3350,13 @@ xfs_read_agf(
 	int			flags,
 	struct xfs_buf		**agfbpp)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	int			error;
 
 	trace_xfs_read_agf(pag);
 
 	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
-			XFS_AG_DADDR(mp, pag->pag_agno, XFS_AGF_DADDR(mp)),
+			XFS_AG_DADDR(mp, pag_agno(pag), XFS_AGF_DADDR(mp)),
 			XFS_FSS_TO_BB(mp, 1), flags, agfbpp, &xfs_agf_buf_ops);
 	if (xfs_metadata_is_sick(error))
 		xfs_ag_mark_sick(pag, XFS_SICK_AG_AGF);
@@ -3380,6 +3379,7 @@ xfs_alloc_read_agf(
 	int			flags,
 	struct xfs_buf		**agfbpp)
 {
+	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_buf		*agfbp;
 	struct xfs_agf		*agf;
 	int			error;
@@ -3406,7 +3406,7 @@ xfs_alloc_read_agf(
 		pag->pagf_cnt_level = be32_to_cpu(agf->agf_cnt_level);
 		pag->pagf_rmap_level = be32_to_cpu(agf->agf_rmap_level);
 		pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
-		if (xfs_agfl_needs_reset(pag->pag_mount, agf))
+		if (xfs_agfl_needs_reset(mp, agf))
 			set_bit(XFS_AGSTATE_AGFL_NEEDS_RESET, &pag->pag_opstate);
 		else
 			clear_bit(XFS_AGSTATE_AGFL_NEEDS_RESET, &pag->pag_opstate);
@@ -3419,16 +3419,15 @@ xfs_alloc_read_agf(
 		 * counter only tracks non-root blocks.
 		 */
 		allocbt_blks = pag->pagf_btreeblks;
-		if (xfs_has_rmapbt(pag->pag_mount))
+		if (xfs_has_rmapbt(mp))
 			allocbt_blks -= be32_to_cpu(agf->agf_rmap_blocks) - 1;
 		if (allocbt_blks > 0)
-			atomic64_add(allocbt_blks,
-					&pag->pag_mount->m_allocbt_blks);
+			atomic64_add(allocbt_blks, &mp->m_allocbt_blks);
 
 		set_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
 	}
 #ifdef DEBUG
-	else if (!xfs_is_shutdown(pag->pag_mount)) {
+	else if (!xfs_is_shutdown(mp)) {
 		ASSERT(pag->pagf_freeblks == be32_to_cpu(agf->agf_freeblks));
 		ASSERT(pag->pagf_btreeblks == be32_to_cpu(agf->agf_btreeblks));
 		ASSERT(pag->pagf_flcount == be32_to_cpu(agf->agf_flcount));
@@ -3646,7 +3645,7 @@ xfs_alloc_vextent_this_ag(
 	int			error;
 
 	ASSERT(args->pag != NULL);
-	ASSERT(args->pag->pag_agno == agno);
+	ASSERT(pag_agno(args->pag) == agno);
 
 	args->agno = agno;
 	args->agbno = 0;
@@ -3859,7 +3858,7 @@ xfs_alloc_vextent_exact_bno(
 	int			error;
 
 	ASSERT(args->pag != NULL);
-	ASSERT(args->pag->pag_agno == XFS_FSB_TO_AGNO(mp, target));
+	ASSERT(pag_agno(args->pag) == XFS_FSB_TO_AGNO(mp, target));
 
 	args->agno = XFS_FSB_TO_AGNO(mp, target);
 	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
@@ -3898,7 +3897,7 @@ xfs_alloc_vextent_near_bno(
 	int			error;
 
 	if (!needs_perag)
-		ASSERT(args->pag->pag_agno == XFS_FSB_TO_AGNO(mp, target));
+		ASSERT(pag_agno(args->pag) == XFS_FSB_TO_AGNO(mp, target));
 
 	args->agno = XFS_FSB_TO_AGNO(mp, target);
 	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
@@ -3935,7 +3934,7 @@ xfs_free_extent_fix_freelist(
 	memset(&args, 0, sizeof(struct xfs_alloc_arg));
 	args.tp = tp;
 	args.mp = tp->t_mountp;
-	args.agno = pag->pag_agno;
+	args.agno = pag_agno(pag);
 	args.pag = pag;
 
 	/*
diff --git a/libxfs/xfs_alloc_btree.c b/libxfs/xfs_alloc_btree.c
index 949cd18ab16a99..667655a639fef1 100644
--- a/libxfs/xfs_alloc_btree.c
+++ b/libxfs/xfs_alloc_btree.c
@@ -176,7 +176,7 @@ xfs_allocbt_init_ptr_from_cur(
 {
 	struct xfs_agf		*agf = cur->bc_ag.agbp->b_addr;
 
-	ASSERT(cur->bc_ag.pag->pag_agno == be32_to_cpu(agf->agf_seqno));
+	ASSERT(pag_agno(cur->bc_ag.pag) == be32_to_cpu(agf->agf_seqno));
 
 	if (xfs_btree_is_bno(cur->bc_ops))
 		ptr->s = agf->agf_bno_root;
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index aec378ff4a9193..13cd4faa492838 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -3274,7 +3274,7 @@ xfs_bmap_longest_free_extent(
 	}
 
 	longest = xfs_alloc_longest_free_extent(pag,
-				xfs_alloc_min_freelist(pag->pag_mount, pag),
+				xfs_alloc_min_freelist(pag_mount(pag), pag),
 				xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE));
 	if (*blen < longest)
 		*blen = longest;
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 4f04a92f6513bf..2b63c18114763c 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -370,7 +370,7 @@ xfs_btree_check_ptr(
 		case XFS_BTREE_TYPE_AG:
 			xfs_err(cur->bc_mp,
 "AG %u: Corrupt %sbt pointer at level %d index %d.",
-				cur->bc_ag.pag->pag_agno, cur->bc_ops->name,
+				pag_agno(cur->bc_ag.pag), cur->bc_ops->name,
 				level, index);
 			break;
 		}
@@ -1310,7 +1310,7 @@ xfs_btree_owner(
 	case XFS_BTREE_TYPE_INODE:
 		return cur->bc_ino.ip->i_ino;
 	case XFS_BTREE_TYPE_AG:
-		return cur->bc_ag.pag->pag_agno;
+		return pag_agno(cur->bc_ag.pag);
 	default:
 		ASSERT(0);
 		return 0;
@@ -4742,7 +4742,7 @@ xfs_btree_agblock_v5hdr_verify(
 		return __this_address;
 	if (block->bb_u.s.bb_blkno != cpu_to_be64(xfs_buf_daddr(bp)))
 		return __this_address;
-	if (pag && be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+	if (pag && be32_to_cpu(block->bb_u.s.bb_owner) != pag_agno(pag))
 		return __this_address;
 	return NULL;
 }
diff --git a/libxfs/xfs_group.c b/libxfs/xfs_group.c
new file mode 100644
index 00000000000000..8a67148362b0d7
--- /dev/null
+++ b/libxfs/xfs_group.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Red Hat, Inc.
+ */
+
+#include "libxfs_priv.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_trace.h"
+#include "xfs_group.h"
+
+/*
+ * Groups can have passive and active references.
+ *
+ * For passive references the code freeing a group is responsible for cleaning
+ * up objects that hold the passive references (e.g. cached buffers).
+ * Routines manipulating passive references are xfs_group_get, xfs_group_hold
+ * and xfs_group_put.
+ *
+ * Active references are for short term access to the group for walking trees or
+ * accessing state. If a group is being shrunk or offlined, the lookup will fail
+ * to find that group and return NULL instead.
+ * Routines manipulating active references are xfs_group_grab and
+ * xfs_group_rele.
+ */
+
+struct xfs_group *
+xfs_group_get(
+	struct xfs_mount	*mp,
+	uint32_t		index,
+	enum xfs_group_type	type)
+{
+	struct xfs_group	*xg;
+
+	rcu_read_lock();
+	xg = xa_load(&mp->m_groups[type].xa, index);
+	if (xg) {
+		trace_xfs_group_get(xg, _RET_IP_);
+		ASSERT(atomic_read(&xg->xg_ref) >= 0);
+		atomic_inc(&xg->xg_ref);
+	}
+	rcu_read_unlock();
+	return xg;
+}
+
+struct xfs_group *
+xfs_group_hold(
+	struct xfs_group	*xg)
+{
+	ASSERT(atomic_read(&xg->xg_ref) > 0 ||
+	       atomic_read(&xg->xg_active_ref) > 0);
+
+	trace_xfs_group_hold(xg, _RET_IP_);
+	atomic_inc(&xg->xg_ref);
+	return xg;
+}
+
+void
+xfs_group_put(
+	struct xfs_group	*xg)
+{
+	trace_xfs_group_put(xg, _RET_IP_);
+
+	ASSERT(atomic_read(&xg->xg_ref) > 0);
+	atomic_dec(&xg->xg_ref);
+}
+
+struct xfs_group *
+xfs_group_grab(
+	struct xfs_mount	*mp,
+	uint32_t		index,
+	enum xfs_group_type	type)
+{
+	struct xfs_group	*xg;
+
+	rcu_read_lock();
+	xg = xa_load(&mp->m_groups[type].xa, index);
+	if (xg) {
+		trace_xfs_group_grab(xg, _RET_IP_);
+		if (!atomic_inc_not_zero(&xg->xg_active_ref))
+			xg = NULL;
+	}
+	rcu_read_unlock();
+	return xg;
+}
+
+/*
+ * Find the next group after @xg, or the first group if @xg is NULL.
+ */
+struct xfs_group *
+xfs_group_grab_next_mark(
+	struct xfs_mount	*mp,
+	struct xfs_group	*xg,
+	xa_mark_t		mark,
+	enum xfs_group_type	type)
+{
+	unsigned long		index = 0;
+
+	if (xg) {
+		index = xg->xg_gno + 1;
+		xfs_group_rele(xg);
+	}
+
+	rcu_read_lock();
+	xg = xa_find(&mp->m_groups[type].xa, &index, ULONG_MAX, mark);
+	if (xg) {
+		trace_xfs_group_grab_next_tag(xg, _RET_IP_);
+		if (!atomic_inc_not_zero(&xg->xg_active_ref))
+			xg = NULL;
+	}
+	rcu_read_unlock();
+	return xg;
+}
+
+void
+xfs_group_rele(
+	struct xfs_group	*xg)
+{
+	trace_xfs_group_rele(xg, _RET_IP_);
+	atomic_dec(&xg->xg_active_ref);
+}
+
+void
+xfs_group_free(
+	struct xfs_mount	*mp,
+	uint32_t		index,
+	enum xfs_group_type	type,
+	void			(*uninit)(struct xfs_group *xg))
+{
+	struct xfs_group	*xg = xa_erase(&mp->m_groups[type].xa, index);
+
+	XFS_IS_CORRUPT(mp, atomic_read(&xg->xg_ref) != 0);
+
+	if (uninit)
+		uninit(xg);
+
+	/* drop the mount's active reference */
+	xfs_group_rele(xg);
+	XFS_IS_CORRUPT(mp, atomic_read(&xg->xg_active_ref) != 0);
+	kfree_rcu_mightsleep(xg);
+}
+
+int
+xfs_group_insert(
+	struct xfs_mount	*mp,
+	struct xfs_group	*xg,
+	uint32_t		index,
+	enum xfs_group_type	type)
+{
+	int			error;
+
+	xg->xg_mount = mp;
+	xg->xg_gno = index;
+	xg->xg_type = type;
+
+	/* Active ref owned by mount indicates group is online. */
+	atomic_set(&xg->xg_active_ref, 1);
+
+	error = xa_insert(&mp->m_groups[type].xa, index, xg, GFP_KERNEL);
+	if (error) {
+		WARN_ON_ONCE(error == -EBUSY);
+		return error;
+	}
+
+	return 0;
+}
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
new file mode 100644
index 00000000000000..e3b6be7ff9e802
--- /dev/null
+++ b/libxfs/xfs_group.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Red Hat, Inc.
+ */
+#ifndef __LIBXFS_GROUP_H
+#define __LIBXFS_GROUP_H 1
+
+struct xfs_group {
+	struct xfs_mount	*xg_mount;
+	uint32_t		xg_gno;
+	enum xfs_group_type	xg_type;
+	atomic_t		xg_ref;		/* passive reference count */
+	atomic_t		xg_active_ref;	/* active reference count */
+};
+
+struct xfs_group *xfs_group_get(struct xfs_mount *mp, uint32_t index,
+		enum xfs_group_type type);
+struct xfs_group *xfs_group_hold(struct xfs_group *xg);
+void xfs_group_put(struct xfs_group *xg);
+
+struct xfs_group *xfs_group_grab(struct xfs_mount *mp, uint32_t index,
+		enum xfs_group_type type);
+struct xfs_group *xfs_group_grab_next_mark(struct xfs_mount *mp,
+		struct xfs_group *xg, xa_mark_t mark, enum xfs_group_type type);
+void xfs_group_rele(struct xfs_group *xg);
+
+void xfs_group_free(struct xfs_mount *mp, uint32_t index,
+		enum xfs_group_type type, void (*uninit)(struct xfs_group *xg));
+int xfs_group_insert(struct xfs_mount *mp, struct xfs_group *xg,
+		uint32_t index, enum xfs_group_type);
+
+#define xfs_group_set_mark(_xg, _mark) \
+	xa_set_mark(&(_xg)->xg_mount->m_groups[(_xg)->xg_type].xa, \
+			(_xg)->xg_gno, (_mark))
+#define xfs_group_clear_mark(_xg, _mark) \
+	xa_clear_mark(&(_xg)->xg_mount->m_groups[(_xg)->xg_type].xa, \
+			(_xg)->xg_gno, (_mark))
+#define xfs_group_marked(_mp, _type, _mark) \
+	xa_marked(&(_mp)->m_groups[(_type)].xa, (_mark))
+
+#endif /* __LIBXFS_GROUP_H */
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 4f087e6b074081..e4b9d3e2fbb5ce 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -137,7 +137,7 @@ xfs_inobt_complain_bad_rec(
 
 	xfs_warn(mp,
 		"%sbt record corruption in AG %d detected at %pS!",
-		cur->bc_ops->name, cur->bc_ag.pag->pag_agno, fa);
+		cur->bc_ops->name, pag_agno(cur->bc_ag.pag), fa);
 	xfs_warn(mp,
 "start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x",
 		irec->ir_startino, irec->ir_count, irec->ir_freecount,
@@ -546,7 +546,7 @@ xfs_inobt_insert_sprec(
 	struct xfs_buf			*agbp,
 	struct xfs_inobt_rec_incore	*nrec)	/* in/out: new/merged rec. */
 {
-	struct xfs_mount		*mp = pag->pag_mount;
+	struct xfs_mount		*mp = pag_mount(pag);
 	struct xfs_btree_cur		*cur;
 	int				error;
 	int				i;
@@ -640,7 +640,7 @@ xfs_finobt_insert_sprec(
 	struct xfs_buf			*agbp,
 	struct xfs_inobt_rec_incore	*nrec)	/* in/out: new rec. */
 {
-	struct xfs_mount		*mp = pag->pag_mount;
+	struct xfs_mount		*mp = pag_mount(pag);
 	struct xfs_btree_cur		*cur;
 	int				error;
 	int				i;
@@ -875,7 +875,7 @@ xfs_ialloc_ag_alloc(
 	 * rather than a linear progression to prevent the next generation
 	 * number from being easily guessable.
 	 */
-	error = xfs_ialloc_inode_init(args.mp, tp, NULL, newlen, pag->pag_agno,
+	error = xfs_ialloc_inode_init(args.mp, tp, NULL, newlen, pag_agno(pag),
 			args.agbno, args.len, get_random_u32());
 
 	if (error)
@@ -1066,7 +1066,7 @@ xfs_dialloc_check_ino(
 	if (error)
 		return -EAGAIN;
 
-	error = xfs_imap_to_bp(pag->pag_mount, tp, &imap, &bp);
+	error = xfs_imap_to_bp(pag_mount(pag), tp, &imap, &bp);
 	if (error)
 		return -EAGAIN;
 
@@ -1117,7 +1117,7 @@ xfs_dialloc_ag_inobt(
 	/*
 	 * If in the same AG as the parent, try to get near the parent.
 	 */
-	if (pagno == pag->pag_agno) {
+	if (pagno == pag_agno(pag)) {
 		int		doneleft;	/* done, to the left */
 		int		doneright;	/* done, to the right */
 
@@ -1594,7 +1594,7 @@ xfs_dialloc_ag(
 	 * parent. If so, find the closest available inode to the parent. If
 	 * not, consider the agi hint or find the first free inode in the AG.
 	 */
-	if (pag->pag_agno == pagno)
+	if (pag_agno(pag) == pagno)
 		error = xfs_dialloc_ag_finobt_near(pagino, &cur, &rec);
 	else
 		error = xfs_dialloc_ag_finobt_newino(agi, cur, &rec);
@@ -2048,7 +2048,7 @@ xfs_difree_inobt(
 	struct xfs_icluster		*xic,
 	struct xfs_inobt_rec_incore	*orec)
 {
-	struct xfs_mount		*mp = pag->pag_mount;
+	struct xfs_mount		*mp = pag_mount(pag);
 	struct xfs_agi			*agi = agbp->b_addr;
 	struct xfs_btree_cur		*cur;
 	struct xfs_inobt_rec_incore	rec;
@@ -2182,7 +2182,7 @@ xfs_difree_finobt(
 	xfs_agino_t			agino,
 	struct xfs_inobt_rec_incore	*ibtrec) /* inobt record */
 {
-	struct xfs_mount		*mp = pag->pag_mount;
+	struct xfs_mount		*mp = pag_mount(pag);
 	struct xfs_btree_cur		*cur;
 	struct xfs_inobt_rec_incore	rec;
 	int				offset = agino - ibtrec->ir_startino;
@@ -2305,9 +2305,9 @@ xfs_difree(
 	/*
 	 * Break up inode number into its components.
 	 */
-	if (pag->pag_agno != XFS_INO_TO_AGNO(mp, inode)) {
-		xfs_warn(mp, "%s: agno != pag->pag_agno (%d != %d).",
-			__func__, XFS_INO_TO_AGNO(mp, inode), pag->pag_agno);
+	if (pag_agno(pag) != XFS_INO_TO_AGNO(mp, inode)) {
+		xfs_warn(mp, "%s: agno != pag_agno(pag) (%d != %d).",
+			__func__, XFS_INO_TO_AGNO(mp, inode), pag_agno(pag));
 		ASSERT(0);
 		return -EINVAL;
 	}
@@ -2368,7 +2368,7 @@ xfs_imap_lookup(
 	xfs_agblock_t		*offset_agbno,
 	int			flags)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_inobt_rec_incore rec;
 	struct xfs_btree_cur	*cur;
 	struct xfs_buf		*agbp;
@@ -2379,7 +2379,7 @@ xfs_imap_lookup(
 	if (error) {
 		xfs_alert(mp,
 			"%s: xfs_ialloc_read_agi() returned error %d, agno %d",
-			__func__, error, pag->pag_agno);
+			__func__, error, pag_agno(pag));
 		return error;
 	}
 
@@ -2429,7 +2429,7 @@ xfs_imap(
 	struct xfs_imap		*imap,	/* location map structure */
 	uint			flags)	/* flags for inode btree lookup */
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	xfs_agblock_t		agbno;	/* block number of inode in the alloc group */
 	xfs_agino_t		agino;	/* inode number within alloc group */
 	xfs_agblock_t		chunk_agbno;	/* first block in inode chunk */
@@ -2721,13 +2721,13 @@ xfs_read_agi(
 	xfs_buf_flags_t		flags,
 	struct xfs_buf		**agibpp)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	int			error;
 
 	trace_xfs_read_agi(pag);
 
 	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
-			XFS_AG_DADDR(mp, pag->pag_agno, XFS_AGI_DADDR(mp)),
+			XFS_AG_DADDR(mp, pag_agno(pag), XFS_AGI_DADDR(mp)),
 			XFS_FSS_TO_BB(mp, 1), flags, agibpp, &xfs_agi_buf_ops);
 	if (xfs_metadata_is_sick(error))
 		xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
@@ -2775,7 +2775,7 @@ xfs_ialloc_read_agi(
 	 * we are in the middle of a forced shutdown.
 	 */
 	ASSERT(pag->pagi_freecount == be32_to_cpu(agi->agi_freecount) ||
-		xfs_is_shutdown(pag->pag_mount));
+		xfs_is_shutdown(pag_mount(pag)));
 	if (agibpp)
 		*agibpp = agibp;
 	else
@@ -3114,13 +3114,13 @@ xfs_ialloc_check_shrink(
 	int			has;
 	int			error;
 
-	if (!xfs_has_sparseinodes(pag->pag_mount))
+	if (!xfs_has_sparseinodes(pag_mount(pag)))
 		return 0;
 
 	cur = xfs_inobt_init_cursor(pag, tp, agibp);
 
 	/* Look up the inobt record that would correspond to the new EOFS. */
-	agino = XFS_AGB_TO_AGINO(pag->pag_mount, new_length);
+	agino = XFS_AGB_TO_AGINO(pag_mount(pag), new_length);
 	error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &has);
 	if (error || !has)
 		goto out;
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index f80368b4d5fa5f..45908cce464dff 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -247,7 +247,7 @@ xfs_inobt_init_ptr_from_cur(
 {
 	struct xfs_agi		*agi = cur->bc_ag.agbp->b_addr;
 
-	ASSERT(cur->bc_ag.pag->pag_agno == be32_to_cpu(agi->agi_seqno));
+	ASSERT(pag_agno(cur->bc_ag.pag) == be32_to_cpu(agi->agi_seqno));
 
 	ptr->s = agi->agi_root;
 }
@@ -259,7 +259,7 @@ xfs_finobt_init_ptr_from_cur(
 {
 	struct xfs_agi		*agi = cur->bc_ag.agbp->b_addr;
 
-	ASSERT(cur->bc_ag.pag->pag_agno == be32_to_cpu(agi->agi_seqno));
+	ASSERT(pag_agno(cur->bc_ag.pag) == be32_to_cpu(agi->agi_seqno));
 	ptr->s = agi->agi_free_root;
 }
 
@@ -477,7 +477,7 @@ xfs_inobt_init_cursor(
 	struct xfs_trans	*tp,
 	struct xfs_buf		*agbp)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_btree_cur	*cur;
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_inobt_ops,
@@ -503,7 +503,7 @@ xfs_finobt_init_cursor(
 	struct xfs_trans	*tp,
 	struct xfs_buf		*agbp)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_btree_cur	*cur;
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_finobt_ops,
@@ -714,7 +714,7 @@ static xfs_extlen_t
 xfs_inobt_max_size(
 	struct xfs_perag	*pag)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	xfs_agblock_t		agblocks = pag->block_count;
 
 	/* Bail out if we're uninitialized, which can happen in mkfs. */
@@ -726,7 +726,7 @@ xfs_inobt_max_size(
 	 * never be available for the kinds of things that would require btree
 	 * expansion.  We therefore can pretend the space isn't there.
 	 */
-	if (xfs_ag_contains_log(mp, pag->pag_agno))
+	if (xfs_ag_contains_log(mp, pag_agno(pag)))
 		agblocks -= mp->m_sb.sb_logblocks;
 
 	return xfs_btree_calc_size(M_IGEO(mp)->inobt_mnr,
@@ -790,10 +790,10 @@ xfs_finobt_calc_reserves(
 	xfs_extlen_t		tree_len = 0;
 	int			error;
 
-	if (!xfs_has_finobt(pag->pag_mount))
+	if (!xfs_has_finobt(pag_mount(pag)))
 		return 0;
 
-	if (xfs_has_inobtcounts(pag->pag_mount))
+	if (xfs_has_inobtcounts(pag_mount(pag)))
 		error = xfs_finobt_read_blocks(pag, tp, &tree_len);
 	else
 		error = xfs_finobt_count_blocks(pag, tp, &tree_len);
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index eeddec68a08539..9507cf74578d4f 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -153,7 +153,7 @@ xfs_refcount_complain_bad_rec(
 
 	xfs_warn(mp,
  "Refcount BTree record corruption in AG %d detected at %pS!",
-				cur->bc_ag.pag->pag_agno, fa);
+				pag_agno(cur->bc_ag.pag), fa);
 	xfs_warn(mp,
 		"Start block 0x%x, block count 0x%x, references 0x%x",
 		irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount);
@@ -1320,7 +1320,7 @@ xfs_refcount_continue_op(
 	ri->ri_startblock = xfs_agbno_to_fsb(pag, new_agbno);
 
 	ASSERT(xfs_verify_fsbext(mp, ri->ri_startblock, ri->ri_blockcount));
-	ASSERT(pag->pag_agno == XFS_FSB_TO_AGNO(mp, ri->ri_startblock));
+	ASSERT(pag_agno(pag) == XFS_FSB_TO_AGNO(mp, ri->ri_startblock));
 
 	return 0;
 }
diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c
index 5913f4176889ea..e9c4fc419a6114 100644
--- a/libxfs/xfs_refcount_btree.c
+++ b/libxfs/xfs_refcount_btree.c
@@ -80,7 +80,7 @@ xfs_refcountbt_alloc_block(
 		*stat = 0;
 		return 0;
 	}
-	ASSERT(args.agno == cur->bc_ag.pag->pag_agno);
+	ASSERT(args.agno == pag_agno(cur->bc_ag.pag));
 	ASSERT(args.len == 1);
 
 	new->s = cpu_to_be32(args.agbno);
@@ -168,7 +168,7 @@ xfs_refcountbt_init_ptr_from_cur(
 {
 	struct xfs_agf		*agf = cur->bc_ag.agbp->b_addr;
 
-	ASSERT(cur->bc_ag.pag->pag_agno == be32_to_cpu(agf->agf_seqno));
+	ASSERT(pag_agno(cur->bc_ag.pag) == be32_to_cpu(agf->agf_seqno));
 
 	ptr->s = agf->agf_refcount_root;
 }
@@ -360,7 +360,7 @@ xfs_refcountbt_init_cursor(
 {
 	struct xfs_btree_cur	*cur;
 
-	ASSERT(pag->pag_agno < mp->m_sb.sb_agcount);
+	ASSERT(pag_agno(pag) < mp->m_sb.sb_agcount);
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_refcountbt_ops,
 			mp->m_refc_maxlevels, xfs_refcountbt_cur_cache);
@@ -513,7 +513,7 @@ xfs_refcountbt_calc_reserves(
 	 * never be available for the kinds of things that would require btree
 	 * expansion.  We therefore can pretend the space isn't there.
 	 */
-	if (xfs_ag_contains_log(mp, pag->pag_agno))
+	if (xfs_ag_contains_log(mp, pag_agno(pag)))
 		agblocks -= mp->m_sb.sb_logblocks;
 
 	*ask += xfs_refcountbt_max_size(mp, agblocks);
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 22947e3c9ae310..0f7dee40bda87a 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -212,7 +212,7 @@ xfs_rmap_check_irec(
 	struct xfs_perag		*pag,
 	const struct xfs_rmap_irec	*irec)
 {
-	struct xfs_mount		*mp = pag->pag_mount;
+	struct xfs_mount		*mp = pag_mount(pag);
 	bool				is_inode;
 	bool				is_unwritten;
 	bool				is_bmbt;
@@ -287,7 +287,7 @@ xfs_rmap_complain_bad_rec(
 	else
 		xfs_warn(mp,
  "Reverse Mapping BTree record corruption in AG %d detected at %pS!",
-			cur->bc_ag.pag->pag_agno, fa);
+			pag_agno(cur->bc_ag.pag), fa);
 	xfs_warn(mp,
 		"Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x",
 		irec->rm_owner, irec->rm_flags, irec->rm_startblock,
diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
index c261d6eae3bc3b..ebb2519cf8baf3 100644
--- a/libxfs/xfs_rmap_btree.c
+++ b/libxfs/xfs_rmap_btree.c
@@ -226,7 +226,7 @@ xfs_rmapbt_init_ptr_from_cur(
 {
 	struct xfs_agf		*agf = cur->bc_ag.agbp->b_addr;
 
-	ASSERT(cur->bc_ag.pag->pag_agno == be32_to_cpu(agf->agf_seqno));
+	ASSERT(pag_agno(cur->bc_ag.pag) == be32_to_cpu(agf->agf_seqno));
 
 	ptr->s = agf->agf_rmap_root;
 }
@@ -646,9 +646,8 @@ xfs_rmapbt_mem_cursor(
 	struct xfbtree		*xfbt)
 {
 	struct xfs_btree_cur	*cur;
-	struct xfs_mount	*mp = pag->pag_mount;
 
-	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_rmapbt_mem_ops,
+	cur = xfs_btree_alloc_cursor(pag_mount(pag), tp, &xfs_rmapbt_mem_ops,
 			xfs_rmapbt_maxlevels_ondisk(), xfs_rmapbt_cur_cache);
 	cur->bc_mem.xfbtree = xfbt;
 	cur->bc_nlevels = xfbt->nlevels;
@@ -862,7 +861,7 @@ xfs_rmapbt_calc_reserves(
 	 * never be available for the kinds of things that would require btree
 	 * expansion.  We therefore can pretend the space isn't there.
 	 */
-	if (xfs_ag_contains_log(mp, pag->pag_agno))
+	if (xfs_ag_contains_log(mp, pag_agno(pag)))
 		agblocks -= mp->m_sb.sb_logblocks;
 
 	/* Reserve 1% of the AG or enough for 1 block per record. */
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 0d98b8a344209e..f534ae5d4c4db2 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1131,7 +1131,7 @@ xfs_update_secondary_sbs(
 		struct xfs_buf		*bp;
 
 		error = xfs_buf_get(mp->m_ddev_targp,
-				 XFS_AG_DADDR(mp, pag->pag_agno, XFS_SB_DADDR),
+				 XFS_AG_DADDR(mp, pag_agno(pag), XFS_SB_DADDR),
 				 XFS_FSS_TO_BB(mp, 1), &bp);
 		/*
 		 * If we get an error reading or writing alternate superblocks,
@@ -1143,7 +1143,7 @@ xfs_update_secondary_sbs(
 		if (error) {
 			xfs_warn(mp,
 		"error allocating secondary superblock for ag %d",
-				pag->pag_agno);
+				pag_agno(pag));
 			if (!saved_error)
 				saved_error = error;
 			continue;
@@ -1164,7 +1164,7 @@ xfs_update_secondary_sbs(
 		if (error) {
 			xfs_warn(mp,
 		"write error %d updating a secondary superblock near ag %d",
-				error, pag->pag_agno);
+				error, pag_agno(pag));
 			if (!saved_error)
 				saved_error = error;
 			continue;
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index a8cd44d03ef648..d3cb6ff3b91301 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -212,6 +212,14 @@ enum xbtree_recpacking {
 	XBTREE_RECPACKING_FULL,
 };
 
+enum xfs_group_type {
+	XG_TYPE_AG,
+	XG_TYPE_MAX,
+} __packed;
+
+#define XG_TYPE_STRINGS \
+	{ XG_TYPE_AG,	"ag" }
+
 /*
  * Type verifier functions
  */
diff --git a/repair/agbtree.c b/repair/agbtree.c
index c8f75f49e6b363..67baa1acba9907 100644
--- a/repair/agbtree.c
+++ b/repair/agbtree.c
@@ -166,8 +166,7 @@ finish_rebuild(
 		if (resv->used == resv->len)
 			continue;
 
-		fsbno = XFS_AGB_TO_FSB(mp, resv->pag->pag_agno,
-				resv->agbno + resv->used);
+		fsbno = xfs_agbno_to_fsb(resv->pag, resv->agbno + resv->used);
 		error = bitmap_set(lost_blocks, fsbno, resv->len - resv->used);
 		if (error)
 			do_error(
@@ -203,7 +202,7 @@ get_bno_rec(
 	struct xfs_btree_cur	*cur,
 	struct extent_tree_node	*prev_value)
 {
-	xfs_agnumber_t		agno = cur->bc_ag.pag->pag_agno;
+	xfs_agnumber_t		agno = pag_agno(cur->bc_ag.pag);
 
 	if (xfs_btree_is_bno(cur->bc_ops)) {
 		if (!prev_value)
@@ -254,7 +253,7 @@ init_freespace_cursors(
 	struct bt_rebuild	*btr_bno,
 	struct bt_rebuild	*btr_cnt)
 {
-	xfs_agnumber_t		agno = pag->pag_agno;
+	xfs_agnumber_t		agno = pag_agno(pag);
 	unsigned int		agfl_goal;
 	int			error;
 
@@ -377,7 +376,7 @@ get_ino_rec(
 	struct xfs_btree_cur	*cur,
 	struct ino_tree_node	*prev_value)
 {
-	xfs_agnumber_t		agno = cur->bc_ag.pag->pag_agno;
+	xfs_agnumber_t		agno = pag_agno(cur->bc_ag.pag);
 
 	if (xfs_btree_is_ino(cur->bc_ops)) {
 		if (!prev_value)
@@ -482,7 +481,7 @@ init_ino_cursors(
 	struct bt_rebuild	*btr_fino)
 {
 	struct ino_tree_node	*ino_rec;
-	xfs_agnumber_t		agno = pag->pag_agno;
+	xfs_agnumber_t		agno = pag_agno(pag);
 	unsigned int		ino_recs = 0;
 	unsigned int		fino_recs = 0;
 	bool			finobt;
@@ -615,7 +614,7 @@ get_rmapbt_records(
 		if (ret == 0)
 			do_error(
  _("ran out of records while rebuilding AG %u rmap btree\n"),
-					cur->bc_ag.pag->pag_agno);
+					pag_agno(cur->bc_ag.pag));
 
 		block_rec = libxfs_btree_rec_addr(cur, idx, block);
 		cur->bc_ops->init_rec_from_cur(cur, block_rec);
@@ -632,7 +631,7 @@ init_rmapbt_cursor(
 	unsigned int		est_agfreeblocks,
 	struct bt_rebuild	*btr)
 {
-	xfs_agnumber_t		agno = pag->pag_agno;
+	xfs_agnumber_t		agno = pag_agno(pag);
 	int			error;
 
 	if (!xfs_has_rmapbt(sc->mp))
@@ -715,7 +714,7 @@ init_refc_cursor(
 	unsigned int		est_agfreeblocks,
 	struct bt_rebuild	*btr)
 {
-	xfs_agnumber_t		agno = pag->pag_agno;
+	xfs_agnumber_t		agno = pag_agno(pag);
 	int			error;
 
 	if (!xfs_has_reflink(sc->mp))
@@ -769,7 +768,7 @@ estimate_allocbt_blocks(
 	unsigned int		nr_extents)
 {
 	/* Account for space consumed by both free space btrees */
-	return libxfs_allocbt_calc_size(pag->pag_mount, nr_extents) * 2;
+	return libxfs_allocbt_calc_size(pag_mount(pag), nr_extents) * 2;
 }
 
 static xfs_extlen_t
@@ -777,7 +776,7 @@ estimate_inobt_blocks(
 	struct xfs_perag	*pag)
 {
 	struct ino_tree_node	*ino_rec;
-	xfs_agnumber_t		agno = pag->pag_agno;
+	xfs_agnumber_t		agno = pag_agno(pag);
 	unsigned int		ino_recs = 0;
 	unsigned int		fino_recs = 0;
 	xfs_extlen_t		ret;
@@ -807,9 +806,9 @@ estimate_inobt_blocks(
 			fino_recs++;
 	}
 
-	ret = libxfs_iallocbt_calc_size(pag->pag_mount, ino_recs);
-	if (xfs_has_finobt(pag->pag_mount))
-		ret += libxfs_iallocbt_calc_size(pag->pag_mount, fino_recs);
+	ret = libxfs_iallocbt_calc_size(pag_mount(pag), ino_recs);
+	if (xfs_has_finobt(pag_mount(pag)))
+		ret += libxfs_iallocbt_calc_size(pag_mount(pag), fino_recs);
 	return ret;
 
 }
diff --git a/repair/bmap_repair.c b/repair/bmap_repair.c
index b341caf627d5fd..7ccbb96fa8dc5e 100644
--- a/repair/bmap_repair.c
+++ b/repair/bmap_repair.c
@@ -129,7 +129,6 @@ xrep_bmap_walk_rmap(
 	void				*priv)
 {
 	struct xrep_bmap		*rb = priv;
-	struct xfs_mount		*mp = cur->bc_mp;
 	xfs_fsblock_t			fsbno;
 	int				error;
 
@@ -155,8 +154,7 @@ xrep_bmap_walk_rmap(
 	    !(rec->rm_flags & XFS_RMAP_ATTR_FORK))
 		return 0;
 
-	fsbno = XFS_AGB_TO_FSB(mp, cur->bc_ag.pag->pag_agno,
-			rec->rm_startblock);
+	fsbno = xfs_agbno_to_fsb(cur->bc_ag.pag, rec->rm_startblock);
 
 	if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) {
 		rb->old_bmbt_block_count += rec->rm_blockcount;
diff --git a/repair/bulkload.c b/repair/bulkload.c
index c96e569ef2979e..aada5bbae579f8 100644
--- a/repair/bulkload.c
+++ b/repair/bulkload.c
@@ -74,11 +74,10 @@ bulkload_add_extent(
 	xfs_agblock_t		agbno,
 	xfs_extlen_t		len)
 {
-	struct xfs_mount	*mp = bkl->sc->mp;
 	struct xfs_alloc_arg	args = {
 		.tp		= NULL, /* no autoreap */
 		.oinfo		= bkl->oinfo,
-		.fsbno		= XFS_AGB_TO_FSB(mp, pag->pag_agno, agbno),
+		.fsbno		= xfs_agbno_to_fsb(pag, agbno),
 		.len		= len,
 		.resv		= XFS_AG_RESV_NONE,
 	};
@@ -194,7 +193,7 @@ bulkload_free_extent(
 	 * Use EFIs to free the reservations.  We don't need to use EFIs here
 	 * like the kernel, but we'll do it to keep the code matched.
 	 */
-	fsbno = XFS_AGB_TO_FSB(sc->mp, resv->pag->pag_agno, free_agbno);
+	fsbno = xfs_agbno_to_fsb(resv->pag, free_agbno);
 	error = -libxfs_free_extent_later(sc->tp, fsbno, free_aglen,
 			&bkl->oinfo, XFS_AG_RESV_NONE,
 			XFS_FREE_EXTENT_SKIP_DISCARD);
@@ -290,7 +289,6 @@ bulkload_claim_block(
 	union xfs_btree_ptr	*ptr)
 {
 	struct bulkload_resv	*resv;
-	struct xfs_mount	*mp = cur->bc_mp;
 	xfs_agblock_t		agbno;
 
 	/*
@@ -316,8 +314,7 @@ bulkload_claim_block(
 		list_move_tail(&resv->list, &bkl->resv_list);
 
 	if (cur->bc_ops->ptr_len == XFS_BTREE_LONG_PTR_LEN)
-		ptr->l = cpu_to_be64(XFS_AGB_TO_FSB(mp, resv->pag->pag_agno,
-								agbno));
+		ptr->l = cpu_to_be64(xfs_agbno_to_fsb(resv->pag, agbno));
 	else
 		ptr->s = cpu_to_be32(agbno);
 	return 0;
diff --git a/repair/phase2.c b/repair/phase2.c
index e50cd3f8c2759d..42a2861dcc3714 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -304,13 +304,13 @@ check_fs_free_space(
 		if (error)
 			do_error(
 	_("Cannot read AGI %u for upgrade check, err=%d.\n"),
-					pag->pag_agno, error);
+					pag_agno(pag), error);
 
 		error = -libxfs_alloc_read_agf(pag, tp, 0, &agf_bp);
 		if (error)
 			do_error(
 	_("Cannot read AGF %u for upgrade check, err=%d.\n"),
-					pag->pag_agno, error);
+					pag_agno(pag), error);
 		agf = agf_bp->b_addr;
 		agblocks = be32_to_cpu(agf->agf_length);
 
@@ -326,13 +326,13 @@ check_fs_free_space(
 		if (error == ENOSPC) {
 			printf(
 	_("Not enough free space would remain in AG %u for metadata.\n"),
-					pag->pag_agno);
+					pag_agno(pag));
 			exit(1);
 		}
 		if (error)
 			do_error(
 	_("Error %d while checking AG %u space reservation.\n"),
-					error, pag->pag_agno);
+					error, pag_agno(pag));
 
 		/*
 		 * Would the post-upgrade filesystem have enough free space in
@@ -345,7 +345,7 @@ check_fs_free_space(
 		if (!check_free_space(mp, avail, agblocks)) {
 			printf(
 	_("AG %u will be low on space after upgrade.\n"),
-					pag->pag_agno);
+					pag_agno(pag));
 			exit(1);
 		}
 		libxfs_trans_cancel(tp);
diff --git a/repair/phase5.c b/repair/phase5.c
index 9207da7172c05b..fdbd9f56998fb4 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -441,7 +441,7 @@ phase5_func(
 	struct bt_rebuild	btr_fino;
 	struct bt_rebuild	btr_rmap;
 	struct bt_rebuild	btr_refc;
-	xfs_agnumber_t		agno = pag->pag_agno;
+	xfs_agnumber_t		agno = pag_agno(pag);
 	int			extra_blocks = 0;
 	uint			num_freeblocks;
 	xfs_agblock_t		num_extents;
diff --git a/repair/rmap.c b/repair/rmap.c
index 553c7a6c3658a0..29af74eee11831 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -1694,7 +1694,7 @@ xfs_extlen_t
 estimate_rmapbt_blocks(
 	struct xfs_perag	*pag)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_ag_rmap	*x;
 	unsigned long long	nr_recs = 0;
 
@@ -1707,12 +1707,12 @@ estimate_rmapbt_blocks(
 	 * means we can use SEEK_DATA/HOLE on the xfile, which is faster than
 	 * walking the entire btree to count records.
 	 */
-	x = &ag_rmaps[pag->pag_agno];
+	x = &ag_rmaps[pag_agno(pag)];
 	if (!rmaps_has_observations(x))
 		return 0;
 
 	nr_recs = xmbuf_bytes(x->ar_xmbtp) / sizeof(struct xfs_rmap_rec);
-	return libxfs_rmapbt_calc_size(pag->pag_mount, nr_recs);
+	return libxfs_rmapbt_calc_size(pag_mount(pag), nr_recs);
 }
 
 /* Estimate the size of the ondisk refcountbt from the incore data. */
@@ -1720,13 +1720,13 @@ xfs_extlen_t
 estimate_refcountbt_blocks(
 	struct xfs_perag	*pag)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_ag_rmap	*x;
 
 	if (!rmap_needs_work(mp) || !xfs_has_reflink(mp))
 		return 0;
 
-	x = &ag_rmaps[pag->pag_agno];
+	x = &ag_rmaps[pag_agno(pag)];
 	if (!x->ar_refcount_items)
 		return 0;
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 16/36] xfs: add a xfs_group_next_range helper
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (14 preceding siblings ...)
  2024-12-23 21:41   ` [PATCH 15/36] xfs: factor out a generic xfs_group structure Darrick J. Wong
@ 2024-12-23 21:41   ` Darrick J. Wong
  2024-12-23 21:41   ` [PATCH 17/36] xfs: switch perag iteration from the for_each macros to a while based iterator Darrick J. Wong
                     ` (19 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:41 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 819928770bd91960f88f5a4dfa21b35a1bade61b

Add a helper to iterate over all groups, which can be used as a simple
while loop:

	struct xfs_group	*xg = NULL;

	while ((xg = xfs_group_next_range(mp, xg, 0, MAX_GROUP))) {
		...
	}

This will be wrapped by the realtime group code first, and eventually
replace the for_each_rtgroup_from and for_each_rtgroup_range helpers.
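
For readers who want to see the reference handling in isolation, here is
a minimal standalone sketch of the same loop shape.  It is not the libxfs
code: struct mock_group, the groups[] table, mock_group_grab(),
mock_group_rele(), mock_group_next_range() and walk_groups() are all
hypothetical stand-ins, and a plain int replaces the atomic active
reference count.  It only illustrates that each step drops the previous
group's reference before grabbing the next one, and that a caller who
breaks out early has to drop the last reference itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_GROUPS	4

/* Hypothetical stand-in for struct xfs_group; not the libxfs definition. */
struct mock_group {
	uint32_t	gno;
	int		active_ref;	/* 0 means offline: grab fails */
};

/* Each group starts with one reference, as if owned by the mount. */
static struct mock_group groups[NR_GROUPS] = {
	{ 0, 1 }, { 1, 1 }, { 2, 1 }, { 3, 1 },
};

/* Take an active reference, or return NULL if absent or offline. */
static struct mock_group *
mock_group_grab(uint32_t index)
{
	if (index >= NR_GROUPS || groups[index].active_ref == 0)
		return NULL;
	groups[index].active_ref++;
	return &groups[index];
}

static void
mock_group_rele(struct mock_group *xg)
{
	xg->active_ref--;
}

/* Same shape as the helper in this patch: release previous, grab next. */
static struct mock_group *
mock_group_next_range(struct mock_group *xg, uint32_t start, uint32_t end)
{
	uint32_t	index = start;

	if (xg) {
		index = xg->gno + 1;
		mock_group_rele(xg);
	}
	if (index > end)
		return NULL;
	return mock_group_grab(index);
}

/* Visit groups in [start, end], stopping early at stop_at if it is hit. */
static int
walk_groups(uint32_t start, uint32_t end, uint32_t stop_at)
{
	struct mock_group	*xg = NULL;
	int			visited = 0;

	while ((xg = mock_group_next_range(xg, start, end))) {
		visited++;
		if (xg->gno == stop_at) {
			/* Early exit: the loop will not release it for us. */
			mock_group_rele(xg);
			break;
		}
	}
	return visited;
}
```

In this sketch a full walk leaves every count where it started, and so
does the early-exit path, but only because walk_groups() calls
mock_group_rele() before the break.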

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_group.c |   26 ++++++++++++++++++++++++++
 libxfs/xfs_group.h |    3 +++
 2 files changed, 29 insertions(+)


diff --git a/libxfs/xfs_group.c b/libxfs/xfs_group.c
index 8a67148362b0d7..04d65033b75eca 100644
--- a/libxfs/xfs_group.c
+++ b/libxfs/xfs_group.c
@@ -86,6 +86,32 @@ xfs_group_grab(
 	return xg;
 }
 
+/*
+ * Iterate to the next group.  To start the iteration at @start_index, a %NULL
+ * @xg is passed, else the previous group returned from this function.  The
+ * caller should break out of the loop when this returns %NULL.  If the caller
+ * wants to break out of a loop that did not finish it needs to release the
+ * active reference to @xg using xfs_group_rele() itself.
+ */
+struct xfs_group *
+xfs_group_next_range(
+	struct xfs_mount	*mp,
+	struct xfs_group	*xg,
+	uint32_t		start_index,
+	uint32_t		end_index,
+	enum xfs_group_type	type)
+{
+	uint32_t		index = start_index;
+
+	if (xg) {
+		index = xg->xg_gno + 1;
+		xfs_group_rele(xg);
+	}
+	if (index > end_index)
+		return NULL;
+	return xfs_group_grab(mp, index, type);
+}
+
 /*
  * Find the next group after @xg, or the first group if @xg is NULL.
  */
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index e3b6be7ff9e802..dd7da90443054b 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -20,6 +20,9 @@ void xfs_group_put(struct xfs_group *xg);
 
 struct xfs_group *xfs_group_grab(struct xfs_mount *mp, uint32_t index,
 		enum xfs_group_type type);
+struct xfs_group *xfs_group_next_range(struct xfs_mount *mp,
+		struct xfs_group *xg, uint32_t start_index, uint32_t end_index,
+		enum xfs_group_type type);
 struct xfs_group *xfs_group_grab_next_mark(struct xfs_mount *mp,
 		struct xfs_group *xg, xa_mark_t mark, enum xfs_group_type type);
 void xfs_group_rele(struct xfs_group *xg);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 17/36] xfs: switch perag iteration from the for_each macros to a while based iterator
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (15 preceding siblings ...)
  2024-12-23 21:41   ` [PATCH 16/36] xfs: add a xfs_group_next_range helper Darrick J. Wong
@ 2024-12-23 21:41   ` Darrick J. Wong
  2024-12-23 21:41   ` [PATCH 18/36] xfs: move metadata health tracking to the generic group structure Darrick J. Wong
                     ` (18 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:41 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 86437e6abbd2ef040f42ef190264819db6118415

The current for_each_perag* macros are a bit annoying in that they
require the caller to both provide an object and an index iterator, and
also somewhat obfuscate the underlying control flow mechanism.

Switch to open coded while loops using new xfs_perag_next{,_from,_range}
helpers that return the next pag structure to iterate on based on the
previous one or NULL for the loop start.
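
A minimal sketch of how the three helpers layer on top of each other, and of
what a converted caller looks like, is given here.  The demo_* names, the
fixed AG table and the nr_inodes field are assumptions made for this
illustration; reference counting is left out entirely so that only the
control flow is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock of a five-AG filesystem; not the libxfs structures. */
#define DEMO_AGCOUNT	5

struct demo_perag {
	unsigned int	agno;
	unsigned int	nr_inodes;
};

static struct demo_perag demo_pags[DEMO_AGCOUNT] = {
	{ 0, 64 }, { 1, 128 }, { 2, 64 }, { 3, 32 }, { 4, 16 },
};

/* Return the AG after @pag, or the AG at @start_agno if @pag is NULL. */
static struct demo_perag *
demo_perag_next_range(
	struct demo_perag	*pag,
	unsigned int		start_agno,
	unsigned int		end_agno)
{
	unsigned int		agno = pag ? pag->agno + 1 : start_agno;

	if (agno > end_agno || agno >= DEMO_AGCOUNT)
		return NULL;
	return &demo_pags[agno];
}

/* Iterate from @start_agno to the last AG. */
static struct demo_perag *
demo_perag_next_from(struct demo_perag *pag, unsigned int start_agno)
{
	return demo_perag_next_range(pag, start_agno, DEMO_AGCOUNT - 1);
}

/* Iterate over every AG. */
static struct demo_perag *
demo_perag_next(struct demo_perag *pag)
{
	return demo_perag_next_from(pag, 0);
}

/* Converted caller: one cursor variable, no separate index iterator. */
static unsigned int
demo_total_inodes(void)
{
	struct demo_perag	*pag = NULL;
	unsigned int		total = 0;

	while ((pag = demo_perag_next(pag)))
		total += pag->nr_inodes;
	return total;
}

/* Skip AG 0, as a walk over the secondary superblocks would. */
static unsigned int
demo_secondary_inodes(void)
{
	struct demo_perag	*pag = NULL;
	unsigned int		total = 0;

	while ((pag = demo_perag_next_from(pag, 1)))
		total += pag->nr_inodes;
	return total;
}
```

Callers that used the index variable inside the loop body read the AG number
from the object instead, which is what the xfs_update_secondary_sbs hunk
below does with pag_agno(pag).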

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/fsmap.c           |    4 ++-
 db/info.c            |    5 ++--
 db/iunlink.c         |    4 ++-
 libxfs/xfs_ag.h      |   62 +++++++++++++++++++++++---------------------------
 libxfs/xfs_sb.c      |   15 ++++--------
 libxfs/xfs_types.c   |    5 ++--
 repair/bmap_repair.c |    5 ++--
 repair/phase2.c      |    7 ++----
 repair/phase5.c      |    4 ++-
 9 files changed, 48 insertions(+), 63 deletions(-)


diff --git a/db/fsmap.c b/db/fsmap.c
index 923d7568b9d977..a9259c4632185b 100644
--- a/db/fsmap.c
+++ b/db/fsmap.c
@@ -46,7 +46,7 @@ fsmap(
 	struct xfs_rmap_irec	high = {0};
 	struct xfs_btree_cur	*bt_cur;
 	struct xfs_buf		*agbp;
-	struct xfs_perag	*pag;
+	struct xfs_perag	*pag = NULL;
 	int			error;
 
 	eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);
@@ -63,7 +63,7 @@ fsmap(
 	end_ag = XFS_FSB_TO_AGNO(mp, end_fsb);
 
 	info.nr = 0;
-	for_each_perag_range(mp, start_ag, end_ag, pag) {
+	while ((pag = xfs_perag_next_range(mp, pag, start_ag, end_ag))) {
 		if (pag_agno(pag) == end_ag)
 			high.rm_startblock = XFS_FSB_TO_AGBNO(mp, end_fsb);
 
diff --git a/db/info.c b/db/info.c
index 6a8765ec761a49..9a86d247839f84 100644
--- a/db/info.c
+++ b/db/info.c
@@ -104,8 +104,7 @@ agresv_f(
 	int			argc,
 	char			**argv)
 {
-	struct xfs_perag	*pag;
-	xfs_agnumber_t		agno;
+	struct xfs_perag	*pag = NULL;
 	int			i;
 
 	if (argc > 1) {
@@ -134,7 +133,7 @@ agresv_f(
 		return 0;
 	}
 
-	for_each_perag(mp, agno, pag)
+	while ((pag = xfs_perag_next(mp, pag)))
 		print_agresv_info(pag);
 
 	return 0;
diff --git a/db/iunlink.c b/db/iunlink.c
index c9977a859e2842..c73f818242b983 100644
--- a/db/iunlink.c
+++ b/db/iunlink.c
@@ -144,7 +144,7 @@ dump_iunlinked_f(
 	int			argc,
 	char			**argv)
 {
-	struct xfs_perag	*pag;
+	struct xfs_perag	*pag = NULL;
 	xfs_agnumber_t		agno = NULLAGNUMBER;
 	unsigned int		bucket = -1U;
 	bool			quiet = false;
@@ -189,7 +189,7 @@ dump_iunlinked_f(
 		return 0;
 	}
 
-	for_each_perag(mp, agno, pag)
+	while ((pag = xfs_perag_next(mp, pag)))
 		dump_unlinked(pag, bucket, quiet, verbose);
 
 	return 0;
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 69b934ad2c4aad..80969682dc4746 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -208,6 +208,34 @@ xfs_perag_rele(
 	xfs_group_rele(pag_group(pag));
 }
 
+static inline struct xfs_perag *
+xfs_perag_next_range(
+	struct xfs_mount	*mp,
+	struct xfs_perag	*pag,
+	xfs_agnumber_t		start_agno,
+	xfs_agnumber_t		end_agno)
+{
+	return to_perag(xfs_group_next_range(mp, pag ? pag_group(pag) : NULL,
+			start_agno, end_agno, XG_TYPE_AG));
+}
+
+static inline struct xfs_perag *
+xfs_perag_next_from(
+	struct xfs_mount	*mp,
+	struct xfs_perag	*pag,
+	xfs_agnumber_t		start_agno)
+{
+	return xfs_perag_next_range(mp, pag, start_agno, mp->m_sb.sb_agcount - 1);
+}
+
+static inline struct xfs_perag *
+xfs_perag_next(
+	struct xfs_mount	*mp,
+	struct xfs_perag	*pag)
+{
+	return xfs_perag_next_from(mp, pag, 0);
+}
+
 /*
  * Per-ag geometry infomation and validation
  */
@@ -273,40 +301,6 @@ xfs_ag_contains_log(struct xfs_mount *mp, xfs_agnumber_t agno)
 	       agno == XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart);
 }
 
-/*
- * Perag iteration APIs
- */
-static inline struct xfs_perag *
-xfs_perag_next(
-	struct xfs_perag	*pag,
-	xfs_agnumber_t		*agno,
-	xfs_agnumber_t		end_agno)
-{
-	struct xfs_mount	*mp = pag_mount(pag);
-
-	*agno = pag_agno(pag) + 1;
-	xfs_perag_rele(pag);
-	while (*agno <= end_agno) {
-		pag = xfs_perag_grab(mp, *agno);
-		if (pag)
-			return pag;
-		(*agno)++;
-	}
-	return NULL;
-}
-
-#define for_each_perag_range(mp, agno, end_agno, pag) \
-	for ((pag) = xfs_perag_grab((mp), (agno)); \
-		(pag) != NULL; \
-		(pag) = xfs_perag_next((pag), &(agno), (end_agno)))
-
-#define for_each_perag_from(mp, agno, pag) \
-	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
-
-#define for_each_perag(mp, agno, pag) \
-	(agno) = 0; \
-	for_each_perag_from((mp), (agno), (pag))
-
 static inline struct xfs_perag *
 xfs_perag_next_wrap(
 	struct xfs_perag	*pag,
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index f534ae5d4c4db2..d32f789037389f 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1120,14 +1120,13 @@ int
 xfs_update_secondary_sbs(
 	struct xfs_mount	*mp)
 {
-	struct xfs_perag	*pag;
-	xfs_agnumber_t		agno = 1;
+	struct xfs_perag	*pag = NULL;
 	int			saved_error = 0;
 	int			error = 0;
 	LIST_HEAD		(buffer_list);
 
 	/* update secondary superblocks. */
-	for_each_perag_from(mp, agno, pag) {
+	while ((pag = xfs_perag_next_from(mp, pag, 1))) {
 		struct xfs_buf		*bp;
 
 		error = xfs_buf_get(mp->m_ddev_targp,
@@ -1157,7 +1156,7 @@ xfs_update_secondary_sbs(
 		xfs_buf_relse(bp);
 
 		/* don't hold too many buffers at once */
-		if (agno % 16)
+		if (pag_agno(pag) % 16)
 			continue;
 
 		error = xfs_buf_delwri_submit(&buffer_list);
@@ -1171,12 +1170,8 @@ xfs_update_secondary_sbs(
 		}
 	}
 	error = xfs_buf_delwri_submit(&buffer_list);
-	if (error) {
-		xfs_warn(mp,
-		"write error %d updating a secondary superblock near ag %d",
-			error, agno);
-	}
-
+	if (error)
+		xfs_warn(mp, "error %d writing secondary superblocks", error);
 	return saved_error ? saved_error : error;
 }
 
diff --git a/libxfs/xfs_types.c b/libxfs/xfs_types.c
index 74ab1965a8f49c..0d1b86ae59d93e 100644
--- a/libxfs/xfs_types.c
+++ b/libxfs/xfs_types.c
@@ -170,13 +170,12 @@ xfs_icount_range(
 	unsigned long long	*max)
 {
 	unsigned long long	nr_inos = 0;
-	struct xfs_perag	*pag;
-	xfs_agnumber_t		agno;
+	struct xfs_perag	*pag = NULL;
 
 	/* root, rtbitmap, rtsum all live in the first chunk */
 	*min = XFS_INODES_PER_CHUNK;
 
-	for_each_perag(mp, agno, pag)
+	while ((pag = xfs_perag_next(mp, pag)))
 		nr_inos += pag->agino_max - pag->agino_min + 1;
 	*max = nr_inos;
 }
diff --git a/repair/bmap_repair.c b/repair/bmap_repair.c
index 7ccbb96fa8dc5e..3a214c85a1de5f 100644
--- a/repair/bmap_repair.c
+++ b/repair/bmap_repair.c
@@ -218,12 +218,11 @@ STATIC int
 xrep_bmap_find_mappings(
 	struct xrep_bmap	*rb)
 {
-	struct xfs_perag	*pag;
-	xfs_agnumber_t		agno;
+	struct xfs_perag	*pag = NULL;
 	int			error;
 
 	/* Iterate the rmaps for extents. */
-	for_each_perag(rb->sc->mp, agno, pag) {
+	while ((pag = xfs_perag_next(rb->sc->mp, pag))) {
 		error = xrep_bmap_scan_ag(rb, pag);
 		if (error) {
 			libxfs_perag_put(pag);
diff --git a/repair/phase2.c b/repair/phase2.c
index 42a2861dcc3714..17966bb54db09d 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -274,12 +274,11 @@ check_fs_free_space(
 	const struct check_state	*old,
 	struct xfs_sb			*new_sb)
 {
-	struct xfs_perag		*pag;
-	xfs_agnumber_t			agno;
+	struct xfs_perag		*pag = NULL;
 	int				error;
 
 	/* Make sure we have enough space for per-AG reservations. */
-	for_each_perag(mp, agno, pag) {
+	while ((pag = xfs_perag_next(mp, pag))) {
 		struct xfs_trans	*tp;
 		struct xfs_agf		*agf;
 		struct xfs_buf		*agi_bp, *agf_bp;
@@ -365,7 +364,7 @@ check_fs_free_space(
 	 * uninitialized so that we don't trip over stale cached counters
 	 * after the upgrade/
 	 */
-	for_each_perag(mp, agno, pag) {
+	while ((pag = xfs_perag_next(mp, pag))) {
 		libxfs_ag_resv_free(pag);
 		clear_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
 		clear_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate);
diff --git a/repair/phase5.c b/repair/phase5.c
index fdbd9f56998fb4..86b1f681a72bb8 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -632,7 +632,7 @@ void
 phase5(xfs_mount_t *mp)
 {
 	struct bitmap		*lost_blocks = NULL;
-	struct xfs_perag	*pag;
+	struct xfs_perag	*pag = NULL;
 	xfs_agnumber_t		agno;
 	int			error;
 
@@ -679,7 +679,7 @@ phase5(xfs_mount_t *mp)
 	if (error)
 		do_error(_("cannot alloc lost block bitmap\n"));
 
-	for_each_perag(mp, agno, pag)
+	while ((pag = xfs_perag_next(mp, pag)))
 		phase5_func(mp, pag, lost_blocks);
 
 	print_final_rpt();


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 18/36] xfs: move metadata health tracking to the generic group structure
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (16 preceding siblings ...)
  2024-12-23 21:41   ` [PATCH 17/36] xfs: switch perag iteration from the for_each macros to a while based iterator Darrick J. Wong
@ 2024-12-23 21:41   ` Darrick J. Wong
  2024-12-23 21:42   ` [PATCH 19/36] xfs: move draining of deferred operations " Darrick J. Wong
                     ` (17 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:41 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 5c8483cec3fe261a5c1ede7430bab042ed156361

Prepare for also tracking the health status of the upcoming realtime
groups by moving the health tracking code to the generic xfs_group
structure.
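
The pattern can be sketched in isolation as follows.  All demo_* names and
the DEMO_SICK_* bits are hypothetical, the health fields are plain unsigned
ints, and the state lock is omitted; the sketch only shows per-AG and
per-rtgroup objects embedding a generic group, with the AG helpers reduced to
thin macros over the generic ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical health bits, chosen only for this illustration. */
#define DEMO_SICK_AGF	(1U << 0)
#define DEMO_SICK_AGI	(1U << 1)

/* Generic part shared by every kind of group. */
struct demo_group {
	unsigned int	gno;
	unsigned int	checked;	/* metadata that has been examined */
	unsigned int	sick;		/* metadata found to be bad */
};

/* Type-specific objects embed the generic group. */
struct demo_perag {
	struct demo_group	group;
	unsigned int		ag_private;
};

struct demo_rtgroup {
	struct demo_group	group;
	unsigned int		rt_private;
};

static struct demo_group *
demo_pag_group(struct demo_perag *pag)
{
	return &pag->group;
}

static struct demo_group *
demo_rtg_group(struct demo_rtgroup *rtg)
{
	return &rtg->group;
}

/* Health code is written once, against the generic structure. */
static void
demo_group_mark_sick(struct demo_group *xg, unsigned int mask)
{
	xg->sick |= mask;
}

static void
demo_group_mark_healthy(struct demo_group *xg, unsigned int mask)
{
	xg->sick &= ~mask;
	xg->checked |= mask;
}

static bool
demo_group_has_sickness(struct demo_group *xg, unsigned int mask)
{
	return (xg->sick & mask) != 0;
}

/* Per-AG names survive as wrappers, so existing callers need no changes. */
#define demo_ag_mark_sick(pag, mask) \
	demo_group_mark_sick(demo_pag_group(pag), (mask))
#define demo_ag_has_sickness(pag, mask) \
	demo_group_has_sickness(demo_pag_group(pag), (mask))

/* Mark two AG structures sick, repair one, and report what is left. */
static unsigned int
demo_ag_roundtrip(void)
{
	struct demo_perag	pag = { .group = { .gno = 0 } };

	demo_ag_mark_sick(&pag, DEMO_SICK_AGF | DEMO_SICK_AGI);
	demo_group_mark_healthy(demo_pag_group(&pag), DEMO_SICK_AGF);
	return pag.group.sick;
}

/* The same generic helpers work unchanged on a realtime group. */
static bool
demo_rtg_roundtrip(void)
{
	struct demo_rtgroup	rtg = { .group = { .gno = 0 } };

	demo_group_mark_sick(demo_rtg_group(&rtg), DEMO_SICK_AGF);
	return demo_group_has_sickness(demo_rtg_group(&rtg), DEMO_SICK_AGF);
}
```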

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/util.c       |    4 ++--
 libxfs/xfs_ag.c     |    1 -
 libxfs/xfs_ag.h     |    9 ---------
 libxfs/xfs_group.c  |    4 ++++
 libxfs/xfs_group.h  |   12 ++++++++++++
 libxfs/xfs_health.h |   45 +++++++++++++++++----------------------------
 6 files changed, 35 insertions(+), 40 deletions(-)


diff --git a/libxfs/util.c b/libxfs/util.c
index a3f3ad299336c7..97da94506aee01 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -449,8 +449,8 @@ void xfs_ag_geom_health(struct xfs_perag *pag, struct xfs_ag_geometry *ageo) { }
 void xfs_fs_mark_sick(struct xfs_mount *mp, unsigned int mask) { }
 void xfs_agno_mark_sick(struct xfs_mount *mp, xfs_agnumber_t agno,
 		unsigned int mask) { }
-void xfs_ag_mark_sick(struct xfs_perag *pag, unsigned int mask) { }
-void xfs_ag_measure_sickness(struct xfs_perag *pag, unsigned int *sick,
+void xfs_group_mark_sick(struct xfs_group *xg, unsigned int mask) { }
+void xfs_group_measure_sickness(struct xfs_group *xg, unsigned int *sick,
 		unsigned int *checked)
 {
 	*sick = 0;
diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index 15d4ac5a99f0e7..20af8b67d86e88 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -230,7 +230,6 @@ xfs_perag_alloc(
 	/* Place kernel structure only init below this point. */
 	spin_lock_init(&pag->pag_ici_lock);
 	spin_lock_init(&pag->pagb_lock);
-	spin_lock_init(&pag->pag_state_lock);
 	INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker);
 	INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
 	xfs_defer_drain_init(&pag->pag_intents_drain);
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 80969682dc4746..8271cb72c88387 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -69,13 +69,6 @@ struct xfs_perag {
 #ifdef __KERNEL__
 	/* -- kernel only structures below this line -- */
 
-	/*
-	 * Bitsets of per-ag metadata that have been checked and/or are sick.
-	 * Callers should hold pag_state_lock before accessing this field.
-	 */
-	uint16_t	pag_checked;
-	uint16_t	pag_sick;
-
 #ifdef CONFIG_XFS_ONLINE_REPAIR
 	/*
 	 * Alternate btree heights so that online repair won't trip the write
@@ -87,8 +80,6 @@ struct xfs_perag {
 	uint8_t		pagf_repair_rmap_level;
 #endif
 
-	spinlock_t	pag_state_lock;
-
 	spinlock_t	pagb_lock;	/* lock for pagb_tree */
 	struct rb_root	pagb_tree;	/* ordered tree of busy extents */
 	unsigned int	pagb_gen;	/* generation count for pagb_tree */
diff --git a/libxfs/xfs_group.c b/libxfs/xfs_group.c
index 04d65033b75eca..c5269cd659f327 100644
--- a/libxfs/xfs_group.c
+++ b/libxfs/xfs_group.c
@@ -181,6 +181,10 @@ xfs_group_insert(
 	xg->xg_gno = index;
 	xg->xg_type = type;
 
+#ifdef __KERNEL__
+	spin_lock_init(&xg->xg_state_lock);
+#endif
+
 	/* Active ref owned by mount indicates group is online. */
 	atomic_set(&xg->xg_active_ref, 1);
 
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index dd7da90443054b..d2c61dd1f43e44 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -11,6 +11,18 @@ struct xfs_group {
 	enum xfs_group_type	xg_type;
 	atomic_t		xg_ref;		/* passive reference count */
 	atomic_t		xg_active_ref;	/* active reference count */
+
+#ifdef __KERNEL__
+	/* -- kernel only structures below this line -- */
+
+	/*
+	 * Bitsets of per-ag metadata that have been checked and/or are sick.
+	 * Callers should hold xg_state_lock before accessing this field.
+	 */
+	uint16_t		xg_checked;
+	uint16_t		xg_sick;
+	spinlock_t		xg_state_lock;
+#endif /* __KERNEL__ */
 };
 
 struct xfs_group *xfs_group_get(struct xfs_mount *mp, uint32_t index,
diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h
index b0edb4288e5929..13301420a2f670 100644
--- a/libxfs/xfs_health.h
+++ b/libxfs/xfs_health.h
@@ -6,6 +6,8 @@
 #ifndef __XFS_HEALTH_H__
 #define __XFS_HEALTH_H__
 
+struct xfs_group;
+
 /*
  * In-Core Filesystem Health Assessments
  * =====================================
@@ -197,10 +199,12 @@ void xfs_rt_measure_sickness(struct xfs_mount *mp, unsigned int *sick,
 
 void xfs_agno_mark_sick(struct xfs_mount *mp, xfs_agnumber_t agno,
 		unsigned int mask);
-void xfs_ag_mark_sick(struct xfs_perag *pag, unsigned int mask);
-void xfs_ag_mark_corrupt(struct xfs_perag *pag, unsigned int mask);
-void xfs_ag_mark_healthy(struct xfs_perag *pag, unsigned int mask);
-void xfs_ag_measure_sickness(struct xfs_perag *pag, unsigned int *sick,
+void xfs_group_mark_sick(struct xfs_group *xg, unsigned int mask);
+#define xfs_ag_mark_sick(pag, mask) \
+	xfs_group_mark_sick(pag_group(pag), (mask))
+void xfs_group_mark_corrupt(struct xfs_group *xg, unsigned int mask);
+void xfs_group_mark_healthy(struct xfs_group *xg, unsigned int mask);
+void xfs_group_measure_sickness(struct xfs_group *xg, unsigned int *sick,
 		unsigned int *checked);
 
 void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
@@ -227,22 +231,19 @@ xfs_fs_has_sickness(struct xfs_mount *mp, unsigned int mask)
 }
 
 static inline bool
-xfs_rt_has_sickness(struct xfs_mount *mp, unsigned int mask)
+xfs_group_has_sickness(
+	struct xfs_group	*xg,
+	unsigned int		mask)
 {
-	unsigned int	sick, checked;
+	unsigned int		sick, checked;
 
-	xfs_rt_measure_sickness(mp, &sick, &checked);
-	return sick & mask;
-}
-
-static inline bool
-xfs_ag_has_sickness(struct xfs_perag *pag, unsigned int mask)
-{
-	unsigned int	sick, checked;
-
-	xfs_ag_measure_sickness(pag, &sick, &checked);
+	xfs_group_measure_sickness(xg, &sick, &checked);
 	return sick & mask;
 }
+#define xfs_ag_has_sickness(pag, mask) \
+	xfs_group_has_sickness(pag_group(pag), (mask))
+#define xfs_ag_is_healthy(pag) \
+	(!xfs_ag_has_sickness((pag), UINT_MAX))
 
 static inline bool
 xfs_inode_has_sickness(struct xfs_inode *ip, unsigned int mask)
@@ -259,18 +260,6 @@ xfs_fs_is_healthy(struct xfs_mount *mp)
 	return !xfs_fs_has_sickness(mp, -1U);
 }
 
-static inline bool
-xfs_rt_is_healthy(struct xfs_mount *mp)
-{
-	return !xfs_rt_has_sickness(mp, -1U);
-}
-
-static inline bool
-xfs_ag_is_healthy(struct xfs_perag *pag)
-{
-	return !xfs_ag_has_sickness(pag, -1U);
-}
-
 static inline bool
 xfs_inode_is_healthy(struct xfs_inode *ip)
 {


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 19/36] xfs: move draining of deferred operations to the generic group structure
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (17 preceding siblings ...)
  2024-12-23 21:41   ` [PATCH 18/36] xfs: move metadata health tracking to the generic group structure Darrick J. Wong
@ 2024-12-23 21:42   ` Darrick J. Wong
  2024-12-23 21:42   ` [PATCH 20/36] xfs: move the online repair rmap hooks " Darrick J. Wong
                     ` (16 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:42 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 34cf3a6f3952ecabd54b4fe3d431aa44ce98fe45

Prepare for supporting the upcoming realtime groups feature by moving the
deferred operation draining to the generic xfs_group structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag.c    |    7 ++-----
 libxfs/xfs_ag.h    |    9 ---------
 libxfs/xfs_group.c |    4 ++++
 libxfs/xfs_group.h |    9 +++++++++
 4 files changed, 15 insertions(+), 14 deletions(-)


diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index 20af8b67d86e88..d67e40f49a3fc0 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -110,7 +110,6 @@ xfs_perag_uninit(
 #ifdef __KERNEL__
 	struct xfs_perag	*pag = to_perag(xg);
 
-	xfs_defer_drain_free(&pag->pag_intents_drain);
 	cancel_delayed_work_sync(&pag->pag_blockgc_work);
 	xfs_buf_cache_destroy(&pag->pag_bcache);
 #endif
@@ -232,7 +231,6 @@ xfs_perag_alloc(
 	spin_lock_init(&pag->pagb_lock);
 	INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker);
 	INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
-	xfs_defer_drain_init(&pag->pag_intents_drain);
 	init_waitqueue_head(&pag->pagb_wait);
 	pag->pagb_tree = RB_ROOT;
 	xfs_hooks_init(&pag->pag_rmap_update_hooks);
@@ -240,7 +238,7 @@ xfs_perag_alloc(
 
 	error = xfs_buf_cache_init(&pag->pag_bcache);
 	if (error)
-		goto out_defer_drain_free;
+		goto out_free_perag;
 
 	/*
 	 * Pre-calculated geometry
@@ -258,8 +256,7 @@ xfs_perag_alloc(
 
 out_buf_cache_destroy:
 	xfs_buf_cache_destroy(&pag->pag_bcache);
-out_defer_drain_free:
-	xfs_defer_drain_free(&pag->pag_intents_drain);
+out_free_perag:
 	kfree(pag);
 	return error;
 }
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 8271cb72c88387..45f8de06cdbc8a 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -97,15 +97,6 @@ struct xfs_perag {
 	/* background prealloc block trimming */
 	struct delayed_work	pag_blockgc_work;
 
-	/*
-	 * We use xfs_drain to track the number of deferred log intent items
-	 * that have been queued (but not yet processed) so that waiters (e.g.
-	 * scrub) will not lock resources when other threads are in the middle
-	 * of processing a chain of intent items only to find momentary
-	 * inconsistencies.
-	 */
-	struct xfs_defer_drain	pag_intents_drain;
-
 	/* Hook to feed rmapbt updates to an active online repair. */
 	struct xfs_hooks	pag_rmap_update_hooks;
 #endif /* __KERNEL__ */
diff --git a/libxfs/xfs_group.c b/libxfs/xfs_group.c
index c5269cd659f327..dfcebf2e9b30f8 100644
--- a/libxfs/xfs_group.c
+++ b/libxfs/xfs_group.c
@@ -159,6 +159,8 @@ xfs_group_free(
 
 	XFS_IS_CORRUPT(mp, atomic_read(&xg->xg_ref) != 0);
 
+	xfs_defer_drain_free(&xg->xg_intents_drain);
+
 	if (uninit)
 		uninit(xg);
 
@@ -184,6 +186,7 @@ xfs_group_insert(
 #ifdef __KERNEL__
 	spin_lock_init(&xg->xg_state_lock);
 #endif
+	xfs_defer_drain_init(&xg->xg_intents_drain);
 
 	/* Active ref owned by mount indicates group is online. */
 	atomic_set(&xg->xg_active_ref, 1);
@@ -191,6 +194,7 @@ xfs_group_insert(
 	error = xa_insert(&mp->m_groups[type].xa, index, xg, GFP_KERNEL);
 	if (error) {
 		WARN_ON_ONCE(error == -EBUSY);
+		xfs_defer_drain_free(&xg->xg_intents_drain);
 		return error;
 	}
 
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index d2c61dd1f43e44..ebefbba7d98cc2 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -22,6 +22,15 @@ struct xfs_group {
 	uint16_t		xg_checked;
 	uint16_t		xg_sick;
 	spinlock_t		xg_state_lock;
+
+	/*
+	 * We use xfs_drain to track the number of deferred log intent items
+	 * that have been queued (but not yet processed) so that waiters (e.g.
+	 * scrub) will not lock resources when other threads are in the middle
+	 * of processing a chain of intent items only to find momentary
+	 * inconsistencies.
+	 */
+	struct xfs_defer_drain	xg_intents_drain;
 #endif /* __KERNEL__ */
 };
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 20/36] xfs: move the online repair rmap hooks to the generic group structure
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (18 preceding siblings ...)
  2024-12-23 21:42   ` [PATCH 19/36] xfs: move draining of deferred operations " Darrick J. Wong
@ 2024-12-23 21:42   ` Darrick J. Wong
  2024-12-23 21:42   ` [PATCH 21/36] xfs: convert busy extent tracking " Darrick J. Wong
                     ` (15 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:42 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: eb4a84a3c2bd09efe770fa940fb68e349f90c8c6

Prepare for the upcoming realtime groups feature by moving the online
repair rmap hooks to the generic xfs_group structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag.c    |    1 -
 libxfs/xfs_ag.h    |    3 ---
 libxfs/xfs_group.c |    1 +
 libxfs/xfs_group.h |    5 +++++
 libxfs/xfs_rmap.c  |   24 +++++++++++++-----------
 libxfs/xfs_rmap.h  |    4 ++--
 6 files changed, 21 insertions(+), 17 deletions(-)


diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index d67e40f49a3fc0..1542fea06e305e 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -233,7 +233,6 @@ xfs_perag_alloc(
 	INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
 	init_waitqueue_head(&pag->pagb_wait);
 	pag->pagb_tree = RB_ROOT;
-	xfs_hooks_init(&pag->pag_rmap_update_hooks);
 #endif /* __KERNEL__ */
 
 	error = xfs_buf_cache_init(&pag->pag_bcache);
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 45f8de06cdbc8a..042ee0913fb9b9 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -96,9 +96,6 @@ struct xfs_perag {
 
 	/* background prealloc block trimming */
 	struct delayed_work	pag_blockgc_work;
-
-	/* Hook to feed rmapbt updates to an active online repair. */
-	struct xfs_hooks	pag_rmap_update_hooks;
 #endif /* __KERNEL__ */
 };
 
diff --git a/libxfs/xfs_group.c b/libxfs/xfs_group.c
index dfcebf2e9b30f8..58ace330a765cf 100644
--- a/libxfs/xfs_group.c
+++ b/libxfs/xfs_group.c
@@ -185,6 +185,7 @@ xfs_group_insert(
 
 #ifdef __KERNEL__
 	spin_lock_init(&xg->xg_state_lock);
+	xfs_hooks_init(&xg->xg_rmap_update_hooks);
 #endif
 	xfs_defer_drain_init(&xg->xg_intents_drain);
 
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index ebefbba7d98cc2..a87b9b80ef7516 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -31,6 +31,11 @@ struct xfs_group {
 	 * inconsistencies.
 	 */
 	struct xfs_defer_drain	xg_intents_drain;
+
+	/*
+	 * Hook to feed rmapbt updates to an active online repair.
+	 */
+	struct xfs_hooks	xg_rmap_update_hooks;
 #endif /* __KERNEL__ */
 };
 
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 0f7dee40bda87a..e13f4aa7e99538 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -834,7 +834,7 @@ xfs_rmap_hook_enable(void)
 static inline void
 xfs_rmap_update_hook(
 	struct xfs_trans		*tp,
-	struct xfs_perag		*pag,
+	struct xfs_group		*xg,
 	enum xfs_rmap_intent_type	op,
 	xfs_agblock_t			startblock,
 	xfs_extlen_t			blockcount,
@@ -849,27 +849,27 @@ xfs_rmap_update_hook(
 			.oinfo		= *oinfo, /* struct copy */
 		};
 
-		if (pag)
-			xfs_hooks_call(&pag->pag_rmap_update_hooks, op, &p);
+		if (xg)
+			xfs_hooks_call(&xg->xg_rmap_update_hooks, op, &p);
 	}
 }
 
 /* Call the specified function during a reverse mapping update. */
 int
 xfs_rmap_hook_add(
-	struct xfs_perag	*pag,
+	struct xfs_group	*xg,
 	struct xfs_rmap_hook	*hook)
 {
-	return xfs_hooks_add(&pag->pag_rmap_update_hooks, &hook->rmap_hook);
+	return xfs_hooks_add(&xg->xg_rmap_update_hooks, &hook->rmap_hook);
 }
 
 /* Stop calling the specified function during a reverse mapping update. */
 void
 xfs_rmap_hook_del(
-	struct xfs_perag	*pag,
+	struct xfs_group	*xg,
 	struct xfs_rmap_hook	*hook)
 {
-	xfs_hooks_del(&pag->pag_rmap_update_hooks, &hook->rmap_hook);
+	xfs_hooks_del(&xg->xg_rmap_update_hooks, &hook->rmap_hook);
 }
 
 /* Configure rmap update hook functions. */
@@ -904,7 +904,8 @@ xfs_rmap_free(
 		return 0;
 
 	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, pag);
-	xfs_rmap_update_hook(tp, pag, XFS_RMAP_UNMAP, bno, len, false, oinfo);
+	xfs_rmap_update_hook(tp, pag_group(pag), XFS_RMAP_UNMAP, bno, len,
+			false, oinfo);
 	error = xfs_rmap_unmap(cur, bno, len, false, oinfo);
 
 	xfs_btree_del_cursor(cur, error);
@@ -1148,7 +1149,8 @@ xfs_rmap_alloc(
 		return 0;
 
 	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, pag);
-	xfs_rmap_update_hook(tp, pag, XFS_RMAP_MAP, bno, len, false, oinfo);
+	xfs_rmap_update_hook(tp, pag_group(pag), XFS_RMAP_MAP, bno, len, false,
+			oinfo);
 	error = xfs_rmap_map(cur, bno, len, false, oinfo);
 
 	xfs_btree_del_cursor(cur, error);
@@ -2619,8 +2621,8 @@ xfs_rmap_finish_one(
 	if (error)
 		return error;
 
-	xfs_rmap_update_hook(tp, ri->ri_pag, ri->ri_type, bno,
-			ri->ri_bmap.br_blockcount, unwritten, &oinfo);
+	xfs_rmap_update_hook(tp, pag_group(ri->ri_pag), ri->ri_type, bno,
+			     ri->ri_bmap.br_blockcount, unwritten, &oinfo);
 	return 0;
 }
 
diff --git a/libxfs/xfs_rmap.h b/libxfs/xfs_rmap.h
index b783dd4dd95d1a..d409b463bc6662 100644
--- a/libxfs/xfs_rmap.h
+++ b/libxfs/xfs_rmap.h
@@ -264,8 +264,8 @@ struct xfs_rmap_hook {
 void xfs_rmap_hook_disable(void);
 void xfs_rmap_hook_enable(void);
 
-int xfs_rmap_hook_add(struct xfs_perag *pag, struct xfs_rmap_hook *hook);
-void xfs_rmap_hook_del(struct xfs_perag *pag, struct xfs_rmap_hook *hook);
+int xfs_rmap_hook_add(struct xfs_group *xg, struct xfs_rmap_hook *hook);
+void xfs_rmap_hook_del(struct xfs_group *xg, struct xfs_rmap_hook *hook);
 void xfs_rmap_hook_setup(struct xfs_rmap_hook *hook, notifier_fn_t mod_fn);
 #endif
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 21/36] xfs: convert busy extent tracking to the generic group structure
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (19 preceding siblings ...)
  2024-12-23 21:42   ` [PATCH 20/36] xfs: move the online repair rmap hooks " Darrick J. Wong
@ 2024-12-23 21:42   ` Darrick J. Wong
  2024-12-23 21:42   ` [PATCH 22/36] xfs: add a generic group pointer to the btree cursor Darrick J. Wong
                     ` (14 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:42 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: adbc76aa0fedcb6da2d1ceb1ce786d1f963afee8

Split busy extent tracking from struct xfs_perag into its own private
structure, which can be pointed to by the generic group structure.

Note that this structure is now dynamically allocated instead of embedded,
as the upcoming zoned XFS code doesn't need it and will also have an
unusually high number of groups due to hardware constraints.  Dynamically
allocating the structure is a big memory saver for this case.
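
The trade-off can be illustrated with a small standalone sketch.  The demo_*
structures below are hypothetical and their sizes are invented (the 64-byte
array merely stands in for a lock and a wait queue); the sketch shows only
why a pointer that stays NULL for groups that never track busy extents costs
less per group than an embedded structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical busy-extent bookkeeping; sizes are made up for the demo. */
struct demo_busy_extents {
	void		*tree_root;
	unsigned int	gen;
	char		lock_and_waitq[64];	/* placeholder payload */
};

/* Before: every group pays for the bookkeeping, used or not. */
struct demo_group_embedded {
	unsigned int			gno;
	struct demo_busy_extents	busy;
};

/* After: groups carry only a pointer, allocated on demand. */
struct demo_group_lean {
	unsigned int			gno;
	struct demo_busy_extents	*busy;
};

/* Set up a group; allocate the busy-extent state only if it is needed. */
static int
demo_group_init(struct demo_group_lean *xg, unsigned int gno, bool needs_busy)
{
	xg->gno = gno;
	xg->busy = NULL;
	if (!needs_busy)
		return 0;
	xg->busy = calloc(1, sizeof(*xg->busy));
	return xg->busy ? 0 : -1;
}

static void
demo_group_free(struct demo_group_lean *xg)
{
	free(xg->busy);		/* free(NULL) is a no-op */
	xg->busy = NULL;
}

/* Report whether initialising a group allocated busy-extent state. */
static bool
demo_init_allocates(bool needs_busy)
{
	struct demo_group_lean	xg;
	bool			allocated;

	if (demo_group_init(&xg, 0, needs_busy))
		return false;
	allocated = xg.busy != NULL;
	demo_group_free(&xg);
	return allocated;
}

/* Bytes saved per group that does not need busy-extent tracking. */
static size_t
demo_bytes_saved_per_group(void)
{
	return sizeof(struct demo_group_embedded) -
	       sizeof(struct demo_group_lean);
}
```

The saving scales with the number of groups, which is why it matters most
when the group count is unusually high.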

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/libxfs_priv.h     |    6 +++---
 libxfs/xfs_ag.c          |    3 ---
 libxfs/xfs_ag.h          |    5 -----
 libxfs/xfs_alloc.c       |   29 +++++++++++++++++------------
 libxfs/xfs_alloc_btree.c |    4 ++--
 libxfs/xfs_group.c       |   15 +++++++++++++--
 libxfs/xfs_group.h       |    5 +++++
 libxfs/xfs_rmap_btree.c  |    4 ++--
 8 files changed, 42 insertions(+), 29 deletions(-)


diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 9505806131bc42..1b6bb961b7ac06 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -467,11 +467,11 @@ xfs_buf_readahead(
 
 #define xfs_extent_busy_reuse(...)			((void) 0)
 /* avoid unused variable warning */
-#define xfs_extent_busy_insert(tp,pag,bno,len,flags)({ 	\
-	struct xfs_perag *__foo = pag;			\
+#define xfs_extent_busy_insert(tp,xg,bno,len,flags)({ 	\
+	struct xfs_group *__foo = xg;			\
 	__foo = __foo; /* no set-but-unused warning */	\
 })
-#define xfs_extent_busy_trim(args,bno,len,busy_gen) 	({	\
+#define xfs_extent_busy_trim(group,minlen,maxlen,bno,len,busy_gen) 	({	\
 	unsigned __foo = *(busy_gen);				\
 	*(busy_gen) = __foo;					\
 	false;							\
diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index 1542fea06e305e..bd38ac175bbae3 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -228,11 +228,8 @@ xfs_perag_alloc(
 #ifdef __KERNEL__
 	/* Place kernel structure only init below this point. */
 	spin_lock_init(&pag->pag_ici_lock);
-	spin_lock_init(&pag->pagb_lock);
 	INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker);
 	INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
-	init_waitqueue_head(&pag->pagb_wait);
-	pag->pagb_tree = RB_ROOT;
 #endif /* __KERNEL__ */
 
 	error = xfs_buf_cache_init(&pag->pag_bcache);
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 042ee0913fb9b9..7290148fa6e6aa 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -80,11 +80,6 @@ struct xfs_perag {
 	uint8_t		pagf_repair_rmap_level;
 #endif
 
-	spinlock_t	pagb_lock;	/* lock for pagb_tree */
-	struct rb_root	pagb_tree;	/* ordered tree of busy extents */
-	unsigned int	pagb_gen;	/* generation count for pagb_tree */
-	wait_queue_head_t pagb_wait;	/* woken when pagb_gen changes */
-
 	atomic_t        pagf_fstrms;    /* # of filestreams active in this AG */
 
 	spinlock_t	pag_ici_lock;	/* incore inode cache lock */
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 39e1961078ae3a..1c50358a8be7ae 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -327,7 +327,8 @@ xfs_alloc_compute_aligned(
 	bool		busy;
 
 	/* Trim busy sections out of found extent */
-	busy = xfs_extent_busy_trim(args, &bno, &len, busy_gen);
+	busy = xfs_extent_busy_trim(pag_group(args->pag), args->minlen,
+			args->maxlen, &bno, &len, busy_gen);
 
 	/*
 	 * If we have a largish extent that happens to start before min_agbno,
@@ -1247,7 +1248,7 @@ xfs_alloc_ag_vextent_small(
 	if (fbno == NULLAGBLOCK)
 		goto out;
 
-	xfs_extent_busy_reuse(args->pag, fbno, 1,
+	xfs_extent_busy_reuse(pag_group(args->pag), fbno, 1,
 			      (args->datatype & XFS_ALLOC_NOBUSY));
 
 	if (args->datatype & XFS_ALLOC_USERDATA) {
@@ -1360,7 +1361,8 @@ xfs_alloc_ag_vextent_exact(
 	 */
 	tbno = fbno;
 	tlen = flen;
-	xfs_extent_busy_trim(args, &tbno, &tlen, &busy_gen);
+	xfs_extent_busy_trim(pag_group(args->pag), args->minlen, args->maxlen,
+			&tbno, &tlen, &busy_gen);
 
 	/*
 	 * Give up if the start of the extent is busy, or the freespace isn't
@@ -1753,8 +1755,9 @@ xfs_alloc_ag_vextent_near(
 			 * the allocation can be retried.
 			 */
 			trace_xfs_alloc_near_busy(args);
-			error = xfs_extent_busy_flush(args->tp, args->pag,
-					acur.busy_gen, alloc_flags);
+			error = xfs_extent_busy_flush(args->tp,
+					pag_group(args->pag), acur.busy_gen,
+					alloc_flags);
 			if (error)
 				goto out;
 
@@ -1869,8 +1872,9 @@ xfs_alloc_ag_vextent_size(
 			 * the allocation can be retried.
 			 */
 			trace_xfs_alloc_size_busy(args);
-			error = xfs_extent_busy_flush(args->tp, args->pag,
-					busy_gen, alloc_flags);
+			error = xfs_extent_busy_flush(args->tp,
+					pag_group(args->pag), busy_gen,
+					alloc_flags);
 			if (error)
 				goto error0;
 
@@ -1968,8 +1972,9 @@ xfs_alloc_ag_vextent_size(
 			 * the allocation can be retried.
 			 */
 			trace_xfs_alloc_size_busy(args);
-			error = xfs_extent_busy_flush(args->tp, args->pag,
-					busy_gen, alloc_flags);
+			error = xfs_extent_busy_flush(args->tp,
+					pag_group(args->pag), busy_gen,
+					alloc_flags);
 			if (error)
 				goto error0;
 
@@ -3609,8 +3614,8 @@ xfs_alloc_vextent_finish(
 		if (error)
 			goto out_drop_perag;
 
-		ASSERT(!xfs_extent_busy_search(args->pag, args->agbno,
-				args->len));
+		ASSERT(!xfs_extent_busy_search(pag_group(args->pag),
+				args->agbno, args->len));
 	}
 
 	xfs_ag_resv_alloc_extent(args->pag, args->resv, args);
@@ -4008,7 +4013,7 @@ __xfs_free_extent(
 
 	if (skip_discard)
 		busy_flags |= XFS_EXTENT_BUSY_SKIP_DISCARD;
-	xfs_extent_busy_insert(tp, pag, agbno, len, busy_flags);
+	xfs_extent_busy_insert(tp, pag_group(pag), agbno, len, busy_flags);
 	return 0;
 
 err_release:
diff --git a/libxfs/xfs_alloc_btree.c b/libxfs/xfs_alloc_btree.c
index 667655a639fef1..bf906aeb2f8a9e 100644
--- a/libxfs/xfs_alloc_btree.c
+++ b/libxfs/xfs_alloc_btree.c
@@ -84,7 +84,7 @@ xfs_allocbt_alloc_block(
 	}
 
 	atomic64_inc(&cur->bc_mp->m_allocbt_blks);
-	xfs_extent_busy_reuse(cur->bc_ag.pag, bno, 1, false);
+	xfs_extent_busy_reuse(pag_group(cur->bc_ag.pag), bno, 1, false);
 
 	new->s = cpu_to_be32(bno);
 
@@ -108,7 +108,7 @@ xfs_allocbt_free_block(
 		return error;
 
 	atomic64_dec(&cur->bc_mp->m_allocbt_blks);
-	xfs_extent_busy_insert(cur->bc_tp, agbp->b_pag, bno, 1,
+	xfs_extent_busy_insert(cur->bc_tp, pag_group(agbp->b_pag), bno, 1,
 			      XFS_EXTENT_BUSY_SKIP_DISCARD);
 	return 0;
 }
diff --git a/libxfs/xfs_group.c b/libxfs/xfs_group.c
index 58ace330a765cf..9c7fa99d00b802 100644
--- a/libxfs/xfs_group.c
+++ b/libxfs/xfs_group.c
@@ -160,6 +160,9 @@ xfs_group_free(
 	XFS_IS_CORRUPT(mp, atomic_read(&xg->xg_ref) != 0);
 
 	xfs_defer_drain_free(&xg->xg_intents_drain);
+#ifdef __KERNEL__
+	kfree(xg->xg_busy_extents);
+#endif
 
 	if (uninit)
 		uninit(xg);
@@ -184,6 +187,9 @@ xfs_group_insert(
 	xg->xg_type = type;
 
 #ifdef __KERNEL__
+	xg->xg_busy_extents = xfs_extent_busy_alloc();
+	if (!xg->xg_busy_extents)
+		return -ENOMEM;
 	spin_lock_init(&xg->xg_state_lock);
 	xfs_hooks_init(&xg->xg_rmap_update_hooks);
 #endif
@@ -195,9 +201,14 @@ xfs_group_insert(
 	error = xa_insert(&mp->m_groups[type].xa, index, xg, GFP_KERNEL);
 	if (error) {
 		WARN_ON_ONCE(error == -EBUSY);
-		xfs_defer_drain_free(&xg->xg_intents_drain);
-		return error;
+		goto out_drain;
 	}
 
 	return 0;
+out_drain:
+	xfs_defer_drain_free(&xg->xg_intents_drain);
+#ifdef __KERNEL__
+	kfree(xg->xg_busy_extents);
+#endif
+	return error;
 }
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index a87b9b80ef7516..0ff6e1d5635cb1 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -15,6 +15,11 @@ struct xfs_group {
 #ifdef __KERNEL__
 	/* -- kernel only structures below this line -- */
 
+	/*
+	 * Track freed but not yet committed extents.
+	 */
+	struct xfs_extent_busy_tree *xg_busy_extents;
+
 	/*
 	 * Bitsets of per-ag metadata that have been checked and/or are sick.
 	 * Callers should hold xg_state_lock before accessing this field.
diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
index ebb2519cf8baf3..5829e68790314a 100644
--- a/libxfs/xfs_rmap_btree.c
+++ b/libxfs/xfs_rmap_btree.c
@@ -101,7 +101,7 @@ xfs_rmapbt_alloc_block(
 		return 0;
 	}
 
-	xfs_extent_busy_reuse(pag, bno, 1, false);
+	xfs_extent_busy_reuse(pag_group(pag), bno, 1, false);
 
 	new->s = cpu_to_be32(bno);
 	be32_add_cpu(&agf->agf_rmap_blocks, 1);
@@ -135,7 +135,7 @@ xfs_rmapbt_free_block(
 	if (error)
 		return error;
 
-	xfs_extent_busy_insert(cur->bc_tp, pag, bno, 1,
+	xfs_extent_busy_insert(cur->bc_tp, pag_group(pag), bno, 1,
 			      XFS_EXTENT_BUSY_SKIP_DISCARD);
 
 	xfs_ag_resv_free_extent(pag, XFS_AG_RESV_RMAPBT, NULL, 1);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 22/36] xfs: add a generic group pointer to the btree cursor
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (20 preceding siblings ...)
  2024-12-23 21:42   ` [PATCH 21/36] xfs: convert busy extent tracking " Darrick J. Wong
@ 2024-12-23 21:42   ` Darrick J. Wong
  2024-12-23 21:43   ` [PATCH 23/36] xfs: add group based bno conversion helpers Darrick J. Wong
                     ` (13 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:42 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 77a530e6c49d22bd4a221d2f059db24fc30094db

Replace the pag pointers in the type specific union with a generic
xfs_group pointer.  This prepares for adding realtime group support.
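
As a rough sketch of the pattern (not the actual libxfs definitions),
the cursor stores only the generic group, and AG-specific callers
recover the containing structure with a container_of-style downcast.
The demo_* types and helpers below are hypothetical stand-ins for
xfs_group, xfs_perag, pag_group(), to_perag(), xfs_group_hold() and
xfs_group_put(); the real reference counting is more involved than a
plain integer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic group, common to every group type. */
struct demo_group {
	unsigned int	gno;	/* group number */
	int		ref;	/* simplified reference count */
};

/* Hypothetical AG structure embedding the generic part. */
struct demo_perag {
	struct demo_group	group;
	unsigned int		bno_level;
};

/* Hypothetical cursor: one generic pointer for all group types. */
struct demo_cursor {
	struct demo_group	*group;
};

/* Upcast: always safe. */
static inline struct demo_group *
demo_pag_group(struct demo_perag *pag)
{
	return &pag->group;
}

/* Downcast: only valid if the group really lives inside a perag. */
static inline struct demo_perag *
demo_to_perag(struct demo_group *g)
{
	return (struct demo_perag *)((char *)g -
			offsetof(struct demo_perag, group));
}

/* Take a reference and return the group so it can be assigned. */
static inline struct demo_group *
demo_group_hold(struct demo_group *g)
{
	g->ref++;
	return g;
}

/* Drop a reference taken by demo_group_hold(). */
static inline void
demo_group_put(struct demo_group *g)
{
	g->ref--;
}
```

Because the pointer no longer lives in the type-specific union, cursor
teardown can drop the reference without switching on the btree type.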

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_alloc.c          |    8 ++++----
 libxfs/xfs_alloc_btree.c    |   28 ++++++++++++++--------------
 libxfs/xfs_btree.c          |   35 ++++++++++++-----------------------
 libxfs/xfs_btree.h          |    3 +--
 libxfs/xfs_btree_mem.c      |    6 ++----
 libxfs/xfs_ialloc.c         |   12 +++++++-----
 libxfs/xfs_ialloc_btree.c   |   15 ++++++++-------
 libxfs/xfs_refcount.c       |   17 +++++++++--------
 libxfs/xfs_refcount_btree.c |   10 +++++-----
 libxfs/xfs_rmap.c           |    8 +++-----
 libxfs/xfs_rmap_btree.c     |   19 ++++++++++---------
 repair/agbtree.c            |    6 +++---
 repair/bmap_repair.c        |    4 ++--
 13 files changed, 80 insertions(+), 91 deletions(-)


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 1c50358a8be7ae..c449285743534d 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -271,7 +271,7 @@ xfs_alloc_complain_bad_rec(
 
 	xfs_warn(mp,
 		"%sbt record corruption in AG %d detected at %pS!",
-		cur->bc_ops->name, pag_agno(cur->bc_ag.pag), fa);
+		cur->bc_ops->name, cur->bc_group->xg_gno, fa);
 	xfs_warn(mp,
 		"start block 0x%x block count 0x%x", irec->ar_startblock,
 		irec->ar_blockcount);
@@ -299,7 +299,7 @@ xfs_alloc_get_rec(
 		return error;
 
 	xfs_alloc_btrec_to_irec(rec, &irec);
-	fa = xfs_alloc_check_irec(cur->bc_ag.pag, &irec);
+	fa = xfs_alloc_check_irec(to_perag(cur->bc_group), &irec);
 	if (fa)
 		return xfs_alloc_complain_bad_rec(cur, fa, &irec);
 
@@ -536,7 +536,7 @@ static int
 xfs_alloc_fixup_longest(
 	struct xfs_btree_cur	*cnt_cur)
 {
-	struct xfs_perag	*pag = cnt_cur->bc_ag.pag;
+	struct xfs_perag	*pag = to_perag(cnt_cur->bc_group);
 	struct xfs_buf		*bp = cnt_cur->bc_ag.agbp;
 	struct xfs_agf		*agf = bp->b_addr;
 	xfs_extlen_t		longest = 0;
@@ -4038,7 +4038,7 @@ xfs_alloc_query_range_helper(
 	xfs_failaddr_t				fa;
 
 	xfs_alloc_btrec_to_irec(rec, &irec);
-	fa = xfs_alloc_check_irec(cur->bc_ag.pag, &irec);
+	fa = xfs_alloc_check_irec(to_perag(cur->bc_group), &irec);
 	if (fa)
 		return xfs_alloc_complain_bad_rec(cur, fa, &irec);
 
diff --git a/libxfs/xfs_alloc_btree.c b/libxfs/xfs_alloc_btree.c
index bf906aeb2f8a9e..63b8db4ed01350 100644
--- a/libxfs/xfs_alloc_btree.c
+++ b/libxfs/xfs_alloc_btree.c
@@ -26,7 +26,7 @@ xfs_bnobt_dup_cursor(
 	struct xfs_btree_cur	*cur)
 {
 	return xfs_bnobt_init_cursor(cur->bc_mp, cur->bc_tp, cur->bc_ag.agbp,
-			cur->bc_ag.pag);
+			to_perag(cur->bc_group));
 }
 
 STATIC struct xfs_btree_cur *
@@ -34,29 +34,29 @@ xfs_cntbt_dup_cursor(
 	struct xfs_btree_cur	*cur)
 {
 	return xfs_cntbt_init_cursor(cur->bc_mp, cur->bc_tp, cur->bc_ag.agbp,
-			cur->bc_ag.pag);
+			to_perag(cur->bc_group));
 }
 
-
 STATIC void
 xfs_allocbt_set_root(
 	struct xfs_btree_cur		*cur,
 	const union xfs_btree_ptr	*ptr,
 	int				inc)
 {
-	struct xfs_buf		*agbp = cur->bc_ag.agbp;
-	struct xfs_agf		*agf = agbp->b_addr;
+	struct xfs_perag		*pag = to_perag(cur->bc_group);
+	struct xfs_buf			*agbp = cur->bc_ag.agbp;
+	struct xfs_agf			*agf = agbp->b_addr;
 
 	ASSERT(ptr->s != 0);
 
 	if (xfs_btree_is_bno(cur->bc_ops)) {
 		agf->agf_bno_root = ptr->s;
 		be32_add_cpu(&agf->agf_bno_level, inc);
-		cur->bc_ag.pag->pagf_bno_level += inc;
+		pag->pagf_bno_level += inc;
 	} else {
 		agf->agf_cnt_root = ptr->s;
 		be32_add_cpu(&agf->agf_cnt_level, inc);
-		cur->bc_ag.pag->pagf_cnt_level += inc;
+		pag->pagf_cnt_level += inc;
 	}
 
 	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
@@ -73,7 +73,7 @@ xfs_allocbt_alloc_block(
 	xfs_agblock_t		bno;
 
 	/* Allocate the new block from the freelist. If we can't, give up.  */
-	error = xfs_alloc_get_freelist(cur->bc_ag.pag, cur->bc_tp,
+	error = xfs_alloc_get_freelist(to_perag(cur->bc_group), cur->bc_tp,
 			cur->bc_ag.agbp, &bno, 1);
 	if (error)
 		return error;
@@ -84,7 +84,7 @@ xfs_allocbt_alloc_block(
 	}
 
 	atomic64_inc(&cur->bc_mp->m_allocbt_blks);
-	xfs_extent_busy_reuse(pag_group(cur->bc_ag.pag), bno, 1, false);
+	xfs_extent_busy_reuse(cur->bc_group, bno, 1, false);
 
 	new->s = cpu_to_be32(bno);
 
@@ -102,8 +102,8 @@ xfs_allocbt_free_block(
 	int			error;
 
 	bno = xfs_daddr_to_agbno(cur->bc_mp, xfs_buf_daddr(bp));
-	error = xfs_alloc_put_freelist(cur->bc_ag.pag, cur->bc_tp, agbp, NULL,
-			bno, 1);
+	error = xfs_alloc_put_freelist(to_perag(cur->bc_group), cur->bc_tp,
+			agbp, NULL, bno, 1);
 	if (error)
 		return error;
 
@@ -176,7 +176,7 @@ xfs_allocbt_init_ptr_from_cur(
 {
 	struct xfs_agf		*agf = cur->bc_ag.agbp->b_addr;
 
-	ASSERT(pag_agno(cur->bc_ag.pag) == be32_to_cpu(agf->agf_seqno));
+	ASSERT(cur->bc_group->xg_gno == be32_to_cpu(agf->agf_seqno));
 
 	if (xfs_btree_is_bno(cur->bc_ops))
 		ptr->s = agf->agf_bno_root;
@@ -490,7 +490,7 @@ xfs_bnobt_init_cursor(
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_bnobt_ops,
 			mp->m_alloc_maxlevels, xfs_allocbt_cur_cache);
-	cur->bc_ag.pag = xfs_perag_hold(pag);
+	cur->bc_group = xfs_group_hold(pag_group(pag));
 	cur->bc_ag.agbp = agbp;
 	if (agbp) {
 		struct xfs_agf		*agf = agbp->b_addr;
@@ -516,7 +516,7 @@ xfs_cntbt_init_cursor(
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_cntbt_ops,
 			mp->m_alloc_maxlevels, xfs_allocbt_cur_cache);
-	cur->bc_ag.pag = xfs_perag_hold(pag);
+	cur->bc_group = xfs_group_hold(pag_group(pag));
 	cur->bc_ag.agbp = agbp;
 	if (agbp) {
 		struct xfs_agf		*agf = agbp->b_addr;
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 2b63c18114763c..3d870f3f4a5165 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -223,7 +223,7 @@ __xfs_btree_check_agblock(
 	struct xfs_buf		*bp)
 {
 	struct xfs_mount	*mp = cur->bc_mp;
-	struct xfs_perag	*pag = cur->bc_ag.pag;
+	struct xfs_perag	*pag = to_perag(cur->bc_group);
 	xfs_failaddr_t		fa;
 	xfs_agblock_t		agbno;
 
@@ -329,7 +329,7 @@ __xfs_btree_check_ptr(
 			return -EFSCORRUPTED;
 		break;
 	case XFS_BTREE_TYPE_AG:
-		if (!xfs_verify_agbno(cur->bc_ag.pag,
+		if (!xfs_verify_agbno(to_perag(cur->bc_group),
 				be32_to_cpu((&ptr->s)[index])))
 			return -EFSCORRUPTED;
 		break;
@@ -370,7 +370,7 @@ xfs_btree_check_ptr(
 		case XFS_BTREE_TYPE_AG:
 			xfs_err(cur->bc_mp,
 "AG %u: Corrupt %sbt pointer at level %d index %d.",
-				pag_agno(cur->bc_ag.pag), cur->bc_ops->name,
+				cur->bc_group->xg_gno, cur->bc_ops->name,
 				level, index);
 			break;
 		}
@@ -521,20 +521,8 @@ xfs_btree_del_cursor(
 	ASSERT(!xfs_btree_is_bmap(cur->bc_ops) || cur->bc_bmap.allocated == 0 ||
 	       xfs_is_shutdown(cur->bc_mp) || error != 0);
 
-	switch (cur->bc_ops->type) {
-	case XFS_BTREE_TYPE_AG:
-		if (cur->bc_ag.pag)
-			xfs_perag_put(cur->bc_ag.pag);
-		break;
-	case XFS_BTREE_TYPE_INODE:
-		/* nothing to do */
-		break;
-	case XFS_BTREE_TYPE_MEM:
-		if (cur->bc_mem.pag)
-			xfs_perag_put(cur->bc_mem.pag);
-		break;
-	}
-
+	if (cur->bc_group)
+		xfs_group_put(cur->bc_group);
 	kmem_cache_free(cur->bc_cache, cur);
 }
 
@@ -1015,21 +1003,22 @@ xfs_btree_readahead_agblock(
 	struct xfs_btree_block	*block)
 {
 	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_perag	*pag = to_perag(cur->bc_group);
 	xfs_agblock_t		left = be32_to_cpu(block->bb_u.s.bb_leftsib);
 	xfs_agblock_t		right = be32_to_cpu(block->bb_u.s.bb_rightsib);
 	int			rval = 0;
 
 	if ((lr & XFS_BTCUR_LEFTRA) && left != NULLAGBLOCK) {
 		xfs_buf_readahead(mp->m_ddev_targp,
-				xfs_agbno_to_daddr(cur->bc_ag.pag, left),
-				mp->m_bsize, cur->bc_ops->buf_ops);
+				xfs_agbno_to_daddr(pag, left), mp->m_bsize,
+				cur->bc_ops->buf_ops);
 		rval++;
 	}
 
 	if ((lr & XFS_BTCUR_RIGHTRA) && right != NULLAGBLOCK) {
 		xfs_buf_readahead(mp->m_ddev_targp,
-				xfs_agbno_to_daddr(cur->bc_ag.pag, right),
-				mp->m_bsize, cur->bc_ops->buf_ops);
+				xfs_agbno_to_daddr(pag, right), mp->m_bsize,
+				cur->bc_ops->buf_ops);
 		rval++;
 	}
 
@@ -1088,7 +1077,7 @@ xfs_btree_ptr_to_daddr(
 
 	switch (cur->bc_ops->type) {
 	case XFS_BTREE_TYPE_AG:
-		*daddr = xfs_agbno_to_daddr(cur->bc_ag.pag,
+		*daddr = xfs_agbno_to_daddr(to_perag(cur->bc_group),
 				be32_to_cpu(ptr->s));
 		break;
 	case XFS_BTREE_TYPE_INODE:
@@ -1310,7 +1299,7 @@ xfs_btree_owner(
 	case XFS_BTREE_TYPE_INODE:
 		return cur->bc_ino.ip->i_ino;
 	case XFS_BTREE_TYPE_AG:
-		return pag_agno(cur->bc_ag.pag);
+		return cur->bc_group->xg_gno;
 	default:
 		ASSERT(0);
 		return 0;
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 10b7ddc3b2b34e..3b739459ebb0f4 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -254,6 +254,7 @@ struct xfs_btree_cur
 	union xfs_btree_irec	bc_rec;	/* current insert/search record value */
 	uint8_t			bc_nlevels; /* number of levels in the tree */
 	uint8_t			bc_maxlevels; /* maximum levels for this btree type */
+	struct xfs_group	*bc_group;
 
 	/* per-type information */
 	union {
@@ -264,13 +265,11 @@ struct xfs_btree_cur
 			struct xbtree_ifakeroot	*ifake;	/* for staging cursor */
 		} bc_ino;
 		struct {
-			struct xfs_perag	*pag;
 			struct xfs_buf		*agbp;
 			struct xbtree_afakeroot	*afake;	/* for staging cursor */
 		} bc_ag;
 		struct {
 			struct xfbtree		*xfbtree;
-			struct xfs_perag	*pag;
 		} bc_mem;
 	};
 
diff --git a/libxfs/xfs_btree_mem.c b/libxfs/xfs_btree_mem.c
index ae9302b9090f58..8e3efdbccc156a 100644
--- a/libxfs/xfs_btree_mem.c
+++ b/libxfs/xfs_btree_mem.c
@@ -56,10 +56,8 @@ xfbtree_dup_cursor(
 	ncur->bc_flags = cur->bc_flags;
 	ncur->bc_nlevels = cur->bc_nlevels;
 	ncur->bc_mem.xfbtree = cur->bc_mem.xfbtree;
-
-	if (cur->bc_mem.pag)
-		ncur->bc_mem.pag = xfs_perag_hold(cur->bc_mem.pag);
-
+	if (cur->bc_group)
+		ncur->bc_group = xfs_group_hold(cur->bc_group);
 	return ncur;
 }
 
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index e4b9d3e2fbb5ce..055eff477faceb 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -137,7 +137,7 @@ xfs_inobt_complain_bad_rec(
 
 	xfs_warn(mp,
 		"%sbt record corruption in AG %d detected at %pS!",
-		cur->bc_ops->name, pag_agno(cur->bc_ag.pag), fa);
+		cur->bc_ops->name, cur->bc_group->xg_gno, fa);
 	xfs_warn(mp,
 "start inode 0x%x, count 0x%x, free 0x%x freemask 0x%llx, holemask 0x%x",
 		irec->ir_startino, irec->ir_count, irec->ir_freecount,
@@ -165,7 +165,7 @@ xfs_inobt_get_rec(
 		return error;
 
 	xfs_inobt_btrec_to_irec(mp, rec, irec);
-	fa = xfs_inobt_check_irec(cur->bc_ag.pag, irec);
+	fa = xfs_inobt_check_irec(to_perag(cur->bc_group), irec);
 	if (fa)
 		return xfs_inobt_complain_bad_rec(cur, fa, irec);
 
@@ -270,8 +270,10 @@ xfs_check_agi_freecount(
 			}
 		} while (i == 1);
 
-		if (!xfs_is_shutdown(cur->bc_mp))
-			ASSERT(freecount == cur->bc_ag.pag->pagi_freecount);
+		if (!xfs_is_shutdown(cur->bc_mp)) {
+			ASSERT(freecount ==
+				to_perag(cur->bc_group)->pagi_freecount);
+		}
 	}
 	return 0;
 }
@@ -2875,7 +2877,7 @@ xfs_ialloc_count_inodes_rec(
 	xfs_failaddr_t			fa;
 
 	xfs_inobt_btrec_to_irec(cur->bc_mp, rec, &irec);
-	fa = xfs_inobt_check_irec(cur->bc_ag.pag, &irec);
+	fa = xfs_inobt_check_irec(to_perag(cur->bc_group), &irec);
 	if (fa)
 		return xfs_inobt_complain_bad_rec(cur, fa, &irec);
 
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index 45908cce464dff..4eeea240b0bd89 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -36,7 +36,7 @@ STATIC struct xfs_btree_cur *
 xfs_inobt_dup_cursor(
 	struct xfs_btree_cur	*cur)
 {
-	return xfs_inobt_init_cursor(cur->bc_ag.pag, cur->bc_tp,
+	return xfs_inobt_init_cursor(to_perag(cur->bc_group), cur->bc_tp,
 			cur->bc_ag.agbp);
 }
 
@@ -44,7 +44,7 @@ STATIC struct xfs_btree_cur *
 xfs_finobt_dup_cursor(
 	struct xfs_btree_cur	*cur)
 {
-	return xfs_finobt_init_cursor(cur->bc_ag.pag, cur->bc_tp,
+	return xfs_finobt_init_cursor(to_perag(cur->bc_group), cur->bc_tp,
 			cur->bc_ag.agbp);
 }
 
@@ -111,7 +111,7 @@ __xfs_inobt_alloc_block(
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
-	args.pag = cur->bc_ag.pag;
+	args.pag = to_perag(cur->bc_group);
 	args.oinfo = XFS_RMAP_OINFO_INOBT;
 	args.minlen = 1;
 	args.maxlen = 1;
@@ -247,7 +247,7 @@ xfs_inobt_init_ptr_from_cur(
 {
 	struct xfs_agi		*agi = cur->bc_ag.agbp->b_addr;
 
-	ASSERT(pag_agno(cur->bc_ag.pag) == be32_to_cpu(agi->agi_seqno));
+	ASSERT(cur->bc_group->xg_gno == be32_to_cpu(agi->agi_seqno));
 
 	ptr->s = agi->agi_root;
 }
@@ -259,7 +259,8 @@ xfs_finobt_init_ptr_from_cur(
 {
 	struct xfs_agi		*agi = cur->bc_ag.agbp->b_addr;
 
-	ASSERT(pag_agno(cur->bc_ag.pag) == be32_to_cpu(agi->agi_seqno));
+	ASSERT(cur->bc_group->xg_gno == be32_to_cpu(agi->agi_seqno));
+
 	ptr->s = agi->agi_free_root;
 }
 
@@ -482,7 +483,7 @@ xfs_inobt_init_cursor(
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_inobt_ops,
 			M_IGEO(mp)->inobt_maxlevels, xfs_inobt_cur_cache);
-	cur->bc_ag.pag = xfs_perag_hold(pag);
+	cur->bc_group = xfs_group_hold(pag_group(pag));
 	cur->bc_ag.agbp = agbp;
 	if (agbp) {
 		struct xfs_agi		*agi = agbp->b_addr;
@@ -508,7 +509,7 @@ xfs_finobt_init_cursor(
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_finobt_ops,
 			M_IGEO(mp)->inobt_maxlevels, xfs_inobt_cur_cache);
-	cur->bc_ag.pag = xfs_perag_hold(pag);
+	cur->bc_group = xfs_group_hold(pag_group(pag));
 	cur->bc_ag.agbp = agbp;
 	if (agbp) {
 		struct xfs_agi		*agi = agbp->b_addr;
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 9507cf74578d4f..3eccb998545d6f 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -153,7 +153,7 @@ xfs_refcount_complain_bad_rec(
 
 	xfs_warn(mp,
  "Refcount BTree record corruption in AG %d detected at %pS!",
-				pag_agno(cur->bc_ag.pag), fa);
+				cur->bc_group->xg_gno, fa);
 	xfs_warn(mp,
 		"Start block 0x%x, block count 0x%x, references 0x%x",
 		irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount);
@@ -179,7 +179,7 @@ xfs_refcount_get_rec(
 		return error;
 
 	xfs_refcount_btrec_to_irec(rec, irec);
-	fa = xfs_refcount_check_irec(cur->bc_ag.pag, irec);
+	fa = xfs_refcount_check_irec(to_perag(cur->bc_group), irec);
 	if (fa)
 		return xfs_refcount_complain_bad_rec(cur, fa, irec);
 
@@ -1153,7 +1153,7 @@ xfs_refcount_adjust_extents(
 					goto out_error;
 				}
 			} else {
-				fsbno = xfs_agbno_to_fsb(cur->bc_ag.pag,
+				fsbno = xfs_agbno_to_fsb(to_perag(cur->bc_group),
 						tmp.rc_startblock);
 				error = xfs_free_extent_later(cur->bc_tp, fsbno,
 						  tmp.rc_blockcount, NULL,
@@ -1215,7 +1215,7 @@ xfs_refcount_adjust_extents(
 			}
 			goto advloop;
 		} else {
-			fsbno = xfs_agbno_to_fsb(cur->bc_ag.pag,
+			fsbno = xfs_agbno_to_fsb(to_perag(cur->bc_group),
 					ext.rc_startblock);
 			error = xfs_free_extent_later(cur->bc_tp, fsbno,
 					ext.rc_blockcount, NULL,
@@ -1309,7 +1309,7 @@ xfs_refcount_continue_op(
 	xfs_agblock_t			new_agbno)
 {
 	struct xfs_mount		*mp = cur->bc_mp;
-	struct xfs_perag		*pag = cur->bc_ag.pag;
+	struct xfs_perag		*pag = to_perag(cur->bc_group);
 
 	if (XFS_IS_CORRUPT(mp, !xfs_verify_agbext(pag, new_agbno,
 					ri->ri_blockcount))) {
@@ -1357,7 +1357,7 @@ xfs_refcount_finish_one(
 	 * If we haven't gotten a cursor or the cursor AG doesn't match
 	 * the startblock, get one now.
 	 */
-	if (rcur != NULL && rcur->bc_ag.pag != ri->ri_pag) {
+	if (rcur != NULL && to_perag(rcur->bc_group) != ri->ri_pag) {
 		nr_ops = rcur->bc_refc.nr_ops;
 		shape_changes = rcur->bc_refc.shape_changes;
 		xfs_btree_del_cursor(rcur, 0);
@@ -1877,7 +1877,8 @@ xfs_refcount_recover_extent(
 	INIT_LIST_HEAD(&rr->rr_list);
 	xfs_refcount_btrec_to_irec(rec, &rr->rr_rrec);
 
-	if (xfs_refcount_check_irec(cur->bc_ag.pag, &rr->rr_rrec) != NULL ||
+	if (xfs_refcount_check_irec(to_perag(cur->bc_group), &rr->rr_rrec) !=
+			NULL ||
 	    XFS_IS_CORRUPT(cur->bc_mp,
 			   rr->rr_rrec.rc_domain != XFS_REFC_DOMAIN_COW)) {
 		xfs_btree_mark_sick(cur);
@@ -2025,7 +2026,7 @@ xfs_refcount_query_range_helper(
 	xfs_failaddr_t			fa;
 
 	xfs_refcount_btrec_to_irec(rec, &irec);
-	fa = xfs_refcount_check_irec(cur->bc_ag.pag, &irec);
+	fa = xfs_refcount_check_irec(to_perag(cur->bc_group), &irec);
 	if (fa)
 		return xfs_refcount_complain_bad_rec(cur, fa, &irec);
 
diff --git a/libxfs/xfs_refcount_btree.c b/libxfs/xfs_refcount_btree.c
index e9c4fc419a6114..efb9af710f7361 100644
--- a/libxfs/xfs_refcount_btree.c
+++ b/libxfs/xfs_refcount_btree.c
@@ -29,7 +29,7 @@ xfs_refcountbt_dup_cursor(
 	struct xfs_btree_cur	*cur)
 {
 	return xfs_refcountbt_init_cursor(cur->bc_mp, cur->bc_tp,
-			cur->bc_ag.agbp, cur->bc_ag.pag);
+			cur->bc_ag.agbp, to_perag(cur->bc_group));
 }
 
 STATIC void
@@ -67,7 +67,7 @@ xfs_refcountbt_alloc_block(
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
-	args.pag = cur->bc_ag.pag;
+	args.pag = to_perag(cur->bc_group);
 	args.oinfo = XFS_RMAP_OINFO_REFC;
 	args.minlen = args.maxlen = args.prod = 1;
 	args.resv = XFS_AG_RESV_METADATA;
@@ -80,7 +80,7 @@ xfs_refcountbt_alloc_block(
 		*stat = 0;
 		return 0;
 	}
-	ASSERT(args.agno == pag_agno(cur->bc_ag.pag));
+	ASSERT(args.agno == cur->bc_group->xg_gno);
 	ASSERT(args.len == 1);
 
 	new->s = cpu_to_be32(args.agbno);
@@ -168,7 +168,7 @@ xfs_refcountbt_init_ptr_from_cur(
 {
 	struct xfs_agf		*agf = cur->bc_ag.agbp->b_addr;
 
-	ASSERT(pag_agno(cur->bc_ag.pag) == be32_to_cpu(agf->agf_seqno));
+	ASSERT(cur->bc_group->xg_gno == be32_to_cpu(agf->agf_seqno));
 
 	ptr->s = agf->agf_refcount_root;
 }
@@ -364,7 +364,7 @@ xfs_refcountbt_init_cursor(
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_refcountbt_ops,
 			mp->m_refc_maxlevels, xfs_refcountbt_cur_cache);
-	cur->bc_ag.pag = xfs_perag_hold(pag);
+	cur->bc_group = xfs_group_hold(pag_group(pag));
 	cur->bc_refc.nr_ops = 0;
 	cur->bc_refc.shape_changes = 0;
 	cur->bc_ag.agbp = agbp;
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index e13f4aa7e99538..07bcdf82d10081 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -268,9 +268,7 @@ xfs_rmap_check_btrec(
 	struct xfs_btree_cur		*cur,
 	const struct xfs_rmap_irec	*irec)
 {
-	if (xfs_btree_is_mem_rmap(cur->bc_ops))
-		return xfs_rmap_check_irec(cur->bc_mem.pag, irec);
-	return xfs_rmap_check_irec(cur->bc_ag.pag, irec);
+	return xfs_rmap_check_irec(to_perag(cur->bc_group), irec);
 }
 
 static inline int
@@ -287,7 +285,7 @@ xfs_rmap_complain_bad_rec(
 	else
 		xfs_warn(mp,
  "Reverse Mapping BTree record corruption in AG %d detected at %pS!",
-			pag_agno(cur->bc_ag.pag), fa);
+			cur->bc_group->xg_gno, fa);
 	xfs_warn(mp,
 		"Owner 0x%llx, flags 0x%x, start block 0x%x block count 0x%x",
 		irec->rm_owner, irec->rm_flags, irec->rm_startblock,
@@ -2587,7 +2585,7 @@ xfs_rmap_finish_one(
 	 * If we haven't gotten a cursor or the cursor AG doesn't match
 	 * the startblock, get one now.
 	 */
-	if (rcur != NULL && rcur->bc_ag.pag != ri->ri_pag) {
+	if (rcur != NULL && to_perag(rcur->bc_group) != ri->ri_pag) {
 		xfs_btree_del_cursor(rcur, 0);
 		rcur = NULL;
 		*pcur = NULL;
diff --git a/libxfs/xfs_rmap_btree.c b/libxfs/xfs_rmap_btree.c
index 5829e68790314a..f4da35050d2635 100644
--- a/libxfs/xfs_rmap_btree.c
+++ b/libxfs/xfs_rmap_btree.c
@@ -56,7 +56,7 @@ xfs_rmapbt_dup_cursor(
 	struct xfs_btree_cur	*cur)
 {
 	return xfs_rmapbt_init_cursor(cur->bc_mp, cur->bc_tp,
-				cur->bc_ag.agbp, cur->bc_ag.pag);
+				cur->bc_ag.agbp, to_perag(cur->bc_group));
 }
 
 STATIC void
@@ -65,14 +65,15 @@ xfs_rmapbt_set_root(
 	const union xfs_btree_ptr	*ptr,
 	int				inc)
 {
-	struct xfs_buf		*agbp = cur->bc_ag.agbp;
-	struct xfs_agf		*agf = agbp->b_addr;
+	struct xfs_buf			*agbp = cur->bc_ag.agbp;
+	struct xfs_agf			*agf = agbp->b_addr;
+	struct xfs_perag		*pag = to_perag(cur->bc_group);
 
 	ASSERT(ptr->s != 0);
 
 	agf->agf_rmap_root = ptr->s;
 	be32_add_cpu(&agf->agf_rmap_level, inc);
-	cur->bc_ag.pag->pagf_rmap_level += inc;
+	pag->pagf_rmap_level += inc;
 
 	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
 }
@@ -86,7 +87,7 @@ xfs_rmapbt_alloc_block(
 {
 	struct xfs_buf		*agbp = cur->bc_ag.agbp;
 	struct xfs_agf		*agf = agbp->b_addr;
-	struct xfs_perag	*pag = cur->bc_ag.pag;
+	struct xfs_perag	*pag = to_perag(cur->bc_group);
 	struct xfs_alloc_arg    args = { .len = 1 };
 	int			error;
 	xfs_agblock_t		bno;
@@ -124,7 +125,7 @@ xfs_rmapbt_free_block(
 {
 	struct xfs_buf		*agbp = cur->bc_ag.agbp;
 	struct xfs_agf		*agf = agbp->b_addr;
-	struct xfs_perag	*pag = cur->bc_ag.pag;
+	struct xfs_perag	*pag = to_perag(cur->bc_group);
 	xfs_agblock_t		bno;
 	int			error;
 
@@ -226,7 +227,7 @@ xfs_rmapbt_init_ptr_from_cur(
 {
 	struct xfs_agf		*agf = cur->bc_ag.agbp->b_addr;
 
-	ASSERT(pag_agno(cur->bc_ag.pag) == be32_to_cpu(agf->agf_seqno));
+	ASSERT(cur->bc_group->xg_gno == be32_to_cpu(agf->agf_seqno));
 
 	ptr->s = agf->agf_rmap_root;
 }
@@ -537,7 +538,7 @@ xfs_rmapbt_init_cursor(
 
 	cur = xfs_btree_alloc_cursor(mp, tp, &xfs_rmapbt_ops,
 			mp->m_rmap_maxlevels, xfs_rmapbt_cur_cache);
-	cur->bc_ag.pag = xfs_perag_hold(pag);
+	cur->bc_group = xfs_group_hold(pag_group(pag));
 	cur->bc_ag.agbp = agbp;
 	if (agbp) {
 		struct xfs_agf		*agf = agbp->b_addr;
@@ -652,7 +653,7 @@ xfs_rmapbt_mem_cursor(
 	cur->bc_mem.xfbtree = xfbt;
 	cur->bc_nlevels = xfbt->nlevels;
 
-	cur->bc_mem.pag = xfs_perag_hold(pag);
+	cur->bc_group = xfs_group_hold(pag_group(pag));
 	return cur;
 }
 
diff --git a/repair/agbtree.c b/repair/agbtree.c
index 67baa1acba9907..42bfeee9eafbcb 100644
--- a/repair/agbtree.c
+++ b/repair/agbtree.c
@@ -202,7 +202,7 @@ get_bno_rec(
 	struct xfs_btree_cur	*cur,
 	struct extent_tree_node	*prev_value)
 {
-	xfs_agnumber_t		agno = pag_agno(cur->bc_ag.pag);
+	xfs_agnumber_t		agno = cur->bc_group->xg_gno;
 
 	if (xfs_btree_is_bno(cur->bc_ops)) {
 		if (!prev_value)
@@ -376,7 +376,7 @@ get_ino_rec(
 	struct xfs_btree_cur	*cur,
 	struct ino_tree_node	*prev_value)
 {
-	xfs_agnumber_t		agno = pag_agno(cur->bc_ag.pag);
+	xfs_agnumber_t		agno = cur->bc_group->xg_gno;
 
 	if (xfs_btree_is_ino(cur->bc_ops)) {
 		if (!prev_value)
@@ -614,7 +614,7 @@ get_rmapbt_records(
 		if (ret == 0)
 			do_error(
  _("ran out of records while rebuilding AG %u rmap btree\n"),
-					pag_agno(cur->bc_ag.pag));
+					cur->bc_group->xg_gno);
 
 		block_rec = libxfs_btree_rec_addr(cur, idx, block);
 		cur->bc_ops->init_rec_from_cur(cur, block_rec);
diff --git a/repair/bmap_repair.c b/repair/bmap_repair.c
index 3a214c85a1de5f..f052b5dcddff08 100644
--- a/repair/bmap_repair.c
+++ b/repair/bmap_repair.c
@@ -104,7 +104,7 @@ xrep_bmap_check_fork_rmap(
 		return EFSCORRUPTED;
 
 	/* Check that this is within the AG. */
-	if (!xfs_verify_agbext(cur->bc_ag.pag, rec->rm_startblock,
+	if (!xfs_verify_agbext(to_perag(cur->bc_group), rec->rm_startblock,
 				rec->rm_blockcount))
 		return EFSCORRUPTED;
 
@@ -154,7 +154,7 @@ xrep_bmap_walk_rmap(
 	    !(rec->rm_flags & XFS_RMAP_ATTR_FORK))
 		return 0;
 
-	fsbno = xfs_agbno_to_fsb(cur->bc_ag.pag, rec->rm_startblock);
+	fsbno = xfs_agbno_to_fsb(to_perag(cur->bc_group), rec->rm_startblock);
 
 	if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) {
 		rb->old_bmbt_block_count += rec->rm_blockcount;


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 23/36] xfs: add group based bno conversion helpers
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (21 preceding siblings ...)
  2024-12-23 21:42   ` [PATCH 22/36] xfs: add a generic group pointer to the btree cursor Darrick J. Wong
@ 2024-12-23 21:43   ` Darrick J. Wong
  2024-12-23 21:43   ` [PATCH 24/36] xfs: store a generic group structure in the intents Darrick J. Wong
                     ` (12 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:43 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 759cc1989a53024066b0f2ea52c206b4ff8f522c

Add/move the blocks, blklog and blkmask fields to the generic groups
structure so that code can work with AGs and RTGs by just using the
right index into the array.

Then, add convenience helpers to convert block numbers based on the
generic group.  This will allow writing code that doesn't care if it is
used on AGs or the upcoming realtime groups.
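
To make the arithmetic concrete, here is a small self-contained sketch
of how the three fields relate and how a group-relative block number
could be packed into, and unpacked from, a filesystem block number.
The demo_* names are hypothetical and the code is a simplification
under the assumption that blklog is the group size rounded up to a
power of two; it does not model the pre-rtgroups special case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-type geometry, mirroring blocks/blklog/blkmask. */
struct demo_groups {
	uint32_t	blocks;		/* capacity of each group */
	uint8_t		blklog;		/* log2 of logical group size */
	uint64_t	blkmask;	/* extracts group-relative bno */
};

/* Derive blklog and blkmask from the group size in blocks. */
static void
demo_groups_init(struct demo_groups *g, uint32_t blocks)
{
	uint8_t		log = 0;

	/* Round up to the next power of two. */
	while (((uint64_t)1 << log) < blocks)
		log++;

	g->blocks = blocks;
	g->blklog = log;
	g->blkmask = ((uint64_t)1 << log) - 1;
}

/* Pack (group number, group block number) into one 64-bit value. */
static uint64_t
demo_gbno_to_fsb(const struct demo_groups *g, uint32_t gno, uint32_t gbno)
{
	return ((uint64_t)gno << g->blklog) | gbno;
}

/* Extract the group number. */
static uint32_t
demo_fsb_to_gno(const struct demo_groups *g, uint64_t fsb)
{
	return (uint32_t)(fsb >> g->blklog);
}

/* Extract the group-relative block number. */
static uint32_t
demo_fsb_to_gbno(const struct demo_groups *g, uint64_t fsb)
{
	return (uint32_t)(fsb & g->blkmask);
}
```

For a group of 1000 blocks, blklog is 10 and blkmask is 1023, so block 5
of group 2 maps to 2 * 1024 + 5 = 2053, and block numbers 1000 to 1023
of each group are an unused hole in the address space.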

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_mount.h      |   26 +++++++++++++++++++++
 libxfs/libxfs_api_defs.h |    1 +
 libxfs/xfs_group.c       |    9 +++++++
 libxfs/xfs_group.h       |   56 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_sb.c          |    7 ++++++
 repair/bmap_repair.c     |    2 +-
 6 files changed, 100 insertions(+), 1 deletion(-)


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 2102009aa8df73..1d36e3986ead2f 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -26,6 +26,32 @@ enum {
 
 struct xfs_groups {
 	struct xarray		xa;
+
+	/*
+	 * Maximum capacity of the group in FSBs.
+	 *
+	 * Each group is laid out densely in the daddr space.  For the
+	 * degenerate case of a pre-rtgroups filesystem, the incore rtgroup
+	 * pretends to have a zero-block and zero-blklog rtgroup.
+	 */
+	uint32_t		blocks;
+
+	/*
+	 * Log(2) of the logical size of each group.
+	 *
+	 * Compared to the blocks field above this is rounded up to the next
+	 * power of two, and thus lays out the xfs_fsblock_t/xfs_rtblock_t
+	 * space sparsely with a hole from blocks to (1 << blklog) at the end
+	 * of each group.
+	 */
+	uint8_t			blklog;
+
+	/*
+	 * Mask to extract the group-relative block number from a FSB.
+	 * For a pre-rtgroups filesystem we pretend to have one very large
+	 * rtgroup, so this mask must be 64-bit.
+	 */
+	uint64_t		blkmask;
 };
 
 /*
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 8f3e9e8694675d..483a7a9a4cbf45 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -168,6 +168,7 @@
 #define xfs_free_extent_later		libxfs_free_extent_later
 #define xfs_free_perag_range		libxfs_free_perag_range
 #define xfs_fs_geometry			libxfs_fs_geometry
+#define xfs_gbno_to_fsb			libxfs_gbno_to_fsb
 #define xfs_get_initial_prid		libxfs_get_initial_prid
 #define xfs_highbit32			libxfs_highbit32
 #define xfs_highbit64			libxfs_highbit64
diff --git a/libxfs/xfs_group.c b/libxfs/xfs_group.c
index 9c7fa99d00b802..80b6993cc9916e 100644
--- a/libxfs/xfs_group.c
+++ b/libxfs/xfs_group.c
@@ -212,3 +212,12 @@ xfs_group_insert(
 #endif
 	return error;
 }
+
+struct xfs_group *
+xfs_group_get_by_fsb(
+	struct xfs_mount	*mp,
+	xfs_fsblock_t		fsbno,
+	enum xfs_group_type	type)
+{
+	return xfs_group_get(mp, xfs_fsb_to_gno(mp, fsbno, type), type);
+}
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index 0ff6e1d5635cb1..5b7362277c3f7a 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -46,6 +46,8 @@ struct xfs_group {
 
 struct xfs_group *xfs_group_get(struct xfs_mount *mp, uint32_t index,
 		enum xfs_group_type type);
+struct xfs_group *xfs_group_get_by_fsb(struct xfs_mount *mp,
+		xfs_fsblock_t fsbno, enum xfs_group_type type);
 struct xfs_group *xfs_group_hold(struct xfs_group *xg);
 void xfs_group_put(struct xfs_group *xg);
 
@@ -72,4 +74,58 @@ int xfs_group_insert(struct xfs_mount *mp, struct xfs_group *xg,
 #define xfs_group_marked(_mp, _type, _mark) \
 	xa_marked(&(_mp)->m_groups[(_type)].xa, (_mark))
 
+static inline xfs_agblock_t
+xfs_group_max_blocks(
+	struct xfs_group	*xg)
+{
+	return xg->xg_mount->m_groups[xg->xg_type].blocks;
+}
+
+static inline xfs_fsblock_t
+xfs_group_start_fsb(
+	struct xfs_group	*xg)
+{
+	return ((xfs_fsblock_t)xg->xg_gno) <<
+		xg->xg_mount->m_groups[xg->xg_type].blklog;
+}
+
+static inline xfs_fsblock_t
+xfs_gbno_to_fsb(
+	struct xfs_group	*xg,
+	xfs_agblock_t		gbno)
+{
+	return xfs_group_start_fsb(xg) | gbno;
+}
+
+static inline xfs_daddr_t
+xfs_gbno_to_daddr(
+	struct xfs_group	*xg,
+	xfs_agblock_t		gbno)
+{
+	struct xfs_mount	*mp = xg->xg_mount;
+	uint32_t		blocks = mp->m_groups[xg->xg_type].blocks;
+
+	return XFS_FSB_TO_BB(mp, (xfs_fsblock_t)xg->xg_gno * blocks + gbno);
+}
+
+static inline uint32_t
+xfs_fsb_to_gno(
+	struct xfs_mount	*mp,
+	xfs_fsblock_t		fsbno,
+	enum xfs_group_type	type)
+{
+	if (!mp->m_groups[type].blklog)
+		return 0;
+	return fsbno >> mp->m_groups[type].blklog;
+}
+
+static inline xfs_agblock_t
+xfs_fsb_to_gbno(
+	struct xfs_mount	*mp,
+	xfs_fsblock_t		fsbno,
+	enum xfs_group_type	type)
+{
+	return fsbno & mp->m_groups[type].blkmask;
+}
+
 #endif /* __LIBXFS_GROUP_H */
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index d32f789037389f..9120a3377735b0 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -999,6 +999,8 @@ xfs_sb_mount_common(
 	struct xfs_mount	*mp,
 	struct xfs_sb		*sbp)
 {
+	struct xfs_groups	*ags = &mp->m_groups[XG_TYPE_AG];
+
 	mp->m_agfrotor = 0;
 	atomic_set(&mp->m_agirotor, 0);
 	mp->m_maxagi = mp->m_sb.sb_agcount;
@@ -1009,6 +1011,11 @@ xfs_sb_mount_common(
 	mp->m_blockmask = sbp->sb_blocksize - 1;
 	mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG;
 	mp->m_blockwmask = mp->m_blockwsize - 1;
+
+	ags->blocks = mp->m_sb.sb_agblocks;
+	ags->blklog = mp->m_sb.sb_agblklog;
+	ags->blkmask = xfs_mask32lo(mp->m_sb.sb_agblklog);
+
 	xfs_mount_sb_set_rextsize(mp, sbp);
 
 	mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, true);
diff --git a/repair/bmap_repair.c b/repair/bmap_repair.c
index f052b5dcddff08..7e7c2a39f5724b 100644
--- a/repair/bmap_repair.c
+++ b/repair/bmap_repair.c
@@ -154,7 +154,7 @@ xrep_bmap_walk_rmap(
 	    !(rec->rm_flags & XFS_RMAP_ATTR_FORK))
 		return 0;
 
-	fsbno = xfs_agbno_to_fsb(to_perag(cur->bc_group), rec->rm_startblock);
+	fsbno = libxfs_gbno_to_fsb(cur->bc_group, rec->rm_startblock);
 
 	if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) {
 		rb->old_bmbt_block_count += rec->rm_blockcount;



* [PATCH 24/36] xfs: store a generic group structure in the intents
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (22 preceding siblings ...)
  2024-12-23 21:43   ` [PATCH 23/36] xfs: add group based bno conversion helpers Darrick J. Wong
@ 2024-12-23 21:43   ` Darrick J. Wong
  2024-12-23 21:43   ` [PATCH 25/36] xfs: constify the xfs_sb predicates Darrick J. Wong
                     ` (11 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:43 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: e5e5cae05b71aa5b5e291c0e74b4e4d98a0b05d4

Replace the pag pointers in the extent free, bmap, rmap and refcount
intent structures with a pointer to the generic group to prepare
for adding intents for realtime groups.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_mount.h   |   11 ++++++-----
 libxfs/defer_item.c   |   31 ++++++++++++++++++-------------
 libxfs/xfs_alloc.h    |    2 +-
 libxfs/xfs_bmap.h     |    2 +-
 libxfs/xfs_refcount.c |    9 +++++----
 libxfs/xfs_refcount.h |    2 +-
 libxfs/xfs_rmap.c     |   16 +++++++++-------
 libxfs/xfs_rmap.h     |    2 +-
 8 files changed, 42 insertions(+), 33 deletions(-)
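
Reader's note: the pattern this patch applies is that each intent holds
a pointer to the generic group, and code that needs AG-specific state
downcasts at the point of use.  The sketch below illustrates that
pattern in isolation.  Every demo_* name is hypothetical, and the
three-way comparison is a substitution made for this sketch; the diff
items in the patch subtract the two xg_gno values directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the generic group object. */
struct demo_group {
	uint32_t		gno;		/* group number, like xg_gno */
};

/* Hypothetical AG subclass that embeds the generic group. */
struct demo_perag {
	struct demo_group	group;		/* embedded base object */
	uint32_t		free_blocks;	/* AG-only state */
};

/* Downcast from the embedded base, in the spirit of to_perag(). */
static inline struct demo_perag *
demo_to_perag(struct demo_group *xg)
{
	return (struct demo_perag *)((char *)xg -
			offsetof(struct demo_perag, group));
}

/* An intent now records the generic group, not the perag. */
struct demo_intent {
	struct demo_group	*group;
	uint64_t		startblock;
};

/* Order intents by group number; returns <0, 0 or >0. */
static inline int
demo_intent_cmp(const struct demo_intent *a, const struct demo_intent *b)
{
	return (a->group->gno > b->group->gno) -
	       (a->group->gno < b->group->gno);
}
```

Sorting and reference handling only ever touch demo_group, so the same
intent code could later serve a second group type; only the places that
call demo_to_perag() remain tied to AGs.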


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 1d36e3986ead2f..1179a80d9df94e 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -11,6 +11,7 @@ struct xfs_inode;
 struct xfs_buftarg;
 struct xfs_da_geometry;
 struct libxfs_init;
+struct xfs_group;
 
 typedef void (*buf_writeback_fn)(struct xfs_buf *bp);
 
@@ -328,12 +329,12 @@ struct xfs_defer_drain { /* empty */ };
 #define xfs_defer_drain_init(dr)		((void)0)
 #define xfs_defer_drain_free(dr)		((void)0)
 
-#define xfs_perag_intent_get(mp, agno) \
-	xfs_perag_get((mp), XFS_FSB_TO_AGNO((mp), (agno)))
-#define xfs_perag_intent_put(pag)		xfs_perag_put(pag)
+#define xfs_group_intent_get(mp, fsbno, type) \
+	xfs_group_get_by_fsb((mp), (fsbno), (type))
+#define xfs_group_intent_put(xg)		xfs_group_put(xg)
 
-static inline void xfs_perag_intent_hold(struct xfs_perag *pag) {}
-static inline void xfs_perag_intent_rele(struct xfs_perag *pag) {}
+static inline void xfs_group_intent_hold(struct xfs_group *xg) {}
+static inline void xfs_group_intent_rele(struct xfs_group *xg) {}
 
 static inline void libxfs_buftarg_drain(struct xfs_buftarg *btp)
 {
diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index f0f35361a0ff97..eee84ffbe625d5 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -28,6 +28,7 @@
 #include "xfs_ag.h"
 #include "xfs_exchmaps.h"
 #include "defer_item.h"
+#include "xfs_group.h"
 
 /* Dummy defer item ops, since we don't do logging. */
 
@@ -48,7 +49,7 @@ xfs_extent_free_diff_items(
 	struct xfs_extent_free_item	*ra = xefi_entry(a);
 	struct xfs_extent_free_item	*rb = xefi_entry(b);
 
-	return pag_agno(ra->xefi_pag) - pag_agno(rb->xefi_pag);
+	return ra->xefi_group->xg_gno - rb->xefi_group->xg_gno;
 }
 
 /* Get an EFI. */
@@ -85,7 +86,8 @@ xfs_extent_free_defer_add(
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 
-	xefi->xefi_pag = xfs_perag_intent_get(mp, xefi->xefi_startblock);
+	xefi->xefi_group = xfs_group_intent_get(mp, xefi->xefi_startblock,
+			XG_TYPE_AG);
 	if (xefi->xefi_agresv == XFS_AG_RESV_AGFL)
 		*dfpp = xfs_defer_add(tp, &xefi->xefi_list,
 				&xfs_agfl_free_defer_type);
@@ -101,7 +103,7 @@ xfs_extent_free_cancel_item(
 {
 	struct xfs_extent_free_item	*xefi = xefi_entry(item);
 
-	xfs_perag_intent_put(xefi->xefi_pag);
+	xfs_group_intent_put(xefi->xefi_group);
 	kmem_cache_free(xfs_extfree_item_cache, xefi);
 }
 
@@ -127,7 +129,7 @@ xfs_extent_free_finish_item(
 	agbno = XFS_FSB_TO_AGBNO(tp->t_mountp, xefi->xefi_startblock);
 
 	if (!(xefi->xefi_flags & XFS_EFI_CANCELLED)) {
-		error = xfs_free_extent(tp, xefi->xefi_pag, agbno,
+		error = xfs_free_extent(tp, to_perag(xefi->xefi_group), agbno,
 				xefi->xefi_blockcount, &oinfo,
 				XFS_AG_RESV_NONE);
 	}
@@ -179,7 +181,7 @@ xfs_agfl_free_finish_item(
 	agbno = XFS_FSB_TO_AGBNO(mp, xefi->xefi_startblock);
 	oinfo.oi_owner = xefi->xefi_owner;
 
-	error = xfs_alloc_read_agf(xefi->xefi_pag, tp, 0, &agbp);
+	error = xfs_alloc_read_agf(to_perag(xefi->xefi_group), tp, 0, &agbp);
 	if (!error)
 		error = xfs_free_ag_extent(tp, agbp, agbno, 1, &oinfo,
 				XFS_AG_RESV_AGFL);
@@ -215,7 +217,7 @@ xfs_rmap_update_diff_items(
 	struct xfs_rmap_intent		*ra = ri_entry(a);
 	struct xfs_rmap_intent		*rb = ri_entry(b);
 
-	return pag_agno(ra->ri_pag) - pag_agno(rb->ri_pag);
+	return ra->ri_group->xg_gno - rb->ri_group->xg_gno;
 }
 
 /* Get an RUI. */
@@ -253,7 +255,8 @@ xfs_rmap_defer_add(
 
 	trace_xfs_rmap_defer(mp, ri);
 
-	ri->ri_pag = xfs_perag_intent_get(mp, ri->ri_bmap.br_startblock);
+	ri->ri_group = xfs_group_intent_get(mp, ri->ri_bmap.br_startblock,
+			XG_TYPE_AG);
 	xfs_defer_add(tp, &ri->ri_list, &xfs_rmap_update_defer_type);
 }
 
@@ -264,7 +267,7 @@ xfs_rmap_update_cancel_item(
 {
 	struct xfs_rmap_intent		*ri = ri_entry(item);
 
-	xfs_perag_intent_put(ri->ri_pag);
+	xfs_group_intent_put(ri->ri_group);
 	kmem_cache_free(xfs_rmap_intent_cache, ri);
 }
 
@@ -336,7 +339,7 @@ xfs_refcount_update_diff_items(
 	struct xfs_refcount_intent	*ra = ci_entry(a);
 	struct xfs_refcount_intent	*rb = ci_entry(b);
 
-	return pag_agno(ra->ri_pag) - pag_agno(rb->ri_pag);
+	return ra->ri_group->xg_gno - rb->ri_group->xg_gno;
 }
 
 /* Get an CUI. */
@@ -374,7 +377,8 @@ xfs_refcount_defer_add(
 
 	trace_xfs_refcount_defer(mp, ri);
 
-	ri->ri_pag = xfs_perag_intent_get(mp, ri->ri_startblock);
+	ri->ri_group = xfs_group_intent_get(mp, ri->ri_startblock,
+			XG_TYPE_AG);
 	xfs_defer_add(tp, &ri->ri_list, &xfs_refcount_update_defer_type);
 }
 
@@ -385,7 +389,7 @@ xfs_refcount_update_cancel_item(
 {
 	struct xfs_refcount_intent	*ri = ci_entry(item);
 
-	xfs_perag_intent_put(ri->ri_pag);
+	xfs_group_intent_put(ri->ri_group);
 	kmem_cache_free(xfs_refcount_intent_cache, ri);
 }
 
@@ -508,7 +512,8 @@ xfs_bmap_update_get_group(
 	 * intent drops the intent count, ensuring that the intent count
 	 * remains nonzero across the transaction roll.
 	 */
-	bi->bi_pag = xfs_perag_intent_get(mp, bi->bi_bmap.br_startblock);
+	bi->bi_group = xfs_group_intent_get(mp, bi->bi_bmap.br_startblock,
+			XG_TYPE_AG);
 }
 
 /* Add this deferred BUI to the transaction. */
@@ -542,7 +547,7 @@ xfs_bmap_update_put_group(
 	if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork))
 		return;
 
-	xfs_perag_intent_put(bi->bi_pag);
+	xfs_group_intent_put(bi->bi_group);
 }
 
 /* Cancel a deferred bmap update. */
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 88fbce5001185f..efbde04fbbb15f 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -248,7 +248,7 @@ struct xfs_extent_free_item {
 	uint64_t		xefi_owner;
 	xfs_fsblock_t		xefi_startblock;/* starting fs block number */
 	xfs_extlen_t		xefi_blockcount;/* number of blocks in extent */
-	struct xfs_perag	*xefi_pag;
+	struct xfs_group	*xefi_group;
 	unsigned int		xefi_flags;
 	enum xfs_ag_resv_type	xefi_agresv;
 };
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index 7592d46e97c661..4b721d9359943b 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -248,7 +248,7 @@ struct xfs_bmap_intent {
 	enum xfs_bmap_intent_type		bi_type;
 	int					bi_whichfork;
 	struct xfs_inode			*bi_owner;
-	struct xfs_perag			*bi_pag;
+	struct xfs_group			*bi_group;
 	struct xfs_bmbt_irec			bi_bmap;
 };
 
diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 3eccb998545d6f..709e2a94176da5 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -1357,7 +1357,7 @@ xfs_refcount_finish_one(
 	 * If we haven't gotten a cursor or the cursor AG doesn't match
 	 * the startblock, get one now.
 	 */
-	if (rcur != NULL && to_perag(rcur->bc_group) != ri->ri_pag) {
+	if (rcur != NULL && rcur->bc_group != ri->ri_group) {
 		nr_ops = rcur->bc_refc.nr_ops;
 		shape_changes = rcur->bc_refc.shape_changes;
 		xfs_btree_del_cursor(rcur, 0);
@@ -1365,13 +1365,14 @@ xfs_refcount_finish_one(
 		*pcur = NULL;
 	}
 	if (rcur == NULL) {
-		error = xfs_alloc_read_agf(ri->ri_pag, tp,
+		struct xfs_perag	*pag = to_perag(ri->ri_group);
+
+		error = xfs_alloc_read_agf(pag, tp,
 				XFS_ALLOC_FLAG_FREEING, &agbp);
 		if (error)
 			return error;
 
-		*pcur = rcur = xfs_refcountbt_init_cursor(mp, tp, agbp,
-							  ri->ri_pag);
+		*pcur = rcur = xfs_refcountbt_init_cursor(mp, tp, agbp, pag);
 		rcur->bc_refc.nr_ops = nr_ops;
 		rcur->bc_refc.shape_changes = shape_changes;
 	}
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index 68acb0b1b4a878..62d78afcf1f3ff 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -56,7 +56,7 @@ enum xfs_refcount_intent_type {
 
 struct xfs_refcount_intent {
 	struct list_head			ri_list;
-	struct xfs_perag			*ri_pag;
+	struct xfs_group			*ri_group;
 	enum xfs_refcount_intent_type		ri_type;
 	xfs_extlen_t				ri_blockcount;
 	xfs_fsblock_t				ri_startblock;
diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 07bcdf82d10081..dabc7003586ce4 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -2585,28 +2585,30 @@ xfs_rmap_finish_one(
 	 * If we haven't gotten a cursor or the cursor AG doesn't match
 	 * the startblock, get one now.
 	 */
-	if (rcur != NULL && to_perag(rcur->bc_group) != ri->ri_pag) {
+	if (rcur != NULL && rcur->bc_group != ri->ri_group) {
 		xfs_btree_del_cursor(rcur, 0);
 		rcur = NULL;
 		*pcur = NULL;
 	}
 	if (rcur == NULL) {
+		struct xfs_perag	*pag = to_perag(ri->ri_group);
+
 		/*
 		 * Refresh the freelist before we start changing the
 		 * rmapbt, because a shape change could cause us to
 		 * allocate blocks.
 		 */
-		error = xfs_free_extent_fix_freelist(tp, ri->ri_pag, &agbp);
+		error = xfs_free_extent_fix_freelist(tp, pag, &agbp);
 		if (error) {
-			xfs_ag_mark_sick(ri->ri_pag, XFS_SICK_AG_AGFL);
+			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL);
 			return error;
 		}
 		if (XFS_IS_CORRUPT(tp->t_mountp, !agbp)) {
-			xfs_ag_mark_sick(ri->ri_pag, XFS_SICK_AG_AGFL);
+			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGFL);
 			return -EFSCORRUPTED;
 		}
 
-		*pcur = rcur = xfs_rmapbt_init_cursor(mp, tp, agbp, ri->ri_pag);
+		*pcur = rcur = xfs_rmapbt_init_cursor(mp, tp, agbp, pag);
 	}
 
 	xfs_rmap_ino_owner(&oinfo, ri->ri_owner, ri->ri_whichfork,
@@ -2619,8 +2621,8 @@ xfs_rmap_finish_one(
 	if (error)
 		return error;
 
-	xfs_rmap_update_hook(tp, pag_group(ri->ri_pag), ri->ri_type, bno,
-			     ri->ri_bmap.br_blockcount, unwritten, &oinfo);
+	xfs_rmap_update_hook(tp, ri->ri_group, ri->ri_type, bno,
+			ri->ri_bmap.br_blockcount, unwritten, &oinfo);
 	return 0;
 }
 
diff --git a/libxfs/xfs_rmap.h b/libxfs/xfs_rmap.h
index d409b463bc6662..96b4321d831007 100644
--- a/libxfs/xfs_rmap.h
+++ b/libxfs/xfs_rmap.h
@@ -173,7 +173,7 @@ struct xfs_rmap_intent {
 	int					ri_whichfork;
 	uint64_t				ri_owner;
 	struct xfs_bmbt_irec			ri_bmap;
-	struct xfs_perag			*ri_pag;
+	struct xfs_group			*ri_group;
 };
 
 /* functions for updating the rmapbt based on bmbt map/unmap operations */



* [PATCH 25/36] xfs: constify the xfs_sb predicates
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (23 preceding siblings ...)
  2024-12-23 21:43   ` [PATCH 24/36] xfs: store a generic group structure in the intents Darrick J. Wong
@ 2024-12-23 21:43   ` Darrick J. Wong
  2024-12-23 21:43   ` [PATCH 26/36] xfs: rename metadata inode predicates Darrick J. Wong
                     ` (10 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:43 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 8d939f4bd7b225d8b157b1329881d2719c0ecb29

Change the xfs_sb predicates to take a const struct xfs_sb pointer
because they do not change the superblock.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)
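
Reader's note: a short sketch of what the const qualifier buys callers.
The demo_* structure, constants and functions are hypothetical
simplifications, not the definitions in xfs_format.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down superblock. */
struct demo_sb {
	uint16_t	versionnum;
	uint32_t	features_incompat;
};

#define DEMO_SB_VERSION_NUMBITS	0x000f
#define DEMO_SB_VERSION_5	5

/* Read-only predicate: takes a const pointer because it never writes. */
static inline bool
demo_sb_is_v5(const struct demo_sb *sbp)
{
	return (sbp->versionnum & DEMO_SB_VERSION_NUMBITS) ==
			DEMO_SB_VERSION_5;
}

static inline bool
demo_sb_has_incompat_feature(const struct demo_sb *sbp, uint32_t feature)
{
	return (sbp->features_incompat & feature) != 0;
}

/*
 * A caller that only holds a const superblock can combine the
 * predicates without casting the qualifier away.
 */
static inline bool
demo_sb_has_v5_feature(const struct demo_sb *sbp, uint32_t feature)
{
	return demo_sb_is_v5(sbp) &&
	       demo_sb_has_incompat_feature(sbp, feature);
}
```

Had the predicates taken a plain struct demo_sb pointer, the last
function would draw a discarded-qualifier diagnostic from the compiler
or would have to drop const from its own parameter.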


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index e1bfee0c3b1a8c..a24ab46aaebc7e 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -278,7 +278,7 @@ struct xfs_dsb {
 
 #define	XFS_SB_VERSION_NUM(sbp)	((sbp)->sb_versionnum & XFS_SB_VERSION_NUMBITS)
 
-static inline bool xfs_sb_is_v5(struct xfs_sb *sbp)
+static inline bool xfs_sb_is_v5(const struct xfs_sb *sbp)
 {
 	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5;
 }
@@ -287,12 +287,12 @@ static inline bool xfs_sb_is_v5(struct xfs_sb *sbp)
  * Detect a mismatched features2 field.  Older kernels read/wrote
  * this into the wrong slot, so to be safe we keep them in sync.
  */
-static inline bool xfs_sb_has_mismatched_features2(struct xfs_sb *sbp)
+static inline bool xfs_sb_has_mismatched_features2(const struct xfs_sb *sbp)
 {
 	return sbp->sb_bad_features2 != sbp->sb_features2;
 }
 
-static inline bool xfs_sb_version_hasmorebits(struct xfs_sb *sbp)
+static inline bool xfs_sb_version_hasmorebits(const struct xfs_sb *sbp)
 {
 	return xfs_sb_is_v5(sbp) ||
 	       (sbp->sb_versionnum & XFS_SB_VERSION_MOREBITSBIT);
@@ -342,8 +342,8 @@ static inline void xfs_sb_version_addprojid32(struct xfs_sb *sbp)
 #define XFS_SB_FEAT_COMPAT_UNKNOWN	~XFS_SB_FEAT_COMPAT_ALL
 static inline bool
 xfs_sb_has_compat_feature(
-	struct xfs_sb	*sbp,
-	uint32_t	feature)
+	const struct xfs_sb	*sbp,
+	uint32_t		feature)
 {
 	return (sbp->sb_features_compat & feature) != 0;
 }
@@ -360,8 +360,8 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(
-	struct xfs_sb	*sbp,
-	uint32_t	feature)
+	const struct xfs_sb	*sbp,
+	uint32_t		feature)
 {
 	return (sbp->sb_features_ro_compat & feature) != 0;
 }
@@ -387,8 +387,8 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
 xfs_sb_has_incompat_feature(
-	struct xfs_sb	*sbp,
-	uint32_t	feature)
+	const struct xfs_sb	*sbp,
+	uint32_t		feature)
 {
 	return (sbp->sb_features_incompat & feature) != 0;
 }
@@ -399,8 +399,8 @@ xfs_sb_has_incompat_feature(
 #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
 static inline bool
 xfs_sb_has_incompat_log_feature(
-	struct xfs_sb	*sbp,
-	uint32_t	feature)
+	const struct xfs_sb	*sbp,
+	uint32_t		feature)
 {
 	return (sbp->sb_features_log_incompat & feature) != 0;
 }
@@ -420,7 +420,7 @@ xfs_sb_add_incompat_log_features(
 	sbp->sb_features_log_incompat |= features;
 }
 
-static inline bool xfs_sb_version_haslogxattrs(struct xfs_sb *sbp)
+static inline bool xfs_sb_version_haslogxattrs(const struct xfs_sb *sbp)
 {
 	return xfs_sb_is_v5(sbp) && (sbp->sb_features_log_incompat &
 		 XFS_SB_FEAT_INCOMPAT_LOG_XATTRS);



* [PATCH 26/36] xfs: rename metadata inode predicates
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (24 preceding siblings ...)
  2024-12-23 21:43   ` [PATCH 25/36] xfs: constify the xfs_sb predicates Darrick J. Wong
@ 2024-12-23 21:43   ` Darrick J. Wong
  2024-12-23 21:44   ` [PATCH 27/36] xfs: define the on-disk format for the metadir feature Darrick J. Wong
                     ` (9 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:43 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 4d272929a5258074328dae206c935634e0fd1a54

The predicate xfs_internal_inum tells us if an inumber refers to one of
the inodes rooted in the superblock.  Soon we're going to have internal
inodes in a metadata directory tree, so rename this helper to
xfs_is_sb_inum to capture its limited scope.

Ondisk inodes will soon have a flag to indicate that they're metadata
inodes.  Head off some confusion by renaming the xfs_is_metadata_inode
predicate to xfs_is_internal_inode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/libxfs_api_defs.h |    2 +-
 libxfs/xfs_types.c       |    4 ++--
 libxfs/xfs_types.h       |    2 +-
 repair/rmap.c            |    2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 483a7a9a4cbf45..92e26eebabfed8 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -199,7 +199,7 @@
 #define xfs_inode_validate_cowextsize	libxfs_inode_validate_cowextsize
 #define xfs_inode_validate_extsize	libxfs_inode_validate_extsize
 
-#define xfs_internal_inum		libxfs_internal_inum
+#define xfs_is_sb_inum			libxfs_is_sb_inum
 
 #define xfs_iread_extents		libxfs_iread_extents
 #define xfs_irele			libxfs_irele
diff --git a/libxfs/xfs_types.c b/libxfs/xfs_types.c
index 0d1b86ae59d93e..a70c0395979cf8 100644
--- a/libxfs/xfs_types.c
+++ b/libxfs/xfs_types.c
@@ -111,7 +111,7 @@ xfs_verify_ino(
 
 /* Is this an internal inode number? */
 inline bool
-xfs_internal_inum(
+xfs_is_sb_inum(
 	struct xfs_mount	*mp,
 	xfs_ino_t		ino)
 {
@@ -129,7 +129,7 @@ xfs_verify_dir_ino(
 	struct xfs_mount	*mp,
 	xfs_ino_t		ino)
 {
-	if (xfs_internal_inum(mp, ino))
+	if (xfs_is_sb_inum(mp, ino))
 		return false;
 	return xfs_verify_ino(mp, ino);
 }
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index d3cb6ff3b91301..25053a66c225ed 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -230,7 +230,7 @@ bool xfs_verify_fsbext(struct xfs_mount *mp, xfs_fsblock_t fsbno,
 		xfs_fsblock_t len);
 
 bool xfs_verify_ino(struct xfs_mount *mp, xfs_ino_t ino);
-bool xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino);
+bool xfs_is_sb_inum(struct xfs_mount *mp, xfs_ino_t ino);
 bool xfs_verify_dir_ino(struct xfs_mount *mp, xfs_ino_t ino);
 bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno);
 bool xfs_verify_rtbext(struct xfs_mount *mp, xfs_rtblock_t rtbno,
diff --git a/repair/rmap.c b/repair/rmap.c
index 29af74eee11831..3b998a22cee10d 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -814,7 +814,7 @@ rmap_shareable(
 		return false;
 
 	/* Metadata in files are never shareable */
-	if (libxfs_internal_inum(mp, rmap->rm_owner))
+	if (libxfs_is_sb_inum(mp, rmap->rm_owner))
 		return false;
 
 	/* Metadata and unwritten file blocks are not shareable. */



* [PATCH 27/36] xfs: define the on-disk format for the metadir feature
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (25 preceding siblings ...)
  2024-12-23 21:43   ` [PATCH 26/36] xfs: rename metadata inode predicates Darrick J. Wong
@ 2024-12-23 21:44   ` Darrick J. Wong
  2024-12-23 21:44   ` [PATCH 28/36] xfs: iget for metadata inodes Darrick J. Wong
                     ` (8 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:44 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 4f3d4dd1b04b2ba0bf236fbaa3c3c0c669aa5a47

Define the on-disk layout and feature flags for the metadata inode
directory feature.  Add an xfs_sb_version_hasmetadir predicate for the
benefit of xfs_repair, which needs to know where the new end of the
superblock lies.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/check.c              |    2 -
 db/inode.c              |    4 +-
 include/xfs_inode.h     |    6 +++
 include/xfs_mount.h     |    2 +
 libxfs/xfs_format.h     |   97 +++++++++++++++++++++++++++++++++++++++++------
 libxfs/xfs_inode_buf.c  |   20 +++++++---
 libxfs/xfs_inode_util.c |    2 +
 libxfs/xfs_log_format.h |    2 -
 libxfs/xfs_ondisk.h     |    2 -
 libxfs/xfs_sb.c         |   10 +++++
 repair/dino_chunks.c    |    2 -
 repair/dinode.c         |   12 +++---
 12 files changed, 132 insertions(+), 29 deletions(-)
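
Reader's note: the two format pieces that most callers will touch are
the new di_flags2 bit and the metafile type stored in the field that
used to be di_onlink.  The sketch below shows how a tool might test the
flag and print a type name.  The demo_* names are hypothetical; only the
bit position and the type strings are taken from the diff that follows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the order of enum xfs_metafile_type in this patch. */
enum demo_metafile_type {
	DEMO_METAFILE_UNKNOWN,
	DEMO_METAFILE_DIR,
	DEMO_METAFILE_USRQUOTA,
	DEMO_METAFILE_GRPQUOTA,
	DEMO_METAFILE_PRJQUOTA,
	DEMO_METAFILE_RTBITMAP,
	DEMO_METAFILE_RTSUMMARY,

	DEMO_METAFILE_MAX
};

#define DEMO_DIFLAG2_METADATA_BIT	5
/* di_flags2 is 64 bits wide, so build the mask with a 64-bit one. */
#define DEMO_DIFLAG2_METADATA	(1ULL << DEMO_DIFLAG2_METADATA_BIT)

/* Does this flags2 word mark a metadata inode? */
static inline bool
demo_is_metadir_inode(uint64_t diflags2)
{
	return (diflags2 & DEMO_DIFLAG2_METADATA) != 0;
}

/* Map a metafile type to its short name; out-of-range is "unknown". */
static inline const char *
demo_metafile_type_str(unsigned int type)
{
	static const char *const names[DEMO_METAFILE_MAX] = {
		[DEMO_METAFILE_UNKNOWN]		= "unknown",
		[DEMO_METAFILE_DIR]		= "dir",
		[DEMO_METAFILE_USRQUOTA]	= "usrquota",
		[DEMO_METAFILE_GRPQUOTA]	= "grpquota",
		[DEMO_METAFILE_PRJQUOTA]	= "prjquota",
		[DEMO_METAFILE_RTBITMAP]	= "rtbitmap",
		[DEMO_METAFILE_RTSUMMARY]	= "rtsummary",
	};

	if (type >= DEMO_METAFILE_MAX)
		return "unknown";
	return names[type];
}
```

Bounds-checking the type before indexing matters here because the value
comes straight from disk and may be corrupt.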


diff --git a/db/check.c b/db/check.c
index 0a6e5c3280e1cf..fb7b6cb41a3fbf 100644
--- a/db/check.c
+++ b/db/check.c
@@ -2823,7 +2823,7 @@ process_inode(
 		return;
 	}
 	if (dip->di_version == 1) {
-		nlink = be16_to_cpu(dip->di_onlink);
+		nlink = be16_to_cpu(dip->di_metatype);
 		prid = 0;
 	} else {
 		nlink = be32_to_cpu(dip->di_nlink);
diff --git a/db/inode.c b/db/inode.c
index 7a5f5a0cb987aa..246febb5929aa1 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -90,9 +90,9 @@ const field_t	inode_core_flds[] = {
 	{ "mode", FLDT_UINT16O, OI(COFF(mode)), C1, 0, TYP_NONE },
 	{ "version", FLDT_INT8D, OI(COFF(version)), C1, 0, TYP_NONE },
 	{ "format", FLDT_DINODE_FMT, OI(COFF(format)), C1, 0, TYP_NONE },
-	{ "nlinkv1", FLDT_UINT16D, OI(COFF(onlink)), inode_core_nlinkv1_count,
+	{ "nlinkv1", FLDT_UINT16D, OI(COFF(metatype)), inode_core_nlinkv1_count,
 	  FLD_COUNT, TYP_NONE },
-	{ "onlink", FLDT_UINT16D, OI(COFF(onlink)), inode_core_onlink_count,
+	{ "onlink", FLDT_UINT16D, OI(COFF(metatype)), inode_core_onlink_count,
 	  FLD_COUNT, TYP_NONE },
 	{ "uid", FLDT_UINT32D, OI(COFF(uid)), C1, 0, TYP_NONE },
 	{ "gid", FLDT_UINT32D, OI(COFF(gid)), C1, 0, TYP_NONE },
diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index f250102ff19d65..e03521bc9aaaa2 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -230,6 +230,7 @@ typedef struct xfs_inode {
 		uint16_t	i_flushiter;	/* incremented on flush */
 	};
 	uint8_t			i_forkoff;	/* attr fork offset >> 3 */
+	enum xfs_metafile_type	i_metatype;	/* XFS_METAFILE_* */
 	uint16_t		i_diflags;	/* XFS_DIFLAG_... */
 	uint64_t		i_diflags2;	/* XFS_DIFLAG2_... */
 	struct timespec64	i_crtime;	/* time created */
@@ -396,6 +397,11 @@ static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
 	return false;
 }
 
+static inline bool xfs_is_metadir_inode(const struct xfs_inode *ip)
+{
+	return ip->i_diflags2 & XFS_DIFLAG2_METADATA;
+}
+
 extern void	libxfs_trans_inode_alloc_buf (struct xfs_trans *,
 				struct xfs_buf *);
 
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 1179a80d9df94e..c7fada9e2a6d70 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -201,6 +201,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
 #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
 #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
+#define XFS_FEAT_METADIR	(1ULL << 28)	/* metadata directory tree */
 
 #define __XFS_HAS_FEAT(name, NAME) \
 static inline bool xfs_has_ ## name (struct xfs_mount *mp) \
@@ -246,6 +247,7 @@ __XFS_HAS_FEAT(bigtime, BIGTIME)
 __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
 __XFS_HAS_FEAT(large_extent_counts, NREXT64)
 __XFS_HAS_FEAT(exchange_range, EXCHANGE_RANGE)
+__XFS_HAS_FEAT(metadir, METADIR)
 
 /* Kernel mount features that we don't support */
 #define __XFS_UNSUPP_FEAT(name) \
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index a24ab46aaebc7e..616f81045921b7 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -174,6 +174,8 @@ typedef struct xfs_sb {
 	xfs_lsn_t	sb_lsn;		/* last write sequence */
 	uuid_t		sb_meta_uuid;	/* metadata file system unique id */
 
+	xfs_ino_t	sb_metadirino;	/* metadata directory tree root */
+
 	/* must be padded to 64 bit alignment */
 } xfs_sb_t;
 
@@ -259,6 +261,8 @@ struct xfs_dsb {
 	__be64		sb_lsn;		/* last write sequence */
 	uuid_t		sb_meta_uuid;	/* metadata file system unique id */
 
+	__be64		sb_metadirino;	/* metadata directory tree root */
+
 	/* must be padded to 64 bit alignment */
 };
 
@@ -374,6 +378,7 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_NREXT64	(1 << 5)  /* large extent counters */
 #define XFS_SB_FEAT_INCOMPAT_EXCHRANGE	(1 << 6)  /* exchangerange supported */
 #define XFS_SB_FEAT_INCOMPAT_PARENT	(1 << 7)  /* parent pointers */
+#define XFS_SB_FEAT_INCOMPAT_METADIR	(1 << 8)  /* metadata dir tree */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE | \
 		 XFS_SB_FEAT_INCOMPAT_SPINODES | \
@@ -790,6 +795,27 @@ static inline time64_t xfs_bigtime_to_unix(uint64_t ondisk_seconds)
 	return (time64_t)ondisk_seconds - XFS_BIGTIME_EPOCH_OFFSET;
 }
 
+enum xfs_metafile_type {
+	XFS_METAFILE_UNKNOWN,		/* unknown */
+	XFS_METAFILE_DIR,		/* metadir directory */
+	XFS_METAFILE_USRQUOTA,		/* user quota */
+	XFS_METAFILE_GRPQUOTA,		/* group quota */
+	XFS_METAFILE_PRJQUOTA,		/* project quota */
+	XFS_METAFILE_RTBITMAP,		/* rt bitmap */
+	XFS_METAFILE_RTSUMMARY,		/* rt summary */
+
+	XFS_METAFILE_MAX
+} __packed;
+
+#define XFS_METAFILE_TYPE_STR \
+	{ XFS_METAFILE_UNKNOWN,		"unknown" }, \
+	{ XFS_METAFILE_DIR,		"dir" }, \
+	{ XFS_METAFILE_USRQUOTA,	"usrquota" }, \
+	{ XFS_METAFILE_GRPQUOTA,	"grpquota" }, \
+	{ XFS_METAFILE_PRJQUOTA,	"prjquota" }, \
+	{ XFS_METAFILE_RTBITMAP,	"rtbitmap" }, \
+	{ XFS_METAFILE_RTSUMMARY,	"rtsummary" }
+
 /*
  * On-disk inode structure.
  *
@@ -812,7 +838,7 @@ struct xfs_dinode {
 	__be16		di_mode;	/* mode and type of file */
 	__u8		di_version;	/* inode version */
 	__u8		di_format;	/* format of di_c data */
-	__be16		di_onlink;	/* old number of links to file */
+	__be16		di_metatype;	/* XFS_METAFILE_*; was di_onlink */
 	__be32		di_uid;		/* owner's user id */
 	__be32		di_gid;		/* owner's group id */
 	__be32		di_nlink;	/* number of links to file */
@@ -1088,21 +1114,60 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
  * Values for di_flags2 These start by being exposed to userspace in the upper
  * 16 bits of the XFS_XFLAG_s range.
  */
-#define XFS_DIFLAG2_DAX_BIT	0	/* use DAX for this inode */
-#define XFS_DIFLAG2_REFLINK_BIT	1	/* file's blocks may be shared */
-#define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
-#define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
-#define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
-
-#define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
-#define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
-#define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
-#define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
-#define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
+/* use DAX for this inode */
+#define XFS_DIFLAG2_DAX_BIT		0
+
+/* file's blocks may be shared */
+#define XFS_DIFLAG2_REFLINK_BIT		1
+
+/* copy on write extent size hint */
+#define XFS_DIFLAG2_COWEXTSIZE_BIT	2
+
+/* big timestamps */
+#define XFS_DIFLAG2_BIGTIME_BIT		3
+
+/* large extent counters */
+#define XFS_DIFLAG2_NREXT64_BIT		4
+
+/*
+ * The inode contains filesystem metadata and can be found through the metadata
+ * directory tree.  Metadata inodes must satisfy the following constraints:
+ *
+ * - V5 filesystem (and ftype) are enabled;
+ * - The only valid modes are regular files and directories;
+ * - The access bits must be zero;
+ * - DMAPI event and state masks are zero;
+ * - The user and group IDs must be zero;
+ * - The project ID can be used as a u32 annotation;
+ * - The immutable, sync, noatime, nodump, nodefrag flags must be set;
+ * - The dax flag must not be set;
+ * - Directories must have nosymlinks set.
+ *
+ * These requirements are chosen defensively to minimize the ability of
+ * userspace to read or modify the contents, should a metadata file ever
+ * escape to userspace.
+ *
+ * There are further constraints on the directory tree itself:
+ *
+ * - Metadata inodes must never be resolvable through the root directory;
+ * - They must never be accessed by userspace;
+ * - Metadata directory entries must have correct ftype.
+ *
+ * Superblock-rooted metadata files must have the METADATA iflag set even
+ * though they do not have a parent directory.
+ */
+#define XFS_DIFLAG2_METADATA_BIT	5
+
+#define XFS_DIFLAG2_DAX		(1ULL << XFS_DIFLAG2_DAX_BIT)
+#define XFS_DIFLAG2_REFLINK	(1ULL << XFS_DIFLAG2_REFLINK_BIT)
+#define XFS_DIFLAG2_COWEXTSIZE	(1ULL << XFS_DIFLAG2_COWEXTSIZE_BIT)
+#define XFS_DIFLAG2_BIGTIME	(1ULL << XFS_DIFLAG2_BIGTIME_BIT)
+#define XFS_DIFLAG2_NREXT64	(1ULL << XFS_DIFLAG2_NREXT64_BIT)
+#define XFS_DIFLAG2_METADATA	(1ULL << XFS_DIFLAG2_METADATA_BIT)
 
 #define XFS_DIFLAG2_ANY \
 	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
-	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
+	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADATA)
 
 static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
 {
@@ -1117,6 +1182,12 @@ static inline bool xfs_dinode_has_large_extent_counts(
 	       (dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_NREXT64));
 }
 
+static inline bool xfs_dinode_is_metadir(const struct xfs_dinode *dip)
+{
+	return dip->di_version >= 3 &&
+	       (dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA));
+}
+
 /*
  * Inode number format:
  * low inopblog bits - offset in block
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 5970ee705bc5a2..981113f6acd37a 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -206,12 +206,15 @@ xfs_inode_from_disk(
 	 * They will also be unconditionally written back to disk as v2 inodes.
 	 */
 	if (unlikely(from->di_version == 1)) {
-		set_nlink(inode, be16_to_cpu(from->di_onlink));
+		/* di_metatype used to be di_onlink */
+		set_nlink(inode, be16_to_cpu(from->di_metatype));
 		ip->i_projid = 0;
 	} else {
 		set_nlink(inode, be32_to_cpu(from->di_nlink));
 		ip->i_projid = (prid_t)be16_to_cpu(from->di_projid_hi) << 16 |
 					be16_to_cpu(from->di_projid_lo);
+		if (xfs_dinode_is_metadir(from))
+			ip->i_metatype = be16_to_cpu(from->di_metatype);
 	}
 
 	i_uid_write(inode, be32_to_cpu(from->di_uid));
@@ -312,7 +315,10 @@ xfs_inode_to_disk(
 	struct inode		*inode = VFS_I(ip);
 
 	to->di_magic = cpu_to_be16(XFS_DINODE_MAGIC);
-	to->di_onlink = 0;
+	if (xfs_is_metadir_inode(ip))
+		to->di_metatype = cpu_to_be16(ip->i_metatype);
+	else
+		to->di_metatype = 0;
 
 	to->di_format = xfs_ifork_format(&ip->i_df);
 	to->di_uid = cpu_to_be32(i_uid_read(inode));
@@ -520,8 +526,11 @@ xfs_dinode_verify(
 	 * di_nlink==0 on a V1 inode.  V2/3 inodes would get written out with
 	 * di_onlink==0, so we can check that.
 	 */
-	if (dip->di_version >= 2) {
-		if (dip->di_onlink)
+	if (dip->di_version == 2) {
+		if (dip->di_metatype)
+			return __this_address;
+	} else if (dip->di_version >= 3) {
+		if (!xfs_dinode_is_metadir(dip) && dip->di_metatype)
 			return __this_address;
 	}
 
@@ -543,7 +552,8 @@ xfs_dinode_verify(
 			if (dip->di_nlink)
 				return __this_address;
 		} else {
-			if (dip->di_onlink)
+			/* di_metatype used to be di_onlink */
+			if (dip->di_metatype)
 				return __this_address;
 		}
 	}
diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c
index f9f16c7e2d0788..edc985eb9a4e45 100644
--- a/libxfs/xfs_inode_util.c
+++ b/libxfs/xfs_inode_util.c
@@ -221,6 +221,8 @@ xfs_inode_inherit_flags2(
 	}
 	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
 		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
+	if (xfs_is_metadir_inode(pip))
+		ip->i_diflags2 |= XFS_DIFLAG2_METADATA;
 
 	/* Don't let invalid cowextsize hints propagate. */
 	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index 3e6682ed656b30..ace7384a275bfb 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -404,7 +404,7 @@ struct xfs_log_dinode {
 	uint16_t	di_mode;	/* mode and type of file */
 	int8_t		di_version;	/* inode version */
 	int8_t		di_format;	/* format of di_c data */
-	uint8_t		di_pad3[2];	/* unused in v2/3 inodes */
+	uint16_t	di_metatype;	/* metadata type, if DIFLAG2_METADATA */
 	uint32_t	di_uid;		/* owner's user id */
 	uint32_t	di_gid;		/* owner's group id */
 	uint32_t	di_nlink;	/* number of links to file */
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 23c133fd36f5bb..8bca86e350fdc1 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -37,7 +37,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dinode,		176);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_disk_dquot,		104);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dqblk,			136);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,			264);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,			272);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dsymlink_hdr,		56);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inobt_key,		4);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inobt_rec,		16);
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 9120a3377735b0..de12edd8192e1c 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -177,6 +177,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_EXCHANGE_RANGE;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_PARENT)
 		features |= XFS_FEAT_PARENT;
+	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)
+		features |= XFS_FEAT_METADIR;
 
 	return features;
 }
@@ -700,6 +702,11 @@ __xfs_sb_from_disk(
 	/* Convert on-disk flags to in-memory flags? */
 	if (convert_xquota)
 		xfs_sb_quota_from_disk(to);
+
+	if (to->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)
+		to->sb_metadirino = be64_to_cpu(from->sb_metadirino);
+	else
+		to->sb_metadirino = NULLFSINO;
 }
 
 void
@@ -847,6 +854,9 @@ xfs_sb_to_disk(
 	to->sb_lsn = cpu_to_be64(from->sb_lsn);
 	if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_META_UUID)
 		uuid_copy(&to->sb_meta_uuid, &from->sb_meta_uuid);
+
+	if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)
+		to->sb_metadirino = cpu_to_be64(from->sb_metadirino);
 }
 
 /*
diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index 86e29dd9ae05eb..49d57948c7eca8 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -896,7 +896,7 @@ process_inode_chunk(
 			set_inode_disk_nlinks(ino_rec, irec_offset,
 				dino->di_version > 1
 					? be32_to_cpu(dino->di_nlink)
-					: be16_to_cpu(dino->di_onlink));
+					: be16_to_cpu(dino->di_metatype));
 
 		} else  {
 			set_inode_free(ino_rec, irec_offset);
diff --git a/repair/dinode.c b/repair/dinode.c
index 2c9d9acfa10be5..e217e037f8862d 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -2231,20 +2231,22 @@ process_check_inode_nlink_version(
 	int			dirty = 0;
 
 	/*
-	 * if it's a version 2 inode, it should have a zero
+	 * if it's a version 2 non-metadir inode, it should have a zero
 	 * onlink field, so clear it.
 	 */
-	if (dino->di_version > 1 && dino->di_onlink != 0) {
+	if (dino->di_version > 1 &&
+	    !(dino->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA)) &&
+	    dino->di_metatype != 0) {
 		if (!no_modify) {
 			do_warn(
 _("clearing obsolete nlink field in version 2 inode %" PRIu64 ", was %d, now 0\n"),
-				lino, be16_to_cpu(dino->di_onlink));
-			dino->di_onlink = 0;
+				lino, be16_to_cpu(dino->di_metatype));
+			dino->di_metatype = 0;
 			dirty = 1;
 		} else  {
 			do_warn(
 _("would clear obsolete nlink field in version 2 inode %" PRIu64 ", currently %d\n"),
-				lino, be16_to_cpu(dino->di_onlink));
+				lino, be16_to_cpu(dino->di_metatype));
 		}
 	}
 	return dirty;
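
A note for readers following the di_onlink -> di_metatype reuse in the patch
above: the rule added to xfs_dinode_verify() can be illustrated with the
standalone sketch below.  This is not libxfs code.  Every demo_* and DEMO_*
name is a hypothetical stand-in, and struct demo_dinode is a simplified
assumption rather than the real xfs_dinode layout.  The sketch only shows that
a nonzero metatype is tolerated solely on a v3 inode whose METADATA flag
(bit 5 of flags2, stored big-endian) is set.

```c
#include <assert.h>	/* for the usage checks described below */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins mirroring the names in the patch above. */
#define DEMO_DIFLAG2_METADATA_BIT	5
#define DEMO_DIFLAG2_METADATA		(1ULL << DEMO_DIFLAG2_METADATA_BIT)

/* Simplified assumption; NOT the real on-disk inode layout. */
struct demo_dinode {
	uint8_t		di_version;
	uint16_t	di_metatype;	/* big-endian on disk */
	uint64_t	di_flags2;	/* big-endian on disk */
};

/* Portable host-to-big-endian conversion for a 64-bit value. */
static uint64_t demo_cpu_to_be64(uint64_t v)
{
	uint8_t		b[8];
	uint64_t	out;

	for (int i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (56 - 8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Same shape as xfs_dinode_is_metadir(): v3+ inode with the flag set. */
static bool demo_dinode_is_metadir(const struct demo_dinode *dip)
{
	return dip->di_version >= 3 &&
	       (dip->di_flags2 & demo_cpu_to_be64(DEMO_DIFLAG2_METADATA));
}

/* True if the metatype field is acceptable for this inode version. */
static bool demo_metatype_ok(const struct demo_dinode *dip)
{
	if (dip->di_version == 2)
		return dip->di_metatype == 0;
	if (dip->di_version >= 3)
		return demo_dinode_is_metadir(dip) || dip->di_metatype == 0;
	return true;	/* v1: the field still holds the old link count */
}
```

With these helpers, a v3 inode carrying a nonzero metatype fails the check
until its METADATA flag is set, and a v2 inode with a nonzero value in the
field always fails.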


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 28/36] xfs: iget for metadata inodes
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (26 preceding siblings ...)
  2024-12-23 21:44   ` [PATCH 27/36] xfs: define the on-disk format for the metadir feature Darrick J. Wong
@ 2024-12-23 21:44   ` Darrick J. Wong
  2024-12-23 21:44   ` [PATCH 29/36] xfs: enforce metadata inode flag Darrick J. Wong
                     ` (7 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:44 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit df2e495bb92c84a401b6b90c835a9d1be84a3a0f

Create an xfs_trans_metafile_iget function for metadata inodes to ensure
that when we try to iget a metadata file, the inode is allocated and its
file mode matches the metadata file type the caller expects.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/libxfs.h         |    1 +
 include/xfs_inode.h      |    5 ++++
 libxfs/Makefile          |    1 +
 libxfs/inode.c           |   55 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/libxfs_api_defs.h |    5 ++++
 libxfs/xfs_metafile.h    |   16 +++++++++++++
 6 files changed, 83 insertions(+)
 create mode 100644 libxfs/xfs_metafile.h


diff --git a/include/libxfs.h b/include/libxfs.h
index fe8e6584f1caca..0356bc57b956a9 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -95,6 +95,7 @@ struct iomap;
 #include "xfs_btree_mem.h"
 #include "xfs_parent.h"
 #include "xfs_ag_resv.h"
+#include "xfs_metafile.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index e03521bc9aaaa2..6f2d23987d5f8a 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -210,6 +210,11 @@ static inline struct timespec64 inode_set_ctime_current(struct inode *inode)
 	return now;
 }
 
+static inline bool inode_wrong_type(const struct inode *inode, umode_t mode)
+{
+	return (inode->i_mode ^ mode) & S_IFMT;
+}
+
 typedef struct xfs_inode {
 	struct cache_node	i_node;
 	struct xfs_mount	*i_mount;	/* fs mount struct ptr */
diff --git a/libxfs/Makefile b/libxfs/Makefile
index 470583006de69a..765c84a16408f8 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -55,6 +55,7 @@ HFILES = \
 	xfs_inode_buf.h \
 	xfs_inode_fork.h \
 	xfs_inode_util.h \
+	xfs_metafile.h \
 	xfs_parent.h \
 	xfs_quota_defs.h \
 	xfs_refcount.h \
diff --git a/libxfs/inode.c b/libxfs/inode.c
index 9230ad24a5cb6c..1eb0bccae48906 100644
--- a/libxfs/inode.c
+++ b/libxfs/inode.c
@@ -205,6 +205,61 @@ libxfs_iget(
 	return error;
 }
 
+/*
+ * Get a metadata inode.
+ *
+ * The metafile type must match the file mode exactly.
+ */
+int
+libxfs_trans_metafile_iget(
+	struct xfs_trans	*tp,
+	xfs_ino_t		ino,
+	enum xfs_metafile_type	metafile_type,
+	struct xfs_inode	**ipp)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_inode	*ip;
+	umode_t			mode;
+	int			error;
+
+	error = libxfs_iget(mp, tp, ino, 0, &ip);
+	if (error)
+		return error;
+
+	if (metafile_type == XFS_METAFILE_DIR)
+		mode = S_IFDIR;
+	else
+		mode = S_IFREG;
+	if (inode_wrong_type(VFS_I(ip), mode))
+		goto bad_rele;
+
+	*ipp = ip;
+	return 0;
+bad_rele:
+	libxfs_irele(ip);
+	return -EFSCORRUPTED;
+}
+
+/* Grab a metadata file if the caller doesn't already have a transaction. */
+int
+libxfs_metafile_iget(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	enum xfs_metafile_type	metafile_type,
+	struct xfs_inode	**ipp)
+{
+	struct xfs_trans	*tp;
+	int			error;
+
+	error = libxfs_trans_alloc_empty(mp, &tp);
+	if (error)
+		return error;
+
+	error = libxfs_trans_metafile_iget(tp, ino, metafile_type, ipp);
+	libxfs_trans_cancel(tp);
+	return error;
+}
+
 static void
 libxfs_idestroy(
 	struct xfs_inode	*ip)
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 92e26eebabfed8..fefae9256555a0 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -184,6 +184,7 @@
 #define xfs_iext_next			libxfs_iext_next
 #define xfs_ifork_zap_attr		libxfs_ifork_zap_attr
 #define xfs_imap_to_bp			libxfs_imap_to_bp
+
 #define xfs_initialize_perag		libxfs_initialize_perag
 #define xfs_initialize_perag_data	libxfs_initialize_perag_data
 #define xfs_init_local_fork		libxfs_init_local_fork
@@ -208,8 +209,12 @@
 #define xfs_log_calc_minimum_size	libxfs_log_calc_minimum_size
 #define xfs_log_get_max_trans_res	libxfs_log_get_max_trans_res
 #define xfs_log_sb			libxfs_log_sb
+
+#define xfs_metafile_iget		libxfs_metafile_iget
+#define xfs_trans_metafile_iget		libxfs_trans_metafile_iget
 #define xfs_mode_to_ftype		libxfs_mode_to_ftype
 #define xfs_mkdir_space_res		libxfs_mkdir_space_res
+
 #define xfs_parent_addname		libxfs_parent_addname
 #define xfs_parent_finish		libxfs_parent_finish
 #define xfs_parent_hashval		libxfs_parent_hashval
diff --git a/libxfs/xfs_metafile.h b/libxfs/xfs_metafile.h
new file mode 100644
index 00000000000000..60fe1890611277
--- /dev/null
+++ b/libxfs/xfs_metafile.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2018-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_METAFILE_H__
+#define __XFS_METAFILE_H__
+
+/* Code specific to kernel/userspace; must be provided externally. */
+
+int xfs_trans_metafile_iget(struct xfs_trans *tp, xfs_ino_t ino,
+		enum xfs_metafile_type metafile_type, struct xfs_inode **ipp);
+int xfs_metafile_iget(struct xfs_mount *mp, xfs_ino_t ino,
+		enum xfs_metafile_type metafile_type, struct xfs_inode **ipp);
+
+#endif /* __XFS_METAFILE_H__ */
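
The type check in libxfs_trans_metafile_iget() above boils down to comparing
only the file-type bits of the mode.  A minimal standalone sketch of that idea
follows.  All demo_* and DEMO_* names are hypothetical, the enum is shortened
for illustration, and the mode constants are local copies of the usual Linux
values so that the sketch does not depend on <sys/stat.h>.

```c
#include <assert.h>	/* for the usage checks described below */
#include <stdbool.h>

/* Local copies of the customary Linux file-type bits (assumption). */
#define DEMO_S_IFMT	0170000u
#define DEMO_S_IFDIR	0040000u
#define DEMO_S_IFREG	0100000u
#define DEMO_S_IFLNK	0120000u

/* Shortened, hypothetical version of enum xfs_metafile_type. */
enum demo_metafile_type {
	DEMO_METAFILE_UNKNOWN,
	DEMO_METAFILE_DIR,
	DEMO_METAFILE_RTBITMAP,
};

/* Directories map to S_IFDIR; every other metafile type is a regular file. */
static unsigned int demo_expected_mode(enum demo_metafile_type type)
{
	return type == DEMO_METAFILE_DIR ? DEMO_S_IFDIR : DEMO_S_IFREG;
}

/*
 * XOR leaves a bit set wherever the two modes differ; masking with S_IFMT
 * ignores permission bits, so only a file-type mismatch is reported.
 */
static bool demo_wrong_type(unsigned int actual, unsigned int expected)
{
	return ((actual ^ expected) & DEMO_S_IFMT) != 0;
}
```

Under these assumptions, a regular file with mode 0644 satisfies a request for
DEMO_METAFILE_RTBITMAP, while a symlink or a directory does not.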


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 29/36] xfs: enforce metadata inode flag
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (27 preceding siblings ...)
  2024-12-23 21:44   ` [PATCH 28/36] xfs: iget for metadata inodes Darrick J. Wong
@ 2024-12-23 21:44   ` Darrick J. Wong
  2024-12-23 21:44   ` [PATCH 30/36] xfs: read and write metadata inode directory tree Darrick J. Wong
                     ` (6 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:44 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 7297fd0bebbd70efd12f72632a0f3ac49a8f59fe

Add checks for the metadata inode flag so that we don't ever leak
metadata inodes out to userspace, and we don't ever try to read a
regular inode as metadata.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_inode_buf.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_inode_buf.h |    3 ++
 libxfs/xfs_metafile.h  |   11 ++++++++
 3 files changed, 84 insertions(+)


diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 981113f6acd37a..98482cb4948284 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -17,6 +17,7 @@
 #include "xfs_ialloc.h"
 #include "xfs_dir2.h"
 #include "xfs_health.h"
+#include "xfs_metafile.h"
 
 
 /*
@@ -486,6 +487,69 @@ xfs_dinode_verify_nrext64(
 	return NULL;
 }
 
+/*
+ * Validate all the picky requirements we have for a file that claims to be
+ * filesystem metadata.
+ */
+xfs_failaddr_t
+xfs_dinode_verify_metadir(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dip,
+	uint16_t		mode,
+	uint16_t		flags,
+	uint64_t		flags2)
+{
+	if (!xfs_has_metadir(mp))
+		return __this_address;
+
+	/* V5 filesystem only */
+	if (dip->di_version < 3)
+		return __this_address;
+
+	if (be16_to_cpu(dip->di_metatype) >= XFS_METAFILE_MAX)
+		return __this_address;
+
+	/* V3 inode fields that are always zero */
+	if ((flags2 & XFS_DIFLAG2_NREXT64) && dip->di_nrext64_pad)
+		return __this_address;
+	if (!(flags2 & XFS_DIFLAG2_NREXT64) && dip->di_flushiter)
+		return __this_address;
+
+	/* Metadata files can only be directories or regular files */
+	if (!S_ISDIR(mode) && !S_ISREG(mode))
+		return __this_address;
+
+	/* They must have zero access permissions */
+	if (mode & 0777)
+		return __this_address;
+
+	/* DMAPI event and state masks are zero */
+	if (dip->di_dmevmask || dip->di_dmstate)
+		return __this_address;
+
+	/*
+	 * User and group IDs must be zero.  The project ID is used for
+	 * grouping inodes.  Metadata inodes are never accounted to quotas.
+	 */
+	if (dip->di_uid || dip->di_gid)
+		return __this_address;
+
+	/* Mandatory inode flags must be set */
+	if (S_ISDIR(mode)) {
+		if ((flags & XFS_METADIR_DIFLAGS) != XFS_METADIR_DIFLAGS)
+			return __this_address;
+	} else {
+		if ((flags & XFS_METAFILE_DIFLAGS) != XFS_METAFILE_DIFLAGS)
+			return __this_address;
+	}
+
+	/* dax flags2 must not be set */
+	if (flags2 & XFS_DIFLAG2_DAX)
+		return __this_address;
+
+	return NULL;
+}
+
 xfs_failaddr_t
 xfs_dinode_verify(
 	struct xfs_mount	*mp,
@@ -670,6 +734,12 @@ xfs_dinode_verify(
 	    !xfs_has_bigtime(mp))
 		return __this_address;
 
+	if (flags2 & XFS_DIFLAG2_METADATA) {
+		fa = xfs_dinode_verify_metadir(mp, dip, mode, flags, flags2);
+		if (fa)
+			return fa;
+	}
+
 	return NULL;
 }
 
diff --git a/libxfs/xfs_inode_buf.h b/libxfs/xfs_inode_buf.h
index 585ed5a110af4e..8d43d2641c7328 100644
--- a/libxfs/xfs_inode_buf.h
+++ b/libxfs/xfs_inode_buf.h
@@ -28,6 +28,9 @@ int	xfs_inode_from_disk(struct xfs_inode *ip, struct xfs_dinode *from);
 
 xfs_failaddr_t xfs_dinode_verify(struct xfs_mount *mp, xfs_ino_t ino,
 			   struct xfs_dinode *dip);
+xfs_failaddr_t xfs_dinode_verify_metadir(struct xfs_mount *mp,
+		struct xfs_dinode *dip, uint16_t mode, uint16_t flags,
+		uint64_t flags2);
 xfs_failaddr_t xfs_inode_validate_extsize(struct xfs_mount *mp,
 		uint32_t extsize, uint16_t mode, uint16_t flags);
 xfs_failaddr_t xfs_inode_validate_cowextsize(struct xfs_mount *mp,
diff --git a/libxfs/xfs_metafile.h b/libxfs/xfs_metafile.h
index 60fe1890611277..c66b0c51b461a8 100644
--- a/libxfs/xfs_metafile.h
+++ b/libxfs/xfs_metafile.h
@@ -6,6 +6,17 @@
 #ifndef __XFS_METAFILE_H__
 #define __XFS_METAFILE_H__
 
+/* All metadata files must have these flags set. */
+#define XFS_METAFILE_DIFLAGS	(XFS_DIFLAG_IMMUTABLE | \
+				 XFS_DIFLAG_SYNC | \
+				 XFS_DIFLAG_NOATIME | \
+				 XFS_DIFLAG_NODUMP | \
+				 XFS_DIFLAG_NODEFRAG)
+
+/* All metadata directories must have these flags set. */
+#define XFS_METADIR_DIFLAGS	(XFS_METAFILE_DIFLAGS | \
+				 XFS_DIFLAG_NOSYMLINKS)
+
 /* Code specific to kernel/userspace; must be provided externally. */
 
 int xfs_trans_metafile_iget(struct xfs_trans *tp, xfs_ino_t ino,
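
The mandatory-flag test in xfs_dinode_verify_metadir() above uses the
"(flags & mask) != mask" idiom, which requires every bit of the mask to be
present.  The standalone sketch below illustrates that idiom together with the
"dax must not be set" rule.  The DEMO_* bit positions are placeholders chosen
for the example and are NOT the on-disk XFS_DIFLAG_* values; the function name
is likewise hypothetical.

```c
#include <assert.h>	/* for the usage checks described below */
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define DEMO_DIFLAG_IMMUTABLE	(1U << 0)
#define DEMO_DIFLAG_SYNC	(1U << 1)
#define DEMO_DIFLAG_NOATIME	(1U << 2)
#define DEMO_DIFLAG_NODUMP	(1U << 3)
#define DEMO_DIFLAG_NODEFRAG	(1U << 4)
#define DEMO_DIFLAG_NOSYMLINKS	(1U << 5)
#define DEMO_DIFLAG2_DAX	(1ULL << 0)

/* Flags every metadata file must carry. */
#define DEMO_METAFILE_DIFLAGS	(DEMO_DIFLAG_IMMUTABLE | \
				 DEMO_DIFLAG_SYNC | \
				 DEMO_DIFLAG_NOATIME | \
				 DEMO_DIFLAG_NODUMP | \
				 DEMO_DIFLAG_NODEFRAG)

/* Metadata directories additionally need nosymlinks. */
#define DEMO_METADIR_DIFLAGS	(DEMO_METAFILE_DIFLAGS | \
				 DEMO_DIFLAG_NOSYMLINKS)

static bool demo_metadata_flags_ok(bool is_dir, uint16_t flags, uint64_t flags2)
{
	uint16_t	required;

	required = (uint16_t)(is_dir ? DEMO_METADIR_DIFLAGS :
				       DEMO_METAFILE_DIFLAGS);

	/* Every mandatory bit must be set; extra bits are not rejected here. */
	if ((flags & required) != required)
		return false;

	/* The dax flag must never be set on metadata inodes. */
	if (flags2 & DEMO_DIFLAG2_DAX)
		return false;

	return true;
}
```

In this sketch a directory that carries only the file flags is rejected
because nosymlinks is missing, and any inode with the dax bit set in flags2 is
rejected regardless of its other flags.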


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 30/36] xfs: read and write metadata inode directory tree
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (28 preceding siblings ...)
  2024-12-23 21:44   ` [PATCH 29/36] xfs: enforce metadata inode flag Darrick J. Wong
@ 2024-12-23 21:44   ` Darrick J. Wong
  2024-12-23 21:45   ` [PATCH 31/36] xfs: disable the agi rotor for metadata inodes Darrick J. Wong
                     ` (5 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:44 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit e0f091c40bf1287a122dcddd8aa4e6ad06ce441f

Plumb in the bits we need to load metadata inodes from a named entry in
a metadir directory, create (or hardlink) inodes into a metadir
directory, create metadir directories, and flag inodes as being metadata
files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/libxfs.h         |    1 
 include/xfs_trace.h      |   10 +
 include/xfs_trans.h      |    3 
 libxfs/Makefile          |    3 
 libxfs/libxfs_api_defs.h |    7 +
 libxfs/libxfs_priv.h     |    2 
 libxfs/trans.c           |   39 ++++
 libxfs/xfs_metadir.c     |  473 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_metadir.h     |   47 +++++
 libxfs/xfs_metafile.c    |   52 +++++
 libxfs/xfs_metafile.h    |    4 
 11 files changed, 641 insertions(+)
 create mode 100644 libxfs/xfs_metadir.c
 create mode 100644 libxfs/xfs_metadir.h
 create mode 100644 libxfs/xfs_metafile.c


diff --git a/include/libxfs.h b/include/libxfs.h
index 0356bc57b956a9..348d36be192966 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -96,6 +96,7 @@ struct iomap;
 #include "xfs_parent.h"
 #include "xfs_ag_resv.h"
 #include "xfs_metafile.h"
+#include "xfs_metadir.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index de435a9d1a2a64..f3e531ef118d05 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -360,6 +360,16 @@
 
 #define trace_xlog_intent_recovery_failed(...)	((void) 0)
 
+#define trace_xfs_metadir_cancel(...)		((void) 0)
+#define trace_xfs_metadir_commit(...)		((void) 0)
+#define trace_xfs_metadir_create(...)		((void) 0)
+#define trace_xfs_metadir_link(...)		((void) 0)
+#define trace_xfs_metadir_lookup(...)		((void) 0)
+#define trace_xfs_metadir_start_create(...)	((void) 0)
+#define trace_xfs_metadir_start_link(...)	((void) 0)
+#define trace_xfs_metadir_teardown(...)		((void) 0)
+#define trace_xfs_metadir_try_create(...)	((void) 0)
+
 #define trace_xfs_iunlink_update_bucket(...)	((void) 0)
 #define trace_xfs_iunlink_update_dinode(...)	((void) 0)
 #define trace_xfs_iunlink(...)			((void) 0)
diff --git a/include/xfs_trans.h b/include/xfs_trans.h
index 9bc4b1ef5e82f9..d508f8947a301c 100644
--- a/include/xfs_trans.h
+++ b/include/xfs_trans.h
@@ -93,6 +93,9 @@ int	libxfs_trans_alloc(struct xfs_mount *mp, struct xfs_trans_res *resp,
 int	libxfs_trans_alloc_inode(struct xfs_inode *ip, struct xfs_trans_res *resv,
 			unsigned int dblocks, unsigned int rblocks, bool force,
 			struct xfs_trans **tpp);
+int	libxfs_trans_alloc_dir(struct xfs_inode *dp, struct xfs_trans_res *resv,
+			struct xfs_inode *ip, unsigned int *dblocks,
+			struct xfs_trans **tpp, int *nospace_error);
 int	libxfs_trans_alloc_rollable(struct xfs_mount *mp, uint blocks,
 				    struct xfs_trans **tpp);
 int	libxfs_trans_alloc_empty(struct xfs_mount *mp, struct xfs_trans **tpp);
diff --git a/libxfs/Makefile b/libxfs/Makefile
index 765c84a16408f8..c8267841e77b01 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -55,6 +55,7 @@ HFILES = \
 	xfs_inode_buf.h \
 	xfs_inode_fork.h \
 	xfs_inode_util.h \
+	xfs_metadir.h \
 	xfs_metafile.h \
 	xfs_parent.h \
 	xfs_quota_defs.h \
@@ -114,6 +115,8 @@ CFILES = buf_mem.c \
 	xfs_inode_fork.c \
 	xfs_inode_util.c \
 	xfs_ialloc_btree.c \
+	xfs_metadir.c \
+	xfs_metafile.c \
 	xfs_log_rlimit.c \
 	xfs_parent.c \
 	xfs_refcount.c \
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index fefae9256555a0..dd163709fe07df 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -176,6 +176,7 @@
 #define xfs_iallocbt_calc_size		libxfs_iallocbt_calc_size
 #define xfs_iallocbt_maxlevels_ondisk	libxfs_iallocbt_maxlevels_ondisk
 #define xfs_ialloc_read_agi		libxfs_ialloc_read_agi
+#define xfs_icreate			libxfs_icreate
 #define xfs_idata_realloc		libxfs_idata_realloc
 #define xfs_idestroy_fork		libxfs_idestroy_fork
 #define xfs_iext_first			libxfs_iext_first
@@ -212,6 +213,11 @@
 
 #define xfs_metafile_iget		libxfs_metafile_iget
 #define xfs_trans_metafile_iget		libxfs_trans_metafile_iget
+#define xfs_metadir_link		libxfs_metadir_link
+#define xfs_metadir_lookup		libxfs_metadir_lookup
+#define xfs_metadir_start_create	libxfs_metadir_start_create
+#define xfs_metadir_start_link		libxfs_metadir_start_link
+
 #define xfs_mode_to_ftype		libxfs_mode_to_ftype
 #define xfs_mkdir_space_res		libxfs_mkdir_space_res
 
@@ -285,6 +291,7 @@
 #define xfs_trans_add_item		libxfs_trans_add_item
 #define xfs_trans_alloc_empty		libxfs_trans_alloc_empty
 #define xfs_trans_alloc			libxfs_trans_alloc
+#define xfs_trans_alloc_dir		libxfs_trans_alloc_dir
 #define xfs_trans_alloc_inode		libxfs_trans_alloc_inode
 #define xfs_trans_bdetach		libxfs_trans_bdetach
 #define xfs_trans_bhold			libxfs_trans_bhold
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 1b6bb961b7ac06..631abf266ad526 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -181,6 +181,7 @@ enum ce { CE_DEBUG, CE_CONT, CE_NOTE, CE_WARN, CE_ALERT, CE_PANIC };
 #define XFS_ILOCK_SHARED		0
 #define XFS_ILOCK_RTBITMAP		0
 #define XFS_ILOCK_RTSUM			0
+#define XFS_IOLOCK_EXCL			0
 #define XFS_STATS_INC(mp, count)	do { (mp) = (mp); } while (0)
 #define XFS_STATS_DEC(mp, count, x)	do { (mp) = (mp); } while (0)
 #define XFS_STATS_ADD(mp, count, x)	do { (mp) = (mp); } while (0)
@@ -632,6 +633,7 @@ int xfs_bmap_last_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 
 /* xfs_inode.h */
 #define xfs_iflags_set(ip, flags)	do { } while (0)
+#define xfs_finish_inode_setup(ip)	((void) 0)
 
 /* linux/wordpart.h */
 
diff --git a/libxfs/trans.c b/libxfs/trans.c
index 7e36205c6a3d50..72f26591053716 100644
--- a/libxfs/trans.c
+++ b/libxfs/trans.c
@@ -1185,6 +1185,45 @@ libxfs_trans_alloc_inode(
 	return 0;
 }
 
+/*
+ * Allocate a transaction, lock and join the directory and child inodes to it,
+ * and reserve quota for a directory update.  @resblks must point to the number
+ * of blocks to reserve; if not enough space is available, -ENOSPC will be
+ * returned.  @nospace_error is always set to zero here.  Userspace does not
+ * support reservationless creation at all, unlike the kernel.
+ *
+ * The inodes are joined without lock flags, so committing or cancelling the
+ * transaction does not drop the ILOCKs.  The caller is responsible for
+ * unlocking the inodes manually upon return.
+ */
+int
+libxfs_trans_alloc_dir(
+	struct xfs_inode	*dp,
+	struct xfs_trans_res	*resv,
+	struct xfs_inode	*ip,
+	unsigned int		*resblks,
+	struct xfs_trans	**tpp,
+	int			*nospace_error)
+{
+	struct xfs_trans	*tp;
+	struct xfs_mount	*mp = ip->i_mount;
+	int			error;
+
+	*nospace_error = 0;
+
+	error = xfs_trans_alloc(mp, resv, *resblks, 0, 0, &tp);
+	if (error)
+		return error;
+
+	xfs_lock_two_inodes(dp, XFS_ILOCK_EXCL, ip, XFS_ILOCK_EXCL);
+
+	xfs_trans_ijoin(tp, dp, 0);
+	xfs_trans_ijoin(tp, ip, 0);
+
+	*tpp = tp;
+	return 0;
+}
+
 /*
  * Try to reserve more blocks for a transaction.  The single use case we
  * support is for offline repair -- use a transaction to gather data without
diff --git a/libxfs/xfs_metadir.c b/libxfs/xfs_metadir.c
new file mode 100644
index 00000000000000..b52abf12d20511
--- /dev/null
+++ b/libxfs/xfs_metadir.c
@@ -0,0 +1,473 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2018-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_trans.h"
+#include "xfs_metafile.h"
+#include "xfs_metadir.h"
+#include "xfs_trace.h"
+#include "xfs_inode.h"
+#include "xfs_ialloc.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_trans_space.h"
+#include "xfs_ag.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_parent.h"
+
+/*
+ * Metadata Directory Tree
+ * =======================
+ *
+ * These functions provide an abstraction layer for looking up, creating, and
+ * deleting metadata inodes that live within a special metadata directory tree.
+ *
+ * This code does not manage the five existing metadata inodes: real time
+ * bitmap & summary; and the user, group, and project quotas.  All other
+ * metadata inodes must use only the xfs_meta{dir,file}_* functions.
+ *
+ * Callers wishing to create or hardlink a metadata inode must create an
+ * xfs_metadir_update structure, call the appropriate xfs_metadir* function,
+ * and then call xfs_metadir_commit or xfs_metadir_cancel to commit or cancel
+ * the update.  Files in the metadata directory tree currently cannot be
+ * unlinked.
+ *
+ * When the metadir feature is enabled, all metadata inodes must have the
+ * "metadata" inode flag set to prevent them from being exposed to the outside
+ * world.
+ *
+ * Callers must take the ILOCK of any inode in the metadata directory tree to
+ * synchronize access to that inode.  It is never necessary to take the IOLOCK
+ * or the MMAPLOCK since metadata inodes must not be exposed to user space.
+ */
+
+static inline void
+xfs_metadir_set_xname(
+	struct xfs_name		*xname,
+	const char		*path,
+	unsigned char		ftype)
+{
+	xname->name = (const unsigned char *)path;
+	xname->len = strlen(path);
+	xname->type = ftype;
+}
+
+/*
+ * Given a parent directory @dp and a metadata inode path component @xname,
+ * look up the inode number in the directory, returning it in @ino.
+ * @xname.type must match the directory entry's ftype.
+ *
+ * Caller must hold ILOCK_EXCL.
+ */
+static inline int
+xfs_metadir_lookup(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	struct xfs_name		*xname,
+	xfs_ino_t		*ino)
+{
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_da_args	args = {
+		.trans		= tp,
+		.dp		= dp,
+		.geo		= mp->m_dir_geo,
+		.name		= xname->name,
+		.namelen	= xname->len,
+		.hashval	= xfs_dir2_hashname(mp, xname),
+		.whichfork	= XFS_DATA_FORK,
+		.op_flags	= XFS_DA_OP_OKNOENT,
+		.owner		= dp->i_ino,
+	};
+	int			error;
+
+	if (!S_ISDIR(VFS_I(dp)->i_mode))
+		return -EFSCORRUPTED;
+	if (xfs_is_shutdown(mp))
+		return -EIO;
+
+	error = xfs_dir_lookup_args(&args);
+	if (error)
+		return error;
+
+	if (!xfs_verify_ino(mp, args.inumber))
+		return -EFSCORRUPTED;
+	if (xname->type != XFS_DIR3_FT_UNKNOWN && xname->type != args.filetype)
+		return -EFSCORRUPTED;
+
+	trace_xfs_metadir_lookup(dp, xname, args.inumber);
+	*ino = args.inumber;
+	return 0;
+}
+
+/*
+ * Look up and read a metadata inode from the metadata directory.  If the path
+ * component doesn't exist, return -ENOENT.
+ */
+int
+xfs_metadir_load(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	const char		*path,
+	enum xfs_metafile_type	metafile_type,
+	struct xfs_inode	**ipp)
+{
+	struct xfs_name		xname;
+	xfs_ino_t		ino;
+	int			error;
+
+	xfs_metadir_set_xname(&xname, path, XFS_DIR3_FT_UNKNOWN);
+
+	xfs_ilock(dp, XFS_ILOCK_EXCL);
+	error = xfs_metadir_lookup(tp, dp, &xname, &ino);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	if (error)
+		return error;
+	return xfs_trans_metafile_iget(tp, ino, metafile_type, ipp);
+}
+
+/*
+ * Unlock and release resources after committing (or cancelling) a metadata
+ * directory tree operation.  The caller retains its reference to @upd->ip
+ * and must release it explicitly.
+ */
+static inline void
+xfs_metadir_teardown(
+	struct xfs_metadir_update	*upd,
+	int				error)
+{
+	trace_xfs_metadir_teardown(upd, error);
+
+	if (upd->ppargs) {
+		xfs_parent_finish(upd->dp->i_mount, upd->ppargs);
+		upd->ppargs = NULL;
+	}
+
+	if (upd->ip) {
+		if (upd->ip_locked)
+			xfs_iunlock(upd->ip, XFS_ILOCK_EXCL);
+		upd->ip_locked = false;
+	}
+
+	if (upd->dp_locked)
+		xfs_iunlock(upd->dp, XFS_ILOCK_EXCL);
+	upd->dp_locked = false;
+}
+
+/*
+ * Begin the process of creating a metadata file by allocating transactions
+ * and taking whatever resources we're going to need.
+ */
+int
+xfs_metadir_start_create(
+	struct xfs_metadir_update	*upd)
+{
+	struct xfs_mount		*mp = upd->dp->i_mount;
+	int				error;
+
+	ASSERT(upd->dp != NULL);
+	ASSERT(upd->ip == NULL);
+	ASSERT(xfs_has_metadir(mp));
+	ASSERT(upd->metafile_type != XFS_METAFILE_UNKNOWN);
+
+	error = xfs_parent_start(mp, &upd->ppargs);
+	if (error)
+		return error;
+
+	/*
+	 * If we ever need the ability to create rt metadata files on a
+	 * pre-metadir filesystem, we'll need to dqattach the parent here.
+	 * Currently we assume that mkfs will create the files and quotacheck
+	 * will account for them.
+	 */
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_create,
+			xfs_create_space_res(mp, MAXNAMELEN), 0, 0, &upd->tp);
+	if (error)
+		goto out_teardown;
+
+	/*
+	 * Lock the parent directory if there is one.  We can't ijoin it to
+	 * the transaction until after the child file has been created.
+	 */
+	xfs_ilock(upd->dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
+	upd->dp_locked = true;
+
+	trace_xfs_metadir_start_create(upd);
+	return 0;
+out_teardown:
+	xfs_metadir_teardown(upd, error);
+	return error;
+}
+
+/*
+ * Create a metadata inode with the given @mode, and insert it into the
+ * metadata directory tree at the given @upd->path.  The path up to the final
+ * component must already exist.  The final path component must not exist.
+ *
+ * The new metadata inode will be attached to the update structure @upd->ip,
+ * with the ILOCK held until the caller releases it.
+ *
+ * NOTE: This function may return a new inode to the caller even if it returns
+ * a negative error code.  If an inode is passed back, the caller must finish
+ * setting up the inode before releasing it.
+ */
+int
+xfs_metadir_create(
+	struct xfs_metadir_update	*upd,
+	umode_t				mode)
+{
+	struct xfs_icreate_args		args = {
+		.pip			= upd->dp,
+		.mode			= mode,
+	};
+	struct xfs_name			xname;
+	struct xfs_dir_update		du = {
+		.dp			= upd->dp,
+		.name			= &xname,
+		.ppargs			= upd->ppargs,
+	};
+	struct xfs_mount		*mp = upd->dp->i_mount;
+	xfs_ino_t			ino;
+	unsigned int			resblks;
+	int				error;
+
+	xfs_assert_ilocked(upd->dp, XFS_ILOCK_EXCL);
+
+	/* Check that the name does not already exist in the directory. */
+	xfs_metadir_set_xname(&xname, upd->path, XFS_DIR3_FT_UNKNOWN);
+	error = xfs_metadir_lookup(upd->tp, upd->dp, &xname, &ino);
+	switch (error) {
+	case -ENOENT:
+		break;
+	case 0:
+		error = -EEXIST;
+		fallthrough;
+	default:
+		return error;
+	}
+
+	/*
+	 * A newly created regular or special file has just one directory
+	 * entry pointing to it, but a directory also has the "." entry
+	 * pointing to itself.
+	 */
+	error = xfs_dialloc(&upd->tp, &args, &ino);
+	if (error)
+		return error;
+	error = xfs_icreate(upd->tp, ino, &args, &upd->ip);
+	if (error)
+		return error;
+	du.ip = upd->ip;
+	xfs_metafile_set_iflag(upd->tp, upd->ip, upd->metafile_type);
+	upd->ip_locked = true;
+
+	/*
+	 * Join the directory inode to the transaction.  We do not do it
+	 * earlier because xfs_dialloc rolls the transaction.
+	 */
+	xfs_trans_ijoin(upd->tp, upd->dp, 0);
+
+	/* Create the entry. */
+	if (S_ISDIR(args.mode))
+		resblks = xfs_mkdir_space_res(mp, xname.len);
+	else
+		resblks = xfs_create_space_res(mp, xname.len);
+	xname.type = xfs_mode_to_ftype(args.mode);
+
+	trace_xfs_metadir_try_create(upd);
+
+	error = xfs_dir_create_child(upd->tp, resblks, &du);
+	if (error)
+		return error;
+
+	/* Metadir files are not accounted to quota. */
+
+	trace_xfs_metadir_create(upd);
+
+	return 0;
+}
+
+#ifndef __KERNEL__
+/*
+ * Begin the process of linking a metadata file by allocating transactions
+ * and locking whatever resources we're going to need.
+ */
+int
+xfs_metadir_start_link(
+	struct xfs_metadir_update	*upd)
+{
+	struct xfs_mount		*mp = upd->dp->i_mount;
+	unsigned int			resblks;
+	int				nospace_error = 0;
+	int				error;
+
+	ASSERT(upd->dp != NULL);
+	ASSERT(upd->ip != NULL);
+	ASSERT(xfs_has_metadir(mp));
+
+	error = xfs_parent_start(mp, &upd->ppargs);
+	if (error)
+		return error;
+
+	resblks = xfs_link_space_res(mp, MAXNAMELEN);
+	error = xfs_trans_alloc_dir(upd->dp, &M_RES(mp)->tr_link, upd->ip,
+			&resblks, &upd->tp, &nospace_error);
+	if (error)
+		goto out_teardown;
+	if (!resblks) {
+		/* We don't allow reservationless updates. */
+		xfs_trans_cancel(upd->tp);
+		upd->tp = NULL;
+		xfs_iunlock(upd->dp, XFS_ILOCK_EXCL);
+		xfs_iunlock(upd->ip, XFS_ILOCK_EXCL);
+		error = nospace_error;
+		goto out_teardown;
+	}
+
+	upd->dp_locked = true;
+	upd->ip_locked = true;
+
+	trace_xfs_metadir_start_link(upd);
+	return 0;
+out_teardown:
+	xfs_metadir_teardown(upd, error);
+	return error;
+}
+
+/*
+ * Link the inode @upd->ip into the metadata directory at @upd->path.
+ * The path (up to the final component) must already exist, but the final
+ * component must not already exist.
+ */
+int
+xfs_metadir_link(
+	struct xfs_metadir_update	*upd)
+{
+	struct xfs_name			xname;
+	struct xfs_dir_update		du = {
+		.dp			= upd->dp,
+		.name			= &xname,
+		.ip			= upd->ip,
+		.ppargs			= upd->ppargs,
+	};
+	struct xfs_mount		*mp = upd->dp->i_mount;
+	xfs_ino_t			ino;
+	unsigned int			resblks;
+	int				error;
+
+	xfs_assert_ilocked(upd->dp, XFS_ILOCK_EXCL);
+	xfs_assert_ilocked(upd->ip, XFS_ILOCK_EXCL);
+
+	/* Look up the name in the current directory. */
+	xfs_metadir_set_xname(&xname, upd->path,
+			xfs_mode_to_ftype(VFS_I(upd->ip)->i_mode));
+	error = xfs_metadir_lookup(upd->tp, upd->dp, &xname, &ino);
+	switch (error) {
+	case -ENOENT:
+		break;
+	case 0:
+		error = -EEXIST;
+		fallthrough;
+	default:
+		return error;
+	}
+
+	resblks = xfs_link_space_res(mp, xname.len);
+	error = xfs_dir_add_child(upd->tp, resblks, &du);
+	if (error)
+		return error;
+
+	trace_xfs_metadir_link(upd);
+
+	return 0;
+}
+#endif /* ! __KERNEL__ */
+
+/* Commit a metadir update and unlock/drop all resources. */
+int
+xfs_metadir_commit(
+	struct xfs_metadir_update	*upd)
+{
+	int				error;
+
+	trace_xfs_metadir_commit(upd);
+
+	error = xfs_trans_commit(upd->tp);
+	upd->tp = NULL;
+
+	xfs_metadir_teardown(upd, error);
+	return error;
+}
+
+/* Cancel a metadir update and unlock/drop all resources. */
+void
+xfs_metadir_cancel(
+	struct xfs_metadir_update	*upd,
+	int				error)
+{
+	trace_xfs_metadir_cancel(upd);
+
+	xfs_trans_cancel(upd->tp);
+	upd->tp = NULL;
+
+	xfs_metadir_teardown(upd, error);
+}
+
+/* Create a metadata directory for the last component of the path. */
+int
+xfs_metadir_mkdir(
+	struct xfs_inode		*dp,
+	const char			*path,
+	struct xfs_inode		**ipp)
+{
+	struct xfs_metadir_update	upd = {
+		.dp			= dp,
+		.path			= path,
+		.metafile_type		= XFS_METAFILE_DIR,
+	};
+	int				error;
+
+	if (xfs_is_shutdown(dp->i_mount))
+		return -EIO;
+
+	/* Allocate a transaction to create the last directory. */
+	error = xfs_metadir_start_create(&upd);
+	if (error)
+		return error;
+
+	/* Create the subdirectory and take our reference. */
+	error = xfs_metadir_create(&upd, S_IFDIR);
+	if (error)
+		goto out_cancel;
+
+	error = xfs_metadir_commit(&upd);
+	if (error)
+		goto out_irele;
+
+	xfs_finish_inode_setup(upd.ip);
+	*ipp = upd.ip;
+	return 0;
+
+out_cancel:
+	xfs_metadir_cancel(&upd, error);
+out_irele:
+	/* Have to finish setting up the inode to ensure it's deleted. */
+	if (upd.ip) {
+		xfs_finish_inode_setup(upd.ip);
+		xfs_irele(upd.ip);
+	}
+	return error;
+}
diff --git a/libxfs/xfs_metadir.h b/libxfs/xfs_metadir.h
new file mode 100644
index 00000000000000..bfecac7d3d1472
--- /dev/null
+++ b/libxfs/xfs_metadir.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2018-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_METADIR_H__
+#define __XFS_METADIR_H__
+
+/* Cleanup widget for metadata inode creation and deletion. */
+struct xfs_metadir_update {
+	/* Parent directory */
+	struct xfs_inode	*dp;
+
+	/* Path to metadata file */
+	const char		*path;
+
+	/* Parent pointer update context */
+	struct xfs_parent_args	*ppargs;
+
+	/* Child metadata file */
+	struct xfs_inode	*ip;
+
+	struct xfs_trans	*tp;
+
+	enum xfs_metafile_type	metafile_type;
+
+	unsigned int		dp_locked:1;
+	unsigned int		ip_locked:1;
+};
+
+int xfs_metadir_load(struct xfs_trans *tp, struct xfs_inode *dp,
+		const char *path, enum xfs_metafile_type metafile_type,
+		struct xfs_inode **ipp);
+
+int xfs_metadir_start_create(struct xfs_metadir_update *upd);
+int xfs_metadir_create(struct xfs_metadir_update *upd, umode_t mode);
+
+int xfs_metadir_start_link(struct xfs_metadir_update *upd);
+int xfs_metadir_link(struct xfs_metadir_update *upd);
+
+int xfs_metadir_commit(struct xfs_metadir_update *upd);
+void xfs_metadir_cancel(struct xfs_metadir_update *upd, int error);
+
+int xfs_metadir_mkdir(struct xfs_inode *dp, const char *path,
+		struct xfs_inode **ipp);
+
+#endif /* __XFS_METADIR_H__ */
diff --git a/libxfs/xfs_metafile.c b/libxfs/xfs_metafile.c
new file mode 100644
index 00000000000000..3bd9493373115a
--- /dev/null
+++ b/libxfs/xfs_metafile.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2018-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_trans.h"
+#include "xfs_metafile.h"
+#include "xfs_trace.h"
+#include "xfs_inode.h"
+
+/* Set up an inode to be recognized as a metadata directory inode. */
+void
+xfs_metafile_set_iflag(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	enum xfs_metafile_type	metafile_type)
+{
+	VFS_I(ip)->i_mode &= ~0777;
+	VFS_I(ip)->i_uid = GLOBAL_ROOT_UID;
+	VFS_I(ip)->i_gid = GLOBAL_ROOT_GID;
+	if (S_ISDIR(VFS_I(ip)->i_mode))
+		ip->i_diflags |= XFS_METADIR_DIFLAGS;
+	else
+		ip->i_diflags |= XFS_METAFILE_DIFLAGS;
+	ip->i_diflags2 &= ~XFS_DIFLAG2_DAX;
+	ip->i_diflags2 |= XFS_DIFLAG2_METADATA;
+	ip->i_metatype = metafile_type;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+}
+
+/* Clear the metadata directory inode flag. */
+void
+xfs_metafile_clear_iflag(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip)
+{
+	ASSERT(xfs_is_metadir_inode(ip));
+	ASSERT(VFS_I(ip)->i_nlink == 0);
+
+	ip->i_diflags2 &= ~XFS_DIFLAG2_METADATA;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+}
diff --git a/libxfs/xfs_metafile.h b/libxfs/xfs_metafile.h
index c66b0c51b461a8..acec400123db05 100644
--- a/libxfs/xfs_metafile.h
+++ b/libxfs/xfs_metafile.h
@@ -17,6 +17,10 @@
 #define XFS_METADIR_DIFLAGS	(XFS_METAFILE_DIFLAGS | \
 				 XFS_DIFLAG_NOSYMLINKS)
 
+void xfs_metafile_set_iflag(struct xfs_trans *tp, struct xfs_inode *ip,
+		enum xfs_metafile_type metafile_type);
+void xfs_metafile_clear_iflag(struct xfs_trans *tp, struct xfs_inode *ip);
+
 /* Code specific to kernel/userspace; must be provided externally. */
 
 int xfs_trans_metafile_iget(struct xfs_trans *tp, xfs_ino_t ino,



* [PATCH 31/36] xfs: disable the agi rotor for metadata inodes
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (29 preceding siblings ...)
  2024-12-23 21:44   ` [PATCH 30/36] xfs: read and write metadata inode directory tree Darrick J. Wong
@ 2024-12-23 21:45   ` Darrick J. Wong
  2024-12-23 21:45   ` [PATCH 32/36] xfs: advertise metadata directory feature Darrick J. Wong
                     ` (4 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:45 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 8651b410ae781cc607159c51dbb0b317b23543b1

Ideally, we'd put all the metadata inodes in one place if we could, so
that the metadata all stay reasonably close together instead of
spreading out over the disk.  Furthermore, if the log is internal we'd
probably prefer to keep the metadata near the log.  Therefore, disable
AGI rotoring for metadata inode allocations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
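For readers skimming the diff, here is a rough standalone model of the
AG selection policy that xfs_dialloc_pick_ag implements below.  The
toy_* structures and field names are made up for illustration and are
not part of libxfs; only the decision order mirrors the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields of xfs_mount used here. */
struct toy_mount {
	uint32_t	ag_count;	/* stands in for mp->m_maxagi */
	uint32_t	log_agno;	/* AG that holds the internal log */
	bool		log_internal;	/* stands in for sb_logstart != 0 */
	uint32_t	rotor;		/* stands in for mp->m_agirotor */
};

/* Hypothetical stand-in for the parent inode. */
struct toy_parent {
	bool		is_metadir;	/* parent lives in the metadir tree */
	uint32_t	agno;		/* AG that holds the parent inode */
};

/* Pick a starting AG for a new inode, in the same order as the patch. */
static uint32_t
toy_pick_ag(
	struct toy_mount	*mp,
	const struct toy_parent	*dp,
	bool			is_dir)
{
	/* No parent at all: start at AG 0. */
	if (!dp)
		return 0;

	/* Metadata inodes skip the rotor and stay near the internal log. */
	if (dp->is_metadir)
		return mp->log_internal ? mp->log_agno : 0;

	/* Ordinary directories are spread out by the rotor. */
	if (is_dir)
		return mp->rotor++ % mp->ag_count;

	/* Everything else starts in the parent's AG, if that AG is usable. */
	return dp->agno < mp->ag_count ? dp->agno : 0;
}
```

Note that the metadir branch never touches the rotor, so creating
metadata inodes does not perturb the placement of user directories.
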
 libxfs/xfs_ialloc.c |   58 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 40 insertions(+), 18 deletions(-)


diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 055eff477faceb..2575447f92dfbb 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -1836,6 +1836,40 @@ xfs_dialloc_try_ag(
 	return error;
 }
 
+/*
+ * Pick an AG for the new inode.
+ *
+ * Directories, symlinks, and regular files frequently allocate at least one
+ * block, so factor that potential expansion when we examine whether an AG has
+ * enough space for file creation.  Try to keep metadata files all in the same
+ * AG.
+ */
+static inline xfs_agnumber_t
+xfs_dialloc_pick_ag(
+	struct xfs_mount	*mp,
+	struct xfs_inode	*dp,
+	umode_t			mode)
+{
+	xfs_agnumber_t		start_agno;
+
+	if (!dp)
+		return 0;
+	if (xfs_is_metadir_inode(dp)) {
+		if (mp->m_sb.sb_logstart)
+			return XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart);
+		return 0;
+	}
+
+	if (S_ISDIR(mode))
+		return (atomic_inc_return(&mp->m_agirotor) - 1) % mp->m_maxagi;
+
+	start_agno = XFS_INO_TO_AGNO(mp, dp->i_ino);
+	if (start_agno >= mp->m_maxagi)
+		start_agno = 0;
+
+	return start_agno;
+}
+
 /*
  * Allocate an on-disk inode.
  *
@@ -1851,31 +1885,19 @@ xfs_dialloc(
 	xfs_ino_t		*new_ino)
 {
 	struct xfs_mount	*mp = (*tpp)->t_mountp;
+	struct xfs_perag	*pag;
+	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
+	xfs_ino_t		ino = NULLFSINO;
 	xfs_ino_t		parent = args->pip ? args->pip->i_ino : 0;
-	umode_t			mode = args->mode & S_IFMT;
 	xfs_agnumber_t		agno;
-	int			error = 0;
 	xfs_agnumber_t		start_agno;
-	struct xfs_perag	*pag;
-	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
+	umode_t			mode = args->mode & S_IFMT;
 	bool			ok_alloc = true;
 	bool			low_space = false;
 	int			flags;
-	xfs_ino_t		ino = NULLFSINO;
+	int			error = 0;
 
-	/*
-	 * Directories, symlinks, and regular files frequently allocate at least
-	 * one block, so factor that potential expansion when we examine whether
-	 * an AG has enough space for file creation.
-	 */
-	if (S_ISDIR(mode))
-		start_agno = (atomic_inc_return(&mp->m_agirotor) - 1) %
-				mp->m_maxagi;
-	else {
-		start_agno = XFS_INO_TO_AGNO(mp, parent);
-		if (start_agno >= mp->m_maxagi)
-			start_agno = 0;
-	}
+	start_agno = xfs_dialloc_pick_ag(mp, args->pip, mode);
 
 	/*
 	 * If we have already hit the ceiling of inode blocks then clear



* [PATCH 32/36] xfs: advertise metadata directory feature
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (30 preceding siblings ...)
  2024-12-23 21:45   ` [PATCH 31/36] xfs: disable the agi rotor for metadata inodes Darrick J. Wong
@ 2024-12-23 21:45   ` Darrick J. Wong
  2024-12-23 21:45   ` [PATCH 33/36] xfs: allow bulkstat to return metadata directories Darrick J. Wong
                     ` (3 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:45 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 688828d8f8cdf8b1b917de938a1ce86a93fdbba9

Advertise the existence of the metadata directory feature; this will be
used by scrub to decide if it needs to scan the metadir too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
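A minimal sketch of how a consumer such as scrub might test the new
bit, assuming it already has the flags word from a geometry call.  The
helper name geom_has_metadir is hypothetical; only the flag value comes
from the xfs_fs.h hunk below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value copied from the xfs_fs.h hunk in this patch. */
#define XFS_FSOP_GEOM_FLAGS_METADIR	(1 << 26) /* metadata directories */

/* Hypothetical helper: does this geometry flags word advertise metadir? */
static bool
geom_has_metadir(
	uint32_t	geom_flags)
{
	return (geom_flags & XFS_FSOP_GEOM_FLAGS_METADIR) != 0;
}
```

A caller would pass the flags field of the geometry structure it got
back and skip the metadir scan if this returns false.
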
 libxfs/xfs_fs.h |    1 +
 libxfs/xfs_sb.c |    2 ++
 2 files changed, 3 insertions(+)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 860284064c5aa9..a42c1a33691c0f 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -242,6 +242,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_NREXT64	(1 << 23) /* large extent counters */
 #define XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE (1 << 24) /* exchange range */
 #define XFS_FSOP_GEOM_FLAGS_PARENT	(1 << 25) /* linux parent pointers */
+#define XFS_FSOP_GEOM_FLAGS_METADIR	(1 << 26) /* metadata directories */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index de12edd8192e1c..4ca57d592be8fa 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1306,6 +1306,8 @@ xfs_fs_geometry(
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_NREXT64;
 	if (xfs_has_exchange_range(mp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE;
+	if (xfs_has_metadir(mp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_METADIR;
 	geo->rtsectsize = sbp->sb_blocksize;
 	geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
 



* [PATCH 33/36] xfs: allow bulkstat to return metadata directories
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (31 preceding siblings ...)
  2024-12-23 21:45   ` [PATCH 32/36] xfs: advertise metadata directory feature Darrick J. Wong
@ 2024-12-23 21:45   ` Darrick J. Wong
  2024-12-23 21:45   ` [PATCH 34/36] xfs: adjust xfs_bmap_add_attrfork for metadir Darrick J. Wong
                     ` (2 subsequent siblings)
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:45 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: df866c538ff098baa210b407b822818a415a6e7e

Allow the V5 bulkstat ioctl to return information about metadata
directory files so that xfs_scrub can find and scrub them, since they
are otherwise ordinary directories.

(Metadata files of course require per-file scrub code and hence do not
need exposure.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
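To illustrate why XFS_BULK_IREQ_FLAGS_ALL has to grow along with the
new flag, here is a small sketch of the usual "reject unknown bits"
check.  The NREXT64 and METADIR values come from the hunk below; the
AGNO and SPECIAL values are not shown in this patch and are my
assumption about the rest of the header.  The helper
bulk_ireq_flags_valid is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define XFS_BULK_IREQ_AGNO	(1U << 0)	/* assumed, not in this hunk */
#define XFS_BULK_IREQ_SPECIAL	(1U << 1)	/* assumed, not in this hunk */
#define XFS_BULK_IREQ_NREXT64	(1U << 2)
#define XFS_BULK_IREQ_METADIR	(1U << 3)

#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO |	 \
				 XFS_BULK_IREQ_SPECIAL | \
				 XFS_BULK_IREQ_NREXT64 | \
				 XFS_BULK_IREQ_METADIR)

/* Hypothetical validator: accept a request only if every bit is known. */
static bool
bulk_ireq_flags_valid(
	uint32_t	flags)
{
	return (flags & ~XFS_BULK_IREQ_FLAGS_ALL) == 0;
}
```

Without METADIR in the mask, a check of this shape would reject a
request that sets the new flag.
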
 libxfs/xfs_fs.h |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index a42c1a33691c0f..499bea4ea8067f 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -490,9 +490,17 @@ struct xfs_bulk_ireq {
  */
 #define XFS_BULK_IREQ_NREXT64	(1U << 2)
 
+/*
+ * Allow bulkstat to return information about metadata directories.  This
+ * enables xfs_scrub to find them for scanning, as they are otherwise ordinary
+ * directories.
+ */
+#define XFS_BULK_IREQ_METADIR	(1U << 3)
+
 #define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO |	 \
 				 XFS_BULK_IREQ_SPECIAL | \
-				 XFS_BULK_IREQ_NREXT64)
+				 XFS_BULK_IREQ_NREXT64 | \
+				 XFS_BULK_IREQ_METADIR)
 
 /* Operate on the root directory inode. */
 #define XFS_BULK_IREQ_SPECIAL_ROOT	(1)



* [PATCH 34/36] xfs: adjust xfs_bmap_add_attrfork for metadir
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (32 preceding siblings ...)
  2024-12-23 21:45   ` [PATCH 33/36] xfs: allow bulkstat to return metadata directories Darrick J. Wong
@ 2024-12-23 21:45   ` Darrick J. Wong
  2024-12-23 21:46   ` [PATCH 35/36] xfs: record health problems with the metadata directory Darrick J. Wong
  2024-12-23 21:46   ` [PATCH 36/36] xfs: check metadata directory file path connectivity Darrick J. Wong
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:45 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 61b6bdb30a4bee1f3417081aedfe9e346538f897

Online repair might use xfs_bmap_add_attrfork to repair a file in the
metadata directory tree if (say) the metadata file lacks the correct
parent pointers.  In that case, it is not correct to check that the file
is dqattached -- metadata files must not have /any/ dquot attached at
all.  Adjust the assertions appropriately.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
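The adjusted assertion can be modelled roughly as follows.  This is a
toy, not the libxfs code: the structures are invented, and the
semantics I give to the two macros (XFS_IS_DQDETACHED meaning "no dquot
pointers at all", XFS_NOT_DQATTACHED meaning "an enabled quota type has
no dquot attached") are my reading of them, so treat that as an
assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for which quota types are enabled on the mount. */
struct toy_quota_state {
	bool		uquota_on;
	bool		gquota_on;
	bool		pquota_on;
};

/* Hypothetical stand-in for the inode fields the assertions look at. */
struct toy_inode {
	bool		is_metadir;
	const void	*udquot;
	const void	*gdquot;
	const void	*pdquot;
};

/* Assumed meaning of XFS_IS_DQDETACHED: no dquots attached at all. */
static bool
toy_is_dqdetached(const struct toy_inode *ip)
{
	return !ip->udquot && !ip->gdquot && !ip->pdquot;
}

/* Assumed meaning of XFS_NOT_DQATTACHED: an enabled type is missing. */
static bool
toy_not_dqattached(const struct toy_quota_state *qs,
		   const struct toy_inode *ip)
{
	return (qs->uquota_on && !ip->udquot) ||
	       (qs->gquota_on && !ip->gdquot) ||
	       (qs->pquota_on && !ip->pdquot);
}

/* The condition that the new ASSERTs expect to hold. */
static bool
toy_attrfork_quota_ok(const struct toy_quota_state *qs,
		      const struct toy_inode *ip)
{
	if (ip->is_metadir)
		return toy_is_dqdetached(ip);
	return !toy_not_dqattached(qs, ip);
}
```

With user quota enabled, a metadir inode passes only when it has no
dquots, whereas a regular inode passes only when its user dquot is
attached.
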
 libxfs/xfs_attr.c |    5 ++++-
 libxfs/xfs_bmap.c |    5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 150aaddf7f9fed..7567014abbe7f0 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -1003,7 +1003,10 @@ xfs_attr_add_fork(
 	unsigned int		blks;		/* space reservation */
 	int			error;		/* error return value */
 
-	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
+	if (xfs_is_metadir_inode(ip))
+		ASSERT(XFS_IS_DQDETACHED(ip));
+	else
+		ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
 
 	blks = XFS_ADDAFORK_SPACE_RES(mp);
 
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 13cd4faa492838..99d53f9383a49a 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -1036,7 +1036,10 @@ xfs_bmap_add_attrfork(
 	int			error;		/* error return value */
 
 	xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
-	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
+	if (xfs_is_metadir_inode(ip))
+		ASSERT(XFS_IS_DQDETACHED(ip));
+	else
+		ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
 	ASSERT(!xfs_inode_has_attr_fork(ip));
 
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);



* [PATCH 35/36] xfs: record health problems with the metadata directory
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (33 preceding siblings ...)
  2024-12-23 21:45   ` [PATCH 34/36] xfs: adjust xfs_bmap_add_attrfork for metadir Darrick J. Wong
@ 2024-12-23 21:46   ` Darrick J. Wong
  2024-12-23 21:46   ` [PATCH 36/36] xfs: check metadata directory file path connectivity Darrick J. Wong
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:46 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: be42fc1393d66024eb6415c92f45fab5d1878c3e

Make a report to the health monitoring subsystem any time we encounter
something in the metadata directory tree that looks like corruption.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
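The pattern applied to xfs_metadir_lookup below is "flag the problem,
then return the corruption error".  A standalone sketch of that shape,
with invented toy_* names and a placeholder errno value; only
XFS_SICK_FS_METADIR is taken from the xfs_health.h hunk, and the real
xfs_fs_mark_sick does more bookkeeping than this toy does:

```c
#include <stdbool.h>
#include <stdint.h>

#define XFS_SICK_FS_METADIR	(1 << 6)  /* from the xfs_health.h hunk */
#define TOY_EFSCORRUPTED	117	  /* hypothetical placeholder value */
#define TOY_FT_UNKNOWN		0	  /* stands in for XFS_DIR3_FT_UNKNOWN */

/* Hypothetical stand-in for the per-mount health state. */
struct toy_mount {
	unsigned int	sick;
};

/* Toy version of marking the filesystem sick: just record the bit. */
static void
toy_fs_mark_sick(struct toy_mount *mp, unsigned int mask)
{
	mp->sick |= mask;
}

/* Validate a lookup result; report to health tracking on corruption. */
static int
toy_check_lookup(
	struct toy_mount	*mp,
	bool			parent_is_dir,
	bool			ino_valid,
	uint8_t			want_type,
	uint8_t			got_type)
{
	bool			corrupt;

	corrupt = !parent_is_dir || !ino_valid ||
		  (want_type != TOY_FT_UNKNOWN && want_type != got_type);
	if (!corrupt)
		return 0;

	toy_fs_mark_sick(mp, XFS_SICK_FS_METADIR);
	return -TOY_EFSCORRUPTED;
}
```

The patch keeps the three checks separate so each one sits next to the
code that can detect it; the toy folds them together only for brevity.
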
 libxfs/xfs_fs.h      |    1 +
 libxfs/xfs_health.h  |    4 +++-
 libxfs/xfs_metadir.c |   13 ++++++++++---
 3 files changed, 14 insertions(+), 4 deletions(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 499bea4ea8067f..b05e6fb1470351 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -198,6 +198,7 @@ struct xfs_fsop_geom {
 #define XFS_FSOP_GEOM_SICK_RT_SUMMARY	(1 << 5)  /* realtime summary */
 #define XFS_FSOP_GEOM_SICK_QUOTACHECK	(1 << 6)  /* quota counts */
 #define XFS_FSOP_GEOM_SICK_NLINKS	(1 << 7)  /* inode link counts */
+#define XFS_FSOP_GEOM_SICK_METADIR	(1 << 8)  /* metadata directory */
 
 /* Output for XFS_FS_COUNTS */
 typedef struct xfs_fsop_counts {
diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h
index 13301420a2f670..f90e8dfc050000 100644
--- a/libxfs/xfs_health.h
+++ b/libxfs/xfs_health.h
@@ -62,6 +62,7 @@ struct xfs_da_args;
 #define XFS_SICK_FS_PQUOTA	(1 << 3)  /* project quota */
 #define XFS_SICK_FS_QUOTACHECK	(1 << 4)  /* quota counts */
 #define XFS_SICK_FS_NLINKS	(1 << 5)  /* inode link counts */
+#define XFS_SICK_FS_METADIR	(1 << 6)  /* metadata directory tree */
 
 /* Observable health issues for realtime volume metadata. */
 #define XFS_SICK_RT_BITMAP	(1 << 0)  /* realtime bitmap */
@@ -105,7 +106,8 @@ struct xfs_da_args;
 				 XFS_SICK_FS_GQUOTA | \
 				 XFS_SICK_FS_PQUOTA | \
 				 XFS_SICK_FS_QUOTACHECK | \
-				 XFS_SICK_FS_NLINKS)
+				 XFS_SICK_FS_NLINKS | \
+				 XFS_SICK_FS_METADIR)
 
 #define XFS_SICK_RT_PRIMARY	(XFS_SICK_RT_BITMAP | \
 				 XFS_SICK_RT_SUMMARY)
diff --git a/libxfs/xfs_metadir.c b/libxfs/xfs_metadir.c
index b52abf12d20511..b5f05925e73a4e 100644
--- a/libxfs/xfs_metadir.c
+++ b/libxfs/xfs_metadir.c
@@ -27,6 +27,7 @@
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
 #include "xfs_parent.h"
+#include "xfs_health.h"
 
 /*
  * Metadata Directory Tree
@@ -93,8 +94,10 @@ xfs_metadir_lookup(
 	};
 	int			error;
 
-	if (!S_ISDIR(VFS_I(dp)->i_mode))
+	if (!S_ISDIR(VFS_I(dp)->i_mode)) {
+		xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR);
 		return -EFSCORRUPTED;
+	}
 	if (xfs_is_shutdown(mp))
 		return -EIO;
 
@@ -102,10 +105,14 @@ xfs_metadir_lookup(
 	if (error)
 		return error;
 
-	if (!xfs_verify_ino(mp, args.inumber))
+	if (!xfs_verify_ino(mp, args.inumber)) {
+		xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR);
 		return -EFSCORRUPTED;
-	if (xname->type != XFS_DIR3_FT_UNKNOWN && xname->type != args.filetype)
+	}
+	if (xname->type != XFS_DIR3_FT_UNKNOWN && xname->type != args.filetype) {
+		xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR);
 		return -EFSCORRUPTED;
+	}
 
 	trace_xfs_metadir_lookup(dp, xname, args.inumber);
 	*ino = args.inumber;



* [PATCH 36/36] xfs: check metadata directory file path connectivity
  2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                     ` (34 preceding siblings ...)
  2024-12-23 21:46   ` [PATCH 35/36] xfs: record health problems with the metadata directory Darrick J. Wong
@ 2024-12-23 21:46   ` Darrick J. Wong
  35 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:46 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: b3c03efa5972f084e40104307dbe432359279cf2

Create a new scrubber type that checks that well-known metadata
directory paths are connected to the metadata inode that the incore
structures think is in use.  For example, check that "/quota/user" in
the metadata directory tree actually points to
mp->m_quotainfo->qi_uquotaip->i_ino.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
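As a rough illustration of how the new constants fit together: the
scrub type selects the metapath scrubber and sm_ino selects which path
to check, with PROBE being the only value defined so far.  The struct
below is a cut-down, hypothetical stand-in for struct
xfs_scrub_metadata, and toy_metapath_req_init is not a real function.

```c
#include <stdint.h>
#include <string.h>

/* Values copied from the xfs_fs.h hunks in this patch. */
#define XFS_SCRUB_TYPE_METAPATH		29
#define XFS_SCRUB_METAPATH_PROBE	(0)
#define XFS_SCRUB_METAPATH_NR		(1)

/* Hypothetical, reduced stand-in for the scrub request structure. */
struct toy_scrub_req {
	uint32_t	sm_type;	/* which scrubber to run */
	uint64_t	sm_ino;		/* for metapath: which path to check */
};

/* Fill out a metapath request; reject selectors we do not know about. */
static int
toy_metapath_req_init(
	struct toy_scrub_req	*req,
	uint64_t		which)
{
	if (which >= XFS_SCRUB_METAPATH_NR)
		return -1;

	memset(req, 0, sizeof(*req));
	req->sm_type = XFS_SCRUB_TYPE_METAPATH;
	req->sm_ino = which;
	return 0;
}
```

Probing with XFS_SCRUB_METAPATH_PROBE lets a caller find out whether
the scrubber exists before asking about a specific path.
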
 libxfs/xfs_fs.h     |   13 ++++++++++++-
 libxfs/xfs_health.h |    4 +++-
 2 files changed, 15 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index b05e6fb1470351..faa38a7d1eb019 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -199,6 +199,7 @@ struct xfs_fsop_geom {
 #define XFS_FSOP_GEOM_SICK_QUOTACHECK	(1 << 6)  /* quota counts */
 #define XFS_FSOP_GEOM_SICK_NLINKS	(1 << 7)  /* inode link counts */
 #define XFS_FSOP_GEOM_SICK_METADIR	(1 << 8)  /* metadata directory */
+#define XFS_FSOP_GEOM_SICK_METAPATH	(1 << 9)  /* metadir tree path */
 
 /* Output for XFS_FS_COUNTS */
 typedef struct xfs_fsop_counts {
@@ -732,9 +733,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_NLINKS	26	/* inode link counts */
 #define XFS_SCRUB_TYPE_HEALTHY	27	/* everything checked out ok */
 #define XFS_SCRUB_TYPE_DIRTREE	28	/* directory tree structure */
+#define XFS_SCRUB_TYPE_METAPATH	29	/* metadata directory tree paths */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	29
+#define XFS_SCRUB_TYPE_NR	30
 
 /*
  * This special type code only applies to the vectored scrub implementation.
@@ -812,6 +814,15 @@ struct xfs_scrub_vec_head {
 
 #define XFS_SCRUB_VEC_FLAGS_ALL		(0)
 
+/*
+ * i: sm_ino values for XFS_SCRUB_TYPE_METAPATH to select a metadata file for
+ * path checking.
+ */
+#define XFS_SCRUB_METAPATH_PROBE	(0)  /* do we have a metapath scrubber? */
+
+/* Number of metapath sm_ino values */
+#define XFS_SCRUB_METAPATH_NR		(1)
+
 /*
  * ioctl limits
  */
diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h
index f90e8dfc050000..a23df94319e5fb 100644
--- a/libxfs/xfs_health.h
+++ b/libxfs/xfs_health.h
@@ -63,6 +63,7 @@ struct xfs_da_args;
 #define XFS_SICK_FS_QUOTACHECK	(1 << 4)  /* quota counts */
 #define XFS_SICK_FS_NLINKS	(1 << 5)  /* inode link counts */
 #define XFS_SICK_FS_METADIR	(1 << 6)  /* metadata directory tree */
+#define XFS_SICK_FS_METAPATH	(1 << 7)  /* metadata directory tree path */
 
 /* Observable health issues for realtime volume metadata. */
 #define XFS_SICK_RT_BITMAP	(1 << 0)  /* realtime bitmap */
@@ -107,7 +108,8 @@ struct xfs_da_args;
 				 XFS_SICK_FS_PQUOTA | \
 				 XFS_SICK_FS_QUOTACHECK | \
 				 XFS_SICK_FS_NLINKS | \
-				 XFS_SICK_FS_METADIR)
+				 XFS_SICK_FS_METADIR | \
+				 XFS_SICK_FS_METAPATH)
 
 #define XFS_SICK_RT_PRIMARY	(XFS_SICK_RT_BITMAP | \
 				 XFS_SICK_RT_SUMMARY)



* [PATCH 01/41] libxfs: constify the xfs_inode predicates
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
@ 2024-12-23 21:46   ` Darrick J. Wong
  2024-12-23 21:47   ` [PATCH 02/41] libxfs: load metadata directory root at mount time Darrick J. Wong
                     ` (39 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:46 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Change the xfs_inode predicates to take a const struct xfs_inode pointer
because they do not change the inode.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
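A tiny standalone example of what the constification buys: a caller
that only holds a const pointer can use the predicate without casting
the qualifier away.  The toy_* names and the flag value are invented
for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_DIFLAG2_REFLINK	(1ULL << 1)	/* hypothetical flag bit */

/* Hypothetical stand-in for the one inode field the predicate reads. */
struct toy_inode {
	uint64_t	diflags2;
};

/* The predicate only reads the inode, so it can take a const pointer. */
static inline bool
toy_is_reflink_inode(const struct toy_inode *ip)
{
	return ip->diflags2 & TOY_DIFLAG2_REFLINK;
}

/* A read-only caller can now pass its const pointer straight through. */
static bool
toy_needs_cow(const struct toy_inode *ip)
{
	return toy_is_reflink_inode(ip);
}
```

Had toy_is_reflink_inode taken a non-const pointer, toy_needs_cow would
have needed a cast or a non-const parameter of its own.
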
 include/xfs_inode.h |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)


diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index 6f2d23987d5f8a..30e171696c80e2 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -248,7 +248,7 @@ typedef struct xfs_inode {
 	struct inode		i_vnode;
 } xfs_inode_t;
 
-static inline bool xfs_inode_has_attr_fork(struct xfs_inode *ip)
+static inline bool xfs_inode_has_attr_fork(const struct xfs_inode *ip)
 {
 	return ip->i_forkoff > 0;
 }
@@ -372,17 +372,17 @@ static inline void drop_nlink(struct inode *inode)
 	inode->i_nlink--;
 }
 
-static inline bool xfs_is_reflink_inode(struct xfs_inode *ip)
+static inline bool xfs_is_reflink_inode(const struct xfs_inode *ip)
 {
 	return ip->i_diflags2 & XFS_DIFLAG2_REFLINK;
 }
 
-static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
+static inline bool xfs_inode_has_bigtime(const struct xfs_inode *ip)
 {
 	return ip->i_diflags2 & XFS_DIFLAG2_BIGTIME;
 }
 
-static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
+static inline bool xfs_inode_has_large_extent_counts(const struct xfs_inode *ip)
 {
 	return ip->i_diflags2 & XFS_DIFLAG2_NREXT64;
 }
@@ -392,12 +392,12 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
  * Decide if this file is a realtime file whose data allocation unit is larger
  * than a single filesystem block.
  */
-static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
+static inline bool xfs_inode_has_bigrtalloc(const struct xfs_inode *ip)
 {
 	return XFS_IS_REALTIME_INODE(ip) && ip->i_mount->m_sb.sb_rextsize > 1;
 }
 
-static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
+static inline bool xfs_is_always_cow_inode(const struct xfs_inode *ip)
 {
 	return false;
 }



* [PATCH 02/41] libxfs: load metadata directory root at mount time
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
  2024-12-23 21:46   ` [PATCH 01/41] libxfs: constify the xfs_inode predicates Darrick J. Wong
@ 2024-12-23 21:47   ` Darrick J. Wong
  2024-12-23 21:47   ` [PATCH 03/41] libxfs: enforce metadata inode flag Darrick J. Wong
                     ` (38 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:47 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Load the metadata directory root inode into memory at mount time and
release it at unmount time.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
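The lifecycle added here can be sketched as a toy model: take a
reference on the metadir root at mount if the feature is enabled and
the filesystem is not still being built, and drop it at unmount if it
was ever taken.  All toy_* names are hypothetical, and the reference
counting is a simplification of what iget/irele do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached inode with a reference count. */
struct toy_inode {
	int		refcount;
};

/* Hypothetical stand-in for the mount fields involved. */
struct toy_mount {
	bool			has_metadir;	/* xfs_has_metadir(mp) */
	bool			inprogress;	/* mp->m_sb.sb_inprogress */
	struct toy_inode	*metadirip;	/* mp->m_metadirip */
};

/* Mount side: grab the root only when it can be expected to exist. */
static void
toy_mount_setup_metadir(struct toy_mount *mp, struct toy_inode *root)
{
	if (!mp->has_metadir || mp->inprogress)
		return;

	root->refcount++;
	mp->metadirip = root;
}

/* Unmount side: release the root only if mount actually loaded it. */
static void
toy_umount(struct toy_mount *mp)
{
	if (!mp->metadirip)
		return;

	mp->metadirip->refcount--;
	mp->metadirip = NULL;	/* the toy clears this; the patch does not */
}
```

The NULL check on the unmount side is what makes the skipped and
failed-load cases safe, since m_metadirip stays unset in both.
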
 include/xfs_mount.h |    1 +
 libxfs/init.c       |   26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)
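
The lifecycle this patch sets up (take a reference on the metadir root at
mount, drop it at unmount) can be modelled on its own. This is only a sketch
under stated assumptions: `struct sketch_dir_inode`, `struct sketch_mount`
and both helpers are hypothetical stand-ins, not libxfs types or functions,
and clearing the pointer on release is a choice made for the sketch.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are not the libxfs structures. */
struct sketch_dir_inode {
	int			refcount;
};

struct sketch_mount {
	struct sketch_dir_inode	*metadirip;	/* models mp->m_metadirip */
};

/* Take a reference on the root unless the filesystem is still being built. */
static void
sketch_mount_metadir(
	struct sketch_mount	*mp,
	struct sketch_dir_inode	*root,
	int			inprogress)
{
	if (inprogress)
		return;
	root->refcount++;
	mp->metadirip = root;
}

/* Drop the reference only if one was taken at mount time. */
static void
sketch_umount_metadir(
	struct sketch_mount	*mp)
{
	if (!mp->metadirip)
		return;
	mp->metadirip->refcount--;
	mp->metadirip = NULL;	/* sketch-only: makes a second call harmless */
}
```

The NULL check in the release helper mirrors the guard in libxfs_umount: a
filesystem without a metadir, or one under construction, never set the
pointer in the first place.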


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index c7fada9e2a6d70..6daf3f01ffa9cf 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -91,6 +91,7 @@ typedef struct xfs_mount {
 	uint8_t			*m_rsum_cache;
 	struct xfs_inode	*m_rbmip;	/* pointer to bitmap inode */
 	struct xfs_inode	*m_rsumip;	/* pointer to summary inode */
+	struct xfs_inode	*m_metadirip;	/* ptr to metadata directory */
 	struct xfs_buftarg	*m_ddev_targp;
 	struct xfs_buftarg	*m_logdev_targp;
 	struct xfs_buftarg	*m_rtdev_targp;
diff --git a/libxfs/init.c b/libxfs/init.c
index beb58706629d23..bf488c5d8533b1 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -619,6 +619,27 @@ libxfs_compute_all_maxlevels(
 
 }
 
+/* Mount the metadata files under the metadata directory tree. */
+STATIC void
+libxfs_mount_setup_metadir(
+	struct xfs_mount	*mp)
+{
+	int			error;
+
+	/* Ignore filesystems that are under construction. */
+	if (mp->m_sb.sb_inprogress)
+		return;
+
+	error = -libxfs_metafile_iget(mp, mp->m_sb.sb_metadirino,
+			XFS_METAFILE_DIR, &mp->m_metadirip);
+	if (error) {
+		fprintf(stderr,
+ _("%s: Failed to load metadir root directory, error %d\n"),
+					progname, error);
+		return;
+	}
+}
+
 /*
  * precalculate the low space thresholds for dynamic speculative preallocation.
  */
@@ -800,6 +821,9 @@ libxfs_mount(
 	}
 	xfs_set_perag_data_loaded(mp);
 
+	if (xfs_has_metadir(mp))
+		libxfs_mount_setup_metadir(mp);
+
 	return mp;
 out_da:
 	xfs_da_unmount(mp);
@@ -918,6 +942,8 @@ libxfs_umount(
 	int			error;
 
 	libxfs_rtmount_destroy(mp);
+	if (mp->m_metadirip)
+		libxfs_irele(mp->m_metadirip);
 
 	/*
 	 * Purge the buffer cache to write all dirty buffers to disk and free



* [PATCH 03/41] libxfs: enforce metadata inode flag
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
  2024-12-23 21:46   ` [PATCH 01/41] libxfs: constify the xfs_inode predicates Darrick J. Wong
  2024-12-23 21:47   ` [PATCH 02/41] libxfs: load metadata directory root at mount time Darrick J. Wong
@ 2024-12-23 21:47   ` Darrick J. Wong
  2024-12-23 21:47   ` [PATCH 04/41] man2: document metadata directory flag in fsgeom ioctl Darrick J. Wong
                     ` (37 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:47 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add checks for the metadata inode flag so that we don't ever leak
metadata inodes out to userspace, and we don't ever try to read a
regular inode as metadata.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/inode.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
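
For readers without the libxfs tree at hand, the two new checks can be
sketched in isolation. This is a reduced model, not the libxfs code:
`struct sketch_inode`, `sketch_metafile_ok()` and the enum values are
hypothetical stand-ins for the inode, xfs_is_metadir_inode() and
ip->i_metatype, and the existing file-mode check is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the libxfs definitions. */
enum sketch_metatype {
	SKETCH_META_UNKNOWN,
	SKETCH_META_DIR,
	SKETCH_META_RTBITMAP,
};

struct sketch_inode {
	bool			is_metadir;	/* models xfs_is_metadir_inode() */
	enum sketch_metatype	metatype;	/* models ip->i_metatype */
};

/*
 * Return true if the inode may be handed back as the requested metadata
 * file.  Without a metadata directory tree, no extra checks apply.
 */
static bool
sketch_metafile_ok(
	bool				fs_has_metadir,
	const struct sketch_inode	*ip,
	enum sketch_metatype		wanted)
{
	if (!fs_has_metadir)
		return true;
	if (!ip->is_metadir)
		return false;
	return ip->metatype == wanted;
}
```

In this model a regular inode is rejected on a metadir filesystem, and a
metadata inode is rejected when the caller asks for a different type.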


diff --git a/libxfs/inode.c b/libxfs/inode.c
index 1eb0bccae48906..0598a70ff504a4 100644
--- a/libxfs/inode.c
+++ b/libxfs/inode.c
@@ -208,7 +208,8 @@ libxfs_iget(
 /*
  * Get a metadata inode.
  *
- * The metafile type must match the file mode exactly.
+ * The metafile type must match the file mode exactly, and for files in the
+ * metadata directory tree, it must match the inode's metatype exactly.
  */
 int
 libxfs_trans_metafile_iget(
@@ -232,6 +233,12 @@ libxfs_trans_metafile_iget(
 		mode = S_IFREG;
 	if (inode_wrong_type(VFS_I(ip), mode))
 		goto bad_rele;
+	if (xfs_has_metadir(mp)) {
+		if (!xfs_is_metadir_inode(ip))
+			goto bad_rele;
+		if (metafile_type != ip->i_metatype)
+			goto bad_rele;
+	}
 
 	*ipp = ip;
 	return 0;



* [PATCH 04/41] man2: document metadata directory flag in fsgeom ioctl
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-12-23 21:47   ` [PATCH 03/41] libxfs: enforce metadata inode flag Darrick J. Wong
@ 2024-12-23 21:47   ` Darrick J. Wong
  2024-12-23 21:47   ` [PATCH 05/41] man: update scrub ioctl documentation for metadir Darrick J. Wong
                     ` (36 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:47 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Document the additions to the fsgeometry ioctl for metadata directory
trees.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 man/man2/ioctl_xfs_fsgeometry.2 |    3 +++
 1 file changed, 3 insertions(+)


diff --git a/man/man2/ioctl_xfs_fsgeometry.2 b/man/man2/ioctl_xfs_fsgeometry.2
index db7698fa922b87..c808ad5b8b9190 100644
--- a/man/man2/ioctl_xfs_fsgeometry.2
+++ b/man/man2/ioctl_xfs_fsgeometry.2
@@ -214,6 +214,9 @@ .SH FILESYSTEM FEATURE FLAGS
 .TP
 .B XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE
 Filesystem can exchange file contents atomically via XFS_IOC_EXCHANGE_RANGE.
+.TP
+.B XFS_FSOP_GEOM_FLAGS_METADIR
+Filesystem contains a metadata directory tree.
 .RE
 .SH XFS METADATA HEALTH REPORTING
 .PP



* [PATCH 05/41] man: update scrub ioctl documentation for metadir
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-12-23 21:47   ` [PATCH 04/41] man2: document metadata directory flag in fsgeom ioctl Darrick J. Wong
@ 2024-12-23 21:47   ` Darrick J. Wong
  2024-12-23 21:48   ` [PATCH 06/41] libfrog: report metadata directories in the geometry report Darrick J. Wong
                     ` (35 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:47 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Update the scrub ioctl manpage to reflect the new metadir path scrubber.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 man/man2/ioctl_xfs_scrub_metadata.2 |   44 +++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
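
To illustrate the calling convention the manpage text describes, here is a
rough sketch of preparing a metapath scrub request. Everything named
sketch_* or SKETCH_* is a hypothetical stand-in: the struct is a reduced
model of the scrub request, and the numeric values are placeholders chosen
for the example. Real callers should take the structure and constants from
the XFS headers.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder values for illustration only; use the XFS headers instead. */
#define SKETCH_SCRUB_TYPE_METAPATH	29
#define SKETCH_METAPATH_RTBITMAP	2

/* Reduced model of the scrub request; not the real ioctl structure. */
struct sketch_scrub_metadata {
	uint32_t	sm_type;
	uint32_t	sm_flags;
	uint64_t	sm_ino;
	uint32_t	sm_gen;
	uint32_t	sm_agno;
};

/*
 * Fill out a metapath request: the path selector travels in sm_ino, and
 * sm_agno and sm_gen stay zero as the manpage requires.
 */
static void
sketch_prep_metapath(
	struct sketch_scrub_metadata	*sm,
	uint64_t			which_path)
{
	memset(sm, 0, sizeof(*sm));
	sm->sm_type = SKETCH_SCRUB_TYPE_METAPATH;
	sm->sm_ino = which_path;
}
```

Zeroing the whole structure first is the simplest way to satisfy the
"must be zero" rule for the fields that this scrub type does not use.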


diff --git a/man/man2/ioctl_xfs_scrub_metadata.2 b/man/man2/ioctl_xfs_scrub_metadata.2
index 44aa139b297a3b..1e7e327b37d226 100644
--- a/man/man2/ioctl_xfs_scrub_metadata.2
+++ b/man/man2/ioctl_xfs_scrub_metadata.2
@@ -200,6 +200,50 @@ .SH DESCRIPTION
 Mark everything healthy after a clean scrub run.
 This clears out all the indirect health problem markers that might remain
 in the system.
+
+.TP
+.B XFS_SCRUB_TYPE_METAPATH
+Check that a metadata directory path actually points to the active metadata
+inode.
+Metadata inodes are usually cached for the duration of the mount, so this
+scrubber ensures that the same inode will still be reachable after an unmount
+and mount cycle.
+Discrepancies can happen if the directory or parent pointer scrubbers rebuild
+a metadata directory but lose a link in the process.
+The
+.B sm_ino
+field should be passed one of the following special values to communicate which
+path to check:
+
+.RS 7
+.TP
+.B XFS_SCRUB_METAPATH_RTDIR
+Realtime metadata file subdirectory.
+.TP
+.B XFS_SCRUB_METAPATH_RTBITMAP
+Realtime bitmap file.
+.TP
+.B XFS_SCRUB_METAPATH_RTSUMMARY
+Realtime summary file.
+.TP
+.B XFS_SCRUB_METAPATH_QUOTADIR
+Quota metadata file subdirectory.
+.TP
+.B XFS_SCRUB_METAPATH_USRQUOTA
+User quota file.
+.TP
+.B XFS_SCRUB_METAPATH_GRPQUOTA
+Group quota file.
+.TP
+.B XFS_SCRUB_METAPATH_PRJQUOTA
+Project quota file.
+.RE
+
+The values of
+.I sm_agno
+and
+.I sm_gen
+must be zero.
 .RE
 
 .PD 1



* [PATCH 06/41] libfrog: report metadata directories in the geometry report
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-12-23 21:47   ` [PATCH 05/41] man: update scrub ioctl documentation for metadir Darrick J. Wong
@ 2024-12-23 21:48   ` Darrick J. Wong
  2024-12-23 21:48   ` [PATCH 07/41] libfrog: allow METADIR in xfrog_bulkstat_single5 Darrick J. Wong
                     ` (34 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:48 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Report the presence of a metadata directory tree in the geometry report.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/fsgeom.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index 597c38b1140250..67b4e65713be5b 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -33,6 +33,7 @@ xfs_report_geom(
 	int			nrext64;
 	int			exchangerange;
 	int			parent;
+	int			metadir;
 
 	isint = geo->logstart > 0;
 	lazycount = geo->flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0;
@@ -53,13 +54,14 @@ xfs_report_geom(
 	nrext64 = geo->flags & XFS_FSOP_GEOM_FLAGS_NREXT64 ? 1 : 0;
 	exchangerange = geo->flags & XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE ? 1 : 0;
 	parent = geo->flags & XFS_FSOP_GEOM_FLAGS_PARENT ? 1 : 0;
+	metadir = geo->flags & XFS_FSOP_GEOM_FLAGS_METADIR ? 1 : 0;
 
 	printf(_(
 "meta-data=%-22s isize=%-6d agcount=%u, agsize=%u blks\n"
 "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
 "         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u\n"
 "         =%-22s reflink=%-4u bigtime=%u inobtcount=%u nrext64=%u\n"
-"         =%-22s exchange=%-3u\n"
+"         =%-22s exchange=%-3u metadir=%u\n"
 "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 "         =%-22s sunit=%-6u swidth=%u blks\n"
 "naming   =version %-14u bsize=%-6u ascii-ci=%d, ftype=%d, parent=%d\n"
@@ -70,7 +72,7 @@ xfs_report_geom(
 		"", geo->sectsize, attrversion, projid32bit,
 		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
 		"", reflink_enabled, bigtime_enabled, inobtcount, nrext64,
-		"", exchangerange,
+		"", exchangerange, metadir,
 		"", geo->blocksize, (unsigned long long)geo->datablocks,
 			geo->imaxpct,
 		"", geo->sunit, geo->swidth,



* [PATCH 07/41] libfrog: allow METADIR in xfrog_bulkstat_single5
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-12-23 21:48   ` [PATCH 06/41] libfrog: report metadata directories in the geometry report Darrick J. Wong
@ 2024-12-23 21:48   ` Darrick J. Wong
  2024-12-23 21:48   ` [PATCH 08/41] xfs_io: support scrubbing metadata directory paths Darrick J. Wong
                     ` (33 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:48 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

XFS_BULK_IREQ_METADIR is a valid flag for a single-file bulkstat, so add
it to the flag filter.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/bulkstat.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
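
The filter in question is the usual "reject any bit outside the allowed
mask" idiom. A minimal sketch follows, assuming made-up bit positions: the
SKETCH_* flags and `sketch_check_flags()` are hypothetical and do not match
the real XFS_BULK_IREQ_* values.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit assignments; not the real XFS_BULK_IREQ_* values. */
#define SKETCH_IREQ_SPECIAL	(1U << 0)
#define SKETCH_IREQ_NREXT64	(1U << 1)
#define SKETCH_IREQ_METADIR	(1U << 2)

#define SKETCH_IREQ_ALLOWED	(SKETCH_IREQ_SPECIAL | \
				 SKETCH_IREQ_NREXT64 | \
				 SKETCH_IREQ_METADIR)

/* Return 0 if every set bit is in the allowed mask, else -EINVAL. */
static int
sketch_check_flags(
	uint32_t	flags)
{
	if (flags & ~SKETCH_IREQ_ALLOWED)
		return -EINVAL;
	return 0;
}
```

Adding a flag to the mask is all it takes to let it through; any bit that
is not listed still fails with -EINVAL.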


diff --git a/libfrog/bulkstat.c b/libfrog/bulkstat.c
index c863bcb6bf89b8..6eceef2a8fa6b9 100644
--- a/libfrog/bulkstat.c
+++ b/libfrog/bulkstat.c
@@ -53,7 +53,8 @@ xfrog_bulkstat_single5(
 	struct xfs_bulkstat_req		*req;
 	int				ret;
 
-	if (flags & ~(XFS_BULK_IREQ_SPECIAL | XFS_BULK_IREQ_NREXT64))
+	if (flags & ~(XFS_BULK_IREQ_SPECIAL | XFS_BULK_IREQ_NREXT64 |
+		      XFS_BULK_IREQ_METADIR))
 		return -EINVAL;
 
 	if (xfd->fsgeom.flags & XFS_FSOP_GEOM_FLAGS_NREXT64)



* [PATCH 08/41] xfs_io: support scrubbing metadata directory paths
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-12-23 21:48   ` [PATCH 07/41] libfrog: allow METADIR in xfrog_bulkstat_single5 Darrick J. Wong
@ 2024-12-23 21:48   ` Darrick J. Wong
  2024-12-23 21:48   ` [PATCH 09/41] xfs_db: disable xfs_check when metadir is enabled Darrick J. Wong
                     ` (32 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:48 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Support invoking the metadata directory path scrubber from xfs_io for
testing.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/scrub.c |   14 +++++++++++++-
 libfrog/scrub.h |    2 ++
 scrub/scrub.c   |   18 ++++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)
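
The description formatter added here is a bounds-checked table lookup. The
pattern can be sketched without the libfrog catalog; `sketch_path_descr[]`,
`SKETCH_PATH_*` and `sketch_format_path()` are hypothetical names, and the
gettext wrapping is omitted.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical catalog with a single entry, indexed by path selector. */
enum {
	SKETCH_PATH_PROBE,
	SKETCH_PATH_NR,
};

static const char *const sketch_path_descr[SKETCH_PATH_NR] = {
	[SKETCH_PATH_PROBE]	= "metapath",
};

/*
 * Describe a path selector.  Out-of-range selectors get a generic message
 * instead of indexing past the end of the table.
 */
static int
sketch_format_path(
	char			*buf,
	size_t			buflen,
	unsigned long long	which)
{
	if (which >= SKETCH_PATH_NR)
		return snprintf(buf, buflen, "unknown metadir path %llu",
				which);
	return snprintf(buf, buflen, "%s", sketch_path_descr[which]);
}
```

Checking the selector before the lookup matters because the value comes
from the caller and may name a path that this build does not know about.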


diff --git a/libfrog/scrub.c b/libfrog/scrub.c
index e233c0f9c8e1e6..b2d58c7a966b0d 100644
--- a/libfrog/scrub.c
+++ b/libfrog/scrub.c
@@ -154,8 +154,20 @@ const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = {
 		.descr	= "directory tree structure",
 		.group	= XFROG_SCRUB_GROUP_INODE,
 	},
+	[XFS_SCRUB_TYPE_METAPATH] = {
+		.name	= "metapath",
+		.descr	= "metadata directory paths",
+		.group	= XFROG_SCRUB_GROUP_METAPATH,
+	},
+};
+
+const struct xfrog_scrub_descr xfrog_metapaths[XFS_SCRUB_METAPATH_NR] = {
+	[XFS_SCRUB_METAPATH_PROBE] = {
+		.name	= "probe",
+		.descr	= "metapath",
+		.group	= XFROG_SCRUB_GROUP_NONE,
+	},
 };
-#undef DEP
 
 /* Invoke the scrub ioctl.  Returns zero or negative error code. */
 int
diff --git a/libfrog/scrub.h b/libfrog/scrub.h
index b564c0d7bd0f55..a35d3e9c293fe5 100644
--- a/libfrog/scrub.h
+++ b/libfrog/scrub.h
@@ -15,6 +15,7 @@ enum xfrog_scrub_group {
 	XFROG_SCRUB_GROUP_INODE,	/* per-inode metadata */
 	XFROG_SCRUB_GROUP_ISCAN,	/* metadata requiring full inode scan */
 	XFROG_SCRUB_GROUP_SUMMARY,	/* summary metadata */
+	XFROG_SCRUB_GROUP_METAPATH,	/* metadata directory path */
 };
 
 /* Catalog of scrub types and names, indexed by XFS_SCRUB_TYPE_* */
@@ -25,6 +26,7 @@ struct xfrog_scrub_descr {
 };
 
 extern const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR];
+extern const struct xfrog_scrub_descr xfrog_metapaths[XFS_SCRUB_METAPATH_NR];
 
 int xfrog_scrub_metadata(struct xfs_fd *xfd, struct xfs_scrub_metadata *meta);
 
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 44c4049899d29e..bcd63eea1030a6 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -53,6 +53,22 @@ static const unsigned int scrub_deps[XFS_SCRUB_TYPE_NR] = {
 };
 #undef DEP
 
+static int
+format_metapath_descr(
+	char				*buf,
+	size_t				buflen,
+	struct xfs_scrub_vec_head	*vhead)
+{
+	const struct xfrog_scrub_descr	*sc;
+
+	if (vhead->svh_ino >= XFS_SCRUB_METAPATH_NR)
+		return snprintf(buf, buflen, _("unknown metadir path %llu"),
+				(unsigned long long)vhead->svh_ino);
+
+	sc = &xfrog_metapaths[vhead->svh_ino];
+	return snprintf(buf, buflen, "%s", _(sc->descr));
+}
+
 /* Describe the current state of a vectored scrub. */
 int
 format_scrubv_descr(
@@ -89,6 +105,8 @@ format_scrubv_descr(
 	case XFROG_SCRUB_GROUP_ISCAN:
 	case XFROG_SCRUB_GROUP_NONE:
 		return snprintf(buf, buflen, _("%s"), _(sc->descr));
+	case XFROG_SCRUB_GROUP_METAPATH:
+		return format_metapath_descr(buf, buflen, vhead);
 	}
 	return -1;
 }



* [PATCH 09/41] xfs_db: disable xfs_check when metadir is enabled
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-12-23 21:48   ` [PATCH 08/41] xfs_io: support scrubbing metadata directory paths Darrick J. Wong
@ 2024-12-23 21:48   ` Darrick J. Wong
  2024-12-23 21:49   ` [PATCH 10/41] xfs_db: report metadir support for version command Darrick J. Wong
                     ` (31 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:48 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

As of July 2024, xfs_repair can detect more types of corruptions than
xfs_check does.  I don't think it makes sense to maintain the xfs_check
code anymore, so let's just turn it off for any filesystem that has
metadata directory trees.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/check.c |    6 ++++++
 1 file changed, 6 insertions(+)


diff --git a/db/check.c b/db/check.c
index fb7b6cb41a3fbf..37306bd7a6ac2d 100644
--- a/db/check.c
+++ b/db/check.c
@@ -831,6 +831,12 @@ blockget_f(
 		dbprefix = oldprefix;
 		return 0;
 	}
+
+	if (xfs_has_metadir(mp)) {
+		dbprefix = oldprefix;
+		return 0;
+	}
+
 	check_rootdir();
 	/*
 	 * Check that there are no blocks either



* [PATCH 10/41] xfs_db: report metadir support for version command
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-12-23 21:48   ` [PATCH 09/41] xfs_db: disable xfs_check when metadir is enabled Darrick J. Wong
@ 2024-12-23 21:49   ` Darrick J. Wong
  2024-12-23 21:49   ` [PATCH 11/41] xfs_db: don't obfuscate metadata directories and attributes Darrick J. Wong
                     ` (30 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:49 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Report metadir support if we have it enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/inode.c |    3 +++
 db/sb.c    |    2 ++
 2 files changed, 5 insertions(+)


diff --git a/db/inode.c b/db/inode.c
index 246febb5929aa1..0aff68083508cb 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -207,6 +207,9 @@ const field_t	inode_v3_flds[] = {
 	{ "nrext64", FLDT_UINT1,
 	  OI(COFF(flags2) + bitsz(uint64_t) - XFS_DIFLAG2_NREXT64_BIT - 1), C1,
 	  0, TYP_NONE },
+	{ "metadata", FLDT_UINT1,
+	  OI(COFF(flags2) + bitsz(uint64_t) - XFS_DIFLAG2_METADATA_BIT-1), C1,
+	  0, TYP_NONE },
 	{ NULL }
 };
 
diff --git a/db/sb.c b/db/sb.c
index 9fcb7340f8b02f..4f115650e1283f 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -710,6 +710,8 @@ version_string(
 		strcat(s, ",EXCHANGE");
 	if (xfs_has_parent(mp))
 		strcat(s, ",PARENT");
+	if (xfs_has_metadir(mp))
+		strcat(s, ",METADIR");
 	return s;
 }
 



* [PATCH 11/41] xfs_db: don't obfuscate metadata directories and attributes
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-12-23 21:49   ` [PATCH 10/41] xfs_db: report metadir support for version command Darrick J. Wong
@ 2024-12-23 21:49   ` Darrick J. Wong
  2024-12-23 21:49   ` [PATCH 12/41] xfs_db: support metadata directories in the path command Darrick J. Wong
                     ` (29 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:49 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Don't obfuscate the directory and attribute names of metadata inodes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/metadump.c |  385 +++++++++++++++++++++++++++------------------------------
 1 file changed, 183 insertions(+), 202 deletions(-)
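
Most of the churn below is plumbing an is_meta flag down to the places that
decide whether to obfuscate. The decision itself reduces to one predicate,
sketched here with a hypothetical `struct sketch_metadump` standing in for
the global metadump state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the metadump options; not the xfs_db type. */
struct sketch_metadump {
	bool	obfuscate;
};

/*
 * Names are obfuscated only when obfuscation was requested and the owning
 * inode is not a metadata inode.
 */
static bool
sketch_want_obfuscate(
	const struct sketch_metadump	*md,
	bool				is_meta)
{
	return md->obfuscate && !is_meta;
}
```

With obfuscation off nothing changes; with it on, only user-visible files
have their names scrambled, so the metadata directory tree stays readable
in the dump.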


diff --git a/db/metadump.c b/db/metadump.c
index 5c57c1293a53cb..144ebfbfe2a90e 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -1056,10 +1056,26 @@ generate_obfuscated_name(
 		free(orig_name);
 }
 
+static inline bool
+is_metadata_ino(
+	struct xfs_dinode	*dip)
+{
+	if (!xfs_has_metadir(mp) || dip->di_version < 3)
+		return false;
+	return dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA);
+}
+
+static inline bool
+want_obfuscate_dirents(bool is_meta)
+{
+	return metadump.obfuscate && !is_meta;
+}
+
 static void
 process_sf_dir(
 	struct xfs_dinode	*dip)
 {
+	bool			is_meta = is_metadata_ino(dip);
 	struct xfs_dir2_sf_hdr	*sfp;
 	xfs_dir2_sf_entry_t	*sfep;
 	uint64_t		ino_dir_size;
@@ -1105,7 +1121,7 @@ process_sf_dir(
 					 (char *)sfp);
 		}
 
-		if (metadump.obfuscate)
+		if (want_obfuscate_dirents(is_meta))
 			generate_obfuscated_name(
 					 libxfs_dir2_sf_get_ino(mp, sfp, sfep),
 					 namelen, &sfep->name[0]);
@@ -1201,7 +1217,8 @@ maybe_obfuscate_pptr(
 	uint8_t				*name,
 	int				namelen,
 	const void			*value,
-	int				valuelen)
+	int				valuelen,
+	bool				is_meta)
 {
 	unsigned char			old_name[MAXNAMELEN];
 	struct remap_ent		*remap;
@@ -1210,7 +1227,7 @@ maybe_obfuscate_pptr(
 	xfs_ino_t			parent_ino;
 	int				error;
 
-	if (!metadump.obfuscate)
+	if (!metadump.obfuscate || is_meta)
 		return;
 
 	if (!(attr_flags & XFS_ATTR_PARENT))
@@ -1274,9 +1291,10 @@ want_obfuscate_attr(
 	const void	*name,
 	unsigned int	namelen,
 	const void	*value,
-	unsigned int	valuelen)
+	unsigned int	valuelen,
+	bool		is_meta)
 {
-	if (!metadump.obfuscate)
+	if (!metadump.obfuscate || is_meta)
 		return false;
 
 	/*
@@ -1289,15 +1307,15 @@ want_obfuscate_attr(
 	return true;
 }
 
+/*
+ * Obfuscate the attr names and fill the actual values with 'v' (to see a valid
+ * string length, as opposed to NULLs).
+ */
 static void
 process_sf_attr(
 	struct xfs_dinode		*dip)
 {
-	/*
-	 * with extended attributes, obfuscate the names and fill the actual
-	 * values with 'v' (to see a valid string length, as opposed to NULLs)
-	 */
-
+	bool				is_meta = is_metadata_ino(dip);
 	struct xfs_attr_sf_hdr		*hdr = XFS_DFORK_APTR(dip);
 	struct xfs_attr_sf_entry	*asfep = libxfs_attr_sf_firstentry(hdr);
 	int				ino_attr_size;
@@ -1338,9 +1356,9 @@ process_sf_attr(
 
 		if (asfep->flags & XFS_ATTR_PARENT) {
 			maybe_obfuscate_pptr(asfep->flags, name, namelen,
-					value, asfep->valuelen);
+					value, asfep->valuelen, is_meta);
 		} else if (want_obfuscate_attr(asfep->flags, name, namelen,
-					value, asfep->valuelen)) {
+					value, asfep->valuelen, is_meta)) {
 			generate_obfuscated_name(0, asfep->namelen, name);
 			memset(value, 'v', asfep->valuelen);
 		}
@@ -1442,7 +1460,8 @@ static void
 process_dir_data_block(
 	char		*block,
 	xfs_fileoff_t	offset,
-	int		is_block_format)
+	int		is_block_format,
+	bool		is_meta)
 {
 	/*
 	 * we have to rely on the fileoffset and signature of the block to
@@ -1549,7 +1568,7 @@ process_dir_data_block(
 				dir_offset)
 			return;
 
-		if (metadump.obfuscate)
+		if (want_obfuscate_dirents(is_meta))
 			generate_obfuscated_name(be64_to_cpu(dep->inumber),
 					 dep->namelen, &dep->name[0]);
 		dir_offset += length;
@@ -1574,7 +1593,8 @@ process_symlink_block(
 	xfs_fsblock_t	s,
 	xfs_filblks_t	c,
 	typnm_t		btype,
-	xfs_fileoff_t	last)
+	xfs_fileoff_t	last,
+	bool		is_meta)
 {
 	struct bbmap	map;
 	char		*link;
@@ -1599,7 +1619,7 @@ process_symlink_block(
 	if (xfs_has_crc((mp)))
 		link += sizeof(struct xfs_dsymlink_hdr);
 
-	if (metadump.obfuscate)
+	if (want_obfuscate_dirents(is_meta))
 		obfuscate_path_components(link, XFS_SYMLINK_BUF_SPACE(mp,
 							mp->m_sb.sb_blocksize));
 	if (metadump.zero_stale_data) {
@@ -1650,7 +1670,8 @@ add_remote_vals(
 static void
 process_attr_block(
 	char				*block,
-	xfs_fileoff_t			offset)
+	xfs_fileoff_t			offset,
+	bool				is_meta)
 {
 	struct xfs_attr_leafblock	*leaf;
 	struct xfs_attr3_icleaf_hdr	hdr;
@@ -1730,10 +1751,10 @@ process_attr_block(
 			if (entry->flags & XFS_ATTR_PARENT) {
 				maybe_obfuscate_pptr(entry->flags, name,
 						local->namelen, value,
-						valuelen);
+						valuelen, is_meta);
 			} else if (want_obfuscate_attr(entry->flags, name,
 						local->namelen, value,
-						valuelen)) {
+						valuelen, is_meta)) {
 				generate_obfuscated_name(0, local->namelen,
 						name);
 				memset(value, 'v', valuelen);
@@ -1759,7 +1780,7 @@ process_attr_block(
 				/* do not obfuscate obviously busted pptr */
 				add_remote_vals(be32_to_cpu(remote->valueblk),
 						be32_to_cpu(remote->valuelen));
-			} else if (metadump.obfuscate) {
+			} else if (want_obfuscate_dirents(is_meta)) {
 				generate_obfuscated_name(0, remote->namelen,
 							 &remote->name[0]);
 				add_remote_vals(be32_to_cpu(remote->valueblk),
@@ -1792,7 +1813,8 @@ process_single_fsb_objects(
 	xfs_fsblock_t	s,
 	xfs_filblks_t	c,
 	typnm_t		btype,
-	xfs_fileoff_t	last)
+	xfs_fileoff_t	last,
+	bool		is_meta)
 {
 	int		rval = 1;
 	char		*dp;
@@ -1862,12 +1884,13 @@ process_single_fsb_objects(
 				process_dir_leaf_block(dp);
 			} else {
 				process_dir_data_block(dp, o,
-					 last == mp->m_dir_geo->fsbcount);
+					 last == mp->m_dir_geo->fsbcount,
+					 is_meta);
 			}
 			iocur_top->need_crc = 1;
 			break;
 		case TYP_ATTR:
-			process_attr_block(dp, o);
+			process_attr_block(dp, o, is_meta);
 			iocur_top->need_crc = 1;
 			break;
 		default:
@@ -1900,7 +1923,8 @@ process_multi_fsb_dir(
 	xfs_fsblock_t	s,
 	xfs_filblks_t	c,
 	typnm_t		btype,
-	xfs_fileoff_t	last)
+	xfs_fileoff_t	last,
+	bool		is_meta)
 {
 	char		*dp;
 	int		rval = 1;
@@ -1944,7 +1968,8 @@ process_multi_fsb_dir(
 				process_dir_leaf_block(dp);
 			} else {
 				process_dir_data_block(dp, o,
-					 last == mp->m_dir_geo->fsbcount);
+					 last == mp->m_dir_geo->fsbcount,
+					 is_meta);
 			}
 			iocur_top->need_crc = 1;
 write:
@@ -1963,6 +1988,38 @@ process_multi_fsb_dir(
 	return rval;
 }
 
+static typnm_t
+ifork_data_type(
+	struct xfs_dinode	*dip,
+	int			whichfork)
+{
+	xfs_ino_t		ino = be64_to_cpu(dip->di_ino);
+
+	if (whichfork == XFS_ATTR_FORK)
+		return TYP_ATTR;
+
+	switch (be16_to_cpu(dip->di_mode) & S_IFMT) {
+	case S_IFDIR:
+		return TYP_DIR2;
+	case S_IFLNK:
+		return TYP_SYMLINK;
+	case S_IFREG:
+		if (ino == mp->m_sb.sb_rbmino)
+			return TYP_RTBITMAP;
+		if (ino == mp->m_sb.sb_rsumino)
+			return TYP_RTSUMMARY;
+		if (ino == mp->m_sb.sb_uquotino)
+			return TYP_DQBLK;
+		if (ino == mp->m_sb.sb_gquotino)
+			return TYP_DQBLK;
+		if (ino == mp->m_sb.sb_pquotino)
+			return TYP_DQBLK;
+		return TYP_DATA;
+	default:
+		return TYP_NONE;
+	}
+}
+
 static bool
 is_multi_fsb_object(
 	struct xfs_mount	*mp,
@@ -1981,13 +2038,14 @@ process_multi_fsb_objects(
 	xfs_fsblock_t	s,
 	xfs_filblks_t	c,
 	typnm_t		btype,
-	xfs_fileoff_t	last)
+	xfs_fileoff_t	last,
+	bool		is_meta)
 {
 	switch (btype) {
 	case TYP_DIR2:
-		return process_multi_fsb_dir(o, s, c, btype, last);
+		return process_multi_fsb_dir(o, s, c, btype, last, is_meta);
 	case TYP_SYMLINK:
-		return process_symlink_block(o, s, c, btype, last);
+		return process_symlink_block(o, s, c, btype, last, is_meta);
 	default:
 		print_warning("bad type for multi-fsb object %d", btype);
 		return 1;
@@ -1997,10 +2055,13 @@ process_multi_fsb_objects(
 /* inode copy routines */
 static int
 process_bmbt_reclist(
-	xfs_bmbt_rec_t		*rp,
-	int			numrecs,
-	typnm_t			btype)
+	struct xfs_dinode	*dip,
+	int			whichfork,
+	struct xfs_bmbt_rec	*rp,
+	int			numrecs)
 {
+	bool			is_meta = is_metadata_ino(dip);
+	typnm_t			btype = ifork_data_type(dip, whichfork);
 	int			i;
 	xfs_fileoff_t		o, op = NULLFILEOFF;
 	xfs_fsblock_t		s;
@@ -2009,7 +2070,6 @@ process_bmbt_reclist(
 	xfs_fileoff_t		last;
 	xfs_agnumber_t		agno;
 	xfs_agblock_t		agbno;
-	bool			is_multi_fsb = is_multi_fsb_object(mp, btype);
 	int			rval = 1;
 
 	if (btype == TYP_DATA)
@@ -2077,12 +2137,12 @@ process_bmbt_reclist(
 		}
 
 		/* multi-extent blocks require special handling */
-		if (is_multi_fsb)
+		if (is_multi_fsb_object(mp, btype))
 			rval = process_multi_fsb_objects(o, s, c, btype,
-					last);
+					last, is_meta);
 		else
 			rval = process_single_fsb_objects(o, s, c, btype,
-					last);
+					last, is_meta);
 		if (!rval)
 			break;
 	}
@@ -2090,6 +2150,11 @@ process_bmbt_reclist(
 	return rval;
 }
 
+struct scan_bmap {
+	struct xfs_dinode *dip;
+	int		whichfork;
+};
+
 static int
 scanfunc_bmap(
 	struct xfs_btree_block	*block,
@@ -2097,8 +2162,9 @@ scanfunc_bmap(
 	xfs_agblock_t		agbno,
 	int			level,
 	typnm_t			btype,
-	void			*arg)	/* ptr to itype */
+	void			*arg)
 {
+	struct scan_bmap	*sbm = arg;
 	int			i;
 	xfs_bmbt_ptr_t		*pp;
 	int			nrecs;
@@ -2113,8 +2179,8 @@ scanfunc_bmap(
 					typtab[btype].name, agno, agbno);
 			return 1;
 		}
-		return process_bmbt_reclist(xfs_bmbt_rec_addr(mp, block, 1),
-					    nrecs, *(typnm_t*)arg);
+		return process_bmbt_reclist(sbm->dip, sbm->whichfork,
+				xfs_bmbt_rec_addr(mp, block, 1), nrecs);
 	}
 
 	if (nrecs > mp->m_bmap_dmxr[1]) {
@@ -2149,23 +2215,17 @@ scanfunc_bmap(
 static int
 process_btinode(
 	struct xfs_dinode 	*dip,
-	typnm_t			itype)
+	int			whichfork)
 {
-	xfs_bmdr_block_t	*dib;
-	int			i;
-	xfs_bmbt_ptr_t		*pp;
-	int			level;
-	int			nrecs;
+	struct xfs_bmdr_block	*dib =
+		(struct xfs_bmdr_block *)XFS_DFORK_PTR(dip, whichfork);
+	int			level = be16_to_cpu(dib->bb_level);
+	int			nrecs = be16_to_cpu(dib->bb_numrecs);
 	int			maxrecs;
-	int			whichfork;
-	typnm_t			btype;
-
-	whichfork = (itype == TYP_ATTR) ? XFS_ATTR_FORK : XFS_DATA_FORK;
-	btype = (itype == TYP_ATTR) ? TYP_BMAPBTA : TYP_BMAPBTD;
-
-	dib = (xfs_bmdr_block_t *)XFS_DFORK_PTR(dip, whichfork);
-	level = be16_to_cpu(dib->bb_level);
-	nrecs = be16_to_cpu(dib->bb_numrecs);
+	typnm_t			btype = (whichfork == XFS_ATTR_FORK) ?
+		TYP_BMAPBTA : TYP_BMAPBTD;
+	xfs_bmbt_ptr_t		*pp;
+	int			i;
 
 	if (level > XFS_BM_MAXLEVELS(mp, whichfork)) {
 		if (metadump.show_warnings)
@@ -2176,8 +2236,8 @@ process_btinode(
 	}
 
 	if (level == 0) {
-		return process_bmbt_reclist(xfs_bmdr_rec_addr(dib, 1),
-					    nrecs, itype);
+		return process_bmbt_reclist(dip, whichfork,
+				xfs_bmdr_rec_addr(dib, 1), nrecs);
 	}
 
 	maxrecs = libxfs_bmdr_maxrecs(XFS_DFORK_SIZE(dip, mp, whichfork), 0);
@@ -2204,6 +2264,10 @@ process_btinode(
 	}
 
 	for (i = 0; i < nrecs; i++) {
+		struct scan_bmap	sbm = {
+			.dip		= dip,
+			.whichfork	= whichfork,
+		};
 		xfs_agnumber_t	ag;
 		xfs_agblock_t	bno;
 
@@ -2220,7 +2284,7 @@ process_btinode(
 			continue;
 		}
 
-		if (!scan_btree(ag, bno, level, btype, &itype, scanfunc_bmap))
+		if (!scan_btree(ag, bno, level, btype, &sbm, scanfunc_bmap))
 			return 0;
 	}
 	return 1;
@@ -2229,19 +2293,13 @@ process_btinode(
 static int
 process_exinode(
 	struct xfs_dinode 	*dip,
-	typnm_t			itype)
+	int			whichfork)
 {
-	int			whichfork;
-	int			used;
-	xfs_extnum_t		nex, max_nex;
+	xfs_extnum_t		max_nex = xfs_iext_max_nextents(
+			xfs_dinode_has_large_extent_counts(dip), whichfork);
+	xfs_extnum_t		nex = xfs_dfork_nextents(dip, whichfork);
+	int			used = nex * sizeof(struct xfs_bmbt_rec);
 
-	whichfork = (itype == TYP_ATTR) ? XFS_ATTR_FORK : XFS_DATA_FORK;
-
-	nex = xfs_dfork_nextents(dip, whichfork);
-	max_nex = xfs_iext_max_nextents(
-			xfs_dinode_has_large_extent_counts(dip),
-			whichfork);
-	used = nex * sizeof(xfs_bmbt_rec_t);
 	if (nex > max_nex || used > XFS_DFORK_SIZE(dip, mp, whichfork)) {
 		if (metadump.show_warnings)
 			print_warning("bad number of extents %llu in inode %lld",
@@ -2257,52 +2315,52 @@ process_exinode(
 		       XFS_DFORK_SIZE(dip, mp, whichfork) - used);
 
 
-	return process_bmbt_reclist((xfs_bmbt_rec_t *)XFS_DFORK_PTR(dip,
-					whichfork), nex, itype);
+	return process_bmbt_reclist(dip, whichfork,
+			(struct xfs_bmbt_rec *)XFS_DFORK_PTR(dip, whichfork),
+			nex);
 }
 
 static int
 process_inode_data(
-	struct xfs_dinode	*dip,
-	typnm_t			itype)
+	struct xfs_dinode	*dip)
 {
 	switch (dip->di_format) {
-		case XFS_DINODE_FMT_LOCAL:
-			if (!(metadump.obfuscate || metadump.zero_stale_data))
-				break;
+	case XFS_DINODE_FMT_LOCAL:
+		if (!(metadump.obfuscate || metadump.zero_stale_data))
+			break;
 
-			/*
-			 * If the fork size is invalid, we can't safely do
-			 * anything with this fork. Leave it alone to preserve
-			 * the information for diagnostic purposes.
-			 */
-			if (XFS_DFORK_DSIZE(dip, mp) > XFS_LITINO(mp)) {
-				print_warning(
+		/*
+		 * If the fork size is invalid, we can't safely do anything
+		 * with this fork. Leave it alone to preserve the information
+		 * for diagnostic purposes.
+		 */
+		if (XFS_DFORK_DSIZE(dip, mp) > XFS_LITINO(mp)) {
+			print_warning(
 "Invalid data fork size (%d) in inode %llu, preserving contents!",
-						XFS_DFORK_DSIZE(dip, mp),
-						(long long)metadump.cur_ino);
-				break;
-			}
+					XFS_DFORK_DSIZE(dip, mp),
+					(long long)metadump.cur_ino);
+			break;
+		}
 
-			switch (itype) {
-				case TYP_DIR2:
-					process_sf_dir(dip);
-					break;
+		switch (be16_to_cpu(dip->di_mode) & S_IFMT) {
+		case S_IFDIR:
+			process_sf_dir(dip);
+			break;
 
-				case TYP_SYMLINK:
-					process_sf_symlink(dip);
-					break;
+		case S_IFLNK:
+			process_sf_symlink(dip);
+			break;
 
-				default:
-					break;
-			}
+		default:
 			break;
+		}
+		break;
 
-		case XFS_DINODE_FMT_EXTENTS:
-			return process_exinode(dip, itype);
+	case XFS_DINODE_FMT_EXTENTS:
+		return process_exinode(dip, XFS_DATA_FORK);
 
-		case XFS_DINODE_FMT_BTREE:
-			return process_btinode(dip, itype);
+	case XFS_DINODE_FMT_BTREE:
+		return process_btinode(dip, XFS_DATA_FORK);
 	}
 	return 1;
 }
@@ -2375,28 +2433,22 @@ process_inode(
 
 	/* copy appropriate data fork metadata */
 	switch (be16_to_cpu(dip->di_mode) & S_IFMT) {
-		case S_IFDIR:
-			rval = process_inode_data(dip, TYP_DIR2);
-			if (dip->di_format == XFS_DINODE_FMT_LOCAL)
-				need_new_crc = true;
-			break;
-		case S_IFLNK:
-			rval = process_inode_data(dip, TYP_SYMLINK);
-			if (dip->di_format == XFS_DINODE_FMT_LOCAL)
-				need_new_crc = true;
-			break;
-		case S_IFREG:
-			rval = process_inode_data(dip, TYP_DATA);
-			break;
-		case S_IFIFO:
-		case S_IFCHR:
-		case S_IFBLK:
-		case S_IFSOCK:
-			process_dev_inode(dip);
+	case S_IFDIR:
+	case S_IFLNK:
+	case S_IFREG:
+		rval = process_inode_data(dip);
+		if (dip->di_format == XFS_DINODE_FMT_LOCAL)
 			need_new_crc = true;
-			break;
-		default:
-			break;
+		break;
+	case S_IFIFO:
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFSOCK:
+		process_dev_inode(dip);
+		need_new_crc = true;
+		break;
+	default:
+		break;
 	}
 	nametable_clear();
 	if (!rval)
@@ -2406,20 +2458,17 @@ process_inode(
 	if (XFS_DFORK_DSIZE(dip, mp) < XFS_LITINO(mp)) {
 		attr_data.remote_val_count = 0;
 		switch (dip->di_aformat) {
-			case XFS_DINODE_FMT_LOCAL:
-				need_new_crc = true;
-				if (metadump.obfuscate ||
-				    metadump.zero_stale_data)
-					process_sf_attr(dip);
-				break;
-
-			case XFS_DINODE_FMT_EXTENTS:
-				rval = process_exinode(dip, TYP_ATTR);
-				break;
-
-			case XFS_DINODE_FMT_BTREE:
-				rval = process_btinode(dip, TYP_ATTR);
-				break;
+		case XFS_DINODE_FMT_LOCAL:
+			need_new_crc = true;
+			if (metadump.obfuscate || metadump.zero_stale_data)
+				process_sf_attr(dip);
+			break;
+		case XFS_DINODE_FMT_EXTENTS:
+			rval = process_exinode(dip, XFS_ATTR_FORK);
+			break;
+		case XFS_DINODE_FMT_BTREE:
+			rval = process_btinode(dip, XFS_ATTR_FORK);
+			break;
 		}
 		nametable_clear();
 	}
@@ -2796,70 +2845,6 @@ scan_ag(
 	return rval;
 }
 
-static int
-copy_ino(
-	xfs_ino_t		ino,
-	typnm_t			itype)
-{
-	xfs_agnumber_t		agno;
-	xfs_agblock_t		agbno;
-	xfs_agino_t		agino;
-	int			offset;
-	int			rval = 1;
-
-	if (ino == 0 || ino == NULLFSINO)
-		return 1;
-
-	agno = XFS_INO_TO_AGNO(mp, ino);
-	agino = XFS_INO_TO_AGINO(mp, ino);
-	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
-	offset = XFS_AGINO_TO_OFFSET(mp, agino);
-
-	if (agno >= mp->m_sb.sb_agcount || agbno >= mp->m_sb.sb_agblocks ||
-			offset >= mp->m_sb.sb_inopblock) {
-		if (metadump.show_warnings)
-			print_warning("invalid %s inode number (%lld)",
-					typtab[itype].name, (long long)ino);
-		return 1;
-	}
-
-	push_cur();
-	set_cur(&typtab[TYP_INODE], XFS_AGB_TO_DADDR(mp, agno, agbno),
-			blkbb, DB_RING_IGN, NULL);
-	if (iocur_top->data == NULL) {
-		print_warning("cannot read %s inode %lld",
-				typtab[itype].name, (long long)ino);
-		rval = !metadump.stop_on_read_error;
-		goto pop_out;
-	}
-	off_cur(offset << mp->m_sb.sb_inodelog, mp->m_sb.sb_inodesize);
-
-	metadump.cur_ino = ino;
-	rval = process_inode_data(iocur_top->data, itype);
-pop_out:
-	pop_cur();
-	return rval;
-}
-
-
-static int
-copy_sb_inodes(void)
-{
-	if (!copy_ino(mp->m_sb.sb_rbmino, TYP_RTBITMAP))
-		return 0;
-
-	if (!copy_ino(mp->m_sb.sb_rsumino, TYP_RTSUMMARY))
-		return 0;
-
-	if (!copy_ino(mp->m_sb.sb_uquotino, TYP_DQBLK))
-		return 0;
-
-	if (!copy_ino(mp->m_sb.sb_gquotino, TYP_DQBLK))
-		return 0;
-
-	return copy_ino(mp->m_sb.sb_pquotino, TYP_DQBLK);
-}
-
 static int
 copy_log(void)
 {
@@ -3296,10 +3281,6 @@ metadump_f(
 		}
 	}
 
-	/* copy realtime and quota inode contents */
-	if (!exitcode)
-		exitcode = !copy_sb_inodes();
-
 	/* copy log */
 	if (!exitcode && !(metadump.version == 1 && metadump.external_log))
 		exitcode = !copy_log();
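
The process_inode_data() hunk above drops the itype argument and instead picks the local-format handler from the inode mode. Below is a minimal sketch of that dispatch. It assumes the mode has already been converted to CPU byte order (the patch does this with be16_to_cpu). All demo_ names are hypothetical. The octal constants are the conventional values of S_IFMT, S_IFDIR, S_IFLNK and S_IFREG, written out so the block does not depend on feature-test macros.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional file type bits, spelled out for portability of the sketch. */
#define DEMO_S_IFMT	0170000u
#define DEMO_S_IFDIR	0040000u
#define DEMO_S_IFLNK	0120000u
#define DEMO_S_IFREG	0100000u

/* Hypothetical names for what to do with a local-format data fork. */
enum demo_local_action {
	DEMO_LEAVE_ALONE,	/* nothing to obfuscate */
	DEMO_SF_DIR,		/* treat as a shortform directory */
	DEMO_SF_SYMLINK,	/* treat as an inline symlink target */
};

/* Choose the handler from the file type bits of a CPU-endian mode. */
static enum demo_local_action
demo_local_fork_action(
	uint16_t		mode)
{
	switch (mode & DEMO_S_IFMT) {
	case DEMO_S_IFDIR:
		return DEMO_SF_DIR;
	case DEMO_S_IFLNK:
		return DEMO_SF_SYMLINK;
	default:
		return DEMO_LEAVE_ALONE;
	}
}
```

Deriving the action from the mode means callers no longer have to pass a type that could disagree with what the inode says it is.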



* [PATCH 12/41] xfs_db: support metadata directories in the path command
From: Darrick J. Wong @ 2024-12-23 21:49 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Teach the path, ls, parent, and link debugger commands to traverse the
metadata directory tree by adding a -m switch that makes absolute paths
start from the metadata root directory instead of the regular root.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/namei.c        |   71 ++++++++++++++++++++++++++++++++++++++++++++---------
 man/man8/xfs_db.8 |   23 ++++++++++++++---
 2 files changed, 78 insertions(+), 16 deletions(-)


diff --git a/db/namei.c b/db/namei.c
index 8c7f4932fce4a3..4eae4e8fd3232c 100644
--- a/db/namei.c
+++ b/db/namei.c
@@ -139,11 +139,11 @@ path_navigate(
 /* Walk a directory path to an inode and set the io cursor to that inode. */
 static int
 path_walk(
+	xfs_ino_t	rootino,
 	char		*path)
 {
 	struct dirpath	*dirpath;
 	char		*p = path;
-	xfs_ino_t	rootino = mp->m_sb.sb_rootino;
 	int		error = 0;
 
 	if (*p == '/') {
@@ -173,6 +173,9 @@ path_help(void)
 	dbprintf(_(
 "\n"
 " Navigate to an inode via directory path.\n"
+"\n"
+" Options:\n"
+"   -m -- Walk an absolute path down the metadata directory tree.\n"
 	));
 }
 
@@ -181,18 +184,34 @@ path_f(
 	int		argc,
 	char		**argv)
 {
+	xfs_ino_t	rootino = mp->m_sb.sb_rootino;
 	int		c;
 	int		error;
 
-	while ((c = getopt(argc, argv, "")) != -1) {
+	while ((c = getopt(argc, argv, "m")) != -1) {
 		switch (c) {
+		case 'm':
+			/* Absolute path, start from metadata rootdir. */
+			if (!xfs_has_metadir(mp)) {
+				dbprintf(
+	_("filesystem does not support metadata directories.\n"));
+				exitcode = 1;
+				return 0;
+			}
+			rootino = mp->m_sb.sb_metadirino;
+			break;
 		default:
 			path_help();
 			return 0;
 		}
 	}
 
-	error = path_walk(argv[optind]);
+	if (argc == optind || argc > optind + 1) {
+		dbprintf(_("Only supply one path.\n"));
+		return -1;
+	}
+
+	error = path_walk(rootino, argv[optind]);
 	if (error) {
 		dbprintf("%s: %s\n", argv[optind], strerror(error));
 		exitcode = 1;
@@ -206,7 +225,7 @@ static struct cmdinfo path_cmd = {
 	.altname	= NULL,
 	.cfunc		= path_f,
 	.argmin		= 1,
-	.argmax		= 1,
+	.argmax		= -1,
 	.canpush	= 0,
 	.args		= "",
 	.help		= path_help,
@@ -519,6 +538,7 @@ ls_help(void)
 " Options:\n"
 "   -i -- Resolve the given paths to their corresponding inode numbers.\n"
 "         If no paths are given, display the current inode number.\n"
+"   -m -- Walk an absolute path down the metadata directory tree.\n"
 "\n"
 " Directory contents will be listed in the format:\n"
 " dir_cookie	inode_number	type	hash	name_length	name\n"
@@ -530,15 +550,26 @@ ls_f(
 	int			argc,
 	char			**argv)
 {
+	xfs_ino_t		rootino = mp->m_sb.sb_rootino;
 	bool			inum_only = false;
 	int			c;
 	int			error = 0;
 
-	while ((c = getopt(argc, argv, "i")) != -1) {
+	while ((c = getopt(argc, argv, "im")) != -1) {
 		switch (c) {
 		case 'i':
 			inum_only = true;
 			break;
+		case 'm':
+			/* Absolute path, start from metadata rootdir. */
+			if (!xfs_has_metadir(mp)) {
+				dbprintf(
+	_("filesystem does not support metadata directories.\n"));
+				exitcode = 1;
+				return 0;
+			}
+			rootino = mp->m_sb.sb_metadirino;
+			break;
 		default:
 			ls_help();
 			return 0;
@@ -561,7 +592,7 @@ ls_f(
 	for (c = optind; c < argc; c++) {
 		push_cur();
 
-		error = path_walk(argv[c]);
+		error = path_walk(rootino, argv[c]);
 		if (error)
 			goto err_cur;
 
@@ -860,11 +891,22 @@ parent_f(
 	int			argc,
 	char			**argv)
 {
+	xfs_ino_t		rootino = mp->m_sb.sb_rootino;
 	int			c;
 	int			error = 0;
 
-	while ((c = getopt(argc, argv, "")) != -1) {
+	while ((c = getopt(argc, argv, "m")) != -1) {
 		switch (c) {
+		case 'm':
+			/* Absolute path, start from metadata rootdir. */
+			if (!xfs_has_metadir(mp)) {
+				dbprintf(
+	_("filesystem does not support metadata directories.\n"));
+				exitcode = 1;
+				return 0;
+			}
+			rootino = mp->m_sb.sb_metadirino;
+			break;
 		default:
 			ls_help();
 			return 0;
@@ -884,7 +926,7 @@ parent_f(
 	for (c = optind; c < argc; c++) {
 		push_cur();
 
-		error = path_walk(argv[c]);
+		error = path_walk(rootino, argv[c]);
 		if (error)
 			goto err_cur;
 
@@ -912,7 +954,7 @@ static struct cmdinfo parent_cmd = {
 	.argmin		= 0,
 	.argmax		= -1,
 	.canpush	= 0,
-	.args		= "[paths...]",
+	.args		= "[-m] [paths...]",
 	.help		= parent_help,
 };
 
@@ -926,6 +968,7 @@ link_help(void)
 "\n"
 " Options:\n"
 "   -i   -- Point to this specific inode number.\n"
+"   -m   -- Select the metadata directory tree.\n"
 "   -p   -- Point to the inode given by this path.\n"
 "   -t   -- Set the file type to this value.\n"
 "   name -- Create this directory entry with this name.\n"
@@ -1039,11 +1082,12 @@ link_f(
 {
 	xfs_ino_t		child_ino = NULLFSINO;
 	int			ftype = XFS_DIR3_FT_UNKNOWN;
+	xfs_ino_t		rootino = mp->m_sb.sb_rootino;
 	unsigned int		i;
 	int			c;
 	int			error = 0;
 
-	while ((c = getopt(argc, argv, "i:p:t:")) != -1) {
+	while ((c = getopt(argc, argv, "i:mp:t:")) != -1) {
 		switch (c) {
 		case 'i':
 			errno = 0;
@@ -1054,9 +1098,12 @@ link_f(
 				return 0;
 			}
 			break;
+		case 'm':
+			rootino = mp->m_sb.sb_metadirino;
+			break;
 		case 'p':
 			push_cur();
-			error = path_walk(optarg);
+			error = path_walk(rootino, optarg);
 			if (error) {
 				printf("%s: %s\n", optarg, strerror(error));
 				exitcode = 1;
@@ -1126,7 +1173,7 @@ static struct cmdinfo link_cmd = {
 	.argmin		= 0,
 	.argmax		= -1,
 	.canpush	= 0,
-	.args		= "[-i ino] [-p path] [-t ftype] name",
+	.args		= "[-i ino] [-m] [-p path] [-t ftype] name",
 	.help		= link_help,
 };
 
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 998553684586c7..066f124458b286 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -992,7 +992,7 @@ .SH COMMANDS
 .I label
 is given, the current filesystem label is printed.
 .TP
-.BI "link [-i " ino "] [-p " path "] [-t " ftype "] name"
+.BI "link [-i " ino "] [-m] [-p " path "] [-t " ftype "] name"
 In the current directory, create a directory entry with the given
 .I name
 pointing to a file.
@@ -1005,6 +1005,10 @@ .SH COMMANDS
 child file unless the
 .I ftype
 option is given.
+The
+.B -m
+option specifies that the path lookup should be done in the metadata directory
+tree.
 The file being targetted must not be on the iunlink list.
 .TP
 .BI "log [stop | start " filename ]
@@ -1039,7 +1043,7 @@ .SH COMMANDS
 between xfsprogs and the kernel, which will help when diagnosing minimum
 log size calculation errors.
 .TP
-.BI "ls [\-i] [" paths "]..."
+.BI "ls [\-im] [" paths "]..."
 List the contents of a directory.
 If a path resolves to a directory, the directory will be listed.
 If no paths are supplied and the IO cursor points at a directory inode,
@@ -1053,6 +1057,9 @@ .SH COMMANDS
 Resolve each of the given paths to an inode number and print that number.
 If no paths are given and the IO cursor points to an inode, print the inode
 number.
+.TP
+.B \-m
+Absolute paths should be walked from the root of the metadata directory tree.
 .RE
 .TP
 .BI "metadump [\-egow] " filename
@@ -1080,18 +1087,26 @@ .SH COMMANDS
 .B print
 command.
 .TP
-.BI "parent [" paths "]..."
+.BI "parent [\-m] [" paths "]..."
 List the parents of a file.
 If a path resolves to a file, the parents of that file will be listed.
 If no paths are supplied and the IO cursor points at an inode, the parents of
 that file will be listed.
+The
+.B \-m
+option causes absolute paths to be walked from the root of the metadata
+directory tree.
 
 The output format is:
 inode number, inode generation, ondisk namehash, namehash, name length, name.
 .TP
-.BI "path " dir_path
+.BI "path [\-m] " dir_path
 Walk the directory tree to an inode using the supplied path.
 Absolute and relative paths are supported.
+The
+.B \-m
+option causes absolute paths to be walked from the root of the metadata
+directory tree.
 .TP
 .B pop
 Pop location from the stack.
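
The -m handling above comes down to choosing which inode an absolute path walk starts from, and refusing when the filesystem has no metadata directory tree. Below is a minimal sketch of that decision, pulled out into one helper. struct demo_sb and demo_select_root() are hypothetical stand-ins, not xfs_db code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock state the patch consults. */
struct demo_sb {
	uint64_t	rootino;	/* root of the regular directory tree */
	uint64_t	metadirino;	/* root of the metadata directory tree */
	bool		has_metadir;	/* is the metadir feature enabled? */
};

/*
 * Pick the inode that an absolute path walk starts from.  Returns 0 and
 * stores the starting inode in *rootino on success, or -1 if the metadata
 * tree was requested on a filesystem that lacks the feature.
 */
static int
demo_select_root(
	const struct demo_sb	*sb,
	bool			want_metadir,
	uint64_t		*rootino)
{
	assert(sb != NULL && rootino != NULL);

	if (!want_metadir) {
		*rootino = sb->rootino;
		return 0;
	}
	if (!sb->has_metadir)
		return -1;
	*rootino = sb->metadirino;
	return 0;
}
```

In the patch itself this check is written out in the -m case of path, ls and parent. The -m case of link sets rootino without it.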



* [PATCH 13/41] xfs_db: show the metadata root directory when dumping superblocks
From: Darrick J. Wong @ 2024-12-23 21:49 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Show the metadirino field when dumping a superblock, but only on
filesystems that have the metadata directory feature enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/sb.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)


diff --git a/db/sb.c b/db/sb.c
index 4f115650e1283f..fa15b429ecbefa 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -50,6 +50,18 @@ sb_init(void)
 	add_command(&version_cmd);
 }
 
+/*
+ * Counts superblock fields that only exist when the metadata directory feature
+ * is enabled.
+ */
+static int
+metadirfld_count(
+	void		*obj,
+	int		startoff)
+{
+	return xfs_has_metadir(mp) ? 1 : 0;
+}
+
 #define	OFF(f)	bitize(offsetof(struct xfs_dsb, sb_ ## f))
 #define	SZC(f)	szcount(struct xfs_dsb, sb_ ## f)
 const field_t	sb_flds[] = {
@@ -113,6 +125,8 @@ const field_t	sb_flds[] = {
 	{ "pquotino", FLDT_INO, OI(OFF(pquotino)), C1, 0, TYP_INODE },
 	{ "lsn", FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE },
 	{ "meta_uuid", FLDT_UUID, OI(OFF(meta_uuid)), C1, 0, TYP_NONE },
+	{ "metadirino", FLDT_INO, OI(OFF(metadirino)), metadirfld_count,
+		FLD_COUNT, TYP_INODE },
 	{ NULL }
 };
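
The new table entry relies on a count callback that reports either zero or one instances of the field, so the field disappears from the dump when the feature is off. Below is a reduced sketch of that pattern. The demo_ types are hypothetical and much simpler than xfs_db's real field_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: how many instances of a field exist? */
typedef int (*demo_count_fn)(const void *ctx);

/* Hypothetical, much-reduced field table entry. */
struct demo_field {
	const char	*name;
	demo_count_fn	count;	/* NULL means "always exactly one" */
};

/* Hypothetical context describing the filesystem being dumped. */
struct demo_ctx {
	bool		has_metadir;
};

/* The field exists only when the metadir feature is enabled. */
static int
demo_metadir_count(
	const void	*ctx)
{
	const struct demo_ctx	*c = ctx;

	return c->has_metadir ? 1 : 0;
}

static const struct demo_field demo_sb_fields[] = {
	{ "meta_uuid",	NULL },
	{ "metadirino",	demo_metadir_count },
	{ NULL,		NULL },
};

/* Count how many fields a dump of this table would print. */
static int
demo_visible_fields(
	const struct demo_field	*flds,
	const void		*ctx)
{
	int			n = 0;

	for (; flds->name != NULL; flds++)
		n += flds->count ? flds->count(ctx) : 1;
	return n;
}
```

Keeping the decision in the table means the printing code needs no special case for the feature.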
 



* [PATCH 14/41] xfs_db: display di_metatype
From: Darrick J. Wong @ 2024-12-23 21:50 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

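Print the metadata file type (di_metatype) by name for v3 inodes that
have the metadata flag set.  The onlink field, which occupies the same
location in the inode core, is no longer shown for such inodes.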

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/field.c |    2 +
 db/field.h |    1 +
 db/inode.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 db/inode.h |    2 +
 4 files changed, 78 insertions(+), 10 deletions(-)


diff --git a/db/field.c b/db/field.c
index a61ccc9ef6d072..946684415f65d3 100644
--- a/db/field.c
+++ b/db/field.c
@@ -212,6 +212,8 @@ const ftattr_t	ftattrtab[] = {
 	  SI(bitsz(struct xfs_dinode)), 0, NULL, inode_core_flds },
 	{ FLDT_DINODE_FMT, "dinode_fmt", fp_dinode_fmt, NULL,
 	  SI(bitsz(int8_t)), 0, NULL, NULL },
+	{ FLDT_DINODE_METATYPE, "metatype", fp_metatype, NULL,
+	  SI(bitsz(uint16_t)), 0, NULL, NULL },
 	{ FLDT_DINODE_U, "dinode_u", NULL, (char *)inode_u_flds, inode_u_size,
 	  FTARG_SIZE|FTARG_OKEMPTY, NULL, inode_u_flds },
 	{ FLDT_DINODE_V3, "dinode_v3", NULL, (char *)inode_v3_flds,
diff --git a/db/field.h b/db/field.h
index b1bfdbed19cea3..9746676a6c7ac9 100644
--- a/db/field.h
+++ b/db/field.h
@@ -95,6 +95,7 @@ typedef enum fldt	{
 	FLDT_DINODE_A,
 	FLDT_DINODE_CORE,
 	FLDT_DINODE_FMT,
+	FLDT_DINODE_METATYPE,
 	FLDT_DINODE_U,
 	FLDT_DINODE_V3,
 
diff --git a/db/inode.c b/db/inode.c
index 0aff68083508cb..07efbb4902be08 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -25,6 +25,7 @@ static int	inode_a_offset(void *obj, int startoff, int idx);
 static int	inode_a_sfattr_count(void *obj, int startoff);
 static int	inode_core_nlinkv2_count(void *obj, int startoff);
 static int	inode_core_onlink_count(void *obj, int startoff);
+static int	inode_core_metatype_count(void *obj, int startoff);
 static int	inode_core_projid_count(void *obj, int startoff);
 static int	inode_core_nlinkv1_count(void *obj, int startoff);
 static int	inode_core_v3_pad_count(void *obj, int startoff);
@@ -94,6 +95,8 @@ const field_t	inode_core_flds[] = {
 	  FLD_COUNT, TYP_NONE },
 	{ "onlink", FLDT_UINT16D, OI(COFF(metatype)), inode_core_onlink_count,
 	  FLD_COUNT, TYP_NONE },
+	{ "metatype", FLDT_DINODE_METATYPE, OI(COFF(metatype)),
+	  inode_core_metatype_count, FLD_COUNT, TYP_NONE },
 	{ "uid", FLDT_UINT32D, OI(COFF(uid)), C1, 0, TYP_NONE },
 	{ "gid", FLDT_UINT32D, OI(COFF(gid)), C1, 0, TYP_NONE },
 	{ "nlinkv2", FLDT_UINT32D, OI(COFF(nlink)), inode_core_nlinkv2_count,
@@ -247,9 +250,8 @@ static const char	*dinode_fmt_name[] =
 static const int	dinode_fmt_name_size =
 	sizeof(dinode_fmt_name) / sizeof(dinode_fmt_name[0]);
 
-/*ARGSUSED*/
-int
-fp_dinode_fmt(
+static int
+fp_enum_fmt(
 	void			*obj,
 	int			bit,
 	int			count,
@@ -257,26 +259,65 @@ fp_dinode_fmt(
 	int			size,
 	int			arg,
 	int			base,
-	int			array)
+	int			array,
+	const char		**names,
+	unsigned int		nr_names)
 {
 	int			bitpos;
-	enum xfs_dinode_fmt	f;
+	int			f;
 	int			i;
 
 	for (i = 0, bitpos = bit; i < count; i++, bitpos += size) {
-		f = (enum xfs_dinode_fmt)getbitval(obj, bitpos, size, BVUNSIGNED);
+		f = getbitval(obj, bitpos, size, BVUNSIGNED);
 		if (array)
 			dbprintf("%d:", i + base);
-		if (f < 0 || f >= dinode_fmt_name_size)
-			dbprintf("%d", (int)f);
+		if (f < 0 || f >= nr_names)
+			dbprintf("%d", f);
 		else
-			dbprintf("%d (%s)", (int)f, dinode_fmt_name[(int)f]);
+			dbprintf("%d (%s)", f, names[f]);
 		if (i < count - 1)
 			dbprintf(" ");
 	}
 	return 1;
 }
 
+/*ARGSUSED*/
+int
+fp_dinode_fmt(
+	void			*obj,
+	int			bit,
+	int			count,
+	char			*fmtstr,
+	int			size,
+	int			arg,
+	int			base,
+	int			array)
+{
+	return fp_enum_fmt(obj, bit, count, fmtstr, size, arg, base, array,
+			dinode_fmt_name, dinode_fmt_name_size);
+}
+
+static const char	*metatype_name[] =
+	{ "unknown", "dir", "usrquota", "grpquota", "prjquota", "rtbitmap",
+	  "rtsummary"
+	};
+static const int	metatype_name_size = ARRAY_SIZE(metatype_name);
+
+int
+fp_metatype(
+	void			*obj,
+	int			bit,
+	int			count,
+	char			*fmtstr,
+	int			size,
+	int			arg,
+	int			base,
+	int			array)
+{
+	return fp_enum_fmt(obj, bit, count, fmtstr, size, arg, base, array,
+			metatype_name, metatype_name_size);
+}
+
 static int
 inode_a_bmbt_count(
 	void			*obj,
@@ -414,7 +455,29 @@ inode_core_onlink_count(
 	ASSERT(startoff == 0);
 	ASSERT(obj == iocur_top->data);
 	dic = obj;
-	return dic->di_version >= 2;
+	if (dic->di_version < 2)
+		return 0;
+	if (dic->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))
+		return 0;
+	return 1;
+}
+
+static int
+inode_core_metatype_count(
+	void			*obj,
+	int			startoff)
+{
+	struct xfs_dinode	*dic;
+
+	ASSERT(startoff == 0);
+	ASSERT(obj == iocur_top->data);
+	dic = obj;
+
+	if (dic->di_version < 3)
+		return 0;
+	if (dic->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))
+		return 1;
+	return 0;
 }
 
 static int
diff --git a/db/inode.h b/db/inode.h
index 31a2ebbba6a175..9e4ec1b3833716 100644
--- a/db/inode.h
+++ b/db/inode.h
@@ -16,6 +16,8 @@ extern const struct field	timestamp_flds[];
 
 extern int	fp_dinode_fmt(void *obj, int bit, int count, char *fmtstr,
 			      int size, int arg, int base, int array);
+int		fp_metatype(void *obj, int bit, int count, char *fmtstr,
+			      int size, int arg, int base, int array);
 extern int	inode_a_size(void *obj, int startoff, int idx);
 extern void	inode_init(void);
 extern typnm_t	inode_next_type(void);
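
Two ideas in this patch can be sketched on their own: a shared formatter that maps a numeric value to a name with a bounds check, and a pair of predicates that decide which of two fields sharing one on-disk location is shown. The name list below is copied from the patch. Everything prefixed demo_ is a hypothetical illustration, and it formats into a buffer instead of calling dbprintf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char *const demo_metatype_names[] = {
	"unknown", "dir", "usrquota", "grpquota", "prjquota", "rtbitmap",
	"rtsummary",
};
#define DEMO_NR_NAMES \
	(sizeof(demo_metatype_names) / sizeof(demo_metatype_names[0]))

/* Format val as "N (name)" when a name is known, otherwise as plain "N". */
static int
demo_format_enum(
	char			*buf,
	size_t			len,
	unsigned int		val,
	const char *const	*names,
	size_t			nr_names)
{
	assert(buf != NULL && names != NULL);

	if (val < nr_names)
		return snprintf(buf, len, "%u (%s)", val, names[val]);
	return snprintf(buf, len, "%u", val);
}

/* onlink is shown for v2+ inodes that are not metadata files. */
static bool
demo_show_onlink(
	int			version,
	bool			is_metadata)
{
	return version >= 2 && !is_metadata;
}

/* metatype is shown only for v3+ inodes flagged as metadata files. */
static bool
demo_show_metatype(
	int			version,
	bool			is_metadata)
{
	return version >= 3 && is_metadata;
}
```

The two predicates are never true together, so the shared location is printed under at most one name.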



* [PATCH 15/41] xfs_db: drop the metadata checking code from blockget
From: Darrick J. Wong @ 2024-12-23 21:50 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Drop the check subcommand and all the metadata checking code from
xfs_db.  We haven't shipped xfs_check in xfsprogs in a decade and the
last known user (fstests) stopped calling it back in July 2024.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/check.c        |  294 -----------------------------------------------------
 man/man8/xfs_db.8 |   12 +-
 2 files changed, 5 insertions(+), 301 deletions(-)


diff --git a/db/check.c b/db/check.c
index 37306bd7a6ac2d..4f7785c64f5b49 100644
--- a/db/check.c
+++ b/db/check.c
@@ -236,14 +236,12 @@ static void		check_dbmap(xfs_agnumber_t agno, xfs_agblock_t agbno,
 				    int ignore_reflink);
 static int		check_inomap(xfs_agnumber_t agno, xfs_agblock_t agbno,
 				     xfs_extlen_t len, xfs_ino_t c_ino);
-static void		check_linkcounts(xfs_agnumber_t agno);
 static int		check_range(xfs_agnumber_t agno, xfs_agblock_t agbno,
 				    xfs_extlen_t len);
 static void		check_rdbmap(xfs_rfsblock_t bno, xfs_extlen_t len,
 				     dbm_t type);
 static int		check_rinomap(xfs_rfsblock_t bno, xfs_extlen_t len,
 				      xfs_ino_t c_ino);
-static void		check_rootdir(void);
 static int		check_rrange(xfs_rfsblock_t bno, xfs_extlen_t len);
 static void		check_set_dbmap(xfs_agnumber_t agno,
 					xfs_agblock_t agbno, xfs_extlen_t len,
@@ -252,11 +250,6 @@ static void		check_set_dbmap(xfs_agnumber_t agno,
 					xfs_agblock_t c_agbno);
 static void		check_set_rdbmap(xfs_rfsblock_t bno, xfs_extlen_t len,
 					 dbm_t type1, dbm_t type2);
-static void		check_summary(void);
-static void		checknot_dbmap(xfs_agnumber_t agno, xfs_agblock_t agbno,
-				       xfs_extlen_t len, int typemask);
-static void		checknot_rdbmap(xfs_rfsblock_t bno, xfs_extlen_t len,
-					int typemask);
 static void		dir_hash_add(xfs_dahash_t hash,
 				     xfs_dir2_dataptr_t addr);
 static void		dir_hash_check(inodata_t *id, int v);
@@ -323,7 +316,6 @@ static void		quota_add(xfs_dqid_t *p, xfs_dqid_t *g, xfs_dqid_t *u,
 static void		quota_add1(qdata_t **qt, xfs_dqid_t id, int dq,
 				   xfs_qcnt_t bc, xfs_qcnt_t ic,
 				   xfs_qcnt_t rc);
-static void		quota_check(char *s, qdata_t **qt);
 static void		quota_init(void);
 static void		scan_ag(xfs_agnumber_t agno);
 static void		scan_freelist(xfs_agf_t *agf);
@@ -376,7 +368,7 @@ static const cmdinfo_t	blockfree_cmd =
 	{ "blockfree", NULL, blockfree_f, 0, 0, 0,
 	  NULL, N_("free block usage information"), NULL };
 static const cmdinfo_t	blockget_cmd =
-	{ "blockget", "check", blockget_f, 0, -1, 0,
+	{ "blockget", NULL, blockget_f, 0, -1, 0,
 	  N_("[-s|-v] [-n] [-t] [-b bno]... [-i ino] ..."),
 	  N_("get block usage and check consistency"), NULL };
 static const cmdinfo_t	blocktrash_cmd =
@@ -826,107 +818,9 @@ blockget_f(
 		blist = NULL;
 		blist_size = 0;
 	}
-	if (serious_error) {
+	if (serious_error)
 		exitcode = 2;
-		dbprefix = oldprefix;
-		return 0;
-	}
 
-	if (xfs_has_metadir(mp)) {
-		dbprefix = oldprefix;
-		return 0;
-	}
-
-	check_rootdir();
-	/*
-	 * Check that there are no blocks either
-	 * a) unaccounted for or
-	 * b) bno-free but not cnt-free
-	 */
-	if (!tflag) {	/* are we in test mode, faking out freespace? */
-		for (agno = 0; agno < mp->m_sb.sb_agcount; agno++)
-			checknot_dbmap(agno, 0, mp->m_sb.sb_agblocks,
-				(1 << DBM_UNKNOWN) | (1 << DBM_FREE1));
-	}
-	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++)
-		check_linkcounts(agno);
-	if (mp->m_sb.sb_rblocks) {
-		checknot_rdbmap(0,
-			(xfs_extlen_t)(mp->m_sb.sb_rextents *
-				       mp->m_sb.sb_rextsize),
-			1 << DBM_UNKNOWN);
-		check_summary();
-	}
-	if (mp->m_sb.sb_icount != icount) {
-		if (!sflag)
-			dbprintf(_("sb_icount %lld, counted %lld\n"),
-				mp->m_sb.sb_icount, icount);
-		error++;
-	}
-	if (mp->m_sb.sb_ifree != ifree) {
-		if (!sflag)
-			dbprintf(_("sb_ifree %lld, counted %lld\n"),
-				mp->m_sb.sb_ifree, ifree);
-		error++;
-	}
-	if (mp->m_sb.sb_fdblocks != fdblocks) {
-		if (!sflag)
-			dbprintf(_("sb_fdblocks %lld, counted %lld\n"),
-				mp->m_sb.sb_fdblocks, fdblocks);
-		error++;
-	}
-	if (lazycount && mp->m_sb.sb_fdblocks != agf_aggr_freeblks) {
-		if (!sflag)
-			dbprintf(_("sb_fdblocks %lld, aggregate AGF count %lld\n"),
-				mp->m_sb.sb_fdblocks, agf_aggr_freeblks);
-		error++;
-	}
-	if (mp->m_sb.sb_frextents != frextents) {
-		if (!sflag)
-			dbprintf(_("sb_frextents %lld, counted %lld\n"),
-				mp->m_sb.sb_frextents, frextents);
-		error++;
-	}
-	if (mp->m_sb.sb_bad_features2 != 0 &&
-			mp->m_sb.sb_bad_features2 != mp->m_sb.sb_features2) {
-		if (!sflag)
-			dbprintf(_("sb_features2 (0x%x) not same as "
-				"sb_bad_features2 (0x%x)\n"),
-				mp->m_sb.sb_features2,
-				mp->m_sb.sb_bad_features2);
-		error++;
-	}
-	if ((sbversion & XFS_SB_VERSION_ATTRBIT) &&
-					!xfs_has_attr(mp)) {
-		if (!sflag)
-			dbprintf(_("sb versionnum missing attr bit %x\n"),
-				XFS_SB_VERSION_ATTRBIT);
-		error++;
-	}
-	if ((sbversion & XFS_SB_VERSION_QUOTABIT) &&
-					!xfs_has_quota(mp)) {
-		if (!sflag)
-			dbprintf(_("sb versionnum missing quota bit %x\n"),
-				XFS_SB_VERSION_QUOTABIT);
-		error++;
-	}
-	if (!(sbversion & XFS_SB_VERSION_ALIGNBIT) &&
-					xfs_has_align(mp)) {
-		if (!sflag)
-			dbprintf(_("sb versionnum extra align bit %x\n"),
-				XFS_SB_VERSION_ALIGNBIT);
-		error++;
-	}
-	if (qudo)
-		quota_check("user", qudata);
-	if (qpdo)
-		quota_check("project", qpdata);
-	if (qgdo)
-		quota_check("group", qgdata);
-	if (sbver_err > mp->m_sb.sb_agcount / 2)
-		dbprintf(_("WARNING: this may be a newer XFS filesystem.\n"));
-	if (error)
-		exitcode = 3;
 	dbprefix = oldprefix;
 	return 0;
 }
@@ -1388,58 +1282,6 @@ check_inomap(
 	return rval;
 }
 
-static void
-check_linkcounts(
-	xfs_agnumber_t	agno)
-{
-	inodata_t	*ep;
-	inodata_t	**ht;
-	int		idx;
-	char		*path;
-
-	ht = inodata[agno];
-	for (idx = 0; idx < inodata_hash_size; ht++, idx++) {
-		ep = *ht;
-		while (ep) {
-			if (ep->link_set != ep->link_add || ep->link_set == 0) {
-				path = inode_name(ep->ino, NULL);
-				if (!path && ep->link_add)
-					path = xstrdup("?");
-				if (!sflag || ep->ilist) {
-					if (ep->link_add)
-						dbprintf(_("link count mismatch "
-							 "for inode %lld (name "
-							 "%s), nlink %d, "
-							 "counted %d\n"),
-							ep->ino, path,
-							ep->link_set,
-							ep->link_add);
-					else if (ep->link_set)
-						dbprintf(_("disconnected inode "
-							 "%lld, nlink %d\n"),
-							ep->ino, ep->link_set);
-					else
-						dbprintf(_("allocated inode %lld "
-							 "has 0 link count\n"),
-							ep->ino);
-				}
-				if (path)
-					xfree(path);
-				error++;
-			} else if (verbose || ep->ilist) {
-				path = inode_name(ep->ino, NULL);
-				if (path) {
-					dbprintf(_("inode %lld name %s\n"),
-						ep->ino, path);
-					xfree(path);
-				}
-			}
-			ep = ep->next;
-		}
-	}
-
-}
-
 static int
 check_range(
 	xfs_agnumber_t  agno,
@@ -1556,25 +1398,6 @@ check_rinomap(
 	return rval;
 }
 
-static void
-check_rootdir(void)
-{
-	inodata_t	*id;
-
-	id = find_inode(mp->m_sb.sb_rootino, 0);
-	if (id == NULL) {
-		if (!sflag)
-			dbprintf(_("root inode %lld is missing\n"),
-				mp->m_sb.sb_rootino);
-		error++;
-	} else if (!id->isdir) {
-		if (!sflag || id->ilist)
-			dbprintf(_("root inode %lld is not a directory\n"),
-				mp->m_sb.sb_rootino);
-		error++;
-	}
-}
-
 static inline void
 report_rrange(
 	xfs_rfsblock_t	low,
@@ -1718,77 +1541,6 @@ get_suminfo(
 	return raw->old;
 }
 
-static void
-check_summary(void)
-{
-	xfs_rfsblock_t	bno;
-	union xfs_suminfo_raw *csp;
-	union xfs_suminfo_raw *fsp;
-	int		log;
-
-	csp = sumcompute;
-	fsp = sumfile;
-	for (log = 0; log < mp->m_rsumlevels; log++) {
-		for (bno = 0;
-		     bno < mp->m_sb.sb_rbmblocks;
-		     bno++, csp++, fsp++) {
-			if (csp->old != fsp->old) {
-				if (!sflag)
-					dbprintf(_("rt summary mismatch, size %d "
-						 "block %llu, file: %d, "
-						 "computed: %d\n"),
-						log, bno,
-						get_suminfo(mp, fsp),
-						get_suminfo(mp, csp));
-				error++;
-			}
-		}
-	}
-}
-
-static void
-checknot_dbmap(
-	xfs_agnumber_t	agno,
-	xfs_agblock_t	agbno,
-	xfs_extlen_t	len,
-	int		typemask)
-{
-	xfs_extlen_t	i;
-	char		*p;
-
-	if (!check_range(agno, agbno, len))
-		return;
-	for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) {
-		if ((1 << *p) & typemask) {
-			if (!sflag || CHECK_BLISTA(agno, agbno + i))
-				dbprintf(_("block %u/%u type %s not expected\n"),
-					agno, agbno + i, typename[(dbm_t)*p]);
-			error++;
-		}
-	}
-}
-
-static void
-checknot_rdbmap(
-	xfs_rfsblock_t	bno,
-	xfs_extlen_t	len,
-	int		typemask)
-{
-	xfs_extlen_t	i;
-	char		*p;
-
-	if (!check_rrange(bno, len))
-		return;
-	for (i = 0, p = &dbmap[mp->m_sb.sb_agcount][bno]; i < len; i++, p++) {
-		if ((1 << *p) & typemask) {
-			if (!sflag || CHECK_BLIST(bno + i))
-				dbprintf(_("rtblock %llu type %s not expected\n"),
-					bno + i, typename[(dbm_t)*p]);
-			error++;
-		}
-	}
-}
-
 static void
 dir_hash_add(
 	xfs_dahash_t		hash,
@@ -3923,48 +3675,6 @@ quota_add1(
 	qt[qh] = qe;
 }
 
-static void
-quota_check(
-	char	*s,
-	qdata_t	**qt)
-{
-	int	i;
-	qdata_t	*next;
-	qdata_t	*qp;
-
-	for (i = 0; i < QDATA_HASH_SIZE; i++) {
-		qp = qt[i];
-		while (qp) {
-			next = qp->next;
-			if (qp->count.bc != qp->dq.bc ||
-			    qp->count.ic != qp->dq.ic ||
-			    qp->count.rc != qp->dq.rc) {
-				if (!sflag) {
-					dbprintf(_("%s quota id %u, have/exp"),
-						s, qp->id);
-					if (qp->count.bc != qp->dq.bc)
-						dbprintf(_(" bc %lld/%lld"),
-							qp->dq.bc,
-							qp->count.bc);
-					if (qp->count.ic != qp->dq.ic)
-						dbprintf(_(" ic %lld/%lld"),
-							qp->dq.ic,
-							qp->count.ic);
-					if (qp->count.rc != qp->dq.rc)
-						dbprintf(_(" rc %lld/%lld"),
-							qp->dq.rc,
-							qp->count.rc);
-					dbprintf("\n");
-				}
-				error++;
-			}
-			xfree(qp);
-			qp = next;
-		}
-	}
-	xfree(qt);
-}
-
 static void
 quota_init(void)
 {
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 066f124458b286..2325ef169ddc1b 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -345,7 +345,7 @@ .SH COMMANDS
 command can be given, presumably with different arguments than the previous one.
 .TP
 .BI "blockget [\-npvs] [\-b " bno "] ... [\-i " ino "] ..."
-Get block usage and check filesystem consistency.
+Get block usage.
 The information is saved for use by a subsequent
 .BR blockuse ", " ncheck ", or " blocktrash
 command.
@@ -564,11 +564,6 @@ .SH COMMANDS
 half full.
 .RE
 .TP
-.B check
-See the
-.B blockget
-command.
-.TP
 .BI "convert " "type number" " [" "type number" "] ... " type
 Convert from one address form to another.
 The known
@@ -2635,8 +2630,7 @@ .SH TYPES
 and printable ASCII chars.
 .SH DIAGNOSTICS
 Many messages can come from the
-.B check
-.RB ( blockget )
+.B blockget
 command.
 If the filesystem is completely corrupt, a core dump might
 be produced instead of the message
@@ -2646,7 +2640,7 @@ .SH DIAGNOSTICS
 .RE
 .PP
 If the filesystem is very large (has many files) then
-.B check
+.B blockget
 might run out of memory. In this case the message
 .RS
 .B out of memory


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 16/41] xfs_io: support flag for limited bulkstat of the metadata directory
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (14 preceding siblings ...)
  2024-12-23 21:50   ` [PATCH 15/41] xfs_db: drop the metadata checking code from blockget Darrick J. Wong
@ 2024-12-23 21:50   ` Darrick J. Wong
  2024-12-23 21:50   ` [PATCH 17/41] xfs_io: support scrubbing metadata directory paths Darrick J. Wong
                     ` (24 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:50 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Support the new XFS_BULK_IREQ_METADIR flag for bulkstat commands.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 io/bulkstat.c     |   16 +++++++++++++++-
 man/man8/xfs_io.8 |   10 +++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)


diff --git a/io/bulkstat.c b/io/bulkstat.c
index 06023e1289e420..f312c6d55f47bc 100644
--- a/io/bulkstat.c
+++ b/io/bulkstat.c
@@ -70,6 +70,7 @@ bulkstat_help(void)
 "   -d         Print debugging output.\n"
 "   -q         Be quiet, no output.\n"
 "   -e <ino>   Stop after this inode.\n"
+"   -m         Include metadata directories.\n"
 "   -n <nr>    Ask for this many results at once.\n"
 "   -s <ino>   Inode to start with.\n"
 "   -v <ver>   Use this version of the ioctl (1 or 5).\n"));
@@ -107,11 +108,12 @@ bulkstat_f(
 	bool			has_agno = false;
 	bool			debug = false;
 	bool			quiet = false;
+	bool			metadir = false;
 	unsigned int		i;
 	int			c;
 	int			ret;
 
-	while ((c = getopt(argc, argv, "a:de:n:qs:v:")) != -1) {
+	while ((c = getopt(argc, argv, "a:de:mn:qs:v:")) != -1) {
 		switch (c) {
 		case 'a':
 			agno = cvt_u32(optarg, 10);
@@ -131,6 +133,9 @@ bulkstat_f(
 				return 1;
 			}
 			break;
+		case 'm':
+			metadir = true;
+			break;
 		case 'n':
 			batch_size = cvt_u32(optarg, 10);
 			if (errno) {
@@ -185,6 +190,8 @@ bulkstat_f(
 
 	if (has_agno)
 		xfrog_bulkstat_set_ag(breq, agno);
+	if (metadir)
+		breq->hdr.flags |= XFS_BULK_IREQ_METADIR;
 
 	set_xfd_flags(&xfd, ver);
 
@@ -253,6 +260,7 @@ bulkstat_single_f(
 	unsigned long		ver = 0;
 	unsigned int		i;
 	bool			debug = false;
+	bool			metadir = false;
 	int			c;
 	int			ret;
 
@@ -261,6 +269,9 @@ bulkstat_single_f(
 		case 'd':
 			debug = true;
 			break;
+		case 'm':
+			metadir = true;
+			break;
 		case 'v':
 			errno = 0;
 			ver = strtoull(optarg, NULL, 10);
@@ -313,6 +324,9 @@ bulkstat_single_f(
 			}
 		}
 
+		if (metadir)
+			flags |= XFS_BULK_IREQ_METADIR;
+
 		ret = -xfrog_bulkstat_single(&xfd, ino, flags, &bulkstat);
 		if (ret) {
 			xfrog_perror(ret, "xfrog_bulkstat_single");
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index eb2201fca74380..fb7a026224fda7 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1243,7 +1243,7 @@ .SH MEMORY MAPPED I/O COMMANDS
 
 .SH FILESYSTEM COMMANDS
 .TP
-.BI "bulkstat [ \-a " agno " ] [ \-d ] [ \-e " endino " ] [ \-n " batchsize " ] [ \-q ] [ \-s " startino " ] [ \-v " version" ]
+.BI "bulkstat [ \-a " agno " ] [ \-d ] [ \-e " endino " ] [ \-m ] [ \-n " batchsize " ] [ \-q ] [ \-s " startino " ] [ \-v " version" ]
 Display raw stat information about a bunch of inodes in an XFS filesystem.
 Options are as follows:
 .RS 1.0i
@@ -1260,6 +1260,9 @@ .SH FILESYSTEM COMMANDS
 Stop displaying records when this inode number is reached.
 Defaults to stopping when the system call stops returning results.
 .TP
+.BI \-m
+Include metadata directories in the output.
+.TP
 .BI \-n " batchsize"
 Retrieve at most this many records per call.
 Defaults to 4,096.
@@ -1280,10 +1283,11 @@ .SH FILESYSTEM COMMANDS
 .RE
 .PD
 .TP
-.BI "bulkstat_single [ \-d ] [ \-v " version " ] [ " inum... " | " special... " ]
+.BI "bulkstat_single [ \-d ] [ \-m ] [ \-v " version " ] [ " inum... " | " special... " ]
 Display raw stat information about individual inodes in an XFS filesystem.
 The
-.B \-d
+.BR \-d ,
+.BR \-m ,
 and
 .B \-v
 options are the same as the


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 17/41] xfs_io: support scrubbing metadata directory paths
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (15 preceding siblings ...)
  2024-12-23 21:50   ` [PATCH 16/41] xfs_io: support flag for limited bulkstat of the metadata directory Darrick J. Wong
@ 2024-12-23 21:50   ` Darrick J. Wong
  2024-12-23 21:51   ` [PATCH 18/41] xfs_spaceman: report health of metadir inodes too Darrick J. Wong
                     ` (23 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:50 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Support invoking the metadata directory path scrubber from xfs_io for
testing.
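
For illustration only, the name-or-number argument handling can be
sketched as a standalone function.  The table contents and every
example_* identifier below are made up for this sketch and are not part
of xfsprogs.  Unlike the patch, the sketch also rejects empty strings
and out-of-range numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name table; it stands in for the real metapath table. */
static const char *const example_metapath_names[] = {
	"first", "second", "third",
};
#define EXAMPLE_METAPATH_NR \
	(sizeof(example_metapath_names) / sizeof(example_metapath_names[0]))

/* Translate a symbolic name or a raw number into a metapath id. */
static bool
example_parse_metapath(
	const char		*arg,
	uint64_t		*out)
{
	char			*end;
	unsigned long long	val;
	size_t			i;

	assert(arg != NULL && out != NULL);

	/* Symbolic names take precedence over numbers. */
	for (i = 0; i < EXAMPLE_METAPATH_NR; i++) {
		if (!strcmp(arg, example_metapath_names[i])) {
			*out = i;
			return true;
		}
	}

	/* Otherwise the whole argument must parse as a number. */
	errno = 0;
	val = strtoull(arg, &end, 0);
	if (errno || end == arg || *end != '\0')
		return false;

	*out = val;
	return true;
}
```

A name that matches the table wins; anything else has to parse
completely as a number, otherwise the argument is rejected.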

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 io/scrub.c        |   62 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 man/man8/xfs_io.8 |    3 ++-
 2 files changed, 62 insertions(+), 3 deletions(-)


diff --git a/io/scrub.c b/io/scrub.c
index e03c1d0eaf1db8..45229a8ae81099 100644
--- a/io/scrub.c
+++ b/io/scrub.c
@@ -41,6 +41,12 @@ scrub_help(void)
 " Known metadata scrub types are:"));
 	for (i = 0, d = xfrog_scrubbers; i < XFS_SCRUB_TYPE_NR; i++, d++)
 		printf(" %s", d->name);
+	printf(_(
+"\n"
+"\n"
+" Known metapath scrub arguments are:"));
+	for (i = 0, d = xfrog_metapaths; i < XFS_SCRUB_METAPATH_NR; i++, d++)
+		printf(" %s", d->name);
 	printf("\n");
 }
 
@@ -125,6 +131,40 @@ parse_none(
 	return true;
 }
 
+static bool
+parse_metapath(
+	int		argc,
+	char		**argv,
+	int		optind,
+	__u64		*ino)
+{
+	char		*p;
+	unsigned long long control;
+	int		i;
+
+	if (optind != argc - 1) {
+		fprintf(stderr, _("Must specify metapath number.\n"));
+		return false;
+	}
+
+	for (i = 0; i < XFS_SCRUB_METAPATH_NR; i++) {
+		if (!strcmp(argv[optind], xfrog_metapaths[i].name)) {
+			*ino = i;
+			return true;
+		}
+	}
+
+	control = strtoll(argv[optind], &p, 0);
+	if (*p != '\0') {
+		fprintf(stderr, _("Bad metapath number '%s'.\n"),
+				argv[optind]);
+		return false;
+	}
+
+	*ino = control;
+	return true;
+}
+
 static int
 parse_args(
 	int				argc,
@@ -170,6 +210,12 @@ parse_args(
 	meta->sm_flags = flags;
 
 	switch (d->group) {
+	case XFROG_SCRUB_GROUP_METAPATH:
+		if (!parse_metapath(argc, argv, optind, &meta->sm_ino)) {
+			exitcode = 1;
+			return command_usage(cmdinfo);
+		}
+		break;
 	case XFROG_SCRUB_GROUP_INODE:
 		if (!parse_inode(argc, argv, optind, &meta->sm_ino,
 						     &meta->sm_gen)) {
@@ -244,7 +290,7 @@ scrub_init(void)
 	scrub_cmd.argmin = 1;
 	scrub_cmd.argmax = -1;
 	scrub_cmd.flags = CMD_NOMAP_OK;
-	scrub_cmd.args = _("type [agno|ino gen]");
+	scrub_cmd.args = _("type [agno|ino gen|metapath]");
 	scrub_cmd.oneline = _("scrubs filesystem metadata");
 	scrub_cmd.help = scrub_help;
 
@@ -275,6 +321,12 @@ repair_help(void)
 " Known metadata repair types are:"));
 	for (i = 0, d = xfrog_scrubbers; i < XFS_SCRUB_TYPE_NR; i++, d++)
 		printf(" %s", d->name);
+	printf(_(
+"\n"
+"\n"
+" Known metapath repair arguments are:"));
+	for (i = 0, d = xfrog_metapaths; i < XFS_SCRUB_METAPATH_NR; i++, d++)
+		printf(" %s", d->name);
 	printf("\n");
 }
 
@@ -327,7 +379,7 @@ repair_init(void)
 	repair_cmd.argmin = 1;
 	repair_cmd.argmax = -1;
 	repair_cmd.flags = CMD_NOMAP_OK;
-	repair_cmd.args = _("type [agno|ino gen]");
+	repair_cmd.args = _("type [agno|ino gen|metapath]");
 	repair_cmd.oneline = _("repairs filesystem metadata");
 	repair_cmd.help = repair_help;
 
@@ -495,6 +547,12 @@ scrubv_f(
 	optind++;
 
 	switch (group) {
+	case XFROG_SCRUB_GROUP_METAPATH:
+		if (!parse_metapath(argc, argv, optind, &scrubv.head.svh_ino)) {
+			exitcode = 1;
+			return command_usage(&scrubv_cmd);
+		}
+		break;
 	case XFROG_SCRUB_GROUP_INODE:
 		if (!parse_inode(argc, argv, optind, &scrubv.head.svh_ino,
 						     &scrubv.head.svh_gen)) {
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index fb7a026224fda7..c73fee7c2780c6 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1425,13 +1425,14 @@ .SH FILESYSTEM COMMANDS
 .RE
 .PD
 .TP
-.BI "scrub " type " [ " agnumber " | " "ino" " " "gen" " ]"
+.BI "scrub " type " [ " agnumber " | " "ino" " " "gen" " | " metapath " ]"
 Scrub internal XFS filesystem metadata.  The
 .BI type
 parameter specifies which type of metadata to scrub.
 For AG metadata, one AG number must be specified.
 For file metadata, the scrub is applied to the open file unless the
 inode number and generation number are specified.
+For metapath, the name of a file or a raw number must be specified.
 .RE
 .PD
 .TP


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 18/41] xfs_spaceman: report health of metadir inodes too
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (16 preceding siblings ...)
  2024-12-23 21:50   ` [PATCH 17/41] xfs_io: support scrubbing metadata directory paths Darrick J. Wong
@ 2024-12-23 21:51   ` Darrick J. Wong
  2024-12-23 21:51   ` [PATCH 19/41] xfs_scrub: treat zero-length read verify as an IO error Darrick J. Wong
                     ` (22 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:51 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If the filesystem has a metadata directory tree, we should include those
inodes in the health report.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 spaceman/health.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/spaceman/health.c b/spaceman/health.c
index d88a7f6c6e53f2..c4d570363fbbf1 100644
--- a/spaceman/health.c
+++ b/spaceman/health.c
@@ -324,6 +324,8 @@ report_bulkstat_health(
 
 	if (agno != NULLAGNUMBER)
 		xfrog_bulkstat_set_ag(breq, agno);
+	if (file->xfd.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_METADIR)
+		breq->hdr.flags |= XFS_BULK_IREQ_METADIR;
 
 	do {
 		error = -xfrog_bulkstat(&file->xfd, breq);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 19/41] xfs_scrub: treat zero-length read verify as an IO error
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (17 preceding siblings ...)
  2024-12-23 21:51   ` [PATCH 18/41] xfs_spaceman: report health of metadir inodes too Darrick J. Wong
@ 2024-12-23 21:51   ` Darrick J. Wong
  2024-12-23 21:51   ` [PATCH 20/41] xfs_scrub: scan metadata directories during phase 3 Darrick J. Wong
                     ` (21 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:51 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

While doing some chaos testing on the xfs_scrub read verify code, I
noticed that if the device under a live filesystem gets resized while
scrub is running a media scan, reads will start returning 0.  This
causes read_verify() to run around in an infinite loop instead of
erroring out like it should.
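
A minimal sketch of the failure mode and the fix, assuming a plain
pread() loop rather than the real read_verify() machinery.  All
example_* names are hypothetical.  The point is that a zero-byte read
makes no progress, so the loop has to report it and stop instead of
retrying.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum example_verify_result {
	EXAMPLE_VERIFY_OK = 0,
	EXAMPLE_VERIFY_IOERR,
	EXAMPLE_VERIFY_TRUNCATED,
};

/* Read @len bytes starting at @start, discarding the data. */
static enum example_verify_result
example_verify_range(
	int		fd,
	off_t		start,
	size_t		len)
{
	char		buf[4096];

	assert(fd >= 0);

	while (len > 0) {
		size_t	want = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t	got = pread(fd, buf, want, start);

		if (got < 0) {
			if (errno == EINTR)
				continue;
			return EXAMPLE_VERIFY_IOERR;
		}
		if (got == 0) {
			/*
			 * No bytes at all means we ran off the end of the
			 * device.  Retrying would loop forever, so bail out.
			 */
			return EXAMPLE_VERIFY_TRUNCATED;
		}

		/* Short reads still make progress; keep going. */
		start += got;
		len -= (size_t)got;
	}
	return EXAMPLE_VERIFY_OK;
}

/* Convenience wrapper: open @path read-only and verify a range of it. */
static enum example_verify_result
example_verify_path(
	const char	*path,
	off_t		start,
	size_t		len)
{
	enum example_verify_result	ret;
	int				fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return EXAMPLE_VERIFY_IOERR;
	ret = example_verify_range(fd, start, len);
	close(fd);
	return ret;
}
```

Without the got == 0 branch, neither start nor len would ever change
and the loop would spin, which is the behaviour described above.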

Cc: <linux-xfs@vger.kernel.org> # v5.3.0
Fixes: 27464242956fac ("xfs_scrub: fix read verify disk error handling strategy")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 scrub/phase6.c      |   22 ++++++++++++++++++++++
 scrub/read_verify.c |    8 ++++++++
 2 files changed, 30 insertions(+)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index a61853019e290c..54d21820a722a6 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -44,6 +44,9 @@ struct media_verify_state {
 	struct read_verify_pool	*rvp_realtime;
 	struct bitmap		*d_bad;		/* bytes */
 	struct bitmap		*r_bad;		/* bytes */
+	bool			d_trunc:1;
+	bool			r_trunc:1;
+	bool			l_trunc:1;
 };
 
 /* Find the fd for a given device identifier. */
@@ -544,6 +547,13 @@ report_all_media_errors(
 {
 	int				ret;
 
+	if (vs->d_trunc)
+		str_corrupt(ctx, ctx->mntpoint, _("data device truncated"));
+	if (vs->l_trunc)
+		str_corrupt(ctx, ctx->mntpoint, _("log device truncated"));
+	if (vs->r_trunc)
+		str_corrupt(ctx, ctx->mntpoint, _("rt device truncated"));
+
 	ret = report_disk_ioerrs(ctx, ctx->datadev, vs);
 	if (ret) {
 		str_liberror(ctx, ret, _("walking datadev io errors"));
@@ -663,6 +673,18 @@ remember_ioerr(
 	struct bitmap			*tree;
 	int				ret;
 
+	if (!length) {
+		dev_t			dev = disk_to_dev(ctx, disk);
+
+		if (dev == ctx->fsinfo.fs_datadev)
+			vs->d_trunc = true;
+		else if (dev == ctx->fsinfo.fs_rtdev)
+			vs->r_trunc = true;
+		else if (dev == ctx->fsinfo.fs_logdev)
+			vs->l_trunc = true;
+		return;
+	}
+
 	tree = bitmap_for_disk(ctx, disk, vs);
 	if (!tree) {
 		str_liberror(ctx, ENOENT, _("finding bad block bitmap"));
diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index 52348274be2c25..1219efe2590182 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -245,6 +245,14 @@ read_verify(
 					read_error);
 			rvp->ioerr_fn(rvp->ctx, rvp->disk, rv->io_start, sz,
 					read_error, rv->io_end_arg);
+		} else if (sz == 0) {
+			/* No bytes at all?  Did we hit the end of the disk? */
+			dbg_printf("EOF %d @ %"PRIu64" %zu err %d\n",
+					rvp->disk->d_fd, rv->io_start, sz,
+					read_error);
+			rvp->ioerr_fn(rvp->ctx, rvp->disk, rv->io_start, sz,
+					read_error, rv->io_end_arg);
+			break;
 		} else if (sz < len) {
 			/*
 			 * A short direct read suggests that we might have hit


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 20/41] xfs_scrub: scan metadata directories during phase 3
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (18 preceding siblings ...)
  2024-12-23 21:51   ` [PATCH 19/41] xfs_scrub: treat zero-length read verify as an IO error Darrick J. Wong
@ 2024-12-23 21:51   ` Darrick J. Wong
  2024-12-23 21:51   ` [PATCH 21/41] xfs_scrub: re-run metafile scrubbers during phase 5 Darrick J. Wong
                     ` (20 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:51 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Scan metadata directories for correctness during phase 3.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 scrub/inodes.c |   11 ++++++++++-
 scrub/inodes.h |    5 ++++-
 scrub/phase3.c |    7 ++++++-
 scrub/phase5.c |    5 ++++-
 scrub/phase6.c |    2 +-
 5 files changed, 25 insertions(+), 5 deletions(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 16c79cf495c793..3fe759e8f4867d 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -56,6 +56,7 @@ bulkstat_for_inumbers(
 {
 	struct xfs_bulkstat	*bstat = breq->bulkstat;
 	struct xfs_bulkstat	*bs;
+	unsigned int		flags = 0;
 	int			i;
 	int			error;
 
@@ -70,6 +71,9 @@ bulkstat_for_inumbers(
 			 strerror_r(error, errbuf, DESCR_BUFSZ));
 	}
 
+	if (breq->hdr.flags & XFS_BULK_IREQ_METADIR)
+		flags |= XFS_BULK_IREQ_METADIR;
+
 	/*
 	 * Check each of the stats we got back to make sure we got the inodes
 	 * we asked for.
@@ -84,7 +88,7 @@ bulkstat_for_inumbers(
 
 		/* Load the one inode. */
 		error = -xfrog_bulkstat_single(&ctx->mnt,
-				inumbers->xi_startino + i, 0, bs);
+				inumbers->xi_startino + i, flags, bs);
 		if (error || bs->bs_ino != inumbers->xi_startino + i) {
 			memset(bs, 0, sizeof(struct xfs_bulkstat));
 			bs->bs_ino = inumbers->xi_startino + i;
@@ -100,6 +104,7 @@ struct scan_inodes {
 	scrub_inode_iter_fn	fn;
 	void			*arg;
 	unsigned int		nr_threads;
+	unsigned int		flags;
 	bool			aborted;
 };
 
@@ -158,6 +163,8 @@ alloc_ichunk(
 
 	breq = ichunk_to_bulkstat(ichunk);
 	breq->hdr.icount = LIBFROG_BULKSTAT_CHUNKSIZE;
+	if (si->flags & SCRUB_SCAN_METADIR)
+		breq->hdr.flags |= XFS_BULK_IREQ_METADIR;
 
 	*ichunkp = ichunk;
 	return 0;
@@ -380,10 +387,12 @@ int
 scrub_scan_all_inodes(
 	struct scrub_ctx	*ctx,
 	scrub_inode_iter_fn	fn,
+	unsigned int		flags,
 	void			*arg)
 {
 	struct scan_inodes	si = {
 		.fn		= fn,
+		.flags		= flags,
 		.arg		= arg,
 		.nr_threads	= scrub_nproc_workqueue(ctx),
 	};
diff --git a/scrub/inodes.h b/scrub/inodes.h
index 9447fb56aa62e7..7a0b275e575ead 100644
--- a/scrub/inodes.h
+++ b/scrub/inodes.h
@@ -17,8 +17,11 @@
 typedef int (*scrub_inode_iter_fn)(struct scrub_ctx *ctx,
 		struct xfs_handle *handle, struct xfs_bulkstat *bs, void *arg);
 
+/* Return metadata directories too. */
+#define SCRUB_SCAN_METADIR	(1 << 0)
+
 int scrub_scan_all_inodes(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
-		void *arg);
+		unsigned int flags, void *arg);
 
 int scrub_open_handle(struct xfs_handle *handle);
 
diff --git a/scrub/phase3.c b/scrub/phase3.c
index 046a42c1da8beb..c90da78439425a 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -312,6 +312,7 @@ phase3_func(
 	struct scrub_inode_ctx	ictx = { .ctx = ctx };
 	uint64_t		val;
 	xfs_agnumber_t		agno;
+	unsigned int		scan_flags = 0;
 	int			err;
 
 	err = -ptvar_alloc(scrub_nproc(ctx), sizeof(struct action_list),
@@ -328,6 +329,10 @@ phase3_func(
 		goto out_ptvar;
 	}
 
+	/* Scan the metadata directory tree too. */
+	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_METADIR)
+		scan_flags |= SCRUB_SCAN_METADIR;
+
 	/*
 	 * If we already have ag/fs metadata to repair from previous phases,
 	 * we would rather not try to repair file metadata until we've tried
@@ -338,7 +343,7 @@ phase3_func(
 			ictx.always_defer_repairs = true;
 	}
 
-	err = scrub_scan_all_inodes(ctx, scrub_inode, &ictx);
+	err = scrub_scan_all_inodes(ctx, scrub_inode, scan_flags, &ictx);
 	if (!err && ictx.aborted)
 		err = ECANCELED;
 	if (err)
diff --git a/scrub/phase5.c b/scrub/phase5.c
index e1d94f9a3568b1..69b1cae5c5e2c0 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -462,6 +462,9 @@ retry_deferred_inode(
 	unsigned int		flags = 0;
 	int			error;
 
+	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_METADIR)
+		flags |= XFS_BULK_IREQ_METADIR;
+
 	error = -xfrog_bulkstat_single(&ctx->mnt, ino, flags, &bstat);
 	if (error == ENOENT) {
 		/* Directory is gone, mark it clear. */
@@ -772,7 +775,7 @@ _("Filesystem has errors, skipping connectivity checks."));
 
 	pthread_mutex_init(&ncs.lock, NULL);
 
-	ret = scrub_scan_all_inodes(ctx, check_inode_names, &ncs);
+	ret = scrub_scan_all_inodes(ctx, check_inode_names, 0, &ncs);
 	if (ret)
 		goto out_lock;
 	if (ncs.aborted) {
diff --git a/scrub/phase6.c b/scrub/phase6.c
index 54d21820a722a6..e4f26e7f1dd93e 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -578,7 +578,7 @@ report_all_media_errors(
 	}
 
 	/* Scan for unlinked files. */
-	return scrub_scan_all_inodes(ctx, report_inode_loss, vs);
+	return scrub_scan_all_inodes(ctx, report_inode_loss, 0, vs);
 }
 
 /* Schedule a read-verify of a (data block) extent. */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 21/41] xfs_scrub: re-run metafile scrubbers during phase 5
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (19 preceding siblings ...)
  2024-12-23 21:51   ` [PATCH 20/41] xfs_scrub: scan metadata directories during phase 3 Darrick J. Wong
@ 2024-12-23 21:51   ` Darrick J. Wong
  2024-12-23 21:52   ` [PATCH 22/41] xfs_repair: handle sb_metadirino correctly when zeroing supers Darrick J. Wong
                     ` (19 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:51 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

For metadata files on a metadir filesystem, re-run the scrubbers during
phase 5 to ensure that the metadata files are still connected.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 scrub/phase5.c |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 scrub/scrub.h  |    7 ++++
 2 files changed, 103 insertions(+), 1 deletion(-)


diff --git a/scrub/phase5.c b/scrub/phase5.c
index 69b1cae5c5e2c0..4d0a76a529b55d 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -745,6 +745,87 @@ run_kernel_fs_scan_scrubbers(
 	return ret;
 }
 
+/* Queue one metapath scrubber. */
+static int
+queue_metapath_scan(
+	struct workqueue	*wq,
+	bool			*abortedp,
+	uint64_t		type)
+{
+	struct fs_scan_item	*item;
+	struct scrub_ctx	*ctx = wq->wq_ctx;
+	int			ret;
+
+	item = malloc(sizeof(struct fs_scan_item));
+	if (!item) {
+		ret = ENOMEM;
+		str_liberror(ctx, ret, _("setting up metapath scan"));
+		return ret;
+	}
+	scrub_item_init_metapath(&item->sri, type);
+	scrub_item_schedule(&item->sri, XFS_SCRUB_TYPE_METAPATH);
+	item->abortedp = abortedp;
+
+	ret = -workqueue_add(wq, fs_scan_worker, 0, item);
+	if (ret)
+		str_liberror(ctx, ret, _("queuing metapath scan work"));
+
+	return ret;
+}
+
+/*
+ * Scrub metadata directory file paths to ensure that fs metadata are still
+ * connected where the fs needs to find them.
+ */
+static int
+run_kernel_metadir_path_scrubbers(
+	struct scrub_ctx	*ctx)
+{
+	struct workqueue	wq;
+	const struct xfrog_scrub_descr	*sc;
+	uint64_t		type;
+	unsigned int		nr_threads = scrub_nproc_workqueue(ctx);
+	bool			aborted = false;
+	int			ret, ret2;
+
+	ret = -workqueue_create(&wq, (struct xfs_mount *)ctx, nr_threads);
+	if (ret) {
+		str_liberror(ctx, ret, _("setting up metapath scan workqueue"));
+		return ret;
+	}
+
+	/*
+	 * Scan all the metadata files in parallel if metadata directories
+	 * are enabled, because the phase 3 scrubbers might have taken out
+	 * parts of the metadir tree.
+	 */
+	for (type = 0; type < XFS_SCRUB_METAPATH_NR; type++) {
+		sc = &xfrog_metapaths[type];
+		if (sc->group != XFROG_SCRUB_GROUP_FS)
+			continue;
+
+		ret = queue_metapath_scan(&wq, &aborted, type);
+		if (ret) {
+			str_liberror(ctx, ret,
+ _("queueing metapath scrub work"));
+			goto wait;
+		}
+	}
+
+wait:
+	ret2 = -workqueue_terminate(&wq);
+	if (ret2) {
+		str_liberror(ctx, ret2, _("joining metapath scan workqueue"));
+		if (!ret)
+			ret = ret2;
+	}
+	if (aborted && !ret)
+		ret = ECANCELED;
+
+	workqueue_destroy(&wq);
+	return ret;
+}
+
 /* Check directory connectivity. */
 int
 phase5_func(
@@ -753,6 +834,16 @@ phase5_func(
 	struct ncheck_state	ncs = { .ctx = ctx };
 	int			ret;
 
+	/*
+	 * Make sure metadata files are still connected to the metadata
+	 * directory tree now that phase 3 pruned all corrupt directory tree
+	 * links.
+	 */
+	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_METADIR) {
+		ret = run_kernel_metadir_path_scrubbers(ctx);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Check and fix anything that requires a full filesystem scan.  We do
@@ -805,8 +896,12 @@ phase5_estimate(
 	unsigned int		*nr_threads,
 	int			*rshift)
 {
+	unsigned int		scans = 2;
+
 	*items = scrub_estimate_iscan_work(ctx);
-	*nr_threads = scrub_nproc(ctx) * 2;
+	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_METADIR)
+		scans++;
+	*nr_threads = scrub_nproc(ctx) * scans;
 	*rshift = 0;
 	return 0;
 }
diff --git a/scrub/scrub.h b/scrub/scrub.h
index c3eed1b261d511..3bb3ea1d07bf40 100644
--- a/scrub/scrub.h
+++ b/scrub/scrub.h
@@ -108,6 +108,13 @@ scrub_item_init_file(struct scrub_item *sri, const struct xfs_bulkstat *bstat)
 	sri->sri_gen = bstat->bs_gen;
 }
 
+static inline void
+scrub_item_init_metapath(struct scrub_item *sri, uint64_t metapath)
+{
+	memset(sri, 0, sizeof(*sri));
+	sri->sri_ino = metapath;
+}
+
 void scrub_item_dump(struct scrub_item *sri, unsigned int group_mask,
 		const char *tag);
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 22/41] xfs_repair: handle sb_metadirino correctly when zeroing supers
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (20 preceding siblings ...)
  2024-12-23 21:51   ` [PATCH 21/41] xfs_scrub: re-run metafile scrubbers during phase 5 Darrick J. Wong
@ 2024-12-23 21:52   ` Darrick J. Wong
  2024-12-23 21:52   ` [PATCH 23/41] xfs_repair: dont check metadata directory dirent inumbers Darrick J. Wong
                     ` (18 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:52 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The metadata directory root inumber is now the last field in the
superblock, so extend the zeroing code to know about that.
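
As a rough sketch of the idea, assuming a made-up on-disk layout: the
valid size of a superblock is the offset of its last known field plus
that field's size, and every byte after that is expected to be zero.
The struct and the example_* helpers below are hypothetical and much
smaller than the real struct xfs_dsb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed on-disk superblock layout. */
struct example_dsb {
	uint32_t	magic;
	uint32_t	version;
	uint8_t		meta_uuid[16];
	uint64_t	metadirino;	/* last field when metadir is on */
	uint8_t		unused[32];	/* rest of the sector */
};

/* Number of leading bytes that hold valid superblock fields. */
static size_t
example_valid_size(
	bool		has_metadir)
{
	if (has_metadir)
		return offsetof(struct example_dsb, metadirino) +
		       sizeof(((struct example_dsb *)0)->metadirino);
	return offsetof(struct example_dsb, meta_uuid) +
	       sizeof(((struct example_dsb *)0)->meta_uuid);
}

/* Zero everything past the valid fields; return true if we changed it. */
static bool
example_zero_tail(
	unsigned char	*buf,
	size_t		buflen,
	bool		has_metadir)
{
	size_t		valid = example_valid_size(has_metadir);
	size_t		i;
	bool		dirty = false;

	assert(buf != NULL && buflen >= valid);

	for (i = valid; i < buflen; i++) {
		if (buf[i]) {
			buf[i] = 0;
			dirty = true;
		}
	}
	return dirty;
}
```

If the size were still computed from the older last field, a valid
metadir root inumber would look like stray garbage and get zeroed.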

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/agheader.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)


diff --git a/repair/agheader.c b/repair/agheader.c
index 3930a0ac0919b4..fe58d833b8bafa 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -319,6 +319,12 @@ check_v5_feature_mismatch(
 	return XR_AG_SB_SEC;
 }
 
+static inline bool xfs_sb_version_hasmetadir(const struct xfs_sb *sbp)
+{
+	return xfs_sb_is_v5(sbp) &&
+		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR);
+}
+
 /*
  * Possible fields that may have been set at mkfs time,
  * sb_inoalignmt, sb_unit, sb_width and sb_dirblklog.
@@ -357,7 +363,10 @@ secondary_sb_whack(
 	 *
 	 * size is the size of data which is valid for this sb.
 	 */
-	if (xfs_sb_version_hasmetauuid(sb))
+	if (xfs_sb_version_hasmetadir(sb))
+		size = offsetof(struct xfs_dsb, sb_metadirino)
+			+ sizeof(sb->sb_metadirino);
+	else if (xfs_sb_version_hasmetauuid(sb))
 		size = offsetof(struct xfs_dsb, sb_meta_uuid)
 			+ sizeof(sb->sb_meta_uuid);
 	else if (xfs_sb_version_hascrc(sb))


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 23/41] xfs_repair: dont check metadata directory dirent inumbers
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (21 preceding siblings ...)
  2024-12-23 21:52   ` [PATCH 22/41] xfs_repair: handle sb_metadirino correctly when zeroing supers Darrick J. Wong
@ 2024-12-23 21:52   ` Darrick J. Wong
  2024-12-23 21:52   ` [PATCH 24/41] xfs_repair: refactor fixing dotdot Darrick J. Wong
                     ` (17 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:52 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Phase 6 always rebuilds the entire metadata directory tree, and repair
quietly ignores all the DIFLAG2_METADATA directory inodes that it finds.
As a result, none of the metadata directories are marked inuse in the
incore data, so the is_inode_free checks are not valid for anything we
find in a metadata directory.

Skip the is_inode_free check when scanning metadata directory dirents.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/libxfs_api_defs.h |    1 +
 repair/dir2.c            |   35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index dd163709fe07df..d2611d7a764259 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -121,6 +121,7 @@
 #define xfs_dinode_calc_crc		libxfs_dinode_calc_crc
 #define xfs_dinode_good_version		libxfs_dinode_good_version
 #define xfs_dinode_verify		libxfs_dinode_verify
+#define xfs_dinode_verify_metadir	libxfs_dinode_verify_metadir
 
 #define xfs_dir2_data_bestfree_p	libxfs_dir2_data_bestfree_p
 #define xfs_dir2_data_entry_tag_p	libxfs_dir2_data_entry_tag_p
diff --git a/repair/dir2.c b/repair/dir2.c
index bfeaddd07d2058..dab6523f676a34 100644
--- a/repair/dir2.c
+++ b/repair/dir2.c
@@ -136,6 +136,29 @@ process_sf_dir2_fixoff(
 	}
 }
 
+static inline bool
+is_metadata_directory(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dip)
+{
+	xfs_failaddr_t		fa;
+	uint16_t		mode;
+	uint16_t		flags;
+	uint64_t		flags2;
+
+	if (!xfs_has_metadir(mp))
+		return false;
+	if (dip->di_version < 3 ||
+	    !(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA)))
+		return false;
+
+	mode = be16_to_cpu(dip->di_mode);
+	flags = be16_to_cpu(dip->di_flags);
+	flags2 = be64_to_cpu(dip->di_flags2);
+	fa = libxfs_dinode_verify_metadir(mp, dip, mode, flags, flags2);
+	return fa == NULL;
+}
+
 /*
  * this routine performs inode discovery and tries to fix things
  * in place.  available redundancy -- inode data size should match
@@ -227,6 +250,12 @@ process_sf_dir2(
 		} else if (!libxfs_verify_dir_ino(mp, lino)) {
 			junkit = 1;
 			junkreason = _("invalid");
+		} else if (is_metadata_directory(mp, dip)) {
+			/*
+			 * Metadata directories are always rebuilt, so don't
+			 * bother checking if the child inode is free or not.
+			 */
+			junkit = 0;
 		} else if (lino == mp->m_sb.sb_rbmino)  {
 			junkit = 1;
 			junkreason = _("realtime bitmap");
@@ -698,6 +727,12 @@ process_dir2_data(
 			 * directory since it's still structurally intact.
 			 */
 			clearreason = _("invalid");
+		} else if (is_metadata_directory(mp, dip)) {
+			/*
+			 * Metadata directories are always rebuilt, so don't
+			 * bother checking if the child inode is free or not.
+			 */
+			clearino = 0;
 		} else if (ent_ino == mp->m_sb.sb_rbmino) {
 			clearreason = _("realtime bitmap");
 		} else if (ent_ino == mp->m_sb.sb_rsumino) {


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 24/41] xfs_repair: refactor fixing dotdot
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (22 preceding siblings ...)
  2024-12-23 21:52   ` [PATCH 23/41] xfs_repair: dont check metadata directory dirent inumbers Darrick J. Wong
@ 2024-12-23 21:52   ` Darrick J. Wong
  2024-12-23 21:53   ` [PATCH 25/41] xfs_repair: refactor marking of metadata inodes Darrick J. Wong
                     ` (16 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:52 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Pull the code that fixes a directory's dot-dot entry into a separate
helper function so that we can call it on the rootdir and (later) the
metadir.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
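Reviewer's note (not part of the commit): a minimal sketch of the
control flow that the new helper consolidates, assuming the same three
outcomes as the code below (skip, dry-run report, actual fix).  The
enum and the function name dotdot_decide() are hypothetical and exist
only for illustration; the real fix_dotdot() returns void and runs a
mkdir-sized transaction where the sketch merely clears the flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome codes; the real helper returns void. */
enum dotdot_action {
	DOTDOT_SKIP,		/* not the target directory, or nothing to fix */
	DOTDOT_WOULD_FIX,	/* no-modify mode: only report the problem */
	DOTDOT_FIX,		/* recreate the entry and clear the flag */
};

static enum dotdot_action
dotdot_decide(
	uint64_t	ino,
	uint64_t	rootino,
	bool		no_modify,
	int		*need_dotdot)
{
	/* Only the designated root of a tree can need a new ".." entry. */
	if (ino != rootino || !*need_dotdot)
		return DOTDOT_SKIP;

	/* Dry run: leave the flag set so that later phases still see it. */
	if (no_modify)
		return DOTDOT_WOULD_FIX;

	/* The real code commits a transaction before clearing the flag. */
	*need_dotdot = 0;
	return DOTDOT_FIX;
}
```

Passing the root inode number and the flag pointer as parameters is
what should let a later caller reuse the same path for the metadir.
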
 repair/phase6.c |   96 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 57 insertions(+), 39 deletions(-)


diff --git a/repair/phase6.c b/repair/phase6.c
index 630617ef8ab8fe..0f13d996fe726a 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -2649,6 +2649,62 @@ dir_hash_add_parent_ptrs(
 	}
 }
 
+/*
+ * If we have to create a .. for /, do it now *before* we delete the bogus
+ * entries, otherwise the directory could transform into a shortform dir which
+ * would probably cause the simulation to choke.  Even if the illegal entries
+ * get shifted around, it's ok because the entries are structurally intact and
+ * in hash-value order so the simulation won't get confused if it has to
+ * move them around.
+ */
+static void
+fix_dotdot(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	struct xfs_inode	*ip,
+	xfs_ino_t		rootino,
+	const char		*tag,
+	int			*need_dotdot)
+{
+	struct xfs_trans	*tp;
+	int			nres;
+	int			error;
+
+	if (ino != rootino || !*need_dotdot)
+		return;
+
+	if (no_modify) {
+		do_warn(_("would recreate %s directory .. entry\n"), tag);
+		return;
+	}
+
+	ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_LOCAL);
+
+	do_warn(_("recreating %s directory .. entry\n"), tag);
+
+	nres = libxfs_mkdir_space_res(mp, 2);
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_mkdir, nres, 0, 0, &tp);
+	if (error)
+		res_failed(error);
+
+	libxfs_trans_ijoin(tp, ip, 0);
+
+	error = -libxfs_dir_createname(tp, ip, &xfs_name_dotdot, ip->i_ino,
+			nres);
+	if (error)
+		do_error(
+_("can't make \"..\" entry in %s inode %" PRIu64 ", createname error %d\n"),
+			tag, ino, error);
+
+	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	error = -libxfs_trans_commit(tp);
+	if (error)
+		do_error(
+_("%s inode \"..\" entry recreation failed (%d)\n"), tag, error);
+
+	*need_dotdot = 0;
+}
+
 /*
  * processes all reachable inodes in directories
  */
@@ -2778,45 +2834,7 @@ _("error %d fixing shortform directory %llu\n"),
 	dir_hash_add_parent_ptrs(ip, hashtab);
 	dir_hash_done(hashtab);
 
-	/*
-	 * if we have to create a .. for /, do it now *before*
-	 * we delete the bogus entries, otherwise the directory
-	 * could transform into a shortform dir which would
-	 * probably cause the simulation to choke.  Even
-	 * if the illegal entries get shifted around, it's ok
-	 * because the entries are structurally intact and in
-	 * in hash-value order so the simulation won't get confused
-	 * if it has to move them around.
-	 */
-	if (!no_modify && need_root_dotdot && ino == mp->m_sb.sb_rootino)  {
-		ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_LOCAL);
-
-		do_warn(_("recreating root directory .. entry\n"));
-
-		nres = libxfs_mkdir_space_res(mp, 2);
-		error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_mkdir,
-					    nres, 0, 0, &tp);
-		if (error)
-			res_failed(error);
-
-		libxfs_trans_ijoin(tp, ip, 0);
-
-		error = -libxfs_dir_createname(tp, ip, &xfs_name_dotdot,
-					ip->i_ino, nres);
-		if (error)
-			do_error(
-	_("can't make \"..\" entry in root inode %" PRIu64 ", createname error %d\n"), ino, error);
-
-		libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-		error = -libxfs_trans_commit(tp);
-		if (error)
-			do_error(
-	_("root inode \"..\" entry recreation failed (%d)\n"), error);
-
-		need_root_dotdot = 0;
-	} else if (need_root_dotdot && ino == mp->m_sb.sb_rootino)  {
-		do_warn(_("would recreate root directory .. entry\n"));
-	}
+	fix_dotdot(mp, ino, ip, mp->m_sb.sb_rootino, "root", &need_root_dotdot);
 
 	/*
 	 * if we need to create the '.' entry, do so only if



* [PATCH 25/41] xfs_repair: refactor marking of metadata inodes
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (23 preceding siblings ...)
  2024-12-23 21:52   ` [PATCH 24/41] xfs_repair: refactor fixing dotdot Darrick J. Wong
@ 2024-12-23 21:53   ` Darrick J. Wong
  2024-12-23 21:53   ` [PATCH 26/41] xfs_repair: refactor root directory initialization Darrick J. Wong
                     ` (15 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:53 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Refactor the mechanics of marking a metadata inode into a helper
function so that we don't have to open-code that for every single
metadata inode.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
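Reviewer's note (not part of the commit): a rough sketch of the
arithmetic that mark_inode() wraps, assuming that an inode number
splits into an AG number in the high bits and an AG-relative inode
number in the low bits.  The toy_* names and the struct are
hypothetical; the real code uses XFS_INO_TO_AGNO and XFS_INO_TO_AGINO
with geometry taken from the mount.

```c
#include <stdint.h>

/* Hypothetical geometry; the real bit width comes from the mount. */
struct toy_geom {
	unsigned int	agino_log;	/* bits in an AG-relative inode number */
};

/* High bits of the inode number select the allocation group. */
static uint32_t
toy_ino_to_agno(const struct toy_geom *g, uint64_t ino)
{
	return (uint32_t)(ino >> g->agino_log);
}

/* Low bits give the inode number relative to the start of the AG. */
static uint32_t
toy_ino_to_agino(const struct toy_geom *g, uint64_t ino)
{
	return (uint32_t)(ino & ((UINT64_C(1) << g->agino_log) - 1));
}

/* Offset of the inode within the chunk record starting at startnum. */
static uint32_t
toy_chunk_offset(const struct toy_geom *g, uint64_t ino, uint32_t startnum)
{
	return toy_ino_to_agino(g, ino) - startnum;
}
```

The offset is the value that the helper hands to add_inode_reached()
together with the record returned by find_inode_rec().
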
 repair/phase6.c |   72 ++++++++++++++++++-------------------------------------
 1 file changed, 24 insertions(+), 48 deletions(-)


diff --git a/repair/phase6.c b/repair/phase6.c
index 0f13d996fe726a..79c8226110a890 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -2891,6 +2891,18 @@ _("error %d fixing shortform directory %llu\n"),
 	libxfs_irele(ip);
 }
 
+static void
+mark_inode(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino)
+{
+	struct ino_tree_node	*irec =
+		find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ino),
+				   XFS_INO_TO_AGINO(mp, ino));
+
+	add_inode_reached(irec, XFS_INO_TO_AGINO(mp, ino) - irec->ino_startnum);
+}
+
 /*
  * mark realtime bitmap and summary inodes as reached.
  * quota inode will be marked here as well
@@ -2898,54 +2910,18 @@ _("error %d fixing shortform directory %llu\n"),
 static void
 mark_standalone_inodes(xfs_mount_t *mp)
 {
-	ino_tree_node_t		*irec;
-	int			offset;
-
-	irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rbmino),
-			XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rbmino));
-
-	offset = XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rbmino) -
-			irec->ino_startnum;
-
-	add_inode_reached(irec, offset);
-
-	irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rsumino),
-			XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rsumino));
-
-	offset = XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rsumino) -
-			irec->ino_startnum;
-
-	add_inode_reached(irec, offset);
-
-	if (fs_quotas)  {
-		if (mp->m_sb.sb_uquotino
-				&& mp->m_sb.sb_uquotino != NULLFSINO)  {
-			irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp,
-						mp->m_sb.sb_uquotino),
-				XFS_INO_TO_AGINO(mp, mp->m_sb.sb_uquotino));
-			offset = XFS_INO_TO_AGINO(mp, mp->m_sb.sb_uquotino)
-					- irec->ino_startnum;
-			add_inode_reached(irec, offset);
-		}
-		if (mp->m_sb.sb_gquotino
-				&& mp->m_sb.sb_gquotino != NULLFSINO)  {
-			irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp,
-						mp->m_sb.sb_gquotino),
-				XFS_INO_TO_AGINO(mp, mp->m_sb.sb_gquotino));
-			offset = XFS_INO_TO_AGINO(mp, mp->m_sb.sb_gquotino)
-					- irec->ino_startnum;
-			add_inode_reached(irec, offset);
-		}
-		if (mp->m_sb.sb_pquotino
-				&& mp->m_sb.sb_pquotino != NULLFSINO)  {
-			irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp,
-						mp->m_sb.sb_pquotino),
-				XFS_INO_TO_AGINO(mp, mp->m_sb.sb_pquotino));
-			offset = XFS_INO_TO_AGINO(mp, mp->m_sb.sb_pquotino)
-					- irec->ino_startnum;
-			add_inode_reached(irec, offset);
-		}
-	}
+	mark_inode(mp, mp->m_sb.sb_rbmino);
+	mark_inode(mp, mp->m_sb.sb_rsumino);
+
+	if (!fs_quotas)
+		return;
+
+	if (mp->m_sb.sb_uquotino && mp->m_sb.sb_uquotino != NULLFSINO)
+		mark_inode(mp, mp->m_sb.sb_uquotino);
+	if (mp->m_sb.sb_gquotino && mp->m_sb.sb_gquotino != NULLFSINO)
+		mark_inode(mp, mp->m_sb.sb_gquotino);
+	if (mp->m_sb.sb_pquotino && mp->m_sb.sb_pquotino != NULLFSINO)
+		mark_inode(mp, mp->m_sb.sb_pquotino);
 }
 
 static void



* [PATCH 26/41] xfs_repair: refactor root directory initialization
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (24 preceding siblings ...)
  2024-12-23 21:53   ` [PATCH 25/41] xfs_repair: refactor marking of metadata inodes Darrick J. Wong
@ 2024-12-23 21:53   ` Darrick J. Wong
  2024-12-23 21:53   ` [PATCH 27/41] xfs_repair: refactor grabbing realtime metadata inodes Darrick J. Wong
                     ` (14 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:53 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Refactor root directory initialization into a separate function we can
call for both the root dir and the metadir.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/phase6.c |   63 +++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 40 insertions(+), 23 deletions(-)


diff --git a/repair/phase6.c b/repair/phase6.c
index 79c8226110a890..b3c013138be6c8 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -587,27 +587,27 @@ mk_rsumino(xfs_mount_t *mp)
 	libxfs_irele(ip);
 }
 
-/*
- * makes a new root directory.
- */
-static void
-mk_root_dir(xfs_mount_t *mp)
+/* Initialize a root directory. */
+static int
+init_fs_root_dir(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	mode_t			mode,
+	struct xfs_inode	**ipp)
 {
-	xfs_trans_t	*tp;
-	xfs_inode_t	*ip;
-	int		i;
-	int		error;
-	const mode_t	mode = 0755;
-	ino_tree_node_t	*irec;
+	struct xfs_trans	*tp;
+	struct xfs_inode	*ip = NULL;
+	struct ino_tree_node	*irec;
+	int			error;
 
-	ip = NULL;
-	i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 10, 0, 0, &tp);
-	if (i)
-		res_failed(i);
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 10, 0, 0, &tp);
+	if (error)
+		return error;
 
-	error = -libxfs_iget(mp, tp, mp->m_sb.sb_rootino, 0, &ip);
+	error = -libxfs_iget(mp, tp, ino, 0, &ip);
 	if (error) {
-		do_error(_("could not iget root inode -- error - %d\n"), error);
+		libxfs_trans_cancel(tp);
+		return error;
 	}
 
 	/* Reset the root directory. */
@@ -616,14 +616,31 @@ mk_root_dir(xfs_mount_t *mp)
 
 	error = -libxfs_trans_commit(tp);
 	if (error)
-		do_error(_("%s: commit failed, error %d\n"), __func__, error);
+		return error;
+
+	irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ino),
+				XFS_INO_TO_AGINO(mp, ino));
+	set_inode_isadir(irec, XFS_INO_TO_AGINO(mp, ino) - irec->ino_startnum);
+	*ipp = ip;
+	return 0;
+}
+
+/*
+ * makes a new root directory.
+ */
+static void
+mk_root_dir(xfs_mount_t *mp)
+{
+	struct xfs_inode	*ip = NULL;
+	int			error;
+
+	error = init_fs_root_dir(mp, mp->m_sb.sb_rootino, 0755, &ip);
+	if (error)
+		do_error(
+	_("Could not reinitialize root directory inode, error %d\n"),
+			error);
 
 	libxfs_irele(ip);
-
-	irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rootino),
-				XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino));
-	set_inode_isadir(irec, XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino) -
-				irec->ino_startnum);
 }
 
 /*



* [PATCH 27/41] xfs_repair: refactor grabbing realtime metadata inodes
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (25 preceding siblings ...)
  2024-12-23 21:53   ` [PATCH 26/41] xfs_repair: refactor root directory initialization Darrick J. Wong
@ 2024-12-23 21:53   ` Darrick J. Wong
  2024-12-23 21:53   ` [PATCH 28/41] xfs_repair: check metadata inode flag Darrick J. Wong
                     ` (13 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:53 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a helper function to grab a realtime metadata inode.  When
metadir arrives, the bitmap and summary inodes can float, so we'll
turn this function into a "load or allocate" function.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/phase6.c |   51 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 17 deletions(-)


diff --git a/repair/phase6.c b/repair/phase6.c
index b3c013138be6c8..1decfe2286fa47 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -474,6 +474,24 @@ reset_sbroot_ino(
 	libxfs_inode_init(tp, &args, ip);
 }
 
+/* Load a realtime freespace metadata inode from disk and reset it. */
+static int
+ensure_rtino(
+	struct xfs_trans		*tp,
+	xfs_ino_t			ino,
+	struct xfs_inode		**ipp)
+{
+	struct xfs_mount		*mp = tp->t_mountp;
+	int				error;
+
+	error = -libxfs_iget(mp, tp, ino, 0, ipp);
+	if (error)
+		return error;
+
+	reset_sbroot_ino(tp, S_IFREG, *ipp);
+	return 0;
+}
+
 static void
 mk_rbmino(
 	struct xfs_mount	*mp)
@@ -486,15 +504,14 @@ mk_rbmino(
 	if (error)
 		res_failed(error);
 
-	error = -libxfs_iget(mp, tp, mp->m_sb.sb_rbmino, 0, &ip);
-	if (error) {
-		do_error(
-		_("couldn't iget realtime bitmap inode -- error - %d\n"),
-			error);
-	}
-
 	/* Reset the realtime bitmap inode. */
-	reset_sbroot_ino(tp, S_IFREG, ip);
+	error = ensure_rtino(tp, mp->m_sb.sb_rbmino, &ip);
+	if (error) {
+		do_error(
+		_("couldn't iget realtime bitmap inode -- error - %d\n"),
+			error);
+	}
+
 	ip->i_disk_size = mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize;
 	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 	error = -libxfs_trans_commit(tp);
@@ -560,7 +577,8 @@ _("couldn't re-initialize realtime summary inode, error %d\n"), error);
 }
 
 static void
-mk_rsumino(xfs_mount_t *mp)
+mk_rsumino(
+	struct xfs_mount	*mp)
 {
 	struct xfs_trans	*tp;
 	struct xfs_inode	*ip;
@@ -570,15 +588,14 @@ mk_rsumino(xfs_mount_t *mp)
 	if (error)
 		res_failed(error);
 
-	error = -libxfs_iget(mp, tp, mp->m_sb.sb_rsumino, 0, &ip);
-	if (error) {
-		do_error(
-		_("couldn't iget realtime summary inode -- error - %d\n"),
-			error);
-	}
-
 	/* Reset the rt summary inode. */
-	reset_sbroot_ino(tp, S_IFREG, ip);
+	error = ensure_rtino(tp, mp->m_sb.sb_rsumino, &ip);
+	if (error) {
+		do_error(
+		_("couldn't iget realtime summary inode -- error - %d\n"),
+			error);
+	}
+
 	ip->i_disk_size = mp->m_rsumblocks * mp->m_sb.sb_blocksize;
 	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 	error = -libxfs_trans_commit(tp);



* [PATCH 28/41] xfs_repair: check metadata inode flag
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (26 preceding siblings ...)
  2024-12-23 21:53   ` [PATCH 27/41] xfs_repair: refactor grabbing realtime metadata inodes Darrick J. Wong
@ 2024-12-23 21:53   ` Darrick J. Wong
  2024-12-23 21:54   ` [PATCH 29/41] xfs_repair: use libxfs_metafile_iget for quota/rt inodes Darrick J. Wong
                     ` (12 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:53 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Check whether or not the metadata inode flag is set appropriately.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
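Reviewer's note (not part of the commit): a small sketch of the
decision that the new hunk in dinode.c makes, assuming the two failure
cases visible in the diff (flag set but rejected by the verifier, flag
missing on an sb-rooted metadata inode).  The enum and check_metaflag()
are hypothetical names; the booleans stand in for the flags2 test, the
libxfs_dinode_verify_metadir() result, xfs_has_metadir() and
should_have_metadir_iflag().

```c
#include <stdbool.h>

/* Hypothetical verdicts; the real code warns and jumps to clear_bad_out. */
enum metaflag_verdict {
	METAFLAG_OK,		/* flag state is consistent */
	METAFLAG_UNEXPECTED,	/* flag set, but the verifier rejects it */
	METAFLAG_MISSING,	/* flag clear on an sb-rooted metadata inode */
};

static enum metaflag_verdict
check_metaflag(
	bool	flag_set,
	bool	verifier_ok,
	bool	has_metadir,
	bool	is_sb_metafile)
{
	/* A set flag must satisfy the metadir verifier. */
	if (flag_set)
		return verifier_ok ? METAFLAG_OK : METAFLAG_UNEXPECTED;

	/* On metadir filesystems the sb-rooted inodes must carry the flag. */
	if (has_metadir && is_sb_metafile)
		return METAFLAG_MISSING;

	return METAFLAG_OK;
}
```
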
 repair/dinode.c |   41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)


diff --git a/repair/dinode.c b/repair/dinode.c
index e217e037f8862d..5dd7edfa36aed0 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -2331,6 +2331,26 @@ _("Bad extent size hint %u on inode %" PRIu64 ", "),
 	}
 }
 
+static inline bool
+should_have_metadir_iflag(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino)
+{
+	if (ino == mp->m_sb.sb_metadirino)
+		return true;
+	if (ino == mp->m_sb.sb_rbmino)
+		return true;
+	if (ino == mp->m_sb.sb_rsumino)
+		return true;
+	if (ino == mp->m_sb.sb_uquotino)
+		return true;
+	if (ino == mp->m_sb.sb_gquotino)
+		return true;
+	if (ino == mp->m_sb.sb_pquotino)
+		return true;
+	return false;
+}
+
 /*
  * returns 0 if the inode is ok, 1 if the inode is corrupt
  * check_dups can be set to 1 *only* when called by the
@@ -2680,6 +2700,27 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 			}
 		}
 
+		if (flags2 & XFS_DIFLAG2_METADATA) {
+			xfs_failaddr_t	fa;
+
+			fa = libxfs_dinode_verify_metadir(mp, dino, di_mode,
+					be16_to_cpu(dino->di_flags), flags2);
+			if (fa) {
+				if (!uncertain)
+					do_warn(
+	_("inode %" PRIu64 " is incorrectly marked as metadata\n"),
+						lino);
+				goto clear_bad_out;
+			}
+		} else if (xfs_has_metadir(mp) &&
+			   should_have_metadir_iflag(mp, lino)) {
+			if (!uncertain)
+				do_warn(
+	_("inode %" PRIu64 " should be marked as metadata\n"),
+					lino);
+			goto clear_bad_out;
+		}
+
 		if ((flags2 & XFS_DIFLAG2_REFLINK) &&
 		    !xfs_has_reflink(mp)) {
 			if (!uncertain) {



* [PATCH 29/41] xfs_repair: use libxfs_metafile_iget for quota/rt inodes
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (27 preceding siblings ...)
  2024-12-23 21:53   ` [PATCH 28/41] xfs_repair: check metadata inode flag Darrick J. Wong
@ 2024-12-23 21:54   ` Darrick J. Wong
  2024-12-23 21:54   ` [PATCH 30/41] xfs_repair: rebuild the metadata directory Darrick J. Wong
                     ` (11 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:54 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Use the new iget function for these metadata files so that we can check
types, etc.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/phase6.c     |   23 +++++++++--------------
 repair/quotacheck.c |   17 ++++++++++++++---
 repair/rt.c         |   29 ++++++++++++++++++++++-------
 3 files changed, 45 insertions(+), 24 deletions(-)


diff --git a/repair/phase6.c b/repair/phase6.c
index 1decfe2286fa47..82e1687f9b278d 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -484,6 +484,11 @@ ensure_rtino(
 	struct xfs_mount		*mp = tp->t_mountp;
 	int				error;
 
+	/*
+	 * Don't use metafile iget here because we're resetting sb-rooted
+	 * inodes that live at fixed inumbers, but these inodes could be in
+	 * an arbitrary state.
+	 */
 	error = -libxfs_iget(mp, tp, ino, 0, ipp);
 	if (error)
 		return error;
@@ -524,16 +529,11 @@ static void
 fill_rbmino(
 	struct xfs_mount	*mp)
 {
-	struct xfs_trans	*tp;
 	struct xfs_inode	*ip;
 	int			error;
 
-	error = -libxfs_trans_alloc_rollable(mp, 10, &tp);
-	if (error)
-		res_failed(error);
-
-	error = -libxfs_iget(mp, tp, mp->m_sb.sb_rbmino, 0, &ip);
-	libxfs_trans_cancel(tp);
+	error = -libxfs_metafile_iget(mp, mp->m_sb.sb_rbmino,
+			XFS_METAFILE_RTBITMAP, &ip);
 	if (error)
 		do_error(
 _("couldn't iget realtime bitmap inode, error %d\n"), error);
@@ -551,16 +551,11 @@ static void
 fill_rsumino(
 	struct xfs_mount	*mp)
 {
-	struct xfs_trans	*tp;
 	struct xfs_inode	*ip;
 	int			error;
 
-	error = -libxfs_trans_alloc_rollable(mp, 10, &tp);
-	if (error)
-		res_failed(error);
-
-	error = -libxfs_iget(mp, tp, mp->m_sb.sb_rsumino, 0, &ip);
-	libxfs_trans_cancel(tp);
+	error = -libxfs_metafile_iget(mp, mp->m_sb.sb_rsumino,
+			XFS_METAFILE_RTSUMMARY, &ip);
 	if (error)
 		do_error(
 _("couldn't iget realtime summary inode, error %d\n"), error);
diff --git a/repair/quotacheck.c b/repair/quotacheck.c
index 4cb38db3ddd6d7..d9e08059927f80 100644
--- a/repair/quotacheck.c
+++ b/repair/quotacheck.c
@@ -403,21 +403,26 @@ quotacheck_verify(
 	struct xfs_ifork	*ifp;
 	struct qc_dquots	*dquots = NULL;
 	struct avl64node	*node, *n;
+	struct xfs_trans	*tp;
 	xfs_ino_t		ino = NULLFSINO;
+	enum xfs_metafile_type	metafile_type;
 	int			error;
 
 	switch (type) {
 	case XFS_DQTYPE_USER:
 		ino = mp->m_sb.sb_uquotino;
 		dquots = user_dquots;
+		metafile_type = XFS_METAFILE_USRQUOTA;
 		break;
 	case XFS_DQTYPE_GROUP:
 		ino = mp->m_sb.sb_gquotino;
 		dquots = group_dquots;
+		metafile_type = XFS_METAFILE_GRPQUOTA;
 		break;
 	case XFS_DQTYPE_PROJ:
 		ino = mp->m_sb.sb_pquotino;
 		dquots = proj_dquots;
+		metafile_type = XFS_METAFILE_PRJQUOTA;
 		break;
 	}
 
@@ -429,17 +434,21 @@ quotacheck_verify(
 	if (!dquots || !chkd_flags)
 		return;
 
-	error = -libxfs_iget(mp, NULL, ino, 0, &ip);
+	error = -libxfs_trans_alloc_empty(mp, &tp);
+	if (error)
+		do_error(_("could not alloc transaction to open quota file\n"));
+
+	error = -libxfs_trans_metafile_iget(tp, ino, metafile_type, &ip);
 	if (error) {
 		do_warn(
 	_("could not open %s inode %"PRIu64" for quotacheck, err=%d\n"),
 			qflags_typestr(type), ino, error);
 		chkd_flags = 0;
-		return;
+		goto out_trans;
 	}
 
 	ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK);
-	error = -libxfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+	error = -libxfs_iread_extents(tp, ip, XFS_DATA_FORK);
 	if (error) {
 		do_warn(
 	_("could not read %s inode %"PRIu64" extents, err=%d\n"),
@@ -477,6 +486,8 @@ _("%s record for id %u not found on disk (bcount %"PRIu64" rtbcount %"PRIu64" ic
 	}
 err:
 	libxfs_irele(ip);
+out_trans:
+	libxfs_trans_cancel(tp);
 }
 
 /*
diff --git a/repair/rt.c b/repair/rt.c
index 721c363cc1dd10..9ae421168e84b4 100644
--- a/repair/rt.c
+++ b/repair/rt.c
@@ -144,18 +144,34 @@ generate_rtinfo(
 static void
 check_rtfile_contents(
 	struct xfs_mount	*mp,
-	const char		*filename,
-	xfs_ino_t		ino,
-	void			*buf,
+	enum xfs_metafile_type	metafile_type,
 	xfs_fileoff_t		filelen)
 {
 	struct xfs_bmbt_irec	map;
 	struct xfs_buf		*bp;
 	struct xfs_inode	*ip;
+	const char		*filename;
+	void			*buf;
+	xfs_ino_t		ino;
 	xfs_fileoff_t		bno = 0;
 	int			error;
 
-	error = -libxfs_iget(mp, NULL, ino, 0, &ip);
+	switch (metafile_type) {
+	case XFS_METAFILE_RTBITMAP:
+		ino = mp->m_sb.sb_rbmino;
+		filename = "rtbitmap";
+		buf = btmcompute;
+		break;
+	case XFS_METAFILE_RTSUMMARY:
+		ino = mp->m_sb.sb_rsumino;
+		filename = "rtsummary";
+		buf = sumcompute;
+		break;
+	default:
+		return;
+	}
+
+	error = -libxfs_metafile_iget(mp, ino, metafile_type, &ip);
 	if (error) {
 		do_warn(_("unable to open %s file, err %d\n"), filename, error);
 		return;
@@ -216,7 +232,7 @@ check_rtbitmap(
 	if (need_rbmino)
 		return;
 
-	check_rtfile_contents(mp, "rtbitmap", mp->m_sb.sb_rbmino, btmcompute,
+	check_rtfile_contents(mp, XFS_METAFILE_RTBITMAP,
 			mp->m_sb.sb_rbmblocks);
 }
 
@@ -227,6 +243,5 @@ check_rtsummary(
 	if (need_rsumino)
 		return;
 
-	check_rtfile_contents(mp, "rtsummary", mp->m_sb.sb_rsumino, sumcompute,
-			mp->m_rsumblocks);
+	check_rtfile_contents(mp, XFS_METAFILE_RTSUMMARY, mp->m_rsumblocks);
 }



* [PATCH 30/41] xfs_repair: rebuild the metadata directory
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (28 preceding siblings ...)
  2024-12-23 21:54   ` [PATCH 29/41] xfs_repair: use libxfs_metafile_iget for quota/rt inodes Darrick J. Wong
@ 2024-12-23 21:54   ` Darrick J. Wong
  2024-12-23 21:54   ` [PATCH 31/41] xfs_repair: don't let metadata and regular files mix Darrick J. Wong
                     ` (10 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:54 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Check the dirents in metadata directories for problems and repair them
if necessary.  Also make sure that the sb-rooted inodes (root, metadir
root, rt bitmap, rt summary) are always allocated in that order.

Note that xfs_repair will always rebuild the metadata directory tree
itself, so we only need to report problems, not fix them.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
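Reviewer's note (not part of the commit): a minimal sketch of the
ordering rule described above, assuming the sb-rooted inodes occupy
consecutive inode numbers starting at the root directory, with the
metadir root slotted in second when the feature is enabled.  The
struct and the toy_* functions are hypothetical; they mirror the
ASSERTs in phase2() and the xfs_rootrec_inodes_inuse() helper added
below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the superblock fields involved. */
struct toy_sb {
	uint64_t	rootino;
	uint64_t	metadirino;
	uint64_t	rbmino;
	uint64_t	rsumino;
};

/* Number of inodes in the root chunk that mkfs always allocates. */
static unsigned int
toy_inodes_inuse(bool has_metadir)
{
	return has_metadir ? 4 : 3;
}

/* Check the order: root, [metadir root], rt bitmap, rt summary. */
static bool
toy_layout_ok(const struct toy_sb *sb, bool has_metadir)
{
	uint64_t	next = sb->rootino + 1;

	/* The metadir root only takes a slot when the feature is on. */
	if (has_metadir && sb->metadirino != next++)
		return false;
	if (sb->rbmino != next++)
		return false;
	return sb->rsumino == next;
}
```
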
 libxfs/libxfs_api_defs.h |    1 
 repair/dino_chunks.c     |   12 ++++++
 repair/dir2.c            |   16 +++++++-
 repair/globals.c         |    3 +
 repair/globals.h         |    3 +
 repair/incore.h          |   13 ++++++
 repair/phase1.c          |    2 +
 repair/phase2.c          |   53 +++++++++++++++++++-------
 repair/phase4.c          |   16 ++++++++
 repair/phase6.c          |   73 ++++++++++++++++++++++++++++++++++--
 repair/pptr.c            |   94 ++++++++++++++++++++++++++++++++++++++++++++++
 repair/pptr.h            |    2 +
 repair/sb.c              |    3 +
 repair/xfs_repair.c      |   50 ++++++++++++++++++++++++
 14 files changed, 320 insertions(+), 21 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index d2611d7a764259..e79aa0e06e4f90 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -214,6 +214,7 @@
 
 #define xfs_metafile_iget		libxfs_metafile_iget
 #define xfs_trans_metafile_iget		libxfs_trans_metafile_iget
+#define xfs_metafile_set_iflag		libxfs_metafile_set_iflag
 #define xfs_metadir_link		libxfs_metadir_link
 #define xfs_metadir_lookup		libxfs_metadir_lookup
 #define xfs_metadir_start_create	libxfs_metadir_start_create
diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index 49d57948c7eca8..0d9b3a01bc298d 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -932,6 +932,18 @@ process_inode_chunk(
 	_("would clear root inode %" PRIu64 "\n"),
 						ino);
 				}
+			} else if (mp->m_sb.sb_metadirino == ino) {
+				need_metadir_inode = true;
+
+				if (!no_modify)  {
+					do_warn(
+	_("cleared metadata directory %" PRIu64 "\n"),
+						ino);
+				} else  {
+					do_warn(
+	_("would clear metadata directory %" PRIu64 "\n"),
+						ino);
+				}
 			} else if (mp->m_sb.sb_rbmino == ino) {
 				need_rbmino = 1;
 
diff --git a/repair/dir2.c b/repair/dir2.c
index dab6523f676a34..d233c724488182 100644
--- a/repair/dir2.c
+++ b/repair/dir2.c
@@ -271,6 +271,9 @@ process_sf_dir2(
 		} else if (lino == mp->m_sb.sb_pquotino)  {
 			junkit = 1;
 			junkreason = _("project quota");
+		} else if (lino == mp->m_sb.sb_metadirino)  {
+			junkit = 1;
+			junkreason = _("metadata directory root");
 		} else if ((irec_p = find_inode_rec(mp,
 					XFS_INO_TO_AGNO(mp, lino),
 					XFS_INO_TO_AGINO(mp, lino))) != NULL) {
@@ -564,7 +567,8 @@ _("corrected root directory %" PRIu64 " .. entry, was %" PRIu64 ", now %" PRIu64
 _("would have corrected root directory %" PRIu64 " .. entry from %" PRIu64" to %" PRIu64 "\n"),
 				ino, *parent, ino);
 		}
-	} else if (ino == *parent && ino != mp->m_sb.sb_rootino)  {
+	} else if (ino == *parent && ino != mp->m_sb.sb_rootino &&
+		   ino != mp->m_sb.sb_metadirino)  {
 		/*
 		 * likewise, non-root directories can't have .. pointing
 		 * to .
@@ -743,6 +747,8 @@ process_dir2_data(
 			clearreason = _("group quota");
 		} else if (ent_ino == mp->m_sb.sb_pquotino) {
 			clearreason = _("project quota");
+		} else if (ent_ino == mp->m_sb.sb_metadirino)  {
+			clearreason = _("metadata directory root");
 		} else {
 			irec_p = find_inode_rec(mp,
 						XFS_INO_TO_AGNO(mp, ent_ino),
@@ -864,7 +870,8 @@ _("entry at block %u offset %" PRIdPTR " in directory inode %" PRIu64 " has ille
 				 * NULLFSINO otherwise.
 				 */
 				if (ino == ent_ino &&
-						ino != mp->m_sb.sb_rootino) {
+				    ino != mp->m_sb.sb_rootino &&
+				    ino != mp->m_sb.sb_metadirino) {
 					*parent = NULLFSINO;
 					do_warn(
 _("bad .. entry in directory inode %" PRIu64 ", points to self: "),
@@ -1519,9 +1526,14 @@ process_dir2(
 	} else if (dotdot == 0 && ino == mp->m_sb.sb_rootino) {
 		do_warn(_("no .. entry for root directory %" PRIu64 "\n"), ino);
 		need_root_dotdot = 1;
+	} else if (dotdot == 0 && ino == mp->m_sb.sb_metadirino) {
+		do_warn(_("no .. entry for metaino directory %" PRIu64 "\n"), ino);
+		need_metadir_dotdot = 1;
 	}
 
 	ASSERT((ino != mp->m_sb.sb_rootino && ino != *parent) ||
+		(ino == mp->m_sb.sb_metadirino &&
+			(ino == *parent || need_metadir_dotdot == 1)) ||
 		(ino == mp->m_sb.sb_rootino &&
 			(ino == *parent || need_root_dotdot == 1)));
 
diff --git a/repair/globals.c b/repair/globals.c
index 7388090a7d39f3..b63931be9fdb70 100644
--- a/repair/globals.c
+++ b/repair/globals.c
@@ -66,6 +66,9 @@ int	fs_is_dirty;
 int	need_root_inode;
 int	need_root_dotdot;
 
+bool	need_metadir_inode;
+int	need_metadir_dotdot;
+
 int	need_rbmino;
 int	need_rsumino;
 
diff --git a/repair/globals.h b/repair/globals.h
index fa53502f98bbcd..1dc85ce7f8114c 100644
--- a/repair/globals.h
+++ b/repair/globals.h
@@ -107,6 +107,9 @@ extern int		fs_is_dirty;
 extern int		need_root_inode;
 extern int		need_root_dotdot;
 
+extern bool		need_metadir_inode;
+extern int		need_metadir_dotdot;
+
 extern int		need_rbmino;
 extern int		need_rsumino;
 
diff --git a/repair/incore.h b/repair/incore.h
index 9ad5f1972d3dee..4f32ad3377faed 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -661,4 +661,17 @@ inorec_set_freecount(
 		rp->ir_u.f.ir_freecount = cpu_to_be32(freecount);
 }
 
+/*
+ * Number of inodes assumed to be always allocated because they are created
+ * by mkfs.
+ */
+static inline unsigned int
+xfs_rootrec_inodes_inuse(
+	struct xfs_mount	*mp)
+{
+	if (xfs_has_metadir(mp))
+		return 4; /* sb_rootino, sb_rbmino, sb_rsumino, sb_metadirino */
+	return 3; /* sb_rootino, sb_rbmino, sb_rsumino */
+}
+
 #endif /* XFS_REPAIR_INCORE_H */
diff --git a/repair/phase1.c b/repair/phase1.c
index 00b98584eed429..40e7f164c55158 100644
--- a/repair/phase1.c
+++ b/repair/phase1.c
@@ -48,6 +48,8 @@ phase1(xfs_mount_t *mp)
 	primary_sb_modified = 0;
 	need_root_inode = 0;
 	need_root_dotdot = 0;
+	need_metadir_inode = false;
+	need_metadir_dotdot = 0;
 	need_rbmino = 0;
 	need_rsumino = 0;
 	lost_quotas = 0;
diff --git a/repair/phase2.c b/repair/phase2.c
index 17966bb54db09d..17c16e94a600c2 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -496,8 +496,8 @@ phase2(
 	struct xfs_mount	*mp,
 	int			scan_threads)
 {
-	int			j;
 	ino_tree_node_t		*ino_rec;
+	unsigned int		inuse = xfs_rootrec_inodes_inuse(mp), j;
 
 	/* now we can start using the buffer cache routines */
 	set_mp(mp);
@@ -541,58 +541,81 @@ phase2(
 	 * make sure we know about the root inode chunk
 	 */
 	if ((ino_rec = find_inode_rec(mp, 0, mp->m_sb.sb_rootino)) == NULL)  {
-		ASSERT(mp->m_sb.sb_rbmino == mp->m_sb.sb_rootino + 1 &&
-			mp->m_sb.sb_rsumino == mp->m_sb.sb_rootino + 2);
+		struct xfs_sb	*sb = &mp->m_sb;
+
+		if (xfs_has_metadir(mp))
+			ASSERT(sb->sb_metadirino == sb->sb_rootino + 1 &&
+			       sb->sb_rbmino  == sb->sb_rootino + 2 &&
+			       sb->sb_rsumino == sb->sb_rootino + 3);
+		else
+			ASSERT(sb->sb_rbmino  == sb->sb_rootino + 1 &&
+			       sb->sb_rsumino == sb->sb_rootino + 2);
 		do_warn(_("root inode chunk not found\n"));
 
 		/*
-		 * mark the first 3 used, the rest are free
+		 * mark the first 3-4 inodes used, the rest are free
 		 */
 		ino_rec = set_inode_used_alloc(mp, 0,
-				(xfs_agino_t) mp->m_sb.sb_rootino);
-		set_inode_used(ino_rec, 1);
-		set_inode_used(ino_rec, 2);
+				XFS_INO_TO_AGINO(mp, sb->sb_rootino));
+		for (j = 1; j < inuse; j++)
+			set_inode_used(ino_rec, j);
 
-		for (j = 3; j < XFS_INODES_PER_CHUNK; j++)
+		for (j = inuse; j < XFS_INODES_PER_CHUNK; j++)
 			set_inode_free(ino_rec, j);
 
 		/*
 		 * also mark blocks
 		 */
-		set_bmap_ext(0, XFS_INO_TO_AGBNO(mp, mp->m_sb.sb_rootino),
+		set_bmap_ext(0, XFS_INO_TO_AGBNO(mp, sb->sb_rootino),
 			     M_IGEO(mp)->ialloc_blks, XR_E_INO);
 	} else  {
 		do_log(_("        - found root inode chunk\n"));
+		j = 0;
 
 		/*
 		 * blocks are marked, just make sure they're in use
 		 */
-		if (is_inode_free(ino_rec, 0))  {
+		if (is_inode_free(ino_rec, j)) {
 			do_warn(_("root inode marked free, "));
-			set_inode_used(ino_rec, 0);
+			set_inode_used(ino_rec, j);
 			if (!no_modify)
 				do_warn(_("correcting\n"));
 			else
 				do_warn(_("would correct\n"));
 		}
+		j++;
 
-		if (is_inode_free(ino_rec, 1))  {
+		if (xfs_has_metadir(mp)) {
+			if (is_inode_free(ino_rec, j))  {
+				do_warn(_("metadata root inode marked free, "));
+				set_inode_used(ino_rec, j);
+				if (!no_modify)
+					do_warn(_("correcting\n"));
+				else
+					do_warn(_("would correct\n"));
+			}
+			j++;
+		}
+
+		if (is_inode_free(ino_rec, j))  {
 			do_warn(_("realtime bitmap inode marked free, "));
-			set_inode_used(ino_rec, 1);
+			set_inode_used(ino_rec, j);
 			if (!no_modify)
 				do_warn(_("correcting\n"));
 			else
 				do_warn(_("would correct\n"));
 		}
+		j++;
 
-		if (is_inode_free(ino_rec, 2))  {
+		if (is_inode_free(ino_rec, j))  {
 			do_warn(_("realtime summary inode marked free, "));
-			set_inode_used(ino_rec, 2);
+			set_inode_used(ino_rec, j);
 			if (!no_modify)
 				do_warn(_("correcting\n"));
 			else
 				do_warn(_("would correct\n"));
 		}
+		j++;
 	}
 
 	/*
diff --git a/repair/phase4.c b/repair/phase4.c
index 071f20ed736e4b..7efef86245fbe7 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -264,6 +264,22 @@ phase4(xfs_mount_t *mp)
 			do_warn(_("root inode lost\n"));
 	}
 
+	/*
+	 * If metadata directory trees are enabled, the metadata root directory
+	 * always comes immediately after the regular root directory, even if
+	 * it's free.
+	 */
+	if (xfs_has_metadir(mp) &&
+	    (is_inode_free(irec, 1) || !inode_isadir(irec, 1))) {
+		need_metadir_inode = true;
+		if (no_modify)
+			do_warn(
+	_("metadata directory root inode would be lost\n"));
+		else
+			do_warn(
+	_("metadata directory root inode lost\n"));
+	}
+
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)  {
 		ag_end = (i < mp->m_sb.sb_agcount - 1) ? mp->m_sb.sb_agblocks :
 			mp->m_sb.sb_dblocks -
diff --git a/repair/phase6.c b/repair/phase6.c
index 82e1687f9b278d..7a8edbad2ebfc2 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -478,12 +478,25 @@ reset_sbroot_ino(
 static int
 ensure_rtino(
 	struct xfs_trans		*tp,
-	xfs_ino_t			ino,
+	enum xfs_metafile_type		metafile_type,
 	struct xfs_inode		**ipp)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
+	xfs_ino_t			ino;
 	int				error;
 
+	switch (metafile_type) {
+	case XFS_METAFILE_RTBITMAP:
+		ino = mp->m_sb.sb_rbmino;
+		break;
+	case XFS_METAFILE_RTSUMMARY:
+		ino = mp->m_sb.sb_rsumino;
+		break;
+	default:
+		ASSERT(0);
+		return -EFSCORRUPTED;
+	}
+
 	/*
 	 * Don't use metafile iget here because we're resetting sb-rooted
 	 * inodes that live at fixed inumbers, but these inodes could be in
@@ -494,6 +507,8 @@ ensure_rtino(
 		return error;
 
 	reset_sbroot_ino(tp, S_IFREG, *ipp);
+	if (xfs_has_metadir(mp))
+		libxfs_metafile_set_iflag(tp, *ipp, metafile_type);
 	return 0;
 }
 
@@ -510,7 +525,7 @@ mk_rbmino(
 		res_failed(error);
 
 	/* Reset the realtime bitmap inode. */
-	error = ensure_rtino(tp, mp->m_sb.sb_rbmino, &ip);
+	error = ensure_rtino(tp, XFS_METAFILE_RTBITMAP, &ip);
 	if (error) {
 		do_error(
 		_("couldn't iget realtime bitmap inode -- error - %d\n"),
@@ -584,7 +599,7 @@ mk_rsumino(
 		res_failed(error);
 
 	/* Reset the rt summary inode. */
-	error = ensure_rtino(tp, mp->m_sb.sb_rsumino, &ip);
+	error = ensure_rtino(tp, XFS_METAFILE_RTSUMMARY, &ip);
 	if (error) {
 		do_error(
 		_("couldn't iget realtime summary inode -- error - %d\n"),
@@ -655,6 +670,36 @@ mk_root_dir(xfs_mount_t *mp)
 	libxfs_irele(ip);
 }
 
+/* Create a new metadata directory root. */
+static void
+mk_metadir(
+	struct xfs_mount	*mp)
+{
+	struct xfs_trans	*tp;
+	int			error;
+
+	error = init_fs_root_dir(mp, mp->m_sb.sb_metadirino, 0,
+			&mp->m_metadirip);
+	if (error)
+		do_error(
+	_("Initialization of the metadata root directory failed, error %d\n"),
+			error);
+
+	/* Mark the new metadata root dir as metadata. */
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
+	if (error)
+		do_error(
+	_("Marking metadata root directory failed, error %d\n"), error);
+
+	libxfs_trans_ijoin(tp, mp->m_metadirip, 0);
+	libxfs_metafile_set_iflag(tp, mp->m_metadirip, XFS_METAFILE_DIR);
+
+	error = -libxfs_trans_commit(tp);
+	if (error)
+		do_error(
+	_("Marking metadata root directory failed, error %d\n"), error);
+}
+
 /*
  * orphanage name == lost+found
  */
@@ -1168,6 +1213,8 @@ longform_dir2_rebuild(
 
 	if (ino == mp->m_sb.sb_rootino)
 		need_root_dotdot = 0;
+	else if (ino == mp->m_sb.sb_metadirino)
+		need_metadir_dotdot = 0;
 
 	/* go through the hash list and re-add the inodes */
 
@@ -2789,7 +2836,7 @@ process_dir_inode(
 
 	need_dot = dirty = num_illegal = 0;
 
-	if (mp->m_sb.sb_rootino == ino)  {
+	if (mp->m_sb.sb_rootino == ino || mp->m_sb.sb_metadirino == ino) {
 		/*
 		 * mark root inode reached and bump up
 		 * link count for root inode to account
@@ -2864,6 +2911,9 @@ _("error %d fixing shortform directory %llu\n"),
 	dir_hash_done(hashtab);
 
 	fix_dotdot(mp, ino, ip, mp->m_sb.sb_rootino, "root", &need_root_dotdot);
+	if (xfs_has_metadir(mp))
+		fix_dotdot(mp, ino, ip, mp->m_sb.sb_metadirino, "metadata",
+				&need_metadir_dotdot);
 
 	/*
 	 * if we need to create the '.' entry, do so only if
@@ -3117,6 +3167,21 @@ phase6(xfs_mount_t *mp)
 		}
 	}
 
+	if (!no_modify && xfs_has_metadir(mp)) {
+		/*
+		 * In write mode, we always rebuild the metadata directory
+		 * tree, even if the old one was correct.  However, we still
+		 * want to log something if we couldn't find the old root.
+		 */
+		if (need_metadir_inode)
+			do_warn(_("reinitializing metadata root directory\n"));
+		mk_metadir(mp);
+		need_metadir_inode = false;
+		need_metadir_dotdot = 0;
+	} else if (need_metadir_inode) {
+		do_warn(_("would reinitialize metadata root directory\n"));
+	}
+
 	if (need_rbmino)  {
 		if (!no_modify)  {
 			do_warn(_("reinitializing realtime bitmap inode\n"));
diff --git a/repair/pptr.c b/repair/pptr.c
index ee29e47a87bd07..ac0a9c618bc87d 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -1334,3 +1334,97 @@ check_parent_ptrs(
 
 	destroy_work_queue(&wq);
 }
+
+static int
+erase_pptrs(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	struct garbage_xattr	garbage_xattr = {
+		.attr_filter	= attr_flags,
+		.attrnamelen	= namelen,
+		.attrvaluelen	= valuelen,
+	};
+	struct file_scan	*fscan = priv;
+	int			error;
+
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return 0;
+
+	error = -xfblob_store(fscan->garbage_xattr_names,
+			&garbage_xattr.attrname_cookie, name, namelen);
+	if (error)
+		do_error(_("storing ino %llu garbage pptr failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+
+	error = -xfblob_store(fscan->garbage_xattr_names,
+			&garbage_xattr.attrvalue_cookie, value, valuelen);
+	if (error)
+		do_error(_("storing ino %llu garbage pptr failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+
+	error = -slab_add(fscan->garbage_xattr_recs, &garbage_xattr);
+	if (error)
+		do_error(_("storing ino %llu garbage pptr rec failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+
+	return 0;
+}
+
+/* Delete all of this file's parent pointers if we can. */
+void
+try_erase_parent_ptrs(
+	struct xfs_inode	*ip)
+{
+	struct file_scan	fscan = {
+		.have_garbage	= true,
+	};
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp = NULL;
+	char			*descr;
+	int			error;
+
+	if (!xfs_has_parent(ip->i_mount))
+		return;
+
+	if (no_modify) {
+		do_warn(
+ _("would delete garbage parent pointers in metadata ino %llu\n"),
+				(unsigned long long)ip->i_ino);
+		return;
+	}
+
+	error = -init_slab(&fscan.garbage_xattr_recs,
+			sizeof(struct garbage_xattr));
+	if (error)
+		do_error(_("init garbage pptr recs failed: %s\n"),
+				strerror(error));
+
+	descr = kasprintf(GFP_KERNEL, "xfs_repair (%s): garbage pptr names",
+			mp->m_fsname);
+	error = -xfblob_create(descr, &fscan.garbage_xattr_names);
+	kfree(descr);
+	if (error)
+		do_error(_("init garbage pptr names failed: %s\n"),
+				strerror(error));
+
+	libxfs_trans_alloc_empty(ip->i_mount, &tp);
+	error = xattr_walk(tp, ip, erase_pptrs, &fscan);
+	if (tp)
+		libxfs_trans_cancel(tp);
+	if (error)
+		do_warn(_("ino %llu garbage pptr collection failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+
+	remove_garbage_xattrs(ip, &fscan);
+}
diff --git a/repair/pptr.h b/repair/pptr.h
index 65acff963a3fe9..38d5c4052ea86c 100644
--- a/repair/pptr.h
+++ b/repair/pptr.h
@@ -14,4 +14,6 @@ void add_parent_ptr(xfs_ino_t ino, const unsigned char *fname,
 
 void check_parent_ptrs(struct xfs_mount *mp);
 
+void try_erase_parent_ptrs(struct xfs_inode *ip);
+
 #endif /* __REPAIR_PPTR_H__ */
diff --git a/repair/sb.c b/repair/sb.c
index 1320929caee590..7f27833d697ea9 100644
--- a/repair/sb.c
+++ b/repair/sb.c
@@ -28,6 +28,7 @@ copy_sb(xfs_sb_t *source, xfs_sb_t *dest)
 	xfs_ino_t	uquotino;
 	xfs_ino_t	gquotino;
 	xfs_ino_t	pquotino;
+	xfs_ino_t	metadirino;
 	uint16_t	versionnum;
 
 	rootino = dest->sb_rootino;
@@ -36,6 +37,7 @@ copy_sb(xfs_sb_t *source, xfs_sb_t *dest)
 	uquotino = dest->sb_uquotino;
 	gquotino = dest->sb_gquotino;
 	pquotino = dest->sb_pquotino;
+	metadirino = dest->sb_metadirino;
 
 	versionnum = dest->sb_versionnum;
 
@@ -47,6 +49,7 @@ copy_sb(xfs_sb_t *source, xfs_sb_t *dest)
 	dest->sb_uquotino = uquotino;
 	dest->sb_gquotino = gquotino;
 	dest->sb_pquotino = pquotino;
+	dest->sb_metadirino = metadirino;
 
 	dest->sb_versionnum = versionnum;
 
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 3ade85bbcbb7fd..70cab1ad852a21 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -644,6 +644,46 @@ guess_correct_sunit(
 		do_warn(_("Would reset sb_width to %u\n"), new_sunit);
 }
 
+/*
+ * Check that the metadata directory inode comes immediately after the root
+ * directory inode and that it seems to look like a metadata directory.
+ */
+STATIC void
+check_metadir_inode(
+	struct xfs_mount	*mp,
+	xfs_ino_t		rootino)
+{
+	int			error;
+
+	validate_sb_ino(&mp->m_sb.sb_metadirino, rootino + 1,
+			_("metadata root directory"));
+
+	/* If we changed the metadir inode, try reloading it. */
+	if (!mp->m_metadirip ||
+	    mp->m_metadirip->i_ino != mp->m_sb.sb_metadirino) {
+		if (mp->m_metadirip)
+			libxfs_irele(mp->m_metadirip);
+
+		error = -libxfs_metafile_iget(mp, mp->m_sb.sb_metadirino,
+				XFS_METAFILE_DIR, &mp->m_metadirip);
+		if (error) {
+			need_metadir_inode = true;
+			goto done;
+		}
+	}
+
+done:
+	if (need_metadir_inode) {
+		if (!no_modify)
+			do_warn(_("will reset metadata root directory\n"));
+		else
+			do_warn(_("would reset metadata root directory\n"));
+		if (mp->m_metadirip)
+			libxfs_irele(mp->m_metadirip);
+		mp->m_metadirip = NULL;
+	}
+}
+
 /*
  * Make sure that the first 3 inodes in the filesystem are the root directory,
  * the realtime bitmap, and the realtime summary, in that order.
@@ -673,6 +713,16 @@ _("sb root inode value %" PRIu64 " valid but in unaligned location (expected %"P
 
 	validate_sb_ino(&mp->m_sb.sb_rootino, rootino,
 			_("root"));
+
+	if (xfs_has_metadir(mp)) {
+		/*
+		 * The metadata root directory comes after the regular root
+		 * directory.
+		 */
+		check_metadir_inode(mp, rootino);
+		rootino++;
+	}
+
 	validate_sb_ino(&mp->m_sb.sb_rbmino, rootino + 1,
 			_("realtime bitmap"));
 	validate_sb_ino(&mp->m_sb.sb_rsumino, rootino + 2,



* [PATCH 31/41] xfs_repair: don't let metadata and regular files mix
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (29 preceding siblings ...)
  2024-12-23 21:54   ` [PATCH 30/41] xfs_repair: rebuild the metadata directory Darrick J. Wong
@ 2024-12-23 21:54   ` Darrick J. Wong
  2024-12-23 21:54   ` [PATCH 32/41] xfs_repair: update incore metadata state whenever we create new files Darrick J. Wong
                     ` (9 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:54 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Track whether or not inodes thought they were metadata inodes.  We
cannot allow metadata inodes to appear in the regular directory tree,
and we cannot allow regular inodes to appear in the metadata directory
tree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/dinode.c     |   21 +++++++++++++++++
 repair/incore.h     |   19 +++++++++++++++
 repair/incore_ino.c |    1 +
 repair/phase2.c     |    7 +++++-
 repair/phase6.c     |   63 +++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 110 insertions(+), 1 deletion(-)
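
For anyone following along, the bookkeeping described in the commit message can be sketched roughly as below.  This is a standalone illustration under stated assumptions, not the patch itself: struct chunk_rec, CHUNK_MASK and entry_crosses_trees() are hypothetical stand-ins for struct ino_tree_node, IREC_MASK and the checks added to the phase 6 directory scanners, and the sketch assumes 64 inodes per chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for repair's per-chunk incore inode record. */
struct chunk_rec {
	uint64_t	is_meta;	/* bit == 1 if inode claimed to be metadata */
};

/* One bit per inode in a 64-inode chunk. */
#define CHUNK_MASK(off)	(UINT64_C(1) << (off))

static inline void set_is_meta(struct chunk_rec *rec, int off)
{
	assert(off >= 0 && off < 64);
	rec->is_meta |= CHUNK_MASK(off);
}

static inline void clear_is_meta(struct chunk_rec *rec, int off)
{
	assert(off >= 0 && off < 64);
	rec->is_meta &= ~CHUNK_MASK(off);
}

static inline bool is_meta(const struct chunk_rec *rec, int off)
{
	assert(off >= 0 && off < 64);
	return (rec->is_meta & CHUNK_MASK(off)) != 0;
}

/*
 * A directory entry has to be junked when the directory and the child
 * disagree about which tree (regular or metadata) they belong to.
 */
static inline bool entry_crosses_trees(bool dir_is_meta, bool child_is_meta)
{
	return dir_is_meta != child_is_meta;
}
```

In the patch the two directions of the mismatch are tested separately so that each can print its own warning; the single comparison here only shows that both cases reduce to the same rule.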


diff --git a/repair/dinode.c b/repair/dinode.c
index 5dd7edfa36aed0..60a853d152ccec 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -2392,6 +2392,7 @@ process_dinode_int(
 	struct xfs_dinode	*dino = *dinop;
 	xfs_agino_t		unlinked_ino;
 	struct xfs_perag	*pag;
+	bool			is_meta = false;
 
 	*dirty = *isa_dir = 0;
 	*used = is_used;
@@ -2971,6 +2972,18 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "),
 	if (collect_rmaps)
 		record_inode_reflink_flag(mp, dino, agno, ino, lino);
 
+	/* Does this inode think it was metadata? */
+	if (dino->di_version >= 3 &&
+	    (dino->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) {
+		struct ino_tree_node	*irec;
+		int			off;
+
+		irec = find_inode_rec(mp, agno, ino);
+		off = get_inode_offset(mp, lino, irec);
+		set_inode_is_meta(irec, off);
+		is_meta = true;
+	}
+
 	/*
 	 * check data fork -- if it's bad, clear the inode
 	 */
@@ -3057,6 +3070,14 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "),
 	*used = is_free;
 	*isa_dir = 0;
 	blkmap_free(dblkmap);
+	if (is_meta) {
+		struct ino_tree_node	*irec;
+		int			off;
+
+		irec = find_inode_rec(mp, agno, ino);
+		off = get_inode_offset(mp, lino, irec);
+		clear_inode_is_meta(irec, off);
+	}
 	return 1;
 }
 
diff --git a/repair/incore.h b/repair/incore.h
index 4f32ad3377faed..568a8c7cb75b7c 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -271,6 +271,7 @@ typedef struct ino_tree_node  {
 	uint64_t		ino_isa_dir;	/* bit == 1 if a directory */
 	uint64_t		ino_was_rl;	/* bit == 1 if reflink flag set */
 	uint64_t		ino_is_rl;	/* bit == 1 if reflink flag should be set */
+	uint64_t		ino_is_meta;	/* bit == 1 if metadata */
 	uint8_t			nlink_size;
 	union ino_nlink		disk_nlinks;	/* on-disk nlinks, set in P3 */
 	union  {
@@ -538,6 +539,24 @@ static inline int inode_is_rl(struct ino_tree_node *irec, int offset)
 	return (irec->ino_is_rl & IREC_MASK(offset)) != 0;
 }
 
+/*
+ * set/clear/test whether the inode was marked as metadata
+ */
+static inline void set_inode_is_meta(struct ino_tree_node *irec, int offset)
+{
+	irec->ino_is_meta |= IREC_MASK(offset);
+}
+
+static inline void clear_inode_is_meta(struct ino_tree_node *irec, int offset)
+{
+	irec->ino_is_meta &= ~IREC_MASK(offset);
+}
+
+static inline int inode_is_meta(struct ino_tree_node *irec, int offset)
+{
+	return (irec->ino_is_meta & IREC_MASK(offset)) != 0;
+}
+
 /*
  * add_inode_reached() is set on inode I only if I has been reached
  * by an inode P claiming to be the parent and if I is a directory,
diff --git a/repair/incore_ino.c b/repair/incore_ino.c
index 158e9b4980d984..3189e019faa23d 100644
--- a/repair/incore_ino.c
+++ b/repair/incore_ino.c
@@ -257,6 +257,7 @@ alloc_ino_node(
 	irec->ino_isa_dir = 0;
 	irec->ino_was_rl = 0;
 	irec->ino_is_rl = 0;
+	irec->ino_is_meta = 0;
 	irec->ir_free = (xfs_inofree_t) - 1;
 	irec->ir_sparse = 0;
 	irec->ino_un.ex_data = NULL;
diff --git a/repair/phase2.c b/repair/phase2.c
index 17c16e94a600c2..476a1c74db8c8d 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -557,8 +557,10 @@ phase2(
 		 */
 		ino_rec = set_inode_used_alloc(mp, 0,
 				XFS_INO_TO_AGINO(mp, sb->sb_rootino));
-		for (j = 1; j < inuse; j++)
+		for (j = 1; j < inuse; j++) {
 			set_inode_used(ino_rec, j);
+			set_inode_is_meta(ino_rec, j);
+		}
 
 		for (j = inuse; j < XFS_INODES_PER_CHUNK; j++)
 			set_inode_free(ino_rec, j);
@@ -594,6 +596,7 @@ phase2(
 				else
 					do_warn(_("would correct\n"));
 			}
+			set_inode_is_meta(ino_rec, j);
 			j++;
 		}
 
@@ -605,6 +608,7 @@ phase2(
 			else
 				do_warn(_("would correct\n"));
 		}
+		set_inode_is_meta(ino_rec, j);
 		j++;
 
 		if (is_inode_free(ino_rec, j))  {
@@ -615,6 +619,7 @@ phase2(
 			else
 				do_warn(_("would correct\n"));
 		}
+		set_inode_is_meta(ino_rec, j);
 		j++;
 	}
 
diff --git a/repair/phase6.c b/repair/phase6.c
index 7a8edbad2ebfc2..688eee20bb3e8e 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1610,6 +1610,38 @@ longform_dir2_entry_check_data(
 			continue;
 		}
 
+		/*
+		 * Regular directories cannot point to metadata files.  If
+		 * we find such a thing, blow out the entry.
+		 */
+		if (!xfs_is_metadir_inode(ip) &&
+		    inode_is_meta(irec, ino_offset)) {
+			nbad++;
+			if (entry_junked(
+	_("entry \"%s\" in regular dir %" PRIu64" points to a metadata inode %" PRIu64 ", "),
+					fname, ip->i_ino, inum, NULLFSINO)) {
+				dep->name[0] = '/';
+				libxfs_dir2_data_log_entry(&da, bp, dep);
+			}
+			continue;
+		}
+
+		/*
+		 * Metadata directories cannot point to regular files.  If
+		 * we find such a thing, blow out the entry.
+		 */
+		if (xfs_is_metadir_inode(ip) &&
+		    !inode_is_meta(irec, ino_offset)) {
+			nbad++;
+			if (entry_junked(
+	_("entry \"%s\" in metadata dir %" PRIu64" points to a regular inode %" PRIu64 ", "),
+					fname, ip->i_ino, inum, NULLFSINO)) {
+				dep->name[0] = '/';
+				libxfs_dir2_data_log_entry(&da, bp, dep);
+			}
+			continue;
+		}
+
 		/*
 		 * check if this inode is lost+found dir in the root
 		 */
@@ -2521,6 +2553,37 @@ shortform_dir2_entry_check(
 						ino_dirty);
 			continue;
 		}
+
+		/*
+		 * Regular directories cannot point to metadata files.  If
+		 * we find such a thing, blow out the entry.
+		 */
+		if (!xfs_is_metadir_inode(ip) &&
+		    inode_is_meta(irec, ino_offset)) {
+			do_warn(
+	_("entry \"%s\" in regular dir %" PRIu64" points to a metadata inode %" PRIu64 ", "),
+					fname, ip->i_ino, lino);
+			next_sfep = shortform_dir2_junk(mp, sfp, sfep, lino,
+						&max_size, &i, &bytes_deleted,
+						ino_dirty);
+			continue;
+		}
+
+		/*
+		 * Metadata directories cannot point to regular files.  If
+		 * we find such a thing, blow out the entry.
+		 */
+		if (xfs_is_metadir_inode(ip) &&
+		    !inode_is_meta(irec, ino_offset)) {
+			do_warn(
+	_("entry \"%s\" in metadata dir %" PRIu64" points to a regular inode %" PRIu64 ", "),
+					fname, ip->i_ino, lino);
+			next_sfep = shortform_dir2_junk(mp, sfp, sfep, lino,
+						&max_size, &i, &bytes_deleted,
+						ino_dirty);
+			continue;
+		}
+
 		/*
 		 * check if this inode is lost+found dir in the root
 		 */



* [PATCH 32/41] xfs_repair: update incore metadata state whenever we create new files
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (30 preceding siblings ...)
  2024-12-23 21:54   ` [PATCH 31/41] xfs_repair: don't let metadata and regular files mix Darrick J. Wong
@ 2024-12-23 21:54   ` Darrick J. Wong
  2024-12-23 21:55   ` [PATCH 33/41] xfs_repair: pass private data pointer to scan_lbtree Darrick J. Wong
                     ` (8 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:54 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make sure that we update our incore metadata inode bookkeeping whenever
we create new metadata files.  There will be many more of these later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/phase6.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)
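
As a rough illustration of what the new helper has to do, the sketch below splits an inode number into its AG number and AG-relative inode number and then sets the matching bit in a per-chunk bitmap.  Everything here is an assumption made for the example: struct geom, struct chunk_rec and mark_metadata() are hypothetical, the record lookup is left to the caller, and only the "AG number in the high bits, AG inode in the low agino_log bits" layout is taken from XFS itself.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64

/* Hypothetical geometry: number of low bits holding the AG-relative inode. */
struct geom {
	unsigned int	agino_log;
};

/* Hypothetical incore record covering one 64-inode chunk. */
struct chunk_rec {
	uint32_t	start_agino;	/* first AG-relative inode in the chunk */
	uint64_t	is_meta;	/* bit == 1 if inode is metadata */
};

static uint32_t ino_to_agno(const struct geom *g, uint64_t ino)
{
	return (uint32_t)(ino >> g->agino_log);
}

static uint32_t ino_to_agino(const struct geom *g, uint64_t ino)
{
	return (uint32_t)(ino & ((UINT64_C(1) << g->agino_log) - 1));
}

/* Mark @ino as metadata in @rec, which must already cover that inode. */
static void mark_metadata(const struct geom *g, struct chunk_rec *rec,
			  uint64_t ino)
{
	uint32_t	agino = ino_to_agino(g, ino);

	assert(agino >= rec->start_agino);
	assert(agino - rec->start_agino < INODES_PER_CHUNK);
	rec->is_meta |= UINT64_C(1) << (agino - rec->start_agino);
}
```

The asserts mirror the precondition stated in the patch comment: the caller must already have an incore record for the inode before marking it.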


diff --git a/repair/phase6.c b/repair/phase6.c
index 688eee20bb3e8e..8fa2c3c8bf0419 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -474,6 +474,22 @@ reset_sbroot_ino(
 	libxfs_inode_init(tp, &args, ip);
 }
 
+/*
+ * Mark a newly allocated inode as metadata in the incore bitmap.  Callers
+ * must have already called mark_ino_inuse to ensure there is an incore record.
+ */
+static void
+mark_ino_metadata(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino)
+{
+	struct ino_tree_node	*irec =
+		find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ino),
+				   XFS_INO_TO_AGINO(mp, ino));
+
+	set_inode_is_meta(irec, get_inode_offset(mp, ino, irec));
+}
+
 /* Load a realtime freespace metadata inode from disk and reset it. */
 static int
 ensure_rtino(
@@ -693,6 +709,7 @@ mk_metadir(
 
 	libxfs_trans_ijoin(tp, mp->m_metadirip, 0);
 	libxfs_metafile_set_iflag(tp, mp->m_metadirip, XFS_METAFILE_DIR);
+	mark_ino_metadata(mp, mp->m_metadirip->i_ino);
 
 	error = -libxfs_trans_commit(tp);
 	if (error)



* [PATCH 33/41] xfs_repair: pass private data pointer to scan_lbtree
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (31 preceding siblings ...)
  2024-12-23 21:54   ` [PATCH 32/41] xfs_repair: update incore metadata state whenever we create new files Darrick J. Wong
@ 2024-12-23 21:55   ` Darrick J. Wong
  2024-12-23 21:55   ` [PATCH 34/41] xfs_repair: mark space used by metadata files Darrick J. Wong
                     ` (7 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:55 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Pass a private data pointer through scan_lbtree.  We'll use this
later when scanning the rtrmapbt to keep track of scan state.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/dinode.c |    2 +-
 repair/scan.c   |   11 +++++++----
 repair/scan.h   |    7 +++++--
 3 files changed, 13 insertions(+), 7 deletions(-)
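
The pattern being introduced is the usual "opaque pointer handed through to the callback" idiom.  A minimal sketch, assuming a toy binary tree rather than the real bmap btree: struct node, scan_tree() and count_node() are made up for the example and do not correspond to anything in repair/scan.c.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int		value;
	struct node	*left;
	struct node	*right;
};

typedef int (*scan_fn)(const struct node *n, void *priv);

/* Visit every node, handing the caller's private pointer to the callback. */
static int scan_tree(const struct node *n, scan_fn fn, void *priv)
{
	int	err;

	assert(fn != NULL);
	if (!n)
		return 0;

	err = fn(n, priv);
	if (err)
		return err;
	err = scan_tree(n->left, fn, priv);
	if (err)
		return err;
	return scan_tree(n->right, fn, priv);
}

/* Scan state owned by the caller; the walker never looks inside it. */
struct scan_state {
	int	nodes;
	long	sum;
};

static int count_node(const struct node *n, void *priv)
{
	struct scan_state	*st = priv;

	st->nodes++;
	st->sum += n->value;
	return 0;
}
```

Callers that have no state to track simply pass NULL, which is what the existing bmbt scan does in this patch.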


diff --git a/repair/dinode.c b/repair/dinode.c
index 60a853d152ccec..4ca45ad4aab1fa 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -839,7 +839,7 @@ _("bad bmap btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"),
 
 		if (scan_lbtree(get_unaligned_be64(&pp[i]), level, scan_bmapbt,
 				type, whichfork, lino, tot, nex, blkmapp,
-				&cursor, 1, check_dups, magic,
+				&cursor, 1, check_dups, magic, NULL,
 				&xfs_bmbt_buf_ops))
 			return(1);
 		/*
diff --git a/repair/scan.c b/repair/scan.c
index b115dd4948b969..f6d46a2861b312 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -139,7 +139,8 @@ scan_lbtree(
 				int			isroot,
 				int			check_dups,
 				int			*dirty,
-				uint64_t		magic),
+				uint64_t		magic,
+				void			*priv),
 	int		type,
 	int		whichfork,
 	xfs_ino_t	ino,
@@ -150,6 +151,7 @@ scan_lbtree(
 	int		isroot,
 	int		check_dups,
 	uint64_t	magic,
+	void		*priv,
 	const struct xfs_buf_ops *ops)
 {
 	struct xfs_buf	*bp;
@@ -181,7 +183,7 @@ scan_lbtree(
 	err = (*func)(XFS_BUF_TO_BLOCK(bp), nlevels - 1,
 			type, whichfork, root, ino, tot, nex, blkmapp,
 			bm_cursor, isroot, check_dups, &dirty,
-			magic);
+			magic, priv);
 
 	ASSERT(dirty == 0 || (dirty && !no_modify));
 
@@ -210,7 +212,8 @@ scan_bmapbt(
 	int			isroot,
 	int			check_dups,
 	int			*dirty,
-	uint64_t		magic)
+	uint64_t		magic,
+	void			*priv)
 {
 	int			i;
 	int			err;
@@ -486,7 +489,7 @@ _("bad bmap btree ptr 0x%llx in ino %" PRIu64 "\n"),
 
 		err = scan_lbtree(be64_to_cpu(pp[i]), level, scan_bmapbt,
 				type, whichfork, ino, tot, nex, blkmapp,
-				bm_cursor, 0, check_dups, magic,
+				bm_cursor, 0, check_dups, magic, priv,
 				&xfs_bmbt_buf_ops);
 		if (err)
 			return(1);
diff --git a/repair/scan.h b/repair/scan.h
index ee16362b6d3c69..4da788becbef66 100644
--- a/repair/scan.h
+++ b/repair/scan.h
@@ -26,7 +26,8 @@ int scan_lbtree(
 				int			isroot,
 				int			check_dups,
 				int			*dirty,
-				uint64_t		magic),
+				uint64_t		magic,
+				void			*priv),
 	int		type,
 	int		whichfork,
 	xfs_ino_t	ino,
@@ -37,6 +38,7 @@ int scan_lbtree(
 	int		isroot,
 	int		check_dups,
 	uint64_t	magic,
+	void		*priv,
 	const struct xfs_buf_ops *ops);
 
 int scan_bmapbt(
@@ -53,7 +55,8 @@ int scan_bmapbt(
 	int			isroot,
 	int			check_dups,
 	int			*dirty,
-	uint64_t		magic);
+	uint64_t		magic,
+	void			*priv);
 
 void
 scan_ags(



* [PATCH 34/41] xfs_repair: mark space used by metadata files
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (32 preceding siblings ...)
  2024-12-23 21:55   ` [PATCH 33/41] xfs_repair: pass private data pointer to scan_lbtree Darrick J. Wong
@ 2024-12-23 21:55   ` Darrick J. Wong
  2024-12-23 21:55   ` [PATCH 35/41] xfs_repair: adjust keep_fsinos to handle metadata directories Darrick J. Wong
                     ` (6 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:55 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Track space used by metadata files as a separate incore extent type.
This ensures that we can warn about cross-linked metadata files, even
though we are going to rebuild the entire metadata directory tree in the
end.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/dino_chunks.c |   31 +++++++++++++
 repair/dinode.c      |  122 ++++++++++++++++++++++++++++++++++++++------------
 repair/dinode.h      |    6 ++
 repair/incore.h      |   31 ++++++++-----
 repair/phase4.c      |    2 +
 repair/scan.c        |   30 +++++++++++-
 6 files changed, 175 insertions(+), 47 deletions(-)
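
To make the ownership rules easier to review, here is a simplified model of the block state transitions that the AG block map hunk in repair/dinode.c applies when a file fork claims a block.  It is a sketch only: enum blk_state and claim_block() are hypothetical, the enum values are not the real XR_E_* constants, and the FREE1/INUSE1 states are folded away for brevity.

```c
#include <stdbool.h>

/* Illustrative states only; these are not the real XR_E_* values. */
enum blk_state {
	BLK_UNKNOWN,
	BLK_FREE,
	BLK_INUSE,	/* owned by one regular file */
	BLK_METADATA,	/* owned by a metadata directory tree file */
	BLK_MULT,	/* claimed by more than one regular file */
	BLK_FS_MAP,	/* stand-in for static filesystem metadata */
};

/*
 * Return the new state of a block claimed by a file fork.  @zap_metadata is
 * true when the claiming file lives in the metadata directory tree.
 */
static enum blk_state claim_block(enum blk_state cur, bool zap_metadata)
{
	switch (cur) {
	case BLK_METADATA:
		/*
		 * The metadata tree is rebuilt anyway, so a regular file
		 * may take the block over; a metadata file changes nothing.
		 */
		return zap_metadata ? cur : BLK_INUSE;
	case BLK_UNKNOWN:
	case BLK_FREE:
		return zap_metadata ? BLK_METADATA : BLK_INUSE;
	case BLK_INUSE:
	case BLK_MULT:
		/* Only regular files escalate to a cross-link. */
		return zap_metadata ? cur : BLK_MULT;
	default:
		return cur;
	}
}
```

The asymmetry is the point of the design: a metadata file never takes a block away from a regular file, while a regular file quietly wins any block that only a metadata file had claimed.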


diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index 0d9b3a01bc298d..0cbc498101ec72 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -141,6 +141,16 @@ verify_inode_chunk(xfs_mount_t		*mp,
 		_("uncertain inode block %d/%d already known\n"),
 				agno, agbno);
 			break;
+		case XR_E_METADATA:
+			/*
+			 * Files in the metadata directory tree are always
+			 * reconstructed, so it's ok to let go if this block
+			 * is also a valid inode cluster.
+			 */
+			do_warn(
+		_("inode block %d/%d claimed by metadata file\n"),
+				agno, agbno);
+			fallthrough;
 		case XR_E_UNKNOWN:
 		case XR_E_FREE1:
 		case XR_E_FREE:
@@ -430,6 +440,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 			set_bmap_ext(agno, cur_agbno, blen, XR_E_MULT);
 			pthread_mutex_unlock(&ag_locks[agno].lock);
 			return 0;
+		case XR_E_METADATA:
 		case XR_E_INO:
 			do_error(
 	_("uncertain inode block overlap, agbno = %d, ino = %" PRIu64 "\n"),
@@ -474,6 +485,16 @@ verify_inode_chunk(xfs_mount_t		*mp,
 		_("uncertain inode block %" PRIu64 " already known\n"),
 				XFS_AGB_TO_FSB(mp, agno, cur_agbno));
 			break;
+		case XR_E_METADATA:
+			/*
+			 * Files in the metadata directory tree are always
+			 * reconstructed, so it's ok to let go if this block
+			 * is also a valid inode cluster.
+			 */
+			do_warn(
+		_("inode block %d/%d claimed by metadata file\n"),
+				agno, agbno);
+			fallthrough;
 		case XR_E_UNKNOWN:
 		case XR_E_FREE1:
 		case XR_E_FREE:
@@ -559,6 +580,16 @@ process_inode_agbno_state(
 	switch (state) {
 	case XR_E_INO:	/* already marked */
 		break;
+	case XR_E_METADATA:
+		/*
+		 * Files in the metadata directory tree are always
+		 * reconstructed, so it's ok to let go if this block is also a
+		 * valid inode cluster.
+		 */
+		do_warn(
+	_("inode block %d/%d claimed by metadata file\n"),
+			agno, agbno);
+		fallthrough;
 	case XR_E_UNKNOWN:
 	case XR_E_FREE:
 	case XR_E_FREE1:
diff --git a/repair/dinode.c b/repair/dinode.c
index 4ca45ad4aab1fa..8148cdbc4f10ad 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -220,6 +220,7 @@ static int
 process_rt_rec_state(
 	struct xfs_mount	*mp,
 	xfs_ino_t		ino,
+	bool			zap_metadata,
 	struct xfs_bmbt_irec	*irec)
 {
 	xfs_fsblock_t		b = irec->br_startblock;
@@ -256,7 +257,13 @@ _("data fork in rt inode %" PRIu64 " found invalid rt extent %"PRIu64" state %d
 		switch (state)  {
 		case XR_E_FREE:
 		case XR_E_UNKNOWN:
-			set_rtbmap(ext, XR_E_INUSE);
+			set_rtbmap(ext, zap_metadata ? XR_E_METADATA :
+						       XR_E_INUSE);
+			break;
+		case XR_E_METADATA:
+			do_error(
+_("data fork in rt inode %" PRIu64 " found metadata file block %" PRIu64 " in rt bmap\n"),
+				ino, ext);
 			break;
 		case XR_E_BAD_STATE:
 			do_error(
@@ -293,7 +300,8 @@ process_rt_rec(
 	struct xfs_bmbt_irec	*irec,
 	xfs_ino_t		ino,
 	xfs_rfsblock_t		*tot,
-	int			check_dups)
+	int			check_dups,
+	bool			zap_metadata)
 {
 	xfs_fsblock_t		lastb;
 	int			bad;
@@ -333,7 +341,7 @@ _("inode %" PRIu64 " - bad rt extent overflows - start %" PRIu64 ", "
 	if (check_dups)
 		bad = process_rt_rec_dups(mp, ino, irec);
 	else
-		bad = process_rt_rec_state(mp, ino, irec);
+		bad = process_rt_rec_state(mp, ino, zap_metadata, irec);
 	if (bad)
 		return bad;
 
@@ -364,7 +372,8 @@ process_bmbt_reclist_int(
 	xfs_fileoff_t		*first_key,
 	xfs_fileoff_t		*last_key,
 	int			check_dups,
-	int			whichfork)
+	int			whichfork,
+	bool			zap_metadata)
 {
 	xfs_bmbt_irec_t		irec;
 	xfs_filblks_t		cp = 0;		/* prev count */
@@ -443,7 +452,8 @@ _("zero length extent (off = %" PRIu64 ", fsbno = %" PRIu64 ") in ino %" PRIu64
 
 		if (type == XR_INO_RTDATA && whichfork == XFS_DATA_FORK) {
 			pthread_mutex_lock(&rt_lock.lock);
-			error2 = process_rt_rec(mp, &irec, ino, tot, check_dups);
+			error2 = process_rt_rec(mp, &irec, ino, tot, check_dups,
+					zap_metadata);
 			pthread_mutex_unlock(&rt_lock.lock);
 			if (error2)
 				return error2;
@@ -558,6 +568,11 @@ _("%s fork in ino %" PRIu64 " claims free block %" PRIu64 "\n"),
 			case XR_E_INUSE_FS1:
 				do_warn(_("rmap claims metadata use!\n"));
 				fallthrough;
+			case XR_E_METADATA:
+				do_warn(
+_("%s fork in inode %" PRIu64 " claims metadata file block %" PRIu64 "\n"),
+					forkname, ino, b);
+				break;
 			case XR_E_FS_MAP:
 			case XR_E_INO:
 			case XR_E_INUSE_FS:
@@ -614,15 +629,28 @@ _("illegal state %d in block map %" PRIu64 "\n"),
 		for (; agbno < ebno; agbno += blen) {
 			state = get_bmap_ext(agno, agbno, ebno, &blen);
 			switch (state)  {
+			case XR_E_METADATA:
+				/*
+				 * The entire metadata directory tree is rebuilt
+				 * every time, so we can let regular files take
+				 * ownership of this block.
+				 */
+				if (zap_metadata)
+					break;
+				fallthrough;
 			case XR_E_FREE:
 			case XR_E_FREE1:
 			case XR_E_INUSE1:
 			case XR_E_UNKNOWN:
-				set_bmap_ext(agno, agbno, blen, XR_E_INUSE);
+				set_bmap_ext(agno, agbno, blen, zap_metadata ?
+						XR_E_METADATA : XR_E_INUSE);
 				break;
+
 			case XR_E_INUSE:
 			case XR_E_MULT:
-				set_bmap_ext(agno, agbno, blen, XR_E_MULT);
+				if (!zap_metadata)
+					set_bmap_ext(agno, agbno, blen,
+							XR_E_MULT);
 				break;
 			default:
 				break;
@@ -661,10 +689,12 @@ process_bmbt_reclist(
 	blkmap_t		**blkmapp,
 	xfs_fileoff_t		*first_key,
 	xfs_fileoff_t		*last_key,
-	int			whichfork)
+	int			whichfork,
+	bool			zap_metadata)
 {
 	return process_bmbt_reclist_int(mp, rp, numrecs, type, ino, tot,
-				blkmapp, first_key, last_key, 0, whichfork);
+				blkmapp, first_key, last_key, 0, whichfork,
+				zap_metadata);
 }
 
 /*
@@ -679,13 +709,15 @@ scan_bmbt_reclist(
 	int			type,
 	xfs_ino_t		ino,
 	xfs_rfsblock_t		*tot,
-	int			whichfork)
+	int			whichfork,
+	bool			zap_metadata)
 {
 	xfs_fileoff_t		first_key = 0;
 	xfs_fileoff_t		last_key = 0;
 
 	return process_bmbt_reclist_int(mp, rp, numrecs, type, ino, tot,
-				NULL, &first_key, &last_key, 1, whichfork);
+				NULL, &first_key, &last_key, 1, whichfork,
+				zap_metadata);
 }
 
 /*
@@ -760,7 +792,8 @@ process_btinode(
 	xfs_extnum_t		*nex,
 	blkmap_t		**blkmapp,
 	int			whichfork,
-	int			check_dups)
+	int			check_dups,
+	bool			zap_metadata)
 {
 	xfs_bmdr_block_t	*dib;
 	xfs_fileoff_t		last_key;
@@ -839,8 +872,8 @@ _("bad bmap btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"),
 
 		if (scan_lbtree(get_unaligned_be64(&pp[i]), level, scan_bmapbt,
 				type, whichfork, lino, tot, nex, blkmapp,
-				&cursor, 1, check_dups, magic, NULL,
-				&xfs_bmbt_buf_ops))
+				&cursor, 1, check_dups, magic,
+				(void *)zap_metadata, &xfs_bmbt_buf_ops))
 			return(1);
 		/*
 		 * fix key (offset) mismatches between the keys in root
@@ -935,7 +968,8 @@ process_exinode(
 	xfs_extnum_t		*nex,
 	blkmap_t		**blkmapp,
 	int			whichfork,
-	int			check_dups)
+	int			check_dups,
+	bool			zap_metadata)
 {
 	xfs_ino_t		lino;
 	xfs_bmbt_rec_t		*rp;
@@ -969,10 +1003,10 @@ process_exinode(
 	if (check_dups == 0)
 		ret = process_bmbt_reclist(mp, rp, &numrecs, type, lino,
 					tot, blkmapp, &first_key, &last_key,
-					whichfork);
+					whichfork, zap_metadata);
 	else
 		ret = scan_bmbt_reclist(mp, rp, &numrecs, type, lino, tot,
-					whichfork);
+					whichfork, zap_metadata);
 
 	*nex = numrecs;
 	return ret;
@@ -1901,7 +1935,8 @@ process_inode_data_fork(
 	xfs_extnum_t		*nextents,
 	blkmap_t		**dblkmap,
 	int			check_dups,
-	struct xfs_buf		**ino_bpp)
+	struct xfs_buf		**ino_bpp,
+	bool			zap_metadata)
 {
 	struct xfs_dinode	*dino = *dinop;
 	xfs_ino_t		lino = XFS_AGINO_TO_INO(mp, agno, ino);
@@ -1948,14 +1983,14 @@ process_inode_data_fork(
 			try_rebuild = 1;
 		err = process_exinode(mp, agno, ino, dino, type, dirty,
 			totblocks, nextents, dblkmap, XFS_DATA_FORK,
-			check_dups);
+			check_dups, zap_metadata);
 		break;
 	case XFS_DINODE_FMT_BTREE:
 		if (!rmapbt_suspect && try_rebuild == -1)
 			try_rebuild = 1;
 		err = process_btinode(mp, agno, ino, dino, type, dirty,
 			totblocks, nextents, dblkmap, XFS_DATA_FORK,
-			check_dups);
+			check_dups, zap_metadata);
 		break;
 	case XFS_DINODE_FMT_DEV:
 		err = 0;
@@ -2008,12 +2043,12 @@ _("would have tried to rebuild inode %"PRIu64" data fork\n"),
 		case XFS_DINODE_FMT_EXTENTS:
 			err = process_exinode(mp, agno, ino, dino, type,
 				dirty, totblocks, nextents, dblkmap,
-				XFS_DATA_FORK, 0);
+				XFS_DATA_FORK, 0, zap_metadata);
 			break;
 		case XFS_DINODE_FMT_BTREE:
 			err = process_btinode(mp, agno, ino, dino, type,
 				dirty, totblocks, nextents, dblkmap,
-				XFS_DATA_FORK, 0);
+				XFS_DATA_FORK, 0, zap_metadata);
 			break;
 		case XFS_DINODE_FMT_DEV:
 			err = 0;
@@ -2048,7 +2083,8 @@ process_inode_attr_fork(
 	int			check_dups,
 	int			extra_attr_check,
 	int			*retval,
-	struct xfs_buf		**ino_bpp)
+	struct xfs_buf		**ino_bpp,
+	bool			zap_metadata)
 {
 	xfs_ino_t		lino = XFS_AGINO_TO_INO(mp, agno, ino);
 	struct xfs_dinode	*dino = *dinop;
@@ -2096,7 +2132,7 @@ process_inode_attr_fork(
 		*anextents = 0;
 		err = process_exinode(mp, agno, ino, dino, type, dirty,
 				atotblocks, anextents, &ablkmap,
-				XFS_ATTR_FORK, check_dups);
+				XFS_ATTR_FORK, check_dups, zap_metadata);
 		break;
 	case XFS_DINODE_FMT_BTREE:
 		if (!rmapbt_suspect && try_rebuild == -1)
@@ -2105,7 +2141,7 @@ process_inode_attr_fork(
 		*anextents = 0;
 		err = process_btinode(mp, agno, ino, dino, type, dirty,
 				atotblocks, anextents, &ablkmap,
-				XFS_ATTR_FORK, check_dups);
+				XFS_ATTR_FORK, check_dups, zap_metadata);
 		break;
 	default:
 		do_warn(_("illegal attribute format %d, ino %" PRIu64 "\n"),
@@ -2169,12 +2205,12 @@ _("would have tried to rebuild inode %"PRIu64" attr fork or cleared it\n"),
 		case XFS_DINODE_FMT_EXTENTS:
 			err = process_exinode(mp, agno, ino, dino,
 				type, dirty, atotblocks, anextents,
-				&ablkmap, XFS_ATTR_FORK, 0);
+				&ablkmap, XFS_ATTR_FORK, 0, zap_metadata);
 			break;
 		case XFS_DINODE_FMT_BTREE:
 			err = process_btinode(mp, agno, ino, dino,
 				type, dirty, atotblocks, anextents,
-				&ablkmap, XFS_ATTR_FORK, 0);
+				&ablkmap, XFS_ATTR_FORK, 0, zap_metadata);
 			break;
 		default:
 			do_error(_("illegal attribute fmt %d, ino %" PRIu64 "\n"),
@@ -2393,6 +2429,7 @@ process_dinode_int(
 	xfs_agino_t		unlinked_ino;
 	struct xfs_perag	*pag;
 	bool			is_meta = false;
+	bool			zap_metadata = false;
 
 	*dirty = *isa_dir = 0;
 	*used = is_used;
@@ -2982,6 +3019,33 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "),
 		off = get_inode_offset(mp, lino, irec);
 		set_inode_is_meta(irec, off);
 		is_meta = true;
+
+		/*
+		 * We always rebuild the metadata directory tree during phase
+		 * 6, so we use this flag to mark as free all the blocks of the
+		 * metadata directories and of any other metadata files whose
+		 * contents we don't want to save.
+		 *
+		 * Currently, there are no metadata files that use xattrs, so
+		 * we always drop the xattr blocks of metadata files.  Parent
+		 * pointers will be rebuilt during phase 6.
+		 */
+		switch (type) {
+		case XR_INO_RTBITMAP:
+		case XR_INO_RTSUM:
+		case XR_INO_UQUOTA:
+		case XR_INO_GQUOTA:
+		case XR_INO_PQUOTA:
+			/*
+			 * This inode was recognized as being filesystem
+			 * metadata, so preserve the inode and its contents for
+			 * later checking and repair.
+			 */
+			break;
+		default:
+			zap_metadata = true;
+			break;
+		}
 	}
 
 	/*
@@ -2989,7 +3053,7 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "),
 	 */
 	if (process_inode_data_fork(mp, agno, ino, dinop, type, dirty,
 			&totblocks, &nextents, &dblkmap, check_dups,
-			ino_bpp) != 0)
+			ino_bpp, zap_metadata) != 0)
 		goto bad_out;
 	dino = *dinop;
 
@@ -2999,7 +3063,7 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "),
 	 */
 	if (process_inode_attr_fork(mp, agno, ino, dinop, type, dirty,
 			&atotblocks, &anextents, check_dups, extra_attr_check,
-			&retval, ino_bpp))
+			&retval, ino_bpp, is_meta))
 		goto bad_out;
 	dino = *dinop;
 
diff --git a/repair/dinode.h b/repair/dinode.h
index 92df83da621053..ed2ec4ca2386ff 100644
--- a/repair/dinode.h
+++ b/repair/dinode.h
@@ -27,7 +27,8 @@ process_bmbt_reclist(xfs_mount_t	*mp,
 		struct blkmap		**blkmapp,
 		uint64_t		*first_key,
 		uint64_t		*last_key,
-		int			whichfork);
+		int			whichfork,
+		bool			zap_metadata);
 
 int
 scan_bmbt_reclist(
@@ -37,7 +38,8 @@ scan_bmbt_reclist(
 	int			type,
 	xfs_ino_t		ino,
 	xfs_rfsblock_t		*tot,
-	int			whichfork);
+	int			whichfork,
+	bool			zap_metadata);
 
 void
 update_rootino(xfs_mount_t *mp);
diff --git a/repair/incore.h b/repair/incore.h
index 568a8c7cb75b7c..07716fc4c01a05 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -85,18 +85,25 @@ typedef struct rt_extent_tree_node  {
 #define XR_E_UNKNOWN	0	/* unknown state */
 #define XR_E_FREE1	1	/* free block (marked by one fs space tree) */
 #define XR_E_FREE	2	/* free block (marked by both fs space trees) */
-#define XR_E_INUSE	3	/* extent used by file/dir data or metadata */
-#define XR_E_INUSE_FS	4	/* extent used by fs ag header or log */
-#define XR_E_MULT	5	/* extent is multiply referenced */
-#define XR_E_INO	6	/* extent used by inodes (inode blocks) */
-#define XR_E_FS_MAP	7	/* extent used by fs space/inode maps */
-#define XR_E_INUSE1	8	/* used block (marked by rmap btree) */
-#define XR_E_INUSE_FS1	9	/* used by fs ag header or log (rmap btree) */
-#define XR_E_INO1	10	/* used by inodes (marked by rmap btree) */
-#define XR_E_FS_MAP1	11	/* used by fs space/inode maps (rmap btree) */
-#define XR_E_REFC	12	/* used by fs ag reference count btree */
-#define XR_E_COW	13	/* leftover cow extent */
-#define XR_E_BAD_STATE	14
+/*
+ * Space used by metadata files.  The entire metadata directory tree will be
+ * rebuilt from scratch during phase 6, so this value must be less than
+ * XR_E_INUSE so that the space will go back to the free space btrees during
+ * phase 5.
+ */
+#define XR_E_METADATA	3
+#define XR_E_INUSE	4	/* extent used by file/dir data or metadata */
+#define XR_E_INUSE_FS	5	/* extent used by fs ag header or log */
+#define XR_E_MULT	6	/* extent is multiply referenced */
+#define XR_E_INO	7	/* extent used by inodes (inode blocks) */
+#define XR_E_FS_MAP	8	/* extent used by fs space/inode maps */
+#define XR_E_INUSE1	9	/* used block (marked by rmap btree) */
+#define XR_E_INUSE_FS1	10	/* used by fs ag header or log (rmap btree) */
+#define XR_E_INO1	11	/* used by inodes (marked by rmap btree) */
+#define XR_E_FS_MAP1	12	/* used by fs space/inode maps (rmap btree) */
+#define XR_E_REFC	13	/* used by fs ag reference count btree */
+#define XR_E_COW	14	/* leftover cow extent */
+#define XR_E_BAD_STATE	15
 
 /* separate state bit, OR'ed into high (4th) bit of ex_state field */
 
diff --git a/repair/phase4.c b/repair/phase4.c
index 7efef86245fbe7..40db36f1f93020 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -303,6 +303,7 @@ phase4(xfs_mount_t *mp)
 				_("unknown block state, ag %d, blocks %u-%u\n"),
 					i, j, j + blen - 1);
 				fallthrough;
+			case XR_E_METADATA:
 			case XR_E_UNKNOWN:
 			case XR_E_FREE:
 			case XR_E_INUSE:
@@ -335,6 +336,7 @@ phase4(xfs_mount_t *mp)
 	_("unknown rt extent state, extent %" PRIu64 "\n"),
 				rtx);
 			fallthrough;
+		case XR_E_METADATA:
 		case XR_E_UNKNOWN:
 		case XR_E_FREE1:
 		case XR_E_FREE:
diff --git a/repair/scan.c b/repair/scan.c
index f6d46a2861b312..0fec7c222ff156 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -227,6 +227,7 @@ scan_bmapbt(
 	xfs_agnumber_t		agno;
 	xfs_agblock_t		agbno;
 	int			state;
+	bool			zap_metadata = priv != NULL;
 
 	/*
 	 * unlike the ag freeblock btrees, if anything looks wrong
@@ -352,7 +353,20 @@ _("bad back (left) sibling pointer (saw %llu should be NULL (0))\n"
 		case XR_E_UNKNOWN:
 		case XR_E_FREE1:
 		case XR_E_FREE:
-			set_bmap(agno, agbno, XR_E_INUSE);
+			set_bmap(agno, agbno, zap_metadata ? XR_E_METADATA :
+							     XR_E_INUSE);
+			break;
+		case XR_E_METADATA:
+			/*
+			 * bmbt block already claimed by a metadata file.  We
+			 * always reconstruct the entire metadata tree, so if
+			 * this is a regular file we mark it owned by the file.
+			 */
+			do_warn(
+_("inode 0x%" PRIx64 " bmap block 0x%" PRIx64 " claimed by metadata file\n"),
+				ino, bno);
+			if (!zap_metadata)
+				set_bmap(agno, agbno, XR_E_INUSE);
 			break;
 		case XR_E_FS_MAP:
 		case XR_E_INUSE:
@@ -364,7 +378,8 @@ _("bad back (left) sibling pointer (saw %llu should be NULL (0))\n"
 			 * we made it here, the block probably
 			 * contains btree data.
 			 */
-			set_bmap(agno, agbno, XR_E_MULT);
+			if (!zap_metadata)
+				set_bmap(agno, agbno, XR_E_MULT);
 			do_warn(
 _("inode 0x%" PRIx64 "bmap block 0x%" PRIx64 " claimed, state is %d\n"),
 				ino, bno, state);
@@ -429,7 +444,8 @@ _("inode %" PRIu64 " bad # of bmap records (%" PRIu64 ", min - %u, max - %u)\n")
 		if (check_dups == 0)  {
 			err = process_bmbt_reclist(mp, rp, &numrecs, type, ino,
 						   tot, blkmapp, &first_key,
-						   &last_key, whichfork);
+						   &last_key, whichfork,
+						   zap_metadata);
 			if (err)
 				return 1;
 
@@ -459,7 +475,7 @@ _("out-of-order bmap key (file offset) in inode %" PRIu64 ", %s fork, fsbno %" P
 			return 0;
 		} else {
 			return scan_bmbt_reclist(mp, rp, &numrecs, type, ino,
-						 tot, whichfork);
+					tot, whichfork, zap_metadata);
 		}
 	}
 	if (numrecs > mp->m_bmap_dmxr[1] || (isroot == 0 && numrecs <
@@ -849,6 +865,12 @@ process_rmap_rec(
 			break;
 		}
 		break;
+	case XR_E_METADATA:
+		do_warn(
+_("Metadata file block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"),
+			agno, b, b + blen - 1,
+			name, state, owner);
+		break;
 	case XR_E_INUSE_FS:
 		if (owner == XFS_RMAP_OWN_FS ||
 		    owner == XFS_RMAP_OWN_LOG)
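
The block-state handling that this patch adds for bmbt blocks and mapped
extents can be modelled as a small state machine.  The following is a
minimal standalone sketch, not the xfs_repair code: the enum mirrors a
subset of the XR_E_* values from repair/incore.h, while toy_claim_block
and the flat block map are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced copy of the XR_E_* block states, numbered as in repair/incore.h. */
enum toy_state {
	TOY_UNKNOWN	= 0,
	TOY_FREE1	= 1,
	TOY_FREE	= 2,
	TOY_METADATA	= 3,	/* below TOY_INUSE, so phase 5 frees it */
	TOY_INUSE	= 4,
	TOY_MULT	= 6,
};

/*
 * Hypothetical helper: record that a file claims block @bno.  @zap_metadata
 * is true when the claimant is a metadata file that will be thrown away.
 */
static void
toy_claim_block(
	enum toy_state	*bmap,
	size_t		nblocks,
	size_t		bno,
	bool		zap_metadata)
{
	assert(bno < nblocks);

	switch (bmap[bno]) {
	case TOY_UNKNOWN:
	case TOY_FREE1:
	case TOY_FREE:
		/* First claim: the block takes on the claimant's kind. */
		bmap[bno] = zap_metadata ? TOY_METADATA : TOY_INUSE;
		break;
	case TOY_METADATA:
		/* A file we keep may take over space from a doomed file. */
		if (!zap_metadata)
			bmap[bno] = TOY_INUSE;
		break;
	case TOY_INUSE:
	case TOY_MULT:
		/* A doomed file never turns a block into a duplicate. */
		if (!zap_metadata)
			bmap[bno] = TOY_MULT;
		break;
	default:
		break;
	}
}
```

The point of the model is that a file about to be zapped can only ever
move a block from a free-ish state to TOY_METADATA; it never produces a
multiply-claimed block and never downgrades a block owned by a file that
is being kept.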


* [PATCH 35/41] xfs_repair: adjust keep_fsinos to handle metadata directories
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (33 preceding siblings ...)
  2024-12-23 21:55   ` [PATCH 34/41] xfs_repair: mark space used by metadata files Darrick J. Wong
@ 2024-12-23 21:55   ` Darrick J. Wong
  2024-12-23 21:55   ` [PATCH 36/41] xfs_repair: metadata dirs are never plausible root dirs Darrick J. Wong
                     ` (5 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:55 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In keep_fsinos, mark the root of the metadata directory tree as in use.
The realtime bitmap and summary files still come after the root
directories, so this only requires a simple change to the loop bound.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/phase5.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)


diff --git a/repair/phase5.c b/repair/phase5.c
index 86b1f681a72bb8..a5e19998b97061 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -419,13 +419,18 @@ static void
 keep_fsinos(xfs_mount_t *mp)
 {
 	ino_tree_node_t		*irec;
-	int			i;
+	unsigned int		inuse = xfs_rootrec_inodes_inuse(mp), i;
 
 	irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rootino),
 			XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino));
 
-	for (i = 0; i < 3; i++)
+	for (i = 0; i < inuse; i++) {
 		set_inode_used(irec, i);
+
+		/* Everything after the root dir is metadata */
+		if (i)
+			set_inode_is_meta(irec, i);
+	}
 }
 
 static void


* [PATCH 36/41] xfs_repair: metadata dirs are never plausible root dirs
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (34 preceding siblings ...)
  2024-12-23 21:55   ` [PATCH 35/41] xfs_repair: adjust keep_fsinos to handle metadata directories Darrick J. Wong
@ 2024-12-23 21:55   ` Darrick J. Wong
  2024-12-23 21:56   ` [PATCH 37/41] xfs_repair: drop all the metadata directory files during pass 4 Darrick J. Wong
                     ` (4 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:55 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Metadata directories are never candidates to be the root of the
user-accessible directory tree.  Update has_plausible_rootdir to ignore
them all, and to detect the case where the superblock incorrectly thinks
both trees have the same root.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/xfs_repair.c |    6 ++++++
 1 file changed, 6 insertions(+)
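
The two early rejections can be sketched in isolation as follows.  This
is a simplified model under stated assumptions, not the real function:
struct toy_sb and struct toy_root_inode are hypothetical reductions of
the superblock and incore inode, and the later checks that
has_plausible_rootdir performs are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of the superblock fields that matter here. */
struct toy_sb {
	bool		has_metadir;
	uint64_t	rootino;
	uint64_t	metadirino;
};

/* Hypothetical reduction of the inode that sb_rootino points to. */
struct toy_root_inode {
	bool		is_metadir;	/* inode carries the metadata flag */
	bool		is_dir;
};

static bool
toy_rootdir_candidate_ok(
	const struct toy_sb		*sb,
	const struct toy_root_inode	*root)
{
	assert(sb != NULL && root != NULL);

	/* Both trees claiming the same root means the sb is wrong. */
	if (sb->has_metadir && sb->rootino == sb->metadirino)
		return false;

	/* A metadata directory is never the user-visible root. */
	if (root->is_metadir)
		return false;

	return root->is_dir;
}
```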


diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 70cab1ad852a21..30b014898c3203 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -546,9 +546,15 @@ has_plausible_rootdir(
 	int			error;
 	bool			ret = false;
 
+	if (xfs_has_metadir(mp) &&
+	    mp->m_sb.sb_rootino == mp->m_sb.sb_metadirino)
+		goto out;
+
 	error = -libxfs_iget(mp, NULL, mp->m_sb.sb_rootino, 0, &ip);
 	if (error)
 		goto out;
+	if (xfs_is_metadir_inode(ip))
+		goto out_rele;
 	if (!S_ISDIR(VFS_I(ip)->i_mode))
 		goto out_rele;
 


* [PATCH 37/41] xfs_repair: drop all the metadata directory files during pass 4
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (35 preceding siblings ...)
  2024-12-23 21:55   ` [PATCH 36/41] xfs_repair: metadata dirs are never plausible root dirs Darrick J. Wong
@ 2024-12-23 21:56   ` Darrick J. Wong
  2024-12-23 21:56   ` [PATCH 38/41] xfs_repair: truncate and unmark orphaned metadata inodes Darrick J. Wong
                     ` (3 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:56 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Drop the entire metadata directory tree during phase 4 so that we can
reinitialize the entire tree in phase 6.  The existing metadata files
(rtbitmap, rtsummary, quotas) will be reattached to the newly rebuilt
directory tree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/dinode.c |   14 +++++++++++++-
 repair/scan.c   |    2 +-
 2 files changed, 14 insertions(+), 2 deletions(-)
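
The decision logic can be summarized with the standalone sketch below.
It is a model under stated assumptions rather than the repair code: the
inode type enum is a reduced, hypothetical stand-in for the XR_INO_*
values, and struct toy_dinode only tracks whether the inode is still
considered in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical stand-in for the XR_INO_* inode types. */
enum toy_ino_type {
	TOY_INO_RTBITMAP,
	TOY_INO_RTSUM,
	TOY_INO_UQUOTA,
	TOY_INO_GQUOTA,
	TOY_INO_PQUOTA,
	TOY_INO_DIR,
	TOY_INO_DATA,
};

struct toy_dinode {
	bool			is_meta;	/* metadata flag is set */
	enum toy_ino_type	type;
	bool			in_use;
};

/* Known metadata files are preserved; all other metadata inodes are zapped. */
static bool
toy_zap_metadata(
	const struct toy_dinode	*dino)
{
	assert(dino != NULL);

	if (!dino->is_meta)
		return false;

	switch (dino->type) {
	case TOY_INO_RTBITMAP:
	case TOY_INO_RTSUM:
	case TOY_INO_UQUOTA:
	case TOY_INO_GQUOTA:
	case TOY_INO_PQUOTA:
		return false;
	default:
		return true;
	}
}

/* Zapped files contribute no reverse mappings. */
static bool
toy_collect_rmap(
	bool			collect_rmaps,
	const struct toy_dinode	*dino)
{
	return collect_rmaps && !toy_zap_metadata(dino);
}

/* Once duplicates are being checked, free zapped inodes unless -n is set. */
static void
toy_finish_inode(
	struct toy_dinode	*dino,
	bool			check_dups,
	bool			no_modify)
{
	if (check_dups && toy_zap_metadata(dino) && !no_modify)
		dino->in_use = false;
}
```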


diff --git a/repair/dinode.c b/repair/dinode.c
index 8148cdbc4f10ad..2185214ac41bdf 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -656,7 +656,7 @@ _("illegal state %d in block map %" PRIu64 "\n"),
 				break;
 			}
 		}
-		if (collect_rmaps) /* && !check_dups */
+		if (collect_rmaps && !zap_metadata) /* && !check_dups */
 			rmap_add_rec(mp, ino, whichfork, &irec);
 		*tot += irec.br_blockcount;
 	}
@@ -3123,6 +3123,18 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "),
 	 */
 	*dirty += process_check_inode_nlink_version(dino, lino);
 
+	/*
+	 * The entire metadata directory tree will be rebuilt during phase 6.
+	 * Therefore, if we're at the end of phase 4 and this is a metadata
+	 * file, zero the ondisk inode and the incore state.
+	 */
+	if (check_dups && zap_metadata && !no_modify) {
+		clear_dinode(mp, dino, lino);
+		*dirty += 1;
+		*used = is_free;
+		*isa_dir = 0;
+	}
+
 	return retval;
 
 clear_bad_out:
diff --git a/repair/scan.c b/repair/scan.c
index 0fec7c222ff156..ed73de4b2477bf 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -418,7 +418,7 @@ _("bad state %d, inode %" PRIu64 " bmap block 0x%" PRIx64 "\n"),
 	numrecs = be16_to_cpu(block->bb_numrecs);
 
 	/* Record BMBT blocks in the reverse-mapping data. */
-	if (check_dups && collect_rmaps) {
+	if (check_dups && collect_rmaps && !zap_metadata) {
 		agno = XFS_FSB_TO_AGNO(mp, bno);
 		pthread_mutex_lock(&ag_locks[agno].lock);
 		rmap_add_bmbt_rec(mp, ino, whichfork, bno);


* [PATCH 38/41] xfs_repair: truncate and unmark orphaned metadata inodes
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (36 preceding siblings ...)
  2024-12-23 21:56   ` [PATCH 37/41] xfs_repair: drop all the metadata directory files during pass 4 Darrick J. Wong
@ 2024-12-23 21:56   ` Darrick J. Wong
  2024-12-23 21:56   ` [PATCH 39/41] xfs_repair: do not count metadata directory files when doing quotacheck Darrick J. Wong
                     ` (2 subsequent siblings)
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:56 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If an inode claims to be a metadata inode but wasn't linked in either
directory tree, remove the attr fork, clear the metadata flag, and reset
the data fork if the contents weren't regular extent mappings before
moving the inode to lost+found.

We don't ifree the inode, because it's possible that the inode was not
actually a metadata inode but simply got corrupted due to bitflips or
something, and we'd rather let the sysadmin examine what's left of the
file instead of photorec'ing it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/phase6.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
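
The data fork reset can be modelled without any libxfs plumbing.  The
sketch below is illustrative only: the file kind and fork format enums
and struct toy_orphan are hypothetical reductions of the incore inode,
and the transaction handling of the real helper is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reductions of the file type and data fork format. */
enum toy_kind { TOY_REG, TOY_DIR, TOY_LNK, TOY_FIFO, TOY_CHR, TOY_BLK, TOY_SOCK };
enum toy_fmt { TOY_FMT_DEV, TOY_FMT_LOCAL, TOY_FMT_EXTENTS, TOY_FMT_BTREE };

struct toy_orphan {
	enum toy_kind	kind;
	bool		metadata_flag;
	enum toy_fmt	data_fmt;
	unsigned int	nextents;
};

static void
toy_trunc_metadata_inode(
	struct toy_orphan	*ip)
{
	assert(ip != NULL);

	/* The inode is about to become an ordinary file in lost+found. */
	ip->metadata_flag = false;

	switch (ip->kind) {
	case TOY_FIFO:
	case TOY_CHR:
	case TOY_BLK:
	case TOY_SOCK:
		/* Special files carry no data fork mappings. */
		ip->data_fmt = TOY_FMT_DEV;
		break;
	case TOY_REG:
		switch (ip->data_fmt) {
		case TOY_FMT_EXTENTS:
		case TOY_FMT_BTREE:
			/* Regular extent mappings are kept for inspection. */
			break;
		default:
			/* Anything else becomes an empty extent list. */
			ip->data_fmt = TOY_FMT_EXTENTS;
			ip->nextents = 0;
			break;
		}
		break;
	default:
		/* Other file types are not touched by this switch. */
		break;
	}
}
```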


diff --git a/repair/phase6.c b/repair/phase6.c
index 8fa2c3c8bf0419..3dd67f7e0ec051 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -853,6 +853,53 @@ mk_orphanage(
 	return(ino);
 }
 
+/* Don't let metadata inode contents leak to lost+found. */
+static void
+trunc_metadata_inode(
+	struct xfs_inode	*ip)
+{
+	struct xfs_trans	*tp;
+	struct xfs_mount	*mp = ip->i_mount;
+	int			err;
+
+	err = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
+	if (err)
+		do_error(
+	_("space reservation failed (%d), filesystem may be out of space\n"),
+					err);
+
+	libxfs_trans_ijoin(tp, ip, 0);
+	ip->i_diflags2 &= ~XFS_DIFLAG2_METADATA;
+
+	switch (VFS_I(ip)->i_mode & S_IFMT) {
+	case S_IFIFO:
+	case S_IFCHR:
+	case S_IFBLK:
+	case S_IFSOCK:
+		ip->i_df.if_format = XFS_DINODE_FMT_DEV;
+		break;
+	case S_IFREG:
+		switch (ip->i_df.if_format) {
+		case XFS_DINODE_FMT_EXTENTS:
+		case XFS_DINODE_FMT_BTREE:
+			break;
+		default:
+			ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS;
+			ip->i_df.if_nextents = 0;
+			break;
+		}
+		break;
+	}
+
+	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+	err = -libxfs_trans_commit(tp);
+	if (err)
+		do_error(
+	_("truncation of metadata inode 0x%llx failed, err=%d\n"),
+				(unsigned long long)ip->i_ino, err);
+}
+
 /*
  * Add a parent pointer back to the orphanage for any file we're moving into
  * the orphanage, being careful not to trip over any existing parent pointer.
@@ -943,6 +990,9 @@ mv_orphanage(
 	if (err)
 		do_error(_("%d - couldn't iget disconnected inode\n"), err);
 
+	if (xfs_is_metadir_inode(ino_p))
+		trunc_metadata_inode(ino_p);
+
 	xname.type = libxfs_mode_to_ftype(VFS_I(ino_p)->i_mode);
 
 	if (isa_dir)  {


* [PATCH 39/41] xfs_repair: do not count metadata directory files when doing quotacheck
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (37 preceding siblings ...)
  2024-12-23 21:56   ` [PATCH 38/41] xfs_repair: truncate and unmark orphaned metadata inodes Darrick J. Wong
@ 2024-12-23 21:56   ` Darrick J. Wong
  2024-12-23 21:56   ` [PATCH 40/41] xfs_repair: refactor generate_rtinfo Darrick J. Wong
  2024-12-23 21:57   ` [PATCH 41/41] mkfs.xfs: enable metadata directories Darrick J. Wong
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:56 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Previously, we stated that files in the metadata directory tree are not
counted in the dquot information.  Fix the offline quotacheck code in
xfs_repair to reflect this.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/quotacheck.c |    5 +++++
 1 file changed, 5 insertions(+)
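
As a rough illustration of the accounting rule, the sketch below sums
block usage for one user ID and skips metadata directory files.  It is a
toy model: struct toy_qc_inode and toy_count_blocks_for_uid are
hypothetical, and the real code adjusts user, group, and project dquots
for both data and realtime blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of an inode as seen by quotacheck. */
struct toy_qc_inode {
	uint32_t	uid;
	uint64_t	nblocks;
	bool		is_metadir;
};

static uint64_t
toy_count_blocks_for_uid(
	const struct toy_qc_inode	*inodes,
	size_t				nr,
	uint32_t			uid)
{
	uint64_t			blocks = 0;
	size_t				i;

	assert(inodes != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		/* Metadata directory files aren't counted in quota. */
		if (inodes[i].is_metadir)
			continue;
		if (inodes[i].uid == uid)
			blocks += inodes[i].nblocks;
	}
	return blocks;
}
```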


diff --git a/repair/quotacheck.c b/repair/quotacheck.c
index d9e08059927f80..11e2d64eb34791 100644
--- a/repair/quotacheck.c
+++ b/repair/quotacheck.c
@@ -217,6 +217,10 @@ quotacheck_adjust(
 		return;
 	}
 
+	/* Metadata directory files aren't counted in quota. */
+	if (xfs_is_metadir_inode(ip))
+		goto out_rele;
+
 	/* Count the file's blocks. */
 	if (XFS_IS_REALTIME_INODE(ip))
 		rtblks = qc_count_rtblocks(ip);
@@ -229,6 +233,7 @@ quotacheck_adjust(
 	if (proj_dquots)
 		qc_adjust(proj_dquots, ip->i_projid, blocks, rtblks);
 
+out_rele:
 	libxfs_irele(ip);
 }
 


* [PATCH 40/41] xfs_repair: refactor generate_rtinfo
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (38 preceding siblings ...)
  2024-12-23 21:56   ` [PATCH 39/41] xfs_repair: do not count metadata directory files when doing quotacheck Darrick J. Wong
@ 2024-12-23 21:56   ` Darrick J. Wong
  2024-12-23 21:57   ` [PATCH 41/41] mkfs.xfs: enable metadata directories Darrick J. Wong
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:56 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Move the allocation of the buffers for the computed values into
generate_rtinfo, which makes the variables holding them private to
rt.c, and clean up a few formatting nits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
[djwong: move functions to fix build errors]
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/globals.c |    5 --
 repair/globals.h |    5 --
 repair/phase5.c  |    3 -
 repair/phase6.c  |   51 +----------------
 repair/rt.c      |  160 ++++++++++++++++++++++++++++++++----------------------
 repair/rt.h      |   12 +---
 6 files changed, 104 insertions(+), 132 deletions(-)
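
For readers unfamiliar with what generate_rtinfo computes, the sketch
below walks a per-extent free map, builds the bitmap words, and counts
each free run in a summary bucket indexed by log2 of the run length and
by the bitmap block in which the run starts.  It is a simplified model
with assumptions: 32-bit words, fixed-size arrays, a two-dimensional
summary array instead of xfs_rtsumoffs, and hypothetical toy_* names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITS_PER_WORD	32
#define TOY_MAX_EXTENTS		256
#define TOY_MAX_WORDS		(TOY_MAX_EXTENTS / TOY_BITS_PER_WORD)
#define TOY_MAX_LOG		9	/* run lengths up to 256 */

struct toy_rtinfo {
	uint32_t	words[TOY_MAX_WORDS];
	unsigned int	summary[TOY_MAX_LOG][TOY_MAX_WORDS];
	uint64_t	frextents;
};

/* Index of the highest set bit; @v must be nonzero. */
static int
toy_highbit64(uint64_t v)
{
	int	bit = -1;

	for (; v; v >>= 1)
		bit++;
	return bit;
}

static void
toy_generate_rtinfo(
	struct toy_rtinfo	*info,
	const bool		*is_free,
	uint64_t		rextents,
	unsigned int		bits_per_block)
{
	uint64_t		extno = 0, start_ext = 0;
	unsigned int		bmbno = 0, start_bmbno = 0;
	size_t			word = 0;
	bool			in_extent = false;

	assert(rextents <= TOY_MAX_EXTENTS);
	assert(bits_per_block && bits_per_block % TOY_BITS_PER_WORD == 0);
	memset(info, 0, sizeof(*info));

	while (extno < rextents) {
		uint32_t	freebit = 1, bits = 0;
		unsigned int	i;

		for (i = 0; i < TOY_BITS_PER_WORD && extno < rextents;
		     i++, extno++) {
			if (is_free[extno]) {
				info->frextents++;
				bits |= freebit;
				if (!in_extent) {
					start_ext = extno;
					start_bmbno = bmbno;
					in_extent = true;
				}
			} else if (in_extent) {
				uint64_t	len = extno - start_ext;

				info->summary[toy_highbit64(len)][start_bmbno]++;
				in_extent = false;
			}
			freebit <<= 1;
		}
		info->words[word++] = bits;
		if (extno % bits_per_block == 0)
			bmbno++;
	}

	/* Close a free run that extends to the end of the rt volume. */
	if (in_extent) {
		uint64_t	len = extno - start_ext;

		info->summary[toy_highbit64(len)][start_bmbno]++;
	}
}
```

As in the patch, the caller no longer supplies the output buffers; the
generator owns them, here by filling a single caller-visible struct.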


diff --git a/repair/globals.c b/repair/globals.c
index b63931be9fdb70..bd07a9656d193b 100644
--- a/repair/globals.c
+++ b/repair/globals.c
@@ -87,11 +87,6 @@ unsigned int	glob_agcount;
 int		chunks_pblock;	/* # of 64-ino chunks per allocation */
 int		max_symlink_blocks;
 
-/* realtime info */
-
-union xfs_rtword_raw	*btmcompute;
-union xfs_suminfo_raw	*sumcompute;
-
 /* inode tree records have full or partial backptr fields ? */
 
 int	full_ino_ex_data;	/*
diff --git a/repair/globals.h b/repair/globals.h
index 1dc85ce7f8114c..ebe8d5ee132b8d 100644
--- a/repair/globals.h
+++ b/repair/globals.h
@@ -128,11 +128,6 @@ extern unsigned int	glob_agcount;
 extern int		chunks_pblock;	/* # of 64-ino chunks per allocation */
 extern int		max_symlink_blocks;
 
-/* realtime info */
-
-extern union xfs_rtword_raw		*btmcompute;
-extern union xfs_suminfo_raw		*sumcompute;
-
 /* inode tree records have full or partial backptr fields ? */
 
 extern int		full_ino_ex_data;/*
diff --git a/repair/phase5.c b/repair/phase5.c
index a5e19998b97061..91c4a8662a69f2 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -627,8 +627,7 @@ void
 check_rtmetadata(
 	struct xfs_mount	*mp)
 {
-	rtinit(mp);
-	generate_rtinfo(mp, btmcompute, sumcompute);
+	generate_rtinfo(mp);
 	check_rtbitmap(mp);
 	check_rtsummary(mp);
 }
diff --git a/repair/phase6.c b/repair/phase6.c
index 3dd67f7e0ec051..44677096873a54 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -19,6 +19,7 @@
 #include "progress.h"
 #include "versions.h"
 #include "repair/pptr.h"
+#include "repair/rt.h"
 
 static xfs_ino_t		orphanage_ino;
 
@@ -556,52 +557,6 @@ mk_rbmino(
 	libxfs_irele(ip);
 }
 
-static void
-fill_rbmino(
-	struct xfs_mount	*mp)
-{
-	struct xfs_inode	*ip;
-	int			error;
-
-	error = -libxfs_metafile_iget(mp, mp->m_sb.sb_rbmino,
-			XFS_METAFILE_RTBITMAP, &ip);
-	if (error)
-		do_error(
-_("couldn't iget realtime bitmap inode, error %d\n"), error);
-
-	error = -libxfs_rtfile_initialize_blocks(ip, 0, mp->m_sb.sb_rbmblocks,
-			btmcompute);
-	if (error)
-		do_error(
-_("couldn't re-initialize realtime bitmap inode, error %d\n"), error);
-
-	libxfs_irele(ip);
-}
-
-static void
-fill_rsumino(
-	struct xfs_mount	*mp)
-{
-	struct xfs_inode	*ip;
-	int			error;
-
-	error = -libxfs_metafile_iget(mp, mp->m_sb.sb_rsumino,
-			XFS_METAFILE_RTSUMMARY, &ip);
-	if (error)
-		do_error(
-_("couldn't iget realtime summary inode, error %d\n"), error);
-
-	mp->m_rsumip = ip;
-	error = -libxfs_rtfile_initialize_blocks(ip, 0, mp->m_rsumblocks,
-			sumcompute);
-	mp->m_rsumip = NULL;
-	if (error)
-		do_error(
-_("couldn't re-initialize realtime summary inode, error %d\n"), error);
-
-	libxfs_irele(ip);
-}
-
 static void
 mk_rsumino(
 	struct xfs_mount	*mp)
@@ -3335,8 +3290,8 @@ phase6(xfs_mount_t *mp)
 	if (!no_modify)  {
 		do_log(
 _("        - resetting contents of realtime bitmap and summary inodes\n"));
-		fill_rbmino(mp);
-		fill_rsumino(mp);
+		fill_rtbitmap(mp);
+		fill_rtsummary(mp);
 	}
 
 	mark_standalone_inodes(mp);
diff --git a/repair/rt.c b/repair/rt.c
index 9ae421168e84b4..d0a171020c4b49 100644
--- a/repair/rt.c
+++ b/repair/rt.c
@@ -14,31 +14,9 @@
 #include "err_protos.h"
 #include "rt.h"
 
-void
-rtinit(xfs_mount_t *mp)
-{
-	unsigned long long	wordcnt;
-
-	if (mp->m_sb.sb_rblocks == 0)
-		return;
-
-	/*
-	 * Allocate buffers for formatting the collected rt free space
-	 * information.  The rtbitmap buffer must be large enough to compare
-	 * against any unused bytes in the last block of the file.
-	 */
-	wordcnt = XFS_FSB_TO_B(mp, mp->m_sb.sb_rbmblocks) >> XFS_WORDLOG;
-	btmcompute = calloc(wordcnt, sizeof(union xfs_rtword_raw));
-	if (!btmcompute)
-		do_error(
-	_("couldn't allocate memory for incore realtime bitmap.\n"));
-
-	wordcnt = XFS_FSB_TO_B(mp, mp->m_rsumblocks) >> XFS_WORDLOG;
-	sumcompute = calloc(wordcnt, sizeof(union xfs_suminfo_raw));
-	if (!sumcompute)
-		do_error(
-	_("couldn't allocate memory for incore realtime summary info.\n"));
-}
+/* Computed rt bitmap/summary data */
+static union xfs_rtword_raw	*btmcompute;
+static union xfs_suminfo_raw	*sumcompute;
 
 static inline void
 set_rtword(
@@ -64,58 +42,66 @@ inc_sumcount(
  * generate the real-time bitmap and summary info based on the
  * incore realtime extent map.
  */
-int
+void
 generate_rtinfo(
-	struct xfs_mount	*mp,
-	union xfs_rtword_raw	*words,
-	union xfs_suminfo_raw	*sumcompute)
+	struct xfs_mount	*mp)
 {
-	xfs_rtxnum_t	extno;
-	xfs_rtxnum_t	start_ext;
-	int		bitsperblock;
-	int		bmbno;
-	xfs_rtword_t	freebit;
-	xfs_rtword_t	bits;
-	int		start_bmbno;
-	int		i;
-	int		offs;
-	int		log;
-	int		len;
-	int		in_extent;
+	unsigned int		bitsperblock =
+		mp->m_blockwsize << XFS_NBWORDLOG;
+	xfs_rtxnum_t		extno = 0;
+	xfs_rtxnum_t		start_ext = 0;
+	int			bmbno = 0;
+	int			start_bmbno = 0;
+	bool			in_extent = false;
+	unsigned long long	wordcnt;
+	union xfs_rtword_raw	*words;
+
+	wordcnt = XFS_FSB_TO_B(mp, mp->m_sb.sb_rbmblocks) >> XFS_WORDLOG;
+	btmcompute = calloc(wordcnt, sizeof(union xfs_rtword_raw));
+	if (!btmcompute)
+		do_error(
+_("couldn't allocate memory for incore realtime bitmap.\n"));
+	words = btmcompute;
+
+	wordcnt = XFS_FSB_TO_B(mp, mp->m_rsumblocks) >> XFS_WORDLOG;
+	sumcompute = calloc(wordcnt, sizeof(union xfs_suminfo_raw));
+	if (!sumcompute)
+		do_error(
+_("couldn't allocate memory for incore realtime summary info.\n"));
 
 	ASSERT(mp->m_rbmip == NULL);
 
-	bitsperblock = mp->m_blockwsize << XFS_NBWORDLOG;
-	extno = start_ext = 0;
-	bmbno = in_extent = start_bmbno = 0;
-
 	/*
-	 * slower but simple, don't play around with trying to set
-	 * things one word at a time, just set bit as required.
-	 * Have to * track start and end (size) of each range of
-	 * free extents to set the summary info properly.
+	 * Slower but simple, don't play around with trying to set things one
+	 * word at a time, just set bit as required.  Have to track start and
+	 * end (size) of each range of free extents to set the summary info
+	 * properly.
 	 */
 	while (extno < mp->m_sb.sb_rextents)  {
-		freebit = 1;
+		xfs_rtword_t		freebit = 1;
+		xfs_rtword_t		bits = 0;
+		int			i;
+
 		set_rtword(mp, words, 0);
-		bits = 0;
 		for (i = 0; i < sizeof(xfs_rtword_t) * NBBY &&
 				extno < mp->m_sb.sb_rextents; i++, extno++)  {
 			if (get_rtbmap(extno) == XR_E_FREE)  {
 				sb_frextents++;
 				bits |= freebit;
 
-				if (in_extent == 0) {
+				if (!in_extent) {
 					start_ext = extno;
 					start_bmbno = bmbno;
-					in_extent = 1;
+					in_extent = true;
 				}
-			} else if (in_extent == 1) {
-				len = (int) (extno - start_ext);
-				log = libxfs_highbit64(len);
-				offs = xfs_rtsumoffs(mp, log, start_bmbno);
+			} else if (in_extent) {
+				uint64_t	len = extno - start_ext;
+				xfs_rtsumoff_t	offs;
+
+				offs = xfs_rtsumoffs(mp, libxfs_highbit64(len),
+						start_bmbno);
 				inc_sumcount(mp, sumcompute, offs);
-				in_extent = 0;
+				in_extent = false;
 			}
 
 			freebit <<= 1;
@@ -126,10 +112,12 @@ generate_rtinfo(
 		if (extno % bitsperblock == 0)
 			bmbno++;
 	}
-	if (in_extent == 1) {
-		len = (int) (extno - start_ext);
-		log = libxfs_highbit64(len);
-		offs = xfs_rtsumoffs(mp, log, start_bmbno);
+
+	if (in_extent) {
+		uint64_t	len = extno - start_ext;
+		xfs_rtsumoff_t	offs;
+
+		offs = xfs_rtsumoffs(mp, libxfs_highbit64(len), start_bmbno);
 		inc_sumcount(mp, sumcompute, offs);
 	}
 
@@ -137,8 +125,6 @@ generate_rtinfo(
 		do_warn(_("sb_frextents %" PRIu64 ", counted %" PRIu64 "\n"),
 				mp->m_sb.sb_frextents, sb_frextents);
 	}
-
-	return(0);
 }
 
 static void
@@ -245,3 +231,49 @@ check_rtsummary(
 
 	check_rtfile_contents(mp, XFS_METAFILE_RTSUMMARY, mp->m_rsumblocks);
 }
+
+void
+fill_rtbitmap(
+	struct xfs_mount	*mp)
+{
+	struct xfs_inode	*ip;
+	int			error;
+
+	error = -libxfs_metafile_iget(mp, mp->m_sb.sb_rbmino,
+			XFS_METAFILE_RTBITMAP, &ip);
+	if (error)
+		do_error(
+_("couldn't iget realtime bitmap inode, error %d\n"), error);
+
+	error = -libxfs_rtfile_initialize_blocks(ip, 0, mp->m_sb.sb_rbmblocks,
+			btmcompute);
+	if (error)
+		do_error(
+_("couldn't re-initialize realtime bitmap inode, error %d\n"), error);
+
+	libxfs_irele(ip);
+}
+
+void
+fill_rtsummary(
+	struct xfs_mount	*mp)
+{
+	struct xfs_inode	*ip;
+	int			error;
+
+	error = -libxfs_metafile_iget(mp, mp->m_sb.sb_rsumino,
+			XFS_METAFILE_RTSUMMARY, &ip);
+	if (error)
+		do_error(
+_("couldn't iget realtime summary inode, error %d\n"), error);
+
+	mp->m_rsumip = ip;
+	error = -libxfs_rtfile_initialize_blocks(ip, 0, mp->m_rsumblocks,
+			sumcompute);
+	mp->m_rsumip = NULL;
+	if (error)
+		do_error(
+_("couldn't re-initialize realtime summary inode, error %d\n"), error);
+
+	libxfs_irele(ip);
+}
diff --git a/repair/rt.h b/repair/rt.h
index 862695487bcd4c..f8caa5dc874ec2 100644
--- a/repair/rt.h
+++ b/repair/rt.h
@@ -6,15 +6,11 @@
 #ifndef _XFS_REPAIR_RT_H_
 #define _XFS_REPAIR_RT_H_
 
-struct blkmap;
-
-void
-rtinit(xfs_mount_t		*mp);
-
-int generate_rtinfo(struct xfs_mount *mp, union xfs_rtword_raw *words,
-		union xfs_suminfo_raw *sumcompute);
-
+void generate_rtinfo(struct xfs_mount *mp);
 void check_rtbitmap(struct xfs_mount *mp);
 void check_rtsummary(struct xfs_mount *mp);
 
+void fill_rtbitmap(struct xfs_mount *mp);
+void fill_rtsummary(struct xfs_mount *mp);
+
 #endif /* _XFS_REPAIR_RT_H_ */



* [PATCH 41/41] mkfs.xfs: enable metadata directories
  2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
                     ` (39 preceding siblings ...)
  2024-12-23 21:56   ` [PATCH 40/41] xfs_repair: refactor generate_rtinfo Darrick J. Wong
@ 2024-12-23 21:57   ` Darrick J. Wong
  40 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:57 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Enable formatting filesystems with metadata directories.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/libxfs_api_defs.h |    2 +
 man/man8/mkfs.xfs.8.in   |   11 +++++++
 mkfs/lts_4.19.conf       |    1 +
 mkfs/lts_5.10.conf       |    1 +
 mkfs/lts_5.15.conf       |    1 +
 mkfs/lts_5.4.conf        |    1 +
 mkfs/lts_6.1.conf        |    1 +
 mkfs/lts_6.12.conf       |    1 +
 mkfs/lts_6.6.conf        |    1 +
 mkfs/proto.c             |   68 +++++++++++++++++++++++++++++++++++++++++++++-
 mkfs/xfs_mkfs.c          |   33 +++++++++++++++++++++-
 11 files changed, 118 insertions(+), 3 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index e79aa0e06e4f90..2f218296688477 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -215,6 +215,8 @@
 #define xfs_metafile_iget		libxfs_metafile_iget
 #define xfs_trans_metafile_iget		libxfs_trans_metafile_iget
 #define xfs_metafile_set_iflag		libxfs_metafile_set_iflag
+#define xfs_metadir_cancel		libxfs_metadir_cancel
+#define xfs_metadir_commit		libxfs_metadir_commit
 #define xfs_metadir_link		libxfs_metadir_link
 #define xfs_metadir_lookup		libxfs_metadir_lookup
 #define xfs_metadir_start_create	libxfs_metadir_start_create
diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index e56c8f31a52c78..de5f6baf59df95 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -271,6 +271,17 @@ .SH OPTIONS
 When the option
 .B \-m finobt=0
 is used, the inode btree counter feature is not supported and is disabled.
+.TP
+.BI metadir= value
+This option creates an internal directory tree to store filesystem metadata.
+.IP
+By default,
+.B mkfs.xfs
+will not enable this feature.
+If the option
+.B \-m crc=0
+is used, the metadata directory feature is not supported and is disabled.
+
 .TP
 .BI uuid= value
 Use the given value as the filesystem UUID for the newly created filesystem.
diff --git a/mkfs/lts_4.19.conf b/mkfs/lts_4.19.conf
index 4f190bacf9780c..4aa12f429ca2dd 100644
--- a/mkfs/lts_4.19.conf
+++ b/mkfs/lts_4.19.conf
@@ -6,6 +6,7 @@
 crc=1
 finobt=1
 inobtcount=0
+metadir=0
 reflink=0
 rmapbt=0
 autofsck=0
diff --git a/mkfs/lts_5.10.conf b/mkfs/lts_5.10.conf
index a55fc68e4e3f2f..9625135c011f08 100644
--- a/mkfs/lts_5.10.conf
+++ b/mkfs/lts_5.10.conf
@@ -6,6 +6,7 @@
 crc=1
 finobt=1
 inobtcount=0
+metadir=0
 reflink=1
 rmapbt=0
 autofsck=0
diff --git a/mkfs/lts_5.15.conf b/mkfs/lts_5.15.conf
index daea0b40671936..5306fad7e02f0f 100644
--- a/mkfs/lts_5.15.conf
+++ b/mkfs/lts_5.15.conf
@@ -6,6 +6,7 @@
 crc=1
 finobt=1
 inobtcount=1
+metadir=0
 reflink=1
 rmapbt=0
 autofsck=0
diff --git a/mkfs/lts_5.4.conf b/mkfs/lts_5.4.conf
index 0f807fc35e34b4..9114388b0248a5 100644
--- a/mkfs/lts_5.4.conf
+++ b/mkfs/lts_5.4.conf
@@ -6,6 +6,7 @@
 crc=1
 finobt=1
 inobtcount=0
+metadir=0
 reflink=1
 rmapbt=0
 autofsck=0
diff --git a/mkfs/lts_6.1.conf b/mkfs/lts_6.1.conf
index 0ff5bbad5a1c2d..1d5378042eed6c 100644
--- a/mkfs/lts_6.1.conf
+++ b/mkfs/lts_6.1.conf
@@ -6,6 +6,7 @@
 crc=1
 finobt=1
 inobtcount=1
+metadir=0
 reflink=1
 rmapbt=0
 autofsck=0
diff --git a/mkfs/lts_6.12.conf b/mkfs/lts_6.12.conf
index 35b79082495d24..b204b78511f666 100644
--- a/mkfs/lts_6.12.conf
+++ b/mkfs/lts_6.12.conf
@@ -6,6 +6,7 @@
 crc=1
 finobt=1
 inobtcount=1
+metadir=0
 reflink=1
 rmapbt=1
 autofsck=0
diff --git a/mkfs/lts_6.6.conf b/mkfs/lts_6.6.conf
index 2ef5957e0b3a3f..d2649c562fac12 100644
--- a/mkfs/lts_6.6.conf
+++ b/mkfs/lts_6.6.conf
@@ -6,6 +6,7 @@
 crc=1
 finobt=1
 inobtcount=1
+metadir=0
 reflink=1
 rmapbt=1
 autofsck=0
diff --git a/mkfs/proto.c b/mkfs/proto.c
index d8eb6ca33672bd..05c2621f8a0b13 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -471,6 +471,65 @@ creatproto(
 	return 0;
 }
 
+/* Create a new metadata root directory. */
+static int
+create_metadir(
+	struct xfs_mount	*mp)
+{
+	struct xfs_inode	*ip = NULL;
+	struct xfs_trans	*tp;
+	int			error;
+	struct xfs_icreate_args	args = {
+		.mode		= S_IFDIR,
+		.flags		= XFS_ICREATE_UNLINKABLE,
+	};
+	xfs_ino_t		ino;
+
+	if (!xfs_has_metadir(mp))
+		return 0;
+
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_create,
+			libxfs_create_space_res(mp, MAXNAMELEN), 0, 0, &tp);
+	if (error)
+		return error;
+
+	/*
+	 * Create a new inode and set the sb pointer.  The primary super is
+	 * still marked inprogress, so we do not need to log the metadirino
+	 * change ourselves.
+	 */
+	error = -libxfs_dialloc(&tp, &args, &ino);
+	if (error)
+		goto out_cancel;
+	error = -libxfs_icreate(tp, ino, &args, &ip);
+	if (error)
+		goto out_cancel;
+	mp->m_sb.sb_metadirino = ino;
+
+	/*
+	 * Initialize the root directory.  There are no ILOCKs in userspace
+	 * so we do not need to drop it here.
+	 */
+	libxfs_metafile_set_iflag(tp, ip, XFS_METAFILE_DIR);
+	error = -libxfs_dir_init(tp, ip, ip);
+	if (error)
+		goto out_cancel;
+
+	error = -libxfs_trans_commit(tp);
+	if (error)
+		goto out_rele;
+
+	mp->m_metadirip = ip;
+	return 0;
+
+out_cancel:
+	libxfs_trans_cancel(tp);
+out_rele:
+	if (ip)
+		libxfs_irele(ip);
+	return error;
+}
+
 static void
 parseproto(
 	xfs_mount_t	*mp,
@@ -709,8 +768,15 @@ parseproto(
 		 * RT initialization.  Do this here to ensure that
 		 * the RT inodes get placed after the root inode.
 		 */
-		if (isroot)
+		if (isroot) {
+			error = create_metadir(mp);
+			if (error)
+				fail(
+	_("Creation of the metadata directory inode failed"),
+					error);
+
 			rtinit(mp);
+		}
 		tp = NULL;
 		for (;;) {
 			name = getdirentname(pp);
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index bbd0dbb6c80ab6..4e51caead9dac2 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -150,6 +150,7 @@ enum {
 	M_INOBTCNT,
 	M_BIGTIME,
 	M_AUTOFSCK,
+	M_METADIR,
 	M_MAX_OPTS,
 };
 
@@ -812,6 +813,7 @@ static struct opt_params mopts = {
 		[M_INOBTCNT] = "inobtcount",
 		[M_BIGTIME] = "bigtime",
 		[M_AUTOFSCK] = "autofsck",
+		[M_METADIR] = "metadir",
 		[M_MAX_OPTS] = NULL,
 	},
 	.subopt_params = {
@@ -861,6 +863,12 @@ static struct opt_params mopts = {
 		  .maxval = 1,
 		  .defaultval = 1,
 		},
+		{ .index = M_METADIR,
+		  .conflicts = { { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
 	},
 };
 
@@ -913,6 +921,7 @@ struct sb_feat_args {
 	bool	reflink;		/* XFS_SB_FEAT_RO_COMPAT_REFLINK */
 	bool	inobtcnt;		/* XFS_SB_FEAT_RO_COMPAT_INOBTCNT */
 	bool	bigtime;		/* XFS_SB_FEAT_INCOMPAT_BIGTIME */
+	bool	metadir;		/* XFS_SB_FEAT_INCOMPAT_METADIR */
 	bool	nodalign;
 	bool	nortalign;
 	bool	nrext64;
@@ -1048,7 +1057,8 @@ usage( void )
 /* blocksize */		[-b size=num]\n\
 /* config file */	[-c options=xxx]\n\
 /* metadata */		[-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1,reflink=0|1,\n\
-			    inobtcount=0|1,bigtime=0|1,autofsck=xxx]\n\
+			    inobtcount=0|1,bigtime=0|1,autofsck=xxx,\n\
+			    metadir=0|1]\n\
 /* data subvol */	[-d agcount=n,agsize=n,file,name=xxx,size=num,\n\
 			    (sunit=value,swidth=value|su=num,sw=num|noalign),\n\
 			    sectsize=num,concurrency=num]\n\
@@ -1883,6 +1893,9 @@ meta_opts_parser(
 				illegal(value, "m autofsck");
 		}
 		break;
+	case M_METADIR:
+		cli->sb_feat.metadir = getnum(value, opts, subopt);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2465,6 +2478,14 @@ _("autofsck not supported without CRC support\n"));
 			usage();
 		}
 		cli->autofsck = FSPROP_AUTOFSCK_UNSET;
+
+		if (cli->sb_feat.metadir &&
+		    cli_opt_set(&mopts, M_METADIR)) {
+			fprintf(stderr,
+_("metadata directory not supported without CRC support\n"));
+			usage();
+		}
+		cli->sb_feat.metadir = false;
 	}
 
 	if (!cli->sb_feat.finobt) {
@@ -3568,7 +3589,8 @@ sb_set_features(
 	 * the sb_bad_features2 field. To avoid older kernels mounting
 	 * filesystems they shouldn't, set both field to the same value.
 	 */
-	sbp->sb_bad_features2 = sbp->sb_features2;
+	if (!fp->metadir)
+		sbp->sb_bad_features2 = sbp->sb_features2;
 
 	if (!fp->crcs_enabled)
 		return;
@@ -3618,6 +3640,8 @@ sb_set_features(
 		 */
 		sbp->sb_versionnum |= XFS_SB_VERSION_ATTRBIT;
 	}
+	if (fp->metadir)
+		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_METADIR;
 }
 
 /*
@@ -4053,6 +4077,7 @@ finish_superblock_setup(
 	platform_uuid_copy(&sbp->sb_meta_uuid, &cfg->uuid);
 	sbp->sb_logstart = cfg->logstart;
 	sbp->sb_rootino = sbp->sb_rbmino = sbp->sb_rsumino = NULLFSINO;
+	sbp->sb_metadirino = NULLFSINO;
 	sbp->sb_agcount = (xfs_agnumber_t)cfg->agcount;
 	sbp->sb_rbmblocks = cfg->rtbmblocks;
 	sbp->sb_logblocks = (xfs_extlen_t)cfg->logblocks;
@@ -4279,6 +4304,8 @@ rewrite_secondary_superblocks(
 	}
 	dsb = buf->b_addr;
 	dsb->sb_rootino = cpu_to_be64(mp->m_sb.sb_rootino);
+	if (xfs_has_metadir(mp))
+		dsb->sb_metadirino = cpu_to_be64(mp->m_sb.sb_metadirino);
 	libxfs_buf_mark_dirty(buf);
 	libxfs_buf_relse(buf);
 
@@ -4297,6 +4324,8 @@ rewrite_secondary_superblocks(
 	}
 	dsb = buf->b_addr;
 	dsb->sb_rootino = cpu_to_be64(mp->m_sb.sb_rootino);
+	if (xfs_has_metadir(mp))
+		dsb->sb_metadirino = cpu_to_be64(mp->m_sb.sb_metadirino);
 	libxfs_buf_mark_dirty(buf);
 	libxfs_buf_relse(buf);
 }



* [PATCH 1/4] libxfs: resync libxfs_alloc_file_space interface with the kernel
  2024-12-23 21:35 ` [PATCHSET v6.2 4/8] mkfs: make protofiles less janky Darrick J. Wong
@ 2024-12-23 21:57   ` Darrick J. Wong
  2024-12-23 21:57   ` [PATCH 2/4] mkfs: support copying in large or sparse files Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:57 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make the userspace xfs_alloc_file_space behave (more or less) like the
kernel version, at least as far as the interface goes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
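One visible effect of the resync is that the block count is now derived from
the rounded start and end of the byte range instead of from the length alone.
The sketch below shows why that matters for an unaligned offset.  It is a
standalone illustration under stated assumptions: the helper names are
hypothetical stand-ins for XFS_B_TO_FSBT and XFS_B_TO_FSB, and the block size
is passed as a log2 value rather than read from a mount structure.

```c
#include <stdint.h>

/* Round a byte offset down to a block number (cf. XFS_B_TO_FSBT). */
uint64_t
bytes_to_fsb_trunc(
	uint64_t	bytes,
	unsigned int	blocklog)
{
	return bytes >> blocklog;
}

/* Round a byte count up to whole blocks (cf. XFS_B_TO_FSB). */
uint64_t
bytes_to_fsb_roundup(
	uint64_t	bytes,
	unsigned int	blocklog)
{
	uint64_t	blockmask = (1ULL << blocklog) - 1;

	return (bytes + blockmask) >> blocklog;
}

/* Blocks needed to back the byte range [offset, offset + len). */
uint64_t
alloc_range_fsb(
	uint64_t	offset,
	uint64_t	len,
	unsigned int	blocklog)
{
	uint64_t	start_fsb = bytes_to_fsb_trunc(offset, blocklog);
	uint64_t	end_fsb = bytes_to_fsb_roundup(offset + len, blocklog);

	return end_fsb - start_fsb;
}
```

With 4096-byte blocks (blocklog 12), a 4096-byte request at offset 1000 spans
two blocks, while rounding the length alone would give one.  The old and new
calculations agree whenever the offset is block aligned.
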
 include/libxfs.h |    4 +-
 libxfs/util.c    |  140 ++++++++++++++++++++++++++++++++++++------------------
 mkfs/proto.c     |    2 -
 3 files changed, 96 insertions(+), 50 deletions(-)


diff --git a/include/libxfs.h b/include/libxfs.h
index 348d36be192966..878fefbbde7524 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -176,8 +176,8 @@ extern int	libxfs_log_header(char *, uuid_t *, int, int, int, xfs_lsn_t,
 
 /* Shared utility routines */
 
-extern int	libxfs_alloc_file_space (struct xfs_inode *, xfs_off_t,
-				xfs_off_t, int, int);
+int	libxfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
+		xfs_off_t len, uint32_t bmapi_flags);
 
 /* XXX: this is messy and needs fixing */
 #ifndef __LIBXFS_INTERNAL_XFS_H__
diff --git a/libxfs/util.c b/libxfs/util.c
index 97da94506aee01..e5892fc86c3e92 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -179,78 +179,124 @@ libxfs_mod_incore_sb(
  */
 int
 libxfs_alloc_file_space(
-	xfs_inode_t	*ip,
-	xfs_off_t	offset,
-	xfs_off_t	len,
-	int		alloc_type,
-	int		attr_flags)
+	struct xfs_inode	*ip,
+	xfs_off_t		offset,
+	xfs_off_t		len,
+	uint32_t		bmapi_flags)
 {
-	xfs_mount_t	*mp;
-	xfs_off_t	count;
-	xfs_filblks_t	datablocks;
-	xfs_filblks_t	allocated_fsb;
-	xfs_filblks_t	allocatesize_fsb;
-	xfs_bmbt_irec_t *imapp;
-	xfs_bmbt_irec_t imaps[1];
-	int		reccount;
-	uint		resblks;
-	xfs_fileoff_t	startoffset_fsb;
-	xfs_trans_t	*tp;
-	int		xfs_bmapi_flags;
-	int		error;
+	xfs_mount_t		*mp = ip->i_mount;
+	xfs_off_t		count;
+	xfs_filblks_t		allocatesize_fsb;
+	xfs_extlen_t		extsz, temp;
+	xfs_fileoff_t		startoffset_fsb;
+	xfs_fileoff_t		endoffset_fsb;
+	int			rt;
+	xfs_trans_t		*tp;
+	xfs_bmbt_irec_t		imaps[1], *imapp;
+	int			error = 0;
 
 	if (len <= 0)
 		return -EINVAL;
 
+	rt = XFS_IS_REALTIME_INODE(ip);
+	extsz = xfs_get_extsz_hint(ip);
+
 	count = len;
-	error = 0;
 	imapp = &imaps[0];
-	reccount = 1;
-	xfs_bmapi_flags = alloc_type ? XFS_BMAPI_PREALLOC : 0;
-	mp = ip->i_mount;
-	startoffset_fsb = XFS_B_TO_FSBT(mp, offset);
-	allocatesize_fsb = XFS_B_TO_FSB(mp, count);
+	startoffset_fsb	= XFS_B_TO_FSBT(mp, offset);
+	endoffset_fsb = XFS_B_TO_FSB(mp, offset + count);
+	allocatesize_fsb = endoffset_fsb - startoffset_fsb;
 
-	/* allocate file space until done or until there is an error */
+	/*
+	 * Allocate file space until done or until there is an error
+	 */
 	while (allocatesize_fsb && !error) {
-		datablocks = allocatesize_fsb;
+		xfs_fileoff_t	s, e;
+		unsigned int	dblocks, rblocks, resblks;
+		int		nimaps = 1;
 
-		resblks = (uint)XFS_DIOSTRAT_SPACE_RES(mp, datablocks);
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
-					0, 0, &tp);
 		/*
-		 * Check for running out of space
+		 * Determine space reservations for data/realtime.
 		 */
-		if (error) {
-			ASSERT(error == -ENOSPC);
+		if (unlikely(extsz)) {
+			s = startoffset_fsb;
+			do_div(s, extsz);
+			s *= extsz;
+			e = startoffset_fsb + allocatesize_fsb;
+			div_u64_rem(startoffset_fsb, extsz, &temp);
+			if (temp)
+				e += temp;
+			div_u64_rem(e, extsz, &temp);
+			if (temp)
+				e += extsz - temp;
+		} else {
+			s = 0;
+			e = allocatesize_fsb;
+		}
+
+		/*
+		 * The transaction reservation is limited to a 32-bit block
+		 * count, hence we need to limit the number of blocks we are
+		 * trying to reserve to avoid an overflow. We can't allocate
+		 * more than @nimaps extents, and an extent is limited on disk
+		 * to XFS_MAX_BMBT_EXTLEN (21 bits), so use that to enforce the
+		 * limit.
+		 */
+		resblks = min_t(xfs_fileoff_t, (e - s),
+				(XFS_MAX_BMBT_EXTLEN * nimaps));
+		if (unlikely(rt)) {
+			dblocks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
+			rblocks = resblks;
+		} else {
+			dblocks = XFS_DIOSTRAT_SPACE_RES(mp, resblks);
+			rblocks = 0;
+		}
+
+		error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write,
+				dblocks, rblocks, false, &tp);
+		if (error)
 			break;
-		}
-		xfs_trans_ijoin(tp, ip, 0);
 
-		error = xfs_bmapi_write(tp, ip, startoffset_fsb, allocatesize_fsb,
-				xfs_bmapi_flags, 0, imapp, &reccount);
+		error = xfs_iext_count_extend(tp, ip, XFS_DATA_FORK,
+				XFS_IEXT_ADD_NOSPLIT_CNT);
+		if (error)
+			goto error;
 
+		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
+				allocatesize_fsb, bmapi_flags, 0, imapp,
+				&nimaps);
 		if (error)
-			goto error0;
+			goto error;
+
+		ip->i_diflags |= XFS_DIFLAG_PREALLOC;
+		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-		/*
-		 * Complete the transaction
-		 */
 		error = xfs_trans_commit(tp);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		if (error)
 			break;
 
-		allocated_fsb = imapp->br_blockcount;
-		if (reccount == 0)
-			return -ENOSPC;
-
-		startoffset_fsb += allocated_fsb;
-		allocatesize_fsb -= allocated_fsb;
+		/*
+		 * If xfs_bmapi_write finds a delalloc extent at the requested
+		 * range, it tries to convert the entire delalloc extent to a
+		 * real allocation.
+		 * If the allocator cannot find a single free extent large
+		 * enough to cover the start block of the requested range,
+		 * xfs_bmapi_write will return 0 but leave *nimaps set to 0.
+		 * In that case we simply need to keep looping with the same
+		 * startoffset_fsb.
+		 */
+		if (nimaps) {
+			startoffset_fsb += imapp->br_blockcount;
+			allocatesize_fsb -= imapp->br_blockcount;
+		}
 	}
+
 	return error;
 
-error0:	/* Cancel bmap, cancel trans */
+error:
 	xfs_trans_cancel(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }
 
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 05c2621f8a0b13..0764064e043e97 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -203,7 +203,7 @@ rsvfile(
 	int		error;
 	xfs_trans_t	*tp;
 
-	error = -libxfs_alloc_file_space(ip, 0, llen, 1, 0);
+	error = -libxfs_alloc_file_space(ip, 0, llen, XFS_BMAPI_PREALLOC);
 
 	if (error) {
 		fail(_("error reserving space for a file"), error);



* [PATCH 2/4] mkfs: support copying in large or sparse files
  2024-12-23 21:35 ` [PATCHSET v6.2 4/8] mkfs: make protofiles less janky Darrick J. Wong
  2024-12-23 21:57   ` [PATCH 1/4] libxfs: resync libxfs_alloc_file_space interface with the kernel Darrick J. Wong
@ 2024-12-23 21:57   ` Darrick J. Wong
  2024-12-23 21:57   ` [PATCH 3/4] mkfs: support copying in xattrs Darrick J. Wong
  2024-12-23 21:58   ` [PATCH 4/4] mkfs: add a utility to generate protofiles Darrick J. Wong
  3 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:57 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Restructure the protofile code to handle sparse files and files that are
larger than the program's address space.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
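The sparse file handling rests on walking the source file with SEEK_DATA and
SEEK_HOLE so that only data regions are read and copied.  A minimal standalone
sketch of that loop follows.  It is not the mkfs code: walk_data_ranges, the
callback type, sum_range and count_data_bytes are hypothetical names, and the
fallback SEEK_* values are the Linux ones, used only if the libc headers do
not provide them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* asks glibc to expose SEEK_DATA and SEEK_HOLE */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, used only if the libc headers did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA	3
#define SEEK_HOLE	4
#endif

/* Hypothetical callback type: invoked once per data region. */
typedef void (*data_range_fn)(off_t pos, off_t len, void *priv);

/*
 * Visit every data region of fd and skip the holes.  Returns the end of
 * the last data region (0 if there was none), or -1 with errno set if
 * lseek failed for a reason other than "no more data" (ENXIO).
 */
off_t
walk_data_ranges(
	int		fd,
	data_range_fn	fn,
	void		*priv)
{
	off_t		data_pos;
	off_t		eof = 0;

	data_pos = lseek(fd, 0, SEEK_DATA);
	while (data_pos >= 0) {
		off_t	hole_pos = lseek(fd, data_pos, SEEK_HOLE);

		if (hole_pos < 0)
			return -1;
		if (hole_pos <= data_pos)
			break;		/* defensive: no forward progress */

		fn(data_pos, hole_pos - data_pos, priv);
		eof = hole_pos;
		data_pos = lseek(fd, hole_pos, SEEK_DATA);
	}
	if (data_pos < 0 && errno != ENXIO)
		return -1;
	return eof;
}

/* Example callback: add each region's length to an off_t total. */
void
sum_range(
	off_t		pos,
	off_t		len,
	void		*priv)
{
	(void)pos;
	*(off_t *)priv += len;
}

/* Convenience wrapper: total data bytes in the file at path, or -1. */
off_t
count_data_bytes(
	const char	*path)
{
	off_t		total = 0;
	int		fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (walk_data_ranges(fd, sum_range, &total) < 0)
		total = -1;
	close(fd);
	return total;
}
```

As in the writefile() hunk, the value returned is the end of the last data
region, so a source file that ends in a hole would be reported as shorter
than its st_size.  On filesystems without hole reporting the kernel treats
the whole file as one data region, and the loop degenerates to a plain copy.
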
 include/libxfs.h |    2 +
 libxfs/util.c    |   67 +++++++++++++++++++++
 mkfs/proto.c     |  175 ++++++++++++++++++++++++++++++++----------------------
 3 files changed, 174 insertions(+), 70 deletions(-)


diff --git a/include/libxfs.h b/include/libxfs.h
index 878fefbbde7524..985646e6ad89d1 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -178,6 +178,8 @@ extern int	libxfs_log_header(char *, uuid_t *, int, int, int, xfs_lsn_t,
 
 int	libxfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t len, uint32_t bmapi_flags);
+int	libxfs_file_write(struct xfs_inode *ip, void *buf, off_t pos,
+		size_t len);
 
 /* XXX: this is messy and needs fixing */
 #ifndef __LIBXFS_INTERNAL_XFS_H__
diff --git a/libxfs/util.c b/libxfs/util.c
index e5892fc86c3e92..8c2ecff5855775 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -527,3 +527,70 @@ get_random_u32(void)
 	return ret;
 }
 #endif
+
+/*
+ * Write a buffer to a file on the data device.  There must not be sparse holes
+ * or unwritten extents.
+ */
+int
+libxfs_file_write(
+	struct xfs_inode	*ip,
+	void			*buf,
+	off_t			pos,
+	size_t			len)
+{
+	struct xfs_bmbt_irec	map;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_buf		*bp;
+	xfs_fileoff_t		bno = XFS_B_TO_FSBT(mp, pos);
+	xfs_fileoff_t		end_bno = XFS_B_TO_FSB(mp, pos + len);
+	unsigned int		block_off = pos % mp->m_sb.sb_blocksize;
+	size_t			count;
+	size_t			bcount;
+	int			nmap;
+	int			error = 0;
+
+	/* Write up to 1MB at a time. */
+	while (bno < end_bno) {
+		xfs_filblks_t	maplen;
+
+		maplen = min(end_bno - bno, XFS_B_TO_FSBT(mp, 1048576));
+		nmap = 1;
+		error = libxfs_bmapi_read(ip, bno, maplen, &map, &nmap, 0);
+		if (error)
+			return error;
+		if (nmap != 1)
+			return -ENOSPC;
+
+		if (map.br_startblock == HOLESTARTBLOCK ||
+		    map.br_state == XFS_EXT_UNWRITTEN)
+			return -EINVAL;
+
+		error = libxfs_buf_get(mp->m_dev,
+				XFS_FSB_TO_DADDR(mp, map.br_startblock),
+				XFS_FSB_TO_BB(mp, map.br_blockcount),
+				&bp);
+		if (error)
+			break;
+		bp->b_ops = NULL;
+
+		if (block_off > 0)
+			memset((char *)bp->b_addr, 0, block_off);
+		count = min(len, XFS_FSB_TO_B(mp, map.br_blockcount));
+		memmove(bp->b_addr, buf + block_off, count);
+		bcount = BBTOB(bp->b_length);
+		if (count < bcount)
+			memset((char *)bp->b_addr + block_off + count, 0,
+					bcount - (block_off + count));
+
+		libxfs_buf_mark_dirty(bp);
+		libxfs_buf_relse(bp);
+
+		buf += count;
+		len -= count;
+		bno += map.br_blockcount;
+		block_off = 0;
+	}
+
+	return error;
+}
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 0764064e043e97..6946c22ff14d2a 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -16,7 +16,7 @@ static char *getstr(char **pp);
 static void fail(char *msg, int i);
 static struct xfs_trans * getres(struct xfs_mount *mp, uint blocks);
 static void rsvfile(xfs_mount_t *mp, xfs_inode_t *ip, long long len);
-static char *newregfile(char **pp, int *len);
+static int newregfile(char **pp, char **fname);
 static void rtinit(xfs_mount_t *mp);
 static long filesize(int fd);
 static int slashes_are_spaces;
@@ -261,88 +261,120 @@ writesymlink(
 }
 
 static void
-writefile(
-	struct xfs_trans	*tp,
+writefile_range(
 	struct xfs_inode	*ip,
-	char			*buf,
-	int			len)
+	const char		*fname,
+	int			fd,
+	off_t			pos,
+	uint64_t		len)
 {
-	struct xfs_bmbt_irec	map;
-	struct xfs_mount	*mp;
-	struct xfs_buf		*bp;
-	xfs_daddr_t		d;
-	xfs_extlen_t		nb;
-	int			nmap;
+	static char		buf[131072];
 	int			error;
 
-	mp = ip->i_mount;
-	if (len > 0) {
-		int	bcount;
+	if (XFS_IS_REALTIME_INODE(ip)) {
+		fprintf(stderr,
+ _("%s: creating realtime files from proto file not supported.\n"),
+				progname);
+		exit(1);
+	}
 
-		nb = XFS_B_TO_FSB(mp, len);
-		nmap = 1;
-		error = -libxfs_bmapi_write(tp, ip, 0, nb, 0, nb, &map, &nmap);
-		if (error == ENOSYS && XFS_IS_REALTIME_INODE(ip)) {
-			fprintf(stderr,
-	_("%s: creating realtime files from proto file not supported.\n"),
-					progname);
+	while (len > 0) {
+		ssize_t		read_len;
+
+		read_len = pread(fd, buf, min(len, sizeof(buf)), pos);
+		if (read_len < 0) {
+			fprintf(stderr, _("%s: read failed on %s: %s\n"),
+					progname, fname, strerror(errno));
 			exit(1);
 		}
-		if (error) {
+
+		error = -libxfs_alloc_file_space(ip, pos, read_len, 0);
+		if (error)
 			fail(_("error allocating space for a file"), error);
+
+		error = -libxfs_file_write(ip, buf, pos, read_len);
+		if (error)
+			fail(_("error writing file"), error);
+
+		pos += read_len;
+		len -= read_len;
+	}
+}
+
+static void
+writefile(
+	struct xfs_inode	*ip,
+	const char		*fname,
+	int			fd)
+{
+	struct xfs_trans	*tp;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct stat		statbuf;
+	off_t			data_pos;
+	off_t			eof = 0;
+	int			error;
+
+	/* do not try to read from non-regular files */
+	error = fstat(fd, &statbuf);
+	if (error < 0)
+		fail(_("unable to stat file to copyin"), errno);
+
+	if (!S_ISREG(statbuf.st_mode))
+		return;
+
+	data_pos = lseek(fd, 0, SEEK_DATA);
+	while (data_pos >= 0) {
+		off_t		hole_pos;
+
+		hole_pos = lseek(fd, data_pos, SEEK_HOLE);
+		if (hole_pos < 0) {
+			/* save error, break */
+			data_pos = hole_pos;
+			break;
 		}
-		if (nmap != 1) {
-			fprintf(stderr,
-				_("%s: cannot allocate space for file\n"),
-				progname);
-			exit(1);
-		}
-		d = XFS_FSB_TO_DADDR(mp, map.br_startblock);
-		error = -libxfs_trans_get_buf(NULL, mp->m_dev, d,
-				nb << mp->m_blkbb_log, 0, &bp);
-		if (error) {
-			fprintf(stderr,
-				_("%s: cannot allocate buffer for file\n"),
-				progname);
-			exit(1);
+		if (hole_pos <= data_pos) {
+			/* shouldn't happen??? */
+			break;
 		}
-		memmove(bp->b_addr, buf, len);
-		bcount = BBTOB(bp->b_length);
-		if (len < bcount)
-			memset((char *)bp->b_addr + len, 0, bcount - len);
-		libxfs_buf_mark_dirty(bp);
-		libxfs_buf_relse(bp);
+
+		writefile_range(ip, fname, fd, data_pos, hole_pos - data_pos);
+		eof = hole_pos;
+
+		data_pos = lseek(fd, hole_pos, SEEK_DATA);
 	}
-	ip->i_disk_size = len;
+	if (data_pos < 0 && errno != ENXIO)
+		fail(_("error finding file data to import"), errno);
+
+	/* extend EOF only after writing all the file data */
+	error = -libxfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange, 0, 0,
+			false, &tp);
+	if (error)
+		fail(_("error creating isize transaction"), error);
+
+	libxfs_trans_ijoin(tp, ip, 0);
+	ip->i_disk_size = eof;
+	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	error = -libxfs_trans_commit(tp);
+	if (error)
+		fail(_("error committing isize transaction"), error);
 }
 
-static char *
+static int
 newregfile(
 	char		**pp,
-	int		*len)
+	char		**fname)
 {
-	char		*buf;
 	int		fd;
-	char		*fname;
-	long		size;
+	off_t		size;
 
-	fname = getstr(pp);
-	if ((fd = open(fname, O_RDONLY)) < 0 || (size = filesize(fd)) < 0) {
+	*fname = getstr(pp);
+	if ((fd = open(*fname, O_RDONLY)) < 0 || (size = filesize(fd)) < 0) {
 		fprintf(stderr, _("%s: cannot open %s: %s\n"),
-			progname, fname, strerror(errno));
+			progname, *fname, strerror(errno));
 		exit(1);
 	}
-	if ((*len = (int)size)) {
-		buf = malloc(size);
-		if (read(fd, buf, size) < size) {
-			fprintf(stderr, _("%s: read failed on %s: %s\n"),
-				progname, fname, strerror(errno));
-			exit(1);
-		}
-	} else
-		buf = NULL;
-	close(fd);
-	return buf;
+
+	return fd;
 }
 
 static void
@@ -552,7 +584,8 @@ parseproto(
 	int		fmt;
 	int		i;
 	xfs_inode_t	*ip;
-	int		len;
+	int		fd = -1;
+	off_t		len;
 	long long	llen;
 	int		majdev;
 	int		mindev;
@@ -563,6 +596,7 @@ parseproto(
 	int		isroot = 0;
 	struct cred	creds;
 	char		*value;
+	char		*fname = NULL;
 	struct xfs_name	xname;
 	struct xfs_parent_args *ppargs = NULL;
 
@@ -636,16 +670,13 @@ parseproto(
 	flags = XFS_ILOG_CORE;
 	switch (fmt) {
 	case IF_REGULAR:
-		buf = newregfile(pp, &len);
-		tp = getres(mp, XFS_B_TO_FSB(mp, len));
+		fd = newregfile(pp, &fname);
+		tp = getres(mp, 0);
 		ppargs = newpptr(mp);
 		error = creatproto(&tp, pip, mode | S_IFREG, 0, &creds, fsxp,
 				&ip);
 		if (error)
 			fail(_("Inode allocation failed"), error);
-		writefile(tp, ip, buf, len);
-		if (buf)
-			free(buf);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_REG_FILE;
 		newdirent(mp, tp, pip, &xname, ip, ppargs);
@@ -800,6 +831,10 @@ parseproto(
 	}
 
 	libxfs_parent_finish(mp, ppargs);
+	if (fmt == IF_REGULAR) {
+		writefile(ip, fname, fd);
+		close(fd);
+	}
 	libxfs_irele(ip);
 }
 
@@ -950,7 +985,7 @@ rtinit(
 	rtfreesp_init(mp);
 }
 
-static long
+static off_t
 filesize(
 	int		fd)
 {
@@ -958,5 +993,5 @@ filesize(
 
 	if (fstat(fd, &stb) < 0)
 		return -1;
-	return (long)stb.st_size;
+	return stb.st_size;
 }



* [PATCH 3/4] mkfs: support copying in xattrs
  2024-12-23 21:35 ` [PATCHSET v6.2 4/8] mkfs: make protofiles less janky Darrick J. Wong
  2024-12-23 21:57   ` [PATCH 1/4] libxfs: resync libxfs_alloc_file_space interface with the kernel Darrick J. Wong
  2024-12-23 21:57   ` [PATCH 2/4] mkfs: support copying in large or sparse files Darrick J. Wong
@ 2024-12-23 21:57   ` Darrick J. Wong
  2024-12-23 21:58   ` [PATCH 4/4] mkfs: add a utility to generate protofiles Darrick J. Wong
  3 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:57 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Update the protofile code to import extended attributes from the source
files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
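The import walks the NUL-separated name list returned by flistxattr(2) and
maps each name's namespace prefix onto an attr filter.  The sketch below
isolates those two steps so they can be read without libxfs.  The enum, the
table and the function names are hypothetical; the prefix strings are the
values of XATTR_TRUSTED_PREFIX, XATTR_SECURITY_PREFIX and XATTR_USER_PREFIX
from <linux/xattr.h>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the LIBXFS_ATTR_* filter values. */
enum attr_ns {
	ATTR_NS_USER,
	ATTR_NS_ROOT,		/* "trusted." names */
	ATTR_NS_SECURE,		/* "security." names */
};

struct ns_prefix {
	const char	*prefix;
	enum attr_ns	ns;
};

static const struct ns_prefix prefixes[] = {
	{ "trusted.",	ATTR_NS_ROOT },
	{ "security.",	ATTR_NS_SECURE },
	{ "user.",	ATTR_NS_USER },
};

/*
 * Split a full xattr name into a namespace and a bare name.  Names with
 * an unrecognised prefix are kept whole and treated as user attributes.
 */
enum attr_ns
classify_xattr(
	const char	*attrname,
	const char	**barename)
{
	size_t		i;

	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
		size_t	plen = strlen(prefixes[i].prefix);

		if (!strncmp(attrname, prefixes[i].prefix, plen)) {
			*barename = attrname + plen;
			return prefixes[i].ns;
		}
	}
	*barename = attrname;
	return ATTR_NS_USER;
}

/* Count the names in a NUL-separated list such as flistxattr returns. */
size_t
count_xattr_names(
	const char	*list,
	size_t		len)
{
	const char	*p;
	const char	*end = list + len;
	size_t		n = 0;

	for (p = list; p < end; p += strlen(p) + 1)
		n++;
	return n;
}
```

Note the fall-through case: a name such as system.posix_acl_access matches
none of the three prefixes, so in this sketch, as in the writeattr() hunk, it
is stored under its full name with no namespace filter.
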
 mkfs/proto.c |   94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)


diff --git a/mkfs/proto.c b/mkfs/proto.c
index 6946c22ff14d2a..7a0493ec71cfd9 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -6,6 +6,8 @@
 
 #include "libxfs.h"
 #include <sys/stat.h>
+#include <sys/xattr.h>
+#include <linux/xattr.h>
 #include "libfrog/convert.h"
 #include "proto.h"
 
@@ -359,6 +361,97 @@ writefile(
 		fail(_("error committing isize transaction"), error);
 }
 
+static void
+writeattr(
+	struct xfs_inode	*ip,
+	const char		*fname,
+	int			fd,
+	const char		*attrname,
+	char			*valuebuf,
+	size_t			valuelen)
+{
+	struct xfs_da_args	args = {
+		.dp		= ip,
+		.geo		= ip->i_mount->m_attr_geo,
+		.owner		= ip->i_ino,
+		.whichfork	= XFS_ATTR_FORK,
+		.op_flags	= XFS_DA_OP_OKNOENT,
+		.value		= valuebuf,
+	};
+	ssize_t			ret;
+	int			error;
+
+	ret = fgetxattr(fd, attrname, valuebuf, valuelen);
+	if (ret < 0) {
+		if (errno == EOPNOTSUPP)
+			return;
+		fail(_("error collecting xattr value"), errno);
+	}
+	if (ret == 0)
+		return;
+
+	if (!strncmp(attrname, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN)) {
+		args.name = (unsigned char *)attrname + XATTR_TRUSTED_PREFIX_LEN;
+		args.attr_filter = LIBXFS_ATTR_ROOT;
+	} else if (!strncmp(attrname, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) {
+		args.name = (unsigned char *)attrname + XATTR_SECURITY_PREFIX_LEN;
+		args.attr_filter = LIBXFS_ATTR_SECURE;
+	} else if (!strncmp(attrname, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
+		args.name = (unsigned char *)attrname + XATTR_USER_PREFIX_LEN;
+		args.attr_filter = 0;
+	} else {
+		args.name = (unsigned char *)attrname;
+		args.attr_filter = 0;
+	}
+	args.namelen = strlen((char *)args.name);
+
+	args.valuelen = ret;
+	libxfs_attr_sethash(&args);
+
+	error = -libxfs_attr_set(&args, XFS_ATTRUPDATE_UPSERT, false);
+	if (error)
+		fail(_("error setting xattr value"), error);
+}
+
+static void
+writeattrs(
+	struct xfs_inode	*ip,
+	const char		*fname,
+	int			fd)
+{
+	char			*namebuf, *p, *end;
+	char			*valuebuf = NULL;
+	ssize_t			ret;
+
+	namebuf = malloc(XATTR_LIST_MAX);
+	if (!namebuf)
+		fail(_("error allocating xattr name buffer"), errno);
+
+	ret = flistxattr(fd, namebuf, XATTR_LIST_MAX);
+	if (ret < 0) {
+		if (errno == EOPNOTSUPP)
+			goto out_namebuf;
+		fail(_("error collecting xattr names"), errno);
+	}
+
+	p = namebuf;
+	end = namebuf + ret;
+	for (p = namebuf; p < end; p += strlen(p) + 1) {
+		if (!valuebuf) {
+			valuebuf = malloc(ATTR_MAX_VALUELEN);
+			if (!valuebuf)
+				fail(_("error allocating xattr value buffer"),
+						errno);
+		}
+
+		writeattr(ip, fname, fd, p, valuebuf, ATTR_MAX_VALUELEN);
+	}
+
+	free(valuebuf);
+out_namebuf:
+	free(namebuf);
+}
+
 static int
 newregfile(
 	char		**pp,
@@ -833,6 +926,7 @@ parseproto(
 	libxfs_parent_finish(mp, ppargs);
 	if (fmt == IF_REGULAR) {
 		writefile(ip, fname, fd);
+		writeattrs(ip, fname, fd);
 		close(fd);
 	}
 	libxfs_irele(ip);



* [PATCH 4/4] mkfs: add a utility to generate protofiles
  2024-12-23 21:35 ` [PATCHSET v6.2 4/8] mkfs: make protofiles less janky Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-12-23 21:57   ` [PATCH 3/4] mkfs: support copying in xattrs Darrick J. Wong
@ 2024-12-23 21:58   ` Darrick J. Wong
  3 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:58 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a new utility to generate mkfs protofiles from a directory tree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 man/man8/xfs_protofile.8 |   33 ++++++++++
 mkfs/Makefile            |   10 +++
 mkfs/xfs_protofile.in    |  152 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/xfs_protofile.8
 create mode 100644 mkfs/xfs_protofile.in


diff --git a/man/man8/xfs_protofile.8 b/man/man8/xfs_protofile.8
new file mode 100644
index 00000000000000..75090c138f3393
--- /dev/null
+++ b/man/man8/xfs_protofile.8
@@ -0,0 +1,33 @@
+.TH xfs_protofile 8
+.SH NAME
+xfs_protofile \- create a protofile for use with mkfs.xfs
+.SH SYNOPSIS
+.B xfs_protofile
+.I path
+[
+.I paths...
+]
+.br
+.B xfs_protofile \-V
+.SH DESCRIPTION
+.B xfs_protofile
+walks a directory tree to generate a protofile.
+The protofile format is specified in the
+.BR mkfs.xfs (8)
+manual page and is derived from 3rd edition Unix.
+.SH OPTIONS
+.TP 1.0i
+.I path
+Create protofile directives to copy this path into the root directory.
+If the path is a directory, protofile directives will be emitted to
+replicate the entire subtree as a subtree of the root directory.
+If the path is not a directory, protofile directives will be emitted
+to create the file as an entry in the root directory.
+The first path must resolve to a directory.
+
+.SH BUGS
+Filenames cannot contain spaces.
+Extended attributes are not copied into the filesystem.
+
+.PD
+.RE
diff --git a/mkfs/Makefile b/mkfs/Makefile
index 754a17ea5137b7..3d3f08ad54f844 100644
--- a/mkfs/Makefile
+++ b/mkfs/Makefile
@@ -6,6 +6,7 @@ TOPDIR = ..
 include $(TOPDIR)/include/builddefs
 
 LTCOMMAND = mkfs.xfs
+XFS_PROTOFILE = xfs_protofile
 
 HFILES =
 CFILES = proto.c xfs_mkfs.c
@@ -23,14 +24,21 @@ LLDLIBS += $(LIBXFS) $(LIBXCMD) $(LIBFROG) $(LIBRT) $(LIBBLKID) \
 	$(LIBUUID) $(LIBINIH) $(LIBURCU) $(LIBPTHREAD)
 LTDEPENDENCIES += $(LIBXFS) $(LIBXCMD) $(LIBFROG)
 LLDFLAGS = -static-libtool-libs
+DIRT = $(XFS_PROTOFILE)
 
-default: depend $(LTCOMMAND) $(CFGFILES)
+default: depend $(LTCOMMAND) $(CFGFILES) $(XFS_PROTOFILE)
 
 include $(BUILDRULES)
 
+$(XFS_PROTOFILE): $(XFS_PROTOFILE).in
+	@echo "    [SED]    $@"
+	$(Q)$(SED) -e "s|@pkg_version@|$(PKG_VERSION)|g" < $< > $@
+	$(Q)chmod a+x $@
+
 install: default
 	$(INSTALL) -m 755 -d $(PKG_SBIN_DIR)
 	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_SBIN_DIR)
+	$(INSTALL) -m 755 $(XFS_PROTOFILE) $(PKG_SBIN_DIR)
 	$(INSTALL) -m 755 -d $(MKFS_CFG_DIR)
 	$(INSTALL) -m 644 $(CFGFILES) $(MKFS_CFG_DIR)
 
diff --git a/mkfs/xfs_protofile.in b/mkfs/xfs_protofile.in
new file mode 100644
index 00000000000000..9aee4336888523
--- /dev/null
+++ b/mkfs/xfs_protofile.in
@@ -0,0 +1,152 @@
+#!/usr/bin/python3
+
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2018-2024 Oracle.  All rights reserved.
+#
+# Author: Darrick J. Wong <djwong@kernel.org>
+
+# Walk a filesystem tree to generate a protofile for mkfs.
+
+import os
+import argparse
+import sys
+import stat
+
+def emit_proto_header():
+	'''Emit the protofile header.'''
+	print('/')
+	print('0 0')
+
+def stat_to_str(statbuf):
+	'''Convert a stat buffer to a proto string.'''
+
+	if stat.S_ISREG(statbuf.st_mode):
+		type = '-'
+	elif stat.S_ISCHR(statbuf.st_mode):
+		type = 'c'
+	elif stat.S_ISBLK(statbuf.st_mode):
+		type = 'b'
+	elif stat.S_ISFIFO(statbuf.st_mode):
+		type = 'p'
+	elif stat.S_ISDIR(statbuf.st_mode):
+		type = 'd'
+	elif stat.S_ISLNK(statbuf.st_mode):
+		type = 'l'
+
+	if statbuf.st_mode & stat.S_ISUID:
+		suid = 'u'
+	else:
+		suid = '-'
+
+	if statbuf.st_mode & stat.S_ISGID:
+		sgid = 'g'
+	else:
+		sgid = '-'
+
+	perms = stat.S_IMODE(statbuf.st_mode) & 0o777
+
+	return '%s%s%s%03o %d %d' % (type, suid, sgid, perms, statbuf.st_uid, \
+			statbuf.st_gid)
+
+def stat_to_extra(statbuf, fullpath):
+	'''Compute the extras column for a protofile.'''
+
+	if stat.S_ISREG(statbuf.st_mode):
+		return ' %s' % fullpath
+	elif stat.S_ISCHR(statbuf.st_mode) or stat.S_ISBLK(statbuf.st_mode):
+		return ' %d %d' % (os.major(statbuf.st_rdev), os.minor(statbuf.st_rdev))
+	elif stat.S_ISLNK(statbuf.st_mode):
+		return ' %s' % os.readlink(fullpath)
+	return ''
+
+def max_fname_len(s1):
+	'''Return the length of the longest string in s1.'''
+	ret = 0
+	for s in s1:
+		if len(s) > ret:
+			ret = len(s)
+	return ret
+
+def walk_tree(path, depth):
+	'''Walk the directory tree rooted by path.'''
+	dirs = []
+	files = []
+
+	for fname in os.listdir(path):
+		fullpath = os.path.join(path, fname)
+		sb = os.lstat(fullpath)
+
+		if stat.S_ISDIR(sb.st_mode):
+			dirs.append(fname)
+			continue
+		elif stat.S_ISSOCK(sb.st_mode):
+			continue
+		else:
+			files.append(fname)
+
+	for fname in files:
+		if ' ' in fname:
+			raise ValueError( \
+				f'{fname}: Spaces not allowed in file names.')
+	for fname in dirs:
+		if ' ' in fname:
+			raise ValueError( \
+				f'{fname}: Spaces not allowed in file names.')
+
+	fname_width = max_fname_len(files)
+	for fname in files:
+		fullpath = os.path.join(path, fname)
+		sb = os.lstat(fullpath)
+		extra = stat_to_extra(sb, fullpath)
+		print('%*s%-*s %s%s' % (depth, ' ', fname_width, fname, \
+				stat_to_str(sb), extra))
+
+	for fname in dirs:
+		fullpath = os.path.join(path, fname)
+		sb = os.lstat(fullpath)
+		extra = stat_to_extra(sb, fullpath)
+		print('%*s%s %s' % (depth, ' ', fname, \
+				stat_to_str(sb)))
+		walk_tree(fullpath, depth + 1)
+
+	if depth > 1:
+		print('%*s$' % (depth - 1, ' '))
+
+def main():
+	parser = argparse.ArgumentParser( \
+			description = "Generate mkfs.xfs protofile for a directory tree.")
+	parser.add_argument('paths', metavar = 'paths', type = str, \
+			nargs = '*', help = 'Directory paths to walk.')
+	parser.add_argument("-V", help = "Report version and exit.", \
+			action = "store_true")
+	args = parser.parse_args()
+
+	if args.V:
+		print("xfs_protofile version @pkg_version@")
+		sys.exit(0)
+
+	emit_proto_header()
+	if len(args.paths) == 0:
+		print('d--755 0 0')
+		print('$')
+	else:
+		# Copy the first argument's stat to the rootdir
+		statbuf = os.stat(args.paths[0])
+		if not stat.S_ISDIR(statbuf.st_mode):
+			raise NotADirectoryError(args.paths[0])
+		print(stat_to_str(statbuf))
+
+		# All files under each path go in the root dir, recursively
+		for path in args.paths:
+			print(': Descending path %s' % path)
+			try:
+				walk_tree(path, 1)
+			except Exception as e:
+				print(e, file = sys.stderr)
+				return 1
+
+		print('$')
+	return 0
+
+if __name__ == '__main__':
+	sys.exit(main())


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 01/52] xfs: create incore realtime group structures
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
@ 2024-12-23 21:58   ` Darrick J. Wong
  2024-12-23 21:58   ` [PATCH 02/52] xfs: define locking primitives for realtime groups Darrick J. Wong
                     ` (50 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:58 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 87fe4c34a383d51ec75f254240bcd08828f4ce5a

Create an incore object that will contain information about a realtime
allocation group.  This will eventually enable us to shard the realtime
section in a similar manner to how we shard the data section, but for
now just a single object for the entire RT subvolume is created.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
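
Reviewer note, not part of the patch: the sizing rule implemented by
__xfs_rtgroup_extents() in the diff below is that every group holds
sb_rgextents rt extents except the last one, which takes whatever
remains of sb_rextents.  The sketch restates that rule with plain
integer types; the function name and parameter names are hypothetical
stand-ins, not the libxfs API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified stand-in for __xfs_rtgroup_extents(): return the number
 * of rt extents in group rgno, given rgcount groups of rgextents
 * extents each and rextents extents in the whole rt section.
 */
static uint64_t
rtgroup_extents(
	uint32_t	rgno,
	uint32_t	rgcount,
	uint32_t	rgextents,
	uint64_t	rextents)
{
	assert(rgno < rgcount);

	/* The last group gets the remainder, which may be a short group. */
	if (rgno == rgcount - 1)
		return rextents - (uint64_t)rgno * rgextents;

	return rgextents;
}
```

With the values that __xfs_sb_from_disk and mkfs set in this patch
(sb_rgcount = 1, sb_rgextents = 0), group 0 is also the last group, so
it covers all of sb_rextents.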
 db/block.h               |   16 ---
 db/convert.c             |    1 
 db/faddr.c               |    1 
 include/libxfs.h         |    1 
 include/platform_defs.h  |   33 +++++++
 include/xfs_mount.h      |   16 +++
 include/xfs_trace.h      |    7 +
 libxfs/Makefile          |    2 
 libxfs/init.c            |   17 ++++
 libxfs/libxfs_api_defs.h |    4 +
 libxfs/libxfs_priv.h     |   33 -------
 libxfs/xfs_format.h      |    3 +
 libxfs/xfs_rtgroup.c     |  149 ++++++++++++++++++++++++++++++++
 libxfs/xfs_rtgroup.h     |  217 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_sb.c          |   13 +++
 libxfs/xfs_types.h       |    8 +-
 mkfs/xfs_mkfs.c          |    8 ++
 17 files changed, 477 insertions(+), 52 deletions(-)
 create mode 100644 libxfs/xfs_rtgroup.c
 create mode 100644 libxfs/xfs_rtgroup.h
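
Reviewer note, not part of the patch: the block number helpers added to
xfs_rtgroup.h use a shift and mask when the group size is a power of
two and fall back to multiplication and division otherwise.  The sketch
below shows that pattern in isolation.  struct rg_geom and the function
names are hypothetical; the sketch also assumes that a negative log
value means "not a power of two", and it leaves out the
xfs_has_rtgroups() early returns that the real helpers have.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the rtgroup geometry fields in xfs_mount. */
struct rg_geom {
	uint32_t	rgblocks;	/* blocks per rt group */
	int8_t		rgblklog;	/* log2(rgblocks), or < 0 if not a power of 2 */
	uint64_t	rgblkmask;	/* rgblocks - 1 when rgblklog >= 0 */
};

/* First rt block number of group rgno. */
static uint64_t
rgno_start_rtb(const struct rg_geom *g, uint32_t rgno)
{
	if (g->rgblklog >= 0)
		return (uint64_t)rgno << g->rgblklog;
	return (uint64_t)rgno * g->rgblocks;
}

/* Group number that contains rt block rtbno. */
static uint32_t
rtb_to_rgno(const struct rg_geom *g, uint64_t rtbno)
{
	if (g->rgblklog >= 0)
		return (uint32_t)(rtbno >> g->rgblklog);
	return (uint32_t)(rtbno / g->rgblocks);
}

/* Offset of rt block rtbno within its group. */
static uint32_t
rtb_to_rgbno(const struct rg_geom *g, uint64_t rtbno)
{
	if (g->rgblklog >= 0)
		return (uint32_t)(rtbno & g->rgblkmask);
	return (uint32_t)(rtbno % g->rgblocks);
}
```

Both paths have to agree: splitting a block number into (group, offset)
and then adding the offset to the start of the group gives back the
original block number, whether or not the group size is a power of two.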


diff --git a/db/block.h b/db/block.h
index 55843a6b521393..ed13e159064461 100644
--- a/db/block.h
+++ b/db/block.h
@@ -11,20 +11,4 @@ struct field;
 extern void	block_init(void);
 extern void	print_block(const struct field *fields, int argc, char **argv);
 
-static inline xfs_daddr_t
-xfs_rtb_to_daddr(
-	struct xfs_mount	*mp,
-	xfs_rtblock_t		rtb)
-{
-	return rtb << mp->m_blkbb_log;
-}
-
-static inline xfs_rtblock_t
-xfs_daddr_to_rtb(
-	struct xfs_mount	*mp,
-	xfs_daddr_t		daddr)
-{
-	return daddr >> mp->m_blkbb_log;
-}
-
 #endif /* __XFS_DB_BLOCK_H */
diff --git a/db/convert.c b/db/convert.c
index 3014367e7d7652..0c5c942150fe6f 100644
--- a/db/convert.c
+++ b/db/convert.c
@@ -8,7 +8,6 @@
 #include "command.h"
 #include "output.h"
 #include "init.h"
-#include "block.h"
 
 #define	M(A)	(1 << CT_ ## A)
 #define	agblock_to_bytes(x)	\
diff --git a/db/faddr.c b/db/faddr.c
index e2f9587da0a67c..bf9dd521f14068 100644
--- a/db/faddr.c
+++ b/db/faddr.c
@@ -15,7 +15,6 @@
 #include "bmap.h"
 #include "output.h"
 #include "init.h"
-#include "block.h"
 
 void
 fa_agblock(
diff --git a/include/libxfs.h b/include/libxfs.h
index 985646e6ad89d1..df38d875143ed9 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -97,6 +97,7 @@ struct iomap;
 #include "xfs_ag_resv.h"
 #include "xfs_metafile.h"
 #include "xfs_metadir.h"
+#include "xfs_rtgroup.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/include/platform_defs.h b/include/platform_defs.h
index a3644dea41cdef..051ee25a5b4fea 100644
--- a/include/platform_defs.h
+++ b/include/platform_defs.h
@@ -228,4 +228,37 @@ static inline size_t __ab_c_size(size_t a, size_t b, size_t c)
  */
 #define IS_ENABLED(option) __or(IS_BUILTIN(option), IS_MODULE(option))
 
+/* miscellaneous kernel routines not in user space */
+#define likely(x)		(x)
+#define unlikely(x)		(x)
+
+#define __must_check	__attribute__((__warn_unused_result__))
+
+/*
+ * Allows for effectively applying __must_check to a macro so we can have
+ * both the type-agnostic benefits of the macros while also being able to
+ * enforce that the return value is, in fact, checked.
+ */
+static inline bool __must_check __must_check_overflow(bool overflow)
+{
+	return unlikely(overflow);
+}
+
+/*
+ * For simplicity and code hygiene, the fallback code below insists on
+ * a, b and *d having the same type (similar to the min() and max()
+ * macros), whereas gcc's type-generic overflow checkers accept
+ * different types. Hence we don't just make check_add_overflow an
+ * alias for __builtin_add_overflow, but add type checks similar to
+ * below.
+ */
+#define check_add_overflow(a, b, d) __must_check_overflow(({	\
+	typeof(a) __a = (a);			\
+	typeof(b) __b = (b);			\
+	typeof(d) __d = (d);			\
+	(void) (&__a == &__b);			\
+	(void) (&__a == __d);			\
+	__builtin_add_overflow(__a, __b, __d);	\
+}))
+
 #endif	/* __XFS_PLATFORM_DEFS_H__ */
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 6daf3f01ffa9cf..987ffea19d1586 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -82,6 +82,7 @@ typedef struct xfs_mount {
         struct xfs_ino_geometry	m_ino_geo;	/* inode geometry */
 	uint			m_rsumlevels;	/* rt summary levels */
 	xfs_filblks_t		m_rsumblocks;	/* size of rt summary, FSBs */
+	uint32_t		m_rgblocks;	/* size of rtgroup in rtblocks */
 	/*
 	 * Optional cache of rt summary level per bitmap block with the
 	 * invariant that m_rsum_cache[bbno] <= the minimum i for which
@@ -104,6 +105,7 @@ typedef struct xfs_mount {
 	uint8_t			m_sectbb_log;	/* sectorlog - BBSHIFT */
 	uint8_t			m_agno_log;	/* log #ag's */
 	int8_t			m_rtxblklog;	/* log2 of rextsize, if possible */
+	int8_t			m_rgblklog;	/* log2 of rt group sz if possible */
 	uint			m_blockmask;	/* sb_blocksize-1 */
 	uint			m_blockwsize;	/* sb_blocksize in words */
 	uint			m_blockwmask;	/* blockwsize-1 */
@@ -127,6 +129,7 @@ typedef struct xfs_mount {
 	uint64_t		m_features;	/* active filesystem features */
 	uint64_t		m_low_space[XFS_LOWSP_MAX];
 	uint64_t		m_rtxblkmask;	/* rt extent block mask */
+	uint64_t		m_rgblkmask;	/* rt group block mask */
 	unsigned long		m_opstate;	/* dynamic state flags */
 	bool			m_finobt_nores; /* no per-AG finobt resv. */
 	uint			m_qflags;	/* quota status flags */
@@ -250,6 +253,17 @@ __XFS_HAS_FEAT(large_extent_counts, NREXT64)
 __XFS_HAS_FEAT(exchange_range, EXCHANGE_RANGE)
 __XFS_HAS_FEAT(metadir, METADIR)
 
+
+static inline bool xfs_has_rtgroups(struct xfs_mount *mp)
+{
+	return false;
+}
+
+static inline bool xfs_has_rtsb(struct xfs_mount *mp)
+{
+	return false;
+}
+
 /* Kernel mount features that we don't support */
 #define __XFS_UNSUPP_FEAT(name) \
 static inline bool xfs_has_ ## name (struct xfs_mount *mp) \
@@ -269,6 +283,7 @@ __XFS_UNSUPP_FEAT(grpid)
 #define XFS_OPSTATE_DEBUGGER		1	/* is this the debugger? */
 #define XFS_OPSTATE_REPORT_CORRUPTION	2	/* report buffer corruption? */
 #define XFS_OPSTATE_PERAG_DATA_LOADED	3	/* per-AG data initialized? */
+#define XFS_OPSTATE_RTGROUP_DATA_LOADED	4	/* rtgroup data initialized? */
 
 #define __XFS_IS_OPSTATE(name, NAME) \
 static inline bool xfs_is_ ## name (struct xfs_mount *mp) \
@@ -294,6 +309,7 @@ __XFS_IS_OPSTATE(inode32, INODE32)
 __XFS_IS_OPSTATE(debugger, DEBUGGER)
 __XFS_IS_OPSTATE(reporting_corruption, REPORT_CORRUPTION)
 __XFS_IS_OPSTATE(perag_data_loaded, PERAG_DATA_LOADED)
+__XFS_IS_OPSTATE(rtgroup_data_loaded, RTGROUP_DATA_LOADED)
 
 #define __XFS_UNSUPP_OPSTATE(name) \
 static inline bool xfs_is_ ## name (struct xfs_mount *mp) \
diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index f3e531ef118d05..a53ce092c8ea3b 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -356,6 +356,13 @@
 #define trace_xfs_exchmaps_overhead(...)	((void) 0)
 #define trace_xfs_exchmaps_update_inode_size(...) ((void) 0)
 
+/* set a = a to avoid unused var warnings */
+#define trace_xfs_rtgroup_get(a,b)		((a) = (a))
+#define trace_xfs_rtgroup_hold(a,b)		((a) = (a))
+#define trace_xfs_rtgroup_put(a,b)		((a) = (a))
+#define trace_xfs_rtgroup_grab(a,b)		((a) = (a))
+#define trace_xfs_rtgroup_rele(a,b)		((a) = (a))
+
 #define trace_xfs_fs_mark_healthy(a,b)		((void) 0)
 
 #define trace_xlog_intent_recovery_failed(...)	((void) 0)
diff --git a/libxfs/Makefile b/libxfs/Makefile
index c8267841e77b01..42c9f3cc439dc5 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -64,6 +64,7 @@ HFILES = \
 	xfs_rmap.h \
 	xfs_rmap_btree.h \
 	xfs_rtbitmap.h \
+	xfs_rtgroup.h \
 	xfs_sb.h \
 	xfs_shared.h \
 	xfs_trans_resv.h \
@@ -124,6 +125,7 @@ CFILES = buf_mem.c \
 	xfs_rmap.c \
 	xfs_rmap_btree.c \
 	xfs_rtbitmap.c \
+	xfs_rtgroup.c \
 	xfs_sb.c \
 	xfs_symlink_remote.c \
 	xfs_trans_inode.c \
diff --git a/libxfs/init.c b/libxfs/init.c
index bf488c5d8533b1..b47d8eaf4f468c 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -32,6 +32,7 @@
 #include "xfs_ondisk.h"
 
 #include "libxfs.h"		/* for now */
+#include "xfs_rtgroup.h"
 
 #ifndef HAVE_LIBURCU_ATOMIC64
 pthread_mutex_t	atomic64_lock = PTHREAD_MUTEX_INITIALIZER;
@@ -670,6 +671,7 @@ libxfs_mount(
 {
 	struct xfs_buf		*bp;
 	struct xfs_sb		*sbp;
+	struct xfs_rtgroup	*rtg = NULL;
 	xfs_daddr_t		d;
 	int			i;
 	int			error;
@@ -824,6 +826,19 @@ libxfs_mount(
 	if (xfs_has_metadir(mp))
 		libxfs_mount_setup_metadir(mp);
 
+	error = libxfs_initialize_rtgroups(mp, 0, sbp->sb_rgcount,
+			sbp->sb_rextents);
+	if (error) {
+		fprintf(stderr, _("%s: rtgroup init failed\n"),
+			progname);
+		exit(1);
+	}
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg)))
+		rtg->rtg_extents = xfs_rtgroup_extents(mp, rtg_rgno(rtg));
+
+	xfs_set_rtgroup_data_loaded(mp);
+
 	return mp;
 out_da:
 	xfs_da_unmount(mp);
@@ -957,6 +972,8 @@ libxfs_umount(
 	 * Only try to free the per-AG structures if we set them up in the
 	 * first place.
 	 */
+	if (xfs_is_rtgroup_data_loaded(mp))
+		libxfs_free_rtgroups(mp, 0, mp->m_sb.sb_rgcount);
 	if (xfs_is_perag_data_loaded(mp))
 		libxfs_free_perag_range(mp, 0, mp->m_sb.sb_agcount);
 
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 2f218296688477..c4f42754c311fc 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -168,6 +168,7 @@
 #define xfs_free_extent			libxfs_free_extent
 #define xfs_free_extent_later		libxfs_free_extent_later
 #define xfs_free_perag_range		libxfs_free_perag_range
+#define xfs_free_rtgroups		libxfs_free_rtgroups
 #define xfs_fs_geometry			libxfs_fs_geometry
 #define xfs_gbno_to_fsb			libxfs_gbno_to_fsb
 #define xfs_get_initial_prid		libxfs_get_initial_prid
@@ -189,6 +190,7 @@
 
 #define xfs_initialize_perag		libxfs_initialize_perag
 #define xfs_initialize_perag_data	libxfs_initialize_perag_data
+#define xfs_initialize_rtgroups		libxfs_initialize_rtgroups
 #define xfs_init_local_fork		libxfs_init_local_fork
 
 #define xfs_inobt_init_cursor		libxfs_inobt_init_cursor
@@ -276,6 +278,8 @@
 #define xfs_rtbitmap_setword		libxfs_rtbitmap_setword
 #define xfs_rtbitmap_wordcount		libxfs_rtbitmap_wordcount
 
+#define xfs_rtgroup_alloc		libxfs_rtgroup_alloc
+
 #define xfs_suminfo_add			libxfs_suminfo_add
 #define xfs_suminfo_get			libxfs_suminfo_get
 #define xfs_rtsummary_wordcount		libxfs_rtsummary_wordcount
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 631abf266ad526..620cd13b8ef796 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -204,10 +204,6 @@ enum ce { CE_DEBUG, CE_CONT, CE_NOTE, CE_WARN, CE_ALERT, CE_PANIC };
 #define xfs_info_once(dev, fmt, ...)				\
 	xfs_printk_once(xfs_info, dev, fmt, ##__VA_ARGS__)
 
-/* miscellaneous kernel routines not in user space */
-#define likely(x)		(x)
-#define unlikely(x)		(x)
-
 /* Need to be able to handle this bare or in control flow */
 static inline bool WARN_ON(bool expr) {
 	return (expr);
@@ -238,35 +234,6 @@ struct mnt_idmap;
 void inode_init_owner(struct mnt_idmap *idmap, struct inode *inode,
 		      const struct inode *dir, umode_t mode);
 
-#define __must_check	__attribute__((__warn_unused_result__))
-
-/*
- * Allows for effectively applying __must_check to a macro so we can have
- * both the type-agnostic benefits of the macros while also being able to
- * enforce that the return value is, in fact, checked.
- */
-static inline bool __must_check __must_check_overflow(bool overflow)
-{
-	return unlikely(overflow);
-}
-
-/*
- * For simplicity and code hygiene, the fallback code below insists on
- * a, b and *d having the same type (similar to the min() and max()
- * macros), whereas gcc's type-generic overflow checkers accept
- * different types. Hence we don't just make check_add_overflow an
- * alias for __builtin_add_overflow, but add type checks similar to
- * below.
- */
-#define check_add_overflow(a, b, d) __must_check_overflow(({	\
-	typeof(a) __a = (a);			\
-	typeof(b) __b = (b);			\
-	typeof(d) __d = (d);			\
-	(void) (&__a == &__b);			\
-	(void) (&__a == __d);			\
-	__builtin_add_overflow(__a, __b, __d);	\
-}))
-
 #define min_t(type,x,y) \
 	({ type __x = (x); type __y = (y); __x < __y ? __x: __y; })
 #define max_t(type,x,y) \
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 616f81045921b7..867060d60e8583 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -176,6 +176,9 @@ typedef struct xfs_sb {
 
 	xfs_ino_t	sb_metadirino;	/* metadata directory tree root */
 
+	xfs_rgnumber_t	sb_rgcount;	/* number of realtime groups */
+	xfs_rtxlen_t	sb_rgextents;	/* size of a realtime group in rtx */
+
 	/* must be padded to 64 bit alignment */
 } xfs_sb_t;
 
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
new file mode 100644
index 00000000000000..88f31384bf6961
--- /dev/null
+++ b/libxfs/xfs_rtgroup.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs_priv.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_btree.h"
+#include "xfs_alloc_btree.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_ialloc.h"
+#include "xfs_rmap.h"
+#include "xfs_ag.h"
+#include "xfs_ag_resv.h"
+#include "xfs_health.h"
+#include "xfs_bmap.h"
+#include "xfs_defer.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_trace.h"
+#include "xfs_inode.h"
+#include "xfs_rtgroup.h"
+#include "xfs_rtbitmap.h"
+
+int
+xfs_rtgroup_alloc(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno,
+	xfs_rgnumber_t		rgcount,
+	xfs_rtbxlen_t		rextents)
+{
+	struct xfs_rtgroup	*rtg;
+	int			error;
+
+	rtg = kzalloc(sizeof(struct xfs_rtgroup), GFP_KERNEL);
+	if (!rtg)
+		return -ENOMEM;
+
+	error = xfs_group_insert(mp, rtg_group(rtg), rgno, XG_TYPE_RTG);
+	if (error)
+		goto out_free_rtg;
+	return 0;
+
+out_free_rtg:
+	kfree(rtg);
+	return error;
+}
+
+void
+xfs_rtgroup_free(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno)
+{
+	xfs_group_free(mp, rgno, XG_TYPE_RTG, NULL);
+}
+
+/* Free a range of incore rtgroup objects. */
+void
+xfs_free_rtgroups(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		first_rgno,
+	xfs_rgnumber_t		end_rgno)
+{
+	xfs_rgnumber_t		rgno;
+
+	for (rgno = first_rgno; rgno < end_rgno; rgno++)
+		xfs_rtgroup_free(mp, rgno);
+}
+
+/* Initialize some range of incore rtgroup objects. */
+int
+xfs_initialize_rtgroups(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		first_rgno,
+	xfs_rgnumber_t		end_rgno,
+	xfs_rtbxlen_t		rextents)
+{
+	xfs_rgnumber_t		index;
+	int			error;
+
+	if (first_rgno >= end_rgno)
+		return 0;
+
+	for (index = first_rgno; index < end_rgno; index++) {
+		error = xfs_rtgroup_alloc(mp, index, end_rgno, rextents);
+		if (error)
+			goto out_unwind_new_rtgs;
+	}
+
+	return 0;
+
+out_unwind_new_rtgs:
+	xfs_free_rtgroups(mp, first_rgno, index);
+	return error;
+}
+
+/* Compute the number of rt extents in this realtime group. */
+xfs_rtxnum_t
+__xfs_rtgroup_extents(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno,
+	xfs_rgnumber_t		rgcount,
+	xfs_rtbxlen_t		rextents)
+{
+	ASSERT(rgno < rgcount);
+	if (rgno == rgcount - 1)
+		return rextents - ((xfs_rtxnum_t)rgno * mp->m_sb.sb_rgextents);
+
+	ASSERT(xfs_has_rtgroups(mp));
+	return mp->m_sb.sb_rgextents;
+}
+
+xfs_rtxnum_t
+xfs_rtgroup_extents(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno)
+{
+	return __xfs_rtgroup_extents(mp, rgno, mp->m_sb.sb_rgcount,
+			mp->m_sb.sb_rextents);
+}
+
+/*
+ * Update the rt extent count of the previous tail rtgroup if it changed during
+ * recovery (i.e. recovery of a growfs).
+ */
+int
+xfs_update_last_rtgroup_size(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		prev_rgcount)
+{
+	struct xfs_rtgroup	*rtg;
+
+	ASSERT(prev_rgcount > 0);
+
+	rtg = xfs_rtgroup_grab(mp, prev_rgcount - 1);
+	if (!rtg)
+		return -EFSCORRUPTED;
+	rtg->rtg_extents = __xfs_rtgroup_extents(mp, prev_rgcount - 1,
+			mp->m_sb.sb_rgcount, mp->m_sb.sb_rextents);
+	xfs_rtgroup_rele(rtg);
+	return 0;
+}
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
new file mode 100644
index 00000000000000..8872c27a9585fd
--- /dev/null
+++ b/libxfs/xfs_rtgroup.h
@@ -0,0 +1,217 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __LIBXFS_RTGROUP_H
+#define __LIBXFS_RTGROUP_H 1
+
+#include "xfs_group.h"
+
+struct xfs_mount;
+struct xfs_trans;
+
+/*
+ * Realtime group incore structure, similar to the per-AG structure.
+ */
+struct xfs_rtgroup {
+	struct xfs_group	rtg_group;
+
+	/* Number of rt extents in this group */
+	xfs_rtxnum_t		rtg_extents;
+};
+
+static inline struct xfs_rtgroup *to_rtg(struct xfs_group *xg)
+{
+	return container_of(xg, struct xfs_rtgroup, rtg_group);
+}
+
+static inline struct xfs_group *rtg_group(struct xfs_rtgroup *rtg)
+{
+	return &rtg->rtg_group;
+}
+
+static inline struct xfs_mount *rtg_mount(const struct xfs_rtgroup *rtg)
+{
+	return rtg->rtg_group.xg_mount;
+}
+
+static inline xfs_rgnumber_t rtg_rgno(const struct xfs_rtgroup *rtg)
+{
+	return rtg->rtg_group.xg_gno;
+}
+
+/* Passive rtgroup references */
+static inline struct xfs_rtgroup *
+xfs_rtgroup_get(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno)
+{
+	return to_rtg(xfs_group_get(mp, rgno, XG_TYPE_RTG));
+}
+
+static inline struct xfs_rtgroup *
+xfs_rtgroup_hold(
+	struct xfs_rtgroup	*rtg)
+{
+	return to_rtg(xfs_group_hold(rtg_group(rtg)));
+}
+
+static inline void
+xfs_rtgroup_put(
+	struct xfs_rtgroup	*rtg)
+{
+	xfs_group_put(rtg_group(rtg));
+}
+
+/* Active rtgroup references */
+static inline struct xfs_rtgroup *
+xfs_rtgroup_grab(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno)
+{
+	return to_rtg(xfs_group_grab(mp, rgno, XG_TYPE_RTG));
+}
+
+static inline void
+xfs_rtgroup_rele(
+	struct xfs_rtgroup	*rtg)
+{
+	xfs_group_rele(rtg_group(rtg));
+}
+
+static inline struct xfs_rtgroup *
+xfs_rtgroup_next_range(
+	struct xfs_mount	*mp,
+	struct xfs_rtgroup	*rtg,
+	xfs_rgnumber_t		start_rgno,
+	xfs_rgnumber_t		end_rgno)
+{
+	return to_rtg(xfs_group_next_range(mp, rtg ? rtg_group(rtg) : NULL,
+			start_rgno, end_rgno, XG_TYPE_RTG));
+}
+
+static inline struct xfs_rtgroup *
+xfs_rtgroup_next(
+	struct xfs_mount	*mp,
+	struct xfs_rtgroup	*rtg)
+{
+	return xfs_rtgroup_next_range(mp, rtg, 0, mp->m_sb.sb_rgcount - 1);
+}
+
+static inline xfs_rtblock_t
+xfs_rgno_start_rtb(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno)
+{
+	if (mp->m_rgblklog >= 0)
+		return ((xfs_rtblock_t)rgno << mp->m_rgblklog);
+	return ((xfs_rtblock_t)rgno * mp->m_rgblocks);
+}
+
+static inline xfs_rtblock_t
+__xfs_rgbno_to_rtb(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno,
+	xfs_rgblock_t		rgbno)
+{
+	return xfs_rgno_start_rtb(mp, rgno) + rgbno;
+}
+
+static inline xfs_rtblock_t
+xfs_rgbno_to_rtb(
+	struct xfs_rtgroup	*rtg,
+	xfs_rgblock_t		rgbno)
+{
+	return __xfs_rgbno_to_rtb(rtg_mount(rtg), rtg_rgno(rtg), rgbno);
+}
+
+static inline xfs_rgnumber_t
+xfs_rtb_to_rgno(
+	struct xfs_mount	*mp,
+	xfs_rtblock_t		rtbno)
+{
+	if (!xfs_has_rtgroups(mp))
+		return 0;
+
+	if (mp->m_rgblklog >= 0)
+		return rtbno >> mp->m_rgblklog;
+
+	return div_u64(rtbno, mp->m_rgblocks);
+}
+
+static inline uint64_t
+__xfs_rtb_to_rgbno(
+	struct xfs_mount	*mp,
+	xfs_rtblock_t		rtbno)
+{
+	uint32_t		rem;
+
+	if (!xfs_has_rtgroups(mp))
+		return rtbno;
+
+	if (mp->m_rgblklog >= 0)
+		return rtbno & mp->m_rgblkmask;
+
+	div_u64_rem(rtbno, mp->m_rgblocks, &rem);
+	return rem;
+}
+
+static inline xfs_rgblock_t
+xfs_rtb_to_rgbno(
+	struct xfs_mount	*mp,
+	xfs_rtblock_t		rtbno)
+{
+	return __xfs_rtb_to_rgbno(mp, rtbno);
+}
+
+static inline xfs_daddr_t
+xfs_rtb_to_daddr(
+	struct xfs_mount	*mp,
+	xfs_rtblock_t		rtbno)
+{
+	return rtbno << mp->m_blkbb_log;
+}
+
+static inline xfs_rtblock_t
+xfs_daddr_to_rtb(
+	struct xfs_mount	*mp,
+	xfs_daddr_t		daddr)
+{
+	return daddr >> mp->m_blkbb_log;
+}
+
+#ifdef CONFIG_XFS_RT
+int xfs_rtgroup_alloc(struct xfs_mount *mp, xfs_rgnumber_t rgno,
+		xfs_rgnumber_t rgcount, xfs_rtbxlen_t rextents);
+void xfs_rtgroup_free(struct xfs_mount *mp, xfs_rgnumber_t rgno);
+
+void xfs_free_rtgroups(struct xfs_mount *mp, xfs_rgnumber_t first_rgno,
+		xfs_rgnumber_t end_rgno);
+int xfs_initialize_rtgroups(struct xfs_mount *mp, xfs_rgnumber_t first_rgno,
+		xfs_rgnumber_t end_rgno, xfs_rtbxlen_t rextents);
+
+xfs_rtxnum_t __xfs_rtgroup_extents(struct xfs_mount *mp, xfs_rgnumber_t rgno,
+		xfs_rgnumber_t rgcount, xfs_rtbxlen_t rextents);
+xfs_rtxnum_t xfs_rtgroup_extents(struct xfs_mount *mp, xfs_rgnumber_t rgno);
+
+int xfs_update_last_rtgroup_size(struct xfs_mount *mp,
+		xfs_rgnumber_t prev_rgcount);
+#else
+static inline void xfs_free_rtgroups(struct xfs_mount *mp,
+		xfs_rgnumber_t first_rgno, xfs_rgnumber_t end_rgno)
+{
+}
+
+static inline int xfs_initialize_rtgroups(struct xfs_mount *mp,
+		xfs_rgnumber_t first_rgno, xfs_rgnumber_t end_rgno,
+		xfs_rtbxlen_t rextents)
+{
+	return 0;
+}
+
+# define xfs_rtgroup_extents(mp, rgno)		(0)
+# define xfs_update_last_rtgroup_size(mp, rgno)	(-EOPNOTSUPP)
+#endif /* CONFIG_XFS_RT */
+
+#endif /* __LIBXFS_RTGROUP_H */
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 4ca57d592be8fa..2a5368c266ee91 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -707,6 +707,9 @@ __xfs_sb_from_disk(
 		to->sb_metadirino = be64_to_cpu(from->sb_metadirino);
 	else
 		to->sb_metadirino = NULLFSINO;
+
+	to->sb_rgcount = 1;
+	to->sb_rgextents = 0;
 }
 
 void
@@ -991,8 +994,18 @@ xfs_mount_sb_set_rextsize(
 	struct xfs_mount	*mp,
 	struct xfs_sb		*sbp)
 {
+	struct xfs_groups	*rgs = &mp->m_groups[XG_TYPE_RTG];
+
 	mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize);
 	mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize);
+
+	mp->m_rgblocks = 0;
+	mp->m_rgblklog = 0;
+	mp->m_rgblkmask = (uint64_t)-1;
+
+	rgs->blocks = 0;
+	rgs->blklog = 0;
+	rgs->blkmask = (uint64_t)-1;
 }
 
 /*
diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h
index 25053a66c225ed..bf33c2b1e43e5f 100644
--- a/libxfs/xfs_types.h
+++ b/libxfs/xfs_types.h
@@ -9,10 +9,12 @@
 typedef uint32_t	prid_t;		/* project ID */
 
 typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
+typedef uint32_t	xfs_rgblock_t;	/* blockno in realtime group */
 typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
 typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
 typedef uint32_t	xfs_rtxlen_t;	/* file extent length in rtextents */
 typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
+typedef uint32_t	xfs_rgnumber_t;	/* realtime group number */
 typedef uint64_t	xfs_extnum_t;	/* # of extents in a file */
 typedef uint32_t	xfs_aextnum_t;	/* # extents in an attribute fork */
 typedef int64_t		xfs_fsize_t;	/* bytes in a file */
@@ -53,7 +55,9 @@ typedef void *		xfs_failaddr_t;
 #define	NULLFILEOFF	((xfs_fileoff_t)-1)
 
 #define	NULLAGBLOCK	((xfs_agblock_t)-1)
+#define NULLRGBLOCK	((xfs_rgblock_t)-1)
 #define	NULLAGNUMBER	((xfs_agnumber_t)-1)
+#define	NULLRGNUMBER	((xfs_rgnumber_t)-1)
 
 #define NULLCOMMITLSN	((xfs_lsn_t)-1)
 
@@ -214,11 +218,13 @@ enum xbtree_recpacking {
 
 enum xfs_group_type {
 	XG_TYPE_AG,
+	XG_TYPE_RTG,
 	XG_TYPE_MAX,
 } __packed;
 
 #define XG_TYPE_STRINGS \
-	{ XG_TYPE_AG,	"ag" }
+	{ XG_TYPE_AG,	"ag" }, \
+	{ XG_TYPE_RTG,	"rtg" }
 
 /*
  * Type verifier functions
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 4e51caead9dac2..2549a636568d1b 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3592,6 +3592,14 @@ sb_set_features(
 	if (!fp->metadir)
 		sbp->sb_bad_features2 = sbp->sb_features2;
 
+	/*
+	 * This will be overridden later for real rtgroup file systems.  For
+	 * !rtgroups filesystems, we pretend that there's one huge group, just
+	 * like __xfs_sb_from_disk does.
+	 */
+	sbp->sb_rgcount = 1;
+	sbp->sb_rgextents = 0;
+
 	if (!fp->crcs_enabled)
 		return;
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 02/52] xfs: define locking primitives for realtime groups
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
  2024-12-23 21:58   ` [PATCH 01/52] xfs: create incore realtime group structures Darrick J. Wong
@ 2024-12-23 21:58   ` Darrick J. Wong
  2024-12-23 21:59   ` [PATCH 03/52] xfs: add a lockdep class key for rtgroup inodes Darrick J. Wong
                     ` (49 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:58 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 0e4875b3fb24c5bfdf685876c76713cda5a23b65

Define helper functions to lock all metadata inodes related to a
realtime group.  There's not much to look at now, but this will become
important when we add per-rtgroup metadata files and online fsck code
for them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtgroup.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtgroup.h |   16 ++++++++++++++++
 2 files changed, 65 insertions(+)
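
Reviewer note, not part of the patch: the two ASSERTs at the top of
xfs_rtgroup_lock() and xfs_rtgroup_unlock() amount to "no unknown bits,
and never both the exclusive and the shared bitmap flag".  The sketch
spells that out as a predicate.  The flag values are copied from the
diff below; rtglock_flags_valid() is a hypothetical helper that does
not exist in libxfs.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined by this patch in xfs_rtgroup.h. */
#define XFS_RTGLOCK_BITMAP		(1U << 0)
#define XFS_RTGLOCK_BITMAP_SHARED	(1U << 1)
#define XFS_RTGLOCK_ALL_FLAGS		(XFS_RTGLOCK_BITMAP | \
					 XFS_RTGLOCK_BITMAP_SHARED)

/*
 * Hypothetical helper: true if the flags would pass the ASSERTs in
 * xfs_rtgroup_lock() and xfs_rtgroup_unlock().
 */
static bool
rtglock_flags_valid(unsigned int rtglock_flags)
{
	/* Reject bits outside the known flag set. */
	if (rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS)
		return false;

	/* Exclusive and shared bitmap locking cannot be combined. */
	if ((rtglock_flags & XFS_RTGLOCK_BITMAP) &&
	    (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED))
		return false;

	return true;
}
```

xfs_rtgroup_trans_join() is stricter: it asserts that the shared flag
is not set at all, and it joins the bitmap inode to the transaction
only when XFS_RTGLOCK_BITMAP is passed.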


diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 88f31384bf6961..5c8e269b62dbc4 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -147,3 +147,52 @@ xfs_update_last_rtgroup_size(
 	xfs_rtgroup_rele(rtg);
 	return 0;
 }
+
+/* Lock metadata inodes associated with this rt group. */
+void
+xfs_rtgroup_lock(
+	struct xfs_rtgroup	*rtg,
+	unsigned int		rtglock_flags)
+{
+	ASSERT(!(rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS));
+	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) ||
+	       !(rtglock_flags & XFS_RTGLOCK_BITMAP));
+
+	if (rtglock_flags & XFS_RTGLOCK_BITMAP)
+		xfs_rtbitmap_lock(rtg_mount(rtg));
+	else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED)
+		xfs_rtbitmap_lock_shared(rtg_mount(rtg), XFS_RBMLOCK_BITMAP);
+}
+
+/* Unlock metadata inodes associated with this rt group. */
+void
+xfs_rtgroup_unlock(
+	struct xfs_rtgroup	*rtg,
+	unsigned int		rtglock_flags)
+{
+	ASSERT(!(rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS));
+	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) ||
+	       !(rtglock_flags & XFS_RTGLOCK_BITMAP));
+
+	if (rtglock_flags & XFS_RTGLOCK_BITMAP)
+		xfs_rtbitmap_unlock(rtg_mount(rtg));
+	else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED)
+		xfs_rtbitmap_unlock_shared(rtg_mount(rtg), XFS_RBMLOCK_BITMAP);
+}
+
+/*
+ * Join realtime group metadata inodes to the transaction.  The ILOCKs will be
+ * released on transaction commit.
+ */
+void
+xfs_rtgroup_trans_join(
+	struct xfs_trans	*tp,
+	struct xfs_rtgroup	*rtg,
+	unsigned int		rtglock_flags)
+{
+	ASSERT(!(rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS));
+	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED));
+
+	if (rtglock_flags & XFS_RTGLOCK_BITMAP)
+		xfs_rtbitmap_trans_join(tp);
+}
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 8872c27a9585fd..7d82eb753fd097 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -197,6 +197,19 @@ xfs_rtxnum_t xfs_rtgroup_extents(struct xfs_mount *mp, xfs_rgnumber_t rgno);
 
 int xfs_update_last_rtgroup_size(struct xfs_mount *mp,
 		xfs_rgnumber_t prev_rgcount);
+
+/* Lock the rt bitmap inode in exclusive mode */
+#define XFS_RTGLOCK_BITMAP		(1U << 0)
+/* Lock the rt bitmap inode in shared mode */
+#define XFS_RTGLOCK_BITMAP_SHARED	(1U << 1)
+
+#define XFS_RTGLOCK_ALL_FLAGS	(XFS_RTGLOCK_BITMAP | \
+				 XFS_RTGLOCK_BITMAP_SHARED)
+
+void xfs_rtgroup_lock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
+void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
+void xfs_rtgroup_trans_join(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
+		unsigned int rtglock_flags);
 #else
 static inline void xfs_free_rtgroups(struct xfs_mount *mp,
 		xfs_rgnumber_t first_rgno, xfs_rgnumber_t end_rgno)
@@ -212,6 +225,9 @@ static inline int xfs_initialize_rtgroups(struct xfs_mount *mp,
 
 # define xfs_rtgroup_extents(mp, rgno)		(0)
 # define xfs_update_last_rtgroup_size(mp, rgno)	(-EOPNOTSUPP)
+# define xfs_rtgroup_lock(rtg, gf)		((void)0)
+# define xfs_rtgroup_unlock(rtg, gf)		((void)0)
+# define xfs_rtgroup_trans_join(tp, rtg, gf)	((void)0)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __LIBXFS_RTGROUP_H */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 03/52] xfs: add a lockdep class key for rtgroup inodes
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
  2024-12-23 21:58   ` [PATCH 01/52] xfs: create incore realtime group structures Darrick J. Wong
  2024-12-23 21:58   ` [PATCH 02/52] xfs: define locking primitives for realtime groups Darrick J. Wong
@ 2024-12-23 21:59   ` Darrick J. Wong
  2024-12-23 21:59   ` [PATCH 04/52] xfs: support caching rtgroup metadata inodes Darrick J. Wong
                     ` (48 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:59 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: c29237a65c8dbfade3c032763b66d495b8e8cb7a

Add a dynamic lockdep class key for rtgroup inodes.  This will enable
lockdep to deduce inconsistencies in the rtgroup metadata ILOCK locking
order.  Each class can have 8 subclasses, and for now we will only have
2 inodes per group.  This enables rtgroup order and inode order checks
when nesting ILOCKs.
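
The ordering rule can be modelled outside the kernel.  The sketch below
is an assumption-laden illustration, not lockdep itself: demo_inode and
the demo_* functions are hypothetical, and the three-way comparison
simply mirrors the "lower rgno first" rule described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inode tagged with its group number. */
struct demo_inode {
	unsigned int	rgno;
};

/* Three-way comparison by group number: returns -1, 0 or 1. */
int
demo_rgno_cmp(const struct demo_inode *a, const struct demo_inode *b)
{
	if (a->rgno < b->rgno)
		return -1;
	if (a->rgno > b->rgno)
		return 1;
	return 0;
}

/* qsort() adapter so a batch of inodes can be put into locking order. */
int
demo_rgno_qsort_cmp(const void *a, const void *b)
{
	return demo_rgno_cmp(a, b);
}

/* Return 1 if locking the array front to back never goes down in rgno. */
int
demo_lock_order_ok(const struct demo_inode *inodes, size_t nr)
{
	size_t	i;

	assert(inodes != NULL || nr == 0);

	for (i = 1; i < nr; i++)
		if (demo_rgno_cmp(&inodes[i - 1], &inodes[i]) > 0)
			return 0;
	return 1;
}
```

A caller could sort its inodes with qsort() and demo_rgno_qsort_cmp
before locking them; demo_lock_order_ok then plays the role of the
checker that complains about an out-of-order acquisition.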

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtgroup.c |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)


diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 5c8e269b62dbc4..ece52626584200 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -196,3 +196,55 @@ xfs_rtgroup_trans_join(
 	if (rtglock_flags & XFS_RTGLOCK_BITMAP)
 		xfs_rtbitmap_trans_join(tp);
 }
+
+#ifdef CONFIG_PROVE_LOCKING
+static struct lock_class_key xfs_rtginode_lock_class;
+
+static int
+xfs_rtginode_ilock_cmp_fn(
+	const struct lockdep_map	*m1,
+	const struct lockdep_map	*m2)
+{
+	const struct xfs_inode *ip1 =
+		container_of(m1, struct xfs_inode, i_lock.dep_map);
+	const struct xfs_inode *ip2 =
+		container_of(m2, struct xfs_inode, i_lock.dep_map);
+
+	if (ip1->i_projid < ip2->i_projid)
+		return -1;
+	if (ip1->i_projid > ip2->i_projid)
+		return 1;
+	return 0;
+}
+
+static inline void
+xfs_rtginode_ilock_print_fn(
+	const struct lockdep_map	*m)
+{
+	const struct xfs_inode *ip =
+		container_of(m, struct xfs_inode, i_lock.dep_map);
+
+	printk(KERN_CONT " rgno=%u", ip->i_projid);
+}
+
+/*
+ * Most of the time each of the RTG inode locks is only taken one at a time.
+ * But when committing deferred ops, more than one of a kind can be taken.
+ * However, deferred rt ops will be committed in rgno order so there is no
+ * potential for deadlocks.  The code here is needed to tell lockdep about this
+ * order.
+ */
+static inline void
+xfs_rtginode_lockdep_setup(
+	struct xfs_inode	*ip,
+	xfs_rgnumber_t		rgno,
+	enum xfs_rtg_inodes	type)
+{
+	lockdep_set_class_and_subclass(&ip->i_lock, &xfs_rtginode_lock_class,
+			type);
+	lock_set_cmp_fn(&ip->i_lock, xfs_rtginode_ilock_cmp_fn,
+			xfs_rtginode_ilock_print_fn);
+}
+#else
+#define xfs_rtginode_lockdep_setup(ip, rgno, type)	do { } while (0)
+#endif /* CONFIG_PROVE_LOCKING */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 04/52] xfs: support caching rtgroup metadata inodes
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-12-23 21:59   ` [PATCH 03/52] xfs: add a lockdep class key for rtgroup inodes Darrick J. Wong
@ 2024-12-23 21:59   ` Darrick J. Wong
  2024-12-23 21:59   ` [PATCH 05/52] xfs: add a xfs_bmap_free_rtblocks helper Darrick J. Wong
                     ` (47 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:59 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 65b1231b8cea7fbe7362dceecfda76026d335536

Create the necessary per-rtgroup infrastructure that we need to load
metadata inodes into memory.
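
As a rough userspace sketch of the lookup scheme (an ops table indexed
by inode type, plus a "<rgno>.<name>" path under the rtgroups
directory), consider the following.  The demo_* names and the "bitmap"
and "summary" entries are assumptions for illustration only; this patch
leaves the real table empty, and the real helper allocates the path
with kasprintf rather than using a caller-supplied buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical inode types; not the contents of enum xfs_rtg_inodes. */
enum demo_rtg_inodes {
	DEMO_RTGI_BITMAP,
	DEMO_RTGI_SUMMARY,
	DEMO_RTGI_MAX,
};

struct demo_rtginode_ops {
	const char	*name;			/* short name */
	int		(*enabled)(void);	/* NULL means always enabled */
};

/* Table entries are illustrative assumptions. */
static const struct demo_rtginode_ops demo_ops[DEMO_RTGI_MAX] = {
	[DEMO_RTGI_BITMAP]	= { .name = "bitmap" },
	[DEMO_RTGI_SUMMARY]	= { .name = "summary" },
};

/* Should this inode type be present?  A missing hook means yes. */
int
demo_rtginode_enabled(enum demo_rtg_inodes type)
{
	assert(type < DEMO_RTGI_MAX);

	if (!demo_ops[type].enabled)
		return 1;
	return demo_ops[type].enabled();
}

/* Format "<rgno>.<name>" into buf; returns 0, or -1 if it does not fit. */
int
demo_rtginode_path(char *buf, size_t len, unsigned int rgno,
		enum demo_rtg_inodes type)
{
	int	n;

	assert(type < DEMO_RTGI_MAX);

	n = snprintf(buf, len, "%u.%s", rgno, demo_ops[type].name);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With these assumed names, group 7's summary file would be looked up as
"7.summary".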

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_mount.h      |    1 
 libxfs/init.c            |   11 ++++
 libxfs/libxfs_api_defs.h |    1 
 libxfs/xfs_rtgroup.c     |  123 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtgroup.h     |   27 ++++++++++
 5 files changed, 162 insertions(+), 1 deletion(-)


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 987ffea19d1586..14d640b1511cab 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -93,6 +93,7 @@ typedef struct xfs_mount {
 	struct xfs_inode	*m_rbmip;	/* pointer to bitmap inode */
 	struct xfs_inode	*m_rsumip;	/* pointer to summary inode */
 	struct xfs_inode	*m_metadirip;	/* ptr to metadata directory */
+	struct xfs_inode	*m_rtdirip;	/* ptr to realtime metadir */
 	struct xfs_buftarg	*m_ddev_targp;
 	struct xfs_buftarg	*m_logdev_targp;
 	struct xfs_buftarg	*m_rtdev_targp;
diff --git a/libxfs/init.c b/libxfs/init.c
index b47d8eaf4f468c..1a8084108f0778 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -846,8 +846,17 @@ libxfs_mount(
 }
 
 void
-libxfs_rtmount_destroy(xfs_mount_t *mp)
+libxfs_rtmount_destroy(
+	struct xfs_mount	*mp)
 {
+	struct xfs_rtgroup	*rtg = NULL;
+	unsigned int		i;
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		for (i = 0; i < XFS_RTGI_MAX; i++)
+			libxfs_rtginode_irele(&rtg->rtg_inodes[i]);
+	}
+	libxfs_rtginode_irele(&mp->m_rtdirip);
 	if (mp->m_rsumip)
 		libxfs_irele(mp->m_rsumip);
 	if (mp->m_rbmip)
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index c4f42754c311fc..83e31a507aca6f 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -277,6 +277,7 @@
 #define xfs_rtbitmap_getword		libxfs_rtbitmap_getword
 #define xfs_rtbitmap_setword		libxfs_rtbitmap_setword
 #define xfs_rtbitmap_wordcount		libxfs_rtbitmap_wordcount
+#define xfs_rtginode_irele		libxfs_rtginode_irele
 
 #define xfs_rtgroup_alloc		libxfs_rtgroup_alloc
 
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index ece52626584200..09dd0c2ed8769c 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -28,6 +28,8 @@
 #include "xfs_inode.h"
 #include "xfs_rtgroup.h"
 #include "xfs_rtbitmap.h"
+#include "xfs_metafile.h"
+#include "xfs_metadir.h"
 
 int
 xfs_rtgroup_alloc(
@@ -248,3 +250,124 @@ xfs_rtginode_lockdep_setup(
 #else
 #define xfs_rtginode_lockdep_setup(ip, rgno, type)	do { } while (0)
 #endif /* CONFIG_PROVE_LOCKING */
+
+struct xfs_rtginode_ops {
+	const char		*name;	/* short name */
+
+	enum xfs_metafile_type	metafile_type;
+
+	/* Does the fs have this feature? */
+	bool			(*enabled)(struct xfs_mount *mp);
+};
+
+static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
+};
+
+/* Return the shortname of this rtgroup inode. */
+const char *
+xfs_rtginode_name(
+	enum xfs_rtg_inodes	type)
+{
+	return xfs_rtginode_ops[type].name;
+}
+
+/* Return the metafile type of this rtgroup inode. */
+enum xfs_metafile_type
+xfs_rtginode_metafile_type(
+	enum xfs_rtg_inodes	type)
+{
+	return xfs_rtginode_ops[type].metafile_type;
+}
+
+/* Should this rtgroup inode be present? */
+bool
+xfs_rtginode_enabled(
+	struct xfs_rtgroup	*rtg,
+	enum xfs_rtg_inodes	type)
+{
+	const struct xfs_rtginode_ops *ops = &xfs_rtginode_ops[type];
+
+	if (!ops->enabled)
+		return true;
+	return ops->enabled(rtg_mount(rtg));
+}
+
+/* Load an existing rtgroup inode into the rtgroup structure. */
+int
+xfs_rtginode_load(
+	struct xfs_rtgroup	*rtg,
+	enum xfs_rtg_inodes	type,
+	struct xfs_trans	*tp)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	const char		*path;
+	struct xfs_inode	*ip;
+	const struct xfs_rtginode_ops *ops = &xfs_rtginode_ops[type];
+	int			error;
+
+	if (!xfs_rtginode_enabled(rtg, type))
+		return 0;
+
+	if (!mp->m_rtdirip)
+		return -EFSCORRUPTED;
+
+	path = xfs_rtginode_path(rtg_rgno(rtg), type);
+	if (!path)
+		return -ENOMEM;
+	error = xfs_metadir_load(tp, mp->m_rtdirip, path, ops->metafile_type,
+			&ip);
+	kfree(path);
+
+	if (error)
+		return error;
+
+	if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS &&
+			       ip->i_df.if_format != XFS_DINODE_FMT_BTREE)) {
+		xfs_irele(ip);
+		return -EFSCORRUPTED;
+	}
+
+	if (XFS_IS_CORRUPT(mp, ip->i_projid != rtg_rgno(rtg))) {
+		xfs_irele(ip);
+		return -EFSCORRUPTED;
+	}
+
+	xfs_rtginode_lockdep_setup(ip, rtg_rgno(rtg), type);
+	rtg->rtg_inodes[type] = ip;
+	return 0;
+}
+
+/* Release an rtgroup metadata inode. */
+void
+xfs_rtginode_irele(
+	struct xfs_inode	**ipp)
+{
+	if (*ipp)
+		xfs_irele(*ipp);
+	*ipp = NULL;
+}
+
+/* Create the parent directory for all rtgroup inodes and load it. */
+int
+xfs_rtginode_mkdir_parent(
+	struct xfs_mount	*mp)
+{
+	if (!mp->m_metadirip)
+		return -EFSCORRUPTED;
+
+	return xfs_metadir_mkdir(mp->m_metadirip, "rtgroups", &mp->m_rtdirip);
+}
+
+/* Load the parent directory of all rtgroup inodes. */
+int
+xfs_rtginode_load_parent(
+	struct xfs_trans	*tp)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+
+	if (!mp->m_metadirip)
+		return -EFSCORRUPTED;
+
+	return xfs_metadir_load(tp, mp->m_metadirip, "rtgroups",
+			XFS_METAFILE_DIR, &mp->m_rtdirip);
+}
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 7d82eb753fd097..2c894df723a786 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -11,12 +11,23 @@
 struct xfs_mount;
 struct xfs_trans;
 
+enum xfs_rtg_inodes {
+	XFS_RTGI_MAX,
+};
+
+#ifdef MAX_LOCKDEP_SUBCLASSES
+static_assert(XFS_RTGI_MAX <= MAX_LOCKDEP_SUBCLASSES);
+#endif
+
 /*
  * Realtime group incore structure, similar to the per-AG structure.
  */
 struct xfs_rtgroup {
 	struct xfs_group	rtg_group;
 
+	/* per-rtgroup metadata inodes */
+	struct xfs_inode	*rtg_inodes[1 /* hack */];
+
 	/* Number of blocks in this group */
 	xfs_rtxnum_t		rtg_extents;
 };
@@ -210,6 +221,22 @@ void xfs_rtgroup_lock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
 void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
 void xfs_rtgroup_trans_join(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
 		unsigned int rtglock_flags);
+
+int xfs_rtginode_mkdir_parent(struct xfs_mount *mp);
+int xfs_rtginode_load_parent(struct xfs_trans *tp);
+
+const char *xfs_rtginode_name(enum xfs_rtg_inodes type);
+enum xfs_metafile_type xfs_rtginode_metafile_type(enum xfs_rtg_inodes type);
+bool xfs_rtginode_enabled(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type);
+int xfs_rtginode_load(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type,
+		struct xfs_trans *tp);
+void xfs_rtginode_irele(struct xfs_inode **ipp);
+
+static inline const char *xfs_rtginode_path(xfs_rgnumber_t rgno,
+		enum xfs_rtg_inodes type)
+{
+	return kasprintf(GFP_KERNEL, "%u.%s", rgno, xfs_rtginode_name(type));
+}
 #else
 static inline void xfs_free_rtgroups(struct xfs_mount *mp,
 		xfs_rgnumber_t first_rgno, xfs_rgnumber_t end_rgno)


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 05/52] xfs: add a xfs_bmap_free_rtblocks helper
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-12-23 21:59   ` [PATCH 04/52] xfs: support caching rtgroup metadata inodes Darrick J. Wong
@ 2024-12-23 21:59   ` Darrick J. Wong
  2024-12-23 21:59   ` [PATCH 06/52] xfs: move RT bitmap and summary information to the rtgroup Darrick J. Wong
                     ` (46 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:59 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 9c3cfb9c96eee7f1656ef165e1471e1778510f6f

Split the RT extent freeing logic from xfs_bmap_del_extent_real because
it will become more complicated when adding RT groups.
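
The pattern the helper encapsulates (take the rt metadata locks the
first time a transaction frees rt space, remember that in a transaction
flag, then free the range) can be sketched as below.  The demo_*
structure, flag and counters are hypothetical bookkeeping, not the
libxfs types.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TRANS_RT_LOCKED	(1U << 0)	/* hypothetical flag */

/* Hypothetical transaction with just enough state for the sketch. */
struct demo_trans {
	unsigned int	flags;
	unsigned int	lock_calls;	/* times the lock was really taken */
	unsigned long	freed_blocks;
};

/* Take the (pretend) rt metadata lock only once per transaction. */
void
demo_trans_lock_rt_once(struct demo_trans *tp)
{
	if (tp->flags & DEMO_TRANS_RT_LOCKED)
		return;

	tp->flags |= DEMO_TRANS_RT_LOCKED;
	tp->lock_calls++;
}

/* Helper split out of the caller: lock if needed, then free the range. */
int
demo_free_rtblocks(struct demo_trans *tp, unsigned long start,
		unsigned long count)
{
	assert(tp != NULL);

	demo_trans_lock_rt_once(tp);
	(void)start;		/* a real implementation would clear bits here */
	tp->freed_blocks += count;
	return 0;
}
```

Calling demo_free_rtblocks twice on one transaction takes the lock
once, which is the property the transaction flag exists to preserve.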

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c |   33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 99d53f9383a49a..949546f3eba470 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -5110,6 +5110,27 @@ xfs_bmap_del_extent_cow(
 	ip->i_delayed_blks -= del->br_blockcount;
 }
 
+static int
+xfs_bmap_free_rtblocks(
+	struct xfs_trans	*tp,
+	struct xfs_bmbt_irec	*del)
+{
+	int			error;
+
+	/*
+	 * Ensure the bitmap and summary inodes are locked and joined to the
+	 * transaction before modifying them.
+	 */
+	if (!(tp->t_flags & XFS_TRANS_RTBITMAP_LOCKED)) {
+		tp->t_flags |= XFS_TRANS_RTBITMAP_LOCKED;
+		xfs_rtbitmap_lock(tp->t_mountp);
+		xfs_rtbitmap_trans_join(tp);
+	}
+
+	error = xfs_rtfree_blocks(tp, del->br_startblock, del->br_blockcount);
+	return error;
+}
+
 /*
  * Called by xfs_bmapi to update file extent records and the btree
  * after removing space.
@@ -5325,17 +5346,7 @@ xfs_bmap_del_extent_real(
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 			xfs_refcount_decrease_extent(tp, del);
 		} else if (xfs_ifork_is_realtime(ip, whichfork)) {
-			/*
-			 * Ensure the bitmap and summary inodes are locked
-			 * and joined to the transaction before modifying them.
-			 */
-			if (!(tp->t_flags & XFS_TRANS_RTBITMAP_LOCKED)) {
-				tp->t_flags |= XFS_TRANS_RTBITMAP_LOCKED;
-				xfs_rtbitmap_lock(mp);
-				xfs_rtbitmap_trans_join(tp);
-			}
-			error = xfs_rtfree_blocks(tp, del->br_startblock,
-					del->br_blockcount);
+			error = xfs_bmap_free_rtblocks(tp, del);
 		} else {
 			unsigned int	efi_flags = 0;
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 06/52] xfs: move RT bitmap and summary information to the rtgroup
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-12-23 21:59   ` [PATCH 05/52] xfs: add a xfs_bmap_free_rtblocks helper Darrick J. Wong
@ 2024-12-23 21:59   ` Darrick J. Wong
  2024-12-23 22:00   ` [PATCH 07/52] xfs: support creating per-RTG files in growfs Darrick J. Wong
                     ` (45 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:59 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: e3088ae2dcae3c15d03d7970d4926c8095fd8c7c

Move the pointers to the RT bitmap and summary inodes as well as the
summary cache to the rtgroups structure to prepare for having
separate bitmap and summary inodes for each rtgroup.

Code using the inodes now needs to operate on an rtgroup.  Where this is
easy, such code is converted to iterate over all rtgroups; otherwise
rtgroup 0 (the only one that can currently exist) is hardcoded.
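
The conversion from "one filesystem-wide field" to "walk every group"
follows the iterator shape sketched below.  This is a simplified model
under stated assumptions: the demo_* types are hypothetical, groups live
in a fixed array, and the reference counting that a real group iterator
would need is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_GROUPS	4

/* Hypothetical per-group state, loosely modelled on rtg_extents. */
struct demo_group {
	unsigned int	rgno;
	unsigned long	extents;
};

struct demo_mount {
	unsigned int		rgcount;
	struct demo_group	groups[DEMO_MAX_GROUPS];
};

/* Return the group after prev, or the first group if prev is NULL. */
struct demo_group *
demo_group_next(struct demo_mount *mp, struct demo_group *prev)
{
	unsigned int	next = prev ? prev->rgno + 1 : 0;

	assert(mp->rgcount <= DEMO_MAX_GROUPS);

	if (next >= mp->rgcount)
		return NULL;
	return &mp->groups[next];
}

/* Walk every group instead of consulting one filesystem-wide field. */
unsigned long
demo_total_extents(struct demo_mount *mp)
{
	struct demo_group	*grp = NULL;
	unsigned long		total = 0;

	while ((grp = demo_group_next(mp, grp)))
		total += grp->extents;
	return total;
}
```

With rgcount set to 1 the loop visits only group 0, which matches the
single-group case that the hardcoded call sites assume.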

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_mount.h      |    9 --
 libxfs/init.c            |    7 --
 libxfs/libxfs_api_defs.h |   10 ++
 libxfs/xfs_bmap.c        |   13 ++-
 libxfs/xfs_rtbitmap.c    |  142 +++++++++++---------------------
 libxfs/xfs_rtbitmap.h    |   64 +++++----------
 libxfs/xfs_rtgroup.c     |   80 ++++++++++++++----
 libxfs/xfs_rtgroup.h     |   14 +++
 mkfs/proto.c             |   33 +++++--
 repair/phase6.c          |  203 +++++++++++++++++++++++-----------------------
 repair/rt.c              |   34 +-------
 repair/rt.h              |    4 -
 12 files changed, 294 insertions(+), 319 deletions(-)


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 14d640b1511cab..7406825cf7f57a 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -83,15 +83,6 @@ typedef struct xfs_mount {
 	uint			m_rsumlevels;	/* rt summary levels */
 	xfs_filblks_t		m_rsumblocks;	/* size of rt summary, FSBs */
 	uint32_t		m_rgblocks;	/* size of rtgroup in rtblocks */
-	/*
-	 * Optional cache of rt summary level per bitmap block with the
-	 * invariant that m_rsum_cache[bbno] <= the minimum i for which
-	 * rsum[i][bbno] != 0. Reads and writes are serialized by the rsumip
-	 * inode lock.
-	 */
-	uint8_t			*m_rsum_cache;
-	struct xfs_inode	*m_rbmip;	/* pointer to bitmap inode */
-	struct xfs_inode	*m_rsumip;	/* pointer to summary inode */
 	struct xfs_inode	*m_metadirip;	/* ptr to metadata directory */
 	struct xfs_inode	*m_rtdirip;	/* ptr to realtime metadir */
 	struct xfs_buftarg	*m_ddev_targp;
diff --git a/libxfs/init.c b/libxfs/init.c
index 1a8084108f0778..5ec01537faac6b 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -326,7 +326,6 @@ rtmount_init(
 	mp->m_rsumlevels = mp->m_sb.sb_rextslog + 1;
 	mp->m_rsumblocks = xfs_rtsummary_blockcount(mp, mp->m_rsumlevels,
 			mp->m_sb.sb_rbmblocks);
-	mp->m_rbmip = mp->m_rsumip = NULL;
 
 	/*
 	 * Allow debugger to be run without the realtime device present.
@@ -855,13 +854,9 @@ libxfs_rtmount_destroy(
 	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
 		for (i = 0; i < XFS_RTGI_MAX; i++)
 			libxfs_rtginode_irele(&rtg->rtg_inodes[i]);
+		kvfree(rtg->rtg_rsum_cache);
 	}
 	libxfs_rtginode_irele(&mp->m_rtdirip);
-	if (mp->m_rsumip)
-		libxfs_irele(mp->m_rsumip);
-	if (mp->m_rbmip)
-		libxfs_irele(mp->m_rbmip);
-	mp->m_rsumip = mp->m_rbmip = NULL;
 }
 
 /* Flush a device and report on writes that didn't make it to stable storage. */
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 83e31a507aca6f..c869a4b0a46551 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -274,12 +274,22 @@
 #define xfs_rmap_query_all		libxfs_rmap_query_all
 #define xfs_rmap_query_range		libxfs_rmap_query_range
 
+#define xfs_rtbitmap_create		libxfs_rtbitmap_create
 #define xfs_rtbitmap_getword		libxfs_rtbitmap_getword
 #define xfs_rtbitmap_setword		libxfs_rtbitmap_setword
 #define xfs_rtbitmap_wordcount		libxfs_rtbitmap_wordcount
+#define xfs_rtginode_create		libxfs_rtginode_create
 #define xfs_rtginode_irele		libxfs_rtginode_irele
+#define xfs_rtginode_load		libxfs_rtginode_load
+#define xfs_rtginode_load_parent	libxfs_rtginode_load_parent
+#define xfs_rtginode_metafile_type	libxfs_rtginode_metafile_type
+#define xfs_rtginode_mkdir_parent	libxfs_rtginode_mkdir_parent
+#define xfs_rtginode_name		libxfs_rtginode_name
+#define xfs_rtsummary_create		libxfs_rtsummary_create
 
 #define xfs_rtgroup_alloc		libxfs_rtgroup_alloc
+#define xfs_rtgroup_grab		libxfs_rtgroup_grab
+#define xfs_rtgroup_rele		libxfs_rtgroup_rele
 
 #define xfs_suminfo_add			libxfs_suminfo_add
 #define xfs_suminfo_get			libxfs_suminfo_get
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 949546f3eba470..cebf5479189280 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -5115,19 +5115,26 @@ xfs_bmap_free_rtblocks(
 	struct xfs_trans	*tp,
 	struct xfs_bmbt_irec	*del)
 {
+	struct xfs_rtgroup	*rtg;
 	int			error;
 
+	rtg = xfs_rtgroup_grab(tp->t_mountp, 0);
+	if (!rtg)
+		return -EIO;
+
 	/*
 	 * Ensure the bitmap and summary inodes are locked and joined to the
 	 * transaction before modifying them.
 	 */
 	if (!(tp->t_flags & XFS_TRANS_RTBITMAP_LOCKED)) {
 		tp->t_flags |= XFS_TRANS_RTBITMAP_LOCKED;
-		xfs_rtbitmap_lock(tp->t_mountp);
-		xfs_rtbitmap_trans_join(tp);
+		xfs_rtgroup_lock(rtg, XFS_RTGLOCK_BITMAP);
+		xfs_rtgroup_trans_join(tp, rtg, XFS_RTGLOCK_BITMAP);
 	}
 
-	error = xfs_rtfree_blocks(tp, del->br_startblock, del->br_blockcount);
+	error = xfs_rtfree_blocks(tp, rtg, del->br_startblock,
+			del->br_blockcount);
+	xfs_rtgroup_rele(rtg);
 	return error;
 }
 
diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index cff3030d1662b7..e4d0646ba19bfd 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -88,12 +88,12 @@ xfs_rtbuf_get(
 	if (issum) {
 		cbpp = &args->sumbp;
 		coffp = &args->sumoff;
-		ip = mp->m_rsumip;
+		ip = args->rtg->rtg_inodes[XFS_RTGI_SUMMARY];
 		type = XFS_BLFT_RTSUMMARY_BUF;
 	} else {
 		cbpp = &args->rbmbp;
 		coffp = &args->rbmoff;
-		ip = mp->m_rbmip;
+		ip = args->rtg->rtg_inodes[XFS_RTGI_BITMAP];
 		type = XFS_BLFT_RTBITMAP_BUF;
 	}
 
@@ -501,6 +501,7 @@ xfs_rtmodify_summary(
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_rtsumoff_t		so = xfs_rtsumoffs(mp, log, bbno);
+	uint8_t			*rsum_cache = args->rtg->rtg_rsum_cache;
 	unsigned int		infoword;
 	xfs_suminfo_t		val;
 	int			error;
@@ -512,11 +513,11 @@ xfs_rtmodify_summary(
 	infoword = xfs_rtsumoffs_to_infoword(mp, so);
 	val = xfs_suminfo_add(args, infoword, delta);
 
-	if (mp->m_rsum_cache) {
-		if (val == 0 && log + 1 == mp->m_rsum_cache[bbno])
-			mp->m_rsum_cache[bbno] = log;
-		if (val != 0 && log >= mp->m_rsum_cache[bbno])
-			mp->m_rsum_cache[bbno] = log + 1;
+	if (rsum_cache) {
+		if (val == 0 && log + 1 == rsum_cache[bbno])
+			rsum_cache[bbno] = log;
+		if (val != 0 && log >= rsum_cache[bbno])
+			rsum_cache[bbno] = log + 1;
 	}
 
 	xfs_trans_log_rtsummary(args, infoword);
@@ -735,7 +736,7 @@ xfs_rtfree_range(
 	/*
 	 * Find the next allocated block (end of allocated extent).
 	 */
-	error = xfs_rtfind_forw(args, end, mp->m_sb.sb_rextents - 1,
+	error = xfs_rtfind_forw(args, end, args->rtg->rtg_extents - 1,
 			&postblock);
 	if (error)
 		return error;
@@ -959,19 +960,22 @@ xfs_rtcheck_alloc_range(
 int
 xfs_rtfree_extent(
 	struct xfs_trans	*tp,	/* transaction pointer */
+	struct xfs_rtgroup	*rtg,
 	xfs_rtxnum_t		start,	/* starting rtext number to free */
 	xfs_rtxlen_t		len)	/* length of extent freed */
 {
 	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_inode	*rbmip = rtg->rtg_inodes[XFS_RTGI_BITMAP];
 	struct xfs_rtalloc_args	args = {
 		.mp		= mp,
 		.tp		= tp,
+		.rtg		= rtg,
 	};
 	int			error;
 	struct timespec64	atime;
 
-	ASSERT(mp->m_rbmip->i_itemp != NULL);
-	xfs_assert_ilocked(mp->m_rbmip, XFS_ILOCK_EXCL);
+	ASSERT(rbmip->i_itemp != NULL);
+	xfs_assert_ilocked(rbmip, XFS_ILOCK_EXCL);
 
 	error = xfs_rtcheck_alloc_range(&args, start, len);
 	if (error)
@@ -994,13 +998,13 @@ xfs_rtfree_extent(
 	 */
 	if (tp->t_frextents_delta + mp->m_sb.sb_frextents ==
 	    mp->m_sb.sb_rextents) {
-		if (!(mp->m_rbmip->i_diflags & XFS_DIFLAG_NEWRTBM))
-			mp->m_rbmip->i_diflags |= XFS_DIFLAG_NEWRTBM;
+		if (!(rbmip->i_diflags & XFS_DIFLAG_NEWRTBM))
+			rbmip->i_diflags |= XFS_DIFLAG_NEWRTBM;
 
-		atime = inode_get_atime(VFS_I(mp->m_rbmip));
+		atime = inode_get_atime(VFS_I(rbmip));
 		atime.tv_sec = 0;
-		inode_set_atime_to_ts(VFS_I(mp->m_rbmip), atime);
-		xfs_trans_log_inode(tp, mp->m_rbmip, XFS_ILOG_CORE);
+		inode_set_atime_to_ts(VFS_I(rbmip), atime);
+		xfs_trans_log_inode(tp, rbmip, XFS_ILOG_CORE);
 	}
 	error = 0;
 out:
@@ -1016,6 +1020,7 @@ xfs_rtfree_extent(
 int
 xfs_rtfree_blocks(
 	struct xfs_trans	*tp,
+	struct xfs_rtgroup	*rtg,
 	xfs_fsblock_t		rtbno,
 	xfs_filblks_t		rtlen)
 {
@@ -1036,21 +1041,23 @@ xfs_rtfree_blocks(
 		return -EIO;
 	}
 
-	return xfs_rtfree_extent(tp, xfs_rtb_to_rtx(mp, rtbno),
-			xfs_rtb_to_rtx(mp, rtlen));
+	return xfs_rtfree_extent(tp, rtg, xfs_rtb_to_rtx(mp, rtbno),
+			xfs_extlen_to_rtxlen(mp, rtlen));
 }
 
 /* Find all the free records within a given range. */
 int
 xfs_rtalloc_query_range(
-	struct xfs_mount		*mp,
+	struct xfs_rtgroup		*rtg,
 	struct xfs_trans		*tp,
 	xfs_rtxnum_t			start,
 	xfs_rtxnum_t			end,
 	xfs_rtalloc_query_range_fn	fn,
 	void				*priv)
 {
+	struct xfs_mount		*mp = rtg_mount(rtg);
 	struct xfs_rtalloc_args		args = {
+		.rtg			= rtg,
 		.mp			= mp,
 		.tp			= tp,
 	};
@@ -1058,10 +1065,10 @@ xfs_rtalloc_query_range(
 
 	if (start > end)
 		return -EINVAL;
-	if (start == end || start >= mp->m_sb.sb_rextents)
+	if (start == end || start >= rtg->rtg_extents)
 		return 0;
 
-	end = min(end, mp->m_sb.sb_rextents - 1);
+	end = min(end, rtg->rtg_extents - 1);
 
 	/* Iterate the bitmap, looking for discrepancies. */
 	while (start <= end) {
@@ -1084,7 +1091,7 @@ xfs_rtalloc_query_range(
 			rec.ar_startext = start;
 			rec.ar_extcount = rtend - start + 1;
 
-			error = fn(mp, tp, &rec, priv);
+			error = fn(rtg, tp, &rec, priv);
 			if (error)
 				break;
 		}
@@ -1099,26 +1106,27 @@ xfs_rtalloc_query_range(
 /* Find all the free records. */
 int
 xfs_rtalloc_query_all(
-	struct xfs_mount		*mp,
+	struct xfs_rtgroup		*rtg,
 	struct xfs_trans		*tp,
 	xfs_rtalloc_query_range_fn	fn,
 	void				*priv)
 {
-	return xfs_rtalloc_query_range(mp, tp, 0, mp->m_sb.sb_rextents - 1, fn,
+	return xfs_rtalloc_query_range(rtg, tp, 0, rtg->rtg_extents - 1, fn,
 			priv);
 }
 
 /* Is the given extent all free? */
 int
 xfs_rtalloc_extent_is_free(
-	struct xfs_mount		*mp,
+	struct xfs_rtgroup		*rtg,
 	struct xfs_trans		*tp,
 	xfs_rtxnum_t			start,
 	xfs_rtxlen_t			len,
 	bool				*is_free)
 {
 	struct xfs_rtalloc_args		args = {
-		.mp			= mp,
+		.mp			= rtg_mount(rtg),
+		.rtg			= rtg,
 		.tp			= tp,
 	};
 	xfs_rtxnum_t			end;
@@ -1159,65 +1167,6 @@ xfs_rtsummary_blockcount(
 	return XFS_B_TO_FSB(mp, rsumwords << XFS_WORDLOG);
 }
 
-/* Lock both realtime free space metadata inodes for a freespace update. */
-void
-xfs_rtbitmap_lock(
-	struct xfs_mount	*mp)
-{
-	xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP);
-	xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
-}
-
-/*
- * Join both realtime free space metadata inodes to the transaction.  The
- * ILOCKs will be released on transaction commit.
- */
-void
-xfs_rtbitmap_trans_join(
-	struct xfs_trans	*tp)
-{
-	xfs_trans_ijoin(tp, tp->t_mountp->m_rbmip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, tp->t_mountp->m_rsumip, XFS_ILOCK_EXCL);
-}
-
-/* Unlock both realtime free space metadata inodes after a freespace update. */
-void
-xfs_rtbitmap_unlock(
-	struct xfs_mount	*mp)
-{
-	xfs_iunlock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
-	xfs_iunlock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP);
-}
-
-/*
- * Lock the realtime free space metadata inodes for a freespace scan.  Callers
- * must walk metadata blocks in order of increasing file offset.
- */
-void
-xfs_rtbitmap_lock_shared(
-	struct xfs_mount	*mp,
-	unsigned int		rbmlock_flags)
-{
-	if (rbmlock_flags & XFS_RBMLOCK_BITMAP)
-		xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
-
-	if (rbmlock_flags & XFS_RBMLOCK_SUMMARY)
-		xfs_ilock(mp->m_rsumip, XFS_ILOCK_SHARED | XFS_ILOCK_RTSUM);
-}
-
-/* Unlock the realtime free space metadata inodes after a freespace scan. */
-void
-xfs_rtbitmap_unlock_shared(
-	struct xfs_mount	*mp,
-	unsigned int		rbmlock_flags)
-{
-	if (rbmlock_flags & XFS_RBMLOCK_SUMMARY)
-		xfs_iunlock(mp->m_rsumip, XFS_ILOCK_SHARED | XFS_ILOCK_RTSUM);
-
-	if (rbmlock_flags & XFS_RBMLOCK_BITMAP)
-		xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
-}
-
 static int
 xfs_rtfile_alloc_blocks(
 	struct xfs_inode	*ip,
@@ -1258,21 +1207,25 @@ xfs_rtfile_alloc_blocks(
 /* Get a buffer for the block. */
 static int
 xfs_rtfile_initialize_block(
-	struct xfs_inode	*ip,
+	struct xfs_rtgroup	*rtg,
+	enum xfs_rtg_inodes	type,
 	xfs_fsblock_t		fsbno,
 	void			*data)
 {
-	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_inode	*ip = rtg->rtg_inodes[type];
 	struct xfs_trans	*tp;
 	struct xfs_buf		*bp;
 	const size_t		copylen = mp->m_blockwsize << XFS_WORDLOG;
 	enum xfs_blft		buf_type;
 	int			error;
 
-	if (ip == mp->m_rsumip)
-		buf_type = XFS_BLFT_RTSUMMARY_BUF;
-	else
+	if (type == XFS_RTGI_BITMAP)
 		buf_type = XFS_BLFT_RTBITMAP_BUF;
+	else if (type == XFS_RTGI_SUMMARY)
+		buf_type = XFS_BLFT_RTSUMMARY_BUF;
+	else
+		return -EINVAL;
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtzero, 0, 0, 0, &tp);
 	if (error)
@@ -1304,12 +1257,13 @@ xfs_rtfile_initialize_block(
  */
 int
 xfs_rtfile_initialize_blocks(
-	struct xfs_inode	*ip,		/* inode (bitmap/summary) */
+	struct xfs_rtgroup	*rtg,
+	enum xfs_rtg_inodes	type,
 	xfs_fileoff_t		offset_fsb,	/* offset to start from */
 	xfs_fileoff_t		end_fsb,	/* offset to allocate to */
 	void			*data)		/* data to fill the blocks */
 {
-	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_mount	*mp = rtg_mount(rtg);
 	const size_t		copylen = mp->m_blockwsize << XFS_WORDLOG;
 
 	while (offset_fsb < end_fsb) {
@@ -1317,8 +1271,8 @@ xfs_rtfile_initialize_blocks(
 		xfs_filblks_t		i;
 		int			error;
 
-		error = xfs_rtfile_alloc_blocks(ip, offset_fsb,
-				end_fsb - offset_fsb, &map);
+		error = xfs_rtfile_alloc_blocks(rtg->rtg_inodes[type],
+				offset_fsb, end_fsb - offset_fsb, &map);
 		if (error)
 			return error;
 
@@ -1328,7 +1282,7 @@ xfs_rtfile_initialize_blocks(
 		 * Do this one block per transaction, to keep it simple.
 		 */
 		for (i = 0; i < map.br_blockcount; i++) {
-			error = xfs_rtfile_initialize_block(ip,
+			error = xfs_rtfile_initialize_block(rtg, type,
 					map.br_startblock + i, data);
 			if (error)
 				return error;
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index 140513d1d6bcf1..b3cbc56aa255ed 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -6,7 +6,10 @@
 #ifndef __XFS_RTBITMAP_H__
 #define	__XFS_RTBITMAP_H__
 
+#include "xfs_rtgroup.h"
+
 struct xfs_rtalloc_args {
+	struct xfs_rtgroup	*rtg;
 	struct xfs_mount	*mp;
 	struct xfs_trans	*tp;
 
@@ -268,7 +271,7 @@ struct xfs_rtalloc_rec {
 };
 
 typedef int (*xfs_rtalloc_query_range_fn)(
-	struct xfs_mount		*mp,
+	struct xfs_rtgroup		*rtg,
 	struct xfs_trans		*tp,
 	const struct xfs_rtalloc_rec	*rec,
 	void				*priv);
@@ -291,53 +294,37 @@ int xfs_rtmodify_summary(struct xfs_rtalloc_args *args, int log,
 		xfs_fileoff_t bbno, int delta);
 int xfs_rtfree_range(struct xfs_rtalloc_args *args, xfs_rtxnum_t start,
 		xfs_rtxlen_t len);
-int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp,
+int xfs_rtalloc_query_range(struct xfs_rtgroup *rtg, struct xfs_trans *tp,
 		xfs_rtxnum_t start, xfs_rtxnum_t end,
 		xfs_rtalloc_query_range_fn fn, void *priv);
-int xfs_rtalloc_query_all(struct xfs_mount *mp, struct xfs_trans *tp,
-			  xfs_rtalloc_query_range_fn fn,
-			  void *priv);
-int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp,
-			       xfs_rtxnum_t start, xfs_rtxlen_t len,
-			       bool *is_free);
-/*
- * Free an extent in the realtime subvolume.  Length is expressed in
- * realtime extents, as is the block number.
- */
-int					/* error */
-xfs_rtfree_extent(
-	struct xfs_trans	*tp,	/* transaction pointer */
-	xfs_rtxnum_t		start,	/* starting rtext number to free */
-	xfs_rtxlen_t		len);	/* length of extent freed */
-
+int xfs_rtalloc_query_all(struct xfs_rtgroup *rtg, struct xfs_trans *tp,
+		xfs_rtalloc_query_range_fn fn, void *priv);
+int xfs_rtalloc_extent_is_free(struct xfs_rtgroup *rtg, struct xfs_trans *tp,
+		xfs_rtxnum_t start, xfs_rtxlen_t len, bool *is_free);
+int xfs_rtfree_extent(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
+		xfs_rtxnum_t start, xfs_rtxlen_t len);
 /* Same as above, but in units of rt blocks. */
-int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno,
-		xfs_filblks_t rtlen);
+int xfs_rtfree_blocks(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
+		xfs_fsblock_t rtbno, xfs_filblks_t rtlen);
 
 xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t
 		rtextents);
 xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp,
 		unsigned int rsumlevels, xfs_extlen_t rbmblocks);
 
-int xfs_rtfile_initialize_blocks(struct xfs_inode *ip,
-		xfs_fileoff_t offset_fsb, xfs_fileoff_t end_fsb, void *data);
+int xfs_rtfile_initialize_blocks(struct xfs_rtgroup *rtg,
+		enum xfs_rtg_inodes type, xfs_fileoff_t offset_fsb,
+		xfs_fileoff_t end_fsb, void *data);
 
-void xfs_rtbitmap_lock(struct xfs_mount *mp);
-void xfs_rtbitmap_unlock(struct xfs_mount *mp);
-void xfs_rtbitmap_trans_join(struct xfs_trans *tp);
-
-/* Lock the rt bitmap inode in shared mode */
-#define XFS_RBMLOCK_BITMAP	(1U << 0)
-/* Lock the rt summary inode in shared mode */
-#define XFS_RBMLOCK_SUMMARY	(1U << 1)
-
-void xfs_rtbitmap_lock_shared(struct xfs_mount *mp,
-		unsigned int rbmlock_flags);
-void xfs_rtbitmap_unlock_shared(struct xfs_mount *mp,
-		unsigned int rbmlock_flags);
 #else /* CONFIG_XFS_RT */
 # define xfs_rtfree_extent(t,b,l)			(-ENOSYS)
-# define xfs_rtfree_blocks(t,rb,rl)			(-ENOSYS)
+
+static inline int xfs_rtfree_blocks(struct xfs_trans *tp,
+		struct xfs_rtgroup *rtg, xfs_fsblock_t rtbno,
+		xfs_filblks_t rtlen)
+{
+	return -ENOSYS;
+}
 # define xfs_rtalloc_query_range(m,t,l,h,f,p)		(-ENOSYS)
 # define xfs_rtalloc_query_all(m,t,f,p)			(-ENOSYS)
 # define xfs_rtbitmap_read_buf(a,b)			(-ENOSYS)
@@ -351,11 +338,6 @@ xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents)
 	return 0;
 }
 # define xfs_rtsummary_blockcount(mp, l, b)		(0)
-# define xfs_rtbitmap_lock(mp)			do { } while (0)
-# define xfs_rtbitmap_trans_join(tp)		do { } while (0)
-# define xfs_rtbitmap_unlock(mp)		do { } while (0)
-# define xfs_rtbitmap_lock_shared(mp, lf)	do { } while (0)
-# define xfs_rtbitmap_unlock_shared(mp, lf)	do { } while (0)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __XFS_RTBITMAP_H__ */
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 09dd0c2ed8769c..41c794718e06c9 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -160,10 +160,16 @@ xfs_rtgroup_lock(
 	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) ||
 	       !(rtglock_flags & XFS_RTGLOCK_BITMAP));
 
-	if (rtglock_flags & XFS_RTGLOCK_BITMAP)
-		xfs_rtbitmap_lock(rtg_mount(rtg));
-	else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED)
-		xfs_rtbitmap_lock_shared(rtg_mount(rtg), XFS_RBMLOCK_BITMAP);
+	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
+		/*
+		 * Lock both realtime free space metadata inodes for a freespace
+		 * update.
+		 */
+		xfs_ilock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_EXCL);
+		xfs_ilock(rtg->rtg_inodes[XFS_RTGI_SUMMARY], XFS_ILOCK_EXCL);
+	} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
+		xfs_ilock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_SHARED);
+	}
 }
 
 /* Unlock metadata inodes associated with this rt group. */
@@ -176,10 +182,12 @@ xfs_rtgroup_unlock(
 	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) ||
 	       !(rtglock_flags & XFS_RTGLOCK_BITMAP));
 
-	if (rtglock_flags & XFS_RTGLOCK_BITMAP)
-		xfs_rtbitmap_unlock(rtg_mount(rtg));
-	else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED)
-		xfs_rtbitmap_unlock_shared(rtg_mount(rtg), XFS_RBMLOCK_BITMAP);
+	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
+		xfs_iunlock(rtg->rtg_inodes[XFS_RTGI_SUMMARY], XFS_ILOCK_EXCL);
+		xfs_iunlock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_EXCL);
+	} else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) {
+		xfs_iunlock(rtg->rtg_inodes[XFS_RTGI_BITMAP], XFS_ILOCK_SHARED);
+	}
 }
 
 /*
@@ -195,8 +203,12 @@ xfs_rtgroup_trans_join(
 	ASSERT(!(rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS));
 	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED));
 
-	if (rtglock_flags & XFS_RTGLOCK_BITMAP)
-		xfs_rtbitmap_trans_join(tp);
+	if (rtglock_flags & XFS_RTGLOCK_BITMAP) {
+		xfs_trans_ijoin(tp, rtg->rtg_inodes[XFS_RTGI_BITMAP],
+				XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, rtg->rtg_inodes[XFS_RTGI_SUMMARY],
+				XFS_ILOCK_EXCL);
+	}
 }
 
 #ifdef CONFIG_PROVE_LOCKING
@@ -261,6 +273,14 @@ struct xfs_rtginode_ops {
 };
 
 static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
+	[XFS_RTGI_BITMAP] = {
+		.name		= "bitmap",
+		.metafile_type	= XFS_METAFILE_RTBITMAP,
+	},
+	[XFS_RTGI_SUMMARY] = {
+		.name		= "summary",
+		.metafile_type	= XFS_METAFILE_RTSUMMARY,
+	},
 };
 
 /* Return the shortname of this rtgroup inode. */
@@ -300,7 +320,6 @@ xfs_rtginode_load(
 	struct xfs_trans	*tp)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
-	const char		*path;
 	struct xfs_inode	*ip;
 	const struct xfs_rtginode_ops *ops = &xfs_rtginode_ops[type];
 	int			error;
@@ -308,15 +327,36 @@ xfs_rtginode_load(
 	if (!xfs_rtginode_enabled(rtg, type))
 		return 0;
 
-	if (!mp->m_rtdirip)
-		return -EFSCORRUPTED;
-
-	path = xfs_rtginode_path(rtg_rgno(rtg), type);
-	if (!path)
-		return -ENOMEM;
-	error = xfs_metadir_load(tp, mp->m_rtdirip, path, ops->metafile_type,
-			&ip);
-	kfree(path);
+	if (!xfs_has_rtgroups(mp)) {
+		xfs_ino_t	ino;
+
+		switch (type) {
+		case XFS_RTGI_BITMAP:
+			ino = mp->m_sb.sb_rbmino;
+			break;
+		case XFS_RTGI_SUMMARY:
+			ino = mp->m_sb.sb_rsumino;
+			break;
+		default:
+			/* None of the other types exist on !rtgroups */
+			return 0;
+		}
+
+		error = xfs_trans_metafile_iget(tp, ino, ops->metafile_type,
+				&ip);
+	} else {
+		const char	*path;
+
+		if (!mp->m_rtdirip)
+			return -EFSCORRUPTED;
+
+		path = xfs_rtginode_path(rtg_rgno(rtg), type);
+		if (!path)
+			return -ENOMEM;
+		error = xfs_metadir_load(tp, mp->m_rtdirip, path,
+				ops->metafile_type, &ip);
+		kfree(path);
+	}
 
 	if (error)
 		return error;
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 2c894df723a786..3732f65ba8a1f6 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -12,6 +12,9 @@ struct xfs_mount;
 struct xfs_trans;
 
 enum xfs_rtg_inodes {
+	XFS_RTGI_BITMAP,	/* allocation bitmap */
+	XFS_RTGI_SUMMARY,	/* allocation summary */
+
 	XFS_RTGI_MAX,
 };
 
@@ -26,10 +29,19 @@ struct xfs_rtgroup {
 	struct xfs_group	rtg_group;
 
 	/* per-rtgroup metadata inodes */
-	struct xfs_inode	*rtg_inodes[1 /* hack */];
+	struct xfs_inode	*rtg_inodes[XFS_RTGI_MAX];
 
 	/* Number of blocks in this group */
 	xfs_rtxnum_t		rtg_extents;
+
+	/*
+	 * Cache of rt summary level per bitmap block with the invariant that
+	 * rtg_rsum_cache[bbno] > the maximum i for which rsum[i][bbno] != 0,
+	 * or 0 if rsum[i][bbno] == 0 for all i.
+	 *
+	 * Reads and writes are serialized by the rsumip inode lock.
+	 */
+	uint8_t			*rtg_rsum_cache;
 };
 
 static inline struct xfs_rtgroup *to_rtg(struct xfs_group *xg)
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 7a0493ec71cfd9..55182bf50fd399 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -946,9 +946,11 @@ parse_proto(
 /* Create a sb-rooted metadata file. */
 static void
 create_sb_metadata_file(
-	struct xfs_mount	*mp,
+	struct xfs_rtgroup	*rtg,
+	enum xfs_rtg_inodes	type,
 	void			(*create)(struct xfs_inode *ip))
 {
+	struct xfs_mount	*mp = rtg_mount(rtg);
 	struct xfs_icreate_args	args = {
 		.mode		= S_IFREG,
 		.flags		= XFS_ICREATE_UNLINKABLE,
@@ -978,6 +980,8 @@ create_sb_metadata_file(
 	error = -libxfs_trans_commit(tp);
 	if (error)
 		goto fail;
+	rtg->rtg_inodes[type] = ip;
+	return;
 
 fail:
 	if (ip)
@@ -997,8 +1001,6 @@ rtbitmap_create(
 	inode_set_atime(VFS_I(ip), 0, 0);
 
 	mp->m_sb.sb_rbmino = ip->i_ino;
-	mp->m_rbmip = ip;
-	ihold(VFS_I(ip));
 }
 
 static void
@@ -1010,8 +1012,6 @@ rtsummary_create(
 	ip->i_disk_size = mp->m_rsumblocks * mp->m_sb.sb_blocksize;
 
 	mp->m_sb.sb_rsumino = ip->i_ino;
-	mp->m_rsumip = ip;
-	ihold(VFS_I(ip));
 }
 
 /*
@@ -1020,8 +1020,9 @@ rtsummary_create(
  */
 static void
 rtfreesp_init(
-	struct xfs_mount	*mp)
+	struct xfs_rtgroup	*rtg)
 {
+	struct xfs_mount	*mp = rtg_mount(rtg);
 	struct xfs_trans	*tp;
 	xfs_rtxnum_t		rtx;
 	xfs_rtxnum_t		ertx;
@@ -1030,12 +1031,12 @@ rtfreesp_init(
 	/*
 	 * First zero the realtime bitmap and summary files.
 	 */
-	error = -libxfs_rtfile_initialize_blocks(mp->m_rbmip, 0,
+	error = -libxfs_rtfile_initialize_blocks(rtg, XFS_RTGI_BITMAP, 0,
 			mp->m_sb.sb_rbmblocks, NULL);
 	if (error)
 		fail(_("Initialization of rtbitmap inode failed"), error);
 
-	error = -libxfs_rtfile_initialize_blocks(mp->m_rsumip, 0,
+	error = -libxfs_rtfile_initialize_blocks(rtg, XFS_RTGI_SUMMARY, 0,
 			mp->m_rsumblocks, NULL);
 	if (error)
 		fail(_("Initialization of rtsummary inode failed"), error);
@@ -1049,11 +1050,11 @@ rtfreesp_init(
 		if (error)
 			res_failed(error);
 
-		libxfs_trans_ijoin(tp, mp->m_rbmip, 0);
+		libxfs_trans_ijoin(tp, rtg->rtg_inodes[XFS_RTGI_BITMAP], 0);
 		ertx = min(mp->m_sb.sb_rextents,
 			   rtx + NBBY * mp->m_sb.sb_blocksize);
 
-		error = -libxfs_rtfree_extent(tp, rtx,
+		error = -libxfs_rtfree_extent(tp, rtg, rtx,
 				(xfs_rtxlen_t)(ertx - rtx));
 		if (error) {
 			fail(_("Error initializing the realtime space"),
@@ -1073,10 +1074,16 @@ static void
 rtinit(
 	struct xfs_mount	*mp)
 {
-	create_sb_metadata_file(mp, rtbitmap_create);
-	create_sb_metadata_file(mp, rtsummary_create);
+	struct xfs_rtgroup	*rtg = NULL;
 
-	rtfreesp_init(mp);
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		create_sb_metadata_file(rtg, XFS_RTGI_BITMAP,
+				rtbitmap_create);
+		create_sb_metadata_file(rtg, XFS_RTGI_SUMMARY,
+				rtsummary_create);
+
+		rtfreesp_init(rtg);
+	}
 }
 
 static off_t
diff --git a/repair/phase6.c b/repair/phase6.c
index 44677096873a54..41342d884ce37a 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -491,98 +491,85 @@ mark_ino_metadata(
 	set_inode_is_meta(irec, get_inode_offset(mp, ino, irec));
 }
 
-/* Load a realtime freespace metadata inode from disk and reset it. */
-static int
-ensure_rtino(
-	struct xfs_trans		*tp,
-	enum xfs_metafile_type		metafile_type,
-	struct xfs_inode		**ipp)
-{
-	struct xfs_mount		*mp = tp->t_mountp;
-	xfs_ino_t			ino;
-	int				error;
-
-	switch (metafile_type) {
-	case XFS_METAFILE_RTBITMAP:
-		ino = mp->m_sb.sb_rbmino;
-		break;
-	case XFS_METAFILE_RTSUMMARY:
-		ino = mp->m_sb.sb_rsumino;
-		break;
-	default:
-		ASSERT(0);
-		return -EFSCORRUPTED;
-	}
-
-	/*
-	 * Don't use metafile iget here because we're resetting sb-rooted
-	 * inodes that live at fixed inumbers, but these inodes could be in
-	 * an arbitrary state.
-	 */
-	error = -libxfs_iget(mp, tp, ino, 0, ipp);
-	if (error)
-		return error;
-
-	reset_sbroot_ino(tp, S_IFREG, *ipp);
-	if (xfs_has_metadir(mp))
-		libxfs_metafile_set_iflag(tp, *ipp, metafile_type);
-	return 0;
-}
-
+/* (Re)create a missing sb-rooted rt freespace inode. */
 static void
-mk_rbmino(
-	struct xfs_mount	*mp)
+mk_rtino(
+	struct xfs_rtgroup	*rtg,
+	enum xfs_rtg_inodes	type)
 {
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_inode	*ip = rtg->rtg_inodes[type];
 	struct xfs_trans	*tp;
-	struct xfs_inode	*ip;
+	enum xfs_metafile_type	metafile_type =
+		libxfs_rtginode_metafile_type(type);
 	int			error;
 
 	error = -libxfs_trans_alloc_rollable(mp, 10, &tp);
 	if (error)
 		res_failed(error);
 
-	/* Reset the realtime bitmap inode. */
-	error = ensure_rtino(tp, XFS_METAFILE_RTBITMAP, &ip);
-	if (error) {
-		do_error(
-		_("couldn't iget realtime bitmap inode -- error - %d\n"),
-			error);
+	if (!ip) {
+		xfs_ino_t	rootino = mp->m_sb.sb_rootino;
+		xfs_ino_t	ino = NULLFSINO;
+
+		if (xfs_has_metadir(mp))
+			rootino++;
+
+		switch (type) {
+		case XFS_RTGI_BITMAP:
+			mp->m_sb.sb_rbmino = rootino + 1;
+			ino = mp->m_sb.sb_rbmino;
+			break;
+		case XFS_RTGI_SUMMARY:
+			mp->m_sb.sb_rsumino = rootino + 2;
+			ino = mp->m_sb.sb_rsumino;
+			break;
+		default:
+			break;
+		}
+
+		/*
+		 * Don't use metafile iget here because we're resetting
+		 * sb-rooted inodes that live at fixed inumbers, but these
+		 * inodes could be in an arbitrary state.
+		 */
+		error = -libxfs_iget(mp, tp, ino, 0, &ip);
+		if (error) {
+			do_error(
+_("couldn't iget realtime %s inode -- error - %d\n"),
+					libxfs_rtginode_name(type),
+					error);
+		}
+
+		rtg->rtg_inodes[type] = ip;
 	}
 
-	ip->i_disk_size = mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize;
-	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-	error = -libxfs_trans_commit(tp);
-	if (error)
-		do_error(_("%s: commit failed, error %d\n"), __func__, error);
-	libxfs_irele(ip);
-}
-
-static void
-mk_rsumino(
-	struct xfs_mount	*mp)
-{
-	struct xfs_trans	*tp;
-	struct xfs_inode	*ip;
-	int			error;
-
-	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 10, 0, 0, &tp);
-	if (error)
-		res_failed(error);
-
-	/* Reset the rt summary inode. */
-	error = ensure_rtino(tp, XFS_METAFILE_RTSUMMARY, &ip);
-	if (error) {
-		do_error(
-		_("couldn't iget realtime summary inode -- error - %d\n"),
-			error);
+	reset_sbroot_ino(tp, S_IFREG, ip);
+	if (xfs_has_metadir(mp))
+		libxfs_metafile_set_iflag(tp, ip, metafile_type);
+
+	switch (type) {
+	case XFS_RTGI_BITMAP:
+		ip->i_disk_size = mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize;
+		libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		error = 0;
+		break;
+	case XFS_RTGI_SUMMARY:
+		ip->i_disk_size = mp->m_rsumblocks * mp->m_sb.sb_blocksize;
+		libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		error = 0;
+		break;
+	default:
+		error = EINVAL;
 	}
 
-	ip->i_disk_size = mp->m_rsumblocks * mp->m_sb.sb_blocksize;
-	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	if (error)
+		do_error(_("%s inode re-initialization failed for rtgroup %u\n"),
+			libxfs_rtginode_name(type), rtg_rgno(rtg));
+
 	error = -libxfs_trans_commit(tp);
 	if (error)
 		do_error(_("%s: commit failed, error %d\n"), __func__, error);
-	libxfs_irele(ip);
 }
 
 /* Initialize a root directory. */
@@ -3219,6 +3206,43 @@ traverse_ags(
 	do_inode_prefetch(mp, ag_stride, traverse_function, false, true);
 }
 
+static void
+reset_rt_sb_inodes(
+	struct xfs_mount	*mp)
+{
+	struct xfs_rtgroup	*rtg;
+
+	if (no_modify) {
+		if (need_rbmino)
+			do_warn(_("would reinitialize realtime bitmap inode\n"));
+		if (need_rsumino)
+			do_warn(_("would reinitialize realtime summary inode\n"));
+		return;
+	}
+
+	rtg = libxfs_rtgroup_grab(mp, 0);
+
+	if (need_rbmino)  {
+		do_warn(_("reinitializing realtime bitmap inode\n"));
+		mk_rtino(rtg, XFS_RTGI_BITMAP);
+		need_rbmino = 0;
+	}
+
+	if (need_rsumino)  {
+		do_warn(_("reinitializing realtime summary inode\n"));
+		mk_rtino(rtg, XFS_RTGI_SUMMARY);
+		need_rsumino = 0;
+	}
+
+	do_log(
+_("        - resetting contents of realtime bitmap and summary inodes\n"));
+
+	fill_rtbitmap(rtg);
+	fill_rtsummary(rtg);
+
+	libxfs_rtgroup_rele(rtg);
+}
+
 void
 phase6(xfs_mount_t *mp)
 {
@@ -3267,32 +3291,7 @@ phase6(xfs_mount_t *mp)
 		do_warn(_("would reinitialize metadata root directory\n"));
 	}
 
-	if (need_rbmino)  {
-		if (!no_modify)  {
-			do_warn(_("reinitializing realtime bitmap inode\n"));
-			mk_rbmino(mp);
-			need_rbmino = 0;
-		} else  {
-			do_warn(_("would reinitialize realtime bitmap inode\n"));
-		}
-	}
-
-	if (need_rsumino)  {
-		if (!no_modify)  {
-			do_warn(_("reinitializing realtime summary inode\n"));
-			mk_rsumino(mp);
-			need_rsumino = 0;
-		} else  {
-			do_warn(_("would reinitialize realtime summary inode\n"));
-		}
-	}
-
-	if (!no_modify)  {
-		do_log(
-_("        - resetting contents of realtime bitmap and summary inodes\n"));
-		fill_rtbitmap(mp);
-		fill_rtsummary(mp);
-	}
+	reset_rt_sb_inodes(mp);
 
 	mark_standalone_inodes(mp);
 
diff --git a/repair/rt.c b/repair/rt.c
index d0a171020c4b49..711891a724b076 100644
--- a/repair/rt.c
+++ b/repair/rt.c
@@ -69,8 +69,6 @@ _("couldn't allocate memory for incore realtime bitmap.\n"));
 		do_error(
 _("couldn't allocate memory for incore realtime summary info.\n"));
 
-	ASSERT(mp->m_rbmip == NULL);
-
 	/*
 	 * Slower but simple, don't play around with trying to set things one
 	 * word at a time, just set bit as required.  Have to track start and
@@ -234,46 +232,26 @@ check_rtsummary(
 
 void
 fill_rtbitmap(
-	struct xfs_mount	*mp)
+	struct xfs_rtgroup	*rtg)
 {
-	struct xfs_inode	*ip;
 	int			error;
 
-	error = -libxfs_metafile_iget(mp, mp->m_sb.sb_rbmino,
-			XFS_METAFILE_RTBITMAP, &ip);
-	if (error)
-		do_error(
-_("couldn't iget realtime bitmap inode, error %d\n"), error);
-
-	error = -libxfs_rtfile_initialize_blocks(ip, 0, mp->m_sb.sb_rbmblocks,
-			btmcompute);
+	error = -libxfs_rtfile_initialize_blocks(rtg, XFS_RTGI_BITMAP,
+			0, rtg_mount(rtg)->m_sb.sb_rbmblocks, btmcompute);
 	if (error)
 		do_error(
 _("couldn't re-initialize realtime bitmap inode, error %d\n"), error);
-
-	libxfs_irele(ip);
 }
 
 void
 fill_rtsummary(
-	struct xfs_mount	*mp)
+	struct xfs_rtgroup	*rtg)
 {
-	struct xfs_inode	*ip;
 	int			error;
 
-	error = -libxfs_metafile_iget(mp, mp->m_sb.sb_rsumino,
-			XFS_METAFILE_RTSUMMARY, &ip);
-	if (error)
-		do_error(
-_("couldn't iget realtime summary inode, error %d\n"), error);
-
-	mp->m_rsumip = ip;
-	error = -libxfs_rtfile_initialize_blocks(ip, 0, mp->m_rsumblocks,
-			sumcompute);
-	mp->m_rsumip = NULL;
+	error = -libxfs_rtfile_initialize_blocks(rtg, XFS_RTGI_SUMMARY,
+			0, rtg_mount(rtg)->m_rsumblocks, sumcompute);
 	if (error)
 		do_error(
 _("couldn't re-initialize realtime summary inode, error %d\n"), error);
-
-	libxfs_irele(ip);
 }
diff --git a/repair/rt.h b/repair/rt.h
index f8caa5dc874ec2..9d837de65a7dfc 100644
--- a/repair/rt.h
+++ b/repair/rt.h
@@ -10,7 +10,7 @@ void generate_rtinfo(struct xfs_mount *mp);
 void check_rtbitmap(struct xfs_mount *mp);
 void check_rtsummary(struct xfs_mount *mp);
 
-void fill_rtbitmap(struct xfs_mount *mp);
-void fill_rtsummary(struct xfs_mount *mp);
+void fill_rtbitmap(struct xfs_rtgroup *rtg);
+void fill_rtsummary(struct xfs_rtgroup *rtg);
 
 #endif /* _XFS_REPAIR_RT_H_ */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 07/52] xfs: support creating per-RTG files in growfs
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-12-23 21:59   ` [PATCH 06/52] xfs: move RT bitmap and summary information to the rtgroup Darrick J. Wong
@ 2024-12-23 22:00   ` Darrick J. Wong
  2024-12-23 22:00   ` [PATCH 08/52] xfs: refactor xfs_rtbitmap_blockcount Darrick J. Wong
                     ` (44 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:00 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: ae897e0bed0f5461a6b1c3259c7d899759ba2a62

To support adding new RT groups in growfs, we need to be able to create
the per-RT group files.  Add a new xfs_rtginode_create helper to create
a given per-RTG file.  Most of the code for that is shared, but the
details of the actual file are abstracted out using a new create method
in struct xfs_rtginode_ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtbitmap.c |   32 +++++++++++++++++++++++
 libxfs/xfs_rtbitmap.h |    4 +++
 libxfs/xfs_rtgroup.c  |   69 +++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtgroup.h  |    2 +
 4 files changed, 107 insertions(+)
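
For readers following along outside the tree, here is a minimal sketch of
the pattern the commit message describes: a shared create helper that
dispatches to a per-type create method stored in an ops table.  Every
demo_* name below is a hypothetical stand-in and not part of libxfs; the
real helper also handles the metadir update, transaction, and lockdep
setup, all of which this sketch leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the libxfs types. */
enum demo_inode_type { DEMO_BITMAP, DEMO_SUMMARY, DEMO_MAX };

struct demo_inode {
	unsigned long long	disk_size;
};

struct demo_group {
	unsigned int		blocksize;
	unsigned int		bitmap_blocks;
	unsigned int		summary_blocks;
	struct demo_inode	*inodes[DEMO_MAX];
};

struct demo_inode_ops {
	const char	*name;
	/* Type-specific setup, called from the shared helper. */
	int		(*create)(struct demo_group *grp,
				  struct demo_inode *ip, bool init);
};

static int demo_bitmap_create(struct demo_group *grp,
		struct demo_inode *ip, bool init)
{
	(void)init;	/* unused in this sketch */
	ip->disk_size = (unsigned long long)grp->bitmap_blocks *
			grp->blocksize;
	return 0;
}

static int demo_summary_create(struct demo_group *grp,
		struct demo_inode *ip, bool init)
{
	(void)init;	/* unused in this sketch */
	ip->disk_size = (unsigned long long)grp->summary_blocks *
			grp->blocksize;
	return 0;
}

static const struct demo_inode_ops demo_ops[DEMO_MAX] = {
	[DEMO_BITMAP]	= { .name = "bitmap",  .create = demo_bitmap_create },
	[DEMO_SUMMARY]	= { .name = "summary", .create = demo_summary_create },
};

/* Shared path: validate, run the per-type hook, then publish the inode. */
static int demo_inode_create(struct demo_group *grp,
		enum demo_inode_type type, struct demo_inode *ip, bool init)
{
	int	error;

	if ((unsigned int)type >= DEMO_MAX || !demo_ops[type].create)
		return -EINVAL;

	error = demo_ops[type].create(grp, ip, init);
	if (error)
		return error;

	/* Only attach the inode to the group once setup succeeded. */
	grp->inodes[type] = ip;
	return 0;
}
```

The point of the table is that adding another per-group file type should
only need a new enum value and a new table entry; the shared helper stays
untouched.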


diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index e4d0646ba19bfd..c686cd5309e87c 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -1295,3 +1295,35 @@ xfs_rtfile_initialize_blocks(
 
 	return 0;
 }
+
+int
+xfs_rtbitmap_create(
+	struct xfs_rtgroup	*rtg,
+	struct xfs_inode	*ip,
+	struct xfs_trans	*tp,
+	bool			init)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+
+	ip->i_disk_size = mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize;
+	if (init && !xfs_has_rtgroups(mp)) {
+		ip->i_diflags |= XFS_DIFLAG_NEWRTBM;
+		inode_set_atime(VFS_I(ip), 0, 0);
+	}
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	return 0;
+}
+
+int
+xfs_rtsummary_create(
+	struct xfs_rtgroup	*rtg,
+	struct xfs_inode	*ip,
+	struct xfs_trans	*tp,
+	bool			init)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+
+	ip->i_disk_size = mp->m_rsumblocks * mp->m_sb.sb_blocksize;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	return 0;
+}
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index b3cbc56aa255ed..e4994a3e461d33 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -315,6 +315,10 @@ xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp,
 int xfs_rtfile_initialize_blocks(struct xfs_rtgroup *rtg,
 		enum xfs_rtg_inodes type, xfs_fileoff_t offset_fsb,
 		xfs_fileoff_t end_fsb, void *data);
+int xfs_rtbitmap_create(struct xfs_rtgroup *rtg, struct xfs_inode *ip,
+		struct xfs_trans *tp, bool init);
+int xfs_rtsummary_create(struct xfs_rtgroup *rtg, struct xfs_inode *ip,
+		struct xfs_trans *tp, bool init);
 
 #else /* CONFIG_XFS_RT */
 # define xfs_rtfree_extent(t,b,l)			(-ENOSYS)
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 41c794718e06c9..4f2e6be2ae48cc 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -270,16 +270,24 @@ struct xfs_rtginode_ops {
 
 	/* Does the fs have this feature? */
 	bool			(*enabled)(struct xfs_mount *mp);
+
+	/* Create this rtgroup metadata inode and initialize it. */
+	int			(*create)(struct xfs_rtgroup *rtg,
+					  struct xfs_inode *ip,
+					  struct xfs_trans *tp,
+					  bool init);
 };
 
 static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
 	[XFS_RTGI_BITMAP] = {
 		.name		= "bitmap",
 		.metafile_type	= XFS_METAFILE_RTBITMAP,
+		.create		= xfs_rtbitmap_create,
 	},
 	[XFS_RTGI_SUMMARY] = {
 		.name		= "summary",
 		.metafile_type	= XFS_METAFILE_RTSUMMARY,
+		.create		= xfs_rtsummary_create,
 	},
 };
 
@@ -387,6 +395,67 @@ xfs_rtginode_irele(
 	*ipp = NULL;
 }
 
+/* Create a metadata inode of the given type for this rt group. */
+int
+xfs_rtginode_create(
+	struct xfs_rtgroup		*rtg,
+	enum xfs_rtg_inodes		type,
+	bool				init)
+{
+	const struct xfs_rtginode_ops	*ops = &xfs_rtginode_ops[type];
+	struct xfs_mount		*mp = rtg_mount(rtg);
+	struct xfs_metadir_update	upd = {
+		.dp			= mp->m_rtdirip,
+		.metafile_type		= ops->metafile_type,
+	};
+	int				error;
+
+	if (!xfs_rtginode_enabled(rtg, type))
+		return 0;
+
+	if (!mp->m_rtdirip)
+		return -EFSCORRUPTED;
+
+	upd.path = xfs_rtginode_path(rtg_rgno(rtg), type);
+	if (!upd.path)
+		return -ENOMEM;
+
+	error = xfs_metadir_start_create(&upd);
+	if (error)
+		goto out_path;
+
+	error = xfs_metadir_create(&upd, S_IFREG);
+	if (error)
+		goto out_cancel;
+
+	xfs_rtginode_lockdep_setup(upd.ip, rtg_rgno(rtg), type);
+
+	upd.ip->i_projid = rtg_rgno(rtg);
+	error = ops->create(rtg, upd.ip, upd.tp, init);
+	if (error)
+		goto out_cancel;
+
+	error = xfs_metadir_commit(&upd);
+	if (error)
+		goto out_path;
+
+	kfree(upd.path);
+	xfs_finish_inode_setup(upd.ip);
+	rtg->rtg_inodes[type] = upd.ip;
+	return 0;
+
+out_cancel:
+	xfs_metadir_cancel(&upd, error);
+	/* Have to finish setting up the inode to ensure it's deleted. */
+	if (upd.ip) {
+		xfs_finish_inode_setup(upd.ip);
+		xfs_irele(upd.ip);
+	}
+out_path:
+	kfree(upd.path);
+	return error;
+}
+
 /* Create the parent directory for all rtgroup inodes and load it. */
 int
 xfs_rtginode_mkdir_parent(
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 3732f65ba8a1f6..6ccf31bb6bc7a7 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -242,6 +242,8 @@ enum xfs_metafile_type xfs_rtginode_metafile_type(enum xfs_rtg_inodes type);
 bool xfs_rtginode_enabled(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type);
 int xfs_rtginode_load(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type,
 		struct xfs_trans *tp);
+int xfs_rtginode_create(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type,
+		bool init);
 void xfs_rtginode_irele(struct xfs_inode **ipp);
 
 static inline const char *xfs_rtginode_path(xfs_rgnumber_t rgno,


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 08/52] xfs: refactor xfs_rtbitmap_blockcount
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-12-23 22:00   ` [PATCH 07/52] xfs: support creating per-RTG files in growfs Darrick J. Wong
@ 2024-12-23 22:00   ` Darrick J. Wong
  2024-12-23 22:00   ` [PATCH 09/52] xfs: refactor xfs_rtsummary_blockcount Darrick J. Wong
                     ` (43 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:00 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 5a7566c8d6b9b5c0aac34882f30448d29d9deafc

Rename the existing xfs_rtbitmap_blockcount to
xfs_rtbitmap_blockcount_len and add a new xfs_rtbitmap_blockcount wrapper
around it that takes the number of extents from the mount structure.

This will simplify the move to per-rtgroup bitmaps as those will need to
pass in the number of extents per rtgroup instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtbitmap.c   |   12 +++++++++++-
 libxfs/xfs_rtbitmap.h   |    7 ++++---
 libxfs/xfs_trans_resv.c |    2 +-
 3 files changed, 16 insertions(+), 5 deletions(-)
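
As a worked illustration of the split described above, the sketch below
mirrors the two functions with standalone types.  The demo_* names are
hypothetical, and demo_howmany_64 is assumed to behave like the tree's
howmany_64 (round-up division).  Each bitmap block tracks one rt extent
per bit, so a block covers NBBY * blocksize extents.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NBBY	8	/* bits per byte */

/* Hypothetical stand-in for the two mount fields the math needs. */
struct demo_mount {
	uint32_t	blocksize;	/* bytes per fs block */
	uint64_t	rextents;	/* rt extents in the filesystem */
};

/* Round-up division; assumed equivalent to howmany_64(). */
static uint64_t demo_howmany_64(uint64_t x, uint32_t y)
{
	return (x + y - 1) / y;
}

/* Bitmap blocks needed to track an arbitrary number of rt extents. */
static uint64_t demo_rtbitmap_blockcount_len(const struct demo_mount *mp,
		uint64_t rtextents)
{
	return demo_howmany_64(rtextents, DEMO_NBBY * mp->blocksize);
}

/* Wrapper that takes the extent count from the mount. */
static uint64_t demo_rtbitmap_blockcount(const struct demo_mount *mp)
{
	return demo_rtbitmap_blockcount_len(mp, mp->rextents);
}
```

With 4096-byte blocks one bitmap block covers 32768 extents, so 100000
extents need 4 blocks.  Callers that size something other than the whole
filesystem, such as the transaction reservation code, keep using the _len
variant.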


diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index c686cd5309e87c..cebeef5134e666 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -1147,13 +1147,23 @@ xfs_rtalloc_extent_is_free(
  * extents.
  */
 xfs_filblks_t
-xfs_rtbitmap_blockcount(
+xfs_rtbitmap_blockcount_len(
 	struct xfs_mount	*mp,
 	xfs_rtbxlen_t		rtextents)
 {
 	return howmany_64(rtextents, NBBY * mp->m_sb.sb_blocksize);
 }
 
+/*
+ * Compute the number of rtbitmap blocks used for a given file system.
+ */
+xfs_filblks_t
+xfs_rtbitmap_blockcount(
+	struct xfs_mount	*mp)
+{
+	return xfs_rtbitmap_blockcount_len(mp, mp->m_sb.sb_rextents);
+}
+
 /* Compute the number of rtsummary blocks needed to track the given rt space. */
 xfs_filblks_t
 xfs_rtsummary_blockcount(
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index e4994a3e461d33..58672863053a94 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -307,8 +307,9 @@ int xfs_rtfree_extent(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
 int xfs_rtfree_blocks(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
 		xfs_fsblock_t rtbno, xfs_filblks_t rtlen);
 
-xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t
-		rtextents);
+xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp);
+xfs_filblks_t xfs_rtbitmap_blockcount_len(struct xfs_mount *mp,
+		xfs_rtbxlen_t rtextents);
 xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp,
 		unsigned int rsumlevels, xfs_extlen_t rbmblocks);
 
@@ -336,7 +337,7 @@ static inline int xfs_rtfree_blocks(struct xfs_trans *tp,
 # define xfs_rtbuf_cache_relse(a)			(0)
 # define xfs_rtalloc_extent_is_free(m,t,s,l,i)		(-ENOSYS)
 static inline xfs_filblks_t
-xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents)
+xfs_rtbitmap_blockcount_len(struct xfs_mount *mp, xfs_rtbxlen_t rtextents)
 {
 	/* shut up gcc */
 	return 0;
diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index 3da18fb4027420..93047e149693d6 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -221,7 +221,7 @@ xfs_rtalloc_block_count(
 	xfs_rtxlen_t		rtxlen;
 
 	rtxlen = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN);
-	rtbmp_blocks = xfs_rtbitmap_blockcount(mp, rtxlen);
+	rtbmp_blocks = xfs_rtbitmap_blockcount_len(mp, rtxlen);
 	return (rtbmp_blocks + 1) * num_ops;
 }
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 09/52] xfs: refactor xfs_rtsummary_blockcount
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-12-23 22:00   ` [PATCH 08/52] xfs: refactor xfs_rtbitmap_blockcount Darrick J. Wong
@ 2024-12-23 22:00   ` Darrick J. Wong
  2024-12-23 22:00   ` [PATCH 10/52] xfs: make RT extent numbers relative to the rtgroup Darrick J. Wong
                     ` (42 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:00 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: f8c5a8415f6e23fa5b6301635d8b451627efae1c

Make xfs_rtsummary_blockcount take all the required information from
the mount structure and return the number of summary levels from it
as well.  This cleans up many of the callers and prepares for making the
rtsummary files per-rtgroup, where they need to look at different values.

This means we recalculate some values in some callers, but all these
calculations are cheap and outside the fast path, so that seems like a
price worth paying.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/init.c         |    4 +---
 libxfs/xfs_rtbitmap.c |   13 +++++++++----
 libxfs/xfs_rtbitmap.h |    3 +--
 3 files changed, 11 insertions(+), 9 deletions(-)
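
The new calling convention can be sketched standalone as follows.  The
demo_* names are hypothetical.  The sketch assumes that the extent-count
log is floor(log2(rextents)) with 0 for an empty volume, that summary
words are 4 bytes (a word log of 2), and that the byte count is rounded up
to whole fs blocks; please check those against the tree before relying on
the numbers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NBBY	8	/* bits per byte */
#define DEMO_WORDLOG	2	/* log2(bytes per summary word), assumed */

/* Hypothetical stand-in for the two mount fields the math needs. */
struct demo_mount {
	uint32_t	blocksize;	/* bytes per fs block */
	uint64_t	rextents;	/* rt extents in the filesystem */
};

/* floor(log2(rtextents)), or 0 when there are no extents (assumed). */
static unsigned int demo_rextslog(uint64_t rtextents)
{
	unsigned int	log = 0;

	while (rtextents >>= 1)
		log++;
	return log;
}

/* Bitmap blocks: one bit per rt extent, rounded up to whole blocks. */
static uint64_t demo_rtbitmap_blockcount(const struct demo_mount *mp)
{
	uint64_t	bits_per_block = (uint64_t)DEMO_NBBY * mp->blocksize;

	return (mp->rextents + bits_per_block - 1) / bits_per_block;
}

/*
 * Summary blocks, with the level count returned through *rsumlevels.
 * There is one summary word per (level, bitmap block) pair.
 */
static uint64_t demo_rtsummary_blockcount(const struct demo_mount *mp,
		unsigned int *rsumlevels)
{
	unsigned long long	rsumwords;
	unsigned long long	bytes;

	*rsumlevels = demo_rextslog(mp->rextents) + 1;

	rsumwords = demo_rtbitmap_blockcount(mp) * (*rsumlevels);
	bytes = rsumwords << DEMO_WORDLOG;
	return (bytes + mp->blocksize - 1) / mp->blocksize;
}
```

For 100000 extents and 4096-byte blocks this gives 17 levels and 4 bitmap
blocks, hence 68 summary words (272 bytes), which fits in a single summary
block.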


diff --git a/libxfs/init.c b/libxfs/init.c
index 5ec01537faac6b..a037012b77e5f6 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -323,9 +323,7 @@ rtmount_init(
 			progname);
 		return -1;
 	}
-	mp->m_rsumlevels = mp->m_sb.sb_rextslog + 1;
-	mp->m_rsumblocks = xfs_rtsummary_blockcount(mp, mp->m_rsumlevels,
-			mp->m_sb.sb_rbmblocks);
+	mp->m_rsumblocks = xfs_rtsummary_blockcount(mp, &mp->m_rsumlevels);
 
 	/*
 	 * Allow debugger to be run without the realtime device present.
diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index cebeef5134e666..edcfb09e29fa18 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -18,6 +18,7 @@
 #include "xfs_trans.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_health.h"
+#include "xfs_sb.h"
 
 /*
  * Realtime allocator bitmap functions shared with userspace.
@@ -1164,16 +1165,20 @@ xfs_rtbitmap_blockcount(
 	return xfs_rtbitmap_blockcount_len(mp, mp->m_sb.sb_rextents);
 }
 
-/* Compute the number of rtsummary blocks needed to track the given rt space. */
+/*
+ * Compute the geometry of the rtsummary file needed to track the given rt
+ * space.
+ */
 xfs_filblks_t
 xfs_rtsummary_blockcount(
 	struct xfs_mount	*mp,
-	unsigned int		rsumlevels,
-	xfs_extlen_t		rbmblocks)
+	unsigned int		*rsumlevels)
 {
 	unsigned long long	rsumwords;
 
-	rsumwords = (unsigned long long)rsumlevels * rbmblocks;
+	*rsumlevels = xfs_compute_rextslog(mp->m_sb.sb_rextents) + 1;
+
+	rsumwords = xfs_rtbitmap_blockcount(mp) * (*rsumlevels);
 	return XFS_B_TO_FSB(mp, rsumwords << XFS_WORDLOG);
 }
 
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index 58672863053a94..776cca9e41bf05 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -311,7 +311,7 @@ xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp);
 xfs_filblks_t xfs_rtbitmap_blockcount_len(struct xfs_mount *mp,
 		xfs_rtbxlen_t rtextents);
 xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp,
-		unsigned int rsumlevels, xfs_extlen_t rbmblocks);
+		unsigned int *rsumlevels);
 
 int xfs_rtfile_initialize_blocks(struct xfs_rtgroup *rtg,
 		enum xfs_rtg_inodes type, xfs_fileoff_t offset_fsb,
@@ -342,7 +342,6 @@ xfs_rtbitmap_blockcount_len(struct xfs_mount *mp, xfs_rtbxlen_t rtextents)
 	/* shut up gcc */
 	return 0;
 }
-# define xfs_rtsummary_blockcount(mp, l, b)		(0)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __XFS_RTBITMAP_H__ */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 10/52] xfs: make RT extent numbers relative to the rtgroup
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-12-23 22:00   ` [PATCH 09/52] xfs: refactor xfs_rtsummary_blockcount Darrick J. Wong
@ 2024-12-23 22:00   ` Darrick J. Wong
  2024-12-23 22:01   ` [PATCH 11/52] libfrog: add memchr_inv Darrick J. Wong
                     ` (41 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:00 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: f220f6da5f4ad7da538c39075cf57e829d5202f7

To prepare for adding per-rtgroup bitmap files, make the xfs_rtxnum_t
type encode the RT extent number relative to the rtgroup.  The biggest
part of this is to clearly distinguish between the relative extent
number, which gets masked when converting from a global block number,
and length values, which just have a factor applied to them when
converting from file system blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
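Reviewer note: a small sketch of the number-versus-length distinction
described above, for illustration only.  The sketch_* names are
hypothetical.  It assumes groups are laid out back to back, each
rgextents * rextsize blocks long, and uses plain division and modulo
where the real helpers take shift and mask fast paths.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_geom {
	uint32_t	rextsize;	/* blocks per rt extent */
	uint32_t	rgextents;	/* rt extents per group */
};

/* Global rt block number -> extent number relative to its group. */
static uint64_t sketch_rtb_to_rtx(const struct sketch_geom *g,
		uint64_t rtbno)
{
	uint64_t rgblocks;

	assert(g->rextsize != 0 && g->rgextents != 0);
	rgblocks = (uint64_t)g->rgextents * g->rextsize;
	/* Strip the group part first, then scale down to extents. */
	return (rtbno % rgblocks) / g->rextsize;
}

/* Group-relative extent number -> global rt block number. */
static uint64_t sketch_rtx_to_rtb(const struct sketch_geom *g,
		uint32_t rgno, uint64_t rtx)
{
	uint64_t rgblocks = (uint64_t)g->rgextents * g->rextsize;

	return (uint64_t)rgno * rgblocks + rtx * g->rextsize;
}

/* Block count -> extent count: a length only has the factor applied. */
static uint64_t sketch_blen_to_rtbxlen(const struct sketch_geom *g,
		uint64_t blen)
{
	assert(g->rextsize != 0);
	return blen / g->rextsize;
}
```

With 4-block extents and 100-extent groups, block 1000 is extent 50 of
group 2, whereas a length of 1000 blocks is simply 250 extents.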
 db/block.c            |    2 +
 libxfs/xfs_bmap.c     |    6 ++--
 libxfs/xfs_rtbitmap.h |   69 +++++++++++++++++++++++++++++++------------------
 3 files changed, 47 insertions(+), 30 deletions(-)


diff --git a/db/block.c b/db/block.c
index b50b2c16060ac7..f197e10cd5a08d 100644
--- a/db/block.c
+++ b/db/block.c
@@ -423,7 +423,7 @@ rtextent_f(
 		return 0;
 	}
 
-	rtbno = xfs_rtx_to_rtb(mp, rtx);
+	rtbno = xfs_rtbxlen_to_blen(mp, rtx);
 	ASSERT(typtab[TYP_DATA].typnm == TYP_DATA);
 	set_rt_cur(&typtab[TYP_DATA], xfs_rtb_to_daddr(mp, rtbno),
 			mp->m_sb.sb_rextsize * blkbb, DB_RING_ADD, NULL);
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index cebf5479189280..48b05c40e23235 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -4088,7 +4088,7 @@ xfs_bmapi_reserve_delalloc(
 
 	fdblocks = indlen;
 	if (XFS_IS_REALTIME_INODE(ip)) {
-		error = xfs_dec_frextents(mp, xfs_rtb_to_rtx(mp, alen));
+		error = xfs_dec_frextents(mp, xfs_blen_to_rtbxlen(mp, alen));
 		if (error)
 			goto out_unreserve_quota;
 	} else {
@@ -4123,7 +4123,7 @@ xfs_bmapi_reserve_delalloc(
 
 out_unreserve_frextents:
 	if (XFS_IS_REALTIME_INODE(ip))
-		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, alen));
+		xfs_add_frextents(mp, xfs_blen_to_rtbxlen(mp, alen));
 out_unreserve_quota:
 	if (XFS_IS_QUOTA_ON(mp))
 		xfs_quota_unreserve_blkres(ip, alen);
@@ -5031,7 +5031,7 @@ xfs_bmap_del_extent_delay(
 	fdblocks = da_diff;
 
 	if (isrt)
-		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
+		xfs_add_frextents(mp, xfs_blen_to_rtbxlen(mp, del->br_blockcount));
 	else
 		fdblocks += del->br_blockcount;
 
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index 776cca9e41bf05..b2b9e59a87a278 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -22,13 +22,37 @@ struct xfs_rtalloc_args {
 
 static inline xfs_rtblock_t
 xfs_rtx_to_rtb(
-	struct xfs_mount	*mp,
+	struct xfs_rtgroup	*rtg,
 	xfs_rtxnum_t		rtx)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	xfs_rtblock_t		start = xfs_rgno_start_rtb(mp, rtg_rgno(rtg));
+
+	if (mp->m_rtxblklog >= 0)
+		return start + (rtx << mp->m_rtxblklog);
+	return start + (rtx * mp->m_sb.sb_rextsize);
+}
+
+/* Convert an rgbno into an rt extent number. */
+static inline xfs_rtxnum_t
+xfs_rgbno_to_rtx(
+	struct xfs_mount	*mp,
+	xfs_rgblock_t		rgbno)
+{
+	if (likely(mp->m_rtxblklog >= 0))
+		return rgbno >> mp->m_rtxblklog;
+	return rgbno / mp->m_sb.sb_rextsize;
+}
+
+static inline uint64_t
+xfs_rtbxlen_to_blen(
+	struct xfs_mount	*mp,
+	xfs_rtbxlen_t		rtbxlen)
 {
 	if (mp->m_rtxblklog >= 0)
-		return rtx << mp->m_rtxblklog;
+		return rtbxlen << mp->m_rtxblklog;
 
-	return rtx * mp->m_sb.sb_rextsize;
+	return rtbxlen * mp->m_sb.sb_rextsize;
 }
 
 static inline xfs_extlen_t
@@ -65,16 +89,29 @@ xfs_extlen_to_rtxlen(
 	return len / mp->m_sb.sb_rextsize;
 }
 
+/* Convert an rt block count into an rt extent count. */
+static inline xfs_rtbxlen_t
+xfs_blen_to_rtbxlen(
+	struct xfs_mount	*mp,
+	uint64_t		blen)
+{
+	if (likely(mp->m_rtxblklog >= 0))
+		return blen >> mp->m_rtxblklog;
+
+	return div_u64(blen, mp->m_sb.sb_rextsize);
+}
+
 /* Convert an rt block number into an rt extent number. */
 static inline xfs_rtxnum_t
 xfs_rtb_to_rtx(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
 {
-	if (likely(mp->m_rtxblklog >= 0))
-		return rtbno >> mp->m_rtxblklog;
+	uint64_t		__rgbno = __xfs_rtb_to_rgbno(mp, rtbno);
 
-	return div_u64(rtbno, mp->m_sb.sb_rextsize);
+	if (likely(mp->m_rtxblklog >= 0))
+		return __rgbno >> mp->m_rtxblklog;
+	return div_u64(__rgbno, mp->m_sb.sb_rextsize);
 }
 
 /* Return the offset of an rt block number within an rt extent. */
@@ -89,26 +126,6 @@ xfs_rtb_to_rtxoff(
 	return do_div(rtbno, mp->m_sb.sb_rextsize);
 }
 
-/*
- * Convert an rt block number into an rt extent number, rounding up to the next
- * rt extent if the rt block is not aligned to an rt extent boundary.
- */
-static inline xfs_rtxnum_t
-xfs_rtb_to_rtxup(
-	struct xfs_mount	*mp,
-	xfs_rtblock_t		rtbno)
-{
-	if (likely(mp->m_rtxblklog >= 0)) {
-		if (rtbno & mp->m_rtxblkmask)
-			return (rtbno >> mp->m_rtxblklog) + 1;
-		return rtbno >> mp->m_rtxblklog;
-	}
-
-	if (do_div(rtbno, mp->m_sb.sb_rextsize))
-		rtbno++;
-	return rtbno;
-}
-
 /* Round this rtblock up to the nearest rt extent size. */
 static inline xfs_rtblock_t
 xfs_rtb_roundup_rtx(


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 11/52] libfrog: add memchr_inv
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-12-23 22:00   ` [PATCH 10/52] xfs: make RT extent numbers relative to the rtgroup Darrick J. Wong
@ 2024-12-23 22:01   ` Darrick J. Wong
  2024-12-23 22:01   ` [PATCH 12/52] xfs: define the format of rt groups Darrick J. Wong
                     ` (40 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:01 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

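Add a userspace implementation of the kernel's memchr_inv helper, which
returns a pointer to the first byte in a buffer that does not match the
given value, or NULL if every byte matches.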

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
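Reviewer note: a standalone sketch of how a caller might use such a
helper to check that the tail of a buffer is zeroed, for illustration
only.  sketch_memchr_inv and sketch_tail_is_zero are hypothetical names;
the sketch is written from the helper's described contract rather than
copied from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Return the first byte that differs from c, or NULL if none does. */
static void *sketch_memchr_inv(const void *start, int c, size_t bytes)
{
	const unsigned char *p = start;

	assert(start != NULL || bytes == 0);
	for (; bytes > 0; p++, bytes--) {
		if (*p != (unsigned char)c)
			return (void *)p;
	}
	return NULL;
}

/* Return 1 if every byte after the first hdr_len bytes is zero. */
static int sketch_tail_is_zero(const void *buf, size_t hdr_len,
		size_t buf_len)
{
	const unsigned char *b = buf;

	assert(hdr_len <= buf_len);
	return sketch_memchr_inv(b + hdr_len, 0, buf_len - hdr_len) == NULL;
}
```

A verifier can use this pattern to reject a block whose padding after
the on-disk structure contains stray nonzero bytes.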
 libfrog/util.c |   14 ++++++++++++++
 libfrog/util.h |    4 ++++
 2 files changed, 18 insertions(+)


diff --git a/libfrog/util.c b/libfrog/util.c
index 8fb10cf82f5ca4..46047571a5531f 100644
--- a/libfrog/util.c
+++ b/libfrog/util.c
@@ -22,3 +22,17 @@ log2_roundup(unsigned int i)
 	}
 	return rval;
 }
+
+void *
+memchr_inv(const void *start, int c, size_t bytes)
+{
+	const unsigned char	*p = start;
+
+	while (bytes-- > 0) {
+		if (*p != (unsigned char)c)
+			return (void *)p;
+		p++;
+	}
+
+	return NULL;
+}
diff --git a/libfrog/util.h b/libfrog/util.h
index 5df95e69cd11da..8b4ee7c1333b6b 100644
--- a/libfrog/util.h
+++ b/libfrog/util.h
@@ -6,6 +6,8 @@
 #ifndef __LIBFROG_UTIL_H__
 #define __LIBFROG_UTIL_H__
 
+#include <sys/types.h>
+
 unsigned int	log2_roundup(unsigned int i);
 
 #define min_t(type,x,y) \
@@ -13,4 +15,6 @@ unsigned int	log2_roundup(unsigned int i);
 #define max_t(type,x,y) \
 	({ type __x = (x); type __y = (y); __x > __y ? __x: __y; })
 
+void *memchr_inv(const void *start, int c, size_t bytes);
+
 #endif /* __LIBFROG_UTIL_H__ */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 12/52] xfs: define the format of rt groups
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-12-23 22:01   ` [PATCH 11/52] libfrog: add memchr_inv Darrick J. Wong
@ 2024-12-23 22:01   ` Darrick J. Wong
  2024-12-23 22:01   ` [PATCH 13/52] xfs: update realtime super every time we update the primary fs super Darrick J. Wong
                     ` (39 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:01 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 96768e91511bfced6e9e537f4891157d909b13ee

Define the ondisk format of realtime group metadata, and a superblock
for realtime volumes.  rt supers are conditionally enabled by a
predicate function so that they can be disabled if we ever implement
zoned storage support for the realtime volume.

For rt group enabled file systems there is a separate bitmap and summary
file for each group and thus the number of bitmap and summary blocks
needs to be calculated differently.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
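Reviewer note: a condensed sketch of the geometry checks this patch
applies to rtgroup superblock fields, for illustration only.  The
sketch_* names are hypothetical; the limits mirror the constants in the
diff below, and the feature-bit and exchange-range checks are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_RGBLOCKS	((1U << 31) - 1)
#define SKETCH_MIN_RGEXTENTS	2U

struct sketch_sb {
	uint64_t	rextents;	/* rt extents in the whole volume */
	uint32_t	rextsize;	/* blocks per rt extent */
	uint32_t	rgextents;	/* rt extents per group */
	uint32_t	rgcount;	/* number of rt groups */
};

/* Return 0 if the rtgroup geometry is self-consistent, else -EINVAL. */
static int sketch_validate_rtgroups(const struct sketch_sb *sb)
{
	uint64_t groups;

	assert(sb);
	if (sb->rextsize == 0)
		return -EINVAL;
	/* A group must fit in 31 bits worth of blocks. */
	if (sb->rgextents > SKETCH_MAX_RGBLOCKS / sb->rextsize)
		return -EINVAL;
	/* Group 0 may hold a superblock, so one extent is not enough. */
	if (sb->rgextents < SKETCH_MIN_RGEXTENTS)
		return -EINVAL;
	/* The groups must cover the rt section exactly, rounding up. */
	groups = (sb->rextents + sb->rgextents - 1) / sb->rgextents;
	if (groups != sb->rgcount)
		return -EINVAL;
	return 0;
}
```

For example, 1000 rt extents split into 300-extent groups need exactly
four groups; any other group count fails the coverage check.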
 include/libxfs.h      |    1 
 include/xfs_mount.h   |    6 ++-
 libxfs/libxfs_priv.h  |    1 
 libxfs/xfs_format.h   |   42 ++++++++++++++++++++-
 libxfs/xfs_ondisk.h   |    3 +
 libxfs/xfs_rtbitmap.c |   20 ++++++++--
 libxfs/xfs_rtgroup.c  |   81 ++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_sb.c       |  100 +++++++++++++++++++++++++++++++++++++++++++------
 libxfs/xfs_shared.h   |    1 
 9 files changed, 236 insertions(+), 19 deletions(-)


diff --git a/include/libxfs.h b/include/libxfs.h
index df38d875143ed9..5689f58c640168 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -98,6 +98,7 @@ struct iomap;
 #include "xfs_metafile.h"
 #include "xfs_metadir.h"
 #include "xfs_rtgroup.h"
+#include "xfs_rtbitmap.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 7406825cf7f57a..31ac60d93c61cc 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -248,12 +248,14 @@ __XFS_HAS_FEAT(metadir, METADIR)
 
 static inline bool xfs_has_rtgroups(struct xfs_mount *mp)
 {
-	return false;
+	/* all metadir file systems also allow rtgroups */
+	return xfs_has_metadir(mp);
 }
 
 static inline bool xfs_has_rtsb(struct xfs_mount *mp)
 {
-	return false;
+	/* all rtgroups filesystems with an rt section have an rtsb */
+	return xfs_has_rtgroups(mp) && xfs_has_realtime(mp);
 }
 
 /* Kernel mount features that we don't support */
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 620cd13b8ef796..1fc9c784b05054 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -53,6 +53,7 @@
 #include "libfrog/radix-tree.h"
 #include "libfrog/bitmask.h"
 #include "libfrog/div64.h"
+#include "libfrog/util.h"
 #include "atomic.h"
 #include "spinlock.h"
 #include "linux-err.h"
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 867060d60e8583..3cf044a2d5f2a9 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -265,8 +265,15 @@ struct xfs_dsb {
 	uuid_t		sb_meta_uuid;	/* metadata file system unique id */
 
 	__be64		sb_metadirino;	/* metadata directory tree root */
+	__be32		sb_rgcount;	/* # of realtime groups */
+	__be32		sb_rgextents;	/* size of rtgroup in rtx */
 
-	/* must be padded to 64 bit alignment */
+	/*
+	 * The size of this structure must be padded to 64 bit alignment.
+	 *
+	 * NOTE: Don't forget to update secondary_sb_whack in xfs_repair when
+	 * adding new fields here.
+	 */
 };
 
 #define XFS_SB_CRC_OFF		offsetof(struct xfs_dsb, sb_crc)
@@ -716,6 +723,39 @@ union xfs_suminfo_raw {
 	__u32		old;
 };
 
+/*
+ * Realtime allocation groups break the rt section into multiple pieces that
+ * could be locked independently.  Realtime block group numbers are 32-bit
+ * quantities.  Block numbers within a group are also 32-bit quantities, but
+ * the upper bit must never be set.  rtgroup 0 might have a superblock in it,
+ * so the minimum size of an rtgroup is 2 rtx.
+ */
+#define XFS_MAX_RGBLOCKS	((xfs_rgblock_t)(1U << 31) - 1)
+#define XFS_MIN_RGEXTENTS	((xfs_rtxlen_t)2)
+#define XFS_MAX_RGNUMBER	((xfs_rgnumber_t)(-1U))
+
+#define XFS_RTSB_MAGIC	0x46726F67	/* 'Frog' */
+
+/*
+ * Realtime superblock - on disk version.  Must be padded to 64 bit alignment.
+ * The first block of the realtime volume contains this superblock.
+ */
+struct xfs_rtsb {
+	__be32		rsb_magicnum;	/* magic number == XFS_RTSB_MAGIC */
+	__le32		rsb_crc;	/* superblock crc */
+
+	__be32		rsb_pad;	/* zero */
+	unsigned char	rsb_fname[XFSLABEL_MAX]; /* file system name */
+
+	uuid_t		rsb_uuid;	/* user-visible file system unique id */
+	uuid_t		rsb_meta_uuid;	/* metadata file system unique id */
+
+	/* must be padded to 64 bit alignment */
+};
+
+#define XFS_RTSB_CRC_OFF	offsetof(struct xfs_rtsb, rsb_crc)
+#define XFS_RTSB_DADDR		((xfs_daddr_t)0) /* daddr in rt section */
+
 /*
  * XFS Timestamps
  * ==============
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 8bca86e350fdc1..38b314113d8f24 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -37,7 +37,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dinode,		176);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_disk_dquot,		104);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dqblk,			136);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,			272);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,			280);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dsymlink_hdr,		56);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inobt_key,		4);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inobt_rec,		16);
@@ -53,6 +53,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(xfs_inobt_ptr_t,			4);
 	XFS_CHECK_STRUCT_SIZE(xfs_refcount_ptr_t,		4);
 	XFS_CHECK_STRUCT_SIZE(xfs_rmap_ptr_t,			4);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_rtsb,			56);
 
 	/* dir/attr trees */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_leaf_hdr,	80);
diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index edcfb09e29fa18..84ca5447ca770f 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -1155,6 +1155,21 @@ xfs_rtbitmap_blockcount_len(
 	return howmany_64(rtextents, NBBY * mp->m_sb.sb_blocksize);
 }
 
+/* How many rt extents does each rtbitmap file track? */
+static inline xfs_rtbxlen_t
+xfs_rtbitmap_bitcount(
+	struct xfs_mount	*mp)
+{
+	if (!mp->m_sb.sb_rextents)
+		return 0;
+
+	/* rtgroup size can be nonzero even if rextents is zero */
+	if (xfs_has_rtgroups(mp))
+		return mp->m_sb.sb_rgextents;
+
+	return mp->m_sb.sb_rextents;
+}
+
 /*
  * Compute the number of rtbitmap blocks used for a given file system.
  */
@@ -1162,7 +1177,7 @@ xfs_filblks_t
 xfs_rtbitmap_blockcount(
 	struct xfs_mount	*mp)
 {
-	return xfs_rtbitmap_blockcount_len(mp, mp->m_sb.sb_rextents);
+	return xfs_rtbitmap_blockcount_len(mp, xfs_rtbitmap_bitcount(mp));
 }
 
 /*
@@ -1176,8 +1191,7 @@ xfs_rtsummary_blockcount(
 {
 	unsigned long long	rsumwords;
 
-	*rsumlevels = xfs_compute_rextslog(mp->m_sb.sb_rextents) + 1;
-
+	*rsumlevels = xfs_compute_rextslog(xfs_rtbitmap_bitcount(mp)) + 1;
 	rsumwords = xfs_rtbitmap_blockcount(mp) * (*rsumlevels);
 	return XFS_B_TO_FSB(mp, rsumwords << XFS_WORDLOG);
 }
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 4f2e6be2ae48cc..e294a295b3d0d8 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -480,3 +480,84 @@ xfs_rtginode_load_parent(
 	return xfs_metadir_load(tp, mp->m_metadirip, "rtgroups",
 			XFS_METAFILE_DIR, &mp->m_rtdirip);
 }
+
+/* Check superblock fields for a read or a write. */
+static xfs_failaddr_t
+xfs_rtsb_verify_common(
+	struct xfs_buf		*bp)
+{
+	struct xfs_rtsb		*rsb = bp->b_addr;
+
+	if (!xfs_verify_magic(bp, rsb->rsb_magicnum))
+		return __this_address;
+	if (rsb->rsb_pad)
+		return __this_address;
+
+	/* Everything to the end of the fs block must be zero */
+	if (memchr_inv(rsb + 1, 0, BBTOB(bp->b_length) - sizeof(*rsb)))
+		return __this_address;
+
+	return NULL;
+}
+
+/* Check superblock fields for a read or revalidation. */
+static inline xfs_failaddr_t
+xfs_rtsb_verify_all(
+	struct xfs_buf		*bp)
+{
+	struct xfs_rtsb		*rsb = bp->b_addr;
+	struct xfs_mount	*mp = bp->b_mount;
+	xfs_failaddr_t		fa;
+
+	fa = xfs_rtsb_verify_common(bp);
+	if (fa)
+		return fa;
+
+	if (memcmp(&rsb->rsb_fname, &mp->m_sb.sb_fname, XFSLABEL_MAX))
+		return __this_address;
+	if (!uuid_equal(&rsb->rsb_uuid, &mp->m_sb.sb_uuid))
+		return __this_address;
+	if (!uuid_equal(&rsb->rsb_meta_uuid, &mp->m_sb.sb_meta_uuid))
+		return __this_address;
+
+	return NULL;
+}
+
+static void
+xfs_rtsb_read_verify(
+	struct xfs_buf		*bp)
+{
+	xfs_failaddr_t		fa;
+
+	if (!xfs_buf_verify_cksum(bp, XFS_RTSB_CRC_OFF)) {
+		xfs_verifier_error(bp, -EFSBADCRC, __this_address);
+		return;
+	}
+
+	fa = xfs_rtsb_verify_all(bp);
+	if (fa)
+		xfs_verifier_error(bp, -EFSCORRUPTED, fa);
+}
+
+static void
+xfs_rtsb_write_verify(
+	struct xfs_buf		*bp)
+{
+	xfs_failaddr_t		fa;
+
+	fa = xfs_rtsb_verify_common(bp);
+	if (fa) {
+		xfs_verifier_error(bp, -EFSCORRUPTED, fa);
+		return;
+	}
+
+	xfs_buf_update_cksum(bp, XFS_RTSB_CRC_OFF);
+}
+
+const struct xfs_buf_ops xfs_rtsb_buf_ops = {
+	.name		= "xfs_rtsb",
+	.magic		= { 0, cpu_to_be32(XFS_RTSB_MAGIC) },
+	.verify_read	= xfs_rtsb_read_verify,
+	.verify_write	= xfs_rtsb_write_verify,
+	.verify_struct	= xfs_rtsb_verify_all,
+};
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 2a5368c266ee91..3c80413b67f6ec 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -231,11 +231,22 @@ xfs_validate_sb_read(
 	return 0;
 }
 
+/* Return the number of extents covered by a single rt bitmap file */
+static xfs_rtbxlen_t
+xfs_extents_per_rbm(
+	struct xfs_sb		*sbp)
+{
+	if (xfs_sb_is_v5(sbp) &&
+	    (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR))
+		return sbp->sb_rgextents;
+	return sbp->sb_rextents;
+}
+
 static uint64_t
-xfs_sb_calc_rbmblocks(
+xfs_expected_rbmblocks(
 	struct xfs_sb		*sbp)
 {
-	return howmany_64(sbp->sb_rextents, NBBY * sbp->sb_blocksize);
+	return howmany_64(xfs_extents_per_rbm(sbp), NBBY * sbp->sb_blocksize);
 }
 
 /* Validate the realtime geometry */
@@ -257,7 +268,7 @@ xfs_validate_rt_geometry(
 	if (sbp->sb_rextents == 0 ||
 	    sbp->sb_rextents != div_u64(sbp->sb_rblocks, sbp->sb_rextsize) ||
 	    sbp->sb_rextslog != xfs_compute_rextslog(sbp->sb_rextents) ||
-	    sbp->sb_rbmblocks != xfs_sb_calc_rbmblocks(sbp))
+	    sbp->sb_rbmblocks != xfs_expected_rbmblocks(sbp))
 		return false;
 
 	return true;
@@ -338,6 +349,59 @@ xfs_validate_sb_write(
 	return 0;
 }
 
+static int
+xfs_validate_sb_rtgroups(
+	struct xfs_mount	*mp,
+	struct xfs_sb		*sbp)
+{
+	uint64_t		groups;
+
+	if (sbp->sb_rextsize == 0) {
+		xfs_warn(mp,
+"Realtime extent size must not be zero.");
+		return -EINVAL;
+	}
+
+	if (sbp->sb_rgextents > XFS_MAX_RGBLOCKS / sbp->sb_rextsize) {
+		xfs_warn(mp,
+"Realtime group size (%u) must be less than %u rt extents.",
+				sbp->sb_rgextents,
+				XFS_MAX_RGBLOCKS / sbp->sb_rextsize);
+		return -EINVAL;
+	}
+
+	if (sbp->sb_rgextents < XFS_MIN_RGEXTENTS) {
+		xfs_warn(mp,
+"Realtime group size (%u) must be at least %u rt extents.",
+				sbp->sb_rgextents, XFS_MIN_RGEXTENTS);
+		return -EINVAL;
+	}
+
+	if (sbp->sb_rgcount > XFS_MAX_RGNUMBER) {
+		xfs_warn(mp,
+"Realtime groups (%u) must be less than %u.",
+				sbp->sb_rgcount, XFS_MAX_RGNUMBER);
+		return -EINVAL;
+	}
+
+	groups = howmany_64(sbp->sb_rextents, sbp->sb_rgextents);
+	if (groups != sbp->sb_rgcount) {
+		xfs_warn(mp,
+"Realtime groups (%u) do not cover the entire rt section; need (%llu) groups.",
+				sbp->sb_rgcount, groups);
+		return -EINVAL;
+	}
+
+	/* Exchange-range is required for fsr to work on realtime files */
+	if (!(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_EXCHRANGE)) {
+		xfs_warn(mp,
+"Realtime groups feature requires exchange-range support.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /* Check the validity of the SB. */
 STATIC int
 xfs_validate_sb_common(
@@ -349,6 +413,7 @@ xfs_validate_sb_common(
 	uint32_t		agcount = 0;
 	uint32_t		rem;
 	bool			has_dalign;
+	int			error;
 
 	if (!xfs_verify_magic(bp, dsb->sb_magicnum)) {
 		xfs_warn(mp,
@@ -412,6 +477,12 @@ xfs_validate_sb_common(
 				sbp->sb_spino_align);
 			return -EINVAL;
 		}
+
+		if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) {
+			error = xfs_validate_sb_rtgroups(mp, sbp);
+			if (error)
+				return error;
+		}
 	} else if (sbp->sb_qflags & (XFS_PQUOTA_ENFD | XFS_GQUOTA_ENFD |
 				XFS_PQUOTA_CHKD | XFS_GQUOTA_CHKD)) {
 			xfs_notice(mp,
@@ -703,13 +774,15 @@ __xfs_sb_from_disk(
 	if (convert_xquota)
 		xfs_sb_quota_from_disk(to);
 
-	if (to->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)
+	if (to->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) {
 		to->sb_metadirino = be64_to_cpu(from->sb_metadirino);
-	else
+		to->sb_rgcount = be32_to_cpu(from->sb_rgcount);
+		to->sb_rgextents = be32_to_cpu(from->sb_rgextents);
+	} else {
 		to->sb_metadirino = NULLFSINO;
-
-	to->sb_rgcount = 1;
-	to->sb_rgextents = 0;
+		to->sb_rgcount = 1;
+		to->sb_rgextents = 0;
+	}
 }
 
 void
@@ -858,8 +931,11 @@ xfs_sb_to_disk(
 	if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_META_UUID)
 		uuid_copy(&to->sb_meta_uuid, &from->sb_meta_uuid);
 
-	if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)
+	if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) {
 		to->sb_metadirino = cpu_to_be64(from->sb_metadirino);
+		to->sb_rgcount = cpu_to_be32(from->sb_rgcount);
+		to->sb_rgextents = cpu_to_be32(from->sb_rgextents);
+	}
 }
 
 /*
@@ -999,9 +1075,9 @@ xfs_mount_sb_set_rextsize(
 	mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize);
 	mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize);
 
-	mp->m_rgblocks = 0;
-	mp->m_rgblklog = 0;
-	mp->m_rgblkmask = (uint64_t)-1;
+	mp->m_rgblocks = sbp->sb_rgextents * sbp->sb_rextsize;
+	mp->m_rgblklog = log2_if_power2(mp->m_rgblocks);
+	mp->m_rgblkmask = mask64_if_power2(mp->m_rgblocks);
 
 	rgs->blocks = 0;
 	rgs->blklog = 0;
diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index 33b84a3a83ff63..552365d212ea26 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -39,6 +39,7 @@ extern const struct xfs_buf_ops xfs_inode_buf_ra_ops;
 extern const struct xfs_buf_ops xfs_refcountbt_buf_ops;
 extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
 extern const struct xfs_buf_ops xfs_rtbuf_ops;
+extern const struct xfs_buf_ops xfs_rtsb_buf_ops;
 extern const struct xfs_buf_ops xfs_sb_buf_ops;
 extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops;
 extern const struct xfs_buf_ops xfs_symlink_buf_ops;


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 13/52] xfs: update realtime super every time we update the primary fs super
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-12-23 22:01   ` [PATCH 12/52] xfs: define the format of rt groups Darrick J. Wong
@ 2024-12-23 22:01   ` Darrick J. Wong
  2024-12-23 22:01   ` [PATCH 14/52] xfs: export realtime group geometry via XFS_FSOP_GEOM Darrick J. Wong
                     ` (38 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:01 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 76d3be00df91a56f7c05142ed500f8f8544d5457

Every time we update parts of the primary filesystem superblock that are
echoed in the rt superblock, we must update the rt super.  Avoid
changing the log to support logging to the rt device by using ordered
buffers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
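Reviewer note: a simplified sketch of how the echoed fields could be
copied from the primary superblock, for illustration only.  The structs,
the sketch_* names, and the SKETCH_FEAT_META_UUID bit are hypothetical
stand-ins; endian conversion, the magic number, and buffer handling are
omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LABEL_MAX	12
#define SKETCH_FEAT_META_UUID	(1U << 0)	/* hypothetical flag bit */

struct sketch_sb {
	uint32_t	features;
	unsigned char	fname[SKETCH_LABEL_MAX];
	unsigned char	uuid[16];
	unsigned char	meta_uuid[16];
};

struct sketch_rtsb {
	unsigned char	fname[SKETCH_LABEL_MAX];
	unsigned char	uuid[16];
	unsigned char	meta_uuid[16];
};

/* Refresh the rt super's copies of the label and both uuids. */
static void sketch_update_rtsb(struct sketch_rtsb *rsb,
		const struct sketch_sb *sb)
{
	const unsigned char *meta;

	assert(rsb && sb);
	memcpy(rsb->fname, sb->fname, SKETCH_LABEL_MAX);
	memcpy(rsb->uuid, sb->uuid, sizeof(rsb->uuid));

	/* Without the metauuid feature, the metadata uuid is the fs uuid. */
	if (sb->features & SKETCH_FEAT_META_UUID)
		meta = sb->meta_uuid;
	else
		meta = sb->uuid;
	memcpy(rsb->meta_uuid, meta, sizeof(rsb->meta_uuid));
}
```

The fallback means the rt super always carries a usable metadata uuid,
whether or not the primary super stores a separate one.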
 include/xfs_trans.h      |    1 +
 libxfs/libxfs_api_defs.h |    1 +
 libxfs/libxfs_io.h       |    1 +
 libxfs/rdwr.c            |   17 +++++++++++++
 libxfs/trans.c           |   29 ++++++++++++++++++++++
 libxfs/xfs_rtgroup.c     |   60 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtgroup.h     |    7 +++++
 libxfs/xfs_sb.c          |   14 ++++++++++-
 libxfs/xfs_sb.h          |    2 +-
 9 files changed, 130 insertions(+), 2 deletions(-)


diff --git a/include/xfs_trans.h b/include/xfs_trans.h
index d508f8947a301c..248064019a0ab5 100644
--- a/include/xfs_trans.h
+++ b/include/xfs_trans.h
@@ -108,6 +108,7 @@ int	libxfs_trans_reserve_more(struct xfs_trans *tp, uint blocks,
 void xfs_defer_cancel(struct xfs_trans *);
 
 struct xfs_buf *libxfs_trans_getsb(struct xfs_trans *);
+struct xfs_buf *libxfs_trans_getrtsb(struct xfs_trans *tp);
 
 void	libxfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint);
 void	libxfs_trans_log_inode (struct xfs_trans *, struct xfs_inode *,
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index c869a4b0a46551..50da547f8f21d4 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -324,6 +324,7 @@
 #define xfs_trans_dirty_buf		libxfs_trans_dirty_buf
 #define xfs_trans_get_buf		libxfs_trans_get_buf
 #define xfs_trans_get_buf_map		libxfs_trans_get_buf_map
+#define xfs_trans_getrtsb		libxfs_trans_getrtsb
 #define xfs_trans_getsb			libxfs_trans_getsb
 #define xfs_trans_ichgtime		libxfs_trans_ichgtime
 #define xfs_trans_ijoin			libxfs_trans_ijoin
diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
index 82d86f1d1b37bf..99372eb6d3d13c 100644
--- a/libxfs/libxfs_io.h
+++ b/libxfs/libxfs_io.h
@@ -208,6 +208,7 @@ libxfs_buf_read(
 
 int libxfs_readbuf_verify(struct xfs_buf *bp, const struct xfs_buf_ops *ops);
 struct xfs_buf *libxfs_getsb(struct xfs_mount *mp);
+struct xfs_buf *libxfs_getrtsb(struct xfs_mount *mp);
 extern void	libxfs_bcache_purge(struct xfs_mount *mp);
 extern void	libxfs_bcache_free(void);
 extern void	libxfs_bcache_flush(struct xfs_mount *mp);
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 7d4d93e4f094e6..35be785c435a97 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -165,6 +165,23 @@ libxfs_getsb(
 	return bp;
 }
 
+struct xfs_buf *
+libxfs_getrtsb(
+	struct xfs_mount	*mp)
+{
+	struct xfs_buf		*bp;
+	int			error;
+
+	if (!mp->m_rtdev_targp->bt_bdev)
+		return NULL;
+
+	error = libxfs_buf_read_uncached(mp->m_rtdev_targp, XFS_RTSB_DADDR,
+			XFS_FSB_TO_BB(mp, 1), 0, &bp, &xfs_rtsb_buf_ops);
+	if (error)
+		return NULL;
+	return bp;
+}
+
 struct kmem_cache			*xfs_buf_cache;
 
 static struct cache_mru		xfs_buf_freelist =
diff --git a/libxfs/trans.c b/libxfs/trans.c
index 72f26591053716..01834eff4b77ca 100644
--- a/libxfs/trans.c
+++ b/libxfs/trans.c
@@ -512,6 +512,35 @@ libxfs_trans_getsb(
 	return bp;
 }
 
+struct xfs_buf *
+libxfs_trans_getrtsb(
+	struct xfs_trans	*tp)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_buf		*bp;
+	struct xfs_buf_log_item	*bip;
+	int			len = XFS_FSS_TO_BB(mp, 1);
+	DEFINE_SINGLE_BUF_MAP(map, XFS_SB_DADDR, len);
+
+	bp = xfs_trans_buf_item_match(tp, mp->m_rtdev, &map, 1);
+	if (bp != NULL) {
+		ASSERT(bp->b_transp == tp);
+		bip = bp->b_log_item;
+		ASSERT(bip != NULL);
+		bip->bli_recur++;
+		trace_xfs_trans_getsb_recur(bip);
+		return bp;
+	}
+
+	bp = libxfs_getrtsb(mp);
+	if (bp == NULL)
+		return NULL;
+
+	_libxfs_trans_bjoin(tp, bp, 1);
+	trace_xfs_trans_getsb(bp->b_log_item);
+	return bp;
+}
+
 int
 libxfs_trans_read_buf_map(
 	struct xfs_mount	*mp,
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index e294a295b3d0d8..6eabc66099dc50 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -561,3 +561,63 @@ const struct xfs_buf_ops xfs_rtsb_buf_ops = {
 	.verify_write	= xfs_rtsb_write_verify,
 	.verify_struct	= xfs_rtsb_verify_all,
 };
+
+/* Update a realtime superblock from the primary fs super */
+void
+xfs_update_rtsb(
+	struct xfs_buf		*rtsb_bp,
+	const struct xfs_buf	*sb_bp)
+{
+	const struct xfs_dsb	*dsb = sb_bp->b_addr;
+	struct xfs_rtsb		*rsb = rtsb_bp->b_addr;
+	const uuid_t		*meta_uuid;
+
+	rsb->rsb_magicnum = cpu_to_be32(XFS_RTSB_MAGIC);
+
+	rsb->rsb_pad = 0;
+	memcpy(&rsb->rsb_fname, &dsb->sb_fname, XFSLABEL_MAX);
+
+	memcpy(&rsb->rsb_uuid, &dsb->sb_uuid, sizeof(rsb->rsb_uuid));
+
+	/*
+	 * The metadata uuid is the fs uuid if the metauuid feature is not
+	 * enabled.
+	 */
+	if (dsb->sb_features_incompat &
+				cpu_to_be32(XFS_SB_FEAT_INCOMPAT_META_UUID))
+		meta_uuid = &dsb->sb_meta_uuid;
+	else
+		meta_uuid = &dsb->sb_uuid;
+	memcpy(&rsb->rsb_meta_uuid, meta_uuid, sizeof(rsb->rsb_meta_uuid));
+}
+
+/*
+ * Update the realtime superblock from a filesystem superblock and log it to
+ * the given transaction.
+ */
+struct xfs_buf *
+xfs_log_rtsb(
+	struct xfs_trans	*tp,
+	const struct xfs_buf	*sb_bp)
+{
+	struct xfs_buf		*rtsb_bp;
+
+	if (!xfs_has_rtsb(tp->t_mountp))
+		return NULL;
+
+	rtsb_bp = xfs_trans_getrtsb(tp);
+	if (!rtsb_bp) {
+		/*
+		 * It's possible for the rtgroups feature to be enabled but
+		 * there is no incore rt superblock buffer if the rt geometry
+		 * was specified at mkfs time but the rt section has not yet
+		 * been attached.  In this case, rblocks must be zero.
+		 */
+		ASSERT(tp->t_mountp->m_sb.sb_rblocks == 0);
+		return NULL;
+	}
+
+	xfs_update_rtsb(rtsb_bp, sb_bp);
+	xfs_trans_ordered_buf(tp, rtsb_bp);
+	return rtsb_bp;
+}
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 6ccf31bb6bc7a7..e7679fafff8ce7 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -251,6 +251,11 @@ static inline const char *xfs_rtginode_path(xfs_rgnumber_t rgno,
 {
 	return kasprintf(GFP_KERNEL, "%u.%s", rgno, xfs_rtginode_name(type));
 }
+
+void xfs_update_rtsb(struct xfs_buf *rtsb_bp,
+		const struct xfs_buf *sb_bp);
+struct xfs_buf *xfs_log_rtsb(struct xfs_trans *tp,
+		const struct xfs_buf *sb_bp);
 #else
 static inline void xfs_free_rtgroups(struct xfs_mount *mp,
 		xfs_rgnumber_t first_rgno, xfs_rgnumber_t end_rgno)
@@ -269,6 +274,8 @@ static inline int xfs_initialize_rtgroups(struct xfs_mount *mp,
 # define xfs_rtgroup_lock(rtg, gf)		((void)0)
 # define xfs_rtgroup_unlock(rtg, gf)		((void)0)
 # define xfs_rtgroup_trans_join(tp, rtg, gf)	((void)0)
+# define xfs_update_rtsb(bp, sb_bp)	((void)0)
+# define xfs_log_rtsb(tp, sb_bp)	(NULL)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __LIBXFS_RTGROUP_H */
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 3c80413b67f6ec..96b0a73682f435 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -24,6 +24,7 @@
 #include "xfs_health.h"
 #include "xfs_ag.h"
 #include "xfs_rtbitmap.h"
+#include "xfs_rtgroup.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -1287,10 +1288,12 @@ xfs_update_secondary_sbs(
  */
 int
 xfs_sync_sb_buf(
-	struct xfs_mount	*mp)
+	struct xfs_mount	*mp,
+	bool			update_rtsb)
 {
 	struct xfs_trans	*tp;
 	struct xfs_buf		*bp;
+	struct xfs_buf		*rtsb_bp = NULL;
 	int			error;
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, 0, 0, 0, &tp);
@@ -1300,6 +1303,11 @@ xfs_sync_sb_buf(
 	bp = xfs_trans_getsb(tp);
 	xfs_log_sb(tp);
 	xfs_trans_bhold(tp, bp);
+	if (update_rtsb) {
+		rtsb_bp = xfs_log_rtsb(tp, bp);
+		if (rtsb_bp)
+			xfs_trans_bhold(tp, rtsb_bp);
+	}
 	xfs_trans_set_sync(tp);
 	error = xfs_trans_commit(tp);
 	if (error)
@@ -1308,7 +1316,11 @@ xfs_sync_sb_buf(
 	 * write out the sb buffer to get the changes to disk
 	 */
 	error = xfs_bwrite(bp);
+	if (!error && rtsb_bp)
+		error = xfs_bwrite(rtsb_bp);
 out:
+	if (rtsb_bp)
+		xfs_buf_relse(rtsb_bp);
 	xfs_buf_relse(bp);
 	return error;
 }
diff --git a/libxfs/xfs_sb.h b/libxfs/xfs_sb.h
index 885c837559914d..999dcfccdaf960 100644
--- a/libxfs/xfs_sb.h
+++ b/libxfs/xfs_sb.h
@@ -15,7 +15,7 @@ struct xfs_perag;
 
 extern void	xfs_log_sb(struct xfs_trans *tp);
 extern int	xfs_sync_sb(struct xfs_mount *mp, bool wait);
-extern int	xfs_sync_sb_buf(struct xfs_mount *mp);
+extern int	xfs_sync_sb_buf(struct xfs_mount *mp, bool update_rtsb);
 extern void	xfs_sb_mount_common(struct xfs_mount *mp, struct xfs_sb *sbp);
 void		xfs_mount_sb_set_rextsize(struct xfs_mount *mp,
 			struct xfs_sb *sbp);



* [PATCH 14/52] xfs: export realtime group geometry via XFS_FSOP_GEOM
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (12 preceding siblings ...)
  2024-12-23 22:01   ` [PATCH 13/52] xfs: update realtime super every time we update the primary fs super Darrick J. Wong
@ 2024-12-23 22:01   ` Darrick J. Wong
  2024-12-23 22:02   ` [PATCH 15/52] xfs: check that rtblock extents do not break rtsupers or rtgroups Darrick J. Wong
                     ` (37 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:01 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 8edde94d640153d645f85b94b2e1af8872c11ac8

Export the realtime geometry information so that userspace can query it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h |    4 +++-
 libxfs/xfs_sb.c |    5 +++++
 2 files changed, 8 insertions(+), 1 deletion(-)
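
A note on the layout change: the two new 32-bit fields take the place of exactly one 64-bit reserved slot, so the size of the structure should stay the same. Here is a minimal sketch of that invariant. The two structs are hypothetical stand-ins that model only the tail of struct xfs_fsop_geom, not the real definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the tail of the structure before the patch. */
struct toy_geom_tail_old {
	uint32_t	sick;
	uint32_t	checked;
	uint64_t	reserved[17];
};

/* Hypothetical model of the tail of the structure after the patch. */
struct toy_geom_tail_new {
	uint32_t	sick;
	uint32_t	checked;
	uint32_t	rgextents;	/* rt extents in a realtime group */
	uint32_t	rgcount;	/* number of realtime groups */
	uint64_t	reserved[16];
};

/* Two 32-bit fields consume one 64-bit reserved slot; the size is unchanged. */
_Static_assert(sizeof(struct toy_geom_tail_old) ==
	       sizeof(struct toy_geom_tail_new),
	       "geometry structure size must not change");
```

In this model the reserved area moves from offset 8 to offset 16 and both variants stay 144 bytes long. The new fields are only filled in when xfs_has_rtgroups() is true.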


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index faa38a7d1eb019..5c224d03270ce9 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -187,7 +187,9 @@ struct xfs_fsop_geom {
 	__u32		logsunit;	/* log stripe unit, bytes	*/
 	uint32_t	sick;		/* o: unhealthy fs & rt metadata */
 	uint32_t	checked;	/* o: checked fs & rt metadata	*/
-	__u64		reserved[17];	/* reserved space		*/
+	__u32		rgextents;	/* rt extents in a realtime group */
+	__u32		rgcount;	/* number of realtime groups	*/
+	__u64		reserved[16];	/* reserved space		*/
 };
 
 #define XFS_FSOP_GEOM_SICK_COUNTERS	(1 << 0)  /* summary counters */
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 96b0a73682f435..2e536bc3b2090b 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1424,6 +1424,11 @@ xfs_fs_geometry(
 		return;
 
 	geo->version = XFS_FSOP_GEOM_VERSION_V5;
+
+	if (xfs_has_rtgroups(mp)) {
+		geo->rgcount = sbp->sb_rgcount;
+		geo->rgextents = sbp->sb_rgextents;
+	}
 }
 
 /* Read a secondary superblock. */



* [PATCH 15/52] xfs: check that rtblock extents do not break rtsupers or rtgroups
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (13 preceding siblings ...)
  2024-12-23 22:01   ` [PATCH 14/52] xfs: export realtime group geometry via XFS_FSOP_GEOM Darrick J. Wong
@ 2024-12-23 22:02   ` Darrick J. Wong
  2024-12-23 22:02   ` [PATCH 16/52] xfs: add a helper to prevent bmap merges across rtgroup boundaries Darrick J. Wong
                     ` (36 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:02 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 9bb512734722d2815bb79e27850dddeeff10db90

Check that rt block pointers do not point to the realtime superblock and
that allocated rt space extents do not cross rtgroup boundaries.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_types.c |   38 +++++++++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 5 deletions(-)
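
To illustrate the checks described above, here is a standalone sketch of the block and extent verifiers. Everything in it is a toy model: the toy_* names are hypothetical, it assumes linear block numbering (group number = block number / blocks per group), and it always applies the group checks, whereas the patch only does so when xfs_has_rtgroups() is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy geometry; these are not the libxfs field names. */
struct toy_rtgeom {
	uint64_t	rblocks;	/* total rt blocks on the device */
	uint32_t	rextsize;	/* blocks per rt extent */
	uint32_t	rgextents;	/* rt extents in a full group */
	uint32_t	rgcount;	/* number of rt groups */
	bool		has_rtsb;	/* extent 0 of group 0 holds the rt super */
};

static uint64_t toy_rgblocks(const struct toy_rtgeom *g)
{
	assert(g->rextsize > 0 && g->rgextents > 0);
	return (uint64_t)g->rgextents * g->rextsize;
}

/* Assumption: block numbers are linear, so group = block / group size. */
static uint32_t toy_rtb_to_rgno(const struct toy_rtgeom *g, uint64_t rtbno)
{
	return (uint32_t)(rtbno / toy_rgblocks(g));
}

/* Extent index relative to the start of the containing group. */
static uint64_t toy_rtb_to_rtx(const struct toy_rtgeom *g, uint64_t rtbno)
{
	return (rtbno % toy_rgblocks(g)) / g->rextsize;
}

/* Number of extents in a group; the last group may be short. */
static uint64_t toy_group_extents(const struct toy_rtgeom *g, uint32_t rgno)
{
	uint64_t	total = g->rblocks / g->rextsize;

	if (rgno == g->rgcount - 1)
		return total - (uint64_t)rgno * g->rgextents;
	return g->rgextents;
}

static bool toy_verify_rtbno(const struct toy_rtgeom *g, uint64_t rtbno)
{
	uint32_t	rgno;
	uint64_t	rtx;

	if (rtbno >= g->rblocks)
		return false;
	rgno = toy_rtb_to_rgno(g, rtbno);
	rtx = toy_rtb_to_rtx(g, rtbno);
	if (rgno >= g->rgcount)
		return false;
	if (rtx >= toy_group_extents(g, rgno))
		return false;
	/* The rt superblock occupies the first extent of group 0. */
	if (g->has_rtsb && rgno == 0 && rtx == 0)
		return false;
	return true;
}

static bool toy_verify_rtbext(const struct toy_rtgeom *g, uint64_t rtbno,
		uint64_t len)
{
	if (rtbno + len <= rtbno)	/* zero length or wraparound */
		return false;
	if (!toy_verify_rtbno(g, rtbno) ||
	    !toy_verify_rtbno(g, rtbno + len - 1))
		return false;
	/* Both ends must land in the same group. */
	return toy_rtb_to_rgno(g, rtbno) == toy_rtb_to_rgno(g, rtbno + len - 1);
}
```

With 4 blocks per extent and 10 extents per group, an extent covering blocks 38 through 41 is rejected because its two ends fall in groups 0 and 1, even though each end is a valid block on its own.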


diff --git a/libxfs/xfs_types.c b/libxfs/xfs_types.c
index a70c0395979cf8..4e366e62eb2ae8 100644
--- a/libxfs/xfs_types.c
+++ b/libxfs/xfs_types.c
@@ -12,6 +12,8 @@
 #include "xfs_bit.h"
 #include "xfs_mount.h"
 #include "xfs_ag.h"
+#include "xfs_rtbitmap.h"
+#include "xfs_rtgroup.h"
 
 
 /*
@@ -135,18 +137,37 @@ xfs_verify_dir_ino(
 }
 
 /*
- * Verify that an realtime block number pointer doesn't point off the
- * end of the realtime device.
+ * Verify that a realtime block number pointer neither points outside the
+ * allocatable areas of the rtgroup nor off the end of the realtime
+ * device.
  */
 inline bool
 xfs_verify_rtbno(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
 {
-	return rtbno < mp->m_sb.sb_rblocks;
+	if (rtbno >= mp->m_sb.sb_rblocks)
+		return false;
+
+	if (xfs_has_rtgroups(mp)) {
+		xfs_rgnumber_t	rgno = xfs_rtb_to_rgno(mp, rtbno);
+		xfs_rtxnum_t	rtx = xfs_rtb_to_rtx(mp, rtbno);
+
+		if (rgno >= mp->m_sb.sb_rgcount)
+			return false;
+		if (rtx >= xfs_rtgroup_extents(mp, rgno))
+			return false;
+		if (xfs_has_rtsb(mp) && rgno == 0 && rtx == 0)
+			return false;
+	}
+	return true;
 }
 
-/* Verify that a realtime device extent is fully contained inside the volume. */
+/*
+ * Verify that an allocated realtime device extent neither points outside
+ * allocatable areas of the rtgroup, across an rtgroup boundary, nor off the
+ * end of the realtime device.
+ */
 bool
 xfs_verify_rtbext(
 	struct xfs_mount	*mp,
@@ -159,7 +180,14 @@ xfs_verify_rtbext(
 	if (!xfs_verify_rtbno(mp, rtbno))
 		return false;
 
-	return xfs_verify_rtbno(mp, rtbno + len - 1);
+	if (!xfs_verify_rtbno(mp, rtbno + len - 1))
+		return false;
+
+	if (xfs_has_rtgroups(mp) &&
+	    xfs_rtb_to_rgno(mp, rtbno) != xfs_rtb_to_rgno(mp, rtbno + len - 1))
+		return false;
+
+	return true;
 }
 
 /* Calculate the range of valid icount values. */



* [PATCH 16/52] xfs: add a helper to prevent bmap merges across rtgroup boundaries
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (14 preceding siblings ...)
  2024-12-23 22:02   ` [PATCH 15/52] xfs: check that rtblock extents do not break rtsupers or rtgroups Darrick J. Wong
@ 2024-12-23 22:02   ` Darrick J. Wong
  2024-12-23 22:02   ` [PATCH 17/52] xfs: add frextents to the lazysbcounters when rtgroups enabled Darrick J. Wong
                     ` (35 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:02 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: 8458c4944e10aa8119d9de88e257d60a3537263e

Except for the rt superblock, realtime groups do not store any metadata
at the start (or end) of the group.  There is nothing to prevent the
bmap code from merging allocations from multiple groups into a single
bmap record.  Add a helper to check for this case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: massage the commit message after pulling this into rtgroups]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_bmap.c |   56 ++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 44 insertions(+), 12 deletions(-)
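
The case the helper guards against can be sketched as follows: two mappings that are contiguous in both file offset and rt block number, but whose start blocks lie in different groups. The toy_* names are hypothetical, the group lookup assumes linear block numbering, and the extent state and XFS_MAX_BMBT_EXTLEN checks from the real merge conditions are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct xfs_bmbt_irec. */
struct toy_irec {
	uint64_t	startoff;	/* file offset, in blocks */
	uint64_t	startblock;	/* rt block number */
	uint64_t	blockcount;	/* length, in blocks */
};

/* Assumption: group number = start block / blocks per group. */
static bool toy_same_rtgroup(uint64_t rgblocks, const struct toy_irec *left,
		const struct toy_irec *right)
{
	assert(rgblocks > 0);
	return left->startblock / rgblocks == right->startblock / rgblocks;
}

/* Can "new" be merged onto the end of "left"? */
static bool toy_left_contig(uint64_t rgblocks, const struct toy_irec *left,
		const struct toy_irec *new)
{
	return left->startoff + left->blockcount == new->startoff &&
	       left->startblock + left->blockcount == new->startblock &&
	       toy_same_rtgroup(rgblocks, left, new);
}
```

Comparing only the two start blocks appears to be sufficient here because each record on its own is not supposed to cross a group boundary, so the group of the start block identifies the group of the whole record.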


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 48b05c40e23235..d7769f0e70005d 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -34,6 +34,7 @@
 #include "defer_item.h"
 #include "xfs_symlink_remote.h"
 #include "xfs_inode_util.h"
+#include "xfs_rtgroup.h"
 
 struct kmem_cache		*xfs_bmap_intent_cache;
 
@@ -1420,6 +1421,24 @@ xfs_bmap_last_offset(
  * Extent tree manipulation functions used during allocation.
  */
 
+static inline bool
+xfs_bmap_same_rtgroup(
+	struct xfs_inode	*ip,
+	int			whichfork,
+	struct xfs_bmbt_irec	*left,
+	struct xfs_bmbt_irec	*right)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (xfs_ifork_is_realtime(ip, whichfork) && xfs_has_rtgroups(mp)) {
+		if (xfs_rtb_to_rgno(mp, left->br_startblock) !=
+		    xfs_rtb_to_rgno(mp, right->br_startblock))
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * Convert a delayed allocation to a real allocation.
  */
@@ -1489,7 +1508,8 @@ xfs_bmap_add_extent_delay_real(
 	    LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff &&
 	    LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock &&
 	    LEFT.br_state == new->br_state &&
-	    LEFT.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
+	    LEFT.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
+	    xfs_bmap_same_rtgroup(bma->ip, whichfork, &LEFT, new))
 		state |= BMAP_LEFT_CONTIG;
 
 	/*
@@ -1513,7 +1533,8 @@ xfs_bmap_add_extent_delay_real(
 		      (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
 		       BMAP_RIGHT_FILLING) ||
 	     LEFT.br_blockcount + new->br_blockcount + RIGHT.br_blockcount
-			<= XFS_MAX_BMBT_EXTLEN))
+			<= XFS_MAX_BMBT_EXTLEN) &&
+	    xfs_bmap_same_rtgroup(bma->ip, whichfork, new, &RIGHT))
 		state |= BMAP_RIGHT_CONTIG;
 
 	error = 0;
@@ -2058,7 +2079,8 @@ xfs_bmap_add_extent_unwritten_real(
 	    LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff &&
 	    LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock &&
 	    LEFT.br_state == new->br_state &&
-	    LEFT.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
+	    LEFT.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
+	    xfs_bmap_same_rtgroup(ip, whichfork, &LEFT, new))
 		state |= BMAP_LEFT_CONTIG;
 
 	/*
@@ -2082,7 +2104,8 @@ xfs_bmap_add_extent_unwritten_real(
 		      (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
 		       BMAP_RIGHT_FILLING) ||
 	     LEFT.br_blockcount + new->br_blockcount + RIGHT.br_blockcount
-			<= XFS_MAX_BMBT_EXTLEN))
+			<= XFS_MAX_BMBT_EXTLEN) &&
+	    xfs_bmap_same_rtgroup(ip, whichfork, new, &RIGHT))
 		state |= BMAP_RIGHT_CONTIG;
 
 	/*
@@ -2591,7 +2614,8 @@ xfs_bmap_add_extent_hole_delay(
 	 */
 	if ((state & BMAP_LEFT_VALID) && (state & BMAP_LEFT_DELAY) &&
 	    left.br_startoff + left.br_blockcount == new->br_startoff &&
-	    left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
+	    left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
+	    xfs_bmap_same_rtgroup(ip, whichfork, &left, new))
 		state |= BMAP_LEFT_CONTIG;
 
 	if ((state & BMAP_RIGHT_VALID) && (state & BMAP_RIGHT_DELAY) &&
@@ -2599,7 +2623,8 @@ xfs_bmap_add_extent_hole_delay(
 	    new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
 	    (!(state & BMAP_LEFT_CONTIG) ||
 	     (left.br_blockcount + new->br_blockcount +
-	      right.br_blockcount <= XFS_MAX_BMBT_EXTLEN)))
+	      right.br_blockcount <= XFS_MAX_BMBT_EXTLEN)) &&
+	    xfs_bmap_same_rtgroup(ip, whichfork, new, &right))
 		state |= BMAP_RIGHT_CONTIG;
 
 	/*
@@ -2742,7 +2767,8 @@ xfs_bmap_add_extent_hole_real(
 	    left.br_startoff + left.br_blockcount == new->br_startoff &&
 	    left.br_startblock + left.br_blockcount == new->br_startblock &&
 	    left.br_state == new->br_state &&
-	    left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
+	    left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
+	    xfs_bmap_same_rtgroup(ip, whichfork, &left, new))
 		state |= BMAP_LEFT_CONTIG;
 
 	if ((state & BMAP_RIGHT_VALID) && !(state & BMAP_RIGHT_DELAY) &&
@@ -2752,7 +2778,8 @@ xfs_bmap_add_extent_hole_real(
 	    new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
 	    (!(state & BMAP_LEFT_CONTIG) ||
 	     left.br_blockcount + new->br_blockcount +
-	     right.br_blockcount <= XFS_MAX_BMBT_EXTLEN))
+	     right.br_blockcount <= XFS_MAX_BMBT_EXTLEN) &&
+	    xfs_bmap_same_rtgroup(ip, whichfork, new, &right))
 		state |= BMAP_RIGHT_CONTIG;
 
 	error = 0;
@@ -5709,6 +5736,8 @@ xfs_bunmapi(
  */
 STATIC bool
 xfs_bmse_can_merge(
+	struct xfs_inode	*ip,
+	int			whichfork,
 	struct xfs_bmbt_irec	*left,	/* preceding extent */
 	struct xfs_bmbt_irec	*got,	/* current extent to shift */
 	xfs_fileoff_t		shift)	/* shift fsb */
@@ -5724,7 +5753,8 @@ xfs_bmse_can_merge(
 	if ((left->br_startoff + left->br_blockcount != startoff) ||
 	    (left->br_startblock + left->br_blockcount != got->br_startblock) ||
 	    (left->br_state != got->br_state) ||
-	    (left->br_blockcount + got->br_blockcount > XFS_MAX_BMBT_EXTLEN))
+	    (left->br_blockcount + got->br_blockcount > XFS_MAX_BMBT_EXTLEN) ||
+	    !xfs_bmap_same_rtgroup(ip, whichfork, left, got))
 		return false;
 
 	return true;
@@ -5760,7 +5790,7 @@ xfs_bmse_merge(
 	blockcount = left->br_blockcount + got->br_blockcount;
 
 	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);
-	ASSERT(xfs_bmse_can_merge(left, got, shift));
+	ASSERT(xfs_bmse_can_merge(ip, whichfork, left, got, shift));
 
 	new = *left;
 	new.br_blockcount = blockcount;
@@ -5922,7 +5952,8 @@ xfs_bmap_collapse_extents(
 			goto del_cursor;
 		}
 
-		if (xfs_bmse_can_merge(&prev, &got, offset_shift_fsb)) {
+		if (xfs_bmse_can_merge(ip, whichfork, &prev, &got,
+				offset_shift_fsb)) {
 			error = xfs_bmse_merge(tp, ip, whichfork,
 					offset_shift_fsb, &icur, &got, &prev,
 					cur, &logflags);
@@ -6058,7 +6089,8 @@ xfs_bmap_insert_extents(
 		 * never find mergeable extents in this scenario.  Check anyways
 		 * and warn if we encounter two extents that could be one.
 		 */
-		if (xfs_bmse_can_merge(&got, &next, offset_shift_fsb))
+		if (xfs_bmse_can_merge(ip, whichfork, &got, &next,
+				offset_shift_fsb))
 			WARN_ON_ONCE(1);
 	}
 



* [PATCH 17/52] xfs: add frextents to the lazysbcounters when rtgroups enabled
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (15 preceding siblings ...)
  2024-12-23 22:02   ` [PATCH 16/52] xfs: add a helper to prevent bmap merges across rtgroup boundaries Darrick J. Wong
@ 2024-12-23 22:02   ` Darrick J. Wong
  2024-12-23 22:02   ` [PATCH 18/52] xfs: record rt group metadata errors in the health system Darrick J. Wong
                     ` (34 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:02 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 35537f25d23697716f0070ea0a6e8b3f1fe10196

Make the free rt extent count a part of the lazy sb counters when the
realtime groups feature is enabled.  This is possible because the patch
to recompute frextents from the rtbitmap during log recovery predates
the code adding rtgroup support, hence we know that the value will
always be correct during runtime.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_mount.h |    1 +
 libxfs/xfs_sb.c     |   15 ++++++++++-----
 2 files changed, 11 insertions(+), 5 deletions(-)
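
For readers unfamiliar with why the _positive variant matters, here is a rough model of a per-CPU counter whose summed value can dip below zero for a moment. This is not the kernel's percpu_counter implementation; the toy_* struct and functions are hypothetical and only show the clamping behaviour that the new comment in xfs_log_sb() relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a global count plus unfolded per-CPU deltas. */
struct toy_percpu_counter {
	int64_t		count;		/* global part */
	const int32_t	*deltas;	/* per-CPU parts, not yet folded in */
	size_t		nr_cpus;
};

/* Raw sum; may be negative while reservations are in flight. */
static int64_t toy_counter_sum(const struct toy_percpu_counter *c)
{
	int64_t		sum = c->count;
	size_t		i;

	assert(c->deltas != NULL || c->nr_cpus == 0);
	for (i = 0; i < c->nr_cpus; i++)
		sum += c->deltas[i];
	return sum;
}

/* Clamp at zero so that a negative value is never written to disk. */
static int64_t toy_counter_sum_positive(const struct toy_percpu_counter *c)
{
	int64_t		sum = toy_counter_sum(c);

	return sum < 0 ? 0 : sum;
}
```

With a global count of 10 and per-CPU deltas of 5, -20 and 3, the raw sum is -2. Writing that into an unsigned ondisk field would yield a huge bogus frextents value, which is what the clamp avoids.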


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 31ac60d93c61cc..c03246a08a4af3 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -64,6 +64,7 @@ typedef struct xfs_mount {
 #define m_icount	m_sb.sb_icount
 #define m_ifree		m_sb.sb_ifree
 #define m_fdblocks	m_sb.sb_fdblocks
+#define m_frextents	m_sb.sb_frextents
 	spinlock_t		m_sb_lock;
 
 	/*
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 2e536bc3b2090b..88fb4890d95a72 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1163,11 +1163,6 @@ xfs_log_sb(
 	 * reservations that have been taken out percpu counters. If we have an
 	 * unclean shutdown, this will be corrected by log recovery rebuilding
 	 * the counters from the AGF block counts.
-	 *
-	 * Do not update sb_frextents here because it is not part of the lazy
-	 * sb counters, despite having a percpu counter. It is always kept
-	 * consistent with the ondisk rtbitmap by xfs_trans_apply_sb_deltas()
-	 * and hence we don't need have to update it here.
 	 */
 	if (xfs_has_lazysbcount(mp)) {
 		mp->m_sb.sb_icount = percpu_counter_sum_positive(&mp->m_icount);
@@ -1178,6 +1173,16 @@ xfs_log_sb(
 				percpu_counter_sum_positive(&mp->m_fdblocks);
 	}
 
+	/*
+	 * sb_frextents was added to the lazy sb counters when the rt groups
+	 * feature was introduced.  This counter can go negative due to the way
+	 * we handle nearly-lockless reservations, so we must use the _positive
+	 * variant here to avoid writing out nonsense frextents.
+	 */
+	if (xfs_has_rtgroups(mp))
+		mp->m_sb.sb_frextents =
+				percpu_counter_sum_positive(&mp->m_frextents);
+
 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
 	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
 	xfs_trans_log_buf(tp, bp, 0, sizeof(struct xfs_dsb) - 1);



* [PATCH 18/52] xfs: record rt group metadata errors in the health system
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (16 preceding siblings ...)
  2024-12-23 22:02   ` [PATCH 17/52] xfs: add frextents to the lazysbcounters when rtgroups enabled Darrick J. Wong
@ 2024-12-23 22:02   ` Darrick J. Wong
  2024-12-23 22:03   ` [PATCH 19/52] xfs: export the geometry of realtime groups to userspace Darrick J. Wong
                     ` (33 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:02 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: ab7bd650e17a392a205ec6b6c72b97cae18d43b4

Record the state of per-rtgroup metadata sickness in the rtgroup
structure for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/util.c         |    1 -
 libxfs/xfs_health.h   |   40 +++++++++++++++++++++++-----------------
 libxfs/xfs_rtbitmap.c |   37 +++++++++++++++++++------------------
 libxfs/xfs_rtgroup.c  |   38 +++++++++++++++++++++++++++++++++-----
 libxfs/xfs_rtgroup.h  |    1 +
 5 files changed, 76 insertions(+), 41 deletions(-)
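
The table-driven approach in this patch can be sketched in isolation: each rtgroup inode type carries its own sickness flag, so callers pass the inode type and the helper looks up the matching mask. The flag values mirror the XFS_SICK_RG_* definitions in this patch, but every toy_* name is hypothetical and the locking done by the real group health code is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SICK_RG_SUPER	(1U << 0)	/* rt group superblock */
#define TOY_SICK_RG_BITMAP	(1U << 1)	/* rt group bitmap */
#define TOY_SICK_RG_SUMMARY	(1U << 2)	/* rt group summary */

enum toy_rtg_inodes {
	TOY_RTGI_BITMAP,
	TOY_RTGI_SUMMARY,
	TOY_RTGI_MAX,
};

/* Per-inode-type table; only the fields needed for this sketch. */
struct toy_rtginode_ops {
	const char	*name;
	unsigned int	sick;		/* rtgroup sickness flag */
};

static const struct toy_rtginode_ops toy_rtginode_ops[TOY_RTGI_MAX] = {
	[TOY_RTGI_BITMAP]  = { .name = "bitmap",  .sick = TOY_SICK_RG_BITMAP },
	[TOY_RTGI_SUMMARY] = { .name = "summary", .sick = TOY_SICK_RG_SUMMARY },
};

/* Hypothetical group object holding only the sickness mask. */
struct toy_rtgroup {
	unsigned int	sick;
};

/* Record that the metadata behind this inode type looks bad. */
static void toy_rtginode_mark_sick(struct toy_rtgroup *rtg,
		enum toy_rtg_inodes type)
{
	assert(type < TOY_RTGI_MAX);
	rtg->sick |= toy_rtginode_ops[type].sick;
}

static bool toy_rtgroup_has_sickness(const struct toy_rtgroup *rtg,
		unsigned int mask)
{
	return (rtg->sick & mask) != 0;
}
```

Compared with the old issum boolean, the lookup means xfs_rtbuf_get() and its callers no longer need to repeat the "summary or bitmap" ternary at every call site.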


diff --git a/libxfs/util.c b/libxfs/util.c
index 8c2ecff5855775..83f2f048bbe9f2 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -507,7 +507,6 @@ void xfs_btree_mark_sick(struct xfs_btree_cur *cur) { }
 void xfs_dirattr_mark_sick(struct xfs_inode *ip, int whichfork) { }
 void xfs_da_mark_sick(struct xfs_da_args *args) { }
 void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask) { }
-void xfs_rt_mark_sick(struct xfs_mount *mp, unsigned int mask) { }
 
 #ifdef HAVE_GETRANDOM_NONBLOCK
 uint32_t
diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h
index a23df94319e5fb..3d93b57cf57143 100644
--- a/libxfs/xfs_health.h
+++ b/libxfs/xfs_health.h
@@ -54,6 +54,7 @@ struct xfs_inode;
 struct xfs_fsop_geom;
 struct xfs_btree_cur;
 struct xfs_da_args;
+struct xfs_rtgroup;
 
 /* Observable health issues for metadata spanning the entire filesystem. */
 #define XFS_SICK_FS_COUNTERS	(1 << 0)  /* summary counters */
@@ -65,9 +66,10 @@ struct xfs_da_args;
 #define XFS_SICK_FS_METADIR	(1 << 6)  /* metadata directory tree */
 #define XFS_SICK_FS_METAPATH	(1 << 7)  /* metadata directory tree path */
 
-/* Observable health issues for realtime volume metadata. */
-#define XFS_SICK_RT_BITMAP	(1 << 0)  /* realtime bitmap */
-#define XFS_SICK_RT_SUMMARY	(1 << 1)  /* realtime summary */
+/* Observable health issues for realtime group metadata. */
+#define XFS_SICK_RG_SUPER	(1 << 0)  /* rt group superblock */
+#define XFS_SICK_RG_BITMAP	(1 << 1)  /* rt group bitmap */
+#define XFS_SICK_RG_SUMMARY	(1 << 2)  /* rt groups summary */
 
 /* Observable health issues for AG metadata. */
 #define XFS_SICK_AG_SB		(1 << 0)  /* superblock */
@@ -111,8 +113,9 @@ struct xfs_da_args;
 				 XFS_SICK_FS_METADIR | \
 				 XFS_SICK_FS_METAPATH)
 
-#define XFS_SICK_RT_PRIMARY	(XFS_SICK_RT_BITMAP | \
-				 XFS_SICK_RT_SUMMARY)
+#define XFS_SICK_RG_PRIMARY	(XFS_SICK_RG_SUPER | \
+				 XFS_SICK_RG_BITMAP | \
+				 XFS_SICK_RG_SUMMARY)
 
 #define XFS_SICK_AG_PRIMARY	(XFS_SICK_AG_SB | \
 				 XFS_SICK_AG_AGF | \
@@ -142,26 +145,26 @@ struct xfs_da_args;
 
 /* Secondary state related to (but not primary evidence of) health problems. */
 #define XFS_SICK_FS_SECONDARY	(0)
-#define XFS_SICK_RT_SECONDARY	(0)
+#define XFS_SICK_RG_SECONDARY	(0)
 #define XFS_SICK_AG_SECONDARY	(0)
 #define XFS_SICK_INO_SECONDARY	(XFS_SICK_INO_FORGET)
 
 /* Evidence of health problems elsewhere. */
 #define XFS_SICK_FS_INDIRECT	(0)
-#define XFS_SICK_RT_INDIRECT	(0)
+#define XFS_SICK_RG_INDIRECT	(0)
 #define XFS_SICK_AG_INDIRECT	(XFS_SICK_AG_INODES)
 #define XFS_SICK_INO_INDIRECT	(0)
 
 /* All health masks. */
-#define XFS_SICK_FS_ALL	(XFS_SICK_FS_PRIMARY | \
+#define XFS_SICK_FS_ALL		(XFS_SICK_FS_PRIMARY | \
 				 XFS_SICK_FS_SECONDARY | \
 				 XFS_SICK_FS_INDIRECT)
 
-#define XFS_SICK_RT_ALL	(XFS_SICK_RT_PRIMARY | \
-				 XFS_SICK_RT_SECONDARY | \
-				 XFS_SICK_RT_INDIRECT)
+#define XFS_SICK_RG_ALL		(XFS_SICK_RG_PRIMARY | \
+				 XFS_SICK_RG_SECONDARY | \
+				 XFS_SICK_RG_INDIRECT)
 
-#define XFS_SICK_AG_ALL	(XFS_SICK_AG_PRIMARY | \
+#define XFS_SICK_AG_ALL		(XFS_SICK_AG_PRIMARY | \
 				 XFS_SICK_AG_SECONDARY | \
 				 XFS_SICK_AG_INDIRECT)
 
@@ -195,11 +198,8 @@ void xfs_fs_mark_healthy(struct xfs_mount *mp, unsigned int mask);
 void xfs_fs_measure_sickness(struct xfs_mount *mp, unsigned int *sick,
 		unsigned int *checked);
 
-void xfs_rt_mark_sick(struct xfs_mount *mp, unsigned int mask);
-void xfs_rt_mark_corrupt(struct xfs_mount *mp, unsigned int mask);
-void xfs_rt_mark_healthy(struct xfs_mount *mp, unsigned int mask);
-void xfs_rt_measure_sickness(struct xfs_mount *mp, unsigned int *sick,
-		unsigned int *checked);
+void xfs_rgno_mark_sick(struct xfs_mount *mp, xfs_rgnumber_t rgno,
+		unsigned int mask);
 
 void xfs_agno_mark_sick(struct xfs_mount *mp, xfs_agnumber_t agno,
 		unsigned int mask);
@@ -244,11 +244,17 @@ xfs_group_has_sickness(
 	xfs_group_measure_sickness(xg, &sick, &checked);
 	return sick & mask;
 }
+
 #define xfs_ag_has_sickness(pag, mask) \
 	xfs_group_has_sickness(pag_group(pag), (mask))
 #define xfs_ag_is_healthy(pag) \
 	(!xfs_ag_has_sickness((pag), UINT_MAX))
 
+#define xfs_rtgroup_has_sickness(rtg, mask) \
+	xfs_group_has_sickness(rtg_group(rtg), (mask))
+#define xfs_rtgroup_is_healthy(rtg) \
+	(!xfs_rtgroup_has_sickness((rtg), UINT_MAX))
+
 static inline bool
 xfs_inode_has_sickness(struct xfs_inode *ip, unsigned int mask)
 {
diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index 84ca5447ca770f..5e63340668a03d 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -74,28 +74,31 @@ static int
 xfs_rtbuf_get(
 	struct xfs_rtalloc_args	*args,
 	xfs_fileoff_t		block,	/* block number in bitmap or summary */
-	int			issum)	/* is summary not bitmap */
+	enum xfs_rtg_inodes	type)
 {
+	struct xfs_inode	*ip = args->rtg->rtg_inodes[type];
 	struct xfs_mount	*mp = args->mp;
 	struct xfs_buf		**cbpp;	/* cached block buffer */
 	xfs_fileoff_t		*coffp;	/* cached block number */
 	struct xfs_buf		*bp;	/* block buffer, result */
-	struct xfs_inode	*ip;	/* bitmap or summary inode */
 	struct xfs_bmbt_irec	map;
-	enum xfs_blft		type;
+	enum xfs_blft		buf_type;
 	int			nmap = 1;
 	int			error;
 
-	if (issum) {
+	switch (type) {
+	case XFS_RTGI_SUMMARY:
 		cbpp = &args->sumbp;
 		coffp = &args->sumoff;
-		ip = args->rtg->rtg_inodes[XFS_RTGI_SUMMARY];
-		type = XFS_BLFT_RTSUMMARY_BUF;
-	} else {
+		buf_type = XFS_BLFT_RTSUMMARY_BUF;
+		break;
+	case XFS_RTGI_BITMAP:
 		cbpp = &args->rbmbp;
 		coffp = &args->rbmoff;
-		ip = args->rtg->rtg_inodes[XFS_RTGI_BITMAP];
-		type = XFS_BLFT_RTBITMAP_BUF;
+		buf_type = XFS_BLFT_RTBITMAP_BUF;
+		break;
+	default:
+		return -EINVAL;
 	}
 
 	/*
@@ -118,8 +121,7 @@ xfs_rtbuf_get(
 		return error;
 
 	if (XFS_IS_CORRUPT(mp, nmap == 0 || !xfs_bmap_is_written_extent(&map))) {
-		xfs_rt_mark_sick(mp, issum ? XFS_SICK_RT_SUMMARY :
-					     XFS_SICK_RT_BITMAP);
+		xfs_rtginode_mark_sick(args->rtg, type);
 		return -EFSCORRUPTED;
 	}
 
@@ -128,12 +130,11 @@ xfs_rtbuf_get(
 				   XFS_FSB_TO_DADDR(mp, map.br_startblock),
 				   mp->m_bsize, 0, &bp, &xfs_rtbuf_ops);
 	if (xfs_metadata_is_sick(error))
-		xfs_rt_mark_sick(mp, issum ? XFS_SICK_RT_SUMMARY :
-					     XFS_SICK_RT_BITMAP);
+		xfs_rtginode_mark_sick(args->rtg, type);
 	if (error)
 		return error;
 
-	xfs_trans_buf_set_type(args->tp, bp, type);
+	xfs_trans_buf_set_type(args->tp, bp, buf_type);
 	*cbpp = bp;
 	*coffp = block;
 	return 0;
@@ -147,11 +148,11 @@ xfs_rtbitmap_read_buf(
 	struct xfs_mount		*mp = args->mp;
 
 	if (XFS_IS_CORRUPT(mp, block >= mp->m_sb.sb_rbmblocks)) {
-		xfs_rt_mark_sick(mp, XFS_SICK_RT_BITMAP);
+		xfs_rtginode_mark_sick(args->rtg, XFS_RTGI_BITMAP);
 		return -EFSCORRUPTED;
 	}
 
-	return xfs_rtbuf_get(args, block, 0);
+	return xfs_rtbuf_get(args, block, XFS_RTGI_BITMAP);
 }
 
 int
@@ -162,10 +163,10 @@ xfs_rtsummary_read_buf(
 	struct xfs_mount		*mp = args->mp;
 
 	if (XFS_IS_CORRUPT(mp, block >= mp->m_rsumblocks)) {
-		xfs_rt_mark_sick(args->mp, XFS_SICK_RT_SUMMARY);
+		xfs_rtginode_mark_sick(args->rtg, XFS_RTGI_SUMMARY);
 		return -EFSCORRUPTED;
 	}
-	return xfs_rtbuf_get(args, block, 1);
+	return xfs_rtbuf_get(args, block, XFS_RTGI_SUMMARY);
 }
 
 /*
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 6eabc66099dc50..f753ed5fc05588 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -268,6 +268,8 @@ struct xfs_rtginode_ops {
 
 	enum xfs_metafile_type	metafile_type;
 
+	unsigned int		sick;	/* rtgroup sickness flag */
+
 	/* Does the fs have this feature? */
 	bool			(*enabled)(struct xfs_mount *mp);
 
@@ -282,11 +284,13 @@ static const struct xfs_rtginode_ops xfs_rtginode_ops[XFS_RTGI_MAX] = {
 	[XFS_RTGI_BITMAP] = {
 		.name		= "bitmap",
 		.metafile_type	= XFS_METAFILE_RTBITMAP,
+		.sick		= XFS_SICK_RG_BITMAP,
 		.create		= xfs_rtbitmap_create,
 	},
 	[XFS_RTGI_SUMMARY] = {
 		.name		= "summary",
 		.metafile_type	= XFS_METAFILE_RTSUMMARY,
+		.sick		= XFS_SICK_RG_SUMMARY,
 		.create		= xfs_rtsummary_create,
 	},
 };
@@ -320,6 +324,17 @@ xfs_rtginode_enabled(
 	return ops->enabled(rtg_mount(rtg));
 }
 
+/* Mark an rtgroup inode sick */
+void
+xfs_rtginode_mark_sick(
+	struct xfs_rtgroup	*rtg,
+	enum xfs_rtg_inodes	type)
+{
+	const struct xfs_rtginode_ops *ops = &xfs_rtginode_ops[type];
+
+	xfs_group_mark_sick(rtg_group(rtg), ops->sick);
+}
+
 /* Load and existing rtgroup inode into the rtgroup structure. */
 int
 xfs_rtginode_load(
@@ -355,8 +370,10 @@ xfs_rtginode_load(
 	} else {
 		const char	*path;
 
-		if (!mp->m_rtdirip)
+		if (!mp->m_rtdirip) {
+			xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR);
 			return -EFSCORRUPTED;
+		}
 
 		path = xfs_rtginode_path(rtg_rgno(rtg), type);
 		if (!path)
@@ -366,17 +383,22 @@ xfs_rtginode_load(
 		kfree(path);
 	}
 
-	if (error)
+	if (error) {
+		if (xfs_metadata_is_sick(error))
+			xfs_rtginode_mark_sick(rtg, type);
 		return error;
+	}
 
 	if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS &&
 			       ip->i_df.if_format != XFS_DINODE_FMT_BTREE)) {
 		xfs_irele(ip);
+		xfs_rtginode_mark_sick(rtg, type);
 		return -EFSCORRUPTED;
 	}
 
 	if (XFS_IS_CORRUPT(mp, ip->i_projid != rtg_rgno(rtg))) {
 		xfs_irele(ip);
+		xfs_rtginode_mark_sick(rtg, type);
 		return -EFSCORRUPTED;
 	}
 
@@ -413,8 +435,10 @@ xfs_rtginode_create(
 	if (!xfs_rtginode_enabled(rtg, type))
 		return 0;
 
-	if (!mp->m_rtdirip)
+	if (!mp->m_rtdirip) {
+		xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR);
 		return -EFSCORRUPTED;
+	}
 
 	upd.path = xfs_rtginode_path(rtg_rgno(rtg), type);
 	if (!upd.path)
@@ -461,8 +485,10 @@ int
 xfs_rtginode_mkdir_parent(
 	struct xfs_mount	*mp)
 {
-	if (!mp->m_metadirip)
+	if (!mp->m_metadirip) {
+		xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR);
 		return -EFSCORRUPTED;
+	}
 
 	return xfs_metadir_mkdir(mp->m_metadirip, "rtgroups", &mp->m_rtdirip);
 }
@@ -474,8 +500,10 @@ xfs_rtginode_load_parent(
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 
-	if (!mp->m_metadirip)
+	if (!mp->m_metadirip) {
+		xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR);
 		return -EFSCORRUPTED;
+	}
 
 	return xfs_metadir_load(tp, mp->m_metadirip, "rtgroups",
 			XFS_METAFILE_DIR, &mp->m_rtdirip);
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index e7679fafff8ce7..fba62b26912a89 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -240,6 +240,7 @@ int xfs_rtginode_load_parent(struct xfs_trans *tp);
 const char *xfs_rtginode_name(enum xfs_rtg_inodes type);
 enum xfs_metafile_type xfs_rtginode_metafile_type(enum xfs_rtg_inodes type);
 bool xfs_rtginode_enabled(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type);
+void xfs_rtginode_mark_sick(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type);
 int xfs_rtginode_load(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type,
 		struct xfs_trans *tp);
 int xfs_rtginode_create(struct xfs_rtgroup *rtg, enum xfs_rtg_inodes type,



* [PATCH 19/52] xfs: export the geometry of realtime groups to userspace
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (17 preceding siblings ...)
  2024-12-23 22:02   ` [PATCH 18/52] xfs: record rt group metadata errors in the health system Darrick J. Wong
@ 2024-12-23 22:03   ` Darrick J. Wong
  2024-12-23 22:03   ` [PATCH 20/52] xfs: add block headers to realtime bitmap and summary blocks Darrick J. Wong
                     ` (32 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:03 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 3fa7a6d0c7eb264e469eaf1e3ef59b6793a853ee

Create an ioctl so that the kernel can report the status of realtime
groups to userspace.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/util.c        |    7 +++++++
 libxfs/xfs_fs.h      |   16 ++++++++++++++++
 libxfs/xfs_health.h  |    2 ++
 libxfs/xfs_rtgroup.c |   14 ++++++++++++++
 libxfs/xfs_rtgroup.h |    4 ++++
 5 files changed, 43 insertions(+)
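
As a rough illustration of what the new ioctl hands back, here is a standalone model of the geometry record and of the fill step performed by xfs_rtgroup_get_geometry(). The struct is a local copy with a toy_ prefix, not an include of xfs_fs.h, and the helper takes plain integers instead of an rtgroup object. Both are assumptions made so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the record layout; the toy_ name is hypothetical. */
struct toy_rtgroup_geometry {
	uint32_t	rg_number;	/* i/o: rtgroup number */
	uint32_t	rg_length;	/* o: length in blocks */
	uint32_t	rg_sick;	/* o: sick metadata flags */
	uint32_t	rg_checked;	/* o: checked metadata flags */
	uint32_t	rg_flags;	/* i/o: flags */
	uint32_t	rg_reserved[27];	/* o: zero */
};

/* 5 + 27 = 32 words of 4 bytes each. */
_Static_assert(sizeof(struct toy_rtgroup_geometry) == 128,
	       "geometry record should be 128 bytes");

/* Zero the record, then report the group number and its length in blocks. */
static void toy_rtgroup_get_geometry(struct toy_rtgroup_geometry *rgeo,
		uint32_t rgno, uint32_t rtg_extents, uint32_t rextsize)
{
	assert(rgeo != NULL);
	memset(rgeo, 0, sizeof(*rgeo));
	rgeo->rg_number = rgno;
	rgeo->rg_length = rtg_extents * rextsize;
}
```

The length is reported in filesystem blocks rather than rt extents, hence the multiplication by the rt extent size. In xfsprogs the health fields stay zero because xfs_rtgroup_geom_health() is an empty stub in libxfs/util.c.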


diff --git a/libxfs/util.c b/libxfs/util.c
index 83f2f048bbe9f2..ed48e2045b484b 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -492,6 +492,13 @@ xfs_fs_mark_healthy(
 }
 
 void xfs_ag_geom_health(struct xfs_perag *pag, struct xfs_ag_geometry *ageo) { }
+void
+xfs_rtgroup_geom_health(
+	struct xfs_rtgroup		*rtg,
+	struct xfs_rtgroup_geometry	*rgeo)
+{
+	/* empty */
+}
 void xfs_fs_mark_sick(struct xfs_mount *mp, unsigned int mask) { }
 void xfs_agno_mark_sick(struct xfs_mount *mp, xfs_agnumber_t agno,
 		unsigned int mask) { }
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 5c224d03270ce9..4c0682173d6144 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -971,6 +971,21 @@ struct xfs_getparents_by_handle {
 	struct xfs_getparents		gph_request;
 };
 
+/*
+ * Output for XFS_IOC_RTGROUP_GEOMETRY
+ */
+struct xfs_rtgroup_geometry {
+	__u32 rg_number;	/* i/o: rtgroup number */
+	__u32 rg_length;	/* o: length in blocks */
+	__u32 rg_sick;		/* o: sick things in rtgroup */
+	__u32 rg_checked;	/* o: checked metadata in rtgroup */
+	__u32 rg_flags;		/* i/o: flags for this rtgroup */
+	__u32 rg_reserved[27];	/* o: zero */
+};
+#define XFS_RTGROUP_GEOM_SICK_SUPER	(1U << 0)  /* superblock */
+#define XFS_RTGROUP_GEOM_SICK_BITMAP	(1U << 1)  /* rtbitmap */
+#define XFS_RTGROUP_GEOM_SICK_SUMMARY	(1U << 2)  /* rtsummary */
+
 /*
  * ioctl commands that are used by Linux filesystems
  */
@@ -1009,6 +1024,7 @@ struct xfs_getparents_by_handle {
 #define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_getparents)
 #define XFS_IOC_GETPARENTS_BY_HANDLE _IOWR('X', 63, struct xfs_getparents_by_handle)
 #define XFS_IOC_SCRUBV_METADATA	_IOWR('X', 64, struct xfs_scrub_vec_head)
+#define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 65, struct xfs_rtgroup_geometry)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h
index 3d93b57cf57143..d34986ac18c3fa 100644
--- a/libxfs/xfs_health.h
+++ b/libxfs/xfs_health.h
@@ -278,6 +278,8 @@ xfs_inode_is_healthy(struct xfs_inode *ip)
 
 void xfs_fsop_geom_health(struct xfs_mount *mp, struct xfs_fsop_geom *geo);
 void xfs_ag_geom_health(struct xfs_perag *pag, struct xfs_ag_geometry *ageo);
+void xfs_rtgroup_geom_health(struct xfs_rtgroup *rtg,
+		struct xfs_rtgroup_geometry *rgeo);
 void xfs_bulkstat_health(struct xfs_inode *ip, struct xfs_bulkstat *bs);
 
 #define xfs_metadata_is_sick(error) \
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index f753ed5fc05588..50cfb038f03b79 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -211,6 +211,20 @@ xfs_rtgroup_trans_join(
 	}
 }
 
+/* Retrieve rt group geometry. */
+int
+xfs_rtgroup_get_geometry(
+	struct xfs_rtgroup	*rtg,
+	struct xfs_rtgroup_geometry *rgeo)
+{
+	/* Fill out form. */
+	memset(rgeo, 0, sizeof(*rgeo));
+	rgeo->rg_number = rtg_rgno(rtg);
+	rgeo->rg_length = rtg->rtg_extents * rtg_mount(rtg)->m_sb.sb_rextsize;
+	xfs_rtgroup_geom_health(rtg, rgeo);
+	return 0;
+}
+
 #ifdef CONFIG_PROVE_LOCKING
 static struct lock_class_key xfs_rtginode_lock_class;
 
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index fba62b26912a89..026f34f984b32f 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -234,6 +234,9 @@ void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
 void xfs_rtgroup_trans_join(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
 		unsigned int rtglock_flags);
 
+int xfs_rtgroup_get_geometry(struct xfs_rtgroup *rtg,
+		struct xfs_rtgroup_geometry *rgeo);
+
 int xfs_rtginode_mkdir_parent(struct xfs_mount *mp);
 int xfs_rtginode_load_parent(struct xfs_trans *tp);
 
@@ -277,6 +280,7 @@ static inline int xfs_initialize_rtgroups(struct xfs_mount *mp,
 # define xfs_rtgroup_trans_join(tp, rtg, gf)	((void)0)
 # define xfs_update_rtsb(bp, sb_bp)	((void)0)
 # define xfs_log_rtsb(tp, sb_bp)	(NULL)
+# define xfs_rtgroup_get_geometry(rtg, rgeo)	(-EOPNOTSUPP)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __LIBXFS_RTGROUP_H */



* [PATCH 20/52] xfs: add block headers to realtime bitmap and summary blocks
From: Darrick J. Wong @ 2024-12-23 22:03 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 118895aa9513412b9077a8cae0bc63df8956f9b2

Upgrade rtbitmap and rtsummary blocks to have self-describing metadata
like most every other thing in XFS.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs_mount.h   |    3 +
 libxfs/xfs_format.h   |   18 ++++++
 libxfs/xfs_ondisk.h   |    1 
 libxfs/xfs_rtbitmap.c |  144 ++++++++++++++++++++++++++++++++++++++++++++-----
 libxfs/xfs_rtbitmap.h |   50 ++++++++++++++++-
 libxfs/xfs_sb.c       |   21 ++++++-
 libxfs/xfs_shared.h   |    2 +
 7 files changed, 218 insertions(+), 21 deletions(-)
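
Not part of the patch: a small sketch of the geometry arithmetic that
changes once each block carries a 48-byte header (the size checked in
the xfs_ondisk.h hunk).  The demo_* names are hypothetical; the math
follows xfs_sb_mount_common, xfs_rtx_to_rbmblock and xfs_rtx_to_rbmword
as modified by this patch.  Plain division is used for both formats,
which should give the same answer as the old shift-and-mask code
whenever the payload is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RTBUF_HDR_BYTES	48	/* sizeof(struct xfs_rtbuf_blkinfo) */
#define DEMO_WORDLOG		2	/* log2(bytes per 32-bit word) */
#define DEMO_NBWORDLOG		5	/* log2(bits per 32-bit word) */

struct demo_rbm_geom {
	uint32_t	blockwsize;		/* payload words per block */
	uint32_t	rtx_per_rbmblock;	/* rt extents per bitmap block */
};

/* Derive the per-block payload, as xfs_sb_mount_common does. */
static struct demo_rbm_geom
demo_rbm_geom_init(
	uint32_t		blocksize,
	bool			has_header)
{
	struct demo_rbm_geom	g;
	uint32_t		payload = blocksize;

	if (has_header)
		payload -= DEMO_RTBUF_HDR_BYTES;
	g.blockwsize = payload >> DEMO_WORDLOG;
	g.rtx_per_rbmblock = g.blockwsize << DEMO_NBWORDLOG;
	return g;
}

/* Which bitmap block tracks this rt extent? */
static uint64_t
demo_rtx_to_rbmblock(
	const struct demo_rbm_geom	*g,
	uint64_t			rtx)
{
	return rtx / g->rtx_per_rbmblock;
}

/* Which word within that block's payload holds the bit? */
static uint32_t
demo_rtx_to_rbmword(
	const struct demo_rbm_geom	*g,
	uint64_t			rtx)
{
	return (uint32_t)((rtx >> DEMO_NBWORDLOG) % g->blockwsize);
}
```

With 4096-byte blocks the payload shrinks from 1024 words to 1012, so a
block no longer tracks a power-of-two number of extents.  That is why
the rtgroups paths use division instead of shifts and masks.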


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index c03246a08a4af3..92c01c52e99e4a 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -101,7 +101,8 @@ typedef struct xfs_mount {
 	int8_t			m_rgblklog;	/* log2 of rt group sz if possible */
 	uint			m_blockmask;	/* sb_blocksize-1 */
 	uint			m_blockwsize;	/* sb_blocksize in words */
-	uint			m_blockwmask;	/* blockwsize-1 */
+	/* number of rt extents per rt bitmap block if rtgroups enabled */
+	unsigned int		m_rtx_per_rbmblock;
 	uint			m_alloc_mxr[2];	/* XFS_ALLOC_BLOCK_MAXRECS */
 	uint			m_alloc_mnr[2];	/* XFS_ALLOC_BLOCK_MINRECS */
 	uint			m_bmap_dmxr[2];	/* XFS_BMAP_BLOCK_DMAXRECS */
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 3cf044a2d5f2a9..016ee4ff537440 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1278,6 +1278,24 @@ static inline bool xfs_dinode_is_metadir(const struct xfs_dinode *dip)
 #define	XFS_DFL_RTEXTSIZE	(64 * 1024)	        /* 64kB */
 #define	XFS_MIN_RTEXTSIZE	(4 * 1024)		/* 4kB */
 
+/*
+ * RT bit manipulation macros.
+ */
+#define XFS_RTBITMAP_MAGIC	0x424D505A	/* BMPZ */
+#define XFS_RTSUMMARY_MAGIC	0x53554D59	/* SUMY */
+
+struct xfs_rtbuf_blkinfo {
+	__be32		rt_magic;	/* validity check on block */
+	__be32		rt_crc;		/* CRC of block */
+	__be64		rt_owner;	/* inode that owns the block */
+	__be64		rt_blkno;	/* first block of the buffer */
+	__be64		rt_lsn;		/* sequence number of last write */
+	uuid_t		rt_uuid;	/* filesystem we belong to */
+};
+
+#define XFS_RTBUF_CRC_OFF \
+	offsetof(struct xfs_rtbuf_blkinfo, rt_crc)
+
 /*
  * Dquot and dquot block format definitions
  */
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 38b314113d8f24..6a2bcbc392842c 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -76,6 +76,7 @@ xfs_check_ondisk_structs(void)
 	/* realtime structures */
 	XFS_CHECK_STRUCT_SIZE(union xfs_rtword_raw,		4);
 	XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_raw,		4);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo,		48);
 
 	/*
 	 * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to
diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index 5e63340668a03d..580e74b7d317db 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -24,23 +24,77 @@
  * Realtime allocator bitmap functions shared with userspace.
  */
 
-/*
- * Real time buffers need verifiers to avoid runtime warnings during IO.
- * We don't have anything to verify, however, so these are just dummy
- * operations.
- */
+static xfs_failaddr_t
+xfs_rtbuf_verify(
+	struct xfs_buf			*bp)
+{
+	struct xfs_mount		*mp = bp->b_mount;
+	struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+
+	if (!xfs_verify_magic(bp, hdr->rt_magic))
+		return __this_address;
+	if (!xfs_has_rtgroups(mp))
+		return __this_address;
+	if (!xfs_has_crc(mp))
+		return __this_address;
+	if (!uuid_equal(&hdr->rt_uuid, &mp->m_sb.sb_meta_uuid))
+		return __this_address;
+	if (hdr->rt_blkno != cpu_to_be64(xfs_buf_daddr(bp)))
+		return __this_address;
+	return NULL;
+}
+
 static void
 xfs_rtbuf_verify_read(
-	struct xfs_buf	*bp)
+	struct xfs_buf			*bp)
 {
+	struct xfs_mount		*mp = bp->b_mount;
+	struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+	xfs_failaddr_t			fa;
+
+	if (!xfs_has_rtgroups(mp))
+		return;
+
+	if (!xfs_log_check_lsn(mp, be64_to_cpu(hdr->rt_lsn))) {
+		fa = __this_address;
+		goto fail;
+	}
+
+	if (!xfs_buf_verify_cksum(bp, XFS_RTBUF_CRC_OFF)) {
+		fa = __this_address;
+		goto fail;
+	}
+
+	fa = xfs_rtbuf_verify(bp);
+	if (fa)
+		goto fail;
+
 	return;
+fail:
+	xfs_verifier_error(bp, -EFSCORRUPTED, fa);
 }
 
 static void
 xfs_rtbuf_verify_write(
 	struct xfs_buf	*bp)
 {
-	return;
+	struct xfs_mount		*mp = bp->b_mount;
+	struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+	struct xfs_buf_log_item		*bip = bp->b_log_item;
+	xfs_failaddr_t			fa;
+
+	if (!xfs_has_rtgroups(mp))
+		return;
+
+	fa = xfs_rtbuf_verify(bp);
+	if (fa) {
+		xfs_verifier_error(bp, -EFSCORRUPTED, fa);
+		return;
+	}
+
+	if (bip)
+		hdr->rt_lsn = cpu_to_be64(bip->bli_item.li_lsn);
+	xfs_buf_update_cksum(bp, XFS_RTBUF_CRC_OFF);
 }
 
 const struct xfs_buf_ops xfs_rtbuf_ops = {
@@ -49,6 +103,22 @@ const struct xfs_buf_ops xfs_rtbuf_ops = {
 	.verify_write = xfs_rtbuf_verify_write,
 };
 
+const struct xfs_buf_ops xfs_rtbitmap_buf_ops = {
+	.name		= "xfs_rtbitmap",
+	.magic		= { 0, cpu_to_be32(XFS_RTBITMAP_MAGIC) },
+	.verify_read	= xfs_rtbuf_verify_read,
+	.verify_write	= xfs_rtbuf_verify_write,
+	.verify_struct	= xfs_rtbuf_verify,
+};
+
+const struct xfs_buf_ops xfs_rtsummary_buf_ops = {
+	.name		= "xfs_rtsummary",
+	.magic		= { 0, cpu_to_be32(XFS_RTSUMMARY_MAGIC) },
+	.verify_read	= xfs_rtbuf_verify_read,
+	.verify_write	= xfs_rtbuf_verify_write,
+	.verify_struct	= xfs_rtbuf_verify,
+};
+
 /* Release cached rt bitmap and summary buffers. */
 void
 xfs_rtbuf_cache_relse(
@@ -128,12 +198,24 @@ xfs_rtbuf_get(
 	ASSERT(map.br_startblock != NULLFSBLOCK);
 	error = xfs_trans_read_buf(mp, args->tp, mp->m_ddev_targp,
 				   XFS_FSB_TO_DADDR(mp, map.br_startblock),
-				   mp->m_bsize, 0, &bp, &xfs_rtbuf_ops);
+				   mp->m_bsize, 0, &bp,
+				   xfs_rtblock_ops(mp, type));
 	if (xfs_metadata_is_sick(error))
 		xfs_rtginode_mark_sick(args->rtg, type);
 	if (error)
 		return error;
 
+	if (xfs_has_rtgroups(mp)) {
+		struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+
+		if (hdr->rt_owner != cpu_to_be64(ip->i_ino)) {
+			xfs_buf_mark_corrupt(bp);
+			xfs_trans_brelse(args->tp, bp);
+			xfs_rtginode_mark_sick(args->rtg, type);
+			return -EFSCORRUPTED;
+		}
+	}
+
 	xfs_trans_buf_set_type(args->tp, bp, buf_type);
 	*cbpp = bp;
 	*coffp = block;
@@ -1144,6 +1226,19 @@ xfs_rtalloc_extent_is_free(
 	return 0;
 }
 
+/* Compute the number of rt extents tracked by a single bitmap block. */
+xfs_rtxnum_t
+xfs_rtbitmap_rtx_per_rbmblock(
+	struct xfs_mount	*mp)
+{
+	unsigned int		rbmblock_bytes = mp->m_sb.sb_blocksize;
+
+	if (xfs_has_rtgroups(mp))
+		rbmblock_bytes -= sizeof(struct xfs_rtbuf_blkinfo);
+
+	return rbmblock_bytes * NBBY;
+}
+
 /*
  * Compute the number of rtbitmap blocks needed to track the given number of rt
  * extents.
@@ -1153,7 +1248,7 @@ xfs_rtbitmap_blockcount_len(
 	struct xfs_mount	*mp,
 	xfs_rtbxlen_t		rtextents)
 {
-	return howmany_64(rtextents, NBBY * mp->m_sb.sb_blocksize);
+	return howmany_64(rtextents, xfs_rtbitmap_rtx_per_rbmblock(mp));
 }
 
 /* How many rt extents does each rtbitmap file track? */
@@ -1190,11 +1285,12 @@ xfs_rtsummary_blockcount(
 	struct xfs_mount	*mp,
 	unsigned int		*rsumlevels)
 {
+	xfs_rtbxlen_t		rextents = xfs_rtbitmap_bitcount(mp);
 	unsigned long long	rsumwords;
 
-	*rsumlevels = xfs_compute_rextslog(xfs_rtbitmap_bitcount(mp)) + 1;
-	rsumwords = xfs_rtbitmap_blockcount(mp) * (*rsumlevels);
-	return XFS_B_TO_FSB(mp, rsumwords << XFS_WORDLOG);
+	*rsumlevels = xfs_compute_rextslog(rextents) + 1;
+	rsumwords = xfs_rtbitmap_blockcount_len(mp, rextents) * (*rsumlevels);
+	return howmany_64(rsumwords, mp->m_blockwsize);
 }
 
 static int
@@ -1246,6 +1342,7 @@ xfs_rtfile_initialize_block(
 	struct xfs_inode	*ip = rtg->rtg_inodes[type];
 	struct xfs_trans	*tp;
 	struct xfs_buf		*bp;
+	void			*bufdata;
 	const size_t		copylen = mp->m_blockwsize << XFS_WORDLOG;
 	enum xfs_blft		buf_type;
 	int			error;
@@ -1269,13 +1366,30 @@ xfs_rtfile_initialize_block(
 		xfs_trans_cancel(tp);
 		return error;
 	}
+	bufdata = bp->b_addr;
 
 	xfs_trans_buf_set_type(tp, bp, buf_type);
-	bp->b_ops = &xfs_rtbuf_ops;
+	bp->b_ops = xfs_rtblock_ops(mp, type);
+
+	if (xfs_has_rtgroups(mp)) {
+		struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+
+		if (type == XFS_RTGI_BITMAP)
+			hdr->rt_magic = cpu_to_be32(XFS_RTBITMAP_MAGIC);
+		else
+			hdr->rt_magic = cpu_to_be32(XFS_RTSUMMARY_MAGIC);
+		hdr->rt_owner = cpu_to_be64(ip->i_ino);
+		hdr->rt_blkno = cpu_to_be64(XFS_FSB_TO_DADDR(mp, fsbno));
+		hdr->rt_lsn = 0;
+		uuid_copy(&hdr->rt_uuid, &mp->m_sb.sb_meta_uuid);
+
+		bufdata += sizeof(*hdr);
+	}
+
 	if (data)
-		memcpy(bp->b_addr, data, copylen);
+		memcpy(bufdata, data, copylen);
 	else
-		memset(bp->b_addr, 0, copylen);
+		memset(bufdata, 0, copylen);
 	xfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1);
 	return xfs_trans_commit(tp);
 }
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index b2b9e59a87a278..2286a98ecb32bb 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -150,6 +150,9 @@ xfs_rtx_to_rbmblock(
 	struct xfs_mount	*mp,
 	xfs_rtxnum_t		rtx)
 {
+	if (xfs_has_rtgroups(mp))
+		return div_u64(rtx, mp->m_rtx_per_rbmblock);
+
 	return rtx >> mp->m_blkbit_log;
 }
 
@@ -159,6 +162,13 @@ xfs_rtx_to_rbmword(
 	struct xfs_mount	*mp,
 	xfs_rtxnum_t		rtx)
 {
+	if (xfs_has_rtgroups(mp)) {
+		unsigned int	mod;
+
+		div_u64_rem(rtx >> XFS_NBWORDLOG, mp->m_blockwsize, &mod);
+		return mod;
+	}
+
 	return (rtx >> XFS_NBWORDLOG) & (mp->m_blockwsize - 1);
 }
 
@@ -168,6 +178,9 @@ xfs_rbmblock_to_rtx(
 	struct xfs_mount	*mp,
 	xfs_fileoff_t		rbmoff)
 {
+	if (xfs_has_rtgroups(mp))
+		return rbmoff * mp->m_rtx_per_rbmblock;
+
 	return rbmoff << mp->m_blkbit_log;
 }
 
@@ -177,7 +190,14 @@ xfs_rbmblock_wordptr(
 	struct xfs_rtalloc_args	*args,
 	unsigned int		index)
 {
-	union xfs_rtword_raw	*words = args->rbmbp->b_addr;
+	struct xfs_mount	*mp = args->mp;
+	union xfs_rtword_raw	*words;
+	struct xfs_rtbuf_blkinfo *hdr = args->rbmbp->b_addr;
+
+	if (xfs_has_rtgroups(mp))
+		words = (union xfs_rtword_raw *)(hdr + 1);
+	else
+		words = args->rbmbp->b_addr;
 
 	return words + index;
 }
@@ -227,6 +247,9 @@ xfs_rtsumoffs_to_block(
 	struct xfs_mount	*mp,
 	xfs_rtsumoff_t		rsumoff)
 {
+	if (xfs_has_rtgroups(mp))
+		return rsumoff / mp->m_blockwsize;
+
 	return XFS_B_TO_FSBT(mp, rsumoff * sizeof(xfs_suminfo_t));
 }
 
@@ -241,6 +264,9 @@ xfs_rtsumoffs_to_infoword(
 {
 	unsigned int		mask = mp->m_blockmask >> XFS_SUMINFOLOG;
 
+	if (xfs_has_rtgroups(mp))
+		return rsumoff % mp->m_blockwsize;
+
 	return rsumoff & mask;
 }
 
@@ -250,7 +276,13 @@ xfs_rsumblock_infoptr(
 	struct xfs_rtalloc_args	*args,
 	unsigned int		index)
 {
-	union xfs_suminfo_raw	*info = args->sumbp->b_addr;
+	union xfs_suminfo_raw	*info;
+	struct xfs_rtbuf_blkinfo *hdr = args->sumbp->b_addr;
+
+	if (xfs_has_rtgroups(args->mp))
+		info = (union xfs_suminfo_raw *)(hdr + 1);
+	else
+		info = args->sumbp->b_addr;
 
 	return info + index;
 }
@@ -279,6 +311,19 @@ xfs_suminfo_add(
 	return info->old;
 }
 
+static inline const struct xfs_buf_ops *
+xfs_rtblock_ops(
+	struct xfs_mount	*mp,
+	enum xfs_rtg_inodes	type)
+{
+	if (xfs_has_rtgroups(mp)) {
+		if (type == XFS_RTGI_SUMMARY)
+			return &xfs_rtsummary_buf_ops;
+		return &xfs_rtbitmap_buf_ops;
+	}
+	return &xfs_rtbuf_ops;
+}
+
 /*
  * Functions for walking free space rtextents in the realtime bitmap.
  */
@@ -324,6 +369,7 @@ int xfs_rtfree_extent(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
 int xfs_rtfree_blocks(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
 		xfs_fsblock_t rtbno, xfs_filblks_t rtlen);
 
+xfs_rtxnum_t xfs_rtbitmap_rtx_per_rbmblock(struct xfs_mount *mp);
 xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp);
 xfs_filblks_t xfs_rtbitmap_blockcount_len(struct xfs_mount *mp,
 		xfs_rtbxlen_t rtextents);
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 88fb4890d95a72..87be47083aa571 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -243,11 +243,26 @@ xfs_extents_per_rbm(
 	return sbp->sb_rextents;
 }
 
+/*
+ * Return the payload size of a single rt bitmap block (without the metadata
+ * header if any).
+ */
+static inline unsigned int
+xfs_rtbmblock_size(
+	struct xfs_sb		*sbp)
+{
+	if (xfs_sb_is_v5(sbp) &&
+	    (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR))
+		return sbp->sb_blocksize - sizeof(struct xfs_rtbuf_blkinfo);
+	return sbp->sb_blocksize;
+}
+
 static uint64_t
 xfs_expected_rbmblocks(
 	struct xfs_sb		*sbp)
 {
-	return howmany_64(xfs_extents_per_rbm(sbp), NBBY * sbp->sb_blocksize);
+	return howmany_64(xfs_extents_per_rbm(sbp),
+			  NBBY * xfs_rtbmblock_size(sbp));
 }
 
 /* Validate the realtime geometry */
@@ -1109,8 +1124,8 @@ xfs_sb_mount_common(
 	mp->m_sectbb_log = sbp->sb_sectlog - BBSHIFT;
 	mp->m_agno_log = xfs_highbit32(sbp->sb_agcount - 1) + 1;
 	mp->m_blockmask = sbp->sb_blocksize - 1;
-	mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG;
-	mp->m_blockwmask = mp->m_blockwsize - 1;
+	mp->m_blockwsize = xfs_rtbmblock_size(sbp) >> XFS_WORDLOG;
+	mp->m_rtx_per_rbmblock = mp->m_blockwsize << XFS_NBWORDLOG;
 
 	ags->blocks = mp->m_sb.sb_agblocks;
 	ags->blklog = mp->m_sb.sb_agblklog;
diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index 552365d212ea26..9363f918675ac0 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -38,6 +38,8 @@ extern const struct xfs_buf_ops xfs_inode_buf_ops;
 extern const struct xfs_buf_ops xfs_inode_buf_ra_ops;
 extern const struct xfs_buf_ops xfs_refcountbt_buf_ops;
 extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
+extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops;
+extern const struct xfs_buf_ops xfs_rtsummary_buf_ops;
 extern const struct xfs_buf_ops xfs_rtbuf_ops;
 extern const struct xfs_buf_ops xfs_rtsb_buf_ops;
 extern const struct xfs_buf_ops xfs_sb_buf_ops;



* [PATCH 21/52] xfs: encode the rtbitmap in big endian format
From: Darrick J. Wong @ 2024-12-23 22:03 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: eba42c2c53c8b8905307b702c93dffef0719a896

Currently, the ondisk realtime bitmap file is accessed in units of
32-bit words.  There's no endian translation of the contents of this
file, which means that Bad Things Happen(tm) if you go from (say)
x86 to powerpc.  Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h   |    4 +++-
 libxfs/xfs_rtbitmap.h |    7 ++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)
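
Not part of the patch: a sketch of what a fixed big-endian encoding
means for a bitmap word, written with explicit byte shifts so that it
behaves the same on any host.  The demo_* helpers are hypothetical
stand-ins for the be32_to_cpu/cpu_to_be32 calls in the rtgroups branch
of xfs_rtbitmap_getword and xfs_rtbitmap_setword; payload is assumed to
point just past any block header.

```c
#include <stddef.h>
#include <stdint.h>

/* Store one 32-bit bitmap word, most significant byte first. */
static void
demo_rtword_set(
	uint8_t		*payload,
	size_t		index,
	uint32_t	value)
{
	uint8_t		*p = payload + index * 4;

	p[0] = (uint8_t)(value >> 24);
	p[1] = (uint8_t)(value >> 16);
	p[2] = (uint8_t)(value >> 8);
	p[3] = (uint8_t)value;
}

/* Load one 32-bit bitmap word stored most significant byte first. */
static uint32_t
demo_rtword_get(
	const uint8_t	*payload,
	size_t		index)
{
	const uint8_t	*p = payload + index * 4;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Because the byte order is fixed by the format rather than by the CPU, a
bitmap written on x86 decodes to the same words on powerpc.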


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 016ee4ff537440..cd9457ed5873fe 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -709,10 +709,12 @@ struct xfs_agfl {
 
 /*
  * Realtime bitmap information is accessed by the word, which is currently
- * stored in host-endian format.
+ * stored in host-endian format.  Starting with the realtime groups feature,
+ * the words are stored in be32 ondisk.
  */
 union xfs_rtword_raw {
 	__u32		old;
+	__be32		rtg;
 };
 
 /*
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index 2286a98ecb32bb..f9c0d241590104 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -210,6 +210,8 @@ xfs_rtbitmap_getword(
 {
 	union xfs_rtword_raw	*word = xfs_rbmblock_wordptr(args, index);
 
+	if (xfs_has_rtgroups(args->mp))
+		return be32_to_cpu(word->rtg);
 	return word->old;
 }
 
@@ -222,7 +224,10 @@ xfs_rtbitmap_setword(
 {
 	union xfs_rtword_raw	*word = xfs_rbmblock_wordptr(args, index);
 
-	word->old = value;
+	if (xfs_has_rtgroups(args->mp))
+		word->rtg = cpu_to_be32(value);
+	else
+		word->old = value;
 }
 
 /*



* [PATCH 22/52] xfs: encode the rtsummary in big endian format
From: Darrick J. Wong @ 2024-12-23 22:03 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: a2c28367396a85f2d9cfb22acfcedcff08dd1c3c

Currently, the ondisk realtime summary file counters are accessed in
units of 32-bit words.  There's no endian translation of the contents of
this file, which means that Bad Things Happen(tm) if you go from
(say) x86 to powerpc.  Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.  Encode the summary
information in big endian format, like most of the rest of the
filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h   |    4 +++-
 libxfs/xfs_rtbitmap.h |    7 +++++++
 2 files changed, 10 insertions(+), 1 deletion(-)
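
Not part of the patch: a sketch of the read-modify-write that the
rtgroups branch of xfs_suminfo_add performs on a big-endian counter.
demo_suminfo_add is a hypothetical, host-independent stand-in for
be32_add_cpu followed by be32_to_cpu; it assumes the counter wraps
modulo 2^32 like any unsigned 32-bit value.

```c
#include <stdint.h>

/*
 * Add a signed delta to a counter stored most significant byte first
 * and return the new value in host order.
 */
static uint32_t
demo_suminfo_add(
	uint8_t		*p,
	int32_t		delta)
{
	uint32_t	v;

	/* Decode from big endian. */
	v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];

	/* Unsigned arithmetic, so a negative delta subtracts. */
	v += (uint32_t)delta;

	/* Encode back to big endian. */
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return v;
}
```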


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index cd9457ed5873fe..f56ff9f43c218f 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -719,10 +719,12 @@ union xfs_rtword_raw {
 
 /*
  * Realtime summary counts are accessed by the word, which is currently
- * stored in host-endian format.
+ * stored in host-endian format.  Starting with the realtime groups feature,
+ * the words are stored in be32 ondisk.
  */
 union xfs_suminfo_raw {
 	__u32		old;
+	__be32		rtg;
 };
 
 /*
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index f9c0d241590104..7be76490a31879 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -300,6 +300,8 @@ xfs_suminfo_get(
 {
 	union xfs_suminfo_raw	*info = xfs_rsumblock_infoptr(args, index);
 
+	if (xfs_has_rtgroups(args->mp))
+		return be32_to_cpu(info->rtg);
 	return info->old;
 }
 
@@ -312,6 +314,11 @@ xfs_suminfo_add(
 {
 	union xfs_suminfo_raw	*info = xfs_rsumblock_infoptr(args, index);
 
+	if (xfs_has_rtgroups(args->mp)) {
+		be32_add_cpu(&info->rtg, delta);
+		return be32_to_cpu(info->rtg);
+	}
+
 	info->old += delta;
 	return info->old;
 }



* [PATCH 23/52] xfs: grow the realtime section when realtime groups are enabled
From: Darrick J. Wong @ 2024-12-23 22:04 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: ee321351487ae00db147d570c8c2a43e10207386

Enable growing the rt section when realtime groups are enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_shared.h |    1 +
 1 file changed, 1 insertion(+)


diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index 9363f918675ac0..e7efdb9ceaf382 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -160,6 +160,7 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
 #define	XFS_TRANS_SB_RBLOCKS		0x00000800
 #define	XFS_TRANS_SB_REXTENTS		0x00001000
 #define	XFS_TRANS_SB_REXTSLOG		0x00002000
+#define XFS_TRANS_SB_RGCOUNT		0x00004000
 
 /*
  * Here we centralize the specification of XFS meta-data buffer reference count



* [PATCH 24/52] xfs: support logging EFIs for realtime extents
From: Darrick J. Wong @ 2024-12-23 22:04 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 4c8900bbf106592ce647285e308abd2a7f080d88

Teach the EFI mechanism how to free realtime extents.  We're going to
need this to enforce proper ordering of operations when we enable
realtime rmap.

Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer
ops for rt extents.  This keeps the ondisk artifacts and processing code
completely separate between the rt and non-rt cases.  Hopefully this
will make it easier to debug filesystem problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_alloc.c      |   15 +++++++++++++--
 libxfs/xfs_alloc.h      |   12 +++++++++++-
 libxfs/xfs_defer.c      |    6 ++++++
 libxfs/xfs_defer.h      |    1 +
 libxfs/xfs_log_format.h |    6 +++++-
 5 files changed, 36 insertions(+), 4 deletions(-)
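
Not part of the patch: a sketch of how the caller-visible free flags
map onto the incore EFI flags in xfs_defer_extent_free after this
change.  All demo_* names are hypothetical.  The REALTIME bit values
are taken from the xfs_alloc.h hunk, the value of the incore
SKIP_DISCARD bit is an assumption, and EUCLEAN stands in for
EFSCORRUPTED.  The kernel only asserts on unknown flags; returning
-EINVAL here is a simplification.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_FREE_EXTENT_SKIP_DISCARD	(1U << 0)
#define DEMO_FREE_EXTENT_REALTIME	(1U << 1)
#define DEMO_FREE_EXTENT_ALL_FLAGS	(DEMO_FREE_EXTENT_SKIP_DISCARD | \
					 DEMO_FREE_EXTENT_REALTIME)

#define DEMO_EFI_SKIP_DISCARD		(1U << 0)	/* assumed value */
#define DEMO_EFI_REALTIME		(1U << 4)

#define DEMO_EFSCORRUPTED		EUCLEAN

/*
 * Translate free_flags into incore EFI flags.  Realtime frees must not
 * carry an AG reservation type, so reject that combination.
 */
static int
demo_efi_flags(
	unsigned int	free_flags,
	bool		resv_is_none,
	unsigned int	*efi_flags)
{
	if (free_flags & ~DEMO_FREE_EXTENT_ALL_FLAGS)
		return -EINVAL;
	if ((free_flags & DEMO_FREE_EXTENT_REALTIME) && !resv_is_none)
		return -DEMO_EFSCORRUPTED;

	*efi_flags = 0;
	if (free_flags & DEMO_FREE_EXTENT_SKIP_DISCARD)
		*efi_flags |= DEMO_EFI_SKIP_DISCARD;
	if (free_flags & DEMO_FREE_EXTENT_REALTIME)
		*efi_flags |= DEMO_EFI_REALTIME;
	return 0;
}
```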


diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index c449285743534d..9aebe7227a6148 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2644,8 +2644,17 @@ xfs_defer_extent_free(
 	ASSERT(!isnullstartblock(bno));
 	ASSERT(!(free_flags & ~XFS_FREE_EXTENT_ALL_FLAGS));
 
-	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len)))
-		return -EFSCORRUPTED;
+	if (free_flags & XFS_FREE_EXTENT_REALTIME) {
+		if (type != XFS_AG_RESV_NONE) {
+			ASSERT(type == XFS_AG_RESV_NONE);
+			return -EFSCORRUPTED;
+		}
+		if (XFS_IS_CORRUPT(mp, !xfs_verify_rtbext(mp, bno, len)))
+			return -EFSCORRUPTED;
+	} else {
+		if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len)))
+			return -EFSCORRUPTED;
+	}
 
 	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
 			       GFP_KERNEL | __GFP_NOFAIL);
@@ -2654,6 +2663,8 @@ xfs_defer_extent_free(
 	xefi->xefi_agresv = type;
 	if (free_flags & XFS_FREE_EXTENT_SKIP_DISCARD)
 		xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD;
+	if (free_flags & XFS_FREE_EXTENT_REALTIME)
+		xefi->xefi_flags |= XFS_EFI_REALTIME;
 	if (oinfo) {
 		ASSERT(oinfo->oi_offset == 0);
 
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index efbde04fbbb15f..50ef79a1ed41a1 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -237,7 +237,11 @@ int xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
 /* Don't issue a discard for the blocks freed. */
 #define XFS_FREE_EXTENT_SKIP_DISCARD	(1U << 0)
 
-#define XFS_FREE_EXTENT_ALL_FLAGS	(XFS_FREE_EXTENT_SKIP_DISCARD)
+/* Free blocks on the realtime device. */
+#define XFS_FREE_EXTENT_REALTIME	(1U << 1)
+
+#define XFS_FREE_EXTENT_ALL_FLAGS	(XFS_FREE_EXTENT_SKIP_DISCARD | \
+					 XFS_FREE_EXTENT_REALTIME)
 
 /*
  * List of extents to be free "later".
@@ -257,6 +261,12 @@ struct xfs_extent_free_item {
 #define XFS_EFI_ATTR_FORK	(1U << 1) /* freeing attr fork block */
 #define XFS_EFI_BMBT_BLOCK	(1U << 2) /* freeing bmap btree block */
 #define XFS_EFI_CANCELLED	(1U << 3) /* dont actually free the space */
+#define XFS_EFI_REALTIME	(1U << 4) /* freeing realtime extent */
+
+static inline bool xfs_efi_is_realtime(const struct xfs_extent_free_item *xefi)
+{
+	return xefi->xefi_flags & XFS_EFI_REALTIME;
+}
 
 struct xfs_alloc_autoreap {
 	struct xfs_defer_pending	*dfp;
diff --git a/libxfs/xfs_defer.c b/libxfs/xfs_defer.c
index e3a608f64536ec..8f6708c0f3bfcd 100644
--- a/libxfs/xfs_defer.c
+++ b/libxfs/xfs_defer.c
@@ -839,6 +839,12 @@ xfs_defer_add(
 
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 
+	if (!ops->finish_item) {
+		ASSERT(ops->finish_item != NULL);
+		xfs_force_shutdown(tp->t_mountp, SHUTDOWN_CORRUPT_INCORE);
+		return NULL;
+	}
+
 	dfp = xfs_defer_find_last(tp, ops);
 	if (!dfp || !xfs_defer_can_append(dfp, ops))
 		dfp = xfs_defer_alloc(&tp->t_dfops, ops);
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index 8b338031e487c4..ec51b8465e61cb 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -71,6 +71,7 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
 extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
 extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
+extern const struct xfs_defer_op_type xfs_rtextent_free_defer_type;
 extern const struct xfs_defer_op_type xfs_attr_defer_type;
 extern const struct xfs_defer_op_type xfs_exchmaps_defer_type;
 
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index ace7384a275bfb..15dec19b6c32ad 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -248,6 +248,8 @@ typedef struct xfs_trans_header {
 #define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
 #define	XFS_LI_XMI		0x1248  /* mapping exchange intent */
 #define	XFS_LI_XMD		0x1249  /* mapping exchange done */
+#define	XFS_LI_EFI_RT		0x124a	/* realtime extent free intent */
+#define	XFS_LI_EFD_RT		0x124b	/* realtime extent free done */
 
 #define XFS_LI_TYPE_DESC \
 	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
@@ -267,7 +269,9 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
 	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }, \
 	{ XFS_LI_XMI,		"XFS_LI_XMI" }, \
-	{ XFS_LI_XMD,		"XFS_LI_XMD" }
+	{ XFS_LI_XMD,		"XFS_LI_XMD" }, \
+	{ XFS_LI_EFI_RT,	"XFS_LI_EFI_RT" }, \
+	{ XFS_LI_EFD_RT,	"XFS_LI_EFD_RT" }
 
 /*
  * Inode Log Item Format definitions.



* [PATCH 25/52] xfs: support error injection when freeing rt extents
From: Darrick J. Wong @ 2024-12-23 22:04 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: fc91d9430e5dd2008ef6c1350fa15c1a0ed17f11

A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent.  Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtbitmap.c |    4 ++++
 1 file changed, 4 insertions(+)


diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index 580e74b7d317db..b6874885107f09 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -19,6 +19,7 @@
 #include "xfs_rtbitmap.h"
 #include "xfs_health.h"
 #include "xfs_sb.h"
+#include "xfs_errortag.h"
 
 /*
  * Realtime allocator bitmap functions shared with userspace.
@@ -1061,6 +1062,9 @@ xfs_rtfree_extent(
 	ASSERT(rbmip->i_itemp != NULL);
 	xfs_assert_ilocked(rbmip, XFS_ILOCK_EXCL);
 
+	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FREE_EXTENT))
+		return -EIO;
+
 	error = xfs_rtcheck_alloc_range(&args, start, len);
 	if (error)
 		return error;



* [PATCH 26/52] xfs: use realtime EFI to free extents when rtgroups are enabled
From: Darrick J. Wong @ 2024-12-23 22:05 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 44e69c9af159e61d4f765ff4805dd5b55f241597

When rmap is enabled, XFS expects a certain order of operations, which
is: 1) remove the file mapping, 2) remove the reverse mapping, and then
3) free the blocks.  When reflink is enabled, XFS replaces (3) with a
deferred refcount decrement operation that can schedule freeing the
blocks if that was the last refcount.

For realtime files, xfs_bmap_del_extent_real tries to do 1 and 3 in the
same transaction, which will break both rmap and reflink unless we
switch it to use realtime EFIs.  Both rmap and reflink depend on the
rtgroups feature, so let's turn on EFIs for all rtgroups filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
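The choice of freeing path described above can be sketched as a small
decision function.  This is a simplified model of the non-remap branch
of xfs_bmap_del_extent_real only; the enum and function names are
hypothetical, and the real code also computes discard flags that are
left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the ways an unmapped extent gets released. */
enum toy_free_path {
	TOY_FREE_REFCOUNT_DECREASE,	/* deferred refcount decrement */
	TOY_FREE_RT_IMMEDIATE,		/* legacy rt: free in this transaction */
	TOY_FREE_EFI_DATA,		/* deferred free via EFI, data device */
	TOY_FREE_EFI_REALTIME,		/* deferred free via EFI, realtime flag */
};

/*
 * reflink_data_fork: reflink inode and the data fork is being unmapped.
 * isrt: the fork maps realtime blocks.
 * has_rtgroups: the filesystem has the rtgroups feature.
 */
static enum toy_free_path
toy_pick_free_path(bool reflink_data_fork, bool isrt, bool has_rtgroups)
{
	if (reflink_data_fork)
		return TOY_FREE_REFCOUNT_DECREASE;
	if (isrt && !has_rtgroups)
		return TOY_FREE_RT_IMMEDIATE;
	return isrt ? TOY_FREE_EFI_REALTIME : TOY_FREE_EFI_DATA;
}
```

Only the pre-rtgroups realtime case keeps the old behavior of freeing
in the same transaction as the unmap.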
 libxfs/xfs_bmap.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index d7769f0e70005d..bdede0e683ae91 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -5377,9 +5377,11 @@ xfs_bmap_del_extent_real(
 	 * If we need to, add to list of extents to delete.
 	 */
 	if (!(bflags & XFS_BMAPI_REMAP)) {
+		bool	isrt = xfs_ifork_is_realtime(ip, whichfork);
+
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 			xfs_refcount_decrease_extent(tp, del);
-		} else if (xfs_ifork_is_realtime(ip, whichfork)) {
+		} else if (isrt && !xfs_has_rtgroups(mp)) {
 			error = xfs_bmap_free_rtblocks(tp, del);
 		} else {
 			unsigned int	efi_flags = 0;
@@ -5388,6 +5390,19 @@ xfs_bmap_del_extent_real(
 			    del->br_state == XFS_EXT_UNWRITTEN)
 				efi_flags |= XFS_FREE_EXTENT_SKIP_DISCARD;
 
+			/*
+			 * Historically, we did not use EFIs to free realtime
+			 * extents.  However, when reverse mapping is enabled,
+			 * we must maintain the same order of operations as the
+			 * data device, which is: Remove the file mapping,
+			 * remove the reverse mapping, and then free the
+			 * blocks.  Reflink for realtime volumes requires the
+			 * same sort of ordering.  Both features rely on
+			 * rtgroups, so let's gate rt EFI usage on rtgroups.
+			 */
+			if (isrt)
+				efi_flags |= XFS_FREE_EXTENT_REALTIME;
+
 			error = xfs_free_extent_later(tp, del->br_startblock,
 					del->br_blockcount, NULL,
 					XFS_AG_RESV_NONE, efi_flags);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 27/52] xfs: don't merge ioends across RTGs
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (25 preceding siblings ...)
  2024-12-23 22:05   ` [PATCH 26/52] xfs: use realtime EFI to free extents when rtgroups are enabled Darrick J. Wong
@ 2024-12-23 22:05   ` Darrick J. Wong
  2024-12-23 22:05   ` [PATCH 28/52] xfs: make the RT allocator rtgroup aware Darrick J. Wong
                     ` (24 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:05 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: b91afef724710e3dc7d65a28105ffd7a4e861d69

Unlike AGs, RTGs don't always have metadata in their first blocks, and
thus we don't get automatic protection from merging I/O completions
across RTG boundaries.  Add code to set the IOMAP_F_BOUNDARY flag for
ioends that start at the first block of an RTG so that they never get
merged into the previous ioend.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
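A minimal sketch of the boundary test, assuming a power-of-two group
size so that a simple mask isolates the in-group block number.  The
toy_* helpers are hypothetical, and toy_can_merge is a deliberately
simplified stand-in for the ioend merge decision, which really lives in
the kernel I/O path rather than in libxfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the in-group block bits; needs a power-of-two size. */
static uint64_t
toy_group_blkmask(uint64_t group_blocks)
{
	assert(group_blocks != 0 && (group_blocks & (group_blocks - 1)) == 0);
	return group_blocks - 1;
}

/* A block starts a group when all of its in-group bits are zero. */
static bool
toy_is_group_start(uint64_t rtbno, uint64_t blkmask)
{
	return (rtbno & blkmask) == 0;
}

/*
 * Two completions may merge only if they are contiguous and the second
 * one does not begin a new group.
 */
static bool
toy_can_merge(uint64_t prev_end, uint64_t next_start, uint64_t blkmask)
{
	return prev_end == next_start &&
	       !toy_is_group_start(next_start, blkmask);
}
```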
 libxfs/xfs_rtgroup.h |    9 +++++++++
 1 file changed, 9 insertions(+)


diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 026f34f984b32f..2ddfac9a0182f9 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -188,6 +188,15 @@ xfs_rtb_to_rgbno(
 	return __xfs_rtb_to_rgbno(mp, rtbno);
 }
 
+/* Is rtbno the start of a RT group? */
+static inline bool
+xfs_rtbno_is_group_start(
+	struct xfs_mount	*mp,
+	xfs_rtblock_t		rtbno)
+{
+	return (rtbno & mp->m_rgblkmask) == 0;
+}
+
 static inline xfs_daddr_t
 xfs_rtb_to_daddr(
 	struct xfs_mount	*mp,


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 28/52] xfs: make the RT allocator rtgroup aware
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (26 preceding siblings ...)
  2024-12-23 22:05   ` [PATCH 27/52] xfs: don't merge ioends across RTGs Darrick J. Wong
@ 2024-12-23 22:05   ` Darrick J. Wong
  2024-12-23 22:05   ` [PATCH 29/52] xfs: scrub the realtime group superblock Darrick J. Wong
                     ` (23 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:05 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: d162491c5459f4dd72e65b72a2c864591668ec07

Make the allocator rtgroup aware by either picking a specific group if
there is a hint, or looping over all groups otherwise.  A simple rotor is
provided to pick the placement for initial allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
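The adjacency check that this patch adds to xfs_bmap_adjacent_valid can
be modeled as follows.  This sketch assumes linear rt block numbers at
this point in the series, so the group number is found by division; the
struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the rt geometry needed for the check. */
struct toy_rt_geom {
	uint64_t	rblocks;	/* total rt blocks */
	uint32_t	rgcount;	/* number of rt groups */
	uint32_t	rgextents;	/* rt extents per group */
	uint32_t	rextsize;	/* blocks per rt extent */
	bool		has_rtgroups;
};

/*
 * Is candidate block x a usable neighbor of block y?  With rtgroups, x
 * must also sit in the same group as y and inside that group's bounds.
 */
static bool
toy_adjacent_valid(const struct toy_rt_geom *g, uint64_t x, uint64_t y)
{
	uint64_t	rgblocks;

	if (x >= g->rblocks)
		return false;
	if (!g->has_rtgroups)
		return true;

	rgblocks = (uint64_t)g->rgextents * g->rextsize;
	assert(rgblocks != 0);

	return x / rgblocks == y / rgblocks &&
	       x / rgblocks < g->rgcount &&
	       (x % rgblocks) / g->rextsize < g->rgextents;
}
```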
 libxfs/xfs_bmap.c     |   13 +++++++++++--
 libxfs/xfs_rtbitmap.c |    6 ++++--
 2 files changed, 15 insertions(+), 4 deletions(-)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index bdede0e683ae91..60310d3c1074c8 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -3145,8 +3145,17 @@ xfs_bmap_adjacent_valid(
 	struct xfs_mount	*mp = ap->ip->i_mount;
 
 	if (XFS_IS_REALTIME_INODE(ap->ip) &&
-	    (ap->datatype & XFS_ALLOC_USERDATA))
-		return x < mp->m_sb.sb_rblocks;
+	    (ap->datatype & XFS_ALLOC_USERDATA)) {
+		if (x >= mp->m_sb.sb_rblocks)
+			return false;
+		if (!xfs_has_rtgroups(mp))
+			return true;
+
+		return xfs_rtb_to_rgno(mp, x) == xfs_rtb_to_rgno(mp, y) &&
+			xfs_rtb_to_rgno(mp, x) < mp->m_sb.sb_rgcount &&
+			xfs_rtb_to_rtx(mp, x) < mp->m_sb.sb_rgextents;
+
+	}
 
 	return XFS_FSB_TO_AGNO(mp, x) == XFS_FSB_TO_AGNO(mp, y) &&
 		XFS_FSB_TO_AGNO(mp, x) < mp->m_sb.sb_agcount &&
diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index b6874885107f09..44c801f31d5dc3 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -1080,11 +1080,13 @@ xfs_rtfree_extent(
 	 * Mark more blocks free in the superblock.
 	 */
 	xfs_trans_mod_sb(tp, XFS_TRANS_SB_FREXTENTS, (long)len);
+
 	/*
 	 * If we've now freed all the blocks, reset the file sequence
-	 * number to 0.
+	 * number to 0 for pre-RTG file systems.
 	 */
-	if (tp->t_frextents_delta + mp->m_sb.sb_frextents ==
+	if (!xfs_has_rtgroups(mp) &&
+	    tp->t_frextents_delta + mp->m_sb.sb_frextents ==
 	    mp->m_sb.sb_rextents) {
 		if (!(rbmip->i_diflags & XFS_DIFLAG_NEWRTBM))
 			rbmip->i_diflags |= XFS_DIFLAG_NEWRTBM;


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 29/52] xfs: scrub the realtime group superblock
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (27 preceding siblings ...)
  2024-12-23 22:05   ` [PATCH 28/52] xfs: make the RT allocator rtgroup aware Darrick J. Wong
@ 2024-12-23 22:05   ` Darrick J. Wong
  2024-12-23 22:06   ` [PATCH 30/52] xfs: scrub metadir paths for rtgroup metadata Darrick J. Wong
                     ` (22 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:05 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 3f1bdf50ab1b9c94d0da010f8879895d29585fd9

Enable scrubbing of realtime group superblocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 4c0682173d6144..50de6ad88dbe45 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -736,9 +736,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_HEALTHY	27	/* everything checked out ok */
 #define XFS_SCRUB_TYPE_DIRTREE	28	/* directory tree structure */
 #define XFS_SCRUB_TYPE_METAPATH	29	/* metadata directory tree paths */
+#define XFS_SCRUB_TYPE_RGSUPER	30	/* realtime superblock */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	30
+#define XFS_SCRUB_TYPE_NR	31
 
 /*
  * This special type code only applies to the vectored scrub implementation.


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 30/52] xfs: scrub metadir paths for rtgroup metadata
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (28 preceding siblings ...)
  2024-12-23 22:05   ` [PATCH 29/52] xfs: scrub the realtime group superblock Darrick J. Wong
@ 2024-12-23 22:06   ` Darrick J. Wong
  2024-12-23 22:06   ` [PATCH 31/52] xfs: mask off the rtbitmap and summary inodes when metadir in use Darrick J. Wong
                     ` (21 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:06 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: a74923333d9c3bc7cae3f8820d5e80535dca1457

Add the code we need to scan the metadata directory paths of rt group
metadata files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 50de6ad88dbe45..96f7d3c95fb4bc 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -822,9 +822,12 @@ struct xfs_scrub_vec_head {
  * path checking.
  */
 #define XFS_SCRUB_METAPATH_PROBE	(0)  /* do we have a metapath scrubber? */
+#define XFS_SCRUB_METAPATH_RTDIR	(1)  /* rtrgroups metadir */
+#define XFS_SCRUB_METAPATH_RTBITMAP	(2)  /* per-rtg bitmap */
+#define XFS_SCRUB_METAPATH_RTSUMMARY	(3)  /* per-rtg summary */
 
 /* Number of metapath sm_ino values */
-#define XFS_SCRUB_METAPATH_NR		(1)
+#define XFS_SCRUB_METAPATH_NR		(4)
 
 /*
  * ioctl limits


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 31/52] xfs: mask off the rtbitmap and summary inodes when metadir in use
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (29 preceding siblings ...)
  2024-12-23 22:06   ` [PATCH 30/52] xfs: scrub metadir paths for rtgroup metadata Darrick J. Wong
@ 2024-12-23 22:06   ` Darrick J. Wong
  2024-12-23 22:06   ` [PATCH 32/52] xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries Darrick J. Wong
                     ` (20 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:06 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: ea99122b18ca6cf902417e1acbc19a197f662299

Set the rtbitmap and summary file inumbers to NULLFSINO in the
superblock and make sure they're zeroed whenever we write the superblock
to disk, to mimic mkfs behavior.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
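A minimal sketch of the masking behavior, assuming a two-field
superblock and a boolean metadir flag (the real code tests
sb_features_incompat).  All toy_* names are hypothetical, and the
big-endian helpers are written out by hand to keep the example
self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NULLFSINO	((uint64_t)-1)

struct toy_sb  { bool metadir; uint64_t rbmino; uint64_t rsumino; };
struct toy_dsb { uint8_t rbmino[8]; uint8_t rsumino[8]; };

/* Store v as 8 big-endian bytes. */
static void
toy_put_be64(uint8_t out[8], uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		out[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Load 8 big-endian bytes. */
static uint64_t
toy_get_be64(const uint8_t in[8])
{
	uint64_t	v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}

/* With metadir, the legacy inode fields are always written as zero. */
static void
toy_sb_to_disk(struct toy_dsb *to, const struct toy_sb *from)
{
	toy_put_be64(to->rbmino, from->metadir ? 0 : from->rbmino);
	toy_put_be64(to->rsumino, from->metadir ? 0 : from->rsumino);
}

/* With metadir, the incore copies are forced to NULLFSINO. */
static void
toy_sb_from_disk(struct toy_sb *to, const struct toy_dsb *from, bool metadir)
{
	to->metadir = metadir;
	to->rbmino = metadir ? TOY_NULLFSINO : toy_get_be64(from->rbmino);
	to->rsumino = metadir ? TOY_NULLFSINO : toy_get_be64(from->rsumino);
}
```

Whatever stale value the incore fields hold, a metadir superblock goes
to disk with zeroes and comes back as NULLFSINO.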
 libxfs/xfs_sb.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)


diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 87be47083aa571..fe760d38fd7673 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -666,6 +666,14 @@ xfs_validate_sb_common(
 void
 xfs_sb_quota_from_disk(struct xfs_sb *sbp)
 {
+	if (xfs_sb_is_v5(sbp) &&
+	    (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)) {
+		sbp->sb_uquotino = NULLFSINO;
+		sbp->sb_gquotino = NULLFSINO;
+		sbp->sb_pquotino = NULLFSINO;
+		return;
+	}
+
 	/*
 	 * older mkfs doesn't initialize quota inodes to NULLFSINO. This
 	 * leads to in-core values having two different values for a quota
@@ -794,6 +802,8 @@ __xfs_sb_from_disk(
 		to->sb_metadirino = be64_to_cpu(from->sb_metadirino);
 		to->sb_rgcount = be32_to_cpu(from->sb_rgcount);
 		to->sb_rgextents = be32_to_cpu(from->sb_rgextents);
+		to->sb_rbmino = NULLFSINO;
+		to->sb_rsumino = NULLFSINO;
 	} else {
 		to->sb_metadirino = NULLFSINO;
 		to->sb_rgcount = 1;
@@ -816,6 +826,14 @@ xfs_sb_quota_to_disk(
 {
 	uint16_t	qflags = from->sb_qflags;
 
+	if (xfs_sb_is_v5(from) &&
+	    (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)) {
+		to->sb_uquotino = cpu_to_be64(0);
+		to->sb_gquotino = cpu_to_be64(0);
+		to->sb_pquotino = cpu_to_be64(0);
+		return;
+	}
+
 	to->sb_uquotino = cpu_to_be64(from->sb_uquotino);
 
 	/*
@@ -951,6 +969,8 @@ xfs_sb_to_disk(
 		to->sb_metadirino = cpu_to_be64(from->sb_metadirino);
 		to->sb_rgcount = cpu_to_be32(from->sb_rgcount);
 		to->sb_rgextents = cpu_to_be32(from->sb_rgextents);
+		to->sb_rbmino = cpu_to_be64(0);
+		to->sb_rsumino = cpu_to_be64(0);
 	}
 }
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 32/52] xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (30 preceding siblings ...)
  2024-12-23 22:06   ` [PATCH 31/52] xfs: mask off the rtbitmap and summary inodes when metadir in use Darrick J. Wong
@ 2024-12-23 22:06   ` Darrick J. Wong
  2024-12-23 22:06   ` [PATCH 33/52] xfs: create helpers to deal with rounding xfs_filblks_t " Darrick J. Wong
                     ` (19 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:06 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: fd7588fa6475771fe95f44011aea268c5d841da2

We're about to segment xfs_rtblock_t addresses, so we must create
type-specific helpers to do rt extent rounding of file block offsets
because the rtb helpers soon will not do the right thing there.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
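The rounding that the new file-offset helpers perform can be sketched
without any XFS types.  The toy_* functions are hypothetical stand-ins,
and toy_skip_to_next_rtx models the open-coded replacement in
repair/dinode.c under the assumption that "mod" there is the block's
nonzero offset within its rt extent, which the diff context does not
show.

```c
#include <assert.h>
#include <stdint.h>

/* Round off up to a multiple of rextsize (assumes no 64-bit overflow). */
static uint64_t
toy_fileoff_roundup_rtx(uint64_t off, uint32_t rextsize)
{
	assert(rextsize != 0);
	return ((off + rextsize - 1) / rextsize) * rextsize;
}

/* Round off down to a multiple of rextsize. */
static uint64_t
toy_fileoff_rounddown_rtx(uint64_t off, uint32_t rextsize)
{
	assert(rextsize != 0);
	return off - (off % rextsize);
}

/* Advance an unaligned block number to the start of the next rt extent. */
static uint64_t
toy_skip_to_next_rtx(uint64_t b, uint32_t rextsize)
{
	uint32_t	mod = (uint32_t)(b % rextsize);

	assert(mod != 0);
	return b + rextsize - mod;
}
```

For an unaligned block, adding rextsize minus the offset lands on the
same value that rounding up would, which is why the open-coded form can
stand in for the removed helper.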
 libxfs/xfs_rtbitmap.h |   17 +++++++++++++----
 repair/dinode.c       |    4 ++--
 2 files changed, 15 insertions(+), 6 deletions(-)


diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index 7be76490a31879..dc2b8beadfc331 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -135,13 +135,22 @@ xfs_rtb_roundup_rtx(
 	return roundup_64(rtbno, mp->m_sb.sb_rextsize);
 }
 
-/* Round this rtblock down to the nearest rt extent size. */
+/* Round this file block offset up to the nearest rt extent size. */
 static inline xfs_rtblock_t
-xfs_rtb_rounddown_rtx(
+xfs_fileoff_roundup_rtx(
 	struct xfs_mount	*mp,
-	xfs_rtblock_t		rtbno)
+	xfs_fileoff_t		off)
 {
-	return rounddown_64(rtbno, mp->m_sb.sb_rextsize);
+	return roundup_64(off, mp->m_sb.sb_rextsize);
+}
+
+/* Round this file block offset down to the nearest rt extent size. */
+static inline xfs_rtblock_t
+xfs_fileoff_rounddown_rtx(
+	struct xfs_mount	*mp,
+	xfs_fileoff_t		off)
+{
+	return rounddown_64(off, mp->m_sb.sb_rextsize);
 }
 
 /* Convert an rt extent number to a file block offset in the rt bitmap file. */
diff --git a/repair/dinode.c b/repair/dinode.c
index 2185214ac41bdf..56c7257d3766f1 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -197,7 +197,7 @@ process_rt_rec_dups(
 	xfs_rtblock_t		b;
 	xfs_rtxnum_t		ext;
 
-	for (b = xfs_rtb_rounddown_rtx(mp, irec->br_startblock);
+	for (b = irec->br_startblock;
 	     b < irec->br_startblock + irec->br_blockcount;
 	     b += mp->m_sb.sb_rextsize) {
 		ext = xfs_rtb_to_rtx(mp, b);
@@ -245,7 +245,7 @@ process_rt_rec_state(
 				do_error(
 _("data fork in rt inode %" PRIu64 " found invalid rt extent %"PRIu64" state %d at rt block %"PRIu64"\n"),
 					ino, ext, state, b);
-			b = xfs_rtb_roundup_rtx(mp, b);
+			b += mp->m_sb.sb_rextsize - mod;
 			continue;
 		}
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 33/52] xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (31 preceding siblings ...)
  2024-12-23 22:06   ` [PATCH 32/52] xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries Darrick J. Wong
@ 2024-12-23 22:06   ` Darrick J. Wong
  2024-12-23 22:07   ` [PATCH 34/52] xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t Darrick J. Wong
                     ` (18 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:06 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 3f0205ebe71f92c1b98ca580de8df6eea631cfd2

We're about to segment xfs_rtblock_t addresses, so we must create
type-specific helpers to do rt extent rounding of file mapping block
lengths because the rtb helpers soon will not do the right thing there.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
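A standalone sketch of the "offset within an rt extent" computation,
showing the mask fast path for power-of-two extent sizes and the
remainder fallback otherwise.  The toy_rtx_geom struct and its init
function are hypothetical; in particular, a mask of zero for
non-power-of-two sizes is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cached geometry for one rt extent size. */
struct toy_rtx_geom {
	uint32_t	rextsize;	/* blocks per rt extent */
	int8_t		rtxblklog;	/* log2(rextsize), or -1 */
	uint64_t	rtxblkmask;	/* rextsize - 1 if power of two */
};

static struct toy_rtx_geom
toy_rtx_geom_init(uint32_t rextsize)
{
	struct toy_rtx_geom g = {
		.rextsize = rextsize, .rtxblklog = -1, .rtxblkmask = 0,
	};

	assert(rextsize != 0);
	if ((rextsize & (rextsize - 1)) == 0) {
		g.rtxblklog = 0;
		while ((1u << g.rtxblklog) < rextsize)
			g.rtxblklog++;
		g.rtxblkmask = (uint64_t)rextsize - 1;
	}
	return g;
}

/* Blocks left over after dividing a length into whole rt extents. */
static uint32_t
toy_blen_to_rtxoff(const struct toy_rtx_geom *g, uint64_t blen)
{
	if (g->rtxblklog >= 0)
		return (uint32_t)(blen & g->rtxblkmask);
	return (uint32_t)(blen % g->rextsize);
}
```

A nonzero result means the length is not a whole number of rt extents,
which is the condition xfs_rtfree_blocks rejects.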
 libxfs/xfs_rtbitmap.c |    2 +-
 libxfs/xfs_rtbitmap.h |   30 +++++++++++++++++++++---------
 2 files changed, 22 insertions(+), 10 deletions(-)


diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index 44c801f31d5dc3..e304f07189d3c9 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -1119,7 +1119,7 @@ xfs_rtfree_blocks(
 
 	ASSERT(rtlen <= XFS_MAX_BMBT_EXTLEN);
 
-	mod = xfs_rtb_to_rtxoff(mp, rtlen);
+	mod = xfs_blen_to_rtxoff(mp, rtlen);
 	if (mod) {
 		ASSERT(mod == 0);
 		return -EIO;
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index dc2b8beadfc331..e0fb36f181cc9e 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -101,6 +101,27 @@ xfs_blen_to_rtbxlen(
 	return div_u64(blen, mp->m_sb.sb_rextsize);
 }
 
+/* Return the offset of a file block length within an rt extent. */
+static inline xfs_extlen_t
+xfs_blen_to_rtxoff(
+	struct xfs_mount	*mp,
+	xfs_filblks_t		blen)
+{
+	if (likely(mp->m_rtxblklog >= 0))
+		return blen & mp->m_rtxblkmask;
+
+	return do_div(blen, mp->m_sb.sb_rextsize);
+}
+
+/* Round this block count up to the nearest rt extent size. */
+static inline xfs_filblks_t
+xfs_blen_roundup_rtx(
+	struct xfs_mount	*mp,
+	xfs_filblks_t		blen)
+{
+	return roundup_64(blen, mp->m_sb.sb_rextsize);
+}
+
 /* Convert an rt block number into an rt extent number. */
 static inline xfs_rtxnum_t
 xfs_rtb_to_rtx(
@@ -126,15 +147,6 @@ xfs_rtb_to_rtxoff(
 	return do_div(rtbno, mp->m_sb.sb_rextsize);
 }
 
-/* Round this rtblock up to the nearest rt extent size. */
-static inline xfs_rtblock_t
-xfs_rtb_roundup_rtx(
-	struct xfs_mount	*mp,
-	xfs_rtblock_t		rtbno)
-{
-	return roundup_64(rtbno, mp->m_sb.sb_rextsize);
-}
-
 /* Round this file block offset up to the nearest rt extent size. */
 static inline xfs_rtblock_t
 xfs_fileoff_roundup_rtx(


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 34/52] xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (32 preceding siblings ...)
  2024-12-23 22:06   ` [PATCH 33/52] xfs: create helpers to deal with rounding xfs_filblks_t " Darrick J. Wong
@ 2024-12-23 22:07   ` Darrick J. Wong
  2024-12-23 22:07   ` [PATCH 35/52] xfs: adjust min_block usage in xfs_verify_agbno Darrick J. Wong
                     ` (17 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:07 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 7195f240c6578caa9e24202a26aa612a7e8cba26

Now that we've finished adding allocation groups to the realtime volume,
let's make the file block mapping address (xfs_rtblock_t) a segmented
value just like we do on the data device.  This means that group number
and block number conversions can be done with shifting and masking
instead of integer division.

While in theory we could continue caching the rgno shift value in
m_rgblklog, the fact that we now always use the shift value means that
we have an opportunity to increase the redundancy of the rt geometry by
storing it in the ondisk superblock and adding more sb verifier code.
Extend the superblock to store the rgblklog value.

Now that we have segmented addresses, set the correct values in
m_groups[XG_TYPE_RTG] so that the xfs_group helpers work correctly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
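The segmented addressing described above can be sketched in isolation.
The toy_* names are hypothetical; the sketch assumes the group size fits
in 32 bits and models only the shift/mask arithmetic and the conversion
to and from a linear block number, not the daddr shift that
xfs_rtb_to_daddr applies afterwards.

```c
#include <assert.h>
#include <stdint.h>

struct toy_groups {
	uint32_t	blocks;		/* real blocks per group */
	uint8_t		blklog;		/* shift that yields the group number */
	uint64_t	blkmask;	/* mask that yields the in-group block */
};

/* Index of the highest set bit, or -1 if no bit is set. */
static int
toy_highbit64(uint64_t v)
{
	int	bit = -1;

	while (v) {
		bit++;
		v >>= 1;
	}
	return bit;
}

/* Smallest shift such that (1 << shift) >= blocks per group. */
static int
toy_compute_rgblklog(uint32_t rgextents, uint32_t rextsize)
{
	uint64_t	rgblocks = (uint64_t)rgextents * rextsize;

	assert(rgblocks != 0);
	return toy_highbit64(rgblocks - 1) + 1;
}

static struct toy_groups
toy_groups_init(uint32_t rgextents, uint32_t rextsize)
{
	struct toy_groups	g;
	uint64_t		rgblocks = (uint64_t)rgextents * rextsize;

	assert(rgblocks != 0 && rgblocks <= UINT32_MAX);
	g.blocks = (uint32_t)rgblocks;
	g.blklog = (uint8_t)toy_compute_rgblklog(rgextents, rextsize);
	g.blkmask = ((uint64_t)1 << g.blklog) - 1;
	return g;
}

/* Segmented address -> linear block number on the rt device. */
static uint64_t
toy_rtb_to_linear(const struct toy_groups *g, uint64_t rtbno)
{
	uint64_t	rgno = rtbno >> g->blklog;

	return rgno * g->blocks + (rtbno & g->blkmask);
}

/* Linear block number -> segmented address; division is needed here. */
static uint64_t
toy_linear_to_rtb(const struct toy_groups *g, uint64_t bno)
{
	uint64_t	rgno = bno / g->blocks;
	uint64_t	rgbno = bno % g->blocks;

	return (rgno << g->blklog) + rgbno;
}
```

With 1000 blocks per group the shift is 10, so each group occupies a
1024-block slice of the address space and the top 24 addresses in every
slice are never used.  That gap is why converting to a disk address can
no longer be a plain shift of the rt block number.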
 include/xfs_mount.h   |    4 +--
 libxfs/xfs_bmap.c     |    4 +--
 libxfs/xfs_format.h   |    6 ++++
 libxfs/xfs_ondisk.h   |    2 +
 libxfs/xfs_rtbitmap.h |   13 +++++----
 libxfs/xfs_rtgroup.h  |   69 +++++++++++++++----------------------------------
 libxfs/xfs_sb.c       |   65 ++++++++++++++++++++++++++++++++++++++++------
 libxfs/xfs_sb.h       |    4 ++-
 libxfs/xfs_types.c    |    7 ++---
 9 files changed, 100 insertions(+), 74 deletions(-)


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 92c01c52e99e4a..19d08cf047f202 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -83,7 +83,6 @@ typedef struct xfs_mount {
         struct xfs_ino_geometry	m_ino_geo;	/* inode geometry */
 	uint			m_rsumlevels;	/* rt summary levels */
 	xfs_filblks_t		m_rsumblocks;	/* size of rt summary, FSBs */
-	uint32_t		m_rgblocks;	/* size of rtgroup in rtblocks */
 	struct xfs_inode	*m_metadirip;	/* ptr to metadata directory */
 	struct xfs_inode	*m_rtdirip;	/* ptr to realtime metadir */
 	struct xfs_buftarg	*m_ddev_targp;
@@ -98,7 +97,7 @@ typedef struct xfs_mount {
 	uint8_t			m_sectbb_log;	/* sectorlog - BBSHIFT */
 	uint8_t			m_agno_log;	/* log #ag's */
 	int8_t			m_rtxblklog;	/* log2 of rextsize, if possible */
-	int8_t			m_rgblklog;	/* log2 of rt group sz if possible */
+
 	uint			m_blockmask;	/* sb_blocksize-1 */
 	uint			m_blockwsize;	/* sb_blocksize in words */
 	/* number of rt extents per rt bitmap block if rtgroups enabled */
@@ -123,7 +122,6 @@ typedef struct xfs_mount {
 	uint64_t		m_features;	/* active filesystem features */
 	uint64_t		m_low_space[XFS_LOWSP_MAX];
 	uint64_t		m_rtxblkmask;	/* rt extent block mask */
-	uint64_t		m_rgblkmask;	/* rt group block mask */
 	unsigned long		m_opstate;	/* dynamic state flags */
 	bool			m_finobt_nores; /* no per-AG finobt resv. */
 	uint			m_qflags;	/* quota status flags */
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 60310d3c1074c8..552d292bf35412 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -3146,10 +3146,8 @@ xfs_bmap_adjacent_valid(
 
 	if (XFS_IS_REALTIME_INODE(ap->ip) &&
 	    (ap->datatype & XFS_ALLOC_USERDATA)) {
-		if (x >= mp->m_sb.sb_rblocks)
-			return false;
 		if (!xfs_has_rtgroups(mp))
-			return true;
+			return x < mp->m_sb.sb_rblocks;
 
 		return xfs_rtb_to_rgno(mp, x) == xfs_rtb_to_rgno(mp, y) &&
 			xfs_rtb_to_rgno(mp, x) < mp->m_sb.sb_rgcount &&
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index f56ff9f43c218f..d6c10855ab023b 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -179,6 +179,9 @@ typedef struct xfs_sb {
 	xfs_rgnumber_t	sb_rgcount;	/* number of realtime groups */
 	xfs_rtxlen_t	sb_rgextents;	/* size of a realtime group in rtx */
 
+	uint8_t		sb_rgblklog;    /* rt group number shift */
+	uint8_t		sb_pad[7];	/* zeroes */
+
 	/* must be padded to 64 bit alignment */
 } xfs_sb_t;
 
@@ -268,6 +271,9 @@ struct xfs_dsb {
 	__be32		sb_rgcount;	/* # of realtime groups */
 	__be32		sb_rgextents;	/* size of rtgroup in rtx */
 
+	__u8		sb_rgblklog;    /* rt group number shift */
+	__u8		sb_pad[7];	/* zeroes */
+
 	/*
 	 * The size of this structure must be padded to 64 bit alignment.
 	 *
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 6a2bcbc392842c..99eae7f67e961b 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -37,7 +37,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dinode,		176);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_disk_dquot,		104);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dqblk,			136);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,			280);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,			288);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dsymlink_hdr,		56);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inobt_key,		4);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inobt_rec,		16);
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h
index e0fb36f181cc9e..16563a44bd138a 100644
--- a/libxfs/xfs_rtbitmap.h
+++ b/libxfs/xfs_rtbitmap.h
@@ -26,7 +26,7 @@ xfs_rtx_to_rtb(
 	xfs_rtxnum_t		rtx)
 {
 	struct xfs_mount	*mp = rtg_mount(rtg);
-	xfs_rtblock_t		start = xfs_rgno_start_rtb(mp, rtg_rgno(rtg));
+	xfs_rtblock_t		start = xfs_group_start_fsb(rtg_group(rtg));
 
 	if (mp->m_rtxblklog >= 0)
 		return start + (rtx << mp->m_rtxblklog);
@@ -128,11 +128,11 @@ xfs_rtb_to_rtx(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
 {
-	uint64_t		__rgbno = __xfs_rtb_to_rgbno(mp, rtbno);
-
+	/* open-coded 64-bit masking operation */
+	rtbno &= mp->m_groups[XG_TYPE_RTG].blkmask;
 	if (likely(mp->m_rtxblklog >= 0))
-		return __rgbno >> mp->m_rtxblklog;
-	return div_u64(__rgbno, mp->m_sb.sb_rextsize);
+		return rtbno >> mp->m_rtxblklog;
+	return div_u64(rtbno, mp->m_sb.sb_rextsize);
 }
 
 /* Return the offset of an rt block number within an rt extent. */
@@ -141,9 +141,10 @@ xfs_rtb_to_rtxoff(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
 {
+	/* open-coded 64-bit masking operation */
+	rtbno &= mp->m_groups[XG_TYPE_RTG].blkmask;
 	if (likely(mp->m_rtxblklog >= 0))
 		return rtbno & mp->m_rtxblkmask;
-
 	return do_div(rtbno, mp->m_sb.sb_rextsize);
 }
 
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 2ddfac9a0182f9..c15b232e1f8e77 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -122,31 +122,12 @@ xfs_rtgroup_next(
 	return xfs_rtgroup_next_range(mp, rtg, 0, mp->m_sb.sb_rgcount - 1);
 }
 
-static inline xfs_rtblock_t
-xfs_rgno_start_rtb(
-	struct xfs_mount	*mp,
-	xfs_rgnumber_t		rgno)
-{
-	if (mp->m_rgblklog >= 0)
-		return ((xfs_rtblock_t)rgno << mp->m_rgblklog);
-	return ((xfs_rtblock_t)rgno * mp->m_rgblocks);
-}
-
-static inline xfs_rtblock_t
-__xfs_rgbno_to_rtb(
-	struct xfs_mount	*mp,
-	xfs_rgnumber_t		rgno,
-	xfs_rgblock_t		rgbno)
-{
-	return xfs_rgno_start_rtb(mp, rgno) + rgbno;
-}
-
 static inline xfs_rtblock_t
 xfs_rgbno_to_rtb(
 	struct xfs_rtgroup	*rtg,
 	xfs_rgblock_t		rgbno)
 {
-	return __xfs_rgbno_to_rtb(rtg_mount(rtg), rtg_rgno(rtg), rgbno);
+	return xfs_gbno_to_fsb(rtg_group(rtg), rgbno);
 }
 
 static inline xfs_rgnumber_t
@@ -154,30 +135,7 @@ xfs_rtb_to_rgno(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
 {
-	if (!xfs_has_rtgroups(mp))
-		return 0;
-
-	if (mp->m_rgblklog >= 0)
-		return rtbno >> mp->m_rgblklog;
-
-	return div_u64(rtbno, mp->m_rgblocks);
-}
-
-static inline uint64_t
-__xfs_rtb_to_rgbno(
-	struct xfs_mount	*mp,
-	xfs_rtblock_t		rtbno)
-{
-	uint32_t		rem;
-
-	if (!xfs_has_rtgroups(mp))
-		return rtbno;
-
-	if (mp->m_rgblklog >= 0)
-		return rtbno & mp->m_rgblkmask;
-
-	div_u64_rem(rtbno, mp->m_rgblocks, &rem);
-	return rem;
+	return xfs_fsb_to_gno(mp, rtbno, XG_TYPE_RTG);
 }
 
 static inline xfs_rgblock_t
@@ -185,7 +143,7 @@ xfs_rtb_to_rgbno(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
 {
-	return __xfs_rtb_to_rgbno(mp, rtbno);
+	return xfs_fsb_to_gbno(mp, rtbno, XG_TYPE_RTG);
 }
 
 /* Is rtbno the start of a RT group? */
@@ -194,7 +152,7 @@ xfs_rtbno_is_group_start(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
 {
-	return (rtbno & mp->m_rgblkmask) == 0;
+	return (rtbno & mp->m_groups[XG_TYPE_RTG].blkmask) == 0;
 }
 
 static inline xfs_daddr_t
@@ -202,7 +160,11 @@ xfs_rtb_to_daddr(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
 {
-	return rtbno << mp->m_blkbb_log;
+	struct xfs_groups	*g = &mp->m_groups[XG_TYPE_RTG];
+	xfs_rgnumber_t		rgno = xfs_rtb_to_rgno(mp, rtbno);
+	uint64_t		start_bno = (xfs_rtblock_t)rgno * g->blocks;
+
+	return XFS_FSB_TO_BB(mp, start_bno + (rtbno & g->blkmask));
 }
 
 static inline xfs_rtblock_t
@@ -210,7 +172,18 @@ xfs_daddr_to_rtb(
 	struct xfs_mount	*mp,
 	xfs_daddr_t		daddr)
 {
-	return daddr >> mp->m_blkbb_log;
+	xfs_rfsblock_t		bno = XFS_BB_TO_FSBT(mp, daddr);
+
+	if (xfs_has_rtgroups(mp)) {
+		struct xfs_groups *g = &mp->m_groups[XG_TYPE_RTG];
+		xfs_rgnumber_t	rgno;
+		uint32_t	rgbno;
+
+		rgno = div_u64_rem(bno, g->blocks, &rgbno);
+		return ((xfs_rtblock_t)rgno << g->blklog) + rgbno;
+	}
+
+	return bno;
 }
 
 #ifdef CONFIG_XFS_RT
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index fe760d38fd7673..2ab234f1dfce70 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -365,12 +365,23 @@ xfs_validate_sb_write(
 	return 0;
 }
 
+int
+xfs_compute_rgblklog(
+	xfs_rtxlen_t	rgextents,
+	xfs_rgblock_t	rextsize)
+{
+	uint64_t	rgblocks = (uint64_t)rgextents * rextsize;
+
+	return xfs_highbit64(rgblocks - 1) + 1;
+}
+
 static int
 xfs_validate_sb_rtgroups(
 	struct xfs_mount	*mp,
 	struct xfs_sb		*sbp)
 {
 	uint64_t		groups;
+	int			rgblklog;
 
 	if (sbp->sb_rextsize == 0) {
 		xfs_warn(mp,
@@ -415,6 +426,14 @@ xfs_validate_sb_rtgroups(
 		return -EINVAL;
 	}
 
+	rgblklog = xfs_compute_rgblklog(sbp->sb_rgextents, sbp->sb_rextsize);
+	if (sbp->sb_rgblklog != rgblklog) {
+		xfs_warn(mp,
+"Realtime group log (%d) does not match expected value (%d).",
+				sbp->sb_rgblklog, rgblklog);
+		return -EINVAL;
+	}
+
 	return 0;
 }
 
@@ -495,6 +514,12 @@ xfs_validate_sb_common(
 		}
 
 		if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) {
+			if (memchr_inv(sbp->sb_pad, 0, sizeof(sbp->sb_pad))) {
+				xfs_warn(mp,
+"Metadir superblock padding fields must be zero.");
+				return -EINVAL;
+			}
+
 			error = xfs_validate_sb_rtgroups(mp, sbp);
 			if (error)
 				return error;
@@ -800,6 +825,8 @@ __xfs_sb_from_disk(
 
 	if (to->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) {
 		to->sb_metadirino = be64_to_cpu(from->sb_metadirino);
+		to->sb_rgblklog = from->sb_rgblklog;
+		memcpy(to->sb_pad, from->sb_pad, sizeof(to->sb_pad));
 		to->sb_rgcount = be32_to_cpu(from->sb_rgcount);
 		to->sb_rgextents = be32_to_cpu(from->sb_rgextents);
 		to->sb_rbmino = NULLFSINO;
@@ -967,6 +994,8 @@ xfs_sb_to_disk(
 
 	if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) {
 		to->sb_metadirino = cpu_to_be64(from->sb_metadirino);
+		to->sb_rgblklog = from->sb_rgblklog;
+		memset(to->sb_pad, 0, sizeof(to->sb_pad));
 		to->sb_rgcount = cpu_to_be32(from->sb_rgcount);
 		to->sb_rgextents = cpu_to_be32(from->sb_rgextents);
 		to->sb_rbmino = cpu_to_be64(0);
@@ -1101,8 +1130,9 @@ const struct xfs_buf_ops xfs_sb_quiet_buf_ops = {
 	.verify_write = xfs_sb_write_verify,
 };
 
+/* Compute cached rt geometry from the incore sb. */
 void
-xfs_mount_sb_set_rextsize(
+xfs_sb_mount_rextsize(
 	struct xfs_mount	*mp,
 	struct xfs_sb		*sbp)
 {
@@ -1111,13 +1141,32 @@ xfs_mount_sb_set_rextsize(
 	mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize);
 	mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize);
 
-	mp->m_rgblocks = sbp->sb_rgextents * sbp->sb_rextsize;
-	mp->m_rgblklog = log2_if_power2(mp->m_rgblocks);
-	mp->m_rgblkmask = mask64_if_power2(mp->m_rgblocks);
+	if (xfs_sb_is_v5(sbp) &&
+	    (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)) {
+		rgs->blocks = sbp->sb_rgextents * sbp->sb_rextsize;
+		rgs->blklog = mp->m_sb.sb_rgblklog;
+		rgs->blkmask = xfs_mask32lo(mp->m_sb.sb_rgblklog);
+	} else {
+		rgs->blocks = 0;
+		rgs->blklog = 0;
+		rgs->blkmask = (uint64_t)-1;
+	}
+}
 
-	rgs->blocks = 0;
-	rgs->blklog = 0;
-	rgs->blkmask = (uint64_t)-1;
+/* Update incore sb rt extent size, then recompute the cached rt geometry. */
+void
+xfs_mount_sb_set_rextsize(
+	struct xfs_mount	*mp,
+	struct xfs_sb		*sbp,
+	xfs_agblock_t		rextsize)
+{
+	sbp->sb_rextsize = rextsize;
+	if (xfs_sb_is_v5(sbp) &&
+	    (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR))
+		sbp->sb_rgblklog = xfs_compute_rgblklog(sbp->sb_rgextents,
+							rextsize);
+
+	xfs_sb_mount_rextsize(mp, sbp);
 }
 
 /*
@@ -1151,7 +1200,7 @@ xfs_sb_mount_common(
 	ags->blklog = mp->m_sb.sb_agblklog;
 	ags->blkmask = xfs_mask32lo(mp->m_sb.sb_agblklog);
 
-	xfs_mount_sb_set_rextsize(mp, sbp);
+	xfs_sb_mount_rextsize(mp, sbp);
 
 	mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, true);
 	mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, false);
diff --git a/libxfs/xfs_sb.h b/libxfs/xfs_sb.h
index 999dcfccdaf960..34d0dd374e9b0b 100644
--- a/libxfs/xfs_sb.h
+++ b/libxfs/xfs_sb.h
@@ -17,8 +17,9 @@ extern void	xfs_log_sb(struct xfs_trans *tp);
 extern int	xfs_sync_sb(struct xfs_mount *mp, bool wait);
 extern int	xfs_sync_sb_buf(struct xfs_mount *mp, bool update_rtsb);
 extern void	xfs_sb_mount_common(struct xfs_mount *mp, struct xfs_sb *sbp);
+void		xfs_sb_mount_rextsize(struct xfs_mount *mp, struct xfs_sb *sbp);
 void		xfs_mount_sb_set_rextsize(struct xfs_mount *mp,
-			struct xfs_sb *sbp);
+			struct xfs_sb *sbp, xfs_agblock_t rextsize);
 extern void	xfs_sb_from_disk(struct xfs_sb *to, struct xfs_dsb *from);
 extern void	xfs_sb_to_disk(struct xfs_dsb *to, struct xfs_sb *from);
 extern void	xfs_sb_quota_from_disk(struct xfs_sb *sbp);
@@ -43,5 +44,6 @@ bool	xfs_validate_stripe_geometry(struct xfs_mount *mp,
 bool	xfs_validate_rt_geometry(struct xfs_sb *sbp);
 
 uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);
+int xfs_compute_rgblklog(xfs_rtxlen_t rgextents, xfs_rgblock_t rextsize);
 
 #endif	/* __XFS_SB_H__ */
diff --git a/libxfs/xfs_types.c b/libxfs/xfs_types.c
index 4e366e62eb2ae8..4e24599c369397 100644
--- a/libxfs/xfs_types.c
+++ b/libxfs/xfs_types.c
@@ -146,9 +146,6 @@ xfs_verify_rtbno(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
 {
-	if (rtbno >= mp->m_sb.sb_rblocks)
-		return false;
-
 	if (xfs_has_rtgroups(mp)) {
 		xfs_rgnumber_t	rgno = xfs_rtb_to_rgno(mp, rtbno);
 		xfs_rtxnum_t	rtx = xfs_rtb_to_rtx(mp, rtbno);
@@ -159,8 +156,10 @@ xfs_verify_rtbno(
 			return false;
 		if (xfs_has_rtsb(mp) && rgno == 0 && rtx == 0)
 			return false;
+		return true;
 	}
-	return true;
+
+	return rtbno < mp->m_sb.sb_rblocks;
 }
 
 /*



* [PATCH 35/52] xfs: adjust min_block usage in xfs_verify_agbno
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (33 preceding siblings ...)
  2024-12-23 22:07   ` [PATCH 34/52] xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t Darrick J. Wong
@ 2024-12-23 22:07   ` Darrick J. Wong
  2024-12-23 22:07   ` [PATCH 36/52] xfs: move the min and max group block numbers to xfs_group Darrick J. Wong
                     ` (16 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:07 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: ceaa0bd773e2d6d5726d6535f605ecd6b26d2fcc

There's some weird logic in xfs_verify_agbno -- min_block ought to be
the first agblock number in the AG that can be used by non-static
metadata.  However, we initialize it to the last agblock of the static
metadata, which works due to the <= check, even though this isn't
technically correct.

Change the check to < and set min_block to the next agblock past the
static metadata.  This hasn't been an issue up to now, but we're going
to move these things into the generic group struct, and this will cause
problems with rtgroups, where min_block can be zero for an rtgroup that
doesn't have a rt superblock.

Note that there's no user-visible impact with the old logic, so this
isn't a bug fix.
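
As a rough illustration (not part of the patch), the two conventions can
be compared side by side.  The struct and function names below are
hypothetical stand-ins for xfs_perag and xfs_verify_agbno; only the
comparison operators and the meaning of min_block follow the description
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields that the verifier reads. */
struct demo_group {
	uint32_t	block_count;	/* number of blocks in the group */
	uint32_t	min_block;	/* meaning depends on the convention */
};

/* Old convention: min_block is the LAST block of static metadata. */
static bool
demo_verify_old(const struct demo_group *g, uint32_t bno)
{
	if (bno >= g->block_count)
		return false;
	if (bno <= g->min_block)
		return false;
	return true;
}

/* New convention: min_block is the FIRST block usable by other metadata. */
static bool
demo_verify_new(const struct demo_group *g, uint32_t bno)
{
	if (bno >= g->block_count)
		return false;
	if (bno < g->min_block)
		return false;
	return true;
}
```

With min_block = 3 under the old convention and min_block = 4 under the
new one, both checks accept exactly the same blocks.  Only the new
convention can describe a group whose block 0 is usable (min_block = 0),
which is the rtgroup case mentioned above.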

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ag.c |    2 +-
 libxfs/xfs_ag.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index bd38ac175bbae3..181e929132d855 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -240,7 +240,7 @@ xfs_perag_alloc(
 	 * Pre-calculated geometry
 	 */
 	pag->block_count = __xfs_ag_block_count(mp, index, agcount, dblocks);
-	pag->min_block = XFS_AGFL_BLOCK(mp);
+	pag->min_block = XFS_AGFL_BLOCK(mp) + 1;
 	__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
 			&pag->agino_max);
 
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 7290148fa6e6aa..9c22a76d58cfc2 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -222,7 +222,7 @@ xfs_verify_agbno(struct xfs_perag *pag, xfs_agblock_t agbno)
 {
 	if (agbno >= pag->block_count)
 		return false;
-	if (agbno <= pag->min_block)
+	if (agbno < pag->min_block)
 		return false;
 	return true;
 }



* [PATCH 36/52] xfs: move the min and max group block numbers to xfs_group
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (34 preceding siblings ...)
  2024-12-23 22:07   ` [PATCH 35/52] xfs: adjust min_block usage in xfs_verify_agbno Darrick J. Wong
@ 2024-12-23 22:07   ` Darrick J. Wong
  2024-12-23 22:07   ` [PATCH 37/52] xfs: implement busy extent tracking for rtgroups Darrick J. Wong
                     ` (15 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:07 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: e0b5b97dde8e4737d06cb5888abd88373abc22df

Move the min and max agblock numbers to the generic xfs_group structure
so that we can start building validators for extents within an rtgroup.
While we're at it, use check_add_overflow for the extent length
computation, which detects wraparound explicitly instead of relying on
the old agbno + len <= agbno comparison.
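
A minimal userspace sketch of the overflow-checked extent validator
follows.  It is not the patch itself: demo_group and the demo_* functions
are hypothetical names, and __builtin_add_overflow (available in GCC and
Clang) is assumed here as a stand-in for the kernel's check_add_overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the geometry fields moved into xfs_group. */
struct demo_group {
	uint32_t	block_count;	/* number of blocks in the group */
	uint32_t	min_gbno;	/* first usable group block number */
};

/* A block is valid if it lies in [min_gbno, block_count). */
static bool
demo_verify_gbno(const struct demo_group *g, uint32_t gbno)
{
	return gbno >= g->min_gbno && gbno < g->block_count;
}

/* An extent is valid if it is non-empty and both ends are valid blocks. */
static bool
demo_verify_gbext(const struct demo_group *g, uint32_t gbno, uint32_t glen)
{
	uint32_t	end;

	if (!demo_verify_gbno(g, gbno))
		return false;
	/* Reject empty extents and any extent whose last block wraps. */
	if (glen == 0 || __builtin_add_overflow(gbno, glen - 1, &end))
		return false;
	return demo_verify_gbno(g, end);
}
```

For a group with 1000 blocks and min_gbno = 4, an extent starting at
block 4 may be at most 996 blocks long, and a length of UINT32_MAX is
rejected by the overflow check before the end block is ever compared.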

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/init.c             |    4 ----
 libxfs/xfs_ag.c           |   22 ++++++++++++----------
 libxfs/xfs_ag.h           |   16 ++--------------
 libxfs/xfs_group.h        |   33 +++++++++++++++++++++++++++++++++
 libxfs/xfs_ialloc_btree.c |    2 +-
 libxfs/xfs_rtgroup.c      |   31 ++++++++++++++++++++++++++++++-
 libxfs/xfs_rtgroup.h      |    3 +++
 7 files changed, 81 insertions(+), 30 deletions(-)


diff --git a/libxfs/init.c b/libxfs/init.c
index a037012b77e5f6..6642cd50c00b5f 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -668,7 +668,6 @@ libxfs_mount(
 {
 	struct xfs_buf		*bp;
 	struct xfs_sb		*sbp;
-	struct xfs_rtgroup	*rtg = NULL;
 	xfs_daddr_t		d;
 	int			i;
 	int			error;
@@ -831,9 +830,6 @@ libxfs_mount(
 		exit(1);
 	}
 
-	while ((rtg = xfs_rtgroup_next(mp, rtg)))
-		rtg->rtg_extents = xfs_rtgroup_extents(mp, rtg_rgno(rtg));
-
 	xfs_set_rtgroup_data_loaded(mp);
 
 	return mp;
diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index 181e929132d855..095b581a116180 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -203,9 +203,10 @@ xfs_update_last_ag_size(
 
 	if (!pag)
 		return -EFSCORRUPTED;
-	pag->block_count = __xfs_ag_block_count(mp, prev_agcount - 1,
-			mp->m_sb.sb_agcount, mp->m_sb.sb_dblocks);
-	__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
+	pag_group(pag)->xg_block_count = __xfs_ag_block_count(mp,
+			prev_agcount - 1, mp->m_sb.sb_agcount,
+			mp->m_sb.sb_dblocks);
+	__xfs_agino_range(mp, pag_group(pag)->xg_block_count, &pag->agino_min,
 			&pag->agino_max);
 	xfs_perag_rele(pag);
 	return 0;
@@ -239,9 +240,10 @@ xfs_perag_alloc(
 	/*
 	 * Pre-calculated geometry
 	 */
-	pag->block_count = __xfs_ag_block_count(mp, index, agcount, dblocks);
-	pag->min_block = XFS_AGFL_BLOCK(mp) + 1;
-	__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
+	pag_group(pag)->xg_block_count = __xfs_ag_block_count(mp, index, agcount,
+				dblocks);
+	pag_group(pag)->xg_min_gbno = XFS_AGFL_BLOCK(mp) + 1;
+	__xfs_agino_range(mp, pag_group(pag)->xg_block_count, &pag->agino_min,
 			&pag->agino_max);
 
 	error = xfs_group_insert(mp, pag_group(pag), index, XG_TYPE_AG);
@@ -850,8 +852,8 @@ xfs_ag_shrink_space(
 	}
 
 	/* Update perag geometry */
-	pag->block_count -= delta;
-	__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
+	pag_group(pag)->xg_block_count -= delta;
+	__xfs_agino_range(mp, pag_group(pag)->xg_block_count, &pag->agino_min,
 			&pag->agino_max);
 
 	xfs_ialloc_log_agi(*tpp, agibp, XFS_AGI_LENGTH);
@@ -922,8 +924,8 @@ xfs_ag_extend_space(
 		return error;
 
 	/* Update perag geometry */
-	pag->block_count = be32_to_cpu(agf->agf_length);
-	__xfs_agino_range(mp, pag->block_count, &pag->agino_min,
+	pag_group(pag)->xg_block_count = be32_to_cpu(agf->agf_length);
+	__xfs_agino_range(mp, pag_group(pag)->xg_block_count, &pag->agino_min,
 			&pag->agino_max);
 	return 0;
 }
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 9c22a76d58cfc2..1f24cfa2732172 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -61,8 +61,6 @@ struct xfs_perag {
 	struct xfs_ag_resv	pag_rmapbt_resv;
 
 	/* Precalculated geometry info */
-	xfs_agblock_t		block_count;
-	xfs_agblock_t		min_block;
 	xfs_agino_t		agino_min;
 	xfs_agino_t		agino_max;
 
@@ -220,11 +218,7 @@ void xfs_agino_range(struct xfs_mount *mp, xfs_agnumber_t agno,
 static inline bool
 xfs_verify_agbno(struct xfs_perag *pag, xfs_agblock_t agbno)
 {
-	if (agbno >= pag->block_count)
-		return false;
-	if (agbno < pag->min_block)
-		return false;
-	return true;
+	return xfs_verify_gbno(pag_group(pag), agbno);
 }
 
 static inline bool
@@ -233,13 +227,7 @@ xfs_verify_agbext(
 	xfs_agblock_t		agbno,
 	xfs_agblock_t		len)
 {
-	if (agbno + len <= agbno)
-		return false;
-
-	if (!xfs_verify_agbno(pag, agbno))
-		return false;
-
-	return xfs_verify_agbno(pag, agbno + len - 1);
+	return xfs_verify_gbext(pag_group(pag), agbno, len);
 }
 
 /*
diff --git a/libxfs/xfs_group.h b/libxfs/xfs_group.h
index 5b7362277c3f7a..242b05627c7a6e 100644
--- a/libxfs/xfs_group.h
+++ b/libxfs/xfs_group.h
@@ -12,6 +12,10 @@ struct xfs_group {
 	atomic_t		xg_ref;		/* passive reference count */
 	atomic_t		xg_active_ref;	/* active reference count */
 
+	/* Precalculated geometry info */
+	uint32_t		xg_block_count;	/* max usable gbno */
+	uint32_t		xg_min_gbno;	/* min usable gbno */
+
 #ifdef __KERNEL__
 	/* -- kernel only structures below this line -- */
 
@@ -128,4 +132,33 @@ xfs_fsb_to_gbno(
 	return fsbno & mp->m_groups[type].blkmask;
 }
 
+static inline bool
+xfs_verify_gbno(
+	struct xfs_group	*xg,
+	uint32_t		gbno)
+{
+	if (gbno >= xg->xg_block_count)
+		return false;
+	if (gbno < xg->xg_min_gbno)
+		return false;
+	return true;
+}
+
+static inline bool
+xfs_verify_gbext(
+	struct xfs_group	*xg,
+	uint32_t		gbno,
+	uint32_t		glen)
+{
+	uint32_t		end;
+
+	if (!xfs_verify_gbno(xg, gbno))
+		return false;
+	if (glen == 0 || check_add_overflow(gbno, glen - 1, &end))
+		return false;
+	if (!xfs_verify_gbno(xg, end))
+		return false;
+	return true;
+}
+
 #endif /* __LIBXFS_GROUP_H */
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index 4eeea240b0bd89..19fca9fad62b1d 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -716,7 +716,7 @@ xfs_inobt_max_size(
 	struct xfs_perag	*pag)
 {
 	struct xfs_mount	*mp = pag_mount(pag);
-	xfs_agblock_t		agblocks = pag->block_count;
+	xfs_agblock_t		agblocks = pag_group(pag)->xg_block_count;
 
 	/* Bail out if we're uninitialized, which can happen in mkfs. */
 	if (M_IGEO(mp)->inobt_mxr[0] == 0)
diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 50cfb038f03b79..8189b83d0f184a 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -31,6 +31,32 @@
 #include "xfs_metafile.h"
 #include "xfs_metadir.h"
 
+/* Find the first usable fsblock in this rtgroup. */
+static inline uint32_t
+xfs_rtgroup_min_block(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno)
+{
+	if (xfs_has_rtsb(mp) && rgno == 0)
+		return mp->m_sb.sb_rextsize;
+
+	return 0;
+}
+
+/* Precompute this group's geometry */
+void
+xfs_rtgroup_calc_geometry(
+	struct xfs_mount	*mp,
+	struct xfs_rtgroup	*rtg,
+	xfs_rgnumber_t		rgno,
+	xfs_rgnumber_t		rgcount,
+	xfs_rtbxlen_t		rextents)
+{
+	rtg->rtg_extents = __xfs_rtgroup_extents(mp, rgno, rgcount, rextents);
+	rtg_group(rtg)->xg_block_count = rtg->rtg_extents * mp->m_sb.sb_rextsize;
+	rtg_group(rtg)->xg_min_gbno = xfs_rtgroup_min_block(mp, rgno);
+}
+
 int
 xfs_rtgroup_alloc(
 	struct xfs_mount	*mp,
@@ -45,6 +71,8 @@ xfs_rtgroup_alloc(
 	if (!rtg)
 		return -ENOMEM;
 
+	xfs_rtgroup_calc_geometry(mp, rtg, rgno, rgcount, rextents);
+
 	error = xfs_group_insert(mp, rtg_group(rtg), rgno, XG_TYPE_RTG);
 	if (error)
 		goto out_free_rtg;
@@ -146,6 +174,7 @@ xfs_update_last_rtgroup_size(
 		return -EFSCORRUPTED;
 	rtg->rtg_extents = __xfs_rtgroup_extents(mp, prev_rgcount - 1,
 			mp->m_sb.sb_rgcount, mp->m_sb.sb_rextents);
+	rtg_group(rtg)->xg_block_count = rtg->rtg_extents * mp->m_sb.sb_rextsize;
 	xfs_rtgroup_rele(rtg);
 	return 0;
 }
@@ -220,7 +249,7 @@ xfs_rtgroup_get_geometry(
 	/* Fill out form. */
 	memset(rgeo, 0, sizeof(*rgeo));
 	rgeo->rg_number = rtg_rgno(rtg);
-	rgeo->rg_length = rtg->rtg_extents * rtg_mount(rtg)->m_sb.sb_rextsize;
+	rgeo->rg_length = rtg_group(rtg)->xg_block_count;
 	xfs_rtgroup_geom_health(rtg, rgeo);
 	return 0;
 }
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index c15b232e1f8e77..1e51dc62d1143e 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -199,6 +199,9 @@ int xfs_initialize_rtgroups(struct xfs_mount *mp, xfs_rgnumber_t first_rgno,
 xfs_rtxnum_t __xfs_rtgroup_extents(struct xfs_mount *mp, xfs_rgnumber_t rgno,
 		xfs_rgnumber_t rgcount, xfs_rtbxlen_t rextents);
 xfs_rtxnum_t xfs_rtgroup_extents(struct xfs_mount *mp, xfs_rgnumber_t rgno);
+void xfs_rtgroup_calc_geometry(struct xfs_mount *mp, struct xfs_rtgroup *rtg,
+		xfs_rgnumber_t rgno, xfs_rgnumber_t rgcount,
+		xfs_rtbxlen_t rextents);
 
 int xfs_update_last_rtgroup_size(struct xfs_mount *mp,
 		xfs_rgnumber_t prev_rgcount);



* [PATCH 37/52] xfs: implement busy extent tracking for rtgroups
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (35 preceding siblings ...)
  2024-12-23 22:07   ` [PATCH 36/52] xfs: move the min and max group block numbers to xfs_group Darrick J. Wong
@ 2024-12-23 22:07   ` Darrick J. Wong
  2024-12-23 22:08   ` [PATCH 38/52] xfs: use metadir for quota inodes Darrick J. Wong
                     ` (14 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:07 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 7e85fc2394115db56be678b617ed646563926581

For rtgroups filesystems, track newly freed (rt) space through the log
until the rt EFIs have been committed to disk.  This way we ensure that
space cannot be reused until all traces of the old owner are gone.

As a fringe benefit, we now support -o discard on the realtime device.
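
Busy extents are tracked in units of group blocks, so rt extent numbers
have to be converted before they can be compared against them.  The
sketch below shows the shift-or-multiply conversion used by the
xfs_rtx_to_rgbno helper that this patch adds to xfs_rtgroup.h.  The
demo_* names are hypothetical, and the rule that the cached log2 is
negative when the rt extent size is not a power of two is an assumption
based on how that helper tests m_rtxblklog.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cached geometry, mirroring sb_rextsize and m_rtxblklog. */
struct demo_rtgeom {
	uint32_t	rextsize;	/* blocks per rt extent */
	int		rtxblklog;	/* log2(rextsize), or -1 if not a power of 2 */
};

/* Return log2(v) if v is a power of two, otherwise -1. */
static int
demo_log2_if_power2(uint32_t v)
{
	int	log = 0;

	if (v == 0 || (v & (v - 1)) != 0)
		return -1;
	while (v > 1) {
		v >>= 1;
		log++;
	}
	return log;
}

/* Compute the cached geometry for a given rt extent size. */
static struct demo_rtgeom
demo_rtgeom_init(uint32_t rextsize)
{
	struct demo_rtgeom g = {
		.rextsize	= rextsize,
		.rtxblklog	= demo_log2_if_power2(rextsize),
	};

	return g;
}

/* Convert an rt extent number within a group to a group block number. */
static uint32_t
demo_rtx_to_rgbno(const struct demo_rtgeom *g, uint64_t rtx)
{
	if (g->rtxblklog >= 0)
		return (uint32_t)(rtx << g->rtxblklog);
	return (uint32_t)(rtx * g->rextsize);
}
```

Both paths give the same answer whenever the shift is available; the
shift merely avoids a multiplication in the common power-of-two case.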

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtbitmap.c |   11 ++++++++++-
 libxfs/xfs_rtgroup.h  |   13 +++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index e304f07189d3c9..b439fb3c20709f 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -1116,6 +1116,7 @@ xfs_rtfree_blocks(
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 	xfs_extlen_t		mod;
+	int			error;
 
 	ASSERT(rtlen <= XFS_MAX_BMBT_EXTLEN);
 
@@ -1131,8 +1132,16 @@ xfs_rtfree_blocks(
 		return -EIO;
 	}
 
-	return xfs_rtfree_extent(tp, rtg, xfs_rtb_to_rtx(mp, rtbno),
+	error = xfs_rtfree_extent(tp, rtg, xfs_rtb_to_rtx(mp, rtbno),
 			xfs_extlen_to_rtxlen(mp, rtlen));
+	if (error)
+		return error;
+
+	if (xfs_has_rtgroups(mp))
+		xfs_extent_busy_insert(tp, rtg_group(rtg),
+				xfs_rtb_to_rgbno(mp, rtbno), rtlen, 0);
+
+	return 0;
 }
 
 /* Find all the free records within a given range. */
diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h
index 1e51dc62d1143e..7e7e491ff06fa5 100644
--- a/libxfs/xfs_rtgroup.h
+++ b/libxfs/xfs_rtgroup.h
@@ -155,6 +155,19 @@ xfs_rtbno_is_group_start(
 	return (rtbno & mp->m_groups[XG_TYPE_RTG].blkmask) == 0;
 }
 
+/* Convert an rtgroups rt extent number into an rgbno. */
+static inline xfs_rgblock_t
+xfs_rtx_to_rgbno(
+	struct xfs_rtgroup	*rtg,
+	xfs_rtxnum_t		rtx)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+
+	if (likely(mp->m_rtxblklog >= 0))
+		return rtx << mp->m_rtxblklog;
+	return rtx * mp->m_sb.sb_rextsize;
+}
+
 static inline xfs_daddr_t
 xfs_rtb_to_daddr(
 	struct xfs_mount	*mp,



* [PATCH 38/52] xfs: use metadir for quota inodes
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (36 preceding siblings ...)
  2024-12-23 22:07   ` [PATCH 37/52] xfs: implement busy extent tracking for rtgroups Darrick J. Wong
@ 2024-12-23 22:08   ` Darrick J. Wong
  2024-12-23 22:08   ` [PATCH 39/52] xfs: scrub quota file metapaths Darrick J. Wong
                     ` (13 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:08 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: e80fbe1ad8eff7d7d1363e14f1e493d84dd37c84

Store the quota inodes in the /quota metadata directory if metadir is
enabled.  This enables us to stop using the sb_[ugp]uotino fields in the
superblock.  From this point on, all metadata files will be children of
the metadata directory tree root.
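
The resulting layout can be pictured with a small sketch that maps each
quota type to its name under the quota directory.  The component names
("quota", "user", "group", "project") come from this patch; the demo_*
identifiers are hypothetical, and the slash-joined string is only for
display, since the real code looks up one path component at a time
relative to the metadata directory root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for xfs_dqtype_t. */
enum demo_dqtype { DEMO_DQ_USER, DEMO_DQ_GROUP, DEMO_DQ_PROJ };

/* Name of the quota file inside the quota metadata directory. */
static const char *
demo_dqinode_name(enum demo_dqtype type)
{
	switch (type) {
	case DEMO_DQ_USER:
		return "user";
	case DEMO_DQ_GROUP:
		return "group";
	case DEMO_DQ_PROJ:
		return "project";
	}
	return NULL;
}

/* Format "/quota/<name>" into buf; 0 on success, -1 if it does not fit. */
static int
demo_dqinode_fullpath(enum demo_dqtype type, char *buf, size_t len)
{
	const char	*name = demo_dqinode_name(type);
	int		n;

	if (!name)
		return -1;
	n = snprintf(buf, len, "/quota/%s", name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```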

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_dquot_buf.c  |  190 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_quota_defs.h |   43 +++++++++++
 libxfs/xfs_sb.c         |    1 
 3 files changed, 234 insertions(+)


diff --git a/libxfs/xfs_dquot_buf.c b/libxfs/xfs_dquot_buf.c
index db603cab92138f..599d03ac960b7b 100644
--- a/libxfs/xfs_dquot_buf.c
+++ b/libxfs/xfs_dquot_buf.c
@@ -14,6 +14,9 @@
 #include "xfs_quota_defs.h"
 #include "xfs_inode.h"
 #include "xfs_trans.h"
+#include "xfs_health.h"
+#include "xfs_metadir.h"
+#include "xfs_metafile.h"
 
 int
 xfs_calc_dquots_per_chunk(
@@ -321,3 +324,190 @@ xfs_dquot_to_disk_ts(
 
 	return cpu_to_be32(t);
 }
+
+inline unsigned int
+xfs_dqinode_sick_mask(xfs_dqtype_t type)
+{
+	switch (type) {
+	case XFS_DQTYPE_USER:
+		return XFS_SICK_FS_UQUOTA;
+	case XFS_DQTYPE_GROUP:
+		return XFS_SICK_FS_GQUOTA;
+	case XFS_DQTYPE_PROJ:
+		return XFS_SICK_FS_PQUOTA;
+	}
+
+	ASSERT(0);
+	return 0;
+}
+
+/*
+ * Load the inode for a given type of quota, assuming that the sb fields have
+ * been sorted out.  This is not true when switching quota types on a V4
+ * filesystem, so do not use this function for that.  If metadir is enabled,
+ * @dp must be the /quota metadir.
+ *
+ * Returns -ENOENT if the quota inode field is NULLFSINO; 0 and an inode on
+ * success; or a negative errno.
+ */
+int
+xfs_dqinode_load(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	xfs_dqtype_t		type,
+	struct xfs_inode	**ipp)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_inode	*ip;
+	enum xfs_metafile_type	metafile_type = xfs_dqinode_metafile_type(type);
+	int			error;
+
+	if (!xfs_has_metadir(mp)) {
+		xfs_ino_t	ino;
+
+		switch (type) {
+		case XFS_DQTYPE_USER:
+			ino = mp->m_sb.sb_uquotino;
+			break;
+		case XFS_DQTYPE_GROUP:
+			ino = mp->m_sb.sb_gquotino;
+			break;
+		case XFS_DQTYPE_PROJ:
+			ino = mp->m_sb.sb_pquotino;
+			break;
+		default:
+			ASSERT(0);
+			return -EFSCORRUPTED;
+		}
+
+		/* Should have set 0 to NULLFSINO when loading superblock */
+		if (ino == NULLFSINO)
+			return -ENOENT;
+
+		error = xfs_trans_metafile_iget(tp, ino, metafile_type, &ip);
+	} else {
+		error = xfs_metadir_load(tp, dp, xfs_dqinode_path(type),
+				metafile_type, &ip);
+		if (error == -ENOENT)
+			return error;
+	}
+	if (error) {
+		if (xfs_metadata_is_sick(error))
+			xfs_fs_mark_sick(mp, xfs_dqinode_sick_mask(type));
+		return error;
+	}
+
+	if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS &&
+			       ip->i_df.if_format != XFS_DINODE_FMT_BTREE)) {
+		xfs_irele(ip);
+		xfs_fs_mark_sick(mp, xfs_dqinode_sick_mask(type));
+		return -EFSCORRUPTED;
+	}
+
+	if (XFS_IS_CORRUPT(mp, ip->i_projid != 0)) {
+		xfs_irele(ip);
+		xfs_fs_mark_sick(mp, xfs_dqinode_sick_mask(type));
+		return -EFSCORRUPTED;
+	}
+
+	*ipp = ip;
+	return 0;
+}
+
+/* Create a metadata directory quota inode. */
+int
+xfs_dqinode_metadir_create(
+	struct xfs_inode		*dp,
+	xfs_dqtype_t			type,
+	struct xfs_inode		**ipp)
+{
+	struct xfs_metadir_update	upd = {
+		.dp			= dp,
+		.metafile_type		= xfs_dqinode_metafile_type(type),
+		.path			= xfs_dqinode_path(type),
+	};
+	int				error;
+
+	error = xfs_metadir_start_create(&upd);
+	if (error)
+		return error;
+
+	error = xfs_metadir_create(&upd, S_IFREG);
+	if (error)
+		return error;
+
+	xfs_trans_log_inode(upd.tp, upd.ip, XFS_ILOG_CORE);
+
+	error = xfs_metadir_commit(&upd);
+	if (error)
+		return error;
+
+	xfs_finish_inode_setup(upd.ip);
+	*ipp = upd.ip;
+	return 0;
+}
+
+#ifndef __KERNEL__
+/* Link a metadata directory quota inode. */
+int
+xfs_dqinode_metadir_link(
+	struct xfs_inode		*dp,
+	xfs_dqtype_t			type,
+	struct xfs_inode		*ip)
+{
+	struct xfs_metadir_update	upd = {
+		.dp			= dp,
+		.metafile_type		= xfs_dqinode_metafile_type(type),
+		.path			= xfs_dqinode_path(type),
+		.ip			= ip,
+	};
+	int				error;
+
+	error = xfs_metadir_start_link(&upd);
+	if (error)
+		return error;
+
+	error = xfs_metadir_link(&upd);
+	if (error)
+		return error;
+
+	xfs_trans_log_inode(upd.tp, upd.ip, XFS_ILOG_CORE);
+
+	return xfs_metadir_commit(&upd);
+}
+#endif /* __KERNEL__ */
+
+/* Create the parent directory for all quota inodes and load it. */
+int
+xfs_dqinode_mkdir_parent(
+	struct xfs_mount	*mp,
+	struct xfs_inode	**dpp)
+{
+	if (!mp->m_metadirip) {
+		xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR);
+		return -EFSCORRUPTED;
+	}
+
+	return xfs_metadir_mkdir(mp->m_metadirip, "quota", dpp);
+}
+
+/*
+ * Load the parent directory of all quota inodes.  Pass the inode to the caller
+ * because quota functions (e.g. QUOTARM) can be called on the quota files even
+ * if quotas are not enabled.
+ */
+int
+xfs_dqinode_load_parent(
+	struct xfs_trans	*tp,
+	struct xfs_inode	**dpp)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+
+	if (!mp->m_metadirip) {
+		xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR);
+		return -EFSCORRUPTED;
+	}
+
+	return xfs_metadir_load(tp, mp->m_metadirip, "quota", XFS_METAFILE_DIR,
+			dpp);
+}
diff --git a/libxfs/xfs_quota_defs.h b/libxfs/xfs_quota_defs.h
index fb05f44f6c754a..763d941a8420c5 100644
--- a/libxfs/xfs_quota_defs.h
+++ b/libxfs/xfs_quota_defs.h
@@ -143,4 +143,47 @@ time64_t xfs_dquot_from_disk_ts(struct xfs_disk_dquot *ddq,
 		__be32 dtimer);
 __be32 xfs_dquot_to_disk_ts(struct xfs_dquot *ddq, time64_t timer);
 
+static inline const char *
+xfs_dqinode_path(xfs_dqtype_t type)
+{
+	switch (type) {
+	case XFS_DQTYPE_USER:
+		return "user";
+	case XFS_DQTYPE_GROUP:
+		return "group";
+	case XFS_DQTYPE_PROJ:
+		return "project";
+	}
+
+	ASSERT(0);
+	return NULL;
+}
+
+static inline enum xfs_metafile_type
+xfs_dqinode_metafile_type(xfs_dqtype_t type)
+{
+	switch (type) {
+	case XFS_DQTYPE_USER:
+		return XFS_METAFILE_USRQUOTA;
+	case XFS_DQTYPE_GROUP:
+		return XFS_METAFILE_GRPQUOTA;
+	case XFS_DQTYPE_PROJ:
+		return XFS_METAFILE_PRJQUOTA;
+	}
+
+	ASSERT(0);
+	return XFS_METAFILE_UNKNOWN;
+}
+
+unsigned int xfs_dqinode_sick_mask(xfs_dqtype_t type);
+
+int xfs_dqinode_load(struct xfs_trans *tp, struct xfs_inode *dp,
+		xfs_dqtype_t type, struct xfs_inode **ipp);
+int xfs_dqinode_metadir_create(struct xfs_inode *dp, xfs_dqtype_t type,
+		struct xfs_inode **ipp);
+int xfs_dqinode_metadir_link(struct xfs_inode *dp, xfs_dqtype_t type,
+		struct xfs_inode *ip);
+int xfs_dqinode_mkdir_parent(struct xfs_mount *mp, struct xfs_inode **dpp);
+int xfs_dqinode_load_parent(struct xfs_trans *tp, struct xfs_inode **dpp);
+
 #endif	/* __XFS_QUOTA_H__ */
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 2ab234f1dfce70..375324b99261af 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -855,6 +855,7 @@ xfs_sb_quota_to_disk(
 
 	if (xfs_sb_is_v5(from) &&
 	    (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)) {
+		to->sb_qflags = cpu_to_be16(from->sb_qflags);
 		to->sb_uquotino = cpu_to_be64(0);
 		to->sb_gquotino = cpu_to_be64(0);
 		to->sb_pquotino = cpu_to_be64(0);



* [PATCH 39/52] xfs: scrub quota file metapaths
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (37 preceding siblings ...)
  2024-12-23 22:08   ` [PATCH 38/52] xfs: use metadir for quota inodes Darrick J. Wong
@ 2024-12-23 22:08   ` Darrick J. Wong
  2024-12-23 22:08   ` [PATCH 40/52] xfs: enable metadata directory feature Darrick J. Wong
                     ` (12 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:08 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 128a055291ebbc156e219b83d03dc5e63e71d7ce

Enable online fsck for quota file metadata directory paths.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_fs.h |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 96f7d3c95fb4bc..41ce4d3d650ec7 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -825,9 +825,13 @@ struct xfs_scrub_vec_head {
 #define XFS_SCRUB_METAPATH_RTDIR	(1)  /* rtrgroups metadir */
 #define XFS_SCRUB_METAPATH_RTBITMAP	(2)  /* per-rtg bitmap */
 #define XFS_SCRUB_METAPATH_RTSUMMARY	(3)  /* per-rtg summary */
+#define XFS_SCRUB_METAPATH_QUOTADIR	(4)  /* quota metadir */
+#define XFS_SCRUB_METAPATH_USRQUOTA	(5)  /* user quota */
+#define XFS_SCRUB_METAPATH_GRPQUOTA	(6)  /* group quota */
+#define XFS_SCRUB_METAPATH_PRJQUOTA	(7)  /* project quota */
 
 /* Number of metapath sm_ino values */
-#define XFS_SCRUB_METAPATH_NR		(4)
+#define XFS_SCRUB_METAPATH_NR		(8)
 
 /*
  * ioctl limits



* [PATCH 40/52] xfs: enable metadata directory feature
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (38 preceding siblings ...)
  2024-12-23 22:08   ` [PATCH 39/52] xfs: scrub quota file metapaths Darrick J. Wong
@ 2024-12-23 22:08   ` Darrick J. Wong
  2024-12-23 22:08   ` [PATCH 41/52] xfs: convert struct typedefs in xfs_ondisk.h Darrick J. Wong
                     ` (11 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:08 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: ea079efd365e60aa26efea24b57ced4c64640e75

Enable the metadata directory feature.  With this feature, all metadata
inodes are placed in the metadata directory, and the only inumbers in
the superblock are the roots of the two directory trees.

The RT device is now sharded into a number of rtgroups, where 0 rtgroups
means that no RT extents are supported, and the traditional XFS stub RT
bitmap and summary inodes don't exist.  A single rtgroup gives roughly
identical behavior to the traditional RT setup, but now with checksummed
and self-identifying free space metadata.

For quota, the quota options are read from the superblock unless
explicitly overridden via mount options.
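
Enabling a feature here means adding its bit to the mask of known
incompat features, so that a superblock carrying the bit is no longer
treated as unknown.  The sketch below shows that masking logic only.
The DEMO_* names and bit positions are illustrative assumptions and are
not copied from xfs_format.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit assignments; the real values live in xfs_format.h. */
#define DEMO_FEAT_PARENT	(1u << 0)
#define DEMO_FEAT_METADIR	(1u << 1)

/* Supported masks before and after the feature is enabled. */
#define DEMO_FEAT_ALL_BEFORE	(DEMO_FEAT_PARENT)
#define DEMO_FEAT_ALL_AFTER	(DEMO_FEAT_PARENT | DEMO_FEAT_METADIR)

/* True if the superblock sets any incompat bit outside the supported mask. */
static bool
demo_has_unknown_incompat(uint32_t sb_features, uint32_t supported)
{
	return (sb_features & ~supported) != 0;
}
```

A filesystem with the metadir bit set is rejected under the "before"
mask and accepted under the "after" mask, while any bit outside both
masks is still reported as unknown.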

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_format.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index d6c10855ab023b..4d47a3e723aa13 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -403,7 +403,8 @@ xfs_sb_has_ro_compat_feature(
 		 XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR | \
 		 XFS_SB_FEAT_INCOMPAT_NREXT64 | \
 		 XFS_SB_FEAT_INCOMPAT_EXCHRANGE | \
-		 XFS_SB_FEAT_INCOMPAT_PARENT)
+		 XFS_SB_FEAT_INCOMPAT_PARENT | \
+		 XFS_SB_FEAT_INCOMPAT_METADIR)

 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool


* [PATCH 41/52] xfs: convert struct typedefs in xfs_ondisk.h
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (39 preceding siblings ...)
  2024-12-23 22:08   ` [PATCH 40/52] xfs: enable metadata directory feature Darrick J. Wong
@ 2024-12-23 22:08   ` Darrick J. Wong
  2024-12-23 22:09   ` [PATCH 42/52] xfs: separate space btree structures " Darrick J. Wong
                     ` (10 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:08 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 89b38282d1b0f34595f86193cb2bf96e6730060e

Replace xfs_foo_t with struct xfs_foo where appropriate.  The next patch
will import more checks from xfs/122, and it's easier to automate
deduplication if we don't have to reason about typedefs.
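
For readers unfamiliar with these checks, the sketch below shows the
general idea of a compile-time size and offset check on a plain struct
name.  It assumes that XFS_CHECK_STRUCT_SIZE and XFS_CHECK_OFFSET behave
like compile-time assertions; the DEMO_* macros and struct demo_rec are
hypothetical and use C11 static_assert rather than the real macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record: two 32-bit fields, 8 bytes in total. */
struct demo_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

/* Fail the build if the struct size differs from the expected value. */
#define DEMO_CHECK_STRUCT_SIZE(type, size) \
	static_assert(sizeof(type) == (size), "unexpected size of " #type)

/* Fail the build if a member is not at the expected byte offset. */
#define DEMO_CHECK_OFFSET(type, member, off) \
	static_assert(offsetof(type, member) == (off), \
		      "unexpected offset of " #type "." #member)

DEMO_CHECK_STRUCT_SIZE(struct demo_rec, 8);
DEMO_CHECK_OFFSET(struct demo_rec, blockcount, 4);
```

Because every check names the struct directly, the argument text of each
check can be matched against the struct definitions with simple text
tools, without first resolving a typedef.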

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ondisk.h |   74 ++++++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 37 deletions(-)


diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 99eae7f67e961b..98208c8a0d9ed0 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -49,7 +49,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_legacy_timestamp,	8);
 	XFS_CHECK_STRUCT_SIZE(xfs_alloc_key_t,			8);
 	XFS_CHECK_STRUCT_SIZE(xfs_alloc_ptr_t,			4);
-	XFS_CHECK_STRUCT_SIZE(xfs_alloc_rec_t,			8);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_alloc_rec,		8);
 	XFS_CHECK_STRUCT_SIZE(xfs_inobt_ptr_t,			4);
 	XFS_CHECK_STRUCT_SIZE(xfs_refcount_ptr_t,		4);
 	XFS_CHECK_STRUCT_SIZE(xfs_rmap_ptr_t,			4);
@@ -68,10 +68,10 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_free_hdr,		64);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_leaf,		64);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_leaf_hdr,		64);
-	XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_entry_t,		8);
-	XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_hdr_t,		32);
-	XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_map_t,		4);
-	XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_name_local_t,	4);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_entry,		8);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_hdr,		32);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_map,		4);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_name_local,	4);
 
 	/* realtime structures */
 	XFS_CHECK_STRUCT_SIZE(union xfs_rtword_raw,		4);
@@ -79,23 +79,23 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo,		48);
 
 	/*
-	 * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to
-	 * 4 bytes anyway so it's not obviously a problem.  Hence for the moment
-	 * we don't check this structure. This can be re-instated when the attr
-	 * definitions are updated to use c99 VLA definitions.
+	 * m68k has problems with struct xfs_attr_leaf_name_remote, but we pad
+	 * it to 4 bytes anyway so it's not obviously a problem.  Hence for the
+	 * moment we don't check this structure. This can be re-instated when
+	 * the attr definitions are updated to use c99 VLA definitions.
 	 *
-	XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_name_remote_t,	12);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_name_remote,	12);
 	 */
 
 	XFS_CHECK_OFFSET(struct xfs_dsb, sb_crc,		224);
-	XFS_CHECK_OFFSET(xfs_attr_leaf_name_local_t, valuelen,	0);
-	XFS_CHECK_OFFSET(xfs_attr_leaf_name_local_t, namelen,	2);
-	XFS_CHECK_OFFSET(xfs_attr_leaf_name_local_t, nameval,	3);
-	XFS_CHECK_OFFSET(xfs_attr_leaf_name_remote_t, valueblk,	0);
-	XFS_CHECK_OFFSET(xfs_attr_leaf_name_remote_t, valuelen,	4);
-	XFS_CHECK_OFFSET(xfs_attr_leaf_name_remote_t, namelen,	8);
-	XFS_CHECK_OFFSET(xfs_attr_leaf_name_remote_t, name,	9);
-	XFS_CHECK_STRUCT_SIZE(xfs_attr_leafblock_t,		32);
+	XFS_CHECK_OFFSET(struct xfs_attr_leaf_name_local, valuelen,	0);
+	XFS_CHECK_OFFSET(struct xfs_attr_leaf_name_local, namelen,	2);
+	XFS_CHECK_OFFSET(struct xfs_attr_leaf_name_local, nameval,	3);
+	XFS_CHECK_OFFSET(struct xfs_attr_leaf_name_remote, valueblk,	0);
+	XFS_CHECK_OFFSET(struct xfs_attr_leaf_name_remote, valuelen,	4);
+	XFS_CHECK_OFFSET(struct xfs_attr_leaf_name_remote, namelen,	8);
+	XFS_CHECK_OFFSET(struct xfs_attr_leaf_name_remote, name,	9);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leafblock,		32);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr_sf_hdr,		4);
 	XFS_CHECK_OFFSET(struct xfs_attr_sf_hdr, totsize,	0);
 	XFS_CHECK_OFFSET(struct xfs_attr_sf_hdr, count,		2);
@@ -103,25 +103,25 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(struct xfs_attr_sf_entry, valuelen,	1);
 	XFS_CHECK_OFFSET(struct xfs_attr_sf_entry, flags,	2);
 	XFS_CHECK_OFFSET(struct xfs_attr_sf_entry, nameval,	3);
-	XFS_CHECK_STRUCT_SIZE(xfs_da_blkinfo_t,			12);
-	XFS_CHECK_STRUCT_SIZE(xfs_da_intnode_t,			16);
-	XFS_CHECK_STRUCT_SIZE(xfs_da_node_entry_t,		8);
-	XFS_CHECK_STRUCT_SIZE(xfs_da_node_hdr_t,		16);
-	XFS_CHECK_STRUCT_SIZE(xfs_dir2_data_free_t,		4);
-	XFS_CHECK_STRUCT_SIZE(xfs_dir2_data_hdr_t,		16);
-	XFS_CHECK_OFFSET(xfs_dir2_data_unused_t, freetag,	0);
-	XFS_CHECK_OFFSET(xfs_dir2_data_unused_t, length,	2);
-	XFS_CHECK_STRUCT_SIZE(xfs_dir2_free_hdr_t,		16);
-	XFS_CHECK_STRUCT_SIZE(xfs_dir2_free_t,			16);
-	XFS_CHECK_STRUCT_SIZE(xfs_dir2_leaf_entry_t,		8);
-	XFS_CHECK_STRUCT_SIZE(xfs_dir2_leaf_hdr_t,		16);
-	XFS_CHECK_STRUCT_SIZE(xfs_dir2_leaf_t,			16);
-	XFS_CHECK_STRUCT_SIZE(xfs_dir2_leaf_tail_t,		4);
-	XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_entry_t,		3);
-	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, namelen,		0);
-	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, offset,		1);
-	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, name,		3);
-	XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_hdr_t,		10);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_da_blkinfo,		12);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_da_intnode,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_da_node_entry,		8);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_da_node_hdr,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_free,		4);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_hdr,		16);
+	XFS_CHECK_OFFSET(struct xfs_dir2_data_unused, freetag,	0);
+	XFS_CHECK_OFFSET(struct xfs_dir2_data_unused, length,	2);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_free_hdr,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_free,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf_entry,	8);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf_hdr,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf_tail,	4);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_sf_entry,		3);
+	XFS_CHECK_OFFSET(struct xfs_dir2_sf_entry, namelen,	0);
+	XFS_CHECK_OFFSET(struct xfs_dir2_sf_entry, offset,	1);
+	XFS_CHECK_OFFSET(struct xfs_dir2_sf_entry, name,	3);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_sf_hdr,		10);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_rec,		12);
 
 	/* log structures */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 42/52] xfs: separate space btree structures in xfs_ondisk.h
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (40 preceding siblings ...)
  2024-12-23 22:08   ` [PATCH 41/52] xfs: convert struct typedefs in xfs_ondisk.h Darrick J. Wong
@ 2024-12-23 22:09   ` Darrick J. Wong
  2024-12-23 22:09   ` [PATCH 43/52] xfs: port ondisk structure checks from xfs/122 to the kernel Darrick J. Wong
                     ` (9 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:09 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 131a883fffb1a194957dc0e400d9f627c7cd1924

Create a separate section for space management btrees so that they're
not mixed in with file structures.  Ignore the dsb stuff sprinkled
around for now, because we'll deal with that in a subsequent patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ondisk.h |   24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)


diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 98208c8a0d9ed0..7b7c3d6d0d62d9 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -22,38 +22,39 @@
 static inline void __init
 xfs_check_ondisk_structs(void)
 {
-	/* ag/file structures */
+	/* file structures */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_acl,			4);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_acl_entry,		12);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_agf,			224);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_agfl,			36);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_agi,			344);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_bmbt_key,		8);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_bmbt_rec,		16);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_bmdr_block,		4);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_btree_block_shdr,	48);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_btree_block_lhdr,	64);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_btree_block,		72);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dinode,		176);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_disk_dquot,		104);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dqblk,			136);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,			288);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dsymlink_hdr,		56);
+	XFS_CHECK_STRUCT_SIZE(xfs_timestamp_t,			8);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_legacy_timestamp,	8);
+
+	/* space btrees */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_agf,			224);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_agfl,			36);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_agi,			344);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_alloc_rec,		8);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_btree_block,		72);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_btree_block_lhdr,	64);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_btree_block_shdr,	48);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inobt_key,		4);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inobt_rec,		16);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_refcount_key,		4);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_refcount_rec,		12);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_rmap_key,		20);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_rmap_rec,		24);
-	XFS_CHECK_STRUCT_SIZE(xfs_timestamp_t,			8);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_legacy_timestamp,	8);
 	XFS_CHECK_STRUCT_SIZE(xfs_alloc_key_t,			8);
 	XFS_CHECK_STRUCT_SIZE(xfs_alloc_ptr_t,			4);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_alloc_rec,		8);
 	XFS_CHECK_STRUCT_SIZE(xfs_inobt_ptr_t,			4);
 	XFS_CHECK_STRUCT_SIZE(xfs_refcount_ptr_t,		4);
 	XFS_CHECK_STRUCT_SIZE(xfs_rmap_ptr_t,			4);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_rtsb,			56);
 
 	/* dir/attr trees */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_leaf_hdr,	80);
@@ -74,6 +75,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_name_local,	4);
 
 	/* realtime structures */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_rtsb,			56);
 	XFS_CHECK_STRUCT_SIZE(union xfs_rtword_raw,		4);
 	XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_raw,		4);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo,		48);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 43/52] xfs: port ondisk structure checks from xfs/122 to the kernel
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (41 preceding siblings ...)
  2024-12-23 22:09   ` [PATCH 42/52] xfs: separate space btree structures " Darrick J. Wong
@ 2024-12-23 22:09   ` Darrick J. Wong
  2024-12-23 22:09   ` [PATCH 44/52] xfs: remove unknown compat feature check in superblock write validation Darrick J. Wong
                     ` (8 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:09 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 13877bc79d81354c53e91f3c86ac0f7bafe3ba7b

Check ondisk structure sizes and field offsets with every kernel and
userspace build, so we can drop the nonsense in xfs/122.  Roughly
drafted with:

sed -e 's/^offsetof/\tXFS_CHECK_OFFSET/g' \
-e 's/^sizeof/\tXFS_CHECK_STRUCT_SIZE/g' \
-e 's/ = \([0-9]*\)/,\t\t\t\1);/g' \
-e 's/xfs_sb_t/struct xfs_dsb/g' \
-e 's/),/,/g' \
-e 's/xfs_\([a-z0-9_]*\)_t,/struct xfs_\1,/g' \
< tests/xfs/122.out | sort

and then manual fixups.
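
For readers who have not seen this style of check before, here is a
minimal sketch of compile-time layout checks built on C11
static_assert.  The struct demo_rec and the DEMO_* macros are
hypothetical and exist only for illustration; the real macros live in
libxfs/xfs_ondisk.h and may differ in detail.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

/* Hypothetical on-disk record; not an actual XFS structure. */
struct demo_rec {
	uint32_t	startblock;
	uint16_t	blockcount;
	uint8_t		namelen;
	uint8_t		name[1];
};

/* Fail the build if the structure size drifts from the expected value. */
#define DEMO_CHECK_STRUCT_SIZE(structname, size) \
	static_assert(sizeof(structname) == (size), \
		"size of " #structname " is wrong, expected " #size)

/* Fail the build if a member moves to a different offset. */
#define DEMO_CHECK_OFFSET(structname, member, off) \
	static_assert(offsetof(structname, member) == (off), \
		"offset of " #structname "." #member " is wrong, expected " #off)

/* Never needs to run; the checks fire while this is being compiled. */
static inline void
demo_check_ondisk_structs(void)
{
	DEMO_CHECK_STRUCT_SIZE(struct demo_rec,		8);
	DEMO_CHECK_OFFSET(struct demo_rec, startblock,	0);
	DEMO_CHECK_OFFSET(struct demo_rec, blockcount,	4);
	DEMO_CHECK_OFFSET(struct demo_rec, namelen,	6);
	DEMO_CHECK_OFFSET(struct demo_rec, name,	7);
}
```

Because the assertions are evaluated by the compiler, a layout
regression breaks the build itself rather than waiting for a test run
to notice it.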

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ondisk.h |   90 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 88 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 7b7c3d6d0d62d9..ad0dedf00f1898 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -19,6 +19,10 @@
 	static_assert((value) == (expected), \
 		"XFS: value of " #value " is wrong, expected " #expected)
 
+#define XFS_CHECK_SB_OFFSET(field, offset) \
+	XFS_CHECK_OFFSET(struct xfs_dsb, field, offset); \
+	XFS_CHECK_OFFSET(struct xfs_sb, field, offset);
+
 static inline void __init
 xfs_check_ondisk_structs(void)
 {
@@ -31,7 +35,6 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dinode,		176);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_disk_dquot,		104);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dqblk,			136);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,			288);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dsymlink_hdr,		56);
 	XFS_CHECK_STRUCT_SIZE(xfs_timestamp_t,			8);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_legacy_timestamp,	8);
@@ -55,6 +58,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(xfs_inobt_ptr_t,			4);
 	XFS_CHECK_STRUCT_SIZE(xfs_refcount_ptr_t,		4);
 	XFS_CHECK_STRUCT_SIZE(xfs_rmap_ptr_t,			4);
+	XFS_CHECK_STRUCT_SIZE(xfs_bmdr_key_t,			8);
 
 	/* dir/attr trees */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_leaf_hdr,	80);
@@ -89,7 +93,6 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_name_remote,	12);
 	 */
 
-	XFS_CHECK_OFFSET(struct xfs_dsb, sb_crc,		224);
 	XFS_CHECK_OFFSET(struct xfs_attr_leaf_name_local, valuelen,	0);
 	XFS_CHECK_OFFSET(struct xfs_attr_leaf_name_local, namelen,	2);
 	XFS_CHECK_OFFSET(struct xfs_attr_leaf_name_local, nameval,	3);
@@ -126,6 +129,20 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_sf_hdr,		10);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_rec,		12);
 
+	/* ondisk dir/attr structures from xfs/122 */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attr_sf_entry,		3);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_free,	4);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_hdr,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_unused,	6);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_free,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_free_hdr,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf_entry,	8);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf_hdr,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf_tail,	4);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_sf_entry,		3);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_sf_hdr,		10);
+
 	/* log structures */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_buf_log_format,	88);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dq_logformat,		24);
@@ -161,6 +178,11 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(struct xfs_efi_log_format_32, efi_extents,	16);
 	XFS_CHECK_OFFSET(struct xfs_efi_log_format_64, efi_extents,	16);
 
+	/* ondisk log structures from xfs/122 */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_unmount_log_format,		8);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_xmd_log_format,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_xmi_log_format,		88);
+
 	/* parent pointer ioctls */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents_rec,	32);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents,		40);
@@ -205,6 +227,70 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MIN << XFS_DQ_BIGTIME_SHIFT, 4);
 	XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MAX << XFS_DQ_BIGTIME_SHIFT,
 			16299260424LL);
+
+	/* superblock field checks we got from xfs/122 */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_dsb,		288);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_sb,		288);
+	XFS_CHECK_SB_OFFSET(sb_magicnum,		0);
+	XFS_CHECK_SB_OFFSET(sb_blocksize,		4);
+	XFS_CHECK_SB_OFFSET(sb_dblocks,			8);
+	XFS_CHECK_SB_OFFSET(sb_rblocks,			16);
+	XFS_CHECK_SB_OFFSET(sb_rextents,		24);
+	XFS_CHECK_SB_OFFSET(sb_uuid,			32);
+	XFS_CHECK_SB_OFFSET(sb_logstart,		48);
+	XFS_CHECK_SB_OFFSET(sb_rootino,			56);
+	XFS_CHECK_SB_OFFSET(sb_rbmino,			64);
+	XFS_CHECK_SB_OFFSET(sb_rsumino,			72);
+	XFS_CHECK_SB_OFFSET(sb_rextsize,		80);
+	XFS_CHECK_SB_OFFSET(sb_agblocks,		84);
+	XFS_CHECK_SB_OFFSET(sb_agcount,			88);
+	XFS_CHECK_SB_OFFSET(sb_rbmblocks,		92);
+	XFS_CHECK_SB_OFFSET(sb_logblocks,		96);
+	XFS_CHECK_SB_OFFSET(sb_versionnum,		100);
+	XFS_CHECK_SB_OFFSET(sb_sectsize,		102);
+	XFS_CHECK_SB_OFFSET(sb_inodesize,		104);
+	XFS_CHECK_SB_OFFSET(sb_inopblock,		106);
+	XFS_CHECK_SB_OFFSET(sb_blocklog,		120);
+	XFS_CHECK_SB_OFFSET(sb_fname[12],		120);
+	XFS_CHECK_SB_OFFSET(sb_sectlog,			121);
+	XFS_CHECK_SB_OFFSET(sb_inodelog,		122);
+	XFS_CHECK_SB_OFFSET(sb_inopblog,		123);
+	XFS_CHECK_SB_OFFSET(sb_agblklog,		124);
+	XFS_CHECK_SB_OFFSET(sb_rextslog,		125);
+	XFS_CHECK_SB_OFFSET(sb_inprogress,		126);
+	XFS_CHECK_SB_OFFSET(sb_imax_pct,		127);
+	XFS_CHECK_SB_OFFSET(sb_icount,			128);
+	XFS_CHECK_SB_OFFSET(sb_ifree,			136);
+	XFS_CHECK_SB_OFFSET(sb_fdblocks,		144);
+	XFS_CHECK_SB_OFFSET(sb_frextents,		152);
+	XFS_CHECK_SB_OFFSET(sb_uquotino,		160);
+	XFS_CHECK_SB_OFFSET(sb_gquotino,		168);
+	XFS_CHECK_SB_OFFSET(sb_qflags,			176);
+	XFS_CHECK_SB_OFFSET(sb_flags,			178);
+	XFS_CHECK_SB_OFFSET(sb_shared_vn,		179);
+	XFS_CHECK_SB_OFFSET(sb_inoalignmt,		180);
+	XFS_CHECK_SB_OFFSET(sb_unit,			184);
+	XFS_CHECK_SB_OFFSET(sb_width,			188);
+	XFS_CHECK_SB_OFFSET(sb_dirblklog,		192);
+	XFS_CHECK_SB_OFFSET(sb_logsectlog,		193);
+	XFS_CHECK_SB_OFFSET(sb_logsectsize,		194);
+	XFS_CHECK_SB_OFFSET(sb_logsunit,		196);
+	XFS_CHECK_SB_OFFSET(sb_features2,		200);
+	XFS_CHECK_SB_OFFSET(sb_bad_features2,		204);
+	XFS_CHECK_SB_OFFSET(sb_features_compat,		208);
+	XFS_CHECK_SB_OFFSET(sb_features_ro_compat,	212);
+	XFS_CHECK_SB_OFFSET(sb_features_incompat,	216);
+	XFS_CHECK_SB_OFFSET(sb_features_log_incompat,	220);
+	XFS_CHECK_SB_OFFSET(sb_crc,			224);
+	XFS_CHECK_SB_OFFSET(sb_spino_align,		228);
+	XFS_CHECK_SB_OFFSET(sb_pquotino,		232);
+	XFS_CHECK_SB_OFFSET(sb_lsn,			240);
+	XFS_CHECK_SB_OFFSET(sb_meta_uuid,		248);
+	XFS_CHECK_SB_OFFSET(sb_metadirino,		264);
+	XFS_CHECK_SB_OFFSET(sb_rgcount,			272);
+	XFS_CHECK_SB_OFFSET(sb_rgextents,		276);
+	XFS_CHECK_SB_OFFSET(sb_rgblklog,		280);
+	XFS_CHECK_SB_OFFSET(sb_pad,			281);
 }
 
 #endif /* __XFS_ONDISK_H */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 44/52] xfs: remove unknown compat feature check in superblock write validation
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (42 preceding siblings ...)
  2024-12-23 22:09   ` [PATCH 43/52] xfs: port ondisk structure checks from xfs/122 to the kernel Darrick J. Wong
@ 2024-12-23 22:09   ` Darrick J. Wong
  2024-12-23 22:09   ` [PATCH 45/52] xfs: fix sparse inode limits on runt AG Darrick J. Wong
                     ` (7 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:09 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: stable, leo.lilong, hch, cem, linux-xfs

From: Long Li <leo.lilong@huawei.com>

Source kernel commit: 652f03db897ba24f9c4b269e254ccc6cc01ff1b7

Compat features are new features that older kernels can safely ignore,
allowing read-write mounts without issues. The current sb write validation
implementation returns -EFSCORRUPTED for unknown compat features,
preventing filesystem write operations and contradicting the feature's
definition.

Additionally, if the mounted image is unclean, the log recovery may need
to write to the superblock. Returning an error for unknown compat features
during sb write validation can cause mount failures.

Although XFS currently does not use compat feature flags, this issue
affects current kernels' ability to mount images that may use compat
feature flags in the future.

Since superblock read validation already warns about unknown compat
features, it's unnecessary to repeat this warning during write validation.
Therefore, the relevant code in write validation is being removed.
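
As a rough illustration of the three feature classes, the sketch below
shows how unknown bits in each class are expected to affect a mount.
The DEMO_* masks, the enum, and demo_check_features are hypothetical
names made up for this example; this is not the libxfs code, only the
policy it is meant to follow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks of the feature bits this implementation knows. */
#define DEMO_COMPAT_KNOWN	0x0u
#define DEMO_RO_COMPAT_KNOWN	0x7u
#define DEMO_INCOMPAT_KNOWN	0x3u

enum demo_mount {
	DEMO_MOUNT_RW,		/* read-write mount is fine */
	DEMO_MOUNT_RO_ONLY,	/* only a read-only mount is safe */
	DEMO_MOUNT_REFUSE,	/* do not mount at all */
};

static enum demo_mount
demo_check_features(
	uint32_t	compat,
	uint32_t	ro_compat,
	uint32_t	incompat)
{
	/* Unknown compat bits are ignorable and never restrict the mount. */
	(void)compat;

	/* Unknown incompat bits mean we cannot interpret the filesystem. */
	if (incompat & ~DEMO_INCOMPAT_KNOWN)
		return DEMO_MOUNT_REFUSE;

	/* Unknown ro_compat bits mean we may read but must not write. */
	if (ro_compat & ~DEMO_RO_COMPAT_KNOWN)
		return DEMO_MOUNT_RO_ONLY;

	return DEMO_MOUNT_RW;
}
```

With this policy, demo_check_features(0x80, 0, 0) still yields
DEMO_MOUNT_RW, which is the behavior the write verifier was
contradicting before this patch.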

Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
---
 libxfs/xfs_sb.c |    7 -------
 1 file changed, 7 deletions(-)

diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 375324b99261af..87f740e6c75dce 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -323,13 +323,6 @@ xfs_validate_sb_write(
 	 * the kernel cannot support since we checked for unsupported bits in
 	 * the read verifier, which means that memory is corrupt.
 	 */
-	if (xfs_sb_has_compat_feature(sbp, XFS_SB_FEAT_COMPAT_UNKNOWN)) {
-		xfs_warn(mp,
-"Corruption detected in superblock compatible features (0x%x)!",
-			(sbp->sb_features_compat & XFS_SB_FEAT_COMPAT_UNKNOWN));
-		return -EFSCORRUPTED;
-	}
-
 	if (!xfs_is_readonly(mp) &&
 	    xfs_sb_has_ro_compat_feature(sbp, XFS_SB_FEAT_RO_COMPAT_UNKNOWN)) {
 		xfs_alert(mp,

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 45/52] xfs: fix sparse inode limits on runt AG
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (43 preceding siblings ...)
  2024-12-23 22:09   ` [PATCH 44/52] xfs: remove unknown compat feature check in superblock write validation Darrick J. Wong
@ 2024-12-23 22:09   ` Darrick J. Wong
  2024-12-23 22:10   ` [PATCH 46/52] xfs: switch to multigrain timestamps Darrick J. Wong
                     ` (6 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:09 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: dchinner, cem, hch, linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Source kernel commit: 13325333582d4820d39b9e8f63d6a54e745585d9

The runt AG at the end of a filesystem is almost always smaller than
mp->m_sb.sb_agblocks. Unfortunately, when setting the max_agbno
limit for the inode chunk allocation, we do not take this into
account. This means we can allocate a sparse inode chunk that
extends beyond the end of the AG. When we go to allocate an inode
from that sparse chunk, the irec fails validation because the
agbno of the start of the irec is beyond valid limits for the runt
AG.

Prevent this from happening by taking into account the size of the
runt AG when allocating inode chunks. Also convert the various
checks for valid inode chunk agbnos to use xfs_ag_block_count()
so that they will also catch such issues in the future.
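
To make the arithmetic concrete, here is a small sketch with made-up
geometry.  All demo_* helpers are hypothetical stand-ins:
demo_ag_block_count only approximates what xfs_ag_block_count is
understood to compute, and demo_round_down uses a plain modulo rather
than the kernel's power-of-two round_down.

```c
#include <assert.h>
#include <stdint.h>

/* Blocks in AG agno; only the last AG can be shorter than agblocks. */
static uint32_t
demo_ag_block_count(
	uint64_t	dblocks,
	uint32_t	agblocks,
	uint32_t	agcount,
	uint32_t	agno)
{
	assert(agno < agcount);
	if (agno < agcount - 1)
		return agblocks;
	return (uint32_t)(dblocks - (uint64_t)agno * agblocks);
}

/* Round x down to a multiple of align (align must be nonzero). */
static uint32_t
demo_round_down(
	uint32_t	x,
	uint32_t	align)
{
	return x - (x % align);
}

/*
 * Highest agbno at which a sparse inode chunk may start.  Assumes the
 * AG holds at least ialloc_blks blocks; no underflow handling here.
 */
static uint32_t
demo_sparse_max_agbno(
	uint32_t	ag_blocks,
	uint32_t	inoalignmt,
	uint32_t	ialloc_blks)
{
	return demo_round_down(ag_blocks, inoalignmt) - ialloc_blks;
}
```

For a filesystem of 3300 blocks with 1000-block AGs, the fourth AG
holds only 300 blocks.  With an alignment of 4 and 8 blocks per inode
chunk, the limit derived from sb_agblocks is 992, whereas the limit
derived from the real AG size is 292.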

Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_ialloc.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)


diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 2575447f92dfbb..63ce76755eb77f 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -848,7 +848,8 @@ xfs_ialloc_ag_alloc(
 		 * the end of the AG.
 		 */
 		args.min_agbno = args.mp->m_sb.sb_inoalignmt;
-		args.max_agbno = round_down(args.mp->m_sb.sb_agblocks,
+		args.max_agbno = round_down(xfs_ag_block_count(args.mp,
+							pag_agno(pag)),
 					    args.mp->m_sb.sb_inoalignmt) -
 				 igeo->ialloc_blks;
 
@@ -2344,9 +2345,9 @@ xfs_difree(
 		return -EINVAL;
 	}
 	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
-	if (agbno >= mp->m_sb.sb_agblocks)  {
-		xfs_warn(mp, "%s: agbno >= mp->m_sb.sb_agblocks (%d >= %d).",
-			__func__, agbno, mp->m_sb.sb_agblocks);
+	if (agbno >= xfs_ag_block_count(mp, pag_agno(pag))) {
+		xfs_warn(mp, "%s: agbno >= xfs_ag_block_count (%d >= %d).",
+			__func__, agbno, xfs_ag_block_count(mp, pag_agno(pag)));
 		ASSERT(0);
 		return -EINVAL;
 	}
@@ -2469,7 +2470,7 @@ xfs_imap(
 	 */
 	agino = XFS_INO_TO_AGINO(mp, ino);
 	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
-	if (agbno >= mp->m_sb.sb_agblocks ||
+	if (agbno >= xfs_ag_block_count(mp, pag_agno(pag)) ||
 	    ino != xfs_agino_to_ino(pag, agino)) {
 		error = -EINVAL;
 #ifdef DEBUG
@@ -2479,11 +2480,12 @@ xfs_imap(
 		 */
 		if (flags & XFS_IGET_UNTRUSTED)
 			return error;
-		if (agbno >= mp->m_sb.sb_agblocks) {
+		if (agbno >= xfs_ag_block_count(mp, pag_agno(pag))) {
 			xfs_alert(mp,
 		"%s: agbno (0x%llx) >= mp->m_sb.sb_agblocks (0x%lx)",
 				__func__, (unsigned long long)agbno,
-				(unsigned long)mp->m_sb.sb_agblocks);
+				(unsigned long)xfs_ag_block_count(mp,
+							pag_agno(pag)));
 		}
 		if (ino != xfs_agino_to_ino(pag, agino)) {
 			xfs_alert(mp,


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 46/52] xfs: switch to multigrain timestamps
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (44 preceding siblings ...)
  2024-12-23 22:09   ` [PATCH 45/52] xfs: fix sparse inode limits on runt AG Darrick J. Wong
@ 2024-12-23 22:10   ` Darrick J. Wong
  2024-12-23 22:10   ` [PATCH 47/52] xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay Darrick J. Wong
                     ` (5 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:10 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: josef, rdunlap, jlayton, brauner, hch, linux-xfs

From: Jeff Layton <jlayton@kernel.org>

Source kernel commit: 1cf7e834a6fb84de9d1e038d6cf4c5bd0d202ffa

Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.

Also, anytime the mtime changes, the ctime must also change, and those
are now the only two options for xfs_trans_ichgtime. Have that function
unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is
always set.

Finally, stop setting STATX_CHANGE_COOKIE in getattr, since the ctime
should give us better semantics now.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # documentation bits
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241002-mgtime-v10-9-d1c4717f5284@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_trans_inode.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


diff --git a/libxfs/xfs_trans_inode.c b/libxfs/xfs_trans_inode.c
index 45b513bc5ceb40..90eec4d3592dea 100644
--- a/libxfs/xfs_trans_inode.c
+++ b/libxfs/xfs_trans_inode.c
@@ -59,12 +59,12 @@ xfs_trans_ichgtime(
 	ASSERT(tp);
 	xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
 
-	tv = current_time(inode);
+	/* If the mtime changes, then ctime must also change */
+	ASSERT(flags & XFS_ICHGTIME_CHG);
 
+	tv = inode_set_ctime_current(inode);
 	if (flags & XFS_ICHGTIME_MOD)
 		inode_set_mtime_to_ts(inode, tv);
-	if (flags & XFS_ICHGTIME_CHG)
-		inode_set_ctime_to_ts(inode, tv);
 	if (flags & XFS_ICHGTIME_ACCESS)
 		inode_set_atime_to_ts(inode, tv);
 	if (flags & XFS_ICHGTIME_CREATE)


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 47/52] xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (45 preceding siblings ...)
  2024-12-23 22:10   ` [PATCH 46/52] xfs: switch to multigrain timestamps Darrick J. Wong
@ 2024-12-23 22:10   ` Darrick J. Wong
  2024-12-23 22:10   ` [PATCH 48/52] xfs: return a 64-bit block count from xfs_btree_count_blocks Darrick J. Wong
                     ` (4 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:10 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, cem, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Source kernel commit: cc2dba08cc33daf8acd6e560957ef0e0f4d034ed

xfs_bmap_add_extent_hole_delay works entirely on delalloc extents, for
which xfs_bmap_same_rtgroup doesn't make sense.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
---
 libxfs/xfs_bmap.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)


diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 552d292bf35412..16ee7403a75f3b 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -2614,8 +2614,7 @@ xfs_bmap_add_extent_hole_delay(
 	 */
 	if ((state & BMAP_LEFT_VALID) && (state & BMAP_LEFT_DELAY) &&
 	    left.br_startoff + left.br_blockcount == new->br_startoff &&
-	    left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
-	    xfs_bmap_same_rtgroup(ip, whichfork, &left, new))
+	    left.br_blockcount + new->br_blockcount <= XFS_MAX_BMBT_EXTLEN)
 		state |= BMAP_LEFT_CONTIG;
 
 	if ((state & BMAP_RIGHT_VALID) && (state & BMAP_RIGHT_DELAY) &&
@@ -2623,8 +2622,7 @@ xfs_bmap_add_extent_hole_delay(
 	    new->br_blockcount + right.br_blockcount <= XFS_MAX_BMBT_EXTLEN &&
 	    (!(state & BMAP_LEFT_CONTIG) ||
 	     (left.br_blockcount + new->br_blockcount +
-	      right.br_blockcount <= XFS_MAX_BMBT_EXTLEN)) &&
-	    xfs_bmap_same_rtgroup(ip, whichfork, new, &right))
+	      right.br_blockcount <= XFS_MAX_BMBT_EXTLEN)))
 		state |= BMAP_RIGHT_CONTIG;
 
 	/*


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 48/52] xfs: return a 64-bit block count from xfs_btree_count_blocks
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (46 preceding siblings ...)
  2024-12-23 22:10   ` [PATCH 47/52] xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay Darrick J. Wong
@ 2024-12-23 22:10   ` Darrick J. Wong
  2024-12-23 22:10   ` [PATCH 49/52] xfs: fix error bailout in xfs_rtginode_create Darrick J. Wong
                     ` (3 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:10 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: stable, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: bd27c7bcdca25ce8067ebb94ded6ac1bd7b47317

With the nrext64 feature enabled, it's possible for a data fork to have
2^48 extent mappings.  Even with a 64k fsblock size, that maps out to
a bmbt containing more than 2^32 blocks.  Therefore, this predicate must
return a u64 count to avoid an integer wraparound that will cause scrub
to do the wrong thing.

It's unlikely that any such filesystem currently exists, because the
incore bmbt would consume more than 64GB of kernel memory on its own,
and so far nobody except me has driven a filesystem that far, judging
from the lack of complaints.
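
The numbers above can be checked with a short sketch.  The demo_*
names are hypothetical; the 72-byte block header and 16-byte record
sizes are taken from the struct xfs_btree_block and struct
xfs_bmbt_rec size checks elsewhere in this series, and only leaf
blocks are counted, so the real bmbt would be somewhat larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BTREE_BLOCK_HDR	72u	/* sizeof(struct xfs_btree_block) */
#define DEMO_BMBT_REC_SIZE	16u	/* sizeof(struct xfs_bmbt_rec) */

/* Fewest leaf blocks needed to hold nextents mapping records. */
static uint64_t
demo_min_bmbt_leaves(
	uint64_t	nextents,
	uint32_t	blocksize)
{
	uint64_t	recs_per_leaf;

	recs_per_leaf = (blocksize - DEMO_BTREE_BLOCK_HDR) / DEMO_BMBT_REC_SIZE;
	return (nextents + recs_per_leaf - 1) / recs_per_leaf;
}

/* Would storing this count in a 32-bit variable lose information? */
static bool
demo_wraps_in_32_bits(
	uint64_t	nblocks)
{
	return (uint64_t)(uint32_t)nblocks != nblocks;
}
```

With 2^48 extents and 64k blocks, each leaf holds 4091 records, so
the leaf count alone is well above 2^32 and a 32-bit counter would
wrap.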

Cc: <stable@vger.kernel.org> # v5.19
Fixes: df9ad5cc7a5240 ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_btree.c        |    4 ++--
 libxfs/xfs_btree.h        |    2 +-
 libxfs/xfs_ialloc_btree.c |    4 +++-
 3 files changed, 6 insertions(+), 4 deletions(-)


diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 3d870f3f4a5165..5c293ccf623336 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -5142,7 +5142,7 @@ xfs_btree_count_blocks_helper(
 	int			level,
 	void			*data)
 {
-	xfs_extlen_t		*blocks = data;
+	xfs_filblks_t		*blocks = data;
 	(*blocks)++;
 
 	return 0;
@@ -5152,7 +5152,7 @@ xfs_btree_count_blocks_helper(
 int
 xfs_btree_count_blocks(
 	struct xfs_btree_cur	*cur,
-	xfs_extlen_t		*blocks)
+	xfs_filblks_t		*blocks)
 {
 	*blocks = 0;
 	return xfs_btree_visit_blocks(cur, xfs_btree_count_blocks_helper,
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 3b739459ebb0f4..c5bff273cae255 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -484,7 +484,7 @@ typedef int (*xfs_btree_visit_blocks_fn)(struct xfs_btree_cur *cur, int level,
 int xfs_btree_visit_blocks(struct xfs_btree_cur *cur,
 		xfs_btree_visit_blocks_fn fn, unsigned int flags, void *data);
 
-int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_extlen_t *blocks);
+int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_filblks_t *blocks);
 
 union xfs_btree_rec *xfs_btree_rec_addr(struct xfs_btree_cur *cur, int n,
 		struct xfs_btree_block *block);
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index 19fca9fad62b1d..4cccac145dc775 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -743,6 +743,7 @@ xfs_finobt_count_blocks(
 {
 	struct xfs_buf		*agbp = NULL;
 	struct xfs_btree_cur	*cur;
+	xfs_filblks_t		blocks;
 	int			error;
 
 	error = xfs_ialloc_read_agi(pag, tp, 0, &agbp);
@@ -750,9 +751,10 @@ xfs_finobt_count_blocks(
 		return error;
 
 	cur = xfs_finobt_init_cursor(pag, tp, agbp);
-	error = xfs_btree_count_blocks(cur, tree_blocks);
+	error = xfs_btree_count_blocks(cur, &blocks);
 	xfs_btree_del_cursor(cur, error);
 	xfs_trans_brelse(tp, agbp);
+	*tree_blocks = blocks;
 
 	return error;
 }


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 49/52] xfs: fix error bailout in xfs_rtginode_create
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (47 preceding siblings ...)
  2024-12-23 22:10   ` [PATCH 48/52] xfs: return a 64-bit block count from xfs_btree_count_blocks Darrick J. Wong
@ 2024-12-23 22:10   ` Darrick J. Wong
  2024-12-23 22:11   ` [PATCH 50/52] xfs: update btree keys correctly when _insrec splits an inode root block Darrick J. Wong
                     ` (2 subsequent siblings)
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:10 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: stable, dan.carpenter, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 23bee6f390a12d0c4c51fefc083704bc5dac377e

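smatch reported that we screwed up the error cleanup in this function:
if xfs_metadir_create fails, we return directly instead of going
through the out_cancel cleanup path.  Fix it.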
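
The general shape of the bug class is sketched below.  Everything
here is hypothetical (the demo_* helpers and the outstanding-resource
counter are invented for illustration) and it is not the
xfs_rtginode_create code; it only shows why every failure after a
successful start step has to go through the cleanup label.

```c
#include <assert.h>

/* Number of started-but-unfinished updates; nonzero means a leak. */
static int demo_outstanding;

static int demo_start(void)	{ demo_outstanding++; return 0; }
static void demo_commit(void)	{ demo_outstanding--; }
static void demo_cancel(void)	{ demo_outstanding--; }

/* fail_create != 0 simulates a failure in the middle step. */
static int
demo_create(
	int	fail_create)
{
	int	error;

	error = demo_start();
	if (error)
		return error;	/* nothing to undo yet */

	error = fail_create ? -5 : 0;
	if (error)
		goto out_cancel;	/* a bare return here would leak */

	demo_commit();
	return 0;

out_cancel:
	demo_cancel();
	return error;
}
```

After either demo_create(1) or demo_create(0), demo_outstanding is
back to zero; a direct return at the middle step would leave it at
one.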

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: ae897e0bed0f54 ("xfs: support creating per-RTG files in growfs")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtgroup.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 8189b83d0f184a..aaaec2a1cef9e5 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -493,7 +493,7 @@ xfs_rtginode_create(
 
 	error = xfs_metadir_create(&upd, S_IFREG);
 	if (error)
-		return error;
+		goto out_cancel;
 
 	xfs_rtginode_lockdep_setup(upd.ip, rtg_rgno(rtg), type);
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 50/52] xfs: update btree keys correctly when _insrec splits an inode root block
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (48 preceding siblings ...)
  2024-12-23 22:10   ` [PATCH 49/52] xfs: fix error bailout in xfs_rtginode_create Darrick J. Wong
@ 2024-12-23 22:11   ` Darrick J. Wong
  2024-12-23 22:11   ` [PATCH 51/52] xfs: fix sb_spino_align checks for large fsblock sizes Darrick J. Wong
  2024-12-23 22:11   ` [PATCH 52/52] xfs: return from xfs_symlink_verify early on V4 filesystems Darrick J. Wong
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:11 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: stable, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 6d7b4bc1c3e00b1a25b7a05141a64337b4629337

In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.

However, I missed a subtlety about the way inode-rooted btrees work.  If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block.  We don't
pass a pointer to the new block to the caller because that work has
already been done.  The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.

This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.
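
For readers following along, the three cases can be modelled in a small
standalone sketch.  This is not libxfs code: the enum, the function name and
the three boolean inputs are invented for illustration.  They stand in for
"the make-unfull step handed back a new block pointer", "the record landed in
a block other than the one originally targeted", and "the regular key update
is needed".

```c
#include <stdbool.h>

/* Hypothetical labels for the possible outcomes; not libxfs names. */
enum key_action {
	KEY_ACTION_NONE,		/* keys are already correct */
	KEY_ACTION_PASS_UP_NEW_KEY,	/* refresh the key handed back to the caller */
	KEY_ACTION_UPDATE_KEYS,		/* walk towards the root, updating keys */
};

/*
 * Toy model of the decision made after the record has been inserted.
 * A regular split returns a new block pointer, so the caller still has to
 * add that block to the parent level and needs the correct key for it.
 * An inode root split returns no pointer because the new block is already
 * linked into the tree, so only the regular key update applies.
 */
static enum key_action
choose_key_action(
	bool	have_new_ptr,
	bool	record_in_new_block,
	bool	needs_key_update)
{
	if (have_new_ptr && record_in_new_block)
		return KEY_ACTION_PASS_UP_NEW_KEY;
	if (needs_key_update)
		return KEY_ACTION_UPDATE_KEYS;
	return KEY_ACTION_NONE;
}
```

In this model the inode root case is (false, true, true), which now selects
the regular key update instead of refreshing a key that nobody will consume.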

Cc: <stable@vger.kernel.org> # v4.8
Fixes: 2c813ad66a7218 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_btree.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)


diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 5c293ccf623336..f4c4db62e2069e 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -3555,14 +3555,31 @@ xfs_btree_insrec(
 	xfs_btree_log_block(cur, bp, XFS_BB_NUMRECS);
 
 	/*
-	 * If we just inserted into a new tree block, we have to
-	 * recalculate nkey here because nkey is out of date.
+	 * Update btree keys to reflect the newly added record or keyptr.
+	 * There are three cases here to be aware of.  Normally, all we have to
+	 * do is walk towards the root, updating keys as necessary.
 	 *
-	 * Otherwise we're just updating an existing block (having shoved
-	 * some records into the new tree block), so use the regular key
-	 * update mechanism.
+	 * If the caller had us target a full block for the insertion, we dealt
+	 * with that by calling the _make_block_unfull function.  If the
+	 * "make unfull" function splits the block, it'll hand us back the key
+	 * and pointer of the new block.  We haven't yet added the new block to
+	 * the next level up, so if we decide to add the new record to the new
+	 * block (bp->b_bn != old_bn), we have to update the caller's pointer
+	 * so that the caller adds the new block with the correct key.
+	 *
+	 * However, there is a third possibility-- if the selected block is the
+	 * root block of an inode-rooted btree and cannot be expanded further,
+	 * the "make unfull" function moves the root block contents to a new
+	 * block and updates the root block to point to the new block.  In this
+	 * case, no block pointer is passed back because the block has already
+	 * been added to the btree.  In this case, we need to use the regular
+	 * key update function, just like the first case.  This is critical for
+	 * overlapping btrees, because the high key must be updated to reflect
+	 * the entire tree, not just the subtree accessible through the first
+	 * child of the root (which is now two levels down from the root).
 	 */
-	if (bp && xfs_buf_daddr(bp) != old_bn) {
+	if (!xfs_btree_ptr_is_null(cur, &nptr) &&
+	    bp && xfs_buf_daddr(bp) != old_bn) {
 		xfs_btree_get_keys(cur, block, lkey);
 	} else if (xfs_btree_needs_key_update(cur, optr)) {
 		error = xfs_btree_update_keys(cur, level);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 51/52] xfs: fix sb_spino_align checks for large fsblock sizes
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (49 preceding siblings ...)
  2024-12-23 22:11   ` [PATCH 50/52] xfs: update btree keys correctly when _insrec splits an inode root block Darrick J. Wong
@ 2024-12-23 22:11   ` Darrick J. Wong
  2024-12-23 22:11   ` [PATCH 52/52] xfs: return from xfs_symlink_verify early on V4 filesystems Darrick J. Wong
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:11 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: stable, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 7f8a44f37229fc76bfcafa341a4b8862368ef44a

For a sparse inodes filesystem, mkfs.xfs computes the values of
sb_spino_align and sb_inoalignmt with the following code:

	int     cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;

	if (cfg->sb_feat.crcs_enabled)
		cluster_size *= cfg->inodesize / XFS_DINODE_MIN_SIZE;

	sbp->sb_spino_align = cluster_size >> cfg->blocklog;
	sbp->sb_inoalignmt = XFS_INODES_PER_CHUNK *
			cfg->inodesize >> cfg->blocklog;

On a V5 filesystem with 64k fsblocks and 512 byte inodes, this results
in cluster_size = 8192 * (512 / 256) = 16384.  As a result,
sb_spino_align and sb_inoalignmt are both set to zero.  Unfortunately,
this trips the new sb_spino_align check that was just added to
xfs_validate_sb_common, and the mkfs fails:

# mkfs.xfs -f -b size=64k, /dev/sda
meta-data=/dev/sda               isize=512    agcount=4, agsize=81136 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=0   metadir=0
data     =                       bsize=65536  blocks=324544, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=65536  blocks=5006, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 extents
Discarding blocks...Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: writing AG headers failed, err=22

Prior to commit 59e43f5479cce1 this all worked fine, even if "sparse"
inodes are somewhat meaningless when everything fits in a single
fsblock.  Adjust the checks to handle existing filesystems.
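
The arithmetic can be replayed in a standalone sketch.  This is not mkfs or
libxfs code: the struct and function names are invented, and the constants
are the values used in the calculation above (8192 and 256) plus 64 inodes
per chunk.  It compares the check before and after this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants as used in the commit message arithmetic. */
#define INODE_BIG_CLUSTER_SIZE	8192
#define DINODE_MIN_SIZE		256
#define INODES_PER_CHUNK	64

/* Hypothetical container for the two superblock fields of interest. */
struct ino_align {
	uint32_t	spino_align;
	uint32_t	inoalignmt;
};

/* Mirror of the mkfs computation quoted above. */
static struct ino_align
compute_align(unsigned int inodesize, unsigned int blocklog, bool crc)
{
	struct ino_align	a;
	unsigned int		cluster_size = INODE_BIG_CLUSTER_SIZE;

	if (crc)
		cluster_size *= inodesize / DINODE_MIN_SIZE;

	a.spino_align = cluster_size >> blocklog;
	a.inoalignmt = (INODES_PER_CHUNK * inodesize) >> blocklog;
	return a;
}

/* Check before this patch: a zero sb_spino_align is always rejected. */
static bool
old_check_ok(struct ino_align a)
{
	return !(!a.spino_align ||
		 a.spino_align > a.inoalignmt ||
		 (a.inoalignmt % a.spino_align) != 0);
}

/* Check after this patch: zero is tolerated, nonzero must divide evenly. */
static bool
new_check_ok(struct ino_align a)
{
	return !(a.spino_align &&
		 (a.spino_align > a.inoalignmt ||
		  (a.inoalignmt % a.spino_align) != 0));
}
```

With 512 byte inodes, blocklog=16 (64k fsblocks) yields 0 for both fields,
which the old check rejects and the new one accepts.  With blocklog=12 (4k
fsblocks) the fields come out as 4 and 8, which both checks accept.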

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 59e43f5479cce1 ("xfs: sb_spino_align is not verified")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_sb.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 87f740e6c75dce..ff23803a8065bf 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -491,12 +491,13 @@ xfs_validate_sb_common(
 				return -EINVAL;
 			}

-			if (!sbp->sb_spino_align ||
-			    sbp->sb_spino_align > sbp->sb_inoalignmt ||
-			    (sbp->sb_inoalignmt % sbp->sb_spino_align) != 0) {
+			if (sbp->sb_spino_align &&
+			    (sbp->sb_spino_align > sbp->sb_inoalignmt ||
+			     (sbp->sb_inoalignmt % sbp->sb_spino_align) != 0)) {
 				xfs_warn(mp,
-				"Sparse inode alignment (%u) is invalid.",
-					sbp->sb_spino_align);
+"Sparse inode alignment (%u) is invalid, must be integer factor of (%u).",
+					sbp->sb_spino_align,
+					sbp->sb_inoalignmt);
 				return -EINVAL;
 			}
 		} else if (sbp->sb_spino_align) {

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 52/52] xfs: return from xfs_symlink_verify early on V4 filesystems
  2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                     ` (50 preceding siblings ...)
  2024-12-23 22:11   ` [PATCH 51/52] xfs: fix sb_spino_align checks for large fsblock sizes Darrick J. Wong
@ 2024-12-23 22:11   ` Darrick J. Wong
  51 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:11 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: stable, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Source kernel commit: 7f8b718c58783f3ff0810b39e2f62f50ba2549f6

V4 symlink blocks didn't have headers, so return early if this is a V4
filesystem.
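
As a reminder of the verifier convention, a verifier returns NULL when the
buffer passes and a non-NULL failure address otherwise.  The toy model below
shows why the old early return was wrong.  It is a sketch only: the function
names, the magic constant and the failure marker are invented and do not
come from libxfs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic value for the toy header; an assumption for this sketch. */
#define TOY_SYMLINK_MAGIC	0x58534c4dU

/* Any non-NULL address serves as the "verification failed here" marker. */
static const char toy_fail_marker;

/* Before the fix: headerless (non-CRC) buffers were always reported bad. */
static const void *
toy_symlink_verify_old(bool has_crc, uint32_t magic)
{
	if (!has_crc)
		return &toy_fail_marker;
	if (magic != TOY_SYMLINK_MAGIC)
		return &toy_fail_marker;
	return NULL;
}

/* After the fix: there is no header to check on non-CRC buffers, so pass. */
static const void *
toy_symlink_verify_new(bool has_crc, uint32_t magic)
{
	if (!has_crc)
		return NULL;
	if (magic != TOY_SYMLINK_MAGIC)
		return &toy_fail_marker;
	return NULL;
}
```

In this model a non-CRC buffer with arbitrary contents fails the old function
and passes the new one, while CRC-enabled buffers behave the same in both.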

Cc: <stable@vger.kernel.org> # v5.1
Fixes: 39708c20ab5133 ("xfs: miscellaneous verifier magic value fixups")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_symlink_remote.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_symlink_remote.c b/libxfs/xfs_symlink_remote.c
index 2ad586f3926ad2..1c355f751e1cc7 100644
--- a/libxfs/xfs_symlink_remote.c
+++ b/libxfs/xfs_symlink_remote.c
@@ -89,8 +89,10 @@ xfs_symlink_verify(
 	struct xfs_mount	*mp = bp->b_mount;
 	struct xfs_dsymlink_hdr	*dsl = bp->b_addr;
 
+	/* no verification of non-crc buffers */
 	if (!xfs_has_crc(mp))
-		return __this_address;
+		return NULL;
+
 	if (!xfs_verify_magic(bp, dsl->sl_magic))
 		return __this_address;
 	if (!uuid_equal(&dsl->sl_uuid, &mp->m_sb.sb_meta_uuid))


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 01/51] libxfs: remove XFS_ILOCK_RT*
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
@ 2024-12-23 22:12   ` Darrick J. Wong
  2024-12-23 22:12   ` [PATCH 02/51] libxfs: adjust xfs_fsb_to_db to handle segmented rtblocks Darrick J. Wong
                     ` (49 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:12 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we've centralized the realtime metadata locking routines, get
rid of the ILOCK subclasses since we now use explicit lockdep classes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/libxfs_priv.h |    2 --
 1 file changed, 2 deletions(-)


diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 1fc9c784b05054..dd24bdc2d169d9 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -180,8 +180,6 @@ enum ce { CE_DEBUG, CE_CONT, CE_NOTE, CE_WARN, CE_ALERT, CE_PANIC };
 #define XFS_ERRLEVEL_LOW		1
 #define XFS_ILOCK_EXCL			0
 #define XFS_ILOCK_SHARED		0
-#define XFS_ILOCK_RTBITMAP		0
-#define XFS_ILOCK_RTSUM			0
 #define XFS_IOLOCK_EXCL			0
 #define XFS_STATS_INC(mp, count)	do { (mp) = (mp); } while (0)
 #define XFS_STATS_DEC(mp, count, x)	do { (mp) = (mp); } while (0)


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 02/51] libxfs: adjust xfs_fsb_to_db to handle segmented rtblocks
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
  2024-12-23 22:12   ` [PATCH 01/51] libxfs: remove XFS_ILOCK_RT* Darrick J. Wong
@ 2024-12-23 22:12   ` Darrick J. Wong
  2024-12-23 22:12   ` [PATCH 03/51] xfs_repair,mkfs: port to libxfs_rt{bitmap,summary}_create Darrick J. Wong
                     ` (48 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:12 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Update this function to handle segmented xfs_rtblock_t, just like we did
for the kernel.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/util.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/libxfs/util.c b/libxfs/util.c
index ed48e2045b484b..4a9dd254083a63 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -433,7 +433,7 @@ static xfs_daddr_t
 xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb)
 {
 	if (XFS_IS_REALTIME_INODE(ip))
-		 return XFS_FSB_TO_BB(ip->i_mount, fsb);
+		 return xfs_rtb_to_daddr(ip->i_mount, fsb);
 	return XFS_FSB_TO_DADDR(ip->i_mount, (fsb));
 }
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 03/51] xfs_repair,mkfs: port to libxfs_rt{bitmap,summary}_create
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
  2024-12-23 22:12   ` [PATCH 01/51] libxfs: remove XFS_ILOCK_RT* Darrick J. Wong
  2024-12-23 22:12   ` [PATCH 02/51] libxfs: adjust xfs_fsb_to_db to handle segmented rtblocks Darrick J. Wong
@ 2024-12-23 22:12   ` Darrick J. Wong
  2024-12-23 22:12   ` [PATCH 04/51] libxfs: use correct rtx count to block count conversion Darrick J. Wong
                     ` (47 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:12 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Replace the open-coded rtbitmap and summary creation routines with the
ones in libxfs so that we share code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mkfs/proto.c    |   51 ++++++++++++++++++++++-----------------------------
 repair/phase6.c |    8 ++------
 2 files changed, 24 insertions(+), 35 deletions(-)


diff --git a/mkfs/proto.c b/mkfs/proto.c
index 55182bf50fd399..846b1c9a9e8a21 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -948,7 +948,10 @@ static void
 create_sb_metadata_file(
 	struct xfs_rtgroup	*rtg,
 	enum xfs_rtg_inodes	type,
-	void			(*create)(struct xfs_inode *ip))
+	int			(*create)(struct xfs_rtgroup *rtg,
+					  struct xfs_inode *ip,
+					  struct xfs_trans *tp,
+					  bool init))
 {
 	struct xfs_mount	*mp = rtg_mount(rtg);
 	struct xfs_icreate_args	args = {
@@ -972,9 +975,23 @@ create_sb_metadata_file(
 	if (error)
 		goto fail;
 
-	create(ip);
+	error = create(rtg, ip, tp, true);
+	if (error < 0)
+		error = -error;
+	if (error)
+		goto fail;
 
-	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	switch (type) {
+	case XFS_RTGI_BITMAP:
+		mp->m_sb.sb_rbmino = ip->i_ino;
+		break;
+	case XFS_RTGI_SUMMARY:
+		mp->m_sb.sb_rsumino = ip->i_ino;
+		break;
+	default:
+		error = EFSCORRUPTED;
+		goto fail;
+	}
 	libxfs_log_sb(tp);
 
 	error = -libxfs_trans_commit(tp);
@@ -990,30 +1007,6 @@ create_sb_metadata_file(
 		fail(_("Realtime inode allocation failed"), error);
 }
 
-static void
-rtbitmap_create(
-	struct xfs_inode	*ip)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-
-	ip->i_disk_size = mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize;
-	ip->i_diflags |= XFS_DIFLAG_NEWRTBM;
-	inode_set_atime(VFS_I(ip), 0, 0);
-
-	mp->m_sb.sb_rbmino = ip->i_ino;
-}
-
-static void
-rtsummary_create(
-	struct xfs_inode	*ip)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-
-	ip->i_disk_size = mp->m_rsumblocks * mp->m_sb.sb_blocksize;
-
-	mp->m_sb.sb_rsumino = ip->i_ino;
-}
-
 /*
  * Free the whole realtime area using transactions.
  * Do one transaction per bitmap block.
@@ -1078,9 +1071,9 @@ rtinit(
 
 	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
 		create_sb_metadata_file(rtg, XFS_RTGI_BITMAP,
-				rtbitmap_create);
+				libxfs_rtbitmap_create);
 		create_sb_metadata_file(rtg, XFS_RTGI_SUMMARY,
-				rtsummary_create);
+				libxfs_rtsummary_create);
 
 		rtfreesp_init(rtg);
 	}
diff --git a/repair/phase6.c b/repair/phase6.c
index 41342d884ce37a..e9feaa5739efa1 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -550,14 +550,10 @@ _("couldn't iget realtime %s inode -- error - %d\n"),
 
 	switch (type) {
 	case XFS_RTGI_BITMAP:
-		ip->i_disk_size = mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize;
-		libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-		error = 0;
+		error = -libxfs_rtbitmap_create(rtg, ip, tp, false);
 		break;
 	case XFS_RTGI_SUMMARY:
-		ip->i_disk_size = mp->m_rsumblocks * mp->m_sb.sb_blocksize;
-		libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-		error = 0;
+		error = -libxfs_rtsummary_create(rtg, ip, tp, false);
 		break;
 	default:
 		error = EINVAL;


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 04/51] libxfs: use correct rtx count to block count conversion
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-12-23 22:12   ` [PATCH 03/51] xfs_repair,mkfs: port to libxfs_rt{bitmap,summary}_create Darrick J. Wong
@ 2024-12-23 22:12   ` Darrick J. Wong
  2024-12-23 22:13   ` [PATCH 05/51] libfrog: scrub the realtime group superblock Darrick J. Wong
                     ` (46 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:12 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Fix a place where we use the wrong conversion functions to convert
between a number of rt extents and a number of rt blocks.  This isn't
really necessary since userspace cannot allocate rt extents, but let's
not leave a logic bomb.
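
A toy model of why the distinction matters once rt block numbers are
segmented.  The layout here is an assumption made for illustration: the
group-relative block is taken to live in the low bits selected by a mask,
and the helper names are invented, not the libxfs functions.

```c
#include <stdint.h>

/*
 * Convert an rt block *address* to an rt extent number.  Under the assumed
 * segmented layout, the group number in the high bits is masked off first.
 */
static uint64_t
toy_rtb_to_rtx(uint64_t blkmask, uint32_t rextsize, uint64_t rtbno)
{
	return (rtbno & blkmask) / rextsize;
}

/* Convert a *length* in rt blocks to a length in rt extents: no masking. */
static uint64_t
toy_extlen_to_rtxlen(uint32_t rextsize, uint64_t len)
{
	return len / rextsize;
}
```

With a mask of 0xFFFF and an rt extent size of 4 blocks, a reservation of
0x10000 blocks converts to 0 extents through the address helper but to 16384
extents through the length helper.  Small counts such as 64 blocks give 16
either way, which is why the mistake is easy to miss.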

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/trans.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/libxfs/trans.c b/libxfs/trans.c
index 01834eff4b77ca..5c896ba1661b10 100644
--- a/libxfs/trans.c
+++ b/libxfs/trans.c
@@ -1202,7 +1202,7 @@ libxfs_trans_alloc_inode(
 	int			error;
 
 	error = libxfs_trans_alloc(mp, resv, dblocks,
-			xfs_rtb_to_rtx(mp, rblocks),
+			xfs_extlen_to_rtxlen(mp, rblocks),
 			force ? XFS_TRANS_RESERVE : 0, &tp);
 	if (error)
 		return error;


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 05/51] libfrog: scrub the realtime group superblock
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-12-23 22:12   ` [PATCH 04/51] libxfs: use correct rtx count to block count conversion Darrick J. Wong
@ 2024-12-23 22:13   ` Darrick J. Wong
  2024-12-23 22:13   ` [PATCH 06/51] man: document the rt group geometry ioctl Darrick J. Wong
                     ` (45 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:13 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Enable scrubbing of realtime group superblocks in xfs_scrub, and
update the scrub ioctl documentation.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/scrub.c                     |    5 +++++
 libfrog/scrub.h                     |    1 +
 man/man2/ioctl_xfs_scrub_metadata.2 |    9 +++++++++
 scrub/repair.c                      |    1 +
 scrub/scrub.c                       |    6 ++++++
 5 files changed, 22 insertions(+)


diff --git a/libfrog/scrub.c b/libfrog/scrub.c
index b2d58c7a966b0d..e7fb8b890bc133 100644
--- a/libfrog/scrub.c
+++ b/libfrog/scrub.c
@@ -159,6 +159,11 @@ const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = {
 		.descr	= "metadata directory paths",
 		.group	= XFROG_SCRUB_GROUP_METAPATH,
 	},
+	[XFS_SCRUB_TYPE_RGSUPER] = {
+		.name	= "rgsuper",
+		.descr	= "realtime group superblock",
+		.group	= XFROG_SCRUB_GROUP_RTGROUP,
+	},
 };
 
 const struct xfrog_scrub_descr xfrog_metapaths[XFS_SCRUB_METAPATH_NR] = {
diff --git a/libfrog/scrub.h b/libfrog/scrub.h
index a35d3e9c293fe5..83455c390e170a 100644
--- a/libfrog/scrub.h
+++ b/libfrog/scrub.h
@@ -16,6 +16,7 @@ enum xfrog_scrub_group {
 	XFROG_SCRUB_GROUP_ISCAN,	/* metadata requiring full inode scan */
 	XFROG_SCRUB_GROUP_SUMMARY,	/* summary metadata */
 	XFROG_SCRUB_GROUP_METAPATH,	/* metadata directory path */
+	XFROG_SCRUB_GROUP_RTGROUP,	/* per-rtgroup metadata */
 };
 
 /* Catalog of scrub types and names, indexed by XFS_SCRUB_TYPE_* */
diff --git a/man/man2/ioctl_xfs_scrub_metadata.2 b/man/man2/ioctl_xfs_scrub_metadata.2
index 1e7e327b37d226..545e3fcbac320e 100644
--- a/man/man2/ioctl_xfs_scrub_metadata.2
+++ b/man/man2/ioctl_xfs_scrub_metadata.2
@@ -88,6 +88,15 @@ .SH DESCRIPTION
 .BR sm_ino " and " sm_gen
 must be zero.
 
+.PP
+.TP
+.B XFS_SCRUB_TYPE_RGSUPER
+Examine a given realtime allocation group's superblock.
+The realtime allocation group number must be given in
+.IR sm_agno "."
+.IR sm_ino " and " sm_gen
+must be zero.
+
 .TP
 .B XFS_SCRUB_TYPE_INODE
 Examine a given inode record for obviously incorrect values and
diff --git a/scrub/repair.c b/scrub/repair.c
index e594e704f51503..c8cdb98de5457b 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -546,6 +546,7 @@ repair_item_difficulty(
 		case XFS_SCRUB_TYPE_REFCNTBT:
 		case XFS_SCRUB_TYPE_RTBITMAP:
 		case XFS_SCRUB_TYPE_RTSUM:
+		case XFS_SCRUB_TYPE_RGSUPER:
 			ret |= REPAIR_DIFFICULTY_PRIMARY;
 			break;
 		}
diff --git a/scrub/scrub.c b/scrub/scrub.c
index bcd63eea1030a6..a2fd8d77d82be0 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -66,6 +66,9 @@ format_metapath_descr(
 				(unsigned long long)vhead->svh_ino);
 
 	sc = &xfrog_metapaths[vhead->svh_ino];
+	if (sc->group == XFROG_SCRUB_GROUP_RTGROUP)
+		return snprintf(buf, buflen, _("rtgroup %u %s"),
+				vhead->svh_agno, _(sc->descr));
 	return snprintf(buf, buflen, "%s", _(sc->descr));
 }
 
@@ -107,6 +110,9 @@ format_scrubv_descr(
 		return snprintf(buf, buflen, _("%s"), _(sc->descr));
 	case XFROG_SCRUB_GROUP_METAPATH:
 		return format_metapath_descr(buf, buflen, vhead);
+	case XFROG_SCRUB_GROUP_RTGROUP:
+		return snprintf(buf, buflen, _("rtgroup %u %s"),
+				vhead->svh_agno, _(sc->descr));
 	}
 	return -1;
 }


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 06/51] man: document the rt group geometry ioctl
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-12-23 22:13   ` [PATCH 05/51] libfrog: scrub the realtime group superblock Darrick J. Wong
@ 2024-12-23 22:13   ` Darrick J. Wong
  2024-12-23 22:13   ` [PATCH 07/51] man: document rgextents geom field Darrick J. Wong
                     ` (44 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:13 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Document the new ioctl that retrieves realtime allocation group geometry
information.
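
The rg_sick/rg_checked rules described in the new manpage can be summarised
with a small standalone helper.  This is a sketch only: the enum, the
function and the flag bit values are placeholders invented for the example.
Real callers would use the XFS_RTGROUP_GEOM_SICK_* flags from xfs_fs.h.

```c
#include <stdint.h>

/* Placeholder bit values for illustration; not taken from xfs_fs.h. */
#define TOY_RG_SICK_SUPER	(1U << 0)
#define TOY_RG_SICK_BITMAP	(1U << 1)
#define TOY_RG_SICK_SUMMARY	(1U << 2)

enum toy_rg_health {
	TOY_RG_HEALTH_UNKNOWN,	/* not checked, no conclusion can be made */
	TOY_RG_HEALTH_OK,	/* checked and not found faulty */
	TOY_RG_HEALTH_SICK,	/* observed to be damaged */
};

/* Interpret one flag against the sick and checked masks. */
static enum toy_rg_health
toy_rg_classify(uint32_t sick, uint32_t checked, uint32_t flag)
{
	if (sick & flag)
		return TOY_RG_HEALTH_SICK;
	if (checked & flag)
		return TOY_RG_HEALTH_OK;
	return TOY_RG_HEALTH_UNKNOWN;
}
```

A caller would issue the ioctl with rg_number set, then run each flag of
interest through a helper like this using the returned rg_sick and
rg_checked values.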

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 man/man2/ioctl_xfs_rtgroup_geometry.2 |   99 +++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)
 create mode 100644 man/man2/ioctl_xfs_rtgroup_geometry.2


diff --git a/man/man2/ioctl_xfs_rtgroup_geometry.2 b/man/man2/ioctl_xfs_rtgroup_geometry.2
new file mode 100644
index 00000000000000..c4b0de94453558
--- /dev/null
+++ b/man/man2/ioctl_xfs_rtgroup_geometry.2
@@ -0,0 +1,99 @@
+.\" Copyright (c) 2022-2024 Oracle.  All rights reserved.
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" SPDX-License-Identifier: GPL-2.0-or-later
+.\" %%%LICENSE_END
+.TH IOCTL-XFS-RTGROUP-GEOMETRY 2 2022-08-18 "XFS"
+.SH NAME
+ioctl_xfs_rtgroup_geometry \- query XFS realtime group geometry information
+.SH SYNOPSIS
+.br
+.B #include <xfs/xfs_fs.h>
+.PP
+.BI "int ioctl(int " fd ", XFS_IOC_RTGROUP_GEOMETRY, struct xfs_rtgroup_geometry *" arg );
+.SH DESCRIPTION
+This XFS ioctl retrieves the geometry information for a given realtime group.
+The geometry information is conveyed in a structure of the following form:
+.PP
+.in +4n
+.nf
+struct xfs_rtgroup_geometry {
+	__u32  rg_number;
+	__u32  rg_length;
+	__u32  rg_sick;
+	__u32  rg_checked;
+	__u32  rg_flags;
+	__u32  rg_reserved[27];
+};
+.fi
+.in
+.TP
+.I rg_number
+The caller must set this field to the index of the realtime group that the
+caller wishes to learn about.
+.TP
+.I rg_length
+The length of the realtime group is returned in this field, in units of
+filesystem blocks.
+.I rg_flags
+The caller can set this field to change the operational behavior of the ioctl.
+Currently no flags are defined, so this field must be zero.
+.TP
+.IR rg_reserved " and " rg_pad
+All reserved fields will be set to zero on return.
+.PP
+The fields
+.IR rg_sick " and " rg_checked
+indicate the relative health of various realtime group metadata:
+.IP \[bu] 2
+If a given sick flag is set in
+.IR rg_sick ,
+then that piece of metadata has been observed to be damaged.
+The same bit will be set in
+.IR rg_checked .
+.IP \[bu]
+If a given sick flag is set in
+.I rg_checked
+and is not set in
+.IR rg_sick ,
+then that piece of metadata has been checked and is not faulty.
+.IP \[bu]
+If a given sick flag is not set in
+.IR rg_checked ,
+then no conclusion can be made.
+.PP
+The following flags apply to these fields:
+.RS 0.4i
+.TP
+.B XFS_RTGROUP_GEOM_SICK_SUPER
+Realtime group superblock.
+.TP
+.B XFS_RTGROUP_GEOM_SICK_BITMAP
+Realtime bitmap for this group.
+.TP
+.B XFS_RTGROUP_GEOM_SICK_SUMMARY
+Realtime summary for this group.
+.RE
+.SH RETURN VALUE
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.PP
+.SH ERRORS
+Error codes can be one of, but are not limited to, the following:
+.TP
+.B EFSBADCRC
+Metadata checksum validation failed while performing the query.
+.TP
+.B EFSCORRUPTED
+Metadata corruption was encountered while performing the query.
+.TP
+.B EINVAL
+The specified realtime group number is not valid for this filesystem.
+.TP
+.B EIO
+An I/O error was encountered while performing the query.
+.SH CONFORMING TO
+This API is specific to XFS filesystem on the Linux kernel.
+.SH SEE ALSO
+.BR ioctl (2)


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 07/51] man: document rgextents geom field
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-12-23 22:13   ` [PATCH 06/51] man: document the rt group geometry ioctl Darrick J. Wong
@ 2024-12-23 22:13   ` Darrick J. Wong
  2024-12-23 22:13   ` [PATCH 08/51] libxfs: port userspace deferred log item to handle rtgroups Darrick J. Wong
                     ` (43 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:13 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Document the new rgextents geom field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 man/man2/ioctl_xfs_fsgeometry.2 |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)


diff --git a/man/man2/ioctl_xfs_fsgeometry.2 b/man/man2/ioctl_xfs_fsgeometry.2
index c808ad5b8b9190..502054f391e9b5 100644
--- a/man/man2/ioctl_xfs_fsgeometry.2
+++ b/man/man2/ioctl_xfs_fsgeometry.2
@@ -49,7 +49,8 @@ .SH DESCRIPTION
 
 	__u32         sick;
 	__u32         checked;
-	__u64         reserved[17];
+	__u64         rgextents;
+	__u64         reserved[16];
 };
 .fi
 .in
@@ -139,6 +140,9 @@ .SH DESCRIPTION
 .B XFS METADATA HEALTH REPORTING
 for more details.
 .PP
+.I rgextents
+Is the number of RT extents in each rtgroup.
+.PP
 .I reserved
 is set to zero.
 .SH FILESYSTEM FEATURE FLAGS


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 08/51] libxfs: port userspace deferred log item to handle rtgroups
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-12-23 22:13   ` [PATCH 07/51] man: document rgextents geom field Darrick J. Wong
@ 2024-12-23 22:13   ` Darrick J. Wong
  2024-12-23 22:14   ` [PATCH 09/51] libxfs: implement some sanity checking for enormous rgcount Darrick J. Wong
                     ` (42 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:13 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make the userspace log items handle rt groups correctly.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/defer_item.c |   73 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 49 insertions(+), 24 deletions(-)


diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index eee84ffbe625d5..9db0e471cba69c 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -77,6 +77,17 @@ xfs_extent_free_create_done(
 	return NULL;
 }
 
+static inline const struct xfs_defer_op_type *
+xefi_ops(
+	struct xfs_extent_free_item	*xefi)
+{
+	if (xfs_efi_is_realtime(xefi))
+		return &xfs_rtextent_free_defer_type;
+	if (xefi->xefi_agresv == XFS_AG_RESV_AGFL)
+		return &xfs_agfl_free_defer_type;
+	return &xfs_extent_free_defer_type;
+}
+
 /* Add this deferred EFI to the transaction. */
 void
 xfs_extent_free_defer_add(
@@ -86,14 +97,11 @@ xfs_extent_free_defer_add(
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 
+	trace_xfs_extent_free_defer(mp, xefi);
+
 	xefi->xefi_group = xfs_group_intent_get(mp, xefi->xefi_startblock,
-			XG_TYPE_AG);
-	if (xefi->xefi_agresv == XFS_AG_RESV_AGFL)
-		*dfpp = xfs_defer_add(tp, &xefi->xefi_list,
-				&xfs_agfl_free_defer_type);
-	else
-		*dfpp = xfs_defer_add(tp, &xefi->xefi_list,
-				&xfs_extent_free_defer_type);
+			xfs_efi_is_realtime(xefi) ? XG_TYPE_RTG : XG_TYPE_AG);
+	*dfpp = xfs_defer_add(tp, &xefi->xefi_list, xefi_ops(xefi));
 }
 
 /* Cancel a free extent. */
@@ -159,6 +167,32 @@ const struct xfs_defer_op_type xfs_extent_free_defer_type = {
 	.cancel_item	= xfs_extent_free_cancel_item,
 };
 
+STATIC int
+xfs_rtextent_free_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_log_item		*done,
+	struct list_head		*item,
+	struct xfs_btree_cur		**state)
+{
+	struct xfs_extent_free_item	*xefi = xefi_entry(item);
+	int				error;
+
+	error = xfs_rtfree_blocks(tp, to_rtg(xefi->xefi_group),
+			xefi->xefi_startblock, xefi->xefi_blockcount);
+	if (error != -EAGAIN)
+		xfs_extent_free_cancel_item(item);
+	return error;
+}
+
+const struct xfs_defer_op_type xfs_rtextent_free_defer_type = {
+	.name		= "rtextent_free",
+	.create_intent	= xfs_extent_free_create_intent,
+	.abort_intent	= xfs_extent_free_abort_intent,
+	.create_done	= xfs_extent_free_create_done,
+	.finish_item	= xfs_rtextent_free_finish_item,
+	.cancel_item	= xfs_extent_free_cancel_item,
+};
+
 /*
  * AGFL blocks are accounted differently in the reserve pools and are not
  * inserted into the busy extent list.
@@ -496,14 +530,16 @@ xfs_bmap_update_create_done(
 	return NULL;
 }
 
-/* Take an active ref to the AG containing the space we're mapping. */
+/* Take a passive ref to the group containing the space we're mapping. */
 static inline void
 xfs_bmap_update_get_group(
 	struct xfs_mount	*mp,
 	struct xfs_bmap_intent	*bi)
 {
+	enum xfs_group_type	type = XG_TYPE_AG;
+
 	if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork))
-		return;
+		type = XG_TYPE_RTG;
 
 	/*
 	 * Bump the intent count on behalf of the deferred rmap and refcount
@@ -513,7 +549,7 @@ xfs_bmap_update_get_group(
 	 * remains nonzero across the transaction roll.
 	 */
 	bi->bi_group = xfs_group_intent_get(mp, bi->bi_bmap.br_startblock,
-			XG_TYPE_AG);
+				type);
 }
 
 /* Add this deferred BUI to the transaction. */
@@ -522,8 +558,6 @@ xfs_bmap_defer_add(
 	struct xfs_trans	*tp,
 	struct xfs_bmap_intent	*bi)
 {
-	trace_xfs_bmap_defer(bi);
-
 	xfs_bmap_update_get_group(tp->t_mountp, bi);
 
 	/*
@@ -536,20 +570,11 @@ xfs_bmap_defer_add(
 	 */
 	if (bi->bi_type == XFS_BMAP_MAP)
 		bi->bi_owner->i_delayed_blks += bi->bi_bmap.br_blockcount;
+
+	trace_xfs_bmap_defer(bi);
 	xfs_defer_add(tp, &bi->bi_list, &xfs_bmap_update_defer_type);
 }
 
-/* Release an active AG ref after finishing mapping work. */
-static inline void
-xfs_bmap_update_put_group(
-	struct xfs_bmap_intent	*bi)
-{
-	if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork))
-		return;
-
-	xfs_group_intent_put(bi->bi_group);
-}
-
 /* Cancel a deferred bmap update. */
 STATIC void
 xfs_bmap_update_cancel_item(
@@ -560,7 +585,7 @@ xfs_bmap_update_cancel_item(
 	if (bi->bi_type == XFS_BMAP_MAP)
 		bi->bi_owner->i_delayed_blks -= bi->bi_bmap.br_blockcount;
 
-	xfs_bmap_update_put_group(bi);
+	xfs_group_intent_put(bi->bi_group);
 	kmem_cache_free(xfs_bmap_intent_cache, bi);
 }
 



* [PATCH 09/51] libxfs: implement some sanity checking for enormous rgcount
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-12-23 22:13   ` [PATCH 08/51] libxfs: port userspace deferred log item to handle rtgroups Darrick J. Wong
@ 2024-12-23 22:14   ` Darrick J. Wong
  2024-12-23 22:14   ` [PATCH 10/51] libfrog: support scrubbing rtgroup metadata paths Darrick J. Wong
                     ` (41 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:14 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Similar to what we do for suspiciously large sb_agcount values, if
someone tries to get libxfs to load a filesystem with a very large
realtime group count, let's do some basic checks of the rt device to
see if it's really that large.  If the read fails, warn the user and,
when running as a debugger, only load the first rtgroup; otherwise fail
the mount.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/init.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)


diff --git a/libxfs/init.c b/libxfs/init.c
index 6642cd50c00b5f..16291466ac86d3 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -654,6 +654,49 @@ xfs_set_low_space_thresholds(
 		mp->m_low_space[i] = dblocks * (i + 1);
 }
 
+/*
+ * libxfs_initialize_rtgroup will allocate a rtgroup structure for each
+ * rtgroup.  If rgcount is corrupted and insanely high, this will OOM the box.
+ * Try to read what would be the last rtgroup superblock.  If that fails, read
+ * the first one and let the user know to check the geometry.
+ */
+static inline bool
+check_many_rtgroups(
+	struct xfs_mount	*mp,
+	struct xfs_sb		*sbp)
+{
+	struct xfs_buf		*bp;
+	xfs_daddr_t		d;
+	int			error;
+
+	if (!mp->m_rtdev->bt_bdev) {
+		fprintf(stderr, _("%s: no rt device, ignoring rgcount %u\n"),
+				progname, sbp->sb_rgcount);
+		if (!xfs_is_debugger(mp))
+			return false;
+
+		sbp->sb_rgcount = 0;
+		return true;
+	}
+
+	d = (xfs_daddr_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks);
+	error = libxfs_buf_read(mp->m_rtdev, d - XFS_FSB_TO_BB(mp, 1), 1, 0,
+			&bp, NULL);
+	if (!error) {
+		libxfs_buf_relse(bp);
+		return true;
+	}
+
+	fprintf(stderr, _("%s: read of rtgroup %u failed\n"), progname,
+			sbp->sb_rgcount - 1);
+	if (!xfs_is_debugger(mp))
+		return false;
+
+	fprintf(stderr, _("%s: limiting reads to rtgroup 0\n"), progname);
+	sbp->sb_rgcount = 1;
+	return true;
+}
+
 /*
  * Mount structure initialization, provides a filled-in xfs_mount_t
  * such that the numerous XFS_* macros can be used.  If dev is zero,
@@ -810,6 +853,9 @@ libxfs_mount(
 			libxfs_buf_relse(bp);
 	}
 
+	if (sbp->sb_rgcount > 1000000 && !check_many_rtgroups(mp, sbp))
+		goto out_da;
+
 	error = libxfs_initialize_perag(mp, 0, sbp->sb_agcount,
 			sbp->sb_dblocks, &mp->m_maxagi);
 	if (error) {



* [PATCH 10/51] libfrog: support scrubbing rtgroup metadata paths
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-12-23 22:14   ` [PATCH 09/51] libxfs: implement some sanity checking for enormous rgcount Darrick J. Wong
@ 2024-12-23 22:14   ` Darrick J. Wong
  2024-12-23 22:14   ` [PATCH 11/51] libfrog: report rt groups in output Darrick J. Wong
                     ` (40 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:14 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Support scrubbing the metadata paths of rtgroup metadata inodes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/scrub.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/libfrog/scrub.c b/libfrog/scrub.c
index e7fb8b890bc133..66000f1ed66be4 100644
--- a/libfrog/scrub.c
+++ b/libfrog/scrub.c
@@ -172,6 +172,21 @@ const struct xfrog_scrub_descr xfrog_metapaths[XFS_SCRUB_METAPATH_NR] = {
 		.descr	= "metapath",
 		.group	= XFROG_SCRUB_GROUP_NONE,
 	},
+	[XFS_SCRUB_METAPATH_RTDIR] = {
+		.name	= "rtdir",
+		.descr	= "realtime group metadir",
+		.group	= XFROG_SCRUB_GROUP_FS,
+	},
+	[XFS_SCRUB_METAPATH_RTBITMAP] = {
+		.name	= "rtbitmap",
+		.descr	= "rtgroup bitmap",
+		.group	= XFROG_SCRUB_GROUP_RTGROUP,
+	},
+	[XFS_SCRUB_METAPATH_RTSUMMARY] = {
+		.name	= "rtsummary",
+		.descr	= "rtgroup summary",
+		.group	= XFROG_SCRUB_GROUP_RTGROUP,
+	},
 };
 
 /* Invoke the scrub ioctl.  Returns zero or negative error code. */



* [PATCH 11/51] libfrog: report rt groups in output
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-12-23 22:14   ` [PATCH 10/51] libfrog: support scrubbing rtgroup metadata paths Darrick J. Wong
@ 2024-12-23 22:14   ` Darrick J. Wong
  2024-12-23 22:14   ` [PATCH 12/51] libfrog: add bitmap_clear Darrick J. Wong
                     ` (39 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:14 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Report realtime group geometry.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/fsgeom.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index 67b4e65713be5b..9c1e9a90eb1f1b 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -67,7 +67,8 @@ xfs_report_geom(
 "naming   =version %-14u bsize=%-6u ascii-ci=%d, ftype=%d, parent=%d\n"
 "log      =%-22s bsize=%-6d blocks=%u, version=%d\n"
 "         =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
-"realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"),
+"realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"
+"         =%-22s rgcount=%-4d rgsize=%u extents\n"),
 		mntpoint, geo->inodesize, geo->agcount, geo->agblocks,
 		"", geo->sectsize, attrversion, projid32bit,
 		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
@@ -82,7 +83,8 @@ xfs_report_geom(
 		"", geo->logsectsize, geo->logsunit / geo->blocksize, lazycount,
 		!geo->rtblocks ? _("none") : rtname ? rtname : _("external"),
 		geo->rtextsize * geo->blocksize, (unsigned long long)geo->rtblocks,
-			(unsigned long long)geo->rtextents);
+			(unsigned long long)geo->rtextents,
+		"", geo->rgcount, geo->rgextents);
 }
 
 /* Try to obtain the xfs geometry.  On error returns a negative error code. */



* [PATCH 12/51] libfrog: add bitmap_clear
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-12-23 22:14   ` [PATCH 11/51] libfrog: report rt groups in output Darrick J. Wong
@ 2024-12-23 22:14   ` Darrick J. Wong
  2024-12-23 22:15   ` [PATCH 13/51] xfs_logprint: report realtime EFIs Darrick J. Wong
                     ` (38 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:14 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Uncomment and fix bitmap_clear so that xfs_repair can start using it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/bitmap.c |   25 +++++++++++++++++++------
 libfrog/bitmap.h |    1 +
 2 files changed, 20 insertions(+), 6 deletions(-)


diff --git a/libfrog/bitmap.c b/libfrog/bitmap.c
index 5af5ab8dd6b3bb..0308886d446ff2 100644
--- a/libfrog/bitmap.c
+++ b/libfrog/bitmap.c
@@ -233,10 +233,9 @@ bitmap_set(
 	return res;
 }
 
-#if 0	/* Unused, provided for completeness. */
 /* Clear a region of bits. */
-int
-bitmap_clear(
+static int
+__bitmap_clear(
 	struct bitmap		*bmap,
 	uint64_t		start,
 	uint64_t		len)
@@ -251,8 +250,8 @@ bitmap_clear(
 	uint64_t		new_length;
 	struct avl64node	*node;
 	int			stat;
+	int			ret = 0;
 
-	pthread_mutex_lock(&bmap->bt_lock);
 	/* Find any existing nodes over that range. */
 	avl64_findranges(bmap->bt_tree, start, start + len, &firstn, &lastn);
 
@@ -312,10 +311,24 @@ bitmap_clear(
 	}
 
 out:
-	pthread_mutex_unlock(&bmap->bt_lock);
 	return ret;
 }
-#endif
+
+/* Clear a region of bits. */
+int
+bitmap_clear(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		length)
+{
+	int			res;
+
+	pthread_mutex_lock(&bmap->bt_lock);
+	res = __bitmap_clear(bmap, start, length);
+	pthread_mutex_unlock(&bmap->bt_lock);
+
+	return res;
+}
 
 /* Iterate the set regions of this bitmap. */
 int
diff --git a/libfrog/bitmap.h b/libfrog/bitmap.h
index 043b77eece65b3..47df0ad38467ce 100644
--- a/libfrog/bitmap.h
+++ b/libfrog/bitmap.h
@@ -14,6 +14,7 @@ struct bitmap {
 int bitmap_alloc(struct bitmap **bmap);
 void bitmap_free(struct bitmap **bmap);
 int bitmap_set(struct bitmap *bmap, uint64_t start, uint64_t length);
+int bitmap_clear(struct bitmap *bmap, uint64_t start, uint64_t length);
 int bitmap_iterate(struct bitmap *bmap, int (*fn)(uint64_t, uint64_t, void *),
 		void *arg);
 int bitmap_iterate_range(struct bitmap *bmap, uint64_t start, uint64_t length,


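The bitmap_clear patch above splits the function into an unlocked worker and a thin wrapper that takes the mutex. A minimal sketch of that pattern follows. The `tiny_bitmap` type and all function names here are hypothetical; libfrog's real bitmap is an AVL tree of ranges, not a single 64-bit word.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical 64-bit bitmap guarded by a mutex. */
struct tiny_bitmap {
	pthread_mutex_t	lock;
	uint64_t	bits;
};

/* Build a mask covering bits [start, start + len). */
static uint64_t
range_mask(unsigned int start, unsigned int len)
{
	uint64_t	mask;

	assert(start < 64 && len <= 64 - start);
	mask = (len == 64) ? ~0ULL : ((1ULL << len) - 1);
	return mask << start;
}

/* Unlocked worker: the caller must already hold bmap->lock. */
static void
tiny_bitmap_clear_locked(
	struct tiny_bitmap	*bmap,
	unsigned int		start,
	unsigned int		len)
{
	bmap->bits &= ~range_mask(start, len);
}

/* Public entry point: take the lock, call the worker, drop the lock. */
static void
tiny_bitmap_clear(
	struct tiny_bitmap	*bmap,
	unsigned int		start,
	unsigned int		len)
{
	pthread_mutex_lock(&bmap->lock);
	tiny_bitmap_clear_locked(bmap, start, len);
	pthread_mutex_unlock(&bmap->lock);
}
```

The benefit of the split is that other functions which already hold the lock can call the worker directly without deadlocking on a non-recursive mutex.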

* [PATCH 13/51] xfs_logprint: report realtime EFIs
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-12-23 22:14   ` [PATCH 12/51] libfrog: add bitmap_clear Darrick J. Wong
@ 2024-12-23 22:15   ` Darrick J. Wong
  2024-12-23 22:15   ` [PATCH 14/51] xfs_repair: adjust rtbitmap/rtsummary word updates to handle big endian values Darrick J. Wong
                     ` (37 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:15 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Decode the EFI format just enough to report whether an EFI targets the
realtime device.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 logprint/log_misc.c      |    2 ++
 logprint/log_print_all.c |    8 ++++++
 logprint/log_redo.c      |   57 +++++++++++++++++++++++++++++++++++-----------
 3 files changed, 53 insertions(+), 14 deletions(-)


diff --git a/logprint/log_misc.c b/logprint/log_misc.c
index 8e86ac347fa963..1df8c5d377c02d 100644
--- a/logprint/log_misc.c
+++ b/logprint/log_misc.c
@@ -997,12 +997,14 @@ xlog_print_record(
 					&i, num_ops);
 			break;
 		    }
+		    case XFS_LI_EFI_RT:
 		    case XFS_LI_EFI: {
 			skip = xlog_print_trans_efi(&ptr,
 					be32_to_cpu(op_head->oh_len),
 					continued);
 			break;
 		    }
+		    case XFS_LI_EFD_RT:
 		    case XFS_LI_EFD: {
 			skip = xlog_print_trans_efd(&ptr,
 					be32_to_cpu(op_head->oh_len));
diff --git a/logprint/log_print_all.c b/logprint/log_print_all.c
index a4a5e41f17fa64..5a9ddd05ab1288 100644
--- a/logprint/log_print_all.c
+++ b/logprint/log_print_all.c
@@ -410,9 +410,11 @@ xlog_recover_print_logitem(
 	case XFS_LI_INODE:
 		xlog_recover_print_inode(item);
 		break;
+	case XFS_LI_EFD_RT:
 	case XFS_LI_EFD:
 		xlog_recover_print_efd(item);
 		break;
+	case XFS_LI_EFI_RT:
 	case XFS_LI_EFI:
 		xlog_recover_print_efi(item);
 		break;
@@ -474,6 +476,12 @@ xlog_recover_print_item(
 	case XFS_LI_INODE:
 		printf("INO");
 		break;
+	case XFS_LI_EFD_RT:
+		printf("EFD_RT");
+		break;
+	case XFS_LI_EFI_RT:
+		printf("EFI_RT");
+		break;
 	case XFS_LI_EFD:
 		printf("EFD");
 		break;
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index 684e5f4a3f32c2..41e7c94a52dc21 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -67,6 +67,7 @@ xlog_print_trans_efi(
 	uint			src_len,
 	int			continued)
 {
+	const char		*item_name = "EFI?";
 	xfs_efi_log_format_t	*src_f, *f = NULL;
 	uint			dst_len;
 	xfs_extent_t		*ex;
@@ -103,8 +104,14 @@ xlog_print_trans_efi(
 		goto error;
 	}
 
-	printf(_("EFI:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
-		f->efi_size, f->efi_nextents, (unsigned long long)f->efi_id);
+	switch (f->efi_type) {
+	case XFS_LI_EFI:	item_name = "EFI"; break;
+	case XFS_LI_EFI_RT:	item_name = "EFI_RT"; break;
+	}
+
+	printf(_("%s:  #regs: %d	num_extents: %u  id: 0x%llx\n"),
+			item_name, f->efi_size, f->efi_nextents,
+			(unsigned long long)f->efi_id);
 
 	if (continued) {
 		printf(_("EFI free extent data skipped (CONTINUE set, no space)\n"));
@@ -113,7 +120,7 @@ xlog_print_trans_efi(
 
 	ex = f->efi_extents;
 	for (i=0; i < f->efi_nextents; i++) {
-		printf("(s: 0x%llx, l: %d) ",
+		printf("(s: 0x%llx, l: %u) ",
 			(unsigned long long)ex->ext_start, ex->ext_len);
 		if (i % 4 == 3) printf("\n");
 		ex++;
@@ -130,6 +137,7 @@ void
 xlog_recover_print_efi(
 	struct xlog_recover_item *item)
 {
+	const char		*item_name = "EFI?";
 	xfs_efi_log_format_t	*f, *src_f;
 	xfs_extent_t		*ex;
 	int			i;
@@ -155,12 +163,18 @@ xlog_recover_print_efi(
 		return;
 	}
 
-	printf(_("	EFI:  #regs:%d	num_extents:%d  id:0x%llx\n"),
-		   f->efi_size, f->efi_nextents, (unsigned long long)f->efi_id);
+	switch (f->efi_type) {
+	case XFS_LI_EFI:	item_name = "EFI"; break;
+	case XFS_LI_EFI_RT:	item_name = "EFI_RT"; break;
+	}
+
+	printf(_("	%s:  #regs:%d	num_extents:%u  id:0x%llx\n"),
+			item_name, f->efi_size, f->efi_nextents,
+			(unsigned long long)f->efi_id);
 	ex = f->efi_extents;
 	printf("	");
 	for (i=0; i< f->efi_nextents; i++) {
-		printf("(s: 0x%llx, l: %d) ",
+		printf("(s: 0x%llx, l: %u) ",
 			(unsigned long long)ex->ext_start, ex->ext_len);
 		if (i % 4 == 3)
 			printf("\n");
@@ -174,8 +188,10 @@ xlog_recover_print_efi(
 int
 xlog_print_trans_efd(char **ptr, uint len)
 {
-	xfs_efd_log_format_t *f;
-	xfs_efd_log_format_t lbuf;
+	const char		*item_name = "EFD?";
+	xfs_efd_log_format_t	*f;
+	xfs_efd_log_format_t	lbuf;
+
 	/* size without extents at end */
 	uint core_size = sizeof(xfs_efd_log_format_t);
 
@@ -185,11 +201,17 @@ xlog_print_trans_efd(char **ptr, uint len)
 	 */
 	memmove(&lbuf, *ptr, min(core_size, len));
 	f = &lbuf;
+
+	switch (f->efd_type) {
+	case XFS_LI_EFD:	item_name = "EFD"; break;
+	case XFS_LI_EFD_RT:	item_name = "EFD_RT"; break;
+	}
+
 	*ptr += len;
 	if (len >= core_size) {
-		printf(_("EFD:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
-			f->efd_size, f->efd_nextents,
-			(unsigned long long)f->efd_efi_id);
+		printf(_("%s:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+				item_name, f->efd_size, f->efd_nextents,
+				(unsigned long long)f->efd_efi_id);
 
 		/* don't print extents as they are not used */
 
@@ -204,18 +226,25 @@ void
 xlog_recover_print_efd(
 	struct xlog_recover_item *item)
 {
+	const char		*item_name = "EFD?";
 	xfs_efd_log_format_t	*f;
 
 	f = (xfs_efd_log_format_t *)item->ri_buf[0].i_addr;
+
+	switch (f->efd_type) {
+	case XFS_LI_EFD:	item_name = "EFD"; break;
+	case XFS_LI_EFD_RT:	item_name = "EFD_RT"; break;
+	}
+
 	/*
 	 * An xfs_efd_log_format structure contains a variable length array
 	 * as the last field.
 	 * Each element is of size xfs_extent_32_t or xfs_extent_64_t.
 	 * However, the extents are never used and won't be printed.
 	 */
-	printf(_("	EFD:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
-		f->efd_size, f->efd_nextents,
-		(unsigned long long)f->efd_efi_id);
+	printf(_("	%s:  #regs: %d	num_extents: %d  id: 0x%llx\n"),
+			item_name, f->efd_size, f->efd_nextents,
+			(unsigned long long)f->efd_efi_id);
 }
 
 /* Reverse Mapping Update Items */


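The logprint patch above maps a log item type code to a printable name, with a fallback string for codes it does not recognise. A small sketch of that mapping is shown here. The `DEMO_LI_*` codes are illustrative placeholders, not the real `XFS_LI_*` values, and `demo_efi_name` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Illustrative type codes only; these are not the real XFS_LI_* values. */
enum demo_li_type {
	DEMO_LI_EFI	= 1,
	DEMO_LI_EFD	= 2,
	DEMO_LI_EFI_RT	= 3,
	DEMO_LI_EFD_RT	= 4,
};

/* Return the label to print for an extent-free intent item. */
static const char *
demo_efi_name(unsigned int type)
{
	const char	*item_name = "EFI?";	/* fallback for unknown types */

	switch (type) {
	case DEMO_LI_EFI:	item_name = "EFI"; break;
	case DEMO_LI_EFI_RT:	item_name = "EFI_RT"; break;
	}
	return item_name;
}
```

Starting from the "EFI?" default means a corrupt or future type code still produces readable output instead of an uninitialized pointer.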

* [PATCH 14/51] xfs_repair: adjust rtbitmap/rtsummary word updates to handle big endian values
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (12 preceding siblings ...)
  2024-12-23 22:15   ` [PATCH 13/51] xfs_logprint: report realtime EFIs Darrick J. Wong
@ 2024-12-23 22:15   ` Darrick J. Wong
  2024-12-23 22:15   ` [PATCH 15/51] xfs_repair: refactor phase4 Darrick J. Wong
                     ` (36 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:15 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

With rtgroups, the rt bitmap and summary file words are defined to be
be32 values.  Adjust repair to handle the endianness correctly.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/rt.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)


diff --git a/repair/rt.c b/repair/rt.c
index 711891a724b076..65d6a713f379d2 100644
--- a/repair/rt.c
+++ b/repair/rt.c
@@ -24,7 +24,10 @@ set_rtword(
 	union xfs_rtword_raw	*word,
 	xfs_rtword_t		value)
 {
-	word->old = value;
+	if (xfs_has_rtgroups(mp))
+		word->rtg = cpu_to_be32(value);
+	else
+		word->old = value;
 }
 
 static inline void
@@ -35,7 +38,10 @@ inc_sumcount(
 {
 	union xfs_suminfo_raw	*p = info + index;
 
-	p->old++;
+	if (xfs_has_rtgroups(mp))
+		be32_add_cpu(&p->rtg, 1);
+	else
+		p->old++;
 }
 
 /*



* [PATCH 15/51] xfs_repair: refactor phase4
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (13 preceding siblings ...)
  2024-12-23 22:15   ` [PATCH 14/51] xfs_repair: adjust rtbitmap/rtsummary word updates to handle big endian values Darrick J. Wong
@ 2024-12-23 22:15   ` Darrick J. Wong
  2024-12-23 22:15   ` [PATCH 16/51] xfs_repair: refactor offsetof+sizeof to offsetofend Darrick J. Wong
                     ` (35 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:15 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Split out helpers to process all duplicate extents in an AG and the RT
duplicate extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/phase4.c |  210 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 114 insertions(+), 96 deletions(-)


diff --git a/repair/phase4.c b/repair/phase4.c
index 40db36f1f93020..036a4ed0e54445 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -225,20 +225,122 @@ process_rmap_data(
 	destroy_work_queue(&wq);
 }
 
+static void
+process_dup_rt_extents(
+	struct xfs_mount	*mp)
+{
+	xfs_rtxnum_t		rt_start = 0;
+	xfs_rtxlen_t		rt_len = 0;
+	xfs_rtxnum_t		rtx;
+
+	for (rtx = 0; rtx < mp->m_sb.sb_rextents; rtx++)  {
+		int state;
+
+		state = get_rtbmap(rtx);
+		switch (state) {
+		case XR_E_BAD_STATE:
+		default:
+			do_warn(
+	_("unknown rt extent state %d, extent %" PRIu64 "\n"),
+				state, rtx);
+			fallthrough;
+		case XR_E_METADATA:
+		case XR_E_UNKNOWN:
+		case XR_E_FREE1:
+		case XR_E_FREE:
+		case XR_E_INUSE:
+		case XR_E_INUSE_FS:
+		case XR_E_INO:
+		case XR_E_FS_MAP:
+			if (rt_start == 0)
+				continue;
+			/*
+			 * Add extent and reset extent state.
+			 */
+			add_rt_dup_extent(rt_start, rt_len);
+			rt_start = 0;
+			rt_len = 0;
+			break;
+		case XR_E_MULT:
+			if (rt_start == 0) {
+				/* Start tracking a new duplicate run. */
+				rt_start = rtx;
+				rt_len = 1;
+				break;
+			}
+			if (rt_len == XFS_MAX_BMBT_EXTLEN) {
+				/*
+				 * Large extent case.
+				 */
+				add_rt_dup_extent(rt_start, rt_len);
+				rt_start = rtx;
+				rt_len = 1;
+				break;
+			}
+
+			rt_len++;
+			break;
+		}
+	}
+
+	/*
+	 * Catch the tail case, extent hitting the end of the RTG.
+	 */
+	if (rt_start != 0)
+		add_rt_dup_extent(rt_start, rt_len);
+}
+
+/*
+ * Set up duplicate extent list for an AG or RTG.
+ */
+static void
+process_dup_extents(
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_agblock_t		ag_end)
+{
+	do {
+		int		bstate;
+		xfs_extlen_t	blen;
+
+		bstate = get_bmap_ext(agno, agbno, ag_end, &blen);
+		switch (bstate) {
+		case XR_E_FREE1:
+			if (no_modify)
+				do_warn(
+_("free space (%u,%u-%u) only seen by one free space btree\n"),
+					agno, agbno, agbno + blen - 1);
+			break;
+		case XR_E_METADATA:
+		case XR_E_UNKNOWN:
+		case XR_E_FREE:
+		case XR_E_INUSE:
+		case XR_E_INUSE_FS:
+		case XR_E_INO:
+		case XR_E_FS_MAP:
+			break;
+		case XR_E_MULT:
+			add_dup_extent(agno, agbno, blen);
+			break;
+		case XR_E_BAD_STATE:
+		default:
+			do_warn(
+_("unknown block state, ag %d, blocks %u-%u\n"),
+				agno, agbno, agbno + blen - 1);
+			break;
+		}
+
+		agbno += blen;
+	} while (agbno < ag_end);
+}
+
 void
 phase4(xfs_mount_t *mp)
 {
 	ino_tree_node_t		*irec;
-	xfs_rtxnum_t		rtx;
-	xfs_rtxnum_t		rt_start;
-	xfs_rtxlen_t		rt_len;
 	xfs_agnumber_t		i;
-	xfs_agblock_t		j;
-	xfs_agblock_t		ag_end;
-	xfs_extlen_t		blen;
 	int			ag_hdr_len = 4 * mp->m_sb.sb_sectsize;
 	int			ag_hdr_block;
-	int			bstate;
 
 	if (rmap_needs_work(mp))
 		collect_rmaps = true;
@@ -281,102 +383,19 @@ phase4(xfs_mount_t *mp)
 	}
 
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)  {
+		xfs_agblock_t		ag_end;
+
 		ag_end = (i < mp->m_sb.sb_agcount - 1) ? mp->m_sb.sb_agblocks :
 			mp->m_sb.sb_dblocks -
 				(xfs_rfsblock_t) mp->m_sb.sb_agblocks * i;
 
-		/*
-		 * set up duplicate extent list for this ag
-		 */
-		for (j = ag_hdr_block; j < ag_end; j += blen)  {
-			bstate = get_bmap_ext(i, j, ag_end, &blen);
-			switch (bstate) {
-			case XR_E_FREE1:
-				if (no_modify)
-					do_warn(
-	_("free space (%u,%u-%u) only seen by one free space btree\n"),
-						i, j, j + blen - 1);
-				break;
-			case XR_E_BAD_STATE:
-			default:
-				do_warn(
-				_("unknown block state, ag %d, blocks %u-%u\n"),
-					i, j, j + blen - 1);
-				fallthrough;
-			case XR_E_METADATA:
-			case XR_E_UNKNOWN:
-			case XR_E_FREE:
-			case XR_E_INUSE:
-			case XR_E_INUSE_FS:
-			case XR_E_INO:
-			case XR_E_FS_MAP:
-				break;
-			case XR_E_MULT:
-				add_dup_extent(i, j, blen);
-				break;
-			}
-		}
+		process_dup_extents(i, ag_hdr_block, ag_end);
 
 		PROG_RPT_INC(prog_rpt_done[i], 1);
 	}
 	print_final_rpt();
 
-	/*
-	 * initialize realtime bitmap
-	 */
-	rt_start = 0;
-	rt_len = 0;
-
-	for (rtx = 0; rtx < mp->m_sb.sb_rextents; rtx++)  {
-		bstate = get_rtbmap(rtx);
-		switch (bstate)  {
-		case XR_E_BAD_STATE:
-		default:
-			do_warn(
-	_("unknown rt extent state, extent %" PRIu64 "\n"),
-				rtx);
-			fallthrough;
-		case XR_E_METADATA:
-		case XR_E_UNKNOWN:
-		case XR_E_FREE1:
-		case XR_E_FREE:
-		case XR_E_INUSE:
-		case XR_E_INUSE_FS:
-		case XR_E_INO:
-		case XR_E_FS_MAP:
-			if (rt_start == 0)
-				continue;
-			else  {
-				/*
-				 * add extent and reset extent state
-				 */
-				add_rt_dup_extent(rt_start, rt_len);
-				rt_start = 0;
-				rt_len = 0;
-			}
-			break;
-		case XR_E_MULT:
-			if (rt_start == 0)  {
-				rt_start = rtx;
-				rt_len = 1;
-			} else if (rt_len == XFS_MAX_BMBT_EXTLEN)  {
-				/*
-				 * large extent case
-				 */
-				add_rt_dup_extent(rt_start, rt_len);
-				rt_start = rtx;
-				rt_len = 1;
-			} else
-				rt_len++;
-			break;
-		}
-	}
-
-	/*
-	 * catch tail-case, extent hitting the end of the ag
-	 */
-	if (rt_start != 0)
-		add_rt_dup_extent(rt_start, rt_len);
+	process_dup_rt_extents(mp);
 
 	/*
 	 * initialize bitmaps for all AGs
@@ -410,8 +429,7 @@ phase4(xfs_mount_t *mp)
 	/*
 	 * free up memory used to track realtime duplicate extents
 	 */
-	if (rt_start != 0)
-		free_rt_dup_extent_tree(mp);
+	free_rt_dup_extent_tree(mp);
 
 	/*
 	 * ensure consistency of quota inode pointers in superblock,


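The realtime half of the phase 4 refactor above walks a per-extent state array and coalesces adjacent multiply-claimed extents into runs, splitting any run that reaches a maximum length. A self-contained sketch of that scan follows. All names are hypothetical, the state array is reduced to a boolean "duplicate" flag, and the sketch tracks an explicit in_run flag instead of treating a start of 0 as "no run", which is a choice made for this example rather than something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_run {
	uint64_t	start;
	uint32_t	len;
};

/* Record one finished run if there is room; always count it. */
static void
demo_emit(struct demo_run *out, size_t cap, size_t *nr, struct demo_run run)
{
	if (*nr < cap)
		out[*nr] = run;
	(*nr)++;
}

/*
 * Scan dup[0..n) and collect each run of duplicate extents, splitting
 * runs that reach max_len.  Returns the number of runs found.
 */
static size_t
demo_collect_runs(
	const bool	*dup,
	uint64_t	n,
	uint32_t	max_len,
	struct demo_run	*out,
	size_t		cap)
{
	struct demo_run	cur = { 0, 0 };
	bool		in_run = false;
	size_t		nr = 0;
	uint64_t	i;

	assert(max_len > 0);
	for (i = 0; i < n; i++) {
		if (!dup[i]) {
			/* A non-duplicate extent ends any open run. */
			if (in_run)
				demo_emit(out, cap, &nr, cur);
			in_run = false;
			continue;
		}
		if (in_run && cur.len == max_len) {
			/* Large extent case: flush and start over. */
			demo_emit(out, cap, &nr, cur);
			in_run = false;
		}
		if (!in_run) {
			cur.start = i;
			cur.len = 1;
			in_run = true;
		} else {
			cur.len++;
		}
	}

	/* Catch the tail case: a run that hits the end of the array. */
	if (in_run)
		demo_emit(out, cap, &nr, cur);
	return nr;
}
```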

* [PATCH 16/51] xfs_repair: refactor offsetof+sizeof to offsetofend
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (14 preceding siblings ...)
  2024-12-23 22:15   ` [PATCH 15/51] xfs_repair: refactor phase4 Darrick J. Wong
@ 2024-12-23 22:15   ` Darrick J. Wong
  2024-12-23 22:16   ` [PATCH 17/51] xfs_repair: improve rtbitmap discrepancy reporting Darrick J. Wong
                     ` (34 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:15 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Replace the open-coded offsetof + sizeof calculations in
secondary_sb_whack with the kernel's offsetofend macro before we start
adding more in the next patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xfs.h     |   15 +++++++++++++++
 repair/agheader.c |   21 +++++++--------------
 2 files changed, 22 insertions(+), 14 deletions(-)


diff --git a/include/xfs.h b/include/xfs.h
index e97158c8d223f5..a2f159a586ee71 100644
--- a/include/xfs.h
+++ b/include/xfs.h
@@ -38,8 +38,23 @@ extern int xfs_assert_largefile[sizeof(off_t)-8];
 #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
 #endif
 
+/**
+ * sizeof_field() - Report the size of a struct field in bytes
+ *
+ * @TYPE: The structure containing the field of interest
+ * @MEMBER: The field to return the size of
+ */
 #define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
 
+/**
+ * offsetofend() - Report the offset of the first byte after a struct field
+ *
+ * @TYPE: The type of the structure
+ * @MEMBER: The member within the structure to get the end offset of
+ */
+#define offsetofend(TYPE, MEMBER) \
+	(offsetof(TYPE, MEMBER)	+ sizeof_field(TYPE, MEMBER))
+
 #include <xfs/xfs_types.h>
 /* Include deprecated/compat pre-vfs xfs-specific symbols */
 #include <xfs/xfs_fs_compat.h>
diff --git a/repair/agheader.c b/repair/agheader.c
index fe58d833b8bafa..14aedece3d07b0 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -364,26 +364,19 @@ secondary_sb_whack(
 	 * size is the size of data which is valid for this sb.
 	 */
 	if (xfs_sb_version_hasmetadir(sb))
-		size = offsetof(struct xfs_dsb, sb_metadirino)
-			+ sizeof(sb->sb_metadirino);
+		size = offsetofend(struct xfs_dsb, sb_metadirino);
 	else if (xfs_sb_version_hasmetauuid(sb))
-		size = offsetof(struct xfs_dsb, sb_meta_uuid)
-			+ sizeof(sb->sb_meta_uuid);
+		size = offsetofend(struct xfs_dsb, sb_meta_uuid);
 	else if (xfs_sb_version_hascrc(sb))
-		size = offsetof(struct xfs_dsb, sb_lsn)
-			+ sizeof(sb->sb_lsn);
+		size = offsetofend(struct xfs_dsb, sb_lsn);
 	else if (xfs_sb_version_hasmorebits(sb))
-		size = offsetof(struct xfs_dsb, sb_bad_features2)
-			+ sizeof(sb->sb_bad_features2);
+		size = offsetofend(struct xfs_dsb, sb_bad_features2);
 	else if (xfs_sb_version_haslogv2(sb))
-		size = offsetof(struct xfs_dsb, sb_logsunit)
-			+ sizeof(sb->sb_logsunit);
+		size = offsetofend(struct xfs_dsb, sb_logsunit);
 	else if (xfs_sb_version_hassector(sb))
-		size = offsetof(struct xfs_dsb, sb_logsectsize)
-			+ sizeof(sb->sb_logsectsize);
+		size = offsetofend(struct xfs_dsb, sb_logsectsize);
 	else /* only support dirv2 or more recent */
-		size = offsetof(struct xfs_dsb, sb_dirblklog)
-			+ sizeof(sb->sb_dirblklog);
+		size = offsetofend(struct xfs_dsb, sb_dirblklog);
 
 	/* Check the buffer we read from disk for garbage outside size */
 	for (ip = (char *)sbuf->b_addr + size;


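As a small illustration of what offsetofend computes, the sketch below applies the two macros from the patch above to a made-up versioned header. `struct demo_hdr` and `demo_valid_size` are hypothetical and exist only for this example; they mimic how secondary_sb_whack picks the number of valid bytes from the newest feature present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))

/* Hypothetical on-disk header that grew new fields over time. */
struct demo_hdr {
	uint32_t	magic;
	uint32_t	version;
	uint64_t	lsn;		/* added in version 2 */
	uint8_t		uuid[16];	/* added in version 3 */
};

/* Number of leading bytes that are meaningful for a given version. */
static size_t
demo_valid_size(unsigned int version)
{
	if (version >= 3)
		return offsetofend(struct demo_hdr, uuid);
	if (version >= 2)
		return offsetofend(struct demo_hdr, lsn);
	return offsetofend(struct demo_hdr, version);
}
```

Everything past the returned size is expected to be zero for that version, so a checker can scan from that offset to the end of the buffer for garbage.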

* [PATCH 17/51] xfs_repair: improve rtbitmap discrepancy reporting
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (15 preceding siblings ...)
  2024-12-23 22:15   ` [PATCH 16/51] xfs_repair: refactor offsetof+sizeof to offsetofend Darrick J. Wong
@ 2024-12-23 22:16   ` Darrick J. Wong
  2024-12-23 22:16   ` [PATCH 18/51] xfs_repair: simplify rt_lock handling Darrick J. Wong
                     ` (33 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:16 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Improve the reporting of discrepancies in the realtime bitmap and
summary files by creating a separate helper function that will pinpoint
the exact (word) locations of mismatches.  This will help developers to
diagnose problems with the rtgroups feature and users to figure out
exactly what's bad in a filesystem.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/rt.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)


diff --git a/repair/rt.c b/repair/rt.c
index 65d6a713f379d2..a3378ef1dd0af2 100644
--- a/repair/rt.c
+++ b/repair/rt.c
@@ -131,6 +131,44 @@ _("couldn't allocate memory for incore realtime summary info.\n"));
 	}
 }
 
+static void
+check_rtwords(
+	struct xfs_mount	*mp,
+	const char		*filename,
+	unsigned long long	bno,
+	void			*ondisk,
+	void			*incore)
+{
+	unsigned int		wordcnt = mp->m_blockwsize;
+	union xfs_rtword_raw	*o = ondisk, *i = incore;
+	int			badstart = -1;
+	unsigned int		j;
+
+	if (memcmp(ondisk, incore, wordcnt << XFS_WORDLOG) == 0)
+		return;
+
+	for (j = 0; j < wordcnt; j++, o++, i++) {
+		if (o->old == i->old) {
+			/* Report a range of inconsistency that just ended. */
+			if (badstart >= 0)
+				do_warn(
+ _("discrepancy in %s at dblock 0x%llx words 0x%x-0x%x/0x%x\n"),
+					filename, bno, badstart, j - 1, wordcnt);
+			badstart = -1;
+			continue;
+		}
+
+		if (badstart == -1)
+			badstart = j;
+	}
+
+	if (badstart >= 0)
+		do_warn(
+ _("discrepancy in %s at dblock 0x%llx words 0x%x-0x%x/0x%x\n"),
+					filename, bno, badstart, wordcnt - 1,
+					wordcnt);
+}
+
 static void
 check_rtfile_contents(
 	struct xfs_mount	*mp,
@@ -203,9 +241,7 @@ check_rtfile_contents(
 			break;
 		}
 
-		if (memcmp(bp->b_addr, buf, mp->m_blockwsize << XFS_WORDLOG))
-			do_warn(_("discrepancy in %s at dblock 0x%llx\n"),
-					filename, (unsigned long long)bno);
+		check_rtwords(mp, filename, bno, bp->b_addr, buf);
 
 		buf += XFS_FSB_TO_B(mp, map.br_blockcount);
 		bno += map.br_blockcount;


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 18/51] xfs_repair: simplify rt_lock handling
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (16 preceding siblings ...)
  2024-12-23 22:16   ` [PATCH 17/51] xfs_repair: improve rtbitmap discrepancy reporting Darrick J. Wong
@ 2024-12-23 22:16   ` Darrick J. Wong
  2024-12-23 22:16   ` [PATCH 19/51] xfs_repair: add a real per-AG bitmap abstraction Darrick J. Wong
                     ` (32 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:16 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

No need to cacheline align rt_lock if we move it next to the data
it protects.  Also reduce the critical section to just where those
data structures are accessed.
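
As a rough illustration of the pattern (not the xfs_repair code
itself), the sketch below keeps a mutex next to the data it guards and
holds it only while that data is touched.  demo_bitmap, demo_lock and
demo_claim_bit() are hypothetical names invented for this example.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state: the lock sits next to the data it guards. */
static uint64_t		demo_bitmap;
static pthread_mutex_t	demo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Claim one bit in the shared bitmap.  Returns true if this caller set
 * the bit, false if it was already set or the bit number is invalid.
 */
static bool
demo_claim_bit(
	unsigned int		bit)
{
	bool			claimed;

	/* Validation touches no shared state, so it runs unlocked. */
	if (bit >= 64)
		return false;

	/* The critical section covers only the bitmap accesses. */
	pthread_mutex_lock(&demo_lock);
	claimed = !(demo_bitmap & (UINT64_C(1) << bit));
	demo_bitmap |= UINT64_C(1) << bit;
	pthread_mutex_unlock(&demo_lock);

	return claimed;
}
```

Callers no longer need to know about the lock at all, which is the
same effect the patch gets by moving the lock calls from the caller
into process_rt_rec().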

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/dinode.c  |    6 +++---
 repair/globals.c |    1 -
 repair/globals.h |    2 +-
 repair/incore.c  |    8 ++++----
 4 files changed, 8 insertions(+), 9 deletions(-)


diff --git a/repair/dinode.c b/repair/dinode.c
index 56c7257d3766f1..8c593da545cd7f 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -304,7 +304,7 @@ process_rt_rec(
 	bool			zap_metadata)
 {
 	xfs_fsblock_t		lastb;
-	int			bad;
+	int			bad = 0;
 
 	/*
 	 * check numeric validity of the extent
@@ -338,10 +338,12 @@ _("inode %" PRIu64 " - bad rt extent overflows - start %" PRIu64 ", "
 		return 1;
 	}
 
+	pthread_mutex_lock(&rt_lock);
 	if (check_dups)
 		bad = process_rt_rec_dups(mp, ino, irec);
 	else
 		bad = process_rt_rec_state(mp, ino, zap_metadata, irec);
+	pthread_mutex_unlock(&rt_lock);
 	if (bad)
 		return bad;
 
@@ -451,10 +453,8 @@ _("zero length extent (off = %" PRIu64 ", fsbno = %" PRIu64 ") in ino %" PRIu64
 		}
 
 		if (type == XR_INO_RTDATA && whichfork == XFS_DATA_FORK) {
-			pthread_mutex_lock(&rt_lock.lock);
 			error2 = process_rt_rec(mp, &irec, ino, tot, check_dups,
 					zap_metadata);
-			pthread_mutex_unlock(&rt_lock.lock);
 			if (error2)
 				return error2;
 
diff --git a/repair/globals.c b/repair/globals.c
index bd07a9656d193b..d97e2a8d2d6d9b 100644
--- a/repair/globals.c
+++ b/repair/globals.c
@@ -112,7 +112,6 @@ uint32_t	sb_unit;
 uint32_t	sb_width;
 
 struct aglock	*ag_locks;
-struct aglock	rt_lock;
 
 time_t		report_interval;
 uint64_t	*prog_rpt_done;
diff --git a/repair/globals.h b/repair/globals.h
index ebe8d5ee132b8d..db8afabd9f0fc9 100644
--- a/repair/globals.h
+++ b/repair/globals.h
@@ -156,7 +156,7 @@ struct aglock {
 	pthread_mutex_t	lock __attribute__((__aligned__(64)));
 };
 extern struct aglock	*ag_locks;
-extern struct aglock	rt_lock;
+extern pthread_mutex_t	rt_lock;
 
 extern time_t		report_interval;
 extern uint64_t		*prog_rpt_done;
diff --git a/repair/incore.c b/repair/incore.c
index 06edaf0d605262..21f5b05d3e93e4 100644
--- a/repair/incore.c
+++ b/repair/incore.c
@@ -166,6 +166,7 @@ get_bmap_ext(
 
 static uint64_t		*rt_bmap;
 static size_t		rt_bmap_size;
+pthread_mutex_t		rt_lock;
 
 /* block records fit into uint64_t's units */
 #define XR_BB_UNIT	64			/* number of bits/unit */
@@ -209,6 +210,7 @@ init_rt_bmap(
 	if (mp->m_sb.sb_rextents == 0)
 		return;
 
+	pthread_mutex_init(&rt_lock, NULL);
 	rt_bmap_size = roundup(howmany(mp->m_sb.sb_rextents, (NBBY / XR_BB)),
 			       sizeof(uint64_t));
 
@@ -226,8 +228,9 @@ free_rt_bmap(xfs_mount_t *mp)
 {
 	free(rt_bmap);
 	rt_bmap = NULL;
-}
+	pthread_mutex_destroy(&rt_lock);
 
+}
 
 void
 reset_bmaps(xfs_mount_t *mp)
@@ -290,7 +293,6 @@ init_bmaps(xfs_mount_t *mp)
 		btree_init(&ag_bmap[i]);
 		pthread_mutex_init(&ag_locks[i].lock, NULL);
 	}
-	pthread_mutex_init(&rt_lock.lock, NULL);
 
 	init_rt_bmap(mp);
 	reset_bmaps(mp);
@@ -301,8 +303,6 @@ free_bmaps(xfs_mount_t *mp)
 {
 	xfs_agnumber_t i;
 
-	pthread_mutex_destroy(&rt_lock.lock);
-
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		pthread_mutex_destroy(&ag_locks[i].lock);
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 19/51] xfs_repair: add a real per-AG bitmap abstraction
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (17 preceding siblings ...)
  2024-12-23 22:16   ` [PATCH 18/51] xfs_repair: simplify rt_lock handling Darrick J. Wong
@ 2024-12-23 22:16   ` Darrick J. Wong
  2024-12-23 22:16   ` [PATCH 20/51] xfs_repair: support realtime groups Darrick J. Wong
                     ` (31 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:16 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Add a struct bmap that contains the btree root and the lock, and provide
helpers for locking instead of directly poking into the data structure.
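
A minimal sketch of this shape, assuming a plain counter in place of
the btree root, might look like the following.  struct demo_group and
the demo_*() helpers are hypothetical and exist only for this example;
unlike the patch, the sketch passes the array explicitly rather than
using a file-scope static.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-group record: a lock and the state it protects. */
struct demo_group {
	pthread_mutex_t		lock;
	unsigned long		nr_claimed;	/* stand-in for the btree root */
};

/* Allocate and initialize one record per group; NULL on failure. */
static struct demo_group *
demo_alloc_groups(
	unsigned int		nr_groups)
{
	struct demo_group	*groups;
	unsigned int		i;

	groups = calloc(nr_groups, sizeof(*groups));
	if (!groups)
		return NULL;

	for (i = 0; i < nr_groups; i++)
		pthread_mutex_init(&groups[i].lock, NULL);

	return groups;
}

/* Tear down every lock, then release the array itself. */
static void
demo_destroy_groups(
	struct demo_group	*groups,
	unsigned int		nr_groups)
{
	unsigned int		i;

	for (i = 0; i < nr_groups; i++)
		pthread_mutex_destroy(&groups[i].lock);

	free(groups);
}

/* Callers lock by group number and never see the mutex directly. */
static void
demo_lock_group(
	struct demo_group	*groups,
	unsigned int		gno)
{
	pthread_mutex_lock(&groups[gno].lock);
}

static void
demo_unlock_group(
	struct demo_group	*groups,
	unsigned int		gno)
{
	pthread_mutex_unlock(&groups[gno].lock);
}
```

Keeping allocation and teardown in a matched pair of helpers means the
lock and the data always share one lifetime, so there is no separate
lock array to keep in sync.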

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/dino_chunks.c |   23 +++++-----
 repair/dinode.c      |    6 +--
 repair/globals.c     |    2 -
 repair/globals.h     |    4 --
 repair/incore.c      |  114 ++++++++++++++++++++++++++++++++------------------
 repair/incore.h      |    3 +
 repair/rmap.c        |    4 +-
 repair/scan.c        |    8 ++--
 8 files changed, 97 insertions(+), 67 deletions(-)


diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index 0cbc498101ec72..3c650dac8a4d8e 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -132,7 +132,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 		if (check_aginode_block(mp, agno, agino) == 0)
 			return 0;
 
-		pthread_mutex_lock(&ag_locks[agno].lock);
+		lock_ag(agno);
 
 		state = get_bmap(agno, agbno);
 		switch (state) {
@@ -167,8 +167,8 @@ verify_inode_chunk(xfs_mount_t		*mp,
 		_("inode block %d/%d multiply claimed, (state %d)\n"),
 				agno, agbno, state);
 			set_bmap(agno, agbno, XR_E_MULT);
-			pthread_mutex_unlock(&ag_locks[agno].lock);
-			return(0);
+			unlock_ag(agno);
+			return 0;
 		default:
 			do_warn(
 		_("inode block %d/%d bad state, (state %d)\n"),
@@ -177,7 +177,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 			break;
 		}
 
-		pthread_mutex_unlock(&ag_locks[agno].lock);
+		unlock_ag(agno);
 
 		start_agino = XFS_AGB_TO_AGINO(mp, agbno);
 		*start_ino = XFS_AGINO_TO_INO(mp, agno, start_agino);
@@ -424,7 +424,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 	 * user data -- we're probably here as a result of a directory
 	 * entry or an iunlinked pointer
 	 */
-	pthread_mutex_lock(&ag_locks[agno].lock);
+	lock_ag(agno);
 	for (cur_agbno = chunk_start_agbno;
 	     cur_agbno < chunk_stop_agbno;
 	     cur_agbno += blen)  {
@@ -438,7 +438,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 	_("inode block %d/%d multiply claimed, (state %d)\n"),
 				agno, cur_agbno, state);
 			set_bmap_ext(agno, cur_agbno, blen, XR_E_MULT);
-			pthread_mutex_unlock(&ag_locks[agno].lock);
+			unlock_ag(agno);
 			return 0;
 		case XR_E_METADATA:
 		case XR_E_INO:
@@ -450,7 +450,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 			break;
 		}
 	}
-	pthread_mutex_unlock(&ag_locks[agno].lock);
+	unlock_ag(agno);
 
 	/*
 	 * ok, chunk is good.  put the record into the tree if required,
@@ -473,8 +473,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 
 	set_inode_used(irec_p, agino - start_agino);
 
-	pthread_mutex_lock(&ag_locks[agno].lock);
-
+	lock_ag(agno);
 	for (cur_agbno = chunk_start_agbno;
 	     cur_agbno < chunk_stop_agbno;
 	     cur_agbno += blen)  {
@@ -516,7 +515,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 			break;
 		}
 	}
-	pthread_mutex_unlock(&ag_locks[agno].lock);
+	unlock_ag(agno);
 
 	return(ino_cnt);
 }
@@ -575,7 +574,7 @@ process_inode_agbno_state(
 {
 	int state;
 
-	pthread_mutex_lock(&ag_locks[agno].lock);
+	lock_ag(agno);
 	state = get_bmap(agno, agbno);
 	switch (state) {
 	case XR_E_INO:	/* already marked */
@@ -605,7 +604,7 @@ process_inode_agbno_state(
 			XFS_AGB_TO_FSB(mp, agno, agbno), state);
 		break;
 	}
-	pthread_mutex_unlock(&ag_locks[agno].lock);
+	unlock_ag(agno);
 }
 
 /*
diff --git a/repair/dinode.c b/repair/dinode.c
index 8c593da545cd7f..916aadc782248f 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -540,9 +540,9 @@ _("Fatal error: inode %" PRIu64 " - blkmap_set_ext(): %s\n"
 		ebno = agbno + irec.br_blockcount;
 		if (agno != locked_agno) {
 			if (locked_agno != -1)
-				pthread_mutex_unlock(&ag_locks[locked_agno].lock);
-			pthread_mutex_lock(&ag_locks[agno].lock);
+				unlock_ag(locked_agno);
 			locked_agno = agno;
+			lock_ag(locked_agno);
 		}
 
 		for (b = irec.br_startblock;
@@ -663,7 +663,7 @@ _("illegal state %d in block map %" PRIu64 "\n"),
 	error = 0;
 done:
 	if (locked_agno != -1)
-		pthread_mutex_unlock(&ag_locks[locked_agno].lock);
+		unlock_ag(locked_agno);
 
 	if (i != *numrecs) {
 		ASSERT(i < *numrecs);
diff --git a/repair/globals.c b/repair/globals.c
index d97e2a8d2d6d9b..30995f5298d5a1 100644
--- a/repair/globals.c
+++ b/repair/globals.c
@@ -111,8 +111,6 @@ xfs_extlen_t	sb_inoalignmt;
 uint32_t	sb_unit;
 uint32_t	sb_width;
 
-struct aglock	*ag_locks;
-
 time_t		report_interval;
 uint64_t	*prog_rpt_done;
 
diff --git a/repair/globals.h b/repair/globals.h
index db8afabd9f0fc9..7c2d9c56c8f8a7 100644
--- a/repair/globals.h
+++ b/repair/globals.h
@@ -152,10 +152,6 @@ extern xfs_extlen_t	sb_inoalignmt;
 extern uint32_t	sb_unit;
 extern uint32_t	sb_width;
 
-struct aglock {
-	pthread_mutex_t	lock __attribute__((__aligned__(64)));
-};
-extern struct aglock	*ag_locks;
 extern pthread_mutex_t	rt_lock;
 
 extern time_t		report_interval;
diff --git a/repair/incore.c b/repair/incore.c
index 21f5b05d3e93e4..fb9ebee1671d4f 100644
--- a/repair/incore.c
+++ b/repair/incore.c
@@ -24,7 +24,25 @@
 static int states[16] =
 	{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
 
-static struct btree_root	**ag_bmap;
+struct bmap {
+	pthread_mutex_t		lock __attribute__((__aligned__(64)));
+	struct btree_root	*root;
+};
+static struct bmap	*ag_bmaps;
+
+void
+lock_ag(
+	xfs_agnumber_t		agno)
+{
+	pthread_mutex_lock(&ag_bmaps[agno].lock);
+}
+
+void
+unlock_ag(
+	xfs_agnumber_t		agno)
+{
+	pthread_mutex_unlock(&ag_bmaps[agno].lock);
+}
 
 static void
 update_bmap(
@@ -129,7 +147,7 @@ set_bmap_ext(
 	xfs_extlen_t		blen,
 	int			state)
 {
-	update_bmap(ag_bmap[agno], agbno, blen, &states[state]);
+	update_bmap(ag_bmaps[agno].root, agbno, blen, &states[state]);
 }
 
 int
@@ -139,23 +157,24 @@ get_bmap_ext(
 	xfs_agblock_t		maxbno,
 	xfs_extlen_t		*blen)
 {
+	struct btree_root	*bmap = ag_bmaps[agno].root;
 	int			*statep;
 	unsigned long		key;
 
-	statep = btree_find(ag_bmap[agno], agbno, &key);
+	statep = btree_find(bmap, agbno, &key);
 	if (!statep)
 		return -1;
 
 	if (key == agbno) {
 		if (blen) {
-			if (!btree_peek_next(ag_bmap[agno], &key))
+			if (!btree_peek_next(bmap, &key))
 				return -1;
 			*blen = min(maxbno, key) - agbno;
 		}
 		return *statep;
 	}
 
-	statep = btree_peek_prev(ag_bmap[agno], NULL);
+	statep = btree_peek_prev(bmap, NULL);
 	if (!statep)
 		return -1;
 	if (blen)
@@ -243,13 +262,15 @@ reset_bmaps(xfs_mount_t *mp)
 	ag_size = mp->m_sb.sb_agblocks;
 
 	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		struct btree_root	*bmap = ag_bmaps[agno].root;
+
 		if (agno == mp->m_sb.sb_agcount - 1)
 			ag_size = (xfs_extlen_t)(mp->m_sb.sb_dblocks -
 				   (xfs_rfsblock_t)mp->m_sb.sb_agblocks * agno);
 #ifdef BTREE_STATS
-		if (btree_find(ag_bmap[agno], 0, NULL)) {
+		if (btree_find(bmap, 0, NULL)) {
 			printf("ag_bmap[%d] btree stats:\n", i);
-			btree_print_stats(ag_bmap[agno], stdout);
+			btree_print_stats(bmap, stdout);
 		}
 #endif
 		/*
@@ -260,11 +281,10 @@ reset_bmaps(xfs_mount_t *mp)
 		 *	ag_hdr_block..ag_size:		XR_E_UNKNOWN
 		 *	ag_size...			XR_E_BAD_STATE
 		 */
-		btree_clear(ag_bmap[agno]);
-		btree_insert(ag_bmap[agno], 0, &states[XR_E_INUSE_FS]);
-		btree_insert(ag_bmap[agno],
-				ag_hdr_block, &states[XR_E_UNKNOWN]);
-		btree_insert(ag_bmap[agno], ag_size, &states[XR_E_BAD_STATE]);
+		btree_clear(bmap);
+		btree_insert(bmap, 0, &states[XR_E_INUSE_FS]);
+		btree_insert(bmap, ag_hdr_block, &states[XR_E_UNKNOWN]);
+		btree_insert(bmap, ag_size, &states[XR_E_BAD_STATE]);
 	}
 
 	if (mp->m_sb.sb_logstart != 0) {
@@ -276,44 +296,58 @@ reset_bmaps(xfs_mount_t *mp)
 	reset_rt_bmap();
 }
 
+static struct bmap *
+alloc_bmaps(
+	unsigned int		nr_groups)
+{
+	struct bmap		*bmap;
+	unsigned int		i;
+
+	bmap = calloc(nr_groups, sizeof(*bmap));
+	if (!bmap)
+		return NULL;
+
+	for (i = 0; i < nr_groups; i++)  {
+		btree_init(&bmap[i].root);
+		pthread_mutex_init(&bmap[i].lock, NULL);
+	}
+
+	return bmap;
+}
+
+static void
+destroy_bmaps(
+	struct bmap		*bmap,
+	unsigned int		nr_groups)
+{
+	unsigned int		i;
+
+	for (i = 0; i < nr_groups; i++) {
+		btree_destroy(bmap[i].root);
+		pthread_mutex_destroy(&bmap[i].lock);
+	}
+
+	free(bmap);
+}
+
 void
-init_bmaps(xfs_mount_t *mp)
+init_bmaps(
+	struct xfs_mount	*mp)
 {
-	xfs_agnumber_t i;
-
-	ag_bmap = calloc(mp->m_sb.sb_agcount, sizeof(struct btree_root *));
-	if (!ag_bmap)
+	ag_bmaps = alloc_bmaps(mp->m_sb.sb_agcount + mp->m_sb.sb_rgcount);
+	if (!ag_bmaps)
 		do_error(_("couldn't allocate block map btree roots\n"));
 
-	ag_locks = calloc(mp->m_sb.sb_agcount, sizeof(struct aglock));
-	if (!ag_locks)
-		do_error(_("couldn't allocate block map locks\n"));
-
-	for (i = 0; i < mp->m_sb.sb_agcount; i++)  {
-		btree_init(&ag_bmap[i]);
-		pthread_mutex_init(&ag_locks[i].lock, NULL);
-	}
-
 	init_rt_bmap(mp);
 	reset_bmaps(mp);
 }
 
 void
-free_bmaps(xfs_mount_t *mp)
+free_bmaps(
+	struct xfs_mount	*mp)
 {
-	xfs_agnumber_t i;
-
-	for (i = 0; i < mp->m_sb.sb_agcount; i++)
-		pthread_mutex_destroy(&ag_locks[i].lock);
-
-	free(ag_locks);
-	ag_locks = NULL;
-
-	for (i = 0; i < mp->m_sb.sb_agcount; i++)
-		btree_destroy(ag_bmap[i]);
-
-	free(ag_bmap);
-	ag_bmap = NULL;
+	destroy_bmaps(ag_bmaps, mp->m_sb.sb_agcount + mp->m_sb.sb_rgcount);
+	ag_bmaps = NULL;
 
 	free_rt_bmap(mp);
 }
diff --git a/repair/incore.h b/repair/incore.h
index 07716fc4c01a05..8385043580637f 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -23,6 +23,9 @@ void		init_bmaps(xfs_mount_t *mp);
 void		reset_bmaps(xfs_mount_t *mp);
 void		free_bmaps(xfs_mount_t *mp);
 
+void		lock_ag(xfs_agnumber_t agno);
+void		unlock_ag(xfs_agnumber_t agno);
+
 void		set_bmap_ext(xfs_agnumber_t agno, xfs_agblock_t agbno,
 			     xfs_extlen_t blen, int state);
 int		get_bmap_ext(xfs_agnumber_t agno, xfs_agblock_t agbno,
diff --git a/repair/rmap.c b/repair/rmap.c
index 3b998a22cee10d..1c6a8691b8cb2c 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -761,12 +761,12 @@ mark_reflink_inodes(
 		agno = XFS_INO_TO_AGNO(mp, rciter.ino);
 		agino = XFS_INO_TO_AGINO(mp, rciter.ino);
 
-		pthread_mutex_lock(&ag_locks[agno].lock);
+		lock_ag(agno);
 		irec = find_inode_rec(mp, agno, agino);
 		off = get_inode_offset(mp, rciter.ino, irec);
 		/* lock here because we might go outside this ag */
 		set_inode_is_rl(irec, off);
-		pthread_mutex_unlock(&ag_locks[agno].lock);
+		unlock_ag(agno);
 	}
 	rcbag_ino_iter_stop(rcstack, &rciter);
 }
diff --git a/repair/scan.c b/repair/scan.c
index ed73de4b2477bf..8b118423ce0457 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -340,7 +340,7 @@ _("bad back (left) sibling pointer (saw %llu should be NULL (0))\n"
 		agno = XFS_FSB_TO_AGNO(mp, bno);
 		agbno = XFS_FSB_TO_AGBNO(mp, bno);
 
-		pthread_mutex_lock(&ag_locks[agno].lock);
+		lock_ag(agno);
 		state = get_bmap(agno, agbno);
 		switch (state) {
 		case XR_E_INUSE1:
@@ -407,7 +407,7 @@ _("bad state %d, inode %" PRIu64 " bmap block 0x%" PRIx64 "\n"),
 				state, ino, bno);
 			break;
 		}
-		pthread_mutex_unlock(&ag_locks[agno].lock);
+		unlock_ag(agno);
 	} else {
 		if (search_dup_extent(XFS_FSB_TO_AGNO(mp, bno),
 				XFS_FSB_TO_AGBNO(mp, bno),
@@ -420,9 +420,9 @@ _("bad state %d, inode %" PRIu64 " bmap block 0x%" PRIx64 "\n"),
 	/* Record BMBT blocks in the reverse-mapping data. */
 	if (check_dups && collect_rmaps && !zap_metadata) {
 		agno = XFS_FSB_TO_AGNO(mp, bno);
-		pthread_mutex_lock(&ag_locks[agno].lock);
+		lock_ag(agno);
 		rmap_add_bmbt_rec(mp, ino, whichfork, bno);
-		pthread_mutex_unlock(&ag_locks[agno].lock);
+		unlock_ag(agno);
 	}
 
 	if (level == 0) {


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 20/51] xfs_repair: support realtime groups
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (18 preceding siblings ...)
  2024-12-23 22:16   ` [PATCH 19/51] xfs_repair: add a real per-AG bitmap abstraction Darrick J. Wong
@ 2024-12-23 22:16   ` Darrick J. Wong
  2024-12-23 22:17   ` [PATCH 21/51] xfs_repair: find and clobber rtgroup bitmap and summary files Darrick J. Wong
                     ` (30 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:16 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make repair aware of multiple rtgroups.  This now uses the same code as
the AG-based data device for block usage tracking instead of the less
optimal AVL trees and bitmaps used for the traditional RT device.  This
is done by introducing similar per-rtgroup space tracking structures as
we have for the AGs.
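
The idea of sharing one code path between AGs and rtgroups can be
sketched as a pair of parallel tables selected by a flag.  This is an
illustration only: the demo_* names, the fixed table sizes and the
single int of state per group are assumptions made for the example and
are not taken from xfs_repair.

```c
#include <stdbool.h>

/* Hypothetical fixed sizes; the real counts come from the superblock. */
enum { DEMO_NR_AGS = 4, DEMO_NR_RTGS = 2 };

/* Hypothetical per-group tracking record. */
struct demo_map {
	int			state;
};

static struct demo_map	demo_ag_maps[DEMO_NR_AGS];
static struct demo_map	demo_rtg_maps[DEMO_NR_RTGS];

/* Pick the table for the device the caller is working on. */
static inline struct demo_map *
demo_map_for_group(
	unsigned int		gno,
	bool			isrt)
{
	if (isrt)
		return &demo_rtg_maps[gno];
	return &demo_ag_maps[gno];
}

/* Setters and getters are shared; only the lookup differs. */
static void
demo_set_state(
	unsigned int		gno,
	int			state,
	bool			isrt)
{
	demo_map_for_group(gno, isrt)->state = state;
}

static int
demo_get_state(
	unsigned int		gno,
	bool			isrt)
{
	return demo_map_for_group(gno, isrt)->state;
}
```

Because group numbers on the two devices overlap, the flag has to
travel with every call; group 1 on the data device and group 1 on the
realtime device are distinct records.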

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    2 
 repair/agheader.c        |    8 -
 repair/agheader.h        |   10 +
 repair/dino_chunks.c     |   12 +
 repair/dinode.c          |  116 +++++++++----
 repair/dir2.c            |   13 +
 repair/incore.c          |  125 ++++++++++----
 repair/incore.h          |   37 +++-
 repair/incore_ext.c      |    3 
 repair/phase2.c          |   51 +++---
 repair/phase4.c          |   27 +++
 repair/phase5.c          |    2 
 repair/phase6.c          |  172 +++++++++++++++++++-
 repair/rt.c              |  403 ++++++++++++++++++++++++++++++++++++++--------
 repair/rt.h              |   20 ++
 repair/sb.c              |   41 +++++
 repair/scan.c            |   28 ++-
 repair/xfs_repair.c      |   15 +-
 18 files changed, 864 insertions(+), 221 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 50da547f8f21d4..84965106358d61 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -108,6 +108,7 @@
 #define xfs_calc_dquots_per_chunk	libxfs_calc_dquots_per_chunk
 #define xfs_cntbt_init_cursor		libxfs_cntbt_init_cursor
 #define xfs_compute_rextslog		libxfs_compute_rextslog
+#define xfs_compute_rgblklog		libxfs_compute_rgblklog
 #define xfs_create_space_res		libxfs_create_space_res
 #define xfs_da3_node_hdr_from_disk	libxfs_da3_node_hdr_from_disk
 #define xfs_da3_node_read		libxfs_da3_node_read
@@ -288,6 +289,7 @@
 #define xfs_rtsummary_create		libxfs_rtsummary_create
 
 #define xfs_rtgroup_alloc		libxfs_rtgroup_alloc
+#define xfs_rtgroup_extents		libxfs_rtgroup_extents
 #define xfs_rtgroup_grab		libxfs_rtgroup_grab
 #define xfs_rtgroup_rele		libxfs_rtgroup_rele
 
diff --git a/repair/agheader.c b/repair/agheader.c
index 14aedece3d07b0..fd279559aed973 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -319,12 +319,6 @@ check_v5_feature_mismatch(
 	return XR_AG_SB_SEC;
 }
 
-static inline bool xfs_sb_version_hasmetadir(const struct xfs_sb *sbp)
-{
-	return xfs_sb_is_v5(sbp) &&
-		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR);
-}
-
 /*
  * Possible fields that may have been set at mkfs time,
  * sb_inoalignmt, sb_unit, sb_width and sb_dirblklog.
@@ -364,7 +358,7 @@ secondary_sb_whack(
 	 * size is the size of data which is valid for this sb.
 	 */
 	if (xfs_sb_version_hasmetadir(sb))
-		size = offsetofend(struct xfs_dsb, sb_metadirino);
+		size = offsetofend(struct xfs_dsb, sb_pad);
 	else if (xfs_sb_version_hasmetauuid(sb))
 		size = offsetofend(struct xfs_dsb, sb_meta_uuid);
 	else if (xfs_sb_version_hascrc(sb))
diff --git a/repair/agheader.h b/repair/agheader.h
index b4f81d5537906d..c81147b6607d2b 100644
--- a/repair/agheader.h
+++ b/repair/agheader.h
@@ -3,6 +3,8 @@
  * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
  * All Rights Reserved.
  */
+#ifndef __XFS_REPAIR_AGHEADER_H__
+#define __XFS_REPAIR_AGHEADER_H__
 
 typedef struct fs_geometry  {
 	/*
@@ -82,3 +84,11 @@ typedef struct fs_geo_list  {
 #define XR_AG_AGF	0x2
 #define XR_AG_AGI	0x4
 #define XR_AG_SB_SEC	0x8
+
+static inline bool xfs_sb_version_hasmetadir(const struct xfs_sb *sbp)
+{
+	return xfs_sb_is_v5(sbp) &&
+		(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR);
+}
+
+#endif /* __XFS_REPAIR_AGHEADER_H__ */
diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index 3c650dac8a4d8e..8935cf856e70c8 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -428,7 +428,8 @@ verify_inode_chunk(xfs_mount_t		*mp,
 	for (cur_agbno = chunk_start_agbno;
 	     cur_agbno < chunk_stop_agbno;
 	     cur_agbno += blen)  {
-		state = get_bmap_ext(agno, cur_agbno, chunk_stop_agbno, &blen);
+		state = get_bmap_ext(agno, cur_agbno, chunk_stop_agbno, &blen,
+				false);
 		switch (state) {
 		case XR_E_MULT:
 		case XR_E_INUSE:
@@ -437,7 +438,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 			do_warn(
 	_("inode block %d/%d multiply claimed, (state %d)\n"),
 				agno, cur_agbno, state);
-			set_bmap_ext(agno, cur_agbno, blen, XR_E_MULT);
+			set_bmap_ext(agno, cur_agbno, blen, XR_E_MULT, false);
 			unlock_ag(agno);
 			return 0;
 		case XR_E_METADATA:
@@ -477,7 +478,8 @@ verify_inode_chunk(xfs_mount_t		*mp,
 	for (cur_agbno = chunk_start_agbno;
 	     cur_agbno < chunk_stop_agbno;
 	     cur_agbno += blen)  {
-		state = get_bmap_ext(agno, cur_agbno, chunk_stop_agbno, &blen);
+		state = get_bmap_ext(agno, cur_agbno, chunk_stop_agbno, &blen,
+				false);
 		switch (state) {
 		case XR_E_INO:
 			do_error(
@@ -497,7 +499,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 		case XR_E_UNKNOWN:
 		case XR_E_FREE1:
 		case XR_E_FREE:
-			set_bmap_ext(agno, cur_agbno, blen, XR_E_INO);
+			set_bmap_ext(agno, cur_agbno, blen, XR_E_INO, false);
 			break;
 		case XR_E_MULT:
 		case XR_E_INUSE:
@@ -511,7 +513,7 @@ verify_inode_chunk(xfs_mount_t		*mp,
 			do_warn(
 		_("inode block %d/%d bad state, (state %d)\n"),
 				agno, cur_agbno, state);
-			set_bmap_ext(agno, cur_agbno, blen, XR_E_INO);
+			set_bmap_ext(agno, cur_agbno, blen, XR_E_INO, false);
 			break;
 		}
 	}
diff --git a/repair/dinode.c b/repair/dinode.c
index 916aadc782248f..0a9059db9302a3 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -21,6 +21,7 @@
 #include "slab.h"
 #include "rmap.h"
 #include "bmap_repair.h"
+#include "rt.h"
 
 /*
  * gettext lookups for translations of strings use mutexes internally to
@@ -171,20 +172,33 @@ clear_dinode(xfs_mount_t *mp, struct xfs_dinode *dino, xfs_ino_t ino_num)
 static __inline int
 verify_dfsbno_range(
 	struct xfs_mount	*mp,
-	xfs_fsblock_t		fsbno,
-	xfs_filblks_t		count)
+	struct xfs_bmbt_irec	*irec,
+	bool			isrt)
 {
+	xfs_fsblock_t		end =
+		irec->br_startblock + irec->br_blockcount - 1;
+
 	/* the start and end blocks better be in the same allocation group */
-	if (XFS_FSB_TO_AGNO(mp, fsbno) !=
-	    XFS_FSB_TO_AGNO(mp, fsbno + count - 1)) {
-		return XR_DFSBNORANGE_OVERFLOW;
+	if (isrt) {
+		if (xfs_rtb_to_rgno(mp, irec->br_startblock) !=
+		    xfs_rtb_to_rgno(mp, end))
+			return XR_DFSBNORANGE_OVERFLOW;
+
+		if (!libxfs_verify_rtbno(mp, irec->br_startblock))
+			return XR_DFSBNORANGE_BADSTART;
+		if (!libxfs_verify_rtbno(mp, end))
+			return XR_DFSBNORANGE_BADEND;
+	} else {
+		if (XFS_FSB_TO_AGNO(mp, irec->br_startblock) !=
+		    XFS_FSB_TO_AGNO(mp, end))
+			return XR_DFSBNORANGE_OVERFLOW;
+
+		if (!libxfs_verify_fsbno(mp, irec->br_startblock))
+			return XR_DFSBNORANGE_BADSTART;
+		if (!libxfs_verify_fsbno(mp, end))
+			return XR_DFSBNORANGE_BADEND;
 	}
 
-	if (!libxfs_verify_fsbno(mp, fsbno))
-		return XR_DFSBNORANGE_BADSTART;
-	if (!libxfs_verify_fsbno(mp, fsbno + count - 1))
-		return XR_DFSBNORANGE_BADEND;
-
 	return XR_DFSBNORANGE_VALID;
 }
 
@@ -387,17 +401,21 @@ process_bmbt_reclist_int(
 	xfs_extnum_t		i;
 	int			state;
 	xfs_agnumber_t		agno;
-	xfs_agblock_t		agbno;
+	xfs_agblock_t		agbno, first_agbno;
 	xfs_agblock_t		ebno;
 	xfs_extlen_t		blen;
 	xfs_agnumber_t		locked_agno = -1;
 	int			error = 1;
 	int			error2;
+	bool			isrt = false;
 
-	if (type == XR_INO_RTDATA)
+	if (type == XR_INO_RTDATA) {
+		if (whichfork == XFS_DATA_FORK)
+			isrt = true;
 		ftype = ftype_real_time;
-	else
+	} else {
 		ftype = ftype_regular;
+	}
 
 	for (i = 0; i < *numrecs; i++) {
 		libxfs_bmbt_disk_get_all((rp +i), &irec);
@@ -452,7 +470,7 @@ _("zero length extent (off = %" PRIu64 ", fsbno = %" PRIu64 ") in ino %" PRIu64
 			goto done;
 		}
 
-		if (type == XR_INO_RTDATA && whichfork == XFS_DATA_FORK) {
+		if (isrt && !xfs_has_rtgroups(mp)) {
 			error2 = process_rt_rec(mp, &irec, ino, tot, check_dups,
 					zap_metadata);
 			if (error2)
@@ -468,8 +486,7 @@ _("zero length extent (off = %" PRIu64 ", fsbno = %" PRIu64 ") in ino %" PRIu64
 		/*
 		 * regular file data fork or attribute fork
 		 */
-		switch (verify_dfsbno_range(mp, irec.br_startblock,
-						irec.br_blockcount)) {
+		switch (verify_dfsbno_range(mp, &irec, isrt)) {
 			case XR_DFSBNORANGE_VALID:
 				break;
 
@@ -531,26 +548,39 @@ _("Fatal error: inode %" PRIu64 " - blkmap_set_ext(): %s\n"
 			}
 		}
 
-		/*
-		 * Profiling shows that the following loop takes the
-		 * most time in all of xfs_repair.
-		 */
-		agno = XFS_FSB_TO_AGNO(mp, irec.br_startblock);
-		agbno = XFS_FSB_TO_AGBNO(mp, irec.br_startblock);
-		ebno = agbno + irec.br_blockcount;
+		if (isrt) {
+			agno = xfs_rtb_to_rgno(mp, irec.br_startblock);
+			first_agbno = xfs_rtb_to_rgbno(mp, irec.br_startblock);
+		} else {
+			agno = XFS_FSB_TO_AGNO(mp, irec.br_startblock);
+			first_agbno = XFS_FSB_TO_AGBNO(mp, irec.br_startblock);
+		}
+		agbno = first_agbno;
+		ebno = first_agbno + irec.br_blockcount;
 		if (agno != locked_agno) {
 			if (locked_agno != -1)
-				unlock_ag(locked_agno);
+				unlock_group(locked_agno, isrt);
 			locked_agno = agno;
-			lock_ag(locked_agno);
+			lock_group(locked_agno, isrt);
 		}
 
+		/*
+		 * Profiling shows that the following loop takes the most time
+		 * in all of xfs_repair.
+		 */
 		for (b = irec.br_startblock;
 		     agbno < ebno;
 		     b += blen, agbno += blen) {
-			state = get_bmap_ext(agno, agbno, ebno, &blen);
+			state = get_bmap_ext(agno, agbno, ebno, &blen, isrt);
 			switch (state)  {
 			case XR_E_FREE:
+				/*
+				 * We never do a scan pass of the rt bitmap, so unknown
+				 * blocks are marked as free.
+				 */
+				if (isrt)
+					break;
+				fallthrough;
 			case XR_E_FREE1:
 				do_warn(
 _("%s fork in ino %" PRIu64 " claims free block %" PRIu64 "\n"),
@@ -624,10 +654,10 @@ _("illegal state %d in block map %" PRIu64 "\n"),
 		 * After a successful rebuild we'll try this scan again.
 		 * (If the rebuild fails we won't come back here.)
 		 */
-		agbno = XFS_FSB_TO_AGBNO(mp, irec.br_startblock);
-		ebno = agbno + irec.br_blockcount;
+		agbno = first_agbno;
+		ebno = first_agbno + irec.br_blockcount;
 		for (; agbno < ebno; agbno += blen) {
-			state = get_bmap_ext(agno, agbno, ebno, &blen);
+			state = get_bmap_ext(agno, agbno, ebno, &blen, isrt);
 			switch (state)  {
 			case XR_E_METADATA:
 				/*
@@ -642,15 +672,16 @@ _("illegal state %d in block map %" PRIu64 "\n"),
 			case XR_E_FREE1:
 			case XR_E_INUSE1:
 			case XR_E_UNKNOWN:
-				set_bmap_ext(agno, agbno, blen, zap_metadata ?
-						XR_E_METADATA : XR_E_INUSE);
+				set_bmap_ext(agno, agbno, blen,
+					zap_metadata ?
+					XR_E_METADATA : XR_E_INUSE, isrt);
 				break;
 
 			case XR_E_INUSE:
 			case XR_E_MULT:
 				if (!zap_metadata)
 					set_bmap_ext(agno, agbno, blen,
-							XR_E_MULT);
+							XR_E_MULT, isrt);
 				break;
 			default:
 				break;
@@ -663,7 +694,7 @@ _("illegal state %d in block map %" PRIu64 "\n"),
 	error = 0;
 done:
 	if (locked_agno != -1)
-		unlock_ag(locked_agno);
+		unlock_group(locked_agno, isrt);
 
 	if (i != *numrecs) {
 		ASSERT(i < *numrecs);
@@ -1588,7 +1619,7 @@ check_dinode_mode_format(
  */
 
 static int
-process_check_sb_inodes(
+process_check_metadata_inodes(
 	xfs_mount_t		*mp,
 	struct xfs_dinode	*dinoc,
 	xfs_ino_t		lino,
@@ -1638,8 +1669,10 @@ process_check_sb_inodes(
 		}
 		return 0;
 	}
+
 	dnextents = xfs_dfork_data_extents(dinoc);
-	if (lino == mp->m_sb.sb_rsumino) {
+	if (lino == mp->m_sb.sb_rsumino ||
+	    is_rtsummary_inode(lino)) {
 		if (*type != XR_INO_RTSUM) {
 			do_warn(
 _("realtime summary inode %" PRIu64 " has bad type 0x%x, "),
@@ -1660,7 +1693,8 @@ _("bad # of extents (%" PRIu64 ") for realtime summary inode %" PRIu64 "\n"),
 		}
 		return 0;
 	}
-	if (lino == mp->m_sb.sb_rbmino) {
+	if (lino == mp->m_sb.sb_rbmino ||
+	    is_rtbitmap_inode(lino)) {
 		if (*type != XR_INO_RTBITMAP) {
 			do_warn(
 _("realtime bitmap inode %" PRIu64 " has bad type 0x%x, "),
@@ -2920,9 +2954,11 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 	case S_IFREG:
 		if (be16_to_cpu(dino->di_flags) & XFS_DIFLAG_REALTIME)
 			type = XR_INO_RTDATA;
-		else if (lino == mp->m_sb.sb_rbmino)
+		else if (lino == mp->m_sb.sb_rbmino ||
+			 is_rtbitmap_inode(lino))
 			type = XR_INO_RTBITMAP;
-		else if (lino == mp->m_sb.sb_rsumino)
+		else if (lino == mp->m_sb.sb_rsumino ||
+			 is_rtsummary_inode(lino))
 			type = XR_INO_RTSUM;
 		else if (lino == mp->m_sb.sb_uquotino)
 			type = XR_INO_UQUOTA;
@@ -2955,9 +2991,9 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 	}
 
 	/*
-	 * type checks for superblock inodes
+	 * type checks for metadata inodes
 	 */
-	if (process_check_sb_inodes(mp, dino, lino, &type, dirty) != 0)
+	if (process_check_metadata_inodes(mp, dino, lino, &type, dirty) != 0)
 		goto clear_bad_out;
 
 	validate_extsize(mp, dino, lino, dirty);
diff --git a/repair/dir2.c b/repair/dir2.c
index d233c724488182..ca747c90175e93 100644
--- a/repair/dir2.c
+++ b/repair/dir2.c
@@ -15,6 +15,7 @@
 #include "da_util.h"
 #include "prefetch.h"
 #include "progress.h"
+#include "rt.h"
 
 /*
  * Known bad inode list.  These are seen when the leaf and node
@@ -256,10 +257,12 @@ process_sf_dir2(
 			 * bother checking if the child inode is free or not.
 			 */
 			junkit = 0;
-		} else if (lino == mp->m_sb.sb_rbmino)  {
+		} else if (lino == mp->m_sb.sb_rbmino ||
+		           is_rtbitmap_inode(lino)) {
 			junkit = 1;
 			junkreason = _("realtime bitmap");
-		} else if (lino == mp->m_sb.sb_rsumino)  {
+		} else if (lino == mp->m_sb.sb_rsumino ||
+		           is_rtsummary_inode(lino)) {
 			junkit = 1;
 			junkreason = _("realtime summary");
 		} else if (lino == mp->m_sb.sb_uquotino)  {
@@ -737,9 +740,11 @@ process_dir2_data(
 			 * bother checking if the child inode is free or not.
 			 */
 			clearino = 0;
-		} else if (ent_ino == mp->m_sb.sb_rbmino) {
+		} else if (ent_ino == mp->m_sb.sb_rbmino ||
+		           is_rtbitmap_inode(ent_ino)) {
 			clearreason = _("realtime bitmap");
-		} else if (ent_ino == mp->m_sb.sb_rsumino) {
+		} else if (ent_ino == mp->m_sb.sb_rsumino ||
+		           is_rtsummary_inode(ent_ino)) {
 			clearreason = _("realtime summary");
 		} else if (ent_ino == mp->m_sb.sb_uquotino) {
 			clearreason = _("user quota");
diff --git a/repair/incore.c b/repair/incore.c
index fb9ebee1671d4f..2339d49a95773d 100644
--- a/repair/incore.c
+++ b/repair/incore.c
@@ -29,28 +29,42 @@ struct bmap {
 	struct btree_root	*root;
 };
 static struct bmap	*ag_bmaps;
+static struct bmap	*rtg_bmaps;
+
+static inline struct bmap *bmap_for_group(xfs_agnumber_t gno, bool isrt)
+{
+	if (isrt)
+		return &rtg_bmaps[gno];
+	return &ag_bmaps[gno];
+}
 
 void
-lock_ag(
-	xfs_agnumber_t		agno)
+lock_group(
+	xfs_agnumber_t		gno,
+	bool			isrt)
 {
-	pthread_mutex_lock(&ag_bmaps[agno].lock);
+	pthread_mutex_lock(&bmap_for_group(gno, isrt)->lock);
 }
 
 void
-unlock_ag(
-	xfs_agnumber_t		agno)
+unlock_group(
+	xfs_agnumber_t		gno,
+	bool			isrt)
 {
-	pthread_mutex_unlock(&ag_bmaps[agno].lock);
+	pthread_mutex_unlock(&bmap_for_group(gno, isrt)->lock);
 }
 
-static void
-update_bmap(
-	struct btree_root	*bmap,
-	unsigned long		offset,
+
+void
+set_bmap_ext(
+	xfs_agnumber_t		gno,
+	xfs_agblock_t		offset,
 	xfs_extlen_t		blen,
-	void			*new_state)
+	int			state,
+	bool			isrt)
 {
+	struct btree_root	*bmap = bmap_for_group(gno, isrt)->root;
+	void			*new_state = &states[state];
 	unsigned long		end = offset + blen;
 	int			*cur_state;
 	unsigned long		cur_key;
@@ -140,24 +154,15 @@ update_bmap(
 	btree_insert(bmap, end, prev_state);
 }
 
-void
-set_bmap_ext(
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		agbno,
-	xfs_extlen_t		blen,
-	int			state)
-{
-	update_bmap(ag_bmaps[agno].root, agbno, blen, &states[state]);
-}
-
 int
 get_bmap_ext(
-	xfs_agnumber_t		agno,
+	xfs_agnumber_t		gno,
 	xfs_agblock_t		agbno,
 	xfs_agblock_t		maxbno,
-	xfs_extlen_t		*blen)
+	xfs_extlen_t		*blen,
+	bool			isrt)
 {
-	struct btree_root	*bmap = ag_bmaps[agno].root;
+	struct btree_root	*bmap = bmap_for_group(gno, isrt)->root;
 	int			*statep;
 	unsigned long		key;
 
@@ -248,15 +253,15 @@ free_rt_bmap(xfs_mount_t *mp)
 	free(rt_bmap);
 	rt_bmap = NULL;
 	pthread_mutex_destroy(&rt_lock);
-
 }
 
-void
-reset_bmaps(xfs_mount_t *mp)
+static void
+reset_ag_bmaps(
+	struct xfs_mount	*mp)
 {
-	xfs_agnumber_t	agno;
-	xfs_agblock_t	ag_size;
-	int		ag_hdr_block;
+	int			ag_hdr_block;
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		ag_size;
 
 	ag_hdr_block = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
 	ag_size = mp->m_sb.sb_agblocks;
@@ -286,14 +291,50 @@ reset_bmaps(xfs_mount_t *mp)
 		btree_insert(bmap, ag_hdr_block, &states[XR_E_UNKNOWN]);
 		btree_insert(bmap, ag_size, &states[XR_E_BAD_STATE]);
 	}
+}
+
+static void
+reset_rtg_bmaps(
+	struct xfs_mount	*mp)
+{
+	xfs_rgnumber_t		rgno;
+
+	for (rgno = 0 ; rgno < mp->m_sb.sb_rgcount; rgno++) {
+		struct btree_root	*bmap = rtg_bmaps[rgno].root;
+		uint64_t		rblocks;
+
+		btree_clear(bmap);
+		if (rgno == 0 && xfs_has_rtsb(mp)) {
+			btree_insert(bmap, 0, &states[XR_E_INUSE_FS]);
+			btree_insert(bmap, mp->m_sb.sb_rextsize,
+					&states[XR_E_FREE]);
+		} else {
+			btree_insert(bmap, 0, &states[XR_E_FREE]);
+		}
+
+		rblocks = xfs_rtbxlen_to_blen(mp,
+				libxfs_rtgroup_extents(mp, rgno));
+		btree_insert(bmap, rblocks, &states[XR_E_BAD_STATE]);
+	}
+}
+
+void
+reset_bmaps(
+	struct xfs_mount	*mp)
+{
+	reset_ag_bmaps(mp);
 
 	if (mp->m_sb.sb_logstart != 0) {
 		set_bmap_ext(XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart),
 			     XFS_FSB_TO_AGBNO(mp, mp->m_sb.sb_logstart),
-			     mp->m_sb.sb_logblocks, XR_E_INUSE_FS);
+			     mp->m_sb.sb_logblocks, XR_E_INUSE_FS, false);
 	}
 
-	reset_rt_bmap();
+	if (xfs_has_rtgroups(mp)) {
+		reset_rtg_bmaps(mp);
+	} else {
+		reset_rt_bmap();
+	}
 }
 
 static struct bmap *
@@ -334,11 +375,18 @@ void
 init_bmaps(
 	struct xfs_mount	*mp)
 {
-	ag_bmaps = alloc_bmaps(mp->m_sb.sb_agcount + mp->m_sb.sb_rgcount);
+	ag_bmaps = alloc_bmaps(mp->m_sb.sb_agcount);
 	if (!ag_bmaps)
 		do_error(_("couldn't allocate block map btree roots\n"));
 
-	init_rt_bmap(mp);
+	if (xfs_has_rtgroups(mp)) {
+		rtg_bmaps = alloc_bmaps(mp->m_sb.sb_rgcount);
+		if (!rtg_bmaps)
+			do_error(_("couldn't allocate block map btree roots\n"));
+	} else {
+		init_rt_bmap(mp);
+	}
+
 	reset_bmaps(mp);
 }
 
@@ -346,8 +394,13 @@ void
 free_bmaps(
 	struct xfs_mount	*mp)
 {
-	destroy_bmaps(ag_bmaps, mp->m_sb.sb_agcount + mp->m_sb.sb_rgcount);
+	destroy_bmaps(ag_bmaps, mp->m_sb.sb_agcount);
 	ag_bmaps = NULL;
 
-	free_rt_bmap(mp);
+	if (xfs_has_rtgroups(mp)) {
+		destroy_bmaps(rtg_bmaps, mp->m_sb.sb_rgcount);
+		rtg_bmaps = NULL;
+	} else {
+		free_rt_bmap(mp);
+	}
 }
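
Reviewer's sketch (not part of the patch): the per-group block maps above are seeded by inserting a few keys into a btree. The sketch below assumes each key marks the first block of a run in a given state, which is how reset_rtg_bmaps() appears to use them. A sorted array stands in for the btree, and every demo_* name is hypothetical. The sample marks model rtgroup 0 with an rt superblock, a 4-block rt extent size and 1024 blocks in the group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the XR_E_* states used by xfs_repair. */
enum demo_state { DEMO_FREE, DEMO_INUSE_FS, DEMO_BAD_STATE };

/* One key: the first block of a run, and the state of that run. */
struct demo_mark {
	unsigned long	start;
	enum demo_state	state;
};

/* Seed for rtgroup 0: rt superblock extent, free space, end sentinel. */
static const struct demo_mark demo_rtg0_marks[] = {
	{ 0,    DEMO_INUSE_FS },
	{ 4,    DEMO_FREE },
	{ 1024, DEMO_BAD_STATE },
};

/*
 * Return the state covering @bno, or -1 if no run covers it.  If @blen is
 * not NULL, also return how many blocks starting at @bno share that state,
 * clamped so that the run never extends past @maxbno.
 */
static int
demo_get_ext(
	const struct demo_mark	*marks,
	size_t			nr,
	unsigned long		bno,
	unsigned long		maxbno,
	unsigned long		*blen)
{
	size_t			i;

	assert(marks != NULL);
	assert(bno < maxbno);

	/* Walk backwards to find the last mark at or before bno. */
	for (i = nr; i > 0; i--) {
		unsigned long	end;

		if (marks[i - 1].start > bno)
			continue;

		end = (i < nr) ? marks[i].start : maxbno;
		if (end > maxbno)
			end = maxbno;
		if (blen)
			*blen = end - bno;
		return marks[i - 1].state;
	}
	return -1;
}
```

Looking up block 0 with maxbno 1024 lands in the in-use run of length 4, and block 4 lands in the free run that extends to the end of the group. The real code keeps one such map per AG and per rtgroup and picks between them with the isrt flag.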
diff --git a/repair/incore.h b/repair/incore.h
index 8385043580637f..61730c330911f7 100644
--- a/repair/incore.h
+++ b/repair/incore.h
@@ -23,29 +23,46 @@ void		init_bmaps(xfs_mount_t *mp);
 void		reset_bmaps(xfs_mount_t *mp);
 void		free_bmaps(xfs_mount_t *mp);
 
-void		lock_ag(xfs_agnumber_t agno);
-void		unlock_ag(xfs_agnumber_t agno);
+void		lock_group(xfs_agnumber_t agno, bool isrt);
+void		unlock_group(xfs_agnumber_t agno, bool isrt);
+
+static inline void lock_ag(xfs_agnumber_t agno)
+{
+	lock_group(agno, false);
+}
+
+static inline void unlock_ag(xfs_agnumber_t agno)
+{
+	unlock_group(agno, false);
+}
 
 void		set_bmap_ext(xfs_agnumber_t agno, xfs_agblock_t agbno,
-			     xfs_extlen_t blen, int state);
+			     xfs_extlen_t blen, int state, bool isrt);
 int		get_bmap_ext(xfs_agnumber_t agno, xfs_agblock_t agbno,
-			     xfs_agblock_t maxbno, xfs_extlen_t *blen);
-
-void		set_rtbmap(xfs_rtxnum_t rtx, int state);
-int		get_rtbmap(xfs_rtxnum_t rtx);
+			     xfs_agblock_t maxbno, xfs_extlen_t *blen,
+			     bool isrt);
 
 static inline void
 set_bmap(xfs_agnumber_t agno, xfs_agblock_t agbno, int state)
 {
-	set_bmap_ext(agno, agbno, 1, state);
+	set_bmap_ext(agno, agbno, 1, state, false);
 }
 
 static inline int
 get_bmap(xfs_agnumber_t agno, xfs_agblock_t agbno)
 {
-	return get_bmap_ext(agno, agbno, agbno + 1, NULL);
+	return get_bmap_ext(agno, agbno, agbno + 1, NULL, false);
 }
 
+static inline int
+get_rgbmap(xfs_rgnumber_t rgno, xfs_rgblock_t rgbno)
+{
+	return get_bmap_ext(rgno, rgbno, rgbno + 1, NULL, true);
+}
+
+void		set_rtbmap(xfs_rtxnum_t rtx, int state);
+int		get_rtbmap(xfs_rtxnum_t rtx);
+
 /*
  * extent tree definitions
  * right now, there are 3 trees per AG, a bno tree, a bcnt tree
@@ -698,6 +715,8 @@ static inline unsigned int
 xfs_rootrec_inodes_inuse(
 	struct xfs_mount	*mp)
 {
+	if (xfs_has_rtgroups(mp))
+		return 2; /* sb_rootino, sb_metadirino */
 	if (xfs_has_metadir(mp))
 		return 4; /* sb_rootino, sb_rbmino, sb_rsumino, sb_metadirino */
 	return 3; /* sb_rootino, sb_rbmino, sb_rsumino */
diff --git a/repair/incore_ext.c b/repair/incore_ext.c
index 59c5d6f502c308..892f9d25588c11 100644
--- a/repair/incore_ext.c
+++ b/repair/incore_ext.c
@@ -593,7 +593,6 @@ release_rt_extent_tree()
 void
 free_rt_dup_extent_tree(xfs_mount_t *mp)
 {
-	ASSERT(mp->m_sb.sb_rblocks != 0);
 	free(rt_ext_tree_ptr);
 	rt_ext_tree_ptr = NULL;
 }
@@ -726,8 +725,8 @@ static avl64ops_t avl64_extent_tree_ops = {
 void
 incore_ext_init(xfs_mount_t *mp)
 {
-	int i;
 	xfs_agnumber_t agcount = mp->m_sb.sb_agcount;
+	int i;
 
 	pthread_mutex_init(&rt_ext_tree_lock, NULL);
 
diff --git a/repair/phase2.c b/repair/phase2.c
index 476a1c74db8c8d..27c873fca76747 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -14,6 +14,7 @@
 #include "incore.h"
 #include "progress.h"
 #include "scan.h"
+#include "rt.h"
 
 /* workaround craziness in the xlog routines */
 int xlog_recover_do_trans(struct xlog *log, struct xlog_recover *t, int p)
@@ -544,16 +545,14 @@ phase2(
 		struct xfs_sb	*sb = &mp->m_sb;
 
 		if (xfs_has_metadir(mp))
-			ASSERT(sb->sb_metadirino == sb->sb_rootino + 1 &&
-			       sb->sb_rbmino  == sb->sb_rootino + 2 &&
-			       sb->sb_rsumino == sb->sb_rootino + 3);
+			ASSERT(sb->sb_metadirino == sb->sb_rootino + 1);
 		else
 			ASSERT(sb->sb_rbmino  == sb->sb_rootino + 1 &&
 			       sb->sb_rsumino == sb->sb_rootino + 2);
 		do_warn(_("root inode chunk not found\n"));
 
 		/*
-		 * mark the first 3-4 inodes used, the rest are free
+		 * mark the first 2-3 inodes used, the rest are free
 		 */
 		ino_rec = set_inode_used_alloc(mp, 0,
 				XFS_INO_TO_AGINO(mp, sb->sb_rootino));
@@ -569,7 +568,7 @@ phase2(
 		 * also mark blocks
 		 */
 		set_bmap_ext(0, XFS_INO_TO_AGBNO(mp, sb->sb_rootino),
-			     M_IGEO(mp)->ialloc_blks, XR_E_INO);
+			     M_IGEO(mp)->ialloc_blks, XR_E_INO, false);
 	} else  {
 		do_log(_("        - found root inode chunk\n"));
 		j = 0;
@@ -600,29 +599,33 @@ phase2(
 			j++;
 		}
 
-		if (is_inode_free(ino_rec, j))  {
-			do_warn(_("realtime bitmap inode marked free, "));
-			set_inode_used(ino_rec, j);
-			if (!no_modify)
-				do_warn(_("correcting\n"));
-			else
-				do_warn(_("would correct\n"));
-		}
-		set_inode_is_meta(ino_rec, j);
-		j++;
+		if (!xfs_has_rtgroups(mp)) {
+			if (is_inode_free(ino_rec, j))  {
+				do_warn(_("realtime bitmap inode marked free, "));
+				set_inode_used(ino_rec, j);
+				if (!no_modify)
+					do_warn(_("correcting\n"));
+				else
+					do_warn(_("would correct\n"));
+			}
+			set_inode_is_meta(ino_rec, j);
+			j++;
 
-		if (is_inode_free(ino_rec, j))  {
-			do_warn(_("realtime summary inode marked free, "));
-			set_inode_used(ino_rec, j);
-			if (!no_modify)
-				do_warn(_("correcting\n"));
-			else
-				do_warn(_("would correct\n"));
+			if (is_inode_free(ino_rec, j))  {
+				do_warn(_("realtime summary inode marked free, "));
+				set_inode_used(ino_rec, j);
+				if (!no_modify)
+					do_warn(_("correcting\n"));
+				else
+					do_warn(_("would correct\n"));
+			}
+			set_inode_is_meta(ino_rec, j);
+			j++;
 		}
-		set_inode_is_meta(ino_rec, j);
-		j++;
 	}
 
+	discover_rtgroup_inodes(mp);
+
 	/*
 	 * Upgrade the filesystem now that we've done a preliminary check of
 	 * the superblocks, the AGs, the log, and the metadata inodes.
diff --git a/repair/phase4.c b/repair/phase4.c
index 036a4ed0e54445..f43f8ecd84e25b 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -295,15 +295,17 @@ process_dup_rt_extents(
  */
 static void
 process_dup_extents(
+	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
 	xfs_agblock_t		agbno,
-	xfs_agblock_t		ag_end)
+	xfs_agblock_t		ag_end,
+	bool			isrt)
 {
 	do {
 		int		bstate;
 		xfs_extlen_t	blen;
 
-		bstate = get_bmap_ext(agno, agbno, ag_end, &blen);
+		bstate = get_bmap_ext(agno, agbno, ag_end, &blen, isrt);
 		switch (bstate) {
 		case XR_E_FREE1:
 			if (no_modify)
@@ -320,7 +322,12 @@ _("free space (%u,%u-%u) only seen by one free space btree\n"),
 		case XR_E_FS_MAP:
 			break;
 		case XR_E_MULT:
-			add_dup_extent(agno, agbno, blen);
+			/*
+			 * Nothing is searching for duplicate RT extents, so
+			 * don't bother tracking them.
+			 */
+			if (!isrt)
+				add_dup_extent(agno, agbno, blen);
 			break;
 		case XR_E_BAD_STATE:
 		default:
@@ -389,13 +396,23 @@ phase4(xfs_mount_t *mp)
 			mp->m_sb.sb_dblocks -
 				(xfs_rfsblock_t) mp->m_sb.sb_agblocks * i;
 
-		process_dup_extents(i, ag_hdr_block, ag_end);
+		process_dup_extents(mp, i, ag_hdr_block, ag_end, false);
 
 		PROG_RPT_INC(prog_rpt_done[i], 1);
 	}
 	print_final_rpt();
 
-	process_dup_rt_extents(mp);
+	if (xfs_has_rtgroups(mp)) {
+		for (i = 0; i < mp->m_sb.sb_rgcount; i++)  {
+			uint64_t	rblocks;
+
+			rblocks = xfs_rtbxlen_to_blen(mp,
+					libxfs_rtgroup_extents(mp, i));
+			process_dup_extents(mp, i, 0, rblocks, true);
+		}
+	} else {
+		process_dup_rt_extents(mp);
+	}
 
 	/*
 	 * initialize bitmaps for all AGs
diff --git a/repair/phase5.c b/repair/phase5.c
index 91c4a8662a69f2..ac5f04697b7110 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -72,7 +72,7 @@ mk_incore_fstree(
 	 * largest extent.
 	 */
 	for (agbno = 0; agbno < ag_end; agbno += blen) {
-		bstate = get_bmap_ext(agno, agbno, ag_end, &blen);
+		bstate = get_bmap_ext(agno, agbno, ag_end, &blen, false);
 		if (bstate < XR_E_INUSE)  {
 			free_blocks += blen;
 			if (in_extent == 0)  {
diff --git a/repair/phase6.c b/repair/phase6.c
index e9feaa5739efa1..59665f48684aa4 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -568,6 +568,122 @@ _("couldn't iget realtime %s inode -- error - %d\n"),
 		do_error(_("%s: commit failed, error %d\n"), __func__, error);
 }
 
+/* Mark a newly allocated inode in use in the incore bitmap. */
+static void
+mark_ino_inuse(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	int			mode,
+	xfs_ino_t		parent)
+{
+	struct ino_tree_node	*irec;
+	int			ino_offset;
+	int			i;
+
+	irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ino),
+			XFS_INO_TO_AGINO(mp, ino));
+
+	if (irec == NULL) {
+		/*
+		 * This inode is allocated from a newly created inode
+		 * chunk and therefore did not exist when inode chunks
+		 * were processed in phase3. Add this group of inodes to
+		 * the entry avl tree as if they were discovered in phase3.
+		 */
+		irec = set_inode_free_alloc(mp,
+				XFS_INO_TO_AGNO(mp, ino),
+				XFS_INO_TO_AGINO(mp, ino));
+		alloc_ex_data(irec);
+
+		for (i = 0; i < XFS_INODES_PER_CHUNK; i++)
+			set_inode_free(irec, i);
+	}
+
+	ino_offset = get_inode_offset(mp, ino, irec);
+
+	/*
+	 * Mark the inode allocated so it is not skipped in phase 7.  We'll
+	 * find it with the directory traverser soon, so we don't need to
+	 * mark it reached.
+	 */
+	set_inode_used(irec, ino_offset);
+	set_inode_ftype(irec, ino_offset, libxfs_mode_to_ftype(mode));
+	set_inode_parent(irec, ino_offset, parent);
+	if (S_ISDIR(mode))
+		set_inode_isadir(irec, ino_offset);
+}
+
+static bool
+ensure_rtgroup_file(
+	struct xfs_rtgroup	*rtg,
+	enum xfs_rtg_inodes	type)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	struct xfs_inode	*ip = rtg->rtg_inodes[type];
+	const char		*name = libxfs_rtginode_name(type);
+	int			error;
+
+	if (!xfs_rtginode_enabled(rtg, type))
+		return false;
+
+	if (no_modify) {
+		if (rtgroup_inodes_were_bad(type))
+			do_warn(_("would reset rtgroup %u %s inode\n"),
+				rtg_rgno(rtg), name);
+		return false;
+	}
+
+	if (rtgroup_inodes_were_bad(type)) {
+		/*
+		 * The inode was bad or missing.  Warn that we are resetting
+		 * it, even though a new one is created unconditionally below.
+		 */
+		do_warn(_("resetting rtgroup %u %s inode\n"),
+			rtg_rgno(rtg), name);
+	}
+
+	error = -libxfs_rtginode_create(rtg, type, false);
+	if (error)
+		do_error(
+_("Couldn't create rtgroup %u %s inode, error %d\n"),
+			rtg_rgno(rtg), name, error);
+
+	ip = rtg->rtg_inodes[type];
+
+	/* Mark the inode in use. */
+	mark_ino_inuse(mp, ip->i_ino, S_IFREG, mp->m_rtdirip->i_ino);
+	mark_ino_metadata(mp, ip->i_ino);
+	return true;
+}
+
+static void
+ensure_rtgroup_bitmap(
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+
+	if (!xfs_has_rtgroups(mp))
+		return;
+	if (!ensure_rtgroup_file(rtg, XFS_RTGI_BITMAP))
+		return;
+
+	fill_rtbitmap(rtg);
+}
+
+static void
+ensure_rtgroup_summary(
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_mount	*mp = rtg_mount(rtg);
+
+	if (!xfs_has_rtgroups(mp))
+		return;
+	if (!ensure_rtgroup_file(rtg, XFS_RTGI_SUMMARY))
+		return;
+
+	fill_rtsummary(rtg);
+}
+
 /* Initialize a root directory. */
 static int
 init_fs_root_dir(
@@ -632,6 +748,8 @@ mk_metadir(
 	struct xfs_trans	*tp;
 	int			error;
 
+	libxfs_rtginode_irele(&mp->m_rtdirip);
+
 	error = init_fs_root_dir(mp, mp->m_sb.sb_metadirino, 0,
 			&mp->m_metadirip);
 	if (error)
@@ -3057,8 +3175,10 @@ mark_inode(
 static void
 mark_standalone_inodes(xfs_mount_t *mp)
 {
-	mark_inode(mp, mp->m_sb.sb_rbmino);
-	mark_inode(mp, mp->m_sb.sb_rsumino);
+	if (!xfs_has_rtgroups(mp)) {
+		mark_inode(mp, mp->m_sb.sb_rbmino);
+		mark_inode(mp, mp->m_sb.sb_rsumino);
+	}
 
 	if (!fs_quotas)
 		return;
@@ -3239,6 +3359,49 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
 	libxfs_rtgroup_rele(rtg);
 }
 
+static void
+reset_rt_metadir_inodes(
+	struct xfs_mount	*mp)
+{
+	struct xfs_rtgroup	*rtg = NULL;
+	int			error;
+
+	/*
+	 * Release the rtgroup inodes so that we can rebuild everything from
+	 * observations.
+	 */
+	if (!no_modify)
+		unload_rtgroup_inodes(mp);
+
+	if (mp->m_sb.sb_rgcount > 0) {
+		if (!no_modify) {
+			error = -libxfs_rtginode_mkdir_parent(mp);
+			if (error)
+				do_error(_("failed to create realtime metadir (%d)\n"),
+					error);
+		}
+		mark_ino_inuse(mp, mp->m_rtdirip->i_ino, S_IFDIR,
+				mp->m_metadirip->i_ino);
+		mark_ino_metadata(mp, mp->m_rtdirip->i_ino);
+	}
+
+	/*
+	 * This isn't the whole story, but it keeps the message that we've had
+	 * for years and which is expected in xfstests and more.
+	 */
+	if (!no_modify)
+		do_log(
+_("        - resetting contents of realtime bitmap and summary inodes\n"));
+
+	if (mp->m_sb.sb_rgcount == 0)
+		return;
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		ensure_rtgroup_bitmap(rtg);
+		ensure_rtgroup_summary(rtg);
+	}
+}
+
 void
 phase6(xfs_mount_t *mp)
 {
@@ -3287,7 +3450,10 @@ phase6(xfs_mount_t *mp)
 		do_warn(_("would reinitialize metadata root directory\n"));
 	}
 
-	reset_rt_sb_inodes(mp);
+	if (xfs_has_rtgroups(mp))
+		reset_rt_metadir_inodes(mp);
+	else
+		reset_rt_sb_inodes(mp);
 
 	mark_standalone_inodes(mp);
 
diff --git a/repair/rt.c b/repair/rt.c
index a3378ef1dd0af2..2de6830c931e86 100644
--- a/repair/rt.c
+++ b/repair/rt.c
@@ -12,11 +12,19 @@
 #include "dinode.h"
 #include "protos.h"
 #include "err_protos.h"
+#include "libfrog/bitmap.h"
 #include "rt.h"
 
+/* Bitmaps of rt group inode numbers, one per rtgroup inode type */
+static struct bitmap	*rtg_inodes[XFS_RTGI_MAX];
+static bool		rtginodes_bad[XFS_RTGI_MAX];
+
 /* Computed rt bitmap/summary data */
-static union xfs_rtword_raw	*btmcompute;
-static union xfs_suminfo_raw	*sumcompute;
+struct rtg_computed {
+	union xfs_rtword_raw	*bmp;
+	union xfs_suminfo_raw	*sum;
+};
+struct rtg_computed *rt_computed;
 
 static inline void
 set_rtword(
@@ -44,14 +52,12 @@ inc_sumcount(
 		p->old++;
 }
 
-/*
- * generate the real-time bitmap and summary info based on the
- * incore realtime extent map.
- */
-void
-generate_rtinfo(
-	struct xfs_mount	*mp)
+static void
+generate_rtgroup_rtinfo(
+	struct xfs_rtgroup	*rtg)
 {
+	struct rtg_computed	*comp = &rt_computed[rtg_rgno(rtg)];
+	struct xfs_mount	*mp = rtg_mount(rtg);
 	unsigned int		bitsperblock =
 		mp->m_blockwsize << XFS_NBWORDLOG;
 	xfs_rtxnum_t		extno = 0;
@@ -63,15 +69,15 @@ generate_rtinfo(
 	union xfs_rtword_raw	*words;
 
 	wordcnt = XFS_FSB_TO_B(mp, mp->m_sb.sb_rbmblocks) >> XFS_WORDLOG;
-	btmcompute = calloc(wordcnt, sizeof(union xfs_rtword_raw));
-	if (!btmcompute)
+	comp->bmp = calloc(wordcnt, sizeof(union xfs_rtword_raw));
+	if (!comp->bmp)
 		do_error(
 _("couldn't allocate memory for incore realtime bitmap.\n"));
-	words = btmcompute;
+	words = comp->bmp;
 
 	wordcnt = XFS_FSB_TO_B(mp, mp->m_rsumblocks) >> XFS_WORDLOG;
-	sumcompute = calloc(wordcnt, sizeof(union xfs_suminfo_raw));
-	if (!sumcompute)
+	comp->sum = calloc(wordcnt, sizeof(union xfs_suminfo_raw));
+	if (!comp->sum)
 		do_error(
 _("couldn't allocate memory for incore realtime summary info.\n"));
 
@@ -81,15 +87,27 @@ _("couldn't allocate memory for incore realtime summary info.\n"));
 	 * end (size) of each range of free extents to set the summary info
 	 * properly.
 	 */
-	while (extno < mp->m_sb.sb_rextents)  {
+	while (extno < rtg->rtg_extents) {
 		xfs_rtword_t		freebit = 1;
 		xfs_rtword_t		bits = 0;
-		int			i;
+		int			state, i;
 
 		set_rtword(mp, words, 0);
-		for (i = 0; i < sizeof(xfs_rtword_t) * NBBY &&
-				extno < mp->m_sb.sb_rextents; i++, extno++)  {
-			if (get_rtbmap(extno) == XR_E_FREE)  {
+		for (i = 0; i < sizeof(xfs_rtword_t) * NBBY; i++) {
+			if (extno == rtg->rtg_extents)
+				break;
+
+			/*
+			 * Note: for the RTG case it might make sense to use
+			 * get_rgbmap_ext here and generate multiple bitmap
+			 * entries per lookup.
+			 */
+			if (xfs_has_rtgroups(mp))
+				state = get_rgbmap(rtg_rgno(rtg),
+					extno * mp->m_sb.sb_rextsize);
+			else
+				state = get_rtbmap(extno);
+			if (state == XR_E_FREE)  {
 				sb_frextents++;
 				bits |= freebit;
 
@@ -104,11 +122,12 @@ _("couldn't allocate memory for incore realtime summary info.\n"));
 
 				offs = xfs_rtsumoffs(mp, libxfs_highbit64(len),
 						start_bmbno);
-				inc_sumcount(mp, sumcompute, offs);
+				inc_sumcount(mp, comp->sum, offs);
 				in_extent = false;
 			}
 
 			freebit <<= 1;
+			extno++;
 		}
 		set_rtword(mp, words, bits);
 		words++;
@@ -122,8 +141,27 @@ _("couldn't allocate memory for incore realtime summary info.\n"));
 		xfs_rtsumoff_t	offs;
 
 		offs = xfs_rtsumoffs(mp, libxfs_highbit64(len), start_bmbno);
-		inc_sumcount(mp, sumcompute, offs);
+		inc_sumcount(mp, comp->sum, offs);
 	}
+}
+
+/*
+ * generate the real-time bitmap and summary info based on the
+ * incore realtime extent map.
+ */
+void
+generate_rtinfo(
+	struct xfs_mount	*mp)
+{
+	struct xfs_rtgroup	*rtg = NULL;
+
+	rt_computed = calloc(mp->m_sb.sb_rgcount, sizeof(struct rtg_computed));
+	if (!rt_computed)
+		do_error(
+	_("couldn't allocate memory for incore realtime info.\n"));
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg)))
+		generate_rtgroup_rtinfo(rtg);
 
 	if (mp->m_sb.sb_frextents != sb_frextents) {
 		do_warn(_("sb_frextents %" PRIu64 ", counted %" PRIu64 "\n"),
@@ -133,12 +171,13 @@ _("couldn't allocate memory for incore realtime summary info.\n"));
 
 static void
 check_rtwords(
-	struct xfs_mount	*mp,
+	struct xfs_rtgroup	*rtg,
 	const char		*filename,
 	unsigned long long	bno,
 	void			*ondisk,
 	void			*incore)
 {
+	struct xfs_mount	*mp = rtg_mount(rtg);
 	unsigned int		wordcnt = mp->m_blockwsize;
 	union xfs_rtword_raw	*o = ondisk, *i = incore;
 	int			badstart = -1;
@@ -152,8 +191,9 @@ check_rtwords(
 			/* Report a range of inconsistency that just ended. */
 			if (badstart >= 0)
 				do_warn(
- _("discrepancy in %s at dblock 0x%llx words 0x%x-0x%x/0x%x\n"),
-					filename, bno, badstart, j - 1, wordcnt);
+ _("discrepancy in %s (%u) at dblock 0x%llx words 0x%x-0x%x/0x%x\n"),
+					filename, rtg_rgno(rtg), bno,
+					badstart, j - 1, wordcnt);
 			badstart = -1;
 			continue;
 		}
@@ -164,44 +204,26 @@ check_rtwords(
 
 	if (badstart >= 0)
 		do_warn(
- _("discrepancy in %s at dblock 0x%llx words 0x%x-0x%x/0x%x\n"),
-					filename, bno, badstart, wordcnt,
-					wordcnt);
+ _("discrepancy in %s (%u) at dblock 0x%llx words 0x%x-0x%x/0x%x\n"),
+			filename, rtg_rgno(rtg), bno,
+			badstart, wordcnt, wordcnt);
 }
 
 static void
 check_rtfile_contents(
-	struct xfs_mount	*mp,
-	enum xfs_metafile_type	metafile_type,
+	struct xfs_rtgroup	*rtg,
+	enum xfs_rtg_inodes	type,
+	void			*buf,
 	xfs_fileoff_t		filelen)
 {
-	struct xfs_bmbt_irec	map;
-	struct xfs_buf		*bp;
-	struct xfs_inode	*ip;
-	const char		*filename;
-	void			*buf;
-	xfs_ino_t		ino;
+	struct xfs_mount	*mp = rtg_mount(rtg);
+	const char		*filename = libxfs_rtginode_name(type);
+	struct xfs_inode	*ip = rtg->rtg_inodes[type];
 	xfs_fileoff_t		bno = 0;
 	int			error;
 
-	switch (metafile_type) {
-	case XFS_METAFILE_RTBITMAP:
-		ino = mp->m_sb.sb_rbmino;
-		filename = "rtbitmap";
-		buf = btmcompute;
-		break;
-	case XFS_METAFILE_RTSUMMARY:
-		ino = mp->m_sb.sb_rsumino;
-		filename = "rtsummary";
-		buf = sumcompute;
-		break;
-	default:
-		return;
-	}
-
-	error = -libxfs_metafile_iget(mp, ino, metafile_type, &ip);
-	if (error) {
-		do_warn(_("unable to open %s file, err %d\n"), filename, error);
+	if (!ip) {
+		do_warn(_("unable to open %s file\n"), filename);
 		return;
 	}
 
@@ -213,12 +235,11 @@ check_rtfile_contents(
 	}
 
 	while (bno < filelen)  {
-		xfs_filblks_t	maplen;
-		int		nmap = 1;
+		struct xfs_bmbt_irec	map;
+		struct xfs_buf		*bp;
+		int			nmap = 1;
 
-		/* Read up to 1MB at a time. */
-		maplen = min(filelen - bno, XFS_B_TO_FSBT(mp, 1048576));
-		error = -libxfs_bmapi_read(ip, bno, maplen, &map, &nmap, 0);
+		error = -libxfs_bmapi_read(ip, bno, 1, &map, &nmap, 0);
 		if (error) {
 			do_warn(_("unable to read %s mapping, err %d\n"),
 					filename, error);
@@ -233,43 +254,104 @@ check_rtfile_contents(
 
 		error = -libxfs_buf_read_uncached(mp->m_dev,
 				XFS_FSB_TO_DADDR(mp, map.br_startblock),
-				XFS_FSB_TO_BB(mp, map.br_blockcount),
-				0, &bp, NULL);
+				XFS_FSB_TO_BB(mp, 1), 0, &bp,
+				xfs_rtblock_ops(mp, type));
 		if (error) {
 			do_warn(_("unable to read %s at dblock 0x%llx, err %d\n"),
 					filename, (unsigned long long)bno, error);
 			break;
 		}
 
-		check_rtwords(mp, filename, bno, bp->b_addr, buf);
+		check_rtwords(rtg, filename, bno, bp->b_addr, buf);
 
-		buf += XFS_FSB_TO_B(mp, map.br_blockcount);
-		bno += map.br_blockcount;
+		buf += mp->m_blockwsize << XFS_WORDLOG;
+		bno++;
 		libxfs_buf_relse(bp);
 	}
+}
 
-	libxfs_irele(ip);
+/*
+ * Try to load a sb-rooted rt metadata file now, since earlier phases may have
+ * fixed verifier problems in the root inode chunk.
+ */
+static void
+try_load_sb_rtfile(
+	struct xfs_mount	*mp,
+	enum xfs_rtg_inodes	type)
+{
+	struct xfs_rtgroup	*rtg = libxfs_rtgroup_grab(mp, 0);
+	struct xfs_trans	*tp;
+	int			error;
+
+	if (rtg->rtg_inodes[type])
+		goto out_rtg;
+
+	error = -libxfs_trans_alloc_empty(mp, &tp);
+	if (error)
+		goto out_rtg;
+
+
+	error = -libxfs_rtginode_load(rtg, type, tp);
+	if (error)
+		goto out_cancel;
+
+	/* If we can't load the inode, signal to phase 6 to recreate it. */
+	if (!rtg->rtg_inodes[type]) {
+		switch (type) {
+		case XFS_RTGI_BITMAP:
+			need_rbmino = 1;
+			break;
+		case XFS_RTGI_SUMMARY:
+			need_rsumino = 1;
+			break;
+		default:
+			ASSERT(0);
+			break;
+		}
+	}
+
+out_cancel:
+	libxfs_trans_cancel(tp);
+out_rtg:
+	libxfs_rtgroup_rele(rtg);
 }
 
 void
 check_rtbitmap(
 	struct xfs_mount	*mp)
 {
+	struct xfs_rtgroup	*rtg = NULL;
+
 	if (need_rbmino)
 		return;
 
-	check_rtfile_contents(mp, XFS_METAFILE_RTBITMAP,
-			mp->m_sb.sb_rbmblocks);
+	if (!xfs_has_rtgroups(mp))
+		try_load_sb_rtfile(mp, XFS_RTGI_BITMAP);
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		check_rtfile_contents(rtg, XFS_RTGI_BITMAP,
+				rt_computed[rtg_rgno(rtg)].bmp,
+				mp->m_sb.sb_rbmblocks);
+	}
 }
 
 void
 check_rtsummary(
 	struct xfs_mount	*mp)
 {
+	struct xfs_rtgroup	*rtg = NULL;
+
 	if (need_rsumino)
 		return;
 
-	check_rtfile_contents(mp, XFS_METAFILE_RTSUMMARY, mp->m_rsumblocks);
+	if (!xfs_has_rtgroups(mp))
+		try_load_sb_rtfile(mp, XFS_RTGI_SUMMARY);
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		check_rtfile_contents(rtg, XFS_RTGI_SUMMARY,
+				rt_computed[rtg_rgno(rtg)].sum,
+				mp->m_rsumblocks);
+	}
 }
 
 void
@@ -278,8 +360,17 @@ fill_rtbitmap(
 {
 	int			error;
 
+	/*
+	 * For file systems without a RT subvolume we have the bitmap and
+	 * summary files, but they are empty.  In that case rt_computed is
+	 * NULL.
+	 */
+	if (!rt_computed)
+		return;
+
 	error = -libxfs_rtfile_initialize_blocks(rtg, XFS_RTGI_BITMAP,
-			0, rtg_mount(rtg)->m_sb.sb_rbmblocks, btmcompute);
+			0, rtg_mount(rtg)->m_sb.sb_rbmblocks,
+			rt_computed[rtg_rgno(rtg)].bmp);
 	if (error)
 		do_error(
 _("couldn't re-initialize realtime bitmap inode, error %d\n"), error);
@@ -291,9 +382,183 @@ fill_rtsummary(
 {
 	int			error;
 
+	/*
+	 * For file systems without a RT subvolume we have the bitmap and
+	 * summary files, but they are empty.  In that case rt_computed is
+	 * NULL.
+	 */
+	if (!rt_computed)
+		return;
+
 	error = -libxfs_rtfile_initialize_blocks(rtg, XFS_RTGI_SUMMARY,
-			0, rtg_mount(rtg)->m_rsumblocks, sumcompute);
+			0, rtg_mount(rtg)->m_rsumblocks,
+			rt_computed[rtg_rgno(rtg)].sum);
 	if (error)
 		do_error(
 _("couldn't re-initialize realtime summary inode, error %d\n"), error);
 }
+
+bool
+is_rtgroup_inode(
+	xfs_ino_t		ino,
+	enum xfs_rtg_inodes	type)
+{
+	if (!rtg_inodes[type])
+		return false;
+	return bitmap_test(rtg_inodes[type], ino, 1);
+}
+
+bool
+rtgroup_inodes_were_bad(
+	enum xfs_rtg_inodes	type)
+{
+	return rtginodes_bad[type];
+}
+
+void
+mark_rtgroup_inodes_bad(
+	struct xfs_mount	*mp,
+	enum xfs_rtg_inodes	type)
+{
+	struct xfs_rtgroup	*rtg = NULL;
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg)))
+		libxfs_rtginode_irele(&rtg->rtg_inodes[type]);
+
+	rtginodes_bad[type] = true;
+}
+
+static inline int
+mark_rtginode(
+	struct xfs_trans	*tp,
+	struct xfs_rtgroup	*rtg,
+	enum xfs_rtg_inodes	type)
+{
+	struct xfs_inode	*ip;
+	int			error;
+
+	if (!xfs_rtginode_enabled(rtg, type))
+		return 0;
+
+	error = -libxfs_rtginode_load(rtg, type, tp);
+	if (error)
+		goto out_corrupt;
+
+	ip = rtg->rtg_inodes[type];
+	if (!ip)
+		goto out_corrupt;
+
+	if (xfs_has_rtgroups(rtg_mount(rtg))) {
+		if (bitmap_test(rtg_inodes[type], ip->i_ino, 1)) {
+			error = EFSCORRUPTED;
+			goto out_corrupt;
+		}
+
+		error = bitmap_set(rtg_inodes[type], ip->i_ino, 1);
+		if (error)
+			goto out_corrupt;
+	}
+
+	/*
+	 * Phase 3 will clear the ondisk inodes of all rt metadata files, but
+	 * it doesn't reset any blocks.  Keep the incore inodes loaded so that
+	 * phase 4 can check the rt metadata.  These inodes must be dropped
+	 * before rebuilding can begin during phase 6.
+	 */
+	return 0;
+
+out_corrupt:
+	rtginodes_bad[type] = true;
+	return error;
+}
+
+/* Mark the reachable rt metadata inodes prior to the inode scan. */
+void
+discover_rtgroup_inodes(
+	struct xfs_mount	*mp)
+{
+	struct xfs_rtgroup	*rtg = NULL;
+	struct xfs_trans	*tp;
+	int			error, err2;
+	int			i;
+
+	error = -libxfs_trans_alloc_empty(mp, &tp);
+	if (error)
+		goto out;
+	if (xfs_has_rtgroups(mp) && mp->m_sb.sb_rgcount > 0) {
+		error = -libxfs_rtginode_load_parent(tp);
+		if (error)
+			goto out_cancel;
+	}
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		for (i = 0; i < XFS_RTGI_MAX; i++) {
+			err2 = mark_rtginode(tp, rtg, i);
+			if (err2 && !error)
+				error = err2;
+		}
+	}
+
+out_cancel:
+	libxfs_trans_cancel(tp);
+out:
+	if (xfs_has_rtgroups(mp) && error) {
+		/*
+		 * Old xfs_repair didn't complain about rtbitmaps that failed
+		 * to load until phase 5, so only emit these extra phase 2
+		 * warnings on rtgroups filesystems.
+		 */
+		switch (error) {
+		case EFSCORRUPTED:
+			do_warn(
+ _("corruption in metadata directory tree while discovering rt group inodes\n"));
+			break;
+		default:
+			do_warn(
+ _("couldn't discover rt group inodes, err %d\n"),
+						error);
+			break;
+		}
+	}
+}
+
+/* Unload incore rtgroup inodes before rebuilding rt metadata. */
+void
+unload_rtgroup_inodes(
+	struct xfs_mount	*mp)
+{
+	struct xfs_rtgroup	*rtg = NULL;
+	unsigned int		i;
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg)))
+		for (i = 0; i < XFS_RTGI_MAX; i++)
+			libxfs_rtginode_irele(&rtg->rtg_inodes[i]);
+
+	libxfs_rtginode_irele(&mp->m_rtdirip);
+}
+
+void
+init_rtgroup_inodes(void)
+{
+	unsigned int		i;
+	int			error;
+
+	for (i = 0; i < XFS_RTGI_MAX; i++) {
+		error = bitmap_alloc(&rtg_inodes[i]);
+		if (error)
+			break;
+	}
+
+	if (error)
+		do_error(_("could not allocate rtginode bitmap, err=%d!\n"),
+				error);
+}
+
+void
+free_rtgroup_inodes(void)
+{
+	int			i;
+
+	for (i = 0; i < XFS_RTGI_MAX; i++)
+		bitmap_free(&rtg_inodes[i]);
+}
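
Reviewer's sketch (not part of the patch): the inner loop of generate_rtgroup_rtinfo() packs one bit per rt extent into bitmap words, lowest bit first, and stops at the end of the group even when that falls in the middle of a word. The standalone version below shows only that packing step. It assumes 32-bit words and takes a plain array of free flags in place of the get_rgbmap()/get_rtbmap() lookups. The summary counters are left out, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pack @nr_extents free/used flags into 32-bit bitmap words, least
 * significant bit first.  A set bit means the extent is free.  Returns the
 * number of free extents seen, which plays the role of sb_frextents in the
 * patch.  @words must have room for (nr_extents + 31) / 32 entries.
 */
static uint64_t
demo_pack_rtbitmap(
	const unsigned char	*is_free,
	uint64_t		nr_extents,
	uint32_t		*words)
{
	uint64_t		extno = 0;
	uint64_t		nr_free = 0;

	assert(is_free != NULL);
	assert(words != NULL);

	while (extno < nr_extents) {
		uint32_t	freebit = 1;
		uint32_t	bits = 0;
		unsigned int	i;

		for (i = 0; i < 32; i++) {
			/* The last word of a group may be partially used. */
			if (extno == nr_extents)
				break;

			if (is_free[extno]) {
				nr_free++;
				bits |= freebit;
			}

			freebit <<= 1;
			extno++;
		}
		*words++ = bits;
	}

	return nr_free;
}
```

With 34 extents of which numbers 0, 1 and 33 are free, the first word comes out as 0x3 and the second as 0x2. As in the patch, extno is advanced inside the loop body rather than in the for header, so the early break leaves it equal to the group's extent count.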
diff --git a/repair/rt.h b/repair/rt.h
index 9d837de65a7dfc..4dfe4a921d4cdf 100644
--- a/repair/rt.h
+++ b/repair/rt.h
@@ -13,4 +13,24 @@ void check_rtsummary(struct xfs_mount *mp);
 void fill_rtbitmap(struct xfs_rtgroup *rtg);
 void fill_rtsummary(struct xfs_rtgroup *rtg);
 
+void discover_rtgroup_inodes(struct xfs_mount *mp);
+void unload_rtgroup_inodes(struct xfs_mount *mp);
+
+void init_rtgroup_inodes(void);
+void free_rtgroup_inodes(void);
+
+bool is_rtgroup_inode(xfs_ino_t ino, enum xfs_rtg_inodes type);
+
+static inline bool is_rtbitmap_inode(xfs_ino_t ino)
+{
+	return is_rtgroup_inode(ino, XFS_RTGI_BITMAP);
+}
+static inline bool is_rtsummary_inode(xfs_ino_t ino)
+{
+	return is_rtgroup_inode(ino, XFS_RTGI_SUMMARY);
+}
+
+void mark_rtgroup_inodes_bad(struct xfs_mount *mp, enum xfs_rtg_inodes type);
+bool rtgroup_inodes_were_bad(enum xfs_rtg_inodes type);
+
 #endif /* _XFS_REPAIR_RT_H_ */
diff --git a/repair/sb.c b/repair/sb.c
index 7f27833d697ea9..d52ab2ffeaf28c 100644
--- a/repair/sb.c
+++ b/repair/sb.c
@@ -311,6 +311,38 @@ verify_sb_loginfo(
 	return true;
 }
 
+static int
+verify_sb_rtgroups(
+	struct xfs_sb		*sbp)
+{
+	uint64_t		groups;
+
+	if (sbp->sb_rextsize == 0)
+		return XR_BAD_RT_GEO_DATA;
+
+	if (sbp->sb_rgextents > XFS_MAX_RGBLOCKS / sbp->sb_rextsize)
+		return XR_BAD_RT_GEO_DATA;
+
+	if (sbp->sb_rgextents < XFS_MIN_RGEXTENTS)
+		return XR_BAD_RT_GEO_DATA;
+
+	if (sbp->sb_rgcount > XFS_MAX_RGNUMBER)
+		return XR_BAD_RT_GEO_DATA;
+
+	groups = howmany_64(sbp->sb_rextents, sbp->sb_rgextents);
+	if (groups != sbp->sb_rgcount)
+		return XR_BAD_RT_GEO_DATA;
+
+	if (!(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_EXCHRANGE))
+		return XR_BAD_RT_GEO_DATA;
+
+	if (sbp->sb_rgblklog != libxfs_compute_rgblklog(sbp->sb_rgextents,
+							sbp->sb_rextsize))
+		return XR_BAD_RT_GEO_DATA;
+
+	return 0;
+}
+
 /*
  * verify a superblock -- does not verify root inode #
  *	can only check that geometry info is internally
@@ -482,6 +514,15 @@ verify_sb(char *sb_buf, xfs_sb_t *sb, int is_primary_sb)
 	if (sb->sb_blocklog + sb->sb_dirblklog > XFS_MAX_BLOCKSIZE_LOG)
 		return XR_BAD_DIR_SIZE_DATA;
 
+	if (xfs_sb_version_hasmetadir(sb)) {
+		if (memchr_inv(sb->sb_pad, 0, sizeof(sb->sb_pad)))
+			return XR_SB_GEO_MISMATCH;
+
+		ret = verify_sb_rtgroups(sb);
+		if (ret)
+			return ret;
+	}
+
 	return(XR_OK);
 }
 
diff --git a/repair/scan.c b/repair/scan.c
index 8b118423ce0457..221d660e81fdb4 100644
--- a/repair/scan.c
+++ b/repair/scan.c
@@ -724,10 +724,11 @@ _("%s freespace btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 			}
 
 			for ( ; b < end; b += blen)  {
-				state = get_bmap_ext(agno, b, end, &blen);
+				state = get_bmap_ext(agno, b, end, &blen, false);
 				switch (state) {
 				case XR_E_UNKNOWN:
-					set_bmap_ext(agno, b, blen, XR_E_FREE1);
+					set_bmap_ext(agno, b, blen, XR_E_FREE1,
+							false);
 					break;
 				case XR_E_FREE1:
 					/*
@@ -737,7 +738,7 @@ _("%s freespace btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 					if (magic == XFS_ABTC_MAGIC ||
 					    magic == XFS_ABTC_CRC_MAGIC) {
 						set_bmap_ext(agno, b, blen,
-							     XR_E_FREE);
+							     XR_E_FREE, false);
 						break;
 					}
 					fallthrough;
@@ -841,27 +842,27 @@ process_rmap_rec(
 		switch (owner) {
 		case XFS_RMAP_OWN_FS:
 		case XFS_RMAP_OWN_LOG:
-			set_bmap_ext(agno, b, blen, XR_E_INUSE_FS1);
+			set_bmap_ext(agno, b, blen, XR_E_INUSE_FS1, false);
 			break;
 		case XFS_RMAP_OWN_AG:
 		case XFS_RMAP_OWN_INOBT:
-			set_bmap_ext(agno, b, blen, XR_E_FS_MAP1);
+			set_bmap_ext(agno, b, blen, XR_E_FS_MAP1, false);
 			break;
 		case XFS_RMAP_OWN_INODES:
-			set_bmap_ext(agno, b, blen, XR_E_INO1);
+			set_bmap_ext(agno, b, blen, XR_E_INO1, false);
 			break;
 		case XFS_RMAP_OWN_REFC:
-			set_bmap_ext(agno, b, blen, XR_E_REFC);
+			set_bmap_ext(agno, b, blen, XR_E_REFC, false);
 			break;
 		case XFS_RMAP_OWN_COW:
-			set_bmap_ext(agno, b, blen, XR_E_COW);
+			set_bmap_ext(agno, b, blen, XR_E_COW, false);
 			break;
 		case XFS_RMAP_OWN_NULL:
 			/* still unknown */
 			break;
 		default:
 			/* file data */
-			set_bmap_ext(agno, b, blen, XR_E_INUSE1);
+			set_bmap_ext(agno, b, blen, XR_E_INUSE1, false);
 			break;
 		}
 		break;
@@ -1207,7 +1208,8 @@ _("%s rmap btree block claimed (state %d), agno %d, bno %d, suspect %d\n"),
 
 			/* Check for block owner collisions. */
 			for ( ; b < end; b += blen)  {
-				state = get_bmap_ext(agno, b, end, &blen);
+				state = get_bmap_ext(agno, b, end, &blen,
+						false);
 				process_rmap_rec(mp, agno, b, end, blen, owner,
 						state, name);
 			}
@@ -1483,14 +1485,16 @@ _("leftover CoW extent has invalid startblock in record %u of %s btree block %u/
 				xfs_extlen_t	cnr;
 
 				for (c = agb; c < end; c += cnr) {
-					state = get_bmap_ext(agno, c, end, &cnr);
+					state = get_bmap_ext(agno, c, end, &cnr,
+							false);
 					switch (state) {
 					case XR_E_UNKNOWN:
 					case XR_E_COW:
 						do_warn(
 _("leftover CoW extent (%u/%u) len %u\n"),
 						agno, c, cnr);
-						set_bmap_ext(agno, c, cnr, XR_E_FREE);
+						set_bmap_ext(agno, c, cnr,
+							XR_E_FREE, false);
 						break;
 					default:
 						do_warn(
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 30b014898c3203..d06bf659df89c1 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -27,6 +27,7 @@
 #include "bulkload.h"
 #include "quotacheck.h"
 #include "rcbag_btree.h"
+#include "rt.h"
 
 /*
  * option tables for getsubopt calls
@@ -729,10 +730,12 @@ _("sb root inode value %" PRIu64 " valid but in unaligned location (expected %"P
 		rootino++;
 	}
 
-	validate_sb_ino(&mp->m_sb.sb_rbmino, rootino + 1,
-			_("realtime bitmap"));
-	validate_sb_ino(&mp->m_sb.sb_rsumino, rootino + 2,
-			_("realtime summary"));
+	if (!xfs_has_rtgroups(mp)) {
+		validate_sb_ino(&mp->m_sb.sb_rbmino, rootino + 1,
+				_("realtime bitmap"));
+		validate_sb_ino(&mp->m_sb.sb_rsumino, rootino + 2,
+				_("realtime summary"));
+	}
 }
 
 /*
@@ -1345,6 +1348,7 @@ main(int argc, char **argv)
 	incore_ino_init(mp);
 	incore_ext_init(mp);
 	rmaps_init(mp);
+	init_rtgroup_inodes();
 
 	/* initialize random globals now that we know the fs geometry */
 	inodes_per_block = mp->m_sb.sb_inopblock;
@@ -1390,11 +1394,14 @@ main(int argc, char **argv)
 		phase6(mp);
 		phase_end(mp, 6);
 
+		free_rtgroup_inodes();
+
 		phase7(mp, phase2_threads);
 		phase_end(mp, 7);
 	} else  {
 		do_warn(
 _("Inode allocation btrees are too corrupted, skipping phases 6 and 7\n"));
+		free_rtgroup_inodes();
 	}
 
 	if (lost_quotas && !have_uquotino && !have_gquotino && !have_pquotino) {



* [PATCH 21/51] xfs_repair: find and clobber rtgroup bitmap and summary files
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (19 preceding siblings ...)
  2024-12-23 22:16   ` [PATCH 20/51] xfs_repair: support realtime groups Darrick J. Wong
@ 2024-12-23 22:17   ` Darrick J. Wong
  2024-12-23 22:17   ` [PATCH 22/51] xfs_repair: support realtime superblocks Darrick J. Wong
                     ` (29 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:17 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

On an rtgroups filesystem, if the rtgroup bitmap or summary files are
garbage, we need to clear the dinode and update the incore bitmap so
that we don't bother to check the old rt freespace metadata.

However, we regenerate the entire rt metadata directory tree during
phase 6.  If the bitmap and summary files are ok, we still want to clear
the dinode, but we can still use the incore inode to check the old
freespace contents.  Split the clear_dinode function into two pieces:
zero_dinode, which merely zeroes the inode, and clear_dinode, which
zeroes the inode and also turns off checking.
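
A minimal standalone sketch of that split, for illustration only.  The
types, the inode numbers 130/131, and the rtg_inode_type() lookup are
hypothetical stand-ins, not the xfs_repair structures; only the shape of
the two functions follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the ondisk inode and repair's incore state. */
enum rtg_inode_type { RTGI_BITMAP, RTGI_SUMMARY, RTGI_MAX };

struct fake_dinode {
	uint64_t	ino;
	uint8_t		fork[64];
};

static bool rtg_inodes_bad[RTGI_MAX];

/* Hypothetical lookup: which rtgroup metadata type, if any, owns this inode? */
static int
rtg_inode_type(uint64_t ino)
{
	if (ino == 130)
		return RTGI_BITMAP;
	if (ino == 131)
		return RTGI_SUMMARY;
	return -1;
}

/* Piece 1: only wipe the ondisk inode; the incore checking state is untouched. */
static void
zero_dinode(struct fake_dinode *dino)
{
	assert(dino != NULL);
	memset(dino->fork, 0, sizeof(dino->fork));
}

/* Piece 2: wipe the inode and tell later phases not to trust its old contents. */
static void
clear_dinode(struct fake_dinode *dino)
{
	int	type = rtg_inode_type(dino->ino);

	zero_dinode(dino);
	if (type >= 0)
		rtg_inodes_bad[type] = true;
}
```

A caller that knows the file will be regenerated anyway but still wants
the old contents checked would use zero_dinode(); a caller that found
garbage would use clear_dinode().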

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/dino_chunks.c |   23 +++++++++++++++++++++++
 repair/dinode.c      |   40 ++++++++++++++++++++++++++++++++++------
 2 files changed, 57 insertions(+), 6 deletions(-)


diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c
index 8935cf856e70c8..fe106f0b6ab536 100644
--- a/repair/dino_chunks.c
+++ b/repair/dino_chunks.c
@@ -15,6 +15,7 @@
 #include "versions.h"
 #include "prefetch.h"
 #include "progress.h"
+#include "rt.h"
 
 /*
  * validates inode block or chunk, returns # of good inodes
@@ -1000,6 +1001,28 @@ process_inode_chunk(
 	_("would clear realtime summary inode %" PRIu64 "\n"),
 						ino);
 				}
+			} else if (is_rtbitmap_inode(ino)) {
+				mark_rtgroup_inodes_bad(mp, XFS_RTGI_BITMAP);
+				if (!no_modify)  {
+					do_warn(
+	_("cleared rtgroup bitmap inode %" PRIu64 "\n"),
+						ino);
+				} else  {
+					do_warn(
+	_("would clear rtgroup bitmap inode %" PRIu64 "\n"),
+						ino);
+				}
+			} else if (is_rtsummary_inode(ino)) {
+				mark_rtgroup_inodes_bad(mp, XFS_RTGI_SUMMARY);
+				if (!no_modify)  {
+					do_warn(
+	_("cleared rtgroup summary inode %" PRIu64 "\n"),
+						ino);
+				} else  {
+					do_warn(
+	_("would clear rtgroup summary inode %" PRIu64 "\n"),
+						ino);
+				}
 			} else if (!no_modify)  {
 				do_warn(_("cleared inode %" PRIu64 "\n"),
 					ino);
diff --git a/repair/dinode.c b/repair/dinode.c
index 0a9059db9302a3..2d341975e53993 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -149,16 +149,36 @@ clear_dinode_unlinked(xfs_mount_t *mp, struct xfs_dinode *dino)
  * until after the agi unlinked lists are walked in phase 3.
  */
 static void
-clear_dinode(xfs_mount_t *mp, struct xfs_dinode *dino, xfs_ino_t ino_num)
+zero_dinode(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dino,
+	xfs_ino_t		ino_num)
 {
 	clear_dinode_core(mp, dino, ino_num);
 	clear_dinode_unlinked(mp, dino);
 
 	/* and clear the forks */
 	memset(XFS_DFORK_DPTR(dino), 0, XFS_LITINO(mp));
-	return;
 }
 
+/*
+ * clear the inode core and, if this is a metadata inode, prevent subsequent
+ * phases from checking the (obviously bad) data in the file.
+ */
+static void
+clear_dinode(
+	struct xfs_mount	*mp,
+	struct xfs_dinode	*dino,
+	xfs_ino_t		ino_num)
+{
+	zero_dinode(mp, dino, ino_num);
+
+	if (is_rtbitmap_inode(ino_num))
+		mark_rtgroup_inodes_bad(mp, XFS_RTGI_BITMAP);
+
+	if (is_rtsummary_inode(ino_num))
+		mark_rtgroup_inodes_bad(mp, XFS_RTGI_SUMMARY);
+}
 
 /*
  * misc. inode-related utility routines
@@ -3069,13 +3089,21 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "),
 		switch (type) {
 		case XR_INO_RTBITMAP:
 		case XR_INO_RTSUM:
+			/*
+			 * rt bitmap and summary files are always recreated
+			 * when rtgroups are enabled.  For older filesystems,
+			 * they exist at fixed locations and cannot be zapped.
+			 */
+			if (xfs_has_rtgroups(mp))
+				zap_metadata = true;
+			break;
 		case XR_INO_UQUOTA:
 		case XR_INO_GQUOTA:
 		case XR_INO_PQUOTA:
 			/*
-			 * This inode was recognized as being filesystem
-			 * metadata, so preserve the inode and its contents for
-			 * later checking and repair.
+			 * Quota checking and repair doesn't happen until
+			 * phase7, so preserve quota inodes and their contents
+			 * for later.
 			 */
 			break;
 		default:
@@ -3165,7 +3193,7 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "),
 	 * file, zero the ondisk inode and the incore state.
 	 */
 	if (check_dups && zap_metadata && !no_modify) {
-		clear_dinode(mp, dino, lino);
+		zero_dinode(mp, dino, lino);
 		*dirty += 1;
 		*used = is_free;
 		*isa_dir = 0;



* [PATCH 22/51] xfs_repair: support realtime superblocks
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (20 preceding siblings ...)
  2024-12-23 22:17   ` [PATCH 21/51] xfs_repair: find and clobber rtgroup bitmap and summary files Darrick J. Wong
@ 2024-12-23 22:17   ` Darrick J. Wong
  2024-12-23 22:17   ` [PATCH 23/51] xfs_repair: repair rtbitmap and rtsummary block headers Darrick J. Wong
                     ` (28 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:17 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

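Support the realtime superblock feature: mark the first rt extent as in
use in the incore rt bitmap, check the rt superblock during phase 3, and
always rewrite it from the primary superblock at the end of repair.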
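
As a rough illustration of reserving the first rt extent for the rt
superblock, here is a standalone sketch of a packed per-extent state
map.  The 4-bits-per-extent packing, the sk_ names, and the state values
are assumptions made for this example; the real XR_BB and XR_E_*
definitions may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packing: 4 bits of state per rt extent, 16 extents per word. */
#define SK_BITS_PER_STATE	4
#define SK_STATES_PER_WORD	(64 / SK_BITS_PER_STATE)
#define SK_STATE_MASK		((1ULL << SK_BITS_PER_STATE) - 1)

/* Hypothetical state codes; the real XR_E_* values may differ. */
enum sk_state { SK_UNKNOWN = 0, SK_INUSE_FS = 3 };

/* Store the state of one rt extent, leaving its neighbours alone. */
static void
sk_set_state(uint64_t *map, uint64_t rtx, enum sk_state state)
{
	unsigned int	shift = (rtx % SK_STATES_PER_WORD) * SK_BITS_PER_STATE;
	uint64_t	*word = &map[rtx / SK_STATES_PER_WORD];

	assert(((uint64_t)state & ~SK_STATE_MASK) == 0);
	*word = (*word & ~(SK_STATE_MASK << shift)) |
		((uint64_t)state << shift);
}

/* Fetch the state of one rt extent. */
static enum sk_state
sk_get_state(const uint64_t *map, uint64_t rtx)
{
	unsigned int	shift = (rtx % SK_STATES_PER_WORD) * SK_BITS_PER_STATE;

	return (enum sk_state)((map[rtx / SK_STATES_PER_WORD] >> shift) &
			SK_STATE_MASK);
}

/* Reserve rt extent 0 when the volume carries a realtime superblock. */
static void
sk_rtsb_init(uint64_t *map, int has_rtsb)
{
	if (has_rtsb && map)
		sk_set_state(map, 0, SK_INUSE_FS);
}
```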

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/libxfs_api_defs.h |    1 +
 repair/incore.c          |   12 ++++++++++
 repair/phase3.c          |    4 +++
 repair/rt.c              |   54 ++++++++++++++++++++++++++++++++++++++++++++++
 repair/rt.h              |    3 +++
 repair/xfs_repair.c      |    4 +++
 6 files changed, 78 insertions(+)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 84965106358d61..dbdf5d100ec8e9 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -300,6 +300,7 @@
 
 #define xfs_rtfree_extent		libxfs_rtfree_extent
 #define xfs_rtfree_blocks		libxfs_rtfree_blocks
+#define xfs_update_rtsb			libxfs_update_rtsb
 #define xfs_sb_from_disk		libxfs_sb_from_disk
 #define xfs_sb_quota_from_disk		libxfs_sb_quota_from_disk
 #define xfs_sb_read_secondary		libxfs_sb_read_secondary
diff --git a/repair/incore.c b/repair/incore.c
index 2339d49a95773d..5b3d077f50e495 100644
--- a/repair/incore.c
+++ b/repair/incore.c
@@ -220,6 +220,15 @@ set_rtbmap(
 	 (((uint64_t) state) << ((rtx % XR_BB_NUM) * XR_BB)));
 }
 
+static void
+rtsb_init(
+	struct xfs_mount	*mp)
+{
+	/* The first rtx of the realtime device contains the super */
+	if (xfs_has_rtsb(mp) && rt_bmap)
+		set_rtbmap(0, XR_E_INUSE_FS);
+}
+
 static void
 reset_rt_bmap(void)
 {
@@ -245,6 +254,8 @@ init_rt_bmap(
 			mp->m_sb.sb_rextents);
 		return;
 	}
+
+	rtsb_init(mp);
 }
 
 static void
@@ -332,6 +343,7 @@ reset_bmaps(
 
 	if (xfs_has_rtgroups(mp)) {
 		reset_rtg_bmaps(mp);
+		rtsb_init(mp);
 	} else {
 		reset_rt_bmap();
 	}
diff --git a/repair/phase3.c b/repair/phase3.c
index ca4dbee47434c8..3a3ca22de14d26 100644
--- a/repair/phase3.c
+++ b/repair/phase3.c
@@ -17,6 +17,7 @@
 #include "progress.h"
 #include "bmap.h"
 #include "threads.h"
+#include "rt.h"
 
 static void
 process_agi_unlinked(
@@ -116,6 +117,9 @@ phase3(
 
 	set_progress_msg(PROG_FMT_AGI_UNLINKED, (uint64_t) glob_agcount);
 
+	if (xfs_has_rtsb(mp) && xfs_has_realtime(mp))
+		check_rtsb(mp);
+
 	/* first clear the agi unlinked AGI list */
 	if (!no_modify) {
 		for (i = 0; i < mp->m_sb.sb_agcount; i++)
diff --git a/repair/rt.c b/repair/rt.c
index 2de6830c931e86..102baa1d5d6186 100644
--- a/repair/rt.c
+++ b/repair/rt.c
@@ -562,3 +562,57 @@ free_rtgroup_inodes(void)
 	for (i = 0; i < XFS_RTGI_MAX; i++)
 		bitmap_free(&rtg_inodes[i]);
 }
+
+void
+check_rtsb(
+	struct xfs_mount	*mp)
+{
+	struct xfs_buf		*bp;
+	int			error;
+
+	error = -libxfs_buf_read_uncached(mp->m_rtdev_targp, XFS_RTSB_DADDR,
+			XFS_FSB_TO_BB(mp, 1), 0, &bp, &xfs_rtsb_buf_ops);
+	if (!error) {
+		libxfs_buf_relse(bp);
+		return;
+	}
+
+	if (no_modify) {
+		do_warn(_("would rewrite realtime superblock\n"));
+		return;
+	}
+
+	/*
+	 * Rewrite the rt superblock so that an update to the primary fs
+	 * superblock will not get confused by the non-matching rtsb.
+	 */
+	do_warn(_("will rewrite realtime superblock\n"));
+	rewrite_rtsb(mp);
+}
+
+void
+rewrite_rtsb(
+	struct xfs_mount	*mp)
+{
+	struct xfs_buf		*rtsb_bp;
+	struct xfs_buf		*sb_bp = libxfs_getsb(mp);
+	int			error;
+
+	if (!sb_bp)
+		do_error(
+ _("couldn't grab primary sb to update realtime sb\n"));
+
+	error = -libxfs_buf_get_uncached(mp->m_rtdev_targp,
+			XFS_FSB_TO_BB(mp, 1), XFS_RTSB_DADDR, &rtsb_bp);
+	if (error)
+		do_error(
+ _("couldn't grab realtime superblock\n"));
+
+	rtsb_bp->b_maps[0].bm_bn = XFS_RTSB_DADDR;
+	rtsb_bp->b_ops = &xfs_rtsb_buf_ops;
+
+	libxfs_update_rtsb(rtsb_bp, sb_bp);
+	libxfs_buf_mark_dirty(rtsb_bp);
+	libxfs_buf_relse(rtsb_bp);
+	libxfs_buf_relse(sb_bp);
+}
diff --git a/repair/rt.h b/repair/rt.h
index 4dfe4a921d4cdf..865d950b2bf3c4 100644
--- a/repair/rt.h
+++ b/repair/rt.h
@@ -33,4 +33,7 @@ static inline bool is_rtsummary_inode(xfs_ino_t ino)
 void mark_rtgroup_inodes_bad(struct xfs_mount *mp, enum xfs_rtg_inodes type);
 bool rtgroup_inodes_were_bad(enum xfs_rtg_inodes type);
 
+void check_rtsb(struct xfs_mount *mp);
+void rewrite_rtsb(struct xfs_mount *mp);
+
 #endif /* _XFS_REPAIR_RT_H_ */
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index d06bf659df89c1..2a8a72e7027591 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -1518,6 +1518,10 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
 				XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
 	}
 
+	/* Always rewrite the realtime superblock */
+	if (xfs_has_rtsb(mp) && xfs_has_realtime(mp))
+		rewrite_rtsb(mp);
+
 	/*
 	 * Done. Flush all cached buffers and inodes first to ensure all
 	 * verifiers are run (where we discover the max metadata LSN), reformat



* [PATCH 23/51] xfs_repair: repair rtbitmap and rtsummary block headers
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (21 preceding siblings ...)
  2024-12-23 22:17   ` [PATCH 22/51] xfs_repair: support realtime superblocks Darrick J. Wong
@ 2024-12-23 22:17   ` Darrick J. Wong
  2024-12-23 22:18   ` [PATCH 24/51] xfs_db: enable the rtblock and rtextent commands for segmented rt block numbers Darrick J. Wong
                     ` (27 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:17 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Check and repair the new block headers attached to rtbitmap and
rtsummary blocks.
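
The header check can be pictured with the following standalone sketch.
The sk_rt_blkhdr layout is a hypothetical simplification (the real
struct xfs_rtbuf_blkinfo carries more fields), and the helper names are
made up for this example.  It only shows the idea of validating a
big-endian owner field and then skipping the header before looking at
the payload words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout; the real ondisk header has more fields. */
struct sk_rt_blkhdr {
	uint8_t	magic[4];
	uint8_t	crc[4];
	uint8_t	owner[8];	/* big-endian inode number */
};

/* Decode a big-endian 64-bit value without relying on host byte order. */
static uint64_t
sk_be64_decode(const uint8_t *p)
{
	uint64_t	v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Return a pointer to the first payload byte of a block and report via
 * *owner_ok whether the header owner matches the expected inode.  Blocks
 * without a header are passed through unchanged.
 */
static const uint8_t *
sk_rt_block_payload(const uint8_t *block, int has_header, uint64_t ino,
		int *owner_ok)
{
	size_t	offset = 0;

	assert(block != NULL && owner_ok != NULL);
	*owner_ok = 1;
	if (has_header) {
		const struct sk_rt_blkhdr *hdr =
				(const struct sk_rt_blkhdr *)block;

		*owner_ok = (sk_be64_decode(hdr->owner) == ino);
		offset = sizeof(*hdr);
	}
	return block + offset;
}
```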

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/rt.c |   15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)


diff --git a/repair/rt.c b/repair/rt.c
index 102baa1d5d6186..5ba04919bc3ccf 100644
--- a/repair/rt.c
+++ b/repair/rt.c
@@ -237,6 +237,7 @@ check_rtfile_contents(
 	while (bno < filelen)  {
 		struct xfs_bmbt_irec	map;
 		struct xfs_buf		*bp;
+		unsigned int		offset = 0;
 		int			nmap = 1;
 
 		error = -libxfs_bmapi_read(ip, bno, 1, &map, &nmap, 0);
@@ -262,7 +263,19 @@ check_rtfile_contents(
 			break;
 		}
 
-		check_rtwords(rtg, filename, bno, bp->b_addr, buf);
+		if (xfs_has_rtgroups(mp)) {
+			struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+
+			if (hdr->rt_owner != cpu_to_be64(ip->i_ino)) {
+				do_warn(
+ _("corrupt owner in %s at dblock 0x%llx\n"),
+					filename, (unsigned long long)bno);
+			}
+
+			offset = sizeof(*hdr);
+		}
+
+		check_rtwords(rtg, filename, bno, bp->b_addr + offset, buf);
 
 		buf += mp->m_blockwsize << XFS_WORDLOG;
 		bno++;



* [PATCH 24/51] xfs_db: enable the rtblock and rtextent commands for segmented rt block numbers
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (22 preceding siblings ...)
  2024-12-23 22:17   ` [PATCH 23/51] xfs_repair: repair rtbitmap and rtsummary block headers Darrick J. Wong
@ 2024-12-23 22:18   ` Darrick J. Wong
  2024-12-23 22:18   ` [PATCH 25/51] xfs_db: enable rtconvert to handle segmented rtblocks Darrick J. Wong
                     ` (26 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:18 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that xfs_rtblock_t can be a segmented address, fix the validation in
rtblock_f to handle the inputs correctly, and fix rtextent_f to do all
of its conversions in linear address space.
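
To make the validation concrete, here is a standalone sketch that
assumes a segmented rt block number is split by shift and mask on
rgblklog (group number in the high bits, group-relative block in the low
bits).  The sk_rtgeo struct and helper names are hypothetical; only the
two bounds checks mirror what rtblock_f does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the rt geometry. */
struct sk_rtgeo {
	uint32_t	rgcount;	/* number of rt groups */
	uint32_t	rgextents;	/* rt extents per group */
	uint32_t	rextsize;	/* fs blocks per rt extent */
	uint8_t		rgblklog;	/* log2 of per-group address span */
};

/* High bits of a segmented address: the rt group number. */
static uint64_t
sk_rtb_to_rgno(const struct sk_rtgeo *g, uint64_t rtbno)
{
	return rtbno >> g->rgblklog;
}

/* Low bits of a segmented address: the block within the group. */
static uint64_t
sk_rtb_to_rgbno(const struct sk_rtgeo *g, uint64_t rtbno)
{
	return rtbno & ((1ULL << g->rgblklog) - 1);
}

/* Both halves must be in range; a linear bound check is not enough. */
static bool
sk_rtblock_valid(const struct sk_rtgeo *g, uint64_t rtbno)
{
	assert(g->rgblklog < 64);
	return sk_rtb_to_rgno(g, rtbno) < g->rgcount &&
	       sk_rtb_to_rgbno(g, rtbno) <
			(uint64_t)g->rgextents * g->rextsize;
}
```

With 100 blocks per group and rgblklog of 7, address 100 falls into the
unused gap at the end of group 0 and is rejected even though it is
smaller than the total block count.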

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/block.c |   34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)


diff --git a/db/block.c b/db/block.c
index f197e10cd5a08d..00830a3d57e1df 100644
--- a/db/block.c
+++ b/db/block.c
@@ -367,10 +367,23 @@ rtblock_f(
 		dbprintf(_("bad rtblock %s\n"), argv[1]);
 		return 0;
 	}
-	if (rtbno >= mp->m_sb.sb_rblocks) {
-		dbprintf(_("bad rtblock %s\n"), argv[1]);
-		return 0;
+
+	if (xfs_has_rtgroups(mp)) {
+		xfs_rgnumber_t	rgno = xfs_rtb_to_rgno(mp, rtbno);
+		xfs_rgblock_t	rgbno = xfs_rtb_to_rgbno(mp, rtbno);
+
+		if (rgno >= mp->m_sb.sb_rgcount ||
+		    rgbno >= mp->m_sb.sb_rgextents * mp->m_sb.sb_rextsize) {
+			dbprintf(_("bad rtblock %s\n"), argv[1]);
+			return 0;
+		}
+	} else {
+		if (rtbno >= mp->m_sb.sb_rblocks) {
+			dbprintf(_("bad rtblock %s\n"), argv[1]);
+			return 0;
+		}
 	}
+
 	ASSERT(typtab[TYP_DATA].typnm == TYP_DATA);
 	set_rt_cur(&typtab[TYP_DATA], xfs_rtb_to_daddr(mp, rtbno), blkbb,
 			DB_RING_ADD, NULL);
@@ -392,14 +405,17 @@ rtextent_help(void)
 /*
  * Move the cursor to a specific location on the realtime block device given
  * a linear address in units of realtime extents.
+ *
+ * NOTE: The user interface assumes a global RT extent number, while the
+ * in-kernel rtx is per-RTG now, thus the odd conversions here.
  */
 static int
 rtextent_f(
 	int		argc,
 	char		**argv)
 {
-	xfs_rtblock_t	rtbno;
-	xfs_rtxnum_t	rtx;
+	uint64_t	rfsbno;
+	uint64_t	rtx;
 	char		*p;
 
 	if (argc == 1) {
@@ -408,9 +424,9 @@ rtextent_f(
 			return 0;
 		}
 
-		rtbno = xfs_daddr_to_rtb(mp, iocur_top->off >> BBSHIFT);
+		rfsbno = XFS_BB_TO_FSB(mp, iocur_top->off >> BBSHIFT);
 		dbprintf(_("current rtextent is %lld\n"),
-				xfs_rtb_to_rtx(mp, rtbno));
+				xfs_blen_to_rtbxlen(mp, rfsbno));
 		return 0;
 	}
 	rtx = strtoull(argv[1], &p, 0);
@@ -423,9 +439,9 @@ rtextent_f(
 		return 0;
 	}
 
-	rtbno = xfs_rtbxlen_to_blen(mp, rtx);
+	rfsbno = xfs_rtbxlen_to_blen(mp, rtx);
 	ASSERT(typtab[TYP_DATA].typnm == TYP_DATA);
-	set_rt_cur(&typtab[TYP_DATA], xfs_rtb_to_daddr(mp, rtbno),
+	set_rt_cur(&typtab[TYP_DATA], XFS_FSB_TO_BB(mp, rfsbno),
 			mp->m_sb.sb_rextsize * blkbb, DB_RING_ADD, NULL);
 	return 0;
 }



* [PATCH 25/51] xfs_db: enable rtconvert to handle segmented rtblocks
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (23 preceding siblings ...)
  2024-12-23 22:18   ` [PATCH 24/51] xfs_db: enable the rtblock and rtextent commands for segmented rt block numbers Darrick J. Wong
@ 2024-12-23 22:18   ` Darrick J. Wong
  2024-12-23 22:18   ` [PATCH 26/51] xfs_db: listify the definition of enum typnm Darrick J. Wong
                     ` (25 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:18 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we've turned xfs_rtblock_t into a segmented address and
xfs_rtxnum_t into a per-rtgroup address, port the rtconvert debugger
command to handle the unit conversions correctly.  Also add an example
of the bitmap/summary-related conversion commands to the manpage.
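
The unit conversions can be sketched in isolation as follows.  The
sk_convgeo struct and the sk_ helper names are hypothetical, and a
512-byte basic block is assumed; the arithmetic follows the linear
daddr-to-rtx conversion and the modulo by extents per group used for the
bitmap-related conversions.

```c
#include <assert.h>
#include <stdint.h>

#define SK_BBSHIFT	9	/* assumed 512-byte basic blocks */

/* Hypothetical subset of the geometry needed for the conversions. */
struct sk_convgeo {
	uint8_t		blocklog;	/* log2 of fs block size in bytes */
	uint32_t	rextsize;	/* fs blocks per rt extent */
	uint32_t	rgextents;	/* rt extents per group, 0 if no rtgroups */
};

/* rt device disk address to a linear (device-wide) rt extent number. */
static uint64_t
sk_daddr_to_rtx(const struct sk_convgeo *g, uint64_t daddr)
{
	assert(g->blocklog >= SK_BBSHIFT && g->rextsize > 0);
	return (daddr >> (g->blocklog - SK_BBSHIFT)) / g->rextsize;
}

/* rt device disk address to an rt extent number relative to its group. */
static uint64_t
sk_daddr_to_rtgrtx(const struct sk_convgeo *g, uint64_t daddr)
{
	uint64_t	rtx = sk_daddr_to_rtx(g, daddr);

	if (g->rgextents)
		return rtx % g->rgextents;
	return rtx;
}

/* Linear rt extent number back to a byte offset on the rt device. */
static uint64_t
sk_rtx_to_bytes(const struct sk_convgeo *g, uint64_t rtx)
{
	return (rtx * g->rextsize) << g->blocklog;
}
```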

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/convert.c      |   52 ++++++++++++++++++++++++++++++++++++++--------------
 man/man8/xfs_db.8 |   17 +++++++++++++++--
 2 files changed, 53 insertions(+), 16 deletions(-)


diff --git a/db/convert.c b/db/convert.c
index 0c5c942150fe6f..2cdde7d05ac397 100644
--- a/db/convert.c
+++ b/db/convert.c
@@ -26,14 +26,14 @@
 	 agino_to_bytes(XFS_INO_TO_AGINO(mp, (x))))
 #define	inoidx_to_bytes(x)	\
 	((uint64_t)(x) << mp->m_sb.sb_inodelog)
-#define rtblock_to_bytes(x)	\
-	((uint64_t)(x) << mp->m_sb.sb_blocklog)
-#define rtx_to_rtblock(x)	\
-	((uint64_t)(x) * mp->m_sb.sb_rextsize)
-#define rbmblock_to_bytes(x)	\
-	rtblock_to_bytes(rtx_to_rtblock(xfs_rbmblock_to_rtx(mp, (uint64_t)x)))
-#define rbmword_to_bytes(x)	\
-	rtblock_to_bytes(rtx_to_rtblock((uint64_t)(x) << XFS_NBWORDLOG))
+#define	rtblock_to_bytes(x)	\
+	(xfs_rtb_to_daddr(mp, (x)) << BBSHIFT)
+#define	rtx_to_bytes(x)		\
+	(((uint64_t)(x) * mp->m_sb.sb_rextsize) << mp->m_sb.sb_blocklog)
+#define	rbmblock_to_bytes(x)	\
+	rtx_to_bytes(xfs_rbmblock_to_rtx(mp, (x)))
+#define	rbmword_to_bytes(x)	\
+	rtx_to_bytes((uint64_t)(x) << XFS_NBWORDLOG)
 
 typedef enum {
 	CT_NONE = -1,
@@ -316,7 +316,7 @@ bytevalue(ctype_t ctype, cval_t *val)
 	case CT_RTBLOCK:
 		return rtblock_to_bytes(val->rtblock);
 	case CT_RTX:
-		return rtblock_to_bytes(rtx_to_rtblock(val->rtx));
+		return rtx_to_bytes(val->rtx);
 	case CT_RBMBLOCK:
 		return rbmblock_to_bytes(val->rbmblock);
 	case CT_RBMWORD:
@@ -495,6 +495,32 @@ rt_daddr_to_rsuminfo(
 	return xfs_rtsumoffs_to_infoword(mp, rsumoff);
 }
 
+/* Translate an rt device disk address to be in units of realtime extents. */
+static inline uint64_t
+rt_daddr_to_rtx(
+	struct xfs_mount	*mp,
+	xfs_daddr_t		daddr)
+{
+	return XFS_BB_TO_FSBT(mp, daddr) / mp->m_sb.sb_rextsize;
+}
+
+/*
+ * Compute the offset of an rt device disk address from the start of an
+ * rtgroup and return the result in units of realtime extents.
+ */
+static inline uint64_t
+rt_daddr_to_rtgrtx(
+	struct xfs_mount	*mp,
+	xfs_daddr_t		daddr)
+{
+	uint64_t		rtx = rt_daddr_to_rtx(mp, daddr);
+
+	if (xfs_has_rtgroups(mp))
+		return rtx % mp->m_sb.sb_rgextents;
+
+	return rtx;
+}
+
 static int
 rtconvert_f(int argc, char **argv)
 {
@@ -565,17 +591,15 @@ rtconvert_f(int argc, char **argv)
 		v = xfs_daddr_to_rtb(mp, v >> BBSHIFT);
 		break;
 	case CT_RTX:
-		v = xfs_daddr_to_rtb(mp, v >> BBSHIFT) / mp->m_sb.sb_rextsize;
+		v = rt_daddr_to_rtx(mp, v >> BBSHIFT);
 		break;
 	case CT_RBMBLOCK:
 		v = xfs_rtx_to_rbmblock(mp,
-				xfs_rtb_to_rtx(mp,
-					xfs_daddr_to_rtb(mp, v >> BBSHIFT)));
+				rt_daddr_to_rtgrtx(mp, v >> BBSHIFT));
 		break;
 	case CT_RBMWORD:
 		v = xfs_rtx_to_rbmword(mp,
-				xfs_rtb_to_rtx(mp,
-					xfs_daddr_to_rtb(mp, v >> BBSHIFT)));
+				rt_daddr_to_rtgrtx(mp, v >> BBSHIFT));
 		break;
 	case CT_RSUMBLOCK:
 		v = rt_daddr_to_rsumblock(mp, v);
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 2325ef169ddc1b..38916036d76c03 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -1239,10 +1239,23 @@ .SH COMMANDS
 conversion type, and the type must be
 .BR rsumlog .
 
-To compute the rt bitmap block from summary file location, the type/number pairs
-must be specified exactly in the order
+For example, these commands tell us where to look in the rt summary file for
+the number of free rtextents of length 2^21 starting in rt bitmap block 30:
+
+.B rtconvert rbmblock 30 rsumlog 21 rsumblock
+
+.B rtconvert rbmblock 30 rsumlog 21 rsuminfo
+
+To compute the rt bitmap block from summary file location, the type/number
+pairs must be specified exactly in the order
 .BR rsumlog ", " rsuminfo ", " rsumblock .
 
+For example, this command tells us which block in the rt bitmap file is
+summarized by info word 809 in rt summary block 10 assuming that the
+maximum free extent length is 2^21 rtextents:
+
+.B rtconvert rsumlog 21 rsuminfo 809 rsumblock 10 rbmblock
+
 .TP
 .BI "sb [" agno ]
 Set current address to SB header in allocation group



* [PATCH 26/51] xfs_db: listify the definition of enum typnm
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (24 preceding siblings ...)
  2024-12-23 22:18   ` [PATCH 25/51] xfs_db: enable rtconvert to handle segmented rtblocks Darrick J. Wong
@ 2024-12-23 22:18   ` Darrick J. Wong
  2024-12-23 22:18   ` [PATCH 27/51] xfs_db: support dumping realtime group data and superblocks Darrick J. Wong
                     ` (24 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:18 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Convert the enum definition into a list so that future patches adding
things to enum typnm don't have to reflow the entire thing.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/type.h |   29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)


diff --git a/db/type.h b/db/type.h
index 411bfe90dbc7e6..397dcf5464c6c8 100644
--- a/db/type.h
+++ b/db/type.h
@@ -11,11 +11,30 @@ struct field;
 
 typedef enum typnm
 {
-	TYP_AGF, TYP_AGFL, TYP_AGI, TYP_ATTR, TYP_BMAPBTA,
-	TYP_BMAPBTD, TYP_BNOBT, TYP_CNTBT, TYP_RMAPBT, TYP_REFCBT, TYP_DATA,
-	TYP_DIR2, TYP_DQBLK, TYP_INOBT, TYP_INODATA, TYP_INODE,
-	TYP_LOG, TYP_RTBITMAP, TYP_RTSUMMARY, TYP_SB, TYP_SYMLINK,
-	TYP_TEXT, TYP_FINOBT, TYP_NONE
+	TYP_AGF,
+	TYP_AGFL,
+	TYP_AGI,
+	TYP_ATTR,
+	TYP_BMAPBTA,
+	TYP_BMAPBTD,
+	TYP_BNOBT,
+	TYP_CNTBT,
+	TYP_RMAPBT,
+	TYP_REFCBT,
+	TYP_DATA,
+	TYP_DIR2,
+	TYP_DQBLK,
+	TYP_INOBT,
+	TYP_INODATA,
+	TYP_INODE,
+	TYP_LOG,
+	TYP_RTBITMAP,
+	TYP_RTSUMMARY,
+	TYP_SB,
+	TYP_SYMLINK,
+	TYP_TEXT,
+	TYP_FINOBT,
+	TYP_NONE
 } typnm_t;
 
 #define DB_FUZZ  2



* [PATCH 27/51] xfs_db: support dumping realtime group data and superblocks
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (25 preceding siblings ...)
  2024-12-23 22:18   ` [PATCH 26/51] xfs_db: listify the definition of enum typnm Darrick J. Wong
@ 2024-12-23 22:18   ` Darrick J. Wong
  2024-12-23 22:19   ` [PATCH 28/51] xfs_db: support changing the label and uuid of rt superblocks Darrick J. Wong
                     ` (23 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:18 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Allow dumping of realtime device superblocks and the new fields in the
primary superblock that were added for rtgroups support.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/Makefile  |    1 +
 db/command.c |    2 +
 db/field.c   |   10 ++++++
 db/field.h   |    5 +++
 db/rtgroup.c |  100 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 db/rtgroup.h |   15 +++++++++
 db/sb.c      |    8 +++++
 db/type.c    |    6 +++
 db/type.h    |    1 +
 9 files changed, 148 insertions(+)
 create mode 100644 db/rtgroup.c
 create mode 100644 db/rtgroup.h


diff --git a/db/Makefile b/db/Makefile
index 83389376c36c46..02eeead25b49d0 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -49,6 +49,7 @@ HFILES = \
 	output.h \
 	print.h \
 	quit.h \
+	rtgroup.h \
 	sb.h \
 	sig.h \
 	strvec.h \
diff --git a/db/command.c b/db/command.c
index 6cda03e9856d84..1b46c3fec08a0e 100644
--- a/db/command.c
+++ b/db/command.c
@@ -39,6 +39,7 @@
 #include "fsmap.h"
 #include "crc.h"
 #include "fuzz.h"
+#include "rtgroup.h"
 
 cmdinfo_t	*cmdtab;
 int		ncmds;
@@ -135,6 +136,7 @@ init_commands(void)
 	output_init();
 	print_init();
 	quit_init();
+	rtsb_init();
 	sb_init();
 	type_init();
 	write_init();
diff --git a/db/field.c b/db/field.c
index 946684415f65d3..f70955ef57a323 100644
--- a/db/field.c
+++ b/db/field.c
@@ -23,6 +23,7 @@
 #include "dir2.h"
 #include "dir2sf.h"
 #include "symlink.h"
+#include "rtgroup.h"
 
 #define	PPOFF(f)	bitize(offsetof(struct xfs_parent_rec, f))
 const field_t		parent_flds[] = {
@@ -52,6 +53,13 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_AGNUMBER, "agnumber", fp_num, "%u", SI(bitsz(xfs_agnumber_t)),
 	  FTARG_DONULL, NULL, NULL },
 
+	{ FLDT_RGBLOCK, "rgblock", fp_num, "%u", SI(bitsz(xfs_rgblock_t)),
+	  FTARG_DONULL, NULL, NULL },
+	{ FLDT_RTXLEN, "rtxlen", fp_num, "%u", SI(bitsz(xfs_rtxlen_t)),
+	  FTARG_DONULL, NULL, NULL },
+	{ FLDT_RGNUMBER, "rgnumber", fp_num, "%u", SI(bitsz(xfs_rgnumber_t)),
+	  FTARG_DONULL, NULL, NULL },
+
 /* attr fields */
 	{ FLDT_ATTR, "attr", NULL, (char *)attr_flds, attr_size, FTARG_SIZE,
 	  NULL, attr_flds },
@@ -357,6 +365,8 @@ const ftattr_t	ftattrtab[] = {
 	  NULL, NULL },
 	{ FLDT_SB, "sb", NULL, (char *)sb_flds, sb_size, FTARG_SIZE, NULL,
 	  sb_flds },
+	{ FLDT_RTSB, "rtsb", NULL, (char *)rtsb_flds, rtsb_size, FTARG_SIZE,
+	  NULL, rtsb_flds },
 
 /* CRC enabled symlink */
 	{ FLDT_SYMLINK_CRC, "symlink", NULL, (char *)symlink_crc_flds,
diff --git a/db/field.h b/db/field.h
index 9746676a6c7ac9..8797a75f669246 100644
--- a/db/field.h
+++ b/db/field.h
@@ -15,6 +15,10 @@ typedef enum fldt	{
 	FLDT_AGINONN,
 	FLDT_AGNUMBER,
 
+	FLDT_RGBLOCK,
+	FLDT_RTXLEN,
+	FLDT_RGNUMBER,
+
 	/* attr fields */
 	FLDT_ATTR,
 	FLDT_ATTR_BLKINFO,
@@ -167,6 +171,7 @@ typedef enum fldt	{
 	FLDT_QCNT,
 	FLDT_QWARNCNT,
 	FLDT_SB,
+	FLDT_RTSB,
 
 	/* CRC enabled symlink */
 	FLDT_SYMLINK_CRC,
diff --git a/db/rtgroup.c b/db/rtgroup.c
new file mode 100644
index 00000000000000..5cda1a4f35efb6
--- /dev/null
+++ b/db/rtgroup.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs.h"
+#include "libxlog.h"
+#include "command.h"
+#include "type.h"
+#include "faddr.h"
+#include "fprint.h"
+#include "field.h"
+#include "io.h"
+#include "sb.h"
+#include "bit.h"
+#include "output.h"
+#include "init.h"
+#include "rtgroup.h"
+
+#define uuid_equal(s,d)		(platform_uuid_compare((s),(d)) == 0)
+
+static int	rtsb_f(int argc, char **argv);
+static void     rtsb_help(void);
+
+static const cmdinfo_t	rtsb_cmd =
+	{ "rtsb", NULL, rtsb_f, 0, 0, 1, "",
+	  N_("set current address to realtime sb header"), rtsb_help };
+
+void
+rtsb_init(void)
+{
+	if (xfs_has_rtgroups(mp))
+		add_command(&rtsb_cmd);
+}
+
+#define	OFF(f)	bitize(offsetof(struct xfs_rtsb, rsb_ ## f))
+#define	SZC(f)	szcount(struct xfs_rtsb, rsb_ ## f)
+const field_t	rtsb_flds[] = {
+	{ "magicnum", FLDT_UINT32X, OI(OFF(magicnum)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE },
+	{ "pad", FLDT_UINT32X, OI(OFF(pad)), C1, 0, TYP_NONE },
+	{ "fname", FLDT_CHARNS, OI(OFF(fname)), CI(SZC(fname)), 0, TYP_NONE },
+	{ "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE },
+	{ "meta_uuid", FLDT_UUID, OI(OFF(meta_uuid)), C1, 0, TYP_NONE },
+	{ NULL }
+};
+
+const field_t	rtsb_hfld[] = {
+	{ "", FLDT_RTSB, OI(0), C1, 0, TYP_NONE },
+	{ NULL }
+};
+
+static void
+rtsb_help(void)
+{
+	dbprintf(_(
+"\n"
+" seek to realtime superblock\n"
+"\n"
+" Example:\n"
+"\n"
+" 'rtsb' - set location to realtime superblock, set type to 'rtsb'\n"
+"\n"
+" Located in the first block of the realtime volume, the rt superblock\n"
+" contains the base information for the realtime section of a filesystem.\n"
+"\n"
+));
+}
+
+static int
+rtsb_f(
+	int		argc,
+	char		**argv)
+{
+	int		c;
+
+	while ((c = getopt(argc, argv, "")) != -1) {
+		switch (c) {
+		default:
+			rtsb_help();
+			return 0;
+		}
+	}
+
+	cur_agno = NULLAGNUMBER;
+
+	ASSERT(typtab[TYP_RTSB].typnm == TYP_RTSB);
+	set_rt_cur(&typtab[TYP_RTSB], XFS_RTSB_DADDR, XFS_FSB_TO_BB(mp, 1),
+			DB_RING_ADD, NULL);
+	return 0;
+}
+
+int
+rtsb_size(
+	void	*obj,
+	int	startoff,
+	int	idx)
+{
+	return bitize(mp->m_sb.sb_blocksize);
+}
diff --git a/db/rtgroup.h b/db/rtgroup.h
new file mode 100644
index 00000000000000..85960a3fb9f5c9
--- /dev/null
+++ b/db/rtgroup.h
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2022-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef DB_RTGROUP_H_
+#define DB_RTGROUP_H_
+
+extern const struct field	rtsb_flds[];
+extern const struct field	rtsb_hfld[];
+
+extern void	rtsb_init(void);
+extern int	rtsb_size(void *obj, int startoff, int idx);
+
+#endif /* DB_RTGROUP_H_ */
diff --git a/db/sb.c b/db/sb.c
index fa15b429ecbefa..de248c10cb27f5 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -127,6 +127,14 @@ const field_t	sb_flds[] = {
 	{ "meta_uuid", FLDT_UUID, OI(OFF(meta_uuid)), C1, 0, TYP_NONE },
 	{ "metadirino", FLDT_INO, OI(OFF(metadirino)), metadirfld_count,
 		FLD_COUNT, TYP_INODE },
+	{ "rgcount", FLDT_RGNUMBER, OI(OFF(rgcount)), metadirfld_count,
+		FLD_COUNT, TYP_NONE },
+	{ "rgextents", FLDT_RTXLEN, OI(OFF(rgextents)), metadirfld_count,
+		FLD_COUNT, TYP_NONE },
+	{ "rgblklog", FLDT_UINT8D, OI(OFF(rgblklog)), metadirfld_count,
+		FLD_COUNT, TYP_NONE },
+	{ "pad", FLDT_UINT8X, OI(OFF(pad)), metadirfld_count,
+		FLD_COUNT, TYP_NONE },
 	{ NULL }
 };
 
diff --git a/db/type.c b/db/type.c
index efe7044569d357..d875c0c636553b 100644
--- a/db/type.c
+++ b/db/type.c
@@ -28,6 +28,7 @@
 #include "text.h"
 #include "symlink.h"
 #include "fuzz.h"
+#include "rtgroup.h"
 
 static const typ_t	*findtyp(char *name);
 static int		type_f(int argc, char **argv);
@@ -60,6 +61,7 @@ static const typ_t	__typtab[] = {
 	{ TYP_LOG, "log", NULL, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_RTBITMAP, "rtbitmap", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_RTSUMMARY, "rtsummary", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF },
+	{ TYP_RTSB, "rtsb", handle_struct, rtsb_hfld, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_SB, "sb", handle_struct, sb_hfld, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_SYMLINK, "symlink", handle_string, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_TEXT, "text", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF },
@@ -102,6 +104,8 @@ static const typ_t	__typtab_crc[] = {
 	{ TYP_LOG, "log", NULL, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_RTBITMAP, "rtbitmap", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_RTSUMMARY, "rtsummary", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF },
+	{ TYP_RTSB, "rtsb", handle_struct, rtsb_hfld, &xfs_rtsb_buf_ops,
+		XFS_SB_CRC_OFF },
 	{ TYP_SB, "sb", handle_struct, sb_hfld, &xfs_sb_buf_ops,
 		XFS_SB_CRC_OFF },
 	{ TYP_SYMLINK, "symlink", handle_struct, symlink_crc_hfld,
@@ -146,6 +150,8 @@ static const typ_t	__typtab_spcrc[] = {
 	{ TYP_LOG, "log", NULL, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_RTBITMAP, "rtbitmap", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_RTSUMMARY, "rtsummary", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF },
+	{ TYP_RTSB, "rtsb", handle_struct, rtsb_hfld, &xfs_rtsb_buf_ops,
+		XFS_SB_CRC_OFF },
 	{ TYP_SB, "sb", handle_struct, sb_hfld, &xfs_sb_buf_ops,
 		XFS_SB_CRC_OFF },
 	{ TYP_SYMLINK, "symlink", handle_struct, symlink_crc_hfld,
diff --git a/db/type.h b/db/type.h
index 397dcf5464c6c8..d4efa4b0fab541 100644
--- a/db/type.h
+++ b/db/type.h
@@ -30,6 +30,7 @@ typedef enum typnm
 	TYP_LOG,
 	TYP_RTBITMAP,
 	TYP_RTSUMMARY,
+	TYP_RTSB,
 	TYP_SB,
 	TYP_SYMLINK,
 	TYP_TEXT,


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 28/51] xfs_db: support changing the label and uuid of rt superblocks
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (26 preceding siblings ...)
  2024-12-23 22:18   ` [PATCH 27/51] xfs_db: support dumping realtime group data and superblocks Darrick J. Wong
@ 2024-12-23 22:19   ` Darrick J. Wong
  2024-12-23 22:19   ` [PATCH 29/51] xfs_db: enable conversion of rt space units Darrick J. Wong
                     ` (22 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:19 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Update the label and uuid commands to change the rt superblocks along
with the filesystem superblocks.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/sb.c |  109 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 98 insertions(+), 11 deletions(-)
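
Reviewer's sketch (not part of the patch): the label handling that this
change factors out into check_label() can be illustrated standalone as
below.  The sketch_ names are hypothetical, and SKETCH_LABEL_MAX is an
assumed stand-in for XFSLABEL_MAX; the real code operates on struct xfs_sb
and struct xfs_rtsb buffers rather than a bare char array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed value for illustration; xfsprogs supplies XFSLABEL_MAX itself. */
#define SKETCH_LABEL_MAX	12

/*
 * Hypothetical stand-in for check_label(): clamp the length to the size of
 * the on-disk field, and treat the two-character strings "", '' and -- as a
 * request to clear the label.
 */
static size_t
sketch_check_label(char *label)
{
	size_t	len = strlen(label);

	if (len > SKETCH_LABEL_MAX)
		len = SKETCH_LABEL_MAX;
	if (len == 2 &&
	    (strcmp(label, "\"\"") == 0 ||
	     strcmp(label, "''") == 0 ||
	     strcmp(label, "--") == 0))
		label[0] = label[1] = '\0';
	return len;
}

/* Zero the fixed-size field first, then copy at most len bytes into it. */
static void
sketch_set_fname(char *fname, char *label)
{
	size_t	len = sketch_check_label(label);

	memset(fname, 0, SKETCH_LABEL_MAX);
	memcpy(fname, label, len);
}
```

Because the field is zeroed before the copy, a short label is always
NUL-padded, and a label of exactly the maximum length is stored without a
terminator, which matches how the patch fills sb_fname and rsb_fname.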


diff --git a/db/sb.c b/db/sb.c
index de248c10cb27f5..aa8fce6712e571 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -27,6 +27,7 @@ static int	label_f(int argc, char **argv);
 static void     label_help(void);
 static int	version_f(int argc, char **argv);
 static void     version_help(void);
+static size_t check_label(char *label, bool can_warn);
 
 static const cmdinfo_t	sb_cmd =
 	{ "sb", NULL, sb_f, 0, 1, 1, N_("[agno]"),
@@ -332,6 +333,65 @@ uuid_help(void)
 ));
 }
 
+static bool
+check_rtsb(
+	struct xfs_mount	*mp)
+{
+	int			error;
+
+	if (!xfs_has_realtime(mp) || !xfs_has_rtsb(mp))
+		return false;
+
+	push_cur();
+	error = set_rt_cur(&typtab[TYP_RTSB], XFS_RTSB_DADDR,
+			XFS_FSB_TO_BB(mp, 1), DB_RING_ADD, NULL);
+	if (error == ENODEV) {
+		/* no rt dev means we should just bail out */
+		pop_cur();
+		return true;
+	}
+
+	pop_cur();
+	return false;
+}
+
+static int
+update_rtsb(
+	struct xfs_mount	*mp,
+	uuid_t			*uuid,
+	char			*label)
+{
+	struct xfs_rtsb		*rsb;
+	int			error;
+
+	if (!xfs_has_rtsb(mp) || !xfs_has_realtime(mp))
+		return 0;
+
+	push_cur();
+	error = set_rt_cur(&typtab[TYP_RTSB], XFS_RTSB_DADDR,
+			XFS_FSB_TO_BB(mp, 1), DB_RING_ADD, NULL);
+	if (error == ENODEV) {
+		/* no rt dev means we should just bail out */
+		exitcode = 1;
+		pop_cur();
+		return 1;
+	}
+
+	rsb = iocur_top->data;
+	if (label) {
+		size_t	len = check_label(label, false);
+
+		memset(&rsb->rsb_fname, 0, XFSLABEL_MAX);
+		memcpy(&rsb->rsb_fname, label, len);
+	}
+	if (uuid)
+		memcpy(&rsb->rsb_uuid, uuid, sizeof(rsb->rsb_uuid));
+	write_cur();
+	pop_cur();
+
+	return 0;
+}
+
 static uuid_t *
 do_uuid(xfs_agnumber_t agno, uuid_t *uuid)
 {
@@ -376,6 +436,7 @@ do_uuid(xfs_agnumber_t agno, uuid_t *uuid)
 	memcpy(&tsb.sb_uuid, uuid, sizeof(uuid_t));
 	libxfs_sb_to_disk(iocur_top->data, &tsb);
 	write_cur();
+	memcpy(&mp->m_sb.sb_uuid, uuid, sizeof(uuid_t));
 	return uuid;
 }
 
@@ -438,11 +499,18 @@ uuid_f(
 			}
 		}
 
+		if (check_rtsb(mp)) {
+			exitcode = 1;
+			return 0;
+		}
+
 		/* clear the log (setting uuid) if it's not dirty */
 		if (!sb_logzero(&uu))
 			return 0;
 
 		dbprintf(_("writing all SBs\n"));
+		if (update_rtsb(mp, &uu, NULL))
+			return 1;
 		for (agno = 0; agno < mp->m_sb.sb_agcount; agno++)
 			if (!do_uuid(agno, &uu)) {
 				dbprintf(_("failed to set UUID in AG %d\n"), agno);
@@ -511,6 +579,27 @@ label_help(void)
 ));
 }
 
+static size_t
+check_label(
+	char	*label,
+	bool	can_warn)
+{
+	size_t	len = strlen(label);
+
+	if (len > XFSLABEL_MAX) {
+		if (can_warn)
+			dbprintf(_("%s: truncating label length from %d to %d\n"),
+				progname, (int)len, XFSLABEL_MAX);
+		len = XFSLABEL_MAX;
+	}
+	if ( len == 2 &&
+	     (strcmp(label, "\"\"") == 0 ||
+	      strcmp(label, "''")   == 0 ||
+	      strcmp(label, "--")   == 0) )
+		label[0] = label[1] = '\0';
+	return len;
+}
+
 static char *
 do_label(xfs_agnumber_t agno, char *label)
 {
@@ -529,22 +618,13 @@ do_label(xfs_agnumber_t agno, char *label)
 		return &lbl[0];
 	}
 	/* set label */
-	if ((len = strlen(label)) > sizeof(tsb.sb_fname)) {
-		if (agno == 0)
-			dbprintf(_("%s: truncating label length from %d to %d\n"),
-				progname, (int)len, (int)sizeof(tsb.sb_fname));
-		len = sizeof(tsb.sb_fname);
-	}
-	if ( len == 2 &&
-	     (strcmp(label, "\"\"") == 0 ||
-	      strcmp(label, "''")   == 0 ||
-	      strcmp(label, "--")   == 0) )
-		label[0] = label[1] = '\0';
+	len = check_label(label, agno == 0);
 	memset(&tsb.sb_fname, 0, sizeof(tsb.sb_fname));
 	memcpy(&tsb.sb_fname, label, len);
 	memcpy(&lbl[0], &tsb.sb_fname, sizeof(tsb.sb_fname));
 	libxfs_sb_to_disk(iocur_top->data, &tsb);
 	write_cur();
+	memcpy(&mp->m_sb.sb_fname, &tsb.sb_fname, XFSLABEL_MAX);
 	return &lbl[0];
 }
 
@@ -576,7 +656,14 @@ label_f(
 			return 0;
 		}
 
+		if (check_rtsb(mp)) {
+			exitcode = 1;
+			return 0;
+		}
+
 		dbprintf(_("writing all SBs\n"));
+		if (update_rtsb(mp, NULL, argv[1]))
+			return 1;
 		for (ag = 0; ag < mp->m_sb.sb_agcount; ag++)
 			if ((p = do_label(ag, argv[1])) == NULL) {
 				dbprintf(_("failed to set label in AG %d\n"), ag);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 29/51] xfs_db: enable conversion of rt space units
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (27 preceding siblings ...)
  2024-12-23 22:19   ` [PATCH 28/51] xfs_db: support changing the label and uuid of rt superblocks Darrick J. Wong
@ 2024-12-23 22:19   ` Darrick J. Wong
  2024-12-23 22:19   ` [PATCH 30/51] xfs_db: metadump metadir rt bitmap and summary files Darrick J. Wong
                     ` (21 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:19 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Teach the xfs_db convert function about realtime extents, blocks, and
realtime group numbers.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/convert.c      |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 man/man8/xfs_db.8 |   17 +++++++++++++
 2 files changed, 84 insertions(+), 1 deletion(-)
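
Reviewer's sketch (not part of the patch): the arithmetic behind the new
rgno/rgbno conversions can be tried in isolation as below.  The sketch_
names and the geometry structure are hypothetical; in xfs_db the block size
log comes from mp->m_sb.sb_blocklog and the group size from
mp->m_groups[XG_TYPE_RTG].blocks.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BBSHIFT	9	/* daddrs count 512-byte basic blocks */

/* Hypothetical geometry; the real values live in struct xfs_mount. */
struct sketch_geom {
	unsigned int	blocklog;	/* log2 of the fs block size */
	uint32_t	rgblocks;	/* fs blocks per realtime group */
};

/* Filesystem block containing a daddr, rounding down. */
static uint64_t
sketch_daddr_to_fsb(const struct sketch_geom *g, uint64_t daddr)
{
	return daddr >> (g->blocklog - SKETCH_BBSHIFT);
}

/* Realtime group number: which group the block falls into. */
static uint32_t
sketch_daddr_to_rgno(const struct sketch_geom *g, uint64_t daddr)
{
	return sketch_daddr_to_fsb(g, daddr) / g->rgblocks;
}

/* Block number within that realtime group. */
static uint32_t
sketch_daddr_to_rgbno(const struct sketch_geom *g, uint64_t daddr)
{
	return sketch_daddr_to_fsb(g, daddr) % g->rgblocks;
}

/* Byte offset of the start of a group, like rgnumber_to_bytes(). */
static uint64_t
sketch_rgno_to_bytes(const struct sketch_geom *g, uint32_t rgno)
{
	return ((uint64_t)rgno * g->rgblocks) << g->blocklog;
}
```

The division and modulus are a matched pair, so converting a byte offset to
a group number and a group block and back should round-trip as long as the
offset is block aligned.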


diff --git a/db/convert.c b/db/convert.c
index 2cdde7d05ac397..47d3e86fdc4ef2 100644
--- a/db/convert.c
+++ b/db/convert.c
@@ -34,6 +34,21 @@
 	rtx_to_bytes(xfs_rbmblock_to_rtx(mp, (x)))
 #define	rbmword_to_bytes(x)	\
 	rtx_to_bytes((uint64_t)(x) << XFS_NBWORDLOG)
+#define	rgblock_to_bytes(x)	\
+	((uint64_t)(x) << mp->m_sb.sb_blocklog)
+#define	rgnumber_to_bytes(x)	\
+	rgblock_to_bytes((uint64_t)(x) * mp->m_groups[XG_TYPE_RTG].blocks)
+
+static inline xfs_rgnumber_t
+xfs_daddr_to_rgno(
+	struct xfs_mount	*mp,
+	xfs_daddr_t		daddr)
+{
+	if (!xfs_has_rtgroups(mp))
+		return 0;
+
+	return XFS_BB_TO_FSBT(mp, daddr) / mp->m_groups[XG_TYPE_RTG].blocks;
+}
 
 typedef enum {
 	CT_NONE = -1,
@@ -55,6 +70,8 @@ typedef enum {
 	CT_RSUMBLOCK,		/* block within rt summary */
 	CT_RSUMLOG,		/* log level for rtsummary computations */
 	CT_RSUMINFO,		/* info word within rt summary */
+	CT_RGBLOCK,		/* xfs_rgblock_t */
+	CT_RGNUMBER,		/* xfs_rgnumber_t */
 	NCTS
 } ctype_t;
 
@@ -80,6 +97,8 @@ typedef union {
 	xfs_fileoff_t	rbmblock;
 	unsigned int	rbmword;
 	xfs_fileoff_t	rsumblock;
+	xfs_rgnumber_t	rgnumber;
+	xfs_rgblock_t	rgblock;
 } cval_t;
 
 static uint64_t		bytevalue(ctype_t ctype, cval_t *val);
@@ -95,7 +114,7 @@ static const char	*agnumber_names[] = { "agnumber", "agno", NULL };
 static const char	*bboff_names[] = { "bboff", "daddroff", NULL };
 static const char	*blkoff_names[] = { "blkoff", "fsboff", "agboff",
 					    NULL };
-static const char	*rtblkoff_names[] = { "blkoff", "rtboff",
+static const char	*rtblkoff_names[] = { "blkoff", "rtboff", "rgboff",
 					    NULL };
 static const char	*byte_names[] = { "byte", "fsbyte", NULL };
 static const char	*daddr_names[] = { "daddr", "bb", NULL };
@@ -111,6 +130,8 @@ static const char	*rbmword_names[] = { "rbmword", "rbmw", NULL };
 static const char	*rsumblock_names[] = { "rsumblock", "rsmb", NULL };
 static const char	*rsumlog_names[] = { "rsumlog", "rsml", NULL };
 static const char	*rsumword_names[] = { "rsuminfo", "rsmi", NULL };
+static const char	*rgblock_names[] = { "rgblock", "rgbno", NULL };
+static const char	*rgnumber_names[] = { "rgnumber", "rgno", NULL };
 
 static int		rsuminfo;
 static int		rsumlog;
@@ -244,6 +265,22 @@ static const ctydesc_t	ctydescs_rt[NCTS] = {
 		.allowed = M(RSUMBLOCK),
 		.names   = rsumword_names,
 	},
+	[CT_RGBLOCK] = {
+		.allowed = M(RGNUMBER) |
+			   M(BBOFF) |
+			   M(BLKOFF) |
+			   M(RSUMLOG),
+		.names   = rgblock_names,
+	},
+	[CT_RGNUMBER] = {
+		.allowed = M(RGBLOCK) |
+			   M(BBOFF) |
+			   M(BLKOFF) |
+			   M(RSUMLOG) |
+			   M(RBMBLOCK) |
+			   M(RBMWORD),
+		.names   = rgnumber_names,
+	},
 };
 
 static const cmdinfo_t	convert_cmd =
@@ -331,6 +368,10 @@ bytevalue(ctype_t ctype, cval_t *val)
 		 * value.
 		 */
 		return 0;
+	case CT_RGBLOCK:
+		return rgblock_to_bytes(val->rgblock);
+	case CT_RGNUMBER:
+		return rgnumber_to_bytes(val->rgnumber);
 	case CT_NONE:
 	case NCTS:
 		break;
@@ -437,6 +478,8 @@ convert_f(int argc, char **argv)
 	case CT_RSUMBLOCK:
 	case CT_RSUMLOG:
 	case CT_RSUMINFO:
+	case CT_RGBLOCK:
+	case CT_RGNUMBER:
 		/* shouldn't get here */
 		ASSERT(0);
 		break;
@@ -521,6 +564,17 @@ rt_daddr_to_rtgrtx(
 	return rtx;
 }
 
+static inline xfs_rgblock_t
+rt_daddr_to_rgbno(
+	struct xfs_mount	*mp,
+	xfs_daddr_t		daddr)
+{
+	if (!xfs_has_rtgroups(mp))
+		return 0;
+
+	return XFS_BB_TO_FSBT(mp, daddr) % mp->m_groups[XG_TYPE_RTG].blocks;
+}
+
 static int
 rtconvert_f(int argc, char **argv)
 {
@@ -611,6 +665,12 @@ rtconvert_f(int argc, char **argv)
 	case CT_RSUMINFO:
 		v = rt_daddr_to_rsuminfo(mp, v);
 		break;
+	case CT_RGBLOCK:
+		v = rt_daddr_to_rgbno(mp, v >> BBSHIFT);
+		break;
+	case CT_RGNUMBER:
+		v = xfs_daddr_to_rgno(mp, v >> BBSHIFT);
+		break;
 	case CT_AGBLOCK:
 	case CT_AGINO:
 	case CT_AGNUMBER:
@@ -703,6 +763,12 @@ getvalue(char *s, ctype_t ctype, cval_t *val)
 	case CT_RSUMINFO:
 		rsuminfo = (unsigned int)v;
 		break;
+	case CT_RGBLOCK:
+		val->rgblock = (xfs_rgblock_t)v;
+		break;
+	case CT_RGNUMBER:
+		val->rgnumber = (xfs_rgnumber_t)v;
+		break;
 	case CT_NONE:
 	case NCTS:
 		/* NOTREACHED */
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 38916036d76c03..06f4464a928596 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -1162,6 +1162,16 @@ .SH COMMANDS
 .RS 1.0i
 .PD 0
 .HP
+.B rgblock
+or
+.B rgbno
+(realtime block within a realtime group)
+.HP
+.B rgnumber
+or
+.B rgno
+(realtime group number)
+.HP
 .B bboff
 or
 .B daddroff
@@ -1229,6 +1239,13 @@ .SH COMMANDS
 .RE
 .IP
 Only conversions that "make sense" are allowed.
+The compound form (with more than three arguments) is useful for
+conversions such as
+.B convert rgno
+.I rg
+.B rgbno
+.I rgb
+.BR rtblock .
 
 Realtime summary file location conversions have the following rules:
 Each info word in the rt summary file counts the number of free extents of a


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 30/51] xfs_db: metadump metadir rt bitmap and summary files
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (28 preceding siblings ...)
  2024-12-23 22:19   ` [PATCH 29/51] xfs_db: enable conversion of rt space units Darrick J. Wong
@ 2024-12-23 22:19   ` Darrick J. Wong
  2024-12-23 22:19   ` [PATCH 31/51] xfs_db: metadump realtime devices Darrick J. Wong
                     ` (20 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:19 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Don't skip dumping the data fork for regular files that are marked as
metadata inodes.  This catches the rtbitmap and rtsummary inodes on
rtgroup-enabled filesystems, where their inode numbers aren't recorded in
the superblock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/metadump.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)


diff --git a/db/metadump.c b/db/metadump.c
index 144ebfbfe2a90e..8eb4b8eb69e45c 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -2072,7 +2072,11 @@ process_bmbt_reclist(
 	xfs_agblock_t		agbno;
 	int			rval = 1;
 
-	if (btype == TYP_DATA)
+	/*
+	 * Ignore regular file data except for metadata inodes, where it is by
+	 * definition metadata.
+	 */
+	if (btype == TYP_DATA && !is_metadata_ino(dip))
 		return 1;
 
 	convert_extent(&rp[numrecs - 1], &o, &s, &c, &f);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 31/51] xfs_db: metadump realtime devices
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (29 preceding siblings ...)
  2024-12-23 22:19   ` [PATCH 30/51] xfs_db: metadump metadir rt bitmap and summary files Darrick J. Wong
@ 2024-12-23 22:19   ` Darrick J. Wong
  2024-12-23 22:20   ` [PATCH 32/51] xfs_db: dump rt bitmap blocks Darrick J. Wong
                     ` (19 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:19 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Teach the metadump command to dump the filesystem metadata of a realtime
device to the metadump file.  Currently, this is limited to the realtime
superblock.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/metadump.c             |   53 +++++++++++++++++++++++++++++++++++++++++++++
 db/xfs_metadump.sh        |    5 +++-
 include/xfs_metadump.h    |    8 +++++++
 man/man8/xfs_metadump.8   |   11 +++++++++
 mdrestore/xfs_mdrestore.c |    5 +++-
 5 files changed, 79 insertions(+), 3 deletions(-)
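
Reviewer's sketch (not part of the patch): the v2 extent address encoding
and the incompat flag check can be modelled as below.  The sketch_ and
SKETCH_ names are hypothetical, and the shift of 54 is taken from the
"lowest 54 bits" comment in struct xfs_meta_extent.  The sketch works on
host-endian values, whereas the on-disk fields are big-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 54 address bits, then 2 device bits. */
#define SKETCH_DEVICE_SHIFT	54
#define SKETCH_ADDR_MASK	((1ULL << SKETCH_DEVICE_SHIFT) - 1)
#define SKETCH_DATA_DEVICE	(0ULL << SKETCH_DEVICE_SHIFT)
#define SKETCH_LOG_DEVICE	(1ULL << SKETCH_DEVICE_SHIFT)
#define SKETCH_RT_DEVICE	(2ULL << SKETCH_DEVICE_SHIFT)
#define SKETCH_DEVICE_MASK	(3ULL << SKETCH_DEVICE_SHIFT)

#define SKETCH_INCOMPAT_RTDEVICE	(1U << 0)
#define SKETCH_INCOMPAT_ALL		(SKETCH_INCOMPAT_RTDEVICE)

enum sketch_dev {
	SKETCH_DEV_DATA,
	SKETCH_DEV_LOG,
	SKETCH_DEV_RT,
	SKETCH_DEV_UNKNOWN,
};

/* Combine a 512-byte block address with one of the device selectors. */
static uint64_t
sketch_pack_addr(uint64_t daddr, uint64_t devbits)
{
	return (daddr & SKETCH_ADDR_MASK) | devbits;
}

/* Recover which device an extent was copied from. */
static enum sketch_dev
sketch_addr_device(uint64_t addr)
{
	switch (addr & SKETCH_DEVICE_MASK) {
	case SKETCH_DATA_DEVICE:
		return SKETCH_DEV_DATA;
	case SKETCH_LOG_DEVICE:
		return SKETCH_DEV_LOG;
	case SKETCH_RT_DEVICE:
		return SKETCH_DEV_RT;
	default:
		return SKETCH_DEV_UNKNOWN;
	}
}

/* Nonzero if any incompat bit outside the known set is present. */
static int
sketch_has_unknown_incompat(uint32_t flags)
{
	return (flags & ~SKETCH_INCOMPAT_ALL) != 0;
}
```

Masking against the known set, rather than comparing the whole field with
zero, is what lets a restore tool recognise the RTDEVICE bit and report it
specifically while still rejecting bits it has never heard of.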


diff --git a/db/metadump.c b/db/metadump.c
index 8eb4b8eb69e45c..4f4b4f8a39a551 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -85,6 +85,7 @@ static struct metadump {
 	bool			dirty_log;
 	bool			external_log;
 	bool			stdout_metadump;
+	bool			realtime_data;
 	xfs_ino_t		cur_ino;
 	/* Metadump file */
 	FILE			*outf;
@@ -3027,6 +3028,7 @@ init_metadump_v2(void)
 {
 	struct xfs_metadump_header	xmh = {0};
 	uint32_t			compat_flags = 0;
+	uint32_t			incompat_flags = 0;
 
 	xmh.xmh_magic = cpu_to_be32(XFS_MD_MAGIC_V2);
 	xmh.xmh_version = cpu_to_be32(2);
@@ -3039,8 +3041,11 @@ init_metadump_v2(void)
 		compat_flags |= XFS_MD2_COMPAT_DIRTYLOG;
 	if (metadump.external_log)
 		compat_flags |= XFS_MD2_COMPAT_EXTERNALLOG;
+	if (metadump.realtime_data)
+		incompat_flags |= XFS_MD2_INCOMPAT_RTDEVICE;
 
 	xmh.xmh_compat_flags = cpu_to_be32(compat_flags);
+	xmh.xmh_incompat_flags = cpu_to_be32(incompat_flags);
 
 	if (fwrite(&xmh, sizeof(xmh), 1, metadump.outf) != 1) {
 		print_warning("error writing to target file");
@@ -3050,6 +3055,30 @@ init_metadump_v2(void)
 	return 0;
 }
 
+static int
+copy_rtsb(void)
+{
+	int		error;
+
+	if (metadump.show_progress)
+		print_progress("Copying realtime superblock");
+
+	push_cur();
+	error = set_rt_cur(&typtab[TYP_RTSB], XFS_RTSB_DADDR,
+			XFS_FSB_TO_BB(mp, 1), DB_RING_ADD, NULL);
+	if (error)
+		return 0;
+	if (iocur_top->data == NULL) {
+		pop_cur();
+		print_warning("cannot read realtime superblock");
+		return !metadump.stop_on_read_error;
+	}
+	error = write_buf(iocur_top);
+	pop_cur();
+
+	return error ? 0 : 1;
+}
+
 static int
 write_metadump_v2(
 	enum typnm		type,
@@ -3064,6 +3093,8 @@ write_metadump_v2(
 	if (type == TYP_LOG &&
 	    mp->m_logdev_targp->bt_bdev != mp->m_ddev_targp->bt_bdev)
 		addr |= XME_ADDR_LOG_DEVICE;
+	else if (type == TYP_RTSB)
+		addr |= XME_ADDR_RT_DEVICE;
 	else
 		addr |= XME_ADDR_DATA_DEVICE;
 
@@ -3112,6 +3143,7 @@ metadump_f(
 	metadump.zero_stale_data = true;
 	metadump.dirty_log = false;
 	metadump.external_log = false;
+	metadump.realtime_data = false;
 
 	if (mp->m_sb.sb_magicnum != XFS_SB_MAGIC) {
 		print_warning("bad superblock magic number %x, giving up",
@@ -3190,6 +3222,21 @@ metadump_f(
 		return 1;
 	}
 
+	/*
+	 * The realtime device only contains metadata if the RT superblock is
+	 * enabled.
+	 */
+	if (xfs_has_realtime(mp) && xfs_has_rtsb(mp)) {
+		if (mp->m_rtdev_targp->bt_bdev) {
+			metadump.realtime_data = true;
+			if (!version_opt_set)
+				metadump.version = 2;
+		} else if (metadump.version == 2 && !metadump.realtime_data) {
+			print_warning("realtime device not loaded, use -R");
+			return 1;
+		}
+	}
+
 	/*
 	 * If we'll copy the log, see if the log is dirty.
 	 *
@@ -3289,6 +3336,12 @@ metadump_f(
 	if (!exitcode && !(metadump.version == 1 && metadump.external_log))
 		exitcode = !copy_log();
 
+	/* copy rt superblock */
+	if (!exitcode && metadump.realtime_data && xfs_has_rtsb(mp)) {
+		if (!copy_rtsb())
+			exitcode = 1;
+	}
+
 	/* write the remaining index */
 	if (!exitcode && metadump.mdops->finish_dump)
 		exitcode = metadump.mdops->finish_dump() < 0;
diff --git a/db/xfs_metadump.sh b/db/xfs_metadump.sh
index 9e8f86e53eb45d..b5c6959f2007f8 100755
--- a/db/xfs_metadump.sh
+++ b/db/xfs_metadump.sh
@@ -6,9 +6,9 @@
 
 OPTS=" "
 DBOPTS=" "
-USAGE="Usage: xfs_metadump [-aefFogwV] [-m max_extents] [-l logdev] source target"
+USAGE="Usage: xfs_metadump [-aefFogwV] [-m max_extents] [-l logdev] [-r rtdev] [-v version] source target"
 
-while getopts "aefgl:m:owFv:V" c
+while getopts "aefFgl:m:or:wv:V" c
 do
 	case $c in
 	a)	OPTS=$OPTS"-a ";;
@@ -25,6 +25,7 @@ do
 		status=$?
 		exit $status
 		;;
+	r)	DBOPTS=$DBOPTS"-R "$OPTARG" ";;
 	\?)	echo $USAGE 1>&2
 		exit 2
 		;;
diff --git a/include/xfs_metadump.h b/include/xfs_metadump.h
index e9c3dcb8f711b9..e35f791573efc2 100644
--- a/include/xfs_metadump.h
+++ b/include/xfs_metadump.h
@@ -68,12 +68,18 @@ struct xfs_metadump_header {
 /* Dump contains external log contents. */
 #define XFS_MD2_COMPAT_EXTERNALLOG	(1 << 3)
 
+/* Dump contains realtime device contents. */
+#define XFS_MD2_INCOMPAT_RTDEVICE	(1U << 0)
+
+#define XFS_MD2_INCOMPAT_ALL		(XFS_MD2_INCOMPAT_RTDEVICE)
+
 struct xfs_meta_extent {
 	/*
 	 * Lowest 54 bits are used to store 512 byte addresses.
 	 * Next 2 bits is used for indicating the device.
 	 * 00 - Data device
 	 * 01 - External log
+	 * 10 - Realtime device
 	 */
 	__be64 xme_addr;
 	/* In units of 512 byte blocks */
@@ -88,6 +94,8 @@ struct xfs_meta_extent {
 #define XME_ADDR_DATA_DEVICE	(0ULL << XME_ADDR_DEVICE_SHIFT)
 /* Extent was copied from the log device */
 #define XME_ADDR_LOG_DEVICE	(1ULL << XME_ADDR_DEVICE_SHIFT)
+/* Extent was copied from the rt device */
+#define XME_ADDR_RT_DEVICE	(2ULL << XME_ADDR_DEVICE_SHIFT)
 
 #define XME_ADDR_DEVICE_MASK	(3ULL << XME_ADDR_DEVICE_SHIFT)
 
diff --git a/man/man8/xfs_metadump.8 b/man/man8/xfs_metadump.8
index 496b5926603f48..8618ea99b9a57e 100644
--- a/man/man8/xfs_metadump.8
+++ b/man/man8/xfs_metadump.8
@@ -12,6 +12,9 @@ .SH SYNOPSIS
 .B \-l
 .I logdev
 ] [
+.B \-r
+.I rtdev
+] [
 .B \-v
 .I version
 ]
@@ -146,6 +149,14 @@ .SH OPTIONS
 .B \-o
 Disables obfuscation of file names and extended attributes.
 .TP
+.BI \-r  " rtdev"
+For filesystems that have a realtime section, this specifies the device where
+the realtime section resides.
+If the v2 metadump format is selected, the realtime group superblocks will be
+copied to the metadump.
+The v2 metadump format will be selected automatically if the filesystem
+contains realtime groups.
+.TP
 .B \-v
 The format of the metadump file to be produced.
 Valid values are 1 and 2.
diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c
index 269edb8f89969d..c6c00270234442 100644
--- a/mdrestore/xfs_mdrestore.c
+++ b/mdrestore/xfs_mdrestore.c
@@ -314,7 +314,7 @@ read_header_v2(
 			sizeof(h->v2) - sizeof(h->v2.xmh_magic), 1, md_fp) != 1)
 		fatal("error reading from metadump file\n");
 
-	if (h->v2.xmh_incompat_flags != 0)
+	if (h->v2.xmh_incompat_flags & cpu_to_be32(~XFS_MD2_INCOMPAT_ALL))
 		fatal("Metadump header has unknown incompat flags set\n");
 
 	if (h->v2.xmh_reserved != 0)
@@ -324,6 +324,9 @@ read_header_v2(
 
 	if (!mdrestore.external_log && (compat & XFS_MD2_COMPAT_EXTERNALLOG))
 		fatal("External Log device is required\n");
+
+	if (h->v2.xmh_incompat_flags & cpu_to_be32(XFS_MD2_INCOMPAT_RTDEVICE))
+		fatal("Realtime device not yet supported\n");
 }
 
 static void


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 32/51] xfs_db: dump rt bitmap blocks
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (30 preceding siblings ...)
  2024-12-23 22:19   ` [PATCH 31/51] xfs_db: metadump realtime devices Darrick J. Wong
@ 2024-12-23 22:20   ` Darrick J. Wong
  2024-12-23 22:20   ` [PATCH 33/51] xfs_db: dump rt summary blocks Darrick J. Wong
                     ` (18 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:20 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that rtbitmap blocks have a header, make it so that xfs_db can
analyze the structure.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/field.c   |    6 ++++++
 db/field.h   |    3 +++
 db/inode.c   |   34 +++++++++++++++++++++++++---------
 db/rtgroup.c |   34 ++++++++++++++++++++++++++++++++++
 db/rtgroup.h |    3 +++
 db/type.c    |    5 +++++
 db/type.h    |    1 +
 7 files changed, 77 insertions(+), 9 deletions(-)
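
Reviewer's sketch (not part of the patch): the rtwords_count() calculation
can be checked by hand with the standalone model below.  The structure is a
hypothetical mirror of the header fields listed in rgbitmap_flds, and the
word size log of 2 assumes 32-bit bitmap words; the real definitions come
from libxfs.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_WORDLOG	2	/* assumed: log2 of a 4-byte bitmap word */

/* Hypothetical mirror of the block header shown in rgbitmap_flds. */
struct sketch_rtbuf_blkinfo {
	uint32_t	magic;
	uint32_t	crc;
	uint64_t	owner;
	uint64_t	blkno;
	uint64_t	lsn;
	uint8_t		uuid[16];
};

/*
 * Number of bitmap words that fit in one block.  With realtime groups the
 * header occupies the front of every block; without them the whole block
 * holds words.
 */
static unsigned int
sketch_words_per_block(unsigned int blocksize, int has_rtgroups)
{
	if (has_rtgroups)
		blocksize -= sizeof(struct sketch_rtbuf_blkinfo);

	return blocksize >> SKETCH_WORDLOG;
}
```

Under these assumptions a 4096-byte block holds 1024 words without a header
and 1012 words with one, which is the array length xfs_db would print for
the rtwords field.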


diff --git a/db/field.c b/db/field.c
index f70955ef57a323..ad1ccb9877aca5 100644
--- a/db/field.c
+++ b/db/field.c
@@ -406,6 +406,12 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_UUID, "uuid", fp_uuid, NULL, SI(bitsz(uuid_t)), 0, NULL, NULL },
 	{ FLDT_PARENT_REC, "parent", NULL, (char *)parent_flds,
 	  SI(bitsz(struct xfs_parent_rec)), 0, NULL, parent_flds },
+
+	{ FLDT_RTWORD, "rtword", fp_num, "%#x", SI(bitsz(xfs_rtword_t)),
+	  0, NULL, NULL },
+	{ FLDT_RGBITMAP, "rgbitmap", NULL, (char *)rgbitmap_flds, btblock_size,
+	  FTARG_SIZE, NULL, rgbitmap_flds },
+
 	{ FLDT_ZZZ, NULL }
 };
 
diff --git a/db/field.h b/db/field.h
index 8797a75f669246..aace89c90d79eb 100644
--- a/db/field.h
+++ b/db/field.h
@@ -196,6 +196,9 @@ typedef enum fldt	{
 
 	FLDT_PARENT_REC,
 
+	FLDT_RTWORD,
+	FLDT_RGBITMAP,
+
 	FLDT_ZZZ			/* mark last entry */
 } fldt_t;
 
diff --git a/db/inode.c b/db/inode.c
index 07efbb4902be08..d3207510c28265 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -705,16 +705,32 @@ inode_next_type(void)
 	case S_IFLNK:
 		return TYP_SYMLINK;
 	case S_IFREG:
-		if (iocur_top->ino == mp->m_sb.sb_rbmino)
-			return TYP_RTBITMAP;
-		else if (iocur_top->ino == mp->m_sb.sb_rsumino)
-			return TYP_RTSUMMARY;
-		else if (iocur_top->ino == mp->m_sb.sb_uquotino ||
-			 iocur_top->ino == mp->m_sb.sb_gquotino ||
-			 iocur_top->ino == mp->m_sb.sb_pquotino)
+		if (xfs_has_rtgroups(mp)) {
+			struct xfs_dinode	*dic = iocur_top->data;
+
+			switch (be16_to_cpu(dic->di_metatype)) {
+			case XFS_METAFILE_USRQUOTA:
+			case XFS_METAFILE_GRPQUOTA:
+			case XFS_METAFILE_PRJQUOTA:
+				return TYP_DQBLK;
+			case XFS_METAFILE_RTBITMAP:
+				return TYP_RGBITMAP;
+			default:
+				return TYP_DATA;
+			}
+		} else {
+			if (iocur_top->ino == mp->m_sb.sb_rbmino)
+				return TYP_RTBITMAP;
+			if (iocur_top->ino == mp->m_sb.sb_rsumino)
+				return TYP_RTSUMMARY;
+		}
+
+		if (iocur_top->ino == mp->m_sb.sb_uquotino ||
+		    iocur_top->ino == mp->m_sb.sb_gquotino ||
+		    iocur_top->ino == mp->m_sb.sb_pquotino)
 			return TYP_DQBLK;
-		else
-			return TYP_DATA;
+
+		return TYP_DATA;
 	default:
 		return TYP_NONE;
 	}
diff --git a/db/rtgroup.c b/db/rtgroup.c
index 5cda1a4f35efb6..3ef2dc8fe7f031 100644
--- a/db/rtgroup.c
+++ b/db/rtgroup.c
@@ -44,6 +44,7 @@ const field_t	rtsb_flds[] = {
 	{ "meta_uuid", FLDT_UUID, OI(OFF(meta_uuid)), C1, 0, TYP_NONE },
 	{ NULL }
 };
+#undef OFF
 
 const field_t	rtsb_hfld[] = {
 	{ "", FLDT_RTSB, OI(0), C1, 0, TYP_NONE },
@@ -98,3 +99,36 @@ rtsb_size(
 {
 	return bitize(mp->m_sb.sb_blocksize);
 }
+
+static int
+rtwords_count(
+	void			*obj,
+	int			startoff)
+{
+	unsigned int		blksz = mp->m_sb.sb_blocksize;
+
+	if (xfs_has_rtgroups(mp))
+		blksz -= sizeof(struct xfs_rtbuf_blkinfo);
+
+	return blksz >> XFS_WORDLOG;
+}
+
+#define	OFF(f)	bitize(offsetof(struct xfs_rtbuf_blkinfo, rt_ ## f))
+const field_t	rgbitmap_flds[] = {
+	{ "magicnum", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE },
+	{ "owner", FLDT_INO, OI(OFF(owner)), C1, 0, TYP_NONE },
+	{ "bno", FLDT_DFSBNO, OI(OFF(blkno)), C1, 0, TYP_BMAPBTD },
+	{ "lsn", FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE },
+	{ "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE },
+	/* the rtword array is after the actual structure */
+	{ "rtwords", FLDT_RTWORD, OI(bitize(sizeof(struct xfs_rtbuf_blkinfo))),
+	  rtwords_count, FLD_ARRAY | FLD_COUNT, TYP_DATA },
+	{ NULL }
+};
+#undef OFF
+
+const field_t	rgbitmap_hfld[] = {
+	{ "", FLDT_RGBITMAP, OI(0), C1, 0, TYP_NONE },
+	{ NULL }
+};
diff --git a/db/rtgroup.h b/db/rtgroup.h
index 85960a3fb9f5c9..06f554e1862851 100644
--- a/db/rtgroup.h
+++ b/db/rtgroup.h
@@ -9,6 +9,9 @@
 extern const struct field	rtsb_flds[];
 extern const struct field	rtsb_hfld[];
 
+extern const struct field	rgbitmap_flds[];
+extern const struct field	rgbitmap_hfld[];
+
 extern void	rtsb_init(void);
 extern int	rtsb_size(void *obj, int startoff, int idx);
 
diff --git a/db/type.c b/db/type.c
index d875c0c636553b..65e7b24146f170 100644
--- a/db/type.c
+++ b/db/type.c
@@ -67,6 +67,7 @@ static const typ_t	__typtab[] = {
 	{ TYP_TEXT, "text", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_FINOBT, "finobt", handle_struct, finobt_hfld, NULL,
 		TYP_F_NO_CRC_OFF },
+	{ TYP_RGBITMAP, NULL },
 	{ TYP_NONE, NULL }
 };
 
@@ -113,6 +114,8 @@ static const typ_t	__typtab_crc[] = {
 	{ TYP_TEXT, "text", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_FINOBT, "finobt", handle_struct, finobt_crc_hfld,
 		&xfs_finobt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
+	{ TYP_RGBITMAP, "rgbitmap", handle_struct, rgbitmap_hfld,
+		&xfs_rtbitmap_buf_ops, XFS_RTBUF_CRC_OFF },
 	{ TYP_NONE, NULL }
 };
 
@@ -159,6 +162,8 @@ static const typ_t	__typtab_spcrc[] = {
 	{ TYP_TEXT, "text", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF },
 	{ TYP_FINOBT, "finobt", handle_struct, finobt_spcrc_hfld,
 		&xfs_finobt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
+	{ TYP_RGBITMAP, "rgbitmap", handle_struct, rgbitmap_hfld,
+		&xfs_rtbitmap_buf_ops, XFS_RTBUF_CRC_OFF },
 	{ TYP_NONE, NULL }
 };
 
diff --git a/db/type.h b/db/type.h
index d4efa4b0fab541..e2148c6351d141 100644
--- a/db/type.h
+++ b/db/type.h
@@ -35,6 +35,7 @@ typedef enum typnm
 	TYP_SYMLINK,
 	TYP_TEXT,
 	TYP_FINOBT,
+	TYP_RGBITMAP,
 	TYP_NONE
 } typnm_t;
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 33/51] xfs_db: dump rt summary blocks
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (31 preceding siblings ...)
  2024-12-23 22:20   ` [PATCH 32/51] xfs_db: dump rt bitmap blocks Darrick J. Wong
@ 2024-12-23 22:20   ` Darrick J. Wong
  2024-12-23 22:20   ` [PATCH 34/51] xfs_db: report rt group and block number in the bmap command Darrick J. Wong
                     ` (17 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:20 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that rtsummary blocks have a header, make it so that xfs_db can
analyze the structure.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
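As a rough sketch of the layout being decoded here: each rtsummary block
starts with a fixed header, and 32-bit summary counters fill the rest of
the block.  The struct below is only an illustrative mirror of the fields
named in rgsummary_flds, not the on-disk definition, and
suminfo_per_block() is a hypothetical helper, not a function in xfs_db.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the header fields listed in rgsummary_flds. */
struct rtbuf_hdr_sketch {
	uint32_t	magic;
	uint32_t	crc;
	uint64_t	owner;
	uint64_t	blkno;
	uint64_t	lsn;
	uint8_t		uuid[16];
};

/* Hypothetical helper: number of 32-bit counters that fit after the header. */
static inline unsigned int
suminfo_per_block(
	unsigned int	blocksize)
{
	assert(blocksize >= sizeof(struct rtbuf_hdr_sketch));
	return (blocksize - sizeof(struct rtbuf_hdr_sketch)) /
			sizeof(uint32_t);
}
```

Under these assumptions, the "suminfo" array starts at
sizeof(struct rtbuf_hdr_sketch) bytes into the block, which matches the
offset used for that field in the table.
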
 db/field.c   |    4 ++++
 db/field.h   |    2 ++
 db/inode.c   |    2 ++
 db/rtgroup.c |   20 ++++++++++++++++++++
 db/rtgroup.h |    3 +++
 db/type.c    |    5 +++++
 db/type.h    |    1 +
 7 files changed, 37 insertions(+)


diff --git a/db/field.c b/db/field.c
index ad1ccb9877aca5..ca0fe1826f9a80 100644
--- a/db/field.c
+++ b/db/field.c
@@ -411,6 +411,10 @@ const ftattr_t	ftattrtab[] = {
 	  0, NULL, NULL },
 	{ FLDT_RGBITMAP, "rgbitmap", NULL, (char *)rgbitmap_flds, btblock_size,
 	  FTARG_SIZE, NULL, rgbitmap_flds },
+	{ FLDT_SUMINFO, "suminfo", fp_num, "%u", SI(bitsz(xfs_suminfo_t)),
+	  0, NULL, NULL },
+	{ FLDT_RGSUMMARY, "rgsummary", NULL, (char *)rgsummary_flds,
+	  btblock_size, FTARG_SIZE, NULL, rgsummary_flds },
 
 	{ FLDT_ZZZ, NULL }
 };
diff --git a/db/field.h b/db/field.h
index aace89c90d79eb..1d7465b4d3e562 100644
--- a/db/field.h
+++ b/db/field.h
@@ -198,6 +198,8 @@ typedef enum fldt	{
 
 	FLDT_RTWORD,
 	FLDT_RGBITMAP,
+	FLDT_SUMINFO,
+	FLDT_RGSUMMARY,
 
 	FLDT_ZZZ			/* mark last entry */
 } fldt_t;
diff --git a/db/inode.c b/db/inode.c
index d3207510c28265..0a80b8d063603f 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -715,6 +715,8 @@ inode_next_type(void)
 				return TYP_DQBLK;
 			case XFS_METAFILE_RTBITMAP:
 				return TYP_RGBITMAP;
+			case XFS_METAFILE_RTSUMMARY:
+				return TYP_RGSUMMARY;
 			default:
 				return TYP_DATA;
 			}
diff --git a/db/rtgroup.c b/db/rtgroup.c
index 3ef2dc8fe7f031..c6b96c9dc79daa 100644
--- a/db/rtgroup.c
+++ b/db/rtgroup.c
@@ -132,3 +132,23 @@ const field_t	rgbitmap_hfld[] = {
 	{ "", FLDT_RGBITMAP, OI(0), C1, 0, TYP_NONE },
 	{ NULL }
 };
+
+#define	OFF(f)	bitize(offsetof(struct xfs_rtbuf_blkinfo, rt_ ## f))
+const field_t	rgsummary_flds[] = {
+	{ "magicnum", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE },
+	{ "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE },
+	{ "owner", FLDT_INO, OI(OFF(owner)), C1, 0, TYP_NONE },
+	{ "bno", FLDT_DFSBNO, OI(OFF(blkno)), C1, 0, TYP_BMAPBTD },
+	{ "lsn", FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE },
+	{ "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE },
+	/* the suminfo array is after the actual structure */
+	{ "suminfo", FLDT_SUMINFO, OI(bitize(sizeof(struct xfs_rtbuf_blkinfo))),
+	  rtwords_count, FLD_ARRAY | FLD_COUNT, TYP_DATA },
+	{ NULL }
+};
+#undef OFF
+
+const field_t	rgsummary_hfld[] = {
+	{ "", FLDT_RGSUMMARY, OI(0), C1, 0, TYP_NONE },
+	{ NULL }
+};
diff --git a/db/rtgroup.h b/db/rtgroup.h
index 06f554e1862851..5b120f2c9a29f3 100644
--- a/db/rtgroup.h
+++ b/db/rtgroup.h
@@ -12,6 +12,9 @@ extern const struct field	rtsb_hfld[];
 extern const struct field	rgbitmap_flds[];
 extern const struct field	rgbitmap_hfld[];
 
+extern const struct field	rgsummary_flds[];
+extern const struct field	rgsummary_hfld[];
+
 extern void	rtsb_init(void);
 extern int	rtsb_size(void *obj, int startoff, int idx);
 
diff --git a/db/type.c b/db/type.c
index 65e7b24146f170..2091b4ac8b139b 100644
--- a/db/type.c
+++ b/db/type.c
@@ -68,6 +68,7 @@ static const typ_t	__typtab[] = {
 	{ TYP_FINOBT, "finobt", handle_struct, finobt_hfld, NULL,
 		TYP_F_NO_CRC_OFF },
 	{ TYP_RGBITMAP, NULL },
+	{ TYP_RGSUMMARY, NULL },
 	{ TYP_NONE, NULL }
 };
 
@@ -116,6 +117,8 @@ static const typ_t	__typtab_crc[] = {
 		&xfs_finobt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_RGBITMAP, "rgbitmap", handle_struct, rgbitmap_hfld,
 		&xfs_rtbitmap_buf_ops, XFS_RTBUF_CRC_OFF },
+	{ TYP_RGSUMMARY, "rgsummary", handle_struct, rgsummary_hfld,
+		&xfs_rtsummary_buf_ops, XFS_RTBUF_CRC_OFF },
 	{ TYP_NONE, NULL }
 };
 
@@ -164,6 +167,8 @@ static const typ_t	__typtab_spcrc[] = {
 		&xfs_finobt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF },
 	{ TYP_RGBITMAP, "rgbitmap", handle_struct, rgbitmap_hfld,
 		&xfs_rtbitmap_buf_ops, XFS_RTBUF_CRC_OFF },
+	{ TYP_RGSUMMARY, "rgsummary", handle_struct, rgsummary_hfld,
+		&xfs_rtsummary_buf_ops, XFS_RTBUF_CRC_OFF },
 	{ TYP_NONE, NULL }
 };
 
diff --git a/db/type.h b/db/type.h
index e2148c6351d141..e7f0ecc17680bf 100644
--- a/db/type.h
+++ b/db/type.h
@@ -36,6 +36,7 @@ typedef enum typnm
 	TYP_TEXT,
 	TYP_FINOBT,
 	TYP_RGBITMAP,
+	TYP_RGSUMMARY,
 	TYP_NONE
 } typnm_t;
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 34/51] xfs_db: report rt group and block number in the bmap command
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (32 preceding siblings ...)
  2024-12-23 22:20   ` [PATCH 33/51] xfs_db: dump rt summary blocks Darrick J. Wong
@ 2024-12-23 22:20   ` Darrick J. Wong
  2024-12-23 22:20   ` [PATCH 35/51] xfs_io: support scrubbing rtgroup metadata Darrick J. Wong
                     ` (16 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:20 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The bmap command does not report startblocks for realtime files
correctly.  If rtgroups are enabled, we need to use the appropriate
functions to crack the startblock into rtgroup and block numbers; if
not, then we need to report a linear address and not try to report a
group number.

Fix both of these issues.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
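A minimal sketch of what "cracking" a startblock means, assuming a
segmented address in which the group number occupies the high bits and
the block-within-group the low bits.  The structure and the sketch_*
names are hypothetical; the real xfs_fsb_to_gno() and xfs_fsb_to_gbno()
helpers in libxfs are authoritative and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-group-type geometry; not the libxfs structure. */
struct group_geom_sketch {
	uint8_t		blklog;		/* log2 of address space per group */
	uint64_t	blkmask;	/* (1ULL << blklog) - 1 */
};

/* Group number: the bits above blklog.  No groups means group 0. */
static inline uint32_t
sketch_fsb_to_gno(
	const struct group_geom_sketch	*g,
	uint64_t			fsbno)
{
	assert(g->blklog < 64);
	if (!g->blklog)
		return 0;
	return (uint32_t)(fsbno >> g->blklog);
}

/* Block number within the group: the bits below blklog. */
static inline uint64_t
sketch_fsb_to_gbno(
	const struct group_geom_sketch	*g,
	uint64_t			fsbno)
{
	return fsbno & g->blkmask;
}
```

Without rtgroups there is no group component to extract, which is why
the linear form prints only the startblock.
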
 db/bmap.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 49 insertions(+), 7 deletions(-)


diff --git a/db/bmap.c b/db/bmap.c
index 7915772aaee4e0..1c5694c3f7d281 100644
--- a/db/bmap.c
+++ b/db/bmap.c
@@ -119,6 +119,41 @@ bmap(
 	*nexp = n;
 }
 
+static void
+print_group_bmbt(
+	bool			isrt,
+	int			whichfork,
+	const struct bmap_ext	*be)
+{
+	unsigned int		gno;
+	unsigned long long	gbno;
+
+	if (whichfork == XFS_DATA_FORK && isrt) {
+		gno = xfs_fsb_to_gno(mp, be->startblock, XG_TYPE_RTG);
+		gbno = xfs_fsb_to_gbno(mp, be->startblock, XG_TYPE_RTG);
+	} else {
+		gno = xfs_fsb_to_gno(mp, be->startblock, XG_TYPE_AG);
+		gbno = xfs_fsb_to_gbno(mp, be->startblock, XG_TYPE_AG);
+	}
+
+	dbprintf(
+ _("%s offset %lld startblock %llu (%u/%llu) count %llu flag %u\n"),
+			whichfork == XFS_DATA_FORK ? _("data") : _("attr"),
+			be->startoff, be->startblock,
+			gno, gbno,
+			be->blockcount, be->flag);
+}
+
+static void
+print_linear_bmbt(
+	const struct bmap_ext	*be)
+{
+	dbprintf(_("%s offset %lld startblock %llu count %llu flag %u\n"),
+			_("data"),
+			be->startoff, be->startblock,
+			be->blockcount, be->flag);
+}
+
 static int
 bmap_f(
 	int			argc,
@@ -135,6 +170,7 @@ bmap_f(
 	xfs_extnum_t		nex;
 	char			*p;
 	int			whichfork;
+	bool			isrt;
 
 	if (iocur_top->ino == NULLFSINO) {
 		dbprintf(_("no current inode\n"));
@@ -154,6 +190,10 @@ bmap_f(
 			return 0;
 		}
 	}
+
+	dip = iocur_top->data;
+	isrt = (dip->di_flags & cpu_to_be16(XFS_DIFLAG_REALTIME));
+
 	if (afork + dfork == 0) {
 		push_cur();
 		set_cur_inode(iocur_top->ino);
@@ -198,13 +238,15 @@ bmap_f(
 			bmap(co, eo - co + 1, whichfork, &nex, &be);
 			if (nex == 0)
 				break;
-			dbprintf(_("%s offset %lld startblock %llu (%u/%u) count "
-				 "%llu flag %u\n"),
-				whichfork == XFS_DATA_FORK ? _("data") : _("attr"),
-				be.startoff, be.startblock,
-				XFS_FSB_TO_AGNO(mp, be.startblock),
-				XFS_FSB_TO_AGBNO(mp, be.startblock),
-				be.blockcount, be.flag);
+
+			if (whichfork == XFS_DATA_FORK && isrt) {
+				if (xfs_has_rtgroups(mp))
+					print_group_bmbt(isrt, whichfork, &be);
+				else
+					print_linear_bmbt(&be);
+			} else {
+				print_group_bmbt(isrt, whichfork, &be);
+			}
 			co = be.startoff + be.blockcount;
 		}
 		co = cosave;


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 35/51] xfs_io: support scrubbing rtgroup metadata
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (33 preceding siblings ...)
  2024-12-23 22:20   ` [PATCH 34/51] xfs_db: report rt group and block number in the bmap command Darrick J. Wong
@ 2024-12-23 22:20   ` Darrick J. Wong
  2024-12-23 22:21   ` [PATCH 36/51] xfs_io: support scrubbing rtgroup metadata paths Darrick J. Wong
                     ` (15 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:20 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Support scrubbing all rtgroup metadata with a scrubv call.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
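The number parsing in parse_rtgroup() only checks the end pointer, so an
empty string parses as 0 and values wider than 32 bits are truncated on
assignment.  A stricter variant might look like the sketch below;
parse_u32_strict() is a hypothetical helper and not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a group number, rejecting anything odd. */
static bool
parse_u32_strict(
	const char	*s,
	uint32_t	*out)
{
	char		*end;
	unsigned long	val;

	assert(s != NULL && out != NULL);

	/* strtoul silently negates "-1", so refuse minus signs up front. */
	if (strchr(s, '-'))
		return false;

	errno = 0;
	val = strtoul(s, &end, 0);
	if (errno || end == s || *end != '\0')
		return false;
	if (val > UINT32_MAX)
		return false;

	*out = (uint32_t)val;
	return true;
}
```

Base 0 is kept so that hexadecimal and octal group numbers still work,
as they do with the strtoul() call in the patch.
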
 io/scrub.c        |   40 ++++++++++++++++++++++++++++++++++++++++
 man/man8/xfs_io.8 |    3 ++-
 2 files changed, 42 insertions(+), 1 deletion(-)


diff --git a/io/scrub.c b/io/scrub.c
index 45229a8ae81099..99c24d9550243c 100644
--- a/io/scrub.c
+++ b/io/scrub.c
@@ -165,6 +165,32 @@ parse_metapath(
 	return true;
 }
 
+static bool
+parse_rtgroup(
+	int		argc,
+	char		**argv,
+	int		optind,
+	__u32		*rgno)
+{
+	char		*p;
+	unsigned long	control;
+
+	if (optind != argc - 1) {
+		fprintf(stderr, _("Must specify one rtgroup number.\n"));
+		return false;
+	}
+
+	control = strtoul(argv[optind], &p, 0);
+	if (*p != '\0') {
+		fprintf(stderr, _("Bad rtgroup number '%s'.\n"),
+				argv[optind]);
+		return false;
+	}
+
+	*rgno = control;
+	return true;
+}
+
 static int
 parse_args(
 	int				argc,
@@ -230,6 +256,12 @@ parse_args(
 			return command_usage(cmdinfo);
 		}
 		break;
+	case XFROG_SCRUB_GROUP_RTGROUP:
+		if (!parse_rtgroup(argc, argv, optind, &meta->sm_agno)) {
+			exitcode = 1;
+			return command_usage(cmdinfo);
+		}
+		break;
 	case XFROG_SCRUB_GROUP_FS:
 	case XFROG_SCRUB_GROUP_NONE:
 	case XFROG_SCRUB_GROUP_SUMMARY:
@@ -539,6 +571,8 @@ scrubv_f(
 		group = XFROG_SCRUB_GROUP_ISCAN;
 	else if (!strcmp(argv[optind], "summary"))
 		group = XFROG_SCRUB_GROUP_SUMMARY;
+	else if (!strcmp(argv[optind], "rtgroup"))
+		group = XFROG_SCRUB_GROUP_RTGROUP;
 	else {
 		printf(_("Unknown group '%s'.\n"), argv[optind]);
 		exitcode = 1;
@@ -576,6 +610,12 @@ scrubv_f(
 			return command_usage(&scrubv_cmd);
 		}
 		break;
+	case XFROG_SCRUB_GROUP_RTGROUP:
+		if (!parse_rtgroup(argc, argv, optind, &scrubv.head.svh_agno)) {
+			exitcode = 1;
+			return command_usage(&scrubv_cmd);
+		}
+		break;
 	default:
 		ASSERT(0);
 		break;
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index c73fee7c2780c6..6775b0a273e5aa 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1425,11 +1425,12 @@ .SH FILESYSTEM COMMANDS
 .RE
 .PD
 .TP
-.BI "scrub " type " [ " agnumber " | " "ino" " " "gen" " | " metapath " ]"
+.BI "scrub " type " [ " agnumber " | " rgnumber " | " "ino" " " "gen" " | " metapath " ]"
 Scrub internal XFS filesystem metadata.  The
 .BI type
 parameter specifies which type of metadata to scrub.
 For AG metadata, one AG number must be specified.
+For realtime group metadata, one rtgroup number must be specified.
 For file metadata, the scrub is applied to the open file unless the
 inode number and generation number are specified.
 For metapath, the name of a file or a raw number must be specified.


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 36/51] xfs_io: support scrubbing rtgroup metadata paths
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (34 preceding siblings ...)
  2024-12-23 22:20   ` [PATCH 35/51] xfs_io: support scrubbing rtgroup metadata Darrick J. Wong
@ 2024-12-23 22:21   ` Darrick J. Wong
  2024-12-23 22:21   ` [PATCH 37/51] xfs_io: add a command to display allocation group information Darrick J. Wong
                     ` (14 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:21 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Support scrubbing the metadata directory path of an rtgroup metadata
file.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
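The argument rule this patch adds to parse_metapath() can be summarized
as: a per-rtgroup metapath needs exactly one extra argument (the group
number), and any other metapath must not have one.  A sketch of just
that rule, using a hypothetical enum and function name:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for this sketch. */
enum metapath_args {
	ARGS_OK,
	ARGS_MISSING_GROUP,	/* per-rtgroup path without a group number */
	ARGS_EXTRA,		/* group number given where none is taken */
};

/* nr_args counts the metapath name plus the optional group number. */
static enum metapath_args
check_metapath_args(
	bool	per_rtgroup,
	int	nr_args)
{
	assert(nr_args == 1 || nr_args == 2);

	if (per_rtgroup && nr_args == 1)
		return ARGS_MISSING_GROUP;
	if (!per_rtgroup && nr_args == 2)
		return ARGS_EXTRA;
	return ARGS_OK;
}
```

In the patch the same two cases correspond to optind == argc - 1 and
optind == argc - 2.
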
 io/scrub.c        |   41 +++++++++++++++++++++++++++++++++++------
 man/man8/xfs_io.8 |    3 ++-
 2 files changed, 37 insertions(+), 7 deletions(-)


diff --git a/io/scrub.c b/io/scrub.c
index 99c24d9550243c..a137f402b94d48 100644
--- a/io/scrub.c
+++ b/io/scrub.c
@@ -136,21 +136,23 @@ parse_metapath(
 	int		argc,
 	char		**argv,
 	int		optind,
-	__u64		*ino)
+	__u64		*ino,
+	__u32		*group)
 {
 	char		*p;
 	unsigned long long control;
+	unsigned long	control2 = 0;
 	int		i;
 
-	if (optind != argc - 1) {
+	if (optind != argc - 1 && optind != argc - 2) {
 		fprintf(stderr, _("Must specify metapath number.\n"));
 		return false;
 	}
 
 	for (i = 0; i < XFS_SCRUB_METAPATH_NR; i++) {
 		if (!strcmp(argv[optind], xfrog_metapaths[i].name)) {
-			*ino = i;
-			return true;
+			control = i;
+			goto find_group;
 		}
 	}
 
@@ -161,7 +163,32 @@ parse_metapath(
 		return false;
 	}
 
+find_group:
+	if (xfrog_metapaths[control].group == XFROG_SCRUB_GROUP_RTGROUP) {
+		if (optind == argc - 1) {
+			fprintf(stderr,
+_("%s: Metapath requires a group number.\n"),
+					xfrog_metapaths[control].name);
+			return false;
+		}
+		control2 = strtoul(argv[optind + 1], &p, 0);
+		if (*p != '\0') {
+			fprintf(stderr,
+ _("Bad group number '%s'.\n"),
+				argv[optind + 1]);
+			return false;
+		}
+	} else {
+		if (optind == argc - 2) {
+			fprintf(stderr,
+_("%s: Metapath does not take a second argument.\n"),
+					xfrog_metapaths[control].name);
+			return false;
+		}
+	}
+
 	*ino = control;
+	*group = control2;
 	return true;
 }
 
@@ -237,7 +264,8 @@ parse_args(
 
 	switch (d->group) {
 	case XFROG_SCRUB_GROUP_METAPATH:
-		if (!parse_metapath(argc, argv, optind, &meta->sm_ino)) {
+		if (!parse_metapath(argc, argv, optind, &meta->sm_ino,
+							&meta->sm_agno)) {
 			exitcode = 1;
 			return command_usage(cmdinfo);
 		}
@@ -582,7 +610,8 @@ scrubv_f(
 
 	switch (group) {
 	case XFROG_SCRUB_GROUP_METAPATH:
-		if (!parse_metapath(argc, argv, optind, &scrubv.head.svh_ino)) {
+		if (!parse_metapath(argc, argv, optind, &scrubv.head.svh_ino,
+				    &scrubv.head.svh_agno)) {
 			exitcode = 1;
 			return command_usage(&scrubv_cmd);
 		}
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 6775b0a273e5aa..4673b071901c28 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1425,7 +1425,7 @@ .SH FILESYSTEM COMMANDS
 .RE
 .PD
 .TP
-.BI "scrub " type " [ " agnumber " | " rgnumber " | " "ino" " " "gen" " | " metapath " ]"
+.BI "scrub " type " [ " agnumber " | " rgnumber " | " "ino" " " "gen" " | " metapath " [ " rgnumber " ] ]"
 Scrub internal XFS filesystem metadata.  The
 .BI type
 parameter specifies which type of metadata to scrub.
@@ -1434,6 +1434,7 @@ .SH FILESYSTEM COMMANDS
 For file metadata, the scrub is applied to the open file unless the
 inode number and generation number are specified.
 For metapath, the name of a file or a raw number must be specified.
+If the metapath file is a per-rtgroup file, the group number must be specified.
 .RE
 .PD
 .TP


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 37/51] xfs_io: add a command to display allocation group information
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (35 preceding siblings ...)
  2024-12-23 22:21   ` [PATCH 36/51] xfs_io: support scrubbing rtgroup metadata paths Darrick J. Wong
@ 2024-12-23 22:21   ` Darrick J. Wong
  2024-12-23 22:21   ` [PATCH 38/51] xfs_io: add a command to display realtime " Darrick J. Wong
                     ` (13 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:21 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a new 'aginfo' command to xfs_io so that we can display allocation
group geometry.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 io/Makefile       |    1 
 io/aginfo.c       |  119 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 io/init.c         |    1 
 io/io.h           |    1 
 man/man8/xfs_io.8 |   12 +++++
 5 files changed, 134 insertions(+)
 create mode 100644 io/aginfo.c


diff --git a/io/Makefile b/io/Makefile
index c33d57f5e10b8f..8f835ec71fd768 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -9,6 +9,7 @@ LTCOMMAND = xfs_io
 LSRCFILES = xfs_bmap.sh xfs_freeze.sh xfs_mkfile.sh xfs_property
 HFILES = init.h io.h
 CFILES = \
+	aginfo.c \
 	attr.c \
 	bmap.c \
 	bulkstat.c \
diff --git a/io/aginfo.c b/io/aginfo.c
new file mode 100644
index 00000000000000..6cbfcb8de35523
--- /dev/null
+++ b/io/aginfo.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2021-2024 Oracle.  All rights reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "platform_defs.h"
+#include "libxfs.h"
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "io.h"
+#include "libfrog/logging.h"
+#include "libfrog/paths.h"
+#include "libfrog/fsgeom.h"
+
+static cmdinfo_t aginfo_cmd;
+
+static int
+report_aginfo(
+	struct xfs_fd		*xfd,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_ag_geometry	ageo = { 0 };
+	int			ret;
+
+	ret = -xfrog_ag_geometry(xfd->fd, agno, &ageo);
+	if (ret) {
+		xfrog_perror(ret, "aginfo");
+		return 1;
+	}
+
+	printf(_("AG: %u\n"),		ageo.ag_number);
+	printf(_("Blocks: %u\n"),	ageo.ag_length);
+	printf(_("Free Blocks: %u\n"),	ageo.ag_freeblks);
+	printf(_("Inodes: %u\n"),	ageo.ag_icount);
+	printf(_("Free Inodes: %u\n"),	ageo.ag_ifree);
+	printf(_("Sick: 0x%x\n"),	ageo.ag_sick);
+	printf(_("Checked: 0x%x\n"),	ageo.ag_checked);
+	printf(_("Flags: 0x%x\n"),	ageo.ag_flags);
+
+	return 0;
+}
+
+/* Display AG status. */
+static int
+aginfo_f(
+	int			argc,
+	char			**argv)
+{
+	struct xfs_fd		xfd = XFS_FD_INIT(file->fd);
+	unsigned long long	x;
+	xfs_agnumber_t		agno = NULLAGNUMBER;
+	int			c;
+	int			ret = 0;
+
+	ret = -xfd_prepare_geometry(&xfd);
+	if (ret) {
+		xfrog_perror(ret, "xfd_prepare_geometry");
+		exitcode = 1;
+		return 1;
+	}
+
+	while ((c = getopt(argc, argv, "a:")) != EOF) {
+		switch (c) {
+		case 'a':
+			errno = 0;
+			x = strtoll(optarg, NULL, 10);
+			if (!errno && x >= NULLAGNUMBER)
+				errno = ERANGE;
+			if (errno) {
+				perror("aginfo");
+				return 1;
+			}
+			agno = x;
+			break;
+		default:
+			return command_usage(&aginfo_cmd);
+		}
+	}
+
+	if (agno != NULLAGNUMBER) {
+		ret = report_aginfo(&xfd, agno);
+	} else {
+		for (agno = 0; !ret && agno < xfd.fsgeom.agcount; agno++) {
+			ret = report_aginfo(&xfd, agno);
+		}
+	}
+
+	return ret;
+}
+
+static void
+aginfo_help(void)
+{
+	printf(_(
+"\n"
+"Report allocation group geometry.\n"
+"\n"
+" -a agno  -- Report on the given allocation group.\n"
+"\n"));
+
+}
+
+static cmdinfo_t aginfo_cmd = {
+	.name = "aginfo",
+	.cfunc = aginfo_f,
+	.argmin = 0,
+	.argmax = -1,
+	.args = "[-a agno]",
+	.flags = CMD_NOMAP_OK,
+	.help = aginfo_help,
+};
+
+void
+aginfo_init(void)
+{
+	aginfo_cmd.oneline = _("Get XFS allocation group state.");
+	add_command(&aginfo_cmd);
+}
diff --git a/io/init.c b/io/init.c
index 5727f73515a6a2..4831deae1b2683 100644
--- a/io/init.c
+++ b/io/init.c
@@ -44,6 +44,7 @@ init_cvtnum(
 static void
 init_commands(void)
 {
+	aginfo_init();
 	attr_init();
 	bmap_init();
 	bulkstat_init();
diff --git a/io/io.h b/io/io.h
index 4daedac06419ae..d99065582057de 100644
--- a/io/io.h
+++ b/io/io.h
@@ -155,3 +155,4 @@ extern void		crc32cselftest_init(void);
 extern void		bulkstat_init(void);
 void			exchangerange_init(void);
 void			fsprops_init(void);
+void			aginfo_init(void);
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 4673b071901c28..31c81efed8f99b 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1242,6 +1242,18 @@ .SH MEMORY MAPPED I/O COMMANDS
 for the current memory mapping.
 
 .SH FILESYSTEM COMMANDS
+.TP
+.BI "aginfo [ \-a " agno " ]"
+Show information about allocation groups.
+.RE
+.RS 1.0i
+.PD 0
+.TP
+.BI \-a
+Act only on a specific allocation group.
+.PD
+.RE
+
 .TP
 .BI "bulkstat [ \-a " agno " ] [ \-d ] [ \-e " endino " ] [ \-m ] [ \-n " batchsize " ] [ \-q ] [ \-s " startino " ] [ \-v " version" ]
 Display raw stat information about a bunch of inodes in an XFS filesystem.


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 38/51] xfs_io: add a command to display realtime group information
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (36 preceding siblings ...)
  2024-12-23 22:21   ` [PATCH 37/51] xfs_io: add a command to display allocation group information Darrick J. Wong
@ 2024-12-23 22:21   ` Darrick J. Wong
  2024-12-23 22:21   ` [PATCH 39/51] xfs_io: display rt group in verbose bmap output Darrick J. Wong
                     ` (12 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:21 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a new 'rginfo' command to xfs_io so that we can display realtime
group geometry.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 io/aginfo.c       |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 libfrog/fsgeom.c  |   18 ++++++++++
 libfrog/fsgeom.h  |    4 ++
 man/man8/xfs_io.8 |   13 +++++++
 4 files changed, 131 insertions(+)


diff --git a/io/aginfo.c b/io/aginfo.c
index 6cbfcb8de35523..f81986f0df4df3 100644
--- a/io/aginfo.c
+++ b/io/aginfo.c
@@ -14,6 +14,7 @@
 #include "libfrog/fsgeom.h"
 
 static cmdinfo_t aginfo_cmd;
+static cmdinfo_t rginfo_cmd;
 
 static int
 report_aginfo(
@@ -111,9 +112,104 @@ static cmdinfo_t aginfo_cmd = {
 	.help = aginfo_help,
 };
 
+static int
+report_rginfo(
+	struct xfs_fd		*xfd,
+	xfs_rgnumber_t		rgno)
+{
+	struct xfs_rtgroup_geometry	rgeo = { 0 };
+	int			ret;
+
+	ret = -xfrog_rtgroup_geometry(xfd->fd, rgno, &rgeo);
+	if (ret) {
+		xfrog_perror(ret, "rginfo");
+		return 1;
+	}
+
+	printf(_("RTG: %u\n"),		rgeo.rg_number);
+	printf(_("Length: %u\n"),	rgeo.rg_length);
+	printf(_("Sick: 0x%x\n"),	rgeo.rg_sick);
+	printf(_("Checked: 0x%x\n"),	rgeo.rg_checked);
+	printf(_("Flags: 0x%x\n"),	rgeo.rg_flags);
+
+	return 0;
+}
+
+/* Display rtgroup status. */
+static int
+rginfo_f(
+	int			argc,
+	char			**argv)
+{
+	struct xfs_fd		xfd = XFS_FD_INIT(file->fd);
+	unsigned long long	x;
+	xfs_rgnumber_t		rgno = NULLRGNUMBER;
+	int			c;
+	int			ret = 0;
+
+	ret = -xfd_prepare_geometry(&xfd);
+	if (ret) {
+		xfrog_perror(ret, "xfd_prepare_geometry");
+		exitcode = 1;
+		return 1;
+	}
+
+	while ((c = getopt(argc, argv, "r:")) != EOF) {
+		switch (c) {
+		case 'r':
+			errno = 0;
+			x = strtoll(optarg, NULL, 10);
+			if (!errno && x >= NULLRGNUMBER)
+				errno = ERANGE;
+			if (errno) {
+				perror("rginfo");
+				return 1;
+			}
+			rgno = x;
+			break;
+		default:
+			return command_usage(&rginfo_cmd);
+		}
+	}
+
+	if (rgno != NULLRGNUMBER) {
+		ret = report_rginfo(&xfd, rgno);
+	} else {
+		for (rgno = 0; !ret && rgno < xfd.fsgeom.rgcount; rgno++) {
+			ret = report_rginfo(&xfd, rgno);
+		}
+	}
+
+	return ret;
+}
+
+static void
+rginfo_help(void)
+{
+	printf(_(
+"\n"
+"Report realtime group geometry.\n"
+"\n"
+" -r rgno  -- Report on the given realtime group.\n"
+"\n"));
+
+}
+
+static cmdinfo_t rginfo_cmd = {
+	.name = "rginfo",
+	.cfunc = rginfo_f,
+	.argmin = 0,
+	.argmax = -1,
+	.args = "[-r rgno]",
+	.flags = CMD_NOMAP_OK,
+	.help = rginfo_help,
+};
+
 void
 aginfo_init(void)
 {
 	aginfo_cmd.oneline = _("Get XFS allocation group state.");
 	add_command(&aginfo_cmd);
+	rginfo_cmd.oneline = _("Get XFS realtime group state.");
+	add_command(&rginfo_cmd);
 }
diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index 9c1e9a90eb1f1b..b5220d2d6ffd22 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -214,3 +214,21 @@ xfrog_ag_geometry(
 		return -errno;
 	return 0;
 }
+
+/*
+ * Try to obtain a rt group's geometry.  Returns zero or a negative error code.
+ */
+int
+xfrog_rtgroup_geometry(
+	int			fd,
+	unsigned int		rgno,
+	struct xfs_rtgroup_geometry	*rgeo)
+{
+	int			ret;
+
+	rgeo->rg_number = rgno;
+	ret = ioctl(fd, XFS_IOC_RTGROUP_GEOMETRY, rgeo);
+	if (ret)
+		return -errno;
+	return 0;
+}
diff --git a/libfrog/fsgeom.h b/libfrog/fsgeom.h
index df2ca2a408e78a..c571ddbcfb9b70 100644
--- a/libfrog/fsgeom.h
+++ b/libfrog/fsgeom.h
@@ -5,10 +5,14 @@
 #ifndef __LIBFROG_FSGEOM_H__
 #define __LIBFROG_FSGEOM_H__
 
+struct xfs_rtgroup_geometry;
+
 void xfs_report_geom(struct xfs_fsop_geom *geo, const char *mntpoint,
 		const char *logname, const char *rtname);
 int xfrog_geometry(int fd, struct xfs_fsop_geom *fsgeo);
 int xfrog_ag_geometry(int fd, unsigned int agno, struct xfs_ag_geometry *ageo);
+int xfrog_rtgroup_geometry(int fd, unsigned int rgno,
+		struct xfs_rtgroup_geometry *rgeo);
 
 /*
  * Structure for recording whatever observations we want about the level of
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 31c81efed8f99b..59d5ddc54dcc66 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1328,6 +1328,19 @@ .SH FILESYSTEM COMMANDS
 .I tag
 argument, displays the list of error tags available.
 Only available in expert mode and requires privileges.
+
+.TP
+.BI "rginfo [ \-r " rgno " ]"
+Show information about realtime groups.
+.RE
+.RS 1.0i
+.PD 0
+.TP
+.BI \-r
+Act only on a specific realtime group.
+.PD
+.RE
+
 .TP
 .BI "resblks [ " blocks " ]"
 Get and/or set count of reserved filesystem blocks using the


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 39/51] xfs_io: display rt group in verbose bmap output
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (37 preceding siblings ...)
  2024-12-23 22:21   ` [PATCH 38/51] xfs_io: add a command to display realtime " Darrick J. Wong
@ 2024-12-23 22:21   ` Darrick J. Wong
  2024-12-23 22:22   ` [PATCH 40/51] xfs_io: display rt group in verbose fsmap output Darrick J. Wong
                     ` (11 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:21 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Display the rt group number in the bmap -v output, just like we do for
regular data files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
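The arithmetic behind the new columns, as a sketch: compute the size of
an rtgroup in bytes, convert it to 512-byte basic blocks, then divide
the extent's block address by that to get the group number and offset.
The struct is a hypothetical subset of the geometry fields that
bytes_per_rtgroup() reads, and the sketch_* names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BBSIZE	512	/* basic block size, in bytes */

/* Hypothetical subset of the geometry fields used below. */
struct geom_sketch {
	uint32_t	rgcount;
	uint32_t	rgextents;
	uint32_t	rtextsize;
	uint32_t	blocksize;
};

/* Same formula as bytes_per_rtgroup(): zero means "no rtgroups". */
static inline uint64_t
sketch_bytes_per_rtgroup(
	const struct geom_sketch	*g)
{
	if (!g->rgcount)
		return 0;
	return (uint64_t)g->rgextents * g->rtextsize * g->blocksize;
}

/* Split a basic-block address into group number and offset in group. */
static inline int
sketch_bb_to_group(
	const struct geom_sketch	*g,
	uint64_t			bb,
	uint64_t			*gno,
	uint64_t			*goff)
{
	uint64_t	bbpergroup;

	bbpergroup = sketch_bytes_per_rtgroup(g) / SKETCH_BBSIZE;
	if (!bbpergroup)
		return -1;	/* no groups, nothing to report */

	*gno = bb / bbpergroup;
	*goff = bb % bbpergroup;
	return 0;
}
```

The zero return from the size helper is what lets the callers test
bbperag > 0 instead of !is_rt when deciding whether to print the group
columns.
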
 io/bmap.c        |   27 ++++++++++++++++++---------
 libfrog/fsgeom.h |   12 ++++++++++++
 2 files changed, 30 insertions(+), 9 deletions(-)


diff --git a/io/bmap.c b/io/bmap.c
index 6182e1c591da18..b2f6b490528513 100644
--- a/io/bmap.c
+++ b/io/bmap.c
@@ -257,16 +257,18 @@ bmap_f(
 #define	FLG_BSW		0000010	/* Not on begin of stripe width */
 #define	FLG_ESW		0000001	/* Not on end   of stripe width */
 		int	agno;
-		off_t agoff, bbperag;
+		off_t	agoff, bbperag;
 		int	foff_w, boff_w, aoff_w, tot_w, agno_w;
 		char	rbuf[32], bbuf[32], abuf[32];
 		int	sunit, swidth;
 
 		foff_w = boff_w = aoff_w = MINRANGE_WIDTH;
 		tot_w = MINTOT_WIDTH;
-		if (is_rt)
-			sunit = swidth = bbperag = 0;
-		else {
+		if (is_rt) {
+			bbperag = bytes_per_rtgroup(&fsgeo) / BBSIZE;
+			sunit = 0;
+			swidth = 0;
+		} else {
 			bbperag = (off_t)fsgeo.agblocks *
 				  (off_t)fsgeo.blocksize / BBSIZE;
 			sunit = (fsgeo.sunit * fsgeo.blocksize) / BBSIZE;
@@ -295,7 +297,7 @@ bmap_f(
 					(long long)(map[i + 1].bmv_block +
 						map[i + 1].bmv_length - 1LL));
 				boff_w = max(boff_w, strlen(bbuf));
-				if (!is_rt) {
+				if (bbperag > 0) {
 					agno = map[i + 1].bmv_block / bbperag;
 					agoff = map[i + 1].bmv_block -
 							(agno * bbperag);
@@ -312,13 +314,20 @@ bmap_f(
 					numlen(map[i+1].bmv_length, 10));
 			}
 		}
-		agno_w = is_rt ? 0 : max(MINAG_WIDTH, numlen(fsgeo.agcount, 10));
+		if (is_rt) {
+			if (fsgeo.rgcount > 0)
+				agno_w = max(MINAG_WIDTH, numlen(fsgeo.rgcount, 10));
+			else
+				agno_w = 0;
+		} else {
+			agno_w = max(MINAG_WIDTH, numlen(fsgeo.agcount, 10));
+		}
 		printf("%4s: %-*s %-*s %*s %-*s %*s%s\n",
 			_("EXT"),
 			foff_w, _("FILE-OFFSET"),
 			boff_w, is_rt ? _("RT-BLOCK-RANGE") : _("BLOCK-RANGE"),
-			agno_w, is_rt ? "" : _("AG"),
-			aoff_w, is_rt ? "" : _("AG-OFFSET"),
+			agno_w, is_rt ? (fsgeo.rgcount ? _("RG") : "") : _("AG"),
+			aoff_w, is_rt ? (fsgeo.rgcount ? _("RG-OFFSET") : "") : _("AG-OFFSET"),
 			tot_w, _("TOTAL"),
 			flg ? _(" FLAGS") : "");
 		for (i = 0; i < egcnt; i++) {
@@ -377,7 +386,7 @@ bmap_f(
 						map[i + 1].bmv_length - 1LL));
 				printf("%4d: %-*s %-*s", i, foff_w, rbuf,
 					boff_w, bbuf);
-				if (!is_rt) {
+				if (bbperag > 0) {
 					agno = map[i + 1].bmv_block / bbperag;
 					agoff = map[i + 1].bmv_block -
 							(agno * bbperag);
diff --git a/libfrog/fsgeom.h b/libfrog/fsgeom.h
index c571ddbcfb9b70..b851b9bbf36a58 100644
--- a/libfrog/fsgeom.h
+++ b/libfrog/fsgeom.h
@@ -205,4 +205,16 @@ cvt_b_to_agbno(
 	return cvt_daddr_to_agbno(xfd, cvt_btobbt(byteno));
 }
 
+/* Return the number of bytes in an rtgroup. */
+static inline uint64_t
+bytes_per_rtgroup(
+	const struct xfs_fsop_geom	*fsgeo)
+{
+	if (!fsgeo->rgcount)
+		return 0;
+
+	return (uint64_t)fsgeo->rgextents * fsgeo->rtextsize *
+		fsgeo->blocksize;
+}
+
 #endif /* __LIBFROG_FSGEOM_H__ */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 40/51] xfs_io: display rt group in verbose fsmap output
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (38 preceding siblings ...)
  2024-12-23 22:21   ` [PATCH 39/51] xfs_io: display rt group in verbose bmap output Darrick J. Wong
@ 2024-12-23 22:22   ` Darrick J. Wong
  2024-12-23 22:22   ` [PATCH 41/51] xfs_mdrestore: refactor open-coded fd/is_file into a structure Darrick J. Wong
                     ` (10 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:22 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Display the rt group number in the fsmap output, just like we do for
regular data files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 io/fsmap.c |   22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)


diff --git a/io/fsmap.c b/io/fsmap.c
index bf1196390cf35d..6de720f238bb44 100644
--- a/io/fsmap.c
+++ b/io/fsmap.c
@@ -14,6 +14,7 @@
 
 static cmdinfo_t	fsmap_cmd;
 static dev_t		xfs_data_dev;
+static dev_t		xfs_rt_dev;
 
 static void
 fsmap_help(void)
@@ -170,7 +171,7 @@ dump_map_verbose(
 	unsigned long long	i;
 	struct fsmap		*p;
 	int			agno;
-	off_t			agoff, bperag;
+	off_t			agoff, bperag, bperrtg;
 	int			foff_w, boff_w, aoff_w, tot_w, agno_w, own_w;
 	int			nr_w, dev_w;
 	char			rbuf[40], bbuf[40], abuf[40], obuf[40];
@@ -185,6 +186,7 @@ dump_map_verbose(
 	tot_w = MINTOT_WIDTH;
 	bperag = (off_t)fsgeo->agblocks *
 		  (off_t)fsgeo->blocksize;
+	bperrtg = bytes_per_rtgroup(fsgeo);
 	sunit = (fsgeo->sunit * fsgeo->blocksize);
 	swidth = (fsgeo->swidth * fsgeo->blocksize);
 
@@ -243,6 +245,13 @@ dump_map_verbose(
 				"(%lld..%lld)",
 				(long long)BTOBBT(agoff),
 				(long long)BTOBBT(agoff + p->fmr_length - 1));
+		} else if (p->fmr_device == xfs_rt_dev && fsgeo->rgcount > 0) {
+			agno = p->fmr_physical / bperrtg;
+			agoff = p->fmr_physical % bperrtg;
+			snprintf(abuf, sizeof(abuf),
+				"(%lld..%lld)",
+				(long long)BTOBBT(agoff),
+				(long long)BTOBBT(agoff + p->fmr_length - 1));
 		} else
 			abuf[0] = 0;
 		aoff_w = max(aoff_w, strlen(abuf));
@@ -315,6 +324,16 @@ dump_map_verbose(
 			snprintf(gbuf, sizeof(gbuf),
 				"%lld",
 				(long long)agno);
+		} else if (p->fmr_device == xfs_rt_dev && fsgeo->rgcount > 0) {
+			agno = p->fmr_physical / bperrtg;
+			agoff = p->fmr_physical % bperrtg;
+			snprintf(abuf, sizeof(abuf),
+				"(%lld..%lld)",
+				(long long)BTOBBT(agoff),
+				(long long)BTOBBT(agoff + p->fmr_length - 1));
+			snprintf(gbuf, sizeof(gbuf),
+				"%lld",
+				(long long)agno);
 		} else {
 			abuf[0] = 0;
 			gbuf[0] = 0;
@@ -501,6 +520,7 @@ fsmap_f(
 	}
 	fs = fs_table_lookup(file->name, FS_MOUNT_POINT);
 	xfs_data_dev = fs ? fs->fs_datadev : 0;
+	xfs_rt_dev = fs ? fs->fs_rtdev : 0;
 
 	head->fmh_count = map_size;
 	do {



* [PATCH 41/51] xfs_mdrestore: refactor open-coded fd/is_file into a structure
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (39 preceding siblings ...)
  2024-12-23 22:22   ` [PATCH 40/51] xfs_io: display rt group in verbose fsmap output Darrick J. Wong
@ 2024-12-23 22:22   ` Darrick J. Wong
  2024-12-23 22:22   ` [PATCH 42/51] xfs_mdrestore: restore rt group superblocks to realtime device Darrick J. Wong
                     ` (9 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:22 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create an explicit object to track the fd and flags associated with a
device onto which we are restoring metadata, and use it to reduce the
number of open-coded arguments to ->restore.  This avoids some grossness
in the next patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mdrestore/xfs_mdrestore.c |  123 +++++++++++++++++++++++----------------------
 1 file changed, 63 insertions(+), 60 deletions(-)
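
Note for readers: the pattern here is a small handle object whose fd
starts at -1, so that close can be called unconditionally.  The sketch
below is a simplified standalone illustration under stated assumptions:
the demo_* names are hypothetical, errors are returned instead of
calling fatal(), and the mounted-filesystem check is left out.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle pairing a descriptor with its "regular file" flag. */
struct demo_dev {
	int	fd;
	bool	is_file;
};

/* Start with fd == -1 so that demo_dev_close() is safe on unopened devs. */
#define DEMO_DEV_INIT	{ .fd = -1 }

/* Open (or create) the restore target.  Returns 0 on success, -1 on error. */
static int
demo_dev_open(
	struct demo_dev	*dev,
	const char	*path)
{
	struct stat	statbuf;
	int		open_flags = O_RDWR;

	dev->is_file = false;
	if (stat(path, &statbuf) < 0) {
		/* nothing there yet, assume a regular file and create it */
		open_flags |= O_CREAT;
		dev->is_file = true;
	} else if (S_ISREG(statbuf.st_mode)) {
		/* existing image file, start over from empty */
		open_flags |= O_TRUNC;
		dev->is_file = true;
	}

	dev->fd = open(path, open_flags, 0644);
	return dev->fd < 0 ? -1 : 0;
}

/* Close the device if it was opened and reset it to the initial state. */
static void
demo_dev_close(
	struct demo_dev	*dev)
{
	if (dev->fd >= 0)
		close(dev->fd);
	dev->fd = -1;
	dev->is_file = false;
}
```

Because an unopened handle keeps fd == -1, the caller no longer needs
to remember which optional devices were actually opened before closing
them, which is what lets main() drop the external_log test around
close().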


diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c
index c6c00270234442..c5584fec68813e 100644
--- a/mdrestore/xfs_mdrestore.c
+++ b/mdrestore/xfs_mdrestore.c
@@ -15,12 +15,20 @@ union mdrestore_headers {
 	struct xfs_metadump_header	v2;
 };
 
+struct mdrestore_dev {
+	int		fd;
+	bool		is_file;
+};
+
+#define DEFINE_MDRESTORE_DEV(name) \
+	struct mdrestore_dev name = { .fd = -1 }
+
 struct mdrestore_ops {
 	void (*read_header)(union mdrestore_headers *header, FILE *md_fp);
 	void (*show_info)(union mdrestore_headers *header, const char *md_file);
 	void (*restore)(union mdrestore_headers *header, FILE *md_fp,
-			int ddev_fd, bool is_data_target_file, int logdev_fd,
-			bool is_log_target_file);
+			const struct mdrestore_dev *ddev,
+			const struct mdrestore_dev *logdev);
 };
 
 static struct mdrestore {
@@ -108,25 +116,24 @@ fixup_superblock(
 		fatal("error writing primary superblock: %s\n", strerror(errno));
 }
 
-static int
+static void
 open_device(
-	char		*path,
-	bool		*is_file)
+	struct mdrestore_dev	*dev,
+	char			*path)
 {
-	struct stat	statbuf;
-	int		open_flags;
-	int		fd;
+	struct stat		statbuf;
+	int			open_flags;
 
 	open_flags = O_RDWR;
-	*is_file = false;
+	dev->is_file = false;
 
 	if (stat(path, &statbuf) < 0)  {
 		/* ok, assume it's a file and create it */
 		open_flags |= O_CREAT;
-		*is_file = true;
+		dev->is_file = true;
 	} else if (S_ISREG(statbuf.st_mode))  {
 		open_flags |= O_TRUNC;
-		*is_file = true;
+		dev->is_file = true;
 	} else if (platform_check_ismounted(path, NULL, &statbuf, 0)) {
 		/*
 		 * check to make sure a filesystem isn't mounted on the device
@@ -136,23 +143,30 @@ open_device(
 				path);
 	}
 
-	fd = open(path, open_flags, 0644);
-	if (fd < 0)
+	dev->fd = open(path, open_flags, 0644);
+	if (dev->fd < 0)
 		fatal("couldn't open \"%s\"\n", path);
+}
 
-	return fd;
+static void
+close_device(
+	struct mdrestore_dev	*dev)
+{
+	if (dev->fd >= 0)
+		close(dev->fd);
+	dev->fd = -1;
+	dev->is_file = false;
 }
 
 static void
 verify_device_size(
-	int		dev_fd,
-	bool		is_file,
-	xfs_rfsblock_t	nr_blocks,
-	uint32_t	blocksize)
+	const struct mdrestore_dev	*dev,
+	xfs_rfsblock_t			nr_blocks,
+	uint32_t			blocksize)
 {
-	if (is_file) {
+	if (dev->is_file) {
 		/* ensure regular files are correctly sized */
-		if (ftruncate(dev_fd, nr_blocks * blocksize))
+		if (ftruncate(dev->fd, nr_blocks * blocksize))
 			fatal("cannot set filesystem image size: %s\n",
 				strerror(errno));
 	} else {
@@ -161,7 +175,7 @@ verify_device_size(
 		off_t		off;
 
 		off = nr_blocks * blocksize - sizeof(lb);
-		if (pwrite(dev_fd, lb, sizeof(lb), off) < 0)
+		if (pwrite(dev->fd, lb, sizeof(lb), off) < 0)
 			fatal("failed to write last block, is target too "
 				"small? (error: %s)\n", strerror(errno));
 	}
@@ -195,12 +209,10 @@ show_info_v1(
 
 static void
 restore_v1(
-	union mdrestore_headers *h,
-	FILE			*md_fp,
-	int			ddev_fd,
-	bool			is_data_target_file,
-	int			logdev_fd,
-	bool			is_log_target_file)
+	union mdrestore_headers		*h,
+	FILE				*md_fp,
+	const struct mdrestore_dev	*ddev,
+	const struct mdrestore_dev	*logdev)
 {
 	struct xfs_metablock	*metablock;	/* header + index + blocks */
 	__be64			*block_index;
@@ -254,8 +266,7 @@ restore_v1(
 
 	((struct xfs_dsb*)block_buffer)->sb_inprogress = 1;
 
-	verify_device_size(ddev_fd, is_data_target_file, sb.sb_dblocks,
-			sb.sb_blocksize);
+	verify_device_size(ddev, sb.sb_dblocks, sb.sb_blocksize);
 
 	bytes_read = 0;
 
@@ -263,7 +274,7 @@ restore_v1(
 		maybe_print_progress(&mb_read, bytes_read);
 
 		for (cur_index = 0; cur_index < mb_count; cur_index++) {
-			if (pwrite(ddev_fd, &block_buffer[cur_index <<
+			if (pwrite(ddev->fd, &block_buffer[cur_index <<
 					h->v1.mb_blocklog], block_size,
 					be64_to_cpu(block_index[cur_index]) <<
 						BBSHIFT) < 0)
@@ -292,7 +303,7 @@ restore_v1(
 
 	final_print_progress(&mb_read, bytes_read);
 
-	fixup_superblock(ddev_fd, block_buffer, &sb);
+	fixup_superblock(ddev->fd, block_buffer, &sb);
 
 	free(metablock);
 }
@@ -376,12 +387,10 @@ restore_meta_extent(
 
 static void
 restore_v2(
-	union mdrestore_headers	*h,
-	FILE			*md_fp,
-	int			ddev_fd,
-	bool			is_data_target_file,
-	int			logdev_fd,
-	bool			is_log_target_file)
+	union mdrestore_headers		*h,
+	FILE				*md_fp,
+	const struct mdrestore_dev	*ddev,
+	const struct mdrestore_dev	*logdev)
 {
 	struct xfs_sb		sb;
 	struct xfs_meta_extent	xme;
@@ -415,16 +424,14 @@ restore_v2(
 
 	((struct xfs_dsb *)block_buffer)->sb_inprogress = 1;
 
-	verify_device_size(ddev_fd, is_data_target_file, sb.sb_dblocks,
-			sb.sb_blocksize);
+	verify_device_size(ddev, sb.sb_dblocks, sb.sb_blocksize);
 
 	if (sb.sb_logstart == 0) {
 		ASSERT(mdrestore.external_log == true);
-		verify_device_size(logdev_fd, is_log_target_file, sb.sb_logblocks,
-				sb.sb_blocksize);
+		verify_device_size(logdev, sb.sb_logblocks, sb.sb_blocksize);
 	}
 
-	if (pwrite(ddev_fd, block_buffer, len, 0) < 0)
+	if (pwrite(ddev->fd, block_buffer, len, 0) < 0)
 		fatal("error writing primary superblock: %s\n",
 			strerror(errno));
 
@@ -446,11 +453,11 @@ restore_v2(
 		switch (be64_to_cpu(xme.xme_addr) & XME_ADDR_DEVICE_MASK) {
 		case XME_ADDR_DATA_DEVICE:
 			device = "data";
-			fd = ddev_fd;
+			fd = ddev->fd;
 			break;
 		case XME_ADDR_LOG_DEVICE:
 			device = "log";
-			fd = logdev_fd;
+			fd = logdev->fd;
 			break;
 		default:
 			fatal("Invalid device found in metadump\n");
@@ -467,7 +474,7 @@ restore_v2(
 
 	final_print_progress(&mb_read, bytes_read);
 
-	fixup_superblock(ddev_fd, block_buffer, &sb);
+	fixup_superblock(ddev->fd, block_buffer, &sb);
 
 	free(block_buffer);
 }
@@ -492,13 +499,11 @@ main(
 	char			**argv)
 {
 	union mdrestore_headers	headers;
+	DEFINE_MDRESTORE_DEV(ddev);
+	DEFINE_MDRESTORE_DEV(logdev);
 	FILE			*src_f;
-	char			*logdev = NULL;
-	int			data_dev_fd = -1;
-	int			log_dev_fd = -1;
+	char			*logdev_path = NULL;
 	int			c;
-	bool			is_data_dev_file = false;
-	bool			is_log_dev_file = false;
 
 	mdrestore.show_progress = false;
 	mdrestore.show_info = false;
@@ -516,7 +521,7 @@ main(
 				mdrestore.show_info = true;
 				break;
 			case 'l':
-				logdev = optarg;
+				logdev_path = optarg;
 				mdrestore.external_log = true;
 				break;
 			case 'V':
@@ -555,7 +560,7 @@ main(
 
 	switch (be32_to_cpu(headers.magic)) {
 	case XFS_MD_MAGIC_V1:
-		if (logdev != NULL)
+		if (logdev_path != NULL)
 			usage();
 		mdrestore.mdrops = &mdrestore_ops_v1;
 		break;
@@ -581,18 +586,16 @@ main(
 	optind++;
 
 	/* check and open data device */
-	data_dev_fd = open_device(argv[optind], &is_data_dev_file);
+	open_device(&ddev, argv[optind]);
 
+	/* check and open log device */
 	if (mdrestore.external_log)
-		/* check and open log device */
-		log_dev_fd = open_device(logdev, &is_log_dev_file);
+		open_device(&logdev, logdev_path);
 
-	mdrestore.mdrops->restore(&headers, src_f, data_dev_fd,
-			is_data_dev_file, log_dev_fd, is_log_dev_file);
+	mdrestore.mdrops->restore(&headers, src_f, &ddev, &logdev);
 
-	close(data_dev_fd);
-	if (mdrestore.external_log)
-		close(log_dev_fd);
+	close_device(&ddev);
+	close_device(&logdev);
 
 	if (src_f != stdin)
 		fclose(src_f);



* [PATCH 42/51] xfs_mdrestore: restore rt group superblocks to realtime device
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (40 preceding siblings ...)
  2024-12-23 22:22   ` [PATCH 41/51] xfs_mdrestore: refactor open-coded fd/is_file into a structure Darrick J. Wong
@ 2024-12-23 22:22   ` Darrick J. Wong
  2024-12-23 22:22   ` [PATCH 43/51] xfs_spaceman: report on realtime group health Darrick J. Wong
                     ` (8 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:22 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Support restoring realtime device metadata to the realtime device, if
the dumped filesystem had one.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 man/man8/xfs_mdrestore.8  |   10 ++++++++++
 mdrestore/xfs_mdrestore.c |   47 ++++++++++++++++++++++++++++++++++++---------
 2 files changed, 48 insertions(+), 9 deletions(-)
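
Note for readers: restore_v2 picks the target device from selector bits
in each extent's address and then writes to that device's fd.  The
sketch below illustrates that dispatch in isolation.  The DEMO_*
constants, including the shift of 54, and the demo_* helpers are
assumptions made for illustration; consult the XME_ADDR_* definitions
in the metadump header for the real on-disk layout.

```c
#include <stdint.h>

/* Assumed layout: the device selector occupies the top bits of the address. */
#define DEMO_ADDR_DEVICE_SHIFT	54
#define DEMO_ADDR_DATA_DEVICE	(0ULL << DEMO_ADDR_DEVICE_SHIFT)
#define DEMO_ADDR_LOG_DEVICE	(1ULL << DEMO_ADDR_DEVICE_SHIFT)
#define DEMO_ADDR_RT_DEVICE	(2ULL << DEMO_ADDR_DEVICE_SHIFT)
#define DEMO_ADDR_DEVICE_MASK	(3ULL << DEMO_ADDR_DEVICE_SHIFT)

/* Hypothetical bundle of the three restore targets; -1 means not opened. */
struct demo_targets {
	int	data_fd;
	int	log_fd;
	int	rt_fd;
};

/* Return the fd that an extent address refers to, or -1 if it is invalid. */
static int
demo_pick_fd(
	const struct demo_targets	*t,
	uint64_t			addr)
{
	switch (addr & DEMO_ADDR_DEVICE_MASK) {
	case DEMO_ADDR_DATA_DEVICE:
		return t->data_fd;
	case DEMO_ADDR_LOG_DEVICE:
		return t->log_fd;
	case DEMO_ADDR_RT_DEVICE:
		return t->rt_fd;
	default:
		return -1;
	}
}

/* Strip the selector bits to recover the address within the device. */
static uint64_t
demo_device_addr(
	uint64_t	addr)
{
	return addr & ~DEMO_ADDR_DEVICE_MASK;
}
```

In this sketch an unopened device and an unknown selector both yield
-1, so a caller would treat either as fatal.  The patch itself rejects
a dump that carries the rt device flag when -r was not given, before
any extent is read.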


diff --git a/man/man8/xfs_mdrestore.8 b/man/man8/xfs_mdrestore.8
index f60e7b56ebf0d1..6f6e14e96c6a5c 100644
--- a/man/man8/xfs_mdrestore.8
+++ b/man/man8/xfs_mdrestore.8
@@ -8,6 +8,9 @@ .SH SYNOPSIS
 ] [
 .B \-l
 .I logdev
+] [
+.B \-r
+.I rtdev
 ]
 .I source
 .I target
@@ -17,6 +20,9 @@ .SH SYNOPSIS
 [
 .B \-l
 .I logdev
+] [
+.B \-r
+.I rtdev
 ]
 .I source
 .br
@@ -61,6 +67,10 @@ .SH OPTIONS
 In such a scenario, the user has to provide a device to which the log device
 contents from the metadump file are copied.
 .TP
+.BI \-r " rtdev"
+Restore realtime device metadata to this device.
+This is only required for a metadump in v2 format.
+.TP
 .B \-V
 Prints the version number and exits.
 .SH DIAGNOSTICS
diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c
index c5584fec68813e..d5014981b15a68 100644
--- a/mdrestore/xfs_mdrestore.c
+++ b/mdrestore/xfs_mdrestore.c
@@ -28,7 +28,8 @@ struct mdrestore_ops {
 	void (*show_info)(union mdrestore_headers *header, const char *md_file);
 	void (*restore)(union mdrestore_headers *header, FILE *md_fp,
 			const struct mdrestore_dev *ddev,
-			const struct mdrestore_dev *logdev);
+			const struct mdrestore_dev *logdev,
+			const struct mdrestore_dev *rtdev);
 };
 
 static struct mdrestore {
@@ -37,6 +38,7 @@ static struct mdrestore {
 	bool			show_info;
 	bool			progress_since_warning;
 	bool			external_log;
+	bool			realtime_data;
 } mdrestore;
 
 static void
@@ -212,7 +214,8 @@ restore_v1(
 	union mdrestore_headers		*h,
 	FILE				*md_fp,
 	const struct mdrestore_dev	*ddev,
-	const struct mdrestore_dev	*logdev)
+	const struct mdrestore_dev	*logdev,
+	const struct mdrestore_dev	*rtdev)
 {
 	struct xfs_metablock	*metablock;	/* header + index + blocks */
 	__be64			*block_index;
@@ -336,8 +339,9 @@ read_header_v2(
 	if (!mdrestore.external_log && (compat & XFS_MD2_COMPAT_EXTERNALLOG))
 		fatal("External Log device is required\n");
 
-	if (h->v2.xmh_incompat_flags & cpu_to_be32(XFS_MD2_INCOMPAT_RTDEVICE))
-		fatal("Realtime device not yet supported\n");
+	if ((h->v2.xmh_incompat_flags & cpu_to_be32(XFS_MD2_INCOMPAT_RTDEVICE)) &&
+	    !mdrestore.realtime_data)
+		fatal("Realtime device is required\n");
 }
 
 static void
@@ -346,14 +350,17 @@ show_info_v2(
 	const char		*md_file)
 {
 	uint32_t		compat_flags;
+	uint32_t		incompat_flags;
 
 	compat_flags = be32_to_cpu(h->v2.xmh_compat_flags);
+	incompat_flags = be32_to_cpu(h->v2.xmh_incompat_flags);
 
-	printf("%s: %sobfuscated, %s log, external log contents are %sdumped, %s metadata blocks,\n",
+	printf("%s: %sobfuscated, %s log, external log contents are %sdumped, rt device contents are %sdumped, %s metadata blocks,\n",
 		md_file,
 		compat_flags & XFS_MD2_COMPAT_OBFUSCATED ? "":"not ",
 		compat_flags & XFS_MD2_COMPAT_DIRTYLOG ? "dirty":"clean",
 		compat_flags & XFS_MD2_COMPAT_EXTERNALLOG ? "":"not ",
+		incompat_flags & XFS_MD2_INCOMPAT_RTDEVICE ? "":"not ",
 		compat_flags & XFS_MD2_COMPAT_FULLBLOCKS ? "full":"zeroed");
 }
 
@@ -390,7 +397,8 @@ restore_v2(
 	union mdrestore_headers		*h,
 	FILE				*md_fp,
 	const struct mdrestore_dev	*ddev,
-	const struct mdrestore_dev	*logdev)
+	const struct mdrestore_dev	*logdev,
+	const struct mdrestore_dev	*rtdev)
 {
 	struct xfs_sb		sb;
 	struct xfs_meta_extent	xme;
@@ -431,6 +439,11 @@ restore_v2(
 		verify_device_size(logdev, sb.sb_logblocks, sb.sb_blocksize);
 	}
 
+	if (sb.sb_rblocks > 0) {
+		ASSERT(mdrestore.realtime_data == true);
+		verify_device_size(rtdev, sb.sb_rblocks, sb.sb_blocksize);
+	}
+
 	if (pwrite(ddev->fd, block_buffer, len, 0) < 0)
 		fatal("error writing primary superblock: %s\n",
 			strerror(errno));
@@ -459,6 +472,10 @@ restore_v2(
 			device = "log";
 			fd = logdev->fd;
 			break;
+		case XME_ADDR_RT_DEVICE:
+			device = "rt";
+			fd = rtdev->fd;
+			break;
 		default:
 			fatal("Invalid device found in metadump\n");
 			break;
@@ -488,7 +505,7 @@ static struct mdrestore_ops mdrestore_ops_v2 = {
 static void
 usage(void)
 {
-	fprintf(stderr, "Usage: %s [-V] [-g] [-i] [-l logdev] source target\n",
+	fprintf(stderr, "Usage: %s [-V] [-g] [-i] [-l logdev] [-r rtdev] source target\n",
 		progname);
 	exit(1);
 }
@@ -501,18 +518,21 @@ main(
 	union mdrestore_headers	headers;
 	DEFINE_MDRESTORE_DEV(ddev);
 	DEFINE_MDRESTORE_DEV(logdev);
+	DEFINE_MDRESTORE_DEV(rtdev);
 	FILE			*src_f;
 	char			*logdev_path = NULL;
+	char			*rtdev_path = NULL;
 	int			c;
 
 	mdrestore.show_progress = false;
 	mdrestore.show_info = false;
 	mdrestore.progress_since_warning = false;
 	mdrestore.external_log = false;
+	mdrestore.realtime_data = false;
 
 	progname = basename(argv[0]);
 
-	while ((c = getopt(argc, argv, "gil:V")) != EOF) {
+	while ((c = getopt(argc, argv, "gil:r:V")) != EOF) {
 		switch (c) {
 			case 'g':
 				mdrestore.show_progress = true;
@@ -524,6 +544,10 @@ main(
 				logdev_path = optarg;
 				mdrestore.external_log = true;
 				break;
+			case 'r':
+				rtdev_path = optarg;
+				mdrestore.realtime_data = true;
+				break;
 			case 'V':
 				printf("%s version %s\n", progname, VERSION);
 				exit(0);
@@ -592,10 +616,15 @@ main(
 	if (mdrestore.external_log)
 		open_device(&logdev, logdev_path);
 
-	mdrestore.mdrops->restore(&headers, src_f, &ddev, &logdev);
+	/* check and open realtime device */
+	if (mdrestore.realtime_data)
+		open_device(&rtdev, rtdev_path);
+
+	mdrestore.mdrops->restore(&headers, src_f, &ddev, &logdev, &rtdev);
 
 	close_device(&ddev);
 	close_device(&logdev);
+	close_device(&rtdev);
 
 	if (src_f != stdin)
 		fclose(src_f);



* [PATCH 43/51] xfs_spaceman: report on realtime group health
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (41 preceding siblings ...)
  2024-12-23 22:22   ` [PATCH 42/51] xfs_mdrestore: restore rt group superblocks to realtime device Darrick J. Wong
@ 2024-12-23 22:22   ` Darrick J. Wong
  2024-12-23 22:23   ` [PATCH 44/51] xfs_scrub: scrub realtime allocation group metadata Darrick J. Wong
                     ` (7 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:22 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add the realtime group status to the health reporting done by
xfs_spaceman.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 man/man8/xfs_spaceman.8 |    5 +++-
 spaceman/health.c       |   63 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 65 insertions(+), 3 deletions(-)
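
Note for readers: the health report is driven by a table that maps
bits in the "sick" and "checked" masks to descriptions.  The sketch
below shows one way such a table could be walked.  It is an
illustration only: the DEMO_* mask values and demo_* names are
hypothetical, and the real report_sick() in spaceman/health.c may
differ in details such as quiet mode and feature checks.

```c
#include <stdio.h>

/* Hypothetical mask bits standing in for XFS_RTGROUP_GEOM_SICK_*. */
#define DEMO_SICK_SUPER		(1U << 0)
#define DEMO_SICK_BITMAP	(1U << 1)
#define DEMO_SICK_SUMMARY	(1U << 2)

struct demo_flag_map {
	unsigned int	mask;
	const char	*descr;
};

/* Table terminated by an entry with a zero mask. */
static const struct demo_flag_map demo_rtgroup_flags[] = {
	{ .mask = DEMO_SICK_SUPER,	.descr = "superblock" },
	{ .mask = DEMO_SICK_BITMAP,	.descr = "realtime bitmap" },
	{ .mask = DEMO_SICK_SUMMARY,	.descr = "realtime summary" },
	{ 0 },
};

/*
 * Print one line for every item that has been checked and return the
 * number of items that were found to be unhealthy.
 */
static int
demo_report_sick(
	const char			*descr,
	const struct demo_flag_map	*maps,
	unsigned int			sick,
	unsigned int			checked)
{
	const struct demo_flag_map	*f;
	int				nr_bad = 0;

	for (f = maps; f->mask != 0; f++) {
		if (!(checked & f->mask))
			continue;	/* never examined, say nothing */
		if (sick & f->mask)
			nr_bad++;
		printf("%s %s: %s\n", descr, f->descr,
				(sick & f->mask) ? "unhealthy" : "ok");
	}
	return nr_bad;
}
```

Keeping the descriptions in a table means that adding a new kind of
metadata only needs a new table entry, which is how this patch adds the
rt group superblock, bitmap and summary.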


diff --git a/man/man8/xfs_spaceman.8 b/man/man8/xfs_spaceman.8
index 0d299132a7881b..7d2d1ff94eeb55 100644
--- a/man/man8/xfs_spaceman.8
+++ b/man/man8/xfs_spaceman.8
@@ -91,7 +91,7 @@ .SH COMMANDS
 .BR "xfs_info" "(8)"
 prints when querying a filesystem.
 .TP
-.BI "health [ \-a agno] [ \-c ] [ \-f ] [ \-i inum ] [ \-n ] [ \-q ] [ paths ]"
+.BI "health [ \-a agno] [ \-c ] [ \-f ] [ \-i inum ] [ \-n ] [ \-q ] [ \-r rgno ] [ paths ]"
 Reports the health of the given group of filesystem metadata.
 .RS 1.0i
 .PD 0
@@ -119,6 +119,9 @@ .SH COMMANDS
 .B \-q
 Report only unhealthy metadata.
 .TP
+.B \-r rgno
+Report on the health of the given realtime group.
+.TP
 .B paths
 Report on the health of the files at the given path.
 .PD
diff --git a/spaceman/health.c b/spaceman/health.c
index c4d570363fbbf1..4281589324cd44 100644
--- a/spaceman/health.c
+++ b/spaceman/health.c
@@ -132,6 +132,22 @@ static const struct flag_map ag_flags[] = {
 	{0},
 };
 
+static const struct flag_map rtgroup_flags[] = {
+	{
+		.mask = XFS_RTGROUP_GEOM_SICK_SUPER,
+		.descr = "superblock",
+	},
+	{
+		.mask = XFS_RTGROUP_GEOM_SICK_BITMAP,
+		.descr = "realtime bitmap",
+	},
+	{
+		.mask = XFS_RTGROUP_GEOM_SICK_SUMMARY,
+		.descr = "realtime summary",
+	},
+	{0},
+};
+
 static const struct flag_map inode_flags[] = {
 	{
 		.mask = XFS_BS_SICK_INODE,
@@ -216,6 +232,25 @@ report_ag_sick(
 	return 0;
 }
 
+/* Report on a rt group's health. */
+static int
+report_rtgroup_sick(
+	xfs_rgnumber_t		rgno)
+{
+	struct xfs_rtgroup_geometry rgeo = { 0 };
+	char			descr[256];
+	int			ret;
+
+	ret = -xfrog_rtgroup_geometry(file->xfd.fd, rgno, &rgeo);
+	if (ret) {
+		xfrog_perror(ret, "rtgroup_geometry");
+		return 1;
+	}
+	snprintf(descr, sizeof(descr) - 1, _("rtgroup %u"), rgno);
+	report_sick(descr, rtgroup_flags, rgeo.rg_sick, rgeo.rg_checked);
+	return 0;
+}
+
 /* Report on an inode's health. */
 static int
 report_inode_health(
@@ -342,7 +377,7 @@ report_bulkstat_health(
 	return error;
 }
 
-#define OPT_STRING ("a:cfi:nq")
+#define OPT_STRING ("a:cfi:nqr:")
 
 /* Report on health problems in XFS filesystem. */
 static int
@@ -352,6 +387,7 @@ health_f(
 {
 	unsigned long long	x;
 	xfs_agnumber_t		agno;
+	xfs_rgnumber_t		rgno;
 	bool			default_report = true;
 	int			c;
 	int			ret;
@@ -399,6 +435,17 @@ health_f(
 		case 'q':
 			quiet = true;
 			break;
+		case 'r':
+			default_report = false;
+			errno = 0;
+			x = strtoll(optarg, NULL, 10);
+			if (!errno && x >= NULLRGNUMBER)
+				errno = ERANGE;
+			if (errno) {
+				perror("rtgroup health");
+				return 1;
+			}
+			break;
 		default:
 			return command_usage(&health_cmd);
 		}
@@ -434,6 +481,12 @@ health_f(
 			if (ret)
 				return 1;
 			break;
+		case 'r':
+			rgno = strtoll(optarg, NULL, 10);
+			ret = report_rtgroup_sick(rgno);
+			if (ret)
+				return 1;
+			break;
 		default:
 			break;
 		}
@@ -455,6 +508,11 @@ health_f(
 			if (ret)
 				return 1;
 		}
+		for (rgno = 0; rgno < file->xfd.fsgeom.rgcount; rgno++) {
+			ret = report_rtgroup_sick(rgno);
+			if (ret)
+				return 1;
+		}
 		if (comprehensive) {
 			ret = report_bulkstat_health(NULLAGNUMBER);
 			if (ret)
@@ -485,6 +543,7 @@ health_help(void)
 " -i inum  -- Report health of a given inode number.\n"
 " -n       -- Try to report file names.\n"
 " -q       -- Only report unhealthy metadata.\n"
+" -r rgno  -- Report health of the given realtime group.\n"
 " paths    -- Report health of the given file path.\n"
 "\n"));
 
@@ -495,7 +554,7 @@ static cmdinfo_t health_cmd = {
 	.cfunc = health_f,
 	.argmin = 0,
 	.argmax = -1,
-	.args = "[-a agno] [-c] [-f] [-i inum] [-n] [-q] [paths]",
+	.args = "[-a agno] [-c] [-f] [-i inum] [-n] [-q] [-r rgno] [paths]",
 	.flags = CMD_FLAG_ONESHOT,
 	.help = health_help,
 };



* [PATCH 44/51] xfs_scrub: scrub realtime allocation group metadata
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (42 preceding siblings ...)
  2024-12-23 22:22   ` [PATCH 43/51] xfs_spaceman: report on realtime group health Darrick J. Wong
@ 2024-12-23 22:23   ` Darrick J. Wong
  2024-12-23 22:23   ` [PATCH 45/51] xfs_scrub: check rtgroup metadata directory connections Darrick J. Wong
                     ` (6 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:23 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Scan realtime group metadata as part of phase 2, just like we do for AG
metadata.  For pre-rtgroup filesystems, pretend that this is a "rtgroup
0" scrub request because the kernel expects that.  Replace the old
cond_wait code with a scrub barrier because they're equivalent for two
items that cannot be scrubbed in parallel.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/scrub.c |    4 +-
 scrub/phase2.c  |  124 ++++++++++++++++++++++++++++++++++++++-----------------
 scrub/scrub.c   |    1 
 scrub/scrub.h   |    9 ++++
 4 files changed, 98 insertions(+), 40 deletions(-)
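
Note for readers: the new scrub_deps entry says that the rt summary
must not be checked until the rt bitmap has been.  The sketch below
shows how a dependency bitmask of that shape can order a set of
scheduled items.  It is a toy model, not the xfs_scrub scheduler: the
DEMO_* types and demo_order() are hypothetical, and the summary is
deliberately numbered first so that the reordering is visible.

```c
/* Hypothetical scrub types; the summary comes first on purpose. */
enum {
	DEMO_TYPE_RTSUM,
	DEMO_TYPE_RTBITMAP,
	DEMO_TYPE_NR,
};

#define DEMO_DEP(x)	(1U << (x))

/* Each entry is a mask of the types that must finish before this one. */
static const unsigned int demo_deps[DEMO_TYPE_NR] = {
	[DEMO_TYPE_RTSUM]	= DEMO_DEP(DEMO_TYPE_RTBITMAP),
};

/*
 * Store the scheduled types in order[] so that every type comes after
 * the scheduled types it depends on.  Returns the number of entries
 * written, or -1 if a dependency cycle prevents forward progress.
 */
static int
demo_order(
	unsigned int	scheduled,
	int		order[DEMO_TYPE_NR])
{
	int		n = 0;

	while (scheduled) {
		int	progressed = 0;
		int	t;

		for (t = 0; t < DEMO_TYPE_NR; t++) {
			if (!(scheduled & DEMO_DEP(t)))
				continue;
			/* hold back while any dependency is still pending */
			if (demo_deps[t] & scheduled)
				continue;
			order[n++] = t;
			scheduled &= ~DEMO_DEP(t);
			progressed = 1;
		}
		if (!progressed)
			return -1;
	}
	return n;
}
```

When both types are scheduled, the bitmap is emitted before the summary
even though the summary has the lower number.  Expressing the ordering
as data is what allows the open-coded condition variable to go away.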


diff --git a/libfrog/scrub.c b/libfrog/scrub.c
index 66000f1ed66be4..d40364d35ce0b4 100644
--- a/libfrog/scrub.c
+++ b/libfrog/scrub.c
@@ -107,12 +107,12 @@ const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = {
 	[XFS_SCRUB_TYPE_RTBITMAP] = {
 		.name	= "rtbitmap",
 		.descr	= "realtime bitmap",
-		.group	= XFROG_SCRUB_GROUP_FS,
+		.group	= XFROG_SCRUB_GROUP_RTGROUP,
 	},
 	[XFS_SCRUB_TYPE_RTSUM] = {
 		.name	= "rtsummary",
 		.descr	= "realtime summary",
-		.group	= XFROG_SCRUB_GROUP_FS,
+		.group	= XFROG_SCRUB_GROUP_RTGROUP,
 	},
 	[XFS_SCRUB_TYPE_UQUOTA] = {
 		.name	= "usrquota",
diff --git a/scrub/phase2.c b/scrub/phase2.c
index c24d137358c74d..c7828c332e7c3a 100644
--- a/scrub/phase2.c
+++ b/scrub/phase2.c
@@ -21,12 +21,10 @@
 
 struct scan_ctl {
 	/*
-	 * Control mechanism to signal that the rt bitmap file scan is done and
-	 * wake up any waiters.
+	 * Control mechanism to signal that each group's scan of the rt bitmap
+	 * file is done and wake up any waiters.
 	 */
-	pthread_cond_t		rbm_wait;
-	pthread_mutex_t		rbm_waitlock;
-	bool			rbm_done;
+	unsigned int		rbm_group_count;
 
 	bool			aborted;
 };
@@ -202,7 +200,7 @@ scan_fs_metadata(
 	int			ret;
 
 	if (sctl->aborted)
-		goto out;
+		return;
 
 	/*
 	 * Try to check all of the metadata files that we just scheduled.  If
@@ -215,14 +213,14 @@ scan_fs_metadata(
 	ret = scrub_item_check(ctx, &sri);
 	if (ret) {
 		sctl->aborted = true;
-		goto out;
+		return;
 	}
 
 	ret = repair_and_scrub_loop(ctx, &sri, xfrog_scrubbers[type].descr,
 			&defer_repairs);
 	if (ret) {
 		sctl->aborted = true;
-		goto out;
+		return;
 	}
 	if (defer_repairs)
 		goto defer;
@@ -235,15 +233,60 @@ scan_fs_metadata(
 	ret = defer_fs_repair(ctx, &sri);
 	if (ret) {
 		sctl->aborted = true;
-		goto out;
+		return;
 	}
+}
 
-out:
-	if (type == XFS_SCRUB_TYPE_RTBITMAP) {
-		pthread_mutex_lock(&sctl->rbm_waitlock);
-		sctl->rbm_done = true;
-		pthread_cond_broadcast(&sctl->rbm_wait);
-		pthread_mutex_unlock(&sctl->rbm_waitlock);
+/*
+ * Scrub each rt group's metadata.  For pre-rtgroup filesystems, we ask to
+ * scrub "rtgroup 0" because that's how the kernel ioctl works.
+ */
+static void
+scan_rtgroup_metadata(
+	struct workqueue	*wq,
+	xfs_agnumber_t		rgno,
+	void			*arg)
+{
+	struct scrub_item	sri;
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	struct scan_ctl		*sctl = arg;
+	char			descr[DESCR_BUFSZ];
+	bool			defer_repairs;
+	int			ret;
+
+	if (sctl->aborted)
+		return;
+
+	scrub_item_init_rtgroup(&sri, rgno);
+	if (ctx->mnt.fsgeom.rgcount == 0)
+		snprintf(descr, DESCR_BUFSZ, _("realtime"));
+	else
+		snprintf(descr, DESCR_BUFSZ, _("rtgroup %u"), rgno);
+
+	/*
+	 * Try to check all of the rtgroup metadata items that we just
+	 * scheduled.  If we return with some types still needing a check, try
+	 * repairing any damaged metadata that we've found so far, and try
+	 * again.  Abort if we stop making forward progress.
+	 */
+	scrub_item_schedule_group(&sri, XFROG_SCRUB_GROUP_RTGROUP);
+	ret = scrub_item_check(ctx, &sri);
+	if (ret) {
+		sctl->aborted = true;
+		return;
+	}
+
+	ret = repair_and_scrub_loop(ctx, &sri, descr, &defer_repairs);
+	if (ret) {
+		sctl->aborted = true;
+		return;
+	}
+
+	/* Everything else gets fixed during phase 4. */
+	ret = defer_fs_repair(ctx, &sri);
+	if (ret) {
+		sctl->aborted = true;
+		return;
 	}
 }
 
@@ -255,17 +298,14 @@ phase2_func(
 	struct workqueue	wq;
 	struct scan_ctl		sctl = {
 		.aborted	= false,
-		.rbm_done	= false,
 	};
 	struct scrub_item	sri;
 	const struct xfrog_scrub_descr *sc = xfrog_scrubbers;
 	xfs_agnumber_t		agno;
+	xfs_rgnumber_t		rgno;
 	unsigned int		type;
 	int			ret, ret2;
 
-	pthread_mutex_init(&sctl.rbm_waitlock, NULL);
-	pthread_cond_init(&sctl.rbm_wait, NULL);
-
 	ret = -workqueue_create(&wq, (struct xfs_mount *)ctx,
 			scrub_nproc_workqueue(ctx));
 	if (ret) {
@@ -311,8 +351,6 @@ phase2_func(
 	for (type = 0; type < XFS_SCRUB_TYPE_NR; type++, sc++) {
 		if (sc->group != XFROG_SCRUB_GROUP_FS)
 			continue;
-		if (type == XFS_SCRUB_TYPE_RTSUM)
-			continue;
 
 		ret = -workqueue_add(&wq, scan_fs_metadata, type, &sctl);
 		if (ret) {
@@ -325,24 +363,37 @@ phase2_func(
 	if (sctl.aborted)
 		goto out_wq;
 
-	/*
-	 * Wait for the rt bitmap to finish scanning, then scan the rt summary
-	 * since the summary can be regenerated completely from the bitmap.
-	 */
-	pthread_mutex_lock(&sctl.rbm_waitlock);
-	while (!sctl.rbm_done)
-		pthread_cond_wait(&sctl.rbm_wait, &sctl.rbm_waitlock);
-	pthread_mutex_unlock(&sctl.rbm_waitlock);
+	if (ctx->mnt.fsgeom.rgcount == 0) {
+		/*
+		 * When rtgroups were added, the bitmap and summary files
+		 * became per-rtgroup metadata so the scrub interface for the
+		 * two started to accept sm_agno.  For pre-rtgroups
+		 * filesystems, we still accept sm_agno==0, so invoke scrub in
+		 * this manner.
+		 */
+		ret = -workqueue_add(&wq, scan_rtgroup_metadata, 0, &sctl);
+		if (ret) {
+			str_liberror(ctx, ret,
+					_("queueing realtime scrub work"));
+			goto out_wq;
+		}
+	}
+
+	/* Scan each rtgroup in parallel. */
+	for (rgno = 0;
+	     rgno < ctx->mnt.fsgeom.rgcount && !sctl.aborted;
+	     rgno++) {
+		ret = -workqueue_add(&wq, scan_rtgroup_metadata, rgno, &sctl);
+		if (ret) {
+			str_liberror(ctx, ret,
+					_("queueing rtgroup scrub work"));
+			goto out_wq;
+		}
+	}
 
 	if (sctl.aborted)
 		goto out_wq;
 
-	ret = -workqueue_add(&wq, scan_fs_metadata, XFS_SCRUB_TYPE_RTSUM, &sctl);
-	if (ret) {
-		str_liberror(ctx, ret, _("queueing rtsummary scrub work"));
-		goto out_wq;
-	}
-
 out_wq:
 	ret2 = -workqueue_terminate(&wq);
 	if (ret2) {
@@ -352,9 +403,6 @@ phase2_func(
 	}
 	workqueue_destroy(&wq);
 out_wait:
-	pthread_cond_destroy(&sctl.rbm_wait);
-	pthread_mutex_destroy(&sctl.rbm_waitlock);
-
 	if (!ret && sctl.aborted)
 		ret = ECANCELED;
 	return ret;
diff --git a/scrub/scrub.c b/scrub/scrub.c
index a2fd8d77d82be0..de687af687d32d 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -50,6 +50,7 @@ static const unsigned int scrub_deps[XFS_SCRUB_TYPE_NR] = {
 	[XFS_SCRUB_TYPE_QUOTACHECK]	= DEP(XFS_SCRUB_TYPE_UQUOTA) |
 					  DEP(XFS_SCRUB_TYPE_GQUOTA) |
 					  DEP(XFS_SCRUB_TYPE_PQUOTA),
+	[XFS_SCRUB_TYPE_RTSUM]		= DEP(XFS_SCRUB_TYPE_RTBITMAP),
 };
 #undef DEP
 
diff --git a/scrub/scrub.h b/scrub/scrub.h
index 3bb3ea1d07bf40..bb94a11dcfce71 100644
--- a/scrub/scrub.h
+++ b/scrub/scrub.h
@@ -90,6 +90,15 @@ scrub_item_init_ag(struct scrub_item *sri, xfs_agnumber_t agno)
 	sri->sri_gen = -1U;
 }
 
+static inline void
+scrub_item_init_rtgroup(struct scrub_item *sri, xfs_rgnumber_t rgno)
+{
+	memset(sri, 0, sizeof(*sri));
+	sri->sri_agno = rgno;
+	sri->sri_ino = -1ULL;
+	sri->sri_gen = -1U;
+}
+
 static inline void
 scrub_item_init_fs(struct scrub_item *sri)
 {



* [PATCH 45/51] xfs_scrub: check rtgroup metadata directory connections
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (43 preceding siblings ...)
  2024-12-23 22:23   ` [PATCH 44/51] xfs_scrub: scrub realtime allocation group metadata Darrick J. Wong
@ 2024-12-23 22:23   ` Darrick J. Wong
  2024-12-23 22:23   ` [PATCH 46/51] xfs_scrub: cleanup fsmap keys initialization Darrick J. Wong
                     ` (5 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:23 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Run the rtgroup metapath scrubber during phase 5 to ensure that any
rtgroup metadata files are still connected to the metadir tree after
we've pruned any bad links.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 scrub/phase5.c |   24 ++++++++++++++++++++++--
 scrub/scrub.h  |    4 +++-
 2 files changed, 25 insertions(+), 3 deletions(-)
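
Note for readers: after this patch, phase 5 queues one metapath scan
per filesystem-wide path, plus one scan per rt group for every
per-group path.  The sketch below models only that enumeration.  The
table contents and all demo_* names are hypothetical; the real table is
xfrog_metapaths and the real work goes through a workqueue.

```c
#include <stddef.h>

enum demo_group {
	DEMO_GROUP_FS,
	DEMO_GROUP_RTGROUP,
};

struct demo_metapath {
	const char	*name;
	enum demo_group	group;
};

/* Hypothetical metapath table mixing fs-wide and per-rtgroup entries. */
static const struct demo_metapath demo_metapaths[] = {
	{ "rtdir",	DEMO_GROUP_FS },
	{ "rtbitmap",	DEMO_GROUP_RTGROUP },
	{ "rtsummary",	DEMO_GROUP_RTGROUP },
	{ "quotadir",	DEMO_GROUP_FS },
};
#define DEMO_METAPATH_NR \
	(sizeof(demo_metapaths) / sizeof(demo_metapaths[0]))

struct demo_scan {
	unsigned int	rgno;
	size_t		type;
};

/* Record the scans that would be queued; returns how many were stored. */
static size_t
demo_queue_metapath_scans(
	unsigned int		rgcount,
	struct demo_scan	*out,
	size_t			max)
{
	unsigned int		rgno;
	size_t			type;
	size_t			n = 0;

	/* fs-wide paths are scanned once, with group number 0 */
	for (type = 0; type < DEMO_METAPATH_NR; type++) {
		if (demo_metapaths[type].group != DEMO_GROUP_FS)
			continue;
		if (n == max)
			return n;
		out[n].rgno = 0;
		out[n].type = type;
		n++;
	}

	/* per-group paths are scanned once for every rt group */
	for (rgno = 0; rgno < rgcount; rgno++) {
		for (type = 0; type < DEMO_METAPATH_NR; type++) {
			if (demo_metapaths[type].group != DEMO_GROUP_RTGROUP)
				continue;
			if (n == max)
				return n;
			out[n].rgno = rgno;
			out[n].type = type;
			n++;
		}
	}
	return n;
}
```

With two fs-wide and two per-group paths, a filesystem with three rt
groups yields 2 + 3 * 2 = 8 scans, and one without rt groups yields
only the two fs-wide scans.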


diff --git a/scrub/phase5.c b/scrub/phase5.c
index 4d0a76a529b55d..22a22915dbc68d 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -750,6 +750,7 @@ static int
 queue_metapath_scan(
 	struct workqueue	*wq,
 	bool			*abortedp,
+	xfs_rgnumber_t		rgno,
 	uint64_t		type)
 {
 	struct fs_scan_item	*item;
@@ -762,7 +763,7 @@ queue_metapath_scan(
 		str_liberror(ctx, ret, _("setting up metapath scan"));
 		return ret;
 	}
-	scrub_item_init_metapath(&item->sri, type);
+	scrub_item_init_metapath(&item->sri, rgno, type);
 	scrub_item_schedule(&item->sri, XFS_SCRUB_TYPE_METAPATH);
 	item->abortedp = abortedp;
 
@@ -785,6 +786,7 @@ run_kernel_metadir_path_scrubbers(
 	const struct xfrog_scrub_descr	*sc;
 	uint64_t		type;
 	unsigned int		nr_threads = scrub_nproc_workqueue(ctx);
+	xfs_rgnumber_t		rgno;
 	bool			aborted = false;
 	int			ret, ret2;
 
@@ -804,7 +806,7 @@ run_kernel_metadir_path_scrubbers(
 		if (sc->group != XFROG_SCRUB_GROUP_FS)
 			continue;
 
-		ret = queue_metapath_scan(&wq, &aborted, type);
+		ret = queue_metapath_scan(&wq, &aborted, 0, type);
 		if (ret) {
 			str_liberror(ctx, ret,
  _("queueing metapath scrub work"));
@@ -812,6 +814,24 @@ run_kernel_metadir_path_scrubbers(
 		}
 	}
 
+	/* Scan all rtgroup metadata files */
+	for (rgno = 0;
+	     rgno < ctx->mnt.fsgeom.rgcount && !aborted;
+	     rgno++) {
+		for (type = 0; type < XFS_SCRUB_METAPATH_NR; type++) {
+			sc = &xfrog_metapaths[type];
+			if (sc->group != XFROG_SCRUB_GROUP_RTGROUP)
+				continue;
+
+			ret = queue_metapath_scan(&wq, &aborted, rgno, type);
+			if (ret) {
+				str_liberror(ctx, ret,
+  _("queueing metapath scrub work"));
+				goto wait;
+			}
+		}
+	}
+
 wait:
 	ret2 = -workqueue_terminate(&wq);
 	if (ret2) {
diff --git a/scrub/scrub.h b/scrub/scrub.h
index bb94a11dcfce71..24b5ad629c5158 100644
--- a/scrub/scrub.h
+++ b/scrub/scrub.h
@@ -118,9 +118,11 @@ scrub_item_init_file(struct scrub_item *sri, const struct xfs_bulkstat *bstat)
 }
 
 static inline void
-scrub_item_init_metapath(struct scrub_item *sri, uint64_t metapath)
+scrub_item_init_metapath(struct scrub_item *sri, xfs_rgnumber_t rgno,
+		uint64_t metapath)
 {
 	memset(sri, 0, sizeof(*sri));
+	sri->sri_agno = rgno;
 	sri->sri_ino = metapath;
 }
 



* [PATCH 46/51] xfs_scrub: cleanup fsmap keys initialization
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (44 preceding siblings ...)
  2024-12-23 22:23   ` [PATCH 45/51] xfs_scrub: check rtgroup metadata directory connections Darrick J. Wong
@ 2024-12-23 22:23   ` Darrick J. Wong
  2024-12-23 22:24   ` [PATCH 47/51] xfs_scrub: call GETFSMAP for each rt group in parallel Darrick J. Wong
                     ` (4 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:23 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Use plain array notation (keys[0], keys[1]) instead of pointer
arithmetic, and zero the keys with an initializer instead of memset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
[djwong: fold scan_rtg_rmaps cleanups into next patch]
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/phase6.c   |   17 ++++++++---------
 scrub/spacemap.c |   32 +++++++++++++++-----------------
 2 files changed, 23 insertions(+), 26 deletions(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index e4f26e7f1dd93e..fc63f5aad0bd7b 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -495,7 +495,7 @@ report_ioerr(
 	uint64_t			length,
 	void				*arg)
 {
-	struct fsmap			keys[2];
+	struct fsmap			keys[2] = { };
 	struct ioerr_filerange		fr = {
 		.physical		= start,
 		.length			= length,
@@ -506,14 +506,13 @@ report_ioerr(
 	dev = disk_to_dev(dioerr->ctx, dioerr->disk);
 
 	/* Go figure out which blocks are bad from the fsmap. */
-	memset(keys, 0, sizeof(struct fsmap) * 2);
-	keys->fmr_device = dev;
-	keys->fmr_physical = start;
-	(keys + 1)->fmr_device = dev;
-	(keys + 1)->fmr_physical = start + length - 1;
-	(keys + 1)->fmr_owner = ULLONG_MAX;
-	(keys + 1)->fmr_offset = ULLONG_MAX;
-	(keys + 1)->fmr_flags = UINT_MAX;
+	keys[0].fmr_device = dev;
+	keys[0].fmr_physical = start;
+	keys[1].fmr_device = dev;
+	keys[1].fmr_physical = start + length - 1;
+	keys[1].fmr_owner = ULLONG_MAX;
+	keys[1].fmr_offset = ULLONG_MAX;
+	keys[1].fmr_flags = UINT_MAX;
 	return -scrub_iterate_fsmap(dioerr->ctx, keys, report_ioerr_fsmap,
 			&fr);
 }
diff --git a/scrub/spacemap.c b/scrub/spacemap.c
index e35756db2eed43..4b7fae252d86ca 100644
--- a/scrub/spacemap.c
+++ b/scrub/spacemap.c
@@ -96,21 +96,20 @@ scan_ag_rmaps(
 {
 	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
 	struct scan_blocks	*sbx = arg;
-	struct fsmap		keys[2];
+	struct fsmap		keys[2] = { };
 	off_t			bperag;
 	int			ret;
 
 	bperag = (off_t)ctx->mnt.fsgeom.agblocks *
 		 (off_t)ctx->mnt.fsgeom.blocksize;
 
-	memset(keys, 0, sizeof(struct fsmap) * 2);
-	keys->fmr_device = ctx->fsinfo.fs_datadev;
-	keys->fmr_physical = agno * bperag;
-	(keys + 1)->fmr_device = ctx->fsinfo.fs_datadev;
-	(keys + 1)->fmr_physical = ((agno + 1) * bperag) - 1;
-	(keys + 1)->fmr_owner = ULLONG_MAX;
-	(keys + 1)->fmr_offset = ULLONG_MAX;
-	(keys + 1)->fmr_flags = UINT_MAX;
+	keys[0].fmr_device = ctx->fsinfo.fs_datadev;
+	keys[0].fmr_physical = agno * bperag;
+	keys[1].fmr_device = ctx->fsinfo.fs_datadev;
+	keys[1].fmr_physical = ((agno + 1) * bperag) - 1;
+	keys[1].fmr_owner = ULLONG_MAX;
+	keys[1].fmr_offset = ULLONG_MAX;
+	keys[1].fmr_flags = UINT_MAX;
 
 	if (sbx->aborted)
 		return;
@@ -135,16 +134,15 @@ scan_dev_rmaps(
 	dev_t			dev,
 	struct scan_blocks	*sbx)
 {
-	struct fsmap		keys[2];
+	struct fsmap		keys[2] = { };
 	int			ret;
 
-	memset(keys, 0, sizeof(struct fsmap) * 2);
-	keys->fmr_device = dev;
-	(keys + 1)->fmr_device = dev;
-	(keys + 1)->fmr_physical = ULLONG_MAX;
-	(keys + 1)->fmr_owner = ULLONG_MAX;
-	(keys + 1)->fmr_offset = ULLONG_MAX;
-	(keys + 1)->fmr_flags = UINT_MAX;
+	keys[0].fmr_device = dev;
+	keys[1].fmr_device = dev;
+	keys[1].fmr_physical = ULLONG_MAX;
+	keys[1].fmr_owner = ULLONG_MAX;
+	keys[1].fmr_offset = ULLONG_MAX;
+	keys[1].fmr_flags = UINT_MAX;
 
 	if (sbx->aborted)
 		return;
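
For readers less familiar with the idiom, here is a minimal standalone
sketch of what this cleanup amounts to.  struct demo_key and
demo_fill_keys() are hypothetical stand-ins for struct fsmap and the key
setup in the functions above; they are not code from xfsprogs.  Note that
the empty "= { }" initializer used in the patch is a GNU C extension
(standardized only in C23), so the sketch leaves zeroing to the caller,
who can write "{ 0 }" in portable C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fsmap; only the fields used here. */
struct demo_key {
	uint32_t	device;
	uint32_t	flags;
	uint64_t	physical;
	uint64_t	owner;
	uint64_t	offset;
};

/* Build a low/high key pair covering [start, start + length - 1]. */
static void
demo_fill_keys(
	struct demo_key	keys[2],
	uint32_t	dev,
	uint64_t	start,
	uint64_t	length)
{
	assert(length > 0);

	/* The caller passes a zeroed array, so only nonzero fields are set. */
	keys[0].device = dev;
	keys[0].physical = start;
	keys[1].device = dev;
	keys[1].physical = start + length - 1;
	keys[1].owner = UINT64_MAX;
	keys[1].offset = UINT64_MAX;
	keys[1].flags = UINT32_MAX;
}
```

A caller would declare "struct demo_key keys[2] = { 0 };" and then call
demo_fill_keys(keys, dev, start, length).  keys[1] and *(keys + 1) name
the same element, so the change is purely notational.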



* [PATCH 47/51] xfs_scrub: call GETFSMAP for each rt group in parallel
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (45 preceding siblings ...)
  2024-12-23 22:23   ` [PATCH 46/51] xfs_scrub: cleanup fsmap keys initialization Darrick J. Wong
@ 2024-12-23 22:24   ` Darrick J. Wong
  2024-12-23 22:24   ` [PATCH 48/51] xfs_scrub: trim realtime volumes too Darrick J. Wong
                     ` (3 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:24 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If realtime groups are enabled, we should take advantage of the sharding
to speed up the spacemap scans.  Do so by issuing per-rtgroup GETFSMAP
calls.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 scrub/spacemap.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 62 insertions(+), 8 deletions(-)


diff --git a/scrub/spacemap.c b/scrub/spacemap.c
index 4b7fae252d86ca..c293ab44a5286c 100644
--- a/scrub/spacemap.c
+++ b/scrub/spacemap.c
@@ -127,6 +127,43 @@ scan_ag_rmaps(
 	}
 }
 
+/* Iterate all the reverse mappings of a realtime group. */
+static void
+scan_rtg_rmaps(
+	struct workqueue	*wq,
+	xfs_agnumber_t		rgno,
+	void			*arg)
+{
+	struct scrub_ctx	*ctx = (struct scrub_ctx *)wq->wq_ctx;
+	struct scan_blocks	*sbx = arg;
+	struct fsmap		keys[2] = { };
+	off_t			bperrg = bytes_per_rtgroup(&ctx->mnt.fsgeom);
+	int			ret;
+
+	keys[0].fmr_device = ctx->fsinfo.fs_rtdev;
+	keys[0].fmr_physical = (xfs_rtblock_t)rgno * bperrg;
+	keys[1].fmr_device = ctx->fsinfo.fs_rtdev;
+	keys[1].fmr_physical = ((rgno + 1) * bperrg) - 1;
+	keys[1].fmr_owner = ULLONG_MAX;
+	keys[1].fmr_offset = ULLONG_MAX;
+	keys[1].fmr_flags = UINT_MAX;
+
+	if (sbx->aborted)
+		return;
+
+	ret = scrub_iterate_fsmap(ctx, keys, sbx->fn, sbx->arg);
+	if (ret) {
+		char		descr[DESCR_BUFSZ];
+
+		snprintf(descr, DESCR_BUFSZ, _("dev %d:%d rtgroup %u fsmap"),
+					major(ctx->fsinfo.fs_rtdev),
+					minor(ctx->fsinfo.fs_rtdev),
+					rgno);
+		str_liberror(ctx, ret, descr);
+		sbx->aborted = true;
+	}
+}
+
 /* Iterate all the reverse mappings of a standalone device. */
 static void
 scan_dev_rmaps(
@@ -206,14 +243,6 @@ scrub_scan_all_spacemaps(
 		str_liberror(ctx, ret, _("creating fsmap workqueue"));
 		return ret;
 	}
-	if (ctx->fsinfo.fs_rt) {
-		ret = -workqueue_add(&wq, scan_rt_rmaps, 0, &sbx);
-		if (ret) {
-			sbx.aborted = true;
-			str_liberror(ctx, ret, _("queueing rtdev fsmap work"));
-			goto out;
-		}
-	}
 	if (ctx->fsinfo.fs_log) {
 		ret = -workqueue_add(&wq, scan_log_rmaps, 0, &sbx);
 		if (ret) {
@@ -230,6 +259,31 @@ scrub_scan_all_spacemaps(
 			break;
 		}
 	}
+	if (ctx->fsinfo.fs_rt) {
+		for (agno = 0; agno < ctx->mnt.fsgeom.rgcount; agno++) {
+			ret = -workqueue_add(&wq, scan_rtg_rmaps, agno, &sbx);
+			if (ret) {
+				sbx.aborted = true;
+				str_liberror(ctx, ret,
+						_("queueing rtgroup fsmap work"));
+				break;
+			}
+		}
+
+		/*
+		 * If the fs doesn't have any realtime groups, scan the entire
+		 * volume all at once, since the above loop did nothing.
+		 */
+		if (ctx->mnt.fsgeom.rgcount == 0) {
+			ret = -workqueue_add(&wq, scan_rt_rmaps, 0, &sbx);
+			if (ret) {
+				sbx.aborted = true;
+				str_liberror(ctx, ret,
+						_("queueing rtdev fsmap work"));
+				goto out;
+			}
+		}
+	}
 out:
 	ret = -workqueue_terminate(&wq);
 	if (ret) {
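
The key ranges that partition the realtime device can be sketched as
follows.  This is a simplified illustration, assuming every rtgroup spans
the same number of bytes; demo_rtgroup_range() is a hypothetical helper,
whereas the patch itself gets the group size from bytes_per_rtgroup().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: first and last byte covered by rtgroup @rgno. */
static void
demo_rtgroup_range(
	uint32_t	rgno,
	uint64_t	bytes_per_group,
	uint64_t	*low,
	uint64_t	*high)
{
	assert(bytes_per_group > 0);

	/* Widen before multiplying so the product is computed in 64 bits. */
	*low = (uint64_t)rgno * bytes_per_group;
	*high = ((uint64_t)rgno + 1) * bytes_per_group - 1;
}
```

The ranges of adjacent groups are contiguous and do not overlap, which is
what lets each workqueue item issue its own GETFSMAP query without
reporting the same mapping twice.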



* [PATCH 48/51] xfs_scrub: trim realtime volumes too
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (46 preceding siblings ...)
  2024-12-23 22:24   ` [PATCH 47/51] xfs_scrub: call GETFSMAP for each rt group in parallel Darrick J. Wong
@ 2024-12-23 22:24   ` Darrick J. Wong
  2024-12-23 22:24   ` [PATCH 49/51] xfs_scrub: use histograms to speed up phase 8 on the realtime volume Darrick J. Wong
                     ` (2 subsequent siblings)
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:24 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

On the kernel side, the XFS realtime groups patchset added support for
FITRIM of the realtime volume.  This support doesn't actually require
there to be any realtime groups, so teach xfs_scrub to trim the entire
realtime volume after it finishes with the data volume.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 scrub/phase8.c |   32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)


diff --git a/scrub/phase8.c b/scrub/phase8.c
index 1c88460c33962b..adb177ecdafbeb 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -59,7 +59,8 @@ fstrim_fsblocks(
 	struct scrub_ctx	*ctx,
 	uint64_t		start_fsb,
 	uint64_t		fsbcount,
-	uint64_t		minlen_fsb)
+	uint64_t		minlen_fsb,
+	bool			ignore_einval)
 {
 	uint64_t		start = cvt_off_fsb_to_b(&ctx->mnt, start_fsb);
 	uint64_t		len = cvt_off_fsb_to_b(&ctx->mnt, fsbcount);
@@ -72,6 +73,8 @@ fstrim_fsblocks(
 		run = min(len, FSTRIM_MAX_BYTES);
 
 		error = fstrim(ctx, start, run, minlen);
+		if (error == EINVAL && ignore_einval)
+			error = EOPNOTSUPP;
 		if (error == EOPNOTSUPP) {
 			/* Pretend we finished all the work. */
 			progress_add(len);
@@ -193,7 +196,8 @@ fstrim_datadev(
 		 */
 		progress_add(geo->blocksize);
 		fsbcount = min(geo->datablocks - fsbno, geo->agblocks);
-		error = fstrim_fsblocks(ctx, fsbno, fsbcount, minlen_fsb);
+		error = fstrim_fsblocks(ctx, fsbno, fsbcount, minlen_fsb,
+				false);
 		if (error)
 			return error;
 	}
@@ -201,15 +205,35 @@ fstrim_datadev(
 	return 0;
 }
 
+/* Trim the realtime device. */
+static int
+fstrim_rtdev(
+	struct scrub_ctx	*ctx)
+{
+	struct xfs_fsop_geom	*geo = &ctx->mnt.fsgeom;
+
+	/*
+	 * The fstrim ioctl pretends that the realtime volume is in the address
+	 * space immediately after the data volume.  Ignore EINVAL if someone
+	 * tries to run us on an older kernel.
+	 */
+	return fstrim_fsblocks(ctx, geo->datablocks, geo->rtblocks, 0, true);
+}
+
 /* Trim the filesystem, if desired. */
 int
 phase8_func(
 	struct scrub_ctx	*ctx)
 {
+	int			error;
+
 	if (!fstrim_ok(ctx))
 		return 0;
 
-	return fstrim_datadev(ctx);
+	error = fstrim_datadev(ctx);
+	if (error)
+		return error;
+	return fstrim_rtdev(ctx);
 }
 
 /* Estimate how much work we're going to do. */
@@ -223,6 +247,8 @@ phase8_estimate(
 	if (fstrim_ok(ctx)) {
 		*items = cvt_off_fsb_to_b(&ctx->mnt,
 				ctx->mnt.fsgeom.datablocks);
+		*items += cvt_off_fsb_to_b(&ctx->mnt,
+				ctx->mnt.fsgeom.rtblocks);
 	} else {
 		*items = 0;
 	}



* [PATCH 49/51] xfs_scrub: use histograms to speed up phase 8 on the realtime volume
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (47 preceding siblings ...)
  2024-12-23 22:24   ` [PATCH 48/51] xfs_scrub: trim realtime volumes too Darrick J. Wong
@ 2024-12-23 22:24   ` Darrick J. Wong
  2024-12-23 22:24   ` [PATCH 50/51] mkfs: add headers to realtime bitmap blocks Darrick J. Wong
  2024-12-23 22:25   ` [PATCH 51/51] mkfs: format realtime groups Darrick J. Wong
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:24 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Use the same statistical methods that we use on the data volume to
compute the minimum threshold size for fstrims on the realtime volume.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 scrub/phase7.c    |    7 +++++++
 scrub/phase8.c    |    6 +++++-
 scrub/xfs_scrub.c |    2 ++
 scrub/xfs_scrub.h |    1 +
 4 files changed, 15 insertions(+), 1 deletion(-)


diff --git a/scrub/phase7.c b/scrub/phase7.c
index 475d8f157eecca..01097b67879878 100644
--- a/scrub/phase7.c
+++ b/scrub/phase7.c
@@ -31,6 +31,7 @@ struct summary_counts {
 
 	/* Free space histogram, in fsb */
 	struct histogram	datadev_hist;
+	struct histogram	rtdev_hist;
 };
 
 /*
@@ -56,6 +57,7 @@ summary_count_init(
 	struct summary_counts	*counts = data;
 
 	init_freesp_hist(&counts->datadev_hist);
+	init_freesp_hist(&counts->rtdev_hist);
 }
 
 /* Record block usage. */
@@ -83,6 +85,8 @@ count_block_summary(
 		blocks = cvt_b_to_off_fsbt(&ctx->mnt, fsmap->fmr_length);
 		if (fsmap->fmr_device == ctx->fsinfo.fs_datadev)
 			hist_add(&counts->datadev_hist, blocks);
+		else if (fsmap->fmr_device == ctx->fsinfo.fs_rtdev)
+			hist_add(&counts->rtdev_hist, blocks);
 		return 0;
 	}
 
@@ -124,7 +128,9 @@ add_summaries(
 	total->agbytes += item->agbytes;
 
 	hist_import(&total->datadev_hist, &item->datadev_hist);
+	hist_import(&total->rtdev_hist, &item->rtdev_hist);
 	hist_free(&item->datadev_hist);
+	hist_free(&item->rtdev_hist);
 	return 0;
 }
 
@@ -195,6 +201,7 @@ phase7_func(
 
 	/* Preserve free space histograms for phase 8. */
 	hist_move(&ctx->datadev_hist, &totalcount.datadev_hist);
+	hist_move(&ctx->rtdev_hist, &totalcount.rtdev_hist);
 
 	/* Scan the whole fs. */
 	error = scrub_count_all_inodes(ctx, &counted_inodes);
diff --git a/scrub/phase8.c b/scrub/phase8.c
index adb177ecdafbeb..e8c72d8eb851af 100644
--- a/scrub/phase8.c
+++ b/scrub/phase8.c
@@ -211,13 +211,17 @@ fstrim_rtdev(
 	struct scrub_ctx	*ctx)
 {
 	struct xfs_fsop_geom	*geo = &ctx->mnt.fsgeom;
+	uint64_t		minlen_fsb;
+
+	minlen_fsb = fstrim_compute_minlen(ctx, &ctx->rtdev_hist);
 
 	/*
 	 * The fstrim ioctl pretends that the realtime volume is in the address
 	 * space immediately after the data volume.  Ignore EINVAL if someone
 	 * tries to run us on an older kernel.
 	 */
-	return fstrim_fsblocks(ctx, geo->datablocks, geo->rtblocks, 0, true);
+	return fstrim_fsblocks(ctx, geo->datablocks, geo->rtblocks,
+			minlen_fsb, true);
 }
 
 /* Trim the filesystem, if desired. */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 3e7d9138f97ec2..90897cc26cd71d 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -728,6 +728,7 @@ main(
 	int			error;
 
 	hist_init(&ctx.datadev_hist);
+	hist_init(&ctx.rtdev_hist);
 
 	fprintf(stdout, "EXPERIMENTAL xfs_scrub program in use! Use at your own risk!\n");
 	fflush(stdout);
@@ -960,6 +961,7 @@ main(
 	unicrash_unload();
 
 	hist_free(&ctx.datadev_hist);
+	hist_free(&ctx.rtdev_hist);
 
 	/*
 	 * If we're being run as a service, the return code must fit the LSB
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 5d336cb55c7422..6ee359f4cebd47 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -101,6 +101,7 @@ struct scrub_ctx {
 
 	/* Free space histograms, in fsb */
 	struct histogram	datadev_hist;
+	struct histogram	rtdev_hist;
 
 	/*
 	 * Pick the largest value for fstrim minlen such that we trim at least



* [PATCH 50/51] mkfs: add headers to realtime bitmap blocks
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (48 preceding siblings ...)
  2024-12-23 22:24   ` [PATCH 49/51] xfs_scrub: use histograms to speed up phase 8 on the realtime volume Darrick J. Wong
@ 2024-12-23 22:24   ` Darrick J. Wong
  2024-12-23 22:25   ` [PATCH 51/51] mkfs: format realtime groups Darrick J. Wong
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:24 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When the rtgroups feature is enabled, format rtbitmap blocks with the
appropriate block headers.  libxfs takes care of the actual writing for
us, so all we have to do is ensure that the bitmap is the correct size.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mkfs/xfs_mkfs.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)


diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 2549a636568d1b..5b9fd0e92f7aba 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3210,6 +3210,7 @@ validate_rtdev(
 	struct cli_params	*cli)
 {
 	struct libxfs_init	*xi = cli->xi;
+	unsigned int		rbmblocksize = cfg->blocksize;
 
 	if (!xi->rt.dev) {
 		if (cli->rtsize) {
@@ -3253,8 +3254,10 @@ reported by the device (%u).\n"),
 _("cannot have an rt subvolume with zero extents\n"));
 		usage();
 	}
+	if (cfg->sb_feat.metadir)
+		rbmblocksize -= sizeof(struct xfs_rtbuf_blkinfo);
 	cfg->rtbmblocks = (xfs_extlen_t)howmany(cfg->rtextents,
-						NBBY * cfg->blocksize);
+						NBBY * rbmblocksize);
 }
 
 static bool
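
The arithmetic behind this change can be sketched in isolation: every
bitmap block gives up some bytes to the header, so fewer rt extents fit
per block and the bitmap may need more blocks.  demo_rtbitmap_blocks() is
a hypothetical helper, and the header size is a parameter rather than
sizeof(struct xfs_rtbuf_blkinfo) so that the sketch stays self-contained.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NBBY	8	/* bits per byte */

/*
 * Number of bitmap blocks needed to track @rtextents extents, one bit
 * each, when @hdr_bytes of every block are reserved for a header.
 */
static uint64_t
demo_rtbitmap_blocks(
	uint64_t	rtextents,
	uint32_t	blocksize,
	uint32_t	hdr_bytes)
{
	uint64_t	bits_per_block;

	assert(hdr_bytes < blocksize);

	bits_per_block = (uint64_t)DEMO_NBBY * (blocksize - hdr_bytes);

	/* Round up, as howmany() does. */
	return (rtextents + bits_per_block - 1) / bits_per_block;
}
```

For example, with 4096-byte blocks and 65536 rt extents, a headerless
bitmap needs 2 blocks, while an illustrative 48-byte header leaves
4048 * 8 = 32384 bits per block and pushes the count to 3.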



* [PATCH 51/51] mkfs: format realtime groups
  2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (49 preceding siblings ...)
  2024-12-23 22:24   ` [PATCH 50/51] mkfs: add headers to realtime bitmap blocks Darrick J. Wong
@ 2024-12-23 22:25   ` Darrick J. Wong
  50 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:25 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create filesystems with the realtime group feature enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/div64.h          |    6 +
 libfrog/util.c           |   12 ++
 libfrog/util.h           |    1 
 libxfs/libxfs_api_defs.h |    2 
 libxfs/libxfs_priv.h     |    6 -
 libxfs/topology.c        |   42 +++++++
 libxfs/topology.h        |    3 
 man/man8/mkfs.xfs.8.in   |   31 +++++
 mkfs/proto.c             |   88 ++++++++++++--
 mkfs/xfs_mkfs.c          |  286 +++++++++++++++++++++++++++++++++++++++++++++-
 10 files changed, 448 insertions(+), 29 deletions(-)


diff --git a/libfrog/div64.h b/libfrog/div64.h
index 673b01cbab34de..4b0d4c3b3c704d 100644
--- a/libfrog/div64.h
+++ b/libfrog/div64.h
@@ -93,4 +93,10 @@ howmany_64(uint64_t x, uint32_t y)
 	return x;
 }
 
+static inline __attribute__((const))
+int is_power_of_2(unsigned long n)
+{
+	return (n != 0 && ((n & (n - 1)) == 0));
+}
+
 #endif /* LIBFROG_DIV64_H_ */
diff --git a/libfrog/util.c b/libfrog/util.c
index 46047571a5531f..4e130c884c17a2 100644
--- a/libfrog/util.c
+++ b/libfrog/util.c
@@ -36,3 +36,15 @@ memchr_inv(const void *start, int c, size_t bytes)
 
 	return NULL;
 }
+
+unsigned int
+log2_rounddown(unsigned long long i)
+{
+	int	rval;
+
+	for (rval = NBBY * sizeof(i) - 1; rval >= 0; rval--) {
+		if ((1ULL << rval) < i)
+			break;
+	}
+	return rval;
+}
diff --git a/libfrog/util.h b/libfrog/util.h
index 8b4ee7c1333b6b..d1c4dd40fc926c 100644
--- a/libfrog/util.h
+++ b/libfrog/util.h
@@ -9,6 +9,7 @@
 #include <sys/types.h>
 
 unsigned int	log2_roundup(unsigned int i);
+unsigned int	log2_rounddown(unsigned long long i);
 
 #define min_t(type,x,y) \
 	({ type __x = (x); type __y = (y); __x < __y ? __x: __y; })
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index dbdf5d100ec8e9..a8416dfbb27f59 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -94,6 +94,7 @@
 #define xfs_btree_stage_afakeroot	libxfs_btree_stage_afakeroot
 #define xfs_btree_stage_ifakeroot	libxfs_btree_stage_ifakeroot
 #define xfs_btree_visit_blocks		libxfs_btree_visit_blocks
+#define xfs_buf_delwri_queue		libxfs_buf_delwri_queue
 #define xfs_buf_delwri_submit		libxfs_buf_delwri_submit
 #define xfs_buf_get			libxfs_buf_get
 #define xfs_buf_get_uncached		libxfs_buf_get_uncached
@@ -302,6 +303,7 @@
 #define xfs_rtfree_blocks		libxfs_rtfree_blocks
 #define xfs_update_rtsb			libxfs_update_rtsb
 #define xfs_sb_from_disk		libxfs_sb_from_disk
+#define xfs_sb_mount_rextsize		libxfs_sb_mount_rextsize
 #define xfs_sb_quota_from_disk		libxfs_sb_quota_from_disk
 #define xfs_sb_read_secondary		libxfs_sb_read_secondary
 #define xfs_sb_to_disk			libxfs_sb_to_disk
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index dd24bdc2d169d9..a1401b2c1e409b 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -307,12 +307,6 @@ find_next_zero_bit(const unsigned long *addr, unsigned long size,
 }
 #define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0)
 
-static inline __attribute__((const))
-int is_power_of_2(unsigned long n)
-{
-	return (n != 0 && ((n & (n - 1)) == 0));
-}
-
 /*
  * xfs_iroundup: round up argument to next power of two
  */
diff --git a/libxfs/topology.c b/libxfs/topology.c
index 94adb5be7bdcae..8c6affb4c4e436 100644
--- a/libxfs/topology.c
+++ b/libxfs/topology.c
@@ -87,6 +87,48 @@ calc_default_ag_geometry(
 	*agcount = dblocks / blocks + (dblocks % blocks != 0);
 }
 
+void
+calc_default_rtgroup_geometry(
+	int		blocklog,
+	uint64_t	rblocks,
+	uint64_t	*rgsize,
+	uint64_t	*rgcount)
+{
+	uint64_t	blocks = 0;
+	int		shift = 0;
+
+	/*
+	 * For a single underlying storage device over 4TB in size use the
+	 * maximum rtgroup size.  Between 128MB and 4TB, just use 4 rtgroups
+	 * and scale up smoothly between min/max rtgroup sizes.
+	 */
+	if (rblocks >= TERABYTES(4, blocklog)) {
+		blocks = XFS_MAX_RGBLOCKS;
+		goto done;
+	}
+	if (rblocks >= MEGABYTES(128, blocklog)) {
+		shift = XFS_NOMULTIDISK_AGLOG;
+		goto calc_blocks;
+	}
+
+	/*
+	 * If rblocks is not evenly divisible by the number of desired rt
+	 * groups, round "blocks" up so we don't lose the last bit of the
+	 * filesystem. The same principle applies to the rt group count, so we
+	 * don't lose the last rt group!
+	 */
+calc_blocks:
+	ASSERT(shift >= 0 && shift <= XFS_MULTIDISK_AGLOG);
+	blocks = rblocks >> shift;
+	if (rblocks & xfs_mask32lo(shift)) {
+		if (blocks < XFS_MAX_RGBLOCKS)
+			blocks++;
+	}
+done:
+	*rgsize = blocks;
+	*rgcount = rblocks / blocks + (rblocks % blocks != 0);
+}
+
 /*
  * Check for existing filesystem or partition table on device.
  * Returns:
diff --git a/libxfs/topology.h b/libxfs/topology.h
index fa0a23b7738624..207a8a7f150556 100644
--- a/libxfs/topology.h
+++ b/libxfs/topology.h
@@ -37,6 +37,9 @@ calc_default_ag_geometry(
 	uint64_t	*agsize,
 	uint64_t	*agcount);
 
+void calc_default_rtgroup_geometry(int blocklog, uint64_t rblocks,
+		uint64_t *rgsize, uint64_t *rgcount);
+
 extern int
 check_overwrite(
 	const char	*device);
diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index de5f6baf59df95..0c0cf1dc151e4f 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -1141,6 +1141,37 @@ .SH OPTIONS
 .BI noalign
 This option disables stripe size detection, enforcing a realtime device with no
 stripe geometry.
+.TP
+.BI rgcount= value
+This is used to specify the number of allocation groups in the realtime
+section.
+The realtime section of the filesystem can be divided into allocation groups to
+improve the performance of XFS.
+More allocation groups imply that more parallelism can be achieved when
+allocating blocks.
+The minimum allocation group size is 2 realtime extents; the maximum size is
+2^31 blocks.
+The rt section of the filesystem is divided into
+.I value
+allocation groups (default value is scaled automatically based
+on the underlying device size).
+.TP
+.BI rgsize= value
+This is an alternative to using the
+.B rgcount
+suboption. The
+.I value
+is the desired size of the realtime allocation group expressed in bytes
+(usually using the
+.BR m " or " g
+suffixes).
+This value must be a multiple of the realtime extent size,
+must be at least two realtime extents, and no more than 2^31 blocks.
+The
+.B rgcount
+and
+.B rgsize
+suboptions are mutually exclusive.
 .RE
 .PP
 .PD 0
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 846b1c9a9e8a21..4e9e28d4eea1ca 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -1008,8 +1008,8 @@ create_sb_metadata_file(
 }
 
 /*
- * Free the whole realtime area using transactions.
- * Do one transaction per bitmap block.
+ * Free the whole realtime area using transactions.  Each transaction may clear
+ * up to 32 rtbitmap blocks.
  */
 static void
 rtfreesp_init(
@@ -1017,8 +1017,8 @@ rtfreesp_init(
 {
 	struct xfs_mount	*mp = rtg_mount(rtg);
 	struct xfs_trans	*tp;
-	xfs_rtxnum_t		rtx;
-	xfs_rtxnum_t		ertx;
+	const xfs_rtxnum_t	max_rtx = mp->m_rtx_per_rbmblock * 32;
+	xfs_rtxnum_t		start_rtx = 0;
 	int			error;
 
 	/*
@@ -1034,37 +1034,56 @@ rtfreesp_init(
 	if (error)
 		fail(_("Initialization of rtsummary inode failed"), error);
 
+	if (!mp->m_sb.sb_rbmblocks)
+		return;
+
 	/*
 	 * Then free the blocks into the allocator, one bitmap block at a time.
 	 */
-	for (rtx = 0; rtx < mp->m_sb.sb_rextents; rtx = ertx) {
+	while (start_rtx < rtg->rtg_extents) {
+		xfs_rtxlen_t	nr = min(rtg->rtg_extents - start_rtx, max_rtx);
+
+		/*
+		 * The rt superblock, if present, must not be marked free.
+		 * This may be the only rtx in the entire volume.
+		 */
+		if (xfs_has_rtsb(mp) && rtg_rgno(rtg) == 0 && start_rtx == 0) {
+			start_rtx++;
+			nr--;
+
+			if (start_rtx == rtg->rtg_extents)
+				break;
+		}
+
 		error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
 				0, 0, 0, &tp);
 		if (error)
 			res_failed(error);
 
 		libxfs_trans_ijoin(tp, rtg->rtg_inodes[XFS_RTGI_BITMAP], 0);
-		ertx = min(mp->m_sb.sb_rextents,
-			   rtx + NBBY * mp->m_sb.sb_blocksize);
-
-		error = -libxfs_rtfree_extent(tp, rtg, rtx,
-				(xfs_rtxlen_t)(ertx - rtx));
+		error = -libxfs_rtfree_extent(tp, rtg, start_rtx, nr);
 		if (error) {
-			fail(_("Error initializing the realtime space"),
-				error);
+			fprintf(stderr,
+ _("Error initializing the realtime free space near rgno %u rtx %llu-%llu (max %llu): %s\n"),
+					rtg_rgno(rtg),
+					(unsigned long long)start_rtx,
+					(unsigned long long)start_rtx + nr - 1,
+					(unsigned long long)rtg->rtg_extents,
+					strerror(error));
+			exit(1);
 		}
+
 		error = -libxfs_trans_commit(tp);
 		if (error)
 			fail(_("Initialization of the realtime space failed"),
 					error);
+
+		start_rtx += nr;
 	}
 }
 
-/*
- * Allocate the realtime bitmap and summary inodes, and fill in data if any.
- */
 static void
-rtinit(
+rtinit_nogroups(
 	struct xfs_mount	*mp)
 {
 	struct xfs_rtgroup	*rtg = NULL;
@@ -1079,6 +1098,43 @@ rtinit(
 	}
 }
 
+static void
+rtinit_groups(
+	struct xfs_mount	*mp)
+{
+	struct xfs_rtgroup	*rtg = NULL;
+	unsigned int		i;
+	int			error;
+
+	error = -libxfs_rtginode_mkdir_parent(mp);
+	if (error)
+		fail(_("rtgroup directory allocation failed"), error);
+
+	while ((rtg = xfs_rtgroup_next(mp, rtg))) {
+		for (i = 0; i < XFS_RTGI_MAX; i++) {
+			error = -libxfs_rtginode_create(rtg, i, true);
+			if (error)
+				fail(_("rt group inode creation failed"),
+						error);
+		}
+
+		rtfreesp_init(rtg);
+	}
+}
+
+/*
+ * Allocate the realtime bitmap and summary inodes, and fill in data if any.
+ */
+static void
+rtinit(
+	struct xfs_mount	*mp)
+{
+	if (xfs_has_rtgroups(mp))
+		rtinit_groups(mp);
+	else
+		rtinit_nogroups(mp);
+}
+
 static off_t
 filesize(
 	int		fd)
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 5b9fd0e92f7aba..cd94cfd0b93706 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -132,6 +132,8 @@ enum {
 	R_FILE,
 	R_NAME,
 	R_NOALIGN,
+	R_RGCOUNT,
+	R_RGSIZE,
 	R_MAX_OPTS,
 };
 
@@ -727,6 +729,8 @@ static struct opt_params ropts = {
 		[R_FILE] = "file",
 		[R_NAME] = "name",
 		[R_NOALIGN] = "noalign",
+		[R_RGCOUNT] = "rgcount",
+		[R_RGSIZE] = "rgsize",
 		[R_MAX_OPTS] = NULL,
 	},
 	.subopt_params = {
@@ -766,6 +770,21 @@ static struct opt_params ropts = {
 		  .defaultval = 1,
 		  .conflicts = { { NULL, LAST_CONFLICT } },
 		},
+		{ .index = R_RGCOUNT,
+		  .conflicts = { { &ropts, R_RGSIZE },
+				 { NULL, LAST_CONFLICT } },
+		  .minval = 1,
+		  .maxval = XFS_MAX_RGNUMBER,
+		  .defaultval = SUBOPT_NEEDS_VAL,
+		},
+		{ .index = R_RGSIZE,
+		  .conflicts = { { &ropts, R_RGCOUNT },
+				 { NULL, LAST_CONFLICT } },
+		  .convert = true,
+		  .minval = 0,
+		  .maxval = (unsigned long long)XFS_MAX_RGBLOCKS << XFS_MAX_BLOCKSIZE_LOG,
+		  .defaultval = SUBOPT_NEEDS_VAL,
+		},
 	},
 };
 
@@ -940,6 +959,7 @@ struct cli_params {
 	/* parameters that depend on sector/block size being validated. */
 	char	*dsize;
 	char	*agsize;
+	char	*rgsize;
 	char	*dsu;
 	char	*dirblocksize;
 	char	*logsize;
@@ -961,6 +981,7 @@ struct cli_params {
 
 	/* parameters where 0 is not a valid value */
 	int64_t	agcount;
+	int64_t	rgcount;
 	int	inodesize;
 	int	inopblock;
 	int	imaxpct;
@@ -1017,6 +1038,9 @@ struct mkfs_params {
 	uint64_t	agsize;
 	uint64_t	agcount;
 
+	uint64_t	rgsize;
+	uint64_t	rgcount;
+
 	int		imaxpct;
 
 	bool		loginternal;
@@ -1075,7 +1099,7 @@ usage( void )
 /* no-op info only */	[-N]\n\
 /* prototype file */	[-p fname]\n\
 /* quiet */		[-q]\n\
-/* realtime subvol */	[-r extsize=num,size=num,rtdev=xxx]\n\
+/* realtime subvol */	[-r extsize=num,size=num,rtdev=xxx,rgcount=n,rgsize=n]\n\
 /* sectorsize */	[-s size=num]\n\
 /* version */		[-V]\n\
 			devicename\n\
@@ -1989,6 +2013,12 @@ rtdev_opts_parser(
 	case R_NOALIGN:
 		cli->sb_feat.nortalign = getnum(value, opts, subopt);
 		break;
+	case R_RGCOUNT:
+		cli->rgcount = getnum(value, opts, subopt);
+		break;
+	case R_RGSIZE:
+		cli->rgsize = getstr(value, opts, subopt);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2531,6 +2561,20 @@ _("cowextsize not supported without reflink support\n"));
 		cli->sb_feat.exchrange = true;
 	}
 
+	/*
+	 * Exchange-range will be needed for space reorganization on filesystems
+	 * with realtime rmap or realtime reflink enabled, and there is no good
+	 * reason to ever disable it on a file system with new enough features.
+	 */
+	if (cli->sb_feat.metadir && !cli->sb_feat.exchrange) {
+		if (cli_opt_set(&iopts, I_EXCHANGE)) {
+			fprintf(stderr,
+_("metadir not supported without exchange-range support\n"));
+			usage();
+		}
+		cli->sb_feat.exchrange = true;
+	}
+
 	/*
 	 * Copy features across to config structure now.
 	 */
@@ -3210,7 +3254,6 @@ validate_rtdev(
 	struct cli_params	*cli)
 {
 	struct libxfs_init	*xi = cli->xi;
-	unsigned int		rbmblocksize = cfg->blocksize;
 
 	if (!xi->rt.dev) {
 		if (cli->rtsize) {
@@ -3254,10 +3297,13 @@ reported by the device (%u).\n"),
 _("cannot have an rt subvolume with zero extents\n"));
 		usage();
 	}
-	if (cfg->sb_feat.metadir)
-		rbmblocksize -= sizeof(struct xfs_rtbuf_blkinfo);
+
+	/*
+	 * Note for rtgroup file systems this will be overriden in
+	 * calculate_rtgroup_geometry.
+	 */
 	cfg->rtbmblocks = (xfs_extlen_t)howmany(cfg->rtextents,
-						NBBY * rbmblocksize);
+						NBBY * cfg->blocksize);
 }
 
 static bool
@@ -3498,6 +3544,189 @@ an AG size that is one stripe unit smaller or larger, for example %llu.\n"),
 			     cfg->agsize, cfg->agcount);
 }
 
+static uint64_t
+calc_rgsize_extsize_nonpower(
+	struct mkfs_params	*cfg)
+{
+	uint64_t		try_rgsize, rgsize, rgcount;
+
+	/*
+	 * For non-power-of-two rt extent sizes, round the rtgroup size down to
+	 * the nearest extent.
+	 */
+	calc_default_rtgroup_geometry(cfg->blocklog, cfg->rtblocks, &rgsize,
+			&rgcount);
+	rgsize -= rgsize % cfg->rtextblocks;
+	rgsize = min(XFS_MAX_RGBLOCKS, rgsize);
+
+	/*
+	 * If we would be left with a too-small rtgroup, increase or decrease
+	 * the size of the group until we have a working geometry.
+	 */
+	for (try_rgsize = rgsize;
+	     try_rgsize <= XFS_MAX_RGBLOCKS - cfg->rtextblocks;
+	     try_rgsize += cfg->rtextblocks) {
+		if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks))
+			return try_rgsize;
+	}
+	for (try_rgsize = rgsize;
+	     try_rgsize > (2 * cfg->rtextblocks);
+	     try_rgsize -= cfg->rtextblocks) {
+		if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks))
+			return try_rgsize;
+	}
+
+	fprintf(stderr,
+_("realtime group size (%llu) not at all congruent with extent size (%llu)\n"),
+			(unsigned long long)rgsize,
+			(unsigned long long)cfg->rtextblocks);
+	usage();
+	return 0;
+}
+
+static uint64_t
+calc_rgsize_extsize_power(
+	struct mkfs_params	*cfg)
+{
+	uint64_t		try_rgsize, rgsize, rgcount;
+	unsigned int		rgsizelog;
+
+	/*
+	 * Find the rt group size that is both a power of two and yields at
+	 * least as many rt groups as the default geometry specified.
+	 */
+	calc_default_rtgroup_geometry(cfg->blocklog, cfg->rtblocks, &rgsize,
+			&rgcount);
+	rgsizelog = log2_rounddown(rgsize);
+	rgsize = min(XFS_MAX_RGBLOCKS, 1U << rgsizelog);
+
+	/*
+	 * If we would be left with a too-small rtgroup, increase or decrease
+	 * the size of the group by powers of 2 until we have a working
+	 * geometry.  If that doesn't work, try bumping by the extent size.
+	 */
+	for (try_rgsize = rgsize;
+	     try_rgsize <= XFS_MAX_RGBLOCKS - cfg->rtextblocks;
+	     try_rgsize <<= 2) {
+		if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks))
+			return try_rgsize;
+	}
+	for (try_rgsize = rgsize;
+	     try_rgsize > (2 * cfg->rtextblocks);
+	     try_rgsize >>= 2) {
+		if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks))
+			return try_rgsize;
+	}
+	for (try_rgsize = rgsize;
+	     try_rgsize <= XFS_MAX_RGBLOCKS - cfg->rtextblocks;
+	     try_rgsize += cfg->rtextblocks) {
+		if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks))
+			return try_rgsize;
+	}
+	for (try_rgsize = rgsize;
+	     try_rgsize > (2 * cfg->rtextblocks);
+	     try_rgsize -= cfg->rtextblocks) {
+		if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks))
+			return try_rgsize;
+	}
+
+	fprintf(stderr,
+_("realtime group size (%llu) not at all congruent with extent size (%llu)\n"),
+			(unsigned long long)rgsize,
+			(unsigned long long)cfg->rtextblocks);
+	usage();
+	return 0;
+}
+
+static void
+calculate_rtgroup_geometry(
+	struct mkfs_params	*cfg,
+	struct cli_params	*cli)
+{
+	if (!cli->sb_feat.metadir) {
+		cfg->rgcount = 0;
+		cfg->rgsize = 0;
+		return;
+	}
+
+	if (cli->rgsize) {	/* User-specified rtgroup size */
+		cfg->rgsize = getnum(cli->rgsize, &ropts, R_RGSIZE);
+
+		/*
+		 * Check specified rgsize is a multiple of blocksize.
+		 */
+		if (cfg->rgsize % cfg->blocksize) {
+			fprintf(stderr,
+_("rgsize (%s) not a multiple of fs blk size (%d)\n"),
+				cli->rgsize, cfg->blocksize);
+			usage();
+		}
+		cfg->rgsize /= cfg->blocksize;
+		cfg->rgcount = cfg->rtblocks / cfg->rgsize +
+				(cfg->rtblocks % cfg->rgsize != 0);
+
+	} else if (cli->rgcount) {	/* User-specified rtgroup count */
+		cfg->rgcount = cli->rgcount;
+		cfg->rgsize = cfg->rtblocks / cfg->rgcount +
+				(cfg->rtblocks % cfg->rgcount != 0);
+	} else if (cfg->rtblocks == 0) {
+		/*
+		 * If nobody specified a realtime device or the rtgroup size,
+		 * try 1TB, rounded down to the nearest rt extent.
+		 */
+		cfg->rgsize = TERABYTES(1, cfg->blocklog);
+		cfg->rgsize -= cfg->rgsize % cfg->rtextblocks;
+		cfg->rgcount = 0;
+	} else if (cfg->rtblocks < cfg->rtextblocks * 2) {
+		/* too small even for a single group */
+		cfg->rgsize = cfg->rtblocks;
+		cfg->rgcount = 0;
+	} else if (is_power_of_2(cfg->rtextblocks)) {
+		cfg->rgsize = calc_rgsize_extsize_power(cfg);
+		cfg->rgcount = cfg->rtblocks / cfg->rgsize +
+				(cfg->rtblocks % cfg->rgsize != 0);
+	} else {
+		cfg->rgsize = calc_rgsize_extsize_nonpower(cfg);
+		cfg->rgcount = cfg->rtblocks / cfg->rgsize +
+				(cfg->rtblocks % cfg->rgsize != 0);
+	}
+
+	if (cfg->rgsize > XFS_MAX_RGBLOCKS) {
+		fprintf(stderr,
+_("realtime group size (%llu) must be less than the maximum (%u)\n"),
+				(unsigned long long)cfg->rgsize,
+				XFS_MAX_RGBLOCKS);
+		usage();
+	}
+
+	if (cfg->rgsize % cfg->rtextblocks != 0) {
+		fprintf(stderr,
+_("realtime group size (%llu) not a multiple of rt extent size (%llu)\n"),
+				(unsigned long long)cfg->rgsize,
+				(unsigned long long)cfg->rtextblocks);
+		usage();
+	}
+
+	if (cfg->rgsize <= cfg->rtextblocks) {
+		fprintf(stderr,
+_("realtime group size (%llu) must be at least two realtime extents\n"),
+				(unsigned long long)cfg->rgsize);
+		usage();
+	}
+
+	if (cfg->rgcount > XFS_MAX_RGNUMBER) {
+		fprintf(stderr,
+_("realtime group count (%llu) must be less than the maximum (%u)\n"),
+				(unsigned long long)cfg->rgcount,
+				XFS_MAX_RGNUMBER);
+		usage();
+	}
+
+	if (cfg->rtextents)
+		cfg->rtbmblocks = howmany(cfg->rgsize / cfg->rtextblocks,
+			NBBY * (cfg->blocksize - sizeof(struct xfs_rtbuf_blkinfo)));
+}
+
 static void
 calculate_imaxpct(
 	struct mkfs_params	*cfg,
@@ -3651,8 +3880,13 @@ sb_set_features(
 		 */
 		sbp->sb_versionnum |= XFS_SB_VERSION_ATTRBIT;
 	}
-	if (fp->metadir)
+	if (fp->metadir) {
 		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_METADIR;
+		sbp->sb_rgcount = cfg->rgcount;
+		sbp->sb_rgextents = cfg->rgsize / cfg->rtextblocks;
+		sbp->sb_rgblklog = libxfs_compute_rgblklog(sbp->sb_rgextents,
+							   cfg->rtextblocks);
+	}
 }
 
 /*
@@ -4046,6 +4280,7 @@ start_superblock_setup(
 	sbp->sb_rblocks = cfg->rtblocks;
 	sbp->sb_rextsize = cfg->rtextblocks;
 	mp->m_features |= libxfs_sb_version_to_features(sbp);
+	libxfs_sb_mount_rextsize(mp, sbp);
 }
 
 static void
@@ -4106,7 +4341,7 @@ finish_superblock_setup(
 	sbp->sb_unit = cfg->dsunit;
 	sbp->sb_width = cfg->dswidth;
 	mp->m_features |= libxfs_sb_version_to_features(sbp);
-
+	libxfs_sb_mount_rextsize(mp, sbp);
 }
 
 /* Prepare an uncached buffer, ready to write something out. */
@@ -4488,6 +4723,39 @@ set_autofsck(
 	libxfs_irele(args.dp);
 }
 
+/* Write the realtime superblock */
+static void
+write_rtsb(
+	struct xfs_mount	*mp)
+{
+	struct xfs_buf		*rtsb_bp;
+	struct xfs_buf		*sb_bp = libxfs_getsb(mp);
+	int			error;
+
+	if (!sb_bp) {
+		fprintf(stderr,
+  _("%s: couldn't grab primary superblock buffer\n"), progname);
+		exit(1);
+	}
+
+	error = -libxfs_buf_get_uncached(mp->m_rtdev_targp,
+				XFS_FSB_TO_BB(mp, 1), XFS_RTSB_DADDR,
+				&rtsb_bp);
+	if (error) {
+		fprintf(stderr,
+ _("%s: couldn't grab realtime superblock buffer\n"), progname);
+		exit(1);
+	}
+
+	rtsb_bp->b_maps[0].bm_bn = XFS_RTSB_DADDR;
+	rtsb_bp->b_ops = &xfs_rtsb_buf_ops;
+
+	libxfs_update_rtsb(rtsb_bp, sb_bp);
+	libxfs_buf_mark_dirty(rtsb_bp);
+	libxfs_buf_relse(rtsb_bp);
+	libxfs_buf_relse(sb_bp);
+}
+
 int
 main(
 	int			argc,
@@ -4704,6 +4972,7 @@ main(
 	 */
 	calculate_initial_ag_geometry(&cfg, &cli, &xi);
 	align_ag_geometry(&cfg);
+	calculate_rtgroup_geometry(&cfg, &cli);
 
 	calculate_imaxpct(&cfg, &cli);
 
@@ -4804,6 +5073,9 @@ main(
 		exit(1);
 	}
 
+	if (xfs_has_rtsb(mp) && cfg.rtblocks > 0)
+		write_rtsb(mp);
+
 	/*
 	 * Initialise the freespace freelists (i.e. AGFLs) in each AG.
 	 */

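For readers following the rtgroup geometry code above, here is a minimal
standalone sketch of the two pieces of arithmetic it relies on: the
round-up group count, and the search for a group size that leaves a usable
final group when the rt extent size is not a power of two.  This is an
illustration only, not the mkfs code: the sketch_* names and
SKETCH_MAX_RGBLOCKS are hypothetical stand-ins, and the default group size
is passed in by the caller instead of coming from
calc_default_rtgroup_geometry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for XFS_MAX_RGBLOCKS; the real limit is in libxfs. */
#define SKETCH_MAX_RGBLOCKS	((((uint64_t)1) << 31) - 1)

/* Number of groups needed to cover rtblocks, rounding up. */
static uint64_t
sketch_rgcount(uint64_t rtblocks, uint64_t rgsize)
{
	assert(rgsize > 0);
	return rtblocks / rgsize + (rtblocks % rgsize != 0);
}

/*
 * Round the default group size down to a whole number of rt extents, then
 * step up (and, failing that, down) by one rt extent at a time until the
 * blocks left over for the last group cover at least two rt extents.
 * Returns 0 if no candidate size satisfies that test.
 */
static uint64_t
sketch_pick_rgsize(uint64_t rtblocks, uint64_t rtextblocks,
		   uint64_t def_rgsize)
{
	uint64_t	rgsize, try_rgsize;

	assert(rtextblocks > 0 && rtextblocks < SKETCH_MAX_RGBLOCKS);

	rgsize = def_rgsize - def_rgsize % rtextblocks;
	if (rgsize > SKETCH_MAX_RGBLOCKS)
		rgsize = SKETCH_MAX_RGBLOCKS;
	if (rgsize == 0)
		return 0;	/* default smaller than one rt extent */

	/* Try growing the group first. */
	for (try_rgsize = rgsize;
	     try_rgsize <= SKETCH_MAX_RGBLOCKS - rtextblocks;
	     try_rgsize += rtextblocks) {
		if (rtblocks % try_rgsize >= 2 * rtextblocks)
			return try_rgsize;
	}

	/* Then try shrinking it, never going below two rt extents. */
	for (try_rgsize = rgsize;
	     try_rgsize > 2 * rtextblocks;
	     try_rgsize -= rtextblocks) {
		if (rtblocks % try_rgsize >= 2 * rtextblocks)
			return try_rgsize;
	}

	return 0;
}
```

Note that, as in the patch, a candidate that divides the volume exactly has
a remainder of zero and is therefore skipped by this test.  With 990
blocks, 3-block extents and a default of 99 blocks, the sketch moves on
from 99 and settles on 102.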

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 1/7] libfrog: scrub quota file metapaths
  2024-12-23 21:36 ` [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
@ 2024-12-23 22:25   ` Darrick J. Wong
  2024-12-23 22:25   ` [PATCH 2/7] xfs_db: support metadir quotas Darrick J. Wong
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:25 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Support scrubbing quota file metadir paths.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/scrub.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)


diff --git a/libfrog/scrub.c b/libfrog/scrub.c
index d40364d35ce0b4..129f592e108ae1 100644
--- a/libfrog/scrub.c
+++ b/libfrog/scrub.c
@@ -187,6 +187,26 @@ const struct xfrog_scrub_descr xfrog_metapaths[XFS_SCRUB_METAPATH_NR] = {
 		.descr	= "rtgroup summary",
 		.group	= XFROG_SCRUB_GROUP_RTGROUP,
 	},
+	[XFS_SCRUB_METAPATH_QUOTADIR] = {
+		.name	= "quotadir",
+		.descr	= "quota file metadir",
+		.group	= XFROG_SCRUB_GROUP_FS,
+	},
+	[XFS_SCRUB_METAPATH_USRQUOTA] = {
+		.name	= "usrquota",
+		.descr	= "user quota file",
+		.group	= XFROG_SCRUB_GROUP_FS,
+	},
+	[XFS_SCRUB_METAPATH_GRPQUOTA] = {
+		.name	= "grpquota",
+		.descr	= "group quota file",
+		.group	= XFROG_SCRUB_GROUP_FS,
+	},
+	[XFS_SCRUB_METAPATH_PRJQUOTA] = {
+		.name	= "prjquota",
+		.descr	= "project quota file",
+		.group	= XFROG_SCRUB_GROUP_FS,
+	},
 };
 
 /* Invoke the scrub ioctl.  Returns zero or negative error code. */


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 2/7] xfs_db: support metadir quotas
  2024-12-23 21:36 ` [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
  2024-12-23 22:25   ` [PATCH 1/7] libfrog: scrub quota file metapaths Darrick J. Wong
@ 2024-12-23 22:25   ` Darrick J. Wong
  2024-12-23 22:25   ` [PATCH 3/7] xfs_repair: refactor quota inumber handling Darrick J. Wong
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:25 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Support finding the quota files in the metadata directory.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 db/dquot.c               |   59 ++++++++++++++++++++++++++++++++++------------
 libxfs/libxfs_api_defs.h |    6 +++++
 2 files changed, 50 insertions(+), 15 deletions(-)


diff --git a/db/dquot.c b/db/dquot.c
index 338d064995155e..d2c76fd70bf1a6 100644
--- a/db/dquot.c
+++ b/db/dquot.c
@@ -81,6 +81,41 @@ dquot_help(void)
 {
 }
 
+static xfs_ino_t
+dqtype_to_inode(
+	struct xfs_mount	*mp,
+	xfs_dqtype_t		type)
+{
+	struct xfs_trans	*tp;
+	struct xfs_inode	*dp = NULL;
+	struct xfs_inode	*ip;
+	xfs_ino_t		ret = NULLFSINO;
+	int			error;
+
+	error = -libxfs_trans_alloc_empty(mp, &tp);
+	if (error)
+		return NULLFSINO;
+
+	if (xfs_has_metadir(mp)) {
+		error = -libxfs_dqinode_load_parent(tp, &dp);
+		if (error)
+			goto out_cancel;
+	}
+
+	error = -libxfs_dqinode_load(tp, dp, type, &ip);
+	if (error)
+		goto out_dp;
+
+	ret = ip->i_ino;
+	libxfs_irele(ip);
+out_dp:
+	if (dp)
+		libxfs_irele(dp);
+out_cancel:
+	libxfs_trans_cancel(tp);
+	return ret;
+}
+
 static int
 dquot_f(
 	int		argc,
@@ -88,8 +123,7 @@ dquot_f(
 {
 	bmap_ext_t	bm;
 	int		c;
-	int		dogrp;
-	int		doprj;
+	xfs_dqtype_t	type = XFS_DQTYPE_USER;
 	xfs_dqid_t	id;
 	xfs_ino_t	ino;
 	xfs_extnum_t	nex;
@@ -97,38 +131,33 @@ dquot_f(
 	int		perblock;
 	xfs_fileoff_t	qbno;
 	int		qoff;
-	char		*s;
+	const char	*s;
 
-	dogrp = doprj = optind = 0;
+	optind = 0;
 	while ((c = getopt(argc, argv, "gpu")) != EOF) {
 		switch (c) {
 		case 'g':
-			dogrp = 1;
-			doprj = 0;
+			type = XFS_DQTYPE_GROUP;
 			break;
 		case 'p':
-			doprj = 1;
-			dogrp = 0;
+			type = XFS_DQTYPE_PROJ;
 			break;
 		case 'u':
-			dogrp = doprj = 0;
+			type = XFS_DQTYPE_USER;
 			break;
 		default:
 			dbprintf(_("bad option for dquot command\n"));
 			return 0;
 		}
 	}
-	s = doprj ? _("project") : dogrp ? _("group") : _("user");
+
+	s = libxfs_dqinode_path(type);
 	if (optind != argc - 1) {
 		dbprintf(_("dquot command requires one %s id argument\n"), s);
 		return 0;
 	}
-	ino = mp->m_sb.sb_uquotino;
-	if (doprj)
-		ino = mp->m_sb.sb_pquotino;
-	else if (dogrp)
-		ino = mp->m_sb.sb_gquotino;
 
+	ino = dqtype_to_inode(mp, type);
 	if (ino == 0 || ino == NULLFSINO) {
 		dbprintf(_("no %s quota inode present\n"), s);
 		return 0;
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index a8416dfbb27f59..6b2dc7a30d2547 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -159,6 +159,12 @@
 #define xfs_dir_replace			libxfs_dir_replace
 
 #define xfs_dqblk_repair		libxfs_dqblk_repair
+#define xfs_dqinode_load		libxfs_dqinode_load
+#define xfs_dqinode_load_parent		libxfs_dqinode_load_parent
+#define xfs_dqinode_mkdir_parent	libxfs_dqinode_mkdir_parent
+#define xfs_dqinode_metadir_create	libxfs_dqinode_metadir_create
+#define xfs_dqinode_metadir_link	libxfs_dqinode_metadir_link
+#define xfs_dqinode_path		libxfs_dqinode_path
 #define xfs_dquot_from_disk_ts		libxfs_dquot_from_disk_ts
 #define xfs_dquot_verify		libxfs_dquot_verify
 

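The unwind order in dqtype_to_inode above can be hard to see inside a diff,
so here is a small standalone sketch of the same pattern: allocate a
transaction, optionally load a parent directory, load the child, and
release everything in reverse order on every path.  The counters and the
sk_lookup function are hypothetical stand-ins for the libxfs calls, and the
inode number it reports is made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for a live transaction and inodes. */
static int sk_live_trans;
static int sk_live_inodes;

/*
 * Look up a child object, taking a parent reference first only when
 * has_metadir is set.  fail_parent and fail_child simulate errors from the
 * two load steps.  Returns 0 on success or a negative value on failure.
 */
static int
sk_lookup(bool has_metadir, bool fail_parent, bool fail_child,
	  unsigned long *ino_out)
{
	bool	have_parent = false;
	int	error = 0;

	sk_live_trans++;		/* transaction allocated */

	if (has_metadir) {
		if (fail_parent) {
			error = -1;
			goto out_cancel;
		}
		sk_live_inodes++;	/* parent directory loaded */
		have_parent = true;
	}

	if (fail_child) {
		error = -1;
		goto out_dp;
	}
	sk_live_inodes++;		/* child loaded */
	*ino_out = 42;			/* made-up inode number */
	sk_live_inodes--;		/* child released */

out_dp:
	if (have_parent)
		sk_live_inodes--;	/* parent released */
out_cancel:
	sk_live_trans--;		/* transaction cancelled */
	return error;
}
```

Whichever step fails, both counters return to zero, which is the property
the two labels in the patch are meant to guarantee.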

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 3/7] xfs_repair: refactor quota inumber handling
  2024-12-23 21:36 ` [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
  2024-12-23 22:25   ` [PATCH 1/7] libfrog: scrub quota file metapaths Darrick J. Wong
  2024-12-23 22:25   ` [PATCH 2/7] xfs_db: support metadir quotas Darrick J. Wong
@ 2024-12-23 22:25   ` Darrick J. Wong
  2024-12-23 22:26   ` [PATCH 4/7] xfs_repair: hoist the secondary sb qflags handling Darrick J. Wong
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:25 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In preparation for putting quota files in the metadata directory tree,
refactor repair's quota inumber handling to use its own variables
instead of the xfs_mount's.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/dinode.c     |   18 ++++----
 repair/dir2.c       |   12 +++---
 repair/globals.c    |  111 ++++++++++++++++++++++++++++++++++++++++++++++++---
 repair/globals.h    |   15 ++++---
 repair/phase4.c     |   96 ++++++++++++++++++--------------------------
 repair/phase6.c     |   12 +++---
 repair/quotacheck.c |   47 +++++++++++++++-------
 repair/quotacheck.h |    2 +
 repair/versions.c   |    9 +---
 repair/xfs_repair.c |   12 ++++--
 10 files changed, 221 insertions(+), 113 deletions(-)


diff --git a/repair/dinode.c b/repair/dinode.c
index 2d341975e53993..9ab193bc5fe973 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -1662,29 +1662,29 @@ process_check_metadata_inodes(
 		}
 		return 0;
 	}
-	if (lino == mp->m_sb.sb_uquotino)  {
+	if (is_quota_inode(XFS_DQTYPE_USER, lino))  {
 		if (*type != XR_INO_UQUOTA)  {
 			do_warn(_("user quota inode %" PRIu64 " has bad type 0x%x\n"),
 				lino, dinode_fmt(dinoc));
-			mp->m_sb.sb_uquotino = NULLFSINO;
+			clear_quota_inode(XFS_DQTYPE_USER);
 			return 1;
 		}
 		return 0;
 	}
-	if (lino == mp->m_sb.sb_gquotino)  {
+	if (is_quota_inode(XFS_DQTYPE_GROUP, lino))  {
 		if (*type != XR_INO_GQUOTA)  {
 			do_warn(_("group quota inode %" PRIu64 " has bad type 0x%x\n"),
 				lino, dinode_fmt(dinoc));
-			mp->m_sb.sb_gquotino = NULLFSINO;
+			clear_quota_inode(XFS_DQTYPE_GROUP);
 			return 1;
 		}
 		return 0;
 	}
-	if (lino == mp->m_sb.sb_pquotino)  {
+	if (is_quota_inode(XFS_DQTYPE_PROJ, lino))  {
 		if (*type != XR_INO_PQUOTA)  {
 			do_warn(_("project quota inode %" PRIu64 " has bad type 0x%x\n"),
 				lino, dinode_fmt(dinoc));
-			mp->m_sb.sb_pquotino = NULLFSINO;
+			clear_quota_inode(XFS_DQTYPE_PROJ);
 			return 1;
 		}
 		return 0;
@@ -2980,11 +2980,11 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 		else if (lino == mp->m_sb.sb_rsumino ||
 			 is_rtsummary_inode(lino))
 			type = XR_INO_RTSUM;
-		else if (lino == mp->m_sb.sb_uquotino)
+		else if (is_quota_inode(XFS_DQTYPE_USER, lino))
 			type = XR_INO_UQUOTA;
-		else if (lino == mp->m_sb.sb_gquotino)
+		else if (is_quota_inode(XFS_DQTYPE_GROUP, lino))
 			type = XR_INO_GQUOTA;
-		else if (lino == mp->m_sb.sb_pquotino)
+		else if (is_quota_inode(XFS_DQTYPE_PROJ, lino))
 			type = XR_INO_PQUOTA;
 		else
 			type = XR_INO_DATA;
diff --git a/repair/dir2.c b/repair/dir2.c
index ca747c90175e93..ed615662009957 100644
--- a/repair/dir2.c
+++ b/repair/dir2.c
@@ -265,13 +265,13 @@ process_sf_dir2(
 		           is_rtsummary_inode(lino)) {
 			junkit = 1;
 			junkreason = _("realtime summary");
-		} else if (lino == mp->m_sb.sb_uquotino)  {
+		} else if (is_quota_inode(XFS_DQTYPE_USER, lino)) {
 			junkit = 1;
 			junkreason = _("user quota");
-		} else if (lino == mp->m_sb.sb_gquotino)  {
+		} else if (is_quota_inode(XFS_DQTYPE_GROUP, lino)) {
 			junkit = 1;
 			junkreason = _("group quota");
-		} else if (lino == mp->m_sb.sb_pquotino)  {
+		} else if (is_quota_inode(XFS_DQTYPE_PROJ, lino)) {
 			junkit = 1;
 			junkreason = _("project quota");
 		} else if (lino == mp->m_sb.sb_metadirino)  {
@@ -746,11 +746,11 @@ process_dir2_data(
 		} else if (ent_ino == mp->m_sb.sb_rsumino ||
 		           is_rtsummary_inode(ent_ino)) {
 			clearreason = _("realtime summary");
-		} else if (ent_ino == mp->m_sb.sb_uquotino) {
+		} else if (is_quota_inode(XFS_DQTYPE_USER, ent_ino)) {
 			clearreason = _("user quota");
-		} else if (ent_ino == mp->m_sb.sb_gquotino) {
+		} else if (is_quota_inode(XFS_DQTYPE_GROUP, ent_ino)) {
 			clearreason = _("group quota");
-		} else if (ent_ino == mp->m_sb.sb_pquotino) {
+		} else if (is_quota_inode(XFS_DQTYPE_PROJ, ent_ino)) {
 			clearreason = _("project quota");
 		} else if (ent_ino == mp->m_sb.sb_metadirino)  {
 			clearreason = _("metadata directory root");
diff --git a/repair/globals.c b/repair/globals.c
index 30995f5298d5a1..99291d6afd61b9 100644
--- a/repair/globals.c
+++ b/repair/globals.c
@@ -73,12 +73,6 @@ int	need_rbmino;
 int	need_rsumino;
 
 int	lost_quotas;
-int	have_uquotino;
-int	have_gquotino;
-int	have_pquotino;
-int	lost_uquotino;
-int	lost_gquotino;
-int	lost_pquotino;
 
 /* configuration vars -- fs geometry dependent */
 
@@ -119,3 +113,108 @@ int		thread_count;
 
 /* If nonzero, simulate failure after this phase. */
 int		fail_after_phase;
+
+/* quota inode numbers */
+enum quotino_state {
+	QI_STATE_UNKNOWN,
+	QI_STATE_HAVE,
+	QI_STATE_LOST,
+};
+
+static xfs_ino_t quotinos[3] = { NULLFSINO, NULLFSINO, NULLFSINO };
+static enum quotino_state quotino_state[3];
+
+static inline unsigned int quotino_off(xfs_dqtype_t type)
+{
+	switch (type) {
+	case XFS_DQTYPE_USER:
+		return 0;
+	case XFS_DQTYPE_GROUP:
+		return 1;
+	case XFS_DQTYPE_PROJ:
+		return 2;
+	}
+
+	ASSERT(0);
+	return -1;
+}
+
+void
+set_quota_inode(
+	xfs_dqtype_t	type,
+	xfs_ino_t	ino)
+{
+	unsigned int	off = quotino_off(type);
+
+	quotinos[off] = ino;
+	quotino_state[off] = QI_STATE_HAVE;
+}
+
+void
+lose_quota_inode(
+	xfs_dqtype_t	type)
+{
+	unsigned int	off = quotino_off(type);
+
+	quotinos[off] = NULLFSINO;
+	quotino_state[off] = QI_STATE_LOST;
+}
+
+void
+clear_quota_inode(
+	xfs_dqtype_t	type)
+{
+	unsigned int	off = quotino_off(type);
+
+	quotinos[off] = NULLFSINO;
+	quotino_state[off] = QI_STATE_UNKNOWN;
+}
+
+xfs_ino_t
+get_quota_inode(
+	xfs_dqtype_t	type)
+{
+	unsigned int	off = quotino_off(type);
+
+	return quotinos[off];
+}
+
+bool
+is_quota_inode(
+	xfs_dqtype_t	type,
+	xfs_ino_t	ino)
+{
+	unsigned int	off = quotino_off(type);
+
+	return quotinos[off] == ino;
+}
+
+bool
+is_any_quota_inode(
+	xfs_ino_t		ino)
+{
+	unsigned int		i;
+
+	for (i = 0; i < ARRAY_SIZE(quotinos); i++)
+		if (quotinos[i] == ino)
+			return true;
+	return false;
+}
+
+bool
+lost_quota_inode(
+	xfs_dqtype_t	type)
+{
+	unsigned int	off = quotino_off(type);
+
+	return quotino_state[off] == QI_STATE_LOST;
+}
+
+bool
+has_quota_inode(
+	xfs_dqtype_t	type)
+{
+	unsigned int	off = quotino_off(type);
+
+	return quotino_state[off] == QI_STATE_HAVE;
+}
diff --git a/repair/globals.h b/repair/globals.h
index 7c2d9c56c8f8a7..b23a06af6cc81b 100644
--- a/repair/globals.h
+++ b/repair/globals.h
@@ -114,12 +114,6 @@ extern int		need_rbmino;
 extern int		need_rsumino;
 
 extern int		lost_quotas;
-extern int		have_uquotino;
-extern int		have_gquotino;
-extern int		have_pquotino;
-extern int		lost_uquotino;
-extern int		lost_gquotino;
-extern int		lost_pquotino;
 
 /* configuration vars -- fs geometry dependent */
 
@@ -165,4 +159,13 @@ extern int		fail_after_phase;
 
 extern struct libxfs_init x;
 
+void set_quota_inode(xfs_dqtype_t type, xfs_ino_t);
+void lose_quota_inode(xfs_dqtype_t type);
+void clear_quota_inode(xfs_dqtype_t type);
+xfs_ino_t get_quota_inode(xfs_dqtype_t type);
+bool is_quota_inode(xfs_dqtype_t type, xfs_ino_t ino);
+bool is_any_quota_inode(xfs_ino_t ino);
+bool lost_quota_inode(xfs_dqtype_t type);
+bool has_quota_inode(xfs_dqtype_t type);
+
 #endif /* _XFS_REPAIR_GLOBAL_H */
diff --git a/repair/phase4.c b/repair/phase4.c
index f43f8ecd84e25b..a4183c557a1891 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -23,6 +23,35 @@
 
 bool collect_rmaps;
 
+static inline void
+quotino_check_one(
+	struct xfs_mount	*mp,
+	xfs_dqtype_t		type)
+{
+	struct ino_tree_node	*irec;
+	xfs_ino_t		ino;
+
+	if (!has_quota_inode(type))
+		return;
+
+	ino = get_quota_inode(type);
+	if (!libxfs_verify_ino(mp, ino))
+		goto bad;
+
+	irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ino),
+			XFS_INO_TO_AGINO(mp, ino));
+	if (!irec)
+		goto bad;
+
+	if (is_inode_free(irec, ino - irec->ino_startnum))
+		goto bad;
+
+	return;
+
+bad:
+	lose_quota_inode(type);
+}
+
 /*
  * null out quota inode fields in sb if they point to non-existent inodes.
  * this isn't as redundant as it looks since it's possible that the sb field
@@ -31,57 +60,12 @@ bool collect_rmaps;
  * be cleared by process_dinode().
  */
 static void
-quotino_check(xfs_mount_t *mp)
+quotino_check(
+	struct xfs_mount	*mp)
 {
-	ino_tree_node_t *irec;
-
-	if (mp->m_sb.sb_uquotino != NULLFSINO && mp->m_sb.sb_uquotino != 0)  {
-		if (!libxfs_verify_ino(mp, mp->m_sb.sb_uquotino))
-			irec = NULL;
-		else
-			irec = find_inode_rec(mp,
-				XFS_INO_TO_AGNO(mp, mp->m_sb.sb_uquotino),
-				XFS_INO_TO_AGINO(mp, mp->m_sb.sb_uquotino));
-
-		if (irec == NULL || is_inode_free(irec,
-				mp->m_sb.sb_uquotino - irec->ino_startnum))  {
-			mp->m_sb.sb_uquotino = NULLFSINO;
-			lost_uquotino = 1;
-		} else
-			lost_uquotino = 0;
-	}
-
-	if (mp->m_sb.sb_gquotino != NULLFSINO && mp->m_sb.sb_gquotino != 0)  {
-		if (!libxfs_verify_ino(mp, mp->m_sb.sb_gquotino))
-			irec = NULL;
-		else
-			irec = find_inode_rec(mp,
-				XFS_INO_TO_AGNO(mp, mp->m_sb.sb_gquotino),
-				XFS_INO_TO_AGINO(mp, mp->m_sb.sb_gquotino));
-
-		if (irec == NULL || is_inode_free(irec,
-				mp->m_sb.sb_gquotino - irec->ino_startnum))  {
-			mp->m_sb.sb_gquotino = NULLFSINO;
-			lost_gquotino = 1;
-		} else
-			lost_gquotino = 0;
-	}
-
-	if (mp->m_sb.sb_pquotino != NULLFSINO && mp->m_sb.sb_pquotino != 0)  {
-		if (!libxfs_verify_ino(mp, mp->m_sb.sb_pquotino))
-			irec = NULL;
-		else
-			irec = find_inode_rec(mp,
-				XFS_INO_TO_AGNO(mp, mp->m_sb.sb_pquotino),
-				XFS_INO_TO_AGINO(mp, mp->m_sb.sb_pquotino));
-
-		if (irec == NULL || is_inode_free(irec,
-				mp->m_sb.sb_pquotino - irec->ino_startnum))  {
-			mp->m_sb.sb_pquotino = NULLFSINO;
-			lost_pquotino = 1;
-		} else
-			lost_pquotino = 0;
-	}
+	quotino_check_one(mp, XFS_DQTYPE_USER);
+	quotino_check_one(mp, XFS_DQTYPE_GROUP);
+	quotino_check_one(mp, XFS_DQTYPE_PROJ);
 }
 
 static void
@@ -107,14 +91,14 @@ quota_sb_check(xfs_mount_t *mp)
 	 */
 
 	if (fs_quotas &&
-	    (mp->m_sb.sb_uquotino == NULLFSINO || mp->m_sb.sb_uquotino == 0) &&
-	    (mp->m_sb.sb_gquotino == NULLFSINO || mp->m_sb.sb_gquotino == 0) &&
-	    (mp->m_sb.sb_pquotino == NULLFSINO || mp->m_sb.sb_pquotino == 0))  {
+	    !has_quota_inode(XFS_DQTYPE_USER) &&
+	    !has_quota_inode(XFS_DQTYPE_GROUP) &&
+	    !has_quota_inode(XFS_DQTYPE_PROJ))  {
 		lost_quotas = 1;
 		fs_quotas = 0;
-	} else if (libxfs_verify_ino(mp, mp->m_sb.sb_uquotino) &&
-		   libxfs_verify_ino(mp, mp->m_sb.sb_gquotino) &&
-		   libxfs_verify_ino(mp, mp->m_sb.sb_pquotino)) {
+	} else if (libxfs_verify_ino(mp, get_quota_inode(XFS_DQTYPE_USER)) &&
+		   libxfs_verify_ino(mp, get_quota_inode(XFS_DQTYPE_GROUP)) &&
+		   libxfs_verify_ino(mp, get_quota_inode(XFS_DQTYPE_PROJ))) {
 		fs_quotas = 1;
 	}
 }
diff --git a/repair/phase6.c b/repair/phase6.c
index 59665f48684aa4..bd3b6e79bae095 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -3183,12 +3183,12 @@ mark_standalone_inodes(xfs_mount_t *mp)
 	if (!fs_quotas)
 		return;
 
-	if (mp->m_sb.sb_uquotino && mp->m_sb.sb_uquotino != NULLFSINO)
-		mark_inode(mp, mp->m_sb.sb_uquotino);
-	if (mp->m_sb.sb_gquotino && mp->m_sb.sb_gquotino != NULLFSINO)
-		mark_inode(mp, mp->m_sb.sb_gquotino);
-	if (mp->m_sb.sb_pquotino && mp->m_sb.sb_pquotino != NULLFSINO)
-		mark_inode(mp, mp->m_sb.sb_pquotino);
+	if (has_quota_inode(XFS_DQTYPE_USER))
+		mark_inode(mp, get_quota_inode(XFS_DQTYPE_USER));
+	if (has_quota_inode(XFS_DQTYPE_GROUP))
+		mark_inode(mp, get_quota_inode(XFS_DQTYPE_GROUP));
+	if (has_quota_inode(XFS_DQTYPE_PROJ))
+		mark_inode(mp, get_quota_inode(XFS_DQTYPE_PROJ));
 }
 
 static void
diff --git a/repair/quotacheck.c b/repair/quotacheck.c
index 11e2d64eb34791..c4baf70e41d6b1 100644
--- a/repair/quotacheck.c
+++ b/repair/quotacheck.c
@@ -203,9 +203,7 @@ quotacheck_adjust(
 		return;
 
 	/* Quota files are not included in quota counts. */
-	if (ino == mp->m_sb.sb_uquotino ||
-	    ino == mp->m_sb.sb_gquotino ||
-	    ino == mp->m_sb.sb_pquotino)
+	if (is_any_quota_inode(ino))
 		return;
 
 	error = -libxfs_iget(mp, NULL, ino, 0, &ip);
@@ -415,20 +413,20 @@ quotacheck_verify(
 
 	switch (type) {
 	case XFS_DQTYPE_USER:
-		ino = mp->m_sb.sb_uquotino;
 		dquots = user_dquots;
 		metafile_type = XFS_METAFILE_USRQUOTA;
 		break;
 	case XFS_DQTYPE_GROUP:
-		ino = mp->m_sb.sb_gquotino;
 		dquots = group_dquots;
 		metafile_type = XFS_METAFILE_GRPQUOTA;
 		break;
 	case XFS_DQTYPE_PROJ:
-		ino = mp->m_sb.sb_pquotino;
 		dquots = proj_dquots;
 		metafile_type = XFS_METAFILE_PRJQUOTA;
 		break;
+	default:
+		ASSERT(0);
+		return;
 	}
 
 	/*
@@ -443,6 +441,7 @@ quotacheck_verify(
 	if (error)
 		do_error(_("could not alloc transaction to open quota file\n"));
 
+	ino = get_quota_inode(type);
 	error = -libxfs_trans_metafile_iget(tp, ino, metafile_type, &ip);
 	if (error) {
 		do_warn(
@@ -505,34 +504,28 @@ qc_has_quotafile(
 	struct xfs_mount	*mp,
 	xfs_dqtype_t		type)
 {
-	bool			lost;
 	xfs_ino_t		ino;
 	unsigned int		qflag;
 
 	switch (type) {
 	case XFS_DQTYPE_USER:
-		lost = lost_uquotino;
-		ino = mp->m_sb.sb_uquotino;
 		qflag = XFS_UQUOTA_CHKD;
 		break;
 	case XFS_DQTYPE_GROUP:
-		lost = lost_gquotino;
-		ino = mp->m_sb.sb_gquotino;
 		qflag = XFS_GQUOTA_CHKD;
 		break;
 	case XFS_DQTYPE_PROJ:
-		lost = lost_pquotino;
-		ino = mp->m_sb.sb_pquotino;
 		qflag = XFS_PQUOTA_CHKD;
 		break;
 	default:
 		return false;
 	}
 
-	if (lost)
+	if (lost_quota_inode(type))
 		return false;
 	if (!(mp->m_sb.sb_qflags & qflag))
 		return false;
+	ino = get_quota_inode(type);
 	if (ino == NULLFSINO || ino == 0)
 		return false;
 	return true;
@@ -626,3 +619,29 @@ quotacheck_teardown(void)
 	qc_purge(&group_dquots);
 	qc_purge(&proj_dquots);
 }
+
+void
+update_sb_quotinos(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*sbp)
+{
+	bool			dirty = false;
+
+	if (mp->m_sb.sb_uquotino != get_quota_inode(XFS_DQTYPE_USER)) {
+		mp->m_sb.sb_uquotino = get_quota_inode(XFS_DQTYPE_USER);
+		dirty = true;
+	}
+
+	if (mp->m_sb.sb_gquotino != get_quota_inode(XFS_DQTYPE_GROUP)) {
+		mp->m_sb.sb_gquotino = get_quota_inode(XFS_DQTYPE_GROUP);
+		dirty = true;
+	}
+
+	if (mp->m_sb.sb_pquotino != get_quota_inode(XFS_DQTYPE_PROJ)) {
+		mp->m_sb.sb_pquotino = get_quota_inode(XFS_DQTYPE_PROJ);
+		dirty = true;
+	}
+
+	if (dirty)
+		libxfs_sb_to_disk(sbp->b_addr, &mp->m_sb);
+}
diff --git a/repair/quotacheck.h b/repair/quotacheck.h
index dcbf1623947b48..36f9f5a12f7f3e 100644
--- a/repair/quotacheck.h
+++ b/repair/quotacheck.h
@@ -13,4 +13,6 @@ uint16_t quotacheck_results(void);
 int quotacheck_setup(struct xfs_mount *mp);
 void quotacheck_teardown(void);
 
+void update_sb_quotinos(struct xfs_mount *mp, struct xfs_buf *sbp);
+
 #endif /* __XFS_REPAIR_QUOTACHECK_H__ */
diff --git a/repair/versions.c b/repair/versions.c
index 7dc91b4597eece..689cc471176da0 100644
--- a/repair/versions.c
+++ b/repair/versions.c
@@ -104,9 +104,6 @@ parse_sb_version(
 	fs_sb_feature_bits = 0;
 	fs_ino_alignment = 0;
 	fs_has_extflgbit = 1;
-	have_uquotino = 0;
-	have_gquotino = 0;
-	have_pquotino = 0;
 
 	if (mp->m_sb.sb_versionnum & XFS_SB_VERSION_SHAREDBIT) {
 		do_warn(_("Shared Version bit set. Not supported. Ever.\n"));
@@ -166,13 +163,13 @@ _("WARNING: you have a V1 inode filesystem. It would be converted to a\n"
 		fs_quotas = 1;
 
 		if (mp->m_sb.sb_uquotino != 0 && mp->m_sb.sb_uquotino != NULLFSINO)
-			have_uquotino = 1;
+			set_quota_inode(XFS_DQTYPE_USER, mp->m_sb.sb_uquotino);
 
 		if (mp->m_sb.sb_gquotino != 0 && mp->m_sb.sb_gquotino != NULLFSINO)
-			have_gquotino = 1;
+			set_quota_inode(XFS_DQTYPE_GROUP, mp->m_sb.sb_gquotino);
 
 		if (mp->m_sb.sb_pquotino != 0 && mp->m_sb.sb_pquotino != NULLFSINO)
-			have_pquotino = 1;
+			set_quota_inode(XFS_DQTYPE_PROJ, mp->m_sb.sb_pquotino);
 	}
 
 	if (xfs_has_align(mp))  {
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 2a8a72e7027591..363f8260bd575a 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -1404,7 +1404,9 @@ _("Inode allocation btrees are too corrupted, skipping phases 6 and 7\n"));
 		free_rtgroup_inodes();
 	}
 
-	if (lost_quotas && !have_uquotino && !have_gquotino && !have_pquotino) {
+	if (lost_quotas && !has_quota_inode(XFS_DQTYPE_USER) &&
+	    !has_quota_inode(XFS_DQTYPE_GROUP) &&
+	    !has_quota_inode(XFS_DQTYPE_PROJ)) {
 		if (!no_modify)  {
 			do_warn(
 _("Warning:  no quota inodes were found.  Quotas disabled.\n"));
@@ -1421,7 +1423,7 @@ _("Warning:  quota inodes were cleared.  Quotas disabled.\n"));
 _("Warning:  quota inodes would be cleared.  Quotas would be disabled.\n"));
 		}
 	} else  {
-		if (lost_uquotino)  {
+		if (lost_quota_inode(XFS_DQTYPE_USER))  {
 			if (!no_modify)  {
 				do_warn(
 _("Warning:  user quota information was cleared.\n"
@@ -1433,7 +1435,7 @@ _("Warning:  user quota information would be cleared.\n"
 			}
 		}
 
-		if (lost_gquotino)  {
+		if (lost_quota_inode(XFS_DQTYPE_GROUP))  {
 			if (!no_modify)  {
 				do_warn(
 _("Warning:  group quota information was cleared.\n"
@@ -1445,7 +1447,7 @@ _("Warning:  group quota information would be cleared.\n"
 			}
 		}
 
-		if (lost_pquotino)  {
+		if (lost_quota_inode(XFS_DQTYPE_PROJ))  {
 			if (!no_modify)  {
 				do_warn(
 _("Warning:  project quota information was cleared.\n"
@@ -1485,6 +1487,8 @@ _("Warning:  project quota information would be cleared.\n"
 	if (!sbp)
 		do_error(_("couldn't get superblock\n"));
 
+	update_sb_quotinos(mp, sbp);
+
 	if ((mp->m_sb.sb_qflags & XFS_ALL_QUOTA_CHKD) != quotacheck_results()) {
 		do_warn(_("Note - quota info will be regenerated on next "
 			"quota mount.\n"));

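To make the state handling introduced in repair/globals.c easier to follow,
here is a minimal standalone sketch of the same idea: one inode slot and one
three-valued state per quota type, replacing the separate have_* and lost_*
flags.  The sk_* names, the inode typedef and the enum values are
hypothetical stand-ins for xfs_ino_t, NULLFSINO and xfs_dqtype_t; this is
not the xfs_repair code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for xfs_ino_t, NULLFSINO and xfs_dqtype_t. */
typedef uint64_t sk_ino_t;
#define SK_NULLINO	((sk_ino_t)-1)

enum sk_dqtype { SK_USER, SK_GROUP, SK_PROJ, SK_NR };
enum sk_state { SK_UNKNOWN, SK_HAVE, SK_LOST };

static sk_ino_t sk_inos[SK_NR] = { SK_NULLINO, SK_NULLINO, SK_NULLINO };
static enum sk_state sk_states[SK_NR];	/* zero-initialized: SK_UNKNOWN */

/* Record a quota inode found in the superblock. */
static void sk_set(enum sk_dqtype type, sk_ino_t ino)
{
	assert(type < SK_NR);
	sk_inos[type] = ino;
	sk_states[type] = SK_HAVE;
}

/* The inode was referenced but turned out to be missing or free. */
static void sk_lose(enum sk_dqtype type)
{
	assert(type < SK_NR);
	sk_inos[type] = SK_NULLINO;
	sk_states[type] = SK_LOST;
}

/* Forget the inode without reporting it as lost. */
static void sk_clear(enum sk_dqtype type)
{
	assert(type < SK_NR);
	sk_inos[type] = SK_NULLINO;
	sk_states[type] = SK_UNKNOWN;
}

static bool sk_has(enum sk_dqtype type)
{
	return sk_states[type] == SK_HAVE;
}

static bool sk_lost(enum sk_dqtype type)
{
	return sk_states[type] == SK_LOST;
}

static bool sk_is(enum sk_dqtype type, sk_ino_t ino)
{
	return sk_inos[type] == ino;
}
```

Keeping the inode number and its state together means a caller cannot end
up with a "have" flag set while the inode number has already been nulled,
which was possible when the flags and the superblock fields were updated
separately.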

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 4/7] xfs_repair: hoist the secondary sb qflags handling
  2024-12-23 21:36 ` [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-12-23 22:25   ` [PATCH 3/7] xfs_repair: refactor quota inumber handling Darrick J. Wong
@ 2024-12-23 22:26   ` Darrick J. Wong
  2024-12-23 22:26   ` [PATCH 5/7] xfs_repair: support quota inodes in the metadata directory Darrick J. Wong
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:26 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Hoist all the secondary superblock qflags and quota inode modification
code into a separate function so that we can disable it in the next
patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/agheader.c |  157 +++++++++++++++++++++++++++++------------------------
 1 file changed, 85 insertions(+), 72 deletions(-)
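
For readers following along: each stanza that moves into the new helper
has the same shape, so here is a minimal standalone sketch of it.  This
is not part of the patch.  Every sk_ name is hypothetical, and the
SK_AG_* and SK_NULLINO values are placeholders standing in for
XR_AG_SB, XR_AG_SB_SEC and NULLFSINO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_AG_SB	0x1		/* placeholder for XR_AG_SB */
#define SK_AG_SB_SEC	0x2		/* placeholder for XR_AG_SB_SEC */
#define SK_NULLINO	UINT64_MAX	/* placeholder for NULLFSINO */

/* Toy superblock holding only the fields this sketch looks at. */
struct sk_sb {
	bool		inprogress;
	uint64_t	uquotino;
	uint64_t	gquotino;
	uint64_t	pquotino;
};

/*
 * One stanza: if the field looks stale, zero it unless we are in
 * no-modify mode, then report either a visible problem (SK_AG_SB) or
 * a quiet secondary-only fixup (SK_AG_SB_SEC) depending on do_bzero.
 */
static int
sk_clear_stale(bool inprogress, bool no_modify, bool do_bzero,
	       uint64_t *field)
{
	if (!inprogress || *field == SK_NULLINO)
		return 0;
	if (!no_modify)
		*field = 0;
	return do_bzero ? SK_AG_SB_SEC : SK_AG_SB;
}

/* The hoisted helper just ORs the per-field results together. */
static int
sk_secondary_sb_quota(struct sk_sb *sb, bool no_modify, bool do_bzero)
{
	int	rval = 0;

	rval |= sk_clear_stale(sb->inprogress, no_modify, do_bzero,
			&sb->uquotino);
	rval |= sk_clear_stale(sb->inprogress, no_modify, do_bzero,
			&sb->gquotino);
	rval |= sk_clear_stale(sb->inprogress, no_modify, do_bzero,
			&sb->pquotino);
	return rval;
}
```

Because the helper only returns flag bits, the caller can fold it in with
"rval |= ..." and a later patch can skip the call entirely without
touching the rest of the caller.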


diff --git a/repair/agheader.c b/repair/agheader.c
index fd279559aed973..4a530fd6b8fe96 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -319,6 +319,90 @@ check_v5_feature_mismatch(
 	return XR_AG_SB_SEC;
 }
 
+/*
+ * quota inodes and flags in secondary superblocks are never set by mkfs.
+ * However, they could be set in a secondary if a fs with quotas was growfs'ed
+ * since growfs copies the new primary into the secondaries.
+ *
+ * Also, the in-core inode flags now have different meaning to the on-disk
+ * flags, and so libxfs_sb_to_disk cannot directly write the
+ * sb_gquotino/sb_pquotino fields without specific sb_qflags being set.  Hence
+ * we need to zero those fields directly in the sb buffer here.
+ */
+static int
+secondary_sb_quota(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*sbuf,
+	struct xfs_sb		*sb,
+	xfs_agnumber_t		i,
+	int			do_bzero)
+{
+	struct xfs_dsb		*dsb = sbuf->b_addr;
+	int			rval = 0;
+
+	if (sb->sb_inprogress == 1 && sb->sb_uquotino != NULLFSINO)  {
+		if (!no_modify)
+			sb->sb_uquotino = 0;
+		if (!do_bzero)  {
+			rval |= XR_AG_SB;
+			do_warn(
+		_("non-null user quota inode field in superblock %d\n"),
+				i);
+
+		} else
+			rval |= XR_AG_SB_SEC;
+	}
+
+	if (sb->sb_inprogress == 1 && sb->sb_gquotino != NULLFSINO)  {
+		if (!no_modify) {
+			sb->sb_gquotino = 0;
+			dsb->sb_gquotino = 0;
+		}
+		if (!do_bzero)  {
+			rval |= XR_AG_SB;
+			do_warn(
+		_("non-null group quota inode field in superblock %d\n"),
+				i);
+
+		} else
+			rval |= XR_AG_SB_SEC;
+	}
+
+	/*
+	 * Note that sb_pquotino is not considered a valid sb field for pre-v5
+	 * superblocks. If it is anything other than 0 it is considered garbage
+	 * data beyond the valid sb and explicitly zeroed above.
+	 */
+	if (xfs_has_pquotino(mp) &&
+	    sb->sb_inprogress == 1 && sb->sb_pquotino != NULLFSINO)  {
+		if (!no_modify) {
+			sb->sb_pquotino = 0;
+			dsb->sb_pquotino = 0;
+		}
+		if (!do_bzero)  {
+			rval |= XR_AG_SB;
+			do_warn(
+		_("non-null project quota inode field in superblock %d\n"),
+				i);
+
+		} else
+			rval |= XR_AG_SB_SEC;
+	}
+
+	if (sb->sb_inprogress == 1 && sb->sb_qflags)  {
+		if (!no_modify)
+			sb->sb_qflags = 0;
+		if (!do_bzero)  {
+			rval |= XR_AG_SB;
+			do_warn(_("non-null quota flags in superblock %d\n"),
+				i);
+		} else
+			rval |= XR_AG_SB_SEC;
+	}
+
+	return rval;
+}
+
 /*
  * Possible fields that may have been set at mkfs time,
  * sb_inoalignmt, sb_unit, sb_width and sb_dirblklog.
@@ -340,7 +424,6 @@ secondary_sb_whack(
 	struct xfs_sb	*sb,
 	xfs_agnumber_t	i)
 {
-	struct xfs_dsb	*dsb = sbuf->b_addr;
 	int		do_bzero = 0;
 	int		size;
 	char		*ip;
@@ -426,77 +509,7 @@ secondary_sb_whack(
 			rval |= XR_AG_SB_SEC;
 	}
 
-	/*
-	 * quota inodes and flags in secondary superblocks are never set by
-	 * mkfs.  However, they could be set in a secondary if a fs with quotas
-	 * was growfs'ed since growfs copies the new primary into the
-	 * secondaries.
-	 *
-	 * Also, the in-core inode flags now have different meaning to the
-	 * on-disk flags, and so libxfs_sb_to_disk cannot directly write the
-	 * sb_gquotino/sb_pquotino fields without specific sb_qflags being set.
-	 * Hence we need to zero those fields directly in the sb buffer here.
-	 */
-
-	if (sb->sb_inprogress == 1 && sb->sb_uquotino != NULLFSINO)  {
-		if (!no_modify)
-			sb->sb_uquotino = 0;
-		if (!do_bzero)  {
-			rval |= XR_AG_SB;
-			do_warn(
-		_("non-null user quota inode field in superblock %d\n"),
-				i);
-
-		} else
-			rval |= XR_AG_SB_SEC;
-	}
-
-	if (sb->sb_inprogress == 1 && sb->sb_gquotino != NULLFSINO)  {
-		if (!no_modify) {
-			sb->sb_gquotino = 0;
-			dsb->sb_gquotino = 0;
-		}
-		if (!do_bzero)  {
-			rval |= XR_AG_SB;
-			do_warn(
-		_("non-null group quota inode field in superblock %d\n"),
-				i);
-
-		} else
-			rval |= XR_AG_SB_SEC;
-	}
-
-	/*
-	 * Note that sb_pquotino is not considered a valid sb field for pre-v5
-	 * superblocks. If it is anything other than 0 it is considered garbage
-	 * data beyond the valid sb and explicitly zeroed above.
-	 */
-	if (xfs_has_pquotino(mp) &&
-	    sb->sb_inprogress == 1 && sb->sb_pquotino != NULLFSINO)  {
-		if (!no_modify) {
-			sb->sb_pquotino = 0;
-			dsb->sb_pquotino = 0;
-		}
-		if (!do_bzero)  {
-			rval |= XR_AG_SB;
-			do_warn(
-		_("non-null project quota inode field in superblock %d\n"),
-				i);
-
-		} else
-			rval |= XR_AG_SB_SEC;
-	}
-
-	if (sb->sb_inprogress == 1 && sb->sb_qflags)  {
-		if (!no_modify)
-			sb->sb_qflags = 0;
-		if (!do_bzero)  {
-			rval |= XR_AG_SB;
-			do_warn(_("non-null quota flags in superblock %d\n"),
-				i);
-		} else
-			rval |= XR_AG_SB_SEC;
-	}
+	rval |= secondary_sb_quota(mp, sbuf, sb, i, do_bzero);
 
 	/*
 	 * if the secondaries agree on a stripe unit/width or inode


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 5/7] xfs_repair: support quota inodes in the metadata directory
  2024-12-23 21:36 ` [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-12-23 22:26   ` [PATCH 4/7] xfs_repair: hoist the secondary sb qflags handling Darrick J. Wong
@ 2024-12-23 22:26   ` Darrick J. Wong
  2024-12-23 22:26   ` [PATCH 6/7] xfs_repair: try not to trash qflags on metadir filesystems Darrick J. Wong
  2024-12-23 22:26   ` [PATCH 7/7] mkfs: add quota flags when setting up filesystem Darrick J. Wong
  6 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:26 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Handle quota inodes on metadir filesystems.  This means that we have to
discover whatever quota inodes exist by looking in /quotas instead of
the superblock, and mend any broken metadir tree links that might exist.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/agheader.c   |    3 +
 repair/phase2.c     |    3 +
 repair/phase6.c     |  116 +++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/quotacheck.c |   71 +++++++++++++++++++++++++++++++
 repair/quotacheck.h |    1 
 repair/xfs_repair.c |    3 +
 6 files changed, 194 insertions(+), 3 deletions(-)
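
A note on the discovery step: it tries all three quota types and, as far
as I can tell from the "if (err2 && !error)" pattern, is meant to report
the first failure while still visiting the remaining types.  Below is a
minimal standalone sketch of that pattern, not the patch code itself.
The sk_ names are hypothetical and plain errno integers stand in for the
libxfs load results.

```c
#include <assert.h>
#include <errno.h>

/*
 * Stand-in for the per-type lookup: a quota file that simply does not
 * exist (ENOENT) is not treated as an error, anything else is passed up.
 */
static int
sk_mark(int load_status)
{
	if (load_status == ENOENT)
		return 0;
	return load_status;
}

/*
 * Visit user, group and project in turn.  Every type is always tried;
 * the first nonzero status is the one that gets returned.
 */
static int
sk_discover(int user_status, int group_status, int proj_status)
{
	int	error, err2;

	error = sk_mark(user_status);
	err2 = sk_mark(group_status);
	if (err2 && !error)
		error = err2;
	err2 = sk_mark(proj_status);
	if (err2 && !error)
		error = err2;
	return error;
}
```

For example, sk_discover(EIO, 0, 0) keeps EIO even though the later
lookups succeed, and three ENOENT results collapse to 0, which matches
the "do nothing, we'll just clear qflags later" case.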


diff --git a/repair/agheader.c b/repair/agheader.c
index 4a530fd6b8fe96..e6fca07c6cb4c9 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -509,7 +509,8 @@ secondary_sb_whack(
 			rval |= XR_AG_SB_SEC;
 	}
 
-	rval |= secondary_sb_quota(mp, sbuf, sb, i, do_bzero);
+	if (!xfs_has_metadir(mp))
+		rval |= secondary_sb_quota(mp, sbuf, sb, i, do_bzero);
 
 	/*
 	 * if the secondaries agree on a stripe unit/width or inode
diff --git a/repair/phase2.c b/repair/phase2.c
index 27c873fca76747..71576f5806e473 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "scan.h"
 #include "rt.h"
+#include "quotacheck.h"
 
 /* workaround craziness in the xlog routines */
 int xlog_recover_do_trans(struct xlog *log, struct xlog_recover *t, int p)
@@ -625,6 +626,8 @@ phase2(
 	}
 
 	discover_rtgroup_inodes(mp);
+	if (xfs_has_metadir(mp) && xfs_has_quota(mp))
+		discover_quota_inodes(mp);
 
 	/*
 	 * Upgrade the filesystem now that we've done a preliminary check of
diff --git a/repair/phase6.c b/repair/phase6.c
index bd3b6e79bae095..7d2e0554594265 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -20,6 +20,7 @@
 #include "versions.h"
 #include "repair/pptr.h"
 #include "repair/rt.h"
+#include "repair/quotacheck.h"
 
 static xfs_ino_t		orphanage_ino;
 
@@ -3180,7 +3181,7 @@ mark_standalone_inodes(xfs_mount_t *mp)
 		mark_inode(mp, mp->m_sb.sb_rsumino);
 	}
 
-	if (!fs_quotas)
+	if (!fs_quotas || xfs_has_metadir(mp))
 		return;
 
 	if (has_quota_inode(XFS_DQTYPE_USER))
@@ -3402,6 +3403,116 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
 	}
 }
 
+static bool
+ensure_quota_file(
+	struct xfs_inode	*dp,
+	xfs_dqtype_t		type)
+{
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_inode	*ip;
+	const char		*name = libxfs_dqinode_path(type);
+	int			error;
+
+	if (!has_quota_inode(type))
+		return false;
+
+	if (no_modify) {
+		if (lost_quota_inode(type))
+			do_warn(_("would reset %s quota inode\n"), name);
+		return false;
+	}
+
+	if (!lost_quota_inode(type)) {
+		/*
+		 * The /quotas directory has been discarded, but we should
+		 * be able to iget the quota files directly.
+		 */
+		error = -libxfs_metafile_iget(mp, get_quota_inode(type),
+				xfs_dqinode_metafile_type(type), &ip);
+		if (error) {
+			do_warn(
+_("Could not open %s quota inode, error %d\n"),
+					name, error);
+			lose_quota_inode(type);
+		}
+	}
+
+	if (lost_quota_inode(type)) {
+		/*
+		 * The inode was bad or missing, state that we'll make a new
+		 * one even though we always create a new one.
+		 */
+		do_warn(_("resetting %s quota inode\n"), name);
+		error =  -libxfs_dqinode_metadir_create(dp, type, &ip);
+		if (error) {
+			do_warn(
+_("Couldn't create %s quota inode, error %d\n"),
+					name, error);
+			goto bad;
+		}
+	} else {
+		struct xfs_trans	*tp;
+
+		/* Erase parent pointers before we create the new link */
+		try_erase_parent_ptrs(ip);
+
+		error = -libxfs_dqinode_metadir_link(dp, type, ip);
+		if (error) {
+			do_warn(
+_("Couldn't link %s quota inode, error %d\n"),
+					name, error);
+			goto bad;
+		}
+
+		/*
+		 * Reset the link count to 1 because quota files are never
+		 * hardlinked, but the link above probably bumped it.
+		 */
+		error = -libxfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
+				0, 0, false, &tp);
+		if (!error) {
+			set_nlink(VFS_I(ip), 1);
+			libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+			error = -libxfs_trans_commit(tp);
+		}
+		if (error)
+			do_error(
+_("Couldn't reset link count on %s quota inode, error %d\n"),
+					name, error);
+	}
+
+	/* Mark the inode in use. */
+	mark_ino_inuse(mp, ip->i_ino, S_IFREG, dp->i_ino);
+	mark_ino_metadata(mp, ip->i_ino);
+	libxfs_irele(ip);
+	return true;
+bad:
+	/* Zeroes qflags */
+	quotacheck_skip();
+	return false;
+}
+
+static void
+reset_quota_metadir_inodes(
+	struct xfs_mount	*mp)
+{
+	struct xfs_inode	*dp = NULL;
+	int			error;
+
+	error = -libxfs_dqinode_mkdir_parent(mp, &dp);
+	if (error)
+		do_error(_("failed to create quota metadir (%d)\n"),
+				error);
+
+	mark_ino_inuse(mp, dp->i_ino, S_IFDIR, mp->m_metadirip->i_ino);
+	mark_ino_metadata(mp, dp->i_ino);
+
+	ensure_quota_file(dp, XFS_DQTYPE_USER);
+	ensure_quota_file(dp, XFS_DQTYPE_GROUP);
+	ensure_quota_file(dp, XFS_DQTYPE_PROJ);
+	libxfs_irele(dp);
+}
+
 void
 phase6(xfs_mount_t *mp)
 {
@@ -3455,6 +3566,9 @@ phase6(xfs_mount_t *mp)
 	else
 		reset_rt_sb_inodes(mp);
 
+	if (xfs_has_metadir(mp) && xfs_has_quota(mp) && !no_modify)
+		reset_quota_metadir_inodes(mp);
+
 	mark_standalone_inodes(mp);
 
 	do_log(_("        - traversing filesystem ...\n"));
diff --git a/repair/quotacheck.c b/repair/quotacheck.c
index c4baf70e41d6b1..8c7339b267d8e6 100644
--- a/repair/quotacheck.c
+++ b/repair/quotacheck.c
@@ -645,3 +645,74 @@ update_sb_quotinos(
 	if (dirty)
 		libxfs_sb_to_disk(sbp->b_addr, &mp->m_sb);
 }
+
+static inline int
+mark_quota_inode(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	xfs_dqtype_t		type)
+{
+	struct xfs_inode	*ip;
+	int			error;
+
+	error = -libxfs_dqinode_load(tp, dp, type, &ip);
+	if (error == ENOENT)
+		return 0;
+	if (error)
+		goto out_corrupt;
+
+	set_quota_inode(type, ip->i_ino);
+	libxfs_irele(ip);
+	return 0;
+
+out_corrupt:
+	lose_quota_inode(type);
+	return error;
+}
+
+/* Mark the reachable quota metadata inodes prior to the inode scan. */
+void
+discover_quota_inodes(
+	struct xfs_mount	*mp)
+{
+	struct xfs_trans	*tp;
+	struct xfs_inode	*dp = NULL;
+	int			error, err2;
+
+	error = -libxfs_trans_alloc_empty(mp, &tp);
+	if (error)
+		goto out;
+
+	error = -libxfs_dqinode_load_parent(tp, &dp);
+	if (error)
+		goto out_cancel;
+
+	error = mark_quota_inode(tp, dp, XFS_DQTYPE_USER);
+	err2 = mark_quota_inode(tp, dp, XFS_DQTYPE_GROUP);
+	if (err2 && !error)
+		error = err2;
+	err2 = mark_quota_inode(tp, dp, XFS_DQTYPE_PROJ);
+	if (err2 && !error)
+		error = err2;
+
+	libxfs_irele(dp);
+out_cancel:
+	libxfs_trans_cancel(tp);
+out:
+	if (error) {
+		switch (error) {
+		case EFSCORRUPTED:
+			do_warn(
+ _("corruption in metadata directory tree while discovering quota inodes\n"));
+			break;
+		case ENOENT:
+			/* Do nothing, we'll just clear qflags later. */
+			break;
+		default:
+			do_warn(
+ _("couldn't discover quota inodes, err %d\n"),
+						error);
+			break;
+		}
+	}
+}
diff --git a/repair/quotacheck.h b/repair/quotacheck.h
index 36f9f5a12f7f3e..24c9ffa418a3bf 100644
--- a/repair/quotacheck.h
+++ b/repair/quotacheck.h
@@ -14,5 +14,6 @@ int quotacheck_setup(struct xfs_mount *mp);
 void quotacheck_teardown(void);
 
 void update_sb_quotinos(struct xfs_mount *mp, struct xfs_buf *sbp);
+void discover_quota_inodes(struct xfs_mount *mp);
 
 #endif /* __XFS_REPAIR_QUOTACHECK_H__ */
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 363f8260bd575a..9509f04685c870 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -1487,7 +1487,8 @@ _("Warning:  project quota information would be cleared.\n"
 	if (!sbp)
 		do_error(_("couldn't get superblock\n"));
 
-	update_sb_quotinos(mp, sbp);
+	if (!xfs_has_metadir(mp))
+		update_sb_quotinos(mp, sbp);
 
 	if ((mp->m_sb.sb_qflags & XFS_ALL_QUOTA_CHKD) != quotacheck_results()) {
 		do_warn(_("Note - quota info will be regenerated on next "


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 6/7] xfs_repair: try not to trash qflags on metadir filesystems
  2024-12-23 21:36 ` [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-12-23 22:26   ` [PATCH 5/7] xfs_repair: support quota inodes in the metadata directory Darrick J. Wong
@ 2024-12-23 22:26   ` Darrick J. Wong
  2024-12-23 22:26   ` [PATCH 7/7] mkfs: add quota flags when setting up filesystem Darrick J. Wong
  6 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:26 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Try to preserve the accounting and enforcement quota flags when
repairing filesystems.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 repair/agheader.c |    3 ++-
 repair/phase4.c   |   20 ++++++++++++++++++++
 repair/sb.c       |    3 +++
 3 files changed, 25 insertions(+), 1 deletion(-)
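
The metadir branch added to quota_sb_check() is upgrade-only: finding
quota inodes can turn the quota feature on, but nothing in that branch
turns it off.  A minimal standalone sketch of that decision follows,
assuming plain booleans in place of fs_quotas and has_quota_inode().
The sk_ name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the new value of the "filesystem has quotas" flag for a
 * metadir filesystem.  The flag is raised when any quota inode was
 * discovered, and is never lowered.
 */
static bool
sk_metadir_fs_quotas(bool fs_quotas, bool have_user, bool have_group,
		     bool have_proj)
{
	if (!fs_quotas && (have_user || have_group || have_proj))
		return true;
	return fs_quotas;
}
```

Note that a filesystem with the feature bit set but no quota inodes
keeps the bit, which is the "we do not ever downgrade" behaviour
described in the comment.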


diff --git a/repair/agheader.c b/repair/agheader.c
index e6fca07c6cb4c9..89a23a869a02e4 100644
--- a/repair/agheader.c
+++ b/repair/agheader.c
@@ -642,7 +642,8 @@ verify_set_agheader(xfs_mount_t *mp, struct xfs_buf *sbuf, xfs_sb_t *sb,
 			sb->sb_fdblocks = 0;
 			sb->sb_frextents = 0;
 
-			sb->sb_qflags = 0;
+			if (!xfs_has_metadir(mp))
+				sb->sb_qflags = 0;
 		}
 
 		rval |= XR_AG_SB;
diff --git a/repair/phase4.c b/repair/phase4.c
index a4183c557a1891..728d9ed84cdc7a 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -71,6 +71,26 @@ quotino_check(
 static void
 quota_sb_check(xfs_mount_t *mp)
 {
+	if (xfs_has_metadir(mp)) {
+		/*
+		 * Metadir filesystems try to preserve the quota accounting
+		 * and enforcement flags so that users don't have to remember
+		 * to supply quota mount options.  Phase 1 discovered the
+		 * QUOTABIT flag (fs_quotas) and phase 2 discovered the quota
+		 * inodes from the metadir for us.
+		 *
+		 * If QUOTABIT wasn't set but we found quota inodes, signal
+		 * phase 5 to add the feature bit for us.  We do not ever
+		 * downgrade the filesystem.
+		 */
+		if (!fs_quotas &&
+		    (has_quota_inode(XFS_DQTYPE_USER) ||
+		     has_quota_inode(XFS_DQTYPE_GROUP) ||
+		     has_quota_inode(XFS_DQTYPE_PROJ)))
+			fs_quotas = 1;
+		return;
+	}
+
 	/*
 	 * if the sb says we have quotas and we lost both,
 	 * signal a superblock downgrade.  that will cause
diff --git a/repair/sb.c b/repair/sb.c
index d52ab2ffeaf28c..0e4827e046780b 100644
--- a/repair/sb.c
+++ b/repair/sb.c
@@ -233,6 +233,9 @@ find_secondary_sb(xfs_sb_t *rsb)
         if (!retval)
                 retval = __find_secondary_sb(rsb, XFS_AG_MIN_BYTES, BSIZE);
 
+	if (retval && xfs_sb_version_hasmetadir(rsb))
+		do_warn(_("quota accounting and enforcement flags lost\n"));
+
 	return retval;
 }
 


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 7/7] mkfs: add quota flags when setting up filesystem
  2024-12-23 21:36 ` [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-12-23 22:26   ` [PATCH 6/7] xfs_repair: try not to trash qflags on metadir filesystems Darrick J. Wong
@ 2024-12-23 22:26   ` Darrick J. Wong
  6 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:26 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If we're creating a metadir filesystem, the quota accounting and
enforcement flags persist until the sysadmin changes them.  Add a means
to specify those qflags at format time.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 man/man8/mkfs.xfs.8.in |   48 ++++++++++++++++++++
 mkfs/xfs_mkfs.c        |  113 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 160 insertions(+), 1 deletion(-)
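
To make the option handling easier to review, here is a minimal
standalone sketch of how the six new -m suboptions map onto qflags and
how the metadir dependency is resolved.  It is an illustration only.
The sk_ names are hypothetical, the SK_* bit values are placeholders
rather than the real XFS_*QUOTA_* constants, and the table-driven lookup
is my simplification of the switch statement in meta_opts_parser().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bits; the real flags come from the libxfs headers. */
#define SK_UQUOTA_ACCT	(1u << 0)
#define SK_UQUOTA_ENFD	(1u << 1)
#define SK_GQUOTA_ACCT	(1u << 2)
#define SK_GQUOTA_ENFD	(1u << 3)
#define SK_PQUOTA_ACCT	(1u << 4)
#define SK_PQUOTA_ENFD	(1u << 5)

struct sk_quota_opt {
	const char	*name;
	uint16_t	flags;
};

/* "xquota" means accounting plus enforcement, "xqnoenforce" accounting only. */
static const struct sk_quota_opt sk_quota_opts[] = {
	{ "uquota",	 SK_UQUOTA_ACCT | SK_UQUOTA_ENFD },
	{ "gquota",	 SK_GQUOTA_ACCT | SK_GQUOTA_ENFD },
	{ "pquota",	 SK_PQUOTA_ACCT | SK_PQUOTA_ENFD },
	{ "uqnoenforce", SK_UQUOTA_ACCT },
	{ "gqnoenforce", SK_GQUOTA_ACCT },
	{ "pqnoenforce", SK_PQUOTA_ACCT },
};

/* OR the flags for one suboption into *qflags; -EINVAL if unknown. */
static int
sk_add_quota_opt(const char *name, uint16_t *qflags)
{
	size_t	i;

	for (i = 0; i < sizeof(sk_quota_opts) / sizeof(sk_quota_opts[0]); i++) {
		if (strcmp(name, sk_quota_opts[i].name) == 0) {
			*qflags |= sk_quota_opts[i].flags;
			return 0;
		}
	}
	return -EINVAL;
}

/*
 * Persistent qflags need metadir.  If the user explicitly passed
 * metadir=0 that is an error; otherwise metadir is switched on.
 */
static int
sk_resolve_metadir(uint16_t qflags, bool metadir_given, bool *metadir)
{
	if (!qflags || *metadir)
		return 0;
	if (metadir_given)
		return -EINVAL;
	*metadir = true;
	return 0;
}
```

With this model, "uquota,gqnoenforce" yields user accounting and
enforcement plus group accounting, and it quietly enables metadir unless
the command line said metadir=0.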


diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index 0c0cf1dc151e4f..32361cf973fcf8 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -340,6 +340,54 @@ .SH OPTIONS
 See the
 .BI xfs_scrub (8)
 manual page for more information on this property.
+.TP
+.B uquota
+If the metadata directory feature is enabled, the
+.B \-m uquota
+option will set up user quota accounting and enforcement at format time;
+specifying the quota options in fstab is no longer necessary.
+If metadata directories are not enabled, quotas must still be enabled via
+fstab.
+.TP
+.B gquota
+If the metadata directory feature is enabled, the
+.B \-m gquota
+option will set up group quota accounting and enforcement at format time;
+specifying the quota options in fstab is no longer necessary.
+If metadata directories are not enabled, quotas must still be enabled via
+fstab.
+.TP
+.B pquota
+If the metadata directory feature is enabled, the
+.B \-m pquota
+option will set up project quota accounting and enforcement at format time;
+specifying the quota options in fstab is no longer necessary.
+If metadata directories are not enabled, quotas must still be enabled via
+fstab.
+.TP
+.B uqnoenforce
+If the metadata directory feature is enabled, the
+.B \-m uqnoenforce
+option will set up user quota accounting at format time; specifying the quota
+options in fstab is no longer necessary.
+If metadata directories are not enabled, quotas must still be enabled via
+fstab.
+.TP
+.B gqnoenforce
+If the metadata directory feature is enabled, the
+.B \-m gqnoenforce
+option will set up group quota accounting at format time; specifying the quota
+options in fstab is no longer necessary.
+If metadata directories are not enabled, quotas must still be enabled via
+fstab.
+.TP
+.B pqnoenforce
+If the metadata directory feature is enabled, the
+.B \-m pqnoenforce
+option will set up project quota accounting at format time; specifying the
+quota options in fstab is no longer necessary.
+If metadata directories are not enabled, quotas must still be enabled via
+fstab.
 .RE
 .PP
 .PD 0
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index cd94cfd0b93706..a15c19df03a86d 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -153,6 +153,12 @@ enum {
 	M_BIGTIME,
 	M_AUTOFSCK,
 	M_METADIR,
+	M_UQUOTA,
+	M_GQUOTA,
+	M_PQUOTA,
+	M_UQNOENFORCE,
+	M_GQNOENFORCE,
+	M_PQNOENFORCE,
 	M_MAX_OPTS,
 };
 
@@ -833,6 +839,12 @@ static struct opt_params mopts = {
 		[M_BIGTIME] = "bigtime",
 		[M_AUTOFSCK] = "autofsck",
 		[M_METADIR] = "metadir",
+		[M_UQUOTA] = "uquota",
+		[M_GQUOTA] = "gquota",
+		[M_PQUOTA] = "pquota",
+		[M_UQNOENFORCE] = "uqnoenforce",
+		[M_GQNOENFORCE] = "gqnoenforce",
+		[M_PQNOENFORCE] = "pqnoenforce",
 		[M_MAX_OPTS] = NULL,
 	},
 	.subopt_params = {
@@ -888,6 +900,48 @@ static struct opt_params mopts = {
 		  .maxval = 1,
 		  .defaultval = 1,
 		},
+		{ .index = M_UQUOTA,
+		  .conflicts = { { &mopts, M_UQNOENFORCE },
+				 { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
+		{ .index = M_GQUOTA,
+		  .conflicts = { { &mopts, M_GQNOENFORCE },
+				 { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
+		{ .index = M_PQUOTA,
+		  .conflicts = { { &mopts, M_PQNOENFORCE },
+				 { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
+		{ .index = M_UQNOENFORCE,
+		  .conflicts = { { &mopts, M_UQUOTA },
+				 { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
+		{ .index = M_GQNOENFORCE,
+		  .conflicts = { { &mopts, M_GQUOTA },
+				 { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
+		{ .index = M_PQNOENFORCE,
+		  .conflicts = { { &mopts, M_PQUOTA },
+				 { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
 	},
 };
 
@@ -945,6 +999,8 @@ struct sb_feat_args {
 	bool	nortalign;
 	bool	nrext64;
 	bool	exchrange;		/* XFS_SB_FEAT_INCOMPAT_EXCHRANGE */
+
+	uint16_t qflags;
 };
 
 struct cli_params {
@@ -1083,6 +1139,8 @@ usage( void )
 /* metadata */		[-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1,reflink=0|1,\n\
 			    inobtcount=0|1,bigtime=0|1,autofsck=xxx,\n\
 			    metadir=0|1]\n\
+/* quota */		[-m uquota|uqnoenforce,gquota|gqnoenforce,\n\
+			    pquota|pqnoenforce]\n\
 /* data subvol */	[-d agcount=n,agsize=n,file,name=xxx,size=num,\n\
 			    (sunit=value,swidth=value|su=num,sw=num|noalign),\n\
 			    sectsize=num,concurrency=num]\n\
@@ -1920,6 +1978,30 @@ meta_opts_parser(
 	case M_METADIR:
 		cli->sb_feat.metadir = getnum(value, opts, subopt);
 		break;
+	case M_UQUOTA:
+		if (getnum(value, opts, subopt))
+			cli->sb_feat.qflags |= XFS_UQUOTA_ACCT | XFS_UQUOTA_ENFD;
+		break;
+	case M_GQUOTA:
+		if (getnum(value, opts, subopt))
+			cli->sb_feat.qflags |= XFS_GQUOTA_ACCT | XFS_GQUOTA_ENFD;
+		break;
+	case M_PQUOTA:
+		if (getnum(value, opts, subopt))
+			cli->sb_feat.qflags |= XFS_PQUOTA_ACCT | XFS_PQUOTA_ENFD;
+		break;
+	case M_UQNOENFORCE:
+		if (getnum(value, opts, subopt))
+			cli->sb_feat.qflags |= XFS_UQUOTA_ACCT;
+		break;
+	case M_GQNOENFORCE:
+		if (getnum(value, opts, subopt))
+			cli->sb_feat.qflags |= XFS_GQUOTA_ACCT;
+		break;
+	case M_PQNOENFORCE:
+		if (getnum(value, opts, subopt))
+			cli->sb_feat.qflags |= XFS_PQUOTA_ACCT;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2516,6 +2598,12 @@ _("metadata directory not supported without CRC support\n"));
 			usage();
 		}
 		cli->sb_feat.metadir = false;
+
+		if (cli->sb_feat.qflags) {
+			fprintf(stderr,
+_("persistent quota flags not supported without CRC support\n"));
+			usage();
+		}
 	}
 
 	if (!cli->sb_feat.finobt) {
@@ -2561,6 +2649,26 @@ _("cowextsize not supported without reflink support\n"));
 		cli->sb_feat.exchrange = true;
 	}
 
+	if (cli->sb_feat.qflags && cli->xi->rt.name) {
+		fprintf(stderr,
+_("persistent quota flags not supported with realtime volumes\n"));
+				usage();
+	}
+
+	/*
+	 * Persistent quota flags requires metadir support because older
+	 * kernels (or current kernels with old filesystems) will reset qflags
+	 * in the absence of any quota mount options.
+	 */
+	if (cli->sb_feat.qflags && !cli->sb_feat.metadir) {
+		if (cli_opt_set(&mopts, M_METADIR)) {
+			fprintf(stderr,
+_("persistent quota flags not supported without metadir support\n"));
+			usage();
+		}
+		cli->sb_feat.metadir = true;
+	}
+
 	/*
 	 * Exchange-range will be needed for space reorganization on filesystems
 	 * with realtime rmap or realtime reflink enabled, and there is no good
@@ -3811,6 +3919,9 @@ sb_set_features(
 	if (fp->dirftype && !fp->crcs_enabled)
 		sbp->sb_features2 |= XFS_SB_VERSION2_FTYPE;
 
+	if (fp->qflags)
+		sbp->sb_versionnum |= XFS_SB_VERSION_QUOTABIT;
+
 	/* update whether extended features are in use */
 	if (sbp->sb_features2 != 0)
 		sbp->sb_versionnum |= XFS_SB_VERSION_MOREBITSBIT;
@@ -4337,7 +4448,7 @@ finish_superblock_setup(
 			   (cfg->loginternal ? cfg->logblocks : 0);
 	sbp->sb_frextents = 0;	/* will do a free later */
 	sbp->sb_uquotino = sbp->sb_gquotino = sbp->sb_pquotino = 0;
-	sbp->sb_qflags = 0;
+	sbp->sb_qflags = cfg->sb_feat.qflags;
 	sbp->sb_unit = cfg->dsunit;
 	sbp->sb_width = cfg->dswidth;
 	mp->m_features |= libxfs_sb_version_to_features(sbp);


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 1/2] xfs_quota: report warning limits for realtime space quotas
  2024-12-23 21:36 ` [PATCHSET v6.2 8/8] xfsprogs: enable quota for realtime volumes Darrick J. Wong
@ 2024-12-23 22:27   ` Darrick J. Wong
  2024-12-23 22:27   ` [PATCH 2/2] mkfs: enable rt quota options Darrick J. Wong
  1 sibling, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:27 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Report the number of warnings that a user will get for exceeding the
soft limit of a realtime volume.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/xqm.h |    5 ++++-
 quota/state.c |    1 +
 2 files changed, 5 insertions(+), 1 deletion(-)
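
The xqm.h hunk carves the new field out of the reserved area, so the
structure size should not change: the old qs_pad2[8] is 64 bytes, and
the replacement is 2 + 2 + 4 + 7 * 8 = 64 bytes.  Below is a minimal
standalone sketch that checks this at compile time.  It models only the
tail of the structure shown in the hunk, with fixed-width stdint types
assumed in place of __s32/__u16/__u32/__u64.  The sk_ struct names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tail of the old layout: eight reserved 64-bit words. */
struct sk_statv_tail_old {
	int32_t		qs_rtbtimelimit;
	uint16_t	qs_bwarnlimit;
	uint16_t	qs_iwarnlimit;
	uint64_t	qs_pad2[8];
};

/* Tail of the new layout: one reserved word split into new fields. */
struct sk_statv_tail_new {
	int32_t		qs_rtbtimelimit;
	uint16_t	qs_bwarnlimit;
	uint16_t	qs_iwarnlimit;
	uint16_t	qs_rtbwarnlimit;
	uint16_t	qs_pad3;
	uint32_t	qs_pad4;
	uint64_t	qs_pad2[7];
};

/* The overall size must stay the same for the ioctl ABI to hold. */
_Static_assert(sizeof(struct sk_statv_tail_old) ==
	       sizeof(struct sk_statv_tail_new),
	       "tail size changed");

/* The new field lands exactly where the reserved area used to start. */
_Static_assert(offsetof(struct sk_statv_tail_new, qs_rtbwarnlimit) ==
	       offsetof(struct sk_statv_tail_old, qs_pad2),
	       "new field is not at the start of the old padding");
```

If the old reserved area was zeroed by whoever filled the structure, a
new xfs_quota reading qs_rtbwarnlimit from such a structure would see 0
there.  That is an assumption about the producer, not something this
sketch verifies.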


diff --git a/include/xqm.h b/include/xqm.h
index 573441db98601a..045af9b67fdf2b 100644
--- a/include/xqm.h
+++ b/include/xqm.h
@@ -184,7 +184,10 @@ struct fs_quota_statv {
 	__s32			qs_rtbtimelimit;/* limit for rt blks timer */
 	__u16			qs_bwarnlimit;	/* limit for num warnings */
 	__u16			qs_iwarnlimit;	/* limit for num warnings */
-	__u64			qs_pad2[8];	/* for future proofing */
+	__u16			qs_rtbwarnlimit;/* limit for rt blks warnings */
+	__u16			qs_pad3;
+	__u32			qs_pad4;
+	__u64			qs_pad2[7];	/* for future proofing */
 };
 
 #endif	/* __XQM_H__ */
diff --git a/quota/state.c b/quota/state.c
index 260ef51db18072..43fb700f9a7317 100644
--- a/quota/state.c
+++ b/quota/state.c
@@ -244,6 +244,7 @@ state_quotafile_stat(
 	state_warnlimit(fp, XFS_INODE_QUOTA, sv->qs_iwarnlimit);
 
 	state_timelimit(fp, XFS_RTBLOCK_QUOTA, sv->qs_rtbtimelimit);
+	state_warnlimit(fp, XFS_RTBLOCK_QUOTA, sv->qs_rtbwarnlimit);
 }
 
 static void


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 2/2] mkfs: enable rt quota options
  2024-12-23 21:36 ` [PATCHSET v6.2 8/8] xfsprogs: enable quota for realtime volumes Darrick J. Wong
  2024-12-23 22:27   ` [PATCH 1/2] xfs_quota: report warning limits for realtime space quotas Darrick J. Wong
@ 2024-12-23 22:27   ` Darrick J. Wong
  1 sibling, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:27 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that the kernel supports quota and realtime devices, allow people to
format filesystems with permanent quota options.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mkfs/xfs_mkfs.c |    6 ------
 1 file changed, 6 deletions(-)


diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index a15c19df03a86d..956cc295489342 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2649,12 +2649,6 @@ _("cowextsize not supported without reflink support\n"));
 		cli->sb_feat.exchrange = true;
 	}
 
-	if (cli->sb_feat.qflags && cli->xi->rt.name) {
-		fprintf(stderr,
-_("persistent quota flags not supported with realtime volumes\n"));
-				usage();
-	}
-
 	/*
 	 * Persistent quota flags requires metadir support because older
 	 * kernels (or current kernels with old filesystems) will reset qflags


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [GIT PULL 1/8] xfsprogs: bug fixes for 6.12
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (7 preceding siblings ...)
  2024-12-23 21:36 ` [PATCHSET v6.2 8/8] xfsprogs: enable quota for realtime volumes Darrick J. Wong
@ 2024-12-23 22:27 ` Darrick J. Wong
  2024-12-23 22:27 ` [GIT PULL 2/8] libxfs: metadata inode directory trees Darrick J. Wong
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:27 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: hch, jpalus, linux-xfs

Hi Andrey,

Please pull this branch with changes for xfsprogs for 6.11-rc1.

As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts.  Please let me know if you
encounter any problems.

--D

The following changes since commit 409477af604f465169be9b2cbe259fe382f052ae:

xfs_io: add support for atomic write statx fields (2024-11-21 14:43:58 +0100)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git tags/xfs-6.12-fixes_2024-12-23

for you to fetch changes up to 50e3f6684fe5adb4138ec5882b316c00524a6051:

man: document the -n parent mkfs option (2024-12-23 13:05:06 -0800)

----------------------------------------------------------------
xfsprogs: bug fixes for 6.12 [01/23]

Bug fixes for 6.12.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

----------------------------------------------------------------
Andrey Albershteyn (1):
xfsprogs: Release v6.12.0

Darrick J. Wong (2):
xfs_repair: fix maximum file offset comparison
man: document the -n parent mkfs option

Jan Palus (1):
man: fix ioctl_xfs_commit_range man page install

VERSION                           |  2 +-
configure.ac                      |  2 +-
debian/changelog                  |  6 ++++++
doc/CHANGES                       | 21 +++++++++++++++++++++
man/man2/ioctl_xfs_commit_range.2 |  2 +-
man/man8/mkfs.xfs.8.in            | 12 ++++++++++++
repair/dinode.c                   |  2 +-
repair/globals.c                  |  1 -
repair/globals.h                  |  1 -
repair/prefetch.c                 |  2 +-
repair/versions.c                 |  7 +------
11 files changed, 45 insertions(+), 13 deletions(-)


^ permalink raw reply	[flat|nested] 213+ messages in thread

* [GIT PULL 2/8] libxfs: metadata inode directory trees
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (8 preceding siblings ...)
  2024-12-23 22:27 ` [GIT PULL 1/8] xfsprogs: bug fixes for 6.12 Darrick J. Wong
@ 2024-12-23 22:27 ` Darrick J. Wong
  2024-12-23 22:28 ` [GIT PULL 3/8] xfsprogs: " Darrick J. Wong
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:27 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: cem, dchinner, hch, leo.lilong, linux-xfs

Hi Andrey,

Please pull this branch with changes for xfsprogs for 6.11-rc1.

As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts.  Please let me know if you
encounter any problems.

--D

The following changes since commit 50e3f6684fe5adb4138ec5882b316c00524a6051:

man: document the -n parent mkfs option (2024-12-23 13:05:06 -0800)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git tags/metadata-directory-tree-sync_2024-12-23

for you to fetch changes up to 1d6b5c7e0476de97a15123768513cb3bb10803c7:

xfs: check metadata directory file path connectivity (2024-12-23 13:05:08 -0800)

----------------------------------------------------------------
libxfs: metadata inode directory trees [v6.2 02/23]

Synchronize libxfs with the kernel.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

----------------------------------------------------------------
Christoph Hellwig (22):
xfs: remove the unused pagb_count field in struct xfs_perag
xfs: remove the unused pag_active_wq field in struct xfs_perag
xfs: pass a pag to xfs_difree_inode_chunk
xfs: remove the agno argument to xfs_free_ag_extent
xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers
xfs: add a xfs_agino_to_ino helper
xfs: pass a pag to xfs_extent_busy_{search,reuse}
xfs: pass a perag structure to the xfs_ag_resv_init_error trace point
xfs: pass objects to the xfs_irec_merge_{pre,post} trace points
xfs: convert remaining trace points to pass pag structures
xfs: split xfs_initialize_perag
xfs: insert the pag structures into the xarray later
xfs: factor out a generic xfs_group structure
xfs: add a xfs_group_next_range helper
xfs: switch perag iteration from the for_each macros to a while based iterator
xfs: move metadata health tracking to the generic group structure
xfs: move draining of deferred operations to the generic group structure
xfs: move the online repair rmap hooks to the generic group structure
xfs: convert busy extent tracking to the generic group structure
xfs: add a generic group pointer to the btree cursor
xfs: add group based bno conversion helpers
xfs: store a generic group structure in the intents

Darrick J. Wong (12):
xfs: constify the xfs_sb predicates
xfs: rename metadata inode predicates
xfs: define the on-disk format for the metadir feature
xfs: iget for metadata inodes
xfs: enforce metadata inode flag
xfs: read and write metadata inode directory tree
xfs: disable the agi rotor for metadata inodes
xfs: advertise metadata directory feature
xfs: allow bulkstat to return metadata directories
xfs: adjust xfs_bmap_add_attrfork for metadir
xfs: record health problems with the metadata directory
xfs: check metadata directory file path connectivity

Dave Chinner (1):
xfs: sb_spino_align is not verified

Long Li (1):
xfs: remove the redundant xfs_alloc_log_agf

db/check.c                  |   2 +-
db/fsmap.c                  |  10 +-
db/info.c                   |   7 +-
db/inode.c                  |   4 +-
db/iunlink.c                |   6 +-
include/libxfs.h            |   2 +
include/xfs_inode.h         |  11 +
include/xfs_mount.h         |  45 ++++-
include/xfs_trace.h         |  31 ++-
include/xfs_trans.h         |   3 +
libfrog/radix-tree.h        |   9 +
libxfs/Makefile             |   6 +
libxfs/defer_item.c         |  35 ++--
libxfs/init.c               |   8 +-
libxfs/inode.c              |  55 +++++
libxfs/iunlink.c            |  11 +-
libxfs/libxfs_api_defs.h    |  15 +-
libxfs/libxfs_priv.h        |  10 +-
libxfs/trans.c              |  39 ++++
libxfs/util.c               |   4 +-
libxfs/xfs_ag.c             | 246 ++++++++---------------
libxfs/xfs_ag.h             | 189 ++++++++++-------
libxfs/xfs_ag_resv.c        |  22 +-
libxfs/xfs_alloc.c          | 104 +++++-----
libxfs/xfs_alloc.h          |   7 +-
libxfs/xfs_alloc_btree.c    |  30 +--
libxfs/xfs_attr.c           |   5 +-
libxfs/xfs_bmap.c           |   7 +-
libxfs/xfs_bmap.h           |   2 +-
libxfs/xfs_btree.c          |  38 ++--
libxfs/xfs_btree.h          |   3 +-
libxfs/xfs_btree_mem.c      |   6 +-
libxfs/xfs_format.h         | 121 ++++++++---
libxfs/xfs_fs.h             |  25 ++-
libxfs/xfs_group.c          | 223 ++++++++++++++++++++
libxfs/xfs_group.h          | 131 ++++++++++++
libxfs/xfs_health.h         |  51 ++---
libxfs/xfs_ialloc.c         | 175 ++++++++--------
libxfs/xfs_ialloc_btree.c   |  29 +--
libxfs/xfs_inode_buf.c      |  90 ++++++++-
libxfs/xfs_inode_buf.h      |   3 +
libxfs/xfs_inode_util.c     |   6 +-
libxfs/xfs_log_format.h     |   2 +-
libxfs/xfs_metadir.c        | 480 ++++++++++++++++++++++++++++++++++++++++++++
libxfs/xfs_metadir.h        |  47 +++++
libxfs/xfs_metafile.c       |  52 +++++
libxfs/xfs_metafile.h       |  31 +++
libxfs/xfs_ondisk.h         |   2 +-
libxfs/xfs_refcount.c       |  33 ++-
libxfs/xfs_refcount.h       |   2 +-
libxfs/xfs_refcount_btree.c |  17 +-
libxfs/xfs_rmap.c           |  42 ++--
libxfs/xfs_rmap.h           |   6 +-
libxfs/xfs_rmap_btree.c     |  28 +--
libxfs/xfs_sb.c             |  54 +++--
libxfs/xfs_types.c          |   9 +-
libxfs/xfs_types.h          |  10 +-
repair/agbtree.c            |  27 ++-
repair/bmap_repair.c        |  11 +-
repair/bulkload.c           |   9 +-
repair/dino_chunks.c        |   2 +-
repair/dinode.c             |  12 +-
repair/phase2.c             |  17 +-
repair/phase5.c             |   6 +-
repair/rmap.c               |  12 +-
65 files changed, 2032 insertions(+), 705 deletions(-)
create mode 100644 libxfs/xfs_group.c
create mode 100644 libxfs/xfs_group.h
create mode 100644 libxfs/xfs_metadir.c
create mode 100644 libxfs/xfs_metadir.h
create mode 100644 libxfs/xfs_metafile.c
create mode 100644 libxfs/xfs_metafile.h



* [GIT PULL 3/8] xfsprogs: metadata inode directory trees
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (9 preceding siblings ...)
  2024-12-23 22:27 ` [GIT PULL 2/8] libxfs: metadata inode directory trees Darrick J. Wong
@ 2024-12-23 22:28 ` Darrick J. Wong
  2024-12-23 22:28 ` [GIT PULL 4/8] mkfs: make protofiles less janky Darrick J. Wong
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:28 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: hch, linux-xfs

Hi Andrey,

Please pull this branch with changes for xfsprogs 6.13.

As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts.  Please let me know if you
encounter any problems.

--D

The following changes since commit 1d6b5c7e0476de97a15123768513cb3bb10803c7:

xfs: check metadata directory file path connectivity (2024-12-23 13:05:08 -0800)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git tags/metadata-directory-tree_2024-12-23

for you to fetch changes up to cbb4fe589532389c8ae6a4e3018707d493b8c5f3:

mkfs.xfs: enable metadata directories (2024-12-23 13:05:10 -0800)

----------------------------------------------------------------
xfsprogs: metadata inode directory trees [v6.2 03/23]

This series delivers a new feature -- metadata inode directories.  This
is a separate directory tree (rooted in the superblock) that contains
only inodes that contain filesystem metadata.  Different metadata
objects can be looked up with regular paths.

We start by creating xfs_imeta_* functions to mediate access to metadata
inode pointers.  This enables the imeta code to abstract inode pointers,
whether they're the classic five in the superblock, or the much more
complex directory tree.  All current users of metadata inodes (rt+quota)
are converted to use the boilerplate code.

Next, we define the metadir on-disk format, which consists of marking
inodes with a new iflag that says they're metadata.  This we use to
prevent bulkstat and friends from ever getting their hands on fs
metadata.
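To make the role of the metadata flag concrete, here is a minimal sketch of a bulkstat-style filter.  It is a toy model and not the xfsprogs or kernel code: struct toy_inode, TOY_DIFLAG_METADATA, and toy_bulkstat() are all hypothetical names, and the want_metadata opt-in is an assumption loosely modeled on the "allow bulkstat to return metadata directories" patch title.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DIFLAG_METADATA	(1u << 0)	/* hypothetical "is metadata" bit */

/* Hypothetical stand-in for an inode record; not an XFS structure. */
struct toy_inode {
	uint64_t	ino;
	uint32_t	flags;
};

/*
 * Copy the inode numbers that a bulkstat-style caller may see into out[].
 * Inodes carrying the metadata flag are skipped unless want_metadata is
 * nonzero.  Returns the number of entries stored in out[].
 */
static size_t
toy_bulkstat(const struct toy_inode *inodes, size_t nr, int want_metadata,
	     uint64_t *out, size_t out_len)
{
	size_t	i;
	size_t	n = 0;

	assert(inodes != NULL && out != NULL);

	for (i = 0; i < nr && n < out_len; i++) {
		int	is_meta = (inodes[i].flags & TOY_DIFLAG_METADATA) != 0;

		/* Ordinary callers never get to see filesystem metadata. */
		if (is_meta && !want_metadata)
			continue;
		out[n++] = inodes[i].ino;
	}
	return n;
}
```

With four inodes of which two are flagged, a call with want_metadata set to 0 stores only the two unflagged inode numbers, while a call with want_metadata set to 1 stores all four.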

Finally, we implement metadir operations so that clients can create,
delete, zap, and look up metadata inodes by path.  Beware that much of
this code is only lightly used, because the five current users of
metadata inodes don't tend to change them very often.  This is likely to
change if and when the subvolume and multiple-rt-volume features get
written/merged/etc.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

----------------------------------------------------------------
Christoph Hellwig (1):
xfs_repair: refactor generate_rtinfo

Darrick J. Wong (40):
libxfs: constify the xfs_inode predicates
libxfs: load metadata directory root at mount time
libxfs: enforce metadata inode flag
man2: document metadata directory flag in fsgeom ioctl
man: update scrub ioctl documentation for metadir
libfrog: report metadata directories in the geometry report
libfrog: allow METADIR in xfrog_bulkstat_single5
xfs_io: support scrubbing metadata directory paths
xfs_db: disable xfs_check when metadir is enabled
xfs_db: report metadir support for version command
xfs_db: don't obfuscate metadata directories and attributes
xfs_db: support metadata directories in the path command
xfs_db: show the metadata root directory when dumping superblocks
xfs_db: display di_metatype
xfs_db: drop the metadata checking code from blockget
xfs_io: support flag for limited bulkstat of the metadata directory
xfs_io: support scrubbing metadata directory paths
xfs_spaceman: report health of metadir inodes too
xfs_scrub: tread zero-length read verify as an IO error
xfs_scrub: scan metadata directories during phase 3
xfs_scrub: re-run metafile scrubbers during phase 5
xfs_repair: handle sb_metadirino correctly when zeroing supers
xfs_repair: dont check metadata directory dirent inumbers
xfs_repair: refactor fixing dotdot
xfs_repair: refactor marking of metadata inodes
xfs_repair: refactor root directory initialization
xfs_repair: refactor grabbing realtime metadata inodes
xfs_repair: check metadata inode flag
xfs_repair: use libxfs_metafile_iget for quota/rt inodes
xfs_repair: rebuild the metadata directory
xfs_repair: don't let metadata and regular files mix
xfs_repair: update incore metadata state whenever we create new files
xfs_repair: pass private data pointer to scan_lbtree
xfs_repair: mark space used by metadata files
xfs_repair: adjust keep_fsinos to handle metadata directories
xfs_repair: metadata dirs are never plausible root dirs
xfs_repair: drop all the metadata directory files during pass 4
xfs_repair: truncate and unmark orphaned metadata inodes
xfs_repair: do not count metadata directory files when doing quotacheck
mkfs.xfs: enable metadata directories

db/check.c                          | 290 +------------------
db/field.c                          |   2 +
db/field.h                          |   1 +
db/inode.c                          |  86 +++++-
db/inode.h                          |   2 +
db/metadump.c                       | 385 ++++++++++++-------------
db/namei.c                          |  71 ++++-
db/sb.c                             |  16 ++
include/xfs_inode.h                 |  12 +-
include/xfs_mount.h                 |   1 +
io/bulkstat.c                       |  16 +-
io/scrub.c                          |  62 ++++-
libfrog/bulkstat.c                  |   3 +-
libfrog/fsgeom.c                    |   6 +-
libfrog/scrub.c                     |  14 +-
libfrog/scrub.h                     |   2 +
libxfs/init.c                       |  26 ++
libxfs/inode.c                      |   9 +-
libxfs/libxfs_api_defs.h            |   4 +
man/man2/ioctl_xfs_fsgeometry.2     |   3 +
man/man2/ioctl_xfs_scrub_metadata.2 |  44 +++
man/man8/mkfs.xfs.8.in              |  11 +
man/man8/xfs_db.8                   |  35 ++-
man/man8/xfs_io.8                   |  13 +-
mkfs/lts_4.19.conf                  |   1 +
mkfs/lts_5.10.conf                  |   1 +
mkfs/lts_5.15.conf                  |   1 +
mkfs/lts_5.4.conf                   |   1 +
mkfs/lts_6.1.conf                   |   1 +
mkfs/lts_6.12.conf                  |   1 +
mkfs/lts_6.6.conf                   |   1 +
mkfs/proto.c                        |  68 ++++-
mkfs/xfs_mkfs.c                     |  33 ++-
repair/agheader.c                   |  11 +-
repair/dino_chunks.c                |  43 +++
repair/dinode.c                     | 196 +++++++++++--
repair/dinode.h                     |   6 +-
repair/dir2.c                       |  51 +++-
repair/globals.c                    |   8 +-
repair/globals.h                    |   8 +-
repair/incore.h                     |  63 ++++-
repair/incore_ino.c                 |   1 +
repair/phase1.c                     |   2 +
repair/phase2.c                     |  58 +++-
repair/phase4.c                     |  18 ++
repair/phase5.c                     |  12 +-
repair/phase6.c                     | 541 ++++++++++++++++++++++++------------
repair/pptr.c                       |  94 +++++++
repair/pptr.h                       |   2 +
repair/quotacheck.c                 |  22 +-
repair/rt.c                         | 189 ++++++++-----
repair/rt.h                         |  12 +-
repair/sb.c                         |   3 +
repair/scan.c                       |  43 ++-
repair/scan.h                       |   7 +-
repair/xfs_repair.c                 |  56 ++++
scrub/inodes.c                      |  11 +-
scrub/inodes.h                      |   5 +-
scrub/phase3.c                      |   7 +-
scrub/phase5.c                      | 102 ++++++-
scrub/phase6.c                      |  24 +-
scrub/read_verify.c                 |   8 +
scrub/scrub.c                       |  18 ++
scrub/scrub.h                       |   7 +
spaceman/health.c                   |   2 +
65 files changed, 1949 insertions(+), 903 deletions(-)


* [GIT PULL 4/8] mkfs: make protofiles less janky
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (10 preceding siblings ...)
  2024-12-23 22:28 ` [GIT PULL 3/8] xfsprogs: " Darrick J. Wong
@ 2024-12-23 22:28 ` Darrick J. Wong
  2024-12-23 22:28 ` [GIT PULL 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:28 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: hch, linux-xfs

Hi Andrey,

Please pull this branch with changes for xfsprogs 6.13.

As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts.  Please let me know if you
encounter any problems.

--D

The following changes since commit cbb4fe589532389c8ae6a4e3018707d493b8c5f3:

mkfs.xfs: enable metadata directories (2024-12-23 13:05:10 -0800)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git tags/protofiles_2024-12-23

for you to fetch changes up to 513300e9565b0d446ac8e6a3a990444d766c728b:

mkfs: add a utility to generate protofiles (2024-12-23 13:05:10 -0800)

----------------------------------------------------------------
mkfs: make protofiles less janky [v6.2 04/23]

Here are some minor improvements to the protofile code in mkfs.  First,
we improve the file copy-in code to avoid allocating gigantic heap
buffers by replacing it with a SEEK_{DATA,HOLE} loop and only copying
a fixed size buffer.  Next, we add the ability to pull in extended
attributes.  Finally, we add a program to generate a protofile from a
directory tree.
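For readers unfamiliar with the technique, the SEEK_{DATA,HOLE} copy loop mentioned above can be sketched roughly as follows.  This is a standalone illustration under stated assumptions, not the code in mkfs/proto.c: it copies between two ordinary file descriptors rather than into a filesystem image, and toy_copy_sparse(), toy_pwrite_all(), and TOY_COPY_BUFSIZE are hypothetical names.

```c
#define _GNU_SOURCE		/* exposes SEEK_DATA and SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>		/* open(2), for callers that obtain the fds */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_COPY_BUFSIZE	65536	/* hypothetical fixed buffer size */

/* Write all of buf at offset pos, retrying after short writes. */
static int
toy_pwrite_all(int fd, const char *buf, size_t len, off_t pos)
{
	while (len > 0) {
		ssize_t	ret = pwrite(fd, buf, len, pos);

		if (ret < 0)
			return -1;
		buf += ret;
		pos += ret;
		len -= ret;
	}
	return 0;
}

/*
 * Copy the data regions of src_fd to the same offsets in dst_fd and leave
 * the holes unwritten.  Returns 0 on success, -1 with errno set on error.
 * The static buffer keeps memory use constant but is not thread safe.
 */
static int
toy_copy_sparse(int src_fd, int dst_fd)
{
	static char	buf[TOY_COPY_BUFSIZE];
	struct stat	st;
	off_t		data = 0;
	off_t		hole;

	assert(src_fd >= 0 && dst_fd >= 0);

	for (;;) {
		/* Find the next data region at or after the current offset. */
		data = lseek(src_fd, data, SEEK_DATA);
		if (data < 0) {
			if (errno != ENXIO)
				return -1;
			break;		/* ENXIO: nothing but a hole until EOF */
		}

		/* Find the end of that region; EOF counts as a hole. */
		hole = lseek(src_fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		while (data < hole) {
			size_t	want = TOY_COPY_BUFSIZE;
			ssize_t	got;

			if ((off_t)want > hole - data)
				want = hole - data;
			got = pread(src_fd, buf, want, data);
			if (got < 0)
				return -1;
			if (got == 0)
				break;	/* file shrank while we were copying */
			if (toy_pwrite_all(dst_fd, buf, got, data) < 0)
				return -1;
			data += got;
		}
	}

	/* Extend the destination so that a trailing hole is preserved. */
	if (fstat(src_fd, &st) < 0)
		return -1;
	return ftruncate(dst_fd, st.st_size);
}
```

Memory use stays bounded by the buffer size no matter how large the source file is, and ranges that the source reports as holes are never written to the destination.  On filesystems that do not track holes, lseek() reports the whole file as data, so the loop degrades to a plain buffered copy.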

Originally this code refactoring was intended to support more efficient
construction of rt bitmap and summary files, but that was subsequently
refactored into libxfs.  However, the cleanups still make the protofile
code less disgusting, so they're being submitted anyway.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

----------------------------------------------------------------
Darrick J. Wong (4):
libxfs: resync libxfs_alloc_file_space interface with the kernel
mkfs: support copying in large or sparse files
mkfs: support copying in xattrs
mkfs: add a utility to generate protofiles

include/libxfs.h         |   6 +-
libxfs/util.c            | 207 +++++++++++++++++++++++++++---------
man/man8/xfs_protofile.8 |  33 ++++++
mkfs/Makefile            |  10 +-
mkfs/proto.c             | 269 +++++++++++++++++++++++++++++++++++------------
mkfs/xfs_protofile.in    | 152 ++++++++++++++++++++++++++
6 files changed, 557 insertions(+), 120 deletions(-)
create mode 100644 man/man8/xfs_protofile.8
create mode 100644 mkfs/xfs_protofile.in


* [GIT PULL 5/8] xfsprogs: new code for 6.13
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (11 preceding siblings ...)
  2024-12-23 22:28 ` [GIT PULL 4/8] mkfs: make protofiles less janky Darrick J. Wong
@ 2024-12-23 22:28 ` Darrick J. Wong
  2024-12-23 22:28 ` [GIT PULL 6/8] xfsprogs: shard the realtime section Darrick J. Wong
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:28 UTC (permalink / raw)
  To: aalbersh, djwong
  Cc: brauner, cem, dan.carpenter, dchinner, hch, jlayton, josef,
	leo.lilong, linux-xfs, rdunlap, stable

Hi Andrey,

Please pull this branch with changes for xfsprogs 6.13.

As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts.  Please let me know if you
encounter any problems.

--D

The following changes since commit 513300e9565b0d446ac8e6a3a990444d766c728b:

mkfs: add a utility to generate protofiles (2024-12-23 13:05:10 -0800)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git tags/xfs-6.13-merge_2024-12-23

for you to fetch changes up to 80b81c84f015ee01fed80c32184cc763ee1a655e:

xfs: return from xfs_symlink_verify early on V4 filesystems (2024-12-23 13:05:13 -0800)

----------------------------------------------------------------
xfsprogs: new code for 6.13 [05/23]

New code for 6.13.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

----------------------------------------------------------------
Christoph Hellwig (9):
xfs: add a xfs_bmap_free_rtblocks helper
xfs: move RT bitmap and summary information to the rtgroup
xfs: support creating per-RTG files in growfs
xfs: refactor xfs_rtbitmap_blockcount
xfs: refactor xfs_rtsummary_blockcount
xfs: make RT extent numbers relative to the rtgroup
xfs: add a helper to prevent bmap merges across rtgroup boundaries
xfs: make the RT allocator rtgroup aware
xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay

Darrick J. Wong (40):
xfs: create incore realtime group structures
xfs: define locking primitives for realtime groups
xfs: add a lockdep class key for rtgroup inodes
xfs: support caching rtgroup metadata inodes
libfrog: add memchr_inv
xfs: define the format of rt groups
xfs: update realtime super every time we update the primary fs super
xfs: export realtime group geometry via XFS_FSOP_GEOM
xfs: check that rtblock extents do not break rtsupers or rtgroups
xfs: add frextents to the lazysbcounters when rtgroups enabled
xfs: record rt group metadata errors in the health system
xfs: export the geometry of realtime groups to userspace
xfs: add block headers to realtime bitmap and summary blocks
xfs: encode the rtbitmap in big endian format
xfs: encode the rtsummary in big endian format
xfs: grow the realtime section when realtime groups are enabled
xfs: support logging EFIs for realtime extents
xfs: support error injection when freeing rt extents
xfs: use realtime EFI to free extents when rtgroups are enabled
xfs: don't merge ioends across RTGs
xfs: scrub the realtime group superblock
xfs: scrub metadir paths for rtgroup metadata
xfs: mask off the rtbitmap and summary inodes when metadir in use
xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries
xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries
xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t
xfs: adjust min_block usage in xfs_verify_agbno
xfs: move the min and max group block numbers to xfs_group
xfs: implement busy extent tracking for rtgroups
xfs: use metadir for quota inodes
xfs: scrub quota file metapaths
xfs: enable metadata directory feature
xfs: convert struct typedefs in xfs_ondisk.h
xfs: separate space btree structures in xfs_ondisk.h
xfs: port ondisk structure checks from xfs/122 to the kernel
xfs: return a 64-bit block count from xfs_btree_count_blocks
xfs: fix error bailout in xfs_rtginode_create
xfs: update btree keys correctly when _insrec splits an inode root block
xfs: fix sb_spino_align checks for large fsblock sizes
xfs: return from xfs_symlink_verify early on V4 filesystems

Dave Chinner (1):
xfs: fix sparse inode limits on runt AG

Jeff Layton (1):
xfs: switch to multigrain timestamps

Long Li (1):
xfs: remove unknown compat feature check in superblock write validation

db/block.c                  |   2 +-
db/block.h                  |  16 -
db/convert.c                |   1 -
db/faddr.c                  |   1 -
include/libxfs.h            |   2 +
include/platform_defs.h     |  33 +++
include/xfs_mount.h         |  30 +-
include/xfs_trace.h         |   7 +
include/xfs_trans.h         |   1 +
libfrog/util.c              |  14 +
libfrog/util.h              |   4 +
libxfs/Makefile             |   2 +
libxfs/init.c               |  35 ++-
libxfs/libxfs_api_defs.h    |  16 +
libxfs/libxfs_io.h          |   1 +
libxfs/libxfs_priv.h        |  34 +--
libxfs/rdwr.c               |  17 ++
libxfs/trans.c              |  29 ++
libxfs/util.c               |   8 +-
libxfs/xfs_ag.c             |  22 +-
libxfs/xfs_ag.h             |  16 +-
libxfs/xfs_alloc.c          |  15 +-
libxfs/xfs_alloc.h          |  12 +-
libxfs/xfs_bmap.c           | 124 ++++++--
libxfs/xfs_btree.c          |  33 ++-
libxfs/xfs_btree.h          |   2 +-
libxfs/xfs_defer.c          |   6 +
libxfs/xfs_defer.h          |   1 +
libxfs/xfs_dquot_buf.c      | 190 ++++++++++++
libxfs/xfs_format.h         |  80 ++++-
libxfs/xfs_fs.h             |  32 +-
libxfs/xfs_group.h          |  33 +++
libxfs/xfs_health.h         |  42 +--
libxfs/xfs_ialloc.c         |  16 +-
libxfs/xfs_ialloc_btree.c   |   6 +-
libxfs/xfs_log_format.h     |   6 +-
libxfs/xfs_ondisk.h         | 186 +++++++++---
libxfs/xfs_quota_defs.h     |  43 +++
libxfs/xfs_rtbitmap.c       | 405 +++++++++++++++++---------
libxfs/xfs_rtbitmap.h       | 247 ++++++++++------
libxfs/xfs_rtgroup.c        | 694 ++++++++++++++++++++++++++++++++++++++++++++
libxfs/xfs_rtgroup.h        | 284 ++++++++++++++++++
libxfs/xfs_sb.c             | 246 ++++++++++++++--
libxfs/xfs_sb.h             |   6 +-
libxfs/xfs_shared.h         |   4 +
libxfs/xfs_symlink_remote.c |   4 +-
libxfs/xfs_trans_inode.c    |   6 +-
libxfs/xfs_trans_resv.c     |   2 +-
libxfs/xfs_types.c          |  35 ++-
libxfs/xfs_types.h          |   8 +-
mkfs/proto.c                |  33 ++-
mkfs/xfs_mkfs.c             |   8 +
repair/dinode.c             |   4 +-
repair/phase6.c             | 203 +++++++------
repair/rt.c                 |  34 +--
repair/rt.h                 |   4 +-
56 files changed, 2728 insertions(+), 617 deletions(-)
create mode 100644 libxfs/xfs_rtgroup.c
create mode 100644 libxfs/xfs_rtgroup.h


* [GIT PULL 6/8] xfsprogs: shard the realtime section
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (12 preceding siblings ...)
  2024-12-23 22:28 ` [GIT PULL 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
@ 2024-12-23 22:28 ` Darrick J. Wong
  2024-12-23 22:29 ` [GIT PULL 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
  2024-12-23 22:29 ` [GIT PULL 8/8] xfsprogs: enable quota for realtime volumes Darrick J. Wong
  15 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:28 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: hch, linux-xfs

Hi Andrey,

Please pull this branch with changes for xfsprogs 6.13.

As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts.  Please let me know if you
encounter any problems.

--D

The following changes since commit 80b81c84f015ee01fed80c32184cc763ee1a655e:

xfs: return from xfs_symlink_verify early on V4 filesystems (2024-12-23 13:05:13 -0800)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git tags/realtime-groups_2024-12-23

for you to fetch changes up to be4f8d6c422045b7934b8763f013996d49627b00:

mkfs: format realtime groups (2024-12-23 13:05:16 -0800)

----------------------------------------------------------------
xfsprogs: shard the realtime section [v6.2 06/23]

Right now, the realtime section uses a single pair of metadata inodes to
store the free space information.  This presents a scalability problem
since every thread trying to allocate or free rt extents has to lock
these files.  Solve this problem by sharding the realtime section into
separate realtime allocation groups.
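The locking argument can be illustrated with a toy model in which each group carries its own lock and free-extent counter, so that allocations in different groups never contend.  This is only a sketch under stated assumptions: the toy_* structures and functions are hypothetical and are not the xfs_rtgroup definitions, the mapping from extent number to group uses a plain division for simplicity, and the only name borrowed from the series is rgextents, the per-group extent count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy structures; not the xfs_rtgroup definitions. */
struct toy_rtgroup {
	pthread_mutex_t	lock;		/* protects only this group's counter */
	uint64_t	free_rtx;	/* free rt extents in this group */
};

struct toy_rtvol {
	uint32_t		rgcount;	/* number of rt groups */
	uint64_t		rgextents;	/* rt extents per group */
	struct toy_rtgroup	*groups;
};

/* Set up rgcount groups of rgextents free extents each; 0 on success. */
static int
toy_rtvol_init(struct toy_rtvol *vol, uint32_t rgcount, uint64_t rgextents)
{
	uint32_t	i;

	vol->groups = calloc(rgcount, sizeof(*vol->groups));
	if (!vol->groups)
		return -1;
	vol->rgcount = rgcount;
	vol->rgextents = rgextents;
	for (i = 0; i < rgcount; i++) {
		pthread_mutex_init(&vol->groups[i].lock, NULL);
		vol->groups[i].free_rtx = rgextents;
	}
	return 0;
}

static void
toy_rtvol_destroy(struct toy_rtvol *vol)
{
	uint32_t	i;

	for (i = 0; i < vol->rgcount; i++)
		pthread_mutex_destroy(&vol->groups[i].lock);
	free(vol->groups);
	vol->groups = NULL;
}

/* Map a volume-wide rt extent number to the group that owns it. */
static uint32_t
toy_rtx_to_rgno(const struct toy_rtvol *vol, uint64_t rtx)
{
	return (uint32_t)(rtx / vol->rgextents);
}

/*
 * Take count extents from one group.  Only that group's lock is held, so
 * threads working in other groups are not blocked.  Returns 0 on success
 * or -1 if the group does not have enough free extents.
 */
static int
toy_rtgroup_alloc(struct toy_rtvol *vol, uint32_t rgno, uint64_t count)
{
	struct toy_rtgroup	*rtg;
	int			ret = -1;

	assert(rgno < vol->rgcount);
	rtg = &vol->groups[rgno];

	pthread_mutex_lock(&rtg->lock);
	if (rtg->free_rtx >= count) {
		rtg->free_rtx -= count;
		ret = 0;
	}
	pthread_mutex_unlock(&rtg->lock);
	return ret;
}
```

With a single pair of free space inodes, the equivalent of rtg->lock would be one volume-wide lock taken by every allocation and free; splitting the state per group is what removes that serialization point.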

While we're at it, define a superblock to be stamped into the start of
the rt section.  This enables utilities such as blkid to identify block
devices containing realtime sections, and avoids the situation where
anything written into block 0 of the realtime extent can be
misinterpreted as file data.

The biggest advantage of rtgroups will become evident later when we get to
adding rmap and reflink to the realtime volume, since the geometry
constraints are the same for rt groups and AGs.  Hence we can reuse all
that code directly.

This is a very large patchset, but it catches us up with 20 years of
technical debt that have accumulated.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

----------------------------------------------------------------
Christoph Hellwig (6):
man: document rgextents geom field
xfs_repair: refactor phase4
xfs_repair: simplify rt_lock handling
xfs_repair: add a real per-AG bitmap abstraction
xfs_db: metadump metadir rt bitmap and summary files
xfs_scrub: cleanup fsmap keys initialization

Darrick J. Wong (45):
libxfs: remove XFS_ILOCK_RT*
libxfs: adjust xfs_fsb_to_db to handle segmented rtblocks
xfs_repair,mkfs: port to libxfs_rt{bitmap,summary}_create
libxfs: use correct rtx count to block count conversion
libfrog: scrub the realtime group superblock
man: document the rt group geometry ioctl
libxfs: port userspace deferred log item to handle rtgroups
libxfs: implement some sanity checking for enormous rgcount
libfrog: support scrubbing rtgroup metadata paths
libfrog: report rt groups in output
libfrog: add bitmap_clear
xfs_logprint: report realtime EFIs
xfs_repair: adjust rtbitmap/rtsummary word updates to handle big endian values
xfs_repair: refactor offsetof+sizeof to offsetofend
xfs_repair: improve rtbitmap discrepancy reporting
xfs_repair: support realtime groups
xfs_repair: find and clobber rtgroup bitmap and summary files
xfs_repair: support realtime superblocks
xfs_repair: repair rtbitmap and rtsummary block headers
xfs_db: enable the rtblock and rtextent commands for segmented rt block numbers
xfs_db: enable rtconvert to handle segmented rtblocks
xfs_db: listify the definition of enum typnm
xfs_db: support dumping realtime group data and superblocks
xfs_db: support changing the label and uuid of rt superblocks
xfs_db: enable conversion of rt space units
xfs_db: metadump realtime devices
xfs_db: dump rt bitmap blocks
xfs_db: dump rt summary blocks
xfs_db: report rt group and block number in the bmap command
xfs_io: support scrubbing rtgroup metadata
xfs_io: support scrubbing rtgroup metadata paths
xfs_io: add a command to display allocation group information
xfs_io: add a command to display realtime group information
xfs_io: display rt group in verbose bmap output
xfs_io: display rt group in verbose fsmap output
xfs_mdrestore: refactor open-coded fd/is_file into a structure
xfs_mdrestore: restore rt group superblocks to realtime device
xfs_spaceman: report on realtime group health
xfs_scrub: scrub realtime allocation group metadata
xfs_scrub: check rtgroup metadata directory connections
xfs_scrub: call GETFSMAP for each rt group in parallel
xfs_scrub: trim realtime volumes too
xfs_scrub: use histograms to speed up phase 8 on the realtime volume
mkfs: add headers to realtime bitmap blocks
mkfs: format realtime groups

db/Makefile                           |   1 +
db/block.c                            |  34 ++-
db/bmap.c                             |  56 +++-
db/command.c                          |   2 +
db/convert.c                          | 118 +++++++-
db/field.c                            |  20 ++
db/field.h                            |  10 +
db/inode.c                            |  36 ++-
db/metadump.c                         |  59 +++-
db/rtgroup.c                          | 154 +++++++++++
db/rtgroup.h                          |  21 ++
db/sb.c                               | 117 +++++++-
db/type.c                             |  16 ++
db/type.h                             |  32 ++-
db/xfs_metadump.sh                    |   5 +-
include/xfs.h                         |  15 +
include/xfs_metadump.h                |   8 +
io/Makefile                           |   1 +
io/aginfo.c                           | 215 +++++++++++++++
io/bmap.c                             |  27 +-
io/fsmap.c                            |  22 +-
io/init.c                             |   1 +
io/io.h                               |   1 +
io/scrub.c                            |  81 +++++-
libfrog/bitmap.c                      |  25 +-
libfrog/bitmap.h                      |   1 +
libfrog/div64.h                       |   6 +
libfrog/fsgeom.c                      |  24 +-
libfrog/fsgeom.h                      |  16 ++
libfrog/scrub.c                       |  24 +-
libfrog/scrub.h                       |   1 +
libfrog/util.c                        |  12 +
libfrog/util.h                        |   1 +
libxfs/defer_item.c                   |  73 +++--
libxfs/init.c                         |  46 ++++
libxfs/libxfs_api_defs.h              |   5 +
libxfs/libxfs_priv.h                  |   8 -
libxfs/topology.c                     |  42 +++
libxfs/topology.h                     |   3 +
libxfs/trans.c                        |   2 +-
libxfs/util.c                         |   2 +-
logprint/log_misc.c                   |   2 +
logprint/log_print_all.c              |   8 +
logprint/log_redo.c                   |  57 +++-
man/man2/ioctl_xfs_fsgeometry.2       |   6 +-
man/man2/ioctl_xfs_rtgroup_geometry.2 |  99 +++++++
man/man2/ioctl_xfs_scrub_metadata.2   |   9 +
man/man8/mkfs.xfs.8.in                |  31 +++
man/man8/xfs_db.8                     |  34 ++-
man/man8/xfs_io.8                     |  29 +-
man/man8/xfs_mdrestore.8              |  10 +
man/man8/xfs_metadump.8               |  11 +
man/man8/xfs_spaceman.8               |   5 +-
mdrestore/xfs_mdrestore.c             | 163 ++++++-----
mkfs/proto.c                          | 139 +++++++---
mkfs/xfs_mkfs.c                       | 281 ++++++++++++++++++-
repair/agheader.c                     |  27 +-
repair/agheader.h                     |  10 +
repair/dino_chunks.c                  |  58 ++--
repair/dinode.c                       | 162 +++++++----
repair/dir2.c                         |  13 +-
repair/globals.c                      |   3 -
repair/globals.h                      |   6 +-
repair/incore.c                       | 235 +++++++++++-----
repair/incore.h                       |  36 ++-
repair/incore_ext.c                   |   3 +-
repair/phase2.c                       |  51 ++--
repair/phase3.c                       |   4 +
repair/phase4.c                       | 221 ++++++++-------
repair/phase5.c                       |   2 +-
repair/phase6.c                       | 180 +++++++++++-
repair/rmap.c                         |   4 +-
repair/rt.c                           | 506 +++++++++++++++++++++++++++++-----
repair/rt.h                           |  23 ++
repair/sb.c                           |  41 +++
repair/scan.c                         |  36 +--
repair/xfs_repair.c                   |  19 +-
scrub/phase2.c                        | 124 ++++++---
scrub/phase5.c                        |  24 +-
scrub/phase6.c                        |  17 +-
scrub/phase7.c                        |   7 +
scrub/phase8.c                        |  36 ++-
scrub/repair.c                        |   1 +
scrub/scrub.c                         |   7 +
scrub/scrub.h                         |  13 +-
scrub/spacemap.c                      | 102 +++++--
scrub/xfs_scrub.c                     |   2 +
scrub/xfs_scrub.h                     |   1 +
spaceman/health.c                     |  63 ++++-
89 files changed, 3545 insertions(+), 719 deletions(-)
create mode 100644 db/rtgroup.c
create mode 100644 db/rtgroup.h
create mode 100644 io/aginfo.c
create mode 100644 man/man2/ioctl_xfs_rtgroup_geometry.2


* [GIT PULL 7/8] xfsprogs: store quota files in the metadir
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (13 preceding siblings ...)
  2024-12-23 22:28 ` [GIT PULL 6/8] xfsprogs: shard the realtime section Darrick J. Wong
@ 2024-12-23 22:29 ` Darrick J. Wong
  2024-12-23 22:29 ` [GIT PULL 8/8] xfsprogs: enable quota for realtime volumes Darrick J. Wong
  15 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:29 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: hch, linux-xfs

Hi Andrey,

Please pull this branch with changes for xfsprogs for 6.13.

As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts.  Please let me know if you
encounter any problems.

--D

The following changes since commit be4f8d6c422045b7934b8763f013996d49627b00:

mkfs: format realtime groups (2024-12-23 13:05:16 -0800)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git tags/metadir-quotas_2024-12-23

for you to fetch changes up to 6ec6c38c96d4427f12e214181a1bb8547fb4c355:

mkfs: add quota flags when setting up filesystem (2024-12-23 13:05:17 -0800)

----------------------------------------------------------------
xfsprogs: store quota files in the metadir [v6.2 07/23]

Store the quota files in the metadata directory tree instead of the
superblock.  Since we're introducing a new incompat feature flag, let's
also make the mount process bring up quotas in whatever state they were
when the filesystem was last unmounted, instead of requiring sysadmins
to remember that themselves.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

----------------------------------------------------------------
Darrick J. Wong (7):
libfrog: scrub quota file metapaths
xfs_db: support metadir quotas
xfs_repair: refactor quota inumber handling
xfs_repair: hoist the secondary sb qflags handling
xfs_repair: support quota inodes in the metadata directory
xfs_repair: try not to trash qflags on metadir filesystems
mkfs: add quota flags when setting up filesystem

db/dquot.c               |  59 ++++++++++++-----
libfrog/scrub.c          |  20 ++++++
libxfs/libxfs_api_defs.h |   6 ++
man/man8/mkfs.xfs.8.in   |  48 ++++++++++++++
mkfs/xfs_mkfs.c          | 113 ++++++++++++++++++++++++++++++++-
repair/agheader.c        | 161 ++++++++++++++++++++++++++---------------------
repair/dinode.c          |  18 +++---
repair/dir2.c            |  12 ++--
repair/globals.c         | 111 ++++++++++++++++++++++++++++++--
repair/globals.h         |  15 +++--
repair/phase2.c          |   3 +
repair/phase4.c          | 116 +++++++++++++++++-----------------
repair/phase6.c          | 128 ++++++++++++++++++++++++++++++++++---
repair/quotacheck.c      | 118 +++++++++++++++++++++++++++++-----
repair/quotacheck.h      |   3 +
repair/sb.c              |   3 +
repair/versions.c        |   9 +--
repair/xfs_repair.c      |  13 ++--
18 files changed, 753 insertions(+), 203 deletions(-)



* [GIT PULL 8/8] xfsprogs: enable quota for realtime volumes
  2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
                   ` (14 preceding siblings ...)
  2024-12-23 22:29 ` [GIT PULL 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
@ 2024-12-23 22:29 ` Darrick J. Wong
  15 siblings, 0 replies; 213+ messages in thread
From: Darrick J. Wong @ 2024-12-23 22:29 UTC (permalink / raw)
  To: aalbersh, djwong; +Cc: hch, linux-xfs

Hi Andrey,

Please pull this branch with changes for xfsprogs for 6.13.

As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts.  Please let me know if you
encounter any problems.

--D

The following changes since commit 6ec6c38c96d4427f12e214181a1bb8547fb4c355:

mkfs: add quota flags when setting up filesystem (2024-12-23 13:05:17 -0800)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git tags/realtime-quotas_2024-12-23

for you to fetch changes up to c9b7f40736bde7633a84f40cd56cf707cda9925e:

mkfs: enable rt quota options (2024-12-23 13:05:17 -0800)

----------------------------------------------------------------
xfsprogs: enable quota for realtime volumes [v6.2 08/23]

At some point, I realized that I've refactored enough of the quota code
in XFS that I should evaluate whether or not quota actually works on
realtime volumes.  It turns out that it nearly works: the only broken
pieces are chown, delayed allocation, and the reporting of project
quotas in the statvfs output for projinherit+rtinherit directories.

Fix these things and we can have realtime quotas again after 20 years.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

----------------------------------------------------------------
Darrick J. Wong (2):
xfs_quota: report warning limits for realtime space quotas
mkfs: enable rt quota options

include/xqm.h   | 5 ++++-
mkfs/xfs_mkfs.c | 6 ------
quota/state.c   | 1 +
3 files changed, 5 insertions(+), 7 deletions(-)



Thread overview: 213+ messages
2024-12-23 21:29 [PATCHBOMB v6.2] xfsprogs: metadata directories and realtime groups Darrick J. Wong
2024-12-23 21:34 ` [PATCHSET 1/8] xfsprogs: bug fixes for 6.12 Darrick J. Wong
2024-12-23 21:36   ` [PATCH 1/3] xfs_repair: fix maximum file offset comparison Darrick J. Wong
2024-12-23 21:36   ` [PATCH 2/3] man: fix ioctl_xfs_commit_range man page install Darrick J. Wong
2024-12-23 21:37   ` [PATCH 3/3] man: document the -n parent mkfs option Darrick J. Wong
2024-12-23 21:34 ` [PATCHSET v6.2 2/8] libxfs: metadata inode directory trees Darrick J. Wong
2024-12-23 21:37   ` [PATCH 01/36] xfs: remove the redundant xfs_alloc_log_agf Darrick J. Wong
2024-12-23 21:37   ` [PATCH 02/36] xfs: sb_spino_align is not verified Darrick J. Wong
2024-12-23 21:37   ` [PATCH 03/36] xfs: remove the unused pagb_count field in struct xfs_perag Darrick J. Wong
2024-12-23 21:38   ` [PATCH 04/36] xfs: remove the unused pag_active_wq " Darrick J. Wong
2024-12-23 21:38   ` [PATCH 05/36] xfs: pass a pag to xfs_difree_inode_chunk Darrick J. Wong
2024-12-23 21:38   ` [PATCH 06/36] xfs: remove the agno argument to xfs_free_ag_extent Darrick J. Wong
2024-12-23 21:38   ` [PATCH 07/36] xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers Darrick J. Wong
2024-12-23 21:39   ` [PATCH 08/36] xfs: add a xfs_agino_to_ino helper Darrick J. Wong
2024-12-23 21:39   ` [PATCH 09/36] xfs: pass a pag to xfs_extent_busy_{search,reuse} Darrick J. Wong
2024-12-23 21:39   ` [PATCH 10/36] xfs: pass a perag structure to the xfs_ag_resv_init_error trace point Darrick J. Wong
2024-12-23 21:39   ` [PATCH 11/36] xfs: pass objects to the xfs_irec_merge_{pre,post} trace points Darrick J. Wong
2024-12-23 21:40   ` [PATCH 12/36] xfs: convert remaining trace points to pass pag structures Darrick J. Wong
2024-12-23 21:40   ` [PATCH 13/36] xfs: split xfs_initialize_perag Darrick J. Wong
2024-12-23 21:40   ` [PATCH 14/36] xfs: insert the pag structures into the xarray later Darrick J. Wong
2024-12-23 21:41   ` [PATCH 15/36] xfs: factor out a generic xfs_group structure Darrick J. Wong
2024-12-23 21:41   ` [PATCH 16/36] xfs: add a xfs_group_next_range helper Darrick J. Wong
2024-12-23 21:41   ` [PATCH 17/36] xfs: switch perag iteration from the for_each macros to a while based iterator Darrick J. Wong
2024-12-23 21:41   ` [PATCH 18/36] xfs: move metadata health tracking to the generic group structure Darrick J. Wong
2024-12-23 21:42   ` [PATCH 19/36] xfs: move draining of deferred operations " Darrick J. Wong
2024-12-23 21:42   ` [PATCH 20/36] xfs: move the online repair rmap hooks " Darrick J. Wong
2024-12-23 21:42   ` [PATCH 21/36] xfs: convert busy extent tracking " Darrick J. Wong
2024-12-23 21:42   ` [PATCH 22/36] xfs: add a generic group pointer to the btree cursor Darrick J. Wong
2024-12-23 21:43   ` [PATCH 23/36] xfs: add group based bno conversion helpers Darrick J. Wong
2024-12-23 21:43   ` [PATCH 24/36] xfs: store a generic group structure in the intents Darrick J. Wong
2024-12-23 21:43   ` [PATCH 25/36] xfs: constify the xfs_sb predicates Darrick J. Wong
2024-12-23 21:43   ` [PATCH 26/36] xfs: rename metadata inode predicates Darrick J. Wong
2024-12-23 21:44   ` [PATCH 27/36] xfs: define the on-disk format for the metadir feature Darrick J. Wong
2024-12-23 21:44   ` [PATCH 28/36] xfs: iget for metadata inodes Darrick J. Wong
2024-12-23 21:44   ` [PATCH 29/36] xfs: enforce metadata inode flag Darrick J. Wong
2024-12-23 21:44   ` [PATCH 30/36] xfs: read and write metadata inode directory tree Darrick J. Wong
2024-12-23 21:45   ` [PATCH 31/36] xfs: disable the agi rotor for metadata inodes Darrick J. Wong
2024-12-23 21:45   ` [PATCH 32/36] xfs: advertise metadata directory feature Darrick J. Wong
2024-12-23 21:45   ` [PATCH 33/36] xfs: allow bulkstat to return metadata directories Darrick J. Wong
2024-12-23 21:45   ` [PATCH 34/36] xfs: adjust xfs_bmap_add_attrfork for metadir Darrick J. Wong
2024-12-23 21:46   ` [PATCH 35/36] xfs: record health problems with the metadata directory Darrick J. Wong
2024-12-23 21:46   ` [PATCH 36/36] xfs: check metadata directory file path connectivity Darrick J. Wong
2024-12-23 21:35 ` [PATCHSET v6.2 3/8] xfsprogs: metadata inode directory trees Darrick J. Wong
2024-12-23 21:46   ` [PATCH 01/41] libxfs: constify the xfs_inode predicates Darrick J. Wong
2024-12-23 21:47   ` [PATCH 02/41] libxfs: load metadata directory root at mount time Darrick J. Wong
2024-12-23 21:47   ` [PATCH 03/41] libxfs: enforce metadata inode flag Darrick J. Wong
2024-12-23 21:47   ` [PATCH 04/41] man2: document metadata directory flag in fsgeom ioctl Darrick J. Wong
2024-12-23 21:47   ` [PATCH 05/41] man: update scrub ioctl documentation for metadir Darrick J. Wong
2024-12-23 21:48   ` [PATCH 06/41] libfrog: report metadata directories in the geometry report Darrick J. Wong
2024-12-23 21:48   ` [PATCH 07/41] libfrog: allow METADIR in xfrog_bulkstat_single5 Darrick J. Wong
2024-12-23 21:48   ` [PATCH 08/41] xfs_io: support scrubbing metadata directory paths Darrick J. Wong
2024-12-23 21:48   ` [PATCH 09/41] xfs_db: disable xfs_check when metadir is enabled Darrick J. Wong
2024-12-23 21:49   ` [PATCH 10/41] xfs_db: report metadir support for version command Darrick J. Wong
2024-12-23 21:49   ` [PATCH 11/41] xfs_db: don't obfuscate metadata directories and attributes Darrick J. Wong
2024-12-23 21:49   ` [PATCH 12/41] xfs_db: support metadata directories in the path command Darrick J. Wong
2024-12-23 21:49   ` [PATCH 13/41] xfs_db: show the metadata root directory when dumping superblocks Darrick J. Wong
2024-12-23 21:50   ` [PATCH 14/41] xfs_db: display di_metatype Darrick J. Wong
2024-12-23 21:50   ` [PATCH 15/41] xfs_db: drop the metadata checking code from blockget Darrick J. Wong
2024-12-23 21:50   ` [PATCH 16/41] xfs_io: support flag for limited bulkstat of the metadata directory Darrick J. Wong
2024-12-23 21:50   ` [PATCH 17/41] xfs_io: support scrubbing metadata directory paths Darrick J. Wong
2024-12-23 21:51   ` [PATCH 18/41] xfs_spaceman: report health of metadir inodes too Darrick J. Wong
2024-12-23 21:51   ` [PATCH 19/41] xfs_scrub: tread zero-length read verify as an IO error Darrick J. Wong
2024-12-23 21:51   ` [PATCH 20/41] xfs_scrub: scan metadata directories during phase 3 Darrick J. Wong
2024-12-23 21:51   ` [PATCH 21/41] xfs_scrub: re-run metafile scrubbers during phase 5 Darrick J. Wong
2024-12-23 21:52   ` [PATCH 22/41] xfs_repair: handle sb_metadirino correctly when zeroing supers Darrick J. Wong
2024-12-23 21:52   ` [PATCH 23/41] xfs_repair: dont check metadata directory dirent inumbers Darrick J. Wong
2024-12-23 21:52   ` [PATCH 24/41] xfs_repair: refactor fixing dotdot Darrick J. Wong
2024-12-23 21:53   ` [PATCH 25/41] xfs_repair: refactor marking of metadata inodes Darrick J. Wong
2024-12-23 21:53   ` [PATCH 26/41] xfs_repair: refactor root directory initialization Darrick J. Wong
2024-12-23 21:53   ` [PATCH 27/41] xfs_repair: refactor grabbing realtime metadata inodes Darrick J. Wong
2024-12-23 21:53   ` [PATCH 28/41] xfs_repair: check metadata inode flag Darrick J. Wong
2024-12-23 21:54   ` [PATCH 29/41] xfs_repair: use libxfs_metafile_iget for quota/rt inodes Darrick J. Wong
2024-12-23 21:54   ` [PATCH 30/41] xfs_repair: rebuild the metadata directory Darrick J. Wong
2024-12-23 21:54   ` [PATCH 31/41] xfs_repair: don't let metadata and regular files mix Darrick J. Wong
2024-12-23 21:54   ` [PATCH 32/41] xfs_repair: update incore metadata state whenever we create new files Darrick J. Wong
2024-12-23 21:55   ` [PATCH 33/41] xfs_repair: pass private data pointer to scan_lbtree Darrick J. Wong
2024-12-23 21:55   ` [PATCH 34/41] xfs_repair: mark space used by metadata files Darrick J. Wong
2024-12-23 21:55   ` [PATCH 35/41] xfs_repair: adjust keep_fsinos to handle metadata directories Darrick J. Wong
2024-12-23 21:55   ` [PATCH 36/41] xfs_repair: metadata dirs are never plausible root dirs Darrick J. Wong
2024-12-23 21:56   ` [PATCH 37/41] xfs_repair: drop all the metadata directory files during pass 4 Darrick J. Wong
2024-12-23 21:56   ` [PATCH 38/41] xfs_repair: truncate and unmark orphaned metadata inodes Darrick J. Wong
2024-12-23 21:56   ` [PATCH 39/41] xfs_repair: do not count metadata directory files when doing quotacheck Darrick J. Wong
2024-12-23 21:56   ` [PATCH 40/41] xfs_repair: refactor generate_rtinfo Darrick J. Wong
2024-12-23 21:57   ` [PATCH 41/41] mkfs.xfs: enable metadata directories Darrick J. Wong
2024-12-23 21:35 ` [PATCHSET v6.2 4/8] mkfs: make protofiles less janky Darrick J. Wong
2024-12-23 21:57   ` [PATCH 1/4] libxfs: resync libxfs_alloc_file_space interface with the kernel Darrick J. Wong
2024-12-23 21:57   ` [PATCH 2/4] mkfs: support copying in large or sparse files Darrick J. Wong
2024-12-23 21:57   ` [PATCH 3/4] mkfs: support copying in xattrs Darrick J. Wong
2024-12-23 21:58   ` [PATCH 4/4] mkfs: add a utility to generate protofiles Darrick J. Wong
2024-12-23 21:35 ` [PATCHSET 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
2024-12-23 21:58   ` [PATCH 01/52] xfs: create incore realtime group structures Darrick J. Wong
2024-12-23 21:58   ` [PATCH 02/52] xfs: define locking primitives for realtime groups Darrick J. Wong
2024-12-23 21:59   ` [PATCH 03/52] xfs: add a lockdep class key for rtgroup inodes Darrick J. Wong
2024-12-23 21:59   ` [PATCH 04/52] xfs: support caching rtgroup metadata inodes Darrick J. Wong
2024-12-23 21:59   ` [PATCH 05/52] xfs: add a xfs_bmap_free_rtblocks helper Darrick J. Wong
2024-12-23 21:59   ` [PATCH 06/52] xfs: move RT bitmap and summary information to the rtgroup Darrick J. Wong
2024-12-23 22:00   ` [PATCH 07/52] xfs: support creating per-RTG files in growfs Darrick J. Wong
2024-12-23 22:00   ` [PATCH 08/52] xfs: refactor xfs_rtbitmap_blockcount Darrick J. Wong
2024-12-23 22:00   ` [PATCH 09/52] xfs: refactor xfs_rtsummary_blockcount Darrick J. Wong
2024-12-23 22:00   ` [PATCH 10/52] xfs: make RT extent numbers relative to the rtgroup Darrick J. Wong
2024-12-23 22:01   ` [PATCH 11/52] libfrog: add memchr_inv Darrick J. Wong
2024-12-23 22:01   ` [PATCH 12/52] xfs: define the format of rt groups Darrick J. Wong
2024-12-23 22:01   ` [PATCH 13/52] xfs: update realtime super every time we update the primary fs super Darrick J. Wong
2024-12-23 22:01   ` [PATCH 14/52] xfs: export realtime group geometry via XFS_FSOP_GEOM Darrick J. Wong
2024-12-23 22:02   ` [PATCH 15/52] xfs: check that rtblock extents do not break rtsupers or rtgroups Darrick J. Wong
2024-12-23 22:02   ` [PATCH 16/52] xfs: add a helper to prevent bmap merges across rtgroup boundaries Darrick J. Wong
2024-12-23 22:02   ` [PATCH 17/52] xfs: add frextents to the lazysbcounters when rtgroups enabled Darrick J. Wong
2024-12-23 22:02   ` [PATCH 18/52] xfs: record rt group metadata errors in the health system Darrick J. Wong
2024-12-23 22:03   ` [PATCH 19/52] xfs: export the geometry of realtime groups to userspace Darrick J. Wong
2024-12-23 22:03   ` [PATCH 20/52] xfs: add block headers to realtime bitmap and summary blocks Darrick J. Wong
2024-12-23 22:03   ` [PATCH 21/52] xfs: encode the rtbitmap in big endian format Darrick J. Wong
2024-12-23 22:03   ` [PATCH 22/52] xfs: encode the rtsummary " Darrick J. Wong
2024-12-23 22:04   ` [PATCH 23/52] xfs: grow the realtime section when realtime groups are enabled Darrick J. Wong
2024-12-23 22:04   ` [PATCH 24/52] xfs: support logging EFIs for realtime extents Darrick J. Wong
2024-12-23 22:04   ` [PATCH 25/52] xfs: support error injection when freeing rt extents Darrick J. Wong
2024-12-23 22:05   ` [PATCH 26/52] xfs: use realtime EFI to free extents when rtgroups are enabled Darrick J. Wong
2024-12-23 22:05   ` [PATCH 27/52] xfs: don't merge ioends across RTGs Darrick J. Wong
2024-12-23 22:05   ` [PATCH 28/52] xfs: make the RT allocator rtgroup aware Darrick J. Wong
2024-12-23 22:05   ` [PATCH 29/52] xfs: scrub the realtime group superblock Darrick J. Wong
2024-12-23 22:06   ` [PATCH 30/52] xfs: scrub metadir paths for rtgroup metadata Darrick J. Wong
2024-12-23 22:06   ` [PATCH 31/52] xfs: mask off the rtbitmap and summary inodes when metadir in use Darrick J. Wong
2024-12-23 22:06   ` [PATCH 32/52] xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries Darrick J. Wong
2024-12-23 22:06   ` [PATCH 33/52] xfs: create helpers to deal with rounding xfs_filblks_t " Darrick J. Wong
2024-12-23 22:07   ` [PATCH 34/52] xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t Darrick J. Wong
2024-12-23 22:07   ` [PATCH 35/52] xfs: adjust min_block usage in xfs_verify_agbno Darrick J. Wong
2024-12-23 22:07   ` [PATCH 36/52] xfs: move the min and max group block numbers to xfs_group Darrick J. Wong
2024-12-23 22:07   ` [PATCH 37/52] xfs: implement busy extent tracking for rtgroups Darrick J. Wong
2024-12-23 22:08   ` [PATCH 38/52] xfs: use metadir for quota inodes Darrick J. Wong
2024-12-23 22:08   ` [PATCH 39/52] xfs: scrub quota file metapaths Darrick J. Wong
2024-12-23 22:08   ` [PATCH 40/52] xfs: enable metadata directory feature Darrick J. Wong
2024-12-23 22:08   ` [PATCH 41/52] xfs: convert struct typedefs in xfs_ondisk.h Darrick J. Wong
2024-12-23 22:09   ` [PATCH 42/52] xfs: separate space btree structures " Darrick J. Wong
2024-12-23 22:09   ` [PATCH 43/52] xfs: port ondisk structure checks from xfs/122 to the kernel Darrick J. Wong
2024-12-23 22:09   ` [PATCH 44/52] xfs: remove unknown compat feature check in superblock write validation Darrick J. Wong
2024-12-23 22:09   ` [PATCH 45/52] xfs: fix sparse inode limits on runt AG Darrick J. Wong
2024-12-23 22:10   ` [PATCH 46/52] xfs: switch to multigrain timestamps Darrick J. Wong
2024-12-23 22:10   ` [PATCH 47/52] xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay Darrick J. Wong
2024-12-23 22:10   ` [PATCH 48/52] xfs: return a 64-bit block count from xfs_btree_count_blocks Darrick J. Wong
2024-12-23 22:10   ` [PATCH 49/52] xfs: fix error bailout in xfs_rtginode_create Darrick J. Wong
2024-12-23 22:11   ` [PATCH 50/52] xfs: update btree keys correctly when _insrec splits an inode root block Darrick J. Wong
2024-12-23 22:11   ` [PATCH 51/52] xfs: fix sb_spino_align checks for large fsblock sizes Darrick J. Wong
2024-12-23 22:11   ` [PATCH 52/52] xfs: return from xfs_symlink_verify early on V4 filesystems Darrick J. Wong
2024-12-23 21:35 ` [PATCHSET v6.2 6/8] xfsprogs: shard the realtime section Darrick J. Wong
2024-12-23 22:12   ` [PATCH 01/51] libxfs: remove XFS_ILOCK_RT* Darrick J. Wong
2024-12-23 22:12   ` [PATCH 02/51] libxfs: adjust xfs_fsb_to_db to handle segmented rtblocks Darrick J. Wong
2024-12-23 22:12   ` [PATCH 03/51] xfs_repair,mkfs: port to libxfs_rt{bitmap,summary}_create Darrick J. Wong
2024-12-23 22:12   ` [PATCH 04/51] libxfs: use correct rtx count to block count conversion Darrick J. Wong
2024-12-23 22:13   ` [PATCH 05/51] libfrog: scrub the realtime group superblock Darrick J. Wong
2024-12-23 22:13   ` [PATCH 06/51] man: document the rt group geometry ioctl Darrick J. Wong
2024-12-23 22:13   ` [PATCH 07/51] man: document rgextents geom field Darrick J. Wong
2024-12-23 22:13   ` [PATCH 08/51] libxfs: port userspace deferred log item to handle rtgroups Darrick J. Wong
2024-12-23 22:14   ` [PATCH 09/51] libxfs: implement some sanity checking for enormous rgcount Darrick J. Wong
2024-12-23 22:14   ` [PATCH 10/51] libfrog: support scrubbing rtgroup metadata paths Darrick J. Wong
2024-12-23 22:14   ` [PATCH 11/51] libfrog: report rt groups in output Darrick J. Wong
2024-12-23 22:14   ` [PATCH 12/51] libfrog: add bitmap_clear Darrick J. Wong
2024-12-23 22:15   ` [PATCH 13/51] xfs_logprint: report realtime EFIs Darrick J. Wong
2024-12-23 22:15   ` [PATCH 14/51] xfs_repair: adjust rtbitmap/rtsummary word updates to handle big endian values Darrick J. Wong
2024-12-23 22:15   ` [PATCH 15/51] xfs_repair: refactor phase4 Darrick J. Wong
2024-12-23 22:15   ` [PATCH 16/51] xfs_repair: refactor offsetof+sizeof to offsetofend Darrick J. Wong
2024-12-23 22:16   ` [PATCH 17/51] xfs_repair: improve rtbitmap discrepancy reporting Darrick J. Wong
2024-12-23 22:16   ` [PATCH 18/51] xfs_repair: simplify rt_lock handling Darrick J. Wong
2024-12-23 22:16   ` [PATCH 19/51] xfs_repair: add a real per-AG bitmap abstraction Darrick J. Wong
2024-12-23 22:16   ` [PATCH 20/51] xfs_repair: support realtime groups Darrick J. Wong
2024-12-23 22:17   ` [PATCH 21/51] xfs_repair: find and clobber rtgroup bitmap and summary files Darrick J. Wong
2024-12-23 22:17   ` [PATCH 22/51] xfs_repair: support realtime superblocks Darrick J. Wong
2024-12-23 22:17   ` [PATCH 23/51] xfs_repair: repair rtbitmap and rtsummary block headers Darrick J. Wong
2024-12-23 22:18   ` [PATCH 24/51] xfs_db: enable the rtblock and rtextent commands for segmented rt block numbers Darrick J. Wong
2024-12-23 22:18   ` [PATCH 25/51] xfs_db: enable rtconvert to handle segmented rtblocks Darrick J. Wong
2024-12-23 22:18   ` [PATCH 26/51] xfs_db: listify the definition of enum typnm Darrick J. Wong
2024-12-23 22:18   ` [PATCH 27/51] xfs_db: support dumping realtime group data and superblocks Darrick J. Wong
2024-12-23 22:19   ` [PATCH 28/51] xfs_db: support changing the label and uuid of rt superblocks Darrick J. Wong
2024-12-23 22:19   ` [PATCH 29/51] xfs_db: enable conversion of rt space units Darrick J. Wong
2024-12-23 22:19   ` [PATCH 30/51] xfs_db: metadump metadir rt bitmap and summary files Darrick J. Wong
2024-12-23 22:19   ` [PATCH 31/51] xfs_db: metadump realtime devices Darrick J. Wong
2024-12-23 22:20   ` [PATCH 32/51] xfs_db: dump rt bitmap blocks Darrick J. Wong
2024-12-23 22:20   ` [PATCH 33/51] xfs_db: dump rt summary blocks Darrick J. Wong
2024-12-23 22:20   ` [PATCH 34/51] xfs_db: report rt group and block number in the bmap command Darrick J. Wong
2024-12-23 22:20   ` [PATCH 35/51] xfs_io: support scrubbing rtgroup metadata Darrick J. Wong
2024-12-23 22:21   ` [PATCH 36/51] xfs_io: support scrubbing rtgroup metadata paths Darrick J. Wong
2024-12-23 22:21   ` [PATCH 37/51] xfs_io: add a command to display allocation group information Darrick J. Wong
2024-12-23 22:21   ` [PATCH 38/51] xfs_io: add a command to display realtime " Darrick J. Wong
2024-12-23 22:21   ` [PATCH 39/51] xfs_io: display rt group in verbose bmap output Darrick J. Wong
2024-12-23 22:22   ` [PATCH 40/51] xfs_io: display rt group in verbose fsmap output Darrick J. Wong
2024-12-23 22:22   ` [PATCH 41/51] xfs_mdrestore: refactor open-coded fd/is_file into a structure Darrick J. Wong
2024-12-23 22:22   ` [PATCH 42/51] xfs_mdrestore: restore rt group superblocks to realtime device Darrick J. Wong
2024-12-23 22:22   ` [PATCH 43/51] xfs_spaceman: report on realtime group health Darrick J. Wong
2024-12-23 22:23   ` [PATCH 44/51] xfs_scrub: scrub realtime allocation group metadata Darrick J. Wong
2024-12-23 22:23   ` [PATCH 45/51] xfs_scrub: check rtgroup metadata directory connections Darrick J. Wong
2024-12-23 22:23   ` [PATCH 46/51] xfs_scrub: cleanup fsmap keys initialization Darrick J. Wong
2024-12-23 22:24   ` [PATCH 47/51] xfs_scrub: call GETFSMAP for each rt group in parallel Darrick J. Wong
2024-12-23 22:24   ` [PATCH 48/51] xfs_scrub: trim realtime volumes too Darrick J. Wong
2024-12-23 22:24   ` [PATCH 49/51] xfs_scrub: use histograms to speed up phase 8 on the realtime volume Darrick J. Wong
2024-12-23 22:24   ` [PATCH 50/51] mkfs: add headers to realtime bitmap blocks Darrick J. Wong
2024-12-23 22:25   ` [PATCH 51/51] mkfs: format realtime groups Darrick J. Wong
2024-12-23 21:36 ` [PATCHSET v6.2 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
2024-12-23 22:25   ` [PATCH 1/7] libfrog: scrub quota file metapaths Darrick J. Wong
2024-12-23 22:25   ` [PATCH 2/7] xfs_db: support metadir quotas Darrick J. Wong
2024-12-23 22:25   ` [PATCH 3/7] xfs_repair: refactor quota inumber handling Darrick J. Wong
2024-12-23 22:26   ` [PATCH 4/7] xfs_repair: hoist the secondary sb qflags handling Darrick J. Wong
2024-12-23 22:26   ` [PATCH 5/7] xfs_repair: support quota inodes in the metadata directory Darrick J. Wong
2024-12-23 22:26   ` [PATCH 6/7] xfs_repair: try not to trash qflags on metadir filesystems Darrick J. Wong
2024-12-23 22:26   ` [PATCH 7/7] mkfs: add quota flags when setting up filesystem Darrick J. Wong
2024-12-23 21:36 ` [PATCHSET v6.2 8/8] xfsprogs: enable quota for realtime volumes Darrick J. Wong
2024-12-23 22:27   ` [PATCH 1/2] xfs_quota: report warning limits for realtime space quotas Darrick J. Wong
2024-12-23 22:27   ` [PATCH 2/2] mkfs: enable rt quota options Darrick J. Wong
2024-12-23 22:27 ` [GIT PULL 1/8] xfsprogs: bug fixes for 6.12 Darrick J. Wong
2024-12-23 22:27 ` [GIT PULL 2/8] libxfs: metadata inode directory trees Darrick J. Wong
2024-12-23 22:28 ` [GIT PULL 3/8] xfsprogs: " Darrick J. Wong
2024-12-23 22:28 ` [GIT PULL 4/8] mkfs: make protofiles less janky Darrick J. Wong
2024-12-23 22:28 ` [GIT PULL 5/8] xfsprogs: new code for 6.13 Darrick J. Wong
2024-12-23 22:28 ` [GIT PULL 6/8] xfsprogs: shard the realtime section Darrick J. Wong
2024-12-23 22:29 ` [GIT PULL 7/8] xfsprogs: store quota files in the metadir Darrick J. Wong
2024-12-23 22:29 ` [GIT PULL 8/8] xfsprogs: enable quota for realtime volumes Darrick J. Wong
